dataproc-templates

Developing Spark code with assistance from Gemini CLI

This section assumes you are using Gemini CLI, which integrates with your existing development workflow and can be used for both code development and code review. You can also use Gemini from the browser to create code fragments, or Antigravity, an agentic development platform. All of these tools use the same underlying Gemini model.

WARNING: Gemini can generate incorrect code. Review all generated code before using it.

Gemini CLI Installation and pricing

Benefits of developing code with Gemini CLI vs. using Google-provided Dataproc templates

Some of the Dataproc templates are tested nightly against Serverless for Apache Spark version 1.2. They may work with other versions, but they are not regularly tested against them. Code developed with Gemini CLI has the following benefits:

  1. Gemini works within the user’s existing directory structure and reviews existing code before generating new code, so the generated code integrates well with what is already there.
  2. Gemini supports transforming data between any source and any sink.
  3. Gemini is flexible and supports notebooks and other languages such as Scala and R.
  4. The user (the GCP customer) owns the license on the code that is developed.
  5. With the right prompt, Gemini-generated code incorporates nuances such as partitioning, writing in batches, and using Secret Manager to fetch credentials.
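
To make the last point concrete, here is a minimal sketch of what those nuances can look like in a PySpark JDBC job. It is an illustration under stated assumptions, not code taken from the samples: the helper names (`jdbc_read_options`, `jdbc_write_options`, `secret_version_name`, `fetch_secret`) and the default sizes are hypothetical. The option keys are the standard Spark JDBC data source options, and `fetch_secret` assumes the `google-cloud-secret-manager` package is installed.

```python
from typing import Dict


def jdbc_read_options(url: str, table: str, partition_column: str,
                      lower_bound: int, upper_bound: int,
                      num_partitions: int, fetch_size: int = 1000) -> Dict[str, str]:
    """Build Spark JDBC options for a partitioned (parallel) read.

    Hypothetical helper. Spark splits the range [lower_bound, upper_bound)
    of partition_column into num_partitions strides, one query per partition.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if lower_bound >= upper_bound:
        raise ValueError("lower_bound must be less than upper_bound")
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": partition_column,
        "lowerBound": str(lower_bound),
        "upperBound": str(upper_bound),
        "numPartitions": str(num_partitions),
        "fetchsize": str(fetch_size),  # rows fetched per round trip
    }


def jdbc_write_options(url: str, table: str, batch_size: int = 5000) -> Dict[str, str]:
    """Build Spark JDBC options for a batched write (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return {
        "url": url,
        "dbtable": table,
        "batchsize": str(batch_size),  # rows sent per JDBC batch insert
    }


def secret_version_name(project_id: str, secret_id: str, version: str = "latest") -> str:
    """Return the Secret Manager resource name for a secret version."""
    return f"projects/{project_id}/secrets/{secret_id}/versions/{version}"


def fetch_secret(project_id: str, secret_id: str, version: str = "latest") -> str:
    """Read a secret value from Secret Manager.

    Assumes google-cloud-secret-manager is installed and that the job's
    service account is allowed to access the secret. The import is local
    so the option helpers above work without the package.
    """
    from google.cloud import secretmanager

    client = secretmanager.SecretManagerServiceClient()
    response = client.access_secret_version(
        request={"name": secret_version_name(project_id, secret_id, version)}
    )
    return response.payload.data.decode("UTF-8")
```

In a job, the resulting dictionaries could be passed to Spark with `spark.read.format("jdbc").options(**read_opts).option("password", fetch_secret(project, secret)).load()` and `df.write.format("jdbc").options(**write_opts).mode("append").save()`. Keeping the password out of the options dictionary avoids logging it by accident when the options are printed for debugging.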

Sample Prompts for Gemini and modifications needed to run resulting programs

This directory contains a number of samples that were generated by Gemini and then tested to ensure they functioned correctly. Some changes were required and are described in the documentation. The samples are as follows:

  1. Hive to BigQuery (Python) - See sample prompt and changes here.
  2. GCS to GCS (Python) - See sample prompt and changes here.
  3. JDBC to JDBC (Java) - See sample prompt and changes here.
  4. DeltaLake to Iceberg (Java) - See sample prompt and changes here.
  5. BigQuery to Elasticsearch (Python) - See sample prompt and changes here.
  6. Cassandra to BigQuery (Java) - See sample prompt and changes here.
  7. Kafka to Iceberg (Java) - See sample prompt and changes here.
  8. MongoDB to BigQuery (Java) - See sample prompt and changes here.