The text between the line above and line below was written by a human. The rest of the document was created by Gemini. The initial prompt to Gemini was:
Create Spark job in Java to migrate a table from Cassandra to BigQuery. Provide instructions to run this job on serverless spark in migrateCassandraToBigquery.md and provide a summary of the session in migrateCassandraToBigqueryREADME.md
Gemini generated the Java app, specifically the file CassandraToBigQuery.java, and the README file. Minor changes were required to accommodate two facts: the BigQuery connector .jar is already included in Serverless Spark, and jnr-posix is required by Cassandra. The working gcloud command is:
gcloud dataproc batches submit spark --project dataproc-templates --region us-central1 \
--batch="cassandra-to-bigquery-$(date +%s)" --class com.customer.app.CassandraToBigQuery --version=2.2 \
--jars=<REQUIRED_JARS> -- <CASSANDRA_HOST> <CASSANDRA_KEYSPACE> <CASSANDRA_TABLE> \
<DATASET.TABLE> <TEMP_BUCKET> <WRITE_MODE>
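If you submit the job from a script or a Java tool, the working command above can be assembled as an argument list (for example, to hand to `ProcessBuilder`). This is a minimal sketch: the class and method names are hypothetical, it assumes `<REQUIRED_JARS>` is a comma-separated list of jar URIs, and the sample bucket, jar, and table names are placeholders. It reproduces the two details that matter: a unique batch ID (the equivalent of `$(date +%s)`) and the `--` separator before the job's positional arguments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that builds the gcloud batch submission shown above.
public class SubmitCommand {

    static List<String> build(String project, String region, String jars,
                              List<String> jobArgs, long epochSeconds) {
        List<String> cmd = new ArrayList<>(List.of(
                "gcloud", "dataproc", "batches", "submit", "spark",
                "--project=" + project,
                "--region=" + region,
                // Batch IDs must be unique, so append a timestamp like $(date +%s).
                "--batch=cassandra-to-bigquery-" + epochSeconds,
                "--class=com.customer.app.CassandraToBigQuery",
                "--version=2.2",
                // Comma-separated jar URIs (placeholder for <REQUIRED_JARS>).
                "--jars=" + jars,
                // Everything after "--" is passed to the job's main(String[] args).
                "--"));
        cmd.addAll(jobArgs);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("dataproc-templates", "us-central1",
                "gs://my-bucket/app.jar,gs://my-bucket/jnr-posix.jar",
                List.of("10.0.0.5", "my_keyspace", "my_table",
                        "my_dataset.my_table", "my-temp-bucket", "Overwrite"),
                System.currentTimeMillis() / 1000);
        System.out.println(String.join(" ", cmd));
    }
}
```

The list could then be passed to `new ProcessBuilder(cmd).inheritIO().start()`; keeping it as a list avoids shell quoting issues.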
Build the application JAR with Maven:
mvn clean package
This creates target/spark-delta-to-iceberg-migration-1.0-SNAPSHOT.jar. Upload it to a Cloud Storage bucket:
gsutil cp target/spark-delta-to-iceberg-migration-1.0-SNAPSHOT.jar gs://<your-gcs-bucket>/
Submit the Dataproc Serverless Job:
Use the gcloud command to submit the Spark job.
gcloud dataproc batches submit spark \
--project=<your-gcp-project-id> \
--region=<your-gcp-region> \
--batch="cassandra-to-bq-migration-$(date +%s)" \
--class=com.customer.app.CassandraToBigQuery \
--jars=gs://<your-gcs-bucket>/spark-delta-to-iceberg-migration-1.0-SNAPSHOT.jar \
--subnet=<your-vpc-subnet> \
-- \
<cassandra.host> \
<cassandra.keyspace> \
<cassandra.table> \
<bq.table> \
<bq.temp.bucket> \
<write.mode>
<cassandra.host>: The IP address or hostname of the Cassandra cluster.
<cassandra.keyspace>: The Cassandra keyspace.
<cassandra.table>: The source table in Cassandra.
<bq.table>: The destination table in BigQuery (format: project:dataset.table).
<bq.temp.bucket>: A GCS bucket used for temporary storage during the BigQuery write process.
<write.mode>: Spark save mode (e.g., Overwrite, Append, ErrorIfExists, Ignore).
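The generated CassandraToBigQuery.java is not reproduced here. As a rough, stdlib-only sketch of how a job like it could validate the six positional arguments and map them onto connector options: the class, record, and method names are hypothetical, and the option keys (`keyspace`, `table`, `spark.cassandra.connection.host` for the Spark Cassandra Connector; `table`, `temporaryGcsBucket` for the BigQuery connector) follow those connectors' public documentation, so verify them against the versions you use.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical argument handling for a Cassandra-to-BigQuery Spark job.
public class JobArgs {

    // Spark config key the Cassandra connector reads the contact point from.
    static final String CASSANDRA_HOST_CONF = "spark.cassandra.connection.host";

    // The four save modes listed above; Spark's DataFrameWriter.mode(String)
    // compares them case-insensitively (it also accepts aliases like "error").
    private static final Set<String> WRITE_MODES =
            Set.of("overwrite", "append", "errorifexists", "ignore");

    record Parsed(String cassandraHost, String keyspace, String table,
                  String bqTable, String tempBucket, String writeMode) {}

    static Parsed parse(String[] args) {
        if (args.length != 6) {
            throw new IllegalArgumentException(
                    "Usage: <cassandra.host> <cassandra.keyspace> <cassandra.table> "
                    + "<bq.table> <bq.temp.bucket> <write.mode>");
        }
        String mode = args[5].toLowerCase(Locale.ROOT);
        if (!WRITE_MODES.contains(mode)) {
            throw new IllegalArgumentException("Unsupported write mode: " + args[5]);
        }
        return new Parsed(args[0], args[1], args[2], args[3], args[4], mode);
    }

    // Options for spark.read().format("org.apache.spark.sql.cassandra").
    static Map<String, String> cassandraReadOptions(Parsed p) {
        return Map.of("keyspace", p.keyspace(), "table", p.table());
    }

    // Options for df.write().format("bigquery") using the indirect (GCS-staged) write.
    static Map<String, String> bigQueryWriteOptions(Parsed p) {
        return Map.of("table", p.bqTable(), "temporaryGcsBucket", p.tempBucket());
    }

    public static void main(String[] args) {
        Parsed p = parse(args.length == 6 ? args : new String[] {
                "10.0.0.5", "my_keyspace", "my_table",
                "my_project:my_dataset.my_table", "my-temp-bucket", "Append"});
        System.out.println(CASSANDRA_HOST_CONF + "=" + p.cassandraHost());
        System.out.println("read options:  " + cassandraReadOptions(p));
        System.out.println("write options: " + bigQueryWriteOptions(p) + ", mode=" + p.writeMode());
    }
}
```

Failing fast on a bad argument count or an unknown write mode surfaces the error in the batch's driver log before any data is read from Cassandra.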