Streaming Spanner changes to BigQuery using Change Data Capture (CDC) semantics¶
This repo contains sample code to demonstrate how BigqueryIO's CDC functionality can be used to process data capture streams from different databases. Spanner Change Streams are used as the source of the data.
We are using a single table replication as an example. The source and destination table, orders, has three columns - id, status and description.
Setup¶
Environment for this demo is set up using Terraform. You would need to authenticate to the Terraform using credentials with sufficient privileges to enable several services and create required artifacts - Spanner database, BigQuery dataset and Google Cloud Storage bucket used by Dataflow.
To tell Terraform where create these artifacts create
terraform/terraform.tfvars file. At a minimum, it should define the id of the
project to be used:
The terraform/variables.tf file lists additional variables you can override in
that file.
Create the environment:
This will create all the artifacts and set up several environment variables. You can run this script multiple times without causing any side effects, e.g., if you Cloud Shell session disconnects, and you would like to try the following steps.
Compile the code and run the unit tests¶
Make sure to use Java SDK version earlier than 21 because Dataflow doesn't yet support it. This pipeline was tested with version 11.
Run a pipeline¶
You can check on the status of the pipeline by following the link which is printed to the console by this script.
Verify that the pipeline works¶
Once the pipeline is started and running, go to the Spanner console.
Create and modify several orders using these SQL statements as example:
Creating new orders¶
Updating orders¶
Please only use statuses 'NEW', 'SCHEDULED', 'PROCESSED', or 'DELETED' because the pipeline will only accept these values.
Deleting orders¶
Verify that your changes replicated into BigQuery by running the following query in the BigQuery SQL Studio:
It might take several seconds for the changes to appear in BigQuery