Load Apache Avro files to Cloud Spanner with Slowly Changing Dimensions (SCD) using Dataflow Flex template

Update 2026-01-19:

This project has been archived. You can still access the code by browsing the repository at commit 79ceca9

Customers often have large volumes of transactional data with Slowly Changing Dimensions (SCD) that need to be loaded into Cloud Spanner during a migration.

The Dataflow pipeline template in this solution lets customers export data from their current database or data warehouse in Apache Avro format, stage the files in a Cloud Storage bucket, and load them into Cloud Spanner.

The Dataflow pipeline supports the following SCD Types:

  • SCD Type 1: updates the existing row if the primary key exists, or inserts a new row otherwise.

  • SCD Type 2: if the primary key exists, sets the existing row's end date to the current timestamp, then inserts a new row with a null end date. If a start date column is passed, the new row's start date is set to the current timestamp.
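
The two SCD types above can be illustrated with a minimal in-memory sketch in Python. This is not the pipeline's code: the function names (`apply_scd_type_1`, `apply_scd_type_2`) and the column names (`id`, `start_date`, `end_date`) are hypothetical, and plain dicts and lists stand in for a Spanner table. It only shows the row-level behavior described in the list.

```python
from datetime import datetime, timezone
from typing import Optional


def apply_scd_type_1(table: dict, key_column: str, row: dict) -> None:
    """SCD Type 1: overwrite the row if the key exists, insert it otherwise."""
    # `table` maps primary key value -> row (hypothetical stand-in for a table).
    table[row[key_column]] = dict(row)


def apply_scd_type_2(
    history: list,
    key_column: str,
    row: dict,
    now: datetime,
    end_date_column: str = "end_date",               # assumed column name
    start_date_column: Optional[str] = "start_date",  # assumed; None = not passed
) -> None:
    """SCD Type 2: close the current version of the row, then append a new one."""
    # Close the open version (null end date) with the same primary key, if any.
    for existing in history:
        same_key = existing[key_column] == row[key_column]
        if same_key and existing[end_date_column] is None:
            existing[end_date_column] = now

    # Insert the new version with a null end date.
    new_row = dict(row)
    new_row[end_date_column] = None
    # The start date is only written when a start date column was passed.
    if start_date_column is not None:
        new_row[start_date_column] = now
    history.append(new_row)


# Usage: the same key arrives twice with a changed attribute.
t1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2024, 6, 1, tzinfo=timezone.utc)

current = {}
apply_scd_type_1(current, "id", {"id": 1, "city": "Paris"})
apply_scd_type_1(current, "id", {"id": 1, "city": "Lyon"})   # overwrites in place

versions = []
apply_scd_type_2(versions, "id", {"id": 1, "city": "Paris"}, now=t1)
apply_scd_type_2(versions, "id", {"id": 1, "city": "Lyon"}, now=t2)  # keeps history
```

After these calls, `current` holds a single row for key 1 with the latest value, while `versions` holds two rows: the first closed at `t2`, the second open with a null end date. Type 1 discards history; Type 2 preserves it at the cost of one extra row per change.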