Development Documentation
This documentation is aimed at developers. The guide below covers tasks for contributors and project maintainers.
Run Unit Tests
When making any changes, run the unit tests to ensure the pipeline works as intended. Extend these tests if your changes affect the underlying pipeline logic or use cases.
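As a rough idea of what an added test case could look like, here is a minimal sketch using the standard library's `unittest`. The function under test, `split_schema_entry`, is a hypothetical stand-in defined inline. This guide does not show the project's actual modules, test layout, or test runner, so adapt the shape to whatever the existing tests use.

```python
import unittest


def split_schema_entry(entry):
    """Hypothetical stand-in for pipeline logic: 'name:type' -> (name, type)."""
    name, _, type_name = entry.partition(":")
    return name.strip(), type_name.strip()


class SplitSchemaEntryTest(unittest.TestCase):
    def test_splits_name_and_type(self):
        self.assertEqual(split_schema_entry("salary:float"), ("salary", "float"))

    def test_strips_whitespace(self):
        self.assertEqual(split_schema_entry(" id : int64 "), ("id", "int64"))


# Run the tests in-process without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SplitSchemaEntryTest)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

Each test method covers one behavior, so a failure points directly at the case that broke.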
Run Tests Directly via Python
Pre-Requirements to Run Tests
Run Tests with Python
- Clone this repository.
- Access this project folder.
- Install the dependencies listed in requirements-dev.txt. (Optional) Consider using a virtual environment.
- Run the unit tests.
Run Tests via Container
Pre-Requirements to run the test container
- Docker Engine or equivalent installed.
- Docker Compose or equivalent installed.
Build and run the test container
- Clone this repository.
- Access this project folder.
- Build the test container image.
- Run the test container.
Run Pipeline Locally
For development purposes, you may want to run the pipeline locally using Apache Beam's Direct Runner.
Pre-Requirements
- JDK 17 or higher.
- An AlloyDB (or PostgreSQL-compatible) database and a data file to upload to it. If you do not have one, follow the steps under Run Dataflow Template first.
Run the Dataflow pipeline locally
- Clone this repository.
- Access this project folder.
- Set the following variables to your project values.
  The variables mean the following:
  - BUCKET_NAME is the name of the Google Cloud Storage bucket from which the data files are read.
  - ALLOYDB_IP is the IP address or hostname of the AlloyDB instance. Your machine must be able to reach this address. You may need to use a public IP for this.
  - ALLOYDB_PASSWORD is the password for the AlloyDB instance.
- Install the dependencies listed in requirements.txt. (Optional) Consider using a virtual environment.
- Run the pipeline locally.
  python3 ./src/dataflow_gcs_to_alloydb.py \
    --input_file_format=csv \
    --input_file_pattern "gs://$BUCKET_NAME/dataflow-template/*.csv" \
    --input_schema "id:int64;first_name:string;last_name:string;department:string;salary:float;hire_date:string" \
    --alloydb_ip "$ALLOYDB_IP" \
    --alloydb_password "$ALLOYDB_PASSWORD" \
    --alloydb_table "employees"
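The `--input_schema` value in the command above is a semicolon-separated list of `name:type` pairs. Purely to illustrate that format, the sketch below splits such a string into pairs. The helper name `parse_input_schema` is hypothetical and not taken from this repository; the pipeline's actual parsing and its set of supported types may differ.

```python
def parse_input_schema(schema: str) -> list[tuple[str, str]]:
    """Split 'name:type;name:type' into (name, type) pairs.

    Hypothetical helper for illustration; not part of this project's code.
    """
    fields = []
    for entry in schema.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing semicolon
        name, sep, type_name = entry.partition(":")
        if not sep or not name.strip() or not type_name.strip():
            raise ValueError(f"Expected 'name:type', got {entry!r}")
        fields.append((name.strip(), type_name.strip()))
    return fields


# The schema string used in the example command.
schema = (
    "id:int64;first_name:string;last_name:string;"
    "department:string;salary:float;hire_date:string"
)
fields = parse_input_schema(schema)
print([name for name, _ in fields])
```

Checking a schema string this way before launching can catch a missing colon or an empty field name early, before the pipeline starts.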
To learn how to customize these flags, read the Configuration section.
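Before launching the pipeline, it can help to confirm that the three variables described above are actually set. The following is a minimal sketch, assuming the variables are exported in the shell so that a Python process can see them. The function name `missing_variables` is hypothetical and not part of this project.

```python
import os

# Names taken from the variable list in this guide.
REQUIRED_VARIABLES = ("BUCKET_NAME", "ALLOYDB_IP", "ALLOYDB_PASSWORD")


def missing_variables(env=None):
    """Return the required variable names that are unset or empty."""
    if env is None:
        env = os.environ
    return [name for name in REQUIRED_VARIABLES if not env.get(name)]


missing = missing_variables()
if missing:
    print("Missing variables: " + ", ".join(missing))
else:
    print("All required variables are set.")
```

The check treats an empty string the same as an unset variable, since an empty bucket name or password would fail later anyway.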