Migration Monitoring Dashboard

The Migration Monitoring Dashboard is a custom dashboard created to provide greater visibility into the health and progress of various components of the migration.

Table of contents

Migration Monitoring Dashboard
1. Components in Monitoring Dashboard

Components in Monitoring Dashboard

Monitoring dashboard create by the Spanner is multi-component dashboard, divided into various sections to categorize similar metrics together. Following are the 5 sections:

Overview
Dataflow
Datastream
GCS Bucket
Pubsub
Spanner

Below sections describe each component in more detail.

Overview

The first section of the monitoring dashboard provides key graphs for insights on the migration progress.

Resource	Metric	Description	Aggregation	Relevance
Dataflow	Worker CPU Utilization	Shows the CPU Utilization of a dataflow worker	- 50th percentile worker CPU Utilization - 90th percentile worker CPU Utilization - Max percentile worker CPU Utilization	Used to identify if the pipelines is over or under scaled based on the value of CPU Utilization
Datastream	Throughput(events/sec)	Shows the average events processed/sec generated at source	Average	Used to track if data is being transferred from source to GCS Bucket
Datastream	Unsupported Events	Source events unsupported by Datastream	Sum	Used to identify if there is any data that can’t be transferred by datastream due to a correctness issue
Pubsub	Age of Oldest Unacknowledged Message	Age of the oldest unacknowledged message in the subscription	Max	Used to determine if starvation of dataflow resources is taking place
Spanner	CPU Utilization	CPU Utilization of spanner database and instance	- Database Total CPU Utilization - Instance Total CPU Utilization	Used to track if spanner is overloaded and requires more or less nodes
Spanner	Storage	Storage of spanner database and instance	- Database Total Storage - Instance Total Storage	Used to track how the data is growing as the migration proceeds

Dataflow Metrics

Metric	Description	Aggregation	Relevance
Worker CPU Utilization	Shows the CPU Utilization of a dataflow worker	- 50th percentile worker CPU Utilization - 90th percentile worker CPU Utilization - Max percentile worker CPU Utilization	Used to identify if the pipelines is over or under scaled based on the value of CPU Utilization
Worker Memory Utilization	Shows the Memory Utilization of a dataflow worker	- 50th percentile worker Memory Utilization - 90th percentile worker Memory Utilization - Max percentile worker Memory Utilization	Used to identify if the health of the pipeline based on the value of Memory Utilization
Worker Max Backlog Seconds	Shows max time required to consume the largest backlog across all stages for each dataflow worker	Max	Used to identify if the pipelines is over or under scaled

Datastream Metrics

Metric	Description	Aggregation	Relevance
Throughput(events/sec)	Shows the average events processed/sec generated at source	Sum	Used to track if data is being transferred from source to GCS Bucket
Unsupported Events	Source events unsupported by Datastream	Sum	Used to identify if there is any data that can’t be transferred by datastream due to a correctness issue
Total Latency	Time taken from event being written at source to being written by Datastream to GCS	- 50th Percentile event - 90th Percentile event	Indicator of datastream being overloaded

GCS Bucket Metrics

Metric	Description	Aggregation	Relevance
Total Bytes	Shows the total bytes written to the GCS Bucket	Sum	Used to check if data is successfully being written to the GCS Bucket

Pubsub Metrics

Metric	Description	Aggregation	Relevance
Published message count	Number of messages published by the GCS bucket to the Pub	Sum	Indicates total files in staging which need to be processed
Age of Oldest Unacknowledged Message	Age of the oldest unacknowledged message in the subscription	Max	Used to determine if starvation of dataflow resources is taking place

Spanner Metrics

Metric	Description	Aggregation	Relevance
CPU Utilization	CPU Utilization of spanner database and instance	- Database Total CPU Utilization - Instance Total CPU Utilization	Used to track if spanner is overloaded and requires more or less nodes
Storage	Storage of spanner database and instance	- Database Total Storage - Instance Total Storage	Used to track how the data is growing as the migration proceeds