Schema migration for Cassandra

Spanner migration tool makes some assumptions while performing data type conversion from Cassandra to Spanner(GoogleSQL). There are also nuances to handling certain specific data types. These are captured below.

Adapter Compatibility:

The Spanner migration tool supports only schema migration from Cassandra to the GoogleSQL dialect of Spanner. The generated schema includes cassandra_type annotations, ensuring compatibility with the Cassandra Adapter, which allows existing Cassandra applications to connect to Google Cloud Spanner (GoogleSQL) with minimal or no code changes.

Table of contents
  1. Adapter Compatibility:
  2. Data Type Mapping
  3. DECIMAL and VARINT
  4. UUID and TIMEUUID
  5. COUNTER
  6. DURATION
  7. TIME
  8. SET and LIST
  9. MAP
  10. Storage Use
  11. Primary Keys
  12. Column Nullability
  13. Foreign Keys
  14. Secondary Indexes
  15. Other Cassandra Types
  16. Note

Data Type Mapping

The Spanner migration tool maps Cassandra primitive types to Spanner(GoogleSQL) types as follows:

Cassandra Type Spanner(GoogleSQL) Type Notes
ASCII STRING(MAX)  
BIGINT INT64  
BLOB BYTES(MAX)  
BOOLEAN BOOL  
COUNTER INT64 Spanner(GoogleSQL) does not support a counter data type
DATE DATE  
DECIMAL, VARINT NUMERIC Potential changes of precision
DOUBLE FLOAT64  
FLOAT FLOAT32  
INET STRING(MAX)  
INT, SMALLINT,
TINYINT
INT64 Changes in storage size
TEXT STRING(MAX)  
TIME INT64 Spanner(GoogleSQL) doesn’t support a time data type
TIMESTAMP TIMESTAMP  
UUID, TIMEUUID STRING(MAX) Spanner(GoogleSQL) doesn’t validate the uuid
VARCHAR STRING(MAX)  

Unlike primitive types, Cassandra’s collection types such as Maps, Sets, and Lists do not have direct, one-to-one equivalents in Spanner(GoogleSQL). Their mapping typically involves:

Cassandra Type Spanner(GoogleSQL) Type Notes
SET ARRAY Spanner(GoogleSQL) doesn’t support a dedicated set data type. Use ARRAY columns to represent a set
LIST ARRAY Use ARRAY to store a list of typed objects
MAP JSON Spanner(GoogleSQL) doesn’t support a dedicated map type. Use JSON columns to represent maps

Spanner(GoogleSQL) does not support duration datatype of Cassandra. Along with duration datatype, all other types map to STRING(MAX).

DECIMAL and VARINT

Spanner(GoogleSQL)’s NUMERIC type can store up to 29 digits before the decimal point and up to 9 after the decimal point. Cassandra’s DECIMAL type can potentially support higher precision than this, so please verify that Spanner(GoogleSQL)’s NUMERIC support meets your application needs. Note that the remarks about DECIMAL apply equally to VARINT.

UUID and TIMEUUID

Cassandra has two primary identifier types often used for unique keys: UUID and TIMEUUID. UUID is a standard Type 4 UUID, generally randomly generated. TIMEUUID is a Type 1 UUID, which embeds a timestamp and is time-ordered, providing a natural chronological sorting. Cassandra’s drivers and functions are aware of the internal structure of these types.

Spanner(GoogleSQL) does not have a native UUID or TIMEUUID data type. Instead, these are typically stored using the STRING type (for the hexadecimal string representation) or BYTES (specifically BYTES(16) for the 16-byte UUID value)

When storing UUID or TIMEUUID data in Spanner(GoogleSQL), it does not perform intrinsic validation of the UUID’s internal structure or format (e.g., checking for correct version bits, variant bits, or a valid time component for TIMEUUID) from the source.

COUNTER

Cassandra’s COUNTER type provides atomic, distributed increments/decrements.

Spanner(GoogleSQL) doesn’t have a direct equivalent to Cassandra’s COUNTER. While we typically map this data to an INT64 column in Spanner(GoogleSQL), you’ll need to implement counter logic within your application’s transactions (read, increment, write) to ensure correctness.

DURATION

Cassandra has a DURATION type for periods of time. Spanner(GoogleSQL) doesn’t have a native equivalent, so we typically map this to a STRING (e.g., ISO 8601 format). So please ensure that your application handles this.

TIME

Cassandra has a TIME type for the time of day (nanoseconds since midnight). Spanner(GoogleSQL) doesn’t have a native equivalent, so we typically map this to an INT64 to store nanosecond duration. So please ensure that your application handles this.

SET and LIST

Cassandra uses SET (an unordered collection of unique elements) and LIST (an ordered collection of non-unique elements).

Both of these are typically mapped to Spanner(GoogleSQL)’s ARRAY type (e.g., SET<TEXT> to ARRAY<STRING(MAX)>, LIST<INT> to ARRAY<INT64>). When mapping SET to ARRAY, note that Spanner(GoogleSQL)’s ARRAY is ordered and allows duplicates. Therefore, your application must handle uniqueness if required.

MAP

Cassandra uses MAP for storing typed key-value pairs. Spanner(GoogleSQL) does not have a native MAP type. Cassandra’s MAP typically maps to Spanner(GoogleSQL)’s JSON type. Unlike Cassandra, Spanner(GoogleSQL) does not validate the internal JSON structure or types, so your application must ensure data integrity.

Storage Use

The Spanner migration tool maps several Cassandra types to Spanner(GoogleSQL) types that use more storage. For example, SMALLINT is a two-byte integer, but it maps to Spanner(GoogleSQL)’s INT64, an eight-byte integer.

Primary Keys

Spanner(GoogleSQL) requires primary keys for all tables. Spanner(GoogleSQL)’s primary key is derived as a composite of the Cassandra partition key and clustering key.

Column Nullability

Cassandra does not enforce all columns on all rows, so corresponding Spanner(GoogleSQL) columns are created as NULLABLE by default. Spanner(GoogleSQL) primary key columns, however, are inherently NOT NULL. We can explicitly define other columns as NOT NULL in Spanner(GoogleSQL) if Cassandra data guarantees a value.

Foreign Keys

Cassandra does not support native foreign key constraints. Therefore, no such constraints exist to convert when migrating from Cassandra to Spanner(GoogleSQL).

Secondary Indexes

The tool currently doesn’t support the migration of Cassandra secondary indexes to Spanner(GoogleSQL) secondary indexes.

Other Cassandra Types

Cassandra’s other complex types, such as nested collection types and User Defined Types (UDTs), are currently not natively supported in Spanner(GoogleSQL). By default, these types are mapped to STRING(MAX).

Note

See Migrating from Cassandra to Cloud Spanner(GoogleSQL) for details on data migration since currently SMT supports schema only migration.