Populate a Hyperdisk ML Disk from Google Cloud Storage

This guide uses the Google Cloud CLI (gcloud) to create a Hyperdisk ML disk from data in Cloud Storage and then consume it from a GKE cluster. Refer to the GKE documentation for instructions that stay entirely within the GKE API.

  1. Create a new GCE instance that you will use to hydrate the new Hyperdisk ML disk with data. Note: a c3-standard-44 instance is used here to provide the maximum throughput while populating the Hyperdisk (see Instance to throughput rates).
VM_NAME=hydrator
MACHINE_TYPE=c3-standard-44
IMAGE_FAMILY=debian-11
IMAGE_PROJECT=debian-cloud
ZONE=us-central1-a
SNAP_SHOT_NAME=hdmlsnapshot
PROJECT_ID=myproject
DISK_NAME=model1

gcloud compute instances create $VM_NAME \
    --image-family=$IMAGE_FAMILY \
    --image-project=$IMAGE_PROJECT \
    --zone=$ZONE \
    --machine-type=$MACHINE_TYPE

gcloud compute ssh $VM_NAME --zone=$ZONE

Update the packages on the instance and authenticate the gcloud CLI.

sudo apt-get update
sudo apt-get install google-cloud-cli
gcloud init
gcloud auth login
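Optionally, confirm that authentication succeeded and that gcloud is pointed at the project you expect before continuing, for example:

gcloud auth list
gcloud config get-value project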

  2. Create the Hyperdisk ML disk and attach it to the new GCE VM. The disk size is specified in GB and the provisioned throughput in MB per second.
SIZE=140
THROUGHPUT=2400

gcloud compute disks create $DISK_NAME --type=hyperdisk-ml \
--size=$SIZE --provisioned-throughput=$THROUGHPUT  \
--zone $ZONE

gcloud compute instances attach-disk $VM_NAME --disk=$DISK_NAME --zone=$ZONE 
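To confirm the disk is attached, you can describe it and check which instances appear in its users field, for example:

gcloud compute disks describe $DISK_NAME --zone=$ZONE --format="value(users)"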
  3. Log into the hydrator VM, format the volume, copy the data from Cloud Storage, and unmount the volume.
gcloud compute ssh $VM_NAME --zone=$ZONE

Identify the device name (e.g., /dev/nvme0n2) in the output of lsblk. It should correspond to the disk that was attached in the previous step.

lsblk

Save the device name reported by lsblk (for example, nvme0n2).

DEVICE=/dev/nvme0n2
GCS_DIR=gs://vertex-model-garden-public-us-central1/llama2/llama2-70b-hf 
sudo /sbin/mkfs -t ext4 -E lazy_itable_init=0,lazy_journal_init=0,discard $DEVICE

sudo mount $DEVICE /mnt
sudo chmod a+w /mnt
gcloud storage cp -r $GCS_DIR /mnt
sudo umount /mnt
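If you want to double-check the copy before detaching, one option is to briefly remount the disk read-only and list its contents; a minimal sketch:

sudo mount -o ro $DEVICE /mnt
ls -lh /mnt
sudo umount /mnt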
  4. Detach the disk from the hydrator VM and switch it to READ_ONLY_MANY access mode.
gcloud compute instances detach-disk $VM_NAME --disk=$DISK_NAME --zone=$ZONE
gcloud compute disks update $DISK_NAME --access-mode=READ_ONLY_MANY  --zone=$ZONE
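To verify the change took effect, you can describe the disk and inspect its access mode (the accessMode field name here is an assumption based on the Compute Engine disks resource), for example:

gcloud compute disks describe $DISK_NAME --zone=$ZONE --format="value(accessMode)"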
  5. Create a snapshot from the disk to use as a template.
gcloud compute snapshots create $SNAP_SHOT_NAME \
    --source-disk-zone=$ZONE \
    --source-disk=$DISK_NAME \
    --project=$PROJECT_ID
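Snapshot creation can take several minutes. You can poll the snapshot until its status reports READY before moving on, for example:

gcloud compute snapshots describe $SNAP_SHOT_NAME --project=$PROJECT_ID --format="value(status)"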
  6. You now have a Hyperdisk ML snapshot populated with your data from Google Cloud Storage. You can delete the hydrator GCE instance and the original disk.
gcloud compute instances delete $VM_NAME --zone=$ZONE
gcloud compute disks delete $DISK_NAME --project $PROJECT_ID --zone $ZONE
  7. In your GKE cluster, create the Hyperdisk ML multi-zone and Hyperdisk ML storage classes. Hyperdisk ML disks are zonal, and the hyperdisk-ml-multi-zone storage class automatically provisions disks in the zones where the pods that use them are scheduled. Replace the zones in this class with the zones in which you want to allow disks to be created from the Hyperdisk ML snapshot.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hyperdisk-ml-multi-zone
parameters:
  type: hyperdisk-ml
  provisioned-throughput-on-create: "2400Mi"
  enable-multi-zone-provisioning: "true"
provisioner: pd.csi.storage.gke.io
allowVolumeExpansion: false
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowedTopologies:
- matchLabelExpressions:
  - key: topology.gke.io/zone
    values:
    - us-central1-a
    - us-central1-c
--- 
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hyperdisk-ml
parameters:
  type: hyperdisk-ml
provisioner: pd.csi.storage.gke.io
allowVolumeExpansion: false
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
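Apply both storage classes to your cluster and confirm they exist; the filename below is just an example name for wherever you saved the manifest above:

kubectl apply -f hdml-storageclasses.yaml
kubectl get storageclass hyperdisk-ml-multi-zone hyperdisk-ml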
  8. Create a VolumeSnapshotClass, a VolumeSnapshot, and a VolumeSnapshotContent that reference your snapshot. Replace VolumeSnapshotContent.spec.source.snapshotHandle with the path to your snapshot.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: my-snapshotclass
driver: pd.csi.storage.gke.io
deletionPolicy: Delete
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: restored-snapshot
spec:
  volumeSnapshotClassName: my-snapshotclass
  source:
    volumeSnapshotContentName: restored-snapshot-content
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
  name: restored-snapshot-content
spec:
  deletionPolicy: Retain
  driver: pd.csi.storage.gke.io
  source:
    snapshotHandle: projects/[project_ID]/global/snapshots/[snapshotname]
  volumeSnapshotRef:
    kind: VolumeSnapshot
    name: restored-snapshot
    namespace: default
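Apply these resources and wait for the VolumeSnapshot to report that it is ready to use; the filename below is an example name for the manifest above:

kubectl apply -f hdml-snapshot.yaml
kubectl get volumesnapshot restored-snapshot -o jsonpath='{.status.readyToUse}'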

  9. Reference your snapshot in a PersistentVolumeClaim. Be sure to adjust spec.dataSource.name and spec.resources.requests.storage to match your snapshot name and size.
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: hdml-consumer-pvc
spec:
  dataSource:
    name: restored-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
  - ReadOnlyMany
  storageClassName: hyperdisk-ml-multi-zone
  resources:
    requests:
      storage: 140Gi
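Apply the claim and confirm it binds. Because the hyperdisk-ml-multi-zone class uses Immediate volume binding, the PVC should bind without waiting for a consuming pod; the filename below is an example name for the manifest above:

kubectl apply -f hdml-pvc.yaml
kubectl get pvc hdml-consumer-pvc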
  10. Add a reference to this PVC in your deployment's spec.template.spec.volumes[].persistentVolumeClaim.claimName field.
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox
  labels:
    app: busybox
spec:
  replicas: 1
  selector:
    matchLabels:
      app: busybox
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: busybox
    spec:
      containers:
      - image: busybox:latest
        name: busybox
        command:
          - "sleep"
          - "infinity"
        volumeMounts:
        - name: busybox-persistent-storage
          mountPath: /var/www/html
      volumes:
      - name: busybox-persistent-storage
        persistentVolumeClaim:
          claimName: hdml-consumer-pvc
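Finally, apply the deployment and confirm the data restored from the snapshot is visible inside the pod; the filename below is an example name for the manifest above:

kubectl apply -f busybox-deployment.yaml
kubectl rollout status deployment/busybox
kubectl exec deploy/busybox -- ls /var/www/html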