Best Practices for Faster Workload Cold Start
This doc provides best practices for achieving faster workload cold starts on Google Kubernetes Engine (GKE) and discusses the factors that determine workload startup latency.
Introduction
The cold start problem occurs when workloads are scheduled to nodes that haven't hosted them before. Because a new node has no pre-existing container images, the initial startup time for the workloads can be significantly longer. This extended startup time adds latency to overall application performance, especially when traffic surges are handled by node autoscaling.
Best Practices
Use ephemeral storage with local SSDs or larger boot disks for nodes
Provision ephemeral storage with local SSDs | Google Kubernetes Engine (GKE).
With this feature, you can create a node pool that uses ephemeral storage with local SSDs in an existing cluster running GKE version 1.25.3-gke.1800 or later. The local SSDs are also used by the kubelet and containerd as their root directories, which improves the latency of container runtime operations such as image pulls.
gcloud container node-pools create POOL_NAME \
--cluster=CLUSTER_NAME \
--ephemeral-storage-local-ssd count=NUMBER_OF_DISKS \
--machine-type=MACHINE_TYPE
Nodes mount the kubelet and container runtime (docker or containerd) root directories on a local SSD, so the container layer is backed by the local SSD, with the IOPS and throughput documented in About local SSDs. This is usually more cost-effective than increasing the PD size. Below is a brief comparison in us-central1: at the same cost, Local SSD has roughly 3x the throughput of PD Balanced, so image pulls run faster and workload startup latency is reduced.
$ per month | Local SSD storage (GB) | Local SSD throughput Read / Write (MB/s) | PD Balanced storage (GB) | PD Balanced throughput R+W (MB/s) | Local SSD / PD (Read) | Local SSD / PD (Write)
$ | 375 | 660 / 350 | 300 | 140 | 471% | 250%
$$ | 750 | 1320 / 700 | 600 | 168 | 786% | 417%
$$$ | 1125 | 1980 / 1050 | 900 | 252 | 786% | 417%
$$$$ | 1500 | 2650 / 1400 | 1200 | 336 | 789% | 417%
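To confirm that a node in the new pool exposes the local SSD-backed ephemeral storage, you can check the node's capacity; a minimal check, where NODE_NAME is a placeholder for one of the node pool's nodes:
kubectl describe node NODE_NAME | grep ephemeral-storage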
Enable container image streaming
Use Image streaming to pull container images | Google Kubernetes Engine (GKE)
When customers use Artifact Registry for their container images and meet the requirements, they can enable image streaming on the cluster by running:
gcloud container clusters create CLUSTER_NAME \
--zone=COMPUTE_ZONE \
--image-type="COS_CONTAINERD" \
--enable-image-streaming
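Image streaming can also be turned on for an existing cluster; a minimal sketch using the same flag documented in the page linked above:
gcloud container clusters update CLUSTER_NAME \
    --enable-image-streaming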
Image streaming allows workloads to start without waiting for the entire image to be downloaded, which leads to significant improvements in workload startup time. For example, the end-to-end startup time of the NVIDIA Triton Server (5.4 GB container image), from workload creation to the server being ready for traffic, can be reduced from 191s to 30s.
Use Zstandard compressed container images
Zstandard (zstd) compression is supported by containerd. To build and push zstd-compressed images:
- Use the zstd builder in docker buildx
docker buildx create --name zstd-builder --driver docker-container \
--driver-opt image=moby/buildkit:v0.10.3
docker buildx use zstd-builder
- Build and push an image
IMAGE_URI=us-central1-docker.pkg.dev/<YOUR-CONTAINER-REPO>/example
IMAGE_TAG=v1
<Create your Dockerfile>
docker buildx build --file Dockerfile --output type=image,name=$IMAGE_URI:$IMAGE_TAG,oci-mediatypes=true,compression=zstd,compression-level=3,force-compression=true,push=true .
Now you can use IMAGE_URI:IMAGE_TAG for your workload; the image uses the zstd compression format. The Zstandard benchmark shows that zstd decompression is more than 3x faster than gzip (the current default).
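To verify that the pushed image actually uses zstd-compressed layers, you can inspect its manifest and look for the "+zstd" layer media type; a quick check (if the top-level descriptor is an image index, inspect the platform-specific manifest digest instead):
docker buildx imagetools inspect $IMAGE_URI:$IMAGE_TAG --raw | grep zstd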
Use a preloader DaemonSet to preload the base container on nodes
containerd reuses image layers across different containers if they share the same base image. The preloader DaemonSet can start running even before the GPU driver is installed (driver installation takes ~30 seconds), so it can preload the required container images before the GPU workload can be scheduled to the GPU node, starting image pulls ahead of time.
Below is an example of the preloader DaemonSet.
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: container-preloader
labels:
k8s-app: container-preloader
spec:
selector:
matchLabels:
k8s-app: container-preloader
updateStrategy:
type: RollingUpdate
template:
metadata:
labels:
name: container-preloader
k8s-app: container-preloader
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: cloud.google.com/gke-accelerator
operator: Exists
tolerations:
- operator: "Exists"
containers:
- image: "<CONTAINER_TO_BE_PRELOADED>"
name: container-preloader
command: [ "sleep", "inf" ]
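To use the preloader, replace <CONTAINER_TO_BE_PRELOADED> with the image your GPU workload uses and apply the manifest; a minimal sketch (the file name is arbitrary):
kubectl apply -f container-preloader.yaml
kubectl rollout status daemonset/container-preloader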
Use Cloud Storage FUSE to access datasets via a file system interface
The Cloud Storage FUSE CSI driver (Cloud Storage FUSE and CSI driver now available for GKE | Google Cloud Blog) enables workloads to access GCS data on demand through a local file system API.
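Below is a minimal sketch of a Pod that mounts a bucket through the Cloud Storage FUSE CSI driver. It assumes the driver is enabled on the cluster and that the Kubernetes ServiceAccount KSA_NAME is configured (for example via Workload Identity) with read access to BUCKET_NAME; both names are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: gcsfuse-example
  annotations:
    gke-gcsfuse/volumes: "true"   # ask GKE to inject the Cloud Storage FUSE sidecar
spec:
  serviceAccountName: KSA_NAME
  containers:
  - name: app
    image: busybox
    command: ["sleep", "inf"]
    volumeMounts:
    - name: dataset
      mountPath: /data
      readOnly: true
  volumes:
  - name: dataset
    csi:
      driver: gcsfuse.csi.storage.gke.io
      readOnly: true
      volumeAttributes:
        bucketName: BUCKET_NAME
        mountOptions: "implicit-dirs"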
Use VolumeSnapshot to quickly replicate data to Pods via PVCs backed by a disk image
Use volume snapshots (Using volume snapshots | Google Kubernetes Engine (GKE)) with the disk image parameter to provision the volumes used by Pods. The disk image's base storage is reused by all disks created from it in the same location, so new disks can be created much faster.
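A minimal sketch of the flow, assuming the Compute Engine Persistent Disk CSI driver (pd.csi.storage.gke.io) and an existing PVC named SOURCE_PVC_NAME that already holds the data; the snapshot-type: images parameter is the disk image setting referenced above:
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: image-snapshot-class
driver: pd.csi.storage.gke.io
deletionPolicy: Delete
parameters:
  snapshot-type: images   # create a disk image instead of a standard snapshot
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: dataset-snapshot
spec:
  volumeSnapshotClassName: image-snapshot-class
  source:
    persistentVolumeClaimName: SOURCE_PVC_NAME
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dataset-copy
spec:
  storageClassName: standard-rwo
  dataSource:
    name: dataset-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi   # must be at least the size of the source volume
Each Pod (or each replica, via a volumeClaimTemplate) can then consume its own PVC restored from the snapshot instead of re-downloading the data.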