Using the solution

Improving GPU utilization in Kubernetes


The default setup of Visual Inspection AI Edge allows you to run only one GPU model at a time.

If the workload is not latency-sensitive and you want to run multiple VIAI models on the edge server, there are several strategies to consider:

  • Run these models in CPU mode.
  • Use NVIDIA’s GPU concurrency mechanisms, such as MIG or vGPU.
  • Enable NVIDIA GPU Time-slicing.

In this section, you will learn how to enable GPU time-slicing on the Visual Inspection AI Edge server.

Note: According to the NVIDIA documentation, time-slicing is NOT recommended for production environments. If you need to share GPU resources across multiple models in production, consider installing more GPUs or leveraging GPUs that support MIG.

Enabling time-slicing on the edge server

Run on Edge Server

  1. Install Helm on the edge server.

     snap install helm --classic
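
    To confirm that Helm installed correctly, you can check its version:

     helm version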
    
  2. Configure the NVIDIA Helm repository.

     helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
     helm repo update
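
    To verify that the device plugin chart is now visible, you can search the repository; the --devel flag includes any pre-release chart versions:

     helm search repo nvdp --devel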
    
  3. Prepare the time-slicing configuration file.

    Note that with this sample configuration, the device plugin advertises 10 replicas of each GPU to the Anthos runtime. If the edge server has two GPUs installed, each is sliced into 10 replicas, so the device plugin advertises a total of 20 GPUs to the cluster.

    Monitor GPU memory utilization in your development environment to understand the actual requirements of your machine learning models, and choose the replica count that best fits your workload.

    To monitor GPU utilization, you can use tools such as nvitop.
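
    For example, here is a minimal monitoring sketch. It assumes Python 3 and pip are available on the edge server; nvidia-smi ships with the NVIDIA driver and needs no extra installation:

     # Interactive, top-like GPU monitor (assumes pip is available).
     pip3 install nvitop
     nvitop

     # Alternative: poll utilization and memory every 5 seconds.
     nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total \
         --format=csv -l 5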

     cat << EOF > /tmp/dp-example-config.yaml
     version: v1
     flags:
       migStrategy: "none"
       failOnInitError: true
       nvidiaDriverRoot: "/"
       plugin:
         passDeviceSpecs: false
         deviceListStrategy: "envvar"
         deviceIDStrategy: "uuid"
       gfd:
         oneshot: false
         noTimestamp: false
         outputFile: /etc/kubernetes/node-feature-discovery/features.d/gfd
         sleepInterval: 60s
     sharing:
       timeSlicing:
         resources:
         - name: nvidia.com/gpu
           replicas: 10
     EOF
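
    Each advertised replica is requested in the same way as a full GPU. As an illustration (the pod name and CUDA image below are hypothetical, not part of VIAI Edge), a workload that consumes one time-sliced replica once the device plugin from step 5 is running would look like this:

     apiVersion: v1
     kind: Pod
     metadata:
       name: time-slicing-test            # hypothetical name
     spec:
       restartPolicy: Never
       containers:
       - name: cuda
         image: nvidia/cuda:11.4.2-base-ubuntu20.04   # illustrative image
         command: ["nvidia-smi"]
         resources:
           limits:
             nvidia.com/gpu: 1            # one of the 10 advertised replicas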
    
  4. Delete the existing NVIDIA device plugin.

     kubectl delete daemonset nvidia-device-plugin-daemonset -n kube-system
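
    To confirm that the plugin was removed, the following command should return no results:

     kubectl get daemonset -n kube-system | grep nvidia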
    
  5. Install the new NVIDIA device plugin with the time-slicing configuration file.

     export SETUP_DIR=/var/lib/viai
     export ANTHOS_MEMBERSHIP_NAME=<ANTHOS MEMBERSHIP NAME>
     export KUBECONFIG=$SETUP_DIR/bmctl-workspace/$ANTHOS_MEMBERSHIP_NAME/$ANTHOS_MEMBERSHIP_NAME-kubeconfig
    
     helm install nvdp nvdp/nvidia-device-plugin \
         --version=0.12.2 \
         --namespace nvidia-device-plugin \
         --create-namespace \
         --set gfd.enabled=true \
         --set-file config.map.config=/tmp/dp-example-config.yaml
    

    Where:

    • SETUP_DIR: the folder that contains the VIAI Edge setup scripts. Defaults to /var/lib/viai.
    • ANTHOS_MEMBERSHIP_NAME: the Anthos membership name of the edge server.
    • KUBECONFIG: the path to the Anthos cluster kubeconfig file. Defaults to $SETUP_DIR/bmctl-workspace/$ANTHOS_MEMBERSHIP_NAME/$ANTHOS_MEMBERSHIP_NAME-kubeconfig.
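
    Before moving on, you can check that the device plugin pods reached the Running state:

     kubectl get pods -n nvidia-device-plugin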
  6. Wait until the GPU resources are advertised.

    Wait for around a minute, then run kubectl describe node. You should see that the Anthos cluster node now advertises 10 allocatable nvidia.com/gpu resources, as in the example below.
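
    For example, you can filter the node description down to the GPU resource (the count shown is illustrative):

     kubectl describe node | grep "nvidia.com/gpu"

     # Output similar to:
     #   nvidia.com/gpu:     10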