Using the solution
Improving GPU utilization in Kubernetes
The default setup of Visual Inspection AI Edge allows you to run one GPU model at a time.
If the workload is not latency-sensitive and you want to run multiple VIAI models on the edge server, there are several strategies to consider:
- Run these models in CPU mode.
- Consider NVIDIA’s concurrency mechanisms, such as MIG / vGPU.
- Enable NVIDIA GPU Time-slicing.
In this section, you will learn how to enable GPU time-slicing on the Visual Inspection AI Edge server.
Note: According to the NVIDIA documentation, time-slicing is NOT recommended for production environments. If you need to share GPU resources across multiple models in production, consider installing more GPUs or leveraging GPUs that support MIG.
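The one-model-at-a-time limit follows from how Kubernetes schedules GPUs: each GPU-backed pod requests a whole nvidia.com/gpu resource, and the scheduler admits only as many such pods as there are advertised GPU resources. Below is a minimal sketch of such a request; the pod and image names are hypothetical placeholders, not part of VIAI Edge.

apiVersion: v1
kind: Pod
metadata:
  name: viai-model-example            # hypothetical name, for illustration only
spec:
  containers:
  - name: model
    image: example.com/viai-model:latest   # placeholder image
    resources:
      limits:
        nvidia.com/gpu: 1             # consumes one advertised GPU resource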
Enabling time-slicing on the edge server
Run on Edge Server
- Install Helm on the edge server:
snap install helm --classic
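You can confirm that Helm installed correctly before continuing; helm version prints the client version:

helm version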
- Configure the NVIDIA Helm repository:
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
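To confirm the repository was added, you can search it for the device plugin chart; the --devel flag includes pre-release chart versions:

helm search repo nvdp --devel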
- Prepare the time-slicing configuration file.
Note that in this sample configuration, VIAI Edge advertises 10 replicas per GPU to the Anthos runtime. If the edge server has two GPUs, each advertises 10 replicas, so the device plugin advertises 20 GPUs to the Anthos cluster.
Monitor your GPU memory utilization in your development environment to understand the actual requirements of your machine learning models, and choose the replica count that best fits your workload.
To monitor GPU utilization, you can use tools such as nvitop.
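nvitop is a third-party Python tool; one way to install and launch it, assuming pip3 is available on the server:

pip3 install nvitop
nvitop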
cat << EOF > /tmp/dp-example-config.yaml
version: v1
flags:
  migStrategy: "none"
  failOnInitError: true
  nvidiaDriverRoot: "/"
  plugin:
    passDeviceSpecs: false
    deviceListStrategy: "envvar"
    deviceIDStrategy: "uuid"
  gfd:
    oneshot: false
    noTimestamp: false
    outputFile: /etc/kubernetes/node-feature-discovery/features.d/gfd
    sleepInterval: 60s
sharing:
  timeSlicing:
    resources:
    - name: nvidia.com/gpu
      replicas: 10
EOF
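Depending on the device plugin version, the timeSlicing section may also accept a renameByDefault option, which advertises the shared replicas under a distinct resource name so workloads can opt in explicitly; check the NVIDIA k8s-device-plugin documentation for your version before relying on it. A sketch of that variant:

sharing:
  timeSlicing:
    renameByDefault: true   # advertise replicas as nvidia.com/gpu.shared instead of nvidia.com/gpu
    resources:
    - name: nvidia.com/gpu
      replicas: 10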
- Delete the existing NVIDIA device plugin:
kubectl delete daemonset nvidia-device-plugin-daemonset -n kube-system
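Before installing the replacement, you can confirm the daemonset is gone; the command below should report that the resource is not found:

kubectl get daemonset nvidia-device-plugin-daemonset -n kube-system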
- Install the new NVIDIA device plugin with the time-slicing configuration file:
export SETUP_DIR=/var/lib/viai
export ANTHOS_MEMBERSHIP_NAME=<ANTHOS MEMBERSHIP NAME>
export KUBECONFIG=$SETUP_DIR/bmctl-workspace/$ANTHOS_MEMBERSHIP_NAME/$ANTHOS_MEMBERSHIP_NAME-kubeconfig
helm install nvdp nvdp/nvidia-device-plugin \
  --version=0.12.2 \
  --namespace nvidia-device-plugin \
  --create-namespace \
  --set gfd.enabled=true \
  --set-file config.map.config=/tmp/dp-example-config.yaml
Where:
- SETUP_DIR: the folder that contains the VIAI Edge setup scripts. Defaults to /var/lib/viai.
- ANTHOS_MEMBERSHIP_NAME: the Anthos membership name of the edge server.
- KUBECONFIG: the path of the Anthos cluster kubeconfig file. Defaults to $SETUP_DIR/bmctl-workspace/$ANTHOS_MEMBERSHIP_NAME/$ANTHOS_MEMBERSHIP_NAME-kubeconfig.
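To verify the release deployed, you can list it and check that the plugin pods are running; pod names will vary:

helm list -n nvidia-device-plugin
kubectl get pods -n nvidia-device-plugin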
- Wait until the GPU resources are advertised.
Wait for around a minute, then run:
kubectl describe node
You should see that the Anthos cluster node now reports 10 allocatable nvidia.com/gpu resources.
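If you prefer a one-line check over scanning the full node description, a jsonpath query can read the allocatable count directly; note the escaped dots in the resource name. For a single-GPU node with the sample configuration above, this should print 10:

kubectl get nodes -o jsonpath='{.items[*].status.allocatable.nvidia\.com/gpu}'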