GPU nodes need the NVIDIA driver and the NVIDIA Container Toolkit installed.
To check that the driver is correctly installed, run `nvidia-smi`; you should see the connected GPU devices:
#+-----------------------------------------------------------------------------------------+
#| NVIDIA-SMI 550.54.14 Driver Version: 550.54.14 CUDA Version: 12.4 |
#|-----------------------------------------+------------------------+----------------------+
#| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
#| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
#| | | MIG M. |
#|=========================================+========================+======================|
#| 0 NVIDIA GeForce GTX 1050 Ti Off | 00000000:1D:00.0 On | N/A |
#| 0% 40C P0 N/A / 90W | 1556MiB / 4096MiB | 0% Default |
#| | | N/A |
#+-----------------------------------------+------------------------+----------------------+
#
#+-----------------------------------------------------------------------------------------+
#| Processes: |
#| GPU GI CI PID Type Process name GPU Memory |
#| ID ID Usage |
#|=========================================================================================|
#+-----------------------------------------------------------------------------------------+
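To exercise the container toolkit itself, rather than just the host driver, you can run `nvidia-smi` inside a GPU-enabled container. A minimal sketch, assuming Docker is configured with the NVIDIA Container Toolkit (the image tag is an assumption; pick one matching your CUDA version):

```shell
# Run nvidia-smi inside a CUDA base image; the output should match the
# table above. Requires Docker with the NVIDIA Container Toolkit runtime.
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```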
Apply the following YAML to register the NVIDIA runtime class.
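The manifest itself is not shown above; a minimal RuntimeClass for the NVIDIA handler, assuming the container runtime was configured with a handler named `nvidia`, looks like this:

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia   # must match the runtime name configured in containerd
```

Apply it with `kubectl apply -f runtimeclass.yaml`.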
Then the NVIDIA runtime class should appear among the available runtime classes (`kubectl get runtimeclass`):
#NAME HANDLER AGE
#crun crun 13h
#lunatic lunatic 13h
#nvidia nvidia 13h
#nvidia-experimental nvidia-experimental 13h
#slight slight 13h
#spin spin 13h
#wasmedge wasmedge 13h
#wasmer wasmer 13h
#wasmtime wasmtime 13h
#wws wws 13h
Install the NVIDIA Kubernetes Device Plugin.
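One common installation route is the official Helm chart; the release name, namespace, and values below are assumptions, so check the NVIDIA `k8s-device-plugin` releases for current chart options:

```shell
# Install the NVIDIA device plugin via its Helm chart (assumed values).
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
# runtimeClassName makes the plugin's pods use the nvidia runtime class
# registered earlier.
helm install nvdp nvdp/nvidia-device-plugin \
  --namespace nvidia-device-plugin --create-namespace \
  --set runtimeClassName=nvidia
```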
The NVIDIA device plugin should have labelled the node as having an NVIDIA GPU. Query the node's allocatable resources; you should see some results.
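A sketch of such a query, assuming the plugin advertises the standard `nvidia.com/gpu` resource:

```shell
# List each node's allocatable nvidia.com/gpu count; a non-empty value
# means the device plugin registered the GPU with the kubelet.
kubectl get nodes \
  -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
```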
Finally, deploy a test pod that requests a GPU. If everything is working, the pod will run and reach the Completed state.
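A typical smoke test is a pod that requests one GPU and runs a CUDA sample to completion; the image tag and pod name here are assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  runtimeClassName: nvidia      # the RuntimeClass registered earlier
  containers:
    - name: cuda-vectoradd
      # Sample image tag is an assumption; check NVIDIA's registry for
      # a tag matching your CUDA version.
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04
      resources:
        limits:
          nvidia.com/gpu: 1
```

Apply it with `kubectl apply -f gpu-smoke-test.yaml`, then check `kubectl get pod gpu-smoke-test`; the STATUS column should reach Completed.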