IMPORTANT!

This is a fork of https://github.com/awslabs/aws-virtual-gpu-device-plugin.

AWS's original plugin does not allocate GPU memory through the plugin itself. Instead, each workload has to limit its own memory usage with language- or framework-specific arguments.

This fork is in active development, with the following goals and challenges:

- Support memory allocation via the plugin
- Support GPU allocation by model name
- Produce telemetry

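The end goal is something like the example below, where each GPU model is requested both by vGPU count and by memory. It only sketches the intended interface: YAML does not allow the same key twice in one mapping, so the final memory resources will need their own names.

```yaml
# On a server with 1 T4 and 2 V100 GPUs, with 10 vGPUs per device
resources:
  limits:
    k8s.kuartis.com/nvidia-t4: 10        # vGPUs: 1 T4 x 10
    k8s.kuartis.com/nvidia-t4: 16384     # memory: 1 x 16 GB
    k8s.kuartis.com/nvidia-v100: 20      # vGPUs: 2 V100 x 10
    k8s.kuartis.com/nvidia-v100: 32768   # memory: 2 x 16 GB
```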
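If the memory figures are MiB and each device's memory is split evenly across its vGPUs (both are assumptions; the README does not yet define how memory maps to vGPUs), the numbers in the example give every vGPU the same share on either model:

```math
m_{\text{vGPU}} = \frac{M_{\text{total}}}{N_{\text{vGPU}}}, \qquad
\frac{16384\ \text{MiB}}{10} = 1638.4\ \text{MiB}\ (\text{T4}), \qquad
\frac{32768\ \text{MiB}}{20} = 1638.4\ \text{MiB}\ (\text{V100})
```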

Install and Test

```shell
# Label your GPU nodes
kubectl label node <node_name> k8s.kuartis.com/accelerator=vgpu

# Install the daemonset, service and service monitor (Prometheus)
kubectl create -f https://raw.githubusercontent.com/kuartis/kuartis-virtual-gpu-device-plugin/master/manifests/device-plugin.yml
```

Notes about the daemonset:
- It uses NVML to find which processes use GPU resources.
- It mounts the host's /proc directory to map each process ID to its container.
- It uses the dockershim socket to read detailed container information.

You can set these arguments on the daemonset:
- --vgpu=<number_of_virtual_gpus_one_physical_gpu_can_have>: default 10, maximum 48.
- --allowmultigpu=<true|false>: default false. When false, all vGPUs allocated to one container come from the same physical GPU.
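A minimal sketch of what the two arguments imply, assuming each vGPU is advertised to the kubelet with a device ID of the form <physical-GPU-UUID>-<n>. That naming, the helpers physicalGPU and checkAllocation, and the lower bound of 1 for --vgpu are all hypothetical; only the defaults and the maximum of 48 come from this README.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strings"
)

const maxVGPU = 48 // maximum stated in the README

// physicalGPU (hypothetical) strips the trailing "-<n>" from a vGPU ID.
// LastIndex keeps hyphens inside the GPU UUID itself intact.
func physicalGPU(vgpuID string) string {
	if i := strings.LastIndex(vgpuID, "-"); i >= 0 {
		return vgpuID[:i]
	}
	return vgpuID
}

// checkAllocation (hypothetical) rejects a set of vGPUs that spans several
// physical GPUs unless allowMultiGPU is true.
func checkAllocation(ids []string, allowMultiGPU bool) error {
	if allowMultiGPU || len(ids) == 0 {
		return nil
	}
	first := physicalGPU(ids[0])
	for _, id := range ids[1:] {
		if physicalGPU(id) != first {
			return fmt.Errorf("vGPUs %s and %s are on different physical GPUs", ids[0], id)
		}
	}
	return nil
}

func main() {
	// Defaults mirror the README: --vgpu=10, --allowmultigpu=false.
	vgpu := flag.Int("vgpu", 10, "number of virtual GPUs per physical GPU (max 48)")
	allowMultiGPU := flag.Bool("allowmultigpu", false, "let one container's vGPUs span physical GPUs")
	flag.Parse()

	// The lower bound of 1 is an assumption; the README only states the maximum.
	if *vgpu < 1 || *vgpu > maxVGPU {
		fmt.Fprintf(os.Stderr, "--vgpu must be between 1 and %d, got %d\n", maxVGPU, *vgpu)
		os.Exit(1)
	}

	// Hypothetical vGPU IDs: two on GPU-aaaa, one on GPU-bbbb.
	ids := []string{"GPU-aaaa-0", "GPU-aaaa-1", "GPU-bbbb-0"}
	fmt.Println(checkAllocation(ids[:2], *allowMultiGPU))
	fmt.Println(checkAllocation(ids, *allowMultiGPU))
}
```

With the default flags, the first request (both vGPUs on GPU-aaaa) passes and the second is rejected; running with --allowmultigpu=true accepts both.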

Example pod that requests one vGPU:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nvidia-device-query
spec:
  hostIPC: true # Required for MPS
  containers:
    - name: nvidia-device-query
      image: ghcr.io/kuartis/nvidia-device-query:1.0.0
      command: ["/bin/sh", "-ec", "while :; do echo '.'; sleep 5 ; done"]
      env:
        - name: CUDA_MPS_PINNED_DEVICE_MEM_LIMIT # GPU memory limit enforced by MPS, per device ordinal
          value: 0=2G # See https://developer.nvidia.com/blog/revealing-new-features-in-the-cuda-11-5-toolkit/
      resources:
        limits:
          # Partition your GPUs with the daemonset's --vgpu=<number> argument,
          # then request virtual GPUs here
          k8s.kuartis.com/vgpu: '1'
      volumeMounts:
        - name: nvidia-mps # Default MPS pipe directory, shared from the host
          mountPath: /tmp/nvidia-mps
  volumes:
    - name: nvidia-mps
      hostPath:
        path: /tmp/nvidia-mps
```
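The value 0=2G pins a 2 GB limit on CUDA device ordinal 0. NVIDIA documents CUDA_MPS_PINNED_DEVICE_MEM_LIMIT as a comma-separated list of <ordinal>=<limit> pairs (for example 0=1G,1=512MB). The sketch below shows how such a value could be derived from a vGPU request by giving each vGPU an equal share of its device's memory. The helper mpsMemLimit and the proportional split are assumptions for illustration; in the example pod above the limit is set by hand.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// mpsMemLimit (hypothetical) builds a CUDA_MPS_PINNED_DEVICE_MEM_LIMIT value.
// requested maps a device ordinal to the number of vGPUs granted on it,
// totalMB is each device's memory in MB, and vgpuPerGPU matches --vgpu.
func mpsMemLimit(requested map[int]int, totalMB, vgpuPerGPU int) string {
	ordinals := make([]int, 0, len(requested))
	for ord := range requested {
		ordinals = append(ordinals, ord)
	}
	sort.Ints(ordinals) // map iteration order is random; sort for stable output

	pairs := make([]string, 0, len(ordinals))
	for _, ord := range ordinals {
		// Equal share per vGPU; integer division rounds the limit down.
		limitMB := totalMB * requested[ord] / vgpuPerGPU
		pairs = append(pairs, fmt.Sprintf("%d=%dMB", ord, limitMB))
	}
	return strings.Join(pairs, ",")
}

func main() {
	// One vGPU out of 10 on a 16384 MB device at ordinal 0.
	fmt.Println(mpsMemLimit(map[int]int{0: 1}, 16384, 10)) // 0=1638MB
}
```

Rounding down keeps the sum of all containers' limits on a device at or below its physical memory.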

License

This project is licensed under the Apache-2.0 License.
