This implementation addresses a fundamental limitation of GPUs in Kubernetes: a physical GPU can only be allocated to a pod exclusively. There is currently no out-of-the-box feature for sharing a GPU across multiple pods (see this Kubernetes issue, open since 2017). Third-party solutions such as gpu-manager (GaiaGPU) or KubeShare implement fractional pod allocation and isolation by intercepting CUDA calls.
Kubernetes GPUSharing uses gpu-manager under the hood and adds several features on top to make GPU sharing feasible in production Kubernetes clusters. Clusters with a large user base profit especially from GPU sharing (reduced waiting times, increased GPU utilization).
Warning: This repo is in a proof-of-concept (POC) stage. Feel free to evaluate and contribute.
- Allows fractional GPU allocations (e.g. 1/3 of a physical GPU) to pods through gpu-manager
- Enforcement of GPU request governance:
  - Deny malformed GPU device requests (e.g. the resource request must satisfy 1 < tencent.com/vcuda-core < 100, or be a whole-GPU multiple n * 100)
  - Automatically set the required gpu-manager-specific annotation tags
- Budgeting of GPU resources:
  - A global budget is assigned to every namespace
  - While within budget, the pods in a namespace run with default priority
  - When out of budget, a pod is started with reduced priority and is therefore evicted in favor of in-budget pods
  - GPU quota is implemented using default Kubernetes ResourceQuotas
- Invisible to users and applications: just state the fractional GPU resource request in the pod/job/deployment specification (see the example below)
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mnist-case
spec:
  containers:
  - image: localhost:5000/usecase/mnist:latest
    name: nvidia
    resources:
      limits:
        tencent.com/vcuda-core: 50    # 50% of a GPU
        tencent.com/vcuda-memory: 15  # 15 * 256 MB of VRAM
```
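The governance rule described above (fractional requests between 1 and 99, or whole-GPU multiples of 100) can be sketched as a small validation function. This is a hypothetical sketch; the actual check lives in the admission webhook code and may differ in detail:

```python
def is_valid_vcuda_core(request: int) -> bool:
    """Validate a tencent.com/vcuda-core request (illustrative sketch).

    A request is considered valid if it is a fraction of one GPU
    (1 <= request < 100) or a whole-GPU multiple (100, 200, 300, ...).
    """
    if request <= 0:
        return False
    # Fractional share of a single GPU
    if request < 100:
        return True
    # Whole GPUs must be requested in multiples of 100
    return request % 100 == 0
```

For example, requests of 50 or 200 would pass, while 150 would be denied because it mixes a whole GPU with a fractional share.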
- Sidecar: a container injected into pods that use GPU resources; it reports GPU budget consumption to the GPU timekeeper.
- GPU timekeeper: a service that keeps track of per-user (identified per namespace) consumption of GPU resources.
- Admission webhook: a web service that sits in the Kubernetes admission flow to validate GPU request governance, assign priorities to pods, and inject the sidecar containers.
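As a rough sketch, a mutating admission webhook expresses its changes as a JSONPatch. The annotation key and priority class name below are placeholders for illustration, not necessarily the values the real webhook uses:

```python
def build_patch(out_of_budget: bool) -> list:
    """Build a JSONPatch for an admitted pod (hypothetical sketch).

    Adds a placeholder gpu-manager annotation and, when the namespace
    is out of budget, a reduced priority class so the pod can be
    evicted in favor of in-budget pods.
    """
    patch = [{
        "op": "add",
        "path": "/metadata/annotations",
        # Placeholder annotation; the real gpu-manager tags may differ.
        "value": {"example.com/gpu-sharing": "enabled"},
    }]
    if out_of_budget:
        patch.append({
            "op": "add",
            "path": "/spec/priorityClassName",
            # Placeholder low-priority class for out-of-budget pods.
            "value": "gpu-low-priority",
        })
    return patch
```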
- Tested with Kubernetes >v1.20
- Only Nvidia GPUs supported
- Set up gpu-manager, including gpu-admission
For a quickstart, use the provided YAML files in deployment, which use images from Docker Hub:

```shell
kubectl create -f deployment/
```
Currently the solution must be configured in code. Configuration outside the code will be added shortly.
- Quota configuration in YAML: deployment/quotas.yaml
- The quota must also be set in code for correct calculation of budget consumption: src/gpu-admission-webhook/src/app.py
- Global per-namespace GPU budget (in minutes): src/gpu-timekeeper/app/main.py
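One plausible way for the timekeeper to account consumption against the per-namespace budget is to scale a pod's runtime by its fractional core request. This is an illustrative sketch, not the actual formula in src/gpu-timekeeper/app/main.py:

```python
def consumed_budget_minutes(vcuda_core: int, runtime_minutes: float) -> float:
    """Estimate GPU budget consumption in GPU-minutes (illustrative).

    A pod holding 50/100 of a GPU for 60 minutes would consume
    30 GPU-minutes of the namespace budget.
    """
    return (vcuda_core / 100.0) * runtime_minutes
```

For example, `consumed_budget_minutes(50, 60)` yields 30.0 GPU-minutes under this model.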
Copyright (c) 2021 Timo Zerrer and others
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.