[Kubernetes: Scheduling onto Newly Added GPU Nodes]
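For Kubernetes to schedule workloads onto GPU nodes, those nodes need the NVIDIA GPU driver installed and the cluster needs the NVIDIA device plugin. The steps and example manifests below show how to make GPU nodes schedulable.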
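- Make sure the NVIDIA GPU driver is installed correctly on each GPU node (the NVIDIA device plugin also expects the NVIDIA container runtime to be configured there).
- Make sure device plugin support is enabled in the kubelet. On Kubernetes 1.10 and later it is on by default; on 1.8 and 1.9 the kubelet needs --feature-gates=DevicePlugins=true. The older --feature-gates=Accelerators=true flag belongs to the deprecated alpha GPU support and is not used with the device plugin.
- Make sure the NVIDIA device plugin is installed.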
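The feature-gate prerequisite depends on the kubelet version: device plugin support was alpha in Kubernetes 1.8 and 1.9 and had to be switched on with DevicePlugins=true, while from 1.10 onward it is enabled by default. A minimal sketch of that check follows; the function name needs_device_plugins_gate is hypothetical and only illustrates the version rule.

```shell
# needs_device_plugins_gate VERSION
# Succeeds (exit 0) when the given kubelet version needs
# --feature-gates=DevicePlugins=true set explicitly (1.8 and 1.9 only).
# Hypothetical helper, not part of kubectl or the kubelet.
needs_device_plugins_gate() {
  ver="${1#v}"          # strip a leading "v", e.g. v1.9.3 -> 1.9.3
  major="${ver%%.*}"    # text before the first dot
  rest="${ver#*.}"      # text after the first dot
  minor="${rest%%.*}"   # text before the next dot
  [ "$major" -eq 1 ] && { [ "$minor" -eq 8 ] || [ "$minor" -eq 9 ]; }
}

# Print the kubelet version reported by every node (needs cluster access).
kubelet_versions() {
  kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.kubeletVersion}'
}
```

For example, needs_device_plugins_gate v1.9.3 succeeds, while needs_device_plugins_gate v1.26.0 fails, meaning no extra flag is needed. On the GPU node itself, running nvidia-smi is a quick way to confirm that the driver is loaded.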
Example manifest for the device plugin, which runs as a DaemonSet on the nodes:
# nvidia-device-plugin-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-daemonset
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: nvidia-device-plugin-daemonset
    spec:
      containers:
      - name: nvidia-device-plugin-container
        image: nvidia/k8s-device-plugin:1.0.0-beta
        volumeMounts:
        - name: device-plugin-socket
          mountPath: /var/lib/kubelet/device-plugins
      volumes:
      - name: device-plugin-socket
        hostPath:
          path: /var/lib/kubelet/device-plugins
Deploy the device plugin:
kubectl apply -f nvidia-device-plugin-daemonset.yaml
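Once the DaemonSet pods are running, each GPU node should advertise nvidia.com/gpu among its allocatable resources. The sketch below is one way to check this, assuming kubectl access to the cluster. The helper names list_gpu_nodes and show_gpu_nodes are hypothetical; the kubectl call uses the standard custom-columns output, which prints <none> for nodes without the resource.

```shell
# Filter rows of "NODE GPU" (header line first) down to the nodes
# that report a positive GPU count. Hypothetical helper.
list_gpu_nodes() {
  awk 'NR > 1 && $2 ~ /^[0-9]+$/ && $2 + 0 > 0 { print $1 }'
}

# Query the cluster and apply the filter (needs kubectl access).
show_gpu_nodes() {
  kubectl get nodes \
    -o custom-columns='NODE:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu' \
    | list_gpu_nodes
}
```

Calling show_gpu_nodes prints one node name per line. If a GPU node is missing from the output, the device plugin pod on that node and its logs are the first place to look.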
Request the GPU resource in the Pod spec:
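apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  restartPolicy: OnFailure      # run once instead of restarting forever
  containers:
  - name: cuda-container
    image: nvidia/cuda:9.0-devel
    command: ["nvidia-smi"]     # prints the GPU visible to the container
    resources:
      limits:
        nvidia.com/gpu: 1       # request 1 GPU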
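With this in place, Kubernetes schedules Pods that request GPUs onto nodes that advertise them. If the cluster mixes GPU types, or you want to control which nodes GPU Pods land on, make sure the nodes are labeled correctly so that the scheduler behaves as expected.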
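To illustrate the point about node labels, here is a minimal sketch that pins the GPU Pod to labeled nodes through a nodeSelector. The function name render_gpu_pod and the label accelerator=nvidia-tesla-v100 used in the usage note are assumptions for illustration; substitute whatever labels your GPU nodes actually carry.

```shell
# render_gpu_pod KEY VALUE
# Print a Pod manifest that requests one GPU and is restricted to
# nodes labeled KEY=VALUE. Hypothetical helper; the label is up to you.
render_gpu_pod() {
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  nodeSelector:
    $1: "$2"
  containers:
  - name: cuda-container
    image: nvidia/cuda:9.0-devel
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
}
```

A node could be labeled with kubectl label nodes <node-name> accelerator=nvidia-tesla-v100, after which render_gpu_pod accelerator nvidia-tesla-v100 | kubectl apply -f - would create the Pod. If no node carries the label, the Pod stays Pending, which is a common symptom of a mislabeled node.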