
Commit eb400b8

Merge pull request #106 from Project-HAMi/kunlun_support
Add kunlunxin vxpu support
2 parents 2595742 + d5b7f27 commit eb400b8

File tree

12 files changed (+606 −15 lines)


OWNERS

Lines changed: 5 additions & 0 deletions
```diff
@@ -1,3 +1,8 @@
+reviewers:
+- archlitchi
+- wawa0210
+- windsonsea
 approvers:
 - archlitchi
 - wawa0210
+- windsonsea
```

docs/userguide/Device-supported.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -14,4 +14,5 @@ The table below lists the devices supported by HAMi:
 | GPU | Mthreads | MTT S4000 ||||
 | GPU | Metax | MXC500 ||||
 | GCU | Enflame | S60 ||||
+| XPU | Kunlunxin | P800 ||||
 | DPU | Teco | Checking | In progress | In progress ||
```

docs/userguide/Kunlunxin-device/enable-kunlunxin-schedule.md

Lines changed: 1 addition & 7 deletions
````diff
@@ -55,10 +55,4 @@ spec:
     resources:
       limits:
         kunlunxin.com/xpu: 4 # requesting 4 XPUs
-```
-
-:::note
-
-You can find more examples in examples folder soon.
-
-:::
+```
````
Lines changed: 215 additions & 0 deletions
---
title: Enable Kunlunxin VXPU
---

## Introduction

This component supports multiplexing Kunlunxin XPU devices (P800-OAM) and provides the following vGPU-like capabilities. Special thanks to rise-union and Kunlunxin for contributing:

***XPU Sharing***: Each task can occupy only a portion of the device, allowing multiple tasks to share a single XPU.

***Memory Allocation Limits***: You can allocate XPUs by memory value (e.g., 24576M), and the component ensures that tasks do not exceed the allocated memory limit.

***Device UUID Selection***: You can use or exclude specific XPU devices through annotations.

## Prerequisites

* driver version >= 5.0.21.16
* xpu-container-toolkit >= xpu_container_1.0.2-1
* XPU device type: P800-OAM

## Enable XPU-sharing Support

* Deploy [vxpu-device-plugin]

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: vxpu-device-plugin
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "update", "watch", "patch"]
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list", "watch", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: vxpu-device-plugin
subjects:
- kind: ServiceAccount
  name: vxpu-device-plugin
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: vxpu-device-plugin
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vxpu-device-plugin
  namespace: kube-system
  labels:
    app.kubernetes.io/component: vxpu-device-plugin
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: vxpu-device-plugin
  namespace: kube-system
  labels:
    app.kubernetes.io/component: vxpu-device-plugin
spec:
  selector:
    matchLabels:
      app.kubernetes.io/component: vxpu-device-plugin
  template:
    metadata:
      labels:
        app.kubernetes.io/component: vxpu-device-plugin
        hami.io/webhook: ignore
    spec:
      priorityClassName: "system-node-critical"
      serviceAccountName: vxpu-device-plugin
      containers:
        - image: projecthami/vxpu-device-plugin:v1.0.0
          name: device-plugin
          resources:
            requests:
              memory: 500Mi
              cpu: 500m
            limits:
              memory: 500Mi
              cpu: 500m
          args:
            - xpu-device-plugin
            - --memory-unit=MiB
            - --resource-name=kunlunxin.com/vxpu
            - -logtostderr
          securityContext:
            privileged: true
            capabilities:
              add: [ "ALL" ]
          volumeMounts:
            - name: device-plugin
              mountPath: /var/lib/kubelet/device-plugins
            - name: xre
              mountPath: /usr/local/xpu
            - name: dev
              mountPath: /dev
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: KUBECONFIG
              value: /etc/kubernetes/kubelet.conf
      volumes:
        - name: device-plugin
          hostPath:
            path: /var/lib/kubelet/device-plugins
        - name: xre
          hostPath:
            path: /usr/local/xpu
        - name: dev
          hostPath:
            path: /dev
      nodeSelector:
        xpu: "on"
```

:::note
Default resource names are as follows:

- `kunlunxin.com/vxpu` for VXPU count
- `kunlunxin.com/vxpu-memory` for memory allocation

You can customize these names using the parameters above.
:::

## Device Granularity Partitioning

XPU P800-OAM supports two levels of partitioning granularity, 1/4 card and 1/2 card, with memory allocation automatically aligned. The rules are as follows:

> - Requested memory ≤ 24576M (24G) is automatically aligned to 24576M (24G)
> - Requested memory > 24576M (24G) and ≤ 49152M (48G) is automatically aligned to 49152M (48G)
> - Requested memory > 49152M (48G) is allocated as full cards
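
The alignment rules can be expressed as a small helper; this is a sketch only, and the function name `align_xpu_memory` is hypothetical, not a HAMi API:

```python
QUARTER_CARD_MIB = 24576  # 24 GiB, 1/4 of a P800-OAM card
HALF_CARD_MIB = 49152     # 48 GiB, 1/2 of a P800-OAM card

def align_xpu_memory(requested_mib: int) -> str:
    """Mirror the documented alignment rules for P800-OAM memory requests."""
    if requested_mib <= QUARTER_CARD_MIB:
        return "24576M (1/4 card)"
    if requested_mib <= HALF_CARD_MIB:
        return "49152M (1/2 card)"
    return "full card"
```

For example, a request of 16384 MiB is rounded up to a quarter-card slice, while 30000 MiB lands on a half card.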

## Running XPU Tasks

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: vxpu-pod-demo
spec:
  containers:
    - name: vxpu-pod-demo
      image: pytorch:resnet50
      workingDir: /root
      command: ["sleep","infinity"]
      resources:
        limits:
          kunlunxin.com/vxpu: 1 # requesting a VXPU
          kunlunxin.com/vxpu-memory: 24576 # requesting a virtual XPU with 24576 MiB of device memory
```

## Device UUID Selection

You can use or exclude specific XPU devices through Pod annotations:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: poddemo
  annotations:
    # Use specific XPU devices (comma-separated list)
    hami.io/use-xpu-uuid: ""
    # Or exclude specific XPU devices (comma-separated list)
    hami.io/no-use-xpu-uuid: ""
spec:
  # ... rest of Pod configuration
```

> **Note:** The device ID format is `{BusID}`. You can find available device IDs in the node status.

### Finding Device UUIDs

You can use the following command to find Kunlunxin P800-OAM XPU device UUIDs on a node:

```bash
kubectl get pod <pod-name> -o yaml | grep -A 10 "hami.io/xpu-devices-allocated"
```

Or check the node annotations:

```bash
kubectl get node <node-name> -o yaml | grep -A 10 "hami.io/node-register-xpu"
```

Look for the device information contained in these annotations.


## Important Notes

The current Kunlun chip driver supports a maximum of 32 handles. The eight XPU devices already occupy 8 handles, so it is not possible to split all 8 devices into 4 partitions each.

```yaml
# valid
kunlunxin.com/vxpu: 8

# valid
kunlunxin.com/vxpu: 6
kunlunxin.com/vxpu-memory: 24576

# valid
kunlunxin.com/vxpu: 8
kunlunxin.com/vxpu-memory: 49152

# invalid
kunlunxin.com/vxpu: 8 # not supported
kunlunxin.com/vxpu-memory: 24576
```
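
A rough illustration of why the limit rules out quartering every device. This is an assumed accounting model for illustration only (one handle per physical device, plus one handle per partition when a device is split); the real driver's handle bookkeeping may differ:

```python
MAX_HANDLES = 32  # stated driver limit
NUM_DEVICES = 8   # P800-OAM devices in the server

def handles_needed(num_devices: int, partitions_per_device: int) -> int:
    """Assumed model: an unsplit device costs one handle; a split device
    costs its base handle plus one handle per partition."""
    if partitions_per_device <= 1:
        return num_devices
    return num_devices * (1 + partitions_per_device)

# Splitting all 8 devices into quarters: 8 * (1 + 4) = 40 handles > 32.
# Splitting all 8 into halves:            8 * (1 + 2) = 24 handles <= 32.
```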
Lines changed: 24 additions & 0 deletions
---
title: Allocate vxpu device
---

## Allocate vxpu device

To allocate a portion of a device's core resources, you only need to set `kunlunxin.com/vxpu` along with `kunlunxin.com/vxpu-memory`:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: xpu-pod
spec:
  containers:
    - name: ubuntu-container
      image: ubuntu:22.04
      imagePullPolicy: IfNotPresent
      command: ["sleep","infinity"]
      resources:
        limits:
          kunlunxin.com/vxpu: 1 # requesting 1 XPU
          kunlunxin.com/vxpu-memory: 24576 # each XPU requires 24576 MiB of device memory
```
Lines changed: 23 additions & 0 deletions
---
title: Allocate a whole xpu card
---

## Allocate exclusive device

To allocate a whole XPU device, you only need to set `kunlunxin.com/xpu` without other fields. You can allocate multiple XPUs to a container.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: xpu-pod
spec:
  containers:
    - name: ubuntu-container
      image: ubuntu:22.04
      imagePullPolicy: IfNotPresent
      command: ["sleep","infinity"]
      resources:
        limits:
          kunlunxin.com/xpu: 1 # requesting 1 XPU
```
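
The exclusive (`kunlunxin.com/xpu`) and shared (`kunlunxin.com/vxpu`, optionally with `kunlunxin.com/vxpu-memory`) request styles can be told apart purely by resource name. A minimal sketch; the `request_mode` helper is hypothetical and not part of HAMi:

```python
def request_mode(limits: dict) -> str:
    """Classify a container's resource limits by Kunlunxin resource name."""
    if "kunlunxin.com/xpu" in limits:
        return "exclusive"
    if "kunlunxin.com/vxpu" in limits:
        # Without a memory value, vxpu requests are allocated as full cards.
        if "kunlunxin.com/vxpu-memory" in limits:
            return "shared"
        return "shared (whole-card vxpu)"
    return "none"
```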

i18n/zh/docusaurus-plugin-content-docs/current/userguide/Device-supported.md

Lines changed: 10 additions & 7 deletions
```diff
@@ -7,10 +7,13 @@ The device view supported by HAMi is shown in the table below:
 
 | Producer | Manufacturer | Type | Memory Isolation | Core Isolation | Multi-card Support |
 |-------------|------------|-------------|-----------|---------------|-------------------|
-| GPU | NVIDIA | All ||||
-| MLU | Cambricon | 370, 590 ||||
-| DCU | Hygon | Z100, Z100L ||||
-| Ascend | Huawei | 910B, 910B3, 310P ||||
-| GPU | iluvatar | All ||||
-| GPU | Mthreads | MTT S4000 ||||
-| DPU | Teco | Checking | In progress | In progress ||
+| GPU | NVIDIA | All ||||
+| MLU | Cambricon | 370, 590 ||||
+| DCU | Hygon | Z100, Z100L ||||
+| Ascend | Huawei | 910B, 910B3, 310P ||||
+| GPU | iluvatar | All ||||
+| GPU | Mthreads | MTT S4000 ||||
+| GPU | Metax | MXC500 ||||
+| GCU | Enflame | S60 ||||
+| XPU | Kunlunxin | P800 ||||
+| DPU | Teco | Checking | In progress | In progress ||
```
Lines changed: 56 additions & 0 deletions
---
title: Enable Kunlunxin GPU Topology-aware Scheduling
---

**Kunlunxin GPU topology-aware scheduling is now supported through the `kunlunxin.com/xpu` resource.**

When multiple XPUs are installed in a single P800 server, performance improves significantly when the XPU cards are attached to the same NUMA node or can connect to each other directly. This forms a topology across all XPUs on the server, as shown below:

![img](../../resources/kunlunxin_topo.jpg)

When a user job requests a certain number of `kunlunxin.com/xpu` resources, Kubernetes schedules the Pod onto an appropriate node with the goal of reducing fragmentation and maximizing performance. The `xpu-device` plugin then performs fine-grained allocation of the requested resources on the selected node, following these rules:

1. Only 1-, 2-, 4-, or 8-card allocations are allowed.
2. Allocations of 1, 2, or 4 XPUs must not cross NUMA nodes.
3. Fragmentation after allocation should be minimized.
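
These rules can be sketched as a request validity check. This is a sketch under the assumption that each of the two NUMA nodes hosts 4 of the server's 8 XPUs, as the topology figure suggests; `valid_xpu_request` is a hypothetical helper, not part of HAMi:

```python
XPUS_PER_NUMA = 4  # assumption: 8 XPUs split evenly across 2 NUMA nodes

def valid_xpu_request(count: int) -> bool:
    """Rule 1: only 1-, 2-, 4-, or 8-card allocations are allowed.
    Rule 2: a 1-, 2-, or 4-card allocation must fit in one NUMA node."""
    if count not in (1, 2, 4, 8):
        return False
    return count == 8 or count <= XPUS_PER_NUMA
```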

## Important Notes

1. This mode does **not** support device sharing.
2. These features have been tested on Kunlunxin P800 hardware.

## Prerequisites

* Kunlunxin driver >= v5.0.21
* Kubernetes >= v1.23
* kunlunxin k8s-device-plugin

## Enable Topology-aware Scheduling

- Deploy the Kunlunxin device plugin on P800 nodes.
  (Contact your device vendor for the corresponding packages and documentation.)
- Deploy HAMi following the instructions in `README.md`.

## Running Kunlunxin Jobs

Kunlunxin P800 GPUs can be requested by containers using the `kunlunxin.com/xpu` resource type. Below is a sample Pod specification:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod1
spec:
  containers:
    - name: ubuntu-container
      image: docker.io/library/ubuntu:latest
      imagePullPolicy: IfNotPresent
      command: ["sleep", "infinity"]
      resources:
        limits:
          kunlunxin.com/xpu: 4 # requesting 4 XPUs
```
