Commit ddcadbe

Merge pull request #114 from DSFans2014/docs/volcano
docs: add docs for using ascend device in volcano
2 parents aac6d61 + fd2143d commit ddcadbe

3 files changed: +224 -0 lines changed

Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
# User Guide for Ascend Devices in Volcano

## Introduction

Volcano supports the vNPU feature for both Ascend 310 and Ascend 910 devices through the `ascend-device-plugin`. It can also manage heterogeneous Ascend clusters, that is, clusters that contain multiple Ascend chip types (for example 910A, 910B2, 910B3, and 310P).

**Use cases**:

- NPU and vNPU clusters for the Ascend 910 series
- NPU and vNPU clusters for the Ascend 310 series
- Heterogeneous Ascend clusters

This feature is only available in Volcano 1.14 and later.

## Quick Start

### Prerequisites

[ascend-docker-runtime](https://gitcode.com/Ascend/mind-cluster/tree/master/component/ascend-docker-runtime)

### Install Volcano

```shell
helm repo add volcano-sh https://volcano-sh.github.io/helm-charts
helm install volcano volcano-sh/volcano -n volcano-system --create-namespace
```
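
As a quick sanity check after the Helm install, the sketch below lists the pods in the `volcano-system` namespace and confirms that the `volcano-scheduler-configmap` ConfigMap edited later in this guide exists. The helper name `check_volcano_install` is hypothetical, and the function assumes `kubectl` already points at the target cluster.

```shell
# Hypothetical helper (not part of Volcano): list Volcano's pods and confirm
# that the scheduler ConfigMap used later in this guide is present.
check_volcano_install() {
  kubectl get pods -n volcano-system
  kubectl get cm -n volcano-system volcano-scheduler-configmap
}

# Usage, against a live cluster:
#   check_volcano_install
```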

For other installation methods, see the [Volcano quick start guide](https://github.com/volcano-sh/volcano?tab=readme-ov-file#quick-start-guide).

### Label the Node with ascend=on

```shell
kubectl label node {ascend-node} ascend=on
```
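
When several nodes carry Ascend devices, the same label command can be looped. The following is a minimal sketch: the helper name `label_ascend_nodes` and the node names in the usage comment are hypothetical, and `--overwrite` is added only so that the helper can be rerun safely.

```shell
# Hypothetical helper: apply the ascend=on label to every node name passed in,
# then list all nodes that carry the label.
label_ascend_nodes() {
  for node in "$@"; do
    kubectl label node "$node" ascend=on --overwrite
  done
  kubectl get nodes -l ascend=on
}

# Usage (node names are placeholders):
#   label_ascend_nodes npu-node-1 npu-node-2
```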

### Deploy the `hami-scheduler-device` ConfigMap

```shell
kubectl apply -f https://raw.githubusercontent.com/Project-HAMi/ascend-device-plugin/refs/heads/main/ascend-device-configmap.yaml
```

### Deploy ascend-device-plugin

```shell
kubectl apply -f https://raw.githubusercontent.com/Project-HAMi/ascend-device-plugin/refs/heads/main/ascend-device-plugin.yaml
```
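
One way to check that the plugin has registered its devices is to inspect the allocatable resources of a labeled node. This is a sketch under the assumption that a working plugin exposes the `huawei.com/...` resource names listed in the table at the end of this guide; the helper name `show_node_allocatable` is hypothetical.

```shell
# Hypothetical helper: print the allocatable resources of one node.
# Ascend resource names (for example huawei.com/Ascend310P) are expected to
# appear here once the device plugin has registered, which is an assumption.
show_node_allocatable() {
  kubectl get node "$1" -o jsonpath='{.status.allocatable}'
}

# Usage (node name is a placeholder):
#   show_node_allocatable npu-node-1
```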

For more information, refer to the [ascend-device-plugin documentation](https://github.com/Project-HAMi/ascend-device-plugin).

### Scheduler Config Update

Update the scheduler configuration:

```shell
kubectl edit cm -n volcano-system volcano-scheduler-configmap
```

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: volcano-scheduler-configmap
  namespace: volcano-system
data:
  volcano-scheduler.conf: |
    actions: "enqueue, allocate, backfill"
    tiers:
    - plugins:
      - name: predicates
      - name: deviceshare
        arguments:
          deviceshare.AscendHAMiVNPUEnable: true # enable ascend vnpu
          deviceshare.SchedulePolicy: binpack # scheduling policy. binpack / spread
          deviceshare.KnownGeometriesCMNamespace: kube-system
          deviceshare.KnownGeometriesCMName: hami-scheduler-device
```
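
After saving the edit, it can help to read the configuration back and confirm that the `deviceshare` arguments are present. The sketch below prints the `deviceshare` entry and the few lines after it; the helper name `show_deviceshare_config` is hypothetical, and the number of trailing lines (5) is an arbitrary choice.

```shell
# Hypothetical helper: print the deviceshare plugin entry from the live
# scheduler configuration, plus the five lines that follow it.
show_deviceshare_config() {
  kubectl get cm -n volcano-system volcano-scheduler-configmap \
    -o jsonpath='{.data.volcano-scheduler\.conf}' | grep -A 5 'name: deviceshare'
}

# Usage, against a live cluster:
#   show_deviceshare_config
```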

:::note

`volcano-vgpu` has its own `KnownGeometriesCMName` and `KnownGeometriesCMNamespace`. To use both vNPU and vGPU in the same Volcano cluster, merge the ConfigMaps from both sides into one and reference the merged ConfigMap here.

:::

## Usage

Set `schedulerName: volcano` and request the Ascend resources in the container's limits:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ascend-pod
spec:
  schedulerName: volcano
  containers:
    - name: ubuntu-container
      image: swr.cn-south-1.myhuaweicloud.com/ascendhub/ascend-pytorch:24.0.RC1-A2-1.11.0-ubuntu20.04
      command: ["sleep"]
      args: ["100000"]
      resources:
        limits:
          huawei.com/Ascend310P: "1"
          huawei.com/Ascend310P-memory: "4096"
```
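
To try the Pod above, it can be saved to a file, applied, and watched until it becomes Ready. This is a minimal sketch: the file name `ascend-pod.yaml`, the helper name `run_ascend_pod`, and the 300-second timeout are assumptions, and the Pod is created in the current namespace.

```shell
# Hypothetical helper: create the example Pod, wait for it to become Ready,
# then show which node it was scheduled on.
run_ascend_pod() {
  manifest="${1:-ascend-pod.yaml}"   # assumed file name for the manifest above
  kubectl apply -f "$manifest"
  kubectl wait --for=condition=Ready pod/ascend-pod --timeout=300s
  kubectl get pod ascend-pod -o wide
}

# Usage:
#   run_ascend_pod ascend-pod.yaml
```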

The supported Ascend chips and their resource names are listed in the following table:

| ChipName | ResourceName | ResourceMemoryName |
|-------|-------|-------|
| 910A | huawei.com/Ascend910A | huawei.com/Ascend910A-memory |
| 910B2 | huawei.com/Ascend910B2 | huawei.com/Ascend910B2-memory |
| 910B3 | huawei.com/Ascend910B3 | huawei.com/Ascend910B3-memory |
| 910B4 | huawei.com/Ascend910B4 | huawei.com/Ascend910B4-memory |
| 910B4-1 | huawei.com/Ascend910B4-1 | huawei.com/Ascend910B4-1-memory |
| 310P3 | huawei.com/Ascend310P | huawei.com/Ascend310P-memory |
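
The same resource names can presumably be used in a Volcano Job, since the Job's pods carry the same container limits. The sketch below submits a one-task Job through a heredoc, reusing the image and the 310P request from the Pod example; the Job name `ascend-job`, the task name `worker`, and the helper name `submit_ascend_job` are hypothetical, and it is an assumption that the `deviceshare` plugin treats Job pods the same way as standalone pods. Any `ResourceName` and `ResourceMemoryName` pair from the table can be substituted.

```shell
# Hypothetical helper: submit a single-task Volcano Job that requests the same
# Ascend 310P resources as the Pod example.
submit_ascend_job() {
  kubectl apply -f - <<'EOF'
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: ascend-job
spec:
  schedulerName: volcano
  minAvailable: 1
  tasks:
    - name: worker
      replicas: 1
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: ubuntu-container
              image: swr.cn-south-1.myhuaweicloud.com/ascendhub/ascend-pytorch:24.0.RC1-A2-1.11.0-ubuntu20.04
              command: ["sleep"]
              args: ["100000"]
              resources:
                limits:
                  huawei.com/Ascend310P: "1"
                  huawei.com/Ascend310P-memory: "4096"
EOF
}

# Usage, against a live cluster:
#   submit_ascend_job
```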
Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
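# User Guide for Ascend Devices in Volcano

## Introduction

Volcano supports the vNPU feature for Ascend 310 and Ascend 910 through the `ascend-device-plugin`. It also supports managing heterogeneous Ascend clusters, that is, clusters that contain multiple Ascend types (for example 910A, 910B2, 910B3, and 310P).

**Use cases**

- NPU and vNPU clusters for the Ascend 910 series
- NPU and vNPU clusters for the Ascend 310 series
- Heterogeneous Ascend clusters

This feature is only available in Volcano 1.14 and later.
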
## Quick Start

### Prerequisites

[ascend-docker-runtime](https://gitcode.com/Ascend/mind-cluster/tree/master/component/ascend-docker-runtime)

### Install Volcano

```shell
helm repo add volcano-sh https://volcano-sh.github.io/helm-charts
helm install volcano volcano-sh/volcano -n volcano-system --create-namespace
```

For other installation methods, see the [Volcano quick start guide](https://github.com/volcano-sh/volcano?tab=readme-ov-file#quick-start-guide).

### Label the Ascend Nodes with ascend=on

```shell
kubectl label node {ascend-node} ascend=on
```

### Deploy the hami-scheduler-device ConfigMap

```shell
kubectl apply -f https://raw.githubusercontent.com/Project-HAMi/ascend-device-plugin/refs/heads/main/ascend-device-configmap.yaml
```

### Deploy ascend-device-plugin

```shell
kubectl apply -f https://raw.githubusercontent.com/Project-HAMi/ascend-device-plugin/refs/heads/main/ascend-device-plugin.yaml
```

For more information, refer to the [ascend-device-plugin documentation](https://github.com/Project-HAMi/ascend-device-plugin).

### Update the Scheduler Configuration

```shell
kubectl edit cm -n volcano-system volcano-scheduler-configmap
```

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: volcano-scheduler-configmap
  namespace: volcano-system
data:
  volcano-scheduler.conf: |
    actions: "enqueue, allocate, backfill"
    tiers:
    - plugins:
      - name: predicates
      - name: deviceshare
        arguments:
          deviceshare.AscendHAMiVNPUEnable: true # enable ascend vnpu
          deviceshare.SchedulePolicy: binpack # scheduling policy. binpack / spread
          deviceshare.KnownGeometriesCMNamespace: kube-system
          deviceshare.KnownGeometriesCMName: hami-scheduler-device
```

:::note

You may notice that `volcano-vgpu` has its own `KnownGeometriesCMName` and `KnownGeometriesCMNamespace`. To use both vNPU and vGPU in the same Volcano cluster, you need to merge the ConfigMaps from both sides.

:::

## Usage

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ascend-pod
spec:
  schedulerName: volcano
  containers:
    - name: ubuntu-container
      image: swr.cn-south-1.myhuaweicloud.com/ascendhub/ascend-pytorch:24.0.RC1-A2-1.11.0-ubuntu20.04
      command: ["sleep"]
      args: ["100000"]
      resources:
        limits:
          huawei.com/Ascend310P: "1"
          huawei.com/Ascend310P-memory: "4096"
```

The supported Ascend chips and their corresponding resource names are listed in the following table:

| ChipName | ResourceName | ResourceMemoryName |
|-------|-------|-------|
| 910A | huawei.com/Ascend910A | huawei.com/Ascend910A-memory |
| 910B2 | huawei.com/Ascend910B2 | huawei.com/Ascend910B2-memory |
| 910B3 | huawei.com/Ascend910B3 | huawei.com/Ascend910B3-memory |
| 910B4 | huawei.com/Ascend910B4 | huawei.com/Ascend910B4-memory |
| 910B4-1 | huawei.com/Ascend910B4-1 | huawei.com/Ascend910B4-1-memory |
| 310P3 | huawei.com/Ascend310P | huawei.com/Ascend310P-memory |

sidebars.js

Lines changed: 1 addition & 0 deletions
@@ -35,6 +35,7 @@ module.exports = {
  "installation/uninstall",
  "installation/webui-installation",
  "installation/how-to-use-volcano-vgpu",
+ "installation/how-to-use-volcano-ascend",
  "installation/aws-installation"
  ]
  },
