| 1 | +# User Guide for Ascend Devices in Volcano |
| 2 | + |
| 3 | +## Introduction |
| 4 | + |
| 5 | +Volcano supports the vNPU feature for both Ascend 310 and Ascend 910 devices through the `ascend-device-plugin`. It also supports managing heterogeneous Ascend clusters, that is, clusters with multiple Ascend chip types (e.g. 910A, 910B2, 910B3, 310P). |
| 6 | + |
| 7 | +**Use cases**: |
| 8 | + |
| 9 | +- NPU and vNPU cluster for Ascend 910 series |
| 10 | +- NPU and vNPU cluster for Ascend 310 series |
| 11 | +- Heterogeneous Ascend cluster |
| 12 | + |
| 13 | +This feature is available only in Volcano 1.14 and later. |
| 14 | + |
| 15 | +## Quick Start |
| 16 | + |
| 17 | +### Prerequisites |
| 18 | + |
| 19 | +[ascend-docker-runtime](https://gitcode.com/Ascend/mind-cluster/tree/master/component/ascend-docker-runtime) |
| 20 | + |
| 21 | +### Install Volcano |
| 22 | + |
| 23 | +```shell |
| 24 | +helm repo add volcano-sh https://volcano-sh.github.io/helm-charts |
| 25 | +helm install volcano volcano-sh/volcano -n volcano-system --create-namespace |
| 26 | +``` |
| 27 | + |
| 28 | +Additional installation methods can be found [here](https://github.com/volcano-sh/volcano?tab=readme-ov-file#quick-start-guide). |
| 29 | + |
| 30 | +### Label the Node with ascend=on |
| 31 | + |
| 32 | +```shell |
| 33 | +kubectl label node {ascend-node} ascend=on |
| 34 | +``` |
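
When several nodes carry Ascend devices, the label command above can be wrapped in a small loop. The following is a minimal sketch, not part of Volcano itself: the function name `label_ascend_nodes` and the `KUBECTL` override variable are hypothetical helpers introduced for illustration, while `--overwrite` is a standard `kubectl label` flag that keeps the command safe to re-run.

```shell
# Hypothetical helper: apply the ascend=on label to every node name given.
# KUBECTL can be overridden (for example with "echo kubectl") for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

label_ascend_nodes() {
  for node in "$@"; do
    # --overwrite keeps the command idempotent if the label already exists.
    $KUBECTL label node "$node" ascend=on --overwrite
  done
}
```

For example, `label_ascend_nodes node-1 node-2` labels both nodes, and `kubectl get nodes -l ascend=on` then lists the nodes that carry the label.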
| 35 | + |
| 36 | +### Deploy the `hami-scheduler-device` ConfigMap |
| 37 | + |
| 38 | +```shell |
| 39 | +kubectl apply -f https://raw.githubusercontent.com/Project-HAMi/ascend-device-plugin/refs/heads/main/ascend-device-configmap.yaml |
| 40 | +``` |
| 41 | + |
| 42 | +### Deploy ascend-device-plugin |
| 43 | + |
| 44 | +```shell |
| 45 | +kubectl apply -f https://raw.githubusercontent.com/Project-HAMi/ascend-device-plugin/refs/heads/main/ascend-device-plugin.yaml |
| 46 | +``` |
| 47 | + |
| 48 | +For more information, refer to the [ascend-device-plugin documentation](https://github.com/Project-HAMi/ascend-device-plugin). |
| 49 | + |
| 50 | +### Scheduler Config Update |
| 51 | + |
| 52 | +Update the scheduler configuration: |
| 53 | + |
| 54 | +```shell |
| 55 | +kubectl edit cm -n volcano-system volcano-scheduler-configmap |
| 56 | +``` |
| 57 | + |
| 58 | +```yaml |
| 59 | +kind: ConfigMap |
| 60 | +apiVersion: v1 |
| 61 | +metadata: |
| 62 | +  name: volcano-scheduler-configmap |
| 63 | +  namespace: volcano-system |
| 64 | +data: |
| 65 | +  volcano-scheduler.conf: \| |
| 66 | +    actions: "enqueue, allocate, backfill" |
| 67 | +    tiers: |
| 68 | +    - plugins: |
| 69 | +      - name: predicates |
| 70 | +      - name: deviceshare |
| 71 | +        arguments: |
| 72 | +          deviceshare.AscendHAMiVNPUEnable: true # enable Ascend vNPU |
| 73 | +          deviceshare.SchedulePolicy: binpack # scheduling policy: binpack or spread |
| 74 | +          deviceshare.KnownGeometriesCMNamespace: kube-system |
| 75 | +          deviceshare.KnownGeometriesCMName: hami-scheduler-device |
| 76 | +``` |
| 77 | + |
| 78 | +:::note |
| 79 | + |
| 80 | +Note that `volcano-vgpu` has its own `KnownGeometriesCMName` and `KnownGeometriesCMNamespace` settings. To use both vNPU and vGPU in the same Volcano cluster, merge the ConfigMaps from both sides and reference the merged ConfigMap here. |
| 81 | + |
| 82 | +::: |
| 83 | + |
| 84 | +## Usage |
| 85 | + |
| 86 | +```yaml |
| 87 | +apiVersion: v1 |
| 88 | +kind: Pod |
| 89 | +metadata: |
| 90 | +  name: ascend-pod |
| 91 | +spec: |
| 92 | +  schedulerName: volcano |
| 93 | +  containers: |
| 94 | +    - name: ubuntu-container |
| 95 | +      image: swr.cn-south-1.myhuaweicloud.com/ascendhub/ascend-pytorch:24.0.RC1-A2-1.11.0-ubuntu20.04 |
| 96 | +      command: ["sleep"] |
| 97 | +      args: ["100000"] |
| 98 | +      resources: |
| 99 | +        limits: |
| 100 | +          huawei.com/Ascend310P: "1" |
| 101 | +          huawei.com/Ascend310P-memory: "4096" |
| 102 | + |
| 103 | +``` |
| 104 | + |
| 105 | +The supported Ascend chips and their `ResourceNames` are shown in the following table: |
| 106 | + |
| 107 | +| ChipName | ResourceName | ResourceMemoryName | |
| 108 | +|-------|-------|-------| |
| 109 | +| 910A | huawei.com/Ascend910A | huawei.com/Ascend910A-memory | |
| 110 | +| 910B2 | huawei.com/Ascend910B2 | huawei.com/Ascend910B2-memory | |
| 111 | +| 910B3 | huawei.com/Ascend910B3 | huawei.com/Ascend910B3-memory | |
| 112 | +| 910B4 | huawei.com/Ascend910B4 | huawei.com/Ascend910B4-memory | |
| 113 | +| 910B4-1 | huawei.com/Ascend910B4-1 | huawei.com/Ascend910B4-1-memory | |
| 114 | +| 310P3 | huawei.com/Ascend310P | huawei.com/Ascend310P-memory | |
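
The table above can be turned into a small lookup when generating Pod manifests for a heterogeneous cluster. The sketch below is illustrative only: `ascend_resource_name` and `ascend_limits` are hypothetical helper names, and the mapping simply restates the table, including the fact that the 310P3 chip uses the shorter `Ascend310P` resource name.

```shell
# Hypothetical helper: print the resource name for a chip listed in the table.
ascend_resource_name() {
  case "$1" in
    910A|910B2|910B3|910B4|910B4-1) echo "huawei.com/Ascend$1" ;;
    310P3) echo "huawei.com/Ascend310P" ;;  # 310P3 uses the Ascend310P name
    *) echo "unknown chip: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: print a resources.limits fragment.
# Arguments: chip name, device count, memory value.
ascend_limits() {
  name="$(ascend_resource_name "$1")" || return 1
  printf 'limits:\n  %s: "%s"\n  %s-memory: "%s"\n' "$name" "$2" "$name" "$3"
}
```

Calling `ascend_limits 310P3 1 4096` prints a fragment with the same two resource entries used in the Pod example; the fragment still has to be indented to fit under `resources:` in a real manifest.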