
Commit 1d3d087

Merge pull request #110 from qiangwei1983/master
Updated Iluvatar GPU related documentation
2 parents 022f07a + 9e8725f

15 files changed (+392 −211 lines)

docs/userguide/Iluvatar-device/enable-illuvatar-gpu-sharing.md

Lines changed: 43 additions & 33 deletions
@@ -2,7 +2,6 @@
 title: Enable Illuvatar GPU Sharing
 ---
 
-
 ## Introduction
 
 **We now support iluvatar.ai/gpu (i.e. MR-V100, BI-V150, BI-V100) by implementing most of the device-sharing features available for NVIDIA GPUs**, including:
@@ -28,52 +27,67 @@ title: Enable Illuvatar GPU Sharing
 
 > **NOTICE:** *Install only gpu-manager; don't install the gpu-admission package.*
 
-* Identify the resource names for core and memory usage (i.e. 'iluvatar.ai/vcuda-core', 'iluvatar.ai/vcuda-memory')
-
-* Set the 'iluvatarResourceMem' and 'iluvatarResourceCore' parameters when installing HAMi
-
+* Set `devices.iluvatar.enabled=true` when installing HAMi
 ```
-helm install hami hami-charts/hami --set scheduler.kubeScheduler.imageTag={your kubernetes version} --set iluvatarResourceMem=iluvatar.ai/vcuda-memory --set iluvatarResourceCore=iluvatar.ai/vcuda-core -n kube-system
+helm install hami hami-charts/hami --set scheduler.kubeScheduler.imageTag={your kubernetes version} --set devices.iluvatar.enabled=true
 ```
 
-> **NOTE:** The default resource names are:
-> - `iluvatar.ai/vgpu` for GPU count
-> - `iluvatar.ai/vcuda-memory` for memory allocation
-> - `iluvatar.ai/vcuda-core` for core allocation
->
-> You can customize these names using the parameters above.
+**Note:** The currently supported GPU models and resource names are defined in [device-configmap.yaml](https://github.com/Project-HAMi/HAMi/blob/master/charts/hami/templates/scheduler/device-configmap.yaml):
+```yaml
+iluvatars:
+  - chipName: MR-V100
+    commonWord: MR-V100
+    resourceCountName: iluvatar.ai/MR-V100-vgpu
+    resourceMemoryName: iluvatar.ai/MR-V100.vMem
+    resourceCoreName: iluvatar.ai/MR-V100.vCore
+  - chipName: MR-V50
+    commonWord: MR-V50
+    resourceCountName: iluvatar.ai/MR-V50-vgpu
+    resourceMemoryName: iluvatar.ai/MR-V50.vMem
+    resourceCoreName: iluvatar.ai/MR-V50.vCore
+  - chipName: BI-V150
+    commonWord: BI-V150
+    resourceCountName: iluvatar.ai/BI-V150-vgpu
+    resourceMemoryName: iluvatar.ai/BI-V150.vMem
+    resourceCoreName: iluvatar.ai/BI-V150.vCore
+  - chipName: BI-V100
+    commonWord: BI-V100
+    resourceCountName: iluvatar.ai/BI-V100-vgpu
+    resourceMemoryName: iluvatar.ai/BI-V100.vMem
+    resourceCoreName: iluvatar.ai/BI-V100.vCore
+```
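After installation, it helps to confirm that the scheduler actually loaded this device list. A minimal check, assuming the chart release is named `hami` and the rendered ConfigMap is `hami-scheduler-device` (both names may differ in your cluster):

```bash
# Sketch: verify the HAMi pods are running and inspect the loaded device config.
# The release and ConfigMap names below are assumptions; adjust for your install.
kubectl get pods -n kube-system | grep hami
kubectl -n kube-system get configmap hami-scheduler-device -o yaml
```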
 
 ## Device Granularity
 
 HAMi divides each Iluvatar GPU into 100 units for resource allocation. When you request a portion of a GPU, you're actually requesting a certain number of these units.
 
 ### Memory Allocation
 
-- Each unit of `iluvatar.ai/vcuda-memory` represents 256MB of device memory
+- Each unit of `iluvatar.ai/<card-type>.vMem` represents 256MB of device memory
 - If you don't specify a memory request, the system will default to using 100% of the available memory
 - Memory allocation is enforced with hard limits to ensure tasks don't exceed their allocated memory
 
 ### Core Allocation
 
-- Each unit of `iluvatar.ai/vcuda-core` represents 1% of the available compute cores
+- Each unit of `iluvatar.ai/<card-type>.vCore` represents 1% of the available compute cores
 - Core allocation is enforced with hard limits to ensure tasks don't exceed their allocated cores
 - When requesting multiple GPUs, the system will automatically set the core resources based on the number of GPUs requested
 
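As a worked example of the unit math (an editorial sketch, not part of the commit): a 16GiB memory slice and half a GPU's compute translate into the values used throughout the examples below.

```bash
# Each vMem unit is 256MB, so a 16GiB memory request needs this many units:
echo $(( 16 * 1024 / 256 ))   # -> 64, i.e. iluvatar.ai/<card-type>.vMem: 64
# Each vCore unit is 1% of the compute cores, so half a GPU is
# iluvatar.ai/<card-type>.vCore: 50
```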
 ## Running Iluvatar jobs
 
 Iluvatar GPUs can now be requested by a container
-using the `iluvatar.ai/vgpu`, `iluvatar.ai/vcuda-memory` and `iluvatar.ai/vcuda-core` resource types:
+using the `iluvatar.ai/BI-V150-vgpu`, `iluvatar.ai/BI-V150.vMem` and `iluvatar.ai/BI-V150.vCore` resource types:
 
 ```yaml
 apiVersion: v1
 kind: Pod
 metadata:
-  name: poddemo
+  name: bi-v150-poddemo
 spec:
   restartPolicy: Never
   containers:
-  - name: poddemo
-    image: harbor.4pd.io/vgpu/corex_transformers@sha256:36a01ec452e6ee63c7aa08bfa1fa16d469ad19cc1e6000cf120ada83e4ceec1e
+  - name: bi-v150-poddemo
+    image: registry.iluvatar.com.cn:10443/saas/mr-bi150-4.3.0-x86-ubuntu22.04-py3.10-base-base:v1.0
     command:
     - bash
     args:
@@ -87,19 +101,17 @@ spec:
       sleep 360000
     resources:
       requests:
-        iluvatar.ai/vgpu: 1
-        iluvatar.ai/vcuda-core: 50
-        iluvatar.ai/vcuda-memory: 64
+        iluvatar.ai/BI-V150-vgpu: 1
+        iluvatar.ai/BI-V150.vCore: 50
+        iluvatar.ai/BI-V150.vMem: 64
       limits:
-        iluvatar.ai/vgpu: 1
-        iluvatar.ai/vcuda-core: 50
-        iluvatar.ai/vcuda-memory: 64
+        iluvatar.ai/BI-V150-vgpu: 1
+        iluvatar.ai/BI-V150.vCore: 50
+        iluvatar.ai/BI-V150.vMem: 64
 ```
 
 > **NOTICE1:** *Each `iluvatar.ai/<card-type>.vMem` unit indicates 256MB of device memory*
 
-> **NOTICE2:** *You can find more examples in the [examples/iluvatar folder](https://github.com/Project-HAMi/HAMi/tree/release-v2.6/examples/iluvatar/)*
-
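To try the example above, a minimal submit-and-verify sequence could look like the following; the manifest file name is illustrative, and the `hami.io/` annotation prefix follows the Device UUID sections below.

```bash
# Sketch: submit the example pod and inspect the allocation the scheduler recorded.
kubectl apply -f bi-v150-poddemo.yaml
kubectl get pod bi-v150-poddemo -o yaml | grep -A 10 "hami.io/"
```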
 ## Device UUID Selection
 
 You can specify which GPU devices to use or exclude using annotations:
 
@@ -111,15 +123,13 @@ metadata:
   name: poddemo
   annotations:
     # Use specific GPU devices (comma-separated list)
-    iluvatar.ai/use-gpuuuid: "node1-iluvatar-0,node1-iluvatar-1"
+    hami.io/use-<card-type>-uuid: "device-uuid-1,device-uuid-2"
     # Or exclude specific GPU devices (comma-separated list)
-    iluvatar.ai/nouse-gpuuuid: "node1-iluvatar-2,node1-iluvatar-3"
+    hami.io/no-use-<card-type>-uuid: "device-uuid-1,device-uuid-2"
 spec:
   # ... rest of pod spec
 ```
 
-> **NOTE:** The device ID format is `{node-name}-iluvatar-{index}`. You can find the available device IDs in the node status.
-
 ### Finding Device UUIDs
 
 You can find the UUIDs of Iluvatar GPUs on a node using the following command:
@@ -131,7 +141,7 @@ kubectl get pod <pod-name> -o yaml | grep -A 10 "hami.io/<card-type>-devices-all
 
 Or by examining the node annotations:
 
 ```bash
-kubectl get node <node-name> -o yaml | grep -A 10 "hami.io/node-register-<card-type>"
+kubectl get node <node-name> -o yaml | grep -A 10 "hami.io/node-<card-type>-register"
 ```
 
 Look for annotations containing device information in the node status.
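If the grep output is noisy, the register annotation can also be read directly with jsonpath. A sketch, assuming `BI-V150` as the card type (dots in the annotation key are escaped with `\.`):

```bash
# Sketch: print only the raw device-register annotation for one card type.
kubectl get node <node-name> \
  -o jsonpath='{.metadata.annotations.hami\.io/node-BI-V150-register}'
```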
@@ -149,6 +159,6 @@ Look for annotations containing device information in the node status.
 
 2. Virtualization takes effect only for containers that apply for one GPU (i.e. `iluvatar.ai/<card-type>-vgpu=1`). When requesting multiple GPUs, the system will automatically set the core resources based on the number of GPUs requested.
 
-3. The `iluvatar.ai/vcuda-memory` resource is only effective when `iluvatar.ai/vgpu=1`.
+3. The `iluvatar.ai/<card-type>.vMem` resource is only effective when `iluvatar.ai/<card-type>-vgpu=1`.
 
-4. Multi-device requests (`iluvatar.ai/vgpu > 1`) do not support vGPU mode.
+4. Multi-device requests (`iluvatar.ai/<card-type>-vgpu > 1`) do not support vGPU mode.
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+---
+title: Allocate BI-V150 slice
+---
+
+To allocate a slice of GPU core and memory to a container, set the GPU core resource `iluvatar.ai/BI-V150.vCore` and the GPU memory resource `iluvatar.ai/BI-V150.vMem`.
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: bi-v150-poddemo
+spec:
+  restartPolicy: Never
+  containers:
+  - name: bi-v150-poddemo
+    image: registry.iluvatar.com.cn:10443/saas/mr-bi150-4.3.0-x86-ubuntu22.04-py3.10-base-base:v1.0
+    command:
+    - bash
+    args:
+    - -c
+    - |
+      set -ex
+      echo "export LD_LIBRARY_PATH=/usr/local/corex/lib64:$LD_LIBRARY_PATH" >> /root/.bashrc
+      cp -f /usr/local/iluvatar/lib64/libcuda.* /usr/local/corex/lib64/
+      cp -f /usr/local/iluvatar/lib64/libixml.* /usr/local/corex/lib64/
+      source /root/.bashrc
+      sleep 360000
+    resources:
+      requests:
+        iluvatar.ai/BI-V150-vgpu: 1
+        iluvatar.ai/BI-V150.vCore: 50
+        iluvatar.ai/BI-V150.vMem: 64
+      limits:
+        iluvatar.ai/BI-V150-vgpu: 1
+        iluvatar.ai/BI-V150.vCore: 50
+        iluvatar.ai/BI-V150.vMem: 64
+```
+
+> **NOTE:** *Each `iluvatar.ai/<card-type>.vCore` unit represents 1% of an available compute core, and each `iluvatar.ai/<card-type>.vMem` unit represents 256MB of device memory.*
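Once the pod is running, the slice can be checked from inside the container. A sketch, assuming the Corex image ships the `ixsmi` tool:

```bash
# Sketch: the container should see roughly 16GiB (64 x 256MB) of device memory.
kubectl exec -it bi-v150-poddemo -- ixsmi
```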
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+---
+title: Allocate MR-V100 slice
+translated: true
+---
+
+To allocate a slice of GPU core and memory to a container, set the GPU core resource `iluvatar.ai/MR-V100.vCore` and the GPU memory resource `iluvatar.ai/MR-V100.vMem`.
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: mr-v100-poddemo
+spec:
+  restartPolicy: Never
+  containers:
+  - name: mr-v100-poddemo
+    image: registry.iluvatar.com.cn:10443/saas/mr-bi150-4.3.0-x86-ubuntu22.04-py3.10-base-base:v1.0
+    command:
+    - bash
+    args:
+    - -c
+    - |
+      set -ex
+      echo "export LD_LIBRARY_PATH=/usr/local/corex/lib64:$LD_LIBRARY_PATH" >> /root/.bashrc
+      cp -f /usr/local/iluvatar/lib64/libcuda.* /usr/local/corex/lib64/
+      cp -f /usr/local/iluvatar/lib64/libixml.* /usr/local/corex/lib64/
+      source /root/.bashrc
+      sleep 360000
+    resources:
+      requests:
+        iluvatar.ai/MR-V100-vgpu: 1
+        iluvatar.ai/MR-V100.vCore: 50
+        iluvatar.ai/MR-V100.vMem: 64
+      limits:
+        iluvatar.ai/MR-V100-vgpu: 1
+        iluvatar.ai/MR-V100.vCore: 50
+        iluvatar.ai/MR-V100.vMem: 64
+```
+
+> **NOTE:** *Each `iluvatar.ai/<card-type>.vCore` unit represents 1% of an available compute core, and each `iluvatar.ai/<card-type>.vMem` unit represents 256MB of device memory.*

docs/userguide/Iluvatar-device/examples/allocate-device-core-and-memory-to-container.md

Lines changed: 0 additions & 35 deletions
This file was deleted.
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+---
+title: Allocate exclusive BI-V150 device
+---
+
+To allocate multiple BI-V150 devices exclusively, set only `iluvatar.ai/BI-V150-vgpu`; no other fields are required.
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: bi-v150-poddemo
+spec:
+  restartPolicy: Never
+  containers:
+  - name: bi-v150-poddemo
+    image: registry.iluvatar.com.cn:10443/saas/mr-bi150-4.3.0-x86-ubuntu22.04-py3.10-base-base:v1.0
+    command:
+    - bash
+    args:
+    - -c
+    - |
+      set -ex
+      echo "export LD_LIBRARY_PATH=/usr/local/corex/lib64:$LD_LIBRARY_PATH" >> /root/.bashrc
+      cp -f /usr/local/iluvatar/lib64/libcuda.* /usr/local/corex/lib64/
+      cp -f /usr/local/iluvatar/lib64/libixml.* /usr/local/corex/lib64/
+      source /root/.bashrc
+      sleep 360000
+    resources:
+      requests:
+        iluvatar.ai/BI-V150-vgpu: 2
+      limits:
+        iluvatar.ai/BI-V150-vgpu: 2
+```
+> **Note:** *When requesting exclusive use of a single GPU (`iluvatar.ai/<card-type>-vgpu=1`), set `iluvatar.ai/<card-type>.vCore` and `iluvatar.ai/<card-type>.vMem` to the device's maximum values. With `iluvatar.ai/<card-type>-vgpu > 1`, the vGPU function is no longer supported, so the core and memory values are not needed.*
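Before requesting two devices, it is worth checking how many vGPUs the node actually advertises. A sketch (dots in the resource name are escaped with `\.`):

```bash
# Sketch: print the node's allocatable BI-V150 vGPU count.
kubectl get node <node-name> \
  -o jsonpath='{.status.allocatable.iluvatar\.ai/BI-V150-vgpu}'
```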
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+---
+title: Allocate exclusive MR-V100 device
+---
+
+To allocate multiple MR-V100 devices exclusively, set only `iluvatar.ai/MR-V100-vgpu`; no other fields are required.
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: mr-v100-poddemo
+spec:
+  restartPolicy: Never
+  containers:
+  - name: mr-v100-poddemo
+    image: registry.iluvatar.com.cn:10443/saas/mr-bi150-4.3.0-x86-ubuntu22.04-py3.10-base-base:v1.0
+    command:
+    - bash
+    args:
+    - -c
+    - |
+      set -ex
+      echo "export LD_LIBRARY_PATH=/usr/local/corex/lib64:$LD_LIBRARY_PATH" >> /root/.bashrc
+      cp -f /usr/local/iluvatar/lib64/libcuda.* /usr/local/corex/lib64/
+      cp -f /usr/local/iluvatar/lib64/libixml.* /usr/local/corex/lib64/
+      source /root/.bashrc
+      sleep 360000
+    resources:
+      requests:
+        iluvatar.ai/MR-V100-vgpu: 2
+      limits:
+        iluvatar.ai/MR-V100-vgpu: 2
+```
+> **Note:** *When requesting exclusive use of a single GPU (`iluvatar.ai/<card-type>-vgpu=1`), set `iluvatar.ai/<card-type>.vCore` and `iluvatar.ai/<card-type>.vMem` to the device's maximum values. With `iluvatar.ai/<card-type>-vgpu > 1`, the vGPU function is no longer supported, so the core and memory values are not needed.*

docs/userguide/Iluvatar-device/examples/allocate-exclusive.md

Lines changed: 0 additions & 35 deletions
This file was deleted.
