> - `iluvatar.ai/<card-type>.vMem` for memory allocation
> - `iluvatar.ai/<card-type>.vCore` for core allocation
>
> You can customize these names in the device ConfigMap described below.

**Note:** The currently supported GPU models and resource names are defined in [device-configmap.yaml](https://github.com/Project-HAMi/HAMi/blob/master/charts/hami/templates/scheduler/device-configmap.yaml):
```yaml
iluvatars:
  - chipName: MR-V100
    commonWord: MR-V100
    resourceCountName: iluvatar.ai/MR-V100-vgpu
    resourceMemoryName: iluvatar.ai/MR-V100.vMem
    resourceCoreName: iluvatar.ai/MR-V100.vCore
  - chipName: MR-V50
    commonWord: MR-V50
    resourceCountName: iluvatar.ai/MR-V50-vgpu
    resourceMemoryName: iluvatar.ai/MR-V50.vMem
    resourceCoreName: iluvatar.ai/MR-V50.vCore
  - chipName: BI-V150
    commonWord: BI-V150
    resourceCountName: iluvatar.ai/BI-V150-vgpu
    resourceMemoryName: iluvatar.ai/BI-V150.vMem
    resourceCoreName: iluvatar.ai/BI-V150.vCore
  - chipName: BI-V100
    commonWord: BI-V100
    resourceCountName: iluvatar.ai/BI-V100-vgpu
    resourceMemoryName: iluvatar.ai/BI-V100.vMem
    resourceCoreName: iluvatar.ai/BI-V100.vCore
```
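If you want to confirm which models and resource names your running deployment actually recognizes, you can inspect the rendered ConfigMap on the cluster. A minimal sketch, assuming a default Helm install where the ConfigMap is named `hami-scheduler-device` in the `kube-system` namespace (both are assumptions; adjust to your setup):

```bash
# Print the device configuration the HAMi scheduler is currently using.
# The ConfigMap name and namespace are assumed from a default install.
kubectl get configmap hami-scheduler-device -n kube-system -o yaml
```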
## Device Granularity
HAMi divides each Iluvatar GPU into 100 units for resource allocation. When you request a portion of a GPU, you're actually requesting a certain number of these units.
### Memory Allocation
- Each unit of `iluvatar.ai/<card-type>.vMem` represents 256MB of device memory
- If you don't specify a memory request, the system will default to using 100% of the available memory
- Memory allocation is enforced with hard limits to ensure tasks don't exceed their allocated memory
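To make the memory arithmetic concrete, here is a minimal sketch (the card type and size are purely illustrative): requesting roughly 8 GiB on a BI-V150 means 8192 MB / 256 MB per unit = 32 units.

```yaml
# Illustrative only: ~8 GiB of device memory on one BI-V150
# (8192 MB / 256 MB per vMem unit = 32 units).
resources:
  limits:
    iluvatar.ai/BI-V150-vgpu: 1
    iluvatar.ai/BI-V150.vMem: 32
```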
### Core Allocation
- Each unit of `iluvatar.ai/<card-type>.vCore` represents 1% of the available compute cores
- Core allocation is enforced with hard limits to ensure tasks don't exceed their allocated cores
- When requesting multiple GPUs, the system will automatically set the core resources based on the number of GPUs requested
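Similarly, a sketch for compute (again illustrative): reserving half of one card's compute cores means 50 `vCore` units.

```yaml
# Illustrative only: 50% of one BI-V150's compute cores.
resources:
  limits:
    iluvatar.ai/BI-V150-vgpu: 1
    iluvatar.ai/BI-V150.vCore: 50
```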
## Running Iluvatar jobs
Iluvatar GPUs can now be requested by a container using the `iluvatar.ai/BI-V150-vgpu`, `iluvatar.ai/BI-V150.vMem` and `iluvatar.ai/BI-V150.vCore` resource types:
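The following is a minimal pod sketch; the image, command and request sizes are illustrative rather than taken from the HAMi repository:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: iluvatar-vgpu-example   # illustrative name
spec:
  containers:
    - name: workload
      image: ubuntu:22.04        # replace with your own workload image
      command: ["sleep", "infinity"]
      resources:
        limits:
          iluvatar.ai/BI-V150-vgpu: 1    # one vGPU slice of a BI-V150
          iluvatar.ai/BI-V150.vCore: 50  # 50% of the card's compute cores
          iluvatar.ai/BI-V150.vMem: 64   # 64 x 256 MB = 16 GiB of device memory
```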
> **NOTE:** The device ID format is `{node-name}-iluvatar-{index}`. You can find the available device IDs in the node status.
### Finding Device UUIDs
You can find the UUIDs of Iluvatar GPUs on a node using the following command:
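A sketch of the query (the pod annotation key `hami.io/<card-type>-devices-allocated` is an assumption; verify the exact key on your cluster):

```bash
# Assumed annotation key; shows the device UUIDs allocated to the pod.
kubectl get pod <pod-name> -o yaml | grep -A 10 "hami.io/<card-type>-devices-allocated"
```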
Or by examining the node annotations:
```bash
kubectl get node <node-name> -o yaml | grep -A 10 "hami.io/node-<card-type>-register"
```
Look for annotations containing device information in the node status.
2. Virtualization takes effect only for containers that request a single GPU (i.e. `iluvatar.ai/<card-type>-vgpu=1`). When requesting multiple GPUs, the system will automatically set the core resources based on the number of GPUs requested.
3. The `iluvatar.ai/<card-type>.vMem` resource is only effective when `iluvatar.ai/<card-type>-vgpu=1`.
4. Multi-device requests (`iluvatar.ai/<card-type>-vgpu > 1`) do not support vGPU mode.
To allocate core and memory resources for the container, you only need to set the desired amount of GPU core (for example `iluvatar.ai/BI-V150.vCore` or `iluvatar.ai/MR-V100.vCore`) and GPU memory (for example `iluvatar.ai/BI-V150.vMem` or `iluvatar.ai/MR-V100.vMem`).

> **NOTE:** *Each `iluvatar.ai/<card-type>.vCore` unit represents 1% of an available compute core, and each `iluvatar.ai/<card-type>.vMem` unit represents 256MB of device memory.*

> **Note:** *When applying for exclusive use of a GPU (`iluvatar.ai/<card-type>-vgpu=1`), you need to set `iluvatar.ai/<card-type>.vCore` and `iluvatar.ai/<card-type>.vMem` to the maximum GPU resource values. `iluvatar.ai/<card-type>-vgpu>1` no longer supports the vGPU function, so you don't need to fill in the core and memory values.*
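To make the multi-GPU case concrete, a sketch (card type illustrative); no `.vMem` or `.vCore` limits are set because vGPU mode does not apply:

```yaml
# Illustrative only: two whole BI-V150 cards; vGPU sharing is disabled,
# so vMem/vCore limits are omitted.
resources:
  limits:
    iluvatar.ai/BI-V150-vgpu: 2
```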