Commit e7779eb

Merge pull request #88 from windsonsea/metaxy
Update version-v2.5.1-sidebars.json
2 parents 336391e + efbbd02 commit e7779eb

3 files changed: +119 -0 lines changed

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
{
  "version.label": {
    "message": "v2.5.1",
    "description": "The label for version v2.5.1"
  },
  "sidebar.docs.category.Core Concepts": {
    "message": "核心概念",
    "description": "The label for category Core Concepts in sidebar docs"
  },
  "sidebar.docs.category.Get Started": {
    "message": "开始使用",
    "description": "The label for category Get Started in sidebar docs"
  },
  "sidebar.docs.category.Installation": {
    "message": "安装",
    "description": "The label for category Installation in sidebar docs"
  },
  "sidebar.docs.category.User Guide": {
    "message": "用户指南",
    "description": "The label for category User Guide in sidebar docs"
  },
  "sidebar.docs.category.Monitoring": {
    "message": "监控",
    "description": "The label for category Monitoring in sidebar docs"
  },
  "sidebar.docs.category.Share NVIDIA GPU devices": {
    "message": "共享 NVIDIA GPU 设备",
    "description": "The label for category Share NVIDIA GPU devices in sidebar docs"
  },
  "sidebar.docs.category.Examples": {
    "message": "示例",
    "description": "The label for category Examples in sidebar docs"
  },
  "sidebar.docs.category.Share Cambricon MLU devices": {
    "message": "共享寒武纪 MLU 设备",
    "description": "The label for category Share Cambricon MLU devices in sidebar docs"
  },
  "sidebar.docs.category.Contributor Guide": {
    "message": "贡献者指南",
    "description": "The label for category Contributor Guide in sidebar docs"
  },
  "sidebar.docs.category.Developer Guide": {
    "message": "开发者指南",
    "description": "The label for category Developer Guide in sidebar docs"
  },
  "sidebar.docs.category.Key Features": {
    "message": "核心功能",
    "description": "The label for category Key Features in sidebar docs"
  },
  "sidebar.docs.category.Share Hygon DCU devices": {
    "message": "共享海光 DCU 设备",
    "description": "The label for category Share Hygon DCU devices in sidebar docs"
  },
  "sidebar.docs.category.Share Mthreads GPU devices": {
    "message": "共享摩尔线程 GPU 设备",
    "description": "The label for category Share Mthreads GPU devices in sidebar docs"
  },
  "sidebar.docs.category.Optimize Metax GPU scheduling": {
    "message": "优化沐曦 GPU 调度",
    "description": "The label for category Optimize Metax GPU scheduling in sidebar docs"
  },
  "sidebar.docs.category.Volcano vgpu support": {
    "message": "Volcano vGPU",
    "description": "The label for category Volcano vgpu support in sidebar docs"
  },
66+
"sidebar.docs.category.Share Ascend devices": {
67+
"message": "共享昇腾 GPU 设备",
68+
"description": "The label for category Share Ascend devices in sidebar docs"
69+
}
70+
}
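
A translation file of this shape is easy to break by omitting a field. The following is a minimal sketch, not part of the commit, of how such a file could be checked before merging. It assumes every entry is an object with non-empty `message` and `description` strings, as in the file above; the helper name `find_incomplete_entries` and the `SAMPLE` excerpt are hypothetical.

```python
import json
from typing import List

# Hypothetical excerpt that mirrors the structure of the translation file above.
SAMPLE = """
{
  "version.label": {
    "message": "v2.5.1",
    "description": "The label for version v2.5.1"
  },
  "sidebar.docs.category.Monitoring": {
    "message": "监控",
    "description": "The label for category Monitoring in sidebar docs"
  }
}
"""


def find_incomplete_entries(raw: str) -> List[str]:
    """Return the keys whose entry lacks a non-empty 'message' or 'description'."""
    data = json.loads(raw)
    incomplete = []
    for key, entry in data.items():
        # Each entry is assumed to be an object, as in the file above.
        if not isinstance(entry, dict):
            incomplete.append(key)
            continue
        for field in ("message", "description"):
            value = entry.get(field)
            if not isinstance(value, str) or not value.strip():
                incomplete.append(key)
                break
    return incomplete


# Print the keys that need attention (none for the sample above).
print(find_incomplete_entries(SAMPLE))
```

To check a real file, read it with `open(path, encoding="utf-8").read()` and pass the text to the same function.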
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
---
title: Enable Metax GPU sharing
---

**HAMi now supports sharing Metax GPUs by implementing most of the device-sharing features available for NVIDIA GPUs.** These features include the following:

- **GPU Sharing**: Tasks can request a fraction of a GPU rather than the entire GPU card, allowing multiple tasks to share the same GPU.

- **Device Memory Control**: Tasks can be allocated a specific amount of GPU memory, with strict enforcement to ensure usage does not exceed the assigned limit.

- **Compute Core Limiting**: Tasks can be allocated a specific percentage of GPU compute cores (e.g., `60` means the container can use 60% of the GPU’s compute cores).

## Prerequisites

* Metax Driver >= 2.31.0
* Metax GPU Operator >= 0.10.1
* Kubernetes >= 1.23

## Enabling GPU-sharing Support

* Deploy the Metax GPU Operator on Metax nodes (please consult your device provider to acquire its package and documentation)

* Deploy HAMi according to README.md

## Running Metax jobs

A container can now request Metax GPUs using the `metax-tech.com/sgpu` resource type:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod1
spec:
  containers:
    - name: ubuntu-container
      image: cr.metax-tech.com/public-ai-release/c500/colossalai:2.24.0.5-py38-ubuntu20.04-amd64
      imagePullPolicy: IfNotPresent
      command: ["sleep","infinity"]
      resources:
        limits:
          metax-tech.com/sgpu: 1 # requesting 1 GPU
          metax-tech.com/vcore: 60 # each GPU uses 60% of its compute cores
          metax-tech.com/vmemory: 4 # each GPU requires 4 GiB of device memory
```

> **NOTICE:** *You can find more examples in the examples/sgpu folder.*
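
The same resource limits can also be assembled programmatically. The following is a minimal sketch, not part of HAMi, that builds the Pod manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts in place of YAML. The helper name `metax_pod`, the placeholder image, and the accepted range of 1 to 100 for `vcore` are assumptions made for illustration; only the three `metax-tech.com/*` resource names come from the example above.

```python
import json
from typing import Any, Dict


def metax_pod(name: str, image: str, vcore_percent: int,
              vmemory_gib: int, sgpu: int = 1) -> Dict[str, Any]:
    """Build a Pod manifest that requests a shared Metax GPU (illustrative helper)."""
    # Assumption: vcore is a percentage of compute cores, so 1..100 is accepted here.
    if not 1 <= vcore_percent <= 100:
        raise ValueError("vcore_percent must be between 1 and 100")
    if vmemory_gib <= 0 or sgpu <= 0:
        raise ValueError("vmemory_gib and sgpu must be positive")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": "ubuntu-container",
                    "image": image,
                    "imagePullPolicy": "IfNotPresent",
                    "command": ["sleep", "infinity"],
                    "resources": {
                        "limits": {
                            "metax-tech.com/sgpu": sgpu,            # number of GPUs
                            "metax-tech.com/vcore": vcore_percent,  # % of compute cores
                            "metax-tech.com/vmemory": vmemory_gib,  # GiB of device memory
                        }
                    },
                }
            ]
        },
    }


# The image below is a hypothetical placeholder; substitute your own.
print(json.dumps(metax_pod("gpu-pod1", "example.com/my-image:latest", 60, 4), indent=2))
```

Validating the values before the manifest is built surfaces mistakes such as a `vcore` of `600` locally rather than at scheduling time.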

versioned_sidebars/version-v2.5.1-sidebars.json

Lines changed: 1 addition & 0 deletions
@@ -132,6 +132,7 @@
       "label": "Optimize Metax GPU scheduling",
       "items": [
         "userguide/Metax-device/enable-metax-gpu-schedule",
+        "userguide/Metax-device/enable-metax-gpu-sharing",
         "userguide/Metax-device/specify-binpack-task",
         "userguide/Metax-device/specify-spread-task",
         {
