Commit a807b62

Merge pull request #89 from windsonsea/metfor
Clean up userguide/Metax-device/ pages
2 parents 401d7b4 + ec7aa67 commit a807b62

10 files changed: +46 -62 lines changed


docs/userguide/Metax-device/Metax-GPU/enable-metax-gpu-schedule.md

Lines changed: 17 additions & 15 deletions
@@ -2,28 +2,30 @@
title: Enable Metax GPU topology-aware scheduling
---

-## Introduction
+**HAMi now supports metax.com/gpu by implementing topo-awareness among metax GPUs.**

-**we now support metax.com/gpu by implementing topo-awareness among metax GPUs**
-
-When multiple GPUs are configured on a single server, the GPU cards are connected to the same PCIe Switch or MetaXLink depending on whether they are connected
-, there is a near-far relationship. This forms a topology among all the cards on the server, as shown in the following figure:
+When multiple GPUs are configured on a single server, the GPU cards are connected to the same PCIe Switch or MetaXLink.
+Depending on the connection type, a near-far relationship is formed among the GPUs.
+Together, these connections define the topology of the GPU cards on the server, as shown below:

![img](https://github.com/Project-HAMi/HAMi/raw/master/imgs/metax_topo.png)

-A user job requests a certain number of metax-tech.com/gpu resources, Kubernetes schedule pods to the appropriate node. gpu-device further processes the logic of allocating the remaining resources on the resource node following criterias below:
-1. MetaXLink takes precedence over PCIe Switch in two way:
-– A connection is considered a MetaXLink connection when there is a MetaXLink connection and a PCIe Switch connection between the two cards.
-– When both the MetaXLink and the PCIe Switch can meet the job request
-Equipped with MetaXLink interconnected resources.
+When a user job requests a specific number of `metax-tech.com/gpu` resources,
+Kubernetes schedules the pod to a suitable node. On that node,
+the GPU device plugin (gpu-device) handles fine-grained allocation based on the following criteria:
+
+1. MetaXLink takes precedence over PCIe Switch in two ways:
+
+- A connection is considered a MetaXLink connection when there is both a MetaXLink connection and a PCIe Switch connection between the two cards.
+- When both the MetaXLink and the PCIe Switch can meet the job request, MetaXLink-interconnected resources are allocated first.

-2. When using `node-scheduler-policy=spread` , Allocate Metax resources to be under the same Metaxlink or Paiswich as much as possible, as the following figure shows:
+2. When using `node-scheduler-policy=spread`, allocate Metax resources under the same MetaXLink or PCIe Switch as much as possible, as shown below:

-![img](https://github.com/Project-HAMi/HAMi/raw/master/imgs/metax_spread.png)
+![img](https://github.com/Project-HAMi/HAMi/raw/master/imgs/metax_spread.png)

-3. When using `node-scheduler-policy=binpack`, Assign GPU resources, so minimize the damage to MetaxXLink topology, as the following figure shows:
+3. When using `node-scheduler-policy=binpack`, assign GPU resources so as to minimize damage to the MetaXLink topology, as shown below:

-![img](https://github.com/Project-HAMi/HAMi/raw/master/imgs/metax_binpack.png)
+![img](https://github.com/Project-HAMi/HAMi/raw/master/imgs/metax_binpack.png)

## Important Notes

@@ -45,7 +47,7 @@ Equipped with MetaXLink interconnected resources.
## Running Metax jobs

Metax GPUs can now be requested by a container
-using the `metax-tech.com/gpu` resource type:
+using the `metax-tech.com/gpu` resource type:

```yaml
apiVersion: v1
docs/userguide/Metax-device/Metax-GPU/examples/allocate-binpack.md

Lines changed: 3 additions & 5 deletions
@@ -2,11 +2,9 @@
title: Binpack schedule policy
---

-## Allocate metax device using binpack schedule policy
+To allocate a Metax device with minimum damage to topology, you only need to assign `metax-tech.com/gpu` with the annotation `hami.io/node-scheduler-policy: "binpack"`.

-To allocate metax device with mininum damage to topology, you need to only assign `metax-tech.com/gpu` with annotations `hami.io/node-scheduler-policy`=`binpack`
-
-```
+```yaml
apiVersion: v1
kind: Pod
metadata:
@@ -22,4 +20,4 @@ spec:
resources:
limits:
metax-tech.com/gpu: 1 # requesting 1 metax GPU
-```
+```
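
The hunks above omit the middle of the example manifest. A sketch of how the binpack annotation and the GPU request might fit together in a single pod spec (pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: binpack-gpu-pod                        # illustrative name
  annotations:
    hami.io/node-scheduler-policy: "binpack"   # scheduler minimizes topology loss
spec:
  containers:
    - name: ubuntu-container
      image: ubuntu:22.04                      # illustrative image
      resources:
        limits:
          metax-tech.com/gpu: 1                # requesting 1 metax GPU
```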

docs/userguide/Metax-device/Metax-GPU/examples/allocate-spread.md

Lines changed: 3 additions & 5 deletions
@@ -2,11 +2,9 @@
title: Spread schedule policy
---

-## Allocate metax device using spread schedule policy
+To allocate Metax devices with the best performance, you only need to assign `metax-tech.com/gpu` with the annotation `hami.io/node-scheduler-policy: "spread"`.

-To allocate metax device with best performance, you need to only assign `metax-tech.com/gpu` with annotations `hami.io/node-scheduler-policy`=`spread`
-
-```
+```yaml
apiVersion: v1
kind: Pod
metadata:
@@ -22,4 +20,4 @@ spec:
resources:
limits:
metax-tech.com/gpu: 4 # requesting 4 metax GPUs
-```
+```
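
Likewise, a sketch combining the spread annotation with the 4-GPU request shown in this diff (pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: spread-gpu-pod                        # illustrative name
  annotations:
    hami.io/node-scheduler-policy: "spread"   # scheduler looks for the best topology
spec:
  containers:
    - name: ubuntu-container
      image: ubuntu:22.04                     # illustrative image
      resources:
        limits:
          metax-tech.com/gpu: 4               # requesting 4 metax GPUs
```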

docs/userguide/Metax-device/Metax-GPU/examples/default-use.md

Lines changed: 2 additions & 4 deletions
@@ -2,11 +2,9 @@
title: Allocate metax device
---

-## Allocate metax device
-
To allocate metax device, you need to only assign `metax-tech.com/gpu` without other fields.

-```
+```yaml
apiVersion: v1
kind: Pod
metadata:
@@ -20,4 +18,4 @@ spec:
resources:
limits:
metax-tech.com/gpu: 1 # requesting 1 metax GPU
-```
+```

docs/userguide/Metax-device/Metax-GPU/specify-binpack-task.md

Lines changed: 2 additions & 4 deletions
@@ -2,11 +2,9 @@
title: Binpack schedule policy
---

-## Set schedule policy to binpack
+To allocate a Metax device with minimum damage to topology, you only need to assign `metax-tech.com/gpu` with the annotation `hami.io/node-scheduler-policy: "binpack"`.

-To allocate metax device with mininum damage to topology, you need to only assign `metax-tech.com/gpu` with annotations `hami.io/node-scheduler-policy`=`binpack`
-
-```
+```yaml
metadata:
annotations:
hami.io/node-scheduler-policy: "binpack" # when this parameter is set to binpack, the scheduler will try to minimize the topology loss.

docs/userguide/Metax-device/Metax-GPU/specify-spread-task.md

Lines changed: 2 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -2,11 +2,9 @@
22
title: Spread schedule policy
33
---
44

5-
## Set schedule policy to spread
5+
To allocate metax device with best performance, you need to only assign `metax-tech.com/gpu` with annotations `hami.io/node-scheduler-policy: "spread"`.
66

7-
To allocate metax device with best performance, you need to only assign `metax-tech.com/gpu` with annotations `hami.io/node-scheduler-policy`=`spread`
8-
9-
```
7+
```yaml
108
metadata:
119
annotations:
1210
hami.io/node-scheduler-policy: "spread" # when this parameter is set to spread, the scheduler will try to find the best topology for this task.

docs/userguide/Metax-device/Metax-sGPU/enable-metax-gpu-sharing.md

Lines changed: 9 additions & 11 deletions
@@ -3,32 +3,30 @@ title: Enable Metax GPU sharing
translated: true
---

-## Introduction
+**HAMi now supports metax.com/gpu by implementing most of the device-sharing features available for NVIDIA GPUs**, including the following:

-**we now support metax.com/gpu by implementing most device-sharing features as nvidia-GPU**, device-sharing features include the following:
+- **GPU Sharing**: Tasks can request a fraction of a GPU rather than the entire GPU card, allowing multiple tasks to share the same GPU.

-***GPU sharing***: Each task can allocate a portion of GPU instead of a whole GPU card, thus GPU can be shared among multiple tasks.
+- **Device Memory Control**: Tasks can be allocated a specific amount of GPU memory, with strict enforcement to ensure usage does not exceed the assigned limit.

-***Device Memory Control***: GPUs can be allocated with certain device memory size and have made it that it does not exceed the boundary.
+- **Compute Core Limiting**: Tasks can be allocated a specific percentage of GPU compute cores (e.g., `60` means the container can use 60% of the GPU’s compute cores).

-***Device compute core limitation***: GPUs can be allocated with certain percentage of device core(60 indicate this container uses 60% compute cores of this device)
-
-### Prerequisites
+## Prerequisites

* Metax Driver >= 2.32.0
* Metax GPU Operator >= 0.10.2
* Kubernetes >= 1.23

-### Enabling GPU-sharing Support
+## Enabling GPU-sharing support

-* Deploy Metax GPU Operator on metax nodes (Please consult your device provider to aquire its package and document)
+* Deploy Metax GPU Operator on metax nodes (Please consult your device provider to obtain the installation package and documentation)

* Deploy HAMi according to README.md

-### Running Metax jobs
+## Running Metax jobs

Metax GPUs can now be requested by a container
-using the `metax-tech.com/sgpu` resource type:
+using the `metax-tech.com/sgpu` resource type:

```yaml
apiVersion: v1

docs/userguide/Metax-device/Metax-sGPU/examples/allocate-exclusive.md

Lines changed: 0 additions & 2 deletions
@@ -3,8 +3,6 @@ title: Allocate exclusive device
translated: true
---

-## Allocate exclusive device
-
To allocate a whole Metax GPU device, you need to only assign `metax-tech.com/sgpu` without other fields.

```yaml

docs/userguide/Metax-device/Metax-sGPU/examples/allocate-qos-policy.md

Lines changed: 7 additions & 9 deletions
@@ -1,17 +1,15 @@
---
-title: Allocate specific Qos policy devices
+title: Allocate specific QoS policy devices
translated: true
---

-## Allocate specific Qos policy devices
+Users can configure the QoS policy for tasks using the `metax-tech.com/sgpu-qos-policy` annotation to specify the scheduling policy used by the shared GPU (sGPU). The available sGPU scheduling policies are described in the table below:

-Users can configure the Qos Policy parameter for tasks via `metax-tech.com/sgpu-qos-policy` to specify the scheduling policy used by the sGPU. The specific sGPU scheduling policy description can be found in the following table.
-
-| scheduling policy | description |
-| --- | --- |
-| `best-effort` | sGPU is no limit on computing power |
-| `fixed-share` | sGPU is a fixed computing power quota, and it cannot be used beyond the fixed quota |
-| `burst-share` | sGPU is a fixed computing power quota. If the GPU card still has idle computing power, it can be used by the sGPU |
+| Scheduling Policy | Description |
+|-------------------|-------------|
+| `best-effort` | The sGPU has no restriction on compute usage. |
+| `fixed-share` | The sGPU is assigned a fixed compute quota and cannot exceed this limit. |
+| `burst-share` | The sGPU is assigned a fixed compute quota, but may utilize additional GPU compute resources when they are idle. |

```yaml
apiVersion: v1
docs/userguide/Metax-device/Metax-sGPU/examples/default-use.md

Lines changed: 1 addition & 3 deletions
@@ -3,9 +3,7 @@ title: Allocate device core and memory resource
translated: true
---

-## Allocate device core and memory to container
-
-To allocate a certain part of device core resource, you need only to assign the `metax-tech.com/vcore` and `metax-tech.com/vmemory` along with the number of Metax GPUs you requested in the container using `metax-tech.com/sgpu`
+To allocate a certain part of device core resource, you need only to assign the `metax-tech.com/vcore` and `metax-tech.com/vmemory` along with the number of Metax GPUs you requested in the container using `metax-tech.com/sgpu`.

```yaml
apiVersion: v1
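
This manifest is likewise cut off at `apiVersion: v1`. A sketch of a pod requesting a slice of a shared GPU with explicit core and memory limits, per the sentence above (values are illustrative; the `vmemory` unit is whatever the Metax device plugin defines):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sgpu-slice-pod                 # illustrative name
spec:
  containers:
    - name: ubuntu-container
      image: ubuntu:22.04              # illustrative image
      resources:
        limits:
          metax-tech.com/sgpu: 1       # 1 shared GPU
          metax-tech.com/vcore: 60     # 60% of the GPU's compute cores
          metax-tech.com/vmemory: 4    # device memory request (unit per Metax device plugin)
```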
