Prerequisites
- I have searched existing issues and reviewed documentation.
Problem Description
May I ask which parallelism techniques are currently implemented? When I set `CUDA_VISIBLE_DEVICES=0,1,2,3`, all four cards carry some load and are doing computation, yet the TODO list says expert parallelism is still being implemented, so I don't quite understand the program's current behavior. Here is the status of my compute cards (a sketch of how I would cross-check the per-device allocation follows the log):
```
(test) moeserve@test~/MoE-Infinity$ CUDA_VISIBLE_DEVICES=0,1,2,3 python examples/interface_example.py --model_name_or_path "/home/moeserve/.cache/modelscope/hub/models/AI-ModelScope/Mixtral-8x7B-Instruct-v0.1" --offload_dir ~/offload/
Every 1.0s: nvidia-smi test-NF5468-A7-A0-R0-00: Wed Mar 19 10:45:27 2025
Wed Mar 19 10:45:27 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.86.15 Driver Version: 570.86.15 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4090 Off | 00000000:01:00.0 Off | Off |
| 31% 44C P0 106W / 450W | 23974MiB / 49140MiB | 12% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA GeForce RTX 4090 Off | 00000000:41:00.0 Off | Off |
| 30% 43C P0 100W / 450W | 23138MiB / 49140MiB | 10% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA GeForce RTX 4090 Off | 00000000:81:00.0 Off | Off |
| 31% 42C P0 88W / 450W | 23158MiB / 49140MiB | 11% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA GeForce RTX 4090 Off | 00000000:C1:00.0 Off | Off |
| 30% 43C P0 82W / 450W | 23408MiB / 49140MiB | 11% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 3328026 C python 23790MiB |
| 1 N/A N/A 3328026 C python 22954MiB |
| 2 N/A N/A 3328026 C python 22974MiB |
| 3 N/A N/A 3328026 C python 23224MiB |
+-----------------------------------------------------------------------------------------+
```
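For reference, here is a minimal sketch (plain PyTorch only, not using any MoE-Infinity internals; device indices assumed to map to the four RTX 4090s above via `CUDA_VISIBLE_DEVICES=0,1,2,3`) of how the per-device allocation could be cross-checked from inside the process:

```python
import torch

# Query allocated/reserved CUDA memory on each device visible to this process.
# With CUDA_VISIBLE_DEVICES=0,1,2,3, indices 0-3 here correspond to the
# four GPUs shown in the nvidia-smi output above.
for i in range(torch.cuda.device_count()):
    allocated_mib = torch.cuda.memory_allocated(i) / 1024**2
    reserved_mib = torch.cuda.memory_reserved(i) / 1024**2
    print(f"cuda:{i} ({torch.cuda.get_device_name(i)}): "
          f"{allocated_mib:.0f} MiB allocated, {reserved_mib:.0f} MiB reserved")
```

What prompted the question is that a single PID (3328026 in the log) holds roughly 23 GiB on every one of the four devices, so some form of multi-GPU placement is clearly happening.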
Proposed Solution
I
Alternatives Considered
No response
Additional Context
No response
Importance
Nice to have
Usage Statistics (Optional)
No response