When calculating the available tokens for a newly launched kernel, does it only use the utilization of vGPU card No. 0?

**Test Environment:**
GPU Card:  NVIDIA L20
GPU Driver:  550.54.15
CUDA Version: 12.04
OS: Ubuntu 22.04
Volcano: 1.12.1
Volcano vgpu device plugin: v1.10.0
**HAMi core: master branch at "[Merge pull request #103 from Project-HAMi/fix_v2.6.0](https://github.com/Project-HAMi/HAMi-core/commit/1aa6bdf32b7b6337067d5747c514149d2d094d3b)" (https://github.com/Project-HAMi/HAMi-core/pull/103)**


**Test case**
Volcano vGPU device plugin GPU memory virtualization parameter: **--gpu-memory-factor=2**; GPU virtualization mode: **hami-core**.

I deploy a Pod that has one container with **2 vGPU cards, each with 30% vGPU cores and 42000 MiB of vGPU memory** (21000 × the memory factor of 2).


The resource declaration looks like this:
...
    resources:
      limits:
        volcano.sh/vgpu-number: 2
        volcano.sh/vgpu-memory: 21000
        volcano.sh/vgpu-cores: 30
....

The Pod has the env variable "GPU_CORE_UTILIZATION_POLICY" set:
    - name: GPU_CORE_UTILIZATION_POLICY
      value: "force"


**Question:**
I found that the vGPU core utilization for the 2 vGPU cards is not limited correctly: it seems that hami-core cannot enforce the limit, and utilization ends up about 20% higher than the desired limit. Sometimes vGPU card 0 is limited but the other card never is; at other times, neither card is limited.

For Pods that request only one card, core utilization control works better.

**Why can't a Pod with one container using 2 (or more) vGPU cards be limited?**

Digging into the source code, I found that the rate limiter uses only **vGPU card 0** [delta(upper_limit, **userutil[0]**, share);] to calculate the available tokens. Is something wrong here? What happens when the physical cards differ noticeably in performance or workload, or when a **non-0** vGPU card is running heavy kernels while vGPU card 0 is lightly loaded?

void* utilization_watcher() {
    nvmlInit();
    int userutil[CUDA_DEVICE_MAX_COUNT];
    int sysprocnum;
    long share = 0;
    int upper_limit = get_current_device_sm_limit(0);
    ensure_initialized();
    LOG_DEBUG("upper_limit=%d\n",upper_limit);
    while (1){
        nanosleep(&g_wait, NULL);
        if (pidfound==0) {
          update_host_pid();
          if (pidfound==0)
            continue;
        }
        init_gpu_device_utilization();
        get_used_gpu_utilization(userutil,&sysprocnum);
        //if (sysprocnum == 1 &&
        //    userutil < upper_limit / 10) {
        //    g_cur_cuda_cores =
        //        delta(upper_limit, userutil, share);
        //    continue;
        //}
        if ((share==g_total_cuda_cores) && (g_cur_cuda_cores<0)) {
          g_total_cuda_cores *= 2;
          share = g_total_cuda_cores;
        }
        if ((userutil[0]<=100) && (userutil[0]>=0)){
          share = delta(upper_limit, userutil[0], share);  // only device 0's utilization is used here
          change_token(share);
        }
        LOG_INFO("userutil1=%d currentcores=%ld total=%ld limit=%d share=%ld\n",userutil[0],g_cur_cuda_cores,g_total_cuda_cores,upper_limit,share);
    }
}
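
For illustration only, here is a minimal sketch of what per-device accounting could look like, so that each card's share is adjusted from its own utilization rather than from userutil[0]. It is not HAMi-core code: the device_limiter struct, adjust_share() and update_all_devices() are hypothetical names, and adjust_share() is a simplified proportional step standing in for delta(), whose real behavior is not reproduced here. Note that in the snippet above upper_limit is also read only for device 0 (get_current_device_sm_limit(0)), so the sketch keeps a separate limit per device as well.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-device limiter state; not a HAMi-core type. */
typedef struct {
    int  upper_limit; /* SM utilization limit for this device, in percent */
    long share;       /* tokens currently granted to this device */
    long total;       /* upper bound on share for this device */
} device_limiter;

/* Simplified proportional step, used here as a stand-in for delta():
 * grow the share while utilization is below the limit, shrink it while
 * utilization is above, and clamp the result to [0, total]. */
static long adjust_share(int upper_limit, int util, long share, long total)
{
    long next = share + (long)(upper_limit - util) * total / 100;
    if (next < 0)
        next = 0;
    if (next > total)
        next = total;
    return next;
}

/* Update every device from its own utilization sample.
 * Samples outside 0..100 are skipped, mirroring the range check
 * applied to userutil[0] in the watcher loop. */
static void update_all_devices(device_limiter *lim, const int *util, size_t n)
{
    assert(lim != NULL && util != NULL);
    for (size_t i = 0; i < n; i++) {
        if (util[i] < 0 || util[i] > 100)
            continue;
        lim[i].share = adjust_share(lim[i].upper_limit, util[i],
                                    lim[i].share, lim[i].total);
    }
}
```

With this shape, a lightly loaded card 0 would no longer raise the token budget of a heavily loaded card 1, because each device's share moves only in response to its own sample. How tokens would then be consumed per device at kernel launch is outside this sketch.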



I would sincerely appreciate your answers!
