When calculating the available tokens for a newly launched kernel, does it only use the utilization of vGPU card No. 0? #104

@foshanck

Description

Test Environment:
GPU Card: NVIDIA L20
GPU Driver: 550.54.15
CUDA Version: 12.04
OS: Ubuntu 22.04
Volcano: 1.12.1
Volcano vgpu device plugin: v1.10.0
HAMi core: master branch with "Merge pull request #103 from Project-HAMi/fix_v2.6.0"

Test case
Volcano vgpu device plugin GPU memory virtualization parameter: --gpu-memory-factor=2; GPU virtualization mode: hami-core.

I deploy a Pod which has one container with 2 vGPU cards, 30% vGPU cores, and 42000 MiB vGPU memory for each card.

The resource declaration looks like this:

...
resources:
  limits:
    volcano.sh/vgpu-number: 2
    volcano.sh/vgpu-memory: 21000
    volcano.sh/vgpu-cores: 30
...

The Pod has the env GPU_CORE_UTILIZATION_POLICY set:
  - name: GPU_CORE_UTILIZATION_POLICY
    value: "force"

Question:
I found that the vGPU core utilization of the 2 vGPU cards is not correct: it seems hami-core cannot limit the utilization; it runs about 20% higher than the desired limit. Sometimes vGPU card 0 can be limited but the other card never can; other times, neither card can be limited.

For Pods that request only one card, core utilization is controlled much better.

Why can a Pod with one container using 2 (or more) vGPU cards not be limited?

Diving into the source code, I found that when doing the rate limit, it only uses vGPU card 0 [delta(upper_limit, userutil[0], share);] to calculate the available tokens. Is something wrong here? What happens when each physical card's performance or workload differs noticeably, or when a non-zero vGPU card is running a heavy kernel while vGPU card 0 is lightly loaded?

void* utilization_watcher() {
    nvmlInit();
    int userutil[CUDA_DEVICE_MAX_COUNT];
    int sysprocnum;
    long share = 0;
    int upper_limit = get_current_device_sm_limit(0);
    ensure_initialized();
    LOG_DEBUG("upper_limit=%d\n", upper_limit);
    while (1) {
        nanosleep(&g_wait, NULL);
        if (pidfound == 0) {
            update_host_pid();
            if (pidfound == 0)
                continue;
        }
        init_gpu_device_utilization();
        get_used_gpu_utilization(userutil, &sysprocnum);
        //if (sysprocnum == 1 &&
        //    userutil < upper_limit / 10) {
        //  g_cur_cuda_cores =
        //      delta(upper_limit, userutil, share);
        //  continue;
        //}
        if ((share == g_total_cuda_cores) && (g_cur_cuda_cores < 0)) {
            g_total_cuda_cores *= 2;
            share = g_total_cuda_cores;
        }
        if ((userutil[0] <= 100) && (userutil[0] >= 0)) {
            share = delta(upper_limit, userutil[0], share);
            change_token(share);
        }
        LOG_INFO("userutil1=%d currentcores=%ld total=%ld limit=%d share=%ld\n",
                 userutil[0], g_cur_cuda_cores, g_total_cuda_cores, upper_limit, share);
    }
}

Thanks in advance for any answers!
