Description
Test Environment:
GPU Card: NVIDIA L20
GPU Driver: 550.54.15
CUDA Version: 12.04
OS: Ubuntu 22.04
Volcano: 1.12.1
Volcano vgpu device plugin: v1.10.0
HAMi core: master branch with "Merge pull request #103 from Project-HAMi/fix_v2.6.0"
Test case
Volcano vGPU device plugin GPU memory virtualization parameter: --gpu-memory-factor=2; GPU virtualization mode: hami-core.
I deploy a Pod with one container that requests 2 vGPU cards, with 30% vGPU cores and 42000 MiB of vGPU memory for each card.
The resource declaration is as follows:
...
    resources:
      limits:
        volcano.sh/vgpu-number: 2
        volcano.sh/vgpu-memory: 21000
        volcano.sh/vgpu-cores: 30
....
The Pod has the env "GPU_CORE_UTILIZATION_POLICY" set:
    - name: GPU_CORE_UTILIZATION_POLICY
      value: "force"
Question:
I found that vGPU core utilization for the 2 vGPU cards is not limited correctly: hami-core does not seem able to enforce the limit, and utilization ends up 20% higher than the desired limit. Sometimes vGPU card 0 is limited but the other card never is; at other times, neither card is limited.
For Pods that request only one card, core utilization is controlled better.
Why can't a Pod with one container using 2 (or more) vGPU cards be limited?
Digging into the source code, I found that the rate limiter uses only vGPU card 0 [delta(upper_limit, userutil[0], share);] to calculate the available tokens. Is something wrong here? What happens when the physical cards differ noticeably in performance or workload, or when a non-zero vGPU card is running heavy kernels while vGPU card 0 is lightly loaded?
void* utilization_watcher() {
    nvmlInit();
    int userutil[CUDA_DEVICE_MAX_COUNT];
    int sysprocnum;
    long share = 0;
    int upper_limit = get_current_device_sm_limit(0);
    ensure_initialized();
    LOG_DEBUG("upper_limit=%d\n",upper_limit);
    while (1){
        nanosleep(&g_wait, NULL);
        if (pidfound==0) {
            update_host_pid();
            if (pidfound==0)
                continue;
        }
        init_gpu_device_utilization();
        get_used_gpu_utilization(userutil,&sysprocnum);
        //if (sysprocnum == 1 &&
        //    userutil < upper_limit / 10) {
        //    g_cur_cuda_cores =
        //        delta(upper_limit, userutil, share);
        //    continue;
        //}
        if ((share==g_total_cuda_cores) && (g_cur_cuda_cores<0)) {
            g_total_cuda_cores *= 2;
            share = g_total_cuda_cores;
        }
        if ((userutil[0]<=100) && (userutil[0]>=0)){
            share = delta(upper_limit, userutil[0], share);
            change_token(share);
        }
        LOG_INFO("userutil1=%d currentcores=%ld total=%ld limit=%d share=%ld\n",userutil[0],g_cur_cuda_cores,g_total_cuda_cores,upper_limit,share);
    }
}
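To make the concern concrete: in the loop above, both upper_limit (from get_current_device_sm_limit(0)) and the utilization sample (userutil[0]) come from device 0 only. Below is a minimal sketch of what a per-device update might look like. It is not HAMi-core code: device_limiter, adjust_share, update_all_devices, SKETCH_MAX_DEVICES and the step size are all hypothetical names and values chosen for illustration, and adjust_share is only a simplified stand-in for delta().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for CUDA_DEVICE_MAX_COUNT. */
#define SKETCH_MAX_DEVICES 16

/* Hypothetical per-device limiter state; not a HAMi-core structure. */
typedef struct {
    int  upper_limit;  /* SM utilization limit for this device, in percent */
    long share;        /* tokens currently granted to this device */
    long total;        /* upper bound on tokens for this device */
} device_limiter;

/* Simplified stand-in for delta(): grow the share while utilization is at
 * or below the limit, shrink it otherwise, and clamp to [0, total]. */
static long adjust_share(const device_limiter *d, int util, long step_per_percent) {
    int diff = abs(d->upper_limit - util);
    if (diff < 5)
        diff = 5;  /* assumed minimum step so the share never stalls */
    long inc = step_per_percent * diff;
    long next = (util <= d->upper_limit) ? d->share + inc : d->share - inc;
    if (next > d->total)
        next = d->total;
    if (next < 0)
        next = 0;
    return next;
}

/* Update every device from its own utilization sample instead of
 * reusing the sample from device 0 for all of them. */
static void update_all_devices(device_limiter *devs, const int *userutil,
                               int count, long step_per_percent) {
    assert(count >= 0 && count <= SKETCH_MAX_DEVICES);
    for (int i = 0; i < count; i++) {
        if (userutil[i] < 0 || userutil[i] > 100)
            continue;  /* skip invalid samples, as the original loop does */
        devs[i].share = adjust_share(&devs[i], userutil[i], step_per_percent);
    }
}
```

With two devices limited to 30%, a sample of 10% on device 0 and 80% on device 1 raises the share of device 0 and lowers the share of device 1. In the current code only the 10% sample would be seen, so the single shared token pool would keep growing while device 1 stays above its limit, which matches the behavior I observe.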
I would sincerely appreciate your answers!