[Issue]: rPD trace vLLM benchmark failed #71

@alexhegit

Problem Description

runTracer.sh fails when tracing the vLLM benchmark

Operating System

Ubuntu 22.04, inside the Docker image rocm/vllm-dev:20241025-tuned

CPU

AMD EPYC 9654 96-Core Processor

GPU

AMD MI300X

ROCm Version

ROCm 6.2.0

ROCm Component

No response

Steps to Reproduce

  1. Start the container
alias drun="docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 256g --net host -v $PWD:/ws -v /data:/data --entrypoint /bin/bash --env HUGGINGFACE_HUB_CACHE=/data/llm -w /ws"

drun --name rPD-vllm rocm/vllm-dev:20241025-tuned
  2. Install rocmProfileData from /app/rocmProfileData in the container
    Follow the instructions from https://github.com/ROCm/rocmProfileData/ (a sketch of the expected build steps is shown below)
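
For reference, a minimal install sketch; the dependency and build commands are assumptions based on the rocmProfileData README and may differ in this image:

apt-get update && apt-get install -y sqlite3 libsqlite3-dev libfmt-dev   # assumed build dependencies
cd /app/rocmProfileData
make && make install   # expected to install librpd_tracer.so and runTracer.sh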

  3. Trace the vLLM benchmark

runTracer.sh python /app/vllm/benchmarks/benchmark_latency.py \
--model /data/llm/Meta-Llama-3.1-8B/ \
--dtype float16 \
--gpu-memory-utilization 0.99 \
--distributed-executor-backend mp \
--tensor-parallel-size 1 \
--batch-size 32 \
--input-len 128 \
--output-len 128
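
For context, a successful traced run should leave a populated trace.rpd (an SQLite database containing the rocpd_* tables that the tracer counts at exit). A quick sanity-check sketch, assuming the sqlite3 CLI is available:

sqlite3 trace.rpd "SELECT count(*) FROM rocpd_api;"   # expected to be non-zero after a successful trace
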
  4. The vLLM benchmark does not start: it never runs to completion and produces no result data (double-checked with rocm-smi, which shows the model is never loaded or run).
    The log from rPD shows the ValueError below:
Creating empty rpd: trace.rpd
rpd_tracer, because
WARNING 11-06 02:52:43 rocm.py:17] `fork` method is not supported by ROCm. VLLM_WORKER_MULTIPROC_METHOD is overridden to `spawn` instead.
Namespace(model='/data/llm/Meta-Llama-3.1-8B/', speculative_model=None, num_speculative_tokens=None, speculative_draft_tensor_parallel_size=None, tokenizer=None, quantization=None, tensor_parallel_size=1, input_len=128, output_len=128, batch_size=32, n=1, use_beam_search=False, num_iters_warmup=10, num_iters=30, trust_remote_code=False, max_model_len=None, dtype='float16', enforce_eager=False, kv_cache_dtype='auto', quantization_param_path=None, profile=False, profile_result_dir=None, device='auto', block_size=16, enable_chunked_prefill=False, enable_prefix_caching=False, ray_workers_use_nsight=False, download_dir=None, output_json=None, gpu_memory_utilization=0.99, load_format='auto', distributed_executor_backend='mp', otlp_traces_endpoint=None, num_scheduler_steps=1)
WARNING 11-06 02:52:47 config.py:1711] Casting torch.bfloat16 to torch.float16.
ERROR 11-06 02:52:55 registry.py:270] Error in inspecting model architecture 'LlamaForCausalLM'
ERROR 11-06 02:52:55 registry.py:270] Traceback (most recent call last):
ERROR 11-06 02:52:55 registry.py:270]   File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/model_executor/models/registry.py", line 432, in _run_in_subprocess
ERROR 11-06 02:52:55 registry.py:270]     returned.check_returncode()
ERROR 11-06 02:52:55 registry.py:270]   File "/opt/conda/envs/py_3.9/lib/python3.9/subprocess.py", line 460, in check_returncode
ERROR 11-06 02:52:55 registry.py:270]     raise CalledProcessError(self.returncode, self.args, self.stdout,
ERROR 11-06 02:52:55 registry.py:270] subprocess.CalledProcessError: Command '['/opt/conda/envs/py_3.9/bin/python', '-m', 'vllm.model_executor.models.registry']' died with <Signals.SIGABRT: 6>.
ERROR 11-06 02:52:55 registry.py:270]
ERROR 11-06 02:52:55 registry.py:270] The above exception was the direct cause of the following exception:
ERROR 11-06 02:52:55 registry.py:270]
ERROR 11-06 02:52:55 registry.py:270] Traceback (most recent call last):
ERROR 11-06 02:52:55 registry.py:270]   File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/model_executor/models/registry.py", line 268, in _try_inspect_model_cls
ERROR 11-06 02:52:55 registry.py:270]     return model.inspect_model_cls()
ERROR 11-06 02:52:55 registry.py:270]   File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/model_executor/models/registry.py", line 230, in inspect_model_cls
ERROR 11-06 02:52:55 registry.py:270]     return _run_in_subprocess(
ERROR 11-06 02:52:55 registry.py:270]   File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/model_executor/models/registry.py", line 435, in _run_in_subprocess
ERROR 11-06 02:52:55 registry.py:270]     raise RuntimeError(f"Error raised in subprocess:\n"
ERROR 11-06 02:52:55 registry.py:270] RuntimeError: Error raised in subprocess:
ERROR 11-06 02:52:55 registry.py:270] rpd_tracer, because
ERROR 11-06 02:52:55 registry.py:270] /opt/conda/envs/py_3.9/lib/python3.9/runpy.py:127: RuntimeWarning: 'vllm.model_executor.models.registry' found in sys.modules after import of package 'vllm.model_executor.models', but prior to execution of 'vllm.model_executor.models.registry'; this may result in unpredictable behaviour
ERROR 11-06 02:52:55 registry.py:270]   warn(RuntimeWarning(msg))
ERROR 11-06 02:52:55 registry.py:270] rocpd_op: 0
ERROR 11-06 02:52:55 registry.py:270] rocpd_api_ops: 0
ERROR 11-06 02:52:55 registry.py:270] rocpd_kernelapi: 0
ERROR 11-06 02:52:55 registry.py:270] rocpd_copyapi: 0
ERROR 11-06 02:52:55 registry.py:270] rocpd_api: 0
ERROR 11-06 02:52:55 registry.py:270] rocpd_string: 0
ERROR 11-06 02:52:55 registry.py:270] rpd_tracer: finalized in 10.585086 ms
ERROR 11-06 02:52:55 registry.py:270] double free or corruption (!prev)
ERROR 11-06 02:52:55 registry.py:270]
Traceback (most recent call last):
  File "/app/vllm/benchmarks/benchmark_latency.py", line 286, in <module>
    main(args)
  File "/app/vllm/benchmarks/benchmark_latency.py", line 24, in main
    llm = LLM(
  File "vllm/utils.py", line 1181, in vllm.utils.deprecate_args.wrapper.inner
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/entrypoints/llm.py", line 193, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "vllm/engine/llm_engine.py", line 571, in vllm.engine.llm_engine.LLMEngine.from_engine_args
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/engine/arg_utils.py", line 918, in create_engine_config
    model_config = self.create_model_config()
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/engine/arg_utils.py", line 853, in create_model_config
    return ModelConfig(
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/config.py", line 210, in __init__
    self.multimodal_config = self._init_multimodal_config(
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/config.py", line 233, in _init_multimodal_config
    if ModelRegistry.is_multimodal_model(architectures):
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/model_executor/models/registry.py", line 390, in is_multimodal_model
    return self.inspect_model_cls(architectures).supports_multimodal
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/model_executor/models/registry.py", line 359, in inspect_model_cls
    return self._raise_for_unsupported(architectures)
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/model_executor/models/registry.py", line 320, in _raise_for_unsupported
    raise ValueError(
ValueError: Model architectures ['LlamaForCausalLM'] are not supported for now. Supported architectures: ['AquilaModel', 'AquilaForCausalLM', 'ArcticForCausalLM', 'BaiChuanForCausalLM', 'BaichuanForCausalLM', 'BloomForCausalLM', 'CohereForCausalLM', 'DbrxForCausalLM', 'DeciLMForCausalLM', 'DeepseekForCausalLM', 'DeepseekV2ForCausalLM', 'ExaoneForCausalLM', 'FalconForCausalLM', 'GemmaForCausalLM', 'Gemma2ForCausalLM', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTJForCausalLM', 'GPTNeoXForCausalLM', 'GraniteForCausalLM', 'GraniteMoeForCausalLM', 'InternLMForCausalLM', 'InternLM2ForCausalLM', 'JAISLMHeadModel', 'JambaForCausalLM', 'LlamaForCausalLM', 'LLaMAForCausalLM', 'MambaForCausalLM', 'FalconMambaForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'QuantMixtralForCausalLM', 'Grok1ModelForCausalLM', 'MptForCausalLM', 'MPTForCausalLM', 'MiniCPMForCausalLM', 'MiniCPM3ForCausalLM', 'NemotronForCausalLM', 'OlmoForCausalLM', 'OlmoeForCausalLM', 'OPTForCausalLM', 'OrionForCausalLM', 'PersimmonForCausalLM', 'PhiForCausalLM', 'Phi3ForCausalLM', 'Phi3SmallForCausalLM', 'PhiMoEForCausalLM', 'Qwen2ForCausalLM', 'Qwen2MoeForCausalLM', 'RWForCausalLM', 'StableLMEpochForCausalLM', 'StableLmForCausalLM', 'Starcoder2ForCausalLM', 'SolarForCausalLM', 'XverseForCausalLM', 'BartModel', 'BartForConditionalGeneration', 'BertModel', 'Gemma2Model', 'MistralModel', 'Qwen2ForRewardModel', 'Phi3VForCausalLM', 'Blip2ForConditionalGeneration', 'ChameleonForConditionalGeneration', 'ChatGLMModel', 'ChatGLMForConditionalGeneration', 'FuyuForCausalLM', 'InternVLChatModel', 'LlavaForConditionalGeneration', 'LlavaNextForConditionalGeneration', 'LlavaNextVideoForConditionalGeneration', 'LlavaOnevisionForConditionalGeneration', 'MiniCPMV', 'MolmoForCausalLM', 'NVLM_D', 'PaliGemmaForConditionalGeneration', 'PixtralForConditionalGeneration', 'QWenLMHeadModel', 'Qwen2VLForConditionalGeneration', 'UltravoxModel', 'MllamaForConditionalGeneration', 'EAGLEModel', 'MedusaModel', 'MLPSpeculatorPreTrainedModel']
rocpd_op: 0
rocpd_api_ops: 0
rocpd_kernelapi: 0
rocpd_copyapi: 0
rocpd_api: 0
rocpd_string: 0
rpd_tracer: finalized in 9.810928 ms
double free or corruption (!prev)
/usr/local/bin/runTracer.sh: line 42:  7872 Aborted                 LD_PRELOAD=librpd_tracer.so "$@"
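
Reading the trace: vLLM inspects the model architecture in a child Python process (_run_in_subprocess in registry.py), and under runTracer.sh that child appears to inherit LD_PRELOAD=librpd_tracer.so. The rpd_tracer then aborts during finalization inside the child ("double free or corruption (!prev)", SIGABRT), so the inspection fails and the registry reports 'LlamaForCausalLM' as unsupported even though it appears in the supported list. A minimal isolation sketch, assuming this reading: preload the tracer into a trivial Python child and check whether it aborts at exit (exit code 134 indicates SIGABRT):

# hypothetical isolation test: does any short preloaded Python process abort at exit?
LD_PRELOAD=librpd_tracer.so /opt/conda/envs/py_3.9/bin/python -c "print('child ok')"; echo "exit code: $?"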

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response
