[Issue]: rPD trace vLLM benchmark failed #71

@alexhegit

Problem Description

runTracer.sh fails when tracing the vLLM benchmark

Operating System

Ubuntu 22.04, inside the Docker image rocm/vllm-dev:20241025-tuned

CPU

AMD EPYC 9654 96-Core Processor

GPU

AMD MI300X

ROCm Version

ROCm 6.2.0

ROCm Component

No response

Steps to Reproduce

  1. Start the container
alias drun="docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 256g --net host -v $PWD:/ws -v /data:/data --entrypoint /bin/bash --env HUGGINGFACE_HUB_CACHE=/data/llm -w /ws"

drun --name rPD-vllm rocm/vllm-dev:20241025-tuned
  2. Install rocmProfileData from /app/rocmProfileData in the container
    Follow the instructions from https://github.com/ROCm/rocmProfileData/ (a sketch of the expected build steps is shown below)
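
For reference, a minimal install sketch; the dependency and build commands are assumptions based on the rocmProfileData README and may differ in this image:

apt-get update && apt-get install -y sqlite3 libsqlite3-dev libfmt-dev   # assumed build dependencies
cd /app/rocmProfileData
make && make install   # expected to install librpd_tracer.so and runTracer.sh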

  3. Trace the vLLM benchmark

runTracer.sh python /app/vllm/benchmarks/benchmark_latency.py \
--model /data/llm/Meta-Llama-3.1-8B/ \
--dtype float16 \
--gpu-memory-utilization 0.99 \
--distributed-executor-backend mp \
--tensor-parallel-size 1 \
--batch-size 32 \
--input-len 128 \
--output-len 128
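
For context, a successful traced run should leave a populated trace.rpd (an SQLite database containing the rocpd_* tables that the tracer counts at exit). A quick sanity-check sketch, assuming the sqlite3 CLI is available:

sqlite3 trace.rpd "SELECT count(*) FROM rocpd_api;"   # expected to be non-zero after a successful trace
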
  4. The vLLM benchmark does not start: it never runs to completion and produces no result data (double-checked with rocm-smi, which shows the model is never loaded or run).
    The log from rPD shows the ValueError below:
Creating empty rpd: trace.rpd
rpd_tracer, because
WARNING 11-06 02:52:43 rocm.py:17] `fork` method is not supported by ROCm. VLLM_WORKER_MULTIPROC_METHOD is overridden to `spawn` instead.
Namespace(model='/data/llm/Meta-Llama-3.1-8B/', speculative_model=None, num_speculative_tokens=None, speculative_draft_tensor_parallel_size=None, tokenizer=None, quantization=None, tensor_parallel_size=1, input_len=128, output_len=128, batch_size=32, n=1, use_beam_search=False, num_iters_warmup=10, num_iters=30, trust_remote_code=False, max_model_len=None, dtype='float16', enforce_eager=False, kv_cache_dtype='auto', quantization_param_path=None, profile=False, profile_result_dir=None, device='auto', block_size=16, enable_chunked_prefill=False, enable_prefix_caching=False, ray_workers_use_nsight=False, download_dir=None, output_json=None, gpu_memory_utilization=0.99, load_format='auto', distributed_executor_backend='mp', otlp_traces_endpoint=None, num_scheduler_steps=1)
WARNING 11-06 02:52:47 config.py:1711] Casting torch.bfloat16 to torch.float16.
ERROR 11-06 02:52:55 registry.py:270] Error in inspecting model architecture 'LlamaForCausalLM'
ERROR 11-06 02:52:55 registry.py:270] Traceback (most recent call last):
ERROR 11-06 02:52:55 registry.py:270]   File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/model_executor/models/registry.py", line 432, in _run_in_subprocess
ERROR 11-06 02:52:55 registry.py:270]     returned.check_returncode()
ERROR 11-06 02:52:55 registry.py:270]   File "/opt/conda/envs/py_3.9/lib/python3.9/subprocess.py", line 460, in check_returncode
ERROR 11-06 02:52:55 registry.py:270]     raise CalledProcessError(self.returncode, self.args, self.stdout,
ERROR 11-06 02:52:55 registry.py:270] subprocess.CalledProcessError: Command '['/opt/conda/envs/py_3.9/bin/python', '-m', 'vllm.model_executor.models.registry']' died with <Signals.SIGABRT: 6>.
ERROR 11-06 02:52:55 registry.py:270]
ERROR 11-06 02:52:55 registry.py:270] The above exception was the direct cause of the following exception:
ERROR 11-06 02:52:55 registry.py:270]
ERROR 11-06 02:52:55 registry.py:270] Traceback (most recent call last):
ERROR 11-06 02:52:55 registry.py:270]   File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/model_executor/models/registry.py", line 268, in _try_inspect_model_cls
ERROR 11-06 02:52:55 registry.py:270]     return model.inspect_model_cls()
ERROR 11-06 02:52:55 registry.py:270]   File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/model_executor/models/registry.py", line 230, in inspect_model_cls
ERROR 11-06 02:52:55 registry.py:270]     return _run_in_subprocess(
ERROR 11-06 02:52:55 registry.py:270]   File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/model_executor/models/registry.py", line 435, in _run_in_subprocess
ERROR 11-06 02:52:55 registry.py:270]     raise RuntimeError(f"Error raised in subprocess:\n"
ERROR 11-06 02:52:55 registry.py:270] RuntimeError: Error raised in subprocess:
ERROR 11-06 02:52:55 registry.py:270] rpd_tracer, because
ERROR 11-06 02:52:55 registry.py:270] /opt/conda/envs/py_3.9/lib/python3.9/runpy.py:127: RuntimeWarning: 'vllm.model_executor.models.registry' found in sys.modules after import of package 'vllm.model_executor.models', but prior to execution of 'vllm.model_executor.models.registry'; this may result in unpredictable behaviour
ERROR 11-06 02:52:55 registry.py:270]   warn(RuntimeWarning(msg))
ERROR 11-06 02:52:55 registry.py:270] rocpd_op: 0
ERROR 11-06 02:52:55 registry.py:270] rocpd_api_ops: 0
ERROR 11-06 02:52:55 registry.py:270] rocpd_kernelapi: 0
ERROR 11-06 02:52:55 registry.py:270] rocpd_copyapi: 0
ERROR 11-06 02:52:55 registry.py:270] rocpd_api: 0
ERROR 11-06 02:52:55 registry.py:270] rocpd_string: 0
ERROR 11-06 02:52:55 registry.py:270] rpd_tracer: finalized in 10.585086 ms
ERROR 11-06 02:52:55 registry.py:270] double free or corruption (!prev)
ERROR 11-06 02:52:55 registry.py:270]
Traceback (most recent call last):
  File "/app/vllm/benchmarks/benchmark_latency.py", line 286, in <module>
    main(args)
  File "/app/vllm/benchmarks/benchmark_latency.py", line 24, in main
    llm = LLM(
  File "vllm/utils.py", line 1181, in vllm.utils.deprecate_args.wrapper.inner
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/entrypoints/llm.py", line 193, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "vllm/engine/llm_engine.py", line 571, in vllm.engine.llm_engine.LLMEngine.from_engine_args
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/engine/arg_utils.py", line 918, in create_engine_config
    model_config = self.create_model_config()
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/engine/arg_utils.py", line 853, in create_model_config
    return ModelConfig(
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/config.py", line 210, in __init__
    self.multimodal_config = self._init_multimodal_config(
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/config.py", line 233, in _init_multimodal_config
    if ModelRegistry.is_multimodal_model(architectures):
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/model_executor/models/registry.py", line 390, in is_multimodal_model
    return self.inspect_model_cls(architectures).supports_multimodal
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/model_executor/models/registry.py", line 359, in inspect_model_cls
    return self._raise_for_unsupported(architectures)
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/model_executor/models/registry.py", line 320, in _raise_for_unsupported
    raise ValueError(
ValueError: Model architectures ['LlamaForCausalLM'] are not supported for now. Supported architectures: ['AquilaModel', 'AquilaForCausalLM', 'ArcticForCausalLM', 'BaiChuanForCausalLM', 'BaichuanForCausalLM', 'BloomForCausalLM', 'CohereForCausalLM', 'DbrxForCausalLM', 'DeciLMForCausalLM', 'DeepseekForCausalLM', 'DeepseekV2ForCausalLM', 'ExaoneForCausalLM', 'FalconForCausalLM', 'GemmaForCausalLM', 'Gemma2ForCausalLM', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTJForCausalLM', 'GPTNeoXForCausalLM', 'GraniteForCausalLM', 'GraniteMoeForCausalLM', 'InternLMForCausalLM', 'InternLM2ForCausalLM', 'JAISLMHeadModel', 'JambaForCausalLM', 'LlamaForCausalLM', 'LLaMAForCausalLM', 'MambaForCausalLM', 'FalconMambaForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'QuantMixtralForCausalLM', 'Grok1ModelForCausalLM', 'MptForCausalLM', 'MPTForCausalLM', 'MiniCPMForCausalLM', 'MiniCPM3ForCausalLM', 'NemotronForCausalLM', 'OlmoForCausalLM', 'OlmoeForCausalLM', 'OPTForCausalLM', 'OrionForCausalLM', 'PersimmonForCausalLM', 'PhiForCausalLM', 'Phi3ForCausalLM', 'Phi3SmallForCausalLM', 'PhiMoEForCausalLM', 'Qwen2ForCausalLM', 'Qwen2MoeForCausalLM', 'RWForCausalLM', 'StableLMEpochForCausalLM', 'StableLmForCausalLM', 'Starcoder2ForCausalLM', 'SolarForCausalLM', 'XverseForCausalLM', 'BartModel', 'BartForConditionalGeneration', 'BertModel', 'Gemma2Model', 'MistralModel', 'Qwen2ForRewardModel', 'Phi3VForCausalLM', 'Blip2ForConditionalGeneration', 'ChameleonForConditionalGeneration', 'ChatGLMModel', 'ChatGLMForConditionalGeneration', 'FuyuForCausalLM', 'InternVLChatModel', 'LlavaForConditionalGeneration', 'LlavaNextForConditionalGeneration', 'LlavaNextVideoForConditionalGeneration', 'LlavaOnevisionForConditionalGeneration', 'MiniCPMV', 'MolmoForCausalLM', 'NVLM_D', 'PaliGemmaForConditionalGeneration', 'PixtralForConditionalGeneration', 'QWenLMHeadModel', 'Qwen2VLForConditionalGeneration', 'UltravoxModel', 'MllamaForConditionalGeneration', 'EAGLEModel', 'MedusaModel', 'MLPSpeculatorPreTrainedModel']
rocpd_op: 0
rocpd_api_ops: 0
rocpd_kernelapi: 0
rocpd_copyapi: 0
rocpd_api: 0
rocpd_string: 0
rpd_tracer: finalized in 9.810928 ms
double free or corruption (!prev)
/usr/local/bin/runTracer.sh: line 42:  7872 Aborted                 LD_PRELOAD=librpd_tracer.so "$@"
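
Reading the trace: vLLM inspects the model architecture in a child Python process (_run_in_subprocess in registry.py), and under runTracer.sh that child appears to inherit LD_PRELOAD=librpd_tracer.so. The rpd_tracer then aborts during finalization inside the child ("double free or corruption (!prev)", SIGABRT), so the inspection fails and the registry reports 'LlamaForCausalLM' as unsupported even though it appears in the supported list. A minimal isolation sketch, assuming this reading: preload the tracer into a trivial Python child and check whether it aborts at exit (exit code 134 indicates SIGABRT):

# hypothetical isolation test: does any short preloaded Python process abort at exit?
LD_PRELOAD=librpd_tracer.so /opt/conda/envs/py_3.9/bin/python -c "print('child ok')"; echo "exit code: $?"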

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response
