Problem Description
Running the vLLM benchmark under runTracer.sh fails.
Operating System
Ubuntu 22.04 in the Docker image rocm/vllm-dev:20241025-tuned
CPU
AMD EPYC 9654 96-Core Processor
GPU
AMD MI300X
ROCm Version
ROCm 6.2.0
ROCm Component
No response
Steps to Reproduce
- Start the container
alias drun="docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 256g --net host -v $PWD:/ws -v /data:/data --entrypoint /bin/bash --env HUGGINGFACE_HUB_CACHE=/data/llm -w /ws"
drun --name rPD-vllm rocm/vllm-dev:20241025-tuned
- Install rocmProfileData from /app/rocmProfileData in the container, following the instructions from https://github.com/ROCm/rocmProfileData/ (a sketch of the usual build steps follows)
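For reference, a minimal install sketch, assuming the dependencies and build steps listed in the rocmProfileData README (the package names are an assumption and may already be present in this image):
cd /app/rocmProfileData
apt-get update && apt-get install -y sqlite3 libsqlite3-dev libfmt-dev  # build dependencies per the README
make && make install  # installs librpd_tracer.so and runTracer.sh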
- Trace the vLLM benchmark
runTracer.sh python /app/vllm/benchmarks/benchmark_latency.py \
--model /data/llm/Meta-Llama-3.1-8B/ \
--dtype float16 \
--gpu-memory-utilization 0.99 \
--distributed-executor-backend mp \
--tensor-parallel-size 1 \
--batch-size 32 \
--input-len 128 \
--output-len 128
- The vLLM benchmark does not start: it never runs to completion and produces no results (rocm-smi confirms the model is never loaded or executed).
The log from rPD shows a ValueError, as below:
Creating empty rpd: trace.rpd
rpd_tracer, because
WARNING 11-06 02:52:43 rocm.py:17] `fork` method is not supported by ROCm. VLLM_WORKER_MULTIPROC_METHOD is overridden to `spawn` instead.
Namespace(model='/data/llm/Meta-Llama-3.1-8B/', speculative_model=None, num_speculative_tokens=None, speculative_draft_tensor_parallel_size=None, tokenizer=None, quantization=None, tensor_parallel_size=1, input_len=128, output_len=128, batch_size=32, n=1, use_beam_search=False, num_iters_warmup=10, num_iters=30, trust_remote_code=False, max_model_len=None, dtype='float16', enforce_eager=False, kv_cache_dtype='auto', quantization_param_path=None, profile=False, profile_result_dir=None, device='auto', block_size=16, enable_chunked_prefill=False, enable_prefix_caching=False, ray_workers_use_nsight=False, download_dir=None, output_json=None, gpu_memory_utilization=0.99, load_format='auto', distributed_executor_backend='mp', otlp_traces_endpoint=None, num_scheduler_steps=1)
WARNING 11-06 02:52:47 config.py:1711] Casting torch.bfloat16 to torch.float16.
ERROR 11-06 02:52:55 registry.py:270] Error in inspecting model architecture 'LlamaForCausalLM'
ERROR 11-06 02:52:55 registry.py:270] Traceback (most recent call last):
ERROR 11-06 02:52:55 registry.py:270] File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/model_executor/models/registry.py", line 432, in _run_in_subprocess
ERROR 11-06 02:52:55 registry.py:270] returned.check_returncode()
ERROR 11-06 02:52:55 registry.py:270] File "/opt/conda/envs/py_3.9/lib/python3.9/subprocess.py", line 460, in check_returncode
ERROR 11-06 02:52:55 registry.py:270] raise CalledProcessError(self.returncode, self.args, self.stdout,
ERROR 11-06 02:52:55 registry.py:270] subprocess.CalledProcessError: Command '['/opt/conda/envs/py_3.9/bin/python', '-m', 'vllm.model_executor.models.registry']' died with <Signals.SIGABRT: 6>.
ERROR 11-06 02:52:55 registry.py:270]
ERROR 11-06 02:52:55 registry.py:270] The above exception was the direct cause of the following exception:
ERROR 11-06 02:52:55 registry.py:270]
ERROR 11-06 02:52:55 registry.py:270] Traceback (most recent call last):
ERROR 11-06 02:52:55 registry.py:270] File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/model_executor/models/registry.py", line 268, in _try_inspect_model_cls
ERROR 11-06 02:52:55 registry.py:270] return model.inspect_model_cls()
ERROR 11-06 02:52:55 registry.py:270] File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/model_executor/models/registry.py", line 230, in inspect_model_cls
ERROR 11-06 02:52:55 registry.py:270] return _run_in_subprocess(
ERROR 11-06 02:52:55 registry.py:270] File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/model_executor/models/registry.py", line 435, in _run_in_subprocess
ERROR 11-06 02:52:55 registry.py:270] raise RuntimeError(f"Error raised in subprocess:\n"
ERROR 11-06 02:52:55 registry.py:270] RuntimeError: Error raised in subprocess:
ERROR 11-06 02:52:55 registry.py:270] rpd_tracer, because
ERROR 11-06 02:52:55 registry.py:270] /opt/conda/envs/py_3.9/lib/python3.9/runpy.py:127: RuntimeWarning: 'vllm.model_executor.models.registry' found in sys.modules after import of package 'vllm.model_executor.models', but prior to execution of 'vllm.model_executor.models.registry'; this may result in unpredictable behaviour
ERROR 11-06 02:52:55 registry.py:270] warn(RuntimeWarning(msg))
ERROR 11-06 02:52:55 registry.py:270] rocpd_op: 0
ERROR 11-06 02:52:55 registry.py:270] rocpd_api_ops: 0
ERROR 11-06 02:52:55 registry.py:270] rocpd_kernelapi: 0
ERROR 11-06 02:52:55 registry.py:270] rocpd_copyapi: 0
ERROR 11-06 02:52:55 registry.py:270] rocpd_api: 0
ERROR 11-06 02:52:55 registry.py:270] rocpd_string: 0
ERROR 11-06 02:52:55 registry.py:270] rpd_tracer: finalized in 10.585086 ms
ERROR 11-06 02:52:55 registry.py:270] double free or corruption (!prev)
ERROR 11-06 02:52:55 registry.py:270]
Traceback (most recent call last):
File "/app/vllm/benchmarks/benchmark_latency.py", line 286, in <module>
main(args)
File "/app/vllm/benchmarks/benchmark_latency.py", line 24, in main
llm = LLM(
File "vllm/utils.py", line 1181, in vllm.utils.deprecate_args.wrapper.inner
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/entrypoints/llm.py", line 193, in __init__
self.llm_engine = LLMEngine.from_engine_args(
File "vllm/engine/llm_engine.py", line 571, in vllm.engine.llm_engine.LLMEngine.from_engine_args
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/engine/arg_utils.py", line 918, in create_engine_config
model_config = self.create_model_config()
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/engine/arg_utils.py", line 853, in create_model_config
return ModelConfig(
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/config.py", line 210, in __init__
self.multimodal_config = self._init_multimodal_config(
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/config.py", line 233, in _init_multimodal_config
if ModelRegistry.is_multimodal_model(architectures):
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/model_executor/models/registry.py", line 390, in is_multimodal_model
return self.inspect_model_cls(architectures).supports_multimodal
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/model_executor/models/registry.py", line 359, in inspect_model_cls
return self._raise_for_unsupported(architectures)
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/model_executor/models/registry.py", line 320, in _raise_for_unsupported
raise ValueError(
ValueError: Model architectures ['LlamaForCausalLM'] are not supported for now. Supported architectures: ['AquilaModel', 'AquilaForCausalLM', 'ArcticForCausalLM', 'BaiChuanForCausalLM', 'BaichuanForCausalLM', 'BloomForCausalLM', 'CohereForCausalLM', 'DbrxForCausalLM', 'DeciLMForCausalLM', 'DeepseekForCausalLM', 'DeepseekV2ForCausalLM', 'ExaoneForCausalLM', 'FalconForCausalLM', 'GemmaForCausalLM', 'Gemma2ForCausalLM', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTJForCausalLM', 'GPTNeoXForCausalLM', 'GraniteForCausalLM', 'GraniteMoeForCausalLM', 'InternLMForCausalLM', 'InternLM2ForCausalLM', 'JAISLMHeadModel', 'JambaForCausalLM', 'LlamaForCausalLM', 'LLaMAForCausalLM', 'MambaForCausalLM', 'FalconMambaForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'QuantMixtralForCausalLM', 'Grok1ModelForCausalLM', 'MptForCausalLM', 'MPTForCausalLM', 'MiniCPMForCausalLM', 'MiniCPM3ForCausalLM', 'NemotronForCausalLM', 'OlmoForCausalLM', 'OlmoeForCausalLM', 'OPTForCausalLM', 'OrionForCausalLM', 'PersimmonForCausalLM', 'PhiForCausalLM', 'Phi3ForCausalLM', 'Phi3SmallForCausalLM', 'PhiMoEForCausalLM', 'Qwen2ForCausalLM', 'Qwen2MoeForCausalLM', 'RWForCausalLM', 'StableLMEpochForCausalLM', 'StableLmForCausalLM', 'Starcoder2ForCausalLM', 'SolarForCausalLM', 'XverseForCausalLM', 'BartModel', 'BartForConditionalGeneration', 'BertModel', 'Gemma2Model', 'MistralModel', 'Qwen2ForRewardModel', 'Phi3VForCausalLM', 'Blip2ForConditionalGeneration', 'ChameleonForConditionalGeneration', 'ChatGLMModel', 'ChatGLMForConditionalGeneration', 'FuyuForCausalLM', 'InternVLChatModel', 'LlavaForConditionalGeneration', 'LlavaNextForConditionalGeneration', 'LlavaNextVideoForConditionalGeneration', 'LlavaOnevisionForConditionalGeneration', 'MiniCPMV', 'MolmoForCausalLM', 'NVLM_D', 'PaliGemmaForConditionalGeneration', 'PixtralForConditionalGeneration', 'QWenLMHeadModel', 'Qwen2VLForConditionalGeneration', 'UltravoxModel', 'MllamaForConditionalGeneration', 'EAGLEModel', 'MedusaModel', 'MLPSpeculatorPreTrainedModel']
rocpd_op: 0
rocpd_api_ops: 0
rocpd_kernelapi: 0
rocpd_copyapi: 0
rocpd_api: 0
rocpd_string: 0
rpd_tracer: finalized in 9.810928 ms
double free or corruption (!prev)
/usr/local/bin/runTracer.sh: line 42: 7872 Aborted LD_PRELOAD=librpd_tracer.so "$@"
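Note that 'LlamaForCausalLM' does appear in the supported-architectures list above, so the ValueError looks like a side effect of the registry-inspection subprocess dying with SIGABRT (the "double free or corruption" from rpd_tracer) rather than a genuinely unsupported model. A minimal sketch to isolate this, assuming the same container environment (a guess at the trigger, not a confirmed diagnosis):
# Re-run the exact subprocess from the traceback with the tracer preloaded
LD_PRELOAD=librpd_tracer.so python -m vllm.model_executor.models.registry
# For comparison, the same command without the preload should exit cleanly
python -m vllm.model_executor.models.registry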
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response