Checklist
- I searched related issues but found no solution.
- The bug persists in the latest version.
- Issues without environment info and a minimal reproducible demo are hard to resolve and may receive no feedback.
- If this is not a bug report but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
- Please use English. Otherwise, it will be closed.
Describe the bug
Launching the server with --dtype float32 fails with ModuleNotFoundError: No module named 'vllm'. Installing vLLM to work around this pulls in torch 2.9.0, after which sgl_kernel fails to import, so the server cannot start either way. Full traces are below.
Reproduction
When I run the command python3 -m sglang.launch_server --model-path "meta-llama/Llama-3.2-1B" --host 0.0.0.0 --dtype float32, I get a vLLM import error (ModuleNotFoundError: No module named 'vllm').
Here is the full error trace -
root@158457f6b080:/sgl-workspace/sglang# python3 -m sglang.launch_server --model-path "meta-llama/Llama-3.2-1B" --host 0.0.0.0 --dtype float32
[2025-12-02 19:21:23] INFO model_config.py:881: Upcasting torch.bfloat16 to torch.float32.
[2025-12-02 19:21:23] WARNING server_args.py:1213: Attention backend not explicitly specified. Use fa3 backend by default.
[2025-12-02 19:21:24] Fail to set RLIMIT_NOFILE: current limit exceeds maximum limit
[2025-12-02 19:21:24] server_args=ServerArgs(model_path='meta-llama/Llama-3.2-1B', tokenizer_path='meta-llama/Llama-3.2-1B', tokenizer_mode='auto', tokenizer_worker_num=1, skip_tokenizer_init=False, load_format='auto', model_loader_extra_config='{}', trust_remote_code=False, context_length=None, is_embedding=False, enable_multimodal=None, revision=None, model_impl='auto', host='0.0.0.0', port=30000, grpc_mode=False, skip_server_warmup=False, warmups=None, nccl_port=None, checkpoint_engine_wait_weights_before_ready=False, dtype='float32', quantization=None, quantization_param_path=None, kv_cache_dtype='auto', enable_fp32_lm_head=False, modelopt_quant=None, modelopt_checkpoint_restore_path=None, modelopt_checkpoint_save_path=None, modelopt_export_path=None, quantize_and_serve=False, mem_fraction_static=0.86, max_running_requests=None, max_queued_requests=None, max_total_tokens=None, chunked_prefill_size=8192, max_prefill_tokens=16384, schedule_policy='fcfs', enable_priority_scheduling=False, abort_on_priority_when_disabled=False, schedule_low_priority_values_first=False, priority_scheduling_preemption_threshold=10, schedule_conservativeness=1.0, page_size=1, hybrid_kvcache_ratio=None, swa_full_tokens_ratio=0.8, disable_hybrid_swa_memory=False, radix_eviction_policy='lru', device='cuda', tp_size=1, pp_size=1, pp_max_micro_batch_size=None, stream_interval=1, stream_output=False, random_seed=38650796, constrained_json_whitespace_pattern=None, constrained_json_disable_any_whitespace=False, watchdog_timeout=300, dist_timeout=None, download_dir=None, base_gpu_id=0, gpu_id_step=1, sleep_on_idle=False, log_level='info', log_level_http=None, log_requests=False, log_requests_level=2, crash_dump_folder=None, show_time_cost=False, enable_metrics=False, enable_metrics_for_all_schedulers=False, tokenizer_metrics_custom_labels_header='x-custom-labels', tokenizer_metrics_allowed_custom_labels=None, bucket_time_to_first_token=None, bucket_inter_token_latency=None, bucket_e2e_request_latency=None, collect_tokens_histogram=False, prompt_tokens_buckets=None, generation_tokens_buckets=None, gc_warning_threshold_secs=0.0, decode_log_interval=40, enable_request_time_stats_logging=False, kv_events_config=None, enable_trace=False, otlp_traces_endpoint='localhost:4317', export_metrics_to_file=False, export_metrics_to_file_dir=None, api_key=None, served_model_name='meta-llama/Llama-3.2-1B', weight_version='default', chat_template=None, completion_template=None, file_storage_path='sglang_storage', enable_cache_report=False, reasoning_parser=None, tool_call_parser=None, tool_server=None, sampling_defaults='model', dp_size=1, load_balance_method='round_robin', load_watch_interval=0.1, prefill_round_robin_balance=False, dist_init_addr=None, nnodes=1, node_rank=0, json_model_override_args='{}', preferred_sampling_params=None, enable_lora=None, max_lora_rank=None, lora_target_modules=None, lora_paths=None, max_loaded_loras=None, max_loras_per_batch=8, lora_eviction_policy='lru', lora_backend='csgmv', max_lora_chunk_size=16, attention_backend='fa3', decode_attention_backend=None, prefill_attention_backend=None, sampling_backend='flashinfer', grammar_backend='xgrammar', mm_attention_backend=None, nsa_prefill_backend='flashmla_sparse', nsa_decode_backend='fa3', speculative_algorithm=None, speculative_draft_model_path=None, speculative_draft_model_revision=None, speculative_draft_load_format=None, speculative_num_steps=None, speculative_eagle_topk=None, speculative_num_draft_tokens=None, speculative_accept_threshold_single=1.0, 
speculative_accept_threshold_acc=1.0, speculative_token_map=None, speculative_attention_mode='prefill', speculative_moe_runner_backend=None, speculative_ngram_min_match_window_size=1, speculative_ngram_max_match_window_size=12, speculative_ngram_min_bfs_breadth=1, speculative_ngram_max_bfs_breadth=10, speculative_ngram_match_type='BFS', speculative_ngram_branch_length=18, speculative_ngram_capacity=10000000, ep_size=1, moe_a2a_backend='none', moe_runner_backend='auto', flashinfer_mxfp4_moe_precision='default', enable_flashinfer_allreduce_fusion=False, deepep_mode='auto', ep_num_redundant_experts=0, ep_dispatch_algorithm='static', init_expert_location='trivial', enable_eplb=False, eplb_algorithm='auto', eplb_rebalance_num_iterations=1000, eplb_rebalance_layers_per_chunk=None, eplb_min_rebalancing_utilization_threshold=1.0, expert_distribution_recorder_mode=None, expert_distribution_recorder_buffer_size=1000, enable_expert_distribution_metrics=False, deepep_config=None, moe_dense_tp_size=None, elastic_ep_backend=None, mooncake_ib_device=None, max_mamba_cache_size=None, mamba_ssm_dtype='float32', mamba_full_memory_ratio=0.9, enable_hierarchical_cache=False, hicache_ratio=2.0, hicache_size=0, hicache_write_policy='write_through', hicache_io_backend='kernel', hicache_mem_layout='layer_first', hicache_storage_backend=None, hicache_storage_prefetch_policy='best_effort', hicache_storage_backend_extra_config=None, enable_lmcache=False, kt_weight_path=None, kt_method='AMXINT4', kt_cpuinfer=None, kt_threadpool_count=2, kt_num_gpu_experts=None, kt_max_deferred_experts_per_token=None, enable_double_sparsity=False, ds_channel_config_path=None, ds_heavy_channel_num=32, ds_heavy_token_num=256, ds_heavy_channel_type='qk', ds_sparse_decode_threshold=4096, cpu_offload_gb=0, offload_group_size=-1, offload_num_in_group=1, offload_prefetch_step=1, offload_mode='cpu', multi_item_scoring_delimiter=None, disable_radix_cache=False, cuda_graph_max_bs=256, cuda_graph_bs=[1, 2, 4, 8, 12, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256], disable_cuda_graph=False, disable_cuda_graph_padding=False, enable_profile_cuda_graph=False, enable_cudagraph_gc=False, enable_layerwise_nvtx_marker=False, enable_nccl_nvls=False, enable_symm_mem=False, disable_flashinfer_cutlass_moe_fp4_allgather=False, enable_tokenizer_batch_encode=False, disable_tokenizer_batch_decode=False, disable_outlines_disk_cache=False, disable_custom_all_reduce=False, enable_mscclpp=False, enable_torch_symm_mem=False, disable_overlap_schedule=False, enable_mixed_chunk=False, enable_dp_attention=False, enable_dp_lm_head=False, enable_two_batch_overlap=False, enable_single_batch_overlap=False, tbo_token_distribution_threshold=0.48, enable_torch_compile=False, enable_piecewise_cuda_graph=False, torch_compile_max_bs=32, piecewise_cuda_graph_max_tokens=4096, piecewise_cuda_graph_tokens=[4, 8, 12, 16, 20, 24, 28, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208, 224, 240, 256, 288, 320, 352, 384, 416, 448, 480, 512, 640, 768, 896, 1024, 1152, 1280, 1408, 1536, 1664, 1792, 1920, 2048, 2176, 2304, 2432, 2560, 2688, 2816, 2944, 3072, 3200, 3328, 3456, 3584, 3712, 3840, 3968, 4096], piecewise_cuda_graph_compiler='eager', torchao_config='', enable_nan_detection=False, enable_p2p_check=False, triton_attention_reduce_in_fp32=False, triton_attention_num_kv_splits=8, triton_attention_split_tile_size=None, num_continuous_decode_steps=1, delete_ckpt_after_loading=False, 
enable_memory_saver=False, enable_weights_cpu_backup=False, enable_draft_weights_cpu_backup=False, allow_auto_truncate=False, enable_custom_logit_processor=False, flashinfer_mla_disable_ragged=False, disable_shared_experts_fusion=False, disable_chunked_prefix_cache=False, disable_fast_image_processor=False, keep_mm_feature_on_device=False, enable_return_hidden_states=False, scheduler_recv_interval=1, numa_node=None, enable_deterministic_inference=False, rl_on_policy_target=None, enable_attn_tp_input_scattered=False, enable_dynamic_batch_tokenizer=False, dynamic_batch_tokenizer_batch_size=32, dynamic_batch_tokenizer_batch_timeout=0.002, debug_tensor_dump_output_folder=None, debug_tensor_dump_layers=None, debug_tensor_dump_input_file=None, debug_tensor_dump_inject=False, disaggregation_mode='null', disaggregation_transfer_backend='mooncake', disaggregation_bootstrap_port=8998, disaggregation_decode_tp=None, disaggregation_decode_dp=None, disaggregation_prefill_pp=1, disaggregation_ib_device=None, disaggregation_decode_enable_offload_kvcache=False, num_reserved_decode_tokens=512, disaggregation_decode_polling_interval=1, custom_weight_loader=[], weight_loader_disable_mmap=False, remote_instance_weight_loader_seed_instance_ip=None, remote_instance_weight_loader_seed_instance_service_port=None, remote_instance_weight_loader_send_weights_group_ports=None, enable_pdmux=False, pdmux_config_path=None, sm_group_num=8, mm_max_concurrent_calls=32, mm_per_request_timeout=10.0, enable_broadcast_mm_inputs_process=False, decrypted_config_file=None, decrypted_draft_config_file=None)
[2025-12-02 19:21:24] Upcasting torch.bfloat16 to torch.float32.
[2025-12-02 19:21:25] No chat template found, defaulting to 'string' content format
[2025-12-02 19:21:30] Upcasting torch.bfloat16 to torch.float32.
[2025-12-02 19:21:31] Upcasting torch.bfloat16 to torch.float32.
[2025-12-02 19:21:31] Init torch distributed begin.
[W1202 19:21:32.334798467 socket.cpp:200] [c10d] The hostname of the client socket cannot be retrieved. err=-3
[W1202 19:21:32.343275226 socket.cpp:200] [c10d] The hostname of the client socket cannot be retrieved. err=-3
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[2025-12-02 19:21:32] Init torch distributed ends. mem usage=0.00 GB
[2025-12-02 19:21:32] MOE_RUNNER_BACKEND is not initialized, the backend will be automatically selected
[2025-12-02 19:21:33] Load weight begin. avail mem=5.97 GB
[2025-12-02 19:21:33] Scheduler hit an exception: Traceback (most recent call last):
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2712, in run_scheduler_process
scheduler = Scheduler(
^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 312, in __init__
self.tp_worker = TpModelWorker(
^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/managers/tp_worker.py", line 237, in __init__
self._model_runner = ModelRunner(
^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 324, in __init__
self.initialize(min_per_gpu_memory)
File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 410, in initialize
self.load_model()
File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 767, in load_model
self.model = get_model(
^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/model_loader/__init__.py", line 28, in get_model
return loader.load_model(
^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/model_loader/loader.py", line 594, in load_model
model = _initialize_model(
^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/model_loader/loader.py", line 262, in _initialize_model
return model_class(**kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/models/llama.py", line 425, in __init__
self.model = self._init_model(config, quant_config, add_prefix("model", prefix))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/models/llama.py", line 457, in _init_model
return LlamaModel(config, quant_config=quant_config, prefix=prefix)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/models/llama.py", line 300, in __init__
self.layers, self.start_layer, self.end_layer = make_layers(
^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/utils/common.py", line 577, in make_layers
+ get_offloader().wrap_modules(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/utils/offloader.py", line 36, in wrap_modules
return list(all_modules_generator)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/utils/common.py", line 579, in <genexpr>
layer_fn(idx=idx, prefix=add_prefix(idx, prefix))
File "/sgl-workspace/sglang/python/sglang/srt/models/llama.py", line 302, in <lambda>
lambda idx, prefix: LlamaDecoderLayer(
^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/models/llama.py", line 227, in __init__
self.self_attn = LlamaAttention(
^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/models/llama.py", line 170, in __init__
self.rotary_emb = get_rope(
^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/layers/rotary_embedding.py", line 2541, in get_rope
rotary_emb = Llama3RotaryEmbedding(
^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/layers/rotary_embedding.py", line 932, in __init__
super().__init__(
File "/sgl-workspace/sglang/python/sglang/srt/layers/rotary_embedding.py", line 121, in __init__
from vllm._custom_ops import rotary_embedding
ModuleNotFoundError: No module named 'vllm'
[2025-12-02 19:21:33] Received sigquit from a child process. It usually means the child failed.
Killed
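To double-check that vLLM is genuinely absent from the image (and not merely failing to load), standard Python/pip tooling can be used; nothing here is sglang-specific:

python3 -c "import importlib.util; print(importlib.util.find_spec('vllm'))"
pip list | grep -iE 'vllm|torch|sgl'

Given the ModuleNotFoundError above, the first command should print None.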
I cannot figure out which vLLM version is the right one to install. If I install the latest one via pip install -U vllm, I get dependency conflicts during the installation -
root@158457f6b080:/sgl-workspace/sglang# pip install -U vllm
Collecting vllm
Downloading vllm-0.11.2-cp38-abi3-manylinux1_x86_64.whl.metadata (18 kB)
Requirement already satisfied: regex in /usr/local/lib/python3.12/dist-packages (from vllm) (2025.11.3)
Collecting cachetools (from vllm)
Downloading cachetools-6.2.2-py3-none-any.whl.metadata (5.6 kB)
Requirement already satisfied: psutil in /usr/local/lib/python3.12/dist-packages (from vllm) (7.1.3)
Requirement already satisfied: sentencepiece in /usr/local/lib/python3.12/dist-packages (from vllm) (0.2.1)
Requirement already satisfied: numpy in /usr/local/lib/python3.12/dist-packages (from vllm) (2.3.5)
Requirement already satisfied: requests>=2.26.0 in /usr/local/lib/python3.12/dist-packages (from vllm) (2.32.5)
Requirement already satisfied: tqdm in /usr/local/lib/python3.12/dist-packages (from vllm) (4.67.1)
Collecting blake3 (from vllm)
Downloading blake3-1.0.8-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.6 kB)
Collecting py-cpuinfo (from vllm)
Downloading py_cpuinfo-9.0.0-py3-none-any.whl.metadata (794 bytes)
Requirement already satisfied: transformers<5,>=4.56.0 in /usr/local/lib/python3.12/dist-packages (from vllm) (4.57.1)
Requirement already satisfied: tokenizers>=0.21.1 in /usr/local/lib/python3.12/dist-packages (from vllm) (0.22.1)
Requirement already satisfied: protobuf in /usr/local/lib/python3.12/dist-packages (from vllm) (6.33.1)
Requirement already satisfied: fastapi>=0.115.0 in /usr/local/lib/python3.12/dist-packages (from fastapi[standard]>=0.115.0->vllm) (0.121.2)
Requirement already satisfied: aiohttp in /usr/local/lib/python3.12/dist-packages (from vllm) (3.13.2)
Requirement already satisfied: openai>=1.99.1 in /usr/local/lib/python3.12/dist-packages (from vllm) (2.6.1)
Requirement already satisfied: pydantic>=2.12.0 in /usr/local/lib/python3.12/dist-packages (from vllm) (2.12.4)
Requirement already satisfied: prometheus_client>=0.18.0 in /usr/local/lib/python3.12/dist-packages (from vllm) (0.23.1)
Requirement already satisfied: pillow in /usr/local/lib/python3.12/dist-packages (from vllm) (12.0.0)
Collecting prometheus-fastapi-instrumentator>=7.0.0 (from vllm)
Downloading prometheus_fastapi_instrumentator-7.1.0-py3-none-any.whl.metadata (13 kB)
Requirement already satisfied: tiktoken>=0.6.0 in /usr/local/lib/python3.12/dist-packages (from vllm) (0.12.0)
Collecting lm-format-enforcer==0.11.3 (from vllm)
Downloading lm_format_enforcer-0.11.3-py3-none-any.whl.metadata (17 kB)
Collecting llguidance<1.4.0,>=1.3.0 (from vllm)
Downloading llguidance-1.3.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (10 kB)
Collecting outlines_core==0.2.11 (from vllm)
Downloading outlines_core-0.2.11-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.8 kB)
Requirement already satisfied: diskcache==5.6.3 in /usr/local/lib/python3.12/dist-packages (from vllm) (5.6.3)
Collecting lark==1.2.2 (from vllm)
Downloading lark-1.2.2-py3-none-any.whl.metadata (1.8 kB)
Requirement already satisfied: xgrammar==0.1.25 in /usr/local/lib/python3.12/dist-packages (from vllm) (0.1.25)
Requirement already satisfied: typing_extensions>=4.10 in /usr/local/lib/python3.12/dist-packages (from vllm) (4.15.0)
Requirement already satisfied: filelock>=3.16.1 in /usr/local/lib/python3.12/dist-packages (from vllm) (3.20.0)
Requirement already satisfied: partial-json-parser in /usr/local/lib/python3.12/dist-packages (from vllm) (0.2.1.1.post6)
Requirement already satisfied: pyzmq>=25.0.0 in /usr/local/lib/python3.12/dist-packages (from vllm) (27.1.0)
Requirement already satisfied: msgspec in /usr/local/lib/python3.12/dist-packages (from vllm) (0.19.0)
Requirement already satisfied: gguf>=0.13.0 in /usr/local/lib/python3.12/dist-packages (from vllm) (0.17.1)
Collecting mistral_common>=1.8.5 (from mistral_common[image]>=1.8.5->vllm)
Downloading mistral_common-1.8.6-py3-none-any.whl.metadata (5.3 kB)
Collecting opencv-python-headless>=4.11.0 (from vllm)
Downloading opencv_python_headless-4.12.0.88-cp37-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (19 kB)
Requirement already satisfied: pyyaml in /usr/local/lib/python3.12/dist-packages (from vllm) (6.0.3)
Requirement already satisfied: six>=1.16.0 in /usr/local/lib/python3.12/dist-packages (from vllm) (1.17.0)
Requirement already satisfied: setuptools<81.0.0,>=77.0.3 in /usr/local/lib/python3.12/dist-packages (from vllm) (80.9.0)
Requirement already satisfied: einops in /usr/local/lib/python3.12/dist-packages (from vllm) (0.8.1)
Requirement already satisfied: compressed-tensors==0.12.2 in /usr/local/lib/python3.12/dist-packages (from vllm) (0.12.2)
Collecting depyf==0.20.0 (from vllm)
Downloading depyf-0.20.0-py3-none-any.whl.metadata (7.3 kB)
Requirement already satisfied: cloudpickle in /usr/local/lib/python3.12/dist-packages (from vllm) (3.1.2)
Collecting watchfiles (from vllm)
Downloading watchfiles-1.1.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.9 kB)
Collecting python-json-logger (from vllm)
Downloading python_json_logger-4.0.0-py3-none-any.whl.metadata (4.0 kB)
Requirement already satisfied: scipy in /usr/local/lib/python3.12/dist-packages (from vllm) (1.16.3)
Requirement already satisfied: ninja in /usr/local/lib/python3.12/dist-packages (from vllm) (1.13.0)
Requirement already satisfied: pybase64 in /usr/local/lib/python3.12/dist-packages (from vllm) (1.4.2)
Collecting cbor2 (from vllm)
Downloading cbor2-5.7.1-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata (5.4 kB)
Requirement already satisfied: setproctitle in /usr/local/lib/python3.12/dist-packages (from vllm) (1.3.7)
Requirement already satisfied: openai-harmony>=0.0.3 in /usr/local/lib/python3.12/dist-packages (from vllm) (0.0.4)
Collecting anthropic==0.71.0 (from vllm)
Downloading anthropic-0.71.0-py3-none-any.whl.metadata (28 kB)
Collecting model-hosting-container-standards<1.0.0 (from vllm)
Downloading model_hosting_container_standards-0.1.9-py3-none-any.whl.metadata (24 kB)
Collecting numba==0.61.2 (from vllm)
Downloading numba-0.61.2-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.8 kB)
Collecting ray>=2.48.0 (from ray[cgraph]>=2.48.0->vllm)
Downloading ray-2.52.1-cp312-cp312-manylinux2014_x86_64.whl.metadata (21 kB)
Collecting torch==2.9.0 (from vllm)
Downloading torch-2.9.0-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (30 kB)
Collecting torchaudio==2.9.0 (from vllm)
Downloading torchaudio-2.9.0-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (6.9 kB)
Collecting torchvision==0.24.0 (from vllm)
Downloading torchvision-0.24.0-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (5.9 kB)
Collecting xformers==0.0.33.post1 (from vllm)
Downloading xformers-0.0.33.post1-cp39-abi3-manylinux_2_28_x86_64.whl.metadata (1.2 kB)
Requirement already satisfied: flashinfer-python==0.5.2 in /usr/local/lib/python3.12/dist-packages (from vllm) (0.5.2)
Requirement already satisfied: anyio<5,>=3.5.0 in /usr/local/lib/python3.12/dist-packages (from anthropic==0.71.0->vllm) (4.11.0)
Requirement already satisfied: distro<2,>=1.7.0 in /usr/lib/python3/dist-packages (from anthropic==0.71.0->vllm) (1.7.0)
Requirement already satisfied: docstring-parser<1,>=0.15 in /usr/local/lib/python3.12/dist-packages (from anthropic==0.71.0->vllm) (0.17.0)
Requirement already satisfied: httpx<1,>=0.25.0 in /usr/local/lib/python3.12/dist-packages (from anthropic==0.71.0->vllm) (0.28.1)
Requirement already satisfied: jiter<1,>=0.4.0 in /usr/local/lib/python3.12/dist-packages (from anthropic==0.71.0->vllm) (0.12.0)
Requirement already satisfied: sniffio in /usr/local/lib/python3.12/dist-packages (from anthropic==0.71.0->vllm) (1.3.1)
Requirement already satisfied: loguru in /usr/local/lib/python3.12/dist-packages (from compressed-tensors==0.12.2->vllm) (0.7.3)
Collecting astor (from depyf==0.20.0->vllm)
Downloading astor-0.8.1-py2.py3-none-any.whl.metadata (4.2 kB)
Requirement already satisfied: dill in /usr/local/lib/python3.12/dist-packages (from depyf==0.20.0->vllm) (0.4.0)
Requirement already satisfied: apache-tvm-ffi<0.2,>=0.1 in /usr/local/lib/python3.12/dist-packages (from flashinfer-python==0.5.2->vllm) (0.1.2)
Requirement already satisfied: click in /usr/local/lib/python3.12/dist-packages (from flashinfer-python==0.5.2->vllm) (8.3.1)
Requirement already satisfied: nvidia-cudnn-frontend>=1.13.0 in /usr/local/lib/python3.12/dist-packages (from flashinfer-python==0.5.2->vllm) (1.16.0)
Requirement already satisfied: nvidia-cutlass-dsl>=4.2.1 in /usr/local/lib/python3.12/dist-packages (from flashinfer-python==0.5.2->vllm) (4.3.0.dev0)
Requirement already satisfied: nvidia-ml-py in /usr/local/lib/python3.12/dist-packages (from flashinfer-python==0.5.2->vllm) (13.580.82)
Requirement already satisfied: packaging>=24.2 in /usr/local/lib/python3.12/dist-packages (from flashinfer-python==0.5.2->vllm) (25.0)
Requirement already satisfied: tabulate in /usr/local/lib/python3.12/dist-packages (from flashinfer-python==0.5.2->vllm) (0.9.0)
Requirement already satisfied: interegular>=0.3.2 in /usr/local/lib/python3.12/dist-packages (from lm-format-enforcer==0.11.3->vllm) (0.3.3)
Collecting llvmlite<0.45,>=0.44.0dev0 (from numba==0.61.2->vllm)
Downloading llvmlite-0.44.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.0 kB)
Collecting numpy (from vllm)
Downloading numpy-2.2.6-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (62 kB)
Requirement already satisfied: sympy>=1.13.3 in /usr/local/lib/python3.12/dist-packages (from torch==2.9.0->vllm) (1.14.0)
Requirement already satisfied: networkx>=2.5.1 in /usr/local/lib/python3.12/dist-packages (from torch==2.9.0->vllm) (3.5)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.12/dist-packages (from torch==2.9.0->vllm) (3.1.6)
Requirement already satisfied: fsspec>=0.8.5 in /usr/local/lib/python3.12/dist-packages (from torch==2.9.0->vllm) (2025.10.0)
Collecting nvidia-cuda-nvrtc-cu12==12.8.93 (from torch==2.9.0->vllm)
Downloading nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.7 kB)
Collecting nvidia-cuda-runtime-cu12==12.8.90 (from torch==2.9.0->vllm)
Downloading nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB)
Collecting nvidia-cuda-cupti-cu12==12.8.90 (from torch==2.9.0->vllm)
Downloading nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB)
Requirement already satisfied: nvidia-cudnn-cu12==9.10.2.21 in /usr/local/lib/python3.12/dist-packages (from torch==2.9.0->vllm) (9.10.2.21)
Collecting nvidia-cublas-cu12==12.8.4.1 (from torch==2.9.0->vllm)
Downloading nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl.metadata (1.7 kB)
Collecting nvidia-cufft-cu12==11.3.3.83 (from torch==2.9.0->vllm)
Downloading nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB)
Collecting nvidia-curand-cu12==10.3.9.90 (from torch==2.9.0->vllm)
Downloading nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl.metadata (1.7 kB)
Collecting nvidia-cusolver-cu12==11.7.3.90 (from torch==2.9.0->vllm)
Downloading nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl.metadata (1.8 kB)
Collecting nvidia-cusparse-cu12==12.5.8.93 (from torch==2.9.0->vllm)
Downloading nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.8 kB)
Requirement already satisfied: nvidia-cusparselt-cu12==0.7.1 in /usr/local/lib/python3.12/dist-packages (from torch==2.9.0->vllm) (0.7.1)
Collecting nvidia-nccl-cu12==2.27.5 (from torch==2.9.0->vllm)
Downloading nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.0 kB)
Collecting nvidia-nvshmem-cu12==3.3.20 (from torch==2.9.0->vllm)
Downloading nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.1 kB)
Collecting nvidia-nvtx-cu12==12.8.90 (from torch==2.9.0->vllm)
Downloading nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.8 kB)
Collecting nvidia-nvjitlink-cu12==12.8.93 (from torch==2.9.0->vllm)
Downloading nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.7 kB)
Collecting nvidia-cufile-cu12==1.13.1.3 (from torch==2.9.0->vllm)
Downloading nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB)
Collecting triton==3.5.0 (from torch==2.9.0->vllm)
Downloading triton-3.5.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (1.7 kB)
Requirement already satisfied: idna>=2.8 in /usr/local/lib/python3.12/dist-packages (from anyio<5,>=3.5.0->anthropic==0.71.0->vllm) (3.11)
Requirement already satisfied: certifi in /usr/local/lib/python3.12/dist-packages (from httpx<1,>=0.25.0->anthropic==0.71.0->vllm) (2025.11.12)
Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.12/dist-packages (from httpx<1,>=0.25.0->anthropic==0.71.0->vllm) (1.0.9)
Requirement already satisfied: h11>=0.16 in /usr/local/lib/python3.12/dist-packages (from httpcore==1.*->httpx<1,>=0.25.0->anthropic==0.71.0->vllm) (0.16.0)
Collecting jmespath (from model-hosting-container-standards<1.0.0->vllm)
Downloading jmespath-1.0.1-py3-none-any.whl.metadata (7.6 kB)
Requirement already satisfied: starlette>=0.49.1 in /usr/local/lib/python3.12/dist-packages (from model-hosting-container-standards<1.0.0->vllm) (0.49.3)
Collecting supervisor>=4.2.0 (from model-hosting-container-standards<1.0.0->vllm)
Downloading supervisor-4.3.0-py2.py3-none-any.whl.metadata (87 kB)
Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.12/dist-packages (from pydantic>=2.12.0->vllm) (0.7.0)
Requirement already satisfied: pydantic-core==2.41.5 in /usr/local/lib/python3.12/dist-packages (from pydantic>=2.12.0->vllm) (2.41.5)
Requirement already satisfied: typing-inspection>=0.4.2 in /usr/local/lib/python3.12/dist-packages (from pydantic>=2.12.0->vllm) (0.4.2)
Requirement already satisfied: huggingface-hub<1.0,>=0.34.0 in /usr/local/lib/python3.12/dist-packages (from transformers<5,>=4.56.0->vllm) (0.36.0)
Requirement already satisfied: safetensors>=0.4.3 in /usr/local/lib/python3.12/dist-packages (from transformers<5,>=4.56.0->vllm) (0.6.2)
Requirement already satisfied: hf-xet<2.0.0,>=1.1.3 in /usr/local/lib/python3.12/dist-packages (from huggingface-hub<1.0,>=0.34.0->transformers<5,>=4.56.0->vllm) (1.2.0)
Requirement already satisfied: annotated-doc>=0.0.2 in /usr/local/lib/python3.12/dist-packages (from fastapi>=0.115.0->fastapi[standard]>=0.115.0->vllm) (0.0.4)
Collecting fastapi-cli>=0.0.8 (from fastapi-cli[standard]>=0.0.8; extra == "standard"->fastapi[standard]>=0.115.0->vllm)
Downloading fastapi_cli-0.0.16-py3-none-any.whl.metadata (6.4 kB)
Requirement already satisfied: python-multipart>=0.0.18 in /usr/local/lib/python3.12/dist-packages (from fastapi[standard]>=0.115.0->vllm) (0.0.20)
Collecting email-validator>=2.0.0 (from fastapi[standard]>=0.115.0->vllm)
Downloading email_validator-2.3.0-py3-none-any.whl.metadata (26 kB)
Requirement already satisfied: uvicorn>=0.12.0 in /usr/local/lib/python3.12/dist-packages (from uvicorn[standard]>=0.12.0; extra == "standard"->fastapi[standard]>=0.115.0->vllm) (0.38.0)
Collecting dnspython>=2.0.0 (from email-validator>=2.0.0->fastapi[standard]>=0.115.0->vllm)
Downloading dnspython-2.8.0-py3-none-any.whl.metadata (5.7 kB)
Collecting typer>=0.15.1 (from fastapi-cli>=0.0.8->fastapi-cli[standard]>=0.0.8; extra == "standard"->fastapi[standard]>=0.115.0->vllm)
Downloading typer-0.20.0-py3-none-any.whl.metadata (16 kB)
Collecting rich-toolkit>=0.14.8 (from fastapi-cli>=0.0.8->fastapi-cli[standard]>=0.0.8; extra == "standard"->fastapi[standard]>=0.115.0->vllm)
Downloading rich_toolkit-0.17.0-py3-none-any.whl.metadata (1.0 kB)
Collecting fastapi-cloud-cli>=0.1.1 (from fastapi-cli[standard]>=0.0.8; extra == "standard"->fastapi[standard]>=0.115.0->vllm)
Downloading fastapi_cloud_cli-0.5.2-py3-none-any.whl.metadata (3.3 kB)
Collecting rignore>=0.5.1 (from fastapi-cloud-cli>=0.1.1->fastapi-cli[standard]>=0.0.8; extra == "standard"->fastapi[standard]>=0.115.0->vllm)
Downloading rignore-0.7.6-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.2 kB)
Collecting sentry-sdk>=2.20.0 (from fastapi-cloud-cli>=0.1.1->fastapi-cli[standard]>=0.0.8; extra == "standard"->fastapi[standard]>=0.115.0->vllm)
Downloading sentry_sdk-2.46.0-py2.py3-none-any.whl.metadata (10 kB)
Collecting fastar>=0.5.0 (from fastapi-cloud-cli>=0.1.1->fastapi-cli[standard]>=0.0.8; extra == "standard"->fastapi[standard]>=0.115.0->vllm)
Downloading fastar-0.8.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.0 kB)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.12/dist-packages (from jinja2->torch==2.9.0->vllm) (3.0.3)
Requirement already satisfied: jsonschema>=4.21.1 in /usr/local/lib/python3.12/dist-packages (from mistral_common>=1.8.5->mistral_common[image]>=1.8.5->vllm) (4.25.1)
Collecting pydantic-extra-types>=2.10.5 (from pydantic-extra-types[pycountry]>=2.10.5->mistral_common>=1.8.5->mistral_common[image]>=1.8.5->vllm)
Downloading pydantic_extra_types-2.10.6-py3-none-any.whl.metadata (4.0 kB)
Requirement already satisfied: attrs>=22.2.0 in /usr/local/lib/python3.12/dist-packages (from jsonschema>=4.21.1->mistral_common>=1.8.5->mistral_common[image]>=1.8.5->vllm) (25.4.0)
Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /usr/local/lib/python3.12/dist-packages (from jsonschema>=4.21.1->mistral_common>=1.8.5->mistral_common[image]>=1.8.5->vllm) (2025.9.1)
Requirement already satisfied: referencing>=0.28.4 in /usr/local/lib/python3.12/dist-packages (from jsonschema>=4.21.1->mistral_common>=1.8.5->mistral_common[image]>=1.8.5->vllm) (0.37.0)
Requirement already satisfied: rpds-py>=0.7.1 in /usr/local/lib/python3.12/dist-packages (from jsonschema>=4.21.1->mistral_common>=1.8.5->mistral_common[image]>=1.8.5->vllm) (0.29.0)
Requirement already satisfied: cuda-python>=12.8 in /usr/local/lib/python3.12/dist-packages (from nvidia-cutlass-dsl>=4.2.1->flashinfer-python==0.5.2->vllm) (13.0.3)
Requirement already satisfied: cuda-bindings~=13.0.3 in /usr/local/lib/python3.12/dist-packages (from cuda-python>=12.8->nvidia-cutlass-dsl>=4.2.1->flashinfer-python==0.5.2->vllm) (13.0.3)
Requirement already satisfied: cuda-pathfinder~=1.1 in /usr/local/lib/python3.12/dist-packages (from cuda-python>=12.8->nvidia-cutlass-dsl>=4.2.1->flashinfer-python==0.5.2->vllm) (1.3.2)
Requirement already satisfied: pycountry>=23 in /usr/local/lib/python3.12/dist-packages (from pydantic-extra-types[pycountry]>=2.10.5->mistral_common>=1.8.5->mistral_common[image]>=1.8.5->vllm) (24.6.1)
Collecting click (from flashinfer-python==0.5.2->vllm)
Downloading click-8.2.1-py3-none-any.whl.metadata (2.5 kB)
Collecting msgpack<2.0.0,>=1.0.0 (from ray>=2.48.0->ray[cgraph]>=2.48.0->vllm)
Downloading msgpack-1.1.2-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata (8.1 kB)
Collecting cupy-cuda12x (from ray[cgraph]>=2.48.0->vllm)
Downloading cupy_cuda12x-13.6.0-cp312-cp312-manylinux2014_x86_64.whl.metadata (2.4 kB)
Requirement already satisfied: charset_normalizer<4,>=2 in /usr/local/lib/python3.12/dist-packages (from requests>=2.26.0->vllm) (3.4.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.12/dist-packages (from requests>=2.26.0->vllm) (2.5.0)
Collecting rich>=13.7.1 (from rich-toolkit>=0.14.8->fastapi-cli>=0.0.8->fastapi-cli[standard]>=0.0.8; extra == "standard"->fastapi[standard]>=0.115.0->vllm)
Downloading rich-14.2.0-py3-none-any.whl.metadata (18 kB)
Collecting markdown-it-py>=2.2.0 (from rich>=13.7.1->rich-toolkit>=0.14.8->fastapi-cli>=0.0.8->fastapi-cli[standard]>=0.0.8; extra == "standard"->fastapi[standard]>=0.115.0->vllm)
Downloading markdown_it_py-4.0.0-py3-none-any.whl.metadata (7.3 kB)
Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /usr/local/lib/python3.12/dist-packages (from rich>=13.7.1->rich-toolkit>=0.14.8->fastapi-cli>=0.0.8->fastapi-cli[standard]>=0.0.8; extra == "standard"->fastapi[standard]>=0.115.0->vllm) (2.19.2)
Collecting mdurl~=0.1 (from markdown-it-py>=2.2.0->rich>=13.7.1->rich-toolkit>=0.14.8->fastapi-cli>=0.0.8->fastapi-cli[standard]>=0.0.8; extra == "standard"->fastapi[standard]>=0.115.0->vllm)
Downloading mdurl-0.1.2-py3-none-any.whl.metadata (1.6 kB)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.12/dist-packages (from sympy>=1.13.3->torch==2.9.0->vllm) (1.3.0)
Collecting shellingham>=1.3.0 (from typer>=0.15.1->fastapi-cli>=0.0.8->fastapi-cli[standard]>=0.0.8; extra == "standard"->fastapi[standard]>=0.115.0->vllm)
Downloading shellingham-1.5.4-py2.py3-none-any.whl.metadata (3.5 kB)
Collecting httptools>=0.6.3 (from uvicorn[standard]>=0.12.0; extra == "standard"->fastapi[standard]>=0.115.0->vllm)
Downloading httptools-0.7.1-cp312-cp312-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl.metadata (3.5 kB)
Collecting python-dotenv>=0.13 (from uvicorn[standard]>=0.12.0; extra == "standard"->fastapi[standard]>=0.115.0->vllm)
Downloading python_dotenv-1.2.1-py3-none-any.whl.metadata (25 kB)
Requirement already satisfied: uvloop>=0.15.1 in /usr/local/lib/python3.12/dist-packages (from uvicorn[standard]>=0.12.0; extra == "standard"->fastapi[standard]>=0.115.0->vllm) (0.21.0)
Collecting websockets>=10.4 (from uvicorn[standard]>=0.12.0; extra == "standard"->fastapi[standard]>=0.115.0->vllm)
Downloading websockets-15.0.1-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.8 kB)
Requirement already satisfied: aiohappyeyeballs>=2.5.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp->vllm) (2.6.1)
Requirement already satisfied: aiosignal>=1.4.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp->vllm) (1.4.0)
Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.12/dist-packages (from aiohttp->vllm) (1.8.0)
Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.12/dist-packages (from aiohttp->vllm) (6.7.0)
Requirement already satisfied: propcache>=0.2.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp->vllm) (0.4.1)
Requirement already satisfied: yarl<2.0,>=1.17.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp->vllm) (1.22.0)
Collecting fastrlock>=0.5 (from cupy-cuda12x->ray[cgraph]>=2.48.0->vllm)
Downloading fastrlock-0.8.3-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_28_x86_64.whl.metadata (7.7 kB)
Downloading vllm-0.11.2-cp38-abi3-manylinux1_x86_64.whl (370.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 370.3/370.3 MB 62.8 MB/s 0:00:06
Downloading anthropic-0.71.0-py3-none-any.whl (355 kB)
Downloading depyf-0.20.0-py3-none-any.whl (39 kB)
Downloading lark-1.2.2-py3-none-any.whl (111 kB)
Downloading lm_format_enforcer-0.11.3-py3-none-any.whl (45 kB)
Downloading numba-0.61.2-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (3.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.9/3.9 MB 59.1 MB/s 0:00:00
Downloading outlines_core-0.2.11-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.3/2.3 MB 42.4 MB/s 0:00:00
Downloading torch-2.9.0-cp312-cp312-manylinux_2_28_x86_64.whl (899.7 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 899.7/899.7 MB 40.3 MB/s 0:00:13
Downloading nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl (594.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 594.3/594.3 MB 62.8 MB/s 0:00:07
Downloading nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (10.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.2/10.2 MB 64.7 MB/s 0:00:00
Downloading nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (88.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 88.0/88.0 MB 80.3 MB/s 0:00:01
Downloading nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (954 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 954.8/954.8 kB 51.3 MB/s 0:00:00
Downloading nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (193.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 193.1/193.1 MB 72.2 MB/s 0:00:02
Downloading nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (1.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 51.9 MB/s 0:00:00
Downloading nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl (63.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 63.6/63.6 MB 79.6 MB/s 0:00:00
Downloading nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl (267.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 267.5/267.5 MB 74.0 MB/s 0:00:03
Downloading nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (288.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 288.2/288.2 MB 59.1 MB/s 0:00:04
Downloading nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (322.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 322.3/322.3 MB 70.3 MB/s 0:00:05
Downloading nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (39.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 39.3/39.3 MB 95.5 MB/s 0:00:00
Downloading nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (124.7 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 124.7/124.7 MB 81.4 MB/s 0:00:01
Downloading nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (89 kB)
Downloading torchaudio-2.9.0-cp312-cp312-manylinux_2_28_x86_64.whl (2.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 81.4 MB/s 0:00:00
Downloading torchvision-0.24.0-cp312-cp312-manylinux_2_28_x86_64.whl (8.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.1/8.1 MB 97.9 MB/s 0:00:00
Downloading triton-3.5.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (170.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 170.5/170.5 MB 58.0 MB/s 0:00:02
Downloading xformers-0.0.33.post1-cp39-abi3-manylinux_2_28_x86_64.whl (122.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 122.9/122.9 MB 64.0 MB/s 0:00:01
Downloading llguidance-1.3.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.0/3.0 MB 13.0 MB/s 0:00:00
Downloading llvmlite-0.44.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (42.4 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.4/42.4 MB 28.6 MB/s 0:00:01
Downloading model_hosting_container_standards-0.1.9-py3-none-any.whl (102 kB)
Downloading numpy-2.2.6-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16.5/16.5 MB 76.7 MB/s 0:00:00
Downloading email_validator-2.3.0-py3-none-any.whl (35 kB)
Downloading dnspython-2.8.0-py3-none-any.whl (331 kB)
Downloading fastapi_cli-0.0.16-py3-none-any.whl (12 kB)
Downloading fastapi_cloud_cli-0.5.2-py3-none-any.whl (23 kB)
Downloading fastar-0.8.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (821 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 821.6/821.6 kB 39.6 MB/s 0:00:00
Downloading mistral_common-1.8.6-py3-none-any.whl (6.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.5/6.5 MB 56.1 MB/s 0:00:00
Downloading opencv_python_headless-4.12.0.88-cp37-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (54.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 54.0/54.0 MB 98.8 MB/s 0:00:00
Downloading prometheus_fastapi_instrumentator-7.1.0-py3-none-any.whl (19 kB)
Downloading pydantic_extra_types-2.10.6-py3-none-any.whl (40 kB)
Downloading ray-2.52.1-cp312-cp312-manylinux2014_x86_64.whl (72.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 72.3/72.3 MB 68.2 MB/s 0:00:01
Downloading msgpack-1.1.2-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (427 kB)
Downloading click-8.2.1-py3-none-any.whl (102 kB)
Downloading rich_toolkit-0.17.0-py3-none-any.whl (31 kB)
Downloading rich-14.2.0-py3-none-any.whl (243 kB)
Downloading markdown_it_py-4.0.0-py3-none-any.whl (87 kB)
Downloading mdurl-0.1.2-py3-none-any.whl (10.0 kB)
Downloading rignore-0.7.6-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (959 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 959.8/959.8 kB 47.9 MB/s 0:00:00
Downloading sentry_sdk-2.46.0-py2.py3-none-any.whl (406 kB)
Downloading supervisor-4.3.0-py2.py3-none-any.whl (320 kB)
Downloading typer-0.20.0-py3-none-any.whl (47 kB)
Downloading shellingham-1.5.4-py2.py3-none-any.whl (9.8 kB)
Downloading httptools-0.7.1-cp312-cp312-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl (517 kB)
Downloading python_dotenv-1.2.1-py3-none-any.whl (21 kB)
Downloading watchfiles-1.1.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (456 kB)
Downloading websockets-15.0.1-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (182 kB)
Downloading astor-0.8.1-py2.py3-none-any.whl (27 kB)
Downloading blake3-1.0.8-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (388 kB)
Downloading cachetools-6.2.2-py3-none-any.whl (11 kB)
Downloading cbor2-5.7.1-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (285 kB)
Downloading cupy_cuda12x-13.6.0-cp312-cp312-manylinux2014_x86_64.whl (112.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 112.9/112.9 MB 70.7 MB/s 0:00:01
Downloading fastrlock-0.8.3-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_28_x86_64.whl (53 kB)
Downloading jmespath-1.0.1-py3-none-any.whl (20 kB)
Downloading py_cpuinfo-9.0.0-py3-none-any.whl (22 kB)
Downloading python_json_logger-4.0.0-py3-none-any.whl (15 kB)
WARNING: Error parsing dependencies of devscripts: Invalid version: '2.22.1ubuntu1'
Installing collected packages: supervisor, py-cpuinfo, fastrlock, websockets, triton, shellingham, sentry-sdk, rignore, python-json-logger, python-dotenv, outlines_core, nvidia-nvtx-cu12, nvidia-nvshmem-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufile-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, numpy, msgpack, mdurl, llvmlite, llguidance, lark, jmespath, httptools, fastar, dnspython, click, cbor2, cachetools, blake3, astor, watchfiles, opencv-python-headless, nvidia-cusparse-cu12, nvidia-cufft-cu12, numba, markdown-it-py, email-validator, depyf, cupy-cuda12x, rich, pydantic-extra-types, prometheus-fastapi-instrumentator, nvidia-cusolver-cu12, lm-format-enforcer, anthropic, typer, torch, rich-toolkit, ray, model-hosting-container-standards, xformers, torchvision, torchaudio, mistral_common, fastapi-cloud-cli, fastapi-cli, vllm
Attempting uninstall: triton
Found existing installation: triton 3.4.0
Uninstalling triton-3.4.0:
Successfully uninstalled triton-3.4.0
Attempting uninstall: outlines_core
Found existing installation: outlines_core 0.1.26
Uninstalling outlines_core-0.1.26:
Successfully uninstalled outlines_core-0.1.26
Attempting uninstall: nvidia-nvtx-cu12
Found existing installation: nvidia-nvtx-cu12 12.9.79
Uninstalling nvidia-nvtx-cu12-12.9.79:
Successfully uninstalled nvidia-nvtx-cu12-12.9.79
Attempting uninstall: nvidia-nvjitlink-cu12
Found existing installation: nvidia-nvjitlink-cu12 12.9.86
Uninstalling nvidia-nvjitlink-cu12-12.9.86:
Successfully uninstalled nvidia-nvjitlink-cu12-12.9.86
Attempting uninstall: nvidia-nccl-cu12
Found existing installation: nvidia-nccl-cu12 2.27.3
Uninstalling nvidia-nccl-cu12-2.27.3:
Successfully uninstalled nvidia-nccl-cu12-2.27.3
Attempting uninstall: nvidia-curand-cu12
Found existing installation: nvidia-curand-cu12 10.3.10.19
Uninstalling nvidia-curand-cu12-10.3.10.19:
Successfully uninstalled nvidia-curand-cu12-10.3.10.19
Attempting uninstall: nvidia-cufile-cu12
Found existing installation: nvidia-cufile-cu12 1.14.1.1
Uninstalling nvidia-cufile-cu12-1.14.1.1:
Successfully uninstalled nvidia-cufile-cu12-1.14.1.1
Attempting uninstall: nvidia-cuda-runtime-cu12
Found existing installation: nvidia-cuda-runtime-cu12 12.9.79
Uninstalling nvidia-cuda-runtime-cu12-12.9.79:
Successfully uninstalled nvidia-cuda-runtime-cu12-12.9.79
Attempting uninstall: nvidia-cuda-nvrtc-cu12
Found existing installation: nvidia-cuda-nvrtc-cu12 12.9.86
Uninstalling nvidia-cuda-nvrtc-cu12-12.9.86:
Successfully uninstalled nvidia-cuda-nvrtc-cu12-12.9.86
Attempting uninstall: nvidia-cuda-cupti-cu12
Found existing installation: nvidia-cuda-cupti-cu12 12.9.79
Uninstalling nvidia-cuda-cupti-cu12-12.9.79:
Successfully uninstalled nvidia-cuda-cupti-cu12-12.9.79
Attempting uninstall: nvidia-cublas-cu12
Found existing installation: nvidia-cublas-cu12 12.9.1.4
Uninstalling nvidia-cublas-cu12-12.9.1.4:
Successfully uninstalled nvidia-cublas-cu12-12.9.1.4
Attempting uninstall: numpy
Found existing installation: numpy 2.3.5
Uninstalling numpy-2.3.5:
Successfully uninstalled numpy-2.3.5
Attempting uninstall: llguidance
Found existing installation: llguidance 0.7.30
Uninstalling llguidance-0.7.30:
Successfully uninstalled llguidance-0.7.30
Attempting uninstall: lark
Found existing installation: lark 1.3.1
Uninstalling lark-1.3.1:
Successfully uninstalled lark-1.3.1
Attempting uninstall: click
Found existing installation: click 8.3.1
Uninstalling click-8.3.1:
Successfully uninstalled click-8.3.1
Attempting uninstall: nvidia-cusparse-cu12
Found existing installation: nvidia-cusparse-cu12 12.5.10.65
Uninstalling nvidia-cusparse-cu12-12.5.10.65:
Successfully uninstalled nvidia-cusparse-cu12-12.5.10.65
Attempting uninstall: nvidia-cufft-cu12
Found existing installation: nvidia-cufft-cu12 11.4.1.4
Uninstalling nvidia-cufft-cu12-11.4.1.4:
Successfully uninstalled nvidia-cufft-cu12-11.4.1.4
Attempting uninstall: nvidia-cusolver-cu12
Found existing installation: nvidia-cusolver-cu12 11.7.5.82
Uninstalling nvidia-cusolver-cu12-11.7.5.82:
Successfully uninstalled nvidia-cusolver-cu12-11.7.5.82
Attempting uninstall: anthropic
Found existing installation: anthropic 0.73.0
Uninstalling anthropic-0.73.0:
Successfully uninstalled anthropic-0.73.0
Attempting uninstall: torch
Found existing installation: torch 2.8.0+cu129
Uninstalling torch-2.8.0+cu129:
Successfully uninstalled torch-2.8.0+cu129
Attempting uninstall: torchvision
Found existing installation: torchvision 0.23.0+cu129
Uninstalling torchvision-0.23.0+cu129:
Successfully uninstalled torchvision-0.23.0+cu129
Attempting uninstall: torchaudio
Found existing installation: torchaudio 2.8.0+cu129
Uninstalling torchaudio-2.8.0+cu129:
Successfully uninstalled torchaudio-2.8.0+cu129
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
outlines 0.1.11 requires outlines_core==0.1.26, but you have outlines-core 0.2.11 which is incompatible.
sglang 0.5.5.post3 requires llguidance<0.8.0,>=0.7.11, but you have llguidance 1.3.0 which is incompatible.
sglang 0.5.5.post3 requires nvidia-cutlass-dsl==4.2.1, but you have nvidia-cutlass-dsl 4.3.0.dev0 which is incompatible.
sglang 0.5.5.post3 requires torch==2.8.0, but you have torch 2.9.0 which is incompatible.
sglang 0.5.5.post3 requires torchaudio==2.8.0, but you have torchaudio 2.9.0 which is incompatible.
Successfully installed anthropic-0.71.0 astor-0.8.1 blake3-1.0.8 cachetools-6.2.2 cbor2-5.7.1 click-8.2.1 cupy-cuda12x-13.6.0 depyf-0.20.0 dnspython-2.8.0 email-validator-2.3.0 fastapi-cli-0.0.16 fastapi-cloud-cli-0.5.2 fastar-0.8.0 fastrlock-0.8.3 httptools-0.7.1 jmespath-1.0.1 lark-1.2.2 llguidance-1.3.0 llvmlite-0.44.0 lm-format-enforcer-0.11.3 markdown-it-py-4.0.0 mdurl-0.1.2 mistral_common-1.8.6 model-hosting-container-standards-0.1.9 msgpack-1.1.2 numba-0.61.2 numpy-2.2.6 nvidia-cublas-cu12-12.8.4.1 nvidia-cuda-cupti-cu12-12.8.90 nvidia-cuda-nvrtc-cu12-12.8.93 nvidia-cuda-runtime-cu12-12.8.90 nvidia-cufft-cu12-11.3.3.83 nvidia-cufile-cu12-1.13.1.3 nvidia-curand-cu12-10.3.9.90 nvidia-cusolver-cu12-11.7.3.90 nvidia-cusparse-cu12-12.5.8.93 nvidia-nccl-cu12-2.27.5 nvidia-nvjitlink-cu12-12.8.93 nvidia-nvshmem-cu12-3.3.20 nvidia-nvtx-cu12-12.8.90 opencv-python-headless-4.12.0.88 outlines_core-0.2.11 prometheus-fastapi-instrumentator-7.1.0 py-cpuinfo-9.0.0 pydantic-extra-types-2.10.6 python-dotenv-1.2.1 python-json-logger-4.0.0 ray-2.52.1 rich-14.2.0 rich-toolkit-0.17.0 rignore-0.7.6 sentry-sdk-2.46.0 shellingham-1.5.4 supervisor-4.3.0 torch-2.9.0 torchaudio-2.9.0 torchvision-0.24.0 triton-3.5.0 typer-0.20.0 vllm-0.11.2 watchfiles-1.1.1 websockets-15.0.1 xformers-0.0.33.post1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
If I then run the same launch command after installing vLLM as above, I get this error instead -
root@158457f6b080:/sgl-workspace/sglang# python3 -m sglang.launch_server --model-path "meta-llama/Llama-3.2-1B" --host 0.0.0.0 --dtype float32
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/sgl-workspace/sglang/python/sglang/launch_server.py", line 24, in <module>
server_args = prepare_server_args(sys.argv[1:])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 4106, in prepare_server_args
return ServerArgs.from_cli_args(raw_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 3707, in from_cli_args
return cls(**{attr: getattr(args, attr) for attr in attrs})
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<string>", line 281, in __init__
File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 596, in __post_init__
self._handle_gpu_memory_settings(gpu_mem)
File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 845, in _handle_gpu_memory_settings
model_config = self.get_model_config()
^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 3728, in get_model_config
from sglang.srt.configs.model_config import ModelConfig
File "/sgl-workspace/sglang/python/sglang/srt/configs/model_config.py", line 26, in <module>
from sglang.srt.layers.quantization import QUANTIZATION_METHODS
File "/sgl-workspace/sglang/python/sglang/srt/layers/quantization/__init__.py", line 19, in <module>
from sglang.srt.layers.quantization.auto_round import AutoRoundConfig
File "/sgl-workspace/sglang/python/sglang/srt/layers/quantization/auto_round.py", line 12, in <module>
from sglang.srt.layers.quantization.utils import get_scalar_types
File "/sgl-workspace/sglang/python/sglang/srt/layers/quantization/utils.py", line 13, in <module>
from sglang.srt.layers.quantization.fp8_kernel import scaled_fp8_quant
File "/sgl-workspace/sglang/python/sglang/srt/layers/quantization/fp8_kernel.py", line 46, in <module>
from sgl_kernel import sgl_per_tensor_quant_fp8, sgl_per_token_quant_fp8
File "/usr/local/lib/python3.12/dist-packages/sgl_kernel/__init__.py", line 5, in <module>
common_ops = _load_architecture_specific_ops()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/sgl_kernel/load_utils.py", line 188, in _load_architecture_specific_ops
raise ImportError(error_msg)
ImportError:
[sgl_kernel] CRITICAL: Could not load any common_ops library!
Attempted locations:
1. Architecture-specific pattern: /usr/local/lib/python3.12/dist-packages/sgl_kernel/sm90/common_ops.* - found files: ['/usr/local/lib/python3.12/dist-packages/sgl_kernel/sm90/common_ops.abi3.so']
2. Fallback pattern: /usr/local/lib/python3.12/dist-packages/sgl_kernel/common_ops.* - found files: []
3. Standard Python import: common_ops - failed
GPU Info:
- Compute capability: 90
- Expected variant: SM90 (Hopper/H100 with fast math optimization)
Please ensure sgl_kernel is properly installed with:
pip install --upgrade sgl_kernel
Error details from previous import attempts:
- ImportError: /usr/local/lib/python3.12/dist-packages/sgl_kernel/sm90/common_ops.abi3.so: undefined symbol: _ZNK3c106SymInt6sym_neERKS0_
- ModuleNotFoundError: No module named 'common_ops'
If vLLM is a required dependency, I think it should be included in the image. Since it is not, could you please share which vLLM version I should install?
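For context, the version pins that pip reports in the conflict messages above suggest that rolling the environment back would look roughly like this. This is only a sketch derived from those messages, not a confirmed fix, and the originally installed wheels were +cu129 builds, so a plain PyPI reinstall may not restore exactly the same binaries:

pip install "torch==2.8.0" "torchaudio==2.8.0"
pip install "llguidance>=0.7.11,<0.8.0" "nvidia-cutlass-dsl==4.2.1" "outlines_core==0.1.26"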
Environment
I am using the Docker image, started with this command -
docker run --gpus all -it \
--shm-size 32g \
--env "HF_TOKEN=TOKEN" \
--ipc=host \
lmsysorg/sglang:latest \
bash
The exact image digest is -
$ docker image inspect --format='{{.RepoDigests}}' lmsysorg/sglang:latest
[lmsysorg/sglang@sha256:97fe3876fd7f0d27c72c79f612b024e08e9ac4ffdc52b5e4f81b7b53e1f3e819]
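If it helps, I can also attach the output of the sglang environment check script run inside the container:

python3 -m sglang.check_env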