Conversation

@chaunceyjiang
Collaborator

@chaunceyjiang chaunceyjiang commented Dec 1, 2025

FIX #29747

Purpose

Fix the engine crash when --scheduling-policy=priority is combined with n > 1 (#29747).

To reproduce:

vllm serve /home/qwen3-8b --scheduling-policy=priority
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

res = client.chat.completions.create(
    model=client.models.list().data[0].id,
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
    n=2,
)

print(res)

Before this PR

(EngineCore_DP0 pid=28688)   File "/home/jovyan/vllm/vllm/v1/engine/core.py", line 295, in add_request
(EngineCore_DP0 pid=28688)     self.scheduler.add_request(request)
(EngineCore_DP0 pid=28688)   File "/home/jovyan/vllm/vllm/v1/core/sched/scheduler.py", line 1263, in add_request
(EngineCore_DP0 pid=28688)     self.waiting.add_request(request)
(EngineCore_DP0 pid=28688)   File "/home/jovyan/vllm/vllm/v1/core/sched/request_queue.py", line 152, in add_request
(EngineCore_DP0 pid=28688)     heapq.heappush(self._heap, (request.priority, request.arrival_time, request))
(EngineCore_DP0 pid=28688) TypeError: '<' not supported between instances of 'Request' and 'Request'
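
The root cause is standard heapq behavior: heap entries are ordered with <, and Python compares tuples element by element, so when two entries tie on both priority and arrival_time the comparison falls through to the Request objects themselves, which previously defined no ordering. A minimal sketch of the failure mode, using a stand-in Req class rather than vLLM's actual Request:

import heapq
import time

class Req:  # stand-in for vllm.v1.request.Request, which previously defined no __lt__
    def __init__(self, request_id, priority, arrival_time):
        self.request_id = request_id
        self.priority = priority
        self.arrival_time = arrival_time

now = time.time()
heap = []
# Two child requests created from one n=2 request share priority and arrival time.
for rid in ("0_AAaabb", "1_AAaabb"):
    heapq.heappush(heap, (0, now, Req(rid, 0, now)))
# The second push raises:
# TypeError: '<' not supported between instances of 'Req' and 'Req'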

With this PR

(APIServer pid=31017) INFO:     127.0.0.1:39200 - "GET /v1/models HTTP/1.1" 200 OK
(APIServer pid=31017) INFO 12-01 03:22:20 [chat_utils.py:574] Detected the chat template content format to be 'string'. You can set `--chat-template-content-format` to override this.
(APIServer pid=31017) INFO 12-01 03:22:27 [loggers.py:236] Engine 000: Avg prompt throughput: 2.9 tokens/s, Avg generation throughput: 117.8 tokens/s, Running: 2 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.3%, Prefix cache hit rate: 0.0%
(APIServer pid=31017) INFO:     127.0.0.1:39200 - "POST /v1/chat/completions HTTP/1.1" 200 OK

Test Plan

See the end-to-end reproduction above.

Test Result

See the before/after engine output above.

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added the v1 label Dec 1, 2025

@chaunceyjiang chaunceyjiang added the ready (ONLY add when PR is ready to merge/full CI is needed) label Dec 1, 2025
@chaunceyjiang
Collaborator Author

/cc @njhill @ApostaC PTAL.

Member

@njhill njhill left a comment

Thanks @chaunceyjiang! I'm wondering whether it would be better to just add id(request) into the tuple in the priority request queue? I'm not sure about baking specific ordering logic into the Request type itself (and it also kind of duplicates the ordering defined by the tuples).
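
A quick sketch of that idea (illustrative only, not the actual vLLM queue code): with id(request) added to the heap entry, ties on priority and arrival time are broken by an integer, and the request objects are never compared directly.

import heapq
import time

now = time.time()
heap = []
for _ in range(2):
    request = object()  # stand-in for a vLLM Request, which defines no ordering
    # id(request) is a unique integer, so a tie on (priority, arrival_time) is
    # resolved before Python ever tries to compare the request objects.
    heapq.heappush(heap, (0, now, id(request), request))
print(len(heap))  # 2 entries pushed, no TypeError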

@njhill njhill added the bug (Something isn't working) label Dec 1, 2025
@njhill njhill added this to the v0.12.0 milestone Dec 1, 2025
@chaunceyjiang
Collaborator Author

I'm wondering whether it would be better to just add id(request) into the tuple in the priority request queue?

@njhill Actually, this problem is solved by using the request_id for sorting: when n=2, the request_id format is 0_AAaabb, 1_AAaabb, so the IDs already differ. Using id(request) is just a fallback measure.
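
For illustration (the IDs below are the example IDs from the comment above, not real server output): the child request IDs differ in their index prefix, so they compare cleanly as strings and could serve as the tie-breaker in the heap entry.

# With n=2, one API request fans out into two child requests whose IDs
# take the form "0_AAaabb" and "1_AAaabb".
child_ids = [f"{i}_AAaabb" for i in range(2)]
# Strings compare without error, so putting request_id into the heap entry
# would also keep heapq from ever comparing Request objects directly.
assert sorted(child_ids) == ["0_AAaabb", "1_AAaabb"]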

@njhill
Member

njhill commented Dec 2, 2025

@chaunceyjiang I mean including one or the other of these in the tuple in the priority queue heap rather than implementing __lt__ in the request.

@chaunceyjiang
Collaborator Author

I mean including one or the other of these in the tuple in the priority queue heap rather than implementing __lt__ in the request.

@njhill Done. PTAL.

@chaunceyjiang chaunceyjiang requested a review from njhill December 2, 2025 08:40
@njhill
Member

njhill commented Dec 2, 2025

@chaunceyjiang apologies, after seeing the changes and thinking some more, maybe your original change to make the requests comparable would be better.

Since priority and arrival time are properties of the request itself, I think this does make sense as a canonical "default" ordering after all. We can then also simplify the priority heap to contain just the requests rather than wrapping them in tuples.

If it's too late in the day for you to update this, I can make the change quickly so that we can still include this in the release. If you don't respond here soon I'll do that!
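
A sketch of that direction (illustrative; vLLM's actual Request class and scheduler code differ): give the request a default ordering derived from its own priority and arrival time, mirroring the tuple ordering in the traceback above, so the heap can hold the requests directly.

import heapq
import time

class Req:
    """Stand-in for vLLM's Request; the real class has many more fields."""

    def __init__(self, request_id: str, priority: int, arrival_time: float):
        self.request_id = request_id
        self.priority = priority
        self.arrival_time = arrival_time

    def __lt__(self, other: "Req") -> bool:
        # Canonical default ordering: lower priority value first, then earlier arrival.
        return (self.priority, self.arrival_time) < (other.priority, other.arrival_time)

now = time.time()
heap: list[Req] = []
for rid in ("0_AAaabb", "1_AAaabb"):
    heapq.heappush(heap, Req(rid, 0, now))  # no wrapping tuple needed
# Equal-priority requests no longer raise; order among exact ties is arbitrary.
print(heapq.heappop(heap).request_id)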

Member

@njhill njhill left a comment

Thanks again @chaunceyjiang

@njhill njhill enabled auto-merge (squash) December 2, 2025 20:02
@njhill njhill merged commit 0a9caca into vllm-project:main Dec 2, 2025
46 checks passed
khluu pushed a commit that referenced this pull request Dec 2, 2025
Signed-off-by: chaunceyjiang <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
Co-authored-by: Nick Hill <[email protected]>
(cherry picked from commit 0a9caca)
charlotte12l pushed a commit to charlotte12l/vllm that referenced this pull request Dec 5, 2025
…project#29764)

Signed-off-by: chaunceyjiang <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
Co-authored-by: Nick Hill <[email protected]>
Signed-off-by: Xingyu Liu <[email protected]>

Labels

bug: Something isn't working
ready: ONLY add when PR is ready to merge/full CI is needed
v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: --scheduling-policy=priority & n>1 crashes engine

2 participants