Skip to content

Conversation

@DarkLight1337
Copy link
Member

@DarkLight1337 DarkLight1337 commented Dec 1, 2025

Purpose

  • Use tokenizer_mode instead of tokenizer_name as the registry ID, with temporary backward compatibility.
  • In order to simplify the logic of get_tokenizer, also register HfTokenizer and MistralTokenizer to TokenizerRegistry.
  • Moved get_tokenizer to vllm.tokenizers. The old location will be deprecated in the next PR.

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 1, 2025
@chatgpt-codex-connector
Copy link

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively unifies tokenizer registration by moving get_tokenizer to vllm.tokenizers and registering in-tree tokenizers like HfTokenizer and MistralTokenizer using a new decorator-based mechanism. This refactoring simplifies the tokenizer loading logic. I have identified two high-severity issues: one related to a breaking change that lacks a proper deprecation cycle, and another concerning a logic flaw in the new TokenizerRegistry.register method that could lead to silent failures. Addressing these will improve backward compatibility and the robustness of the new API.

Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
@DarkLight1337 DarkLight1337 requested a review from noooop as a code owner December 1, 2025 09:27
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) December 1, 2025 09:34
@DarkLight1337 DarkLight1337 merged commit f0a28bf into vllm-project:main Dec 1, 2025
56 checks passed
@DarkLight1337 DarkLight1337 deleted the unify-tokenizer-reg branch December 1, 2025 11:34
kitaekatt pushed a commit to kitaekatt/vllm that referenced this pull request Dec 1, 2025
amd-hhashemi pushed a commit to amd-hhashemi/vllm that referenced this pull request Dec 2, 2025
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: Hashem Hashemi <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

frontend ready ONLY add when PR is ready to merge/full CI is needed structured-output v1

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants