Conversation

@mgoin (Member) commented Dec 9, 2025

Purpose

The transformers backend's WeightsMapper uses a broad prefix mapping ("" -> "model.") to transform weight paths. This mapper is also applied to quantization config targets via apply_vllm_mapper. However, compressed-tensors targets can be module class names (e.g., "Linear") or regex patterns (e.g., "re:.*proj"), not just layer paths.

When "Linear" is transformed to "model.Linear", the target matching fails because the module class name check looks for "Linear" in "model.Linear" (substring match), which fails.

The fix filters which targets are transformed: only layer paths (those containing "." and not starting with "re:") are mapped, while class names and regex patterns are preserved as-is.
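
A minimal sketch of that filter (the helper name _should_map_target is illustrative, not the exact code in this PR):

def _should_map_target(target: str) -> bool:
    # Only dotted layer paths (e.g. "model.layers.0.mlp.down_proj") should be
    # rewritten by the WeightsMapper. Bare class names ("Linear") contain no
    # dot, and regex targets ("re:.*proj") are excluded by the prefix check.
    return "." in target and not target.startswith("re:")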

Test Plan

# Previously failing, now passing - compressed-tensors with transformers backend
vllm serve RedHatAI/Qwen3-0.6B-FP8-BLOCK --model-impl transformers

# Verify native backend still works
vllm serve RedHatAI/Qwen3-0.6B-FP8-BLOCK

# Verify fp8 (non-compressed-tensors) still works with transformers
vllm serve Qwen/Qwen3-0.6B-FP8 --model-impl transformers

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mgoin added the bug, quantization, and ready (ONLY add when PR is ready to merge/full CI is needed) labels Dec 9, 2025
@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request correctly identifies and addresses an issue where the WeightsMapper was incorrectly transforming non-path targets in compressed-tensors quantization configurations. The approach of conditionally applying the mapping is sound. However, I've identified a subtle bug in the implementation of the helper functions _apply_dict and _apply_list. They use a truthiness check that would incorrectly filter out targets mapping to an empty string, potentially leading to silent misconfigurations. My review includes a suggested fix to ensure correct behavior by using an explicit is not None check, aligning with the original WeightsMapper logic.
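
Concretely, the distinction the review points at looks like this (a minimal sketch; the signature of _apply_dict in the PR may differ):

def _apply_dict(map_name, d: dict) -> dict:
    out = {}
    for target, value in d.items():
        mapped = map_name(target)
        # A truthiness check (`if mapped:`) would also drop a target that the
        # mapper rewrites to "" (empty string), silently losing its config.
        # Only an explicit None result should mean "drop this entry", matching
        # the original WeightsMapper behavior.
        if mapped is not None:
            out[mapped] = value
    return out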

…pressed_tensors.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Michael Goin <[email protected]>
@Isotr0py (Member) left a comment


LGTM

@mgoin (Member, Author) commented Dec 9, 2025

cc @eldarkurtic to fix your issue reported offline

@vllm-bot merged commit 03b91f7 into vllm-project:main Dec 9, 2025
52 of 54 checks passed
@github-project-automation bot moved this from Todo to Done in Transformers backend Dec 9, 2025
mayoohee pushed a commit to mayoohee/vllm that referenced this pull request Dec 9, 2025
…ers backend (vllm-project#30287)

Signed-off-by: mgoin <[email protected]>
Signed-off-by: Michael Goin <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: mayoohee <[email protected]>