
Conversation

@Isotr0py Isotr0py (Member) commented Dec 7, 2025

Purpose

  • Currently, too many arguments are passed to the platform's get_attn_backend_cls, yet not all platforms use all of them. Introducing an attention feature with a new argument (such as use_mla or use_sink) also easily breaks out-of-tree (OOT) platforms.
  • To avoid this kind of mess and breakage, this PR wraps the platform's attention selection arguments into a hashable AttentionSelectorConfig, so that platforms can use these arguments on demand and no longer need to update the interface for each feature update (see the sketch after this list).
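
A minimal sketch of the wrapped config, assuming Python and field names inferred from the review summary further down (illustrative, not the exact vLLM definitions):

from typing import NamedTuple, Optional

import torch


class AttentionSelectorConfig(NamedTuple):
    """Hashable bundle of the attention backend selection arguments."""

    head_size: int
    dtype: torch.dtype
    kv_cache_dtype: Optional[str]
    block_size: int
    use_mla: bool = False
    has_sink: bool = False
    use_sparse: bool = False

Because NamedTuple instances are hashable, the whole config can double as a cache key in the selector, and adding a new field with a default later does not touch any platform-facing signature.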

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing a test command.
  • The test results, such as pasting the results comparison before and after, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Isotr0py <[email protected]>
@gemini-code-assist gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request refactors the attention backend selection mechanism by introducing an AttentionSelectorConfig NamedTuple. This new configuration object encapsulates various attention parameters such as head size, data type, KV cache data type, block size, MLA usage, sink token presence, sparse attention usage, and attention type. The get_attn_backend and _cached_get_attn_backend functions in vllm/attention/selector.py are updated to create and pass this single config object instead of multiple individual arguments. This change propagates throughout the platform-specific attention backend selection logic in vllm/platforms/cpu.py, vllm/platforms/cuda.py, and vllm/platforms/interface.py, where relevant methods like get_attn_backend_cls and get_valid_backends are modified to accept and utilize the AttentionSelectorConfig object, simplifying their signatures and improving parameter management. Additionally, logging for attention configurations is updated to use the __repr__ method of the new config object.
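
A hedged before/after sketch of the platform hook change described above; the class and parameter names are illustrative assumptions, not the exact vLLM code:

class SomePlatform:
    # Before: every new feature flag (use_mla, use_sink, ...) widened
    # this signature, breaking out-of-tree overrides that hadn't
    # caught up yet.
    @classmethod
    def get_attn_backend_cls_old(cls, selected_backend, head_size, dtype,
                                 kv_cache_dtype, block_size, use_mla,
                                 use_sink):
        ...

    # After: one stable config parameter; each platform reads only the
    # fields it cares about, on demand.
    @classmethod
    def get_attn_backend_cls(cls, selected_backend, attn_config):
        if attn_config.use_sparse:
            ...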

@mergify mergify bot commented Dec 7, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Isotr0py.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Dec 7, 2025
@mergify mergify bot removed the needs-rebase label Dec 7, 2025
Signed-off-by: Isotr0py <[email protected]>
@mergify mergify bot added rocm Related to AMD ROCm tpu Related to Google TPUs labels Dec 7, 2025
@Isotr0py Isotr0py changed the title [Draft][Platform] Refactor Platform attention backend selection to avoid braekpoint for OOT platform [Draft][Platform] Refactor Platform attention backend selection to avoid breakpoint for OOT platform Dec 8, 2025
@Isotr0py Isotr0py marked this pull request as ready for review December 9, 2025 06:39
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

@Isotr0py Isotr0py (Member, Author) commented Dec 9, 2025

/gemini review

@Isotr0py Isotr0py changed the title [Draft][Platform] Refactor Platform attention backend selection to avoid breakpoint for OOT platform [Platform] Refactor Platform attention backend selection to avoid breakpoint for OOT platform Dec 9, 2025
@gemini-code-assist gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request refactors the attention backend selection by introducing AttentionSelectorConfig to encapsulate the configuration parameters. This is a great improvement as it simplifies the function signatures in platform-specific modules and makes the interface more stable for out-of-tree platforms. The changes are applied consistently across all relevant files. I've found one minor issue with an incomplete __repr__ implementation which could affect debugging.
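
A hypothetical illustration of the kind of fix being suggested here, assuming the config stays a NamedTuple (a sketch, not the PR's actual code): build the repr from self._fields so every field, including ones added later, shows up in debug logs.

def __repr__(self) -> str:
    # _fields lists every NamedTuple field, so the repr can never go
    # stale when a new selection argument is added.
    fields = ", ".join(f"{name}={getattr(self, name)!r}" for name in self._fields)
    return f"AttentionSelectorConfig({fields})"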

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Isotr0py <[email protected]>
@mergify mergify bot commented Dec 9, 2025

Hi @Isotr0py, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint


Labels

nvidia · rocm (Related to AMD ROCm) · tpu (Related to Google TPUs)
