
Conversation


@bnellnm bnellnm commented Dec 3, 2025

Purpose

Remove BatchedTritonOrDeepGemmExperts. Always try to use DeepGEMM if it is installed and the MoE layer is compatible (i.e., fp8 + 128-block quantized). If the MoE layer is compatible but DeepGEMM isn't installed, throw an error. Otherwise, use BatchedTritonExperts.
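
For illustration, a minimal sketch of that selection rule. The MoEConfig fields, the deep_gemm_installed flag, and the string return values are simplified stand-ins, not vLLM's actual select_gemm_impl signature:

```python
from dataclasses import dataclass

@dataclass
class MoEConfig:
    quant_dtype: str                  # e.g. "fp8"
    block_shape: tuple | None = None  # e.g. (128, 128) for 128-block quantization

def select_gemm_impl(cfg: MoEConfig, deep_gemm_installed: bool) -> str:
    # DeepGEMM only applies to fp8 weights with 128x128 block quantization.
    deep_gemm_compatible = (
        cfg.quant_dtype == "fp8" and cfg.block_shape == (128, 128)
    )
    if deep_gemm_compatible:
        if not deep_gemm_installed:
            # Compatible layer but DeepGEMM missing: fail loudly rather
            # than silently taking a different path.
            raise RuntimeError(
                "MoE layer requires DeepGEMM (fp8 + 128-block quantized) "
                "but DeepGEMM is not installed"
            )
        return "BatchedDeepGemmExperts"
    # All other layers fall back to the Triton implementation.
    return "BatchedTritonExperts"

# Example: an unquantized layer is not DeepGEMM-compatible, so Triton is used.
print(select_gemm_impl(MoEConfig("bf16"), deep_gemm_installed=True))
# -> BatchedTritonExperts
```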

Test Plan

CI

Test Result

cc @tlrmchlsmth, @varun-sundar-rabindranath



@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the expert selection logic by removing the BatchedTritonOrDeepGemmExperts wrapper class. The selection logic is now implemented directly in select_gemm_impl, which improves clarity. The new logic correctly chooses between BatchedDeepGemmExperts and BatchedTritonExperts based on the layer's compatibility and whether DeepGEMM is installed. I've found a critical typo in the implementation that would cause a runtime error. Please see the specific comment for details.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment
💡 Codex Review

Here are some automated review suggestions for this pull request.


@mergify mergify bot commented Dec 3, 2025

Documentation preview: https://vllm--29929.org.readthedocs.build/en/29929/

@mergify mergify bot added the documentation Improvements or additions to documentation label Dec 3, 2025
@tlrmchlsmth tlrmchlsmth left a comment

Just one nit

@bnellnm bnellnm changed the title from "[Kernels] Remove BatchedTritonOrDeepGemmExperts" to "[Kernels] Remove BatchedTritonOrDeepGemmExperts and default fallback to Triton" Dec 3, 2025
@tlrmchlsmth tlrmchlsmth added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 3, 2025
@tlrmchlsmth tlrmchlsmth enabled auto-merge (squash) December 3, 2025 16:50
auto-merge was automatically disabled December 3, 2025 17:26

Head branch was pushed to by a user without write access

@mergify mergify bot added the multi-modality, performance, gpt-oss, and nvidia labels Dec 3, 2025
@github-project-automation github-project-automation bot moved this to In review in NVIDIA Dec 3, 2025
bnellnm and others added 11 commits December 3, 2025 12:30
Signed-off-by: Bill Nell <[email protected]>
Signed-off-by: Bill Nell <[email protected]>
Signed-off-by: Bill Nell <[email protected]>
Signed-off-by: Bill Nell <[email protected]>
Signed-off-by: Bill Nell <[email protected]>
Signed-off-by: Bill Nell <[email protected]>
Signed-off-by: Bill Nell <[email protected]>
…pressed_tensors_moe.py

Co-authored-by: Tyler Michael Smith <[email protected]>
Signed-off-by: bnellnm <[email protected]>
Signed-off-by: Bill Nell <[email protected]>
Signed-off-by: Bill Nell <[email protected]>
@tlrmchlsmth tlrmchlsmth enabled auto-merge (squash) December 3, 2025 19:09
@tlrmchlsmth tlrmchlsmth merged commit 2902c34 into vllm-project:main Dec 3, 2025
56 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in NVIDIA Dec 3, 2025
@bnellnm bnellnm deleted the delete-btordg branch December 3, 2025 21:29
PatrykSaffer pushed a commit to PatrykSaffer/vllm that referenced this pull request Dec 4, 2025
…to Triton (vllm-project#29929)

Signed-off-by: Bill Nell <[email protected]>
Signed-off-by: bnellnm <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
charlotte12l pushed a commit to charlotte12l/vllm that referenced this pull request Dec 5, 2025
…to Triton (vllm-project#29929)

Signed-off-by: Bill Nell <[email protected]>
Signed-off-by: bnellnm <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
Signed-off-by: Xingyu Liu <[email protected]>

Labels

ci/build, documentation, frontend, gpt-oss, kv-connector, multi-modality, nvidia, performance, ready, v1

Projects

Status: Done
