[Kernels] Remove BatchedTritonOrDeepGemmExperts and default fallback to Triton #29929
Conversation
Code Review
This pull request refactors the expert selection logic by removing the BatchedTritonOrDeepGemmExperts wrapper class. The selection logic is now directly implemented in select_gemm_impl, which improves clarity. The new logic correctly handles the selection between BatchedDeepGemmExperts and BatchedTritonExperts based on compatibility and installation status of DeepGEMM. I've found a critical typo in the implementation that would cause a runtime error. Please see the specific comment for details.
vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py
Documentation preview: https://vllm--29929.org.readthedocs.build/en/29929/
tlrmchlsmth
left a comment
Just one nit
vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py
Head branch was pushed to by a user without write access
Signed-off-by: Bill Nell <[email protected]>
…pressed_tensors_moe.py Co-authored-by: Tyler Michael Smith <[email protected]> Signed-off-by: bnellnm <[email protected]> Signed-off-by: Bill Nell <[email protected]>
Force-pushed f5fd21a to ef85289
…to Triton (vllm-project#29929) Signed-off-by: Bill Nell <[email protected]> Signed-off-by: bnellnm <[email protected]> Co-authored-by: Tyler Michael Smith <[email protected]>
…to Triton (vllm-project#29929) Signed-off-by: Bill Nell <[email protected]> Signed-off-by: bnellnm <[email protected]> Co-authored-by: Tyler Michael Smith <[email protected]> Signed-off-by: Xingyu Liu <[email protected]>
Purpose
Remove BatchedTritonOrDeepGemmExperts. Always try to use DeepGEMM if it is installed and the MoE layer is compatible (i.e. fp8 + 128-block quantized). If the MoE layer is compatible and DeepGEMM isn't installed, throw an error. Otherwise, use BatchedTritonExperts.

Test Plan
CI
Test Result
cc @tlrmchlsmth , @varun-sundar-rabindranath
Essential Elements of an Effective PR Description Checklist
Update supported_models.md and examples for a new model.