[Kernels] Remove BatchedTritonOrDeepGemmExperts and default fallback to Triton #29929
Conversation
Code Review
This pull request refactors the expert selection logic by removing the BatchedTritonOrDeepGemmExperts wrapper class. The selection logic is now directly implemented in select_gemm_impl, which improves clarity. The new logic correctly handles the selection between BatchedDeepGemmExperts and BatchedTritonExperts based on compatibility and installation status of DeepGEMM. I've found a critical typo in the implementation that would cause a runtime error. Please see the specific comment for details.
vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py
Documentation preview: https://vllm--29929.org.readthedocs.build/en/29929/
tlrmchlsmth
left a comment
Just one nit
vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py
Head branch was pushed to by a user without write access
Signed-off-by: Bill Nell <[email protected]>
…pressed_tensors_moe.py Co-authored-by: Tyler Michael Smith <[email protected]> Signed-off-by: bnellnm <[email protected]> Signed-off-by: Bill Nell <[email protected]>
Force-pushed f5fd21a to ef85289
…to Triton (vllm-project#29929) Signed-off-by: Bill Nell <[email protected]> Signed-off-by: bnellnm <[email protected]> Co-authored-by: Tyler Michael Smith <[email protected]>
…to Triton (vllm-project#29929) Signed-off-by: Bill Nell <[email protected]> Signed-off-by: bnellnm <[email protected]> Co-authored-by: Tyler Michael Smith <[email protected]> Signed-off-by: Xingyu Liu <[email protected]>
Purpose
Remove BatchedTritonOrDeepGemmExperts. Always try to use DeepGEMM if it is installed and the MoE layer is compatible (i.e. fp8 + 128-block quantized). If the MoE layer is compatible and DeepGEMM isn't installed, throw an error. Otherwise, use BatchedTritonExperts.

Test Plan
CI
Test Result
cc @tlrmchlsmth , @varun-sundar-rabindranath
Essential Elements of an Effective PR Description Checklist
Update supported_models.md and examples for a new model.