
Commit 5f7209a

bwasti and yewentao256 authored
[tiny] Remove unsupported TRITON_MLA backend from batch invariance (#28832)
Signed-off-by: Bram Wasti <[email protected]>
Co-authored-by: Wentao Ye <[email protected]>
1 parent 2d4978a commit 5f7209a

File tree

1 file changed: +1 −1 lines changed

vllm/model_executor/layers/batch_invariant.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -805,11 +805,11 @@ def override_envs_for_invariance():
         "FLASH_ATTN",  # best supported backend
         "FLASHINFER",
         "FLASH_ATTN_MLA",
-        "TRITON_MLA",
         # Not yet supported MLA backends
         # "FLASHMLA",
         # "FLEX_ATTENTION",  # IMA issue even if we disable batch invariance
         # "FLASHINFER_MLA", https://github.com/vllm-project/vllm/pull/28967
+        # "TRITON_MLA",
     ]
     if curr_attn_backend not in supported_backends:
         warning = (
```
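For context, the hunk's trailing lines suggest the surrounding logic is a simple allowlist check: if the currently selected attention backend is not in `supported_backends`, a warning is produced. Below is a minimal, self-contained sketch of that pattern. The function name `check_backend` and the use of `warnings.warn` are hypothetical; the real `override_envs_for_invariance` in vllm/model_executor/layers/batch_invariant.py also overrides environment variables, which is not shown here.

```python
import warnings

# Allowlist of attention backends verified to work with batch invariance.
# Mirrors the list in the diff above; this commit moves TRITON_MLA into
# the "not yet supported" comments, so it no longer appears here.
SUPPORTED_BACKENDS = [
    "FLASH_ATTN",  # best supported backend
    "FLASHINFER",
    "FLASH_ATTN_MLA",
]


def check_backend(curr_attn_backend: str) -> None:
    """Hypothetical standalone version of the allowlist check in the diff.

    Warns when the chosen backend is outside the batch-invariant allowlist,
    analogous to `if curr_attn_backend not in supported_backends: ...`.
    """
    if curr_attn_backend not in SUPPORTED_BACKENDS:
        warnings.warn(
            f"Attention backend {curr_attn_backend!r} is not in the "
            f"batch-invariant allowlist {SUPPORTED_BACKENDS}; results "
            "may not be batch-invariant."
        )


check_backend("TRITON_MLA")  # after this commit, this triggers the warning
```

The design choice here is an explicit allowlist rather than a blocklist: only backends known to produce batch-invariant results pass silently, and everything else (including newly added backends) warns by default.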

0 commit comments