Commit 03416ea

[bugfix][quantization] Fix fp8 per_tensor scale shape (#30257)

Signed-off-by: Haoyang Li <[email protected]>
Parent: c72ea10

1 file changed: 1 addition, 1 deletion

vllm/_custom_ops.py (1 addition, 1 deletion)

```diff
@@ -1726,7 +1726,7 @@ def scaled_fp8_quant(
                 output, input, scale, scale_ub
             )
         else:
-            scale = torch.empty((1, 1), device=input.device, dtype=torch.float32)
+            scale = torch.empty(1, device=input.device, dtype=torch.float32)
             torch.ops._C.dynamic_scaled_fp8_quant(output, input, scale)
     else:
         assert scale.numel() == 1, f"{scale.shape}"
```
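
For context, dynamic per-tensor fp8 quantization computes a single scale for the whole input tensor, so the scale buffer only needs one element; the fix allocates it as a 1-D tensor of shape (1,) instead of (1, 1). The snippet below is a minimal pure-PyTorch sketch of that idea, not vLLM's `torch.ops._C.dynamic_scaled_fp8_quant` CUDA kernel; the helper name `dynamic_scaled_fp8_quant_ref` and the exact clamping/rounding details are illustrative assumptions.

```python
import torch

# Largest finite value representable in float8 e4m3 (448.0).
FP8_MAX = torch.finfo(torch.float8_e4m3fn).max

def dynamic_scaled_fp8_quant_ref(x: torch.Tensor):
    """Illustrative (non-vLLM) dynamic per-tensor fp8 quantization."""
    # One scale for the whole tensor, stored as a 1-element float32 tensor,
    # matching the torch.empty(1, ...) allocation in the fixed code path.
    scale = (x.abs().amax().float() / FP8_MAX).clamp(min=1e-12).reshape(1)
    q = (x.float() / scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)
    return q, scale

x = torch.randn(16, 32)
q, scale = dynamic_scaled_fp8_quant_ref(x)
# Same invariant the pre-computed-scale branch of scaled_fp8_quant asserts.
assert scale.shape == (1,) and scale.numel() == 1, f"{scale.shape}"
```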
