[Kernel] Add NVFP4 MoE CUTLASS support for SM120 #29242

mgoin · 2025-11-22T17:45:44Z

Purpose

Extend csrc/quantization/fp4/nvfp4_experts_quant.cu and csrc/quantization/fp4/nvfp4_blockwise_moe_kernel.cu to build for SM120, so that the NVFP4 CUTLASS MoE path is supported on that platform.

Hoping to address #29030, #29141

Test Plan

Test Result


Signed-off-by: mgoin <[email protected]>

gemini-code-assist

Code Review

This pull request adds support for NVFP4 MoE kernels on SM120 architecture using CUTLASS. The changes include adding the new kernel file to the build system, implementing the SM120-specific kernel, and creating a dispatcher to select the correct kernel based on the SM version.

The implementation for the SM120 kernel introduces significant code duplication with the existing SM100 kernel. I've left a comment suggesting a refactoring to improve maintainability by using templates to abstract away the architecture-specific details, similar to patterns seen elsewhere in the codebase. Other changes look good.
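The refactoring the review suggests can be sketched roughly as follows: the shared launcher logic is written once as a template, and only an architecture traits type varies between SM100 and SM120. This is a minimal sketch, not the PR's code. Sm100Traits, Sm120Traits, num_output_tiles and the tile sizes are hypothetical placeholders. In a real kernel, the traits would carry the CUTLASS architecture tag and the tile/cluster shapes. It assumes C++14 or later.

```cpp
#include <cstdint>

// Hypothetical per-architecture traits. In a real kernel these would hold the
// CUTLASS arch tag and tile/cluster shapes; the numbers below are placeholders.
struct Sm100Traits {
  static constexpr int32_t kSmVersion = 100;
  static constexpr int32_t kTileM = 128;
  static constexpr int32_t kTileN = 128;
};

struct Sm120Traits {
  static constexpr int32_t kSmVersion = 120;
  static constexpr int32_t kTileM = 128;
  static constexpr int32_t kTileN = 64;
};

// Shared, architecture-agnostic logic written once and instantiated per arch.
// Here it only computes the number of output tiles for an M x N GEMM.
template <typename ArchTraits>
constexpr int32_t num_output_tiles(int32_t m, int32_t n) {
  const int32_t tiles_m = (m + ArchTraits::kTileM - 1) / ArchTraits::kTileM;
  const int32_t tiles_n = (n + ArchTraits::kTileN - 1) / ArchTraits::kTileN;
  return tiles_m * tiles_n;
}

// Same body, different instantiations: 2 x 2 tiles vs. 2 x 4 tiles.
static_assert(num_output_tiles<Sm100Traits>(256, 256) == 4, "SM100: 2 x 2 tiles");
static_assert(num_output_tiles<Sm120Traits>(256, 256) == 8, "SM120: 2 x 4 tiles");
```

With this design, adding another architecture means adding one traits struct instead of duplicating an entire kernel file.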

csrc/quantization/fp4/nvfp4_blockwise_moe_kernel.cu

Signed-off-by: mgoin <[email protected]>

bbrowning · 2025-11-22T22:03:22Z

csrc/quantization/fp4/nvfp4_blockwise_moe_kernel.cu

+    int N, int K) {
+  int32_t version_num = get_sm_version_num();
+#if defined ENABLE_NVFP4_SM120 && ENABLE_NVFP4_SM120
+  if (version_num == 120) {


The DGX Spark uses version_num 121, so this may need to be a bit looser. Perhaps something like:

version_num >= 120 && version_num < 130
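A minimal sketch of the looser check, assuming get_sm_version_num() encodes compute capability as major * 10 + minor. This encoding is consistent with DGX Spark reporting 121. The names sm_version_num, Nvfp4MoePath and select_nvfp4_moe_path are hypothetical. The SM10x range in the sketch is an illustrative assumption and is not taken from the PR. It assumes C++14 or later.

```cpp
#include <cstdint>

// Compute capability folded into one integer (assumed encoding):
// 12.0 -> 120, 12.1 -> 121, 10.0 -> 100.
constexpr int32_t sm_version_num(int32_t major, int32_t minor) {
  return major * 10 + minor;
}

enum class Nvfp4MoePath { kSm100, kSm120, kUnsupported };

// Hypothetical dispatcher: match the whole SM12x family rather than exactly 120.
// The SM10x range is an assumption added only for illustration.
constexpr Nvfp4MoePath select_nvfp4_moe_path(int32_t version_num) {
  if (version_num >= 120 && version_num < 130) return Nvfp4MoePath::kSm120;
  if (version_num >= 100 && version_num < 110) return Nvfp4MoePath::kSm100;
  return Nvfp4MoePath::kUnsupported;
}

// An exact `== 120` check would reject DGX Spark (121); the range accepts it.
static_assert(select_nvfp4_moe_path(sm_version_num(12, 1)) == Nvfp4MoePath::kSm120, "SM121");
static_assert(select_nvfp4_moe_path(sm_version_num(12, 0)) == Nvfp4MoePath::kSm120, "SM120");
```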

Add NVFP4 MoE CUTLASS support for SM120

e6bc82e

Signed-off-by: mgoin <[email protected]>

mgoin requested review from LucasWilkinson and tlrmchlsmth as code owners November 22, 2025 17:45

mergify bot added ci/build and nvidia labels Nov 22, 2025

github-project-automation bot added this to NVIDIA Nov 22, 2025

gemini-code-assist bot reviewed Nov 22, 2025

csrc/quantization/fp4/nvfp4_blockwise_moe_kernel.cu

Add nvfp4_experts_quant for sm120

d7306b8

Signed-off-by: mgoin <[email protected]>

mgoin added kernel, moe, and ready labels Nov 22, 2025

bbrowning reviewed Nov 22, 2025
