
@mgoin (Member) commented Nov 22, 2025

Purpose

Expand csrc/quantization/fp4/nvfp4_experts_quant.cu and csrc/quantization/fp4/nvfp4_blockwise_moe_kernel.cu to build for SM120, so that NVFP4 CUTLASS MoE can be supported on that platform.

Hoping to address #29030 and #29141.

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request adds support for NVFP4 MoE kernels on the SM120 architecture using CUTLASS. The changes add the new kernel file to the build system, implement the SM120-specific kernel, and add a dispatcher that selects the correct kernel based on the SM version.
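A minimal sketch of the dispatch pattern described above follows. It assumes hypothetical per-architecture entry points (run_nvfp4_moe_sm100, run_nvfp4_moe_sm120) that stand in for the real launchers, which take tensor arguments omitted here. The ENABLE_NVFP4_SM120 compile guard is also left out, and both branches use exact matching purely for illustration.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the architecture-specific kernel launchers.
// Only the choice of kernel is modeled; real launchers would take tensors.
enum class MoeKernel { Sm100, Sm120 };

MoeKernel run_nvfp4_moe_sm100() { return MoeKernel::Sm100; }
MoeKernel run_nvfp4_moe_sm120() { return MoeKernel::Sm120; }

// Select a kernel from the SM version number (e.g. 100 for SM100, 120 for SM120).
// Any version without a matching branch is rejected with an error.
MoeKernel dispatch_nvfp4_moe(int version_num) {
  if (version_num == 100) {
    return run_nvfp4_moe_sm100();
  }
  if (version_num == 120) {
    return run_nvfp4_moe_sm120();
  }
  throw std::runtime_error("No NVFP4 MoE kernel for SM version " +
                           std::to_string(version_num));
}
```

With exact matching, a device that reports version 121 falls through to the error path. That is the concern raised in the inline review comment on the dispatcher.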

The implementation for the SM120 kernel introduces significant code duplication with the existing SM100 kernel. I've left a comment suggesting a refactoring to improve maintainability by using templates to abstract away the architecture-specific details, similar to patterns seen elsewhere in the codebase. Other changes look good.
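Below is a rough sketch of the template-based refactoring the review suggests, assuming a traits struct per architecture. The tag types, the MoeKernelTraits template, describe_moe_kernel, and the tile sizes are all hypothetical placeholders. They are not CUTLASS's own arch tags or the PR's actual kernel configuration. The shared body is written once and instantiated for each architecture.

```cpp
#include <string>

// Hypothetical architecture tags (not CUTLASS's own arch tag types).
struct Sm100Tag { static constexpr const char* kName = "sm100"; };
struct Sm120Tag { static constexpr const char* kName = "sm120"; };

// Architecture-specific settings live in one specialization per arch.
// The tile sizes are illustrative placeholders only.
template <typename Arch>
struct MoeKernelTraits;

template <>
struct MoeKernelTraits<Sm100Tag> {
  static constexpr int kTileM = 128;
  static constexpr int kTileN = 128;
};

template <>
struct MoeKernelTraits<Sm120Tag> {
  static constexpr int kTileM = 128;
  static constexpr int kTileN = 64;
};

// Shared body: written once, instantiated per architecture.
// Here it just counts the output tiles for an m x n problem.
template <typename Arch>
std::string describe_moe_kernel(int m, int n) {
  using Traits = MoeKernelTraits<Arch>;
  int tiles = ((m + Traits::kTileM - 1) / Traits::kTileM) *
              ((n + Traits::kTileN - 1) / Traits::kTileN);
  return std::string(Arch::kName) + ":" + std::to_string(tiles);
}
```

Under this layout, supporting a new architecture only requires a new tag and traits specialization. The kernel body does not need to be copied.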

@mgoin added the kernel, moe, and ready (ONLY add when PR is ready to merge/full CI is needed) labels on Nov 22, 2025
```cpp
    int N, int K) {
  int32_t version_num = get_sm_version_num();
#if defined ENABLE_NVFP4_SM120 && ENABLE_NVFP4_SM120
  if (version_num == 120) {
```
A contributor commented on these lines:

The DGX Spark reports version_num 121, so the exact version_num == 120 check would skip it. The condition may need to be a bit looser. Perhaps something like:

```cpp
version_num >= 120 && version_num < 130
```
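The sketch below shows the suggested range check. It assumes version_num encodes the compute capability as major * 10 + minor, which gives 121 for a 12.1 device such as the DGX Spark. The helper names sm_version_num and is_sm12x are hypothetical.

```cpp
// Assumed encoding: compute capability major.minor -> major * 10 + minor,
// so 12.0 -> 120 and 12.1 -> 121.
constexpr int sm_version_num(int major, int minor) { return major * 10 + minor; }

// True for every SM 12.x device (e.g. 120 and 121), false otherwise.
// The upper bound keeps a future SM 13.x from matching this branch.
constexpr bool is_sm12x(int version_num) {
  return version_num >= 120 && version_num < 130;
}

static_assert(is_sm12x(sm_version_num(12, 0)), "SM120 should match");
static_assert(is_sm12x(sm_version_num(12, 1)), "SM121 should match");
static_assert(!is_sm12x(sm_version_num(10, 0)), "SM100 should not match");
```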


Labels: ci/build, kernel, moe, nvidia, ready (ONLY add when PR is ready to merge/full CI is needed)
