[Mosaic GPU] Optimize the computation of tcgen05.mma matrix descriptors #33653

copybara-service · 2025-12-02T13:38:00Z

[Mosaic GPU] Optimize the computation of tcgen05.mma matrix descriptors

Previously we used a simple approach of computing the descriptors entirely
using LLVM ops. This was convenient, but it turns out that there are two
problems with it:

1. LLVM doesn't always constant-fold fully and sometimes emits PTX that
   causes ptxas to generate lots of non-uniform operations.
2. LLVM is quite aggressive about hoisting descriptor computation out of
   loops, which blows up the register pressure.

The alternative implemented here is to compute the descriptors in inline PTX,
with manual constant folding, immediately before the MMA operations. This seems
to generate code that has extremely low register pressure and only a few
uniform operations on 32-bit quantities.
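
To make the idea concrete, here is a minimal Python sketch of manual constant folding for a 64-bit descriptor. It is not the code from this change. The helper names (`fold_static_fields`, `descriptor_ptx`, `emulate_folded`, `reference_descriptor`) are hypothetical, and the field positions are a simplified assumption that should be checked against the PTX ISA documentation. The `$0` and `$1` placeholders follow the LLVM inline-assembly operand convention. Fields known at trace time are folded into two 32-bit constants in Python, so the emitted PTX only has to shift, mask, and OR the dynamic shared-memory address into the low word.

```python
FIELD_MASK = 0x3FFF  # assumed width of each 14-bit field


def fold_static_fields(leading_byte_offset, stride_byte_offset, swizzle_code):
    """Fold all trace-time-known fields into (lo, hi) 32-bit constants.

    Assumed (simplified) layout: leading offset in bits 16-29, stride offset
    in bits 32-45, swizzle code in bits 61-63, offsets stored divided by 16.
    """
    lo = ((leading_byte_offset >> 4) & FIELD_MASK) << 16
    hi = ((stride_byte_offset >> 4) & FIELD_MASK) | ((swizzle_code & 0x7) << 29)
    return lo, hi


def descriptor_ptx(lo_const, hi_const):
    """Build inline PTX: $0 is the 64-bit descriptor, $1 the 32-bit address."""
    return "\n".join([
        "{",
        ".reg .b32 desc_lo, desc_hi;",
        "shr.u32 desc_lo, $1, 4;",            # address is stored in 16-byte units
        "and.b32 desc_lo, desc_lo, 0x3FFF;",  # keep the 14-bit address field
        f"or.b32 desc_lo, desc_lo, 0x{lo_const:08X};",  # pre-folded low constant
        f"mov.b32 desc_hi, 0x{hi_const:08X};",          # high word is fully static
        "mov.b64 $0, {desc_lo, desc_hi};",    # pack the two halves
        "}",
    ])


def emulate_folded(addr, lo_const, hi_const):
    """Python model of the arithmetic performed by the PTX above."""
    lo = ((addr >> 4) & FIELD_MASK) | lo_const
    return (hi_const << 32) | lo


def reference_descriptor(addr, leading_byte_offset, stride_byte_offset, swizzle_code):
    """Unfolded computation of the same assumed layout, for comparison."""
    return (
        ((addr >> 4) & FIELD_MASK)
        | (((leading_byte_offset >> 4) & FIELD_MASK) << 16)
        | (((stride_byte_offset >> 4) & FIELD_MASK) << 32)
        | ((swizzle_code & 0x7) << 61)
    )
```

In this sketch only three 32-bit integer instructions depend on the runtime address, and the snippet can be emitted directly next to each MMA instead of being kept live across a loop.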

PiperOrigin-RevId: 839224185

copybara-service bot assigned apaszke Dec 2, 2025
