Add epilogue subtiling #948

PaulZhang12 · 2025-10-15T20:59:34Z

Stacked PRs:

-> Add epilogue subtiling #948

Co-author: @yf225

Epilogue Subtiling

This is added as an opt-in feature for now, because complex epilogues (such as loading a bias and adding it to the accumulator) are difficult to support and are not yet handled. Most kernels also do not need epilogue subtiling: it is mainly useful for GEMMs whose accumulator lives in TMEM on B200.

GEMM CI shows a ~4% gain (0.88x with subtiling vs. 0.84x without), and epilogue_subtiling=[2] is often picked as the final config.
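To make the idea concrete, here is a minimal pure-Python sketch of epilogue subtiling. It is not Helion's implementation: it assumes the epilogue is a simple pointwise function and that the tile is split along its N (column) dimension. `store_tile`, `epilogue`, and the list-of-lists layout are hypothetical names chosen for illustration.

```python
from typing import Callable, List

Matrix = List[List[float]]


def store_tile(
    out: Matrix,
    acc: Matrix,
    row0: int,
    col0: int,
    epilogue: Callable[[float], float],
    subtile: int = 1,
) -> None:
    """Write epilogue(acc) into out at (row0, col0), one column chunk at a time."""
    block_n = len(acc[0])
    if block_n % subtile != 0:
        raise ValueError("block_n must be divisible by the subtile factor")
    chunk = block_n // subtile
    for s in range(subtile):
        lo, hi = s * chunk, (s + 1) * chunk
        # Only one column chunk of the accumulator goes through the epilogue per step.
        for i, row in enumerate(acc):
            out[row0 + i][col0 + lo : col0 + hi] = [epilogue(v) for v in row[lo:hi]]


# A 2x4 accumulator tile stored once whole and once in two halves.
acc = [[float(i * 4 + j) for j in range(4)] for i in range(2)]
full = [[0.0] * 4 for _ in range(2)]
split = [[0.0] * 4 for _ in range(2)]
store_tile(full, acc, 0, 0, lambda v: v * 2.0, subtile=1)
store_tile(split, acc, 0, 0, lambda v: v * 2.0, subtile=2)
```

Both calls produce the same output; only the granularity at which the epilogue and store run changes, which is the property the feature relies on.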

stack-info: PR: #948, branch: PaulZhang12/stack/14

examples/matmul.py


jansel

Does this help with matmul perf?

helion/_compiler/device_function.py

examples/matmul.py

helion/_compiler/indexing_strategy.py

helion/autotuner/config_spec.py

helion/_compiler/indexing_strategy.py


jansel · 2025-11-02T04:03:29Z

Any perf data on this one?


jansel

Let's turn allow_epilogue_subtiling on by default then do a full test run to shake out any issues (then turn it off by default again).

jansel · 2025-11-07T02:56:52Z

helion/_compiler/device_ir.py

+
+            lowering = current.meta.get("lowering")
+            # Check if this is a pointwise operation with only one user
+            if isinstance(lowering, PointwiseLowering) and len(current.users) == 1:


Can you explain the users==1 requirement? Is this meant to ensure everything is contained in the same graph? Maybe we should check this constraint more directly.
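For readers following this thread, the sketch below illustrates one possible reading of the `len(current.users) == 1` check: walk backward from a store through pointwise nodes and stop as soon as a value is also consumed elsewhere. It uses a toy `Node` class rather than `torch.fx`, and `collect_pointwise_chain` and the `pointwise` flag are hypothetical stand-ins for the PR's `PointwiseLowering` test.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based hashing, so nodes can be dict keys
class Node:
    name: str
    pointwise: bool = False
    inputs: list[Node] = field(default_factory=list)
    users: dict[Node, None] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Register this node as a user of each of its inputs.
        for inp in self.inputs:
            inp.users[self] = None


def collect_pointwise_chain(store: Node) -> list[Node]:
    """Return pointwise producers feeding `store` that have no other consumer."""
    chain: dict[Node, None] = {}
    stack = list(store.inputs)
    while stack:
        current = stack.pop()
        if current.pointwise and len(current.users) == 1:
            chain.setdefault(current)
            stack.extend(current.inputs)
    return list(chain)


# acc -> relu -> cast -> store
acc = Node("acc")
relu = Node("relu", pointwise=True, inputs=[acc])
cast = Node("cast", pointwise=True, inputs=[relu])
store = Node("store", inputs=[cast])
names = [n.name for n in collect_pointwise_chain(store)]
```

In this toy graph, giving `relu` a second consumer makes the walk stop at `cast`, because `relu` no longer feeds only the store's chain. Whether that is the intent of the original check is exactly what the comment above asks.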

jansel · 2025-11-07T02:57:18Z

helion/_compiler/device_ir.py

+                if current not in pointwise_nodes:
+                    pointwise_nodes[current] = None


Suggested change

-                if current not in pointwise_nodes:
-                    pointwise_nodes[current] = None
+                pointwise_nodes.setdefault(current)
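A quick standalone check that the suggestion preserves behavior when a dict is used as an insertion-ordered set (values are always `None`). The helper and variable names here are illustrative only.

```python
def add_with_check(seen: dict, key: str) -> None:
    # Original pattern: explicit membership test before inserting.
    if key not in seen:
        seen[key] = None


def add_with_setdefault(seen: dict, key: str) -> None:
    # setdefault inserts key -> None only if the key is absent.
    seen.setdefault(key)


a: dict = {}
b: dict = {}
for key in ["mul", "add", "mul", "cast"]:
    add_with_check(a, key)
    add_with_setdefault(b, key)
```

Both dicts end up with the same keys in the same first-seen order, so the one-line form is a drop-in replacement.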

jansel · 2025-11-07T04:16:22Z

helion/_compiler/device_ir.py

+
+    for node in graph.nodes:
+        if node.op == "call_function" and node.target == store_api:
+            stores.add(node)


Where is this used?

jansel · 2025-11-07T04:18:25Z

helion/_compiler/device_ir.py

+    # Register a tunable for epilogue subtile for all device stores
+    fragment = ListOf(
+        EnumFragment(choices=VALID_EPILOGUE_SUBTILE_SIZES), length=store_count
+    )


Move the fragment definition to config spec.

jansel · 2025-11-07T04:19:01Z

helion/_compiler/inductor_lowering.py

+
+                for node in graph.nodes:
+                    if node.op == "call_function" and node.target == store_api:
+                        stores.add(node)


jansel · 2025-11-07T04:22:33Z

helion/_compiler/utils.py

+def _use_epilogue_subtile() -> bool:
+    from .compile_environment import CompileEnvironment
+
+    return (
+        torch.cuda.is_available()
+        and torch.cuda.get_device_capability() >= (10, 0)
+        and CompileEnvironment.current().settings.allow_epilogue_subtiling
+    )


Move this logic to CompileEnvironment and only compute it once.
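One way to compute the flag once, sketched with a stand-in class. `Env`, `probe_capability`, and the constructor arguments are hypothetical; only the `(10, 0)` threshold and the `allow_epilogue_subtiling` setting name come from the hunk above. The real `CompileEnvironment` may be structured differently.

```python
from functools import cached_property
from typing import Callable, Optional, Tuple

Capability = Optional[Tuple[int, int]]


class Env:
    """Stand-in for a compile environment that owns the subtiling decision."""

    def __init__(
        self,
        allow_epilogue_subtiling: bool,
        probe_capability: Callable[[], Capability],
    ) -> None:
        self.allow_epilogue_subtiling = allow_epilogue_subtiling
        self._probe_capability = probe_capability

    @cached_property
    def use_epilogue_subtile(self) -> bool:
        # Evaluated on first access, then stored on the instance.
        if not self.allow_epilogue_subtiling:
            return False
        capability = self._probe_capability()  # None means no CUDA device
        return capability is not None and capability >= (10, 0)


calls = []


def fake_probe() -> Capability:
    # Hypothetical replacement for the torch.cuda queries in the hunk.
    calls.append(1)
    return (10, 0)


env = Env(allow_epilogue_subtiling=True, probe_capability=fake_probe)
results = [env.use_epilogue_subtile for _ in range(3)]
```

Injecting the probe keeps the sketch runnable without a GPU; in the actual code the probe would presumably wrap the `torch.cuda` calls shown above.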

jansel · 2025-11-07T04:27:40Z

helion/_compiler/indexing_strategy.py

The changes in this file seem to duplicate a lot of codegen logic. IMO it would be cleaner to frame this as a graph transformation rather than trying to cram so much logic inside the handling of store codegen.

jansel · 2025-11-07T04:28:29Z

helion/_compiler/indexing_strategy.py

+    return not (block_n_hint % 2 != 0 or block_size <= 16)
+
+
+def _get_accumulator_subtiles(


Do we need to special case accumulators? I think the concept of subtiling is more generic.

jansel · 2025-11-07T04:28:51Z

helion/_compiler/indexing_strategy.py

+    return output_shape
+
+
+def _can_epilogue_subtile_with_output_shape(


This check should happen before we add the option to the configspec.

jansel · 2025-11-07T04:29:54Z

helion/_compiler/indexing_strategy.py

            value=store_value,
        )

+    def codegen_store_subtile(


This looks like a lot of duplicated code. We should refactor things to share more.

PaulZhang12 added a commit that referenced this pull request Oct 15, 2025

Add epilogue subtiling

fcc7492

stack-info: PR: #948, branch: PaulZhang12/stack/14

PaulZhang12 force-pushed the PaulZhang12/stack/14 branch from cf439ac to fcc7492 Compare October 15, 2025 20:59

meta-cla bot added the CLA Signed label Oct 15, 2025

PaulZhang12 added a commit that referenced this pull request Oct 15, 2025

Add epilogue subtiling

cdbedf6

stack-info: PR: #948, branch: PaulZhang12/stack/14

PaulZhang12 force-pushed the PaulZhang12/stack/14 branch from fcc7492 to cdbedf6 Compare October 15, 2025 21:05

PaulZhang12 added a commit that referenced this pull request Oct 15, 2025

Add epilogue subtiling

58496fb

stack-info: PR: #948, branch: PaulZhang12/stack/14

PaulZhang12 force-pushed the PaulZhang12/stack/14 branch from cdbedf6 to 58496fb Compare October 15, 2025 21:05

oulgen reviewed Oct 15, 2025

examples/matmul.py (outdated, resolved)

PaulZhang12 added a commit that referenced this pull request Oct 15, 2025

Add epilogue subtiling

965b193

stack-info: PR: #948, branch: PaulZhang12/stack/14

PaulZhang12 force-pushed the PaulZhang12/stack/14 branch from 58496fb to 965b193 Compare October 15, 2025 22:00

jansel requested changes Oct 16, 2025


PaulZhang12 added a commit that referenced this pull request Oct 16, 2025

Add epilogue subtiling

2bc36d0

stack-info: PR: #948, branch: PaulZhang12/stack/14

PaulZhang12 force-pushed the PaulZhang12/stack/14 branch from 965b193 to 2bc36d0 Compare October 16, 2025 01:53

PaulZhang12 added a commit that referenced this pull request Oct 17, 2025

Add epilogue subtiling

1c1e282

stack-info: PR: #948, branch: PaulZhang12/stack/14

PaulZhang12 force-pushed the PaulZhang12/stack/14 branch from 2bc36d0 to 1c1e282 Compare October 17, 2025 22:11

PaulZhang12 added a commit that referenced this pull request Oct 20, 2025

Add epilogue subtiling

cccb0af

stack-info: PR: #948, branch: PaulZhang12/stack/14

PaulZhang12 force-pushed the PaulZhang12/stack/14 branch from 1c1e282 to cccb0af Compare October 20, 2025 19:19

PaulZhang12 added a commit that referenced this pull request Oct 20, 2025

Add epilogue subtiling

a6dd082

stack-info: PR: #948, branch: PaulZhang12/stack/14

PaulZhang12 force-pushed the PaulZhang12/stack/14 branch from cccb0af to a6dd082 Compare October 20, 2025 19:19

PaulZhang12 mentioned this pull request Oct 20, 2025

Remove unrolling with tma + pipelining #994

Merged

PaulZhang12 changed the base branch from main to PaulZhang12/stack/16 October 20, 2025 19:20

PaulZhang12 changed the base branch from PaulZhang12/stack/16 to main October 20, 2025 19:22

PaulZhang12 added a commit that referenced this pull request Oct 20, 2025

Add epilogue subtiling

88d46a8

stack-info: PR: #948, branch: PaulZhang12/stack/14

PaulZhang12 force-pushed the PaulZhang12/stack/14 branch from a6dd082 to 88d46a8 Compare October 20, 2025 19:22

PaulZhang12 changed the base branch from main to PaulZhang12/stack/16 October 20, 2025 19:22

PaulZhang12 changed the base branch from PaulZhang12/stack/16 to main October 20, 2025 19:24

PaulZhang12 added a commit that referenced this pull request Oct 20, 2025

Add epilogue subtiling

48eed82

stack-info: PR: #948, branch: PaulZhang12/stack/14

PaulZhang12 force-pushed the PaulZhang12/stack/14 branch from 88d46a8 to 48eed82 Compare October 20, 2025 19:24

PaulZhang12 changed the base branch from main to PaulZhang12/stack/16 October 20, 2025 19:25

PaulZhang12 changed the base branch from PaulZhang12/stack/16 to main October 20, 2025 19:28

PaulZhang12 force-pushed the PaulZhang12/stack/14 branch from 0c3d607 to bdf0793 Compare October 22, 2025 00:08

PaulZhang12 added a commit that referenced this pull request Oct 27, 2025

Add epilogue subtiling

9856699

stack-info: PR: #948, branch: PaulZhang12/stack/14

PaulZhang12 force-pushed the PaulZhang12/stack/14 branch from bdf0793 to 9856699 Compare October 27, 2025 16:56

PaulZhang12 added a commit that referenced this pull request Oct 30, 2025

Add epilogue subtiling

3ae89e1

stack-info: PR: #948, branch: PaulZhang12/stack/14

PaulZhang12 force-pushed the PaulZhang12/stack/14 branch from 9856699 to 3ae89e1 Compare October 30, 2025 16:06

PaulZhang12 added a commit that referenced this pull request Oct 30, 2025

Add epilogue subtiling

0ef154f

stack-info: PR: #948, branch: PaulZhang12/stack/14

PaulZhang12 force-pushed the PaulZhang12/stack/14 branch from 3ae89e1 to 0ef154f Compare October 30, 2025 22:44

PaulZhang12 added a commit that referenced this pull request Nov 3, 2025

Add epilogue subtiling

4e19822

stack-info: PR: #948, branch: PaulZhang12/stack/14

PaulZhang12 force-pushed the PaulZhang12/stack/14 branch from 0ef154f to 4e19822 Compare November 3, 2025 20:44

PaulZhang12 added a commit that referenced this pull request Nov 5, 2025

Add epilogue subtiling

7e8b05e

stack-info: PR: #948, branch: PaulZhang12/stack/14

PaulZhang12 force-pushed the PaulZhang12/stack/14 branch from 4e19822 to 7e8b05e Compare November 5, 2025 02:52

PaulZhang12 added a commit that referenced this pull request Nov 5, 2025

Add epilogue subtiling

a8c83b6

stack-info: PR: #948, branch: PaulZhang12/stack/14

PaulZhang12 force-pushed the PaulZhang12/stack/14 branch from 7e8b05e to a8c83b6 Compare November 5, 2025 03:04

PaulZhang12 added a commit that referenced this pull request Nov 5, 2025

Add epilogue subtiling

4ebc4f1

stack-info: PR: #948, branch: PaulZhang12/stack/14

PaulZhang12 force-pushed the PaulZhang12/stack/14 branch from a8c83b6 to 4ebc4f1 Compare November 5, 2025 03:25

PaulZhang12 added a commit that referenced this pull request Nov 5, 2025

Add epilogue subtiling

5b75ab2

stack-info: PR: #948, branch: PaulZhang12/stack/14

PaulZhang12 force-pushed the PaulZhang12/stack/14 branch from 4ebc4f1 to 5b75ab2 Compare November 5, 2025 04:42

PaulZhang12 added a commit that referenced this pull request Nov 5, 2025

Add epilogue subtiling

f48a0a3

stack-info: PR: #948, branch: PaulZhang12/stack/14

PaulZhang12 force-pushed the PaulZhang12/stack/14 branch from 5b75ab2 to f48a0a3 Compare November 5, 2025 14:50

PaulZhang12 added a commit that referenced this pull request Nov 5, 2025

Add epilogue subtiling

95d9ef0

stack-info: PR: #948, branch: PaulZhang12/stack/14

PaulZhang12 force-pushed the PaulZhang12/stack/14 branch from f48a0a3 to 95d9ef0 Compare November 5, 2025 15:42

PaulZhang12 added a commit that referenced this pull request Nov 5, 2025

Add epilogue subtiling

a9d2372

stack-info: PR: #948, branch: PaulZhang12/stack/14

PaulZhang12 force-pushed the PaulZhang12/stack/14 branch from 95d9ef0 to a9d2372 Compare November 5, 2025 16:18

PaulZhang12 added a commit that referenced this pull request Nov 5, 2025

Add epilogue subtiling

a56a3b7

stack-info: PR: #948, branch: PaulZhang12/stack/14

PaulZhang12 force-pushed the PaulZhang12/stack/14 branch from a9d2372 to a56a3b7 Compare November 5, 2025 16:30

Add epilogue subtiling

cd3553e

stack-info: PR: #948, branch: PaulZhang12/stack/14

PaulZhang12 force-pushed the PaulZhang12/stack/14 branch from a56a3b7 to cd3553e Compare November 5, 2025 16:35

jansel requested changes Nov 7, 2025


jansel mentioned this pull request Nov 13, 2025

Autotune over epilogue subtiling #1120

Open
