[fix][cpu] Use a SwigluOAI impl which supports interleaved gate-up weights #29273
Conversation
@mgoin this is the last piece needed to enable gpt-oss in BF16 on Arm CPUs, could you please have a look?
Code Review
This pull request correctly fixes a bug in the CPU implementation of swigluoai_and_mul by using an implementation that supports interleaved gate-up weights. The change replaces a local function with a shared SwigluOAIAndMul class from the activation layers, which improves code reuse. However, I've identified a performance issue where this new class is instantiated inside a loop, which should be addressed.
```diff
 gate_up = layer.gate_up_linear[i](tokens_for_this_expert)
 if activation == "swigluoai":
-    gate_up = swigluoai_and_mul(gate_up)
+    gate_up = SwigluOAIAndMul().forward_native(gate_up)
```
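A minimal sketch of the hoisting the review asks for, using a toy stand-in for the activation class (the stand-in omits vLLM's clamping details, and the expert loop and tensor shapes here are assumptions, not the actual CPU MoE code):

```python
import torch
import torch.nn as nn

class SwigluOAIAndMul(nn.Module):
    # Toy stand-in for vLLM's activation class; the real version lives in
    # the activation layers module and also clamps the gate/up values.
    def forward_native(self, x: torch.Tensor) -> torch.Tensor:
        gate, up = x[..., ::2], x[..., 1::2]  # interleaved gate/up pairs
        return (up + 1) * gate * torch.sigmoid(1.702 * gate)

# Construct the activation once, outside the per-expert loop, instead of
# instantiating SwigluOAIAndMul() on every iteration as in the diff above.
act = SwigluOAIAndMul()
gate_up_linear = [nn.Linear(16, 32) for _ in range(4)]  # toy experts
tokens_for_this_expert = torch.randn(8, 16)
for expert in gate_up_linear:
    gate_up = expert(tokens_for_this_expert)
    gate_up = act.forward_native(gate_up)  # no per-iteration construction
```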
good idea, done!
💡 Codex Review
Here are some automated review suggestions for this pull request.
```diff
 if activation == "swigluoai":
-    gate_up = swigluoai_and_mul(gate_up)
+    gate_up = SwigluOAIAndMul().forward_native(gate_up)
```
SwigluOAI activation pairs gate/up values incorrectly on CPU
On the CPU fallback path, `SwigluOAIAndMul().forward_native` is now fed the gate/up output directly from `gate_up_linear`, but `forward_native` expects the gate and up features to be interleaved element-wise (it uses `x[..., ::2]` / `x[..., 1::2]`). The CPU weights are loaded as two contiguous halves (gate followed by up) with no interleaving (vllm/model_executor/layers/fused_moe/layer.py lines 984-1010), and in the CPUFusedMOE path those weights are used without any reordering (unquantized_fused_moe_method.py lines 238-267). With the new call the gate tensor mixes gate and up values (e.g., gate0 with gate1 instead of gate0 with up0), producing incorrect activations whenever swigluoai is used on CPU and the SGL kernel is unavailable. This regression will distort expert outputs for models such as GPT-OSS running on CPU.
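A self-contained illustration of the layout mismatch described above (values and shapes are arbitrary):

```python
import torch

n_tokens, d = 4, 8
gate = torch.arange(0.0, 8.0).repeat(n_tokens, 1)    # stand-in gate features
up = torch.arange(100.0, 108.0).repeat(n_tokens, 1)  # stand-in up features

# Interleaved layout, which forward_native expects: [g0, u0, g1, u1, ...]
interleaved = torch.stack([gate, up], dim=-1).flatten(-2)
assert torch.equal(interleaved[..., ::2], gate)
assert torch.equal(interleaved[..., 1::2], up)

# Contiguous-halves layout produced by the CPU loader: [g0..g7, u0..u7]
contiguous = torch.cat([gate, up], dim=-1)
# Applying the interleaved slicing to this layout pairs gate0 with gate1
# (not gate0 with up0), which is the incorrect pairing Codex flags.
assert not torch.equal(contiguous[..., ::2], gate)
```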
> The CPU weights are loaded as two contiguous halves (gate followed by up) with no interleaving (vllm/model_executor/layers/fused_moe/layer.py)

I wish ... if that was the case we wouldn't be here.
[fix][cpu] Use a SwigluOAI impl which supports interleaved gate-up weights

Current impl of `swigluoai_and_mul` for CPU assumes that gate-up weights have been de-interleaved at load time, which is not the case. The new impl we dispatch to is the same one used for the BF16 path on GPU and handles interleaved gate-up.

Signed-off-by: Fadi Arafeh <[email protected]>
Force-pushed from da1644d to e343b14
CI is green!
BF16 loading/de-interleaving of gpt-oss's gate-up needs to be addressed in general for both the GPU and CPU paths. Any future PR addressing this will have to consequently update all SwigluOAI impls introduced in #22951, including `SwigluOAI.forward_native`, since it's used as the reference impl for GPUs. Since this PR uses the same SwigluOAI impl as the GPU BF16 path, any changes to loading in that path will work (correctly) OOB on the CPU backend.
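If load-time de-interleaving is eventually done, it could look roughly like the following hypothetical transform (the function name, the assumption that the output dimension is dim 0, and the even/odd row convention are all illustrative, not vLLM's actual loader code):

```python
import torch

def deinterleave_gate_up(w: torch.Tensor) -> torch.Tensor:
    # Hypothetical: reorder rows from interleaved [g0, u0, g1, u1, ...]
    # into contiguous halves [g0, g1, ..., u0, u1, ...] along dim 0.
    return torch.cat([w[0::2], w[1::2]], dim=0)

w = torch.arange(8.0).reshape(8, 1)       # rows: g0,u0,g1,u1,g2,u2,g3,u3
print(deinterleave_gate_up(w).flatten())  # tensor([0., 2., 4., 6., 1., 3., 5., 7.])
```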
cc @mgoin
[fix][cpu] Use a SwigluOAI impl which supports interleaved gate-up weights
Purpose
Current impl of `swigluoai_and_mul` for CPU assumes that gate-up weights have been de-interleaved at load time, which is not the case, and as a result gpt-oss generates garbage. The new impl we dispatch to in this PR is the same one used for the BF16 path on GPU and handles interleaved gate-up.
See comments on #27024 for full context.
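For context, the shared implementation being dispatched to reads gate and up from alternating positions along the last dimension, roughly as below (a sketch, not the verbatim vLLM code; the `alpha`/`limit` defaults are assumptions here):

```python
import torch

def swigluoai_and_mul_native(x: torch.Tensor,
                             alpha: float = 1.702,
                             limit: float = 7.0) -> torch.Tensor:
    # Gate/up features are interleaved element-wise along the last dim,
    # matching how gpt-oss checkpoints store the gate_up projection.
    gate, up = x[..., ::2], x[..., 1::2]
    gate = gate.clamp(max=limit)
    up = up.clamp(min=-limit, max=limit)
    glu = gate * torch.sigmoid(gate * alpha)
    return (up + 1) * glu
```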
Test Plan
Ran gpt-oss-20b end to end on a few prompts.
Test Result
Generations with this fix are decent and very similar to what we get with HF transformers.
Without this fix, generations are garbage.