
Conversation

Collaborator

@Tcc0403 Tcc0403 commented Nov 28, 2025

Summary

This PR aims to fix #956, plus some refactoring, including:

  • adds SwiGLU and GeGLU MLP patching for the Qwen3-VL series: SwiGLU for the text model, GeGLU for the vision model
    • geglu defaults to False because its output is not numerically close to torch's implementation, which causes the convergence test to fail.
  • switches qk-norm patching to LigerRMSNorm(row_mode=True)
  • adds LayerNorm patching for the vision model

Note that moe layers aren't patched since there will be a major change in transformers v5, see #958.
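For readers unfamiliar with the two gated MLP variants being patched, the underlying math can be sketched in plain Python (illustrative only; Liger-Kernel's actual implementations are fused Triton kernels operating on tensors, not scalar functions):

```python
import math

# SwiGLU gates the up-projection with SiLU (x * sigmoid(x)); GeGLU gates it
# with GELU (exact erf form). These scalar helpers are for illustration only.
def silu(x: float) -> float:
    return x / (1.0 + math.exp(-x))  # x * sigmoid(x)

def gelu(x: float) -> float:
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def swiglu(gate: float, up: float) -> float:
    return silu(gate) * up

def geglu(gate: float, up: float) -> float:
    return gelu(gate) * up
```

The convergence-test failure mentioned above is plausible with GELU because implementations differ (exact erf vs. tanh approximation), so small numerical gaps accumulate over training steps.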

Testing Done

  • Hardware Type:
  • run make test to ensure correctness
  • run make checkstyle to ensure code style
  • run make test-convergence to ensure convergence

@Tcc0403 Tcc0403 changed the title Add mlp support for qwen3vl and little refactor Add mlp support for qwen3vl series and little refactor Nov 28, 2025
@thad0ctor

thad0ctor commented Dec 2, 2025


I commented on #897 about a regression in FLCE training speed. I applied this patch locally to change the CUDA syncing and make FLCE FSDP2 shard-aware, and it seems to fix the regression and squeeze out a little more t/s. It may be worth pulling this thread further: f22ce38

@Tcc0403
Collaborator Author

Tcc0403 commented Dec 2, 2025

I commented on #897 about a regression in FLCE training speed. I applied this patch locally to change the CUDA syncing and make FLCE FSDP2 shard-aware, and it seems to fix the regression and squeeze out a little more t/s. It may be worth pulling this thread further: f22ce38

Thank you, that's an interesting fix! We are planning the 2026 Q1 roadmap, which includes FSDP2 (multi-GPU) aware testing, optimizations, and more. Feel free to open a PR so we can discuss how to integrate your work in line with our roadmap!

Regarding the .item() optimization, have you tried reading the value directly as a tensor inside the Triton kernel instead of converting it to a Python value as we currently do? I wonder how that approach compares with yours.
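The two patterns being compared can be sketched with a small torch example (function names are hypothetical, not Liger-Kernel's API; on CPU both paths give identical results, and the difference only matters on CUDA, where .item() forces a device-to-host copy and a synchronization point):

```python
import torch

# Pattern A: materialize the scalar on the host. On a CUDA tensor, .item()
# triggers a device-to-host transfer and blocks until the GPU catches up.
def scale_with_item(x: torch.Tensor, s: torch.Tensor) -> torch.Tensor:
    return x * s.item()  # host round-trip

# Pattern B: keep the scalar as a 0-dim tensor. A Triton kernel could read it
# with a tl.load on the scalar's pointer, avoiding the sync entirely.
def scale_with_tensor(x: torch.Tensor, s: torch.Tensor) -> torch.Tensor:
    return x * s  # value stays on device

x = torch.arange(4.0)
s = torch.tensor(2.0)
assert torch.equal(scale_with_item(x, s), scale_with_tensor(x, s))
```

Passing the 0-dim tensor's pointer into the kernel trades a guaranteed host sync for one extra global-memory load per program instance, which is usually the better deal.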



Successfully merging this pull request may close these issues:

[Feature Request] Add SwiGLU support for Qwen3-VL models