Improve HF qwen3_omni: preserve audio_sample_rate in kwargs restructuring #29255

jeremyteboul · 2025-11-23T03:43:30Z

The Qwen3OmniMoeProcessor was losing the audio_sample_rate parameter during kwargs restructuring for transformers < 4.58.0. When mm_kwargs were reorganized into audio_kwargs and text_kwargs dictionaries, the audio_sample_rate (passed at the top level) was not being moved into audio_kwargs where the HuggingFace WhisperFeatureExtractor expects it.

This caused audio processing to fail with:
Failed to apply Qwen3OmniMoeProcessor on data={'audio': [array(...)]}
with kwargs={'audio_sample_rate': 16000, 'audio_kwargs': {}, ...}

Changes:

- Extract audio_sample_rate before kwargs restructuring
- Place it into audio_kwargs after creating nested dictionaries
- Add comprehensive unit tests for various sample rates
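
The first two changes can be illustrated with a minimal sketch. This is not the code from the PR: `restructure_mm_kwargs` is a hypothetical stand-in, and two details are assumptions made only for illustration (the value keeps the key name `audio_sample_rate` inside `audio_kwargs`, and unrelated top-level keys pass through unchanged).

```python
def restructure_mm_kwargs(mm_kwargs: dict) -> dict:
    """Hypothetical stand-in for the kwargs restructuring step."""
    remaining = dict(mm_kwargs)  # work on a copy, leave the caller's dict alone

    # Step 1: extract audio_sample_rate before anything is restructured.
    audio_sample_rate = remaining.pop("audio_sample_rate", None)

    # Step 2: create the nested dictionaries.
    restructured = {
        "audio_kwargs": dict(remaining.pop("audio_kwargs", {})),
        "text_kwargs": dict(remaining.pop("text_kwargs", {})),
    }
    # Assumption: any other top-level keys are passed through unchanged.
    restructured.update(remaining)

    # Step 3: place the value where the audio feature extractor will see it.
    # Assumption: the key keeps its original name inside audio_kwargs.
    if audio_sample_rate is not None:
        restructured["audio_kwargs"]["audio_sample_rate"] = audio_sample_rate
    return restructured


fixed = restructure_mm_kwargs({"audio_sample_rate": 16000, "audio_kwargs": {}})
print(fixed)  # {'audio_kwargs': {'audio_sample_rate': 16000}, 'text_kwargs': {}}
```

Compare this with the failing kwargs in the error above, where audio_sample_rate stays at the top level and audio_kwargs is empty.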

Tests:
pytest tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation -v

Test coverage:

- test_audio_sample_rate_preserved_in_audio_kwargs: Core fix validation
- test_audio_sample_rate_absent_when_not_provided: Edge case handling
- test_various_audio_sample_rates_preserved: Parameterized test for 8 kHz, 16 kHz, 22.05 kHz, 24 kHz, 44.1 kHz, and 48 kHz sample rates
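
As a rough sketch of how these three cases fit together, here is a standard-library version using `unittest`. It is not the test file from the PR: `move_sample_rate` is a hypothetical helper standing in for the kwargs manipulation under test, and the key name inside `audio_kwargs` is assumed.

```python
import unittest

SAMPLE_RATES = [8000, 16000, 22050, 24000, 44100, 48000]


def move_sample_rate(mm_kwargs: dict) -> dict:
    """Hypothetical helper standing in for the kwargs manipulation under test."""
    out = {k: v for k, v in mm_kwargs.items() if k != "audio_sample_rate"}
    audio_kwargs = dict(out.get("audio_kwargs", {}))
    if "audio_sample_rate" in mm_kwargs:
        # Assumption: the key keeps its name when moved into audio_kwargs.
        audio_kwargs["audio_sample_rate"] = mm_kwargs["audio_sample_rate"]
    out["audio_kwargs"] = audio_kwargs
    out.setdefault("text_kwargs", {})
    return out


class AudioSampleRatePreservationSketch(unittest.TestCase):
    def test_preserved_in_audio_kwargs(self):
        out = move_sample_rate({"audio_sample_rate": 16000})
        self.assertEqual(out["audio_kwargs"]["audio_sample_rate"], 16000)
        self.assertNotIn("audio_sample_rate", out)

    def test_absent_when_not_provided(self):
        out = move_sample_rate({})
        self.assertNotIn("audio_sample_rate", out["audio_kwargs"])

    def test_various_rates_preserved(self):
        # subTest reports each rate separately, similar to a parameterized test.
        for rate in SAMPLE_RATES:
            with self.subTest(rate=rate):
                out = move_sample_rate({"audio_sample_rate": rate})
                self.assertEqual(out["audio_kwargs"]["audio_sample_rate"], rate)
```

With pytest parameterization each rate counts as its own test, which is how three test functions produce the eight results listed below; `subTest` groups the six rates under one test method instead.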

All 8 tests passing:
tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_audio_sample_rate_preserved_in_audio_kwargs PASSED
tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_audio_sample_rate_absent_when_not_provided PASSED
tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[8000] PASSED
tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[16000] PASSED
tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[22050] PASSED
tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[24000] PASSED
tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[44100] PASSED
tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[48000] PASSED
========================= 8 passed in 0.15s =========================

gemini-code-assist

Code Review

This pull request correctly addresses a bug where the audio_sample_rate parameter was being lost during kwargs restructuring for older versions of the transformers library. The fix is straightforward and well-implemented. The new unit tests are comprehensive, covering the primary success path, an edge case where the sample rate is not provided, and a variety of different sample rates. I have one suggestion regarding the new tests to improve their long-term maintainability.

tests/multimodal/test_processing.py

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

tests/multimodal/test_processing.py

Isotr0py · 2025-11-23T09:34:56Z

tests/multimodal/test_processing.py

+class TestQwen3OmniAudioSampleRatePreservation:
+    """Test that audio_sample_rate is preserved during kwargs restructuring.
+
+    These tests validate the fix for the audio_sample_rate bug in Qwen3 Omni
+    where the parameter was lost during kwargs restructuring. The tests don't
+    require importing the actual model classes - they just test the kwargs
+    manipulation logic.
+    """


Since this is a special case for Qwen3-Omni, let's create the test at tests/models/multimodal/processing/test_qwen3_omni.py instead of test_common.py.

Done, I moved the test into the new file.

Isotr0py · 2025-11-23T09:38:25Z

tests/multimodal/test_processing.py

+    def test_audio_sample_rate_preserved_in_audio_kwargs(self) -> None:
+        """
+        Test that audio_sample_rate is moved from top-level mm_kwargs
+        into audio_kwargs during kwargs restructuring.
+
+        This is the core fix: when transformers < 4.58.0, the code
+        restructures kwargs into audio_kwargs and text_kwargs, and
+        audio_sample_rate must be preserved in audio_kwargs.
+        """


It seems this test is not an e2e test like the others. Can you add an e2e one? You can refer to tests/models/multimodal/processing/test_qwen2_vl.py.

I also added an e2e test; thanks for the suggestion. Let me know if we missed anything else.

The Qwen3OmniMoeProcessor was losing the audio_sample_rate parameter during kwargs restructuring for transformers < 4.58.0. When mm_kwargs were reorganized into audio_kwargs and text_kwargs dictionaries, the audio_sample_rate (passed at the top level) was not being moved into audio_kwargs where the HuggingFace WhisperFeatureExtractor expects it.

This caused audio processing to fail with:
Failed to apply Qwen3OmniMoeProcessor on data={'audio': [array(...)]}
with kwargs={'audio_sample_rate': 16000, 'audio_kwargs': {}, ...}

Changes:
- Extract audio_sample_rate before kwargs restructuring
- Place it into audio_kwargs after creating nested dictionaries
- Add comprehensive unit tests for various sample rates

Tests:
Run tests with:
source /home/$USER/uv_env/vllm/bin/activate
cd /home/jeremyte/vllm
pytest tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation -v

Test coverage:
- test_audio_sample_rate_preserved_in_audio_kwargs: Core fix validation
- test_audio_sample_rate_absent_when_not_provided: Edge case handling
- test_various_audio_sample_rates_preserved: Parameterized test for 8 kHz, 16 kHz, 22.05 kHz, 24 kHz, 44.1 kHz, and 48 kHz sample rates

All 8 tests passing:
tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_audio_sample_rate_preserved_in_audio_kwargs PASSED
tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_audio_sample_rate_absent_when_not_provided PASSED
tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[8000] PASSED
tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[16000] PASSED
tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[22050] PASSED
tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[24000] PASSED
tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[44100] PASSED
tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[48000] PASSED
========================= 8 passed in 0.15s =========================

Fixes audio tensor processing for Qwen3 Omni models when using the raw audio path (non-embeddings mode). Resolves production issue where audio requests were failing on SMC tier.

jeremyteboul · 2025-11-24T22:12:41Z

@Isotr0py it should be good for review now.

Isotr0py · 2025-11-25T12:21:19Z

tests/multimodal/test_processing.py

-
-    for k, v in expected_kwargs.items():
-        assert getattr(processor, k) == v


Is this change necessary?

Isotr0py · 2025-11-25T16:01:39Z

BTW, I doubt we should expose sampling_rate for the audio processor: the Whisper feature extractor has a fixed sampling rate, and we already resample audio to the target feature extractor's sampling rate in the data parser. Exposing sampling_rate can therefore cause unexpected behaviour.

vllm/vllm/multimodal/parse.py

Lines 440 to 450 in 48ddb02

    
    new_audios = list[np.ndarray]()
    for data_item in data_items:
        audio, orig_sr = self._get_audio_with_sr(data_item)
        if orig_sr is None:
            new_audio = audio
        else:
            new_audio = self.audio_resampler.resample(audio, orig_sr=orig_sr)
        new_audios.append(new_audio)
    return AudioProcessorItems(new_audios)
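
To make the concern concrete, here is a small pure-Python sketch of the same branch: audio that arrives with a known original rate is resampled to the extractor's target rate, and audio without one is passed through. `resample_linear` and `parse_audio` are hypothetical names, and the naive linear interpolation is only a stand-in for vLLM's actual resampler.

```python
def resample_linear(samples, orig_sr, target_sr):
    """Naive linear-interpolation resampler (illustration only)."""
    if orig_sr == target_sr or not samples:
        return list(samples)
    n_out = max(1, round(len(samples) * target_sr / orig_sr))
    out = []
    for i in range(n_out):
        pos = i * orig_sr / target_sr          # position in the input signal
        lo = min(int(pos), len(samples) - 1)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


def parse_audio(item, target_sr):
    """Mirror the quoted branch: resample only when an original rate is known.

    Assumption: an item is either a bare sample list or an (audio, orig_sr) tuple.
    """
    if isinstance(item, tuple):
        audio, orig_sr = item
    else:
        audio, orig_sr = item, None
    if orig_sr is None:
        return list(audio)
    return resample_linear(audio, orig_sr, target_sr)


one_second_at_48k = ([0.0] * 48000, 48000)
parsed = parse_audio(one_second_at_48k, target_sr=16000)
print(len(parsed))  # 16000
```

After this step the samples are already at the target rate, so a caller-supplied rate of 48000 passed on to the feature extractor would no longer describe the data it receives.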

jeremyteboul requested review from DarkLight1337, NickLucche, sighingnow and ywang96 as code owners November 23, 2025 03:43

mergify bot added the multi-modality (Related to multi-modality (#4194)) and qwen (Related to Qwen models) labels Nov 23, 2025

gemini-code-assist bot reviewed Nov 23, 2025

chatgpt-codex-connector bot reviewed Nov 23, 2025

DarkLight1337 requested a review from Isotr0py November 23, 2025 06:58

jeremyteboul force-pushed the qwen3_omni_sample_rate branch from e143763 to 95063d8 Compare November 23, 2025 07:29

Isotr0py reviewed Nov 23, 2025

jeremyteboul force-pushed the qwen3_omni_sample_rate branch 3 times, most recently from 77b06d7 to 27e28b6 Compare November 24, 2025 01:25

jeremyteboul force-pushed the qwen3_omni_sample_rate branch from 27e28b6 to 3a15f76 Compare November 24, 2025 06:37

Isotr0py reviewed Nov 25, 2025
