Skip to content

Conversation

@jeremyteboul
Copy link
Contributor

@jeremyteboul jeremyteboul commented Nov 23, 2025

The Qwen3OmniMoeProcessor was losing the audio_sample_rate parameter during kwargs restructuring for transformers < 4.58.0. When mm_kwargs were reorganized into audio_kwargs and text_kwargs dictionaries, the audio_sample_rate (passed at the top level) was not being moved into audio_kwargs where the HuggingFace WhisperFeatureExtractor expects it.

This caused audio processing to fail with:
Failed to apply Qwen3OmniMoeProcessor on data={'audio': [array(...)]}
with kwargs={'audio_sample_rate': 16000, 'audio_kwargs': {}, ...}

Changes:

  • Extract audio_sample_rate before kwargs restructuring
  • Place it into audio_kwargs after creating nested dictionaries
  • Add comprehensive unit tests for various sample rates

Tests:
pytest tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation -v

Test coverage:

  • test_audio_sample_rate_preserved_in_audio_kwargs: Core fix validation
  • test_audio_sample_rate_absent_when_not_provided: Edge case handling
  • test_various_audio_sample_rates_preserved: Parameterized test for 8kHz, 16kHz, 22kHz, 24kHz, 44kHz, and 48kHz sample rates

All 8 tests passing:
tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_audio_sample_rate_preserved_in_audio_kwargs PASSED tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_audio_sample_rate_absent_when_not_provided PASSED tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[8000] PASSED tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[16000] PASSED tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[22050] PASSED tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[24000] PASSED tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[44100] PASSED tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[48000] PASSED ========================= 8 passed in 0.15s =========================

@mergify mergify bot added multi-modality Related to multi-modality (#4194) qwen Related to Qwen models labels Nov 23, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request correctly addresses a bug where the audio_sample_rate parameter was being lost during kwargs restructuring for older versions of the transformers library. The fix is straightforward and well-implemented. The new unit tests are comprehensive, covering the primary success path, an edge case where the sample rate is not provided, and a variety of different sample rates. I have one suggestion regarding the new tests to improve their long-term maintainability.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines 1033 to 1054
class TestQwen3OmniAudioSampleRatePreservation:
"""Test that audio_sample_rate is preserved during kwargs restructuring.
These tests validate the fix for the audio_sample_rate bug in Qwen3 Omni
where the parameter was lost during kwargs restructuring. The tests don't
require importing the actual model classes - they just test the kwargs
manipulation logic.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a special case for Qwen3-omni, let's create test at tests/models/multimodal/processing/test_qwen3_omni.py instead of test_common.py.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done I moved the test in the new file

Comment on lines 1042 to 1064
def test_audio_sample_rate_preserved_in_audio_kwargs(self) -> None:
"""
Test that audio_sample_rate is moved from top-level mm_kwargs
into audio_kwargs during kwargs restructuring.
This is the core fix: when transformers < 4.58.0, the code
restructures kwargs into audio_kwargs and text_kwargs, and
audio_sample_rate must be preserved in audio_kwargs.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that this test is not an e2e test like others. Can you add an e2e ones? You can refer to tests/models/multimodal/processing/test_qwen2_vl.py.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added also a e2e test ; thanks for the suggestion; let me know if we miss anything else

@jeremyteboul jeremyteboul force-pushed the qwen3_omni_sample_rate branch 3 times, most recently from 77b06d7 to 27e28b6 Compare November 24, 2025 01:25
The Qwen3OmniMoeProcessor was losing the audio_sample_rate parameter
during kwargs restructuring for transformers < 4.58.0. When mm_kwargs
were reorganized into audio_kwargs and text_kwargs dictionaries, the
audio_sample_rate (passed at the top level) was not being moved into
audio_kwargs where the HuggingFace WhisperFeatureExtractor expects it.

This caused audio processing to fail with:
  Failed to apply Qwen3OmniMoeProcessor on data={'audio': [array(...)]}
  with kwargs={'audio_sample_rate': 16000, 'audio_kwargs': {}, ...}

Changes:
- Extract audio_sample_rate before kwargs restructuring
- Place it into audio_kwargs after creating nested dictionaries
- Add comprehensive unit tests for various sample rates

Tests:
Run tests with:
  source /home/$USER/uv_env/vllm/bin/activate
  cd /home/jeremyte/vllm
  pytest tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation -v

Test coverage:
- test_audio_sample_rate_preserved_in_audio_kwargs: Core fix validation
- test_audio_sample_rate_absent_when_not_provided: Edge case handling
- test_various_audio_sample_rates_preserved: Parameterized test for
  8kHz, 16kHz, 22kHz, 24kHz, 44kHz, and 48kHz sample rates

All 8 tests passing:
  tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_audio_sample_rate_preserved_in_audio_kwargs PASSED
  tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_audio_sample_rate_absent_when_not_provided PASSED
  tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[8000] PASSED
  tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[16000] PASSED
  tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[22050] PASSED
  tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[24000] PASSED
  tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[44100] PASSED
  tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[48000] PASSED
  ========================= 8 passed in 0.15s =========================

Fixes audio tensor processing for Qwen3 Omni models when using the
raw audio path (non-embeddings mode). Resolves production issue where
audio requests were failing on SMC tier.
@jeremyteboul jeremyteboul force-pushed the qwen3_omni_sample_rate branch from 27e28b6 to 3a15f76 Compare November 24, 2025 06:37
@jeremyteboul
Copy link
Contributor Author

@Isotr0py it should be good for review now;

Comment on lines -1042 to -1044

for k, v in expected_kwargs.items():
assert getattr(processor, k) == v
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change necessary?

@Isotr0py
Copy link
Member

Isotr0py commented Nov 25, 2025

BTW, I doubt if we should expose sampling_rate for audio processor, because whisper feature extractor has a fixed sampling rate, while we have resampled audio to target feature extractor's SR in data parser. So exposing sampling_rate can cause unexpected behaviour.

new_audios = list[np.ndarray]()
for data_item in data_items:
audio, orig_sr = self._get_audio_with_sr(data_item)
if orig_sr is None:
new_audio = audio
else:
new_audio = self.audio_resampler.resample(audio, orig_sr=orig_sr)
new_audios.append(new_audio)
return AudioProcessorItems(new_audios)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

multi-modality Related to multi-modality (#4194) qwen Related to Qwen models

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants