-
-
Notifications
You must be signed in to change notification settings - Fork 11.6k
Improve HF qwen3_omni: preserve audio_sample_rate in kwargs restructuring #29255
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Improve HF qwen3_omni: preserve audio_sample_rate in kwargs restructuring #29255
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request correctly addresses a bug where the audio_sample_rate parameter was being lost during kwargs restructuring for older versions of the transformers library. The fix is straightforward and well-implemented. The new unit tests are comprehensive, covering the primary success path, an edge case where the sample rate is not provided, and a variety of different sample rates. I have one suggestion regarding the new tests to improve their long-term maintainability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
e143763 to
95063d8
Compare
tests/multimodal/test_processing.py
Outdated
| class TestQwen3OmniAudioSampleRatePreservation: | ||
| """Test that audio_sample_rate is preserved during kwargs restructuring. | ||
| These tests validate the fix for the audio_sample_rate bug in Qwen3 Omni | ||
| where the parameter was lost during kwargs restructuring. The tests don't | ||
| require importing the actual model classes - they just test the kwargs | ||
| manipulation logic. | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a special case for Qwen3-omni, let's create test at tests/models/multimodal/processing/test_qwen3_omni.py instead of test_common.py.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done I moved the test in the new file
tests/multimodal/test_processing.py
Outdated
| def test_audio_sample_rate_preserved_in_audio_kwargs(self) -> None: | ||
| """ | ||
| Test that audio_sample_rate is moved from top-level mm_kwargs | ||
| into audio_kwargs during kwargs restructuring. | ||
| This is the core fix: when transformers < 4.58.0, the code | ||
| restructures kwargs into audio_kwargs and text_kwargs, and | ||
| audio_sample_rate must be preserved in audio_kwargs. | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems that this test is not an e2e test like others. Can you add an e2e ones? You can refer to tests/models/multimodal/processing/test_qwen2_vl.py.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added also a e2e test ; thanks for the suggestion; let me know if we miss anything else
77b06d7 to
27e28b6
Compare
The Qwen3OmniMoeProcessor was losing the audio_sample_rate parameter
during kwargs restructuring for transformers < 4.58.0. When mm_kwargs
were reorganized into audio_kwargs and text_kwargs dictionaries, the
audio_sample_rate (passed at the top level) was not being moved into
audio_kwargs where the HuggingFace WhisperFeatureExtractor expects it.
This caused audio processing to fail with:
Failed to apply Qwen3OmniMoeProcessor on data={'audio': [array(...)]}
with kwargs={'audio_sample_rate': 16000, 'audio_kwargs': {}, ...}
Changes:
- Extract audio_sample_rate before kwargs restructuring
- Place it into audio_kwargs after creating nested dictionaries
- Add comprehensive unit tests for various sample rates
Tests:
Run tests with:
source /home/$USER/uv_env/vllm/bin/activate
cd /home/jeremyte/vllm
pytest tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation -v
Test coverage:
- test_audio_sample_rate_preserved_in_audio_kwargs: Core fix validation
- test_audio_sample_rate_absent_when_not_provided: Edge case handling
- test_various_audio_sample_rates_preserved: Parameterized test for
8kHz, 16kHz, 22kHz, 24kHz, 44kHz, and 48kHz sample rates
All 8 tests passing:
tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_audio_sample_rate_preserved_in_audio_kwargs PASSED
tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_audio_sample_rate_absent_when_not_provided PASSED
tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[8000] PASSED
tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[16000] PASSED
tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[22050] PASSED
tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[24000] PASSED
tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[44100] PASSED
tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[48000] PASSED
========================= 8 passed in 0.15s =========================
Fixes audio tensor processing for Qwen3 Omni models when using the
raw audio path (non-embeddings mode). Resolves production issue where
audio requests were failing on SMC tier.
27e28b6 to
3a15f76
Compare
|
@Isotr0py it should be good for review now; |
|
|
||
| for k, v in expected_kwargs.items(): | ||
| assert getattr(processor, k) == v |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change necessary?
|
BTW, I doubt if we should expose Lines 440 to 450 in 48ddb02
|
The Qwen3OmniMoeProcessor was losing the audio_sample_rate parameter during kwargs restructuring for transformers < 4.58.0. When mm_kwargs were reorganized into audio_kwargs and text_kwargs dictionaries, the audio_sample_rate (passed at the top level) was not being moved into audio_kwargs where the HuggingFace WhisperFeatureExtractor expects it.
This caused audio processing to fail with:
Failed to apply Qwen3OmniMoeProcessor on data={'audio': [array(...)]}
with kwargs={'audio_sample_rate': 16000, 'audio_kwargs': {}, ...}
Changes:
Tests:
pytest tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation -v
Test coverage:
All 8 tests passing:
tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_audio_sample_rate_preserved_in_audio_kwargs PASSED tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_audio_sample_rate_absent_when_not_provided PASSED tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[8000] PASSED tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[16000] PASSED tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[22050] PASSED tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[24000] PASSED tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[44100] PASSED tests/multimodal/test_processing.py::TestQwen3OmniAudioSampleRatePreservation::test_various_audio_sample_rates_preserved[48000] PASSED ========================= 8 passed in 0.15s =========================