[Model] Add LoRA support for Whisper models #29856
Open · +357 −37
Purpose
This PR enables Multi-LoRA support for Whisper speech-to-text models, allowing users to serve multiple fine-tuned Whisper adapters from a single base model.
Background
Currently, vLLM's `WhisperForConditionalGeneration` does not implement the `SupportsLoRA` interface, preventing users from using LoRA adapters with Whisper models. This limitation forces users to deploy a separate model instance for each fine-tuned variant, which is inefficient in terms of GPU memory usage.
Changes
1. `vllm/model_executor/models/whisper.py`
   - Add the `SupportsLoRA` interface to `WhisperForConditionalGeneration`
   - Add the `embedding_modules` and `embedding_padding_modules` attributes required by LoRA
   - Update `packed_modules_mapping` to use simplified keys (`qkv_proj`, `kv_proj`) for LoRA compatibility
2. `vllm/lora/layers/column_parallel_linear.py`
   - Extend `MergedQKVParallelLinearWithLoRA` to support KV-only (2-slice) configurations: Whisper cross-attention layers (`encoder_attn.kv_proj`) have only K and V projections, not Q
   - Update `can_replace_layer()` to accept both 2-module and 3-module configurations
   - Update `slice_lora_a()` to handle a variable number of slices dynamically
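The variable-slice handling can be illustrated with a standalone sketch (the function name and data below are illustrative, not vLLM's actual code): the rows of a packed LoRA weight are split per projection, whether the layer fuses three slices (q/k/v) or only two (k/v).

```python
def split_packed_lora(rows, slice_sizes):
    """Split the stacked rows of a packed LoRA weight into per-projection
    slices. Works for any slice count: 3 for a fused qkv_proj, or 2 for
    Whisper's KV-only encoder_attn.kv_proj. Illustrative sketch only.
    """
    slices, start = [], 0
    for size in slice_sizes:
        slices.append(rows[start:start + size])
        start += size
    return slices

# A fused KV weight with 4 rows per projection splits into K and V parts:
kv_rows = [f"row{i}" for i in range(8)]
k_rows, v_rows = split_packed_lora(kv_rows, [4, 4])
```

The same helper handles the 3-slice QKV case by passing three sizes, which is the shape-agnostic behavior this change gives `slice_lora_a()`.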
3. `vllm/lora/worker_manager.py`
   - Fall back to `max_target_positions` when `max_position_embeddings` is unavailable, since Whisper configs expose `max_target_positions` instead
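The config fallback can be sketched as follows (the helper name is hypothetical and the logic simplified; vLLM's actual implementation differs):

```python
from types import SimpleNamespace

def get_max_positions(hf_config, default=None):
    """Prefer max_position_embeddings, falling back to max_target_positions,
    which Whisper configs expose instead. Hypothetical helper, not vLLM's
    actual code."""
    for attr in ("max_position_embeddings", "max_target_positions"):
        value = getattr(hf_config, attr, None)
        if value is not None:
            return value
    return default

# Whisper-style config: only max_target_positions is present.
whisper_cfg = SimpleNamespace(max_target_positions=448)
print(get_max_positions(whisper_cfg))  # → 448
```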
4. `examples/offline_inference/whisper_multilora_inference.py` (new example script)
5. `tests/lora/test_whisper_lora.py` (new tests)

Test Plan
Test Result (Unit Tests)
Manual Testing
Tested with the `openai/whisper-large-v3-turbo` base model and custom LoRA adapters.
Example Usage
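A minimal serving sketch, assuming an adapter downloaded locally (the adapter name `whisper-medical` and path `./whisper-lora` are placeholders; the flags are vLLM's standard LoRA serving options):

```shell
# Serve the base Whisper model with one LoRA adapter registered.
vllm serve openai/whisper-large-v3-turbo \
  --enable-lora \
  --lora-modules whisper-medical=./whisper-lora

# Transcribe through the OpenAI-compatible endpoint, selecting the
# adapter by passing its registered name as the model.
curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@sample.wav \
  -F model=whisper-medical
```

Passing the base model name (`openai/whisper-large-v3-turbo`) as `model` instead would transcribe without any adapter applied.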