-
Notifications
You must be signed in to change notification settings - Fork 440
Fix missing property access for multimodal models #966
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| # Note: language_model and visual properties can be accessed throught conditional class for BC. | ||
| # Not sure if it is subject to changes in the future. | ||
| # Reference: https://github.com/huggingface/transformers/blob/v4.52.4/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py#L1698 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you help me remove this comment? Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| # The model instance already exists, so we need to additionally patch the | ||
| # instance variables that reference already-instantiated modules | ||
|
|
||
| if isinstance(model, (Qwen2VLForConditionalGeneration, Qwen2VLModel)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need to update this condition.
model.model.language for XXXForConditionalGeneration, model.language_model for XXXVLModel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
There still exist some missing attribute error
Similar errors are also listed in #960 (comment). It's just a reminder for myself, not necessarily have to fix all of them in this PR! We can focus on handling all |
Summary
This PR fixes access to missing attributes for multimodal models in
src/liger_kernel/transformers/monkey_patch.py. The main change is to consistently access attributes (likelanguage_model,vision_tower, andvisual) through the submodel.modelattribute of the parent model, rather than directly from the parent model itself.This fixes AttributeError after this PR was merged in transformers:
get_decoder()for multimodal and delete redundant code 🔪 huggingface/transformers#42156See associated issue in TRL:
Fix #960.
Details
Fix: Consistent attribute access via
.modellanguage_model,vision_tower, andvisualto use the.modelattribute (e.g.,model.model.language_modelinstead ofmodel.language_model) across all kernel application functions for models including LLava, Mllama, Gemma3, PaliGemma, Qwen2 VL, Qwen2.5 VL, Qwen3 VL, Qwen3 VL MoE, GLM4V, GLM4V MoE, and InternVL.Normalization and patching logic updates
.model, ensuring that layer normalization and RMS normalization are consistently applied to the correct components.These changes make the codebase more maintainable and robust against future changes in model class implementations.
Testing Done
make testto ensure correctnessmake checkstyleto ensure code stylemake test-convergenceto ensure convergence