dashinfer vlm: add tensorrt support for qwen2.5vl #91
Conversation
Summary of Changes
Hello @x574chen, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
I've implemented support for running the Qwen2.5VL Vision Transformer (ViT) engine with TensorRT, which is crucial for accelerating inference. This involved adding a new dedicated Python module for the Qwen2.5VL Vision Transformer, updating the model loading and ONNX export logic to correctly handle its specific configurations, and making minor adjustments to related utility and test files. The goal is to provide a more performant inference solution for Qwen2.5VL models within the DashInfer VLM toolkit.
Highlights
- TensorRT Support for Qwen2.5VL: I've added comprehensive TensorRT support for the Qwen2.5VL Vision Transformer (ViT) engine, enabling optimized inference performance for these models.
- Updated Model Support Documentation: The README.md has been updated to reflect the expanded support for various Qwen2.5-VL model sizes, including 3B, 7B, 32B, and 72B.
- Code Refactoring and Modularity: I've refactored the default_weight_loader function into a new utils.py file, improving code organization and reusability across different vision transformer implementations.
- Test Script Enhancements: Minor adjustments were made to the OpenAI API benchmark and chat completion test scripts, primarily related to API endpoint configurations and local image handling for testing.
Code Review
This pull request adds support for Qwen2.5VL with TensorRT for the Vision Transformer. The changes include adding a new model definition file for Qwen2.5VL, refactoring some utility functions, and updating the model loading and TensorRT conversion logic to handle the new model. The changes are well-structured, but I've found a few issues. There is a critical bug in the calculation of image tokens for the TRT engine, which will likely lead to runtime errors. I've also identified a high-severity issue related to code duplication and a typo in the ONNX-to-plan conversion script. Additionally, there are several medium-severity suggestions to improve code clarity, remove redundancy, and maintain consistent coding style, such as adding missing newlines at the end of files.
```python
                * embed_dim
                / (embed_dim * (spatial_merge_size**2))
            )
            image_tokens = int(visual_inputs["input"].shape[1] * (spatial_merge_size**2))
```
The calculation for image_tokens appears to be incorrect. The number of tokens should be reduced by the merger operation, which effectively divides the sequence length by spatial_merge_size**2. The current implementation uses multiplication, which would incorrectly increase the number of tokens and likely cause shape mismatch errors downstream. This seems to be a bug introduced when refactoring the code.
Suggested change:

```diff
-image_tokens = int(visual_inputs["input"].shape[1] * (spatial_merge_size**2))
+image_tokens = int(visual_inputs["input"].shape[1] / (spatial_merge_size**2))
```
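As a sanity check on the arithmetic, here is a small illustrative sketch (`grid_h`, `grid_w`, and `merged_token_count` are hypothetical names, not taken from the DashInfer code): the merger fuses each `spatial_merge_size` x `spatial_merge_size` block of neighbouring patches into one token, so the sequence length must shrink by that factor rather than grow.

```python
# Illustrative sketch of the spatial-merge token arithmetic; names here are
# hypothetical, not the actual DashInfer variables.
def merged_token_count(num_patches: int, merge_size: int) -> int:
    # Each merge_size x merge_size block of patches becomes one token,
    # so the sequence length shrinks by merge_size**2.
    return num_patches // (merge_size**2)

grid_h, grid_w = 8, 8
num_patches = grid_h * grid_w              # ViT sequence length before merging: 64
print(merged_token_count(num_patches, 2))  # 16 tokens after a 2x2 merge, not 64 * 4
```

Multiplying instead of dividing would report 256 tokens for this grid, which is why downstream shape checks would fail.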
```python
        self.is_qwen_2_5 = is_qwen_2_5
        if is_qwen_2_5:
            self.config = Qwen2_5_VLConfig.from_pretrained(
                model_path, trust_remote_code=True, revision=None, code_revision=None
            ).vision_config
            self.model_path = model_path
            self.input_embed_dim = (
                self.config.in_channels
                * self.config.temporal_patch_size
                * self.config.patch_size
                * self.config.patch_size
            )
        else:
            self.config = Qwen2VLVisionConfig.from_pretrained(
                model_path, trust_remote_code=True, revision=None, code_revision=None
            )
            self.model_path = model_path
            self.input_embed_dim = (
                self.config.in_channels
                * self.config.temporal_patch_size
                * self.config.patch_size
                * self.config.patch_size
            )
```
There is significant code duplication in this __init__ method. The assignments to self.model_path and the calculation of self.input_embed_dim are identical in both branches of the if/else statement. This can be refactored to remove duplication, which improves maintainability. Additionally, there is a typo with extra spaces in self.model_path = model_path in the else block.
```python
self.is_qwen_2_5 = is_qwen_2_5
if is_qwen_2_5:
    self.config = Qwen2_5_VLConfig.from_pretrained(
        model_path, trust_remote_code=True, revision=None, code_revision=None
    ).vision_config
else:
    self.config = Qwen2VLVisionConfig.from_pretrained(
        model_path, trust_remote_code=True, revision=None, code_revision=None
    )
self.model_path = model_path
self.input_embed_dim = (
    self.config.in_channels
    * self.config.temporal_patch_size
    * self.config.patch_size
    * self.config.patch_size
)
```

```python
        b = v.size(0)
        q = q.view(b, -1, n, d)
        k = k.view(b, -1, n, d)
        v = v.view(b, -1, n, d)
```
These lines are redundant. The variable b is already available as a parameter to the function, and its value is the same as v.size(0). The calls to .view() on q, k, and v are also redundant as they already have the correct shape from the operations on lines 254-256. This code can be safely removed to improve clarity.
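To illustrate the point with a minimal sketch (numpy's `reshape` stands in for torch's `.view`, and the shapes are illustrative): reshaping a tensor to the shape it already has is a no-op.

```python
import numpy as np

# Illustrative shapes; numpy's reshape stands in for torch's .view here.
b, s, n, d = 2, 16, 4, 64
q = np.zeros((b, s, n, d), dtype=np.float32)

# Reshaping to the shape the tensor already has changes nothing, which is
# why redundant .view(b, -1, n, d) calls like the ones above can be removed.
q2 = q.reshape(b, -1, n, d)
print(q2.shape == q.shape)  # True
```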
```python
                # Note: only used for debug
                if name not in params_dict.keys():
                    continue
                default_weight_loader(params_dict[name], loaded_weight)
```

(No newline at end of file.)
```python
        except Exception:
            # NOTE: This exception is added for the purpose of setting breakpoint to
            # debug weight loading issues.
            raise
```

(No newline at end of file.)
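For context, here is a minimal, framework-agnostic sketch of what a default weight loader along these lines typically does; the actual DashInfer signature and implementation may differ, and numpy arrays stand in for model parameters here.

```python
import numpy as np

# Hypothetical sketch of a default weight loader: copy a checkpoint tensor
# into the live parameter in place, with a shape check to surface loading
# bugs early. Numpy arrays stand in for framework tensors.
def default_weight_loader(param: np.ndarray, loaded_weight: np.ndarray) -> None:
    assert param.shape == loaded_weight.shape, (param.shape, loaded_weight.shape)
    param[...] = loaded_weight  # in-place copy keeps the parameter object alive

param = np.zeros((4, 4), dtype=np.float32)
checkpoint_weight = np.ones((4, 4), dtype=np.float32)
default_weight_loader(param, checkpoint_weight)
print(param.sum())  # 16.0
```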
```diff
  '''
  from ..visual_embedding.DFN_vit import Qwen2VisionTransformer
+ from ..visual_embedding.DFN_vit_2_5 import Qwen2_5VisionTransformer
```
```diff
  from .cache import *

- from .. import Qwen2VisionTransformer
+ from .. import Qwen2VisionTransformer, Qwen2_5VisionTransformer
```
```python
        if hasattr(self.hf_model_config, "architectures") and "Qwen2_5_VLForConditionalGeneration" in self.hf_model_config.architectures:
            is_qwen_2_5= True
        else:
            is_qwen_2_5 = False
```
This if/else block can be simplified into a single boolean assignment, which improves conciseness and readability. Also, there is a spacing typo in is_qwen_2_5= True (missing space before the =).
```python
is_qwen_2_5 = hasattr(self.hf_model_config, "architectures") and "Qwen2_5_VLForConditionalGeneration" in self.hf_model_config.architectures
```

* dashinfer vlm: add tensorrt support for qwen2.5vl

* update benchmark

---------

Co-authored-by: Xiaotong Chen <[email protected]>
Support running the Qwen2.5VL ViT with TensorRT.