dashinfer vlm: add tensorrt support for qwen2.5vl #91
Conversation
Summary of Changes
Hello @x574chen, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
I've implemented support for running the Qwen2.5VL Vision Transformer (ViT) engine with TensorRT, which is crucial for accelerating inference. This involved adding a new dedicated Python module for the Qwen2.5VL Vision Transformer, updating the model loading and ONNX export logic to correctly handle its specific configurations, and making minor adjustments to related utility and test files. The goal is to provide a more performant inference solution for Qwen2.5VL models within the DashInfer VLM toolkit.
Highlights
- TensorRT Support for Qwen2.5VL: I've added comprehensive TensorRT support for the Qwen2.5VL Vision Transformer (ViT) engine, enabling optimized inference performance for these models.
- Updated Model Support Documentation: The README.md has been updated to reflect the expanded support for various Qwen2.5-VL model sizes, including 3B, 7B, 32B, and 72B.
- Code Refactoring and Modularity: I've refactored the default_weight_loader function into a new utils.py file, improving code organization and reusability across different vision transformer implementations.
- Test Script Enhancements: Minor adjustments were made to the OpenAI API benchmark and chat completion test scripts, primarily related to API endpoint configurations and local image handling for testing.
Code Review
This pull request adds support for Qwen2.5VL with TensorRT for the Vision Transformer. The changes include adding a new model definition file for Qwen2.5VL, refactoring some utility functions, and updating the model loading and TensorRT conversion logic to handle the new model. The changes are well-structured, but I've found a few issues. There is a critical bug in the calculation of image tokens for the TRT engine, which will likely lead to runtime errors. I've also identified a high-severity issue related to code duplication and a typo in the ONNX-to-plan conversion script. Additionally, there are several medium-severity suggestions to improve code clarity, remove redundancy, and maintain consistent coding style, such as adding missing newlines at the end of files.
```python
                * embed_dim
                / (embed_dim * (spatial_merge_size**2))
            )
            image_tokens = int(visual_inputs["input"].shape[1] * (spatial_merge_size**2))
```
The calculation for image_tokens appears to be incorrect. The number of tokens should be reduced by the merger operation, which effectively divides the sequence length by spatial_merge_size**2. The current implementation uses multiplication, which would incorrectly increase the number of tokens and likely cause shape mismatch errors downstream. This seems to be a bug introduced when refactoring the code.
Suggested change:

```diff
-image_tokens = int(visual_inputs["input"].shape[1] * (spatial_merge_size**2))
+image_tokens = int(visual_inputs["input"].shape[1] / (spatial_merge_size**2))
```
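As a sanity check on the arithmetic, here is a small illustrative sketch (`grid_h`, `grid_w`, and `merged_token_count` are hypothetical names, not taken from the DashInfer code): the merger fuses each `spatial_merge_size` x `spatial_merge_size` block of neighbouring patches into one token, so the sequence length must shrink by that factor rather than grow.

```python
# Illustrative sketch of the spatial-merge token arithmetic; names here are
# hypothetical, not the actual DashInfer variables.
def merged_token_count(num_patches: int, merge_size: int) -> int:
    # Each merge_size x merge_size block of patches becomes one token,
    # so the sequence length shrinks by merge_size**2.
    return num_patches // (merge_size**2)

grid_h, grid_w = 8, 8
num_patches = grid_h * grid_w              # ViT sequence length before merging: 64
print(merged_token_count(num_patches, 2))  # 16 tokens after a 2x2 merge, not 64 * 4
```

Multiplying instead of dividing would report 256 tokens for this grid, which is why downstream shape checks would fail.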
```python
        self.is_qwen_2_5 = is_qwen_2_5
        if is_qwen_2_5:
            self.config = Qwen2_5_VLConfig.from_pretrained(
                model_path, trust_remote_code=True, revision=None, code_revision=None
            ).vision_config
            self.model_path = model_path
            self.input_embed_dim = (
                self.config.in_channels
                * self.config.temporal_patch_size
                * self.config.patch_size
                * self.config.patch_size
            )
        else:
            self.config = Qwen2VLVisionConfig.from_pretrained(
                model_path, trust_remote_code=True, revision=None, code_revision=None
            )
            self.model_path = model_path
            self.input_embed_dim = (
                self.config.in_channels
                * self.config.temporal_patch_size
                * self.config.patch_size
                * self.config.patch_size
            )
```
There is significant code duplication in this __init__ method. The assignments to self.model_path and the calculation of self.input_embed_dim are identical in both branches of the if/else statement. This can be refactored to remove duplication, which improves maintainability. Additionally, there is a typo with extra spaces in self.model_path = model_path in the else block.
```python
self.is_qwen_2_5 = is_qwen_2_5
if is_qwen_2_5:
    self.config = Qwen2_5_VLConfig.from_pretrained(
        model_path, trust_remote_code=True, revision=None, code_revision=None
    ).vision_config
else:
    self.config = Qwen2VLVisionConfig.from_pretrained(
        model_path, trust_remote_code=True, revision=None, code_revision=None
    )
self.model_path = model_path
self.input_embed_dim = (
    self.config.in_channels
    * self.config.temporal_patch_size
    * self.config.patch_size
    * self.config.patch_size
)
```

```python
        b = v.size(0)
        q = q.view(b, -1, n, d)
        k = k.view(b, -1, n, d)
        v = v.view(b, -1, n, d)
```
These lines are redundant. The variable b is already available as a parameter to the function, and its value is the same as v.size(0). The calls to .view() on q, k, and v are also redundant as they already have the correct shape from the operations on lines 254-256. This code can be safely removed to improve clarity.
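To illustrate the point with a minimal sketch (numpy's `reshape` stands in for torch's `.view`, and the shapes are illustrative): reshaping a tensor to the shape it already has is a no-op.

```python
import numpy as np

# Illustrative shapes; numpy's reshape stands in for torch's .view here.
b, s, n, d = 2, 16, 4, 64
q = np.zeros((b, s, n, d), dtype=np.float32)

# Reshaping to the shape the tensor already has changes nothing, which is
# why redundant .view(b, -1, n, d) calls like the ones above can be removed.
q2 = q.reshape(b, -1, n, d)
print(q2.shape == q.shape)  # True
```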
```python
                # Note: only used for debug
                if name not in params_dict.keys():
                    continue
                default_weight_loader(params_dict[name], loaded_weight)
```

(No newline at end of file.)
```python
        except Exception:
            # NOTE: This exception is added for the purpose of setting breakpoint to
            # debug weight loading issues.
            raise
```

(No newline at end of file.)
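For context, here is a minimal, framework-agnostic sketch of what a default weight loader along these lines typically does; the actual DashInfer signature and implementation may differ, and numpy arrays stand in for model parameters here.

```python
import numpy as np

# Hypothetical sketch of a default weight loader: copy a checkpoint tensor
# into the live parameter in place, with a shape check to surface loading
# bugs early. Numpy arrays stand in for framework tensors.
def default_weight_loader(param: np.ndarray, loaded_weight: np.ndarray) -> None:
    assert param.shape == loaded_weight.shape, (param.shape, loaded_weight.shape)
    param[...] = loaded_weight  # in-place copy keeps the parameter object alive

param = np.zeros((4, 4), dtype=np.float32)
checkpoint_weight = np.ones((4, 4), dtype=np.float32)
default_weight_loader(param, checkpoint_weight)
print(param.sum())  # 16.0
```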
```diff
  '''
  from ..visual_embedding.DFN_vit import Qwen2VisionTransformer
+ from ..visual_embedding.DFN_vit_2_5 import Qwen2_5VisionTransformer
```
```diff
  from .cache import *

- from .. import Qwen2VisionTransformer
+ from .. import Qwen2VisionTransformer, Qwen2_5VisionTransformer
```
```python
        if hasattr(self.hf_model_config, "architectures") and "Qwen2_5_VLForConditionalGeneration" in self.hf_model_config.architectures:
            is_qwen_2_5= True
        else:
            is_qwen_2_5 = False
```
This if/else block can be simplified into a single boolean assignment, which improves conciseness and readability. Also, there is a spacing typo in is_qwen_2_5= True (missing space before the =).
```python
is_qwen_2_5 = hasattr(self.hf_model_config, "architectures") and "Qwen2_5_VLForConditionalGeneration" in self.hf_model_config.architectures
```

* dashinfer vlm: add tensorrt support for qwen2.5vl

* update benchmark

---------

Co-authored-by: Xiaotong Chen <[email protected]>
Support running the Qwen2.5VL ViT with TensorRT.