build: refine the build process, make normal workflow code works. #90

kzjeef · 2025-07-28T11:50:52Z

Lots of build changes so that the normal release workflow works again.

- Add Qwen2.5-VL model support.
  - Qwen2.5-VL only supports `transformers` as the ViT engine (TRT is not supported yet).
  - Upgrade the package version to make sure the Qwen2.5-VL code is included.

Test command:

Server: `dashinfer_vlm_serve --model qwen/Qwen2.5-VL-3B-Instruct --vision_engine transformers --port 8000 --host=127.0.0.1`

Client:

```
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/Qwen2.5-VL-3B-Instruct",
    "messages": [{"role": "user", "content": [
      {"type": "text", "text": "Describe the image."},
      {"type": "image_url", "image_url": {"url": "https://farm4.staticflickr.com/3075/3168662394_7d7103de7d_z_d.jpg"}}
    ]}],
    "max_completion_tokens": 1024,
    "top_p": 0.5,
    "temperature": 0.1,
    "frequency_penalty": 1.05
  }'
```

Result:

```
{"id":"chatcmpl-rxqDiCQEJweEeeB7FADiER","object":"chat.completion",
"created":1747992522,"model":"model","choices":[{"index":0,"message":{"role":"assistant","content":"The image features a small hummingbird perched on a branch. The bird is positioned in the center of the scene, with its vibrant colors and delicate features clearly visible. The hummingbird appears to be enjoying its time in nature, possibly searching for food or simply resting on the branch. \n\nThere are no other birds or animals present in the image, making it a solitary moment captured in this natural setting."},"finish_reason":"stop"}],"usage":{"prompt_tokens":382,"total_tokens":95,"completion_tokens":81}}
```
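For scripted testing, the same request as the curl command above can be sent from Python. This is only a minimal sketch using the standard library: the helper names `build_chat_request` and `send_chat_request` are made up for illustration and are not part of this PR, and the sketch assumes the server started by the command above is listening on `http://localhost:8000`.

```python
import json
import urllib.request


def build_chat_request(model, prompt, image_url, max_completion_tokens=1024):
    """Build the same chat-completions payload as the curl test command."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_completion_tokens": max_completion_tokens,
        # Sampling parameters copied from the curl command.
        "top_p": 0.5,
        "temperature": 0.1,
        "frequency_penalty": 1.05,
    }


def send_chat_request(payload, base_url="http://localhost:8000"):
    """POST the payload to /v1/chat/completions and return the parsed JSON reply.

    base_url is an assumption matching the --host/--port of the server command.
    """
    request = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send_chat_request(build_chat_request("qwen/Qwen2.5-VL-3B-Instruct", "Describe the image.", "<image url>"))` should return a dictionary shaped like the result above, with the generated text under `choices[0]["message"]["content"]`.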

This reverts commit 6705c60efb8e2e32efcc92dda5a8e58712e0dfdc.

gemini-code-assist

Summary of Changes

Hello @kzjeef, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on enhancing the project's build infrastructure and expanding its capabilities, particularly for GPU-accelerated operations and multimodal AI models. It streamlines the build process, integrates highly optimized custom CUDA kernels for core computational tasks, and extends support to newer vision-language models, all while improving development and benchmarking tools.

Highlights

- Build System Refinement: The build process has been significantly refined, including updates to CMake configurations, Conan dependencies, and shell scripts. This aims to ensure a smoother and more reliable build workflow, especially for CUDA-enabled components and Python wheels. Key changes include moving from a CUTLASS git submodule to a pre-packaged tarball for easier dependency management and updating pybind11 across various Conan profiles.
- CUDA Kernel Optimizations: A substantial amount of new custom CUDA kernel code has been introduced, focusing on optimizing attention mechanisms and general matrix multiplication (GEMM) operations. This includes custom epilogues for pipelined operations, output rescaling, and log-sum-exp application, as well as specialized MMA (Matrix Multiply-Accumulate) implementations designed for specific hardware architectures (e.g., SM80) to improve performance and shared memory utilization.
- Multimodal Model Support Expansion: Support for Qwen2.5-VL models has been added to the multimodal inference framework. The model loading logic has been updated to dynamically detect and handle different Qwen-VL architectures, ensuring compatibility and proper initialization of visual and language components. This also includes changes to attention implementation settings for visual models.
- Development Environment & Benchmarking Improvements: Dockerfiles for CUDA development environments have been updated with more robust package installation methods and build tool configurations. New benchmarking scripts and data download utilities have been added for the OpenAI API, allowing for more comprehensive performance evaluation of the multimodal models.
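To make the "dynamically detect and handle different Qwen-VL architectures" highlight more concrete, here is a rough sketch of how such detection could work. It is not the code from this PR. It assumes the checkpoint ships a Hugging Face style `config.json` whose `model_type` or `architectures` entries contain `qwen2_vl` / `Qwen2VL...` for Qwen2-VL and `qwen2_5_vl` / `Qwen2_5_VL...` for Qwen2.5-VL. The function names are hypothetical, and the engine check only mirrors the PR description's statement that Qwen2.5-VL supports `transformers` as the ViT engine.

```python
def detect_qwen_vl_family(config):
    """Return "qwen2.5-vl", "qwen2-vl", or None for a config.json-style dict.

    The marker strings below are assumptions about Hugging Face configs,
    not values taken from this PR.
    """
    names = [str(config.get("model_type", "")).lower()]
    names += [str(arch).lower() for arch in config.get("architectures", [])]
    if any("qwen2_5_vl" in name for name in names):
        return "qwen2.5-vl"
    if any("qwen2_vl" in name or "qwen2vl" in name for name in names):
        return "qwen2-vl"
    return None


def check_vision_engine(family, vision_engine):
    """Reject engine choices that the PR description says are unsupported."""
    if family == "qwen2.5-vl" and vision_engine != "transformers":
        raise ValueError(
            "Qwen2.5-VL only supports the 'transformers' vision engine, got %r"
            % vision_engine
        )
    return vision_engine
```

A loader could call `detect_qwen_vl_family` on the parsed config first and then pass the result to `check_vision_engine` before constructing the visual component, so that an unsupported combination fails early with a clear message.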

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request refines the build process and seems to restore the normal release workflow. Key changes include vendoring the cutlass dependency instead of using a submodule, adding a significant amount of custom CUDA code for high-performance attention mechanisms (based on cutlass), and updating various build scripts, Dockerfiles, and dependencies to support new models and improve build robustness.

My review focuses on improving the maintainability of the build scripts by suggesting translation of comments to English and making build configurations more consistent and flexible. The changes are generally positive and move the project in a good direction.

kzjeef and others added 26 commits May 25, 2025 02:01

Update build-check-share-runner.yml

a0b1e39

ci: adjust code clone for build test.

a2364a4

thridparty: reduce mkl size, remove unused so.

562d496

xformers: replace link with source.

6adb588

cutlass: try use local tar rather than git submodule.

1848c40

ci: remove shared runner

f2dd0aa

Update README.md

169754a

Update build-check.yml

0990098

Update build-check.yml

f845eb8

ci: fix release script. (#88)

cdcf2be

cmake: nccl find lib without version.

a776706

flash attn: less memory footprint for compile.

3625b4a

workflow: fix typo.

6513788

ci: fix git safe dir issue.

db372c3

multimodel: minor fix for benchmark

b70fea6

build: support 90a by default.

5fdca5c

[Build]: upgrade pybind11 for python 3.12, also remove cutlass folder.

5339d55

docker: fine centos docker and ubi8 docker.

b3685e2

add build script for ubi8 support.

2fd9952

github: action use ubi8 image rather than centos image.

dec6847

github: try use github provided machine.

6705c60

Revert "github: try use github provided machine."

ba55f05

This reverts commit 6705c60efb8e2e32efcc92dda5a8e58712e0dfdc.

github: remove prefix of docker image.

89c8bd9

github: change workflow image to ubi8 images.

1749dd8

Merge branch 'main' into main

166be5f

gemini-code-assist bot reviewed Jul 28, 2025

kzjeef added this pull request to the merge queue Jul 28, 2025

Merged via the queue into modelscope:main with commit 125cf63 Jul 28, 2025
4 checks passed
