build: refine the build process, make normal workflow code work. #90
- add Qwen2.5-VL model support.
- Qwen2.5-VL only supports 'transformers' as the ViT engine (TensorRT is not supported yet).
- upgrade the package version to make sure the Qwen2.5-VL code is included.
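The engine restriction above could be enforced with a small startup check. The sketch below is hypothetical and is not DashInfer's actual loader. It assumes the model directory contains a Hugging Face-style `config.json` whose `architectures` field names the model class (`Qwen2_5_VLForConditionalGeneration` for Qwen2.5-VL, `Qwen2VLForConditionalGeneration` for Qwen2-VL). The helper names and the engine strings `"transformers"` / `"tensorrt"` are illustrative assumptions.

```python
import json
from pathlib import Path
from typing import Union

# Assumed Hugging Face architecture names; not taken from DashInfer's source.
QWEN2_5_VL_ARCH = "Qwen2_5_VLForConditionalGeneration"
QWEN2_VL_ARCH = "Qwen2VLForConditionalGeneration"


def detect_architecture(config: dict) -> str:
    """Return the first entry of config['architectures'] (HF convention)."""
    archs = config.get("architectures") or []
    if not archs:
        raise ValueError("config has no 'architectures' entry")
    return archs[0]


def resolve_vision_engine(config: dict, requested: str) -> str:
    """Hypothetical check: Qwen2.5-VL only runs its ViT with 'transformers'."""
    arch = detect_architecture(config)
    if arch == QWEN2_5_VL_ARCH and requested != "transformers":
        raise ValueError(
            f"{arch} only supports the 'transformers' vision engine, got {requested!r}"
        )
    return requested


def load_config(model_dir: Union[str, Path]) -> dict:
    """Read config.json from a local model directory."""
    with open(Path(model_dir) / "config.json", encoding="utf-8") as f:
        return json.load(f)
```

A typical call would be `resolve_vision_engine(load_config(model_dir), "transformers")`. Failing early at startup gives a clear error instead of a failure deep inside the vision encoder.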
test commands:
server:
`dashinfer_vlm_serve --model qwen/Qwen2.5-VL-3B-Instruct --vision_engine transformers --port 8000 --host=127.0.0.1`
client:
```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen/Qwen2.5-VL-3B-Instruct",
       "messages": [{"role": "user", "content": [{"type": "text", "text": "Describe the image."}, {"type": "image_url", "image_url": {"url": "https://farm4.staticflickr.com/3075/3168662394_7d7103de7d_z_d.jpg"}}]}],
       "max_completion_tokens": 1024, "top_p": 0.5, "temperature": 0.1, "frequency_penalty": 1.05}'
```
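For scripted testing, the same request can be built and sent from Python using only the standard library. This is a sketch and not part of the PR. It assumes the server started above is listening on `127.0.0.1:8000` and exposes the OpenAI-compatible `/v1/chat/completions` route used by the `curl` call. The helper names `build_vl_request` and `post_chat_completion` are hypothetical.

```python
import json
import urllib.request


def build_vl_request(model: str, prompt: str, image_url: str,
                     max_completion_tokens: int = 1024,
                     top_p: float = 0.5, temperature: float = 0.1,
                     frequency_penalty: float = 1.05) -> dict:
    """Build the same chat-completions body as the curl example."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
        "max_completion_tokens": max_completion_tokens,
        "top_p": top_p,
        "temperature": temperature,
        "frequency_penalty": frequency_penalty,
    }


def post_chat_completion(body: dict,
                         url: str = "http://127.0.0.1:8000/v1/chat/completions",
                         timeout: float = 120.0) -> dict:
    """POST the body as JSON and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `post_chat_completion(build_vl_request("qwen/Qwen2.5-VL-3B-Instruct", "Describe the image.", image_url))["choices"][0]["message"]["content"]` returns the model's description. Here `image_url` stands for any reachable image URL, such as the one in the `curl` command.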
result:
```json
{
  "id": "chatcmpl-rxqDiCQEJweEeeB7FADiER", "object": "chat.completion", "created": 1747992522, "model": "model",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "The image features a small hummingbird perched on a branch. The bird is positioned in the center of the scene, with its vibrant colors and delicate features clearly visible. The hummingbird appears to be enjoying its time in nature, possibly searching for food or simply resting on the branch. \n\nThere are no other birds or animals present in the image, making it a solitary moment captured in this natural setting."},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 382, "total_tokens": 95, "completion_tokens": 81}
}
```
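A client reading this response usually needs the assistant text and the token counts. The minimal sketch below uses hypothetical helper names. It pulls both out of an OpenAI-style `chat.completion` object and checks the OpenAI convention that `total_tokens` equals `prompt_tokens + completion_tokens`. For the `usage` block above, that check does not hold (382 + 81 = 463, not 95), so the server's usage accounting may be worth checking.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class ChatResult:
    content: str
    finish_reason: str
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int

    def usage_consistent(self) -> bool:
        """OpenAI convention: total_tokens == prompt_tokens + completion_tokens."""
        return self.total_tokens == self.prompt_tokens + self.completion_tokens


def parse_chat_completion(resp: Dict[str, Any]) -> ChatResult:
    """Extract the first choice and the usage block from a chat.completion object."""
    if resp.get("object") != "chat.completion":
        raise ValueError(f"unexpected object type: {resp.get('object')!r}")
    choice = resp["choices"][0]
    usage = resp["usage"]
    return ChatResult(
        content=choice["message"]["content"],
        finish_reason=choice["finish_reason"],
        prompt_tokens=usage["prompt_tokens"],
        completion_tokens=usage["completion_tokens"],
        total_tokens=usage["total_tokens"],
    )
```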
This reverts commit 6705c60efb8e2e32efcc92dda5a8e58712e0dfdc.
Summary of Changes
Hello @kzjeef, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request focuses on enhancing the project's build infrastructure and expanding its capabilities, particularly for GPU-accelerated operations and multimodal AI models. It streamlines the build process, integrates highly optimized custom CUDA kernels for core computational tasks, and extends support to newer vision-language models, all while improving development and benchmarking tools.
Highlights
- Build System Refinement: The build process has been significantly refined, including updates to CMake configurations, Conan dependencies, and shell scripts. This aims to ensure a smoother and more reliable build workflow, especially for CUDA-enabled components and Python wheels. Key changes include moving from a CUTLASS git submodule to a pre-packaged tarball for easier dependency management and updating pybind11 across various Conan profiles.
- CUDA Kernel Optimizations: A substantial amount of new custom CUDA kernel code has been introduced, focusing on optimizing attention mechanisms and general matrix multiplication (GEMM) operations. This includes custom epilogues for pipelined operations, output rescaling, and log-sum-exp application, as well as specialized MMA (Matrix Multiply-Accumulate) implementations designed for specific hardware architectures (e.g., SM80) to improve performance and shared memory utilization.
- Multimodal Model Support Expansion: Support for Qwen2.5-VL models has been added to the multimodal inference framework. The model loading logic has been updated to dynamically detect and handle different Qwen-VL architectures, ensuring compatibility and proper initialization of visual and language components. This also includes changes to attention implementation settings for visual models.
- Development Environment & Benchmarking Improvements: Dockerfiles for CUDA development environments have been updated with more robust package installation methods and build tool configurations. New benchmarking scripts and data download utilities have been added for the OpenAI API, allowing for more comprehensive performance evaluation of the multimodal models.
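The attention epilogues mentioned in the highlights (output rescaling, then applying the log-sum-exp) follow the online-softmax pattern used by FlashAttention-style and CUTLASS-based memory-efficient attention kernels. The PR's CUDA code is not reproduced here. Instead, the following scalar Python sketch illustrates that general technique for a single query row, using hypothetical function names. It shows why earlier partial results must be rescaled when a new key block raises the running maximum, and where the LSE comes from.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _scores(q: Vector, keys: Sequence[Vector]) -> List[float]:
    """Scaled dot-product scores q.k / sqrt(d) for each key."""
    scale = 1.0 / math.sqrt(len(q))
    return [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]


def blockwise_attention_row(q: Vector, keys: Sequence[Vector],
                            values: Sequence[Vector],
                            block_size: int) -> Tuple[List[float], float]:
    """One query row of softmax attention, consuming K/V one block at a time.

    Keeps a running max `m`, a running denominator `l` and an unnormalized
    output accumulator `acc`. When a block raises the max, the earlier `l`
    and `acc` are rescaled by exp(m_old - m_new). Returns the normalized
    output and the row's log-sum-exp, m + log(l).
    """
    m = -math.inf
    l = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), block_size):
        s_blk = _scores(q, keys[start:start + block_size])
        v_blk = values[start:start + block_size]
        m_new = max(m, max(s_blk))
        alpha = math.exp(m - m_new)        # 0.0 on the first block (m = -inf)
        l *= alpha
        acc = [a * alpha for a in acc]     # rescale earlier partial output
        for s, v in zip(s_blk, v_blk):
            p = math.exp(s - m_new)
            l += p
            acc = [a + p * vj for a, vj in zip(acc, v)]
        m = m_new
    out = [a / l for a in acc]             # final normalization ("epilogue")
    return out, m + math.log(l)


def reference_attention_row(q: Vector, keys: Sequence[Vector],
                            values: Sequence[Vector]) -> Tuple[List[float], float]:
    """Materialize the full softmax row, for comparison."""
    s = _scores(q, keys)
    mx = max(s)
    w = [math.exp(x - mx) for x in s]
    z = sum(w)
    out = [sum(wi * v[j] for wi, v in zip(w, values)) / z
           for j in range(len(values[0]))]
    return out, mx + math.log(z)
```

For any block size, the blockwise result matches the materialized softmax. A GPU kernel does the same bookkeeping per tile in registers and shared memory, and typically writes the LSE out alongside the output so that it can be reused, for example in the backward pass.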
Code Review
This pull request refines the build process and seems to restore the normal release workflow. Key changes include vendoring the cutlass dependency instead of using a submodule, adding a significant amount of custom CUDA code for high-performance attention mechanisms (based on cutlass), and updating various build scripts, Dockerfiles, and dependencies to support new models and improve build robustness.
My review focuses on improving the maintainability of the build scripts by suggesting translation of comments to English and making build configurations more consistent and flexible. The changes are generally positive and move the project in a good direction.
Lots of build changes so that the normal release workflow works again.