-
-
Notifications
You must be signed in to change notification settings - Fork 11.5k
[CI/Build]: make it possible to build with a free-threaded interpreter #29241
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
This pull request has merge conflicts that must be resolved before it can be |
53ea5d3 to
33907a1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request aims to enable building vLLM with a free-threaded Python interpreter by disabling the Limited C API and Stable ABI, which are not supported in free-threaded builds. The changes correctly modify setup.py to conditionally disable py_limited_api. However, there is a critical issue in cmake/utils.cmake where the check for a free-threaded interpreter will not work as intended due to how CMake handles boolean strings, which could break existing builds. I've provided a suggestion to fix this.
33907a1 to
67553b4
Compare
For Python 3.13/3.14, free-threaded Python does not support using the
Limited C API and the Stable ABI.
Without this change, the build ends with:
```
In file included from /path/to/vllm/csrc/core/registration.h:3,
from /path/to/vllm/csrc/cpu/torch_bindings.cpp:3:
/..../include/python3.14t/Python.h:51:4: error: #error "The limited API is not currently supported in the free-threaded build"
```
Signed-off-by: Ralf Gommers <[email protected]>
67553b4 to
c082838
Compare
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces changes to enable building vLLM with a free-threaded Python interpreter, a feature becoming available in Python 3.13/3.14. The author correctly identifies that the Limited C API and Stable ABI are not supported in free-threaded builds. The changes disable these features by modifying both setup.py for setuptools and cmake/utils.cmake for the CMake build process. The detection of a free-threaded interpreter is implemented correctly in both places using sysconfig.get_config_var("Py_GIL_DISABLED"). The changes are clear, well-motivated, and appear correct. I have not found any issues of high or critical severity.
ApostaC
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise LGTM
| class CMakeExtension(Extension): | ||
| def __init__(self, name: str, cmake_lists_dir: str = ".", **kwa) -> None: | ||
| super().__init__(name, sources=[], py_limited_api=True, **kwa) | ||
| super().__init__(name, sources=[], py_limited_api=not is_freethreaded(), **kwa) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dumb question: is py_limited_api=False required when using the free-threaded python ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review @ApostaC. Yes, it should be set to False, otherwise setuptools raises an exception. The limited API is not supported yet with free-threading. There is very active work on adding that support for 3.15 (either PEP 803 or PEP 809 will add it, and both require PEP 793), but that's a new ABI which will be compatible with both free-threaded and with-GIL interpreters. Using that in the future will require both a new setuptools version and some source-level changes in vLLM to use PyModExport (PEP 793). So until that's all done, the limited API has to be avoided here.
Purpose
For Python 3.13/3.14, free-threaded Python does not support using the Limited C API and the Stable ABI. The purpose of this PR is to ensure that vLLM can be built from source under a free-threaded interpreter. It doesn't change anything for default (with-GIL) Python interpreters.
See gh-28762 for more context on getting vLLM to work with free-threaded Python.
Note that this same change is needed in vllm_flash_attn; PR at vllm-project/flash-attention#112. The order in which these two PRs are merged doesn't matter.
Test Plan
This was tested locally. It's too early to propose adding CI, given the dependencies that don't provide free-threaded wheels yet or need to be built from source (xref gh-28762).
Test Result
Without this change, the build ends with:
With this change, the build succeeds and a
cp313torcp314twheel is built as expected (tested locally for CPU and CUDA on Linux x86-64).Here is a build log for CPU as an example:
Build log - editable build for CPU in a cp314t environment
Note: Python 3.14 isn't yet supported by vLLM (but more interesting for free-threading), so it requires this tiny patch locally:
Patch for allowing a 3.14 interpreter
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.