@rgommers rgommers commented Nov 22, 2025

Purpose

For Python 3.13/3.14, free-threaded CPython does not support the Limited C API or the Stable ABI. The purpose of this PR is to ensure that vLLM can be built from source under a free-threaded interpreter. It changes nothing for default (with-GIL) Python interpreters.

See gh-28762 for more context on getting vLLM to work with free-threaded Python.

Note that this same change is needed in vllm_flash_attn; PR at vllm-project/flash-attention#112. The order in which these two PRs are merged doesn't matter.
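As context for reviewers, the free-threaded detection amounts to a one-liner over `sysconfig` (the review comments below confirm `sysconfig.get_config_var("Py_GIL_DISABLED")` is the signal used in both setup.py and cmake/utils.cmake). A minimal sketch — the helper name `is_freethreaded` matches the setup.py snippet quoted in the review, the rest is illustrative:

```python
import sysconfig


def is_freethreaded() -> bool:
    """Return True when running on a free-threaded (no-GIL) CPython build.

    Py_GIL_DISABLED is 1 on free-threaded builds, and absent (None) or 0
    on default with-GIL builds, so bool() covers all three cases.
    """
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))


if __name__ == "__main__":
    print(f"free-threaded: {is_freethreaded()}")
```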

Test Plan

This was tested locally. It's too early to propose adding CI, given the dependencies that don't provide free-threaded wheels yet or need to be built from source (xref gh-28762).

Test Result

Without this change, the build ends with:

```
  In file included from /path/to/vllm/csrc/core/registration.h:3,
                   from /path/to/vllm/csrc/cpu/torch_bindings.cpp:3:
  /..../include/python3.14t/Python.h:51:4: error: #error "The limited API is not currently supported in the free-threaded build"
```

With this change, the build succeeds and a cp313t or cp314t wheel is built as expected (tested locally for CPU and CUDA on Linux x86-64).
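The `t` in the cp313t/cp314t tags comes directly from the interpreter's ABI suffix, which is also what produces the `_C.cpython-314t-x86_64-linux-gnu.so` filename in the log below. A quick way to inspect what any given interpreter will use (the values differ per platform and build; the comment cites the Linux cp314t values from this log only as an illustration):

```python
import sysconfig

# EXT_SUFFIX is what the build appends to extension modules; on the
# free-threaded 3.14 Linux build shown in the log it is
# ".cpython-314t-x86_64-linux-gnu.so". A limited-API module would be
# tagged ".abi3.so" instead -- exactly the option that free-threaded
# builds do not offer yet.
print(sysconfig.get_config_var("EXT_SUFFIX"))
print(sysconfig.get_config_var("SOABI"))
```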

Here is a build log for CPU as an example:

Build log - editable build for CPU in a cp314t environment
$ VLLM_TARGET_DEVICE=cpu python -m pip install -e . -v --no-build-isolation --no-deps
Using pip 25.3 from /home/rgommers/code/tmp/pixidev-vllm/.pixi/envs/cpu/lib/python3.14t/site-packages/pip (python 3.14)
Obtaining file:///home/rgommers/code/tmp/pixidev-vllm/vllm/vllm
  Running command Checking if build backend supports build_editable
  Checking if build backend supports build_editable ... done
  Running command Preparing editable metadata (pyproject.toml)
  /home/rgommers/code/tmp/pixidev-vllm/.pixi/envs/cpu/lib/python3.14t/site-packages/setuptools_scm/_integration/version_inference.py:51: UserWarning: version of None already set
    warnings.warn(self.message)
  running dist_info
  creating /tmp/pip-modern-metadata-v__mr59k/vllm.egg-info
  writing /tmp/pip-modern-metadata-v__mr59k/vllm.egg-info/PKG-INFO
  writing dependency_links to /tmp/pip-modern-metadata-v__mr59k/vllm.egg-info/dependency_links.txt
  writing entry points to /tmp/pip-modern-metadata-v__mr59k/vllm.egg-info/entry_points.txt
  writing requirements to /tmp/pip-modern-metadata-v__mr59k/vllm.egg-info/requires.txt
  writing top-level names to /tmp/pip-modern-metadata-v__mr59k/vllm.egg-info/top_level.txt
  writing manifest file '/tmp/pip-modern-metadata-v__mr59k/vllm.egg-info/SOURCES.txt'
  reading manifest template 'MANIFEST.in'
  adding license file 'LICENSE'
  writing manifest file '/tmp/pip-modern-metadata-v__mr59k/vllm.egg-info/SOURCES.txt'
  creating '/tmp/pip-modern-metadata-v__mr59k/vllm-0.11.2.dev102+g07a5d100d.cpu.dist-info'
  Preparing editable metadata (pyproject.toml) ... done
Building wheels for collected packages: vllm
  Running command Building editable for vllm (pyproject.toml)
  /home/rgommers/code/tmp/pixidev-vllm/.pixi/envs/cpu/lib/python3.14t/site-packages/setuptools_scm/_integration/version_inference.py:51: UserWarning: version of None already set
    warnings.warn(self.message)
  running editable_wheel
  creating /tmp/pip-ephem-wheel-cache-_ix2supc/wheels/f4/cb/2d/262a4ed4a28c45ec26614870316e9a47ce273e9b69028c3a42/tmpo_9xz7zh/.tmp-rzc6v44i/vllm.egg-info
  writing /tmp/pip-ephem-wheel-cache-_ix2supc/wheels/f4/cb/2d/262a4ed4a28c45ec26614870316e9a47ce273e9b69028c3a42/tmpo_9xz7zh/.tmp-rzc6v44i/vllm.egg-info/PKG-INFO
  writing dependency_links to /tmp/pip-ephem-wheel-cache-_ix2supc/wheels/f4/cb/2d/262a4ed4a28c45ec26614870316e9a47ce273e9b69028c3a42/tmpo_9xz7zh/.tmp-rzc6v44i/vllm.egg-info/dependency_links.txt
  writing entry points to /tmp/pip-ephem-wheel-cache-_ix2supc/wheels/f4/cb/2d/262a4ed4a28c45ec26614870316e9a47ce273e9b69028c3a42/tmpo_9xz7zh/.tmp-rzc6v44i/vllm.egg-info/entry_points.txt
  writing requirements to /tmp/pip-ephem-wheel-cache-_ix2supc/wheels/f4/cb/2d/262a4ed4a28c45ec26614870316e9a47ce273e9b69028c3a42/tmpo_9xz7zh/.tmp-rzc6v44i/vllm.egg-info/requires.txt
  writing top-level names to /tmp/pip-ephem-wheel-cache-_ix2supc/wheels/f4/cb/2d/262a4ed4a28c45ec26614870316e9a47ce273e9b69028c3a42/tmpo_9xz7zh/.tmp-rzc6v44i/vllm.egg-info/top_level.txt
  writing manifest file '/tmp/pip-ephem-wheel-cache-_ix2supc/wheels/f4/cb/2d/262a4ed4a28c45ec26614870316e9a47ce273e9b69028c3a42/tmpo_9xz7zh/.tmp-rzc6v44i/vllm.egg-info/SOURCES.txt'
  reading manifest template 'MANIFEST.in'
  adding license file 'LICENSE'
  writing manifest file '/tmp/pip-ephem-wheel-cache-_ix2supc/wheels/f4/cb/2d/262a4ed4a28c45ec26614870316e9a47ce273e9b69028c3a42/tmpo_9xz7zh/.tmp-rzc6v44i/vllm.egg-info/SOURCES.txt'
  creating '/tmp/pip-ephem-wheel-cache-_ix2supc/wheels/f4/cb/2d/262a4ed4a28c45ec26614870316e9a47ce273e9b69028c3a42/tmpo_9xz7zh/.tmp-rzc6v44i/vllm-0.11.2.dev102+g07a5d100d.cpu.dist-info'
  creating /tmp/pip-ephem-wheel-cache-_ix2supc/wheels/f4/cb/2d/262a4ed4a28c45ec26614870316e9a47ce273e9b69028c3a42/tmpo_9xz7zh/.tmp-rzc6v44i/vllm-0.11.2.dev102+g07a5d100d.cpu.dist-info/WHEEL
  running build_py
  running build_ext
  -- The CXX compiler identification is GNU 14.3.0
  -- Detecting CXX compiler ABI info
  -- Detecting CXX compiler ABI info - done
  -- Check for working CXX compiler: /home/rgommers/code/tmp/pixidev-vllm/.pixi/envs/cpu/bin/ccache - skipped
  -- Detecting CXX compile features
  -- Detecting CXX compile features - done
  -- Build type: Release
  -- Target device: cpu
  -- Found Python: /home/rgommers/code/tmp/pixidev-vllm/.pixi/envs/cpu/bin/python (found version "3.14.0") found components: Interpreter Development.Module Development.SABIModule
  -- Found python matching: /home/rgommers/code/tmp/pixidev-vllm/.pixi/envs/cpu/bin/python.
  CMake Warning at /home/rgommers/code/tmp/pixidev-vllm/.pixi/envs/cpu/lib/python3.14t/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
    static library kineto_LIBRARY-NOTFOUND not found.
  Call Stack (most recent call first):
    /home/rgommers/code/tmp/pixidev-vllm/.pixi/envs/cpu/lib/python3.14t/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:125 (append_torchlib_if_found)
    CMakeLists.txt:91 (find_package)


  -- Found Torch: /home/rgommers/code/tmp/pixidev-vllm/.pixi/envs/cpu/lib/python3.14t/site-packages/torch/lib/libtorch.so
  CMake Warning at cmake/cpu_extension.cmake:127 (message):
    Disable AVX512-BF16 ISA support, no avx512_bf16 found in local CPU flags.
    If cross-compilation is required, please set env VLLM_CPU_AVX512BF16=1.
  Call Stack (most recent call first):
    CMakeLists.txt:111 (include)


  CMake Warning at cmake/cpu_extension.cmake:142 (message):
    Disable AVX512-VNNI ISA support, no avx512_vnni found in local CPU flags.
    If cross-compilation is required, please set env VLLM_CPU_AVX512VNNI=1.
  Call Stack (most recent call first):
    CMakeLists.txt:111 (include)


  CMake Warning at cmake/cpu_extension.cmake:158 (message):
    Disable AMX_BF16 ISA support, no amx_bf16 found in local CPU flags.  If
    cross-compilation is required, please set env VLLM_CPU_AMXBF16=1.
  Call Stack (most recent call first):
    CMakeLists.txt:111 (include)


  -- Downloading oneDNN from GitHub
  -- The C compiler identification is GNU 14.3.0
  -- Detecting C compiler ABI info
  -- Detecting C compiler ABI info - done
  -- Check for working C compiler: /home/rgommers/code/tmp/pixidev-vllm/.pixi/envs/cpu/bin/ccache - skipped
  -- Detecting C compile features
  -- Detecting C compile features - done
  -- DNNL_TARGET_ARCH: X64
  -- DNNL compat: set DNNL_VERBOSE to ONEDNN_VERBOSE with value `OFF`
  -- DNNL compat: set DNNL_ENABLE_MAX_CPU_ISA to ONEDNN_ENABLE_MAX_CPU_ISA with value `OFF`
  -- DNNL compat: set DNNL_ENABLE_CPU_ISA_HINTS to ONEDNN_ENABLE_CPU_ISA_HINTS with value `OFF`
  -- DNNL compat: set DNNL_BUILD_DOC to ONEDNN_BUILD_DOC with value `OFF`
  -- DNNL compat: set DNNL_BUILD_EXAMPLES to ONEDNN_BUILD_EXAMPLES with value `OFF`
  -- DNNL compat: set DNNL_BUILD_TESTS to ONEDNN_BUILD_TESTS with value `OFF`
  -- DNNL compat: set DNNL_ENABLE_JIT_PROFILING to ONEDNN_ENABLE_JIT_PROFILING with value `OFF`
  -- DNNL compat: set DNNL_ENABLE_ITT_TASKS to ONEDNN_ENABLE_ITT_TASKS with value `OFF`
  -- DNNL compat: set DNNL_AARCH64_USE_ACL to ONEDNN_AARCH64_USE_ACL with value `OFF`
  -- DNNL compat: set DNNL_LIBRARY_TYPE to ONEDNN_LIBRARY_TYPE with value `STATIC`
  -- DNNL compat: set DNNL_ENABLE_WORKLOAD to ONEDNN_ENABLE_WORKLOAD with value `INFERENCE`
  -- DNNL compat: set DNNL_ENABLE_PRIMITIVE to ONEDNN_ENABLE_PRIMITIVE with value `MATMUL;REORDER`
  -- DNNL_LIBRARY_NAME: dnnl
  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
  -- Looking for pthread_create in pthreads
  -- Looking for pthread_create in pthreads - not found
  -- Looking for pthread_create in pthread
  -- Looking for pthread_create in pthread - found
  -- Found Threads: TRUE
  -- Found OpenMP_C: -fopenmp (found version "4.5")
  -- Found OpenMP_CXX: -fopenmp (found version "4.5")
  -- Found OpenMP: TRUE (found version "4.5")
  -- Found Git: /usr/bin/git (found version "2.49.0")
  -- Enabled testing coverage: CI
  -- Enabled workload: INFERENCE
  -- Enabled primitives: MATMUL;REORDER
  -- Enabled primitive CPU ISA: ALL
  -- Enabled primitive GPU ISA: ALL
  -- Enabled GeMM kernels ISA: ALL
  -- Primitive cache is enabled
  -- CPU extension compile flags: -mf16c;-fopenmp;-DVLLM_CPU_EXTENSION;-mavx512f;-mavx512vl;-mavx512bw;-mavx512dq
  -- CPU extension source files: csrc/cpu/dnnl_kernels.cpp;csrc/cpu/shm.cpp;csrc/cpu/cpu_wna16.cpp;csrc/cpu/activation.cpp;csrc/cpu/utils.cpp;csrc/cpu/layernorm.cpp;csrc/cpu/mla_decode.cpp;csrc/cpu/pos_encoding.cpp;csrc/moe/dynamic_4bit_int_moe_cpu.cpp;csrc/cpu/cpu_attn.cpp;csrc/cpu/scratchpad_manager.cpp;csrc/cpu/torch_bindings.cpp
  -- Enabling C extension.
  -- Configuring done (2.8s)
  -- Generating done (0.1s)
  -- Build files have been written to: /tmp/tmp3kcxe3na.build-temp
   [1/436] Building CXX object /home/rgommers/code/tmp/pixidev-vllm/vllm/vllm/.deps/onednn-build/src/common/CMakeFiles/dnnl_common.dir/bfloat16.cpp.o
  [2/436] Building CXX object /home/rgommers/code/tmp/pixidev-vllm/vllm/vllm/.deps/onednn-build/src/common/CMakeFiles/dnnl_common.dir/dnnl_debug_autogenerated.cpp.o
...
[435/436] Building CXX object CMakeFiles/_C.dir/csrc/cpu/cpu_attn.cpp.o
  [436/436] Linking CXX shared module _C.cpython-314t-x86_64-linux-gnu.so
  -- Install configuration: "Release"
  -- Installing: /tmp/tmpm606u7y7.build-lib/vllm/_C.cpython-314t-x86_64-linux-gnu.so
  -- Set non-toolchain portion of runtime path of "/tmp/tmpm606u7y7.build-lib/vllm/_C.cpython-314t-x86_64-linux-gnu.so" to ""
  copying /tmp/tmpm606u7y7.build-lib/vllm/_C.cpython-314t-x86_64-linux-gnu.so -> vllm
  running egg_info
  creating /tmp/tmp3kcxe3na.build-temp/vllm.egg-info
  writing /tmp/tmp3kcxe3na.build-temp/vllm.egg-info/PKG-INFO
  writing dependency_links to /tmp/tmp3kcxe3na.build-temp/vllm.egg-info/dependency_links.txt
  writing entry points to /tmp/tmp3kcxe3na.build-temp/vllm.egg-info/entry_points.txt
  writing requirements to /tmp/tmp3kcxe3na.build-temp/vllm.egg-info/requires.txt
  writing top-level names to /tmp/tmp3kcxe3na.build-temp/vllm.egg-info/top_level.txt
  writing manifest file '/tmp/tmp3kcxe3na.build-temp/vllm.egg-info/SOURCES.txt'
  reading manifest template 'MANIFEST.in'
  adding license file 'LICENSE'
  writing manifest file '/tmp/tmp3kcxe3na.build-temp/vllm.egg-info/SOURCES.txt'
  Editable install will be performed using a meta path finder.

  Options like `package-data`, `include/exclude-package-data` or
  `packages.find.exclude/include` may have no effect.

  adding '__editable___vllm_0_11_2_dev102_g07a5d100d_cpu_finder.py'
  adding '__editable__.vllm-0.11.2.dev102+g07a5d100d.cpu.pth'
  creating '/tmp/pip-ephem-wheel-cache-_ix2supc/wheels/f4/cb/2d/262a4ed4a28c45ec26614870316e9a47ce273e9b69028c3a42/tmpo_9xz7zh/.tmp-rzc6v44i/vllm-0.11.2.dev102+g07a5d100d.cpu-0.editable-cp314-cp314t-linux_x86_64.whl' and adding '/tmp/tmpzkmwfurivllm-0.11.2.dev102+g07a5d100d.cpu-0.editable-cp314-cp314t-linux_x86_64.whl' to it
  adding 'vllm-0.11.2.dev102+g07a5d100d.cpu.dist-info/licenses/LICENSE'
  adding 'vllm-0.11.2.dev102+g07a5d100d.cpu.dist-info/METADATA'
  adding 'vllm-0.11.2.dev102+g07a5d100d.cpu.dist-info/WHEEL'
  adding 'vllm-0.11.2.dev102+g07a5d100d.cpu.dist-info/entry_points.txt'
  adding 'vllm-0.11.2.dev102+g07a5d100d.cpu.dist-info/top_level.txt'
  adding 'vllm-0.11.2.dev102+g07a5d100d.cpu.dist-info/RECORD'
  /home/rgommers/code/tmp/pixidev-vllm/.pixi/envs/cpu/lib/python3.14t/site-packages/setuptools/command/editable_wheel.py:351: InformationOnly: Editable installation.
  !!

          ********************************************************************************
          Please be careful with folders in your working directory with the same
          name as your package as they may take precedence during imports.
          ********************************************************************************

  !!
    with strategy, WheelFile(wheel_path, "w") as wheel_obj:
  Building editable for vllm (pyproject.toml) ... done
  Created wheel for vllm: filename=vllm-0.11.2.dev102+g07a5d100d.cpu-0.editable-cp314-cp314t-linux_x86_64.whl size=14520 sha256=5963f6f48f960b30a49ca42ad3f856672444ad6c89664cd8e113cd593d733628
  Stored in directory: /tmp/pip-ephem-wheel-cache-_ix2supc/wheels/f4/cb/2d/262a4ed4a28c45ec26614870316e9a47ce273e9b69028c3a42
Successfully built vllm
Installing collected packages: vllm
  Attempting uninstall: vllm
    Found existing installation: vllm 0.11.2.dev102+g45888cf12.cpu
    Uninstalling vllm-0.11.2.dev102+g45888cf12.cpu:
      Removing file or directory /home/rgommers/code/tmp/pixidev-vllm/.pixi/envs/cpu/bin/vllm
      Removing file or directory /home/rgommers/code/tmp/pixidev-vllm/.pixi/envs/cpu/lib/python3.14t/site-packages/__editable__.vllm-0.11.2.dev102+g45888cf12.cpu.pth
      Removing file or directory /home/rgommers/code/tmp/pixidev-vllm/.pixi/envs/cpu/lib/python3.14t/site-packages/__editable___vllm_0_11_2_dev102_g45888cf12_cpu_finder.py
      Removing file or directory /home/rgommers/code/tmp/pixidev-vllm/.pixi/envs/cpu/lib/python3.14t/site-packages/__pycache__/__editable___vllm_0_11_2_dev102_g45888cf12_cpu_finder.cpython-314.pyc
      Removing file or directory /home/rgommers/code/tmp/pixidev-vllm/.pixi/envs/cpu/lib/python3.14t/site-packages/vllm-0.11.2.dev102+g45888cf12.cpu.dist-info/
      Successfully uninstalled vllm-0.11.2.dev102+g45888cf12.cpu
  changing mode of /home/rgommers/code/tmp/pixidev-vllm/.pixi/envs/cpu/bin/vllm to 755
Successfully installed vllm-0.11.2.dev102+g07a5d100d.cpu

Note: Python 3.14 isn't yet supported by vLLM (though it is the more interesting version for free-threading), so testing it required this small local patch:

Patch for allowing a 3.14 interpreter
diff --git a/CMakeLists.txt b/CMakeLists.txt
index a4cf51d17..151332651 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -34,7 +34,7 @@ install(CODE "set(CMAKE_INSTALL_LOCAL_ONLY TRUE)" ALL_COMPONENTS)
 # Supported python versions.  These versions will be searched in order, the
 # first match will be selected.  These should be kept in sync with setup.py.
 #
-set(PYTHON_SUPPORTED_VERSIONS "3.10" "3.11" "3.12" "3.13")
+set(PYTHON_SUPPORTED_VERSIONS "3.10" "3.11" "3.12" "3.13" "3.14")
 
 # Supported AMD GPU architectures.
 set(HIP_SUPPORTED_ARCHS "gfx906;gfx908;gfx90a;gfx942;gfx950;gfx1030;gfx1100;gfx1101;gfx1200;gfx1201;gfx1150;gfx1151")
diff --git a/pyproject.toml b/pyproject.toml
index a250ab656..69d691ae4 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -30,7 +30,7 @@ classifiers = [
     "Topic :: Scientific/Engineering :: Artificial Intelligence",
     "Topic :: Scientific/Engineering :: Information Analysis",
 ]
-requires-python = ">=3.10,<3.14"
+requires-python = ">=3.10,<3.15"
 dynamic = [ "version", "dependencies", "optional-dependencies"]



mergify bot commented Nov 22, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @rgommers.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to enable building vLLM with a free-threaded Python interpreter by disabling the Limited C API and Stable ABI, which are not supported in free-threaded builds. The changes correctly modify setup.py to conditionally disable py_limited_api. However, there is a critical issue in cmake/utils.cmake where the check for a free-threaded interpreter will not work as intended due to how CMake handles boolean strings, which could break existing builds. I've provided a suggestion to fix this.

For Python 3.13/3.14, free-threaded Python does not support using the
Limited C API and the Stable ABI.

Without this change, the build ends with:
```
  In file included from /path/to/vllm/csrc/core/registration.h:3,
                   from /path/to/vllm/csrc/cpu/torch_bindings.cpp:3:
  /..../include/python3.14t/Python.h:51:4: error: #error "The limited API is not currently supported in the free-threaded build"
```

Signed-off-by: Ralf Gommers <[email protected]>
@rgommers
Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces changes to enable building vLLM with a free-threaded Python interpreter, a feature becoming available in Python 3.13/3.14. The author correctly identifies that the Limited C API and Stable ABI are not supported in free-threaded builds. The changes disable these features by modifying both setup.py for setuptools and cmake/utils.cmake for the CMake build process. The detection of a free-threaded interpreter is implemented correctly in both places using sysconfig.get_config_var("Py_GIL_DISABLED"). The changes are clear, well-motivated, and appear correct. I have not found any issues of high or critical severity.

Collaborator

@ApostaC ApostaC left a comment


Otherwise LGTM

```diff
 class CMakeExtension(Extension):
     def __init__(self, name: str, cmake_lists_dir: str = ".", **kwa) -> None:
-        super().__init__(name, sources=[], py_limited_api=True, **kwa)
+        super().__init__(name, sources=[], py_limited_api=not is_freethreaded(), **kwa)
```
Collaborator


Dumb question: is py_limited_api=False required when using free-threaded Python?

Author


Thanks for the review @ApostaC. Yes, it should be set to False, otherwise setuptools raises an exception. The limited API is not supported yet with free-threading. There is very active work on adding that support for 3.15 (either PEP 803 or PEP 809 will add it, and both require PEP 793), but that's a new ABI which will be compatible with both free-threaded and with-GIL interpreters. Using that in the future will require both a new setuptools version and some source-level changes in vLLM to use PyModExport (PEP 793). So until that's all done, the limited API has to be avoided here.
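To make that concrete, here is a hedged, minimal reproduction of the wiring: the `CMakeExtension` shape follows the snippet quoted in the review above, while the detection helper and the `"vllm._C"` name are used only for illustration. With py_limited_api=True under free-threading, setuptools raises an exception as described; the conditional avoids that:

```python
import sysconfig

from setuptools import Extension


def is_freethreaded() -> bool:
    # Py_GIL_DISABLED is 1 on free-threaded builds, None/0 otherwise.
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))


class CMakeExtension(Extension):
    def __init__(self, name: str, cmake_lists_dir: str = ".", **kwa) -> None:
        # py_limited_api must be False under free-threading: setuptools
        # rejects limited-API extensions there, and CPython's Python.h
        # emits the #error shown in the PR description.
        super().__init__(name, sources=[], py_limited_api=not is_freethreaded(), **kwa)
        self.cmake_lists_dir = cmake_lists_dir


ext = CMakeExtension("vllm._C")
print(ext.py_limited_api)  # False on cp313t/cp314t, True on with-GIL builds
```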
