Commit 1bb17ec

[CPU Backend] [Doc]: Update Installation Docs for CPUs (#29868)
Signed-off-by: Ioana Ghiban <[email protected]>
1 parent: 15b1511

5 files changed (+58, -14 lines)
docs/getting_started/installation/cpu.apple.inc.md

Lines changed: 4 additions & 3 deletions
@@ -4,9 +4,6 @@ vLLM has experimental support for macOS with Apple Silicon. For now, users must
 
 Currently the CPU implementation for macOS supports FP32 and FP16 datatypes.
 
-!!! warning
-    There are no pre-built wheels or images for this device, so you must build vLLM from source.
-
 # --8<-- [end:installation]
 # --8<-- [start:requirements]
 
@@ -20,6 +17,8 @@ Currently the CPU implementation for macOS supports FP32 and FP16 datatypes.
 # --8<-- [end:set-up-using-python]
 # --8<-- [start:pre-built-wheels]
 
+Currently, there are no pre-built Apple silicon CPU wheels.
+
 # --8<-- [end:pre-built-wheels]
 # --8<-- [start:build-wheel-from-source]
 
@@ -78,6 +77,8 @@ uv pip install -e .
 # --8<-- [end:build-wheel-from-source]
 # --8<-- [start:pre-built-images]
 
+Currently, there are no pre-built Apple silicon CPU images.
+
 # --8<-- [end:pre-built-images]
 # --8<-- [start:build-image-from-source]
 
docs/getting_started/installation/cpu.arm.inc.md

Lines changed: 20 additions & 6 deletions
@@ -1,11 +1,6 @@
 # --8<-- [start:installation]
 
-vLLM has been adapted to work on ARM64 CPUs with NEON support, leveraging the CPU backend initially developed for the x86 platform.
-
-ARM CPU backend currently supports Float32, FP16 and BFloat16 datatypes.
-
-!!! warning
-    There are no pre-built wheels or images for this device, so you must build vLLM from source.
+vLLM offers basic model inferencing and serving on the Arm CPU platform, with NEON support and the FP32, FP16, and BF16 data types.
 
 # --8<-- [end:installation]
 # --8<-- [start:requirements]
@@ -20,6 +15,23 @@ ARM CPU backend currently supports Float32, FP16 and BFloat16 datatypes.
 # --8<-- [end:set-up-using-python]
 # --8<-- [start:pre-built-wheels]
 
+Pre-built vLLM wheels for Arm are available since version 0.11.2. These wheels contain pre-compiled C++ binaries.
+Please replace `<version>` in the commands below with a specific version string (e.g., `0.11.2`).
+
+```bash
+uv pip install --pre vllm==<version>+cpu --extra-index-url https://wheels.vllm.ai/<version>%2Bcpu/
+```
+
+??? console "pip"
+    ```bash
+    pip install --pre vllm==<version>+cpu --extra-index-url https://wheels.vllm.ai/<version>%2Bcpu/
+    ```
+
+The `uv` approach works for vLLM `v0.6.6` and later. A unique feature of `uv` is that packages in `--extra-index-url` have [higher priority than the default index](https://docs.astral.sh/uv/pip/compatibility/#packages-that-exist-on-multiple-indexes). If the latest public release is `v0.6.6.post1`, `uv`'s behavior allows installing a commit before `v0.6.6.post1` by specifying the `--extra-index-url`. In contrast, `pip` combines packages from `--extra-index-url` and the default index, choosing only the latest version, which makes it difficult to install a development version prior to the released version.
+
+!!! note
+    Nightly wheels, which are useful e.g. for bisecting a behavior change or a performance regression, are currently unsupported for this architecture.
+
 # --8<-- [end:pre-built-wheels]
 # --8<-- [start:build-wheel-from-source]
 
@@ -69,6 +81,8 @@ Testing has been conducted on AWS Graviton3 instances for compatibility.
 # --8<-- [end:build-wheel-from-source]
 # --8<-- [start:pre-built-images]
 
+Currently, there are no pre-built Arm CPU images.
+
 # --8<-- [end:pre-built-images]
 # --8<-- [start:build-image-from-source]
 ```bash
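
As a concrete illustration of the new Arm wheel instructions, here is what the commands look like with `0.11.2` (the version the added text cites) substituted for `<version>`. This is only a sketch of the documented pattern; the index URL simply follows the `<version>%2Bcpu` layout shown in the diff.

```bash
# Sketch: install the 0.11.2 Arm CPU wheel with uv, following the pattern
# documented above ("+cpu" is URL-encoded as "%2B" in the index path).
uv pip install --pre vllm==0.11.2+cpu --extra-index-url https://wheels.vllm.ai/0.11.2%2Bcpu/

# Quick sanity check that the CPU build imports and reports its version.
python -c "import vllm; print(vllm.__version__)"
```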

docs/getting_started/installation/cpu.md

Lines changed: 28 additions & 2 deletions
@@ -46,11 +46,25 @@ vLLM is a Python library that supports the following CPU variants. Select your C
 
 ### Pre-built wheels
 
-Please refer to the instructions for [pre-built wheels on GPU](./gpu.md#pre-built-wheels).
-
 When specifying the index URL, please make sure to use the `cpu` variant subdirectory.
 For example, the nightly build index is: `https://wheels.vllm.ai/nightly/cpu/`.
 
+=== "Intel/AMD x86"
+
+    --8<-- "docs/getting_started/installation/cpu.x86.inc.md:pre-built-wheels"
+
+=== "ARM AArch64"
+
+    --8<-- "docs/getting_started/installation/cpu.arm.inc.md:pre-built-wheels"
+
+=== "Apple silicon"
+
+    --8<-- "docs/getting_started/installation/cpu.apple.inc.md:pre-built-wheels"
+
+=== "IBM Z (S390X)"
+
+    --8<-- "docs/getting_started/installation/cpu.s390x.inc.md:pre-built-wheels"
+
 ### Build wheel from source
 
 #### Set up using Python-only build (without compilation) {#python-only-build}
@@ -87,6 +101,18 @@ VLLM_USE_PRECOMPILED=1 VLLM_PRECOMPILED_WHEEL_VARIANT=cpu VLLM_TARGET_DEVICE=cpu
 
 --8<-- "docs/getting_started/installation/cpu.x86.inc.md:pre-built-images"
 
+=== "ARM AArch64"
+
+    --8<-- "docs/getting_started/installation/cpu.arm.inc.md:pre-built-images"
+
+=== "Apple silicon"
+
+    --8<-- "docs/getting_started/installation/cpu.apple.inc.md:pre-built-images"
+
+=== "IBM Z (S390X)"
+
+    --8<-- "docs/getting_started/installation/cpu.s390x.inc.md:pre-built-images"
+
 ### Build image from source
 
 === "Intel/AMD x86"
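
The retained note about the `cpu` variant subdirectory matters when pulling nightly builds. A minimal sketch, assuming the nightly index layout named above and that a nightly CPU wheel exists for your platform (per the Arm changes, nightly wheels are currently only expected for the x86 CPU variant):

```bash
# Sketch: point the extra index at the cpu variant subdirectory of the
# nightly index, as the updated cpu.md instructs.
uv pip install -U --pre vllm --extra-index-url https://wheels.vllm.ai/nightly/cpu/
```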

docs/getting_started/installation/cpu.s390x.inc.md

Lines changed: 4 additions & 3 deletions
@@ -4,9 +4,6 @@ vLLM has experimental support for s390x architecture on IBM Z platform. For now,
 
 Currently, the CPU implementation for s390x architecture supports FP32 datatype only.
 
-!!! warning
-    There are no pre-built wheels or images for this device, so you must build vLLM from source.
-
 # --8<-- [end:installation]
 # --8<-- [start:requirements]
 
@@ -21,6 +18,8 @@ Currently, the CPU implementation for s390x architecture supports FP32 datatype
 # --8<-- [end:set-up-using-python]
 # --8<-- [start:pre-built-wheels]
 
+Currently, there are no pre-built IBM Z CPU wheels.
+
 # --8<-- [end:pre-built-wheels]
 # --8<-- [start:build-wheel-from-source]
 
@@ -69,6 +68,8 @@ Execute the following commands to build and install vLLM from source.
 # --8<-- [end:build-wheel-from-source]
 # --8<-- [start:pre-built-images]
 
+Currently, there are no pre-built IBM Z CPU images.
+
 # --8<-- [end:pre-built-images]
 # --8<-- [start:build-image-from-source]

docs/getting_started/installation/cpu.x86.inc.md

Lines changed: 2 additions & 0 deletions
@@ -17,6 +17,8 @@ vLLM supports basic model inferencing and serving on x86 CPU platform, with data
 # --8<-- [end:set-up-using-python]
 # --8<-- [start:pre-built-wheels]
 
+Currently, there are no pre-built x86 CPU wheels.
+
 # --8<-- [end:pre-built-wheels]
 # --8<-- [start:build-wheel-from-source]
 