Commit 1bb17ec

[CPU Backend] [Doc]: Update Installation Docs for CPUs (#29868)
Signed-off-by: Ioana Ghiban <[email protected]>
1 parent: 15b1511

5 files changed (+58, -14 lines)
docs/getting_started/installation/cpu.apple.inc.md

Lines changed: 4 additions & 3 deletions
@@ -4,9 +4,6 @@ vLLM has experimental support for macOS with Apple Silicon. For now, users must
 
 Currently the CPU implementation for macOS supports FP32 and FP16 datatypes.
 
-!!! warning
-    There are no pre-built wheels or images for this device, so you must build vLLM from source.
-
 # --8<-- [end:installation]
 # --8<-- [start:requirements]
 
@@ -20,6 +17,8 @@ Currently the CPU implementation for macOS supports FP32 and FP16 datatypes.
 # --8<-- [end:set-up-using-python]
 # --8<-- [start:pre-built-wheels]
 
+Currently, there are no pre-built Apple silicon CPU wheels.
+
 # --8<-- [end:pre-built-wheels]
 # --8<-- [start:build-wheel-from-source]
 
@@ -78,6 +77,8 @@ uv pip install -e .
 # --8<-- [end:build-wheel-from-source]
 # --8<-- [start:pre-built-images]
 
+Currently, there are no pre-built Apple silicon CPU images.
+
 # --8<-- [end:pre-built-images]
 # --8<-- [start:build-image-from-source]
 
docs/getting_started/installation/cpu.arm.inc.md

Lines changed: 20 additions & 6 deletions
@@ -1,11 +1,6 @@
 # --8<-- [start:installation]
 
-vLLM has been adapted to work on ARM64 CPUs with NEON support, leveraging the CPU backend initially developed for the x86 platform.
-
-ARM CPU backend currently supports Float32, FP16 and BFloat16 datatypes.
-
-!!! warning
-    There are no pre-built wheels or images for this device, so you must build vLLM from source.
+vLLM offers basic model inferencing and serving on the Arm CPU platform, with NEON support and the FP32, FP16, and BF16 data types.
 
 # --8<-- [end:installation]
 # --8<-- [start:requirements]
@@ -20,6 +15,23 @@ ARM CPU backend currently supports Float32, FP16 and BFloat16 datatypes.
 # --8<-- [end:set-up-using-python]
 # --8<-- [start:pre-built-wheels]
 
+Pre-built vLLM wheels for Arm are available since version 0.11.2. These wheels contain pre-compiled C++ binaries.
+Please replace `<version>` in the commands below with a specific version string (e.g., `0.11.2`).
+
+```bash
+uv pip install --pre vllm==<version>+cpu --extra-index-url https://wheels.vllm.ai/<version>%2Bcpu/
+```
+
+??? console "pip"
+    ```bash
+    pip install --pre vllm==<version>+cpu --extra-index-url https://wheels.vllm.ai/<version>%2Bcpu/
+    ```
+
+The `uv` approach works for vLLM `v0.6.6` and later. A unique feature of `uv` is that packages in `--extra-index-url` have [higher priority than the default index](https://docs.astral.sh/uv/pip/compatibility/#packages-that-exist-on-multiple-indexes). If the latest public release is `v0.6.6.post1`, `uv`'s behavior allows installing a commit before `v0.6.6.post1` by specifying the `--extra-index-url`. In contrast, `pip` combines packages from `--extra-index-url` and the default index, choosing only the latest version, which makes it difficult to install a development version prior to the released version.
+
+!!! note
+    Nightly wheels, which are useful e.g. for bisecting a behavior change or a performance regression, are currently unsupported for this architecture.
+
 # --8<-- [end:pre-built-wheels]
 # --8<-- [start:build-wheel-from-source]
 
@@ -69,6 +81,8 @@ Testing has been conducted on AWS Graviton3 instances for compatibility.
 # --8<-- [end:build-wheel-from-source]
 # --8<-- [start:pre-built-images]
 
+Currently, there are no pre-built Arm CPU images.
+
 # --8<-- [end:pre-built-images]
 # --8<-- [start:build-image-from-source]
 ```bash
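
As a concrete illustration of the new Arm wheel instructions, here is what the commands look like with `0.11.2` (the version the added text cites) substituted for `<version>`. This is only a sketch of the documented pattern; the index URL simply follows the `<version>%2Bcpu` layout shown in the diff.

```bash
# Sketch: install the 0.11.2 Arm CPU wheel with uv, following the pattern
# documented above ("+cpu" is URL-encoded as "%2B" in the index path).
uv pip install --pre vllm==0.11.2+cpu --extra-index-url https://wheels.vllm.ai/0.11.2%2Bcpu/

# Quick sanity check that the CPU build imports and reports its version.
python -c "import vllm; print(vllm.__version__)"
```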

docs/getting_started/installation/cpu.md

Lines changed: 28 additions & 2 deletions
@@ -46,11 +46,25 @@ vLLM is a Python library that supports the following CPU variants. Select your C
 
 ### Pre-built wheels
 
-Please refer to the instructions for [pre-built wheels on GPU](./gpu.md#pre-built-wheels).
-
 When specifying the index URL, please make sure to use the `cpu` variant subdirectory.
 For example, the nightly build index is: `https://wheels.vllm.ai/nightly/cpu/`.
 
+=== "Intel/AMD x86"
+
+    --8<-- "docs/getting_started/installation/cpu.x86.inc.md:pre-built-wheels"
+
+=== "ARM AArch64"
+
+    --8<-- "docs/getting_started/installation/cpu.arm.inc.md:pre-built-wheels"
+
+=== "Apple silicon"
+
+    --8<-- "docs/getting_started/installation/cpu.apple.inc.md:pre-built-wheels"
+
+=== "IBM Z (S390X)"
+
+    --8<-- "docs/getting_started/installation/cpu.s390x.inc.md:pre-built-wheels"
+
 ### Build wheel from source
 
 #### Set up using Python-only build (without compilation) {#python-only-build}
@@ -87,6 +101,18 @@ VLLM_USE_PRECOMPILED=1 VLLM_PRECOMPILED_WHEEL_VARIANT=cpu VLLM_TARGET_DEVICE=cpu
 
 --8<-- "docs/getting_started/installation/cpu.x86.inc.md:pre-built-images"
 
+=== "ARM AArch64"
+
+    --8<-- "docs/getting_started/installation/cpu.arm.inc.md:pre-built-images"
+
+=== "Apple silicon"
+
+    --8<-- "docs/getting_started/installation/cpu.apple.inc.md:pre-built-images"
+
+=== "IBM Z (S390X)"
+
+    --8<-- "docs/getting_started/installation/cpu.s390x.inc.md:pre-built-images"
+
 ### Build image from source
 
 === "Intel/AMD x86"
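
The retained note about the `cpu` variant subdirectory matters when pulling nightly builds. A minimal sketch, assuming the nightly index layout named above and that a nightly CPU wheel exists for your platform (per the Arm changes, nightly wheels are currently only expected for the x86 CPU variant):

```bash
# Sketch: point the extra index at the cpu variant subdirectory of the
# nightly index, as the updated cpu.md instructs.
uv pip install -U --pre vllm --extra-index-url https://wheels.vllm.ai/nightly/cpu/
```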

docs/getting_started/installation/cpu.s390x.inc.md

Lines changed: 4 additions & 3 deletions
@@ -4,9 +4,6 @@ vLLM has experimental support for s390x architecture on IBM Z platform. For now,
 
 Currently, the CPU implementation for s390x architecture supports FP32 datatype only.
 
-!!! warning
-    There are no pre-built wheels or images for this device, so you must build vLLM from source.
-
 # --8<-- [end:installation]
 # --8<-- [start:requirements]
 
@@ -21,6 +18,8 @@ Currently, the CPU implementation for s390x architecture supports FP32 datatype
 # --8<-- [end:set-up-using-python]
 # --8<-- [start:pre-built-wheels]
 
+Currently, there are no pre-built IBM Z CPU wheels.
+
 # --8<-- [end:pre-built-wheels]
 # --8<-- [start:build-wheel-from-source]
 
@@ -69,6 +68,8 @@ Execute the following commands to build and install vLLM from source.
 # --8<-- [end:build-wheel-from-source]
 # --8<-- [start:pre-built-images]
 
+Currently, there are no pre-built IBM Z CPU images.
+
 # --8<-- [end:pre-built-images]
 # --8<-- [start:build-image-from-source]

docs/getting_started/installation/cpu.x86.inc.md

Lines changed: 2 additions & 0 deletions
@@ -17,6 +17,8 @@ vLLM supports basic model inferencing and serving on x86 CPU platform, with data
 # --8<-- [end:set-up-using-python]
 # --8<-- [start:pre-built-wheels]
 
+Currently, there are no pre-built x86 CPU wheels.
+
 # --8<-- [end:pre-built-wheels]
 # --8<-- [start:build-wheel-from-source]
 