Modify `paddleocr-vl-api.ports` in the `compose.yaml` file to change the port. For example, if you need to change the service port to 8111, make the following modification:

```diff
 paddleocr-vl-api:
   ...
   ports:
-    - 8080:8080
+    - 8111:8080
   ...
```
Modify `environment` in the `compose.yaml` file to change the DCU used. For example, if you need to use card 1 for deployment, make the following modifications:

```diff
 paddleocr-vl-api:
   ...
   environment:
+    - HIP_VISIBLE_DEVICES=1
   ...
 paddleocr-vlm-server:
   ...
   environment:
+    - HIP_VISIBLE_DEVICES=1
   ...
```
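After restarting the services, you can optionally verify that the variable is set inside the containers. A minimal check, assuming the service names shown in the `compose.yaml` snippet above:

```bash
# Recreate the services, then inspect the environment of one container.
docker compose up -d
docker compose exec paddleocr-vl-api env | grep HIP_VISIBLE_DEVICES
```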
Add the `paddleocr-vlm-server.volumes` and `paddleocr-vlm-server.command` fields to your `compose.yaml`. Replace `/path/to/your_config.yaml` with your actual configuration file path.

```yaml
paddleocr-vlm-server:
  ...
  volumes:
    - /path/to/your_config.yaml:/home/paddleocr/vlm_server_config.yaml
  command: paddleocr genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend vllm --backend_config /home/paddleocr/vlm_server_config.yaml
  ...
```
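If you need a starting point for the configuration file itself, the sketch below writes a minimal one. The keys are hypothetical: they are modeled on vLLM engine arguments (`gpu_memory_utilization`, `max_num_seqs`), and whether `paddleocr genai_server` accepts exactly these names is an assumption; consult the PaddleOCR documentation for the authoritative schema.

```bash
# Hypothetical backend configuration; keys modeled on vLLM engine
# arguments. Verify the exact schema in the PaddleOCR docs.
cat > /path/to/your_config.yaml <<'EOF'
gpu_memory_utilization: 0.8
max_num_seqs: 32
EOF
```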
Modify `paddleocr-vl-api.ports` in the `compose.yaml` file to change the port. For example, if you need to change the service port to 8111, make the following modification:

```diff
 paddleocr-vl-api:
   ...
   ports:
-    - 8080:8080
+    - 8111:8080
   ...
```
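Once the service is recreated, a quick smoke test against the remapped port might look like the sketch below. The `/layout-parsing` endpoint and request body follow the PaddleX serving convention and are assumptions here; see Section 4.3 for the authoritative client invocation.

```bash
# Smoke test against the remapped port (sketch). The endpoint and
# payload are assumptions; refer to "4.3 Client-Side Invocation" for
# the exact request format.
curl -s http://localhost:8111/layout-parsing \
  -H "Content-Type: application/json" \
  -d '{"file": "https://example.com/sample.jpg", "fileType": 1}'
```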
Modify `device_ids` in the `compose.yaml` file to change the GPU used. For example, if you need to use GPU card 1 for deployment, make the following modifications:

```diff
 paddleocr-vl-api:
   ...
   deploy:
     resources:
       reservations:
         devices:
           - driver: nvidia
-            device_ids: ["0"]
+            device_ids: ["1"]
             capabilities: [gpu]
   ...
 paddleocr-vlm-server:
   ...
   deploy:
     resources:
       reservations:
         devices:
           - driver: nvidia
-            device_ids: ["0"]
+            device_ids: ["1"]
             capabilities: [gpu]
   ...
```
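To double-check which card the containers were actually granted, you can list the GPUs visible inside a container, for example:

```bash
# Recreate the services, then list GPUs visible to the VLM server
# container; with device_ids: ["1"], only that card should appear.
docker compose up -d
docker compose exec paddleocr-vlm-server nvidia-smi -L
```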
Add the `paddleocr-vlm-server.volumes` and `paddleocr-vlm-server.command` fields to your `compose.yaml`. Replace `/path/to/your_config.yaml` with your actual configuration file path.

```yaml
paddleocr-vlm-server:
  ...
  volumes:
    - /path/to/your_config.yaml:/home/paddleocr/vlm_server_config.yaml
  command: paddleocr genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend vllm --backend_config /home/paddleocr/vlm_server_config.yaml
  ...
```
The following command-line options are available when starting the server:

| Name | Description |
|---|---|
| `--pipeline` | Registered name of the PaddleX pipeline or path to the pipeline configuration file. |
| `--device` | Device for pipeline deployment. By default, the GPU is used if available; otherwise, the CPU is used. |
| `--host` | Hostname or IP address to which the server is bound. Defaults to `0.0.0.0`. |
| `--port` | Port number on which the server listens. Defaults to `8080`. |
| `--use_hpip` | Enables high-performance inference mode. Refer to the high-performance inference documentation for more information. |
| `--hpi_config` | High-performance inference configuration. Refer to the high-performance inference documentation for more information. |
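For example, a server launch using these options might look like the sketch below; the `paddlex --serve` entry point and the pipeline name are assumptions here, so substitute your registered pipeline name or configuration file path:

```bash
# Hypothetical launch command built from the options above.
paddlex --serve --pipeline PaddleOCR-VL --host 0.0.0.0 --port 8080
```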
Modify `paddleocr-vl-api.ports` in the `compose.yaml` file to change the port. For example, if you need to change the service port to 8111, make the following modification:

```diff
 paddleocr-vl-api:
   ...
   ports:
-    - 8080:8080
+    - 8111:8080
   ...
```
Modify `environment` in the `compose.yaml` file to change the XPU used. For example, if you need to use card 1 for deployment, make the following modifications:

```diff
 paddleocr-vl-api:
   ...
   environment:
+    - XPU_VISIBLE_DEVICES=1
   ...
 paddleocr-vlm-server:
   ...
   environment:
+    - XPU_VISIBLE_DEVICES=1
   ...
```
Add the `paddleocr-vlm-server.volumes` and `paddleocr-vlm-server.command` fields to your `compose.yaml`. Replace `/path/to/your_config.yaml` with your actual configuration file path.

```yaml
paddleocr-vlm-server:
  ...
  volumes:
    - /path/to/your_config.yaml:/home/paddleocr/vlm_server_config.yaml
  command: paddleocr genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend fastdeploy --backend_config /home/paddleocr/vlm_server_config.yaml
  ...
```
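After editing `compose.yaml`, recreate the services so the changes take effect:

```bash
# Docker Compose only applies compose.yaml changes when the services
# are recreated.
docker compose up -d
```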
## Process Guide

Before starting, please refer to the next section on the inference device support provided by PaddleOCR-VL to **determine whether your device meets the operational requirements.** If your device meets the requirements, please select the relevant sections to read based on your needs.

For some inference hardware, you may need to refer to other environment configuration documents we provide; the overall process remains the same, so the guide below still applies:

1. **Want to quickly experience PaddleOCR-VL**:

    If you wish to quickly experience the inference results of PaddleOCR-VL, please read [1. Environment Preparation](#1-environment-preparation) and [2. Quick Start](#2-quick-start).

2. **Want to use PaddleOCR-VL in a production environment**:

    Although the quick experience lets you see the results PaddleOCR-VL produces, it is not optimal in terms of inference speed and GPU memory usage. If you wish to apply PaddleOCR-VL in a production environment and have higher requirements for inference performance, please read [3. Enhancing VLM Inference Performance Using Inference Acceleration Frameworks](#3-enhancing-vlm-inference-performance-using-inference-acceleration-frameworks).

3. **Want to deploy PaddleOCR-VL as an API service**:

    If you want to deploy PaddleOCR-VL as a web service (API), so that other devices or applications can access and call it through a specific URL without configuring the environment themselves, we offer two methods:

    - Deployment using Docker Compose (one-click start, recommended): please read [4.1 Method 1: Deploy Using Docker Compose](#41-method-1-deploy-using-docker-compose-recommended) and [4.3 Client-Side Invocation](#43-client-side-invocation).
    - Manual deployment: please read [1. Environment Preparation](#1-environment-preparation), [4.2 Method 2: Manual Deployment](#42-method-2-manual-deployment), and [4.3 Client-Side Invocation](#43-client-side-invocation).

4. **Want to fine-tune PaddleOCR-VL to adapt it to specific business needs**:

    If you find that the accuracy of PaddleOCR-VL in specific business scenarios does not meet expectations, please read [5. Model Fine-tuning](#5-model-fine-tuning).
## Inference Device Support for PaddleOCR-VL

Currently, PaddleOCR-VL offers four inference methods, with varying levels of support for different inference devices. Please confirm that your inference device meets the requirements in the table below before proceeding with PaddleOCR-VL deployment:

| Inference Method | NVIDIA GPU | KUNLUNXIN XPU | HYGON DCU | MetaX GPU | Iluvatar GPU | x64 CPU |
|---|---|---|---|---|---|---|
| PaddlePaddle | ✅ | ✅ | ✅ | 🚧 | 🚧 | ✅ |
| vLLM | ✅ | 🚧 | ✅ | 🚧 | 🚧 | ❌ |
| SGLang | ✅ | 🚧 | 🚧 | 🚧 | 🚧 | ❌ |
| FastDeploy | ✅ | ✅ | 🚧 | 🚧 | 🚧 | ❌ |
Modify `paddleocr-vl-api.ports` in the `compose.yaml` file to change the port. For example, if you need to change the service port to 8111, make the following modification:

```diff
 paddleocr-vl-api:
   ...
   ports:
-    - 8080:8080
+    - 8111:8080
   ...
```
Modify `device_ids` in the `compose.yaml` file to change the GPU used. For example, if you need to use GPU card 1 for deployment, make the following modifications:

```diff
 paddleocr-vl-api:
   ...
   deploy:
     resources:
       reservations:
         devices:
           - driver: nvidia
-            device_ids: ["0"]
+            device_ids: ["1"]
             capabilities: [gpu]
   ...
 paddleocr-vlm-server:
   ...
   deploy:
     resources:
       reservations:
         devices:
           - driver: nvidia
-            device_ids: ["0"]
+            device_ids: ["1"]
             capabilities: [gpu]
   ...
```
Add the `paddleocr-vlm-server.volumes` and `paddleocr-vlm-server.command` fields to your `compose.yaml`. Please replace `/path/to/your_config.yaml` with your actual configuration file path.

```yaml
paddleocr-vlm-server:
  ...
  volumes:
    - /path/to/your_config.yaml:/home/paddleocr/vlm_server_config.yaml
  command: paddleocr genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend vllm --backend_config /home/paddleocr/vlm_server_config.yaml
  ...
```
Modify `VLM_BACKEND` in the `.env` file to change the VLM inference backend; for example, to switch it to `fastdeploy`:

```diff
 API_IMAGE_TAG_SUFFIX=latest-offline
-VLM_BACKEND=vllm
+VLM_BACKEND=fastdeploy
 VLM_IMAGE_TAG_SUFFIX=latest-offline
```
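Because Compose reads the `.env` file when the project is brought up, restart the stack for the new backend to take effect:

```bash
# Restart the stack so the updated .env values are picked up.
docker compose down
docker compose up -d
```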