diff --git a/.github/workflows/build_publish_develop_docs.yml b/.github/workflows/build_publish_develop_docs.yml index 604c54da04..8514238f63 100644 --- a/.github/workflows/build_publish_develop_docs.yml +++ b/.github/workflows/build_publish_develop_docs.yml @@ -25,7 +25,7 @@ jobs: path: .cache restore-keys: | mkdocs-material- - - run: pip install mike mkdocs-material jieba mkdocs-git-revision-date-localized-plugin mkdocs-git-committers-plugin-2 mkdocs-static-i18n + - run: pip install mike mkdocs-material jieba mkdocs-git-revision-date-localized-plugin mkdocs-git-committers-plugin-2 mkdocs-static-i18n markdown-callouts - run: | git fetch origin gh-pages --depth=1 mike deploy --push --update-aliases main latest diff --git a/docs/version3.x/pipeline_usage/PaddleOCR-VL-DCU.en.md b/docs/version3.x/pipeline_usage/PaddleOCR-VL-DCU.en.md new file mode 100644 index 0000000000..ad47dd78de --- /dev/null +++ b/docs/version3.x/pipeline_usage/PaddleOCR-VL-DCU.en.md @@ -0,0 +1,236 @@ +--- +comments: true +--- + +# PaddleOCR-VL DCU Environment Configuration Tutorial + +This tutorial is a guide for configuring the PaddleOCR-VL HYGON DCU environment. The purpose is to complete the relevant environment setup. After the environment configuration is complete, please refer to the [PaddleOCR-VL Usage Tutorial](./PaddleOCR-VL.en.md) to use PaddleOCR-VL. + +## 1. Environment Preparation + +This step mainly introduces how to set up the runtime environment for PaddleOCR-VL. There are two methods available; choose either one: + +- Method 1: Use the official Docker image. + +- Method 2: Manually install PaddlePaddle and PaddleOCR. + +### 1.1 Method 1: Using Docker Image + +We recommend using the official Docker image (requires Docker version >= 19.03): + +```shell +docker run -it \ + --rm \ + --user root \ + --privileged \ + --device /dev/kfd \ + --device /dev/dri \ + --device /dev/mkfd \ + --group-add video \ + --cap-add SYS_PTRACE \ + --security-opt seccomp=unconfined \ + -v /opt/hyhal/:/opt/hyhal/:ro \ + --shm-size=64G \ + ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-dcu \ + /bin/bash +# Call PaddleOCR CLI or Python API in the container +``` + +If you wish to start the service in an environment without internet access, replace `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-dcu` (image size approximately 21 GB) in the above command with the offline version image `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-dcu-offline` (image size approximately 23 GB). + +### 1.2 Method 2: Manually Install PaddlePaddle and PaddleOCR + +If you cannot use Docker, you can also manually install PaddlePaddle and PaddleOCR. Python version 3.8–3.12 is required. 
+ +**We strongly recommend installing PaddleOCR-VL in a virtual environment to avoid dependency conflicts.** For example, use the Python venv standard library to create a virtual environment: + +```shell +# Create a virtual environment +python -m venv .venv_paddleocr +# Activate the environment +source .venv_paddleocr/bin/activate +``` + +Execute the following commands to complete the installation: + +```shell +python -m pip install paddlepaddle-dcu==3.2.1 -i https://www.paddlepaddle.org.cn/packages/stable/dcu/ +python -m pip install -U "paddleocr[doc-parser]" +python -m pip install https://paddle-whl.bj.bcebos.com/nightly/cu126/safetensors/safetensors-0.6.2.dev0-cp38-abi3-linux_x86_64.whl +``` + +> **Please note to install PaddlePaddle version 3.2.1 or above, and install the special version of safetensors.** + +## 2. Quick Start + +Please refer to the corresponding section in the [PaddleOCR-VL Usage Tutorial](./PaddleOCR-VL.en.md). + +## 3. Improving VLM Inference Performance Using Inference Acceleration Framework + +The inference performance under default configurations is not fully optimized and may not meet actual production requirements. This step mainly introduces how to use the vLLM inference acceleration framework to improve the inference performance of PaddleOCR-VL. + +### 3.1 Starting the VLM Inference Service + +PaddleOCR provides a Docker image for quickly starting the vLLM inference service. Use the following command to start the service (requires Docker version >= 19.03): + +```shell +docker run -it \ + --rm \ + --user root \ + --privileged \ + --device /dev/kfd \ + --device /dev/dri \ + --device /dev/mkfd \ + --group-add video \ + --cap-add SYS_PTRACE \ + --security-opt seccomp=unconfined \ + -v /opt/hyhal/:/opt/hyhal/:ro \ + --shm-size=64G \ + ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest-dcu \ + paddleocr genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend vllm +``` + +If you wish to start the service in an environment without internet access, replace `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest-dcu` (image size approximately 25 GB) in the above command with the offline version image `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest-dcu-offline` (image size approximately 27 GB). + +When launching the vLLM inference service, we provide a set of default parameter settings. If you need to adjust parameters such as GPU memory usage, you can configure additional parameters yourself. 
Please refer to [3.3.1 Server-side Parameter Adjustment](./PaddleOCR-VL.en.md#331-server-side-parameter-adjustment) to create a configuration file, then mount the file into the container and specify the configuration file using `backend_config` in the command to start the service, for example:
+
+```shell
+docker run -it \
+    --rm \
+    --user root \
+    --privileged \
+    --device /dev/kfd \
+    --device /dev/dri \
+    --device /dev/mkfd \
+    --group-add video \
+    --cap-add SYS_PTRACE \
+    --security-opt seccomp=unconfined \
+    -v /opt/hyhal/:/opt/hyhal/:ro \
+    -v vllm_config.yml:/tmp/vllm_config.yml \
+    --shm-size=64G \
+    ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest-dcu \
+    paddleocr genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend vllm --backend_config /tmp/vllm_config.yml
+```
+
+### 3.2 Client Usage Method
+
+Please refer to the corresponding section in the [PaddleOCR-VL Usage Tutorial](./PaddleOCR-VL.en.md).
+
+## 4. Service Deployment
+
+> Please note that the PaddleOCR-VL service introduced in this section is different from the VLM inference service in the previous section: the latter is only responsible for one part of the complete process (i.e., VLM inference) and is called as an underlying service by the former.
+
+This step mainly introduces how to use Docker Compose to deploy PaddleOCR-VL as a service and call it. The specific process is as follows:
+
+1. Copy the content from [here](https://github.com/PaddlePaddle/PaddleOCR/blob/main/deploy/paddleocr_vl_docker/compose_dcu.yaml) and save it as a `compose.yaml` file.
+
+2. Copy the following content and save it as a `.env` file:
+
+    ```
+    API_IMAGE_TAG_SUFFIX=latest-dcu-offline
+    VLM_BACKEND=vllm
+    VLM_IMAGE_TAG_SUFFIX=latest-dcu-offline
+    ```
+
+3. Execute the following command in the directory where the `compose.yaml` and `.env` files are located to start the server, which listens on port **8080** by default:
+
+    ```shell
+    # Must be executed in the directory where compose.yaml and .env files are located
+    docker compose up
+    ```
+
+    After startup, you will see output similar to the following:
+
+    ```text
+    paddleocr-vl-api  | INFO:     Started server process [1]
+    paddleocr-vl-api  | INFO:     Waiting for application startup.
+    paddleocr-vl-api  | INFO:     Application startup complete.
+    paddleocr-vl-api  | INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
+    ```
+
+This method accelerates VLM inference using the vLLM framework and is more suitable for production environment deployment.
+
+Additionally, after starting the server in this manner, no internet connection is required except for image pulling. For deployment in an offline environment, you can first pull the images involved in the Compose file on a connected machine, export them, and transfer them to the offline machine for import to start the service in an offline environment.
+
+Docker Compose starts two containers sequentially by reading configurations from the `.env` and `compose.yaml` files, running the underlying VLM inference service and the PaddleOCR-VL service (pipeline service) respectively.
+
+The meanings of each environment variable contained in the `.env` file are as follows:
+
+- `API_IMAGE_TAG_SUFFIX`: The tag suffix of the image used to launch the pipeline service.
+- `VLM_BACKEND`: The VLM inference backend.
+- `VLM_IMAGE_TAG_SUFFIX`: The tag suffix of the image used to launch the VLM inference service.
+
+You can modify `compose.yaml` to meet custom requirements, for example:
+
+1. Change the port of the PaddleOCR-VL service
+
+Edit `paddleocr-vl-api.ports` in the `compose.yaml` file to change the port. For example, if you need to change the service port to 8111, make the following modifications:
+
+```diff
+ paddleocr-vl-api:
+   ...
+   ports:
+-    - 8080:8080
++    - 8111:8080
+   ...
+```
+
+ +
+2. Specify the DCU used by the PaddleOCR-VL service
+
+Edit `environment` in the `compose.yaml` file to change the DCU used. For example, if you need to use card 1 for deployment, make the following modifications:
+
+```diff
+ paddleocr-vl-api:
+   ...
+   environment:
++    - HIP_VISIBLE_DEVICES=1
+   ...
+ paddleocr-vlm-server:
+   ...
+   environment:
++    - HIP_VISIBLE_DEVICES=1
+   ...
+```
+
+
+ +
+3. Adjust VLM server-side configuration
+
+If you want to adjust the VLM server configuration, refer to 3.3.1 Server-side Parameter Adjustment to generate a configuration file.
+
+After generating the configuration file, add the following `paddleocr-vlm-server.volumes` and `paddleocr-vlm-server.command` fields to your `compose.yaml`. Replace `/path/to/your_config.yaml` with your actual configuration file path.
+
+```yaml
+  paddleocr-vlm-server:
+    ...
+    volumes:
+      - /path/to/your_config.yaml:/home/paddleocr/vlm_server_config.yaml
+    command: paddleocr genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend vllm --backend_config /home/paddleocr/vlm_server_config.yaml
+    ...
+```
+
+ +
+4. Adjust pipeline-related configurations (such as model path, batch size, deployment device, etc.) + +Refer to the 4.4 Pipeline Configuration Adjustment Instructions section. + +
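+Section 4.3 below defers to the main tutorial for client details. As a quick smoke test of the deployed service, a sketch like the following can be used; the `/layout-parsing` endpoint, the request fields, and the sample file `demo.png` are assumptions taken from the serving convention in the main tutorial, so adjust them if your deployment differs (`fileType` 1 denotes an image):
+
+```shell
+# Encode a local image and POST it to the pipeline service (default port 8080)
+# (GNU coreutils base64; use `base64 -b 0` on macOS)
+curl -s http://localhost:8080/layout-parsing \
+    -H "Content-Type: application/json" \
+    -d "{\"file\": \"$(base64 -w 0 demo.png)\", \"fileType\": 1}"
+```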
+ +### 4.3 Client Invocation Method + +Please refer to the corresponding section in the [PaddleOCR-VL Usage Tutorial](./PaddleOCR-VL.en.md). + +### 4.4 Pipeline Configuration Adjustment Instructions + +Please refer to the corresponding section in the [PaddleOCR-VL Usage Tutorial](./PaddleOCR-VL.en.md). + +## 5. Model Fine-Tuning + +Please refer to the corresponding section in the [PaddleOCR-VL Usage Tutorial](./PaddleOCR-VL.en.md). diff --git a/docs/version3.x/pipeline_usage/PaddleOCR-VL-DCU.md b/docs/version3.x/pipeline_usage/PaddleOCR-VL-DCU.md new file mode 100644 index 0000000000..186bd0948a --- /dev/null +++ b/docs/version3.x/pipeline_usage/PaddleOCR-VL-DCU.md @@ -0,0 +1,236 @@ +--- +comments: true +--- + +# PaddleOCR-VL DCU 环境配置教程 + +本教程是 PaddleOCR-VL 海光 DCU 的环境配置教程,目的是完成相关的环境配置,环境配置完毕后请参考 [PaddleOCR-VL 使用教程](./PaddleOCR-VL.md) 使用 PaddleOCR-VL。 + +## 1. 环境准备 + +此步骤主要介绍如何搭建 PaddleOCR-VL 的运行环境,有以下两种方式,任选一种即可: + +- 方法一:使用官方 Docker 镜像。 + +- 方法二:手动安装 PaddlePaddle 和 PaddleOCR。 + +### 1.1 方法一:使用 Docker 镜像 + +我们推荐使用官方 Docker 镜像(要求 Docker 版本 >= 19.03): + +```shell +docker run -it \ + --rm \ + --user root \ + --privileged \ + --device /dev/kfd \ + --device /dev/dri \ + --device /dev/mkfd \ + --group-add video \ + --cap-add SYS_PTRACE \ + --security-opt seccomp=unconfined \ + -v /opt/hyhal/:/opt/hyhal/:ro \ + --shm-size=64G \ + ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-dcu \ + /bin/bash +# 在容器中调用 PaddleOCR CLI 或 Python API +``` + +如果您希望在无法连接互联网的环境中启动服务,请将上述命令中的 `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-dcu`(镜像大小约为 21 GB)更换为离线版本镜像 `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-dcu-offline`(镜像大小约为 23 GB)。 + +### 1.2 方法二:手动安装 PaddlePaddle 和 PaddleOCR + +如果您无法使用 Docker,也可以手动安装 PaddlePaddle 和 PaddleOCR。要求 Python 版本为 3.8–3.12。 + +**我们强烈推荐您在虚拟环境中安装 PaddleOCR-VL,以避免发生依赖冲突。** 例如,使用 Python venv 标准库创建虚拟环境: + +```shell +# 创建虚拟环境 +python -m venv .venv_paddleocr +# 激活环境 +source .venv_paddleocr/bin/activate +``` + +执行如下命令完成安装: + +```shell +python -m pip install paddlepaddle-dcu==3.2.1 -i https://www.paddlepaddle.org.cn/packages/stable/dcu/ +python -m pip install -U "paddleocr[doc-parser]" +python -m pip install https://paddle-whl.bj.bcebos.com/nightly/cu126/safetensors/safetensors-0.6.2.dev0-cp38-abi3-linux_x86_64.whl +``` + +> **请注意安装 3.2.1 及以上版本的飞桨框架,同时安装特殊版本的 safetensors。** + +## 2. 快速开始 + +请参考[PaddleOCR-VL 使用教程](./PaddleOCR-VL.md)相同章节。 + +## 3. 
使用推理加速框架提升 VLM 推理性能 + +默认配置下的推理性能未经过充分优化,可能无法满足实际生产需求。此步骤主要介绍如何使用 vLLM 推理加速框架来提升 PaddleOCR-VL 的推理性能。 + +### 3.1 启动 VLM 推理服务 + +PaddleOCR 提供了 Docker 镜像,用于快速启动 vLLM 推理服务。可使用以下命令启动服务(要求 Docker 版本 >= 19.03): + +```shell +docker run -it \ + --rm \ + --user root \ + --privileged \ + --device /dev/kfd \ + --device /dev/dri \ + --device /dev/mkfd \ + --group-add video \ + --cap-add SYS_PTRACE \ + --security-opt seccomp=unconfined \ + -v /opt/hyhal/:/opt/hyhal/:ro \ + --shm-size=64G \ + ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest-dcu \ + paddleocr genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend vllm +``` + +如果您希望在无法连接互联网的环境中启动服务,请将上述命令中的 `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest-dcu`(镜像大小约为 25 GB)更换为离线版本镜像 `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest-dcu-offline`(镜像大小约为 27 GB)。 + +启动 vLLM 推理服务时,我们提供了一套默认参数设置。如果您有调整显存占用等更多参数的需求,可以自行配置更多参数。请参考 [3.3.1 服务端参数调整](./PaddleOCR-VL.md#331-服务端参数调整) 创建配置文件,然后将该文件挂载到容器中,并在启动服务的命令中使用 `backend_config` 指定配置文件,例如: + +```shell +docker run -it \ + --rm \ + --user root \ + --privileged \ + --device /dev/kfd \ + --device /dev/dri \ + --device /dev/mkfd \ + --group-add video \ + --cap-add SYS_PTRACE \ + --security-opt seccomp=unconfined \ + -v /opt/hyhal/:/opt/hyhal/:ro \ + -v vllm_config.yml:/tmp/vllm_config.yml \ + --shm-size=64G \ + ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest-dcu \ + paddleocr genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend vllm --backend_config /tmp/vllm_config.yml +``` + +### 3.2 客户端使用方法 + +请参考[PaddleOCR-VL 使用教程](./PaddleOCR-VL.md) 相同章节。 + +## 4. 服务化部署 + +>请注意,本节所介绍 PaddleOCR-VL 服务与上一节中的 VLM 推理服务有所区别:后者仅负责完整流程中的一个环节(即 VLM 推理),并作为前者的底层服务被调用。 + +此步骤主要介绍如何使用 Docker Compose 将 PaddleOCR-VL 部署为服务并调用,具体流程如下: + +1. 从 [此处](https://github.com/PaddlePaddle/PaddleOCR/blob/main/deploy/paddleocr_vl_docker/compose_dcu.yaml) 复制内容保存为 `compose.yaml` 文件。 + +2. 复制以下内容并保存为 `.env` 文件: + + ``` + API_IMAGE_TAG_SUFFIX=latest-dcu-offline + VLM_BACKEND=vllm + VLM_IMAGE_TAG_SUFFIX=latest-dcu-offline + ``` + +3. 在 `compose.yaml` 和 `.env` 文件所在目录下执行以下命令启动服务器,默认监听 **8080** 端口: + + ```shell + # 必须在 compose.yaml 和 .env 文件所在的目录中执行 + docker compose up + ``` + + 启动后将看到类似如下输出: + + ```text + paddleocr-vl-api | INFO: Started server process [1] + paddleocr-vl-api | INFO: Waiting for application startup. + paddleocr-vl-api | INFO: Application startup complete. + paddleocr-vl-api | INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit) + ``` + +此方式基于 vLLM 框架对 VLM 推理进行加速,更适合生产环境部署。 + +此外,使用此方式启动服务器后,除拉取镜像外,无需连接互联网。如需在离线环境中部署,可先在联网机器上拉取 Compose 文件中涉及的镜像,导出并传输至离线机器中导入,即可在离线环境下启动服务。 + +Docker Compose 通过读取 `.env` 和 `compose.yaml` 文件中配置,先后启动 2 个容器,分别运行底层 VLM 推理服务,以及 PaddleOCR-VL 服务(产线服务)。 + +`.env` 文件中包含的各环境变量含义如下: + +- `API_IMAGE_TAG_SUFFIX`:启动产线服务使用的镜像的标签后缀。 +- `VLM_BACKEND`:VLM 推理后端。 +- `VLM_IMAGE_TAG_SUFFIX`:启动 VLM 推理服务使用的镜像的标签后缀。 + +您可以通过修改 `compose.yaml` 来满足自定义需求,例如: + +
+1. 更改 PaddleOCR-VL 服务的端口
+
+编辑 `compose.yaml` 文件中的 `paddleocr-vl-api.ports` 来更改端口。例如,如果您需要将服务端口更换为 8111,可以进行以下修改:
+
+```diff
+ paddleocr-vl-api:
+   ...
+   ports:
+-    - 8080:8080
++    - 8111:8080
+   ...
+```
+
+ +
+2. 指定 PaddleOCR-VL 服务所使用的 DCU
+
+编辑 `compose.yaml` 文件中的 `environment` 来更改所使用的 DCU。例如,如果您需要使用卡 1 进行部署,可以进行以下修改:
+
+```diff
+ paddleocr-vl-api:
+   ...
+   environment:
++    - HIP_VISIBLE_DEVICES=1
+   ...
+ paddleocr-vlm-server:
+   ...
+   environment:
++    - HIP_VISIBLE_DEVICES=1
+   ...
+```
+
+
+ +
+3. 调整 VLM 服务端配置
+
+若您想调整 VLM 服务端的配置,可以参考 3.3.1 服务端参数调整 生成配置文件。
+
+生成配置文件后,将以下的 `paddleocr-vlm-server.volumes` 和 `paddleocr-vlm-server.command` 字段增加到您的 `compose.yaml` 中。请将 `/path/to/your_config.yaml` 替换为您的实际配置文件路径。
+
+```yaml
+  paddleocr-vlm-server:
+    ...
+    volumes:
+      - /path/to/your_config.yaml:/home/paddleocr/vlm_server_config.yaml
+    command: paddleocr genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend vllm --backend_config /home/paddleocr/vlm_server_config.yaml
+    ...
+```
+
+ +
+4. 调整产线相关配置(如模型路径、批处理大小、部署设备等) + +参考 4.4 产线配置调整说明 小节。 + +
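+上文提到,离线部署时可先在联网机器上导出镜像再导入离线机器。下面给出该导出/导入流程的一个最小示意(镜像名称基于上文 `.env` 示例中的标签后缀,请按实际取值调整):
+
+```shell
+# 在可联网的机器上:拉取 Compose 文件涉及的两个镜像
+docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-dcu-offline
+docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest-dcu-offline
+# 将两个镜像导出为一个归档文件,便于拷贝
+docker save -o paddleocr-vl-images.tar \
+    ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-dcu-offline \
+    ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest-dcu-offline
+# 传输至离线机器后:导入镜像
+docker load -i paddleocr-vl-images.tar
+```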
+ +### 4.3 客户端调用方式 + +请参考[PaddleOCR-VL 使用教程](./PaddleOCR-VL.md) 相同章节。 + +### 4.4 产线配置调整说明 + +请参考[PaddleOCR-VL 使用教程](./PaddleOCR-VL.md) 相同章节。 + +## 5. 模型微调 + +请参考[PaddleOCR-VL 使用教程](./PaddleOCR-VL.md) 相同章节。 diff --git a/docs/version3.x/pipeline_usage/PaddleOCR-VL-NVIDIA-Blackwell.en.md b/docs/version3.x/pipeline_usage/PaddleOCR-VL-NVIDIA-Blackwell.en.md new file mode 100644 index 0000000000..dd688c7343 --- /dev/null +++ b/docs/version3.x/pipeline_usage/PaddleOCR-VL-NVIDIA-Blackwell.en.md @@ -0,0 +1,303 @@ +--- +comments: true +--- + +# PaddleOCR-VL NVIDIA Blackwell-Architecture GPUs Environment Configuration Tutorial + +This tutorial provides guidance on configuring the environment for NVIDIA Blackwell-architecture GPUs. After completing the environment setup, please refer to the [PaddleOCR-VL Usage Tutorial](./PaddleOCR-VL.en.md) to use PaddleOCR-VL. + +NVIDIA Blackwell-architecture GPUs include, but are not limited to: + +- RTX 5090 +- RTX 5080 +- RTX 5070、RTX 5070 Ti +- RTX 5060、RTX 5060 Ti +- RTX 5050 + +Before starting the tutorial, **please ensure that your NVIDIA driver supports CUDA 12.9 or higher**. + +## 1. Environment Preparation + +This section introduces how to set up the PaddleOCR-VL runtime environment using one of the following two methods: + +- Method 1: Use the official Docker image. + +- Method 2: Manually install PaddlePaddle and PaddleOCR. + +### 1.1 Method 1: Using Docker Image + +We recommend using the official Docker image (requires Docker version >= 19.03, GPU-equipped machine with NVIDIA driver supporting CUDA 12.9 or higher): + +```shell +docker run \ + -it \ + --gpus all \ + --network host \ + --user root \ + ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-gpu-sm120 \ + /bin/bash +# Call PaddleOCR CLI or Python API in the container +``` + +If you wish to use PaddleOCR-VL in an offline environment, replace `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-gpu-sm120` (image size ~10 GB) in the above command with the offline version image `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-gpu-sm120-offline` (image size ~12 GB). + +### 1.2 Method 2: Manually Install PaddlePaddle and PaddleOCR + +If Docker is not an option, you can manually install PaddlePaddle and PaddleOCR. Python version 3.8–3.12 is required. + +**We strongly recommend installing PaddleOCR-VL in a virtual environment to avoid dependency conflicts.** For example, create a virtual environment using Python's standard venv library: + +```shell +# Create a virtual environment +python -m venv .venv_paddleocr +# Activate the environment +source .venv_paddleocr/bin/activate +``` + +Run the following commands to complete the installation: + +```shell +# Note that PaddlePaddle for cu129 is being installed here +python -m pip install paddlepaddle-gpu==3.2.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu129/ +python -m pip install -U "paddleocr[doc-parser]" +# For Linux systems, run: +python -m pip install https://paddle-whl.bj.bcebos.com/nightly/cu126/safetensors/safetensors-0.6.2.dev0-cp38-abi3-linux_x86_64.whl +# For Windows systems, run: +python -m pip install https://xly-devops.cdn.bcebos.com/safetensors-nightly/safetensors-0.6.2.dev0-cp38-abi3-win_amd64.whl +``` + +> **Please ensure that PaddlePaddle framework version 3.2.1 or higher is installed, along with the special version of safetensors.** + +## 2. Quick Start + +Please refer to the corresponding section in the [PaddleOCR-VL Usage Tutorial](./PaddleOCR-VL.en.md). + +## 3. 
Improving VLM Inference Performance Using Inference Acceleration Frameworks
+
+The inference performance under default configurations may not be fully optimized and may not meet actual production requirements. This section introduces how to use the vLLM and SGLang inference acceleration frameworks to enhance PaddleOCR-VL's inference performance.
+
+### 3.1 Starting the VLM Inference Service
+
+There are two methods to start the VLM inference service; choose one:
+
+- Method 1: Start the service using the official Docker image.
+
+- Method 2: Manually install dependencies and start the service via PaddleOCR CLI.
+
+#### 3.1.1 Method 1: Using Docker Image
+
+PaddleOCR provides a Docker image for quickly starting the vLLM inference service. Use the following command to start the service (requires Docker version >= 19.03, GPU-equipped machine with NVIDIA driver supporting CUDA 12.9 or higher):
+
+```shell
+docker run \
+    -it \
+    --rm \
+    --gpus all \
+    --network host \
+    ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest-gpu-sm120 \
+    paddleocr genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend vllm
+```
+
+If you wish to start the service in an offline environment, replace `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest-gpu-sm120` (image size ~12 GB) in the above command with the offline version image `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest-gpu-sm120-offline` (image size ~14 GB).
+
+When launching the vLLM inference service, we provide a set of default parameter settings. If you need to adjust parameters such as GPU memory usage, you can configure additional parameters yourself. Please refer to [3.3.1 Server-side Parameter Adjustment](./PaddleOCR-VL.en.md#331-server-side-parameter-adjustment) to create a configuration file, then mount the file into the container and specify the configuration file using `backend_config` in the command to start the service, for example:
+
+```shell
+docker run \
+    -it \
+    --rm \
+    --gpus all \
+    --network host \
+    -v vllm_config.yml:/tmp/vllm_config.yml \
+    ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest-gpu-sm120 \
+    paddleocr genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend vllm --backend_config /tmp/vllm_config.yml
+```
+
+#### 3.1.2 Method 2: Installation and Usage via PaddleOCR CLI
+
+Since inference acceleration frameworks may have dependency conflicts with the PaddlePaddle framework, installation in a virtual environment is recommended. Taking vLLM as an example:
+
+```shell
+# If there is an active virtual environment, deactivate it first using `deactivate`
+# Create a virtual environment
+python -m venv .venv_vlm
+# Activate the environment
+source .venv_vlm/bin/activate
+# Install PaddleOCR
+python -m pip install "paddleocr[doc-parser]"
+# Install dependencies for inference acceleration services
+paddleocr install_genai_server_deps vllm
+python -m pip install flash-attn==2.8.3
+```
+
+> The `paddleocr install_genai_server_deps` command may require CUDA compilation tools such as nvcc during execution. If these tools are not available in your environment or the installation takes too long, you can obtain a pre-compiled version of FlashAttention from [this repository](https://github.com/mjun0812/flash-attention-prebuild-wheels). 
For example, run `python -m pip install https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.3.14/flash_attn-2.8.2+cu128torch2.8-cp310-cp310-linux_x86_64.whl`. + +Usage of the `paddleocr install_genai_server_deps` command: + +```shell +paddleocr install_genai_server_deps +``` + +Currently supported framework names are `vllm` and `sglang`, corresponding to vLLM and SGLang, respectively. + +After installation, you can start the service using the `paddleocr genai_server` command: + +```shell +paddleocr genai_server --model_name PaddleOCR-VL-0.9B --backend vllm --port 8118 +``` + +The parameters supported by this command are as follows: + +| Parameter | Description | +|------------------|-----------------------------------------------------------------------------| +| `--model_name` | Name of the model | +| `--model_dir` | Directory containing the model | +| `--host` | Server hostname | +| `--port` | Server port number | +| `--backend` | Backend name, i.e., the name of the inference acceleration framework being used; options are `vllm` or `sglang` | +| `--backend_config`| YAML file specifying backend configuration | + +### 3.2 Client Usage + +Please refer to the corresponding section in the [PaddleOCR-VL Usage Tutorial](./PaddleOCR-VL.en.md). + +## 4. Service Deployment + +This section mainly introduces how to deploy PaddleOCR-VL as a service and invoke it. There are two methods available; choose one: + +- Method 1: Deploy using Docker Compose. + +- Method 2: Manually install dependencies for deployment. + +Please note that the PaddleOCR-VL service introduced in this section differs from the VLM inference service in the previous section: the latter is responsible for only one part of the complete process (i.e., VLM inference) and is called as an underlying service by the former. + +### 4.1 Method 1: Deploy Using Docker Compose + +1. Copy the content from [here](https://github.com/PaddlePaddle/PaddleOCR/blob/main/deploy/paddleocr_vl_docker/compose.yaml) and save it as a `compose.yaml` file. + +2. Copy the following content and save it as a `.env` file: + + ``` + API_IMAGE_TAG_SUFFIX=latest-gpu-sm120-offline + VLM_BACKEND=vllm + VLM_IMAGE_TAG_SUFFIX=latest-gpu-sm120-offline + ``` + +3. Execute the following command in the directory containing the `compose.yaml` and `.env` files to start the server, which will listen on port **8080** by default: + + ```shell + # Must be executed in the directory containing compose.yaml and .env files + docker compose up + ``` + + After startup, you will see output similar to the following: + + ```text + paddleocr-vl-api | INFO: Started server process [1] + paddleocr-vl-api | INFO: Waiting for application startup. + paddleocr-vl-api | INFO: Application startup complete. + paddleocr-vl-api | INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit) + ``` + +This method accelerates VLM inference using the vLLM framework and is more suitable for production environment deployment. + +Additionally, after starting the server in this manner, no internet connection is required except for image pulling. For deployment in an offline environment, you can first pull the images involved in the Compose file on a connected machine, export them, and transfer them to the offline machine for import to start the service in an offline environment. 
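+
+For instance, a minimal sketch of that export/import flow might look like the following (the image names assume the tag suffixes from the `.env` example above; adjust them to your actual settings):
+
+```shell
+# On a machine with internet access: pull the two images referenced by the Compose file
+docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-gpu-sm120-offline
+docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest-gpu-sm120-offline
+# Export both images into a single archive for transfer
+docker save -o paddleocr-vl-images.tar \
+    ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-gpu-sm120-offline \
+    ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest-gpu-sm120-offline
+# On the offline machine, after transferring the archive: import the images
+docker load -i paddleocr-vl-images.tar
+```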
+
+Docker Compose starts two containers sequentially by reading configurations from the `.env` and `compose.yaml` files, running the underlying VLM inference service and the PaddleOCR-VL service (pipeline service) respectively.
+
+The meanings of each environment variable contained in the `.env` file are as follows:
+
+- `API_IMAGE_TAG_SUFFIX`: The tag suffix of the image used to launch the pipeline service.
+- `VLM_BACKEND`: The VLM inference backend.
+- `VLM_IMAGE_TAG_SUFFIX`: The tag suffix of the image used to launch the VLM inference service.
+
+You can modify `compose.yaml` to meet custom requirements, for example:
+
+1. Change the port of the PaddleOCR-VL service
+
+Edit `paddleocr-vl-api.ports` in the `compose.yaml` file to change the port. For example, if you need to change the service port to 8111, make the following modifications:
+
+```diff
+ paddleocr-vl-api:
+   ...
+   ports:
+-    - 8080:8080
++    - 8111:8080
+   ...
+```
+
+ +
+2. Specify the GPU used by the PaddleOCR-VL service
+
+Edit the `device_ids` fields under `deploy.resources.reservations.devices` in the `compose.yaml` file to change the GPU used. For example, if you need to use card 1 for deployment, make the following modifications:
+
+```diff
+ paddleocr-vl-api:
+   ...
+   deploy:
+     resources:
+       reservations:
+         devices:
+           - driver: nvidia
+-            device_ids: ["0"]
++            device_ids: ["1"]
+             capabilities: [gpu]
+   ...
+ paddleocr-vlm-server:
+   ...
+   deploy:
+     resources:
+       reservations:
+         devices:
+           - driver: nvidia
+-            device_ids: ["0"]
++            device_ids: ["1"]
+             capabilities: [gpu]
+   ...
+```
+
+ +
+3. Adjust VLM server-side configuration
+
+If you want to adjust the VLM server configuration, refer to 3.3.1 Server-side Parameter Adjustment to generate a configuration file.
+
+After generating the configuration file, add the following `paddleocr-vlm-server.volumes` and `paddleocr-vlm-server.command` fields to your `compose.yaml`. Replace `/path/to/your_config.yaml` with your actual configuration file path.
+
+```yaml
+  paddleocr-vlm-server:
+    ...
+    volumes:
+      - /path/to/your_config.yaml:/home/paddleocr/vlm_server_config.yaml
+    command: paddleocr genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend vllm --backend_config /home/paddleocr/vlm_server_config.yaml
+    ...
+```
+
+ +
+4. Adjust pipeline-related configurations (such as model path, batch size, deployment device, etc.) + +Refer to the 4.4 Pipeline Configuration Adjustment Instructions section. + +
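+Section 4.2 below defers to the main tutorial for manual deployment. For orientation, that route is driven by the PaddleX CLI, roughly as follows (see the main tutorial for the full steps):
+
+```shell
+# Install the serving plugin via the PaddleX CLI
+paddlex --install serving
+# Start the server for the PaddleOCR-VL pipeline (listens on port 8080 by default)
+paddlex --serve --pipeline PaddleOCR-VL
+```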
+ +### 4.2 Method 2: Manually Deployment + +Please refer to the corresponding section in the [PaddleOCR-VL Usage Tutorial](./PaddleOCR-VL.en.md). + +### 4.3 Client Invocation Methods + +Please refer to the corresponding section in the [PaddleOCR-VL Usage Tutorial](./PaddleOCR-VL.en.md). + +### 4.4 Pipeline Configuration Adjustment Instructions + +Please refer to the corresponding section in the [PaddleOCR-VL Usage Tutorial](./PaddleOCR-VL.en.md). + +## 5. Model Fine-Tuning + +Please refer to the corresponding section in the [PaddleOCR-VL Usage Tutorial](./PaddleOCR-VL.en.md). diff --git a/docs/version3.x/pipeline_usage/PaddleOCR-VL-NVIDIA-Blackwell.md b/docs/version3.x/pipeline_usage/PaddleOCR-VL-NVIDIA-Blackwell.md new file mode 100644 index 0000000000..c38c7e89e3 --- /dev/null +++ b/docs/version3.x/pipeline_usage/PaddleOCR-VL-NVIDIA-Blackwell.md @@ -0,0 +1,302 @@ +--- +comments: true +--- + +# PaddleOCR-VL NVIDIA Blackwell 架构 GPU 环境配置教程 + +本教程是 NVIDIA Blackwell 架构 GPU 的环境配置教程,目的是完成相关的环境配置,环境配置完毕后请参考 [PaddleOCR-VL 使用教程](./PaddleOCR-VL.md) 使用 PaddleOCR-VL。 + +NVIDIA Blackwell 架构 GPU 包括但不限于以下几种: + +- RTX 5090 +- RTX 5080 +- RTX 5070、RTX 5070 Ti +- RTX 5060、RTX 5060 Ti +- RTX 5050 + +教程开始前,**请确认您的 NVIDIA 驱动支持 CUDA 12.9 或以上版本**。 + +## 1. 环境准备 + +此步骤主要介绍如何搭建 PaddleOCR-VL 的运行环境,有以下两种方式,任选一种即可: + +- 方法一:使用官方 Docker 镜像。 + +- 方法二:手动安装 PaddlePaddle 和 PaddleOCR。 + +### 1.1 方法一:使用 Docker 镜像 + +我们推荐使用官方 Docker 镜像(要求 Docker 版本 >= 19.03,机器装配有 GPU 且 NVIDIA 驱动支持 CUDA 12.9 或以上版本): + +```shell +docker run \ + -it \ + --gpus all \ + --network host \ + --user root \ + ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-gpu-sm120 \ + /bin/bash +# 在容器中调用 PaddleOCR CLI 或 Python API +``` + +如果您希望在无法连接互联网的环境中使用 PaddleOCR-VL,请将上述命令中的 `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-gpu-sm120`(镜像大小约为 10 GB)更换为离线版本镜像 `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-gpu-sm120-offline`(镜像大小约为 12 GB)。 + +### 1.2 方法二:手动安装 PaddlePaddle 和 PaddleOCR + +如果您无法使用 Docker,也可以手动安装 PaddlePaddle 和 PaddleOCR。要求 Python 版本为 3.8–3.12。 + +**我们强烈推荐您在虚拟环境中安装 PaddleOCR-VL,以避免发生依赖冲突。** 例如,使用 Python venv 标准库创建虚拟环境: + +```shell +# 创建虚拟环境 +python -m venv .venv_paddleocr +# 激活环境 +source .venv_paddleocr/bin/activate +``` + +执行如下命令完成安装: + +```shell +# 注意这里安装的是 cu129 的 PaddlePaddle +python -m pip install paddlepaddle-gpu==3.2.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu129/ +python -m pip install -U "paddleocr[doc-parser]" +# 对于 Linux 系统,执行: +python -m pip install https://paddle-whl.bj.bcebos.com/nightly/cu126/safetensors/safetensors-0.6.2.dev0-cp38-abi3-linux_x86_64.whl +# 对于 Windows 系统,执行: +python -m pip install https://xly-devops.cdn.bcebos.com/safetensors-nightly/safetensors-0.6.2.dev0-cp38-abi3-win_amd64.whl +``` + +> **请注意安装 3.2.1 及以上版本的飞桨框架,同时安装特殊版本的 safetensors。** + +## 2. 快速开始 + +请参考[PaddleOCR-VL 使用教程](./PaddleOCR-VL.md)相同章节。 + +## 3. 
使用推理加速框架提升 VLM 推理性能 + +默认配置下的推理性能未经过充分优化,可能无法满足实际生产需求。此步骤主要介绍如何使用 vLLM 和 SGLang 推理加速框架来提升 PaddleOCR-VL 的推理性能。 + +### 3.1 启动 VLM 推理服务 + +启动 VLM 推理服务有以下两种方式,任选一种即可: + +- 方法一:使用官方 Docker 镜像启动服务。 + +- 方法二:通过 PaddleOCR CLI 手动安装依赖后启动服务。 + +#### 3.1.1 方法一:使用 Docker 镜像 + +PaddleOCR 提供了 Docker 镜像,用于快速启动 vLLM 推理服务。可使用以下命令启动服务(要求 Docker 版本 >= 19.03,机器装配有 GPU 且 NVIDIA 驱动支持 CUDA 12.9 或以上版本): + +```shell +docker run \ + -it \ + --rm \ + --gpus all \ + --network host \ + ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest-gpu-sm120 \ + paddleocr genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend vllm +``` + +如果您希望在无法连接互联网的环境中启动服务,请将上述命令中的 `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest-gpu-sm120`(镜像大小约为 12 GB)更换为离线版本镜像 `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest-gpu-sm120-offline`(镜像大小约为 14 GB)。 + +启动 vLLM 推理服务时,我们提供了一套默认参数设置。如果您有调整显存占用等更多参数的需求,可以自行配置更多参数。请参考 [3.3.1 服务端参数调整](#331-服务端参数调整) 创建配置文件,然后将该文件挂载到容器中,并在启动服务的命令中使用 `backend_config` 指定配置文件,例如: + +```shell +docker run \ + -it \ + --rm \ + --gpus all \ + --network host \ + -v vllm_config.yml:/tmp/vllm_config.yml \ + ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest-gpu-sm120 \ + paddleocr genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend vllm --backend_config /tmp/vllm_config.yml +``` + +#### 3.1.2 方法二:通过 PaddleOCR CLI 安装和使用 + +由于推理加速框架可能与飞桨框架存在依赖冲突,建议在虚拟环境中安装。以 vLLM 为例: + +```shell +# 如果当前存在已激活的虚拟环境,先通过 `deactivate` 取消激活 +# 创建虚拟环境 +python -m venv .venv_vlm +# 激活环境 +source .venv_vlm/bin/activate +# 安装 PaddleOCR +python -m pip install "paddleocr[doc-parser]" +# 安装推理加速服务依赖 +paddleocr install_genai_server_deps vllm +python -m pip install flash-attn==2.8.3 +``` + +> `paddleocr install_genai_server_deps` 命令在执行过程中可能需要使用 nvcc 等 CUDA 编译工具。如果您的环境中没有这些工具或者安装时间过长,可以从 [此仓库](https://github.com/mjun0812/flash-attention-prebuild-wheels) 获取 FlashAttention 的预编译版本,例如执行 `python -m pip install https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.3.14/flash_attn-2.8.2+cu128torch2.8-cp310-cp310-linux_x86_64.whl`。 + +`paddleocr install_genai_server_deps` 命令用法: + +```shell +paddleocr install_genai_server_deps <推理加速框架名称> +``` + +当前支持的框架名称为 `vllm` 和 `sglang`,分别对应 vLLM 和 SGLang。 + +安装完成后,可通过 `paddleocr genai_server` 命令启动服务: + +```shell +paddleocr genai_server --model_name PaddleOCR-VL-0.9B --backend vllm --port 8118 +``` + +该命令支持的参数如下: + +| 参数 | 说明 | +| ------------------ | ------------------------- | +| `--model_name` | 模型名称 | +| `--model_dir` | 模型目录 | +| `--host` | 服务器主机名 | +| `--port` | 服务器端口号 | +| `--backend` | 后端名称,即使用的推理加速框架名称,可选 `vllm` 或 `sglang` | +| `--backend_config` | 可指定 YAML 文件,包含后端配置 | + +### 3.2 客户端使用方法 + +请参考[PaddleOCR-VL 使用教程](./PaddleOCR-VL.md) 相同章节。 + +## 4. 服务化部署 + +此步骤主要介绍如何将 PaddleOCR-VL 部署为服务并调用,有以下两种方式,任选一种即可: + +- 方法一:使用 Docker Compose 部署。 + +- 方法二:手动安装依赖部署。 + +请注意,本节所介绍 PaddleOCR-VL 服务与上一节中的 VLM 推理服务有所区别:后者仅负责完整流程中的一个环节(即 VLM 推理),并作为前者的底层服务被调用。 + +### 4.1 方法一:使用 Docker Compose 部署 + +1. 从 [此处](https://github.com/PaddlePaddle/PaddleOCR/blob/main/deploy/paddleocr_vl_docker/compose.yaml) 复制内容保存为 `compose.yaml` 文件。 + +2. 复制以下内容并保存为 `.env` 文件: + + ``` + API_IMAGE_TAG_SUFFIX=latest-gpu-sm120-offline + VLM_BACKEND=vllm + VLM_IMAGE_TAG_SUFFIX=latest-gpu-sm120-offline + ``` + +3. 
在 `compose.yaml` 和 `.env` 文件所在目录下执行以下命令启动服务器,默认监听 **8080** 端口: + + ```shell + # 必须在 compose.yaml 和 .env 文件所在的目录中执行 + docker compose up + ``` + + 启动后将看到类似如下输出: + + ```text + paddleocr-vl-api | INFO: Started server process [1] + paddleocr-vl-api | INFO: Waiting for application startup. + paddleocr-vl-api | INFO: Application startup complete. + paddleocr-vl-api | INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit) + ``` + +此方式基于 vLLM 框架对 VLM 推理进行加速,更适合生产环境部署。 + +此外,使用此方式启动服务器后,除拉取镜像外,无需连接互联网。如需在离线环境中部署,可先在联网机器上拉取 Compose 文件中涉及的镜像,导出并传输至离线机器中导入,即可在离线环境下启动服务。 + +Docker Compose 通过读取 `.env` 和 `compose.yaml` 文件中配置,先后启动 2 个容器,分别运行底层 VLM 推理服务,以及 PaddleOCR-VL 服务(产线服务)。 + +`.env` 文件中包含的各环境变量含义如下: + +- `API_IMAGE_TAG_SUFFIX`:启动产线服务使用的镜像的标签后缀。 +- `VLM_BACKEND`:VLM 推理后端。 +- `VLM_IMAGE_TAG_SUFFIX`:启动 VLM 推理服务使用的镜像的标签后缀。 + +您可以通过修改 `compose.yaml` 来满足自定义需求,例如: + +
+1. 更改 PaddleOCR-VL 服务的端口
+
+编辑 `compose.yaml` 文件中的 `paddleocr-vl-api.ports` 来更改端口。例如,如果您需要将服务端口更换为 8111,可以进行以下修改:
+
+```diff
+ paddleocr-vl-api:
+   ...
+   ports:
+-    - 8080:8080
++    - 8111:8080
+   ...
+```
+
+ +
+2. 指定 PaddleOCR-VL 服务所使用的 GPU
+
+编辑 `compose.yaml` 文件中 `deploy.resources.reservations.devices` 下的 `device_ids` 来更改所使用的 GPU。例如,如果您需要使用卡 1 进行部署,可以进行以下修改:
+
+```diff
+ paddleocr-vl-api:
+   ...
+   deploy:
+     resources:
+       reservations:
+         devices:
+           - driver: nvidia
+-            device_ids: ["0"]
++            device_ids: ["1"]
+             capabilities: [gpu]
+   ...
+ paddleocr-vlm-server:
+   ...
+   deploy:
+     resources:
+       reservations:
+         devices:
+           - driver: nvidia
+-            device_ids: ["0"]
++            device_ids: ["1"]
+             capabilities: [gpu]
+   ...
+```
+
+ +
+3. 调整 VLM 服务端配置
+
+若您想调整 VLM 服务端的配置,可以参考 3.3.1 服务端参数调整 生成配置文件。
+
+生成配置文件后,将以下的 `paddleocr-vlm-server.volumes` 和 `paddleocr-vlm-server.command` 字段增加到您的 `compose.yaml` 中。请将 `/path/to/your_config.yaml` 替换为您的实际配置文件路径。
+
+```yaml
+  paddleocr-vlm-server:
+    ...
+    volumes:
+      - /path/to/your_config.yaml:/home/paddleocr/vlm_server_config.yaml
+    command: paddleocr genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend vllm --backend_config /home/paddleocr/vlm_server_config.yaml
+    ...
+```
+
+ +
+4. 调整产线相关配置(如模型路径、批处理大小、部署设备等) + +参考 4.4 产线配置调整说明 小节。 + +
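+下文 4.2 小节的手动部署方式请以主教程为准。作为参考,该方式大致通过 PaddleX CLI 完成,流程类似如下(完整步骤请见主教程):
+
+```shell
+# 通过 PaddleX CLI 安装服务化部署插件
+paddlex --install serving
+# 启动 PaddleOCR-VL 产线服务(默认监听 8080 端口)
+paddlex --serve --pipeline PaddleOCR-VL
+```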
+ +### 4.2 方法二:手动部署 + +请参考[PaddleOCR-VL 使用教程](./PaddleOCR-VL.md) 相同章节。 + +### 4.3 客户端调用方式 + +请参考[PaddleOCR-VL 使用教程](./PaddleOCR-VL.md) 相同章节。 + +### 4.4 产线配置调整说明 + +请参考[PaddleOCR-VL 使用教程](./PaddleOCR-VL.md) 相同章节。 + +## 5. 模型微调 + +请参考[PaddleOCR-VL 使用教程](./PaddleOCR-VL.md) 相同章节。 diff --git a/docs/version3.x/pipeline_usage/PaddleOCR-VL-RTX50.en.md b/docs/version3.x/pipeline_usage/PaddleOCR-VL-RTX50.en.md deleted file mode 100644 index e5447f1f97..0000000000 --- a/docs/version3.x/pipeline_usage/PaddleOCR-VL-RTX50.en.md +++ /dev/null @@ -1,201 +0,0 @@ ---- -comments: true ---- - -# PaddleOCR-VL-RTX50 Environment Configuration Tutorial - -This tutorial is an environment configuration guide for NVIDIA RTX 50 series GPUs, aiming to complete the relevant environment setup. After completing the environment configuration, please refer to the [PaddleOCR-VL Usage Tutorial](./PaddleOCR-VL.en.md) to use PaddleOCR-VL. - -Before starting the tutorial, **please confirm that your NVIDIA driver supports CUDA 12.9 or later**. - -## 1. Environment Preparation - -This section mainly introduces how to set up the runtime environment for PaddleOCR-VL. There are two methods below; choose either one: - -- Method 1: Use the official Docker image (not currently supported, adaptation in progress). - -- Method 2: Manually install PaddlePaddle and PaddleOCR. - -### 1.1 Method 1: Using Docker Image - -Not currently supported, adaptation in progress. - -### 1.2 Method 2: Manually Install PaddlePaddle and PaddleOCR - -If you cannot use Docker, you can also manually install PaddlePaddle and PaddleOCR. Python version 3.8–3.12 is required. - -**We strongly recommend installing PaddleOCR-VL in a virtual environment to avoid dependency conflicts.** For example, use the Python `venv` standard library to create a virtual environment: - -```shell -# Create a virtual environment -python -m venv .venv_paddleocr -# Activate the environment -source .venvenv_paddleocr/bin/activate -``` - -Run the following commands to complete the installation: - -```shell -python -m pip install paddlepaddle-gpu==3.2.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu129/ -python -m pip install -U "paddleocr[doc-parser]" -# For Linux systems, run: -python -m pip install https://paddle-whl.bj.bcebos.com/nightly/cu126/safetensors/safetensors-0.6.2.dev0-cp38-abi3-linux_x86_64.whl -# For Windows systems, run: -python -m pip install https://xly-devops.cdn.bcebos.com/safetensors-nightly/safetensors-0.6.2.dev0-cp38-abi3-win_amd64.whl -``` - -> **Please ensure that you install PaddlePaddle version 3.2.1 or later, along with the special version of safetensors.** - -## 2. Quick Start - -Please refer to the same section in the [PaddleOCR-VL Usage Tutorial](./PaddleOCR-VL.en.md). - -## 3. Improving VLM Inference Performance Using Inference Acceleration Frameworks - -The inference performance under default configurations is not fully optimized and may not meet actual production requirements. This section mainly introduces how to use the vLLM and SGLang inference acceleration frameworks to improve the inference performance of PaddleOCR-VL. - -### 3.1 Starting the VLM Inference Service - -There are two methods to start the VLM inference service; choose either one: - -- Method 1: Use the official Docker image to start the service (not currently supported, adaptation in progress). - -- Method 2: Manually install dependencies via the PaddleOCR CLI and then start the service. - -#### 3.1.1 Method 1: Using Docker Image - -Not currently supported, adaptation in progress. 
- -#### 3.1.2 Method 2: Installation and Usage via PaddleOCR CLI - -Since inference acceleration frameworks may have dependency conflicts with PaddlePaddle, it is recommended to install them in a virtual environment. Taking vLLM as an example: - -```shell -# If there is an currently activated virtual environment, deactivate it first using `deactivate` -# Create a virtual environment -python -m venv .venv_vlm -# Activate the environment -source .venv_vlm/bin/activate -# Install PaddleOCR -python -m pip install "paddleocr[doc-parser]" -# Install dependencies for the inference acceleration service -paddleocr install_genai_server_deps vllm -python -m pip install flash-attn==2.8.3 -``` - -> The `paddleocr install_genai_server_deps` command may require CUDA compilation tools such as `nvcc` during execution. If these tools are not available in your environment or the installation takes too long, you can obtain a precompiled version of FlashAttention from [this repository](https://github.com/mjun0812/flash-attention-prebuild-wheels), for example, by running `python -m pip install https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.3.14/flash_attn-2.8.2+cu128torch2.8-cp310-cp310-linux_x86_64.whl`. - -Usage of the `paddleocr install_genai_server_deps` command: - -```shell -paddleocr install_genai_server_deps -``` - -The currently supported framework names are `vllm` and `sglang`, corresponding to vLLM and SGLang, respectively. - -After installation, you can start the service using the `paddleocr genai_server` command: - -```shell -paddleocr genai_server --model_name PaddleOCR-VL-0.9B --backend vllm --port 8118 -``` - -The supported parameters for this command are as follows: - -| Parameter | Description | -|-------------------|--------------------------------------| -| `--model_name` | Model name | -| `--model_dir` | Model directory | -| `--host` | Server hostname | -| `--port` | Server port number | -| `--backend` | Backend name, i.e., the name of the inference acceleration framework used; options are `vllm` or `sglang` | -| `--backend_config` | Specify a YAML file containing backend configurations | - -### 3.2 Client Usage - -Please refer to the same section in the [PaddleOCR-VL Usage Tutorial](./PaddleOCR-VL.en.md).## 4. Service-Oriented Deployment - -This section primarily introduces how to deploy PaddleOCR-VL as a service and invoke it. There are two methods available; choose either one: - -- Method 1: Deployment using Docker Compose (currently not supported, adaptation in progress). - -- Method 2: Manual installation of dependencies for deployment. - -Please note that the PaddleOCR-VL service introduced in this section differs from the VLM inference service in the previous section: the latter is responsible for only one part of the complete workflow (i.e., VLM inference) and is called as an underlying service by the former. - -### 4.1 Method 1: Deployment Using Docker Compose - -Currently not supported, adaptation in progress. - -### 4.2 Method 2: Manual Installation of Dependencies for Deployment - -Execute the following commands to install the service deployment plugin via the PaddleX CLI: - -```shell -paddlex --install serving -``` - -Then, start the server using the PaddleX CLI: - -```shell -paddlex --serve --pipeline PaddleOCR-VL -``` - -After startup, you will see output similar to the following. The server listens on port **8080** by default: - -```text -INFO: Started server process [63108] -INFO: Waiting for application startup. 
-INFO: Application startup complete. -INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit) -``` - -The command-line parameters related to service-oriented deployment are as follows: - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
NameDescription
--pipelineRegistered name of the PaddleX pipeline or path to the pipeline configuration file.
--deviceDevice for pipeline deployment. By default, the GPU is used if available; otherwise, the CPU is used.
--hostHostname or IP address to which the server is bound. The default is 0.0.0.0.
--portPort number on which the server listens. The default is 8080.
--use_hpipEnable high-performance inference mode. Refer to the high-performance inference documentation for more information.
--hpi_configHigh-performance inference configuration. Refer to the high-performance inference documentation for more information.
- -To adjust pipeline-related configurations (such as model paths, batch sizes, deployment devices, etc.), refer to Section 4.4. - -### 4.3 Client Invocation Method - -Refer to the corresponding section in the [PaddleOCR-VL Usage Tutorial](./PaddleOCR-VL.en.md). - -### 4.4 Pipeline Configuration Adjustment Instructions - -Refer to the corresponding section in the [PaddleOCR-VL Usage Tutorial](./PaddleOCR-VL.en.md). - -## 5. Model Fine-Tuning - -Refer to the corresponding section in the [PaddleOCR-VL Usage Tutorial](./PaddleOCR-VL.en.md). diff --git a/docs/version3.x/pipeline_usage/PaddleOCR-VL-RTX50.md b/docs/version3.x/pipeline_usage/PaddleOCR-VL-RTX50.md deleted file mode 100644 index 4e8e254341..0000000000 --- a/docs/version3.x/pipeline_usage/PaddleOCR-VL-RTX50.md +++ /dev/null @@ -1,203 +0,0 @@ ---- -comments: true ---- - -# PaddleOCR-VL-RTX50 环境配置教程 - -本教程是 NVIDIA RTX 50 系 GPU 的环境配置教程,目的是完成相关的环境配置,环境配置完毕后请参考 [PaddleOCR-VL 使用教程](./PaddleOCR-VL.md) 使用 PaddleOCR-VL。 - -教程开始前,**请确认您的 NVIDIA 驱动支持 CUDA 12.9 或以上版本**。 - -## 1. 环境准备 - -此步骤主要介绍如何搭建 PaddleOCR-VL 的运行环境,有以下两种方式,任选一种即可: - -- 方法一:使用官方 Docker 镜像(暂不支持,正在适配中)。 - -- 方法二:手动安装 PaddlePaddle 和 PaddleOCR。 - -### 1.1 方法一:使用 Docker 镜像 - -暂不支持,正在适配中。 - -### 1.2 方法二:手动安装 PaddlePaddle 和 PaddleOCR - -如果您无法使用 Docker,也可以手动安装 PaddlePaddle 和 PaddleOCR。要求 Python 版本为 3.8–3.12。 - -**我们强烈推荐您在虚拟环境中安装 PaddleOCR-VL,以避免发生依赖冲突。** 例如,使用 Python venv 标准库创建虚拟环境: - -```shell -# 创建虚拟环境 -python -m venv .venv_paddleocr -# 激活环境 -source .venv_paddleocr/bin/activate -``` - -执行如下命令完成安装: - -```shell -python -m pip install paddlepaddle-gpu==3.2.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu129/ -python -m pip install -U "paddleocr[doc-parser]" -# 对于 Linux 系统,执行: -python -m pip install https://paddle-whl.bj.bcebos.com/nightly/cu126/safetensors/safetensors-0.6.2.dev0-cp38-abi3-linux_x86_64.whl -# 对于Windows 系统,执行: -python -m pip install https://xly-devops.cdn.bcebos.com/safetensors-nightly/safetensors-0.6.2.dev0-cp38-abi3-win_amd64.whl -``` - -> **请注意安装 3.2.1 及以上版本的飞桨框架,同时安装特殊版本的 safetensors。** - -## 2. 快速开始 - -请参考[PaddleOCR-VL 使用教程](./PaddleOCR-VL.md)相同章节。 - -## 3. 
使用推理加速框架提升 VLM 推理性能 - -默认配置下的推理性能未经过充分优化,可能无法满足实际生产需求。此步骤主要介绍如何使用 vLLM 和 SGLang 推理加速框架来提升 PaddleOCR-VL 的推理性能。 - -### 3.1 启动 VLM 推理服务 - -启动 VLM 推理服务有以下两种方式,任选一种即可: - -- 方法一:使用官方 Docker 镜像启动服务(暂不支持,正在适配中)。 - -- 方法二:通过 PaddleOCR CLI 手动安装依赖后启动服务。 - -#### 3.1.1 方法一:使用 Docker 镜像 - -暂不支持,正在适配中。 - -#### 3.1.2 方法二:通过 PaddleOCR CLI 安装和使用 - -由于推理加速框架可能与飞桨框架存在依赖冲突,建议在虚拟环境中安装。以 vLLM 为例: - -```shell -# 如果当前存在已激活的虚拟环境,先通过 `deactivate` 取消激活 -# 创建虚拟环境 -python -m venv .venv_vlm -# 激活环境 -source .venv_vlm/bin/activate -# 安装 PaddleOCR -python -m pip install "paddleocr[doc-parser]" -# 安装推理加速服务依赖 -paddleocr install_genai_server_deps vllm -python -m pip install flash-attn==2.8.3 -``` - -> `paddleocr install_genai_server_deps` 命令在执行过程中可能需要使用 nvcc 等 CUDA 编译工具。如果您的环境中没有这些工具或者安装时间过长,可以从 [此仓库](https://github.com/mjun0812/flash-attention-prebuild-wheels) 获取 FlashAttention 的预编译版本,例如执行 `python -m pip install https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.3.14/flash_attn-2.8.2+cu128torch2.8-cp310-cp310-linux_x86_64.whl`。 - -`paddleocr install_genai_server_deps` 命令用法: - -```shell -paddleocr install_genai_server_deps <推理加速框架名称> -``` - -当前支持的框架名称为 `vllm` 和 `sglang`,分别对应 vLLM 和 SGLang。 - -安装完成后,可通过 `paddleocr genai_server` 命令启动服务: - -```shell -paddleocr genai_server --model_name PaddleOCR-VL-0.9B --backend vllm --port 8118 -``` - -该命令支持的参数如下: - -| 参数 | 说明 | -| ------------------ | ------------------------- | -| `--model_name` | 模型名称 | -| `--model_dir` | 模型目录 | -| `--host` | 服务器主机名 | -| `--port` | 服务器端口号 | -| `--backend` | 后端名称,即使用的推理加速框架名称,可选 `vllm` 或 `sglang` | -| `--backend_config` | 可指定 YAML 文件,包含后端配置 | - -### 3.2 客户端使用方法 - -请参考[PaddleOCR-VL 使用教程](./PaddleOCR-VL.md) 相同章节。 - -## 4. 服务化部署 - -此步骤主要介绍如何将 PaddleOCR-VL 部署为服务并调用,有以下两种方式,任选一种即可: - -- 方法一:使用 Docker Compose 部署(暂不支持,正在适配中)。 - -- 方法二:手动安装依赖部署。 - -请注意,本节所介绍 PaddleOCR-VL 服务与上一节中的 VLM 推理服务有所区别:后者仅负责完整流程中的一个环节(即 VLM 推理),并作为前者的底层服务被调用。 - -### 4.1 方法一:使用 Docker Compose 部署 - -暂不支持,正在适配中。 - -### 4.2 方法二:手动安装依赖部署 - -执行以下命令,通过 PaddleX CLI 安装服务化部署插件: - -```shell -paddlex --install serving -``` - -然后,使用 PaddleX CLI 启动服务器: - -```shell -paddlex --serve --pipeline PaddleOCR-VL -``` - -启动后将看到类似如下输出,服务器默认监听 **8080** 端口: - -```text -INFO: Started server process [63108] -INFO: Waiting for application startup. -INFO: Application startup complete. -INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit) -``` - -与服务化部署相关的命令行参数如下: - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
名称说明
--pipelinePaddleX 产线注册名或产线配置文件路径。
--device产线部署设备。默认情况下,若 GPU 可用则使用 GPU,否则使用 CPU。
--host服务器绑定的主机名或 IP 地址,默认为 0.0.0.0
--port服务器监听的端口号,默认为 8080
--use_hpip启用高性能推理模式。请参考高性能推理文档了解更多信息。
--hpi_config高性能推理配置。请参考高性能推理文档了解更多信息。
- -如需调整产线相关配置(如模型路径、批处理大小、部署设备等),可参考 4.4 小节。 - -### 4.3 客户端调用方式 - -请参考[PaddleOCR-VL 使用教程](./PaddleOCR-VL.md) 相同章节。 - -### 4.4 产线配置调整说明 - -请参考[PaddleOCR-VL 使用教程](./PaddleOCR-VL.md) 相同章节。 - -## 5. 模型微调 - -请参考[PaddleOCR-VL 使用教程](./PaddleOCR-VL.md) 相同章节。 diff --git a/docs/version3.x/pipeline_usage/PaddleOCR-VL-XPU.en.md b/docs/version3.x/pipeline_usage/PaddleOCR-VL-XPU.en.md new file mode 100644 index 0000000000..54c43353df --- /dev/null +++ b/docs/version3.x/pipeline_usage/PaddleOCR-VL-XPU.en.md @@ -0,0 +1,217 @@ +--- +comments: true +--- + +# PaddleOCR-VL XPU Environment Configuration Tutorial + +This tutorial is a guide for configuring the environment for PaddleOCR-VL KUNLUNXIN XPU. After completing the environment setup, please refer to the [PaddleOCR-VL Usage Tutorial](./PaddleOCR-VL.en.md) to use PaddleOCR-VL. + +## 1. Environment Preparation + +This step mainly introduces how to set up the runtime environment for PaddleOCR-VL. There are two methods available; choose one as needed: + +- Method 1: Use the official Docker image. + +- Method 2: Manually install PaddlePaddle and PaddleOCR. + +### 1.1 Method 1: Using Docker Image + +We recommend using the official Docker image (requires Docker version >= 19.03): + +```shell +docker run \ + -it \ + --network host \ + --user root \ + --shm-size 64G \ + ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-xpu \ + /bin/bash +# Call PaddleOCR CLI or Python API in the container +``` + +If you wish to start the service in an environment without internet access, replace `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-xpu` (image size approximately 12 GB) in the above command with the offline version image `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-xpu-offline` (image size approximately 14 GB). + +### 1.2 Method 2: Manually Install PaddlePaddle and PaddleOCR + +If you cannot use Docker, you can also manually install PaddlePaddle and PaddleOCR. The required Python version is 3.8–3.12. + +**We strongly recommend installing PaddleOCR-VL in a virtual environment to avoid dependency conflicts.** For example, use the Python venv standard library to create a virtual environment: + +```shell +# Create a virtual environment +python -m venv .venv_paddleocr +# Activate the environment +source .venv_paddleocr/bin/activate +``` + +Execute the following commands to complete the installation: + +```shell +python -m pip install paddlepaddle-xpu==3.2.1 -i https://www.paddlepaddle.org.cn/packages/stable/xpu-p800/ +python -m pip install -U "paddleocr[doc-parser]" +python -m pip install https://paddle-whl.bj.bcebos.com/nightly/cu126/safetensors/safetensors-0.6.2.dev0-cp38-abi3-linux_x86_64.whl +``` + +> **Please ensure to install PaddlePaddle version 3.2.1 or above, along with the special version of safetensors.** + +## 2. Quick Start + +Please refer to the corresponding section in the [PaddleOCR-VL Usage Tutorial](./PaddleOCR-VL.en.md). + +## 3. Enhancing VLM Inference Performance Using Inference Acceleration Framework + +The inference performance under default configurations is not fully optimized and may not meet actual production requirements. This step mainly introduces how to use the FastDeploy inference acceleration framework to enhance the inference performance of PaddleOCR-VL. + +### 3.1 Starting the VLM Inference Service + +PaddleOCR provides a Docker image for quickly starting the FastDeploy inference service. 
Use the following command to start the service (requires Docker version >= 19.03):
+
+```shell
+docker run \
+    -it \
+    --rm \
+    --gpus all \
+    --network host \
+    --shm-size 64G \
+    ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-fastdeploy-server:latest-xpu \
+    paddleocr genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend fastdeploy
+```
+
+If you wish to start the service in an environment without internet access, replace `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-fastdeploy-server:latest-xpu` (image size approximately 47 GB) in the above command with the offline version image `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-fastdeploy-server:latest-xpu-offline` (image size approximately 49 GB).
+
+When launching the FastDeploy inference service, we provide a set of default parameter settings. If you need to adjust parameters such as memory usage, you can configure additional parameters yourself. Please refer to [3.3.1 Server-side Parameter Adjustment](./PaddleOCR-VL.en.md#331-server-side-parameter-adjustment) to create a configuration file, then mount this file into the container and specify the configuration file using `backend_config` in the command to start the service, for example:
+
+```shell
+docker run \
+    -it \
+    --rm \
+    --gpus all \
+    --network host \
+    -v fastdeploy_config.yml:/tmp/fastdeploy_config.yml \
+    ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-fastdeploy-server:latest-xpu \
+    paddleocr genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend fastdeploy --backend_config /tmp/fastdeploy_config.yml
+```
+
+### 3.2 Client Usage Method
+
+Please refer to the corresponding section in the [PaddleOCR-VL Usage Tutorial](./PaddleOCR-VL.en.md).
+
+## 4. Service Deployment
+
+> Please note that the PaddleOCR-VL service introduced in this section differs from the VLM inference service in the previous section: the latter is only responsible for one part of the complete process (i.e., VLM inference) and is called as an underlying service by the former.
+
+This step mainly introduces how to use Docker Compose to deploy PaddleOCR-VL as a service and call it. The specific process is as follows:
+
+1. Copy the content from [here](https://github.com/PaddlePaddle/PaddleOCR/blob/main/deploy/paddleocr_vl_docker/compose_xpu.yaml) and save it as a `compose.yaml` file.
+
+2. Copy the following content and save it as a `.env` file:
+
+    ```
+    API_IMAGE_TAG_SUFFIX=latest-xpu-offline
+    VLM_BACKEND=fastdeploy
+    VLM_IMAGE_TAG_SUFFIX=latest-xpu-offline
+    ```
+
+3. Execute the following command in the directory where the `compose.yaml` and `.env` files are located to start the server, which listens on port **8080** by default:
+
+    ```shell
+    # Must be executed in the directory where compose.yaml and .env files are located
+    docker compose up
+    ```
+
+    After startup, you will see output similar to the following:
+
+    ```text
+    paddleocr-vl-api  | INFO:     Started server process [1]
+    paddleocr-vl-api  | INFO:     Waiting for application startup.
+    paddleocr-vl-api  | INFO:     Application startup complete.
+    paddleocr-vl-api  | INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
+    ```
+
+This method accelerates VLM inference based on the FastDeploy framework and is more suitable for production environment deployment.
+
+Additionally, after starting the server using this method, no internet connection is required except for pulling the images. 
If you need to deploy in an offline environment, you can first pull the images involved in the Compose file on a connected machine, export them, transfer them to the offline machine, and import them there; the service can then be started without internet access.
+
+Docker Compose starts two containers sequentially by reading configurations from the `.env` and `compose.yaml` files, running the underlying VLM inference service and the PaddleOCR-VL service (pipeline service) respectively.
+
+The meanings of each environment variable contained in the `.env` file are as follows:
+
+- `API_IMAGE_TAG_SUFFIX`: The tag suffix of the image used to launch the pipeline service.
+- `VLM_BACKEND`: The VLM inference backend.
+- `VLM_IMAGE_TAG_SUFFIX`: The tag suffix of the image used to launch the VLM inference service.
+
+You can modify `compose.yaml` to meet custom requirements, for example:
+
+1. Change the port of the PaddleOCR-VL service + +Edit paddleocr-vl-api.ports in the compose.yaml file to change the port. For example, if you need to change the service port to 8111, make the following modifications: + +```diff + paddleocr-vl-api: + ... + ports: +- - 8080:8080 ++ - 8111:8080 + ... +``` + +
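+After editing the port mapping, restart the deployment with `docker compose up` to apply the change. As a quick sanity check (a suggestion, not part of the official procedure), you can list the running services and confirm the new host-port mapping:
+
+```shell
+# Run in the directory containing compose.yaml and .env;
+# the PORTS column should now show 8111->8080
+docker compose ps
+```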
+ +
+2. Specify the XPU used by the PaddleOCR-VL service
+
+Edit environment in the compose.yaml file to change the XPU used. For example, if you need to use card 1 for deployment, make the following modifications:
+
+```diff
+  paddleocr-vl-api:
+    ...
+    environment:
++      - XPU_VISIBLE_DEVICES=1
+    ...
+  paddleocr-vlm-server:
+    ...
+    environment:
++      - XPU_VISIBLE_DEVICES=1
+    ...
+```
+
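+To double-check that the variable actually reached the containers, one option (assuming the service names used in the provided Compose file) is to inspect the environment of a running service:
+
+```shell
+# Print the device-selection variable inside the pipeline service container
+docker compose exec paddleocr-vl-api env | grep XPU_VISIBLE_DEVICES
+```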
+ +
+3. Adjust VLM server configuration
+
+If you want to adjust the VLM server configuration, refer to [3.3.1 Server Parameter Adjustment](./PaddleOCR-VL.en.md#331-server-parameter-adjustment) to generate a configuration file.
+
+After generating the configuration file, add the following paddleocr-vlm-server.volumes and paddleocr-vlm-server.command fields to your compose.yaml. Replace /path/to/your_config.yaml with your actual configuration file path.
+
+```yaml
+  paddleocr-vlm-server:
+    ...
+    volumes:
+      - /path/to/your_config.yaml:/home/paddleocr/vlm_server_config.yaml
+    command: paddleocr genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend fastdeploy --backend_config /home/paddleocr/vlm_server_config.yaml
+    ...
+```
+
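+Before relying on the new configuration, it can be worth confirming that the file is visible inside the container at the expected path, for example:
+
+```shell
+# Show the mounted configuration file as the VLM inference service sees it
+docker compose exec paddleocr-vlm-server cat /home/paddleocr/vlm_server_config.yaml
+```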
+ +
+4. Adjust pipeline-related configurations (such as model path, batch size, deployment device, etc.) + +Refer to the 4.4 Pipeline Configuration Adjustment Instructions section. + +
+ +### 4.3 Client Invocation Methods + +Please refer to the corresponding section in the [PaddleOCR-VL Usage Tutorial](./PaddleOCR-VL.en.md). + +### 4.4 Pipeline Configuration Adjustment Instructions + +Please refer to the corresponding section in the [PaddleOCR-VL Usage Tutorial](./PaddleOCR-VL.en.md). + +## 5. Model Fine-Tuning + +Please refer to the corresponding section in the [PaddleOCR-VL Usage Tutorial](./PaddleOCR-VL.en.md). diff --git a/docs/version3.x/pipeline_usage/PaddleOCR-VL-XPU.md b/docs/version3.x/pipeline_usage/PaddleOCR-VL-XPU.md new file mode 100644 index 0000000000..82d6c901d1 --- /dev/null +++ b/docs/version3.x/pipeline_usage/PaddleOCR-VL-XPU.md @@ -0,0 +1,216 @@ +--- +comments: true +--- + +# PaddleOCR-VL XPU 环境配置教程 + +本教程是 PaddleOCR-VL 昆仑芯 XPU 的环境配置教程,目的是完成相关的环境配置,环境配置完毕后请参考 [PaddleOCR-VL 使用教程](./PaddleOCR-VL.md) 使用 PaddleOCR-VL。 + +## 1. 环境准备 + +此步骤主要介绍如何搭建 PaddleOCR-VL 的运行环境,有以下两种方式,任选一种即可: + +- 方法一:使用官方 Docker 镜像。 + +- 方法二:手动安装 PaddlePaddle 和 PaddleOCR。 + +### 1.1 方法一:使用 Docker 镜像 + +我们推荐使用官方 Docker 镜像(要求 Docker 版本 >= 19.03): + +```shell +docker run \ + -it \ + --network host \ + --user root \ + --shm-size 64G \ + ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-xpu \ + /bin/bash +# 在容器中调用 PaddleOCR CLI 或 Python API +``` + +如果您希望在无法连接互联网的环境中启动服务,请将上述命令中的 `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-xpu`(镜像大小约为 12 GB)更换为离线版本镜像 `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-xpu-offline`(镜像大小约为 14 GB)。 + +### 1.2 方法二:手动安装 PaddlePaddle 和 PaddleOCR + +如果您无法使用 Docker,也可以手动安装 PaddlePaddle 和 PaddleOCR。要求 Python 版本为 3.8–3.12。 + +**我们强烈推荐您在虚拟环境中安装 PaddleOCR-VL,以避免发生依赖冲突。** 例如,使用 Python venv 标准库创建虚拟环境: + +```shell +# 创建虚拟环境 +python -m venv .venv_paddleocr +# 激活环境 +source .venv_paddleocr/bin/activate +``` + +执行如下命令完成安装: + +```shell +python -m pip install paddlepaddle-xpu==3.2.1 -i https://www.paddlepaddle.org.cn/packages/stable/xpu-p800/ +python -m pip install -U "paddleocr[doc-parser]" +python -m pip install https://paddle-whl.bj.bcebos.com/nightly/cu126/safetensors/safetensors-0.6.2.dev0-cp38-abi3-linux_x86_64.whl +``` + +> **请注意安装 3.2.1 及以上版本的飞桨框架,同时安装特殊版本的 safetensors。** + +## 2. 快速开始 + +请参考[PaddleOCR-VL 使用教程](./PaddleOCR-VL.md)相同章节。 + +## 3. 
使用推理加速框架提升 VLM 推理性能
+
+默认配置下的推理性能未经过充分优化,可能无法满足实际生产需求。此步骤主要介绍如何使用 FastDeploy 推理加速框架来提升 PaddleOCR-VL 的推理性能。
+
+### 3.1 启动 VLM 推理服务
+
+PaddleOCR 提供了 Docker 镜像,用于快速启动 FastDeploy 推理服务。可使用以下命令启动服务(要求 Docker 版本 >= 19.03):
+
+```shell
+docker run \
+    -it \
+    --rm \
+    --gpus all \
+    --network host \
+    --shm-size 64G \
+    ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-fastdeploy-server:latest-xpu \
+    paddleocr genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend fastdeploy
+```
+
+如果您希望在无法连接互联网的环境中启动服务,请将上述命令中的 `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-fastdeploy-server:latest-xpu`(镜像大小约为 47 GB)更换为离线版本镜像 `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-fastdeploy-server:latest-xpu-offline`(镜像大小约为 49 GB)。
+
+启动 FastDeploy 推理服务时,我们提供了一套默认参数设置。如果您有调整显存占用等更多参数的需求,可以自行配置更多参数。请参考 [3.3.1 服务端参数调整](./PaddleOCR-VL.md#331-服务端参数调整) 创建配置文件,然后将该文件挂载到容器中,并在启动服务的命令中使用 `backend_config` 指定配置文件,例如:
+
+```shell
+docker run \
+    -it \
+    --rm \
+    --gpus all \
+    --network host \
+    -v fastdeploy_config.yml:/tmp/fastdeploy_config.yml \
+    ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-fastdeploy-server:latest-xpu \
+    paddleocr genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend fastdeploy --backend_config /tmp/fastdeploy_config.yml
+```
+
+### 3.2 客户端使用方法
+
+请参考[PaddleOCR-VL 使用教程](./PaddleOCR-VL.md) 相同章节。
+
+## 4. 服务化部署
+
+>请注意,本节所介绍 PaddleOCR-VL 服务与上一节中的 VLM 推理服务有所区别:后者仅负责完整流程中的一个环节(即 VLM 推理),并作为前者的底层服务被调用。
+
+此步骤主要介绍如何使用 Docker Compose 将 PaddleOCR-VL 部署为服务并调用,具体流程如下:
+
+1. 从 [此处](https://github.com/PaddlePaddle/PaddleOCR/blob/main/deploy/paddleocr_vl_docker/compose_xpu.yaml) 复制内容保存为 `compose.yaml` 文件。
+
+2. 复制以下内容并保存为 `.env` 文件:
+
+    ```
+    API_IMAGE_TAG_SUFFIX=latest-xpu-offline
+    VLM_BACKEND=fastdeploy
+    VLM_IMAGE_TAG_SUFFIX=latest-xpu-offline
+    ```
+
+3. 在 `compose.yaml` 和 `.env` 文件所在目录下执行以下命令启动服务器,默认监听 **8080** 端口:
+
+    ```shell
+    # 必须在 compose.yaml 和 .env 文件所在的目录中执行
+    docker compose up
+    ```
+
+    启动后将看到类似如下输出:
+
+    ```text
+    paddleocr-vl-api | INFO: Started server process [1]
+    paddleocr-vl-api | INFO: Waiting for application startup.
+    paddleocr-vl-api | INFO: Application startup complete.
+    paddleocr-vl-api | INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
+    ```
+
+此方式基于 FastDeploy 框架对 VLM 推理进行加速,更适合生产环境部署。
+
+此外,使用此方式启动服务器后,除拉取镜像外,无需连接互联网。如需在离线环境中部署,可先在联网机器上拉取 Compose 文件中涉及的镜像,导出并传输至离线机器中导入,即可在离线环境下启动服务。
+
+Docker Compose 通过读取 `.env` 和 `compose.yaml` 文件中配置,先后启动 2 个容器,分别运行底层 VLM 推理服务,以及 PaddleOCR-VL 服务(产线服务)。
+
+`.env` 文件中包含的各环境变量含义如下:
+
+- `API_IMAGE_TAG_SUFFIX`:启动产线服务使用的镜像的标签后缀。
+- `VLM_BACKEND`:VLM 推理后端。
+- `VLM_IMAGE_TAG_SUFFIX`:启动 VLM 推理服务使用的镜像的标签后缀。
+
+您可以通过修改 `compose.yaml` 来满足自定义需求,例如:
+
+1. 更改 PaddleOCR-VL 服务的端口 + +编辑 compose.yaml 文件中的 paddleocr-vl-api.ports 来更改端口。例如,如果您需要将服务端口更换为 8111,可以进行以下修改: + +```diff + paddleocr-vl-api: + ... + ports: +- - 8080:8080 ++ - 8111:8080 + ... +``` + +
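+修改端口映射后,重新执行 `docker compose up` 使配置生效。作为一个简单的自查方式(仅为建议,非官方步骤),可以列出正在运行的服务,确认新的宿主机端口映射:
+
+```shell
+# 在 compose.yaml 和 .env 所在目录执行;PORTS 列应显示 8111->8080
+docker compose ps
+```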
+ +
+2. 指定 PaddleOCR-VL 服务所使用的 XPU
+
+编辑 compose.yaml 文件中的 environment 来更改所使用的 XPU。例如,如果您需要使用卡 1 进行部署,可以进行以下修改:
+
+```diff
+  paddleocr-vl-api:
+    ...
+    environment:
++      - XPU_VISIBLE_DEVICES=1
+    ...
+  paddleocr-vlm-server:
+    ...
+    environment:
++      - XPU_VISIBLE_DEVICES=1
+    ...
+```
+
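+如需确认该环境变量已在容器内生效,可以(假设使用上述 Compose 文件中的服务名)查看运行中服务的环境变量,例如:
+
+```shell
+# 在产线服务容器内打印设备选择变量
+docker compose exec paddleocr-vl-api env | grep XPU_VISIBLE_DEVICES
+```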
+ +
+3. 调整 VLM 服务端配置
+
+若您想调整 VLM 服务端的配置,可以参考 [3.3.1 服务端参数调整](./PaddleOCR-VL.md#331-服务端参数调整) 生成配置文件。
+
+生成配置文件后,将以下的 paddleocr-vlm-server.volumes 和 paddleocr-vlm-server.command 字段增加到您的 compose.yaml 中。请将 /path/to/your_config.yaml 替换为您的实际配置文件路径。
+
+```yaml
+  paddleocr-vlm-server:
+    ...
+    volumes:
+      - /path/to/your_config.yaml:/home/paddleocr/vlm_server_config.yaml
+    command: paddleocr genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend fastdeploy --backend_config /home/paddleocr/vlm_server_config.yaml
+    ...
+```
+
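+在依赖新配置前,可以先确认配置文件已按预期路径挂载到容器内,例如:
+
+```shell
+# 查看 VLM 推理服务容器内看到的配置文件内容
+docker compose exec paddleocr-vlm-server cat /home/paddleocr/vlm_server_config.yaml
+```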
+ +
+4. 调整产线相关配置(如模型路径、批处理大小、部署设备等) + +参考 4.4 产线配置调整说明 小节。 + +
+
+### 4.3 客户端调用方式
+
+请参考[PaddleOCR-VL 使用教程](./PaddleOCR-VL.md) 相同章节。
+
+### 4.4 产线配置调整说明
+
+请参考[PaddleOCR-VL 使用教程](./PaddleOCR-VL.md) 相同章节。
+
+## 5. 模型微调
+
+请参考[PaddleOCR-VL 使用教程](./PaddleOCR-VL.md) 相同章节。
diff --git a/docs/version3.x/pipeline_usage/PaddleOCR-VL.en.md b/docs/version3.x/pipeline_usage/PaddleOCR-VL.en.md
index 17a9487879..ac7a058fb1 100644
--- a/docs/version3.x/pipeline_usage/PaddleOCR-VL.en.md
+++ b/docs/version3.x/pipeline_usage/PaddleOCR-VL.en.md
@@ -8,85 +8,109 @@ PaddleOCR-VL is an advanced and efficient document parsing model designed specif
 
 
 
-## PaddleOCR-VL Inference Device Support
+## Process Guide
 
-Currently, PaddleOCR-VL offers three inference methods, each with varying levels of support for inference devices. Please verify that your inference device meets the requirements in the table below before proceeding with PaddleOCR-VL inference deployment:
+Before starting, please refer to the next section for information on the inference device support provided by PaddleOCR-VL to **determine if your device meets the operational requirements.** If your device meets the requirements, please select the relevant section to read based on your needs.
+
+For some inference hardware, you may need to refer to other environment configuration documents we provide, but the overall process is the same, so the guide below still applies:
+
+1. **Want to quickly experience PaddleOCR-VL**:
+
+    If you wish to quickly experience the inference effects of PaddleOCR-VL, please read [1. Environment Preparation](#1-environment-preparation) and [2. Quick Start](#2-quick-start).
+
+2. **Want to use PaddleOCR-VL in a production environment**:
+
+    Although the quick experience allows you to feel the effects of PaddleOCR-VL, it may not be optimal in terms of inference speed and GPU memory usage. If you wish to apply PaddleOCR-VL in a production environment and have higher requirements for inference performance, please read [3. Enhancing VLM Inference Performance Using Inference Acceleration Frameworks](#3-enhancing-vlm-inference-performance-using-inference-acceleration-frameworks).
+
+3. **Want to deploy PaddleOCR-VL as an API service**:
+
+    If you want to deploy PaddleOCR-VL as a web service (API) so that other devices or applications can access and call it through a specific URL without configuring the environment, we offer two methods:
+
+    - Deployment using Docker Compose (one-click start, recommended): Please read [4.1 Method 1: Deploy Using Docker Compose](#41-method-1-deploy-using-docker-compose-recommended) and [4.3 Client-Side Invocation](#43-client-side-invocation).
+    - Manual deployment: Please read [1. Environment Preparation](#1-environment-preparation), [4.2 Method 2: Manual Deployment](#42-method-2-manual-deployment), and [4.3 Client-Side Invocation](#43-client-side-invocation).
+
+4. **Want to fine-tune PaddleOCR-VL to adapt to specific business needs**:
+
+    If you find that the accuracy performance of PaddleOCR-VL in specific business scenarios does not meet expectations, please read [5. Model Fine-tuning](#5-model-fine-tuning).
+
+## Inference Device Support for PaddleOCR-VL
+
+Currently, PaddleOCR-VL offers four inference methods, with varying levels of support for different inference devices. Please confirm that your inference device meets the requirements in the table below before proceeding with PaddleOCR-VL deployment:
 
-| Inference Method | x64 CPU Support | GPU Compute Capability Support | CUDA Version Support |
-|---|---|---|---|
-| PaddlePaddle | ✅ | ≥ 7 | ≥ 11.8 |
-| vLLM | 🚧 | ≥ 8 (RTX 3060, RTX 5070, A10, A100, ...)<br>7 ≤ GPU Compute Capability < 8 (T4, V100, ...) is supported but may encounter request timeouts, OOM errors, or other abnormalities. Not recommended. | ≥ 12.6 |
-| SGLang | 🚧 | 8 ≤ GPU Compute Capability < 12 | ≥ 12.6 |
+| Inference Method | NVIDIA GPU | KUNLUNXIN XPU | HYGON DCU | MetaX GPU | Iluvatar GPU | x64 CPU |
+|---|---|---|---|---|---|---|
+| PaddlePaddle | ✅ | ✅ | ✅ | 🚧 | 🚧 | ✅ |
+| vLLM | ✅ | 🚧 | ✅ | ✅ | ✅ | 🚧 |
+| SGLang | ✅ | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 |
+| FastDeploy | ✅ | ✅ | 🚧 | 🚧 | ✅ | 🚧 |
 
-> Currently, PaddleOCR-VL does not support ARM architecture CPUs. Additional hardware support will be added based on actual demand in the future. Stay tuned!
-> vLLM and SGLang cannot run natively on Windows or macOS. Please use our provided Docker image instead.
-
-Since different hardware configurations require different dependencies, if your hardware meets the requirements in the table above, please refer to the following table for the corresponding environment configuration tutorial:
-
-| Hardware Type | Hardware Model | Environment Configuration Tutorial |
-|---|---|---|
-| NVIDIA GPU | RTX 30, 40 Series | This usage tutorial |
-| NVIDIA GPU | RTX 50 Series | [PaddleOCR-VL RTX 50 Environment Configuration Tutorial](./PaddleOCR-VL-RTX50.en.md) |
-| x64 CPU | - | This usage tutorial |
-| XPU | 🚧 | 🚧 |
-| DCU | 🚧 | 🚧 |
-
-> For example, if you are using an RTX 50 Series GPU that meets the device requirements for PaddlePaddle and vLLM inference methods, please refer to the [PaddleOCR-VL RTX 50 Environment Configuration Tutorial](./PaddleOCR-VL-RTX50.en.md) to complete environment configuration before using PaddleOCR-VL.
+> TIP:
+> - When using NVIDIA GPU for inference, ensure that the Compute Capability (CC) and CUDA version meet the requirements:
+> > - PaddlePaddle: CC ≥ 7.0, CUDA ≥ 11.8
+> > - vLLM: CC ≥ 8.0, CUDA ≥ 12.6
+> > - SGLang: 8.0 ≤ CC < 12.0, CUDA ≥ 12.6
+> > - FastDeploy: 8.0 ≤ CC < 12.0, CUDA ≥ 12.6
+> > - Common GPUs with CC ≥ 8 include RTX 30/40/50 series and A10/A100, etc. For more models, refer to [CUDA GPU Compute Capability](https://developer.nvidia.com/cuda-gpus)
+> - vLLM compatibility note: Although vLLM can be launched on NVIDIA GPUs with CC 7.x such as T4/V100, timeout or OOM issues may occur, and its use is not recommended.
+> - Currently, PaddleOCR-VL does not support ARM architecture CPUs. More hardware support will be expanded based on actual needs in the future, so stay tuned!
+> - vLLM, SGLang, and FastDeploy cannot run natively on Windows or macOS. Please use the Docker images we provide.
+
+Since different hardware requires different dependencies, if your hardware meets the requirements in the table above, please refer to the following table for the corresponding tutorial to configure your environment:
+
+| Hardware Type | Environment Configuration Tutorial |
+|----------------|------------------------------------------------------------------------------------------------------------------------------|
+| x64 CPU | This tutorial |
+| NVIDIA GPU | - NVIDIA Blackwell architecture GPU (e.g., RTX 50 series) refer to [PaddleOCR-VL NVIDIA Blackwell Architecture GPU Environment Configuration Tutorial](./PaddleOCR-VL-NVIDIA-Blackwell.en.md)<br>
- Other NVIDIA GPUs refer to this tutorial | +| KUNLUNXIN XPU | [PaddleOCR-VL XPU Environment Configuration Tutorial](./PaddleOCR-VL-XPU.en.md) | +| HYGON DCU | [PaddleOCR-VL DCU Environment Configuration Tutorial](./PaddleOCR-VL-DCU.en.md) | + +> TIP: +> For example, if you are using an RTX 50 series GPU that meets the device requirements for both PaddlePaddle and vLLM inference methods, please refer to the [PaddleOCR-VL NVIDIA Blackwell Architecture GPU Environment Configuration Tutorial](./PaddleOCR-VL-NVIDIA-Blackwell.en.md) to complete the environment configuration before using PaddleOCR-VL. ## 1. Environment Preparation @@ -111,7 +135,7 @@ docker run \ # Invoke PaddleOCR CLI or Python API within the container ``` -The image size is approximately 8 GB. If you need to use PaddleOCR-VL in an offline environment, replace `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest` in the above command with the offline version image `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-offline` (offline image size is approximately 11 GB). You will need to pull the image on an internet-connected machine, import it into the offline machine, and then start the container using this image on the offline machine. For example: +If you need to use PaddleOCR-VL in an offline environment, replace `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest` (image size approximately 8 GB) in the above command with the offline version image `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-offline` (image size is approximately 10 GB). You will need to pull the image on an internet-connected machine, import it into the offline machine, and then start the container using this image on the offline machine. For example: ```shell # Execute on an internet-connected machine @@ -145,18 +169,20 @@ Run the following commands to complete the installation: # The following command installs the PaddlePaddle version for CUDA 12.6. For other CUDA versions and the CPU version, please refer to https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/develop/install/pip/linux-pip.html python -m pip install paddlepaddle-gpu==3.2.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/ python -m pip install -U "paddleocr[doc-parser]" -# For Linux systems, run: +# For Linux systems, please directly copy and execute the following commands without modifying the cuda version in the link: python -m pip install https://paddle-whl.bj.bcebos.com/nightly/cu126/safetensors/safetensors-0.6.2.dev0-cp38-abi3-linux_x86_64.whl -# For Windows systems, run: +# For Windows systems, directly copy and execute the following command: python -m pip install https://xly-devops.cdn.bcebos.com/safetensors-nightly/safetensors-0.6.2.dev0-cp38-abi3-win_amd64.whl ``` +> IMPORTANT: > **Please ensure that you install PaddlePaddle framework version 3.2.1 or above, along with the special version of safetensors.** For macOS users, please use Docker to set up the environment. ## 2. Quick Start PaddleOCR-VL supports two usage methods: CLI command line and Python API. The CLI command line method is simpler and suitable for quickly verifying functionality, while the Python API method is more flexible and suitable for integration into existing projects. +> TIP: > The methods introduced in this section are primarily for rapid validation. Their inference speed, memory usage, and stability may not meet the requirements of a production environment. 
**If deployment to a production environment is needed, we strongly recommend using a dedicated inference acceleration framework**. For specific methods, please refer to the next section. ### 2.1 Command Line Usage @@ -1006,7 +1032,7 @@ Additionally, it also supports obtaining visualized images and prediction result ## 3. Enhancing VLM Inference Performance Using Inference Acceleration Frameworks -The inference performance under default configurations is not fully optimized and may not meet actual production requirements. This step primarily introduces how to use the vLLM and SGLang inference acceleration frameworks to enhance the inference performance of PaddleOCR-VL. +The inference performance under default configurations is not fully optimized and may not meet actual production requirements. This step primarily introduces how to use the vLLM, SGLang and FastDeploy inference acceleration frameworks to enhance the inference performance of PaddleOCR-VL. ### 3.1 Launching the VLM Inference Service @@ -1018,7 +1044,37 @@ There are two methods to launch the VLM inference service; choose either one: #### 3.1.1 Method 1: Using Docker Image -PaddleOCR provides a Docker image (approximately 13 GB in size) for quickly launching the vLLM inference service. Use the following command to launch the service (requires Docker version >= 19.03, a machine equipped with a GPU, and NVIDIA drivers supporting CUDA 12.6 or higher): +PaddleOCR provides Docker images for quickly launching vLLM or FastDeploy inference services. You can use the following commands to start the services (requires Docker version >= 19.03, a machine equipped with a GPU, and NVIDIA drivers supporting CUDA 12.6 or later): + +=== "Launch vLLM Service" + + ```shell + docker run \ + -it \ + --rm \ + --gpus all \ + --network host \ + ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest \ + paddleocr genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend vllm + ``` + + If you wish to start the service in an environment without internet access, replace `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest` (image size approximately 13 GB) in the above command with the offline version image `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest-offline` (image size approximately 15 GB). + +=== "Launch FastDeploy Service" + + ```shell + docker run \ + -it \ + --rm \ + --gpus all \ + --network host \ + ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-fastdeploy-server:latest \ + paddleocr genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend fastdeploy + ``` + + If you wish to start the service in an environment without internet access, replace `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-fastdeploy-server:latest` (image size approximately 43 GB) in the above command with the offline version image `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-fastdeploy-server:latest-offline` (image size approximately 45 GB). + +When starting the vLLM or FastDeploy inference service, we provide a set of default parameter settings. If you have additional requirements for adjusting parameters such as GPU memory usage, you can configure more parameters yourself. 
Please refer to [3.3.1 Server-side Parameter Adjustment](#331-server-side-parameter-adjustment) to create a configuration file, then mount this file into the container, and specify the configuration file using `backend_config` in the command to start the service. Taking vLLM as an example: ```shell docker run \ @@ -1026,14 +1082,11 @@ docker run \ --rm \ --gpus all \ --network host \ + -v vllm_config.yml:/tmp/vllm_config.yml \ ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest \ - paddleocr genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend vllm + paddleocr genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend vllm --backend_config /tmp/vllm_config.yml ``` -More parameters can be passed when launching the vLLM inference service; refer to the next subsection for supported parameters. - -If you wish to launch the service in an environment without internet access, replace `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest` in the above command with the offline version image `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest-offline`. The offline image is approximately 15 GB in size. - #### 3.1.2 Method 2: Installation and Usage via PaddleOCR CLI Since inference acceleration frameworks may have dependency conflicts with the PaddlePaddle framework, it is recommended to install them in a virtual environment. Taking vLLM as an example: @@ -1056,7 +1109,7 @@ Usage of the `paddleocr install_genai_server_deps` command: paddleocr install_genai_server_deps ``` -Currently supported framework names are `vllm` and `sglang`, corresponding to vLLM and SGLang, respectively. +Currently supported framework names are `vllm`, `sglang` and `fastdeploy`, corresponding to vLLM, SGLang and FastDeploy, respectively. The vLLM and SGLang installed via `paddleocr install_genai_server_deps` are both **CUDA 12.6** versions; ensure that your local NVIDIA drivers are consistent with or higher than this version. @@ -1109,6 +1162,7 @@ Different inference acceleration frameworks support different parameters. Refer - [vLLM Official Parameter Tuning Guide](https://docs.vllm.ai/en/latest/configuration/optimization.html) - [SGLang Hyperparameter Tuning Documentation](https://docs.sglang.ai/advanced_features/hyperparameter_tuning.html) +- [FastDeploy Best Practices](https://paddlepaddle.github.io/FastDeploy/best_practices/PaddleOCR-VL-0.9B/) The PaddleOCR VLM inference service supports parameter tuning through configuration files. The following example shows how to adjust the `gpu-memory-utilization` and `max-num-seqs` parameters for the vLLM server: @@ -1147,7 +1201,10 @@ The following configurations are for scenarios with a 1:1 client-to-VLM inferenc **NVIDIA RTX 3060** - **Server-Side** - - vLLM: `gpu-memory-utilization=0.8` + - vLLM: `gpu-memory-utilization: 0.8` + - FastDeploy: + - `gpu-memory-utilization: 0.8` + - `max-concurrency: 2048` ## 4. Service Deployment @@ -1155,7 +1212,7 @@ This step mainly introduces how to deploy PaddleOCR-VL as a service and invoke i - Method 1: Deploy using Docker Compose (recommended). -- Method 2: Manually install dependencies for deployment. +- Method 2: Manual Deployment. 
Note that the PaddleOCR-VL service described in this section differs from the VLM inference service in the previous section: the latter is responsible for only one part of the complete process (i.e., VLM inference) and is called as an underlying service by the former. @@ -1179,17 +1236,105 @@ paddleocr-vl-api | INFO: Uvicorn running on http://0.0.0.0:8080 This solution accelerates VLM inference based on frameworks like vLLM, making it more suitable for production environment deployment. However, it requires the machine to be equipped with a GPU and the NVIDIA driver to support CUDA 12.6 or higher. -The `.env` file can be used to configure environment variables, with detailed descriptions as follows: +Additionally, after starting the server using this method, no internet connection is required except for pulling the image. For offline environment deployment, you can first pull the images involved in the Compose file on an online machine, export and transfer them to the offline machine for import, and then start the service in the offline environment. + +Docker Compose starts two containers in sequence by reading the configurations in the `.env` and `compose.yaml` files, running the underlying VLM inference service and the PaddleOCR-VL service (Pipeline) respectively. + +The meanings of each environment variable contained in the `.env` file are as follows: - `API_IMAGE_TAG_SUFFIX`: The tag suffix of the image used to start the pipeline service. The default is `latest-offline`, indicating the use of an offline GPU image. - `VLM_BACKEND`: The VLM inference backend, currently supporting `vllm` and `fastdeploy`. The default is `vllm`. - `VLM_IMAGE_TAG_SUFFIX`: The tag suffix of the image used to start the VLM inference service. The default is `latest-offline`, indicating the use of an offline GPU image. -Additionally, after starting the server using this method, no internet connection is required except for pulling the image. For offline environment deployment, you can first pull the images involved in the Compose file on an online machine, export and transfer them to the offline machine for import, and then start the service in the offline environment. +You can meet custom requirements by modifying `.env` and `compose.yaml`, for example: + +
+1. Change the port of the PaddleOCR-VL service + +Edit paddleocr-vl-api.ports in the compose.yaml file to change the port. For example, if you need to change the service port to 8111, make the following modifications: + +```diff + paddleocr-vl-api: + ... + ports: +- - 8080:8080 ++ - 8111:8080 + ... +``` + +
+ +
+2. Specify the GPU used by the PaddleOCR-VL service + +Edit device_ids in the compose.yaml file to change the GPU used. For example, if you need to use GPU card 1 for deployment, make the following modifications: + +```diff + paddleocr-vl-api: + ... + deploy: + resources: + reservations: + devices: + - driver: nvidia +- device_ids: ["0"] ++ device_ids: ["1"] + capabilities: [gpu] + ... + paddleocr-vlm-server: + ... + deploy: + resources: + reservations: + devices: + - driver: nvidia +- device_ids: ["0"] ++ device_ids: ["1"] + capabilities: [gpu] + ... +``` -If you need to adjust pipeline configurations (such as model path, batch size, deployment device, etc.), refer to Section 4.4. +
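+If you want to verify which GPU each container actually sees after the change, and the images include the NVIDIA tools (an assumption, not something this tutorial guarantees), you can run:
+
+```shell
+# List the GPUs visible inside the VLM inference service container
+docker compose exec paddleocr-vlm-server nvidia-smi -L
+```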
+ +
+3. Adjust VLM server-side configuration
+
+If you want to adjust the VLM server-side configuration, please refer to [3.3.1 Server-side Parameter Adjustment](#331-server-side-parameter-adjustment) to generate a configuration file.
+
+After generating the configuration file, add the following paddleocr-vlm-server.volumes and paddleocr-vlm-server.command fields to your compose.yaml. Please replace /path/to/your_config.yaml with your actual configuration file path.
+
+```yaml
+  paddleocr-vlm-server:
+    ...
+    volumes:
+      - /path/to/your_config.yaml:/home/paddleocr/vlm_server_config.yaml
+    command: paddleocr genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend vllm --backend_config /home/paddleocr/vlm_server_config.yaml
+    ...
+```
+
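+For reference, a minimal configuration file for the vLLM backend might look as follows; the two keys shown are the ones discussed in 3.3.1, and the values are placeholders to adapt to your hardware:
+
+```shell
+# Write an example backend configuration file (adjust the values as needed)
+cat > /path/to/your_config.yaml <<EOF
+gpu-memory-utilization: 0.8
+max-num-seqs: 128
+EOF
+```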
+ +
+4. Change the VLM inference backend + +Modify VLM_BACKEND in the .env file, for example, to change the VLM inference backend to fastdeploy: + +```diff + API_IMAGE_TAG_SUFFIX=latest-offline +- VLM_BACKEND=vllm ++ VLM_BACKEND=fastdeploy + VLM_IMAGE_TAG_SUFFIX=latest-offline +``` + +
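+Note that simply editing `.env` does not affect containers that are already running; the deployment needs to be re-created, and Compose will pull the FastDeploy server image on first use (it is large, see Section 3.1.1):
+
+```shell
+# Stop the old containers, then start again with the new backend
+docker compose down
+docker compose up
+```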
+ +
+5. Adjust pipeline configurations (such as model path, batch size, deployment device, etc.) + +Refer to section 4.4 Pipeline Configuration Adjustment Instructions in this document. + +
-### 4.2 Method 2: Manually Install Dependencies for Deployment
+### 4.2 Method 2: Manual Deployment
 
 Execute the following command to install the service deployment plugin via the PaddleX CLI:
 
@@ -2088,6 +2233,7 @@ foreach ($result as $i => $item) {
 
 ### 4.4 Pipeline Configuration Adjustment Instructions
 
+> NOTE:
 > If you do not need to adjust pipeline configurations, you can ignore this section.
 
 Adjusting the PaddleOCR-VL configuration for service deployment involves only three steps:
diff --git a/docs/version3.x/pipeline_usage/PaddleOCR-VL.md b/docs/version3.x/pipeline_usage/PaddleOCR-VL.md
index cb68de9ffa..130bf901bc 100644
--- a/docs/version3.x/pipeline_usage/PaddleOCR-VL.md
+++ b/docs/version3.x/pipeline_usage/PaddleOCR-VL.md
@@ -8,85 +8,109 @@ PaddleOCR-VL 是一款先进、高效的文档解析模型,专为文档中的
 
 
 
+## 流程导览
+
+在开始之前,请参考下一节了解 PaddleOCR-VL 对推理设备的支持情况,**以确定您的设备是否满足运行要求。** 若您的设备满足运行要求,请根据您的需求选择相关章节阅读。
+
+部分推理硬件可能需要参考我们提供的其他环境配置文档,但流程是一样的,不影响您阅读下面的流程导览:
+
+1. **希望快速体验 PaddleOCR-VL**:
+
+    如果您希望快速体验 PaddleOCR-VL 的推理效果,请阅读 [1. 环境准备](#1-环境准备) 和 [2. 快速开始](#2-快速开始)。
+
+2. **希望将 PaddleOCR-VL 用于生产环境**:
+
+    快速体验虽然可以让您感受到 PaddleOCR-VL 的效果,但在推理速度、显存占用等方面不是最佳状态。如果您希望将 PaddleOCR-VL 应用于生产环境,并且对推理性能有更高的要求,请阅读 [3. 使用推理加速框架提升 VLM 推理性能](#3-使用推理加速框架提升-vlm-推理性能) 。
+
+3. **希望将 PaddleOCR-VL 部署为 API 服务**:
+
+    如果您想将 PaddleOCR-VL 部署为一个网络服务(API),这样其他设备或应用程序无需配置环境,仅通过一个特定的网址就可以来访问和调用它,我们提供两种方式:
+
+    - 使用 Docker Compose 部署(一键启动,推荐使用):请阅读 [4.1 方法一:使用 Docker Compose 部署](#41-方法一使用-docker-compose-部署推荐使用) 和 [4.3 客户端调用方式](#43-客户端调用方式)。
+    - 进行手动部署:请阅读 [1. 环境准备](#1-环境准备)、 [4.2 方法二:手动部署](#42-方法二手动部署) 和 [4.3 客户端调用方式](#43-客户端调用方式)。
+
+4. **希望对 PaddleOCR-VL 进行微调以适配特定业务**:
+
+    如果您发现 PaddleOCR-VL 在特定业务场景中的精度表现未达预期,请阅读 [5. 模型微调](#5-模型微调)。
+
 ## PaddleOCR-VL 对推理设备的支持情况
 
-目前 PaddleOCR-VL 有三种推理方式,支持的推理设备不完全相同,请确认您的推理设备是否满足下表要求再进行 PaddleOCR-VL 的推理部署:
+目前 PaddleOCR-VL 有四种推理方式,支持的推理设备不完全相同,请确认您的推理设备是否满足下表要求再进行 PaddleOCR-VL 的推理部署:
 
-| 推理方式 | 支持 x64 CPU | 支持的 GPU Compute Capability | 支持的 CUDA 版本 |
-|---|---|---|---|
-| PaddlePaddle | ✅ | ≥ 7 | ≥ 11.8 |
-| vLLM | 🚧 | ≥ 8 (RTX 3060,RTX 5070,A10,A100, ...)<br>7 ≤ GPU Compute Capability < 8 (T4,V100,...)支持运行,但可能出现请求超时、OOM 等异常情况,不推荐使用 | ≥ 12.6 |
-| SGLang | 🚧 | 8 ≤ GPU Compute Capability < 12 | ≥ 12.6 |
+| 推理方式 | 英伟达 GPU | 昆仑芯 XPU | 海光 DCU | 沐曦 GPU | 天数 GPU | x64 CPU |
+|---|---|---|---|---|---|---|
+| PaddlePaddle | ✅ | ✅ | ✅ | 🚧 | 🚧 | ✅ |
+| vLLM | ✅ | 🚧 | ✅ | ✅ | ✅ | 🚧 |
+| SGLang | ✅ | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 |
+| FastDeploy | ✅ | ✅ | 🚧 | 🚧 | ✅ | 🚧 |
 
-> 当前,PaddleOCR-VL 暂不支持 ARM 架构 CPU。后续将根据实际需求扩展更多硬件支持,敬请期待!
-> vLLM 与 SGLang 无法在 Windows 或 macOS 上原生运行,请使用我们提供的 Docker 镜像。
+> TIP:
+> - 使用英伟达 GPU 推理时需要注意 Compute Capability(简称 CC) 和 CUDA 版本(简称 CUDA)是否满足要求:
+> > - PaddlePaddle: CC ≥ 7.0, CUDA ≥ 11.8
+> > - vLLM: CC ≥ 8.0, CUDA ≥ 12.6
+> > - SGLang: 8.0 ≤ CC < 12.0, CUDA ≥ 12.6
+> > - FastDeploy: 8.0 ≤ CC < 12.0, CUDA ≥ 12.6
+> > - CC ≥ 8 的常见显卡包括 RTX 30/40/50 系列及 A10/A100 等,更多型号可查看 [CUDA GPU 计算能力](https://developer.nvidia.cn/cuda-gpus)
+> - 虽然 vLLM 可在 T4/V100 等 CC 7.x 的 NVIDIA GPU 上启动,但容易出现超时或 OOM,不推荐使用。
+> - 当前,PaddleOCR-VL 暂不支持 ARM 架构 CPU。后续将根据实际需求扩展更多硬件支持,敬请期待!
+> - vLLM、SGLang 和 FastDeploy 无法在 Windows 或 macOS 上原生运行,请使用我们提供的 Docker 镜像。
 
 由于不同硬件所需的依赖各不相同,如果您的硬件满足上述表格的要求,请参考下表查看对应的教程进行环境配置:
 
-| 硬件类型 | 硬件型号 | 环境配置教程 |
-|---|---|---|
-| NVIDIA GPU | RTX 30、40 系 | 本教程 |
-| NVIDIA GPU | RTX 50 系 | [PaddleOCR-VL RTX 50 环境配置教程](./PaddleOCR-VL-RTX50.md) |
-| x64 CPU | - | 本教程 |
-| XPU | 🚧 | 🚧 |
-| DCU | 🚧 | 🚧 |
+
+| 硬件类型 | 环境配置教程 |
+|-----------------|--------------------------------------------------|
+| x64 CPU | 本教程 |
+| 英伟达 GPU | - NVIDIA Blackwell 架构 GPU(如 RTX 50 系)参考 [PaddleOCR-VL NVIDIA Blackwell 架构 GPU 环境配置教程](./PaddleOCR-VL-NVIDIA-Blackwell.md)<br>
- 其他 NVIDIA GPU 参考本教程 | +| 昆仑芯 XPU | [PaddleOCR-VL XPU 环境配置教程](./PaddleOCR-VL-XPU.md) | +| 海光 DCU | [PaddleOCR-VL DCU 环境配置教程](./PaddleOCR-VL-DCU.md) | -> 例如您使用的是 RTX 50 系 GPU,满足 PaddlePaddle 和 vLLM 推理方式的设备要求,请参考 [PaddleOCR-VL RTX 50 环境配置教程](./PaddleOCR-VL-RTX50.md) 完成环境配置后再进行 PaddleOCR-VL 的使用。 +> TIP: +> 例如您使用的是 RTX 50 系 GPU,满足 PaddlePaddle 和 vLLM 推理方式的设备要求,请参考 [PaddleOCR-VL NVIDIA Blackwell 架构 GPU 环境配置教程](./PaddleOCR-VL-NVIDIA-Blackwell.md) 完成环境配置后再进行 PaddleOCR-VL 的使用。 ## 1. 环境准备 @@ -111,7 +135,7 @@ docker run \ # 在容器中调用 PaddleOCR CLI 或 Python API ``` -镜像的大小约为 8 GB。如果您希望在无法连接互联网的环境中使用 PaddleOCR-VL,请将上述命令中的 `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest` 更换为离线版本镜像 `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-offline`(离线镜像大小约为 11 GB)。您需要在可以联网的机器上拉取镜像,将镜像导入到离线机器,然后在离线机器使用该镜像启动容器。例如: +如果您希望在无法连接互联网的环境中使用 PaddleOCR-VL,请将上述命令中的 `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest` (镜像的大小约为 8 GB)更换为离线版本镜像 `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-offline`(镜像大小约为 10 GB)。您需要在可以联网的机器上拉取镜像,将镜像导入到离线机器,然后在离线机器使用该镜像启动容器。例如: ```shell # 在能够联网的机器上执行 @@ -145,12 +169,13 @@ source .venv_paddleocr/bin/activate # 以下命令安装 CUDA 12.6 版本的 PaddlePaddle,对于其他 CUDA 版本以及 CPU 版本,请参考 https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/develop/install/pip/linux-pip.html python -m pip install paddlepaddle-gpu==3.2.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/ python -m pip install -U "paddleocr[doc-parser]" -# 对于 Linux 系统,执行: +# 对于 Linux 系统,请直接复制并执行以下命令,无需修改链接中的 cuda 版本: python -m pip install https://paddle-whl.bj.bcebos.com/nightly/cu126/safetensors/safetensors-0.6.2.dev0-cp38-abi3-linux_x86_64.whl -# 对于Windows 系统,执行: +# 对于Windows 系统,请直接复制并执行以下命令: python -m pip install https://xly-devops.cdn.bcebos.com/safetensors-nightly/safetensors-0.6.2.dev0-cp38-abi3-win_amd64.whl ``` +> IMPORTANT: > **请注意安装 3.2.1 及以上版本的飞桨框架,同时安装特殊版本的 safetensors。** 对于 macOS 用户,请使用 Docker 进行环境搭建。 ## 2. 快速开始 @@ -159,6 +184,7 @@ python -m pip install https://xly-devops.cdn.bcebos.com/safetensors-nightly/safe PaddleOCR-VL 支持 CLI 命令行方式和 Python API 两种使用方式,其中 CLI 命令行方式更简单,适合快速验证功能,而 Python API 方式更灵活,适合集成到现有项目中。 +> TIP: > 本节所介绍的方法主要用于快速验证,其推理速度、显存占用及稳定性表现未必能满足生产环境的要求。**若需部署至生产环境,我们强烈建议使用专门的推理加速框架** ,具体方法请参考下一节。 ### 2.1 命令行方式体验 @@ -1044,7 +1070,7 @@ MKL-DNN 缓存容量。 ## 3. 
使用推理加速框架提升 VLM 推理性能 -默认配置下的推理性能未经过充分优化,可能无法满足实际生产需求。此步骤主要介绍如何使用 vLLM 和 SGLang 推理加速框架来提升 PaddleOCR-VL 的推理性能。 +默认配置下的推理性能未经过充分优化,可能无法满足实际生产需求。此步骤主要介绍如何使用 vLLM、SGLang 和 FastDeploy 推理加速框架来提升 PaddleOCR-VL 的推理性能。 ### 3.1 启动 VLM 推理服务 @@ -1056,7 +1082,37 @@ MKL-DNN 缓存容量。 #### 3.1.1 方法一:使用 Docker 镜像 -PaddleOCR 提供了 Docker 镜像(镜像大小约为 13 GB),用于快速启动 vLLM 推理服务。可使用以下命令启动服务(要求 Docker 版本 >= 19.03,机器装配有 GPU 且 NVIDIA 驱动支持 CUDA 12.6 或以上版本): +PaddleOCR 提供了 Docker 镜像,用于快速启动 vLLM 或 FastDeploy 推理服务。可使用以下命令启动服务(要求 Docker 版本 >= 19.03,机器装配有 GPU 且 NVIDIA 驱动支持 CUDA 12.6 或以上版本): + +=== "启动 vLLM 服务" + + ```shell + docker run \ + -it \ + --rm \ + --gpus all \ + --network host \ + ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest \ + paddleocr genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend vllm + ``` + + 如果您希望在无法连接互联网的环境中启动服务,请将上述命令中的 `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest`(镜像大小约为 13 GB)更换为离线版本镜像 `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest-offline`(镜像大小约为 15 GB)。 + +=== "启动 FastDeploy 服务" + + ```shell + docker run \ + -it \ + --rm \ + --gpus all \ + --network host \ + ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-fastdeploy-server:latest \ + paddleocr genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend fastdeploy + ``` + + 如果您希望在无法连接互联网的环境中启动服务,请将上述命令中的 `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-fastdeploy-server:latest`(镜像大小约为 43 GB)更换为离线版本镜像 `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-fastdeploy-server:latest-offline`(镜像大小约为 45 GB)。 + +启动 vLLM 或 FastDeploy 推理服务时,我们提供了一套默认参数设置。如果您有调整显存占用等更多参数的需求,可以自行配置更多参数。请参考 [3.3.1 服务端参数调整](#331-服务端参数调整) 创建配置文件,然后将该文件挂载到容器中,并在启动服务的命令中使用 `backend_config` 指定配置文件,以 vLLM 为例: ```shell docker run \ @@ -1064,14 +1120,11 @@ docker run \ --rm \ --gpus all \ --network host \ + -v vllm_config.yml:/tmp/vllm_config.yml \ ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest \ - paddleocr genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend vllm + paddleocr genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend vllm --backend_config /tmp/vllm_config.yml ``` -启动 vLLM 推理服务可以传入更多参数,支持的参数详见下一小节。 - -如果您希望在无法连接互联网的环境中启动服务,请将上述命令中的 `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest` 更换为离线版本镜像 `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest-offline`。离线镜像大小约为 15 GB。 - #### 3.1.2 方法二:通过 PaddleOCR CLI 安装和使用 由于推理加速框架可能与飞桨框架存在依赖冲突,建议在虚拟环境中安装。以 vLLM 为例: @@ -1094,7 +1147,7 @@ paddleocr install_genai_server_deps vllm paddleocr install_genai_server_deps <推理加速框架名称> ``` -当前支持的框架名称为 `vllm` 和 `sglang`,分别对应 vLLM 和 SGLang。 +当前支持的框架名称为 `vllm`、`sglang` 和 `fastdeploy`,分别对应 vLLM、SGLang 和 FastDeploy。 通过 `paddleocr install_genai_server_deps` 安装的 vLLM 与 SGLang 均为 **CUDA 12.6** 版本,请确保本地 NVIDIA 驱动与此版本一致或更高。 @@ -1147,6 +1200,7 @@ pipeline = PaddleOCRVL(vl_rec_backend="vllm-server", vl_rec_server_url="http://1 - [vLLM 官方参数调优指南](https://docs.vllm.ai/en/latest/configuration/optimization.html) - [SGLang 超参数调整文档](https://docs.sglang.ai/advanced_features/hyperparameter_tuning.html) +- [FastDeploy 最佳实践文档](https://paddlepaddle.github.io/FastDeploy/zh/best_practices/PaddleOCR-VL-0.9B/) PaddleOCR VLM 推理服务支持通过配置文件进行调参。以下示例展示如何调整 vLLM 服务器的 `gpu-memory-utilization` 和 `max-num-seqs` 参数: @@ -1185,7 +1239,10 @@ PaddleOCR 会将来自单张或多张输入图像中的子图分组并对服务 
**NVIDIA RTX 3060** - **服务端** - - vLLM:`gpu-memory-utilization=0.8` + - vLLM:`gpu-memory-utilization: 0.8` + - FastDeploy: + - `gpu-memory-utilization: 0.8` + - `max-concurrency: 2048` ## 4. 服务化部署 @@ -1193,7 +1250,7 @@ PaddleOCR 会将来自单张或多张输入图像中的子图分组并对服务 - 方法一:使用 Docker Compose 部署(推荐使用)。 -- 方法二:手动安装依赖部署。 +- 方法二:手动部署。 请注意,本节所介绍 PaddleOCR-VL 服务与上一节中的 VLM 推理服务有所区别:后者仅负责完整流程中的一个环节(即 VLM 推理),并作为前者的底层服务被调用。 @@ -1217,17 +1274,105 @@ paddleocr-vl-api | INFO: Uvicorn running on http://0.0.0.0:8080 此方式基于 vLLM 等框架对 VLM 推理进行加速,更适合生产环境部署,但要求机器配备 GPU,并且 NVIDIA 驱动程序支持 CUDA 12.6 或以上版本。 -`.env` 文件可用于配置环境变量,详细介绍如下: +此外,使用此方式启动服务器后,除拉取镜像外,无需连接互联网。如需在离线环境中部署,可先在联网机器上拉取 Compose 文件中涉及的镜像,导出并传输至离线机器中导入,即可在离线环境下启动服务。 + +Docker Compose 通过读取 `.env` 和 `compose.yaml` 文件中配置,先后启动 2 个容器,分别运行底层 VLM 推理服务,以及 PaddleOCR-VL 服务(产线服务)。 + +`.env` 文件中包含的各环境变量含义如下: - `API_IMAGE_TAG_SUFFIX`:启动产线服务使用的镜像的标签后缀。默认为 `latest-offline`,表示使用离线 GPU 镜像。 - `VLM_BACKEND`:VLM 推理后端,目前支持 `vllm` 和 `fastdeploy`。默认为 `vllm`。 - `VLM_IMAGE_TAG_SUFFIX`:启动 VLM 推理服务使用的镜像的标签后缀。默认为 `latest-offline`,表示使用离线 GPU 镜像。 -此外,使用此方式启动服务器后,除拉取镜像外,无需连接互联网。如需在离线环境中部署,可先在联网机器上拉取 Compose 文件中涉及的镜像,导出并传输至离线机器中导入,即可在离线环境下启动服务。 +您可以通过修改 `.env` 和 `compose.yaml` 来满足自定义需求,例如: -如需调整产线相关配置(如模型路径、批处理大小、部署设备等),可参考 4.4 小节。 +
+1. 更改 PaddleOCR-VL 服务的端口 + +编辑 compose.yaml 文件中的 paddleocr-vl-api.ports 来更改端口。例如,如果您需要将服务端口更换为 8111,可以进行以下修改: + +```diff + paddleocr-vl-api: + ... + ports: +- - 8080:8080 ++ - 8111:8080 + ... +``` + +
+ +
+2. 指定 PaddleOCR-VL 服务所使用的 GPU + +编辑 compose.yaml 文件中的 device_ids 来更改所使用的 GPU。例如,如果您需要使用卡 1 进行部署,可以进行以下修改: + +```diff + paddleocr-vl-api: + ... + deploy: + resources: + reservations: + devices: + - driver: nvidia +- device_ids: ["0"] ++ device_ids: ["1"] + capabilities: [gpu] + ... + paddleocr-vlm-server: + ... + deploy: + resources: + reservations: + devices: + - driver: nvidia +- device_ids: ["0"] ++ device_ids: ["1"] + capabilities: [gpu] + ... +``` + +
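+如需确认修改后各容器实际可见的 GPU,且镜像中带有 NVIDIA 工具(此处仅为假设,本教程未作保证),可以执行:
+
+```shell
+# 列出 VLM 推理服务容器内可见的 GPU
+docker compose exec paddleocr-vlm-server nvidia-smi -L
+```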
+ +
+3. 调整 VLM 服务端配置
+
+若您想调整 VLM 服务端的配置,可以参考 [3.3.1 服务端参数调整](#331-服务端参数调整) 生成配置文件。
+
+生成配置文件后,将以下的 paddleocr-vlm-server.volumes 和 paddleocr-vlm-server.command 字段增加到您的 compose.yaml 中。请将 /path/to/your_config.yaml 替换为您的实际配置文件路径。
+
+```yaml
+  paddleocr-vlm-server:
+    ...
+    volumes:
+      - /path/to/your_config.yaml:/home/paddleocr/vlm_server_config.yaml
+    command: paddleocr genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend vllm --backend_config /home/paddleocr/vlm_server_config.yaml
+    ...
+```
+
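+作为参考,vLLM 后端的一个最简配置文件可能如下;所示两个参数即 3.3.1 小节中讨论的参数,取值仅为占位示例,请按实际硬件调整:
+
+```shell
+# 写入示例后端配置文件(请按需调整取值)
+cat > /path/to/your_config.yaml <<EOF
+gpu-memory-utilization: 0.8
+max-num-seqs: 128
+EOF
+```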
+ +
+4. 更改 VLM 推理后端 + +修改 .env 文件中的 VLM_BACKEND,例如将 VLM 推理后端修改为 fastdeploy: + +```diff + API_IMAGE_TAG_SUFFIX=latest-offline +- VLM_BACKEND=vllm ++ VLM_BACKEND=fastdeploy + VLM_IMAGE_TAG_SUFFIX=latest-offline +``` + +
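+请注意,仅修改 `.env` 不会影响已在运行的容器;需要重新创建部署,且 Compose 首次使用时会拉取 FastDeploy 服务镜像(体积较大,参见 3.1.1 小节):
+
+```shell
+# 停止旧容器,然后以新的后端重新启动
+docker compose down
+docker compose up
+```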
+ +
+5. 调整产线相关配置(如模型路径、批处理大小、部署设备等) + +参考本文中 4.4 产线配置调整说明 小节。 + +
-### 4.2 方法二:手动安装依赖部署 +### 4.2 方法二:手动部署 执行以下命令,通过 PaddleX CLI 安装服务化部署插件: @@ -2132,6 +2277,7 @@ foreach ($result as $i => $item) { ### 4.4 产线配置调整说明 +> NOTE: > 若您无需调整产线配置,可忽略此小节。 调整服务化部署的 PaddleOCR-VL 配置只需以下三步: @@ -2149,7 +2295,7 @@ foreach ($result as $i => $item) { - vLLM:[pipeline_config_vllm.yaml](https://github.com/PaddlePaddle/PaddleOCR/blob/main/deploy/paddleocr_vl_docker/pipeline_config_vllm.yaml) - FastDeploy:[pipeline_config_fastdeploy.yaml](https://github.com/PaddlePaddle/PaddleOCR/blob/main/deploy/paddleocr_vl_docker/pipeline_config_fastdeploy.yaml) -**若您手动安装依赖部署:** +**若您是手动部署:** 执行以下命令生成产线配置文件: @@ -2245,7 +2391,7 @@ services: > 在生产环境中,您也可以自行构建镜像,将配置文件打包到镜像中。 -**若您手动安装依赖部署:** +**若您是手动部署:** 在启动服务时,将 `--pipeline` 参数指定为自定义配置文件路径。 diff --git a/mkdocs.yml b/mkdocs.yml index 971d98ebf1..765dea42ca 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -110,7 +110,9 @@ plugins: PP-ChatOCRv4简介: PP-ChatOCRv4 Introduction PaddleOCR-VL: PaddleOCR-VL PaddleOCR-VL简介: PaddleOCR-VL Introduction - PaddleOCR-VL RTX 50 环境配置教程: PaddleOCR-VL-RTX50 Environment Configuration Tutorial + PaddleOCR-VL NVIDIA Blackwell 架构 GPU 环境配置教程: PaddleOCR-VL NVIDIA Blackwell-Architecture GPUs Environment Configuration Tutorial + PaddleOCR-VL XPU 环境配置教程: PaddleOCR-VL XPU Environment Configuration Tutorial + PaddleOCR-VL DCU 环境配置教程: PaddleOCR-VL DCU Environment Configuration Tutorial 推理部署: Model Deploy 高性能推理: High-Performance Inference 打包PaddleOCR项目: Package PaddleOCR Projects @@ -226,6 +228,7 @@ plugins: markdown_extensions: - abbr - attr_list + - callouts - pymdownx.snippets - pymdownx.critic - pymdownx.caret @@ -298,7 +301,9 @@ nav: - PaddleOCR-VL: - 使用教程: version3.x/pipeline_usage/PaddleOCR-VL.md - PaddleOCR-VL简介: version3.x/algorithm/PaddleOCR-VL/PaddleOCR-VL.md - - PaddleOCR-VL RTX 50 环境配置教程: version3.x/pipeline_usage/PaddleOCR-VL-RTX50.md + - PaddleOCR-VL NVIDIA Blackwell 架构 GPU 环境配置教程: version3.x/pipeline_usage/PaddleOCR-VL-NVIDIA-Blackwell.md + - PaddleOCR-VL XPU 环境配置教程: version3.x/pipeline_usage/PaddleOCR-VL-XPU.md + - PaddleOCR-VL DCU 环境配置教程: version3.x/pipeline_usage/PaddleOCR-VL-DCU.md - 推理部署: - 高性能推理: version3.x/deployment/high_performance_inference.md - 获取ONNX模型: version3.x/deployment/obtaining_onnx_models.md