
Commit d8d9536

[Feat] Add PaddleOCR-VL MCP server (#17057)
* Support PaddleOCR-VL
* Update docs
* Install from PyPI
* Update docs
1 parent 51037d2 commit d8d9536

File tree

5 files changed: 138 additions, 93 deletions

docs/version3.x/deployment/mcp_server.en.md

Lines changed: 28 additions & 26 deletions
@@ -14,9 +14,10 @@ This project provides a lightweight [Model Context Protocol (MCP)](https://model
  - **Currently Supported Tools**
    - **OCR**: Performs text detection and recognition on images and PDF files.
    - **PP-StructureV3**: Identifies and extracts text blocks, titles, paragraphs, images, tables, and other layout elements from images or PDF files, converting the input into Markdown documents.
+   - **PaddleOCR-VL**: Identifies and extracts text blocks, titles, paragraphs, images, tables, and other layout elements from images or PDF files, converting the input into Markdown documents. A VLM-based approach is used.
  - **Supported Working Modes**
    - **Local Python Library**: Runs PaddleOCR pipelines directly on the local machine. This mode requires a suitable local environment and hardware, and is ideal for offline use or privacy-sensitive scenarios.
-   - **AI Studio Community Service**: Invokes services hosted on the [PaddlePaddle AI Studio Community](https://aistudio.baidu.com/pipeline/mine). This is suitable for quick testing, prototyping, or no-code scenarios.
+   - **PaddleOCR Official Website Service**: Invokes services provided by the [PaddleOCR Official Website](https://aistudio.baidu.com/paddleocr?lang=en). This is suitable for quick testing, prototyping, or no-code scenarios.
    - **Self-hosted Service**: Invokes the user's self-hosted PaddleOCR services. This mode offers the advantages of service-based deployment and high flexibility. It is suitable for scenarios requiring customized service configurations, as well as those with strict data privacy requirements. **Currently, only the basic serving solution is supported.**

  ## Examples:
@@ -83,7 +84,7 @@ Convert images containing formulas and tables to editable csv/Excel format:
  This section explains how to install the `paddleocr-mcp` library via pip.

  - For the local Python library mode, you need to install both `paddleocr-mcp` and the PaddlePaddle framework along with PaddleOCR, as per the [PaddleOCR installation documentation](../installation.en.md).
- - For the AI Studio community service or the self-hosted service modes, if used within MCP hosts like Claude for Desktop, the server can also be run without installation via tools like `uvx`. See [2. Using with Claude for Desktop](#2-using-with-claude-for-desktop) for details.
+ - For the PaddleOCR official website service or the self-hosted service modes, if used within MCP hosts like Claude for Desktop, the server can also be run without installation via tools like `uvx`. See [2. Using with Claude for Desktop](#2-using-with-claude-for-desktop) for details.

  For the local Python library mode you may optionally choose convenience extras (helpful to reduce manual dependency steps):
@@ -95,19 +96,19 @@ It is still recommended to use an isolated virtual environment to avoid conflict
  To install `paddleocr-mcp` using pip:

  ```bash
- # Install the wheel
- pip install https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/mcp/paddleocr_mcp/releases/v0.2.1/paddleocr_mcp-0.2.1-py3-none-any.whl
+ # Install from PyPI
+ pip install -U paddleocr-mcp

  # Or install from source
  # git clone https://github.com/PaddlePaddle/PaddleOCR.git
  # pip install -e mcp_server

  # Install with optional extras (choose ONE of the following if you prefer convenience installs)
  # Install PaddleOCR together with the MCP server (framework not included):
- pip install "paddleocr-mcp[local] @ https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/mcp/paddleocr_mcp/releases/v0.2.1/paddleocr_mcp-0.2.1-py3-none-any.whl"
+ pip install "paddleocr-mcp[local]"

  # Install PaddleOCR and CPU PaddlePaddle framework together:
- pip install "paddleocr-mcp[local-cpu] @ https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/mcp/paddleocr_mcp/releases/v0.2.1/paddleocr_mcp-0.2.1-py3-none-any.whl"
+ pip install "paddleocr-mcp[local-cpu]"
  ```

  To verify successful installation:
@@ -159,6 +160,7 @@ This section explains how to use the PaddleOCR MCP server within Claude for Desk

  **Notes**:

+ - `PADDLEOCR_MCP_PIPELINE` should be set to the pipeline name. See Section 4 for more details.
  - `PADDLEOCR_MCP_PIPELINE_CONFIG` is optional; if not set, the default pipeline configuration will be used. If you need to adjust the configuration, such as changing the model, please refer to the [PaddleOCR documentation](../paddleocr_and_paddlex.md) to export the pipeline configuration file, and set `PADDLEOCR_MCP_PIPELINE_CONFIG` to the absolute path of this configuration file.

  - **Inference Performance Tips**:
@@ -195,6 +197,8 @@ This section explains how to use the PaddleOCR MCP server within Claude for Desk
  pipeline.export_paddlex_config_to_yaml("PP-StructureV3.yaml")
  ```

+ **For PaddleOCR-VL, it is not recommended to use CPUs for inference.**
+
  **Important**:

  - If `paddleocr_mcp` is not in your system's `PATH`, set `command` to the absolute path of the executable.
@@ -219,16 +223,15 @@ You can configure the MCP server according to your requirements to run in differ

  See [2.1 Quick Start](#21-quick-start).

- #### Mode 2: AI Studio Community Service
+ #### Mode 2: PaddleOCR Official Website Service

  1. Install `paddleocr-mcp`.
- 2. Set up AI Studio community service.
-    - Visit [PaddlePaddle AI Studio Community](https://aistudio.baidu.com/pipeline/mine) and log in.
-    - Under "PaddleX Pipeline" in the "More" section on the left, click in sequence: [Create Pipeline] - [OCR] - [General OCR] - [Deploy Directly] - [Start Deployment].
-    - After deployment, obtain your **service base URL** (e.g., `https://xxxxxx.aistudio-hub.baidu.com`).
-    - Get your **access token** from [this page](https://aistudio.baidu.com/index/accessToken).
+ 2. Obtain the service base URL and AI Studio Community access token.
+
+    On the PaddleOCR official website page, click "API" in the upper-left corner. Copy the `API_URL` corresponding to "Text Recognition (PP-OCRv5)", and remove the trailing endpoint (`/ocr`) to get the base URL of the service (e.g., `https://xxxxxx.aistudio-app.com`). Also copy the `TOKEN`, which is your access token. You may need to register and log in to your PaddlePaddle AI Studio Community account.
+
  3. Refer to the configuration example below to modify the contents of the `claude_desktop_config.json` file.
- 3. Restart the MCP host.
+ 4. Restart the MCP host.

  Configuration example:

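The unchanged JSON block is elided by the diff view. As an illustrative stand-in, the sketch below derives the base URL by stripping the trailing `/ocr` endpoint, as step 2 above describes, then assembles a config entry; the server name `paddleocr-ocr` and the exact entry shape are assumptions, and the concrete URL is a placeholder:

```python
import json

# Step 2 above: the copied API_URL ends with the endpoint path; strip it
# to get the service base URL. The URL shown is a placeholder, not real.
api_url = "https://xxxxxx.aistudio-app.com/ocr"
base_url = api_url.removesuffix("/ocr")

# Hypothetical claude_desktop_config.json entry. The env keys come from
# the variable table in Section 4; the server name is an assumption.
entry = {
    "mcpServers": {
        "paddleocr-ocr": {
            "command": "paddleocr_mcp",
            "args": [],
            "env": {
                "PADDLEOCR_MCP_PIPELINE": "OCR",
                "PADDLEOCR_MCP_PPOCR_SOURCE": "aistudio",
                "PADDLEOCR_MCP_SERVER_URL": base_url,
                "PADDLEOCR_MCP_AISTUDIO_ACCESS_TOKEN": "<your-access-token>",
            },
        }
    }
}
print(json.dumps(entry, indent=2))
```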
@@ -251,22 +254,20 @@ Configuration example:

  **Notes**:

- - Replace `<your-server-url>` with your AI Studio service base URL, e.g., `https://xxxxx.aistudio-hub.baidu.com`. Make sure not to include the endpoint path (such as `/ocr`).
+ - `PADDLEOCR_MCP_PIPELINE` should be set to the pipeline name. See Section 4 for more details.
+ - Replace `<your-server-url>` with your service base URL.
  - Replace `<your-access-token>` with your access token.

  **Important**:

  - Do not expose your access token.

- You may also train and deploy custom models on the platform.
-
  #### Mode 3: Self-hosted Service

  1. In the environment where you need to run the PaddleOCR inference server, run the inference server as per the [PaddleOCR serving documentation](./serving.en.md).
  2. Install `paddleocr-mcp` where the MCP server will run.
- 3. Refer to the configuration example below to modify the contents of the `claude_desktop_config.json` file.
- 4. Set `PADDLEOCR_MCP_SERVER_URL` (e.g., `"http://127.0.0.1:8000"`).
- 5. Restart the MCP host.
+ 3. Refer to the configuration example below to modify the contents of the `claude_desktop_config.json` file. Set `PADDLEOCR_MCP_SERVER_URL` (e.g., `"http://127.0.0.1:8000"`).
+ 4. Restart the MCP host.

  Configuration example:

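Here too the unchanged JSON example is elided by the diff. An illustrative sketch of the self-hosted entry (the server name `paddleocr-ocr` is an assumption); it differs from Mode 2 mainly in the `PADDLEOCR_MCP_PPOCR_SOURCE` value and the absence of an access token:

```python
import json

# Hypothetical claude_desktop_config.json entry for Mode 3 (self-hosted).
# Env keys come from the variable table in Section 4; the server name
# "paddleocr-ocr" is an assumption, not taken from the document.
entry = {
    "mcpServers": {
        "paddleocr-ocr": {
            "command": "paddleocr_mcp",
            "args": [],
            "env": {
                "PADDLEOCR_MCP_PIPELINE": "PP-StructureV3",
                "PADDLEOCR_MCP_PPOCR_SOURCE": "self_hosted",
                "PADDLEOCR_MCP_SERVER_URL": "http://127.0.0.1:8000",
            },
        }
    }
}
print(json.dumps(entry, indent=2))
```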
@@ -288,11 +289,12 @@ Configuration example:

  **Note**:

+ - `PADDLEOCR_MCP_PIPELINE` should be set to the pipeline name. See Section 4 for more details.
  - Replace `<your-server-url>` with your service’s base URL (e.g., `http://127.0.0.1:8000`).

  ### 2.4 Using `uvx`

- Currently, for the AI Studio and self-hosted modes, and (for CPU inference) the local mode, starting the MCP server via `uvx` is also supported. With this approach, manual installation of `paddleocr-mcp` is not required. The main steps are as follows:
+ Currently, for the PaddleOCR official website and self-hosted modes, and (for CPU inference) the local mode, starting the MCP server via `uvx` is also supported. With this approach, manual installation of `paddleocr-mcp` is not required. The main steps are as follows:

  1. Install [uv](https://docs.astral.sh/uv/#installation).
  2. Modify `claude_desktop_config.json`. Examples:
@@ -306,7 +308,7 @@ Currently, for the AI Studio and self-hosted modes, and (for CPU inference) the
        "command": "uvx",
        "args": [
          "--from",
-         "paddleocr-mcp@https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/mcp/paddleocr_mcp/releases/v0.2.1/paddleocr_mcp-0.2.1-py3-none-any.whl",
+         "paddleocr-mcp",
          "paddleocr_mcp"
        ],
        "env": {
@@ -328,7 +330,7 @@ Currently, for the AI Studio and self-hosted modes, and (for CPU inference) the
        "command": "uvx",
        "args": [
          "--from",
-         "paddleocr_mcp[local-cpu]@https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/mcp/paddleocr_mcp/releases/v0.2.1/paddleocr_mcp-0.2.1-py3-none-any.whl",
+         "paddleocr_mcp[local-cpu]",
          "paddleocr_mcp"
        ],
        "env": {
@@ -355,7 +357,7 @@ paddleocr_mcp --help
  Example commands:

  ```bash
- # OCR + AI Studio community service + stdio
+ # OCR + PaddleOCR official website service + stdio
  PADDLEOCR_MCP_AISTUDIO_ACCESS_TOKEN=xxxxxx paddleocr_mcp --pipeline OCR --ppocr_source aistudio --server_url https://xxxxxx.aistudio-hub.baidu.com

  # PP-StructureV3 + local Python library + stdio
@@ -373,8 +375,8 @@ You can control the MCP server via environment variables or CLI arguments.

  | Environment Variable | CLI Argument | Type | Description | Options | Default |
  | ------------------------------------- | ------------------------- | ------ | --------------------------------------------------------------------- | ---------------------------------------- | ------------- |
- | `PADDLEOCR_MCP_PIPELINE` | `--pipeline` | `str` | Pipeline to run. | `"OCR"`, `"PP-StructureV3"` | `"OCR"` |
- | `PADDLEOCR_MCP_PPOCR_SOURCE` | `--ppocr_source` | `str` | Source of PaddleOCR capabilities. | `"local"` (local Python library), `"aistudio"` (AI Studio community service), `"self_hosted"` (self-hosted service) | `"local"` |
+ | `PADDLEOCR_MCP_PIPELINE` | `--pipeline` | `str` | Pipeline to run. | `"OCR"`, `"PP-StructureV3"`, `"PaddleOCR-VL"` | `"OCR"` |
+ | `PADDLEOCR_MCP_PPOCR_SOURCE` | `--ppocr_source` | `str` | Source of PaddleOCR capabilities. | `"local"` (local Python library), `"aistudio"` (PaddleOCR official website service), `"self_hosted"` (self-hosted service) | `"local"` |
  | `PADDLEOCR_MCP_SERVER_URL` | `--server_url` | `str` | Base URL for the underlying service (`aistudio` or `self_hosted` mode only). | - | `None` |
  | `PADDLEOCR_MCP_AISTUDIO_ACCESS_TOKEN` | `--aistudio_access_token` | `str` | AI Studio access token (`aistudio` mode only). | - | `None` |
  | `PADDLEOCR_MCP_TIMEOUT` | `--timeout` | `int` | Read timeout for the underlying requests (seconds). | - | `60` |
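An illustrative sketch (this mimics the documented defaults in the table above, and is not the package's actual parsing code) of how the settings resolve when a variable is unset:

```python
import os

# Defaults taken from the variable table above; a sketch, not the
# paddleocr-mcp implementation.
def read_settings(env=None):
    env = dict(os.environ) if env is None else env
    return {
        "pipeline": env.get("PADDLEOCR_MCP_PIPELINE", "OCR"),
        "ppocr_source": env.get("PADDLEOCR_MCP_PPOCR_SOURCE", "local"),
        "server_url": env.get("PADDLEOCR_MCP_SERVER_URL"),  # default: None
        "timeout": int(env.get("PADDLEOCR_MCP_TIMEOUT", "60")),
    }

# With nothing set, the documented defaults apply.
print(read_settings({}))
```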
@@ -389,4 +391,4 @@ You can control the MCP server via environment variables or CLI arguments.

  - In the local Python library mode, the current tools cannot process PDF document inputs that are Base64 encoded.
  - In the local Python library mode, the current tools do not infer the file type based on the model's `file_type` prompt, and may fail to process some complex URLs.
- - For the PP-StructureV3 pipeline, if the input file contains images, the returned results may significantly increase token usage. If image content is not needed, you can explicitly exclude it through prompts to reduce resource consumption.
+ - For the PP-StructureV3 and PaddleOCR-VL pipelines, if the input file contains images, the returned results may significantly increase token usage. If image content is not needed, you can explicitly exclude it through prompts to reduce resource consumption.
