Commit 62de4f4

[Frontend] Resettle pooling entrypoints (#29634)
Signed-off-by: wang.yuqi <[email protected]>
1 parent 83805a6 commit 62de4f4

39 files changed: +1264 -1067 lines changed

.github/CODEOWNERS

Lines changed: 1 addition & 0 deletions
@@ -149,6 +149,7 @@ mkdocs.yaml @hmellor
 /examples/*/pooling/ @noooop
 /tests/models/*/pooling* @noooop
 /tests/entrypoints/pooling @noooop
+/vllm/entrypoints/pooling @aarnphm @chaunceyjiang @noooop
 /vllm/config/pooler.py @noooop
 /vllm/pooling_params.py @noooop
 /vllm/model_executor/layers/pooler.py @noooop

docs/design/io_processor_plugins.md

Lines changed: 1 addition & 1 deletion
@@ -77,7 +77,7 @@ The `parse_request` method is used for validating the user prompt and converting
 The `pre_process*` methods take the validated plugin input to generate vLLM's model prompts for regular inference.
 The `post_process*` methods take `PoolingRequestOutput` objects as input and generate a custom plugin output.
 The `validate_or_generate_params` method is used for validating with the plugin any `SamplingParameters`/`PoolingParameters` received with the user request, or to generate new ones if none are specified. The function always returns the validated/generated parameters.
-The `output_to_response` method is used only for online serving and converts the plugin output to the `IOProcessorResponse` type that is then returned by the API Server. The implementation of the `/pooling` serving endpoint is available here [vllm/entrypoints/openai/serving_pooling.py](../../vllm/entrypoints/openai/serving_pooling.py).
+The `output_to_response` method is used only for online serving and converts the plugin output to the `IOProcessorResponse` type that is then returned by the API Server. The implementation of the `/pooling` serving endpoint is available here [vllm/entrypoints/openai/serving_pooling.py](../../vllm/entrypoints/pooling/pooling/serving.py).

 An example implementation of a plugin that enables generating geotiff images with the PrithviGeospatialMAE model is available [here](https://github.com/IBM/terratorch/tree/main/terratorch/vllm/plugins/segmentation). Please, also refer to our online ([examples/online_serving/pooling/prithvi_geospatial_mae.py](../../examples/online_serving/pooling/prithvi_geospatial_mae.py)) and offline ([examples/offline_inference/pooling/prithvi_geospatial_mae_io_processor.py](../../examples/offline_inference/pooling/prithvi_geospatial_mae_io_processor.py)) inference examples.
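For orientation, here is a minimal structural sketch of the plugin surface the doc above describes. The method names come from the documentation text; the base class, signatures, and types are assumptions, not the actual vLLM interface:

```python
# Structural sketch only: method names follow the doc text above; the
# base class and exact signatures are assumptions, not vLLM's real API.
from typing import Any


class ExampleIOProcessorPlugin:  # hypothetical stand-in for the IOProcessor base
    def parse_request(self, request: Any) -> Any:
        """Validate the user prompt and convert it to the plugin input type."""
        return request

    def pre_process(self, plugin_input: Any) -> list[Any]:
        """Turn validated plugin input into vLLM model prompts for inference."""
        return [plugin_input]

    def post_process(self, outputs: list[Any]) -> Any:
        """Take PoolingRequestOutput objects and build the custom plugin output."""
        return outputs

    def validate_or_generate_params(self, params: Any = None) -> Any:
        """Validate any SamplingParameters/PoolingParameters from the request,
        or generate defaults; always return the validated/generated params."""
        return params if params is not None else {}

    def output_to_response(self, plugin_output: Any) -> Any:
        """Online serving only: convert plugin output to an IOProcessorResponse."""
        return plugin_output
```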

docs/serving/openai_compatible_server.md

Lines changed: 5 additions & 5 deletions
@@ -351,15 +351,15 @@ The following extra parameters are supported by default:
 ??? code

     ```python
-    --8<-- "vllm/entrypoints/openai/protocol.py:embedding-extra-params"
+    --8<-- "vllm/entrypoints/pooling/embed/protocol.py:embedding-extra-params"
     ```

 For chat-like input (i.e. if `messages` is passed), these extra parameters are supported instead:

 ??? code

     ```python
-    --8<-- "vllm/entrypoints/openai/protocol.py:chat-embedding-extra-params"
+    --8<-- "vllm/entrypoints/pooling/embed/protocol.py:chat-embedding-extra-params"
     ```

 ### Transcriptions API
@@ -629,7 +629,7 @@ The following [pooling parameters][vllm.PoolingParams] are supported.
 The following extra parameters are supported:

     ```python
-    --8<-- "vllm/entrypoints/openai/protocol.py:classification-extra-params"
+    --8<-- "vllm/entrypoints/pooling/classify/protocol.py:classification-extra-params"
     ```

 ### Score API
@@ -834,7 +834,7 @@ The following [pooling parameters][vllm.PoolingParams] are supported.
 The following extra parameters are supported:

     ```python
-    --8<-- "vllm/entrypoints/openai/protocol.py:score-extra-params"
+    --8<-- "vllm/entrypoints/pooling/score/protocol.py:score-extra-params"
     ```

 ### Re-rank API
@@ -915,7 +915,7 @@ The following [pooling parameters][vllm.PoolingParams] are supported.
 The following extra parameters are supported:

     ```python
-    --8<-- "vllm/entrypoints/openai/protocol.py:rerank-extra-params"
+    --8<-- "vllm/entrypoints/pooling/score/protocol.py:rerank-extra-params"
     ```

 ## Ray Serve LLM
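The relocation above is server-internal; clients keep calling the same OpenAI-compatible endpoints. A hedged usage sketch (model name and the extra-parameter comment are placeholders, not taken from this commit):

```python
# The protocol modules moved, but the HTTP API is unchanged: extra
# parameters documented in vllm/entrypoints/pooling/embed/protocol.py
# still go alongside the standard fields in the request body.
import requests

resp = requests.post(
    "http://localhost:8000/v1/embeddings",
    json={
        "model": "intfloat/e5-small",  # placeholder: any served embedding model
        "input": ["vLLM pooling entrypoints"],
        # e.g. "some_extra_param": ...  (placeholder field name)
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["data"][0]["embedding"][:4])
```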

tests/entrypoints/openai/test_run_batch.py

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@

 import pytest

-from vllm.entrypoints.openai.protocol import BatchRequestOutput
+from vllm.entrypoints.openai.run_batch import BatchRequestOutput

 MODEL_NAME = "hmellor/tiny-random-LlamaForCausalLM"
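A quick way to confirm the new import location, assuming `BatchRequestOutput` remains a Pydantic model like the other protocol types (an assumption, not stated in this commit):

```python
# BatchRequestOutput is now imported from the run_batch module rather than
# the shared openai protocol module.
from vllm.entrypoints.openai.run_batch import BatchRequestOutput

# Pydantic v2 models expose their field names; handy for a smoke check
# after the move (the field set is whatever the installed vLLM defines).
print(sorted(BatchRequestOutput.model_fields))
```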

tests/entrypoints/pooling/classify/test_online.py

Lines changed: 2 additions & 1 deletion
@@ -7,7 +7,8 @@
 import torch.nn.functional as F

 from tests.utils import RemoteOpenAIServer
-from vllm.entrypoints.openai.protocol import ClassificationResponse, PoolingResponse
+from vllm.entrypoints.pooling.classify.protocol import ClassificationResponse
+from vllm.entrypoints.pooling.pooling.protocol import PoolingResponse

 MODEL_NAME = "jason9693/Qwen2.5-1.5B-apeach"
 DTYPE = "float32"  # Use float32 to avoid NaN issue
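The same split recurs in the test files below: each response type now lives in the protocol module of its own pooling sub-entrypoint. A hedged sketch of using the relocated class against a running server (endpoint path, host, and request shape are assumptions based on these tests, not asserted by this commit):

```python
# Illustrative only: validate a /classify reply with the relocated class.
import requests

from vllm.entrypoints.pooling.classify.protocol import ClassificationResponse

raw = requests.post(
    "http://localhost:8000/classify",  # assumed endpoint, per the test names
    json={"model": "jason9693/Qwen2.5-1.5B-apeach", "input": "some text"},
    timeout=30,
)
raw.raise_for_status()
resp = ClassificationResponse.model_validate(raw.json())
print(resp)
```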

tests/entrypoints/pooling/classify/test_online_vision.py

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 import requests

 from tests.utils import RemoteOpenAIServer
-from vllm.entrypoints.openai.protocol import ClassificationResponse
+from vllm.entrypoints.pooling.classify.protocol import ClassificationResponse

 VLM_MODEL_NAME = "muziyongshixin/Qwen2.5-VL-7B-for-VideoCls"
 MAXIMUM_VIDEOS = 1

tests/entrypoints/pooling/embed/test_online.py

Lines changed: 2 additions & 4 deletions
@@ -15,10 +15,8 @@
 from tests.models.language.pooling.embed_utils import run_embedding_correctness_test
 from tests.models.utils import check_embeddings_close
 from tests.utils import RemoteOpenAIServer
-from vllm.entrypoints.openai.protocol import (
-    EmbeddingResponse,
-    PoolingResponse,
-)
+from vllm.entrypoints.pooling.embed.protocol import EmbeddingResponse
+from vllm.entrypoints.pooling.pooling.protocol import PoolingResponse
 from vllm.platforms import current_platform
 from vllm.transformers_utils.tokenizer import get_tokenizer
 from vllm.utils.serial_utils import (

tests/entrypoints/pooling/embed/test_online_dimensions.py

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@
 from tests.models.language.pooling.embed_utils import run_embedding_correctness_test
 from tests.models.utils import EmbedModelInfo
 from tests.utils import RemoteOpenAIServer
-from vllm.entrypoints.openai.protocol import EmbeddingResponse
+from vllm.entrypoints.pooling.embed.protocol import EmbeddingResponse
 from vllm.platforms import current_platform

 if current_platform.is_rocm():

tests/entrypoints/pooling/embed/test_online_long_text.py

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
 import pytest_asyncio

 from tests.utils import RemoteOpenAIServer
-from vllm.entrypoints.openai.protocol import EmbeddingResponse
+from vllm.entrypoints.pooling.embed.protocol import EmbeddingResponse
 from vllm.platforms import current_platform

 if current_platform.is_rocm():

tests/entrypoints/pooling/embed/test_online_vision.py

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
 from transformers import AutoProcessor

 from tests.utils import VLLM_PATH, RemoteOpenAIServer
-from vllm.entrypoints.openai.protocol import EmbeddingResponse
+from vllm.entrypoints.pooling.embed.protocol import EmbeddingResponse
 from vllm.multimodal.utils import encode_image_base64, fetch_image

 MODEL_NAME = "TIGER-Lab/VLM2Vec-Full"
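Taken together, the hunks above amount to these import relocations (only paths that actually appear in this commit's diffs; a migration cheat sheet, not an exhaustive list):

```python
# Old import path -> new import path, as seen in the hunks above.
IMPORT_MOVES = {
    "vllm.entrypoints.openai.protocol.EmbeddingResponse":
        "vllm.entrypoints.pooling.embed.protocol.EmbeddingResponse",
    "vllm.entrypoints.openai.protocol.PoolingResponse":
        "vllm.entrypoints.pooling.pooling.protocol.PoolingResponse",
    "vllm.entrypoints.openai.protocol.ClassificationResponse":
        "vllm.entrypoints.pooling.classify.protocol.ClassificationResponse",
    "vllm.entrypoints.openai.protocol.BatchRequestOutput":
        "vllm.entrypoints.openai.run_batch.BatchRequestOutput",
}
```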
