1 change: 1 addition & 0 deletions .github/CODEOWNERS
@@ -149,6 +149,7 @@ mkdocs.yaml @hmellor
 /examples/*/pooling/ @noooop
 /tests/models/*/pooling* @noooop
 /tests/entrypoints/pooling @noooop
+/vllm/entrypoints/pooling @aarnphm @chaunceyjiang @noooop
 /vllm/config/pooler.py @noooop
 /vllm/pooling_params.py @noooop
 /vllm/model_executor/layers/pooler.py @noooop
2 changes: 1 addition & 1 deletion docs/design/io_processor_plugins.md
@@ -77,7 +77,7 @@ The `parse_request` method is used for validating the user prompt and converting
 The `pre_process*` methods take the validated plugin input to generate vLLM's model prompts for regular inference.
 The `post_process*` methods take `PoolingRequestOutput` objects as input and generate a custom plugin output.
 The `validate_or_generate_params` method is used to validate, with the plugin, any `SamplingParameters`/`PoolingParameters` received with the user request, or to generate new ones if none are specified. The function always returns the validated/generated parameters.
-The `output_to_response` method is used only for online serving and converts the plugin output to the `IOProcessorResponse` type that is then returned by the API Server. The implementation of the `/pooling` serving endpoint is available here [vllm/entrypoints/openai/serving_pooling.py](../../vllm/entrypoints/openai/serving_pooling.py).
+The `output_to_response` method is used only for online serving and converts the plugin output to the `IOProcessorResponse` type that is then returned by the API Server. The implementation of the `/pooling` serving endpoint is available here [vllm/entrypoints/pooling/pooling/serving.py](../../vllm/entrypoints/pooling/pooling/serving.py).
 
 An example implementation of a plugin that enables generating geotiff images with the PrithviGeospatialMAE model is available [here](https://github.com/IBM/terratorch/tree/main/terratorch/vllm/plugins/segmentation). Please also refer to our online ([examples/online_serving/pooling/prithvi_geospatial_mae.py](../../examples/online_serving/pooling/prithvi_geospatial_mae.py)) and offline ([examples/offline_inference/pooling/prithvi_geospatial_mae_io_processor.py](../../examples/offline_inference/pooling/prithvi_geospatial_mae_io_processor.py)) inference examples.
 
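As context for the doc hunk above: a minimal sketch of what a plugin implementing these methods could look like. The method names are taken from the documentation; the class shape, signatures, and types are illustrative assumptions rather than the actual vLLM interface, and the linked terratorch plugin remains the authoritative implementation.

```python
# Hedged sketch of an IOProcessor-style plugin, assuming the method names
# described in the doc above. Signatures and types are illustrative
# assumptions, not the real vLLM interface.
from typing import Any

from vllm.outputs import PoolingRequestOutput
from vllm.pooling_params import PoolingParams


class GeotiffIOProcessor:  # hypothetical; a real plugin subclasses vLLM's base class
    def parse_request(self, request: Any) -> Any:
        """Validate the user prompt and convert it to the plugin's input type."""
        raise NotImplementedError

    def pre_process(self, plugin_input: Any) -> Any:
        """Build vLLM model prompts from the validated plugin input."""
        raise NotImplementedError

    def post_process(self, model_output: PoolingRequestOutput) -> Any:
        """Turn PoolingRequestOutput objects into the plugin's output type."""
        raise NotImplementedError

    def validate_or_generate_params(
        self, params: PoolingParams | None = None
    ) -> PoolingParams:
        """Validate user-supplied params, or generate defaults if none given."""
        return params if params is not None else PoolingParams()

    def output_to_response(self, plugin_output: Any) -> Any:
        """Online serving only: wrap the plugin output in an IOProcessorResponse."""
        raise NotImplementedError
```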
10 changes: 5 additions & 5 deletions docs/serving/openai_compatible_server.md
@@ -351,15 +351,15 @@ The following extra parameters are supported by default:
 ??? code
 
     ```python
-    --8<-- "vllm/entrypoints/openai/protocol.py:embedding-extra-params"
+    --8<-- "vllm/entrypoints/pooling/embed/protocol.py:embedding-extra-params"
     ```
 
 For chat-like input (i.e. if `messages` is passed), these extra parameters are supported instead:
 
 ??? code
 
     ```python
-    --8<-- "vllm/entrypoints/openai/protocol.py:chat-embedding-extra-params"
+    --8<-- "vllm/entrypoints/pooling/embed/protocol.py:chat-embedding-extra-params"
     ```
 
 ### Transcriptions API
@@ -629,7 +629,7 @@ The following [pooling parameters][vllm.PoolingParams] are supported.
 The following extra parameters are supported:
 
 ```python
---8<-- "vllm/entrypoints/openai/protocol.py:classification-extra-params"
+--8<-- "vllm/entrypoints/pooling/classify/protocol.py:classification-extra-params"
 ```
 
 ### Score API
@@ -834,7 +834,7 @@ The following [pooling parameters][vllm.PoolingParams] are supported.
 The following extra parameters are supported:
 
 ```python
---8<-- "vllm/entrypoints/openai/protocol.py:score-extra-params"
+--8<-- "vllm/entrypoints/pooling/score/protocol.py:score-extra-params"
 ```
 
 ### Re-rank API
@@ -915,7 +915,7 @@ The following [pooling parameters][vllm.PoolingParams] are supported.
 The following extra parameters are supported:
 
 ```python
---8<-- "vllm/entrypoints/openai/protocol.py:rerank-extra-params"
+--8<-- "vllm/entrypoints/pooling/score/protocol.py:rerank-extra-params"
 ```
 
 ## Ray Serve LLM
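The `--8<--` lines in the hunks above are pymdown-extensions snippet includes: each pulls in the region of the named file between matching section markers, so the docs only retarget paths here. A hedged sketch of how such a section might be delimited in the new `vllm/entrypoints/pooling/score/protocol.py`; only the marker syntax and the section name are taken from the directive above, and the fields shown are hypothetical placeholders:

```python
# Sketch of the pymdown snippet-section markers consumed by a directive like
#   --8<-- "vllm/entrypoints/pooling/score/protocol.py:score-extra-params"
# The `priority` field is a hypothetical placeholder, not the real schema.
from pydantic import BaseModel, Field


class ScoreRequest(BaseModel):
    text_1: str
    text_2: str

    # --8<-- [start:score-extra-params]
    priority: int = Field(
        default=0,
        description="Example extra parameter surfaced via the snippet include.",
    )
    # --8<-- [end:score-extra-params]
```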
2 changes: 1 addition & 1 deletion tests/entrypoints/openai/test_run_batch.py
@@ -7,7 +7,7 @@
 
 import pytest
 
-from vllm.entrypoints.openai.protocol import BatchRequestOutput
+from vllm.entrypoints.openai.run_batch import BatchRequestOutput
 
 MODEL_NAME = "hmellor/tiny-random-LlamaForCausalLM"
 
3 changes: 2 additions & 1 deletion tests/entrypoints/pooling/classify/test_online.py
@@ -7,7 +7,8 @@
 import torch.nn.functional as F
 
 from tests.utils import RemoteOpenAIServer
-from vllm.entrypoints.openai.protocol import ClassificationResponse, PoolingResponse
+from vllm.entrypoints.pooling.classify.protocol import ClassificationResponse
+from vllm.entrypoints.pooling.pooling.protocol import PoolingResponse
 
 MODEL_NAME = "jason9693/Qwen2.5-1.5B-apeach"
 DTYPE = "float32" # Use float32 to avoid NaN issue
2 changes: 1 addition & 1 deletion tests/entrypoints/pooling/classify/test_online_vision.py
@@ -7,7 +7,7 @@
 import requests
 
 from tests.utils import RemoteOpenAIServer
-from vllm.entrypoints.openai.protocol import ClassificationResponse
+from vllm.entrypoints.pooling.classify.protocol import ClassificationResponse
 
 VLM_MODEL_NAME = "muziyongshixin/Qwen2.5-VL-7B-for-VideoCls"
 MAXIMUM_VIDEOS = 1
6 changes: 2 additions & 4 deletions tests/entrypoints/pooling/embed/test_online.py
@@ -15,10 +15,8 @@
 from tests.models.language.pooling.embed_utils import run_embedding_correctness_test
 from tests.models.utils import check_embeddings_close
 from tests.utils import RemoteOpenAIServer
-from vllm.entrypoints.openai.protocol import (
-    EmbeddingResponse,
-    PoolingResponse,
-)
+from vllm.entrypoints.pooling.embed.protocol import EmbeddingResponse
+from vllm.entrypoints.pooling.pooling.protocol import PoolingResponse
 from vllm.platforms import current_platform
 from vllm.transformers_utils.tokenizer import get_tokenizer
 from vllm.utils.serial_utils import (
@@ -11,7 +11,7 @@
 from tests.models.language.pooling.embed_utils import run_embedding_correctness_test
 from tests.models.utils import EmbedModelInfo
 from tests.utils import RemoteOpenAIServer
-from vllm.entrypoints.openai.protocol import EmbeddingResponse
+from vllm.entrypoints.pooling.embed.protocol import EmbeddingResponse
 from vllm.platforms import current_platform
 
 if current_platform.is_rocm():
2 changes: 1 addition & 1 deletion tests/entrypoints/pooling/embed/test_online_long_text.py
@@ -15,7 +15,7 @@
 import pytest_asyncio
 
 from tests.utils import RemoteOpenAIServer
-from vllm.entrypoints.openai.protocol import EmbeddingResponse
+from vllm.entrypoints.pooling.embed.protocol import EmbeddingResponse
 from vllm.platforms import current_platform
 
 if current_platform.is_rocm():
2 changes: 1 addition & 1 deletion tests/entrypoints/pooling/embed/test_online_vision.py
@@ -8,7 +8,7 @@
 from transformers import AutoProcessor
 
 from tests.utils import VLLM_PATH, RemoteOpenAIServer
-from vllm.entrypoints.openai.protocol import EmbeddingResponse
+from vllm.entrypoints.pooling.embed.protocol import EmbeddingResponse
 from vllm.multimodal.utils import encode_image_base64, fetch_image
 
 MODEL_NAME = "TIGER-Lab/VLM2Vec-Full"
2 changes: 1 addition & 1 deletion tests/entrypoints/pooling/pooling/test_online.py
@@ -11,7 +11,7 @@
 
 from tests.models.utils import check_embeddings_close
 from tests.utils import RemoteOpenAIServer
-from vllm.entrypoints.openai.protocol import PoolingResponse
+from vllm.entrypoints.pooling.pooling.protocol import PoolingResponse
 from vllm.transformers_utils.tokenizer import get_tokenizer
 from vllm.utils.serial_utils import (
     EMBED_DTYPE_TO_TORCH_DTYPE,
3 changes: 2 additions & 1 deletion tests/entrypoints/pooling/score/test_online_rerank.py
@@ -7,7 +7,8 @@
 import torch.nn.functional as F
 
 from tests.utils import RemoteOpenAIServer
-from vllm.entrypoints.openai.protocol import PoolingResponse, RerankResponse
+from vllm.entrypoints.pooling.pooling.protocol import PoolingResponse
+from vllm.entrypoints.pooling.score.protocol import RerankResponse
 from vllm.platforms import current_platform
 
 if current_platform.is_rocm():
2 changes: 1 addition & 1 deletion tests/entrypoints/pooling/score/test_online_score.py
@@ -9,7 +9,7 @@
 from torch import tensor
 
 from tests.utils import RemoteOpenAIServer
-from vllm.entrypoints.openai.protocol import ScoreResponse
+from vllm.entrypoints.pooling.score.protocol import ScoreResponse
 from vllm.platforms import current_platform
 
 if current_platform.is_rocm():
@@ -18,7 +18,10 @@
 from terratorch.datamodules import Sen1Floods11NonGeoDataModule
 
 from vllm.config import VllmConfig
-from vllm.entrypoints.openai.protocol import IOProcessorRequest, IOProcessorResponse
+from vllm.entrypoints.pooling.pooling.protocol import (
+    IOProcessorRequest,
+    IOProcessorResponse,
+)
 from vllm.inputs.data import PromptType
 from vllm.logger import init_logger
 from vllm.outputs import PoolingRequestOutput
2 changes: 1 addition & 1 deletion tests/plugins_tests/test_io_processor_plugins.py
@@ -7,7 +7,7 @@
 
 from tests.utils import RemoteOpenAIServer
 from vllm.config import VllmConfig
-from vllm.entrypoints.openai.protocol import IOProcessorResponse
+from vllm.entrypoints.pooling.pooling.protocol import IOProcessorResponse
 from vllm.plugins.io_processors import get_io_processor
 
 MODEL_NAME = "ibm-nasa-geospatial/Prithvi-EO-2.0-300M-TL-Sen1Floods11"
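Taken together, the hunks above amount to the following import migration for downstream code; a before/after summary with paths taken verbatim from the diff:

```python
# Old (removed in this PR): everything came from the monolithic module.
# from vllm.entrypoints.openai.protocol import (
#     BatchRequestOutput, ClassificationResponse, EmbeddingResponse,
#     IOProcessorRequest, IOProcessorResponse, PoolingResponse,
#     RerankResponse, ScoreResponse,
# )

# New locations, as exercised by the updated tests above:
from vllm.entrypoints.openai.run_batch import BatchRequestOutput
from vllm.entrypoints.pooling.classify.protocol import ClassificationResponse
from vllm.entrypoints.pooling.embed.protocol import EmbeddingResponse
from vllm.entrypoints.pooling.pooling.protocol import (
    IOProcessorRequest,
    IOProcessorResponse,
    PoolingResponse,
)
from vllm.entrypoints.pooling.score.protocol import RerankResponse, ScoreResponse
```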