1 change: 1 addition & 0 deletions .github/CODEOWNERS
@@ -149,6 +149,7 @@ mkdocs.yaml @hmellor
 /examples/*/pooling/ @noooop
 /tests/models/*/pooling* @noooop
 /tests/entrypoints/pooling @noooop
+/vllm/entrypoints/pooling @aarnphm @chaunceyjiang @noooop
 /vllm/config/pooler.py @noooop
 /vllm/pooling_params.py @noooop
 /vllm/model_executor/layers/pooler.py @noooop
2 changes: 1 addition & 1 deletion docs/design/io_processor_plugins.md
@@ -77,7 +77,7 @@ The `parse_request` method is used for validating the user prompt and converting
 The `pre_process*` methods take the validated plugin input to generate vLLM's model prompts for regular inference.
 The `post_process*` methods take `PoolingRequestOutput` objects as input and generate a custom plugin output.
 The `validate_or_generate_params` method is used for validating with the plugin any `SamplingParameters`/`PoolingParameters` received with the user request, or to generate new ones if none are specified. The function always returns the validated/generated parameters.
-The `output_to_response` method is used only for online serving and converts the plugin output to the `IOProcessorResponse` type that is then returned by the API Server. The implementation of the `/pooling` serving endpoint is available here [vllm/entrypoints/openai/serving_pooling.py](../../vllm/entrypoints/openai/serving_pooling.py).
+The `output_to_response` method is used only for online serving and converts the plugin output to the `IOProcessorResponse` type that is then returned by the API Server. The implementation of the `/pooling` serving endpoint is available here [vllm/entrypoints/pooling/pooling/serving.py](../../vllm/entrypoints/pooling/pooling/serving.py).

 An example implementation of a plugin that enables generating geotiff images with the PrithviGeospatialMAE model is available [here](https://github.com/IBM/terratorch/tree/main/terratorch/vllm/plugins/segmentation). Please also refer to our online ([examples/online_serving/pooling/prithvi_geospatial_mae.py](../../examples/online_serving/pooling/prithvi_geospatial_mae.py)) and offline ([examples/offline_inference/pooling/prithvi_geospatial_mae_io_processor.py](../../examples/offline_inference/pooling/prithvi_geospatial_mae_io_processor.py)) inference examples.
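For orientation, a minimal sketch of the plugin surface described in the doc text above (illustrative only; the real base class and its exact method signatures live in `vllm.plugins.io_processors`):

```python
# Illustrative sketch only: method names follow the docs above, but the
# real IOProcessor base class in vllm.plugins.io_processors defines the
# authoritative signatures.
from typing import Any


class MyGeotiffIOProcessor:  # would subclass vLLM's IOProcessor base class
    def parse_request(self, request: Any) -> Any:
        """Validate the user prompt and convert it to the plugin input."""
        raise NotImplementedError

    def pre_process(self, plugin_input: Any) -> Any:
        """Turn the validated plugin input into vLLM model prompts."""
        raise NotImplementedError

    def post_process(self, model_output: Any) -> Any:
        """Turn PoolingRequestOutput objects into the plugin output."""
        raise NotImplementedError

    def validate_or_generate_params(self, params: Any = None) -> Any:
        """Validate user-supplied params or generate defaults; always
        return the validated/generated parameters."""
        raise NotImplementedError

    def output_to_response(self, plugin_output: Any) -> Any:
        """Online serving only: wrap the output in IOProcessorResponse."""
        raise NotImplementedError
```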
10 changes: 5 additions & 5 deletions docs/serving/openai_compatible_server.md
@@ -351,15 +351,15 @@ The following extra parameters are supported by default:
 ??? code

     ```python
-    --8<-- "vllm/entrypoints/openai/protocol.py:embedding-extra-params"
+    --8<-- "vllm/entrypoints/pooling/embed/protocol.py:embedding-extra-params"
     ```

 For chat-like input (i.e. if `messages` is passed), these extra parameters are supported instead:

 ??? code

     ```python
-    --8<-- "vllm/entrypoints/openai/protocol.py:chat-embedding-extra-params"
+    --8<-- "vllm/entrypoints/pooling/embed/protocol.py:chat-embedding-extra-params"
     ```

 ### Transcriptions API
@@ -629,7 +629,7 @@ The following [pooling parameters][vllm.PoolingParams] are supported.
 The following extra parameters are supported:

 ```python
---8<-- "vllm/entrypoints/openai/protocol.py:classification-extra-params"
+--8<-- "vllm/entrypoints/pooling/classify/protocol.py:classification-extra-params"
 ```

 ### Score API
@@ -834,7 +834,7 @@ The following [pooling parameters][vllm.PoolingParams] are supported.
 The following extra parameters are supported:

 ```python
---8<-- "vllm/entrypoints/openai/protocol.py:score-extra-params"
+--8<-- "vllm/entrypoints/pooling/score/protocol.py:score-extra-params"
 ```

 ### Re-rank API
@@ -915,7 +915,7 @@ The following [pooling parameters][vllm.PoolingParams] are supported.
 The following extra parameters are supported:

 ```python
---8<-- "vllm/entrypoints/openai/protocol.py:rerank-extra-params"
+--8<-- "vllm/entrypoints/pooling/score/protocol.py:rerank-extra-params"
 ```

 ## Ray Serve LLM
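Taken together, the snippet paths above trace the refactor in this PR: the pooling-related response types move out of the monolithic `vllm/entrypoints/openai/protocol.py` into per-endpoint modules. A sketch of the new import layout, using only module paths and type names that appear in this diff:

```python
# New per-endpoint protocol modules, as introduced by this PR's diff.
from vllm.entrypoints.pooling.classify.protocol import ClassificationResponse
from vllm.entrypoints.pooling.embed.protocol import EmbeddingResponse
from vllm.entrypoints.pooling.pooling.protocol import (
    IOProcessorRequest,
    IOProcessorResponse,
    PoolingResponse,
)
from vllm.entrypoints.pooling.score.protocol import RerankResponse, ScoreResponse
```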
2 changes: 1 addition & 1 deletion tests/entrypoints/openai/test_run_batch.py
@@ -7,7 +7,7 @@

 import pytest

-from vllm.entrypoints.openai.protocol import BatchRequestOutput
+from vllm.entrypoints.openai.run_batch import BatchRequestOutput

 MODEL_NAME = "hmellor/tiny-random-LlamaForCausalLM"

3 changes: 2 additions & 1 deletion tests/entrypoints/pooling/classify/test_online.py
@@ -7,7 +7,8 @@
 import torch.nn.functional as F

 from tests.utils import RemoteOpenAIServer
-from vllm.entrypoints.openai.protocol import ClassificationResponse, PoolingResponse
+from vllm.entrypoints.pooling.classify.protocol import ClassificationResponse
+from vllm.entrypoints.pooling.pooling.protocol import PoolingResponse

 MODEL_NAME = "jason9693/Qwen2.5-1.5B-apeach"
 DTYPE = "float32"  # Use float32 to avoid NaN issue
2 changes: 1 addition & 1 deletion tests/entrypoints/pooling/classify/test_online_vision.py
@@ -7,7 +7,7 @@
 import requests

 from tests.utils import RemoteOpenAIServer
-from vllm.entrypoints.openai.protocol import ClassificationResponse
+from vllm.entrypoints.pooling.classify.protocol import ClassificationResponse

 VLM_MODEL_NAME = "muziyongshixin/Qwen2.5-VL-7B-for-VideoCls"
 MAXIMUM_VIDEOS = 1
6 changes: 2 additions & 4 deletions tests/entrypoints/pooling/embed/test_online.py
@@ -15,10 +15,8 @@
 from tests.models.language.pooling.embed_utils import run_embedding_correctness_test
 from tests.models.utils import check_embeddings_close
 from tests.utils import RemoteOpenAIServer
-from vllm.entrypoints.openai.protocol import (
-    EmbeddingResponse,
-    PoolingResponse,
-)
+from vllm.entrypoints.pooling.embed.protocol import EmbeddingResponse
+from vllm.entrypoints.pooling.pooling.protocol import PoolingResponse
 from vllm.platforms import current_platform
 from vllm.transformers_utils.tokenizer import get_tokenizer
 from vllm.utils.serial_utils import (
@@ -11,7 +11,7 @@
 from tests.models.language.pooling.embed_utils import run_embedding_correctness_test
 from tests.models.utils import EmbedModelInfo
 from tests.utils import RemoteOpenAIServer
-from vllm.entrypoints.openai.protocol import EmbeddingResponse
+from vllm.entrypoints.pooling.embed.protocol import EmbeddingResponse
 from vllm.platforms import current_platform

 if current_platform.is_rocm():
2 changes: 1 addition & 1 deletion tests/entrypoints/pooling/embed/test_online_long_text.py
@@ -15,7 +15,7 @@
 import pytest_asyncio

 from tests.utils import RemoteOpenAIServer
-from vllm.entrypoints.openai.protocol import EmbeddingResponse
+from vllm.entrypoints.pooling.embed.protocol import EmbeddingResponse
 from vllm.platforms import current_platform

 if current_platform.is_rocm():
2 changes: 1 addition & 1 deletion tests/entrypoints/pooling/embed/test_online_vision.py
@@ -8,7 +8,7 @@
 from transformers import AutoProcessor

 from tests.utils import VLLM_PATH, RemoteOpenAIServer
-from vllm.entrypoints.openai.protocol import EmbeddingResponse
+from vllm.entrypoints.pooling.embed.protocol import EmbeddingResponse
 from vllm.multimodal.utils import encode_image_base64, fetch_image

 MODEL_NAME = "TIGER-Lab/VLM2Vec-Full"
2 changes: 1 addition & 1 deletion tests/entrypoints/pooling/pooling/test_online.py
@@ -11,7 +11,7 @@

 from tests.models.utils import check_embeddings_close
 from tests.utils import RemoteOpenAIServer
-from vllm.entrypoints.openai.protocol import PoolingResponse
+from vllm.entrypoints.pooling.pooling.protocol import PoolingResponse
 from vllm.transformers_utils.tokenizer import get_tokenizer
 from vllm.utils.serial_utils import (
     EMBED_DTYPE_TO_TORCH_DTYPE,
3 changes: 2 additions & 1 deletion tests/entrypoints/pooling/score/test_online_rerank.py
@@ -7,7 +7,8 @@
 import torch.nn.functional as F

 from tests.utils import RemoteOpenAIServer
-from vllm.entrypoints.openai.protocol import PoolingResponse, RerankResponse
+from vllm.entrypoints.pooling.pooling.protocol import PoolingResponse
+from vllm.entrypoints.pooling.score.protocol import RerankResponse
 from vllm.platforms import current_platform

 if current_platform.is_rocm():
2 changes: 1 addition & 1 deletion tests/entrypoints/pooling/score/test_online_score.py
@@ -9,7 +9,7 @@
 from torch import tensor

 from tests.utils import RemoteOpenAIServer
-from vllm.entrypoints.openai.protocol import ScoreResponse
+from vllm.entrypoints.pooling.score.protocol import ScoreResponse
 from vllm.platforms import current_platform

 if current_platform.is_rocm():
@@ -18,7 +18,10 @@
 from terratorch.datamodules import Sen1Floods11NonGeoDataModule

 from vllm.config import VllmConfig
-from vllm.entrypoints.openai.protocol import IOProcessorRequest, IOProcessorResponse
+from vllm.entrypoints.pooling.pooling.protocol import (
+    IOProcessorRequest,
+    IOProcessorResponse,
+)
 from vllm.inputs.data import PromptType
 from vllm.logger import init_logger
 from vllm.outputs import PoolingRequestOutput
2 changes: 1 addition & 1 deletion tests/plugins_tests/test_io_processor_plugins.py
@@ -7,7 +7,7 @@

 from tests.utils import RemoteOpenAIServer
 from vllm.config import VllmConfig
-from vllm.entrypoints.openai.protocol import IOProcessorResponse
+from vllm.entrypoints.pooling.pooling.protocol import IOProcessorResponse
 from vllm.plugins.io_processors import get_io_processor

 MODEL_NAME = "ibm-nasa-geospatial/Prithvi-EO-2.0-300M-TL-Sen1Floods11"