examples/embeddings/README.md (new file, 284 additions)
# Set up and run the caikit embeddings server locally

### Setting up a virtual environment with Python venv

This example uses a Python [virtual environment (venv)](https://docs.python.org/3/library/venv.html). Make sure you are in an activated `venv` when running `python` in the example commands that follow. Use `deactivate` if you want to exit the `venv`.

```shell
python3 -m venv venv
source venv/bin/activate
```

### Models

For this tutorial, you can download [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). To do that, clone the repository and use `git lfs` to fetch all the model files:

```shell
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install

git clone https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
```
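
If you prefer not to use `git lfs`, the same files can be fetched with the `huggingface_hub` Python package instead (an optional alternative, assuming `pip install huggingface_hub`):

```python
from huggingface_hub import snapshot_download

# download the model files into a local all-MiniLM-L6-v2 directory
snapshot_download(
    repo_id="sentence-transformers/all-MiniLM-L6-v2",
    local_dir="all-MiniLM-L6-v2",
)
```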

To create a model configuration and artifacts, the best practice is to run the module's `bootstrap()` and `save()` methods. This will:

* Load the model by name (from the Hugging Face Hub or a model repository) or from a local directory. The model is loaded using the sentence-transformers library.
* Save a `config.yml` which:
  * Ties the model to the module (with a `module_id` GUID)
  * Sets the `artifacts_path` to the default "artifacts" subdirectory
* Save the model in the artifacts subdirectory

See [models_](./models_/) for an example of the resulting folder structure.

> The reranker service supports bi-encoder models, the same models used by the other embeddings tasks.

This can be done with the following lines of code, using BGE as an example model:

```python
import os
os.environ['ALLOW_DOWNLOADS'] = "1"

import caikit_nlp
model_name = "BAAI/bge-large-en-v1.5"

model = caikit_nlp.text_embedding.EmbeddingModule.bootstrap(model_name)
model.save(f"{model_name}-caikit")
```

To avoid overwriting your files, `save()` returns an error if the output directory already exists. You may want to use a temporary name. After it succeeds, move the output directory to a `<model-id>` directory under your local models directory.
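
A minimal sketch of that last step (the paths here are illustrative, matching the BGE example above; adjust them to your layout):

```python
import os
import shutil

saved_dir = "BAAI/bge-large-en-v1.5-caikit"  # output directory written by model.save() above
models_dir = "models_"                       # local models directory the runtime will serve
os.makedirs(models_dir, exist_ok=True)

# the directory name under the models dir becomes the <model-id> passed as MODEL to the clients
shutil.move(saved_dir, os.path.join(models_dir, "bge-large-en-v1.5-caikit"))
```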

### Environment variables

These environment variables control the environment in which the embeddings will run:

```bash
# use IPEX optimization
IPEX_OPTIMIZE: 'true'

# use "xpu" for IPEX on GPU instead of IPEX on CPU
USE_XPU: 'false'

# IPEX performs best with autocast using bfloat16
BFLOAT16: '1'

# use MPS (Apple silicon GPU)
USE_MPS: 'false'

# use PyTorch 2 compile (torch.compile)
PT2_COMPILE: 'false'
```
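
If you drive the module directly from Python (as in the bootstrap example above) rather than through the runtime, the same switches can be set with `os.environ` before `caikit_nlp` is imported. A minimal sketch, assuming the variables are read from the process environment and using the defaults shown above:

```python
import os

# illustrative only: mirror the defaults listed above
os.environ.setdefault("IPEX_OPTIMIZE", "true")
os.environ.setdefault("USE_XPU", "false")
os.environ.setdefault("BFLOAT16", "1")
os.environ.setdefault("USE_MPS", "false")
os.environ.setdefault("PT2_COMPILE", "false")

import caikit_nlp  # imported after the environment is configured
```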

### Starting the Caikit Runtime

Run caikit-runtime configured to use the caikit-nlp library. Set up the following environment variables:

```bash
# set where the runtime should look for the models
export RUNTIME_LOCAL_MODELS_DIR=/models_

# lazily load the models found in the directory set above
export RUNTIME_LAZY_LOAD_LOCAL_MODELS=true

# set the library the runtime serves
export RUNTIME_LIBRARY='caikit_nlp'
```

In one terminal, start the runtime server:

```bash
source venv/bin/activate
pip install -r requirements.txt
caikit-runtime
```

Alternatively, to run the runtime against a local clone of the caikit-nlp library:

```bash
pip install caikit-nlp@file:///<path-to-your-local-caikit_nlp-clone-repo>/caikit-nlp
python -m caikit.runtime
```

### Embedding retrieval example Python client

In another terminal, run the example client code to retrieve embeddings.

```shell
source venv/bin/activate
MODEL=<model-id> python embeddings.py
```

The client code calls the model and queries for embeddings using 2 example sentences.

You should see output similar to the following:

```ShellSession
$ python embeddings.py
INPUT TEXTS: ['test first sentence', 'another test sentence']
OUTPUT: {
{
"results": [
[
-0.17895537614822388,
0.03200146183371544,
-0.030327674001455307,
...
],
[
-0.17895537614822388,
0.03200146183371544,
-0.030327674001455307,
...
]
],
"producerId": {
"name": "EmbeddingModule",
"version": "0.0.1"
},
"inputTokenCount": "9"
}
}
LENGTH: 2 x 384
```
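
The vectors come from the underlying sentence-transformers model, so the `2 x 384` shape can be sanity-checked locally, outside of caikit. A small sketch assuming the all-MiniLM-L6-v2 model downloaded earlier, which produces 384-dimensional embeddings:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
vectors = model.encode(["test first sentence", "another test sentence"])

print(vectors.shape)  # (2, 384): one 384-dimensional embedding per input sentence
```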

### Sentence similarity example Python client

In another terminal, run the client code to infer sentence similarity.

```shell
source venv/bin/activate
MODEL=<model-id> python sentence_similarity.py
```

The client code calls the model and queries sentence similarity using 1 source sentence and 2 other sentences (hardcoded in sentence_similarity.py). The result contains a cosine similarity score comparing the source sentence with each of the other sentences.

You should see output similar to the following:

```ShellSession
$ python sentence_similarity.py
SOURCE SENTENCE: first sentence
SENTENCES: ['test first sentence', 'another test sentence']
OUTPUT: {
"result": {
"scores": [
1.0000001192092896
]
},
"producerId": {
"name": "EmbeddingModule",
"version": "0.0.1"
},
"inputTokenCount": "9"
}
```
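
The score is the cosine similarity between the embedding of the source sentence and the embedding of each of the other sentences. A local sketch with sentence-transformers that reproduces the computation (illustrative only, not the caikit client itself):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

source = model.encode("first sentence")
others = model.encode(["test first sentence", "another test sentence"])

# cosine similarity of the source sentence against each of the other sentences
print(util.cos_sim(source, others))  # tensor of shape (1, 2) holding the two scores
```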

### Reranker example Python client

In another terminal, run the client code to execute the reranker task using both gRPC and REST.

```shell
source venv/bin/activate
MODEL=<model-id> python reranker.py
```

You should see output similar to the following:

```ShellSession
$ python reranker.py
======================
TOP N: 3
QUERIES: ['first sentence', 'any sentence']
DOCUMENTS: [{'text': 'first sentence', 'title': 'first title'}, {'_text': 'another sentence', 'more': 'more attributes here'}, {'text': 'a doc with a nested metadata', 'meta': {'foo': 'bar', 'i': 999, 'f': 12.34}}]
======================
RESPONSE from gRPC:
===
QUERY: first sentence
score: 0.9999997019767761 index: 0 text: first sentence
score: 0.7350112199783325 index: 1 text: another sentence
score: 0.10398174077272415 index: 2 text: a doc with a nested metadata
===
QUERY: any sentence
score: 0.6631797552108765 index: 0 text: first sentence
score: 0.6505964398384094 index: 1 text: another sentence
score: 0.11903437972068787 index: 2 text: a doc with a nested metadata
===================
RESPONSE from HTTP:
{
"results": [
{
"query": "first sentence",
"scores": [
{
"document": {
"text": "first sentence",
"title": "first title"
},
"index": 0,
"score": 0.9999997019767761,
"text": "first sentence"
},
{
"document": {
"_text": "another sentence",
"more": "more attributes here"
},
"index": 1,
"score": 0.7350112199783325,
"text": "another sentence"
},
{
"document": {
"text": "a doc with a nested metadata",
"meta": {
"foo": "bar",
"i": 999,
"f": 12.34
}
},
"index": 2,
"score": 0.10398174077272415,
"text": "a doc with a nested metadata"
}
]
},
{
"query": "any sentence",
"scores": [
{
"document": {
"text": "first sentence",
"title": "first title"
},
"index": 0,
"score": 0.6631797552108765,
"text": "first sentence"
},
{
"document": {
"_text": "another sentence",
"more": "more attributes here"
},
"index": 1,
"score": 0.6505964398384094,
"text": "another sentence"
},
{
"document": {
"text": "a doc with a nested metadata",
"meta": {
"foo": "bar",
"i": 999,
"f": 12.34
}
},
"index": 2,
"score": 0.11903437972068787,
"text": "a doc with a nested metadata"
}
]
}
],
"producerId": {
"name": "EmbeddingModule",
"version": "0.0.1"
},
"inputTokenCount": "9"
}
```
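
Because the reranker uses the same bi-encoder models, each document is, conceptually, scored by the cosine similarity between the query embedding and the document text embedding, and the documents are returned ordered by that score. A local sketch of the idea with sentence-transformers (illustrative only, not the caikit reranker implementation):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

query = "first sentence"
texts = ["first sentence", "another sentence", "a doc with a nested metadata"]

scores = util.cos_sim(model.encode(query), model.encode(texts))[0]

# rank the documents by descending similarity to the query
ranked = sorted(zip(texts, scores.tolist()), key=lambda pair: pair[1], reverse=True)
for text, score in ranked:
    print(f"{score:.4f}  {text}")
```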
examples/embeddings/config.py (new file, 11 additions)
# Standard
import os

# Host and port the example clients use to reach the caikit runtime.
# Defaults to localhost:8085; override with CAIKIT_EMBEDDINGS_HOST / CAIKIT_EMBEDDINGS_PORT.
port = os.getenv("CAIKIT_EMBEDDINGS_PORT") or 8085
host = os.getenv("CAIKIT_EMBEDDINGS_HOST") or "localhost"
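
How the example clients consume these values is not shown here; a hypothetical usage sketch, assuming the gRPC runtime is listening on its default port 8085:

```python
# hypothetical client-side usage of config.py
import grpc

from config import host, port

channel = grpc.insecure_channel(f"{host}:{port}")
```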