Commit 8303169

Update streaming document
1 parent 42bedfb commit 8303169

File tree

1 file changed: +45 -0 lines changed

app/ai-gateway/streaming.md

Lines changed: 45 additions & 0 deletions
@@ -139,6 +139,51 @@ The following is an example `llm/v1/completions` route streaming request:
You should receive each batch of tokens as HTTP chunks, each containing one or more server-sent events.
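For illustration only, a single HTTP chunk might carry one or more SSE frames shaped like the following (an assumed OpenAI-style `text_completion` payload; all field values are hypothetical):

```
data: {"id":"cmpl-abc123","object":"text_completion","created":1727000000,"model":"gpt-4","choices":[{"index":0,"text":"Kong","finish_reason":null}]}

data: {"id":"cmpl-abc123","object":"text_completion","created":1727000000,"model":"gpt-4","choices":[{"index":0,"text":" Inc.","finish_reason":null}]}
```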
### Token usage in streaming responses {% new_in 3.13 %}

You can receive token usage statistics in an SSE streaming response. Set the following parameter in the request JSON:
```json
{
  "stream_options": {
    "include_usage": true
  }
}
```
When you set this parameter, the `usage` object appears in the final SSE frame, before the `[DONE]` terminator. This object contains token count statistics for the request.
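For illustration, the payload of that final frame might look like this (assuming the OpenAI-compatible chat format used in the example below; all values are hypothetical):

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "created": 1727000000,
  "model": "gpt-4",
  "choices": [],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 120,
    "total_tokens": 135
  }
}
```

The stream then closes with the `data: [DONE]` terminator.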
The following example shows how to request and process token usage statistics in a streaming response:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8000/openai",
    api_key="none"
)

stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me the history of Kong Inc."}],
    stream=True,
    stream_options={"include_usage": True}
)

for chunk in stream:
    # Intermediate frames carry content deltas.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
    # With include_usage set, the final frame carries the usage object
    # and an empty choices list.
    if chunk.usage:
        print("\nDONE. Usage stats:\n")
        print(chunk.usage)
```
{:.info}
> This feature works with any provider and model when `llm_format` is set to `openai` mode.
>
> See the [OpenAI API Documentation](https://platform.openai.com/docs/api-reference/chat/create#chat_create-stream_options) for more information on stream options.
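As a sketch only, pinning that format in the AI Proxy plugin's declarative configuration might look like the following (field layout assumed; the `route_type`, model, and auth settings a real configuration requires are omitted):

```yaml
plugins:
  - name: ai-proxy
    config:
      # Serve responses in the OpenAI-compatible format so that
      # stream_options.include_usage works regardless of provider.
      llm_format: openai
      # ... route_type, model, and auth settings omitted from this sketch
```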
### Response streaming configuration parameters

In the AI Proxy and AI Proxy Advanced plugin configuration, you can set an optional field `config.response_streaming` to one of three values:
