187 changes: 187 additions & 0 deletions app/_data/products/gateway.yml
@@ -1478,6 +1478,193 @@ releases:
              - 3.2
              - Confluent Cloud

  - release: "3.13"
    unreleased: true
    latest: false
    ee-version: "3.12.0.1"
    eol: 2026-12-08
    distributions:
      - amazonlinux2:
          package: true
          package_support:
            fips: false
            arm: true
            graviton: true
          docker: true
      - amazonlinux2023:
          package: true
          package_support:
            fips: false
            arm: true
            graviton: true
          docker: true
          default: true
      - debian11:
          package: true
          package_support:
            fips: false
            arm: true
            graviton: true
          docker: true
      - debian12:
          package: true
          package_support:
            fips: false
            arm: true
            graviton: true
          docker: true
          default: true
      - rhel8:
          package: true
          package_support:
            arm: false
            graviton: false
            fips: true
          docker: false
      - rhel9:
          package: true
          package_support:
            graviton: false
            arm: true
            fips: true
          docker: true
          docker_support:
            fips: true
          default: true
      - ubuntu2004:
          package: true
          package_support:
            graviton: false
            arm: false
            fips: true
          docker: false
          eol: April 2025
      - ubuntu2204:
          package: true
          package_support:
            arm: true
            graviton: true
            fips: true
          docker: true
          docker_support:
            fips: true
      - ubuntu2404:
          package: true
          package_support:
            arm: true
            graviton: true
            fips: true
          docker: true
          docker_support:
            fips: true
          default: true
    third_party_support:
      ai_providers:
        - openai:
        - cohere:
        - azure_ai:
        - anthropic:
        - mistral:
        - llama2:
            format:
              - Raw
              - OLLAMA
              - OpenAI
        - bedrock:
        - gemini:

      s3_api:
        - s3
        - minio

      log_provider:
        - splunk
        - datadog
        - loggly

      service_mesh:
        - kongmesh:
            versions:
              - 2.0
        - istio:
            versions:
              - 1.16
              - 1.15
              - 1.14

      identity_provider:
        - auth0
        - cognito
        - connect2id
        - curity
        - dex
        - gluu
        - google
        - identityserver
        - keycloak
        - azure-ad
        - microsoft-adfs
        - microsoft-live-connect
        - okta
        - onelogin
        - openam
        - paypal
        - pingfederate
        - salesforce
        - wso2
        - yahoo

      vault:
        - vaultproject:
            versions:
              - 1.12
        - aws-sm:
        - azure-key-vaults:
        - gcp-sm:
        - conjur:
            versions:
              - 1.22.2-12
      metrics:
        - prometheus:
            versions:
              - 2.40
              - 2.37
        - statsd:
            versions:
              - 0.9
        - opentelemetry:
        - zipkin:
            versions:
              - 2.23
              - 2.22

      datastore:
        - postgres:
            versions:
              - 17
              - 16
              - 15
              - 14
              - 13
              - Amazon RDS
              - Amazon Aurora
        - redis:
            versions:
              - 6
              - 7
              - AWS Elasticache
        - valkey:
            versions:
              - 8
        - influxdb:
            versions:
              - 1
        - kafka:
            versions:
              - 3.3
              - 3.2
              - Confluent Cloud

      cloud_deployment_platforms:
        - AWS EKS
        - AWS EKS Fargate
45 changes: 45 additions & 0 deletions app/ai-gateway/streaming.md
@@ -139,6 +139,51 @@ The following is an example `llm/v1/completions` route streaming request:

You should receive each batch of tokens as HTTP chunks, each containing one or more server-sent events.

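For example, each HTTP chunk carries one or more `data:` frames. The following is an illustrative sketch in the OpenAI-compatible format, not captured output; the exact fields and their order vary by provider and model:

```
data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Kong"},"finish_reason":null}]}

data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" Inc."},"finish_reason":null}]}
```
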
### Token usage in streaming responses {% new_in 3.13 %}

You can receive token usage statistics in an SSE streaming response. Set the following parameter in the request JSON:

```json
{
  "stream_options": {
    "include_usage": true
  }
}
```

When you set this parameter, the `usage` object appears in the final SSE frame, before the `[DONE]` terminator. This object contains token count statistics for the request.

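For illustration, the tail of such a stream might look like the following sketch (assumed OpenAI-compatible output, not captured from a live deployment). Note that the usage frame carries an empty `choices` array:

```
data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: {"object":"chat.completion.chunk","choices":[],"usage":{"prompt_tokens":12,"completion_tokens":248,"total_tokens":260}}

data: [DONE]
```
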

The following example shows how to request and process token usage statistics in a streaming response:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8000/openai",
    api_key="none"
)

stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me the history of Kong Inc."}],
    stream=True,
    stream_options={"include_usage": True}
)

for chunk in stream:
    # Most chunks carry a content delta; print tokens as they arrive.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
    # The final chunk has an empty choices list and carries the usage object.
    if chunk.usage:
        print("\nDONE. Usage stats:\n")
        print(chunk.usage)
```

{:.info}
> This feature works with any provider and model when `llm_format` is set to `openai`.
>
> See the [OpenAI API Documentation](https://platform.openai.com/docs/api-reference/chat/create#chat_create-stream_options) for more information on stream options.

### Response streaming configuration parameters

In the AI Proxy and AI Proxy Advanced plugin configuration, you can set the optional `config.response_streaming` field to one of three values: