Commit 0d5eb5c

Merge pull request #8224 from PatrickFarley/mr-ignite
Model router nextgen ignite
2 parents eb29b75 + e5db045 commit 0d5eb5c

File tree

9 files changed (+331, -49 lines changed)

articles/ai-foundry/default/toc-files-foundry/model-capabilities/toc.yml

Lines changed: 0 additions & 3 deletions

```diff
@@ -21,9 +21,6 @@ items:
     href: ../../../../ai-services/speech-service/voice-live-quickstart.md?toc=/azure/ai-foundry/default/toc.json&bc=/azure/ai-foundry/breadcrumb/toc.json
 - name: Concepts
   items:
-  - name: Model router concepts
-    href: ../../../openai/concepts/model-router.md
-    displayName: gpt-5, GPT-5, gpt-5-nano, gpt-5-mini
 - name: Video generation (preview)
   href: ../../../openai/concepts/video-generation.md
 - name: Video translation (preview)
```

articles/ai-foundry/default/toc-files-foundry/model-catalog/toc.yml

Lines changed: 3 additions & 2 deletions

```diff
@@ -7,6 +7,8 @@ items:
   href: ../../../foundry-models/concepts/model-versions.md
 - name: Marketplace configuration for partner models
   href: ../../../foundry-models/how-to/configure-marketplace.md
+- name: Model router
+  href: ../../../openai/how-to/model-router.md
 - name: Work with chat models
   href: ../../../foundry-models/how-to/use-chat-completions.md
 - name: Image and video models
@@ -38,5 +40,4 @@ items:
 - name: Use blocklists
   href: ../../../openai/how-to/use-blocklists.md

-- name: Model router
-  href: ../../../openai/how-to/model-router.md
+
```
articles/ai-foundry/foundry-models/includes/models-azure-direct-others.md

Lines changed: 26 additions & 0 deletions

@@ -118,6 +118,32 @@

See [the Microsoft model collection in Azure AI Foundry portal](https://ai.azure.com/explore/models?&selectedCollection=Microsoft/?cid=learnDocs). You can also find several Microsoft models available [from partners and community](../concepts/models-from-partners.md#microsoft).

### Model router

Model router is a large language model that intelligently selects from a set of underlying chat models to respond to a given prompt. For more information, see the [Model router overview](/azure/ai-foundry/openai/how-to/model-router).

#### Region availability

| Model | Region |
|---|---|
| `model-router` (2025-08-07) | East US 2 (Global Standard & Data Zone Standard), Sweden Central (Global Standard & Data Zone Standard) |
| `model-router` (2025-05-19) | East US 2 (Global Standard & Data Zone Standard), Sweden Central (Global Standard & Data Zone Standard) |
| `model-router` (2025-11-18) | TBD |

*Billing for Data Zone Standard model router deployments will begin no earlier than November 1, 2025.*

#### Capabilities

| Model ID | Description | Context window | Max output tokens | Training data (up to) |
| --- | :--- | :--- | :--- | :---: |
| `model-router` (2025-08-07) | A model that intelligently selects from a set of underlying models to respond to a given prompt. | 200,000 | 32,768 (`GPT-4.1 series`) <br> 100,000 (`o4-mini`) <br> 128,000 (`gpt-5` reasoning models) <br> 16,384 (`gpt-5-chat`) | - |
| `model-router` (2025-05-19) | A model that intelligently selects from a set of underlying models to respond to a given prompt. | 200,000 | 32,768 (`GPT-4.1 series`) <br> 100,000 (`o4-mini`) | May 31, 2024 |
| `model-router` (2025-11-18) | A model that intelligently selects from a configurable set of underlying chat models to respond to a given prompt. | TBD | TBD | TBD |
The larger context windows are supported by only *some* of the underlying models. An API call that relies on a larger context succeeds only if the prompt is routed to a model that supports it; otherwise, the call fails.
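As a hedged illustration of how such a deployment is used: a model-router deployment accepts the same chat-completions request shape as any other Azure OpenAI chat deployment. The resource name, deployment name, and API version below are placeholders, not values from this commit:

```python
# Illustrative sketch (not the official sample): a model-router deployment is
# called like any other Azure OpenAI chat deployment; you send a standard chat
# completions request and the router picks the underlying model. The routed
# model is typically reported in the response's "model" field.
import json

def build_chat_request(prompt: str, max_completion_tokens: int = 1024) -> dict:
    """Build a chat-completions payload for a model-router deployment."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        # Keep the output cap within the limit of the smallest underlying
        # model you expect to be routed to (see the capabilities table).
        "max_completion_tokens": max_completion_tokens,
    }

# Placeholder endpoint; substitute your own resource, deployment, and version.
url = (
    "https://YOUR-RESOURCE.openai.azure.com/openai/deployments/"
    "YOUR-ROUTER-DEPLOYMENT/chat/completions?api-version=2024-10-21"
)
payload = build_chat_request("Summarize model routing in one sentence.")
print(url)
print(json.dumps(payload, indent=2))
```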
## Mistral models sold directly by Azure

::: moniker range="foundry-classic"
Lines changed: 41 additions & 0 deletions (new file)

---
title: What's new in model router in Azure AI Foundry Models?
description: Learn about the latest news and feature updates for Azure model router.
author: PatrickFarley
ms.author: pafarley
manager: nitinme
ms.date: 11/06/2025
ms.service: azure-ai-foundry
ms.topic: whats-new
---

# What's new in model router in Azure AI Foundry Models?

This article provides a summary of the latest releases and major documentation updates for Azure model router.

## November 2025

### Model router GA version

A new model router model is now available. Version `2025-11-18` includes support for all underlying models in previous versions, as well as 10 new language models.

It also includes new features that make it more versatile and effective.
- TBD

For more information on model router and its capabilities, see the [Model router concepts guide](../openai/concepts/model-router.md).

## August 2025

### New version of model router (preview)

- Model router now supports GPT-5 series models.
- Model router for Azure AI Foundry is a deployable AI chat model that automatically selects the best underlying chat model to respond to a given prompt. For more information on how model router works and its advantages and limitations, see the [Model router concepts guide](../openai/concepts/model-router.md). To use model router with the Completions API, follow the [How-to guide](../openai/how-to/model-router.md).

## May 2025

### Model router (preview)

Model router for Azure AI Foundry is a deployable AI chat model that automatically selects the best underlying chat model to respond to a given prompt. For more information on how model router works and its advantages and limitations, see the [Model router concepts guide](../openai/concepts/model-router.md). To use model router with the Completions API, follow the [How-to guide](../openai/how-to/model-router.md).

articles/ai-foundry/openai/concepts/model-router.md

Lines changed: 61 additions & 2 deletions

```diff
@@ -1,5 +1,5 @@
 ---
-title: Model router for Azure AI Foundry (preview) concepts
+title: Model router for Azure AI Foundry concepts
 titleSuffix: Azure OpenAI
 description: Learn about the model router feature in Azure OpenAI in Azure AI Foundry Models.
 author: PatrickFarley
```
@@ -15,26 +15,85 @@ monikerRange: 'foundry-classic || foundry'

```diff
-# Model router for Azure AI Foundry (preview)
+# Model router for Azure AI Foundry
```
Model router for Azure AI Foundry is a deployable AI chat model that is trained to select the best large language model (LLM) to respond to a given prompt in real time. By evaluating factors like query complexity, cost, and performance, it intelligently routes requests to the most suitable model. Thus, it delivers high performance while saving on compute costs where possible, all packaged as a single model deployment.

## Why use model router?

Model router intelligently selects the best underlying model for a given prompt to optimize costs while maintaining quality. Smaller and cheaper models are used when they're sufficient for the task, but larger and more expensive models are available for more complex tasks. Also, reasoning models are available for tasks that require complex reasoning, and non-reasoning models are used otherwise. Model router provides a single deployment and chat experience that combines the best features from all of the underlying chat models.

::: moniker range="foundry"

With the latest version of model router, you can configure the routing behavior to better match your application's needs: you can choose a predefined routing profile and specify a subset of underlying models to use. The following sections describe these options.

::: moniker-end

## Versioning

Each version of model router is associated with a specific set of underlying models and their versions. This set is fixed&mdash;only newer versions of model router can expose new underlying models.

If you select **Auto-update** at the deployment step (see [Manage models](/azure/ai-foundry/openai/how-to/working-with-models?tabs=powershell#model-updates)), then your model router model automatically updates when new versions become available. When that happens, the set of underlying models also changes, which could affect the overall performance of the model and costs.

## Underlying models

|Model router version|Underlying models|Model version|
|:---:|:---|:----:|
| `2025-08-07` | `gpt-4.1` <br> `gpt-4.1-mini` <br> `gpt-4.1-nano` <br> `o4-mini` <br> `gpt-5` <br> `gpt-5-mini` <br> `gpt-5-nano` <br> `gpt-5-chat` | `2025-04-14` <br> `2025-04-14` <br> `2025-04-14` <br> `2025-04-16` <br> `2025-08-07` <br> `2025-08-07` <br> `2025-08-07` <br> `2025-08-07` |
| `2025-05-19` | `gpt-4.1` <br> `gpt-4.1-mini` <br> `gpt-4.1-nano` <br> `o4-mini` | `2025-04-14` <br> `2025-04-14` <br> `2025-04-14` <br> `2025-04-16` |
| `2025-11-18` | `gpt-4.1` <br> `gpt-4.1-mini` <br> `gpt-4.1-nano` <br> `o4-mini` <br> `gpt-5-nano` <br> `gpt-5-mini` <br> `gpt-5` <br> `gpt-5-chat` <br> `Deepseek-v3.1` <br> `llama-33-70b-instruct` <br> `gpt-oss-120b` <br> `llama4-maverick-instruct` <br> `grok-4` <br> `grok-4-fast` <br> `gpt-4o` <br> `gpt-4o-mini` | `2025-04-14` <br> `2025-04-14` <br> `2025-04-14` <br> `2025-04-16` <br> `2025-08-07` <br> `2025-08-07` <br> `2025-08-07` <br> `2025-08-07` <br> N/A <br> N/A <br> N/A <br> N/A <br> N/A <br> N/A <br> TBD <br> TBD |

::: moniker range="foundry"

## Routing profiles

Model router automatically chooses among a set of base models for each request, and routing profiles let you skew those choices to optimize for different goals while maintaining a baseline level of performance. Setting a routing profile is optional; if you don't set one, your deployment defaults to the `balanced` profile.

Use routing profiles if you:
* Want a simple "set-and-go" optimization without manually benchmarking every model.
* Need to reduce spend while retaining near-maximum quality.
* Need consistent access to the highest-quality model for critical workloads.
* Want to A/B test quality vs. cost trade-offs through per-request overrides.

> [!NOTE]
> Routing profiles are currently in preview. APIs, thresholds, or profile semantics might change before general availability.

### Available routing profiles

| Profile | Objective | Selection logic (conceptual) | Typical use cases | Trade-offs |
|------|-----------|------------------------------|-------------------|------------|
| Balanced (default) | Maintain near-best quality with cost sensitivity | Includes any candidate model whose estimated accuracy is within ~1% of the top model's accuracy | General-purpose applications, mixed workloads | Slightly higher cost than the strict cost profile; not always the single top-quality model |
| Quality | Always choose the highest-quality model. This is usually the largest model, but it depends on internal quality scoring, which can incorporate more than just parameter count. | Equivalent to a strict selection (α = 0) that picks the top model | Mission-critical tasks, legal/risk reviews, complex reasoning | Highest cost among profiles |
| Cost | Minimize cost while staying within a broader acceptable quality band | Includes models within ~5% of the best estimated accuracy, then chooses the lowest-cost candidate | High-volume workloads, exploratory or background processing | Possible small quality reduction vs. balanced/quality |

> [!IMPORTANT]
> The ±1% and ±5% quality deltas are internal target thresholds for in-domain evaluation sets. Actual realized differences can vary by domain, prompt style, and data distribution. Validate against your own test set.
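The conceptual bands in the table can be sketched as a simple filter-then-pick rule. This is an illustration of the stated thresholds only: the actual router is a trained model whose quality scores and thresholds are internal, and the candidate accuracies and costs below are made up.

```python
# Illustrative only: the conceptual accuracy-band rule described in the table.
# The real router's quality scores and thresholds are internal to the service.
from typing import NamedTuple

class Candidate(NamedTuple):
    name: str
    est_accuracy: float  # estimated in-domain accuracy, 0..1 (made up here)
    cost: float          # relative cost per request (made up here)

# Approximate quality bands per profile: quality = strict top pick,
# balanced = within ~1% of the best, cost = within ~5% of the best.
PROFILE_BANDS = {"quality": 0.0, "balanced": 0.01, "cost": 0.05}

def pick_model(candidates: list, profile: str = "balanced") -> Candidate:
    band = PROFILE_BANDS[profile]
    best = max(c.est_accuracy for c in candidates)
    # Keep every model whose estimated accuracy falls within the band...
    eligible = [c for c in candidates if c.est_accuracy >= best - band]
    # ...then prefer the cheapest of those.
    return min(eligible, key=lambda c: c.cost)

models = [
    Candidate("large", 0.92, 10.0),
    Candidate("medium", 0.915, 3.0),
    Candidate("small", 0.88, 1.0),
]
print(pick_model(models, "quality").name)   # -> large
print(pick_model(models, "balanced").name)  # -> medium (within 1% of best)
print(pick_model(models, "cost").name)      # -> small (within 5% of best)
```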

Each profile encodes a fixed optimization pattern, but you can use per-request overrides plus workload segmentation to approximate hybrid behavior.

Routing profiles don't guarantee that a specific model is chosen for a given request. If you need to route to a specific model (for regulatory reasons, for example), deploy that model directly instead of using model router.

### Best practices with routing profiles

Consider how you can use different routing profiles in your own use cases:
* Benchmark: Run a small evaluation set under `balanced` vs. `cost` to quantify the quality delta before a large-scale shift.
* Start conservative: Move from `quality` to `balanced` to `cost` only after confirming acceptable outputs.
* Mixed workloads: Use `balanced` as the deployment default and override individual background requests with `cost`.
* Guardrails: For safety-critical tasks, keep `quality` and add post-processing validation.
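The mixed-workloads practice can be sketched as simple workload segmentation. The workload categories here are hypothetical, and how a chosen profile is actually attached to a request (deployment default vs. per-request override) is an API detail of the service that this sketch doesn't model:

```python
# Illustrative sketch of workload segmentation: interactive traffic keeps the
# deployment default ("balanced"), while background jobs are tagged for the
# "cost" profile and safety-critical work for "quality". The category names
# are hypothetical examples, not service concepts.
DEFAULT_PROFILE = "balanced"

PROFILE_OVERRIDES = {
    "background": "cost",       # batch summarization, indexing, etc.
    "safety_review": "quality", # guardrails: keep the top model
}

def profile_for(workload: str) -> str:
    """Return the routing profile to request for a workload category."""
    return PROFILE_OVERRIDES.get(workload, DEFAULT_PROFILE)

print(profile_for("chat"))           # -> balanced
print(profile_for("background"))     # -> cost
print(profile_for("safety_review"))  # -> quality
```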

## Model subsets

The latest version of model router supports custom subsets: you can specify which underlying models to include in routing decisions. This gives you more control over cost, compliance, and performance characteristics.

You can make this specification at deployment time, and you can override it at request time.

When new base models become available, they're not included in your selection unless you explicitly add them to your deployment's inclusion list.
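One way to picture how a deployment-level inclusion list and a request-time override interact is as set intersection. The function and parameter names here are hypothetical illustrations; the source doesn't specify the actual API shape:

```python
# Illustrative only: a deployment's inclusion list filters the available base
# models, and a request-time subset can only narrow that choice further. The
# names "inclusion_list" and "request_subset" are hypothetical.
from typing import Optional

def routing_candidates(
    available: set,
    inclusion_list: set,
    request_subset: Optional[set] = None,
) -> set:
    # New base models in `available` are ignored unless explicitly added
    # to the deployment's inclusion list.
    candidates = available & inclusion_list
    if request_subset is not None:
        candidates &= request_subset
    return candidates

available = {"gpt-4.1", "gpt-5-mini", "grok-4-fast", "brand-new-model"}
deployment_list = {"gpt-4.1", "gpt-5-mini", "grok-4-fast"}

# "brand-new-model" stays excluded until it's added to the inclusion list.
print(sorted(routing_candidates(available, deployment_list)))
print(sorted(routing_candidates(available, deployment_list, {"gpt-5-mini"})))
```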

::: moniker-end

## Limitations