`articles/ai-foundry/foundry-models/includes/models-azure-direct-others.md`
See [the Microsoft model collection in Azure AI Foundry portal](https://ai.azure.com/explore/models?&selectedCollection=Microsoft/?cid=learnDocs). You can also find several Microsoft models available [from partners and community](../concepts/models-from-partners.md#microsoft).
### Model router
Model router is a large language model that intelligently selects from a set of underlying chat models to respond to a given prompt. For more information, see the [Model router overview](/azure/ai-foundry/openai/how-to/model-router).
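Because a model router deployment is called like any other chat model deployment, the following is a minimal sketch of sending it a prompt with the `openai` Python SDK and checking which underlying model handled the request. The endpoint, key, API version, and the deployment name `model-router` are placeholder assumptions; substitute your own values.

```python
# Minimal sketch: call a model router deployment through the Azure OpenAI
# Chat Completions API. The endpoint, key, API version, and the deployment
# name "model-router" are placeholders for your own values.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    api_key="<your-api-key>",                                   # placeholder
    api_version="2024-10-21",                                   # example API version
)

response = client.chat.completions.create(
    model="model-router",  # name of your model router deployment
    messages=[{"role": "user", "content": "Summarize the plot of Hamlet in two sentences."}],
)

# The response's model field reports which underlying model served the request.
print(response.model)
print(response.choices[0].message.content)
```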
#### Region availability
| Model | Region |
|---|---|
|`model-router` (2025-08-07) | East US 2 (Global Standard & Data Zone Standard), Sweden Central (Global Standard & Data Zone Standard) |
|`model-router` (2025-05-19) | East US 2 (Global Standard & Data Zone Standard), Sweden Central (Global Standard & Data Zone Standard) |
|`model-router` (2025-11-18) | TBD |
*Billing for Data Zone Standard model router deployments will begin no earlier than November 1, 2025.*
#### Capabilities
| Model ID | Description | Context window | Max output tokens | Training data (up to) |
| --- | :--- |:--- |:---|:---: |
|`model-router` (2025-08-07) | A model that intelligently selects from a set of underlying models to respond to a given prompt. | 200,000 | 32,768 (`GPT-4.1 series`)<br> 100,000 (`o4-mini`)<br> 128,000 (`gpt-5 reasoning models`)<br> 16,384 (`gpt-5-chat`) | - |
|`model-router` (2025-05-19) | A model that intelligently selects from a set of underlying models to respond to a given prompt. | 200,000 | 32,768 (`GPT-4.1 series`)<br> 100,000 (`o4-mini`) | May 31, 2024 |
|`model-router` (2025-11-18) | A model that intelligently selects from a configurable set of underlying chat models to respond to a given prompt. | TBD | TBD | TBD |
Larger context windows are compatible with *some* of the underlying models. That means an API call with a larger context succeeds only if the prompt happens to be routed to the right model. Otherwise, the call fails.
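As a hedged illustration of that failure mode, the sketch below sends a very long prompt and catches a rejection with the `openai` Python SDK; the deployment name, endpoint, and the shorten-and-retry fallback are assumptions, not prescribed behavior.

```python
# Illustrative sketch only: a long-context request to model router can fail if
# the prompt is routed to an underlying model with a smaller context window.
# The deployment name "model-router" and the endpoint are placeholders.
import openai
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<your-api-key>",
    api_version="2024-10-21",
)

very_long_prompt = "..."  # a prompt near the upper context limit

try:
    response = client.chat.completions.create(
        model="model-router",
        messages=[{"role": "user", "content": very_long_prompt}],
    )
    print(response.model, response.choices[0].message.content[:200])
except openai.BadRequestError as err:
    # One possible fallback: shorten or chunk the prompt and retry.
    print(f"Request rejected (likely context length for the routed model): {err}")
```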
---
title: What's new in model router in Azure AI Foundry Models?
description: Learn about the latest news and feature updates for Azure model router.
author: PatrickFarley
ms.author: pafarley
manager: nitinme
ms.date: 11/06/2025
ms.service: azure-ai-foundry
ms.topic: whats-new
---
# What's new in model router in Azure AI Foundry Models?
This article provides a summary of the latest releases and major documentation updates for Azure model router.
## November 2025
### Model router GA version
A new version of model router is now available. Version `2025-11-18` supports all of the underlying models from previous versions, as well as 10 new language models.
It also includes new features that make it more versatile and effective.
- TBD
For more information on model router and its capabilities, see the [Model router concepts guide](../openai/concepts/model-router.md).
## August 2025
### New version of model router (preview)
- Model router now supports GPT-5 series models.
- Model router for Azure AI Foundry is a deployable AI chat model that automatically selects the best underlying chat model to respond to a given prompt. For more information on how model router works and its advantages and limitations, see the [Model router concepts guide](../openai/concepts/model-router.md). To use model router with the Completions API, follow the [How-to guide](../openai/how-to/model-router.md).
## May 2025
### Model router (preview)
Model router for Azure AI Foundry is a deployable AI chat model that automatically selects the best underlying chat model to respond to a given prompt. For more information on how model router works and its advantages and limitations, see the [Model router concepts guide](../openai/concepts/model-router.md). To use model router with the Completions API, follow the [How-to guide](../openai/how-to/model-router.md).

---

Model router for Azure AI Foundry is a deployable AI chat model that is trained to select the best large language model (LLM) to respond to a given prompt in real time. By evaluating factors like query complexity, cost, and performance, it intelligently routes requests to the most suitable model. Thus, it delivers high performance while saving on compute costs where possible, all packaged as a single model deployment.
## Why use model router?
Model router intelligently selects the best underlying model for a given prompt to optimize costs while maintaining quality. Smaller and cheaper models are used when they're sufficient for the task, but larger and more expensive models are available for more complex tasks. Also, reasoning models are available for tasks that require complex reasoning, and non-reasoning models are used otherwise. Model router provides a single deployment and chat experience that combines the best features from all of the underlying chat models.
::: moniker range="foundry"
With the latest version of model router, you can configure the routing behavior to better match your application's needs. You can choose a predefined routing mode and specify a subset of underlying models to use. See the routing profiles and model subsets sections below for more details.
::: moniker-end
## Versioning
Each version of model router is associated with a specific set of underlying models and their versions. This set is fixed—only newer versions of model router can expose new underlying models.
If you select **Auto-update** at the deployment step (see [Manage models](/azure/ai-foundry/openai/how-to/working-with-models?tabs=powershell#model-updates)), then your model router model automatically updates when new versions become available. When that happens, the set of underlying models also changes, which could affect the overall performance of the model and costs.
## Underlying models
|Model router version|Underlying models|Model version|
|---|---|---|
## Routing profiles

Model router automatically chooses among a set of base models for each request, and routing profiles let you skew those choices toward different priorities, such as quality or cost, while maintaining a baseline level of performance. Setting a routing profile is optional; if you don't set one, your deployment defaults to the `balanced` strategy.
Use routing profiles if you:
* Want a simple “set-and-go” optimization without manually benchmarking every model.
* Need to reduce spend while retaining near-maximum quality.
* Need consistent access to the highest-quality model for critical workloads.
* Want to A/B test quality vs. cost trade-offs through per-request overrides.
> [!NOTE]
> Routing modes are currently in preview. APIs, thresholds, or mode semantics might change before general availability.
| Mode | Goal | Selection behavior | Typical use cases | Trade-offs |
|---|---|---|---|---|
| Balanced (default) | Maintain near-best quality with cost sensitivity | Includes any candidate model whose estimated accuracy is within ~1% of the top model's accuracy | General-purpose applications, mixed workloads | Slightly higher cost than strict cost mode; not always the single top-quality model |
| Quality | Always choose the highest-quality model. This is usually the largest model, but depends on internal quality scoring, which can incorporate more than just parameter count. | Equivalent to a strict selection (α = 0) picking the top model | Mission-critical tasks, legal/risk reviews, complex reasoning | Highest cost among modes |
| Cost | Minimize cost while staying within a broader acceptable quality band | Includes models within ~5% of the best estimated accuracy, then chooses a lower-cost candidate | High-volume workloads, exploratory or background processing | Possible small quality reduction vs. balanced/quality |
> [!IMPORTANT]
> The ±1% and ±5% quality deltas are internal target thresholds for in-domain evaluation sets. Actual realized differences can vary by domain, prompt style, and data distribution. Validate against your own test set.
Each mode encodes a fixed optimization pattern, but you can use per-request overrides plus workload segmentation to approximate hybrid behavior.
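For example, a per-request override might be passed through the SDK's `extra_body` escape hatch, as in the hypothetical sketch below. The `routing` field name and its shape are assumptions for illustration only; check the how-to guide for the actual request schema.

```python
# Hypothetical sketch: override the deployment's default routing profile for a
# single request. The "routing" field inside extra_body is an assumed
# placeholder, not a documented parameter; consult the how-to guide for the
# real schema.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<your-api-key>",
    api_version="2024-10-21",
)

response = client.chat.completions.create(
    model="model-router",  # deployment default profile assumed to be balanced
    messages=[{"role": "user", "content": "Classify this support ticket as billing, technical, or other."}],
    # extra_body is the OpenAI SDK's pass-through for service-specific fields.
    extra_body={"routing": {"profile": "cost"}},  # hypothetical override to the cost profile
)

print(response.model)  # which underlying model was chosen under the override
```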
Routing profiles don't guarantee that a specific model will be chosen for a given request. If you need to route to a specific model (for regulatory reasons, for example), deploy that model directly instead of routing.
### Best practices with routing profiles
Consider how you can use different routing profiles in your own use cases:
* Benchmark: Run a small evaluation set under `balanced` vs. `cost` to quantify the quality delta before a large-scale shift (see the sketch after this list).
* Start conservative: Move from `quality` → `balanced` → `cost` only after confirming acceptable outputs.
* Mixed workloads: Set the deployment default to `balanced` and override individual background requests with `cost`.
* Guardrails: For safety-critical tasks, keep `quality` and add post-processing validation.
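The benchmark suggestion above could look roughly like the following sketch, which replays a small evaluation set under two profiles and records which underlying model answered. The `routing` field passed via `extra_body` is a hypothetical placeholder, and scoring is left to your own evaluator.

```python
# Hypothetical A/B sketch: run the same small evaluation set under two routing
# profiles and compare outputs side by side. The "routing" field in extra_body
# is an assumed placeholder; the prompts and scoring step are your own.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<your-api-key>",
    api_version="2024-10-21",
)

eval_prompts = [
    "Explain the difference between a stack and a queue.",
    "Draft a two-line apology email for a delayed shipment.",
]

for profile in ("balanced", "cost"):
    for prompt in eval_prompts:
        response = client.chat.completions.create(
            model="model-router",
            messages=[{"role": "user", "content": prompt}],
            extra_body={"routing": {"profile": profile}},  # hypothetical per-request override
        )
        # Record which underlying model answered and the output for later scoring.
        print(profile, response.model, response.choices[0].message.content[:80])
```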
## Model subsets
The latest version of model router supports custom subsets: you can specify which underlying models to include in routing decisions. This gives you more control over cost, compliance, and performance characteristics.
You can specify this subset at deployment time, and you can override it at request time.
When new base models become available, they're not included in your selection unless you explicitly add them to your deployment's inclusion list.
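As a hypothetical illustration of a request-time override, the sketch below restricts routing to two underlying models through the SDK's `extra_body` pass-through. The `routing.models` field name and the model names listed are assumptions for illustration; use the schema and model names documented for your deployment.

```python
# Hypothetical sketch: restrict routing to a custom subset of underlying models
# for one request. The "models" field inside extra_body is an assumed
# placeholder, and the listed model names are illustrative only.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<your-api-key>",
    api_version="2024-10-21",
)

response = client.chat.completions.create(
    model="model-router",
    messages=[{"role": "user", "content": "Generate three subject lines for a product launch email."}],
    # Hypothetical request-time override of the deployment's inclusion list.
    extra_body={"routing": {"models": ["gpt-4.1-mini", "o4-mini"]}},
)

print(response.model)  # confirm the chosen model is within the requested subset
```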