
Commit badf029

Merge pull request #8270 from MicrosoftDocs/main
Auto Publish – main to live - 2025-11-07 18:08 UTC
2 parents: c53e193 + f281fa0

File tree

7 files changed: +74 −42 lines

7 files changed

+74
-42
lines changed

articles/ai-foundry/how-to/develop/trace-agents-sdk.md

Lines changed: 1 addition & 1 deletion

@@ -265,7 +265,7 @@ Azure AI Foundry makes it easy to log traces with minimal changes by using our t

Azure AI Foundry has native integrations with Microsoft Agent Framework and Semantic Kernel. Agents built on these two frameworks get out-of-the-box tracing in Azure AI Foundry Observability.

-- Learn more about tracing and observability in [Semantic Kernel](/semantic-kernel/concepts/enterprise-readiness/observability) and [Microsoft Agent Framework](https://learn.microsoft.com/agent-framework/user-guide/workflows/observability).
+- Learn more about tracing and observability in [Semantic Kernel](/semantic-kernel/concepts/enterprise-readiness/observability) and [Microsoft Agent Framework](/agent-framework/user-guide/workflows/observability).

### Enable tracing for Agents built on LangChain & LangGraph

articles/ai-foundry/openai/how-to/quota.md

Lines changed: 11 additions & 0 deletions

@@ -120,6 +120,17 @@ To minimize issues related to rate limits, it's a good idea to use the following
- Avoid sharp changes in the workload. Increase the workload gradually.
- Test different load increase patterns.

+## Understanding 429 throttling errors and what to do
+
+### Why you might see a 429 error
+
+You might encounter a 429 error ("Too Many Requests") when your usage exceeds the allowed limits or when the system is experiencing high demand. We recently improved our error messaging to make these situations more transparent and actionable.
+
+### Common 429 scenarios and what to do
+1. **Rate limit exceeded**. This is the most common reason for a 429 response. It means your requests exceeded the rate limit for your current quota. In this case, you can request a quota increase by using the link provided in the error message.
+2. **System is experiencing high demand and can't process your request**. The system is under high demand and can't process your request because of capacity or latency limits. In this case, retry after the suggested time. Note that the Standard offer has no latency SLA and might experience variable latency if you exceed the [usage tier](/azure/ai-foundry/openai/quotas-limits?tabs=REST#usage-tiers). If you're looking for improved reliability or lower latency, consider upgrading to the Premium offer (provisioned throughput) for better predictability.
+
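Editor's note: the retry guidance added in this file ("retry after the suggested time") can be sketched as a small client-side helper. This is a minimal illustration only, not part of the published article; the `session.post` and `parse_retry_after` names in the usage comment are hypothetical placeholders, and the backoff math simply assumes a capped exponential policy that prefers the server's suggested wait when one is available.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, retry_after: Optional[float] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Delay (seconds) before retrying a 429 ("Too Many Requests") response.

    Prefers the server-suggested Retry-After value when present; otherwise
    falls back to capped exponential backoff (1s, 2s, 4s, ...) with jitter.
    """
    if retry_after is not None:
        return min(retry_after, cap)
    # Exponential growth capped at `cap`, plus up to 1s of random jitter
    # so many clients don't retry in lockstep.
    return min(base * (2 ** attempt), cap) + random.uniform(0.0, 1.0)


# Hypothetical usage around an HTTP call (names are placeholders):
# for attempt in range(5):
#     resp = session.post(url, json=payload)
#     if resp.status_code != 429:
#         break
#     time.sleep(backoff_delay(attempt, retry_after=parse_retry_after(resp)))
```

Honoring the suggested wait first, and only then falling back to exponential backoff, avoids retrying earlier than the service asked while still bounding the worst-case delay.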
## Automate deployment

This section contains brief example templates to help you get started programmatically creating deployments that use quota to set TPM rate limits. With the introduction of quota, you must use API version `2023-05-01` for resource management activities. This API version is only for managing your resources and doesn't affect the API version used for inferencing calls like completions, chat completions, embeddings, and image generation.

articles/ai-foundry/openai/quotas-limits.md

Lines changed: 19 additions & 1 deletion

@@ -271,11 +271,25 @@ During the preview, the rate limits for each `gpt-4o` realtime model deployment
|`gpt-image-1-mini` |Medium | N/A | 108 |
|`gpt-image-1-mini` |High | N/A | 360 |
-
## Usage tiers

Global Standard deployments use the global infrastructure of Azure. They dynamically route customer traffic to the data center with the best availability for the customer's inference requests. Similarly, Data Zone Standard deployments allow you to use the global infrastructure of Azure to dynamically route traffic to the data center within the Microsoft-defined data zone with the best availability for each request. This practice enables more consistent latency for customers with low to medium levels of traffic. Customers with high sustained levels of usage might see greater variability in response latency.

+Azure OpenAI usage tiers are designed to provide consistent performance for most customers with low to medium levels of traffic. Each usage tier defines the maximum throughput (tokens per minute) you can expect with predictable latency. When your usage stays within your assigned tier, latency remains stable and response times are consistent.
+
+### What happens if you exceed your usage tier?
+
+- If your request throughput exceeds your usage tier, especially during periods of high demand, your response latency can increase significantly.
+- Latency can vary and, in some cases, can be more than twice as high as when you operate within your usage tier.
+- This variability is most noticeable for customers with high sustained usage or bursty traffic patterns.
+
+### Recommended actions if you exceed your usage tier
+If you encounter 429 errors or notice increased latency variability, take the following actions:
+
+- Request a quota increase: visit the Azure portal to request a higher quota for your subscription.
+- Consider upgrading to a premium offer (PTU): for latency-critical or high-volume workloads, upgrade to Provisioned Throughput Units (PTU). PTU provides dedicated resources, guaranteed capacity, and predictable latency, even at scale. This is the best choice for mission-critical applications that require consistent performance.
+- Monitor your usage: regularly review your usage metrics in the Azure portal to ensure you're operating within your tier limits. Adjust your workload or deployment strategy as needed.
+
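Editor's note: one way to act on the "monitor your usage" advice above is to throttle requests client-side so token consumption stays under a tokens-per-minute budget. The sketch below is a generic token-bucket limiter, an illustration only; the 240,000 TPM figure is an arbitrary example, not a documented Azure OpenAI limit.

```python
import time


class TokenBucket:
    """Client-side rate limiter: the bucket refills `rate_per_min` tokens
    per minute, and each request deducts its (estimated) token cost."""

    def __init__(self, rate_per_min: float):
        self.rate_per_sec = rate_per_min / 60.0
        self.capacity = rate_per_min
        self.tokens = rate_per_min       # start with a full bucket
        self.last = time.monotonic()

    def acquire(self, cost: float) -> float:
        """Deduct `cost` tokens; return seconds to wait before sending."""
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the bucket size.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate_per_sec)
        self.last = now
        self.tokens -= cost
        if self.tokens >= 0:
            return 0.0
        # Negative balance: wait until the deficit would have refilled.
        return -self.tokens / self.rate_per_sec


# Example with an arbitrary 240,000 tokens/minute budget:
bucket = TokenBucket(240_000)
wait = bucket.acquire(4_000)   # a ~4k-token request fits the full bucket
```

Smoothing traffic this way addresses the "avoid sharp changes in the workload" guidance as well: bursts above the budget are delayed rather than sent straight into 429 responses.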
The usage limit determines the level of usage above which customers might see larger variability in response latency. A customer's usage is defined per model. It's the total number of tokens consumed across all deployments in all subscriptions in all regions for a given tenant.

> [!NOTE]

@@ -285,6 +299,10 @@ The usage limit determines the level of usage above which customers might see la

|Model| Usage tiers per month |
|----|:----|
+| `gpt-5` | 32 billion tokens |
+| `gpt-5-mini` | 160 billion tokens |
+| `gpt-5-nano` | 800 billion tokens |
+| `gpt-5-chat` | 32 billion tokens |
| `gpt-4` + `gpt-4-32k` (all versions) | 6 billion tokens |
| `gpt-4o` | 12 billion tokens |
| `gpt-4o-mini` | 85 billion tokens |
articles/ai-services/language-service/summarization/includes/quickstarts/rest-api.md

Lines changed: 14 additions & 17 deletions

@@ -2,11 +2,8 @@
author: laujan
manager: nitinme
ms.service: azure-ai-language
-ms.custom:
-- build-2024
-- ignite-2024
ms.topic: include
-ms.date: 06/30/2025
+ms.date: 11/05/2025
ms.author: lajanuar
---

@@ -18,7 +15,7 @@ ms.author: lajanuar

---

-Use this quickstart to send text summarization requests using the REST API. In the following example, you will use cURL to summarize documents or text-based customer service conversations.
+Use this quickstart to send text summarization requests using the [REST API](/rest/api/language/analyze-documents/analyze-documents-submit-job/analyze-documents-submit-job?view=rest-language-analyze-documents-2024-11-15-preview&preserve-view=true&tabs=HTTP). In the following example, you will use cURL to summarize documents or text-based customer service conversations.

[!INCLUDE [Use Language Studio](../use-language-studio.md)]

@@ -71,7 +68,7 @@ The following example will get you started with text extractive summarization:
1. Copy the command below into a text editor. The BASH example uses the `\` line continuation character. If your console or terminal uses a different line continuation character, use that character instead.

```bash
-curl -i -X POST $LANGUAGE_ENDPOINT/language/analyze-text/jobs?api-version=2023-04-01 \
+curl -i -X POST $LANGUAGE_ENDPOINT/language/analyze-text/jobs?api-version=2024-11-15-preview \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key: $LANGUAGE_KEY" \
-d \
@@ -83,7 +80,7 @@ curl -i -X POST $LANGUAGE_ENDPOINT/language/analyze-text/jobs?api-version=2023-0
{
"id": "1",
"language": "en",
-"text": "At Microsoft, we have been on a quest to advance AI beyond existing techniques, by taking a more holistic, human-centric approach to learning and understanding. As Chief Technology Officer of Azure AI services, I have been working with a team of amazing scientists and engineers to turn this quest into a reality. In my role, I enjoy a unique perspective in viewing the relationship among three attributes of human cognition: monolingual text (X), audio or visual sensory signals, (Y) and multilingual (Z). At the intersection of all three, there’s magic—what we call XYZ-code as illustrated in Figure 1—a joint representation to create more powerful AI that can speak, hear, see, and understand humans better. We believe XYZ-code will enable us to fulfill our long-term vision: cross-domain transfer learning, spanning modalities and languages. The goal is to have pre-trained models that can jointly learn representations to support a broad range of downstream AI tasks, much in the way humans do today. Over the past five years, we have achieved human performance on benchmarks in conversational speech recognition, machine translation, conversational question answering, machine reading comprehension, and image captioning. These five breakthroughs provided us with strong signals toward our more ambitious aspiration to produce a leap in AI capabilities, achieving multi-sensory and multilingual learning that is closer in line with how humans learn and understand. I believe the joint XYZ-code is a foundational component of this aspiration, if grounded with external knowledge sources in the downstream AI tasks."
+"text": "At Microsoft, we have been on a quest to advance AI beyond existing techniques, by taking a more holistic, human-centric approach to learning and understanding. As Chief Technology Officer of Azure AI services, I have been working with a team of amazing scientists and engineers to turn this quest into a reality. In my role, I enjoy a unique perspective in viewing the relationship among three attributes of human cognition: monolingual text (X), audio or visual sensory signals, (Y) and multilingual (Z). At the intersection of all three, there's magic—what we call XYZ-code as illustrated in Figure 1—a joint representation to create more powerful AI that can speak, hear, see, and understand humans better. We believe XYZ-code will enable us to fulfill our long-term vision: cross-domain transfer learning, spanning modalities and languages. The goal is to have pre-trained models that can jointly learn representations to support a broad range of downstream AI tasks, much in the way humans do today. Over the past five years, we have achieved human performance on benchmarks in conversational speech recognition, machine translation, conversational question answering, machine reading comprehension, and image captioning. These five breakthroughs provided us with strong signals toward our more ambitious aspiration to produce a leap in AI capabilities, achieving multi-sensory and multilingual learning that is closer in line with how humans learn and understand. I believe the joint XYZ-code is a foundational component of this aspiration, if grounded with external knowledge sources in the downstream AI tasks."
}
]
},
@@ -107,13 +104,13 @@ curl -i -X POST $LANGUAGE_ENDPOINT/language/analyze-text/jobs?api-version=2023-0
4. Get the `operation-location` from the response header. The value will look similar to the following URL:

```http
-https://<your-language-resource-endpoint>/language/analyze-text/jobs/12345678-1234-1234-1234-12345678?api-version=2023-04-01
+https://<your-language-resource-endpoint>/language/analyze-text/jobs/12345678-1234-1234-1234-12345678?api-version=2024-11-15-preview
```

5. To get the results of the request, use the following cURL command. Be sure to replace `<my-job-id>` with the numerical ID value you received from the previous `operation-location` response header:

```bash
-curl -X GET $LANGUAGE_ENDPOINT/language/analyze-text/jobs/<my-job-id>?api-version=2023-04-01 \
+curl -X GET $LANGUAGE_ENDPOINT/language/analyze-text/jobs/<my-job-id>?api-version=2024-11-15-preview \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key: $LANGUAGE_KEY"
```
@@ -158,7 +155,7 @@ curl -X GET $LANGUAGE_ENDPOINT/language/analyze-text/jobs/<my-job-id>?api-versio
"length": 192
},
{
-"text": "At the intersection of all three, theres magic—what we call XYZ-code as illustrated in Figure 1—a joint representation to create more powerful AI that can speak, hear, see, and understand humans better.",
+"text": "At the intersection of all three, there's magic—what we call XYZ-code as illustrated in Figure 1—a joint representation to create more powerful AI that can speak, hear, see, and understand humans better.",
"rankScore": 0.63,
"offset": 517,
"length": 203
@@ -203,7 +200,7 @@ The following example will get you started with conversation issue and resolutio
1. Copy the command below into a text editor. The BASH example uses the `\` line continuation character. If your console or terminal uses a different line continuation character, use that character instead.

```bash
-curl -i -X POST $LANGUAGE_ENDPOINT/language/analyze-conversations/jobs?api-version=2023-04-01 \
+curl -i -X POST $LANGUAGE_ENDPOINT/language/analyze-conversations/jobs?api-version=2024-11-15-preview \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key: $LANGUAGE_KEY" \
-d \
@@ -215,19 +212,19 @@ curl -i -X POST $LANGUAGE_ENDPOINT/language/analyze-conversations/jobs?api-versi
{
"conversationItems": [
{
-"text": "Hello, youre chatting with Rene. How may I help you?",
+"text": "Hello, you're chatting with Rene. How may I help you?",
"id": "1",
"role": "Agent",
"participantId": "Agent_1"
},
{
-"text": "Hi, I tried to set up wifi connection for Smart Brew 300 espresso machine, but it didnt work.",
+"text": "Hi, I tried to set up wifi connection for Smart Brew 300 espresso machine, but it didn't work.",
"id": "2",
"role": "Customer",
"participantId": "Customer_1"
},
{
-"text": "Im sorry to hear that. Lets see what we can do to fix this issue. Could you please try the following steps for me? First, could you push the wifi connection button, hold for 3 seconds, then let me know if the power light is slowly blinking on and off every second?",
+"text": "I'm sorry to hear that. Let's see what we can do to fix this issue. Could you please try the following steps for me? First, could you push the wifi connection button, hold for 3 seconds, then let me know if the power light is slowly blinking on and off every second?",
"id": "3",
"role": "Agent",
"participantId": "Agent_1"
@@ -251,7 +248,7 @@ curl -i -X POST $LANGUAGE_ENDPOINT/language/analyze-conversations/jobs?api-versi
"participantId": "Customer_1"
},
{
-"text": "Im very sorry to hear that. Let me see if theres another way to fix the issue. Please hold on for a minute.",
+"text": "I'm very sorry to hear that. Let me see if there's another way to fix the issue. Please hold on for a minute.",
"id": "7",
"role": "Agent",
"participantId": "Agent_1"
@@ -292,13 +289,13 @@ Only the `resolution` aspect supports sentenceCount. If you do not specify the `
4. Get the `operation-location` from the response header. The value will look similar to the following URL:

```http
-https://<your-language-resource-endpoint>/language/analyze-conversations/jobs/12345678-1234-1234-1234-12345678?api-version=2023-04-01
+https://<your-language-resource-endpoint>/language/analyze-conversations/jobs/12345678-1234-1234-1234-12345678?api-version=2024-11-15-preview
```

5. To get the results of the request, use the following cURL command. Be sure to replace `<my-job-id>` with the numerical ID value you received from the previous `operation-location` response header:

```bash
-curl -X GET $LANGUAGE_ENDPOINT/language/analyze-conversations/jobs/<my-job-id>?api-version=2023-04-01 \
+curl -X GET $LANGUAGE_ENDPOINT/language/analyze-conversations/jobs/<my-job-id>?api-version=2024-11-15-preview \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key: $LANGUAGE_KEY"
```

articles/ai-services/language-service/summarization/quickstart.md

Lines changed: 1 addition & 1 deletion

@@ -6,7 +6,7 @@ author: laujan
manager: nitinme
ms.service: azure-ai-language
ms.topic: quickstart
-ms.date: 09/15/2025
+ms.date: 11/05/2025
ms.author: lajanuar
ms.devlang: csharp
# ms.devlang: csharp, java, javascript, python
