articles/ai-foundry/how-to/develop/trace-agents-sdk.md
1 addition & 1 deletion
@@ -265,7 +265,7 @@ Azure AI Foundry makes it easy to log traces with minimal changes by using our t
Azure AI Foundry has native integrations with Microsoft Agent Framework and Semantic Kernel. Agents built on these two frameworks get out-of-the-box tracing in Azure AI Foundry Observability.
-
- Learn more about tracing and observability in [Semantic Kernel](/semantic-kernel/concepts/enterprise-readiness/observability) and [Microsoft Agent Framework](https://learn.microsoft.com/agent-framework/user-guide/workflows/observability).
+
- Learn more about tracing and observability in [Semantic Kernel](/semantic-kernel/concepts/enterprise-readiness/observability) and [Microsoft Agent Framework](/agent-framework/user-guide/workflows/observability).
### Enable tracing for Agents built on LangChain & LangGraph
articles/ai-foundry/openai/how-to/quota.md
11 additions & 0 deletions
@@ -120,6 +120,17 @@ To minimize issues related to rate limits, it's a good idea to use the following
- Avoid sharp changes in the workload. Increase the workload gradually.
- Test different load increase patterns.
+ ## Understanding 429 throttling errors and what to do
+
+ ### Why you may see a 429 error
+
+ You may encounter a 429 error (“Too Many Requests”) when your usage exceeds the allowed limits or when the system is experiencing high demand. We have recently improved our error messaging to make these situations more transparent and actionable.
+
+ ### Common 429 scenarios and what to do
+
+ 1. **Rate Limit Exceeded**. This is the most common situation in which you receive 429 responses: your requests exceeded the rate limit for your current quota. In this case, you can request a quota increase by using the link provided in the error message.
+ 2. **System is experiencing high demand and cannot process your request**. The system is under high demand and cannot process your request due to capacity or latency limits. In this case, you can retry after the suggested time. Note that the Standard offer has no latency SLA and may experience variable latency if you exceed the [Usage tier](/azure/ai-foundry/openai/quotas-limits?tabs=REST#usage-tiers). If you are looking for improved reliability or lower latency, consider upgrading to the Premium offer (Provisioned throughput) for better predictability. A minimal retry sketch follows this list.
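For example, a minimal Bash sketch of the retry pattern for the second scenario might look like the following. It waits for the interval suggested by the `Retry-After` header before retrying; the endpoint, deployment name, key variable, and API version shown here are placeholder assumptions, so substitute your own values.

```bash
# Placeholder values; replace with your own resource endpoint, deployment name, and key variable.
AZURE_OPENAI_ENDPOINT="https://<your-resource>.openai.azure.com"
DEPLOYMENT_NAME="<your-deployment>"

for attempt in 1 2 3 4 5; do
  # Headers go to headers.txt, the body to body.json, and the HTTP status code to stdout.
  status=$(curl -s -D headers.txt -o body.json -w "%{http_code}" \
    "$AZURE_OPENAI_ENDPOINT/openai/deployments/$DEPLOYMENT_NAME/chat/completions?api-version=2024-10-21" \
    -H "Content-Type: application/json" \
    -H "api-key: $AZURE_OPENAI_API_KEY" \
    -d '{"messages":[{"role":"user","content":"Hello"}]}')
  if [ "$status" != "429" ]; then
    cat body.json
    break
  fi
  # Honor the Retry-After header; fall back to 10 seconds if the header is absent.
  retry_after=$(grep -i '^retry-after:' headers.txt | tr -d '\r' | awk '{print $2}')
  sleep "${retry_after:-10}"
done
```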
## Automate deployment
This section contains brief example templates to help get you started programmatically creating deployments that use quota to set TPM rate limits. With the introduction of quota you must use API version `2023-05-01` for resource management related activities. This API version is only for managing your resources, and doesn't impact the API version used for inferencing calls like completions, chat completions, embedding, image generation, etc.
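For reference, a minimal sketch of such a resource-management call might look like the following; the subscription, resource group, account, deployment, and model values are placeholders, and `sku.capacity` is expressed in units of 1,000 TPM.

```bash
# Placeholder IDs and names; replace them before running. Requires an ARM access token.
ACCESS_TOKEN=$(az account get-access-token --query accessToken --output tsv)

curl -X PUT \
  "https://management.azure.com/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.CognitiveServices/accounts/<account-name>/deployments/<deployment-name>?api-version=2023-05-01" \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "sku": { "name": "Standard", "capacity": 10 },
    "properties": {
      "model": { "format": "OpenAI", "name": "<model-name>", "version": "<model-version>" }
    }
  }'
```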
articles/ai-foundry/openai/quotas-limits.md
19 additions & 1 deletion
@@ -271,11 +271,25 @@ During the preview, the rate limits for each `gpt-4o` realtime model deployment
|`gpt-image-1-mini`|Medium | N/A | 108 |
|`gpt-image-1-mini`|High | N/A | 360 |
-
## Usage tiers
Global Standard deployments use the global infrastructure of Azure. They dynamically route customer traffic to the data center with the best availability for the customer's inference requests. Similarly, Data Zone Standard deployments allow you to use the global infrastructure of Azure to dynamically route traffic to the data center within the Microsoft-defined data zone with the best availability for each request. This practice enables more consistent latency for customers with low to medium levels of traffic. Customers with high sustained levels of usage might see greater variability in response latency.
+ Azure OpenAI usage tiers are designed to provide consistent performance for most customers with low to medium levels of traffic. Each usage tier defines the maximum throughput (tokens per minute) you can expect with predictable latency. When your usage stays within your assigned tier, latency remains stable and response times are consistent.
+
+ ### What happens if you exceed your usage tier?
+
+ - If your request throughput exceeds your usage tier—especially during periods of high demand—your response latency may increase significantly.
+ - Latency can vary and, in some cases, may be more than two times higher than when operating within your usage tier.
+ - This variability is most noticeable for customers with high sustained usage or bursty traffic patterns.
+
+ ### Recommended actions if you exceed your usage tier
+
+ If you encounter 429 errors or notice increased latency variability, here’s what you should do:
+
+ - Request a quota increase: Visit the Azure portal to request a higher quota for your subscription.
+ - Consider upgrading to a premium offer (PTU): For latency-critical or high-volume workloads, upgrade to Provisioned Throughput Units (PTU). PTU provides dedicated resources, guaranteed capacity, and predictable latency—even at scale. This is the best choice for mission-critical applications that require consistent performance.
+ - Monitor your usage: Regularly review your usage metrics in the Azure portal to ensure you are operating within your tier limits. Adjust your workload or deployment strategy as needed. A metrics query sketch follows this list.
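For the last point, a minimal Azure CLI sketch for reviewing token metrics on a resource is shown below. The resource ID is a placeholder, and the metric name is only an example; use the names that the `list-definitions` command actually returns for your resource.

```bash
# Placeholder resource ID; point this at your Azure OpenAI resource.
RESOURCE_ID="/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.CognitiveServices/accounts/<account-name>"

# Discover which metrics the resource exposes (exact names vary by resource kind).
az monitor metrics list-definitions --resource "$RESOURCE_ID" --output table

# Query an example token metric over the last day, aggregated hourly.
az monitor metrics list \
  --resource "$RESOURCE_ID" \
  --metric "ProcessedPromptTokens" \
  --offset 1d \
  --interval PT1H \
  --aggregation Total \
  --output table
```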
The usage limit determines the level of usage above which customers might see larger variability in response latency. A customer's usage is defined per model. It's the total number of tokens consumed across all deployments in all subscriptions in all regions for a given tenant.
> [!NOTE]
@@ -285,6 +299,10 @@ The usage limit determines the level of usage above which customers might see la
articles/ai-services/language-service/summarization/includes/quickstarts/rest-api.md
14 additions & 17 deletions
@@ -2,11 +2,8 @@
author: laujan
manager: nitinme
ms.service: azure-ai-language
- ms.custom:
- - build-2024
- - ignite-2024
ms.topic: include
- ms.date: 06/30/2025
+ ms.date: 11/05/2025
ms.author: lajanuar
---
@@ -18,7 +15,7 @@ ms.author: lajanuar
---

-
Use this quickstart to send text summarization requests using the REST API. In the following example, you will use cURL to summarize documents or text-based customer service conversations.
+
Use this quickstart to send text summarization requests using the [REST API](/rest/api/language/analyze-documents/analyze-documents-submit-job/analyze-documents-submit-job?view=rest-language-analyze-documents-2024-11-15-preview&preserve-view=true&tabs=HTTP). In the following example, you will use cURL to summarize documents or text-based customer service conversations.

[!INCLUDE [Use Language Studio](../use-language-studio.md)]

@@ -71,7 +68,7 @@ The following example will get you started with text extractive summarization:
1. Copy the command below into a text editor. The BASH example uses the `\` line continuation character. If your console or terminal uses a different line continuation character, use that character instead.
```bash
- curl -i -X POST $LANGUAGE_ENDPOINT/language/analyze-text/jobs?api-version=2023-04-01 \
+ curl -i -X POST $LANGUAGE_ENDPOINT/language/analyze-text/jobs?api-version=2024-11-15-preview \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key: $LANGUAGE_KEY" \
-d \
@@ -83,7 +80,7 @@ curl -i -X POST $LANGUAGE_ENDPOINT/language/analyze-text/jobs?api-version=2023-0
{
"id": "1",
"language": "en",
-
"text": "At Microsoft, we have been on a quest to advance AI beyond existing techniques, by taking a more holistic, human-centric approach to learning and understanding. As Chief Technology Officer of Azure AI services, I have been working with a team of amazing scientists and engineers to turn this quest into a reality. In my role, I enjoy a unique perspective in viewing the relationship among three attributes of human cognition: monolingual text (X), audio or visual sensory signals, (Y) and multilingual (Z). At the intersection of all three, there’s magic—what we call XYZ-code as illustrated in Figure 1—a joint representation to create more powerful AI that can speak, hear, see, and understand humans better. We believe XYZ-code will enable us to fulfill our long-term vision: cross-domain transfer learning, spanning modalities and languages. The goal is to have pre-trained models that can jointly learn representations to support a broad range of downstream AI tasks, much in the way humans do today. Over the past five years, we have achieved human performance on benchmarks in conversational speech recognition, machine translation, conversational question answering, machine reading comprehension, and image captioning. These five breakthroughs provided us with strong signals toward our more ambitious aspiration to produce a leap in AI capabilities, achieving multi-sensory and multilingual learning that is closer in line with how humans learn and understand. I believe the joint XYZ-code is a foundational component of this aspiration, if grounded with external knowledge sources in the downstream AI tasks."
+
"text": "At Microsoft, we have been on a quest to advance AI beyond existing techniques, by taking a more holistic, human-centric approach to learning and understanding. As Chief Technology Officer of Azure AI services, I have been working with a team of amazing scientists and engineers to turn this quest into a reality. In my role, I enjoy a unique perspective in viewing the relationship among three attributes of human cognition: monolingual text (X), audio or visual sensory signals, (Y) and multilingual (Z). At the intersection of all three, there's magic—what we call XYZ-code as illustrated in Figure 1—a joint representation to create more powerful AI that can speak, hear, see, and understand humans better. We believe XYZ-code will enable us to fulfill our long-term vision: cross-domain transfer learning, spanning modalities and languages. The goal is to have pre-trained models that can jointly learn representations to support a broad range of downstream AI tasks, much in the way humans do today. Over the past five years, we have achieved human performance on benchmarks in conversational speech recognition, machine translation, conversational question answering, machine reading comprehension, and image captioning. These five breakthroughs provided us with strong signals toward our more ambitious aspiration to produce a leap in AI capabilities, achieving multi-sensory and multilingual learning that is closer in line with how humans learn and understand. I believe the joint XYZ-code is a foundational component of this aspiration, if grounded with external knowledge sources in the downstream AI tasks."
}
]
},
@@ -107,13 +104,13 @@ curl -i -X POST $LANGUAGE_ENDPOINT/language/analyze-text/jobs?api-version=2023-0
4. Get the `operation-location` from the response header. The value will look similar to the following URL:
5. To get the results of the request, use the following cURL command. Be sure to replace `<my-job-id>` with the numerical ID value you received from the previous `operation-location` response header:
```bash
- curl -X GET $LANGUAGE_ENDPOINT/language/analyze-text/jobs/<my-job-id>?api-version=2023-04-01 \
+ curl -X GET $LANGUAGE_ENDPOINT/language/analyze-text/jobs/<my-job-id>?api-version=2024-11-15-preview \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key: $LANGUAGE_KEY"
```
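If you want to script steps 4 and 5 end to end, a minimal sketch such as the following can capture the `operation-location` header and poll it until the job finishes. It assumes the request body from the earlier step is saved as `request.json` and reuses the same `$LANGUAGE_ENDPOINT` and `$LANGUAGE_KEY` variables; the same pattern applies to the conversation summarization job later in this quickstart.

```bash
# Submit the job and capture the full polling URL from the operation-location response header.
job_url=$(curl -si -X POST "$LANGUAGE_ENDPOINT/language/analyze-text/jobs?api-version=2024-11-15-preview" \
  -H "Content-Type: application/json" \
  -H "Ocp-Apim-Subscription-Key: $LANGUAGE_KEY" \
  -d @request.json | grep -i '^operation-location:' | tr -d '\r' | awk '{print $2}')

# Poll until the job reports a terminal status, then print the result.
until curl -s "$job_url" -H "Ocp-Apim-Subscription-Key: $LANGUAGE_KEY" | grep -Eq '"status": *"(succeeded|failed)"'; do
  sleep 2
done
curl -s "$job_url" -H "Ocp-Apim-Subscription-Key: $LANGUAGE_KEY"
```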
@@ -158,7 +155,7 @@ curl -X GET $LANGUAGE_ENDPOINT/language/analyze-text/jobs/<my-job-id>?api-versio
"length": 192
},
{
-
"text": "At the intersection of all three, there’s magic—what we call XYZ-code as illustrated in Figure 1—a joint representation to create more powerful AI that can speak, hear, see, and understand humans better.",
+
"text": "At the intersection of all three, there's magic—what we call XYZ-code as illustrated in Figure 1—a joint representation to create more powerful AI that can speak, hear, see, and understand humans better.",
"rankScore": 0.63,
"offset": 517,
"length": 203
@@ -203,7 +200,7 @@ The following example will get you started with conversation issue and resolutio
1. Copy the command below into a text editor. The BASH example uses the `\` line continuation character. If your console or terminal uses a different line continuation character, use that character instead.
```bash
- curl -i -X POST $LANGUAGE_ENDPOINT/language/analyze-conversations/jobs?api-version=2023-04-01 \
+ curl -i -X POST $LANGUAGE_ENDPOINT/language/analyze-conversations/jobs?api-version=2024-11-15-preview \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key: $LANGUAGE_KEY" \
-d \
@@ -215,19 +212,19 @@ curl -i -X POST $LANGUAGE_ENDPOINT/language/analyze-conversations/jobs?api-versi
{
"conversationItems": [
{
- "text": "Hello, you’re chatting with Rene. How may I help you?",
+ "text": "Hello, you're chatting with Rene. How may I help you?",
"id": "1",
"role": "Agent",
"participantId": "Agent_1"
},
{
- "text": "Hi, I tried to set up wifi connection for Smart Brew 300 espresso machine, but it didn’t work.",
+ "text": "Hi, I tried to set up wifi connection for Smart Brew 300 espresso machine, but it didn't work.",
"id": "2",
"role": "Customer",
"participantId": "Customer_1"
},
{
-
"text": "I’m sorry to hear that. Let’s see what we can do to fix this issue. Could you please try the following steps for me? First, could you push the wifi connection button, hold for 3 seconds, then let me know if the power light is slowly blinking on and off every second?",
+
"text": "I'm sorry to hear that. Let's see what we can do to fix this issue. Could you please try the following steps for me? First, could you push the wifi connection button, hold for 3 seconds, then let me know if the power light is slowly blinking on and off every second?",
"id": "3",
"role": "Agent",
"participantId": "Agent_1"
@@ -251,7 +248,7 @@ curl -i -X POST $LANGUAGE_ENDPOINT/language/analyze-conversations/jobs?api-versi
"participantId": "Customer_1"
},
{
- "text": "I’m very sorry to hear that. Let me see if there’s another way to fix the issue. Please hold on for a minute.",
+ "text": "I'm very sorry to hear that. Let me see if there's another way to fix the issue. Please hold on for a minute.",
"id": "7",
"role": "Agent",
"participantId": "Agent_1"
@@ -292,13 +289,13 @@ Only the `resolution` aspect supports sentenceCount. If you do not specify the `
4. Get the `operation-location` from the response header. The value will look similar to the following URL:
5. To get the results of the request, use the following cURL command. Be sure to replace `<my-job-id>` with the numerical ID value you received from the previous `operation-location` response header:
```bash
- curl -X GET $LANGUAGE_ENDPOINT/language/analyze-conversations/jobs/<my-job-id>?api-version=2023-04-01 \
+ curl -X GET $LANGUAGE_ENDPOINT/language/analyze-conversations/jobs/<my-job-id>?api-version=2024-11-15-preview \