Description
Component(s)
exporter/exporterhelper
Is your feature request related to a problem? Please describe.
Currently, configuring OTel exporters (e.g., otlphttp, otlpgrpc, elasticsearch, loki) to be resilient without overwhelming downstream services is a significant challenge. Users must manually tune static concurrency limits, typically sending_queue.num_consumers.
This creates a "vicious loop" for operators:
- Set concurrency too high: The collector can easily overwhelm a downstream service (like Elasticsearch or a custom OTLP receiver), leading to HTTP 429 (Too Many Requests) / gRPC RESOURCE_EXHAUSTED errors, dropped data, and potential cascading failures.
- Set concurrency too low: The collector under-utilizes the downstream service's capacity, leading to wasted resources, increased buffer usage (high memory/disk), and higher end-to-end latency.
This static limit is a "blunt instrument" for a dynamic problem. The optimal concurrency rate is not static; it changes constantly based on:
- The number of collector instances being deployed (e.g., in a Kubernetes HPA).
- The current capacity of the downstream service (e.g., an Elasticsearch cluster scaling up or down).
- The real-time volume of telemetry data being sent.
Operators are forced to "chase the dragon" by constantly re-tuning this static value, or they must provision backends to handle a worst-case scenario that may rarely occur, which is expensive.
Describe the solution you'd like
I propose implementing an Adaptive Request Concurrency (ARC) mechanism within the exporterhelper to support both HTTP and gRPC-based exporters.
This feature would dynamically and automatically adjust the exporter's concurrency level (sending_queue.num_consumers) based on real-time feedback from the downstream service. The mechanism would be inspired by TCP congestion control algorithms (AIMD - Additive Increase, Multiplicative Decrease).
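As a rough illustration of the AIMD idea, here is a minimal Go sketch of the core limit-update rule. Nothing here exists in exporterhelper today; the aimdLimiter name, fields, and defaults are assumptions for illustration only.

```go
package arc

import "math"

// aimdLimiter is a hypothetical controller holding the current concurrency
// limit (the effective num_consumers). Names and defaults are illustrative.
type aimdLimiter struct {
	limit         float64 // current concurrency limit
	minLimit      float64 // floor, e.g. min_concurrency: 1
	maxLimit      float64 // ceiling, e.g. max_concurrency: 100
	decreaseRatio float64 // multiplicative decrease factor, e.g. decrease_ratio: 0.9
}

// onSuccess is the additive-increase step: grow the limit linearly by one.
func (l *aimdLimiter) onSuccess() {
	l.limit = math.Min(l.limit+1, l.maxLimit)
}

// onBackpressure is the multiplicative-decrease step: shrink the limit by a factor.
func (l *aimdLimiter) onBackpressure() {
	l.limit = math.Max(l.limit*l.decreaseRatio, l.minLimit)
}
```

The protocol-specific pieces below would only decide when to apply these two steps.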
The core logic would be tailored to the protocol:
For HTTP Exporters (e.g., otlphttp, elasticsearch)
- Monitor key signals:
  - Round-Trip Time (RTT) of requests. An Exponentially Weighted Moving Average (EWMA) could be used to establish a baseline RTT.
  - HTTP Response Codes: Specifically looking for success (2xx) vs. backpressure signals (429, 503, or other 5xx errors).
- Implement AIMD Logic (a sketch follows this list):
  - Additive Increase: If RTT is stable or decreasing AND HTTP responses are consistently successful (2xx), the collector should linearly increase its concurrency limit.
  - Multiplicative Decrease: If RTT starts to increase significantly (e.g., current_rtt > baseline_rtt * rtt_threshold_ratio) OR the exporter receives backpressure signals (429, 503), the collector should multiplicatively decrease its concurrency limit.
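A sketch of how the HTTP-side feedback could be classified, assuming a hypothetical EWMA baseline and the rtt_threshold_ratio knob from the proposed config; the httpFeedback type and its fields are assumptions, not existing exporterhelper API.

```go
package arc

import (
	"net/http"
	"time"
)

// httpFeedback keeps an EWMA baseline of request RTT and classifies each
// completed request as an "increase" or "decrease" signal for the AIMD
// controller. Field names mirror the proposed config knobs.
type httpFeedback struct {
	ewmaRTT      time.Duration // smoothed baseline RTT
	alpha        float64       // EWMA smoothing factor, e.g. 0.2
	rttThreshold float64       // rtt_threshold_ratio, e.g. 1.1
}

// observe records one request and reports whether the limiter should back off
// (multiplicative decrease); otherwise it may increase additively.
func (f *httpFeedback) observe(statusCode int, rtt time.Duration) (decrease bool) {
	// Explicit backpressure (429, 503, or other 5xx) always triggers a decrease.
	if statusCode == http.StatusTooManyRequests || statusCode >= http.StatusInternalServerError {
		return true
	}
	// Latency signal: decrease when RTT exceeds the EWMA baseline by the threshold ratio.
	if f.ewmaRTT > 0 && float64(rtt) > float64(f.ewmaRTT)*f.rttThreshold {
		decrease = true
	}
	// Fold the new sample into the EWMA baseline.
	if f.ewmaRTT == 0 {
		f.ewmaRTT = rtt
	} else {
		f.ewmaRTT = time.Duration(f.alpha*float64(rtt) + (1-f.alpha)*float64(f.ewmaRTT))
	}
	return decrease
}
```

A decrease signal here would drive the onBackpressure() step of the limiter sketched earlier; otherwise onSuccess() applies.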
For gRPC Exporters (e.g., otlpgrpc)
gRPC (built on HTTP/2) has native flow control for network-level backpressure, but this proposal addresses application-level backpressure (e.g., the receiving server's application logic is overwhelmed). The signals for this are explicit gRPC status codes.
- Monitor key signals:
  - gRPC Status Codes: This is the primary signal.
    - Success: OK (Code 0).
    - Backpressure Signals: RESOURCE_EXHAUSTED (Code 8, the gRPC equivalent of HTTP 429) and UNAVAILABLE (Code 14, the gRPC equivalent of HTTP 503).
- Implement AIMD Logic (a sketch follows this list):
  - Additive Increase: On consistent OK responses, the collector should linearly increase its concurrency limit (the number of concurrent streams, controlled by num_consumers).
  - Multiplicative Decrease: On receiving RESOURCE_EXHAUSTED or UNAVAILABLE status codes, the collector should multiplicatively decrease its concurrency limit.
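For the gRPC path, a sketch of the signal classification using google.golang.org/grpc/codes and status; the surrounding controller and the classifyGRPC helper are assumptions for illustration.

```go
package arc

import (
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// classifyGRPC maps the error returned by an export RPC to an AIMD signal:
// true means backpressure (multiplicative decrease); false means no
// backpressure, and consistent false results allow additive increase.
func classifyGRPC(err error) (decrease bool) {
	if err == nil {
		// codes.OK: a success signal.
		return false
	}
	switch status.Code(err) {
	case codes.ResourceExhausted, codes.Unavailable:
		// Application-level backpressure: the gRPC equivalents of HTTP 429 and 503.
		return true
	default:
		// Other errors (e.g. InvalidArgument) say nothing about downstream
		// capacity, so this sketch does not treat them as backpressure.
		return false
	}
}
```

The same limiter from the earlier sketch would be reused; only the source of the increase/decrease signal differs between HTTP and gRPC.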
This combined approach creates a feedback loop that automatically "finds" the optimal concurrency level that the downstream service can handle at any given moment, maximizing throughput while ensuring reliability for all major OTLP exporters.
Proposed Configuration
This feature could be added to the sending_queue settings, where it would be leveraged by any exporter using the queue (both gRPC and HTTP).
Example 1: Simple toggle
```yaml
exporters:
  otlphttp:
    endpoint: "http://my-backend:4318"
    sending_queue:
      enabled: true
      queue_size: 1000
      num_consumers: adaptive # New "adaptive" keyword
```

Example 2: Detailed configuration block (preferred)
This would allow users to set boundaries and tune the algorithm if needed, while num_consumers would be the static alternative. This single config structure would work for both otlphttp and otlpgrpc.
```yaml
exporters:
  otlphttp:
    endpoint: "http://my-backend:4318"
    sending_queue:
      enabled: true
      queue_size: 1000
      # num_consumers: 10 # This would be ignored if adaptive_concurrency is enabled
      adaptive_concurrency:
        enabled: true
        min_concurrency: 1   # Optional: The floor for concurrency
        max_concurrency: 100 # Optional: The ceiling for concurrency
        # Optional: Algorithm tuning parameters with sane defaults
        # decrease_ratio: 0.9      # Factor to multiply by on "decrease" signal
        # rtt_threshold_ratio: 1.1 # e.g., trigger decrease if RTT > 110% of baseline (HTTP only)
```
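For reference, a minimal sketch of a Go settings struct this block could map to inside exporterhelper; the struct, its fields, and the mapstructure tags are assumptions, not existing API.

```go
package arc

// AdaptiveConcurrencySettings is a hypothetical Go counterpart of the proposed
// adaptive_concurrency block; none of these fields exist in exporterhelper today.
type AdaptiveConcurrencySettings struct {
	Enabled           bool    `mapstructure:"enabled"`
	MinConcurrency    int     `mapstructure:"min_concurrency"`     // optional floor, e.g. 1
	MaxConcurrency    int     `mapstructure:"max_concurrency"`     // optional ceiling, e.g. 100
	DecreaseRatio     float64 `mapstructure:"decrease_ratio"`      // factor applied on a decrease signal, e.g. 0.9
	RTTThresholdRatio float64 `mapstructure:"rtt_threshold_ratio"` // HTTP only, e.g. 1.1
}
```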
Describe alternatives you've considered
The alternative is the current state: manual, static tuning of num_consumers. This is inefficient, error-prone, and adds significant operational overhead, as described in the problem statement.
Additional context
This proposal is heavily inspired by Vector's "Adaptive Request Concurrency" (ARC) feature, which solves this exact problem for its HTTP-based sinks. Vector's implementation (itself inspired by work done at Netflix) has proven to be extremely effective at improving reliability and performance.
By adopting a similar pattern, the OTel Collector would become a "better infrastructure citizen" out-of-the-box, reducing the tuning burden on users and making OTel-based pipelines more resilient to downstream slowdowns or failures.