Commit 1b97057

andrewm4894 and claude committed
docs(otel): Update README with provider transformers and Mastra support
- Document provider transformers pattern for framework-specific data transformations
- Add Mastra to architecture diagram and v1 instrumentation section
- Explain v1 framework detection via instrumentation scope name
- Document event type determination for v1 frameworks ($ai_span for root spans)
- Add section on adding new provider transformers
- Update design decisions with provider transformers and v1 event type logic

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
1 parent cae5807 commit 1b97057

1 file changed: +85 -8 lines changed

products/llm_analytics/backend/api/otel/README.md

Lines changed: 85 additions & 8 deletions
@@ -23,6 +23,11 @@ OTLP HTTP Request
 | | |
 | | +---> posthog_native.py
 | | +---> genai.py
+| | | |
+| | | +---> providers/ (framework-specific transformations)
+| | | |
+| | | +---> mastra.py
+| | | +---> base.py
 | |
 | +---> event_merger.py (Redis cache for v2)
 |
@@ -74,9 +79,14 @@ v1 instrumentation sends complete LLM call data within span attributes using ind
 - Completions: `gen_ai.completion.0.role`, `gen_ai.completion.0.content`
 - Metadata: `gen_ai.request.model`, `gen_ai.usage.input_tokens`, etc.

-**Processing**: When a span contains `prompt` or `completion` attributes (after extraction), the transformer recognizes it as v1 and sends the event immediately without caching. This works because v1 spans are self-contained.
+**Processing**: The transformer recognizes v1 in two ways:

-**Package**: `opentelemetry-instrumentation-openai`
+1. Span contains `prompt` or `completion` attributes (after extraction)
+2. Framework detection via instrumentation scope name (e.g., `@mastra/otel` for Mastra)
+
+When detected, events are sent immediately without caching since v1 spans are self-contained.
+
+**Packages**: `opentelemetry-instrumentation-openai`, Mastra framework (`@mastra/otel-exporter`)

 ### v2 Instrumentation
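The two v1 detection paths described in the hunk above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from `transformer.py`: the function name `is_v1_span`, the `V1_FRAMEWORK_SCOPES` set, and the shape of the `extracted` dict are all hypothetical. Only the `prompt`/`completion` check and the `@mastra/otel` scope name come from the README.

```python
from typing import Any

# Assumption: a set of scope names that mark v1 frameworks. Only
# "@mastra/otel" is named in the README; the set itself is hypothetical.
V1_FRAMEWORK_SCOPES = {"@mastra/otel"}


def is_v1_span(extracted: dict[str, Any], scope_name: str = "") -> bool:
    """Return True when a span looks self-contained (v1).

    Path 1: the extracted attributes already carry `prompt` or `completion`.
    Path 2: the instrumentation scope name identifies a known v1 framework.
    """
    has_messages = "prompt" in extracted or "completion" in extracted
    return has_messages or scope_name in V1_FRAMEWORK_SCOPES
```

A span for which this returns `True` would be sent immediately; anything else would go through the event merger.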

@@ -104,17 +114,19 @@ Main entry point for OTLP HTTP requests. Parses protobuf payloads and routes to
 Converts OTel spans to PostHog AI events using a waterfall attribute extraction pattern:

 1. Extract PostHog-native attributes (highest priority)
-2. Extract GenAI semantic convention attributes (fallback)
+2. Extract GenAI semantic convention attributes (fallback, with provider transformers)
 3. Merge with PostHog attributes taking precedence

 Determines event type based on span characteristics:

 - `$ai_generation`: LLM completion requests (has model, tokens, and input)
 - `$ai_embedding`: Embedding requests (operation_name matches embedding patterns)
-- `$ai_trace`: Root spans (no parent)
-- `$ai_span`: All other spans
+- `$ai_trace`: Root spans (no parent) for v2 frameworks
+- `$ai_span`: All other spans, including root spans from v1 frameworks
+
+**v1 Detection**: Checks for `prompt` or `completion` attributes OR framework scope name (e.g., `@mastra/otel`). v1 spans bypass the event merger.

-Detects v1 vs v2 by checking for `prompt` or `completion` in extracted attributes. v1 spans bypass the event merger.
+**Event Type Logic**: For v1 frameworks like Mastra, root spans are marked as `$ai_span` (not `$ai_trace`) to ensure they appear in the tree hierarchy. This is necessary because `TraceQueryRunner` filters out `$ai_trace` events from the events array.

 ### logs_transformer.py
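The event type rules listed in the hunk above can be read as a small decision function. The sketch below is one possible reading, not the transformer's actual code. The function name, the attribute keys (`operation_name`, `model`, `input_tokens`, `input`), the substring match on "embedding", and the order of the checks are all assumptions made for illustration.

```python
from typing import Any, Optional


def determine_event_type(
    attrs: dict[str, Any],
    parent_span_id: Optional[str],
    is_v1_framework: bool = False,
) -> str:
    """Pick a PostHog AI event type for a span (illustrative only)."""
    # Assumption: "embedding patterns" is approximated by a substring match.
    operation = str(attrs.get("operation_name") or "").lower()
    if "embedding" in operation:
        return "$ai_embedding"

    # Generation: the span has a model, token counts, and input.
    has_generation_fields = (
        attrs.get("model") is not None
        and attrs.get("input_tokens") is not None
        and attrs.get("input") is not None
    )
    if has_generation_fields:
        return "$ai_generation"

    # Root spans become $ai_trace only for v2; v1 framework roots stay
    # $ai_span so they remain visible in the tree hierarchy.
    if parent_span_id is None and not is_v1_framework:
        return "$ai_trace"
    return "$ai_span"
```

The last branch is the part this commit documents: a v1 root span falls through to `$ai_span` instead of `$ai_trace`.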

@@ -141,7 +153,12 @@ Attribute extraction modules implementing semantic conventions:

 **posthog_native.py**: Extracts PostHog-specific attributes prefixed with `posthog.ai.*`. These take precedence in the waterfall.

-**genai.py**: Extracts OpenTelemetry GenAI semantic convention attributes (`gen_ai.*`). Handles indexed message fields by collecting attributes like `gen_ai.prompt.0.role` into structured message arrays.
+**genai.py**: Extracts OpenTelemetry GenAI semantic convention attributes (`gen_ai.*`). Handles indexed message fields by collecting attributes like `gen_ai.prompt.0.role` into structured message arrays. Supports provider-specific transformations for frameworks that use custom OTEL formats.
+
+**providers/**: Framework-specific transformers for handling custom OTEL formats:
+
+- **base.py**: Abstract base class defining the provider transformer interface (`can_handle()`, `transform_prompt()`, `transform_completion()`)
+- **mastra.py**: Transforms Mastra's wrapped message format (e.g., `{"messages": [...]}` for input, `{"text": "...", "files": [], ...}` for output) into standard PostHog format. Detected by instrumentation scope name `@mastra/otel`.

 ## Event Schema
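To make the "indexed message fields" behaviour of `genai.py` concrete, here is a minimal sketch of collecting flat `gen_ai.prompt.N.field` attributes into an ordered message array. The helper name `collect_messages` and the regular expression are hypothetical. The real module may handle more cases, such as nested fields, that this sketch ignores.

```python
import re
from typing import Any

# Matches keys like "gen_ai.prompt.0.role" or "gen_ai.completion.1.content".
_INDEXED_KEY = re.compile(r"^gen_ai\.(prompt|completion)\.(\d+)\.(\w+)$")


def collect_messages(attributes: dict[str, Any], kind: str) -> list[dict[str, Any]]:
    """Group indexed attributes of one kind into a list of message dicts."""
    by_index: dict[int, dict[str, Any]] = {}
    for key, value in attributes.items():
        match = _INDEXED_KEY.match(key)
        if match and match.group(1) == kind:
            index, field = int(match.group(2)), match.group(3)
            by_index.setdefault(index, {})[field] = value
    # Sort by numeric index so message order does not depend on key order.
    return [by_index[i] for i in sorted(by_index)]
```

Calling `collect_messages(span_attributes, "prompt")` on a v1 span would yield one dict per message, ordered by index. Keys that do not match the pattern, such as `gen_ai.request.model`, are skipped.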

@@ -237,14 +254,74 @@ v2 can send multiple log events in a single HTTP request. The ingestion layer gr

 ### v1/v2 Detection

-Rather than requiring explicit configuration, the transformer auto-detects instrumentation version by checking for `prompt` or `completion` attributes. This allows both patterns to coexist without configuration.
+Rather than requiring explicit configuration, the transformer auto-detects instrumentation version by:
+
+1. Checking for `prompt` or `completion` attributes (after extraction)
+2. Detecting framework via instrumentation scope name (e.g., `@mastra/otel`)
+
+This allows both patterns to coexist without configuration, and supports frameworks that don't follow standard attribute conventions.
+
+### Provider Transformers
+
+Some frameworks (like Mastra) wrap OTEL data in custom structures that don't match standard GenAI conventions. Provider transformers detect these frameworks (via instrumentation scope or attribute prefixes) and unwrap their data into standard format. This keeps framework-specific logic isolated while maintaining compatibility with the core transformer pipeline.
+
+**Example**: Mastra wraps prompts as `{"messages": [{"role": "user", "content": [...]}]}` where content is an array of `{"type": "text", "text": "..."}` objects. The Mastra transformer unwraps this into standard `[{"role": "user", "content": "..."}]` format.
+
+### Event Type Determination for v1 Frameworks
+
+v1 frameworks create root spans that should appear in the tree hierarchy alongside their children. These root spans are marked as `$ai_span` (not `$ai_trace`) because `TraceQueryRunner` filters out `$ai_trace` events from the events array. This ensures v1 framework traces display correctly with proper parent-child relationships in the UI.

 ### TTL-Based Cleanup

 The event merger uses 60-second TTL on cache entries. This automatically cleans up orphaned data from incomplete traces (e.g., lost log packets) without requiring background jobs or manual cleanup.

 ## Extending the System

+### Adding New Provider Transformers
+
+Create a new transformer in `conventions/providers/`:
+
+```python
+import json
+from typing import Any
+
+from .base import ProviderTransformer
+
+
+class CustomFrameworkTransformer(ProviderTransformer):
+    """Transform CustomFramework's OTEL format."""
+
+    def can_handle(self, span: dict[str, Any], scope: dict[str, Any]) -> bool:
+        """Detect CustomFramework by scope name or attributes."""
+        scope_name = scope.get("name", "")
+        return scope_name == "custom-framework-scope"
+
+    def transform_prompt(self, prompt: Any) -> Any:
+        """Transform wrapped prompt format to standard."""
+        if not isinstance(prompt, str):
+            return None
+
+        try:
+            parsed = json.loads(prompt)
+            # Transform custom format to standard
+            return [{"role": "user", "content": parsed["text"]}]
+        except (json.JSONDecodeError, KeyError, TypeError):
+            # Invalid JSON, missing "text" key, or a non-dict payload
+            return None
+
+    def transform_completion(self, completion: Any) -> Any:
+        """Transform wrapped completion format to standard."""
+        # Similar transformation logic; placeholder returns None
+        return None
+```
+
+Register in `conventions/providers/__init__.py`:
+
+```python
+from .custom_framework import CustomFrameworkTransformer
+from .mastra import MastraTransformer
+
+PROVIDER_TRANSFORMERS = [
+    CustomFrameworkTransformer,
+    MastraTransformer,
+]
+```
+
 ### Adding New Semantic Conventions

 Create a new extractor in `conventions/`:
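For the Provider Transformers section added in this hunk, the Mastra example can be sketched as a standalone function. This is a rough illustration of the unwrapping the README describes, not the contents of `mastra.py`. The function name `unwrap_mastra_prompt` is hypothetical, and it assumes the wrapped prompt arrives as a JSON string and that text parts are joined without a separator.

```python
import json
from typing import Any, Optional


def unwrap_mastra_prompt(raw: Any) -> Optional[list[dict[str, Any]]]:
    """Turn {"messages": [{"role", "content": [parts]}]} into flat messages."""
    if not isinstance(raw, str):
        return None
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(parsed, dict) or not isinstance(parsed.get("messages"), list):
        return None

    messages: list[dict[str, Any]] = []
    for message in parsed["messages"]:
        if not isinstance(message, dict):
            continue
        content = message.get("content")
        if isinstance(content, list):
            # Keep only {"type": "text", "text": "..."} parts and join them.
            content = "".join(
                str(part.get("text", ""))
                for part in content
                if isinstance(part, dict) and part.get("type") == "text"
            )
        messages.append({"role": message.get("role", "user"), "content": content})
    return messages
```

Returning `None` for anything unrecognised mirrors the `CustomFrameworkTransformer` example in the hunk. Presumably that lets the caller fall back to the untransformed value, though the README does not state this.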
