fix(llmobs): persist annotation context across batches #15571
Conversation
Bootstrap import analysis
Comparison of import times between this PR and base.
Summary
- The average import time from this PR is: 245 ± 2 ms.
- The average import time from base is: 248 ± 2 ms.
- The import time difference between this PR and base is: -2.82 ± 0.08 ms.
Import time breakdown
The following import paths have shrunk:
Performance SLOs
Comparing candidate zachg/llmobs_annotate_context_fix (e007b13) with baseline main (ddb3870).
📈 Performance Regressions (2 suites)
- iastaspectsospath - 24/24 within SLO. All benchmark times pass their SLOs (<10.000µs); ospathbasename_aspect regressed 📈 +21.3% in time vs baseline (5.104µs). Memory across benchmarks: ~38.4-38.6MB (SLO <41.000MB), +4-5% vs baseline.
- telemetryaddmetric - 30/30 within SLO. All benchmark times pass their SLOs; 1-count-metric-1-times regressed 📈 +15.6% in time vs baseline (3.361µs, SLO <20.000µs). Memory across benchmarks: ~34.6-36.0MB (SLOs <35.500-36.500MB), +4-5% vs baseline.
🟡 Near SLO Breach (14 suites)
- coreapiscenario - 10/10 (1 unstable)
PROFeNoM
left a comment
The fix is sound and explanations clear. LGTM! 🚢
Description
For this support card
Taking a look at the with_structured_output method: it causes .batch() to dispatch to RunnableSequence.batch() instead of BaseChatModel.batch() (which is what gets called otherwise), so we never reactivate the annotation context after the first LLM batch call.
Essentially, RunnableSequence.batch() creates a root-level span that encompasses the entire operation. When this root span finishes, the context provider's cleanup logic runs and, because _reactivate=False, the annotation context is lost. Here's the code flow (diagram generated with Cursor):
User calls: gpt_4.with_structured_output(Schema).batch([...])
↓
RunnableSequence.batch() - TRACED (creates chain span as ROOT)
↓
Internally calls ChatModel.generate() - creates child LLM spans
↓
Chain span finishes → triggers context provider's _update_active()
↓
Since _reactivate=False, annotation context is NOT restored
↓
Second batch() has no active context → annotations don't apply
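The failure mode above can be modeled with a small, self-contained sketch. This is a schematic Python model of the context provider's cleanup behavior, not ddtrace's actual implementation; the class and method names are illustrative:

```python
# Schematic model (illustrative names, not ddtrace internals): the provider
# clears the active annotation context when a root span finishes, and only
# restores it if the context was registered with reactivate=True.
class AnnotationContextProvider:
    def __init__(self):
        self.active = None      # currently applied annotations
        self._saved = None      # annotations to restore, if reactivation is on
        self._reactivate = False

    def activate(self, annotations, reactivate=False):
        self.active = self._saved = annotations
        self._reactivate = reactivate

    def on_root_span_finish(self):
        # cleanup logic that runs when a root-level (chain) span finishes
        self.active = self._saved if self._reactivate else None


provider = AnnotationContextProvider()
provider.activate({"tags": {"experiment": "v1"}})  # reactivate defaults to False

# First batch(): RunnableSequence.batch() opens and finishes a root chain span.
provider.on_root_span_finish()

# Second batch(): no active context, so annotations silently stop applying.
print(provider.active)  # -> None
```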
What we can do is track the context differently: reactivate it across batches, and deactivate it only when we know we're actually leaving the context block.
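A sketch of that idea in the same schematic model (illustrative names, not the actual ddtrace code): a finished root span re-applies the saved annotations, and the context is torn down only by an explicit deactivation when the user exits the annotation block.

```python
# Schematic model of the fix (illustrative, not ddtrace internals): root spans
# may finish many times inside one annotation context block (one per batch()
# call), so finishing one restores the annotations instead of dropping them.
class AnnotationContextProvider:
    def __init__(self):
        self.active = None
        self._saved = None

    def activate(self, annotations):
        # entering the annotation context block
        self.active = self._saved = annotations

    def on_root_span_finish(self):
        # always restore: a finished root span no longer ends the context
        self.active = self._saved

    def deactivate(self):
        # called only when we are actually leaving the context block
        self.active = self._saved = None


provider = AnnotationContextProvider()
provider.activate({"tags": {"experiment": "v1"}})

provider.on_root_span_finish()   # first batch() root span finishes
print(provider.active)           # -> {'tags': {'experiment': 'v1'}}

provider.on_root_span_finish()   # second batch() still sees the annotations
provider.deactivate()            # user exits the annotation context block
print(provider.active)           # -> None
```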
Testing
Added tests to make sure we persist the annotation context across multiple root spans.
Risks
Additional Notes