-
Notifications
You must be signed in to change notification settings - Fork 470
chore(llmobs): send record metadata with experiment spans & make experiment spans i/o searchable #15545
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
|
Bootstrap import analysisComparison of import times between this PR and base. SummaryThe average import time from this PR is: 248 ± 3 ms. The average import time from base is: 249 ± 3 ms. The import time difference between this PR and base is: -0.7 ± 0.1 ms. Import time breakdownThe following import paths have shrunk:
|
Performance SLOsComparing candidate gary/mlob-4708 (e7b6a2d) with baseline main (d323b6a) ❌ Test Failures (1 suite)❌ telemetryaddmetric - 29/30✅ 1-count-metric-1-timesTime: ✅ 3.429µs (SLO: <20.000µs 📉 -82.9%) vs baseline: 📈 +13.7% Memory: ✅ 34.839MB (SLO: <35.500MB 🟡 -1.9%) vs baseline: +5.2% ✅ 1-count-metrics-100-timesTime: ✅ 199.829µs (SLO: <220.000µs -9.2%) vs baseline: -0.9% Memory: ✅ 34.721MB (SLO: <35.500MB -2.2%) vs baseline: +4.4% ✅ 1-distribution-metric-1-timesTime: ✅ 3.273µs (SLO: <20.000µs 📉 -83.6%) vs baseline: -2.2% Memory: ✅ 34.662MB (SLO: <35.500MB -2.4%) vs baseline: +4.5% ✅ 1-distribution-metrics-100-timesTime: ✅ 216.240µs (SLO: <230.000µs -6.0%) vs baseline: -0.8% Memory: ✅ 34.760MB (SLO: <35.500MB -2.1%) vs baseline: +4.9% ✅ 1-gauge-metric-1-timesTime: ✅ 2.151µs (SLO: <20.000µs 📉 -89.2%) vs baseline: -5.3% Memory: ✅ 34.839MB (SLO: <35.500MB 🟡 -1.9%) vs baseline: +5.0% ✅ 1-gauge-metrics-100-timesTime: ✅ 136.069µs (SLO: <150.000µs -9.3%) vs baseline: -0.7% Memory: ✅ 34.878MB (SLO: <35.500MB 🟡 -1.8%) vs baseline: +5.2% ✅ 1-rate-metric-1-timesTime: ✅ 3.073µs (SLO: <20.000µs 📉 -84.6%) vs baseline: -3.1% Memory: ✅ 34.760MB (SLO: <35.500MB -2.1%) vs baseline: +4.8% ✅ 1-rate-metrics-100-timesTime: ✅ 213.851µs (SLO: <250.000µs 📉 -14.5%) vs baseline: -0.9% Memory: ✅ 34.741MB (SLO: <35.500MB -2.1%) vs baseline: +4.7% ✅ 100-count-metrics-100-timesTime: ✅ 20.390ms (SLO: <22.000ms -7.3%) vs baseline: -1.1% Memory: ✅ 34.682MB (SLO: <35.500MB -2.3%) vs baseline: +4.8% ❌ 100-distribution-metrics-100-timesTime: ❌ 2.310ms (SLO: <2.300ms +0.4%) vs baseline: +1.4% Memory: ✅ 34.741MB (SLO: <35.500MB -2.1%) vs baseline: +3.5% ✅ 100-gauge-metrics-100-timesTime: ✅ 1.409ms (SLO: <1.550ms -9.1%) vs baseline: +0.1% Memory: ✅ 34.839MB (SLO: <35.500MB 🟡 -1.9%) vs baseline: +5.0% ✅ 100-rate-metrics-100-timesTime: ✅ 2.232ms (SLO: <2.550ms 📉 -12.5%) vs baseline: ~same Memory: ✅ 34.682MB (SLO: <35.500MB -2.3%) vs baseline: +4.7% ✅ flush-1-metricTime: ✅ 4.664µs (SLO: <20.000µs 📉 -76.7%) vs baseline: +2.0% Memory: ✅ 35.095MB (SLO: <35.500MB 🟡 -1.1%) vs baseline: +5.0% ✅ flush-100-metricsTime: ✅ 175.029µs (SLO: <250.000µs 📉 -30.0%) vs baseline: +0.2% Memory: ✅ 35.193MB (SLO: <35.500MB 🟡 -0.9%) vs baseline: +5.2% ✅ flush-1000-metricsTime: ✅ 2.180ms (SLO: <2.500ms 📉 -12.8%) vs baseline: -1.5% Memory: ✅ 35.881MB (SLO: <36.500MB 🟡 -1.7%) vs baseline: +4.7% 📈 Performance Regressions (1 suite)📈 iastaspectsospath - 24/24✅ ospathbasename_aspectTime: ✅ 5.215µs (SLO: <10.000µs 📉 -47.9%) vs baseline: 📈 +26.0% Memory: ✅ 40.285MB (SLO: <41.000MB 🟡 -1.7%) vs baseline: +5.3% ✅ ospathbasename_noaspectTime: ✅ 1.085µs (SLO: <10.000µs 📉 -89.1%) vs baseline: +0.8% Memory: ✅ 40.049MB (SLO: <41.000MB -2.3%) vs baseline: +4.4% ✅ ospathjoin_aspectTime: ✅ 6.138µs (SLO: <10.000µs 📉 -38.6%) vs baseline: -0.6% Memory: ✅ 40.167MB (SLO: <41.000MB -2.0%) vs baseline: +4.9% ✅ ospathjoin_noaspectTime: ✅ 2.293µs (SLO: <10.000µs 📉 -77.1%) vs baseline: -0.7% Memory: ✅ 40.167MB (SLO: <41.000MB -2.0%) vs baseline: +5.1% ✅ ospathnormcase_aspectTime: ✅ 3.399µs (SLO: <10.000µs 📉 -66.0%) vs baseline: -1.7% Memory: ✅ 40.246MB (SLO: <41.000MB 🟡 -1.8%) vs baseline: +4.8% ✅ ospathnormcase_noaspectTime: ✅ 0.565µs (SLO: <10.000µs 📉 -94.4%) vs baseline: -0.9% Memory: ✅ 40.147MB (SLO: <41.000MB -2.1%) vs baseline: +4.7% ✅ ospathsplit_aspectTime: ✅ 4.738µs (SLO: <10.000µs 📉 -52.6%) vs baseline: -0.2% Memory: ✅ 40.128MB (SLO: <41.000MB -2.1%) vs baseline: +4.6% ✅ ospathsplit_noaspectTime: ✅ 1.589µs (SLO: <10.000µs 📉 -84.1%) vs baseline: +0.1% Memory: ✅ 40.206MB (SLO: <41.000MB 🟡 -1.9%) vs baseline: +5.1% ✅ ospathsplitdrive_aspectTime: ✅ 3.613µs (SLO: <10.000µs 📉 -63.9%) vs baseline: -0.5% Memory: ✅ 40.206MB (SLO: <41.000MB 🟡 -1.9%) vs baseline: +5.2% ✅ ospathsplitdrive_noaspectTime: ✅ 0.699µs (SLO: <10.000µs 📉 -93.0%) vs baseline: +0.3% Memory: ✅ 40.049MB (SLO: <41.000MB -2.3%) vs baseline: +4.0% ✅ ospathsplitext_aspectTime: ✅ 4.484µs (SLO: <10.000µs 📉 -55.2%) vs baseline: -0.3% Memory: ✅ 40.226MB (SLO: <41.000MB 🟡 -1.9%) vs baseline: +4.8% ✅ ospathsplitext_noaspectTime: ✅ 1.391µs (SLO: <10.000µs 📉 -86.1%) vs baseline: +0.8% Memory: ✅ 40.285MB (SLO: <41.000MB 🟡 -1.7%) vs baseline: +5.1% 🟡 Near SLO Breach (18 suites)🟡 coreapiscenario - 10/10 (1 unstable)
|
832f91c to
828abac
Compare
828abac to
5097a2e
Compare
emmettbutler
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
release note looks good
sabrenner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good just some non-blocking questions!
| EXPERIMENT_DATASET_NAME_KEY = "_ml_obs.experiment_dataset_name" | ||
| EXPERIMENT_NAME_KEY = "_ml_obs.experiment_name" | ||
|
|
||
| # experiment context keys |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aren't these the same constants from another place in this file? can we reuse them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, we could, but the span type in the BE is distinct from the core span type (albeit the same path), so if we the same const on the SDK side to change the path then it might adversely effect the other span type (i.e. if experiments moved metadata to top level then it might be easily overlooked that core's metadata also moved to top level)
| """ | ||
| if input_value is not None: | ||
| span._set_ctx_item(EXPERIMENTS_INPUT, safe_json(input_value)) | ||
| span._set_ctx_item(EXPERIMENTS_INPUT, input_value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just to check - these values of Any (now not guaranteed strings which was enforced by safe_json) can be handled by the experiments intake? just been out of the loop a bit 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all good! yes they can be handled by experiments intake now (experiments spans go through EVP intake anyway for the most part), would there be any problems with EVP intake? the processor can handle Any now and we would like to handle Any on the experiments side
Description
Testing
Experiment spans now have the corresponding record metadata attached

Before
After

All experiments fields that need to be searchable if user has an object for the column are now searchable

with this query on the internal search tool we can see there are 5 spans that match
@event_type:span experiment_id:4d07e9bf-1e6a-455e-8b13-814fcab6f8f7 @meta.input.question:"Which city serves as the capital of Canada?"which matches this experiment, there are 5 spans with the input that is keyed as the
questionabove and they are not stringified and parsed properlythe internal search tool will also show the input fields are not stringified like before (try looking at the i/o from experiment 8faea4c4-abcc-4cc2-9cfe-8e755d9e5445 for example)
Risks
Additional Notes