-
Notifications
You must be signed in to change notification settings - Fork 470
feat(pydantic_ai): support stream_output in pydantic_ai>=1.0.0
#15535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
|
| if exc_type: | ||
| self._dd_span.set_exc_info(exc_type, exc_val, exc_tb) | ||
| elif self._dd_integration.is_pc_sampled_llmobs(self._dd_span): | ||
| else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a drive-by change, i don't think we really rely on is_pc_sampled_llmobs anymore, but can add it back if i'm wrong!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good!
Bootstrap import analysisComparison of import times between this PR and base. SummaryThe average import time from this PR is: 252 ± 4 ms. The average import time from base is: 254 ± 3 ms. The import time difference between this PR and base is: -2.2 ± 0.2 ms. Import time breakdownThe following import paths have shrunk:
|
Performance SLOsComparing candidate sabrenner/more-pydantic-support (73a734c) with baseline main (d238908) 📈 Performance Regressions (3 suites)📈 iastaspectsospath - 24/24✅ ospathbasename_aspectTime: ✅ 5.155µs (SLO: <10.000µs 📉 -48.5%) vs baseline: 📈 +24.7% Memory: ✅ 40.187MB (SLO: <41.000MB 🟡 -2.0%) vs baseline: +4.7% ✅ ospathbasename_noaspectTime: ✅ 1.090µs (SLO: <10.000µs 📉 -89.1%) vs baseline: +0.2% Memory: ✅ 40.187MB (SLO: <41.000MB 🟡 -2.0%) vs baseline: +4.4% ✅ ospathjoin_aspectTime: ✅ 6.140µs (SLO: <10.000µs 📉 -38.6%) vs baseline: ~same Memory: ✅ 40.285MB (SLO: <41.000MB 🟡 -1.7%) vs baseline: +5.1% ✅ ospathjoin_noaspectTime: ✅ 2.291µs (SLO: <10.000µs 📉 -77.1%) vs baseline: -0.5% Memory: ✅ 40.226MB (SLO: <41.000MB 🟡 -1.9%) vs baseline: +4.9% ✅ ospathnormcase_aspectTime: ✅ 3.449µs (SLO: <10.000µs 📉 -65.5%) vs baseline: +0.4% Memory: ✅ 40.305MB (SLO: <41.000MB 🟡 -1.7%) vs baseline: +5.2% ✅ ospathnormcase_noaspectTime: ✅ 0.568µs (SLO: <10.000µs 📉 -94.3%) vs baseline: -0.6% Memory: ✅ 40.383MB (SLO: <41.000MB 🟡 -1.5%) vs baseline: +4.9% ✅ ospathsplit_aspectTime: ✅ 4.772µs (SLO: <10.000µs 📉 -52.3%) vs baseline: +0.3% Memory: ✅ 40.265MB (SLO: <41.000MB 🟡 -1.8%) vs baseline: +4.5% ✅ ospathsplit_noaspectTime: ✅ 1.589µs (SLO: <10.000µs 📉 -84.1%) vs baseline: -0.9% Memory: ✅ 40.285MB (SLO: <41.000MB 🟡 -1.7%) vs baseline: +4.8% ✅ ospathsplitdrive_aspectTime: ✅ 3.642µs (SLO: <10.000µs 📉 -63.6%) vs baseline: -0.7% Memory: ✅ 40.344MB (SLO: <41.000MB 🟡 -1.6%) vs baseline: +5.2% ✅ ospathsplitdrive_noaspectTime: ✅ 0.695µs (SLO: <10.000µs 📉 -93.1%) vs baseline: -1.3% Memory: ✅ 40.305MB (SLO: <41.000MB 🟡 -1.7%) vs baseline: +4.6% ✅ ospathsplitext_aspectTime: ✅ 4.519µs (SLO: <10.000µs 📉 -54.8%) vs baseline: ~same Memory: ✅ 40.364MB (SLO: <41.000MB 🟡 -1.6%) vs baseline: +5.5% ✅ ospathsplitext_noaspectTime: ✅ 1.382µs (SLO: <10.000µs 📉 -86.2%) vs baseline: -0.5% Memory: ✅ 40.265MB (SLO: <41.000MB 🟡 -1.8%) vs baseline: +4.9% 📈 iastaspectssplit - 12/12✅ rsplit_aspectTime: ✅ 1.587µs (SLO: <10.000µs 📉 -84.1%) vs baseline: 📈 +10.1% Memory: ✅ 40.088MB (SLO: <41.000MB -2.2%) vs baseline: +4.3% ✅ rsplit_noaspectTime: ✅ 0.579µs (SLO: <10.000µs 📉 -94.2%) vs baseline: ~same Memory: ✅ 40.305MB (SLO: <41.000MB 🟡 -1.7%) vs baseline: +5.0% ✅ split_aspectTime: ✅ 1.419µs (SLO: <10.000µs 📉 -85.8%) vs baseline: +0.6% Memory: ✅ 40.364MB (SLO: <41.000MB 🟡 -1.6%) vs baseline: +5.4% ✅ split_noaspectTime: ✅ 0.571µs (SLO: <10.000µs 📉 -94.3%) vs baseline: -0.2% Memory: ✅ 40.246MB (SLO: <41.000MB 🟡 -1.8%) vs baseline: +4.7% ✅ splitlines_aspectTime: ✅ 1.427µs (SLO: <10.000µs 📉 -85.7%) vs baseline: -0.8% Memory: ✅ 40.108MB (SLO: <41.000MB -2.2%) vs baseline: +4.8% ✅ splitlines_noaspectTime: ✅ 0.581µs (SLO: <10.000µs 📉 -94.2%) vs baseline: -1.2% Memory: ✅ 40.265MB (SLO: <41.000MB 🟡 -1.8%) vs baseline: +4.9% 📈 telemetryaddmetric - 30/30✅ 1-count-metric-1-timesTime: ✅ 3.438µs (SLO: <20.000µs 📉 -82.8%) vs baseline: 📈 +17.1% Memory: ✅ 34.878MB (SLO: <35.500MB 🟡 -1.8%) vs baseline: +5.3% ✅ 1-count-metrics-100-timesTime: ✅ 202.810µs (SLO: <220.000µs -7.8%) vs baseline: -0.9% Memory: ✅ 34.741MB (SLO: <35.500MB -2.1%) vs baseline: +4.7% ✅ 1-distribution-metric-1-timesTime: ✅ 3.289µs (SLO: <20.000µs 📉 -83.6%) vs baseline: -0.8% Memory: ✅ 34.760MB (SLO: <35.500MB -2.1%) vs baseline: +4.4% ✅ 1-distribution-metrics-100-timesTime: ✅ 218.210µs (SLO: <230.000µs -5.1%) vs baseline: -1.3% Memory: ✅ 35.055MB (SLO: <35.500MB 🟡 -1.3%) vs baseline: +4.7% ✅ 1-gauge-metric-1-timesTime: ✅ 2.193µs (SLO: <20.000µs 📉 -89.0%) vs baseline: +0.4% Memory: ✅ 34.800MB (SLO: <35.500MB 🟡 -2.0%) vs baseline: +5.0% ✅ 1-gauge-metrics-100-timesTime: ✅ 138.569µs (SLO: <150.000µs -7.6%) vs baseline: +0.5% Memory: ✅ 34.819MB (SLO: <35.500MB 🟡 -1.9%) vs baseline: +4.6% ✅ 1-rate-metric-1-timesTime: ✅ 3.083µs (SLO: <20.000µs 📉 -84.6%) vs baseline: -0.5% Memory: ✅ 34.977MB (SLO: <35.500MB 🟡 -1.5%) vs baseline: +5.5% ✅ 1-rate-metrics-100-timesTime: ✅ 215.896µs (SLO: <250.000µs 📉 -13.6%) vs baseline: -0.5% Memory: ✅ 34.780MB (SLO: <35.500MB -2.0%) vs baseline: +4.6% ✅ 100-count-metrics-100-timesTime: ✅ 20.508ms (SLO: <22.000ms -6.8%) vs baseline: +0.2% Memory: ✅ 34.721MB (SLO: <35.500MB -2.2%) vs baseline: +4.9% ✅ 100-distribution-metrics-100-timesTime: ✅ 2.326ms (SLO: <2.550ms -8.8%) vs baseline: +1.2% Memory: ✅ 34.918MB (SLO: <35.500MB 🟡 -1.6%) vs baseline: +4.5% ✅ 100-gauge-metrics-100-timesTime: ✅ 1.438ms (SLO: <1.550ms -7.2%) vs baseline: +1.2% Memory: ✅ 34.760MB (SLO: <35.500MB -2.1%) vs baseline: +4.7% ✅ 100-rate-metrics-100-timesTime: ✅ 2.230ms (SLO: <2.550ms 📉 -12.5%) vs baseline: +0.1% Memory: ✅ 34.780MB (SLO: <35.500MB -2.0%) vs baseline: +5.0% ✅ flush-1-metricTime: ✅ 4.653µs (SLO: <20.000µs 📉 -76.7%) vs baseline: +0.7% Memory: ✅ 35.134MB (SLO: <35.500MB 🟡 -1.0%) vs baseline: +4.9% ✅ flush-100-metricsTime: ✅ 175.510µs (SLO: <250.000µs 📉 -29.8%) vs baseline: ~same Memory: ✅ 35.252MB (SLO: <35.500MB 🟡 -0.7%) vs baseline: +5.3% ✅ flush-1000-metricsTime: ✅ 2.185ms (SLO: <2.500ms 📉 -12.6%) vs baseline: ~same Memory: ✅ 35.999MB (SLO: <36.500MB 🟡 -1.4%) vs baseline: +4.6% 🟡 Near SLO Breach (15 suites)🟡 coreapiscenario - 10/10 (1 unstable)
|
ncybul
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for jumping on this Sam! Left a few clarifying questions and suggestions but the bulk of this looks good. Lmk when you need another review
| if exc_type: | ||
| self._dd_span.set_exc_info(exc_type, exc_val, exc_tb) | ||
| elif self._dd_integration.is_pc_sampled_llmobs(self._dd_span): | ||
| else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good!
| --- | ||
| features: | ||
| - | | ||
| pydantic_ai: Introduces tracing support for PydanticAI's ``Agent.run_stream.stream_output`` method as introduced in ``pydantic-ai>=1.0.0``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we include a note under the LLM Observability scope as well here?
| # pydantic-ai>=1.0.0 requires Python 3.10+ | ||
| pys=select_pys(min_version="3.10", max_version="3.13"), | ||
| pkgs={ | ||
| "pydantic-ai": ["==0.3.0", "==0.4.4", "==1.0.0", latest], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the pydantic-ai packages below 1.0.0 here?
| ) | ||
| return self._generator | ||
|
|
||
| def stream_output(self, *args, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job integrating this into the existing pattern! I also saw that pydantic released a new stream_responses method. I think we should either add support for that method in this PR or create a ticket to track that work (probably does not have to be too high of a priority but always nice to get ahead of customers raising this issue).
| ) | ||
|
|
||
| async def test_agent_run_stream_structured_with_tool(self, pydantic_ai, request_vcr, llmobs_events, mock_tracer): | ||
| @pytest.mark.skipif(PYDANTIC_AI_VERSION > (1, 0, 0), reason="Test specific for pydantic-ai > 1.0.0") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The skip reasons are kinda confusing me LOL. Are they swapped? This line here says we are skipping if the pydantic version is greater than 1.0.0 which implies that this test is for versions less than 1.0.0, but the skip reason says that the test is specific to versions greater than 1.0.0 😅
| if instructions is None: | ||
| expected_instructions = None if PYDANTIC_AI_VERSION <= (1, 0, 0) else [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the default value for instructions different based on the pydantic version? Is it possible to update the agent manifest logic so that we enforce a list of instructions in all cases instead?
| final_result_span = trace[2] | ||
| assert len(llmobs_events) == 3 | ||
| assert llmobs_events[0] == expected_run_tool_span_event(tool_span) | ||
| assert llmobs_events[1] == expected_run_tool_span_event( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does this extra span come from? Is this just how the newer pydantic versions work?
Note: Most LOC are from regenerating riot lockfiles.
Description
Supports
stream_outputon Pydantic AI agent results, which was added inpydantic_ai>=1.0.0. To help accomplish testing this, I've bumped the range of our testing support, and updated tests. The only real change was that for agent manifests, emptyinstructionsare now an empty array[]instead ofNone. I didn't change source and instead changed the test assertion to be conditional on the version since I don't think it makes sense to normalize it.Testing
Updated
riotfile.pywith newestpydantic_aiversionsRisks
None, all other tests passed with minimal source & test changes