Skip to content

Conversation

@teskje
Copy link
Contributor

@teskje teskje commented Nov 7, 2025

Previously, the mz_ore::tracing::configure would initialize the Sentry transport and then return a TracingGuard that needed to be kept alive to prevent the Sentry transport from shutting down. Callers would usually keep the guard until the end of the current function scope. This caused us to miss reporting panics when a fatal error was bubbled up to the process' main before becoming a panic, and the guard was dropped in the process.

The fix taken here is to immediately forget the guard, to make sure the Sentry transport remains intact for the process lifetime. If there is ever a need to shut down the Sentry transport again we'll need to reconsider, but we can keep things simple for now.

Motivation

  • This PR fixes a previously unreported bug.

Panics caused by fatal errors that bubble up to main are not reported to Sentry.

Slack thread.

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@teskje teskje marked this pull request as ready for review November 7, 2025 11:26
@teskje teskje requested review from a team and ggevay as code owners November 7, 2025 11:27
Copy link
Contributor

@def- def- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a good way to write a test for this?

@teskje
Copy link
Contributor Author

teskje commented Nov 7, 2025

Is there a good way to write a test for this?

I think we'd need some way to run Materialize together with Sentry in tests. Then we could force a panic and check if it gets reported. The same thing would also have been useful with other Sentry reporting issues we had previously.

Previously, the `mz_ore::tracing::configure` would initialize the Sentry
transport and then return a `TracingGuard` that needed to be kept alive
to prevent the Sentry transport from shutting down. Callers would
usually keep the guard until the end of the current function scope. This
caused us to miss reporting panics when a fatal error was bubbled up to
the process' `main` before becoming a panic, and the guard was dropped
in the process.

The fix taken here is to immediately forget the guard, to make sure the
Sentry transport remains intact for the process lifetime. If there is
ever a need to shut down the Sentry transport again we'll need to
reconsider, but we can keep things simple for now.
@teskje teskje force-pushed the sentry-prevent-shutdown branch from cf205f0 to 908943b Compare November 7, 2025 11:39
@def-
Copy link
Contributor

def- commented Nov 7, 2025

We have test/tracing/mzcompose.py, which reports to a special sentry DSN, which we can then check via the API to see if the expected log has made it to it?

@teskje
Copy link
Contributor Author

teskje commented Nov 7, 2025

Ah, is that connected to our production Sentry instance? Yes, that would work. Though DSN authentication doesn't work to list issues in a project, we'd also need to supply a Sentry auth token.

@def-
Copy link
Contributor

def- commented Nov 7, 2025

Copy link
Contributor

@ggevay ggevay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@teskje
Copy link
Contributor Author

teskje commented Nov 12, 2025

I don't have time to look at testing the Sentry integration now, and I don't think we should block this fix on that. So I'm going ahead with merging. I've opened an issue to track adding tests: https://github.com/MaterializeInc/database-issues/issues/9897

@teskje
Copy link
Contributor Author

teskje commented Nov 12, 2025

TFTR!

@teskje teskje merged commit c1a55c7 into MaterializeInc:main Nov 12, 2025
129 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants