Skip to content

Conversation

@cretz
Copy link
Member

@cretz cretz commented Dec 3, 2025

What was changed

Defer updating workflow completion metrics until completion accepted by server. This is done by capturing the metrics to update in the task handler result and then providing a post-completion runnable to be invoked if the gRPC method is a success.

This technically changes behavior to no longer record metrics in situations like unhandled commands, which is a more accurate approach.

Checklist

  1. Closes workflow_completed counter counts successful completions of workflow method instead of workflow executions #1590

@cretz cretz requested a review from a team as a code owner December 3, 2025 13:47

if (context.isCancelRequested()) {
workflowStateMachines.cancelWorkflow();
metricsScope.counter(MetricsType.WORKFLOW_CANCELED_COUNTER).inc(1);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These completion counters are now tracked inside workflow state machines object

@Quinn-With-Two-Ns
Copy link
Contributor

@maciejdudko as co-maintainer may want to review as well

Copy link
Contributor

@maciejdudko maciejdudko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests could be more split up but it's not blocking. LGTM

Comment on lines 204 to 214
assertEquals(
2, counts.get(io.temporal.worker.MetricsType.WORKFLOW_COMPLETED_COUNTER).intValue());
assertEquals(
1, counts.get(io.temporal.worker.MetricsType.WORKFLOW_FAILED_COUNTER).intValue());
assertEquals(
1,
counts
.get(io.temporal.worker.MetricsType.WORKFLOW_CONTINUE_AS_NEW_COUNTER)
.intValue());
assertEquals(
1, counts.get(io.temporal.worker.MetricsType.WORKFLOW_CANCELED_COUNTER).intValue());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: splitting each workflow run into separate testcase would more strongly verify that the right counter was incremented in each scenario. Could also add checking presence of WORKFLOW_E2E_LATENCY.

Copy link
Member Author

@cretz cretz Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I split the tests. I wasn't going to and was going to add e2e check in here, but it turns out we calculate e2e as current - start event time, and the latter is extremely skewed with time skipping causing negative values (which Tally ignores), so it was easier to check in separate tests.

@cretz cretz merged commit a39d10c into temporalio:master Dec 10, 2025
23 of 24 checks passed
@cretz cretz deleted the metrics-after-rpc branch December 10, 2025 13:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

workflow_completed counter counts successful completions of workflow method instead of workflow executions

3 participants