Report non-fatal errors, detect hangs #61

@mhofman

#30 and Agoric/agoric-sdk#4114 are both examples of non-fatal errors that get printed on the output of the solo or chain but do not prevent the load cycles from completing. These are still believed to be regressions, however, and it'd be good to surface them. One solution might be to count and report the number of errors printed on the output of the chain / solo.
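
A rough sketch of what counting could look like, in Python purely for illustration (not the runner's actual code). The `ERROR_PATTERN` regex, the `count_errors` helper, and the sample log lines are all hypothetical; the real chain / solo output format may well need a different matcher.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: error lines contain a JS-style error name such as "Error" or
# "TypeError". The real chain / solo output may need a different pattern.
ERROR_PATTERN = re.compile(r"\b\w*Error\b")


def count_errors(lines: Iterable[str], source: str, counts: Counter) -> Counter:
    """Add one to counts[source] for every output line that looks like an error."""
    for line in lines:
        if ERROR_PATTERN.search(line):
            counts[source] += 1
    return counts


# Made-up sample output, only to exercise the helper.
chain_output = [
    "block 12 committed",
    "TypeError: something went wrong",
    "block 13 committed",
]
solo_output = [
    "Error: something else went wrong",
    "RangeError: and another one",
]

error_counts: Counter = Counter()
count_errors(chain_output, "chain", error_counts)
count_errors(solo_output, "solo", error_counts)

# Report at the end of the cycle; a non-zero count does not abort anything.
for source in ("chain", "solo"):
    print(f"{source}: {error_counts[source]} error line(s)")
```

Keeping the counts per source would let the end-of-cycle report say whether the chain or the solo regressed, without turning these errors into fatal ones.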

Similarly, some errors do not cause an abnormal termination but do cause a hang. Since the runner has no way to know whether a task is simply still pending or actually hung, we could report some metrics about pending tasks at the end of the cycle, e.g. how many there are, how long they've been pending, or even how long before shutdown the last task was started (and if that's much longer than the maximum time any task took to complete, fail?).
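
The pending-task metrics could be gathered along these lines. Again this is only a Python sketch under assumptions: `TaskTracker`, `PendingReport`, and the `hang_factor` threshold are hypothetical names, and timestamps are passed in explicitly so the example stays deterministic.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PendingReport:
    pending_count: int                 # tasks started but not finished
    longest_pending: float             # age of the oldest pending task, in seconds
    since_last_start: Optional[float]  # time between the last task start and shutdown
    max_completed_duration: float      # slowest task that did complete
    suspected_hang: bool


class TaskTracker:
    """Hypothetical bookkeeping for task start / finish times."""

    def __init__(self) -> None:
        self._started: Dict[str, float] = {}
        self._durations: List[float] = []
        self._last_start: Optional[float] = None

    def start(self, task_id: str, now: float) -> None:
        self._started[task_id] = now
        self._last_start = now

    def finish(self, task_id: str, now: float) -> None:
        self._durations.append(now - self._started.pop(task_id))

    def report(self, now: float, hang_factor: float = 3.0) -> PendingReport:
        """Summarize pending tasks at shutdown time `now`.

        hang_factor is an assumed threshold for "much longer than the
        maximum time any task took to complete".
        """
        ages = [now - started for started in self._started.values()]
        max_done = max(self._durations, default=0.0)
        since_last = None if self._last_start is None else now - self._last_start
        suspected = (
            since_last is not None
            and max_done > 0
            and since_last > hang_factor * max_done
        )
        return PendingReport(
            pending_count=len(ages),
            longest_pending=max(ages, default=0.0),
            since_last_start=since_last,
            max_completed_duration=max_done,
            suspected_hang=suspected,
        )


tracker = TaskTracker()
tracker.start("task-a", now=0.0)
tracker.finish("task-a", now=10.0)   # completed in 10s
tracker.start("task-b", now=12.0)    # never finishes
shutdown_report = tracker.report(now=100.0)
print(shutdown_report)
```

Whether `suspected_hang` should fail the cycle or only be reported is left open here, as in the question above.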
