This needs discussion as we mostly don't want to bloat Tulip.
It would be nice to have a /metrics endpoint in the backend following the OpenTelemetry format.
This would let teams monitor their instance and be alerted when something goes badly wrong, before it shows up on the scoreboard.
Metrics wishlist:
- (Counter per service) Total number of TCP flows stored in MongoDB
- (Counter per service) Total size in bytes of all payloads stored in MongoDB
- (Counter per service) Total number of FLAG OUT / IN
- (Counter) Total number of backend API requests
- (Gauge) Average backend API response time
I don't believe we should expose per-flow information, as the Tulip frontend is already built for that.
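
To make the wishlist concrete, here is a rough sketch of how little code the aggregate metrics might need. Everything in it is an assumption for discussion: the `Metrics` class, the `tulip_*` metric names, and the `service` / `direction` labels are hypothetical, and the output uses the Prometheus-style text format that scrapers of a `/metrics` endpoint commonly expect, which may not be the exact format we settle on. It only keeps aggregates, never per-flow data.

```python
from collections import defaultdict


class Metrics:
    """In-memory aggregate metrics (hypothetical helper, stdlib only)."""

    def __init__(self):
        # Counters keyed by (metric name, sorted tuple of label pairs).
        self.counters = defaultdict(int)
        # Running totals used to derive the average response time gauge.
        self._duration_sum = 0.0
        self._duration_count = 0

    def inc(self, name, amount=1, **labels):
        """Increase a counter, optionally scoped by labels such as service."""
        key = (name, tuple(sorted(labels.items())))
        self.counters[key] += amount

    def observe_request(self, seconds):
        """Record one backend API request and how long it took."""
        self.inc("tulip_api_requests_total")
        self._duration_sum += seconds
        self._duration_count += 1

    def render(self):
        """Return all metrics as 'name{labels} value' lines of text."""
        lines = []
        for (name, labels), value in sorted(self.counters.items()):
            label_str = ",".join(f'{k}="{v}"' for k, v in labels)
            suffix = f"{{{label_str}}}" if label_str else ""
            lines.append(f"{name}{suffix} {value}")
        if self._duration_count:
            avg = self._duration_sum / self._duration_count
            lines.append(f"tulip_api_response_seconds_avg {avg}")
        return "\n".join(lines) + "\n"
```

The ingestion path would call something like `metrics.inc("tulip_flows_total", service="web")` per stored flow, and the backend would return `metrics.render()` as plain text from the `/metrics` route. An existing client library could replace all of this if we decide the extra dependency is acceptable.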