stats: almost all of the data is inaccurate? #58
Description
Right now, there's a confirmed bug in OpenCensus when it comes to the DistributionData's Min/Max fields.
To see what the bug is and how it will be solved, see here: census-instrumentation/opencensus-go#1181
The TL;DR is that OpenCensus never resets the Min/Max values (or any of the other aggregated values, really) between flush intervals, so the Min stays the Min for the entire lifetime of the process (and so does the Max). As a result, on the DataDog UI you'll never see a spike in Min or a drop in Max as long as the process is alive. The Mean is likewise the mean of the entire process, not of each flush interval, so if you have a big spike in latency, it will be diluted by the many hours of regular traffic that will always be part of the Mean.
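To make the dilution concrete, here is a minimal Python sketch (not the exporter's actual code) of how a cumulative mean absorbs a latency spike:

```python
def cumulative_mean(samples):
    """Mean over every sample since process start, which is what a
    never-resetting cumulative aggregation ends up reporting."""
    return sum(samples) / len(samples)

# 1000 requests at ~10ms, then a single 5000ms spike:
samples = [10.0] * 1000 + [5000.0]
print(cumulative_mean(samples))  # ~14.99ms -- the 5000ms spike barely registers
```

A per-interval mean over just the flush interval containing the spike would have surfaced it immediately; the cumulative mean hides it.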
However, this one bug surfaced a number of underlying issues that make almost everything this exporter reports (on the stats side) inaccurate. For all of the details, see the discussion starting with this comment: census-instrumentation/opencensus-go#1182 (comment)
It's a long discussion but here's the gist:
OpenCensus computes its Count(), Mean() and Sum() aggregations cumulatively. If the process restarts, those cumulative numbers reset to zero, and the DataDog backend would need to catch that by doing some magical time/value calculation and adjusting the values accordingly.
Count() and Sum() could be handled by remembering the last reported value and subtracting it from the current one. That said, the OpenCensus maintainer advised against keeping state in the exporter, and I'm not yet sure why.
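The bookkeeping described above could look something like this hypothetical Python sketch (`DeltaTracker` is an illustration I made up, not part of the exporter); it also shows the restart case, where the cumulative values drop below the remembered ones:

```python
class DeltaTracker:
    """Hypothetical helper: turns cumulative Count/Sum readings into
    per-flush-interval deltas by remembering the previous readings."""

    def __init__(self):
        self.prev_count = 0
        self.prev_sum = 0.0

    def delta(self, count, total):
        d_count = count - self.prev_count
        d_sum = total - self.prev_sum
        # A negative delta means the process restarted and the cumulative
        # values were reset; report the current readings as the delta.
        if d_count < 0 or d_sum < 0:
            d_count, d_sum = count, total
        self.prev_count, self.prev_sum = count, total
        return d_count, d_sum
```

This is exactly the kind of state the maintainer advised against keeping in the exporter; it is shown here only to pin down what "remembering the last value" would entail.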
As for Mean(), I'm not sure whether it can be solved, and if so, how.
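One possible approach (my own assumption, not something confirmed in the linked discussion): if per-interval deltas for Count and Sum were available, the per-interval mean would follow directly from them:

```python
def interval_mean(delta_sum, delta_count):
    """Hypothetical per-flush-interval mean, derived from the Sum and
    Count deltas for that interval rather than from process lifetime totals."""
    return delta_sum / delta_count if delta_count else 0.0

print(interval_mean(30.0, 5))  # 6.0
```

Of course, this inherits the same stateful bookkeeping that the maintainer advised against.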
That said, I just wanted to bring these issues to the maintainers' attention to see if I'm missing something.