Description
I recently wrote a similar tool to run-resourcewatch here: https://codeberg.org/mdbooth/k8s-object-collector. I suspect I could bring some of that code to run-resourcewatch.
From skimming code comments I might be able to help with:
- Robust handling of restarts, including detection of missed deletes

  origin/pkg/resourcewatch/operator/starter.go, lines 18 to 19 in 2525941:

  > // this doesn't appear to handle restarts cleanly. To do so it would need to compare the resource version that it is applying
  > // to the resource version present and it would need to handle unobserved deletions properly. both are possible, neither is easy.

- Asynchronous handling of informer notifications (and no informer)

  origin/pkg/resourcewatch/controller/configmonitor/crd_controller.go, lines 17 to 21 in 2525941:

  > // this is an unusual controller. it really wants an pure watch stream, but that change is too big to reason about at
  > // the moment. For the moment we'll allow it have synchronous handling of informer notifications. This has severe consequences
  > // for cache correctness and latency, but it keeps me from having rip out more logic than I want to.
  > // It doesn't logically need to run because there is no sync method. it's all handled by the gitStorage.
  > // if you ask for a resource that doesn't exist, it will simply repeated error until it appears while watching all the other types.

k8s-object-collector runs each resource collector in a separate goroutine, in a loop which does a list followed by a watch. It doesn't use an informer. It emits meta events for the list and watch operations, so it can detect when an object it knows about wasn't listed, i.e. we missed a delete. The goroutines all write to a single channel, so the output is a single serialized stream of objects.
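
To make the missed-delete idea concrete, here is a minimal sketch of how list meta events could drive delete reconstruction. It is not the code from k8s-object-collector: the `Event` type, its field values (`LIST_START`, `LIST_END`, `OBJECT`, `DELETED`), and the `Filter` type are all hypothetical names chosen for illustration, and the list operation is simulated rather than calling the Kubernetes API. De-duplication is left out.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Event is a hypothetical record on the combined stream; field names are illustrative only.
type Event struct {
	Collector string // resource type the collector handles, e.g. "pods"
	Type      string // "LIST_START" / "LIST_END" (meta events), "OBJECT", or "DELETED"
	Key       string // namespace/name of the object; empty for meta events
}

// collect stands in for one list operation. A real collector would go on to
// watch, and relist in a loop whenever the watch ends.
func collect(name string, listed []string, out chan<- Event) {
	out <- Event{name, "LIST_START", ""}
	for _, key := range listed {
		out <- Event{name, "OBJECT", key}
	}
	out <- Event{name, "LIST_END", ""}
}

// Filter tracks the objects known per collector and reconstructs missed deletes.
type Filter struct {
	known   map[string]map[string]bool // collector -> keys believed to exist
	listing map[string]map[string]bool // collector -> keys seen in the in-progress list
}

func NewFilter() *Filter {
	return &Filter{known: map[string]map[string]bool{}, listing: map[string]map[string]bool{}}
}

// Process consumes one event and returns the events to pass downstream.
func (f *Filter) Process(ev Event) []Event {
	switch ev.Type {
	case "LIST_START":
		f.listing[ev.Collector] = map[string]bool{}
	case "OBJECT":
		if seen := f.listing[ev.Collector]; seen != nil {
			seen[ev.Key] = true
		}
		if f.known[ev.Collector] == nil {
			f.known[ev.Collector] = map[string]bool{}
		}
		f.known[ev.Collector][ev.Key] = true
		return []Event{ev}
	case "DELETED":
		delete(f.known[ev.Collector], ev.Key)
		return []Event{ev}
	case "LIST_END":
		seen := f.listing[ev.Collector]
		if seen == nil {
			return nil // LIST_END without LIST_START: nothing to compare against
		}
		delete(f.listing, ev.Collector)
		var missed []Event
		for key := range f.known[ev.Collector] {
			if !seen[key] {
				// Known before the list but absent from it: we missed a delete.
				missed = append(missed, Event{ev.Collector, "DELETED", key})
			}
		}
		sort.Slice(missed, func(i, j int) bool { return missed[i].Key < missed[j].Key })
		for _, d := range missed {
			delete(f.known[ev.Collector], d.Key)
		}
		return missed
	}
	return nil
}

func main() {
	out := make(chan Event)
	var wg sync.WaitGroup
	lists := map[string][]string{"pods": {"ns/a"}, "events": {"ns/e1"}}
	for name, listed := range lists {
		wg.Add(1)
		go func(name string, listed []string) { // one goroutine per collector
			defer wg.Done()
			collect(name, listed, out)
		}(name, listed)
	}
	go func() { wg.Wait(); close(out) }()

	f := NewFilter()
	// Pretend ns/a and ns/b were observed before a watch disconnection.
	f.Process(Event{"pods", "OBJECT", "ns/a"})
	f.Process(Event{"pods", "OBJECT", "ns/b"})
	for ev := range out {
		for _, e := range f.Process(ev) {
			fmt.Printf("%s %s %s\n", e.Collector, e.Type, e.Key)
		}
	}
}
```

Because the filter only needs the serialized stream, the same logic could run either inline or later against a recorded file. The interleaving of events from different collectors is not deterministic, which is why the state is keyed per collector.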
My test case wasn't collecting as many resources as run-resourcewatch, but it was collecting events and pods, which likely make up the bulk of the volume. My workstation was entirely unstressed while simply writing an unprocessed stream of JSON objects.
A concrete proposal:
- Copy (with appropriate refactoring) the `collect` and `filter` packages from k8s-object-collector, which do raw collection and de-duplication/delete reconstruction respectively.
- Add 2 new commands to run-resourcewatch
  - `collect` does resource collection only, writing to a raw JSON file
  - `to-git` post-processes the output of `collect` into the same format that is currently produced
- Maintain the existing behaviour of run-resourcewatch when called with the same arguments, doing both synchronously for compatibility with existing jobs
- Re-use the existing `resourcewatch/storage` package to ensure the resulting git-repo output remains the same.
IIUC an issue with the current implementation is the performance of writing to git. Assuming the performance of writing a stream of JSON objects remains acceptable, moving the creation of the git repo to a post-processing step should resolve this.