-
Notifications
You must be signed in to change notification settings - Fork 258
feat: rewrite alias/upstream computation in Go #4395
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This is a major rewrite of the alias and upstream computation logic from Python to Go, and the changes are extensive but well-executed. The introduction of interfaces for cloud clients (CloudStorage, Publisher) and mock implementations for testing is a significant improvement in code quality and testability. The new relations Go command is well-structured, and the test coverage is comprehensive. The addition of a datastore emulator for testing is also a great enhancement to the development workflow. I've identified a minor resource leak and a point of code complexity that could be simplified, but overall this is a high-quality contribution.
| for _, upstream := range group.UpstreamIDs { | ||
| if _, ok := visited[upstream]; !ok && !slices.Contains(toVisit, upstream) { | ||
| toVisit = append(toVisit, upstream) | ||
| } else { | ||
| if u, ok := upstreamMap[vulnID]; !ok { | ||
| upstreamMap[vulnID] = []string{upstream} | ||
| } else if !slices.Contains(u, upstream) { | ||
| upstreamMap[vulnID] = append(u, upstream) | ||
| } | ||
| } | ||
| } | ||
| // Add the immediate upstreams of the vuln to the map | ||
| upstreamMap[vulnID] = group.UpstreamIDs | ||
| for _, upstream := range group.UpstreamIDs { | ||
| if _, ok := visited[upstream]; !ok && !slices.Contains(toVisit, upstream) { | ||
| toVisit = append(toVisit, upstream) | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block of code for populating toVisit and upstreamMap appears to be more complex than necessary and contains some redundant logic. The first loop from line 135 seems to be partially overwritten by the assignment on line 147 and the second loop on line 148. This can be simplified for clarity and to remove the redundancy.
if len(group.UpstreamIDs) == 0 {
continue
}
// Add the immediate upstreams of the vuln to the map
upstreamMap[vulnID] = group.UpstreamIDs
for _, upstream := range group.UpstreamIDs {
if _, ok := visited[upstream]; !ok && !slices.Contains(toVisit, upstream) {
toVisit = append(toVisit, upstream)
}
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, this logic is basically directly translated from the python implementation:
osv.dev/gcp/workers/alias/upstream_computation.py
Lines 171 to 181 in 3016163
| for upstream in upstreams: | |
| if upstream not in visited and upstream not in to_visit: | |
| to_visit.add(upstream) | |
| else: | |
| if bug_id not in upstream_map: | |
| upstream_map[bug_id] = set([upstream]) | |
| else: | |
| upstream_map[bug_id].add(upstream) | |
| # Add the immediate upstreams of the bug to the dict | |
| upstream_map[bug_id] = upstreams | |
| to_visit.update(upstreams - visited) |
8bd87f7 to
1da9942
Compare
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This is a significant and well-executed pull request that rewrites the alias and upstream computation logic from Python to Go. The new Go implementation is well-structured, and the introduction of interfaces for cloud clients and comprehensive test utilities is a fantastic improvement for testability and maintainability. The core logic appears to be a faithful translation of the original Python code. I've identified a couple of critical compilation errors related to the misuse of sync.WaitGroup that need to be addressed. Otherwise, the changes are excellent.
| vuln_proto.aliases[:] = aliases | ||
| if aliases_modified > modified: | ||
| modified = aliases_modified | ||
| fields = field.split(',') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you describe in the PR description what this change is for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added:
One difference is the alias and upstream computations now run at the same time, and the writes to the Vulnerability protos now happen together. I had to modify the recoverer to be able to handle multiple updating multiple fields when the GCS generation is mismatched to support this properly.
go/cmd/exporter/exporter.go
Outdated
| // generate the final files, sending the data to be written to `routerToWriteCh`. | ||
| // 5. A pool of `writer` workers receive the file data and write it to the output. | ||
| gcsObjToDownloaderCh := make(chan *storage.ObjectHandle) | ||
| gcsObjToDownloaderCh := make(chan string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: rename this variable since it's taking in a path now
nit2: Alias string to a gcsPath type and put that as the channel type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed to gcsPathToDownloaderCh
re: nit2 - I don't really see the benefit of doing that - it's fairly standard to use strings as paths in go (e.g. in the path package)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's more to clarify whether you are meant to pass in a path, a name, or a vuln ID...etc. This makes it a bit more self documenting. I think you don't even need to cast when passing in a string for aliases
| if errors.Is(err, iterator.Done) { | ||
| break | ||
| } | ||
| for path, err := range vulnClient.Objects(ctx, gcsProtoPrefix) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is so clean!
go/cmd/relations/upstream.go
Outdated
| query := datastore.NewQuery("Vulnerability").FilterField("upstream_raw", ">", "") | ||
| it := cl.Run(ctx, query) | ||
|
|
||
| vulns := make(map[string][]string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment here about what vulns is suppose to contain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed this to rawUpstreams which is more descriptive
go/cmd/relations/upstream.go
Outdated
| "google.golang.org/api/iterator" | ||
| ) | ||
|
|
||
| func computeUpstream(targetBugUpstream []string, bugs map[string][]string) []string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment describing what this function does, and what the two args are. it's unclear why targetBugUpstream is a list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added back in all the original docstrings from python to the go code, which should make it slightly clearer.
I also changed the function signature to be computeUpstream(vulnID []string, rawUpstreams map[string][]string) []string , which is simpler and clearer.
This is pretty much a direct translation of the existing python code (which makes some of it unidiomatic go), changed to use the new
Vulnerabilityinstead ofBug.One difference is the alias and upstream computations now run at the same time, and the writes to the Vulnerability protos now happen together. I had to modify the recoverer to be able to handle multiple updating multiple fields when the GCS generation is mismatched to support this properly.
Also ported over the datastore emulator manager, and made some interfaces for the google cloud clients (with the added benefit of introducing write retries which resolves #4355)
This has allowed my to bring over the existing tests.
I've renamed the
alias-computationcron torelations- Cloud Deploy is going to create a new workload for this, and the old one is going to have to be deleted manually.In a follow-up, I'm going to add another goroutine to calculate the related field, since these aren't being kept up to date at the moment.