-
Notifications
You must be signed in to change notification settings - Fork 2
Description
This project will port the SGD-based graph visualization strategy used in odgi to FlatGFA. This is a computationally interesting approach to visualizing large pangenome graphs (unlike #222, which is a simpler strategy for visualizing small ones). It produces birds-eye pictures for understanding the structure of variation graphs like this one from the odgi docs:
The SGD-based approach to graph layout has been fairly well studied in the Panorama group, so there are several things to build on:
- The original
odgi layoutsource code. - Jiajie's GPU implementation of the algorithm and its associated PyTorch reference implementation.
- The extracted PGSGD kernel from the PangenomicsBench benchmark suite.
Our goal in this project is to be as lazy as possible, i.e., to reuse as much as possible. Wherever feasible, we will reuse the actual implementation of SGD-based graph layout and just change the data source to FlatGFA. We will produce the same layout file format that odgi uses, meaning that we can reuse odgi draw to actually produce the final images.
To that end, we will need to pick one of several approaches:
- Use the PyTorch reference implementation. This means figuring out exactly where that tool's data comes from, and then exposing enough stuff through the FlatGFA Python API that we can instead load the data from there.
- Use Jiajie's high-performance GPU implementation. This will require building a new C API to FlatGFA. Then we can link it as a library into a project that uses CUDA to do the hard work.
- Use odgi's original CPU implementation. This similarly requires creating that C API.
- As a last resort, reimplement the algorithm in Rust. Maybe we can use an off-the-shelf crate for the core SGD loop, or maybe we just implement that ourselves too.
Here's a step-by-step plan:
- Evaluate the above options and pick one.
- Do that, i.e., implement FlatGFA-based graph layout.
- Add some tests. Differential testing with odgi is probably impractical (because this is a randomized algorithm), but we can at least take some snapshots.
- Compare performance.
