Conversation

@Gedeon-m-gedus
Collaborator

Add Watershed-based Polygonization for Improved Field Instance Separation

Adding hierarchical watershed-based instance segmentation to improve handling of touching/overlapping agricultural fields during polygonization.

Changes

  • New module: ftw_tools/postprocess/watershed_polygonize.py with watershed algorithm implementation
  • New CLI command: ftw inference polygonize-watershed with watershed-specific parameters (--t_ext, --t_bound).
    t_ext is the threshold for binarizing the field extent, and t_bound is the threshold for cutting the hierarchical watershed tree.
  • Fallback support: Graceful handling when fiboa_cli.parquet is unavailable
  • Isolated implementation: Separate from existing polygonization to minimize conflicts

Key Features

  • Better separation of touching fields using hierarchical watershed
  • Configurable thresholds for extent and boundary detection. Although the algorithm expects the field extent as a probability map, it should work fine with ftw outputs because the function binarizes the input (background vs. fields_and_boundaries).
  • Same output format compatibility as standard polygonization
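
The binarization mentioned above can be sketched as follows. This is a minimal illustration, not code from the PR: binarize_extent is a hypothetical helper, and the 3-class mask encoding (0 = background, 1 = field, 2 = boundary) is an assumption about FTW outputs.

```python
import numpy as np

def binarize_extent(mask, t_ext=0.5):
    # Hypothetical helper: collapse a 3-class mask (0=background,
    # 1=field, 2=boundary) into background vs. fields_and_boundaries,
    # then apply the same threshold the watershed code would apply
    # to a probability map.
    extent = (np.asarray(mask) > 0).astype(np.float32)
    return (extent >= t_ext).astype(np.uint8)

m = np.array([[0, 1, 2],
              [0, 1, 1]])
print(binarize_extent(m).tolist())  # [[0, 1, 1], [0, 1, 1]]
```

Because the mask is collapsed to 0/1 before thresholding, any t_ext in (0, 1] gives the same result on hard class maps; the threshold only matters for soft probability inputs.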

Dependencies

  • Requires the higra package. I installed it manually with conda install higra -c conda-forge; pip install higra should also work.
  • All existing dependencies remain unchanged

Example Usage

ftw inference polygonize-watershed model_output_fields.tif -o output_fields_instances.geojson

The following images show sample visualizations from my tests:

Input image:
Screenshot 2025-09-11 at 8 53 34 PM

Predicted fields with the 3-class model:
Screenshot 2025-09-11 at 8 53 43 PM

Instance segmentation with this algorithm:
Screenshot 2025-09-11 at 8 53 50 PM

Performance Note

⚠️ Slower than standard polygonization - The hierarchical watershed algorithm is computationally more intensive but provides good field separation for complex cases.

@cholmes
Member

cholmes commented Sep 12, 2025

Love it! I've wanted us to iterate on the polygonization, and offer different techniques.

Though I'd imagined that we would put them all under the ftw inference polygonize command, instead of adding a different command.

Would it be possible to redo this to be a call like:

ftw inference polygonize --algorithm watershed -o output_fields_instances.geojson ?

And then the existing one could be like --algorithm simple (open to more descriptive names), and make that one the default.
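
The suggested interface could be sketched like this, using stdlib argparse purely for illustration (the real ftw CLI framework, option names, and defaults may differ; everything here is an assumption except the flag names already discussed):

```python
import argparse

# Hypothetical sketch of the unified "polygonize" command with an
# --algorithm switch; "simple" keeps the existing behavior as default.
parser = argparse.ArgumentParser(prog="ftw inference polygonize")
parser.add_argument("input", help="inference output raster")
parser.add_argument("-o", "--out", required=True, help="output vector file")
parser.add_argument("--algorithm", choices=["simple", "watershed"],
                    default="simple",
                    help="'simple' preserves the existing polygonization")
parser.add_argument("--t_ext", type=float, default=0.5,
                    help="extent binarization threshold (watershed only)")
parser.add_argument("--t_bound", type=float, default=0.2,
                    help="watershed-tree cut threshold (watershed only)")

args = parser.parse_args(
    ["mask.tif", "-o", "out.geojson", "--algorithm", "watershed"]
)
print(args.algorithm)  # watershed
```

With this shape, omitting --algorithm falls back to the existing behavior, so current invocations keep working unchanged.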

@hannah-rae
Member

I tested this with one of our prediction tifs in South America. Some usage feedback:

  • I agree with @cholmes that it should be unified with the ftw inference polygonize command
  • There should be a progress bar like we have in the baseline polygonize (I ran it and was like... is it going?)
  • It seems like there could be a max size of tif this will work with. I ran the baseline method on my tif and it took ~1 minute and found 80.5k polygons (for an area of ~350 km x 350 km). The watershed method ran for >15 minutes and then died with no error - I'm guessing the process ran out of memory or something. Not sure we can do anything about this 🤷‍♀️
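
A rough back-of-envelope suggests why a ~350 km x 350 km tif can exhaust memory. This assumes 10 m pixels, which is an assumption about the input resolution, not a figure from the report above:

```python
# Rough memory estimate for the watershed path on a large scene.
# Assumes 10 m pixels (an assumption; actual resolution unknown).
side_km = 350
pixel_m = 10
pixels_per_side = side_km * 1000 // pixel_m   # 35_000
n_pixels = pixels_per_side ** 2               # 1_225_000_000

# The watershed code materializes several full-size float32 arrays
# (extent, boundary, input_hws, instances), plus the pixel-adjacency
# graph, so total usage is a multiple of this per-array figure.
bytes_per_array = n_pixels * 4
gib = bytes_per_array / 2**30
print(f"{gib:.1f} GiB per float32 array")     # 4.6 GiB per float32 array
```

Several arrays of that size, plus an 8-adjacency graph over ~1.2 billion nodes, would plausibly exceed typical workstation RAM, which is consistent with the silent death observed.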

I ran it on the austria example in the repo and that worked (took ~7 minutes compared to <1min for baseline). There are some minor differences, but I don't see any major improvement... it seems like some polygons might be better, some might be worse.

image

baseline/GDAL polygonize:
image

watershed polygonize:
image

@Gedeon-m-gedus
Collaborator Author

@cholmes @hannah-rae
Thanks for the feedback; I have updated the CLI to be a single command:

  • Simple: ftw inference polygonize tests/data-files/mask.tif --algorithm simple -o test_simple.geojson
  • Watershed: ftw inference polygonize tests/data-files/mask.tif --algorithm watershed -o test_watershed.geojson

Simple is the default, so the normal command still works (e.g., ftw inference polygonize tests/data-files/mask.tif -o test_simple.geojson).

I also added a progress bar (especially useful since watershed is slow).

RE: max size, I don't have a number, but we can find the limit experimentally (it could also depend on user hardware).

@hannah-rae
Member

Thanks @Gedeon-m-gedus! I tested the updated code and it works for me. I see the main branch needs to be merged and there are some pytest issues, but I approve merging once those are fixed.

@hannah-rae (Member) left a review comment:

Good for me once the pytests are passing

@asmithml asmithml force-pushed the postprocessing/watershade_instances branch from ca74b11 to e5ae991 Compare September 30, 2025 16:07
@hannah-rae
Member

@Gedeon-m-gedus Don't worry about the pytest issue as this is fixed from another PR now. I think you mainly need to resolve the conflicts now before merging.

@isaaccorley isaaccorley requested a review from Copilot October 3, 2025 00:42
Copilot AI (Contributor) left a comment

Pull Request Overview

Adds a watershed-based polygonization option and supporting CLI parameters, plus build/packaging restructuring (switch to hatchling, introduce pixi environment) and a fiboa fallback when parquet utilities are unavailable.

  • Introduces hierarchical watershed instance segmentation (new algorithm branch in polygonize)
  • Adds CLI options: --algorithm, --t_ext, --t_bound
  • Adds dependency on higra and fallback stubs for missing fiboa parquet writer

Reviewed Changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 8 comments.

Files changed:

  • pyproject.toml: Migrates build backend to hatchling and refactors project metadata layout.
  • pixi.toml: Adds new environment/dependency management configuration including higra and dev features.
  • ftw_tools/postprocess/polygonize.py: Implements watershed instance segmentation path, fiboa fallback, and supporting helper functions.
  • ftw_tools/cli.py: Exposes new polygonization algorithm and threshold options via CLI.


import os
import re
import time
from typing import Optional
Copilot AI commented Oct 3, 2025:
Optional is imported but not used anywhere in the shown changes; remove the unused import to reduce clutter.

Suggested change: delete the unused line "from typing import Optional".
Comment on lines +46 to +51
import higra as hg

from ftw_tools.settings import SUPPORTED_POLY_FORMATS_TXT


def InstSegm(extent, boundary, t_ext=0.5, t_bound=0.2):
Copilot AI commented Oct 3, 2025:
higra is imported unconditionally; users selecting the 'simple' algorithm without higra installed will now get an ImportError. Move this import inside the watershed branch (or wrap in try/except with a clear error) to preserve previous functionality for non-watershed usage.

Suggested change

from ftw_tools.settings import SUPPORTED_POLY_FORMATS_TXT


def InstSegm(extent, boundary, t_ext=0.5, t_bound=0.2):
    try:
        import higra as hg
    except ImportError:
        raise ImportError(
            "The 'higra' package is required for the 'watershed' algorithm. "
            "Please install it with 'pip install higra' to use this feature."
        )

Comment on lines +51 to +72
def InstSegm(extent, boundary, t_ext=0.5, t_bound=0.2):
    extent = np.asarray(extent).squeeze().astype(np.float32)
    boundary = np.asarray(boundary).squeeze().astype(np.float32)

    if extent.shape != boundary.shape:
        raise ValueError(
            f"extent and boundary must have same shape. Got {extent.shape} vs {boundary.shape}"
        )

    ext_binary = (extent >= t_ext).astype(np.uint8)
    input_hws = boundary.copy()
    input_hws[ext_binary == 0] = 1.0

    size = input_hws.shape[:2]
    graph = hg.get_8_adjacency_graph(size)
    edge_weights = hg.weight_graph(graph, input_hws, hg.WeightFunction.mean)
    tree, altitudes = hg.watershed_hierarchy_by_dynamics(graph, edge_weights)

    instances = hg.labelisation_horizontal_cut_from_threshold(
        tree, altitudes, threshold=t_bound
    ).astype(float)

    instances[ext_binary == 0] = np.nan
    return instances
Copilot AI commented Oct 3, 2025:
Missing docstring for a new non-trivial algorithmic function; add a docstring describing expected input ranges, shapes, meaning of t_ext/t_bound, and returned array semantics (NaN masking, instance labeling).

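The masking step the reviewer wants documented can be illustrated in isolation with numpy only, so it runs without higra installed. The array values below are made up for illustration:

```python
import numpy as np

# Illustration of InstSegm's preprocessing before the higra calls:
# outside the thresholded extent, the boundary surface is raised to
# 1.0 so background acts as a maximal barrier for the watershed.
extent = np.array([[0.9, 0.8, 0.1],
                   [0.7, 0.6, 0.2]], dtype=np.float32)
boundary = np.array([[0.1, 0.5, 0.3],
                     [0.2, 0.4, 0.1]], dtype=np.float32)

t_ext = 0.5
ext_binary = (extent >= t_ext).astype(np.uint8)   # [[1, 1, 0], [1, 1, 0]]
input_hws = boundary.copy()
input_hws[ext_binary == 0] = 1.0                  # background pixels -> 1.0
```

The same ext_binary mask is reused at the end of InstSegm to set background instances to NaN, so the two uses stay consistent.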
Comment on lines +75 to +84
def get_boundary(mask):
    m = mask.copy()
    m[m == 3] = 0
    field_mask = (m > 0).astype(np.uint8)

    local_max = maximum_filter(m, size=3)
    local_min = minimum_filter(m, size=3)
    boundary = ((local_max != local_min) & (field_mask > 0)).astype(np.float32)

    return boundary
Copilot AI commented Oct 3, 2025:
Add a docstring clarifying the assumed class encoding in mask (e.g., why value 3 is zeroed), the rationale for the 3x3 filters, and the expected output value range.

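A self-contained run of the function on a tiny mask shows what the missing docstring should cover (this reproduces the code under review; the meaning of class value 3 is exactly the undocumented assumption the reviewer flags, and scipy is assumed installed):

```python
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter

def get_boundary(mask):
    # Same logic as the function under review: value 3 is zeroed out
    # (an undocumented class-encoding assumption), then a pixel is a
    # boundary if its 3x3 neighborhood is not constant and it lies
    # inside the field mask. Output values are 0.0 or 1.0.
    m = mask.copy()
    m[m == 3] = 0
    field_mask = (m > 0).astype(np.uint8)

    local_max = maximum_filter(m, size=3)
    local_min = minimum_filter(m, size=3)
    boundary = ((local_max != local_min) & (field_mask > 0)).astype(np.float32)
    return boundary

mask = np.array([[1, 1, 0],
                 [1, 2, 0],
                 [0, 0, 3]])
print(get_boundary(mask).tolist())
# [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
```

On this toy mask every in-field pixel touches a different value, so the whole field reads as boundary; on real masks only the 3x3 transition zones between classes are flagged.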
Comment on lines 99 to 100
"""Polygonize the output from inference."""

Copilot AI commented Oct 3, 2025:
Docstring not updated to document new parameters algorithm, t_ext, and t_bound; include descriptions and valid value ranges to aid CLI and API users.

Suggested change
"""Polygonize the output from inference."""
"""
Polygonize the output from inference.
Parameters
----------
input : str
Path to the input file (e.g., mask or raster to polygonize).
out : str
Path to the output file (e.g., .parquet, .gpkg, etc.).
simplify : bool, optional
Whether to simplify polygons (default: True).
min_size : int, optional
Minimum polygon area in pixels to keep (default: 500).
max_size : int or None, optional
Maximum polygon area in pixels to keep (default: None, meaning no maximum).
overwrite : bool, optional
Whether to overwrite the output file if it exists (default: False).
close_interiors : bool, optional
Whether to close polygon interiors (default: False).
algorithm : str, optional
Polygonization algorithm to use. Options:
- "simple": Basic polygonization (default)
- "watershed": Use watershed-based segmentation
(default: "simple")
t_ext : float, optional
Threshold for exterior (field) mask. Range: 0.0 to 1.0 (default: 0.5).
Higher values may result in fewer polygons.
t_bound : float, optional
Threshold for boundary mask. Range: 0.0 to 1.0 (default: 0.2).
Lower values may result in more sensitive boundary detection.
Returns
-------
None
Notes
-----
The function reads the input mask, extracts polygons using the specified algorithm and thresholds,
and writes the result to the specified output file.
"""

Comment on lines +17 to +20
except ImportError:
    FIBOA_AVAILABLE = False
    print("Warning: fiboa_cli.parquet not available. Parquet output will use alternative implementation.")

Copilot AI commented Oct 3, 2025:
Avoid printing directly at import time; use warnings.warn (e.g., warnings.warn(..., RuntimeWarning)) so callers can filter or capture the message.

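The suggested pattern could look like the sketch below. Note that load_parquet_writer is a hypothetical wrapper for illustration; the module under review sets FIBOA_AVAILABLE at import time instead:

```python
import warnings

def load_parquet_writer():
    # Hypothetical sketch of the reviewer's suggestion: emit a
    # RuntimeWarning via the warnings module instead of print(), so
    # callers can filter or capture it with warnings.catch_warnings().
    try:
        from fiboa_cli.parquet import create_parquet
        return create_parquet, True
    except ImportError:
        warnings.warn(
            "fiboa_cli.parquet not available. Parquet output will use an "
            "alternative implementation.",
            RuntimeWarning,
        )
        return None, False

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    writer, available = load_parquet_writer()
```

Unlike a bare print at import time, this lets downstream code silence the message with warnings.filterwarnings or turn it into an error in CI.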
Comment on lines +21 to +22
def create_parquet(gdf, columns, collection, out, config, compression="brotli"):
    import geopandas as gpd
Copilot AI commented Oct 3, 2025:
Fallback create_parquet ignores columns, collection, and config arguments—either document this divergence or use them to preserve expected fiboa metadata (e.g., setting column order and required attributes) to reduce surprises.

Suggested change
def create_parquet(gdf, columns, collection, out, config, compression="brotli"):
    """
    Fallback implementation of create_parquet.

    - Uses the columns argument to set column order and ensure all required
      columns are present.
    - Ignores the collection and config arguments (fiboa metadata is not
      preserved).
    """
    import geopandas as gpd
    import pandas as pd

    # Ensure all required columns are present and in the correct order
    gdf = gdf.copy()
    for col in columns:
        if col not in gdf.columns:
            gdf[col] = pd.NA
    # Reorder columns (excluding geometry, which must be last for GeoPandas)
    geometry_col = gdf.geometry.name if hasattr(gdf, "geometry") else "geometry"
    non_geom_cols = [col for col in columns if col != geometry_col]
    ordered_cols = non_geom_cols + [geometry_col] if geometry_col in gdf.columns else non_geom_cols
    gdf = gdf[ordered_cols]
higra = ">=0.6.12,<0.7"

[pypi-dependencies]
ftw-tools = "*"
Copilot AI commented Oct 3, 2025:
Adding the project itself (ftw-tools) as a pypi-dependency can create a circular or conflicting install when developing the repo; remove this line or pin only external runtime dependencies.

5 participants