Polygonize with watershed #152
Conversation
Love it! I've wanted us to iterate on the polygonization and offer different techniques, though I'd imagined that we would put them all under the existing `polygonize` command. Would it be possible to redo this as an option on that call (e.g. selecting the algorithm), with the existing behavior kept as the default?
I tested this with one of our prediction TIFs in South America. Some usage feedback:
I ran it on the Austria example in the repo and that worked (it took ~7 minutes compared to <1 min for the baseline). There are some minor differences, but I don't see any major improvement; it seems like some polygons might be better and some might be worse.
@cholmes @hannah-rae
Simple is the default, so the normal command should run as before. I also added a progress bar (especially since watershed is slow). RE: max size, I don't have a number, but we can find the limit experimentally (it could also depend on user hardware).
Thanks @Gedeon-m-gedus! I tested the updated code and it works for me. I see the main branch needs to be merged and there are some pytest issues, but I approve merging once those are fixed.
hannah-rae left a comment:
Good for me once the pytests are passing
Force-pushed from ca74b11 to e5ae991.
@Gedeon-m-gedus Don't worry about the pytest issue as this is fixed in another PR now. I think you mainly need to resolve the conflicts before merging.
Pull Request Overview
Adds a watershed-based polygonization option and supporting CLI parameters, plus build/packaging restructuring (switch to hatchling, introduce pixi environment) and a fiboa fallback when parquet utilities are unavailable.
- Introduces hierarchical watershed instance segmentation (new algorithm branch in polygonize)
- Adds CLI options: --algorithm, --t_ext, --t_bound
- Adds dependency on higra and fallback stubs for missing fiboa parquet writer
Reviewed Changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| pyproject.toml | Migrates build backend to hatchling and refactors project metadata layout. |
| pixi.toml | Adds new environment/dependency management configuration including higra and dev features. |
| ftw_tools/postprocess/polygonize.py | Implements watershed instance segmentation path, fiboa fallback, and supporting helper functions. |
| ftw_tools/cli.py | Exposes new polygonization algorithm and threshold options via CLI. |
```python
import os
import re
import time
from typing import Optional
```
Copilot AI (Oct 3, 2025)
Optional is imported but not used anywhere in the shown changes; remove the unused import to reduce clutter.
Suggested change: delete the line `from typing import Optional`.
```python
import higra as hg

from ftw_tools.settings import SUPPORTED_POLY_FORMATS_TXT


def InstSegm(extent, boundary, t_ext=0.5, t_bound=0.2):
```
Copilot AI (Oct 3, 2025)
higra is imported unconditionally; users selecting the 'simple' algorithm without higra installed will now get an ImportError. Move this import inside the watershed branch (or wrap in try/except with a clear error) to preserve previous functionality for non-watershed usage.
Suggested change:

```python
from ftw_tools.settings import SUPPORTED_POLY_FORMATS_TXT


def InstSegm(extent, boundary, t_ext=0.5, t_bound=0.2):
    try:
        import higra as hg
    except ImportError:
        raise ImportError(
            "The 'higra' package is required for the 'watershed' algorithm. "
            "Please install it with 'pip install higra' to use this feature."
        )
```
```python
def InstSegm(extent, boundary, t_ext=0.5, t_bound=0.2):
    extent = np.asarray(extent).squeeze().astype(np.float32)
    boundary = np.asarray(boundary).squeeze().astype(np.float32)

    if extent.shape != boundary.shape:
        raise ValueError(
            f"extent and boundary must have same shape. Got {extent.shape} vs {boundary.shape}"
        )

    ext_binary = (extent >= t_ext).astype(np.uint8)
    input_hws = boundary.copy()
    input_hws[ext_binary == 0] = 1.0

    size = input_hws.shape[:2]
    graph = hg.get_8_adjacency_graph(size)
    edge_weights = hg.weight_graph(graph, input_hws, hg.WeightFunction.mean)
    tree, altitudes = hg.watershed_hierarchy_by_dynamics(graph, edge_weights)

    instances = hg.labelisation_horizontal_cut_from_threshold(
        tree, altitudes, threshold=t_bound
    ).astype(float)

    instances[ext_binary == 0] = np.nan
    return instances
```
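For context, a minimal usage sketch with synthetic probability maps (the array names, shapes, and values are illustrative only; it assumes numpy and higra are installed and `InstSegm` is in scope):

```python
import numpy as np

# Illustrative 64x64 extent/boundary probability maps in [0, 1]
rng = np.random.default_rng(0)
extent = rng.random((64, 64)).astype(np.float32)
boundary = rng.random((64, 64)).astype(np.float32)

instances = InstSegm(extent, boundary, t_ext=0.5, t_bound=0.2)

# Background pixels are NaN; the remaining pixels carry per-field instance labels
labels = np.unique(instances[~np.isnan(instances)])
print(f"{labels.size} field instances found")
```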
Copilot AI (Oct 3, 2025)
Missing docstring for a new non-trivial algorithmic function; add a docstring describing expected input ranges, shapes, meaning of t_ext/t_bound, and returned array semantics (NaN masking, instance labeling).
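A possible docstring along those lines, sketched only from what the diff above shows (the stated value ranges are assumptions, not confirmed by the source):

```python
def InstSegm(extent, boundary, t_ext=0.5, t_bound=0.2):
    """Hierarchical-watershed instance segmentation of field predictions.

    Parameters
    ----------
    extent : array-like
        Per-pixel field-extent scores (assumed to be in [0, 1]); squeezed to 2D float32.
    boundary : array-like
        Per-pixel boundary scores (assumed to be in [0, 1]); must match ``extent`` in shape.
    t_ext : float, default 0.5
        Threshold for binarizing ``extent``; pixels below it are treated as background.
    t_bound : float, default 0.2
        Threshold for the horizontal cut of the watershed hierarchy; controls how
        aggressively touching fields are separated.

    Returns
    -------
    numpy.ndarray
        Float array of instance labels with the same 2D shape as the inputs;
        background pixels (extent below ``t_ext``) are set to ``numpy.nan``.
    """
    ...
```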
```python
def get_boundary(mask):
    m = mask.copy()
    m[m == 3] = 0
    field_mask = (m > 0).astype(np.uint8)

    local_max = maximum_filter(m, size=3)
    local_min = minimum_filter(m, size=3)
    boundary = ((local_max != local_min) & (field_mask > 0)).astype(np.float32)

    return boundary
```
Copilot AI (Oct 3, 2025)
Add a docstring clarifying the assumed class encoding in mask (e.g., why value 3 is zeroed), the rationale for the 3x3 filters, and the expected output value range.
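A sketch of what such a docstring could say, describing only what the code above does (the semantics of class value 3 are left hedged, since they are not stated in this PR):

```python
def get_boundary(mask):
    """Derive a binary boundary map from a class mask.

    Pixels with class value 3 are zeroed out first (assumed to be a class that
    should not contribute to boundaries); every remaining non-zero pixel is
    treated as field. A 3x3 maximum/minimum filter pair then flags pixels whose
    neighborhood contains more than one class value (i.e. class transitions),
    restricted to field pixels.

    Returns
    -------
    numpy.ndarray
        float32 array of the same shape as ``mask``, with 1.0 at detected
        boundary pixels and 0.0 elsewhere.
    """
    ...
```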
| """Polygonize the output from inference.""" | ||
|
|
Copilot AI (Oct 3, 2025)
Docstring not updated to document new parameters algorithm, t_ext, and t_bound; include descriptions and valid value ranges to aid CLI and API users.
| """Polygonize the output from inference.""" | |
| """ | |
| Polygonize the output from inference. | |
| Parameters | |
| ---------- | |
| input : str | |
| Path to the input file (e.g., mask or raster to polygonize). | |
| out : str | |
| Path to the output file (e.g., .parquet, .gpkg, etc.). | |
| simplify : bool, optional | |
| Whether to simplify polygons (default: True). | |
| min_size : int, optional | |
| Minimum polygon area in pixels to keep (default: 500). | |
| max_size : int or None, optional | |
| Maximum polygon area in pixels to keep (default: None, meaning no maximum). | |
| overwrite : bool, optional | |
| Whether to overwrite the output file if it exists (default: False). | |
| close_interiors : bool, optional | |
| Whether to close polygon interiors (default: False). | |
| algorithm : str, optional | |
| Polygonization algorithm to use. Options: | |
| - "simple": Basic polygonization (default) | |
| - "watershed": Use watershed-based segmentation | |
| (default: "simple") | |
| t_ext : float, optional | |
| Threshold for exterior (field) mask. Range: 0.0 to 1.0 (default: 0.5). | |
| Higher values may result in fewer polygons. | |
| t_bound : float, optional | |
| Threshold for boundary mask. Range: 0.0 to 1.0 (default: 0.2). | |
| Lower values may result in more sensitive boundary detection. | |
| Returns | |
| ------- | |
| None | |
| Notes | |
| ----- | |
| The function reads the input mask, extracts polygons using the specified algorithm and thresholds, | |
| and writes the result to the specified output file. | |
| """ |
```python
except ImportError:
    FIBOA_AVAILABLE = False
    print("Warning: fiboa_cli.parquet not available. Parquet output will use alternative implementation.")
```
Copilot AI (Oct 3, 2025)
Avoid printing directly at import time; use warnings.warn (e.g., warnings.warn(..., RuntimeWarning)) so callers can filter or capture the message.
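A minimal sketch of that suggestion (the exact symbol imported inside the try block is an assumption; the message text mirrors the current print):

```python
import warnings

try:
    from fiboa_cli.parquet import create_parquet  # assumed guarded import
    FIBOA_AVAILABLE = True
except ImportError:
    FIBOA_AVAILABLE = False
    # warnings.warn lets callers filter or capture the message, unlike print
    warnings.warn(
        "fiboa_cli.parquet not available. Parquet output will use an alternative implementation.",
        RuntimeWarning,
    )
```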
```python
def create_parquet(gdf, columns, collection, out, config, compression="brotli"):
    import geopandas as gpd
```
Copilot AI (Oct 3, 2025)
Fallback create_parquet ignores columns, collection, and config arguments—either document this divergence or use them to preserve expected fiboa metadata (e.g., setting column order and required attributes) to reduce surprises.
Suggested change:

```python
def create_parquet(gdf, columns, collection, out, config, compression="brotli"):
    """
    Fallback implementation of create_parquet.

    - Uses columns argument to set column order and ensure all required columns are present.
    - Ignores collection and config arguments (fiboa metadata not preserved).
    """
    import geopandas as gpd
    import pandas as pd

    # Ensure all required columns are present and in the correct order
    gdf = gdf.copy()
    for col in columns:
        if col not in gdf.columns:
            gdf[col] = pd.NA

    # Reorder columns (excluding geometry, which must be last for GeoPandas)
    geometry_col = gdf.geometry.name if hasattr(gdf, "geometry") else "geometry"
    non_geom_cols = [col for col in columns if col != geometry_col]
    ordered_cols = non_geom_cols + [geometry_col] if geometry_col in gdf.columns else non_geom_cols
    gdf = gdf[ordered_cols]
```
```toml
higra = ">=0.6.12,<0.7"

[pypi-dependencies]
ftw-tools = "*"
```
Copilot AI (Oct 3, 2025)
Adding the project itself (ftw-tools) as a pypi-dependency can create a circular or conflicting install when developing the repo; remove this line or pin only external runtime dependencies.



Add Watershed-based Polygonization for Improved Field Instance Separation
Adding hierarchical watershed-based instance segmentation to improve handling of touching/overlapping agricultural fields during polygonization.
Changes
- `ftw_tools/postprocess/watershed_polygonize.py` with the watershed algorithm implementation
- `polygonize-watershed` with watershed-specific parameters (`--t_ext`, `--t_bound`); `t_ext` is the threshold for binarizing the fields extent and `t_bound` is the threshold for cutting the hierarchical watershed tree
- Fallback when `fiboa_cli.parquet` is unavailable

Key Features
Dependencies
`conda install higra -c conda-forge`. I think pip should work also: `pip install higra`.

Example Usage
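A hypothetical invocation, assuming the `ftw inference polygonize` subcommand and the `--algorithm`, `--t_ext`, and `--t_bound` options described in the review above (paths, values, and the output flag are placeholders):

```bash
# Watershed-based polygonization with explicit thresholds (hypothetical example)
ftw inference polygonize predictions.tif -o fields.parquet \
    --algorithm watershed --t_ext 0.5 --t_bound 0.2

# The simple algorithm remains the default
ftw inference polygonize predictions.tif -o fields.parquet
```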
The following images show sample visualizations from the tests I did:
Input image:
Predicted fields with the 3-class model:
Instance segmentation with this algorithm:
Performance Note
The watershed path is noticeably slower than the simple algorithm; on the Austria example discussed above it took roughly 7 minutes versus under a minute for the baseline.