Conversation

@Gedeon-m-gedus
Collaborator

Add Watershed-based Polygonization for Improved Field Instance Separation

Adding hierarchical watershed-based instance segmentation to improve handling of touching/overlapping agricultural fields during polygonization.

Changes

  • New module: ftw_tools/postprocess/watershed_polygonize.py with watershed algorithm implementation
  • New CLI command: ftw inference polygonize-watershed with watershed-specific parameters (--t_ext, --t_bound).
    t_ext is the threshold for binarizing the field extent, and t_bound is the threshold for cutting the hierarchical watershed tree.
  • Fallback support: Graceful handling when fiboa_cli.parquet is unavailable
  • Isolated implementation: Separate from existing polygonization to minimize conflicts

Key Features

  • Better separation of touching fields using hierarchical watershed
  • Configurable thresholds for extent and boundary detection. Although the algorithm expects the field extent as a probability map, it should work fine with ftw outputs because the function binarizes the input (background vs. fields_and_boundaries).
  • Same output format compatibility as standard polygonization
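
The binarization mentioned above can be sketched as follows. This is a minimal illustration, not code from the PR: binarize_extent is a hypothetical helper, and the 3-class mask encoding (0 = background, 1 = field, 2 = boundary) is an assumption about FTW outputs.

```python
import numpy as np

def binarize_extent(mask, t_ext=0.5):
    # Hypothetical helper: collapse a 3-class mask (0=background,
    # 1=field, 2=boundary) into background vs. fields_and_boundaries,
    # then apply the same threshold the watershed code would apply
    # to a probability map.
    extent = (np.asarray(mask) > 0).astype(np.float32)
    return (extent >= t_ext).astype(np.uint8)

m = np.array([[0, 1, 2],
              [0, 1, 1]])
print(binarize_extent(m).tolist())  # [[0, 1, 1], [0, 1, 1]]
```

Because the mask is collapsed to 0/1 before thresholding, any t_ext in (0, 1] gives the same result on hard class maps; the threshold only matters for soft probability inputs.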

Dependencies

  • Requires the higra package. I installed it manually with conda install higra -c conda-forge; pip install higra should also work.
  • All existing dependencies remain unchanged

Example Usage

ftw inference polygonize-watershed model_output_fields.tif -o output_fields_instances.geojson

The following images show sample visualizations from my tests:

Input image:
Screenshot 2025-09-11 at 8 53 34 PM

Predicted fields with the 3-class model:
Screenshot 2025-09-11 at 8 53 43 PM

Instance segmentation with this algorithm:
Screenshot 2025-09-11 at 8 53 50 PM

Performance Note

⚠️ Slower than standard polygonization - The hierarchical watershed algorithm is computationally more intensive but provides good field separation for complex cases.

@cholmes
Member

cholmes commented Sep 12, 2025

Love it! I've wanted us to iterate on the polygonization, and offer different techniques.

Though I'd imagined that we would put them all under the ftw inference polygonize command, instead of adding a different command.

Would it be possible to redo this to be a call like:

ftw inference polygonize --algorithm watershed -o output_fields_instances.geojson ?

And then the existing one could be like --algorithm simple (open to more descriptive names), and make that one the default.
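
The suggested interface could be sketched like this, using stdlib argparse purely for illustration (the real ftw CLI framework, option names, and defaults may differ; everything here is an assumption except the flag names already discussed):

```python
import argparse

# Hypothetical sketch of the unified "polygonize" command with an
# --algorithm switch; "simple" keeps the existing behavior as default.
parser = argparse.ArgumentParser(prog="ftw inference polygonize")
parser.add_argument("input", help="inference output raster")
parser.add_argument("-o", "--out", required=True, help="output vector file")
parser.add_argument("--algorithm", choices=["simple", "watershed"],
                    default="simple",
                    help="'simple' preserves the existing polygonization")
parser.add_argument("--t_ext", type=float, default=0.5,
                    help="extent binarization threshold (watershed only)")
parser.add_argument("--t_bound", type=float, default=0.2,
                    help="watershed-tree cut threshold (watershed only)")

args = parser.parse_args(
    ["mask.tif", "-o", "out.geojson", "--algorithm", "watershed"]
)
print(args.algorithm)  # watershed
```

With this shape, omitting --algorithm falls back to the existing behavior, so current invocations keep working unchanged.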

@hannah-rae
Member

I tested this with one of our prediction tifs in South America. Some usage feedback:

  • I agree with @cholmes that it should be unified with the ftw inference polygonize command
  • There should be a progress bar like we have in the baseline polygonize (I ran it and was like... is it going?)
  • It seems like there could be a max size of tif this will work with. I ran the baseline method on my tif and it took ~1 minute and found 80.5k polygons (for an area of ~350 km x 350 km). The watershed method ran for >15 minutes and then died with no error - I'm guessing the process ran out of memory or something. Not sure we can do anything about this 🤷‍♀️
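
A rough back-of-envelope suggests why a ~350 km x 350 km tif can exhaust memory. This assumes 10 m pixels, which is an assumption about the input resolution, not a figure from the report above:

```python
# Rough memory estimate for the watershed path on a large scene.
# Assumes 10 m pixels (an assumption; actual resolution unknown).
side_km = 350
pixel_m = 10
pixels_per_side = side_km * 1000 // pixel_m   # 35_000
n_pixels = pixels_per_side ** 2               # 1_225_000_000

# The watershed code materializes several full-size float32 arrays
# (extent, boundary, input_hws, instances), plus the pixel-adjacency
# graph, so total usage is a multiple of this per-array figure.
bytes_per_array = n_pixels * 4
gib = bytes_per_array / 2**30
print(f"{gib:.1f} GiB per float32 array")     # 4.6 GiB per float32 array
```

Several arrays of that size, plus an 8-adjacency graph over ~1.2 billion nodes, would plausibly exceed typical workstation RAM, which is consistent with the silent death observed.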

I ran it on the austria example in the repo and that worked (took ~7 minutes compared to <1min for baseline). There are some minor differences, but I don't see any major improvement... it seems like some polygons might be better, some might be worse.

image

baseline/GDAL polygonize:
image

watershed polygonize:
image

@Gedeon-m-gedus
Collaborator Author

@cholmes @hannah-rae
Thanks for the feedback; I have updated the CLI to be a single command:

  • Simple: ftw inference polygonize tests/data-files/mask.tif --algorithm simple -o test_simple.geojson
  • Watershed: ftw inference polygonize tests/data-files/mask.tif --algorithm watershed -o test_watershed.geojson

Simple is the default, so the normal command still works (e.g., ftw inference polygonize tests/data-files/mask.tif -o test_simple.geojson).

I also added a progress bar (especially useful since watershed is slow).

RE: max size, I don't have a number, but we can find the limit experimentally (it could also depend on user hardware).

@hannah-rae
Member

Thanks @Gedeon-m-gedus! I tested the updated code and it works for me. I see the main branch needs to be merged and there are some pytest issues, but I approve merging once those are fixed.

@hannah-rae (Member) left a review comment:

Good for me once the pytests are passing

@asmithml asmithml force-pushed the postprocessing/watershade_instances branch from ca74b11 to e5ae991 Compare September 30, 2025 16:07
@hannah-rae
Member

@Gedeon-m-gedus Don't worry about the pytest issue as this is fixed from another PR now. I think you mainly need to resolve the conflicts now before merging.

@isaaccorley isaaccorley requested a review from Copilot October 3, 2025 00:42
Copilot AI (Contributor) left a comment

Pull Request Overview

Adds a watershed-based polygonization option and supporting CLI parameters, plus build/packaging restructuring (switch to hatchling, introduce pixi environment) and a fiboa fallback when parquet utilities are unavailable.

  • Introduces hierarchical watershed instance segmentation (new algorithm branch in polygonize)
  • Adds CLI options: --algorithm, --t_ext, --t_bound
  • Adds dependency on higra and fallback stubs for missing fiboa parquet writer

Reviewed Changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 8 comments.

Files changed:

  • pyproject.toml: Migrates build backend to hatchling and refactors project metadata layout.
  • pixi.toml: Adds new environment/dependency management configuration including higra and dev features.
  • ftw_tools/postprocess/polygonize.py: Implements watershed instance segmentation path, fiboa fallback, and supporting helper functions.
  • ftw_tools/cli.py: Exposes new polygonization algorithm and threshold options via CLI.


import os
import re
import time
from typing import Optional
Copilot AI commented Oct 3, 2025:
Optional is imported but not used anywhere in the shown changes; remove the unused import to reduce clutter.

Suggested change: delete the unused line "from typing import Optional".
Comment on lines +46 to +51
import higra as hg

from ftw_tools.settings import SUPPORTED_POLY_FORMATS_TXT


def InstSegm(extent, boundary, t_ext=0.5, t_bound=0.2):
Copilot AI commented Oct 3, 2025:
higra is imported unconditionally; users selecting the 'simple' algorithm without higra installed will now get an ImportError. Move this import inside the watershed branch (or wrap in try/except with a clear error) to preserve previous functionality for non-watershed usage.

Suggested change

from ftw_tools.settings import SUPPORTED_POLY_FORMATS_TXT


def InstSegm(extent, boundary, t_ext=0.5, t_bound=0.2):
    try:
        import higra as hg
    except ImportError:
        raise ImportError(
            "The 'higra' package is required for the 'watershed' algorithm. "
            "Please install it with 'pip install higra' to use this feature."
        )

Comment on lines +51 to +72
def InstSegm(extent, boundary, t_ext=0.5, t_bound=0.2):
    extent = np.asarray(extent).squeeze().astype(np.float32)
    boundary = np.asarray(boundary).squeeze().astype(np.float32)

    if extent.shape != boundary.shape:
        raise ValueError(
            f"extent and boundary must have same shape. Got {extent.shape} vs {boundary.shape}"
        )

    ext_binary = (extent >= t_ext).astype(np.uint8)
    input_hws = boundary.copy()
    input_hws[ext_binary == 0] = 1.0

    size = input_hws.shape[:2]
    graph = hg.get_8_adjacency_graph(size)
    edge_weights = hg.weight_graph(graph, input_hws, hg.WeightFunction.mean)
    tree, altitudes = hg.watershed_hierarchy_by_dynamics(graph, edge_weights)

    instances = hg.labelisation_horizontal_cut_from_threshold(
        tree, altitudes, threshold=t_bound
    ).astype(float)

    instances[ext_binary == 0] = np.nan
    return instances
Copilot AI commented Oct 3, 2025:
Missing docstring for a new non-trivial algorithmic function; add a docstring describing expected input ranges, shapes, meaning of t_ext/t_bound, and returned array semantics (NaN masking, instance labeling).

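The masking step the reviewer wants documented can be illustrated in isolation with numpy only, so it runs without higra installed. The array values below are made up for illustration:

```python
import numpy as np

# Illustration of InstSegm's preprocessing before the higra calls:
# outside the thresholded extent, the boundary surface is raised to
# 1.0 so background acts as a maximal barrier for the watershed.
extent = np.array([[0.9, 0.8, 0.1],
                   [0.7, 0.6, 0.2]], dtype=np.float32)
boundary = np.array([[0.1, 0.5, 0.3],
                     [0.2, 0.4, 0.1]], dtype=np.float32)

t_ext = 0.5
ext_binary = (extent >= t_ext).astype(np.uint8)   # [[1, 1, 0], [1, 1, 0]]
input_hws = boundary.copy()
input_hws[ext_binary == 0] = 1.0                  # background pixels -> 1.0
```

The same ext_binary mask is reused at the end of InstSegm to set background instances to NaN, so the two uses stay consistent.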
Comment on lines +75 to +84
def get_boundary(mask):
    m = mask.copy()
    m[m == 3] = 0
    field_mask = (m > 0).astype(np.uint8)

    local_max = maximum_filter(m, size=3)
    local_min = minimum_filter(m, size=3)
    boundary = ((local_max != local_min) & (field_mask > 0)).astype(np.float32)

    return boundary
Copilot AI commented Oct 3, 2025:
Add a docstring clarifying the assumed class encoding in mask (e.g., why value 3 is zeroed), the rationale for the 3x3 filters, and the expected output value range.

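A self-contained run of the function on a tiny mask shows what the missing docstring should cover (this reproduces the code under review; the meaning of class value 3 is exactly the undocumented assumption the reviewer flags, and scipy is assumed installed):

```python
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter

def get_boundary(mask):
    # Same logic as the function under review: value 3 is zeroed out
    # (an undocumented class-encoding assumption), then a pixel is a
    # boundary if its 3x3 neighborhood is not constant and it lies
    # inside the field mask. Output values are 0.0 or 1.0.
    m = mask.copy()
    m[m == 3] = 0
    field_mask = (m > 0).astype(np.uint8)

    local_max = maximum_filter(m, size=3)
    local_min = minimum_filter(m, size=3)
    boundary = ((local_max != local_min) & (field_mask > 0)).astype(np.float32)
    return boundary

mask = np.array([[1, 1, 0],
                 [1, 2, 0],
                 [0, 0, 3]])
print(get_boundary(mask).tolist())
# [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
```

On this toy mask every in-field pixel touches a different value, so the whole field reads as boundary; on real masks only the 3x3 transition zones between classes are flagged.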
Comment on lines 99 to 100
"""Polygonize the output from inference."""

Copilot AI commented Oct 3, 2025:
Docstring not updated to document new parameters algorithm, t_ext, and t_bound; include descriptions and valid value ranges to aid CLI and API users.

Suggested change
"""Polygonize the output from inference."""
"""
Polygonize the output from inference.
Parameters
----------
input : str
Path to the input file (e.g., mask or raster to polygonize).
out : str
Path to the output file (e.g., .parquet, .gpkg, etc.).
simplify : bool, optional
Whether to simplify polygons (default: True).
min_size : int, optional
Minimum polygon area in pixels to keep (default: 500).
max_size : int or None, optional
Maximum polygon area in pixels to keep (default: None, meaning no maximum).
overwrite : bool, optional
Whether to overwrite the output file if it exists (default: False).
close_interiors : bool, optional
Whether to close polygon interiors (default: False).
algorithm : str, optional
Polygonization algorithm to use. Options:
- "simple": Basic polygonization (default)
- "watershed": Use watershed-based segmentation
(default: "simple")
t_ext : float, optional
Threshold for exterior (field) mask. Range: 0.0 to 1.0 (default: 0.5).
Higher values may result in fewer polygons.
t_bound : float, optional
Threshold for boundary mask. Range: 0.0 to 1.0 (default: 0.2).
Lower values may result in more sensitive boundary detection.
Returns
-------
None
Notes
-----
The function reads the input mask, extracts polygons using the specified algorithm and thresholds,
and writes the result to the specified output file.
"""

Comment on lines +17 to +20
except ImportError:
    FIBOA_AVAILABLE = False
    print("Warning: fiboa_cli.parquet not available. Parquet output will use alternative implementation.")

Copilot AI commented Oct 3, 2025:
Avoid printing directly at import time; use warnings.warn (e.g., warnings.warn(..., RuntimeWarning)) so callers can filter or capture the message.

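The suggested pattern could look like the sketch below. Note that load_parquet_writer is a hypothetical wrapper for illustration; the module under review sets FIBOA_AVAILABLE at import time instead:

```python
import warnings

def load_parquet_writer():
    # Hypothetical sketch of the reviewer's suggestion: emit a
    # RuntimeWarning via the warnings module instead of print(), so
    # callers can filter or capture it with warnings.catch_warnings().
    try:
        from fiboa_cli.parquet import create_parquet
        return create_parquet, True
    except ImportError:
        warnings.warn(
            "fiboa_cli.parquet not available. Parquet output will use an "
            "alternative implementation.",
            RuntimeWarning,
        )
        return None, False

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    writer, available = load_parquet_writer()
```

Unlike a bare print at import time, this lets downstream code silence the message with warnings.filterwarnings or turn it into an error in CI.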
Comment on lines +21 to +22
def create_parquet(gdf, columns, collection, out, config, compression="brotli"):
    import geopandas as gpd
Copilot AI commented Oct 3, 2025:
Fallback create_parquet ignores columns, collection, and config arguments—either document this divergence or use them to preserve expected fiboa metadata (e.g., setting column order and required attributes) to reduce surprises.

Suggested change
def create_parquet(gdf, columns, collection, out, config, compression="brotli"):
    """
    Fallback implementation of create_parquet.

    - Uses the columns argument to set column order and ensure all required
      columns are present.
    - Ignores the collection and config arguments (fiboa metadata is not
      preserved).
    """
    import geopandas as gpd
    import pandas as pd

    # Ensure all required columns are present and in the correct order
    gdf = gdf.copy()
    for col in columns:
        if col not in gdf.columns:
            gdf[col] = pd.NA
    # Reorder columns (excluding geometry, which must be last for GeoPandas)
    geometry_col = gdf.geometry.name if hasattr(gdf, "geometry") else "geometry"
    non_geom_cols = [col for col in columns if col != geometry_col]
    ordered_cols = non_geom_cols + [geometry_col] if geometry_col in gdf.columns else non_geom_cols
    gdf = gdf[ordered_cols]
higra = ">=0.6.12,<0.7"

[pypi-dependencies]
ftw-tools = "*"
Copilot AI commented Oct 3, 2025:
Adding the project itself (ftw-tools) as a pypi-dependency can create a circular or conflicting install when developing the repo; remove this line or pin only external runtime dependencies.

5 participants