Add support for count_distinct aggregate function
Fenic currently supports the count() aggregate function and the drop_duplicates() DataFrame method, but does not yet support the count_distinct() aggregate function.
This issue is about adding support for count_distinct() to Fenic. Fenic represents expressions in a logical expression tree, which it then transpiles into Polars expressions for execution.
🛠️ What needs to be done
1. Add a new expression class
   - Create a CountDistinctExpr class that extends AggregateExpr (which in turn extends LogicalExpr).
   - Use the other aggregate expression definitions in aggregate.py as a reference.
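The following is a minimal sketch of the new node. It assumes the base classes only need a children() method and a string rendering. LogicalExpr, AggregateExpr and ColumnExpr are simplified, hypothetical stand-ins. The real definitions in aggregate.py have their own abstract methods, and CountDistinctExpr would implement those instead.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import List


# Hypothetical stand-ins for Fenic's base classes; the real ones in
# aggregate.py define their own abstract interface.
class LogicalExpr(ABC):
    @abstractmethod
    def children(self) -> List[LogicalExpr]:
        """Return the child expressions of this node."""


class AggregateExpr(LogicalExpr):
    """Marker base class for aggregate expressions."""


class ColumnExpr(LogicalExpr):
    """Hypothetical leaf node that references a column by name."""

    def __init__(self, name: str) -> None:
        self.name = name

    def children(self) -> List[LogicalExpr]:
        return []

    def __str__(self) -> str:
        return self.name


class CountDistinctExpr(AggregateExpr):
    """Counts the distinct non-null values of a single child expression."""

    def __init__(self, expr: LogicalExpr) -> None:
        self.expr = expr

    def children(self) -> List[LogicalExpr]:
        return [self.expr]

    def __str__(self) -> str:
        # Render the node the way a plan printer might show it.
        return f"count_distinct({self.expr})"
```

The sketch takes a single child. Supporting several columns, as Spark's countDistinct(col, *cols) does, would mean storing a list of children instead.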
2. Transpile the expression to Polars
   - Extend the expression transpiler switch to convert CountDistinctExpr to a Polars expression.
   - The switch lives in expr_converter.py.
3. Expose a user-facing API
   - Add a new count_distinct function to Fenic's public API with a clean docstring and appropriate typing.
   - Follow the style and conventions in builtin.py.
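Here is one possible shape for the public function. It follows a common pattern of accepting either a Column or a column name. Column, ColumnOrName, the _to_column helper and the node classes are all hypothetical stand-ins. The actual wrapper type and conversion helpers should come from builtin.py.

```python
from __future__ import annotations

from typing import Union


# Hypothetical stand-ins for Fenic's expression nodes and Column wrapper.
class LogicalExpr:
    """Base class for logical expression nodes."""


class ColumnExpr(LogicalExpr):
    def __init__(self, name: str) -> None:
        self.name = name


class CountDistinctExpr(LogicalExpr):
    def __init__(self, expr: LogicalExpr) -> None:
        self.expr = expr


class Column:
    """User-facing wrapper around a logical expression."""

    def __init__(self, logical_expr: LogicalExpr) -> None:
        self._logical_expr = logical_expr


ColumnOrName = Union[Column, str]


def _to_column(column: ColumnOrName) -> Column:
    # Accept either a Column or a bare column name.
    return column if isinstance(column, Column) else Column(ColumnExpr(column))


def count_distinct(column: ColumnOrName) -> Column:
    """Aggregate function: returns the number of distinct non-null values.

    Args:
        column: Column or column name whose distinct values are counted.

    Returns:
        A Column that evaluates to the distinct count within each group.

    Example:
        df.group_by("department").agg(count_distinct("employee_id"))
    """
    return Column(CountDistinctExpr(_to_column(column)._logical_expr))
```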
4. Write unit tests
   - Add tests for count_distinct under group-by aggregations to test_aggregations.py.
Feel free to ask questions or open a draft PR early if you're unsure about anything. We're happy to help!