fix: added field validation to aggregations #2768
Open
Ticket(s) Closed
What
This PR adds validation to aggregation queries to ensure that field names exist in the schema and are configured as fast fields. Previously, aggregations would silently return empty results when given invalid field names, making it difficult to debug typos and configuration errors.
Why
Users were experiencing confusing behavior where aggregations with typos in field names would succeed but return empty results ({"buckets": []}), with no indication that the field didn't exist. This made it impossible to distinguish between:

This behavior was inconsistent with other Tantivy query types (like ExistsQuery) and user expectations from SQL databases and Elasticsearch, where invalid column/field references return clear errors.
How
Modified the aggregation field accessor functions in accessor_helpers.rs:
- get_ff_reader(): Now validates field existence before returning a column reader. Returns:
  - FieldNotFound error if the field doesn't exist in the schema
  - SchemaError if the field exists but isn't configured as a fast field
- get_all_ff_reader_or_empty(): Added the same validation logic for terms aggregations that handle multiple column types

The validation checks the schema to distinguish between non-existent fields and valid fields that happen to be empty in a particular segment, ensuring we only error on actual configuration problems.
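For reviewers who want the decision order spelled out, here is a minimal, self-contained sketch of the check described above. It is not the code in this PR and does not use Tantivy's actual types: FieldEntry, AggError, Segment, and get_ff_reader_sketch are hypothetical stand-ins chosen for illustration. Only the two error names come from the description.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a schema field entry (not Tantivy's real type).
#[derive(Debug, Clone, Copy)]
struct FieldEntry {
    is_fast: bool,
}

/// Hypothetical error type mirroring the two error cases named in this PR.
#[derive(Debug, PartialEq)]
enum AggError {
    FieldNotFound(String),
    SchemaError(String),
}

/// Hypothetical stand-in for one segment: field name -> column values.
type Segment = HashMap<String, Vec<u64>>;

/// Assumed shape of the validation: consult the schema first, the segment second.
/// A field that is valid in the schema but absent from this segment yields
/// Ok(None) rather than an error.
fn get_ff_reader_sketch<'a>(
    schema: &HashMap<String, FieldEntry>,
    segment: &'a Segment,
    field_name: &str,
) -> Result<Option<&'a [u64]>, AggError> {
    // 1. Unknown field name (for example a typo): hard error.
    let entry = schema
        .get(field_name)
        .ok_or_else(|| AggError::FieldNotFound(field_name.to_string()))?;

    // 2. Known field that is not a fast field: configuration error.
    if !entry.is_fast {
        return Err(AggError::SchemaError(format!(
            "field `{field_name}` is not configured as a fast field"
        )));
    }

    // 3. Valid fast field: the column may still be missing in this segment.
    Ok(segment.get(field_name).map(|values| values.as_slice()))
}
```

In this sketch, a misspelled field name produces FieldNotFound, a non-fast field produces SchemaError, and a valid fast field with no data in the segment produces Ok(None), which a caller could treat as an empty column.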
Tests
test_aggregation_invalid_field_returns_error, covering all major aggregation types (date_histogram, histogram, terms, avg, range)

Breaking Change: Code using invalid field names in aggregations will now receive errors instead of empty results. This is intentional to catch configuration errors early.