fix: added field validation to aggregations #2768

mdashti · 2025-12-10T18:31:57Z

Ticket(s) Closed

Closes #2767 (aggregations silently return empty results for invalid fields)

What

This PR adds validation to aggregation queries to ensure that field names exist in the schema and are configured as fast fields. Previously, aggregations would silently return empty results when given invalid field names, making it difficult to debug typos and configuration errors.

Why

Users were experiencing confusing behavior where aggregations with typos in field names would succeed but return empty results ({"buckets": []}), with no indication that the field didn't exist. This made it impossible to distinguish between:

A typo in the field name (configuration error)
A valid field with no data
A query that matched no documents

This behavior was inconsistent with other Tantivy query types (like ExistsQuery) and user expectations from SQL databases and Elasticsearch, where invalid column/field references return clear errors.

How

Modified the aggregation field accessor functions in accessor_helpers.rs:

get_ff_reader(): Now validates field existence before returning a column reader. Returns:
- FieldNotFound error if the field doesn't exist in the schema
- SchemaError if the field exists but isn't configured as a fast field
- Empty column only if the field is valid but has no values in the segment
get_all_ff_reader_or_empty(): Added the same validation logic for terms aggregations that handle multiple column types

The validation checks the schema to distinguish between non-existent fields and valid fields that happen to be empty in a particular segment, ensuring we only error on actual configuration problems.
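
The three outcomes described above can be sketched as follows. This is a minimal illustration, not the code in accessor_helpers.rs: `Schema`, `Segment`, `AggFieldError`, and `get_column` are hypothetical stand-ins, and the real accessors work with Tantivy's own schema and column types.

```rust
use std::collections::HashMap;

/// Hypothetical error type mirroring the two error cases named in this PR.
#[derive(Debug, PartialEq)]
enum AggFieldError {
    /// The field is not in the schema at all (likely a typo).
    FieldNotFound(String),
    /// The field exists but is not configured as a fast field.
    SchemaError(String),
}

/// Hypothetical simplified schema: field name -> "is a fast field".
struct Schema {
    fields: HashMap<String, bool>,
}

/// Hypothetical simplified segment: only columns that hold values are present.
struct Segment {
    columns: HashMap<String, Vec<u64>>,
}

/// Validate against the schema first, then fall back to an empty column
/// only when the field is valid but has no values in this segment.
fn get_column(
    schema: &Schema,
    segment: &Segment,
    field: &str,
) -> Result<Vec<u64>, AggFieldError> {
    match schema.fields.get(field).copied() {
        None => Err(AggFieldError::FieldNotFound(field.to_string())),
        Some(false) => Err(AggFieldError::SchemaError(field.to_string())),
        Some(true) => Ok(segment.columns.get(field).cloned().unwrap_or_default()),
    }
}
```

The point of the ordering is that an empty column is returned only after the schema check has passed, so a typo can no longer be mistaken for a field with no data.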

Tests

Added test test_aggregation_invalid_field_returns_error covering all major aggregation types (date_histogram, histogram, terms, avg, range)
Fixed existing tests that were inadvertently using invalid field names

Breaking Change: Code using invalid field names in aggregations will now receive errors instead of empty results. This is intentional to catch configuration errors early.

PSeitz · 2025-12-10T23:54:38Z

The main difference is that we don't always have a fixed schema with JSON fields, which means a field could exist on one segment, but not another one.

Even if a requested field is not part of a JSON field, it would break use cases where you federate a query over different indices.

Adding some validation on top should be easy with the get_fast_field_names method.
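
A rough sketch of what such validation on top could look like, assuming the fast field names of each segment have already been collected into sets (for example via get_fast_field_names, whose exact signature is not shown here). The function `validate_agg_field` and its input shape are hypothetical. A field is rejected only when no segment knows it, so a JSON path that exists in one segment but not another still passes.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: each set holds the fast field names of one segment.
/// Returns an error only if the field is missing from every segment.
fn validate_agg_field(
    segment_fast_fields: &[BTreeSet<String>],
    field: &str,
) -> Result<(), String> {
    // With no segments there is nothing to check against, so do not error.
    if segment_fast_fields.is_empty()
        || segment_fast_fields.iter().any(|names| names.contains(field))
    {
        Ok(())
    } else {
        Err(format!(
            "aggregation field `{field}` is not a fast field in any segment"
        ))
    }
}
```

Because the check runs before the aggregation and not inside the per-segment accessors, a caller that federates a query over several indices could skip it or apply it per index.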

Fixed agg validation

fe29322
