Skip to content

Support additional partition_projection_interval_units like MONTHS and YEARS #27233

@aksakalli

Description

@aksakalli

Trino should support MONTHS and YEARS for partition projection so that some basic Hive partition patterns are covered like:

  • ../year=2025/month=11/day=06/..
  • ../dt=2025-11/...

Athena vs Trino

Trino currently supports those interval units:

private static final Set<ChronoUnit> DATE_PROJECTION_INTERVAL_UNITS = ImmutableSet.of(DAYS, HOURS, MINUTES, SECONDS);

Athena supports following values for projection.columnName.range: YEARS, MONTHS, HOURS, MINUTES, SECONDS

I'm aware of this comment in Trino:

Limited to only DAYS, HOURS, MINUTES, SECONDS as we are not fully sure how everything above day is implemented in Athena. So we limit it to a subset of interval units which are explicitly clear how to calculate. The rest will be implemented if this is required as it would require making compatibility tests for results received from Athena and verifying if we receive identical with Trino.

Default MONTH interval is not supported

Let's assume we are creating an external table with partitions like s3://my-bucket/my_table/created_date=2024-08 as following:

CREATE TABLE IF NOT EXISTS my_table (
   something varchar,
   "created_date" varchar WITH (
       partition_projection_type = 'date',
       partition_projection_range = ARRAY['2024-08','NOW'],
       partition_projection_format = 'yyyy-MM'
    )
)
WITH (
   external_location = 's3://my-bucket/my_table',
   format = 'PARQUET',
   partition_projection_enabled = true,
   partitioned_by = ARRAY['created_date']
)

The interval unit is not specified in this create table statement but it will be assigned as MONTH by default since it's the smallest unit in this template, thanks to this method:

private static ChronoUnit resolveDefaultChronoUnit(String columnName, String dateFormatPattern)
{
String datePatternWithoutText = dateFormatPattern.replaceAll("'.*?'", "");
if (datePatternWithoutText.contains("S") || datePatternWithoutText.contains("s")
|| datePatternWithoutText.contains("m") || datePatternWithoutText.contains("H")) {
// When the provided dates are at single-day or single-month precision.
throw new InvalidProjectionException(
columnName,
format(
"Property: '%s' needs to be set when provided '%s' is less that single-day precision. Interval defaults to 1 day or 1 month, respectively. Otherwise, interval is required",
COLUMN_PROJECTION_INTERVAL_UNIT,
COLUMN_PROJECTION_FORMAT));
}
if (datePatternWithoutText.contains("d")) {
return DAYS;
}
return MONTHS;
}

but later the validation will throw an exception:

io.trino.plugin.hive.projection.InvalidProjectionException: Column projection for column 'created_date' failed. Property: 'partition_projection_interval_unit' value 'Months' is invalid. Available options: [Days, Hours, Minutes, Seconds]

from this condition:

if (!DATE_PROJECTION_INTERVAL_UNITS.contains(intervalUnit)) {
throw new InvalidProjectionException(
columnName,
format(
"Property: '%s' value '%s' is invalid. Available options: %s",
COLUMN_PROJECTION_INTERVAL_UNIT,
intervalUnit,
DATE_PROJECTION_INTERVAL_UNITS));
}

This doesn't make sense to me. A default value assigned by the projection class is later considered as invalid. MONTHS and YEARS should be supported.

Related issue: #22760

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions