-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Description
Trino should support MONTHS and YEARS for partition projection so that some basic Hive partition patterns are covered like:
../year=2025/month=11/day=06/..../dt=2025-11/...
Athena vs Trino
Trino currently supports those interval units:
trino/plugin/trino-hive/src/main/java/io/trino/plugin/hive/projection/DateProjection.java
Line 83 in 7f380c0
| private static final Set<ChronoUnit> DATE_PROJECTION_INTERVAL_UNITS = ImmutableSet.of(DAYS, HOURS, MINUTES, SECONDS); |
Athena supports following values for projection.columnName.range: YEARS, MONTHS, HOURS, MINUTES, SECONDS
I'm aware of this comment in Trino:
Limited to only DAYS, HOURS, MINUTES, SECONDS as we are not fully sure how everything above day is implemented in Athena. So we limit it to a subset of interval units which are explicitly clear how to calculate. The rest will be implemented if this is required as it would require making compatibility tests for results received from Athena and verifying if we receive identical with Trino.
Default MONTH interval is not supported
Let's assume we are creating an external table with partitions like s3://my-bucket/my_table/created_date=2024-08 as following:
CREATE TABLE IF NOT EXISTS my_table (
something varchar,
"created_date" varchar WITH (
partition_projection_type = 'date',
partition_projection_range = ARRAY['2024-08','NOW'],
partition_projection_format = 'yyyy-MM'
)
)
WITH (
external_location = 's3://my-bucket/my_table',
format = 'PARQUET',
partition_projection_enabled = true,
partitioned_by = ARRAY['created_date']
)The interval unit is not specified in this create table statement but it will be assigned as MONTH by default since it's the smallest unit in this template, thanks to this method:
trino/plugin/trino-hive/src/main/java/io/trino/plugin/hive/projection/DateProjection.java
Lines 262 to 279 in 7f380c0
| private static ChronoUnit resolveDefaultChronoUnit(String columnName, String dateFormatPattern) | |
| { | |
| String datePatternWithoutText = dateFormatPattern.replaceAll("'.*?'", ""); | |
| if (datePatternWithoutText.contains("S") || datePatternWithoutText.contains("s") | |
| || datePatternWithoutText.contains("m") || datePatternWithoutText.contains("H")) { | |
| // When the provided dates are at single-day or single-month precision. | |
| throw new InvalidProjectionException( | |
| columnName, | |
| format( | |
| "Property: '%s' needs to be set when provided '%s' is less that single-day precision. Interval defaults to 1 day or 1 month, respectively. Otherwise, interval is required", | |
| COLUMN_PROJECTION_INTERVAL_UNIT, | |
| COLUMN_PROJECTION_FORMAT)); | |
| } | |
| if (datePatternWithoutText.contains("d")) { | |
| return DAYS; | |
| } | |
| return MONTHS; | |
| } |
but later the validation will throw an exception:
io.trino.plugin.hive.projection.InvalidProjectionException: Column projection for column 'created_date' failed. Property: 'partition_projection_interval_unit' value 'Months' is invalid. Available options: [Days, Hours, Minutes, Seconds]
from this condition:
trino/plugin/trino-hive/src/main/java/io/trino/plugin/hive/projection/DateProjection.java
Lines 136 to 144 in 7f380c0
| if (!DATE_PROJECTION_INTERVAL_UNITS.contains(intervalUnit)) { | |
| throw new InvalidProjectionException( | |
| columnName, | |
| format( | |
| "Property: '%s' value '%s' is invalid. Available options: %s", | |
| COLUMN_PROJECTION_INTERVAL_UNIT, | |
| intervalUnit, | |
| DATE_PROJECTION_INTERVAL_UNITS)); | |
| } |
This doesn't make sense to me. A default value assigned by the projection class is later considered as invalid. MONTHS and YEARS should be supported.
Related issue: #22760