As the title says, when you `INSERT INTO` an existing partition of an external Hive table that has a customized partition location, the partition is not updated.
The issue can be reproduced in versions 435 and 476.
Reproduction steps
- Create an external table:

```sql
CREATE TABLE tmp2 (v integer, k varchar)
WITH (
    external_location = 's3://<table_location>',
    partitioned_by = ARRAY['k']
);
```
- Register a partition `k=k1` to a customized location `k=k1_plus`:

```sql
CALL system.register_partition('<schema>', 'tmp2', array['k'], array['k1'], 's3://<table_location>/k=k1_plus');
```
- Change the insert behavior to `OVERWRITE`:

```sql
SET SESSION hive.insert_existing_partitions_behavior = 'OVERWRITE';
```
- `INSERT INTO` the table:

```sql
INSERT INTO tmp2 VALUES (1, 'k1'), (2, 'k2');
```
- Check the contents along with their paths:

```sql
SELECT *, "$path" FROM tmp2;
```

The result will look like:

```text
 v | k  | $path
---+----+---------------------------------------------
 2 | k2 | s3://<table_location>/k=k2/<some_file_name>
(1 row)
```
Note that the first row `(1, 'k1')` has disappeared.
Furthermore, we can find objects created by Trino under the path `s3://<table_location>/k=k1/`, even though they should have been created under `s3://<table_location>/k=k1_plus/`.
Investigation
Per my understanding, this happens because the existing partition's registered location is completely ignored when the target write path is determined in this line: the path is derived from the table location and the partition name instead, and the resulting (implicit) partition location change is never committed to the metastore.
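To illustrate the suspected gap, here is a minimal, self-contained sketch. The class and method names are hypothetical (this is not Trino's actual code): it contrasts deriving the write path purely from the table location (the observed buggy behavior) with first consulting the location the metastore already holds for the partition.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical resolver illustrating the issue; names are illustrative,
// not Trino's real classes or APIs.
public class PartitionPathResolver {
    private final String tableLocation;
    // partition name (e.g. "k=k1") -> customized location registered in the metastore
    private final Map<String, String> registeredPartitionLocations;

    public PartitionPathResolver(String tableLocation, Map<String, String> registeredPartitionLocations) {
        this.tableLocation = tableLocation;
        this.registeredPartitionLocations = registeredPartitionLocations;
    }

    // Observed behavior: always derive the path from the table location,
    // ignoring any customized location registered for the partition.
    public String resolveIgnoringMetastore(String partitionName) {
        return tableLocation + "/" + partitionName;
    }

    // Expected behavior: prefer the location already stored for an
    // existing partition, falling back to the derived path otherwise.
    public String resolveUsingMetastore(String partitionName) {
        return Optional.ofNullable(registeredPartitionLocations.get(partitionName))
                .orElse(tableLocation + "/" + partitionName);
    }
}
```

With a partition `k=k1` registered at `.../k=k1_plus`, the first method writes to `.../k=k1` (matching the stray objects observed above), while the second would write to the registered `.../k=k1_plus` location.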