
Can't INSERT INTO existing partitions of Hive external table when partition location is customized #27271

@yufanho

Description


As the title says, when running INSERT INTO against an existing partition of an external Hive table whose partition location has been customized, the partition is not updated.
The issue is reproducible in versions 435 and 476.

Reproducing Flow

  1. Create an external table.
 CREATE TABLE tmp2 (v integer, k varchar)
 WITH (
    external_location = 's3://<table_location>',
    partitioned_by = ARRAY['k']
 );
  2. Register a partition k=k1 at a customized location k=k1_plus.
CALL system.register_partition('<schema>', 'tmp2', ARRAY['k'], ARRAY['k1'], 's3://<table_location>/k=k1_plus');
  3. Change the insert behavior to OVERWRITE.
SET SESSION hive.insert_existing_partitions_behavior = 'OVERWRITE';
  4. INSERT INTO the table.
INSERT INTO tmp2 VALUES (1, 'k1'), (2, 'k2');
  5. Check the contents along with their paths.
SELECT *, "$path" FROM tmp2;

The result will look like:

 v | k  |                                                             $path
---+----+-------------------------------------------------------------------------------------------------------------------------------
 2 | k2 | s3://<table_location>/k=k2/<some_file_name>
(1 row)
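
For reference, the expected result would include both rows, with the k=k1 row served from the customized partition location (the file names below are illustrative placeholders):

 v | k  |                          $path
---+----+----------------------------------------------------------
 1 | k1 | s3://<table_location>/k=k1_plus/<some_file_name>
 2 | k2 | s3://<table_location>/k=k2/<some_file_name>
(2 rows)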

Note that the first row, (1, 'k1'), has disappeared.
Furthermore, objects created by Trino can be found under the path s3://<table_location>/k=k1/, even though they should have been created under s3://<table_location>/k=k1_plus/.
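
The metastore still tracks the k=k1 partition even though its rows are no longer readable. One way to make the mismatch visible (a sketch, reusing tmp2 from the reproduction above) is to query the hidden $partitions table:

-- Returns both k1 and k2, even though SELECT * FROM tmp2 only returns
-- rows for k2: the k=k1 partition's registered location (k=k1_plus)
-- no longer matches where the files were actually written.
SELECT * FROM "tmp2$partitions";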

Investigation

Per my understanding, this issue occurs because the existing partition information is completely ignored when the target write path is determined in this line, and the resulting partition location change is never committed to the metastore. As a result, the new files land under the default path s3://<table_location>/k=k1/ while the metastore keeps pointing at s3://<table_location>/k=k1_plus/, so the inserted rows are invisible to subsequent reads.
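
Until this is fixed, a possible workaround (a sketch, assuming the placeholders from the reproduction above, and that giving up the customized layout is acceptable) is to re-register the partition at its default location before inserting:

-- Drop the metastore entry that points at the customized location.
CALL system.unregister_partition('<schema>', 'tmp2', ARRAY['k'], ARRAY['k1']);
-- Re-register the partition; when the location argument is omitted,
-- register_partition falls back to the default <table_location>/k=k1 path.
-- Any data already sitting under k=k1_plus would need to be moved by hand.
CALL system.register_partition('<schema>', 'tmp2', ARRAY['k'], ARRAY['k1']);

After this, the write path chosen by INSERT and the partition location registered in the metastore agree again.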
