-
Notifications
You must be signed in to change notification settings - Fork 32
Open
Labels
enhancementNew feature or requestNew feature or request
Description
How about reading data from a Parquet file without the fuss (inspired by https://duckdb.org/docs/guides/file_formats/parquet_import )?
Below a very sketchy, untested script how to implement such a thing. I actually could only guess what parquet.schema returns, since there's no docs yet about this (see #48).
The resulting table product_example2 from reading product_example.parquet (as defined in https://github.com/CrunchyData/pg_parquet) should be created without having created table product_example2 before hand:
CREATE OR REPLACE FUNCTION import_foreign_schema_from_parquet(uri TEXT, table_name TEXT DEFAULT 'parquet_table') RETURNS VOID AS $$
DECLARE
parquet_info RECORD;
columns TEXT := '';
BEGIN
-- Loop through the metadata returned by parquet.metadata function
FOR parquet_info IN EXECUTE 'SELECT column_name, type_name FROM parquet.metadata(' || quote_literal(uri) || ')' LOOP
-- Append column name and type to the columns string
columns := columns || quote_ident(parquet_info.column_name) || ' ' ||
CASE parquet_info.type_name
WHEN 'INT32' THEN 'INTEGER'
WHEN 'INT64' THEN 'BIGINT'
WHEN 'FLOAT' THEN 'REAL'
WHEN 'DOUBLE' THEN 'DOUBLE PRECISION'
WHEN 'BOOLEAN' THEN 'BOOLEAN'
WHEN 'UTF8' THEN 'TEXT'
WHEN 'LIST' THEN 'ANYELEMENT[]'
WHEN 'TIMESTAMP_MICROS' THEN 'TIMESTAMP'
WHEN 'TIMESTAMP_MILLIS' THEN 'TIMESTAMPTZ'
ELSE 'TEXT' -- Default to TEXT for unknown types
END || ', ';
END LOOP;
-- Remove trailing comma and space
columns := RTRIM(columns, ', ');
-- Execute the CREATE TABLE statement with the generated columns
EXECUTE 'CREATE TABLE ' || quote_ident(table_name) || ' (' || columns || ');';
END;
$$ LANGUAGE plpgsql;
-- Import foreign "schema" (i.e. table definition) from Parquet - creates table "product_example2":
SELECT import_foreign_schema_from_parquet('/tmp/product_example.parquet', 'product_example2');
-- Copy the Parquet file to the newly created table product_example2:
COPY product_example2 FROM '/tmp/product_example.parquet' (format 'parquet', compression 'gzip');
-- Show table product_example2 - should have the same structure as the original table product_example:
SELECT * FROM product_example2
UNION ALL
SELECT * FROM product_example;
Metadata
Metadata
Assignees
Labels
enhancementNew feature or requestNew feature or request