[feature](catalog) support varbinary type mapping in hive/iceberg/paimon table #57821
Conversation
Force-pushed from 2ada766 to 94c865d.
run buildall
TPC-H: Total hot run time: 34948 ms
TPC-DS: Total hot run time: 188642 ms
ClickBench: Total hot run time: 28.57 s

TPC-H: Total hot run time: 34556 ms
TPC-DS: Total hot run time: 183655 ms
ClickBench: Total hot run time: 28.39 s
PR approved by at least one committer and no changes requested.
### What problem does this PR solve?

Problem Summary: support varbinary type mapping in the DB2, MySQL, Oracle, PostgreSQL, and SQL Server JDBC catalogs. You can control this when creating a catalog with the property "enable.mapping.varbinary" (default: false). If it is true, the binary type is mapped to the Doris varbinary type; if it is false, the binary type is mapped to the Doris string type. Follow-up to #57821.

### Release note

Support mapping the varbinary type in the JDBC catalog.
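As an illustration of the property described in the summary above (a minimal sketch: only "enable.mapping.varbinary" comes from these PRs; the connection settings are generic placeholders following the usual Doris JDBC catalog syntax):

```sql
-- Sketch: MySQL JDBC catalog with the varbinary mapping enabled.
-- "enable.mapping.varbinary" is the property added by the follow-up PR;
-- the connection properties below are placeholders.
CREATE CATALOG mysql_varbinary PROPERTIES (
    "type" = "jdbc",
    "user" = "root",
    "password" = "example_password",
    "jdbc_url" = "jdbc:mysql://127.0.0.1:3306/demo",
    "driver_url" = "mysql-connector-j-8.0.31.jar",
    "driver_class" = "com.mysql.cj.jdbc.Driver",
    "enable.mapping.varbinary" = "true"
);
```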
[feature](catalog) support varbinary type mapping in hive/iceberg/paimon table (apache#57821)

Example with a Paimon table:

```
mysql> show create table binary_demo3;
+--------------+----------------------------------------------------------------------+
| Table        | Create Table                                                         |
+--------------+----------------------------------------------------------------------+
| binary_demo3 | CREATE TABLE `binary_demo3` (
                  `id` int NULL,
                  `record_name` char(10) NULL,
                  `vrecord_name` text NULL,
                  `bin` varbinary(10) NULL,
                  `varbin` varbinary(2147483647) NULL
                ) ENGINE=PAIMON_EXTERNAL_TABLE
                LOCATION 'file:/mnt/disk2/zhangsida/test_paimon/demo.db/binary_demo3'
                PROPERTIES (
                  "path" = "file:/mnt/disk2/zhangsida/test_paimon/demo.db/binary_demo3",
                  "primary-key" = "id"
                ); |
+--------------+----------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> select *, length(record_name), length(vrecord_name), length(bin), length(varbin) from binary_demo3;
+------+-------------+--------------+------------------------+--------+---------------------+----------------------+-------------+----------------+
| id   | record_name | vrecord_name | bin                    | varbin | length(record_name) | length(vrecord_name) | length(bin) | length(varbin) |
+------+-------------+--------------+------------------------+--------+---------------------+----------------------+-------------+----------------+
|    1 | AAAA        | AAAA         | 0xAAAA0000000000000000 | 0xAAAA |                  10 |                    4 |          10 |              2 |
|    2 | 6161        | 6161         | 0x61610000000000000000 | 0x6161 |                  10 |                    4 |          10 |              2 |
|    3 | NULL        | NULL         | NULL                   | NULL   |                NULL |                 NULL |        NULL |           NULL |
+------+-------------+--------------+------------------------+--------+---------------------+----------------------+-------------+----------------+
```
[feature](catalog) support varbinary type mapping in hive/iceberg/paimon table (#57821) (#58482)

### What problem does this PR solve?

Problem Summary: cherry-pick from #57821

### Release note

None

### Check List (For Author)

- Test
  - [ ] Regression test
  - [ ] Unit Test
  - [ ] Manual test (add detailed scripts or steps below)
  - [ ] No need to test or manual test. Explain why:
    - [ ] This is a refactor/code format and no logic has been changed.
    - [ ] Previous test can cover this change.
    - [ ] No code files have been changed.
    - [ ] Other reason
- Behavior changed:
  - [ ] No.
  - [ ] Yes.
- Does this need documentation?
  - [ ] No.
  - [ ] Yes.

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label
### What problem does this PR solve?

The PR that introduced this: #57821
What problem does this PR solve?
Problem Summary:
Support the varbinary type in Hive/Iceberg/Paimon tables. The varbinary type can be mapped into Doris directly instead of being read as the string type; this is controlled by the catalog property enable.mapping.varbinary, which defaults to false.
TVF functions (e.g. HDFS) also have a parameter to control this, and it also defaults to false.
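For example, a catalog with the mapping enabled might be created like this (a minimal sketch: only "enable.mapping.varbinary" comes from this PR, the other properties follow the usual Hive catalog syntax, and the metastore address is a placeholder):

```sql
-- Sketch: Hive catalog with the new varbinary mapping enabled.
-- "enable.mapping.varbinary" is the property introduced by this PR;
-- the metastore URI below is a placeholder.
CREATE CATALOG hive_varbinary PROPERTIES (
    "type" = "hms",
    "hive.metastore.uris" = "thrift://127.0.0.1:9083",
    "enable.mapping.varbinary" = "true"
);
```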
Main changes:

1. When a Parquet column's type is tparquet::Type::BYTE_ARRAY with no logicalType and no converted_type, read it into column_varbinary directly, so the physical and logical conversions are consistent. If the type is tparquet::Type::BYTE_ARRAY and a logicalType is set (e.g. String), it is read as column_string; if the table column was created as a binary column, VarBinaryConverter is used to convert column_string to column_varbinary.
2. When an ORC column is of binary type, it is also mapped to the varbinary type directly and can reuse StringVectorBatch.
3. Add a cast between the string and varbinary types (see the sketch after this list).
4. Map UUID to the binary type instead of string in Iceberg.
5. Change the signature bool safe_cast_string(const char* startptr, size_t buffer_size, xxx) to safe_cast_string(const StringRef& str_ref, xxx).
6. Add const to the read_date_text_impl function.
7. Add some tests for varbinary with the Paimon catalog; more cases for Hive/Iceberg and a documentation update will follow.
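A hedged illustration of the new string/varbinary cast mentioned in item 3; the SQL-level syntax here is an assumption for illustration, not taken from the PR (binary_demo3 and its bin column come from the Paimon example above):

```sql
-- Illustrative only: assumes CAST accepts VARBINARY as a target type
-- and that a varbinary column can be cast back to STRING.
SELECT
    CAST('AAAA' AS VARBINARY(10)) AS str_to_bin,  -- string -> varbinary
    CAST(bin AS STRING)           AS bin_to_str   -- varbinary -> string
FROM binary_demo3;
```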
Release note
None