Feature request
Is your feature request related to a problem? Please describe.
ClickHouse provides the sumMap aggregate function for summing values by keys in MAP-type data. This function is very useful in data analysis scenarios, especially for tag aggregation and metric summarization.
Currently, StarRocks does not support the sumMap function. Similar functionality can be implemented with a Java UDF, but that approach has the following issues:
High Performance Overhead: A Java UDF requires JNI calls and data serialization, resulting in 10-20x lower performance than a native C++ implementation
Complex Deployment: Users must manually write and package the code, upload the JAR file, and register the function
High Maintenance Cost: Each user has to maintain their own UDF code independently
Describe the solution you'd like
Add a built-in sumMap aggregate function in StarRocks to provide ClickHouse-compatible functionality.
Function Signature
sumMap(map<K, V>) -> map<K, V>
Where:
K: Key type, supporting basic types such as VARCHAR, INT, and BIGINT
V: Value type, supporting numeric types (TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL)
Behavior Description
Across all input MAPs, values that share a key are summed, and the result contains one entry per distinct key:
-- Example
SELECT sumMap(data) FROM (
VALUES
(map{'a': 10, 'b': 20}),
(map{'a': 20, 'c': 30})
) t(data);
-- Expected result: {'a': 30, 'b': 20, 'c': 30}
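The intended semantics can also be modelled outside the database. The Python sketch below is a reference model only, not the proposed C++ implementation: the name `sum_map` is hypothetical, and Python dicts stand in for MAP values.

```python
from collections import defaultdict

def sum_map(rows):
    """Reference model: sum values per key across all input maps."""
    totals = defaultdict(int)
    for row in rows:
        for key, value in row.items():
            totals[key] += value
    return dict(totals)

# Same data as the SQL example above.
rows = [{"a": 10, "b": 20}, {"a": 20, "c": 30}]
result = sum_map(rows)
print(result)  # {'a': 30, 'b': 20, 'c': 30}
```

A model like this could serve as an oracle when writing regression tests for the built-in function.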
Edge Case Handling
NULL Keys: NULL is supported as a key; values under a NULL key are summed into their own NULL-key entry
NULL Values: NULL values are treated as 0 when summing
Empty MAP: An empty input MAP returns an empty MAP
Overflow Handling: Numeric overflow behaves the same as in the SUM function for the corresponding type
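The NULL and empty-MAP rules above can be illustrated with a small Python reference model. This is a sketch under stated assumptions, not the proposed implementation: `sum_map_with_nulls` is a hypothetical name, Python's `None` stands in for SQL NULL, and overflow is not modelled because Python integers do not overflow.

```python
def sum_map_with_nulls(rows):
    """Reference model of the edge-case rules listed above."""
    totals = {}
    for row in rows:
        for key, value in row.items():
            # NULL values are treated as 0 when summing.
            addend = 0 if value is None else value
            # A NULL key (None) is an ordinary key with its own running sum.
            totals[key] = totals.get(key, 0) + addend
    return totals

edge_rows = [
    {None: 1, "a": None},  # NULL key and a NULL value
    {},                    # empty MAP contributes nothing
    {None: 2, "a": 5},
]
edge_result = sum_map_with_nulls(edge_rows)
print(edge_result)  # {None: 3, 'a': 5}
```

One consequence of treating NULL values as 0 is that a key whose values are all NULL appears in the result with a sum of 0 rather than NULL.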
Describe alternatives you've considered
Additional context
Performance Improvement: A native C++ implementation is 10-20x faster than a Java UDF
Ease of Use: Users don't need to write or deploy a UDF; the function works out of the box
Compatibility: Reduces migration cost from ClickHouse to StarRocks