feat(query-pipeline): Add configs to customize maximum size of a query #7546
Conversation
```python
if (
    not cluster_name or cluster_name not in disable_max_query_size_check_for_clusters
) and query_size_bytes > _max_query_size_bytes():
```
Bug: Substring matching used instead of list membership for disable_max_query_size_check_for_clusters config.
Severity: CRITICAL | Confidence: High
🔍 Detailed Analysis
The in operator on line 229 performs Python substring matching instead of checking for membership in a list of cluster names. If disable_max_query_size_check_for_clusters is configured as a comma-separated string (e.g., "test_cluster,prod_cluster"), a cluster name like "test" will incorrectly match as a substring, leading to the max query size check being disabled when it should not be. This bypasses a critical safety mechanism.
💡 Suggested Fix
Split the disable_max_query_size_check_for_clusters string by comma into a list of cluster names, then check for cluster_name membership in that list.
Location: snuba/pipeline/stages/query_execution.py#L228-L230
I think this change should take a couple minutes and is worth making
```python
def _max_query_size_bytes() -> int:
    return (
        state.get_int_config(MAX_QUERY_SIZE_BYTES_CONFIG, MAX_QUERY_SIZE_BYTES)
        or MAX_QUERY_SIZE_BYTES
    )
```
Bug: _max_query_size_bytes() incorrectly replaces 0 config value with default due to falsy evaluation.
Severity: HIGH | Confidence: High
🔍 Detailed Analysis
In _max_query_size_bytes(), when MAX_QUERY_SIZE_BYTES_CONFIG is explicitly set to 0, state.get_int_config() correctly returns 0. However, the subsequent or MAX_QUERY_SIZE_BYTES clause evaluates 0 as falsy, causing the default MAX_QUERY_SIZE_BYTES to be used instead of the configured 0. This prevents 0 from being a valid configuration value, limiting flexibility.
💡 Suggested Fix
Replace the or operator with an explicit is not None check to correctly handle 0 as a valid configuration value. For example, return config_value if config_value is not None else MAX_QUERY_SIZE_BYTES.
Location: snuba/pipeline/stages/query_execution.py#L166-L170
❌ 1 test failed in CI.
onewland left a comment:
mostly fine but I agree with the AI that you should have some delimiter for the clusters where size checking is disabled and look for exact matches
We have instances where we want to allow a bigger query size. This adds two settings to customize whether we enforce the query size limit for a given cluster:

- `ExecutionStage.disable_max_query_size_check_for_clusters`: disable the max query size check, leaving it to ClickHouse to reject the query
- `ExecutionStage.max_query_size_bytes`: customize the global query size limit Snuba enforces