-
-
Notifications
You must be signed in to change notification settings - Fork 62
feat(deletes): support item attribute conditions in API and consumer #7537
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
onewland
wants to merge
21
commits into
master
Choose a base branch
from
feat/delete-support-attribute-conditions-in-consumer
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+442
−52
Open
Changes from all commits
Commits
Show all changes
21 commits
Select commit
Hold shift + click to select a range
07a62e8
feat(deletes): add support for delete by attribute in config
onewland f4530fe
[wip] add support for attribute_conditions in bulk_delete_query
onewland 7cf9ac1
wire in TraceItemFilter[s] to attribute_conditions in delete_from_sto…
onewland 0ccbaa2
small cleanup
onewland a61a7fc
log when delete by attribute accepted, but does nothing
onewland 0a508a4
make delete by attribute true no-op with return
onewland 8c806eb
move item_type into AttributeConditions class
onewland a77c7a0
make smoke test very marginally less dumb
onewland c355a18
[wip] wire through attribute_conditions in DeleteQueryMessage
onewland 7afb17c
[wip] resolve attribute_conditions to columns
onewland 771193d
remove whitespace
onewland fa0a3cb
Merge remote-tracking branch 'origin/master' into feat/delete-support…
onewland 94004cd
allow attribute_conditions deletes to proceed, if feature flag is ena…
onewland 5d26fa0
fix rename
onewland bfe41ee
[wip] functioning delete by attribute
onewland 55bf7b1
little test cleanup
onewland 8bd30d6
more test cleanup
onewland 394afd4
correct item type verification in API
onewland 7838e20
move local import
onewland 0b92678
remove AI noise
onewland 4143463
more test cleanup
onewland File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Some comments aren't visible on the classic Files Changed page.
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,8 +1,9 @@ | ||
| from abc import ABC, abstractmethod | ||
| from collections import defaultdict | ||
| from typing import Mapping, MutableMapping, Sequence, Type | ||
| from typing import List, Mapping, MutableMapping, Sequence, Type | ||
|
|
||
| from snuba.datasets.storages.storage_key import StorageKey | ||
| from snuba.utils.hashes import fnv_1a | ||
| from snuba.web.bulk_delete_query import DeleteQueryMessage | ||
| from snuba.web.delete_query import ConditionsType | ||
|
|
||
|
|
@@ -54,14 +55,70 @@ def format(self, messages: Sequence[DeleteQueryMessage]) -> Sequence[ConditionsT | |
| ] | ||
|
|
||
|
|
||
| class IdentityFormatter(Formatter): | ||
| class EAPItemsFormatter(Formatter): | ||
| # Number of attribute buckets used in eap_items storage | ||
| # TODO: find a way to wire this in from a canonical source | ||
| NUM_ATTRIBUTE_BUCKETS = 40 | ||
|
|
||
| def format(self, messages: Sequence[DeleteQueryMessage]) -> Sequence[ConditionsType]: | ||
| return [msg["conditions"] for msg in messages] | ||
| """ | ||
| For eap_items storage, we need to resolve attribute_conditions to their | ||
| appropriate column names based on type: | ||
| - int/bool: Single columns (no bucketing) like attributes_int['key'] or attributes_bool['key'] | ||
| - string/float: Hash-bucketed columns like attributes_string_0['key'] or attributes_float_23['key'] | ||
|
|
||
| For example, if attribute_conditions has {"group_id": [123]}, we need to: | ||
| 1. Determine the type based on the values (int in this case) | ||
| 2. Since int doesn't use bucketing, add it as attributes_int['group_id'] = [123] | ||
| """ | ||
| formatted_conditions: List[ConditionsType] = [] | ||
|
|
||
| for message in messages: | ||
| conditions = dict(message["conditions"]) | ||
|
|
||
| # Process attribute_conditions if present | ||
| if "attribute_conditions" in message and message["attribute_conditions"]: | ||
| attribute_conditions = message["attribute_conditions"] | ||
|
|
||
| # For each attribute, determine its type and bucket (if applicable) | ||
| for attr_name, attr_values in attribute_conditions.items(): | ||
| if not attr_values: | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this seems like a case that shouldn't happen? should we be logging something if it does? or is this a valid case and im just missing something |
||
| continue | ||
|
|
||
| # Determine the attribute type from the first value | ||
| # All values in the list should be of the same type | ||
| first_value = attr_values[0] | ||
| if isinstance(first_value, bool): | ||
| # Check bool before int since bool is a subclass of int in Python | ||
| attr_type = "bool" | ||
| elif isinstance(first_value, int): | ||
| attr_type = "int" | ||
| elif isinstance(first_value, float): | ||
| attr_type = "float" | ||
| else: | ||
| # Default to string for str and any other type | ||
| attr_type = "string" | ||
|
|
||
| # Only string and float attributes use bucketing | ||
| # int and bool attributes are stored in single columns | ||
| if attr_type in ("int", "bool"): | ||
| # No bucketing for int and bool | ||
| bucketed_column = f"attributes_{attr_type}['{attr_name}']" | ||
| else: | ||
| # Bucketing for string and float | ||
| bucket_idx = fnv_1a(attr_name.encode("utf-8")) % self.NUM_ATTRIBUTE_BUCKETS | ||
| bucketed_column = f"attributes_{attr_type}_{bucket_idx}['{attr_name}']" | ||
|
|
||
| conditions[bucketed_column] = attr_values | ||
|
|
||
| formatted_conditions.append(conditions) | ||
|
|
||
| return formatted_conditions | ||
|
|
||
|
|
||
| STORAGE_FORMATTER: Mapping[str, Type[Formatter]] = { | ||
| StorageKey.SEARCH_ISSUES.value: SearchIssuesFormatter, | ||
| # TODO: We will probably do something more sophisticated here in the future | ||
| # but it won't make much of a difference until we support delete by attribute | ||
| StorageKey.EAP_ITEMS.value: IdentityFormatter, | ||
| StorageKey.EAP_ITEMS.value: EAPItemsFormatter, | ||
| } | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,36 @@ | ||
| from __future__ import annotations | ||
|
|
||
| import pytest | ||
| from sentry_protos.snuba.v1.request_common_pb2 import TraceItemType | ||
|
|
||
| from snuba.datasets.deletion_settings import get_trace_item_type_name | ||
|
|
||
|
|
||
| def test_get_trace_item_type_name_valid() -> None: | ||
| assert get_trace_item_type_name(TraceItemType.TRACE_ITEM_TYPE_UNSPECIFIED) == "unspecified" | ||
| assert get_trace_item_type_name(TraceItemType.TRACE_ITEM_TYPE_SPAN) == "span" | ||
| assert get_trace_item_type_name(TraceItemType.TRACE_ITEM_TYPE_ERROR) == "error" | ||
| assert get_trace_item_type_name(TraceItemType.TRACE_ITEM_TYPE_LOG) == "log" | ||
| assert get_trace_item_type_name(TraceItemType.TRACE_ITEM_TYPE_UPTIME_CHECK) == "uptime_check" | ||
| assert get_trace_item_type_name(TraceItemType.TRACE_ITEM_TYPE_UPTIME_RESULT) == "uptime_result" | ||
| assert get_trace_item_type_name(TraceItemType.TRACE_ITEM_TYPE_REPLAY) == "replay" | ||
| assert get_trace_item_type_name(TraceItemType.TRACE_ITEM_TYPE_OCCURRENCE) == "occurrence" | ||
| assert get_trace_item_type_name(TraceItemType.TRACE_ITEM_TYPE_METRIC) == "metric" | ||
| assert ( | ||
| get_trace_item_type_name(TraceItemType.TRACE_ITEM_TYPE_PROFILE_FUNCTION) | ||
| == "profile_function" | ||
| ) | ||
|
|
||
|
|
||
| def test_get_trace_item_type_name_by_integer() -> None: | ||
| assert get_trace_item_type_name(0) == "unspecified" | ||
| assert get_trace_item_type_name(1) == "span" | ||
| assert get_trace_item_type_name(7) == "occurrence" | ||
|
|
||
|
|
||
| def test_get_trace_item_type_name_invalid() -> None: | ||
| with pytest.raises(ValueError, match="Unknown TraceItemType value: 999"): | ||
| get_trace_item_type_name(999) | ||
|
|
||
| with pytest.raises(ValueError, match="Unknown TraceItemType value: -1"): | ||
| get_trace_item_type_name(-1) |
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug:
EAPItemsFormatteruses a hardcodedNUM_ATTRIBUTE_BUCKETS, causing incorrect bucket indexing if configuration changes, leading to silent delete failures.Severity: CRITICAL | Confidence: 1.00
🔍 Detailed Analysis
The
EAPItemsFormatteruses a hardcodedNUM_ATTRIBUTE_BUCKETS = 40for calculating attribute bucket indices. If thenum_attribute_bucketsconfiguration changes (e.g., to 64), but the hardcoded value is not updated, delete queries will incorrectly calculate bucket indices. For example, an attribute hashing to 45 would targetattributes_string_5instead ofattributes_string_45, leading to silent failures where data is not deleted as intended, causing data inconsistency.💡 Suggested Fix
Parameterize
EAPItemsFormatterto receivenum_attribute_bucketsas a constructor argument, injecting the value from the canonical storage configuration, similar to other transformers.🤖 Prompt for AI Agent
Did we get this right? 👍 / 👎 to inform future reviews.
Reference_id: 2749980