
Commit 965ccfa

chore: stop capturing redis oom set failure in sentry (#7528)
I was looking into one of the errors that triggered our [Critical: Snuba errors over 30 minutes.](https://sentry.sentry.io/issues/alerts/rules/details/46321/?alert=230878&referrer=metric_alert_slack&detection_type=static&notification_uuid=653cde63-5ca2-4d50-8fa1-933cab7377bd) alert. It was `ResponseError: OOM command not allowed under OOM prevention.`, which comes from our Redis cache when it reaches max memory. The cache uses an LRU eviction policy once it fills up, so this usually doesn't happen, but occasionally it still can. We have a Datadog metric, [snuba.read_through_cache.redis_cache_set_error](https://app.datadoghq.com/metric/explorer?graph_layout=stacked&start=1762286323568&end=1762891123568&paused=false#N4Ig7glgJg5gpgFxALlAGwIYE8D2BXJVEADxQEYAaELcqyKBAC1pEbghkcLIF8qo4AMwgA7CAgg4RKUAiwAHOChASAtnADOcAE4RNIKtrgBHPJoQaUAbVBGN8qVoD6gnNtUZCKiOq279VKY6epbINiAiGOrKQdpYZAYgUJ4YThr42gDGSsgg6gi6mZaBZnHKGniqyBoieABGGAB0RhhQTkza+JxOmRiZbM1wUBAaPX1saYhOOp3awFoiBVhORjCSIsh4GjyNGKOZ+IsAFACUIDwAulSu7niYoeE3qncYMaXx51cgGnJoOaDyDB-BAIHJJHAwHr3DQQTKJNCiODtBRg9IIqDwxFOehMZQiNweNDnfgjeSYZZyRTKBEiJSXHh8b5k8QAYSkwhgKFqaDQPCAA), that tracks this error's occurrence almost exactly.

To reduce feed-sns noise, I am removing this error from Sentry and instead using the Datadog metric to monitor the percentage of Redis cache writes that fail, with a monitor that alerts in feed-sns if that percentage ever gets too high. [Here is the new monitor I created](https://app.datadoghq.com/monitors/236008510), and here is [a new widget I added in the API dashboard](https://app.datadoghq.com/dashboard/spg-jqb-tgz/snuba-api?fromUser=true&refresh_mode=paused&tpl_var_sentry_region%5B0%5D=us&from_ts=1762366350809&to_ts=1762370761457&live=false&tile_focus=1728346643271425).

This PR stops capturing the error in Sentry.
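The new monitor alerts on a rate rather than on individual errors. A minimal sketch of that arithmetic, assuming a count of failed writes (what `redis_cache_set_error` records) and a count of all write attempts over the same window; the attempts count, the function names and the 5% threshold are hypothetical, since the actual monitor query is only linked above.

```python
# Hypothetical sketch of a failure-percentage alert. The attempts count, the
# function names and the 5% threshold are illustrative assumptions, not the
# real Datadog monitor definition.


def set_failure_percentage(set_errors: int, set_attempts: int) -> float:
    """Percentage of cache writes that failed in one window."""
    if set_attempts == 0:
        return 0.0  # no writes in the window, so nothing failed
    return 100.0 * set_errors / set_attempts


def should_alert(set_errors: int, set_attempts: int, threshold_pct: float = 5.0) -> bool:
    """Alert only when the failure percentage exceeds the threshold."""
    return set_failure_percentage(set_errors, set_attempts) > threshold_pct


if __name__ == "__main__":
    print(should_alert(set_errors=3, set_attempts=10_000))  # False (0.03%)
    print(should_alert(set_errors=800, set_attempts=10_000))  # True (8%)
```

Alerting on a percentage keeps occasional OOM rejections quiet while still surfacing a sustained spike, which matches the noise-reduction goal stated above.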
1 parent ec9d8fb commit 965ccfa


1 file changed: +10, -2 lines


snuba/state/cache/redis/backend.py

Lines changed: 10 additions & 2 deletions
```diff
@@ -3,6 +3,7 @@
 
 import sentry_sdk
 
+from redis import ResponseError
 from redis.exceptions import ConnectionError, ReadOnlyError
 from redis.exceptions import TimeoutError as RedisTimeoutError
 from snuba import environment, settings
```
```diff
@@ -22,6 +23,11 @@
 RESULT_WAIT = 2
 SIMPLE_READTHROUGH = 3
 
+DONT_CAPTURE_ERRORS = {
+    # if you need to track this error, see datadog metric snuba.read_through_cache.redis_cache_set_error
+    ResponseError("OOM command not allowed under OOM prevention."),
+}
+
 
 class RedisCache(Cache[TValue]):
     def __init__(
```
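A caveat about the set above: Python exceptions do not implement value-based equality or hashing, so a `ResponseError` raised at runtime is never equal to the instance stored in `DONT_CAPTURE_ERRORS`, even when the messages match. As far as this diff shows, `e not in DONT_CAPTURE_ERRORS` would therefore always be `True`, and the OOM error would still reach Sentry. The snippet demonstrates this with a local stand-in for `redis.exceptions.ResponseError` (an assumption so it runs without redis-py); the real class is likewise a plain `Exception` subclass without a custom `__eq__`.

```python
# Stand-in for redis.exceptions.ResponseError so this snippet runs without
# redis-py installed. Like the real class, it subclasses Exception and does
# not override __eq__ or __hash__.
class ResponseError(Exception):
    pass


OOM_MESSAGE = "OOM command not allowed under OOM prevention."

# Same shape as the commit's DONT_CAPTURE_ERRORS: a set holding one instance.
DONT_CAPTURE_ERRORS = {ResponseError(OOM_MESSAGE)}

# The Redis client raises a brand-new instance on every failed command.
raised = ResponseError(OOM_MESSAGE)

# Exceptions inherit object's identity-based __eq__ and __hash__, so an
# identical message does not make two exceptions equal.
print(raised == ResponseError(OOM_MESSAGE))  # False
print(raised in DONT_CAPTURE_ERRORS)  # False, so the error is still captured
print(str(raised) == OOM_MESSAGE)  # True: only the message text matches
```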
```diff
@@ -71,7 +77,8 @@ def __get_value_with_simple_readthrough(
         except Exception as e:
             if settings.RAISE_ON_READTHROUGH_CACHE_REDIS_FAILURES:
                 raise e
-            sentry_sdk.capture_exception(e)
+            if e not in DONT_CAPTURE_ERRORS:
+                sentry_sdk.capture_exception(e)
 
         if timer is not None:
             timer.mark("cache_get")
```
```diff
@@ -91,7 +98,8 @@ def __get_value_with_simple_readthrough(
                 )
             except Exception as e:
                 metrics.increment("redis_cache_set_error", tags=metric_tags)
-                sentry_sdk.capture_exception(e)
+                if e not in DONT_CAPTURE_ERRORS:
+                    sentry_sdk.capture_exception(e)
             return value
         record_cache_hit_type(RESULT_EXECUTE)
         if timer is not None:
```
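If the intent is to skip Sentry only for the OOM rejection, one way to get a filter that actually matches is to compare the exception's type and message rather than the exception instance, because Python exceptions compare by identity. This is a minimal sketch under that assumption: `DONT_CAPTURE`, `should_capture`, `handle_set_error` and the stand-in `ResponseError` are hypothetical names, not Snuba code, and the `capture`/`increment` callables stand in for `sentry_sdk.capture_exception` and `metrics.increment` (tags omitted).

```python
from typing import Callable, Tuple, Type


class ResponseError(Exception):
    """Stand-in for redis.exceptions.ResponseError (assumption for this sketch)."""


# Hypothetical alternative to DONT_CAPTURE_ERRORS: (exception type, message
# prefix) pairs, matched with isinstance and string comparison, not identity.
DONT_CAPTURE: Tuple[Tuple[Type[BaseException], str], ...] = (
    (ResponseError, "OOM command not allowed"),
)


def should_capture(e: BaseException) -> bool:
    """Return False for errors that are monitored via a metric instead of Sentry."""
    return not any(
        isinstance(e, exc_type) and str(e).startswith(prefix)
        for exc_type, prefix in DONT_CAPTURE
    )


def handle_set_error(
    e: BaseException,
    capture: Callable[[BaseException], None],
    increment: Callable[[str], None],
) -> None:
    """Mirror the set path's except block: always count, capture selectively."""
    increment("redis_cache_set_error")  # metric is still emitted for every failure
    if should_capture(e):
        capture(e)  # sentry_sdk.capture_exception in the real code


if __name__ == "__main__":
    captured: list = []
    counted: list = []
    oom = ResponseError("OOM command not allowed under OOM prevention.")
    other = ResponseError("WRONGTYPE Operation against a key holding the wrong kind of value")
    handle_set_error(oom, captured.append, counted.append)
    handle_set_error(other, captured.append, counted.append)
    print(len(counted), len(captured))  # 2 1
```

Prefix matching tolerates small differences in the server's wording, while an exact string comparison is stricter. Either way the `redis_cache_set_error` metric keeps counting every failed write, so the Datadog monitor still sees the OOM errors, and the same `should_capture` check could guard the read-path `except` block.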
