Skip to content

Conversation

@yuvmen
Copy link
Member

@yuvmen yuvmen commented Nov 3, 2025

Introduce training_mode parameter to send dual embeddings during model upgrades. Centralize model version config and add should_send_new_model_embeddings() to track which groups need new embeddings. Rename feature flag to be version-agnostic.

Introduce training_mode parameter to send dual embeddings during model upgrades.
Centralize model version config and add should_send_new_model_embeddings() to
track which groups need new embeddings. Rename feature flag to be version-agnostic.
@codecov
Copy link

codecov bot commented Nov 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@             Coverage Diff              @@
##           master   #102623       +/-   ##
============================================
+ Coverage   70.58%    80.80%   +10.21%     
============================================
  Files        9200      8936      -264     
  Lines      392841    391345     -1496     
  Branches    25009     24848      -161     
============================================
+ Hits       277304    316234    +38930     
+ Misses     115111     74743    -40368     
+ Partials      426       368       -58     

@yuvmen yuvmen force-pushed the yuvmen/support-similiar-issues-model-transition branch from 51dd1a9 to 261abb2 Compare November 4, 2025 00:34
@yuvmen yuvmen marked this pull request as ready for review November 4, 2025 19:16
@yuvmen yuvmen requested review from a team as code owners November 4, 2025 19:16
# Enable v2 similarity grouping model (part of v2 grouping rollout)
manager.add("projects:similarity-grouping-v2-model", ProjectFeature, FeatureHandlerStrategy.FLAGPOLE, api_expose=False)
# Enable new similarity grouping model upgrade (version-agnostic rollout)
manager.add("projects:similarity-grouping-model-upgrade", ProjectFeature, FeatureHandlerStrategy.FLAGPOLE, api_expose=False)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure here if I should keep the old feature, if its safe to delete a feature and its usage in the same PR. This is basically just a rename of the flag.

result = "found_secondary"
maybe_send_seer_for_new_model_training(
event, secondary.existing_grouphash, secondary.variants
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Inconsistent Updates for Seer-Matched Groups Embeddings

Missing call to maybe_send_seer_for_new_model_training when a Seer match is found. The function is called when an existing grouphash is found via primary grouping (line 1293) and secondary grouping (lines 1305-1307), but it's not called when an existing grouphash is found via Seer matching (line 1316). This means that existing groups found through Seer similarity matching won't get their embeddings updated for the new model version, which is inconsistent with the behavior for groups found through other grouping methods.

Fix in Cursor Fix in Web

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is wrong, they get updated on the regular flow its fine

gh_metadata = existing_grouphash.metadata
grouphash_seer_model = gh_metadata.seer_model if gh_metadata else None

if should_send_new_model_embeddings(event.project, grouphash_seer_model):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could use guard clause / early return here

if should_send_new_model_embeddings(event.project, grouphash_seer_model):
had_metadata = gh_metadata is not None
# Send training mode request (honor all checks like rate limits, circuit breaker, etc.)
if should_call_seer_for_grouping(event, variants, existing_grouphash):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

had_metadata = gh_metadata is not None
# Send training mode request (honor all checks like rate limits, circuit breaker, etc.)
if should_call_seer_for_grouping(event, variants, existing_grouphash):
record_did_call_seer_metric(event, call_made=True, blocker="none")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add some tag to this metric to say that we're calling seer from new_model_training?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm yea agreed

get_seer_similar_issues(event, existing_grouphash, variants, training_mode=True)

# Record metrics for new model embedding requests
metrics.incr(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like we have a bunch of telemetry in get_seer_similar_issues already, should we just consolidate there with a training_mode tag?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yea for sure, this doesnt make sense, ill consolidate

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Scope: Backend Automatically applied to PRs that change backend components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants