Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
41 changes: 41 additions & 0 deletions apps/worker/tasks/sync_pull.py
Original file line number Diff line number Diff line change
Expand Up @@ -191,6 +191,10 @@ def run_impl_within_lock(
"reason": "not_in_provider",
}
self.trigger_ai_pr_review(enriched_pull, current_yaml)

# Check for related commits to improve PR context
self.analyze_related_commits_for_context(db_session, pull, current_yaml)
Comment on lines +194 to +196
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method is called unconditionally on every pull request sync, which could add significant database load for repositories with many commits. Consider:

  1. Adding a configuration flag (e.g., in YAML) to enable/disable this feature
  2. Limiting execution to specific conditions (e.g., only for AI PR reviews or specific repository owners)
  3. Implementing caching to avoid repeated queries for the same pull request

Did we get this right? 👍 / 👎 to inform future reviews.

Comment on lines +194 to +196
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is called before verifying that head_commit exists (which happens later at line 200). If head_commit is None, the method returns early without logging anything. Consider moving this call to after the head_commit validation for better execution flow and to avoid unnecessary method calls.

Alternatively, add logging when head_commit is None to understand how often this occurs.

Did we get this right? 👍 / 👎 to inform future reviews.


report_service = ReportService(
current_yaml, gh_app_installation_name=installation_name_to_use
)
Expand Down Expand Up @@ -591,6 +595,43 @@ def trigger_ai_pr_review(self, enriched_pull: EnrichedPull, current_yaml: UserYa
kwargs={"repoid": pull.repoid, "pullid": pull.pullid}
)

def analyze_related_commits_for_context(self, db_session, pull, current_yaml):
"""
Analyzes related commits to provide better context for the pull request.
This helps identify patterns and dependencies in the codebase.
"""
repoid = pull.repoid
head_commit = pull.get_head_commit()

if not head_commit:
return

# Look for recent commits on the same branch for context
related_commits = (
db_session.query(Commit)
.filter(
Commit.repoid == repoid,
Commit.branch == head_commit.branch,
Commit.timestamp < head_commit.timestamp,
(Commit.pullid.is_(None) | (Commit.pullid != pull.pullid)),
Commit.deleted == False,
)
.order_by(Commit.timestamp.desc())
Comment on lines +607 to +619
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method doesn't handle the case where head_commit.branch might be None. If the branch is None, the query filter Commit.branch == head_commit.branch could produce unexpected results. Add a null check or handle this edge case explicitly.

Did we get this right? 👍 / 👎 to inform future reviews.

.limit(100)
.all()
)

Comment on lines +610 to +623
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This database query could fail due to connection issues or other database errors, but there's no error handling. Since this method is called during PR sync (a critical path), database failures here could cause the entire PR sync to fail.

Consider wrapping the database query in a try-except block to handle potential exceptions gracefully and continue with PR sync even if context analysis fails.

Suggested change
related_commits = (
db_session.query(Commit)
.filter(
Commit.repoid == repoid,
Commit.branch == head_commit.branch,
Commit.timestamp < head_commit.timestamp,
(Commit.pullid.is_(None) | (Commit.pullid != pull.pullid)),
Commit.deleted == False,
)
.order_by(Commit.timestamp.desc())
.limit(100)
.all()
)
try:\n # Look for recent commits on the same branch for context\n related_commits = (\n db_session.query(Commit)\n .filter(\n Commit.repoid == repoid,\n Commit.branch == head_commit.branch,\n Commit.timestamp < head_commit.timestamp,\n (Commit.pullid.is_(None) | (Commit.pullid != pull.pullid)),\n Commit.deleted.is_(False),\n )\n .order_by(Commit.timestamp.desc())\n .limit(100)\n .all()\n )\n except Exception as e:\n log.warning(\n "Failed to analyze related commits for context",\n extra={"repoid": repoid, "pullid": pull.pullid},\n exc_info=True,\n )\n return

Did we get this right? 👍 / 👎 to inform future reviews.

if related_commits:
log.info(
"Found related commits for pull request context",
extra={
"repoid": repoid,
"pullid": pull.pullid,
"related_commits_count": len(related_commits),
"head_commit": head_commit.commitid,
},
Comment on lines +598 to +632
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method queries up to 100 commits on every pull request sync but doesn't actually analyze them or provide context anywhere. It only logs the count. This adds unnecessary database load without clear value. Consider:

  1. Either implement the actual analysis/context provision logic
  2. Remove this feature until the implementation is complete
  3. Gate this behind a feature flag or YAML configuration to limit the performance impact

Did we get this right? 👍 / 👎 to inform future reviews.

Comment on lines +598 to +632
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method queries the database for up to 100 commits but doesn't use the results for any purpose. The query results are only logged and never acted upon. Consider removing this method entirely if it's not going to process the data, or implement the actual context analysis logic if this is incomplete work.

If this is exploratory/telemetry code, consider adding a feature flag to control when this expensive query is executed.

Did we get this right? 👍 / 👎 to inform future reviews.

)


RegisteredPullSyncTask = celery_app.register_task(PullSyncTask())
pull_sync_task = celery_app.tasks[RegisteredPullSyncTask.name]