
Conversation

@Hemil36 Hemil36 (Contributor) commented Nov 1, 2025

  • Fixes: BUG: Database Integrity Issues - Foreign Key Constraints Not Enforced #571
    This pull request refactors the database access layer to use a context-managed connection helper (get_db_connection) across the albums, face clusters, and faces modules. This change centralizes connection handling, improves code readability, and ensures more robust transaction management and error logging. Additionally, it updates database transaction settings for better concurrency and reliability.

Database connection management and transaction handling:

  • Replaced direct usage of sqlite3.connect and manual connection/transaction handling in albums.py, face_clusters.py, and faces.py with the centralized get_db_connection context manager, significantly reducing boilerplate code and potential connection leaks.

  • Refactored batch and update operations in face_clusters.py to accept an optional connection, further supporting reuse and atomic transactions.
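The optional-connection pattern described above can be sketched as follows. This is a hedged illustration rather than the actual PictoPy code: the helper body, table, and column names are assumptions, and only the function name `db_update_face_cluster_ids_batch` comes from the PR itself.

```python
import sqlite3
from contextlib import contextmanager
from typing import List, Optional, Tuple

DB_PATH = "app.db"  # hypothetical database path


@contextmanager
def get_db_connection(path: str = DB_PATH):
    # Minimal stand-in for the project's centralized helper.
    conn = sqlite3.connect(path)
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()


def db_update_face_cluster_ids_batch(
    updates: List[Tuple[Optional[str], int]],
    conn: Optional[sqlite3.Connection] = None,
) -> None:
    """Apply (cluster_id, face_id) updates, optionally inside a caller's transaction."""

    def _update(c: sqlite3.Connection) -> None:
        c.executemany("UPDATE faces SET cluster_id = ? WHERE id = ?", updates)

    if conn is not None:
        # Case 1: join the caller's transaction; the caller commits or rolls back.
        _update(conn)
    else:
        # Case 2: standalone use; the context manager handles commit/rollback.
        with get_db_connection() as managed:
            _update(managed)
```

The dual path is what makes the batch helpers composable: callers inside a larger transaction pass their own connection, while one-off callers get a self-contained transaction for free.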

Database reliability and logging improvements:

  • Enhanced get_db_connection to set SQLite PRAGMAs for Write-Ahead Logging (journal_mode=WAL), increased busy timeout, and added exception logging on transaction failure for better reliability and debugging.

These changes make the database access layer more robust, maintainable, and easier to debug.
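As a sketch of what such a context-managed helper might look like (the actual connection.py may differ; the DB path, logging details, and the foreign_keys pragma here are assumptions, the latter included only because the linked issue #571 concerns unenforced foreign keys):

```python
import logging
import sqlite3
from contextlib import contextmanager

logger = logging.getLogger(__name__)
DB_PATH = "app.db"  # hypothetical path


@contextmanager
def get_db_connection(path: str = DB_PATH):
    """Yield a connection, committing on success and rolling back on error."""
    conn = sqlite3.connect(path)
    try:
        # WAL lets readers proceed while a writer holds the lock; the busy
        # timeout makes writers wait up to 30s instead of failing immediately
        # with "database is locked".
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute("PRAGMA busy_timeout=30000")
        # Assumption: SQLite leaves foreign keys off by default on each
        # connection, so enforcement must be enabled explicitly.
        conn.execute("PRAGMA foreign_keys=ON")
        yield conn
        conn.commit()
    except Exception:
        logger.exception("Database transaction failed; rolling back")
        conn.rollback()
        raise
    finally:
        conn.close()
```

Because every code path funnels through this one helper, the PRAGMAs, commit/rollback policy, and error logging only need to be correct in one place.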

Summary by CodeRabbit

  • Improvements

    • Unified database connection handling for more reliable resource cleanup and better concurrent access behavior.
    • Improved database logging and error reporting to aid troubleshooting.
    • DB performance/timing improvements to reduce contention.
  • Bug Fixes

    • Added validation to prevent invalid album–image operations and to surface clear errors when operations fail.
  • Refactor

    • Core DB workflows reorganized for consistent transaction handling and simpler public interfaces.

- Updated all database interaction functions to utilize `get_db_connection()` for managing SQLite connections.
- Removed manual connection handling and commit/rollback logic from individual functions.
- Improved error handling and logging for database operations.
- Ensured that all database operations are performed within a single transaction where applicable.
- Enhanced code readability and maintainability by reducing boilerplate connection code.
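A hedged before/after illustration of the boilerplate reduction described above; the function and table names are invented for this sketch and the stand-in helper is simplified relative to the real connection.py:

```python
import sqlite3
from contextlib import contextmanager

DB_PATH = "app.db"  # hypothetical path


@contextmanager
def get_db_connection(path: str = DB_PATH):
    # Minimal stand-in for the centralized helper.
    conn = sqlite3.connect(path)
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()


# Before: every function repeats connect/commit/close by hand.
def db_get_all_albums_old(path: str = DB_PATH):
    conn = sqlite3.connect(path)
    try:
        cursor = conn.cursor()
        cursor.execute("SELECT id, name FROM albums")
        rows = cursor.fetchall()
        conn.commit()
        return rows
    finally:
        conn.close()


# After: the context manager owns the connection lifecycle.
def db_get_all_albums(path: str = DB_PATH):
    with get_db_connection(path) as conn:
        return conn.execute("SELECT id, name FROM albums").fetchall()
```

The refactored form cannot leak a connection or forget a rollback, since both are handled once inside the context manager.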
@github-actions github-actions bot added the backend and bug (Something isn't working) labels Nov 1, 2025
@coderabbitai coderabbitai bot (Contributor) commented Nov 1, 2025

Walkthrough

Refactors multiple DB modules to use a centralized get_db_connection() context manager, replaces manual sqlite3 connect/commit/close patterns, propagates optional sqlite3.Connection parameters across functions, and tightens album-image validation and error logging.

Changes

  • Connection infrastructure (backend/app/database/connection.py): Added a module logger, set PRAGMA journal_mode=WAL and busy_timeout=30000, and logs exceptions before re-raising; provides the get_db_connection() context manager for transactions.
  • Albums (backend/app/database/albums.py): Switched to with get_db_connection() usage for all operations; strengthened album-image association validation (raises ValueError if image(s) are missing); removed manual connect/commit/close.
  • Images (backend/app/database/images.py): Replaced _connect and direct sqlite3 usage with get_db_connection(); all queries/inserts/updates use the context manager; removed explicit commit/close and updated batch/lookup behavior while preserving the same outputs.
  • Faces (backend/app/database/faces.py): Reworked to use get_db_connection(); refactored insert/query/update functions; changed db_update_face_cluster_ids_batch to accept conn: Optional[sqlite3.Connection].
  • Face clusters (backend/app/database/face_clusters.py): Replaced direct connections with get_db_connection; introduced inner operation helpers; db_delete_all_clusters and db_insert_clusters_batch now accept an optional conn: Optional[sqlite3.Connection]; other reads use the context manager.
  • Folders (backend/app/database/folders.py): Migrated all DB operations to the get_db_connection() context manager; removed explicit commit/close and simplified error handling.
  • Metadata (backend/app/database/metadata.py): Uses get_db_connection(); changed the db_update_metadata signature to accept conn: Optional[sqlite3.Connection], with an internal helper to support external connections.
  • Cluster utilities (backend/app/utils/face_clusters.py): Updated helper signatures to accept an optional sqlite3.Connection (e.g., _update_cluster_face_image, _get_cluster_face_data, _generate_cluster_face_image); improved error handling/logging and propagated connection usage through workflows.
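The strengthened album-image validation noted for albums.py can be sketched as follows; this is a hedged illustration, and the table, column, and function names are assumptions rather than the project's actual schema:

```python
import sqlite3
from typing import List


def db_add_images_to_album(
    conn: sqlite3.Connection, album_id: int, image_ids: List[int]
) -> None:
    """Associate images with an album, rejecting ids that do not exist."""
    placeholders = ",".join("?" for _ in image_ids)
    found = {
        row[0]
        for row in conn.execute(
            f"SELECT id FROM images WHERE id IN ({placeholders})", image_ids
        )
    }
    missing = [i for i in image_ids if i not in found]
    if missing:
        # Surface a clear error instead of failing silently or tripping
        # a foreign key constraint deep inside the insert.
        raise ValueError(f"Images not found: {missing}")
    conn.executemany(
        "INSERT INTO album_images (album_id, image_id) VALUES (?, ?)",
        [(album_id, i) for i in image_ids],
    )
```

Validating up front turns a latent integrity violation into an explicit ValueError that callers can handle, which is what the reviewer checklist below asks callers to account for.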

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Caller
  participant ConnMgr as get_db_connection()
  participant DB as SQLite
  Caller->>ConnMgr: with get_db_connection() as conn
  ConnMgr->>DB: open connection + PRAGMA (WAL, busy_timeout)
  Caller->>ConnMgr: conn.cursor() -> execute SQL
  alt success
    ConnMgr->>DB: COMMIT
    ConnMgr-->>Caller: results
  else error
    ConnMgr->>DB: ROLLBACK
    ConnMgr-->>Caller: exception (logged)
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Pay attention to signature changes where callers must now pass or propagate conn instead of cursor.
  • Verify transaction boundaries and that get_db_connection() correctly commits/rolls back in all error paths.
  • Check album-image validation changes for callers expecting silent failures vs raised ValueError.

Possibly related PRs

Suggested reviewers

  • rahulharpal1603

Poem

🐰
I hopped through code to tidy the tracks,
swapped connect/close for tidy with-blocks,
now PRAGMAs hum and errors get logged,
images and albums checked with care —
a fluffle of clean DBs everywhere.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 60.87%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check ✅ Passed: The PR title "Refactor database access to use context managers for connection handling" accurately captures the primary change across the entire changeset. The raw_summary demonstrates that the core refactoring pattern is consistently applied across multiple database modules (albums.py, face_clusters.py, faces.py, folders.py, images.py, metadata.py) and utility files, where manual sqlite3 connection handling with explicit connect/close/commit calls is systematically replaced with a centralized get_db_connection() context manager. The title is specific, concise (10 words), and clearly communicates the main architectural change without vagueness or unnecessary detail.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 08bfdf7 and d3f7591.

📒 Files selected for processing (1)
  • backend/app/utils/face_clusters.py (6 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: Hemil36
Repo: AOSSIE-Org/PictoPy PR: 570
File: backend/app/database/connection.py:16-24
Timestamp: 2025-10-31T17:00:50.116Z
Learning: In PictoPy backend, the user prefers not to use database connection retry logic or extended busy timeouts in the centralized get_db_connection() context manager, even though the app has concurrent access patterns via ProcessPoolExecutor and FastAPI.
🧬 Code graph analysis (1)
backend/app/utils/face_clusters.py (4)
backend/app/database/face_clusters.py (2)
  • db_delete_all_clusters (36-57)
  • db_insert_clusters_batch (60-104)
backend/app/database/faces.py (1)
  • db_update_face_cluster_ids_batch (270-313)
backend/app/database/metadata.py (1)
  • db_update_metadata (49-75)
backend/app/database/connection.py (1)
  • get_db_connection (11-37)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Backend Tests
🔇 Additional comments (3)
backend/app/utils/face_clusters.py (3)

124-156: Excellent transaction scope and connection threading.

The refactoring properly threads the conn parameter through all database operations within the transaction blocks. The operation sequence ensures foreign key constraints are satisfied (clusters created before face assignments), and the dual-path approach (full reclustering vs. incremental assignment) is cleanly separated with appropriate transaction boundaries.


336-376: Past review comment addressed—error handling now consistent.

Both connection paths (external and managed) now log and re-raise exceptions consistently. This ensures predictable behavior and proper transaction rollback regardless of how the function is called.


555-556: Good design: connection now required rather than optional.

Making the conn parameter required (not Optional) is the right choice. This function is only called from within transactions (line 136) where a connection is always available, so the signature now accurately reflects its usage contract and prevents misuse.



@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
backend/app/database/faces.py (1)

88-132: Consider transaction atomicity for multi-face insertion.

When inserting multiple faces for one image (lines 113-126), each call to db_insert_face_embeddings() creates a separate transaction. If insertion fails partway through, earlier faces are already committed, potentially leaving partial data.

Consider refactoring to use a single transaction for all faces:

 def db_insert_face_embeddings_by_image_id(
     image_id: ImageId,
     embeddings: Union[FaceEmbedding, List[FaceEmbedding]],
     confidence: Optional[Union[float, List[float]]] = None,
     bbox: Optional[Union[BoundingBox, List[BoundingBox]]] = None,
     cluster_id: Optional[Union[ClusterId, List[ClusterId]]] = None,
 ) -> Union[FaceId, List[FaceId]]:
     # Handle multiple faces in one image
     if (
         isinstance(embeddings, list)
         and len(embeddings) > 0
         and isinstance(embeddings[0], np.ndarray)
     ):
-        face_ids = []
-        for i, emb in enumerate(embeddings):
-            conf = (
-                confidence[i]
-                if isinstance(confidence, list) and i < len(confidence)
-                else confidence
-            )
-            bb = bbox[i] if isinstance(bbox, list) and i < len(bbox) else bbox
-            cid = (
-                cluster_id[i]
-                if isinstance(cluster_id, list) and i < len(cluster_id)
-                else cluster_id
-            )
-            face_id = db_insert_face_embeddings(image_id, emb, conf, bb, cid)
-            face_ids.append(face_id)
-        return face_ids
+        with get_db_connection() as conn:
+            cursor = conn.cursor()
+            face_ids = []
+            for i, emb in enumerate(embeddings):
+                conf = (
+                    confidence[i]
+                    if isinstance(confidence, list) and i < len(confidence)
+                    else confidence
+                )
+                bb = bbox[i] if isinstance(bbox, list) and i < len(bbox) else bbox
+                cid = (
+                    cluster_id[i]
+                    if isinstance(cluster_id, list) and i < len(cluster_id)
+                    else cluster_id
+                )
+                embeddings_json = json.dumps(emb.tolist())
+                bbox_json = json.dumps(bb) if bb is not None else None
+                cursor.execute(
+                    "INSERT INTO faces (image_id, cluster_id, embeddings, confidence, bbox) VALUES (?, ?, ?, ?, ?)",
+                    (image_id, cid, embeddings_json, conf, bbox_json),
+                )
+                face_ids.append(cursor.lastrowid)
+            return face_ids
     else:
         # Single face
         return db_insert_face_embeddings(
             image_id, embeddings, confidence, bbox, cluster_id
         )
backend/app/utils/face_clusters.py (1)

378-435: Inconsistent error handling between connection paths.

Similar to _update_cluster_face_image, this function has asymmetric error handling:

  • Lines 422-427: Raises when using external connection
  • Lines 430-435: Returns None when using managed connection

For consistency and proper transaction rollback, both paths should raise exceptions.

Apply this diff:

     # Case 2: Use managed connection
     try:
         with get_db_connection() as conn:
             return perform_query(conn)
     except Exception as e:
         logger.error(f"Error getting face data for cluster {cluster_uuid}: {e}")
-        return None
+        raise
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fe2f98d and 08bfdf7.

📒 Files selected for processing (8)
  • backend/app/database/albums.py (7 hunks)
  • backend/app/database/connection.py (2 hunks)
  • backend/app/database/face_clusters.py (10 hunks)
  • backend/app/database/faces.py (10 hunks)
  • backend/app/database/folders.py (15 hunks)
  • backend/app/database/images.py (8 hunks)
  • backend/app/database/metadata.py (3 hunks)
  • backend/app/utils/face_clusters.py (6 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: Hemil36
Repo: AOSSIE-Org/PictoPy PR: 570
File: backend/app/database/connection.py:16-24
Timestamp: 2025-10-31T17:00:50.116Z
Learning: In PictoPy backend, the user prefers not to use database connection retry logic or extended busy timeouts in the centralized get_db_connection() context manager, even though the app has concurrent access patterns via ProcessPoolExecutor and FastAPI.
📚 Learning: 2025-10-31T17:00:50.116Z
Learnt from: Hemil36
Repo: AOSSIE-Org/PictoPy PR: 570
File: backend/app/database/connection.py:16-24
Timestamp: 2025-10-31T17:00:50.116Z
Learning: In PictoPy backend, the user prefers not to use database connection retry logic or extended busy timeouts in the centralized get_db_connection() context manager, even though the app has concurrent access patterns via ProcessPoolExecutor and FastAPI.

Applied to files:

  • backend/app/database/images.py
  • backend/app/database/faces.py
  • backend/app/database/connection.py
🧬 Code graph analysis (7)
backend/app/database/albums.py (1)
backend/app/database/connection.py (1)
  • get_db_connection (11-37)
backend/app/database/images.py (2)
backend/app/database/connection.py (1)
  • get_db_connection (11-37)
backend/app/utils/images.py (1)
  • image_util_parse_metadata (496-513)
backend/app/database/faces.py (2)
backend/app/database/connection.py (1)
  • get_db_connection (11-37)
backend/app/database/metadata.py (1)
  • _update (63-67)
backend/app/database/metadata.py (2)
backend/app/database/connection.py (1)
  • get_db_connection (11-37)
backend/app/database/faces.py (1)
  • _update (292-307)
backend/app/database/folders.py (1)
backend/app/database/connection.py (1)
  • get_db_connection (11-37)
backend/app/utils/face_clusters.py (4)
backend/app/database/face_clusters.py (2)
  • db_delete_all_clusters (36-57)
  • db_insert_clusters_batch (60-104)
backend/app/database/faces.py (1)
  • db_update_face_cluster_ids_batch (270-313)
backend/app/database/metadata.py (1)
  • db_update_metadata (49-75)
backend/app/database/connection.py (1)
  • get_db_connection (11-37)
backend/app/database/face_clusters.py (1)
backend/app/database/connection.py (1)
  • get_db_connection (11-37)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Backend Tests
🔇 Additional comments (11)
backend/app/database/faces.py (5)

5-5: LGTM: Centralized connection import added.

The import of get_db_connection aligns with the PR objective to standardize connection handling across the codebase.


29-48: LGTM: Table creation now uses context manager.

The refactoring correctly delegates transaction management to get_db_connection(), eliminating manual commit/close calls.


50-86: LGTM: Insertion now uses context manager.

The function correctly uses the context manager and returns the face_id before the automatic commit. This is safe because the context manager commits on successful exit.


135-204: LGTM: Read-only operations correctly refactored.

All query functions properly use the context manager for connection handling. The automatic commit on read-only operations is harmless and simplifies the pattern.

Also applies to: 207-228, 231-267, 316-363


270-313: LGTM: Batch update supports optional connection reuse.

The signature change from cursor to optional connection, combined with the dual-path pattern, correctly enables both standalone use and transaction reuse. This matches the pattern used in backend/app/database/face_clusters.py for consistency.

backend/app/utils/face_clusters.py (2)

124-150: Excellent: Atomic transaction for full reclustering.

All database operations for the full reclustering path—deleting old clusters, inserting new ones, updating face assignments, generating face images, and updating metadata—are executed within a single transaction. This ensures atomicity: either all changes succeed together or all are rolled back on failure.


13-13: LGTM: Supporting changes for transaction pattern.

The Tuple import supports the updated type hints (line 380), and making conn a required parameter for _generate_cluster_face_image is appropriate since this function is only called within an active transaction (line 136).

Also applies to: 555-594

backend/app/database/face_clusters.py (4)

3-3: LGTM: Centralized connection import.

Consistent with the refactoring in backend/app/database/faces.py.


21-33: LGTM: Table creation refactored correctly.

The context manager pattern is correctly applied for DDL operations.


36-57: LGTM: Consistent dual-path pattern for connection reuse.

These functions correctly implement the same dual-path pattern seen in backend/app/database/faces.py::db_update_face_cluster_ids_batch, enabling both standalone use and connection reuse within transactions. The pattern is consistent across all batch/update operations in the refactored codebase.

Also applies to: 60-104, 161-205


107-131: LGTM: Read-only operations refactored correctly.

All query operations properly use the context manager. The refactoring eliminates manual connection management while maintaining correct behavior.

Also applies to: 134-158, 208-247, 250-316


Labels

backend bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: Database Integrity Issues - Foreign Key Constraints Not Enforced

1 participant