Re-queue failed pageview batch for retry on next interval #710
base: master
Conversation
@rockinrimmer is attempting to deploy a commit to goldflag's projects Team on Vercel. A member of the Team first needs to authorize it.
Walkthrough

The error handling in the pageview queue's bulk insert operation has been enhanced to re-queue failed batches. When an insertion fails, the batch is now prepended to the front of the queue for retry on the next interval, with updated logging reflecting the number of re-queued items.
Sequence Diagram(s)

```mermaid
sequenceDiagram
actor Scheduler
participant Queue
participant DB as Bulk Insert
Scheduler->>Queue: Poll batch
Queue->>DB: Insert batch
alt Success
DB-->>Queue: Complete
Note over Queue: Batch removed from queue
else Failure
DB-->>Queue: Error
Note over Queue: Log re-queue message<br/>(X items re-queued)
Queue->>Queue: Unshift batch to front
Note over Queue: Batch ready for next interval
    end
```
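A minimal sketch of the pattern the diagram describes, assuming an in-memory array queue flushed on a fixed interval (every 10 seconds per the review below); `TotalPayload` appears in the review, but the other names and shapes here are illustrative stand-ins, not the project's actual API:

```ts
// Sketch only: field names and insertPageviews are assumptions, not the real code.
type TotalPayload = { site: string; url: string; timestamp: number };

class PageviewQueue {
  private queue: TotalPayload[] = [];

  add(pageview: TotalPayload): void {
    this.queue.push(pageview);
  }

  // Runs on a fixed interval; insertPageviews stands in for the ClickHouse bulk insert
  async processQueue(insertPageviews: (batch: TotalPayload[]) => Promise<void>): Promise<void> {
    if (this.queue.length === 0) return;
    const batch = this.queue.splice(0, this.queue.length); // drain the queue into one batch
    try {
      await insertPageviews(batch);
    } catch (error) {
      console.error(error, `Error processing pageview queue, re-adding ${batch.length} items to the start of the queue`);
      // Prepend the failed batch so it is retried on the next interval,
      // ahead of any pageviews that arrived in the meantime
      this.queue.unshift(...batch);
    }
  }
}
```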
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 1
🧹 Nitpick comments (1)
server/src/services/tracker/pageviewQueue.ts (1)
45-116: Consider preserving processed pageviews to avoid redundant geo-data fetching.

Currently, when a batch fails and is re-queued, the original unprocessed payloads are added back, which means the geo-location lookup (line 45) and all transformations (lines 48-116) are repeated on retry. If the ClickHouse insert failed but geo-data was successfully fetched, this re-processing is redundant.
You could optimize by:
- Caching the `processedPageviews` alongside the retry count when re-queuing
- Relying on the geo-data service's caching layer, if one exists
However, this adds complexity and may not be worth the effort if:
- Geo-data lookups are fast/cached
- Retry failures are rare
- Fresh data on retry is preferred
Only pursue this if profiling shows geo-data fetching is a bottleneck during retries.
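A rough sketch of the first option, keeping the processed form and a retry count on each queue entry so a retry can skip the geo lookup; every type and function name here is hypothetical, not taken from the codebase:

```ts
// Hypothetical shapes; the real pipeline's types will differ.
type RawPageview = { url: string; ip: string };
type ProcessedPageview = { url: string; country?: string };

type QueueEntry = {
  raw: RawPageview;
  processed?: ProcessedPageview; // cached after the first successful processing
  retryCount: number;
};

async function flushOnce(
  queue: QueueEntry[],
  process: (raw: RawPageview) => Promise<ProcessedPageview>, // geo lookup + transforms
  insert: (rows: ProcessedPageview[]) => Promise<void> // ClickHouse bulk insert
): Promise<void> {
  const batch = queue.splice(0, queue.length);
  // Only process entries that don't already carry a cached result
  for (const entry of batch) {
    if (!entry.processed) {
      entry.processed = await process(entry.raw);
    }
  }
  try {
    await insert(batch.map(e => e.processed!));
  } catch {
    // Re-queue with processed payloads preserved so the retry skips geo lookup
    queue.unshift(...batch.map(e => ({ ...e, retryCount: e.retryCount + 1 })));
  }
}
```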
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
server/src/services/tracker/pageviewQueue.ts (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build Client Images (ubuntu-24.04-arm, linux/arm64)
- GitHub Check: Build Client Images (ubuntu-latest, linux/amd64)
```ts
this.logger.error(error, `Error processing pageview queue, re-adding ${batch.length} items to the start of the queue`);
// Re-queue the failed batch so it can be retried on the next interval
this.queue.unshift(...batch);
```
Infinite retry loop and potential memory exhaustion.
The retry mechanism lacks safeguards for persistent failures:
- Unbounded retries: If ClickHouse insertion fails repeatedly (e.g., network issues, schema mismatch, or corrupt data in the batch), the same batch will retry indefinitely every 10 seconds.
- Memory leak: During prolonged ClickHouse outages, new pageviews continuously arrive while failed batches are re-queued, causing the queue to grow without bounds.
- No failure isolation: A single corrupt item in the batch causes the entire batch to fail and retry forever.
Consider implementing:
- Retry limit: Track retry attempts per batch (e.g., max 3 retries) and move permanently failed batches to a dead-letter queue or log them for investigation.
- Exponential backoff: Delay subsequent retries to reduce pressure on ClickHouse (e.g., 10s, 30s, 60s).
- Queue size limit: Implement a maximum queue size with alerting when approaching capacity.
- Item-level error handling: Consider catching per-item validation errors if ClickHouse supports partial batch inserts.
Example approach for retry tracking:

```ts
type TotalPayloadWithRetry = TotalPayload & {
  retryCount?: number;
};

// In the error handler:
const batchWithRetry = batch.map(item => ({
  ...item,
  retryCount: (item.retryCount || 0) + 1
}));

const itemsToRequeue = batchWithRetry.filter(item => (item.retryCount || 0) < 3);
const failedItems = batchWithRetry.filter(item => (item.retryCount || 0) >= 3);

if (itemsToRequeue.length > 0) {
  this.queue.unshift(...itemsToRequeue);
}
if (failedItems.length > 0) {
  this.logger.error({ count: failedItems.length }, "Dropping items after max retries");
  // Optionally: write to dead-letter queue/file for investigation
}
```
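And a sketch of how the backoff and queue-cap suggestions above could combine with that retry tracking; the delay schedule, capacity limit, and all names here are illustrative assumptions, not the project's code:

```ts
// Assumed constants: 10s/30s/60s backoff (as suggested above) and an arbitrary cap.
const BACKOFF_MS = [10_000, 30_000, 60_000];
const MAX_QUEUE_SIZE = 100_000;

let consecutiveFailures = 0;

// Grow the delay while inserts keep failing; reset to the base interval on success.
function nextFlushDelay(): number {
  return BACKOFF_MS[Math.min(consecutiveFailures, BACKOFF_MS.length - 1)];
}

function enqueueCapped<T>(queue: T[], items: T[], warn: (msg: string) => void): void {
  queue.push(...items);
  if (queue.length > MAX_QUEUE_SIZE) {
    // Shed the oldest entries and alert instead of growing without bound
    const dropped = queue.splice(0, queue.length - MAX_QUEUE_SIZE);
    warn(`Pageview queue over capacity; dropped ${dropped.length} oldest items`);
  }
}

async function scheduleFlush(flush: () => Promise<void>): Promise<void> {
  try {
    await flush(); // throws if the batch insert failed
    consecutiveFailures = 0;
  } catch {
    consecutiveFailures++;
  }
  setTimeout(() => void scheduleFlush(flush), nextFlushDelay());
}
```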
@goldflag I can add handling so we don't retry forever if you think it's worth it?
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
This re-adds pageviews to the queue if there is an error inserting into ClickHouse.