server/src/services/tracker/pageviewQueue.ts (4 changes: 3 additions & 1 deletion)

```diff
@@ -124,7 +124,9 @@ class PageviewQueue {
         format: "JSONEachRow",
       });
     } catch (error) {
-      this.logger.error(error, "Error processing pageview queue");
+      this.logger.error(error, `Error processing pageview queue, re-adding ${batch.length} items to the start of the queue`);
+      // Re-queue the failed batch so it can be retried on the next interval
+      this.queue.unshift(...batch);
```
coderabbitai (bot, Contributor) commented on Nov 18, 2025:
⚠️ Potential issue | 🟠 Major

Infinite retry loop and potential memory exhaustion.

The retry mechanism lacks safeguards for persistent failures:

  1. Unbounded retries: If ClickHouse insertion fails repeatedly (e.g., network issues, schema mismatch, or corrupt data in the batch), the same batch will retry indefinitely every 10 seconds.
  2. Memory leak: During prolonged ClickHouse outages, new pageviews continuously arrive while failed batches are re-queued, causing the queue to grow without bounds.
  3. No failure isolation: A single corrupt item in the batch causes the entire batch to fail and retry forever.

Consider implementing:

  • Retry limit: Track retry attempts per batch (e.g., max 3 retries) and move permanently failed batches to a dead-letter queue or log them for investigation.
  • Exponential backoff: Delay subsequent retries to reduce pressure on ClickHouse (e.g., 10s, 30s, 60s).
  • Queue size limit: Implement a maximum queue size with alerting when approaching capacity.
  • Item-level error handling: Consider catching per-item validation errors if ClickHouse supports partial batch inserts.

Example approach for retry tracking (note that `batch` needs to be typed as `TotalPayloadWithRetry[]` for `item.retryCount` to type-check):

```typescript
type TotalPayloadWithRetry = TotalPayload & {
  retryCount?: number;
};

// In the error handler (with batch: TotalPayloadWithRetry[]):
const batchWithRetry = batch.map(item => ({
  ...item,
  retryCount: (item.retryCount ?? 0) + 1,
}));

// retryCount is always set after the map above, so no fallback is needed
const itemsToRequeue = batchWithRetry.filter(item => item.retryCount < 3);
const failedItems = batchWithRetry.filter(item => item.retryCount >= 3);

if (itemsToRequeue.length > 0) {
  this.queue.unshift(...itemsToRequeue);
}
if (failedItems.length > 0) {
  this.logger.error({ count: failedItems.length }, "Dropping items after max retries");
  // Optionally: write to a dead-letter queue/file for investigation
}
```
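The backoff and queue-cap suggestions can be sketched in the same spirit; the delay schedule, the cap, and the helper names below are illustrative assumptions, not part of the PR:

```typescript
// Illustrative sketch: an exponential-backoff delay schedule and a bounded
// queue. RETRY_DELAYS_MS and the helper names are assumptions, not PR code.
const RETRY_DELAYS_MS = [10_000, 30_000, 60_000];

function nextRetryDelay(retryCount: number): number {
  // Clamp to the last delay once the schedule is exhausted.
  const index = Math.min(retryCount, RETRY_DELAYS_MS.length - 1);
  return RETRY_DELAYS_MS[index];
}

function enqueueBounded<T>(queue: T[], items: T[], maxSize: number): T[] {
  // Append new items, then drop the oldest entries past the cap, returning
  // the dropped items so the caller can log or dead-letter them.
  queue.push(...items);
  const overflow = queue.length - maxSize;
  return overflow > 0 ? queue.splice(0, overflow) : [];
}
```

Dropping the oldest items under pressure is one policy choice; dropping the newest, or refusing new pageviews at the tracker endpoint, would also bound memory.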

Contributor (PR author) commented:

@goldflag I can add handling so we don't retry forever, if you think it's worth it?

coderabbitai (bot) replied:

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Collaborator replied:

Yeah, maybe we can retry 3 times. What were the issues you were seeing, though? I wouldn't expect ClickHouse to be flaky in this way.

Contributor (PR author) replied:

Sure, can do.

The only issue was during a restart of ClickHouse. For example, I added some resource limits to the k8s pod, which recreates the pod, and it was down for a short while.

```diff
     } finally {
       this.processing = false;
     }
```