-
Notifications
You must be signed in to change notification settings - Fork 0
feat(core): Add comprehensive retry mechanism and batch fallback for Gemini embeddings #44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 2 commits
e2ec636
00d73bd
31da1ea
740cd79
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -16,6 +16,11 @@ export abstract class Embedding { | |||||||||||||||||
| * @returns Processed text | ||||||||||||||||||
| */ | ||||||||||||||||||
| protected preprocessText(text: string): string { | ||||||||||||||||||
| // Handle null/undefined by converting to empty string | ||||||||||||||||||
| if (text == null) { | ||||||||||||||||||
| return ' ' | ||||||||||||||||||
| } | ||||||||||||||||||
|
||||||||||||||||||
| // Handle null/undefined by converting to empty string | |
| if (text == null) { | |
| return ' ' | |
| } | |
| // Handle null/undefined by converting to single space | |
| if (text == null) { | |
| return ' ' | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -1,5 +1,6 @@ | ||||||||||||||||||||||||||||
| import type { ContentEmbedding } from '@google/genai' | ||||||||||||||||||||||||||||
| import type { EmbeddingVector } from './base-embedding' | ||||||||||||||||||||||||||||
| import { retry } from 'es-toolkit' | ||||||||||||||||||||||||||||
| import { GoogleGenAI } from '@google/genai' | ||||||||||||||||||||||||||||
| import { Embedding } from './base-embedding' | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
|
|
@@ -8,17 +9,23 @@ export interface GeminiEmbeddingConfig { | |||||||||||||||||||||||||||
| apiKey: string | ||||||||||||||||||||||||||||
| baseURL?: string // Optional custom API endpoint URL | ||||||||||||||||||||||||||||
| outputDimensionality?: number // Optional dimension override | ||||||||||||||||||||||||||||
| maxRetries?: number // Maximum number of retry attempts (default: 3) | ||||||||||||||||||||||||||||
| baseDelay?: number // Base delay in milliseconds for exponential backoff (default: 1000ms) | ||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| export class GeminiEmbedding extends Embedding { | ||||||||||||||||||||||||||||
| private client: GoogleGenAI | ||||||||||||||||||||||||||||
| private config: GeminiEmbeddingConfig | ||||||||||||||||||||||||||||
| private dimension: number = 3072 // Default dimension for gemini-embedding-001 | ||||||||||||||||||||||||||||
| protected maxTokens: number = 2048 // Maximum tokens for Gemini embedding models | ||||||||||||||||||||||||||||
| private maxRetries: number | ||||||||||||||||||||||||||||
| private baseDelay: number | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| constructor(config: GeminiEmbeddingConfig) { | ||||||||||||||||||||||||||||
| super() | ||||||||||||||||||||||||||||
| this.config = config | ||||||||||||||||||||||||||||
| this.maxRetries = config.maxRetries ?? 3 | ||||||||||||||||||||||||||||
| this.baseDelay = config.baseDelay ?? 1000 | ||||||||||||||||||||||||||||
| this.client = new GoogleGenAI({ | ||||||||||||||||||||||||||||
| apiKey: config.apiKey, | ||||||||||||||||||||||||||||
| ...(config.baseURL && { | ||||||||||||||||||||||||||||
|
|
@@ -52,6 +59,88 @@ export class GeminiEmbedding extends Embedding { | |||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| /** | ||||||||||||||||||||||||||||
| * Determine if an error is retryable | ||||||||||||||||||||||||||||
| * @param error Error object to check | ||||||||||||||||||||||||||||
| * @returns True if error is retryable | ||||||||||||||||||||||||||||
| */ | ||||||||||||||||||||||||||||
| private isRetryableError(error: unknown): boolean { | ||||||||||||||||||||||||||||
| if (typeof error !== 'object' || error === null) { | ||||||||||||||||||||||||||||
| return false | ||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| // Network errors | ||||||||||||||||||||||||||||
| const networkErrorCodes = ['ECONNREFUSED', 'ETIMEDOUT', 'ENOTFOUND', 'EAI_AGAIN'] | ||||||||||||||||||||||||||||
| if ('code' in error && typeof error.code === 'string' && networkErrorCodes.includes(error.code)) { | ||||||||||||||||||||||||||||
| return true | ||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| // HTTP status codes | ||||||||||||||||||||||||||||
| const retryableStatusCodes = [429, 500, 502, 503, 504] | ||||||||||||||||||||||||||||
| if ('status' in error && typeof error.status === 'number' && retryableStatusCodes.includes(error.status)) { | ||||||||||||||||||||||||||||
| return true | ||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| // Error message patterns | ||||||||||||||||||||||||||||
| const errorMessage = ('message' in error && typeof error.message === 'string') | ||||||||||||||||||||||||||||
| ? error.message.toLowerCase() | ||||||||||||||||||||||||||||
| : '' | ||||||||||||||||||||||||||||
| const retryablePatterns = [ | ||||||||||||||||||||||||||||
| 'rate limit', | ||||||||||||||||||||||||||||
| 'quota exceeded', | ||||||||||||||||||||||||||||
| 'service unavailable', | ||||||||||||||||||||||||||||
| 'timeout', | ||||||||||||||||||||||||||||
| 'connection', | ||||||||||||||||||||||||||||
| ] | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| return retryablePatterns.some(pattern => errorMessage.includes(pattern)) | ||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||
|
Comment on lines
+66
to
+96
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. π’ Well-designed error classification logic The Minor suggestion: Consider making the retryable patterns configurable for advanced users who may need custom error handling. Also, the error message matching is case-insensitive which is good, but consider using more specific patterns to avoid false positives (e.g., "timeout" might match unrelated messages). |
||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| /** | ||||||||||||||||||||||||||||
| * Execute operation with retry logic using es-toolkit retry | ||||||||||||||||||||||||||||
| * Only retries on retryable errors (network errors, rate limits, server errors) | ||||||||||||||||||||||||||||
| * @param operation Operation to execute | ||||||||||||||||||||||||||||
| * @param context Context string for error messages | ||||||||||||||||||||||||||||
| * @returns Operation result | ||||||||||||||||||||||||||||
| */ | ||||||||||||||||||||||||||||
| private async executeWithRetry<T>( | ||||||||||||||||||||||||||||
| operation: () => Promise<T>, | ||||||||||||||||||||||||||||
| context: string, | ||||||||||||||||||||||||||||
| ): Promise<T> { | ||||||||||||||||||||||||||||
| try { | ||||||||||||||||||||||||||||
| return await retry( | ||||||||||||||||||||||||||||
| async () => { | ||||||||||||||||||||||||||||
| try { | ||||||||||||||||||||||||||||
| return await operation() | ||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||
| catch (error) { | ||||||||||||||||||||||||||||
| // If error is not retryable, throw a special error to stop retries | ||||||||||||||||||||||||||||
| if (!this.isRetryableError(error)) { | ||||||||||||||||||||||||||||
| // Wrap in a non-retryable marker | ||||||||||||||||||||||||||||
| const nonRetryableError = new Error(`${context}: ${error instanceof Error ? error.message : 'Unknown error'}`) | ||||||||||||||||||||||||||||
| ;(nonRetryableError as any).__nonRetryable = true | ||||||||||||||||||||||||||||
| throw nonRetryableError | ||||||||||||||||||||||||||||
amondnet marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||
| // Re-throw retryable errors to trigger retry | ||||||||||||||||||||||||||||
| throw error | ||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||
| }, | ||||||||||||||||||||||||||||
| { | ||||||||||||||||||||||||||||
| retries: this.maxRetries, | ||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||
| retries: this.maxRetries, | |
| retries: this.maxRetries, | |
| // 'attempts' is a 0-based retry index: first retry uses baseDelay, subsequent retries double the delay |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
π¬ Need further discussion
You're right that the attempts parameter is 0-based. The current behavior is:
- First retry: baseDelay Γ 2^0 = 1000ms
- Second retry: baseDelay Γ 2^1 = 2000ms
- Third retry: baseDelay Γ 2^2 = 4000ms
This matches the intended exponential backoff sequence. However, the parameter naming could be clearer. Should we:
- Add a comment explaining this (as you suggested)
- Verify this matches es-toolkit's documented behavior
@lee-dora also raised a related concern. Would value your thoughts on the best approach, @Copilot!
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
π‘ Incorrect delay calculation for exponential backoff
The delay calculation uses attempts as the exponent, but the retry function from es-toolkit passes the attempt number starting from 0. This means:
- Attempt 0: delay = 1000 * 2^0 = 1000ms β
- Attempt 1: delay = 1000 * 2^1 = 2000ms β
- Attempt 2: delay = 1000 * 2^2 = 4000ms β
However, the delay should be applied before the retry attempt, not after. The current implementation may not provide the expected backoff sequence described in the PR description (1s β 2s β 4s β 8s).
Recommendation: Verify the behavior of es-toolkit's retry function to ensure the delay callback works as expected. Consider using attempts + 1 if the library's attempt numbering doesn't match your expectation.
| delay: (attempts) => Math.min(this.baseDelay * Math.pow(2, attempts), 10000), | |
| delay: (attempts) => Math.min(this.baseDelay * Math.pow(2, attempts + 1), 10000), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
π¬ Need further discussion
The current implementation uses attempts as-is (0-based), which gives us:
- Attempt 0: 1000ms Γ 2^0 = 1000ms
- Attempt 1: 1000ms Γ 2^1 = 2000ms
- Attempt 2: 1000ms Γ 2^2 = 4000ms
This matches the intended sequence (1s β 2s β 4s). Using attempts + 1 would shift this to (2s β 4s β 8s), which would delay the first retry unnecessarily.
However, I see @Copilot also noted the naming could be clearer. The behavior is correct, but should we:
- Keep current implementation (working as intended)
- Add a comment explaining the 0-based indexing
- Verify es-toolkit's retry behavior in their docs
Would appreciate your input, @lee-dora!
Outdated
Copilot
AI
Oct 31, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fallback mechanism processes texts sequentially (awaiting each embed call in a loop). This could significantly impact performance for large batches. Consider using Promise.all() to process individual requests concurrently while maintaining order, or at least document why sequential processing is necessary here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
π¬ Need further discussion
Great point about performance! The sequential processing was intentional to avoid triggering rate limits during fallback, but there are tradeoffs:
Sequential (current):
- β Avoids rate limiting
- β Added 100ms delays between requests
- β Slower for large batches
Promise.all (parallel):
- β Much faster
- β Could trigger rate limiting (defeats the fallback purpose)
- β May exhaust quota quickly
Since the fallback only runs when batch processing fails (likely already hitting limits), sequential with delays seems safer. However, we could:
- Make this configurable
- Use a semaphore for controlled parallelism (e.g., 5 concurrent requests)
- Document the tradeoff
What do you think would be best for production use, @Copilot?
amondnet marked this conversation as resolved.
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
π‘ Missing validation for zero retries edge case
While the validation checks for negative values, it allows maxRetries = 0 which disables retries entirely. This may be intentional, but it's not clear from the documentation or interface whether this is a supported use case.
Recommendation: Either explicitly document that 0 is allowed (to disable retries), or validate against it if it's not intended.
| setMaxRetries(maxRetries: number): void { | |
| if (maxRetries < 0) { | |
| throw new Error('maxRetries must be non-negative') | |
| } | |
| this.maxRetries = maxRetries | |
| } | |
| setMaxRetries(maxRetries: number): void { | |
| if (maxRetries < 0) { | |
| throw new Error('maxRetries must be non-negative') | |
| } | |
| // Note: maxRetries = 0 is allowed to disable retry mechanism | |
| this.maxRetries = maxRetries | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
π’ Excellent documentation additions
The documentation clearly explains:
Minor improvement: Consider adding a section about when to adjust retry settings (e.g., "Increase maxRetries for unreliable networks, decrease for faster failure detection").