Merged
Changes from 2 commits
39 changes: 39 additions & 0 deletions packages/core/README.md
@@ -238,6 +238,45 @@ const context = new Context({
})
```

### Using Gemini Embeddings with Retry Configuration

```typescript
import { Context, MilvusVectorDatabase, GeminiEmbedding } from '@pleaseai/context-please-core'

// Initialize with Gemini embedding provider
const embedding = new GeminiEmbedding({
apiKey: process.env.GEMINI_API_KEY || 'your-gemini-api-key',
model: 'gemini-embedding-001',
outputDimensionality: 768, // Optional: Matryoshka Representation Learning support (256, 768, 1536, 3072)
maxRetries: 3, // Optional: Maximum retry attempts (default: 3)
baseDelay: 1000 // Optional: Base delay in ms for exponential backoff (default: 1000ms)
})

const vectorDatabase = new MilvusVectorDatabase({
address: process.env.MILVUS_ADDRESS || 'localhost:19530',
token: process.env.MILVUS_TOKEN || ''
})

const context = new Context({
embedding,
vectorDatabase
})

// The retry mechanism automatically handles:
// - Rate limit errors (429)
// - Server errors (500, 502, 503, 504)
// - Network errors (ECONNREFUSED, ETIMEDOUT, ENOTFOUND, EAI_AGAIN)
// - Transient API failures with exponential backoff (1s → 2s → 4s → 8s, capped at 10s)

// Update retry configuration at runtime
embedding.setMaxRetries(5)
embedding.setBaseDelay(2000)

// Check current retry configuration
const retryConfig = embedding.getRetryConfig()
console.log(`Max retries: ${retryConfig.maxRetries}, Base delay: ${retryConfig.baseDelay}ms`)
```

Comment on lines +250 to +267

🟢 Excellent documentation additions

The documentation clearly explains:

  • Configuration options with defaults
  • Matryoshka Representation Learning support
  • Runtime configuration methods
  • Example usage

Minor improvement: Consider adding a section about when to adjust retry settings (e.g., "Increase maxRetries for unreliable networks, decrease for faster failure detection").

### Custom File Filtering

1 change: 1 addition & 0 deletions packages/core/package.json
@@ -33,6 +33,7 @@
"@qdrant/js-client-grpc": "^1.15.1",
"@qdrant/js-client-rest": "^1.15.1",
"@zilliz/milvus2-sdk-node": "^2.5.10",
"es-toolkit": "^1.41.0",
"faiss-node": "^0.5.1",
"fs-extra": "^11.0.0",
"glob": "^10.0.0",
5 changes: 5 additions & 0 deletions packages/core/src/embedding/base-embedding.ts
@@ -16,6 +16,11 @@ export abstract class Embedding {
* @returns Processed text
*/
protected preprocessText(text: string): string {
// Handle null/undefined by converting to empty string
if (text == null) {
return ' '
}


🟢 Good null safety addition

This change improves robustness by handling null/undefined inputs gracefully. Consider adding JSDoc to document this behavior for API consumers.

Suggested change
-    // Handle null/undefined by converting to empty string
+    // Handle null/undefined by converting to single space
     if (text == null) {
       return ' '
     }


✅ Applied in 31da1ea

  • File: packages/core/src/embedding/base-embedding.ts
  • Change: Updated comment to "single space" for clarity

Thanks for the suggestion, @lee-dora!


// Replace empty string with single space
if (text === '') {
return ' '
return ' '
190 changes: 165 additions & 25 deletions packages/core/src/embedding/gemini-embedding.ts
@@ -1,5 +1,6 @@
import type { ContentEmbedding } from '@google/genai'
import type { EmbeddingVector } from './base-embedding'
import { retry } from 'es-toolkit'
import { GoogleGenAI } from '@google/genai'
import { Embedding } from './base-embedding'

@@ -8,17 +9,23 @@ export interface GeminiEmbeddingConfig {
apiKey: string
baseURL?: string // Optional custom API endpoint URL
outputDimensionality?: number // Optional dimension override
maxRetries?: number // Maximum number of retry attempts (default: 3)
baseDelay?: number // Base delay in milliseconds for exponential backoff (default: 1000ms)
}

export class GeminiEmbedding extends Embedding {
private client: GoogleGenAI
private config: GeminiEmbeddingConfig
private dimension: number = 3072 // Default dimension for gemini-embedding-001
protected maxTokens: number = 2048 // Maximum tokens for Gemini embedding models
private maxRetries: number
private baseDelay: number

constructor(config: GeminiEmbeddingConfig) {
super()
this.config = config
this.maxRetries = config.maxRetries ?? 3
this.baseDelay = config.baseDelay ?? 1000
this.client = new GoogleGenAI({
apiKey: config.apiKey,
...(config.baseURL && {
@@ -52,6 +59,88 @@ export class GeminiEmbedding extends Embedding {
}
}

/**
* Determine if an error is retryable
* @param error Error object to check
* @returns True if error is retryable
*/
private isRetryableError(error: unknown): boolean {
if (typeof error !== 'object' || error === null) {
return false
}

// Network errors
const networkErrorCodes = ['ECONNREFUSED', 'ETIMEDOUT', 'ENOTFOUND', 'EAI_AGAIN']
if ('code' in error && typeof error.code === 'string' && networkErrorCodes.includes(error.code)) {
return true
}

// HTTP status codes
const retryableStatusCodes = [429, 500, 502, 503, 504]
if ('status' in error && typeof error.status === 'number' && retryableStatusCodes.includes(error.status)) {
return true
}

// Error message patterns
const errorMessage = ('message' in error && typeof error.message === 'string')
? error.message.toLowerCase()
: ''
const retryablePatterns = [
'rate limit',
'quota exceeded',
'service unavailable',
'timeout',
'connection',
]

return retryablePatterns.some(pattern => errorMessage.includes(pattern))
}
Comment on lines +66 to +96


🟢 Well-designed error classification logic

The isRetryableError method effectively categorizes errors into retryable and non-retryable categories. Good coverage of network errors, HTTP status codes, and error message patterns.

Minor suggestion: Consider making the retryable patterns configurable for advanced users who may need custom error handling.

Also, the error message matching is case-insensitive which is good, but consider using more specific patterns to avoid false positives (e.g., "timeout" might match unrelated messages).


/**
* Execute operation with retry logic using es-toolkit retry
* Only retries on retryable errors (network errors, rate limits, server errors)
* @param operation Operation to execute
* @param context Context string for error messages
* @returns Operation result
*/
private async executeWithRetry<T>(
operation: () => Promise<T>,
context: string,
): Promise<T> {
try {
return await retry(
async () => {
try {
return await operation()
}
catch (error) {
// If error is not retryable, throw a special error to stop retries
if (!this.isRetryableError(error)) {
// Wrap in a non-retryable marker
const nonRetryableError = new Error(`${context}: ${error instanceof Error ? error.message : 'Unknown error'}`)
;(nonRetryableError as any).__nonRetryable = true
throw nonRetryableError
}
// Re-throw retryable errors to trigger retry
throw error
}
},
{
retries: this.maxRetries,
Copilot AI Oct 31, 2025

The delay calculation uses the attempts parameter directly, which starts from 0. For the first retry (attempts=0), the delay is baseDelay * 2^0 = baseDelay. With each subsequent retry the delays are baseDelay, 2×baseDelay, 4×baseDelay, etc. This is correct exponential backoff, but the parameter name attempts is misleading since it is actually the 0-based retry index. Consider documenting this, or using attempts - 1 if the intention is for the first retry to wait baseDelay/2.

Suggested change
-        retries: this.maxRetries,
+        retries: this.maxRetries,
+        // 'attempts' is a 0-based retry index: first retry uses baseDelay, subsequent retries double the delay

💬 Need further discussion

You're right that the attempts parameter is 0-based. The current behavior is:

  • First retry: baseDelay × 2^0 = 1000ms
  • Second retry: baseDelay × 2^1 = 2000ms
  • Third retry: baseDelay × 2^2 = 4000ms

This matches the intended exponential backoff sequence. However, the parameter naming could be clearer. Should we:

  1. Add a comment explaining this (as you suggested)
  2. Verify this matches es-toolkit's documented behavior

@lee-dora also raised a related concern. Would value your thoughts on the best approach, @Copilot!

delay: (attempts) => Math.min(this.baseDelay * Math.pow(2, attempts), 10000),


🟡 Incorrect delay calculation for exponential backoff

The delay calculation uses attempts as the exponent, but the retry function from es-toolkit passes the attempt number starting from 0. This means:

  • Attempt 0: delay = 1000 * 2^0 = 1000ms ✅
  • Attempt 1: delay = 1000 * 2^1 = 2000ms ✅
  • Attempt 2: delay = 1000 * 2^2 = 4000ms ✅

However, the delay should be applied before the retry attempt, not after. The current implementation may not provide the expected backoff sequence described in the PR description (1s → 2s → 4s → 8s).

Recommendation: Verify the behavior of es-toolkit's retry function to ensure the delay callback works as expected. Consider using attempts + 1 if the library's attempt numbering doesn't match your expectation.

Suggested change
-      delay: (attempts) => Math.min(this.baseDelay * Math.pow(2, attempts), 10000),
+      delay: (attempts) => Math.min(this.baseDelay * Math.pow(2, attempts + 1), 10000),

💬 Need further discussion

The current implementation uses attempts as-is (0-based), which gives us:

  • Attempt 0: 1000ms × 2^0 = 1000ms
  • Attempt 1: 1000ms × 2^1 = 2000ms
  • Attempt 2: 1000ms × 2^2 = 4000ms

This matches the intended sequence (1s → 2s → 4s). Using attempts + 1 would shift this to (2s → 4s → 8s), which would delay the first retry unnecessarily.

However, I see @Copilot also noted the naming could be clearer. The behavior is correct, but should we:

  1. Keep current implementation (working as intended)
  2. Add a comment explaining the 0-based indexing
  3. Verify es-toolkit's retry behavior in their docs

Would appreciate your input, @lee-dora!

},
)
}
catch (error) {
// If it's a non-retryable error, throw as-is
if (typeof error === 'object' && error !== null && (error as any).__nonRetryable) {
throw error
}
// If it was retryable but still failed after all retries
throw new Error(`${context}: ${error instanceof Error ? error.message : 'Unknown error'}`)
}
}

async detectDimension(): Promise<number> {
// Gemini doesn't need dynamic detection, return configured dimension
return this.dimension
@@ -61,7 +150,7 @@
const processedText = this.preprocessText(text)
const model = this.config.model || 'gemini-embedding-001'

try {
return this.executeWithRetry(async () => {
const response = await this.client.models.embedContent({
model,
contents: processedText,
@@ -78,41 +167,59 @@
vector: response.embeddings[0].values,
dimension: response.embeddings[0].values.length,
}
}
catch (error) {
throw new Error(`Gemini embedding failed: ${error instanceof Error ? error.message : 'Unknown error'}`)
}
}, 'Gemini embedding failed')
}

async embedBatch(texts: string[]): Promise<EmbeddingVector[]> {
const processedTexts = this.preprocessTexts(texts)
const model = this.config.model || 'gemini-embedding-001'

// Try batch processing with retry logic
try {
const response = await this.client.models.embedContent({
model,
contents: processedTexts,
config: {
outputDimensionality: this.config.outputDimensionality || this.dimension,
},
})
return await this.executeWithRetry(async () => {
const response = await this.client.models.embedContent({
model,
contents: processedTexts,
config: {
outputDimensionality: this.config.outputDimensionality || this.dimension,
},
})

if (!response.embeddings) {
throw new Error('Gemini API returned invalid response')
}
if (!response.embeddings) {
throw new Error('Gemini API returned invalid response')
}

return response.embeddings.map((embedding: ContentEmbedding) => {
if (!embedding.values) {
throw new Error('Gemini API returned invalid embedding data')
}
return {
vector: embedding.values,
dimension: embedding.values.length,
}
})
}, 'Gemini batch embedding failed')
}
catch (batchError) {
// Fallback: Process individually if batch fails after all retries
// This handles cases where batch processing is temporarily unavailable
// or when individual requests are more reliable
const results: EmbeddingVector[] = []

return response.embeddings.map((embedding: ContentEmbedding) => {
if (!embedding.values) {
throw new Error('Gemini API returned invalid embedding data')
for (const text of processedTexts) {
try {
const result = await this.embed(text)
results.push(result)
}
return {
vector: embedding.values,
dimension: embedding.values.length,
catch (individualError) {
// If individual request also fails, re-throw the error with cause
const error = new Error('Gemini batch embedding failed (both batch and individual attempts failed)')
;(error as any).cause = individualError
throw error
}
})
}
catch (error) {
throw new Error(`Gemini batch embedding failed: ${error instanceof Error ? error.message : 'Unknown error'}`)
}
Copilot AI Oct 31, 2025

The fallback mechanism processes texts sequentially (awaiting each embed call in a loop). This could significantly impact performance for large batches. Consider using Promise.all() to process individual requests concurrently while maintaining order, or at least document why sequential processing is necessary here.

💬 Need further discussion

Great point about performance! The sequential processing was intentional to avoid triggering rate limits during fallback, but there are tradeoffs:

Sequential (current):

  • ✅ Avoids rate limiting
  • ✅ Added 100ms delays between requests
  • ❌ Slower for large batches

Promise.all (parallel):

  • ✅ Much faster
  • ❌ Could trigger rate limiting (defeats the fallback purpose)
  • ❌ May exhaust quota quickly

Since the fallback only runs when batch processing fails (likely already hitting limits), sequential with delays seems safer. However, we could:

  1. Make this configurable
  2. Use a semaphore for controlled parallelism (e.g., 5 concurrent requests)
  3. Document the tradeoff

What do you think would be best for production use, @Copilot?


return results
}
}

@@ -178,4 +285,37 @@
const supportedDimensions = this.getSupportedDimensions()
return supportedDimensions.includes(dimension)
}

/**
* Get current retry configuration
* @returns Object containing maxRetries and baseDelay
*/
getRetryConfig(): { maxRetries: number, baseDelay: number } {
return {
maxRetries: this.maxRetries,
baseDelay: this.baseDelay,
}
}

/**
* Set maximum number of retry attempts
* @param maxRetries Maximum retry attempts
*/
setMaxRetries(maxRetries: number): void {
if (maxRetries < 0) {
throw new Error('maxRetries must be non-negative')
}
this.maxRetries = maxRetries
}
Comment on lines +318 to +323


🟡 Missing validation for zero retries edge case

While the validation checks for negative values, it allows maxRetries = 0 which disables retries entirely. This may be intentional, but it's not clear from the documentation or interface whether this is a supported use case.

Recommendation: Either explicitly document that 0 is allowed (to disable retries), or validate against it if it's not intended.

Suggested change
   setMaxRetries(maxRetries: number): void {
     if (maxRetries < 0) {
       throw new Error('maxRetries must be non-negative')
     }
+    // Note: maxRetries = 0 is allowed to disable retry mechanism
     this.maxRetries = maxRetries
   }


/**
* Set base delay for exponential backoff
* @param baseDelay Base delay in milliseconds
*/
setBaseDelay(baseDelay: number): void {
if (baseDelay <= 0) {
throw new Error('baseDelay must be positive')
}
this.baseDelay = baseDelay
}
}