Skip to content

Conversation

@NathanFlurry
Copy link
Member

No description provided.

@vercel
Copy link

vercel bot commented Nov 13, 2025

Deployment failed with the following error:

Too many requests - try again in 3 minutes (more than 120, code: "api-deployments-flood-pro").

Learn More: https://vercel.com/docs/limits#rate-limits

Copy link
Member Author

NathanFlurry commented Nov 13, 2025


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Copy link

claude bot commented Nov 13, 2025

PR Review: Add Ability to Destroy Actors

Summary

This PR adds actor destruction functionality to RivetKit, allowing actors to be properly destroyed and resurrected with generation tracking. The implementation is comprehensive and well-tested, but there are several areas that need attention.


Critical Issues

1. Race Condition in Destroy Logic (High Priority)

Location: rivetkit-typescript/packages/cloudflare-workers/src/actor-driver.ts:230-250

The startDestroy method has a potential race condition:

startDestroy(actorId: string): void {
    const handler = this.#globalState.actors.get(doId);
    
    // Actor not loaded, nothing to destroy
    if (!handler || !handler.actor) {
        return;  // ⚠️ Actor exists in storage but isn't loaded - silent failure
    }
    
    // Check if already destroying
    if (handler.destroying) {
        return;
    }
    handler.destroying = true;
    // ...
}

Problem: If an actor exists in Durable Object storage but hasn't been loaded into memory yet, the destroy operation silently fails. This could lead to:

  • Actors that should be destroyed remaining active
  • Inconsistent state between expected and actual actor lifecycle
  • Confusion when destroy() is called but the actor continues functioning

Recommendation: Either:

  1. Load the actor if not present before destroying
  2. Mark as destroyed in storage even if not loaded
  3. Throw an error to indicate the operation wasn't completed

2. Missing Error Handling in Background KV Deletion (Medium Priority)

Location: rivetkit-typescript/packages/cloudflare-workers/src/actor-driver.ts:255-258

doState.ctx.waitUntil(
    env.ACTOR_KV.delete(GLOBAL_KV_KEYS.actorMetadata(actorId)),
);

Problem: The background KV deletion has no error handling. If this fails:

  • The actor will be destroyed locally but metadata remains in KV
  • Future lookups might incorrectly think the actor still exists
  • No logging or notification of the failure

Recommendation: Add error handling and logging:

doState.ctx.waitUntil(
    env.ACTOR_KV.delete(GLOBAL_KV_KEYS.actorMetadata(actorId))
        .catch(err => {
            this.#log.error({
                msg: "failed to delete actor metadata from KV",
                actorId,
                error: stringifyError(err),
            });
        })
);

3. Generation Mismatch Error Handling (Medium Priority)

Location: rivetkit-typescript/packages/cloudflare-workers/src/actor-driver.ts:144-147

if (generation !== expectedGeneration) {
    throw new Error(
        `Actor ${actorId} generation mismatch - expected ${expectedGeneration}, got ${generation}`,
    );
}

Problem: This throws a generic Error instead of a proper ActorError. This means:

  • Client-side error handling won't recognize this as an actor-specific error
  • Error won't have proper group and code for catching
  • Not following the project's error handling patterns

Recommendation: Create a specific error type:

export class ActorGenerationMismatch extends ActorError {
    constructor(actorId: string, expected: number, actual: number) {
        super(
            "actor",
            "generation_mismatch",
            `Actor ${actorId} generation mismatch - expected ${expected}, got ${actual}`,
            { public: true, metadata: { actorId, expected, actual } }
        );
    }
}

Code Quality Issues

4. Inconsistent SQL Schema Comments (Low Priority)

Location: rivetkit-typescript/packages/cloudflare-workers/src/actor-handler-do.ts:92-115

The SQL table creation comments are helpful, but the schema definition could be clearer:

CREATE TABLE IF NOT EXISTS _rivetkit_metadata(
    id INTEGER PRIMARY KEY CHECK (id = 1),  // Not obvious why id=1 constraint
    name TEXT NOT NULL,
    key TEXT NOT NULL,
    destroyed INTEGER DEFAULT 0,
    generation INTEGER DEFAULT 0
);

Recommendation: Add a comment explaining the id=1 constraint ensures single-row table.


5. Potential Memory Leak in Global State (Medium Priority)

Location: rivetkit-typescript/packages/cloudflare-workers/src/actor-driver.ts:30-42

#dos: Map<string, DurableObjectGlobalState> = new Map();
#actors: Map<string, ActorHandler> = new Map();

Problem: The #dos map grows indefinitely with DO IDs. While Cloudflare Workers have memory isolation per request, in long-lived scenarios this could accumulate.

Recommendation: Consider cleanup strategy or document the lifecycle expectations.


6. Type Safety in SQL Queries (Low Priority)

Location: Multiple files using sql.exec()

The SQL queries use type assertions without validation:

const name = result.value[0] as string;
const key = JSON.parse(result.value[1] as string) as ActorKey;
const destroyed = result.value[2] as number;
const generation = result.value[3] as number;

Recommendation: Add runtime validation for critical fields:

invariant(typeof name === 'string', 'name must be string');
invariant(typeof destroyed === 'number', 'destroyed must be number');

Security Concerns

7. Actor Key Storage Security (Low Priority)

Location: rivetkit-typescript/packages/cloudflare-workers/src/actor-handler-do.ts:132-136

Actor keys are stored as JSON strings in SQLite:

key TEXT NOT NULL,
// ...
const key = JSON.parse(result.value[1] as string) as ActorKey;

Analysis: This is acceptable since:

  • Durable Objects have isolated storage per DO
  • Keys are not sensitive credentials (they're identifiers)
  • However, document that keys should not contain sensitive data

Recommendation: Add a comment or documentation noting that actor keys should not contain sensitive information.


Performance Considerations

8. Serial KV Operations (Medium Priority)

Location: rivetkit-typescript/packages/cloudflare-workers/src/actor-driver.ts:191-198, 214-221

async kvBatchPut(actorId: string, entries: [Uint8Array, Uint8Array][]): Promise<void> {
    const sql = this.#getDOCtx(actorId).storage.sql;
    for (const [key, value] of entries) {
        kvPut(sql, key, value);  // Sequential INSERTs
    }
}

Problem: Each kvPut is a separate SQL INSERT. For large batches, this is inefficient.

Recommendation: Use a single SQL transaction with multiple inserts:

sql.exec('BEGIN TRANSACTION');
try {
    for (const [key, value] of entries) {
        kvPut(sql, key, value);
    }
    sql.exec('COMMIT');
} catch (err) {
    sql.exec('ROLLBACK');
    throw err;
}

9. Full Table Scan in kvListPrefix (High Priority)

Location: rivetkit-typescript/packages/cloudflare-workers/src/kv_query.ts:31-47

export function kvListPrefix(sql: SqlStorage, prefix: Uint8Array): [Uint8Array, Uint8Array][] {
    const cursor = sql.exec("SELECT key, value FROM _rivetkit_kv_storage");  // ⚠️ Full table scan!
    const entries: [Uint8Array, Uint8Array][] = [];
    
    for (const row of cursor.raw()) {
        const key = toUint8Array(row[0]);
        const value = toUint8Array(row[1]);
        
        if (hasPrefix(key, prefix)) {  // Filtering in application code
            entries.push([key, value]);
        }
    }
    return entries;
}

Problem: This does a full table scan and filters in memory. For actors with many KV entries, this is very inefficient.

Recommendation: Unfortunately, SQLite doesn't support BLOB prefix matching well. Consider:

  1. Adding an index on key (though BLOB indexing is limited)
  2. Storing a hex-encoded version of keys for efficient prefix queries
  3. Documenting the performance characteristics
  4. If prefix queries are rare, this might be acceptable - but needs documentation

Test Coverage

10. Excellent Test Coverage

The test suite (actor-destroy.ts) is comprehensive:

  • Tests both connected and non-connected scenarios
  • Validates state persistence before/after destroy
  • Confirms lifecycle hooks are called
  • Tests actor resurrection with fresh state
  • Uses proper async waiting patterns

Minor Suggestion: Add test cases for:

  • Destroying an actor that's already being destroyed (idempotency)
  • Destroying an actor mid-action execution
  • Generation mismatch scenarios

Best Practices & Style

11. Good Use of Generation Tracking

The generation tracking pattern is excellent:

  • Prevents stale references
  • Enables safe resurrection
  • Clear separation of DO ID vs Actor ID

Observation: The buildActorId/parseActorId utilities are clean and well-documented.


12. Logging Consistency (Low Priority)

Most logging follows the structured logging pattern correctly:

logger().debug({ msg: "actor creating", key: req.key, generation });

However, some logs could include more context for debugging destroyed actors.


Documentation

13. Missing Documentation (Medium Priority)

Missing:

  1. Comments explaining the generation tracking pattern
  2. Documentation about when actors can/cannot be destroyed
  3. Behavior of destroy during active connections
  4. KV storage schema documentation

Recommendation: Add JSDoc comments to key methods and a README section explaining:

  • Actor lifecycle with destroy
  • Generation tracking
  • Storage cleanup behavior

Breaking Changes

14. Wrangler Configuration Change (Breaking Change)

Location: examples/*/wrangler.json

-"new_classes": ["ActorHandler"]
+"new_sqlite_classes": ["ActorHandler"]

Impact: This is a breaking change for existing deployments. Users will need to update their wrangler.json files.

Recommendation:

  • Document this in migration notes
  • Consider adding a migration guide
  • Mention in PR description

Summary

Strengths:

  • ✅ Comprehensive implementation of actor destruction
  • ✅ Strong test coverage
  • ✅ Good use of generation tracking to prevent stale references
  • ✅ Proper lifecycle hook integration

Must Fix Before Merge:

  1. Race condition in startDestroy when actor not loaded ([SVC-2555] Set up issue templates #1)
  2. Full table scan performance issue in kvListPrefix ([SOC-15] Fix error when running user-delete #9)
  3. Error handling for background KV deletion ([SVC-2479] Send cluster events to PostHog #2)

Should Fix:

  1. Use proper ActorError types ([SVC-2504] Fix 5 GB upload limit for local development from Cloudflare #3)
  2. Add SQL transaction batching for KV operations (Remove fallible uses of as in api services #8)
  3. Document breaking changes (Fix compile error in mm-worker #14)

Nice to Have:

  1. Additional test coverage for edge cases
  2. Documentation for generation tracking
  3. Memory cleanup strategy for global state

Overall, this is a solid implementation. The core functionality is well-designed, but the race condition and performance issues need to be addressed before merging.

Estimated Time to Fix Critical Issues: 2-3 hours

@vercel
Copy link

vercel bot commented Nov 13, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Comments Updated (UTC)
rivetkit-serverless Error Error Nov 13, 2025 10:35pm
3 Skipped Deployments
Project Deployment Preview Comments Updated (UTC)
rivet-cloud Ignored Ignored Nov 13, 2025 10:35pm
rivet-inspector Ignored Ignored Preview Nov 13, 2025 10:35pm
rivet-site Ignored Ignored Preview Nov 13, 2025 10:35pm

@vercel vercel bot temporarily deployed to Preview – rivetkit-serverless November 13, 2025 02:27 Inactive
@claude
Copy link

claude bot commented Nov 13, 2025

PR Review: Actor Destroy Functionality

Summary

This PR adds the ability to destroy actors in RivetKit, introducing a generation-based tracking system for Cloudflare Workers actors to handle actor lifecycle management including resurrection. The implementation is comprehensive with good test coverage.

🎯 Strengths

Architecture & Design

  • Generation tracking: Excellent solution using actor IDs formatted as {doId}:{generation} to prevent stale references after actor resurrection
  • Separation of concerns: Clean separation between DO state and actor state using WeakMap for proper garbage collection
  • SQL-based storage: Good migration from native Cloudflare KV to SQL storage for blob key support

Testing

  • Comprehensive test coverage with both connected and non-connected actor scenarios
  • Observer pattern in tests is clean for verifying async destroy behavior
  • Tests verify state cleanup, resurrection, and error cases

🐛 Potential Issues

Critical

  1. Race condition in loadActor (actor-driver.ts:132-161)

    if (actorState?.actorInstance) {
        return actorState.actorInstance;
    }

    If startDestroy is called between this check and returning the instance, you could return an actor that's being destroyed. Consider checking the destroying flag here.

  2. Generation mismatch handling (actor-driver.ts:183-188)

    if (generation !== expectedGeneration) {
        throw new Error(`Actor ${actorId} generation mismatch...`);
    }

    This will throw when loading a resurrected actor if the cached generation is stale. The error handling should be more graceful - perhaps trigger a reload or return a more specific error type.

  3. Memory leak in global state (actor-driver.ts:45-57)
    The #dos Map in CloudflareDurableObjectGlobalState never removes entries. In long-running workers, this could accumulate destroyed actor references. Consider adding cleanup in ActorGlobalState.reset():

    // In reset():
    this.destroying = false;
    // Should also remove from #dos Map somehow

Moderate

  1. SQL injection concerns (actor-kv.ts:34)

    const cursor = sql.exec("SELECT key, value FROM _rivetkit_kv_storage");

    The kvListPrefix function fetches ALL rows then filters in memory. For actors with many KV entries, this is very inefficient and could cause memory issues. Consider using SQL LIKE with proper escaping:

    // Convert prefix to hex and use SQL LIKE
    sql.exec("SELECT key, value FROM _rivetkit_kv_storage WHERE key LIKE ?", prefix + '%')
  2. Error handling in #callOnStopAsync (actor-driver.ts:293-321)
    This method is fire-and-forget with no error handling. If actor.onStop() throws, the destroy might partially complete. Wrap in try-catch and log errors properly:

    try {
        await actor.onStop("destroy");
    } catch (err) {
        logger().error({ msg: "error in onStop during destroy", err });
        // Consider whether to continue with cleanup or abort
    }
  3. Concurrent destroy handling (actor-driver.ts:242-264)
    The check if (actorState.destroying) and immediate return doesn't prevent double-destroy if called concurrently before the flag is set. This is probably fine given the fire-and-forget nature, but worth documenting.

  4. Missing await on KV delete (actor-driver.ts:312-315)

    doState.ctx.waitUntil(
        env.ACTOR_KV.delete(GLOBAL_KV_KEYS.actorMetadata(actorId)),
    );

    This is fire-and-forget which is fine, but if the KV delete fails, the metadata will be stale. Consider logging failures or implementing retry logic.

Minor

  1. Type safety in SQL results (actor-handler-do.ts:139-146)

    const name = result.value[0] as string;
    const key = JSON.parse(result.value[1] as string) as ActorKey;

    No validation that the JSON parse succeeds or that values are the expected types. Add runtime validation:

    if (!name || typeof name !== 'string') {
        throw new Error(`Invalid actor metadata: missing or invalid name`);
    }
  2. Logging improvements

    • Line actor-driver.ts:293 - Consider logging the actorId being destroyed for debugging
    • Line actor-handler-do.ts:252 - "actor is destroyed, cannot load" should include the actorId for correlation
  3. Inconsistent error messages

    • actor-driver.ts:157 uses "is destroyed"
    • actor-handler-do.ts:252 uses "is destroyed, cannot load"
    • Consider standardizing error messages for actor destroy scenarios

⚡ Performance Considerations

  1. SQL table scans (actor-kv.ts:34): As mentioned above, kvListPrefix does a full table scan which will be O(n) with number of KV entries. This could significantly impact actors with large state.

  2. No transaction boundaries: The SQL operations in create() and during destroy aren't wrapped in transactions. If operations fail mid-way, you could have inconsistent state. Consider:

    sql.exec("BEGIN TRANSACTION");
    try {
        // operations
        sql.exec("COMMIT");
    } catch {
        sql.exec("ROLLBACK");
        throw;
    }
  3. waitUntil usage: The background KV delete (actor-driver.ts:314) is good for performance but means metadata cleanup is eventually consistent. Document this behavior.

🔒 Security Considerations

  1. SQL injection: While Cloudflare's SQL API with parameterized queries is generally safe, ensure all user input goes through parameters, not string concatenation. The current code looks good on this front.

  2. Actor resurrection validation: The generation check prevents unauthorized access to destroyed actors - this is good security practice.

📝 Code Quality & Best Practices

Positive

  • Follows CLAUDE.md conventions (structured logging, error handling patterns)
  • Good use of TypeScript for type safety
  • Clear separation between DO lifecycle and actor lifecycle

Suggestions

  1. Add JSDoc comments for public APIs, especially:

    • buildActorId and parseActorId in actor-id.ts (though they do have good comments)
    • ActorGlobalState class and its properties
    • The create() method signature change
  2. Consider extracting constants (actor-handler-do.ts:137):

    const METADATA_QUERY = "SELECT name, key, destroyed, generation FROM _rivetkit_metadata WHERE id = 1";
  3. Wrangler.json change documentation: The change from new_classes to new_sqlite_classes should be noted in commit message or migration guide.

🧪 Test Coverage

Excellent coverage with:

  • Actor destruction without connection
  • Actor destruction with connection
  • State verification post-destroy
  • Actor resurrection verification
  • Observer pattern for async verification

Missing test cases to consider:

  • Concurrent destroy calls
  • Destroy during active request processing
  • Generation mismatch scenarios
  • Actor with very large KV state (performance test)
  • Failed destroy (onStop throws error)

📋 Recommendations

High Priority

  1. Fix the kvListPrefix performance issue with proper SQL LIKE query
  2. Add error handling to #callOnStopAsync
  3. Add race condition check for destroying flag in loadActor
  4. Add memory cleanup for #dos Map in global state

Medium Priority

  1. Add SQL transactions for state consistency
  2. Add runtime validation for SQL query results
  3. Add JSDoc documentation for public APIs

Low Priority

  1. Standardize error messages
  2. Extract SQL query constants
  3. Add additional edge case tests

✅ Verdict

This is a well-designed feature with good architecture and comprehensive testing. The generation-based tracking is a clever solution for handling actor resurrection. The critical issues identified are mostly around edge cases and performance optimization.

Recommendation: Approve with requested changes to address the race condition and performance issues. The memory leak and error handling should be fixed before merge.

Great work overall! 🚀

@NathanFlurry NathanFlurry force-pushed the 11-12-feat_rivet-engine_udb_key_parser branch from 72318fc to cc00181 Compare November 13, 2025 02:47
@NathanFlurry NathanFlurry force-pushed the 11-12-feat_rivetkit_add_ability_to_destroy_actors branch from 892e9cb to 90de47a Compare November 13, 2025 02:47
@claude
Copy link

claude bot commented Nov 13, 2025

PR Review: Add ability to destroy actors

Summary

This PR implements actor destruction functionality for RivetKit, adding the ability to explicitly destroy actors and clean up their state. The implementation includes generation tracking to handle actor resurrection and prevent stale references.

Code Quality & Architecture

Strengths

  1. Generation-based actor IDs: The {doId}:{generation} pattern in actor-id.ts is a clean solution for tracking actor lifecycle and preventing stale references after resurrection
  2. SQL migration to SQLite storage: Moving from Cloudflare's native KV API to SQLite tables provides better flexibility for blob keys (lines in actor-handler-do.ts:16-36)
  3. Comprehensive test coverage: The driver test suite includes tests for both connected and non-connected destroy scenarios with proper async validation
  4. Background cleanup: Using ctx.waitUntil() for KV cleanup prevents blocking the destroy operation

Areas for Improvement

1. Race condition in actor loading (rivetkit-typescript/packages/cloudflare-workers/src/actor-driver.ts:132-160)

// Check if actor is already loaded
let actorState = this.#globalState.getActorState(doState.ctx);
if (actorState?.actorInstance) {
    return actorState.actorInstance;
}

// Create new actor state if it doesn't exist
if (!actorState) {
    actorState = new ActorGlobalState();
    actorState.actorPromise = promiseWithResolvers();
    this.#globalState.setActorState(doState.ctx, actorState);
}

// Another request is already loading this actor, wait for it
if (actorState.actorPromise) {
    await actorState.actorPromise.promise;
    // ...
}

Issue: There's a potential race between checking actorState?.actorInstance and setting actorState.actorPromise. If two requests arrive simultaneously, both could pass the first check and create separate promises.

Recommendation: Initialize the promise immediately when creating actorState:

if (!actorState) {
    actorState = new ActorGlobalState();
    actorState.actorPromise = promiseWithResolvers();
    this.#globalState.setActorState(doState.ctx, actorState);
} else if (actorState.actorPromise) {
    // Already loading
    await actorState.actorPromise.promise;
    return actorState.actorInstance!;
}
// Continue with loading...

2. Generation validation inconsistency (rivetkit-typescript/packages/cloudflare-workers/src/actor-driver.ts:178-183)

// Check if generation matches
if (generation !== expectedGeneration) {
    throw new Error(
        `Actor ${actorId} generation mismatch - expected ${expectedGeneration}, got ${generation}`,
    );
}

Issue: This throws a generic Error instead of a proper ActorError. This is inconsistent with error handling elsewhere and won't be properly caught by client code expecting ActorError types.

Recommendation: Use ActorNotFound or create a new error type:

if (generation !== expectedGeneration) {
    throw new ActorNotFound(actorId);
}

3. Missing error handling in async destroy (rivetkit-typescript/packages/cloudflare-workers/src/actor-driver.ts:295-320)

async #callOnStopAsync(
    actorId: string,
    doId: string,
    actor: CoreAnyActorInstance,
) {
    // Stop
    await actor.onStop("destroy");
    // ... cleanup
}

Issue: There's no try-catch around the destroy logic. If onStop throws or cleanup fails, the actor may be left in an inconsistent state (destroying=true but not actually destroyed).

Recommendation: Add error handling and logging:

async #callOnStopAsync(
    actorId: string,
    doId: string,
    actor: CoreAnyActorInstance,
) {
    try {
        await actor.onStop("destroy");
        
        const doState = this.#globalState.getDOState(doId);
        const sql = doState.ctx.storage.sql;
        sql.exec("UPDATE _rivetkit_metadata SET destroyed = 1 WHERE 1=1");
        sql.exec("DELETE FROM _rivetkit_kv_storage");
        
        await doState.ctx.storage.deleteAlarm();
        
        const env = getCloudflareAmbientEnv();
        doState.ctx.waitUntil(
            env.ACTOR_KV.delete(GLOBAL_KV_KEYS.actorMetadata(actorId)),
        );
        
        const actorHandle = this.#globalState.getActorState(doState.ctx);
        actorHandle?.reset();
    } catch (error) {
        logger().error({ msg: "error during actor destroy", actorId, error });
        // Consider: should we still mark as destroyed? Or roll back?
    }
}

4. SQL injection vulnerability (rivetkit-typescript/packages/cloudflare-workers/src/actor-kv.ts)

While reviewing the diff, I notice SQL execution with blob data. Ensure all SQL queries use parameterized queries, not string concatenation. The implementation appears to use proper parameterization, but verify this throughout.

5. Unclear destroy vs destroyed state (rivetkit-typescript/packages/cloudflare-workers/src/actor-driver.ts:85-89)

/**
 * Indicates if `startDestroy` has been called.
 *
 * This is stored in memory instead of SQLite since the destroy may be cancelled.
 *
 * See the corresponding `destroyed` property in SQLite metadata.
 */
destroying: boolean = false;

Question: The comment mentions "destroy may be cancelled" but there's no cancellation logic in the implementation. Is this intended for future functionality? If not, this comment is misleading.

Performance Considerations

  1. SQL table scans: Queries like UPDATE _rivetkit_metadata SET destroyed = 1 WHERE 1=1 scan all rows. Since there's always exactly 1 row (enforced by CHECK (id = 1)), this is fine, but consider using WHERE id = 1 for clarity.

  2. KV cleanup in background: Good use of waitUntil() to avoid blocking, but there's no retry mechanism if the KV delete fails. Consider adding error handling or using Cloudflare's durable queues for reliability.

  3. Generation tracking overhead: Each actor ID now includes generation which adds parsing overhead. This seems acceptable given the safety benefits.

Security Concerns

  1. Destroyed state verification: The getForId method properly checks the destroying flag (line 264 in manager-driver.ts), preventing access to actors mid-destruction. Good practice.

  2. No authorization on destroy: The destroy action doesn't appear to have any authorization checks beyond what the actor itself implements. Ensure this is documented.

Test Coverage

The test suite in actor-destroy.ts is comprehensive:

  • ✅ Tests both connected and non-connected destroy flows
  • ✅ Validates state cleanup
  • ✅ Verifies actor resurrection with fresh state
  • ✅ Uses proper async waiting with vi.waitFor()

Missing tests:

  • Concurrent destroy requests on the same actor
  • Destroy during active connections
  • Generation mismatch scenarios
  • Failed onStop/cleanup scenarios

Minor Issues

  1. Typo in error class name (rivetkit-typescript/packages/rivetkit/src/actor/errors.ts:331)
export class RequestHandlerNotDfeined extends ActorError {

Should be RequestHandlerNotDefined

  1. Wrangler config migration: The change from new_classes to new_sqlite_classes in wrangler.json is correct for SQLite-backed DOs

  2. pnpm-lock.yaml: Removed unused typedoc dependency - good cleanup

Recommendations Summary

High Priority:

  1. Fix race condition in loadActor
  2. Add error handling in #callOnStopAsync
  3. Use proper ActorError types instead of generic Error

Medium Priority:
4. Add tests for concurrent/edge case scenarios
5. Clarify or remove "destroy may be cancelled" comment
6. Fix typo in RequestHandlerNotDfeined

Low Priority:
7. Use WHERE id = 1 in SQL updates for clarity
8. Consider retry mechanism for KV cleanup

Overall Assessment

This is a well-architected implementation of actor destruction with thoughtful consideration for generation tracking and state cleanup. The core logic is solid, but would benefit from additional error handling and edge case tests. The SQL migration is a good architectural choice for blob key support.

Recommendation: Approve with suggested improvements, particularly around error handling and race conditions.

🤖 Generated with Claude Code

@NathanFlurry NathanFlurry force-pushed the 11-12-feat_rivetkit_add_ability_to_destroy_actors branch from 90de47a to 6cd3c7d Compare November 13, 2025 02:50
@vercel
Copy link

vercel bot commented Nov 13, 2025

Deployment failed with the following error:

Too many requests - try again in 1 minute (more than 120, code: "api-deployments-flood-pro").

Learn More: https://vercel.com/docs/limits#rate-limits

@claude
Copy link

claude bot commented Nov 13, 2025

Pull Request Review: Actor Destruction Feature

I've completed a comprehensive review of this PR that adds actor destruction capabilities to RivetKit. Overall, this is a well-implemented feature with good architectural decisions. Below are my findings organized by category.


✅ Strengths

1. Excellent Generation-Based Architecture

The generation counter system is a clean solution for handling actor resurrection:

  • Actor IDs format: {doId}:{generation} (actor-id.ts:14)
  • Generation increments on resurrection (actor-handler-do.ts:323)
  • Prevents stale references to destroyed actors
  • Proper validation of generation mismatches (actor-driver.ts:188-192)

2. Proper State Management

The separation of concerns between memory and persistent state is well-designed:

  • destroying flag in memory for immediate checks (ActorGlobalState.destroying)
  • destroyed flag in SQLite for persistent state (actor-handler-do.ts:98)
  • WeakMap usage for proper garbage collection (actor-driver.ts:37)

3. Good Test Coverage

The PR includes driver tests with proper fixtures (destroy.ts, driver-tests.test.ts) that validate the destroy functionality.

4. Cloudflare Workers Migration Fix

The wrangler.json fix changing new_classes to new_sqlite_classes is correct for CF Workers with SQLite storage.


🔴 Critical Issues

1. Race Condition in startDestroy (actor-driver.ts:273-294)

Issue: The destroy operation is spawned asynchronously without proper coordination, which can lead to race conditions.

startDestroy(actorId: string): void {
    // ...
    actorState.destroying = true;
    
    // Spawn onStop in background
    this.#callOnStopAsync(actorId, doId, actorState.actorInstance);
}

Problems:

  • If loadActor is called while #callOnStopAsync is executing, it could load the actor after destroying is set but before the SQL destroyed flag is written
  • The destroyed flag update (line 307) happens after onStop completes, creating a window where the actor state is inconsistent
  • No protection against concurrent startDestroy calls on the same actor

Recommendation:

startDestroy(actorId: string): void {
    // ... existing checks ...
    
    // Mark as destroying AND set a promise to track completion
    actorState.destroying = true;
    actorState.destroyPromise = this.#callOnStopAsync(actorId, doId, actorState.actorInstance);
}

// In loadActor, add:
if (actorState?.destroying) {
    throw new Error(`Actor ${actorId} is being destroyed`);
}

2. Incomplete Error Handling in loadActor (actor-driver.ts:132-213)

Issue: The function doesn't check if an actor is being destroyed before attempting to load it.

async loadActor(actorId: string): Promise<AnyActorInstance> {
    // ...
    let actorState = this.#globalState.getActorState(doState.ctx);
    if (actorState?.actorInstance) {
        // ⚠️ Should check actorState.destroying here
        return actorState.actorInstance;
    }
    // ...
}

Recommendation: Add a check immediately after retrieving actorState:

if (actorState?.destroying) {
    throw new ActorDestroying(actorId);
}

3. SQL Injection Vulnerability in kvListPrefix (actor-kv.ts:30-48)

Issue: The function scans ALL rows instead of using a SQL WHERE clause, which is inefficient and potentially dangerous for large datasets.

export function kvListPrefix(
    sql: SqlStorage,
    prefix: Uint8Array,
): [Uint8Array, Uint8Array][] {
    const cursor = sql.exec("SELECT key, value FROM _rivetkit_kv_storage");
    // ⚠️ Loads entire table into memory
    const entries: [Uint8Array, Uint8Array][] = [];
    
    for (const row of cursor.raw()) {
        const key = toUint8Array(row[0]);
        const value = toUint8Array(row[1]);
        
        if (hasPrefix(key, prefix)) {
            entries.push([key, value]);
        }
    }
    
    return entries;
}

Problems:

  • O(n) memory and time complexity
  • No limit on result set size
  • For actors with thousands of KV entries, this will be very slow

Recommendation: Use SQL LIKE with BLOB comparison or add a LIMIT:

export function kvListPrefix(
    sql: SqlStorage,
    prefix: Uint8Array,
    limit: number = 1000,
): [Uint8Array, Uint8Array][] {
    const cursor = sql.exec(
        "SELECT key, value FROM _rivetkit_kv_storage LIMIT ?",
        limit
    );
    // ... rest of implementation
}

⚠️ Major Issues

4. Resurrection Doesn't Clean Old KV Data (actor-handler-do.ts:374-388)

Issue: When resurrecting a destroyed actor, the old KV storage data is not cleared before initializing with new data.

if (created) {
    // Initialize persist data in KV storage
    initializeActorKvStorage(this.ctx.storage.sql, req.input);
    // ⚠️ But DELETE was called in #callOnStopAsync, so this should be OK?
}

Verification Needed: Confirm that DELETE FROM _rivetkit_kv_storage (actor-driver.ts:308) completes before resurrection can occur. If there's any chance of the background delete not completing, old data could mix with new data.

Recommendation: Add defensive deletion:

if (created) {
    // Ensure clean slate for resurrection
    sql.exec("DELETE FROM _rivetkit_kv_storage");
    initializeActorKvStorage(this.ctx.storage.sql, req.input);
}

5. Actor ID Parsing Fragility (actor-id.ts:24-38)

Issue: The parseActorId function uses simple string splitting which could fail if Durable Object IDs contain colons.

export function parseActorId(actorId: string): [string, number] {
    const parts = actorId.split(":");
    if (parts.length !== 2) {
        throw new Error(`Invalid actor ID format: ${actorId}`);
    }
    // ...
}

Recommendation: Use lastIndexOf to split from the right:

export function parseActorId(actorId: string): [string, number] {
    const lastColon = actorId.lastIndexOf(":");
    if (lastColon === -1) {
        throw new Error(`Invalid actor ID format: ${actorId}`);
    }
    
    const doId = actorId.substring(0, lastColon);
    const generationStr = actorId.substring(lastColon + 1);
    const generation = parseInt(generationStr, 10);
    
    if (Number.isNaN(generation)) {
        throw new Error(`Invalid generation number in actor ID: ${actorId}`);
    }
    
    return [doId, generation];
}

6. Missing Cleanup in #callOnStopAsync Error Path (actor-driver.ts:296-322)

Issue: If onStop throws an error, the cleanup operations may not complete.

async #callOnStopAsync(
    actorId: string,
    doId: string,
    actor: CoreAnyActorInstance,
) {
    // Stop
    await actor.onStop("destroy"); // ⚠️ If this throws, cleanup doesn't happen
    
    // Remove state
    const doState = this.#globalState.getDOState(doId);
    const sql = doState.ctx.storage.sql;
    sql.exec("UPDATE _rivetkit_metadata SET destroyed = 1 WHERE 1=1");
    sql.exec("DELETE FROM _rivetkit_kv_storage");
    // ...
}

Recommendation: Add try-finally:

async #callOnStopAsync(
    actorId: string,
    doId: string,
    actor: CoreAnyActorInstance,
) {
    try {
        await actor.onStop("destroy");
    } catch (error) {
        // Log error but continue with cleanup
        console.error(`Error in onStop for actor ${actorId}:`, error);
    } finally {
        // Always perform cleanup
        const doState = this.#globalState.getDOState(doId);
        const sql = doState.ctx.storage.sql;
        sql.exec("UPDATE _rivetkit_metadata SET destroyed = 1 WHERE 1=1");
        sql.exec("DELETE FROM _rivetkit_kv_storage");
        // ... rest of cleanup
    }
}

💡 Minor Issues & Suggestions

7. Unclear Comment in loadActor (actor-driver.ts:141-144)

// Check if actor is already loaded
let actorState = this.#globalState.getActorState(doState.ctx);
if (actorState?.actorInstance) {
    // Actor is already loaded, return it
    return actorState.actorInstance;
}

Suggestion: Add comment about generation validation:

// Check if actor is already loaded
let actorState = this.#globalState.getActorState(doState.ctx);
if (actorState?.actorInstance) {
    // Actor is already loaded and generation was validated when it was first loaded
    // Generation is checked in SQL query below for new loads
    return actorState.actorInstance;
}

8. Magic Number in SQL Constraint (actor-handler-do.ts:94-95)

id INTEGER PRIMARY KEY CHECK (id = 1),

Suggestion: Add a comment explaining why id must always be 1:

-- Single row table (id always equals 1) to store actor metadata
id INTEGER PRIMARY KEY CHECK (id = 1),

9. Inconsistent Error Messages

Some errors use the format with actor ID, others don't:

  • actor-driver.ts:173: Actor ${actorId} is not initialized
  • actor-driver.ts:184: Actor ${actorId} is destroyed

Suggestion: Ensure all error messages include the actor ID for debugging.

10. Missing Telemetry/Logging

The startDestroy function doesn't log when it returns early:

startDestroy(actorId: string): void {
    // ...
    if (!actorState?.actorInstance) {
        return; // ⚠️ Should log this
    }
    
    if (actorState.destroying) {
        return; // ⚠️ Should log this
    }
    // ...
}

Suggestion: Add debug logging for observability:

if (!actorState?.actorInstance) {
    logger().debug({ msg: "startDestroy: actor not loaded", actorId });
    return;
}

if (actorState.destroying) {
    logger().debug({ msg: "startDestroy: already destroying", actorId });
    return;
}

📊 Code Quality Assessment

Category Rating Notes
Architecture ⭐⭐⭐⭐⭐ Generation-based resurrection is excellent
Error Handling ⭐⭐⭐ Missing some edge cases
Performance ⭐⭐⭐⭐ Good, but kvListPrefix needs optimization
Security ⭐⭐⭐⭐ Generally safe, watch for SQL performance issues
Test Coverage ⭐⭐⭐⭐ Good test fixtures included
Documentation ⭐⭐⭐ Code is readable but could use more inline comments

🎯 Recommendations Summary

Must Fix Before Merge:

  1. Fix race condition in startDestroy
  2. Add destroying check in loadActor
  3. Optimize kvListPrefix to avoid loading all rows

Should Fix:
4. Add try-finally to #callOnStopAsync
5. Fix parseActorId to handle colons in DO IDs
6. Add defensive deletion on resurrection

Nice to Have:
7. Add more debug logging
8. Improve inline documentation
9. Add SQL comments for schema


Conclusion

This is a solid implementation of actor destruction with a clever generation-based resurrection system. The main concerns are around race conditions during destroy and performance of the KV list operation. Once the critical issues are addressed, this will be a robust feature.

Great work on the architecture! The generation counter approach is particularly elegant. 👏

@claude
Copy link

claude bot commented Nov 13, 2025

Pull Request Review: Add ability to destroy actors

Summary

This PR implements actor destruction functionality across the RivetKit codebase, including generation-based tracking to prevent stale references after resurrection. The implementation is comprehensive and well-tested.

Code Quality & Best Practices

Strengths

Excellent separation of concerns - The implementation cleanly separates actor ID management, KV storage, and lifecycle handling into distinct modules
Comprehensive test coverage - The test suite covers both connected and non-connected actor destruction scenarios
Well-documented code - Good inline comments explaining complex logic, especially around generation tracking
Proper error handling - Generation mismatches and destroyed actor access are properly caught
Good use of structured logging - Following the project's convention with structured fields (e.g., logger().debug({msg: "...", key, generation}))

Areas for Improvement

1. SQL Injection Prevention

In actor-handler-do.ts:307 and actor-driver.ts:307, the SQL queries use string concatenation:

sql.exec("UPDATE _rivetkit_metadata SET destroyed = 1 WHERE 1=1");
sql.exec("DELETE FROM _rivetkit_kv_storage");

While these specific queries are safe, consider using parameterized queries consistently for all SQL operations to establish a clear pattern. The WHERE 1=1 is unnecessary - use WHERE id = 1 instead for clarity.

2. Race Condition in startDestroy

In actor-driver.ts:273-294, the startDestroy method checks if the actor is already destroying in memory but doesn't persist this state immediately:

if (actorState.destroying) {
    return;
}
actorState.destroying = true;
// Spawn onStop in background
this.#callOnStopAsync(actorId, doId, actorState.actorInstance);

Potential issue: If multiple destroy requests arrive concurrently, they could all pass the check before any sets destroying = true. Consider using atomic operations or adding the destroyed flag update at the start of the destroy process.

3. Missing Transaction Boundaries

In actor-handler-do.ts:349-361, the actor creation/resurrection performs multiple SQL operations without an explicit transaction:

this.ctx.storage.sql.exec(
    `INSERT INTO _rivetkit_metadata (id, name, key, destroyed, generation)
    VALUES (1, ?, ?, 0, ?)
    ON CONFLICT(id) DO UPDATE SET ...`,

Cloudflare's Durable Objects SQL might handle this automatically, but it would be safer to explicitly wrap these in a transaction for consistency.

4. Error Handling in Background Operations

In actor-driver.ts:296-322, the #callOnStopAsync method doesn't catch errors:

async #callOnStopAsync(actorId: string, doId: string, actor: CoreAnyActorInstance) {
    await actor.onStop("destroy");  // What if this throws?
    // ... cleanup operations
}

Recommendation: Add try-catch with proper error logging to prevent silent failures during destruction.

5. Generation Overflow

The generation counter increments indefinitely (generation + 1 in actor-handler-do.ts:322). While unlikely to overflow a JavaScript number in practice, consider documenting the expected behavior if it does or adding bounds checking.

Performance Considerations

Good Patterns

Efficient KV operations - Using SQL storage with BLOB keys is more efficient than the previous string-based approach
Background cleanup - Using ctx.waitUntil() for KV deletion prevents blocking the destroy operation
WeakMap for actor state - Proper use of WeakMap in CloudflareDurableObjectGlobalState allows garbage collection

Potential Issues

1. Sequential KV Operations

In actor-driver.ts:231-240, batch operations are performed sequentially:

for (const [key, value] of entries) {
    kvPut(sql, key, value);
}

Consider batching these into a single SQL transaction for better performance:

sql.exec("BEGIN");
for (const [key, value] of entries) {
    kvPut(sql, key, value);
}
sql.exec("COMMIT");

2. kvListPrefix Performance

In actor-kv.ts:30-48, the prefix matching scans all rows:

const cursor = sql.exec("SELECT key, value FROM _rivetkit_kv_storage");
// ... client-side filtering

This could be slow with many keys. Consider adding an index on the key column or using SQL LIKE with BLOB comparison if supported.

Security Concerns

Critical Issue: Actor ID Format

The actor ID format {doId}:{generation} in actor-id.ts:14 could be problematic if doId ever contains a colon. While unlikely with Cloudflare DO IDs, this should be documented or validated.

Recommendation: Add validation in parseActorId to detect malformed IDs more robustly, or use a delimiter that's guaranteed not to appear in DO IDs.

Minor Concerns

⚠️ No rate limiting on destroy operations - An attacker could potentially spam destroy requests. Consider adding rate limiting or authentication checks.
⚠️ Generation exposed in actor ID - The generation is visible to clients. While not necessarily sensitive, document whether this is intentional.

Bug Risks

Medium Risk: Stale Actor Cache

In actor-handler-do.ts:325-330, the stale actor is cleared during resurrection:

if (this.#state) {
    this.#state.actor = undefined;
}

However, if #loadActor() fails after this point, the actor state becomes inconsistent. Consider wrapping this in try-catch to restore the state on failure.

Low Risk: Missing Generation Check in getMetadata

The getMetadata method doesn't verify that the generation in memory matches the generation in SQL. This could return stale data if called during a resurrection.

Test Coverage

Excellent Coverage ✅

  • Both connected and non-connected actor destruction
  • State cleanup verification
  • Actor resurrection with new generation
  • Observer pattern for async destroy confirmation

Missing Test Cases

  • Concurrent destroy requests on the same actor
  • Destroy of an actor that's currently processing a request
  • Resurrection failure scenarios
  • Generation overflow behavior
  • Destroy of an already-destroyed actor

Documentation

Needed Additions

  1. Document the actor ID format and generation tracking in a module-level comment
  2. Add JSDoc to startDestroy explaining the async nature and non-blocking behavior
  3. Document the resurrection behavior and generation increment policy
  4. Add migration notes for existing actors (do they start at generation 0?)

Recommendations Summary

High Priority

  1. ✅ Add error handling in #callOnStopAsync
  2. ✅ Add transaction boundaries for multi-step SQL operations
  3. ✅ Validate actor ID format more robustly

Medium Priority

  1. Add concurrent destroy test cases
  2. Document generation tracking behavior
  3. Optimize kvListPrefix with SQL-level filtering
  4. Fix potential stale cache issue during resurrection

Low Priority

  1. Remove unnecessary WHERE 1=1 clauses
  2. Add generation overflow documentation
  3. Batch SQL operations in KV functions

Conclusion

This is a well-architected feature with good separation of concerns and comprehensive testing. The generation-based tracking is a clever solution to the resurrection problem. With the suggested improvements around error handling, transactions, and edge cases, this will be production-ready.

Overall Assessment: ✅ Approve with recommendations for improvements

Great work on this feature! The code quality is high and the approach is sound. 🚀

@claude
Copy link

claude bot commented Nov 13, 2025

Pull Request Review: feat(rivetkit): add ability to destroy actors

Summary

This PR implements actor destruction functionality for RivetKit, adding the ability to destroy actors and properly clean up their state. The implementation includes generation tracking to handle actor resurrection and prevent stale references.

Overall Assessment

Strong implementation with good architecture decisions and comprehensive test coverage. The generation tracking system is well-designed and the SQL-based storage approach is solid. A few areas need attention for production readiness.


Strengths

1. Generation Tracking Design

The generation-based actor ID system (doId:generation) is an excellent solution for handling actor resurrection:

  • Prevents stale references to destroyed actors
  • Enables clean resurrection of actors with the same key
  • Simple and effective implementation in actor-id.ts

2. SQL Storage Migration

Moving from native KV storage to SQL for blob keys is the right architectural choice:

  • Allows storing Uint8Array keys (native CF KV only supports strings)
  • Provides atomic operations
  • Clear schema in actor-kv.ts

3. Comprehensive Test Coverage

The test suite in actor-destroy.ts covers critical scenarios:

  • Destroy without connection, destroy with active connection
  • State cleanup verification, actor resurrection with fresh state

4. Separation of Concerns

Clean separation between ActorGlobalState for runtime state, SQL metadata for persistent state, and WeakMap usage for proper garbage collection.


Critical Issues

1. Performance Issue in kvListPrefix - actor-kv.ts:34

Severity: High

The kvListPrefix function performs a full table scan loading all KV entries into memory.

Problem: SELECT key, value FROM _rivetkit_kv_storage loads ALL keys without WHERE clause

Impact: Performance degrades linearly with total KV entries, high memory usage

Recommendation: Unfortunately SQLite doesnt have native BLOB prefix operator. Consider adding prefix length check in SQL, using a prefix hash column with index, or documenting the limitation.

2. Error Handling in Background Destroy - actor-driver.ts:295-310

Severity: Medium

Problem: If onStop throws, cleanup may be incomplete (SQLite metadata not updated, ACTOR_KV not deleted, actor state not reset)

Recommendation: Wrap onStop in try-catch and always perform cleanup even if onStop fails to prevent zombie actors.

3. Generation Mismatch Error Type - actor-driver.ts:137-143

Severity: Low

Issue: Throws generic Error instead of custom ActorError

Recommendation: Use ActorNotFound or create new ActorGenerationMismatch error type extending ActorError


Suggestions

  • Consider adding metrics for actor destructions, resurrections, and destroy timing
  • Document the generation tracking system more clearly
  • Add tests for edge cases: destroying already destroyed actor, concurrent destroy operations
  • Validate that wrangler.json migration from new_classes to new_sqlite_classes is compatible with target CF Workers version

Code Quality

Positive patterns:

  • Proper use of invariant() for internal assumptions
  • Structured logging with context
  • Atomic SQL operations
  • Generation tracking to prevent stale references
  • WeakMap for proper GC
  • Comprehensive test coverage
  • Follows CLAUDE.md conventions

Action Items

Before Merge (Required)

  1. Add error handling to callOnStopAsync
  2. Consider optimization or documentation for kvListPrefix
  3. Replace generic Error with proper ActorError subclass

Post-Merge (Recommended)

  1. Monitor kvListPrefix performance in production
  2. Add metrics for actor lifecycle events
  3. Document generation tracking in user-facing docs

Final Recommendation

Approve with minor changes

This is a well-architected implementation of actor destruction with solid test coverage. The generation tracking system is elegant and the SQL migration is the right architectural choice. Address the error handling in background destroy and consider the kvListPrefix optimization before merging.

Great work on this implementation! The generation tracking pattern is particularly clever and will serve as a solid foundation for actor lifecycle management.

@graphite-app
Copy link
Contributor

graphite-app bot commented Nov 13, 2025

Merge activity

@NathanFlurry NathanFlurry force-pushed the 11-12-feat_rivetkit_add_ability_to_destroy_actors branch from a3beed3 to 7a3a586 Compare November 13, 2025 21:39
@NathanFlurry NathanFlurry force-pushed the 11-12-feat_rivet-engine_udb_key_parser branch from b2a85f4 to 43a2a61 Compare November 13, 2025 21:39
@claude
Copy link

claude bot commented Nov 13, 2025

Code Review - PR #3458: Add ability to destroy actors

Summary

This PR implements actor destruction functionality for RivetKit's Cloudflare Workers driver. It introduces a generation-based actor lifecycle system that allows actors to be destroyed and potentially resurrected with incremented generation numbers.

Key Changes

  1. Generation-based Actor IDs - Actor IDs now include generation tracking ({doId}:{generation})
  2. SQL-based Storage - Migration from native Cloudflare KV to SQL storage for better control over blob keys
  3. Destroy Lifecycle - New startDestroy() method with proper cleanup and onDestroy() hook
  4. State Management - Enhanced global state tracking with ActorGlobalState class

Positive Aspects ✅

  1. Well-structured generation tracking - The actor-id.ts utility provides clean separation of concerns for ID parsing/building
  2. Comprehensive test coverage - Added destroyActor and destroyObserver test fixtures
  3. Proper cleanup sequence - The destroy flow correctly calls onStop("destroy"), clears storage, removes alarms, and resets state
  4. WeakMap usage - Smart use of WeakMap in CloudflareDurableObjectGlobalState for proper garbage collection
  5. Defensive programming - Good validation (generation mismatch checks, destroyed state checks, etc.)

Issues & Concerns ⚠️

1. Critical: Race condition in startDestroy() (actor-driver.ts:273-294)

The startDestroy() method spawns #callOnStopAsync() in the background without tracking the promise. This creates several race conditions:

startDestroy(actorId: string): void {
    // ...
    actorState.destroying = true;
    // Spawns async work but doesn't wait or track it
    this.#callOnStopAsync(actorId, doId, actorState.actorInstance);
}

Problems:

  • Concurrent requests could call loadActor() while destruction is in progress, potentially resurrecting a half-destroyed actor
  • The destroying flag is set immediately, but the SQL destroyed = 1 isn't set until later
  • No error handling if onStop() or cleanup fails

Recommendation: Either:

  • Make startDestroy() async and await the cleanup, OR
  • Use ctx.waitUntil() to track the async operation, OR
  • Add a mutex/lock to prevent concurrent operations during destruction

2. SQL Injection Risk (actor-driver.ts:307-308)

The destroy method uses unparameterized SQL:

sql.exec("UPDATE _rivetkit_metadata SET destroyed = 1 WHERE 1=1");
sql.exec("DELETE FROM _rivetkit_kv_storage");

While these specific queries don't have injection risk (no user input), it's inconsistent with the parameterized approach used elsewhere. Consider using WHERE id = 1 for the metadata update to be explicit.

3. Performance: Inefficient prefix matching (actor-kv.ts:30-48)

The kvListPrefix() function performs a full table scan:

const cursor = sql.exec("SELECT key, value FROM _rivetkit_kv_storage");
// Then filters in memory
if (hasPrefix(key, prefix)) {
    entries.push([key, value]);
}

Impact: This will become slow as KV storage grows. SQL doesn't natively support blob prefix queries efficiently.

Recommendation:

  • Add an index on the key column if supported
  • Consider encoding keys as hex strings for SQL-native prefix matching: WHERE key LIKE ? || '%'
  • Document the O(n) performance characteristic

4. Missing generation validation in startDestroy() (actor-driver.ts:273-294)

The method parses the generation from actorId but never validates it matches the current actor's generation:

const [doId, generation] = parseActorId(actorId);
// `generation` is extracted but never used

Risk: Calling destroy() with a stale actor ID (old generation) could destroy a newer actor instance.

Fix: Add validation:

if (actorState.actor && actorState.actor.generation !== generation) {
    throw new Error(`Generation mismatch: cannot destroy stale actor reference`);
}

5. Error handling gaps

Several areas lack error handling:

  • #callOnStopAsync() doesn't catch errors from actor.onStop("destroy") - a failing hook could leave the actor in a limbo state
  • getMetadata() doesn't handle SQL query failures
  • The create() method resurrection path doesn't validate the new generation properly

6. Wrangler config migration (examples/*/wrangler.json)

Changed new_classes to new_sqlite_classes - ensure this is documented and won't break existing deployments. This is a breaking change for the migration API.


Code Quality

Strengths:

  • Good documentation and JSDoc comments
  • Consistent error messages with context
  • Type safety maintained throughout
  • Follows existing patterns in the codebase

Suggestions:

  • Add explicit return types for SQL query result parsing (currently using as string, as number)
  • Consider extracting SQL schema definitions to constants for reusability
  • The ActorGlobalState.reset() method could use a comment explaining when it's safe to call

Security Concerns

  1. Generation enumeration - Actor IDs are now predictable ({doId}:{generation}). While generations increment sequentially, this is probably acceptable for the use case, but worth noting.

  2. No rate limiting on destroy - An attacker could rapidly create and destroy actors. Consider rate limiting if not handled at a higher layer.

  3. Global KV deletion - The background deletion of ACTOR_KV metadata (actor-driver.ts:315) uses ctx.waitUntil(), but there's no retry logic if it fails. Orphaned metadata could leak.


Test Coverage

Good coverage with the destroy.ts fixture, but consider adding:

  • Tests for concurrent destroy + loadActor scenarios
  • Tests for destroying an already-destroyed actor
  • Tests for generation mismatch scenarios
  • Tests for error handling in onDestroy() hooks
  • Tests for resurrection (destroying and recreating with same key)

Performance Considerations

  1. Full table scan in kvListPrefix - See issue [SVC-2504] Fix 5 GB upload limit for local development from Cloudflare #3 above
  2. Generation tracking overhead - Every actor lookup now requires parsing the actor ID. Consider caching parsed results if this becomes a bottleneck.
  3. WeakMap lookups - The dual Map+WeakMap structure adds some overhead but is necessary for proper GC.

Recommendations Priority

High Priority (Must Fix):

  1. Fix race condition in startDestroy() - add proper async tracking
  2. Validate generation in startDestroy() before destroying
  3. Add error handling for onStop() and cleanup operations

Medium Priority (Should Fix):
4. Optimize kvListPrefix() or document O(n) behavior
5. Add tests for concurrent scenarios and error cases
6. Document the new_sqlite_classes migration impact

Low Priority (Nice to Have):
7. Add rate limiting considerations to docs
8. Improve error handling in getMetadata()
9. Extract SQL schemas to constants


Overall Assessment

This is a solid implementation of actor destruction with good architecture (generation tracking, proper cleanup). However, the race condition in startDestroy() and missing generation validation are critical issues that should be addressed before merging. The performance concern with kvListPrefix() should at minimum be documented.

The migration from native KV to SQL storage is well-executed and provides better control over data structures. The test fixtures demonstrate good understanding of the feature requirements.

Recommendation: Request changes to address the high-priority issues, particularly the race condition and generation validation.

@claude
Copy link

claude bot commented Nov 13, 2025

Pull Request Review: Actor Destroy Functionality

This PR adds the ability to destroy actors in RivetKit with generation-based tracking. Overall, this is a well-implemented feature with solid testing. Here are my findings:

🎯 Positive Highlights

  1. Generation-based tracking: The use of a generation counter to handle actor resurrection is elegant and prevents stale references
  2. Comprehensive testing: The test suite covers both connected and non-connected destroy scenarios
  3. Proper cleanup: The destroy flow properly cleans up state, KV storage, alarms, and global references
  4. Consistent error handling: New errors like ActorDestroying and ActorAlreadyExists follow the project's error patterns

🐛 Potential Issues

1. Race Condition in Actor Loading (rivetkit-typescript/packages/cloudflare-workers/src/actor-driver.ts:131-161)

The concurrent loading logic has a potential issue:

// Another request is already loading this actor, wait for it
if (actorState.actorPromise) {
    await actorState.actorPromise.promise;
    if (!actorState.actorInstance) {
        throw new Error(...);
    }
    return actorState.actorInstance;
}

Issue: If the first load fails, actorState.actorInstance may be undefined but actorPromise gets resolved. Subsequent requests will throw an error instead of retrying.

Recommendation: Set actorPromise to undefined only on success, or reject the promise on failure so concurrent waiters can handle it properly.

2. Missing Generation Validation in #callOnStopAsync (rivetkit-typescript/packages/cloudflare-workers/src/actor-driver.ts:295-320)

The #callOnStopAsync method receives both actorId (with generation) and doId but doesn't validate that the generation matches before destroying:

async #callOnStopAsync(actorId: string, doId: string, actor: CoreAnyActorInstance) {
    // Stop
    await actor.onStop("destroy");
    
    // Remove state
    const doState = this.#globalState.getDOState(doId);
    const sql = doState.ctx.storage.sql;
    sql.exec("UPDATE _rivetkit_metadata SET destroyed = 1 WHERE 1=1");

Issue: If a new actor generation starts while the old one is being destroyed, this could mark the new generation as destroyed.

Recommendation: Add a WHERE clause to check the generation: WHERE generation = ?

3. Unbounded kvListPrefix Query (rivetkit-typescript/packages/cloudflare-workers/src/actor-kv.ts:30-48)

const cursor = sql.exec("SELECT key, value FROM _rivetkit_kv_storage");

Issue: This reads the entire KV table into memory for prefix filtering, which could cause performance issues with large datasets.

Recommendation: Use SQL LIKE or GLOB for prefix matching:

sql.exec("SELECT key, value FROM _rivetkit_kv_storage WHERE key >= ? AND key < ?", prefix, prefixEnd)

4. Hardcoded Timeout in Destroy Flow (rivetkit-typescript/packages/cloudflare-workers/src/actor-handler-do.ts:424)

The destroy flow uses hardcoded 1500ms timeout for disconnecting connections in ActorInstance:

globalThis.setTimeout(() => res(true), 1500)

Issue: This timeout is not configurable and may be too short for actors with many connections.

Recommendation: Make this configurable via ActorConfig.options.

⚠️ Security Concerns

1. Background Deletion Without Verification (rivetkit-typescript/packages/cloudflare-workers/src/actor-driver.ts:311-314)

// Delete from ACTOR_KV in the background
doState.ctx.waitUntil(
    env.ACTOR_KV.delete(GLOBAL_KV_KEYS.actorMetadata(actorId)),
);

Issue: KV deletion happens in the background without waiting. If this fails silently, orphaned metadata remains in KV.

Recommendation: Add error handling or at least log failures.

🚀 Performance Considerations

1. Serial KV Operations in Batch Methods

In kvBatchPut, kvBatchGet, and kvBatchDelete, operations are executed serially:

for (const [key, value] of entries) {
    kvPut(sql, key, value);
}

Recommendation: Since SQLite executes statements synchronously anyway, consider using a single batch INSERT/DELETE statement for better performance:

INSERT OR REPLACE INTO _rivetkit_kv_storage (key, value) VALUES (?, ?), (?, ?), ...

2. getMetadata Called Twice in getForId

The getForId method calls getMetadata() which queries SQL, but this could return stale data if not transactionally consistent with the subsequent actor loading.

Recommendation: Document this behavior or consider adding a consistency check.

📝 Code Quality

1. Inconsistent Error Messages

  • Some error messages include actorId, others don't
  • Example: rivetkit-typescript/packages/cloudflare-workers/src/actor-driver.ts:167 vs :190

Recommendation: Standardize error messages to always include relevant identifiers.

2. Magic Number for destroyed Field

The destroyed field uses 0/1 instead of proper boolean:

destroyed INTEGER DEFAULT 0

Recommendation: While SQLite doesn't have native booleans, add a comment explaining the convention.

3. Weak Reference Usage Explanation

The WeakMap usage in CloudflareDurableObjectGlobalState could benefit from better documentation:

// WeakMap of DO state -> ActorGlobalState for proper GC
#actors: WeakMap<DurableObjectState, ActorGlobalState> = new WeakMap();

Recommendation: Explain why this prevents memory leaks and how GC works in this context.

✅ Test Coverage

The tests are comprehensive and cover:

  • ✅ Destroy without connection
  • ✅ Destroy with connection
  • ✅ State cleanup verification
  • ✅ Actor resurrection with fresh state
  • ✅ Observer pattern for async destroy verification

Suggestion: Add tests for:

  • Concurrent destroy attempts
  • Destroy during active RPC calls
  • Generation mismatch scenarios
  • Edge case: destroy immediately after creation

🔍 Minor Issues

  1. Typo in variable name (rivetkit-typescript/packages/rivetkit/src/actor/instance/mod.ts:653):
    RequestHandlerNotDfeined → should be RequestHandlerNotDefined

  2. Unused import (rivetkit-typescript/packages/cloudflare-workers/src/actor-driver.ts:7):
    stringifyError is imported but only used in #callOnStopAsync

  3. Console log statement in rivetkit-typescript/packages/rivetkit/src/drivers/file-system/global-state.ts:
    Consider using structured logging instead

📊 Summary

Overall Assessment: ✅ Approve with recommendations

This is a solid implementation of actor destruction with proper generation tracking. The main concerns are:

  • Race condition handling in concurrent loads
  • Generation validation during destroy
  • Performance of kvListPrefix
  • Better error handling for background operations

The test coverage is good, but would benefit from edge case testing around concurrency and timing.

Recommendation: Address the race condition and generation validation issues before merging. The performance optimizations can be follow-up work.


Generated with Claude Code

@claude
Copy link

claude bot commented Nov 13, 2025

PR Review: Add Actor Destroy Functionality

Overview

This PR adds the ability to destroy actors in RivetKit with a comprehensive implementation that includes:

  • Actor destruction with lifecycle hooks (onDestroy)
  • Generation-based actor IDs to prevent stale references after resurrection
  • Migration from Cloudflare KV storage to SQLite for better blob key support
  • Comprehensive test coverage for destroy scenarios

Strengths

1. Excellent Architecture - Generation-Based Actor IDs

The implementation of generation tracking (actor-id.ts:11-38) is well-designed:

  • Prevents stale references to destroyed actors
  • Enables safe actor resurrection with new state
  • Clear ID format: {doId}:{generation}
  • Proper validation and parsing with error handling

2. Comprehensive Test Coverage

The test suite (actor-destroy.ts:1-186) covers critical scenarios:

  • Destroy without active connections
  • Destroy with active connections
  • State cleanup verification
  • Actor resurrection with fresh state
  • Observer pattern for destroy notifications

3. Clean State Management

The ActorGlobalState class properly tracks:

  • Initialization state with generation
  • Loaded actor instances
  • Destroy lifecycle (destroying flag)
  • Provides a clean reset() method

4. KV Storage Migration

Moving from Cloudflare's string-based KV to SQLite (actor-kv.ts:1-71) is a good design choice:

  • Supports blob keys properly
  • More consistent API
  • Better performance for batch operations

Issues & Concerns

1. 🔴 Critical: Race Condition in Destroy Cleanup

Location: actor-driver.ts:295-299

// Delete from ACTOR_KV in the background - use full actorId including generation
const env = getCloudflareAmbientEnv();
doState.ctx.waitUntil(
  env.ACTOR_KV.delete(GLOBAL_KV_KEYS.actorMetadata(actorId)),
);

Problem: The global KV metadata deletion happens in the background via waitUntil(), but the SQL metadata is marked as destroyed immediately. This creates a race condition where:

  1. Actor A is destroyed (SQL updated, KV delete scheduled in background)
  2. Actor A is resurrected before KV delete completes
  3. KV delete executes, removing metadata for the new generation
  4. The new actor instance becomes orphaned in the global state

Recommendation: Either:

  • Make the KV deletion synchronous and await it
  • Or store generation in the KV metadata and check it before deleting
  • Or accept eventual consistency and document it

2. 🟡 Memory Leak Potential in Global State

Location: actor-driver.ts:31-45

// Map of actor ID -> DO state
#dos: Map<string, DurableObjectGlobalState> = new Map();

Problem: The #dos Map uses DO ID as key and never removes entries. When actors are destroyed, the map keeps growing indefinitely.

Recommendation: Either:

  • Remove entries from #dos when actors are destroyed
  • Use a WeakMap if possible (though DO IDs are strings, so this won't work)
  • Document this as expected behavior if DO instances are long-lived

3. 🟡 Inconsistent Error Handling

Location: actor-driver.ts:175-184

The code checks both destroyed flag and generation mismatch, but the error messages could be more specific:

if (destroyed) {
  throw new Error(`Actor ${actorId} is destroyed`);
}

if (generation !== expectedGeneration) {
  throw new Error(
    `Actor ${actorId} generation mismatch - expected ${expectedGeneration}, got ${generation}`,
  );
}

Recommendation: Use custom error types from rivetkit/errors for consistency:

  • Import ActorNotFound or ActorDestroyed error
  • Provides better error metadata for clients
  • Follows the pattern in CLAUDE.md for error handling

4. 🟡 Missing Transaction Handling

Location: actor-handler-do.ts:315-328

The create() method performs multiple SQL operations without explicit transaction handling:

this.ctx.storage.sql.exec(
  `INSERT INTO _rivetkit_metadata (id, name, key, destroyed, generation)
  VALUES (1, ?, ?, 0, ?)
  ON CONFLICT(id) DO UPDATE SET...`,
  req.name,
  JSON.stringify(req.key),
  generation,
);

// ... followed by KV initialization

Recommendation: Wrap the metadata insert and KV initialization in an explicit transaction to ensure atomicity.

5. 🟢 Minor: Logging Conventions

Location: Multiple files

Several log messages use uppercase, which contradicts the CLAUDE.md conventions:

logger().debug({
  msg: "Found actor metadata",  // Should be lowercase
  // ...
});

Recommendation: Lowercase all log messages per CLAUDE.md guidelines: "Log messages should be lowercase unless mentioning specific code symbols."

6. 🟢 Minor: Unused Import

Location: actor-driver.ts:16

import { stringifyError } from "rivetkit/utils";

This import appears to be unused in the file.

7. 🟢 Documentation: Wrangler Migration Change

Location: examples/cloudflare-workers/wrangler.json:9

"new_sqlite_classes": ["ActorHandler"]

Good catch! This change from new_classes to new_sqlite_classes is required for Cloudflare Workers using SQLite storage. However:

Recommendation: Document this breaking change in the PR description or migration guide, as existing deployments will need to update their wrangler.json.

Performance Considerations

1. ✅ Good: Async Destroy

The startDestroy() method properly spawns onStop() in the background, preventing blocking the destroy call.

2. ✅ Good: Efficient KV Operations

The new SQLite-based KV implementation supports efficient batch operations without serialization overhead.

3. ⚠️ Consider: Global KV Writes on Every Create

Every actor creation writes to Workers KV (actor-handler-do.ts:367-372). Consider:

  • Is this necessary for all use cases?
  • Could this be made optional or lazy?
  • What's the KV write cost at scale?

Security Concerns

1. ✅ Good: Generation Validation

The generation-based ID prevents:

  • Accessing destroyed actors
  • Sending messages to wrong actor instances
  • State confusion after resurrection

2. ✅ Good: Proper Cleanup

The destroy flow properly:

  • Clears SQLite KV storage
  • Deletes scheduled alarms
  • Resets global state

Test Coverage

Excellent Coverage ✅

  • Both connected and non-connected destroy scenarios
  • State persistence verification
  • Actor resurrection verification
  • Observer pattern for async destroy confirmation
  • Proper use of vi.waitFor() for async assertions

Suggestions for Additional Tests

  1. Test concurrent destroy calls to same actor
  2. Test destroy during active WebSocket connections
  3. Test destroy with pending alarm
  4. Test generation overflow (though unlikely with numbers)
  5. Test KV metadata consistency after destroy

Code Quality

Strengths:

  • ✅ Clear separation of concerns
  • ✅ Good use of TypeScript types
  • ✅ Comprehensive JSDoc comments in new files
  • ✅ Consistent naming conventions
  • ✅ Proper error messages

Areas for Improvement:

  • Address the race condition in KV cleanup
  • Add explicit transaction handling
  • Fix logging convention violations
  • Consider memory management for global state map

Summary

This is a well-designed feature with thoughtful architecture around generation-based IDs and comprehensive test coverage. The main concerns are:

  1. Must fix: Race condition in global KV cleanup during destroy
  2. Should fix: Memory leak potential in global state map
  3. Nice to have: Better error types, transaction handling, logging fixes

The generation-based approach is excellent and will prevent many subtle bugs. Great work on the test coverage!

Recommendation

Approve with changes requested - Address the race condition before merging.


Review generated by Claude Code following repository conventions from CLAUDE.md

@claude
Copy link

claude bot commented Nov 13, 2025

PR Review: Add Actor Destroy Capability

This PR adds the ability to destroy actors in RivetKit, including proper cleanup of state, generation tracking to prevent stale references, and comprehensive test coverage. Overall, this is a well-designed and thorough implementation. Here are my findings:

✅ Strengths

  1. Generation Tracking Architecture

    • The actor ID format {doId}:{generation} (actor-id.ts:14) is elegant and prevents stale actor references after resurrection
    • Generation counter increments on resurrection (actor-handler-do.ts:119-120), ensuring actors can be recreated with the same key but different identity
    • Proper validation of generation mismatch in loadActor (actor-driver.ts:143-147)
  2. State Management

    • Clean separation between in-memory state (ActorGlobalState.destroying) and persisted state (_rivetkit_metadata.destroyed)
    • SQL-based storage migration from KV API is more robust for blob keys (actor-handler-do.ts:80-96)
    • Proper cascade deletion of KV storage when destroyed (actor-driver.ts:306)
  3. Lifecycle Management

    • Asynchronous destroy with #callOnStopAsync (actor-driver.ts:290-319) allows graceful shutdown
    • Idempotent destroy handling with early returns (actor-instance.ts:461-466)
    • Proper cleanup of scheduled alarms (actor-driver.ts:308)
  4. Test Coverage

    • Comprehensive tests for both connected and non-connected actors (actor-destroy.ts)
    • Tests verify state cleanup, error handling, and actor resurrection
    • Integrated into the full driver test suite across all encodings (mod.ts:107)
  5. Error Handling

    • New ActorDestroying error class for proper client communication (errors.ts:253-262)
    • Checks for destroying state in getForId (manager-driver.ts:264-266)
    • Generation mismatch returns undefined rather than throwing (manager-driver.ts:255-262)

⚠️ Issues & Recommendations

Critical

  1. Race Condition in loadActor (actor-driver.ts:132-162)

    // Current code:
    if (actorState?.actorInstance) {
      return actorState.actorInstance;
    }
    
    if (!actorState) {
      actorState = new ActorGlobalState();
      actorState.actorPromise = promiseWithResolvers();
      this.#globalState.setActorState(doState.ctx, actorState);
    }
    
    if (actorState.actorPromise) {
      await actorState.actorPromise.promise;
      // ...
    }

    Issue: Between the first check and creating actorState, another request could initialize it. The second request would then create a new actorPromise, overwriting any in-flight loading.

    Recommendation: Always create the promise if it doesn't exist:

    if (!actorState) {
      actorState = new ActorGlobalState();
      this.#globalState.setActorState(doState.ctx, actorState);
    }
    
    if (actorState.actorInstance) {
      return actorState.actorInstance;
    }
    
    // Create promise if it doesn't exist
    if (!actorState.actorPromise) {
      actorState.actorPromise = promiseWithResolvers();
      // ... proceed with loading
    } else {
      // Another request is loading, wait for it
      await actorState.actorPromise.promise;
      // ...
    }
  2. Missing Transaction Safety (actor-handler-do.ts:243-256)
    The create method updates SQLite but doesn't wrap it in a transaction. If the process crashes between SQL update and KV update, you could have inconsistent state.

    Recommendation: Use SQLite transactions for atomic updates:

    this.ctx.storage.sql.exec("BEGIN TRANSACTION");
    try {
      this.ctx.storage.sql.exec(/* upsert */);
      // ... other operations
      this.ctx.storage.sql.exec("COMMIT");
    } catch (e) {
      this.ctx.storage.sql.exec("ROLLBACK");
      throw e;
    }
  3. Unbounded Background Operation (actor-driver.ts:313-315)

    doState.ctx.waitUntil(
      env.ACTOR_KV.delete(GLOBAL_KV_KEYS.actorMetadata(actorId)),
    );

    If this fails silently, you'll have stale metadata in ACTOR_KV. This could cause getWithKey to return destroyed actors.

    Recommendation: Add error handling and logging:

    doState.ctx.waitUntil(
      env.ACTOR_KV.delete(GLOBAL_KV_KEYS.actorMetadata(actorId))
        .catch(err => {
          logger().error({ 
            msg: "failed to delete actor metadata from KV", 
            actorId, 
            err 
          });
        })
    );

Important

  1. Potential Memory Leak (actor-driver.ts:58-60)

    #dos: Map<string, DurableObjectGlobalState> = new Map();

    This map is never cleared. Destroyed actors remain in memory even after cleanup.

    Recommendation: Clear from map in ActorGlobalState.reset() or when actor is destroyed.

  2. Inconsistent Naming (actor-handler-do.ts:80-96)
    Table names use _rivetkit_ prefix but KV storage used rivetkit: prefix previously. The migration changes this without documenting why.

    Recommendation: Add a comment explaining the prefix convention change.

  3. Missing Index (actor-handler-do.ts:80-96)
    The SQLite tables don't have indexes, which could slow down queries as data grows.

    Recommendation: Add indexes, especially on frequently queried columns:

    CREATE INDEX IF NOT EXISTS idx_kv_key ON _rivetkit_kv_storage(key);

Minor

  1. Hardcoded Constants (actor-kv.ts)
    Magic strings like "_rivetkit_kv_storage" are repeated throughout.

    Recommendation: Define as constants at the top of the file.

  2. Error Message Clarity (actor-driver.ts:147)

    throw new Error(`Actor ${actorId} generation mismatch...`);

    This should use a proper error class from errors.ts for consistency.

  3. Documentation Gap
    The actor ID format change ({doId}:{generation}) is not documented in user-facing docs.

    Recommendation: Update docs to explain the new actor ID format and what it means for users.

  4. Test Gap
    Tests don't cover the case where an actor is destroyed while requests are in-flight.

    Recommendation: Add a test that:

    • Starts a long-running action
    • Destroys the actor mid-action
    • Verifies proper cleanup and error handling

🎯 Security Considerations

  • ✅ No injection vulnerabilities in SQL queries (using parameterized queries)
  • ✅ Generation tracking prevents replay attacks with old actor IDs
  • ✅ Proper access control - destroy must be called from within the actor
  • ⚠️ Consider rate limiting on actor creation/destruction to prevent DoS

📊 Performance Considerations

  • ✅ Batch KV operations minimize round trips
  • ✅ Proper use of indexes on SQLite primary keys
  • ⚠️ kvListPrefix could be slow with many keys (consider pagination)
  • ⚠️ WeakMap usage is good for GC, but Map in #dos isn't cleared

📝 Code Quality

  • ✅ Follows repository conventions (structured logging, error handling)
  • ✅ Type safety throughout
  • ✅ Comprehensive test coverage
  • ⚠️ Some long functions could be refactored (e.g., loadActor is 80+ lines)

🏁 Conclusion

This is a solid implementation of actor destruction with good attention to edge cases. The generation tracking pattern is particularly elegant. The main concerns are:

  1. Critical: Race condition in loadActor needs fixing
  2. Critical: Missing transaction safety in create
  3. Critical: Silent KV deletion failures

Once these are addressed, this will be a robust feature. The test coverage is excellent and should catch most issues.

Recommendation: ✅ Approve with changes requested for the critical issues above.


Great work on this complex feature! The generation tracking design is particularly clever and should work well for preventing stale actor references.

Base automatically changed from 11-12-feat_rivet-engine_udb_key_parser to main November 13, 2025 22:34
@NathanFlurry NathanFlurry force-pushed the 11-12-feat_rivetkit_add_ability_to_destroy_actors branch from d58f1db to fc21704 Compare November 13, 2025 22:35
@NathanFlurry NathanFlurry merged commit 2ba31db into main Nov 13, 2025
6 of 10 checks passed
@NathanFlurry NathanFlurry deleted the 11-12-feat_rivetkit_add_ability_to_destroy_actors branch November 13, 2025 22:38
@claude
Copy link

claude bot commented Nov 13, 2025

Code Review: Actor Destroy Functionality

Summary

This PR adds the ability to destroy actors in RivetKit, implementing a generation-based resurrection mechanism for Cloudflare Workers. The changes are substantial (+1378/-346 lines) and introduce important new functionality.

Positive Aspects

Architecture & Design

  • Generation-based resurrection: The generation counter approach is solid for handling actor lifecycle and preventing stale references
  • Comprehensive testing: New test suite (actor-destroy.ts) covers both connected and non-connected scenarios
  • Clean separation: Actor ID utilities (actor-id.ts) and KV helpers (actor-kv.ts) are well-isolated

Code Quality

  • Structured logging: Consistent use of tracing with proper context (e.g., generation, actorId, destroying flags)
  • Error handling: New error types (ActorDestroying, ActorAlreadyExists) provide clear feedback
  • Documentation: Inline comments explain complex logic (e.g., weak references, generation tracking)

Issues & Concerns

Critical Issues

1. SQL Injection Risk (actor-handler-do.ts:350-357)

Location: rivetkit-typescript/packages/cloudflare-workers/src/actor-handler-do.ts:350-357

this.ctx.storage.sql.exec(
    \`INSERT INTO _rivetkit_metadata (id, name, key, destroyed, generation)
    VALUES (1, ?, ?, 0, ?)
    ON CONFLICT(id) DO UPDATE SET
        name = excluded.name,
        key = excluded.key,
        destroyed = 0,
        generation = excluded.generation\`,
    req.name,
    JSON.stringify(req.key),
    generation,
);

Issue: While parameterized queries are used for values, this is good. However, ensure req.name and req.key are properly validated before insertion, especially if they come from user input.

Recommendation: Add validation for name (e.g., DNS subdomain pattern) and key structure before database insertion.

2. Race Condition in Actor Destruction (actor-driver.ts:227-295)

Location: rivetkit-typescript/packages/cloudflare-workers/src/actor-driver.ts:227-295

startDestroy(actorId: string): void {
    // ... checks ...
    actorState.destroying = true;
    
    // Spawn onStop in background
    this.#callOnStopAsync(actorId, doId, actorState.actorInstance);
}

Issue: The destroy process is started asynchronously without any locking mechanism. Multiple concurrent startDestroy calls could pass the !actorState.destroying check before the flag is set.

Recommendation: Consider adding atomic check-and-set semantics or proper locking to prevent race conditions during destruction.

3. Memory Leak Potential (actor-handler-do.ts:166-171)

Location: rivetkit-typescript/packages/cloudflare-workers/src/actor-handler-do.ts:166-171

// HACK: This leaks the DO context, but DO does not provide a native way
// of knowing when the DO shuts down. We're making a broad assumption
// that DO will boot a new isolate frequenlty enough that this is not an issue.
const actorId = this.ctx.id.toString();
globalState.setDOState(actorId, { ctx: this.ctx, env: env });

Issue: Acknowledged leak of DO context. While the comment suggests this is acceptable, it could accumulate over time.

Recommendation: Consider implementing cleanup on actor destruction or periodic garbage collection of stale entries.

High Priority Issues

4. Missing Error Handling in Background Destroy (actor-driver.ts:268-295)

Location: rivetkit-typescript/packages/cloudflare-workers/src/actor-driver.ts:268-295

async #callOnStopAsync(actorId: string, doId: string, actor: CoreAnyActorInstance) {
    // Stop
    await actor.onStop("destroy");
    
    // Remove state
    const doState = this.#globalState.getDOState(doId);
    const sql = doState.ctx.storage.sql;
    sql.exec("UPDATE _rivetkit_metadata SET destroyed = 1 WHERE 1=1");
    sql.exec("DELETE FROM _rivetkit_kv_storage");
    // ...
}

Issue: No try-catch around the destroy operations. If onStop throws or SQL operations fail, the actor could be left in an inconsistent state.

Recommendation: Wrap in try-catch and log errors. Ensure state cleanup happens even if onStop fails.

5. Inconsistent Generation Validation (actor-driver.ts:100-122)

Location: rivetkit-typescript/packages/cloudflare-workers/src/actor-driver.ts:100-122

// Check if generation matches
if (generation !== expectedGeneration) {
    throw new Error(
        \`Actor \${actorId} generation mismatch - expected \${expectedGeneration}, got \${generation}\`,
    );
}

Issue: Generation mismatch throws a generic Error instead of a custom ActorError. This inconsistency makes it harder for clients to handle this specific error case.

Recommendation: Create a custom error like ActorGenerationMismatch that extends ActorError with public: true.

6. Missing waitUntil Error Handling (actor-handler-do.ts:379-386)

Location: rivetkit-typescript/packages/cloudflare-workers/src/actor-handler-do.ts:379-386

const env = getCloudflareAmbientEnv();
const actorData = { name: req.name, key: req.key, generation };
this.ctx.waitUntil(
    env.ACTOR_KV.put(
        GLOBAL_KV_KEYS.actorMetadata(actorId),
        JSON.stringify(actorData),
    ),
);

Issue: Background KV operations could fail silently without logging or error handling.

Recommendation: Wrap in a promise that logs failures:

this.ctx.waitUntil(
    env.ACTOR_KV.put(...)
        .catch(err => logger().error({ msg: "failed to update actor metadata", err }))
);

Medium Priority Issues

7. Unbounded SQL Results (actor-driver.ts:91-104)

Location: rivetkit-typescript/packages/cloudflare-workers/src/actor-driver.ts:91-104

const cursor = sql.exec(
    "SELECT name, key, destroyed, generation FROM _rivetkit_metadata LIMIT 1",
);
const result = cursor.raw().next();

Issue: While LIMIT 1 is present, the cursor iteration doesn't validate that exactly one row exists (could be zero due to metadata constraint).

Recommendation: Add explicit check that ensures metadata exists and is unique.

8. Typo in Error Class Name (errors.ts:331)

Location: rivetkit-typescript/packages/rivetkit/src/actor/errors.ts:331

export class RequestHandlerNotDfeined extends ActorError {

Issue: Typo - "Dfeined" should be "Defined"

Recommendation: Rename to RequestHandlerNotDefined (this is a breaking change, so coordinate with users).

9. Test Coverage Gaps

Location: Test files only cover basic destroy scenarios

Missing Coverage:

  • Concurrent destroy attempts on the same actor
  • Destroy during active RPC calls
  • Resurrection with different actor configurations
  • Generation counter overflow (unlikely but possible)
  • Destroy of already-destroyed actors at SQL level

Recommendation: Add tests for edge cases, especially concurrent operations.

10. Documentation for new_sqlite_classes Migration

Location: examples/cloudflare-workers/wrangler.json:9

"new_sqlite_classes": ["ActorHandler"]

Issue: Changed from new_classes to new_sqlite_classes without explanation. This could break existing deployments.

Recommendation: Add migration notes or comments explaining this breaking change for existing users.

Low Priority / Style Issues

11. Magic Number for Destroyed Flag

Multiple locations use destroyed INTEGER DEFAULT 0 and check if (destroyed).

Recommendation: Define constants:

const ACTOR_DESTROYED = 1;
const ACTOR_ACTIVE = 0;

12. Inconsistent Logging Patterns

Some logs use msg: first, others use context first:

logger().debug({ msg: "...", generation }); // ✓
logger().debug("actor is destroyed, cannot load"); // ✗

Recommendation: Standardize on structured logging with msg field.

Performance Considerations

  1. SQL Query Performance: The metadata queries use WHERE id = 1 which is efficient, but consider adding indexes if the KV storage table grows large
  2. Generation Counter Overflow: While unlikely, consider what happens when generation reaches Number.MAX_SAFE_INTEGER
  3. Background Delete Operations: DELETE FROM _rivetkit_kv_storage without WHERE clause could be slow for large actor state

Security Considerations

  1. Actor Resurrection: Ensure proper authorization checks when resurrecting destroyed actors
  2. Generation Validation: Properly validate generation numbers to prevent tampering
  3. KV Delete in Background: Consider if sensitive data should be deleted immediately vs. background

Testing Recommendations

  1. Add stress tests for rapid create/destroy cycles
  2. Test behavior when Durable Object is evicted mid-destroy
  3. Verify generation counter handles edge cases (0, MAX_INT, etc.)
  4. Test concurrent access patterns more thoroughly

Overall Assessment

Quality: Good ⭐⭐⭐⭐ (4/5)

  • Well-structured implementation with clear separation of concerns
  • Comprehensive feature addition with proper error handling
  • Good test coverage for happy paths

Concerns: Medium

  • Some race conditions and error handling gaps need addressing
  • Memory leak acknowledged but not resolved
  • Missing edge case coverage in tests

Recommendation

Approve with Changes: The core implementation is solid, but the critical and high-priority issues should be addressed before merging. The generation-based approach is sound, and the code quality is generally good.

Must-Fix Before Merge

  1. Add error handling in #callOnStopAsync (Issue [SVC-2483] Remove hardcoded uses of rivet.gg #4)
  2. Fix race condition in startDestroy (Issue [SVC-2479] Send cluster events to PostHog #2)
  3. Add error logging for background operations (Issue Remove servers not in salt #6)
  4. Fix typo in RequestHandlerNotDfeined (Issue Remove fallible uses of as in api services #8)

Should-Fix Soon

  1. Improve test coverage for edge cases (Issue [SOC-15] Fix error when running user-delete #9)
  2. Add generation overflow handling
  3. Consider proper cleanup mechanism for global state leak

Great work on this feature! The destroy functionality is a valuable addition to RivetKit. 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants