Description
Describe the bug
I'm having difficulty deleting an agent after a period of inactivity. The root cause appears to be that await this.ctx.storage.deleteAlarm(); is not always (perhaps not even usually) respected when the object is evicted immediately afterwards with an untrappable exception (due to .abort()).
What I'm seeing is that my schedule handler calls .destroy(), which, as advertised, deletes the DO and evicts it. However, 2 seconds later (as per the DO retry docs), the DO runtime retries the alarm, even though .destroy() clearly calls deleteAlarm().
This brings the agent back into existence and re-creates its schema, which then consumes a small amount of storage. Each zombie is small, but over time and across many agents they will add up.
To work around this, rather than calling .destroy() I tried my own version of the destroy method that omits the abort() call, so that the alarm handler won't throw and won't be retried. However, this doesn't work: the alarm() handler uses the database to remove the executed schedule, and the database schema is gone once my handler deletes storage, so it throws anyway.
Possible solutions are listed below. I think all of them require that the alarm handler does not throw when the object is destroyed, because .deleteAlarm() appears unreliable:
- Making the alarm handler in the base class not readonly, so we can wrap it with a try/catch in our implementation classes and silently ignore the "table not found" errors when trying to remove the stored schedule from a now non-existent table
- Swallowing this particular error in the base class and returning early from the alarm handler (so that _scheduleNextAlarm, which would hit further errors, isn't called)
- Setting a member variable "_destroyed" in the destroy call, then using that to exit the alarm handler early after the user code callback. This would also require omitting the ".abort()" call or making it optional with a parameter to .destroy().
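To make the third option concrete, here is a minimal sketch of the idea, not the SDK's actual code. FakeStorage and SketchAgent are hypothetical stand-ins I made up for illustration; only the _destroyed flag, the optional abort, and the early return mirror what is proposed above. The thrown Error in destroy() merely stands in for the untrappable exception that .abort() raises.

```typescript
// Hypothetical stand-in for the agent's SQL-backed schedule table.
class FakeStorage {
  private tables = new Map<string, Set<string>>();
  alarmSet = false;

  createSchema(): void {
    this.tables.set("schedules", new Set<string>());
  }

  addSchedule(id: string): void {
    this.table().add(id);
  }

  removeSchedule(id: string): void {
    this.table().delete(id);
  }

  // Models deleting all storage: the schema and any pending alarm are gone.
  deleteAll(): void {
    this.tables.clear();
    this.alarmSet = false;
  }

  private table(): Set<string> {
    const t = this.tables.get("schedules");
    if (!t) throw new Error("no such table: schedules");
    return t;
  }
}

type DueSchedule = { id: string; callback: () => Promise<void> };

// Hypothetical simplified agent; the real base class does much more.
class SketchAgent {
  private _destroyed = false;

  constructor(readonly storage: FakeStorage) {
    storage.createSchema();
  }

  // destroy() with the abort step made optional via a parameter.
  async destroy(opts: { abort?: boolean } = {}): Promise<void> {
    this.storage.deleteAll();
    this._destroyed = true;
    if (opts.abort) {
      // Stand-in for ctx.abort(); the real call cannot be caught.
      throw new Error("aborted");
    }
  }

  async alarm(due: DueSchedule[]): Promise<void> {
    for (const row of due) {
      await row.callback();
      // User code destroyed the agent: skip all bookkeeping so the
      // handler returns normally and the runtime has nothing to retry.
      if (this._destroyed) return;
      this.storage.removeSchedule(row.id);
    }
    this.scheduleNextAlarm();
  }

  private scheduleNextAlarm(): void {
    this.storage.alarmSet = true;
  }
}
```

With this shape, a schedule callback that awaits destroy() without abort lets alarm() return without touching the deleted table and without re-arming the alarm, so there is no throw for the runtime to retry.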
To Reproduce
- Create an agent and use the API to create a schedule with a handler that calls and awaits destroy()
- Observe in the Cloudflare logs that the alarm is retried, which re-constructs the DO and the schema
Expected behavior
Schedule callbacks that call destroy() are not retried by the DO infrastructure.
Screenshots
Version:
0.2.17
Additional context
none