Handle OOM in the runtime

We have been discussing this in a couple of recent Wasmtime meetings[^0] and [on Zulip](https://bytecodealliance.zulipchat.com/#narrow/channel/217126-wasmtime/topic/OOM.20handling.20in.20.28parts.20of.29.20Wasmtime), and I figured it was time to centralize discussion in a tracking issue.

[^0]: See https://github.com/bytecodealliance/meetings/blob/main/wasmtime/2025/wasmtime-10-23.md and https://github.com/bytecodealliance/meetings/blob/main/wasmtime/2025/wasmtime-11-20.md

What does handling OOM mean in this case? It means turning allocation failure into an `Err(...)` return and ultimately propagating that up to the Wasmtime embedder. It may even involve poisoning various data structures, possibly up to a whole store, if necessary, but we haven't fleshed out the details completely yet. That will happen in discussions on this issue and various PRs during implementation.

Various, unordered sketches of things that will be involved:

* [ ] Replace `anyhow::Error` with a custom `wasmtime::Error`. At its most bare-bones, with all cargo features disabled, we will want this to basically be an `enum` without any data payloads in its variants. As we enable more cargo features, we can start adding support for formatting and error context and ultimately get to something like `anyhow::Error` with all features enabled.
* [ ] Create a `wasmtime-collections` crate that exposes fallible `Vec`, `HashMap`, etc... This is probably just going to be newtypes over the types we already use today, but `wasmtime_collections::Vec::push` will return a `Result` and be implemented via something like `self.0.try_reserve(1)?; self.0.push(item); Ok(())`, for example.
* [ ] We will need custom `serde::Deserialize` implementations that handle OOM failure for the `wasmtime-collections` types we use in the metadata that gets serialized into ELF sections in our compiled code.
* [ ] We would ideally like to statically analyze our code and make sure that we aren't allocating infallibly in the relevant code paths. It seems like we can probably use clippy for this, or at least for a 95% solution to this that is Good Enough in practice.
* [ ] We need a way to dynamically test/fuzz our OOM handling to make sure we are actually getting it right in practice.
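To make the first two items above concrete, here is a minimal sketch of what a payload-free error enum and a fallible `Vec` newtype could look like. The names `Error::OutOfMemory`, `FallibleVec`, `with_capacity`, and `as_slice` are hypothetical and chosen only for illustration; nothing here is the actual `wasmtime::Error` or `wasmtime-collections` API. The sketch also uses `std` for brevity, whereas a real implementation would presumably build on `core`/`alloc` to support no-std builds.

```rust
use std::collections::TryReserveError;
use std::fmt;

/// Hypothetical bare-bones error: an enum whose variants carry no payloads.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Error {
    OutOfMemory,
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::OutOfMemory => f.write_str("out of memory"),
        }
    }
}

impl std::error::Error for Error {}

// Lets `?` convert a failed `try_reserve` into our error type.
impl From<TryReserveError> for Error {
    fn from(_: TryReserveError) -> Self {
        Error::OutOfMemory
    }
}

/// Hypothetical newtype over the standard `Vec` with a fallible API.
pub struct FallibleVec<T>(Vec<T>);

impl<T> FallibleVec<T> {
    /// Creating an empty vector does not allocate, so this cannot fail.
    pub fn new() -> Self {
        FallibleVec(Vec::new())
    }

    /// Reserve space up front, reporting failure instead of aborting.
    pub fn with_capacity(capacity: usize) -> Result<Self, Error> {
        let mut inner = Vec::new();
        inner.try_reserve_exact(capacity)?;
        Ok(FallibleVec(inner))
    }

    /// Same shape as described in the list above: reserve, then push.
    pub fn push(&mut self, item: T) -> Result<(), Error> {
        self.0.try_reserve(1)?;
        // Capacity was reserved above, so this push does not allocate.
        self.0.push(item);
        Ok(())
    }

    pub fn as_slice(&self) -> &[T] {
        &self.0
    }
}

fn main() {
    let mut v = FallibleVec::new();
    for i in 0..4u32 {
        v.push(i).expect("small pushes should succeed");
    }
    assert_eq!(v.as_slice(), &[0, 1, 2, 3]);

    // An impossible capacity request surfaces as `Err`, not a process abort.
    let huge = FallibleVec::<u64>::with_capacity(usize::MAX);
    assert_eq!(huge.err(), Some(Error::OutOfMemory));
}
```

Keeping the wrapped `Vec` private is what would make this useful in practice: callers cannot reach the infallible `push` by accident, which should also make the clippy-based static checks easier to scope.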

We will initially focus on supporting the following code paths:

* Creating a `Config`
* Creating an `Engine`
* Creating a `Linker`
* Creating `InstancePre`s
* Deserializing pre-compiled `Module`s and `Component`s (*not* compiling new ones!)
* Creating `Store`s
* Creating `Instance`s
* Creating `Memory`s, `Table`s, `Global`s, etc...
* Running Wasm

Basically, everything that is supported in our no-std/pulley builds now: a basic runtime without the compiler, that can only run pre-compiled Wasm. We will not initially support async or the pooling allocator either, for example. I have vague ideas about how we might be able to refactor the pooling allocator for greater flexibility and enable its use in no-std / no-virtual-memory environments, but that is a bit orthogonal.

Eventually we will want to support async Wasm, yielding on out-of-fuel, ..., and the component model's async functionality. That is going to be a larger project on top of this already large project, so I'm going to delay talking about how we will cross that bridge until we get closer to it.

In practice, I expect that we will start with the OOM testing/fuzzing, create something very simple that fails immediately, and land that as "expected to fail". Then we can get that passing, which will be quite a bit of work for this first iteration, and remove the failure expectation. After that we can exercise a little more functionality inside the OOM testing/fuzzing, reveal new places we need to fix, and fix those. We can repeat this process until things look more and more complete. At some point we will add the clippy lints, initially to smaller modules and eventually to bigger regions of code. But the testing can be the forcing function for which area of code we add OOM handling to at each step of the way.

The best way to dynamically test/fuzz OOM handling that I know of is [the approach taken by SpiderMonkey's `oomTest()` helper](https://firefox-source-docs.mozilla.org/js/hacking_tips.html#how-to-debug-oomtest-failures): run a piece of code (written by humans or generated by a fuzzer) with a special allocator that returns null on the first allocation made and check that the code handled the OOM correctly, then run that code again but fail the second allocation, then the third, etc... up to your time/compute budget. Starting by building this infrastructure is my rough plan. I've done a little bit of digging for other approaches to ensuring that OOM handling is correct, and I haven't really found anything, just people arguing about whether you should even check for null returns from `malloc` or not, which is not very helpful. That is to say, if anyone has any other ideas or knows of any other prior art here, I'd love to hear about it!
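As a rough illustration of that loop in Rust, the sketch below wraps the system allocator in a `GlobalAlloc` that fails exactly one chosen allocation on the current thread, then reruns a closure while moving the failure point forward. Everything named here (`FailNth`, `REMAINING`, `oom_test`, `build`) is hypothetical and is not existing Wasmtime or SpiderMonkey infrastructure. It assumes the code under test allocates only through fallible APIs such as `try_reserve`; an infallible allocation hitting the injected failure would abort the process, which is how a missed OOM check would show up.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;
use std::collections::TryReserveError;

thread_local! {
    // `Some(n)`: n more allocations succeed on this thread, then one fails.
    // `None`: failure injection is disarmed.
    static REMAINING: Cell<Option<usize>> = const { Cell::new(None) };
}

/// Hypothetical allocator wrapper that can fail a single chosen allocation.
struct FailNth;

unsafe impl GlobalAlloc for FailNth {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let fail = REMAINING
            .try_with(|r| match r.get() {
                // Fail this allocation, then disarm so later ones succeed.
                Some(0) => {
                    r.set(None);
                    true
                }
                Some(n) => {
                    r.set(Some(n - 1));
                    false
                }
                None => false,
            })
            .unwrap_or(false);
        if fail {
            return std::ptr::null_mut();
        }
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
    // The default `realloc` and `alloc_zeroed` call `alloc`, so they are covered too.
}

#[global_allocator]
static GLOBAL: FailNth = FailNth;

/// Run `f` repeatedly, failing allocation 0, then 1, then 2, and so on, until a
/// run completes without reaching the injected failure or the budget runs out.
/// Returns the number of distinct failure points that were exercised.
fn oom_test<T, E>(max_points: usize, mut f: impl FnMut() -> Result<T, E>) -> usize {
    for n in 0..max_points {
        REMAINING.with(|r| r.set(Some(n)));
        let result = f();
        // Disarm; if the counter is still armed, `f` made at most `n` allocations.
        let injected = REMAINING.with(|r| r.replace(None)).is_none();
        drop(result);
        if !injected {
            return n;
        }
    }
    max_points
}

/// Example code under test: builds a vector using only fallible reservation.
fn build(items: &[u32]) -> Result<Vec<u32>, TryReserveError> {
    let mut out = Vec::new();
    for &item in items {
        out.try_reserve(1)?;
        out.push(item);
    }
    Ok(out)
}

fn main() {
    let items: Vec<u32> = (0..100).collect();
    let points = oom_test(1_000, || build(&items));
    println!("exercised {points} allocation failure points");
}
```

The counter is thread-local so that allocations made by other threads (a test harness, for example) are not failed by accident. A fuller version would probably also want a "keep failing after the nth allocation" mode and some way to report which allocation site was being failed when a run goes wrong.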

Handle OOM in the runtime #12069

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Handle OOM in the runtime #12069

Description

Footnotes

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions