Commit 4086acf

Fix on-load VRAM OOM (#11144)
Slow down the CPU on model load so it does not run ahead of the GPU. This fixes a VRAM OOM on Flux 2 load. I went to debug this with the memory trace pickles, which require --disable-cuda-malloc, and that made the bug go away. So I tried this synchronize instead and it worked. This has some very complex interactions with the cuda malloc async allocator and I don't have a solid theory on this one yet. Still debugging, but this gets us past the OOM for the moment.
1 parent 50ca97e commit 4086acf

1 file changed: 2 additions, 0 deletions
comfy/model_patcher.py

Lines changed: 2 additions & 0 deletions

@@ -762,6 +762,8 @@ def load(self, device_to=None, lowvram_model_memory=0, force_patch_weights=False
                 key = "{}.{}".format(n, param)
                 self.unpin_weight(key)
                 self.patch_weight_to_device(key, device_to=device_to)
+            if comfy.model_management.is_device_cuda(device_to):
+                torch.cuda.synchronize()

             logging.debug("lowvram: loaded module regularly {} {}".format(n, m))
             m.comfy_patched_weights = True
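
For context, here is a minimal sketch of the pattern the fix relies on. This is not ComfyUI code: the loader function and module list below are hypothetical, and the exact interaction with the cudaMallocAsync backend is still being debugged per the commit message. The idea is that non-blocking host-to-device copies return before they complete, so the CPU can queue work for many modules ahead of the GPU; a per-module torch.cuda.synchronize() keeps the host from running ahead.

import torch

def load_weights_to_gpu(modules, device="cuda"):
    # Hypothetical loader illustrating the per-module synchronize pattern.
    for module in modules:
        for name, param in module.named_parameters():
            # Async host-to-device copy; returns before the transfer finishes.
            param.data = param.data.to(device, non_blocking=True)
        if torch.device(device).type == "cuda":
            # Wait for the queued copies before starting the next module,
            # so pending transfers and allocations do not pile up on the device.
            torch.cuda.synchronize()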
