@rattus128 (Contributor) commented Dec 4, 2025

TIL that the WAN TE has a single 2GB weight, with 16MB as the next size down. This means that team 8GB VRAM would fully offload the TE in async offload mode, because the offload buffer reservation just multiplied this giant size by the number of streams.

Instead, do the more complex logic of summing up the upcoming to-load weight sizes, which avoids triple counting this massive weight.

Partial unload does the converse, recording the NS (num streams) most recent unloads as they go.
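To make the difference concrete, here is a minimal Python sketch of the two buffer estimates. It is not the code from this PR: `plan_load`, its parameters, and the toy weight sizes are all hypothetical, and the greedy loop stops at the first weight that does not fit, which is a simplification.

```python
def plan_load(sizes_mb, usable_mb, num_streams, windowed=True):
    """Toy load planner. Returns (loaded_mb, offloaded_mb, buffer_mb).

    Hypothetical sketch only. Weights are walked largest-first and a weight
    stays on the GPU only if it fits next to the offload buffer estimate.
    """
    sizes = sorted(sizes_mb, reverse=True)
    loaded = 0
    for i, size in enumerate(sizes):
        if windowed:
            # New idea: sum the next num_streams weights that would still
            # have to be streamed if weight i is kept on the GPU.
            buffer_if_loaded = sum(sizes[i + 1:i + 1 + num_streams])
        else:
            # Old idea: largest weight multiplied by the number of streams.
            buffer_if_loaded = sizes[0] * num_streams
        if loaded + size + buffer_if_loaded > usable_mb:
            # Weight i and everything after it gets offloaded.
            if windowed:
                buffer = sum(sizes[i:i + num_streams])
            else:
                buffer = sizes[0] * num_streams
            return loaded, sum(sizes[i:]), buffer
        loaded += size
    return loaded, 0, 0


# Toy model: one 2000 MB weight plus 200 weights of 16 MB (5200 MB total).
toy_sizes = [2000] + [16] * 200
old_plan = plan_load(toy_sizes, usable_mb=5000, num_streams=3, windowed=False)
new_plan = plan_load(toy_sizes, usable_mb=5000, num_streams=3, windowed=True)
print(old_plan)  # (0, 5200, 6000)
print(new_plan)  # (4944, 256, 48)
```

With the multiply-by-streams estimate the toy buffer alone exceeds the budget, so nothing is loaded. With the windowed sum the big weight is counted once at most, and most of the model fits.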

This fixes two side reports in #11081.
(This is not the OP's issue.)

Example test conditions:
RTX 3060 with --reserve-vram 5.5 (emulates an 8GB card with some extra non-incidental VRAM usage, matching the user's numbers)


Before:

Requested to load WanTEModel
loaded partially; 5334.55 MB usable, 0.00 MB loaded, 6419.09 MB offloaded, 6009.00 MB buffer reserved, lowvram patches: 0

After:

Requested to load WanTEModel
loaded partially; 5334.55 MB usable, 5283.48 MB loaded, 1136.00 MB offloaded, 48.00 MB buffer reserved, lowvram patches: 0
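As a quick sanity check on the two "buffer reserved" figures, the snippet below divides each by the stream count. The value of 3 for `num_streams` is an assumption inferred from the "triple counting" wording, not something stated in the logs.

```python
# Buffer figures copied from the Before/After log lines.
before_buffer_mb = 6009.00
after_buffer_mb = 48.00

# Assumption: three streams, inferred from "triple counting".
num_streams = 3

per_stream_before = before_buffer_mb / num_streams
per_stream_after = after_buffer_mb / num_streams
print(per_stream_before)  # 2003.0
print(per_stream_after)   # 16.0
```

Under that assumption, the old reservation works out to roughly the 2GB weight per stream, and the new one to the 16MB weight per stream.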

@comfyanonymous merged commit 6be85c7 into comfyanonymous:master on Dec 4, 2025
10 checks passed