mp: use look-ahead actuals for stream offload VRAM calculation (fixes unwanted TE full offload) #11096
TIL that the WAN TE has a 2GB weight, with 16MB as the next size down. This means that users with 8GB of VRAM would fully offload the TE in async offload mode, because the calculation simply multiplied this giant size by the number of streams.
Instead, do the more complex logic of summing the sizes of the upcoming to-load weights, which avoids triple counting this massive weight.
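To illustrate the difference between the two estimates, here is a minimal sketch. It is not the code from this PR: the function names, the list-of-sizes input, and the choice of three in-flight slots are all assumptions made for the example.

```python
from typing import Sequence


def naive_stream_reserve(weight_sizes: Sequence[int], num_streams: int) -> int:
    """Hypothetical old-style estimate: largest weight times stream count."""
    return max(weight_sizes, default=0) * num_streams


def lookahead_stream_reserve(load_order_sizes: Sequence[int], num_streams: int) -> int:
    """Hypothetical look-ahead estimate.

    Returns the largest sum over any run of `num_streams` consecutive
    weights in load order, i.e. the most memory that could be in flight
    at once if each stream holds one upcoming weight.
    """
    peak = 0
    window = 0
    for i, size in enumerate(load_order_sizes):
        window += size
        if i >= num_streams:
            # Drop the weight that has fallen out of the look-ahead window.
            window -= load_order_sizes[i - num_streams]
        peak = max(peak, window)
    return peak


# Sizes in MB: one 2GB weight followed by 16MB weights (illustrative values).
sizes_mb = [2048, 16, 16, 16]
naive = naive_stream_reserve(sizes_mb, num_streams=3)
lookahead = lookahead_stream_reserve(sizes_mb, num_streams=3)
```

With these illustrative numbers the naive estimate reserves 6144MB, while the look-ahead sum reserves 2080MB, because the large weight is counted only once.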
Partial unload does the converse, recording the NS most recent unloads as it goes.
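The unload side can be pictured as a fixed-length history of recently unloaded weight sizes. The sketch below is an assumption about the general shape only. The class name `RecentUnloads`, its methods, and the reading of NS as the number of streams are hypothetical and not taken from the PR.

```python
from collections import deque


class RecentUnloads:
    """Hypothetical tracker of the sizes of the NS most recent unloads."""

    def __init__(self, num_streams: int) -> None:
        # A bounded deque discards the oldest entry automatically.
        self._sizes = deque(maxlen=num_streams)

    def record(self, size: int) -> None:
        """Record the size of a weight that was just unloaded."""
        self._sizes.append(size)

    def total(self) -> int:
        """Sum of the sizes currently held in the history."""
        return sum(self._sizes)


# Sizes in MB, unloaded in this order (illustrative values).
tracker = RecentUnloads(num_streams=3)
for size_mb in [2048, 16, 16, 16]:
    tracker.record(size_mb)
recent_total = tracker.total()
```

Once the 2GB weight has aged out of the history, only the three 16MB entries remain in the total, so the large weight stops inflating the figure.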
This fixes two side reports in #11081.
(This is not the OP's issue.)
Example test conditions:
RTX3060 --reserve-vram 5.5 (emulates 8GB with some extra non-incidental VRAM usage, matching the user's number)
Before:
After: