@manoelmarques
Contributor

@manoelmarques manoelmarques commented Mar 12, 2025

Add a level 3 sleep mode that offloads the model weights to disk and discards the KV cache.

The model weights are not backed up in CPU memory, and the contents of the KV cache are discarded.

Level 3 sleep keeps CPU memory use to a minimum, and the weights are loaded efficiently from disk on wake-up.
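
To make the intended behavior concrete, here is a minimal, self-contained sketch of the idea using only the Python standard library. It is not the code in this PR and does not use vLLM's API: `ToyEngine`, `sleep_to_disk`, and `wake_up` are hypothetical names, the "weights" are plain float arrays, and the single-file, caller-supplied-directory layout is just one assumed choice.

```python
import os
import pickle
import tempfile
from array import array


class ToyEngine:
    """Toy stand-in for an inference engine (hypothetical, not vLLM code)."""

    def __init__(self, weights):
        self.weights = weights        # name -> array('f') of parameters
        self.kv_cache = {}            # request id -> cached values
        self._offload_path = None     # set only while weights live on disk

    def sleep_to_disk(self, directory):
        """Write all weights to one file in `directory`, then free memory."""
        fd, path = tempfile.mkstemp(suffix=".weights", dir=directory)
        with os.fdopen(fd, "wb") as f:
            pickle.dump(self.weights, f)
        self._offload_path = path
        self.weights = None           # no in-memory backup is kept
        self.kv_cache = {}            # cache contents are discarded

    def wake_up(self):
        """Reload weights from disk; the KV cache starts out empty."""
        with open(self._offload_path, "rb") as f:
            self.weights = pickle.load(f)
        os.remove(self._offload_path)  # clean up the offload file
        self._offload_path = None
```

In this sketch the caller picks the directory and every tensor goes into a single file. Both are assumptions made for brevity; a real implementation would have to decide them deliberately.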

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@manoelmarques manoelmarques force-pushed the sleepwake branch 2 times, most recently from a627b18 to 348c817 Compare March 12, 2025 12:44
@manoelmarques manoelmarques marked this pull request as draft March 12, 2025 13:29
@manoelmarques manoelmarques force-pushed the sleepwake branch 8 times, most recently from 8d9863d to 0b7192a Compare March 18, 2025 19:12
Member

@youkaichao youkaichao left a comment

I'm hesitant to do this. Offloading to disk raises too many questions, such as which disk location to use and whether or not to save all tensors in one file ...

I would recommend that you rewrite the sleep (level=1) logic for your use case and maintain it yourself. I don't think this complexity is a good fit for upstream.

@mergify

mergify bot commented Apr 1, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @manoelmarques.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify

mergify bot commented Aug 26, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @manoelmarques.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Aug 26, 2025
@mergify mergify bot removed the needs-rebase label Aug 28, 2025
@manoelmarques manoelmarques force-pushed the sleepwake branch 4 times, most recently from 503389d to bdca49c Compare August 29, 2025 13:35
@manoelmarques manoelmarques force-pushed the sleepwake branch 3 times, most recently from d4d0b1b to 3e6d848 Compare September 9, 2025 16:58
@mergify

mergify bot commented Sep 12, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @manoelmarques.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Co-authored-by: aavarghese <[email protected]>
Co-authored-by: manoelmarques <[email protected]>
Signed-off-by: Manoel Marques <[email protected]>