[Core] Add a level 3 sleep/wake_up that offloads tensors to disk #14678
youkaichao
left a comment
I'm hesitant to do this. There are too many things to consider on the disk side, such as where on disk to store the tensors and whether to save all tensors in one file ...
I would recommend you rewrite the sleep (level=1) logic for your use case and keep it as your own. I don't think this complexity is a good fit for upstream.
This pull request has merge conflicts that must be resolved before it can be |
Co-authored-by: aavarghese <[email protected]>
Co-authored-by: manoelmarques <[email protected]>
Signed-off-by: Manoel Marques <[email protected]>
Add a level 3 sleep mode that offloads the model weights to disk and discards the KV cache.
The model weights are not backed up in CPU memory, and the contents of the KV cache are not preserved.
Level 3 sleep keeps CPU memory use to a minimum while asleep; on wake-up, the weights are loaded back from disk.
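
The sleep/wake-up cycle described above can be illustrated with a small, self-contained sketch. This is not the code in this PR and does not use vLLM or PyTorch: `DiskSleeper`, its methods, the `manifest.json` file, and the one-file-per-tensor layout are all hypothetical choices made for illustration, and `array.array` stands in for real tensors.

```python
import array
import json
import tempfile
from pathlib import Path


class DiskSleeper:
    """Toy model of a level 3 sleep: weights go to disk, the KV cache is dropped.

    Hypothetical illustration only; names and file layout are assumptions.
    """

    def __init__(self, weights, kv_cache, offload_dir):
        self.weights = weights            # name -> array.array, a stand-in for tensors
        self.kv_cache = kv_cache          # contents are NOT preserved across sleep
        self.offload_dir = Path(offload_dir)

    def sleep(self):
        """Write every weight to its own file, then release the in-memory copies."""
        self.offload_dir.mkdir(parents=True, exist_ok=True)
        manifest = {}
        for i, (name, tensor) in enumerate(self.weights.items()):
            path = self.offload_dir / f"tensor_{i}.bin"
            with open(path, "wb") as f:
                tensor.tofile(f)
            # Record what is needed to rebuild the array on wake-up.
            manifest[name] = {
                "file": path.name,
                "typecode": tensor.typecode,
                "length": len(tensor),
            }
        (self.offload_dir / "manifest.json").write_text(json.dumps(manifest))
        self.weights = {}   # no backup is kept in memory
        self.kv_cache = []  # KV cache is discarded, not saved

    def wake_up(self):
        """Rebuild the weights from disk; the KV cache starts out empty."""
        manifest = json.loads((self.offload_dir / "manifest.json").read_text())
        restored = {}
        for name, meta in manifest.items():
            tensor = array.array(meta["typecode"])
            with open(self.offload_dir / meta["file"], "rb") as f:
                tensor.fromfile(f, meta["length"])
            restored[name] = tensor
        self.weights = restored


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sleeper = DiskSleeper(
            weights={"layer.0.weight": array.array("f", [1.0, 2.0, 0.5])},
            kv_cache=["cached", "entries"],
            offload_dir=tmp,
        )
        sleeper.sleep()
        print(sleeper.weights, sleeper.kv_cache)
        sleeper.wake_up()
        print(sleeper.weights, sleeper.kv_cache)
```

The sketch makes two of the open design questions from the review explicit: the offload location is a caller-supplied directory, and each tensor is written to a separate file rather than one combined file. Either choice could reasonably go the other way.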