[Misc] Add In-Container restart capability through supervisord for openai server #28502
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of tests runs automatically, and you can ask your reviewers to trigger select CI tests on top of that. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
46faab3 to 9206e4b (force-push)
@HappyAmazonian, your PR description included your HuggingFace token. I removed it, but I also suggest you revoke it.
Question: typically our users rely on the cluster orchestrator, such as Kubernetes, to perform the restart. This also gives granular observability into how often the pod crashes and restarts. Can you please share some insights?
These are valid points. We can disable this feature by default, but consider it a first line of defense in scenarios where no external restart mechanism is available. We are trying to bring process resiliency without being opinionated on deployment patterns.
Since it looks like this does not require any new dependencies, I'm interested to see what an "off by default" approach would look like. If it's simple enough, it seems OK. Otherwise, I'd say this sounds like custom behavior that should be enabled by building your own image. Let's see what having it as an option looks like, though.
…enai server vllm-project#28502 Signed-off-by: Shen Teng <[email protected]>
172b24e to 3d0bec8 (force-push)
Hi, thanks for replying. We changed our plugin, so now, by default, it runs `vllm serve` directly.
Signed-off-by: Shen Teng <[email protected]>
Signed-off-by: Shen Teng <[email protected]>
Is `standard-supervisor` an AWS/SageMaker-specific package?
@simon-mo Yes, it's an Apache 2.0 licensed package we publish: https://github.com/aws/model-hosting-container-standards. It is not necessarily tied to SageMaker; it encapsulates a bunch of boilerplate code for inference containers. We could partner to mitigate the risks you perceive. The intention for the launcher is to add basic functionality like in-container restart and converting ENV variables to CLI args.
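The env-to-args conversion mentioned above could be sketched as follows. This is an illustrative guess, not the package's actual behavior: the `OPTION_*` naming scheme and the `env_to_args` helper are assumptions.

```shell
# Hypothetical sketch: turn OPTION_* environment variables into CLI flags,
# e.g. OPTION_MAX_MODEL_LEN=4096 becomes "--max-model-len 4096".
# The OPTION_ prefix and helper name are assumptions for illustration.
env_to_args() {
  env | while IFS='=' read -r name value; do
    case "$name" in
      OPTION_*)
        # Strip the prefix, lowercase, and map underscores to dashes.
        flag=$(echo "${name#OPTION_}" | tr 'A-Z_' 'a-z-')
        printf -- '--%s %s ' "$flag" "$value"
        ;;
    esac
  done
}
```

The launch command could then be assembled as `vllm serve $(env_to_args)`, keeping the container image configurable without editing its command line.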
Can this change be made in the orchestration system by overriding the entrypoint? AFAIK K8s supports this.
This PR specifically targets non-K8s-based orchestration. Besides, we would like to introduce more features into the launcher, like converting ENV variables to arguments. @simon-mo Would you be amenable to an entrypoint.sh script that makes the launcher conditional on an ENV variable being present? @HappyAmazonian Could you take a shot at this?
What are the non-K8s-based orchestrators that use Docker containers and do not offer restarts? Fargate?
Add configurable entrypoint.sh that allows toggling standard-supervisor usage via the VLLM_USE_STANDARD_SUPERVISOR environment variable.
- Default: direct `vllm serve` execution (standard-supervisor disabled)
- Set VLLM_USE_STANDARD_SUPERVISOR=true to enable standard-supervisor

Signed-off-by: Shen Teng <[email protected]>
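A minimal sketch of what such a conditional entrypoint might look like. The environment variable name comes from the commit message above; the exact launcher invocation and script layout are assumptions, not the PR's actual file.

```shell
#!/bin/sh
# Hypothetical entrypoint sketch: choose_launcher prints the command the
# entrypoint would exec, based on VLLM_USE_STANDARD_SUPERVISOR.
# Unset or "false" means plain vllm serve (the default).
choose_launcher() {
  if [ "${VLLM_USE_STANDARD_SUPERVISOR:-false}" = "true" ]; then
    echo "standard-supervisor vllm serve"
  else
    echo "vllm serve"
  fi
}

# A real entrypoint.sh would end with something like:
#   exec $(choose_launcher) "$@"
```

Keeping the branch in a small function makes the default path (no supervisor, no new process in the tree) easy to verify.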
47c7e05 to c9d6f6f (force-push)
Yes: Fargate, SageMaker Processing, Batch Transform, etc. There are legacy container orchestration patterns built on polling loops rather than event-driven architectures; that is the target audience here. @simon-mo The PR has been updated to reduce the blast radius: the launcher is now opt-in.
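For contrast with what supervisord provides, the restart-on-exit behavior being discussed can be sketched as a naive shell loop. This is illustrative only (the function name and retry limit are made up); supervisord does the same job more robustly, with backoff, logging, and process-group handling.

```shell
# Illustrative sketch: restart a command whenever it exits non-zero,
# giving up after 3 failures. This is the behavior an in-container
# supervisor supplies when no orchestrator restarts the process.
run_supervised() {
  tries=0
  until "$@"; do
    tries=$((tries + 1))
    if [ "$tries" -ge 3 ]; then
      echo "giving up after $tries failures"
      return 1
    fi
    echo "process exited; restarting (attempt $tries)"
  done
}
```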
Purpose
Sometimes `vllm serve` exits accidentally. This PR adds `standard-supervisor`, which generates a config file for [supervisor](https://supervisord.org/) and starts the supervisor process with the launch command. This enables simple integration in the vLLM OpenAI server Dockerfile. The behavior is enabled via the `PROCESS_AUTO_RECOVERY=true` env var.

Test Plan
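To see what supervisord is being asked to do, here is a hedged illustration of a minimal supervisord config for auto-restarting `vllm serve`. This is not `standard-supervisor`'s actual output; the command flags, paths, and retry count are assumptions.

```shell
# Write an example supervisord config of the kind the launcher generates.
# All values below are illustrative, not standard-supervisor's real output.
cat > /tmp/supervisord-vllm.conf <<'EOF'
[supervisord]
nodaemon=true

[program:vllm]
command=vllm serve --host 0.0.0.0 --port 8000
autorestart=true
startretries=3
redirect_stderr=true
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
EOF
# supervisord -c /tmp/supervisord-vllm.conf  # supervisord restarts vllm on exit
```

`autorestart=true` is the key line: supervisord respawns the program whenever it exits, which is the in-container resiliency this PR targets.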
In our repo aws/model-hosting-container-standards, we fully test the supervisor integration: https://github.com/aws/model-hosting-container-standards/blob/main/python/tests/integration/test_supervisor_cli_integration.py
We've also manually tested the restart by killing the vllm process inside the Docker container.
To test this, we can build the openai server with
Start the vllm server container
Use `docker exec -it [container name] bash` to log into the running container, then use htop to kill the vllm subprocess and observe that the vllm process gets restarted.

Test Result