[Fix] Reduce busy polling when scheduler is idle #6026
Conversation
I think this is neat, can you make it a sys-arg option instead of the default behavior?
Looking forward to having this merged.
@xiezhq-hermann thanks for the review. I added
@xiezhq-hermann Just a friendly ping :-)
The change looks good to me, and I just fixed a minor lint problem.
@merrymercy Just a friendly ping... This PR has been in ready-to-merge status for 2 weeks.
Rebased.
@merrymercy @xiezhq-hermann Just a friendly ping again :) Is there anything I can do to help this PR land? It has been in "approved, ready to merge" status for almost a month now.
@p12tic sorry for the delay due to some coordination problem on our side and thanks again for your work and patience : ) |
Motivation
In setups with long inactivity periods, it is desirable to reduce system power consumption while sglang is doing nothing. This leads not only to power savings but also to more CPU thermal headroom when a request eventually arrives. This is especially important when multiple GPUs are connected, as each GPU would otherwise pin one thread at 100% CPU usage.
The primary use case is residential or small commercial users serving LLMs to a very small number of users.
FIXES #1730.
This PR is a simpler alternative to #1731. There is less risk of adverse performance impact because the affected code only runs when the server is idle.
The stated reason for closing #1731 was an in-progress refactoring that would resolve the performance problem by itself. Since half a year has passed since then, I'm bringing up a fix again in the hope that it will be deemed acceptable this time.
Modifications
The proposed solution is to use zmq.Poller on all sockets that may receive data requiring immediate handling. Blocking only while idle carries little risk and minimal latency impact.
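As a minimal sketch of the idea (not sglang's actual code; the socket endpoints and function names here are hypothetical, for illustration only): instead of spinning on non-blocking receives, register the sockets with a `zmq.Poller` and block until one becomes readable, so the scheduler thread sleeps while idle rather than pinning a CPU core.

```python
import zmq

def make_sockets(ctx):
    # Hypothetical endpoints standing in for the scheduler's real sockets.
    recv_from_tokenizer = ctx.socket(zmq.PULL)
    recv_from_tokenizer.bind("inproc://tokenizer")
    recv_from_detokenizer = ctx.socket(zmq.PULL)
    recv_from_detokenizer.bind("inproc://detokenizer")
    return recv_from_tokenizer, recv_from_detokenizer

def idle_wait(sockets, timeout_ms=1000):
    """Block until any socket is readable, instead of busy polling.

    Returns the subset of sockets that have pending data. A finite
    timeout lets the caller still run periodic housekeeping even when
    no messages arrive.
    """
    poller = zmq.Poller()
    for sock in sockets:
        poller.register(sock, zmq.POLLIN)
    # poll() blocks for up to timeout_ms; it returns an empty list on
    # timeout, so this costs ~zero CPU while the server is idle.
    ready = dict(poller.poll(timeout_ms))
    return [s for s in sockets if s in ready]
```

The key property is that `poller.poll()` parks the thread in the OS until a socket event or timeout occurs, whereas a loop of non-blocking `recv(zmq.NOBLOCK)` calls keeps the core at 100% even when there is nothing to do.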
Checklist