scx_rustland: Introduce a congestion threshold #1894
Conversation
scheds/rust/scx_rustland/src/main.rs
Outdated
@@ -117,6 +117,9 @@ struct Opts {
 // Time constants.
 const NSEC_PER_USEC: u64 = 1_000;

+// Congestion threshold.
+const NR_WAITING_MAX: u64 = 128;
Could be interesting to try scaling this by the number of cores.
I was thinking about that initially, but it's not trivial to model effectively. On really big systems we might allow thousands of tasks to queue up before triggering any flush, leading to burstiness and stuttering.
If tasks are queuing up and the queue length keeps growing past a certain threshold, it doesn't matter much whether we have 1 CPU or 1000 CPUs: the system simply doesn't have enough capacity to consume the incoming requests. In that case we may want to operate in a more synchronous way, flushing tasks to prevent excessive wait times (which may lead to stalls).
BTW, I may also update this PR: I'm not really happy with how I've implemented the flush, and I'm currently running more tests with a slightly different approach. :)
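A minimal, self-contained sketch of the approach being discussed, not the actual scx_rustland implementation: the Task struct, the VecDeque backlog, and the dispatch_task() helper are made up for illustration, and only the NR_WAITING_MAX constant (128) comes from the diff above. When the backlog grows past the threshold, tasks are flushed synchronously until it drops back below it, regardless of how many CPUs are available.

```rust
use std::collections::VecDeque;

// Congestion threshold (the constant introduced by this PR).
const NR_WAITING_MAX: u64 = 128;

// Hypothetical queued task: just a PID for illustration.
struct Task {
    pid: i32,
}

// Hypothetical dispatch hook; in the real scheduler this would hand the
// task back to the BPF side.
fn dispatch_task(task: &Task) {
    println!("dispatching pid {}", task.pid);
}

// If the backlog exceeds the congestion threshold, flush tasks
// synchronously until it drops back below the threshold. The check is
// deliberately independent of the CPU count: a queue that keeps growing
// means the system can't keep up, no matter how many CPUs it has.
fn maybe_flush(queue: &mut VecDeque<Task>) {
    while queue.len() as u64 > NR_WAITING_MAX {
        if let Some(task) = queue.pop_front() {
            dispatch_task(&task);
        }
    }
}

fn main() {
    // Simulate a burst of 200 queued tasks.
    let mut queue: VecDeque<Task> = (0..200).map(|pid| Task { pid }).collect();
    maybe_flush(&mut queue);
    assert!(queue.len() as u64 <= NR_WAITING_MAX);
    println!("{} tasks still waiting", queue.len());
}
```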
Long term, it could be interesting to see if an arena-based approach could work as well; it would require a bit of thinking, though.
Oh yes, replacing the ring buffers used to bounce tasks from/to BPF with arenas would also be interesting. It's something I'm planning to do once arenas become a bit more stable.
Force-pushed from cd0ef85 to 32ce51a
If too many tasks are piling up in the user-space scheduler we may risk hitting stall conditions.

To prevent this, introduce a congestion threshold: when the number of waiting tasks exceeds this threshold, the scheduler will proactively flush the queue to bring the task count back below the critical level.

Moreover, introduce the new option --nr-waiting-max to make this threshold configurable from the command line.

This helps handle heavy stress tests that might flood the system with a high volume of tasks.

Signed-off-by: Andrea Righi <[email protected]>
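For reference, a hedged sketch of how a --nr-waiting-max option could be wired up with clap's derive API (the field name, help text, and the trimmed-down Opts struct are assumptions; only the option name and the 128 default come from this PR and the diff above):

```rust
use clap::Parser;

/// Scheduler options (reduced to the single field relevant here).
#[derive(Debug, Parser)]
struct Opts {
    /// Maximum number of tasks allowed to wait in the user-space scheduler
    /// before the queue is proactively flushed (congestion threshold).
    #[clap(long, default_value = "128")]
    nr_waiting_max: u64,
}

fn main() {
    let opts = Opts::parse();
    println!("congestion threshold: {}", opts.nr_waiting_max);
}
```

Running the binary with `--nr-waiting-max 256`, for example, would raise the threshold to 256 waiting tasks.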
Force-pushed from 32ce51a to f3f24fe
If too many tasks are piling up in the user-space scheduler we may risk hitting stall conditions.
To prevent this, introduce a congestion threshold: when the number of waiting tasks exceeds this threshold, the scheduler will proactively flush the queue to bring the task count back below the critical level.
This helps handle heavy stress tests that might flood the system with a high volume of tasks.