scx_rustland: Introduce a congestion threshold #1894

Merged
merged 1 commit into main from rustland-congestion-threshold on May 16, 2025

Conversation

Contributor

@arighi arighi commented May 16, 2025

If too many tasks are piling up in the user-space scheduler, we risk hitting stall conditions.

To prevent this, introduce a congestion threshold: when the number of waiting tasks exceeds this threshold, the scheduler will proactively flush the queue to bring the task count back below the critical level.

This helps handle heavy stress tests that might flood the system with a high volume of tasks.
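As a rough, hypothetical sketch of that flush logic (the struct, field, and method names below are illustrative, not the actual scx_rustland code):

```rust
// Illustrative sketch only: names and structure are made up, not taken
// from the actual scx_rustland sources.

/// Congestion threshold: maximum number of tasks allowed to wait in the
/// user-space scheduler before a proactive flush is triggered.
const NR_WAITING_MAX: u64 = 128;

struct Scheduler {
    /// Tasks queued in user space, waiting to be dispatched (PIDs used
    /// here as a stand-in for real task descriptors).
    task_pool: Vec<i32>,
}

impl Scheduler {
    fn nr_waiting(&self) -> u64 {
        self.task_pool.len() as u64
    }

    /// Dispatch a single task back to the BPF side (stubbed out here).
    fn dispatch_one(&mut self) {
        let _pid = self.task_pool.pop();
    }

    fn schedule(&mut self) {
        // Normal path: dispatch whatever the current cycle asks for.
        self.dispatch_one();

        // Congestion path: if the backlog exceeds the threshold, keep
        // flushing until the count drops back below the critical level,
        // so the queue cannot grow unboundedly and trigger stalls.
        while self.nr_waiting() > NR_WAITING_MAX {
            self.dispatch_one();
        }
    }
}

fn main() {
    let mut sched = Scheduler { task_pool: (0..1000).collect() };
    sched.schedule();
    assert!(sched.nr_waiting() <= NR_WAITING_MAX);
}
```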

@arighi arighi requested review from htejun, multics69 and hodgesds May 16, 2025 08:35
@@ -117,6 +117,9 @@ struct Opts {
// Time constants.
const NSEC_PER_USEC: u64 = 1_000;

// Congestion threshold.
const NR_WAITING_MAX: u64 = 128;
Contributor

Could be interesting to try scaling this by the number of cores.

Contributor Author

Could be interesting to try scaling this by the number of cores.

I was thinking about that initially, but it's not trivial to model this effectively. On really big systems we could end up allowing thousands of tasks to queue up before triggering any flush, leading to bursty and stuttering behavior.

If tasks keep queuing up and the queue length keeps growing beyond a certain threshold, it doesn't matter much whether we have 1 CPU or 1000 CPUs: the system simply doesn't have enough capacity to consume the incoming requests. In that case we may want to operate in a more synchronous way, flushing tasks to prevent wait times from getting too long (which may lead to stalls).
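To make that concern concrete, here is a tiny, purely illustrative comparison (the per-CPU value is made up, not something proposed in this PR):

```rust
// Purely illustrative: compares a fixed congestion threshold with a
// hypothetical per-CPU one. None of these numbers come from the PR.
const NR_WAITING_MAX: u64 = 128;    // fixed threshold used by this PR
const NR_WAITING_PER_CPU: u64 = 16; // made-up per-CPU budget

fn per_cpu_threshold(nr_cpus: u64) -> u64 {
    NR_WAITING_PER_CPU * nr_cpus
}

fn main() {
    // On a 512-CPU machine a per-CPU threshold would let thousands of
    // tasks accumulate before any flush, which is exactly the
    // burstiness/stuttering concern described above.
    println!("fixed threshold:   {}", NR_WAITING_MAX);         // 128
    println!("per-CPU threshold: {}", per_cpu_threshold(512)); // 8192
}
```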

BTW, I may also update this PR; I'm not really happy with how I've implemented the flush. I'm currently running more tests with a slightly different approach. :)

Contributor

Long term it could be interesting to see if an arena-based approach could work as well; it would require a bit of thinking though.

Contributor Author

Oh yes, replacing the ring buffers used to bounce tasks to/from BPF with arenas would also be interesting. It's something I'm planning to do once arenas become a bit more stable.

@arighi arighi force-pushed the rustland-congestion-threshold branch from cd0ef85 to 32ce51a on May 16, 2025 13:49
If too many tasks are piling up in the user-space scheduler we may
risk hitting stall conditions.

To prevent this, introduce a congestion threshold: when the number of
waiting tasks exceeds this threshold, the scheduler will proactively
flush the queue to bring the task count back below the critical level.

Moreover, introduce the new option --nr-waiting-max to make this
threshold configurable from the command line.

This helps handle heavy stress tests that might flood the system with a
high volume of tasks.

Signed-off-by: Andrea Righi <[email protected]>
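The --nr-waiting-max option could be wired into the existing clap-based Opts struct roughly as follows (a minimal, hypothetical sketch; the exact attributes and default in the actual patch may differ):

```rust
// Hypothetical sketch of exposing the congestion threshold on the
// command line with clap's derive API; not the actual patch.
use clap::Parser;

#[derive(Debug, Parser)]
struct Opts {
    /// Maximum number of tasks that can be waiting in the user-space
    /// scheduler before a proactive flush is triggered.
    #[clap(long, default_value = "128")]
    nr_waiting_max: u64,
}

fn main() {
    let opts = Opts::parse();
    println!("congestion threshold: {}", opts.nr_waiting_max);
}
```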
@arighi arighi force-pushed the rustland-congestion-threshold branch from 32ce51a to f3f24fe on May 16, 2025 14:04
@arighi arighi added this pull request to the merge queue May 16, 2025
Merged via the queue into main with commit 2a5a1d2 May 16, 2025
32 checks passed
@arighi arighi deleted the rustland-congestion-threshold branch May 16, 2025 15:19