[ML] Enhancements to ml.allocated_processors_scale for increased flexibility in model allocations. #110023
Conversation
Pinging @elastic/ml-core (Team:ML)
It is correct that the setting [...]. If [...]

Have you tested this change on your server? I would have expected over-subscribing the thread count to introduce contention. Does such a change increase throughput for you?
Is the problem that Elasticsearch thinks the machine has fewer vCPUs than it actually does have, or that for some reason Elasticsearch cannot use all the available CPUs? For example, if Elasticsearch thinks that a 16 vCPU machine has only 8 vCPUs, then setting [...]
Hi @davidkyle, thank you for your detailed explanation and for the insights provided by the chart regarding the performance implications of threading beyond physical core counts. I fully appreciate Elasticsearch's rigorous approach to this matter.

The primary scenario prompting my proposal arises when clients need to deploy multiple inference models within ES. Currently, the total number of deployment threads, calculated as [...]

Secondarily, our performance tests have shown that even when utilizing the full count of available vCPUs for intensive inference stress testing, CPU utilization on the dedicated ML nodes remains low, hovering between 50% and 60%. This bottleneck appears to be linked directly to the current restrictions. Allowing expert-level testing by relaxing these constraints could therefore help users identify more optimal deployment sizes and thread counts.

After the adjustment in this PR, I have verified that such hyper-threaded utilization does improve CPU usage rates on dedicated ML nodes. Since the default setting remains unchanged, this modification poses no risk to general users, who are still protected by the vCPU count limitation. For expert users with the needs outlined above, this change would grant greater flexibility to maximize performance and conduct more thorough testing.

Given that similar constraints exist in ml-cpp, a coordinated strategy adjustment might be necessary. I look forward to discussing this further with you and exploring potential collaborative adjustments.
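To make the scenario above concrete, here is a minimal illustrative sketch (not Elasticsearch's actual assignment code; the model IDs and the 16 vCPU node are hypothetical), assuming each deployment consumes number_of_allocations × threads_per_allocation threads and the total must fit within the node's vCPU budget under the current rules:

```java
import java.util.List;

// Illustrative sketch: why a second model may not fit on a node even though
// the first model alone leaves the CPUs under-utilised most of the time.
public class DeploymentThreadBudget {

    record Deployment(String modelId, int numberOfAllocations, int threadsPerAllocation) {
        int totalThreads() {
            return numberOfAllocations * threadsPerAllocation; // threads consumed by this deployment
        }
    }

    public static void main(String[] args) {
        int nodeVcpus = 16; // hypothetical dedicated ML node

        List<Deployment> wanted = List.of(
            new Deployment("model-a", 2, 8), // 16 threads
            new Deployment("model-b", 1, 8)  //  8 threads
        );

        int requested = wanted.stream().mapToInt(Deployment::totalThreads).sum();

        // Under the current rules the total cannot exceed the node's vCPU count,
        // so model-b cannot be placed here even if model-a is mostly idle.
        System.out.printf("requested threads = %d, vCPU budget = %d, fits = %b%n",
                requested, nodeVcpus, requested <= nodeVcpus);
    }
}
```

The over-subscription argument is that when only one of the two deployments is actively serving traffic, allowing both to be assigned lets the active one use CPU that would otherwise sit idle.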
Yes, I have conducted tests on our servers, and while it's generally expected that over-subscribing thread counts could lead to contention, our specific use case has shown a net increase in throughput. This is primarily due to the underutilization of CPU resources under current constraints, as mentioned earlier.
The issue isn't that Elasticsearch misinterprets the number of vCPUs; rather, it's about how Elasticsearch currently limits the thread count per allocation based on the vCPUs available. This can prevent it from utilizing the full potential of the hardware, especially in scenarios where the workload is not consistently high, allowing for safe over-subscription without contention. The [...]

I appreciate your engagement on this topic and look forward to further discussions to refine and enhance this feature.
Thank you @Rassyan, that is a very interesting idea: allow over-subscription of the CPU cores so that if you have 2 models deployed but only one model is actively used, that model can acquire all the CPU resource. I now see how this change would be helpful to you. My team has a meeting tomorrow; I've put this item on the agenda for discussion and we will get back to you after the meeting.
Hi @davidkyle, since you're most familiar with this part of the codebase, would you consider checking this PR when convenient? I'd value your expertise on the implementation approach.
Related Issue
#109001
Motivation
The current implementation of `ml.allocated_processors_scale` is limited to integer values, primarily used for scaling down processor counts to account for hyper-threading. This proposal aims to extend its functionality to better utilize excess capacity on nodes by allowing the scaling up of processor counts.
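As a rough sketch of the behaviour described above, assuming the setting divides the node's allocated vCPU count before it is used for planning (illustrative only, not the actual implementation):

```java
// Sketch of the current semantics: an integer scale >= 1 can only shrink the
// effective processor count (e.g. to discount hyper-threaded logical cores).
public class CurrentScaleSketch {
    static int effectiveProcessors(int allocatedVcpus, int allocatedProcessorsScale) {
        // assumption: the scale divides the vCPU count; the scale must be >= 1 today
        return allocatedVcpus / allocatedProcessorsScale;
    }

    public static void main(String[] args) {
        System.out.println(effectiveProcessors(16, 1)); // 16 – default
        System.out.println(effectiveProcessors(16, 2)); // 8  – treat hyper-threads as half a core
    }
}
```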
Proposed Changes

- Change `ml.allocated_processors_scale` to accept floating-point values for finer granularity in scaling.
- Allow `ml.allocated_processors_scale` to support values less than 1, enabling an increase in the effective processor count used in model planning.
- Describe the impact of `ml.allocated_processors_scale` on model allocations and thread usage.

These changes will make the setting more adaptable to various resource availability scenarios.
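Below is a hedged sketch of the proposed behaviour, under the same assumed divide-by-scale semantics but with a floating-point scale that may drop below 1; the concrete numbers are hypothetical and the default of 1 remains unchanged, as noted above:

```java
// Sketch of the proposed semantics: a double scale, where values below 1
// inflate the effective processor count and therefore permit over-subscription.
public class ProposedScaleSketch {
    static int effectiveProcessors(int allocatedVcpus, double allocatedProcessorsScale) {
        return (int) (allocatedVcpus / allocatedProcessorsScale);
    }

    public static void main(String[] args) {
        int vcpus = 16;
        int threadsPerAllocation = 8;

        for (double scale : new double[] { 2.0, 1.0, 0.5 }) {
            int effective = effectiveProcessors(vcpus, scale);
            int allocationsThatFit = effective / threadsPerAllocation;
            System.out.printf("scale=%.1f -> effective processors=%d, 8-thread allocations that fit=%d%n",
                    scale, effective, allocationsThatFit);
        }
    }
}
```

On a 16 vCPU node, a scale of 0.5 would let the planner see 32 processors and place four 8-thread allocations instead of two, which is the over-subscription scenario discussed in the conversation above.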