
[ML] Enhancements to ml.allocated_processors_scale for increased flexibility in model allocations. #110023


Open · wants to merge 1 commit into base: main

Conversation

@Rassyan (Contributor) commented Jun 21, 2024

Related Issue

#109001

Motivation

The current implementation of ml.allocated_processors_scale is limited to integer values, primarily used for scaling down processor counts to account for hyper-threading. This proposal aims to extend its functionality to better utilize excess capacity on nodes by allowing the scaling up of processor counts.

Proposed Changes

  1. Modify ml.allocated_processors_scale to accept floating-point values for finer granularity in scaling.
  2. Allow ml.allocated_processors_scale to support values less than 1, enabling an increase in the effective processor count used in model planning.
  3. Update documentation to clearly describe the effects of ml.allocated_processors_scale on model allocations and thread usage.

These changes will make the setting more adaptable to various resource availability scenarios.
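To make the proposal concrete, here is a hypothetical example of applying a fractional value through the cluster settings API, written as a small Python sketch. The URL, credentials, exact setting key (the full name in a real cluster may carry an xpack. prefix), and the assumption that the setting is dynamically updatable are all illustrative, not confirmed details of this change.

```python
# Hypothetical sketch: applying a fractional scale via the cluster settings API.
# The setting name below follows this PR's wording; the real key may be prefixed
# (e.g. xpack.ml.allocated_processors_scale), and the endpoint/credentials are placeholders.
import requests

resp = requests.put(
    "http://localhost:9200/_cluster/settings",
    json={"persistent": {"ml.allocated_processors_scale": 0.5}},
    auth=("elastic", "changeme"),  # placeholder credentials
)
print(resp.status_code, resp.json())
```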

@elasticsearchmachine added the needs:triage, v8.15.0, and external-contributor labels Jun 21, 2024
@Rassyan changed the title to [ML] Enhancements to ml.allocated_processors_scale for increased flexibility in model allocations. Jun 21, 2024
@kingherc added the :ml (Machine learning) and Team:ML labels Jun 21, 2024
@thecoop removed the needs:triage label Jun 28, 2024
@elasticsearchmachine (Collaborator)

Pinging @elastic/ml-core (Team:ML)

@davidkyle self-requested a review June 28, 2024 15:14
@davidkyle (Member) commented Jun 28, 2024

It is correct that the ml.allocated_processors_scale setting was designed to scale down the number of processors to account for hyper-threading. Inference speed is linearly related to the number of physical cores on the machine, so increasing the number of physical cores a model can use increases inference speed in a predictable manner. Once a model is using more threads than there are physical cores, the performance improvements tail off because those extra threads are hyper-threaded. You can see the effect of hyper-threading in this chart: https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-elser.html#_elser_v2_2.

Setting ml.allocated_processors_scale: 2 makes the performance increase from every new thread predictable, at the cost of a slight loss in overall performance.

If ml.allocated_processors_scale is a double and allowed to be < 1, then it would allow over-subscription of the CPU resources. For example, on a machine with 16 vCPUs, setting ml.allocated_processors_scale: 0.5 would make the model assignment logic think there are 32 vCPUs on the machine and allow a model to be deployed using 32 threads, but those 32 threads are backed by only 16 hardware threads.
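As a minimal sketch of the arithmetic in the example above, assuming the assignment logic simply divides the node's processor count by the scale (this is illustrative only, not the actual Elasticsearch planner code):

```python
# Illustrative only: how a scale above or below 1 changes the vCPU count the
# model-assignment logic plans against, per the 16 vCPU example above.
def planned_vcpus(hardware_vcpus: int, scale: float) -> int:
    return int(hardware_vcpus / scale)

print(planned_vcpus(16, 2))    # 8  -> scale down to physical cores (hyper-threading)
print(planned_vcpus(16, 1))    # 16 -> default behaviour
print(planned_vcpus(16, 0.5))  # 32 -> over-subscription, as proposed in this PR
```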

Have you tested this change on your server? I would have expected over-subscribing the thread count would introduce contention. Does such a change increase throughput for you?

This proposal aims to extend its functionality to better utilize excess capacity on nodes by allowing the scaling up of processor counts.

Is the problem that Elasticsearch thinks the machine has fewer vCPUs than it actually has, or that for some reason Elasticsearch cannot use all the available CPUs? For example, if Elasticsearch thinks that a 16 vCPU machine has only 8 vCPUs, then setting ml.allocated_processors_scale: 0.5 would allow Elasticsearch to use the true number of vCPUs. Is this the problem scenario you are experiencing?

@Rassyan (Contributor, Author) commented Jul 1, 2024

Hi, @davidkyle

Thank you for your detailed explanation and the insights provided by the chart regarding the performance implications of threading beyond physical core counts. I fully appreciate Elasticsearch's rigorous approach to this matter.

The primary scenario prompting my proposal arises when clients need to deploy multiple inference models within ES. Currently, the total number of deployment threads, calculated as number_of_allocations * threads_per_allocation, must not exceed the total available vCPUs across ML nodes in the cluster. For instance, if inference Model A serves Business A and Model B serves Business B, users may wish to maximize inference capabilities for both models without concurrent high-throughput usage. Under the existing constraints, deploying both models at their maximum thread capacity simultaneously isn't feasible without first halting one. Given that both are online services requiring uninterrupted operation, the ability to set ml.allocated_processors_scale to a value less than 1 would offer expert users the flexibility to deploy more models to handle complex operations, thereby placing more control over node throughput and performance in their hands.
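As a toy sketch of the sizing constraint described above, with hypothetical deployment sizes (this is just the arithmetic, not the actual assignment planner):

```python
# Illustrative only: total requested threads across deployments must fit within the
# vCPUs the assignment logic believes the ML nodes have; a scale < 1 inflates that view.
def deployments_fit(deployments, ml_node_vcpus, scale=1.0):
    planned = int(ml_node_vcpus / scale)
    required = sum(allocations * threads for allocations, threads in deployments)
    return required <= planned

two_models = [(1, 16), (1, 16)]  # Model A and Model B, each wanting 16 threads
print(deployments_fit(two_models, ml_node_vcpus=16))             # False: rejected today
print(deployments_fit(two_models, ml_node_vcpus=16, scale=0.5))  # True: possible with scale < 1
```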

Secondly, our performance tests have shown that even when the full count of available vCPUs is used for intensive inference stress testing, CPU utilization on the dedicated ML nodes remains low, hovering between 50% and 60%. This bottleneck appears to be linked directly to the current restrictions. Relaxing these constraints for expert-level testing could therefore help users identify more optimal deployment configurations and thread counts. After applying the adjustment in this PR, I have verified that this kind of over-subscription does raise CPU usage on the dedicated ML nodes.

Since the default setting remains unchanged, this modification poses no risk to general users, who are still protected by the vCPU count limitation. For expert users with the needs outlined above, this change would grant them greater flexibility to maximize performance and conduct more thorough testing. Given that similar constraints exist in ml-cpp, a coordinated strategy adjustment might be necessary. I look forward to discussing this further with you and exploring potential collaborative adjustments.

Have you tested this change on your server? I would have expected over-subscribing the thread count would introduce contention. Does such a change increase throughput for you?

Yes, I have conducted tests on our servers, and while it's generally expected that over-subscribing thread counts could lead to contention, our specific use case has shown a net increase in throughput. This is primarily due to the underutilization of CPU resources under current constraints, as mentioned earlier.

Is the problem that Elasticsearch thinks the machine has fewer vCPUs than it actually has, or that for some reason Elasticsearch cannot use all the available CPUs? For example, if Elasticsearch thinks that a 16 vCPU machine has only 8 vCPUs, then setting ml.allocated_processors_scale: 0.5 would allow Elasticsearch to use the true number of vCPUs. Is this the problem scenario you are experiencing?

The issue isn't that Elasticsearch misinterprets the number of vCPUs; rather, it's about how Elasticsearch currently limits the thread count per allocation based on the vCPUs available. This can prevent it from utilizing the full potential of the hardware, especially in scenarios where the workload is not consistently high, allowing for safe over-subscription without contention. The ml.allocated_processors_scale setting, when adjusted to below 1, is intended to offer more flexibility in such cases, not to correct a miscount of vCPUs but to optimize resource usage during varying load conditions.

I appreciate your engagement on this topic and look forward to further discussions to refine and enhance this feature.

@davidkyle (Member)

Thank you @Rassyan, that is a very interesting idea: allowing over-subscription of the CPU cores means that if you have 2 models deployed but only one is actively used, that model can acquire all the CPU resources. I now see how this change would be helpful to you. My team has a meeting tomorrow; I've put this item on the agenda for discussion, and we will get back to you after the meeting.

@Rassyan (Contributor, Author) commented Apr 21, 2025

Hi @davidkyle , since you're most familiar with this part of the codebase, would you consider checking this PR when convenient? I'd value your expertise on the implementation approach.

Labels
>enhancement, external-contributor, :ml, Team:ML, v9.1.0
6 participants