Add node write load to the ClusterInfo #130411
Conversation
Sets up ClusterInfoService to collect node write load and pass it into ClusterInfo. The node write load stats are not yet supplied; they'll be zero/empty in the ClusterInfo for now. Relates ES-11990
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)
@Override
protected Collection<Class<? extends Plugin>> getPlugins() {
    return pluginList(InternalSettingsPlugin.class, BogusEstimatedHeapUsagePlugin.class);
    return pluginList(InternalSettingsPlugin.class, BogusEstimatedHeapUsagePlugin.class, BogusWriteLoadCollectorPlugin.class);
I originally thought we'd be collecting the node write load stats in the stateless code. Now it's looking like we may not have a dependency on stateless code to collect the stats, but it's not yet decided. As I've already got this implemented, I'd like to go ahead with it as is, if folks agree. It would only be a testing change later, if needed.
I think it will be more than a testing change if we don't rely on stateless for collecting stats? We don't need to use the service provider pattern in that case, i.e. no need for the pluginsService.loadSingletonServiceProvider in the NodeServiceProvider class. That said, I am ok with us going with this for now. We can remove it later if it turns out to be unnecessary.
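To make the pattern concrete, the wiring being discussed would look roughly like this. The collector interface and its method are illustrative names only, not the actual classes in this PR, and the exact loadSingletonServiceProvider overload should be checked against PluginsService:

```java
import java.util.Map;

// Illustrative SPI that a plugin (e.g. stateless) could implement to supply the stats.
public interface WriteLoadCollector {
    Map<String, NodeWriteLoad> collectWriteLoads();

    // Fallback used when no plugin provides an implementation.
    WriteLoadCollector EMPTY = () -> Map.of();
}

// In NodeServiceProvider, roughly (signature assumed, see PluginsService for the real overload):
// WriteLoadCollector collector = pluginsService.loadSingletonServiceProvider(WriteLoadCollector.class, () -> WriteLoadCollector.EMPTY);
```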
The team discussion seems to have decided that we will either extend the NodeStats API or add a new transport action for collection purposes. So maybe we can remove it in this PR already?
Yes, I'd have to change the NodeServiceProvider line, too 👍
My inclination is to ship this as is, without revisiting the testing, so I don't block myself or others picking up the next pieces of project work. And I'm also not 100% confident the proposals will survive contact with the code, haha.
// NOTE: We don't serialize estimatedHeapUsages at this stage, to avoid
// committing to API payloads until the feature is settled
// NOTE: We don't serialize estimatedHeapUsages/nodeWriteLoads at this stage, to avoid
// committing to API payloads until the features are settled
I'm not entirely sure what the toXContentChunked is used for, but figured I should follow suit.
LGTM with some comments
String nodeId,
int totalWriteThreadPoolThreads,
int percentWriteThreadPoolUtilization,
long maxTaskTimeInWriteQueueMillis
I think we should indicate that these are averages (e.g. averageWriteThreadPoolUtilization and averageWriteThreadPoolQueueLatency).
I used the term "queue latency" to describe the time spent waiting in the queue, I'm not married to that term specifically but I think we should try and be consistent (happy to agree on something else).
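Roughly, just to show the suggested names in place (not the final shape of the record, and field types as in the current PR):

```java
public record NodeWriteLoad(
    String nodeId,
    int totalWriteThreadPoolThreads,
    int averageWriteThreadPoolUtilization,         // was percentWriteThreadPoolUtilization
    long averageWriteThreadPoolQueueLatencyMillis  // was maxTaskTimeInWriteQueueMillis
) {}
```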
You're right, the terminology doesn't match. Done 👍
public record NodeWriteLoad(
    String nodeId,
    int totalWriteThreadPoolThreads,
    int percentWriteThreadPoolUtilization,
Nit: I have a preference for using a float/double here, i.e. a ratio rather than a percentage. I'm not sure why. But for example org.elasticsearch.common.util.concurrent.TaskExecutionTimeTrackingEsThreadPoolExecutor#pollUtilization returns a double (arguably it should be a float).
It seems like ratios are more useful for arithmetic, and percentages are more useful for rendering. Just an opinion though.
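A toy snippet of what I mean (illustrative values only, not code from this PR): ratios compose directly, while percentages need scaling before and after any math.

```java
double nodeA = 0.35, nodeB = 0.80;               // utilization as 0.0-1.0 ratios
double mean = (nodeA + nodeB) / 2;               // 0.575, ready for further arithmetic
long percentForDisplay = Math.round(mean * 100); // 58 - only needed when rendering
```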
Done 👍
I have a question about the data structure and potential expansion for more thread pools.
public record NodeWriteLoad(
    String nodeId,
    int totalWriteThreadPoolThreads,
    int percentWriteThreadPoolUtilization,
This structure allows reporting for only a single thread pool. Autoscaling uses stats for all 3 write-related thread pools: write, system_write and system_critical_write. Starting out this work with just one thread pool is probably good enough. But I wonder whether the data structure could leave space for expansion? Not sure if this has already been discussed and agreed upon elsewhere. Please let me know if that is the case.
Besides the write thread pools, there is a possibility that we might extend this to the search thread pools as well?
> Autoscaling uses stats for all 3 write-related thread pools: write, system_write and system_critical_write.
I'm not familiar with system_write and system_critical_write. Looks like system_write might be for system indices?
> But I wonder whether the data structure could leave space for expansion? Not sure if this has already been discussed and agreed upon elsewhere. Please let me know if that is the case.
Nope, it has not been discussed.
> Besides the write thread pools, there is a possibility that we might extend this to the search thread pools as well?
Yes, I expect we would eventually add search. I think the future changes for that would be fairly straightforward: extending tracking to the search thread pools, sending that information along from the data node to master as well, modifying data structures like this.
I think it isn't worthwhile to implement search support now, since it would be significant overhead to think that far ahead, and we might end up changing this a few times before we're satisfied with everything for write. Using a new TransportVersion to set up search in the future seems like it'd be okay?
Regarding the system_write and system_critical_write thread pools... I currently have no idea how we'd leverage that extra data to make allocation decisions.
What I meant by "leave space for expansion" is not adding either search or other thread pool stats right now, but making the data structure here something like

record NodeLoad(String nodeId, Map<String, LoadStats> threadsLoadStats) {}

record LoadStats(
    int totalWriteThreadPoolThreads,
    int percentWriteThreadPoolUtilization,
    long maxTaskTimeInWriteQueueMillis
) {}

where the Map<String, LoadStats> threadsLoadStats is keyed by thread pool name, e.g. write. In this PR, we still just take care of the write thread pool, but the map structure is more generic and organized, and makes it easier to add any future thread pools that we might be interested in without transport version changes.
Sure, it just seems like a lot more than we need right now. Adding search (not just here) will require a TransportVersion and significant changes. I don't see a clear use case for the other thread pools.
I don't have much preference for the structure myself. I'll go ahead with your code suggestion 👍
I've updated the data structure. The naming is making my head spin a little, so I'll probably think on that a little more later. The node is sending thread pool stats, which reflect load, which are execution metrics..... Open to further naming suggestions 😌
LGTM
Thanks for the iterations. My naming suggestion is optional; feel free to take it or discard it. Regardless of what names you decide to go with, I'd appreciate it if you could make them consistent across all relevant places, e.g.:
- The Collector class's name and its variable name in NodeServiceProvider and InternalClusterInfoService
- The field in ClusterInfo, ClusterInfoSimulator and InternalClusterInfoService
 * @param threadPoolUsageStatsMap A map of thread pool name ({@link org.elasticsearch.threadpool.ThreadPool.Names}) to the thread pool's
 *                                usage stats ({@link ThreadPoolUsageStats}).
 */
public record NodeExecutionLoad(String nodeId, Map<String, ThreadPoolUsageStats> threadPoolUsageStatsMap) implements Writeable {
How about we use the phrase ThreadPoolUsageStats? I think we can also simplify the name of the map field.
public record NodeExecutionLoad(String nodeId, Map<String, ThreadPoolUsageStats> threadPoolUsageStatsMap) implements Writeable {
public record NodeThreadPoolUsageStats(String nodeId, Map<String, ThreadPoolUsageStats> threads) implements Writeable {
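In case it's useful, a rough sketch of how the Writeable plumbing for the map-based record could look. This assumes ThreadPoolUsageStats gets a StreamInput constructor; the stream helpers are the standard readMap/writeMap variants, and the shape may of course change with the final naming:

```java
import java.io.IOException;
import java.util.Map;
import org.elasticsearch.common.io.stream.StreamInput;
import org.elasticsearch.common.io.stream.StreamOutput;
import org.elasticsearch.common.io.stream.Writeable;

public record NodeThreadPoolUsageStats(String nodeId, Map<String, ThreadPoolUsageStats> threads) implements Writeable {

    // Read constructor mirroring writeTo below: node id first, then the per-pool stats map.
    public NodeThreadPoolUsageStats(StreamInput in) throws IOException {
        this(in.readString(), in.readMap(StreamInput::readString, ThreadPoolUsageStats::new));
    }

    @Override
    public void writeTo(StreamOutput out) throws IOException {
        out.writeString(nodeId);
        // Map keyed by thread pool name; each value serializes itself.
        out.writeMap(threads, StreamOutput::writeString, (o, stats) -> stats.writeTo(o));
    }
}
```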