Add thread pool utilisation metric #120363


Merged

Conversation

nicktindall
Contributor

@nicktindall nicktindall commented Jan 17, 2025

There are existing metrics for the number of active threads, but it is tricky to derive a "utilisation" number from those because the pools all have different sizes.

This PR adds es.thread_pool.{name}.threads.utilization.current which will be published by all TaskExecutionTimeTrackingEsThreadPoolExecutor thread pools (where EsExecutors.TaskTrackingConfig#trackExecutionTime is true).

The metric is a double gauge indicating what fraction (in [0.0, 1.0]) of the maximum possible execution time was utilised over the polling interval.

It's calculated as actualTaskExecutionTime / maximumTaskExecutionTime, so it's effectively a "mean" value. The metric interval is 60s, so brief spikes won't be apparent in the measure, but the initial goal is to use it to detect hot-spotting, for which the 60s average will probably suffice.
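
For illustration, here is a minimal self-contained sketch of that calculation. The class and field names are simplified assumptions, not the exact code in TaskExecutionTimeTrackingEsThreadPoolExecutor (which accumulates execution time as tasks complete, via afterExecute()); the arithmetic is the fraction described above.

    import java.util.concurrent.atomic.LongAdder;

    // Sketch only: fraction of the maximum possible execution time used since the last poll.
    class UtilizationSketch {
        private final LongAdder totalExecutionTimeNanos = new LongAdder(); // added to as each task completes
        private final int maxPoolSize;
        private long lastTotalExecutionTimeNanos;
        private long lastPollTimeNanos = System.nanoTime();

        UtilizationSketch(int maxPoolSize) {
            this.maxPoolSize = maxPoolSize;
        }

        void recordTaskExecutionNanos(long nanos) {
            totalExecutionTimeNanos.add(nanos);
        }

        // Polled once per metric interval (60s); returns a value in [0.0, 1.0].
        synchronized double pollUtilization() {
            final long currentTotal = totalExecutionTimeNanos.sum();
            final long now = System.nanoTime();
            final long executedNanos = currentTotal - lastTotalExecutionTimeNanos;
            final long maxPossibleNanos = (now - lastPollTimeNanos) * maxPoolSize;
            lastTotalExecutionTimeNanos = currentTotal;
            lastPollTimeNanos = now;
            // Clamped to 1.0 here for illustration; tasks finishing right at the poll
            // boundary can push the raw ratio slightly over.
            return maxPossibleNanos <= 0 ? 0.0 : Math.min(1.0, (double) executedNanos / maxPossibleNanos);
        }
    }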

Relates ES-10530

@nicktindall nicktindall marked this pull request as ready for review January 17, 2025 22:25
@nicktindall nicktindall requested a review from a team as a code owner January 17, 2025 22:25
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Jan 17, 2025
@nicktindall nicktindall added >enhancement :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) :Core/Infra/Metrics Metrics and metering infrastructure labels Jan 17, 2025
@elasticsearchmachine elasticsearchmachine added Team:Core/Infra Meta label for core/infra team Team:Distributed Coordination Meta label for Distributed Coordination team labels Jan 17, 2025
@elasticsearchmachine
Collaborator

Hi @nicktindall, I've created a changelog YAML for you.

@elasticsearchmachine
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@elasticsearchmachine elasticsearchmachine removed the needs:triage Requires assignment of a team area label label Jan 17, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

@nicktindall
Contributor Author

nicktindall commented Jan 24, 2025

@henningandersen pointed out the sampling only occurs once every ~~30s~~ 60s, so this approach will likely be too coarse for our needs. Investigating low-cost alternatives for maintaining a more representative number.

@ywangd
Member

ywangd commented Jan 28, 2025

> sampling only occurs once every 30s

Does this refer to the APM sampling interval? In that case, it is configured as 60s in the serverless-default-settings.yml file.

@nicktindall
Contributor Author

> Does this refer to the APM sampling interval? In that case, it is configured as 60s in the serverless-default-settings.yml file.

Yes it does, so it's even worse! It's on my list to put together a proposal for an approach that would introduce a finer-grained averaging mechanism. It'd mean adding some state and/or threads, similar to what IngestLoadProbe does. I have some ideas.
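
Purely as an illustration of what a finer-grained averaging mechanism could look like (this is not what the PR ended up doing; the names, one-second sampling period and smoothing factor are all assumptions), one option is to sample the instantaneous utilisation on a scheduler thread and keep an exponentially weighted moving average that the 60s APM poll then reads:

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.function.DoubleSupplier;

    // Hypothetical sketch: smooth a frequently sampled utilisation reading so the 60s poll
    // reports an average over the interval rather than a single instant.
    class SmoothedUtilization {
        private final DoubleSupplier instantUtilization; // e.g. activeThreads / maxPoolSize
        private final double alpha;                      // smoothing factor, e.g. 0.2
        private volatile double ewma;

        SmoothedUtilization(DoubleSupplier instantUtilization, double alpha) {
            this.instantUtilization = instantUtilization;
            this.alpha = alpha;
            // A real implementation would manage this thread's lifecycle (and probably reuse
            // an existing scheduler) rather than creating one per pool.
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            scheduler.scheduleAtFixedRate(this::sample, 1, 1, TimeUnit.SECONDS);
        }

        private void sample() {
            ewma = alpha * instantUtilization.getAsDouble() + (1 - alpha) * ewma;
        }

        double current() { // read by the metric gauge
            return ewma;
        }
    }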

     */
    public double getUtilisation() {
        return (double) getActiveCount() / getMaximumPoolSize();
    }
Contributor Author

Not sure if it's worthwhile having this, or just omitting this metric for non-timed pools altogether. Could be misleading.

Contributor

I think I'd agree with not reporting anything in this case. If something doesn't have a useful meaning or purpose, then it's just noise: it'll confuse people later who assume it must mean something since it's reported.

Contributor

Yeah, I'd prefer to omit it too; it's too different.

Contributor Author

@nicktindall nicktindall Mar 2, 2025

👍 addressed in 54db077

Contributor

@DiannaHohensee DiannaHohensee left a comment

I took a look at the logic (not the testing), and left a few superficial notes.

I like the idea of reporting how much of a thread pool's capacity has been used over time. It would be a sort of sanity check on other metrics we report, which can only be so granular over time.

Do we want to add this information in addition to what's currently being discussed in the thread pool utilization design document? This data may not be the information we wanted to report, but it seems like it would provide nice signal along with other metrics?

        final long currentPollTimeNanos = System.nanoTime();
        final long executionTimeSinceLastPollNanos = currentTotalExecutionTimeNanos - lastTotalExecutionTime;
        final long timeSinceLastPoll = currentPollTimeNanos - lastPollTime;
        final long maxExecutionTimeSinceLastPollNanos = timeSinceLastPoll * getMaximumPoolSize();
Contributor

Suggested change
-        final long maxExecutionTimeSinceLastPollNanos = timeSinceLastPoll * getMaximumPoolSize();
+        final long maxSupportedExecutionTimeSinceLastPollNanos = timeSinceLastPoll * getMaximumPoolSize();

Contributor Author

I changed this one just to maximumExecutionTimeSinceLastPollNanos. I think that's clearer than what was there, and correlates with totalExecutionTimeSinceLastPollNanos. The names are already quite long and I don't think "supported" adds much?

    public double getUtilisation() {
        final long currentTotalExecutionTimeNanos = totalExecutionTime.sum();
        final long currentPollTimeNanos = System.nanoTime();
        final long executionTimeSinceLastPollNanos = currentTotalExecutionTimeNanos - lastTotalExecutionTime;
Contributor

Suggested change
-        final long executionTimeSinceLastPollNanos = currentTotalExecutionTimeNanos - lastTotalExecutionTime;
+        final long totalExecutionTimeSinceLastPollNanos = currentTotalExecutionTimeNanos - lastTotalExecutionTime;

Contributor Author

@nicktindall nicktindall Mar 2, 2025

Addressed in c248419

@@ -89,6 +91,21 @@ public int getCurrentQueueSize() {
        return getQueue().size();
    }

    /**
     * This returns the percentage of the maximum time spent since the last poll executing tasks
Contributor

Suggested change
-     * This returns the percentage of the maximum time spent since the last poll executing tasks
+     * Returns the percentage of thread time that was actually used, of the available maximum thread time supported, since the last poll of
+     * this method.

Contributor

Should we add a note that this presumes CPUs are always available to run all the threads? Or is that not a fudge factor?

Contributor

I think it returns the fraction, not the percentage, i.e., a number [0;1], not [0;100]?

Contributor Author

@nicktindall nicktindall Mar 2, 2025

Thanks, addressed in b9b8f89. I think it's fair to ignore scheduler contention: worker threads will always be getting scheduled to some extent by the OS, and the more that happens the slower a CPU-bound task runs, but it doesn't change the fact that the worker thread is "active".

Unless I've misunderstood the feedback @DiannaHohensee

Contributor

@henningandersen henningandersen left a comment

Looks good, left a few comments.

     * This returns the percentage of the maximum time spent since the last poll executing tasks
     */
    @Override
    public double getUtilisation() {
Contributor

Can we call this pollUtilization() to signal more clearly that it is not a read-only method (and start a US/UK/AU spelling fight 🙂)?

Contributor Author

@nicktindall nicktindall Mar 2, 2025

Addressed all instances of "utilisation" in b9b8f89

Comment on lines 350 to 351
"percentage of maximum threads active for " + name,
"percent",
Contributor

I think it is a [0;1] fraction, not a percentage? I prefer that too.

Contributor

@JeremyDahlgren JeremyDahlgren left a comment

Just a few comments, mainly in the unit test.

Comment on lines 520 to 527
        Future<?> future = threadPool.executor(threadPoolName).submit(() -> {
            long innerStartTimeNanos = System.nanoTime();
            safeSleep(100);
            safeAwait(barrier);
            minimumDurationNanos.set(System.nanoTime() - innerStartTimeNanos);
        });
        safeAwait(barrier);
        safeGet(future);
Contributor

This test sometimes fails on my machine:

    Expected: (a value greater than <0.06801617003477693> and a value less than <0.07091782686496141>)
         but: a value greater than <0.06801617003477693> <0.0> was less than <0.06801617003477693>

It looks like there is a race here, where the Future returns before TaskExecutionTimeTrackingEsThreadPoolExecutor.afterExecute(), which increments the totalExecutionTime, has finished. Since we already have the assert above on threadPool.executor(threadPoolName), one option would be to save the executor reference and wait for getTotalTaskExecutionTime() to be greater than zero. I tried this out and the test then passed consistently in a loop.
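
A rough sketch of that wait, hedged: it is written against the helper names quoted in this thread (safeGet, getTotalTaskExecutionTime) plus ESTestCase's assertBusy, so treat the exact shape as an assumption rather than the committed fix.

    // Keep a reference to the tracking executor so the test can wait for afterExecute()
    // to record the task's execution time before asserting on the utilization value.
    final var trackingExecutor =
        (TaskExecutionTimeTrackingEsThreadPoolExecutor) threadPool.executor(threadPoolName);
    safeGet(future);
    // Retry until the accumulated execution time becomes visible (or the wait times out).
    assertBusy(() -> assertThat(trackingExecutor.getTotalTaskExecutionTime(), greaterThan(0L)));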

Contributor Author

Nice! fixed in 287786f

Comment on lines 577 to 604
            assertLatestLongValueMatches(
                meterRegistry,
                ThreadPool.THREAD_POOL_METRIC_NAME_ACTIVE,
                InstrumentType.LONG_GAUGE,
                threadPoolName,
                equalTo(0L)
            );
            assertLatestLongValueMatches(
                meterRegistry,
                ThreadPool.THREAD_POOL_METRIC_NAME_CURRENT,
                InstrumentType.LONG_GAUGE,
                threadPoolName,
                equalTo(0L)
            );
            assertLatestLongValueMatches(
                meterRegistry,
                ThreadPool.THREAD_POOL_METRIC_NAME_COMPLETED,
                InstrumentType.LONG_ASYNC_COUNTER,
                threadPoolName,
                equalTo(0L)
            );
            assertLatestLongValueMatches(
                meterRegistry,
                ThreadPool.THREAD_POOL_METRIC_NAME_LARGEST,
                InstrumentType.LONG_GAUGE,
                threadPoolName,
                equalTo(0L)
            );
Contributor

Just a suggestion to shorten up the code a bit: one option would be to group together the common parameters, like:

            for (final var metricName : List.of(
                ThreadPool.THREAD_POOL_METRIC_NAME_ACTIVE,
                ThreadPool.THREAD_POOL_METRIC_NAME_CURRENT,
                ThreadPool.THREAD_POOL_METRIC_NAME_LARGEST
            )) {
                assertLatestLongValueMatches(meterRegistry, metricName, InstrumentType.LONG_GAUGE, threadPoolName, equalTo(0L));
            }

This could be done for each of the three sections of assertions in this test. Or maybe create a local lambda to hide the repeated meterRegistry and threadPoolName and only take the parameters that differ in each assertion?
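
The local-lambda variant could look something like this (a sketch; the functional interface and helper name are assumptions, while the metric names and assertLatestLongValueMatches come from the test itself):

            // Hypothetical helper (java.util.function.BiConsumer) hiding the repeated
            // meterRegistry / threadPoolName arguments.
            final BiConsumer<String, InstrumentType> assertZeroLongValue = (metricName, instrumentType) ->
                assertLatestLongValueMatches(meterRegistry, metricName, instrumentType, threadPoolName, equalTo(0L));

            assertZeroLongValue.accept(ThreadPool.THREAD_POOL_METRIC_NAME_ACTIVE, InstrumentType.LONG_GAUGE);
            assertZeroLongValue.accept(ThreadPool.THREAD_POOL_METRIC_NAME_CURRENT, InstrumentType.LONG_GAUGE);
            assertZeroLongValue.accept(ThreadPool.THREAD_POOL_METRIC_NAME_COMPLETED, InstrumentType.LONG_ASYNC_COUNTER);
            assertZeroLongValue.accept(ThreadPool.THREAD_POOL_METRIC_NAME_LARGEST, InstrumentType.LONG_GAUGE);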

Contributor Author

I tried to reduce this repetition a bit here... d6d82f2

I've had to do similar before (see AzureBlobStoreRepositoryMetricsTest). I wonder if the RecordingMeterRegistry could provide more utilities in a generic way.

Comment on lines +648 to +650
        // Let all threads complete
        safeAwait(barrier);
        futures.forEach(ESTestCase::safeGet);
Contributor

The test would sometimes fail on my machine after this point. It looks like a similar race condition to the one in the other test. If I kept the reference to the executor and waited for getOngoingTasks() to be empty, the test passed consistently.
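
Again hedged (using the getOngoingTasks() accessor mentioned above and the saved executor reference from the earlier sketch; its exact return type is an assumption), the wait might look like:

        // Wait until afterExecute() has completed for every submitted task before reading the metric.
        assertBusy(() -> assertTrue(trackingExecutor.getOngoingTasks().isEmpty()));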

Contributor Author

Thanks! Good pick-up, addressed in 436cc0b

Contributor

@JeremyDahlgren JeremyDahlgren left a comment

LGTM

@nicktindall nicktindall merged commit 270ca0a into elastic:main Apr 17, 2025
17 checks passed
@nicktindall nicktindall deleted the ES-10530_add_thread_pool_utilisation_metric branch April 17, 2025 01:49
Labels
:Core/Infra/Metrics Metrics and metering infrastructure :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >enhancement Team:Core/Infra Meta label for core/infra team Team:Distributed Coordination Meta label for Distributed Coordination team v9.1.0