Bedrock sender sleeping in inference_utility thread pool #115079

Closed
frensjan opened this issue Oct 18, 2024 · 3 comments
Labels: >bug, :ml Machine learning, Team:ML Meta label for the ML team

@frensjan

Elasticsearch Version

8.15

Installed Plugins

No response

Java Version

17

OS Version

Debian bookworm

Problem Description

Probably nothing is broken, but it is confusing: after upgrading to ES 8.15 we're seeing a thread in the inference_utility thread pool continuously occupied by the AmazonBedrockRequestExecutorService. It appears to be sleeping in handleTasks().

An example stack trace:

   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at java.lang.Thread.sleep(Thread.java:344)
        at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:446)
        at org.elasticsearch.xpack.inference.external.http.sender.RequestExecutorService.lambda$static$0(RequestExecutorService.java:66)
        at org.elasticsearch.xpack.inference.external.http.sender.RequestExecutorService$$Lambda$5012/0x00007fa300c68ff8.sleep(Unknown Source)
        at org.elasticsearch.xpack.inference.external.http.sender.RequestExecutorService.handleTasks(RequestExecutorService.java:240)
        at org.elasticsearch.xpack.inference.external.http.sender.RequestExecutorService.start(RequestExecutorService.java:192)
        at org.elasticsearch.xpack.inference.external.http.sender.AmazonBedrockRequestExecutorService.start(AmazonBedrockRequestExecutorService.java:19)
        at org.elasticsearch.xpack.inference.external.amazonbedrock.AmazonBedrockRequestSender.lambda$start$0(AmazonBedrockRequestSender.java:89)
        at org.elasticsearch.xpack.inference.external.amazonbedrock.AmazonBedrockRequestSender$$Lambda$5018/0x00007fa300c750b0.run(Unknown Source)
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:917)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.lang.Thread.run(Thread.java:840)
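
For what it's worth, the behaviour in the trace looks like a generic timed-sleep polling loop, roughly like the sketch below. This is a hypothetical illustration, not the actual RequestExecutorService code; the queue type and the 100 ms interval are assumptions.

    // Hypothetical sketch of the pattern implied by the stack trace above,
    // not the actual RequestExecutorService implementation: a single
    // long-running task polls a queue and sleeps between polls, so the
    // worker thread it occupies is always reported as active by the pool.
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.TimeUnit;

    class SleepingPollingLoop implements Runnable {
        private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        private volatile boolean running = true;

        @Override
        public void run() {
            while (running) {
                Runnable task = queue.poll();
                if (task != null) {
                    task.run();
                } else {
                    try {
                        // Parks here in TIMED_WAITING between polls, which is
                        // what the Thread.sleep frames in the dump show.
                        TimeUnit.MILLISECONDS.sleep(100);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        running = false;
                    }
                }
            }
        }
    }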

It's just one thread out of the 10, so losing 10% of the pool's capacity is probably not too big of an issue. The annoying thing is that we monitor the ES thread pools with a PromQL query over the metrics provided by the Elasticsearch Exporter, and because this scaling pool has a core size of 0, the permanently sleeping thread is often the only thread in the pool, so the query below evaluates to 100% utilization and the alert fires:

max(elasticsearch_thread_pool_active_count) by (type) / avg(elasticsearch_thread_pool_threads_count > 0) by (type)

Steps to Reproduce

Just use the thread pool cat API to see that this pool always has at least one active thread, as shown in the example below. Taking a thread or heap dump shows AmazonBedrockRequestExecutorService and related classes.
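
For example, something along these lines shows it (the column names and the output here are illustrative, not copied from a real cluster):

GET _cat/thread_pool/inference_utility?v&h=name,active,queue,size

name              active queue size
inference_utility      1     0   10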

Logs (if relevant)

No response

@frensjan added the >bug and needs:triage (Requires assignment of a team area label) labels on Oct 18, 2024
@frensjan
Author

What could perhaps also help here: if the _nodes/stats endpoint exposed the maximum size of the thread pool, it would be clear that there is still capacity in the pool.
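
For reference, the per-pool section of GET _nodes/stats/thread_pool currently looks roughly like this (values are illustrative), and none of the fields report the configured maximum:

"inference_utility": {
  "threads": 1,
  "queue": 0,
  "active": 1,
  "rejected": 0,
  "largest": 1,
  "completed": 42
}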

@pxsalehi added the :ml (Machine learning) label and removed the needs:triage (Requires assignment of a team area label) label on Oct 18, 2024
@elasticsearchmachine added the Team:ML (Meta label for the ML team) label on Oct 18, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@prwhelan changed the title from "Bedrock sender sleeping in inference_utility thread pool" to "RequestExecutorService sender sleeping in inference_utility thread pool" on Oct 22, 2024
@prwhelan changed the title from "RequestExecutorService sender sleeping in inference_utility thread pool" back to "Bedrock sender sleeping in inference_utility thread pool" on Oct 22, 2024
@prwhelan
Member

We fixed this to no longer use a sleeping thread to wait for new tasks: #126858

This will be available in 8.17.6, 8.18.1, 8.19.0, 9.0.1, and 9.1+
