
Semantic Text Chunking Indexing Pressure #125517

Merged: 54 commits merged into elastic:main on Apr 14, 2025

Conversation

@Mikep86 (Contributor) commented Mar 24, 2025

We have observed many OOMs due to the memory required to inject chunked inference results for semantic_text fields. This PR uses coordinating indexing pressure to account for this memory usage. When indexing pressure memory usage exceeds the threshold set by indexing_pressure.memory.limit, chunked inference result injection will be suspended to prevent OOMs.
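
For context on the mechanism, below is a minimal illustrative sketch of byte-based coordinating memory accounting, not the Elasticsearch implementation itself (which builds on the internal IndexingPressure component): each injection of chunked inference results first reserves an estimate of the bytes it will hold in coordinating-node memory, and the reservation is refused once the limit configured by indexing_pressure.memory.limit would be exceeded, signaling the caller to suspend the injection rather than allocate. The class and method names here are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Illustrative sketch only: a simplified coordinating memory accountant in the
 * spirit of indexing pressure. Names are hypothetical and do not reflect the
 * actual Elasticsearch internals touched by this PR.
 */
final class CoordinatingMemorySketch {

    private final long limitInBytes;                 // e.g. the value of indexing_pressure.memory.limit
    private final AtomicLong inFlightBytes = new AtomicLong();

    CoordinatingMemorySketch(long limitInBytes) {
        this.limitInBytes = limitInBytes;
    }

    /**
     * Reserve {@code bytes} of coordinating memory before injecting chunked
     * inference results. Returns false when the reservation would exceed the
     * limit, in which case the caller suspends the injection instead of
     * allocating and risking an OOM.
     */
    boolean tryAcquire(long bytes) {
        long newTotal = inFlightBytes.addAndGet(bytes);
        if (newTotal > limitInBytes) {
            inFlightBytes.addAndGet(-bytes);         // roll back the failed reservation
            return false;
        }
        return true;
    }

    /** Release a previously acquired reservation once the results are flushed. */
    void release(long bytes) {
        inFlightBytes.addAndGet(-bytes);
    }
}
```

In this pattern a caller would wrap each chunked-result injection in a reservation, release it once the results have been written into the shard-level request, and pause the injection whenever the reservation is refused.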

@Mikep86 Mikep86 added the >non-issue, :ml (Machine learning), :SearchOrg/Relevance (Label for the Search (solution/org) Relevance team), and v8.19.0 labels on Mar 24, 2025
@Mikep86 Mikep86 requested a review from kderusso March 24, 2025 16:59
@Mikep86 (Contributor, Author) commented Apr 8, 2025

@elasticmachine update branch

@Mikep86 (Contributor, Author) commented Apr 8, 2025

@elasticmachine update branch

@kderusso (Member) left a comment:

LGTM, thanks for iterating

@davidkyle (Member) left a comment:

LGTM

Thanks @Mikep86

@elasticsearchmachine (Collaborator) commented:
Hi @Mikep86, I've created a changelog YAML for you.

@jimczi jimczi added the :Distributed Indexing/Engine (Anything around managing Lucene and the Translog in an open shard) label on Apr 11, 2025
@elasticsearchmachine (Collaborator) commented:
Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

@jimczi (Contributor) left a comment:

Thanks, @Mikep86.
Let’s update the PR title and summary to reflect the use of indexing pressure.
It would be great if someone from @elastic/es-distributed-indexing could review this section.

@Mikep86 Mikep86 changed the title from "Add Semantic Text Chunking OOM Circuit Breaker" to "Semantic Text Chunking Indexing Pressure" on Apr 11, 2025
@Mikep86 Mikep86 merged commit 85713f7 into elastic:main Apr 14, 2025
17 checks passed
@elasticsearchmachine (Collaborator) commented:
💔 Backport failed

Branch 8.x: commit could not be cherry-picked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 125517

@Mikep86 (Contributor, Author) commented Apr 28, 2025

💚 All backports created successfully

Branch 8.19: backport created successfully

Questions? Please refer to the Backport tool documentation.

Mikep86 added a commit to Mikep86/elasticsearch that referenced this pull request Apr 28, 2025
We have observed many OOMs due to the memory required to inject chunked inference results for semantic_text fields. This PR uses coordinating indexing pressure to account for this memory usage. When indexing pressure memory usage exceeds the threshold set by indexing_pressure.memory.limit, chunked inference result injection will be suspended to prevent OOMs.

(cherry picked from commit 85713f7)

# Conflicts:
#	server/src/main/java/org/elasticsearch/node/NodeConstruction.java
#	server/src/main/java/org/elasticsearch/node/PluginServiceInstances.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/InferencePlugin.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/action/filter/ShardBulkInferenceActionFilterTests.java
elasticsearchmachine added a commit that referenced this pull request Apr 28, 2025
* Semantic Text Chunking Indexing Pressure (#125517)

We have observed many OOMs due to the memory required to inject chunked inference results for semantic_text fields. This PR uses coordinating indexing pressure to account for this memory usage. When indexing pressure memory usage exceeds the threshold set by indexing_pressure.memory.limit, chunked inference result injection will be suspended to prevent OOMs.

(cherry picked from commit 85713f7)

# Conflicts:
#	server/src/main/java/org/elasticsearch/node/NodeConstruction.java
#	server/src/main/java/org/elasticsearch/node/PluginServiceInstances.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/InferencePlugin.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/action/filter/ShardBulkInferenceActionFilterTests.java

* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <[email protected]>
Labels: auto-backport (Automatically create backport pull requests when merged), backport pending, :Distributed Indexing/Engine (Anything around managing Lucene and the Translog in an open shard), >enhancement, :ml (Machine learning), :SearchOrg/Relevance (Label for the Search (solution/org) Relevance team), v8.19.0, v9.1.0
9 participants