Avoid walking the complete list of search contexts on shard creation #123855


Merged

Conversation

original-brownbear
Member

@original-brownbear original-brownbear commented Mar 3, 2025

This I found in the many-shards benchmark during some manual testing. Creating indices slows down measurably when there are concurrent searches going on. Interestingly enough, the bulk of the cost is coming from this hook. This makes sense to some extent, because the map can quickly grow to a massive size: it scales as O(shards_searched_on_average * concurrent_searches), and a CHM is generally anything but cheap to iterate over.

=> no need to do this iteration if we're creating a new shard.

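The pattern described above can be sketched as follows. This is a minimal, self-contained illustration with hypothetical names (the real Elasticsearch code in `SearchService` and `IndexShard` looks different): all open search contexts live in one node-wide ConcurrentHashMap keyed by context id, so freeing the contexts of a single shard forces a walk over every entry, and the proposed fix is simply to skip that walk when the shard was just created and cannot have any contexts yet.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the pattern the PR describes, not the actual
// Elasticsearch implementation.
public class SearchContextsSketch {
    record ShardId(String index, int id) {}
    record ReaderContext(long contextId, ShardId shardId) {}

    // One node-wide map of all open search contexts, keyed by context id.
    private final Map<Long, ReaderContext> activeReaders = new ConcurrentHashMap<>();
    private final AtomicLong idGen = new AtomicLong();

    long openContext(ShardId shardId) {
        long id = idGen.incrementAndGet();
        activeReaders.put(id, new ReaderContext(id, shardId));
        return id;
    }

    // O(total contexts on the node), not O(contexts on this shard):
    // the map is keyed by context id, so every entry must be visited.
    int freeAllContextsForShard(ShardId shardId) {
        int freed = 0;
        for (var it = activeReaders.values().iterator(); it.hasNext(); ) {
            if (it.next().shardId().equals(shardId)) {
                it.remove();
                freed++;
            }
        }
        return freed;
    }

    // The fix the PR sketches: a shard that was just created cannot have
    // any open search contexts, so skip the full-map walk entirely.
    void onShardLifecycleEvent(ShardId shardId, boolean newlyCreated) {
        if (newlyCreated) {
            return; // nothing to free, avoid iterating the whole map
        }
        freeAllContextsForShard(shardId);
    }

    public static void main(String[] args) {
        var node = new SearchContextsSketch();
        var shardA = new ShardId("logs", 0);
        node.openContext(shardA);
        node.openContext(shardA);
        node.openContext(new ShardId("logs", 1));
        node.onShardLifecycleEvent(new ShardId("metrics", 0), true); // skipped
        System.out.println("after create: " + node.activeReaders.size());
        node.onShardLifecycleEvent(shardA, false); // frees shardA's contexts
        System.out.println("after remove: " + node.activeReaders.size());
    }
}
```

The skip is safe precisely because the hook is a no-op for a brand-new shard; it only removes wasted iteration (and the contention it causes) from the index-creation path.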

@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

@elasticsearchmachine elasticsearchmachine added the Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch label Mar 3, 2025
@benchaplin
Contributor

The comment in that method: "we prefer to stop searches to restore full availability as fast as possible" makes me think that allowing searches to continue also slows down index creation - did your change actually speed up the index creation as a whole? Maybe I'm misunderstanding the comment... but it reads like: 'freeing the contexts is supposed to speed things up,' and you're saying 'trying to speed things up is slow, let's skip it to go faster.'

Another thought on:

O(shards_searched_on_average * concurrent_searches)

I wonder, does it make sense to maintain a different data structure of active readers per shard? That would speed up freeAllContextsForShard.
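The suggestion above could look roughly like the following sketch (hypothetical names and structure, not the actual Elasticsearch code): keep a secondary index of context ids per shard, so that freeing a shard's contexts touches only that shard's entries instead of walking the node-wide map.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a per-shard secondary index for search contexts.
public class PerShardContexts {
    record ShardId(String index, int id) {}

    // Node-wide view, keyed by context id (as before).
    private final Map<Long, ShardId> allContexts = new ConcurrentHashMap<>();
    // Secondary index: which context ids belong to which shard.
    private final Map<ShardId, Set<Long>> contextsByShard = new ConcurrentHashMap<>();

    void open(long contextId, ShardId shardId) {
        allContexts.put(contextId, shardId);
        contextsByShard
            .computeIfAbsent(shardId, k -> ConcurrentHashMap.newKeySet())
            .add(contextId);
    }

    // Now O(contexts on this shard): no full-map iteration.
    int freeAllContextsForShard(ShardId shardId) {
        Set<Long> ids = contextsByShard.remove(shardId);
        if (ids == null) {
            return 0;
        }
        ids.forEach(allContexts::remove);
        return ids.size();
    }

    public static void main(String[] args) {
        var node = new PerShardContexts();
        var s0 = new ShardId("logs", 0);
        node.open(1, s0);
        node.open(2, s0);
        node.open(3, new ShardId("logs", 1));
        System.out.println("freed: " + node.freeAllContextsForShard(s0));
        System.out.println("remaining: " + node.allContexts.size());
    }
}
```

Note this sketch glosses over keeping the two maps consistent under concurrent open/free; as the author notes below, the cleaner version of this idea is to make the contexts live on the shard object itself (IndexShard) rather than maintaining a parallel map.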

@original-brownbear
Member Author

I wonder, does it make sense to maintain a different data structure of active readers per shard? That would speed up freeAllContextsForShard.

Yeah, this definitely should just live on IndexShard, well spotted :) A colleague and I had the same thought recently. And this also holds the answer to your other question, I think.

but it reads like: 'freeing the contexts is supposed to speed things up,' and you're saying 'trying to speed things up is slow, let's skip it to go faster.'

I think it's more like: "freeing contexts is sometimes slow and more importantly slows down under contention, here we can skip it because it does not do anything anyway to remove some contention introduced by index creation".
Hope that helps :)

You're 100% right, the current approach is not referencing the contexts from the correct place, and dealing with that would be a much stronger fix. I just figured that change would have a harder time getting a review in the short term, and this one made my benchmarking easier to interpret when I opened it :D (and it also still removes some contention/noise from heavily loaded production environments)

Contributor

@benchaplin benchaplin left a comment


Thanks! I didn't fully understand your change. I believe I now see why you said freeing contexts "does not do anything anyway".

Contributor

@benchaplin benchaplin left a comment


Thanks for the explanation!

@original-brownbear original-brownbear added the auto-backport Automatically create backport pull requests when merged label Apr 14, 2025
@original-brownbear
Member Author

Thanks Ben!

@original-brownbear original-brownbear merged commit 235867c into elastic:main Apr 14, 2025
17 checks passed
@original-brownbear original-brownbear deleted the avoid-walking-all-contexts branch April 14, 2025 19:19
@elasticsearchmachine
Collaborator

💚 Backport successful

Branch: 8.x (successful)

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Apr 14, 2025
…lastic#123855)

elasticsearchmachine pushed a commit that referenced this pull request Apr 14, 2025
…123855) (#126798)

Labels
auto-backport Automatically create backport pull requests when merged >non-issue :Search Foundations/Search Catch all for Search Foundations Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch v8.19.0 v9.1.0
3 participants