
Evict from the shared blob cache asynchronously #126581


Open · nicktindall wants to merge 22 commits into main

Conversation


@nicktindall nicktindall commented Apr 10, 2025

  • I added a ThrottledTaskRunner to optionally execute the shared blob cache evictions asynchronously (a sketch of the wiring is included after this list)
    • SharedSnapshotIndexEventListener and SharedSnapshotIndexFoldersDeletionListener now call the asynchronous method
    • We limit to 5 concurrent deletion threads by default. I think there are limited gains to be had by increasing concurrency much beyond this, because the removals happen inside a mutex anyway (the scan for what to remove can be done concurrently). Open to reducing it to fewer than 5 if we think that’s appropriate.
  • Whenever we are clearing the cache, we only clear the shared cache for partially mounted indices. It looked like only partially mounted indices use the shared cache. I imagine that if recommendations are followed and people use dedicated frozen nodes, there will be limited impact from that change, because the cache will either be empty or always scanned. Open to removing those changes for simplicity’s sake.
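
For illustration, here is a simplified sketch of how the asynchronous path is wired (the runner name, logging, and error handling here are illustrative, not the exact code):

// Sketch only: throttle evictions to at most 5 concurrent tasks on the generic pool.
private final ThrottledTaskRunner evictionRunner =
    new ThrottledTaskRunner("shared_blob_cache_evictions", 5, threadPool.generic());

public void forceEvictAsync(Predicate<KeyType> cacheKeyPredicate) {
    evictionRunner.enqueueTask(ActionListener.wrap(releasable -> {
        try (releasable) {
            forceEvict(cacheKeyPredicate); // the existing synchronous eviction
        }
    }, e -> logger.warn("failed to evict shared blob cache entries", e)));
}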

I don’t know if there’s anything to be gained by moving evictions in the CacheService off the applier thread any more than they already are. I have concerns about making that more asynchronous than it is, because CacheService has a method called waitForCacheFilesEvictionIfNeeded which takes the shardsEvictionsMutex and blocks until any pending evictions for the specified shard have completed. It uses the pendingShardsEvictions map to know whether there are any pending evictions. If we add another layer of asynchrony, we would potentially be adding a “shadow” queue of evictions that this method doesn’t know about. I wonder if that might break things.
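
A simplified illustration of the concern (not the actual CacheService code, just the shape of the wait):

// Simplified illustration only: waitForCacheFilesEvictionIfNeeded can only wait for
// evictions that were registered in pendingShardsEvictions under the mutex.
void waitForCacheFilesEvictionIfNeeded(ShardId shardId) {
    synchronized (shardsEvictionsMutex) {
        // If another layer enqueued the eviction elsewhere (e.g. on a separate task runner)
        // without registering it here, this returns even though an eviction for the shard
        // is still sitting in that "shadow" queue.
        while (pendingShardsEvictions.containsKey(shardId)) {
            try {
                shardsEvictionsMutex.wait();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}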

If there are performance issues with CacheService evictions, I think we’d be better off optimising the enqueueing and processing of evictions there. Some ideas for that include:

Reduce the amount of lock contention. There is potentially a lot of contention for the shardsEvictionsMutex between the evicting threads on the generic pool and the calls to markShardAsEvictedInCache. I believe there are ways to reduce that.

Reduce the number of concurrent evictions. Currently there is no limit other than the size of the generic pool. We could add a ThrottledTaskRunner, which might reduce the contention enough to make markShardAsEvictedInCache faster.

Relates: ES-10744

@elasticsearchmachine added the v9.1.0 and needs:triage labels on Apr 10, 2025
@elasticsearchmachine added the Team:Distributed Indexing label and removed needs:triage on Apr 10, 2025
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

@elasticsearchmachine (Collaborator)

Hi @nicktindall, I've created a changelog YAML for you.

@henningandersen henningandersen (Contributor) left a comment

Looks good, will leave actual approval to Tanguy.

I think we could maybe do the spawn further out to avoid too many tasks - but it may not really be helpful.

I guess this does not address contention on the CacheService evictions, but we can see whether we need to address that too.

@tlrx tlrx (Member) left a comment

I left some comment about the (lack of) reasons to evict cache entries for partially mounted shards. I do like the new forceEvictAsync method, it might be useful in other places too.

final SharedBlobCacheService<CacheKey> sharedBlobCacheService =
SearchableSnapshotIndexFoldersDeletionListener.this.frozenCacheServiceSupplier.get();
assert sharedBlobCacheService != null : "frozen cache service not initialized";
sharedBlobCacheService.forceEvictAsync(SearchableSnapshots.forceEvictPredicate(shardId, indexSettings.getSettings()));
Member

I'm afraid I don't remember all the decisions around evicting cache entries for partially mounted shards 😞

I suspect that we made it this way to allow cache regions to be reused sooner without waiting for them to decay. It was also useful at the beginning when some corrupted data got written in the cache for some reason, as the forced eviction would clear the mess for us.

But besides this, for partially mounted shards I don't see much reason to force the eviction of cache entries vs. just letting them expire in the cache. And if the shard is quickly relocated then reassigned to the same node, I think there is a risk that the async force eviction now runs concurrently with a shard recovery?

So maybe we could only force-evict asynchronously when the shard is deleted or failed, and leave the entries in the cache if it's no longer assigned.

Contributor Author

Thanks! That sounds reasonable. It's easy to implement in SearchableSnapshotIndexEventListener#beforeIndexRemoved because we have the IndexRemovalReason. It's a bit trickier in SearchableSnapshotIndexFoldersDeletionListener#before(Index|Shard)FoldersDeleted because we lack that context; I'll trace back to where those deletions originate to see if there's an obvious way to propagate it.

Contributor Author

I had a go at propagating the reason for the deletion to the listeners. This allows the listener to trigger the asynchronous eviction when we know the shards/indices aren't coming back (i.e. only on DELETE). It meant changes in a few places.

  • I used the IndexRemovalReason to communicate the reason for deletion. I don't like borrowing that from an unrelated interface but we did already have it in scope in some of these places. If we think it's right to use it I could break it out to be a top-level enum rather than being under IndicesClusterStateService.AllocatedIndices.IndexRemovalReason.
  • There are some places that now take a reasonText and an IndexRemovalReason. We could get rid of the reason text if we don't feel it's adding anything, but it would mean some log messages would change. It sometimes seems to offer more context; for example, the text is different when the IndexService executes a pending delete vs. when it succeeds on the first attempt, and "delete unassigned index" specifies that the index is being deleted despite not being assigned to the local node.
  • I think the only time it's safe to schedule an asynchronous delete is on an IndexRemovalReason.DELETE (see the sketch after this list). I don't think FAILURE is appropriate, because I assume we could retry after one of those? I don't think I have the context to make this call.
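
To make that concrete, a rough sketch of the branching (this assumes the existing DELETED and FAILURE constants of IndexRemovalReason; evictAsync/evictSync are placeholders for the async and sync eviction paths, not real methods):

// Sketch only: async eviction when the index is gone for good, synchronous otherwise.
@Override
public void beforeIndexRemoved(IndexService indexService, IndexRemovalReason reason) {
    if (reason == IndexRemovalReason.DELETED) {
        // the shard is not coming back, so eviction can happen off the applier thread
        evictAsync(indexService.index());
    } else {
        // e.g. FAILURE or NO_LONGER_ASSIGNED: evict synchronously so a retry on this
        // node never races with, or reuses, stale cached data
        evictSync(indexService.index());
    }
}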

Member

Thanks Nick.

  • If we think it's right to use it I could break it out to be a top-level enum rather than being under IndicesClusterStateService.AllocatedIndices.IndexRemovalReason.

That makes sense.

  • There are some places that now take a reasonText and an IndexRemovalReason

Thanks for having kept the reason as text. It provides a bit more context and people are also used to searching for it in logs.

  • I think the only time it's safe to schedule an asynchronous delete is on an IndexRemovalReason.DELETE. I don't think FAILURE is appropriate, because I assume we could retry after one of those?

Yes, it is possible that the failed shard gets reassigned to the same node after it failed. But in that case, we don't really know the cause of the failure and I think it would be preferable to synchronously evict the cache. It makes sure that cached data are cleaned up so that retries will fetch them again from the source of truth (if the cached data were the cause of the failure and we were not evicting them, the shard would never have a chance to recover).

It goes against the purpose of this PR but shard failures should be the exception so I think keeping the synchronous eviction is OK for failures.

…ache/shared/SharedBlobCacheService.java

Co-authored-by: Tanguy Leroux <[email protected]>

this.indexSettings,
shardPaths,
IndexRemovalReason.FAILURE
)
Contributor Author

This may be a mis-categorisation as FAILURE. The javadoc seems to suggest it's deleting remnants of a different shard rather than the shard being created, due to a name collision. So we're deleting not because the shard failed to start, but to clear old state from a shard that used to have the same name as the one being started.

Member

I think it's OK to use FAILURE, but maybe worth a comment?

@tlrx tlrx (Member) left a comment

Sorry for the late review. I left a comment about evictions in case of shard failures, otherwise looks good.


Labels
:Distributed Indexing/Searchable Snapshots, >enhancement, Team:Distributed Indexing, v9.1.0