Evict from the shared blob cache asynchronously #126581
Conversation
Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)
Hi @nicktindall, I've created a changelog YAML for you.
Looks good, will leave actual approval to Tanguy.
I think we could maybe do the spawn further out to avoid too many tasks - but it may not really be helpful.
I guess this does not address contention on the CacheService evictions, but we can see if we need to address that too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments about the (lack of) reasons to evict cache entries for partially mounted shards. I do like the new `forceEvictAsync` method; it might be useful in other places too.
```java
final SharedBlobCacheService<CacheKey> sharedBlobCacheService =
    SearchableSnapshotIndexFoldersDeletionListener.this.frozenCacheServiceSupplier.get();
assert sharedBlobCacheService != null : "frozen cache service not initialized";
sharedBlobCacheService.forceEvictAsync(SearchableSnapshots.forceEvictPredicate(shardId, indexSettings.getSettings()));
```
I'm afraid I don't remember all the decisions around evicting cache entries for partially mounted shards 😞
I suspect that we made it this way to allow cache regions to be reused sooner without waiting for them to decay. It was also useful at the beginning when some corrupted data got written in the cache for some reason, as the forced eviction would clear the mess for us.
But besides this, for partially mounted shards I don't see much reason to force the eviction of cache entries vs. just letting them expire in the cache. And if the shard is quickly relocated then reassigned to the same node, I think there is a risk that the async force eviction now runs concurrently with a shard recovery?
So maybe we could only force-evict asynchronously when the shard is deleted or failed, and leave the entries in the cache if it's no longer assigned.
Thanks! That sounds reasonable. It's easy to implement in `SearchableSnapshotIndexEventListener#beforeIndexRemoved` because we have the `IndexRemovalReason`. It's a bit trickier in `SearchableSnapshotIndexFoldersDeletionListener#before(Index|Shard)FoldersDeleted` because we lack that context; I'll trace back to where those deletions originate to see if there's an obvious way to propagate it.
I had a go at propagating the reason for the deletion to the listeners. This allows the listener to trigger the asynchronous eviction when we know the shards/indices aren't coming back (i.e. only on DELETE). It meant changes in a few places.

- I used the `IndexRemovalReason` to communicate the reason for deletion. I don't like borrowing that from an unrelated interface, but we did already have it in scope in some of these places. If we think it's right to use it I could break it out to be a top-level enum rather than being under `IndicesClusterStateService.AllocatedIndices.IndexRemovalReason`. (A rough sketch of what reason-aware listener hooks could look like follows after this list.)
- There are some places that now take a `reasonText` and an `IndexRemovalReason`. We could get rid of the reason text if we don't feel it's adding anything, but it would mean some log messages would change. It sometimes seems to offer more context: for example, the text is different when the `IndexService` executes a pending delete vs. when it succeeds on the first attempt, and "delete unassigned index" specifies that the index is being deleted despite not being assigned to the local node.
- I think the only time it's safe to schedule an asynchronous delete is on an `IndexRemovalReason.DELETE`. I don't think `FAILURE` is appropriate, because I assume we could retry after one of those? I don't have the context to make this call, I don't think.
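For illustration, a rough sketch of what reason-aware listener hooks could look like, assuming the removal reason is simply appended as an extra parameter (the interface name and exact signatures below are assumptions, not the actual diff):

```java
import java.nio.file.Path;

import org.elasticsearch.index.Index;
import org.elasticsearch.index.IndexSettings;
import org.elasticsearch.index.shard.ShardId;
import org.elasticsearch.indices.cluster.IndicesClusterStateService.AllocatedIndices.IndexRemovalReason;

// Hypothetical contract for a folder-deletion listener once the removal reason is threaded through.
public interface ReasonAwareIndexFoldersDeletionListener {

    // Invoked before the folders of a whole index are deleted, now carrying why the deletion happens.
    void beforeIndexFoldersDeleted(Index index, IndexSettings indexSettings, Path[] indexPaths, IndexRemovalReason reason);

    // Invoked before the folders of a single shard are deleted, with the same extra context.
    void beforeShardFoldersDeleted(ShardId shardId, IndexSettings indexSettings, Path[] shardPaths, IndexRemovalReason reason);
}
```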
Thanks Nick.

> If we think it's right to use it I could break it out to be a top-level enum rather than being under `IndicesClusterStateService.AllocatedIndices.IndexRemovalReason`.

That makes sense.

> There are some places that now take a `reasonText` and an `IndexRemovalReason`

Thanks for having kept the reason as text. It provides a bit more context and people are also used to searching for it in logs.

> I think the only time it's safe to schedule an asynchronous delete is on an `IndexRemovalReason.DELETE`. I don't think `FAILURE` is appropriate, because I assume we could retry after one of those?

Yes, it is possible that the failed shard gets reassigned to the same node after it failed. But in that case we don't really know the cause of the failure, and it would be preferable to synchronously evict the cache, I think. It makes sure that cached data are cleaned up so that retries will fetch them again from the source of truth (if the cached data are the cause of the failure and we were not evicting them, the shard would never have a chance to recover).

It goes against the purpose of this PR, but shard failures should be the exception, so I think keeping the synchronous eviction is OK for failures.
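As a sketch of the dispatch being agreed on here, assuming the removal reason reaches the listener (the class and method names are illustrative, and the enum constant referred to as `DELETE` in the thread is assumed to be `DELETED`):

```java
import java.util.function.Predicate;

import org.elasticsearch.blobcache.shared.SharedBlobCacheService;
import org.elasticsearch.indices.cluster.IndicesClusterStateService.AllocatedIndices.IndexRemovalReason;

// Hypothetical helper: choose async vs. sync eviction based on why the shard is going away.
// The predicate would be built as in the quoted snippet, e.g.
// SearchableSnapshots.forceEvictPredicate(shardId, indexSettings.getSettings()).
class EvictionDispatchSketch<KeyType> {

    private final SharedBlobCacheService<KeyType> sharedBlobCacheService;

    EvictionDispatchSketch(SharedBlobCacheService<KeyType> sharedBlobCacheService) {
        this.sharedBlobCacheService = sharedBlobCacheService;
    }

    void evictOnRemoval(Predicate<KeyType> cacheKeyPredicate, IndexRemovalReason reason) {
        if (reason == IndexRemovalReason.DELETED) {
            // The index is gone for good: the (potentially slow) eviction can run off the applier thread.
            sharedBlobCacheService.forceEvictAsync(cacheKeyPredicate);
        } else {
            // FAILURE and anything that might come back to this node: evict synchronously so a
            // retried shard never recovers on top of possibly-bad cached data.
            sharedBlobCacheService.forceEvict(cacheKeyPredicate);
        }
    }
}
```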
```java
this.indexSettings,
shardPaths,
IndexRemovalReason.FAILURE
)
```
This may be a mis-categorisation as `FAILURE`. The javadoc seems to suggest it's deleting remnants of a different shard rather than the shard being created, due to a name collision. So we're deleting not because the shard failed to start, but to clear old state from a shard that used to have the same name as the one being started.
I think it's OK to use `FAILURE`, but maybe worth a comment?
Sorry for the late review. I left a comment about evictions in case of shard failures, otherwise looks good.
This PR:

- Uses a `ThrottledTaskRunner` to optionally execute the shared blob cache evictions asynchronously (a rough sketch of this wiring follows at the end of this description)
- `SearchableSnapshotIndexEventListener` and `SearchableSnapshotIndexFoldersDeletionListener` now call the asynchronous method

I don’t know if there’s anything to be gained by moving evictions from the `CacheService` off the applier thread any more than they already are. I have concerns about making that more asynchronous than it is, because in `CacheService` there is a method called `waitForCacheFilesEvictionIfNeeded` which takes the `shardsEvictionsMutex` and blocks until any pending shard evictions for the specified shard are completed. It uses the `pendingShardsEvictions` map to know if there are any pending evictions. If we add another layer of asynchrony, we will potentially be adding a “shadow” queue of evictions that this method doesn’t know about. I wonder if that might break things.

If there are performance issues with `CacheService` evictions, I think we’d be better off optimising the enqueueing and processing of evictions there. Some ideas for that include:

- Reduce the amount of lock contention. There is potentially a lot of contention for the `shardsEvictionsMutex` between the evicting threads on generic and the calls to `markShardAsEvictedInCache`. I believe there are ways to reduce that.
- Reduce the amount of concurrent evictions. Currently there is no limit other than the size of the generic pool. We could add a `ThrottledTaskRunner` and it might reduce the contention enough to make `markShardAsEvictedInCache` faster.

Relates: ES-10744
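A rough sketch of the async eviction wiring described in the first bullet above, assuming a `ThrottledTaskRunner` bound to the generic pool; the field names, the concurrency limit, and the error handling below are illustrative, not the PR's exact code:

```java
import java.util.function.Predicate;

import org.elasticsearch.action.ActionListener;
import org.elasticsearch.common.util.concurrent.ThrottledTaskRunner;
import org.elasticsearch.threadpool.ThreadPool;

// Hypothetical sketch of an asynchronous eviction path for a shared blob cache service.
class AsyncEvictionSketch<KeyType> {

    private final ThrottledTaskRunner evictionRunner;

    AsyncEvictionSketch(ThreadPool threadPool) {
        // Bound how many evictions run at once instead of fanning out across the whole generic pool.
        this.evictionRunner = new ThrottledTaskRunner("shared-blob-cache-evictions", 5, threadPool.generic());
    }

    // Returns immediately; the actual eviction happens later on a generic thread.
    void forceEvictAsync(Predicate<KeyType> cacheKeyPredicate) {
        evictionRunner.enqueueTask(ActionListener.wrap(releasable -> {
            try (releasable) {
                forceEvict(cacheKeyPredicate);
            }
        }, e -> { /* a real implementation would log the failure */ }));
    }

    void forceEvict(Predicate<KeyType> cacheKeyPredicate) {
        // Stand-in for the existing synchronous eviction in SharedBlobCacheService.
    }
}
```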