Recovery stage of a searchable snapshot shard stuck at FINALIZE #83878
Comments
Pinging @elastic/es-distributed (Team:Distributed) |
We noticed that the current behaviour of fully mounted indices shards being stuck at @henningandersen and I discussed this and we agreed on implementing some retrying logic (maybe at the directory level) for cache file prewarming. In addition to this retry logic we could add a |
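The retry idea discussed above can be sketched roughly as follows. This is only an illustration in Python, not the Elasticsearch implementation: `prewarm_with_retries`, `prewarm_file`, `max_attempts` and `base_delay_s` are hypothetical names, and treating `OSError` as the retryable failure is an assumption made for the example.

```python
import time
from typing import Callable


def prewarm_with_retries(
    prewarm_file: Callable[[], None],
    max_attempts: int = 3,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call prewarm_file until it succeeds and return the attempts used.

    prewarm_file is a hypothetical stand-in for prewarming one cache file.
    The last exception is re-raised if every attempt fails, so the caller
    can fail the shard instead of leaving it in an unfinished state.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            prewarm_file()
            return attempt
        except OSError:
            # Assumption: I/O errors (e.g. a socket read timeout) are retryable.
            if attempt == max_attempts:
                raise
            # Exponential backoff: base, 2 * base, 4 * base, ...
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Re-raising after the final attempt is a deliberate choice in this sketch: a failure that is swallowed silently is what leaves the recovery looking unfinished.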
It looks like we are now truly hit by this problem in our production clusters. |
Question: Could such a |
No. This API clears the index request, query and field data caches, but not the caches used by searchable snapshots. Usually, cache file evictions occur when a shard is relocated, removed or closed during its prewarming. |
It's worth noting that there might be different scenarios leading to this problem. Example with a network socket read timeout from version 7.17.7:
|
This is still happening. It caused repeated plan change failures of a routine operation on a large cluster and required 4 hours of manual labour to identify and work around. The process was to identify the stuck shards and run:
Please note that this happened on an |
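For the "identify the stuck shards" step mentioned above, one possible approach is to filter the JSON output of the cat recovery API (`GET _cat/recovery?format=json`) for rows whose stage is `finalize`. The sketch below is not the command the commenter ran. The function name and the sample rows are made up for illustration. A healthy recovery also passes through this stage briefly, so matching rows are only candidates.

```python
import json
from typing import Dict, List

# Made-up sample rows in the shape returned by GET _cat/recovery?format=json.
# The index names are invented for illustration.
SAMPLE_ROWS = json.loads("""
[
  {"index": "logs-a", "shard": "0", "type": "snapshot", "stage": "done"},
  {"index": "logs-b", "shard": "1", "type": "snapshot", "stage": "finalize"},
  {"index": "logs-c", "shard": "0", "type": "peer", "stage": "index"}
]
""")


def shards_in_finalize(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return index/shard pairs whose recovery stage is 'finalize'.

    The comparison is case-insensitive, and rows without a 'stage'
    field are skipped.
    """
    return [
        {"index": row["index"], "shard": row["shard"]}
        for row in rows
        if row.get("stage", "").lower() == "finalize"
    ]


candidates = shards_in_finalize(SAMPLE_ROWS)
```

Running the same check again a few minutes later and keeping only the shards that appear both times helps separate stuck recoveries from ones that are merely in progress.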
Still occurring on an 8.14 cluster |
Still happening on 8.17 |
Elasticsearch Version: 7.17
If there's a failure in prewarmCache, then the recovery stage of a searchable snapshot shard will be stuck at FINALIZE even though its recovery has completed properly. This is the failure during prewarmCache.