-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Guard Get operations against Engine resets #125646
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Relates elastic#124635 Closes ES-11324
Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing) |
// TODO ES-10826: Acquire ref to engine and retry if it's immutable again? | ||
l.onResponse(null); | ||
})); | ||
indexEventListener.beforeIndexShardMutableOperation(this, listener.delegateFailure((l, unused) -> l.onResponse(null))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we shouldn't provide the Engine through this listener. It kind of breaks the withEngine API that expects that the Engine won't escape the lambda.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unhollow/hollow timing should protect the engine from going hollow again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think we can remove this TODO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
// TODO ES-10826: Acquire ref to engine and retry if it's immutable again? | ||
l.onResponse(null); | ||
})); | ||
indexEventListener.beforeIndexShardMutableOperation(this, listener.delegateFailure((l, unused) -> l.onResponse(null))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think we can remove this TODO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Nice to see all the TODOs that I left getting removed
Interesting test failure on |
Good point @arteam. I think that I'll remove the test |
There is special handling for retrying AlreadyClosedException in the serverless test stress to adjust. |
Thanks both for noticing, I've opened the serverless counterpart of this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
if (cause instanceof ShardNotFoundException | ||
|| cause instanceof IndexNotFoundException | ||
|| cause instanceof IllegalIndexShardStateException | ||
|| cause instanceof AlreadyClosedException) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This AlreadyClosedException is thrown when the engine is closed due to relocation (not due to reset)
Engine engine = indexShard.getEngineOrNull(); | ||
if (engine == null) { | ||
throw new AlreadyClosedException("engine closed"); | ||
// Allows to keep the same engine instance for getFromTranslog and getLastUnsafeSegmentGenerationForGets |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not strictly required, but I think it makes sense to ensure it uses the same engine instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for taking over this work @tlrx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM2
Thanks all! |
Relates #124635
Closes ES-11324