Skip to content

Guard Get operations against Engine resets #125646

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Apr 11, 2025
Merged

Guard Get operations against Engine resets #125646

merged 6 commits into from
Apr 11, 2025

Conversation

fcofdez
Copy link
Contributor

@fcofdez fcofdez commented Mar 26, 2025

Relates #124635
Closes ES-11324

@fcofdez fcofdez added >non-issue :Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. Team:Distributed Indexing Meta label for Distributed Indexing team labels Mar 26, 2025
@fcofdez fcofdez requested review from tlrx, arteam and kingherc March 26, 2025 08:55
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

// TODO ES-10826: Acquire ref to engine and retry if it's immutable again?
l.onResponse(null);
}));
indexEventListener.beforeIndexShardMutableOperation(this, listener.delegateFailure((l, unused) -> l.onResponse(null)));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we shouldn't provide the Engine through this listener. It kind of breaks the withEngine API that expects that the Engine won't escape the lambda.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unhollow/hollow timing should protect the engine from going hollow again.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think we can remove this TODO.

Copy link
Member

@tlrx tlrx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

// TODO ES-10826: Acquire ref to engine and retry if it's immutable again?
l.onResponse(null);
}));
indexEventListener.beforeIndexShardMutableOperation(this, listener.delegateFailure((l, unused) -> l.onResponse(null)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think we can remove this TODO.

Copy link
Contributor

@arteam arteam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Nice to see all the TODOs that I left getting removed

@arteam
Copy link
Contributor

arteam commented Mar 26, 2025

https://gradle-enterprise.elastic.co/s/4tews3fceo3b2/tests/task/:modules:stateless:internalClusterTest/details/co.elastic.elasticsearch.stateless.StatelessHollowIndexShardsIT/testGetFromTranslogRetriesOnAlreadyClosedException?expanded-stacktrace=WyIwLTEtMi0zIl0&top-execution=1

Interesting test failure on StatelessHollowIndexShardsIT#testGetFromTranslogRetriesOnAlreadyClosedException. We can probably remove this test or adjust it to throw randomFrom(new IndexNotFoundException(indexName), new ShardNotFoundException(shardId)))?

@fcofdez
Copy link
Contributor Author

fcofdez commented Mar 26, 2025

Good point @arteam. I think that I'll remove the test

@tlrx
Copy link
Member

tlrx commented Mar 26, 2025

There is special handling for retrying AlreadyClosedException in the serverless test stress to adjust.

@elasticsearchmachine elasticsearchmachine added the serverless-linked Added by automation, don't add manually label Mar 26, 2025
@fcofdez
Copy link
Contributor Author

fcofdez commented Mar 26, 2025

Thanks both for noticing, I've opened the serverless counterpart of this.

Copy link
Contributor

@kingherc kingherc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

@tlrx tlrx requested review from tlrx, kingherc and arteam April 11, 2025 07:52
@tlrx
Copy link
Member

tlrx commented Apr 11, 2025

I updated the PR now #126311 is merged, let me know what you think.

@fcofdez I'd appreciate some review from you too :)

Copy link
Member

@tlrx tlrx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

if (cause instanceof ShardNotFoundException
|| cause instanceof IndexNotFoundException
|| cause instanceof IllegalIndexShardStateException
|| cause instanceof AlreadyClosedException) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This AlreadyClosedException is thrown when the engine is closed due to relocation (not due to reset)

Engine engine = indexShard.getEngineOrNull();
if (engine == null) {
throw new AlreadyClosedException("engine closed");
// Allows to keep the same engine instance for getFromTranslog and getLastUnsafeSegmentGenerationForGets
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not strictly required, but I think it makes sense to ensure it uses the same engine instance.

Copy link
Contributor Author

@fcofdez fcofdez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for taking over this work @tlrx

Copy link
Contributor

@arteam arteam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM2

@tlrx tlrx merged commit 7ad2369 into elastic:main Apr 11, 2025
17 checks passed
@tlrx
Copy link
Member

tlrx commented Apr 11, 2025

Thanks all!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. >non-issue serverless-linked Added by automation, don't add manually Team:Distributed Indexing Meta label for Distributed Indexing team v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants