Skip to content

[Test] More reliable wait for index to appear #126437

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 9, 2025

Conversation

ywangd
Copy link
Member

@ywangd ywangd commented Apr 8, 2025

Relates: #125652
Resolves: #126204

@ywangd ywangd added >test Issues or PRs that are addressing/adding tests :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v9.1.0 labels Apr 8, 2025
@ywangd ywangd requested a review from DaveCTurner April 8, 2025 02:14
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Coordination Meta label for Distributed Coordination team label Apr 8, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of nits otherwise LGTM.

There's a few other places we call indexExists() in an assertBusy() which probably want the same treatment.

@@ -748,7 +748,7 @@ public void testDeletionOfFailingToRecoverIndexShouldStopRestore() throws Except

logger.info("--> wait for the index to appear");
// that would mean that recovery process started and failing
waitForIndex("test-idx", TimeValue.timeValueSeconds(10));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only caller of waitForIndex so I suggest removing the method itself too.

@@ -748,7 +748,7 @@ public void testDeletionOfFailingToRecoverIndexShouldStopRestore() throws Except

logger.info("--> wait for the index to appear");
// that would mean that recovery process started and failing
waitForIndex("test-idx", TimeValue.timeValueSeconds(10));
safeGet(clusterAdmin().prepareHealth(TEST_REQUEST_TIMEOUT, "test-idx").execute());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to set a 30s timeout on the request if we're only waiting for 10s:

Suggested change
safeGet(clusterAdmin().prepareHealth(TEST_REQUEST_TIMEOUT, "test-idx").execute());
safeGet(clusterAdmin().prepareHealth(SAFE_AWAIT_TIMEOUT, "test-idx").execute());

@ywangd ywangd added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Apr 9, 2025
@elasticsearchmachine elasticsearchmachine merged commit 4f9bfb0 into elastic:main Apr 9, 2025
17 checks passed
@ywangd ywangd deleted the es-126204-fix branch April 9, 2025 04:31
ywangd added a commit to ywangd/elasticsearch that referenced this pull request Apr 9, 2025
@nielsbauman
Copy link
Contributor

As far as I am concerned, the cluster health API is also a candidate for being run on the local node by default - it's also listed in #101805. I still think that's valuable and worth doing, but that API will probably need the option to run on the master node, as we'll want to be able to wait for all tasks to complete on the master node - at least in tests, not sure how valuable that is in production.

This is not something we need to discuss on this PR - I only brought it up here because I noticed we're using the cluster health API here to check the index health - but we might need to discuss an approach sometime later.

@DaveCTurner
Copy link
Contributor

++ yeah the cluster health API is kind of a weird one, it's so heavily used in tests to wait for the master node state that I think making it node-local will be fraught with difficulty. We could for instance wait for the node-local state and then reach out to the master to wait for the application to complete? But also it's not particularly heavyweight, it might not be worth the effort to migrate it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed Coordination Meta label for Distributed Coordination team >test Issues or PRs that are addressing/adding tests v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CI] SharedClusterSnapshotRestoreIT testDeletionOfFailingToRecoverIndexShouldStopRestore failing
4 participants