Replace assertBusy of indexExists #126501

ywangd · 2025-04-09T06:49:28Z

ywangd · 2025-04-09T06:52:33Z

test/framework/src/main/java/org/elasticsearch/test/ESIntegTestCase.java

+    public static void ensureIndexExists(String index) {
+        ensureIndexExists(index, SAFE_AWAIT_TIMEOUT.seconds(), SAFE_AWAIT_TIMEOUT.timeUnit());
+    }
+
+    public static void ensureIndexExists(String index, long timeout, TimeUnit unit) {
+        safeGet(
+            clusterAdmin().prepareHealth(new TimeValue(timeout, unit), index)
+                .setIndicesOptions(IndicesOptions.LENIENT_EXPAND_OPEN_CLOSED)
+                .execute(),
+            timeout,
+            unit
+        );
+    }


Raised this PR as a draft since I am not sure whether this is what we want. There are scenarios that it does not cover, e.g. an assertBusy that waits for an index to not exist, or an assertBusy that waits for an index to exist on a remote cluster.

For the former, I wonder whether it is easier to change the existing indexExists method to always talk to the master node. For the latter, I don't have a great suggestion.

In an integ test involving multiple clusters we have a client for each cluster so we should be able to do the same health API call there.

For the assertBusy/assertFalse waits that I could find, it looks like it's ok to wait on any node: we're just checking that the index deletion has been committed, not that it has been applied everywhere, so they need no change. For instance, if the next thing the test does involves another cluster state update then that will of course wait for the previous one to complete. If that weren't the case, a GET _cluster/health?wait_for_events=LANGUID would do the trick, I think.

Thanks for the suggestions. Yeah, the assertFalse case does not seem to be problematic. Unlike the assertTrue case, the tests do not do anything more against the deleted index. For the remote cluster case, I added a variant of the method that takes a client parameter.
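The distinction in this thread, polling with assertBusy versus blocking until a state update makes the condition true, can be modelled without any Elasticsearch classes. The following is a minimal sketch in plain Java under that assumption: `ToyCluster`, `createIndex`, and this `awaitIndexExists` are hypothetical stand-ins, not the test framework's API, and the real helper delegates to the cluster health API instead.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

// Hypothetical toy model; none of these names exist in Elasticsearch.
final class ToyCluster {
    private final Set<String> indices = ConcurrentHashMap.newKeySet();
    private final CopyOnWriteArrayList<Consumer<Set<String>>> listeners = new CopyOnWriteArrayList<>();

    // Apply a "state update" and notify every registered listener.
    void createIndex(String name) {
        indices.add(name);
        for (Consumer<Set<String>> listener : listeners) {
            listener.accept(indices);
        }
    }

    // Block until the index exists or the timeout elapses; there is no polling loop.
    boolean awaitIndexExists(String name, long timeout, TimeUnit unit) {
        CompletableFuture<Void> found = new CompletableFuture<>();
        Consumer<Set<String>> listener = current -> {
            if (current.contains(name)) {
                found.complete(null);
            }
        };
        listeners.add(listener);   // register first ...
        listener.accept(indices);  // ... then check the current state, so no update is missed
        try {
            found.get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            listeners.remove(listener);
        }
    }

    public static void main(String[] args) {
        ToyCluster cluster = new ToyCluster();
        // Create the index from another thread after a short delay.
        CompletableFuture.runAsync(
            () -> cluster.createIndex("copy-logs-201901"),
            CompletableFuture.delayedExecutor(100, TimeUnit.MILLISECONDS)
        );
        System.out.println(cluster.awaitIndexExists("copy-logs-201901", 10, TimeUnit.SECONDS));
    }
}
```

Registering the listener before checking the current state is the detail that matters here: checking first would leave a window in which the update could arrive unobserved.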

...ternalClusterTest/java/org/elasticsearch/xpack/ilm/ClusterStateWaitThresholdBreachTests.java

DaveCTurner

👍 nice, I was going to do this but you beat me to it.

DaveCTurner · 2025-04-09T07:03:42Z

test/framework/src/main/java/org/elasticsearch/test/ESIntegTestCase.java

+        ensureIndexExists(index, SAFE_AWAIT_TIMEOUT.seconds(), SAFE_AWAIT_TIMEOUT.timeUnit());
+    }
+
+    public static void ensureIndexExists(String index, long timeout, TimeUnit unit) {


I'd rather we just always used SAFE_AWAIT_TIMEOUT (I suspect the one case where we wait 30s below is a bug and could be 10s). But if we do need to expose the timeout to callers could we use TimeValue in the API rather than long/TimeUnit? I'll fix up the safeGet overload to use TimeValue too.

I deleted the 30-second use cases, but kept the method variant that takes a timeout parameter since the remote cluster usage has a 120-second wait time. I suspect 120s is excessive, but a timeout longer than 10s is probably justified.
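The API-shape point raised in this thread, a single duration value rather than a long/TimeUnit pair, can be illustrated in isolation. This is only a sketch: `java.time.Duration` stands in for Elasticsearch's TimeValue, and `TimeoutApiSketch`, `DEFAULT_TIMEOUT`, and `timeoutMillis` are hypothetical names chosen for the example.

```java
import java.time.Duration;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; Duration stands in for Elasticsearch's TimeValue.
final class TimeoutApiSketch {
    // Stand-in for SAFE_AWAIT_TIMEOUT (10s, per the discussion above).
    static final Duration DEFAULT_TIMEOUT = Duration.ofSeconds(10);

    // Pair-style API: callers must pass the number and its unit separately.
    static long timeoutMillis(long timeout, TimeUnit unit) {
        return unit.toMillis(timeout);
    }

    // Value-object API: the unit travels with the number.
    static long timeoutMillis(Duration timeout) {
        return timeout.toMillis();
    }

    // The no-argument overload delegates to the explicit one with the default.
    static long timeoutMillis() {
        return timeoutMillis(DEFAULT_TIMEOUT);
    }

    public static void main(String[] args) {
        System.out.println(timeoutMillis(120, TimeUnit.SECONDS));
        System.out.println(timeoutMillis(Duration.ofSeconds(120)));
        System.out.println(timeoutMillis());
    }
}
```

With the value-object form, a default such as `DEFAULT_TIMEOUT` can be passed through unchanged instead of being split into `seconds()` and `timeUnit()` at each call site.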

DaveCTurner · 2025-04-09T07:06:41Z

test/framework/src/main/java/org/elasticsearch/test/ESIntegTestCase.java

@@ -1731,6 +1731,20 @@ public static boolean indexExists(String index, Client client) {
        return getIndexResponse.getIndices().length > 0;
    }

+    public static void ensureIndexExists(String index) {


Naming nit, maybe awaitIndexExists to show that it will wait? Otherwise this reads to me as something that will immediately fail if the index doesn't exist.

Yep renamed as suggested. Thanks!

DaveCTurner · 2025-04-09T07:12:41Z

elasticsearchmachine · 2025-04-09T07:35:54Z

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

DaveCTurner · 2025-04-09T07:45:52Z

x-pack/plugin/ccr/src/internalClusterTest/java/org/elasticsearch/xpack/ccr/AutoFollowIT.java

@@ -118,7 +118,7 @@ public void testAutoFollow() throws Exception {
        createLeaderIndex("metrics-201901", leaderIndexSettings);

        createLeaderIndex("logs-201901", leaderIndexSettings);
-        assertLongBusy(() -> { assertTrue(ESIntegTestCase.indexExists("copy-logs-201901", followerClient())); });
+        ESIntegTestCase.awaitIndexExists("copy-logs-201901", followerClient(), TimeValue.timeValueSeconds(120));


10s should be fine here unless something is desperately broken. The auto-follow is triggered by long-polling the cluster state on the leader so it should react immediately to the index creation.

OK makes sense. I dropped it.

DaveCTurner

LGTM thanks Yang

ywangd · 2025-04-09T08:11:34Z

@elasticmachine update branch

nielsbauman · 2025-04-09T09:25:17Z

test/framework/src/main/java/org/elasticsearch/test/ESIntegTestCase.java

+        safeGet(
+            client.admin()
+                .cluster()
+                .prepareHealth(SAFE_AWAIT_TIMEOUT, index)
+                .setIndicesOptions(IndicesOptions.LENIENT_EXPAND_OPEN_CLOSED)
+                .execute()
+        );


Is there a reason we're not using something like ESIntegTestCase#awaitClusterState here? We're not waiting for a specific health (i.e. GREEN) here, just for the index to exist. Couldn't (/shouldn't) we do that with a predicate?

Really just that this is replacing pre-existing client-based calls. It's the same either way really, the health API waits with a ClusterStateObserver.

One difference is that this method is also used to check a remote cluster's indices, i.e. it needs a client.

The other difference, which I just noticed from the CI failure, is that the index name can be a wildcard, as in this usage, and that requires expansion. Unfortunately, this means we have to use assertBusy in this case. I pushed 75c1464.

The remote cluster in question is still running in the test JVM, so no, this doesn't need a client: you could use awaitClusterState with the ClusterService of the elected master of the follower cluster, obtained with getFollowerCluster().getCurrentMasterNodeInstance(ClusterService.class).

I don't think we should support wildcards in this utility, especially if it requires an assertBusy like that. The only test that uses it is pretty odd (and very old). I'd rather we did something like #126582 there.
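To make the exact-name versus wildcard distinction concrete, here is a small sketch in plain Java. It is an assumption-laden illustration only: a `Set<String>` of index names stands in for cluster state, and `IndexPredicates`, `indexExists`, and `expand` are hypothetical helpers, not Elasticsearch's index name resolution.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical illustration; a Set<String> of index names stands in for cluster state.
final class IndexPredicates {
    // Exact name: the predicate is a single membership test.
    static Predicate<Set<String>> indexExists(String name) {
        return indices -> indices.contains(name);
    }

    // Wildcard: the expression must be expanded against every known index name.
    static List<String> expand(String expression, Set<String> indices) {
        // Quote the literal parts so that only '*' acts as a wildcard.
        String regex = Arrays.stream(expression.split("\\*", -1))
            .map(Pattern::quote)
            .collect(Collectors.joining(".*"));
        Pattern pattern = Pattern.compile(regex);
        return indices.stream()
            .filter(name -> pattern.matcher(name).matches())
            .sorted()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Set<String> indices = Set.of("logs-201901", "logs-201902", "metrics-201901");
        System.out.println(indexExists("logs-201901").test(indices));
        System.out.println(expand("logs-*", indices));
    }
}
```

An exact-name predicate of this kind is the sort of condition a predicate-based wait can evaluate on each state update, whereas a wildcard first has to be resolved to concrete names.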

Replace assertBusy of indexExists

123e9c9

Relates: elastic#126437

ywangd added the >test, v9.1.0, and :Distributed Coordination/Distributed labels Apr 9, 2025

ywangd requested a review from DaveCTurner April 9, 2025 06:49

ywangd commented Apr 9, 2025

DaveCTurner reviewed Apr 9, 2025

Incorporate review comments

9c1d4c0

ywangd marked this pull request as ready for review April 9, 2025 07:35

Merge remote-tracking branch 'origin/main' into replace-more-index-ex…

a3338b3

…ists

elasticsearchmachine added the Team:Distributed Coordination label Apr 9, 2025

ywangd requested a review from DaveCTurner April 9, 2025 07:39

DaveCTurner reviewed Apr 9, 2025

ywangd mentioned this pull request Apr 9, 2025

Use TimeValue for timeouts in safeAwait etc. #126509

Merged

ywangd added 2 commits April 9, 2025 17:50

drop timeout

7ea2eab

drop

59c6473

ywangd requested a review from DaveCTurner April 9, 2025 07:51

DaveCTurner approved these changes Apr 9, 2025

ywangd added the auto-merge-without-approval label Apr 9, 2025

Merge branch 'main' into replace-more-index-exists

780d0e7

nielsbauman reviewed Apr 9, 2025

ywangd added 2 commits April 10, 2025 09:47

handle wildcard

75c1464

Merge remote-tracking branch 'origin/main' into replace-more-index-ex…

5ef316e

…ists

elasticsearchmachine merged commit 62636f9 into elastic:main Apr 10, 2025
17 checks passed

ywangd deleted the replace-more-index-exists branch April 10, 2025 00:57

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Apr 10, 2025

Reduce assertBusy usage in testMultipleNodes

6019a7f

Relates elastic#126501

DaveCTurner mentioned this pull request Apr 10, 2025

Reduce assertBusy usage in testMultipleNodes #126582

Merged

elasticsearchmachine pushed a commit that referenced this pull request Apr 10, 2025

Reduce assertBusy usage in testMultipleNodes (#126582)

9e0d885

Relates #126501

This was referenced Apr 11, 2025

[CI] ClusterStateWaitThresholdBreachTests testWaitInShrunkShardsAllocatedExceedsThreshold failing #126348

Closed

Unmute #126348 #126690

Merged

elasticsearchmachine pushed a commit that referenced this pull request Apr 11, 2025

Unmute #126348 (#126690)

ac7eccc

This was already fixed by #126501. Fixes #126348
