Replace assertBusy of indexExists #126501
Conversation
public static void ensureIndexExists(String index) {
    ensureIndexExists(index, SAFE_AWAIT_TIMEOUT.seconds(), SAFE_AWAIT_TIMEOUT.timeUnit());
}

public static void ensureIndexExists(String index, long timeout, TimeUnit unit) {
    safeGet(
        clusterAdmin().prepareHealth(new TimeValue(timeout, unit), index)
            .setIndicesOptions(IndicesOptions.LENIENT_EXPAND_OPEN_CLOSED)
            .execute(),
        timeout,
        unit
    );
}
Raised this PR as a draft since I am not sure whether this is what we want. There are scenarios that it does not cover, e.g. assertBusy waits for an index to not exist, or assertBusy waits for an index to exist on a remote cluster.
For the former, I wonder whether it would be easier to change the existing indexExists method to always talk to the master node. For the latter, I don't have a great suggestion.
In an integ test involving multiple clusters we have a client for each cluster, so we should be able to do the same health API call there.
For the assertBusy/assertFalse waits that I could find, it looks like it's ok to wait on any node: we're just checking that the index deletion has been committed, not that it has been applied everywhere, so no change is needed. For instance, if the next thing the test does involves another cluster state update, then that will of course wait for the previous one to complete. If that weren't the case, a GET _cluster/health?wait_for_events=LANGUID would do the trick I think.
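For illustration, the transport-level equivalent of that REST call might look roughly like this in test code (a sketch only, assuming the usual ClusterHealthRequestBuilder API; it is not part of this PR's diff):

    // Wait until all pending cluster state tasks down to LANGUID priority have been
    // processed, i.e. the previously committed deletion has been applied locally.
    // Equivalent in spirit to `GET _cluster/health?wait_for_events=LANGUID`.
    safeGet(
        clusterAdmin().prepareHealth(SAFE_AWAIT_TIMEOUT)
            .setWaitForEvents(Priority.LANGUID)
            .execute()
    );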
Thanks for the suggestions. Yeah the assertFalse case does not seem to be problematic. Unlike assertTrue case, the tests do not anything more against the deleted index. For the remote cluster case, I made the variant of the method to take a client parameter.
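For reference, that client-taking variant looks roughly like the diff shown further down this thread (a sketch only; the exact signature may differ):

    public static void awaitIndexExists(String index, Client client) {
        safeGet(
            client.admin()
                .cluster()
                .prepareHealth(SAFE_AWAIT_TIMEOUT, index)
                .setIndicesOptions(IndicesOptions.LENIENT_EXPAND_OPEN_CLOSED)
                .execute()
        );
    }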
...ternalClusterTest/java/org/elasticsearch/xpack/ilm/ClusterStateWaitThresholdBreachTests.java
👍 nice, I was going to do this but you beat me to it.
    ensureIndexExists(index, SAFE_AWAIT_TIMEOUT.seconds(), SAFE_AWAIT_TIMEOUT.timeUnit());
}

public static void ensureIndexExists(String index, long timeout, TimeUnit unit) {
I'd rather we just always used SAFE_AWAIT_TIMEOUT (I suspect the one case where we wait 30s below is a bug and could be 10s). But if we do need to expose the timeout to callers, could we use TimeValue in the API rather than long/TimeUnit? I'll fix up the safeGet overload to use TimeValue too.
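For example, the TimeValue-based shape might look something like this (a sketch only; it assumes the safeGet(future, TimeValue) overload mentioned above, which is not part of this diff):

    public static void ensureIndexExists(String index) {
        ensureIndexExists(index, SAFE_AWAIT_TIMEOUT);
    }

    public static void ensureIndexExists(String index, TimeValue timeout) {
        // Assumes a safeGet overload that accepts a TimeValue directly.
        safeGet(
            clusterAdmin().prepareHealth(timeout, index)
                .setIndicesOptions(IndicesOptions.LENIENT_EXPAND_OPEN_CLOSED)
                .execute(),
            timeout
        );
    }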
I deleted the 30-second use cases. But I still kept the method variant that takes a timeout parameter since the remote cluster usage has a 120-second wait time. I suspect 120s is excessive, but it is probably justified to be longer than 10s.
@@ -1731,6 +1731,20 @@ public static boolean indexExists(String index, Client client) {
        return getIndexResponse.getIndices().length > 0;
    }

    public static void ensureIndexExists(String index) {
Naming nit: maybe awaitIndexExists, to show that it will wait? Otherwise this reads to me as something that will immediately fail if the index doesn't exist.
Yep renamed as suggested. Thanks!
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)
@@ -118,7 +118,7 @@ public void testAutoFollow() throws Exception {
         createLeaderIndex("metrics-201901", leaderIndexSettings);

         createLeaderIndex("logs-201901", leaderIndexSettings);
-        assertLongBusy(() -> { assertTrue(ESIntegTestCase.indexExists("copy-logs-201901", followerClient())); });
+        ESIntegTestCase.awaitIndexExists("copy-logs-201901", followerClient(), TimeValue.timeValueSeconds(120));
10s should be fine here unless something is desperately broken. The auto-follow is triggered by long-polling the cluster state on the leader so it should react immediately to the index creation.
OK makes sense. I dropped it.
LGTM thanks Yang
@elasticmachine update branch
safeGet(
    client.admin()
        .cluster()
        .prepareHealth(SAFE_AWAIT_TIMEOUT, index)
        .setIndicesOptions(IndicesOptions.LENIENT_EXPAND_OPEN_CLOSED)
        .execute()
);
Is there a reason we're not using something like ESIntegTestCase#awaitClusterState here? We're not waiting for a specific health (i.e. GREEN) here, just for the index to exist. Couldn't (/shouldn't) we do that with a predicate?
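Roughly what that alternative could look like (a sketch only, assuming a predicate-based awaitClusterState helper and Metadata#hasIndex; not code from this PR):

    // Wait until the index shows up in the local cluster state.
    awaitClusterState(state -> state.metadata().hasIndex(index));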
Really just that this is replacing pre-existing client-based calls. It's the same either way really, the health API waits with a ClusterStateObserver.
One difference is that this method is also used to check a remote cluster's indices, i.e. it needs a client.
The other difference, which I just noticed from the CI failure, is that the index name can be a wildcard, as in this usage, so it requires expansion. Unfortunately, this means we have to use assertBusy in this case. I pushed 75c1464
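The wildcard fallback mentioned here would be roughly of this shape (an illustrative sketch with a hypothetical wildcard pattern, not the exact code in 75c1464):

    // Illustrative only: poll until an index matching the wildcard exists.
    assertBusy(() -> assertTrue(indexExists("copy-logs-*", client)));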
The remote cluster in question is still running in the test JVM so no, this doesn't need a client; you could use awaitClusterState with the ClusterService of the elected master of the follower cluster, obtained with getFollowerCluster().getCurrentMasterNodeInstance(ClusterService.class).
I don't think we should support wildcards in this utility, especially if it requires an assertBusy like that. The only test that uses it is pretty odd (and very old). I'd rather we did something like #126582 there.
Relates: #126437 (review)