Fix timeout for awaiting index existence #126773

nielsbauman · 2025-04-14T11:11:42Z

#126692 allowed consumers to specify a timeout to awaitIndexExists, but that timeout did not get propagated correctly to all the required places.

nielsbauman · 2025-04-14T11:12:57Z

test/framework/src/main/java/org/elasticsearch/test/ESTestCase.java

-    // NB private because tests should be designed not to need to wait for longer than SAFE_AWAIT_TIMEOUT.
-    private static <T> T safeGet(Future<T> future, TimeValue timeout) {
+    public static <T> T safeGet(Future<T> future, TimeValue timeout) {


I don't think we can simply require that tests never wait more than 10 seconds. In this case, downsampling does not consistently finish within 10 seconds, so we really need more time. I'm open to other suggestions, but I think it's acceptable to allow consumers to specify a higher timeout.
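
For readers without the file open, a timeout-taking `safeGet` along the lines of the diff above could look roughly like the sketch below. This is not the actual `ESTestCase` code: it uses `java.time.Duration` in place of `TimeValue`, and the class name `SafeGetSketch`, the constant's value, and the error messages are assumptions made for illustration.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class SafeGetSketch {
    // Stand-in for the 10s SAFE_AWAIT_TIMEOUT mentioned in this thread (assumed value).
    static final Duration SAFE_AWAIT_TIMEOUT = Duration.ofSeconds(10);

    // Default-timeout variant: the one most tests would be expected to call.
    static <T> T safeGet(Future<T> future) {
        return safeGet(future, SAFE_AWAIT_TIMEOUT);
    }

    // Timeout-taking variant: every failure mode is turned into an AssertionError,
    // so a test fails with a clear message instead of leaking checked exceptions.
    static <T> T safeGet(Future<T> future, Duration timeout) {
        try {
            return future.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new AssertionError("interrupted while waiting for future", e);
        } catch (ExecutionException e) {
            throw new AssertionError("future completed exceptionally", e);
        } catch (TimeoutException e) {
            throw new AssertionError("future not completed within " + timeout, e);
        }
    }

    public static void main(String[] args) {
        // An already-completed future returns its value immediately.
        System.out.println(safeGet(CompletableFuture.completedFuture("done")));
        try {
            // A future that never completes trips the caller-supplied timeout.
            safeGet(new CompletableFuture<String>(), Duration.ofMillis(50));
        } catch (AssertionError e) {
            System.out.println("failed as expected: " + e.getMessage());
        }
    }
}
```

The visibility of the two-argument overload is the whole point of contention: making it public lets any test pass an arbitrarily long timeout.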

I'd rather we didn't make it so easy to abuse these utilities like this - experience shows that it just leads to lazy testing that waits for far too long and slows everything down. If it's just for a health request, you can pick your own timeout and use ElasticsearchAssertions#assertNoTimeout.

Relatedly, why is downsampling so slow in tests? Are we just sending it unreasonably large amounts of data, or is it slow for some other reason?

Hm, I'm not sure that keeping this method private makes much of a difference in encouraging people to write tests that don't need to wait more than 10 seconds, since there are alternatives.

As to these downsampling tests (but I'm sure there are other places where we need a higher timeout), `DataStreamLifecycleDownsampleDisruptionIT.testDataStreamLifecycleDownsampleRollingRestart` (which failed a few times, see #122056 (comment) and #123769 (comment)) and `ILMDownsampleDisruptionIT.testILMDownsampleRollingRestart` both need longer timeouts because we do a rolling restart of the nodes. We saw timeouts both on the downsampling itself and on the downsampled index becoming green afterwards (i.e. shard relocation took too long).

We could wait for the cluster to complete the rolling restart (although we'd probably need a timeout higher than 10 seconds for that) and only then check for index existence within 10 seconds, but that probably only makes the tests longer as they could also complete downsampling before the rolling restart is finished.

Besides that, I feel like the difference between TEST_REQUEST_TIMEOUT (30s) and SAFE_AWAIT_TIMEOUT (10s) is also confusing. If we truly believe tests shouldn't have to wait for more than 10 seconds, we should drop the TEST_REQUEST_TIMEOUT too - I don't think we should, but that would be consistent.

@DaveCTurner any more thoughts on this? Otherwise I'll ask someone from Data Management to review as well to get this through.

Sorry for the delayed response. I'd still rather we used ElasticsearchAssertions#assertNoTimeout. I disagree that this doesn't make much difference: I think it is important to discourage writing badly designed tests that have to wait for so long. It's clearly possible to do whatever waiting you want to do in any case; it's just a question of whether that should be easy to do without much thought.

I switched to assertNoTimeout in c2a30b0.
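
The approach settled on here, where the request carries its own explicitly chosen timeout and the test then asserts that the response did not time out, can be modelled with plain Java as below. This is only a sketch of the pattern: `HealthResponse`, `waitForHealth` and this `assertNoTimeout` are hypothetical stand-ins, not the Elasticsearch classes, and the real `ElasticsearchAssertions#assertNoTimeout` signature is not shown in this thread.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

final class AssertNoTimeoutSketch {
    // Hypothetical stand-in for a health response that reports whether the wait timed out.
    static final class HealthResponse {
        final boolean timedOut;

        HealthResponse(boolean timedOut) {
            this.timedOut = timedOut;
        }
    }

    // Hypothetical stand-in for a health request: polls until the condition holds
    // or the caller-chosen timeout elapses, and reports which of the two happened.
    static HealthResponse waitForHealth(BooleanSupplier healthy, Duration timeout) {
        final long deadlineNanos = System.nanoTime() + timeout.toNanos();
        while (healthy.getAsBoolean() == false) {
            if (System.nanoTime() - deadlineNanos >= 0) {
                return new HealthResponse(true);
            }
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return new HealthResponse(true); // treat interruption as "did not get healthy"
            }
        }
        return new HealthResponse(false);
    }

    // The assertion is separate from the wait, so the long timeout is visible at the call site.
    static void assertNoTimeout(HealthResponse response) {
        if (response.timedOut) {
            throw new AssertionError("health request timed out");
        }
    }

    public static void main(String[] args) {
        // The test picks its own, explicit timeout for this one request.
        assertNoTimeout(waitForHealth(() -> true, Duration.ofSeconds(60)));
        System.out.println("no timeout");
    }
}
```

Compared with a public timeout-taking `safeGet`, the long wait has to be spelled out per request rather than being available through a general-purpose utility.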

elasticsearchmachine · 2025-04-14T11:14:59Z

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

DaveCTurner

Thanks Niels, LGTM.

Fix timeout for awaiting index existence

7798f69

nielsbauman requested a review from DaveCTurner April 14, 2025 11:11

elasticsearchmachine added the needs:triage (Requires assignment of a team area label) and v9.1.0 labels Apr 14, 2025

nielsbauman commented Apr 14, 2025

nielsbauman added 2 commits April 17, 2025 23:00

Merge branch 'main' into fix-timeout

d3b48c8

Use assertNoTimeout

c2a30b0

DaveCTurner approved these changes Apr 18, 2025

Merge branch 'main' into fix-timeout

7a21b44

nielsbauman enabled auto-merge (squash) April 18, 2025 06:59

nielsbauman merged commit a81c449 into elastic:main Apr 18, 2025
17 of 18 checks passed

nielsbauman deleted the fix-timeout branch April 18, 2025 09:27
