-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Fix timeout for awaiting index existence #126773
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
elastic#126692 allowed consumers to specify a timeout to `awaitIndexExists`, but that timeout did not get propagated correctly to all the required places.
// NB private because tests should be designed not to need to wait for longer than SAFE_AWAIT_TIMEOUT. | ||
private static <T> T safeGet(Future<T> future, TimeValue timeout) { | ||
public static <T> T safeGet(Future<T> future, TimeValue timeout) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can simply require tests to not have to wait for more than 10 seconds. In this case, downsampling is not consistently finished within 10 seconds, we really need more time. I'm open to other suggestions, but I think it's acceptable to allow consumers to specify a higher timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather we didn't make it so easy to abuse these utilities like this - experience shows that it just leads to lazy testing that waits for far too long and slows everything down. If it's just for a health request, you can pick your own timeout and use ElasticsearchAssertions#assertNoTimeout
.
Relatedly, why is downsampling so slow in tests? Are we just sending it unreasonably large amounts of data, or is it slow for some other reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I'm not sure I feel like not exposing this method will make much of a difference when it comes to encouraging people to write tests that don't need to wait more than 10 seconds - as there are other alternatives.
As to these downsampling tests (but I'm sure there are other places where we need a higher timeout), DataStreamLifecycleDownsampleDisruptionIT.testDataStreamLifecycleDownsampleRollingRestart
(which failed a few times #122056 (comment) and #123769 (comment)) and ILMDownsampleDisruptionIT.testILMDownsampleRollingRestart
both need longer timeouts because we do a rolling restart of the nodes. We saw both timeouts on the downsampling itself and of the downsampled index becoming green afterwards (i.e. shard relocation took too long).
We could wait for the cluster to complete the rolling restart (although we'd probably need a timeout higher than 10 seconds for that) and only then check for index existence within 10 seconds, but that probably only makes the tests longer as they could also complete downsampling before the rolling restart is finished.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides that, I feel like the difference between TEST_REQUEST_TIMEOUT
(30s) and SAFE_AWAIT_TIMEOUT
(10s) is also confusing. If we truly believe tests shouldn't have to wait for more than 10 seconds, we should drop the TEST_REQUEST_TIMEOUT
too - I don't think we should, but that would be consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DaveCTurner any more thoughts on this? Otherwise I'll ask someone from Data Management to review as well to get this through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delayed response. I'd still rather we used ElasticsearchAssertions#assertNoTimeout
. I disagree that this doesn't make much difference, I think it is important to discourage writing badly-designed tests that have to wait for so long. It's clearly possible to do whatever waiting you want to do in any case, it's just a question of whether that should be easy to do without much thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I switched to assertNoTimeout
in c2a30b0.
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Niels LGTM
#126692 allowed consumers to specify a timeout to
awaitIndexExists
, but that timeout did not get propagated correctly to all the required places.