Fix NullPointerExceptions failing TransportClusterStateActionDisruptionIT #127523
Conversation
Replaces the use of InternalTestCluster.getMasterName() with a ClusterServiceUtils.addTemporaryStateListener() call that waits for a new master node other than the previous master node. InternalTestCluster.getMasterName() is not safe to use in unstable clusters, per PR 127213. Closes: elastic#127466 elastic#127443 elastic#127424 elastic#127423 elastic#127422
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)
I left one small suggestion, other than that LGTM! Great job tracking down the PR that caused this failure.
N.B. GitHub requires you to write out the `closes #...` for every issue, so you'll have to write
`closes #...`
`closes #...`
if you want GitHub to automatically close all of them.
```java
var newMasterNodeListener = ClusterServiceUtils.addTemporaryStateListener(
    internalCluster().clusterService(nonMasterNode),
    state -> Optional.ofNullable(state.nodes().getMasterNode()).map(m -> m.getName().equals(masterName) == false).orElse(false),
    TEST_REQUEST_TIMEOUT
);
safeAwait(newMasterNodeListener, TEST_REQUEST_TIMEOUT);
```
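As context for the predicate in the snippet above: during the induced disruption an observed state may have no elected master, so `getMasterNode()` can return null, and the `Optional` wrapper makes such a state simply not match rather than throw. Below is a minimal sketch of that design choice; the `naive` and `waitsForNewMaster` names and the fragment itself are illustrative only (the actual change is exactly the snippet above), and it assumes the same `masterName` variable holding the previous master's node name.

```java
// Illustrative sketch only -- not part of the PR diff.
// Assumed imports: java.util.Optional, java.util.function.Predicate,
// org.elasticsearch.cluster.ClusterState.

// Naive variant: dereferences the master node directly, so it would itself throw a
// NullPointerException whenever it observes a state with no elected master.
Predicate<ClusterState> naive =
    state -> state.nodes().getMasterNode().getName().equals(masterName) == false;

// Variant used in the change above: a master-less state just fails the predicate,
// so the temporary state listener keeps waiting instead of failing.
Predicate<ClusterState> waitsForNewMaster =
    state -> Optional.ofNullable(state.nodes().getMasterNode())
        .map(m -> m.getName().equals(masterName) == false)
        .orElse(false);
```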
I think we can do this:
```diff
-var newMasterNodeListener = ClusterServiceUtils.addTemporaryStateListener(
-    internalCluster().clusterService(nonMasterNode),
-    state -> Optional.ofNullable(state.nodes().getMasterNode()).map(m -> m.getName().equals(masterName) == false).orElse(false),
-    TEST_REQUEST_TIMEOUT
-);
-safeAwait(newMasterNodeListener, TEST_REQUEST_TIMEOUT);
+awaitClusterState(
+    logger,
+    nonMasterNode,
+    state -> Optional.ofNullable(state.nodes().getMasterNode()).map(m -> m.getName().equals(masterName) == false).orElse(false)
+);
```
That looks a tiny bit cleaner to me and better matches what we're actually trying to achieve (which is waiting for the cluster state). The only downside is that this changes the timeout, as `ClusterServiceUtils#awaitClusterState` uses a timeout of 30s whereas `assertBusy` (the old code) uses a default timeout of 10s. I think we'll want to try to reduce that 30s timeout at some point anyway, so I don't see that as a reason not to use that method here.
Also, the `logger` parameter will likely be removed in the future as well, making this even cleaner (but still only a little bit more).
Replaces the use of `InternalTestCluster.getMasterName()` with a `ClusterServiceUtils.addTemporaryStateListener()` call that waits for a new master node other than the previous master node. `InternalTestCluster.getMasterName()` is not safe to use in unstable clusters, per PR #127213.

To reproduce the `NullPointerException` in `getMasterName()` seen in the test failures I added a sleep in between the two cluster state accesses (see the sketch after the issue list below).

Closes #127466
Closes #127443
Closes #127424
Closes #127423
Closes #127422
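For illustration, here is a minimal, hypothetical sketch of the race in the reproduction note above. The method name, the `ClusterService` parameter, and the sleep duration are assumptions made for the example; this is not the actual `InternalTestCluster.getMasterName()` code, it only shows how two reads of the cluster state around a pause can hit the NPE when the elected master steps down in between.

```java
// Hypothetical sketch only -- not the real InternalTestCluster code.
// `clusterService` stands in for a node's org.elasticsearch.cluster.service.ClusterService.
static String masterNameWithRaceWindow(org.elasticsearch.cluster.service.ClusterService clusterService)
    throws InterruptedException {
    // First cluster state access: a master is elected at this point.
    assert clusterService.state().nodes().getMasterNode() != null;

    // Artificial pause, mimicking the sleep added to reproduce the failure; under the
    // test's network disruption the master can step down during this window.
    Thread.sleep(1000);

    // Second cluster state access: if no master is currently elected, getMasterNode()
    // returns null and the getName() call throws the NullPointerException seen in the
    // test failures.
    return clusterService.state().nodes().getMasterNode().getName();
}
```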