-
Notifications
You must be signed in to change notification settings - Fork 14.5k
KAFKA-19400: Update AddRaftVoterRequest RPC to version 1 #19982
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Conversation
|
||
// This method sets up the context so a test can send an AddVoter request after | ||
// exiting this method | ||
private void prepareToSendAddVoter( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this name is a bit deceiving, this is all prep we do on the local (leader) node in order for it to be able to respond favorably to an addvoter request, could we perhaps rename to "prepareLeaderToReceiveAddVoter"?
I'm also not sure how I feel about having a helper method for this, I do see how this is quite a bit of code duplication but I wonder if it might be more clear to have this written out explicitly in the tests (you probably don't need to keep all the same assertions for all the tests)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we perhaps rename to "prepareLeaderToReceiveAddVoter"
Yeah, this is more accurate.
I wonder if it might be more clear to have this written out explicitly in the tests
I would prefer to have this code in a helper method, which applies to the other comment too. In my opinion, it's clearer to me when all of these tests do have the exact same assertions so long as they apply, because it means our new AckWhenCommitted
field (or any other feature) is not unintentionally changing behavior elsewhere. For example, the metric values check prevented me from writing an incorrect implementation!
Another way to think about it is that in these AddVoter
unit tests, we're testing the same state of KRaft essentially (i.e. how the local leader handles AddVoter RPC), but each test is changing one thing, which is the ackWhenCommitted
value (true, false, or NOT_SUPPORTED). Because we want coverage, I think duplication is a natural side-effect.
Maybe I can just document the helpers/rename them as specific to these AddVoter
tests?
ReplicaKey follower, | ||
ReplicaKey newVoter, | ||
int epoch | ||
) throws Exception { | ||
// The new voter is now a voter after writing the VotersRecord to the log | ||
assertTrue(context.client.quorum().isVoter(newVoter)); | ||
checkLeaderMetricValues(3, 0, 1, context); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't seem very extensible - maybe the above checks belong outside of this method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thoughts here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My rationale of breaking this into a shared helper is similar to here: https://github.com/apache/kafka/pull/19982/files/c46387c64df052d0794932d91c8839bcd2e9d4a5#r2153376114.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
specifically the checkLeaderMetricValues assuming we have a 3 voter quorum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so I'm wondering if just the two assertions/checks don't belong in the helper
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so I'm wondering if just the two assertions/checks don't belong in the helper
We do want the metric check. Specifically the uncommitted voter change metric should go from 1 to 0. Otherwise there is a bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
specifically the checkLeaderMetricValues assuming we have a 3 voter quorum
Apologies, I did not clarify why I felt making this assumption was okay. My opinion is that the number of nodes is arbitrary in the context of the AddVoter unit tests. These should only check the state of KRaft as the voter attempts to go from X to X + 1.
"KIP_853_PROTOCOL", | ||
"KIP_996_PROTOCOL" | ||
}) | ||
void testAddVoterCompatibility(RaftProtocol protocol) throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
testAddVoterAckWhenCommittedUnsupported
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also need to add KIP_1166
here too.
// Attempt to add new voter to the quorum | ||
context.deliverRequest(context.addVoterRequest(Integer.MAX_VALUE, newVoter, newListeners)); | ||
|
||
completeApiVersionsForAddVoter(context, newVoter, newAddress); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is only called in two places, I wonder if we should just remove the helper so folks can see very clearly what is happening in these tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you okay with leaving this in? The completeApiVersionsForAddVoter
method is a more general method than commitNewVoterSetForAddVoter
, since it does not have metrics checks on the number of voters/observers. This means future tests that want to test AddRaftVoter RPC can call completeApiVersionsForAddVoter
as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, my motivation for grouping code in this way into helpers was to make the test read like the high level description of how AddRaftVoter RPC works:
prepareLeaderToReceiveAddVoter
gets the local replica to a state where the context can send the AddRaftVoter RPC and have it trigger an API versions request- The test context delivers an AddRaftVoter RPC
completeApiVersionsForAddVoter
completes the API versions RPC contained as part of the AddRaftVoter RPC loopcommitNewVoterSetForAddVoter
completes the fetch RPC that is needed to complete AddRaftVoter RPC
Let me know if that makes sense to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Kevin, two comments left about the helpers, but approving the changes in either case :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes @kevin-wu24 .
currentTimeMs, | ||
data.ackWhenCommitted() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's swap the order of currentTimeMs
and ackWhenCommitted
. "Ack when committed" is more important and raft tends to pass the current time as the last parameter.
time.timer(timeout.getAsLong()), | ||
ackWhenCommitted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment here. Let's swap the order of the timer and the "ack when committed" boolean.
private final Timer timeout; | ||
private final boolean ackWhenCommitted; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor but let's swap this order here and in the constructor.
current.future().complete( | ||
RaftUtil.addVoterResponse( | ||
Errors.NONE, | ||
null | ||
) | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to join some of these lines if they are less than about 100 characters. E.g.:
current.future().complete(
RaftUtil.addVoterResponse(Errors.NONE, null)
);
// Attempt to add new voter to the quorum | ||
assertThrows( | ||
UnsupportedVersionException.class, | ||
() -> context.deliverRequest( | ||
context.addVoterRequest( | ||
Integer.MAX_VALUE, | ||
newVoter, | ||
newListeners | ||
).setAckWhenCommitted(false) | ||
) | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not testing KafkaRaftClient
. This is testing the serialization code. It is testing that serializing a false "ack when committed" throws an exception when the target version is 0.
If you want to test this case for KafkaRaftClient, you need to make the local replica a non-voter that can auto-join and show that AddVoterRequest always sets "ack when committed" to false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not testing KafkaRaftClient. This is testing the serialization code. It is testing that serializing a false "ack when committed" throws an exception when the target version is 0.
We want to test the serialization code behaves as expected right? How else would we know that we are throwing an exception that prevents us from sending AddVoter v1 to a node that only supports v0 because ackWhenCommitted
has ignorable == false
?
If you want to test this case for KafkaRaftClient, you need to make the local replica a non-voter that can auto-join and show that AddVoterRequest always sets "ack when committed" to false.
I was planning to add this as a check in all the KafkaRaftClientAutoJoinTest
tests in the other PR after this PR is merged, since all of those tests have the local replica as a non-voter. Specifically, we add another check in assertSentAddVoterRequest
that AckWhenCommitted == false
for any auto-joining replica.
RaftClientTestContext context, | ||
int epoch, | ||
ReplicaKey leader, | ||
ReplicaKey follower, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting in all of the reconfig tests we always go from 2 voters to 3 voters? Maybe we can improve this in a future PR and make this parametrized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can improve this in a future PR and make this parametrized.
Yeah possibly. I think just parameterizing might not be enough, depending on what you want to parametrize over. This is because the number of fetches needed to commit the new voters record increases as the number of voters increases.
Add the
ackWhenCommitted
boolean field to theAddRaftVoterRequest
RPC, and bump the RPC's version to 1.ackWhenCommitted
istrue
, and in this case the leader will return a response after committing theVotersRecord
generated by the RPC.ackWhenCommitted == false
, the leader will return a response after writing theVotersRecord
to its own local log.KafkaRaftClientReconfigTest