Changes to prevent deadlocks and loops on elections #7

lmalheiro · 2016-03-17T14:53:51Z

Hi, Pedro. I'm using the skiff-algorithm in a project, and in some situations restarting the nodes would send them into loops or deadlocks. I made a few changes that you might like to incorporate. The following text is basically what I wrote in my commit message to explain the changes.

As candidates first vote for themselves, when all nodes are candidates
and have the same term, the likely result of the election is a draw.
Converting to follower after a failed election increases the chance
that the next election won't result in a draw.
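
A minimal sketch of that rule, not the code from the commit. The function name, its parameters and the role strings are hypothetical, and it assumes the candidate's own vote is already counted in `votesGranted`:

```javascript
// Hypothetical helper: decide the next role once an election round ends.
// votesGranted includes the candidate's vote for itself.
function roleAfterElection(votesGranted, clusterSize) {
  const majority = Math.floor(clusterSize / 2) + 1;
  if (votesGranted >= majority) {
    return 'leader';
  }
  // Failed or drawn round: step down instead of campaigning again at once,
  // so the nodes don't all retry as candidates in lockstep.
  return 'follower';
}
```

With three nodes that each vote for themselves, every candidate ends the round with one vote, so all of them go back to follower and whichever one times out first starts the next election.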

If a leader doesn't forget that it voted for itself, this can prevent
candidates from stepping in, as State.onRequestVote() [state.js] will
always deny the vote to any candidate other than the leader itself. As
the leader is not a candidate by definition, this could result in a
deadlock. The same problem applies to the follower. The simplest
solution that I found was to remember the vote's term, so that if a
newer term is seen, the node can forget its previous vote.
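
The idea can be sketched as follows. This is an illustration under assumptions, not the actual State.onRequestVote() from state.js: the class, its field names, the reply shape and the reason strings 'stale term' and 'already voted' are all hypothetical, and the log up-to-date check is left out.

```javascript
// Hypothetical sketch: a vote record that remembers the term of the vote.
class VoteRecord {
  constructor() {
    this.votedFor = null; // id of the node we voted for, if any
    this.votedTerm = -1;  // term in which that vote was cast
  }

  // Decide whether to grant a vote to candidateId for the given term.
  onRequestVote(candidateId, term) {
    if (term > this.votedTerm) {
      // A newer term has been seen: forget the previous vote.
      this.votedFor = null;
    }
    if (term < this.votedTerm) {
      return { voteGranted: false, reason: 'stale term' };
    }
    if (this.votedFor !== null && this.votedFor !== candidateId) {
      return { voteGranted: false, reason: 'already voted' };
    }
    this.votedFor = candidateId;
    this.votedTerm = term;
    return { voteGranted: true };
  }
}
```

A node that voted for itself in term 1 still refuses other candidates in term 1, but a request carrying term 2 clears the old vote and can be granted, which is what breaks the deadlock.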

During an election, if the result of a vote request is 'not granted'
with reason 'too soon', another node has seen the leader alive. This
is a hint that the candidate should convert to follower.
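
A rough sketch of how a candidate might react to that reply. Only the 'too soon' reason comes from the description above; the function name, the `candidate` object and the `voteGranted` / `reason` reply fields are assumptions made for the example:

```javascript
// Hypothetical handler for a single vote reply received by a candidate.
// candidate is a plain object: { role: 'candidate', votes: <number> }.
function onVoteReply(candidate, reply) {
  if (candidate.role !== 'candidate') {
    return candidate.role; // the election round is already over for us
  }
  if (reply.voteGranted) {
    candidate.votes += 1;
  } else if (reply.reason === 'too soon') {
    // A peer heard from a live leader recently: stop campaigning.
    candidate.role = 'follower';
  }
  return candidate.role;
}
```

Any other refusal leaves the candidate in place until the round ends, so only the explicit hint about a live leader triggers the early step-down.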

When a node converts to follower, it may take some extra time
to set up the connection and the event handlers needed to receive
an initial heartbeat from the leader. The resulting timeout could
move the node to candidate. If that node is behind in the log, it
can't be elected, which could leave it stuck as candidate (or move
it back to follower after the change mentioned above, which in
turn could result in a loop).

When a leader gets a timeout from a peer, it may be necessary to
reset the connection.

I should probably find a generic solution that could be extended to the other states, but this will do for now.

lmalheiro closed this Mar 18, 2016
