Changes to prevent deadlocks and loops on elections #7

lmalheiro

Hi, Pedro. I'm using skiff-algorithm in a project, and when restarting the nodes I ran into situations where they would enter loops or deadlocks. I made a few changes that you might like to incorporate. The following text is basically what I wrote in my commit message to explain the changes.

Since candidates first vote for themselves, when all nodes are candidates,
the likely result of the election is a draw. Converting to follower after a failed election
increases the chance that the next election won't result in a draw.
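
To illustrate why the draw happens and what stepping down looks like, here is a minimal sketch. The function name and the state strings are hypothetical and are not taken from skiff-algorithm; the only assumption is that an election is won with a strict majority of votes, counting the candidate's own.

```javascript
// Hypothetical sketch: decide which state a candidate enters once its
// election round is over. Names are illustrative, not skiff-algorithm's API.
function nextStateAfterElection(votesGranted, clusterSize) {
  // A win requires a strict majority, counting the candidate's own vote.
  const majority = Math.floor(clusterSize / 2) + 1;
  if (votesGranted >= majority) {
    return 'leader';
  }
  // Draw or loss: step down instead of immediately campaigning again.
  return 'follower';
}

// Three nodes, all candidates: each one only holds its own vote.
const outcomes = [1, 1, 1].map((votes) => nextStateAfterElection(votes, 3));
console.log(outcomes);
```

With every node holding only its own vote, nobody reaches the majority of two, so all three fall back to follower and the next election starts from whichever node times out first.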

If a leader doesn't forget that it voted for itself, this can prevent
candidates from stepping in, as State.onRequestVote() [state.js] will always
deny the vote to any candidate other than the leader itself. As the
leader is not a candidate by definition, this could result in a
deadlock. The same problem applies to the follower. The simplest
solution that I found was to remember the vote's term, so that if a newer
term is seen, the node can forget its previous vote.
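
The idea of remembering the vote's term can be sketched as follows. This is not the actual State.onRequestVote() from state.js: the state fields (`currentTerm`, `votedFor`, `votedTerm`), the request shape, and the reason strings are assumptions made for the example, and the log up-to-date check is left out to keep the focus on the vote bookkeeping.

```javascript
// Hypothetical sketch of a vote handler that scopes a vote to its term.
// Field names and the request/reply shapes are assumptions, not skiff's API.
function onRequestVote(state, request) {
  if (request.term < state.currentTerm) {
    return { term: state.currentTerm, voteGranted: false, reason: 'stale term' };
  }
  if (request.term > state.currentTerm) {
    state.currentTerm = request.term;
  }
  // A vote only counts for the term in which it was cast, so a vote
  // remembered from an older term is forgotten here.
  if (state.votedTerm !== state.currentTerm) {
    state.votedFor = null;
  }
  const canVote =
    state.votedFor === null || state.votedFor === request.candidateId;
  if (!canVote) {
    return { term: state.currentTerm, voteGranted: false, reason: 'already voted' };
  }
  state.votedFor = request.candidateId;
  state.votedTerm = state.currentTerm;
  return { term: state.currentTerm, voteGranted: true };
}

// A leader 'a' that voted for itself in term 1.
const leader = { currentTerm: 1, votedFor: 'a', votedTerm: 1 };
const sameTerm = onRequestVote(leader, { term: 1, candidateId: 'b' });
const newerTerm = onRequestVote(leader, { term: 2, candidateId: 'b' });
console.log(sameTerm.voteGranted, newerTerm.voteGranted);
```

In the same term the old vote still blocks candidate 'b', but once a newer term is seen the stale vote is dropped and 'b' can be elected, which is what breaks the deadlock.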

During an election, if the result of a vote request is 'not granted'
with reason 'too soon', it means another node has seen the leader alive. This
is a hint that the candidate should convert to follower.
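
A minimal sketch of how a candidate might act on that hint is shown below. Only the 'too soon' reason comes from the description above; the reply shape (`voteGranted`, `reason`) and the candidate fields (`state`, `votes`, `majority`) are hypothetical.

```javascript
// Hypothetical sketch: process one reply to a vote request.
// The reply and candidate shapes are assumptions made for illustration.
function onVoteReply(candidate, reply) {
  // Ignore late replies once the node has left the candidate state.
  if (candidate.state !== 'candidate') {
    return candidate.state;
  }
  if (reply.voteGranted) {
    candidate.votes += 1;
    if (candidate.votes >= candidate.majority) {
      candidate.state = 'leader';
    }
  } else if (reply.reason === 'too soon') {
    // The peer has heard from a live leader, so stop campaigning.
    candidate.state = 'follower';
  }
  return candidate.state;
}

// The candidate starts with its own vote in a three-node cluster.
const candidate = { state: 'candidate', votes: 1, majority: 2 };
const afterReply = onVoteReply(candidate, { voteGranted: false, reason: 'too soon' });
console.log(afterReply);
```

Other denial reasons leave the node as a candidate in this sketch, so it keeps collecting replies until the election ends.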

When a node converts to follower, it may take some extra time
to set up the connection and the event handlers needed to receive an
initial heartbeat from the leader. The timeout could fire in the
meantime and move the node to candidate. If that node is behind in
the log, it can't be elected, which could leave it stuck as candidate
(or move it back to follower after the change mentioned above,
which in turn could result in a loop).

When a leader gets a timeout from a peer, it may be necessary to
reset the connection.

As the candidates first vote for themselves, when all nodes are
candidates and have the same term, the likely result of the
election is a draw. Converting to follower after a failed election
increases the chance that the next election won't result in a draw.

If a leader doesn't forget that it voted for itself, this can prevent
candidates from stepping in, as State.onRequestVote() [state.js] will always
deny the vote to any candidate other than the leader itself. As the
leader is not a candidate by definition, this could result in a
deadlock. The simplest solution that I found was to remember the
vote's term, so that if a newer term is seen, the node can forget its
previous vote.

During an election, if the result of a vote request is 'not granted'
with reason 'too soon', it means another node has seen the leader alive. This
is a hint that the candidate should convert to follower.

When a node converts to follower, it may take longer to set up the
connection and receive an initial heartbeat from the leader. That
could move the node to candidate, but if that node is behind in the
log, it can't be elected, which could leave it stuck as candidate (or
move it back to follower after the change mentioned above, which in
turn could result in a loop).

When a leader gets a timeout from a peer, it may be necessary to
reset the connection. I should probably find a generic solution that
could be extended to the other states, but this will do for now.

@lmalheiro closed this on Mar 18, 2016