Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

node not part of the cluster is not allowed to vote #477

Merged
merged 32 commits into from Mar 7, 2022

Conversation

dhiaayachi
Copy link
Contributor

@dhiaayachi dhiaayachi commented Sep 23, 2021

When a follower is removed from the cluster it's removed from the configuration servers list but still run a follower loop (originally a follower node or ShutdownOnRemove set to false for Leader). When that node timeout (because it's not receiving heartbeat from the other nodes in the cluster), it transition to a Candidate state and try to trigger an election. 2 possible cases:

  • the cluster is stable and have a leader: the vote request will be rejected
  • the cluster is unstable and don't have a leader: the vote is granted and the removed node will be part of the voting nodes.

This uncover 2 issue:

  • a removed node should not transition to a Candidate state. This is fixed in Forbid removed node from being a Candidate #476
  • a node that is not part of the cluster should not be allowed to vote: this a bit trickier to fix because the RequestVoteRequest only have the ServerAddress which do not represent a unique ID of the node, the one stored in the configuration server list. A test that reproduce the issue is in this PR.

To keep track of things in this PR here is a list of what I intend to achieve:

  • Add ID and Addr field in RPCHeader
  • Transition from using Candidate and Leader fields in Protocol Version 3 to using Addr (part of RPC header) field in protocol version 4
  • Use ID to validate that the node that is triggering an election is part of the latest configuration
  • Expose the Leader ID field as part of the API to be used in consul
  • Mixed ProtocolVersion Test

@dhiaayachi dhiaayachi changed the base branch from main to removed-node-vote September 23, 2021 19:43
@dhiaayachi dhiaayachi changed the title add test that check a removed node is not allowed to vote node not part of the cluster is not allowed to vote Sep 29, 2021
@dhiaayachi
Copy link
Contributor Author

@mkeeler I was thinking about this after our conversation today and I noticed few things:

  • Adding the ID as part of EncodePeer and DecodePeer would break compatibility because in the case of node that is version 4 talking to a node version 3, the node version 3 would not be able to decode the receiver peer correctly and fail, So we will need to make a 2 step upgrade process as discussed
  • I was thinking that we can add the ID as a separate field with a separate set of encoders decoders like it's currently the case in this PR, this will make the upgrade seamless and we still can survive with nodes running different versions (3 and 4)
  • I also thought that having the ID of the source node as part of any raft request/response is very useful (at least for 2 know cases that we have today) and instead of adding it as part of RequestVoteRequest only I can add it in RPCHeader so all the request/response benefit from it. We can encode it all the time and only use it with a protocolVersion check in all the needed cases (RequestVoteRequest , AppendEntriesRequest and InstallSnapshotRequest)

Let me know what you think about this?

@mkeeler
Copy link
Member

mkeeler commented Oct 5, 2021

So from our original discussions I was thinking the upgrade process would look like:

  1. All nodes are on some existing version at protocol 3. DecodePeer & EncodePeer only handle the raw server address.
  2. New nodes are added with the new version but set to use protocol 3. EncodePeer sends the server address form. DecodePeer can handle either the old form or some new struct that contains the id and address.
  3. Old nodes are removed.
  4. New nodes get a rolling restart and get switched to protocol 4. EncodePeer will now send both the id and address as the peer info instead of just the address. DecodePeer can still handle both forms though.

So with that sort of flow it wouldn't break compatibility. However I think I prefer your approach of adding the id to the RPCHeader as we are likely to want that information in other places in order to track the leaders id and make it available to the application running Raft.

I wonder whether we should add both the address and id of the peer server to the RPCHeader and have Raft proto 4 not encode the Candidate field of the RequestVoteRequest and the Leader field of the AppendEntriesRequest / InstallSnapshotRequest. What do you think?

@dhiaayachi
Copy link
Contributor Author

Thank you for taking a look at this @mkeeler

About the upgrade process, I was thinking the same. I just called it a breaking change from the perspective that it need a special upgrade flow (2 restarts).

About removing the Candidate and Leader fields I need to check if that bring any other implications, but I think the use of it is internal to raft and it's doable, I will update the PR and get back to you for a round of review.

Base automatically changed from removed-node-vote to main October 7, 2021 00:31
Copy link
Contributor

@dnephin dnephin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not all that familiar with this code but I had some high level questions/suggestion.

commands.go Outdated Show resolved Hide resolved
transport.go Outdated Show resolved Hide resolved
Copy link
Member

@mkeeler mkeeler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are already going back and adding some mixed protocol version tests which is great.

I suspect they will show some issues. I pointed out one place where I think there is a backwards compat concern but generally I think all the places where there is something like following probably needs to execute the code unconditionally instead of only if the proto isn't v4:

   // encode some older proto 3 specific field

Basically we should always encode the older Leader and Candidate fields even if the proto is v4 so that if we are communicating with v3 nodes they can parse out the information appropiately.

raft.go Outdated Show resolved Hide resolved
raft.go Outdated Show resolved Hide resolved
raft.go Outdated Show resolved Hide resolved
transport.go Outdated Show resolved Hide resolved
Copy link
Member

@mkeeler mkeeler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the main code looks good. There are a handful of comment fixups and two logging fixes suggested but nothing that would require further review of.

raft.go Outdated Show resolved Hide resolved
raft.go Outdated Show resolved Hide resolved
raft_test.go Outdated Show resolved Hide resolved
raft_test.go Outdated Show resolved Hide resolved
raft_test.go Outdated Show resolved Hide resolved
raft_test.go Outdated Show resolved Hide resolved
dhiaayachi and others added 2 commits November 24, 2021 14:42
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
Copy link
Contributor

@dnephin dnephin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I think this is looking good.

If I understand correctly, this PR does the following:

  1. Increments to protocol version to 4, to indicate that we have new data available in the RPCHeader
  2. Uses that new ID in the header to check if the voter is actually in the configuration before considering the vote.

That makes sense to me, and in general having the ID and addr of the server sending the message seems like it is useful.

A few questions, a few more style suggestions. I think the only blockers are the API backwards compat, and the one place we don't version check req.Addr.

But I do think if we can keep using req.Leader and req.Candiate when available, that might help keep the code easier to follow by reducing the number of places we check the protocol version.

I did not look at the tests much, hopefully Matt reviewed those more closely. Otherwise I can have a more detailed look.

api.go Outdated Show resolved Hide resolved
commands.go Show resolved Hide resolved
fuzzy/verifier.go Outdated Show resolved Hide resolved
net_transport.go Outdated Show resolved Hide resolved
raft.go Show resolved Hide resolved
raft.go Outdated Show resolved Hide resolved
raft.go Outdated Show resolved Hide resolved
net_transport_test.go Outdated Show resolved Hide resolved
raft.go Outdated Show resolved Hide resolved
Copy link
Contributor

@dnephin dnephin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about the Protocol version increase, are you sure we need to increase the version?

Could we not optionally check if that field is set, and if it is do the new logic? If it's not then nothing we can do about it. That would avoid the difficult upgrade scenario, and allow everyone on V3 to get the benefit of this new check.

Copy link
Contributor Author

@dhiaayachi dhiaayachi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for reviewing this @dnephin, see my comments in line.

I'm starting to think more that v4 is not necessary for this. Your point about using the new field and fallback to the old one when the new is empty seem reasonable to me and the fact that we always fill the old fields will ensure the compatibility with the nodes running a code version prior to this PR.

When I first did this with v4 version I was thinking that we will remove the Candidate and Leader field (or stop filling them in v4) but we ended up filling them all the time which ensure retro compatibility.

Testing the wire protocol compatibility is trivial and can easily be done the way you described.

What you think @mkeeler ?

api.go Outdated Show resolved Hide resolved
commands.go Show resolved Hide resolved
fuzzy/verifier.go Outdated Show resolved Hide resolved
net_transport.go Outdated Show resolved Hide resolved
raft.go Outdated Show resolved Hide resolved
raft.go Outdated Show resolved Hide resolved
raft.go Outdated Show resolved Hide resolved
@dhiaayachi
Copy link
Contributor Author

@dnephin, @mkeeler I did the changes to rely on the addr field being empty instead of incrementing the protocolVersion to 4. Let me know what you think?

Copy link
Contributor

@dnephin dnephin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! I definitely think this works better.

A couple comments about why we prefer the new field, and to document the existing fields accordingly.

I started to read over the tests a bit. I must admit some of the comments in the tests (which I think predate this PR) seem either misleading or confusing. I suspect I won't be able to review the test changes properly without spending a lot more time reading over Raft source first, but I trust Matt has done that review.

testing.go Outdated Show resolved Hide resolved
replication.go Outdated Show resolved Hide resolved
Comment on lines +49 to +53
if len(req.RPCHeader.Addr) > 0 {
ldr = string(req.RPCHeader.Addr)
} else {
ldr = string(req.Leader)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need to do this? Any reason not to continue to use req.Leader? Why do we prefer the new RPCHeader.Addr ?

If we are trying to phase it out, we should document that on the Leader field.

Same question about the other two places we look at Leader or Candidate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should aim to phaseout Leader and Candidate parameters, it make more sense to infer those values from the RPCHeader now and removing those will reduce the on the wire packet size sensibly. Also if we keep them it mean we need to document/test special cases behaviour (different values, Addr empty and Candidate not...).
I will add the deprecation comment for both Candidate and Leader

raft_test.go Outdated Show resolved Hide resolved
@dhiaayachi
Copy link
Contributor Author

@dnephin I will leave this open for a day or so otherwise I will merge it. I would like to start working on bringing this changes to Consul, I have a couple of fixes blocked on this.

Copy link
Contributor

@dnephin dnephin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants