Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Config.DisableProposalForwardingCallback for controlling message level proposal forwarding #88

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mitake
Copy link
Contributor

@mitake mitake commented Jul 24, 2023

This PR adds the new field DisableProposalForwardingCallback to the Config struct. The field is a pointer to callback function which takes raftpb.Message. A follower node calls this callback when a state machine tries to submit a MsgProp message. If the callback returns true based on the message, the follower node drops it and returns ErrProposalDropped.

If we pass a callback function which always returns true, the behavior of the Raft library will be same to the case of DisableProposalForwarding == true.

The motivation is described in #73

@ahrtr
Copy link
Member

ahrtr commented Jul 30, 2023

@mitake

  • How will you define the callback in etcd?
  • Providing such flexibility might be a little dangerous, because the raft's behavior/correction depends on application's code logic.

@mitake
Copy link
Contributor Author

mitake commented Aug 7, 2023

@ahrtr

How will you define the callback in etcd?

The callback will be defined like this: https://github.com/etcd-io/etcd/pull/16285/files#diff-2f800da9d8120de4693c10b4171d1824f7382f7854c32f0666eb9e0241479a8dR529

Providing such flexibility might be a little dangerous, because the raft's behavior/correction depends on application's code logic.

Technically the mechanism of MsgProp (forwarding proposal from a follower to a leader) isn't a part of Raft I think (the spec of Raft doesn't include this IIRC). It's an implementation specific to this raft library. Also the mechanism works at outside of the consensus protocol so it might be safe to have the callback mechanism. But I understand your concern and need to double check the Raft paper for making sure about it.

@mitake
Copy link
Contributor Author

mitake commented Aug 24, 2023

I checked the dissertation and section 6.2 Routing requests to the leader has descriptions related to this topic:

1. The first option, which we recommend and which LogCabin implements, is for the server to reject the 
request and return to the client the address of the leader, if known. This allows the client to reconnect to 
the leader directly, so future requests can proceed at full speed. It also takes very little additional code to 
implement, since clients already need to reconnect to a different server in the event of a leader failure.

2. Alternatively, the server can proxy the client’s request to the leader. This may be simpler in some cases. 
For example, if a client connects to any server for read requests (see Section 6.4), then proxying the 
client’s write requests would save the client from having to manage a dis- tinct connection to the leader 
used only for writes.

According to this description, MsgProp (corresponding to 2) is an optional mechanism for Raft. So DisableProposalForwarding can be provided by the library and it would be safe to have additional mechanism for dropping specific proposal messages. If clientv3 can have a mechanism to select a leader in client side like what clientv2 does: etcd-io/etcd#4030 it might be possible to turn on DisableProposalForwarding in etcd (not fully sure though)?

Leaders: A server might be in the leader state, but if it isn’t the current leader, it could be needlessly 
delaying client requests. For example, suppose a leader is partitioned from the rest of the cluster, but it 
can still communicate with a particular client. Without additional mechanism, it could delay a request 
from that client forever, being unable to replicate a log entry to any other servers. Meanwhile, there 
might be another leader of a newer term that is able to communicate with a majority of the cluster and 
would be able to commit the client’s request. Thus, a leader in Raft steps down if an election timeout 
elapses without a successful round of heartbeats to a majority of its cluster; this allows clients to retry 
their requests with another server.

I think this description is related to our lease revoking problem by a stale leader. In our case a client is etcd server itself, so a delayed lease revoke message will be forwarded to a leader and accepted.

Anyway I'll try this change with the failpoint based testing when I have time.

@mitake
Copy link
Contributor Author

mitake commented Aug 24, 2023

Related idea: I feel it might be good for controlling risks of new parameter of the Raft library by adding a new struct (e.g. Experimental) to Config, which collects such new parameters. Library users can know that parameters in Experimental are experimental thing and can be deprecated in future.

I think we have a few candidate of such parameters like etcd-io/etcd#7782
cc @chaochn47 you might be interested in this idea? It's great if I can have your opinion.

…level proposal forwarding

Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com>
@mitake mitake force-pushed the disable-proposal-forwarding-callback branch from 12b7737 to ddd5443 Compare October 23, 2023 07:33
@ahrtr
Copy link
Member

ahrtr commented Nov 9, 2023

A couple of points:

  • In network isolation status, a node may still consider itself as a leader but actually it might not be. It will still broadcast the leaseRevoke request, and eventually the request will be sent to other nodes when the network is recovered.
  • This solution can't resolve the second case "Leases are wrongly revoked by the new leader" as mentioned in the Root Cause section in doc

@ahrtr
Copy link
Member

ahrtr commented Nov 9, 2023

  • In network isolation status, a node may still consider itself as a leader but actually it might not be. It will still broadcast the leaseRevoke request, and eventually the request will be sent to other nodes when the network is recovered.

It seems not a problem, because it will be rejected by other nodes due to it's smaller term.

@mitake
Copy link
Contributor Author

mitake commented Nov 12, 2023

Sorry let me check the entire of your doc later and reply in a few days @ahrtr . Let me reply to the below comment:

It seems not a problem, because it will be rejected by other nodes due to it's smaller term.

For MsgApp of LeaseRevoke, this is true. But before that MsgProp of LeaseRevoke is sent from the stale leader (etcd layer thinks itself as a leader but raft layer isn't), and it cannot be rejected because MsgProp doesn't have term information and it will be just forwarded to a new leader. I think this behavior is quite complicated so would like to write a doc including diagrams.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants