Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

doc: add textual description of the raft.Apply process #511

Merged
merged 5 commits into from Jul 14, 2022
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
2 changes: 1 addition & 1 deletion commitment.go
Expand Up @@ -6,7 +6,7 @@ import (
)

// Commitment is used to advance the leader's commit index. The leader and
// replication goroutines report in newly written entries with Match(), and
// replication goroutines report in newly written entries with match(), and
// this notifies on commitCh when the commit index has advanced.
type commitment struct {
// protects matchIndexes and commitIndex
Expand Down
49 changes: 46 additions & 3 deletions docs/apply.md
@@ -1,7 +1,8 @@
# Raft Apply

Apply is the primary operation provided by raft.

Apply is the primary operation provided by raft. A client calls `raft.Apply` to apply
a command to the FSM. If no error is returned, the applied command is commited to a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be useful to make a distinction between committed and applied here?

committed == durably stored to disk
applied == applied to the fsm.

I am thinking maybe its worthwhile to define both terms so that the reader knows the two are not interchangeable.

quorum of the raft nodes.

This sequence diagram shows the steps involved in a `raft.Apply` operation. Each box
across the top is a separate thread. The name in the box identifies the state of the peer
Expand All @@ -23,7 +24,7 @@ sequenceDiagram
leadermain->>leadermain: store logs to disk

leadermain-)leaderreplicate: triggerCh
leaderreplicate-->>followermain: AppendEntries RPC
leaderreplicate-->>followermain: Transport.AppendEntries RPC

followermain->>followermain: store logs to disk

Expand All @@ -43,3 +44,45 @@ sequenceDiagram
end

```

Following is the description of each step as shown in the above diagram

1. The raft node handles the `raft.Apply` call by creating a new log entry and send the entry
to the `applyCh` channel.

2. If the node is not a leader, the method will return an error of `ErrNotLeader`. Otherwise,
the main loop of the leader node calls `raft.dispatchLogs` to write the log entry locally.

3. `raft.dispatchLogs` also sends a notification to the `f.triggerCh` of each follower (`map[ServerID]*followerReplication`) to start replicating log entries to the followers.

4. For each follower, the leader has started a long running routine (`replicate`) to
replicates log entries. On receiving a log entry to the `triggerCh`, the `replicate`
routine makes the `Transport.AppendEntries` RPC call to do the replicatioin.
huikang marked this conversation as resolved.
Show resolved Hide resolved

5. The follower which receives the `appendEntries` RPC calls invokes `raft.appendEntries` to handle
the request. It appends any new entries to the local log store.

6. In the same method on the follower as step 5, if the LeaderCommitIndex > this follower's
commitIndex, start `processLogs` to send all the committed entries that haven't been applied
to fsm (`fsmMutateCh <- batch`).

7. The peer applies the commited entries to the FSM.

8. If all went well, the follower responds success (`resp.Success = true`) to the
`appendEntries` RPC call.
Comment on lines +97 to +98
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not exactly sure where in this flow but do you think it would be good to point out the stepping nature of the processLogs within the append entries RPC.

One RPC will contain new logs entries but the LeaderCommitIndex will lag behind those log indexes. For example the following flow is likely

AppenEntries(Log: 1..5, LeaderCommitIndex: 0)
AppendEntries(Log: 6..8, LeaderCommitIndex: 4)
AppendEntries(Log: 9, LeaderCommitIndex: 8)
AppendEntries(Log: , LeaderCommitIndex: 9)

Sometimes append entries rpcs wont contain new logs but only be bumping the commit index. Only if a server is far enough behind the majority of all servers will it ever see logs with a leader commit index which is greater or equal to the logs index within the rpc.

This subtlety is important because when the first AppendEntries RPC from my example above returns, the follower will not have queued up any applies to the FSM.

Due to this behavior its possible that a very small window of time exists when all followers have committed the log to disk, the write has been realized in the FSM of the leader but the followers have not yet applied the log to their FSM.

Paul's replies in this issue are relevant: #508

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mkeeler , thanks for your suggestions. Totally agreed that it is worth to point out the details. Will send a new commit to update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @mkeeler , could you please take another look at the updated PR when you have a chance? Thanks.


9. On receiving the successful response from `Transport.AppendEntries`, the leader needs to
update the fsm based on the replicated log entries. Specifically, the leader finds the
highest log entry index that has been replicated to a quorum of the servers (
`if quorumMatchIndex > c.commitIndex`), update `commitIndex` to that index, and
notify through the `commitCh` channel.

10. The leader receives the notification on the `r.leaderState.commitCh` channel and starts
grouping the entries that can be applied to the fsm.

11. `processLogs` applies all the committed entries that haven't been applied by batching the log entries and forwarding them through the `fsmMutateCh` channel to fsm.

12. The actual place applying the commited log entries is in the main loop of `runFSM()`.

13. After the log entries that contains the client req are applied to the fsm, the fsm
module will set the reponses to the client request (`req.future.respond(nil)`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to close the loop on the whole process it might be worth noting that the future returned by raft.Apply should now be unblocked and calls to Error or Response should return the data at this point.