Command#Apply is passed inconsistent context information. #239

jrallison · 2014-10-22T15:52:18Z

Hey all,

I'm using raft for a project, and using context.CurrentIndex() in my command's Apply function for some bookkeeping.

I noticed, in different circumstances, the context.CurrentIndex() is inconsistent. For instance, committing 3 entries, in succession against a new 1-node cluster, results in the following contexts passed to the Apply function:

Term: 0 CurrentIndex: 3 CommitIndex: 3
Term: 0 CurrentIndex: 4 CommitIndex: 4
Term: 0 CurrentIndex: 5 CommitIndex: 5

Killing and restarting the node will force it to recover from the log. While recovering, we receive the following contexts as the node replays the committed entries:

Term: 0 CurrentIndex: 5 CommitIndex: 3
Term: 0 CurrentIndex: 5 CommitIndex: 4
Term: 0 CurrentIndex: 5 CommitIndex: 5

This is due to how the internalCurrentIndex log method is computed.

I've added a failing test, and adjusted the code to always return the index of the entry for context.CurrentIndex() (which is the current index we're applying to the state machine).

Let me know if anything here is unclear, or if I've misunderstood the meaning of context.CurrentIndex().

jrallison · 2014-10-22T16:01:03Z

Note that using the CommitIndex value isn't usable either. As a new node added to the cluster will receive the following information as it receives committed entries from the existing node.

Term: 0 CurrentIndex: 3 CommitIndex: 5
Term: 0 CurrentIndex: 4 CommitIndex: 5
Term: 0 CurrentIndex: 5 CommitIndex: 5

Which seems to make sense. The change in this pull request unifies the context information, so that it's consistent in all cases.

otoolep · 2014-10-23T17:02:29Z

I don't know for sure either, but it's not clear to me why you think this is wrong behaviour. Couldn't CurrentIndex be interpreted as the next place a log entry will be written, and that the recovery case you outline is an exceptional case where CurrentIndex is quite a bit ahead of where the next log message will be written? But once recovery is complete, everything is OK?

Does this behaviour prevent you from building what you need to build?

jrallison · 2014-10-23T17:27:54Z

Hey @otoolep, it is as I cannot safely know which commit I'm currently applying against my state machine. I'm using raft to manage consensus for some external data rather than just an in-memory state machine, and knowing whether or not I've already processed a given commit in the past is important for correctness.

At any given time, I'm applying a single commit to the state machine on a given node. With the current implementation, a specific entry can be applied (or re-applied in the case of a node restart, etc) with totally different CurrentIndex and CommitIndex values.

CommitIndex is the one that seems to be the next place a new log entry will be written from the perspective of a given node, if I understand correctly.

At the very least, this makes it difficult to program against, or use in any significant fashion, as there doesn't seem to be any firm definition of what to expect from CurrentIndex.

otoolep · 2014-10-23T17:36:31Z

Hang on -- why are you worrying if a particular commit has been applied in the past? Shouldn't a recovery commence with restoring the data source under that node from snapshot, and then applying any outstanding log entries? Is it not possible to wipe out the data source, and then apply a snapshot (or simply replay the entire log?

I know this doesn't directly address your original question, but I am not sure I follow why you have issue in the first place. Perhaps you can't snap-and-restore your external data source? I do exactly this with rqlite and it works fine. I am interested in learning more about your application, to see if I can help.

jrallison · 2014-10-23T18:17:44Z

We're managing consensus of insert ordering for a time-based stream of data that grows without bound, and doesn't fit in memory (think terabytes of data). We're using raft to manage the "tail" of the stream, so that each node can be confident that it is writing a consistent ordering of the data.

We do use snap-and-restore to essentially truncate the raft log at certain points. Since our data is immutable once written, at regular intervals we record the current index, take a snapshot which contains the current index applied, etc), and archive the block of data written since the last snapshot.

In the case of a new node added to the cluster, or a node being down for multiple snapshots, the nodes have ways to restore missing archived blocks outside of raft (think something like https://github.com/golang/groupcache).

The issue becomes, since we're marking the current index for these snapshots and we have no safe way of knowing which index we've just applied, we end up with the potential for duplicated data when a node is recovering, hence causing inconsistency between nodes of the underlying data.

If I'm thinking about this wrong, or using raft incorrectly, I'd love to hear your thoughts.

In our testing, simply being able to keep track of the last index we've applied to our underlying data is enough to prevent this duplication.

otoolep · 2014-10-24T00:57:29Z

OK, I think I follow you. In general, if a snapshot of your system does not contain the complete state, I think I see how you could encounter problems. I need to think about it a bit more to be sure.

But it's not 100% clear to me from the source if there is a bug. It may just be the way that variable is meant to work.

jrallison · 2014-10-24T18:06:21Z

Thanks for the feedback. Perhaps another approach would be to add a new context field with the... index of the current command being applied.

Would love to get a better understanding of the intended meaning of the CurrentIndex/CommitIndex information.

otoolep · 2014-10-24T19:01:29Z

Yeah, @jrallison I think enhancing the information passed to Apply() might be something worth trying.

Apply should be called with proper current index

6ebcfd5

jrallison changed the title ~~Command#Apply context is passed inconsistent information.~~ Command#Apply is passed inconsistent context information. Oct 22, 2014

philips mentioned this pull request Oct 27, 2014

README: clarify etcd's use of raft #241

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Command#Apply is passed inconsistent context information. #239

Command#Apply is passed inconsistent context information. #239

jrallison commented Oct 22, 2014

jrallison commented Oct 22, 2014

otoolep commented Oct 23, 2014

jrallison commented Oct 23, 2014

otoolep commented Oct 23, 2014

jrallison commented Oct 23, 2014

otoolep commented Oct 24, 2014

jrallison commented Oct 24, 2014

otoolep commented Oct 24, 2014

Command#Apply is passed inconsistent context information. #239

Are you sure you want to change the base?

Command#Apply is passed inconsistent context information. #239

Conversation

jrallison commented Oct 22, 2014

jrallison commented Oct 22, 2014

otoolep commented Oct 23, 2014

jrallison commented Oct 23, 2014

otoolep commented Oct 23, 2014

jrallison commented Oct 23, 2014

otoolep commented Oct 24, 2014

jrallison commented Oct 24, 2014

otoolep commented Oct 24, 2014