Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add last written position as metadata in snapshot #10115

Closed
Tracked by #9606
deepthidevaki opened this issue Aug 18, 2022 · 3 comments · Fixed by #10121
Closed
Tracked by #9606

Add last written position as metadata in snapshot #10115

deepthidevaki opened this issue Aug 18, 2022 · 3 comments · Fixed by #10121
Assignees
Labels
kind/toil Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc. version:8.1.0-alpha5 Marks an issue as being completely or in parts released in 8.1.0-alpha5 version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0

Comments

@deepthidevaki
Copy link
Contributor

Description

Ref:- #9906 (comment)

For backup manager to identify if a snapshot can be included in the snapshot, a snapshot must include the position of the followup event included in the snapshot as part of its metadata. This event is the event whose source position = lastProcessedPosition in the snapshot. Since it is not possible to find actual "lastProcessedPosition" in the snapshot without opening the database, we can approximate it by taking "lastWrittenPosition" used by the AsyncSnapshotDirector to commit the snapshot. This value might be an over approximation, but it is safe to use it.

@deepthidevaki deepthidevaki added the kind/toil Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc. label Aug 18, 2022
@deepthidevaki deepthidevaki self-assigned this Aug 18, 2022
@deepthidevaki
Copy link
Contributor Author

Here is what I did to achieve this:

  • Added a PersistedSnapshot::getMetadata which returns a SnapshotMetadata
  • SnapshotMetadata is written to a file named ZEEBE.METADATA in the snapshot path.
  • SnapshotMetadata is serialized as json. This way we can also add new fields later without breaking compatibility.
  • The metadata file is inside the snapshot. So it is replicated as a normal "SnapshotChunk". This way the replication protocol remains unchanged => backward compatible. On the receiver side, it can recognize the metadata file from its name and build a SnapshotMetadata object by reading the file.
  • For backward compatibility, we handle the cases where metadata file is not available by building an incomplete SnapshotMetadata with known values.
  • SnapshotMetadata tentatively consists of following
    • processedPosition (same as in snapshot id)
    • exportedPosition (same as in snapshotid)
    • lastFollowupEventPosition (provided by AsyncSnapshotDirector)
    • version (we have a version in PersistedSnapshot, but it is not used and not persisted. By persisting it in metadata we can use it later for backwards compatibity)

Snapshot directory content with metadata will be as follows

> tree data/raft-partition/partitions/1/snapshots/
data/raft-partition/partitions/1/snapshots/
├── 19-1-39-19
│   ├── 000008.log
│   ├── 000009.sst
│   ├── CURRENT
│   ├── MANIFEST-000005
│   ├── OPTIONS-000007
│   └── ZEEBE.METADATA
└── 19-1-39-19.checksum

@npepinpe
Copy link
Member

Looks sounds. The only point is the JSON serialization. It has nice properties (human readable, editable, etc.) but I'm not sure it solves any form of schema evolution. Sure, you can add fields, but you could do that with MsgPack, or Protobuf, or YAML, or even SBE.

As we aren't reading often from it, I think the benefits of human readability trumps performance, so I don't think SBE would be particularly beneficial here. But if we really want to tackle schema evolution, I don't think JSON alone is enough for that. If backwards/forwards compatibility is important, then we need to explicitly have some form of update test to guarantee it. Just my two cents.

@deepthidevaki
Copy link
Contributor Author

Human readability was definitely the main motivation to chose json. Also I think handling backward/forward compatibility is easy as long as we are only adding new fields by setting ignore unknown properties = true. I think, we can also remove fields because it is possible to deserialize a partial json object. Agree that we need tests to ensure compatibility.

@Zelldon Zelldon added the version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0 label Oct 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/toil Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc. version:8.1.0-alpha5 Marks an issue as being completely or in parts released in 8.1.0-alpha5 version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants