-
Notifications
You must be signed in to change notification settings - Fork 454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage/upsert: Use RocksDB merge operator during snapshot rehydration #27064
Conversation
f3d06a3
to
3def3f8
Compare
3def3f8
to
0d24d47
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couple structural comments, but looks good!
Surprisingly small amount of code, just a couple of fiddly parts!
…late a real 'size diff' for estimating working-set size
@guswynn think I addressed your feedback in the last 2 commits! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hellll ya
/// The function should return the new value for the key after merging all the updates. | ||
pub(crate) fn snapshot_merge_function<O>( | ||
_key: UpsertKey, | ||
updates: impl Iterator<Item = StateValue<O>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the real perf question is how expensive it is to decode all these StateValue
's into value_xor
Vec
's
worth a TODO
to supporting Cow<[u8]>
in StateValue
and seeing if we can use serde
borrow support to avoid these extra allocations: https://serde.rs/lifetimes.html#borrowing-data-in-a-derived-impl; that way each merge function instantiation costs a singular allocation (with some possible resizes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh row doesn't support this (yet), ill talk to parker!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, we should 100% wrap the Vec<u8>
buffer in Snapshotting
in https://docs.rs/serde_bytes/latest/serde_bytes/index.html
, thats a good start to reduce some cpu cycles
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@guswynn should I tackle that in this PR, or are you suggesting as a future optimization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its fine as a future optimization; if you see bad perf/high cpu usage in your testing its probably worth it though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not seeing an improvement in my perf testing so this seems like something we should do now, though I'm going to merge this PR as-is so the next review is a clean slate based around perf improvements, and this sits behind a feature flag for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good!
Motivation
Tips for reviewer
The diff isn't huge to review all as one, but you can also review commit-by-commit to understand the semantic changes if that's easier.
I also need to do some performance testing on staging to compare both rehydration methods.
Checklist
$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way), then it is tagged with aT-proto
label.