New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rocksdb: separate out the full and partial merge operators #11073
base: master
Are you sure you want to change the base?
Conversation
I submitted this as a very WIP: as I was working on this, I realized there aren't very big gains to be had here, so I figured I'd move onto more impactful things. But I also need to measure it… If anybody wants to take a look at the changes for sanity, please do. Otherwise this stays as a pending task. |
The previous code was extremely confusing and also not particularly efficient, for it was merging each pair of operators for each reference-counted value, which for refcount increments would contain the entire value of the data too. For now we only partially merge the refcount decrements only, but we should investigate if use of a partial merge operator is giving us any perf wins at all, given how we need to allocate 8 byte vectors for the reference counts all the time?
e4af612
to
cf0ff20
Compare
Indeed, the profiling shows extremely minor improvement here. I still think this is worth landing due to the clarity this introduced into the code, but this didn't turn out to improve much in terms of perf, unfortunately. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #11073 +/- ##
==========================================
- Coverage 71.14% 71.08% -0.06%
==========================================
Files 761 761
Lines 152857 153073 +216
Branches 152857 153073 +216
==========================================
+ Hits 108748 108814 +66
- Misses 39672 39821 +149
- Partials 4437 4438 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
@@ -112,27 +112,55 @@ pub(crate) fn encode_negative_refcount(rc: std::num::NonZeroU32) -> [u8; 8] { | |||
/// Assumes that all provided values with positive reference count have the same |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question about the rules above - why do we even need to keep track of rc < 0? Could operands be applied in different order where we first delete a (key, value) and add it later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's because originally this code was conflating the partial and full merges together. So a partial merge would run first to merge two adjacent operators, such as
- +1
- -2
to arrive at -1
which only then is applied to the refcount stored in the storage through a full merge (which might as well still be a positive refcount).
Now that this function is only handling full merges, the negative intermediate refcounts should not exist at all, otherwise there's a bug somewhere that's emitting these refcount changes.
match rc.cmp(&0) { | ||
Ordering::Less => rc.to_le_bytes().to_vec(), | ||
Ordering::Equal => Vec::new(), | ||
Ordering::Less | Ordering::Equal => Vec::new(), // "free" the data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it still possible for the rc to get negative in the full merge? I'm a bit scared to approve this as I don't understand why was this implemented the current way in the first place, with the negative rc
s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that should not happen (especially because of the condition above that sets rc = 0
if its negative), but cmp
needs us to handle Less
case somehow. We might as well put an assert in here, but that would only serve to crash the node when we least want it to -- in production.
/// will eventually be passed to the full merge operator instead. | ||
/// | ||
/// FIXME: verify if its actually beneficial to implement this at all -- this is returning | ||
/// newly allocated vectors of data just for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: unfinished comment?
/// Merge adds refcounts, zero refcount becomes empty value. | ||
/// Empty values get filtered by get methods, and removed by compaction. | ||
pub(crate) fn refcount_merge_partial( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the following scenario possible:
- add (key, value1) -- key has rc == 1
- compation with full merge
- del key
- add (key, value2)
- partial merge -> cumulative rc == 0 so value2 is lost
- full merge -> nothing happens and the value at key remains, incorrectly, value1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add
operation (which is the same as refcount increment) disables partial merges entirely (by returning a None
when rc > 0
.) Partial merge operator can only combine two adjacent (for the key) non-positive refcount change operations.
EDIT: Actually... that might not be true, and you might be right, I'll need to think about it. But one thing is that partial merge operator is only ever called with 2 adjacent operators.
I have a feeling that we should disable the partial merge entirely actually...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(If this scenario you describe is buggy in this PR, it would also be buggy in current implementation...)
The previous code was extremely confusing and also not particularly efficient, for it was merging each pair of operators for each reference-counted value, which for refcount increments would contain the entire value of the data too.
For now we only partially merge the refcount decrements only, but we should investigate if use of a partial merge operator is giving us any perf wins at all, given how we need to allocate 8 byte vectors for the reference counts all the time?