Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rocksdb: separate out the full and partial merge operators #11073

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

nagisa
Copy link
Collaborator

@nagisa nagisa commented Apr 15, 2024

The previous code was extremely confusing and also not particularly efficient, for it was merging each pair of operators for each reference-counted value, which for refcount increments would contain the entire value of the data too.

For now we only partially merge the refcount decrements only, but we should investigate if use of a partial merge operator is giving us any perf wins at all, given how we need to allocate 8 byte vectors for the reference counts all the time?

@nagisa
Copy link
Collaborator Author

nagisa commented Apr 15, 2024

I submitted this as a very WIP: as I was working on this, I realized there aren't very big gains to be had here, so I figured I'd move onto more impactful things. But I also need to measure it…

If anybody wants to take a look at the changes for sanity, please do. Otherwise this stays as a pending task.

The previous code was extremely confusing and also not particularly
efficient, for it was merging each pair of operators for each
reference-counted value, which for refcount increments would contain the
entire value of the data too.

For now we only partially merge the refcount decrements only, but we
should investigate if use of a partial merge operator is giving us any
perf wins at all, given how we need to allocate 8 byte vectors for the
reference counts all the time?
@nagisa nagisa force-pushed the rocksdb-refcount-improvements branch from e4af612 to cf0ff20 Compare April 16, 2024 08:02
@nagisa
Copy link
Collaborator Author

nagisa commented Apr 16, 2024

Indeed, the profiling shows extremely minor improvement here. I still think this is worth landing due to the clarity this introduced into the code, but this didn't turn out to improve much in terms of perf, unfortunately.

Copy link

codecov bot commented Apr 16, 2024

Codecov Report

Attention: Patch coverage is 98.52941% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 71.08%. Comparing base (4d506a7) to head (cf0ff20).
Report is 3 commits behind head on master.

Files Patch % Lines
core/store/src/db/refcount.rs 98.38% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #11073      +/-   ##
==========================================
- Coverage   71.14%   71.08%   -0.06%     
==========================================
  Files         761      761              
  Lines      152857   153073     +216     
  Branches   152857   153073     +216     
==========================================
+ Hits       108748   108814      +66     
- Misses      39672    39821     +149     
- Partials     4437     4438       +1     
Flag Coverage Δ
backward-compatibility 0.24% <0.00%> (-0.01%) ⬇️
db-migration 0.24% <0.00%> (-0.01%) ⬇️
genesis-check 1.43% <0.00%> (-0.01%) ⬇️
integration-tests 36.74% <55.88%> (-0.11%) ⬇️
linux 69.50% <98.52%> (-0.11%) ⬇️
linux-nightly 70.56% <98.52%> (-0.08%) ⬇️
macos 54.28% <97.05%> (+1.64%) ⬆️
pytests 1.66% <0.00%> (-0.01%) ⬇️
sanity-checks 1.44% <0.00%> (-0.01%) ⬇️
unittests 66.79% <97.05%> (-0.01%) ⬇️
upgradability 0.29% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@nagisa nagisa marked this pull request as ready for review April 16, 2024 08:40
@nagisa nagisa requested a review from a team as a code owner April 16, 2024 08:40
@nagisa nagisa requested a review from wacban April 16, 2024 08:40
@@ -112,27 +112,55 @@ pub(crate) fn encode_negative_refcount(rc: std::num::NonZeroU32) -> [u8; 8] {
/// Assumes that all provided values with positive reference count have the same
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question about the rules above - why do we even need to keep track of rc < 0? Could operands be applied in different order where we first delete a (key, value) and add it later?

Copy link
Collaborator Author

@nagisa nagisa Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's because originally this code was conflating the partial and full merges together. So a partial merge would run first to merge two adjacent operators, such as

  • +1
  • -2

to arrive at -1 which only then is applied to the refcount stored in the storage through a full merge (which might as well still be a positive refcount).

Now that this function is only handling full merges, the negative intermediate refcounts should not exist at all, otherwise there's a bug somewhere that's emitting these refcount changes.

match rc.cmp(&0) {
Ordering::Less => rc.to_le_bytes().to_vec(),
Ordering::Equal => Vec::new(),
Ordering::Less | Ordering::Equal => Vec::new(), // "free" the data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it still possible for the rc to get negative in the full merge? I'm a bit scared to approve this as I don't understand why was this implemented the current way in the first place, with the negative rcs.

Copy link
Collaborator Author

@nagisa nagisa Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that should not happen (especially because of the condition above that sets rc = 0 if its negative), but cmp needs us to handle Less case somehow. We might as well put an assert in here, but that would only serve to crash the node when we least want it to -- in production.

/// will eventually be passed to the full merge operator instead.
///
/// FIXME: verify if its actually beneficial to implement this at all -- this is returning
/// newly allocated vectors of data just for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: unfinished comment?

Comment on lines +182 to +184
/// Merge adds refcounts, zero refcount becomes empty value.
/// Empty values get filtered by get methods, and removed by compaction.
pub(crate) fn refcount_merge_partial(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the following scenario possible:

  • add (key, value1) -- key has rc == 1
  • compation with full merge
  • del key
  • add (key, value2)
  • partial merge -> cumulative rc == 0 so value2 is lost
  • full merge -> nothing happens and the value at key remains, incorrectly, value1

Copy link
Collaborator Author

@nagisa nagisa Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add operation (which is the same as refcount increment) disables partial merges entirely (by returning a None when rc > 0.) Partial merge operator can only combine two adjacent (for the key) non-positive refcount change operations.


EDIT: Actually... that might not be true, and you might be right, I'll need to think about it. But one thing is that partial merge operator is only ever called with 2 adjacent operators.

I have a feeling that we should disable the partial merge entirely actually...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(If this scenario you describe is buggy in this PR, it would also be buggy in current implementation...)

@nagisa nagisa marked this pull request as draft April 19, 2024 07:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants