rocksdb: separate out the full and partial merge operators #11073

nagisa · 2024-04-15T12:16:13Z

The previous code was extremely confusing and also not particularly efficient: it merged each pair of operators for each reference-counted value, and for refcount increments an operator also contains the entire value of the data.

For now we partially merge only the refcount decrements, but we should investigate whether using a partial merge operator gives us any perf wins at all, given that we need to allocate 8-byte vectors for the reference counts all the time.

nagisa · 2024-04-15T12:17:29Z

I submitted this as very much a WIP: as I was working on this, I realized there aren't very big gains to be had here, so I figured I'd move on to more impactful things. But I still need to measure it…

If anybody wants to take a look at the changes for sanity, please do. Otherwise this stays as a pending task.

nagisa · 2024-04-16T08:05:41Z

Indeed, the profiling shows extremely minor improvement here. I still think this is worth landing due to the clarity this introduced into the code, but this didn't turn out to improve much in terms of perf, unfortunately.

codecov · 2024-04-16T08:27:52Z

Codecov Report

Attention: Patch coverage is 98.52941%, with 1 line in your changes missing coverage. Please review.

Project coverage is 71.08%. Comparing base (4d506a7) to head (cf0ff20).
Report is 3 commits behind head on master.

Files	Patch %	Lines
core/store/src/db/refcount.rs	98.38%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #11073      +/-   ##
==========================================
- Coverage   71.14%   71.08%   -0.06%     
==========================================
  Files         761      761              
  Lines      152857   153073     +216     
  Branches   152857   153073     +216     
==========================================
+ Hits       108748   108814      +66     
- Misses      39672    39821     +149     
- Partials     4437     4438       +1

Flag	Coverage Δ
backward-compatibility	`0.24% <0.00%> (-0.01%)`	⬇️
db-migration	`0.24% <0.00%> (-0.01%)`	⬇️
genesis-check	`1.43% <0.00%> (-0.01%)`	⬇️
integration-tests	`36.74% <55.88%> (-0.11%)`	⬇️
linux	`69.50% <98.52%> (-0.11%)`	⬇️
linux-nightly	`70.56% <98.52%> (-0.08%)`	⬇️
macos	`54.28% <97.05%> (+1.64%)`	⬆️
pytests	`1.66% <0.00%> (-0.01%)`	⬇️
sanity-checks	`1.44% <0.00%> (-0.01%)`	⬇️
unittests	`66.79% <97.05%> (-0.01%)`	⬇️
upgradability	`0.29% <0.00%> (-0.01%)`	⬇️


wacban · 2024-04-17T08:35:12Z

core/store/src/db/refcount.rs

@@ -112,27 +112,55 @@ pub(crate) fn encode_negative_refcount(rc: std::num::NonZeroU32) -> [u8; 8] {
 /// Assumes that all provided values with positive reference count have the same


Question about the rules above: why do we even need to keep track of rc < 0? Could operands be applied in a different order, where we first delete a (key, value) and add it later?

I think that's because originally this code was conflating the partial and full merges together. So a partial merge would run first to merge two adjacent operators, such as

+1

-2

to arrive at -1, which is only then applied to the refcount in storage through a full merge (and the stored refcount may well still be positive).

Now that this function only handles full merges, negative intermediate refcounts should not exist at all; if they do, there is a bug somewhere that is emitting these refcount changes.
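
To make the +1 / -2 example concrete, here is a minimal sketch that models refcount changes as plain `i64` deltas. `partial_merge` and `full_merge` are hypothetical names used only for illustration; they are not the functions in refcount.rs, and the real operators work on encoded byte values rather than integers.

```rust
/// Hypothetical helper: fold two adjacent refcount deltas into one,
/// the way the old combined operator could do before touching storage.
fn partial_merge(left: i64, right: i64) -> i64 {
    left + right
}

/// Hypothetical helper: apply a (possibly pre-merged) delta to the refcount
/// already in storage. `None` stands for "the value is freed".
fn full_merge(stored: i64, delta: i64) -> Option<i64> {
    let rc = stored + delta;
    if rc > 0 {
        Some(rc)
    } else {
        None
    }
}
```

With these definitions, `partial_merge(1, -2)` yields the negative intermediate `-1`, while `full_merge(3, -1)` still ends on a positive refcount of 2. The negative number only ever exists between the two stages.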

wacban · 2024-04-17T08:36:58Z

core/store/src/db/refcount.rs

    match rc.cmp(&0) {
-        Ordering::Less => rc.to_le_bytes().to_vec(),
-        Ordering::Equal => Vec::new(),
+        Ordering::Less | Ordering::Equal => Vec::new(), // "free" the data.


Is it still possible for the rc to become negative in the full merge? I'm a bit scared to approve this, as I don't understand why this was implemented the current way in the first place, with the negative rcs.

I believe that should not happen (especially because of the condition above that sets rc = 0 if it's negative), but cmp needs us to handle the Less case somehow. We might as well put an assert in here, but that would only serve to crash the node when we least want it to -- in production.
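
For readers following along, a minimal sketch of what folding the `Less` arm into the `Equal` arm amounts to. `encode_merged` is a hypothetical stand-in, and the layout used for the positive case (payload followed by an 8-byte little-endian refcount) is an assumption of this sketch, since that arm is not part of the quoted diff.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for the tail of the full merge: serialize a payload
/// together with its final refcount. The trailing 8-byte little-endian
/// refcount is an assumption made for this sketch.
fn encode_merged(payload: &[u8], rc: i64) -> Vec<u8> {
    match rc.cmp(&0) {
        Ordering::Greater => {
            let mut out = Vec::with_capacity(payload.len() + 8);
            out.extend_from_slice(payload);
            out.extend_from_slice(&rc.to_le_bytes());
            out
        }
        // Zero means "free the data". A negative count is not expected in a
        // full merge, so it is handled the same way instead of panicking.
        Ordering::Less | Ordering::Equal => Vec::new(),
    }
}
```

The trade-off is the one described above: an unexpected negative count is silently treated as a free rather than crashing the node.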

wacban · 2024-04-17T08:37:21Z

core/store/src/db/refcount.rs

+/// will eventually be passed to the full merge operator instead.
+///
+/// FIXME: verify if its actually beneficial to implement this at all -- this is returning
+/// newly allocated vectors of data just for


nit: unfinished comment?

wacban · 2024-04-17T08:47:19Z

core/store/src/db/refcount.rs

+    /// Merge adds refcounts, zero refcount becomes empty value.
+    /// Empty values get filtered by get methods, and removed by compaction.
+    pub(crate) fn refcount_merge_partial(


Is the following scenario possible:

add (key, value1) -- key has rc == 1

compaction with full merge

del key

add (key, value2)

partial merge -> cumulative rc == 0 so value2 is lost

full merge -> nothing happens and the value at key remains, incorrectly, value1

The add operation (which is the same as a refcount increment) disables partial merges entirely (by returning None when rc > 0). The partial merge operator can only combine two adjacent (for the key) non-positive refcount change operations.

EDIT: Actually... that might not be true, and you might be right; I'll need to think about it. One thing, though, is that the partial merge operator is only ever called with two adjacent operators.

I have a feeling that we should disable the partial merge entirely actually...

(If this scenario you describe is buggy in this PR, it would also be buggy in current implementation...)
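
A minimal sketch of the rule as stated in the reply above, with deltas modeled as plain `i64` values. `partial_merge_deltas` is a hypothetical name, not the `refcount_merge_partial` from the diff, and the sketch only encodes the stated rule; it does not settle the doubt raised in the EDIT.

```rust
/// Hypothetical model of the rule described above: two adjacent refcount
/// deltas are combined only when neither of them is an increment.
fn partial_merge_deltas(left: i64, right: i64) -> Option<i64> {
    if left > 0 || right > 0 {
        // An increment carries the value bytes, so decline the partial merge
        // and leave both operands untouched for the full merge.
        return None;
    }
    // Both deltas are non-positive; `checked_add` returns None on overflow.
    left.checked_add(right)
}
```

Under this model, the `del key` / `add (key, value2)` pair from the scenario is `partial_merge_deltas(-1, 1)`, which declines to merge, so the two operands would reach the full merge separately. Two decrements, as in `partial_merge_deltas(-1, -2)`, collapse into a single `-3`.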

nagisa force-pushed the rocksdb-refcount-improvements branch from e4af612 to cf0ff20 Compare April 16, 2024 08:02

nagisa marked this pull request as ready for review April 16, 2024 08:40

nagisa requested a review from a team as a code owner April 16, 2024 08:40

nagisa requested a review from wacban April 16, 2024 08:40

wacban reviewed Apr 17, 2024


nagisa marked this pull request as draft April 19, 2024 07:41
