storage/upsert: Use RocksDB merge operator during snapshot rehydration #27064

Merged: 8 commits, May 16, 2024

Conversation

rjobanp
Contributor

@rjobanp rjobanp commented May 13, 2024

Motivation

Tips for reviewer

The diff is small enough to review all at once, but you can also review commit-by-commit to understand the semantic changes if that's easier.

I also need to do some performance testing on staging to compare both rehydration methods.

Checklist

@rjobanp rjobanp force-pushed the upsert-merge-operator branch 2 times, most recently from f3d06a3 to 3def3f8 May 13, 2024 22:14
@rjobanp rjobanp requested a review from guswynn May 14, 2024 13:58
@rjobanp rjobanp marked this pull request as ready for review May 14, 2024 13:59
@rjobanp rjobanp requested a review from a team as a code owner May 14, 2024 13:59
Contributor

@guswynn guswynn left a comment

couple structural comments, but looks good!

Surprisingly small amount of code, just a couple of fiddly parts!

src/storage-types/src/dyncfgs.rs (resolved)
src/storage/src/upsert/types.rs (outdated, resolved)
src/storage/src/upsert/types.rs (outdated, resolved)
src/storage/src/upsert/types.rs (outdated, resolved)
src/storage/src/upsert/types.rs (resolved)
Contributor Author

rjobanp commented May 15, 2024

@guswynn think I addressed your feedback in the last 2 commits!

Contributor

@guswynn guswynn left a comment

hellll ya

test/upsert/mzcompose.py (resolved)
src/rocksdb/src/lib.rs (outdated, resolved)
/// The function should return the new value for the key after merging all the updates.
pub(crate) fn snapshot_merge_function<O>(
_key: UpsertKey,
updates: impl Iterator<Item = StateValue<O>>,
Contributor

The real perf question is how expensive it is to decode all these StateValues into value_xor Vecs.

Worth a TODO to support Cow<[u8]> in StateValue and see if we can use serde's borrow support to avoid these extra allocations: https://serde.rs/lifetimes.html#borrowing-data-in-a-derived-impl. That way each merge function invocation costs a single allocation (with some possible resizes).
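
To make the allocation argument concrete, here is a minimal sketch using only the standard library. `SnapshotValue`, `decode_borrowed`, `xor_into`, and `merge` are hypothetical names invented for illustration; they are not the real `StateValue` or its serde-based decoding. The idea is that each operand borrows its bytes from the input buffer (what a `Cow<[u8]>` field with serde's `#[serde(borrow)]` attribute would allow), so the only heap allocation in a merge is the accumulator.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for a decoded snapshot value (not the real `StateValue`).
struct SnapshotValue<'a> {
    value_xor: Cow<'a, [u8]>,
}

/// "Decode" without copying: the result borrows directly from the input buffer.
fn decode_borrowed(raw: &[u8]) -> SnapshotValue<'_> {
    SnapshotValue { value_xor: Cow::Borrowed(raw) }
}

/// XOR `update` into `acc`, growing `acc` with zeros if the update is longer.
fn xor_into(acc: &mut Vec<u8>, update: &[u8]) {
    if acc.len() < update.len() {
        acc.resize(update.len(), 0);
    }
    for (a, u) in acc.iter_mut().zip(update) {
        *a ^= *u;
    }
}

/// Merge all operands: one accumulator allocation, plus possible resizes.
fn merge(operands: &[&[u8]]) -> Vec<u8> {
    let mut acc = Vec::new();
    for raw in operands {
        let decoded = decode_borrowed(raw);
        xor_into(&mut acc, &decoded.value_xor);
    }
    acc
}
```

Decoding each operand into an owned `Vec<u8>` instead would add one allocation and one copy per operand, which is the cost the comment above is asking about.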

Contributor

Oh, row doesn't support this (yet). I'll talk to Parker!

Contributor

Actually, we should 100% wrap the Vec<u8> buffer in Snapshotting in serde_bytes (https://docs.rs/serde_bytes/latest/serde_bytes/index.html); that's a good start to reduce some CPU cycles.

Contributor Author

@guswynn should I tackle that in this PR, or are you suggesting it as a future optimization?

Contributor

I think it's fine as a future optimization; if you see bad perf/high CPU usage in your testing, it's probably worth it though!

Contributor Author

I'm not seeing an improvement in my perf testing, so this seems like something we should do now. That said, I'm going to merge this PR as-is so the next review is a clean slate based around perf improvements; this sits behind a feature flag for now.

Contributor

sounds good!
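
For readers skimming this thread, the merge function under review takes every update for one key and returns the single value that results from merging them. The sketch below is one way such a fold could look. It is an illustration under stated assumptions, not the code in this PR. `SnapshotAcc`, its fields, `merge_update`, `snapshot_merge`, and `finish` are hypothetical names, and each update is assumed to be a `(value bytes, diff)` pair with a diff of +1 (insert) or -1 (retract).

```rust
/// Hypothetical accumulator for one key (field names are illustrative only).
#[derive(Debug, Default, Clone, PartialEq)]
struct SnapshotAcc {
    value_xor: Vec<u8>, // XOR of all value bytes seen so far
    len_sum: i64,       // sum of diff * value length
    diff_sum: i64,      // sum of diffs; 1 means "present", 0 means "absent"
}

impl SnapshotAcc {
    /// Fold one update in. Assumes `diff` is +1 or -1.
    fn merge_update(&mut self, value: &[u8], diff: i64) {
        self.diff_sum += diff;
        self.len_sum += diff * value.len() as i64;
        if self.value_xor.len() < value.len() {
            self.value_xor.resize(value.len(), 0);
        }
        for (acc, byte) in self.value_xor.iter_mut().zip(value) {
            *acc ^= *byte;
        }
    }

    /// Recover the surviving value, if any, once all updates are merged.
    fn finish(mut self) -> Option<Vec<u8>> {
        if self.diff_sum != 1 {
            // A real implementation would likely treat anything other
            // than 0 or 1 as an invariant violation; this sketch does not.
            return None;
        }
        // Retracted longer values leave trailing zero bytes; drop them.
        self.value_xor.truncate(self.len_sum as usize);
        Some(self.value_xor)
    }
}

/// Merge every update for one key into a single accumulator.
fn snapshot_merge(updates: impl Iterator<Item = (Vec<u8>, i64)>) -> SnapshotAcc {
    let mut acc = SnapshotAcc::default();
    for (value, diff) in updates {
        acc.merge_update(&value, diff);
    }
    acc
}
```

XOR and addition are both commutative and associative, so in this sketch the result does not depend on the order or grouping of the updates. An insert followed by a retraction of the same bytes cancels out.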

src/storage/src/upsert/types.rs (outdated, resolved)
@rjobanp rjobanp merged commit 2422f54 into MaterializeInc:main May 16, 2024
73 checks passed
@rjobanp rjobanp deleted the upsert-merge-operator branch May 16, 2024 20:33
Development

Successfully merging this pull request may close these issues.

storage/sources: improve rehydration time for upsert sources
2 participants