Merge: Avoid docstore stacking for small segments #1053

Closed
PSeitz opened this issue May 19, 2021 · 1 comment

@PSeitz
Contributor

PSeitz commented May 19, 2021

In the merge code, doc store stacking avoids re-compressing blocks by stacking the existing segments' blocks as-is, which is great, since it is much faster.
But there may be scenarios where we have many small committed segments. In that case, we should go for the slower block recreation until the segments are large enough to stack.

Example:
Segments 1..8 are to be merged. Doc store block sizes:

|1kb block|2kb block|1kb block|3kb block|1kb block|2kb block|1kb block|3kb block|

Currently we would carry them over

|1kb block|2kb block|1kb block|3kb block|1kb block|2kb block|1kb block|3kb block|

In this case we would want them to be merged into one block

|14kb block|
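
The arithmetic of the example above can be sketched as a greedy re-blocking pass. This is only an illustration, not the merge code itself: `coalesce_block_sizes` and the `target_kb` parameter are hypothetical names, and a real doc store would re-append decompressed documents rather than add up block sizes.

```rust
/// Hypothetical helper (not part of any real API): greedily packs the given
/// block sizes into new blocks, closing a block once it reaches `target_kb`.
/// Sizes are in kb, and documents are assumed never to be split.
fn coalesce_block_sizes(block_sizes_kb: &[usize], target_kb: usize) -> Vec<usize> {
    let mut merged = Vec::new();
    let mut current = 0;
    for &size in block_sizes_kb {
        current += size;
        // Close the block being built as soon as it is full.
        if current >= target_kb {
            merged.push(current);
            current = 0;
        }
    }
    // Flush the trailing, partially filled block, if any.
    if current > 0 {
        merged.push(current);
    }
    merged
}
```

With an assumed target of 16kb, `coalesce_block_sizes(&[1, 2, 1, 3, 1, 2, 1, 3], 16)` yields a single 14kb block, whereas stacking would keep all eight small blocks.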

Open question: what should the threshold be to start stacking?
e.g. when we have on average 5 full blocks per segment, we could start stacking.
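
A minimal sketch of that threshold check, assuming the merger knows how many full blocks each input segment holds. `DocStoreMergeStrategy`, `choose_strategy` and the constant are made-up names for illustration, and the value 5 is just the number floated above.

```rust
/// Hypothetical strategy switch for merging the doc store.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DocStoreMergeStrategy {
    /// Carry the existing compressed blocks over unchanged (fast).
    Stack,
    /// Decompress the documents and write fresh blocks (slower).
    Rebuild,
}

/// Assumed threshold: average number of full blocks per segment.
const MIN_AVG_FULL_BLOCKS_PER_SEGMENT: usize = 5;

/// Picks a strategy from the number of full blocks in each input segment.
fn choose_strategy(full_blocks_per_segment: &[usize]) -> DocStoreMergeStrategy {
    if full_blocks_per_segment.is_empty() {
        return DocStoreMergeStrategy::Rebuild;
    }
    let total: usize = full_blocks_per_segment.iter().sum();
    // average >= threshold  <=>  total >= threshold * segment count
    // (integer form, avoids floating point)
    if total >= MIN_AVG_FULL_BLOCKS_PER_SEGMENT * full_blocks_per_segment.len() {
        DocStoreMergeStrategy::Stack
    } else {
        DocStoreMergeStrategy::Rebuild
    }
}
```

In the eight-segment example no segment has a full block, so this check would pick `Rebuild`.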

Complete alternative: have a global cross-segment doc store.

@fulmicoton
Collaborator

That makes sense and 5 blocks is an ok threshold.
