This repository has been archived by the owner on Sep 8, 2018. It is now read-only.

Compacter writes too much data to disk #132

Open
dmitry-guryanov opened this issue Jun 29, 2018 · 1 comment


@dmitry-guryanov

Suppose you run ingeststore with default parameters (-store.segment-target-size is 128M) and receive 1 MB of log data every 4 seconds. At the beginning, the compacter finds two sequential 1 MB files and merges them into a 2 MB file; then it reads 2 MB + 1 MB and writes 3 MB, and so on. When the segment size gets close to 128 MB, the compacter writes more than 100 MB of data every 4 seconds, while the real amount of new log data is only 1 MB.
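The growth pattern described above can be approximated with a small model. This Go sketch is illustrative only: `totalCompactionWrites` is a hypothetical helper, not part of OK Log, and it assumes every incoming chunk is merged immediately into one growing segment that is rewritten in full on each merge. It ignores any other writes the compacter performs, so real totals can be higher.

```go
package main

import "fmt"

// totalCompactionWrites returns how many MB are written while a single
// segment grows to targetMB, assuming (hypothetically) that each new chunk
// of chunkMB is merged right away and the merged segment is rewritten whole.
func totalCompactionWrites(chunkMB, targetMB int) int {
	written := 0
	segment := chunkMB // the first chunk arrives; nothing to merge yet
	for segment < targetMB {
		segment += chunkMB // merge the next chunk into the segment
		written += segment // the merged segment is written out in full
	}
	return written
}

func main() {
	// 1 MB chunks, 128 MB target: 2 + 3 + ... + 128 MB of compaction writes.
	fmt.Println(totalCompactionWrites(1, 128)) // prints 8255
}
```

Under these assumptions the written volume grows quadratically with the target size, while the ingested volume grows only linearly.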

I've added a Prometheus counter for the number of bytes written by the compacter, and got the graph in the attachment. By the time the total size of the logs reaches 128 MB, the total amount of data written is more than 20 GB.

I think the compaction algorithm can be improved, or at least there should be a way to choose between slower queries (because of a larger number of segment files) and a higher write rate.

screenshot_2018-06-29_15-14-35

dmitry-guryanov added a commit to dmitry-guryanov/oklog that referenced this issue Jul 11, 2018
@dmitry-guryanov
Author

I think that instead of splitting compaction into two stages (compacting overlapping segments and compacting sequential segments), we can compact all segments at the same time, because mergeRecordsToLog handles any source segments correctly: they don't need to be overlapping, non-overlapping, or even sorted. mergeRecordsToLog also splits its output into segments, so that none of them is larger than the provided maximum size.

So I think we can just take all segments smaller than targetSize and let mergeRecordsToLog handle all of them. To avoid frequently rewriting the same data, the patch also requires that either the total size of the segments to compact is more than targetSize, or the number of files is not very small (>64, but maybe this should be configurable).
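The selection rule described above could be sketched roughly as follows. This is not the code from the patch: the `segment` type, `chooseSegments`, and the `maxFiles` parameter are hypothetical names introduced for illustration, and the sketch only decides which segments to hand over, leaving the actual merge to something like mergeRecordsToLog.

```go
package main

import "fmt"

const mb = int64(1 << 20)

// segment is a hypothetical stand-in for an on-disk segment file.
type segment struct {
	name string
	size int64 // size in bytes
}

// chooseSegments collects every segment smaller than targetSize and returns
// them only if compacting is worthwhile: their total size exceeds targetSize,
// or there are more than maxFiles of them. Otherwise it returns nil, so the
// small files are left alone until a later pass.
func chooseSegments(all []segment, targetSize int64, maxFiles int) []segment {
	var small []segment
	var total int64
	for _, s := range all {
		if s.size < targetSize {
			small = append(small, s)
			total += s.size
		}
	}
	if total > targetSize || len(small) > maxFiles {
		return small
	}
	return nil
}

func main() {
	// Ten 1 MB segments: too few and too small to be worth rewriting yet.
	var segs []segment
	for i := 0; i < 10; i++ {
		segs = append(segs, segment{name: fmt.Sprintf("seg-%d", i), size: mb})
	}
	fmt.Println(len(chooseSegments(segs, 128*mb, 64))) // prints 0

	// Three 60 MB segments: together they exceed the 128 MB target.
	big := []segment{{"a", 60 * mb}, {"b", 60 * mb}, {"c", 60 * mb}}
	fmt.Println(len(chooseSegments(big, 128*mb, 64))) // prints 3
}
```

With a rule like this, each byte is rewritten far fewer times than under merge-on-arrival, at the cost of keeping up to maxFiles small segments around for queries to read.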

dmitry-guryanov added a commit to dmitry-guryanov/oklog that referenced this issue Jul 12, 2018
dmitry-guryanov added a commit to dmitry-guryanov/oklog that referenced this issue Jul 13, 2018
The current compaction algorithm writes too much data to disk
compared to the number of bytes replicated.

This patch implements another algorithm, which keeps a number of
small files but compacts them all in one operation a bit later.

Fix for oklog#132
dmitry-guryanov added a commit to dmitry-guryanov/oklog that referenced this issue Jul 16, 2018