This repository has been archived by the owner on Sep 8, 2018. It is now read-only.
Compacter writes too much data to disk #132
Comments
dmitry-guryanov
added a commit
to dmitry-guryanov/oklog
that referenced
this issue
Jul 11, 2018
I think that instead of splitting compaction into two stages (compacting overlapping and compacting sequential segments), we can compact all segments at the same time, because mergeRecordsToLog handles any source segments correctly: they don't need to be overlapping or non-overlapping, or even sorted. mergeRecordsToLog also splits output segments so that they are no larger than the provided maximum size. So I think we can just take all segments smaller than targetSize and let mergeRecordsToLog handle all of them. To avoid frequently rewriting the same data, the patch also requires that the total size of the segments to compact exceeds targetSize, or that the number of files is not small (>64, but maybe this should be configurable).
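The selection rule described above can be sketched roughly as follows. This is a minimal illustration, not oklog's actual code: `segment`, `chooseCompactable`, and `maxSmallFiles` are hypothetical names, and only the size-based selection logic from the comment is modeled.

```go
package main

import "fmt"

// segment is a simplified stand-in for a store segment; only its
// size matters for the selection rule sketched here.
type segment struct {
	name string
	size int64
}

// chooseCompactable picks every segment smaller than targetSize, but
// only triggers a compaction when the combined size exceeds targetSize
// or the number of small files grows beyond maxSmallFiles (64 in the
// patch). Otherwise it returns nil, deferring the work to avoid
// rewriting the same bytes over and over.
func chooseCompactable(segs []segment, targetSize int64, maxSmallFiles int) []segment {
	var small []segment
	var total int64
	for _, s := range segs {
		if s.size < targetSize {
			small = append(small, s)
			total += s.size
		}
	}
	if total > targetSize || len(small) > maxSmallFiles {
		return small // compact all of them in one pass
	}
	return nil // too little data accumulated: wait
}

func main() {
	segs := []segment{{"a", 40 << 20}, {"b", 50 << 20}, {"c", 60 << 20}}
	picked := chooseCompactable(segs, 128<<20, 64)
	// combined 150 MB exceeds the 128 MB target, so all three are picked
	fmt.Println(len(picked))
}
```

All selected segments would then be handed to mergeRecordsToLog in a single pass, which already splits its output at the maximum segment size.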
dmitry-guryanov
added a commit
to dmitry-guryanov/oklog
that referenced
this issue
Jul 12, 2018
dmitry-guryanov
added a commit
to dmitry-guryanov/oklog
that referenced
this issue
Jul 13, 2018
The current compaction algorithm writes too much data to disk compared to the number of bytes replicated. This patch implements another algorithm, which keeps a number of small files and compacts them all in one operation a bit later. Fix for oklog#132
dmitry-guryanov
added a commit
to dmitry-guryanov/oklog
that referenced
this issue
Jul 16, 2018
The current compaction algorithm writes too much data to disk compared to the number of bytes replicated. This patch implements another algorithm, which keeps a number of small files and compacts them all in one operation a bit later. Fix for oklog#132
Suppose you run ingeststore with default parameters (-store.segment-target-size is 128M) and you receive 1 MB of log data every 4 seconds. At first the compacter will find two sequential files of 1 MB each and merge them into a 2 MB file, then it will read 2 MB + 1 MB and write 3 MB, etc. Once the segment size is close to 128 MB, it will write more than 100 MB of data every 4 seconds, while the real amount of new log data is only 1 MB.
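The write amplification this implies is quadratic in the target size. A minimal simulation of the merge pattern described above (assuming, as a simplification, that each 4-second step merges the whole existing segment with the new 1 MB file) shows how the totals add up:

```go
package main

import "fmt"

func main() {
	const step = int64(1) << 20     // 1 MB of new logs every 4 seconds
	const target = int64(128) << 20 // -store.segment-target-size default

	var segment, written int64
	for segment < target {
		segment += step    // new 1 MB file arrives
		written += segment // merge rewrites the whole segment to disk
	}
	// written = 1 + 2 + ... + 128 MB = 128*129/2 MB
	fmt.Printf("logs: %d MB, written: %d MB\n", segment>>20, written>>20)
}
```

Even this simplified model writes 8256 MB (about 8 GB) to produce a single 128 MB segment, i.e. roughly target²/2 bytes; the 20 GB observed below is higher still because the real compacter also merges overlapping segments and keeps running after the first segment fills.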
I've added a Prometheus counter for the number of bytes written by the compacter, and got the picture in the attachment: by the time the logs reach 128 MB, the total amount of data written is more than 20 GB.
I think the compaction algorithm can be improved, or at least there should be a choice between slower query time (due to a larger number of segment files) and a higher write rate.