Compacter writes too much data to disk #132

dmitry-guryanov · 2018-06-29T17:44:50Z

Suppose you run ingeststore with default parameters (-store.segment-target-size is 128M) and you get 1M of log data every 4 seconds. At the beginning the compacter will find two sequential files of size 1M and merge them into a file of size 2M, then it will read 2M + 1M and write 3M, etc. When the segment size gets close to 128M, it will write more than 100 MB of data every 4 seconds, while the real amount of log data is only 1 MB.
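
The growth described above can be sketched with a small simulation. This is an idealized model, not oklog code: the function name and the single-stream assumption are hypothetical, and it counts only the bytes of each rewritten merged segment, so it is not expected to reproduce a measured figure exactly.

```go
package main

import "fmt"

// simulateSequentialCompaction models a compacter that merges every newly
// arrived chunk into the current segment, rewriting the whole segment each
// time. All sizes are in MB. It returns the amount of log data ingested and
// the amount of data written by compaction once the segment reaches targetMB.
// Hypothetical helper for illustration only.
func simulateSequentialCompaction(chunkMB, targetMB int) (ingested, written int) {
	segment := chunkMB // the first chunk has nothing to merge with yet
	ingested = chunkMB
	for segment < targetMB {
		segment += chunkMB // merge the next chunk into the segment
		ingested += chunkMB
		written += segment // the merged segment is written out in full
	}
	return ingested, written
}

func main() {
	ingested, written := simulateSequentialCompaction(1, 128)
	fmt.Printf("ingested %d MB, compaction wrote %d MB (%.1fx)\n",
		ingested, written, float64(written)/float64(ingested))
	// prints: ingested 128 MB, compaction wrote 8255 MB (64.5x)
}
```

Even in this simplified model the compaction writes grow quadratically with the target segment size, while the ingested data grows only linearly.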

I've added a Prometheus counter for the number of bytes written by the compacter, and got the picture in the attachment. By the time the size of the logs reaches 128 MB, the total amount of data written is more than 20 GB.

I think the compaction algorithm can be improved, or at least there should be a way to choose between slower query times (because of a larger number of segment files) and a higher write rate.

Fix for oklog#132

dmitry-guryanov · 2018-07-11T21:05:52Z

I think that, instead of splitting compaction into two stages (compacting overlapping segments and then sequential segments), we can compact all segments at the same time, because mergeRecordsToLog handles any source segments correctly: they don't need to be either overlapping or non-overlapping (or even sorted). Also, mergeRecordsToLog splits output segments so that they are never larger than the provided maximum size.

So I think we can just take all segments smaller than targetSize and let the mergeRecordsToLog function handle all of them. To avoid frequently rewriting the same data, the patch also requires that either the total size of the segments to compact is more than targetSize or the number of files is not very small (>64, though maybe this should be configurable).
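
The selection rule described above might look roughly like the following. This is a sketch under assumptions, not the code from the patch: the `segment` type, the `pickForCompaction` function and the `minFiles` parameter are hypothetical names, and the real implementation works on oklog's own segment types before handing the result to mergeRecordsToLog.

```go
package main

import "fmt"

// segment is a hypothetical stand-in for a segment file on disk.
type segment struct {
	name string
	size int64 // bytes
}

// pickForCompaction returns every segment smaller than targetSize, but only
// if compacting them is worthwhile: either their combined size exceeds
// targetSize or there are more than minFiles of them. Otherwise it returns
// nil and the small segments are left alone until a later pass.
func pickForCompaction(segs []segment, targetSize int64, minFiles int) []segment {
	var candidates []segment
	var total int64
	for _, s := range segs {
		if s.size < targetSize {
			candidates = append(candidates, s)
			total += s.size
		}
	}
	if total > targetSize || len(candidates) > minFiles {
		return candidates
	}
	return nil
}

func main() {
	const mb = int64(1 << 20)
	var segs []segment
	for i := 0; i < 10; i++ {
		segs = append(segs, segment{name: fmt.Sprintf("seg-%d", i), size: mb})
	}
	// 10 MB in 10 files: below both thresholds, so nothing is compacted yet.
	fmt.Println(len(pickForCompaction(segs, 128*mb, 64))) // prints: 0
}
```

The trade-off is that small files accumulate for a while, which may slow queries, in exchange for rewriting each byte far fewer times.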

The current compaction algorithm writes too much data to disk compared to the number of bytes replicated. This patch implements another algorithm, which keeps a number of small files and compacts them all in one operation a bit later. Fix for oklog#132

dmitry-guryanov added a commit to dmitry-guryanov/oklog that referenced this issue Jul 11, 2018

another compation algorithm

0268236

Fix for oklog#132

dmitry-guryanov mentioned this issue Jul 11, 2018

another compaction algorithm #135

Open

dmitry-guryanov added a commit to dmitry-guryanov/oklog that referenced this issue Jul 12, 2018

another compation algorithm

2608b8f

Fix for oklog#132

denji mentioned this issue Dec 10, 2018

Add counter for number of bytes, written by compacter denji/oklog#3

Open
