Reuse byte buffer in WriteChunks and writeHash #653
Conversation
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Here are the benchmark results (the ns/op diff fluctuates; it's not always negative):
Rather than changing the interface, would it make sense to add the buffer to the Writer struct, and then initialize it to the correct size in NewWriter?
That is a good idea. I will make the change.
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Force-pushed from a371048 to b93ea11
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Force-pushed from b93ea11 to 41bacfe
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
if _, err := h.Write([]byte{byte(cm.Chunk.Encoding())}); err != nil {
// 'buf' byte slice is used to avoid allocating new byte slice. Assumes that len(buf) is 0.
func (cm *Meta) writeHash(h hash.Hash, buf []byte) error {
	buf = append(buf[:0], byte(cm.Chunk.Encoding()))
Hm.. are we trying to save one byte of allocation per op here? (: Also, not sure it really assumes that len(buf) is 0, right?
I forgot to remove the old comment (removed now); we don't assume len(buf) to be 0 now.
Hm.. are we trying to save here one byte of allocation per ops
Nope. Slice header = 2 bytes (?), plus 1 byte for the data. And this is called per series per chunk, so it would be millions of allocs at a large scale. This is the benchmark difference just for writeHash (the base contains the optimization in WriteChunks):
benchmark old allocs new allocs delta
BenchmarkCompaction/type=normal,blocks=2,series=1000000,samplesPerSeriesPerBlock=51-8 14825915 12829175 -13.47%
Apparently, in my benchmark, about 60-67% of the allocs savings is from writeHash and the rest from re-using the buffer in WriteChunks.
Wouldn't new slice headers be created when passing the slice around, so the only saving is the single byte? That said, I can see how this being in an inner loop could save a lot of GC work.
Wouldn't new slice headers be created when passing the slice around
Interesting, I didn't know/think about it!
Nope. Slice header = 2 bytes (?),
https://golang.org/pkg/reflect/#SliceHeader
I can see 3 integers, so it's definitely not 2 bytes; rather 24 on a 64-bit platform, as each of the three fields is 8 bytes (:
Shows that I was in a hurry 😅. I considered 1 byte per variable without much thought!
The savings in allocs with this is good anyway, so WDYT?
Yup 👍
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
@@ -97,6 +98,7 @@ type Writer struct {
	wbuf  *bufio.Writer
	n     int64
	crc32 hash.Hash
	buf   [binary.MaxVarintLen32]byte
A thought here: by moving this array onto the struct, it will now always be allocated on the heap, rather than having a chance of being allocated on the stack when it was in the function, correct? Could that actually make things less efficient overall?
Not sure if it matters, but I am a bit curious to have a brief discussion.
I think this might be a valid point: it was likely on the stack in the previous version, and thus a no-op for GC, quickly cleared when the function ends.
But then an allocation for those 5 bytes had to happen on every call instead of just once at creation time. Are those allocs even visible in the allocs column of mem benchmarks? Aren't those heap-only? Or is it just improving latency, @codesome? I would be interested in why we see a gain in the benchmarks here as well (:
chunks.Writer is created only once per compaction, so I would not worry much about this 5-byte array being allocated on either the heap or the stack (would it make a big difference?). This looked cleaner, hence I kept it like that.
It's LGTM from my side, thanks for the explanation.
Some questions around the second improvement, as @csmarchbanks mentioned. (:
LGTM from me as well with a nit and a question.
Also, I haven't done much optimisation work myself, but these changes honestly look like micro-optimisations to me.
func (cm *Meta) writeHash(h hash.Hash) error {
if _, err := h.Write([]byte{byte(cm.Chunk.Encoding())}); err != nil {
func (cm *Meta) writeHash(h hash.Hash, buf []byte) error {
	buf = append(buf[:0], byte(cm.Chunk.Encoding()))
Can we be consistent here and do buf[0] = byte(...), as we do here: https://github.com/prometheus/tsdb/pull/653/files#diff-ba6ef14901e90aea742f3fd2909d7f07R315?
As writeHash is not a part of chunks.Writer, I would not make any size assumptions about the passed buffer; it is safer to append. The one you pointed to is a chunks.Writer method and is assured of its size. (But if you think it is fine since it is all internal, I can make the change.)
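The trade-off described here can be seen in a standalone snippet: append(buf[:0], b) works for a buffer of any capacity, while buf[0] = b panics if the caller passes a zero-length slice.

```go
package main

import "fmt"

func main() {
	empty := []byte{} // len 0, cap 0: no size guarantee from the caller

	// An indexed write would panic here:
	//   empty[0] = 1 // runtime error: index out of range

	// Append is safe regardless of capacity; it grows the slice
	// only when needed and reuses the storage otherwise.
	empty = append(empty[:0], 1)
	fmt.Println(empty) // → [1]
}
```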
for i := range chks {
	chk := &chks[i]

	chk.Ref = seq | uint64(w.n)

	n := binary.PutUvarint(b[:], uint64(len(chk.Chunk.Bytes())))
	n := binary.PutUvarint(w.buf[:], uint64(len(chk.Chunk.Bytes())))
Curious what the difference between passing w.buf and w.buf[:] is...
w.buf is an array, but PutUvarint expects a slice; w.buf[:] converts the array into a slice.
Nice!
~19% savings in allocs, I would not call that a micro-opt :)
👍
LGTM as well!
@codesome I would also prefer to run a prombench test with all these before merging, to justify the additional complexity.
@krasi-georgiev given that I have merged multiple optimizations, do you want me to run prombench for each of them separately?
I would guess each one would be a bit of a headache, so maybe all at once.
I will start prombench with everything together on Monday, as I wouldn't be able to track it over the weekend.
This is a broken-down piece of #627.
[Don't merge until Prometheus 2.11 is out]