GODRIVER-2914 bsoncodec/bsonrw: eliminate encoding allocations #1323

Merged: 5 commits into mongodb:master on Sep 12, 2023

Conversation

@charlievieth (Contributor) commented Jul 15, 2023

GODRIVER-2914

Summary

This commit eliminates all allocations during marshalling (except for
the final []byte slice allocation, which is unavoidable). This matches
the behavior of encoding/json. It also prevents the vwPool from leaking
writers and includes a few small optimizations to the value_writer.

Other Changes:

  • bsoncodec: reduce use of reflect.Value.Interface()
  • bson: use a buffer pool instead of SliceWriter to eliminate 2 allocs (see the sketch below)
  • bsonrw: use references to slice indexes

This work builds off of #1313
and uses the BenchmarkCode* suite added by that PR.
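For context, the buffer-pool change referenced above follows the standard sync.Pool pattern. A minimal sketch of the idea (names like marshalWithPool and the encode parameter are illustrative, not the driver's actual API):

```go
package bson

import (
	"bytes"
	"sync"
)

// bufPool recycles encoding buffers across Marshal calls.
var bufPool = sync.Pool{
	New: func() interface{} {
		return new(bytes.Buffer)
	},
}

// marshalWithPool sketches the shape of the change: encode into a pooled
// buffer, then copy out the result. The final copy is the one unavoidable
// allocation mentioned in the summary.
func marshalWithPool(encode func(*bytes.Buffer) error) ([]byte, error) {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer bufPool.Put(buf)

	if err := encode(buf); err != nil {
		return nil, err
	}
	out := make([]byte, buf.Len())
	copy(out, buf.Bytes())
	return out, nil
}
```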

The combined performance improvement is below:

```
goos: darwin
goarch: arm64
pkg: go.mongodb.org/mongo-driver/bson
                      │ base.20.txt  │             new.20.txt              │
                      │    sec/op    │   sec/op     vs base                │
CodeUnmarshal/BSON-10    3.177m ± 1%   2.202m ± 1%  -30.68% (p=0.000 n=20)
CodeMarshal/BSON-10     2922.7µ ± 0%   757.1µ ± 2%  -74.10% (p=0.000 n=20)
geomean                  3.047m        1.291m       -57.63%

                      │ base.20.txt  │               new.20.txt               │
                      │     B/s      │      B/s       vs base                 │
CodeUnmarshal/BSON-10   582.5Mi ± 1%    840.4Mi ± 1%   +44.26% (p=0.000 n=20)
CodeMarshal/BSON-10     633.2Mi ± 0%   2444.3Mi ± 2%  +286.03% (p=0.000 n=20)
geomean                 607.3Mi         1.400Gi       +135.99%

                      │ base.20.txt  │              new.20.txt              │
                      │     B/op     │     B/op      vs base                │
CodeUnmarshal/BSON-10   4.219Mi ± 0%   4.148Mi ± 0%   -1.69% (p=0.000 n=20)
CodeMarshal/BSON-10     2.818Mi ± 3%   1.630Mi ± 0%  -42.16% (p=0.000 n=20)
geomean                 3.448Mi        2.600Mi       -24.59%

                      │  base.20.txt   │              new.20.txt              │
                      │   allocs/op    │  allocs/op   vs base                 │
CodeUnmarshal/BSON-10      230.4k ± 0%   220.7k ± 0%    -4.21% (p=0.000 n=20)
CodeMarshal/BSON-10     94066.000 ± 0%    1.000 ± 0%  -100.00% (p=0.000 n=20)
geomean                    147.2k         469.7        -99.68%
```

Background & Motivation

N/A

@esha-bhargava esha-bhargava requested review from qingyang-hu and a team July 17, 2023 19:39
@matthewdale matthewdale requested review from matthewdale and removed request for a team July 26, 2023 18:44
@matthewdale (Collaborator) left a comment:
More awesome improvements! A few suggested changes.

```diff
@@ -612,5 +611,21 @@ func (vw *valueWriter) writeLength() error {
 }

 func isValidCString(cs string) bool {
-	return !strings.ContainsRune(cs, '\x00')
+	return strings.IndexByte(cs, 0) == -1
```
Collaborator commented:
Optional: Consider adding a comment about why we can check individual bytes rather than runes.

Suggested change:

```diff
-	return strings.IndexByte(cs, 0) == -1
+	// Disallow the zero byte in a cstring because the zero byte is used as the
+	// terminating character. It's safe to check bytes instead of runes because
+	// all multibyte UTF-8 code points start with "11xxxxxx" or "10xxxxxx", so
+	// "00000000" will never be part of a multibyte UTF-8 code point.
+	return strings.IndexByte(cs, 0) == -1
```

Contributor Author replied:
The real reason is that '\x00' is a byte (it equals 0, not 00000000), and since it is less than utf8.RuneSelf (0x80), ContainsRune just calls IndexByte. This change simply omits an unnecessary function call, one that currently can't be inlined because its complexity score exceeds what is allowed for inlining.

Collaborator replied:
I think we're both saying the same thing. 0x80 (hex) == 10000000 (binary). I can update the comment with a reference to the corresponding code in strings.ContainsRune for additional clarity.
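For reference, a quick runnable demonstration of the fast path under discussion (illustrative; not part of the PR):

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

func main() {
	// '\x00' is 0, which is below utf8.RuneSelf (0x80), so
	// strings.ContainsRune delegates to a byte search internally; calling
	// strings.IndexByte directly just skips that extra call.
	const cs = "ab\x00cd"
	fmt.Println(rune('\x00') < utf8.RuneSelf)     // true: byte fast path applies
	fmt.Println(strings.ContainsRune(cs, '\x00')) // true: zero byte found
	fmt.Println(strings.IndexByte(cs, 0) == -1)   // false: cs is not a valid cstring
}
```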

Contributor Author replied:
Ah, I thought you meant 10000000 as hex, or a 32-bit rune, since that's what IndexRune takes.

```diff
 		if !isValidCString(key) {
 			return errors.New("BSON element key cannot contain null bytes")
 		}

-		vw.buf = bsoncore.AppendHeader(vw.buf, t, key)
+		vw.appendHeader(t, key)
 	case mValue:
 		// TODO: Do this with a cache of the first 1000 or so array keys.
```
Collaborator commented:
Optional: This TODO is unnecessary because strconv already does this for us. Consider removing it.

Suggested change (delete the line):

```diff
-	// TODO: Do this with a cache of the first 1000 or so array keys.
```
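For context, strconv.Itoa formats small integers by slicing a precomputed constant string, so small array indexes already format without allocating. A quick way to verify (illustrative; not part of the PR):

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

func main() {
	// AllocsPerRun reports the average allocations per call. On recent Go
	// versions this prints 0, because strconv.Itoa returns a slice of a
	// precomputed constant for values below 100.
	allocs := testing.AllocsPerRun(1000, func() {
		_ = strconv.Itoa(42)
	})
	fmt.Println("allocs per strconv.Itoa(42):", allocs)
}
```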

bson/marshal.go (outdated):

```diff
-	sw := new(bsonrw.SliceWriter)
-	*sw = dst
+	sw := bufPool.Get().(*bytes.Buffer)
+	defer bufPool.Put(sw)
```
Collaborator commented:
We should check the size and utilization of sw and only recycle it if it's not too large and not underutilized. E.g.:

```go
defer func() {
	if sw.Cap() < 16*1024*1024 && sw.Cap()/2 < sw.Len() {
		bufPool.Put(sw)
	}
}()
```

See the similar code in Operation.Execute for details:

```go
defer func() {
	// Proper usage of a sync.Pool requires each entry to have approximately the same memory
	// cost. To obtain this property when the stored type contains a variably-sized buffer,
	// we add a hard limit on the maximum buffer to place back in the pool. We limit the
	// size to 16MiB because that's the maximum wire message size supported by MongoDB.
	//
	// Comment copied from https://cs.opensource.google/go/go/+/refs/tags/go1.19:src/fmt/print.go;l=147
	//
	// Recycle byte slices that are smaller than 16MiB and at least half occupied.
	if c := cap(*wm); c < 16*1024*1024 && c/2 < len(*wm) {
		memoryPool.Put(wm)
	}
}()
```

```diff
@@ -794,6 +793,15 @@ func (vr *valueReader) readByte() (byte, error) {
 	return vr.d[vr.offset-1], nil
 }

+func (vr *valueReader) consumeCString() error {
```
Collaborator commented:
Optional: Consider the slightly more descriptive name skipCString.

qingyang-hu previously approved these changes Aug 28, 2023

matthewdale previously approved these changes Sep 5, 2023
@matthewdale (Collaborator) left a comment:
Looks good 👍

@qingyang-hu qingyang-hu changed the title bsoncodec/bsonrw: eliminate encoding allocations GODRIVER-2914 bsoncodec/bsonrw: eliminate encoding allocations Sep 5, 2023
```go
		bufPool.Put(sw)
	}
}()
```
Contributor Author commented:
This is normally best practice, but the Go standard library's encoding/json package does not perform this check when returning *encodeState to its pool (encoding/json/encode.go#L268-L279). I assume the reasoning is that a program that encodes large structs typically does so frequently. An example would be a user that stores large documents in their MongoDB database: most of the blobs will be of similar size, so if some documents are large, many of the others will be large as well.

If we do keep this logic, then we should probably only check the capacity, since I'm not sure what the sw.Cap()/2 < sw.Len() check buys us, and it may lead to thrashing when dealing with a mix of small and large BSON documents.
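For reference, the encoding/json pool logic cited above looks roughly like this (paraphrased and simplified from the go1.19-era source; not the driver's code):

```go
// encodeState wraps the output buffer used by json.Marshal.
type encodeState struct {
	bytes.Buffer // accumulated output
	// ... scratch fields elided ...
}

var encodeStatePool sync.Pool

func newEncodeState() *encodeState {
	if v := encodeStatePool.Get(); v != nil {
		e := v.(*encodeState)
		e.Reset()
		return e
	}
	return new(encodeState)
}

// Note: callers return the encodeState with encodeStatePool.Put(e)
// unconditionally; there is no size or utilization check.
```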

Collaborator replied:
Yep, the current encoding/json library doesn't do any pooled buffer size checking, which is actually the subject of golang/go#27735. We got the idea to measure buffer utilization from that issue, as a simple way to prevent holding onto a bunch of mostly unused memory.

Under ideal usage patterns, where messages are all the same size, the pooled memory stays mostly utilized. The pathological case is mostly small messages with a few very large messages: the pooled buffers can grow very large, holding onto a lot of memory that is underutilized by the mostly small messages. That happened for some customers after we added buffer pooling to the operation logic and led to GODRIVER-2677. The Go encoding/json library can currently suffer from the same memory pooling problem.

One possible solution is to keep "buckets" of buffers of different sizes, so large buffers are used for large messages and small buffers for small messages (see the sketch below). However, the implementation for that is relatively complex, so we decided to go with the simpler solution of setting a maximum buffer size and minimum utilization, inspired by a comment on that issue. The trade-off is holding onto less unused memory but possibly allocating memory more often for large messages.
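A minimal sketch of what such a size-bucketed pool could look like (hypothetical; not what the driver implemented):

```go
// Package bufpool sketches a size-bucketed buffer pool: one sync.Pool per
// size class.
package bufpool

import (
	"bytes"
	"sync"
)

// Size classes from 1 KiB up to 16 MiB (the max BSON document size).
var classes = [...]int{1 << 10, 64 << 10, 1 << 20, 16 << 20}

var pools [len(classes)]sync.Pool

// getBuffer returns a buffer from the smallest class that fits sizeHint,
// plus the class index needed to return it.
func getBuffer(sizeHint int) (*bytes.Buffer, int) {
	for i, c := range classes {
		if sizeHint <= c {
			if v := pools[i].Get(); v != nil {
				b := v.(*bytes.Buffer)
				b.Reset()
				return b, i
			}
			b := new(bytes.Buffer)
			b.Grow(c)
			return b, i
		}
	}
	return new(bytes.Buffer), -1 // oversized: never pooled
}

// putBuffer recycles a buffer into its class. A real implementation would
// also reclassify buffers that grew past their class while in use, one of
// the details that makes this approach more complex.
func putBuffer(b *bytes.Buffer, class int) {
	if class >= 0 {
		pools[class].Put(b)
	}
}
```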

Contributor Author replied:
Thank you for pointing out that issue. I think the easiest safe option is to increase the size limit of the buffers we'll pool to something like 32 MiB (2x the max size of BSON docs), or even 24 MiB, and remove the check that half the capacity is used. The motivation is that, because of how bytes.Buffer grows, a buffer with a length just over 8 MiB can easily have a capacity greater than 16 MiB.

We should also check whether the size is equal to or less than whatever limit we pick. That said, this isn't a big deal, since the difference will likely only be an allocation or two.
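In code, the simplified check proposed here would look something like the following (a sketch against the surrounding diff; the 32 MiB constant is the suggestion above, not necessarily the merged value):

```go
defer func() {
	// Recycle only buffers at or below the size cap. The utilization check
	// is dropped to avoid thrashing on workloads that mix small and large
	// BSON documents; the buffer is assumed to be Reset on Get.
	if sw.Cap() <= 32*1024*1024 {
		bufPool.Put(sw)
	}
}()
```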

@charlievieth (Contributor Author) commented Sep 5, 2023 via email
@matthewdale (Collaborator) left a comment:
Looks good 👍

@blink1073 blink1073 added this pull request to the merge queue Sep 12, 2023
Merged via the queue into mongodb:master with commit 5a5a231 Sep 12, 2023
20 of 21 checks passed
qingyang-hu added a commit that referenced this pull request Nov 8, 2023
Co-authored-by: Matt Dale <9760375+matthewdale@users.noreply.github.com>
Co-authored-by: Qingyang Hu <103950869+qingyang-hu@users.noreply.github.com>