wire: only borrow/return binaryFreeList buffers at the message level #1426

cfromknecht · 2019-05-01T21:06:49Z

This PR optimizes the wire packages serialization of small buffers by minimizing the number of borrow/return round trips during message serialization. Currently the wire package uses a binaryFreeList from which 8-byte buffers are borrowed and returned for the purpose of serializing small integers and varints.

Problem

To understand the problem, consider calling WriteVarInt on a number greater than 0xfc (which requires writing the discriminant and a 2, 4, or 8 byte value following).

For instance, writing 20,000 will invoke PutUint8 and then PutUint16. Expanding this out to examine the message passing, we see:

buffer <- freelist

PutUInt8(buffer, discriminant)
w.Write(buffer)

freelist <- buffer

buffer <- freelist

PutUint16(buffer, value)
w.Write(buffer)

freelist <- buffer

Each <- requires a channel select, which more-or-less bears the performance implication of a mutex. This cost, in addition to need to wake up other goroutines and switch executions, imparts a significant performance penalty. In the context of block serialization, several hundred thousand of these operations may be performed.

Solution

In our example above, we can improve this by only using two <-, one to borrow and one to return, as so:

buffer <- freelist

PutUInt8(buffer, discriminant)
w.Write(buffer)

PutUint16(buffer, value)
w.Write(buffer)

freelist <- buffer

As expected, cutting the number channels sends in half cuts also cuts the latency in half, which can be seen in the benchmarks below for larger VarInts,

The remainder of this PR is to propagate this pattern all the way up to the top level of messages in the wire package, such that deserializing a message only incurs one borrow and one return. Any subroutines are made to conditionally borrow from the binarySerializer if the invoker has not provided them with a buffer, and conditionally return if they indeed were required to borrow.

A good example of how these channel sends/receives can add up is in MsgTx serialization, which is now upwards of 80% faster as a result of these optimizations:

benchmark                         old ns/op     new ns/op     delta
BenchmarkSerializeTx-8            683           142           -79.21%
BenchmarkSerializeTxSmall-8       724           143           -80.25%
BenchmarkSerializeTxLarge-8       1476002       182111        -87.66%

Preliminary Benchmarks

benchmark                         old ns/op     new ns/op     delta
BenchmarkWriteVarInt1-8           71.9          74.6          +3.76%
BenchmarkWriteVarInt3-8           148           70.9          -52.09%
BenchmarkWriteVarInt5-8           149           71.0          -52.35%
BenchmarkWriteVarInt9-8           147           72.8          -50.48%
BenchmarkReadVarInt1-8            78.8          77.9          -1.14%
BenchmarkReadVarInt3-8            159           87.7          -44.84%
BenchmarkReadVarInt5-8            155           88.3          -43.03%
BenchmarkReadVarInt9-8            158           86.9          -45.00%
BenchmarkReadVarStr4-8            120           119           -0.83%
BenchmarkReadVarStr10-8           138           130           -5.80%
BenchmarkWriteVarStr4-8           101           105           +3.96%
BenchmarkWriteVarStr10-8          103           105           +1.94%
BenchmarkReadOutPoint-8           91.3          28.7          -68.57%
BenchmarkWriteOutPoint-8          78.9          10.2          -87.07%
BenchmarkReadTxOut-8              245           118           -51.84%
BenchmarkWriteTxOut-8             151           93.5          -38.08%
BenchmarkReadTxIn-8               338           139           -58.88%
BenchmarkWriteTxIn-8              238           31.0          -86.97%
BenchmarkDeserializeTxSmall-8     1119          586           -47.63%
BenchmarkDeserializeTxLarge-8     2476063       1275815       -48.47%
BenchmarkSerializeTx-8            683           142           -79.21%
BenchmarkSerializeTxSmall-8       724           143           -80.25%
BenchmarkSerializeTxLarge-8       1476002       182111        -87.66%
BenchmarkReadBlockHeader-8        406           69.1          -82.98%
BenchmarkWriteBlockHeader-8       431           21.1          -95.10%
BenchmarkDecodeGetHeaders-8       13537         12238         -9.60%
BenchmarkDecodeHeaders-8          1025275       236709        -76.91%
BenchmarkDecodeGetBlocks-8        13206         11684         -11.53%
BenchmarkDecodeAddr-8             337977        157519        -53.39%
BenchmarkDecodeInv-8              5990935       1898169       -68.32%
BenchmarkDecodeNotFound-8         6285831       1864701       -70.33%
BenchmarkDecodeMerkleBlock-8      4357          2606          -40.19%
BenchmarkTxHash-8                 1928          1222          -36.62%
BenchmarkDoubleHashB-8            993           969           -2.42%
BenchmarkDoubleHashH-8            932           1029          +10.41%

benchmark                         old allocs     new allocs     delta
BenchmarkWriteBlockHeader-8       4              0              -100.00%
// all others remain 0 alloc

benchmark                         old bytes     new bytes     delta
BenchmarkWriteBlockHeader-8       16            0             -100.00%
// all others remain unchanged

Notes

I'm still in the process of going through and adding benchmarks to top-level messages in order to guage the overall performance benefit, expect more to be added at a later point.

There are a few remaining messages which have not yet been optimized, e.g. MsgAlert, MsgVesion, etc. I plan to add those as well but decided to start with the ones that were more performance critical.

jakesylvestre · 2020-03-04T13:12:55Z

@jcvernaleo (as per #1530)

low priority
enhancement

By the way @cfromknecht this is awesome!

cfromknecht · 2021-02-03T17:29:25Z

FWIW I rebased this PR (locally) over #1684 and reran the CalcSighHash benchmarks.
Since calculating the legacy sighash involves serializing a copy of the original transaction,
optimizing our wire encoding speeds up the sighash calculation speed significantly.

This is a comparison of CalcSigHash before and after applying the wire optimizations
to the tokenizer branch:

benchmark                  old ns/op     new ns/op     delta
BenchmarkCalcSigHash-8     3619477       1758789       -51.41%

benchmark                  old allocs     new allocs     delta
BenchmarkCalcSigHash-8     801            712            -11.11%

benchmark                  old bytes     new bytes     delta
BenchmarkCalcSigHash-8     1293354       1290504       -0.22%

Using an 80% transaction serialization speedup to ballpark, this indicates roughly 2/3 of
the latency incurred byCalcSigHash is transaction serialization, and with this change it
becomes 1/4.

coveralls · 2021-02-06T11:18:52Z

Pull Request Test Coverage Report for Build 541264840

572 of 614 (93.16%) changed or added relevant lines in 22 files are covered.
45 unchanged lines in 4 files lost coverage.
Overall coverage increased (+0.3%) to 53.726%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
wire/msggetcfcheckpt.go	12	14	85.71%
wire/msgcfilter.go	14	18	77.78%
wire/msggetcfheaders.go	16	20	80.0%
wire/msggetcfilters.go	16	20	80.0%
wire/msgtx.go	94	99	94.95%
wire/msgcfcheckpt.go	14	22	63.64%
wire/msgcfheaders.go	18	33	54.55%

Files with Coverage Reduction	New Missed Lines	%
wire/msgtx.go	1	94.87%
peer/peer.go	2	76.18%
btcec/signature.go	6	91.64%
wire/common.go	36	91.91%

Totals
Change from base Build 535940167:	0.3%
Covered Lines:	21180
Relevant Lines:	39422

💛 - Coveralls

onyb · 2021-11-18T17:46:15Z

@cfromknecht Could you please rebase this PR, since #1769 landed recently?

otech47 · 2022-11-07T01:56:23Z

wire/msgtx.go

-	freeListMaxItems = 12500
+	freeListMaxItems = 125


Does the comment need to be updated to explain the new context for this value?

Roasbeef · 2023-12-16T00:42:26Z

Rebased with #2073

cfromknecht force-pushed the wire-opts branch from 8353a31 to 9442c96 Compare May 1, 2019 22:09

cfromknecht mentioned this pull request Aug 7, 2019

tlv: add library for new message/payload serialization format lightningnetwork/lnd#3061

Merged

cfromknecht force-pushed the wire-opts branch from 9442c96 to dacbb65 Compare January 25, 2020 03:47

jcvernaleo added enhancement low priority labels Mar 4, 2020

Roasbeef added IDB optimization IBD optimization and removed IDB optimization labels Feb 4, 2021

cfromknecht mentioned this pull request Feb 4, 2021

wire/common: Implement binarySerializer based on sync.Pool #1686

Open

cfromknecht added 19 commits February 5, 2021 10:31

wire/bench_test: report allocs in benchmarks

8d005ab

wire/bench: add witness block

92351be

wire/common: optimize Read/WriteVarInt

e766de6

wire: introduce Read/WriteVarIntBuf to reuse buffers between invocations

a6da5b3

wire/msgtx: use Read/WriteVarIntBuf in tx serialization

35ac17f

wire/msgtx: reuse tx-level buffer for version and locktime

904f63c

wire/common: add optimized Read/WriteVarBytesBuf

292e8f8

wire/msgtx: introduce optimized read/writeOutPointBuf

155d751

wire/msgtx: introduce optimized writeTxInBuf

d24e5c7

wire/msgtx: use writeTxInBuf in txn encoding

05b0ba0

wire/msgtx: introduce optimized readScriptBuf

ede651d

wire/msgtx: introduce optimized readTxInBuf

15d2e47

wire/msgtx: use readTxInBuf in txn serialization

1f8a45b

wire/msgtx: introduce optimized WriteTxOutBuf

84d80e0

wire/msgtx: use WriteTxOutBuf in txn serialization

ac4db9d

wire/msgtx: introduce optimized readTxOutBuf

080cad5

wire/msgtx: use readTxOutBuf in txn serialization

79976d0

wire/msgtx: introduce optimized writeTxWitnessBuf

56f1965

wire/msgtx: use writeTxWitnessBuf in txn serialization

11fe696

cfromknecht added 18 commits February 5, 2021 10:31

wiree/msggetcfilters: optimize serialization by reusing small buffers

6d3d4b2

wire/msgcfilter: optimize serialization by reusing small buffers

af9fdaa

wire/msgnotfound: optimize serialization by reusing small buffers

a1c1620

wire/invvect: remove unused readInvVect and writeInvVect

33c8877

wire/common: add optimized writeVarStrBuf an readVarStrBuf

6147d8b

wire/msgreject: optimize serialization by reusing small buffers

49aba92

wire/netaddress: add optimiezed read/writeNetAddressBuf

518fa70

wire/msgmerkleblock: optimize serialization by reusing small buffers

f4388ef

wire/msgping: remove usage for read/writeElement

a5affa7

wire/msgpong: remove usage of read/writeElement

08c24c9

wire/msgtx: remove unused writeTxWitness

b92f8e2

wire/msgtx: remove unused readTxOut

0b121ec

wire/msgtx: remove unused writeTxIn

055a6a8

wire/msgtx: remove unused readTxIn

0b67c05

wire/msgtx: remove unused readScript

49ac379

wire/msgtx: remove unused read/writeOutPoint

512571b

wire/msgtx: use tx-level script slab

e7ed275

wire/msgblock+msgtx: user block-level script slab

86ba486

cfromknecht force-pushed the wire-opts branch from dacbb65 to 86ba486 Compare February 5, 2021 18:32

onyb mentioned this pull request Feb 6, 2021

txscript: backport tokenizer from dcrd #1684

Closed

1 task

onyb added this to Pending reviews in btcd-dev Feb 9, 2021

Roasbeef mentioned this pull request Nov 4, 2022

Explore possible refactors or additional test cases in btcd/wire/msgtx.go #1915

Open

otech47 reviewed Nov 7, 2022

View reviewed changes

wire/msgtx.go

freeListMaxItems = 12500

freeListMaxItems = 125

Copy link

otech47 Nov 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the comment need to be updated to explain the new context for this value?

Roasbeef mentioned this pull request Aug 9, 2023

wire: cache the non-witness serialization of MsgTx to memoize part of… #1376

Closed

Roasbeef mentioned this pull request Dec 16, 2023

wire: only borrow/return binaryFreeList buffers at the message level #2073

Merged

Roasbeef closed this Dec 16, 2023

btcd-dev automation moved this from Pending reviews to Done Dec 16, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

wire: only borrow/return binaryFreeList buffers at the message level #1426

wire: only borrow/return binaryFreeList buffers at the message level #1426

cfromknecht commented May 1, 2019 •

edited

jakesylvestre commented Mar 4, 2020

cfromknecht commented Feb 3, 2021 •

edited

coveralls commented Feb 6, 2021

onyb commented Nov 18, 2021

otech47 Nov 7, 2022

Roasbeef commented Dec 16, 2023

wire: only borrow/return binaryFreeList buffers at the message level #1426

wire: only borrow/return binaryFreeList buffers at the message level #1426

Conversation

cfromknecht commented May 1, 2019 • edited

Problem

Solution

Preliminary Benchmarks

Notes

jakesylvestre commented Mar 4, 2020

cfromknecht commented Feb 3, 2021 • edited

coveralls commented Feb 6, 2021

Pull Request Test Coverage Report for Build 541264840

💛 - Coveralls

onyb commented Nov 18, 2021

otech47 Nov 7, 2022

Choose a reason for hiding this comment

Roasbeef commented Dec 16, 2023

cfromknecht commented May 1, 2019 •

edited

cfromknecht commented Feb 3, 2021 •

edited