Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wire: cache the non-witness serialization of MsgTx to memoize part of… #1376

Closed
wants to merge 4 commits into from

Conversation

Roasbeef
Copy link
Member

… TxHash

In this commit, we add a new field to the MsgTx struct:
cachedSeralizedNoWitness. As we decode the main transaction, we use an
io.TeeReader to copy over the non-witness bytes into this new field.
As a result, we can fully cache all tx serialization when computing the
TxHash. This has been shown to show up on profiles during IBD. Caching
this value allows us to optimize TxHash calculation across the entire
daemon as a whole.

@Roasbeef
Copy link
Member Author

There's a bug here in that it doesn't detect mutations in the underlying *wire.MsgTx so if the value is changed at all, then we don't update the serialization. In the current pipeline for block validation, this doesn't manifest, but some of our tests do this which introduces some of the new failures in Travis.

@jakesylvestre
Copy link
Collaborator

@jcvernaleo (as per #1530)

  • Low priority
  • Outdated (needs updates to be merged)

@Roasbeef
Copy link
Member Author

Here's a recent run late in the chain:

Screenshot 2023-03-13 at 4 40 00 PM

From this we can see that when checking signatures, we actually spend most of our time just encoding the tx again to compute the sighash.

Here's a flame graph for another view:

Screenshot 2023-03-13 at 4 42 30 PM

cc @kcalvinalvin

@kcalvinalvin
Copy link
Collaborator

I need a refresher on this but if memory serves right, doing away with the channel stuff in binaryFreeList and using a sync.Pool got rid of a big chunk of the time taken.

From this we can see that when checking signatures, we actually spend most of our time just encoding the tx again to compute the sighash.

But yeah if this is something that doesn't need to happen in the first place with memoization that'd be much better.

@Roasbeef
Copy link
Member Author

Roasbeef commented Aug 9, 2023

Recent observation of the impact here:
Screenshot 2023-08-09 at 4 45 09 PM

@Roasbeef
Copy link
Member Author

Roasbeef commented Aug 9, 2023

Related PR re binary free list: #1426

… TxHash

In this commit, we add a new field to the `MsgTx` struct:
`cachedSeralizedNoWitness`. As we decode the main transaction, we use an
`io.TeeReader` to copy over the non-witness bytes into this new field.
As a result, we can fully cache all tx serialization when computing the
TxHash. This has been shown to show up on profiles during IBD. Caching
this value allows us to optimize TxHash calculation across the entire
daemon as a whole.
@Roasbeef
Copy link
Member Author

Rebased!

@Roasbeef
Copy link
Member Author

Roasbeef commented Dec 5, 2023

Dug a bit, and I realize the issue is in the test itself, it mutates transactions, with stuff like replaceSpendScript. So what we need here (for the test's sake) is that we have a method to wipe the cached serialization if something is mutated. Alternatively, we have some internal map that uses the MsgTx as a value, which is consulted to see if the transaction has changed since last time.

@Roasbeef
Copy link
Member Author

Roasbeef commented Dec 5, 2023

Pushed a tx implementing the above, don't think it's ideal though...worried about weird edge cases where transaction generation code uses a tx as a scratch pad and makes multiple versions to sign, with this it'll give the wrong txid after the first instance.

Also added a commit that'll re-use the non-witness serialization for SerializeNoWitness.

Copy link
Collaborator

@Crypt-iQ Crypt-iQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked in both btcd and lnd to see if we modify wire.MsgTx after calculating the hash, but couldn't find any instances. The code looks good, but I think we need to be extra careful and see if there are any instances across our codebases

@@ -877,6 +877,10 @@ func (msg *MsgTx) Serialize(w io.Writer) error {
// Serialize, however even if the source transaction has inputs with witness
// data, the old serialization format will still be used.
func (msg *MsgTx) SerializeNoWitness(w io.Writer) error {
if msg.cachedSeralizedNoWitness != nil {
w.Write(msg.cachedSeralizedNoWitness)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing return?

// the rawTxTeeReader here, as these are segwit specific bytes.
var (
flag [1]byte
hasWitneess bool
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/hasWitneess/hasWitness

@guggero guggero self-requested a review December 8, 2023 19:09
Copy link
Collaborator

@guggero guggero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I like the elegance of using a io.TeeReader (TIL by the way), I'm not sure if this would break a bunch of things outside of btcd, namely in lnd and related projects.

What if we make the caching optional with an additional struct member boolean (or if memory footprint is a concern, use a bit within the Version field) that turns the caching on?
That way we could both benefit from the optimization within btcd while not breaking any assumptions for outside code.


// Write out the actual number of inputs as this won't be the very byte
// series after the versino of segwit transactions.
if WriteVarInt(&rawTxBuf, pver, count); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to add err := WriteVarInt().

}

// Write out the actual number of inputs as this won't be the very byte
// series after the versino of segwit transactions.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/versino/version

// this transaction without witness data. When we decode a transaction,
// we'll write out the non-witness bytes to this so we can quickly
// calculate the TxHash later if needed.
cachedSeralizedNoWitness []byte
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/cachedSeralizedNoWitness/cachedSerializedNoWitness/

// useful to be able to get the correct txid after mutating a transaction's
// state.
func (msg *MsgTx) WipeCache() {
msg.cachedSeralizedNoWitness = nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we also need to call this in the helper methods like AddTxIn() and AddTxOut()? Or is the assumption that TxHash() would in practice only be called once the transaction is fully built, so the cache doesn't need to be invalidated?

I'm mostly worrying about uses of MsgTx outside of btcd, where I'm not sure we can 100% guarantee that we're always using this pattern...

@Roasbeef
Copy link
Member Author

What if we make the caching optional with an additional struct member boolean (or if memory footprint is a concern, use a bit within the Version field) that turns the caching on?

Not a bad idea, agree that this could have a lot of unintended consequences. The other approach uses btcutil.Tx which gives another route to implement something like this.

@Roasbeef
Copy link
Member Author

Closing in favor of #2023, see this comment for some details: #2023 (comment)

TL;DR: I think we need to remove/optimize the binary free list instead.

@Roasbeef Roasbeef closed this Dec 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants