New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds Taproot BIP341 signature message and create a unified sighash cache for legacy, segwit and taproot inputs #628
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth having a single sighash cache for all output types. It is a breaking change, but it appears to be cleaner and could also be more efficient when we are spending segwitv0 and taproot outputs together(as implemented in bitcoin core).
let cache = SighashCache::new(&tx);
let legacy_hash = cache.sighash_legacy();
let segwitv0_sighash = cache.sighash_segwitv0(amt, script..);
let taphash = cache.sighash_taproot(&prevouts);
it is surely cleaner especially because it is handier to use (in case of mixed inputs it allows to instantiate only one sighash cache) but in what ways is it more efficient? The hashes are differents (both in order and also single instead of double) I agree we should do this, I am trying to understand if I am missing something |
a783cea
to
8098975
Compare
Yes, you can cache one round of sha256 computation. segwitv0 sighash uses double sha256. So, while calculating hashPrevouts for segwitv0 instead of hashing the prevouts again, we hash the taproot prevouts_sha256 hash once to obtain The code from bitcoin core
|
That being said, I am also okay if we decide that the code complexity of this particular optimization is not worth the benefit.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this :)
Code review ACK 8098975.
- My vote is for adding a new SigHashType for Taproot. Initially, it seemed a bit wasteful at first to have a new SigHashType for every new sighash update(we will have another enum for ANYPREVOUT). But I am convinced that this is the correct way to proceed.
- And a single cache for all signature hashes(legacy, segwitv0, and taproot) with a new name and optionally mark the current
bip143::SighashCache
as deprecated.
Thanks for the review!
Still TODO
apart from deprecation, latest commits implement the unified cache with legacy, segwitv0 and taproot |
The latest changes should include all feedback received (deprecation of I am going to wait for @sanket1729 feedback round, if mostly ok, I think I am going to squash the commits for easier review and remove draft state (probably 3 commits: add encodable to sha256::Hash, add sighash, deprecate bip143::sighash) |
src/util/bip341.rs
Outdated
}; | ||
self.common_cache = Some(cache); | ||
} | ||
self.common_cache.as_ref().unwrap() // safe to unwrap because we checked is_none() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not scared of unwraps
, but we can avoid it with get_or_insert_with
as done in bip143.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here and everywhere we call this unwrap() after is_none
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it for the following reasons:
- the
segwit_cache()
method is now calling thecommon_cache()
method that takes&self
, so it's not possible anymore to borrow and mutable borrow at the same time (before was possible because we were borrowing and mutable borrowing on different struct fields). common_cache()
andtaproot_cache()
could indeed useget_or_insert_with()
but it requires let bindings before the call that are done even in the cache hit phase, so I think it's a little better to avoid those (unless someone tells me they are optimized away by the compiler).- since it seems
segwit_cache()
cannot be implemented withget_or_insert_with()
, I thought to be coherent with other methods
Unrelated to the get_or_insert_with()
discussion, the method before was returning owned values, copying 32 bytes every time, while now it returns only a borrow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't believe there's performance loss. It is also possible to use get_or_insert_with
in segwit_cache
with some refactoring - see last commit of RCasatta#3
src/util/sighash.rs
Outdated
/// Encode the BIP143 signing data for any flag type into a given object implementing a | ||
/// [std::io::Write] trait. | ||
pub fn segwit_encode_signing_data_to<Write: io::Write>( | ||
&mut self, | ||
mut writer: Write, | ||
input_index: usize, | ||
script_code: &Script, | ||
value: u64, | ||
prevouts: &Prevouts, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to keep this as script_code
and u64
. In p2wsh, the script_code is not the script_pubkey
, but the witness_script
. So, this code would give incorrect results.
In case, the witness_script
contains OP_CODESEP
it best for the user to supply the script_code
parameter.
This is also consistent with how this is defined in bip141.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 -- note that we could also provide a (version with) a codesep argument if we don't want to have to clone the script, or provide a Script Slice (equivalent).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kept segwit method in line with existent in e76e5e1
@JeremyRubin it is already a borrowed Script?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
codesep in segwit requires a script slice -- who owns the script presently? If you don't have a codesep argument OR a script slice (and compute sep ext), then somewhere you're allocating extra.
src/util/bip341.rs
Outdated
fn get(&self, input_index: usize) -> Result<&TxOut, Error> { | ||
match self { | ||
Prevouts::Anyone(index, prevout) => { | ||
if input_index == *index { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BIP-118 doesn;'t commit to the input_index, so this might not be robust?
Maybe make get take an Option index and only check if it's provided
src/util/bip341.rs
Outdated
common_cache: Option<CommonCache>, | ||
|
||
/// Cache for taproot v1 inputs | ||
taproot_cache: Option<TaprootCache>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we might as well just always cache everything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As bitcoin core is doing it's more efficient to separate the cache this way so you can compute only what you need and reuse it according to the type of the inputs.
- for legacy: no cache is computed
- for taproot only: no segwit_cache is computed
- for segwit only: no taproot_cache is computed
Even if we don't have the same efficiency requirement as bitcoin core I think it's nice to have (and also @sanket1729 have a preference for that #628 (comment))
/// Hashtype of an input's signature, encoded in the last byte of the signature | ||
/// Fixed values so they can be casted as integer types for encoding | ||
#[derive(PartialEq, Eq, Debug, Copy, Clone)] | ||
pub enum SigHashType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be a 2 level enum where we have
enum SigHashType {
Legacy,
SegwitV0,
SegwitV1,
}
to make clear contexts where modes do not have/have different meanings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should just have different types of SigHashTypeLegacy
, SighashTypeSegwitv0
, `SigHashTypeTaproot.
In what use case would you need a level 2 enum?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's OK -- I mean more that there should not be a "SigHashType" general enum that From coerces into other SigHashType* things. Espeically once we get into BIP-118, it would switch into a TryFrom because certain SigHashTypes would not be proper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should just have different types of SigHashTypeLegacy, SighashTypeSegwitv0, SigHashTypeTaproot.
why different types for legacy and segwit?
Is it a very bad idea to add a RESERVED
variant to the sighash::SigHashType
introduced in this PR to accommodate SIGHASH_PREVOUT when it's time?
#[non_exhaustive]
would be better but it's not in our MSRV
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I think I misread your comment initially for something else. I am not against adding the reserved variant, but I don't know if how effectively it serves the same purpose as #[non-exhaustive]
.
I will let other contributors comment on this, I am not sure if we should so much forward-thinking about OP_ANYPREVOUT
right now. The current spec for BIP118 is itself a major change in PublicKey API because it introduces new key types. Maybe additionally doing this is that not that bad?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess if I can better specify my position, I think that that one should be able to make a "pure" function:
fn get_sighash(tx:&TransactionData, sighash: SigHashType) -> Hash
which computes the correct digest with no additional instructions. Since legacy, segwitv0, and segwitv1 all compute different things, unifying the type as one SigHashType
means that we need additional context, so
fn get_sighash_legacy(tx:&TransactionData, sighash: SigHashType) -> Hash
fn get_sighash_v0(tx:&TransactionData, sighash: SigHashType) -> Hash
fn get_sighash_v1(tx:&TransactionData, sighash: SigHashType) -> Hash
This is already sort of the case because v1 requires more context to sign, but I imagine the APIs will converge on having the superset of data available for any case (v1/v0/legacy). I think you in theory could parse what digest version is required from the TxData, but it'd be nice if SigHashType were an explicit & unique digest instruction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do agree this isn't a huge issue for now, just sounding the warning that BIP-118's plans do include diverging sighash types where there will be different things available at different times, and I think it is something we can handle relatively elegantly by being more explicit here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non_exhaustive
does not work on current MSRV
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code review ACK a37de1a.
I will review the tests carefully later, but I am super happy with the current API design. We can squash this and open it up for feedback from other contributors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about changing segwit_signature_hash()
to segwitv0_signature_hash()
?
a37de1a
to
877dc47
Compare
I squashed the commits, in comparison to the pre-squash situation I left legacy and tests in their place with a trick for legacy tests to tests also the new API code paths (while the segwit tests are left there since the I also added the proposed
I am not super in love with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tACK 877dc47. Ran tests, checked that tests have covered all paths of interest, all sighash types, annex, scriptpaths.
The only thing I am unsure about is the effectiveness of the Reserved
variant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some nits, but generally utACK 877dc47
…references, clarify legacy, non_exhaustive comment, remove std::
24d3113
to
c704ee7
Compare
Sorry @dr-orlovsky , @sgeisler , @sanket1729 for another dismissed review but the change proposed by @Kixunil is superior (especially since there were too many comments on why unwrapping before) and also the Since I think the above two changes were important I fixed also documentations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | ||
|
||
/// Encode the BIP143 signing data for any flag type into a given object implementing a | ||
/// [`std::io::Write`] trait. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
/// [`std::io::Write`] trait. | |
/// [`io::Write`] trait. |
as was specified above (to support no_std)
} | ||
|
||
/// Encode the legacy signing data for any flag type into a given object implementing a | ||
/// [`std::io::Write`] trait. Internally calls [`Transaction::encode_signing_data_to`] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ibid
common_cache: &'a mut Option<CommonCache>, | ||
tx: &R, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we passing these two arguments when we can just use self
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't because self.segwit_cache
may be borrowed during call to this function which prevents us from borrowing self
mutably. The name of the function is supposed to indicate it but maybe it's not clear enough? Have an idea for a better one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I see. Yes, I do not know any other workaround
for txin in tx.input.iter() { | ||
txin.previous_output | ||
.consensus_encode(&mut enc_prevouts) | ||
.unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think it will be better to have an expect
here
txin.previous_output | ||
.consensus_encode(&mut enc_prevouts) | ||
.unwrap(); | ||
txin.sequence.consensus_encode(&mut enc_sequences).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ibid
outputs: { | ||
let mut enc = sha256::Hash::engine(); | ||
for txout in tx.output.iter() { | ||
txout.consensus_encode(&mut enc).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ibid
common_cache: &'a mut Option<CommonCache>, | ||
tx: &R, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I see. Yes, I do not know any other workaround
@apoelstra we already have to ACKs here, but taking into account the importance of the code, probably it will be up to you to add the third ACK and merge the PR |
impl<R: DerefMut<Target=Transaction>> SigHashCache<R> { | ||
/// When the SigHashCache is initialized with a mutable reference to a transaction instead of a | ||
/// regular reference, this method is available to allow modification to the witnesses. | ||
/// | ||
/// This allows in-line signing such as | ||
/// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: comment belongs above line 163
@@ -263,13 +181,14 @@ impl<R: DerefMut<Target=Transaction>> SigHashCache<R> { | |||
/// } | |||
/// ``` | |||
pub fn access_witness(&mut self, input_index: usize) -> &mut Vec<Vec<u8>> { | |||
&mut self.tx.input[input_index].witness | |||
self.cache.witness_mut(input_index).unwrap() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe you can just define an empty vec in the Self type that gets returned if input_index is out of bound. It's better than the unchecked access, and perhaps better than adding an unwrap / panic.
definitely a nit though, as this is deprecated code anyways.
sequences: sha256::Hash, | ||
|
||
/// in theory, `outputs` could be `Option` since `NONE` and `SINGLE` doesn't need it, but since | ||
/// `ALL` is the mostly used variant by large, we don't bother |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the same comment is true for prevouts/sequences with ANYONECANPAY, no?
|
||
/// Values cached for segwit inputs, it's equal to [`CommonCache`] plus another round of `sha256` | ||
#[derive(Debug)] | ||
struct SegwitCache { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: SegwitV0Cache, taproot is a segwit too?
/// `One` variant allows to provide the single Prevout needed. It's useful for example | ||
/// when modifier `ANYONECANPAY` is provided, only prevout of the current input is needed. | ||
/// The first `usize` argument is the input index this [`TxOut`] is referring to. | ||
One(usize, &'u TxOut), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: in theory usize is redundant with &'u TxOut + Transaction as you can do the pointer math to compute where the TxOut lives in the out vector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this, what vector are you talkng about?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess i'm (incorrectly?) assuming that there is some backing &'u [TxOut]
that this txout is resident from which the lifetime derives, (where ones we don't have are default() nulls), but i suppose that we might just have a single TxOut fetched in some contexts and not want a whole array, or that having a ref to the array would prevent some clever uses of splits and whatnot on that storage to hand out mutable and immutable references simultaneously...
One(usize, &'u TxOut), | ||
/// When `ANYONECANPAY` is not provided, or the caller is handy giving all prevouts so the same | ||
/// variable can be used for multiple inputs. | ||
All(&'u [TxOut]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: a slice is the wrong type, we want a reference to the underlying storage to ensure we have the whole slice, not just partial.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I strongly disagree
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another way to make this API safe would be via a smart constructor. E.g.
pub enum Prevouts<'u> {
//...
All(SafeSliceAll<'u>)
}
use private::SafeSliceAll;
mod private {
// self.0 not pub so can only be constructed locally.
pub struct SafeSliceAll<'u>(&'u [TxOut]);
impl<'u> From<&'u Vec<TxOut>> for SafeSliceAll<'u> {
//....
}
}
Allowing to construct an All from a arbitrary slice is error prone IMO since you can't guarantee it's actually all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to use such a smart constructor type because pub enums require all fields of variants are pub iirc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why the user would be required to have the txouts all in a Vec
like this.
|
||
impl<'u> Prevouts<'u> { | ||
fn check_all(&self, tx: &Transaction) -> Result<(), Error> { | ||
if let Prevouts::All(prevouts) = self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: as noted earlier, a non-slice type would eliminate this check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't a check, it's a destructuring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah the check is the line below
CR-ack; some nits nothing needing fixing IMO |
/// 0x3: Sign the output whose index matches this input's index. If none exists, | ||
/// sign the hash `0000000000000000000000000000000000000000000000000000000000000001`. | ||
/// (This rule is probably an unintentional C++ism, but it's consensus so we have | ||
/// to follow it.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: we should update ths comment to say that this is only true for legacy input. For segwit (v0 and taproot), if you try to SINGLE-sign an output that doesn't exist it'll just fail.
src/util/sighash.rs
Outdated
// sha_single_output (32): the SHA256 of the corresponding output in CTxOut format. | ||
if sighash == SigHashType::Single { | ||
let mut enc = sha256::Hash::engine(); | ||
self.tx.output[input_index].consensus_encode(&mut enc)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to bounds-check this, either here or in check_index
above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah this is fixed in a later commit.
} else if sighash == LegacySigHashType::Single && input_index < self.tx.output.len() { | ||
let mut single_enc = SigHash::engine(); | ||
self.tx.output[input_index].consensus_encode(&mut single_enc)?; | ||
SigHash::from_engine(single_enc).consensus_encode(&mut writer)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should add an else if sighash == LegacySigHashType::Single && input_index >= self.tx.output.len()
which encodes the one hash
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adds https://github.com/bitcoin/bips/blob/master/bip-0341.mediawiki message signature algorithm
The base is taken from
bip143::SigHashCache
, some code results duplicated but I think it's more clear to keep things separatedWould mark some bullet point on #503
Test vectors are taken by running https://github.com/bitcoin/bitcoin/blob/d1e4c56309aeb73772e3a9d779a9c157024c9e1e/test/functional/feature_taproot.py with a modified
TaprootSignatureHash
function to print intermediate values that I cannot found in the bip341 test vector jsonUPDATE: Latest version includes the suggestion from @sanket1729 to create a unified tool for signature message hash for legacy, segwit, and taproot inputs. In particular, makes sense for mixed segwit v0 and taproot v1 inputs because cached values could be shared