Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Play around with hashes #2664

Draft
wants to merge 5 commits into
base: master
Choose a base branch
from
Draft

Conversation

tcharding
Copy link
Member

@tcharding tcharding commented Apr 5, 2024

There are a few problems with hashes, mainly:

I've played with hashes at least 5 times over the last 3 months and I always get in knots without making any progress. Today I went wikipedia to try and understand the high level functions so I could better understand the current code design. #2663 fell out of this too.

Can you review the last 2 patches please Andrew and tell me if they are even close to being correct and/or acceptable.

  • second to last is just docs in the top level readme
  • last patch is tree restructure, non-API-breaking, with my new high level understanding of all the modules.

The first 4 patches could go in but its not worth the bother till we release I rekon.

@github-actions github-actions bot added C-bitcoin PRs modifying the bitcoin crate C-hashes PRs modifying the hashes crate doc labels Apr 5, 2024
@tcharding
Copy link
Member Author

cc @apoelstra (here instead of mention in PR description).

hashes/README.md Outdated

### [SHA-1](https://en.wikipedia.org/wiki/SHA-1)

Since 2005, SHA-1 has not been considered secure against well-funded opponents
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In ce1881b:

I would either drop this line or qualify "secure" and "well-funded" much more precisely. This is quite a provocative line when applied to a hash used by git (2005) and bitcoin (2009)! (I checked the wikipedia page which has these exact words, so I guess I don't blame you....but wtf. They link to a blog post by Bruce Schneier which is slightly less provocative, which in turn cites an actual result, which is that SHA1 had 2^69 security against collisions in 2005.)

Since 2017 (SHAttered) sha1 has not been collision resistant even against poorly-funded components. I would suggest instead writing:

SHA-1 has seen a series of cryptographic breaks and should not be considered collision-resistant. While there are no practical weaknesses against its first- or second-preimage security, we do not recommend it for any modern application.

and we could link "should not be considered collision-resistant" to the wiki page or the SHAttered paper or press release or whatever. Or to the bitcoin transaction which claimed petertodd's "collide sha1 bounty" transaction :).

hashes/README.md Outdated
- SHA-256

Module: `sha256`
Digest: 256 bit (32 byte)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In ce1881b

These all look fine but a little hard to keep up-to-date. The digest length is in the name of every one of these hashes, and also we may rearrange our modules so that the Module: line goes out of date.

hashes/README.md Outdated
code](https://en.wikipedia.org/wiki/Message_authentication_code) (MAC) involving a cryptographic
hash function and a secret cryptographic key.

We implement HMAC-x where x is any of the hashes above.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In ce1881b:

We implement HMAC for any hash implementing our Hash trait. It is not limited to the hashes in this repository and not explicitly implemented for any of them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And do we want to keep it like that? A lot of the problems patching the Hash trait stem (in my experience) from the traits usage in Hmac.

@apoelstra
Copy link
Member

As for the rearranging of modules, this structure feels ad-hoc and undiscoverable to me. It's strange that we have HMAC under mac when the only MAC that we implement is HMAC and the only MAC that it makes sense for a hashing crate to implement is HMAC.

Then I don't understand why some hashes are "compound" and some are "hash functions". This sorta makes sense for hashes which are the composition of other hash functions, though this isn't an obvious way to categorize hash functions for either users or implementors...but then why is sha256t there for example?

@tcharding
Copy link
Member Author

tcharding commented Apr 5, 2024

Thanks for reviewing man.

These all look fine but a little hard to keep up-to-date.

Yeah, I think I agree with you.

As for the rearranging of modules, this structure feels ad-hoc and undiscoverable to me. It's strange that we have HMAC under mac when the only MAC that we implement is HMAC and the only MAC that it makes sense for a hashing crate to implement is HMAC.

wiki says the siphash family is only suitable for MAC - is that correct? https://en.wikipedia.org/wiki/SipHash

Then I don't understand why some hashes are "compound" and some are "hash functions". This sorta makes sense for hashes which are the composition of other hash functions, though this isn't an obvious way to categorize hash functions for either users or implementors...but then why is sha256t there for example?

Am I correct in thinking that the "compound" hashes are all Bitcoin things and they are constructs that use "primitive" cryptographic hash functions. I couldn't find any discussion of this outside of bip119 talking about tagged hashes. I made up the word "compound" and am not that happy with it. (I started with bitcoin/ and that seemed worse .)

@apoelstra
Copy link
Member

wiki says the siphash family is only suitable for MAC - is that correct? https://en.wikipedia.org/wiki/SipHash

It is not correct. It seems like wikipedia is just fatally confused about hash functions.

Am I correct in thinking that the "compound" hashes are all Bitcoin things and they are constructs that use "primitive" cryptographic hash functions.

Ok, but this is really an implementation detail whether a hash function is constructed as the composition of other hash functions or not. And tagged hashes are identical to sha256 except for their IV so I don't see why one would be "primitive" and the other not.

I couldn't find any discussion of this outside of bip119 talking about tagged hashes.

In the literature tagging hashes is referred to as "domain separation".

@tcharding
Copy link
Member Author

Cool, thanks man. I'll have another play with it.

@tcharding
Copy link
Member Author

Hey @apoelstra could you take a little squiz at this please? It does not include sha256t. The unit test changes are rough, just to get it all building.

Note please:

  • I wouldn't bother with the diff, I moved heaps of stuff around while working this out.
  • I've tried to re-work the HashEngine trait as you and @Kixunil discussed in Split Hash trait #2496 - let me know if I've got the wrong end of the stick.
  • I'm not sure about the new EngineMidstate trait or if I've got the trait bounds correct in Hmac - took me a lot of massaging to get Hmac to this state.
  • sha256::hash function - just an idea, can't remember when we talked last time about this sort of module level function.

I'm just chasing a quick "looks ok" or "miles off" before I try to work the tagged stuff in. Thanks man.

@apoelstra
Copy link
Member

Looks ok. Not sure why EngineMidstate is a separate trait from Engine.

Lots of small nits (e.g. the Bytes associated type could just be [u8; N] if it's really always supposed to be a byte array).

Not sure what you mean about the trait bounds in hmac.rs. They look pretty straightforward. Though you shouldn't have trait bounds on struct definitions if you can avoid it.

@tcharding
Copy link
Member Author

Sweet, thanks man.

In preparation for patching the `sha256t` module add a regression test
to make sure I don't break the current implementation.
The generics on `sha256t::Hash` make working with the `hashes` module
more complicated that it needs to be. The only reason for the tag is to
have a way to keep a compile time midstate constant for the sha256
engine used by the tagged hash. Note also that the `sha25t::Hash` type
is only ever used in test code, any user of the `hashes` lib (including
`bitcoin` creates a new type and uses that).

The same can be achieved by creating new engine type and putting the
const midstate in the `Default` implementation.
@coveralls
Copy link

coveralls commented Apr 22, 2024

Pull Request Test Coverage Report for Build 8793573668

Details

  • 97 of 142 (68.31%) changed or added relevant lines in 3 files are covered.
  • 5 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.07%) to 83.066%

Changes Missing Coverage Covered Lines Changed/Added Lines %
bitcoin/src/crypto/sighash.rs 4 5 80.0%
bitcoin/src/taproot/mod.rs 8 19 42.11%
hashes/src/sha256t.rs 85 118 72.03%
Files with Coverage Reduction New Missed Lines %
hashes/src/sha256t.rs 5 70.77%
Totals Coverage Status
Change from base Build 8793522744: -0.07%
Covered Lines: 19239
Relevant Lines: 23161

💛 - Coveralls

@tcharding
Copy link
Member Author

Interesting, the CI fail is because feature gating on a trait impl does not work. I tried to find where we discussed doing so before but all I found was a reference to the conversation (#2493 (comment)).

So, I remember someone (@Kixunil?) telling me not to feature gate trait impls but I can't remember why. Turns out it doesn't work anyway - I hit this today in this PR and yesterday also by coincidence while trying to implement io::Write behind a feature gate.

I'm posting all this because I do not know how to solve the problem of "implement this trait if this feature is enabled" which seems like a sane thing to do?

I used cargo expand to see that the trait is implemented but its only available in the same crate that has the implementation - I do not understand the mechanism behind this behaviour.

@apoelstra
Copy link
Member

I'm confused. CI failure looks like it's about schemars and some macro thing, not about trait impls.

What do you mean by "feature-gating a trait impl". Obviously we need to feature-gate impls of std::error::Error on std being present, for example,

@tcharding
Copy link
Member Author

Oh bizzare, yes you are right feature gating of std::error::Error works - I'll have to think a bit harder.

@tcharding
Copy link
Member Author

I'm confused. CI failure looks like it's about schemars and some macro thing, not about trait impls.

Apologies, I was assuming you could read the changes on my local machine :(

I've now pushed up the failure. The schemars test will fail with "the trait JsonSchema is not implemented for TestHash" but the sha256t_hash_newtype trait does indeed implement JsonSchema.

I've added a patch that shows that the exact same code from the failing test passes if it is in the same crate as the macro is defined i.e., in sha25t.rs. The std::error::Error thing I cannot explain.

@apoelstra
Copy link
Member

Ah! So, the issue here is that the #[cfg(feature = "schemars")] is in the output of the macro not in the macro definition itself. So you're feature-gating on the schemars feature of the schemars test crate, not on the schemars feature of bitcoin_hashes.

In principle what you're doing is fine, but you need to do it by conditionally defining the macro, which will be a little ugly. (Probably you need an aux macro which is conditionally compiled either as a schemars trait impl or as nothing.)

@tcharding
Copy link
Member Author

aaaahhhh! Legend, thanks man. I wish I raised this question a few days ago!!!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bitcoin PRs modifying the bitcoin crate C-hashes PRs modifying the hashes crate doc
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants