Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Take a sword to the hashes crate #2708

Closed

Conversation

tcharding
Copy link
Member

@tcharding tcharding commented Apr 23, 2024

This one is brutal. (Note, the 34,000 lines of change are just the API files which are not part of the PR.)

Context:

I have been hacking on the hashes crate a lot recently and, as I have complained before, found it to be way too entangled. Every time I try some change the twisted knot of macros and generics eventually chokes me.

Claims:

  1. The hash_newtype and sha256_hash_newtype macros should not live in hashes because the caller of these macros knows what they want from the new type, hashes has no way of knowing exactly what a caller wants.
  2. The sha256t module design, using the Tag trait is unnecessarily complex, the sole purpose is to provide an engine that uses a const midstate - the same can be achieved without using the trait with no loss of efficiency.

Caveat: I don't know if anyone is using the sha256t::Hash type directly, in bitcoin we do not - I didn't look anywhere else.

Review:

I've used the API checking script from #1880 and added patches as I went so you can see how the API is changing. Likely we will want to iterate, this PR gives as the foundation to be able to.

The last 4 patches are what I'd like to do, so that we can make the crate manageable - I can hear the screams of unnecessary API breakage already though. I rekon its worth it.

We would like to check for API changes during development and in CI so
that such changes can be discussed separately from the code. We have API
listings already.

Add tooling and documentation for creating API listings for the public
crates within the workspace (i.e., not `internals`).

Add API text files from an initial run of the script.
Add a job that runs the new script to check for changes to the public
APIs of `hashes` and `bitcoin`.
Add a `just` command to run the API checking script. Makes it more
discoverable.
Run `just check-api`, no other manual changes.
Currently we use `hashes::hash_newtype!` to create various types that
wrap hash types. While this works it makes the `hashes` crate less
maintainable and more prone to change because logic that we want to
control from `bitcoin` is in `hashes`. We want to stabalise the leaf
crates as quickly as possible, moving the `hash_newtype` logic to
`bitcoin` allows us to simplify `hashes` and put the code closer to
where it is used.

Note, this patch does not remove `hashes::hash_newtype`, it just adds a
bunch of additional macros to `bitcoin::internal_macros`.

Note on design:

Instead of a single enormous macro, we add multiple smaller macros. This
allows implementing just what is needed for a type and gives finer
control over what is implemented and what is in the public API. All at
the cost of being a bit more verbose.
Show what changed with after removing usage of `hashes::hash_newtype!`.
As we did for the `hashes::hash_newtype!` macro; replace the
`hashes::sha256t_hash_newtype!` macro with a collection of more simple
macros in the `bitcoin` crate.

This improves maintainability by giving putting control of the macro
logic in the crate that uses the macro. Also assists in stabalizing
`hashes` sooner because we can now remove the two macros from `hashes`.
The `hashes` crate is _hard_ to work on, every time one wants to change
something there are various macros that are far from trivial to
understand.

We would like to improve the `hashes` API so we can stabalize the crate.
In an effort to assist in doing so remove the two hash newtype macros.

This is obviously an API breaking change - is anyone outside of
`rust-bitcoin` using these macros, if so they are going to be annoyed
with us.

Remove the `hash_newtype!` and `sha256_hash_newtype` macros.
As can be seen from the macros in `bitcoin` we can fully support tagged
SHA256 hashes without needing a module in `hashes`.

In an effort to make the `hashes` crate easier to work on remove the
`sha256t` module all together.

This risks rugging and annoying users but either I need 10 more points
of IQ to work on the `hashes` crate or we have to do something to reduce
the complexity.
@tcharding tcharding marked this pull request as draft April 23, 2024 07:42
@github-actions github-actions bot added ci C-bitcoin PRs modifying the bitcoin crate C-hashes PRs modifying the hashes crate doc labels Apr 23, 2024
@coveralls
Copy link

Pull Request Test Coverage Report for Build 8796672835

Details

  • 140 of 161 (86.96%) changed or added relevant lines in 14 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+0.02%) to 83.151%

Changes Missing Coverage Covered Lines Changed/Added Lines %
bitcoin/src/internal_macros.rs 64 85 75.29%
Files with Coverage Reduction New Missed Lines %
bitcoin/src/internal_macros.rs 1 70.27%
Totals Coverage Status
Change from base Build 8793522744: 0.02%
Covered Lines: 19118
Relevant Lines: 22992

💛 - Coveralls

@apoelstra
Copy link
Member

  • The hash_newtype and sha256_hash_newtype macros should not live in hashes because the caller of these macros knows what they want from the new type, hashes has no way of knowing exactly what a caller wants.

Where should it live then? It seems weird that a macro whose purpose is to define a new hash type, providing all the utility methods that a hash type needs, wouldn't live in the hashes crate.

  • The sha256t module design, using the Tag trait is unnecessarily complex, the sole purpose is to provide an engine that uses a const midstate - the same can be achieved without using the trait with no loss of efficiency.

The goal is that users cannot mix up different kinds of tagged hashes. Maybe there is a different way to achieve this (e.g. by extending the hash_newtype macro :P). But conflating all tagged hashes in one type is definitely a regression.

@tcharding
Copy link
Member Author

tcharding commented Apr 23, 2024

But conflating all tagged hashes in one type is definitely a regression.

Maybey I explained it badly, that is not what this PR does, each tagged hash is a new type. Its the sha256t::Hash type that goes away. I.e, TapSighash is a type that returns an engine pre-tagged ready to go. One foot gun I was thinking about is that one could pass a "random" sha256::HashEngine into TapSighash::from_engine but that seems unlikely/silly.

@tcharding
Copy link
Member Author

tcharding commented Apr 23, 2024

A few more notes:

  • The new macros are not of the "call site looks like normal code" form because that makes writing the macro way harder, can be done before merging.
  • The new macros are more verbose but give finer control
  • The main way to review this is via the API file patches

After sleeeping on this I'm still pretty psyched about this PR, I rekon it buys us a lot.

popd > /dev/null
}

# Check if there are changes (dirty git index) to the `api/` directory.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In b936137:

trailing whitespace

@apoelstra
Copy link
Member

Looks like the first 4 commits just do the initial API and can be squashed.

@@ -154,7 +154,7 @@ impl fmt::Display for AddressInner {
NetworkKind::Main => PUBKEY_ADDRESS_PREFIX_MAIN,
NetworkKind::Test => PUBKEY_ADDRESS_PREFIX_TEST,
};
prefixed[1..].copy_from_slice(&hash[..]);
prefixed[1..].copy_from_slice(hash.as_byte_array());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In dd06be0:

It would be clearer if these changes (using as_byte_array throughout) were done in their own commit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -468,7 +468,7 @@ mod test {
{
// test serialization
let raw: Vec<u8> = serialize(&BlockTransactionsRequest {
block_hash: Hash::all_zeros(),
block_hash: BlockHash::all_zeros(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In dd06be0:

As well as these (or they can join the as_byte_array commit) -- replacing Hash with concrete types.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hex::fmt_hex_exact!(f, $len, self.as_byte_array().iter().rev(), case)
}
$crate::DisplayHash::Forwards => {
hex::fmt_hex_exact!(f, $len, self.as_byte_array().iter(), case)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In dd06be0:

I think it'd be clearer, and less code and less API, to drop the DisplayHash type and just use a magic constant in the macro invocation, and to have separate impls here (depending on the magic constant) rather than matching.

@@ -112,6 +112,7 @@ fn from_engine<T: Tag>(e: sha256::HashEngine) -> Hash<T> {
///
/// [`hash_newtype`]: crate::hash_newtype
#[macro_export]
#[deprecated(since = "TBD", note = "this macro is no longer supported, will be removed soon")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 57a9325:

What if, instead of deprecating these, we made them a wrapper around the more detailed macros, where the more-detailed ones would be called by pseudo-attributes like #[hash_impl(hex_fmt)] etc. And have the hash_wrapper macro always be called.

This would minimize the amount of work we'd have to do to make the "hash invocation looks like real code" syntax, and minimize the verbosity of calling these things. Though it would be a bit more work in this macro to support arbitrary sets of attributes like this.

@@ -253,35 +253,3 @@ impl fmt::Display for FromSliceError {

#[cfg(feature = "std")]
impl std::error::Error for FromSliceError {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 8b5958a:

re "is anybody outside of rust-bitcoin using these macros" yes, we are using it in rust-miniscript for one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and in rust-elements

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and elements-miniscript (in different ways than in rust-miniscript)

@apoelstra
Copy link
Member

Overall aac29ba looks good. I have a few high-level comments:

  • We should pull out the first four API-setting changes and open a new copy of Check for changes to the public API #1880 which re-introduces the API-checking script. Would be OK if it wasn't in CI for now, so this PR (which uses it) would need an initial "update API" commit each time it was rebased.
  • Before ripping the macro apart, I'd stop using the Hash trait by name (ideally we'd completely remove it, but if you don't want to do that first, it'd suffice to make sure we always import it as _ and then call methods like BlockHash::all_zeroes() as though they were intrinsics). This would uniformize the API. I might even go as far as to try to write a script that checks the API for uniformity across all hashtypes.
  • I would give every hash a const LENGTH: usize associated constant and use that in the return value of to_byte_array etc. So you'd return [u8; Self::LENGTH] everywhere rather than Self::Bytes or worse <sha256d::Hash as Hash>::Bytes. And drop Bytes.
  • As mentioned elsewhere, ideally tagged hashes would have tagged engines somehow. But that's independent of this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bitcoin PRs modifying the bitcoin crate C-hashes PRs modifying the hashes crate ci doc
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants