implement `Hasher` trait for `Hasher` #87

Luro02 · 2020-05-19T10:43:31Z

https://doc.rust-lang.org/std/hash/trait.Hasher.html

src/lib.rs

oconnor663 · 2020-05-19T21:32:57Z

(Just dropping in to make a general apology for not being responsive this week. I'm ramping up at a new job. I will probably need to get to this PR over the weekend.)

oconnor663 · 2020-05-23T22:15:43Z

Could you say more about the use case here? blake3::Hash implements core::hash::Hash, so I'd expect most callers should finalize manually and use the Hash rather than the Hasher as a map key. (Note that finalize is a &self method, and you can update and finalize multiple times if necessary, as this PR does in finish.)

One of my worries is that blake3::Hasher is "very mutable". It's entire purpose as a type is to be mutated with update. Using it as a HashMap key makes it tempting to call update directly on a map's keys. But that would violate the HashMap contract and lead to weird logic bugs like keys disappearing. (Not technically undefined behavior, but still dangerous and very hard to debug.)

Luro02 · 2020-05-24T10:10:02Z

Could you say more about the use case here?

When I was looking through the documentation I found the blake3::Hasher struct, which reminded me of the Hasher trait that has almost the same interface, so I thought it might be useful to have the trait implemented.

This trait makes it possible to hash any type implementing Hash with the blake3::Hasher, so one could for example create a blake3 hash for a struct, without converting it to bytes first:

use std::hash::Hash;
use std::hash::Hasher as _;

use blake3::Hasher;

#[derive(Hash)]
pub struct Example {
    pub field_0: String,
    pub field_1: usize,
}

fn main() {
    let example = Example {
        field_0: "Hello".into(),
        field_1: 123,
    };

    let mut hasher = Hasher::new();

    // this does only work when the `Hasher` trait is implemented for `Hasher`
    example.hash(&mut hasher);

    let hash = hasher.finish();

    assert_eq!(11927595719148982277, hash);
}

Sometimes it isn't even possible to convert a struct to bytes, in which case only the Hasher trait works.

blake3::Hash implements core::hash::Hash, so I'd expect most callers should finalize manually and use the Hash rather than the Hasher as a map key.

By using Hasher as a map key you mean something like this?:

use std::collections::HashMap;
use blake3::Hasher;

let mut map = HashMap::new();

map.insert(Hasher::new(), 0_usize);

(this is not possible, because blake3::Hasher would have to implement the Hash trait).

It is possible to use the blake3::Hasher to hash the HashMap keys through something like this:

use std::collections::HashMap;
use std::hash::BuildHasher;

/// This is a custom type implementing the `BuildHasher` trait,
/// which is required by `HashMap::with_hasher`
struct Blake3HasherBuilder;

impl BuildHasher for Blake3HasherBuilder {
    // NOTE: this only works if blake3::Hasher implements the `Hasher` trait
    type Hasher = blake3::Hasher;
    
    fn build_hasher(&self) -> Self::Hasher { blake3::Hasher::new() }
}

let mut map = HashMap::with_hasher(hash_builder);

map.insert("hello", "world"); // the "hello" key would be hashed with blake3::Hasher instead of the `DefaultHasher`

One of my worries is that blake3::Hasher is "very mutable". It's entire purpose as a type is to be mutated with update. Using it as a HashMap key makes it tempting to call update directly on a map's keys. But that would violate the HashMap contract and lead to weird logic bugs like keys disappearing. (Not technically undefined behavior, but still dangerous and very hard to debug.)

I think you might have confused the Hash and Hasher traits.

`Hash`

The Hash trait should be implemented by all types that can be hashed. For example

struct HelloNumber {
    hello: &'static str,
    number: usize,
}

can be hashed and therefore one would implement the Hash trait for it:

use std::hash::Hash;

impl Hash for HelloNumber {
    fn hash<H>(&self, state: &mut H)
    where
        H: std::hash::Hasher,
    {
        state.write(self.hello.as_bytes());
        state.write_usize(self.number);
    }
}

or one could simply derive it:

#[derive(Hash)]
struct HelloNumber {
    hello: &'static str,
    number: usize,
}

One is then able to use it wherever a type has to implement the Hash trait. For example as a HashMap key (requires the type to also implement Eq and PartialEq).

`Hasher`

The Hash trait is generic and allows to use any Hasher that implements the Hasher trait, which makes it possible to hash types, with any Hasher that I want, e.g. md5, sha1, sha256 or I could create my own hasher:

use std::hash::{Hasher, Hash};

// Each time you write some data to the `Hasher` the counter is incremented by one.
pub struct StupidHasher(pub u64);

impl Hasher for StupidHasher {
    fn finish(&self) -> u64 { self.0 }

    fn write(&mut self, _bytes: &[u8]) { self.0 += 1 }
}

fn main() {
    #[derive(Hash)]
    struct Example {
        pub field_1: &'static str,
        pub field_2: usize,
    }
    
    let example = Example {
        field_1: "hello".into(),
        field_2: 123
    };

    let mut stupid_hasher = StupidHasher(0);

    example.hash(&mut stupid_hasher);

    // 1st call: hash b"hello"
    // 2nd call: hash 0xFF 
    // dont know why, but this is how it is implemented:
    // https://doc.rust-lang.org/src/core/hash/mod.rs.html#611-616
    // 3rd call: hash 123_usize
    assert_eq!(stupid_hasher.0, 3);
    assert_eq!(stupid_hasher.0, stupid_hasher.finish());
}

oconnor663 · 2020-05-24T13:12:57Z

I think you might have confused the Hash and Hasher traits.

I definitely did! I need to reread this from the beginning.

newpavlov · 2020-06-10T18:40:48Z

core::hash::Hasher trait is generally not intended for functions like MD-5, SHA-2, etc. This crate already implements digest and crypto-mac traits, so unless you plan to use BLAKE3 with hash tables I don't think Hasher trait should be implemented by it. Speed of BLAKE3 may make it suitable for hash tables, but it's worth to first compare performance with the default hasher (right now it's SipHash).

oconnor663 · 2020-06-21T02:23:36Z

Now that I understand better what's going on here, my biggest reservation is that std::hash::Hasher limits its output to 8 bytes. That's too small for most cryptographic purposes. (Though it works well for hash tables, which I think was the main (only?) design goal for the trait.) I think there's a real risk that users might not realize this limitation, and might think that the advertised cryptographic properties of BLAKE3 apply even when it's used through this interface, which could lead to security issues.

This trait makes it possible to hash any type implementing Hash with the blake3::Hasher, so one could for example create a blake3 hash for a struct, without converting it to bytes first:

Coincidentally, we've been thinking a lot about this exact problem at work. Uncanny timing! I think the problem of hashing structured data is really interesting and useful but also really subtle. (There must be rigorous papers on this, but I don't know of any. If anyone reading along can link to research, please do.) My tldr on this is that I think there are some commonly expected properties that the Rust standard library's approach is missing, so "do what the Rust standard library does" is unlikely to be a good solution.

Diving into that, here are some example questions:

Is the std::hash::Hash serialization of a structure guaranteed to be consistent across different machines? This is a requirement for any hash that might get published, for example by saving it in a file or writing it to a database. I think the standard library's implementation does not have this property. For example, integers are currently hashed according to their native endianness. Also the hash of a usize depends on the word size of the machine.
Is std::hash::Hash serialization guaranteed to be stable across different versions of the Rust compiler? This is also a requirement for any hash that might get published. I'm not aware of any cases where it's changed in practice, but also I don't think we have any guarantee that it won't.
What does it mean to hash an unordered container like a HashMap? Is it disallowed? Is it nondeterministic? Or does it imply sorting (and therefore an Ord requirement on the contents)? I think the standard library disallows this by not implementing std::hash::Hash for such containers.
What changes to a struct definition will change its hash? Changing its name? Changing the names of its fields? Changing the order of its fields? Can two structurally distinct objects give the same hash? Is there any way to extend a struct with new optional fields, while preserving its hash in cases where those fields are unset? (I believe Rust's answers are no, no, yes, yes, no.)

That very last question in particular, about whether new fields can be added in a backwards-compatible way, is a central design goal of Protobuf and a very useful property of many other serialization formats like JSON and MessagePack. I think that an Industrial Strength™ design for hashing structured data will need to take a position on this, and I think users are likely to need to know the details of that position and design around them.

newpavlov · 2020-06-21T02:33:25Z

In RustCrypto we had the following discussions about hashing structures some time ago:
RustCrypto/hashes#29
RustCrypto/utils#2

cesarb reviewed May 19, 2020

View reviewed changes

src/lib.rs Outdated Show resolved Hide resolved

Luro02 force-pushed the master branch from 310ef11 to e4454f2 Compare May 21, 2020 14:43

implement Hasher trait for Hasher

ade061b

Luro02 force-pushed the master branch from e4454f2 to ade061b Compare May 21, 2020 14:46

Luro02 closed this Jun 21, 2020

oconnor663 mentioned this pull request Jul 23, 2020

feat: Implement core::hash::Hasher for Hasher #100

Open

oconnor663 mentioned this pull request May 8, 2021

b3sum: digest checksum for multiple files #171

Open

oconnor663 mentioned this pull request Nov 26, 2021

question/help for checksum of collections and complex types #213

Closed

oconnor663 mentioned this pull request Sep 9, 2023

Hash trait for <T> #340

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

implement `Hasher` trait for `Hasher` #87

implement `Hasher` trait for `Hasher` #87

Luro02 commented May 19, 2020

oconnor663 commented May 19, 2020

oconnor663 commented May 23, 2020 •

edited

Luro02 commented May 24, 2020

oconnor663 commented May 24, 2020

newpavlov commented Jun 10, 2020 •

edited

oconnor663 commented Jun 21, 2020 •

edited

newpavlov commented Jun 21, 2020

implement Hasher trait for Hasher #87

implement Hasher trait for Hasher #87

Conversation

Luro02 commented May 19, 2020

oconnor663 commented May 19, 2020

oconnor663 commented May 23, 2020 • edited

Luro02 commented May 24, 2020

Hash

Hasher

oconnor663 commented May 24, 2020

newpavlov commented Jun 10, 2020 • edited

oconnor663 commented Jun 21, 2020 • edited

newpavlov commented Jun 21, 2020

implement `Hasher` trait for `Hasher` #87

implement `Hasher` trait for `Hasher` #87

oconnor663 commented May 23, 2020 •

edited

`Hash`

`Hasher`

newpavlov commented Jun 10, 2020 •

edited

oconnor663 commented Jun 21, 2020 •

edited