Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simple, Non-DOS-resistent seed #126

Closed
emilk opened this issue Sep 7, 2022 · 14 comments · Fixed by #131
Closed

Simple, Non-DOS-resistent seed #126

emilk opened this issue Sep 7, 2022 · 14 comments · Fixed by #131

Comments

@emilk
Copy link
Contributor

emilk commented Sep 7, 2022

Summary

I want to be able to use ahash without enabling runtime-rng nor compile-time-rng.

Background:

Currently users have a choice between runtime-rng and compile-time-rng. As I understand it, both these are designed to help against DOS attacks by having an unpredictable seed. There are many circumstances where DOS-attacks won't matter, e.g. whenever ahash isn't used with user-generated keys, etc (related issue: #58).

Why would one want to opt-out of these features?

runtime-rng has the downside that, in order to use ahash in a web app (see emilk/egui#2009), users need to add

[target.'cfg(target_arch = "wasm32")'.dependencies]
getrandom = { version = "0.2", features = ["js"] }

compile-time-rng has the downside that it pulls in extra dependencies, and increase compile times.

So, as a library author that do not care about DOS attacks, I would like to have to be able to use ahash without either runtime-rng or compile-time-rng. However, this doesn't currently work:

❯ cargo build --no-default-features --features std
warning: file found to be present in multiple build targets: /Users/emilk/code/forks/aHash/tests/bench.rs
   Compiling ahash v0.8.0 (/Users/emilk/code/forks/aHash)
error[E0599]: the function or associated item `from_iter` exists for struct `AHashMap<K, V>`, but its trait bounds were not satisfied
   --> src/hash_map.rs:42:15
    |
20  | pub struct AHashMap<K, V, S = crate::RandomState>(HashMap<K, V, S>);
    | --------------------------------------------------------------------
    | |
    | function or associated item `from_iter` not found for this
    | doesn't satisfy `AHashMap<K, V>: FromIterator<(K, V)>`
...
42  |         Self::from_iter(arr)
    |               ^^^^^^^^^ function or associated item cannot be called on `AHashMap<K, V>` due to unsatisfied trait bounds
    |
   ::: src/random_state.rs:197:1
    |
197 | pub struct RandomState {
    | ---------------------- doesn't satisfy `random_state::RandomState: Default`
    |
note: trait bound `random_state::RandomState: Default` was not satisfied
   --> src/hash_map.rs:283:22
    |
280 | impl<K, V, S> FromIterator<(K, V)> for AHashMap<K, V, S>
    |               --------------------     -----------------
...
283 |     S: BuildHasher + Default,
    |                      ^^^^^^^ unsatisfied trait bound introduced here
help: consider annotating `random_state::RandomState` with `#[derive(Default)]`
   --> |src/random_state.rs:197:1
    |
197 | #[derive(Default)]
    |

Solution

I would suggest that ahash can be used without both runtime-rng and compile-time-rng by just using a fixed seed.

Either --no-default-features --features std should just work, or we add a new feature fixed-seed (or similar) so that users opt-in to this (potentially DOS-vulnerable) behavior.

In either case, this would be the fallback only if no dependency enables the runtime-rng or compile-time-rng features. That means me as a library author can leave it to my users to opt-in to DOS-resistant hash seeds.

@repi
Copy link

repi commented Sep 10, 2022

We would like to have this as well, we compile with ahash WebAssembly modules for use outside of the browser which need to have minimal dependencies, fast compile times, and DOS-resistent hash seeds are not important at all.

@notgull
Copy link

notgull commented Sep 10, 2022

I don't see the issue with compile-time-rng. At worst, it slightly increases compile times at the cost of preventing worst-case quadratic behavior in certain data structures. Especially in webapps that use packages like winit, I doubt that the three relatively small packages that that feature adds are truly impactful.

@tkaitchuck
Copy link
Owner

I want to be able to use ahash without enabling runtime-rng nor compile-time-rng.

This can be done. See my comment here: #123 (comment)

compile-time-rng has the downside that it pulls in extra dependencies, and increase compile times.

compile-time-rng is a build time dependency only. So there should be no deployment cost.

I would suggest that ahash can be used without both runtime-rng and compile-time-rng by just using a fixed seed.

This is already the case. The problem you are encountering is that the Builder does not implement Default in such a case. However you can still use it.

@emilk
Copy link
Contributor Author

emilk commented Sep 12, 2022

Thanks for your replies @tkaitchuck !

This can be done. See my comment here: #123 (comment)

The problem you are encountering is that the Builder does not implement Default in such a case. However you can still use it.

What I would like is for ahash::RandomState::default() and ahash::RandomStae::new to work even without enabling runtime-rng nor compile-time-rng.

One of the great things about ahash is that ahash::HashMap is a drop-in replacement for std::collections::HashMap, but that is not true without runtime-rng or compile-time-rng.

My library could use RandomState::generate_with, but that means replacing all my #[derive(Default)] around everything containing a ahash::HashMap. I don't want to do that. If instead I instead could just use ahash::HashMap::default() (without runtime-rng or compile-time-rng) then my users can still opt-in to DOS-resistance by enabling runtime-rng themselves.

@notgull
Copy link

notgull commented Sep 19, 2022

@emilk Here's the solution I use in my own projects.

struct HashMap<K, V> {
    inner: hashbrown::HashMap<K, V, RandomState>,
}

impl<K, V> Default for HashMap<K, V> {
    fn default() -> Self {
        #[cfg(has_random)]
        let inner = HashMap::with_hasher(RandomState::default());
        #[cfg(not(has_random))]
        let inner = HashMap::with_hasher(RandomState::generate_with(SEED1, SEED2, SEED3, SEED4));

        Self { inner }
    }
}

I find it to be a robust solution that allows for a drop-in replacement.

@rklaehn
Copy link

rklaehn commented Oct 6, 2022

To add to this discussion: Sometimes you want to explicitly have deterministic behaviour. E.g. when writing tests. In particular, with property based tests it is really desirable that the same seed produces the same results.

As far as I can see there is no straightforward way to get a fully deterministic ahash HashMap. Having that would be very useful. I got a case where I am using the fnv crate instead of ahash despite our code base already depending on ahash, just because I want strictly deterministic behaviour...

@schungx
Copy link
Contributor

schungx commented Oct 6, 2022

On this issue, I have recently encountered a different problem.

The core of the prob is: ahash is too popular.

My crate would pull in ahash with default-features=false but some other dependency down the tree would also pull in ahash with std.

Since cargo would merge features, I end up always having std.

RandomState::with_seed works fine, but that problem took quite a while to diagnose.

@rklaehn
Copy link

rklaehn commented Oct 6, 2022

Sorry if I am dense, but how do I generate a fully deterministic ahash hashtable using RandomState::with_seed? Got a bit lost...

@notgull
Copy link

notgull commented Oct 6, 2022

To add to this discussion: Sometimes you want to explicitly have deterministic behaviour. E.g. when writing tests. In particular, with property based tests it is really desirable that the same seed produces the same results.

At this point you should probably expose with_seed in some way, e.g. have an alternate constructor that initializes with with_seed rather than default.

My crate would pull in ahash with default-features=false but some other dependency down the tree would also pull in ahash with std.

You should open a PR with these crates.

Sorry if I am dense, but how do I generate a fully deterministic ahash hashtable using RandomState::with_seed? Got a bit lost...

Use the with_hasher method on HashMap.

@rklaehn
Copy link

rklaehn commented Oct 6, 2022

Sorry if I am dense, but how do I generate a fully deterministic ahash hashtable using RandomState::with_seed?
Got a bit lost...

Use the with_hasher method on HashMap.

So this is what I came up with to make a type of AHashMap that is always deterministic. Still feels a bit like walking a minefield since seemingly innocent methods like RandomState::with_seed are introducing nondeterminism...

So with_seed mixes in randomness, but with_seeds does not?

    pub struct DeterministicHasher(RandomState);

    impl Default for DeterministicHasher {
        fn default() -> Self {
            Self(RandomState::with_seeds(0, 0, 0, 0))
        }
    }

    impl BuildHasher for DeterministicHasher {
        type Hasher = ahash::AHasher;

        fn build_hasher(&self) -> Self::Hasher {
            self.0.build_hasher()
        }
    }

    type DetAHashMap<K, V> = ahash::AHashMap<K, V, DeterministicHasher>;

@schungx
Copy link
Contributor

schungx commented Oct 6, 2022

Sorry I believe it is with_seeds (I made typo) which is completely deterministic. with_seed (singular) will use random numbers.

You can look into the source to see that it just takes the four numbers as the seed without any processing.

I eventually expose an option to provide fixed seeds.

@rklaehn
Copy link

rklaehn commented Oct 6, 2022

Sorry I believe it is with_seeds (I made typo) which is completely deterministic. with_seed (singular) will use random numbers.

You can look into the source to see that it just takes the four numbers as the seed without any processing.

Yes, I checked. Just found the difference a bit surprising. I would expect a method that is called with_seed to also be completely deterministic. That is usually what you want if you pass in an explicit seed.

I eventually expose an option to provide fixed seeds.

Thanks a lot. Going with the solution above for now, I guess...

@tkaitchuck
Copy link
Owner

Both with_seed and with_seeds when provided with will provided identical hashers for identical seeds. Either of these can be used for a deterministic hash table.

generate_with and new both use randomness and will create different hashers each time.

@schungx
Copy link
Contributor

schungx commented Oct 25, 2022

It is probably better to mention these in the documentation as it takes a while reading through the code to figure out. Especially the differences between with_seed and with_seeds (I guess with_seeds is "lower level") are not entirely apparent.

Some form of table would be helpful, such as...

Constructor Dynamically random? Consistent cross compiles? Seeds
new Y random
generate_with Y u64 x 4
with_seed N N u64 + compile-time random number
with_seeds N Y u64 x 4

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants