Improve documentation (#135)
Signed-off-by: Tom Kaitchuck <Tom.Kaitchuck@gmail.com>
tkaitchuck committed Oct 25, 2022
1 parent 8174160 commit 6801bf1
Showing 3 changed files with 164 additions and 37 deletions.
20 changes: 11 additions & 9 deletions README.md
@@ -53,18 +53,20 @@ map.insert(56, 78);
The aHash package has the following flags:
* `std`: This enables features which require the standard library. (On by default) This includes providing the utility classes `AHashMap` and `AHashSet`.
* `serde`: Enables `serde` support for the utility classes `AHashMap` and `AHashSet`.
* `compile-time-rng`: Whenever possible aHash will seed hashers with random numbers using the [getrandom](https://github.com/rust-random/getrandom) crate.
This is possible for OS targets which provide a source of randomness. (see the [full list](https://docs.rs/getrandom/0.2.0/getrandom/#supported-targets).)
For OS targets without access to a random number generator, `compile-time-rng` provides an alternative.
* `runtime-rng`: To seed Hashers, aHash obtains randomness from the operating system. (On by default)
This is done using the [getrandom](https://github.com/rust-random/getrandom) crate.
* `compile-time-rng`: For OS targets without access to a random number generator, `compile-time-rng` provides an alternative.
If `getrandom` is unavailable and `compile-time-rng` is enabled, aHash will generate random numbers at compile time and embed them in the binary.
This allows for DOS resistance even if there is no random number generator available at runtime (assuming the compiled binary is not public).
This makes the binary non-deterministic, unless `getrandom` is available for the target in which case the flag does nothing.
(If non-determinism is a problem see [constrandom's documentation](https://github.com/tkaitchuck/constrandom#deterministic-builds))
This makes the binary non-deterministic. (If non-determinism is a problem see [constrandom's documentation](https://github.com/tkaitchuck/constrandom#deterministic-builds))

**NOTE:** If `getrandom` is unavailable and `compile-time-rng` is disabled aHash will fall back on using the numeric
value of memory addresses as a source of randomness. This is somewhat strong if ALSR is turned on (it is by default)
but for embedded platforms this will result in weak keys. As a result, it is recommended to use `compile-time-rng` anytime
random numbers will not be available at runtime.
If both `runtime-rng` and `compile-time-rng` are enabled, `runtime-rng` takes precedence and `compile-time-rng` does nothing.

**NOTE:** If neither `runtime-rng` nor `compile-time-rng` is enabled, a source of randomness may be provided by the application on startup
using the [ahash::random_state::set_random_source](https://docs.rs/ahash/latest/ahash/random_state/fn.set_random_source.html) function.
If neither flag is set and this is not done, aHash will fall back on using the numeric value of memory addresses as a source of randomness.
This is somewhat strong if ASLR is turned on (it is by default), but on embedded platforms this will result in weak keys.
As a result, it is recommended to use `compile-time-rng` any time random numbers will not be available at runtime.

## Comparison with other hashers

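The memory-address fallback described in the README can be pictured with a minimal, std-only sketch. `COUNTER` and `address_seed` are hypothetical names and the mixing step is not aHash's actual code; the sketch only illustrates why address-derived seeds change between runs when ASLR is active, but repeat on targets where the load address is fixed.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical process-wide counter. Its address, like that of any static,
// depends on where the binary was loaded, which ASLR randomizes per run.
static COUNTER: AtomicUsize = AtomicUsize::new(0);

/// Sketch of a seed derived from memory addresses (NOT aHash's code).
/// Without ASLR (common on embedded targets) the addresses are identical
/// on every run, so the sequence of seeds is predictable: the "weak keys" case.
fn address_seed() -> usize {
    // A u64 local is 8-byte aligned, so its address always has bit 0 clear.
    let local = 0u64;
    let stack_addr = &local as *const u64 as usize;
    let static_addr = &COUNTER as *const AtomicUsize as usize;
    // The counter makes successive calls within one process differ.
    let count = COUNTER.fetch_add(1, Ordering::Relaxed);
    // Multiplying by an odd constant is a bijection: it spreads the bits
    // but adds no entropy of its own.
    (stack_addr ^ static_addr.rotate_left(17) ^ count)
        .wrapping_mul(0x9E37_79B9_7F4A_7C15_u64 as usize)
}

fn main() {
    println!("{:#x}", address_seed());
    println!("{:#x}", address_seed());
}
```

On a desktop OS the printed values will typically differ from run to run; on a target without ASLR the same pair would be printed every time.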
67 changes: 47 additions & 20 deletions src/lib.rs
@@ -1,18 +1,24 @@
//! AHash is a hashing algorithm is intended to be a high performance, (hardware specific), keyed hash function.
//! This can be seen as a DOS resistant alternative to `FxHash`, or a fast equivalent to `SipHash`.
//! It provides a high speed hash algorithm, but where the result is not predictable without knowing a Key.
//! This allows it to be used in a `HashMap` without allowing for the possibility that an malicious user can
//! AHash is a high performance keyed hash function.
//!
//! It is a DOS resistant alternative to `FxHash` or a faster alternative to `SipHash`.
//!
//! It quickly provides a high quality hash where the result is not predictable without knowing the Key.
//! AHash works with `HashMap` to hash keys, but without allowing for the possibility that a malicious user can
//! induce a collision.
//!
//! # How aHash works
//!
//! aHash uses the hardware AES instruction on x86 processors to provide a keyed hash function.
//! aHash is not a cryptographically secure hash.
//! When it is available, aHash uses the hardware AES instructions to provide a keyed hash function.
//! When it is not, aHash falls back on a slightly slower alternative algorithm.
//!
//! AHash does not have a fixed standard for its output. This allows it to improve over time.
//! But this also means that different computers, or computers using different versions of ahash, may observe different
//! hash values for the same input.
#![cfg_attr(
feature = "std",
doc = r##"
# Example
# Usage
AHash is a drop-in replacement for the default implementation of the Hasher trait. To construct a HashMap using aHash as its hasher, do the following:
```
use ahash::{AHasher, RandomState};
use std::collections::HashMap;
@@ -25,25 +31,46 @@ map.insert(12, 34);
#![cfg_attr(
feature = "std",
doc = r##"
For convenience, both new-type wrappers and type aliases are provided. The new type wrappers are called called `AHashMap` and `AHashSet`. These do the same thing with slightly less typing.
The type aliases are called `ahash::HashMap`, `ahash::HashSet` are also provided and alias the
std::[HashMap] and std::[HashSet]. Why are there two options? The wrappers are convenient but
can't be used where a generic `std::collection::HashMap<K, V, S>` is required.
For convenience, both new-type wrappers and type aliases are provided.
The new type wrappers are called `AHashMap` and `AHashSet`.
These do the same thing with slightly less typing. (For convenience, `From`, `Into`, and `Deref` are provided.)
```
use ahash::AHashMap;
let mut map: AHashMap<i32, i32> = AHashMap::with_capacity(4);
let mut map: AHashMap<i32, i32> = AHashMap::new();
map.insert(12, 34);
map.insert(56, 78);
// There are also type aliases provieded together with some extension traits to make
// it more of a drop in replacement for the std::HashMap/HashSet
use ahash::{HashMapExt, HashSetExt}; // Used to get with_capacity()
let mut map = ahash::HashMap::with_capacity(10);
```
For even less typing and better interop with existing libraries which require a `std::collections::HashMap` (such as rayon),
the type aliases [HashMap] and [HashSet] are provided. These alias `std::collections::HashMap` and `std::collections::HashSet`, using aHash as the hasher.
```
use ahash::{HashMap, HashMapExt, HashSetExt};
let mut map: HashMap<i32, i32> = HashMap::new();
map.insert(12, 34);
let mut set = ahash::HashSet::with_capacity(10);
set.insert(10);
```
Note the import of [HashMapExt]. This is needed for the constructor.
# Directly hashing
Hashers can also be instantiated with `RandomState`. For example:
```
use std::hash::BuildHasher;
use ahash::RandomState;
let hash_builder = RandomState::with_seed(42);
let hash = hash_builder.hash_one("Some Data");
```
### Randomness
To ensure that each map has a unique set of keys, aHash needs a source of randomness.
Normally this is obtained from the OS (or via the `compile-time-rng` flag).
If for some reason (such as fuzzing) an application wishes to supply all random seeds manually, this can be done via
[random_state::set_random_source].
"##
)]
#![deny(clippy::correctness, clippy::complexity, clippy::perf)]
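The `HashMapExt` import is needed because std defines `HashMap::new` and `HashMap::with_capacity` only for its own default hasher. A std-only sketch of the alias-plus-extension-trait pattern follows; `CustomState` is a hypothetical stand-in builder (with fixed keys, so no DOS resistance), and the trait is a local imitation of the idea rather than aHash's definition.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap as StdHashMap;
use std::hash::BuildHasherDefault;

// Hypothetical stand-in for a custom hasher builder. It has FIXED keys,
// so unlike aHash's `RandomState` it offers no DOS resistance.
type CustomState = BuildHasherDefault<DefaultHasher>;

// An alias in the style of `ahash::HashMap`: std's map with another hasher.
type HashMap<K, V> = StdHashMap<K, V, CustomState>;

// std's inherent `new`/`with_capacity` exist only for its default hasher,
// so an alias with a different hasher needs a trait to supply them.
trait HashMapExt {
    fn new() -> Self;
    fn with_capacity(capacity: usize) -> Self;
}

impl<K, V, S: Default> HashMapExt for StdHashMap<K, V, S> {
    fn new() -> Self {
        StdHashMap::with_hasher(S::default())
    }
    fn with_capacity(capacity: usize) -> Self {
        StdHashMap::with_capacity_and_hasher(capacity, S::default())
    }
}

fn main() {
    // Resolves to the trait's `new`, since the inherent one does not apply.
    let mut map: HashMap<i32, i32> = HashMap::new();
    map.insert(12, 34);
    assert_eq!(map.get(&12), Some(&34));

    let sized: HashMap<&str, u8> = HashMap::with_capacity(10);
    assert!(sized.capacity() >= 10);
}
```

Because the alias is still a `std::collections::HashMap`, it can be passed to any API that accepts a `HashMap<K, V, S>`, which is the interop benefit the doc text describes.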
@@ -157,7 +184,7 @@ where
/// [AHasher]s in order to hash the keys of the map.
///
/// Generally it is preferable to use [RandomState] instead, so that different
/// hashmaps will have different keys. However if fixed keys are desireable this
/// hashmaps will have different keys. However if fixed keys are desirable this
/// may be used instead.
///
/// # Example
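A rough std-only analogue of the fixed-keys versus random-keys trade-off in the doc comment above: `BuildHasherDefault<DefaultHasher>` plays the role of a fixed-key builder and std's own `RandomState` plays the role of a per-instance random one. Neither type comes from aHash, and `BuildHasher::hash_one` assumes Rust 1.71 or later.

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::hash::{BuildHasher, BuildHasherDefault};

fn main() {
    // Fixed keys: every `BuildHasherDefault<DefaultHasher>` builds identically
    // keyed hashers, so equal inputs hash equally in every map.
    let fixed_a = BuildHasherDefault::<DefaultHasher>::default();
    let fixed_b = BuildHasherDefault::<DefaultHasher>::default();
    assert_eq!(fixed_a.hash_one("key"), fixed_b.hash_one("key"));

    // Random keys: each std `RandomState` gets its own keys, so two maps
    // will almost certainly hash the same input differently.
    let random_a = RandomState::new();
    let random_b = RandomState::new();
    println!("{:x} / {:x}", random_a.hash_one("key"), random_b.hash_one("key"));
}
```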
114 changes: 106 additions & 8 deletions src/random_state.rs
@@ -118,6 +118,12 @@ cfg_if::cfg_if! {
}
/// A supplier of Randomness used for different hashers.
/// See [set_random_source].
///
/// If [set_random_source] is not called, aHash will default to the best available source of randomness.
/// In order of preference, this is:
/// 1. OS provided random number generator (available if the `runtime-rng` flag is enabled, which it is by default)
/// 2. Strong compile time random numbers used to permute a static "counter". (available if `compile-time-rng` is enabled. __Enabling this is recommended if `runtime-rng` is not possible__)
/// 3. A static counter that adds the memory address of each [RandomState] created, permuted with fixed constants. (Similar to above but with fixed keys)
pub trait RandomSource {
fn gen_hasher_seed(&self) -> usize;
}
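The order of preference documented for `RandomSource` can be sketched without aHash. The trait below copies the `gen_hasher_seed` signature from the diff; `CountingSource` and `pick_source` are hypothetical, illustrating the kind of deterministic source a fuzzing harness might want to supply and how a first-available choice could be expressed.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Same signature as the `RandomSource` trait in the diff, declared locally
// so this sketch compiles on its own.
trait RandomSource {
    fn gen_hasher_seed(&self) -> usize;
}

/// Hypothetical deterministic source, e.g. for reproducible fuzzing runs.
struct CountingSource {
    next: AtomicUsize,
}

impl CountingSource {
    fn new(start: usize) -> Self {
        CountingSource { next: AtomicUsize::new(start) }
    }
}

impl RandomSource for CountingSource {
    fn gen_hasher_seed(&self) -> usize {
        // Distinct on every call, yet identical from run to run.
        self.next.fetch_add(1, Ordering::Relaxed)
    }
}

/// Hypothetical helper mirroring the documented order: an application-supplied
/// source, then the OS generator, then compile-time keys, then the address counter.
fn pick_source<'a>(
    app: Option<&'a dyn RandomSource>,
    os: Option<&'a dyn RandomSource>,
    compile_time: Option<&'a dyn RandomSource>,
    address_counter: &'a dyn RandomSource,
) -> &'a dyn RandomSource {
    app.or(os).or(compile_time).unwrap_or(address_counter)
}

fn main() {
    let app = CountingSource::new(100);
    let fallback = CountingSource::new(0);
    let chosen = pick_source(Some(&app), None, None, &fallback);
    assert_eq!(chosen.gen_hasher_seed(), 100);
    assert_eq!(chosen.gen_hasher_seed(), 101);
    // With nothing else available, the last resort is used.
    assert_eq!(pick_source(None, None, None, &fallback).gen_hasher_seed(), 0);
}
```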
@@ -195,6 +201,16 @@ cfg_if::cfg_if! {
/// [Hasher]: std::hash::Hasher
/// [BuildHasher]: std::hash::BuildHasher
/// [HashMap]: std::collections::HashMap
///
/// There are multiple constructors, each documented in more detail below:
///
/// | Constructor | Dynamically random? | Seed |
/// |---------------|---------------------|------|
/// |`new` | Each instance unique|_[RandomSource]_|
/// |`generate_with`| Each instance unique|`u64` x 4 + static counter|
/// |`with_seed` | Fixed per process |`u64` + static random number|
/// |`with_seeds` | Fixed |`u64` x 4|
///
#[derive(Clone)]
pub struct RandomState {
pub(crate) k0: u64,
@@ -210,16 +226,26 @@ impl fmt::Debug for RandomState {
}

impl RandomState {
/// Use randomly generated keys

/// Create a new `RandomState` `BuildHasher` using random keys.
///
/// (Each instance will have a unique set of keys).
#[inline]
pub fn new() -> RandomState {
let src = get_src();
let fixed = get_fixed_seeds();
Self::from_keys(&fixed[0], &fixed[1], src.gen_hasher_seed())
}

/// Allows for supplying seeds, but each time it is called the resulting state will be different.
/// This is done using a static counter, so it can safely be used with a fixed keys.
/// Create a new `RandomState` `BuildHasher` based on the provided seeds, but in such a way
/// that each time it is called the resulting state will be different and of high quality.
/// This allows fixed constant or poor quality seeds to be provided without the problem of different
/// `BuildHasher`s being identical or weak.
///
/// This is done by permuting the provided values with the value of a static counter and a memory address.
/// (This makes this method somewhat more expensive than `with_seeds` below, which does not do this).
///
/// The provided values (k0-k3) do not need to be of high quality, but they should not all be the same value.
#[inline]
pub fn generate_with(k0: u64, k1: u64, k2: u64, k3: u64) -> RandomState {
let src = get_src();
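The idea behind `generate_with` (caller-supplied seeds, but a different state on every call) can be sketched with a process-wide counter. This is an illustration under stated assumptions, not aHash's algorithm: the SplitMix64 finalizer stands in for the permutation, the golden-ratio lane offsets are arbitrary, and the memory-address input mentioned in the doc comment is omitted.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical process-wide counter, bumped once per generated key set.
static COUNTER: AtomicU64 = AtomicU64::new(0);

// 64-bit golden ratio; its multiples give each lane a distinct offset.
const GOLDEN: u64 = 0x9E37_79B9_7F4A_7C15;

/// SplitMix64's finalizer: a bijection on u64, so distinct inputs always
/// produce distinct outputs. Used here as a stand-in permutation.
fn mix(mut z: u64) -> u64 {
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Sketch of `generate_with`-style key derivation: the seeds are combined
/// with a fresh counter value per call, so even constant (or zero) seeds
/// yield a different set of keys every time.
fn generate_keys(seeds: [u64; 4]) -> [u64; 4] {
    let c = COUNTER.fetch_add(1, Ordering::Relaxed);
    let mut keys = [0u64; 4];
    for (i, (key, seed)) in keys.iter_mut().zip(seeds).enumerate() {
        let lane = GOLDEN.wrapping_mul(i as u64 + 1);
        *key = mix(seed ^ lane ^ c.rotate_left(16 * i as u32));
    }
    keys
}

fn main() {
    let first = generate_keys([1, 2, 3, 4]);
    let second = generate_keys([1, 2, 3, 4]);
    // Same seeds, different counter values, therefore different keys.
    assert_ne!(first, second);
    println!("{:x?}\n{:x?}", first, second);
}
```

The extra work per call (counter update plus mixing) is the kind of cost the doc comment alludes to when comparing this constructor with `with_seeds`.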
@@ -252,7 +278,11 @@ impl RandomState {
RandomState { k0, k1, k2, k3 }
}

/// Allows for explicitly setting a seed to used.
/// Build a `RandomState` from a single key. The provided key does not need to be of high quality,
/// but all `RandomState`s created from the same key will produce identical hashers.
/// (In contrast to `generate_with` above)
///
/// This allows for explicitly setting the seed to be used.
///
/// Note: This method does not require the provided seed to be strong.
#[inline]
@@ -262,9 +292,13 @@ impl RandomState {
}

/// Allows for explicitly setting the seeds to be used.
/// All `RandomState`s created with the same set of keys will produce identical hashers.
/// (In contrast to `generate_with` above)
///
/// Note: This method is robust against 0s being passed for one or more of the parameters
/// or the same value being passed for more than one parameter.
/// Note: If DOS resistance is desired, one of these should be a decent quality random number.
/// If 4 high quality random numbers are not cheaply available, this method is robust against 0s being passed for
/// one or more of the parameters or the same value being passed for more than one parameter.
/// It is recommended to pass numbers in order from highest to lowest quality (if there is any difference).
#[inline]
pub const fn with_seeds(k0: u64, k1: u64, k2: u64, k3: u64) -> RandomState {
RandomState {
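A sketch of the two deterministic constructors. XOR-ing each seed with a distinct non-zero constant is one simple way to stay robust against zero or repeated seeds; the pi-derived constants, the `Keys` type, and the rotations used to spread a single seed are assumptions for illustration. Per the constructor table, the real `with_seed` also mixes in a per-process random number, which this sketch leaves out.

```rust
// Hexadecimal digits of pi, used as arbitrary "nothing up my sleeve" constants.
const PI: [u64; 4] = [
    0x243F_6A88_85A3_08D3,
    0x1319_8A2E_0370_7344,
    0xA409_3822_299F_31D0,
    0x082E_FA98_EC4E_6C89,
];

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Keys([u64; 4]);

/// `with_seeds`-style constructor (sketch): fully deterministic, and XOR-ing
/// with distinct non-zero constants keeps zero or repeated seeds from
/// producing an all-zero or all-equal key set.
const fn keys_with_seeds(k0: u64, k1: u64, k2: u64, k3: u64) -> Keys {
    Keys([k0 ^ PI[0], k1 ^ PI[1], k2 ^ PI[2], k3 ^ PI[3]])
}

/// `with_seed`-style constructor (sketch): spreads one key over four lanes.
const fn keys_with_seed(key: u64) -> Keys {
    keys_with_seeds(key, key.rotate_left(16), key.rotate_left(32), key.rotate_left(48))
}

fn main() {
    // Same inputs, same keys: fixed, reproducible hashers.
    assert_eq!(keys_with_seed(42), keys_with_seed(42));
    // All-zero seeds still give four distinct, non-zero lanes.
    let Keys(lanes) = keys_with_seeds(0, 0, 0, 0);
    assert!(lanes.iter().all(|&k| k != 0));
    println!("{:x?}", lanes);
}
```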
@@ -275,7 +309,36 @@ impl RandomState {
}
}

/// Calculates the hash of a single value.
/// Calculates the hash of a single value. This provides a more convenient (and faster) way to obtain a hash:
/// For example:
#[cfg_attr(
feature = "std",
doc = r##" # Examples
```
use std::hash::BuildHasher;
use ahash::RandomState;
let hash_builder = RandomState::new();
let hash = hash_builder.hash_one("Some Data");
```
"##
)]
/// This is similar to:
#[cfg_attr(
feature = "std",
doc = r##" # Examples
```
use std::hash::{BuildHasher, Hash, Hasher};
use ahash::RandomState;
let hash_builder = RandomState::new();
let mut hasher = hash_builder.build_hasher();
"Some Data".hash(&mut hasher);
let hash = hasher.finish();
```
"##
)]
/// (Note that these two ways to get a hash may not produce the same value for the same data)
///
/// This is intended as a convenience for code which *consumes* hashes, such
/// as the implementation of a hash table or in unit tests that check
@@ -295,6 +358,11 @@ impl RandomState {
}
}

/// Creates an instance of RandomState using keys obtained from the random number generator.
/// Each instance created in this way will have a unique set of keys. (But the resulting instance
/// can be used to create many hashers, each of which will have the same keys.)
///
/// This is the same as [RandomState::new()]
impl Default for RandomState {
#[inline]
fn default() -> Self {
@@ -341,7 +409,37 @@ impl BuildHasher for RandomState {
AHasher::from_random_state(self)
}

/// Calculates the hash of a single value.

/// Calculates the hash of a single value. This provides a more convenient (and faster) way to obtain a hash:
/// For example:
#[cfg_attr(
feature = "std",
doc = r##" # Examples
```
use std::hash::BuildHasher;
use ahash::RandomState;
let hash_builder = RandomState::new();
let hash = hash_builder.hash_one("Some Data");
```
"##
)]
/// This is similar to:
#[cfg_attr(
feature = "std",
doc = r##" # Examples
```
use std::hash::{BuildHasher, Hash, Hasher};
use ahash::RandomState;
let hash_builder = RandomState::new();
let mut hasher = hash_builder.build_hasher();
"Some Data".hash(&mut hasher);
let hash = hasher.finish();
```
"##
)]
/// (Note that these two ways to get a hash may not produce the same value for the same data)
///
/// This is intended as a convenience for code which *consumes* hashes, such
/// as the implementation of a hash table or in unit tests that check
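The two ways of hashing a single value described in the `hash_one` docs can be compared with std's `RandomState`, which needs no dependencies. For std the results agree because its builder keeps the trait's default `hash_one`; the doc comments above warn that aHash's two paths may not produce the same value, so that equality should not be assumed there. `hash_long` is a hypothetical helper name, and `BuildHasher::hash_one` assumes Rust 1.71 or later.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

/// The long form: build a hasher, feed it the value, then finish.
fn hash_long<B: BuildHasher, T: Hash + ?Sized>(builder: &B, value: &T) -> u64 {
    let mut hasher = builder.build_hasher();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let builder = RandomState::new();
    // One-call form.
    let short = builder.hash_one("Some Data");
    let long = hash_long(&builder, "Some Data");
    // std's RandomState uses the trait's default `hash_one`, which performs
    // the long form, so the two agree here.
    assert_eq!(short, long);
}
```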
