Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document Behaviour and Speed on Computers that do not offer the AES instruction set #106

Open
NeverGivinUp opened this issue Dec 8, 2021 · 7 comments

Comments

@NeverGivinUp
Copy link

The current documentation (Version 0.7.6) makes only very little mention that aHash fundamentally requires a special hardware instruction set. Specifically -- for the uninitiated -- it does not become clear that the speed of aHash depends on that instruction sets availability on any specific computer, which computers do not offer that, what aHash does in that case and how that affects performance.

Personally, I'd like to know how aHash performs on handheld devices of varying age and price classes, and which hasher to pick in case too many of the devices do not offer AES.

@tkaitchuck
Copy link
Owner

tkaitchuck commented Dec 31, 2021

Ahash's goal is to be as fast as possible wherever. This means using specialized instructions when they are available and falling back when they are not. Currently there are three algorithms:

  • For cpus with an AES-ni instruction on x86-64 and ARM it uses this algorithm
  • For cpus that don't have a AES instruction but are in the set: ("x86_64", "aarch64", "mips64", "powerpc64", "riscv64gc", "s390x") then it uses this algorithm (Which has an accelerated from for long strings located here)
  • For CPUs which lack AES and aren't listed above the folded multiply is replaced with this which is 3 xors 3 rotates and two multiplies.

These should work very well on virtually anything which has a 64 bit multiply. If however you are on a CPU which only has a 32bit multiply, it may not be optimal. I have not yet figured out a faster way to get a high quality 64 bit hash on a 32bit cpu without compromising quality or DOS resistance.

All of this is of course subject to change. As stated in the documentation: if a faster technique is found for any particular architecture, it will be adopted. If you find some other hash which if faster on some particular CPU and think it can be adapted to be DOS resistant, then feel free to open an issue of PR here.

@ssvb
Copy link

ssvb commented Feb 18, 2022

How large is the random key? The AHash-fallback-algorithm wiki page is talking about a 64-bit random key in the Finalization section, but I see two 128-bit seeds (256-bit total) in the code (pub fn new_with_keys(key1: u128, key2: u128) -> AHasher).

How many arithmetic operations total and in which order are done when hashing a single 32-bit or 64-bit value?

What's the difference between finish and short_finish in the code?

The table in the FAQ shows that the aHash Fallback implementation is roughly as fast as the AES implementation for the data types up to u128. Can it be generally used as a DOS resistant high performance hash function implementation even without AES? What do you think about adopting it for use in other programming languages?

Also about a potential 32-bit variant. What would happen if everything is just mechanically changed to 32-bits? My understanding is that generating a 32-bit hash is perfectly fine for selecting a hash table bucket on a 32-bit CPU. Would such modification offer insufficient DOS resistance?

@tkaitchuck
Copy link
Owner

The initialization takes 256bits of key. Before the key is used it is run through this function:

fn from_keys(a: &[u64; 4], b: &[u64; 4], c: usize) -> RandomState {

(Which mixes the key using the Hashing algorithm itself incase some portion of the key is less good than some other portion)
The AES codepath makes use of the full 256 bits of the mixed key. The Fallback algorithm uses 128 bits of the result of mixing the key for fixed size inputs (IE: primitives such as integers, floats, and structs of primitives) but uses the full 256bits for any code path involving any sort of slice or string. (IE: variable length input)
The 'short_finish' is used as a specialization to accelerate the hashing of code paths of constant size primitives. (And not for any structs or variable length inputs). For example it is used if you are making a map where the key is use an i32. This short_finish code is weaker, but there isn't a good way to DOS attack collisions when the key being used is just a single integer.

Can it be generally used as a DOS resistant high performance hash function implementation even without AES?

Yes, It can be used as a does resistant hash function for hashmaps. (Other uses such as for distributed systems or persisting hashes aren't recommended because hashes are subject to change on different architectures and versions)

What do you think about adopting it for use in other programming languages?

Like by calling into Rust code from another language, or by re-implementing? With the former that depends on what language you are calling from. If it has a high overhead to call a Rust function (most languages don't but a few do) the call overhead could be more costly than the hash itself. (With small inputs) For re-implementing it aHash could serve as a template for another language's implementation, but realistically I wouldn't ever expect them to be fully compatible because aHash does a LOT of weird things, and doesn't even guarantee compatibility with itself.

Also about a potential 32-bit variant. What would happen if everything is just mechanically changed to 32-bits? My understanding is that generating a 32-bit hash is perfectly fine for selecting a hash table bucket on a 32-bit CPU. Would such modification offer insufficient DOS resistance?

"Insufficient" depends on what you are using it for and what your risk tolerance is. Even with 64bits aHash is not a cryptographic hash, so it cannot be relied on that nobody in all future time will ever find a single collision in the way you would want if you were using it to for example sign certs. But in general the algorithm for the fallback is decent and could probably be re-implemented with half size integers. It would probably be decently fast on 32bit hardware. Obviously that is going to halve the key size to just 64bits for the short code paths and 128 for the long ones. Those are shorter keys but still long enough that it seems implausible that someone would attempt to brute force them just to launch a generic DOS attack on whatever embedded hardware it is running on via a hashmap collisions. Realistically the larger concern on 32bit embedded is getting good keys. AHash tries to get as good of random data as is available, but often on embedded hardware there is no runtime RNG available and even the address fallback doesn't work well because ASLR is disabled. This is why the compile time RNG support was added. But that has obvious drawbacks.

@ssvb
Copy link

ssvb commented Apr 12, 2022

The 'short_finish' is used as a specialization to accelerate the hashing of code paths of constant size primitives. (And not for any structs or variable length inputs). For example it is used if you are making a map where the key is use an i32. This short_finish code is weaker, but there isn't a good way to DOS attack collisions when the key being used is just a single integer.

Thanks! What does the current algorithm for hashing an i32 key look like and how many arithmetic operations are used? I'm having some difficulties following the logic of the code for this code path and I'm not very fluent in Rust.

Like by calling into Rust code from another language, or by re-implementing?

Re-implementing it. For exactly the same purpose: DoS resistant way of picking a hash table bucket in a performance optimized hash table implementation. Compatibility doesn't matter. I would like to try to use it in Crystal and D languages. Also looks like the current way of hashing integer keys in Python dict is vulnerable, so Python is a possible target too: https://codeforces.com/blog/entry/101817

I tried to benchmark aHash vs. Crystal's default FunnyHash. And aHash looks good (even without the special shortcut for a short key): https://gist.github.com/ssvb/a9f0fc9c8ff9d01a11829dcc7356fb90

Though I'm a bit worried about the last folded multiply done by aHash when hashing an u64 value. We are essentially multiplying the result by a random key, which may be zero if we are very unlucky (or just poor quality even if it's not exactly zero).

@tkaitchuck
Copy link
Owner

We are essentially multiplying the result by a random key, which may be zero if we are very unlucky (or just poor quality even if it's not exactly zero).

It's doing the full 128 bits multiply and using the high order bits, so it isn't poor quality if there are only a few bits set. It is however a problem if the value is exactly zero. (This will result in the other 64 bits not affecting the state of the hash function.) This could be prevented by doing an &1 or similar but that would sacrifice one bit of the input, which would likely be easier to exploit than having a 1/2^64 chance of 8 bytes of the string not factoring into the resulting hash.

Thanks! What does the current algorithm for hashing an i32 key look like and how many arithmetic operations are used? I'm having some difficulties following the logic of the code for this code path and I'm not very fluent in Rust.

No worries. This is a particularly very difficult to read code base due to all the optimization shanagins.

Currently for hashing an i32 using the fallback algorithm, with the accelerated path the hash function is effectively:

fn hash_i32(input: i32, key1: u64, key2: u64) -> u64 {
  let temp = (input ^ key1) as u128  *   6364136223846793005_u128
  let temp2 =  ((temp & 0xffff_ffff_ffff_ffff) as u64) ^ ((temp >> 64) as u64)  // XOR the lower 64 bits with the upper 64 bits.
  return temp2.rotate_left(key2 & 63);
}

So reduces to 2 xors, one multiply, and a rotate. (The ands and shifts all optimize away)

@St4NNi
Copy link

St4NNi commented Aug 11, 2023

Sorry for hijacking this issue, but I was just curious what would be the preferred way to reduce the hash size of the fallback algorithm to a u32 from the u64 when optimizing for space efficiency ? Truncating or an additional XOR between the lower and upper 32 bits ?

@tkaitchuck
Copy link
Owner

@St4NNi Truncation should be fine, as both the upper and lower 32bits of the output are of high quality.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants