Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider runtime detecting AES on x86 or using CRC #104

Open
CryZe opened this issue Nov 17, 2021 · 9 comments
Open

Consider runtime detecting AES on x86 or using CRC #104

CryZe opened this issue Nov 17, 2021 · 9 comments

Comments

@CryZe
Copy link

CryZe commented Nov 17, 2021

They recently defined microarchitecture levels for x86-64 (Wikipedia) but apparently decided against including AES in any of them. So it seems like applications may in the future start targeting these levels. Especially on Windows 11, which requires fairly recent CPUs, you can absolutely start using x86-64-v3. But as this does not include AES, ahash is still going to use the fallback algorithm. So with these levels being defined the way they are, I don't see ahash ever realistically using AES in an application that isn't just meant for self consumption (where you can use target-cpu=native). So it probably makes sense to look into runtime detection of the feature or potentially introducing a fallback to CRC, which is included in v2. Supposedly CRC is actually quite suitable for hashing: https://twitter.com/pcwalton/status/1460782519733788673

@tkaitchuck
Copy link
Owner

I am planning a refactoring of the way the the hasher is instantiated which I intended for use with specialization. But I suppose it could also be used for runtime feature detection.
Is there a specific API that can be used to test for AES?

I had previously looked into CRC but dismissed it because it required 64bit and SSE4.2 which virtually guarantees that AES is available, and while low latency, it only operates on 32bits which limits throughput somewhat.

@itamarst
Copy link

itamarst commented Dec 9, 2021

Maybe you can detect AES with multiversion crate? Docs imply something like this might work:

#[target("x86_64]+aes")]
fn yourfunc() {}

At some level (maybe fairly low down, but somewhere) it prevents inlining, though, it sounds like.

@ssvb
Copy link

ssvb commented Feb 18, 2022

I had previously looked into CRC but dismissed it because it required 64bit and SSE4.2 which virtually guarantees that AES is available

There seem to be a lot of exceptions in the line-up of Intel processors: https://en.wikipedia.org/wiki/AES_instruction_set#Intel
I have an old Core-i7 Nehalem processor with CRC support, but not AES.

@tkaitchuck
Copy link
Owner

Yes. Nehalem is the unlucky one in the middle. Core, and Penryn have neither but Westmere and Sandy Bridge have both. So adding support for that is only a 1 year window of CPU manufacturing.

The bigger problem is that it's not a great mixer. I don't know how to use it to outperform folded multiply which it what it would use in the absence of AES instruction now.

@ssvb
Copy link

ssvb commented Nov 1, 2022

Westmere and Sandy Bridge have both

Budget variants of Sandy Bridge (labelled as "Core i3", "Pentium" or "Celeron") didn't have AES support. And Pentium G3420 (also without AES support) from the Haswell family is still being manufactured even today. Based on what I heard, cheap low end processors may sell in larger volumes than their expensive high end siblings, so I'm not sure what is the realistic percentage of processors in actual use in the hands of real users that don't have AES support right now.

@ssvb
Copy link

ssvb commented Nov 1, 2022

The bigger problem is that it's not a great mixer. I don't know how to use it to outperform folded multiply which it what it would use in the absence of AES instruction now.

I'm a little bit concerned about the folded multiply, because it ruins everything if the random secret key happens to be zero. Some non-zero secret keys may result in a poor hashing quality too (so it's not just a 1 in 2^64 chance). I mentioned this earlier in another ticket too.

@tkaitchuck
Copy link
Owner

I'm a little bit concerned about the folded multiply, because it ruins everything if the random secret key happens to be zero.

Sortof. It's used in a more complicated way. Two blocks 64bit are read in, xored with two keys and then the resulting values are folded multiplied together. Then the result is combined with the internal state. So if for any block of the input the plaintext is the same as the secret key then the adjacent block's value will no effect on the output. This means that if you can guess either the first or second 64 bits of the key you can produce an unlimited number of hash collisions because the adjacent 64 bits are effectively ignored. This is not likely to cause a problem for random large strings, and is too improbable to brute force. However if there were some other vulnerability, it could be used to turn it into an attack.

A 'poor quality' update you allude to in the case of a non-zero value is not a concern. To see why, suppose for example that by chance all but one bit in one of the numbers was zero. Then the output would be the other input rotated to start at the position of the one set bit. So even with only one bit set every bit in the output can be affected. This resulting value is combined with the state in a way such that even inputs with a fixed differential cannot be canceled later. (IE: A small difference in one block can't be undone with a small difference in a later block.) Once all the blocks have been combined the finalizer should mix the state well.

If there is a better option, I would be interested in trying it. Looking at _mm_crc32_u64, the best option would be to do something like:

let buffer: [u32;2] = self.buffer.convert();
let buffer[0] = crc(buffer[0], new_data + key1);
let buffer[1] = crc(buffer[1], new_data + key1);
self.buffer = buffer.convert();

The problems with this are:

  • Getting a 64 bit state requires a lot of awkward manipulation (not shown above) because the instruction takes a 64 bit value but only uses 32 bits of it.
  • The quality is terrible. - So additional mixing beyond this is needed
  • Even this version is more than a factor of 2x slower the current fallback.
  • Due to the nature of CRC reordering of blocks causes collisions. (IE: Hash(A, B, C, D) == Hash(C, D, A, B)) - This means the keys used to mask the input need to change between each block in a non-predictable way.

Is there a better way to go about this? From my perspective the quality issues could be fixed but getting the performance on par is probably impossible because it costs the same as a multiply but only works with 32bits of state.

@tkaitchuck
Copy link
Owner

For reference if folded multiply is not available it now uses this:
https://github.com/tkaitchuck/aHash/blob/master/src/operations.rs#L20
Which is 2 multiplies and 5 other single cycle instructions several of which can use instruction level parallelism.

So at the very least any CPU specific version, will need to out perform this most generic version.

@TheIronBorn
Copy link

Is there a specific API that can be used to test for AES?

is is_x86_feature_detected not good enough?

At the very least changing the documentation to urge users to do runtime detection would be good

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants