Add SIMD implementations if/when `std::simd` is stable #83

akhilles · 2023-02-04T22:57:32Z

snakehand · 2023-02-05T13:51:29Z

If you want a significant speedup with SIMD (especially on x86), you should implement the algorithm using carry-less multiplication.

https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf

I can help with pointers on how to do this in practice.
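For readers unfamiliar with the term: a carry-less multiplication is an ordinary shift-and-add multiplication in which the additions are replaced by XOR, which is the same as multiplying two polynomials over GF(2). The sketch below is a portable software version for illustration only. The name `clmul` is hypothetical and not part of crc-rs; on x86 the PCLMULQDQ instruction performs the same operation in hardware.

```rust
/// Carry-less (polynomial) multiplication of two 64-bit operands.
/// Each set bit `i` of `b` contributes `a << i`, and the partial
/// products are combined with XOR instead of addition.
fn clmul(a: u64, b: u64) -> u128 {
    let mut acc = 0u128;
    for i in 0..64u32 {
        if (b >> i) & 1 == 1 {
            acc ^= (a as u128) << i;
        }
    }
    acc
}

// Over GF(2), (x + 1) * (x + 1) = x^2 + 1, so clmul(0b11, 0b11) == 0b101,
// whereas the ordinary integer product 3 * 3 is 9 (0b1001).
```

Because no carries propagate between bit positions, the result of multiplying two 64-bit values always fits in 127 bits.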

KillingSpark · 2023-12-06T09:57:29Z

Hi @snakehand, I've been thinking about this a bit. Sadly, the paper you linked isn't available anymore (at least not under that link, and a quick Google search only turns up a whitepaper about carry-less multiplication for Galois/Counter Mode). Am I right in thinking that the carry-less multiplication would be used somewhat like this? (Ignoring reflection etc. for conciseness.)

fn crc(poly: u64, mut crc: u64, bytes: &[u8]) -> u64 {
    let mut idx = 0;
    while bytes.len() - idx >= 8 {
        let next_data = load_u64(bytes, idx);
        let multiplicated: u128 = carry_less_mult(crc ^ next_data, poly);
        // Question: Are the higher bits of any significance anymore? 
        // They are equivalent to what we shift out / throw away in the "normal" implementations right?
        crc = lower_bits(multiplicated);
        idx += 8;
    }
    // deal with remainder
    crc
}

snakehand · 2023-12-06T14:05:22Z

The document is available here:

https://github.com/tpn/pdfs/blob/master/Fast%20CRC%20Computation%20for%20Generic%20Polynomials%20Using%20PCLMULQDQ%20Instruction%20-%20Intel%20(December%2C%202009).pdf

The speedup comes from applying the carry-less multiplication to bigger data units, and from using Barrett reduction to compute the final, smaller CRC.
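To make the Barrett reduction step concrete, here is a minimal sketch under several assumptions that are not from this thread: the CRC-32 polynomial 0x04C11DB7, no reflection, 32-bit words instead of the wide folding the paper describes, and a software multiply standing in for PCLMULQDQ. All function names are hypothetical. It also bears on the question about the higher bits: the upper half of the first product is what yields the quotient, so it cannot simply be discarded.

```rust
/// Generator polynomial including its x^32 term (33 bits).
const POLY: u64 = 0x1_04C1_1DB7;

/// Software carry-less multiply. Operands here have at most 33 bits
/// and degrees summing to at most 63, so the product fits in 64 bits.
fn clmul64(mut a: u64, mut b: u64) -> u64 {
    let mut acc = 0u64;
    while b != 0 {
        if b & 1 == 1 {
            acc ^= a;
        }
        a <<= 1;
        b >>= 1;
    }
    acc
}

/// Barrett constant mu = floor(x^64 / POLY), by polynomial long division.
fn barrett_mu() -> u64 {
    let mut rem: u128 = 1 << 64;
    let mut mu = 0u64;
    for i in (0..=32u32).rev() {
        if (rem >> (i + 32)) & 1 == 1 {
            mu |= 1 << i;
            rem ^= (POLY as u128) << i;
        }
    }
    mu
}

/// Computes (a * x^32) mod POLY with two carry-less multiplications.
fn reduce(a: u32, mu: u64) -> u32 {
    // The high half of a * mu is the quotient floor(a * x^32 / POLY).
    let q = clmul64(a as u64, mu) >> 32;
    // a * x^32 has zero low bits, so the remainder is the low half of q * POLY.
    clmul64(q, POLY) as u32
}

/// Bit-at-a-time reference, MSB first, no reflection, no final XOR.
fn crc32_bitwise(init: u32, bytes: &[u8]) -> u32 {
    let mut crc = init;
    for &byte in bytes {
        crc ^= (byte as u32) << 24;
        for _ in 0..8 {
            let top_set = crc & 0x8000_0000 != 0;
            crc <<= 1;
            if top_set {
                crc ^= POLY as u32;
            }
        }
    }
    crc
}

/// Same CRC, consuming one big-endian 32-bit word per Barrett reduction.
fn crc32_clmul(init: u32, bytes: &[u8]) -> u32 {
    let mu = barrett_mu();
    let mut crc = init;
    let mut chunks = bytes.chunks_exact(4);
    for chunk in &mut chunks {
        let word = u32::from_be_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
        crc = reduce(crc ^ word, mu);
    }
    // Fewer than four bytes remain; finish them with the reference loop.
    crc32_bitwise(crc, chunks.remainder())
}
```

With the same initial value, `crc32_clmul` and `crc32_bitwise` should return the same result for any input. A real implementation would fold much wider blocks with the hardware instruction and reduce only once at the end, which is where the speedup described above comes from.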

KillingSpark · 2023-12-23T23:35:35Z

Started doing preliminary work here; no SIMD yet, just understanding the algorithm: https://github.com/KillingSpark/crc-rs/tree/clmul

Interestingly, this is ~2x faster than the current table-less implementation, even without any real thought put into optimization and without any SIMD. It might be worth using this even when SIMD instructions aren't available for a specific target.

RiversJin mentioned this issue Jun 1, 2023

Does crc-rs have plans to implement hardware acceleration? #104

Open
