New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RVV port for Base64 procedures #380
Comments
Ah, you beat me to it :-) I was planning to start on that as well. For encode I was planning to do something similar to the haswell implementation, that's basically what you did. For decode, I'd try to use vcompress instead of vrgather as the last step in pack, it's the less complex operation. |
You have to take into account that we need to store data in the big-endian order, this is why I used gather. Alternative is to swap 4-byte inputs when Zvbb extension is available. |
I thought you could do the endianess swap with the initial vrgather, but now that I think about it a bit more that doesn't line up the bits properly. |
@WojciechMula I tried rearranging the shifts to create the big endian result, that could be compressed: // in32: [00dddDDD|00cccCCC|00bbbBBB|00aaaAAA]
// d: [00dddDDD|00000000|00000000|00000000] i&
// c1: [0000dddD|DD00cccC|000000bb|bBBB00aa] i>>2,4 (i as 2x16)
// c: [00000000|0000cccC|000000bb|00000000] c1&
// ca: [CC000000|00000000|aaaAAA00|00000000] i<<14,10 (i as 2x16)
// b1: [cCCC00bb|bBBB00aa|aAAA0000|00000000] i<<12
// b: [00000000|bBBB0000|00000000|00000000] b1&
// dcba: [CCdddDDD|bBBBcccC|aaaAAAbb|00000000] d|c|ca|b This even requires one fewer shift, but reinterprets the input as 16 bit elements for two shifts. // outside of looop:
const size_t vl16m4 = __riscv_vsetvlmax_e16m4();
const vuint16m4_t v16_2_4 = __riscv_vreinterpret_v_u32m4_u16m4(__riscv_vmv_v_x_u32m4(0x000400002, vl16m4));
const vuint16m4_t v16_14_10 = __riscv_vreinterpret_v_u32m4_u16m4(__riscv_vmv_v_x_u32m4(0x000a0000e, vl16m4));
const size_t vl8m4 = __riscv_vsetvlmax_e8m4();
vbool2_t mcompress = __riscv_vmsne_vx_u8m4_b2(__riscv_vand_vx_u8m4(__riscv_vid_v_u8m4(vl8m2), 3, vl8m4), 3, vl8m4);
...
const vuint8m4_t in8 = ...;
// in32: [00dddDDD|00cccCCC|00bbbBBB|00aaaAAA]
const vuint32m4_t in32 = __riscv_vreinterpret_v_u8m4_u32m4(in8);
const vuint16m4_t in16 = __riscv_vreinterpret_v_u8m4_u16m4(in8);
const size_t vl = __riscv_vsetvlmax_e32m4();
// d: [00dddDDD|00000000|00000000|00000000]
const vuint32m4_t d = __riscv_vand_vx_u32m4(in32, 0x3f000000, vl);
// c1: [0000dddD|DD00cccC|000000bb|bBBB00aa]
const vuint32m4_t c1 = __riscv_vreinterpret_v_u16m4_u32m4(__riscv_vsrl_vx_u16m4(in16, v16_2_4, vl));
// c: [00000000|0000cccC|000000bb|00000000]
const vuint32m4_t c = __riscv_vand_vx_u32m4(c, 0x000f0300, vl);
// ca: [CC000000|00000000|aaaAAA00|00000000]
const vuint32m4_t ca = __riscv_vreinterpret_v_u16m4_u32m4(__riscv_vsll_vx_u16m4(in16, v16_14_10, vl));
// b1: [cCCC00bb|bBBB00aa|aAAA0000|00000000]
const vuint32m4_t b1 = __riscv_vsll_vx_u32m4(b1, 12, vl);
// b: [00000000|bBBB0000|00000000|00000000]
const vuint32m4_t b = __riscv_vand_vx_u32m4(b1, 0x00f00000, vl);
// dcba: [CCdddDDD|bBBBcccC|aaaAAAbb|00000000]
const vuint32m4_t dcba = __riscv_vor_vv_u32m4(__riscv_vor_vv_u32m4(d, c, vl); __riscv_vor_vv_u32m4(ca, b, vl), vl);
// pack 3 byte-groups into continous array
return __riscv_vcompress_vm_u8m4(__riscv_vreinterpret_v_u32m4_u8m4(abcd), mcompress, vl); |
I got inspired by @camel-cdr :) and started to work on base64 procedures. The undergoing work is done in my repo https://github.com/WojciechMula/base64simd.
The text was updated successfully, but these errors were encountered: