add optimized implementations using RISC-V vector intrinsics #1087
Comments
I'm building a list of RVV benchmark results that could be useful for this project. It currently has numbers for the C906 and C910/C920: https://camel-cdr.github.io/rvv-bench-results/ A few performance notes on other processors:
I think the biggest problem with directly porting fixed-size SIMD to RVV is how to choose the LMUL. For SSE and NEON the answer is simple: choose LMUL=1, because the V extension, as required by the application profile, requires a VLEN of at least 128 bits. For AVX2 and AVX-512, it's less trivial. A port that picks a larger LMUL than needed may also run at half the possible speed (see bobcat), because currently most processors (C9*, ocelot, probably any with VLEN=128) dispatch based on LMUL, and not based on the set vl, so a vl=1 operation would be slower with LMUL=2 than with LMUL=1. Ara, on the other hand, dispatches based on the set vl, so a vl=1 LMUL=1 operation is as fast as a vl=1 LMUL=2 operation.
The question is whether to calculate the best LMUL at startup, add the fixed VLEN as a configurable parameter, or add the minimum supported VLEN as a configurable parameter.
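To make the LMUL question concrete, here is a minimal sketch of the arithmetic involved. `choose_lmul` and `EXAMPLE_MIN_VLEN` are hypothetical names made up for illustration; they are not part of SIMDe or of the RVV intrinsics API. The sketch assumes only integer LMULs (1, 2, 4, 8) are considered, and that a build would take the minimum VLEN from the compiler's `__riscv_v_min_vlen` macro when it is defined, falling back to 128 otherwise.

```c
#include <assert.h>

/* Assumption: use the compiler-provided minimum VLEN when available,
 * otherwise fall back to the 128 bits guaranteed by the V extension. */
#if defined(__riscv_v_min_vlen)
#  define EXAMPLE_MIN_VLEN __riscv_v_min_vlen
#else
#  define EXAMPLE_MIN_VLEN 128
#endif

/* Hypothetical helper: returns the smallest integer LMUL (1, 2, 4 or 8)
 * such that a group of LMUL registers, each vlen_bits wide, can hold one
 * fixed-size vector of simd_bits. Returns 0 if even LMUL=8 is too small. */
unsigned choose_lmul(unsigned simd_bits, unsigned vlen_bits) {
    assert(vlen_bits > 0);
    for (unsigned lmul = 1; lmul <= 8; lmul *= 2) {
        if (lmul * vlen_bits >= simd_bits) {
            return lmul;
        }
    }
    return 0;
}
```

With `vlen_bits = 128`, this gives LMUL=1 for SSE/NEON-sized vectors, LMUL=2 for AVX2 and LMUL=4 for AVX-512. On a VLEN=256 part, AVX2 fits in LMUL=1, which is where a build-time minimum VLEN would leave performance on the table on LMUL-dispatching cores compared with a startup check or a fixed-VLEN parameter.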
Thank you very much @camel-cdr for sharing your benchmarking project! I'm also getting the same C908 dev board you are waiting on, and likewise it is expected next week :-)
For others looking to contribute, there is an open application to receive a Kendryte K230 developer board with RVV 1.0 from Canaan: https://docs.google.com/forms/d/e/1FAIpQLSeZ6GBvZynKFm4w7ZRdI_NRyzgVcr4NSxuPZNLZ8__K9Y2WbA/viewform (background information). I'm happy to help you with your application; please contact me directly for that.
Just dropping a few more references here: neon2rvv: https://github.com/howjmay/neon2rvv and, if somebody plans to port vzip with RVV intrinsics: riscv-non-isa/rvv-intrinsic-doc#289 (comment)
The RISC-V Summit talk about the RVV SIMDe paper is online: https://www.youtube.com/watch?v=puvnghbIAV4
(I don't plan on doing this myself, but I wanted to start the conversation to see who is interested in doing this)
What
Use RISC-V vector intrinsics to provide optimized implementations of the existing intrinsics (X86, ARM Neon, MIPS MSA, WASM, etc.) already in SIMD Everywhere.
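As a rough illustration of what such an implementation could look like, here is a sketch of a 4 x int32 add (the shape of `_mm_add_epi32` or `vaddq_s32`) with an RVV path and a portable fallback. The type `example_i32x4` and the function `example_add_epi32` are hypothetical stand-ins, not SIMDe's actual types or internals. The RVV path assumes a compiler that implements the `__riscv_`-prefixed intrinsic names from the recent intrinsics drafts and reports a minimum VLEN of at least 128 bits, so that four 32-bit lanes fit in one register at LMUL=1.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: these macros are how the toolchain advertises RVV support. */
#if defined(__riscv_vector) && defined(__riscv_v_min_vlen) && (__riscv_v_min_vlen >= 128)
#  include <riscv_vector.h>
#  define EXAMPLE_USE_RVV 1
#else
#  define EXAMPLE_USE_RVV 0
#endif

/* Hypothetical stand-in for a 128-bit integer vector. */
typedef struct { int32_t v[4]; } example_i32x4;

/* Lane-wise wrapping add of two 4 x int32 vectors. */
example_i32x4 example_add_epi32(example_i32x4 a, example_i32x4 b) {
    example_i32x4 r;
#if EXAMPLE_USE_RVV
    const size_t vl = 4; /* fixed element count; fits LMUL=1 when VLEN >= 128 */
    vint32m1_t va = __riscv_vle32_v_i32m1(a.v, vl);
    vint32m1_t vb = __riscv_vle32_v_i32m1(b.v, vl);
    __riscv_vse32_v_i32m1(r.v, __riscv_vadd_vv_i32m1(va, vb, vl), vl);
#else
    /* Portable fallback; unsigned arithmetic avoids signed-overflow UB. */
    for (size_t i = 0; i < 4; i++) {
        r.v[i] = (int32_t)((uint32_t)a.v[i] + (uint32_t)b.v[i]);
    }
#endif
    return r;
}
```

Going through memory for every operation is only for clarity here; a real port would presumably keep values in vector registers between operations.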
Existing work
VLEN of 128 bits).
When to start
The vector extensions themselves were ratified in 2021. The intrinsics for using them from C/C++ are nearly ratified (see below), therefore we can start accepting contributions now.
(source)
(source)
Recent draft: https://github.com/riscv-non-isa/rvv-intrinsic-doc/releases/download/draft-20231014-c10de5388709b000ecc4becb0d9ee16baa0141a9/v-intrinsic-spec.pdf (latest drafts)
https://github.com/riscv-non-isa/rvv-intrinsic-doc
Which compilers to test?
Benchmarking
Maybe autovectorization is good enough. Hand-written implementations should be compared both by instruction count and by real-world performance.
Please share any suggestions for publicly available RISC-V Vector 1.0 systems.
https://riscv.org/risc-v-developer-boards/details/
https://www.riscfive.com/risc-v-development-boards/ lists some boards with the V extension, but I can't find a public declaration that any of them follow the 1.0 version of the vector extension.
According to https://doi.org/10.48550/arXiv.2210.08882, the following cores implement v1.0 of the RISC-V Vector Extension: SiFive X280, Andes NX27V, Atrevido 220. Notably for the riscfive.com list of dev boards, the XuanTie 910 core is RVV version 0.7.1.