Reduce normal register loads #263
@larq/compute-engine any idea why the CI isn't running?
Oh, I think we've configured CI to only run on push. Since this is a PR from a fork, it doesn't work.
The |
Now there are different CI failures that are unrelated to the code. The Git checkout is failing for some bizarre reason.
Looks like this might be a problem with the git submodule checkout, though I don't see any failure on master.
I just rebased and the checkout seems fine now. After actions/checkout#173 is merged we should upgrade to the new checkout action, which will hopefully be less flaky.
* Move two CortexM tests from GTest to TFLite testing
  GTest is replaced with the TFLiteMicro testing framework, making it possible to also run these tests on the CortexM. The two tests for which this is done don't use advanced features (parameter sweep) and therefore work equally well without GTest.
* Add two existing tests for the CortexM target
  Added two existing tests for BGEMV and the quantized multiplier to the build_lcem script to run on CortexM. The tests are slightly modified since std::cerr and the C++ random number generator are not available.
* Fix Bazel linter issues
* Fix compilation error with std::round for RISC-V
* Fix issues for RISC-V target
* Improve QEMU check in build script
* Fix error reporting and micro compiler flags
* Fix issue with tflite error reporting

Co-authored-by: Cedric Nugteren <cedric@plumerai.com>
What do these changes do?
This change swaps some of the (normal, non-SIMD) registers used for intermediate results in the ARM64 binary conv kernels, in such a way that reloading several parameters from memory is no longer necessary.
Two additional registers, x14 and x15, which were previously unused, are now used by the kernel. Three ldr memory read instructions have been eliminated.
How Has This Been Tested?
The ARM64 tests pass locally.
Benchmark Results
Benchmark results (μs) on a Raspberry Pi 4 (1 thread) on QuickNet Large: 78828.5 -> 78580.3.
Results are the average of 500 runs, but the two numbers are so close that there probably isn't a meaningful difference here.
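For reference, the size of that gap can be worked out from the two figures above. This is only a quick sketch in Python; the variable names are illustrative and not part of the PR or the benchmark tooling.

```python
# Benchmark averages quoted above, in microseconds (500 runs each).
before_us = 78828.5  # QuickNet Large on a Raspberry Pi 4, 1 thread, before this change
after_us = 78580.3   # same setup, after this change

# Absolute and relative difference between the two averages.
saved_us = before_us - after_us
relative_change = saved_us / before_us

print(f"saved: {saved_us:.1f} us ({relative_change:.2%} of the baseline)")
# prints: saved: 248.2 us (0.31% of the baseline)
```

A change of roughly 0.3% is small enough that run-to-run noise could plausibly account for it, which matches the caution above.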