Commit
internal/lz4block: Short literal copying in arm64 decoder
This borrows a trick from the amd64 decoder: copy short literals by always copying 16 bytes, using the space beyond the literal's length as scratch space (but only when we're >=16 bytes from the end of src, dst).

Benchmark results on RPi4B:

name                old speed      new speed      delta
UncompressPg1661-4   101MB/s ± 1%   138MB/s ± 0%  +36.46%  (p=0.000 n=10+9)
UncompressDigits-4   453MB/s ± 0%   525MB/s ± 0%  +15.81%  (p=0.000 n=10+10)
UncompressTwain-4    110MB/s ± 0%   156MB/s ± 0%  +41.42%  (p=0.000 n=10+10)
UncompressRand-4    1.14GB/s ± 1%  1.13GB/s ± 1%     ~     (p=0.075 n=10+10)

Also moved a SUBS to just before the associated branch to allow instruction fusion on ARMs that do that.
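The short-literal trick described above can be sketched in plain Go. This is an illustration only, not the arm64 assembly from this commit: the function name copyLiteral, its signature, and the exact threshold (n <= 16) are assumptions made for the example.

```go
package main

// copyLiteral is a hypothetical helper, not part of the real decoder.
// It copies the n literal bytes at src[si:] to dst[di:] and reports
// whether both ranges were in bounds. si, di and n are assumed to be
// non-negative.
func copyLiteral(dst, src []byte, di, si, n int) bool {
	// Fast path: the literal is short and at least 16 bytes remain in
	// both buffers, so always move a fixed 16 bytes. The bytes written
	// past di+n are scratch; later output overwrites them.
	if n <= 16 && si+16 <= len(src) && di+16 <= len(dst) {
		copy(dst[di:di+16], src[si:si+16])
		return true
	}

	// Slow path: near the end of either buffer, copy exactly n bytes
	// after checking bounds.
	if si+n > len(src) || di+n > len(dst) {
		return false
	}
	copy(dst[di:di+n], src[si:si+n])
	return true
}
```

The caller still advances its output position by n, not by 16, so the over-copied bytes are never treated as decoded data. The fixed-size move avoids a length-dependent copy loop for the common case of short literals, which is consistent with the speedups on text-like inputs and the unchanged result on random data, where short literals are presumably rare.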
Showing 1 changed file with 42 additions and 30 deletions.