zstd: Replace bytes.Equal with smaller comparisons #695

Merged: klauspost merged 1 commit into klauspost:master on Nov 23, 2022

greatroar (Contributor) commented Nov 22, 2022

This replaces some bytes.Equal calls, which compile down to calls to runtime.memequal, with uint32 or string comparisons that become CMPL instructions (or a series of CMPBs, in one case).
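
To make the two comparison styles concrete, here is a minimal sketch. It is not the code from this PR: the names `magicBytes`, `magicU32`, `hasMagicBytes`, and `hasMagicUint32` are hypothetical, and the zstd frame magic number 0xFD2FB528 is used only as a familiar 4-byte value. The first function compares against a variable slice with bytes.Equal; the second loads the bytes as a little-endian uint32 and compares against a constant, which is the shape the compiler can typically reduce to a single compare instruction.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// magicBytes holds the zstd frame magic number 0xFD2FB528 in little-endian
// byte order. It is a variable, so its contents are not known at compile time.
var magicBytes = []byte{0x28, 0xb5, 0x2f, 0xfd}

// magicU32 is the same magic number as an untyped constant.
const magicU32 = 0xFD2FB528

// hasMagicBytes compares the first four bytes against a variable slice.
func hasMagicBytes(b []byte) bool {
	return len(b) >= 4 && bytes.Equal(b[:4], magicBytes)
}

// hasMagicUint32 reads the first four bytes as a little-endian uint32 and
// compares the result against a constant.
func hasMagicUint32(b []byte) bool {
	return len(b) >= 4 && binary.LittleEndian.Uint32(b) == magicU32
}

func main() {
	frame := []byte{0x28, 0xb5, 0x2f, 0xfd, 0x00}
	fmt.Println(hasMagicBytes(frame), hasMagicUint32(frame))
}
```

Both functions return the same results; any difference is in the generated code, which can be inspected with `go build -gcflags=-S`.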

The code is several dozen instructions shorter. Benchmarks show no regressions:

name                                                  old speed      new speed      delta
Decoder_DecoderSmall/kppkn.gtb.zst/buffered-8          459MB/s ± 0%   458MB/s ± 0%    ~     (p=0.546 n=9+9)
Decoder_DecoderSmall/kppkn.gtb.zst/unbuffered-8        566MB/s ± 3%   568MB/s ± 2%    ~     (p=0.481 n=10+10)
Decoder_DecoderSmall/geo.protodata.zst/buffered-8     1.19GB/s ± 0%  1.19GB/s ± 0%  +0.24%  (p=0.040 n=9+9)
Decoder_DecoderSmall/geo.protodata.zst/unbuffered-8    831MB/s ± 2%   832MB/s ± 2%    ~     (p=0.853 n=10+10)
Decoder_DecoderSmall/plrabn12.txt.zst/buffered-8       349MB/s ± 3%   352MB/s ± 0%    ~     (p=0.075 n=9+10)
Decoder_DecoderSmall/plrabn12.txt.zst/unbuffered-8     529MB/s ± 6%   534MB/s ± 7%    ~     (p=0.796 n=10+10)
Decoder_DecoderSmall/lcet10.txt.zst/buffered-8         421MB/s ± 1%   422MB/s ± 0%  +0.32%  (p=0.029 n=9+8)
Decoder_DecoderSmall/lcet10.txt.zst/unbuffered-8       608MB/s ± 2%   606MB/s ± 9%    ~     (p=0.897 n=8+10)
Decoder_DecoderSmall/asyoulik.txt.zst/buffered-8       367MB/s ± 0%   362MB/s ± 5%    ~     (p=0.388 n=10+9)
Decoder_DecoderSmall/asyoulik.txt.zst/unbuffered-8     454MB/s ± 4%   449MB/s ± 3%    ~     (p=0.353 n=10+10)
Decoder_DecoderSmall/alice29.txt.zst/buffered-8        360MB/s ± 0%   361MB/s ± 0%  +0.18%  (p=0.001 n=10+8)
Decoder_DecoderSmall/alice29.txt.zst/unbuffered-8      572MB/s ± 4%   569MB/s ± 8%    ~     (p=0.842 n=10+9)
Decoder_DecoderSmall/html_x_4.zst/buffered-8          2.46GB/s ± 0%  2.46GB/s ± 1%    ~     (p=0.497 n=9+10)
Decoder_DecoderSmall/html_x_4.zst/unbuffered-8        1.65GB/s ± 5%  1.67GB/s ± 6%    ~     (p=0.481 n=10+10)
Decoder_DecoderSmall/paper-100k.pdf.zst/buffered-8    3.87GB/s ± 1%  3.87GB/s ± 0%    ~     (p=0.353 n=10+10)
Decoder_DecoderSmall/paper-100k.pdf.zst/unbuffered-8  1.85GB/s ± 4%  1.84GB/s ± 7%    ~     (p=0.529 n=10+10)
Decoder_DecoderSmall/fireworks.jpeg.zst/buffered-8    8.62GB/s ± 1%  8.63GB/s ± 1%    ~     (p=0.529 n=10+10)
Decoder_DecoderSmall/fireworks.jpeg.zst/unbuffered-8  3.34GB/s ± 2%  3.34GB/s ± 3%    ~     (p=0.796 n=10+10)
Decoder_DecoderSmall/urls.10K.zst/buffered-8           587MB/s ± 1%   586MB/s ± 1%    ~     (p=0.589 n=10+9)
Decoder_DecoderSmall/urls.10K.zst/unbuffered-8         875MB/s ± 4%   877MB/s ± 2%    ~     (p=0.661 n=9+10)
Decoder_DecoderSmall/html.zst/buffered-8               962MB/s ± 0%   961MB/s ± 0%    ~     (p=0.368 n=10+9)
Decoder_DecoderSmall/html.zst/unbuffered-8             711MB/s ± 3%   709MB/s ± 1%    ~     (p=0.965 n=10+8)
Decoder_DecoderSmall/comp-data.bin.zst/buffered-8      407MB/s ± 1%   407MB/s ± 0%    ~     (p=0.684 n=10+10)
Decoder_DecoderSmall/comp-data.bin.zst/unbuffered-8    152MB/s ± 4%   152MB/s ± 3%    ~     (p=0.971 n=10+10)

greatroar changed the title from "zstd: Replace bytes.Equal with faster comparisons" to "zstd: Replace bytes.Equal with smaller comparisons" on Nov 23, 2022

klauspost (Owner) left a comment

bytes.Equal does a string conversion internally, so this is only a cosmetic change.

But it looks better.

klauspost merged commit 4af4108 into klauspost:master Nov 23, 2022
greatroar deleted the compare branch November 23, 2022 09:29

greatroar (Contributor, Author) commented Nov 23, 2022

Thanks. I see you've changed the commit message, but I assure you this is not a purely cosmetic change.

The issue isn't really string == vs bytes.Equal, it's var vs const. If you compile the following file...

package comparisons

import "bytes"

const sc = "abcd"

var sv = "abcd"

var bv = []byte("abcd")

func eqSc(p []byte) bool { return string(p[:4]) == sc }
func eqSv(p []byte) bool { return string(p[:4]) == sv }
func eqBv(p []byte) bool { return bytes.Equal(p[:4], bv) }

then you'll find that eqSv and eqBv compile to practically identical code: a bounds check and a call to runtime.memequal. But eqSc compiles to a bounds check and

CMPL    (AX), $1684234849 // AX = p_data, immediate = "abcd" interpreted as little-endian

because the compiler knows the length of both arguments and the value of the latter.
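
The immediate in that instruction can be checked by hand. The sketch below (the helper name `immediateFor` is hypothetical) reads the four bytes of "abcd" as a little-endian uint32, which is the value a 4-byte load from the slice produces on amd64.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// immediateFor returns the value obtained by reading the first four bytes of
// s as a little-endian uint32. It panics if s is shorter than four bytes.
func immediateFor(s string) uint32 {
	return binary.LittleEndian.Uint32([]byte(s))
}

func main() {
	v := immediateFor("abcd")
	fmt.Println(v)          // 1684234849
	fmt.Printf("%#x\n", v)  // 0x64636261: 'd', 'c', 'b', 'a' from high to low byte
}
```

To see the generated code for the file above, one option is `go build -gcflags=-S`, which prints the assembly for each function.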

klauspost (Owner) commented

@greatroar

I tried comparing only the skippableFrameMagic (in (*Header).Decode), just out of curiosity. Both versions resulted in a runtime.memequal call on go version go1.19.2 windows/amd64, perhaps because that magic is 3 bytes rather than 4, so there is no shortcut.

The functions are typically called once per decode (per frame), so I don't expect any measurable difference either way. I do think the code is nicer (except the "\x2a\x4d\x18", but the rest makes up for it).
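
For the 3-byte case, one possible alternative is to fold the check into a masked 4-byte load. This is only a sketch under stated assumptions, not what the PR does: it assumes the zstd skippable-frame magic has the form 0x184D2A5? (little-endian, low nibble free), and the names `skippableMagic`, `isSkippableString`, and `isSkippableUint32` are hypothetical. Whether the second form generates better code on a given compiler version has not been measured here.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// skippableMagic is the 3-byte string quoted in the discussion above.
const skippableMagic = "\x2a\x4d\x18"

// isSkippableString checks the high nibble of the first byte and compares the
// remaining three bytes against a constant string.
func isSkippableString(b []byte) bool {
	return len(b) >= 4 && b[0]&0xf0 == 0x50 && string(b[1:4]) == skippableMagic
}

// isSkippableUint32 performs the same check with one little-endian 4-byte
// load, masking off the low nibble that may take any value.
func isSkippableUint32(b []byte) bool {
	return len(b) >= 4 && binary.LittleEndian.Uint32(b)&0xfffffff0 == 0x184d2a50
}

func main() {
	frame := []byte{0x5f, 0x2a, 0x4d, 0x18}
	fmt.Println(isSkippableString(frame), isSkippableUint32(frame))
}
```

Both helpers accept any first byte from 0x50 to 0x5f followed by the three magic bytes, and reject everything else.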

kodiakhq bot pushed a commit to cloudquery/cloudquery that referenced this pull request Feb 1, 2023
…7575)

This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [github.com/klauspost/compress](https://togithub.com/klauspost/compress) | indirect | patch | `v1.15.11` -> `v1.15.15` |

---

### Release Notes

<details>
<summary>klauspost/compress</summary>

### [`v1.15.15`](https://togithub.com/klauspost/compress/releases/tag/v1.15.15)

[Compare Source](https://togithub.com/klauspost/compress/compare/v1.15.14...v1.15.15)

#### What's Changed

-   zstd: Add delta encoding support by [@greatroar](https://togithub.com/greatroar) in klauspost/compress#728
-   huff0: Reduce bounds checking by [@greatroar](https://togithub.com/greatroar) in klauspost/compress#734
-   huff0: Assembler improvements by [@greatroar](https://togithub.com/greatroar) in klauspost/compress#736
-   deflate: Improve level 7-9 by [@klauspost](https://togithub.com/klauspost) in klauspost/compress#739
-   gzhttp: Add SuffixETag() and DropETag() options to prevent ETag collisions on compressed responses by [@willbicks](https://togithub.com/willbicks) in klauspost/compress#740
-   zstd: Don't allocate dataStorage when using byteBuf by [@klauspost](https://togithub.com/klauspost) in klauspost/compress#741
-   huff0: Speed up compression of short blocks by [@greatroar](https://togithub.com/greatroar) in klauspost/compress#744
-   zstd: Handle dicts by pointer, always by [@greatroar](https://togithub.com/greatroar) in klauspost/compress#743
-   fse: Optimize compression by [@greatroar](https://togithub.com/greatroar) in klauspost/compress#745
-   Retract v1.14.1-v.1.14.3 by [@klauspost](https://togithub.com/klauspost) in klauspost/compress#742

#### New Contributors

-   [@willbicks](https://togithub.com/willbicks) made their first contribution in klauspost/compress#740

**Full Changelog**: klauspost/compress@v1.15.14...v1.15.15

### [`v1.15.14`](https://togithub.com/klauspost/compress/releases/tag/v1.15.14)

[Compare Source](https://togithub.com/klauspost/compress/compare/v1.15.13...v1.15.14)

#### What's Changed

-   flate: Improve speed in big stateless blocks. by [@klauspost](https://togithub.com/klauspost) in klauspost/compress#718
-   zstd: Trigger BCE by switching on lengths by [@greatroar](https://togithub.com/greatroar) in klauspost/compress#716
-   zstd: Shave some instructions off the amd64 asm by [@greatroar](https://togithub.com/greatroar) in klauspost/compress#720
-   export NoGzipResponseWriter for custom ResponseWriter wrappers by [@harshavardhana](https://togithub.com/harshavardhana) in klauspost/compress#722
-   s2: Add example for indexing and existing stream by [@klauspost](https://togithub.com/klauspost) in klauspost/compress#723
-   tests: Tweak fuzz tests by [@klauspost](https://togithub.com/klauspost) in klauspost/compress#719

#### New Contributors

-   [@harshavardhana](https://togithub.com/harshavardhana) made their first contribution in klauspost/compress#722

**Full Changelog**: klauspost/compress@v1.15.13...v1.15.14

### [`v1.15.13`](https://togithub.com/klauspost/compress/releases/tag/v1.15.13)

[Compare Source](https://togithub.com/klauspost/compress/compare/v1.15.12...v1.15.13)

#### What's Changed

-   zstd: Add MaxEncodedSize to encoder by [@klauspost](https://togithub.com/klauspost) in klauspost/compress#691
-   zstd: Improve "best" end search by [@klauspost](https://togithub.com/klauspost) in klauspost/compress#693
-   zstd: Replace bytes.Equal with smaller comparisons by [@greatroar](https://togithub.com/greatroar) in klauspost/compress#695
-   zstd: Faster CRC checking/skipping by [@greatroar](https://togithub.com/greatroar) in klauspost/compress#696
-   zstd: Rewrite matchLen to make it inlineable by [@greatroar](https://togithub.com/greatroar) in klauspost/compress#701
-   zstd: Write table clearing in a way that the compiler recognizes by [@greatroar](https://togithub.com/greatroar) in klauspost/compress#702
-   zstd: Use individual reset threshold by [@klauspost](https://togithub.com/klauspost) in klauspost/compress#703
-   huff0: Check for zeros earlier in Scratch.countSimple by [@greatroar](https://togithub.com/greatroar) in klauspost/compress#704
-   zstd: Improve best compression's match selection by [@greatroar](https://togithub.com/greatroar) in klauspost/compress#705
-   zstd: Select best match using selection trees by [@greatroar](https://togithub.com/greatroar) in klauspost/compress#706
-   zstd: sync xxhash with final accepted patch upstream by [@lizthegrey](https://togithub.com/lizthegrey) in klauspost/compress#707
-   zstd: Import xxhash v2.2.0 by [@greatroar](https://togithub.com/greatroar) in klauspost/compress#708

**Full Changelog**: klauspost/compress@v1.15.12...v1.15.13

### [`v1.15.12`](https://togithub.com/klauspost/compress/releases/tag/v1.15.12)

[Compare Source](https://togithub.com/klauspost/compress/compare/v1.15.11...v1.15.12)

#### What's Changed

-   zstd: Tweak decoder allocs. by [@klauspost](https://togithub.com/klauspost) in klauspost/compress#680
-   gzhttp: Always delete `HeaderNoCompression` by [@klauspost](https://togithub.com/klauspost) in klauspost/compress#683

**Full Changelog**: klauspost/compress@v1.15.11...v1.15.12

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "before 3am on the first day of the month" (UTC), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://togithub.com/renovatebot/renovate).
kodiakhq bot pushed a commit to cloudquery/filetypes that referenced this pull request Mar 1, 2023
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [github.com/klauspost/compress](https://togithub.com/klauspost/compress) | indirect | minor | `v1.15.11` -> `v1.16.0` |

---

### ⚠ Dependency Lookup Warnings ⚠

Warnings were logged while processing this repo. Please check the Dependency Dashboard for more information.

---

### Release Notes

<details>
<summary>klauspost/compress</summary>

### [`v1.16.0`](https://togithub.com/klauspost/compress/releases/tag/v1.16.0)

[Compare Source](https://togithub.com/klauspost/compress/compare/v1.15.15...v1.16.0)

#### What's Changed

-   s2: Add Dictionary support by [@klauspost](https://togithub.com/klauspost) in klauspost/compress#685
-   s2: Add Compression Size Estimate by [@klauspost](https://togithub.com/klauspost) in klauspost/compress#752
-   s2: Add support for custom stream encoder by [@klauspost](https://togithub.com/klauspost) in klauspost/compress#755
-   s2: Add LZ4 block converter by [@klauspost](https://togithub.com/klauspost) in klauspost/compress#748
-   s2: Support io.ReaderAt in ReadSeeker by [@klauspost](https://togithub.com/klauspost) in klauspost/compress#747
-   s2c/s2sx: Use concurrent decoding by [@klauspost](https://togithub.com/klauspost) in klauspost/compress#746
-   tests: Upgrade to Go 1.20 by [@klauspost](https://togithub.com/klauspost) in klauspost/compress#749
-   Update all (command) dependencies by [@klauspost](https://togithub.com/klauspost) in klauspost/compress#758

**Full Changelog**: klauspost/compress@v1.15.15...v1.16.0

</details>
