Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix incorrectly rejected blocks #569

Merged
merged 4 commits into from Apr 25, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
58 changes: 21 additions & 37 deletions zstd/README.md
Expand Up @@ -386,47 +386,31 @@ In practice this means that concurrency is often limited to utilizing about 3 co

### Benchmarks

These are some examples of performance compared to [datadog cgo library](https://github.com/DataDog/zstd).

The first two are streaming decodes and the last are smaller inputs.


Running on AMD Ryzen 9 3950X 16-Core Processor. AMD64 assembly used.

```
BenchmarkDecoderSilesia-8 3 385000067 ns/op 550.51 MB/s 5498 B/op 8 allocs/op
BenchmarkDecoderSilesiaCgo-8 6 197666567 ns/op 1072.25 MB/s 270672 B/op 8 allocs/op

BenchmarkDecoderEnwik9-8 1 2027001600 ns/op 493.34 MB/s 10496 B/op 18 allocs/op
BenchmarkDecoderEnwik9Cgo-8 2 979499200 ns/op 1020.93 MB/s 270672 B/op 8 allocs/op

Concurrent performance:

BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-16 28915 42469 ns/op 4340.07 MB/s 114 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-16 116505 9965 ns/op 11900.16 MB/s 16 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-16 8952 134272 ns/op 3588.70 MB/s 915 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-16 11820 102538 ns/op 4161.90 MB/s 594 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-16 34782 34184 ns/op 3661.88 MB/s 60 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-16 27712 43447 ns/op 3500.58 MB/s 99 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-16 62826 18750 ns/op 21845.10 MB/s 104 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-16 631545 1794 ns/op 57078.74 MB/s 2 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-16 1690140 712 ns/op 172938.13 MB/s 1 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-16 10432 113593 ns/op 6180.73 MB/s 1143 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/html.zst-16 113206 10671 ns/op 9596.27 MB/s 15 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-16 1530615 779 ns/op 5229.49 MB/s 0 B/op 0 allocs/op

BenchmarkDecoder_DecodeAllParallelCgo/kppkn.gtb.zst-16 65217 16192 ns/op 11383.34 MB/s 46 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/geo.protodata.zst-16 292671 4039 ns/op 29363.19 MB/s 6 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/plrabn12.txt.zst-16 26314 46021 ns/op 10470.43 MB/s 293 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/lcet10.txt.zst-16 33897 34900 ns/op 12227.96 MB/s 205 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/asyoulik.txt.zst-16 104348 11433 ns/op 10949.01 MB/s 20 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/alice29.txt.zst-16 75949 15510 ns/op 9805.60 MB/s 32 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/html_x_4.zst-16 173910 6756 ns/op 60624.29 MB/s 37 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/paper-100k.pdf.zst-16 923076 1339 ns/op 76474.87 MB/s 1 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/fireworks.jpeg.zst-16 922920 1351 ns/op 91102.57 MB/s 2 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/urls.10K.zst-16 27649 43618 ns/op 16096.19 MB/s 407 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/html.zst-16 279073 4160 ns/op 24614.18 MB/s 6 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/comp-data.bin.zst-16 749938 1579 ns/op 2581.71 MB/s 0 B/op 0 allocs/op
BenchmarkDecoderSilesia-32 5 206878840 ns/op 1024.50 MB/s 49808 B/op 43 allocs/op
BenchmarkDecoderEnwik9-32 1 1271809000 ns/op 786.28 MB/s 72048 B/op 52 allocs/op

Concurrent blocks, performance:

BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-32 67356 17857 ns/op 10321.96 MB/s 22.48 pct 102 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-32 266656 4421 ns/op 26823.21 MB/s 11.89 pct 19 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-32 20992 56842 ns/op 8477.17 MB/s 39.90 pct 754 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-32 27456 43932 ns/op 9714.01 MB/s 33.27 pct 524 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-32 78432 15047 ns/op 8319.15 MB/s 40.34 pct 66 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-32 65800 18436 ns/op 8249.63 MB/s 37.75 pct 88 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-32 102993 11523 ns/op 35546.09 MB/s 3.637 pct 143 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-32 1000000 1070 ns/op 95720.98 MB/s 80.53 pct 3 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-32 749802 1752 ns/op 70272.35 MB/s 100.0 pct 5 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-32 22640 52934 ns/op 13263.37 MB/s 26.25 pct 1014 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/html.zst-32 226412 5232 ns/op 19572.27 MB/s 14.49 pct 20 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-32 923041 1276 ns/op 3194.71 MB/s 31.26 pct 0 B/op 0 allocs/op
```

This reflects the performance around May 2020, but this may be out of date.
This reflects the performance around May 2022, but this may be out of date.

## Zstd inside ZIP files

Expand Down
28 changes: 4 additions & 24 deletions zstd/_generate/gen.go
@@ -1,13 +1,10 @@
package main

//go:generate go run gen.go -out seqdec_amd64.s -stubs delme.go -pkg=zstd
//go:generate go run gen.go -out ../seqdec_amd64.s -pkg=zstd

import (
"flag"
"fmt"
"io/ioutil"
"os"
"path/filepath"
"runtime"

_ "github.com/klauspost/compress"
Expand All @@ -31,7 +28,7 @@ const errorMatchLenTooBig = 2
// error reported when mo > t or mo > s.windowSize
const errorMatchOffTooBig = 3

// error reported when the sum of literal lengths exeeceds the literal buffer size
// error reported when the sum of literal lengths exceeds the literal buffer size
const errorNotEnoughLiterals = 4

// error reported when capacity of `out` is too small
Expand All @@ -44,13 +41,6 @@ const seqValsSize = 24

func main() {
flag.Parse()
out := flag.Lookup("out")
os.Remove(filepath.Join("..", out.Value.String()))
stub := flag.Lookup("stubs")
if stub.Value.String() != "" {
os.Remove(stub.Value.String())
defer os.Remove(stub.Value.String())
}

Constraint(buildtags.Not("appengine").ToConstraint())
Constraint(buildtags.Not("noasm").ToConstraint())
Expand Down Expand Up @@ -89,16 +79,6 @@ func main() {
decodeSync.setBMI2(true)
decodeSync.generateProcedure("sequenceDecs_decodeSync_safe_bmi2")
Generate()
b, err := ioutil.ReadFile(out.Value.String())
if err != nil {
panic(err)
}
const readOnly = 0444
err = ioutil.WriteFile(filepath.Join("..", out.Value.String()), b, readOnly)
if err != nil {
panic(err)
}
os.Remove(out.Value.String())
}

func debugval(v Op) {
Expand Down Expand Up @@ -1044,12 +1024,12 @@ func (e executeSimple) executeSingleTriple(c *executeSingleTripleContext, handle
if !e.useSeqs {
Comment("Check if we have enough space in s.out")
{
// baseAfterCopy = ll + ml + c.outBese
// baseAfterCopy = ll + ml + c.outBase
baseAfterCopy := GP64()
LEAQ(Mem{Base: ll, Index: ml, Scale: 1}, baseAfterCopy)
ADDQ(c.outBase, baseAfterCopy)
CMPQ(baseAfterCopy, c.outEndPtr)
JAE(LabelRef("error_not_enough_space"))
JA(LabelRef("error_not_enough_space"))
}
}

Expand Down
6 changes: 5 additions & 1 deletion zstd/blockdec.go
Expand Up @@ -494,11 +494,15 @@ func (b *blockDec) decodeCompressed(hist *history) error {
b.dst = append(b.dst, hist.decoders.literals...)
return nil
}

before := len(hist.decoders.out)
err = hist.decoders.decodeSync(hist.b[hist.ignoreBuffer:])
if err != nil {
return err
}
if hist.decoders.maxSyncLen > 0 {
hist.decoders.maxSyncLen += uint64(before)
hist.decoders.maxSyncLen -= uint64(len(hist.decoders.out))
}
b.dst = hist.decoders.out
hist.recentOffsets = hist.decoders.prevOffset
return nil
Expand Down
7 changes: 4 additions & 3 deletions zstd/decoder_test.go
Expand Up @@ -703,7 +703,7 @@ func TestDecoderRegression(t *testing.T) {
if err != nil {
t.Error(err)
}
got, gotErr := dec.DecodeAll(in, nil)
got, gotErr := dec.DecodeAll(in, make([]byte, 0, len(in)))
t.Log("Received:", len(got), gotErr)

// Check if we got the same:
Expand All @@ -713,7 +713,7 @@ func TestDecoderRegression(t *testing.T) {
return
}
defer decL.Close()
got2, gotErr2 := decL.DecodeAll(in, nil)
got2, gotErr2 := decL.DecodeAll(in, make([]byte, 0, len(in)/2))
t.Log("Fresh Reader received:", len(got2), gotErr2)
if gotErr != gotErr2 {
if gotErr != nil && gotErr2 != nil && gotErr.Error() != gotErr2.Error() {
Expand Down Expand Up @@ -741,7 +741,7 @@ func TestDecoderRegression(t *testing.T) {
if err != nil {
t.Error(err)
}
got, gotErr := dec.DecodeAll(in, nil)
got, gotErr := dec.DecodeAll(in, make([]byte, 0, len(in)))
t.Log("Received:", len(got), gotErr)

// Check a fresh instance
Expand Down Expand Up @@ -1381,6 +1381,7 @@ func BenchmarkDecoder_DecodeAllParallel(b *testing.B) {
}
}
})
b.ReportMetric(100*float64(len(in))/float64(len(got)), "pct")
})
}
}
Expand Down
6 changes: 3 additions & 3 deletions zstd/encoder_test.go
Expand Up @@ -251,12 +251,12 @@ func TestEncoderRegression(t *testing.T) {
t.Error(err)
}
encoded := enc.EncodeAll(in, nil)
got, err := dec.DecodeAll(encoded, nil)
// Usually too small...
got, err := dec.DecodeAll(encoded, make([]byte, 0, len(in)))
if err != nil {
t.Logf("error: %v\nwant: %v\ngot: %v", err, len(in), len(got))
t.Fatal(err)
}

// Use the Writer
var dst bytes.Buffer
enc.ResetContentSize(&dst, int64(len(in)))
Expand All @@ -269,7 +269,7 @@ func TestEncoderRegression(t *testing.T) {
t.Error(err)
}
encoded = dst.Bytes()
got, err = dec.DecodeAll(encoded, nil)
got, err = dec.DecodeAll(encoded, make([]byte, 0, len(in)/2))
if err != nil {
t.Logf("error: %v\nwant: %v\ngot: %v", err, in, got)
t.Error(err)
Expand Down
6 changes: 4 additions & 2 deletions zstd/seqdec_amd64.go
Expand Up @@ -59,10 +59,9 @@ func (s *sequenceDecs) decodeSyncSimple(hist []byte) (bool, error) {
if s.maxSyncLen == 0 && cap(s.out)-len(s.out) < maxCompressedBlockSizeAlloc {
useSafe = true
}
if s.maxSyncLen > 0 && uint64(cap(s.out))-compressedBlockOverAlloc < s.maxSyncLen {
if s.maxSyncLen > 0 && cap(s.out)-len(s.out)-compressedBlockOverAlloc < int(s.maxSyncLen) {
useSafe = true
}

br := s.br

maxBlockSize := maxCompressedBlockSize
Expand Down Expand Up @@ -123,6 +122,9 @@ func (s *sequenceDecs) decodeSyncSimple(hist []byte) (bool, error) {

case errorNotEnoughSpace:
size := ctx.outPosition + ctx.ll + ctx.ml
if debugDecoder {
println("msl:", s.maxSyncLen, "cap", cap(s.out), "bef:", startSize, "sz:", size-startSize, "mbs:", maxBlockSize, "outsz:", cap(s.out)-startSize)
}
return true, fmt.Errorf("output (%d) bigger than max block size (%d)", size-startSize, maxBlockSize)

default:
Expand Down
14 changes: 9 additions & 5 deletions zstd/seqdec_amd64.s
@@ -1,4 +1,4 @@
// Code generated by command: go run gen.go -out seqdec_amd64.s -stubs delme.go -pkg=zstd. DO NOT EDIT.
// Code generated by command: go run gen.go -out ../seqdec_amd64.s -pkg=zstd. DO NOT EDIT.

//go:build !appengine && !noasm && gc && !noasm
// +build !appengine,!noasm,gc,!noasm
Expand Down Expand Up @@ -1667,7 +1667,7 @@ sequenceDecs_decodeSync_amd64_match_len_ofs_ok:
LEAQ (AX)(R13*1), R14
ADDQ R10, R14
CMPQ R14, 32(SP)
JAE error_not_enough_space
JA error_not_enough_space

// Copy literals
TESTQ AX, AX
Expand Down Expand Up @@ -1913,6 +1913,7 @@ error_not_enough_space:
MOVQ CX, 208(AX)
MOVQ 16(SP), CX
MOVQ CX, 216(AX)
MOVQ R12, 136(AX)
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These showed up after re-regenerating.

MOVQ $0x00000005, ret+24(FP)
RET

Expand Down Expand Up @@ -2173,7 +2174,7 @@ sequenceDecs_decodeSync_bmi2_match_len_ofs_ok:
LEAQ (CX)(R13*1), R14
ADDQ R9, R14
CMPQ R14, 32(SP)
JAE error_not_enough_space
JA error_not_enough_space

// Copy literals
TESTQ CX, CX
Expand Down Expand Up @@ -2419,6 +2420,7 @@ error_not_enough_space:
MOVQ CX, 208(AX)
MOVQ 16(SP), CX
MOVQ CX, 216(AX)
MOVQ R11, 136(AX)
MOVQ $0x00000005, ret+24(FP)
RET

Expand Down Expand Up @@ -2701,7 +2703,7 @@ sequenceDecs_decodeSync_safe_amd64_match_len_ofs_ok:
LEAQ (AX)(R13*1), R14
ADDQ R10, R14
CMPQ R14, 32(SP)
JAE error_not_enough_space
JA error_not_enough_space

// Copy literals
TESTQ AX, AX
Expand Down Expand Up @@ -2976,6 +2978,7 @@ error_not_enough_space:
MOVQ CX, 208(AX)
MOVQ 16(SP), CX
MOVQ CX, 216(AX)
MOVQ R12, 136(AX)
MOVQ $0x00000005, ret+24(FP)
RET

Expand Down Expand Up @@ -3236,7 +3239,7 @@ sequenceDecs_decodeSync_safe_bmi2_match_len_ofs_ok:
LEAQ (CX)(R13*1), R14
ADDQ R9, R14
CMPQ R14, 32(SP)
JAE error_not_enough_space
JA error_not_enough_space

// Copy literals
TESTQ CX, CX
Expand Down Expand Up @@ -3511,5 +3514,6 @@ error_not_enough_space:
MOVQ CX, 208(AX)
MOVQ 16(SP), CX
MOVQ CX, 216(AX)
MOVQ R11, 136(AX)
MOVQ $0x00000005, ret+24(FP)
RET
Binary file modified zstd/testdata/comp-crashers.zip
Binary file not shown.