Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

s2: non Indexed Seek may seek to the wrong offset #632

Closed
JustinAzoff opened this issue Jun 28, 2022 · 2 comments · Fixed by #633
Closed

s2: non Indexed Seek may seek to the wrong offset #632

JustinAzoff opened this issue Jun 28, 2022 · 2 comments · Fixed by #633

Comments

@JustinAzoff
Copy link

Hi!

I have noticed an issue where the non-indexed Seek code may skip to the wrong offset depending on what parameters are used. I'm not quite sure what is wrong yet, but I was able to work out a test case that shows the issue well.

This writes 1,000,000 25 byte records, then attempts to seek through them using different skip lengths. The part that made this hard to track down is that for small skip sizes it works properly, but large sizes return invalid data, or try seeking too far and trigger unexpected EOF.

$ go test -v      
=== RUN   TestSeeking
=== RUN   TestSeeking/skip=100
=== RUN   TestSeeking/skip=1000
=== RUN   TestSeeking/skip=10000
=== RUN   TestSeeking/skip=20000
=== RUN   TestSeeking/skip=100000
    main_test.go:64: Expected "Item 0000000000000200000\n", got "tem 0000000000000241943\nI"
=== RUN   TestSeeking/skip=200000
    main_test.go:64: Expected "Item 0000000000000400000\n", got "m 0000000000000525829\nIte"
=== RUN   TestSeeking/skip=400000
    main_test.go:56: Failed to seek: unexpected EOF
--- FAIL: TestSeeking (0.21s)
    --- PASS: TestSeeking/skip=100 (0.01s)
    --- PASS: TestSeeking/skip=1000 (0.01s)
    --- PASS: TestSeeking/skip=10000 (0.01s)
    --- PASS: TestSeeking/skip=20000 (0.01s)
    --- FAIL: TestSeeking/skip=100000 (0.00s)
    --- FAIL: TestSeeking/skip=200000 (0.00s)
    --- FAIL: TestSeeking/skip=400000 (0.00s)
import (
        "fmt"
        "io"
        "os"
        "path/filepath"
        "testing"

        "github.com/klauspost/compress/s2"
)

func TestSeeking(t *testing.T) {
        d := t.TempDir()
        fn := filepath.Join(d, "test.s2")

        f, err := os.Create(fn)
        if err != nil {
                t.Fatal(err)
        }

        enc := s2.NewWriter(f)

        //24 bytes per item plus \n = 25 bytes per record
        for i := 0; i < 1_000_000; i++ {
                fmt.Fprintf(enc, "Item %019d\n", i)
        }

        err = enc.Close()
        if err != nil {
                t.Fatal(err)
        }

        err = f.Close()
        if err != nil {
                t.Fatal(err)
        }

        for _, skip := range []int{100, 1_000, 10_000, 20_000, 100_000, 200_000, 400_000} {
                t.Run(fmt.Sprintf("skip=%d", skip), func(t *testing.T) {
                        f, err = os.Open(fn)
                        if err != nil {
                                t.Fatal(err)
                        }
                        defer f.Close()
                        dec := s2.NewReader(f)
                        seeker := s2.ReadSeeker{
                                Reader: dec,
                        }
                        buf := make([]byte, 25)
                        for rec := 0; rec < 1_000_000; rec += skip {
                                offset := int64(rec * 25)
                                //t.Logf("Reading record %d", rec)
                                _, err := seeker.Seek(offset, io.SeekStart)
                                if err != nil {
                                        t.Fatalf("Failed to seek: %v", err)
                                }
                                _, err = dec.Read(buf)
                                if err != nil {
                                        t.Fatalf("Failed to seek: %v", err)
                                }
                                expected := fmt.Sprintf("Item %019d\n", rec)
                                if string(buf) != expected {
                                        t.Fatalf("Expected %q, got %q", expected, buf)
                                }
                        }
                })
        }

}
@klauspost
Copy link
Owner

Thanks for the reproducer. I will take a look!

klauspost added a commit that referenced this issue Jun 29, 2022
Un-indexed forward seeks does not preserve correct absolute offset when fully skipping blocks.

If stream is seekable, but no index is provided, allow non-random seeking.

Fixes #632
@klauspost
Copy link
Owner

@JustinAzoff Thanks for the report again.

I have added a fix in #633 - I added a slightly modified version of your reproducer as a test.

I will release a new version shortly.

klauspost added a commit that referenced this issue Jun 29, 2022
* s2: Fix absolute forward seeks

Un-indexed forward seeks does not preserve correct absolute offset when fully skipping blocks.

If stream is seekable, but no index is provided, allow non-random seeking.

Fixes #632
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants