Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix reader getting wrong posting offsets when querying multiple values #7301

Merged
merged 8 commits into from May 1, 2024

Conversation

yeya24
Copy link
Contributor

@yeya24 yeya24 commented Apr 24, 2024

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

Index header reader has 2 methods to get posting offsets.
PostingsOffset receives only a single value and PostingsOffsets receives multiple values. Both methods rely on the same underlying function postingsOffset to iterate over the posting offset table and get the index range.

PostingsOffset is used in Store Gateway for query and PostingsOffsets is only used in lazy postings.

For a given slice of values, we expect that both of them return the same index ranges, using the following code.

// Batched 
rngs, err := reader.PostingsOffsets(name, values...)

// Non Batched
for _, value := range values {
  rng, err := reader.PostingsOffset(name, value)
  // Add rng to a range slice.
}

However, we observed that PostingsOffsets returned wrong index range when lazy expanded postings is enabled and querying multiple values. The bug is caused by https://github.com/thanos-io/thanos/blob/main/pkg/block/indexheader/binary_reader.go#L949 when the current index header reader is pointing to the last entry in the posting offset table but continue iterating further. This might cause invalid index size bug (reproduced in the new unit test I added) or wrong index range (end < start).

Example returned index ranges. The last one {517099476 111} is wrong. Others are either not found range {-1, -1} or normal range with start value (the first one) < end value (the second one)

[{-1 -1} {512174520 515605200} {-1 -1} {-1 -1} {515605208 517099468} {-1 -1} {-1 -1} {517099476 111}]

This will result in wrong cardinality calculation in lazy posting optimization and make the query performance worse. https://github.com/thanos-io/thanos/blob/main/pkg/store/lazy_postings.go#L64

Verification

I added a unit test to reproduce and also a test case to compare results from batched and non batched implementation.

@@ -173,6 +187,17 @@ func TestReaders(t *testing.T) {
testutil.Assert(t, rngs[2].End > rngs[2].Start)
testutil.Equals(t, NotFoundRange, rngs[1])

// 3 values exist and 3 values don't exist.
rngs, err = br.PostingsOffsets("cluster", "a-eu-west-1", "a-us-west-2", "b-eu-west-1", "b-us-east-1", "c-eu-west-1", "c-us-east-2")
testutil.Ok(t, err)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this fix, it will cause error invalid index size.

…ltiple values

Signed-off-by: Ben Ye <benye@amazon.com>
Signed-off-by: Ben Ye <benye@amazon.com>
Signed-off-by: Ben Ye <benye@amazon.com>
Signed-off-by: Ben Ye <benye@amazon.com>
Signed-off-by: Ben Ye <benye@amazon.com>
Signed-off-by: Ben Ye <benye@amazon.com>
Signed-off-by: Ben Ye <benye@amazon.com>
@yeya24 yeya24 force-pushed the fix-indexreader-postingoffset branch from 130ff6a to 61878fc Compare April 27, 2024 05:30
Signed-off-by: Ben Ye <benye@amazon.com>
@yeya24 yeya24 force-pushed the fix-indexreader-postingoffset branch from 61878fc to 7996766 Compare April 27, 2024 06:04
@alanprot
Copy link
Contributor

alanprot commented May 1, 2024

Ok.. good catch.

I had to pull this PR and run the test to see what was going on but now I understand. :P
Good catch.. LGTM.

We could just add an extra validation on this lines as in this case the decoder buff also returned an error:

skipNAndName(&d, &buf)
d.UvarintBytes() // Label value.
postingOffset := int64(d.Uvarint64())

@yeya24
Copy link
Contributor Author

yeya24 commented May 1, 2024

Thanks for the review. I am going to merge this pr since I believe this change is only used in lazy postings.

@yeya24 yeya24 merged commit 1e745af into thanos-io:main May 1, 2024
19 of 20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants