Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OR operation results in corrupt bitmaps #358

Closed
richardartoul opened this issue Jun 10, 2022 · 15 comments
Closed

OR operation results in corrupt bitmaps #358

richardartoul opened this issue Jun 10, 2022 · 15 comments

Comments

@richardartoul
Copy link
Contributor

richardartoul commented Jun 10, 2022

Ran into this problem. Seems to be rare and possibly related to range containers? But not sure how to debug it.

func TestCorruption(t *testing.T) {
	bm1Raw := "OjAAAAEAAAAAAAoAEAAAANeU2JTZlNqU25TclN2U3pTflOCU4ZQ="
	bm2Raw := "OzADAA8AAPT/AQD//wIA//8DAD8NJQAAAC8AAAA1AAAAOwAAAAIAAADWlOKUHWsBAAAA//8BAAAA//8BAAAAPw0="
	bm1Decoded, err := base64.StdEncoding.DecodeString(bm1Raw)
	require.NoError(t, err)
	bm2Decoded, err := base64.StdEncoding.DecodeString(bm2Raw)
	require.NoError(t, err)

	bm1 := roaring.New()
	_, err = bm1.FromBuffer(bm1Decoded)
	require.NoError(t, err)

	ok, _ := roundTripRoaring("bm1", bm1)
	require.True(t, ok)

	bm2 := roaring.New()
	_, err = bm2.FromBuffer(bm2Decoded)
	require.NoError(t, err)

	ok, _ = roundTripRoaring("bm2", bm2)
	require.True(t, ok)

	fmt.Println("bm1", bm1.String())
	fmt.Println("bm2", bm1.String())

	bm1.Or(bm2)

        // Panics.
	// fmt.Println("OR output", bm1.String())

	ok, _ = roundTripRoaring("output", bm1)
	require.True(t, ok) // Fails
}

func roundTripRoaring(name string, b *roaring.Bitmap) (bool, []byte) {
	marshaled, err := b.ToBytes()
	if err != nil {
		return false, nil
	}
	p, err := roaring.New().FromBuffer(marshaled)
	if err != nil {
		return false, nil
	}
	if p != int64(len(marshaled)) {
		return false, nil
	}
	return true, marshaled
}
@richardartoul
Copy link
Contributor Author

@lemire I'll keep digging, but if you have an idea let me know.

@richardartoul
Copy link
Contributor Author

@lemire That is unclear from the documentation, but I dont think thats quite right. If you check the implementations, FromBuffer and ReadFrom, and UnmarshalBinary all have almost the exact same implementation. They seem to be mostly convenience wrappers for slightly different APIs (maybe you were thinking of frozenview?)

I also tried updating my reproducer to use ReadFrom and it fails the same way. Similarly, I've already tried using Or(bm1, bm2) and that also fails as well so I think there is a real bug here.

@richardartoul
Copy link
Contributor Author

@lemire Also I should add that I believe this bug is a regression as I only discovered it while testing the upgrade from 0.5.5 to 1.1 in our production shadows. Reverting to 0.5.5 immediately made the issue go away.

... which makes me realize we may be able to narrow down the issue with a git bisect.

@richardartoul
Copy link
Contributor Author

richardartoul commented Jun 11, 2022

@lemire ok i did a quick manual git bisect and I can say with confidence the issue was introduced in this P.R: #312

Before this commit this reproducer behaves properly, afterwards it fails.

EDIT1: Specifically the implementation change to orArray

@richardartoul
Copy link
Contributor Author

Ok so I debugged it further and I think the problem is ~ something like:

  1. (rc *runContainer16) orArray calls result.toArrayContainer()
  2. (rc *runContainer16) toArrayContainer() calls ac.iaddRange() in a loop.
  3. iaddrange may modify ac in place, or it may return a completely new bitmap container (which is completely ignored by the caller)

Something in the space between #2 and #3 is wrong though and breaks the Or functionality. My recommendation is to revert that whole P.R, or at least the modification to the orArray method.

@lemire
Copy link
Member

lemire commented Jun 11, 2022

Ping @jacksonrnewhouse

@lemire
Copy link
Member

lemire commented Jun 11, 2022

@richardartoul Thanks for the analysis. Let us first try to get @jacksonrnewhouse to comment.

@richardartoul
Copy link
Contributor Author

@lemire No problem, thanks for responding so quick!

@jnewhouse
Copy link
Contributor

Okay, found the bug. The method toArrayContainer() is unsafe if there are more than arrayDefaultMaxSize (4096) elements in it. The code tries very hard to not invoke it in such cases but there is an off-by-one error in runArrayUnionToRuns, introduced at https://github.com/RoaringBitmap/roaring/pull/312/files#diff-e78a8f6657508d16a490f61f55f8b9773edde74a3d36bc0a96686d402f4c5c31R2284. Within an interval the "length" field is really one less than the length, so we can use uint16 as the size. That function adds (previousInterval.length+1) together repeatedly, but then returns it as cardMinusOne. Switching that line to just cardMinusOne += previousInterval.length would avoid triggering the toArrayContainer() issue.

@jnewhouse
Copy link
Contributor

This only triggers when the run container and array container combine to be the full set, as the value wraps around and returns 0, causing it to try and pack the full set into an array container.

@jnewhouse
Copy link
Contributor

It also needs the returned run container to be [0, 65355] so that it doesn't even populate the backing uint16 slice of the array container.

@lemire
Copy link
Member

lemire commented Jun 11, 2022

Who wants to take a pick at a PR to fix this? Thanks to @richardartoul we have the test.

@richardartoul
Copy link
Contributor Author

I would be happy to do it, but TBH after reading @jnewhouse explanation I think it would be better if someone familiar with the conventions in the codebase did it since it sounds like the root cause was quite subtle.

@jacksonrnewhouse
Copy link

Okay, I put together a fix. Sorry for introducing this bug.

@lemire
Copy link
Member

lemire commented Jun 14, 2022

We don’t apologize each time a bug is introduced.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants