tantivy panic during merge #1629
More info: I added a print to show the content of the block that causes the panic.
fn flush_block(&mut self) {
    // encode the positions in the block
    if self.block.is_empty() {
        return;
    }
    if self.block.len() == COMPRESSION_BLOCK_SIZE {
        let (bit_width, block_encoded): (u8, &[u8]) =
            self.block_encoder.compress_block_unsorted(&self.block[..]);
        self.bit_widths.push(bit_width);
        self.positions_buffer.extend(block_encoded);
    } else {
        println!("block: {:?}", self.block);
        debug_assert!(self.block.len() < COMPRESSION_BLOCK_SIZE);
        let block_vint_encoded = self.block_encoder.compress_vint_unsorted(&self.block[..]);
        self.positions_buffer.extend_from_slice(block_vint_encoded);
    }
    self.block.clear();
}
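The `else` branch above hands a partial block to a vint encoder. As a rough illustration of why very large values matter on that path, here is a minimal sketch of the byte length of a generic 7-bits-per-byte varint. This is an assumption for illustration only: `varint_len` is a hypothetical helper, and tantivy's actual `compress_vint_unsorted` may lay out bytes differently.

```rust
/// Number of bytes a value occupies in a generic varint that stores
/// 7 payload bits per byte (LEB128-style).
/// Hypothetical helper for illustration; not tantivy's encoder.
fn varint_len(mut value: u32) -> usize {
    let mut len = 1;
    // Each additional byte is needed while more than 7 bits remain.
    while value >= 0x80 {
        value >>= 7;
        len += 1;
    }
    len
}
```

Under this scheme, `varint_len(183)` is 2, while `varint_len(4294967195)` is 5, which is more than the 4 bytes of the raw `u32`. Any value of at least `1 << 28` costs 5 bytes.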
|
And a test to reproduce the same error as seen in quickwit:
#[test]
fn test_encode_unsorted_block_with_panic() {
let block = vec![4294967195, 4294967255, 4294967239, 183, 4294967216, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 183, 4294967216, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 183, 4294967216, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 183, 4294967216, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 183, 4294967216, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 183, 4294967216, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 183, 4294967216, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 183, 4294967216, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 183, 4294967216, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 183, 4294967216, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239];
let mut encoder = BlockEncoder::new();
let compressed_data = encoder.compress_vint_unsorted(&block);
}
outputs:
|
I extracted one split that could lead to this issue and got the following info. You will notice the huge size of the position file.
I used tantivy-cli with the latest tantivy version and the quickwit features to see if search would work... A normal search works fine, but as soon as I tried a phrase query, I got this error. (Strangely, I do not get this error when I search from quickwit.)
|
My last note on this today: I tried, without success, to identify a single split that could cause this error during a merge. So the issue reveals itself only during a merge of 10 splits that look like the one above :/ |
@fmassot What is the config on the field that fails? |
There is only one text field, and this is its config:
{
    "name": "text",
    "type": "text",
    "options": {
        "indexing": {
            "record": "position",
            "fieldnorms": false,
            "tokenizer": "default"
        },
        "stored": true,
        "fast": false
    }
} |
For a field with several FieldValues, where one value contained no token at all, the token position was reset to 0. As a result, PhraseQueries could return false positives. In addition, computing the position delta could underflow u32 and produce a gigantic delta. We haven't been able to actually explain the bug in #1629, but it is assumed that in some corner cases these deltas can cause a panic. Closes #1629
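To make the underflow concrete, here is a minimal sketch of what happens when a position delta is computed on `u32` after the current position has fallen behind the previous one. The helper names are hypothetical and this is not tantivy's code; it only shows the arithmetic.

```rust
/// Delta between two token positions using wrapping arithmetic.
/// Hypothetical helper for illustration; not part of tantivy.
fn wrapping_delta(previous: u32, current: u32) -> u32 {
    // If `current < previous`, the result wraps around modulo 2^32.
    current.wrapping_sub(previous)
}

/// Same delta, but reports the underflow as `None` instead of wrapping.
fn checked_delta(previous: u32, current: u32) -> Option<u32> {
    current.checked_sub(previous)
}
```

With a previous position of 104 and a current position of 3, `wrapping_delta(104, 3)` yields 4294967195 (that is, 2^32 - 101), while `checked_delta(104, 3)` yields `None`. The value 4294967195 also appears in the reproduction block posted in this thread, which is consistent with small negative deltas having wrapped around.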
I'm opening the issue to start tracking this problem and will add more accurate data about the segments once I have found them...
So during a merge of 10 segments (made with quickwit), I had this stack backtrace: