tantivy panic during merge #1629

Closed
fmassot opened this issue Oct 18, 2022 · 6 comments

Comments

@fmassot
Contributor

fmassot commented Oct 18, 2022

I'm opening the issue to start tracking this problem and will add more accurate data about the segments once I have found them...

During a merge of 10 segments (created with Quickwit), I got this stack backtrace:

thread 'merge_thread_0' panicked at 'index out of bounds: the len is 512 but the index is 512', /home/fmassot/.cargo/git/checkouts/tantivy-f70b7ea03dadae9a/97ccd6d/src/postings/compression/vint.rs:36:17
stack backtrace:
   0:     0x5629c7bd614d - std::backtrace_rs::backtrace::libunwind::trace::h9135f25bc195152c
                               at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:     0x5629c7bd614d - std::backtrace_rs::backtrace::trace_unsynchronized::h015ee85be510df51
                               at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x5629c7bd614d - std::sys_common::backtrace::_print_fmt::h5fad03caa9652a2c
                               at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/sys_common/backtrace.rs:66:5
   3:     0x5629c7bd614d - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h2b42ca28d244e5c7
                               at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/sys_common/backtrace.rs:45:22
   4:     0x5629c7c0031c - core::fmt::write::h401e827d053130ed
                               at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/fmt/mod.rs:1198:17
   5:     0x5629c7bcee81 - std::io::Write::write_fmt::hffec93268f5cde32
                               at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/io/mod.rs:1672:15
   6:     0x5629c7bd77c5 - std::sys_common::backtrace::_print::h180c4c706ee1d3fb
                               at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/sys_common/backtrace.rs:48:5
   7:     0x5629c7bd77c5 - std::sys_common::backtrace::print::hd0c35d18765761c9
                               at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/sys_common/backtrace.rs:35:9
   8:     0x5629c7bd77c5 - std::panicking::default_hook::{{closure}}::h1f023310983bc730
                               at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/panicking.rs:295:22
   9:     0x5629c7bd74e1 - std::panicking::default_hook::h188fec3334afd5be
                               at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/panicking.rs:314:9
  10:     0x5629c7bd7d56 - std::panicking::rust_panic_with_hook::hf26e9d4f97b40096
                               at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/panicking.rs:698:17
  11:     0x5629c7bd7c47 - std::panicking::begin_panic_handler::{{closure}}::hfab912107608087a
                               at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/panicking.rs:588:13
  12:     0x5629c7bd6644 - std::sys_common::backtrace::__rust_end_short_backtrace::h434b685ce8d9965b
                               at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/sys_common/backtrace.rs:138:18
  13:     0x5629c7bd7979 - rust_begin_unwind
                               at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/panicking.rs:584:5
  14:     0x5629c581f8a3 - core::panicking::panic_fmt::ha6dc7f2ab2479463
                               at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/panicking.rs:142:14
  15:     0x5629c581f7e2 - core::panicking::panic_bounds_check::ha6e6615eae13afdc
                               at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/panicking.rs:84:5
  16:     0x5629c760090c - tantivy::positions::serializer::PositionSerializer<W>::flush_block::h891dfda243076f0b
  17:     0x5629c7600995 - tantivy::positions::serializer::PositionSerializer<W>::close_term::hdf0dfcd857c439ad
  18:     0x5629c764aad4 - tantivy::postings::serializer::FieldSerializer::close_term::h9fb43af6db985844
  19:     0x5629c75f09df - tantivy::indexer::merger::IndexMerger::write::h4a49df2be6406a11
  20:     0x5629c75f40de - tantivy::indexer::segment_updater::merge::h88822930a6860f6b
  21:     0x5629c762382b - <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::hc6c1527bf6b9168e
  22:     0x5629c762ea7c - <rayon_core::job::HeapJob<BODY> as rayon_core::job::Job>::execute::h9250abc9a472b211
  23:     0x5629c580e845 - rayon_core::registry::WorkerThread::wait_until_cold::h6673666d367f8ef9
  24:     0x5629c79565be - rayon_core::registry::ThreadBuilder::run::hedd3a6b4143c26b7
  25:     0x5629c79590b1 - std::sys_common::backtrace::__rust_begin_short_backtrace::he9d3cf218a9aa1b6
  26:     0x5629c795b99d - core::ops::function::FnOnce::call_once{{vtable.shim}}::hbce2b4052adc110e
  27:     0x5629c7bddf33 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h56d5fc072706762b
                               at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/alloc/src/boxed.rs:1935:9
  28:     0x5629c7bddf33 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h41deef8e33b824bb
                               at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/alloc/src/boxed.rs:1935:9
  29:     0x5629c7bddf33 - std::sys::unix::thread::Thread::new::thread_start::ha6436304a1170bba
                               at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/sys/unix/thread.rs:108:17
  30:     0x7fb3f9bc3fa3 - start_thread
  31:     0x7fb3f996d06f - clone
  32:                0x0 - <unknown>
Rayon: detected unexpected panic; aborting
@fmassot
Contributor Author

fmassot commented Oct 18, 2022

More info: I added a `println!` to show the contents of the block that causes the panic.

    fn flush_block(&mut self) {
        // encode the positions in the block
        if self.block.is_empty() {
            return;
        }
        if self.block.len() == COMPRESSION_BLOCK_SIZE {
            let (bit_width, block_encoded): (u8, &[u8]) =
                self.block_encoder.compress_block_unsorted(&self.block[..]);
            self.bit_widths.push(bit_width);
            self.positions_buffer.extend(block_encoded);
        } else {
            println!("block: {:?}", self.block);
            debug_assert!(self.block.len() < COMPRESSION_BLOCK_SIZE);
            let block_vint_encoded = self.block_encoder.compress_vint_unsorted(&self.block[..]);
            self.positions_buffer.extend_from_slice(block_vint_encoded);
        }
        self.block.clear();
    }
block: [4294967195, 4294967255, 4294967239, 183, 4294967216, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 183, 4294967216, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 183, 4294967216, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 183, 4294967216, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 183, 4294967216, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 183, 4294967216, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 183, 4294967216, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 183, 4294967216, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 183, 4294967216, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 183, 4294967216, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239]
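The large values in this block all sit just below `u32::MAX` (4294967295). Reinterpreting them as signed 32-bit integers (a quick diagnostic, not something tantivy does) turns them into small negative numbers such as -101 and -41. That is what you would expect if a position delta was computed from a position smaller than the previous one and the result was stored as a `u32`. The helper name `as_signed` is hypothetical.

```rust
/// Hypothetical diagnostic helper: reinterpret the bits of a stored u32 delta
/// as i32. A small negative difference stored as u32 appears just below u32::MAX.
fn as_signed(v: u32) -> i32 {
    v as i32
}

fn main() {
    // The distinct large values from the printed block.
    let large: [u32; 4] = [4294967195, 4294967255, 4294967239, 4294967216];
    for &v in &large {
        // Prints -101, -41, -57 and -80 respectively.
        println!("{} -> {}", v, as_signed(v));
    }
    // The other direction: subtracting 101 from 0 wraps to the first value.
    assert_eq!(0u32.wrapping_sub(101), 4294967195);
}
```

The small values in the block (183, 199) stay positive under the same reinterpretation.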

@fmassot
Contributor Author

fmassot commented Oct 18, 2022

And here is a test that reproduces the same error seen in Quickwit:

    #[test]
    fn test_encode_unsorted_block_with_panic() {
        let block = vec![4294967195, 4294967255, 4294967239, 183, 4294967216, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 183, 4294967216, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 183, 4294967216, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 183, 4294967216, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 183, 4294967216, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 183, 4294967216, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 183, 4294967216, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 183, 4294967216, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 183, 4294967216, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 183, 4294967216, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239, 199, 4294967195, 4294967255, 4294967239];
    
        let mut encoder = BlockEncoder::new();
        let compressed_data = encoder.compress_vint_unsorted(&block);
    }

This outputs:

thread 'postings::compression::tests::test_encode_unsorted_block_with_panic' panicked at 'index out of bounds: the len is 512 but the index is 512', src/postings/compression/vint.rs:36:17
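One plausible reading of the `len is 512` panic, not confirmed anywhere in this thread: in `flush_block`, a full block goes through `compress_block_unsorted` (bitpacking, at most 32 bits per value), while a shorter block goes through `compress_vint_unsorted`. If the vint codec stores 7 payload bits per byte, any value of 2^28 or more needs 5 bytes, so a partial block dominated by near-`u32::MAX` values needs more than 4 bytes per value. The sketch below assumes that 7-bit layout and a buffer sized at 4 bytes per value for a 128-value block (128 × 4 = 512); both are assumptions, and `vint_len`/`encoded_len` are hypothetical helpers, not tantivy functions.

```rust
/// Bytes needed to store `v` as a variable-length integer with 7 payload
/// bits per byte (assumed layout; only the length matters here).
fn vint_len(v: u32) -> usize {
    let mut v = v;
    let mut n = 1;
    while v >= 0x80 {
        v >>= 7;
        n += 1;
    }
    n
}

/// Total encoded size of a block under the same assumption.
fn encoded_len(block: &[u32]) -> usize {
    block.iter().map(|&v| vint_len(v)).sum()
}

fn main() {
    // Assumption: 4 bytes per value for a 128-value block, i.e. 512 bytes,
    // matching "the len is 512" in the panic message.
    const BUFFER_LEN: usize = 128 * 4;
    // Worst case for a partial (vint-encoded) block: 127 values, all >= 2^28.
    let worst = vec![u32::MAX - 100; 127];
    // Prints 635 bytes needed versus 512 available.
    println!("needed: {} bytes, available: {} bytes", encoded_len(&worst), BUFFER_LEN);
}
```

Under these assumptions a partial block only overflows when enough of its values are huge, which would fit with the panic appearing only for this unusual data.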

@fmassot
Contributor Author

fmassot commented Oct 18, 2022

I extracted one split that may be involved in this issue and got the following information. Note the huge size of the positions (`.pos`) file.


+--------------------------------------------+-----------+
|                     Files in Split                     |
+--------------------------------------------+-----------+
| File Name                                  | Size      |
+--------------------------------------------+-----------+
| 26aaa1cd83c348088ad98901635af0d1.fast      | 118.19 kB |
+--------------------------------------------+-----------+
| 26aaa1cd83c348088ad98901635af0d1.fieldnorm | 99 B      |
+--------------------------------------------+-----------+
| 26aaa1cd83c348088ad98901635af0d1.idx       | 834.75 kB |
+--------------------------------------------+-----------+
| 26aaa1cd83c348088ad98901635af0d1.pos       | 7.45 MB   |
+--------------------------------------------+-----------+
| 26aaa1cd83c348088ad98901635af0d1.store     | 240.94 kB |
+--------------------------------------------+-----------+
| 26aaa1cd83c348088ad98901635af0d1.term      | 48.34 kB  |
+--------------------------------------------+-----------+
| hotcache                                   | 5.25 kB   |
+--------------------------------------------+-----------+
| meta.json                                  | 1.55 kB   |
+--------------------------------------------+-----------+

I used tantivy-cli with the latest tantivy version and the quickwit features to check whether search works. A normal search works fine, but as soon as I tried a phrase query, I got the following error (strangely, I do not get this error when I search from Quickwit):

thread 'main' panicked at 'attempt to add with overflow', /home/fmassot/tantivy/src/postings/segment_postings.rs:252:17
stack backtrace:
   0: rust_begin_unwind
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/panicking.rs:142:14
   2: core::panicking::panic
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/panicking.rs:48:5
   3: <tantivy::postings::segment_postings::SegmentPostings as tantivy::postings::postings::Postings>::positions_with_offset
             at /home/fmassot/tantivy/src/postings/segment_postings.rs:252:17
   4: tantivy::query::phrase_query::phrase_scorer::PostingsWithOffset<TPostings>::positions
             at /home/fmassot/tantivy/src/query/phrase_query/phrase_scorer.rs:24:9
   5: tantivy::query::phrase_query::phrase_scorer::PhraseScorer<TPostings>::compute_phrase_match
             at /home/fmassot/tantivy/src/query/phrase_query/phrase_scorer.rs:324:13
   6: tantivy::query::phrase_query::phrase_scorer::PhraseScorer<TPostings>::phrase_exists
             at /home/fmassot/tantivy/src/query/phrase_query/phrase_scorer.rs:299:32
   7: tantivy::query::phrase_query::phrase_scorer::PhraseScorer<TPostings>::phrase_match
             at /home/fmassot/tantivy/src/query/phrase_query/phrase_scorer.rs:294:13
   8: <tantivy::query::phrase_query::phrase_scorer::PhraseScorer<TPostings> as tantivy::docset::DocSet>::advance
             at /home/fmassot/tantivy/src/query/phrase_query/phrase_scorer.rs:363:37
   9: tantivy::query::phrase_query::phrase_scorer::PhraseScorer<TPostings>::new
             at /home/fmassot/tantivy/src/query/phrase_query/phrase_scorer.rs:279:13
  10: tantivy::query::phrase_query::phrase_weight::PhraseWeight::phrase_scorer
             at /home/fmassot/tantivy/src/query/phrase_query/phrase_weight.rs:75:17
  11: <tantivy::query::phrase_query::phrase_weight::PhraseWeight as tantivy::query::weight::Weight>::scorer
             at /home/fmassot/tantivy/src/query/phrase_query/phrase_weight.rs:91:31
  12: tantivy::commands::search::run_search
             at ./src/commands/search.rs:41:26
  13: tantivy::commands::search::run_search_cli
             at ./src/commands/search.rs:17:5
  14: tantivy::main
             at ./src/main.rs:122:25
  15: core::ops::function::FnOnce::call_once
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
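The `attempt to add with overflow` panic is only raised when Rust's integer overflow checks are enabled, which is the default for debug builds; with checks disabled (the release default), `+` on integers wraps silently. The sketch below assumes positions are stored as `u32` deltas and rebuilt with a running sum; `decode_wrapping` and `decode_checked` are hypothetical names, not tantivy's code. It shows how the same deltas can decode quietly in one build and fail in another. Whether this explains why Quickwit does not hit the error is an assumption; the thread does not say how either binary was built.

```rust
/// Hypothetical decoder: rebuild absolute positions from deltas using
/// wrapping arithmetic, which is what `+` does when overflow checks are off.
fn decode_wrapping(deltas: &[u32]) -> Vec<u32> {
    let mut cum = 0u32;
    deltas
        .iter()
        .map(|&d| {
            cum = cum.wrapping_add(d);
            cum
        })
        .collect()
}

/// Same decoding with explicit overflow detection, mirroring a build with
/// overflow checks; returns None where `+` would panic.
fn decode_checked(deltas: &[u32]) -> Option<Vec<u32>> {
    let mut cum = 0u32;
    let mut out = Vec::with_capacity(deltas.len());
    for &d in deltas {
        cum = cum.checked_add(d)?;
        out.push(cum);
    }
    Some(out)
}

fn main() {
    // Deltas for positions [1, 0]: the second delta wrapped when encoded.
    let deltas = [1u32, 4294967295];
    println!("{:?}", decode_wrapping(&deltas)); // [1, 0]
    println!("{:?}", decode_checked(&deltas)); // None
}
```

Note that wrapping decoding even recovers the out-of-order position 0, so the corruption would stay silent there.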

@fmassot
Contributor Author

fmassot commented Oct 18, 2022

My last note on this today: I tried, without success, to identify a single split that causes this error during a merge. So far the issue only shows up when merging the 10 splits, which look like the one above :/

@PSeitz
Contributor

PSeitz commented Oct 19, 2022

@fmassot What is the config of the field that fails?

@fmassot
Contributor Author

fmassot commented Oct 19, 2022

There is only one text field, and it's an `array<text>` in Quickwit. Here is the schema generated in the tantivy segment:

    {
      "name": "text",
      "type": "text",
      "options": {
        "indexing": {
          "record": "position",
          "fieldnorms": false,
          "tokenizer": "default"
        },
        "stored": true,
        "fast": false
      }
    }

fulmicoton added a commit that referenced this issue Oct 20, 2022
For a field with several FieldValues, when one value contained no
token at all, the token position was reset to 0.

As a result, PhraseQueries can show some false positives.
In addition, when computing the position delta, we can
underflow u32 and end up with a gigantic delta.

We haven't been able to fully explain the bug in #1629, but we
assume that in some corner cases these deltas can cause a panic.
Closes #1629
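The mechanism described in this commit message can be modeled in a few lines. This is a simplified, hypothetical model, not tantivy's indexing code: it assigns consecutive positions across the values of a multi-valued field, optionally resets the counter on an empty value (the bug), and delta-encodes the positions of one term with wrapping subtraction. It ignores any position gap tantivy may insert between values. `term_positions` and `deltas` are illustrative names.

```rust
/// Hypothetical model: assign token positions across the values of a
/// multi-valued text field and return the positions of `term`.
/// With `buggy = true`, a value with no tokens resets the counter to 0.
fn term_positions(values: &[&[&str]], term: &str, buggy: bool) -> Vec<u32> {
    let mut next_pos: u32 = 0;
    let mut out = Vec::new();
    for tokens in values {
        if tokens.is_empty() && buggy {
            next_pos = 0; // the bug: positions restart from 0
        }
        for tok in tokens.iter() {
            if *tok == term {
                out.push(next_pos);
            }
            next_pos += 1;
        }
    }
    out
}

/// Delta-encode positions with wrapping subtraction, so a position that goes
/// backwards becomes a value just below u32::MAX instead of a panic.
fn deltas(positions: &[u32]) -> Vec<u32> {
    let mut prev = 0u32;
    positions
        .iter()
        .map(|&p| {
            let d = p.wrapping_sub(prev);
            prev = p;
            d
        })
        .collect()
}

fn main() {
    let values: [&[&str]; 3] = [&["hello", "world"], &[], &["world"]];
    let buggy = term_positions(&values, "world", true);
    let fixed = term_positions(&values, "world", false);
    println!("{:?} -> {:?}", buggy, deltas(&buggy)); // [1, 0] -> [1, 4294967295]
    println!("{:?} -> {:?}", fixed, deltas(&fixed)); // [1, 2] -> [1, 1]
}
```

In the buggy case, the third value's tokens also overlap the first value's positions, which is one way a PhraseQuery could report a false positive.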