
Panic in merge thread #1125

Closed
appaquet opened this issue Jul 31, 2021 · 7 comments · Fixed by #1132

Comments

@appaquet
Contributor

Running on 0.15.3, the merge thread panics while trying to list segment positions. I haven't been able to write a small reproduction yet: this only seems to happen in my project after a few days of usage, with roughly a hundred mutations per day. After trying to reproduce the issue with random mutations while searching for a minimal example, I suspect it's the result of some corruption after many merges.

Backtrace
thread 'merge_thread0' panicked at 'attempt to add with overflow', /home/appaquet/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-0.15.3/src/postings/segment_postings.rs:258:17
stack backtrace:
   0:     0x55f464fc1b50 - std::backtrace_rs::backtrace::libunwind::trace::ha5edb8ba5c6b7a6c
                               at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/../../backtrace/src/backtrace/libunwind.rs:90:5
   1:     0x55f464fc1b50 - std::backtrace_rs::backtrace::trace_unsynchronized::h0de86d320a827db2
                               at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x55f464fc1b50 - std::sys_common::backtrace::_print_fmt::h97b9ad6f0a1380ff
                               at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/sys_common/backtrace.rs:67:5
   3:     0x55f464fc1b50 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h14be7eb08f97fe80
                               at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/sys_common/backtrace.rs:46:22
   4:     0x55f464fec0ef - core::fmt::write::h2ca8877d3e0e52de
                               at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/core/src/fmt/mod.rs:1094:17
   5:     0x55f464fba6b5 - std::io::Write::write_fmt::h64f5987220b618f4
                               at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/io/mod.rs:1584:15
   6:     0x55f464fc405b - std::sys_common::backtrace::_print::h7f1a4097308f2e0a
                               at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/sys_common/backtrace.rs:49:5
   7:     0x55f464fc405b - std::sys_common::backtrace::print::h1f799fc2ca7f5035
                               at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/sys_common/backtrace.rs:36:9
   8:     0x55f464fc405b - std::panicking::default_hook::{{closure}}::hf38436e8a3ce1071
                               at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:208:50
   9:     0x55f464fc3b2d - std::panicking::default_hook::he2f8f3fae11ed1dd
                               at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:225:9
  10:     0x55f464fc47dd - std::panicking::rust_panic_with_hook::h79a18548bd90c7d4
                               at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:591:17
  11:     0x55f464fc4317 - std::panicking::begin_panic_handler::{{closure}}::h212a72cc08e25126
                               at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:495:13
  12:     0x55f464fc1fec - std::sys_common::backtrace::__rust_end_short_backtrace::hbd6897dd42bc0fcd
                               at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/sys_common/backtrace.rs:141:18
  13:     0x55f464fc42a9 - rust_begin_unwind
                               at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:493:5
  14:     0x55f464fe9771 - core::panicking::panic_fmt::h77ecd04e9b1dd84d
                               at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/core/src/panicking.rs:92:14
  15:     0x55f464fe96bd - core::panicking::panic::h60569d8a39169222
                               at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/core/src/panicking.rs:50:5
  16:     0x55f463b57ee9 - <tantivy::postings::segment_postings::SegmentPostings as tantivy::postings::postings::Postings>::positions_with_offset::h4123bf66e4e00f68
                               at /home/appaquet/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-0.15.3/src/postings/segment_postings.rs:258:17
  17:     0x55f463b57479 - tantivy::postings::postings::Postings::positions::hbd39bc6e49c80976
                               at /home/appaquet/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-0.15.3/src/postings/postings.rs:24:9
  18:     0x55f463a91743 - tantivy::indexer::merger::IndexMerger::write_postings_for_field::h57e445b57d077cf9
                               at /home/appaquet/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-0.15.3/src/indexer/merger.rs:922:25
  19:     0x55f463a9297d - tantivy::indexer::merger::IndexMerger::write_postings::h0fbf4e4079e39341
                               at /home/appaquet/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-0.15.3/src/indexer/merger.rs:970:53
  20:     0x55f463a94cf7 - tantivy::indexer::merger::IndexMerger::write::h7c6ea7dfcc8f779d
                               at /home/appaquet/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-0.15.3/src/indexer/merger.rs:1071:33
  21:     0x55f4639481e2 - tantivy::indexer::segment_updater::merge::h36b25a9851a827ec
                               at /home/appaquet/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-0.15.3/src/indexer/segment_updater.rs:142:20
  22:     0x55f46394d432 - tantivy::indexer::segment_updater::SegmentUpdater::start_merge::{{closure}}::h8d47bf6a2d5d64b7
                               at /home/appaquet/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-0.15.3/src/indexer/segment_updater.rs:498:19
  23:     0x55f463b7dc2c - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h8a901b06a716ed55
                               at /home/appaquet/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/future/mod.rs:80:19
  24:     0x55f464945269 - <futures_task::future_obj::LocalFutureObj<T> as core::future::future::Future>::poll::h089823b4a04b8205
                               at /home/appaquet/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-task-0.3.16/src/future_obj.rs:84:18
  25:     0x55f4649451e1 - <futures_task::future_obj::FutureObj<T> as core::future::future::Future>::poll::h122379c8c36bd25a
                               at /home/appaquet/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-task-0.3.16/src/future_obj.rs:127:9
  26:     0x55f464944c3c - futures_util::future::future::FutureExt::poll_unpin::h992a41cd45e9a023
                               at /home/appaquet/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.16/src/future/future/mod.rs:562:9
  27:     0x55f464946a75 - futures_executor::thread_pool::Task::run::h67abfdd088d67ee6
                               at /home/appaquet/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-executor-0.3.16/src/thread_pool.rs:322:27
  28:     0x55f4649458e7 - futures_executor::thread_pool::PoolState::work::habcdb6f78988cc91
                               at /home/appaquet/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-executor-0.3.16/src/thread_pool.rs:154:39
  29:     0x55f4649468fb - futures_executor::thread_pool::ThreadPoolBuilder::create::{{closure}}::h62dba45c12cd055b
                               at /home/appaquet/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-executor-0.3.16/src/thread_pool.rs:284:42
  30:     0x55f464958edc - std::sys_common::backtrace::__rust_begin_short_backtrace::h5af3b7a895a3f7bc
                               at /home/appaquet/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:125:18
  31:     0x55f46495a741 - std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}}::hfb907fbc1f766a9a
                               at /home/appaquet/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:481:17
  32:     0x55f464958eb1 - <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::ha4715ad11f2a71a4
                               at /home/appaquet/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:344:9
  33:     0x55f464948373 - std::panicking::try::do_call::hcdd310d3de191f62
                               at /home/appaquet/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:379:40
  34:     0x55f46494876d - __rust_try
  35:     0x55f4649482b1 - std::panicking::try::h0354e8d9732084ca
                               at /home/appaquet/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:343:19
  36:     0x55f464959b91 - std::panic::catch_unwind::h4b478974cca8c20c
                               at /home/appaquet/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:431:14
  37:     0x55f46495a53d - std::thread::Builder::spawn_unchecked::{{closure}}::h65ed724ce8ee9f8a
                               at /home/appaquet/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:480:30
  38:     0x55f46495a8bf - core::ops::function::FnOnce::call_once{{vtable.shim}}::h496b9f8e92515106
                               at /home/appaquet/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
  39:     0x55f464fcb3da - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h75c2ca1daad47228
                               at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/alloc/src/boxed.rs:1546:9
  40:     0x55f464fcb3da - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::hdf9f8afc9d34e311
                               at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/alloc/src/boxed.rs:1546:9
  41:     0x55f464fcb3da - std::sys::unix::thread::Thread::new::thread_start::hc238bac7748b195d
                               at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/sys/unix/thread.rs:71:17
  42:     0x7fb5f6eaf609 - start_thread
  43:     0x7fb5f6c7f293 - clone
  44:                0x0 - <unknown>

After adding some debugging right before the panic point, it seems that the position deltas explode in value; they get summed in the loop and eventually overflow the u32.

Debugging code

diff --git a/src/postings/segment_postings.rs b/src/postings/segment_postings.rs
index 67c3aa48a..78b958f72 100644
--- a/src/postings/segment_postings.rs
+++ b/src/postings/segment_postings.rs
@@ -255,6 +255,9 @@ impl Postings for SegmentPostings {
             position_reader.read(read_offset, &mut output[..]);
             let mut cum = offset;
             for output_mut in output.iter_mut() {
+                if cum > 100000 {
+                    println!("tf={} out={} cum={}", term_freq, *output_mut, cum);
+                }
                 cum += *output_mut;
                 *output_mut = cum;
             }

Outputs

tf=17 out=537053 cum=214909
tf=17 out=1564834 cum=751962
tf=17 out=4131190 cum=2316796
tf=17 out=10060076 cum=6447986
tf=17 out=22897404 cum=16508062
tf=17 out=49205305 cum=39405466
tf=17 out=100625901 cum=88610771
tf=17 out=197071398 cum=189236672
tf=17 out=371523100 cum=386308070
tf=17 out=677080003 cum=757831170
tf=17 out=1197093556 cum=1434911173
tf=17 out=2059465165 cum=2632004729
...
thread 'merge_thread0' panicked at 'attempt to add with overflow', /home/appaquet/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-0.15.3/src/postings/segment_postings.rs:258:17
...
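To make the arithmetic concrete: the diff shows positions turned into absolute values by a running sum (`cum += *output_mut`). The sketch below replays the logged `out` values starting from the first logged `cum` (214909). Each logged `cum` matches the running sum, and the last visible step, 2632004729 + 2059465165 = 4691469894, exceeds `u32::MAX` (4294967295). `decode_positions_checked` is a hypothetical helper, not tantivy's API; it uses `checked_add` so a corrupted delta yields an error instead of the panic above.

```rust
/// Hypothetical helper (not tantivy's API): turns delta-encoded positions into
/// absolute positions in place, like the `cum += *output_mut` loop in
/// `segment_postings.rs`, but reports overflow instead of panicking.
fn decode_positions_checked(offset: u32, deltas: &mut [u32]) -> Result<(), String> {
    let mut cum = offset;
    for (i, slot) in deltas.iter_mut().enumerate() {
        let delta = *slot;
        // `checked_add` returns None on u32 overflow; a plain `+` panics in
        // builds with overflow checks and wraps silently otherwise.
        cum = match cum.checked_add(delta) {
            Some(sum) => sum,
            None => {
                return Err(format!(
                    "position overflow at index {}: {} + {} > u32::MAX",
                    i, cum, delta
                ))
            }
        };
        *slot = cum;
    }
    Ok(())
}

fn main() {
    // The `out` values logged above (one term with tf=17), replayed from the
    // first logged `cum` of 214909.
    let mut deltas: [u32; 12] = [
        537_053, 1_564_834, 4_131_190, 10_060_076, 22_897_404, 49_205_305,
        100_625_901, 197_071_398, 371_523_100, 677_080_003, 1_197_093_556,
        2_059_465_165,
    ];
    match decode_positions_checked(214_909, &mut deltas) {
        Ok(()) => println!("decoded: {:?}", deltas),
        // prints: position overflow at index 11: 2632004729 + 2059465165 > u32::MAX
        Err(e) => println!("{}", e),
    }
}
```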

Let me know if there is anything I can provide to help. Even though my code that uses Tantivy is open source, it's probably too complex to reproduce easily on your side and would require my corrupted index, which contains private data. Feel free to contact me privately if that helps with debugging (my gh username @ gmail.com).

@fulmicoton
Collaborator

Thank you for the thorough bug report (again) :)
I think your analysis is correct.

Can you share the size of your documents (in KB is ok) and the number of documents you have in your segments today?

Tantivy panics if it tries to merge segments and ends up with more than 4 billion tokens in the resulting segment.
We need to either prevent these merges from happening or accept > 4 billion tokens.
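A minimal sketch of the first option (refusing an oversized merge), assuming, as described in the comment, that the merged segment's total token count must fit in a u32. `merge_fits_in_u32` and the segment sizes are hypothetical; the sum is computed in u64 so the check itself cannot overflow.

```rust
/// Hypothetical pre-merge check (not tantivy's API): sums per-segment token
/// counts in u64 so the total itself cannot overflow, then compares it with
/// the u32 limit.
fn merge_fits_in_u32(segment_token_counts: &[u32]) -> bool {
    let total: u64 = segment_token_counts.iter().map(|&n| u64::from(n)).sum();
    total <= u64::from(u32::MAX)
}

fn main() {
    // Made-up segment sizes, for illustration only.
    let small = [1_000_000u32, 2_500_000, 750_000];
    let huge = [3_000_000_000u32, 2_000_000_000];
    println!("small merge ok: {}", merge_fits_in_u32(&small)); // true
    println!("huge merge ok: {}", merge_fits_in_u32(&huge)); // false
}
```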

@fulmicoton
Collaborator

#1126

@fulmicoton
Collaborator

Oops, no, I don't think I analyzed the bug correctly.

@fulmicoton
Collaborator

This is pretty bad... The position file must be corrupted. Right now it acts as if one document has 4 billion tokens.
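One way to catch this kind of corruption early is a bound check on decoded positions: a term's positions within a document should be sorted and smaller than the document's token count, so a position in the billions is an immediate red flag. This is a hypothetical sketch, not something tantivy is known to do; `positions_look_sane` and the `doc_token_count` parameter are assumptions for illustration.

```rust
/// Hypothetical sanity check (not part of tantivy): given the absolute
/// positions of one term in one document and that document's token count,
/// flag positions that are unsorted or point past the end of the document.
fn positions_look_sane(positions: &[u32], doc_token_count: u32) -> bool {
    let sorted = positions.windows(2).all(|w| w[0] <= w[1]);
    let in_bounds = positions.iter().all(|&p| p < doc_token_count);
    sorted && in_bounds
}

fn main() {
    // A term seen 3 times in a 10-token document.
    println!("{}", positions_look_sane(&[0, 4, 9], 10)); // true
    // A position far beyond any realistic document length.
    println!("{}", positions_look_sane(&[0, 4, 2_632_004_729], 10)); // false
}
```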

@appaquet
Contributor Author

appaquet commented Aug 1, 2021

> Thank you for the thorough bug report (again) :)
> I think your analysis is correct.
>
> Can you share the size of your documents (in KB is ok) and the number of documents you have in your segments today?
>
> Tantivy panics if it tries to merge segments and ends up with more than 4 billion tokens in the resulting segment.
> We need to either prevent these merges from happening or accept > 4 billion tokens.

See https://gist.github.com/appaquet/54c30d8c7f82712934f58c82a6592e10 for the list of files and some more logging about the merged segments and their sizes. As you can see, it's definitely not hitting the 4B-token limit.

@PSeitz
Contributor

PSeitz commented Aug 1, 2021

The question is whether the corruption comes from the merge code or from somewhere else.
Could you share your config?

@fulmicoton
Collaborator

@PSeitz @appaquet shared his meta.json (see the link)

TLDR: No specific tokenizer. Sorted by a fast field. Some deletes in some segments.
