Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic in Prometheus timer #2485

Closed
michaelsproul opened this issue Jul 30, 2021 · 4 comments
Closed

Panic in Prometheus timer #2485

michaelsproul opened this issue Jul 30, 2021 · 4 comments
Labels
A0 bug Something isn't working

Comments

@michaelsproul
Copy link
Member

michaelsproul commented Jul 30, 2021

A user on Discord reported this panic in a Prometheus timer, which looks quite curious.

jul 30 08:18:00 JustSomeServer lighthouse[5262]: Jul 30 08:18:00.237 INFO New block received                      hash: 0x64fb…48a3, slot: 1733488
Jul 30 08:18:05 JustSomeServer lighthouse[5262]: Jul 30 08:18:05.218 INFO Synced                                  slot: 1733488, block: 0x64fb…48a3, epoch: 54171, finalized_epoch: 54169, finalized_root: 0xa77e…419b, peers: 52, service: slot_notifier
Jul 30 08:18:07 JustSomeServer lighthouse[5262]: thread 'thread 'tokio-runtime-worker' panicked at 'supplied instant is later than self', library/std/src/time.rs:281:48
Jul 30 08:18:07 JustSomeServer lighthouse[5262]: note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Jul 30 08:18:07 JustSomeServer lighthouse[5262]: tokio-runtime-worker' panicked at 'supplied instant is later than self', library/std/src/time.rs:281:48
Jul 30 08:18:07 JustSomeServer lighthouse[5262]: thread 'thread 'tokio-runtime-worker' panicked at 'supplied instant is later than self', library/std/src/time.rs:281:48
Jul 30 08:18:07 JustSomeServer lighthouse[5262]: stack backtrace:
Jul 30 08:18:07 JustSomeServer lighthouse[5262]: tokio-runtime-worker' panicked at 'supplied instant is later than self', library/std/src/time.rs:281:48
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:    0:     0x560a89751930 - std::backtrace_rs::backtrace::libunwind::trace::h63b7a90188ab5fb3
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:                                at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/../../backtrace/src/backtrace/libunwind.rs:90:5
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:    1:     0x560a89751930 - std::backtrace_rs::backtrace::trace_unsynchronized::h80aefbf9b851eca7
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:                                at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:    2:     0x560a89751930 - std::sys_common::backtrace::_print_fmt::hbef05ae4237a4d72
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:                                at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys_common/backtrace.rs:67:5
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:    3:     0x560a89751930 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h28abce2fdb9884c2
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:                                at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys_common/backtrace.rs:46:22
Jul 30 08:18:07 JustSomeServer systemd[1]: lhbn.service: Main process exited, code=killed, status=4/ILL
Jul 30 08:18:07 JustSomeServer systemd[1]: lhbn.service: Failed with result 'signal'.
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:    4:     0x560a89773a6f - core::fmt::write::h3b84512577ca38a8
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:                                at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/core/src/fmt/mod.rs:1092:17
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:    5:     0x560a89749952 - std::io::Write::write_fmt::h465f8feea02e2aa1
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:                                at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/io/mod.rs:1572:15
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:    6:     0x560a89753f65 - std::sys_common::backtrace::_print::h525280ee0d29bdde
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:                                at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys_common/backtrace.rs:49:5
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:    7:     0x560a89753f65 - std::sys_common::backtrace::print::h1f0f5b9f3ef8fb78
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:                                at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys_common/backtrace.rs:36:9
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:    8:     0x560a89753f65 - std::panicking::default_hook::{{closure}}::ha5838f6faa4a5a8f
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:                                at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:208:50
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:    9:     0x560a89753a13 - std::panicking::default_hook::hfb9fe98acb0dcb3b
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:                                at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:225:9
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:   10:     0x560a8975456d - std::panicking::rust_panic_with_hook::hb89f5f19036e6af8
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:                                at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:591:17
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:   11:     0x560a89754107 - std::panicking::begin_panic_handler::{{closure}}::h119e7951427f41da
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:                                at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:497:13
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:   12:     0x560a89751dec - std::sys_common::backtrace::__rust_end_short_backtrace::hce386c44bf47a128
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:                                at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys_common/backtrace.rs:141:18
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:   13:     0x560a89754069 - rust_begin_unwind
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:                                at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:493:5
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:   14:     0x560a882bf191 - core::panicking::panic_fmt::h2242888e8769cd33
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:                                at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/core/src/panicking.rs:92:14
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:   15:     0x560a882beff3 - core::option::expect_failed::h5ed4263027530e3b
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:                                at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/core/src/option.rs:1321:5
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:   16:     0x560a8975142b - core::option::Option<T>::expect::hab4d2ebf9bb84f36
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:                                at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/core/src/option.rs:349:21
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:   17:     0x560a8975142b - std::time::Instant::duration_since::hc096810677d2cca4
ul 30 08:18:07 JustSomeServer lighthouse[5262]:                                at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/time.rs:281:9
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:   18:     0x560a8975142b - <std::time::Instant as core::ops::arith::Sub>::sub::hc9b57e1423187eb6
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:                                at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/time.rs:406:9
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:   19:     0x560a8975142b - std::time::Instant::elapsed::h8b0b27c246a03bd7
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:                                at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/time.rs:345:9
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:   20:     0x560a896208d6 - <prometheus::histogram::HistogramTimer as core::ops::drop::Drop>::drop::h386e2807304ed548
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:   21:     0x560a88a66873 - core::ptr::drop_in_place<core::option::Option<prometheus::histogram::HistogramTimer>>::h756f627bfe202d2f
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:   22:     0x560a88a191a5 - <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll::h8f0d8d8f32dfab2e
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:   23:     0x560a887888ed - tokio::runtime::task::harness::Harness<T,S>::poll::h145341c68d0cc9b5
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:   24:     0x560a8930b6ca - std::sys_common::backtrace::__rust_begin_short_backtrace::h8f5268459a95c38a
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:   25:     0x560a893223e8 - core::ops::function::FnOnce::call_once{{vtable.shim}}::h0e748a2b3e9b15d5
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:   26:     0x560a89758eba - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::hc444a77f8dd8d825
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:                                at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/alloc/src/boxed.rs:1546:9
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:   27:     0x560a89758eba - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h8b68a0a9a2093dfc
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:                                at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/alloc/src/boxed.rs:1546:9
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:   28:     0x560a89758eba - std::sys::unix::thread::Thread::new::thread_start::hb95464447f61f48d
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:                                at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys/unix/thread.rs:71:17
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:   29:     0x7fbb436f16db - start_thread
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:   30:     0x7fbb42e7871f - __clone
Jul 30 08:18:07 JustSomeServer lighthouse[5262]:   31:                0x0 - <unknown>
Jul 30 08:18:07 JustSomeServer lighthouse[5262]: thread panicked while panicking. aborting.
Jul 30 08:18:12 JustSomeServer systemd[1]: lhbn.service: Service hold-off time over, scheduling restart.
Jul 30 08:18:12 JustSomeServer systemd[1]: lhbn.service: Scheduled restart job, restart counter is at 1.
Jul 30 08:18:12 JustSomeServer systemd[1]: Stopped Lighthouse Beacon Node.
Jul 30 08:18:12 JustSomeServer systemd[1]: Started Lighthouse Beacon Node.

lhbn_panicked.txt

@michaelsproul michaelsproul changed the title Fix metrics panic Panic in Prometheus timer Jul 30, 2021
@michaelsproul
Copy link
Member Author

Looking into it, it seems that Prometheus uses Instant::elapsed, which shouldn't panic unless there are bugs in the implementation of Instant at the OS/hardware level. I'll try to find out more about this user's platform so we can narrow it down a bit. The comments in Instant's source hint at a colourful history of timing bugs on different platforms:

https://doc.rust-lang.org/src/std/time.rs.html#220-260

We should probably propose a patch to the prometheus crate which bypasses elapsed for something like:

Instant::now().saturating_duration_since(self.start)

@michaelsproul michaelsproul added A0 bug Something isn't working labels Jul 30, 2021
@michaelsproul
Copy link
Member Author

About the environment: the host seems to identify itself as vServer 20171111, but I'm not sure if that's a specific hypervisor or just branding by the Hetzner VPS provider.

@michaelsproul
Copy link
Member Author

Fixed in tikv/rust-prometheus#406

@paulhauner paulhauner added the v1.5.1 To be included in the v1.5.1 relase label Aug 2, 2021
bors bot pushed a commit that referenced this issue Aug 17, 2021
## Proposed Changes

* Consolidate Tokio versions: everything now uses the latest v1.10.0, no more `tokio-compat`.
* Many semver-compatible changes via `cargo update`. Notably this upgrades from the yanked v0.8.0 version of crossbeam-deque which is present in v1.5.0-rc.0
* Many semver incompatible upgrades via `cargo upgrades` and `cargo upgrade --workspace pkg_name`. Notable ommissions:
    - Prometheus, to be handled separately: #2485
    - `rand`, `rand_xorshift`: the libsecp256k1 package requires 0.7.x, so we'll stick with that for now
    - `ethereum-types` is pinned at 0.11.0 because that's what `web3` is using and it seems nice to have just a single version
    
## Additional Info

We still have two versions of `libp2p-core` due to `discv5` depending on the v0.29.0 release rather than `master`. AFAIK it should be OK to release in this state (cc @AgeManning )
@michaelsproul michaelsproul added v1.5.2 The release after v1.5.1 v2.0.0 Altair on mainnet release (v2.0.0) and removed v1.5.1 To be included in the v1.5.1 relase v1.5.2 The release after v1.5.1 labels Aug 26, 2021
pawanjay176 pushed a commit to pawanjay176/lighthouse that referenced this issue Aug 27, 2021
## Proposed Changes

* Consolidate Tokio versions: everything now uses the latest v1.10.0, no more `tokio-compat`.
* Many semver-compatible changes via `cargo update`. Notably this upgrades from the yanked v0.8.0 version of crossbeam-deque which is present in v1.5.0-rc.0
* Many semver incompatible upgrades via `cargo upgrades` and `cargo upgrade --workspace pkg_name`. Notable ommissions:
    - Prometheus, to be handled separately: sigp#2485
    - `rand`, `rand_xorshift`: the libsecp256k1 package requires 0.7.x, so we'll stick with that for now
    - `ethereum-types` is pinned at 0.11.0 because that's what `web3` is using and it seems nice to have just a single version
    
## Additional Info

We still have two versions of `libp2p-core` due to `discv5` depending on the v0.29.0 release rather than `master`. AFAIK it should be OK to release in this state (cc @AgeManning )
@michaelsproul michaelsproul removed the v2.0.0 Altair on mainnet release (v2.0.0) label Sep 27, 2021
bors bot pushed a commit that referenced this issue Nov 1, 2021
## Issue Addressed

Closes #2485
@michaelsproul
Copy link
Member Author

Fix is merged and will be available in the next release post v2.0.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A0 bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants