Failed to receive a message from Overseer: Signal channel is terminated and empty. #2540

Open
ioannist opened this issue Oct 28, 2023 · 3 comments

Comments

@ioannist

Moonbeam-skylake 0.33, running as a full node

I am running a full node and querying it heavily over localhost. After approximately 70 blocks, the Moonbeam node crashes with:

Oct 28 05:27:55 stakebaby-chalandri moonbeam[2508374]: 2023-10-28 05:27:55 [Relaychain] ✨ Imported #17915340 (0x8d94…96f5)
Oct 28 05:27:55 stakebaby-chalandri moonbeam[2508374]: 2023-10-28 05:27:55 [Relaychain] 💤 Idle (6 peers), best: #17915340 (0x8d94…96f5), finalized #17915337 (0x2545…00b5), ⬇ 30.2kiB/s ⬆ 7.0kiB/s
Oct 28 05:27:55 stakebaby-chalandri moonbeam[2508374]: 2023-10-28 05:27:55 [🌗] ⚙️  Preparing  0.0 bps, target=#4743523 (11 peers), best: #4743508 (0x95c1…3d43), finalized #4743505 (0x7bff…d019), ⬇ 4.9kiB/s ⬆ 89 B/s
Oct 28 05:27:56 stakebaby-chalandri moonbeam[2508374]: 2023-10-28 05:27:56 [Relaychain] cannot query the runtime API version: Api called for an unknown Block: State already discarded for 0x5adec8fe76ac16a0e2ff5bb1333dab8d683b67ab6fbda537c577511b3d8c511b
Oct 28 05:27:56 stakebaby-chalandri moonbeam[2508374]: 2023-10-28 05:27:56 [Relaychain] Failed to fetch runtime API data for job err=NotSupported { runtime_api_name: "validator_groups" }
Oct 28 05:27:56 stakebaby-chalandri moonbeam[2508374]: 2023-10-28 05:27:56 [Relaychain] cannot query the runtime API version: Api called for an unknown Block: State already discarded for 0x5adec8fe76ac16a0e2ff5bb1333dab8d683b67ab6fbda537c577511b3d8c511b
Oct 28 05:27:58 stakebaby-chalandri moonbeam[2508374]: 2023-10-28 05:27:58 [Relaychain] Failed to receive a message from Overseer, exiting err=Generated(Context("Signal channel is terminated and empty."))
Oct 28 05:27:58 stakebaby-chalandri moonbeam[2508374]: 2023-10-28 05:27:58 [Relaychain] err=Subsystem(Generated(Context("Signal channel is terminated and empty.")))
Oct 28 05:27:58 stakebaby-chalandri moonbeam[2508374]: 2023-10-28 05:27:58 [Relaychain] error receiving message from subsystem context: Generated(Context("Signal channel is terminated and empty.")) err=Generated(Context("Signal channel is terminated and empty."))
Oct 28 05:27:58 stakebaby-chalandri moonbeam[2508374]: 2023-10-28 05:27:58 [Relaychain] subsystem exited with error subsystem="statement-distribution-subsystem" err=FromOrigin { origin: "statement-distribution", source: SubsystemReceive(Generated(Context("Signal channel is terminated and empty."))) }
Oct 28 05:27:58 stakebaby-chalandri moonbeam[2508374]: 2023-10-28 05:27:58 [Relaychain] subsystem exited with error subsystem="network-bridge-rx-subsystem" err=FromOrigin { origin: "network-bridge", source: SubsystemError(Generated(Context("Signal channel is terminated and empty."))) }
Oct 28 05:27:58 stakebaby-chalandri moonbeam[2508374]: 2023-10-28 05:27:58 [Relaychain] subsystem exited with error subsystem="dispute-distribution-subsystem" err=FromOrigin { origin: "dispute-distribution", source: SubsystemReceive(Generated(Context("Signal channel is terminated and empty."))) }
Oct 28 05:27:58 stakebaby-chalandri moonbeam[2508374]: 2023-10-28 05:27:58 [Relaychain] subsystem exited with error subsystem="availability-recovery-subsystem" err=FromOrigin { origin: "availability-recovery", source: Generated(Context("Signal channel is terminated and empty.")) }
Oct 28 05:27:58 stakebaby-chalandri moonbeam[2508374]: 2023-10-28 05:27:58 [Relaychain] subsystem exited with error subsystem="bitfield-signing-subsystem" err=FromOrigin { origin: "bitfield-signing", source: Generated(Context("Signal channel is terminated and empty.")) }
Oct 28 05:27:58 stakebaby-chalandri moonbeam[2508374]: 2023-10-28 05:27:58 [Relaychain] subsystem exited with error subsystem="candidate-validation-subsystem" err=FromOrigin { origin: "candidate-validation", source: Generated(Context("Signal channel is terminated and empty.")) }
Oct 28 05:27:58 stakebaby-chalandri moonbeam[2508374]: 2023-10-28 05:27:58 [Relaychain] subsystem exited with error subsystem="provisioner-subsystem" err=FromOrigin { origin: "provisioner", source: OverseerExited(Generated(Context("Signal channel is terminated and empty."))) }
Oct 28 05:27:58 stakebaby-chalandri moonbeam[2508374]: 2023-10-28 05:27:58 [Relaychain] subsystem exited with error subsystem="network-bridge-tx-subsystem" err=FromOrigin { origin: "network-bridge", source: SubsystemError(Generated(Context("Signal channel is terminated and empty."))) }
Oct 28 05:27:58 stakebaby-chalandri moonbeam[2508374]: 2023-10-28 05:27:58 [Relaychain] subsystem exited with error subsystem="chain-api-subsystem" err=FromOrigin { origin: "chain-api", source: Generated(Context("Signal channel is terminated and empty.")) }
Oct 28 05:27:58 stakebaby-chalandri moonbeam[2508374]: 2023-10-28 05:27:58 [Relaychain] Overseer exited with error err=Generated(SubsystemStalled("approval-distribution-subsystem"))
Oct 28 05:27:58 stakebaby-chalandri moonbeam[2508374]: 2023-10-28 05:27:58 [Relaychain] Essential task `overseer` failed. Shutting down service.
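
Most of the lines above repeat the same "Signal channel is terminated and empty" error. The one line that differs is the `SubsystemStalled("approval-distribution-subsystem")` entry, which may be the more useful lead. Below is a minimal sketch for pulling both pieces of information out of a captured log when comparing crashes. The function name `summarizeOverseerFailure` and the result shape are hypothetical and not part of Moonbeam or Polkadot; the patterns only match the message formats shown in this log.

```typescript
// Hypothetical triage helper; the regexes match only the log formats quoted above.
interface OverseerFailureSummary {
  exitedSubsystems: string[];      // subsystems that reported "exited with error"
  stalledSubsystem: string | null; // subsystem named in SubsystemStalled(...), if any
}

function summarizeOverseerFailure(log: string): OverseerFailureSummary {
  const exitedSubsystems: string[] = [];
  const exitedPattern = /subsystem exited with error subsystem="([^"]+)"/g;

  // Collect each distinct subsystem name, in order of first appearance.
  let match: RegExpExecArray | null = exitedPattern.exec(log);
  while (match !== null) {
    if (exitedSubsystems.indexOf(match[1]) === -1) {
      exitedSubsystems.push(match[1]);
    }
    match = exitedPattern.exec(log);
  }

  // The overseer names the stalled subsystem on its own exit line.
  const stalled = /SubsystemStalled\("([^"]+)"\)/.exec(log);
  return {
    exitedSubsystems,
    stalledSubsystem: stalled !== null ? stalled[1] : null,
  };
}
```

Run against the log in this report, the stalled subsystem comes out as `approval-distribution-subsystem`, while the exited list holds the subsystems that shut down after the signal channel closed.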

This appears to be related to heavy load or concurrency, because the node does not crash if I reduce the query rate. To put this in context, the node is queried by 7-12 Node.js processes, each of which can execute up to 300 queries concurrently. The moonbeam process averages 200%-450% of one logical core. I have tried different block ranges and the error persists, so I don't think it's related to database corruption.
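
Since the crash goes away at a lower query rate, one client-side workaround is to cap the number of in-flight queries per worker process. The following is a minimal sketch of such a cap, assuming the workers are written in TypeScript on Node.js. `ConcurrencyLimiter` is a hypothetical helper, not part of any Moonbeam or Polkadot library, and it only reduces load on the node; it does not fix the underlying overseer stall.

```typescript
// Hypothetical helper: caps how many async tasks run at once in one worker.
class ConcurrencyLimiter {
  private readonly limit: number;
  private active = 0;
  private readonly waiting: Array<() => void> = [];

  constructor(limit: number) {
    if (!Number.isInteger(limit) || limit < 1) {
      throw new RangeError("limit must be a positive integer");
    }
    this.limit = limit;
  }

  // Number of tasks currently running.
  get inFlight(): number {
    return this.active;
  }

  // Number of tasks waiting for a free slot.
  get queued(): number {
    return this.waiting.length;
  }

  // Runs the task now if a slot is free, otherwise queues it (FIFO).
  run<T>(task: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const start = (): void => {
        this.active += 1;
        // Wrapping in a new Promise turns a synchronous throw into a rejection.
        new Promise<T>((settle) => settle(task())).then(
          (value) => {
            this.release();
            resolve(value);
          },
          (error) => {
            this.release();
            reject(error);
          },
        );
      };
      if (this.active < this.limit) {
        start();
      } else {
        this.waiting.push(start);
      }
    });
  }

  // Frees a slot and starts the next queued task, if any.
  private release(): void {
    this.active -= 1;
    const next = this.waiting.shift();
    if (next !== undefined) {
      next();
    }
  }
}
```

A worker would create one limiter, for example `new ConcurrencyLimiter(50)`, and wrap each query as `limiter.run(() => queryNode(blockNumber))`, where `queryNode` stands in for whatever RPC call the worker makes. The value 50 is an arbitrary assumption. With 7-12 workers at 300 concurrent queries each, the node can see roughly 2,100 to 3,600 queries in flight, so the cap that keeps it stable would have to be found by experiment.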

It looks like a Polkadot issue, but I am not sure whether it has been resolved or ignored:
paritytech/polkadot#6624

@crystalin
Collaborator

Thank you @ioannist. We are aware of this issue but have not been able to reproduce it reliably, so maybe with your help we can pinpoint where it comes from.

@ioannist
Author

It eventually happens (within 50 blocks give or take) under heavy load. It does not happen if I only run one indexing worker (or one block at a time).

I've been playing around with flags, and setting this one seems to avert the issue:
--max-runtime-instances 256

@ioannist
Author

ioannist commented May 7, 2024

It looks like they are getting to the bottom of it here:
paritytech/polkadot-sdk#840

Our full node keeps crashing with this error every 20 minutes or so.


2 participants