Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Could not confirm best peer had pivot block #7109

Open
macfarla opened this issue May 16, 2024 · 3 comments · May be fixed by #7126
Open

Could not confirm best peer had pivot block #7109

macfarla opened this issue May 16, 2024 · 3 comments · May be fixed by #7126
Assignees
Labels
bug Something isn't working P2 High (ex: Degrading performance issues, unexpected behavior of core features (DevP2P, syncing, etc))

Comments

@macfarla
Copy link
Contributor

latest besu from this commit 27e5a64
1 of 6 Azure nodes stuck with this recurring error

{"@timestamp":"2024-05-16T03:45:24,313","level":"DEBUG","thread":"EthScheduler-Timer-0","class":"SyncTargetManager","message":"Could not confirm best peer had pivot block","throwable":" java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException: Timeout after 5000 MILLISECONDS\n\tat java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332)\n\tat java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347)\n\tat java.base/java.util.concurrent.CompletableFuture$OrApply.tryFire(CompletableFuture.java:1576)\n\tat java.base/java.util.concurrent.CompletableFuture$CoCompletion.tryFire(CompletableFuture.java:1219)\n\tat java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)\n\tat java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2194)\n\tat org.hyperledger.besu.ethereum.eth.manager.EthScheduler.lambda$failAfterTimeout$11(EthScheduler.java:301)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)\n\tat java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)\n\tat java.base/java.lang.Thread.run(Thread.java:1583)\nCaused by: java.util.concurrent.TimeoutException: Timeout after 5000 MILLISECONDS\n\tat org.hyperledger.besu.ethereum.eth.manager.EthScheduler.lambda$failAfterTimeout$11(EthScheduler.java:300)\n\t... 5 more\n"}

looks like a chain halt - nothing in INFO logs for 3h after this:

{"@timestamp":"2024-05-15T23:14:08,234","level":"INFO","thread":"EthScheduler-Services-36 (requestCompleteTask)","class":"SnapWorldDownloadState","message":"Pausing world state download while waiting for sync to complete","throwable":""} {"@timestamp":"2024-05-15T23:14:45,654","level":"INFO","thread":"EthScheduler-Services-279 (importBlock)","class":"ImportBlocksStep","message":"Block import progress: 12582224 of 19878555 (63%), Peer count: 25","throwable":""} {"@timestamp":"2024-05-15T23:22:41,709","level":"INFO","thread":"EthScheduler-Services-225 (importBlock)","class":"ImportBlocksStep","message":"Block import progress: 12606638 of 19878555 (63%), Peer count: 24","throwable":""} {"@timestamp":"2024-05-15T23:24:47,420","level":"INFO","thread":"EthScheduler-Services-258 (importBlock)","class":"ImportBlocksStep","message":"Block import progress: 12614838 of 19878555 (63%), Peer count: 25","throwable":""} {"@timestamp":"2024-05-15T23:32:05,457","level":"INFO","thread":"EthScheduler-Services-283","class":"SnapWorldDownloadState","message":"Running world state heal process from peers with pivot block 19878650","throwable":""} {"@timestamp":"2024-05-15T23:34:06,377","level":"INFO","thread":"EthScheduler-Services-283 (importBlock)","class":"ImportBlocksStep","message":"Block import progress: 12639013 of 19878650 (63%), Peer count: 25","throwable":""}

@macfarla
Copy link
Contributor Author

Restart resolved the issue for this node.

@macfarla macfarla added P2 High (ex: Degrading performance issues, unexpected behavior of core features (DevP2P, syncing, etc)) bug Something isn't working labels May 16, 2024
@macfarla
Copy link
Contributor Author

with the extra info in the log message,
{"@timestamp":"2024-05-20T21:37:46,523","level":"DEBUG","thread":"EthScheduler-Timer-0","class":"SyncTargetManager","message":"Could not confirm best peer 0xaaa... had pivot block 19913245","throwable":" java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException: Timeout after 5000 MILLISECONDS\n\tat java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332)\n\tat java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347)\n\tat java.base/java.util.concurrent.CompletableFuture$OrApply.tryFire(CompletableFuture.java:1576)\n\tat java.base/java.util.concurrent.CompletableFuture$CoCompletion.tryFire(CompletableFuture.java:1219)\n\tat java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)\n\tat java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2194)\n\tat org.hyperledger.besu.ethereum.eth.manager.EthScheduler.lambda$failAfterTimeout$11(EthScheduler.java:301)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)\n\tat java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)\n\tat java.base/java.lang.Thread.run(Thread.java:1583)\nCaused by: java.util.concurrent.TimeoutException: Timeout after 5000 MILLISECONDS\n\tat org.hyperledger.besu.ethereum.eth.manager.EthScheduler.lambda$failAfterTimeout$11(EthScheduler.java:300)\n\t... 5 more\n"}

could then use admin_removePeer to remove that problem peer, for a different way to recover

@macfarla macfarla self-assigned this May 20, 2024
@macfarla
Copy link
Contributor Author

the problem is that we're calling this function recursively, and there is a repeated timeout exception - but we're still retrying because of the recursion. And the exceptionally() only returns once the other part completes

https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/CompletableFuture.html#exceptionally-java.util.function.Function-

Returns a new CompletableFuture that is completed when this CompletableFuture completes, with the result of the given function of the exception triggering this CompletableFuture's completion when it completes exceptionally; otherwise, if this CompletableFuture completes normally, then the returned CompletableFuture also completes normally with the same value.

@macfarla macfarla linked a pull request May 21, 2024 that will close this issue
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working P2 High (ex: Degrading performance issues, unexpected behavior of core features (DevP2P, syncing, etc))
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant