Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to restart follower node after an unfinished ledger sync on Windows. #5813

Open
mxmauro opened this issue Oct 30, 2023 · 3 comments
Open
Labels
bug Something isn't working

Comments

@mxmauro
Copy link
Contributor

mxmauro commented Oct 30, 2023

Subject of the issue

couldn't initialize the node error when starting a node if it was stopped before ledger sync command completes.

Your environment

Windows.

Steps to reproduce

Configuration settings for a testnet node:

"Archival": false,
"EnableFollowMode": true,
"EnableTxnEvalTracer": true,
"MaxAcctLookback": 128,
"DisableAPIAuth": true
  1. Start the node from scratch (algod -d ./data command line, for example).
  2. In another window, do curl -X POST http://localhost:8001/v2/ledger/sync/100000 to sync up to block 100000.
  3. Wait a few seconds so sync starts and, back to the node window, hit CTRL+C to stop the node.
  4. Try to start the node again. If it starts, you can repeat the process from step 2, sooner or later you should hit the error.

Generated log:

{"file":"follower_node.go","function":"github.com/algorand/go-algorand/node.MakeFollower","level":"error","line":141,"msg":"[Stack] goroutine 1 [running]:\nruntime/debug.Stack()\n\truntime/debug/stack.go:24 +0x65\ngithub.com/algorand/go-algorand/logging.logger.Errorf({0xc000167b90?, 0xc00000a510?}, {0x1416ce878, 0x2d}, {0xc008354560, 0x1, 0x1})\n\tgithub.com/algorand/go-algorand/logging/log.go:231 +0x69\ngithub.com/algorand/go-algorand/node.MakeFollower({_, _}, {_, _}, {0x1e, 0x0, 0x4, {0x0, 0x0}, 0xdf8475800, ...}, ...)\n\tgithub.com/algorand/go-algorand/node/follower_node.go:141 +0x1009\ngithub.com/algorand/go-algorand/daemon/algod.(*Server).Initialize(_, {0x1e, 0x0, 0x4, {0x0, 0x0}, 0xdf8475800, {0x0, 0x0}, 0xf, ...}, ...)\n\tgithub.com/algorand/go-algorand/daemon/algod/server.go:234 +0x10f3\nmain.run()\n\tgithub.com/algorand/go-algorand/cmd/algod/main.go:333 +0x20c6\nmain.main()\n\tgithub.com/algorand/go-algorand/cmd/algod/main.go:63 +0x5a\n","time":"2023-10-23T14:39:00.329862-03:00"}
{"file":"follower_node.go","function":"github.com/algorand/go-algorand/node.MakeFollower","level":"error","line":141,"msg":"unable to set sync round to Ledger.DBRound requested sync round cannot be less than the latest round","time":"2023-10-23T14:39:00.331120-03:00"}
{"file":"main.go","function":"main.run","level":"error","line":336,"msg":"[Stack] goroutine 1 [running]:\nruntime/debug.Stack()\n\truntime/debug/stack.go:24 +0x65\ngithub.com/algorand/go-algorand/logging.logger.Error({0xc000167b90?, 0xc00000a510?}, {0xc008354870, 0x1, 0x1})\n\tgithub.com/algorand/go-algorand/logging/log.go:221 +0x53\nmain.run()\n\tgithub.com/algorand/go-algorand/cmd/algod/main.go:336 +0x24ce\nmain.main()\n\tgithub.com/algorand/go-algorand/cmd/algod/main.go:63 +0x5a\n","time":"2023-10-23T14:39:00.331120-03:00"}
{"file":"main.go","function":"main.run","level":"error","line":336,"msg":"couldn't initialize the node: requested sync round cannot be less than the latest round","time":"2023-10-23T14:39:00.331120-03:00"}
{"file":"router.go","function":"github.com/algorand/go-algorand/daemon/algod/api/server.NewRouter","level":"error","line":77,"msg":"[Stack] goroutine 1 [running]:\nruntime/debug.Stack()\n\truntime/debug/stack.go:24 +0x65\ngithub.com/algorand/go-algorand/logging.logger.Errorf({0xc0004c7b90?, 0xc00000a500?}, {0x1416d6ef7, 0x33}, {0xc0007c17a0, 0x2, 0x2})\n\tgithub.com/algorand/go-algorand/logging/log.go:231 +0x69\ngithub.com/algorand/go-algorand/daemon/algod/api/server.NewRouter({0x141a22e68, 0xc000114ce0}, {0x286baf88?, 0xc0008b2500}, 0xc000dca5a0, {0x0, 0x0}, {0xc0005fc500, 0x40}, {0x141a0a870, ...}, ...)\n\tgithub.com/algorand/go-algorand/daemon/algod/api/server/router.go:77 +0x17e\ngithub.com/algorand/go-algorand/daemon/algod.(*Server).Start(0xc000052200)\n\tgithub.com/algorand/go-algorand/daemon/algod/server.go:333 +0x9b3\nmain.run()\n\tgithub.com/algorand/go-algorand/cmd/algod/main.go:417 +0x2505\nmain.main()\n\tgithub.com/algorand/go-algorand/cmd/algod/main.go:63 +0x5a\n","time":"2023-10-23T14:39:47.812159-03:00"}
{"file":"router.go","function":"github.com/algorand/go-algorand/daemon/algod/api/server.NewRouter","level":"error","line":77,"msg":"Invalid apiToken was passed to NewRouter (''): provided APIToken too short. Must be \u003e= 64 characters","time":"2023-10-23T14:39:47.813224-03:00"}
{"file":"wsNetwork.go","function":"github.com/algorand/go-algorand/network.(*WebsocketNetwork).getDNSAddrs","level":"warning","line":2089,"msg":"Cannot lookup archive SRV record for testnet.algorand.network: ReadFromBootstrap: DNS LookupSRV failed when using system resolver(no answer for (_archive._tcp.testnet.algorand.network., 33) from DNS servers [1.1.1.1\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000:53 8.8.8.8\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000:53]), fallback resolver(\u003cnil\u003e), as well as using default resolver due to no signature in DNS response for _archive._tcp.testnet.algorand.network","name":"","time":"2023-10-23T14:39:47.963723-03:00"}
{"file":"wsNetwork.go","function":"github.com/algorand/go-algorand/network.(*WebsocketNetwork).getDNSAddrs","level":"warning","line":2089,"msg":"Cannot lookup archive SRV record for testnet.algorand.network: ReadFromBootstrap: DNS LookupSRV failed when using system resolver(no answer for (_archive._tcp.testnet.algorand.network., 33) from DNS servers [1.1.1.1\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000:53 8.8.8.8\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000:53]), fallback resolver(\u003cnil\u003e), as well as using default resolver due to no signature in DNS response for _archive._tcp.testnet.algorand.network","name":"","time":"2023-10-23T14:39:48.140789-03:00"}
{"file":"follower_node.go","function":"github.com/algorand/go-algorand/node.MakeFollower","level":"error","line":141,"msg":"[Stack] goroutine 1 [running]:\nruntime/debug.Stack()\n\truntime/debug/stack.go:24 +0x65\ngithub.com/algorand/go-algorand/logging.logger.Errorf({0xc000490070?, 0xc0005b4010?}, {0x1416ce878, 0x2d}, {0xc0081ddf00, 0x1, 0x1})\n\tgithub.com/algorand/go-algorand/logging/log.go:231 +0x69\ngithub.com/algorand/go-algorand/node.MakeFollower({_, _}, {_, _}, {0x1e, 0x0, 0x4, {0x0, 0x0}, 0xdf8475800, ...}, ...)\n\tgithub.com/algorand/go-algorand/node/follower_node.go:141 +0x1009\ngithub.com/algorand/go-algorand/daemon/algod.(*Server).Initialize(_, {0x1e, 0x0, 0x4, {0x0, 0x0}, 0xdf8475800, {0x0, 0x0}, 0xf, ...}, ...)\n\tgithub.com/algorand/go-algorand/daemon/algod/server.go:234 +0x10f3\nmain.run()\n\tgithub.com/algorand/go-algorand/cmd/algod/main.go:333 +0x20c6\nmain.main()\n\tgithub.com/algorand/go-algorand/cmd/algod/main.go:63 +0x5a\n","time":"2023-10-23T14:40:18.226031-03:00"}
{"file":"follower_node.go","function":"github.com/algorand/go-algorand/node.MakeFollower","level":"error","line":141,"msg":"unable to set sync round to Ledger.DBRound requested sync round cannot be less than the latest round","time":"2023-10-23T14:40:18.227593-03:00"}
{"file":"main.go","function":"main.run","level":"error","line":336,"msg":"[Stack] goroutine 1 [running]:\nruntime/debug.Stack()\n\truntime/debug/stack.go:24 +0x65\ngithub.com/algorand/go-algorand/logging.logger.Error({0xc000490070?, 0xc0005b4010?}, {0xc008498210, 0x1, 0x1})\n\tgithub.com/algorand/go-algorand/logging/log.go:221 +0x53\nmain.run()\n\tgithub.com/algorand/go-algorand/cmd/algod/main.go:336 +0x24ce\nmain.main()\n\tgithub.com/algorand/go-algorand/cmd/algod/main.go:63 +0x5a\n","time":"2023-10-23T14:40:18.227593-03:00"}
{"file":"main.go","function":"main.run","level":"error","line":336,"msg":"couldn't initialize the node: requested sync round cannot be less than the latest round","time":"2023-10-23T14:40:18.227593-03:00"}

Expected behaviour

To be able to restart the node.

@mxmauro mxmauro added the bug Something isn't working label Oct 30, 2023
@mxmauro
Copy link
Contributor Author

mxmauro commented Nov 12, 2023

Uploaded data folder that hits the issue.

Seems the tracker database is behind the blocks and the call to node.SetSyncRound which uses LatestTrackerCommitted() fails because of this.

Maybe, when exiting the node, blocks database is updated but tracker isn't so, on restart, tracker is too far behind blocks.

data.zip

@gmalouf
Copy link
Contributor

gmalouf commented Dec 13, 2023

As Windows is not one of primary supported OS's, can we confirm that this is reproducible in MacOS/supported Linux environments?

@mxmauro
Copy link
Contributor Author

mxmauro commented Dec 15, 2023

Hi @gmalouf , just checked on macOS and before on Linux and seems not to happen. I think the goruntime uses a different scheduling and makes the blocks db to be most of times ahead of the tracker.

@gmalouf gmalouf changed the title Unable to restart follower node after an unfinished ledger sync. Unable to restart follower node after an unfinished ledger sync on Windows. Jan 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants