Burrow does not exit after panic #791

Open
reimai opened this issue Oct 2, 2023 · 0 comments

Version: 1.6.0
Issue: Burrow hangs (stops responding but does not exit) after failing to release its ZooKeeper lock:

2023-10-02 17:33:19.940 |   {"level":"info","ts":1696257198.8336904,"msg":"re-submitting `0` credentials after reconnect","type":"coordinator","name":"zookeeper"}
2023-10-02 17:33:19.940 |  {"level":"info","ts":1696257198.8336573,"msg":"authenticated: id=74567085257124526, timeout=6000","type":"coordinator","name":"zookeeper"}
2023-10-02 17:33:19.940 |  {"level":"info","ts":1696257198.811102,"msg":"starting session","type":"coordinator","name":"zookeeper"}
2023-10-02 17:33:19.940 |  {"level":"info","ts":1696257198.8110363,"msg":"Connected to [zk-ip1]:2181","type":"coordinator","name":"zookeeper"}
2023-10-02 17:33:18.938 | stderr   	/home/runner/work/Burrow/Burrow/core/internal/notifier/coordinator.go:272 +0x1f1
2023-10-02 17:33:18.938 | stderr   created by github.com/linkedin/Burrow/core/internal/notifier.(*Coordinator).Start
2023-10-02 17:33:18.938 | stderr   	/home/runner/work/Burrow/Burrow/core/internal/notifier/coordinator.go:328 +0x505
2023-10-02 17:33:18.938 | stderr   github.com/linkedin/Burrow/core/internal/notifier.(*Coordinator).manageEvalLoop(0xc0000f0380)
2023-10-02 17:33:18.934 | stderr   goroutine 115 [running]:
2023-10-02 17:33:18.934 | stderr
2023-10-02 17:33:18.934 | stderr   panic: Unable to release zookeeper lock after session expiration

It seems that the panic was somehow recovered, because "Burrow failed at" was not printed. The process did not die until 10 minutes later, when I sent it a SIGTERM.

A similar thing happens if I start it locally, without access to zk:

{"level":"panic","ts":1696264692.487353,"msg":"Failure to start zookeeper","type":"coordinator","name":"zookeeper","error":"lookup zk-host on [zk-ip]:53: no such host"}
panic: Failure to start zookeeper [recovered]
	panic: Failure to start zookeeper

goroutine 1 [running]:
main.handleExit()
	/home/runner/work/Burrow/Burrow/main.go:63 +0xf8
panic({0xbeb8a0, 0xc0003a60b0})
	/opt/hostedtoolcache/go/1.20.1/x64/src/runtime/panic.go:884 +0x213
go.uber.org/zap/zapcore.CheckWriteAction.OnWrite(0x1?, 0x7f680a4c45e8?, {0x0?, 0x0?, 0xc000132020?})
	/home/runner/go/pkg/mod/go.uber.org/zap@v1.24.0/zapcore/entry.go:198 +0x65
go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc00011c000, {0xc000226180, 0x1, 0x1})
	/home/runner/go/pkg/mod/go.uber.org/zap@v1.24.0/zapcore/entry.go:264 +0x3ec
go.uber.org/zap.(*Logger).Panic(0xc000226000?, {0xd227e4?, 0x0?}, {0xc000226180, 0x1, 0x1})
	/home/runner/go/pkg/mod/go.uber.org/zap@v1.24.0/logger.go:258 +0x59
github.com/linkedin/Burrow/core/internal/zookeeper.(*Coordinator).Start(0xc00014a240)
	/home/runner/work/Burrow/Burrow/core/internal/zookeeper/coordinator.go:87 +0x42b
github.com/linkedin/Burrow/core.Start(0xc000084540?, 0xc0001a5ef0?)
	/home/runner/work/Burrow/Burrow/core/burrow.go:158 +0x49b
main.main()
	/home/runner/work/Burrow/Burrow/main.go:114 +0x4d2

There are no logs after that, and the process stays alive. This time the panic has clearly been recovered.
I would very much like Burrow to exit on network problems, so that my orchestration can restart it.
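
To make the request concrete, here is a minimal sketch of the fail-fast behavior I have in mind. It is not Burrow's code: `runGuarded`, its parameters, and the injected `exit` function are hypothetical names used only for illustration. The idea is that any panic in a long-running goroutine gets reported and then turned into a non-zero process exit, rather than being recovered and leaving the process alive.

```go
package main

import (
	"fmt"
	"os"
)

// runGuarded is a hypothetical helper, not part of Burrow.
// It runs fn and, if fn panics, reports the panic on stderr and calls
// exit with a non-zero code. exit is injected so the behavior can be
// exercised without killing the process; a real service would pass os.Exit.
func runGuarded(name string, fn func(), exit func(int)) {
	defer func() {
		if r := recover(); r != nil {
			fmt.Fprintf(os.Stderr, "%s panicked: %v\n", name, r)
			exit(1)
		}
	}()
	fn()
}

func main() {
	done := make(chan struct{})
	code := 0

	go func() {
		defer close(done)
		// Simulate the failure from the first log above.
		runGuarded("coordinator", func() {
			panic("Unable to release zookeeper lock after session expiration")
		}, func(c int) { code = c }) // demo only: record the code instead of exiting
	}()

	<-done
	fmt.Println("requested exit code:", code)
}
```

With `os.Exit` passed as `exit`, the process would terminate with status 1 as soon as the goroutine panics, which is what an orchestrator needs in order to restart it.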
