ETCD no leader when nodes have problem connect with leader node #16502

EdithChenLi · 2023-08-29T02:51:06Z

I run a 3-node etcd cluster whose connections are secured with certificates. When I stop the etcd service or the server on the leader node, the other 2 nodes immediately promote one of themselves to leader. But I found that if the other 2 nodes have a problem connecting to the leader node, such as a lost or expired certificate key or network delay, no new leader is promoted. The PostgreSQL cluster then becomes read-only after 1-3 minutes.

The etcd version is etcd:3.2.32 (Podman image). Is this expected?
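For context on the first case (leader stopped, remaining nodes elect a new one), here is a minimal sketch of the majority arithmetic that Raft-based systems such as etcd rely on. The helper names `quorum` and `can_elect_leader` are hypothetical and not part of etcd. The sketch only shows why two surviving members of a three-member cluster are enough to elect a leader; it does not explain the behaviour reported for the certificate or network-delay case.

```python
def quorum(cluster_size: int) -> int:
    """Smallest number of voting members that forms a majority."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def can_elect_leader(cluster_size: int, reachable_members: int) -> bool:
    """True if the members that can reach each other form a majority.

    Assumption: every member is a voting member and 'reachable' means
    the members can exchange vote requests in both directions.
    """
    return reachable_members >= quorum(cluster_size)


if __name__ == "__main__":
    # 3-node cluster with the leader stopped: 2 members remain.
    print(can_elect_leader(3, 2))
    # A single isolated member of a 3-node cluster.
    print(can_elect_leader(3, 1))
```

Under these assumptions, the two remaining nodes in the scenario above still form a majority, so whether they actually start an election depends on other factors, such as whether they still receive heartbeats from the old leader.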

Failed to get the status of endpoint xxxx:2379 (rpc error: code = internal desc = connection error: desc = "transport: authentication handshake failed: remote error: tls: internal error")

ENDPOINT     ID       VERSION  DB SIZE  IS LEADER  RAFT TERM  RAFT INDEX
xxxxxx:2379  xxxxxxx  3.2.32   156kb    false      18         142691
xxxxxx:2379  xxxxxxx  3.2.32   160kb    false      18         142691
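The status output above can also be checked mechanically for the "no leader" condition. The following is a rough sketch, assuming the rows are whitespace-separated with exactly seven fields, as pasted in this post. Real `etcdctl endpoint status` table output has `|` borders, and parsing the JSON output format would be more robust. The function names and the `SAMPLE` constant are made up for illustration.

```python
from typing import Dict, List

# Column names in the order they appear in the pasted status rows.
FIELDS = ["endpoint", "id", "version", "db_size",
          "is_leader", "raft_term", "raft_index"]


def parse_status_rows(text: str) -> List[Dict[str, str]]:
    """Parse whitespace-separated status rows into dictionaries.

    Lines that do not have exactly seven fields (blank lines and the
    header, whose multi-word column names split into more tokens)
    are skipped.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != len(FIELDS) or parts[0] == "ENDPOINT":
            continue
        rows.append(dict(zip(FIELDS, parts)))
    return rows


def has_leader(rows: List[Dict[str, str]]) -> bool:
    """True if at least one reachable endpoint reports itself as leader."""
    return any(row["is_leader"].lower() == "true" for row in rows)


# Sample copied from the output in this post (identifiers are redacted).
SAMPLE = """\
ENDPOINT ID VERSION DB SIZE IS LEADER RAFT TERM RAFT INDEX
xxxxxx:2379 xxxxxxx 3.2.32 156kb false 18 142691
xxxxxx:2379 xxxxxxx 3.2.32 160kb false 18 142691
"""

if __name__ == "__main__":
    rows = parse_status_rows(SAMPLE)
    print("reachable endpoints:", len(rows))
    print("leader among them:", has_leader(rows))
```

Note that both reachable endpoints report the same Raft term (18) and index, which suggests that neither has started a new election since the connection problem began.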

jmhbnz · 2023-08-29T03:11:06Z

Maintainer

Hey @EdithChenLi - Thanks for raising this question.

This won't be what you want to hear, but etcd v3.2.32 is 2.5 years old and out of support. There have been tons of fixes since that release. Do you know if this issue occurs on a more recent version of etcd, ideally the latest 3.4.x or 3.5.x release?

EdithChenLi · 2023-08-29T03:27:35Z

Author

Thanks for the response, @jmhbnz. I know this version is a bit old, but we installed etcd from a Podman image whose latest version is etcd:3.2.32.

1 reply

jmhbnz Aug 29, 2023
Maintainer

You can find more recent container images on quay.io: https://quay.io/repository/coreos/etcd?tab=tags&tag=latest

Please check if a later version resolved the issue.

EdithChenLi · 2023-08-29T04:09:08Z

Author

@jmhbnz To answer your question: I have not used a version > 3.4 before, so I am not sure whether the same issue happens there. According to the etcd docs, there seem to be lease and learner node setups which should help fix this issue.
