The gossip protocol itself is fine with such a situation, because it is not concerned with security or with a list of desired peers.
Alertmanager should probably compare the peers from its config with the peers reported by the gossip library as a basic check (not sure what to do if the lists differ).
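A minimal sketch of the basic check suggested above, assuming both lists are already available as plain address strings. The function name `diff_peers` and its return shape are hypothetical; this is not an Alertmanager or gossip-library API, and it deliberately leaves open what to do when the lists differ.

```python
def diff_peers(configured, gossiped):
    """Compare the peers from the config with the peers reported by gossip.

    Both arguments are iterables of address strings (an assumption of this
    sketch; a real config may list DNS names that must be resolved first).
    """
    configured, gossiped = set(configured), set(gossiped)
    return {
        # Peers the gossip layer knows about but the config never listed.
        "unexpected": gossiped - configured,
        # Peers the config lists but gossip does not currently see.
        "missing": configured - gossiped,
    }


# Example: the foreign peer 1.1.1.1 shows up only in the gossip view.
result = diff_peers(
    configured=["2.2.2.2", "3.3.3.3", "4.4.4.4"],
    gossiped=["1.1.1.1", "2.2.2.2", "3.3.3.3", "4.4.4.4"],
)
```

Here `result["unexpected"]` contains only the address that was never configured, which is the signal the check would have to act on.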
Hi,
I have a question about the gossip implementation used in Prometheus Alertmanager.
We currently have an Alertmanager HA cluster that was configured with three nodes at startup (in the config file) but is now running with four nodes. The fourth node joined the cluster some time later, from another k8s namespace.
I suspect the gossip protocol. If the list of available members is just a list of IPs, and given that in a k8s cluster a new Pod can be assigned the IP of a Pod that has just died, then (e.g. during cluster maintenance, when many Pods are restarted) some Pod can take over the IP of a legitimate Alertmanager Pod.
The new Pod is part of another Alertmanager setup, which has only one peer.
In other words, the gossip implementation may trust a new peer that has the same IP. (?)
My question is: can this library be used in such a way that a new peer ends up in the gossip table of available peers even though it was not there at the beginning?
E.g.:
alertmanagerA at start time:
peer1 => peer1.a.com (1.1.1.1)
peers: 1.1.1.1
alertmanagerB at start time:
peer1 => peer1.b.com (2.2.2.2)
peer2 => peer2.b.com (3.3.3.3)
peer3 => peer3.b.com (4.4.4.4)
peers: 2.2.2.2, 3.3.3.3, 4.4.4.4
At some point the Pods are restarted, not all at the same time, so that:
alertmanagerA:
peer1 => peer1.a.com (2.2.2.2)
peers: 2.2.2.2, 22.22.22.22, 33.33.33.33, 44.44.44.44
alertmanagerB:
peer1 => peer1.b.com (22.22.22.22)
peer2 => peer2.b.com (33.33.33.33)
peer3 => peer3.b.com (44.44.44.44)
peers: 2.2.2.2, 22.22.22.22, 33.33.33.33, 44.44.44.44
peer1 of alertmanagerA accidentally got the former IP of peer1 of alertmanagerB.
Therefore the peers of alertmanagerB are able to share their list of active peers with the peer at IP 2.2.2.2, and the peer at 2.2.2.2 in its turn sends liveness pings to the extended list of nodes.
Is such a situation possible in theory?
I'll try to reproduce it in a small k8s cluster with a reduced pool of available Pod IPs.
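To make the suspected mechanism concrete, here is a toy model of it. It is not the real gossip library: it assumes membership is keyed purely by IP and that nodes merge member lists with whoever answers at a known address. The `Node` class and `gossip_until_stable` function are hypothetical names. The snapshot is the moment when peer1 of alertmanagerB has already moved to 22.22.22.22, while peer2 and peer3 still remember its old address 2.2.2.2, which now belongs to peer1 of alertmanagerA.

```python
class Node:
    def __init__(self, name, addr, known=()):
        self.name = name
        self.addr = addr
        # Addresses this node believes are members of its cluster.
        self.known = {addr, *known}


def gossip_until_stable(nodes):
    """Merge member lists between nodes until nothing changes.

    A node exchanges lists with whoever currently answers at an address it
    knows; there is no identity check beyond the address (toy assumption).
    """
    by_addr = {n.addr: n for n in nodes}
    changed = True
    while changed:
        changed = False
        for sender in nodes:
            for addr in sorted(sender.known):
                receiver = by_addr.get(addr)
                if receiver is None or receiver is sender:
                    continue  # nobody lives at that address, or it is us
                union = sender.known | receiver.known
                if union != sender.known or union != receiver.known:
                    sender.known = set(union)
                    receiver.known = set(union)
                    changed = True


# alertmanagerA's only peer restarted and was given B's old address.
a1 = Node("a/peer1", "2.2.2.2")
# alertmanagerB: peer1 restarted onto a new IP; peer2/peer3 hold a stale entry.
b1 = Node("b/peer1", "22.22.22.22", known={"3.3.3.3", "4.4.4.4"})
b2 = Node("b/peer2", "3.3.3.3", known={"2.2.2.2", "4.4.4.4"})
b3 = Node("b/peer3", "4.4.4.4", known={"2.2.2.2", "3.3.3.3"})

gossip_until_stable([a1, b1, b2, b3])
```

Under these assumptions `a1.known` ends up containing all four addresses, i.e. the two clusters merge into one four-node cluster. Whether the real implementation behaves this way depends on how it identifies peers, which is exactly the question.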