Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PersistentKeepalive in Peer CRD is not written to Wireguard config #153

Open
ToothlessGear opened this issue May 5, 2021 · 14 comments
Open

Comments

@ToothlessGear
Copy link

ToothlessGear commented May 5, 2021

Hi @squat,

I am running a cluster with --mesh-granularity=full (kg version: e843262064963aef2491a9fd32c82746c31b2af6) with the nodes behind NAT and trying to connect to an external peer.
When I create this peer resource with a PersistentKeepalive value it does not show in the resulting Wireguard config.

apiVersion: kilo.squat.ai/v1alpha1
kind: Peer
metadata:
  name: externalpeer
spec:
  allowedIPs:
    - 10.6.0.1/32
  endpoint:
    ip: PUBLIC_IP_OMITTED
    port: 51820
  persistentKeepalive: 25
  publicKey: PUBLIC_KEY_OMITTED

In the resulting Wireguard configuration on the nodes, the set value for persistentKeepAlive is omitted:

$ sudo wg show
interface: kilo0
  public key: PUBLIC_KEY_OMITTED
  private key: (hidden)
  listening port: 51820

... other node peers ...

peer: PUBLIC_KEY_OMITTED
  endpoint: PUBLIC_ENDPOINT_OMITTED
  allowed ips: 10.6.0.1/32

Using kgctl it shows the values correctly, however:

$ kgctl --kubeconfig ~/k3s.yaml showconf peer externalpeer --as-peer
[Peer]
AllowedIPs = 10.6.0.1/32
Endpoint = PUBLIC_ENDPOINT_OMITTED
PersistentKeepalive = 25
PublicKey = PUBLIC_KEY_OMITTED

Setting it via the node annotation kilo.squat.ai/persistent-keepalive: "25" works and a connection is established, but then it is set for all peers, which I want to avoid.

It seem like in topology.Conf() it always uses the value set by the Topology struct, rather than the value of the peer.

Is this a bug, or is this intentional and I am missing something?

@leonnicolas
Copy link
Collaborator

Apparently the PersistenKeepalive is only meant for the client (Peer) configuration. If you would change the current behavior of the topology.Conf() function, you would make all your nodes with a public endpoint send Keepalives to the specific peer, too, although this is unnecessary.
I guess, a feature that would enable a more fine grain configuration of Keepalives being send by nodes to only specific peers is not very useful. Normally, a node behind NAT should send Keepalives to all its peers, not just to a particular one.
So specifying the PersistentKeepalive for a peer is only for convenience, when using kgctl showconf to generate a client configuration? wdyt?

@ToothlessGear
Copy link
Author

Normally, a node behind NAT should send Keepalives to all its peers, not just to a particular one.

I agree with that.
However, all cluster nodes are on the same network/in the same location, where the PersistentKeepalive annotation on the node would not be needed as well. My thinking was that, if I want to connect this location to an external peer (this is a simple machine, not another Kubernetes node) which is publicly available, I can simple create the CRD and set the PersistentKeepalive value.

Maybe the behavior could be, that if the PersistentKeepalive value is set for the topology, it will override the value set by CRD. Something like this, but I am not sure about the implication in other use cases:

diff --git a/pkg/mesh/topology.go b/pkg/mesh/topology.go
index 858c900..d0ffd3f 100644
--- a/pkg/mesh/topology.go
+++ b/pkg/mesh/topology.go
@@ -227,10 +227,16 @@ func (t *Topology) Conf() *wireguard.Conf {
                c.Peers = append(c.Peers, peer)
        }
        for _, p := range t.peers {
+               var pka int
+               if t.persistentKeepalive != 0 {
+                       pka = t.persistentKeepalive
+               } else {
+                       pka = p.PersistentKeepalive
+               }
                peer := &wireguard.Peer{
                        AllowedIPs:          p.AllowedIPs,
                        Endpoint:            t.updateEndpoint(p.Endpoint, p.PublicKey, p.PersistentKeepalive),
-                       PersistentKeepalive: t.persistentKeepalive,
+                       PersistentKeepalive: pka,
                        PresharedKey:        p.PresharedKey,
                        PublicKey:           p.PublicKey,
                }

@leonnicolas
Copy link
Collaborator

leonnicolas commented May 7, 2021

However, all cluster nodes are on the same network/in the same location, where the PersistentKeepalive annotation on the node would not be needed as well. My thinking was that, if I want to connect this location to an external peer (this is a simple machine, not another Kubernetes node) which is publicly available, I can simple create the CRD and set the PersistentKeepalive value.

I see the nodes are in one network behind NAT and use WireGurad to encrypt all traffic (full Mesh)

Maybe the behavior could be, that if the PersistentKeepalive value is set for the topology, it will override the value set by CRD. Something like this, but I am not sure about the implication in other use cases:

If I understand correctly, this would mean even publicly available nodes will send keepalives to peers. This would make WireGuard more chatty which is against its core ideas.

I think to make your use case work, we would need to check along with your proposed changes whether or not the node has a public endpoint and only then set a persistent keepalive for the peer. I wonder what @squat thinks about it.

@ToothlessGear
Copy link
Author

I think to make your use case work, we would need to check along with your proposed changes whether or not the node has a public endpoint and only then set a persistent keepalive for the peer. I wonder what @squat thinks about it.

Yes, feedback would be appreciated if this is something worth pursuing in a PR.

@leonnicolas
Copy link
Collaborator

leonnicolas commented May 16, 2021

Setting the persistent keepalive only in some cases (node has no persistent keepalive annotation and has a private endpoint) has some disadvantages:

  • when endpoint is a DNS name, we would have to look that up all the time
  • might be confusing because it would (sometimes) change the meaning of the current PersistentKeepalive field in the CRD
  • the solution is not very flexible (no way to configure it)

We could add a second field (eg. PersistentKeepaliveNode) to the CRD and in the simplest case all nodes in the full mesh would send the keepalive to the peer. This could be expended by adding an array with node names (only the given nodes should send a keepalive) . But this is hard to maintain when the node names change. There could be some kind of wildcard?

What do you think about an additional field in the CRD?

@squat
Copy link
Owner

squat commented May 16, 2021

Yes, this sounds like a more understandable API that won't produce quite so many surprises. If we want to keep it flexible and not be so tightly bound to node names, which in real production clusters, come and go, we could use label matchers to select nodes

@ToothlessGear
Copy link
Author

Sounds good! I am fine with an addtional field in the CRD.

So to summarize the scenario with an example:
I have some number nodes in the same location/network behind NAT, all with private endpoints, and trying to add a ordinary peer which is publicly reachable. These nodes themselves are labeled with networktype: behind-nat for this example.

I would have to create such a CRD:

apiVersion: kilo.squat.ai/v1alpha1
kind: Peer
metadata:
  name: externalpeer
spec:
  allowedIPs:
    - 10.6.0.1/32
  endpoint:
    ip: 1.2.3.4
    port: 51820
  persistentKeepalive: 25
  persistentKeepaliveNodeSelector:
    networktype: behind-nat
  publicKey: SOMEPUBLICKEY

This results that the persistent keepalive value will be written to the resulting WireGuard config for only this peer in a full mesh configuration.
If the annotation kilo.squat.ai/persistent-keepalive: "25" is set, it still should have precedence over the CRD config, I think.

Does that sound reasonable?

@leonnicolas
Copy link
Collaborator

Should we make a second persistent keepalive field to avoid the keepalive for the public peer in the WireGurad config when you run kgctl showconf peer <public peer>?

@squat
Copy link
Owner

squat commented May 20, 2021

Yeah, Leon and I were thinking we could make the API look something like:

apiVersion: kilo.squat.ai/v1alpha1
kind: Peer
metadata:
  name: externalpeer
spec:
  allowedIPs:
    - 10.6.0.1/32
  endpoint:
    ip: 1.2.3.4
    port: 51820
  persistentKeepalive: 25
  nodePersistentKeepalive:
  - value: 10
    selector:
      networktype: behind-nat
  publicKey: SOMEPUBLICKEY

wdyt? Alternatively, we could make the array instead be a single value and assume users don't need more granularity than that. One final consideration was whether we should be using nodes and corresponding label selectors or if we should be using location names? We may want to consider how this may affect a possible future where we may or may not want to introduce a location CRD.

@ToothlessGear
Copy link
Author

Ah, now I got you about the second value.
This looks nice! I think the array is fine here.
If choosing the location directly instead of the node label, wouldn't it make the API simpler, like:

nodePersistentKeepalive:
  - value: 10
    location: behind-nat

@squat
Copy link
Owner

squat commented Apr 7, 2022

I've been thinking recently about this change and thought that it would be good to enhance the Peer CRD to version v1alpha2 and add new fields.

I'd like to go one step further and provide an escape hatch that can be used on the Peer CR to override any fields of any Peers, e.g.

apiVersion: kilo.squat.ai/v1alpha1
kind: Peer
metadata:
  name: externalpeer
spec:
  allowedIPs:
    - 10.6.0.1/32
  endpoint:
    ip: 1.2.3.4
    port: 51820
  persistentKeepalive: 25
  peers:
  - selector:
      publicKey: xxxx
    persistentKeepalive: 10
  - selector:
      location: foo
    endpoint: x.x.x.x:xxxx
  - selector:
      peer: bar
    allowedIPs:
    - yyyy
    - zzzz

The selector field of the nested peer object would allow selecting WireGuard peers based on Kilo location name, WireGuard public key, or Kilo peer name.

Any thoughts?

@ToothlessGear
Copy link
Author

Generally I quite like this approach!

But I am not quite sure, if I understand the peer selector correctly.
Does that mean one Peer CR could override a different one?

I think a peer must not modify anything, other than itself.

If multiple Peer CRs are overriding the same selector, it could lead to undefined behavior. Which one would "win"?
It also could circumvent certain RBAC rules that are set in place (one is allowed to edit a certain Peer, but not others), couldn't it?

@squat
Copy link
Owner

squat commented Apr 8, 2022

The idea is that one peer could override the configuration it generates for itself when wanting to communicate with another peer. E.g., i want to use a different persistent keepalive when i talk to that person. This helps make my peer CRD fully describe and encapsulate my configuration while letting everyone else's config remain unchanged/use automatically deducted values

@ToothlessGear
Copy link
Author

I misunderstood, thanks for the clarification.
Seems good!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants