Repeated attempts to reconcile mesh network #253

Open
sbaildon opened this issue Nov 14, 2021 · 6 comments

@sbaildon
Contributor

sbaildon commented Nov 14, 2021

When I connect an outside peer (e.g. my laptop) to the cluster, kilo sees that the configurations aren't the same and recreates the mesh to reconcile the differences. However, the config never ends up as expected, so kilo constantly attempts to reconcile, killing the network every ~30 seconds.

I'm going to keep debugging, but I created this issue just in case you know what's up before I spend time here.

I added some prints to see what was going on:

level.Info(logger).Log("reason", "peer endpoints", "c", c, "b", b)

b and c:
{
  "b": {
    "Interface": {
      "ListenPort": 51820,
      "PrivateKey": "redacted="
    },
    "Peers": [
      {
        "AllowedIPs": [
          {
            "IP": "10.0.0.2",
            "Mask": "/////w=="
          },
          {
            "IP": "10.4.0.2",
            "Mask": "/////w=="
          },
          {
            "IP": "10.42.0.0",
            "Mask": "////AA=="
          }
        ],
        "Endpoint": {
          "DNS": "",
          "IP": "10.0.0.2",
          "Port": 51820
        },
        "PersistentKeepalive": 0,
        "PresharedKey": null,
        "PublicKey": "MDN2K2trTzZmVGNTSW42MktibGs2d3BkMW5pdnEyOElXVU0wU3hhQ3AxMD0=",
        "LatestHandshake": "2021-11-14T13:15:34Z"
      },
      {
        "AllowedIPs": [
          {
            "IP": "10.5.0.2",
            "Mask": "/////w=="
          }
        ],
        "Endpoint": null,
        "PersistentKeepalive": 0,
        "PresharedKey": null,
        "PublicKey": "WFZjZDhEQjloZFAxUENTeXh1QVBha3BCOVpqRCt1TWdCUld2Q3lJbDAxZz0=",
        "LatestHandshake": "0001-01-01T00:00:00Z"
      },
      {
        "AllowedIPs": [
          {
            "IP": "10.5.0.1",
            "Mask": "/////w=="
          }
        ],
        "Endpoint": {
          "DNS": "",
          "IP": "91.130.160.180",
          "Port": 57943
        },
        "PersistentKeepalive": 0,
        "PresharedKey": null,
        "PublicKey": "YjJxN1ZaeEpiZnl3Nlh6ZFRQR1JkSGJqVHRIblpwVlZwY1FhNHpyTmtWRT0=",
        "LatestHandshake": "2021-11-14T13:17:02Z"
      }
    ]
  },
  "caller": "conf.go:355",
  "level": "info",
  "reason": "peer endpoints",
  "ts": "2021-11-14T13:17:30.217962107Z"
}
{
  "c": {
    "Interface": {
      "ListenPort": 51820,
      "PrivateKey": "redacted="
    },
    "Peers": [
      {
        "AllowedIPs": [
          {
            "IP": "10.0.0.2",
            "Mask": "/////w=="
          },
          {
            "IP": "10.4.0.2",
            "Mask": "/////w=="
          },
          {
            "IP": "10.42.0.0",
            "Mask": "////AA=="
          }
        ],
        "Endpoint": {
          "DNS": "",
          "IP": "10.0.0.2",
          "Port": 51820
        },
        "PersistentKeepalive": 0,
        "PresharedKey": null,
        "PublicKey": "MDN2K2trTzZmVGNTSW42MktibGs2d3BkMW5pdnEyOElXVU0wU3hhQ3AxMD0=",
        "LatestHandshake": "0001-01-01T00:00:00Z"
      },
      {
        "AllowedIPs": [
          {
            "IP": "10.5.0.2",
            "Mask": "/////w=="
          }
        ],
        "Endpoint": null,
        "PersistentKeepalive": 0,
        "PresharedKey": null,
        "PublicKey": "WFZjZDhEQjloZFAxUENTeXh1QVBha3BCOVpqRCt1TWdCUld2Q3lJbDAxZz0=",
        "LatestHandshake": "0001-01-01T00:00:00Z"
      },
      {
        "AllowedIPs": [
          {
            "IP": "10.5.0.1",
            "Mask": "/////w=="
          }
        ],
        "Endpoint": null,
        "PersistentKeepalive": 0,
        "PresharedKey": null,
        "PublicKey": "YjJxN1ZaeEpiZnl3Nlh6ZFRQR1JkSGJqVHRIblpwVlZwY1FhNHpyTmtWRT0=",
        "LatestHandshake": "0001-01-01T00:00:00Z"
      }
    ]
  },
  "caller": "conf.go:355",
  "level": "info",
  "reason": "peer endpoints",
  "ts": "2021-11-14T13:17:30.217962107Z"
}
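A note on reading the dump: the Mask values are base64 because Go's encoding/json serialises byte slices such as net.IPMask that way. The sketch below decodes the two masks that appear above into prefix lengths; the helper name prefixLen is made up for illustration and is not part of Kilo.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"net"
)

// prefixLen decodes a base64-encoded netmask, as printed in the JSON dump,
// and returns its prefix length in bits. Hypothetical helper, not Kilo code.
func prefixLen(mask string) (int, error) {
	raw, err := base64.StdEncoding.DecodeString(mask)
	if err != nil {
		return 0, err
	}
	// Size returns (0, 0) when the mask is not a run of ones followed by zeros.
	ones, bits := net.IPMask(raw).Size()
	if ones == 0 && bits == 0 {
		return 0, fmt.Errorf("mask %q is not in canonical form", mask)
	}
	return ones, nil
}

func main() {
	for _, m := range []string{"/////w==", "////AA=="} {
		n, err := prefixLen(m)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> /%d\n", m, n)
	}
}
```

So "/////w==" is a /32 (a single host, as for 10.5.0.1) and "////AA==" is a /24 (the 10.42.0.0 range).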

Turns out my laptop peer, 10.5.0.1, has a configured endpoint in the old conf, b, but a null endpoint in the new conf, c, and that's what's causing kilo to reconcile the differences.
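For anyone comparing two dumps like these by hand, here is a minimal sketch of the comparison that finds the offending peer. The Peer and Endpoint types and the endpointMismatches function are simplified, hypothetical stand-ins, not Kilo's actual types or its Equal logic; only the keys and endpoints are copied from the dump.

```go
package main

import "fmt"

// Endpoint and Peer are simplified, hypothetical stand-ins for the
// structures in the dump; they are not Kilo's actual types.
type Endpoint struct {
	IP   string
	Port int
}

type Peer struct {
	PublicKey string
	Endpoint  *Endpoint
}

// sameEndpoint treats two nil endpoints as equal and a nil/non-nil pair as different.
func sameEndpoint(a, b *Endpoint) bool {
	if a == nil || b == nil {
		return a == b
	}
	return *a == *b
}

// endpointMismatches returns the public keys of peers present in both
// configs whose endpoints differ.
func endpointMismatches(oldPeers, newPeers []Peer) []string {
	byKey := make(map[string]*Endpoint, len(newPeers))
	for _, p := range newPeers {
		byKey[p.PublicKey] = p.Endpoint
	}
	var out []string
	for _, p := range oldPeers {
		if ep, ok := byKey[p.PublicKey]; ok && !sameEndpoint(p.Endpoint, ep) {
			out = append(out, p.PublicKey)
		}
	}
	return out
}

// Values copied from the b (old) and c (new) dumps.
var (
	oldConf = []Peer{
		{"MDN2K2trTzZmVGNTSW42MktibGs2d3BkMW5pdnEyOElXVU0wU3hhQ3AxMD0=", &Endpoint{"10.0.0.2", 51820}},
		{"WFZjZDhEQjloZFAxUENTeXh1QVBha3BCOVpqRCt1TWdCUld2Q3lJbDAxZz0=", nil},
		{"YjJxN1ZaeEpiZnl3Nlh6ZFRQR1JkSGJqVHRIblpwVlZwY1FhNHpyTmtWRT0=", &Endpoint{"91.130.160.180", 57943}},
	}
	newConf = []Peer{
		{"MDN2K2trTzZmVGNTSW42MktibGs2d3BkMW5pdnEyOElXVU0wU3hhQ3AxMD0=", &Endpoint{"10.0.0.2", 51820}},
		{"WFZjZDhEQjloZFAxUENTeXh1QVBha3BCOVpqRCt1TWdCUld2Q3lJbDAxZz0=", nil},
		{"YjJxN1ZaeEpiZnl3Nlh6ZFRQR1JkSGJqVHRIblpwVlZwY1FhNHpyTmtWRT0=", nil},
	}
)

func main() {
	// Only the 10.5.0.1 peer differs: endpoint set in b, nil in c.
	fmt.Println(endpointMismatches(oldConf, newConf))
}
```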

@leonnicolas
Collaborator

I think it is because your laptop's endpoint has been discovered since #146, and now Kilo wants to reapply the spec of your laptop's peer, which has a nil endpoint: the actual endpoint has been added, so spec and reality have diverged. Let me check why I haven't noticed this with my laptop. Maybe this is wrong.

@leonnicolas
Collaborator

leonnicolas commented Nov 14, 2021

What is the Peer spec of your laptop? Did you set persistent-keepalive to 0?
Because the endpoint is not updated if it is 0:

if persistentKeepalive == 0 {

@sbaildon
Contributor Author

What is the Peer spec of your laptop? Did you set persistent-keepalive to 0? Because the endpoint is not updated if it is 0:

if persistentKeepalive == 0 {

Brilliant, that's exactly what's happening. I've added a persistentKeepalive and the network stays stable.

@sbaildon
Contributor Author

Defining a peer with a persistent keep alive of 0

apiVersion: kilo.squat.ai/v1alpha1
kind: Peer
metadata:
  name: laptop
spec:
  allowedIPs:
  - 10.5.0.1/32
  publicKey: SzhsHapvJy61urzHTAvx3Iu7ANlO+PGbsPy/mKY8U10=
  persistentKeepalive: 0

still results in kilo attempting to reconcile the mesh network (see the third log line, ~30 seconds after the apply):

{"caller":"mesh.go:344","component":"kilo","event":"add","level":"info","peer":{"PublicKey":[75,56,108,29,170,111,39,46,181,186,188,199,76,11,241,220,139,187,0,217,78,248,241,155,176,252,191,152,166,60,83,93],"Remove":false,"UpdateOnly":false,"PresharedKey":null,"PersistentKeepaliveInterval":0,"ReplaceAllowedIPs":false,"AllowedIPs":[{"IP":"10.5.0.1","Mask":"/////w=="}],"Endpoint":null,"Name":"laptop"},"ts":"2022-05-25T00:50:29.118108442Z"}

{"caller":"mesh.go:544","component":"kilo","diff":"number of peers: old=1, new=2","level":"info","msg":"WireGuard configurations are different","ts":"2022-05-25T00:50:29.16908714Z"}

{"caller":"mesh.go:544","component":"kilo","diff":"peer endpoints: nil value","level":"info","msg":"WireGuard configurations are different","ts":"2022-05-25T00:50:59.040795773Z"}
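The "add" event logs the peer's public key as a raw byte array rather than the base64 string used in the manifest. A small sketch to confirm that the logged key is the laptop peer's key; the names loggedKey and encodeKey are made up for illustration.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// loggedKey is the PublicKey byte array copied from the "add" event in the log.
var loggedKey = []byte{
	75, 56, 108, 29, 170, 111, 39, 46, 181, 186, 188, 199, 76, 11, 241, 220,
	139, 187, 0, 217, 78, 248, 241, 155, 176, 252, 191, 152, 166, 60, 83, 93,
}

// encodeKey renders a raw 32-byte WireGuard key in the standard base64 form
// used in Peer manifests and wg config files.
func encodeKey(raw []byte) string {
	return base64.StdEncoding.EncodeToString(raw)
}

func main() {
	fmt.Println(encodeKey(loggedKey))
	// prints "SzhsHapvJy61urzHTAvx3Iu7ANlO+PGbsPy/mKY8U10="
}
```

That matches the publicKey of the laptop Peer, so the event and the subsequent "peer endpoints: nil value" diff concern the same peer that was just applied.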

Is the intention of this code-path to prevent mesh reconciliation if pka == nil || pka == 0? Or am I misunderstanding?

if persistentKeepalive == nil || *persistentKeepalive == time.Duration(0) {
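To make the question concrete, here is the quoted condition in isolation. This is only a sketch of the guard's truth table: keepaliveDisabled and secs are hypothetical helpers, and what Kilo does inside that branch is not shown in this thread, so no claim is made about it here.

```go
package main

import (
	"fmt"
	"time"
)

// keepaliveDisabled mirrors the quoted condition: an unset keepalive and an
// explicit zero are treated the same. Hypothetical helper, not Kilo code.
func keepaliveDisabled(pka *time.Duration) bool {
	return pka == nil || *pka == time.Duration(0)
}

// secs is a small convenience for building *time.Duration values.
func secs(n int) *time.Duration {
	d := time.Duration(n) * time.Second
	return &d
}

func main() {
	fmt.Println(keepaliveDisabled(nil))      // true: field omitted
	fmt.Println(keepaliveDisabled(secs(0)))  // true: persistentKeepalive: 0
	fmt.Println(keepaliveDisabled(secs(25))) // false: keepalive enabled
}
```

Under this reading, omitting persistentKeepalive and setting it to 0 should behave identically, which is why the differing behaviour between peers with the same spec is surprising.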

FWIW, I'm not bothered about keeping otherwise silent connections alive through NAT

@sbaildon sbaildon reopened this May 25, 2022
@sbaildon
Contributor Author

Some mysterious behaviour I don't quite understand: I have a peer configuration called phone, intended for my, well, uh, phone, which doesn't cause mesh reconciliation (I'm tailing kilo's logs to check). My phone is connected to the same WiFi network; there's no cellular involved here.

apiVersion: kilo.squat.ai/v1alpha1
kind: Peer
metadata:
  name: laptop
spec:
  allowedIPs:
  - 10.5.0.1/32
  publicKey: SzhsHapvJy61urzHTAvx3Iu7ANlO+PGbsPy/mKY8U10=
  persistentKeepalive: 0
---
apiVersion: kilo.squat.ai/v1alpha1
kind: Peer
metadata:
  name: phone
spec:
  allowedIPs:
  - 10.5.0.2/32
  publicKey: urgVgSoHEwG5/7q0k5NpjWSBpAyxPfhvdT/v0zd561o=
  persistentKeepalive: 0

Taking a stab in the dark that something is up with the laptop peer, I created a third peer, dummy, and connected from my laptop. No good; there's mesh reconciliation there too.

apiVersion: kilo.squat.ai/v1alpha1
kind: Peer
metadata:
  name: dummy
spec:
  allowedIPs:
  - 10.5.0.3/32
  publicKey: AzckRiPfM30PNbyX/kxCv59YlIfaoj/hVU7LPkxuuAw=
  persistentKeepalive: 0

Okay, so now thinking something is up with the clients, I migrate the laptop peer config to my phone and connect from there. No good; reconciliation again. I try dummy from my phone. Also reconciliation.

So now the reverse—export the phone peer and import it on my laptop. Strange—there's no reconciliation at all. For whatever reason the phone peer doesn't cause any undesired behaviour.

@sbaildon
Contributor Author

sbaildon commented May 25, 2022

I moved the private key from dummy to phone, kept the rest the same; mesh reconciliation.

Reset phone back to the original keypair—no reconciliation.

🤯
