DNS resolvers for IPv6 gets removed during boot [OpenStack] #8690

Open
MindTooth opened this issue May 2, 2024 · 4 comments

Comments

@MindTooth
Contributor

MindTooth commented May 2, 2024

Bug Report

Thanks for the help so far. 🙏🏻

Description

In OpenStack I have two subnets, one for IPv4 and one for IPv6. On each I have set two DNS servers, four in total. However, the resolvers for IPv6 get removed. Also, the network setup flips back and forth, so it's difficult for me to understand what is happening.

E.g. I see about five messages starting with [talos] setting resolvers {"component": "controller-runtime", "controller": "network.ResolverSpecController", "resolvers": and four messages starting with updated dns server nameservers {"component": "dns-resolve-cache", "addrs"

There are also some messages about the IPv4 address being removed and added twice. It seems strange to me that the interface is reconfigured so many times during a single boot. 😄

Currently on Cilium. I can try Flannel too?


Edit: it seems that because of this, IPv6 lookups take time, so the boot stalls for a while before it proceeds. Does it use some fallback when the IPv6 resolvers are not added? talosctl logs dns-resolve-cache does not show lookups.

Logs

  • routes
  • routespec
  • dmesg from all three nodes - cp3 is fresh node today

dns_issue.tgz - Must be decrypted.


Edit: I re-ran with the metadata service, as the initial cluster was set up with cloud-drive. This gave a different result:

debug_cp2_metadata.tgz - decrypt

Now it can't find the resolvers for IPv6 at all.

network_data.json - decrypt

Environment

  • Talos version:
Client:
	Tag:         v1.7.1
	SHA:         e9cb904e
	Built:       
	Go version:  go1.22.2
	OS/Arch:     darwin/arm64
Server:
	NODE:        10.10.10.51
	Tag:         v1.7.1
	SHA:         e9cb904e
	Built:       
	Go version:  go1.22.2
	OS/Arch:     linux/amd64
	Enabled:     RBAC
  • Kubernetes version:
Client Version: v1.30.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.0
  • Platform: OpenStack
@smira
Member

smira commented May 16, 2024

I'm not sure what the issue is, but it might be easier if you do the initial triage to help me with that, as it's hard to dig into the configuration with zero knowledge of the context.

Following the guide, you can see for yourself what all the configuration sources are and how they got merged.

E.g. for the resolvers:

  1. Get all configured resolvers (from all sources):
    talosctl get resolverspecs --namespace=network-config -o yaml
    
  2. Get final merged configuration:
    talosctl get resolverspecs -o yaml
    

If you see an issue at this point (something got merged wrong, wrong priority), you can either fix it on your side, or report an issue.

The remaining piece is the translation of OpenStack metadata to Talos network configuration. An easy way to check it is to compare the OpenStack metadata document (which you already have) with what Talos translated it to, which you can read with
talosctl read /system/state/platform-network.yaml (see here). If there's a bug at this point, a specific issue would be helpful, so we can add the buggy path to the unit tests.

@smira
Member

smira commented May 16, 2024

My guess is that there's a conflict between the DNS resolvers coming from DHCP, OpenStack metadata and your machine config, but some data gathered as I outlined above would help a lot.

@MindTooth
Contributor Author

Yes, you are right. Operator before Platform. I've attached the outputs.

resolve_issue.tgz


So, from the data you can see that DHCP4 takes precedence over platform (IPv6 SLAAC).

169.254.169.254/openstack/latest/network_data.json contains a "services": [] section, and inside it you have entries with "type": "dns". I would assume that this always contains the cumulative collection of all DNS addresses added to the subnets.
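To make the observation concrete, here is a minimal sketch of a "highest layer wins" selection. It is not Talos's actual code: the layer names, their ordering, the dictionary shape and the 2001:db8:: addresses (documentation prefix, standing in for the real IPv6 resolvers) are all assumptions for illustration.

```python
# Hypothetical layer names and priorities, for illustration only.
LAYER_PRIORITY = {"default": 0, "platform": 1, "operator": 2, "configuration": 3}


def pick_resolvers(specs):
    """Return the DNS servers of the highest-priority spec that has any."""
    candidates = [s for s in specs if s["dns_servers"]]
    if not candidates:
        return []
    winner = max(candidates, key=lambda s: LAYER_PRIORITY[s["layer"]])
    return list(winner["dns_servers"])


specs = [
    # Platform (OpenStack metadata): full list, IPv6 entries are placeholders.
    {"layer": "platform",
     "dns_servers": ["8.8.8.8", "1.1.1.1", "2001:db8::53", "2001:db8::54"]},
    # Operator (DHCPv4): can only ever report IPv4 resolvers.
    {"layer": "operator", "dns_servers": ["8.8.8.8", "1.1.1.1"]},
]

print(pick_resolvers(specs))  # prints ['8.8.8.8', '1.1.1.1']
```

With a whole-list override like this, the IPv6 resolvers from the lower layer are dropped as soon as the DHCPv4 result arrives, which would match the symptom described in this issue.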

"services": [
    {
        "address": "8.8.8.8",
        "type": "dns"
    },
    {
        "address": "1.1.1.1",
        "type": "dns"
    }
]
}
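As a rough sketch of how the DNS entries in that section could be collected and split by address family, assuming the document has the shape shown in the fragment above (the function name is hypothetical, and this is not how Talos parses it):

```python
import ipaddress
import json


def dns_servers_from_network_data(document):
    """Collect unique "type": "dns" service addresses, grouped by IP version."""
    data = json.loads(document)
    by_family = {4: [], 6: []}
    for service in data.get("services", []):
        if service.get("type") != "dns":
            continue
        address = ipaddress.ip_address(service["address"])
        bucket = by_family[address.version]
        if str(address) not in bucket:
            bucket.append(str(address))
    return by_family


# Same entries as in the fragment above, wrapped in a minimal document.
sample = """
{"services": [
    {"address": "8.8.8.8", "type": "dns"},
    {"address": "1.1.1.1", "type": "dns"}
]}
"""

print(dns_servers_from_network_data(sample))
# prints {4: ['8.8.8.8', '1.1.1.1'], 6: []}
```

Grouping by family makes it easy to see whether a given source contributes any IPv6 resolvers at all.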

Would it be natural for OpenStack to include all DNS servers from DHCP and also append the unique addresses from "services"?

Without the DNS servers being set explicitly inside machine:, Talos should gather all resolvers provided by OpenStack. This use case is maybe rare with IPv6, and especially with SLAAC. But either we need to update the docs to tell users to set resolvers explicitly, or change the logic of the OpenStack integration.

Thoughts?

Thank you for taking the time to reply.

@smira
Member

smira commented May 17, 2024

This one seems similar to the hostname issue, but not quite the same.

As I understand from your dump, OpenStack returns a full list of DNS servers (2 IPv4 + 2 IPv6), but configures the interface to run DHCPv4, which obviously only returns 2 IPv4 DNS servers (it can't return IPv6). So in this particular case, I wonder if we should apply some special rules when merging the lists of resolvers, as clearly we could be smart enough to preserve the IPv6 resolvers. I will think about this case a bit more.
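One possible shape for such a rule, purely as a sketch and not a proposal for the actual implementation: keep the higher-priority list untouched, and fill in only the address families it does not cover from the lower-priority list. The function name and the 2001:db8:: addresses (documentation prefix) are placeholders.

```python
import ipaddress


def merge_resolvers(primary, fallback):
    """Keep primary as-is, then append fallback entries for families primary lacks."""
    families = {ipaddress.ip_address(addr).version for addr in primary}
    merged = list(primary)
    for addr in fallback:
        version = ipaddress.ip_address(addr).version
        if version not in families and addr not in merged:
            merged.append(addr)
    return merged


dhcp4 = ["8.8.8.8", "1.1.1.1"]  # higher priority, IPv4 only
platform = ["8.8.8.8", "1.1.1.1", "2001:db8::53", "2001:db8::54"]

print(merge_resolvers(dhcp4, platform))
# prints ['8.8.8.8', '1.1.1.1', '2001:db8::53', '2001:db8::54']
```

If the higher-priority source already provides resolvers for a family, the lower-priority ones for that family are ignored, so an explicit override still behaves as an override.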
