Cannot use IPv6 literal in registry mirror endpoint #9897

Open
brandond opened this issue Apr 9, 2024 · 4 comments
Labels
kind/upstream-issue This issue appears to be caused by an upstream bug

Comments

@brandond
Contributor

brandond commented Apr 9, 2024

This appears to be a bug in containerd's URL parsing, as well as a limitation of the TOML configuration format.

Specifying an RFC2732-compliant URL containing an IPv6 address literal as a registry endpoint generates the following TOML:

[host."https://[fd7c:53a5:aef5::242:ac11:7]"]
  capabilities = ["pull", "resolve"]
  skip_verify = true

This fails to load because the parser does not accept square brackets in the table key:

time="2024-04-09T20:26:57.094249564Z" level=error msg="failed to decode hosts.toml" error="failed to parse TOML: (8, 2): unexpected token table key cannot contain ']', was expecting a table key"

What if we just remove the brackets?

[host."https://fd7c:53a5:aef5::242:ac11:7"]
  capabilities = ["pull", "resolve"]
  skip_verify = true

Apr 09 20:22:35 systemd-node-1 k3s[2162]: E0409 20:22:35.460639 2162 kuberuntime_sandbox.go:72] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to get sandbox image \"rancher/mirrored-pause:3.6\": failed to pull image \"rancher/mirrored-pause:3.6\": failed to pull and unpack image \"docker.io/rancher/mirrored-pause:3.6\": failed to resolve reference \"docker.io/rancher/mirrored-pause:3.6\": failed to do request: Head \"https://fd7c:53a5:aef5::242:ac11:7/v2/rancher/mirrored-pause/manifests/3.6?ns=docker.io\": dial tcp [fd7c:53a5:aef5::242:ac11]:7: connect: no route to host" pod="kube-system/metrics-server-54fd9b65b-2c77m"
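
The dial address in that log, [fd7c:53a5:aef5::242:ac11]:7, is what you get if an unbracketed authority is split on its last colon. The following is a minimal Python sketch of that ambiguity, not containerd's actual parsing code; `naive_split_host_port` is a hypothetical helper written for illustration, and the bracketed URL with port 443 is an assumed example.

```python
from urllib.parse import urlsplit


def naive_split_host_port(authority: str) -> tuple[str, str]:
    """Split on the last colon, as a simple host:port parser might."""
    host, _, port = authority.rpartition(":")
    return host, port


# Without brackets, the last group of the address is mistaken for the port.
print(naive_split_host_port("fd7c:53a5:aef5::242:ac11:7"))

# With brackets, the address and the port are unambiguous.
parts = urlsplit("https://[fd7c:53a5:aef5::242:ac11:7]:443/v2")
print(parts.hostname, parts.port)
```

The brackets are the only thing that tells a parser where the address ends, which is why dropping them to satisfy the TOML parser changes the meaning of the endpoint.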

This results in the wrong behavior: the final group (hextet) of the IPv6 address literal is being used as the port. What about adding the port to the literal?

[host."https://fd7c:53a5:aef5::242:ac11:7:443"]
  capabilities = ["pull", "resolve"]

This allows the dialer to connect; however, it appears that the requests fail because the Host header sent in the request is invalid:

time="2024-04-09T20:36:11.723359701Z" level=debug msg="do request" host="fd7c:53a5:aef5::242:ac11:7:443" request.header.accept="application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, */*" request.header.user-agent=containerd/v1.7.11-k3s2 request.method=HEAD url="https://fd7c:53a5:aef5::242:ac11:7:443/rancher/mirrored-pause/manifests/3.6?ns=docker.io"
time="2024-04-09T20:36:11.738154285Z" level=debug msg="fetch response received" host="fd7c:53a5:aef5::242:ac11:7:443" response.header.content-length=19 response.header.content-type="text/plain; charset=utf-8" response.header.date="Tue, 09 Apr 2024 20:36:11 GMT" response.header.docker-distribution-api-version=registry/2.0 response.header.x-content-type-options=nosniff response.status="404 Not Found" url="https://fd7c:53a5:aef5::242:ac11:7:443/rancher/mirrored-pause/manifests/3.6?ns=docker.io"
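
For comparison, RFC 3986 requires an IPv6 literal in a URL authority (and therefore in the Host header) to be wrapped in square brackets, with the port outside them. A minimal Python sketch of composing such a value follows; `join_host_port` is a hypothetical helper for illustration and is not taken from containerd or k3s.

```python
import ipaddress


def join_host_port(host: str, port: int) -> str:
    """Build a host:port authority, bracketing IPv6 literals (RFC 3986)."""
    try:
        is_v6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        is_v6 = False  # not an IP literal; treat it as a DNS name
    return f"[{host}]:{port}" if is_v6 else f"{host}:{port}"


print(join_host_port("fd7c:53a5:aef5::242:ac11:7", 443))
print(join_host_port("127.0.0.1", 6443))
```

The host value in the debug log above, fd7c:53a5:aef5::242:ac11:7:443, has no brackets, so a receiver cannot tell the address from the port.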

Pulling the image directly seems to work OK, as the host is already properly escaped:

root@systemd-node-1:/# cat "/var/lib/rancher/k3s/agent/etc/containerd/certs.d/[fd7c:53a5:aef5::242:ac11:7]/hosts.toml"
# File generated by k3s. DO NOT EDIT.

server = "https://[fd7c:53a5:aef5::242:ac11:7]/v2"
capabilities = ["pull", "resolve", "push"]

skip_verify = true

[host]

root@systemd-node-1:/# crictl pull '[fd7c:53a5:aef5::242:ac11:7]/library/busybox:latest'
Image is up to date for sha256:ba5dc23f65d4cc4a4535bce55cf9e63b068eb02946e3422d3587e8ce803b6aab
@brandond added the kind/upstream-issue label Apr 9, 2024
@brandond changed the title from "Cannot use IPv6 literal as registry mirror endpoint" to "Cannot use IPv6 literal in registry mirror endpoint" Apr 9, 2024
@brandond
Contributor Author

brandond commented Apr 9, 2024

Tracking this in containerd at containerd/containerd#10055

@brandond added this to the Backlog milestone Apr 9, 2024
@brandond
Contributor Author

brandond commented Apr 9, 2024

It appears that Kubernetes itself has similar issues. If I try to use --system-default-registry=[fd7c:53a5:aef5::242:ac11:7], it rejects the pod, even though I can pull that image via crictl:

root@systemd-node-1:/# kubectl get pod -n kube-system   coredns-576548d8fb-bvcsc -o yaml | grep -A5 state
    state:
      waiting:
        message: 'Failed to apply default image tag "[fd7c:53a5:aef5::242:ac11:7]/rancher/mirrored-coredns-coredns:1.10.1":
          couldn''t parse image name "[fd7c:53a5:aef5::242:ac11:7]/rancher/mirrored-coredns-coredns:1.10.1":
          invalid reference format'
        reason: InvalidImageName

root@systemd-node-1:/# crictl pull [fd7c:53a5:aef5::242:ac11:7]/rancher/mirrored-coredns-coredns:1.10.1
Image is up to date for sha256:ead0a4a53df89fd173874b46093b6e62d8c72967bbf606d672c9e8c9b601a4fc

Ref:

Bumping that library, and quoting the image: value in our manifests (including the traefik deployment manifest) appears to make things work properly.

EDIT: this was actually fixed in distribution/reference@992adca, which is in all tagged releases of distribution/reference. distribution/distribution v2.8.3 switched to using that as the backend for the deprecated distribution/distribution/reference module, while Kubernetes 1.29 switched from distribution/distribution/reference to distribution/reference. tl;dr: just use the newer distribution/distribution release, which will resolve the issue on all versions of Kubernetes.

@brandond
Contributor Author

brandond commented Apr 12, 2024

This also breaks spegel on nodes with IPv6 as the primary address family, as we use 127.0.0.1 or [::1] as the address for the embedded registry endpoint. I suppose we could change this to just use localhost instead, but we've seen issues in the past with environments where that for some reason does not resolve properly.

# File generated by k3s. DO NOT EDIT.

server = "https://registry-1.docker.io/v2"
capabilities = ["pull", "resolve", "push"]

[host."https://[::1]:6443/v2"]
  capabilities = ["pull", "resolve"]
  ca = ["/var/lib/rancher/k3s/agent/server-ca.crt"]
  client = [["/var/lib/rancher/k3s/agent/client-k3s-controller.crt", "/var/lib/rancher/k3s/agent/client-k3s-controller.key"]]

@brandond
Contributor Author

brandond commented Apr 12, 2024

I'll probably pull this in to our fork even if upstream doesn't accept it.

We'll need to modify our hosts.toml generator to output escaped keys. Probably want to do that using a custom function in the template.
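
One possible reading of "escaped keys" is to emit the square brackets as \uXXXX escapes, which TOML basic strings (and so quoted keys) allow. The Python sketch below shows that idea only; `toml_quoted_key` is a hypothetical helper, the real generator is a template in k3s, and whether containerd's TOML parser accepts keys escaped this way is an assumption that has not been verified here.

```python
def toml_quoted_key(key: str) -> str:
    """Quote a TOML key, writing brackets and specials as \\uXXXX escapes."""
    out = []
    for ch in key:
        # Escape brackets (the characters the parser rejected), plus the
        # characters that always need escaping inside a TOML basic string.
        if ch in '[]"\\' or ord(ch) < 0x20:
            out.append(f"\\u{ord(ch):04X}")
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'


# Hypothetical table header for the spegel endpoint shown earlier.
print(f"[host.{toml_quoted_key('https://[::1]:6443/v2')}]")
```

A parser that follows the TOML spec would decode the escapes back to the original URL, so the key's value is unchanged and only its on-disk spelling differs.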
