Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

e2e test for CVE-2021-29923 #107552

Merged
merged 1 commit into from
Feb 9, 2022

Conversation

aojea
Copy link
Member

@aojea aojea commented Jan 14, 2022

The e2e test checks that the component implementing Kubernetes Services
interprets ClusterIPs with leading zeros as decimal, otherwise the
cluster will be exposed to CVE-2021-29923.

This also verifies the validation for Services.

The other objects that are susceptible to be vulnerable are Endpoints and EndpointSlices, but they commonly are auto generated, that makes it harder to test, so this test is a necessary but not sufficient condition.

/kind cleanup

added an e2e test to verify that the cluster is not vulnerable to CVE-2021-29923 when using Services with IPs with leading zeros, note that this test is a necessary but not sufficient condition, all the components in the clusters that consume IPs addresses from the APIs MUST interpret them as decimal or discard them.

Ref #100895

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 14, 2022
@k8s-ci-robot
Copy link
Contributor

@aojea: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jan 14, 2022
@aojea
Copy link
Member Author

aojea commented Jan 14, 2022

/sig network
/sig api-machinery
/assign @sttts @thockin @liggitt
/hold
I want to run it downstream too to verify I'm not doing wrong assumptions

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 14, 2022
@k8s-ci-robot k8s-ci-robot added the sig/network Categorizes an issue or PR as relevant to SIG Network. label Jan 14, 2022
@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 14, 2022
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Jan 14, 2022
@aojea
Copy link
Member Author

aojea commented Jan 14, 2022

nc -v -t -w 2 funny-ip 80
nc: getaddrinfo: Name does not resolve
command terminated with exit code 1

😅 I have to confirm if is coredns that is discarding the IP with leading zeros @chrisohaver

@aojea
Copy link
Member Author

aojea commented Jan 14, 2022

Indeed coredns doesn't consider the ips valid ...
https://github.com/coredns/coredns/blob/5b9b079dabc7f71463cea3f0c6a92f338935039d/plugin/kubernetes/ns.go#L50

this is going to be a chaos, hopefully nobody is using these addresses 🙃

-01-14T12:32:42.138527975Z stdout F [INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
2022-01-14T12:32:42.138546162Z stdout F CoreDNS-1.8.6
2022-01-14T12:32:42.138552199Z stdout F linux/amd64, go1.17.1, 13a9191

@chrisohaver
Copy link
Contributor

nc -v -t -w 2 funny-ip 80
nc: getaddrinfo: Name does not resolve
command terminated with exit code 1

😅 I have to confirm if is coredns that is discarding the IP with leading zeros @chrisohaver

Any CoreDnS release compiled with Go 1.17 will treat IPv4 with leading zeros as invalid. We started compiling releases with Go 1.17 in 1.8.5.

From Go 1.17 Docs: https://golang.org/doc/go1.17#minor_library_changes

The ParseIP and ParseCIDR functions now reject IPv4 addresses which contain decimal components with leading zeros. These components were always interpreted as decimal, but some operating systems treat them as octal. This mismatch could hypothetically lead to security issues if a Go application was used to validate IP addresses which were then used in their original form with non-Go applications which interpreted components as octal. Generally, it is advisable to always re-encode values after validation, which avoids this class of parser misalignment issues.

@aojea
Copy link
Member Author

aojea commented Jan 14, 2022

Any CoreDnS release compiled with Go 1.17 will treat IPv4 with leading zeros as invalid. We started compiling releases with Go 1.17 in 1.8.5.

yeah, thanks for confirming, the good side is that users will be forced to update the IPs with leading zeros, despite the data being valid in kubernetes, the users will not be able to have a fully working cluster because DNS will not work ... if we add the rest of the networking ecosystem Ingresses, CNI plugins, ... that most probably are doing the same, we can think in removing the custom parsers after a few releases

@aojea
Copy link
Member Author

aojea commented Jan 14, 2022

I've modified the test to fail only if it can't connect to the IP in decimal format but it can against the IP interpreted in octal format, that is just what the CVE describes

@aojea aojea force-pushed the e2e_parse_misalignment branch 2 times, most recently from 9d23454 to 6d00b90 Compare January 14, 2022 15:54
The e2e test checks that the component implementing Kubernetes Services
interprets ClusterIPs with leading zeros as decimal, otherwise the
cluster will be exposed to CVE-2021-29923.
@aojea
Copy link
Member Author

aojea commented Jan 14, 2022

/retest
this is working now

@aojea
Copy link
Member Author

aojea commented Jan 18, 2022

k8s.io/kubernetes/pkg/controller/statefulset: TestStatefulSetControlRollingUpdateWithPartition/SetDeleteOnly/StatefulSetAutoDeletePVCEnabled expand_more 1s
k8s.io/kubernetes/pkg/controller/statefulset: TestStatefulSetControlRollingUpdateWithPartition expand_more

/test pull-kubernetes-unit

@aojea
Copy link
Member Author

aojea commented Jan 18, 2022

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 18, 2022
@fedebongio
Copy link
Contributor

/remove-sig api-machinery

@k8s-ci-robot k8s-ci-robot removed the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Jan 18, 2022
Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

Are we going to let this go forever? It seems so brittle.

We know that rejecting these would likely break stored data.

We know that canonicalizing these could cause apply-loop problems.

We know that not doing anything is brittle.

Deciding not to decide is terrible. What if started canonicalizing IP strings. In general we don't want to do this, but I think the risk justifies it here. If we were to do this, would we also canonicalize IPv6 addresses?

})

// Try to get a free IP that has different decimal and binary interpretation with leading zeros.
// Return both IPs, the one interpretad as binary and the one interpreted as decimal.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/binary/octal

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 8, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aojea, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@aojea
Copy link
Member Author

aojea commented Feb 8, 2022

Are we going to let this go forever? It seems so brittle

I vote to wait until golang 1.16 is deprecated, check that no other project used the sloppy parsers and start a one year warning that the ips with zeros will be invalid.

The data can be valid, but will be practicallly unusable , at minimum DNS will now work #107552 (comment)

@thockin
Copy link
Member

thockin commented Feb 8, 2022 via email

@aojea
Copy link
Member Author

aojea commented Feb 8, 2022

How does golang 1.16 really relate to this? It doesn't change existing
saved data or clients (not all of which are Go, of course).

I was looking what APIs were using the parsers, and I found these:

gvr("", "v1", "nodes"): `{"kind": "Node", "apiVersion": "v1", "metadata": {"name": "node1"}, "spec": {"unschedulable": true}, "status": {"addresses":[{"address":"172.18.0.012","type":"InternalIP"}]}}`,
gvr("", "v1", "pods"): `{"kind": "Pod", "apiVersion": "v1", "metadata": {"name": "pod1", "namespace": "test-cve-2021-29923"}, "spec": {"containers": [{"image": "` + "image" + `", "name": "container7", "resources": {"limits": {"cpu": "1M"}, "requests": {"cpu": "1M"}}}]}, "status": {"podIP":"10.244.0.05","podIPs":[{"ip":"10.244.0.05"}]}}`,
gvr("", "v1", "services"): `{"kind": "Service", "apiVersion": "v1", "metadata": {"name": "service1", "namespace": "test-cve-2021-29923"}, "spec": {"clusterIP": "10.0.0.011", "externalIP": "192.168.0.012", "externalName": "service1name", "ports": [{"port": 10000, "targetPort": 11000}], "selector": {"test": "data"}}}`,
gvr("", "v1", "endpoints"): `{"kind": "Endpoints", "apiVersion": "v1", "metadata": {"name": "ep1name", "namespace": "test-cve-2021-29923"}, "subsets": [{"addresses": [{"hostname": "bar-001", "ip": "192.168.3.011"}], "ports": [{"port": 8000}]}]}`,
// k8s.io/kubernetes/pkg/apis/discovery/v1
gvr("discovery.k8s.io", "v1", "endpointslices"): `{"kind": "EndpointSlice", "apiVersion": "discovery.k8s.io/v1", "metadata": {"name": "slicev1", "namespace": "test-cve-2021-29923"}, "addressType": "IPv4", "protocol": "TCP", "ports": [], "endpoints": [{"addresses": ["10.244.0.011"], "conditions": {"ready": true, "serving": true, "terminating": false}, "nodeName": "control-plane"}]}`,
// k8s.io/kubernetes/pkg/apis/networking/v1
gvr("networking.k8s.io", "v1", "ingresses"): `{"kind": "Ingress", "apiVersion": "networking.k8s.io/v1", "metadata": {"name": "ingress3", "namespace": "test-cve-2021-29923"}, "spec": {"defaultBackend": {"service":{"name":"service", "port":{"number": 5000}}}}, "status":{"loadBalancer":{"ingress": [{"ip":"10.0.0.013"}]}}}`,
gvr("networking.k8s.io", "v1", "networkpolicies"): `{"kind": "NetworkPolicy", "apiVersion": "networking.k8s.io/v1", "metadata": {"name": "np2", "namespace": "test-cve-2021-29923"}, "spec": {"egress":[{"ports":[{"port":5978,"protocol":"TCP"}],"to":[{"ipBlock":{"cidr":"10.0.012.0/24"}}]}],"ingress":[{"from":[{"ipBlock":{"cidr":"172.017.0.0/16","except":["172.17.001.0/24"]}},{"podSelector":{"matchLabels":{"role":"frontend"}}}],"ports":[{"port":6379,"protocol":"TCP"}]}],"podSelector":{"matchLabels":{"role":"db"}},"policyTypes":["Ingress","Egress"]}}`,

The data is not going to change, but people using that data will have to migrate it or DNS will not work, per example,

#107552 (comment)

and my observation is that all projects are upgrading to 1.17 choosing the new parsers and not to drop the IPs with leading zeros, so is just not DNS, is most of the networking features.

gardener/gardener#4822
antrea-io/antrea#3189
ovn-org/ovn-kubernetes#2784
cilium/cilium#17190
projectcalico/calico#5285

Actually, it seems that only openshift and kubernetes upstream are using the sloppy parsers

https://grep.app/search?q=ParseIPSloppy&case=true

I think that being strict about data compatibility is right, but in this case, is really unlikely that people is using IPs with leading zeros, but in case there is someone, it is more unlikely that those IPs keep working after go 1.16 is deprecated.

I've also made a tool to detect this IPs on etcd dumps, https://github.com/aojea/funny-ip-etcd-detector

I think that an exception can be made for this, with a proper announcement and having a reasonable period to minimize the risk, so we can move to the stdlib parsers.

@k8s-triage-robot
Copy link

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

@k8s-ci-robot k8s-ci-robot merged commit 2047936 into kubernetes:master Feb 9, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.24 milestone Feb 9, 2022
@thockin
Copy link
Member

thockin commented Feb 9, 2022 via email

@aojea
Copy link
Member Author

aojea commented Feb 9, 2022

...because we are the only ones being careful about compat...

I don't think that we are the only that care about compat, golang decided to break this compat for a good reason, and is to avoid exposing users to this problem of parsers misalignment, and so other projects are doing.

I don't think we should break saved data.

I always think that the compat promise is a contract and as any contract in negotiable, and in this case I can see that with enough notice is a reasonable break with a very minimum risk.

My point is that I argue this saved data is valid, since as we can see, you are rolling a dice, depending on the consumer of the data, the IP can mean one thing or the other, and we are breaking for a good reasoning, avoiding users to be exposed to a very subtle security issue.

The big problem here is not the api machinery, if we know "these are the fields" we can do any of the solutions you propose, but there is no "IP" field or an easy way to track what objects and fields are affected, you can have IPs in any string field, and we can end with some IPs allow zeros and others not 🤷

@thockin
Copy link
Member

thockin commented Feb 10, 2022

Just put yourself in the place of a cluster user whose stuff works on Monday, and stops working on Tuesday. You'd be in a panic. Now you find out "Kubernetes changed some policy" and the only place it was documented was in the cluster's release notes (which, as a user, I probably don't read). How unhappy are you?

if we know "these are the fields" we can do any of the solutions you propose

You're not wrong, but I still think we should try, at least for all the things we know about. How about something like:

  1. Any new builtin APIs should use ParseIP/ParseCIDR (not sloppy) and we let validation fail.
  2. For the places we know of (service, ingress, netpol, pod, node, ...) we set a condition like "AmbiguousAddressSyntax", log them, maybe a metric.
  3. Allow updates to otherwise immutable fields as long as the IP adress is changing from a malformed value to a correct value.
  4. Ship that in 1.24.
  5. Warn that in 1.25 or 1.26 such IPs will be tolerated if they exist, but rejected for new usage.
  6. Implement (5)
  7. Warn that in N+1 or N+2 we will fix such IPs upon read, effectively purging them from k8s
  8. Implement (7) and ship
  9. Maybe put these updates in a repair loop, ship that, then eventually remove sloppy parsing entirely

@aojea
Copy link
Member Author

aojea commented Feb 10, 2022

Sounds like a good plan 👍 ... I'm in

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants