
CSI plugins sent incorrect authority headers during registration with kubelet #108254

Closed

EricRnR opened this issue Feb 21, 2022 · 6 comments · Fixed by #112597

Labels
  • kind/bug: Categorizes issue or PR as related to a bug.
  • lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
  • needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
  • sig/storage: Categorizes an issue or PR as relevant to SIG Storage.

Comments

EricRnR commented Feb 21, 2022

What happened?

When using the CSI node-driver-registrar sidecar container, the kubelet-registration-path parameter is set to either unix:///path/to/unix.sock or /path/to/unix.sock. At least one of these forms should result in kubelet sending a valid :authority header over the socket. In the former case, kubelet fails to find the socket file because it passes the full unix:-prefixed string as the net.Dialer target. In the latter, the dialer successfully connects to the container over the socket but sends an invalid :authority pseudo-header (the raw /path/to/unix.sock).
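
A minimal sketch, assuming grpc-go (this is not kubelet's actual code), of how a custom dialer produces the two failure modes, given that the kubelet logs below show the target reaching net.Dial verbatim:

package main

import (
	"context"
	"log"
	"net"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// dialPlugin illustrates both failure modes; the target values mirror the
// two kubelet-registration-path forms above.
func dialPlugin(target string) (*grpc.ClientConn, error) {
	dialer := func(ctx context.Context, addr string) (net.Conn, error) {
		// Form 1: addr == "unix:///path/to/unix.sock". The scheme-prefixed
		// string is handed to net.Dial as if it were a file path, which
		// fails with "no such file or directory".
		// Form 2: addr == "/path/to/unix.sock". The dial succeeds, but with
		// no recognized scheme grpc-go derives the :authority pseudo-header
		// from the raw path, which is not a valid authority.
		return (&net.Dialer{}).DialContext(ctx, "unix", addr)
	}
	return grpc.Dial(target,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithContextDialer(dialer))
}

func main() {
	// Neither form yields a working registration; see the logs below.
	conn, err := dialPlugin("/path/to/unix.sock")
	log.Printf("conn=%v err=%v", conn, err)
}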

What did you expect to happen?

A call into the CSI container with a valid :authority header, using either form of the kubelet-registration-path parameter.

How can we reproduce it (as minimally and precisely as possible)?

Deploy a CSI plugin example, setting kubelet-registration-path on the node-driver-registrar sidecar container to either unix:///path/to/unix.sock or /path/to/unix.sock. Note that a plugin may not fail in the latter case if it is written in a language whose HTTP/2 library does not strictly validate the authority header; the header will still be incorrect. In Rust, the h2 library validates the authority strictly and returns a protocol error. Go does not appear to reject the invalid authority header, which is perhaps why most plugins do not notice the issue.
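
For plugins written in Go (which, as noted, will not reject the bad header), one way to make the repro visible is to log the :authority value each call arrives with. A small diagnostic sketch, assuming grpc-go surfaces :authority through the incoming metadata and using a hypothetical socket path:

package main

import (
	"context"
	"log"
	"net"

	"google.golang.org/grpc"
	"google.golang.org/grpc/metadata"
)

// logAuthority prints the :authority pseudo-header for every unary call.
func logAuthority(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo,
	handler grpc.UnaryHandler) (interface{}, error) {
	if md, ok := metadata.FromIncomingContext(ctx); ok {
		log.Printf("%s arrived with :authority=%v", info.FullMethod, md.Get(":authority"))
	}
	return handler(ctx, req)
}

func main() {
	lis, err := net.Listen("unix", "/tmp/diag.sock") // hypothetical path
	if err != nil {
		log.Fatal(err)
	}
	s := grpc.NewServer(grpc.UnaryInterceptor(logAuthority))
	// Register the plugin's gRPC services here before serving.
	log.Fatal(s.Serve(lis))
}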

Anything else we need to know?

Using /path/to/unix.sock means the target has no unix: prefix. Kubelet checks for that prefix before substituting 'localhost' as the authority here, so the substitution does not happen in this case.

Using unix:///path/to/unix.sock does get the authority substituted, but the full unix:-prefixed string is passed to the dialer as the file path. Related code can be seen here (non-nil custom dialer) and here (newGrpcConn appears to expect no unix: prefix, based on the log entry and the externally supplied DialContext). This work may have overlapped with related work in grpc-go here and plans here; both libraries now seem to be taking responsibility for managing the authority header for unix sockets.
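
A hedged sketch of the direction the grpc-go work points toward, assuming grpc-go >= 1.34 (which added a built-in unix resolver that strips the scheme and substitutes a valid authority); option B with grpc.WithAuthority is an alternative workaround, not necessarily what the linked kubelet code does:

package main

import (
	"context"
	"log"
	"net"
	"strings"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	target := "unix:///var/lib/kubelet/plugins/csi-thinsync/csi.sock"

	// Option A: pass the scheme-prefixed target with no custom dialer and
	// let grpc-go's unix resolver dial the socket and set the authority.
	connA, err := grpc.Dial(target,
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer connA.Close()

	// Option B: keep a custom dialer, but trim the scheme before dialing
	// and pin a valid authority so the raw path never becomes :authority.
	unixDialer := func(ctx context.Context, addr string) (net.Conn, error) {
		return (&net.Dialer{}).DialContext(ctx, "unix", addr)
	}
	connB, err := grpc.Dial(strings.TrimPrefix(target, "unix://"),
		grpc.WithAuthority("localhost"),
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithContextDialer(unixDialer))
	if err != nil {
		log.Fatal(err)
	}
	defer connB.Close()
}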

Kubelet logs when using the unix: prefix:

Feb 19 14:41:11 server001.ga.racksandrails.net kubelet[238934]: I0219 14:41:11.454702  238934 csi_plugin.go:99] kubernetes.io/csi: Trying to validate a new CSI Driver with name: thin-sync-csi.racksandrails.com endpoint: unix:///var/lib/kubelet/plugins/csi-thinsync/csi.sock versions: 1.0.0
Feb 19 14:41:11 server001.ga.racksandrails.net kubelet[238934]: I0219 14:41:11.454787  238934 csi_plugin.go:112] kubernetes.io/csi: Register new plugin with name: thin-sync-csi.racksandrails.com at endpoint: unix:///var/lib/kubelet/plugins/csi-thinsync/csi.sock
Feb 19 14:41:11 server001.ga.racksandrails.net kubelet[238934]: W0219 14:41:11.455279  238934 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {/var/lib/kubelet/plugins/csi-thinsync/csi.sock localhost 0xc0044710c0 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial unix unix:///var/lib/kubelet/plugins/csi-thinsync/csi.sock: connect: no such file or directory". Reconnecting...
Feb 19 14:41:11 server001.ga.racksandrails.net kubelet[238934]: W0219 14:41:11.455476  238934 csi_client.go:184] Error calling CSI NodeGetInfo(): rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix unix:///var/lib/kubelet/plugins/csi-thinsync/csi.sock: connect: no such file or directory"
Feb 19 14:41:11 server001.ga.racksandrails.net kubelet[238934]: E0219 14:41:11.478062  238934 goroutinemap.go:150] Operation for "/var/lib/kubelet/plugins_registry/thin-sync-csi.racksandrails.com-reg.sock" failed. No retries permitted until 2022-02-19 14:41:11.978009921 -0500 EST m=+3309.907550017 (durationBeforeRetry 500ms). Error: RegisterPlugin error -- plugin registration failed with err: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix unix:///var/lib/kubelet/plugins/csi-thinsync/csi.sock: connect: no such file or directory": rpc error: code = Unavailable desc = error reading from server: EOF
Feb 19 14:41:11 server001.ga.racksandrails.net kubelet[238934]: W0219 14:41:11.478162  238934 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {/var/lib/kubelet/plugins_registry/thin-sync-csi.racksandrails.com-reg.sock /var/lib/kubelet/plugins_registry/thin-sync-csi.racksandrails.com-reg.sock <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial unix /var/lib/kubelet/plugins_registry/thin-sync-csi.racksandrails.com-reg.sock: connect: connection refused". Reconnecting...

Of note are the inconsistent 'Error while dialing dial unix' entries: one shows the unix: prefix on the path, while the registration socket shows it without. The 'localhost' substitution is also visible in the logs for the first (unix:-prefixed) call, but not for the second (non-prefixed) registration notification call.

Node registration sidecar logs when not using the unix: prefix:

I0221 13:07:20.377752 1 main.go:167] Running node-driver-registrar in mode=registration
I0221 13:07:20.378223 1 main.go:191] Attempting to open a gRPC connection with: "/csi/csi.sock"
I0221 13:07:20.378237 1 connection.go:154] Connecting to unix:///csi/csi.sock
I0221 13:07:20.378554 1 main.go:198] Calling CSI driver to discover driver name
I0221 13:07:20.378565 1 connection.go:183] GRPC call: /csi.v1.Identity/GetPluginInfo
I0221 13:07:20.378568 1 connection.go:184] GRPC request: {}
I0221 13:07:20.380870 1 connection.go:186] GRPC response: {"name":"thin-sync-csi.racksandrails.com","vendor_version":"0.1"}
I0221 13:07:20.380918 1 connection.go:187] GRPC error: <nil>
I0221 13:07:20.380924 1 main.go:208] CSI driver name: "thin-sync-csi.racksandrails.com"
I0221 13:07:20.380954 1 node_register.go:53] Starting Registration Server at: /registration/thin-sync-csi.racksandrails.com-reg.sock
I0221 13:07:20.381066 1 node_register.go:62] Registration Server started at: /registration/thin-sync-csi.racksandrails.com-reg.sock
I0221 13:07:20.381106 1 node_register.go:92] Skipping HTTP server because endpoint is set to: ""
I0221 13:07:21.953021 1 main.go:102] Received GetInfo call: &InfoRequest{}
I0221 13:07:21.953276 1 main.go:109] "Kubelet registration probe created" path="/var/lib/kubelet/plugins/csi-thinsync/registration"
I0221 13:07:21.961349 1 main.go:120] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = stream terminated by RST_STREAM with error code: PROTOCOL_ERROR,}
E0221 13:07:21.961377 1 main.go:122] Registration process failed with error: RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = stream terminated by RST_STREAM with error code: PROTOCOL_ERROR, restarting registration container.

Of note, kubelet notifies the container that it received a protocol error. The CSI container's Rust logs show the matching protocol error and the offending :authority header value:

[2022-02-21T15:05:56Z DEBUG h2::server] malformed headers: malformed authority (b"/var/lib/kubelet/plugins/csi-thinsync/csi.sock"): invalid uri character
[2022-02-21T15:05:56Z DEBUG h2::codec::framed_read] received frame=Data { stream_id: StreamId(1), flags: (0x1: END_STREAM) }
[2022-02-21T15:05:56Z DEBUG h2::codec::framed_write] send frame=Reset { stream_id: StreamId(1), error_code: PROTOCOL_ERROR }

Kubernetes version

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.4", GitCommit:"e6c093d87ea4cbb530a7b2ae91e54c0842d8308a", GitTreeState:"clean", BuildDate:"2022-02-16T12:30:48Z", GoVersion:"go1.17.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.4", GitCommit:"e6c093d87ea4cbb530a7b2ae91e54c0842d8308a", GitTreeState:"clean", BuildDate:"2022-02-16T12:32:02Z", GoVersion:"go1.17.7", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider

Bare metal

OS version

# On Linux:
$ cat /etc/os-release
NAME="Fedora Linux"
VERSION="35 (Server Edition)"
ID=fedora
VERSION_ID=35
VERSION_CODENAME=""
PLATFORM_ID="platform:f35"
PRETTY_NAME="Fedora Linux 35 (Server Edition)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:35"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f35/system-administrators-guide/"
SUPPORT_URL="https://ask.fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=35
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=35
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
VARIANT="Server Edition"
VARIANT_ID=server

$ uname -a
Linux server001.ga.racksandrails.net 5.15.14-200.fc35.x86_64 #1 SMP Tue Jan 11 16:49:27 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Install tools

Container runtime (CRI) and version (if applicable)

cri-o

Related plugins (CNI, CSI, ...) and versions (if applicable)

Calico, MetalLB, BGP; applicable 1.23 versions.
@EricRnR added the kind/bug label Feb 21, 2022
@k8s-ci-robot added the needs-sig label Feb 21, 2022
@k8s-ci-robot (Contributor)

@EricRnR: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the needs-triage label Feb 21, 2022
EricRnR (Author) commented Feb 21, 2022

/sig storage

@k8s-ci-robot added sig/storage and removed needs-sig labels Feb 21, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label May 22, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added lifecycle/rotten and removed lifecycle/stale labels Jun 21, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot (Contributor)

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
