Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubelet sending invalid grpc header during plugin registration #109081

Closed
travisghansen opened this issue Mar 28, 2022 · 10 comments · Fixed by #112597
Closed

kubelet sending invalid grpc header during plugin registration #109081

travisghansen opened this issue Mar 28, 2022 · 10 comments · Fixed by #112597
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/storage Categorizes an issue or PR as relevant to SIG Storage.

Comments

@travisghansen
Copy link

What happened?

When registering a new csi driver the grpc client is sending the filepath of the uds socket as the host/authority header instead of localhost. In practice almost all grpc / http2 servers outright reject this as invalid. Given that the grpc-go client was updated years ago I'm unclear why this problem still persists to this day :(

kubelet logs look like this:

I0328 18:55:05.477185    2824 plugin_watcher.go:215] "Removing socket path from desired state cache" path="/var/lib/kubelet/plugins_registry/org.democratic-csi.local-hostpath-reg.sock"
I0328 18:55:05.477248    2824 plugin_watcher.go:203] "Adding socket path or updating timestamp to desired state cache" path="/var/lib/kubelet/plugins_registry/org.democratic-csi.local-hostpath-reg.sock"
I0328 18:55:05.537582    2824 kubelet.go:2091] "SyncLoop (PLEG): event for pod" pod="democratic-csi/local-hostpath-democratic-csi-node-jr2ld" event=&{ID:ac582e0d-9fc5-4037-871f-987dd179116a Type:ContainerStarted Data:e95fd0639fc013df6190156a8628e9dd41e6f5b4da57b3f0b1ad4da10e09f1c2}
I0328 18:55:05.875089    2824 csi_plugin.go:178] kubernetes.io/csi: registrationHandler.DeRegisterPlugin request for plugin org.democratic-csi.local-hostpath
I0328 18:55:05.875014    2824 reconciler.go:143] "OperationExecutor.UnregisterPlugin started" plugin={SocketPath:/var/lib/kubelet/plugins_registry/org.democratic-csi.local-hostpath-reg.sock Timestamp:2022-03-28 18:54:35.884588055 +0000 UTC m=+224637.374833371 Handler:0x77e4340 Name:org.democratic-csi.local-hostpath}
I0328 18:55:06.876702    2824 reconciler.go:160] "OperationExecutor.RegisterPlugin started" plugin={SocketPath:/var/lib/kubelet/plugins_registry/org.democratic-csi.local-hostpath-reg.sock Timestamp:2022-03-28 18:55:05.477255587 +0000 UTC m=+224666.967500904 Handler:<nil> Name:}
I0328 18:55:06.910834    2824 csi_plugin.go:99] kubernetes.io/csi: Trying to validate a new CSI Driver with name: org.democratic-csi.local-hostpath endpoint: /var/lib/kubelet/plugins/org.democratic-csi.local-hostpath/csi.sock versions: 1.0.0
I0328 18:55:06.911192    2824 csi_plugin.go:112] kubernetes.io/csi: Register new plugin with name: org.democratic-csi.local-hostpath at endpoint: /var/lib/kubelet/plugins/org.democratic-csi.local-hostpath/csi.sock
W0328 18:55:06.931819    2824 csi_client.go:184] Error calling CSI NodeGetInfo(): rpc error: code = Internal desc = stream terminated by RST_STREAM with error code: PROTOCOL_ERROR
E0328 18:55:07.046141    2824 goroutinemap.go:150] Operation for "/var/lib/kubelet/plugins_registry/org.democratic-csi.local-hostpath-reg.sock" failed. No retries permitted until 2022-03-28 18:55:07.546097706 +0000 UTC m=+224669.036343058 (durationBeforeRetry 500ms). Error: RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = stream terminated by RST_STREAM with error code: PROTOCOL_ERROR: rpc error: code = Unavailable desc = error reading from server: EOF

csi server logs look like this:

Http2Session server (33) session created
Http2Session server (33) i/o stream consumed
Http2Session server (33) scheduling write
Http2Session server (33) sending pending data
Http2Session server (33) nghttp2 has 9 bytes to send
Http2Session server (33) wants read? 1
Http2Session server (33) receiving 33 bytes, offset 0
Http2Session server (33) receiving 33 bytes [wants data? 1]
Http2Session server (33) complete frame received: type: 4
Http2Session server (33) sending pending data
Http2Session server (33) nghttp2 has 9 bytes to send
Http2Session server (33) wants read? 1
Http2Session server (33) wants read? 1
Http2Session server (33) receiving 168 bytes, offset 0
Http2Session server (33) receiving 168 bytes [wants data? 1]
Http2Session server (33) complete frame received: type: 4
Http2Session server (33) settings refreshed for session
Http2Session server (33) beginning headers for stream 1
Http2Session server (33) Error 'Invalid HTTP header field was received: frame type: 1, stream: 1, name: [:authority], value: [/var/lib/kubelet/plugins/org.democratic-csi.local-hostpath/csi.sock]'
Http2Session server (33) invalid frame received (0/1000), code: -531
Http2Session server (33) sending pending data
Http2Session server (33) stream 1 closed with code: 1
HttpStream 1 (37) [Http2Session server (33)] closed with code 1
HttpStream 1 (37) [Http2Session server (33)] destroying stream
Http2Session server (33) nghttp2 has 13 bytes to send
Http2Session server (33) wants read? 1
Http2Session server (33) wants read? 1
HttpStream 1 (37) [Http2Session server (33)] tearing down stream

What did you expect to happen?

Registration to move along happily.

How can we reproduce it (as minimally and precisely as possible)?

simple instructions here: k3s-io/k3s#5346

Anything else we need to know?

Most csi drivers are written in go and the go grpc server simply ignores this non-compliant header, otherwise the issue would be far more prevalent.

Kubernetes version

$ kubectl version
# paste output here

Every version up until current AFAIK.

Cloud provider

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@travisghansen travisghansen added the kind/bug Categorizes issue or PR as related to a bug. label Mar 28, 2022
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Mar 28, 2022
@k8s-ci-robot
Copy link
Contributor

@travisghansen: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Mar 28, 2022
@travisghansen
Copy link
Author

/sig storage

@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 28, 2022
@mythi
Copy link
Contributor

mythi commented Apr 7, 2022

@travisghansen I ran into this error in my use-case and debugged what's going on. I think the problem goes away with a recent enough grpc-go and when the socket path is correctly prefixed in the client connection:

https://github.com/grpc/grpc-go/blob/5682cc6a321507ce2a865beb8dd4bed9a5db0e06/clientconn.go#L1654-L1665

I though I could start working on fixing this in kubelet.

@travisghansen
Copy link
Author

That would be awesome! Then I can finally do away with all the crazy workarounds.

@mythi
Copy link
Contributor

mythi commented Apr 8, 2022

The nomad issue link you gave tried the approach I was talking about without luck it seems. I need take a closer look what's wrong...

@mythi
Copy link
Contributor

mythi commented Apr 22, 2022

btw, this look a duplicate of #107093 and #108254

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 21, 2022
@mythi
Copy link
Contributor

mythi commented Jul 21, 2022

/remove-lifecycle stale

currently planning to use grpc.WithAuthority("localhost"), but I have some cleanup pending with #109778 first

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 21, 2022
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 19, 2022
@mythi
Copy link
Contributor

mythi commented Oct 19, 2022

/remove-lifecycle stale

The fix PR is pending approval

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/storage Categorizes an issue or PR as relevant to SIG Storage.
Projects
None yet
4 participants