
Kubectl: Keepalive connection to API server for exec and logs. #94301

Closed

nielsbasjes opened this issue Aug 28, 2020 · 24 comments
Assignees
Labels
good first issue: Denotes an issue ready for a new contributor, according to the "help wanted" guidelines.
help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
kind/feature: Categorizes issue or PR as related to a new feature.
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
priority/backlog: Higher priority than priority/awaiting-more-evidence.
sig/api-machinery: Categorizes an issue or PR as relevant to SIG API Machinery.
sig/cli: Categorizes an issue or PR as relevant to SIG CLI.

Comments

@nielsbasjes

What would you like to be added:

A feature (enabled by default) where any "long running" kubectl command (like exec and logs) sends a periodic keep-alive signal to the API server.

At this point such a keepalive is only available for the proxy command (via its --keepalive flag), where it is disabled by default.

I propose making this keepalive part of exec and logs as well (and possibly other commands), with a default value of 5s.

Why is this needed:

When running a Kubernetes cluster in highly available mode (i.e. multiple API servers behind a load balancer), it is common for client-side inactivity timeouts to be set on the load balancer so that it can clean up stale connections.

The recommended HAProxy configuration (which is what I am running right now) sets the client and server timeouts to 20s:
https://github.com/kubernetes/kubeadm/blob/master/docs/ha-considerations.md#haproxy-configuration
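For context, the timeout lines in question from that configuration look roughly like this (an excerpt; only the settings relevant here are shown):

```
defaults
    # Idle timeouts on both sides of the proxy: an idle kubectl
    # connection is cut after 20 seconds without traffic.
    timeout client 20s
    timeout server 20s
```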

What I ran into: I was following the logs of a pod (i.e. kubectl logs -f) while the process at hand took a "long time" between log messages (I am debugging a new application). After the configured timeout (the mentioned 20 seconds) the connection would be dropped and I had to restart logs -f to see the rest of the messages.

Similarly, exec-ing a shell into a pod to examine what is happening, looking something up on a web page, and then returning to the shell to find it has been closed because I was idle for more than 20 seconds is not very productive.

See also: #58486

@nielsbasjes nielsbasjes added the kind/feature Categorizes issue or PR as related to a new feature. label Aug 28, 2020
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Aug 28, 2020
@nielsbasjes
Author

/sig api-machinery
/sig cli

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/cli Categorizes an issue or PR as relevant to SIG CLI. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 28, 2020
@nielsbasjes
Author

Note that at this point the only available mitigation for these practical problems is to increase the timeouts on the load balancer (HAProxy in my case).
Yet this is not desirable because:

  1. truly stale connections will remain in the load balancer's memory for longer, which may overload things in busy/large installations.
  2. connections will still be closed after the longer timeout, even when that is not wanted.

@fedebongio
Contributor

/assign @lavalamp
(who volunteered to find the related earlier issue)

@lavalamp
Member

lavalamp commented Sep 1, 2020

I think #94170 was the one I was thinking of. It adds the feature, but we would need to turn it on in various places. Probably both the kubectl <-> apiserver and the apiserver <-> kubelet paths would need this enabled.
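For illustration, a client could opt in roughly like this (a minimal sketch; I am assuming the NewRoundTripperWithConfig constructor and PingPeriod field in k8s.io/apimachinery's spdy package, whose exact shape may vary across Kubernetes versions):

```go
package main

import (
	"time"

	"k8s.io/apimachinery/pkg/util/httpstream/spdy"
)

func main() {
	// Build an SPDY round tripper that emits a ping frame every 5s, so
	// idle-timeout middleboxes (e.g. an HAProxy in front of the API
	// servers) keep seeing traffic on exec/attach/portforward streams.
	rt, err := spdy.NewRoundTripperWithConfig(spdy.RoundTripperConfig{
		PingPeriod: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	_ = rt // wire into remotecommand/portforward as appropriate
}
```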

@lavalamp
Member

lavalamp commented Sep 1, 2020

I don't think there's much harm in just turning it on; I don't really think we need a flag. A ping every 20s or so shouldn't break the bank.

@lavalamp lavalamp added good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. labels Sep 1, 2020
@lavalamp lavalamp removed their assignment Sep 1, 2020
@lavalamp lavalamp added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Sep 1, 2020
@caesarxuchao
Member

caesarxuchao commented Sep 1, 2020

#94170 might fix this.

edit: lavalamp beat me

@lavalamp
Member

lavalamp commented Sep 1, 2020 via email

@nielsbasjes
Author

Given that the recommended timeout is 20 seconds, I would set the default ping interval to something like 5 or 10 seconds.

@djzager

djzager commented Sep 3, 2020

/assign

@Nit123

Nit123 commented Sep 17, 2020

/assign

@vinayvenkat

/assign

@joshfix

joshfix commented Dec 3, 2020

+1 This would be super useful.

@bevank

bevank commented Dec 3, 2020

+1

@knight42
Member

knight42 commented Dec 5, 2020

Hi, I have filed #97083 to enable SPDY pings to address this issue.

@nielsbasjes
Author

As I understand it, #97083 addresses the exec issue but not the logs -f issue.

@knight42
Member

knight42 commented Dec 5, 2020

As I understand it, #97083 addresses the exec issue but not the logs -f issue.

More precisely, #97083 should address exec and portforward.

As for logs -f, I think this is a different problem, because logs -f actually sends a plain HTTP request to the REST API, and the connection might be terminated if the kubectl client, the apiserver, or the load balancer between them thinks the connection has been idle for a specific period.

IMHO, to address the logs -f issue we would need to reconnect until the user interrupts.
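Something like the following client-go loop is the idea (a rough sketch only: the namespace and pod name are placeholders, and a real implementation would also need to track PodLogOptions.SinceTime so that a reconnect does not replay lines that were already printed):

```go
package main

import (
	"context"
	"io"
	"os"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	for {
		// Re-issue the follow request; the previous stream dies whenever
		// the client, the apiserver, or a load balancer drops the idle
		// connection.
		req := cs.CoreV1().Pods("default").GetLogs("my-pod",
			&corev1.PodLogOptions{Follow: true})
		if stream, err := req.Stream(context.Background()); err == nil {
			io.Copy(os.Stdout, stream) // copies until the connection drops
			stream.Close()
		}
		time.Sleep(time.Second) // brief backoff before re-attaching
	}
}
```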

@tiloso
Contributor

tiloso commented Dec 7, 2020

Hey, for logs -f the issue has been addressed in #95981 as far as I understand (at least for environments that use HTTP/2).
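For anyone curious about the mechanism: #95981 builds on the HTTP/2-level health check in golang.org/x/net/http2, where the transport sends PING frames on idle connections and closes the ones that do not answer. A bare sketch of that knob (the timeout values here are illustrative, not necessarily the ones client-go configures):

```go
package main

import (
	"net/http"
	"time"

	"golang.org/x/net/http2"
)

func main() {
	t := &http2.Transport{
		// Send a PING frame if the connection has seen no frames for 30s...
		ReadIdleTimeout: 30 * time.Second,
		// ...and close the connection if no PING ack arrives within 15s.
		PingTimeout: 15 * time.Second,
	}
	client := &http.Client{Transport: t}
	_ = client // use like any other *http.Client
}
```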

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 7, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 6, 2021
@nielsbasjes
Author

As far as I understand right now:

@djzager / @vinayvenkat / @Nit123: does that mean this issue has been fully resolved (i.e. can it be closed)?
Or is there something remaining?

@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Reifier

Reifier commented Nov 15, 2022

/reopen

@k8s-ci-robot
Contributor

@Reifier: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
