Pods that fail health checks always restarting on the same minion instead of others? #13385

Closed
joshm1 opened this issue Aug 31, 2015 · 16 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling.

Comments

@joshm1

joshm1 commented Aug 31, 2015

Over the weekend the skydns container in the kube-dns pod died. I'm not sure of the exact reason because I couldn't find much detail in the logs, but watching the etcd and skydns logs suggested the root issue could've been etcd. My theory is that the /mnt/ephemeral/kubernetes filesystem was full (it's only 3.75GB and holds a few large empty-dir volumes). kube-dns was showing 3/4 ready.

This caused all of my application pods across 4 minions to go down. I had to manually delete the kube-dns pod; when it launched on another minion it was fine and everything came back online.

By the same token, I had one minion that would never consider any of my pods "ready", even though the other 3 minions did. I didn't find out why and my logs weren't helpful, so I just had to manually terminate that minion (EC2 instance) and auto-scale a new one (which happened to work fine).

In both of these cases, if k8s had automatically moved the constantly failing pods to other minions, I think the cluster would've healed itself. Is the fact that failing pods always restart on the same minion intentional, or is moving them elsewhere something in the works?

I'm sorry I don't have logs to show. I'm not sure how to retrieve them from 2 days ago after so many pods have been restarted.

@lavalamp
Member

lavalamp commented Sep 1, 2015

There are two things here: the first is to figure out what was wrong with your node and start detecting it. The second is the meta-problem of noticing that something is wrong with a node even when we don't have a detection mechanism for that specific thing.

@lavalamp lavalamp added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/node Categorizes an issue or PR as relevant to SIG Node. labels Sep 1, 2015
@joshm1
Author

joshm1 commented Sep 8, 2015

@lavalamp certainly, it's critical to be able to detect issues with nodes. Is this something on the roadmap that will be built into kubernetes/kubelets? In the meantime, I need some way to detect this internally and either automatically handle it and/or send alerts. What are some ways you'd advise to do this?

This issue bit me again over the weekend. I have a simple 3-node cluster in AWS, provisioned with cluster/kube-up, running only 3 non kube-system pods. Everything was healthy on Friday, and without any changes over the weekend, when I checked it again a few days later all pods on that particular node were failing [1]. Had they been restarted on another node, everything would've been fine.

[1] Is this another GitHub issue I should create?

kubectl get pods --all-namespaces -o wide
NAMESPACE     NAME                                                 READY     STATUS      RESTARTS   AGE       NODE
default       xxx-xxxxxxxx-prd-06c055d7-7neib                      1/1       Running     0          52m       ip-172-20-0-141.ec2.internal
default       xxx-xxxxxxxx-prd-06c055d7-sxkyo                      0/1       Image: xxxxxxxx/xxx-xxxxxxxx:prd-06c055d7 is not ready on the node   0   30m   ip-172-20-0-84.ec2.internal
kube-system   elasticsearch-logging-v1-61a90                       1/1       Running     0          5d        ip-172-20-0-84.ec2.internal
kube-system   fluentd-elasticsearch-ip-172-20-0-141.ec2.internal   1/1       Running     2          5d        ip-172-20-0-141.ec2.internal
kube-system   fluentd-elasticsearch-ip-172-20-0-183.ec2.internal   1/1       Running     0          5d        ip-172-20-0-183.ec2.internal
kube-system   fluentd-elasticsearch-ip-172-20-0-84.ec2.internal    1/1       Running     0          5d        ip-172-20-0-84.ec2.internal
kube-system   kibana-logging-v1-mldpo                              1/1       Running     0          5d        ip-172-20-0-84.ec2.internal
kube-system   kube-dns-v8-3zel5                                    4/4       Running     1          4d        ip-172-20-0-141.ec2.internal
kube-system   kube-dns-v8-ud0oq                                    1/4       API error (500): Cannot start container 060b46a4a91716cecc1e4cbe60a66450d33a8f7947db95577c9104cc849d744b: [8] System error: too many open files in system   33   5d   ip-172-20-0-84.ec2.internal
kube-system   kube-ui-v1-yq9an                                     1/1       Running     0          4d        ip-172-20-0-183.ec2.internal
kube-system   monitoring-heapster-v6-ckogk                         0/1       API error (500): Cannot start container 0e7daac3182af32b2867072b72b5d324b7e4177d1136df9d389fb671e8f280bf: [8] System error: too many open files in system   11   5d   ip-172-20-0-84.ec2.internal
kube-system   monitoring-influx-grafana-v1-4ubv9                   2/2       Running     2          4d        ip-172-20-0-84.ec2.internal

kubectl get events shows, repeatedly for the affected pods:
Error syncing pod, skipping: API error (500): Cannot start container 8b987eaa17cade98a4ba702381d88b52e7344e03af2f6f9f157ad4049ef35c2f: [8] System error: too many open files in system
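
(A minimal interim check, assuming a kubectl new enough to support -o jsonpath and treating any pod with more than a handful of restarts as suspect, could be run from cron and wired to whatever alerting is already in place; a sketch, not an endorsed mechanism:)

# Pods whose first container has restarted more than 5 times (threshold is arbitrary)
kubectl get pods --all-namespaces \
  -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.status.containerStatuses[0].restartCount}{"\n"}{end}' \
  | awk -F'\t' '$3 > 5'

# Nodes whose Ready condition is anything other than True
kubectl get nodes \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' \
  | awk -F'\t' '$2 != "True"'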

@lavalamp
Member

lavalamp commented Sep 8, 2015

@joshm1 We already detect various problems with nodes (disk full, docker down, etc).

It looks like you're running out of file handles, so something is leaking them or you have that system setting too low (we raise it for master, but I'm not sure about nodes).

@dchen1107 Can we detect out-of-FDs and make the node not ready?
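
(For context, the kernel already exposes the numbers such a check would need via /proc/sys/fs/file-nr; a minimal sketch of what out-of-FD detection could look like on a node, using an arbitrary 90% threshold:)

# /proc/sys/fs/file-nr reports: allocated handles, free handles, system-wide maximum
read allocated free max < /proc/sys/fs/file-nr
# Flag the node when more than ~90% of fs.file-max is in use
if [ $((allocated * 100 / max)) -gt 90 ]; then
  echo "low on file handles: $allocated of $max in use"
fi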

@joshm1
Author

joshm1 commented Sep 9, 2015

The minion that ran out of file handles didn't have any running pods on it at the time I saw that error. It's using the AWS auto-scaling group on Ubuntu created by kube-up.

@vishh
Contributor

vishh commented Sep 9, 2015

Detecting the total number of open fds should be possible. In addition, we should impose fd limits on containers to prevent leaks. #3595
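
(At the docker level, docker run does expose a per-container open-file ulimit, though, as the next comment notes, Kubernetes did not plumb this through at the time; a sketch with arbitrary limits:)

# Soft limit 1024 / hard limit 4096 open files, for this container only
docker run --rm --ulimit nofile=1024:4096 busybox sh -c 'ulimit -n'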

@dchen1107
Member

@lavalamp We can detect out-of-FDs and mark the node as not ready.

@vishh I don't think we can impose fd limits on containers yet, given docker's current implementation. I will explain in #3595.

@davidopp
Member

@dchen1107 Should we rename this issue to "detect out-of-FDs and mark node not ready when it happens"? I thought maybe we already had an issue open for that, but I can't find one.

@caesarxuchao
Member

How should the user work around the error? There is a new report of hitting this issue on Stack Overflow: http://stackoverflow.com/questions/37067434/kubernetes-cant-start-due-to-too-many-open-files-in-system.
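
(The usual workaround is at the node level rather than in Kubernetes itself: find what is leaking file handles, then raise the kernel and docker limits. A rough sketch with arbitrary values; exact flags depend on the distro and docker version:)

# How close is the node to the system-wide limit?
cat /proc/sys/fs/file-nr

# Which commands hold the most open files?
sudo lsof 2>/dev/null | awk '{print $1}' | sort | uniq -c | sort -rn | head

# Raise the system-wide limit (persist it in /etc/sysctl.conf or /etc/sysctl.d/)
sudo sysctl -w fs.file-max=1000000

# Raise the per-container default handed out by the docker daemon,
# e.g. via DOCKER_OPTS on Ubuntu: --default-ulimit nofile=64000:64000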

@bobintornado
Contributor

Also reported in issue #26246

@GreatSUN

GreatSUN commented Aug 8, 2016

Hi all, in addition to this, there can be other problems. For example, in a virtualized environment the resource sizes we detect might not be what we can actually work with; we might in fact be running on swap, so services might not react in time and should be moved to other hosts.
I suggest adding the ability to define a number of health-check-related restarts after which the pod should be rescheduled.

@Bekt

Bekt commented Feb 9, 2017

We experience this often. All nodes report healthy, but a pod gets stuck in a restart loop (for whatever reason). However, if I delete the pod, it is recreated just fine (it's managed by an RC or Deployment).

Is there any way to kill a pod after some threshold of restarts?
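
(A blunt stopgap, assuming the pods are managed by an RC or Deployment so deleting them gets a fresh copy scheduled, is to periodically delete pods whose restart count crosses an arbitrary threshold; a sketch, not an endorsed mechanism:)

# Delete pods whose first container has restarted more than 20 times (threshold is arbitrary)
kubectl get pods --all-namespaces \
  -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.status.containerStatuses[0].restartCount}{"\n"}{end}' \
  | awk -F'\t' '$3 > 20 {print $1, $2}' \
  | while read ns pod; do kubectl delete pod "$pod" --namespace="$ns"; done

Note that deleting the pod doesn't guarantee its replacement lands on a different node; that gap is exactly what the rescheduler/descheduler discussion below is about.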

@davidopp
Member

This seems like a reasonable request, though it's tricky to pick the right policy.

@kubernetes/sig-node-feature-requests

@bgrant0607
Member

Original thread was in #127.

We previously discussed moving anomalously crashlooping pods in the rescheduler (if all pods of a controller are crashlooping across multiple nodes, there's no point in moving any of them):
https://github.com/kubernetes/community/blob/master/contributors/design-proposals/rescheduling.md

@bgrant0607 bgrant0607 added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed area/usability sig/node Categorizes an issue or PR as relevant to SIG Node. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Feb 23, 2017
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 21, 2017
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 20, 2018
@bgrant0607
Member

Closing in favor of kubernetes-sigs/descheduler#62
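
(For anyone landing here later: the descheduler linked above grew a strategy along these lines, RemovePodsHavingTooManyRestarts. A sketch of a v1alpha1 policy file enabling it; field names may differ between descheduler releases:)

# Write a descheduler policy that evicts pods stuck in restart loops
cat > descheduler-policy.yaml <<'EOF'
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemovePodsHavingTooManyRestarts":
    enabled: true
    params:
      podsHavingTooManyRestarts:
        # Evict pods whose containers have restarted at least this many times
        podRestartThreshold: 100
        includingInitContainers: true
EOF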
