
Containers should support inherent garbage collection & restart policy #12856

Closed
dalanlan opened this issue Aug 18, 2015 · 16 comments
Labels
lifecycle/rotten  Denotes an issue or PR that has aged beyond stale and will be auto-closed.
priority/backlog  Higher priority than priority/awaiting-more-evidence.
sig/node  Categorizes an issue or PR as relevant to SIG Node.

Comments

@dalanlan
Contributor

1 garbage collection
As is well known, automatic container GC can be kinda tricky (race conditions, etc.).
Hence I think a simple field on the Container struct to support something like docker run --rm would help a lot; it would let users manage their own GC as soon as the container terminates.
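For reference, this is the Docker behavior being referred to: with --rm the container is removed automatically as soon as it exits, so nothing is left behind to inspect (a quick illustration; busybox is just an arbitrary image):

docker run --rm busybox echo hello
docker ps -a          # the exited busybox container no longer shows up here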

2 restart policy
Maybe I'm nuts, but I'd bet restart policy makes more sense at the container level than at the pod level.

3 kubelet component
My kubelet keeps logging errors like

E0818 11:53:57.131088       1 kubelet.go:682] Image garbage collection failed: non-existent label "docker-images"

which has not been resolved yet (see also #11733).
I tried mounting /var/lib/docker as a volume of the kubelet container once, which didn't work. Any ideas?
Environment: k8s 1.0.0-dir / docker 1.7.1 / ubuntu 14.04
Kubelet startup options:

docker run --net=host -d --privileged -v /sys:/sys:ro -v /var/run/docker.sock:/var/run/docker.sock  hyperkube:v1.0.0-dir /hyperkube kubelet --api-servers=https://127.0.0.1:6443 --v=2 --enable-server --config=/etc/kubernetes/manifests-multi --hostname-override=127.0.0.1 --cluster-dns=192.168.3.10 --cluster-domain=cluster.local --kubeconfig=/srv/kubernetes/config
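For what it's worth, the /var/lib/docker mount attempt mentioned above would presumably just add one more bind mount to that command, along these lines (an assumption; the exact invocation was not posted):

docker run --net=host -d --privileged \
  -v /sys:/sys:ro \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /var/lib/docker:/var/lib/docker \
  hyperkube:v1.0.0-dir /hyperkube kubelet ...   # remaining kubelet flags as above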
@dalanlan
Contributor Author

I saw that kubelet flags like --maximum-dead-containers could help with GC, yet I assume an inherent (per-container) way would allow greater customization.
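For context, the kubelet-level GC knobs in question look roughly like this (flag names from the kubelet of that era; the values are only illustrative):

kubelet ... \
  --maximum-dead-containers=100 \
  --maximum-dead-containers-per-container=2 \
  --minimum-container-ttl-duration=1m \
  --image-gc-high-threshold=90 \
  --image-gc-low-threshold=80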

@roberthbailey roberthbailey added priority/backlog Higher priority than priority/awaiting-more-evidence. sig/node Categorizes an issue or PR as relevant to SIG Node. team/ux labels Aug 18, 2015

@dalanlan
Contributor Author

Copied from #12760

Also, because kubelet relies on examining the dead containers to construct the pod status, we want to keep a certain number of dead containers around (restart counts, etc).

Let's say I set up a cron job to remove all of the exited containers nightly; would it break the behavior of kubelet?
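Concretely, the kind of nightly job being described might look like this (just a sketch; the schedule and exact docker invocation are arbitrary):

# crontab entry: every night at 03:00, remove all exited containers
0 3 * * * docker rm $(docker ps -a -q -f status=exited)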
/cc @yujuhong

@yujuhong
Contributor

Let's say I set up a cron job to remove all of the exited containers nightly; would it break the behavior of kubelet?

Unfortunately, yes. Kubelet would mostly still work, but you'd observe some inconsistencies / undesirable pod behaviors, because kubelet relies on the dead containers for checkpointing.

E.g. Suppose you have a pod with two containers and a "Never" restart policy.

  • kubelet creates two containers A, B and pod becomes running
  • container A terminates; kubelet does not attempt to restart it due to the policy. (pod is still in the running state)
  • external GC removes container A
  • kubelet sees no dead containers for this pod and starts container A again; this violates the restart policy.
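To make the scenario concrete, such a pod might look like the minimal manifest below (pod name, container names, and images are hypothetical):

cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: never-restart-demo
spec:
  restartPolicy: Never
  containers:
  - name: container-a            # exits after a short while; must not be restarted
    image: busybox
    command: ["sh", "-c", "sleep 10"]
  - name: container-b            # keeps running, so the pod stays in Running
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
EOF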

There are also minor problems with updating the pod status if the pod is still in the running state.

We plan to fix this in the long run by checkpointing kubelet. For the shorter term, we can keep an in-memory cache that would help with situations like this (but would not survive a restart).

On the other hand, it is perfectly fine to remove containers from deleted pods (i.e., no longer visible from the apiserver) (#8347).

@dchen1107
Member

There are three issues reported here: the first two are design issues, and the last one is a support issue / bug on Ubuntu machines. I re-opened #11733 for the latter, and will leave this one for the first two.

  • garbage collection:
    I agree that we haven't done a very good job on garbage collection; we're actually really bad at disk management. Part of the reason is a lack of resources and prioritization, and we are improving that. But as for using a Docker feature like docker run --rm, I don't think it is a good idea for Kubernetes, which is designed for managing containerized applications across multiple hosts (eventually many). Consider a user deploying a pod with one container: docker run --rm would immediately remove the terminated container, no matter whether it succeeded or failed. You lose the information related to that container instance. Even though we plan to improve kubelet to use lifecycle events for better performance and lower overhead, a lot of information still cannot be conveyed through events, for example container logs.

Why do you care so much about those dead containers? Do you have a bad OOD (out-of-disk) problem?

  • restart policy:
    There were many discussions before about putting restart policy at the pod level vs. the container level, and we decided to put it at the pod level based on several design decisions. Restart policy affects the pod's lifetime. All containers within a pod are colocated. Unless all containers within a pod are terminated, we won't terminate the pod. Adding / removing / updating containers within a pod requires explicitly updating the pod definition. A pod is not designed to be a scheduling target, and a container is not designed for intra-pod workflows. We don't plan to add dependencies between containers within a pod.

Do you have any use cases in mind to have restart policy at container level?

@dalanlan
Contributor Author

Why do you care so much about those dead containers? Do you have a bad OOD (out-of-disk) problem?

Yeah, indeed. But since kubelet relies on the (dead) containers to make decisions, it's quite reasonable to keep things as they are in the short term.

Do you have any use cases in mind to have restart policy at container level?

Suppose we have a container A acting as a web server (let's say Tomcat) which consumes a WAR provided by a container B. The only job for B is to copy the WAR somewhere A can access and then exit gracefully. In this case, the desired restart policy is Always for A and OnFailure for B.
I believe it's a fairly general case; see also #12588.
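For illustration, the pod in question might look something like the sketch below (names, images, and paths are hypothetical). With today's pod-level restartPolicy there is no way to say Always for A and OnFailure for B; whatever value is chosen applies to both containers:

cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: tomcat-with-war
spec:
  restartPolicy: Always          # applies to both containers; B gets restarted after it exits
  volumes:
  - name: webapps
    emptyDir: {}
  containers:
  - name: a-web-server
    image: tomcat
    volumeMounts:
    - name: webapps
      mountPath: /usr/local/tomcat/webapps
  - name: b-war-loader           # copies the WAR into the shared volume and exits
    image: my-war-image          # hypothetical image containing /app.war
    command: ["cp", "/app.war", "/webapps/app.war"]
    volumeMounts:
    - name: webapps
      mountPath: /webapps
EOF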

@pmorie
Member

pmorie commented Aug 20, 2015

Interesting. @bgrant0607, what would the backward compat issues be for this?

@dalanlan
Contributor Author

You lose the information related to that container instance.

What if the user does not care about it himself? This may sound odd, yet this feature can always be enabled for someone who needs it badly.

/cc @dchen1107

@yujuhong
Contributor

What if the user does not care about it himself? This may sound odd, yet this feature can always be enabled for someone who needs it badly.

Kubelet would not be reporting accurate status, and/or respecting the restart policy in this case. It seems to me that this should not be optional. Is there any reason for removing all dead containers right away other than the potential saving in disk space?

@bgrant0607
Member

See the original discussion re. restart policy in #127. I do not envision per-container restart policies. In addition to a number of semantic problems, it would be insufficient for what people want to use it for, such as #12588.

I also don't envision supporting general-purpose dependencies, which also have semantic problems. I do support addressing the most common use case, which is populating a volume upon pod startup (#831). PRs for that would be welcome.

We agree that container GC needs to be improved. I don't like the idea of exposing knobs in the API to control it, however.

We also agree that we should move away from relying entirely on dead containers as tombstones. As mentioned above, Kubelet would then need its own tombstones (#489).

@yujuhong
Contributor

We also agree that we should move away from relying entirely on dead containers as tombstones. As mentioned above, Kubelet would then need its own tombstones (#489).

I just wanted to add that, whether there is checkpointing or not, there is still a window of time where kubelet needs to inspect the dead container for information and save it somewhere. There would always be problems if users come in and remove containers underneath kubelet.

@dalanlan
Contributor Author

Is there any reason for removing all dead containers right away other than the potential saving in disk space?

To be honest, I don't fancy this myself. But my partner insists, and I'm just trying to make sure he hits a dead end :-(

There would always be problems if user come in and remove containers underneath kubelet.

What if the user sets --maximum-dead-containers-per-container to 0? It's kinda tricky, yet it's permitted.
/cc @yujuhong

In addition to a number of semantic problems, it would be insufficient for what people want to use it for.

I'm not sure how much the semantic problems matter, but I'm fine with that if you insist.
/cc @bgrant0607

@jchauncey

On the subject of restart policies:

It seems unacceptable to have an application potentially flap forever because it cannot be started properly. In most production environments using a process manager, we would never have an application restart forever; instead we would set some limit on the number of times to retry and then just stop after that limit. This would allow our monitoring tools to alert that the application is indeed down.

As it stands, the RC would keep trying to restart the container and we would have a hard time determining whether the application was down.

@fejta-bot

Issues go stale after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Freeze the issue for 90d with /lifecycle frozen.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 14, 2017
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 13, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close


9 participants