images are not cleaned up - requires newer kernel #120

Closed
tfoote opened this issue Jan 5, 2016 · 12 comments

@tfoote
Member

tfoote commented Jan 5, 2016

We now use the --rm option everywhere, which should remove containers at the end of each run so that they do not accumulate.

This will avoid the race conditions on cleaning up active containers.
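
For readers unfamiliar with the flag, here is a minimal sketch of a `docker run` invocation that asks Docker to remove the container when it exits. The helper name, image name, and command are hypothetical and chosen for illustration; this is not how ros_buildfarm actually builds its Docker commands.

```python
def docker_run_command(image, command, name=None):
    """Build a `docker run` argument list that removes the container on exit.

    Hypothetical helper for illustration only.
    """
    # --rm tells Docker to delete the container once its process exits.
    args = ["docker", "run", "--rm"]
    if name is not None:
        args += ["--name", name]
    args.append(image)
    args += list(command)
    return args
```

The resulting list can be passed to `subprocess.run`. Note that `--rm` only requests removal; if the removal itself fails, the container can still be left behind.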

@tfoote
Member Author

tfoote commented Jan 13, 2016

Reopening this so we can check in a day or two that containers are not piling up on the slaves.

@tfoote
Member Author

tfoote commented Jan 14, 2016

I logged into the build slave, and the changes applied in #124 do not seem to slow down the rate of abandoned containers. They were deployed ~48 hours ago, and leftover containers appear at approximately the same rate before and after the change.

A sampling from one of the slaves: https://gist.github.com/tfoote/993bf3b574d1b2f9dfa7
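
A rough way to track that rate over time is to count stopped containers periodically. The sketch below is only an assumption about how one might do that: the function names are made up, and the choice of the `exited` and `dead` statuses is a guess at which states the leftover containers end up in.

```python
import subprocess


def docker_ps_command(statuses=("exited", "dead")):
    """Build a `docker ps` command that lists IDs of containers in the given states."""
    command = ["docker", "ps", "--all", "--quiet"]
    for status in statuses:
        # Repeating --filter with the same key matches any of the values.
        command += ["--filter", "status=" + status]
    return command


def parse_container_ids(output):
    """Return the non-empty lines of `docker ps --quiet` output."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def count_leftover_containers(statuses=("exited", "dead")):
    """Run `docker ps` on this host and count the matching containers."""
    result = subprocess.run(
        docker_ps_command(statuses),
        check=True, capture_output=True, text=True)
    return len(parse_container_ids(result.stdout))
```

Logging `count_leftover_containers()` before and after a deployment would give a number to compare instead of eyeballing the list.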

@tfoote
Member Author

tfoote commented Jan 14, 2016

I think this might be related to moby/moby#9665, where there's a race condition on Docker removing references to the mounted resource, which prevents removal. They can be cleaned up once the trailing references have been removed.

@dirk-thomas @esteve thoughts?

@dirk-thomas
Member

In the thread it is mentioned that the 3.19 kernel might help with this. Can we use 14.04.3?
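
To see which hosts are below that kernel version, a small check like the following could be run on each machine. This is a sketch under assumptions: the function names are hypothetical, the 3.19 threshold is simply the version mentioned in the linked thread (whether it actually fixes the leak is unconfirmed), and the parsing only looks at the leading major.minor part of the release string.

```python
import platform
import re


def kernel_version_tuple(release):
    """Extract (major, minor) from a kernel release string such as '3.13.0-74-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def kernel_at_least(release, minimum=(3, 19)):
    """Return True if the release string is at or above the minimum (major, minor)."""
    return kernel_version_tuple(release) >= minimum


def running_kernel_is_new_enough(minimum=(3, 19)):
    """Check the kernel of the machine this runs on."""
    return kernel_at_least(platform.release(), minimum)
```

A stock Trusty install ships a 3.13 kernel, so `kernel_at_least("3.13.0-74-generic")` is false, while a release string starting with 3.19 or later passes.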

@tfoote
Member Author

tfoote commented Jan 14, 2016

To jump the kernel forward we need to install the LTS enablement packages: https://wiki.ubuntu.com/TrustyTahr/ReleaseNotes#LTS_Hardware_Enablement_Stack

@tfoote tfoote added ready and removed in progress labels Jan 16, 2016
@dirk-thomas dirk-thomas changed the title from "Don't clean up containers" to "images are not cleaned up - requires newer kernel" Jan 27, 2016
@tfoote
Member Author

tfoote commented May 10, 2016

We're still leaking containers to the point that the cleanup script cannot clean up images at all.
https://gist.github.com/tfoote/568128b201fc8f78931b71ffafee1424

It took about 6-7 weeks of heavy building to reach the point of the machine going offline due to disk space.
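
To catch this before a machine drops offline, a periodic free-space check on Docker's data directory is one option. The sketch below is an assumption, not something the buildfarm currently does: the function names and the 10% threshold are made up, and `/var/lib/docker` is only Docker's default data root, which a given host may have relocated.

```python
import shutil


def is_disk_nearly_full(total_bytes, free_bytes, min_free_fraction=0.1):
    """Return True if the free share of the disk is below the given fraction."""
    return free_bytes / total_bytes < min_free_fraction


def docker_disk_nearly_full(path="/var/lib/docker", min_free_fraction=0.1):
    """Check the filesystem holding `path` (assumed to be Docker's data root)."""
    usage = shutil.disk_usage(path)
    return is_disk_nearly_full(usage.total, usage.free, min_free_fraction)
```

Run from cron or a monitoring job, this would flag a host weeks before the leaked containers fill the disk.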

@dirk-thomas
Member

Instead of migrating to a newer kernel we should switch from Trusty to Xenial (which will have a newer kernel anyway).

@tfoote
Member Author

tfoote commented May 10, 2016

Yes, Xenial would be great. We should also pull in overlayfs at that time, as it's supposed to give notable performance boosts too.

@nuclearsandwich
Contributor

If I'm reading this correctly, it's a Docker+kernel issue for the buildfarm agent hosts and not about the Linux versions within the Docker containers, which means that this should be resolved by the Xenial migration: ros-infrastructure/buildfarm_deployment#146

@nuclearsandwich nuclearsandwich self-assigned this Aug 30, 2017
@dirk-thomas
Member

Can we confirm this has been fixed on the new Xenial-based machine and close this ticket?

@nuclearsandwich
Contributor

> Can we confirm this has been fixed on the new Xenial-based machine and close this ticket?

The Docker version, kernel version, and the script that cleans up images on hosts have all been updated. The linked issue is resolved and they've referenced a similar, still open issue but I'm not sure if that one affects us.

We also haven't made the move to overlayfs and the docker overlay2 driver. It's something we're going to be experimenting with on the ROS 2 CI farm to address ros2/ci#75

I'm not sure if the issue this is tracking is closed. If we want to actively open an issue for overlayfs itself I think we should open a new one.
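
For anyone checking which storage driver a given host currently uses, `docker info` can report it. A minimal sketch, with hypothetical function names and the assumption that the `docker` CLI is on the PATH of the host being checked:

```python
import subprocess


def uses_overlay2(driver_name):
    """Return True if the reported storage driver name is overlay2."""
    return driver_name.strip() == "overlay2"


def current_storage_driver():
    """Ask the local Docker daemon for its storage driver name."""
    result = subprocess.run(
        ["docker", "info", "--format", "{{.Driver}}"],
        check=True, capture_output=True, text=True)
    return result.stdout.strip()
```

Calling `uses_overlay2(current_storage_driver())` on each agent would show which machines have already been moved.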

@dirk-thomas
Member

> The Docker version, kernel version, and the script that cleans up images on hosts have all been updated. The linked issue is resolved

I will close this ticket then. If we face the same problem again this can of course be reopened.

> We also haven't made the move to overlayfs and the docker overlay2 driver. It's something we're going to be experimenting with on the ROS 2 CI farm to address ros2/ci#75
> If we want to actively open an issue for overlayfs itself I think we should open a new one.

Since we have no imminent need to switch the FS, I don't think we have to ticket it separately. The referenced CI ticket already covers the aarch64 case. In any case, it would be a deployment-specific ticket, so I think it is fine to close out this one: ros_buildfarm doesn't care which FS Docker is using on the machines.

4 participants