images are not cleaned up - requires newer kernel #120
Reopening this so we check in a day or two that containers are not piling up on the slaves. |
I logged into the build slave, and the changes applied in #124 do not seem to slow down the rate of abandoned containers. They were deployed ~48 hours ago, and leftover containers appear at approximately the same rate before and after the change. A sampling from one of the slaves: https://gist.github.com/tfoote/993bf3b574d1b2f9dfa7 |
I think this might be related to moby/moby#9665, where there's a race condition in Docker removing references to the mounted resource, which prevents removal. The containers can be cleaned up once the trailing references have been removed. @dirk-thomas @esteve thoughts? |
In the thread it is mentioned that the 3.19 kernel might help with this. Can we use 14.04.3? |
To jump the kernel forward we need to install the LTS enablement packages: https://wiki.ubuntu.com/TrustyTahr/ReleaseNotes#LTS_Hardware_Enablement_Stack |
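A quick way to tell whether a host already runs a kernel at or above the 3.19 mentioned above is to compare its release string against that minimum. This is only a sketch: `parse_kernel_release` and `kernel_at_least` are hypothetical helpers, not part of the buildfarm code, and the release strings in the comments are illustrative.

```python
import re


def parse_kernel_release(release):
    """Return (major, minor) from a release string such as '3.13.0-24-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_at_least(release, minimum=(3, 19)):
    """True if the release is at or above the minimum (major, minor) version."""
    # Tuples compare element by element, so (4, 4) >= (3, 19) holds.
    return parse_kernel_release(release) >= minimum
```

On a Linux host, `kernel_at_least(platform.release())` (after `import platform`) applies the check to the running kernel.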
We're still leaking containers to the point that the cleanup script cannot clean up images at all. It took about 6-7 weeks of heavy building to reach the point of the machine going offline due to disk space. |
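To see how many stopped containers are still holding on to images, one option is to filter the output of `docker ps -a`. The sketch below is not the actual cleanup script: `leftover_container_ids` is a hypothetical helper, and it assumes the input comes from `docker ps -a --format '{{.ID}}<TAB>{{.Status}}'`, where stopped containers report a status beginning with `Exited` or `Dead`.

```python
def leftover_container_ids(ps_lines):
    """Return IDs of stopped containers from tab-separated 'ID<TAB>Status' lines."""
    stopped_prefixes = ("Exited", "Dead")
    ids = []
    for line in ps_lines:
        if not line.strip():
            continue  # skip blank lines in the command output
        container_id, _, status = line.partition("\t")
        # Running containers report "Up ..." and are left alone.
        if status.strip().startswith(stopped_prefixes):
            ids.append(container_id.strip())
    return ids
```

The returned IDs could then be passed to `docker rm`; containers hit by the race condition may still fail to be removed until their trailing references are gone.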
Instead of migrating to a newer kernel we should switch from Trusty to Xenial (which will have a newer kernel anyway). |
Yes, Xenial would be great. We should also pull in overlayfs at that time, as it's supposed to give notable performance boosts too. |
If I'm reading this correctly, it's a Docker+Kernel issue for the buildfarm agent hosts and not about the Linux versions within the Docker containers. Which means that this should be resolved by the xenial migration: ros-infrastructure/buildfarm_deployment#146 |
Can we confirm this has been fixed on the new Xenial-based machine and close this ticket? |
The Docker version, kernel version, and the script that cleans up images on hosts have all been updated. The linked issue is resolved, and they've referenced a similar, still-open issue, but I'm not sure if that one affects us. We also haven't made the move to overlayfs and the Docker overlay2 driver. It's something we're going to be experimenting with on the ROS 2 CI farm to address ros2/ci#75. I'm not sure if the issue this is tracking is closed. If we want to actively track overlayfs itself, I think we should open a new issue. |
I will close this ticket then. If we face the same problem again this can of course be reopened.
Since we have no imminent need to switch the FS, I don't think we have to ticket it separately. The referenced CI ticket already covers the aarch64 case. Anyway, since it would be a deployment-specific ticket, I think it is fine to close out this one since the |
We now use the --rm option everywhere, which should clean up containers at the end of each run so that they do not accumulate.
This will avoid the race conditions on cleaning up active containers.
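As an illustration of the `--rm` approach, the sketch below builds a `docker run` invocation that always includes the flag, so the container is removed when it exits. `docker_run_command` is a hypothetical helper and the image name in the usage note is only an example; the real job scripts may assemble their commands differently.

```python
def docker_run_command(image, command=(), extra_args=()):
    """Build an argv list for `docker run` that always passes --rm."""
    argv = ["docker", "run", "--rm"]
    argv.extend(extra_args)  # options such as volume mounts go before the image
    argv.append(image)
    argv.extend(command)     # command and arguments to run inside the container
    return argv
```

For example, `docker_run_command("ubuntu:xenial", ["true"])` yields an argument list that can be handed to `subprocess.run`.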