Reproducable: bake creates un-prunable files in overlay2 #2061

Open
2 of 3 tasks
NiklasBeierl opened this issue Sep 27, 2023 · 5 comments

@NiklasBeierl

Contributing guidelines

I've found a bug and checked that ...

  • ... the documentation does not mention anything about my problem
  • ... there are no open or closed issues that are related to my problem

Description

Up front: Sorry for the wall of text, this problem seems to be very elusive...

One of our docker hosts frequently ran out of disk space, despite pruning containers, logs, images, volumes and build cache.

The directory "/var/lib/docker/overlay2" often used about 100GB, while:

  • docker ps -a showed nothing
  • docker volume ls showed nothing
  • docker images -a showed nothing
  • docker system df showed 0 bytes being used

For pruning we ran:

docker system prune -a -f --volumes
docker builder prune -a -f 
docker buildx prune -a -f
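
The three prune commands above can be wrapped in a small script so the same sequence runs every time. This is only a sketch: the names run and prune_everything and the DRY_RUN variable are made up for illustration and are not part of docker.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Runs the same three prune commands listed in this report, in order.
prune_everything() {
    run docker system prune -a -f --volumes
    run docker builder prune -a -f
    run docker buildx prune -a -f
}
```

Setting DRY_RUN=1 before calling prune_everything only prints the three commands, which makes it easy to check the sequence before running it for real.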

Whenever this situation occurred, the only thing we could do was to uninstall docker, remove /var/lib/docker and reinstall it...

apt remove docker-ce
rm -rf /var/lib/docker
apt install docker-ce

I did not tick "there are no open or closed issues..." because there are several issues / forum threads out there along the lines of "/var/lib/docker/overlay2 fills up my disk". Usually these come down to the author not knowing about the docker (builder) prune commands or all of the additional flags that can be set for these commands.

I am quite confident that after running the prune commands mentioned above, /var/lib/docker/overlay2 should be empty. As a matter of fact, it does end up empty, except when using docker buildx bake with very specific inputs, which we managed to pin down to a very small, reproducible setup...

Expected behaviour

I generally expect to be able to "free up" disk space consumed by any docker resource through the docker CLI without having to reinstall docker. Please let me know if this expectation is somehow misguided.

More specifically, after deleting all containers, images and volumes and pruning the build cache, I expect /var/lib/docker/overlay2 to be empty (except for the folder called l). To "empty" docker, we typically used docker system prune -a -f --volumes, docker builder prune and docker buildx prune.
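
To check this expectation on a host, something like the following could be used. It is a minimal sketch: list_leftovers is a hypothetical helper name, and the default path assumes the Docker Root Dir is /var/lib/docker, as in the docker info output in this report.

```shell
#!/bin/sh
# Hypothetical helper: print every entry of an overlay2 directory
# except the "l" link directory, one name per line.
# The default path is an assumption based on "Docker Root Dir: /var/lib/docker".
list_leftovers() {
    dir="${1:-/var/lib/docker/overlay2}"
    for entry in "$dir"/*; do
        # Skip the unexpanded pattern when the directory is empty.
        [ -e "$entry" ] || continue
        name=$(basename "$entry")
        [ "$name" = "l" ] && continue
        printf '%s\n' "$name"
    done
}
```

Run as root after pruning, list_leftovers should print nothing if overlay2 is really empty; any name it prints is a directory that pruning did not remove.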

Actual behaviour

When using the dockerfiles and docker-bake.hcl provided below, buildx produces files in /var/lib/docker/overlay2 which we seem to have no way of deleting through the cli and which are also not counted when running docker system df.

We have reduced our original dockerfiles and docker-bake.hcl as far as we could; simplifying any further makes the problem disappear. Whenever we made a modification, we "reset" our docker installation by uninstalling it, removing /var/lib/docker and reinstalling.

The steps below were reproducible on two different machines:

Step 1: Make sure your docker setup is "empty"

$ /var/lib/docker/overlay2 ls
l
$ /var/lib/docker/overlay2 du -hs .
8.0K    .
$ /var/lib/docker/overlay2 docker system df
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          0         0         0B        0B
Containers      0         0         0B        0B
Local Volumes   0         0         0B        0B
Build Cache     0         0         0B        0B

Step 2: Create dockerfiles and docker-bake.hcl

In a directory of your choice, create the files I have provided in section "Configuration".

Step 3: Build the docker images with docker buildx bake

$ ~/c/docker-garbage docker buildx bake
< Logs provided in section "Build logs" below. >

Step 4: Checking disk usage of docker and /var/lib/docker/overlay2:

$ ~/c/docker-garbage docker system df
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          2         0         1.602GB   1.602GB (100%)
Containers      0         0         0B        0B
Local Volumes   0         0         0B        0B
Build Cache     8         0         355B      355B

We are not running any containers at the moment. mount shows no bind mounts in overlay2.

$ /var/lib/docker/overlay2 ls
602e8d236dc20cc7a5b8c6e8eee69c1a9757f8541ba9a50682d6b4762ce80150  f6s5n1yu2asw6rkr6qc87j64k  ndxiwzkjek6dke51shxtcfhrv
98th2ukd5n6fz7hfn2l6er2bq                                         hoc4down9mqimid31ttmrbau7  r2q82nycxvnuym2htxvrcsiab
ac60195faf150523e49f0adcd9e3a306075bc7f51c053f6b4962dd70935fbef0  l                          ytixoxoym50muwljx6e8oobcq

$ /var/lib/docker/overlay2 du -hs .
1.7G    .

Step 5: Prune docker

Notice that less than the "reclaimable" amount is reclaimed.

$ ~/c/docker-garbage docker system prune -a -f --volumes
Deleted Images:
untagged: child:latest
deleted: sha256:78f42581308f5aaa74e133eef3caaa68c74f3aab76dcb820d365fa222213a046
untagged: parent:latest
deleted: sha256:b400bd392d4f0b10d4b01fc4a978f59073d0d58c2b56a005c1ad77bf23291ce4

Deleted build cache objects:
r2q82nycxvnuym2htxvrcsiab
ig1fym5s2s41ov8lnbojhqi86
f6s5n1yu2asw6rkr6qc87j64k
98th2ukd5n6fz7hfn2l6er2bq
re3yl384a1hbepqwrl8k9p251
ytixoxoym50muwljx6e8oobcq
vk7y6vkww45ma28i321trsfrm
elqgvrx10rufna56k4bm10gc8

Total reclaimed space: 1.485GB
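
To put a number on the gap, the reported figures can simply be subtracted. A small sketch, where unreclaimed_gb is a made-up helper name and both values are the ones printed by docker in Step 4 and Step 5 (docker reports decimal GB):

```shell
#!/bin/sh
# Hypothetical helper: difference between the "RECLAIMABLE" figure and the
# "Total reclaimed space" figure, both given in GB.
unreclaimed_gb() {
    awk -v reclaimable="$1" -v reclaimed="$2" \
        'BEGIN { printf "%.3f\n", reclaimable - reclaimed }'
}

unreclaimed_gb 1.602 1.485   # prints 0.117
```

So roughly 0.117GB that docker system df listed as reclaimable was not reported as reclaimed by the prune.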

And the builder cache pretends to be empty!

$ ~/c/docker-garbage docker builder prune -a -f
Total:  0B
$ ~/c/docker-garbage docker buildx prune -a -f
Total:  0B

Step 6: Check disk usage again

As expected, docker system df shows 0B being used:

$ ~/c/docker-garbage docker system df
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          0         0         0B        0B
Containers      0         0         0B        0B
Local Volumes   0         0         0B        0B
Build Cache     0         0         0B        0B
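
This check can be scripted as well. The sketch below reads the plain-text table printed by docker system df on stdin and succeeds only if every SIZE cell is 0B. The helper name df_reports_empty is made up, and the parsing assumes the column layout shown above, with columns separated by two or more spaces.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if every row of a "docker system df" table
# (read from stdin) has SIZE equal to 0B, exit 1 otherwise.
# Assumes columns are separated by at least two spaces, as in the output above.
df_reports_empty() {
    awk -F'  +' '
        NR > 1 && NF >= 4 && $4 != "0B" { nonzero = 1 }
        END { exit nonzero + 0 }
    '
}
```

For example, `docker system df | df_reports_empty && echo "docker reports nothing in use"` would print the message in the situation shown in this step.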

Other docker commands also show no resources:

$ ~/c/docker-garbage docker volume ls
DRIVER    VOLUME NAME
$ ~/c/docker-garbage docker images -a
REPOSITORY   TAG       IMAGE ID   CREATED   SIZE
$ ~/c/docker-garbage docker ps -a
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES

We are still not running any containers at the moment, and mount still shows no bind mounts in overlay2.
But when looking at overlay2, 1.7 GB of disk space is still in use:

$ /var/lib/docker/overlay2 du -hs .
1.7G    .
$ /var/lib/docker/overlay2 ls
602e8d236dc20cc7a5b8c6e8eee69c1a9757f8541ba9a50682d6b4762ce80150  hoc4down9mqimid31ttmrbau7  l
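
To see exactly which overlay2 entries survive a prune, the directory listings taken before and after can be compared. A minimal sketch, assuming each listing was saved one name per line (for example with ls -1 /var/lib/docker/overlay2 > before.txt); survivors is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the names present in both listing files
# (one name per line each), ignoring the "l" link directory.
survivors() {
    before_sorted=$(mktemp) || return 1
    after_sorted=$(mktemp) || return 1
    sort "$1" > "$before_sorted"
    sort "$2" > "$after_sorted"
    # comm -12 keeps only the lines common to both sorted inputs.
    comm -12 "$before_sorted" "$after_sorted" | grep -vxF 'l'
    rm -f "$before_sorted" "$after_sorted"
}
```

Fed with the listings from Step 4 and Step 6, this prints 602e8d236dc20cc7a5b8c6e8eee69c1a9757f8541ba9a50682d6b4762ce80150 and hoc4down9mqimid31ttmrbau7, the two directories that pruning left behind.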

I have played around with this setup a little more (changed FROM and RUN commands, etc.). Whenever something made the issue disappear, I left a comment in the dockerfile or docker-bake.hcl.

Why I believe it is a problem with bake...

When using docker buildx build -t child -f child.dockerfile . and docker buildx build -t parent -f parent.dockerfile . to build these images instead of bake, and then pruning, no files are left behind in overlay2!

Buildx seems to 'know' about these files...

Another interesting observation: when I remove these folders without also reinstalling docker and then try to build again, the build fails because the directories are missing! So it seems that buildx is somehow aware of these files; it just doesn't prune them, and they are also not counted in docker system df...

$ /var/lib/docker/overlay2 rm -rf 602e8d236dc20cc7a5b8c6e8eee69c1a9757f8541ba9a50682d6b4762ce80150/ hoc4down9mqimid31ttmrbau7/
$ ~/c/docker-garbage docker buildx bake
[+] Building 9.4s (9/9) FINISHED                                                                                            docker:default
 => [parent internal] load build definition from parent.dockerfile                                                                    0.0s
 => => transferring dockerfile: 314B                                                                                                  0.0s
 => [parent internal] load .dockerignore                                                                                              0.0s
 => => transferring context: 2B                                                                                                       0.0s
 => CANCELED [child] resolve image config for docker.io/docker/dockerfile:1                                                           2.4s
 => [parent] docker-image://docker.io/docker/dockerfile:1@sha256:ac85f380a63b13dfcefa89046420e1781752bab202122f8f50032edf31be0021     4.3s
 => => resolve docker.io/docker/dockerfile:1@sha256:ac85f380a63b13dfcefa89046420e1781752bab202122f8f50032edf31be0021                  0.0s
 => => sha256:ac85f380a63b13dfcefa89046420e1781752bab202122f8f50032edf31be0021 8.40kB / 8.40kB                                        0.0s
 => => sha256:657fcc512c7369f4cb3d94ea329150f8daf626bc838b1a1e81f1834c73ecc77e 482B / 482B                                            0.0s
 => => sha256:a17ee7fff8f5e97b974f5b48f51647d2cf28d543f2aa6c11aaa0ea431b44bb89 1.27kB / 1.27kB                                        0.0s
 => => sha256:9d9c93f4b00be908ab694a4df732570bced3b8a96b7515d70ff93402179ad232 11.80MB / 11.80MB                                      4.0s
 => => extracting sha256:9d9c93f4b00be908ab694a4df732570bced3b8a96b7515d70ff93402179ad232                                             0.2s
 => [parent internal] load metadata for docker.io/library/debian:bookworm                                                             2.4s
 => [parent 1/2] FROM docker.io/library/debian:bookworm@sha256:eaace54a93d7b69c7c52bb8ddf9b3fcba0c106a497bc1fdbb89a6299cf945c63       0.1s
 => => resolve docker.io/library/debian:bookworm@sha256:eaace54a93d7b69c7c52bb8ddf9b3fcba0c106a497bc1fdbb89a6299cf945c63              0.0s
 => => sha256:eaace54a93d7b69c7c52bb8ddf9b3fcba0c106a497bc1fdbb89a6299cf945c63 1.85kB / 1.85kB                                        0.0s
 => => sha256:8a6e23e1b192b30eff14036a92e9ecdb551a1a10aa8535728b0c13d14d8c9462 529B / 529B                                            0.0s
 => => sha256:2657a4a0a6d5e8b3515004185275768f115a64a833de40125bb3f6b0b8cc598b 1.46kB / 1.46kB                                        0.0s
 => [child internal] load build definition from child.dockerfile                                                                      0.1s
 => => transferring dockerfile: 130B                                                                                                  0.0s
 => [child internal] load .dockerignore                                                                                               0.1s
 => => transferring context: 2B                                                                                                       0.0s
 => ERROR [parent 2/2] RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y texlive-latex-extra                    0.0s
------
 > [parent 2/2] RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y texlive-latex-extra:
------
parent.dockerfile:8
--------------------
   6 |
   7 |     # If you install "tree", instead of texlive, the issue disappears
   8 | >>> RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y texlive-latex-extra
   9 |
  10 |
--------------------
ERROR: failed to solve: failed to prepare sha256:7c85cfa30cb11b7606c0ee84c713a8f6c9faad7cb7ba92f1f33ba36d4731cc82 as epqo0uud3cje68ljmkwpp1ksb: open /var/lib/docker/overlay2/602e8d236dc20cc7a5b8c6e8eee69c1a9757f8541ba9a50682d6b4762ce80150/committed: no such file or directory

Buildx version

github.com/docker/buildx 0.11.2 9872040, github.com/docker/buildx v0.11.2 9872040

Docker info

# Machine 1:

Client:
 Version:    24.0.5
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  0.11.2
    Path:     /usr/lib/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  2.20.3
    Path:     /usr/lib/docker/cli-plugins/docker-compose

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 0
 Server Version: 24.0.5
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: true
  Native Overlay Diff: false
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 091922f03c2762540fd057fba91260237ff86acb.m
 runc version:
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.5.5-arch1-1
 Operating System: Arch Linux
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 31.13GiB
 Name: ANUBIS
 ID: 64789752-7389-4406-aebf-e0ee6f3a0a50
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

# Machine 2:
Client: Docker Engine - Community
 Version:    24.0.5
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.11.2
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.20.2
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 0
 Server Version: 24.0.5
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 8165feabfdfe38c65b599c4993d227328c231fca
 runc version: v1.1.8-0-g82f18fe
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.1.0-11-amd64
 Operating System: Debian GNU/Linux 12 (bookworm)
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 3.823GiB
 Name: masterhorst.cs.uni-saarland.de
 ID: 8d4a754a-c18d-4728-bb92-6b7218e12de2
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Builders list

# Machine 1:
NAME/NODE DRIVER/ENDPOINT STATUS  BUILDKIT             PLATFORMS
default * docker
  default default         running v0.11.6+0a15675913b7 linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/amd64/v4, linux/386

# Machine 2:
NAME/NODE DRIVER/ENDPOINT STATUS  BUILDKIT             PLATFORMS
default * docker
  default default         running v0.11.6+0a15675913b7 linux/amd64, linux/amd64/v2, linux/386

Configuration

./parent.dockerfile

# syntax=docker/dockerfile:1
FROM debian:bookworm
# Also works with:
# FROM python:3.10
# FROM ubuntu:focal

# If you install "tree", instead of texlive, the issue disappears
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y texlive-latex-extra

./child.dockerfile

# syntax=docker/dockerfile:1
FROM parent

RUN echo "Hello, this is the child image!"

./docker-bake.hcl

group "default" {
  targets = ["parent", "child"]
}

target "parent" {
  context    = "."
  dockerfile = "parent.dockerfile"
  tags       = ["parent"]
}

# Removing the "child" target makes the issue disappear
target "child" {
  context    = "."
  dockerfile = "child.dockerfile"
  tags       = ["child"]
  contexts   = {
    parent = "target:parent"
  }
}

Build logs

[+] Building 263.4s (12/12) FINISHED                                                                                        docker:default
 => [parent internal] load .dockerignore                                                                                              0.0s
 => => transferring context: 2B                                                                                                       0.0s
 => [parent internal] load build definition from parent.dockerfile                                                                    0.0s
 => => transferring dockerfile: 314B                                                                                                  0.0s
 => [child] resolve image config for docker.io/docker/dockerfile:1                                                                    2.4s
 => [child] docker-image://docker.io/docker/dockerfile:1@sha256:ac85f380a63b13dfcefa89046420e1781752bab202122f8f50032edf31be0021      8.7s
 => => resolve docker.io/docker/dockerfile:1@sha256:ac85f380a63b13dfcefa89046420e1781752bab202122f8f50032edf31be0021                  0.0s
 => => sha256:ac85f380a63b13dfcefa89046420e1781752bab202122f8f50032edf31be0021 8.40kB / 8.40kB                                        0.0s
 => => sha256:657fcc512c7369f4cb3d94ea329150f8daf626bc838b1a1e81f1834c73ecc77e 482B / 482B                                            0.0s
 => => sha256:a17ee7fff8f5e97b974f5b48f51647d2cf28d543f2aa6c11aaa0ea431b44bb89 1.27kB / 1.27kB                                        0.0s
 => => sha256:9d9c93f4b00be908ab694a4df732570bced3b8a96b7515d70ff93402179ad232 11.80MB / 11.80MB                                      4.0s
 => => extracting sha256:9d9c93f4b00be908ab694a4df732570bced3b8a96b7515d70ff93402179ad232                                             0.2s
 => [parent internal] load metadata for docker.io/library/debian:bookworm                                                             2.4s
 => [child 1/2] FROM docker.io/library/debian:bookworm@sha256:eaace54a93d7b69c7c52bb8ddf9b3fcba0c106a497bc1fdbb89a6299cf945c63       27.4s
 => => resolve docker.io/library/debian:bookworm@sha256:eaace54a93d7b69c7c52bb8ddf9b3fcba0c106a497bc1fdbb89a6299cf945c63              0.0s
 => => sha256:eaace54a93d7b69c7c52bb8ddf9b3fcba0c106a497bc1fdbb89a6299cf945c63 1.85kB / 1.85kB                                        0.0s
 => => sha256:8a6e23e1b192b30eff14036a92e9ecdb551a1a10aa8535728b0c13d14d8c9462 529B / 529B                                            0.0s
 => => sha256:2657a4a0a6d5e8b3515004185275768f115a64a833de40125bb3f6b0b8cc598b 1.46kB / 1.46kB                                        0.0s
 => => sha256:167b8a53ca4504bc6aa3182e336fa96f4ef76875d158c1933d3e2fa19c57e0c3 49.56MB / 49.56MB                                     16.2s
 => => extracting sha256:167b8a53ca4504bc6aa3182e336fa96f4ef76875d158c1933d3e2fa19c57e0c3                                             1.7s
 => [child internal] load .dockerignore                                                                                               0.1s
 => => transferring context: 2B                                                                                                       0.0s
 => [child internal] load build definition from child.dockerfile                                                                      0.1s
 => => transferring dockerfile: 130B                                                                                                  0.0s
 => [parent 2/2] RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y texlive-latex-extra                        232.7s
 => [child 1/2] RUN echo "Hello, this is the child image!"                                                                            0.3s
 => [parent] exporting to image                                                                                                      12.5s
 => => exporting layers                                                                                                              12.5s
 => => writing image sha256:b400bd392d4f0b10d4b01fc4a978f59073d0d58c2b56a005c1ad77bf23291ce4                                          0.0s
 => => naming to docker.io/library/parent                                                                                             0.0s
 => [child] exporting to image                                                                                                       12.4s
 => => exporting layers                                                                                                              12.4s
 => => writing image sha256:78f42581308f5aaa74e133eef3caaa68c74f3aab76dcb820d365fa222213a046                                          0.0s
 => => naming to docker.io/library/child

Additional info

No response

@SimRi99

SimRi99 commented Oct 2, 2023

Thanks for bringing this up! I also encountered similar problems on my system, but couldn't find any solution other than reinstalling docker every time overlay2 was full. Using your configuration, I could also reproduce your example on my system. It would be great if anyone could look into this! 👍
My docker version: Docker version 24.0.2, build cb74dfc

@tonistiigi
Member

This looks similar to moby/moby#46136

@NiklasBeierl
Author

NiklasBeierl commented Oct 3, 2023

> This looks similar to moby/moby#46136

That might indeed be the same problem we are facing. Is there an easy way for me to test the changes in moby/moby#45966? Or will I need to wait for a release of moby and then subsequently docker / buildx?

I couldn't find a straightforward way of upgrading buildkit alone and my buildx already seems to be on the latest version.

@Nova-Logic

Tried to use buildx bake, but due to this issue I will revert to compose.

@NiklasBeierl
Author

As far as I understood the discussion on moby/moby#46136, moby/moby#45966 should have fixed this and was supposed to land in docker v25.

Since docker v25 was released last week, we installed it:

Client: Docker Engine - Community
 Version:    25.0.1
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.12.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.24.2
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Buildkit seems to be v0.12 as well:

NAME/NODE DRIVER/ENDPOINT STATUS  BUILDKIT             PLATFORMS
default * docker
  default default         running v0.12.4+3b6880d2a00f linux/amd64, linux/amd64/v2, linux/386

But we are still facing the issue described above :( Can anyone comment?
