
Docker build takes 100% of persistent memory #43133

Closed
me0x847206 opened this issue Jan 8, 2022 · 5 comments
@me0x847206 commented Jan 8, 2022

Building a Dockerfile that creates a UNIX user with a large UID consumes the entire HDD until the build crashes.

Steps to reproduce the issue:

ATTENTION: Do not run this on a production, personal, or otherwise non-test machine, as it breaks things!

1.1. Consider a Dockerfile with this content:

FROM debian
RUN useradd --uid '1234567890' 'theuser'

1.2. Run this command: docker build .

Describe the results you received:

2.1. Watch your HDD fill up until usage reaches 100%.

2.2. You should also see a newly created overlay mount (run df -lh, for example) like this one:

  • overlay 448G 82G 344G 20% /var/lib/docker/overlay2/70b3095035cbfd9382bf65976727fe1894d7f53d200c4c0335240d3026e80213/merged

2.3. Cancelling/killing the process started at step #1.2 does not stop what is described at step #2.1; the HDD keeps filling with data continuously.

2.4. The only way to stop it is to stop/restart/kill the docker daemon process.

Describe the results you expected:

I would expect a different outcome, such as a successfully (or unsuccessfully) built Docker image, but definitely not losing the entire HDD capacity.

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

Client: Docker Engine - Community
 Version:           20.10.12
 API version:       1.41
 Go version:        go1.16.12
 Git commit:        e91ed57
 Built:             Mon Dec 13 11:45:37 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.12
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.12
  Git commit:       459d0df
  Built:            Mon Dec 13 11:43:46 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.12
  GitCommit:        7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc:
  Version:          1.0.2
  GitCommit:        v1.0.2-0-g52b36a2
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Output of docker info:

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Docker Buildx (Docker Inc., v0.7.1-docker)

Server:
 Containers: 4
  Running: 0
  Paused: 0
  Stopped: 4
 Images: 96
 Server Version: 20.10.12
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: xxx
  Is Manager: true
  ClusterID: xxx
  Managers: 1
  Nodes: 1
  Default Address Pool: 10.0.0.0/8  
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: x.x.x.x
  Manager Addresses:
   x.x.x.x:2377
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc version: v1.0.2-0-g52b36a2
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.10.0-10-amd64
 Operating System: Debian GNU/Linux 11 (bullseye)
 OSType: linux
 Architecture: x86_64
 CPUs: 24
 Total Memory: 125.9GiB
 Name: --
 ID: NAXP:2473:3RCK:CRAK:WV7L:T7WM:TVKO:UBA2:3BH6:ISOF:2ZC3:VHTF
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):

# uname -a
Linux host0 5.10.0-10-amd64 #1 SMP Debian 5.10.84-1 (2021-12-08) x86_64 GNU/Linux

@cpuguy83 (Member) commented Jan 8, 2022

It looks like useradd is doing something bad here.
It doesn't seem like something Docker can do anything about; Docker is just running the command you told it to run, unless I am missing something?

@me0x847206 (Author)

Actually, the build process (the command at step #1.2) never ends. The process stays active until the HDD becomes full, then fails with an EOF-related error.

Note that I'm able to create such a user on the Debian host machine, in an existing Docker image, and even in an existing Docker container without any issues. The issue above occurs only during docker build.

I've just reproduced this on a personal set of Jenkins agents by effectively DoS-attacking them all, and it succeeded: every agent tried to build such a Dockerfile, and all of them were broken within a few hours (the HDD fills slowly but constantly).
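
For comparison, a quick check (a minimal sketch, assuming a stock debian image) that the same command completes fine outside of docker build:

# The same useradd call under docker run finishes immediately and
# does not fill the disk; only the docker build path misbehaves.
docker run --rm debian useradd --uid '1234567890' 'theuser'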

@me0x847206 (Author)

Did some additional investigation; my latest conclusion is that something goes wrong (like an overflow error/exception) at the overlay level, so something ends up being created recursively over and over until the entire /var partition is consumed, and the process then fails from disk starvation. On my side I was able to fill 500 GB in under 2 hours, even after killing the docker build process entirely.
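
One way to watch where the space is going while this happens (a rough diagnostic sketch; the path assumes the default Docker Root Dir of /var/lib/docker from the docker info output above):

# List the largest entries under the overlay2 storage directory;
# the layer diff being written for the failing build should keep
# climbing toward the top while the disk fills.
sudo du -xh --max-depth=2 /var/lib/docker/overlay2 | sort -rh | head -n 10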

@thaJeztah (Member)

I think this is a duplicate of #5419, which describes what's causing this, and some workarounds

@me0x847206 (Author)

> I think this is a duplicate of #5419, which describes what's causing this, and some workarounds

Agreed; this issue fully duplicates the mentioned one. I'll keep an eye on that one. Thanks for linking them.
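
For anyone landing here first: the workaround discussed in #5419, as far as I understand it, is useradd's --no-log-init flag, which skips the UID-indexed lastlog/faillog files whose sparse regions get expanded when the build commits the layer. A sketch of the adjusted Dockerfile:

FROM debian
# --no-log-init (-l) tells useradd not to add the user to the
# lastlog and faillog databases; with a huge UID those files are
# created sparse and balloon to full size when the layer is copied.
RUN useradd --no-log-init --uid '1234567890' 'theuser'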
