dockerd: high memory usage #848

Closed
ceecko opened this issue Nov 8, 2019 · 26 comments

@ceecko

ceecko commented Nov 8, 2019

  • [x] This is a bug report
  • [ ] This is a feature request
  • [x] I searched existing issues before opening this one

Expected behavior

dockerd should use less memory

Actual behavior

dockerd uses 4.5GB+ memory

Steps to reproduce the behavior

Not sure. We run multiple servers with docker and all of them experience high memory usage after some time.

I'm happy to provide any debugging logs as needed.

Output of docker version:

Client: Docker Engine - Community
 Version:           19.03.3
 API version:       1.40
 Go version:        go1.12.10
 Git commit:        a872fc2f86
 Built:             Tue Oct  8 00:58:10 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.1
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.5
  Git commit:       74b1e89
  Built:            Thu Jul 25 21:19:36 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.10
  GitCommit:        b34a5c8af56e510852c35414db4c1f4fa6172339
 runc:
  Version:          1.0.0-rc8+dev
  GitCommit:        3e425f80a8c931f88e6d94a8c831b9d5aa481657
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Output of docker info:

Client:
 Debug Mode: false

Server:
 Containers: 171
  Running: 147
  Paused: 0
  Stopped: 24
 Images: 140
 Server Version: 19.03.1
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: journald
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: b34a5c8af56e510852c35414db4c1f4fa6172339
 runc version: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 3.10.0-1062.1.2.el7.x86_64
 Operating System: CentOS Linux 7 (Core)
 OSType: linux
 Architecture: x86_64
 CPUs: 12
 Total Memory: 31.05GiB
 Name: kkk
 ID: 6VZX:5BMH:3O4I:PU5H:YPVC:FYEN:VZUT:O5RW:PMU2:F7K6:DS44:DTWT
 Docker Root Dir: /data/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: true

WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

Additional environment details (AWS, VirtualBox, physical, etc.)

  • physical server
  • API is used to control docker
  • logging is done to fluentd
  • no mounts are used
  • each container exposes one port
  • there are plenty of containers (~20-25) which are automatically restarted due to errors on startup (not related to Docker) until the restart limit is hit, and are then stopped
  • This appears in the logs pretty often
time="2019-11-08T13:38:15.333931156+01:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
@andrewhsu
Contributor

@ceecko could you provide steps to reproduce? With the current description of the issue, it's hard to nail down what is happening on your system.

@ceecko
Author

ceecko commented Nov 12, 2019

@andrewhsu I understand. I don't have any specific steps. We run tens of servers with 32GB of memory where containers come and go and all of them experience this high memory usage over time. Usually within 2-4 weeks.

Is there any debugging information I can get you to see what's using the memory?

@ceecko
Author

ceecko commented Nov 16, 2019

@andrewhsu I managed to replicate the issue. After running the following script the memory usage jumps to 262MB. It appears fluentd-async-connect=true is responsible for this.

Fluentd runs ok and accepts logs. Removing all containers does not decrease the memory usage.

#!/bin/bash
for i in {1..10}
do
  docker run -d \
    --restart always \
    --log-driver=fluentd \
    --log-opt fluentd-address=127.0.0.1:2222 \
    --log-opt fluentd-async-connect=true \
    debian sleep 2 &
done
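
For reference, one quick way to check dockerd's resident memory before and after running the script (a rough sketch; the exact tools available will vary by host):

# resident set size of the daemon, in KiB
grep VmRSS /proc/$(pidof dockerd)/status
# or the same via ps
ps -o pid,rss,vsz,cmd -p $(pidof dockerd)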

@ceecko
Author

ceecko commented Dec 1, 2019

@andrewhsu is there any other information which would be useful?

@kolyshkin

@ceecko can you please collect memory usage dumps and share it with us? The following article explains how to do that: https://success.docker.com/article/how-do-i-gather-engine-heap-information
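
For anyone following along, a minimal sketch of what the linked article describes (assuming debug mode can be enabled in /etc/docker/daemon.json; with live-restore enabled, a daemon restart keeps containers running):

# 1. add "debug": true to /etc/docker/daemon.json, then restart the daemon
systemctl restart docker

# 2. fetch a heap profile from the pprof endpoint on the API socket
curl --unix-socket /var/run/docker.sock \
  -o /tmp/dockerd-heap.pprof \
  http://localhost/debug/pprof/heap

# 3. summarize it (text output, no Graphviz required)
go tool pprof -top /tmp/dockerd-heap.pprof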

@ceecko
Author

ceecko commented Dec 18, 2019

Attached you can find the files
pprof.dockerd.alloc_objects.alloc_space.inuse_objects.inuse_space.001.pb.gz
pprof.dockerd.alloc_objects.alloc_space.inuse_objects.inuse_space.002.pb.gz

There appears to be an error in the output

[root@docker ~]# docker run --rm --net host -v $PWD:/root/pprof/ golang go tool pprof --svg --alloc_space localhost:8080/debug/pprof/heap
Fetching profile over HTTP from http://localhost:8080/debug/pprof/heap
Saved profile in /root/pprof/pprof.dockerd.alloc_objects.alloc_space.inuse_objects.inuse_space.001.pb.gz
failed to execute dot. Is Graphviz installed? Error: exec: "dot": executable file not found in $PATH
[root@docker ~]# docker run --rm --net host -v $PWD:/root/pprof/ golang go tool pprof --svg --inuse_space localhost:8080/debug/pprof/heap
Fetching profile over HTTP from http://localhost:8080/debug/pprof/heap
Saved profile in /root/pprof/pprof.dockerd.alloc_objects.alloc_space.inuse_objects.inuse_space.002.pb.gz
failed to execute dot. Is Graphviz installed? Error: exec: "dot": executable file not found in $PATH

@davidschrooten

I am running into a similar problem on one of my Kubernetes clusters. Over 4 weeks, the memory consumption of dockerd climbs from 1.5 GB to 54 GB. Only a reboot temporarily solves the problem. Docker commands such as docker stats also become unresponsive when the memory usage starts rising. This happens on 18.06.2-ce on Debian stretch. The problem does not happen on another cluster composed of nodes running CoreOS, which have the same deployments.

@cpuguy83
Collaborator

@ceecko Thanks for the dump. It seems like it only captures a very small portion and shows 25MB of allocated objects.

The reason the SVG is not working for you is that the golang image does not have Graphviz installed, which is what pprof uses to generate the SVG.
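
Two ways around that (a sketch based on the commands above; the golang image being Debian-based is an assumption about the tag in use): either use pprof's text output, which needs no Graphviz, or install Graphviz in the container before rendering the SVG:

# text summary instead of SVG
docker run --rm --net host -v $PWD:/root/pprof/ golang \
  go tool pprof -top --inuse_space localhost:8080/debug/pprof/heap

# or install Graphviz first, then render the SVG
docker run --rm --net host -v $PWD:/root/pprof/ golang bash -c \
  'apt-get update && apt-get install -y graphviz && \
   go tool pprof --svg --inuse_space localhost:8080/debug/pprof/heap > /root/pprof/heap.svg'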

@ceecko
Author

ceecko commented Dec 30, 2019

The dump has been taken at a time when dockerd was using ~220MB of memory after running the provided script.

Maybe I'm reading the output of top wrong? Here it shows 11.3% of 32GB:

MiB Mem :  31901.1 total,   1789.1 free,  25112.8 used,   4999.2 buff/cache
MiB Swap:   5118.0 total,   4800.5 free,    317.5 used.   6379.9 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
15625 root      20   0 7957.0m   3.5g  21.0m S   2.0 11.3   1061:05 dockerd

@srstsavage

srstsavage commented Feb 5, 2020

I can confirm this memory leak. Each container deployment with fluentd-async-connect set to true causes dockerd to consume memory which is never released. With fluentd-async-connect set to false no problem occurs.

Here's a Grafana graph of dockerd memory usage (process_resident_memory_bytes from the /metrics endpoint):

[Grafana screenshot of dockerd process_resident_memory_bytes]

In our case this leads to dockerd being killed by the kernel's OOM killer.

Also, this seems to be a regression: only 19.x docker engines seem to be affected. 18.x dockerds are not affected.

pprof results: dockerd_fluentd_async_leak.tar.gz

Tested with:

  • Debian 8, 9, and 10
  • Docker 19.03.5 (affected), 19.03.4 (affected), 18.09.0 (not affected), 18.06.3-ce (not affected)

I'm also seeing similar pprof results as @ceecko; the memory usage reported by the pprof output (at least the svgs) is much lower than the memory usage reported by the OS and Docker's own /metrics.
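
For anyone who wants to watch the same /metrics data, a minimal daemon.json sketch (on 19.03 the metrics endpoint still requires experimental mode; the address and port are arbitrary choices):

# /etc/docker/daemon.json
{
  "experimental": true,
  "metrics-addr": "127.0.0.1:9323"
}

# after restarting the daemon:
curl -s http://127.0.0.1:9323/metrics | grep process_resident_memory_bytes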

@ceecko
Author

ceecko commented Mar 8, 2020

@thaJeztah is there any other information you need?

@gotamilarasan

We are also facing a similar problem where the Docker daemon consumes 5GB+ of memory, but the Go pprof heap shows only ~1GB, and it is caused by the log driver.

docker_heap.pb.gz
docker_cpu.pb.gz

Steps to reproduce the behavior
I could reproduce the problem by starting a Docker container running Java and immediately tailing the logs of that container. I cannot share the image because it is confidential. Let me know if there's anything else I can share that could help you.
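
A rough stand-in for the confidential image, in case it helps others reproduce the pattern (the busybox image and the log loop are assumptions, not the actual workload):

# terminal 1: log-heavy container using the local log driver
docker run -d --name logspam --log-driver local \
  busybox sh -c 'while true; do echo "$(date) filler log line"; done'

# terminal 2: follow the logs right away
docker logs -f logspam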

top -b -o +%MEM | head 
top - 07:06:31 up 22:04,  2 users,  load average: 1.45, 1.62, 1.26
Tasks: 201 total,   1 running, 200 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.8 us,  0.5 sy,  0.0 ni, 98.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 15842444 total,  6400792 free,  7067024 used,  2374628 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  7992948 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28920 root      20   0 6927984 5.475g  47228 S   0.0 36.2   0:26.17 dockerd
29700 root      20   0 8120164 902096  20620 S   0.0  5.7   1:29.36 java
free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        6.7G        6.1G        160M        2.3G        7.6G
Swap:            0B          0B          0B

Daemon configuration:

cat /etc/docker/daemon.json
{
  "live-restore": true,
  "log-driver": "local",
  "log-opts": {
    "max-size": "50m",
    "max-file": "5"
  }
}

Docker info:

docker info
Client:
 Debug Mode: false

Server:
 Containers: 1
  Running: 1
  Paused: 0
  Stopped: 0
 Images: 49
 Server Version: 19.03.8
 Storage Driver: overlay2
  Backing Filesystem: <unknown>
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: local
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: b34a5c8af56e510852c35414db4c1f4fa6172339
 runc version: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.4.0-1072-aws
 Operating System: Ubuntu 16.04.5 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 15.11GiB
 Name: <hostname>
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: true

Additional environment details (AWS, VirtualBox, physical, etc.)

  • AWS EC2 instance
  • Uses local log driver
  • Earlier we had an instance with 4GB of memory, which led to OOM kills, so we switched to an instance type with 16GB of memory for now.

@lmello

@cpuguy83
Collaborator

cpuguy83 commented Apr 8, 2020

I think this is related to excessive allocations rather than an actual leak.
Every time we need to reset our decoding logic (short reads, EOF during follow, etc.) we create a new buffer instead of reusing the existing buffer.
I'm working on a patch for this.

@cpuguy83
Collaborator

cpuguy83 commented Apr 8, 2020

I believe moby/moby#40796 should fix the problem.

@srstsavage

@cpuguy83 Thanks for looking at this. Just to be clear, the initial bug report and my issue description both use the fluentd log driver, and your PR mentions

This only affects json-file and local log drivers.

If that's the case, this issue probably shouldn't be closed by your PR?

@cpuguy83
Collaborator

cpuguy83 commented Apr 8, 2020

Right on, I fixed it.

@sparrc

sparrc commented Jun 9, 2020

@cpuguy83 Is this issue fixed? It's not entirely clear to me if moby/moby#40796 only applies to json-file and local or if it also affects the fluentd log driver.

I see from the PR that most of the code changes are to local and json-file, but there are also changes to a generic logger utility that may have fixed this fluentd issue? (https://github.com/moby/moby/pull/40796/files#diff-0d16783edb4c661112478f7e13a17694)

@cpuguy83
Collaborator

cpuguy83 commented Jun 9, 2020

It is not fixed for fluentd. The logging utility is a shared implementation of a rotating log file used by local and json-file. As you may have guessed, fluentd does not use this.

@flixr

flixr commented Nov 30, 2020

Did the fix for json-file land in docker yet?

@thaJeztah
Member

@flixr the PR that was linked above is not in docker 19.03 (see moby/moby#41130 (review)), but it's in the docker 20.10 release candidates (GA to be released soon as well)

@gp-Airee

Any update on GA?

@thaJeztah
Member

Docker 20.10 was released quite some time ago; is anyone on this thread still running into this with 20.10 (or above)?

@remram44

remram44 commented Jul 7, 2022

I was running docker-ce 5:20.10.14~3-0~ubuntu-focal when I ran into this. Of course it's possible that I mis-diagnosed and this was not the right issue to subscribe to...

It only happened once.

@ceecko
Author

ceecko commented Jul 8, 2022

I confirm this is no longer an issue with 20.10.14

@BenasPaulikas

BenasPaulikas commented Nov 27, 2022

I confirm 20.10.12 is buggy and 20.10.14 is OK: dockerd went from 30GB of RAM to <1GB.

Nice fix! 🎉 🎉

@wangw469

wangw469 commented Jan 16, 2023

@ceecko @BenasPaulikas

I think another issue, moby/moby#43165, is related to high memory usage, and it has been fixed in 20.10.13:

Prevent an OOM when using the “local” logging driver with containers that produce a large amount of log messages moby/moby#43165.

I can reproduce the problem by running (thanks to @aeriksson):

terminal 1

docker run --log-driver local -it --rm --name foo ubuntu sh -c "apt-get update && apt-get install -y nyancat && nyancat"

terminal 2

docker logs -f foo
