Docker Daemon Hangs under load #13885
Comments
Would you happen to have a reduced testcase to show this? Perhaps a small Dockerfile for what's in the container, and a bash script that does the work of starting/stopping/... the containers?
The container is on Docker Hub: kinvey/blrunner#v0.3.8. We're using the remote API with the following calls: create, start (container.start), remove.
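The create/start/remove cycle described here can be sketched as a shell function. This is an illustrative reconstruction, not the reporter's actual script: the function name and the use of the docker CLI (rather than the raw remote API) are assumptions; only the image tag comes from the thread.

```shell
#!/bin/sh
# Hypothetical sketch of one churn cycle: create, start, wait for exit,
# then force-remove a short-lived container. DOCKER defaults to the docker
# CLI but can be overridden for testing.
DOCKER=${DOCKER:-docker}

churn_once() {
  cid=$($DOCKER create kinvey/blrunner:v0.3.8) || return 1
  $DOCKER start "$cid" >/dev/null || return 1
  $DOCKER wait "$cid" >/dev/null
  $DOCKER rm -f "$cid" >/dev/null
}
```

Running eight of these loops concurrently would approximate the reported load of 8 containers per instance.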
Hmm, are you seeing excessive resource usage?
Not particularly in regards to excessive resource usage... But the early symptom is docker completely hanging while other processes hum along happily. Important to note: we only have 8 containers running at any one time on any one instance.
Captured some stats where docker is no longer responsive. lsof | wc -l shows 1025. However, an error appears several times. Sample output of top:
top - 00:16:53 up 12:22, 2 users, load average: 2.01, 2.05, 2.05
24971 kinvey 20 0 992008 71892 10796 S 1.3 1.9 9:11.93 node
@mjsalinger The setup you're using is unsupported: you're running Ubuntu 14.04 with a custom kernel. Where does that 3.18.0-031800 kernel come from? Did you notice that this kernel build is outdated? The kernel you're using was built in December last year. I'm sorry, but there's nothing to be debugged here. This issue might actually be a kernel bug related to overlay, or some other already-fixed kernel bug which is no longer an issue in the latest 3.18 kernel. I'm going to close this issue. Please try again with an up-to-date 3.18 or newer kernel and check whether you still run into the problem. Please keep in mind that there are multiple issues open against overlay, and that you'll probably experience problems with overlay even after updating to the latest kernel and the latest Docker version.
@unclejack @cpuguy83 @LK4D4 Please reopen this issue. The configuration we're using was specifically recommended by the Docker team, and arrived at through experimentation. We've tried newer kernels (3.19+), and they have a kernel panic bug of some kind that we were running into - so the advice was to go with a pre-December 3.18, because a known kernel bug introduced after that caused the panic we were hitting, and to my knowledge it has not yet been fixed. As for OverlayFS, it was also presented to me as the ideal filesystem for Docker after we experienced numerous performance problems with AUFS. If this isn't a supported configuration, can someone help me find a performant, stable configuration that will work for this use case? We've been pushing to get this stable for several months.
@mjsalinger Can you provide the inode usage for the volume that overlay is running on?
Thanks for reopening. If the answer is a different kernel, that's fine - I just want to get to a stable scenario.
df -i /var/lib/docker
Filesystem Inodes IUsed IFree IUse% Mounted on
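For reference, inode headroom can be checked with a small helper; overlay duplicates files between layers, so inode exhaustion can appear while `df -h` still looks healthy. The helper name is illustrative; the path is the one discussed in the thread.

```shell
#!/bin/sh
# inode_pct prints the IUse% column (without the % sign) for the filesystem
# backing the given path, so overlay inode exhaustion can be spotted early.
# -P forces POSIX output so the columns never wrap onto a second line.
inode_pct() {
  df -Pi "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}
# Example (path from the thread): inode_pct /var/lib/docker
```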
Overlay still has a bunch of problems: https://github.com/docker/docker/issues?q=is%3Aopen+is%3Aissue+label%3A%2Fsystem%2Foverlay I wouldn't use overlay in production. Others have commented on this issue tracker that they're using AUFS in production and that it's been stable for them. Kernel 3.18 is unsupported on Ubuntu 14.04; Canonical doesn't provide support for it.
AUFS in production is not performant at all, and has been anything but stable for us - we would routinely run into I/O bottlenecks, freezes, etc. Switching to overlay resolved all of the above issues; we only have this one issue remaining. See also: http://developerblog.redhat.com/2014/09/30/overview-storage-scalability-docker/ It seems that overlay is being presented as the driver of choice by the community in general. Fine if it's not stable yet, but neither is AUFS, and there has to be some way to get the performance and stability I need with Docker. I'm all for trying new things, but in our previous configurations (AUFS on Ubuntu 12.04 and AUFS on Ubuntu 14.04) we could get neither stability nor performance. At least with overlay we get good performance and better stability - we just need to resolve this one problem.
@mjsalinger I'd recommend using Ubuntu 14.04 with the latest kernel 3.13 packages. I'm using that myself and I haven't run into any of those problems at all.
@unclejack Tried that, and ran into the issues specified above under heavy, high-volume usage (creating/destroying lots of containers); AUFS was incredibly non-performant. So that's not an option.
@mjsalinger Are you using upstart to start docker? What does /etc/init/docker.conf look like?
Yes, using upstart. /etc/init/docker.conf
We are also now seeing the below when running any Docker command on some instances, for example...
FATA[0000] Get http:///var/run/docker.sock/v1.18/containers/json?all=1: dial unix /var/run/docker.sock: resource temporarily unavailable. Are you trying to connect to a TLS-enabled daemon without TLS?
@mjsalinger That's just a poor error message. In most cases it means the daemon crashed.
@mjsalinger What do the docker daemon logs say during this?
Frozen, no new entries coming in. Here are the last entries in the log:
@cpuguy83 Was the log helpful at all?
@mjsalinger Makes me think there's a deadlock somewhere, since there's nothing else indicating an issue.
@cpuguy83 That would make sense given the symptoms. Is there anything I can do to help trace this issue further and find where it comes from?
Maybe we can get an strace to see that it's actually hanging on a lock.
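Two probes are commonly combined for this kind of diagnosis: a goroutine stack dump from the daemon and an strace of its threads. The helper below is only a sketch; the pid lookup shown in the comments is an assumption about how the daemon was started.

```shell
#!/bin/sh
# dump_stacks sends SIGUSR1 to the given pid; the Docker daemon responds by
# writing a full goroutine stack dump to its log, which shows which
# goroutine holds or waits on a mutex.
dump_stacks() {
  kill -USR1 "$1"
}

# Against a live daemon (hypothetical invocations):
#   dump_stacks "$(pidof docker)"
#   strace -f -p "$(pidof docker)" -e trace=futex   # threads stuck in futex()
```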
Ok, will work to see if we can get that. We wanted to try 1.7 first; we did, but did not notice any improvement.
@cpuguy83 From one of the affected hosts:
@cpuguy83 Any ideas?
Seeing the following in 1.7, with containers not being killed/started. This seems to be a precursor to the problem. (Note: we didn't see these errors in 1.6, but did see a volume of dead containers start to build up, even though a command was issued to kill/remove them.)
@mblaschke I looked over your traces (https://gist.github.com/tonistiigi/0fb0abfb068a89975c072b68e6ed07ce for a better view). I can't find anything suspicious in there, though. All long-running goroutines are from open io copies; that is normal if there are running containers or execs, as these goroutines don't hold any locks. From the traces I would expect that other commands would still work.
The warnings you have in the logs should be fixed with https://github.com/docker/containerd/pull/351 . They should only cause log spam and not be anything serious. Because debug logging is not enabled, I can't see whether any suspicious commands were sent to the daemon. There don't seem to be any meaningful logs for minutes before you took the stacktrace.
The same code works with Docker 1.10.3, but not after Docker 1.11.x. The serverspec tests fail randomly with timeouts.
@mblaschke I had a look at the trace too; it really looks like the exec is just not finishing its IO. What exec program are you executing? Does it fork new processes within its own session id?
We are executing (docker exec). We are seeing the hanging problem on a regular basis within our production cluster.
@mlaventure Tests are here (but they need some environment settings for execution): you could try it with our code base:
@GameScripting I'm getting confused now 😅. Are you referring to the same context that @mblaschke is running from? If not, which docker version are you using? And to answer your question: no, it's unlikely that ip addr would do anything like this. Which image are you using? What is the exact docker exec command being used?
Sorry, it was not my intention to make this more confusing. The main issue in resolving this bug is that no one has yet been able to come up with stable, reproducible steps to trigger the hang. It seems @mblaschke found something, so he is able to reliably trigger the bug.
I use CoreOS stable (1185.3.0) with Docker 1.11.2, but I've found a workaround until you find a solution: I use ctr directly. The next docker exec example:
can be translated to ctr:
So you can translate scripts to ctr temporarily as a workaround.
@mblaschke The commands you posted did fail for me, but they don't look like docker failures: https://gist.github.com/tonistiigi/86badf5a41dff3fe53bd68d8e83e4ec4 Could you enable debug logs? The master build also stores more debug data about daemon internals, and allows tracing containerd as well with SIGUSR1 signals, since we are tracking a stuck process.
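On the Ubuntu/upstart setups discussed earlier in this thread, debug logging is typically switched on via /etc/default/docker; treat this as a sketch, since the exact packaging and file location vary by distro and version.

```shell
# /etc/default/docker (Ubuntu packaging) -- add -D to the daemon options to
# turn on debug logging, then restart the daemon (e.g. sudo restart docker).
DOCKER_OPTS="-D"
```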
@tonistiigi Sorry :(
@mblaschke Still doesn't look like a docker issue. It is hanging on
And this HipHopVM doesn't support
@tonistiigi I've searched the build logs and found one test failure (a random issue, not in the hhvm tests) when using 1.12.3. We will continue to stress Docker and try to find the issue.
@coolljt0725 I just came back to work to find Docker hung again.
As soon as that completed, the other two sessions unblocked. I was checking disk space because that had been a problem once before, so it was the first place I looked. The two semaphores may have been the two calls I had, but there were many hung
An example of hung output after I did the
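A cheap way to detect this hung state automatically is to wrap status commands in a deadline. The function below is an illustrative sketch (the name and threshold are assumptions), usable as, e.g., check_hung 10 docker ps:

```shell
#!/bin/sh
# check_hung runs the given command under a deadline and reports a hang if
# it fails to finish in time. GNU timeout exits with 124 when the deadline
# expires, which lands in the failure branch along with ordinary errors.
check_hung() {
  secs=$1
  shift
  if ! timeout "$secs" "$@" >/dev/null 2>&1; then
    echo "command did not finish within ${secs}s: $*" >&2
    return 1
  fi
}
```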
Docker 1.11.2 hanging, stack trace: https://gist.github.com/Calpicow/871621ba807d6eb9b18b91e8c2eb4eef
The trace from @Calpicow seems to be stuck on devicemapper, but it doesn't look like the udev_wait case. @coolljt0725 @rhvgoyal
@Calpicow Do you have some
I have a huge SIGUSR1 dump below as well (it didn't fit here). https://gist.github.com/ahmetalpbalkan/34bf40c02a78e319eaf5710acb15cf9a
It looks like I have a ton (like 700) of these goroutines:
@ahmetalpbalkan You look to be blocked waiting on a netlink socket to return.
@cpuguy83 Yeah, I saw coreos/bugs#254, which looked similar to my case; however, I don't see the "waiting" messages in the kernel logs that the person there and you mentioned. It looks like 1.12.5 has not even hit the CoreOS alpha stream yet. Is there a kernel/docker version I can downgrade to and have it working?
@ahmetalpbalkan Yay, for another kernel bug.
Is it known what exactly the bug is? Was the kernel bug reported upstream? Or is there a kernel version where this bug has been fixed?
@GameScripting The issue will have to be reported to whatever distro this was produced on, and as you can see, we have more than one issue causing the same effect here as well.
Here's another one with Docker v1.12.3. Relevant syslog:
@Calpicow Thanks, yours looks like devicemapper has stalled.
Can you open a separate issue with all the details?
Let me close this ticket for now, as it looks like it went stale.
Docker Daemon hangs under heavy load. The scenario is starting/stopping/killing/removing many containers per second - high utilization. Containers expose one port and are run detached, without logging. A container receives an incoming TCP connection, does some work, sends a response, and then exits. An outside process cleans up by killing/removing the container and starting a new one.
I cannot get docker info from an actual hung instance, as once it is hung I can't get docker to run without a reboot. The info below is from one of the instances that has had the problem, after a reboot.
We also have instances where something locks up completely and the instance cannot even be ssh'd into. This usually happens some time after the docker lockup occurs.
Docker Info
Containers: 8
Images: 65
Storage Driver: overlay
Backing Filesystem: extfs
Execution Driver: native-0.2
Kernel Version: 3.18.0-031800-generic
Operating System: Ubuntu 14.04.2 LTS
CPUs: 2
Total Memory: 3.675 GiB
Name:
ID: FAEG:2BHA:XBTO:CNKH:3RCA:GV3Z:UWIB:76QS:6JAG:SVCE:67LH:KZBP
WARNING: No swap limit support
Docker Version
Client version: 1.6.0
Client API version: 1.18
Go version (client): go1.4.2
Git commit (client): 4749651
OS/Arch (client): linux/amd64
Server version: 1.6.0
Server API version: 1.18
Go version (server): go1.4.2
Git commit (server): 4749651
OS/Arch (server): linux/amd64
uname -a
Linux 3.18.0-031800-generic #201412071935 SMP Mon Dec 8 00:36:34 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 14972
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 14972
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited