Docker causes system freeze on Ubuntu 14.10 #10355

Closed
relgames opened this issue Jan 26, 2015 · 24 comments

Comments

@relgames

Starting multiple Docker containers hangs the system.
I'm not sure what the exact steps are, but I have seen this behaviour several times.

Jan 26 15:57:27 oleg kernel: [257250.221647] device vethf7a6cc6 entered promiscuous mode
Jan 26 15:57:27 oleg kernel: [257250.221822] IPv6: ADDRCONF(NETDEV_UP): vethf7a6cc6: link is not ready
Jan 26 15:57:27 oleg kernel: [257250.271640] IPv6: ADDRCONF(NETDEV_CHANGE): vethf7a6cc6: link becomes ready
Jan 26 15:57:27 oleg kernel: [257250.271692] docker0: port 1(vethf7a6cc6) entered forwarding state
Jan 26 15:57:27 oleg kernel: [257250.271705] docker0: port 1(vethf7a6cc6) entered forwarding state
Jan 26 15:57:28 oleg kernel: [257251.014089] docker0: port 1(vethf7a6cc6) entered disabled state
Jan 26 15:57:28 oleg kernel: [257251.015661] device vethf7a6cc6 left promiscuous mode
Jan 26 15:57:28 oleg kernel: [257251.015677] docker0: port 1(vethf7a6cc6) entered disabled state
Jan 26 15:57:30 oleg kernel: [257252.550674] device veth7707973 entered promiscuous mode
Jan 26 15:57:30 oleg kernel: [257252.551075] IPv6: ADDRCONF(NETDEV_UP): veth7707973: link is not ready
Jan 26 15:57:30 oleg kernel: [257252.598878] IPv6: ADDRCONF(NETDEV_CHANGE): veth7707973: link becomes ready
Jan 26 15:57:30 oleg kernel: [257252.598919] docker0: port 1(veth7707973) entered forwarding state
Jan 26 15:57:30 oleg kernel: [257252.598935] docker0: port 1(veth7707973) entered forwarding state
Jan 26 15:57:45 oleg kernel: [257267.637453] docker0: port 1(veth7707973) entered forwarding state

Here it hangs. Only powering the machine off and on with the power button helps. Even the kernel reset keys do not work ( https://en.wikipedia.org/wiki/Magic_SysRq_key ).

Jan 26 15:58:43 oleg kernel: [ 0.000000] Initializing cgroup subsys cpuset
Jan 26 15:58:43 oleg kernel: [ 0.000000] Initializing cgroup subsys cpu
Jan 26 15:58:43 oleg kernel: [ 0.000000] Initializing cgroup subsys cpuacct
Jan 26 15:58:43 oleg kernel: [ 0.000000] Linux version 3.16.0-29-generic (buildd@tipua) (gcc version 4.9.1 (Ubuntu 4.9.1-16ubuntu6) ) #39-Ubuntu SMP Mon Dec 15 22:27:29 UTC 2014 (Ubuntu 3.16.0-29.39-gen
Jan 26 15:58:43 oleg kernel: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.16.0-29-generic.efi.signed root=/dev/mapper/ubuntu--vg-root ro

I'm not a Linux guru, so let me know where else I should look for logs, dumps, etc.
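
A note on the Magic SysRq keys mentioned above: whether they respond depends partly on the `kernel.sysrq` sysctl, which can restrict the allowed functions with a bitmask. The sketch below decodes that value using the bit meanings from the kernel's sysrq documentation. The function names are made up for illustration, and a fully locked-up kernel may ignore the keys whatever the setting is, so this only rules out one possible cause.

```python
from pathlib import Path
from typing import List, Optional

# Bit values documented for the kernel.sysrq sysctl.
SYSRQ_BITS = {
    2: "console log level control",
    4: "keyboard control (SAK, unraw)",
    8: "debugging dumps of processes",
    16: "sync",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}


def decode_sysrq(value: int) -> List[str]:
    """Return the SysRq functions enabled by a kernel.sysrq value."""
    if value == 0:  # 0 disables SysRq completely
        return []
    if value == 1:  # 1 enables every function
        return list(SYSRQ_BITS.values())
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]


def read_sysrq(path: str = "/proc/sys/kernel/sysrq") -> Optional[int]:
    """Read the current setting, or None if it is unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    current = read_sysrq()
    if current is None:
        print("kernel.sysrq is not readable on this system")
    else:
        print(current, decode_sysrq(current))
```

If the reboot function (bit 128) is missing from the decoded list, the reset key combination would be refused even on a healthy system.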

@relgames
Author

Looks like it also caused FS corruption; some files are lost. Not cool.

@phemmer
Contributor

phemmer commented Jan 26, 2015

If you can provide docker info and docker version, it would be helpful.
Also if you have the last few lines (50 or so) of the docker log when this happens, that would also be great. This can likely be found at /var/log/upstart/docker.log.

Also, are you sure this is a docker issue? If the system completely freezes, that sounds like a kernel panic. If this is a cloud hosted system, you can try getting the console output from the cloud hosting console (or whatever the provider has), as the kernel panic output won't be in filesystem logs.
Edit: just noticed, it sounds like you have physical access, so not a cloud system.
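
For anyone gathering the requested log excerpt, here is a minimal sketch that keeps only the last 50 lines of a file. The path is the one suggested in this comment and may differ on other setups; the helper name `tail` is just illustrative.

```python
from collections import deque


def tail(path, count=50):
    """Return the last `count` lines of a text file as a list of strings."""
    with open(path, "r", errors="replace") as handle:
        # A bounded deque discards older lines as newer ones are read.
        return list(deque(handle, maxlen=count))


if __name__ == "__main__":
    # Path taken from the comment above; adjust it if your logs live elsewhere.
    log_path = "/var/log/upstart/docker.log"
    try:
        print("".join(tail(log_path)), end="")
    except OSError as exc:
        print(f"could not read {log_path}: {exc}")
```

Reading the file through a bounded deque keeps memory use flat even when the daemon log is very large.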

@anandkumarpatel
Contributor

+1 I have seen this 2 times in the past month
syslog

Jan 24 01:05:05 main-gyarados kernel: [684812.792585] docker0: port 22(veth170a52e) entered disabled state
Jan 24 01:05:05 main-gyarados kernel: [684812.793708] device veth170a52e left promiscuous mode
Jan 24 01:05:05 main-gyarados kernel: [684812.793715] docker0: port 22(veth170a52e) entered disabled state
Jan 24 01:05:41 main-gyarados dd.collector[1522]: INFO (collector.py:379): Finished run #35160. Collection time: 4.8s. Emit time: 0.02s
Jan 24 01:06:03 main-gyarados kernel: [684871.351109] audit_printk_skb: 228 callbacks suppressed
Jan 24 01:06:03 main-gyarados kernel: [684871.351113] type=1400 audit(1422061563.840:408230): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=30184 comm="lsof" requested_mask="trace" denied_mask="trace" peer="docker-default"
Jan 24 01:06:03 main-gyarados kernel: [684871.351148] type=1400 audit(1422061563.840:408231): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=30184 comm="lsof" requested_mask="trace" denied_mask="trace" peer="docker-default"
Jan 24 01:06:03 main-gyarados kernel: [684871.351181] type=1400 audit(1422061563.840:408232): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=30184 comm="lsof" requested_mask="trace" denied_mask="trace" peer="docker-default"
Jan 24 01:06:03 main-gyarados kernel: [684871.351213] type=1400 audit(1422061563.840:408233): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=30184 comm="lsof" requested_mask="trace" denied_mask="trace" peer="docker-default"
Jan 24 01:06:03 main-gyarados kernel: [684871.351270] type=1400 audit(1422061563.840:408234): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=30184 comm="lsof" requested_mask="read" denied_mask="read" peer="docker-default"
Jan 24 01:06:03 main-gyarados kernel: [684871.351308] type=1400 audit(1422061563.840:408235): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=30184 comm="lsof" requested_mask="read" denied_mask="read" peer="docker-default"
Jan 24 01:06:03 main-gyarados kernel: [684871.351321] type=1400 audit(1422061563.840:408236): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=30184 comm="lsof" requested_mask="read" denied_mask="read" peer="docker-default"
Jan 24 01:06:03 main-gyarados kernel: [684871.351347] type=1400 audit(1422061563.840:408237): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=30184 comm="lsof" requested_mask="read" denied_mask="read" peer="docker-default"
Jan 24 01:06:03 main-gyarados kernel: [684871.351424] type=1400 audit(1422061563.840:408238): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=30184 comm="lsof" requested_mask="read" denied_mask="read" peer="docker-default"
Jan 24 01:06:03 main-gyarados kernel: [684871.351495] type=1400 audit(1422061563.840:408239): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=30184 comm="lsof" requested_mask="read" denied_mask="read" peer="docker-default"
Jan 24 01:06:08 main-gyarados dd.forwarder[1521]: INFO (transaction.py:158): No transaction to flush during flush #229960
Jan 24 01:55:03 main-gyarados rsyslogd: [origin software="rsyslogd" swVersion="7.4.4" x-pid="1286" x-info="http://www.rsyslog.com"] start
Jan 24 01:55:03 main-gyarados rsyslogd-2307: warning: ~ action is deprecated, consider using the 'stop' statement instead [try http://www.rsyslog.com/e/2307 ]
Jan 24 01:55:03 main-gyarados rsyslogd: rsyslogd's groupid changed to 104
Jan 24 01:55:03 main-gyarados rsyslogd: rsyslogd's userid changed to 101
Jan 24 01:55:03 main-gyarados kernel: [    0.000000] Initializing cgroup subsys cpuset
Jan 24 01:55:03 main-gyarados kernel: [    0.000000] Initializing cgroup subsys cpu
Jan 24 01:55:03 main-gyarados kernel: [    0.000000] Initializing cgroup subsys cpuacct
Jan 24 01:55:03 main-gyarados kernel: [    0.000000] Linux version 3.13.0-29-generic (buildd@toyol) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #53-Ubuntu SMP Wed Jun 4 21:00:20 UTC 2014 (Ubuntu 3.13.0-29.53-generic 3.13.11.2)

my sysinfo

docker -D info
Containers: 96
Images: 8537
Storage Driver: aufs
 Root Dir: /docker/aufs
 Dirs: 8733
Execution Driver: native-0.2
Kernel Version: 3.13.0-29-generic
Operating System: Ubuntu 14.04.1 LTS
CPUs: 32
Total Memory: 58.81 GiB
Name: main-gyarados
ID: 734L:23AA:VSEA:DINN:ZTA2:OHCP:UHZV:RQAQ:DWGE:UVA3:IAMI:RAIJ
Debug mode (server): false
Debug mode (client): true
Fds: 98
Goroutines: 97
EventsListeners: 1
Init Path: /usr/bin/docker
Docker Root Dir: /docker
WARNING: No swap limit support
uname -a
Linux main-gyarados 3.13.0-29-generic #53-Ubuntu SMP Wed Jun 4 21:00:20 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

@unclejack
Contributor

@anandkumarpatel That kernel is ancient. The latest is 3.13.0-44 and it fixes quite a lot of bugs. It also most likely includes the fix for a data corruption bug we've known about and discussed in #7229.

@relgames kernel 3.16 might also be affected by that bug, but I'm not sure. Either way, you should make sure you've installed all system updates on your system.
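
To check whether a machine is behind a given Ubuntu kernel build, the release strings can be compared numerically. This is a rough sketch that assumes the Ubuntu-style `X.Y.Z-ABI-flavour` format seen in this thread; the function names are illustrative, and the comparison only makes sense within the same kernel series.

```python
import platform
import re


def parse_kernel_release(release):
    """Turn '3.13.0-29-generic' into (3, 13, 0, 29); the flavour suffix is ignored."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def is_older(running, reference):
    """True if `running` sorts before `reference` when compared field by field."""
    return parse_kernel_release(running) < parse_kernel_release(reference)


if __name__ == "__main__":
    running = platform.release()
    try:
        print(running, "older than 3.13.0-44:", is_older(running, "3.13.0-44"))
    except ValueError as exc:
        print(exc)
```

With the values from this thread, `is_older("3.13.0-29-generic", "3.13.0-44")` evaluates to `True`.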

@relgames
Author

$ docker info
Containers: 286
Images: 539
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Dirs: 1136
Execution Driver: native-0.2
Kernel Version: 3.16.0-29-generic
Operating System: Ubuntu 14.10
CPUs: 4
Total Memory: 15.63 GiB
Name: oleg
ID: FZVQ:WCJ5:M5XM:KPBO:ANO3:ODWM:I7P2:3I3F:APBY:OQGN:M4KH:PIUA
WARNING: No swap limit support
$ docker version
Client version: 1.4.1
Client API version: 1.16
Go version (client): go1.3.3
Git commit (client): 5bc2ff8
OS/Arch (client): linux/amd64
Server version: 1.4.1
Server API version: 1.16
Go version (server): go1.3.3
Git commit (server): 5bc2ff8
$ cat /etc/issue
Ubuntu 14.10 \n \l
$ uname -a
Linux oleg 3.16.0-29-generic #39-Ubuntu SMP Mon Dec 15 22:27:29 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

This is a real machine, not a cloud instance.

Not sure if it is related, I will try to reproduce tomorrow, but I also connected my phone to the USB port around that time:

Jan 26 15:54:22 oleg kernel: [257065.256839] docker0: port 1(veth9f61071) entered disabled state
Jan 26 15:54:22 oleg NetworkManager[1148]:    SCPlugin-Ifupdown: devices removed (path: /sys/devices/virtual/net/veth9f61071, iface: veth9f61071)
Jan 26 15:56:03 oleg kernel: [257165.672095] usb 2-1.6.4: new high-speed USB device number 11 using ehci-pci
Jan 26 15:56:03 oleg kernel: [257165.767343] usb 2-1.6.4: New USB device found, idVendor=04e8, idProduct=6860
Jan 26 15:56:03 oleg kernel: [257165.767356] usb 2-1.6.4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
Jan 26 15:56:03 oleg kernel: [257165.767359] usb 2-1.6.4: Product: SAMSUNG_Android
Jan 26 15:56:03 oleg kernel: [257165.767362] usb 2-1.6.4: Manufacturer: SAMSUNG
Jan 26 15:56:03 oleg kernel: [257165.767364] usb 2-1.6.4: SerialNumber: 2b1097cf
Jan 26 15:56:03 oleg colord: Device added: sysfs-SAMSUNG-SAMSUNG_Android
Jan 26 15:56:07 oleg colord-sane: io/hpmud/pp.c 627: unable to read device-id ret=-1
Jan 26 15:57:27 oleg kernel: [257250.221647] device vethf7a6cc6 entered promiscuous mode
Jan 26 15:57:27 oleg kJan 26 15:58:43 oleg rsyslogd: [origin software="rsyslogd" swVersion="7.4.4" x-pid="674" x-info="http://www.rsyslog.com"] start
Jan 26 15:58:43 oleg rsyslogd: rsyslogd's groupid changed to 103
Jan 26 15:58:43 oleg rsyslogd: rsyslogd's userid changed to 100
Jan 26 15:58:43 oleg kernel: [    0.000000] Initializing cgroup subsys cpuset

Also last lines from docker log:

2015/01/26 15:54:20 http: response.WriteHeader on hijacked connection
INFO[257102] GET /v1.12/containers/079831439b9cdd3e2f55899f68982a573805d6df48604e3f9fbddeda77892fc6/json
INFO[257102] +job container_inspect(079831439b9cdd3e2f55899f68982a573805d6df48604e3f9fbddeda77892fc6)
INFO[257102] -job container_inspect(079831439b9cdd3e2f55899f68982a573805d6df48604e3f9fbddeda77892fc6) = OK (0)
INFO[257103] GET /v1.12/containers/079831439b9cdd3e2f55899f68982a573805d6df48604e3f9fbddeda77892fc6/json
INFO[257103] +job container_inspect(079831439b9cdd3e2f55899f68982a573805d6df48604e3f9fbddeda77892fc6)
INFO[257103] -job container_inspect(079831439b9cdd3e2f55899f68982a573805d6df48604e3f9fbddeda77892fc6) = OK (0)
INFO[257105] POST /v1.12/containers/079831439b9cdd3e2f55899f68982a573805d6df48604e3f9fbddeda77892fc6/kill
INFO[257105] +job kill(079831439b9cdd3e2f55899f68982a573805d6df48604e3f9fbddeda77892fc6)
INFO[257105] +job log(die, 079831439b9cdd3e2f55899f68982a573805d6df48604e3f9fbddeda77892fc6, f024d621128b)
INFO[257105] -job log(die, 079831439b9cdd3e2f55899f68982a573805d6df48604e3f9fbddeda77892fc6, f024d621128b) = OK (0)
INFO[257105] +job release_interface(079831439b9cdd3e2f55899f68982a573805d6df48604e3f9fbddeda77892fc6)
INFO[257105] -job release_interface(079831439b9cdd3e2f55899f68982a573805d6df48604e3f9fbddeda77892fc6) = OK (0)
INFO[257105] -job logs(079831439b9cdd3e2f55899f68982a573805d6df48604e3f9fbddeda77892fc6) = OK (0)
INFO[257105] +job log(kill, 079831439b9cdd3e2f55899f68982a573805d6df48604e3f9fbddeda77892fc6, f024d621128b)
INFO[257105] -job log(kill, 079831439b9cdd3e2f55899f68982a573805d6df48604e3f9fbddeda77892fc6, f024d621128b) = OK (0)
INFO[257105] -job kill(079831439b9cdd3e2f55899f68982a573805d6df48604e3f9fbddeda77892fc6) = OK (0)
INFO[257274] GET /v1.16/containers/json
INFO[257274] +job containers()
INFO[257274] -job containers() = OK (0)
INFO[257289] POST /v1.12/build
INFO[257289] +job build()
INFO[257290] +job allocate_interface(25f7f692a95ad27e1de21534a7e69ab1aa9eaf754718124bc90b046ad3b420ca)
INFO[257290] -job allocate_interface(25f7f692a95ad27e1de21534a7e69ab1aa9eaf754718124bc90b046ad3b420ca) = OK (0)
INFO[257290] +job log(start, 25f7f692a95ad27e1de21534a7e69ab1aa9eaf754718124bc90b046ad3b420ca, 81697aeac1fc)
INFO[257290] -job log(start, 25f7f692a95ad27e1de21534a7e69ab1aa9eaf754718124bc90b046ad3b420ca, 81697aeac1fc) = OK (0)
INFO[257290] +job loINFO[0000] +job serveapi(unix:///var/run/docker.sock)
INFO[0000] Listening for HTTP on unix (/var/run/docker.sock)
INFO[0000] +job init_networkdriver()
INFO[0000] -job init_networkdriver() = OK (0)
INFO[0000] WARNING: Your kernel does not support cgroup swap limit.
INFO[0000] Loading containers: start.

INFO[0000] Loading containers: done.
INFO[0000] docker daemon: 1.4.1 5bc2ff8; execdriver: native-0.2; graphdriver: aufs
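
The syslog excerpts in this thread show the kernel's bracketed uptime counter dropping back to 0.000000 right after the hang. A small sketch for locating that boundary in a larger syslog, so the last kernel message before each reboot is easy to find, might look like this. The regular expression assumes the `kernel: [ seconds ]` layout shown above, and `find_reboots` is an illustrative name.

```python
import re

# Matches the uptime stamp in lines such as "... kernel: [257267.637453] ...".
UPTIME = re.compile(r"kernel: \[\s*(\d+\.\d+)\]")


def find_reboots(lines):
    """Yield (last_line_before, first_line_after) wherever kernel uptime resets."""
    previous_uptime = None
    previous_line = None
    for line in lines:
        match = UPTIME.search(line)
        if match is None:
            continue  # not a kernel message, e.g. rsyslogd or NetworkManager
        uptime = float(match.group(1))
        if previous_uptime is not None and uptime < previous_uptime:
            yield previous_line, line
        previous_uptime, previous_line = uptime, line


if __name__ == "__main__":
    try:
        with open("/var/log/syslog", errors="replace") as handle:
            for before, after in find_reboots(handle):
                print("last before reboot:", before.rstrip())
                print("first after reboot:", after.rstrip())
    except OSError as exc:
        print(f"could not read syslog: {exc}")
```

This only shows where the log stops. It cannot tell a clean reboot from a freeze, so the timestamps still need to be read by hand.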

@unclejack
Contributor

@relgames If you want something more likely to be stable, you should probably try Ubuntu 14.04. Ubuntu 14.10 is going to be replaced by Ubuntu 15.04 soon, so I wouldn't expect some hidden bug in kernel 3.16 to be fixed soon (edit: I'm not saying it wouldn't get fixed, just that it might have lower priority for a fix).

@relgames
Author

Updated to the latest kernel 3.18.3, will see how it goes.

@haldean

haldean commented Apr 13, 2015

I'm seeing a similar thing (same kernel, same syslog messages), but the client can't connect to the daemon at all; it fails with "read unix /var/run/docker.sock: connection reset by peer. Are you trying to connect to a TLS-enabled daemon without TLS?". This all started happening after a reboot. Did you find a fix?

@TrustNoOne

It looks very similar to my issue, but I'm not sure it's the same thing

#13940

If you install crashdump (https://help.ubuntu.com/lts/serverguide/kernel-crash-dump.html), we can check whether it's the same thing.

@assimovt

I am having a similar issue on Ubuntu 14.04. My dedicated server just hangs and I need to reboot it. What can I do to help debug this issue? The output from docker -D info:

Containers: 366
Images: 738
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 1470
 Dirperm1 Supported: false
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.13.0-52-generic
Operating System: Ubuntu 14.04.2 LTS
CPUs: 8
Total Memory: 31.36 GiB
Name: plateau
ID: HVH3:2PFC:SNZA:6QEX:BLOS:OZOF:K2AG:WHSI:ND6J:AESY:LMD5:BV6B
WARNING: No swap limit support

@harlov

harlov commented Sep 5, 2015

Similar issue on Ubuntu 14.04.3, kernel 3.19. All networking freezes for 1-2 minutes, then everything returns to normal. This repeats at 10-15 minute intervals.

docker -D info:

Containers: 40
Images: 523
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 603
 Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.19.0-25-generic
Operating System: Ubuntu 14.04.3 LTS
CPUs: 8
Total Memory: 31.29 GiB
Name: crosspromo-inpgngno01
ID: EOO2:DMGR:POHZ:VLKM:6KKZ:DX6S:5XIN:X4DA:22HH:L62N:FCI5:O32P
WARNING: No swap limit support

@rbjorklin

Ping @thaJeztah
Running Ubuntu 14.04.4, fully patched, with docker 1.10.2, we just had 6 out of 7 virtual machines (VMware) freeze completely at pretty much the same time in our dev environment. The console provided by VMware was completely unresponsive. We have hundreds of VMs running and I've never observed this behavior before. I'd like to blame docker, but currently I have no hard proof. I'm at home now with no access, so the docker output below isn't 100% accurate but is very close. This is definitely a blocking issue before we can consider docker ready for production use. I'll update this post with the actual server output as well as log output first thing Monday morning.

$ uname -a
Linux vagrant-ubuntu-trusty-64 3.13.0-79-generic #123-Ubuntu SMP Fri Feb 19 14:27:58 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

$ sudo docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: 1.10.2
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 0
 Dirperm1 Supported: false
Execution Driver: native-0.2
Logging Driver: json-file
Plugins:
 Volume: local
 Network: host bridge null
Kernel Version: 3.13.0-79-generic
Operating System: Ubuntu 14.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 1.463 GiB
Name: vagrant-ubuntu-trusty-64
ID: PSS7:3JN2:NTAG:3DXP:5KUA:JGK3:C2ZY:4XC5:BDZ2:LI2C:ED6T:Q5CA
WARNING: No swap limit support

$ sudo docker version
Client:
 Version:      1.10.2
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   c3959b1
 Built:        Mon Feb 22 21:37:01 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.10.2
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   c3959b1
 Built:        Mon Feb 22 21:37:01 2016
 OS/Arch:      linux/amd64

@thaJeztah
Member

@rbjorklin thanks for reporting, please provide as much information as possible (e.g. How are containers started, what kind of processes are started in the container, amount of logging, etc.). Note that there has been a kernel issue with aufs, but that should be fixed in 3.13.0-79.123 (see #18180 (comment)).

When did you encounter the hang? Were those machines fresh installs or just upgraded from 1.9.x?

@rbjorklin

@thaJeztah We are running Marathon on top of Mesos, so containers are started by the Mesos slave. All containers run the official tomcat image with a bash script as ENTRYPOINT that traps SIGTERM to handle signals nicely. Inside the container we also run the zabbix-agent to poll JMX values and report back. Pretty much all logging is sent out of the container to logstash with gelf. Tomcat is using this to get its logs out. We encountered #18180, but this is different and far more serious, since the entire machines froze.

Sometime between 15.00 & 16.00 CET today (2016-03-11). The machines were upgraded to 1.10 the day it was released (2016-02-04) and then upgraded to 1.10.2 about two weeks ago.

@thaJeztah
Member

@rbjorklin if you're running Mesos, also be sure to upgrade to 1.10.3; 1.10.3 carries a patch that affected users running Mesos (see #19950). Did these hangs start after upgrading to 1.10.2? If so, can you open a new issue (Monday would be fine if you have access to those logs) to start "fresh"?

@ninchan8328

I have the same problem too with kernel

root:/var/log/upstart# docker info
Containers: 0
Images: 393
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 427
Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.16.0-43-generic
Operating System: Ubuntu 14.04.3 LTS
CPUs: 6
Total Memory: 11.74 GiB
Name: rcdn6-vm85-144
ID: FSVF:IIJG:YG7C:64RX:IGKC:LBVT:WA4A:65MM:PP4F:3ASJ:6CUM:KDAC
No Proxy: localhost, 127.0.0.1, ::1
/var/run/docker.sock, .docker
WARNING: No swap limit support

root:/var/log/upstart# docker --version
Docker version 1.8.2, build 0a8c2e3

is it fixed in any release yet?

@cpuguy83
Member

@ninchan8328 What problem? When are you seeing problems? What is your daemon configuration?

@ZaZaLee

ZaZaLee commented Feb 21, 2017

is this problem solved? I had the same problem...
Client:
Version: 1.9.1
API version: 1.21
Go version: go1.4.3
Git commit: a34a1d5
Built: Fri Nov 20 17:56:04 UTC 2015
OS/Arch: linux/amd64

Server:
Version: 1.9.1
API version: 1.21
Go version: go1.4.3
Git commit: a34a1d5
Built: Fri Nov 20 17:56:04 UTC 2015
OS/Arch: linux/amd64

@thaJeztah
Member

@ZaZaLee this looks to be a kernel issue, so make sure you have your kernel up to date.

@daveoncode

I have the same problem with Ubuntu 17.04 and a recent kernel:

Containers: 45
Running: 37
Paused: 0
Stopped: 8
Images: 55
Server Version: 17.06.0-ce
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 338
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfb82a876ecc11b5ca0977d1733adbe58599088a
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.10.0-19-generic
Operating System: Ubuntu 17.04
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.33GiB
Name: dave-xps
ID: EFRY:MN32:DD2S:RLVB:EDUG:PACU:PJSI:SCCA:WB6N:SG24:HDSS:WOAM
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

@thaJeztah
Member

@daveoncode can you open a new issue with details, steps to reproduce and relevant daemon, system logs?

@adaiguoguo

adaiguoguo commented Sep 1, 2017

I have the same problem with CentOS 7 and kernel 3.10.0-514.26.2.el7.x86_64:
docker info

Containers: 2
Running: 2
Paused: 0
Stopped: 0
Images: 15
Server Version: 17.06.0-101
Storage Driver: devicemapper
Pool Name: docker-253:2-131081-pool
Pool Blocksize: 65.54kB
Base Device Size: 17.18GB
Backing Filesystem: xfs
Data file: /dev/vg/docker-data
Metadata file: /dev/vg/docker-metadata
Data Space Used: 7.671GB
Data Space Total: 161.1GB
Data Space Available: 153.4GB
Metadata Space Used: 20.16MB
Metadata Space Total: 2.5GB
Metadata Space Available: 2.48GB
Thin Pool Minimum Free Space: 16.11GB
Udev Sync Supported: true
Deferred Removal Enabled: true
Deferred Deletion Enabled: true
Deferred Deleted Device Count: 0
Library Version: 1.02.135-RHEL7 (2016-11-16)
Logging Driver: syslog
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 3addd840653146c90a254301d6c3a663c7fd6429
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-514.26.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 56
Total Memory: 125.4GiB
Name: xg-mesos-94
ID: X6CP:GUEV:HCJ3:P3CE:4CC4:YT2Y:UU73:E7GH:IZWB:WENM:JM3X:CPMU
Docker Root Dir: /data/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false

WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

vmcore-dmesg

[2525425.910118] device veth087bf86 entered promiscuous mode
[2525425.910217] IPv6: ADDRCONF(NETDEV_UP): veth087bf86: link is not ready
[2525425.999730] IPv6: ADDRCONF(NETDEV_CHANGE): veth087bf86: link becomes ready
[2525425.999760] dockermesos: port 6(veth087bf86) entered forwarding state
[2525425.999767] dockermesos: port 6(veth087bf86) entered forwarding state
[2525426.071628] dockermesos: port 6(veth087bf86) entered disabled state
[2525426.093428] dockermesos: port 6(veth087bf86) entered disabled state
[2525426.093655] device veth087bf86 left promiscuous mode
[2525426.093663] dockermesos: port 6(veth087bf86) entered disabled state
[2525426.114573] XFS (dm-8): Unmounting Filesystem
[2525904.005206] NMI watchdog: Watchdog detected hard LOCKUP on cpu 20

I used crash to analyze the vmcore and found this RIP line:

crash> bt
PID: 0 TASK: ffff8810e9f40000 CPU: 20 COMMAND: "swapper/20"
#0 [ffff88203ef859f0] machine_kexec at ffffffff81059beb
#1 [ffff88203ef85a50] __crash_kexec at ffffffff81105822
#2 [ffff88203ef85b20] panic at ffffffff81680541
#3 [ffff88203ef85ba0] nmi_panic at ffffffff81085abf
#4 [ffff88203ef85bb0] watchdog_overflow_callback at ffffffff8112f879
#5 [ffff88203ef85bc8] __perf_event_overflow at ffffffff81174d2e
#6 [ffff88203ef85c00] perf_event_overflow at ffffffff81175974
#7 [ffff88203ef85c10] intel_pmu_handle_irq at ffffffff81009d88
#8 [ffff88203ef85e38] perf_event_nmi_handler at ffffffff8168ed6b
#9 [ffff88203ef85e58] nmi_handle at ffffffff816901b7
#10 [ffff88203ef85eb0] do_nmi at ffffffff816903c3
#11 [ffff88203ef85ef0] end_repeat_nmi at ffffffff8168f5d3
[exception RIP: distribute_cfs_runtime+114]
RIP: ffffffff810d18b2 RSP: ffff88203ef83e60 RFLAGS: 00000002
RAX: ffff88062868ef00 RBX: ffff88103fad6c40 RCX: 0000000000000008
RDX: ffff88062868ef00 RSI: 0000000000000038 RDI: ffff88203f3d6c40
RBP: ffff88203ef83e98 R8: ffffffff816b89e0 R9: 0000000000000001
R10: ffff880422e23d60 R11: 0000000000000000 R12: 00000000005d8595
R13: ffff881b44ae3240 R14: 0008f94bdad518ae R15: ffff88062868ee00
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- ---
#12 [ffff88203ef83e60] distribute_cfs_runtime at ffffffff810d18b2
#13 [ffff88203ef83ea0] sched_cfs_period_timer at ffffffff810d1acf
#14 [ffff88203ef83ed8] __hrtimer_run_queues at ffffffff810b4d72
#15 [ffff88203ef83f30] hrtimer_interrupt at ffffffff810b5310
#16 [ffff88203ef83f80] local_apic_timer_interrupt at ffffffff81051037
#17 [ffff88203ef83f98] smp_apic_timer_interrupt at ffffffff81699f0f
#18 [ffff88203ef83fb0] apic_timer_interrupt at ffffffff8169845d
--- ---
#19 [ffff8810e9f4bde8] apic_timer_interrupt at ffffffff8169845d
[exception RIP: native_safe_halt+6]
RIP: ffffffff81060fe6 RSP: ffff8810e9f4be98 RFLAGS: 00000286
RAX: 00000000ffffffed RBX: ffff88203ef8d080 RCX: 0100000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000046
RBP: ffff8810e9f4be98 R8: 0000000000000000 R9: 0000000000000000
R10: ffff880422e23d60 R11: 0000000000000000 R12: 0008f9633d776a80
R13: ffff88203ef8fde0 R14: 778ffcdd0527890b R15: 0000000000000082
ORIG_RAX: ffffffffffffff10 CS: 0010 SS: 0018
#20 [ffff8810e9f4bea0] default_idle at ffffffff810347ff
#21 [ffff8810e9f4bec0] arch_cpu_idle at ffffffff81035146
#22 [ffff8810e9f4bed0] cpu_startup_entry at ffffffff810e82f5
#23 [ffff8810e9f4bf28] start_secondary at ffffffff8104f0da
crash> dis -l ffffffff810d18b2
/usr/src/debug/kernel-3.10.0-514.26.2.el7/linux-3.10.0-514.26.2.el7.x86_64/kernel/sched/fair.c: 3434
0xffffffff810d18b2 <distribute_cfs_runtime+114>: mov %rbx,%rdi
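
When comparing vmcores from several hosts, it can help to pull out just the `[exception RIP: ...]` lines from the `bt` output. The following is a minimal sketch, assuming the backtrace text has been saved to a string in the format shown above; `exception_rips` is an illustrative name, not part of the crash utility.

```python
import re

# Matches lines such as "[exception RIP: distribute_cfs_runtime+114]".
EXCEPTION_RIP = re.compile(r"\[exception RIP: ([\w.]+)\+(\d+)\]")


def exception_rips(backtrace):
    """Return (symbol, offset) pairs for every exception RIP line, in order."""
    return [
        (symbol, int(offset))
        for symbol, offset in EXCEPTION_RIP.findall(backtrace)
    ]


if __name__ == "__main__":
    sample = """
    #11 [ffff88203ef85ef0] end_repeat_nmi at ffffffff8168f5d3
    [exception RIP: distribute_cfs_runtime+114]
    #19 [ffff8810e9f4bde8] apic_timer_interrupt at ffffffff8169845d
    [exception RIP: native_safe_halt+6]
    """
    for symbol, offset in exception_rips(sample):
        print(f"{symbol}+{offset}")
```

In the trace above, the first entry (`distribute_cfs_runtime`) is the code that was running when the NMI watchdog reported the hard lockup; the second is the idle loop that the timer interrupt had interrupted.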

@antonio-petricca

Me too:

  • Laptop Dell E5450, 8 GB RAM
  • Linux Mint 18.3 (Ubuntu 16.04), Swap

Containers: 5
Running: 0
Paused: 0
Stopped: 5
Images: 6
Server Version: 18.03.0-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfd04396dc68220d1cecbe686a6cc3aa5ce3667c
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.15.13-041513-generic
Operating System: Linux Mint 18.3
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.564GiB
Name: svd026p15s
ID: UTTI:4PTV:RVAL:5Z56:DL5V:Z3VU:7GEK:Q6DQ:BP6G:Q7ZY:2TRL:TPUG
Docker Root Dir: /var/lib/docker
Debug Mode (client): true
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false

@AkihiroSuda
Member

I'm closing this. If somebody is still hitting this, please open a new issue and also consider contacting the distro's kernel maintainers.

Note that a system hang can happen for many different reasons.
