
Kernel panic when having a privileged container with docker >= 1.10 #27885

Closed
rata opened this issue Jun 22, 2016 · 28 comments

Labels
priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@rata
Member

rata commented Jun 22, 2016

Hi,

I'm using a privileged container in a kubernetes pod to build images. The container runs docker 1.10.3. I'm using kubernetes 1.2.4 on AWS (set up with kube-up).

From time to time, a node crashes. The output of the last crash is included at the end.

This may be the same bug reported in moby/moby#21081, and it may be related to using docker >= 1.10 on the Debian jessie kernel (although that is not confirmed).

If this is the case, THIS PROBABLY AFFECTS kubernetes 1.3, which is due to be released.

cc @justinsb

[   82.728265] aufs au_opts_verify:1570:docker[1654]: dirperm1 breaks the protection by the permission bits on the lower branch
[   82.760820] aufs au_opts_verify:1570:docker[1635]: dirperm1 breaks the protection by the permission bits on the lower branch
[   82.896108] aufs au_opts_verify:1570:docker[1654]: dirperm1 breaks the protection by the permission bits on the lower branch
[   82.928699] aufs au_opts_verify:1570:docker[1654]: dirperm1 breaks the protection by the permission bits on the lower branch
[   82.992993] aufs au_opts_verify:1570:docker[1673]: dirperm1 breaks the protection by the permission bits on the lower branch
[   83.385415] aufs au_opts_verify:1570:docker[1691]: dirperm1 breaks the protection by the permission bits on the lower branch
[   83.480134] aufs au_opts_verify:1570:docker[1691]: dirperm1 breaks the protection by the permission bits on the lower branch
[   83.592429] aufs au_opts_verify:1570:docker[1744]: dirperm1 breaks the protection by the permission bits on the lower branch
[   84.002341] aufs au_opts_verify:1570:docker[1689]: dirperm1 breaks the protection by the permission bits on the lower branch
[   84.083000] aufs au_opts_verify:1570:docker[1516]: dirperm1 breaks the protection by the permission bits on the lower branch
[   84.140267] aufs au_opts_verify:1570:docker[1516]: dirperm1 breaks the protection by the permission bits on the lower branch
[   84.219145] aufs au_opts_verify:1570:docker[1801]: dirperm1 breaks the protection by the permission bits on the lower branch
[   84.252038] aufs au_opts_verify:1570:docker[1801]: dirperm1 breaks the protection by the permission bits on the lower branch
[   84.293019] aufs au_opts_verify:1570:docker[1805]: dirperm1 breaks the protection by the permission bits on the lower branch
[   84.581778] aufs au_warn_loopback:122:loop1[1857]: you may want to try another patch for loopback file on ext4(0xef53) branch
[   84.603270] divide error: 0000 [#1] SMP 
[   84.604057] Modules linked in: dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c xt_statistic xt_nat xt_mark ipt_REJECT xt_tcpudp xt_comment loop veth binfmt_misc sch_htb ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables x_tables nf_nat nf_conntrack bridge stp llc aufs(C) nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc crc32_pclmul ppdev aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd evdev psmouse serio_raw parport_pc ttm parport drm_kms_helper drm i2c_piix4 i2c_core processor button thermal_sys autofs4 ext4 crc16 mbcache jbd2 btrfs xor raid6_pq dm_mod ata_generic crct10dif_pclmul crct10dif_common xen_netfront xen_blkfront crc32c_intel ata_piix libata scsi_mod floppy
[   84.609355] CPU: 1 PID: 1853 Comm: docker Tainted: G         C    3.16.0-4-amd64 #1 Debian 3.16.7-ckt20-1+deb8u4
[   84.609355] Hardware name: Xen HVM domU, BIOS 4.2.amazon 05/12/2016
[   84.609355] task: ffff8801e3657470 ti: ffff8801e47a8000 task.ti: ffff8801e47a8000
[   84.609355] RIP: 0010:[<ffffffffa0577200>]  [<ffffffffa0577200>] pool_io_hints+0xf0/0x1a0 [dm_thin_pool]
[   84.609355] RSP: 0018:ffff8801e47abbc8  EFLAGS: 00010246
[   84.609355] RAX: 0000000000010000 RBX: ffff8801e4736840 RCX: ffff8801c2662000
[   84.609355] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8801e48c4080
[   84.609355] RBP: ffff8801e47abc10 R08: 0000000000000000 R09: 0000000000000000
[   84.609355] R10: 0000000000000000 R11: 0000000000000246 R12: ffffffffa057f5c8
[   84.609355] R13: 0000000000000001 R14: ffff8801e47abc90 R15: 0000000000000131
[   84.609355] FS:  00007ff465daf700(0000) GS:ffff8801efc20000(0000) knlGS:0000000000000000
[   84.609355] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   84.609355] CR2: 000000c207f1c3fb CR3: 00000001e2a5a000 CR4: 00000000001406e0
[   84.609355] Stack:
[   84.609355]  ffffffff810a7c71 0000000043e06d70 ffffc9000115f040 0000000000000000
[   84.609355]  0000000043e06d70 ffffc9000115f040 0000000000000000 ffff8800e9da3800
[   84.609355]  ffffffffa00ba615 000fffffffffffff 00000000ffffffff 00000000000000ff
[   84.609355] Call Trace:
[   84.609355]  [<ffffffff810a7c71>] ? complete+0x31/0x40
[   84.609355]  [<ffffffffa00ba615>] ? dm_calculate_queue_limits+0x95/0x130 [dm_mod]
[   84.609355]  [<ffffffffa00b7ec3>] ? dm_swap_table+0x73/0x320 [dm_mod]
[   84.609355]  [<ffffffffa00b0101>] ? crc_t10dif_generic+0x101/0x1000 [crct10dif_common]
[   84.609355]  [<ffffffffa00bd0d0>] ? table_load+0x330/0x330 [dm_mod]
[   84.609355]  [<ffffffffa00bd165>] ? dev_suspend+0x95/0x220 [dm_mod]
[   84.609355]  [<ffffffffa00bda55>] ? ctl_ioctl+0x205/0x430 [dm_mod]
[   84.609355]  [<ffffffffa00bdc8f>] ? dm_ctl_ioctl+0xf/0x20 [dm_mod]
[   84.609355]  [<ffffffff811ba99f>] ? do_vfs_ioctl+0x2cf/0x4b0
[   84.609355]  [<ffffffff810d485e>] ? SyS_futex+0x6e/0x150
[   84.609355]  [<ffffffff811bac01>] ? SyS_ioctl+0x81/0xa0
[   84.609355]  [<ffffffff81513ecd>] ? system_call_fast_compare_end+0x10/0x15
[   84.609355] Code: 0f 84 a5 00 00 00 3b 96 10 06 00 00 49 c7 c4 c8 f5 57 a0 77 26 8b b6 18 06 00 00 89 d0 c1 e0 09 48 39 f0 0f 82 92 00 00 00 31 d2 <48> f7 f6 85 d2 74 2d 49 c7 c4 70 f5 57 a0 66 90 48 89 e6 e8 28 
[   84.609355] RIP  [<ffffffffa0577200>] pool_io_hints+0xf0/0x1a0 [dm_thin_pool]
[   84.609355]  RSP <ffff8801e47abbc8>
[   84.770467] ---[ end trace fcce781faebae9ce ]---
[   84.773018] Kernel panic - not syncing: Fatal exception
[   84.775963] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[    6.096097] xenbus_probe_frontend: Waiting for devices to initialise: 25s...20s...15s...
[   17.402123] reboot: Failed to start orderly shutdown: forcing the issue
[   17.407629] xenbus: xenbus_dev_shutdown: device/vif/0: Initialising != Connected, skipping
[   17.412875] xenbus: xenbus_dev_shutdown: device/vbd/51744: Initialising != Connected, skipping
[   17.417585] xenbus: xenbus_dev_shutdown: device/vbd/51712: Initialising != Connected, skipping
[   17.421263] xenbus: xenbus_dev_shutdown: device/vfb/0: Initialised != Connected, skipping
[   17.424839] ACPI: Preparing to enter system sleep state S5
[   17.427112] reboot: Power down
@rata rata changed the title Kernel panic when having a privileged container with docker 1.10 Kernel panic when having a privileged container with docker >= 1.10 Jun 22, 2016
@dchen1107 dchen1107 added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Jun 22, 2016
@girishkalele

@rata

The Docker issue you pointed to is a different kernel panic; we need to find out whether your panic has been reported before.

[   84.603270] divide error: 0000 [#1] SMP 
[   84.609355] Call Trace:
[   84.609355]  [<ffffffff810a7c71>] ? complete+0x31/0x40
[   84.609355]  [<ffffffffa00ba615>] ? dm_calculate_queue_limits+0x95/0x130 [dm_mod]
[   84.609355]  [<ffffffffa00b7ec3>] ? dm_swap_table+0x73/0x320 [dm_mod]
[   84.609355]  [<ffffffffa00b0101>] ? crc_t10dif_generic+0x101/0x1000 [crct10dif_common]
[   84.609355]  [<ffffffffa00bd0d0>] ? table_load+0x330/0x330 [dm_mod]
[   84.609355]  [<ffffffffa00bd165>] ? dev_suspend+0x95/0x220 [dm_mod]
[   84.609355]  [<ffffffffa00bda55>] ? ctl_ioctl+0x205/0x430 [dm_mod]
[   84.609355]  [<ffffffffa00bdc8f>] ? dm_ctl_ioctl+0xf/0x20 [dm_mod]
[   84.609355]  [<ffffffff811ba99f>] ? do_vfs_ioctl+0x2cf/0x4b0
[   84.609355]  [<ffffffff810d485e>] ? SyS_futex+0x6e/0x150
[   84.609355]  [<ffffffff811bac01>] ? SyS_ioctl+0x81/0xa0
[   84.609355]  [<ffffffff81513ecd>] ? system_call_fast_compare_end+0x10/0x15

and the one reported in the thread is a NULL pointer dereference with a different stack trace.

Mar 10 03:01:10 node01 kernel: [1691882.846915] BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
Mar 10 03:01:10 node01 kernel: [1691882.846982] IP: [<ffffffff810a2c38>] pick_next_task_fair+0x6b8/0x820
Mar 10 03:01:10 node01 kernel: [1691882.847028] PGD 0 
Mar 10 03:01:10 node01 kernel: [1691882.856551] Call Trace:
Mar 10 03:01:10 node01 kernel: [1691882.856585]  [<ffffffff8101b975>] ? sched_clock+0x5/0x10
Mar 10 03:01:10 node01 kernel: [1691882.856622]  [<ffffffff8150fed6>] ? __schedule+0x106/0x700
Mar 10 03:01:10 node01 kernel: [1691882.856660]  [<ffffffff8108ea86>] ? smpboot_thread_fn+0xc6/0x190
Mar 10 03:01:10 node01 kernel: [1691882.856698]  [<ffffffff8108e9c0>] ? SyS_setgroups+0x170/0x170
Mar 10 03:01:10 node01 kernel: [1691882.856736]  [<ffffffff8108805d>] ? kthread+0xbd/0xe0
Mar 10 03:01:10 node01 kernel: [1691882.856772]  [<ffffffff81087fa0>] ? kthread_create_on_node+0x180/0x180
Mar 10 03:01:10 node01 kernel: [1691882.856811]  [<ffffffff81513c58>] ? ret_from_fork+0x58/0x90
Mar 10 03:01:10 node01 kernel: [1691882.856848]  [<ffffffff81087fa0>] ? kthread_create_on_node+0x180/0x180

@Random-Liu
Member

It seems that @jfrazelle encountered both issues before.
https://gist.github.com/jfrazelle/df0667df1be407ef96c2

@rata @girishkalele

@jessfraz
Contributor

Ah, that's a bad kernel, I remember that. I think there is a minor release update for it in Ubuntu that is much, much better.

@dchen1107
Member

@rata Thanks for reporting the issue. @Random-Liu and I looked at the initial docker issue, and it looks like there are several kernel panics; both docker 1.10.X and docker 1.11.X on various kernel versions are affected. So far I haven't observed the same failure in our jenkins tests; it could be that we paper over the issue somehow. Anyway, we should make the problem visible to the end users first, and help with the debugging and fix, since it might affect our Kubernetes 1.3 users.

Here is the plan I am thinking of:

  1. Document this as a known issue for both docker 1.10.X and docker 1.11.X in our release notes.
  2. Update NodeProblemDetector's manifest file to catch the kernel dumps above, so that after the node comes up, NodeProblemDetector can report a KernelCrash event back to upstream components and users can understand why their applications were restarted (a sketch of such a rule is shown below).
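
For illustration, a rough sketch of such a kernel-monitor rule. The field names follow node-problem-detector's kernel monitor config, but the schema differs between versions and other fields (plugin, log path, conditions) are omitted here, so treat this as a sketch rather than the shipped config; the pattern simply matches the first line of the panic above.

{
  "source": "kernel-monitor",
  "rules": [
    {
      "type": "temporary",
      "reason": "KernelOops",
      "pattern": "divide error: 0000 \\[#\\d+\\] SMP"
    }
  ]
}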

@dchen1107 dchen1107 added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Jun 22, 2016
@Random-Liu
Member

@dchen1107 SGTM! :)

@dchen1107
Member

xref: docker 1.10.X (#19720), docker 1.11.X (#23397)

@rata
Member Author

rata commented Jun 22, 2016

@girishkalele ohh, sorry. I was in a hurry and they looked similar; I didn't check in detail because I didn't have the time.

@dchen1107: thanks!

Is there any way to have some confidence that upgrading to k8s 1.3 won't cause many issues with nodes crashing because of this? I mean, when thinking about upgrading my production cluster to 1.3, would I need to create a new cluster on 1.3, run things there (only to test k8s) for a few weeks, and then maybe upgrade? There is no downgrade procedure, right?

Also, just curious: is it a problem if docker 1.9 continues to be used, or does 1.3 use some features that require docker > 1.9? Just to know if that is an option too, until the problem is better understood.

Maybe the bug is caused by the storage driver being used (and only affects that driver). My container was using debian:jessie, with docker installed from docker's apt repositories and the daemon just started. I'm on a mobile connection right now, so I can't check the driver easily. I can check it in a few hours (like 6 hours) when I'm home again.

@Random-Liu
Member

Also, just curious: is it a problem if docker 1.9 continues to be used, or does 1.3 use some features that require docker > 1.9? Just to know if that is an option too, until the problem is better understood.

1.9 should still be supported.

Maybe the bug is caused by the storage driver being used (and only affects that driver). My container was using debian:jessie, with docker installed from docker's apt repositories and the daemon just started. I'm on a mobile connection right now, so I can't check the driver easily. I can check it in a few hours (like 6 hours) when I'm home again.

I think it's aufs based on the kernel log and how you installed docker. :)

@dchen1107
Member

Yes, 1.9.1 is still compatible with the Kubernetes 1.3 release.

@rata
Member Author

rata commented Jun 23, 2016

@Random-Liu @dchen1107: awesome, thanks! I'll try using another storage driver and report back if I hit it or not :-)

@rata
Member Author

rata commented Jun 23, 2016

It seems kubernetes 1.2.4 in AWS uses docker with AUFS:

root      5825  2.0  0.7 2117088 60908 ?       Ssl  May27 787:20 /usr/bin/docker daemon -H fd:// -s aufs -g /mnt/ephemeral/docker --bridge=cbr0 --iptables=false --ip-masq=false --log-level=warn

Is this the case on GKE and GCE too?

I'll check what the default storage driver is in k8s 1.3.
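
For reference, a quick way to confirm which storage driver a node's docker daemon is using (shown here only as an illustration):

docker info | grep "Storage Driver"
# e.g. Storage Driver: aufs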

@dchen1107
Member

@rata, Kubernetes today supports three different storage drivers: aufs, overlayfs, and devicemapper. On both GKE and GCE, Kubernetes is using aufs. We are switching to overlayfs through a new containervm image (gci), but have just started this process.
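
For context, the driver is whatever the docker daemon was started with (the -s aufs flag in the ps output above); switching a node to overlayfs amounts to changing that flag. How the flag is injected depends on the image and tooling, so the following is only an illustration:

# e.g. in /etc/default/docker on a Debian/Ubuntu node
DOCKER_OPTS="--storage-driver=overlay"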

@rata
Member Author

rata commented Jun 23, 2016

@dchen1107: thanks for the info. It seems it's difficult for me to use another storage driver: the kube-up setup on AWS uses aufs, and when nodes crash they are recreated with aufs again, so it is not easy to use another driver without modifying the Auto Scaling Group.

@Random-Liu
Member

Random-Liu commented Jun 24, 2016

@rata @dchen1107 @girishkalele
FYI, the node problem detector v0.2 should be able to report the kernel panic to the control plane as an event.
This will at least surface the problem to the user.

See kubernetes/node-problem-detector#22
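
Once the node is back up with the detector running, the reported event should be visible with standard tooling, for example (illustrative commands; the exact event reason depends on the detector config):

kubectl describe node <node-name>                 # check the Events section
kubectl get events --all-namespaces | grep -i kernel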

@philips
Contributor

philips commented Jun 27, 2016

I am confused about what is going on here.

During the release burndown @mike-saparov mentioned that we are considering recommending Docker v1.9 for k8s v1.3 because of this Kernel bug. However, it seems more reasonable to document it and ask the distros to patch their Kernels.

Can someone give an update on what the current thinking is for the release?

@rata
Member Author

rata commented Jun 27, 2016

@philips: Maybe trying to reproduce it will help you get a better idea? That's the only thing I can add; the rest of this message is mostly about that and nothing else, so feel free to ignore it :-)

I can easily reproduce this using a pod with two containers: a) a privileged container with debian jessie running docker 1.11.2 or 1.10.3 from the docker repos (it happens with both), and b) the docker-gc branch "fixes" from https://github.com/rata/docker-gc (actually, I realize the repo at work has a small script that sleeps and runs docker-gc in an infinite loop, and that is what runs). If I instead use a pod with only one container, with docker >= 1.10 installed on debian jessie listening as a daemon and used over the network to build docker images (just like the two-container pod, but without docker-gc the cache is not deleted), it also crashes after a few days. But with docker-gc it crashes way faster.

I can upload the Dockerfiles and yamls used if someone wants them

I'm not sure if this bug has been fixed upstream, or if @jfrazelle, who also saw this, knows a workaround, nor whether the tests and other people are using a newer docker version without issues. Maybe the bug is related to something docker-gc does and is unlikely to happen otherwise, but to the best of my knowledge that is not known. Also, kubernetes deletes docker images when there is not enough free space; I'm not sure if that (or something else kubernetes might do that I don't know about) makes it more likely to happen.

I won't have time to try to fix the kernel bug (or try newer kernels and see whether it still happens) these days, but I'm happy to help someone reproduce it, upload the Dockerfiles and deployments I use, etc.

@dchen1107
Member

@philips We haven't recommended Docker 1.9 for k8s v1.3 yet, and what we discussed at the burndown meeting has nothing to do with this issue. For this one, we plan to document it and suggest that users of the node-problem-detector upgrade their detector so that the kernel issues are visible to end users, who can then understand why their applications are being restarted or why their nodes are being rebooted.

At the burndown meeting, we talked about the 1.3 blocker issue #27691. The engineers suspected the issue is in either a Kubernetes component (we changed the entire code path for 1.3) or the docker runtime code. To narrow it down, we decided to run some tests against docker 1.9.1 and kubernetes 1.3 beta.

@mike-saparov

@dchen1107 thanks for the clarification!

@Random-Liu
Member

Random-Liu commented Jun 29, 2016

XREF #27076

@xinuc

xinuc commented Aug 4, 2016

this error still happens with Docker 1.12 btw.

@rata
Member Author

rata commented Aug 4, 2016

Just in case it's useful to someone, I worked around this by basically writing to an external volume.

The pod that builds docker images now uses an EBS volume mounted at /var/lib/docker, and this issue has never happened again (so far, at least). This makes sense, as it seemed to be an aufs-related issue and docker is no longer using aufs to write images.
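
A minimal sketch of that kind of pod spec, assuming an awsElasticBlockStore volume; the pod name, image, and volume ID are placeholders rather than the actual manifest:

apiVersion: v1
kind: Pod
metadata:
  name: image-builder               # placeholder name
spec:
  containers:
  - name: docker-builder
    image: debian:jessie            # placeholder; runs the inner docker daemon
    securityContext:
      privileged: true
    volumeMounts:
    - name: docker-graph
      mountPath: /var/lib/docker    # inner docker writes to the EBS volume, not aufs
  volumes:
  - name: docker-graph
    awsElasticBlockStore:
      volumeID: vol-0123456789abcdef0   # placeholder EBS volume ID
      fsType: ext4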

@xinuc

xinuc commented Aug 5, 2016

We downgraded kubernetes to 1.2.6 but kept using docker 1.12, and the problem disappeared.

So it's a kubernetes 1.3 issue.

@rata
Member Author

rata commented Aug 5, 2016

On Fri, Aug 05, 2016 at 04:15:11AM -0700, Nugroho Herucahyono wrote:

We downgraded kubernetes to 1.2.6 but kept using docker 1.12, and the problem disappeared.

So it's a kubernetes 1.3 issue.

A kernel bug seems more like a kernel issue :)

What kernel version? Can you upgrade your kernel and see if the issue persists with k8s 1.3?

@chy168

chy168 commented Oct 15, 2016

the same error happened:

aufs au_opts_verify:1597:dockerd[4583]: dirperm1 breaks the protection by the permission bits on the lower branch

server info:
Kubernetes: 1.3.6
Linux kernel: 4.4.0-38-generic
Docker version: 1.12.1

@cmluciano

Anyone still observing this behavior?

@rata
Member Author

rata commented Mar 1, 2017

@cmluciano Using the workaround I posted, it doesn't happen. And it seems that with newer kernels it also doesn't happen. Are you seeing it? Which k8s, docker, and kernel versions?

@cmluciano

I have not; I'm wondering if this issue should be closed.

@rata
Member Author

rata commented Mar 1, 2017

@cmluciano oh, good point. Will close it, it can be reopened if relevant. Thanks!

@rata rata closed this as completed Mar 1, 2017