
Validated kernel version / filesystem plugin for 1.3? #30706

Closed
justinsb opened this issue Aug 16, 2016 · 33 comments
Assignees
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@justinsb
Member

We're seeing some kernel panics with the default AWS images, which use Debian Jessie & aufs, with k8s 1.3 & Docker 1.11.2.

What is the validated kernel version and filesystem with k8s 1.3?

cc @dchen1107

@k8s-github-robot k8s-github-robot added area/kubelet sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels Aug 16, 2016
@justinsb justinsb added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels Aug 16, 2016
@chrislovecnm
Contributor

Do you have any information on the panic?

@rata
Member

rata commented Aug 17, 2016

@justinsb can you post the panic? (it's available via the AWS console, in general). Are you doing heavy I/O on the docker root volume? Is aufs in the call trace?

I've seen a panic with 1.2 and a newer Docker daemon running in a container: #27885. If that is what you're seeing, my workaround was to use a different backing volume and not write to the container root (but I haven't tried k8s 1.3).

I don't know which kernel version is validated, but following links from the bug above, #25893 seems to be the issue tracking it, and it hasn't been closed yet. I don't know if you had already found it; it took me a while because I was looking at closed issues :)

@justinsb
Member Author

Meta: if anyone knows how to persuade Debian / journald to log panics, that would be appreciated :-)

Thanks for the links @rata - it looks a lot like the Docker issue you linked to (but different from your aufs issue): moby/moby#21081

Panics consistently appear to be in process scheduling. Looking at the kernel changelog, it seems very likely that a newer kernel will fix this (the source file in question changes a lot), but AFAICT GCE doesn't use a newer kernel with containervm either.

[17912.211411] BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
[17912.214609] IP: [<ffffffff8109f950>] check_preempt_wakeup+0xd0/0x1d0
[17912.214609] PGD 2b9c70067 PUD 2b9ef3067 PMD 0 
[17912.214609] Oops: 0000 [#1] SMP 
[17912.214609] Modules linked in: nf_conntrack_netlink nfnetlink ipt_REJECT xt_statistic xt_nat xt_tcpudp xt_recent xt_mark xt_comment iptable_filter veth sch_htb ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype ip_tables xt_conntrack x_tables nf_nat nf_conntrack bridge stp llc aufs(C) nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc intel_rapl crc32_pclmul ppdev aesni_intel aes_x86_64 lrw gf128mul glue_helper evdev ablk_helper cryptd serio_raw ttm drm_kms_helper drm i2c_piix4 parport_pc parport i2c_core processor thermal_sys button autofs4 ext4 crc16 mbcache jbd2 btrfs xor raid6_pq dm_mod ata_generic crct10dif_pclmul crct10dif_common xen_blkfront ata_piix libata crc32c_intel psmouse ixgbevf(O) scsi_mod
[17912.214609] CPU: 1 PID: 20461 Comm: exe Tainted: G         C O  3.16.0-4-amd64 #1 Debian 3.16.7-ckt25-2+deb8u3
[17912.214609] Hardware name: Xen HVM domU, BIOS 4.2.amazon 05/12/2016
[17912.214609] task: ffff8802995c6a20 ti: ffff8802963d4000 task.ti: ffff8802963d4000
[17912.214609] RIP: 0010:[<ffffffff8109f950>]  [<ffffffff8109f950>] check_preempt_wakeup+0xd0/0x1d0
[17912.214609] RSP: 0018:ffff8802963d7e60  EFLAGS: 00010006
[17912.214609] RAX: 0000000000000000 RBX: ffff8800eb160280 RCX: 0000000000000008
[17912.214609] RDX: 0000000000000000 RSI: ffff8802b9e60110 RDI: ffff88040fc72f78
[17912.214609] RBP: 0000000000000000 R08: ffffffff81610960 R09: 0000000000000000
[17912.214609] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800eb2ef630
[17912.214609] R13: ffff88040fc72f00 R14: 0000000000000000 R15: 0000000000000000
[17912.214609] FS:  00007f2f37e13740(0000) GS:ffff88040fc20000(0000) knlGS:0000000000000000
[17912.214609] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[17912.214609] CR2: 0000000000000078 CR3: 00000002a0846000 CR4: 00000000001406e0
[17912.214609] Stack:
[17912.214609]  ffffffff81099d77 ffff88040fc72f00 ffff8802b9e60110 ffff88040fc72f00
[17912.214609]  ffff8802b9e60794 0000000000000246 ffff880087709d40 ffffffff81094565
[17912.214609]  0000000000012f00 ffffffff810969ea 00007fffffffeffd ffff8802b9e60110
[17912.214609] Call Trace:
[17912.214609]  [<ffffffff81099d77>] ? sched_clock_cpu+0x47/0xb0
[17912.214609]  [<ffffffff81094565>] ? check_preempt_curr+0x85/0xa0
[17912.214609]  [<ffffffff810969ea>] ? wake_up_new_task+0xda/0x160
[17912.214609]  [<ffffffff8106698c>] ? do_fork+0x13c/0x390
[17912.214609]  [<ffffffff81514579>] ? stub_clone+0x69/0x90
[17912.214609]  [<ffffffff8151420d>] ? system_call_fast_compare_end+0x10/0x15
[17912.214609] Code: 39 c2 7d 27 0f 1f 80 00 00 00 00 83 e8 01 48 8b 5b 70 39 d0 75 f5 48 8b 7d 78 48 3b 7b 78 74 15 0f 1f 00 48 8b 6d 70 48 8b 5b 70 <48> 8b 7d 78 48 3b 7b 78 75 ee 48 85 ff 74 e9 e8 ec cb ff ff 48 
[17912.332074] RIP  [<ffffffff8109f950>] check_preempt_wakeup+0xd0/0x1d0
[17912.332074]  RSP <ffff8802963d7e60>
[17912.332074] CR2: 0000000000000078
[17912.332074] ---[ end trace a064d4acc250ee12 ]---
[17912.332074] Kernel panic - not syncing: Fatal exception
[17912.332074] Shutting down cpus with NMI
[17912.332074] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range:
[61708.106066] BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
[61708.109974] IP: [<ffffffff8109f9f0>] check_preempt_wakeup+0xd0/0x1d0
[61708.109974] PGD 36ac6067 PUD 8494a067 PMD 0 
[61708.109974] Oops: 0000 [#1] SMP 
[61708.109974] Modules linked in: xt_statistic xt_nat xt_tcpudp xt_recent xt_mark xt_comment sch_htb veth ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack bridge stp llc aufs(C) nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc intel_rapl crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper evdev ablk_helper i2c_piix4 ppdev parport_pc parport cryptd serio_raw ttm drm_kms_helper drm i2c_core processor button thermal_sys autofs4 ext4 crc16 mbcache jbd2 btrfs xor raid6_pq dm_mod ata_generic ata_piix libata xen_blkfront crct10dif_pclmul crct10dif_common scsi_mod crc32c_intel psmouse ixgbevf(O)
[61708.109974] CPU: 1 PID: 17650 Comm: exe Tainted: G         C O  3.16.0-4-amd64 #1 Debian 3.16.7-ckt25-2
[61708.109974] Hardware name: Xen HVM domU, BIOS 4.2.amazon 05/12/2016
[61708.109974] task: ffff8803f9ffea60 ti: ffff880084a64000 task.ti: ffff880084a64000
[61708.109974] RIP: 0010:[<ffffffff8109f9f0>]  [<ffffffff8109f9f0>] check_preempt_wakeup+0xd0/0x1d0
[61708.109974] RSP: 0000:ffff88040fc23d00  EFLAGS: 00010006
[61708.109974] RAX: 0000000000000000 RBX: ffff8800eb325700 RCX: 0000000000000008
[61708.109974] RDX: 0000000000000000 RSI: ffff8802bba00ca0 RDI: ffff88040fc32f78
[61708.109974] RBP: 0000000000000000 R08: ffffffff81610960 R09: 0000000000000001
[61708.109974] R10: 0000000000000000 R11: 0000000000000010 R12: ffff8803f9ffea60
[61708.109974] R13: ffff88040fc32f00 R14: 0000000000000000 R15: 0000000000000000
[61708.109974] FS:  00007f2c711e9740(0000) GS:ffff88040fc20000(0000) knlGS:0000000000000000
[61708.109974] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[61708.109974] CR2: 0000000000000078 CR3: 0000000084b13000 CR4: 00000000001406e0
[61708.109974] Stack:
[61708.109974]  ffffffff8109e902 ffff88040fc32f00 ffff8802bba00ca0 ffff88040fc32f00
[61708.109974]  0000000000000046 0000000000000001 0000000000000001 ffffffff81094605
[61708.109974]  ffff8802bba00ca0 ffffffff81094634 ffff8802bba00ca0 ffff88040fc32f00
[61708.109974] Call Trace:
[61708.109974]  <IRQ> 
[61708.109974]  [<ffffffff8109e902>] ? enqueue_task_fair+0x7f2/0xe20
[61708.109974]  [<ffffffff81094605>] ? check_preempt_curr+0x85/0xa0
[61708.109974]  [<ffffffff81094634>] ? ttwu_do_wakeup+0x14/0xd0
[61708.109974]  [<ffffffff81096f6e>] ? try_to_wake_up+0x1ce/0x2d0
[61708.109974]  [<ffffffff8108ab30>] ? hrtimer_get_res+0x50/0x50
[61708.109974]  [<ffffffff8108ab4e>] ? hrtimer_wakeup+0x1e/0x30
[61708.109974]  [<ffffffff8108b187>] ? __run_hrtimer+0x67/0x1c0
[61708.109974]  [<ffffffff8108b539>] ? hrtimer_interrupt+0xe9/0x220
[61708.109974]  [<ffffffff8100a04a>] ? xen_timer_interrupt+0x2a/0x150
[61708.109974]  [<ffffffffa04ac220>] ? br_handle_frame+0x170/0x240 [bridge]
[61708.109974]  [<ffffffff81391b9d>] ? add_interrupt_randomness+0x3d/0x1f0
[61708.109974]  [<ffffffff810bb375>] ? handle_irq_event_percpu+0x35/0x190
[61708.109974]  [<ffffffff810be84e>] ? handle_percpu_irq+0x3e/0x60
[61708.109974]  [<ffffffff810ba7a6>] ? generic_handle_irq+0x26/0x40
[61708.109974]  [<ffffffff8135acc0>] ? evtchn_2l_handle_events+0x260/0x270
[61708.109974]  [<ffffffff8135860f>] ? __xen_evtchn_do_upcall+0x3f/0x70
[61708.109974]  [<ffffffff8135a27f>] ? xen_evtchn_do_upcall+0x2f/0x50
[61708.109974]  [<ffffffff815167cd>] ? xen_hvm_callback_vector+0x6d/0x80
[61708.109974]  <EOI> 
[61708.109974]  [<ffffffff811656c1>] ? add_mm_counter_fast+0x21/0x30
[61708.109974]  [<ffffffff811682b8>] ? do_set_pte+0x88/0xe0
[61708.109974]  [<ffffffff8113dd42>] ? filemap_map_pages+0x1d2/0x230
[61708.109974]  [<ffffffff810d1f4e>] ? futex_wait+0x17e/0x260
[61708.109974]  [<ffffffff811685b2>] ? do_read_fault.isra.54+0x2a2/0x300
[61708.109974]  [<ffffffff81169b8c>] ? handle_mm_fault+0x63c/0x11c0
[61708.109974]  [<ffffffff8151319e>] ? mutex_lock+0xe/0x2a
[61708.109974]  [<ffffffff81094859>] ? set_task_cpu+0x99/0x1a0
[61708.109974]  [<ffffffff812aa8c0>] ? cpumask_next_and+0x30/0x40
[61708.109974]  [<ffffffff8109bc0f>] ? select_task_rq_fair+0x26f/0x700
[61708.109974]  [<ffffffff810572b7>] ? __do_page_fault+0x177/0x4f0
[61708.109974]  [<ffffffff81094605>] ? check_preempt_curr+0x85/0xa0
[61708.109974]  [<ffffffff81096a8a>] ? wake_up_new_task+0xda/0x160
[61708.109974]  [<ffffffff81066a22>] ? do_fork+0x152/0x390
[61708.109974]  [<ffffffff810d4b9e>] ? SyS_futex+0x6e/0x150
[61708.109974]  [<ffffffff81516a28>] ? page_fault+0x28/0x30
[61708.109974] Code: 39 c2 7d 27 0f 1f 80 00 00 00 00 83 e8 01 48 8b 5b 70 39 d0 75 f5 48 8b 7d 78 48 3b 7b 78 74 15 0f 1f 00 48 8b 6d 70 48 8b 5b 70 <48> 8b 7d 78 48 3b 7b 78 75 ee 48 85 ff 74 e9 e8 ec cb ff ff 48 
[61708.109974] RIP  [<ffffffff8109f9f0>] check_preempt_wakeup+0xd0/0x1d0
[61708.109974]  RSP <ffff88040fc23d00>
[61708.109974] CR2: 0000000000000078
[61708.109974] ---[ end trace 1700754226256280 ]---
[61708.109974] Kernel panic - not syncing: Fatal exception in interrupt
[61708.109974] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation ran

@rata
Member

rata commented Aug 17, 2016

@justinsb cool. Just curious, which kernel in Debian Jessie (e.g. the apt-cache policy output for the kernel package)?

@justinsb
Member Author

justinsb commented Aug 17, 2016

@rata 3.16.7-ckt25-2+deb8u3 and also 3.16.7-ckt25-2

(I think that's what you mean, right?)

@dchen1107 dchen1107 added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Aug 18, 2016
@dchen1107
Member

@justinsb We need to investigate this. We didn't change the minimal kernel version or the Docker version for either the 1.3 or the 1.4 release.

I did observe the same kernel panics before the 1.3 release, and we even introduced a node-problem-detector DaemonSet to make the issue visible. But with Kubernetes 1.3 + Docker 1.11.2 together on containervm e2e tests, I haven't seen such kernel panics.

One possibility is a difference in node configuration. For example, on GCE we switched to kubenet, and don't use Docker's network component at all. How about AWS?

@justinsb
Member Author

Thanks @dchen1107. AWS is not using kubenet (though it should). I can probably upgrade the most affected cluster to use kubenet and see if it makes a difference.

Also, for unknown reasons, we aren't logging the kernel panic into the journald logs. Would the NPD pick up on it if it is not in the logs, but only visible e.g. in uptime? We currently get the hint from the uptime, and then confirm by looking at the AWS console output.

@chrislovecnm
Contributor

cc @fandingo

@dchen1107
Member

@justinsb Today NPD can only pick up the issue from the logs, but which logs it reads, and what format/regex it matches, can be configured easily.

In the kernel panic case, is there any way to get a console screenshot during reboot, e.g. via something like a startup script?
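
For illustration, here is a minimal sketch (Python; the log file name is hypothetical) of the kind of regex matching an NPD-style rule could do against the panic signatures pasted earlier in this thread; it is not the actual node-problem-detector config format.

import re

# Signatures taken from the panic output pasted above in this issue.
PANIC_PATTERNS = [
    re.compile(r"BUG: unable to handle kernel NULL pointer dereference at \S+"),
    re.compile(r"Kernel panic - not syncing: .*"),
]

def find_panics(lines):
    """Return the log lines that match a known kernel-panic signature."""
    return [line.rstrip() for line in lines
            if any(p.search(line) for p in PANIC_PATTERNS)]

if __name__ == "__main__":
    # Hypothetical input: a captured kernel log or console dump saved to a file.
    with open("console-output.txt") as f:
        for hit in find_panics(f):
            print(hit)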

@justinsb
Member Author

Just switched to kubenet and had another panic within a few hours (in the scheduler again).

[10905.003885] BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
[10905.007856] IP: [<ffffffff8109f9f0>] check_preempt_wakeup+0xd0/0x1d0
[10905.007856] PGD 16664b067 PUD 103790067 PMD 0 
[10905.007856] Oops: 0000 [#1] SMP 
[10905.007856] Modules linked in: nf_conntrack_netlink nfnetlink xt_statistic xt_nat xt_recent xt_mark ipt_REJECT xt_tcpudp sch_htb veth xt_comment ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack bridge stp llc aufs(C) nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc evdev ppdev parport_pc parport intel_rapl ttm i2c_piix4 drm_kms_helper crc32_pclmul drm processor thermal_sys i2c_core aesni_intel aes_x86_64 lrw gf128mul glue_helper button ablk_helper serio_raw cryptd autofs4 ext4 crc16 mbcache jbd2 btrfs xor raid6_pq dm_mod ata_generic xen_blkfront crct10dif_pclmul crct10dif_common ata_piix crc32c_intel libata psmouse scsi_mod ixgbevf(O)
[10905.007856] CPU: 3 PID: 19256 Comm: exe Tainted: G         C O  3.16.0-4-amd64 #1 Debian 3.16.7-ckt25-2
[10905.007856] Hardware name: Xen HVM domU, BIOS 4.2.amazon 05/12/2016
[10905.007856] task: ffff880245d80210 ti: ffff88015bc0c000 task.ti: ffff88015bc0c000
[10905.007856] RIP: 0010:[<ffffffff8109f9f0>]  [<ffffffff8109f9f0>] check_preempt_wakeup+0xd0/0x1d0
[10905.007856] RSP: 0018:ffff88015bc0fe60  EFLAGS: 00010006
[10905.007856] RAX: 0000000000000000 RBX: ffff88027c906280 RCX: 0000000000000008
[10905.007856] RDX: 0000000000000000 RSI: ffff880077ca81d0 RDI: ffff88040fc72f78
[10905.007856] RBP: 0000000000000000 R08: ffffffff81610960 R09: 0000000000000001
[10905.007856] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880245d80210
[10905.007856] R13: ffff88040fc72f00 R14: 0000000000000000 R15: 0000000000000000
[10905.007856] FS:  00007f38d545d740(0000) GS:ffff88040fc60000(0000) knlGS:0000000000000000
[10905.007856] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[10905.007856] CR2: 0000000000000078 CR3: 0000000245d9f000 CR4: 00000000001406e0
[10905.007856] Stack:
[10905.007856]  ffffffff81099e58 ffff88040fc72f00 ffff880077ca81d0 ffff88040fc72f00
[10905.007856]  ffff880077ca8854 0000000000000246 ffff88017194b7c0 ffffffff81094605
[10905.007856]  0000000000012f00 ffffffff81096a8a 00007fffffffeffd ffff880077ca81d0
[10905.007856] Call Trace:
[10905.007856]  [<ffffffff81099e58>] ? sched_clock_cpu+0x88/0xb0
[10905.007856]  [<ffffffff81094605>] ? check_preempt_curr+0x85/0xa0
[10905.007856]  [<ffffffff81096a8a>] ? wake_up_new_task+0xda/0x160
[10905.007856]  [<ffffffff81066a0c>] ? do_fork+0x13c/0x390
[10905.007856]  [<ffffffff81514d79>] ? stub_clone+0x69/0x90
[10905.007856]  [<ffffffff81514a0d>] ? system_call_fast_compare_end+0x10/0x15

@justinsb
Member Author

justinsb commented Aug 22, 2016

@dchen1107 I'm going to look at whether we can be sure to collect the panic output into the journald log. I would much rather do that "normally" instead of scraping the console output, but worst case it is a good suggestion!
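
(For anyone trying the journald route: a minimal sketch, assuming the journal is configured with persistent storage so it survives the reboot, that checks the previous boot's kernel messages for the panic signature. The journalctl flags are standard; everything else here is illustrative.)

import subprocess

# Read the previous boot's kernel messages (-k, -b -1) from the persistent
# journal and scan them for the panic signature seen in this issue.
out = subprocess.check_output(
    ["journalctl", "-k", "-b", "-1", "--no-pager"],
    universal_newlines=True,
)

for line in out.splitlines():
    if "unable to handle kernel NULL pointer dereference" in line:
        print(line)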

@justinsb
Member Author

justinsb commented Sep 6, 2016

We've tried a lot of things, and the only thing that seems to work is to run a 4.4 kernel (we are running a 4.4.19 kernel I built). This also entails running overlayfs.

I'm going to work on building an image that offers this as an option, and maybe we can consider making it the default on 1.4 on AWS.

@dchen1107
Member

@justinsb The Google GCI team is currently working with the node team to qualify the GCI image for GKE/GCE, replacing today's Debian-based container-vm image. Since you are considering rebasing your image onto a newer kernel, why not sync with what GCI has, so that we can provide better support for our AWS users? Currently the GCI image is on 4.4.14. cc/ @yinghan @kubernetes/goog-image

@justinsb Could you please provide details on what is not working on your current image? I understand there are kernel panics; unfortunately, we have only seen such panics at a very low rate in-house, whereas in your case they occur very frequently. We need to understand the discrepancy and try to minimize it, so that we can consolidate our effort and provide better service to users in general. Meanwhile, since Kubernetes is an open-source project, we can only harden a minimal kernel version (3.10?) and many other configurations, rather than forcing a homogeneous environment, especially for users running Kubernetes on private clouds.

@justinsb justinsb added this to the v1.4 milestone Sep 6, 2016
@justinsb
Member Author

justinsb commented Sep 7, 2016

@dchen1107 Yes, for the AWS image I will try to sync with GCI as closely as I can, if we go to 4.4 as the default on AWS. Although if you're maintaining 4.4.14 + patches, I do wonder if official upstream (4.4.19) would be better than 4.4.14 without patches. I don't know how different these will be in practice though. My feeling is that testing effort will inevitably focus on 4.4 kernels & overlayfs, if GCI, CoreOS & Ubuntu 16.04 are all on 4.4 kernels, and remaining on an earlier kernel is just going to cause more and more problems.

The panics are as I reported previously in the thread. The stack traces do not change significantly: always in the scheduler, always segfaults at a small offset from 0x0. I have not been able to track down any root cause other than the m4 instance types, which have slightly newer chips and typically more cores. I originally thought it was a particular version of the ixgbevf driver, but the evidence does not appear to support that (unless all the versions I tried are bad and the version in the 4.4 kernel fixes it, which is not impossible - the driver is pretty active).

I totally understand that we can't mandate 4.4, but we should not shy away from saying "here is the configuration k8s tests, and we run a lot of tests". The node team does a lot of under-appreciated work to find a working configuration, and I want to remain reasonably close to that with the default AWS configuration.

I tried getting the segfault to appear reliably in the journal and have given up. Capturing the AWS console output and dumping it to a file seems more fruitful. I haven't had time to do that yet, though...
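
A minimal sketch of the console-scraping approach, using boto3's get_console_output (the region and instance ID below are placeholders); periodic scheduling and error handling are left out.

import base64
import boto3

# Placeholders: use the real region and instance ID of the affected node.
REGION = "us-east-1"
INSTANCE_ID = "i-0123456789abcdef0"

ec2 = boto3.client("ec2", region_name=REGION)

# GetConsoleOutput returns the most recent console output EC2 has buffered,
# base64-encoded; this is where the panic text shows up when it never
# reaches the journal.
resp = ec2.get_console_output(InstanceId=INSTANCE_ID)
console = base64.b64decode(resp.get("Output", "")).decode("utf-8", "replace")

# Dump it to a file and flag any panic signature.
with open(INSTANCE_ID + "-console.txt", "w") as f:
    f.write(console)

if "Kernel panic - not syncing" in console:
    print("Kernel panic found in console output for", INSTANCE_ID)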

@matchstick
Contributor

@dchen1107 Assigning to you for triage right now, as it is a 1.4 P0. If you are the wrong person, I apologize. Also, @thockin and @vishh should have this on their radar.

@dchen1107
Member

@matchstick No need to apologize, this is on my radar all the time, and I know @justinsb is working on this actively and diligently. The only reason I didn't assign it to either myself or @justinsb is that this is not in the 1.4 milestone, and I cannot do much about AWS's image besides offering suggestions. @justinsb Do you think this should block the 1.4 release?

@justinsb Yes, you and I are on the same page about aligning the AWS node configuration with the GCE one, so that we can remove one more discrepancy between AWS and GKE and provide better support to our users on AWS. Thanks for all your support and effort in getting the AWS configuration as close as possible to ours. The GCI team is working on open-sourcing GCI; once that process is finalized, we can help build an AWS Kubernetes image based on GCI. Unfortunately we are not there yet. cc/ @yinghan @aronchick @mansoorj

On the other hand, the node team maintains the node e2e test infrastructure and runs the tests against a list of images. You can see the test results at https://k8s-testgrid.appspot.com/google-node. We don't attempt to exclude any images from that project. If you build your image based on 4.4.19, you can add it to our image project and we can include it in our test metrics too. The only caveat is that breakage on those images shouldn't block our submit-queue.

@pwittrock
Member

@justinsb @dchen1107

I am leaving this in the 1.4 release as a blocking issue. Please continue to provide daily status updates to this issue or move it out of the release and into v1.4-nonblocking where it will no longer be tracked as a release blocking issue in the burndown meetings.

@justinsb
Member Author

justinsb commented Sep 8, 2016

I can probably take this as assigned to me. I don't think it should block the whole k8s release, but it is a pretty serious issue for AWS and I want to get it resolved for 1.4.

I'm planning on taking a look at the kernel you are running in GCI and trying to figure out how many patches you are carrying (any links greatly appreciated). My concern is that if we ship an image with our own kernel, we then have to build the kernel/AMI going forward whenever there is e.g. a security issue. It is pretty attractive to me to just build from the official 4.4 Linux LTS kernel and rely on their work, rather than trying to maintain a set of patches and get involved in kernel cherry-picks. And I believe this problem may go away around the end of the year with debian, because the next version of debian should include the next Linux LTS kernel.

And big thanks @dchen1107 for helping me figure out which kernel we should be running here! I'll update as I continue looking around (probably with more questions, I'm afraid). If you'd rather assign to me please do so!

@justinsb
Member Author

justinsb commented Sep 8, 2016

@vishh Debian Jessie has an older kernel which panics on m4 instance types. I suspect we just haven't been seeing it because (1) it is rare, (2) it only happens on certain instance types, and (3) it is non-trivial to capture a kernel panic on a systemd system, so it might be happening and we might just not be noticing. I don't know whether the NPD would detect kernel panics on non-journald systems?

Ubuntu Trusty is really old, and it seems very likely to fall victim to the same problems (although Ubuntu does a good job of backporting kernels). However, if we're going to run Trusty with a backported kernel, at that point we might as well use Ubuntu Xenial (which has a 4.4 kernel, is an LTS release, and uses systemd). I think that is another option, but I also want a Debian option: we had to go with Debian for 1.2 because there was no sufficiently supported version of Ubuntu at the time, and we want some continuity.

@dchen1107
Member

@justinsb I'm leaving this in the 1.4 milestone at P0, but have marked it as a non-blocker. I assigned it to you since you are the one doing the real work. I'll stay on as another assignee to help you figure out the short-term workaround and to work on the long-term strategy for qualifying the image running on AWS. Building from the official 4.4 Linux LTS kernel sounds good to me. Again, let's add your image to the node e2e test image project and run the test suite against it daily.

@vishh Please read the initial description @justinsb wrote. They are already using Debian Jessie (3.16.0-4-amd64 #1 Debian 3.16.7-ckt25-2+deb8u3) on AWS, and are observing the kernel crashes at high frequency. The initial suggestion is to first make those kernel crashes visible to the end user / cluster admin. We added that support to NodeProblemDetector, but @justinsb ran into issues getting the kernel panic logged into the journald logs on AWS nodes.

@dchen1107
Member

@justinsb You can find the kernel sources for GCI from their release notes at:
https://cloud.google.com/compute/docs/containers/vm-image/release-notes

@pwittrock pwittrock modified the milestones: v1.4-nonblocking, v1.4 Sep 8, 2016
@chrislovecnm
Contributor

Just to throw in my two cents: we have to have a tested Debian version. A custom kernel is fine, but we need the .config for a stable configuration. Multiple parties are using Debian, and we need to make this happen.

Let me know if you need any support with this. We have already scheduled an upgrade in the next couple of weeks.

@justinsb we are in your debt for finding this one!

@chrislovecnm
Contributor

@dchen1107 so here are some thoughts

  1. We don't want to run GCI quite yet, until it has burned in more on EC2.
  2. We don't really want to maintain custom patches, unless someone can say "you have to have this, no choice."
  3. We're leaning toward using a vanilla LTS kernel.

Opinion?

@dchen1107
Member

@chrislovecnm Agreed with the above. We on the GCE/GKE side are moving in the direction of 2) and 3) as well. Also, I'm not suggesting running GCI unless it is completely open-sourced.

@chrislovecnm
Contributor

@justinsb what is the status of your patched kernel? Do you have it in production?

@dims
Member

dims commented Nov 9, 2016

Is this still a P0 (open since August!)?

@chrislovecnm
Contributor

@dims this is listed as non-blocking, but yeah....

@dims
Member

dims commented Jun 6, 2017

/remove-priority critical-urgent

@k8s-ci-robot k8s-ci-robot removed the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Jun 6, 2017
@chrislovecnm
Contributor

@dims I am not sure what priority to list, but we do not have a decent answer, and I am guessing that this is not documented well.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with a /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 29, 2017
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 28, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
