1000 node cluster, 1 kube-proxy comes up with "open /sys/module/nf_conntrack/parameters/hashsize: read-only file system" #24295
Comments
Why is /sys read-only? What version of Docker is it? try
And yes, that's our mystery, too. I'm trying to figure out if docker is screwing us on this node.
Grabbing the exited container now, just remembered.
Here's ~~one~~ two of them:
Cutting out unrelated stuff.
On this machine: both are docker 1.9.1, built Fri Nov 20 17:56:04 UTC 2015, and both are on the same kernel. ?????
I want to try a reboot or docker restart, but I am afraid it will fix the problem...
Yeah, funny, we were trying similar experiments just now. :)
moby/moby#7101 ???
Perhaps we need to add this as part of a node test suite? @dchen1107 @yujuhong
Pinged the Docker bug. Yes, it might be valuable if the
If you mean the node e2e suite, yes, we can add checks to see if the mounts are correct in a privileged container.
This requires a different type of check that performs docker write operations (e.g., creating docker containers periodically). We talked about it briefly before v1.2, but didn't have time to act on it. I am not sure if this is in the scope of the "problem API" targeted for v1.3 (/cc @dchen1107 @Random-Liu).
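A minimal sketch of what such a periodic docker-write check could look like, assuming the `docker` CLI is on the node's PATH and a `busybox` image is available; this is an illustration, not the node e2e suite's actual code:

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
	"time"
)

// checkDockerWrites exercises a docker write path by creating a throwaway
// privileged container, then verifies that /sys is mounted read-write
// inside it (in a privileged container it should be rw).
func checkDockerWrites() error {
	out, err := exec.Command("docker", "run", "--rm", "--privileged",
		"busybox", "cat", "/proc/mounts").Output()
	if err != nil {
		return fmt.Errorf("docker run failed: %v", err)
	}
	for _, line := range strings.Split(string(out), "\n") {
		fields := strings.Fields(line)
		// /proc/mounts format: device mountpoint fstype options dump pass
		if len(fields) >= 4 && fields[1] == "/sys" {
			if fields[3] == "ro" || strings.HasPrefix(fields[3], "ro,") {
				return fmt.Errorf("/sys mounted read-only in container: %s", line)
			}
			return nil
		}
	}
	return fmt.Errorf("no /sys entry found in container mounts")
}

func main() {
	// Run the check once a minute and report failures.
	for range time.Tick(time.Minute) {
		if err := checkDockerWrites(); err != nil {
			fmt.Println("docker health check failed:", err)
		}
	}
}
```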
Are we going to have a docker health checker on nodes? This could be one
Why did you close this?
? I didn't do that???
Maybe misclick... Sorry about that... :(
We talked about having an API so that the kubelet can aggregate various issues from independent health checkers on the machines and surface the information. This could include a docker (or any container runtime) health checker. Now that I think about it, restarting/fixing the problem may not be part of the original scope. @dchen1107 had stronger opinions on this though.
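Roughly the shape such an aggregation API could take; a hedged sketch with hypothetical names, not an actual kubelet interface:

```go
package health

import "time"

// Problem is one issue reported by a checker; the shape here is
// hypothetical, loosely modeled on node conditions.
type Problem struct {
	Source    string    // which checker reported it, e.g. "docker-health"
	Reason    string    // short machine-readable reason, e.g. "SysReadOnly"
	Message   string    // human-readable detail
	Timestamp time.Time // when the problem was observed
}

// Checker is implemented by independent health checkers (container runtime,
// kernel, network, ...). The kubelet would poll these, aggregate the
// Problems, and surface them; whether it also remediates is an open question.
type Checker interface {
	Name() string
	Check() []Problem
}
```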
This is showing up very frequently on 1000 node clusters. Or at least, a symptom like it, in that
It's enough that this needs to be considered a priority for scalability work.
Update the docker bug?
I always get this error message when I try starting
This is going to need attention before 1.3 ships. The answer from Docker seems to be "upgrade Docker and call us in the morning".
Regarding Yuju's comment at #24295 (comment): this issue is out of scope for the node problem detector:
@zmerlynn mentioned that restarting the docker daemon can remedy the problem, but so far I didn't find any signal to tell the docker daemon to restart. I think deleting and recreating the pod on the node should fix the issue, but that needs to be verified.
@mgoelzer Thanks for chiming in and giving us the suggestion on the docker version for the next release. This issue is one of many examples we are currently encountering with the docker release. cc/ @thockin
@zmerlynn gave me a node which is running into this problem.
Nothing is in the docker log, again!
Awesome.
Is there a corresponding docker issue?
moby/moby#7101
That issue looks slightly different. Inside the container in the docker issue, /sys is mounted ro when the user greps mounts. In Dawn's snippet, /sys is still mounted rw inside the container. Possibly an issue with capabilities?
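One way to separate the two failure modes from inside a container is to attempt the same write and look at the errno: EROFS points at a read-only mount, while EPERM/EACCES would point at missing privileges or capabilities. A hedged sketch; the value written is just an example:

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

func main() {
	// Attempt the write kube-proxy makes; 131072 is an arbitrary example value.
	err := os.WriteFile("/sys/module/nf_conntrack/parameters/hashsize",
		[]byte("131072"), 0o644)
	switch {
	case err == nil:
		fmt.Println("write succeeded: /sys is writable here")
	case errors.Is(err, syscall.EROFS):
		fmt.Println("EROFS: the mount is read-only (the moby/moby#7101 symptom)")
	case errors.Is(err, syscall.EACCES), errors.Is(err, syscall.EPERM):
		fmt.Println("permission denied: likely missing privileges or capabilities")
	default:
		fmt.Println("unexpected error:", err)
	}
}
```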
(Sorry, still catching up.) Perhaps we should file a separate issue for this.
I think it was r/w only if explicitly remounted, but it starts as ro in
@dchen1107 @thockin - Which of you is driving this? I thought someone had the ball on this bug.
I was not driving a docker fix, and I think we prioritized this down.
What's the difference between this issue and #25543? I thought we moved all the discussion to the new issue #25543 for a temporary workaround for 1.3. I just updated #25543 with the latest proposal. @zmerlynn Could you please file a separate issue against docker? I agreed with @mikedanese above that it is a different docker issue: moby/moby#7101
I was wondering why #25543 was opened at all; it's basically a dupe. :)
Honestly, I forgot that a dupe had been opened. We still need traction on that bug. I don't have time to chase this issue with Docker, and the last time I pinged them, they told me to GFY until we were on 1.11: moby/moby#7101 (comment)
Do we still observe the problem with docker 1.11.X? If not, I want to close this; otherwise, we are going to reconfigure NodeProblemDetector to make the issue visible.
Closing this one as a dup of #25543.
I just brought up a 1000 node cluster, and had one node with `kube-proxy` flapping in the following manner:
I compared this to a working `kube-proxy`, and it's clear that the open of `/sys/module/nf_conntrack/parameters/hashsize` is failing with `read-only file system`. `dmesg` is showing:
Could possibly be related to openshift/origin#7977
Keeping it around for posterity, but this is kind of an expensive cluster.
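For context, this is roughly the operation `kube-proxy` is attempting when that error fires: read the current conntrack hashsize and grow it if it is too small. A minimal sketch, simplified from the real logic, with the target value as an arbitrary example:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

const hashsizePath = "/sys/module/nf_conntrack/parameters/hashsize"

// ensureHashsize grows the conntrack hash table if it is below target.
// The WriteFile call is the step that fails with "read-only file system"
// on the affected node.
func ensureHashsize(target int) error {
	raw, err := os.ReadFile(hashsizePath)
	if err != nil {
		return err
	}
	current, err := strconv.Atoi(strings.TrimSpace(string(raw)))
	if err != nil {
		return err
	}
	if current >= target {
		return nil // already big enough
	}
	return os.WriteFile(hashsizePath, []byte(strconv.Itoa(target)), 0o644)
}

func main() {
	if err := ensureHashsize(131072); err != nil {
		fmt.Fprintln(os.Stderr, "set hashsize:", err)
		os.Exit(1)
	}
}
```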