
Add support for disabling /logs endpoint in kubelet #87273

Merged: 1 commit merged into kubernetes:master on Jul 7, 2020

Conversation

@SaranBalaji90 (Contributor) commented Jan 16, 2020

What type of PR is this?
/kind feature

What this PR does / why we need it:
This PR adds an "EnableSystemLogHandler" flag to the kubelet configuration, similar to "EnableDebuggingHandlers". The /logs handler is made available in the kubelet only when EnableSystemLogHandler is set; the default value for this flag is true.

Special notes for your reviewer:
Let me know if this change sounds reasonable or if it needs additional changes on top of this.

Which issue(s) this PR fixes:
Fixes #87252

Does this PR introduce a user-facing change?:

Cluster admins can now turn off the /logs endpoint in the kubelet by setting enableSystemLogHandler to false in their kubelet configuration file. enableSystemLogHandler can be set to true only when enableDebuggingHandlers is also set to true.
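For illustration, a minimal kubelet configuration file that turns the endpoint off could look like this sketch (enableSystemLogHandler and enableDebuggingHandlers are the fields described above; the apiVersion/kind header is the standard KubeletConfiguration one, and all other fields are omitted):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# /logs is served only when both flags are true; both default to true.
enableDebuggingHandlers: true
enableSystemLogHandler: false
```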

/sig node

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. sig/node Categorizes an issue or PR as relevant to SIG Node. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 16, 2020
@k8s-ci-robot (Contributor)

Hi @SaranBalaji90. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. area/kubeadm area/kubelet kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. labels Jan 16, 2020
@zhouya0 (Contributor) commented Jan 16, 2020

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 16, 2020
Before:

func (kl *Kubelet) ListenAndServe(address net.IP, port uint, tlsOptions *server.TLSOptions, auth server.AuthInterface, enableCAdvisorJSONEndpoints, enableDebuggingHandlers, enableContentionProfiling bool) {
	server.ListenAndServeKubeletServer(kl, kl.resourceAnalyzer, address, port, tlsOptions, auth, enableCAdvisorJSONEndpoints, enableDebuggingHandlers, enableContentionProfiling, kl.redirectContainerStreaming, kl.criHandler)

After:

func (kl *Kubelet) ListenAndServe(address net.IP, port uint, tlsOptions *server.TLSOptions, auth server.AuthInterface, enableCAdvisorJSONEndpoints, enableDebuggingHandlers, enableContentionProfiling, enableSystemLogHander bool) {
	server.ListenAndServeKubeletServer(kl, kl.resourceAnalyzer, address, port, tlsOptions, auth, enableCAdvisorJSONEndpoints, enableDebuggingHandlers, enableContentionProfiling, kl.redirectContainerStreaming, enableSystemLogHander, kl.criHandler)
Contributor:

enableSystemLogHander: maybe "Handler", I guess.

Contributor Author:

Ah, thanks. Will fix this.
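For context, here is a minimal, self-contained sketch of the gating pattern being added, using plain net/http rather than the actual kubelet server code; the mux setup and names are illustrative only:

```go
package main

import (
	"fmt"
	"net/http"
)

// newKubeletMux is a toy stand-in for the kubelet's server setup:
// the /logs/ file-serving handler is installed only when enabled.
func newKubeletMux(enableSystemLogHandler bool) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})
	if enableSystemLogHandler {
		mux.Handle("/logs/", http.StripPrefix("/logs/",
			http.FileServer(http.Dir("/var/log"))))
	}
	return mux
}

func main() {
	// With the flag off, GET /logs/ simply 404s instead of serving files.
	_ = http.ListenAndServe("127.0.0.1:10250", newKubeletMux(false))
}
```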

@fejta-bot

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@SaranBalaji90 (Contributor Author)

Most of the tests, except those in "kubernetes-e2e-kind", failed during cluster creation itself.

"Cluster failed to initialize within 300 seconds.
W0116 06:09:54.403] Last output from querying API server follows:
curl: (7) Failed to connect to 34.82.87.107 port 443: Connection refused"

For kubernetes-e2e-kind:

I see two test runs, 1217685941636829186 and 1217685946661605376. One of the runs succeeded and the other failed because of the following tests:

  • [sig-storage] In-tree Volumes [Driver: local][LocalVolumeType: blockfs] [Testpattern: Pre-provisioned PV (default fs)] subPath should support non-existent path
    -> "nsenter" cmd failed with exit code-1
  • [sig-network] Services should serve a basic endpoint from pods [Conformance]
    -> "Timed out waiting for service endpoint-test2 in namespace services-5145 to expose endpoints map[pod1:[80]]"

Even though both of these look like issues associated with the worker node, they are not related to this PR. Looking at the kubelet logs might help, but I'm not sure how to get those logs. Any help would be appreciated.

@SaranBalaji90 (Contributor Author)

/retest

@mattjmcnaughton (Contributor) left a comment

Thanks for your PR!

For me, the decision on whether we'd want to merge this PR rests on two questions:

  1. How many users/cluster admins are interested in disabling the /logs endpoint?
  2. Of those who do want to disable it, how important is it to do so?

In my mind, the answers to those two questions will help us weigh the benefit of being able to disable the /logs endpoint against the cost of adding additional configuration to the Kubelet (which already has a whole bunch of tunable parameters).

I think either #sig-node in slack or attending a sig-node meeting would be the best place to get an answer to these questions. Or perhaps folks will weigh in on the issue/pr?

If we do decide to move forward with this feature, your high-level implementation looks sound. I'll look at the details once we have answers to the product questions.

@mattjmcnaughton (Contributor)

/retest

Unsure if these test failures are legit or not... retesting should give us more info :)

@SaranBalaji90 (Contributor Author)

I think either #sig-node in slack or attending a sig-node meeting would be the best place to get an answer to these questions. Or perhaps folks will weigh in on the issue/pr?

@mattjmcnaughton thanks for taking a look. Will bring this up in the next sig-node meeting to get others' opinions as well.

@SaranBalaji90 (Contributor Author) commented Jan 16, 2020

Unsure if these test failures are legit or not... retesting should give us more info :)

@mattjmcnaughton Regarding the tests, I tried a couple of times and all test runs failed while provisioning the cluster. Given that this is my first PR, I would like to dig into these errors irrespective of whether we merge this PR or not, just to understand how the process works. Do you happen to have any doc link handy for looking into why master provisioning failed?

@shyamjvs (Member)

/retest

@SaranBalaji90 (Contributor Author) commented Jan 16, 2020

/retest

But this might fail again with the same error. When I look at the job history, it seems like other tests are able to create the cluster successfully and run. So something might be wrong with the PR itself?

@shyamjvs (Member)

Sure, just wanted to eliminate chances of some flaky or transient issues. Especially because I see that pull-kubernetes-e2e-gce-100-performance succeeded. That one creates a 100-node cluster and runs some scale tests, so it means cluster creation did work with this change.

@shyamjvs (Member)

Kubelet is crashing on the master instance. See:

I0116 19:18:33.169698    1333 server.go:145] Starting to listen on 0.0.0.0:10250
panic: http: multiple registrations for /logs/
goroutine 187 [running]:
net/http.(*ServeMux).Handle(0xc000baeac0, 0x415a3e6, 0x6, 0x484b780, 0xc0005fde90)
	GOROOT/src/net/http/server.go:2403 +0x302
net/http.(*ServeMux).HandleFunc(...)
	GOROOT/src/net/http/server.go:2440
k8s.io/kubernetes/vendor/github.com/emicklei/go-restful.(*Container).addHandler(0xc00097b680, 0xc000ad4460, 0xc000baeac0, 0x0)
	vendor/github.com/emicklei/go-restful/container.go:132 +0x1ba
k8s.io/kubernetes/vendor/github.com/emicklei/go-restful.(*Container).Add(0xc00097b680, 0xc000ad4460, 0x0)
	vendor/github.com/emicklei/go-restful/container.go:107 +0x2d4
k8s.io/kubernetes/pkg/kubelet/server.(*Server).InstallSystemLogHandlers(0xc000b56500)
	pkg/kubelet/server/server.go:494 +0x388
k8s.io/kubernetes/pkg/kubelet/server.NewServer(0x4965760, 0xc000600a80, 0x48c05c0, 0xc000cabd40, 0x489eb40, 0xc00030bd10, 0x100000001, 0x0, 0x0, 0x0, ...)
	pkg/kubelet/server/server.go:245 +0x309
k8s.io/kubernetes/pkg/kubelet/server.ListenAndServeKubeletServer(0x4965760, 0xc000600a80, 0x48c05c0, 0xc000cabd40, 0xc000abea70, 0x10, 0x10, 0x280a, 0xc00030b140, 0x489eb40, ...)
	pkg/kubelet/server/server.go:146 +0x1da
k8s.io/kubernetes/pkg/kubelet.(*Kubelet).ListenAndServe(0xc000600a80, 0xc000abea70, 0x10, 0x10, 0x280a, 0xc00030b140, 0x489eb40, 0xc00030bd10, 0xc001000001)
	pkg/kubelet/kubelet.go:2228 +0x104
created by k8s.io/kubernetes/cmd/kubelet/app.startKubelet
	cmd/kubelet/app/server.go:1133 +0x266
kubelet.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
kubelet.service: Failed with result 'exit-code'.

More specifically this seems to be the problem:

panic: http: multiple registrations for /logs/

Here are the kubelet logs - https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/87273/pull-kubernetes-e2e-gce/1217886262732525569/artifacts/e2e-f59951d578-674b9-master/kubelet.log (in general, you can find them under the 'Artifacts' tab in the prow page)

It's surprising that the 100-node test didn't see this issue, though. Needs digging.
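As a side note, the panic is reproducible with nothing but the Go standard library: http.ServeMux panics on a duplicate pattern registration, which matches the message above. A minimal sketch (the first registration here stands in for whichever existing handler already claims /logs/):

```go
package main

import "net/http"

func main() {
	mux := http.NewServeMux()
	logs := http.FileServer(http.Dir("/var/log"))
	mux.Handle("/logs/", logs)
	// Registering the same pattern a second time panics with:
	// panic: http: multiple registrations for /logs/
	mux.Handle("/logs/", logs)
}
```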

@SaranBalaji90 (Contributor Author)

Here are the kubelet logs - https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/87273/pull-kubernetes-e2e-gce/1217886262732525569/artifacts/e2e-f59951d578-674b9-master/kubelet.log (in general, you can find them under the 'Artifacts' tab in the prow page)

It's surprising that the 100-node test didn't see this issue, though. Needs digging.

Thanks Shyam. I will take a look at this.

@k8s-ci-robot k8s-ci-robot added the sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. label Jul 2, 2020
@SaranBalaji90 (Contributor Author)

/retest

@neolit123 (Member)

/remove-area kubeadm
/remove-sig cluster-lifecycle

@k8s-ci-robot k8s-ci-robot removed area/kubeadm sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. labels Jul 6, 2020
@liggitt (Member) commented Jul 6, 2020

/approve

API bits lgtm, will leave final lgtm to kubelet reviewer

@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: derekwaynecarr, liggitt, neolit123, SaranBalaji90

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 6, 2020
@sjenning (Contributor) commented Jul 6, 2020

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 6, 2020
@SaranBalaji90 (Contributor Author)

/retest

2 similar comments
@SaranBalaji90 (Contributor Author)

/retest

@SaranBalaji90 (Contributor Author)

/retest

@k8s-ci-robot (Contributor) commented Jul 7, 2020

@SaranBalaji90: The following test failed, say /retest to rerun all failed tests:

Test name: pull-kubernetes-e2e-kind-ipv6
Commit: e70e2d1561a8e3e1d7311765cde1932598619f40
Rerun command: /test pull-kubernetes-e2e-kind-ipv6

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@SaranBalaji90 (Contributor Author)

/retest

@k8s-ci-robot k8s-ci-robot merged commit 7e75a5e into kubernetes:master Jul 7, 2020
@SaranBalaji90 deleted the kubelet-log-file branch July 7, 2020 15:17
Bregor added a commit to evilmartians/chef-kubernetes that referenced this pull request Sep 10, 2020
Bregor added a commit to evilmartians/chef-kubernetes that referenced this pull request Sep 14, 2020
kubernetes/kubernetes#87273

Cluster admins can now turn off /logs endpoint in kubelet by setting
enableSystemLogHandler to false in their kubelet configuration file.
enableSystemLogHandler can be set to true only when
enableDebuggingHandlers is also set to true.

Signed-off-by: Maxim Filatov <pipopolam@gmail.com>
Labels
api-review Categorizes an issue or PR as actively needing an API review. approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Add support to enable/disable kubelet "/logs/" endpoint