
Delete the log of the component, it will still not rebuild the file to save it. #100478

Closed
lunhuijie opened this issue Mar 23, 2021 · 20 comments
Labels
  • area/logging
  • kind/bug: Categorizes issue or PR as related to a bug.
  • lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
  • sig/node: Categorizes an issue or PR as relevant to SIG Node.
  • triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

@lunhuijie
Contributor

lunhuijie commented Mar 23, 2021

What happened:

1. When I delete the kubelet log file, kubelet's status is still OK, but I can't find where the subsequent logs go.
2. When I mv the kubelet log file to another path, kubelet keeps writing to the moved file. If I then recreate a file with the same name at the old path, kubelet does not write to the new file; it still writes to the one that was just moved (see the sketch below for why).
3. When I delete the kubelet log files and restart kubelet, the files are recreated.
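
A minimal Go sketch (my own illustration, not kubelet or klog code) of the POSIX behavior behind points 1 and 2: a process holding an open file descriptor keeps writing successfully even after the file is unlinked or renamed, because the descriptor refers to the inode, not the path.

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	f, err := os.Create("/tmp/demo.log")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	f.WriteString("before delete\n")

	// Remove the file while the descriptor is still open.
	if err := os.Remove("/tmp/demo.log"); err != nil {
		panic(err)
	}

	// This write still succeeds: the data goes to the unlinked inode,
	// which no longer appears at any path and is freed once f is closed.
	if _, err := f.WriteString("after delete\n"); err != nil {
		fmt.Println("write failed:", err)
	} else {
		fmt.Println("write after delete succeeded")
	}
}
```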

What you expected to happen:

Whatever I do to these log files, the cluster should recreate them when it finds them missing.

How to reproduce it (as minimally and precisely as possible):

See "What happened" above.

Anything else we need to know?:

Is this behavior correct? I think it is unreasonable not to recreate the file.

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • OS (e.g.: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:
@lunhuijie lunhuijie added the kind/bug Categorizes issue or PR as related to a bug. label Mar 23, 2021
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 23, 2021
@lunhuijie lunhuijie changed the title Remove component‘s log , it still wrote log but not rebuild a file to save it. Delete the log of the component, it will still not rebuild the file to save it. Mar 23, 2021
@eloyekunle
Contributor

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 23, 2021
@mengjiao-liu
Member

I can reproduce the issue.

Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.4", GitCommit:"e87da0bd6e03ec3fea7933c4b5263d151aafd07c", GitTreeState:"clean", BuildDate:"2021-02-18T16:12:00Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.4", GitCommit:"e87da0bd6e03ec3fea7933c4b5263d151aafd07c", GitTreeState:"clean", BuildDate:"2021-02-18T16:03:00Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}

I've looked at the code, and this is probably a problem with the klog library.

I will try to fix this.

/assign

@lunhuijie
Contributor Author

I am also looking at this code to solve the problem. If you have any good ideas, we can discuss them together. @mengjiao-liu

@mengjiao-liu
Member

mengjiao-liu commented Mar 24, 2021

Now I can be sure that the problem is in the klog library, because I tested klog alone and reproduced the problem there as well. @lunhuijie

@lunhuijie
Contributor Author

lunhuijie commented Mar 24, 2021

In klog.go, in output's `switch s {` block, I find that v19 differs from v20.
Every time a file write fails, an error value is returned, but this code discards that return value. If we need to fix this problem, we may need to add handling for the returned error, but I don't know whether that's right or not. @mengjiao-liu
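
A hypothetical sketch of that proposal (my own illustration with a made-up helper name, not the actual klog code). One caveat: on Linux, a write to a descriptor whose file has been unlinked still succeeds, so checking the write error alone would not detect a deleted log file; that is why the discussion later moves toward stat checks and inotify.

```go
package main

import (
	"fmt"
	"os"
)

// writeWithCheck propagates the error from the log-file write instead of
// discarding it, and falls back to stderr so the line is not silently lost.
func writeWithCheck(f *os.File, data []byte) {
	if _, err := f.Write(data); err != nil {
		fmt.Fprintf(os.Stderr, "log write failed: %v\n", err)
		os.Stderr.Write(data)
	}
}

func main() {
	f, err := os.Create("/tmp/demo.log")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	writeWithCheck(f, []byte("hello\n"))
}
```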

@mengjiao-liu
Member

mengjiao-liu commented Mar 24, 2021

When we move or delete the file, `l.file[infoLog] != nil` at that point:
https://github.com/kubernetes/klog/blob/master/klog.go#L928-L933

so the log file is not recreated. @lunhuijie
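
A simplified paraphrase of the logic at that location (my own reconstruction, not the exact klog source): the file is only created when the cached handle is nil, so once the on-disk file is deleted or moved, the handle is still non-nil and `createFiles` is never called again.

```go
// Paraphrase of klog's output path (reconstruction, not the real source):
if l.file[infoLog] == nil {
	// Only reached on the very first write: createFiles opens the log
	// file and caches the handle in l.file[infoLog].
	if err := l.createFiles(infoLog); err != nil {
		os.Stderr.Write(data) // fall back to stderr
		l.exit(err)
	}
}
// Every later write reuses the cached handle, even if the file it points
// to has since been unlinked or renamed on disk.
l.file[infoLog].Write(data)
```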

@mengjiao-liu
Member

Once this is fixed in klog, the klog version in Kubernetes will need to be updated to pick up the fix.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 22, 2021
@mengjiao-liu
Member

I haven't had time to work on this recently; can you continue with it? @lunhuijie

/unassign

@mengjiao-liu
Member

The previous PR is here kubernetes/klog#232 for your reference.

Anyone is welcome to pick it up. 😀

@pacoxu
Member

pacoxu commented Jun 25, 2021

/remove-lifecycle stale
/triage accepted
/area logging

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. area/logging and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 25, 2021
@RPing

RPing commented Jul 1, 2021

I can try this one.
So currently, the only reason kubernetes/klog#232 can't be merged is that we need to use a file watch instead of the original os.IsExist approach, right? @mengjiao-liu

@mengjiao-liu
Member

mengjiao-liu commented Jul 1, 2021

Thanks for your effort. @RPing
Yes, I think checking the file on every write takes too much I/O and could slow down the components.

@RPing

RPing commented Jul 1, 2021

/assign

@RPing

RPing commented Jul 4, 2021

@mengjiao-liu I've implemented a simple fix using an inotify goroutine.
But I want to clarify: what is the right timing for rebuilding the log?
If we want to rebuild it as soon as the log is deleted, we should put the inotify goroutine into the kubelet, not klog (rough sketch below).
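
For reference, a rough sketch of that idea (my own illustration, assuming the github.com/fsnotify/fsnotify wrapper around inotify; the actual change may differ): a goroutine watches the log file and recreates it as soon as it is removed or renamed.

```go
package main

import (
	"log"
	"os"

	"github.com/fsnotify/fsnotify"
)

// watchAndRecreate starts a goroutine that recreates path whenever it is
// removed or renamed. A real logger would also swap its file handle here.
func watchAndRecreate(path string) error {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		return err
	}
	go func() {
		defer watcher.Close()
		for event := range watcher.Events {
			if event.Op&(fsnotify.Remove|fsnotify.Rename) != 0 {
				f, err := os.Create(path)
				if err != nil {
					log.Printf("recreate %s: %v", path, err)
					continue
				}
				f.Close()
				// The inotify watch died with the old inode; re-add it.
				if err := watcher.Add(path); err != nil {
					log.Printf("re-watch %s: %v", path, err)
				}
			}
		}
	}()
	return watcher.Add(path)
}

func main() {
	const path = "/tmp/demo.log" // hypothetical demo path
	if f, err := os.Create(path); err != nil {
		log.Fatal(err)
	} else {
		f.Close()
	}
	if err := watchAndRecreate(path); err != nil {
		log.Fatal(err)
	}
	select {} // block so the demo keeps watching
}
```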

@mengjiao-liu
Member

If the file is rebuilt, the time point would be right after it is deleted. On second thought, immediate reconstruction should be done in the kubelet. Releasing the file handle and recreating the file should be a feature rather than a bug.

Let's wait for more comments. @serathius @derekwaynecarr @mrunalp

@n4j n4j added this to Triaged in SIG Node Bugs Jul 9, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 3, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 2, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

SIG Node Bugs automation moved this from Triaged to Done Dec 2, 2021