[Flaking Test] UT TestSchedulerGuaranteeNonNilNodeInSchedulingCycle #124591

Open
pacoxu opened this issue Apr 28, 2024 · 3 comments
Labels
kind/flake      Categorizes issue or PR as related to a flaky test.
needs-triage    Indicates an issue or PR lacks a `triage/foo` label and requires one.
sig/scheduling  Categorizes an issue or PR as relevant to SIG Scheduling.

Comments

@pacoxu (Member) commented Apr 28, 2024

Which jobs are flaking?

https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-unit/1784375152293711872

Which tests are flaking?

k8s.io/kubernetes/pkg/scheduler: TestSchedulerGuaranteeNonNilNodeInSchedulingCycle

Since when has it been flaking?

Around 2024-04-28.

Testgrid link

https://testgrid.k8s.io/sig-release-master-blocking#ci-kubernetes-unit

Reason for failure (if possible)

E0428 00:37:25.772292   60434 watch.go:218] "Observed a panic" panic="channel full" panicGoValue="&errors.errorString{s:\"channel full\"}" stacktrace=<
	goroutine 560 [running]:
	k8s.io/apimachinery/pkg/util/runtime.logPanic({0x3e02b18, 0x62f3aa0}, {0x343eda0, 0xc0017fe0a0})
		/home/prow/go/src/k8s.io/kubernetes/staging/src/k8s.io/apimachinery/pkg/util/runtime/runtime.go:107 +0xe5
	k8s.io/apimachinery/pkg/util/runtime.handleCrash({0x3e02b18, 0x62f3aa0}, {0x343eda0, 0xc0017fe0a0}, {0x62f3aa0, 0x0, 0xc0015c2b38?})
		/home/prow/go/src/k8s.io/kubernetes/staging/src/k8s.io/apimachinery/pkg/util/runtime/runtime.go:82 +0xa2
	k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x3971048?})
		/home/prow/go/src/k8s.io/kubernetes/staging/src/k8s.io/apimachinery/pkg/util/runtime/runtime.go:59 +0x18f
	panic({0x343eda0?, 0xc0017fe0a0?})
		/usr/local/go/src/runtime/panic.go:770 +0x132
	k8s.io/apimachinery/pkg/watch.(*RaceFreeFakeWatcher).Add(0xc000364bb8, {0x3de2d90, 0xc001b8a908})
		/home/prow/go/src/k8s.io/kubernetes/staging/src/k8s.io/apimachinery/pkg/watch/watch.go:218 +0x1aa
	k8s.io/client-go/testing.(*tracker).add(0xc0006160a0, {{0x0, 0x0}, {0x381e48b, 0x2}, {0x381efa2, 0x4}}, {0x3de2d90, 0xc001b89b08}, {0x382de58, ...}, ...)
		/home/prow/go/src/k8s.io/kubernetes/staging/src/k8s.io/client-go/testing/fixture.go:431 +0x1015
	k8s.io/client-go/testing.(*tracker).Create(0xc0006160a0, {{0x0, 0x0}, {0x381e48b, 0x2}, {0x381efa2, 0x4}}, {0x3de2d90, 0xc001b89b08}, {0x382de58, ...})
		/home/prow/go/src/k8s.io/kubernetes/staging/src/k8s.io/client-go/testing/fixture.go:356 +0xfb
	k8s.io/client-go/kubernetes/fake.NewSimpleClientset.ObjectReaction.func2({0x3e08e08, 0xc0017ed080})
		/home/prow/go/src/k8s.io/kubernetes/staging/src/k8s.io/client-go/testing/fixture.go:103 +0x1249
	k8s.io/client-go/testing.(*SimpleReactor).React(0xc0006e1320, {0x3e08e08, 0xc0017ed080})
		/home/prow/go/src/k8s.io/kubernetes/staging/src/k8s.io/client-go/testing/fixture.go:530 +0x56
	k8s.io/client-go/testing.(*Fake).Invokes(0xc0007040b0, {0x3e08e08, 0xc0017ed000}, {0x3de2d90, 0xc001b89688})
		/home/prow/go/src/k8s.io/kubernetes/staging/src/k8s.io/client-go/testing/fake.go:145 +0x36e
	k8s.io/client-go/kubernetes/typed/core/v1/fake.(*FakePods).Create(0xc001788f00, {0xc0015c3eb8?, 0x45838a?}, 0xc001b89208, {{{0x0, 0x0}, {0x0, 0x0}}, {0x0, 0x0, ...}, ...})
		/home/prow/go/src/k8s.io/kubernetes/staging/src/k8s.io/client-go/kubernetes/typed/core/v1/fake/fake_pod.go:91 +0x225
	k8s.io/kubernetes/pkg/scheduler.TestSchedulerGuaranteeNonNilNodeInSchedulingCycle.func2()
		/home/prow/go/src/k8s.io/kubernetes/pkg/scheduler/schedule_one_test.go:602 +0x59b
	k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc0006e4b60)
		/home/prow/go/src/k8s.io/kubernetes/staging/src/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x42
	k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0006e4b60, {0x3dda360, 0xc000fd18f0}, 0x1, 0xc0003e0120)
		/home/prow/go/src/k8s.io/kubernetes/staging/src/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xc5
	k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0006e4b60, 0x895440, 0x0, 0x1, 0xc0003e0120)
		/home/prow/go/src/k8s.io/kubernetes/staging/src/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x10b
	k8s.io/apimachinery/pkg/util/wait.Until(...)
		/home/prow/go/src/k8s.io/kubernetes/staging/src/k8s.io/apimachinery/pkg/util/wait/backoff.go:161
	created by k8s.io/kubernetes/pkg/scheduler.TestSchedulerGuaranteeNonNilNodeInSchedulingCycle in goroutine 548
		/home/prow/go/src/k8s.io/kubernetes/pkg/scheduler/schedule_one_test.go:614 +0x1a3f
 >
panic: channel full [recovered]
	panic: channel full

goroutine 560 [running]:
k8s.io/apimachinery/pkg/util/runtime.handleCrash({0x3e02b18, 0x62f3aa0}, {0x343eda0, 0xc0017fe0a0}, {0x62f3aa0, 0x0, 0xc0015c2b38?})
	/home/prow/go/src/k8s.io/kubernetes/staging/src/k8s.io/apimachinery/pkg/util/runtime/runtime.go:89 +0x148
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x3971048?})
	/home/prow/go/src/k8s.io/kubernetes/staging/src/k8s.io/apimachinery/pkg/util/runtime/runtime.go:59 +0x18f
panic({0x343eda0?, 0xc0017fe0a0?})

Anything else we need to know?

No response

Relevant SIG(s)

/sig scheduling

@pacoxu added the kind/flake label Apr 28, 2024
@k8s-ci-robot added the sig/scheduling and needs-triage labels Apr 28, 2024
@k8s-ci-robot (Contributor)

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@chengjoey (Contributor)

+1

The fake watcher's result channel has a default capacity of only 100 (DefaultChanSize), which keeps its memory and throughput footprint small. If events are not consumed from the channel in time, the next send panics with "channel full".

Should the channel size be made configurable? (A sketch of one possible shape follows the snippet below.)

// From staging/src/k8s.io/apimachinery/pkg/watch/watch.go:

var (
	DefaultChanSize int32 = 100
)

// Action sends an event on the result channel. If the buffered channel
// (capacity DefaultChanSize) is already full, it panics rather than
// blocking or dropping the event.
func (f *RaceFreeFakeWatcher) Action(action EventType, obj runtime.Object) {
	f.Lock()
	defer f.Unlock()
	if !f.Stopped {
		select {
		case f.result <- Event{action, obj}:
			return
		default:
			panic(fmt.Errorf("channel full"))
		}
	}
}
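A minimal sketch of what a configurable size could look like, mirroring the size-parameterized NewFakeWithChanSize constructor that already exists for the non-race-free FakeWatcher. The constructor name NewRaceFreeFakeWithChanSize below is hypothetical and does not exist in k8s.io/apimachinery today:

// Hypothetical sketch only (this constructor does not exist upstream):
// a size-parameterized variant of NewRaceFreeFake in the same watch
// package, so tests that queue many events can opt into a larger buffer
// instead of relying on the global DefaultChanSize.
func NewRaceFreeFakeWithChanSize(size int) *RaceFreeFakeWatcher {
	return &RaceFreeFakeWatcher{
		result: make(chan Event, size),
	}
}

Whether the fake clientset plumbing could expose such a knob to callers is a separate question; this only shows the watcher-level change.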

@chengjoey (Contributor)

In my tests, if I lower DefaultChanSize to 40, the panic can be reproduced almost every time.
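For an isolated illustration of the failure mode (a standalone sketch, not the scheduler unit test itself): fill a RaceFreeFakeWatcher past DefaultChanSize without draining ResultChan(), and the 101st Add panics with "channel full", matching the stack trace above.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/watch"
)

func main() {
	w := watch.NewRaceFreeFake()
	defer w.Stop()

	// Recover so the demo prints the panic value instead of crashing.
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("panicked:", r) // prints: panicked: channel full
		}
	}()

	// Nothing reads from w.ResultChan(), so the buffered result channel
	// (capacity DefaultChanSize = 100) fills up; the 101st send panics.
	for i := 0; i < 101; i++ {
		w.Add(&corev1.Pod{})
	}
}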
