[Flaking Test] UT TestSchedulerGuaranteeNonNilNodeInSchedulingCycle #124591

Open
pacoxu opened this issue Apr 28, 2024 · 3 comments
Labels
kind/flake      Categorizes issue or PR as related to a flaky test.
needs-triage    Indicates an issue or PR lacks a `triage/foo` label and requires one.
sig/scheduling  Categorizes an issue or PR as relevant to SIG Scheduling.

Comments

@pacoxu (Member) commented Apr 28, 2024

Which jobs are flaking?

https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-unit/1784375152293711872

Which tests are flaking?

k8s.io/kubernetes/pkg/scheduler: TestSchedulerGuaranteeNonNilNodeInSchedulingCycle

Since when has it been flaking?

Around 2024-04-28.

Testgrid link

https://testgrid.k8s.io/sig-release-master-blocking#ci-kubernetes-unit

Reason for failure (if possible)

E0428 00:37:25.772292   60434 watch.go:218] "Observed a panic" panic="channel full" panicGoValue="&errors.errorString{s:\"channel full\"}" stacktrace=<
	goroutine 560 [running]:
	k8s.io/apimachinery/pkg/util/runtime.logPanic({0x3e02b18, 0x62f3aa0}, {0x343eda0, 0xc0017fe0a0})
		/home/prow/go/src/k8s.io/kubernetes/staging/src/k8s.io/apimachinery/pkg/util/runtime/runtime.go:107 +0xe5
	k8s.io/apimachinery/pkg/util/runtime.handleCrash({0x3e02b18, 0x62f3aa0}, {0x343eda0, 0xc0017fe0a0}, {0x62f3aa0, 0x0, 0xc0015c2b38?})
		/home/prow/go/src/k8s.io/kubernetes/staging/src/k8s.io/apimachinery/pkg/util/runtime/runtime.go:82 +0xa2
	k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x3971048?})
		/home/prow/go/src/k8s.io/kubernetes/staging/src/k8s.io/apimachinery/pkg/util/runtime/runtime.go:59 +0x18f
	panic({0x343eda0?, 0xc0017fe0a0?})
		/usr/local/go/src/runtime/panic.go:770 +0x132
	k8s.io/apimachinery/pkg/watch.(*RaceFreeFakeWatcher).Add(0xc000364bb8, {0x3de2d90, 0xc001b8a908})
		/home/prow/go/src/k8s.io/kubernetes/staging/src/k8s.io/apimachinery/pkg/watch/watch.go:218 +0x1aa
	k8s.io/client-go/testing.(*tracker).add(0xc0006160a0, {{0x0, 0x0}, {0x381e48b, 0x2}, {0x381efa2, 0x4}}, {0x3de2d90, 0xc001b89b08}, {0x382de58, ...}, ...)
		/home/prow/go/src/k8s.io/kubernetes/staging/src/k8s.io/client-go/testing/fixture.go:431 +0x1015
	k8s.io/client-go/testing.(*tracker).Create(0xc0006160a0, {{0x0, 0x0}, {0x381e48b, 0x2}, {0x381efa2, 0x4}}, {0x3de2d90, 0xc001b89b08}, {0x382de58, ...})
		/home/prow/go/src/k8s.io/kubernetes/staging/src/k8s.io/client-go/testing/fixture.go:356 +0xfb
	k8s.io/client-go/kubernetes/fake.NewSimpleClientset.ObjectReaction.func2({0x3e08e08, 0xc0017ed080})
		/home/prow/go/src/k8s.io/kubernetes/staging/src/k8s.io/client-go/testing/fixture.go:103 +0x1249
	k8s.io/client-go/testing.(*SimpleReactor).React(0xc0006e1320, {0x3e08e08, 0xc0017ed080})
		/home/prow/go/src/k8s.io/kubernetes/staging/src/k8s.io/client-go/testing/fixture.go:530 +0x56
	k8s.io/client-go/testing.(*Fake).Invokes(0xc0007040b0, {0x3e08e08, 0xc0017ed000}, {0x3de2d90, 0xc001b89688})
		/home/prow/go/src/k8s.io/kubernetes/staging/src/k8s.io/client-go/testing/fake.go:145 +0x36e
	k8s.io/client-go/kubernetes/typed/core/v1/fake.(*FakePods).Create(0xc001788f00, {0xc0015c3eb8?, 0x45838a?}, 0xc001b89208, {{{0x0, 0x0}, {0x0, 0x0}}, {0x0, 0x0, ...}, ...})
		/home/prow/go/src/k8s.io/kubernetes/staging/src/k8s.io/client-go/kubernetes/typed/core/v1/fake/fake_pod.go:91 +0x225
	k8s.io/kubernetes/pkg/scheduler.TestSchedulerGuaranteeNonNilNodeInSchedulingCycle.func2()
		/home/prow/go/src/k8s.io/kubernetes/pkg/scheduler/schedule_one_test.go:602 +0x59b
	k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc0006e4b60)
		/home/prow/go/src/k8s.io/kubernetes/staging/src/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x42
	k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0006e4b60, {0x3dda360, 0xc000fd18f0}, 0x1, 0xc0003e0120)
		/home/prow/go/src/k8s.io/kubernetes/staging/src/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xc5
	k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0006e4b60, 0x895440, 0x0, 0x1, 0xc0003e0120)
		/home/prow/go/src/k8s.io/kubernetes/staging/src/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x10b
	k8s.io/apimachinery/pkg/util/wait.Until(...)
		/home/prow/go/src/k8s.io/kubernetes/staging/src/k8s.io/apimachinery/pkg/util/wait/backoff.go:161
	created by k8s.io/kubernetes/pkg/scheduler.TestSchedulerGuaranteeNonNilNodeInSchedulingCycle in goroutine 548
		/home/prow/go/src/k8s.io/kubernetes/pkg/scheduler/schedule_one_test.go:614 +0x1a3f
 >
panic: channel full [recovered]
	panic: channel full

goroutine 560 [running]:
k8s.io/apimachinery/pkg/util/runtime.handleCrash({0x3e02b18, 0x62f3aa0}, {0x343eda0, 0xc0017fe0a0}, {0x62f3aa0, 0x0, 0xc0015c2b38?})
	/home/prow/go/src/k8s.io/kubernetes/staging/src/k8s.io/apimachinery/pkg/util/runtime/runtime.go:89 +0x148
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x3971048?})
	/home/prow/go/src/k8s.io/kubernetes/staging/src/k8s.io/apimachinery/pkg/util/runtime/runtime.go:59 +0x18f
panic({0x343eda0?, 0xc0017fe0a0?})

Anything else we need to know?

No response

Relevant SIG(s)

/sig scheduling

@pacoxu added the kind/flake label Apr 28, 2024
@k8s-ci-robot added the sig/scheduling and needs-triage labels Apr 28, 2024
@k8s-ci-robot (Contributor)

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@chengjoey (Contributor)

+1

The fake watcher's result channel has a default capacity of only 100 (DefaultChanSize), which keeps its memory and throughput footprint small. If events are not consumed from the channel in time, the next send panics with "channel full".

Should the channel size be made configurable? (A sketch of one possible shape follows the snippet below.)

// From staging/src/k8s.io/apimachinery/pkg/watch/watch.go:

var (
	DefaultChanSize int32 = 100
)

// Action sends an event on the result channel. If the buffered channel
// (capacity DefaultChanSize) is already full, it panics rather than
// blocking or dropping the event.
func (f *RaceFreeFakeWatcher) Action(action EventType, obj runtime.Object) {
	f.Lock()
	defer f.Unlock()
	if !f.Stopped {
		select {
		case f.result <- Event{action, obj}:
			return
		default:
			panic(fmt.Errorf("channel full"))
		}
	}
}
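A minimal sketch of what a configurable size could look like, mirroring the size-parameterized NewFakeWithChanSize constructor that already exists for the non-race-free FakeWatcher. The constructor name NewRaceFreeFakeWithChanSize below is hypothetical and does not exist in k8s.io/apimachinery today:

// Hypothetical sketch only (this constructor does not exist upstream):
// a size-parameterized variant of NewRaceFreeFake in the same watch
// package, so tests that queue many events can opt into a larger buffer
// instead of relying on the global DefaultChanSize.
func NewRaceFreeFakeWithChanSize(size int) *RaceFreeFakeWatcher {
	return &RaceFreeFakeWatcher{
		result: make(chan Event, size),
	}
}

Whether the fake clientset plumbing could expose such a knob to callers is a separate question; this only shows the watcher-level change.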

@chengjoey (Contributor)

In my tests, if I lower DefaultChanSize to 40, the panic can be reproduced almost every time.
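For an isolated illustration of the failure mode (a standalone sketch, not the scheduler unit test itself): fill a RaceFreeFakeWatcher past DefaultChanSize without draining ResultChan(), and the 101st Add panics with "channel full", matching the stack trace above.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/watch"
)

func main() {
	w := watch.NewRaceFreeFake()
	defer w.Stop()

	// Recover so the demo prints the panic value instead of crashing.
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("panicked:", r) // prints: panicked: channel full
		}
	}()

	// Nothing reads from w.ResultChan(), so the buffered result channel
	// (capacity DefaultChanSize = 100) fills up; the 101st send panics.
	for i := 0; i < 101; i++ {
		w.Add(&corev1.Pod{})
	}
}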
