
Flaky tests #4460

Closed
klizhentas opened this issue Oct 6, 2020 · 6 comments
Labels
automated-testing-problem Problems with automated tests (unit tests, integration tests)

Comments

@klizhentas
Contributor


----------------------------------------------------------------------
FAIL: services_test.go:176: ServicesSuite.TestSemaphoreContention

/go/src/github.com/gravitational/teleport/lib/services/suite/suite.go:1105:
    c.Assert(err, check.IsNil)
... value *trace.TraceErr =
ERROR REPORT:
Original Error: *trace.LimitExceededError too much contention on semaphore connection/alice
Stack Trace:
	/go/src/github.com/gravitational/teleport/lib/services/local/presence.go:750 github.com/gravitational/teleport/lib/services/local.(*PresenceService).AcquireSemaphore
	/go/src/github.com/gravitational/teleport/lib/services/semaphore.go:240 github.com/gravitational/teleport/lib/services.AcquireSemaphoreLock
	/go/src/github.com/gravitational/teleport/lib/services/suite/suite.go:1104 github.com/gravitational/teleport/lib/services/suite.(*ServicesTestSuite).SemaphoreContention.func1
	/opt/go/src/runtime/asm_amd64.s:1374 runtime.goexit
User Message: too much contention on semaphore connection/alice
 ("too much contention on semaphore connection/alice")

OOPS: 33 passed, 1 FAILED


@klizhentas
Contributor Author

----------------------------------------------------------------------
FAIL: workpool_test.go:75: WorkSuite.TestFull

workpool_test.go:122:
    c.Errorf("Timeout waiting for lease")
... Error: Timeout waiting for lease
[... repeated many times ...]

workpool_test.go:135:
    c.Errorf("unexpected lease grant: %+v, counts=%+v", l, counts)
... Error: unexpected lease grant: {group:0xc0001224e0 id:291 relOnce:0xc0000179a0}, counts={Target:200 Active:90}

OOPS: 1 passed, 1 FAILED
--- FAIL: Test (2.26s)

@webvictim
Contributor

Another +1 for ServicesSuite.TestSemaphoreContention: it seems to fail whenever we have 3 or 4 test runs going at once in Drone.

It's running on 4 beefy GKE worker nodes with 8 CPUs and 32GB RAM, and each Drone test run requests 3 CPUs for itself, so I don't think we're overcontended here.

I've also seen a few failures in SrvSuite.TestLimiter:

FAIL: sshserver_test.go:1071: SrvSuite.TestLimiter

sshserver_test.go:1147:
    c.Assert(clt, NotNil)
... value *ssh.Client = (*ssh.Client)(nil)

OOPS: 23 passed, 1 skipped, 1 FAILED
--- FAIL: TestRegular (51.03s)

@webvictim webvictim added the test-plan-problem Issues which have been surfaced by running the manual release test plan label Oct 7, 2020
@webvictim webvictim changed the title Flaky test Flaky tests Oct 7, 2020
@webvictim webvictim added automated-testing-problem Problems with automated tests (unit tests, integration tests) and removed test-plan-problem Issues which have been surfaced by running the manual release test plan labels Oct 15, 2020
@webvictim
Contributor

Seen this one failing a few times over the last few days.

----------------------------------------------------------------------
FAIL: sshserver_test.go:371: SrvSuite.TestAgentForward

sshserver_test.go:456:
    c.Fatalf("expected socket to be closed, still could dial after 150 ms")
... Error: expected socket to be closed, still could dial after 150 ms

OOPS: 23 passed, 1 skipped, 1 FAILED
--- FAIL: TestRegular (16.40s)
FAIL
coverage: 60.9% of statements
FAIL

@webvictim
Contributor

Again:

FAIL: services_test.go:176: ServicesSuite.TestSemaphoreContention

/go/src/github.com/gravitational/teleport/lib/services/suite/suite.go:1105:
    c.Assert(err, check.IsNil)
... value *trace.TraceErr =
ERROR REPORT:
Original Error: *trace.LimitExceededError too much contention on semaphore connection/alice
Stack Trace:
	/go/src/github.com/gravitational/teleport/lib/services/local/presence.go:750 github.com/gravitational/teleport/lib/services/local.(*PresenceService).AcquireSemaphore
	/go/src/github.com/gravitational/teleport/lib/services/semaphore.go:240 github.com/gravitational/teleport/lib/services.AcquireSemaphoreLock
	/go/src/github.com/gravitational/teleport/lib/services/suite/suite.go:1104 github.com/gravitational/teleport/lib/services/suite.(*ServicesTestSuite).SemaphoreContention.func1
	/opt/go/src/runtime/asm_amd64.s:1374 runtime.goexit
User Message: too much contention on semaphore connection/alice
 ("too much contention on semaphore connection/alice")

OOPS: 33 passed, 1 FAILED
--- FAIL: Test (24.62s)
FAIL
coverage: 48.7% of statements

@webvictim
Contributor

This one is the cause of the majority of integration test failures:

FAIL: integration_test.go:3141: IntSuite.TestRotateSuccess

integration_test.go:3238:
    c.Assert(err, check.IsNil)
... value *trace.TraceErr =
ERROR REPORT:
Original Error: *trace.BadParameterError timeout waiting for old service to stop
Stack Trace:
	/go/src/github.com/gravitational/teleport/integration/integration_test.go:3847 github.com/gravitational/teleport/integration.waitForReload
	/go/src/github.com/gravitational/teleport/integration/integration_test.go:3237 github.com/gravitational/teleport/integration.(*IntSuite).TestRotateSuccess
	/opt/go/src/reflect/value.go:463 reflect.Value.call
	/opt/go/src/reflect/value.go:321 reflect.Value.Call
	/go/src/github.com/gravitational/teleport/vendor/gopkg.in/check.v1/check.go:782 gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1
	/go/src/github.com/gravitational/teleport/vendor/gopkg.in/check.v1/check.go:676 gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1
	/opt/go/src/runtime/asm_amd64.s:1374 runtime.goexit
User Message: timeout waiting for old service to stop
 ("timeout waiting for old service to stop")

@webvictim webvictim mentioned this issue Oct 29, 2020
@zmb3
Collaborator

zmb3 commented Dec 28, 2021

Closing this as a duplicate of #4653 and #9492.

@zmb3 zmb3 closed this as completed Dec 28, 2021
3 participants