TestDatabases/redshift_cluster flakiness #41521

Open
ravicious opened this issue May 14, 2024 · 4 comments
@ravicious
Member

Failure

Link(s) to logs

Relevant snippet

=== FAIL: e2e/aws TestDatabases/redshift_cluster/auto_user_keep (83.17s)
    redshift_test.go:162: 
        	Error Trace:	/__w/teleport/teleport/e2e/aws/redshift_test.go:275
        	            				/opt/go/src/runtime/asm_amd64.s:1695
        	Error:      	Should be false
        	Messages:   	user "auto_keep_d0fc17" should not be able to login after deactivating
    redshift_test.go:162: 
        	Error Trace:	/__w/teleport/teleport/e2e/aws/redshift_test.go:252
        	            				/__w/teleport/teleport/e2e/aws/redshift_test.go:162
        	            				/__w/teleport/teleport/e2e/aws/redshift_test.go:196
        	Error:      	Condition never satisfied
        	Test:       	TestDatabases/redshift_cluster/auto_user_keep
        	Messages:   	waiting for auto user "auto_keep_d0fc17" to be deactivated
{"caller":"reversetunnel/agent.go:561","component":"proxy:agent","leaseID":1,"level":"debug","message":"Ping -\u003e 127.0.0.1:41509.","target":"127.0.0.1:41509","timestamp":"2024-05-14T11:16:33Z","trace.fields":{"localCluster":"","targetCluster":"local-site"}}
{"addr":"127.0.0.1:48572","caller":"reversetunnel/localsite.go:777","component":"proxy:server","latency":209466,"level":"debug","message":"Ping \u003c- 127.0.0.1:48572","serverID":"localhost.local-site","timestamp":"2024-05-14T11:16:33Z","trace.fields":{"cluster":"local-site"}}
{"caller":"reversetunnel/agent.go:561","component":"proxy:agent","leaseID":1,"level":"debug","message":"Ping -\u003e 127.0.0.1:41509.","target":"127.0.0.1:41509","timestamp":"2024-05-14T11:16:36Z","trace.fields":{"localCluster":"","targetCluster":"local-site"}}
{"addr":"127.0.0.1:48572","caller":"reversetunnel/localsite.go:777","component":"proxy:server","latency":187520,"level":"debug","message":"Ping \u003c- 127.0.0.1:48572","serverID":"localhost.local-site","timestamp":"2024-05-14T11:16:36Z","trace.fields":{"cluster":"local-site"}}
{"caller":"reversetunnel/agent.go:561","component":"proxy:agent","leaseID":1,"level":"debug","message":"Ping -\u003e 127.0.0.1:41509.","target":"127.0.0.1:41509","timestamp":"2024-05-14T11:16:39Z","trace.fields":{"localCluster":"","targetCluster":"local-site"}}
{"addr":"127.0.0.1:48572","caller":"reversetunnel/localsite.go:777","component":"proxy:server","latency":191492,"level":"debug","message":"Ping \u003c- 127.0.0.1:48572","serverID":"localhost.local-site","timestamp":"2024-05-14T11:16:39Z","trace.fields":{"cluster":"local-site"}}
        --- FAIL: TestDatabases/redshift_cluster/auto_user_keep (83.17s)

 === FAIL: e2e/aws TestDatabases/redshift_cluster (7.21s)
    redshift_test.go:116: 
        	Error Trace:	/__w/teleport/teleport/e2e/aws/redshift_test.go:116
        	            				/opt/go/src/testing/testing.go:1175
        	            				/opt/go/src/testing/testing.go:1353
        	            				/opt/go/src/testing/testing.go:1657
        	Error:      	Received unexpected error:
        	            	ERROR: cannot drop this role since it has been granted on a user (SQLSTATE 0LP01)
        	Test:       	TestDatabases/redshift_cluster
        	Messages:   	test cleanup failed, stmt="DROP ROLE \"auto_role1_42536b\""
    redshift_test.go:116: 
        	Error Trace:	/__w/teleport/teleport/e2e/aws/redshift_test.go:116
        	            				/opt/go/src/testing/testing.go:1175
        	            				/opt/go/src/testing/testing.go:1353
        	            				/opt/go/src/testing/testing.go:1657
        	Error:      	Received unexpected error:
        	            	ERROR: cannot drop this role since it has been granted on a user (SQLSTATE 0LP01)
        	Test:       	TestDatabases/redshift_cluster
        	Messages:   	test cleanup failed, stmt="DROP ROLE \"auto_role2_8114fd\""
@zmb3
Collaborator

zmb3 commented May 21, 2024

https://github.com/gravitational/teleport/actions/runs/9176973071/job/25233510207?pr=41813

@GavinFrazar can you take a look?

@nklaassen
Contributor

@zmb3
Collaborator

zmb3 commented May 31, 2024

@GavinFrazar
Contributor

GavinFrazar commented Jun 2, 2024

I was out all last week, but shortly before going on leave I discovered that the tests are failing due to a deadlock bug in our auto user provisioning SQL, so I'm pretty confident that these failures are legitimate, just inconsistent.
Basically, what I've found is that concurrent transactions can acquire the same locks in different orders, which leaves two transactions waiting on each other. The database detects this and aborts one of the transactions, so the auto-provisioned user stays activated instead of being deactivated:

{"caller":"postgres/engine.go:142","component":"db:service","db":"ci-database-e2e-tests-redshift-cluster-us-west-2-307493967395","error":"ERROR: deadlock detected (SQLSTATE 40P01)","id":"c5f031d4-1c88-4231-b9b5-0ff070b02e8f","level":"error","message":"Failed to teardown auto user.","timestamp":"2024-05-27T14:21:37-07:00"}

I'll look into fixing that this week
