TestListResources is flaky #13548

Closed
nklaassen opened this issue Jun 15, 2022 · 2 comments

Comments

nklaassen (Contributor) commented Jun 15, 2022

Failure

Relevant Snippet

Nics-MacBook-Pro:teleport nklaassen$ go test github.com/gravitational/teleport/api/client -run TestListResources --count 1000 --failfast
--- FAIL: TestListResources (0.01s)
    --- FAIL: TestListResources/DatabaseServer (0.00s)
        client_test.go:527:
                Error Trace:    client_test.go:527
                Error:          Received unexpected error:
                                rpc error: code = Internal desc = unexpected EOF
                Test:           TestListResources/DatabaseServer
FAIL
FAIL    github.com/gravitational/teleport/api/client    6.899s
FAIL

CI Logs:

@nklaassen nklaassen changed the title TestListResources is flaky TestListResources is flaky Jun 17, 2022
@nklaassen nklaassen removed the bug label Jun 20, 2022
brianneville added a commit to brianneville/grpcbug that referenced this issue Jun 21, 2022
related/teleport_test.go contains a test that minimally
 reproduces the issue noticed in gravitational/teleport#13548.
Also update go.mod and vendor with the testify/require package that teleport uses.

In related/TestExhaustConnFail.log, we see the client's framer
 (0xc0001981c0) reporting a flow control error on line 3086
 before the EOF is hit:
2022/06/21 22:20:02 http2: Framer 0xc0001981c0:
 wrote RST_STREAM stream=3 len=4 ErrCode=FLOW_CONTROL_ERROR

 Looking at RFC7540 for information on this error code,
 something that stands out is:
  https://datatracker.ietf.org/doc/html/rfc7540#section-6.9
 "Frames that are exempt from flow control MUST be accepted and
 processed, unless the receiver is unable to assign resources to
 handling the frame.  A receiver MAY respond with a stream error
 (Section 5.4.2) or connection error (Section 5.4.1) of type
 FLOW_CONTROL_ERROR if it is unable to accept a frame."
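
For readers unfamiliar with HTTP/2 flow control, here is a minimal toy model of per-stream window accounting in Go. It covers the ordinary DATA-frame case from the same RFC section, not the exempt-frame case quoted above, and it makes no claim about the root cause of this issue. The names `recvWindow`, `onData`, `release`, and `errFlowControl` are hypothetical and do not come from grpc-go or golang.org/x/net/http2.

```go
package main

import (
	"errors"
	"fmt"
)

// errFlowControl stands in for an HTTP/2 FLOW_CONTROL_ERROR (error code 0x3).
// Hypothetical name, used only for this sketch.
var errFlowControl = errors.New("FLOW_CONTROL_ERROR")

// recvWindow is a toy model of one stream's receive window.
// It is illustrative only, not how any real HTTP/2 library is implemented.
type recvWindow struct {
	available int // bytes the peer may still send
}

// onData accounts for a received DATA frame of n bytes and rejects
// frames that exceed the space currently available in the window.
func (w *recvWindow) onData(n int) error {
	if n > w.available {
		return errFlowControl
	}
	w.available -= n
	return nil
}

// release models the receiver sending WINDOW_UPDATE after consuming n bytes.
func (w *recvWindow) release(n int) {
	w.available += n
}

func main() {
	w := &recvWindow{available: 65535} // default initial window size in RFC 7540

	fmt.Println(w.onData(60000)) // fits in the window
	fmt.Println(w.onData(10000)) // exceeds the remaining 5535 bytes
	w.release(60000)             // receiver has consumed the first frame
	fmt.Println(w.onData(10000)) // fits again
}
```

In this model, a frame is rejected whenever the receiver has not yet returned window space to the sender, which is the general shape of the "resources still in use" guess in this commit message.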

Just a guess, but maybe the teleport issue is caused by grpc-go
 being slow to clean up after receiving large responses, so its
 resources are still in use by the time the next response
 is received?

Making the client sleep for a few milliseconds after exhausting
 resources also works to avoid the issue, although I didn't add
 a test for that because it would vary based on your system

It is weird that the workaround for the original bug in this repo
 (which did not involve exhausting resources and did not
 show the FLOW_CONTROL_ERROR in the logs) can also be applied
 to this issue, so perhaps there is something more to it.

zmb3 (Collaborator) commented Jul 29, 2022

This is waiting on grpc-go release 1.49.0

zmb3 (Collaborator) commented Dec 28, 2022

We're now on grpc-go 1.51.0 and I've run about 10K iterations without a failure, so I'm going to call this fixed.

@zmb3 zmb3 closed this as completed Dec 28, 2022