Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Goroutine leaks in TestClientHandshakeBasedOnClusterName #5091

Closed
timmyyuan opened this issue Dec 28, 2021 · 5 comments · Fixed by #5118
Closed

Goroutine leaks in TestClientHandshakeBasedOnClusterName #5091

timmyyuan opened this issue Dec 28, 2021 · 5 comments · Fixed by #5118
Assignees

Comments

@timmyyuan
Copy link

What version of gRPC are you using?

master 344b93a285883f2da713622d5064ad4b4512e63e

What version of Go are you using (go version)?

go version go1.17.2 linux/amd64

What operating system (Linux, Windows, …) and version?

$ uname -a
Linux DESKTOP-L5197OT 4.19.128-microsoft-standard #1 SMP Tue Jun 23 12:58:10 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

What did you do?

I introduce goleak to TestClientHandshakeBasedOnClusterName and run

go test -v -count 1 -run TestClientHandshakeBasedOnClusterName ./credentials/google

What did you expect to see?

The test passes

What did you see instead?

Goroutine leaks:

=== RUN   TestClientHandshakeBasedOnClusterName
=== RUN   TestClientHandshakeBasedOnClusterName/defaultCreds_no_cluster_name
=== RUN   TestClientHandshakeBasedOnClusterName/defaultCreds_with_non-CFE_cluster_name
=== RUN   TestClientHandshakeBasedOnClusterName/defaultCreds_with_CFE_cluster_name
=== RUN   TestClientHandshakeBasedOnClusterName/computeCreds_no_cluster_name
=== RUN   TestClientHandshakeBasedOnClusterName/computeCreds_with_non-CFE_cluster_name
=== RUN   TestClientHandshakeBasedOnClusterName/computeCreds_with_CFE_cluster_name
=== RUN   TestClientHandshakeBasedOnClusterName/defaultCredsWithOptions_no_cluster_name
=== RUN   TestClientHandshakeBasedOnClusterName/defaultCredsWithOptions_with_non-CFE_cluster_name
=== RUN   TestClientHandshakeBasedOnClusterName/defaultCredsWithOptions_with_CFE_cluster_name
=== CONT  TestClientHandshakeBasedOnClusterName
    leaks.go:78: found unexpected goroutines:
        [Goroutine 18 in state IO wait, with internal/poll.runtime_pollWait on top of the stack:
        goroutine 18 [IO wait]:
        internal/poll.runtime_pollWait(0x7f130d2c97c8, 0x77)
                /usr/local/go/src/runtime/netpoll.go:229 +0x89
        internal/poll.(*pollDesc).wait(0xc0000b4080, 0xc0000901b0, 0x0)
                /usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x32
        internal/poll.(*pollDesc).waitWrite(...)
                /usr/local/go/src/internal/poll/fd_poll_runtime.go:93
        internal/poll.(*FD).WaitWrite(...)
                /usr/local/go/src/internal/poll/fd_unix.go:529
        net.(*netFD).connect(0xc0000b4080, {0xa089c8, 0xc0000ac2a0}, {0xc00010f428, 0x40ed14}, {0x9fbfe0, 0xc0000b6000})
                /usr/local/go/src/net/fd_unix.go:142 +0x717
        net.(*netFD).dial(0xc0000b4080, {0xa089c8, 0xc0000ac2a0}, {0xa0e460, 0x0}, {0xa0e460, 0xc000090180}, 0xc00010f618)
                /usr/local/go/src/net/sock_posix.go:150 +0x379
        net.socket({0xa089c8, 0xc0000ac2a0}, {0x95161e, 0x3}, 0x2, 0x1, 0x0, 0x18, {0xa0e460, 0x0}, ...)
                /usr/local/go/src/net/sock_posix.go:71 +0x2a5
        net.internetSocket({0xa089c8, 0xc0000ac2a0}, {0x95161e, 0x3}, {0xa0e460, 0x0}, {0xa0e460, 0xc000090180}, 0x467d0e, 0x0, ...)
                /usr/local/go/src/net/ipsock_posix.go:142 +0xf8
        net.(*sysDialer).doDialTCP(0xc0000b4000, {0xa089c8, 0xc0000ac2a0}, 0x0, 0x8c8a40)
                /usr/local/go/src/net/tcpsock_posix.go:66 +0xa5
        net.(*sysDialer).dialTCP(0xc0000ac2a0, {0xa089c8, 0xc0000ac2a0}, 0x4f9206, 0x0)
                /usr/local/go/src/net/tcpsock_posix.go:62 +0x59
        net.(*sysDialer).dialSingle(0xc0000b4000, {0xa089c8, 0xc0000ac2a0}, {0xa064f0, 0xc000090180})
                /usr/local/go/src/net/dial.go:583 +0x28b
        net.(*sysDialer).dialSerial(0xc0000b4000, {0xa089c8, 0xc0000ac2a0}, {0xc00009e080, 0x1, 0x951a0d})
                /usr/local/go/src/net/dial.go:551 +0x312
        net.(*Dialer).DialContext(0xc000020960, {0xa08990, 0xc000024080}, {0x95161e, 0x7f130c27db30}, {0xc0000a8000, 0x118})
                /usr/local/go/src/net/dial.go:428 +0x736
        net.(*Dialer).Dial(...)
                /usr/local/go/src/net/dial.go:351
        net/http.(*Transport).dial(0xc0000a0000, {0xa08958, 0xc000073100}, {0x95161e, 0x4000105}, {0xc0000a8000, 0xffffffffffffffff})
                /usr/local/go/src/net/http/transport.go:1169 +0x5a
        net/http.(*Transport).dialConn(0xd65da0, {0xa08958, 0xc000073100}, {{}, 0x0, {0x95d52f, 0x4}, {0xc0000a8000, 0x12}, 0x0})
                /usr/local/go/src/net/http/transport.go:1604 +0x845
        net/http.(*Transport).dialConnFor(0xc00011dd40, 0xc0000aa000)
                /usr/local/go/src/net/http/transport.go:1446 +0xb0
        created by net/http.(*Transport).queueForDial
                /usr/local/go/src/net/http/transport.go:1415 +0x3d7

         Goroutine 19 in state select, with net.(*netFD).connect.func2 on top of the stack:
        goroutine 19 [select]:
        net.(*netFD).connect.func2()
                /usr/local/go/src/net/fd_unix.go:119 +0x9e
        created by net.(*netFD).connect
                /usr/local/go/src/net/fd_unix.go:118 +0x385
        ]
--- FAIL: TestClientHandshakeBasedOnClusterName (0.45s)
    --- PASS: TestClientHandshakeBasedOnClusterName/defaultCreds_no_cluster_name (0.00s)
    --- PASS: TestClientHandshakeBasedOnClusterName/defaultCreds_with_non-CFE_cluster_name (0.00s)
    --- PASS: TestClientHandshakeBasedOnClusterName/defaultCreds_with_CFE_cluster_name (0.00s)
    --- PASS: TestClientHandshakeBasedOnClusterName/computeCreds_no_cluster_name (0.00s)
    --- PASS: TestClientHandshakeBasedOnClusterName/computeCreds_with_non-CFE_cluster_name (0.00s)
    --- PASS: TestClientHandshakeBasedOnClusterName/computeCreds_with_CFE_cluster_name (0.00s)
    --- PASS: TestClientHandshakeBasedOnClusterName/defaultCredsWithOptions_no_cluster_name (0.00s)
    --- PASS: TestClientHandshakeBasedOnClusterName/defaultCredsWithOptions_with_non-CFE_cluster_name (0.00s)
    --- PASS: TestClientHandshakeBasedOnClusterName/defaultCredsWithOptions_with_CFE_cluster_name (0.00s)
FAIL
FAIL    google.golang.org/grpc/credentials/google       0.458s
FAIL
@easwars
Copy link
Contributor

easwars commented Dec 30, 2021

We do have our own leakcheck implementation in https://github.com/grpc/grpc-go/blob/master/internal/leakcheck/leakcheck.go.

I've sent #5098 to use our leakchecker.

If you feel there is a bug in the test or in the code that is being tested, please let us know.

@sunjayBhatia
Copy link

sunjayBhatia commented Jan 4, 2022

After running into this myself, I think I have an explanation to why I saw this leak, though it may not explain the original reporter's usage. I was able to narrow the leak down to only occurring for the Google default credential cases of the test in question.

If you are running the tests on a GCE instance this shows up, unless you explicitly override the credentials, e.g. (where /tmp/foo contains valid creds)

GOOGLE_APPLICATION_CREDENTIALS=/tmp/foo go test -v -count 1 ./credentials/google

I think the goroutines being leaked are the ones from here (the check to see if the process is running on a GCE instance): https://github.com/googleapis/google-cloud-go/blob/a531b65383652a2905d5d0a9f3d156b342bc31c1/compute/metadata/metadata.go#L108-L182

@github-actions
Copy link

This issue is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed.

@github-actions github-actions bot added the stale label Jan 11, 2022
@sunjayBhatia
Copy link

cc @easwars ^

@easwars
Copy link
Contributor

easwars commented Jan 11, 2022

Thanks @sunjayBhatia for the pointers. I'm able to reproduce a goroutine leak on GCE even without overriding the default creds.

easwars@easwars-directpath-rls-test-vm:~/src/grpc-go$ go test google.golang.org/grpc/credentials/google -run Test/ClientHandshakeBasedOnClusterName/defaultCredsWithOptions_no_cluster_name -count 1
--- FAIL: Test (10.02s)
    --- FAIL: Test/ClientHandshakeBasedOnClusterName (10.02s)
        grpctest.go:39: Leaked goroutine: goroutine 10 [IO wait]:
            internal/poll.runtime_pollWait(0x7f7004a677c8, 0x72)
            	/usr/local/go/src/runtime/netpoll.go:234 +0x89
            internal/poll.(*pollDesc).wait(0xc0000f2080, 0xc0001b4000, 0x0)
            	/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x32
            internal/poll.(*pollDesc).waitRead(...)
            	/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
            internal/poll.(*FD).Read(0xc0000f2080, {0xc0001b4000, 0x1000, 0x1000})
            	/usr/local/go/src/internal/poll/fd_unix.go:167 +0x25a
            net.(*netFD).Read(0xc0000f2080, {0xc0001b4000, 0x43a2e7, 0xc000053c30})
            	/usr/local/go/src/net/fd_posix.go:56 +0x29
            net.(*conn).Read(0xc000010310, {0xc0001b4000, 0x12, 0xc0001b21a0})
            	/usr/local/go/src/net/net.go:183 +0x45
            net/http.(*persistConn).Read(0xc0000e4360, {0xc0001b4000, 0xc0000764e0, 0xc000053d30})
            	/usr/local/go/src/net/http/transport.go:1926 +0x4e
            bufio.(*Reader).fill(0xc00006cf60)
            	/usr/local/go/src/bufio/bufio.go:101 +0x103
            bufio.(*Reader).Peek(0xc00006cf60, 0x1)
            	/usr/local/go/src/bufio/bufio.go:139 +0x5d
            net/http.(*persistConn).readLoop(0xc0000e4360)
            	/usr/local/go/src/net/http/transport.go:2087 +0x1ac
            created by net/http.(*Transport).dialConn
            	/usr/local/go/src/net/http/transport.go:1747 +0x1e05
        grpctest.go:39: Leaked goroutine: goroutine 11 [select]:
            net/http.(*persistConn).writeLoop(0xc0000e4360)
            	/usr/local/go/src/net/http/transport.go:2386 +0xfb
            created by net/http.(*Transport).dialConn
            	/usr/local/go/src/net/http/transport.go:1748 +0x1e65
        grpctest.go:60: Leak check disabled for future tests
FAIL
FAIL	google.golang.org/grpc/credentials/google	10.027s
FAIL

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 11, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants