Possible deadlock in Server.GracefulStop #4584

Closed
bboreham opened this issue Jul 5, 2021 · 3 comments

Comments

@bboreham
Contributor

bboreham commented Jul 5, 2021

What version of gRPC are you using?

v1.34.0, but I see the same code in latest master.

What version of Go are you using (go version)?

go version go1.16.3 linux/amd64

What operating system (Linux, Windows, …) and version?

Linux gke-dev-us-central-0-main-n2s8-0-dbda6bb2-m1q4 4.19.167+ #1 SMP Tue Feb 2 07:59:55 PST 2021 x86_64 GNU/Linux

What did you do?

Did a bunch of serving, then called GracefulStop(). Sorry, I only observed the hang once.
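
For reference, a minimal sketch of this serve-then-GracefulStop pattern (illustrative only, not the reporter's code; the listener address and signal handling are assumptions):

package main

import (
	"net"
	"os"
	"os/signal"

	"google.golang.org/grpc"
)

func main() {
	lis, err := net.Listen("tcp", ":9095") // address is arbitrary for illustration
	if err != nil {
		panic(err)
	}
	s := grpc.NewServer()
	// ... service registrations elided ...

	go func() {
		// Serve returns once GracefulStop (or Stop) is called.
		_ = s.Serve(lis)
	}()

	// Wait for a termination signal, then drain in-flight RPCs and exit.
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, os.Interrupt)
	<-sig
	s.GracefulStop()
}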

What did you expect to see?

Clean shutdown.

What did you see instead?

A hang:

goroutine 401 [sync.Cond.Wait, 11 minutes]:
sync.runtime_notifyListWait(0xc0008b6210, 0xc000000002)
	/usr/local/go/src/runtime/sema.go:513 +0xf8
sync.(*Cond).Wait(0xc0008b6200)
	/usr/local/go/src/sync/cond.go:56 +0x99
google.golang.org/grpc.(*Server).GracefulStop(0xc0008c2700)
	/backend-enterprise/vendor/google.golang.org/grpc/server.go:1695 +0x1ee
github.com/weaveworks/common/server.(*Server).Shutdown(0xc000265340)
	/backend-enterprise/vendor/github.com/weaveworks/common/server/server.go:438 +0xac
...

Translating line numbers into the current source, it is stuck on s.cv.Wait() here:

grpc-go/server.go

Lines 1788 to 1792 in dd58992

s.mu.Lock()
for len(s.conns) != 0 {
	s.cv.Wait()
}

Note that the s.mu lock is held, but all calls to s.cv.Broadcast() also require taking that lock; hence the deadlock.

(I don't see a goroutine attempting to take that lock; I do see evidence that I was hit by #4447, but I wanted to report this one since it seems fairly clear-cut.)

easwars self-assigned this Jul 7, 2021
@easwars
Contributor

easwars commented Jul 7, 2021

Do you happen to have the full goroutine dump from the time of the hang?
Have you been able to rule out the possibility that one of your service handlers is not exiting, which would explain why GracefulStop() is still stuck?

Translating line numbers into the current source, it is stuck on s.cv.Wait() here:

grpc-go/server.go

Lines 1788 to 1792 in dd58992

s.mu.Lock()
for len(s.conns) != 0 {
	s.cv.Wait()
}

The call to Wait() atomically releases the lock and suspends execution of the calling goroutine until Signal() or Broadcast() is called from another goroutine. So I don't follow your claim that holding the lock here deadlocks the goroutine that calls Broadcast().
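
To make the Wait()/Broadcast() interaction concrete, here is a self-contained sketch (not grpc-go code; the conns counter stands in for len(s.conns)). The waiter holds the mutex only outside of Wait(), so the goroutine that takes the same lock and calls Broadcast() is never blocked by it:

package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var mu sync.Mutex
	cv := sync.NewCond(&mu)
	conns := 3 // stand-in for len(s.conns)

	// "GracefulStop" side: holds mu, but Wait unlocks it while blocked.
	done := make(chan struct{})
	go func() {
		mu.Lock()
		for conns != 0 {
			cv.Wait() // atomically unlocks mu, sleeps, re-locks on wakeup
		}
		mu.Unlock()
		close(done)
	}()

	// "removeConn" side: takes the same lock, which succeeds because the
	// waiter is not holding it while inside Wait.
	for i := 0; i < 3; i++ {
		time.Sleep(10 * time.Millisecond)
		mu.Lock()
		conns--
		cv.Broadcast()
		mu.Unlock()
	}

	<-done
	fmt.Println("all connections drained; no deadlock")
}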

@bboreham
Contributor Author

bboreham commented Jul 7, 2021

I was not hit by this deadlock; I present it as a possibility I spotted while investigating something else, which I concluded was #4447.

Can you expand on how calling Wait on a Condition releases a lock on a Mutex?

@bboreham
Contributor Author

bboreham commented Jul 7, 2021

OK, I see what you mean. Sorry for the trouble.

bboreham closed this as completed Jul 7, 2021
github-actions bot locked as resolved and limited conversation to collaborators Jan 4, 2022