Race condition during hot restart causes unix domain sockets to sometimes be removed when the original arbiter is TERM'd. #1523

Closed
rowillia opened this issue Jun 6, 2017 · 5 comments

Comments

@rowillia
Contributor

rowillia commented Jun 6, 2017

It appears 418f140#diff-e8a86cbdd6c71fb2b04347a3a9990710 caused a regression for the issue fixed by #1220.

In a gunicorn instance running 16 workers, we see the Unix domain socket (UDS) get deleted out from underneath the new arbiter roughly half of the time, causing the new instance to fail to serve traffic.

I'm guessing we still need some amount of coordination between arbiters to ensure we're not deleting the UDS if it's going to be used by the new arbiter.

@rowillia
Contributor Author

rowillia commented Jun 6, 2017

Tracked this down.

The issue is that 418f140#diff-057c7e5522ab065edb0f33fddc76ddd2R368 is not enough to ensure we don't close the socket. Specifically, when hot reloading a gevent worker, we shut down the gevent server, which in turn closes all of the open sockets:

https://github.com/lyft/gunicorn/blob/lyft_19.60_fixes/gunicorn/workers/ggevent.py#L123-L127 calls into
https://github.com/gevent/gevent/blob/v1.2a2/src/gevent/baseserver.py#L312-L329

This makes me think we still need locking (or refcounting) to ensure we don't delete the UDS.
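
A minimal sketch of what such locking could look like, not gunicorn's actual code. It assumes each arbiter takes a shared `flock` on a sidecar lock file while it uses the socket path, and unlinks the path on shutdown only if it can upgrade to an exclusive lock (meaning no other holder remains). The `SharedSocketPath` class, the `.lock` suffix, and `demo` are hypothetical names introduced for illustration.

```python
import fcntl
import os
import socket
import tempfile


class SharedSocketPath:
    """Hypothetical helper: tracks users of a UDS path via a sidecar lock file."""

    def __init__(self, path):
        self.path = path
        # Each holder opens the lock file itself, so flock treats holders separately.
        self._lock_file = open(path + ".lock", "w")
        fcntl.flock(self._lock_file, fcntl.LOCK_SH)

    def release(self):
        """Drop our hold; unlink the socket path only if nobody else holds it.

        Returns True if this call was the last holder and removed the path.
        """
        removed = False
        try:
            # Succeeds only when no other holder has the shared lock.
            fcntl.flock(self._lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            pass  # another arbiter still uses the socket; leave the path alone
        else:
            if os.path.exists(self.path):
                os.unlink(self.path)
            removed = True
        finally:
            self._lock_file.close()
        return removed


def demo():
    """Simulate an old and a new arbiter sharing one socket path."""
    path = os.path.join(tempfile.mkdtemp(), "app.sock")
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(path)
    old_arbiter = SharedSocketPath(path)
    new_arbiter = SharedSocketPath(path)
    first = old_arbiter.release()       # new arbiter still holds the lock
    still_there = os.path.exists(path)
    second = new_arbiter.release()      # last holder removes the path
    gone = not os.path.exists(path)
    listener.close()
    return first, still_there, second, gone
```

In this sketch the old arbiter's `release()` leaves the path in place because the new arbiter still holds the lock. Converting a `flock` from shared to exclusive is not guaranteed to be atomic, so a real fix would need to consider two arbiters exiting at the same moment.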

@tilgovi
Collaborator

tilgovi commented Aug 6, 2017

If new workers are started and you wait until they are listening before killing the old workers, does this avoid the problem? Is the use case a rapid USR2 followed by a QUIT/TERM to the old arbiter? Or does this occur on HUP?
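
The sequence asked about here could be scripted roughly as follows. This is a sketch under assumptions: `hot_restart` and `is_new_ready` are hypothetical names, and how readiness is detected (for example, a successful connection to the socket by a process known to belong to the new arbiter) is left to the caller. Only the USR2 and TERM signals come from the discussion.

```python
import os
import signal
import time


def hot_restart(old_pid, is_new_ready, send_signal=os.kill,
                timeout=30.0, interval=0.5):
    """Send USR2 to the old arbiter, wait for readiness, then TERM it.

    `is_new_ready` is a caller-supplied callable (an assumption of this
    sketch) that returns True once the new workers are listening.
    Returns False, leaving the old arbiter running, if the timeout expires.
    """
    send_signal(old_pid, signal.SIGUSR2)   # ask the old arbiter to start a new one
    deadline = time.monotonic() + timeout
    while not is_new_ready():
        if time.monotonic() >= deadline:
            return False                   # never TERM before the new side is ready
        time.sleep(interval)
    send_signal(old_pid, signal.SIGTERM)   # now shut down the old arbiter
    return True
```

If the race still shows up with a script like this, the problem is not simply a TERM sent too early.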

@benoitc
Owner

benoitc commented Nov 22, 2019

@tilgovi is this still relevant?

@tilgovi
Collaborator

tilgovi commented Nov 24, 2019

I will check.

@benoitc
Owner

benoitc commented May 7, 2023

No activity in a while. Closing the issue; we will reopen it if needed.

@benoitc closed this as not planned May 7, 2023