Race condition during hot restart causes unix domain sockets to sometimes be removed when the original arbiter is TERM'd. #1523

rowillia · 2017-06-06T19:23:06Z

It appears 418f140#diff-e8a86cbdd6c71fb2b04347a3a9990710 caused a regression for the issue fixed by #1220.

In a gunicorn instance running 16 workers, we see the unix domain socket get deleted out from underneath the new arbiter roughly half of the time, causing the new instance to fail to serve traffic.

I'm guessing we still need some amount of coordination between arbiters to ensure we're not deleting the UDS if it's going to be used by the new arbiter.
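
To illustrate the failure mode described above, here is a minimal standalone sketch. It is not gunicorn code: `demo_unlink_under_listener` is a hypothetical name, and `dup()` merely stands in for the new arbiter inheriting the listening socket. It shows that once the path is unlinked, the inherited socket stays open but clients can no longer reach it.

```python
import os
import socket
import tempfile


def demo_unlink_under_listener():
    """Return (still_open, reachable) after the 'old arbiter' unlinks the path."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "demo.sock")

    # "Old arbiter" binds and listens on the socket path.
    old = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    old.bind(path)
    old.listen(1)

    # "New arbiter" holds the same listening socket. dup() is an assumption
    # standing in for fd inheritance during a hot restart.
    new = old.dup()

    # Old arbiter exits: it closes its copy and removes the path.
    old.close()
    os.unlink(path)

    # The new arbiter's descriptor is still valid...
    still_open = new.fileno() != -1

    # ...but clients look the socket up by path, which no longer exists.
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        client.connect(path)
        reachable = True
    except OSError:
        reachable = False
    finally:
        client.close()
        new.close()
        os.rmdir(tmpdir)
    return still_open, reachable
```

The new instance looks healthy from the inside (its listener is open), which is consistent with it silently failing to serve traffic.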

rowillia · 2017-06-06T19:51:59Z

Tracked this down

The issue is that 418f140#diff-057c7e5522ab065edb0f33fddc76ddd2R368 is not enough to ensure we don't close the socket. Specifically, when hot reloading a gevent worker, we shut down the gevent server, which in turn closes all of the open sockets:

https://github.com/lyft/gunicorn/blob/lyft_19.60_fixes/gunicorn/workers/ggevent.py#L123-L127 calls into
https://github.com/gevent/gevent/blob/v1.2a2/src/gevent/baseserver.py#L312-L329

This makes me think we still need locking (or refcounting) to ensure we don't delete the UDS while it is still in use.
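
One way the refcounting idea could look is sketched below. This is only an illustration under assumptions, not something gunicorn implements: the `.refs` sidecar file and the `acquire`/`release` helpers are hypothetical names, and `fcntl.flock` is an advisory lock that only works if every arbiter cooperates. Each arbiter would call `acquire` when it starts using the socket and `release` on shutdown; the path is unlinked only when the count drops to zero, and the check and the unlink happen under the same lock.

```python
import contextlib
import fcntl
import os


@contextlib.contextmanager
def _locked_refs(sock_path):
    """Open the (hypothetical) sidecar counter file and hold an exclusive lock."""
    fd = os.open(sock_path + ".refs", os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until other arbiters release
        yield fd
    finally:
        os.close(fd)  # closing the descriptor also drops the lock


def _read_count(fd):
    os.lseek(fd, 0, os.SEEK_SET)
    return int(os.read(fd, 32).decode() or "0")


def _write_count(fd, count):
    os.lseek(fd, 0, os.SEEK_SET)
    os.ftruncate(fd, 0)
    os.write(fd, str(count).encode())


def acquire(sock_path):
    """Register one more user of the socket path; return the new count."""
    with _locked_refs(sock_path) as fd:
        count = _read_count(fd) + 1
        _write_count(fd, count)
        return count


def release(sock_path):
    """Drop one reference; unlink the path only if no users remain.

    Returns True if the socket path was removed.
    """
    with _locked_refs(sock_path) as fd:
        count = max(_read_count(fd) - 1, 0)
        _write_count(fd, count)
        if count == 0 and os.path.exists(sock_path):
            os.unlink(sock_path)
            return True
        return False
```

The counter file is deliberately left in place, since deleting a lock file tends to introduce its own races. A real fix would also need to handle an arbiter that dies without calling `release`.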

tilgovi · 2017-08-06T23:42:09Z

If new workers are started and you wait until they are listening before killing the old workers, does this avoid the problem? Is the use case a rapid USR2 followed by a QUIT/TERM to the old arbiter? Or does this occur on HUP?

benoitc · 2019-11-22T20:44:01Z

@tilgovi is this still relevant?

tilgovi · 2019-11-24T00:00:20Z

I will check.

benoitc · 2023-05-07T13:19:00Z

No activity in a while, so closing the issue. We will reopen it if needed.

si14 mentioned this issue Jun 7, 2017

Missing unix socket after restart #1524

Closed

tilgovi added Investigation bug :( labels Sep 7, 2017

tilgovi self-assigned this Apr 28, 2018

benoitc closed this as not planned May 7, 2023
