Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gunicorn doesn't shutdown immediately on receiving SIGTERM #1256

Closed
suriya opened this issue May 9, 2016 · 16 comments · Fixed by #1257
Closed

Gunicorn doesn't shutdown immediately on receiving SIGTERM #1256

suriya opened this issue May 9, 2016 · 16 comments · Fixed by #1257
Labels
bug :( Feature/Worker help wanted Open for everyone. You do not need permission to work on these. May need familiarity with codebase.

Comments

@suriya
Copy link
Contributor

suriya commented May 9, 2016

I am seeing a weird behavior where Gunicorn does not shutdown immediately when sent a SIGTERM. I see this with Python 3.5 on Ubuntu 16.04. The following environments (that I have access to) do not exhibit this issue:

  • Python 2.7 on Ubuntu 16.04
  • Python 2.7 on Ubuntu 14.04
  • Python 3.4 on Ubuntu 14.04

Refer to https://gist.github.com/suriya/1d9a11ea9ad33a5b1df0b2ba7ed0cd0e on how to reproduce this issue.

After debugging a while, I found that the Gunicorn arbiter properly signals the child worker. The child workers handle the signal properly as well (on all versions). However, something prevents the loop in run_for_one function

while self.alive:
from exiting right away with Python 3.5.

Could Python 3.5's change in the behavior of select.select() https://docs.python.org/3/library/select.html#select.select explain this issue.

@tilgovi
Copy link
Collaborator

tilgovi commented May 9, 2016

Do you have open connections when this happens? If so, that is expected. SIGTERM is a graceful shutdown and subject to the --graceful-timeout setting (default 30s).

At present, at least on the master branch, I observe a worker exiting within 1s of sending SIGTERM to the arbiter, tested on Python 3.5.

@suriya
Copy link
Contributor Author

suriya commented May 10, 2016

There is no open connection. As shown in the gist, All that the script does is start gunicorn and send a SIGTERM.

@suriya
Copy link
Contributor Author

suriya commented May 10, 2016

Happens to me with the master branch as well. This is with Python 3.5 on Ubuntu 16.04. Maybe, this happens because of Ubuntu.

$ python --version
Python 3.5.1+
$ gunicorn --version
gunicorn (version 19.5.0)
$ time ./run-and-stop-gunicorn.sh 
[2016-05-10 01:32:59 +0000] [8537] [INFO] Starting gunicorn 19.5.0
[2016-05-10 01:32:59 +0000] [8537] [INFO] Listening at: http://127.0.0.1:8000 (8537)
[2016-05-10 01:32:59 +0000] [8537] [INFO] Using worker: sync
[2016-05-10 01:32:59 +0000] [8541] [INFO] Booting worker with pid: 8541
[2016-05-10 01:33:00 +0000] [8537] [INFO] Handling signal: term
[2016-05-10 01:33:30 +0000] [8537] [INFO] Shutting down: Master

real    0m31.102s
user    0m0.320s
sys 0m0.012s

@suriya
Copy link
Contributor Author

suriya commented May 10, 2016

The same behavior happens with Python 3.5, Gunicorn (master, commit 9d158be), Centos 7.

$ time ./run-and-stop-gunicorn.sh 
CentOS Linux release 7.2.1511 (Core) 
Python 3.5.1
[2016-05-10 04:25:12 +0000] [1448] [INFO] Starting gunicorn 19.5.0
[2016-05-10 04:25:12 +0000] [1448] [INFO] Listening at: http://127.0.0.1:8000 (1448)
[2016-05-10 04:25:12 +0000] [1448] [INFO] Using worker: sync
[2016-05-10 04:25:12 +0000] [1452] [INFO] Booting worker with pid: 1452
[2016-05-10 04:25:13 +0000] [1448] [INFO] Handling signal: term
[2016-05-10 04:25:43 +0000] [1448] [INFO] Shutting down: Master

real    0m31.109s
user    0m0.231s
sys 0m0.053s

Python 3.4 does not have this issue.

$ time ./run-and-stop-gunicorn.sh 
CentOS Linux release 7.2.1511 (Core) 
Python 3.4.4
[2016-05-10 04:24:09 +0000] [1386] [INFO] Starting gunicorn 19.5.0
[2016-05-10 04:24:09 +0000] [1386] [INFO] Listening at: http://127.0.0.1:8000 (1386)
[2016-05-10 04:24:09 +0000] [1386] [INFO] Using worker: sync
[2016-05-10 04:24:09 +0000] [1390] [INFO] Booting worker with pid: 1390
[2016-05-10 04:24:09 +0000] [1386] [INFO] Handling signal: term
[2016-05-10 04:24:09 +0000] [1390] [INFO] Worker exiting (pid: 1390)
[2016-05-10 04:24:09 +0000] [1386] [INFO] Shutting down: Master

real    0m1.103s
user    0m0.240s
sys 0m0.058s

@tilgovi
Copy link
Collaborator

tilgovi commented May 10, 2016

You're right. I do observe this on Python 3. Perhaps I made a mistake earlier when I was testing.

@tilgovi
Copy link
Collaborator

tilgovi commented May 10, 2016

I also agree with the diagnosis.

@tilgovi tilgovi added bug :( Feature/Worker help wanted Open for everyone. You do not need permission to work on these. May need familiarity with codebase. labels May 10, 2016
@tilgovi
Copy link
Collaborator

tilgovi commented May 10, 2016

I don't believe this affects any of the async workers or the threaded worker.

@suriya
Copy link
Contributor Author

suriya commented May 10, 2016

PEP 475 https://www.python.org/dev/peps/pep-0475/#backward-compatibility has some information how this should be addressed.

Applications relying on the fact that system calls are interrupted with InterruptedError will hang. The authors of this PEP don't think that such applications exist, since they would be exposed to other issues such as race conditions (there is an opportunity for deadlock if the signal comes before the system call). Besides, such code would be non-portable.

In any case, those applications must be fixed to handle signals differently, to have a reliable behaviour on all platforms and all Python versions. A possible strategy is to set up a signal handler raising a well-defined exception, or use a wakeup file descriptor.

For applications using event loops, signal.set_wakeup_fd() is the recommanded option to handle signals. Python's low-level signal handler will write signal numbers into the file descriptor and the event loop will be awaken to read them. The event loop can handle those signals without the restriction of signal handlers (for example, the loop can be woken up in any thread, not just the main thread).

@benoitc
Copy link
Owner

benoitc commented May 10, 2016

Just tested it and was able to reproduce it. At least for me gunicorn exit is delayed. Removing the --timeout arguments makes the problem gone.

After debugging a little, it appears that after receiving the signal, the select is indeed blocking there:
https://github.com/benoitc/gunicorn/blob/master/gunicorn/workers/sync.py#L35

I am not sure yet how to solve the issue, I will reread the PEP 475 carefully this morning to see if it can give a solution, thanks for the link!

benoitc added a commit that referenced this issue May 10, 2016
in python 3.5 the select is blocking when waiting for it which prevent quick
exit on SIGTERM.

The problem is described:
https://www.python.org/dev/peps/pep-0475/#backward-compatibility

This change fix it by listening for signal event on the worker pipe. Once an
event is triggered it will forcefully wake up the select and return.

fix #1256
@benoitc
Copy link
Owner

benoitc commented May 10, 2016

@suriya @tilgovi please test #1257 and let me know :)

@suriya
Copy link
Contributor Author

suriya commented May 10, 2016

The test script run-and-stop-gunicorn.sh behaves identically with Python 3.5 and Python 3.4. The pull request addresses the SIGTERM issue.

@benoitc Are there other parts of the gunicorn that break because of PEP 475?

@benoitc
Copy link
Owner

benoitc commented May 10, 2016

@suriya not to my knowledge, I tested with other workers and it was fine.

@benoitc
Copy link
Owner

benoitc commented May 10, 2016

@suriya so the patch is fixing your issue right?

@suriya
Copy link
Contributor Author

suriya commented May 10, 2016

@benoitc Yes, absolutely fixing the issue. Thank you.

I was just wondering if there would be other places that might need to be fixed since the effects of PEP 475 seem to be pervasive.

@benoitc
Copy link
Owner

benoitc commented May 10, 2016

@suriya normally not, but will keep an eye on it :-)

mjjbell pushed a commit to mjjbell/gunicorn that referenced this issue Mar 16, 2018
in python 3.5 the select is blocking when waiting for it which prevent quick
exit on SIGTERM.

The problem is described:
https://www.python.org/dev/peps/pep-0475/#backward-compatibility

This change fix it by listening for signal event on the worker pipe. Once an
event is triggered it will forcefully wake up the select and return.

fix benoitc#1256
@confiq
Copy link
Contributor

confiq commented Jun 19, 2018

For those who arrive here and wonder in what version in gunicorn this was fixed, it's 19.5.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug :( Feature/Worker help wanted Open for everyone. You do not need permission to work on these. May need familiarity with codebase.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants