Request interrupted during graceful restarting due to the signal #1337

Closed
cadina opened this issue Aug 25, 2016 · 7 comments

@cadina

cadina commented Aug 25, 2016

@xiaost's workaround is to block signals before processing a request and unblock after finishing the request.

Since this is a very common problem, is there any side effect that prevents Gunicorn from handling the signal like this by default?
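
For reference, a minimal sketch of that workaround, assuming Python 3.3+ where signal.pthread_sigmask is available (handle_request and the echo body are hypothetical stand-ins for the worker's request handling, not Gunicorn's API):

    import signal

    SHUTDOWN_SIGNALS = {signal.SIGTERM, signal.SIGINT}

    def handle_request(conn):
        # Block the shutdown signals so they stay pending instead of
        # interrupting the request that is currently in flight.
        signal.pthread_sigmask(signal.SIG_BLOCK, SHUTDOWN_SIGNALS)
        try:
            data = conn.recv(1024)   # read the request
            conn.sendall(data)       # write the response (an echo, for the sketch)
        finally:
            # Unblocking delivers any pending SIGTERM/SIGINT here, only after
            # the response has been fully written.
            signal.pthread_sigmask(signal.SIG_UNBLOCK, SHUTDOWN_SIGNALS)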

@xiaost

xiaost commented Aug 25, 2016

gunicorn calls signal.siginterrupt, which ensures syscalls are restarted.
This does not work when the socket timeout is > 0 (calls that wait with a timeout are not restarted even with SA_RESTART), and solutions to the EINTR issue seem to be platform-specific.
Rather than sending SIGTERM to the worker, a better approach would be to use a tmpfile (setting an exit byte) for graceful shutdown.
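
To make the tmpfile suggestion concrete, a generic sketch (not Gunicorn code) of signaling shutdown through a shared control file instead of SIGTERM; the worker only checks the file between requests, so an in-flight request is never interrupted. The function names, the ctl_path parameter and the 1-second poll interval are made up for illustration:

    import os
    import socket

    # Arbiter side: request a graceful shutdown by writing an "exit byte"
    # to a control file shared with the worker.
    def request_shutdown(ctl_path):
        with open(ctl_path, "wb") as f:
            f.write(b"\x01")

    # Worker side: check the control file between requests, so the request
    # currently being handled is never cut short by a signal.
    def serve(listener, handle_request, ctl_path):
        listener.settimeout(1.0)  # wake up periodically to check the file
        while not (os.path.exists(ctl_path) and os.path.getsize(ctl_path) > 0):
            try:
                conn, _ = listener.accept()
            except socket.timeout:
                continue          # no request; re-check the exit byte
            handle_request(conn)  # runs to completion before re-checking
            conn.close()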

@tilgovi
Collaborator

tilgovi commented Sep 22, 2016

If someone can suggest an action here, we can try to make a change, but I am unable to decipher what this issue is asking for.

Gunicorn already uses siginterrupt when available, and PEP 475 recently introduced new default behavior for Python 3.5 under which system calls should not be interrupted.

Furthermore, while these solutions may not work for every platform and every Python version, we plan to overhaul the inter-process communication for the next major iteration of Gunicorn.

Is there any action that should be taken on this issue or can it be closed?
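
To illustrate the PEP 475 behavior mentioned above, a small self-contained example, assuming Python 3.5+ (nothing here is Gunicorn-specific; SIGALRM merely stages an interruption while accept() is blocking):

    import signal
    import socket
    import threading
    import time

    # The handler returns normally, so under PEP 475 (Python 3.5+) the
    # interrupted accept() below is retried by the interpreter instead of
    # raising InterruptedError.
    signal.signal(signal.SIGALRM, lambda signum, frame: print("SIGALRM handled"))

    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    port = srv.getsockname()[1]

    client = threading.Thread(
        target=lambda: (time.sleep(2),
                        socket.create_connection(("127.0.0.1", port)).close()))
    client.start()

    signal.alarm(1)             # SIGALRM arrives while accept() is blocking
    conn, addr = srv.accept()   # survives the signal on 3.5+; EINTR on older versions
    print("accepted", addr)
    conn.close()
    srv.close()
    client.join()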

@ivarref

ivarref commented Oct 4, 2016

Hi @tilgovi, and thanks for a fine framework.

I think the issue here is that worker processes will receive SIGINT/SIGTERM and simply do a sys.exit(0) even if they are in the middle of handling a request.

I have reproduced a similar issue here: https://github.com/ivarref/gunicorn-swarm
Should I open a separate issue for this?

Correct me if I'm wrong.
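
For anyone trying to reproduce this, a hypothetical minimal setup in the spirit of the linked repository (module name, sleep duration and command line are made up for illustration):

    # slow_app.py -- run with: gunicorn --graceful-timeout 3 slow_app:app
    import time

    def app(environ, start_response):
        print("handling request ...")
        time.sleep(5)  # long enough that a SIGTERM to the master lands mid-request
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"done\n"]

Hammering it with requests in one terminal while sending SIGTERM to the master pid in another is roughly what the linked scripts automate.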

@tilgovi
Collaborator

tilgovi commented Oct 4, 2016

INT causes sys.exit but TERM does not.

Is this different from what you see? Can someone tell me what Gunicorn version, OS, Python version, and worker class they see interruptions on?

As far as I know, graceful shutdown works on every worker type we ship on Py2.7 and Py3.4 on Linux. I believe I was testing 3.4 and not 3.5 last time I tried.
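
To make the INT/TERM distinction concrete, a generic sketch of the two handler styles (a simplified illustration, not Gunicorn's actual worker code; listener and handle_request are hypothetical parameters):

    import signal
    import sys

    alive = True

    def handle_int(signum, frame):
        # Immediate shutdown: raises SystemExit right away, even mid-request.
        sys.exit(0)

    def handle_term(signum, frame):
        # Graceful shutdown: only flip a flag; the loop below finishes the
        # current request and then stops accepting new ones.
        global alive
        alive = False

    signal.signal(signal.SIGINT, handle_int)
    signal.signal(signal.SIGTERM, handle_term)

    def serve(listener, handle_request):
        while alive:
            # A real worker also needs a timeout here so it notices the flag
            # when no new connection arrives.
            conn, _ = listener.accept()
            handle_request(conn)  # completes even if SIGTERM arrived meanwhile
            conn.close()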

@ivarref

ivarref commented Oct 5, 2016

Hi @tilgovi and thanks for the fast reply.

Let me focus on SIGTERM.

I added some logging to Gunicorn:master and got the following output:

[2016-10-05 12:33:12 +0000] [24020] [INFO] Starting gunicorn 19.6.0
[2016-10-05 12:33:12 +0000] [24020] [INFO] Listening at: http://0.0.0.0:8080 (24020)
[2016-10-05 12:33:12 +0000] [24020] [INFO] Using worker: sync
[2016-10-05 12:33:12 +0000] [24028] [INFO] Booting worker with pid: 24028

# At this point I started hammering the server with requests in a different terminal. See the commit linked below for details.

handling request ...

# At this point I executed kill 24020 in a different terminal.

[2016-10-05 12:33:35 +0000] [24020] [INFO] Arbiter: Received signal 15
[2016-10-05 12:33:35 +0000] [24020] [INFO] Handling signal: term
[2016-10-05 12:33:35 +0000] [24020] [INFO] Arbiter: Handle SIGTERM
[2016-10-05 12:33:35 +0000] [24020] [INFO] gracefully closing socket ...
handling request ...
[2016-10-05 12:33:36 +0000] [24020] [INFO] waiting before gracefully closing socket, 0
handling request ...
[2016-10-05 12:33:37 +0000] [24020] [INFO] waiting before gracefully closing socket, 1
handling request ...
[2016-10-05 12:33:38 +0000] [24020] [INFO] waiting before gracefully closing socket, 2

[2016-10-05 12:33:38 +0000] [24020] [INFO] all server sockets should be closed by now
# ^^^ Now all listening sockets should have been closed

[2016-10-05 12:33:38 +0000] [24020] [INFO] killing workers ...
[2016-10-05 12:33:38 +0000] [24020] [INFO] killing workers in 3 ...
handling request ...
# ^^^ But the worker keeps accepting new requests

[2016-10-05 12:33:39 +0000] [24020] [INFO] killing workers in 2 ...
handling request ...
# ^^^ And another request is processed

[2016-10-05 12:33:41 +0000] [24020] [INFO] killing workers in 1 ...
[2016-10-05 12:33:41 +0000] [24020] [INFO] killing workers ... GO
[2016-10-05 12:33:41 +0000] [24028] [INFO] Worker calling sys.exit(0) (pid: 24028)
[2016-10-05 12:33:41 +0000] [24028] [INFO] Worker exiting (pid: 24028)
[2016-10-05 12:33:41 +0000] [24020] [INFO] Arbiter: Handle SIGCHLD
[2016-10-05 12:33:41 +0000] [24020] [INFO] Shutting down: Master

Isn't it strange that the worker keeps handling requests after the server socket has been closed?
Am I missing something?

The line "handling request ..." has been de-duplicated where it appeared multiple times between regular log messages.

Here is a fork of Gunicorn:master showcasing this behaviour (I added some logging statements and scripts to reproduce it):
ivarref@b4049c4

Some details of my system:
uname -a => Linux (...) 4.4.0-38-generic #57-Ubuntu SMP Tue Sep 6 15:42:33 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
./venv/bin/python --version => Python 2.7.12
lsb_release -a => Ubuntu 16.04.1 LTS, xenial

@tilgovi What do you think? It looks strange to me that workers keep accepting new requests after the server socket is closed. But I'm no IPC/socket expert, so I may very well have missed something.

Thanks for your time.

@benoitc
Owner

benoitc commented Dec 27, 2016

"Isn't it strange that the worker keeps handling requests after the server socket has been closed?
Am I missing something?"

This is the purpose of graceful shutdown, i.e. we let the worker try to finish the connections it has already accepted within the graceful timeout; if it is not done by then, it is killed:
http://docs.gunicorn.org/en/stable/signals.html#master-process

Maybe some connections were already in the backlog and had been accepted before the workers got the signal?
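
To illustrate the backlog point: the kernel completes the TCP handshake as soon as a connection fits in the listen backlog, so a client can already be connected before the server ever calls accept(). A small self-contained demonstration (the 2-second sleep stands in for a busy worker):

    import socket
    import threading
    import time

    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(5)  # up to 5 connections may wait in the kernel backlog
    port = srv.getsockname()[1]

    def client():
        conn = socket.create_connection(("127.0.0.1", port))
        print("client: connected before the server called accept()")
        conn.sendall(b"hello")
        print("client: got reply:", conn.recv(16))
        conn.close()

    t = threading.Thread(target=client)
    t.start()

    time.sleep(2)                # simulate a worker busy with another request
    conn, _ = srv.accept()       # the queued connection is accepted only now
    conn.sendall(conn.recv(16))  # echo back
    conn.close()
    srv.close()
    t.join()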

@benoitc added this to Other in Bugs Feb 26, 2017
@tilgovi
Collaborator

tilgovi commented Apr 28, 2018

Closing in favor of #1236

@tilgovi closed this as completed Apr 28, 2018