
When receiving a SIGTERM supervisors should terminate their processes before joining them #1069

Merged
merged 1 commit into encode:master on Jul 30, 2021

Conversation

sgsabbage
Contributor

Fixes #852

Calling process.terminate() sends a SIGTERM to the child processes, which should handle it appropriately. That said, I have a very naive understanding of multiprocessing, so this may not be the right resolution.
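
For illustration, here is a minimal, self-contained sketch of the idea (not the actual diff in this PR; the names are illustrative): joining a child that was never signalled blocks forever, whereas calling terminate() first delivers a SIGTERM the child can handle, so the join returns.

import multiprocessing
import signal
import time


def worker() -> None:
    # Stand-in for a uvicorn worker: install a SIGTERM handler and loop
    # until it fires, roughly like a server setting a should_exit flag.
    should_exit = False

    def handle_term(sig, frame):
        nonlocal should_exit
        should_exit = True

    signal.signal(signal.SIGTERM, handle_term)
    while not should_exit:
        time.sleep(0.1)


def shutdown(processes):
    # The gist of the fix: send SIGTERM to every child *before* joining.
    # Without the terminate() calls, join() waits on children that were
    # never told to stop, and the supervisor hangs.
    for process in processes:
        process.terminate()  # delivers SIGTERM on POSIX platforms
    for process in processes:
        process.join()


if __name__ == "__main__":
    procs = [multiprocessing.Process(target=worker) for _ in range(2)]
    for p in procs:
        p.start()
    time.sleep(0.5)
    shutdown(procs)  # returns promptly instead of blocking indefinitely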

@euri10
Member

euri10 commented Jun 7, 2021

OK, I tested the PR and, as I expected, it doesn't fix it.
To see it for yourself, run
uvicorn app:app --workers=2
then in another terminal run kill -15 ppid_of_parent_process
and it's still running.

@euri10 euri10 closed this Jun 7, 2021
@sgsabbage
Contributor Author

@euri10 Out of interest, what environment are you running? I'm on Ubuntu 20.04.

I just tried this and it worked for me with the PR and didn't work without it:

Without:

$ uvicorn test:app --port 8001 --workers 2
INFO:     Uvicorn running on http://127.0.0.1:8001 (Press CTRL+C to quit)
INFO:     Started parent process [703]
INFO:     Started server process [707]
INFO:     Waiting for application startup.
INFO:     Started server process [706]
INFO:     Waiting for application startup.
INFO:     ASGI 'lifespan' protocol appears unsupported.
INFO:     Application startup complete.
INFO:     ASGI 'lifespan' protocol appears unsupported.
INFO:     Application startup complete.

$ kill -15 703

Nothing happened, and I had to close it with CTRL-C.

With:

$ uvicorn test:app --port 8001 --workers 2
INFO:     Uvicorn running on http://127.0.0.1:8001 (Press CTRL+C to quit)
INFO:     Started parent process [710]
INFO:     Started server process [714]
INFO:     Waiting for application startup.
INFO:     Started server process [713]
INFO:     Waiting for application startup.
INFO:     ASGI 'lifespan' protocol appears unsupported.
INFO:     Application startup complete.
INFO:     ASGI 'lifespan' protocol appears unsupported.
INFO:     Application startup complete.

$ kill -15 710

INFO:     Shutting down
INFO:     Finished server process [713]
INFO:     Shutting down
INFO:     Finished server process [714]
INFO:     Stopping parent process [710]

I'd be interested to see if this is down to different environments handling signals differently.

I also tested spinning up a Docker container and using docker stop on the container. With the PR in, the container shuts down gracefully. Without it, the container hangs and then gets killed.

@euri10
Member

euri10 commented Jun 7, 2021

This is weird, I'm on Debian.

@euri10 euri10 reopened this Jun 7, 2021
@sgsabbage
Contributor Author

I will try and spin up a fresh Debian environment today to see if there's any strangeness in my setup that I've missed.

@euri10
Member

euri10 commented Jun 7, 2021

I think I may have fucked up the way I sent the 15 signal...
It does indeed work fine using the ppid advertised in our logger, but I was using fzf and for some reason the PID that comes up first is not the ppid when you type uvicorn.

[screenshot: fzf process listing for the uvicorn processes]

this would be awesome

@euri10
Member

euri10 commented Jun 7, 2021

we'd also need to test the behaviour with gunicorn

@sgsabbage
Contributor Author

we'd also need to test the behaviour with gunicorn

I've started looking at testing it with gunicorn as well, but as far as I can find, most of the documentation suggests that gunicorn takes the place of the supervisors in this case: the recommended invocation is gunicorn -k uvicorn.workers.UvicornWorker, which to the best of my knowledge uses neither the reload nor the multiprocessing supervisor.

We're entering areas that I don't have any knowledge of beyond scanning the code, though, so if there are specific deployment scenarios where gunicorn uses either of the supervisors, I'm more than happy to preemptively check them out in case there are any issues!
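
For reference, the deployment pattern referred to above usually looks something like the line below (the module path example_app:app is just a placeholder). In that mode gunicorn's own arbiter forks and signals the worker processes, so neither of uvicorn's supervisors is involved.

$ gunicorn example_app:app -k uvicorn.workers.UvicornWorker --workers 2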

@gmeans

gmeans commented Jun 17, 2021

I found this PR due to an issue with --reload not shutting down with a keyboard interrupt (Ctrl+C). The end result was the Python process still running and my shell hanging.

I installed this branch and this change solved my issue. I'm on macOS 11.4 (Big Sur).

Poetry Dependencies

[tool.poetry.dependencies]
python = "^3.8"
fastapi = "^0.65.2"
python-dotenv = "^0.17.1"
psycopg2-binary = "^2.8.6"
alembic = "^1.6.5"
SQLAlchemy = "^1.4.17"
gunicorn = "^20.1.0"
uvicorn = {git = "https://github.com/sgsabbage/uvicorn.git", branch = "process_terminate", extras = ["standard"]}

Result:

Command in the script that's run: berglas exec -- uvicorn app.server:app --reload

[screenshot: terminal output of the reload shutdown]

This seems to reliably shut down the Python processes, but I do see that "os: process already finished" error regularly. Sometimes I see it output multiple times, but I'm assuming that's a thread thing.

Hope this helps, as I'd really like to see this resolved soon. Let me know if I can help with any other testing.

Thanks!

@euri10
Member

euri10 commented Jun 21, 2021

This seems to reliably shut down the Python processes, but I do see that "os: process already finished" error regularly. Sometimes I see it output multiple times, but I'm assuming that's a thread thing.

Hope this helps, as I'd really like to see this resolved soon. Let me know if I can help with any other testing.

OK, this is interesting @gmeans. It seems like a macOS thing, but I'm not sure; I was not able to reproduce it on Linux. Can you try to reproduce it with --log-level=trace and see if that yields something interesting in your logs? I'd be interested in seeing where that "os: process already finished" happens.

@euri10
Member

euri10 commented Jun 22, 2021

Seems like I can reproduce it in this draft PR, so at least I've got an idea of what's going on:
https://github.com/encode/uvicorn/pull/1090/checks?check_run_id=2884640907#step:5:45

Member

@euri10 euri10 left a comment


Let's roll with this; we can still look at the macOS "issue" afterwards if needed, as this PR alone seems to solve the aforementioned issue! Thanks @sgsabbage

@euri10 euri10 merged commit de53c23 into encode:master Jul 30, 2021
@Asday
Contributor

Asday commented Jul 30, 2021

!!!

When might I be able to see this released on PyPI?

@HansBrende
Contributor

@sgsabbage I believe this implementation was faulty... the BaseReload should not call process.terminate() on shutdown (after it has already received the SIGINT signal that caused the shutdown in the first place). The reason: the child process now receives BOTH a SIGINT followed by a SIGTERM, which has the unfortunate side effect of a very ungraceful shutdown, since the server does a force shutdown after the second signal it receives. FWIW I'm on macOS Catalina (10.15.7).

More info: https://stackoverflow.com/a/31907519/2599133
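
A rough sketch of the handler pattern being described (simplified and assumed from the step-by-step trace below, not copied from uvicorn's source): the first signal requests a graceful shutdown, and any subsequent signal escalates to a forced one, which is why the reloader's extra terminate() makes the shutdown ungraceful.

import signal


class Server:
    # Minimal stand-in for the server-side signal handling described above.
    def __init__(self) -> None:
        self.should_exit = False
        self.force_exit = False

    def handle_exit(self, sig, frame) -> None:
        if self.should_exit:
            # Second signal (here: the SIGTERM sent by the reloader right
            # after the user's SIGINT) -> force an immediate shutdown.
            self.force_exit = True
        else:
            # First signal -> request a graceful shutdown.
            self.should_exit = True


server = Server()
signal.signal(signal.SIGINT, server.handle_exit)
signal.signal(signal.SIGTERM, server.handle_exit)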

Here's an example of what happens:

INFO:     Started reloader process [522] using statreload
INFO:     Started server process [524]
INFO:     Waiting for application startup.
INFO:     Application startup complete.

Now...

  1. I send a SIGINT via ^C
  2. uvicorn.server.Server.handle_exit (child process) receives the SIGINT -> sets self.should_exit to True
  3. uvicorn.supervisors.basereload.BaseReload.signal_handler receives the SIGINT -> calls self.process.terminate()
  4. uvicorn.server.Server.handle_exit (child process) receives the SIGTERM -> sets self.force_exit to True
INFO:     Shutting down
INFO:     Finished server process [524]
ERROR:    Exception in 'lifespan' protocol
Traceback (most recent call last):
  File "/Users/hansbrende/miniconda3/envs/facade-api/lib/python3.8/site-packages/starlette/exceptions.py", line 58, in __call__
    await self.app(scope, receive, send)
  File "/Users/hansbrende/miniconda3/envs/facade-api/lib/python3.8/site-packages/starlette/routing.py", line 569, in __call__
    await self.lifespan(scope, receive, send)
  File "/Users/hansbrende/miniconda3/envs/facade-api/lib/python3.8/site-packages/starlette/routing.py", line 544, in lifespan
    await receive()
  File "/Users/hansbrende/miniconda3/envs/facade-api/lib/python3.8/site-packages/uvicorn/lifespan/on.py", line 135, in receive
    return await self.receive_queue.get()
  File "/Users/hansbrende/miniconda3/envs/facade-api/lib/python3.8/asyncio/queues.py", line 163, in get
    await getter
asyncio.exceptions.CancelledError
INFO:     Stopping reloader process [522]

@Kludex
Sponsor Member

Kludex commented Nov 24, 2021

More on #1165

Kludex pushed a commit to sephioh/uvicorn that referenced this pull request Oct 29, 2022
Successfully merging this pull request may close these issues.

Sending SIGTERM to parent process when running with --workers hangs indefinitely
6 participants