
arbiter: Handle SIGCHLD like all other signals + misc signal handling improvements #3148

Open · sylt wants to merge 5 commits into master

Conversation

@sylt commented Feb 1, 2024

As of writing, there are three commits in this series in the following order:

  1. Remove the special handling of SIGCHLD. Instead, try to handle it like all other signals, so that we're able to do things like printing without risking bad behavior.
  2. Remove SIG_QUEUE for communicating which signals occurred. We can use the already existing PIPE in arbiter for this; that way, no dynamic memory allocation is done in a signal context (the general pattern is sketched after this list).
  3. Improve handling of status received from waitpid() (which is how I ended up here).
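
To illustrate commits 1 and 2, here is a rough, standalone sketch of the self-pipe pattern I have in mind (simplified, with illustrative names; not the actual arbiter code):

    import os
    import signal

    # The handler only writes the signal number to a pipe; all real work
    # (logging, reaping children) happens later in the main loop, outside
    # signal context.
    r, w = os.pipe()
    os.set_blocking(w, False)

    def on_signal(signum, frame):
        try:
            os.write(w, bytes([signum]))
        except BlockingIOError:
            pass  # pipe already full; the main loop will wake up anyway

    for s in (signal.SIGCHLD, signal.SIGHUP, signal.SIGTERM):
        signal.signal(s, on_signal)

    def main_loop():
        while True:
            for signum in os.read(r, 64):   # blocks until at least one signal arrived
                name = signal.Signals(signum).name
                if signum == signal.SIGCHLD:
                    print("would reap workers here (waitpid() loop plus logging)")
                else:
                    print("would dispatch handle_%s here" % name[3:].lower())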

@sylt force-pushed the master branch 3 times, most recently from 97c695b to fb6e5d2 on February 2, 2024
@sylt changed the title from "arbiter: Use waitpid() facilities to handle worker exit status" to "arbiter: Handle SIGCHLD like all other signals + misc signal handling improvements" on Feb 2, 2024
gunicorn/arbiter.py (outdated review thread, resolved)
@benoitc (Owner) commented Feb 3, 2024

Thanks for the patch, I appreciate it. But I'm not sure I agree with this change or #2314 (cc @tilgovi). It removes the asynchronous way to monitor the workers by mixing it with arbiter signaling. Afaik we don't want to wait for a worker signal when we just want to quit Gunicorn or reload it. These messages should take priority over the rest.

Also, we likely want to process all signals related to the workers and not skip one, which is done when the signal queue is used. I see this is removed in this patch, but by removing the signal queue you miss the fact that it is used as a buffer that protects us from receiving too many messages to handle. Sometimes this happens due to a badly configured alert handler or just a supervisor like systemd going crazy. We need to keep it around. This was added on purpose.

To fix the original issue (not having the ability to print without crashing), I think we should handle logging in a separate thread, out of the signal handler. We can redirect all messages to a pipe or unix socket that will be consumed by this thread to print or actually write the logs. A patch in this direction would not break the logic described above. This thread could also be used to handle stats. Would you make this patch?
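
A minimal sketch of that kind of setup, just to illustrate the idea (queue.SimpleQueue standing in for the pipe or unix socket; none of these names are existing gunicorn code):

    import logging
    import queue
    import signal
    import threading

    logging.basicConfig(level=logging.INFO)
    log_queue = queue.SimpleQueue()   # put() never blocks, so it is fine to call from a handler

    def log_consumer():
        # Dedicated thread: the only place that actually formats and writes logs.
        log = logging.getLogger("gunicorn.arbiter")
        while True:
            log.info(log_queue.get())

    threading.Thread(target=log_consumer, daemon=True).start()

    def handle_chld(signum, frame):
        # Signal handler: do the bare minimum and defer the printing to the thread.
        log_queue.put("SIGCHLD received; workers will be reaped in the main loop")

    signal.signal(signal.SIGCHLD, handle_chld)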

@sylt (Author) commented Feb 4, 2024

@benoitc Thanks for having a look! It's also great to learn some of the history behind the code -- git blame didn't help me much there, unfortunately. I'll answer your points in the same order as you wrote them:

It removes the asynchronous way to monitor the workers by mixing it with arbiter signaling. Afaik we don't want to wait for a worker signal when we just want to quit Gunicorn or reload it. These messages should take priority over the rest.

Maybe I've misunderstood then, because I believed that the code actually did wait for workers when one quits/reloads Gunicorn? I am basing this on Arbiter.stop():

    def stop(self, graceful=True):
        ... # lots of lines omitted

        self.kill_workers(sig)
        # wait until the graceful timeout
        while self.WORKERS and time.time() < limit:
            time.sleep(0.1)

This function won't exit until either the timeout has been reached, or all workers have been removed. So, while the code above is (most likely) executing the time.sleep(0.1), SIGCHLD keeps coming in, removing the workers. Reading #2314, I believe @tilgovi has done the same analysis.

Just so that I don't misunderstand, was this what you were referring to?

Also, we likely want to process all signals related to the workers and not skip one, which is done when the signal queue is used. I see this is removed in this patch, but by removing the signal queue you miss the fact that it is used as a buffer that protects us from receiving too many messages to handle. Sometimes this happens due to a badly configured alert handler or just a supervisor like systemd going crazy. We need to keep it around. This was added on purpose.

Yes, I suspected that the '5' max limit was due to something like that, but if there is a supervisor process going bananas, spamming signals to a process, I doubt such a limit would help in a precarious situation like that, since the queue would just keep filling up as new signals arrive?

The reason the queue was removed has to do with signal safety: I think the signal handler should be kept as simple as possible, with no memory allocation, print-outs, or locking of any kind. Since I believe list manipulation in Python involves some kind of malloc(), I don't think it's strictly signal safe (although in practice, it might be).

And since I saw that SIG_QUEUE has been around since the very first revision of arbiter.py (in 2009), I thought that maybe things have changed since back then. And since there's no logging when the buffer is full, I didn't know whether we actually have any data supporting its existence?
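
For reference, the queue-based handler we're discussing is roughly this (paraphrased from arbiter.py, so treat the details as approximate rather than the exact source):

    def signal(self, sig, frame):
        # Arbiter's generic handler: enqueue the signal (dropping it if the
        # queue already holds 5 entries) and wake up the main loop via the pipe.
        if len(self.SIG_QUEUE) < 5:
            self.SIG_QUEUE.append(sig)
            self.wakeup()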

To fix the original issue (not having the ability to print without crashing), I think we should handle logging in a separate thread, out of the signal handler. We can redirect all messages to a pipe or unix socket that will be consumed by this thread to print or actually write the logs. A patch in this direction would not break the logic described above. This thread could also be used to handle stats. Would you make this patch?

But how would we implement such a proposal (sending formatted logs from a signal context to a pipe) without doing any dynamic memory allocation in the signal context? In C I would have understood: use a static buffer on the stack and snprintf(). In Python, however, we don't have that kind of control over memory. Also, we'd then have another thread to manage, with all the problems and considerations that come with it (lifetime, locking).

So before introducing another thread, I think we need to carefully consider whether there is another way to solve the problem. Although I am of course very biased in all of this, I think the solution offered in this PR solves the logging problem and also manages to remove some complexity rather than adding more on top.

If you still want to go in this direction however, perhaps creating a separate issue would be a good idea? But please consider these thoughts above!

@piskvorky left a comment

I saw that @tilgovi also singles out CLD in his PR.

Whichever PR gets continued / merged, it may be worthwhile documenting this special treatment / aliasing of CLD in the code, to ease future maintenance.

@@ -237,10 +249,12 @@ def run(self):
             self.pidfile.unlink()
             sys.exit(-1)

-    def handle_chld(self, sig, frame):
+    def handle_cld(self):

SIGNALS installs only SIGCHLD, not SIGCLD. What is this alias for?

@sylt (Author) replied:

Heh, this is a good question, I had to dig a little bit :)

Short answer: Because self.SIG_NAMES[Signals.SIGCHLD] => 'cld'.

Long answer: Consider the following code:

import signal

# I love dynamic languages
SIGNALS = [getattr(signal, "SIG%s" % x)
               for x in "CHLD HUP QUIT INT TERM TTIN TTOU USR1 USR2 WINCH".split()]
SIG_NAMES = dict(
    (getattr(signal, name), name[3:].lower()) for name in dir(signal)
    if name[:3] == "SIG" and name[3] != "_"
)

While SIGNALS looks correct, the bug is inside SIG_NAMES, specifically how we loop over dir(signal). On Linux, this list of attributes contains both the strings 'SIGCHLD' and 'SIGCLD' in this order. So what happens? Well, the same thing as { 'foo': 1, 'foo': 2 }: we end up with { 'foo': 2 }.

One way to fix this would be to loop over [s.name for s in signal.Signals] instead, which doesn't seem to include both SIGCLD and SIGCHLD (only the latter is included).
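
A minimal sketch of that suggested fix (just an illustration of the idea, not code from this PR):

    import signal

    # Building the table from the Signals enum collapses aliases such as
    # SIGCLD onto their canonical member, so the CHLD entry survives.
    SIG_NAMES = {s: s.name[3:].lower() for s in signal.Signals}

    assert SIG_NAMES[signal.SIGCHLD] == "chld"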

@tilgovi (Collaborator) commented Feb 4, 2024

@sylt Python signal handler code does not get executed in the C signal handler context. I believe the interpreter just executes the signal handler at certain times before returning to the main execution of the script. I think it's perfectly safe to allocate memory and so forth.

So while we don't have to worry about C-level concerns, we do still have to reason about the Python interpreter switching execution between the signal handler code and the main loop.

This is why my PR really only claims two benefits: no risk of concurrent modification of the workers list, and some code simplification from reaping happening in only one place.

But I think our analysis is otherwise correct.

@sylt (Author) commented Feb 4, 2024

@tilgovi Alright, thanks for the correction, that's a relief to hear it works in such a way! :) Well, then I guess commit 2 in this PR shouldn't be needed from a memory safety point of view, so perhaps I should throw it out.

@jcjones1515 commented:

I found this PR after observing the following in the logs of a gunicorn docker container: [2024-02-05 14:59:11 +0000] [1] [ERROR] Unhandled signal: cld

This occurs when the container gets OOMKilled due to memory pressure.

I actually came here to make a PR to fix it so both CHLD and CLD get handled the same, but it looks like you beat me to it by a few days?

@jcjones1515 commented:

I actually came here to make a PR to fix it so both CHLD and CLD get handled the same, but it looks like you beat me to it by a few days?

I misunderstood what was going on earlier -- it's looking up the wrong alias, I think.

Just in case this one takes a while to get merged, I made this quick fix as an option: #3151

@sylt (Author) commented Feb 7, 2024

@benoitc I have put together an alternative approach here with the following highlights:

  • SIGCHLD is handled separately, just as before.
  • The signal queue with rate limiting (< 5 items) is kept just as before.
  • In the handler for SIGCHLD, we don't do any logging, we just do waitpid() and save the status to the worker object.
  • In the main loop where we manage workers, we go through all workers and see if they have an exit status. If so, print it and then remove the worker so that new ones can spawn.

Would such an approach seem more reasonable?
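
Roughly, as a sketch (class, attribute and helper names here are just illustrative; the actual branch differs in detail):

    import os

    class ArbiterSketch:
        # Illustrative stand-in for the arbiter; WORKERS maps pid -> worker object.
        def __init__(self, log):
            self.WORKERS = {}
            self.log = log

        def handle_chld(self, sig, frame):
            # Signal handler: no logging, just reap and record the exit status.
            while True:
                try:
                    pid, status = os.waitpid(-1, os.WNOHANG)
                except ChildProcessError:
                    break
                if pid == 0:
                    break
                worker = self.WORKERS.get(pid)
                if worker is not None:
                    worker.exit_status = status   # illustrative attribute

        def manage_workers(self):
            # Main loop: the safe place to log exits and spawn replacements.
            for pid, worker in list(self.WORKERS.items()):
                status = getattr(worker, "exit_status", None)
                if status is not None:
                    self.log.info("Worker (pid:%s) exited, status %s", pid, status)
                    self.WORKERS.pop(pid)
            # ... then spawn new workers up to num_workers, as today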

... as opposed to in signal context. This is beneficial, since it
means that we can, in a signal-safe way, print messages about why
e.g. a worker stopped its execution.

And since handle_sigchld() logs what it does anyway, don't bother
printing out that we're handling SIGCHLD. If workers are killed at a
rapid pace, we won't get as many SIGCHLD signals as there are workers
killed anyway.
Since we can use something from queue.*, we can make it blocking as
well, removing the need for two different data structures.
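
A rough sketch of what that could mean in practice (queue.SimpleQueue standing in for both SIG_QUEUE and the wakeup pipe; names are illustrative):

    import queue
    import signal

    sig_queue = queue.SimpleQueue()      # put() is non-blocking, so it can run from a handler

    def on_signal(signum, frame):
        sig_queue.put(signum)

    for s in (signal.SIGHUP, signal.SIGTERM, signal.SIGCHLD):
        signal.signal(s, on_signal)

    def main_loop():
        while True:
            try:
                # A blocking get() with a timeout replaces select() on the wakeup pipe.
                signum = sig_queue.get(timeout=1.0)
            except queue.Empty:
                continue                 # timeout: a real arbiter would do housekeeping here
            print("would dispatch handler for", signal.Signals(signum).name)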
This change is meant to handle the return value of waitpid() in a way
that is more in line with the man page of said syscall. The changes
can be summarized as follows:

* Use os.WIFEXITED and os.WIFSIGNALED to determine what caused
  waitpid() to return, and exactly how a worker may have exited.

* In case of normal termination, use os.WEXITSTATUS() to read the exit
  status (instead of using a hand rolled bit shift). A redundant log
  was removed in this code path.

* In case of termination by a signal, use os.WTERMSIG() to determine
  the signal which caused the worker to terminate. This was buggy
  before, since the WCOREFLAG (0x80) could cause e.g. a SIGSEGV (code
  11) to be reported as "code 139", meaning "code (0x80 | 11)".

* Since waitpid() isn't called with WSTOPPED nor WCONTINUED, there's
  no need to have any os.WIFSTOPPED or os.WIFCONTINUED handling.
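
A sketch of the status handling these bullets describe (logger and message wording are illustrative):

    import os

    def report_worker_exit(pid, status, log):
        # Interpret waitpid()'s status word with the os.W* helpers instead of
        # hand-rolled bit shifts.
        if os.WIFEXITED(status):
            log.info("Worker (pid:%s) exited with code %s", pid, os.WEXITSTATUS(status))
        elif os.WIFSIGNALED(status):
            # WTERMSIG() is not polluted by WCOREFLAG (0x80), so e.g. SIGSEGV
            # shows up as 11 rather than 139.
            log.warning("Worker (pid:%s) was killed by signal %s", pid, os.WTERMSIG(status))
        # No WIFSTOPPED()/WIFCONTINUED() branches: waitpid() isn't called with
        # the flags that would produce those states.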
According to the Python signal documentation[1], SIGCHLD is handled
differently from other signals. Specifically, if the underlying
implementation resets the SIGCHLD signal handler, then python won't
reinstall it (as it does for other signals).

This behavior doesn't seem to exist on either Linux or Mac, but
perhaps one could argue that it's good practice anyway.

[1] https://docs.python.org/3/library/signal.html
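
A sketch of what that defensive re-installation could look like (reap_workers() here is a minimal stand-in, not the real reaping code):

    import os
    import signal

    def reap_workers():
        # Minimal reaping loop; a real arbiter would also record each status.
        while True:
            try:
                pid, _status = os.waitpid(-1, os.WNOHANG)
            except ChildProcessError:
                return
            if pid == 0:
                return

    def handle_chld(signum, frame):
        reap_workers()
        # Per the signal docs, Python won't reinstall a SIGCHLD handler if the
        # underlying implementation resets it, so re-register it explicitly.
        signal.signal(signal.SIGCHLD, handle_chld)

    signal.signal(signal.SIGCHLD, handle_chld)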
* Look up handlers in __init__() to induce a run-time error early on if
  something is wrong.

* Since we now know that all handlers exist, we can simplify the main
  loop in arbiter, in such a way that we don't need to call wakeup().

So after this commit, the pipe in arbiter is only used to deliver
which signal was sent.
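
A rough sketch of the handler-lookup idea (class and method names here are simplified assumptions, not the actual diff):

    import signal

    class Arbiter:
        SIGNALS = [signal.SIGHUP, signal.SIGTERM]

        def __init__(self):
            # Resolve every handler up front: a missing handle_* method raises
            # AttributeError here, at startup, instead of when the signal arrives.
            self.sig_handlers = {
                sig: getattr(self, "handle_%s" % signal.Signals(sig).name[3:].lower())
                for sig in self.SIGNALS
            }

        def handle_hup(self):
            print("reloading configuration")     # stand-in for the real reload logic

        def handle_term(self):
            print("shutting down gracefully")    # stand-in for the real shutdown logic

        def dispatch(self, sig):
            # Main loop side: every signal read from the pipe has a known handler.
            self.sig_handlers[sig]()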