De-duplicate and parameterize exit error logging. #3057

jeroenp · 2023-08-17T09:21:56Z

The signal log message had the PID and signal in the message format string, which makes each message unique. This is awkward in log processing with for example Sentry, where you want to see similar messages grouped together.
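To illustrate the grouping point, here is a minimal sketch using only the standard `logging` module. The logger name, the `Collect` handler and the message text are made up for the demo, and it assumes the log aggregator groups on the unformatted template (`record.msg`) rather than on the rendered text.

```python
import logging


class Collect(logging.Handler):
    """Keep every record so the message templates can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


log = logging.getLogger("grouping-demo")  # hypothetical logger name
log.setLevel(logging.ERROR)
log.propagate = False
collector = Collect()
log.addHandler(collector)

for pid in (66451, 66452):
    # Values baked into the string: record.msg differs for every PID.
    log.error("Worker (pid:%s) was sent SIGTERM!" % pid)
    # Values passed as arguments: record.msg stays the constant template.
    log.error("Worker (pid:%s) was sent %s!", pid, "SIGTERM")

# Records without args are the pre-formatted ones.
baked = {r.msg for r in collector.records if not r.args}
templated = {r.msg for r in collector.records if r.args}

if __name__ == "__main__":
    print(len(baked), "distinct pre-formatted messages")
    print(len(templated), "distinct template")
```

Both styles render the same text through `record.getMessage()`; only the parameterized form keeps one template that a grouping tool can key on.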

The exit message was emitted twice, from the "if exitcode != 0" and the "if exitcode > 0" clause. You would be seeing things like:

[2023-08-17 09:50:10 +0200] [66450] [ERROR] Worker (pid:66451) exited with code 1 
[2023-08-17 09:50:10 +0200] [66450] [ERROR] Worker (pid:66451) exited with code 1.
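The duplication follows from the two conditions overlapping: any positive exit code satisfies both `!= 0` and `> 0`. The functions below are a simplified stand-in written for this comment, not the code in gunicorn/arbiter.py; they only show why one exit produced two lines and how a single check avoids it.

```python
def messages_with_two_checks(wpid, exitcode):
    """Two independent checks: both are true for any positive exit code."""
    out = []
    if exitcode != 0:
        out.append("Worker (pid:%s) exited with code %s" % (wpid, exitcode))
    if exitcode > 0:
        out.append("Worker (pid:%s) exited with code %s." % (wpid, exitcode))
    return out


def messages_with_one_check(wpid, exitcode):
    """A single check yields at most one message per exit."""
    out = []
    if exitcode != 0:
        out.append("Worker (pid:%s) exited with code %s" % (wpid, exitcode))
    return out


if __name__ == "__main__":
    for line in messages_with_two_checks(66451, 1):
        print(line)
```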

See issue #3056

benoitc · 2023-09-05T12:23:14Z

I like this change. Are we able to create a test for it?

jeroenp · 2023-09-07T16:58:59Z

> I like this change. Are we able to create a test for it?

Something like a logging class in the dummy application in test_arbiter.py that captures log messages? I had assumed that arbiter.stop() exercises these changes, but only reap_workers() does.

tilgovi

I like the change and think we should merge it with or without tests. It's pretty straightforward. That said, a simple test would be to just call reap_workers with os.waitpid patched on an arbiter constructed with a patched log.

But it seems excessive to me.
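For reference, the kind of test described above could look roughly like this. It is a sketch only: `TinyArbiter` is a hypothetical stand-in with a reap loop shaped like the one under discussion, so the example runs without gunicorn installed. A real test would construct the actual arbiter and patch its log instead.

```python
import os
from unittest import mock


class TinyArbiter:
    """Hypothetical stand-in for the arbiter; not gunicorn's class."""

    def __init__(self, log):
        self.log = log

    def reap_workers(self):
        while True:
            try:
                wpid, status = os.waitpid(-1, os.WNOHANG)
            except ChildProcessError:
                break  # no children left to reap
            if not wpid:
                break  # children exist, but none has exited yet
            exitcode = status >> 8
            if exitcode != 0:
                self.log.error("Worker (pid:%s) exited with code %s",
                               wpid, exitcode)


def reap_with_fake_child(pid, status):
    """Run reap_workers() once against a faked os.waitpid and return the log."""
    log = mock.Mock()
    arbiter = TinyArbiter(log)
    # First call reports one exited child, second call reports nothing left.
    with mock.patch("os.waitpid", side_effect=[(pid, status), (0, 0)]):
        arbiter.reap_workers()
    return log


if __name__ == "__main__":
    log = reap_with_fake_child(66451, 1 << 8)
    print(log.error.call_args_list)
```

Asserting on `log.error.call_args` checks the template and its arguments separately, which would also catch a regression back to a pre-formatted string.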

jeroenp · 2024-01-02T08:44:42Z

There's also PR #3094, which is related and prevents "operator confusion", so to speak.

benoitc

Please replace the word "Child" with "Worker". Can you tell me why you don't want to use exitcode = status >> 8?

benoitc · 2024-02-03T15:46:58Z

gunicorn/arbiter.py

                    if exitcode != 0:
-                        self.log.error('Worker (pid:%s) exited with code %s', wpid, exitcode)
+                        self.log.error("Child exited with code %s (pid: %s)",


We should use "Worker" instead of "Child" here. It is more meaningful in the context of Gunicorn's arbiter.

jeroenp · 2024-02-05

Hi Benoit,

Sorry, I totally missed that I had committed to this open PR! That was not my intention. I'll look into what happened and what I want from these changes. My intent was to lower the log level for regular reloads/SIGTERMs. The two raises in between the log statements were also irksome to me.

What I find confusing is that both master and worker log the same event with the same wording. In the logs it's not clear which process is which.

benoitc · 2024-02-03T15:47:28Z

gunicorn/arbiter.py

+                        # 0: as seen from macos/py311 from reload on workers
+                        # where workers from SIGTERM but waitpid/nohang gives
+                        # me 0.
+                        self.log.info("Child terminated (pid: %s)", wpid)


Same comment as above.

benoitc · 2024-02-03T15:51:38Z

gunicorn/arbiter.py

@@ -522,37 +522,40 @@ def reap_workers(self):
                    # A worker was terminated. If the termination reason was
                    # that it could not boot, we'll shut it down to avoid
                    # infinite start/stop cycles.
-                    exitcode = status >> 8
+
+                    exitcode = os.WEXITSTATUS(status) \


Why not reuse the status?

jeroenp · 2024-02-05

I was trying to find out why I was (and still am) seeing 0 on worker reload, where I expected to see 15 for SIGTERM. I was hoping this might tell me more, but clearly 0 stays 0.
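As background on what the status word can and cannot tell you, here is a small POSIX-only sketch using `os.fork` and the standard `os.W*` helpers. The helper names (`wait_status_of`, `describe`, and the three child bodies) are made up for the demo. It does not establish which case applies to the reload described above; it only shows that a child terminated by SIGTERM and a child that handles SIGTERM and exits cleanly both give 0 for `status >> 8`.

```python
import os
import signal
import time


def wait_status_of(child_body):
    """Fork, run child_body in the child, return the raw waitpid() status."""
    pid = os.fork()
    if pid == 0:
        try:
            child_body()
        finally:
            os._exit(99)  # only reached if child_body returns or raises
    _, status = os.waitpid(pid, 0)
    return status


def exit_with_3():
    os._exit(3)


def killed_by_sigterm():
    signal.signal(signal.SIGTERM, signal.SIG_DFL)  # default action: terminate
    os.kill(os.getpid(), signal.SIGTERM)
    time.sleep(5)  # give the signal time to be delivered


def handles_sigterm_then_exits():
    # Graceful shutdown: the handler turns the signal into a normal exit.
    signal.signal(signal.SIGTERM, lambda signum, frame: os._exit(0))
    os.kill(os.getpid(), signal.SIGTERM)
    time.sleep(5)


def describe(status):
    """Decode a raw status word with the os.W* helpers."""
    if os.WIFSIGNALED(status):
        return ("signal", os.WTERMSIG(status))
    if os.WIFEXITED(status):
        return ("exit", os.WEXITSTATUS(status))
    return ("other", status)


if __name__ == "__main__":
    for body in (exit_with_3, killed_by_sigterm, handles_sigterm_then_exits):
        status = wait_status_of(body)
        print(body.__name__, "shift:", status >> 8, "decoded:", describe(status))
```

On Linux the signal number sits in the low bits of the status, so the shift discards it, while `os.WIFSIGNALED` / `os.WTERMSIG` recover it. A process that catches SIGTERM and exits normally reports a plain exit code, so neither approach would show 15 in that case.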

benoitc · 2024-02-03T15:54:00Z

gunicorn/arbiter.py

-                            wpid, sig_name)
-
-                        # Additional hint for SIGKILL
-                        if status == signal.SIGKILL:


We miss this extra hint in the change. Could you add it?

jeroenp · 2024-02-05

You miss the hint? I thought it was misleading, because it is never a memory issue when we see these error logs.

benoitc self-assigned this Sep 5, 2023

benoitc added working on it :) To Review labels Sep 5, 2023

tilgovi approved these changes Dec 29, 2023

tilgovi added this to the 22.0 milestone Dec 29, 2023

pajod approved these changes Dec 29, 2023

jeroenpulles added 2 commits February 1, 2024 10:18

Merge remote-tracking branch 'origin/master' into log-errors

d4641cb

reorganize the log handling in arbiter's reap workers method.

baea1c8

benoitc requested changes Feb 3, 2024

benoitc reviewed Feb 3, 2024
