
fix unix socket locking #1263

Merged
merged 1 commit on May 14, 2016

Conversation

@benoitc (Owner) commented May 14, 2016

This change adds proper file locking to gunicorn. By default, "gunicorn.lock" is created in the temporary directory when a unix socket is bound. In case someone wants to fix the lock file path or use multiple gunicorn instances, the "--lock-file" setting can be used to set the path of this file.

fix #1259

@benoitc (Owner, Author) commented May 14, 2016

Maybe we should create a unique lock file instead of fixing it to "gunicorn.lock"? That can be done by creating it on first boot and then passing it through the environment. Thoughts?

@jamadden (Collaborator)

I think this could be a breaking change for anyone that runs multiple instances of gunicorn on the same machine at the same time, for example, in testing, or to run two different applications, etc. They would be forced to create a specific lock file location for each instance (which could be annoying if the configs are auto-generated; speaking from experience here). So a unique lock file name seems required to me.

We could automatically make the lock file name unique. Since we're only locking for the sake of the unix domain socket(s), we can do that by including a portion of their path in the lockfile name. At first I thought of just using it as-is, e.g., lockfile_name = '/tmp/gunicorn.' + os.path.basename(sock.cfg_addr) + '.lock', but that breaks if the multiple instances only differ in their prefixes, which is a common way to deploy and test things, especially if you use zc.buildout and some other automated deployment techniques.

So I would suggest, then, taking a hash of the unix domain socket path and including that (or some portion of it) in the lock file name: lockfile_name = '/tmp/gunicorn.' + hashlib.sha1(sock.cfg_addr).hexdigest() + '.lock'. That automatically makes the lock name unique for each unix domain socket.

And if we're doing that, we can keep the lock file confined to the unix domain socket class itself, as before. If we also keep using the shared/exclusive lock technique, then that eliminates the need for changes to the arbiter, the create_sockets function, the close method, etc. It also has the benefit of keeping multiple instances from fighting over the same socket address, but letting you move sockets around between them easily.
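The naming scheme suggested above could be sketched like this (the helper name and temp-dir handling are illustrative assumptions, not the PR's actual code):

```python
import hashlib
import os
import tempfile

def lockfile_name(cfg_addr):
    """Derive a per-socket lock file name from a unix socket path.

    Hashing the full path (not just its basename) keeps the name unique
    even when two instances differ only in their path prefix.
    """
    digest = hashlib.sha1(cfg_addr.encode("utf-8")).hexdigest()
    return os.path.join(tempfile.gettempdir(), "gunicorn." + digest + ".lock")
```

Two sockets whose paths differ only in their prefix, e.g. /srv/a/app.sock and /srv/b/app.sock, then get distinct lock files.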

@benoitc (Owner, Author) commented May 14, 2016

@jamadden There is only one lock file for all the sockets, and it's better that way imo, since it can later be used for other things.

I agree anyway that we should make the default lock file unique. The change above should fix it.

@benoitc (Owner, Author) commented May 14, 2016

@tilgovi @berkerpeksag any feedback is welcome :)

@jamadden (Collaborator)

I guess I'm just more a fan of finer-grained, single-purpose locking, especially when it's taken and managed close to the resource it's protecting. I find that easier to understand and debug when something (inevitably) goes wrong.

@benoitc (Owner, Author) commented May 14, 2016

@jamadden I am also a fan of simple, focused solutions, and I think that's the case here :) What we want is to prevent any unix socket file on which the app is listening from being unlinked on USR2. Having one file per socket is then not needed; in fact we only need to know whether an arbiter is still listening, which is done with one lock file. Bonus point: we reduce the number of files used and the number of locks.

@jamadden (Collaborator)

@benoitc ok, thanks. I see your point. I'm not sure I'm convinced about the complexity tradeoffs, but that's a personal thing :)


def lock(self):
    _lock(self._lockfile.fileno())
    self._locked = True
Collaborator

This should be set to the return value of the _lock call, or we may incorrectly report as locked when we are not.

Owner Author

Hrm, there is no return value; either it fails or it doesn't, since it's a shared lock. Did I miss anything?

Collaborator

You're right that there's no return value from lock. I was thinking of _unlock. If the lock fails, this will bubble an error, and that seems fine.
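The shared/exclusive technique under discussion could be sketched with flock as follows (the lock file path and function names are illustrative assumptions, not the PR's code): every live arbiter holds a shared lock, and unlinking the socket file is safe only if a non-blocking exclusive lock can be acquired.

```python
import fcntl
import os
import tempfile

# Illustrative lock file path, not gunicorn's actual default.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "gunicorn-demo.lock")
lock_fd = os.open(LOCK_PATH, os.O_CREAT | os.O_RDWR)

def lock():
    # Take a shared lock; flock raises on failure, there is no return value.
    fcntl.flock(lock_fd, fcntl.LOCK_SH)

def safe_to_unlink():
    # If no other process holds a shared lock, upgrading to a non-blocking
    # exclusive lock succeeds and the socket file can safely be unlinked.
    try:
        fcntl.flock(lock_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError:
        return False
```

A second gunicorn instance holding its own shared lock on the same file would make safe_to_unlink() return False, which is exactly the "is any arbiter still listening" question.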

@tilgovi (Collaborator) commented May 14, 2016

We should do something about the naming. Calling a function unlock when it actually attempts an exclusive lock is not super friendly to future readers.

@tilgovi (Collaborator) commented May 14, 2016

Oh. I guess I'm a little late to comment.

@tilgovi (Collaborator) commented May 14, 2016

I also think maybe we should always take the lock, regardless of whether it's a UNIX socket or not. It's unnecessary branching that adds complexity to the function bodies and some function call signatures, when it would be harmless to always do it.

@tilgovi (Collaborator) commented May 14, 2016

I don't love that sock.close now has an extra argument, but I'm not sure what to do about that, or whether anything should be done.

@benoitc (Owner, Author) commented May 14, 2016

@tilgovi why should we create a lock file if we don't use it, though? It's using an fd for nothing. I'm not against it, though; it would indeed simplify the code. It's just that other servers like apache or nginx don't use locks by default.

About the parameter added to the socket: why not instead add a terminate method that would be called when the socket is unlocked? In the case of the unix socket, this method would unlink the file. Would that work?

About method names, what about release, and then acquire instead of lock?

@tilgovi (Collaborator) commented May 15, 2016

I wonder if we could solve this simply, without locks, now that I think about it more. There would still be race conditions if USR2 is followed very quickly by QUIT, but that's true with the lock, too.

The way it would work is this:

  • Arbiter checks for an environment variable set to the PID of the "main" arbiter
  • If there is no value, the arbiter sets the value to its own PID
  • Arbiter tracks the state of a single child arbiter, noting creation after USR2 and destruction after SIGCHLD
  • An arbiter always knows if it's a child by comparing the value of the environment variable to its parent PID
  • A child arbiter refuses to fork on USR2
  • On shutdown, only a parent arbiter unlinks the socket

Something like this should work just as well as the lock, with no need for a lock file. It relies on the guarantee that there are never more than two arbiters, which seems reasonable maybe.

On USR2, the arbiter compares the main arbiter PID to its parent PID and, if they match, refuses to fork. Otherwise, the arbiter declares itself the main arbiter (updating the environment) and forks.

It is not possible to be inherited by a process with a PID that matches the parent arbiter's, because to inherit the current process it would have had to exist before the parent arbiter spawned, and that would imply two processes with the same PID. Therefore, it's not even necessary to keep track of the parent, only to check it at signal time.
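The scheme above could be sketched as follows (the environment variable name and function names are assumptions for illustration, not gunicorn's actual mechanism):

```python
import os

# Assumed variable name for illustration only.
MAIN_PID_ENV = "GUNICORN_MAIN_PID"

def on_start():
    # If no main arbiter is recorded yet, record our own PID.
    if MAIN_PID_ENV not in os.environ:
        os.environ[MAIN_PID_ENV] = str(os.getpid())

def is_child_arbiter():
    # A child arbiter's parent PID matches the recorded main arbiter PID.
    main_pid = os.environ.get(MAIN_PID_ENV)
    return main_pid is not None and int(main_pid) == os.getppid()

def on_usr2(fork_new_arbiter):
    # A child arbiter refuses to fork on USR2, so there are never
    # more than two arbiters alive at once.
    if is_child_arbiter():
        return
    # Otherwise declare ourselves main and fork the new arbiter.
    os.environ[MAIN_PID_ENV] = str(os.getpid())
    fork_new_arbiter()
```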

@benoitc benoitc mentioned this pull request May 15, 2016
@benoitc (Owner, Author) commented May 15, 2016

I pushed some improvements in #1266

@tilgovi what is a child arbiter, though? The newly spawned arbiter is not attached to the old arbiter, which means that the current arbiter will not receive any signals associated with it. In fact, we should make sure they don't share anything, since the purpose of USR2 is to allow the upgrade of the application.

It's late there so I will think more about it in the morning. The issue I can see right now also is that the arbiter can die before the control is given to the new arbiter.

@tilgovi (Collaborator) commented May 15, 2016

I also want to test how things are working with systemd socket activation. If systemd manages the socket lifecycle, we should perhaps never unlink the socket.

I'll think this over more as I can but if you need to make a hotfix release go for it.

@benoitc (Owner, Author) commented May 15, 2016

Hrm, I think you just found an issue. Indeed, we should never unlink the socket if it was passed to us via systemd, or, more generally, whenever we don't control it ourselves.

Actually, modulo the issue above, we may not need to use a lock at all. The following code allows us to check whether a process exists:

import errno
import os

def process_exists(pid):
    try:
        os.kill(pid, 0)  # signal 0 performs error checking only, sends nothing
    except OSError as e:
        if e.errno == errno.ESRCH:
            # No such process
            return False
        elif e.errno == errno.EPERM:
            # The process exists but belongs to another user
            return True
        raise
    return True

if __name__ == '__main__':
    import sys

    if len(sys.argv) < 2:
        raise RuntimeError("usage: process_exists.py PID")

    pid = int(sys.argv[1])
    pid_exists = process_exists(pid)
    print("process exists: %s" % pid_exists)

So if we keep the PID of the newly spawned arbiter somewhere, then we are good. Maybe we could open a temporary channel between the old arbiter and the new one? Say, for example, I write on a pipe shared between the two processes via the environment, then close it once it's done.

Thoughts?
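The pipe idea might look something like this (a sketch under assumed names; gunicorn's real USR2 handling uses fork+exec and is more involved):

```python
import os

def make_notify_channel(env):
    """Create a pipe whose read end is advertised through an environment
    mapping, so a re-exec'd arbiter can find it.

    GUNICORN_NOTIFY_FD is an assumed variable name for illustration.
    """
    r, w = os.pipe()
    os.set_inheritable(r, True)  # the new arbiter must inherit the read end
    env["GUNICORN_NOTIFY_FD"] = str(r)
    return r, w
```

The old arbiter would pass env to the exec of the new arbiter, write a byte on the write end once handover is complete, then close it; the new arbiter reads from the inherited fd to learn that the old arbiter is done.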

@tilgovi (Collaborator) commented May 16, 2016

A pipe might help, but I'm not sure.

For checking the PID, it's not sufficient to check whether a PID exists, because it may have been reused. We need to check whether the current parent PID is the same PID that spawned the arbiter.
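The parent-PID check could be sketched like this (assuming the spawning PID is recorded at startup): when the parent exits, the child is reparented (typically to PID 1), so os.getppid() stops matching the recorded value; unlike os.kill(pid, 0), this cannot be fooled by PID reuse.

```python
import os

# Recorded once, at startup, in the process that cares about its parent.
spawning_parent_pid = os.getppid()

def parent_still_alive():
    # If the original parent has exited, this process was reparented and
    # os.getppid() no longer matches the PID recorded at startup.
    return os.getppid() == spawning_parent_pid
```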

@benoitc (Owner, Author) commented May 16, 2016

@tilgovi so I reviewed your initial proposal more carefully, and you're right, it's the right way to do it. I summarised everything in #1267 with small changes to your solution. Let me know :) @jamadden your opinion is welcome also :)

@berkerpeksag berkerpeksag deleted the fix/1259 branch September 23, 2016 15:13
mjjbell pushed a commit to mjjbell/gunicorn that referenced this pull request Mar 16, 2018

Successfully merging this pull request may close these issues.

IOError: (9, 'Bad file descriptor') gunicorn-19.5.0
3 participants