
fix unix socket locking #1263

Merged
merged 1 commit on May 14, 2016

Conversation

@benoitc (Owner) commented May 14, 2016

This change adds proper file locking to gunicorn. By default, "gunicorn.lock" is created in the temporary directory when a unix socket is bound. In case someone wants to fix the lock file path or use multiple gunicorn instances, the "--lock-file" setting can be used to set the path of this file.

fix #1259

@benoitc (Owner, Author) commented May 14, 2016

Maybe we should create a unique lock file instead of fixing it to "gunicorn.lock"? That can be done by creating it on first boot and then passing it through the environment. Thoughts?

@jamadden (Collaborator)

I think this could be a breaking change for anyone that runs multiple instances of gunicorn on the same machine at the same time, for example, in testing, or to run two different applications, etc. They would be forced to create a specific lock file location for each instance (which could be annoying if the configs are auto-generated; speaking from experience here). So a unique lock file name seems required to me.

We could automatically make the lock file name unique. Since we're only locking for the sake of the unix domain socket(s), we can do that by including a portion of their path in the lockfile name. At first I thought of just using it as-is, e.g., lockfile_name = '/tmp/gunicorn.' + os.path.basename(sock.cfg_addr) + '.lock', but that breaks if the multiple instances only differ in their prefixes, which is a common way to deploy and test things, especially if you use zc.buildout and some other automated deployment techniques.

So I would suggest, then, taking a hash of the unix domain socket path and including that (or some portion of it) in the lock file name: lockfile_name = '/tmp/gunicorn.' + hashlib.sha1(sock.cfg_addr).hexdigest() + '.lock'. That automatically makes the lock name unique for each unix domain socket.

And if we're doing that, we can keep the lock file confined to the unix domain socket class itself, as before. If we also keep using the shared/exclusive lock technique, then that eliminates the need for changes to the arbiter, the create_sockets function, the close method, etc. It also has the benefit of keeping multiple instances from fighting over the same socket address, but letting you move sockets around between them easily.
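The naming scheme suggested above could be sketched like this (the helper name and temp-dir handling are illustrative assumptions, not the PR's actual code):

```python
import hashlib
import os
import tempfile

def lockfile_name(cfg_addr):
    """Derive a per-socket lock file name from a unix socket path.

    Hashing the full path (not just its basename) keeps the name unique
    even when two instances differ only in their path prefix.
    """
    digest = hashlib.sha1(cfg_addr.encode("utf-8")).hexdigest()
    return os.path.join(tempfile.gettempdir(), "gunicorn." + digest + ".lock")
```

Two sockets whose paths differ only in their prefix, e.g. /srv/a/app.sock and /srv/b/app.sock, then get distinct lock files.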

@benoitc (Owner, Author) commented May 14, 2016

@jamadden There is only one lock file for all the sockets, and it's better that way imo, since it can later be used for other things.

I agree anyway that we should make the default lock file unique. The change above should fix it.

@benoitc (Owner, Author) commented May 14, 2016

@tilgovi @berkerpeksag any feedback is welcome :)

@jamadden (Collaborator)

I guess I'm just more a fan of finer-grained, single-purpose locking, especially when it's taken and managed close to the resource it's protecting. I find that easier to understand and debug when something (inevitably) goes wrong.

@benoitc (Owner, Author) commented May 14, 2016

@jamadden I am also a fan of simple, focused solutions, and I think that's the case here :) What we want is to prevent any unix socket file on which the app is listening from being unlinked on USR2. Having one file per socket is then not needed; in fact we only need to know whether an arbiter is still listening, which is done with one lock file. Bonus point: we reduce the number of files used and the number of locks.

@jamadden (Collaborator)

@benoitc ok, thanks. I see your point. I'm not sure I'm convinced about the complexity tradeoffs, but that's a personal thing :)


def lock(self):
    _lock(self._lockfile.fileno())
    self._locked = True
Collaborator

This should be set to the return value of the _lock call, or we may incorrectly report as locked when we are not.

Owner Author

Hrm, there is no return value; either it fails or it doesn't, since it's a shared lock. Did I miss anything?

Collaborator

You're right that there's no return value from lock. I was thinking of _unlock. If the lock fails, this will bubble an error, and that seems fine.
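The shared/exclusive technique under discussion could be sketched with flock as follows (the lock file path and function names are illustrative assumptions, not the PR's code): every live arbiter holds a shared lock, and unlinking the socket file is safe only if a non-blocking exclusive lock can be acquired.

```python
import fcntl
import os
import tempfile

# Illustrative lock file path, not gunicorn's actual default.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "gunicorn-demo.lock")
lock_fd = os.open(LOCK_PATH, os.O_CREAT | os.O_RDWR)

def lock():
    # Take a shared lock; flock raises on failure, there is no return value.
    fcntl.flock(lock_fd, fcntl.LOCK_SH)

def safe_to_unlink():
    # If no other process holds a shared lock, upgrading to a non-blocking
    # exclusive lock succeeds and the socket file can safely be unlinked.
    try:
        fcntl.flock(lock_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError:
        return False
```

A second gunicorn instance holding its own shared lock on the same file would make safe_to_unlink() return False, which is exactly the "is any arbiter still listening" question.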

@tilgovi (Collaborator) commented May 14, 2016

We should do something about the naming. Calling a function unlock when it actually attempts an exclusive lock is not super friendly to future readers.

@tilgovi (Collaborator) commented May 14, 2016

Oh. I guess I'm a little late to comment.

@tilgovi (Collaborator) commented May 14, 2016

I also think maybe we should always take the lock, regardless of whether it's a UNIX socket or not. It's unnecessary branching that adds complexity to the function bodies and some function call signatures, when it would be harmless to always do it.

@tilgovi (Collaborator) commented May 14, 2016

I don't love that sock.close now has an extra argument, but I'm not sure what to do about that, or whether anything should be done.

@benoitc (Owner, Author) commented May 14, 2016

@tilgovi why should we create a lock file if we don't use it, though? It's using an fd for nothing. I'm not against it, though; it would indeed simplify the code. It's just that other servers like apache or nginx don't use locks by default.

About the parameter added to the socket: why not instead add a terminate method that would be called when the socket is unlocked? In the case of the unix socket, this method would unlink the file. Would that work?

About method names, what about release, and then acquire instead of lock?

@tilgovi (Collaborator) commented May 15, 2016

I wonder if we could solve this simply, without locks, now that I think about it more. There would still be race conditions if USR2 is followed very quickly by QUIT, but that's true with the lock, too.

The way it would work is this:

  • Arbiter checks for an environment variable set to the PID of the "main" arbiter
  • If there is no value, the arbiter sets the value to its own PID
  • Arbiter tracks the state of a single child arbiter, noting creation after USR2 and destruction after SIGCHLD
  • An arbiter always knows if it's a child by comparing the value of the environment variable to its parent PID
  • A child arbiter refuses to fork on USR2
  • On shutdown, only a parent arbiter unlinks the socket

Something like this should work just as well as the lock, with no need for a lock file. It relies on the guarantee that there are never more than two arbiters, which seems reasonable maybe.

On USR2, the arbiter compares the main arbiter PID to its parent PID and, if they match, refuses to fork. Otherwise, the arbiter declares itself the main arbiter (updating the environment) and forks.

It is not possible to be inherited by a process with a PID that matches the parent arbiter's, because to inherit the current process it would have had to exist before the parent arbiter spawned, and that would imply two processes with the same PID. Therefore, it's not even necessary to keep track of the parent, only to check it at signal time.
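The scheme above could be sketched as follows (the environment variable name and function names are assumptions for illustration, not gunicorn's actual mechanism):

```python
import os

# Assumed variable name for illustration only.
MAIN_PID_ENV = "GUNICORN_MAIN_PID"

def on_start():
    # If no main arbiter is recorded yet, record our own PID.
    if MAIN_PID_ENV not in os.environ:
        os.environ[MAIN_PID_ENV] = str(os.getpid())

def is_child_arbiter():
    # A child arbiter's parent PID matches the recorded main arbiter PID.
    main_pid = os.environ.get(MAIN_PID_ENV)
    return main_pid is not None and int(main_pid) == os.getppid()

def on_usr2(fork_new_arbiter):
    # A child arbiter refuses to fork on USR2, so there are never
    # more than two arbiters alive at once.
    if is_child_arbiter():
        return
    # Otherwise declare ourselves main and fork the new arbiter.
    os.environ[MAIN_PID_ENV] = str(os.getpid())
    fork_new_arbiter()
```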

@benoitc benoitc mentioned this pull request May 15, 2016
@benoitc (Owner, Author) commented May 15, 2016

I pushed some improvements in #1266

@tilgovi what is a child arbiter, though? The newly spawned arbiter is not attached to the old arbiter, which means that the current arbiter will not receive any signals associated with it. In fact, we should make sure they don't share anything, since the purpose of USR2 is to allow the upgrade of the application.

It's late there so I will think more about it in the morning. The issue I can see right now also is that the arbiter can die before the control is given to the new arbiter.

@tilgovi (Collaborator) commented May 15, 2016

I also want to test how things are working with systemd socket activation. If systemd manages the socket lifecycle, we should perhaps never unlink the socket.

I'll think this over more as I can but if you need to make a hotfix release go for it.

@benoitc (Owner, Author) commented May 15, 2016

Hrm, I think you just found an issue. Indeed, we should never unlink the socket if it was passed to us via systemd, or, more generally, whenever we don't control it ourselves.

Actually, modulo the issue above, we may not need to use a lock at all. The following code allows us to check whether a process exists:

import errno
import os

def process_exists(pid):
    try:
        os.kill(pid, 0)  # signal 0 performs error checking only, sends nothing
    except OSError as e:
        if e.errno == errno.ESRCH:
            # No such process
            return False
        elif e.errno == errno.EPERM:
            # The process exists but belongs to another user
            return True
        raise
    return True

if __name__ == '__main__':
    import sys

    if len(sys.argv) < 2:
        raise RuntimeError("usage: process_exists.py PID")

    pid = int(sys.argv[1])
    pid_exists = process_exists(pid)
    print("process exists: %s" % pid_exists)

So if we keep the PID of the newly spawned arbiter somewhere, then we are good. Maybe we could open a temporary channel between the old arbiter and the new one? Say, for example, I write on a pipe shared between the two processes via the environment, then close it once it's done.

Thoughts?
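The pipe idea might look something like this (a sketch under assumed names; gunicorn's real USR2 handling uses fork+exec and is more involved):

```python
import os

def make_notify_channel(env):
    """Create a pipe whose read end is advertised through an environment
    mapping, so a re-exec'd arbiter can find it.

    GUNICORN_NOTIFY_FD is an assumed variable name for illustration.
    """
    r, w = os.pipe()
    os.set_inheritable(r, True)  # the new arbiter must inherit the read end
    env["GUNICORN_NOTIFY_FD"] = str(r)
    return r, w
```

The old arbiter would pass env to the exec of the new arbiter, write a byte on the write end once handover is complete, then close it; the new arbiter reads from the inherited fd to learn that the old arbiter is done.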

@tilgovi (Collaborator) commented May 16, 2016

A pipe might help, but I'm not sure.

For checking the PID, it's not sufficient to check whether a PID exists, because it may have been reused. We need to check whether the current parent PID is the same PID that spawned the arbiter.
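The parent-PID check could be sketched like this (assuming the spawning PID is recorded at startup): when the parent exits, the child is reparented (typically to PID 1), so os.getppid() stops matching the recorded value; unlike os.kill(pid, 0), this cannot be fooled by PID reuse.

```python
import os

# Recorded once, at startup, in the process that cares about its parent.
spawning_parent_pid = os.getppid()

def parent_still_alive():
    # If the original parent has exited, this process was reparented and
    # os.getppid() no longer matches the PID recorded at startup.
    return os.getppid() == spawning_parent_pid
```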

@benoitc (Owner, Author) commented May 16, 2016

@tilgovi so I reviewed your initial proposal more carefully, and you're right, it's the right way to do it. I summarised everything in #1267 with small changes to your solution. Let me know :) @jamadden your opinion is welcome also :)

@berkerpeksag berkerpeksag deleted the fix/1259 branch September 23, 2016 15:13
mjjbell pushed a commit to mjjbell/gunicorn that referenced this pull request Mar 16, 2018

Successfully merging this pull request may close these issues.

IOError: (9, 'Bad file descriptor') gunicorn-19.5.0
3 participants