
Shutdown takes 30 seconds to complete #901

Closed
tahajahangir opened this issue Sep 30, 2014 · 2 comments

Comments

@tahajahangir commented:

Summary:
Gunicorn sometimes takes several seconds (up to 30) to shut down, even with no request in progress. We run gunicorn behind nginx, and this bug leads to a huge increase in upstream response time on the nginx servers.

During the shutdown interval the listening socket stays open, so nginx connects to it and waits (gunicorn never accepts the connection, because it is in its shutdown stage). Once shutdown completes the sockets are closed and nginx retries with another upstream server, but only after an unnecessary wait of several seconds.
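
To make the failure mode concrete, here is a minimal sketch (illustrative only, not gunicorn internals; the backlog value is arbitrary). The kernel completes TCP handshakes into the listen backlog even when the application never calls accept(), so a client's connect() succeeds and it then hangs waiting for a response:

import socket

# Illustrative sketch, not gunicorn code. A listening socket that is
# never accept()ed from still completes TCP handshakes into its backlog,
# so clients (playing nginx's role here) connect successfully and then
# hang until the listener is finally closed.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(128)  # gunicorn's default backlog is 2048

client = socket.create_connection(listener.getsockname())  # connect() succeeds
client.sendall(b"GET / HTTP/1.0\r\n\r\n")  # request sits in kernel buffers
client.settimeout(3.0)
try:
    client.recv(1024)  # no worker will ever answer
except socket.timeout:
    print("connected but stuck: the open listener masked the shutdown")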

This doesn't happen in a light (development) environment, but it does happen when gunicorn is under a load of ~10 req/s (all requests complete in less than 0.2 s).

Environment:

gunicorn 19.1.1 (16 processes, each with 5 threads) on Python 3.4.1 / CentOS 7, behind nginx 1.6.1 on FreeBSD

Log file:

This is the error log at debug level:

[2014-09-30 07:15:32 +0330] [1102] [DEBUG] Current configuration:
  chdir: /var/www/src
  umask: 0
  threads: 5
  keepalive: 2
  check_config: False
  syslog_addr: udp://localhost:514
  enable_stdio_inheritance: False
  debug: False
  worker_tmp_dir: None
  errorlog: /var/log/gunicorn-error.log
  graceful_timeout: 30
  on_reload: <function OnReload.on_reload at 0x7f33be9aa378>
  worker_connections: 1000
  reload: False
  raw_env: []
  accesslog: /var/log/gunicorn-access.log
  limit_request_fields: 100
  workers: 16
  pidfile: None
  user: 1000
  forwarded_allow_ips: ['127.0.0.1']
  proxy_allow_ips: ['127.0.0.1']
  django_settings: None
  worker_class: sync
  pre_fork: <function pre_fork at 0x7f33bec329d8>
  post_fork: <function post_fork at 0x7f33bec328c8>
  syslog: False
  post_worker_init: <function PostWorkerInit.post_worker_init at 0x7f33be9aa9d8>
  pre_exec: <function pre_exec at 0x7f33bec32950>
  post_request: <function PostRequest.post_request at 0x7f33be9af1e0>
  nworkers_changed: <function NumWorkersChanged.nworkers_changed at 0x7f33be9af510>
  access_log_format: %(h)s %(l)s %(u)s %(t)s %({Host}i)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s" %(L)s
  paste: None
  spew: False
  max_requests: 0
  timeout: 30
  limit_request_field_size: 8190
  certfile: None
  preload_app: True
  worker_abort: <function worker_abort at 0x7f33bec32b70>
  proxy_protocol: False
  bind: ['0.0.0.0:8000']
  ssl_version: 3
  loglevel: debug
  group: 1000
  secure_scheme_headers: {'X-FORWARDED-SSL': 'on', 'X-FORWARDED-PROTO': 'https', 'X-FORWARDED-PROTOCOL': 'ssl'}
  ca_certs: None
  on_starting: <function OnStarting.on_starting at 0x7f33be9aa1e0>
  pre_request: <function PreRequest.pre_request at 0x7f33be9af0d0>
  logger_class: gunicorn.glogging.Logger
  daemon: False
  when_ready: <function when_ready at 0x7f33bec32a60>
  keyfile: None
  suppress_ragged_eofs: True
  tmp_upload_dir: None
  pythonpath: None
  config: gunicorn_config.py
  on_exit: <function OnExit.on_exit at 0x7f33be9af6a8>
  logconfig: None
  backlog: 2048
  proc_name: None
  worker_exit: <function WorkerExit.worker_exit at 0x7f33be9af378>
  syslog_facility: user
  statsd_host: None
  default_proc_name: index
  cert_reqs: 0
  do_handshake_on_connect: False
  ciphers: TLSv1
  limit_request_line: 4094
  worker_int: <function worker_int at 0x7f33bec32ae8>
  syslog_prefix: None
[2014-09-30 07:15:33 +0330] [1102] [INFO] Starting gunicorn 19.1.1
[2014-09-30 07:15:33 +0330] [1102] [DEBUG] Arbiter booted
[2014-09-30 07:15:33 +0330] [1102] [INFO] Listening at: http://0.0.0.0:8000 (1102)
[2014-09-30 07:15:33 +0330] [1102] [INFO] Using worker: threads
[2014-09-30 07:15:33 +0330] [1102] [INFO] Server is ready. Spawning workers
[2014-09-30 07:15:33 +0330] [1106] [INFO] Booting worker with pid: 1106
[2014-09-30 07:15:33 +0330] [1106] [INFO] Worker spawned (pid: 1106)
[2014-09-30 07:15:33 +0330] [1107] [INFO] Booting worker with pid: 1107
[2014-09-30 07:15:33 +0330] [1107] [INFO] Worker spawned (pid: 1107)
[2014-09-30 07:15:33 +0330] [1108] [INFO] Booting worker with pid: 1108
[2014-09-30 07:15:33 +0330] [1108] [INFO] Worker spawned (pid: 1108)
[2014-09-30 07:15:33 +0330] [1109] [INFO] Booting worker with pid: 1109
[2014-09-30 07:15:33 +0330] [1109] [INFO] Worker spawned (pid: 1109)
[2014-09-30 07:15:33 +0330] [1110] [INFO] Booting worker with pid: 1110
[2014-09-30 07:15:33 +0330] [1110] [INFO] Worker spawned (pid: 1110)
[2014-09-30 07:15:33 +0330] [1111] [INFO] Booting worker with pid: 1111
[2014-09-30 07:15:33 +0330] [1111] [INFO] Worker spawned (pid: 1111)
[2014-09-30 07:15:33 +0330] [1112] [INFO] Booting worker with pid: 1112
[2014-09-30 07:15:33 +0330] [1112] [INFO] Worker spawned (pid: 1112)
[2014-09-30 07:15:33 +0330] [1113] [INFO] Booting worker with pid: 1113
[2014-09-30 07:15:33 +0330] [1113] [INFO] Worker spawned (pid: 1113)
[2014-09-30 07:15:33 +0330] [1114] [INFO] Booting worker with pid: 1114
[2014-09-30 07:15:33 +0330] [1114] [INFO] Worker spawned (pid: 1114)
[2014-09-30 07:15:33 +0330] [1115] [INFO] Booting worker with pid: 1115
[2014-09-30 07:15:33 +0330] [1115] [INFO] Worker spawned (pid: 1115)
[2014-09-30 07:15:33 +0330] [1102] [DEBUG] 10 workers
[2014-09-30 07:15:34 +0330] [1114] [DEBUG] GET /foo1
[2014-09-30 07:15:34 +0330] [1112] [DEBUG] GET /bar1
[2014-09-30 07:15:34 +0330] [1112] [DEBUG] Closing connection.
[2014-09-30 07:15:34 +0330] [1114] [DEBUG] Closing connection.
....
[2014-09-30 07:15:57 +0330] [1114] [DEBUG] GET /foo2
[2014-09-30 07:15:57 +0330] [1114] [DEBUG] Closing connection.
[2014-09-30 07:15:57 +0330] [1113] [DEBUG] GET /foo
[2014-09-30 07:15:57 +0330] [1112] [DEBUG] Closing connection.
[2014-09-30 07:15:58 +0330] [1110] [DEBUG] GET /
[2014-09-30 07:15:58 +0330] [1110] [DEBUG] GET /bar
[2014-09-30 07:15:58 +0330] [1110] [DEBUG] Closing connection.
[2014-09-30 07:15:58 +0330] [1111] [DEBUG] Closing connection.
[2014-09-30 07:15:58 +0330] [1102] [DEBUG] 10 workers
[2014-09-30 07:15:58 +0330] [1102] [INFO] Handling signal: term
[2014-09-30 07:15:58 +0330] [1111] [INFO] Worker exiting (pid: 1111)
[2014-09-30 07:15:58 +0330] [1109] [INFO] Worker exiting (pid: 1109)
[2014-09-30 07:15:58 +0330] [1108] [INFO] Worker exiting (pid: 1108)
[2014-09-30 07:15:58 +0330] [1107] [INFO] Worker exiting (pid: 1107)
[2014-09-30 07:15:58 +0330] [1115] [INFO] Worker exiting (pid: 1115)
[2014-09-30 07:15:58 +0330] [1112] [INFO] Worker exiting (pid: 1112)
[2014-09-30 07:15:58 +0330] [1106] [INFO] Worker exiting (pid: 1106)
[2014-09-30 07:15:58 +0330] [1110] [DEBUG] Closing connection.
[2014-09-30 07:15:58 +0330] [1110] [INFO] Worker exiting (pid: 1110)
[2014-09-30 07:15:58 +0330] [1113] [DEBUG] Closing connection.
[2014-09-30 07:15:58 +0330] [1113] [INFO] Worker exiting (pid: 1113)
[2014-09-30 07:16:28 +0330] [1114] [INFO] Worker exiting (pid: 1114)
[2014-09-30 07:16:28 +0330] [1102] [INFO] Shutting down: Master

Config file:

workers = 10
# threads = 10
accesslog = '/var/log/gunicorn-access.log'
errorlog = '/var/log/gunicorn-error.log'
preload_app = True
loglevel = 'debug'
access_log_format = '%(h)s %(l)s %(u)s %(t)s %({Host}i)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s" %(L)s'
bind = '0.0.0.0:8000'
daemon = True
# worker_class = 'sync'


def post_fork(server, worker):
    server.log.info("Worker spawned (pid: %s)", worker.pid)

def pre_fork(server, worker):
    pass

def pre_exec(server):
    server.log.info("Forked child, re-executing.")

def when_ready(server):
    server.log.info("Server is ready. Spawning workers")

def worker_int(worker):
    worker.log.info("worker received INT or QUIT signal")

def worker_abort(worker):
    worker.log.info("worker received SIGABRT signal")

@tahajahangir (Author) commented:

I found that this happens because at least one socket is stuck in the CLOSE_WAIT state, and the corresponding thread is blocked in the gunicorn.util.write function. The complete stack is:

File: "python/lib/python3.4/threading.py", line 888, in _bootstrap
  self._bootstrap_inner()
File: "python/lib/python3.4/threading.py", line 920, in _bootstrap_inner
  self.run()
File: "python/lib/python3.4/threading.py", line 868, in run
  self._target(*self._args, **self._kwargs)
File: "python/lib/python3.4/concurrent/futures/thread.py", line 65, in _worker
  work_item.run()
File: "python/lib/python3.4/concurrent/futures/thread.py", line 54, in run
  result = self.fn(*self.args, **self.kwargs)
File: "python/lib/python3.4/site-packages/gunicorn/workers/gthread.py", line 255, in handle
  keepalive = self.handle_request(req, conn)
File: "python/lib/python3.4/site-packages/gunicorn/workers/gthread.py", line 310, in handle_request
  resp.write(item)
File: "python/lib/python3.4/site-packages/gunicorn/http/wsgi.py", line 344, in write
  util.write(self.sock, arg, self.chunked)
File: "python/lib/python3.4/site-packages/gunicorn/util.py", line 301, in write
  sock.sendall(data)

The threads stuck in that function print the following after about 15 minutes of inactivity (and the socket is then closed):

[2014-10-01 14:24:51 +0330] [9262] [ERROR] Socket error processing request.
Traceback (most recent call last):
  File "python/lib/python3.4/site-packages/gunicorn/workers/gthread.py", line 255, in handle
    keepalive = self.handle_request(req, conn)
  File "python/lib/python3.4/site-packages/gunicorn/workers/gthread.py", line 325, in handle_request
    six.reraise(exc_info[0], exc_info[1], exc_info[2])
  File "python/lib/python3.4/site-packages/gunicorn/six.py", line 354, in reraise
    raise value
  File "python/lib/python3.4/site-packages/gunicorn/workers/gthread.py", line 310, in handle_request
    resp.write(item)
  File "python/lib/python3.4/site-packages/gunicorn/http/wsgi.py", line 344, in write
    util.write(self.sock, arg, self.chunked)
  File "python/lib/python3.4/site-packages/gunicorn/util.py", line 301, in write
    sock.sendall(data)
TimeoutError: [Errno 110] Connection timed out
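
Errno 110 is ETIMEDOUT: with no socket timeout set, sendall() blocks until the kernel's TCP retransmissions give up, which with Linux defaults takes on the order of 15 minutes. A minimal, self-contained sketch of the blocking behavior (illustrative only, not gunicorn code; the buffer size and 5-second timeout are arbitrary):

import socket

# Illustrative sketch, not gunicorn code. The peer connects but never
# reads, so both kernel buffers fill and sendall() blocks, just like the
# stuck worker thread above. A per-socket timeout bounds the wait; with
# no timeout, the call blocks until TCP retransmission gives up and
# raises errno 110, as in the traceback.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)

peer = socket.create_connection(listener.getsockname())  # never reads
server, _ = listener.accept()

server.settimeout(5.0)  # bound the wait for demonstration purposes
try:
    server.sendall(b"x" * (64 * 1024 * 1024))  # far exceeds the buffers
except socket.timeout:
    print("sendall() blocked: the peer stopped reading")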

@tilgovi (Collaborator) commented Dec 31, 2015:

Closing in favor of #922 to keep the issues tidy. Thanks!

@tilgovi tilgovi closed this as completed Dec 31, 2015