Replies: 39 comments 35 replies
-
Facing this exact issue.
-
I experience the same issue with
The redis service would restart, but the celery worker wouldn't consume messages from redis afterwards.
-
Same issue with:
-
Same issue with:
-
Same issue with: although the worker continued to consume tasks for ~20 minutes after redis reconnected, then stopped.
-
Same issue:
-
I also want to know: is there any negative impact to running celery with heartbeat/gossip/mingle disabled? Any ideas? Thanks!
-
Same issue here
-
Had the same issue, but after some random period of time those tasks get consumed. Sometimes it takes a long time (even hours), but they finally start to show up.
-
We had the same issue. For now we're "working around it" with --without-mingle and --without-gossip (I did not use --without-heartbeat), and the problem seems to be resolved. Hopefully we won't run into new issues because we deactivated both features.
-
Same issue here, using a redis cluster
-
Same here on
-
Same issue here with
-
Same issue here
-
I added a step-by-step process to reproduce the bug; please let me know if I can provide anything else to help get this resolved. Thanks a lot!
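(For anyone who wants to try this locally, here is a minimal sketch of the kind of setup the reports in this thread describe. The module name, task, and Redis URLs are illustrative assumptions, not the exact reproduction steps from the comment above:)

```python
# tasks.py -- minimal illustrative setup (hypothetical names) for observing
# the behaviour reported in this thread: start a worker against a local
# Redis, restart Redis, then check whether new tasks are still consumed.
from celery import Celery

app = Celery(
    'repro',
    broker='redis://localhost:6379/0',
    backend='redis://localhost:6379/1',
)

@app.task
def ping():
    return 'pong'
```

Run the worker with `celery -A tasks worker -l info`, restart the Redis server, then call `ping.delay()` and watch whether the worker ever logs the task.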
-
Same issue but I'm using RabbitMQ as a broker and Redis as the backend:
-
Same issue:
-
Same issue: any idea when a fix will be available?
-
Hi, I'm new to celery and experiencing a similar issue using celery+redis. Not sure if this helps, but lowering the visibility_timeout to a low number seems to help; by default the visibility_timeout is set to 3600. I still need to test it a bit more, but maybe someone more experienced with celery and redis can try it out. Thanks.
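For reference, the visibility timeout is a Redis transport option set through broker_transport_options. A minimal sketch of lowering it (300 is an illustrative value; 3600 is the documented default):

```python
# celeryconfig.py -- sketch: lower the Redis visibility timeout.
# With the Redis transport, delivered-but-unacknowledged tasks are
# redelivered after this many seconds (default 3600), which may explain
# tasks "showing up" roughly an hour later as reported above.
broker_transport_options = {'visibility_timeout': 300}  # illustrative value
```

Note the Celery docs warn that the timeout should exceed the ETA/countdown of your longest-scheduled task, or such tasks may be executed twice.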
-
Thanks for the very helpful comment, BTW!
-
Same issue, any update on a fix?
-
Same issue here. We use celery in three of our projects, and all of them are hitting this unexpected error. We will try replacing redis with rabbitmq; if that doesn't work, we'll have to reconsider using celery in production.
-
Same issue; we switched from redis to rabbitmq a few weeks ago and still see it.
-
Yes, time to shake the tree on this issue. It's been open for 2 years now with no fix. But I suppose moving Redis to an independent instance that is stable and rarely restarted would be a middle ground.
-
As I mentioned in a comment above (and in a few other places), there is indeed a lot of work going on behind the scenes. Yesterday we reached a significant milestone in the "behind the scenes" effort to improve our testing infrastructure. All in all, I want to clarify that we took a step aside so we can take many more steps forward: our focus is building the infrastructure that will let us deal with many years of issues, not just one at a time, by giving the community the simplicity it needs to contribute. Once our new testing infrastructure is finalized, we'll be able to focus on the Celery v5.4 release, which I hope will also include a fix for this issue, as I'd rather fix the bug than just reduce the official support.
-
Just to confirm: if the Redis server runs outside a typical Docker Compose setup and is reasonably static and rarely restarted, would that largely avoid this issue? I mainly hit it when a main application that includes the Redis instance in its docker-compose file is restarted, usually because I'm updating the code (every few weeks is common). When I do this, two other containers running as celery workers from separate docker-compose files get disconnected on the Redis restart and fail to reconnect until I do a complete restart of those containers (which I commonly forget to do). So technically, if I rarely restart Redis, I should avoid the problem most of the time. If that's the case, it should be a recommendation in the docs until the bug is fixed.
-
The tree has been shaken!
-
Celery v5.4.0rc1 is ready for testing!
-
Same problem with
This is my config:

```python
import configparser

config = configparser.ConfigParser()
config.read('config.ini')
CELERY_CONFIG = config['celery']

broker_url = CELERY_CONFIG['broker']
result_backend = CELERY_CONFIG['backend']
worker_cancel_long_running_tasks_on_connection_loss = True
broker_connection_retry_on_startup = True
# FIXME: work-around for sudden connection drop on Redis.
# Track this problem on:
# - https://groups.google.com/g/celery-users/c/6yF34oA30Ys
# - https://github.com/celery/celery/discussions/7276
broker_connection_max_retries = None
broker_pool_limit = None
worker_deduplicate_successful_tasks = True
worker_concurrency = 1
worker_prefetch_multiplier = 1
worker_state_db = "state.db"
worker_send_task_events = True
worker_pool = 'prefork'
task_time_limit = 3600
```

My worker start command is:
Running
After my worker stopped consuming tasks, I tried to
It seems like the worker couldn't close the connection and got stuck there. I haven't tried the
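(Not from the report above, but one way to probe whether a stuck worker still answers remote-control commands is a ping over the broker. A sketch, assuming your app instance is importable; the import path is hypothetical:)

```python
# check_worker.py -- sketch: ask running workers to reply over the broker.
from myproject.celery import app  # hypothetical import path

# inspect().ping() broadcasts a ping and collects replies for `timeout`
# seconds; a worker wedged in its broker connection typically never replies.
replies = app.control.inspect(timeout=5).ping()
print(replies or 'no workers replied')
```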
-
I am also experiencing this issue. Simple setup: Celery worker, one Redis database as broker, one as results backend, both on the same Redis instance. When restarting Redis, meaning it comes back up within seconds or even less, this is logged:
Sometimes tasks 'work' again, but more often they don't; whether they do or don't seems random. When tasks are received, nothing is logged. Contrary to some other messages in this thread, Celery shuts down gracefully when stopping it. What I've tried: Set
... so it's no surprise it doesn't work, as restarting the single Redis instance causes the broker to become unavailable too. Set keepalive on the socket, according to the configuration at #7276 (reply in thread). That doesn't help either.
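(For readers who can't follow the link, the socket keepalive configuration referred to there looks roughly like the following. This is a sketch: the timing values are illustrative, and the TCP_KEEP* constants assume Linux:)

```python
import socket

# celeryconfig.py -- sketch: enable TCP keepalive on the Redis broker
# connection so half-dead connections are noticed sooner. Kombu's Redis
# transport passes these options through to redis-py.
broker_transport_options = {
    'socket_keepalive': True,
    'socket_keepalive_options': {
        socket.TCP_KEEPIDLE: 60,   # idle seconds before the first probe
        socket.TCP_KEEPINTVL: 10,  # seconds between probes
        socket.TCP_KEEPCNT: 6,     # failed probes before dropping the link
    },
}
```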
-
I am experiencing an issue with celery==5.2.3 that I did not experience with celery 4.4.7, which I recently migrated from.
I am using redis (5.0.9) as the message broker. When I manually restart redis, the celery worker from time to time (after the redis restart) stops consuming tasks indefinitely. Celery beat is able to publish tasks to the broker without any problem after redis restarts. Once I force a restart of the worker, it picks up all the tasks beat scheduled in the meantime.
Only if I run the celery 5 worker without heartbeat/gossip/mingle does this not happen; then I can restart redis without the worker ceasing to consume tasks after it reconnects.
I am running the worker with the following options to "make it work":

```shell
celery -A proj worker -l info --without-heartbeat --without-gossip --without-mingle
```

When I try running celery with rabbitmq as the message broker and with mingle/gossip/heartbeat enabled, I cannot reproduce the bug (it only happens with redis). But for my scenario I need to keep using redis.
I have 2 questions:
Logs prior to when it gets stuck: I waited half an hour and tasks (periodic tasks are scheduled every 5 minutes) were not consumed by the worker, then I hit ctrl+c. There are no logs when it stops consuming messages; it just "freezes":
Celery report