Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SystemD unit files don't work with Celery 5 #6397

Closed
11 of 18 tasks
thedrow opened this issue Oct 6, 2020 · 13 comments · Fixed by #6401
Closed
11 of 18 tasks

SystemD unit files don't work with Celery 5 #6397

thedrow opened this issue Oct 6, 2020 · 13 comments · Fixed by #6401

Comments

@thedrow
Copy link
Member

thedrow commented Oct 6, 2020

Checklist

  • I have verified that the issue exists against the master branch of Celery.
  • This has already been asked to the discussion group first.
  • I have read the relevant section in the
    contribution guide
    on reporting bugs.
  • I have checked the issues list
    for similar or identical bug reports.
  • I have checked the pull requests list
    for existing proposed fixes.
  • I have checked the commit log
    to find out if the bug was already fixed in the master branch.
  • I have included all related issues and possible duplicate issues
    in this issue (If there are none, check this box anyway).

Mandatory Debugging Information

  • I have included the output of celery -A proj report in the issue.
    (if you are not able to do this, then at least specify the Celery
    version affected).
  • I have verified that the issue exists against the master branch of Celery.
  • I have included the contents of pip freeze in the issue.
  • I have included all the versions of all the external dependencies required
    to reproduce this bug.

Optional Debugging Information

  • I have tried reproducing the issue on more than one Python version
    and/or implementation.
  • I have tried reproducing the issue on more than one message broker and/or
    result backend.
  • I have tried reproducing the issue on more than one version of the message
    broker and/or result backend.
  • I have tried reproducing the issue on more than one operating system.
  • I have tried reproducing the issue on more than one workers pool.
  • I have tried reproducing the issue with autoscaling, retries,
    ETA/Countdown & rate limits disabled.
  • I have tried reproducing the issue after downgrading
    and/or upgrading Celery and its dependencies.

Related Issues and Possible Duplicates

Related Issues

Possible Duplicates

  • None

Environment & Settings

Celery version: 5.0.0 (singularity)

celery report Output:

celery report

software -> celery:5.0.0 (singularity) kombu:5.0.2 py:3.8.2
            billiard:3.6.3.0 py-amqp:5.0.1
platform -> system:Linux arch:64bit, ELF
            kernel version:5.4.0-48-generic imp:CPython
loader   -> celery.loaders.default.Loader
settings -> transport:amqp results:disabled

deprecated_settings: None

Steps to Reproduce

Required Dependencies

  • Minimal Python Version: N/A or Unknown
  • Minimal Celery Version: 5.0.0
  • Minimal Kombu Version: N/A or Unknown
  • Minimal Broker Version: N/A or Unknown
  • Minimal Result Backend Version: N/A or Unknown
  • Minimal OS and/or Kernel Version: N/A or Unknown
  • Minimal Broker Client Version: N/A or Unknown
  • Minimal Result Backend Client Version: N/A or Unknown

Python Packages

pip freeze Output:

alabaster==0.7.12
amqp==5.0.1
appdirs==1.4.4
attrs==19.3.0
autoflake==1.3.1
autopep8==1.5.4
aws-xray-sdk==0.95
azure-common==1.1.5
azure-nspkg==3.0.2
azure-storage==0.36.0
azure-storage-common==1.1.0
azure-storage-nspkg==3.1.0
Babel==2.8.0
backcall==0.2.0
bandit==1.6.2
basho-erlastic==2.1.1
billiard==3.6.3.0
bleach==3.1.5
boto==2.49.0
boto3==1.13.26
botocore==1.16.26
bump2version==1.0.0
bumpversion==0.6.0
case==1.5.3
cassandra-driver==3.20.2
-e git+git@github.com:celery/celery.git@eab4bc3d28b5dc547bb6dfac981b01595c565c3a#egg=celery
certifi==2020.4.5.1
cffi==1.14.0
cfgv==3.1.0
chardet==3.0.4
click==7.1.2
click-didyoumean==0.0.3
click-repl==0.1.6
cmake-setuptools==0.1.3
codecov==2.1.8
colorama==0.4.3
couchbase==2.5.12
coverage==5.2
cryptography==3.0
DateTime==4.3
decorator==4.4.2
distlib==0.3.1
dnspython==1.16.0
docker==4.2.1
docutils==0.15.2
ecdsa==0.15
elasticsearch==7.8.1
ephem==3.7.7.1
eventlet==0.26.1
filelock==3.0.12
future==0.18.2
gevent==20.6.2
gitdb==4.0.5
GitPython==3.1.7
greenlet==0.4.16
identify==1.4.19
idna==2.9
imagesize==1.2.0
importlib-metadata==1.6.1
iniconfig==1.0.1
ipython==7.17.0
ipython-genutils==0.2.0
isort==5.0.9
jedi==0.17.2
jeepney==0.4.3
Jinja2==2.11.2
jmespath==0.10.0
jsondiff==1.1.1
jsonpickle==1.4.1
keyring==21.2.1
kombu==5.0.2
linecache2==1.0.0
MarkupSafe==1.1.1
mock==4.0.2
monotonic==1.5
more-itertools==8.3.0
moto==1.3.7
msgpack==1.0.0
nodeenv==1.4.0
nose==1.3.7
packaging==20.4
parso==0.7.1
pbr==5.4.5
pexpect==4.8.0
pickleshare==0.7.5
pkginfo==1.5.0.1
pluggy==0.13.1
pre-commit==2.5.1
prompt-toolkit==3.0.7
ptyprocess==0.6.0
py==1.9.0
pyaml==20.4.0
pyArango==1.3.4
pycodestyle==2.6.0
pycouchdb==1.14.1
pycparser==2.20
pycryptodome==3.9.7
pycurl==7.43.0.5
pydocumentdb==2.3.2
pyflakes==2.2.0
Pygments==2.6.1
pylibmc==1.6.1
pymongo==3.11.0
pyparsing==2.4.7
pytest==6.0.1
pytest-celery==0.0.0a1
pytest-cov==2.10.0
pytest-rerunfailures==9.0
pytest-sugar==0.9.4
pytest-timeout==1.4.2
pytest-travis-fold==1.3.0
python-consul==1.1.0
python-dateutil==2.8.1
python-jose==2.0.2
python-memcached==1.59
python3-protobuf==2.5.0
pytz==2020.1
pyupgrade==2.6.2
PyYAML==5.3.1
readme-renderer==26.0
redis==3.5.3
requests==2.23.0
requests-toolbelt==0.9.1
responses==0.10.14
rfc3986==1.4.0
riak==2.7.0
s3transfer==0.3.3
SecretStorage==3.1.2
simplejson==3.17.2
six==1.15.0
smmap==3.0.4
snowballstemmer==2.0.0
softlayer-messaging==1.0.3
Sphinx==3.0.4
sphinx-celery==2.0.0
sphinx-click==2.5.0
sphinx-testing==0.7.2
sphinxcontrib-applehelp==1.0.2
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==1.0.3
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.4
SQLAlchemy==1.3.18
stevedore==3.2.0
tblib==1.7.0
termcolor==1.1.0
tokenize-rt==4.0.0
toml==0.10.1
tox==3.20.0
tqdm==4.48.0
traceback2==1.4.0
traitlets==4.3.3
twine==3.2.0
typer==0.3.0
unittest2==1.1.0
urllib3==1.25.9
vine==5.0.0
virtualenv==20.0.31
wcwidth==0.2.5
webencodings==0.5.1
websocket-client==0.57.0
Werkzeug==1.0.1
wrapt==1.12.1
xar==19.4.22
xmltodict==0.12.0
zipp==3.1.0
zope.event==4.4
zope.interface==5.1.0

Other Dependencies

A recent version of SystemD.

Minimally Reproducible Test Case

Expected Behavior

The service should start without error.

Actual Behavior

The service crashes with the following error:

okt 06 18:13:53 nezzybuild systemd[1]: Started Celery Service.
okt 06 18:13:53 nezzybuild systemd[1]: celery.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
okt 06 18:13:53 nezzybuild systemd[1]: celery.service: Failed with result 'exit-code'.
@thedrow
Copy link
Member Author

thedrow commented Oct 6, 2020

@mfonville Can you please share your celery.conf file?
Also could you please check journald?
Paste the exact output of journalctl -xe --unit celery.service.

Mine is (as expected):

Oct 06 21:01:46 ubuntu systemd[1]: Starting Celery Service...
-- Subject: A start job for unit celery.service has begun execution
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A start job for unit celery.service has begun execution.
--
-- The job identifier is 6969.
Oct 06 21:01:46 ubuntu celery[72244]: Usage: celery [OPTIONS] COMMAND [ARGS]...
Oct 06 21:01:46 ubuntu celery[72244]: Error: Invalid value for '-A' / '--app': No module named 'proj'
Oct 06 21:01:46 ubuntu systemd[1]: celery.service: Control process exited, code=exited, status=2/INVALIDARGUMENT

@mfonville
Copy link
Contributor

mfonville commented Oct 6, 2020

celery.conf (masked the name of my django app):

# Name of nodes to start
# here we have a single node
CELERYD_NODES="w1"
# or we could have three nodes:
#CELERYD_NODES="w1 w2 w3"

# Absolute or relative path to the 'celery' command:
CELERY_BIN="/opt/mydjangoapp/env/bin/celery"

# App instance to use
# comment out this line if you don't use an app
CELERY_APP="mydjangoapp"
# or fully qualified:
#CELERY_APP="proj.tasks:app"

# How to call manage.py
CELERYD_MULTI="multi"

# Extra command-line arguments to the worker
CELERYD_OPTS="--time-limit=300 --concurrency=8"

# - %n will be replaced with the first part of the nodename.
# - %I will be replaced with the current child process index
#   and is important when using the prefork pool to avoid race conditions.
CELERYD_PID_FILE="/var/run/celery/%n.pid"
CELERYD_LOG_FILE="/var/log/celery/%n%I.log"
CELERYD_LOG_LEVEL="INFO"

# you may wish to add these options for Celery Beat
CELERYBEAT_PID_FILE="/var/run/celery/beat.pid"
CELERYBEAT_LOG_FILE="/var/log/celery/beat.log"

# For Celery 5 beat with django
DJANGO_SETTINGS_MODULE=mydjangoapp.settings
CELERYD_CHDIR="/opt/mdyjangoapp"

And the log:

-- The job identifier is 1177 and the job result is done.
okt 06 21:13:09 nezzybuild systemd[1]: Starting Celery Service...
-- Subject: A start job for unit celery.service has begun execution
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- A start job for unit celery.service has begun execution.
-- 
-- The job identifier is 1177.
okt 06 21:13:09 nezzybuild sh[4802]: celery multi v5.0.0 (singularity)
okt 06 21:13:09 nezzybuild sh[4802]: > Starting nodes...
okt 06 21:13:09 nezzybuild sh[4802]:         > w1@nezzybuild: OK
okt 06 21:13:09 nezzybuild systemd[1]: Started Celery Service.
-- Subject: A start job for unit celery.service has finished successfully
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- A start job for unit celery.service has finished successfully.
-- 
-- The job identifier is 1177.
okt 06 21:13:09 nezzybuild systemd[1]: celery.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- An ExecStart= process belonging to unit celery.service has exited.
-- 
-- The process' exit code is 'exited' and its exit status is 2.
okt 06 21:13:09 nezzybuild systemd[1]: celery.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- The unit celery.service has entered the 'failed' state with result 'exit-code'.
lines 2944-2979/2979 (END)

@mfonville
Copy link
Contributor

(with the patch from #6388 )

@thedrow
Copy link
Member Author

thedrow commented Oct 7, 2020

I think that your problem is due to #6362.
What happens if you remove --time-limit=300?

The worker starts correctly, right?

@mfonville
Copy link
Contributor

mfonville commented Oct 7, 2020

Even without --time-limit (but without any other parameters removed) it gives an error:

-- The job identifier is 4893 and the job result is done.
okt 07 11:54:35 nezzybuild systemd[1]: Starting Celery Service...
-- Subject: A start job for unit celery.service has begun execution
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- A start job for unit celery.service has begun execution.
-- 
-- The job identifier is 4893.
okt 07 11:54:36 nezzybuild sh[243146]: celery multi v5.0.0 (singularity)
okt 07 11:54:36 nezzybuild sh[243146]: > Starting nodes...
okt 07 11:54:36 nezzybuild sh[243146]:         > w1@nezzybuild: OK
okt 07 11:54:36 nezzybuild systemd[1]: Started Celery Service.
-- Subject: A start job for unit celery.service has finished successfully
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- A start job for unit celery.service has finished successfully.
-- 
-- The job identifier is 4893.
okt 07 11:54:36 nezzybuild systemd[1]: celery.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- An ExecStart= process belonging to unit celery.service has exited.
-- 
-- The process' exit code is 'exited' and its exit status is 2.
okt 07 11:54:36 nezzybuild systemd[1]: celery.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- The unit celery.service has entered the 'failed' state with result 'exit-code'.

@thedrow
Copy link
Member Author

thedrow commented Oct 7, 2020

Is there something in Celery's log file that indicates why the process exited?

@mfonville
Copy link
Contributor

No, nothing visible in the log :-/

@mfonville
Copy link
Contributor

Is it maybe an idea to let the the command itself report in its output the parameters it has interpreted? That way the user can verify any discrepancies.
And maybe there is someone from the Click-project that could take a look (e.g. asking them on their Discord?) whether they have any suggestions. Because I have the feeling that with multi but also some other commands that allow a set of recursive options Click might have some more straightforward solutions?

@thedrow
Copy link
Member Author

thedrow commented Oct 7, 2020

No. This is definitely our problem and I now know why this happens.
I'm working on a fix.

@thedrow
Copy link
Member Author

thedrow commented Oct 7, 2020

I have found some problems but not the problem.

I think we should take a closer look at our process detaching code.

@thedrow
Copy link
Member Author

thedrow commented Oct 7, 2020

For future reference this command exits without error but doesn't really start the process:

/home/thedrow/.virtualenvs/celery/bin/python -m celery -A myapp worker --detach --pidfile=/home/thedrow/Documents/Projects/celery/examples/app/worker.pid --logfile=/home/thedrow/Documents/Projects/celery/examples/app/worker%I.log --loglevel=INFO -n worker@ubuntu --executable=/home/thedrow/.virtualenvs/celery/bin/python

Your working directory should be /path/to/git/clone/celery/examples/app.

@thedrow
Copy link
Member Author

thedrow commented Oct 8, 2020

Can you please try #6401 (including the SystemD changes) and let me know?
I tested it locally. The service starts, stops, and restarts.

@mfonville
Copy link
Contributor

@thedrow yes, that fixes the last issues for me and makes Celery5 start correctly

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants