
Celery hangs when an exception occurs inside a group of chains with redis as a broker #6437

Open
11 of 18 tasks
emanuelmd opened this issue Oct 25, 2020 · 1 comment

Comments


emanuelmd commented Oct 25, 2020

Checklist

  • This has already been asked to the discussion group first.
  • I have read the relevant section in the
    contribution guide
    on reporting bugs.
  • I have checked the issues list
    for similar or identical bug reports.
  • I have checked the pull requests list
    for existing proposed fixes.
  • I have checked the commit log
    to find out if the bug was already fixed in the master branch.
  • I have verified that the issue exists against the master branch of Celery.
  • I have included all related issues and possible duplicate issues
    in this issue (If there are none, check this box anyway).

Mandatory Debugging Information

  • I have included the output of celery -A proj report in the issue.
    (if you are not able to do this, then at least specify the Celery
    version affected).
  • I have verified that the issue exists against the master branch of Celery.
  • I have included the contents of pip freeze in the issue.
  • I have included all the versions of all the external dependencies required
    to reproduce this bug.

Optional Debugging Information

  • I have tried reproducing the issue on more than one Python version
    and/or implementation.
  • I have tried reproducing the issue on more than one message broker and/or
    result backend.
  • I have tried reproducing the issue on more than one version of the message
    broker and/or result backend.
  • I have tried reproducing the issue on more than one operating system.
  • I have tried reproducing the issue on more than one workers pool.
  • I have tried reproducing the issue with autoscaling, retries,
    ETA/Countdown & rate limits disabled.
  • I have tried reproducing the issue after downgrading
    and/or upgrading Celery and its dependencies.

Related Issues and Possible Duplicates

  • N/A

Related Issues

  • None

Possible Duplicates

  • I've skimmed the issues list and the internet and it seems there's definitely something going on with this kind of chain composition. The issue appears to date back to at least 2013, and I can reproduce it on Celery 5.0.1 & 4.7.x.

Environment & Settings

  • Celery 5.0.1 & 4.7.x
  • I included the report in the demo repo

Steps to Reproduce

  • Follow the instructions inside the demo repo

Required Dependencies

  • Minimal Python Version: Tested with 3.7.0, 3.8.0 & 3.8.5
  • Minimal Celery Version: 4.7.x

Python Packages

  • Included in demo repo

Other Dependencies

  • N/A

Minimally Reproducible Test Case

https://github.com/emanuelmd/freezelery

Expected Behavior

  • Exceptions should be propagated from chains that contain groups

Actual Behavior

  • The calling code hangs (see the sketch below)
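
For illustration, here is a minimal sketch of the kind of composition being reported; the task names and module layout are assumptions, not taken from the linked demo repo. With the Redis broker and backend, the final get() is expected to re-raise the task's exception but instead blocks.

from celery import Celery, chain, group

app = Celery("repro", broker="redis://", backend="redis://")

@app.task
def ok(*_):
    return "ok"

@app.task
def boom(*_):
    # This exception should reach the caller of get() below.
    raise RuntimeError("expected to propagate to the caller")

# A chain that contains a group, called from ordinary (non-task) code.
workflow = chain(ok.s(), boom.s(), group(ok.s(), ok.s()))
result = workflow.delay()

# Expected: RuntimeError is re-raised here.
# Observed (per this report): the call never returns.
result.get()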

Notes

Happy to help track this down. Let me know if you need any other information.

maybe-sybr (Contributor) commented:

This appears to be related to chords misbehaving again - possibly an edge case we missed in #6354 or something nearby? I have a more minimal MRTC here which appears to cover the gist of your linked repository:

import atexit

import celery

app = celery.Celery("app", broker="redis://", backend="redis://")

@app.task
def nop(*_):
    # Trivial task that accepts and ignores any arguments.
    pass

@app.task
def die(*_):
    # Always fails; this is the exception that should propagate to the caller.
    raise RuntimeError

# Group of three chains, each chain ending in a nested group
# (discussed below as a promoted chord).
g = celery.group((nop.s() | die.s() | celery.group(nop.s())) for _ in range(3))
r = g.delay()

# Try to join the result on interpreter exit; this get() times out.
atexit.register(lambda: print(r.get(disable_sync_subtasks=False, timeout=1)))

You can run celery -A app worker -l DEBUG and then ^C out after the three runtime errors appear. The atexit sync get will time out when the final element of each chain in the group is itself a group (i.e. the exception occurs in the header of the chord which is magically constructed from the group with its preceding chain element).

Removing the external group so it's just a single chain allows the result object to be joined without timing out, so my gut feel is that perhaps a task ID or some completion event for the encapsulating group isn't being handled properly. Maybe the promoted chord doesn't retain some task ID which is needed to finalise the chain for an encapsulating canvas signature type?
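
For contrast, here is a sketch of that single-chain case, reusing the app and tasks from the snippet above; per the observation above, joining this result does not time out:

# Same `app`, `nop` and `die` as above; only the encapsulating group is removed.
c = nop.s() | die.s() | celery.group(nop.s())
r = c.delay()

# Per the observation above, this completes without timing out
# (the RuntimeError from `die` propagates) rather than hanging.
r.get(disable_sync_subtasks=False, timeout=1)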

thedrow added a commit that referenced this issue Mar 17, 2021
thedrow added a commit that referenced this issue Mar 17, 2021
auvipy pushed a commit that referenced this issue Mar 22, 2021
* Added testcase for issue #6437.

* Add second test case.