Result object for groups containing chords which have a chain as their body time out #6734

Closed
4 of 18 tasks
maybe-sybr opened this issue Apr 19, 2021 · 6 comments

Comments

@maybe-sybr
Contributor

Checklist

  • I have verified that the issue exists against the master branch of Celery.
  • This has already been asked to the discussion group first.
  • I have read the relevant section in the
    contribution guide
    on reporting bugs.
  • I have checked the issues list
    for similar or identical bug reports.
  • I have checked the pull requests list
    for existing proposed fixes.
  • I have checked the commit log
    to find out if the bug was already fixed in the master branch.
  • I have included all related issues and possible duplicate issues
    in this issue (If there are none, check this box anyway).

Mandatory Debugging Information

  • I have included the output of celery -A proj report in the issue.
    (if you are not able to do this, then at least specify the Celery
    version affected).
  • I have verified that the issue exists against the master branch of Celery.
  • I have included the contents of pip freeze in the issue.
  • I have included all the versions of all the external dependencies required
    to reproduce this bug.

Optional Debugging Information

  • I have tried reproducing the issue on more than one Python version
    and/or implementation.
  • I have tried reproducing the issue on more than one message broker and/or
    result backend.
  • I have tried reproducing the issue on more than one version of the message
    broker and/or result backend.
  • I have tried reproducing the issue on more than one operating system.
  • I have tried reproducing the issue on more than one workers pool.
  • I have tried reproducing the issue with autoscaling, retries,
    ETA/Countdown & rate limits disabled.
  • I have tried reproducing the issue after downgrading
    and/or upgrading Celery and its dependencies.

Related Issues and Possible Duplicates

Related Issues

  • None

Possible Duplicates

  • None

Environment & Settings

Celery version:

celery report Output:

Steps to Reproduce

Required Dependencies

  • Minimal Python Version: N/A or Unknown
  • Minimal Celery Version: N/A or Unknown
  • Minimal Kombu Version: N/A or Unknown
  • Minimal Broker Version: N/A or Unknown
  • Minimal Result Backend Version: N/A or Unknown
  • Minimal OS and/or Kernel Version: N/A or Unknown
  • Minimal Broker Client Version: N/A or Unknown
  • Minimal Result Backend Client Version: N/A or Unknown

Python Packages

pip freeze Output:

Other Dependencies

N/A

Minimally Reproducible Test Case

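This is the guts of the integration test I'm going to put up in #6733:

from celery import chain, chord, group
from t.integration.tasks import identity  # assumed: the test suite's identity task

# A chord whose body is a single-task chain, nested inside a group
child_chord = chord(identity.si(42), chain((identity.s(), )))
group_sig = group((child_chord, ))
res = group_sig.delay()
res.get(timeout=ARBITRARY)  # ARBITRARY: any timeout value, in seconds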

Expected Behavior

  • res.get() should return the group's results within the timeout

Actual Behavior

  • res.get() raises TimeoutError no matter how long you set your timeout value (AFAICT)
  • A subsequent call to res.get() gets the result properly since the task(s) do run and by that time the results have probably landed
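
A minimal sketch of the "second call works" behaviour described above, written against stand-ins rather than a real broker. `StandInTimeoutError`, `FlakyResult` and `get_with_second_attempt` are hypothetical names invented for this illustration; the stub only mimics the reported symptom and says nothing about why the first join times out.

```python
class StandInTimeoutError(Exception):
    """Stand-in for the timeout exception raised by res.get() (assumption)."""


class FlakyResult:
    """Hypothetical stub: the first get() times out, later calls succeed."""

    def __init__(self, value):
        self.value = value
        self.calls = 0

    def get(self, timeout=None):
        self.calls += 1
        if self.calls == 1:
            raise StandInTimeoutError("join timed out")
        return self.value


def get_with_second_attempt(result, timeout, timeout_exc=StandInTimeoutError):
    """Call result.get(); on a timeout, try exactly once more."""
    try:
        return result.get(timeout=timeout)
    except timeout_exc:
        # The tasks did run, so the results have probably landed by now.
        return result.get(timeout=timeout)


stub = FlakyResult("placeholder")
print(get_with_second_attempt(stub, timeout=1))
```

Against a real `GroupResult`, the same helper could presumably be called with `timeout_exc=celery.exceptions.TimeoutError`. This is a workaround for callers, not a fix for the underlying issue.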
@maybe-sybr
Contributor Author

Discovered while experimenting with tests for #6721 in #6733. #6733 should land an xfailing test for this issue.

I think this might be due to the promise in the GroupResult not being fulfilled, since the chain has to be re-delayed from the serialised request stored as the chord's body. If that's the case then this issue is tangentially related to #6411 but would need to be solved in some other way.

My gut feel is that we might want to add a final check prior to allowing a timeout to bubble up, to see if the results we were waiting for have actually landed or if we truly did time out.
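
The "final check" idea could be sketched roughly as follows. This is an illustration under stated assumptions, not Celery's join code: `join_with_final_check`, `StubChild` and `never_notified` are hypothetical names, the built-in `TimeoutError` stands in for Celery's own exception, and the child objects are only assumed to offer `ready()` and `get()`.

```python
class StubChild:
    """Hypothetical child result exposing ready() and get()."""

    def __init__(self, value, is_ready=True):
        self.value = value
        self.is_ready = is_ready

    def ready(self):
        return self.is_ready

    def get(self):
        return self.value


def never_notified(timeout):
    """Stand-in for a blocking join whose completion signal never arrives."""
    raise TimeoutError(f"no notification within {timeout}s")


def join_with_final_check(wait_for_all, children, timeout):
    """Wait for all children; on timeout, check whether they finished anyway."""
    try:
        wait_for_all(timeout)
    except TimeoutError:
        # Final check: only let the timeout bubble up if a result is
        # genuinely still missing.
        if not all(child.ready() for child in children):
            raise
    return [child.get() for child in children]


print(join_with_final_check(never_notified, [StubChild(1), StubChild(2)], 0.1))
```

The trade-off is that a late check hides the missed notification instead of fixing it, so it would mask the symptom while the promise problem stays in place.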

@maybe-sybr maybe-sybr added this to the 5.2 milestone Apr 19, 2021
@maybe-sybr
Contributor Author

I think this is likely something which can wait to land in an upcoming release rather than trying to squeeze into 5.1

maybe-sybr added a commit that referenced this issue Apr 19, 2021
This change ensures that we only have one piece of code which calculates
chord sizes (ie. `_chord._descend()`, recently made protected so other
canvas classes can use it as required). By doing so, we fix some edge
cases in the chord counting logic which was being used for children of
groups, and also add some unit tests to capture those cases and their
expected behaviours.

This change also introduces an integration test which checks the current
behaviour of chains used as chord bodies when nested in groups. Due to
some misbehaviour, likely with promise fulfillment, the `GroupResult`
object will time out unless all of its children are resolved prior to
`GroupResult` being joined (specifically, native joins block forever or
until timeout). This misbehaviour is tracked by #6734 and the test is
not marked as `xfail`ing to ensure that the current janky behaviour
continues to work as expected rather than regressing.
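
To make "one piece of code which calculates chord sizes" more concrete, here is a toy model of recursive size counting over a nested canvas. It is a sketch under assumptions, not the body of `_chord._descend()`: the `Task`, `Group`, `Chain` and `Chord` dataclasses and the `chord_size` function are hypothetical, and the counting rules (a group sums its children, a chain counts its last non-empty element, a chord counts its body) are my reading of the intent behind the commit titles.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    name: str


@dataclass
class Group:
    tasks: List[object] = field(default_factory=list)


@dataclass
class Chain:
    tasks: List[object] = field(default_factory=list)


@dataclass
class Chord:
    header: object
    body: object


def chord_size(node):
    """Count how many completions `node` contributes to an enclosing chord."""
    if isinstance(node, Task):
        return 1
    if isinstance(node, Group):
        # Every child of a group contributes.
        return sum(chord_size(child) for child in node.tasks)
    if isinstance(node, Chain):
        # Only the tail matters; skip trailing elements of zero tasks.
        for child in reversed(node.tasks):
            size = chord_size(child)
            if size > 0:
                return size
        return 0
    if isinstance(node, Chord):
        # A nested chord finishes when its body finishes.
        return chord_size(node.body)
    raise TypeError(f"unsupported node: {node!r}")


nested = Group([Task("a"), Chord(Task("h"), Chain([Task("b")]))])
print(chord_size(nested))
```

Keeping the rules in a single recursive function is the point of the change described above: group children and chord headers are then counted the same way.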
maybe-sybr added a commit that referenced this issue Apr 27, 2021
@maybe-sybr maybe-sybr self-assigned this Apr 27, 2021
@maybe-sybr
Contributor Author

I wonder if #6746 will fix this...

thedrow pushed a commit that referenced this issue Apr 28, 2021
* improv: Deconflict `chord` class and kwarg names

* improv: Make `chord.descend` protected not private

This will allow us to call it from other code in this module which needs
to accurately count chord sizes.

* fix: Counting of chord-chain tails of zero tasks

* fix: Chord counting of group children

jeyrce pushed a commit to jeyrce/celery that referenced this issue Aug 25, 2021
@auvipy
Member

auvipy commented Oct 31, 2021

I guess we can close this now or push to 5.2.x milestone

@auvipy
Member

auvipy commented Nov 4, 2021

Will reassign to the 5.2.x milestone in case it isn't fixed properly.

@auvipy auvipy closed this as completed Nov 4, 2021
@dobosevych
Contributor

What should the expected behavior be? The test is still marked as xfail.
