
Fix chord element counting #6354

Merged: 6 commits merged into celery:master on Oct 14, 2020

Conversation

@maybe-sybr (Contributor)

This change fixes chord accounting in the redis backend by ensuring that
group IDs are not overwritten by chained callbacks.

Amends the changeset from #5984

[Review comment on celery/canvas.py (outdated, resolved)]
lgtm-com bot commented Sep 18, 2020

This pull request introduces 2 alerts when merging 71f0f9e into 14a3524 - view on LGTM.com

new alerts:

  • 1 for Non-exception in 'except' clause
  • 1 for Module is imported with 'import' and 'import from'

maybe-sybr (Contributor, Author) commented Sep 18, 2020

Ah, I think the CI failures were a result of me slightly misunderstanding the base Signature type and a brain explosion around how results get passed along. Should be fixed now 🤞

maybe-sybr (Contributor, Author) commented Sep 18, 2020

Hmm, interestingly a group result in a chord doesn't behave like I might expect in redis. Not sure if it's my fault or not just yet:

    def test_nested_chord_group_chain_group_tail(self, manager):
        """
        Sanity check that a deeply nested group is completed as expected.
    
        Groups at the end of chains nested in chords have had issues and this
        simple test sanity checks that such a task structure can be completed.
        """
        try:
            manager.app.backend.ensure_chords_allowed()
        except NotImplementedError as e:
            raise pytest.skip(e.args[0])
    
        sig = chord(group(chain(
            identity.s(42),     # -> 42
            group(
                identity.s(),   # -> 42
                identity.s(),   # -> 42
            ),                  # [42, 42]
        )), identity.s())       # [[42, 42]]
        res = sig.delay()
>       assert res.get(timeout=TIMEOUT) == [[42, 42]]
E       assert [42, 42] == [[42, 42]]
E         At index 0 diff: 42 != [42, 42]
E         Left contains one more item: 42
E         Full diff:
E         - [[42, 42]]
E         ? -        -
E         + [42, 42]

It failed the other way on other backends in the previous CI pipeline because I had it asserting res.get(timeout=TIMEOUT) == [42, 42] without the encapsulating list :/

I think it might be failing based on some quirk of how chords absorb their headers but I'm not at all sure why it would be behaving differently on different backends.

@maybe-sybr (Contributor, Author)

It also looks like I may have broken a test case for the cache backend:
https://travis-ci.org/github/celery/celery/jobs/728244102

It expects a ChordError but it must not be getting raised as expected...

lgtm-com bot commented Sep 28, 2020

This pull request introduces 1 alert when merging e5d9e57 into ea37db1 - view on LGTM.com

new alerts:

  • 1 for Module is imported with 'import' and 'import from'

@auvipy (Member) left a comment

pytest is throwing the error in integration tests:

    File "/home/travis/build/celery/celery/.tox/3.8-integration-rabbitmq/lib/python3.8/site-packages/pytest_rerunfailures.py", line 7, in
        from _pytest.resultlog import ResultLog
    ModuleNotFoundError: No module named '_pytest.resultlog'

@maybe-sybr (Contributor, Author)

@auvipy - I think this got broken for master/everything (https://travis-ci.org/github/celery/celery/builds/730832672) :( perhaps some test deps got updated and broke it?

auvipy added this to the Future milestone Sep 30, 2020
@auvipy (Member) left a comment

can you try to pin some test deps to run the integration tests?

[Review comment on celery/canvas.py (outdated, resolved)]
thedrow (Member) commented Sep 30, 2020

I can't restart the build for some reason.
This does not reproduce locally for me.

auvipy closed this Sep 30, 2020
auvipy reopened this Sep 30, 2020
auvipy (Member) commented Sep 30, 2020

Redis & cache integration tests are failing now

@auvipy (Member) left a comment

can you re-check this failure?

>       assert res.get(timeout=TIMEOUT) == [[42, 42]]
E       assert [42, 42] == [[42, 42]]
E         At index 0 diff: 42 != [42, 42]
E         Left contains one more item: 42
E         Full diff:
E         - [[42, 42]]
E         ? -        -
E         + [42, 42]

t/integration/test_canvas.py:1015: AssertionError

@maybe-sybr (Contributor, Author)

Hey @auvipy. Sorry this PR hasn't gotten much love recently, I've been a bit busy!

wrt that test failure, I am now under the impression that the (AFAICT undocumented?) behaviour where groups included in groups and chords get unrolled, added as part of the fix for #1509 many years ago, might be to blame for some of this. I have added some new tests which enumerate the behaviour as I understand it and should make it clearer what we need to fix prior to merging this.

I've also found a couple of other issues in my adventures testing this PR, and I'm about to push up some new commits which fix those. One is closely related to chord counting, the other is about nested groups in chains and the promises used to fulfil them. Let me know if you'd like the promises one to be split into its own PR (it's arguably separate from this PR).

maybe-sybr force-pushed the maybe/fix-chord-counting branch 3 times, most recently from 34bd1c7 to ddb2985 on October 13, 2020 05:57
maybe-sybr (Contributor, Author) commented Oct 13, 2020

The latest build shows that the redis and key-value backends behave differently. Perhaps I've misunderstood unrolling and it doesn't happen with chords? I'm adding some more intermediate test cases locally, but if someone could chime in meanwhile to explain unrolling (if anyone knows :o ) it'd be helpful.

Edit: I think I understand why I was confused now. Groups will unroll nested groups, but obviously won't unroll chords. And chords won't unroll groups in their header either. So all my tests seem to be accurate, but the last one should have a doubly-nested list instead of a singly-nested one. That also means that the redis backend needs to be convinced to return a doubly-nested list in that case without breaking other cases. I think I know how to move forward now.
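
A rough sketch of that understanding in canvas terms (illustrative only; the trivial app and identity task below are stand-ins for the integration suite's fixtures):

    from celery import Celery, chain, chord, group

    app = Celery("sketch", broker="memory://", backend="cache+memory://")

    @app.task
    def identity(x):
        return x

    # Nested groups are unrolled into their parent group, so this behaves like
    # group(identity.s(1), identity.s(2), identity.s(3)) and yields the flat
    # result [1, 2, 3].
    flat = group(group(identity.s(1), identity.s(2)), identity.s(3))

    # A group at the end of a chain inside a chord header is *not* unrolled, so
    # the chord body should receive the doubly-nested shape [[42, 42]].
    nested = chord(
        group(chain(identity.s(42), group(identity.s(), identity.s()))),
        identity.s(),
    )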

I have some lint fixes to push but I'll save them for whenever I fix the integration tests.

lgtm-com bot commented Oct 13, 2020

This pull request introduces 1 alert when merging ddb2985 into 70cbed3 - view on LGTM.com

new alerts:

  • 1 for Module is imported with 'import' and 'import from'

auvipy (Member) commented Oct 13, 2020

sounds great. rabbit & redis integration tests are now passing while others are failing

thedrow (Member) commented Oct 14, 2020

I'm confused. Is #6411 part of this PR as well?

@maybe-sybr (Contributor, Author)

> I'm confused. Is #6411 part of this PR as well?

I'm in the middle of splitting the change up for merging in #6411 out of this PR since it's independent. I'll have a new changeset up here shortly once I sanity check a couple of things, @thedrow.

@maybe-sybr (Contributor, Author)

The change from #6411 has been stripped out of this PR now. I believe I've addressed all of the coverage commentary and have left extra commentary where requested. I'll wait for the coverage report to come back before I resolve some of those review comments.

auvipy (Member) commented Oct 14, 2020

needs rebase

lgtm-com bot commented Oct 14, 2020

This pull request fixes 1 alert when merging 2892737 into 735f167 - view on LGTM.com

fixed alerts:

  • 1 for Module is imported with 'import' and 'import from'

@maybe-sybr (Contributor, Author)

Rebased on top of master

auvipy modified the milestones: Future, 5.0.1 Oct 14, 2020
lgtm-com bot commented Oct 14, 2020

This pull request fixes 2 alerts when merging 71e40cf into 9367d36 - view on LGTM.com

fixed alerts:

  • 1 for Non-exception in 'except' clause
  • 1 for Module is imported with 'import' and 'import from'

The commit messages for the changes in this PR:

When a group task which is part of a chain was to be delayed by
`trace_task()`, it would be reconstructed from the serialized request.
Normally, this sets the `group_id` of encapsulated tasks to the ID of
the group being instantiated. However, in the specific situation of a
group that is the last task in a chain which contributes to the
completion of a chord, it is essential that the group ID of the top-most
group is used instead. This top-most group ID is used by the redis
backend to track the completions of "final elements" of a chord in the
`on_chord_part_return()` implementation. By overwriting the group ID
which was already set in the `options` dictionaries of the child tasks
being deserialized, the chord accounting done by the redis backend would
be made inaccurate and chords would never complete.

This change alters how options are overridden for signatures to ensure
that if a `group_id` has already been set, it cannot be overridden.
Since group ID should be generally opaque to users, this should not be
disruptive.
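
A rough sketch of the guard described above (an illustration of the idea only; the helper name is made up and the merged `canvas.py` code may differ):

    # Hypothetical helper: apply new options onto a signature's options dict
    # without clobbering a group_id that has already been assigned.
    def apply_options_preserving_group(sig_options, new_options):
        preserved_group_id = sig_options.get("group_id")
        sig_options.update(new_options)
        if preserved_group_id is not None:
            # Keep the group ID of the top-most group so the redis backend's
            # on_chord_part_return() accounting still lines up.
            sig_options["group_id"] = preserved_group_id
        return sig_options
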
This change amends the implementation of `chord.__length_hint__()` to
ensure that all child task types are correctly counted. Specifically:

 * all sub-tasks of a group are counted recursively
 * the final task of a chain is counted recursively
 * the body of a chord is counted recursively
 * all other simple signatures count as a single "final element"

There is also a deserialisation step if a `dict` is seen while counting
the final elements in a chord; however, this should become less important
with the merge of celery#6342 which ensures that tasks are recursively
deserialized by `.from_dict()`.
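
A rough sketch of those counting rules (illustrative only, not the merged `chord.__length_hint__()` code; `_chain` is celery's internal chain class and the helper name is made up):

    from celery.canvas import Signature, _chain, chord, group

    def count_final_elements(sig):
        """Count the "final elements" a canvas component contributes to a chord."""
        if isinstance(sig, dict):
            # Deserialise before counting, as noted above.
            sig = Signature.from_dict(sig)
        if isinstance(sig, group):
            # All sub-tasks of a group are counted recursively.
            return sum(count_final_elements(child) for child in sig.tasks)
        if isinstance(sig, _chain):
            # Only the final task of a chain is counted, recursively.
            return count_final_elements(sig.tasks[-1]) if sig.tasks else 0
        if isinstance(sig, chord):
            # The body of a chord is counted recursively.
            return count_final_elements(sig.body)
        # Any other simple signature counts as a single "final element".
        return 1
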
These tests are intended to show that group unrolling should be
respected in various ways by all backends. They should make it more
clear what behaviour we should be expecting from nested canvas
components and ensure that all the implementations (mostly relevant to
chords and `on_chord_part_return()` code) behave sensibly.

This avoids an issue where the `on_chord_part_return()` implementation
would check the length of the result of a chain ending in a nested
group. This would manifest in behaviour where a worker would be blocked
waiting for the result object it holds to complete, since it would
attempt to `.join()` the result object. In situations with plenty of
workers, this wouldn't really cause any noticeable issue apart from some
latency or unpredictable failures - but in concurrency-constrained
situations like the integration tests, it causes deadlocks.

We know from previous commits in this series that chord completion is
more complex than just waiting for a direct child, so we correct the
`size` value in `BaseKeyValueStoreBackend.on_chord_part_return()` to
respect the `chord_size` value from the request, falling back to the
length of the `deps` if that value is missing for some reason (this is
necessary to keep a number of the tests happy but it's not clear to me
if that will ever be the case in real life situations).
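
A rough sketch of that size selection (the function and variable names are assumptions drawn from the description above, not the backend's actual locals):

    def resolve_chord_size(request, deps):
        # Prefer the chord_size recorded on the request; fall back to the
        # number of direct dependencies if it is missing for some reason.
        chord_size = getattr(request, "chord_size", None)
        return chord_size if chord_size is not None else len(deps)
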
This change fixes the chord result flattening issue which manifested
when using the Redis backend due to its deliberate throwing away of
information about the header result structure. Rather than assuming that
all results which contribute to the finalisation of a chord should be
siblings, this change checks if any are complex (i.e. `GroupResult`s) and
falls back to behaviour similar to that implemented in the
`KeyValueStoreBackend` which restores the original `GroupResult` object
and `join()`s it.

We retain the original behaviour which is billed as an optimisation in
f09b041. We could behave better in the complex header result case by not
bothering to stash the results of contributing tasks under the `.j` zset
since we won't be using them, but without checking for the presence of
the complex group result on every `on_chord_part_return()` call, we
can't be sure that we won't need those stashed results later on. This
would be an opportunity for optimisation in future if we were to use an
`EVAL` to only do the `zadd()` if the group result key doesn't exist.
However, avoiding the result encoding work in `on_chord_part_return()`
would be more complicated. For now, it's not worth the brainpower.

This change also slightly refactors the redis backend unit tests to make
it easier to build fixtures and hit both the complex and simple result
structure cases.
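
A rough sketch of the fallback path described above (names, signature and flow are assumptions based on this description, not the merged redis backend code):

    from celery.result import GroupResult

    def maybe_join_complex_header(app, group_id, callback):
        """Restore and join the header GroupResult when it contains nested
        (complex) results, mirroring the KeyValueStoreBackend behaviour."""
        header_result = app.GroupResult.restore(group_id)
        if header_result is None:
            return False
        if any(isinstance(child, GroupResult) for child in header_result.results):
            # Complex header: joining preserves the nested result structure
            # instead of flattening siblings into one list.
            callback.delay(header_result.join(propagate=True))
            return True
        # Simple header: the existing flat optimisation can stay in place.
        return False
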
@maybe-sybr (Contributor, Author)

This new changeset should address the last of the missed coverage 🤞

auvipy merged commit beddbee into celery:master Oct 14, 2020
@maybe-sybr (Contributor, Author)

Thanks @auvipy
