New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix chord element counting #6354
Conversation
This pull request introduces 2 alerts when merging 71f0f9e into 14a3524 - view on LGTM.com new alerts:
|
71f0f9e
to
f63a58c
Compare
Ah, I think the CI failures were a result of me slightly misunderstanding the base |
Hmm, interestingly a group result in a chord doesn't behave like I might expect in redis. Not sure if it's my fault or not just yet: def test_nested_chord_group_chain_group_tail(self, manager):
"""
Sanity check that a deeply nested group is completed as expected.
Groups at the end of chains nested in chords have had issues and this
simple test sanity check that such a tsk structure can be completed.
"""
try:
manager.app.backend.ensure_chords_allowed()
except NotImplementedError as e:
raise pytest.skip(e.args[0])
sig = chord(group(chain(
identity.s(42), # -> 42
group(
identity.s(), # -> 42
identity.s(), # -> 42
), # [42, 42]
)), identity.s()) # [[42, 42]]
res = sig.delay()
> assert res.get(timeout=TIMEOUT) == [[42, 42]]
E assert [42, 42] == [[42, 42]]
E At index 0 diff: 42 != [42, 42]
E Left contains one more item: 42
E Full diff:
E - [[42, 42]]
E ? - -
E + [42, 42] It failed the other way on other backends in the previous CI pipeline because I had it asserting I think it might be failing based on some quirk of how chords absorb their headers but I'm not at all sure why it would be behaving differently on different backends. |
It also looks like I may have broken a test case for the cache backend: It expects a chorderror but it must not be getting raised as expected... |
f63a58c
to
e5d9e57
Compare
This pull request introduces 1 alert when merging e5d9e57 into ea37db1 - view on LGTM.com new alerts:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
File "/home/travis/build/celery/celery/.tox/3.8-integration-rabbitmq/lib/python3.8/site-packages/pytest_rerunfailures.py", line 7, in
pytest is throwing the error in integration tests
from _pytest.resultlog import ResultLog
ModuleNotFoundError: No module named '_pytest.resultlog'
@auvipy - I think this got broken for master/everything (https://travis-ci.org/github/celery/celery/builds/730832672) :( perhaps some test deps got updated and broke it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you try to pin some test deps to run the integration tests?
I can't restart the build for some reason. |
Redis & cache integration tests are failing now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you re check this failure? > assert res.get(timeout=TIMEOUT) == [[42, 42]]
E assert [42, 42] == [[42, 42]]
E At index 0 diff: 42 != [42, 42]
E Left contains one more item: 42
E Full diff:
E - [[42, 42]]
E ? - -
E + [42, 42]
t/integration/test_canvas.py:1015: AssertionError
Hey @auvipy . Sorry this PR hasn't gotten much love recently, I've been a bit busy! wrt that test failure, I am now under the impression that the (AFAICT undocumented?) behaviour that groups included in groups and chord get unrolled as part of the fix for #1509 many years ago might be blame for some of this. I have added some new tests which enumerate the behaviour as I understand it and should make it more clear what we need to fix prior to merging this. I've also found a couple of other issues in my adventures testing this PR, and I'm about to push up some new commits which fix those. One is closely related to chord counting, the other is about nested groups in chains and the promises used to fulfil them. Let me know if you'd like the promises on to be split into its own PR (it's arguably separate from this PR). |
34bd1c7
to
ddb2985
Compare
The latest build shows that the redis and key-value backends behave differently. Perhaps I've misunderstood unrolling and it doesn't happen with chords? I'm adding some more intermediate test cases locally, but if someone could chime in meanwhile to explain unrolling (if anyone knows :o ) it'd be helpful. Edit: I think I understand why I was confused now. Groups will unroll nested groups, but obviously won't unroll chords. And chords won't unroll groups in their header either. So all my tests seems to be accurate, but the last one should have a doubly-nested list instead of a singly nested one. That also means that the redis backend needs to be convinced to return a double nested list in that case without breaking other cases. I think I know how to move forward now. I have some lint fixes to push but I'll save them for whenever I fix the integration tests. |
This pull request introduces 1 alert when merging ddb2985 into 70cbed3 - view on LGTM.com new alerts:
|
sounds great. rabbit & redis integrations tests are now passing while others are failing |
ddb2985
to
42a3f46
Compare
I'm confused. Is #6411 part of this PR as well? |
db0c570
to
2892737
Compare
The change from #6411 has been stripped out of this PR now. I believe I've addressed all of the coverage commentary and have left extra commentary where requested. I'll wait for the coverage report to come back before I resolve some of those review comments. |
needs rebase |
This pull request fixes 1 alert when merging 2892737 into 735f167 - view on LGTM.com fixed alerts:
|
2892737
to
71e40cf
Compare
Rebased on top of master |
This pull request fixes 2 alerts when merging 71e40cf into 9367d36 - view on LGTM.com fixed alerts:
|
When a group task which is part of a chain was to be delayed by `trace_task()`, it would be reconstructed from the serialized request. Normally, this sets the `group_id` of encapsulated tasks to the ID of the group being instantiated. However, in the specific situation of a group that is the last task in a chain which contributes to the completion of a chord, it is essential that the group ID of the top-most group is used instead. This top-most group ID is used by the redis backend to track the completions of "final elements" of a chord in the `on_chord_part_return()` implementation. By overwriting the group ID which was already set in the `options` dictionaries of the child tasks being deserialized, the chord accounting done by the redis backend would be made inaccurate and chords would never complete. This change alters how options are overridden for signatures to ensure that if a `group_id` has already been set, it cannot be overridden. Since group ID should be generally opaque to users, this should not be disruptive.
This change amends the implementation of `chord.__length_hint__()` to ensure that all child task types are correctly counted. Specifically: * all sub-tasks of a group are counted recursively * the final task of a chain is counted recursively * the body of a chord is counted recursively * all other simple signatures count as a single "final element" There is also a deserialisation step if a `dict` is seen while counting the final elements in a chord, however this should become less important with the merge of celery#6342 which ensures that tasks are recursively deserialized by `.from_dict()`.
These tests are intended to show that group unrolling should be respected in various ways by all backends. They should make it more clear what behaviour we should be expecting from nested canvas components and ensure that all the implementations (mostly relevant to chords and `on_chord_part_return()` code) behave sensibly.
This avoids an issue where the `on_chord_part_return()` implementation would check the the length of the result of a chain ending in a nested group. This would manifest in behaviour where a worker would be blocked waiting for for the result object it holds to complete since it would attempt to `.join()` the result object. In situations with plenty of workers, this wouldn't really cause any noticable issue apart from some latency or unpredictable failures - but in concurrency constrained situations like the integrations tests, it causes deadlocks. We know from previous commits in this series that chord completion is more complex than just waiting for a direct child, so we correct the `size` value in `BaseKeyValueStoreBackend.on_chord_part_return()` to respect the `chord_size` value from the request, falling back to the length of the `deps` if that value is missing for some reason (this is necessary to keep a number of the tests happy but it's not clear to me if that will ever be the case in real life situations).
This change fixes the chord result flattening issue which manifested when using the Redis backend due to its deliberate throwing away of information about the header result structure. Rather than assuming that all results which contribute to the finalisation of a chord should be siblings, this change checks if any are complex (ie. `GroupResult`s) and falls back to behaviour similar to that implemented in the `KeyValueStoreBackend` which restores the original `GroupResult` object and `join()`s it. We retain the original behaviour which is billed as an optimisation in f09b041. We could behave better in the complex header result case by not bothering to stash the results of contributing tasks under the `.j` zset since we won't be using them, but without checking for the presence of the complex group result on every `on_chord_part_return()` call, we can't be sure that we won't need those stashed results later on. This would be an opportunity for optimisation in future if we were to use an `EVAL` to only do the `zadd()` if the group result key doesn't exist. However, avoiding the result encoding work in `on_chord_part_return()` would be more complicated. For now, it's not worth the brainpower. This change also slightly refactors the redis backend unit tests to make it easier to build fixtures and hit both the complex and simple result structure cases.
71e40cf
to
8422937
Compare
This new changeset should address the last of the missed coverage 🤞 |
Thanks @auvipy |
This change fixes chord accounting in the redis backend by ensuring that
group IDs are not overwritten by chained callbacks.
Amends the changeset from #5984