
Nested groups and chains raise ChordCounter.DoesNotExist #194

Closed
JannKleen opened this issue Apr 9, 2021 · 11 comments · Fixed by #195

Comments

@JannKleen

I get this error with django-celery-results==2.0.1 and Celery==5.0.5. I haven't tested the dev branches yet.
Similar to #177, the following exception is raised, but in this case it seems to be caused by the nesting of groups and chains.

Exception:

django_celery_results.models.ChordCounter.DoesNotExist: ChordCounter matching query does not exist.

This fails:

            transaction.on_commit(
                chain(
                    task_1,
                    group(
                        task_2,
                        chain(
                            task_3,
                            group(task_4, task_5),
                        ),
                    ),
                    task_6,
                ).delay
            )

This works:

            transaction.on_commit(
                chain(
                    task_1,
                    group(
                        task_2,
                        task_3,
                    ),
                    group(task_4, task_5),
                    task_6,
                ).delay
            )
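
For reference, the task_N names above stand in for plain task signatures; a minimal set of no-op tasks for reproducing this (task names and bodies here are placeholders, not taken from the original report) could look like:

    from celery import shared_task

    @shared_task
    def noop(label):
        # no-op task used only to exercise the canvas
        return None

    task_1 = noop.si("task_1")
    task_2 = noop.si("task_2")
    task_3 = noop.si("task_3")
    task_4 = noop.si("task_4")
    task_5 = noop.si("task_5")
    task_6 = noop.si("task_6")
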
@auvipy
Member

auvipy commented Apr 11, 2021

This should be fixed in master: 3f74b82. I will release soon.

@AllexVeldman
Contributor

3f74b82 will no longer raise the error, but if there is a bug in nested chords, a callback might not be executed.
@JannKleen do you know at the end of which task the exception is raised?

I'll do some testing to see if I can recreate this and if all tasks are executed correctly.

@AllexVeldman
Contributor

AllexVeldman commented Apr 16, 2021

@auvipy my suspicion was right; this is the result when run with the master branch:

[2021-04-16 07:08:00,893] [MainProcess] [INFO] [celery.worker.strategy] Received task: chainstock.taskqueue.chordcounter_tasks.task_1[4d356049-60c2-4014-9c70-81cfb2bee88a]  
[2021-04-16 07:08:00,912] [ForkPoolWorker-1] [INFO] [chainstock.taskqueue.chordcounter_tasks] Task 1
[2021-04-16 07:08:00,939] [ForkPoolWorker-1] [INFO] [celery.app.trace] Task chainstock.taskqueue.chordcounter_tasks.task_1[4d356049-60c2-4014-9c70-81cfb2bee88a] succeeded in 0.042876192999999674s: None
[2021-04-16 07:08:00,948] [MainProcess] [INFO] [celery.worker.strategy] Received task: chainstock.taskqueue.chordcounter_tasks.task_2[0c712ed2-a6eb-4905-8b41-55fc38d0bac2]  
[2021-04-16 07:08:06,071] [ForkPoolWorker-2] [INFO] [chainstock.taskqueue.chordcounter_tasks] Task 2
[2021-04-16 07:08:06,085] [ForkPoolWorker-2] [INFO] [celery.app.trace] Task chainstock.taskqueue.chordcounter_tasks.task_2[0c712ed2-a6eb-4905-8b41-55fc38d0bac2] succeeded in 0.04307537600000089s: None
[2021-04-16 07:08:06,973] [MainProcess] [INFO] [celery.worker.strategy] Received task: chainstock.taskqueue.chordcounter_tasks.task_3[14125c56-6853-414b-896d-cf2bc3f8184f]  
[2021-04-16 07:08:06,994] [ForkPoolWorker-2] [INFO] [chainstock.taskqueue.chordcounter_tasks] Task 3
[2021-04-16 07:08:07,034] [ForkPoolWorker-2] [INFO] [celery.app.trace] Task chainstock.taskqueue.chordcounter_tasks.task_3[14125c56-6853-414b-896d-cf2bc3f8184f] succeeded in 0.056360951999998576s: None
[2021-04-16 07:08:07,142] [MainProcess] [INFO] [celery.worker.strategy] Received task: chainstock.taskqueue.chordcounter_tasks.task_4[d0b9f6ff-81c3-484b-9257-26ee523f42a9]  
[2021-04-16 07:08:07,299] [ForkPoolWorker-3] [INFO] [chainstock.taskqueue.chordcounter_tasks] Task 4
[2021-04-16 07:08:07,321] [ForkPoolWorker-3] [INFO] [celery.app.trace] Task chainstock.taskqueue.chordcounter_tasks.task_4[d0b9f6ff-81c3-484b-9257-26ee523f42a9] succeeded in 0.05146732000000043s: None
[2021-04-16 07:08:08,536] [MainProcess] [INFO] [celery.worker.strategy] Received task: chainstock.taskqueue.chordcounter_tasks.task_5[0735d763-a08d-4f0c-b445-9032a510db1b]  
[2021-04-16 07:08:08,558] [ForkPoolWorker-3] [INFO] [chainstock.taskqueue.chordcounter_tasks] Task 5
[2021-04-16 07:08:08,567] [ForkPoolWorker-3] [WARNING] [django_celery_results.backends.database] Can't find ChordCounter for Group 2b7b2eba-f843-4422-85e5-908bf262b951
[2021-04-16 07:08:08,569] [ForkPoolWorker-3] [INFO] [celery.app.trace] Task chainstock.taskqueue.chordcounter_tasks.task_5[0735d763-a08d-4f0c-b445-9032a510db1b] succeeded in 0.028897945000000647s: None

With some additional logs for sequences:

[2021-04-16 07:40:19,976] [MainProcess] [INFO] [celery.worker.strategy] Received task: chainstock.taskqueue.chordcounter_tasks.task_1[833f0af2-b9d9-4717-8bb4-1d199096cae6]  
[2021-04-16 07:40:19,995] [ForkPoolWorker-1] [INFO] [chainstock.taskqueue.chordcounter_tasks] Task 1
[2021-04-16 07:40:19,998] [ForkPoolWorker-1] [INFO] [celery.canvas] apply_chord - Group a60ecd37-fc82-45b2-8198-c05d8a5a9903
[2021-04-16 07:40:20,062] [ForkPoolWorker-1] [INFO] [celery.app.trace] Task chainstock.taskqueue.chordcounter_tasks.task_1[833f0af2-b9d9-4717-8bb4-1d199096cae6] succeeded in 0.08224789199999982s: None

[2021-04-16 07:40:20,168] [MainProcess] [INFO] [celery.worker.strategy] Received task: chainstock.taskqueue.chordcounter_tasks.task_2[ea42a000-0c5d-4174-93d3-9de62f1715e4]  
[2021-04-16 07:40:20,471] [ForkPoolWorker-2] [INFO] [chainstock.taskqueue.chordcounter_tasks] Task 2
[2021-04-16 07:40:20,479] [ForkPoolWorker-2] [INFO] [celery.backends.base] on_chord_part_return - Group a60ecd37-fc82-45b2-8198-c05d8a5a9903
[2021-04-16 07:40:20,491] [ForkPoolWorker-2] [INFO] [celery.app.trace] Task chainstock.taskqueue.chordcounter_tasks.task_2[ea42a000-0c5d-4174-93d3-9de62f1715e4] succeeded in 0.05539991599999894s: None

[2021-04-16 07:40:20,854] [MainProcess] [INFO] [celery.worker.strategy] Received task: chainstock.taskqueue.chordcounter_tasks.task_3[f44567b4-2a34-4b73-9533-473dbfd70d55]  
[2021-04-16 07:40:20,881] [ForkPoolWorker-2] [INFO] [chainstock.taskqueue.chordcounter_tasks] Task 3
[2021-04-16 07:40:20,939] [ForkPoolWorker-2] [INFO] [celery.app.trace] Task chainstock.taskqueue.chordcounter_tasks.task_3[f44567b4-2a34-4b73-9533-473dbfd70d55] succeeded in 0.07884279700000008s: None

[2021-04-16 07:40:21,049] [MainProcess] [INFO] [celery.worker.strategy] Received task: chainstock.taskqueue.chordcounter_tasks.task_4[ded031ba-7682-4b21-9b71-f8f12d1da4d4]  
[2021-04-16 07:40:21,346] [ForkPoolWorker-3] [INFO] [chainstock.taskqueue.chordcounter_tasks] Task 4
[2021-04-16 07:40:21,354] [ForkPoolWorker-3] [INFO] [celery.backends.base] on_chord_part_return - Group a60ecd37-fc82-45b2-8198-c05d8a5a9903
[2021-04-16 07:40:21,364] [ForkPoolWorker-3] [INFO] [django_celery_results.backends.database] Trigger Callback, Delete Group a60ecd37-fc82-45b2-8198-c05d8a5a9903
[2021-04-16 07:40:21,378] [ForkPoolWorker-3] [INFO] [django_celery_results.backends.database] Group not ready - Group a60ecd37-fc82-45b2-8198-c05d8a5a9903
[2021-04-16 07:40:21,379] [ForkPoolWorker-3] [INFO] [celery.app.trace] Task chainstock.taskqueue.chordcounter_tasks.task_4[ded031ba-7682-4b21-9b71-f8f12d1da4d4] succeeded in 0.07478742400000016s: None

[2021-04-16 07:40:21,961] [MainProcess] [INFO] [celery.worker.strategy] Received task: chainstock.taskqueue.chordcounter_tasks.task_5[8f788188-b387-438a-8ce0-43b23d7c72de]  
[2021-04-16 07:40:21,987] [ForkPoolWorker-3] [INFO] [chainstock.taskqueue.chordcounter_tasks] Task 5
[2021-04-16 07:40:21,995] [ForkPoolWorker-3] [INFO] [celery.backends.base] on_chord_part_return - Group a60ecd37-fc82-45b2-8198-c05d8a5a9903
[2021-04-16 07:40:25,817] [ForkPoolWorker-3] [WARNING] [django_celery_results.backends.database] Can't find ChordCounter for Group a60ecd37-fc82-45b2-8198-c05d8a5a9903
[2021-04-16 07:40:25,820] [ForkPoolWorker-3] [INFO] [celery.app.trace] Task chainstock.taskqueue.chordcounter_tasks.task_5[8f788188-b387-438a-8ce0-43b23d7c72de] succeeded in 3.8528664799999994s: None

So the issue is in the original length of the top-level group: the callback is triggered after task_4 instead of task_5.
That, together with the GroupResult.ready() check (which is meant to prevent failed headers from triggering the callback), causes a silent drop of the callback task.
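
To illustrate with a standalone sketch (not the actual backend code, and the tuple encoding below is made up): the counter is seeded from the number of direct children of the top-level group, but the number of tasks that actually call on_chord_part_return is the number of leaf results, because only the last element of a nested chain reports back and a nested group reports once per leaf:

    def completions(node):
        # number of tasks that will call on_chord_part_return for this header node
        kind, children = node
        if kind == "task":
            return 1
        if kind == "chain":
            # only the last element of a chain reports back to the chord
            return completions(children[-1])
        if kind == "group":
            return sum(completions(child) for child in children)

    header = ("group", [
        ("task", "task_2"),
        ("chain", [
            ("task", "task_3"),
            ("group", [("task", "task_4"), ("task", "task_5")]),
        ]),
    ])

    print(len(header[1]))       # 2 -> what the counter is seeded with (fires after task_4)
    print(completions(header))  # 3 -> task_2, task_4 and task_5 actually report back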

I'll see if the Redis backend handles this case correctly.

@AllexVeldman
Contributor

AllexVeldman commented Apr 16, 2021

Redis log:

[2021-04-16 07:38:39,708] [MainProcess] [INFO] [celery.worker.strategy] Received task: chainstock.taskqueue.chordcounter_tasks.task_1[eef2f666-d477-4da0-996e-2016b3d9471e]  
[2021-04-16 07:38:39,717] [ForkPoolWorker-1] [INFO] [chainstock.taskqueue.chordcounter_tasks] Task 1
[2021-04-16 07:38:39,720] [ForkPoolWorker-1] [INFO] [celery.canvas] apply_chord - Group a11da2d8-ee1a-4a2d-bf7b-b86e9912fd33
[2021-04-16 07:38:39,757] [ForkPoolWorker-1] [INFO] [celery.app.trace] Task chainstock.taskqueue.chordcounter_tasks.task_1[eef2f666-d477-4da0-996e-2016b3d9471e] succeeded in 0.044746176000000304s: None

[2021-04-16 07:38:39,868] [MainProcess] [INFO] [celery.worker.strategy] Received task: chainstock.taskqueue.chordcounter_tasks.task_2[65bd0346-92a9-435d-8ca3-4101c3936a02]  
[2021-04-16 07:38:40,144] [ForkPoolWorker-2] [INFO] [chainstock.taskqueue.chordcounter_tasks] Task 2
[2021-04-16 07:38:40,148] [ForkPoolWorker-2] [INFO] [celery.backends.base] on_chord_part_return - Group a11da2d8-ee1a-4a2d-bf7b-b86e9912fd33
[2021-04-16 07:38:40,151] [ForkPoolWorker-2] [INFO] [celery.app.trace] Task chainstock.taskqueue.chordcounter_tasks.task_2[65bd0346-92a9-435d-8ca3-4101c3936a02] succeeded in 0.020867630999999776s: None

[2021-04-16 07:38:40,571] [MainProcess] [INFO] [celery.worker.strategy] Received task: chainstock.taskqueue.chordcounter_tasks.task_3[0ba9b40b-f34e-405e-9d62-8f5905fafdf6]  
[2021-04-16 07:38:40,582] [ForkPoolWorker-2] [INFO] [chainstock.taskqueue.chordcounter_tasks] Task 3
[2021-04-16 07:38:40,638] [ForkPoolWorker-2] [INFO] [celery.app.trace] Task chainstock.taskqueue.chordcounter_tasks.task_3[0ba9b40b-f34e-405e-9d62-8f5905fafdf6] succeeded in 0.06057350299999875s: None

[2021-04-16 07:38:40,745] [MainProcess] [INFO] [celery.worker.strategy] Received task: chainstock.taskqueue.chordcounter_tasks.task_4[012ae855-14f9-4774-ada1-bef1cd94c86a]  
[2021-04-16 07:38:41,012] [ForkPoolWorker-3] [INFO] [chainstock.taskqueue.chordcounter_tasks] Task 4
[2021-04-16 07:38:41,016] [ForkPoolWorker-3] [INFO] [celery.backends.base] on_chord_part_return - Group a11da2d8-ee1a-4a2d-bf7b-b86e9912fd33
[2021-04-16 07:38:41,019] [ForkPoolWorker-3] [INFO] [celery.app.trace] Task chainstock.taskqueue.chordcounter_tasks.task_4[012ae855-14f9-4774-ada1-bef1cd94c86a] succeeded in 0.02017579199999986s: None

[2021-04-16 07:38:41,677] [MainProcess] [INFO] [celery.worker.strategy] Received task: chainstock.taskqueue.chordcounter_tasks.task_5[de530510-892d-444e-bc6c-ac99914f944c]  
[2021-04-16 07:38:41,686] [ForkPoolWorker-3] [INFO] [chainstock.taskqueue.chordcounter_tasks] Task 5
[2021-04-16 07:38:41,690] [ForkPoolWorker-3] [INFO] [celery.backends.base] on_chord_part_return - Group a11da2d8-ee1a-4a2d-bf7b-b86e9912fd33
[2021-04-16 07:38:42,256] [ForkPoolWorker-3] [INFO] [celery.app.trace] Task chainstock.taskqueue.chordcounter_tasks.task_5[de530510-892d-444e-bc6c-ac99914f944c] succeeded in 0.5735485610000026s: None

[2021-04-16 07:38:42,367] [MainProcess] [INFO] [celery.worker.strategy] Received task: chainstock.taskqueue.chordcounter_tasks.task_6[152f7054-13f7-42e1-853a-12f3b99a017a]  
[2021-04-16 07:38:42,641] [ForkPoolWorker-4] [INFO] [chainstock.taskqueue.chordcounter_tasks] Task 6
[2021-04-16 07:38:42,642] [ForkPoolWorker-4] [INFO] [chainstock.taskqueue.chordcounter_tasks] ([None, [None, None]],)
[2021-04-16 07:38:42,646] [ForkPoolWorker-4] [INFO] [celery.app.trace] Task chainstock.taskqueue.chordcounter_tasks.task_6[152f7054-13f7-42e1-853a-12f3b99a017a] succeeded in 0.018303156999998293s: None

@AllexVeldman
Contributor

This was fixed in the redis backend with celery/celery#6354
I'll see if I can extract the useful parts.

Possibly related for the future: celery/celery#6576
This seems to have broken the same thing for celery master:
celery/celery#6721

It also introduced a set_chord_size that we might need to implement so we don't break the generator-type behaviour.
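
For context, set_chord_size(group_id, chord_size) lets the caller hand the backend a precomputed size instead of the backend deriving it from the header. A rough sketch of supporting it here (the ChordCounter fields used below are assumptions, not the actual model):

    from django_celery_results.models import ChordCounter

    class DatabaseBackendSketch:
        # Sketch only, not the real implementation.
        def set_chord_size(self, group_id, chord_size):
            # store the size computed by the caller so on_chord_part_return can
            # count down from it, even when the header was passed as a generator
            ChordCounter.objects.update_or_create(
                group_id=str(group_id),
                defaults={"count": chord_size},
            )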

@auvipy
Member

auvipy commented Apr 16, 2021

Allex, can you dedicate some time to contributing upstream to Celery?

@AllexVeldman
Contributor

@auvipy I don't have a huge amount of time available; let me see what I can dedicate.

This issue of calculating the chord size does seem to be a more general one spanning multiple backends.
What exactly would you like me to look into?

@AllexVeldman
Contributor

> This was fixed in the redis backend with celery/celery#6354
> I'll see if I can extract the useful parts.

The Redis implementation relies on GroupResult.save()/.restore(), so without #40 I won't be able to replicate it here.
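
For reference, this is the round trip the Redis fix relies on (a sketch; some_task is a placeholder for any registered task):

    from celery import group
    from celery.result import GroupResult

    # persist the header's GroupResult so on_chord_part_return can later
    # restore it by group id -- this is the part that needs backend support (#40)
    result = group(some_task.s(), some_task.s()).apply_async()
    result.save()
    restored = GroupResult.restore(result.id)
    assert restored.id == result.id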

@AllexVeldman
Contributor

Actually, I might have just overlooked something in the original implementation.
The chord size is available on the Signature as body["chord_size"] in apply_chord().
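
Roughly the idea (a sketch only; the actual apply_chord signature and ChordCounter fields may differ):

    import json

    from django_celery_results.models import ChordCounter

    def apply_chord(self, header_result, body, **kwargs):
        # prefer the precomputed chord size from the body Signature so nested
        # groups/chains in the header are counted correctly, falling back to
        # the number of direct children
        count = body.get("chord_size") or len(header_result)
        ChordCounter.objects.create(
            group_id=header_result.id,
            sub_tasks=json.dumps([r.as_tuple() for r in header_result]),
            count=count,
        )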

My serialisation of the GroupResult is already sufficient for triggering the callback.

@auvipy how can I add a test like this to this project? I did not find any tests in this project that actually delay tasks.

@JannKleen
Author

@auvipy We finally got around to upgrading to the latest version, but we're still running into this issue.

Our current pipeline looks like this:

            transaction.on_commit(
                group(
                    task_0,
                    chain(
                        task_1,
                        group(
                            task_2,
                            chain(
                                task_3,
                                task_4,
                                task_5,
                                task_6,
                            ),
                        ),
                        task_7,
                    ),
                ).delay
            )

With the added complexity that task_2 calls self.replace to replace itself with a group of n (normally 2-8) tasks.
I'm not quite sure how to debug this, but if you want me to test anything, just tell me what you'd like to see.
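
For completeness, the replacement in task_2 looks roughly like this (task and subtask names here are placeholders, not our real code):

    from celery import group, shared_task

    @shared_task
    def subtask(i):
        return i

    @shared_task(bind=True)
    def task_2(self, *args):
        # replace this task with a dynamically sized group (normally 2-8 subtasks)
        return self.replace(group(subtask.s(i) for i in range(4)))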

AllexVeldman added commits to AllexVeldman/django-celery-results that referenced this issue Feb 2, 2022
(I did not reproduce it yet..)
@AllexVeldman
Contributor

AllexVeldman commented Feb 2, 2022

@JannKleen odd, I've tried to reproduce the setup you mention but was not able to get an error or have any tasks go missing:
https://github.com/AllexVeldman/django-celery-results/runs/5038842196?check_suite_focus=true

It's set up so you can run the same thing locally using docker-compose for reproducibility.

I also tried making the replacement tasks for task_2 take longer, to see if it was a race condition somewhere in the ChordCounter, but that only resulted in having to wait longer for task_7 to trigger.

Would it be possible for you to check AllexVeldman#1 and see if the task setup is the same as what you expect?
