Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: raw delete expired instead of Queryset.delete #235

Merged
merged 12 commits into from Oct 11, 2021

Conversation

daadu
Copy link
Contributor

@daadu daadu commented Sep 27, 2021

This is avoid unncessary loading of objects in-memory in certain conditions.

From Django docs:

The delete() method does a bulk delete and does not call any delete() methods on your models. It does, however, emit the pre_delete and post_delete signals for all deleted objects (including cascaded deletions).

Django needs to fetch objects into memory to send signals and handle cascades. However, if there are no cascades and no signals, then Django may take a fast-path and delete objects without fetching into memory. For large deletes this can result in significantly reduced memory usage. The amount of executed queries can be reduced, too.

Closes #232

This is avoid unncessary loading of objects in-memory in certain conditions
@daadu
Copy link
Contributor Author

daadu commented Sep 27, 2021

There is already a test that checks if the delete_expired method is functioning properly - so in case of breaking change in "protected Django method" - we will be able to catch it (Hopefully!)

@daadu
Copy link
Contributor Author

daadu commented Sep 27, 2021

@auvipy - Requesting review

@auvipy
Copy link
Member

auvipy commented Sep 27, 2021

thanks for the pr

Copy link
Member

@auvipy auvipy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still want to see some new unit tests

@daadu
Copy link
Contributor Author

daadu commented Sep 27, 2021

@auvipy unit test against raw_delete added as suggested.

@@ -88,7 +88,7 @@ def get_all_expired(self, expires):
def delete_expired(self, expires):
"""Delete all expired results."""
with transaction.atomic():
self.get_all_expired(expires).delete()
raw_delete(queryset=self.get_all_expired(expires))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another thing i want to be assured that running makemigrations detect any change in migrations

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added similar test for django-silk two days back, where i simply called management command makemigrations silk --check --dry-run - if there are pending migrations then it returns with exit code 1.

Should we implement the same?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that would be great!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already this test which does that.

def test_models_match_migrations(self):
"""Make sure that no model changes exist.
This logic is taken from django's makemigrations.py file.
Here just detect if model changes exist that require
a migration, and if so we fail.
"""

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we replace this with call_command thing i mentioned above? One advantage is that it is a higher level Django API - changes in Django internal will not affect the test - until only the command behaviour is changed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which part do you want to replace with what? can you elaborate please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running makemigrations django_celery_results --check --dry-run command inside the test as mentioned above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made those changes. Please review.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commit - 113ac2e

@@ -88,7 +88,7 @@ def get_all_expired(self, expires):
def delete_expired(self, expires):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests for this method needs to be updated as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what kind of update? the current test looks sufficient to me.

…calling internal logic of `makemigrations` to check if no pending migrations
@daadu daadu requested a review from auvipy September 28, 2021 07:06
@auvipy
Copy link
Member

auvipy commented Sep 28, 2021

can you check why CI is failing please

@daadu
Copy link
Contributor Author

daadu commented Sep 28, 2021

@auvipy Looks like the CI is stuck.

@auvipy
Copy link
Member

auvipy commented Oct 11, 2021

CI is running green :D

@auvipy
Copy link
Member

auvipy commented Oct 11, 2021

going to merge this for now. future improvements are always welcome

@auvipy auvipy merged commit 7c7cb67 into celery:master Oct 11, 2021
@daadu daadu deleted the raw-delete-cleanup branch October 12, 2021 06:12
@ForthTurn
Copy link

I have a django model with a ForeignKey to TaskResult, and we set the on_delete to models.CASCADE. Maybe there is a problem after this change because the _raw_delete will not cause django's CASCADE delete?

@auvipy
Copy link
Member

auvipy commented Jan 13, 2023

can you please share more detail?

@ForthTurn
Copy link

I have a django model named MyTaskResult, here is the code:

class MyTaskResult(models.Model):
    result = models.OneToOneField(
        TaskResult,
        on_delete=models.CASCADE,
        verbose_name='TaskResult',
    )
    # other fields

and it works well when django-celery-results at 2.2.0.
But if update to 2.3.0 which the version include this mr, when I triggered TaskResult.objects.delete_expired():

from datetime import timedelta
from django_celery_results.models import TaskResult

t = timedelta(minutes=1)
TaskResult.objects.delete_expired(t)

I got this error:

django.db.utils.IntegrityError: (1451, 'Cannot delete or update a parent row: a foreign key constraint fails (..., CONSTRAINT `..._result_id_bbca4d3d_fk_django_ce` FOREIGN KEY (`result_id`) REFERENCES `django_celery_results_taskresult` (`id`))')"

I think the builtin task celery.backend_cleanup will also trigger this method and raise this exception.

In 2.2.0, delete_expired()calls self.get_all_expired(expires).delete(), and it will make django to delete the MyTaskResult before delete TaskResult.

@auvipy
Copy link
Member

auvipy commented Jan 16, 2023

can you help to partially revert this manually? or provide a bettter mechanism??

@ForthTurn
Copy link

can you help to partially revert this manually? or provide a bettter mechanism??

Maybe can add a config key to switch? I saw it in here, but not in this mr

@ForthTurn
Copy link

And I think the default behavior should be to use queryset's delete instead of raw_delete. Queryset.delete can be guaranteed to be correct. If the users can confirm they don't have a cascade ForeignKey or listen to the delete signal, they can switch the config to reduce the usage of memory.

@auvipy
Copy link
Member

auvipy commented Jan 17, 2023

And I think the default behavior should be to use queryset's delete instead of raw_delete. Queryset.delete can be guaranteed to be correct. If the users can confirm they don't have a cascade ForeignKey or listen to the delete signal, they can switch the config to reduce the usage of memory.

I will be happy to accept a PR with partial revert of the PR[raw-delete part]

@ForthTurn
Copy link

I will be happy to accept a PR with partial revert of the PR[raw-delete part]

I will submit this at a later date :)

hansegucker added a commit to hansegucker/django-celery-results that referenced this pull request Apr 22, 2023
auvipy added a commit that referenced this pull request Apr 28, 2023
…#384)

* Revert "feat: raw delete expired instead of `Queryset.delete` (#235)"

This partially reverts commit 7c7cb67.

* Fix tests for Django 4.2

---------

Co-authored-by: Asif Saif Uddin <auvipy@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DatabaseBackend.cleanup causes OOM kill for large expired result
3 participants