Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When copying Model-Instances as per django documentation, adjusting either deletes the File for the other #98

Closed
Grosskopf opened this issue Jan 20, 2023 · 4 comments · Fixed by #95

Comments

@Grosskopf
Copy link

Hi,
The Django documentation mentions this method for copying Model-Instance: https://docs.djangoproject.com/en/4.0/topics/db/queries/#copying-model-instances as per default Django behaviour this results in a non-unique filename. If one of them is afterwards adjusted, #51 takes place and The file is deleted from the other copy. Could you please mention in your documentation/enforce in some way Uniqueness in FileFields so that Files with the same name are not deleted as easily? :)

@Grosskopf
Copy link
Author

I'm currently considering adding this pre_save signal:

def pre_save_file_field_containing_model(
        sender: Type[Model], instance: Model, **kwargs):
    """
    Handle the pre-save signal of a Model with a FileField.
    These Models should all check, if the FileField they contain contains
    a non-unique FileField and enforce uniqueness.
    """
    if instance.pk is not None:
        return
    file_fields = list(get_fields_for_model(get_model_name(sender)))
    for file_field in file_fields:
        if getattr(instance, file_field):
            other_exist = sender.objects.filter(
                **{file_field: getattr(instance, file_field)}).exists()
            if other_exist:
                file = getattr(instance, file_field)
                new_path = file.storage.get_available_name(file.path)
                shutil.copyfile(file.path, new_path)
                new_path = path.relpath(new_path, file.storage.location)
                setattr(instance, file_field, new_path)

Propably it's not the best way to write this in a long shot but it gets the job done, enforcing uniqueness in future copy operations

@vinnyrose
Copy link
Collaborator

This comes up occasionally, past issues mention this #85, #51, maybe others I can't find. I can update documentation to call out that we don't support this out of the box and provide recommendations.

@vinnyrose vinnyrose mentioned this issue Jan 28, 2023
@Grosskopf
Copy link
Author

Thank you :)

A specific mention of "unique" or that this happens when copying instances would have been awesome to stop this in future from happening but a bit of text on this topic allready helps a lot :)

For anyone finding this Issue later down the line because they duplicated a model as per django documentation and now noticed that a file is missing:

What we did after figuring this out

We ran this code to find out how many hundreds of files were affected across the years:

from django_cleanup import cache
from django.db.models import Count, Q
import os
import shutil
count_not_found = 0
filenames = []
for model in cache.cleanup_models():
    model_name = cache.get_model_name(model)
    file_fields = list(cache.get_fields_for_model(model_name))
    for file_field in file_fields:
        instances = list(model.objects.all().exclude(Q(**{file_field:''})|Q(**{file_field:None})))
        for instance in instances:
            file = getattr(instance, file_field)
            if not os.path.exists(file.path):
                count_not_found += 1
                print(f'{model_name} | {file_field}  | {str(instance.pk)} | {file.path}')

We attached the above mentioned Signal to one of our apps, that helps stop this from happening.

carefull on this one, this can result in more file-loss

We regenerated our few hundred files from years of backups and after trying it on local testing copies of the live system because the warnings are shown after the file is already gone (renamed) ran this code to get the few hundred duplicate files unique:

from django_cleanup import cache
from django.db.models import Count, Q
import os
import shutil
count = 0
count_files = 0
filenames = []
for model in cache.cleanup_models():
    model_name = cache.get_model_name(model)
    file_fields = list(cache.get_fields_for_model(model_name))
    for file_field in file_fields:
        files = model.objects.values(file_field).annotate(count=Count('id')).order_by(file_field).filter(count__gt=1).exclude(Q(**{file_field:''})|Q(**{file_field:None}))
        for file in files:
            filename = file.get(file_field)
            instances = list(model.objects.filter(**{file_field: filename}))
            first_file = getattr(instances[0], file_field)
            if filename in filenames:
                print('warning: found file twice')
                print(filename)
                print(instances)
            else:
                filenames.append(filename)
            if not os.path.exists(first_file.path):
                continue
            print(filename)
            count_files += 1
            save_objects = True
            for instance in instances:
                file_field_instance = getattr(instance, file_field)
                storage = file_field_instance.storage
                current_base_path = file_field_instance
                new_path = storage.get_available_name(file_field_instance.path)
                shutil.copyfile(file_field_instance.path, new_path)
                updated_path = os.path.relpath(new_path, storage.location)
                if len(updated_path) < 100:
                    setattr(instance, file_field, updated_path)
                else:
                    save_objects = False
                count += 1
                print(f'{model_name} | {file_field}  | {str(instance.pk)}')
            if save_objects:
                for instance in instances:
                    instance.save()

@github-actions
Copy link

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 14, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants