BadDigest errors when uploading non-ascii text files to S3 #1357

Open
ahumeau opened this issue Feb 27, 2024 · 1 comment

ahumeau commented Feb 27, 2024

Hi there,

I encountered the following error while using django-storages' S3 backend:

botocore.exceptions.ClientError: An error occurred (BadDigest) when calling the PutObject operation (reached max retries: 1): The Content-MD5 you specified did not match what we received.

I traced it to uploads of text files containing non-ASCII characters.

After a bit of spelunking in django-storages and boto3, I identified storages.utils.ReadBytesWrapper as the culprit.

The problem is that, while its read method encodes text content as expected, its seek and tell methods are not adapted accordingly. They are delegated to the underlying text file object, which (for the StringIO behind a ContentFile built from a str) counts positions in characters rather than bytes, so their results are inconsistent with what read returns:

```python
>>> from django.core.files.base import ContentFile
>>> from storages.utils import ReadBytesWrapper
>>> file = ReadBytesWrapper(ContentFile("é"))
>>> len(file.read())
2
>>> # Seek to the "end" of the file: this should return 2, since the binary data
>>> # has a length of 2, but it returns 1 because the text data has a length of 1
>>> file.seek(0, 2)
1
>>> # Same result with tell
>>> file.tell()
1
```

boto3 uses seek and tell to determine the length of the content to upload (cf), so it gets an incorrect length and uploads truncated content, which then fails the MD5 checksum check that S3 (thankfully :)) performs.
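The length mismatch described above can be reproduced without Django or S3. The sketch below is illustrative only: `TextToBytesWrapper` is a hypothetical stand-in that mimics the behaviour described for ReadBytesWrapper (encode in read, delegate seek/tell), and `content_length_via_seek_tell` is a simplified, hypothetical version of a seek/tell length probe, not boto3's actual code. It shows that the probed length (1) disagrees with the bytes read (2), and that a body truncated to that length has a different Content-MD5.

```python
import base64
import hashlib
import io


class TextToBytesWrapper:
    """Hypothetical stand-in for the wrapper described in the issue:
    read() encodes text to bytes, but seek()/tell() are delegated
    unchanged to the underlying text stream."""

    def __init__(self, file, encoding="utf-8"):
        self._file = file
        self._encoding = encoding

    def read(self, *args):
        content = self._file.read(*args)
        if isinstance(content, str):
            content = content.encode(self._encoding)
        return content

    def seek(self, *args):
        return self._file.seek(*args)

    def tell(self):
        return self._file.tell()


def content_length_via_seek_tell(body):
    """Simplified length probe (hypothetical): distance from the current
    position to the end of the stream, then restore the position."""
    start = body.tell()
    end = body.seek(0, io.SEEK_END)
    body.seek(start)
    return end - start


def content_md5(data):
    """Base64-encoded MD5 digest, the format used by the Content-MD5 header."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


body = TextToBytesWrapper(io.StringIO("é"))
length = content_length_via_seek_tell(body)  # StringIO positions count characters
payload = body.read()                        # encoded to UTF-8 bytes
sent = payload[:length]                      # what a client trusting `length` would send

print(length, len(payload))                       # 1 2
print(content_md5(payload) == content_md5(sent))  # False
```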

Please find a minimal working example here.

The workaround in our codebase is very simple: encode the text data ourselves instead of delegating that to django-storages. It would obviously be better if this were fixed upstream, though.
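A minimal sketch of the workaround, assuming UTF-8 is the desired encoding: encode the text before handing it over, so the stream is binary from the start and its positions are byte offsets. The length probe is the same simplified, hypothetical seek/tell helper used for illustration, not boto3's code. In a Django project the equivalent would be passing bytes, e.g. `ContentFile(text.encode("utf-8"))`, instead of a str.

```python
import io


def content_length_via_seek_tell(body):
    """Simplified seek/tell length probe (hypothetical, not boto3's code)."""
    start = body.tell()
    end = body.seek(0, io.SEEK_END)
    body.seek(start)
    return end - start


text = "é"

# Workaround: encode the text ourselves so the stream is binary from the start.
encoded_body = io.BytesIO(text.encode("utf-8"))

length = content_length_via_seek_tell(encoded_body)
payload = encoded_body.read()

# Positions are now byte offsets, so the probed length matches what read() returns.
print(length, len(payload))  # 2 2
```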

@GabrielDumbrava

Hi, I had the same issue and I've fixed it by encoding the content to UTF-8. Thanks for the workaround!
