Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Content-MD5 header for s3transfer #2221

Merged
merged 2 commits into from Nov 17, 2020
Merged

Fix Content-MD5 header for s3transfer #2221

merged 2 commits into from Nov 17, 2020

Conversation

nateprewitt
Copy link
Contributor

Overview

This is a continuation of #1985 to resolve #1979. We're adding two additional tests to prevent future regression with empty bodies, and adding some more info on the underlying issue.

Bug

While the initial PR was addressing a lack of Content-MD5 headers being added for empty files going to s3, the bug is little more subtle. Currently, using the low level API for PutBucket will ensure all input, regardless of length, is transformed into a file-like object. When this is passed to our MD5 checksum function, it will always be True and get the header added correctly. These always evaluate to True because bool resolves "truthiness" to be if the file-object exists or not.

The bug enters when the higher level s3 APIs are being invoked like s3 sync in the AWS CLI. This replaces our standard file-like wrappers (StringIO, BytesIO) with s3transfer's ReadFileChunk. Due to s3transfer implementing __len__ on what would otherwise appear to be a file-like object, we now get different criteria for what makes it truthy. bool will choose __len__ > 0 as being more specific than if the file exists. That broke our previous assumptions on what would be coming through to conditionally_calculate_md5.

Outcome

We probably should make ReadFileChunk adhere closer to the file-like object spec, but at this point it would be a backwards incompatible change. I think the original proposal from @wimglenn is the safest bet to make things operate as expected.

* Include Content-MD5 header even when body is empty. Closes #1979
@codecov-io
Copy link

codecov-io commented Nov 16, 2020

Codecov Report

Merging #2221 (b263f6e) into develop (60d2bf7) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2221      +/-   ##
===========================================
- Coverage    98.23%   98.23%   -0.01%     
===========================================
  Files           58       58              
  Lines        10852    10848       -4     
===========================================
- Hits         10661    10657       -4     
  Misses         191      191              
Impacted Files Coverage Δ
botocore/handlers.py 96.35% <ø> (-0.04%) ⬇️
botocore/utils.py 97.73% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 60d2bf7...b263f6e. Read the comment docs.

@nateprewitt nateprewitt merged commit dd10b44 into develop Nov 17, 2020
@nateprewitt nateprewitt deleted the empty_md5_fix branch November 17, 2020 00:08
Copy link
Contributor

@zdutta zdutta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Can not PUT a 0 bytes object when Object Lock is enabled on s3 bucket
5 participants