use Response.ContentLength & clarify log messages #49
I spent some time wondering why checksum calculation takes a while until I realized what was happening.
The first time we call `io.Copy()`, it reads the bytes from `http.Response.Body` and writes them into the SHA-256 `hash.Hash` and a `bytes.Buffer` (via `TeeReader`) for later copying into the destination file (if the checksum matches). Now, `http.Response.Body` does not buffer the bytes internally (which is a very sensible design decision); instead it streams them on demand over the network. If the network is slow, this may therefore take a while. https://pkg.go.dev/net/http#Response
I originally intended the relevant code to avoid buffering, which it does, but only in cases where checksum verification is skipped, which arguably should be a minority of cases.
I am not sure we can both avoid buffering and avoid downloading the bytes twice (before and after verification), so in retrospect this may have been an over-optimization. For code readability, would it be beneficial to just always write into a buffer and pass it around? I definitely want to avoid writing the bytes into a file on the filesystem before having a chance to verify the checksum, but keeping them in a buffer seems harmless.
I'm open to any other thoughts/suggestions.