Image#size and Content-Length header mismatch, intermittent #546

Closed
taylorthurlow opened this issue Sep 20, 2022 · 3 comments

taylorthurlow commented Sep 20, 2022

MiniMagick 4.11.0:

I'm having an intermittent issue where:

  • The Content-Length header of an image URL reports the correct size in bytes of the image
  • MiniMagick sometimes reads the image from the same URI and returns a value for #size that does not match the header's value

My working theory is that this mismatch is what causes MiniMagick to write corrupted images.

For example:

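require "faraday"
require "mini_magick"
require "uri"

image_uri = URI("https://via.placeholder.com/350x150.jpg")
# Size the server advertises via HEAD; MiniMagick then downloads the image itself.
expected_image_size_bytes = Faraday.head(image_uri).headers["content-length"].to_i
image = MiniMagick::Image.open(image_uri)

raise "Expected size and actual size mismatch" if image.size != expected_image_size_bytes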

This occurs rarely but consistently, and is not specific to any one image host. Every instance so far has been resolved by re-running the job that executes this code, after which things work as expected. The size reported by the Content-Length header is always the "correct" size of the image.

What's even stranger is that, in every case so far, the value returned from #size is larger than the Content-Length header value, up to nearly double the size.

I don't want to blame MiniMagick here, but it happens with both IM and GM backends, latest IM 6.x and GM versions. My only theory is some problem with the underlying call to IO.copy_stream which is somehow wigging out in the middle of the stream copy, only to rewind and start copying the file from the start again.
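One way to test the restart theory is to compare a known-good copy of the image with one of the oversized files. The following is a minimal sketch, not something from MiniMagick: `restarted_copy?` is a hypothetical helper, and it assumes both byte strings are available (for example via `File.binread`). It checks whether the oversized data is a partial copy of the image followed by a complete second copy.

```ruby
# Hypothetical diagnostic: does `actual` look like the first N bytes of
# `expected` followed by a complete copy of `expected`?
def restarted_copy?(expected, actual)
  extra = actual.bytesize - expected.bytesize
  # Only consider files that are larger, and at most double the size.
  return false unless extra.positive? && extra <= expected.bytesize

  # The leading `extra` bytes must be a prefix of the real image,
  # and everything after them must be the whole image.
  actual.byteslice(0, extra) == expected.byteslice(0, extra) &&
    actual.byteslice(extra, expected.bytesize) == expected
end
```

If this returns true for a failing file, the extra bytes are consistent with an aborted first attempt followed by a full retry; if it returns false, the corruption has some other shape.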

Let me know if any of this sounds plausible, I'm continuing to investigate but this one is hard to pin down.

@taylorthurlow taylorthurlow changed the title Image#size and Content-Length header mismatch Image#size and Content-Length header mismatch, intermittent Sep 20, 2022

taylorthurlow commented Sep 20, 2022

Further testing confirms that the Content-Length from the initial HEAD request matches the Content-Length of a subsequent GET request whose body contains the full image. The Image#size result still occasionally fails to match the header value. In that test I made a separate GET request to read the header and let MiniMagick make its own request for the same image (given the same input URI). The next change is to pass the body from the explicit GET request to Image.read and wait for another size mismatch.
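That next step could look roughly like the sketch below, which uses Net::HTTP from the standard library for a single GET and checks the body length before MiniMagick sees it. `content_length_mismatch` and `fetch_verified` are hypothetical helper names, and the MiniMagick call is left as a comment because it needs the gem and network access.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: returns nil when the body length agrees with the
# Content-Length header, otherwise a short description of the mismatch.
def content_length_mismatch(content_length_header, body)
  return nil if content_length_header.nil? # header absent, nothing to compare

  expected = Integer(content_length_header, 10)
  actual = body.bytesize
  return nil if expected == actual

  "expected #{expected} bytes, got #{actual}"
end

# Hypothetical helper: one GET, verified before anything touches disk.
def fetch_verified(uri)
  response = Net::HTTP.get_response(uri)
  raise "HTTP #{response.code} for #{uri}" unless response.is_a?(Net::HTTPSuccess)

  mismatch = content_length_mismatch(response["content-length"], response.body)
  raise "Body does not match Content-Length: #{mismatch}" if mismatch

  response.body
end

# Usage (requires the mini_magick gem and network access):
#   require "mini_magick"
#   body  = fetch_verified(URI("https://via.placeholder.com/350x150.jpg"))
#   image = MiniMagick::Image.read(body, ".jpg")
```

Because the header and the body come from the same response, a mismatch in Image#size after this point would have to be introduced while the bytes are written to the tempfile rather than during the download.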

Member

janko commented Dec 7, 2022

It's likely this is a bug in open-uri, which MiniMagick uses to download images. The issue is most certainly not in IO.copy_stream, because that doesn't touch the network.

Have you tried downloading images with a different HTTP library?

Member

janko commented Jun 7, 2024

I think the issue is that the image wasn't fully downloaded to disk, so I'm closing this for now.
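For anyone hitting the same symptom, one way to rule the disk write in or out is to write the downloaded bytes yourself and confirm the on-disk size before handing the file to MiniMagick. This is only a sketch under that assumption; `write_verified_tempfile` is a hypothetical helper, not a MiniMagick API.

```ruby
require "tempfile"

# Hypothetical helper: writes `body` to a tempfile and confirms the on-disk
# size matches before returning the Tempfile.
def write_verified_tempfile(body, ext)
  file = Tempfile.new(["download", ext])
  file.binmode # image data is binary, avoid any newline translation
  file.write(body)
  file.flush # push buffered bytes to the OS so File.size sees them

  on_disk = File.size(file.path)
  unless on_disk == body.bytesize
    file.close! # close and delete the bad file
    raise "Wrote #{on_disk} bytes, expected #{body.bytesize}"
  end

  file
end

# Usage (requires the mini_magick gem):
#   file  = write_verified_tempfile(body, ".jpg")
#   image = MiniMagick::Image.open(file.path)
```

Keep a reference to the returned Tempfile for as long as the image is in use, since Ruby may delete the file once the object is garbage collected.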

@janko janko closed this as completed Jun 7, 2024