Cannot parse inline stream images, because open() always seeks 0 #7096
Comments
I'd be reluctant to change it, in case there are users relying on being able to read part of an image, and then have Pillow fail to load that same image without the user first seeking backwards. I imagine that you considered reading the rest of the stream, but didn't like it for performance reasons? im = Image.open(BytesIO(file.read())) I've created PR #7097 to update the documentation. |
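For illustration, the "read the rest of the stream" approach from the comment above might look like the minimal sketch below. The helper name rest_of_stream and the toy container bytes are made up for the example; only BytesIO and the read() call come from the thread. The Pillow call is left as a comment so the snippet runs without Pillow installed.

```python
from io import BytesIO


def rest_of_stream(fp):
    """Copy everything from fp's current position to EOF into a new BytesIO.

    Hypothetical helper: the new buffer starts at position 0, so a later
    seek(0) lands on the embedded image rather than on the container header.
    """
    return BytesIO(fp.read())


# Toy container: a 5-byte header followed by the "image" bytes.
container = BytesIO(b"HEAD!" + b"BM-image-bytes")
container.seek(5)                      # parser has reached the embedded image
embedded = rest_of_stream(container)

# With Pillow installed, this buffer could be passed on directly:
#   im = Image.open(embedded)
```

The cost is that everything after the image is copied too, and the original stream ends up at EOF rather than just past the image.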
Not every file seeks to 0 first though: Pillow/src/PIL/Hdf5StubImagePlugin.py Lines 31 to 46 in 70aaa20
So it might actually be good to make that consistent. Here's some of the locations I see that seek to a specific offset: Pillow/src/PIL/BlpImagePlugin.py Lines 285 to 286 in 1321b6e
Pillow/src/PIL/EpsImagePlugin.py Lines 97 to 103 in 1321b6e
Lines 3202 to 3206 in 1321b6e
Line 3225 in 1321b6e
Pillow/src/PIL/ImImagePlugin.py Lines 122 to 129 in 1321b6e
This is when saving a file instead of reading one, but it's basically the same situation. Pillow/src/PIL/MpoImagePlugin.py Line 94 in 1321b6e
Pillow/src/PIL/MpoImagePlugin.py Lines 108 to 109 in 1321b6e
Pillow/src/PIL/MspImagePlugin.py Line 115 in 1321b6e
Pillow/src/PIL/PcdImagePlugin.py Lines 30 to 33 in 1321b6e
I also noticed this one, which is seeking to an offset from the start of the image, but if the image isn't at the start of the file it won't be correct. I think there might be others like this. Line 110 in 1321b6e
|
It isn't the individual plugins that are seeking to zero, it is Lines 3202 to 3225 in aec7a8d
|
Oh yeah, I guess that does happen first. |
You could maybe consider adding a property to It would also require making sure the code doesn't do anything like |
Though, as Yay noted, that would require consistency among the readers: they would also have to not seek to zero, but rather seek to the initial |
That's at least not how Pillow behaves for images with multiple frames, where for the sake of performance, we don't read all of the file data in on |
I understand that you could request that Pillow seeks back to the end of all of the frames within the file object after each Furthermore, some images in the wild end unexpectedly early - they are truncated. Detecting that end relies on there being no more data at the end of the file. While it is conceivable, if complicated, for Pillow to be provided an offset to the start of an image within a file object, your expectations that
aren't in line with the way that Pillow operates. |
I'm guessing that this isn't a useful feature without those other features. |
Yes. I would assume any embedded image is well-formed. I wouldn't expect this to be a sort of wild-form but a complete image inside another file format. But, I don't think well-formed is necessarily a requirement here. Just that no file format does a seek that isn't based off the initial position within the stream, and that we can avoid the initial seek(0) command. The former being the harder of those, it would require restricting all loader code to Alternatively, it may need some code to maybe figure out the type and the size of the image at the seek location. Much like it already figures out what type of image, it could figure out the size and maybe just seek and load that amount of bytes and wrap it in a stream (if stream.tell() != 0). But, it seems like a routine to detect the size of the file based on the type of the file (much like Pillow does with the filetype itself) could maybe settle the issue usefully. Though documenting it does settle the issue at a minimum, and that's now done. |
Adding offset to the API will lead to significant overcomplicating on many layers. Have you tried wrappers around files objects like this one? |
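The wrapper linked in the comment above is not reproduced in this thread, so the following is only a sketch of the general idea under stated assumptions: a hypothetical OffsetFile class (not part of Pillow) that reports positions relative to where the embedded image starts, so that seek(0) returns to the image rather than to the start of the container. It implements only read, seek and tell; a loader that relies on other file-object methods would need more.

```python
import io


class OffsetFile:
    """Hypothetical wrapper that presents fp as if it began at `offset`."""

    def __init__(self, fp, offset=None):
        self._fp = fp
        # Default to the current position, i.e. where the image starts.
        self._base = fp.tell() if offset is None else offset
        self._fp.seek(self._base)

    def read(self, size=-1):
        return self._fp.read(size)

    def tell(self):
        # Position relative to the start of the embedded image.
        return self._fp.tell() - self._base

    def seek(self, pos, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            if pos < 0:
                raise ValueError("negative seek position")
            self._fp.seek(self._base + pos)
        else:
            # SEEK_CUR / SEEK_END are passed through to the underlying
            # stream; SEEK_END is therefore relative to the end of the
            # whole container, since the image length is unknown here.
            self._fp.seek(pos, whence)
            if self._fp.tell() < self._base:
                self._fp.seek(self._base)  # never move before the image
        return self.tell()


raw = io.BytesIO(b"xxxxxHELLOWORLD")
raw.seek(5)
wrapped = OffsetFile(raw)
wrapped.read(5)
wrapped.seek(0)  # back to the start of the embedded data, not of raw
```

Because the wrapper does not know where the image ends, it does not address the truncation and end-of-image concerns raised elsewhere in this thread.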
I'm saying that's not an assumption that Pillow can make. Some users seek support for images, even though they are malformed. See #4370 for instance.
Pillow supports, to some extent, reading truncated images. If an image is truncated, then that means that it ends unexpectedly early, and so any information we might use to infer when the image ends would be wrong. Alternatively, it is conceivable that images might just report their size incorrectly, like in #5164. I'm reluctant to add a new method to guess the file size because it may be wrong, and because for most use cases the image size in bytes seems obvious, as it is the file size. If we were to start reading images relative to the current position, I don't see how that would be useful to your situation without either the ability to predict the seek position relative to the image when reading is done (which isn't always at the end of the image), or to know the image size in bytes.
Because of this, and because I don't think any changes to Pillow's behaviour are viable for reasons listed here and previously, if #7097 is merged to update the documentation, this will be closed. If you still have further thoughts, feel free to post them. |
homm's suggestion is actually really clever. I solved my own case by working out how large the .bmp image needs to be and copying those bytes into another buffer. There's actually a somewhat reasonable chance that wrapping it into an offset-position file object where There was always a chance that an embedded image file could have had a non-obvious size, and I would previously have been out of luck. It's certainly not 100%, but I assume embedded images should be well-formed and complete. And there's a reasonable expectation that some embedded images (including the one that led to this issue) could be handled by simply duck-typing an offset file stream. Though malformed files would always fail, and I'm not sure there's any guarantee that the files end at the end of the stream (since the data can be interspersed). |
I am parsing a file format with an inline bitmap. I reached the point at the start of the bitmap and passed the stream to Image.open(). It does a seek(0), which by default is stream.seek(0,0), which went to the start of the stream, which is not where the bitmap is located.
I expected that I could pass PIL the opened stream at the start of the file and the stream would be located at the very end of the file. To do this I suspect that:
Would have been used, and the size of the file would be determined by Pillow knowing the file types.
Worse yet, attempting to use this as an inline reader had the stream seek to the beginning of my reader, issued a false positive error on the read, and said it couldn't parse the image type. After some work on the image it became a bit obvious that it absolutely could parse a super-common 24-bit bitmap, so that message was in error and debugging found this issue.
Without inline stream reading, I needed to read the prefix myself, figure out the size, get that chunk of bytes, wrap those bytes in another BytesIO() and submit that. My workaround seems pretty clumsy looking. And if this was an arbitrary inline image, I might need a lot of different methods to determine the size.
At a minimum, document the fact that streams passed as the fp object must have the image located at the beginning of the stream (regardless of where the image is located within the stream).
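The workaround described above (read the prefix, work out the size, copy that chunk into a BytesIO) could look roughly like this for the BMP case. This is a sketch, not the code actually used in the report: extract_bmp is a hypothetical helper, and it relies on the 14-byte BMP file header storing the total file size as a little-endian 32-bit integer at byte offset 2. It trusts that size field, so a file that misreports its size would be cut wrongly.

```python
import struct
from io import BytesIO


def extract_bmp(fp):
    """Read one BMP starting at fp's current position into a new BytesIO.

    Hypothetical helper. Leaves fp positioned just past the bitmap.
    """
    header = fp.read(14)  # the BMP file header is 14 bytes long
    if len(header) < 14 or header[:2] != b"BM":
        raise ValueError("no BMP signature at the current position")
    (size,) = struct.unpack_from("<I", header, 2)  # total size in bytes
    if size < 14:
        raise ValueError("implausible BMP size field")
    body = fp.read(size - 14)
    if len(body) < size - 14:
        raise ValueError("stream ended before the declared BMP size")
    return BytesIO(header + body)


# With Pillow installed, usage would be along the lines of:
#   stream.seek(bitmap_start)          # bitmap_start: known from the outer format
#   im = Image.open(extract_bmp(stream))
```

Other formats would each need their own way of determining the size, which is the clumsiness the report points out.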