Image.convert() on TIFF image creates OSError -2 #5673
I can set up a conda environment using conda-forge, sure. I'll respond back shortly. |
Here's the result, after setting up a Pillow environment using conda-forge:
>>> from PIL import Image
>>> filename = '5eef56'
>>> i = Image.open(filename, formats=["TIFF"])
>>> im = i.convert('RGB')
Got the following error stack:
With an additional error showing up on the command line, after closing IDLE:
Another of the "Problem" TIFF images errors out with the same error stack, although the JPEGLib error is slightly different:
Unfortunately, I haven't been able to track down the exact software package used to produce these specific images. |
This is certainly not a valid TIFF file. As a workaround, the image data can be read as follows (if this fails, the file could be scanned for a JPEG stream):
from io import BytesIO
from PIL import Image, ImageFile
img = Image.open('5eef56', formats=['TIFF'])
offset = img.tag_v2[513]  # JPEGInterchangeFormat
bytecount = img.tag_v2[514]  # JPEGInterchangeFormatLength
img.close()
with open('5eef56', 'rb') as fh:
    fh.seek(offset)
    jpeg = fh.read(bytecount)
ImageFile.LOAD_TRUNCATED_IMAGES = True
img = Image.open(BytesIO(jpeg), formats=['JPEG'])
print(img)
img.show() |
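The fallback mentioned above, scanning the file for a JPEG stream, could look roughly like the sketch below. It is only an illustration: `find_jpeg_stream` is a hypothetical helper name, and it assumes the file contains one complete JPEG interchange stream (start-of-image marker through end-of-image marker). Old-style JPEG TIFFs do not always store one, so this may find nothing or return an unusable fragment.

```python
from typing import Optional

SOI = b"\xff\xd8\xff"  # JPEG start-of-image marker plus the 0xFF of the next marker
EOI = b"\xff\xd9"      # JPEG end-of-image marker


def find_jpeg_stream(data: bytes) -> Optional[bytes]:
    """Return the bytes from the first SOI marker to the last EOI marker, or None."""
    start = data.find(SOI)
    if start == -1:
        return None
    end = data.rfind(EOI)
    if end < start:
        # No end marker after the start: return the tail and let the
        # decoder cope with the truncation.
        return data[start:]
    return data[start:end + len(EOI)]
```

The returned bytes can be passed to `Image.open(BytesIO(jpeg), formats=['JPEG'])` in the same way as in the workaround.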
Could you explain this a bit further? |
TileWidth and TileLength are not a multiple of 16. TileOffsets and TileByteCounts do not encode a valid JPEG stream. Compression is JPEG (7) but the file uses OJPEG (6, old style JPEG, invalidated by TIFF TechNote 2). |
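To find such files in a batch, two of the checks above can be sketched as a small tag inspection. The function names `diagnose` and `diagnose_file` are hypothetical, and treating "Compression is 7 while the old-style tag 513 is present" as a sign of a mislabelled OJPEG file is a heuristic assumed here, not something Pillow does itself. Only the tag numbers come from the TIFF specification.

```python
from typing import Dict, List

# Standard TIFF tag numbers
COMPRESSION = 259
TILE_WIDTH = 322
TILE_LENGTH = 323
JPEG_INTERCHANGE_FORMAT = 513  # old-style JPEG (OJPEG) field

JPEG = 7  # new-style JPEG compression value


def diagnose(tags: Dict[int, int]) -> List[str]:
    """Return a list of human-readable problems found in a tag mapping."""
    problems = []
    if tags.get(COMPRESSION) == JPEG and JPEG_INTERCHANGE_FORMAT in tags:
        problems.append("Compression is JPEG (7) but OJPEG tag 513 is present")
    for name, tag in (("TileWidth", TILE_WIDTH), ("TileLength", TILE_LENGTH)):
        value = tags.get(tag)
        if value is not None and value % 16 != 0:
            problems.append(f"{name}={value} is not a multiple of 16")
    return problems


def diagnose_file(path: str) -> List[str]:
    """Read the relevant tags with Pillow and run the checks on them."""
    from PIL import Image  # imported here so diagnose() works without Pillow

    wanted = (COMPRESSION, TILE_WIDTH, TILE_LENGTH, JPEG_INTERCHANGE_FORMAT)
    with Image.open(path, formats=["TIFF"]) as img:
        return diagnose({t: img.tag_v2[t] for t in wanted if t in img.tag_v2})
```

An empty list means none of these particular checks fired; it does not prove the file is valid.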
It appears that the files were likely created by our Scan Center using Kodak Capture Pro (now Alaris Capture Pro), back in 2008. There are several thousand such files, usually embedded in a set of otherwise manageable TIFF pages. As I mentioned earlier, IrfanView can open these, but I really don't want to tell staff to manually convert them to a more readable format if they can be handled in an automated manner. Any advice for handling these in Pillow, or is this something that would require a lower level tool? Would modifying the TIFF tags be enough? |
Try changing the compression tag to OJPEG (the following script patches the file!) and then process them in Pillow. Or maybe there is a way to change the compression tag value in Pillow before decoding the image data?
import tifffile
with tifffile.TiffFile('5eef56', mode='r+b') as tif:
    tif.pages[0].tags['Compression'].overwrite(6) |
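If adding tifffile as a dependency is not an option, the same one-field patch can probably be done with the standard library alone. The sketch below is a minimal illustration under stated assumptions: `patch_compression` is a hypothetical helper, it handles only classic (non-BigTIFF) files, and it looks only at the first IFD, which matches the single-page files described in this thread.

```python
import struct

COMPRESSION_TAG = 259
SHORT = 3  # TIFF field type for 16-bit unsigned integers


def patch_compression(data: bytearray, new_value: int = 6) -> bool:
    """Overwrite the Compression tag in the first IFD, in place.

    Returns True if the tag was found and patched, False otherwise.
    """
    order = {b"II": "<", b"MM": ">"}.get(bytes(data[:2]))
    if order is None:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack_from(order + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled)")
    (n_entries,) = struct.unpack_from(order + "H", data, ifd_offset)
    for i in range(n_entries):
        entry = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, ftype, count = struct.unpack_from(order + "HHI", data, entry)
        if tag == COMPRESSION_TAG and ftype == SHORT and count == 1:
            # A single SHORT is stored in the first two bytes of the value field.
            struct.pack_into(order + "H", data, entry + 8, new_value)
            return True
    return False
```

In practice the file would be read into a `bytearray`, patched, and written to a copy, so that the original scan stays untouched.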
Thanks, using tifffile to change the compression tag worked. Unfortunately, I didn't see an obvious way to edit the TIFF compression tag values in Pillow, at least without processing the image first. If there is a way to do such a thing, I'd be glad to hear about it.
>>> import tifffile
>>> with tifffile.TiffFile('5eef56', mode='r+b') as tif:
...     tif.pages[0].tags["Compression"].overwrite(6)
(Note: both tifffile and Pillow still had some issues with the fixed TIFF image from the above code. I couldn't open it directly using tifffile.imread(), or do much processing in Pillow. I could open it in Pillow and save it as a PDF, but not as a TIFF file, at least not directly. However, converting the updated file to a NumPy array worked.)
>>> import numpy as np
>>> from PIL import Image
>>> img = Image.open('5eef56', formats=['TIFF'])
>>> arr = np.array(img)  # This statement failed with the original file, with the mismatched compression tag.
>>> img2 = Image.fromarray(arr)
At this point, I have a normal Pillow image, and can do with it as I like. Thanks very much for your help; it's greatly appreciated! |
What did you do?
I am moving documents from an aging document management system (HighView) into OpenText Content Server for a local government. The documents are stored in HighView as single-page TIFF images, mostly RGB images saved with "old-style" JPEG compression. I am using Python with Pillow to:
Gather all TIFF pages for a single document into a single page.
Convert all document pages into TIFF with JPEG (but a more modern form)
Append the converted pages to a list, and save the list to a PDF document.
Import the PDF into OpenText Content Services.
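The steps above can be sketched roughly as follows. This is not the code from the report, which is not shown; `pages_to_pdf` is a hypothetical name, and the sketch skips the intermediate re-encoding to TIFF and converts each page straight to RGB before writing the PDF with Pillow's `save_all` and `append_images` options.

```python
from typing import Iterable


def pages_to_pdf(page_paths: Iterable[str], pdf_path: str) -> int:
    """Convert single-page TIFF files to RGB and save them as one PDF.

    Returns the number of pages written.
    """
    paths = list(page_paths)
    if not paths:
        raise ValueError("no pages to write")

    from PIL import Image  # imported here so the check above needs no Pillow

    pages = []
    for path in paths:
        with Image.open(path, formats=["TIFF"]) as img:
            # convert() loads the pixel data, so the file can be closed after.
            pages.append(img.convert("RGB"))
    first, rest = pages[0], pages[1:]
    first.save(pdf_path, "PDF", save_all=True, append_images=rest)
    return len(pages)
```

For the problem files discussed in this thread, the `convert()` call is the point where the error is raised, so a real pipeline would need to catch that and fall back to one of the workarounds.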
What did you expect to happen?
I expected to be able to open the TIFF files, and convert them into a single multi-page PDF file.
What actually happened?
For the most part, all has happened as expected. I converted 11,000+ documents to PDF, the longest of them over 1100 pages long. However, a few document pages give Pillow (and, to be fair, many other software packages) fits. I can open these image files using the HighView document viewer, Snagit (from Techsmith), and IrfanView. I can open them using Pillow's Image.open() method, but the instant I try to convert them Pillow throws errors.
What are your OS, Python and Pillow versions?
OS: Windows 10 20H2
Python: 3.8.10, Anaconda
Pillow 8.3.1
Sample code
results in following error stack:
this code:
results in the following error stack:
Attached file: imagetest.zip
Additional Notes
I've attached two TIFF files to the issue. One is an original "problem" file (5eef56, no extension); the other is a copy opened and re-saved using Snagit (5eef56_snagit.tif), so that you can easily view the stored image.
These are "Live" files from the office, but they are public domain, and contain no identifying information. I can supply other image samples if desired.
Any ideas would be greatly appreciated. These "problem" files seem to have been created during a quality control rescan in late 2008. They're fairly rare (less than 0.1% of the current sample), but I'd much rather handle them in my code than have to deal with them manually.