Image.convert() on TIFF image creates OSError -2 #5673

Closed
drj-snoco opened this issue Aug 12, 2021 · 9 comments
Labels
Anaconda Issues with Anaconda's Pillow TIFF

drj-snoco commented Aug 12, 2021

What did you do?
I am moving documents from an aging document management system (HighView) into OpenText Content Server for a local government. The documents are stored in HighView as single page TIFF images, mostly RGB images saved with JPEG ("old style" compression). I am using Python with Pillow to:

1. Gather all TIFF pages for a single document into a single page.
2. Convert all document pages into TIFF with JPEG (but a more modern form).
3. Append the converted pages to a list, and save the list to a PDF document.
4. Import the PDF into OpenText Content Services.

What did you expect to happen?
I expected to be able to open the TIFF files, and convert them into a single multi-page PDF file.
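
The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the poster's actual script: `pages_to_pdf` is a hypothetical helper, and the in-memory demo pages stand in for the real TIFF files. It relies on Pillow's PDF writer accepting `save_all=True` together with `append_images`.

```python
from io import BytesIO

from PIL import Image


def pages_to_pdf(pages, target):
    """Write a list of PIL images to `target` as one multi-page PDF.

    `pages_to_pdf` is a hypothetical helper name used for illustration.
    """
    if not pages:
        raise ValueError("no pages to write")
    # Convert every page to RGB so all pages share a mode the PDF writer accepts.
    rgb_pages = [page.convert("RGB") for page in pages]
    first, rest = rgb_pages[0], rgb_pages[1:]
    # save_all plus append_images writes the remaining pages after the first.
    first.save(target, format="PDF", save_all=True, append_images=rest)


# Stand-in pages; a real run would use Image.open(...) on each TIFF page.
demo_pages = [Image.new("RGB", (64, 64), color) for color in ("white", "gray", "black")]
buffer = BytesIO()
pages_to_pdf(demo_pages, buffer)
pdf_bytes = buffer.getvalue()
```

`target` can be a file path or any writable binary file object, so the same helper works for writing straight to disk.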

What actually happened?
For the most part, all has happened as expected. I converted 11,000+ documents to PDF, the longest of them over 1100 pages long. However, a few document pages give Pillow (and, to be fair, many other software packages) fits. I can open these image files using the HighView document viewer, Snagit (from Techsmith), and IrfanView. I can open them using Pillow's Image.open() method, but the instant I try to convert them Pillow throws errors.

What are your OS, Python and Pillow versions?
OS: Windows 10 20H2
Python: 3.8.10, Anaconda
Pillow 8.3.1

Sample code

>>> from PIL import Image
>>> filename = '5eef56'
>>> i=Image.open(filename,formats=(['TIFF']))
>>> im=i.convert('RGB')

results in following error stack:

Traceback (most recent call last):
  File "<pyshell#7>", line 1, in <module>
    im=i.convert('RGB')
  File "C:\ProgramData\Anaconda3\envs\assessorfb\lib\site-packages\PIL\Image.py", line 915, in convert
    self.load()
  File "C:\ProgramData\Anaconda3\envs\assessorfb\lib\site-packages\PIL\TiffImagePlugin.py", line 1122, in load
    return self._load_libtiff()
  File "C:\ProgramData\Anaconda3\envs\assessorfb\lib\site-packages\PIL\TiffImagePlugin.py", line 1226, in _load_libtiff
    raise OSError(err)
OSError: -2

this code:

>>> from PIL import Image
>>> filename = '5eef56'
>>> i=Image.open(filename,formats=(['TIFF']))
>>> i.save('c:\\test\\test.pdf',format='PDF')

results in the following error stack (sample files attached as imagetest.zip):

Traceback (most recent call last):
  File "<pyshell#9>", line 1, in <module>
    i.save('c:\\test\\test.pdf',format='PDF')
  File "C:\programdata\anaconda3\envs\AssessorFB\lib\site-packages\PIL\Image.py", line 2201, in save
    self._ensure_mutable()
  File "C:\programdata\anaconda3\envs\AssessorFB\lib\site-packages\PIL\Image.py", line 624, in _ensure_mutable
    self._copy()
  File "C:\programdata\anaconda3\envs\AssessorFB\lib\site-packages\PIL\Image.py", line 617, in _copy
    self.load()
  File "C:\programdata\anaconda3\envs\AssessorFB\lib\site-packages\PIL\TiffImagePlugin.py", line 1122, in load
    return self._load_libtiff()
  File "C:\programdata\anaconda3\envs\AssessorFB\lib\site-packages\PIL\TiffImagePlugin.py", line 1226, in _load_libtiff
    raise OSError(err)
OSError: -2
>>>

Additional Notes
I've attached two TIFF files to the issue. One is an original "problem" file (5eef56, no extension); the other is a copy opened and re-saved using Snagit (5eef56_snagit.tif), so that you can easily view the stored image.

These are "Live" files from the office, but they are public domain, and contain no identifying information. I can supply other image samples if desired.

Any ideas would be greatly appreciated. These "problem" files seem to have been created during a quality control rescan in late 2008. They're fairly rare (less than 0.1% of the current sample), but I'd much rather handle them in my code than have to deal with them manually.

@radarhere radarhere added the TIFF label Aug 12, 2021
@kmilos
Contributor

kmilos commented Aug 13, 2021

@drj-snoco
Author

I can set up a conda environment using conda-forge, sure. I'll respond back shortly.

@drj-snoco
Author

drj-snoco commented Aug 13, 2021

Here's the result, after setting up a Pillow environment using conda-forge:

>>> from PIL import Image
>>> filename = '5eef56'
>>> i=Image.open(filename,formats=(["TIFF"]))
>>> im=i.convert('RGB')

Got the following error stack:

Traceback (most recent call last):
  File "<pyshell#5>", line 1, in <module>
    im=i.convert('RGB')
  File "C:\ProgramData\Anaconda3\envs\pillowforge\lib\site-packages\PIL\Image.py", line 915, in convert
    self.load()
  File "C:\ProgramData\Anaconda3\envs\pillowforge\lib\site-packages\PIL\TiffImagePlugin.py", line 1122, in load
    return self._load_libtiff()
  File "C:\ProgramData\Anaconda3\envs\pillowforge\lib\site-packages\PIL\TiffImagePlugin.py", line 1226, in _load_libtiff
    raise OSError(err)
OSError: -2

With an additional error showing up on the command line, after closing IDLE:

>>>JPEGLib: Not a JPEG file: starts with 0xda 0xfd.

Another of the "Problem" TIFF images errors out with the same error stack, although the JPEGLib error is slightly different:

>>>JPEGLib: Not a JPEG file: starts with 0xf1 0x1f.

Unfortunately, I haven't been able to track down the exact software package used to produce these specific images.

@cgohlke
Contributor

cgohlke commented Aug 18, 2021

This is certainly not a valid TIFF file. As a workaround, the image data can be read as follows (if this fails, the file could be scanned for a JPEG stream):

from io import BytesIO
from PIL import Image, ImageFile

img = Image.open('5eef56', formats=['TIFF'])
offset = img.tag_v2[513]  # JPEGInterchangeFormat
bytecount = img.tag_v2[514]  # JPEGInterchangeFormatLength
img.close()

with open('5eef56', 'rb') as fh:
    fh.seek(offset)
    jpeg = fh.read(bytecount)
    
ImageFile.LOAD_TRUNCATED_IMAGES = True
img = Image.open(BytesIO(jpeg), formats=['JPEG'])
print(img)
img.show()
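
The fallback mentioned above, scanning the file for a JPEG stream, could look something like the sketch below. It is a heuristic, not part of Pillow: `find_jpeg_stream` is a hypothetical helper that assumes the file holds a single stream, starting at the first SOI marker (0xFF 0xD8, followed by the 0xFF of the next marker) and ending at the last EOI marker (0xFF 0xD9). Old-style JPEG-in-TIFF files do not always store a complete stream, so the result may still need `LOAD_TRUNCATED_IMAGES`, or may not decode at all.

```python
SOI = b"\xff\xd8\xff"  # start-of-image marker plus the lead byte of the next marker
EOI = b"\xff\xd9"      # end-of-image marker


def find_jpeg_stream(data):
    """Return the bytes of the first apparent JPEG stream in `data`, or None.

    Hypothetical helper: a heuristic scan, not a JPEG parser.
    """
    start = data.find(SOI)
    if start == -1:
        return None
    end = data.rfind(EOI)
    if end == -1 or end < start:
        # No end marker after the start: return the truncated remainder.
        return data[start:]
    return data[start:end + len(EOI)]


# Synthetic demo bytes: TIFF-like header, an embedded stream, trailing data.
blob = b"II*\x00junk" + b"\xff\xd8\xff\xe0abc\xff\xd9" + b"tail"
stream = find_jpeg_stream(blob)
```

The returned bytes can then be passed to `Image.open(BytesIO(stream), formats=['JPEG'])`, as in the workaround above.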

@radarhere
Member

> This is certainly not a valid TIFF file.

Could you explain this a bit further?

@cgohlke
Contributor

cgohlke commented Aug 23, 2021

> Could you explain this a bit further?

TileWidth and TileLength are not a multiple of 16. TileOffsets and TileByteCounts do not encode a valid JPEG stream. Compression is JPEG (7) but the file uses OJPEG (6, old style JPEG, invalidated by TIFF TechNote 2).
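
To check these tags on other suspect files without decoding any image data, the first IFD can be read directly with the standard library. The sketch below is an illustration under simplifying assumptions: `read_inline_tags` is a hypothetical helper, it handles only classic (non-BigTIFF) files, and it returns only tags that hold a single SHORT or LONG value inline. The demo bytes are synthetic, not taken from the attached file.

```python
import struct

TAG_NAMES = {
    259: "Compression",
    322: "TileWidth",
    323: "TileLength",
    513: "JPEGInterchangeFormat",
    514: "JPEGInterchangeFormatLength",
}


def read_inline_tags(data):
    """Return {tag_id: value} for single-valued SHORT/LONG tags in the first IFD."""
    order = {b"II": "<", b"MM": ">"}.get(bytes(data[:2]))
    if order is None:
        raise ValueError("missing TIFF byte-order mark")
    magic, ifd_offset = struct.unpack_from(order + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    (count,) = struct.unpack_from(order + "H", data, ifd_offset)
    tags = {}
    for i in range(count):
        entry = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, field_type, n = struct.unpack_from(order + "HHI", data, entry)
        if n != 1:
            continue  # multi-valued tags are skipped in this sketch
        if field_type == 3:    # SHORT: value sits in the first 2 bytes of the value field
            (value,) = struct.unpack_from(order + "H", data, entry + 8)
        elif field_type == 4:  # LONG: value fills the 4-byte value field
            (value,) = struct.unpack_from(order + "I", data, entry + 8)
        else:
            continue
        tags[tag] = value
    return tags


def _entry(tag, field_type, value):
    """Pack one little-endian IFD entry for the synthetic demo."""
    fmt = "<HHIH2x" if field_type == 3 else "<HHII"
    return struct.pack(fmt, tag, field_type, 1, value)


# Synthetic little-endian TIFF header with three tags (no image data).
demo = (b"II" + struct.pack("<HI", 42, 8) + struct.pack("<H", 3)
        + _entry(259, 3, 7) + _entry(322, 3, 200) + _entry(513, 4, 1024)
        + struct.pack("<I", 0))
demo_tags = read_inline_tags(demo)

for tag in (322, 323):
    if tag in demo_tags and demo_tags[tag] % 16:
        print(TAG_NAMES[tag], demo_tags[tag], "is not a multiple of 16")
```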

@drj-snoco
Author

drj-snoco commented Sep 2, 2021

It appears that the files were likely created by our Scan Center using Kodak Capture Pro (now Alaris Capture Pro), back in 2008. There are several thousand such files, usually embedded in a set of otherwise manageable TIFF pages. As I mentioned earlier, IrfanView can open these, but I really don't want to tell staff to manually convert them to a more readable format if they can be handled in an automated manner.

Any advice for handling these in Pillow, or is this something that would require a lower level tool? Would modifying the TIFF tags be enough?

@cgohlke
Contributor

cgohlke commented Sep 2, 2021

Try changing the compression tag to OJPEG (the following script patches the file!) and then process them in Pillow. Or maybe there is a way to change the compression tag value in Pillow before decoding the image data?

import tifffile

with tifffile.TiffFile('5eef56', mode='r+b') as tif:
    tif.pages[0].tags['Compression'].overwrite(6)
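
If patching the original file in place is undesirable, the same tag change can be applied to a copy of the bytes in memory. This is a sketch using only the standard library, not tifffile's or Pillow's API: `with_compression` is a hypothetical helper that assumes a classic TIFF whose first IFD stores Compression as a single SHORT. Whether Pillow then decodes the patched bytes (for example via `Image.open(BytesIO(patched))`) was not tested here.

```python
import struct

COMPRESSION_TAG = 259
OJPEG = 6  # old-style JPEG compression value


def with_compression(data, new_value=OJPEG):
    """Return a copy of TIFF bytes with the first IFD's Compression tag rewritten.

    Hypothetical helper; the input bytes are left untouched.
    """
    order = {b"II": "<", b"MM": ">"}.get(bytes(data[:2]))
    if order is None:
        raise ValueError("missing TIFF byte-order mark")
    magic, ifd_offset = struct.unpack_from(order + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    patched = bytearray(data)
    (count,) = struct.unpack_from(order + "H", patched, ifd_offset)
    for i in range(count):
        entry = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, field_type, n = struct.unpack_from(order + "HHI", patched, entry)
        if tag == COMPRESSION_TAG:
            if field_type != 3 or n != 1:
                raise ValueError("unexpected Compression tag layout")
            # The SHORT value occupies the first 2 bytes of the value field.
            struct.pack_into(order + "H", patched, entry + 8, new_value)
            return bytes(patched)
    raise ValueError("no Compression tag in first IFD")


# Synthetic header with a single Compression=7 entry (no image data).
demo = (b"II" + struct.pack("<HI", 42, 8) + struct.pack("<H", 1)
        + struct.pack("<HHIH2x", COMPRESSION_TAG, 3, 1, 7)
        + struct.pack("<I", 0))
patched_demo = with_compression(demo)
```

For a real file, the input would come from `open(path, 'rb').read()`, which leaves the file on disk unchanged.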

@drj-snoco
Author

drj-snoco commented Sep 7, 2021

Thanks, using tifffile to change the compression tag worked. Unfortunately, I didn't see an obvious way to edit the TIFF compression tag values in Pillow, at least without processing the image first. If there is a way to do such a thing, I'd be glad to hear about it.

>>> import tifffile
>>> with tifffile.TiffFile('5eef56', mode='r+b') as tif:
...     tif.pages[0].tags['Compression'].overwrite(6)

(Note: both tifffile and Pillow still had some issues with the fixed TIFF image from the above code. I couldn't open it directly using tifffile.imread(), or do much processing in Pillow. I could open it in Pillow and save it as a PDF, but not as a TIFF file, at least not directly.)

However, converting the updated file to a Numpy array worked:

>>> import numpy as np
>>> from PIL import Image
>>> 
>>> img=Image.open('5eef56',formats=['TIFF']) 

>>> arr=np.array(img) #This statement failed with the original file, with the mismatched compression tag.
>>> img2=Image.fromarray(arr)

At this point, I have a normal Pillow image, and can do with it as I like.

Thanks very much for your help; it's greatly appreciated!

@aclark4life aclark4life added the Anaconda Issues with Anaconda's Pillow label May 19, 2023