Image.convert() on TIFF image creates OSError -2 #5673

drj-snoco · 2021-08-12T22:43:06Z

What did you do?
I am moving documents from an aging document management system (HighView) into OpenText Content Server for a local government. The documents are stored in HighView as single page TIFF images, mostly RGB images saved with JPEG ("old style" compression). I am using Python with Pillow to:

1. Gather all TIFF pages for a single document into a single page.
2. Convert all document pages into TIFF with JPEG (but a more modern form).
3. Append the converted pages to a list, and save the list to a PDF document.
4. Import the PDF into OpenText Content Services.
What did you expect to happen?
I expected to be able to open the TIFF files, and convert them into a single multi-page PDF file.
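
For readers following along, the workflow described above can be sketched roughly as follows. This is a minimal sketch, not the code used in the migration: `pages_to_pdf` and its parameters are hypothetical names, and it assumes every page decodes cleanly, which is exactly what fails for the problem files discussed below.

```python
from PIL import Image


def pages_to_pdf(page_paths, pdf_path):
    """Combine single-page TIFF files into one multi-page PDF (hypothetical helper)."""
    pages = []
    for path in page_paths:
        with Image.open(path, formats=["TIFF"]) as img:
            # convert() forces the pixel data to load and returns a new image,
            # so the result stays usable after the source file is closed.
            pages.append(img.convert("RGB"))
    if not pages:
        raise ValueError("no pages to write")
    # save_all together with append_images writes every page into one PDF.
    pages[0].save(pdf_path, format="PDF", save_all=True, append_images=pages[1:])
    return len(pages)
```

Holding every converted page in memory is fine for short documents; for documents with over a thousand pages, a different batching strategy may be needed.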

What actually happened?
For the most part, all has happened as expected. I converted 11,000+ documents to PDF, the longest of them over 1100 pages long. However, a few document pages give Pillow (and, to be fair, many other software packages) fits. I can open these image files using the HighView document viewer, Snagit (from Techsmith), and IrfanView. I can open them using Pillow's Image.open() method, but the instant I try to convert them Pillow throws errors.

What are your OS, Python and Pillow versions?
OS: Windows 10 20H2
Python: 3.8.10, Anaconda
Pillow 8.3.1

Sample code

>>> from PIL import Image
>>> filename = '5eef56'
>>> i=Image.open(filename,formats=(['TIFF']))
>>> im=i.convert('RGB')

results in following error stack:

Traceback (most recent call last):
  File "<pyshell#7>", line 1, in <module>
    im=i.convert('RGB')
  File "C:\ProgramData\Anaconda3\envs\assessorfb\lib\site-packages\PIL\Image.py", line 915, in convert
    self.load()
  File "C:\ProgramData\Anaconda3\envs\assessorfb\lib\site-packages\PIL\TiffImagePlugin.py", line 1122, in load
    return self._load_libtiff()
  File "C:\ProgramData\Anaconda3\envs\assessorfb\lib\site-packages\PIL\TiffImagePlugin.py", line 1226, in _load_libtiff
    raise OSError(err)
OSError: -2

this code:

>>> from PIL import Image
>>> filename = '5eef56'
>>> i=Image.open(filename,formats=(['TIFF']))
>>> i.save('c:\\test\\test.pdf',format='PDF')

results in the following error stack:

Traceback (most recent call last):
  File "<pyshell#9>", line 1, in <module>
    i.save('c:\\test\\test.pdf',format='PDF')
  File "C:\programdata\anaconda3\envs\AssessorFB\lib\site-packages\PIL\Image.py", line 2201, in save
    self._ensure_mutable()
  File "C:\programdata\anaconda3\envs\AssessorFB\lib\site-packages\PIL\Image.py", line 624, in _ensure_mutable
    self._copy()
  File "C:\programdata\anaconda3\envs\AssessorFB\lib\site-packages\PIL\Image.py", line 617, in _copy
    self.load()
  File "C:\programdata\anaconda3\envs\AssessorFB\lib\site-packages\PIL\TiffImagePlugin.py", line 1122, in load
    return self._load_libtiff()
  File "C:\programdata\anaconda3\envs\AssessorFB\lib\site-packages\PIL\TiffImagePlugin.py", line 1226, in _load_libtiff
    raise OSError(err)
OSError: -2
>>>

Additional Notes
I've attached two TIFF files to the issue (imagetest.zip). One is an original "problem" file (5eef56, no extension), plus a copy opened and re-saved using Snagit (5eef56_snagit.tif), so that you can easily view the stored image.

These are "Live" files from the office, but they are public domain, and contain no identifying information. I can supply other image samples if desired.

Any ideas would be greatly appreciated. These "problem" files seem to have been created during a quality control rescan in late 2008. They're fairly rare (less than 0.1% of the current sample), but I'd much rather handle them in my code than have to deal with them manually.
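
One way to keep a large batch running while setting the rare problem pages aside is to catch the `OSError` per page and collect the failures for a later repair pass. The sketch below is only an illustration: `triage_pages` and `load_page` are hypothetical names, and `load_page` stands in for whatever Pillow-based loader the pipeline actually uses.

```python
def triage_pages(page_paths, load_page):
    """Split pages into those that load and those that raise OSError.

    load_page is any callable that takes a path and returns a loaded page;
    it is a placeholder for the real loader (for example one built on
    Image.open(...).convert("RGB")).
    """
    loaded, failed = [], []
    for path in page_paths:
        try:
            loaded.append((path, load_page(path)))
        except OSError as exc:
            # In this thread the decode failure surfaced as OSError(-2).
            failed.append((path, exc))
    return loaded, failed
```

The `failed` list keeps the exception next to each path, so the problem pages can be logged and retried with a workaround instead of being handled by hand.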


kmilos · 2021-08-13T08:38:05Z

First idea: there is a known issue with default Anaconda packages on Windows, could you try conda-forge instead?

drj-snoco · 2021-08-13T17:45:19Z

I can set up a conda environment using conda-forge, sure. I'll respond back shortly.

drj-snoco · 2021-08-13T18:16:01Z

Here's the result, after setting up a Pillow environment using conda-forge:

>>> from PIL import Image
>>> filename = '5eef56'
>>> i=Image.open(filename,formats=(["TIFF"]))
>>> im=i.convert('RGB')

Got the following error stack:

Traceback (most recent call last):
  File "<pyshell#5>", line 1, in <module>
    im=i.convert('RGB')
  File "C:\ProgramData\Anaconda3\envs\pillowforge\lib\site-packages\PIL\Image.py", line 915, in convert
    self.load()
  File "C:\ProgramData\Anaconda3\envs\pillowforge\lib\site-packages\PIL\TiffImagePlugin.py", line 1122, in load
    return self._load_libtiff()
  File "C:\ProgramData\Anaconda3\envs\pillowforge\lib\site-packages\PIL\TiffImagePlugin.py", line 1226, in _load_libtiff
    raise OSError(err)
OSError: -2

With an additional error showing up on the command line, after closing IDLE:

>>>JPEGLib: Not a JPEG file: starts with 0xda 0xfd.

Another of the "Problem" TIFF images errors out with the same error stack, although the JPEGLib error is slightly different:

>>>JPEGLib: Not a JPEG file: starts with 0xf1 0x1f.

Unfortunately, I haven't been able to track down the exact software package used to produce these specific images.

cgohlke · 2021-08-18T18:32:41Z

This is certainly not a valid TIFF file. As a workaround, the image data can be read as follows (if this fails, the file could be scanned for a JPEG stream):

from io import BytesIO
from PIL import Image, ImageFile

img = Image.open('5eef56', formats=['TIFF'])
offset = img.tag_v2[513]  # JPEGInterchangeFormat
bytecount = img.tag_v2[514]  # JPEGInterchangeFormatLength
img.close()

with open('5eef56', 'rb') as fh:
    fh.seek(offset)
    jpeg = fh.read(bytecount)
    
ImageFile.LOAD_TRUNCATED_IMAGES = True
img = Image.open(BytesIO(jpeg), formats=['JPEG'])
print(img)
img.show()
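
If the offset and length tags are missing or wrong, the "scan for a JPEG stream" fallback mentioned above could look roughly like this. It is a heuristic sketch, not something tested against the sample files: `find_jpeg_stream` is a hypothetical helper that takes the span from the first start-of-image marker to the last end-of-image marker, so embedded thumbnails or stray marker bytes can fool it.

```python
from typing import Optional

SOI = b"\xff\xd8\xff"  # start-of-image marker plus the 0xFF of the next marker
EOI = b"\xff\xd9"      # end-of-image marker


def find_jpeg_stream(data: bytes) -> Optional[bytes]:
    """Return the bytes from the first SOI to the last EOI, or None."""
    start = data.find(SOI)
    if start < 0:
        return None
    end = data.rfind(EOI)
    if end < start:
        # No end marker after the start marker (rfind gives -1 when absent).
        return None
    return data[start:end + len(EOI)]
```

The result, if any, can be handed to `Image.open(BytesIO(stream), formats=['JPEG'])` in the same way as the `jpeg` bytes in the snippet above. Old-style JPEG-in-TIFF files may store tables and scan data separately, in which case no complete stream exists to be found.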

radarhere · 2021-08-23T12:09:31Z

> This is certainly not a valid TIFF file.

Could you explain this a bit further?

cgohlke · 2021-08-23T15:57:36Z

> Could you explain this a bit further?

TileWidth and TileLength are not a multiple of 16. TileOffsets and TileByteCounts do not encode a valid JPEG stream. Compression is JPEG (7) but the file uses OJPEG (6, old style JPEG, invalidated by TIFF TechNote 2).
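
To check these tags on other suspect files without decoding any image data, the first IFD can be read directly with the standard library. The following is a sketch under stated assumptions: it handles classic TIFF only (not BigTIFF), looks only at the first IFD, and reads only single-valued SHORT or LONG entries. `read_inline_tags` and `tile_size_ok` are hypothetical names; the tag numbers (259 Compression, 322 TileWidth, 323 TileLength) are from the TIFF 6.0 specification.

```python
import struct

COMPRESSION, TILE_WIDTH, TILE_LENGTH = 259, 322, 323


def read_inline_tags(data: bytes) -> dict:
    """Return {tag: value} for single SHORT/LONG entries in the first IFD."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("missing TIFF byte-order mark")
    magic, ifd_offset = struct.unpack_from(order + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled)")
    (count,) = struct.unpack_from(order + "H", data, ifd_offset)
    tags = {}
    for i in range(count):
        entry = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, kind, n = struct.unpack_from(order + "HHI", data, entry)
        if n != 1:
            continue  # arrays live elsewhere in the file; skipped here
        if kind == 3:    # SHORT, stored in the first 2 bytes of the value field
            (tags[tag],) = struct.unpack_from(order + "H", data, entry + 8)
        elif kind == 4:  # LONG, fills the 4-byte value field
            (tags[tag],) = struct.unpack_from(order + "I", data, entry + 8)
    return tags


def tile_size_ok(tags: dict) -> bool:
    """True if both tile dimensions (when present) are multiples of 16."""
    return all(tags.get(t, 0) % 16 == 0 for t in (TILE_WIDTH, TILE_LENGTH))
```

A file that reports Compression 7 together with tile dimensions that are not multiples of 16 matches the pattern described above and is a candidate for the workarounds in this thread.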

drj-snoco · 2021-09-02T18:36:13Z

It appears that the files were likely created by our Scan Center using Kodak Capture Pro (now Alaris Capture Pro), back in 2008. There are several thousand such files, usually embedded in a set of otherwise manageable TIFF pages. As I mentioned earlier, IrfanView can open these, but I really don't want to tell staff to manually convert them to a more readable format if they can be handled in an automated manner.

Any advice for handling these in Pillow, or is this something that would require a lower level tool? Would modifying the TIFF tags be enough?

cgohlke · 2021-09-02T19:08:48Z

Try changing the compression tag to OJPEG (the following script patches the file!) and then process them in Pillow. Or maybe there is a way to change the compression tag value in Pillow before decoding the image data?

import tifffile

with tifffile.TiffFile('5eef56', mode='r+b') as tif:
    tif.pages[0].tags['Compression'].overwrite(6)
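
If modifying the original files on disk is undesirable, the same tag change can in principle be applied to an in-memory copy using only the standard library. This is a sketch, not a tested fix for the sample files: `patch_compression` is a hypothetical helper, and it assumes a classic (non-BigTIFF) file whose first IFD holds a single SHORT Compression entry.

```python
import struct

COMPRESSION = 259  # tag number from the TIFF 6.0 specification


def patch_compression(data: bytes, new_value: int = 6) -> bytes:
    """Return a copy of the TIFF bytes with the Compression tag rewritten."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("missing TIFF byte-order mark")
    magic, ifd_offset = struct.unpack_from(order + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled)")
    (count,) = struct.unpack_from(order + "H", data, ifd_offset)
    buf = bytearray(data)  # work on a copy; the input is left untouched
    for i in range(count):
        entry = ifd_offset + 2 + 12 * i
        tag, kind, n = struct.unpack_from(order + "HHI", data, entry)
        if tag == COMPRESSION and kind == 3 and n == 1:
            # A single SHORT value sits in the first 2 bytes of the value field.
            struct.pack_into(order + "H", buf, entry + 8, new_value)
            return bytes(buf)
    raise ValueError("no single-valued SHORT Compression tag in the first IFD")
```

The patched bytes could then be wrapped in `BytesIO` and passed to `Image.open(..., formats=['TIFF'])`. Whether Pillow decodes the result correctly depends on its libtiff build, so this should be verified on a few sample pages first.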

drj-snoco · 2021-09-07T21:31:41Z

Thanks, using tifffile to change the compression tag worked. Unfortunately, I didn't see an obvious way to edit the TIFF compression tag values in Pillow, at least without processing the image first. If there is a way to do such a thing, I'd be glad to hear about it.

>>> import tifffile
>>> with tifffile.TiffFile('5eef56',mode='r+b') as tif:
	    tif.pages[0].tags["Compression"].overwrite(6)

(Note: both tifffile and Pillow still had some issues with the fixed TIFF image from the above code. I couldn't open it directly using tifffile.imread(), or do much processing in Pillow. I could open it in Pillow and save it as a PDF, but not as a TIFF file, at least not directly.)

However, converting the updated file to a Numpy array worked:

>>> import numpy as np
>>> from PIL import Image
>>> 
>>> img=Image.open('5eef56',formats=['TIFF']) 

>>> arr=np.array(img) #This statement failed with the original file, with the mismatched compression tag.
>>> img2=Image.fromarray(arr)

At this point, I have a normal Pillow image, and can do with it as I like.

Thanks very much for your help; it's greatly appreciated!

radarhere added the TIFF label Aug 12, 2021

drj-snoco closed this as completed Sep 8, 2021

aclark4life added the Anaconda label May 19, 2023
