Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong decoding of 15-bit JPEG2000 images #6194

Closed
AleksMat opened this issue Apr 10, 2022 · 8 comments · Fixed by #6197
Closed

Wrong decoding of 15-bit JPEG2000 images #6194

AleksMat opened this issue Apr 10, 2022 · 8 comments · Fixed by #6197

Comments

@AleksMat
Copy link

AleksMat commented Apr 10, 2022

What did you do?

Satellite images from Sentinel-2 satellite, released by European Space Agency, are originally encoded in JPEG2000 format with an unusual 15-bit encoding. Since early versions, Pillow is reading these images incorrectly.

  • For Pillow<9.0.0 the error was that the decoded values were twice as large as they should be. Therefore a simple fix of dividing values by 2 solved the problem. This can be seen from code snippet 2.

  • For Pillow>=9.0.0 the decoded values seem to be completely arbitrary and such a fix doesn't work anymore. This can be seen from code snippet 1.

What did you expect to happen?

It is expected that values would be read correctly, like this is done by OpenCV and rasterio packages (code snippets 3 and 4).

What actually happened?

Pillow reads values incorrectly.

What are your OS, Python and Pillow versions?

  • OS: any OS (used Linux Ubuntu 20.04)
  • Python: any version (used 3.8)
  • Pillow: >=9.0.0 (used 9.1.0)

Code snippets

For all code snippets we use a small 15-bit JP2 image from sentinelhub package unit tests. For simplicity we compare only mean values.

  1. Reading with Pillow>=9.0.0 (used Pillow==9.1.0):

    from PIL import Image
    import numpy as np
    
    image = np.array(Image.open("img-15bit.jp2"))
    print(np.mean(image))
    
    >> 0.33467347487234617
  2. Reading with Pillow<9.0.0 (used Pillow==8.4.0):

    from PIL import Image
    import numpy as np
    
    image = np.array(Image.open("img-15bit.jp2"))
    image = image / 2  # A fix that produces correct values
    print(np.mean(image))
    
    >> 0.3041897339424886
  3. Reading with OpenCV (used opencv-python==4.5.5.64):

    import cv2
    import numpy as np
    
    image = np.array(cv2.imread("img-15bit.jp2", cv2.IMREAD_UNCHANGED))
    print(np.mean(image))
    
    >> 0.3041897339424886
  4. Reading with rasterio (used rasterio==1.2.10):

    import rasterio
    import numpy as np
    
    with rasterio.open("img-15bit.jp2", "r") as fp:
        image = np.array(fp.read())
    print(np.mean(image))
    
    >> 0.3041897339424886
@nulano
Copy link
Contributor

nulano commented Apr 10, 2022

This is the only change (edit: to JP2K decoding) I can see in the history between 8.4.0 and 9.0.0: 4b7b07d (a fix for big-endian architectures).

It doesn't look to me like that should affect little-endian, and I doubt you are on big-endian or you would not get the correct answer with 8.4.0 either but I can reproduce on x86_64, so I'm not sure what is the issue.

@radarhere
Copy link
Member

The changes to Jpeg2KDecode in #5901 are indeed causing the difference. I've created #6197 to fix this.

It would be good to add a test for this, so that we can ensure that this problem doesn't happen again. Do you have an image that we could add to our test suite, and distribute under the Pillow license?

@nulano
Copy link
Contributor

nulano commented Apr 11, 2022

Oh, I see now what the issue is, the x0 parameter is effectively divided by two be the smaller data size. So not only 15-bit that was affected, but any image with multiple tiles. Don't know if there is such an image, but for reference, this is the repository for test images for OpenJPEG: https://github.com/uclouvain/openjpeg-data

@AleksMat
Copy link
Author

AleksMat commented Apr 12, 2022

It would be good to add a test for this, so that we can ensure that this problem doesn't happen again. Do you have an image that we could add to our test suite, and distribute under the Pillow license?

Feel free to take the example image img-15bit.jp2 I referenced above. In sentinelhub package we use MIT license and the image originates from one of Sentinel-2 L1C products available on Copernicus SciHub. Any other image from there also uses the same encoding but most of them are very large.

@radarhere
Copy link
Member

I managed to derive a test image from one of our existing images, and have added it to the PR.

@AleksMat
Copy link
Author

AleksMat commented Jul 7, 2022

Thanks for improving this. However, the issue is not yet completely solved. It turns out the test image img-15bit.jp2 is even more special than the image issue_6194.j2k that was created as a test case for this fix. In case of img-15bit.jp2 Pillow==9.2.0 now produces the same result as Pillow<9.0.0 which is off by exactly a factor of 2. So we still have to do the following fix to get the correct values:

from PIL import Image
import numpy as np

image = np.array(Image.open("img-15bit.jp2"))
image = image / 2  # A fix that produces correct values
print(np.mean(image))

>> 0.3041897339424886

Is this something you would be interested to improve?

@radarhere
Copy link
Member

Just to be clear, you agree that we have fixed the regression. Your request is that we change the original behaviour.

If I run ImageMagick's convert command,

convert img-15bit.jp2 imagemagick.png

and then run

from PIL import Image
import numpy as np

image = np.array(Image.open("imagemagick.png"))
print(np.mean(image))

I get 0.6083794678849772 - the same value I get when running

from PIL import Image
import numpy as np

image = np.array(Image.open("img-15bit.jp2"))
print(np.mean(image))

It seems to me that ImageMagick agrees with us - so I would be interested in more evidence that Pillow is doing something wrong, beyond just OpenCV and rasterio producing a different result.

@AleksMat
Copy link
Author

AleksMat commented Jul 7, 2022

Yes, I confirm the regression has been fixed. 👍

It seems to me that ImageMagick agrees with us - so I would be interested in more evidence that Pillow is doing something wrong, beyond just OpenCV and rasterio producing a different result.

Fair point. So far I don't have enough low-level understanding of the encoding used in this special case and about the formal definition of JPEG2000 encoding. So I'm not able to explain what exactly the problem is or which tools are correct. I can only say that according to my domain-specific knowledge the values of satellite images read by Pillow are not correct unless we divide them by 2.

I'll let you know more if I ever get a better understanding about this. However, at the moment this seems to be a low-priority issue for us so it is possible I won't have time to dig into it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants