Pillow 10.3.0 breaks test_filters.test_rgba #2568

Open
stefan6419846 opened this issue Apr 2, 2024 · 3 comments
Labels
is-bug: From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF
workflow-images: From a users perspective, image handling is the affected feature/workflow

Comments

@stefan6419846
Collaborator

Running the tests with Pillow==10.3.0 breaks test_filters.test_rgba. Pillow==10.2.0 works correctly.

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Linux-5.14.21-150400.24.100-default-x86_64-with-glibc2.31

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==4.1.0, crypt_provider=('cryptography', '42.0.5'), PIL=10.3.0

Code + PDF

Just run pytest -k 'test_rgba'.

Expected image:

tika-972174_p0-im0

Actual image:

file

Traceback

This is the complete traceback I see:

__________________________________ test_rgba ___________________________________
[gw3] linux -- Python 3.12.2 /opt/hostedtoolcache/Python/3.12.2/x64/bin/python

    @pytest.mark.enable_socket()
    def test_rgba():
        """Decode rgb with transparency"""
        reader = PdfReader(BytesIO(get_data_from_url(name="tika-972174.pdf")))
        data = reader.pages[0].images[0]
        assert ".jp2" in data.name
        similarity = image_similarity(
            data.image, BytesIO(get_data_from_url(name="tika-972174_p0-im0.png"))
        )
>       assert similarity > 0.99
E       assert 0.6877076861263712 > 0.99

tests/test_filters.py:380: AssertionError
@stefan6419846
Collaborator Author

An upstream fix is available as a PR targeting the next Pillow release.

With that fix, test_filters.test_rgba and test_workflows.py.test_image_extraction[https://corpora.tika.apache.org/base/docs/govdocs1/972/972174.pdf-tika-972174.pdf] still break slightly, but this can be resolved by setting ImageFile.LOAD_TRUNCATED_IMAGES = True for the scope of the corresponding test method.
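
For illustration, one way to limit ImageFile.LOAD_TRUNCATED_IMAGES = True to a single test is a small context manager that restores the previous value afterwards. This is only a sketch, not the change made in pypdf: the helper names temporarily_set and load_with_truncation_allowed are hypothetical, and the SimpleNamespace object is a stand-in for PIL.ImageFile so the restore logic can be tried without Pillow installed.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporarily_set(obj, name, value):
    """Set obj.<name> to value inside the with-block, then restore the old value."""
    old = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield
    finally:
        # Restore even if the body raised, so other tests are unaffected.
        setattr(obj, name, old)


def load_with_truncation_allowed(load):
    """Hypothetical helper: call load() while Pillow tolerates truncated data."""
    from PIL import ImageFile  # imported lazily; requires Pillow

    with temporarily_set(ImageFile, "LOAD_TRUNCATED_IMAGES", True):
        return load()


# Stand-in for PIL.ImageFile (assumption, for demonstration only).
fake_image_file = SimpleNamespace(LOAD_TRUNCATED_IMAGES=False)
with temporarily_set(fake_image_file, "LOAD_TRUNCATED_IMAGES", True):
    inside = fake_image_file.LOAD_TRUNCATED_IMAGES
after = fake_image_file.LOAD_TRUNCATED_IMAGES
```

Inside a pytest test, monkeypatch.setattr(ImageFile, "LOAD_TRUNCATED_IMAGES", True) should give the same per-test scoping without a custom helper.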

I am not sure whether we should ban Pillow==10.3.0 from pypdf for now, or whether we should treat this as an issue that does not occur too often and that we have no control over anyway. From my perspective, I would probably not restrict this for now.

@pubpub-zz
Collaborator

@stefan6419846 can you confirm that the transparency is correct?

@stefan6419846
Collaborator Author

@pubpub-zz The alpha masking is done in a separate step and looks correct.

This is the newly rendered image after applying the patch:

file

The file size differs slightly, but I could not see any real visual difference when comparing it to the reference image.
