Image with Filter "[/FlateDecode/JPXDecode]" not extracted #2087

ghost · 2022-11-28T09:33:28Z

Describe the bug (mandatory)

I'm trying to iterate over all images in a document, but extraction fails for one filter combination.
PyMuPDF works correctly for all filter combinations except Filter [/FlateDecode/JPXDecode]. PDFs with such image filters are read correctly by PDF readers and other Python PDF libraries, but PyMuPDF fails to extract the image, and xref_get_key(xref, "/Filters") does not return the correct filters.

These two image dictionaries are extracted correctly:

<</ColorSpace/DeviceRGB/BitsPerComponent 8/Width 1672/Length 1964389/Height 1124/Name/im1/Subtype/Image/Type/XObject/Filter/JPXDecode>>

and

<</ID 26 0 R/Type/XObject/Length 476/Filter[/FlateDecode/DCTDecode]/Subtype/Image/BitsPerComponent 8/Width 126/Height 81/ColorSpace/DeviceRGB>>

This one fails:

<</ColorSpace/DeviceGray/BitsPerComponent 8/Width 67/Length 1958/Height 68/Name/im2/Subtype/Image/Type/XObject/Filter[/FlateDecode/JPXDecode]>>

However, document.xref_stream(xref) correctly decompresses the stream, and the output is a valid JPEG 2000 stream.
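One way to check the claim above is to look at the first bytes of whatever document.xref_stream(xref) returns. The sketch below is not part of PyMuPDF; the helper name looks_like_jpeg2000 is hypothetical. It only tests for the two well-known JPEG 2000 starting sequences (the JP2 signature box and the raw codestream SOC/SIZ markers) and does not validate the rest of the data.

```python
# Well-known starting bytes of JPEG 2000 data.
JP2_SIGNATURE_BOX = bytes.fromhex("0000000c6a5020200d0a870a")  # JP2 file wrapper
J2K_CODESTREAM_START = bytes.fromhex("ff4fff51")  # SOC marker followed by SIZ marker


def looks_like_jpeg2000(data: bytes) -> bool:
    """Return True if data starts like a JP2 file or a raw JPEG 2000 codestream.

    Hypothetical helper: it checks the leading bytes only.
    """
    return data.startswith(JP2_SIGNATURE_BOX) or data.startswith(J2K_CODESTREAM_START)


# Possible use with the document from this report (not executed here):
#     looks_like_jpeg2000(document.xref_stream(xref))
```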

To Reproduce (mandatory)

    import fitz

    document = fitz.Document(srcFileName)
    allXrefsLength = document.xref_length()
    for xref in range(1, allXrefsLength):
        if document.xref_get_key(xref, "Subtype")[1] != "/Image":
            continue

        imgDict = document.extract_image(xref)
        if not imgDict:
            tmpFilters = document.xref_get_key(xref, "/Filters")
            print("subtype of xref {0} is /Image, but pymupdf can not extract it as image. filters: {1}".format(xref, tmpFilters))

Output for images with such filters:
subtype of xref 76 is /Image, but pymupdf can not extract it as image. filters: ('null', 'null')

Your configuration (mandatory)

Windows 10 x64, Python 3.10, PyMuPDF 1.21, installed with pip install pymupdf


JorjMcKie · 2022-11-28T09:53:53Z

Duplicate of #1995: the maximum supported values for image width and height currently are 2**16 = 65,536.
See that issue for details.

For the other part of the issue, extraction of the /Filter value, please allow me an example.

ghost · 2022-11-28T09:56:49Z

> Duplicate of #1995: the maximum supported values for image width and height currently are 2**16 = 65,536. See that issue for details.

It happens with all image sizes; in my example it is 67 x 68 px. Only the filter combination matters.

JorjMcKie · 2022-11-28T10:00:06Z

Does doc.xref_get_key(xref, "Filter") work correctly?
Only image extraction is the problem?

JorjMcKie · 2022-11-28T10:13:37Z

You have a typo in extracting the filter names: /Filters does not exist, only /Filter.
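To illustrate the point about the key name: xref_get_key returns a (type, value) tuple, and a missing key such as "/Filters" comes back as ('null', 'null'), as seen in the report. The sketch below turns such a tuple into a list of filter names. The helper filter_names is hypothetical, and the exact string form of the array value ("[/FlateDecode/JPXDecode]") is an assumption based on the dictionaries quoted in this issue; indirect references inside the array are not handled.

```python
def filter_names(key_result):
    """Convert the (type, value) tuple for the "Filter" key into a list of names.

    Hypothetical helper, assuming values like ("name", "/JPXDecode") or
    ("array", "[/FlateDecode/JPXDecode]"); anything else yields an empty list.
    """
    kind, value = key_result
    if kind == "name":
        return [value.lstrip("/")]
    if kind == "array":
        # Each name starts with "/"; names may or may not be separated by spaces.
        inner = value.strip("[] ")
        return [part.strip() for part in inner.split("/") if part.strip()]
    return []


# Possible use (not executed here):
#     filter_names(document.xref_get_key(xref, "Filter"))
```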

JorjMcKie · 2022-11-28T12:43:09Z

Please provide a reproducer file.

ghost · 2022-11-28T15:58:31Z

> You have a typo in extracting the filter names: /Filters does not exist, only /Filter.

My bad :( "Filter" works correctly, so image extraction is the only problem.
Attached sample file.

massage.pdf

JorjMcKie · 2022-11-28T17:56:44Z

Thanks for the file.
Fixed it.
Will be published with next version.

ghost · 2022-11-28T17:57:30Z

Thank you!

Issue 2087: `fitz.i` (`extract_image`): the type of JPX images with more than one `/Filter` is not correctly recognized when inspecting the raw stream. Fixed by extracting the decoded stream: we already know the type from the PDF dict.
Issue 2094: Rectangle recognition in `helper-devices.i` (`jm_checkrect()`) was wrong in not confirming that the x-coordinates are also the same in the respective corners. Also simplified rectangle orientation detection.

This reverts commit 899ac3e.


Fix #2110 (Discussion item #2111): File `__main__.py` - also include the font's xref in the generated file name.
Fix #2094: File `helper-device.i` - also ensure equality of x coordinates of relevant corners before assuming a rectangle.
Fix #2087: File `fitz.i` - if the JPX image format is already known, make sure to read the decoded image stream; the raw stream is read in the other cases.
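For readers on a version without this fix, the idea can be approximated in user code. The following is a sketch only, not the code from `fitz.i`: extract_image_with_fallback is a hypothetical helper, the "jpx" extension string is an assumption, and the returned dictionary carries just two keys rather than everything extract_image normally provides. It relies on the observation from the report that xref_stream(xref) yields a valid JPEG 2000 stream for these images.

```python
def extract_image_with_fallback(doc, xref):
    """Try doc.extract_image first; fall back to the decoded stream for JPX images.

    Hypothetical helper. `doc` is expected to offer extract_image,
    xref_get_key and xref_stream, as the document object in this issue does.
    """
    img = doc.extract_image(xref)
    if img:
        return img
    kind, value = doc.xref_get_key(xref, "Filter")
    # The last filter in the chain names the actual image encoding.
    if kind in ("name", "array") and value.rstrip("] ").endswith("/JPXDecode"):
        # "jpx" as the extension is an assumption made for this sketch.
        return {"ext": "jpx", "image": doc.xref_stream(xref)}
    return {}
```

The fallback is deliberately limited to JPXDecode, since that is the only combination this issue reports as failing.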


julian-smith-artifex-com · 2022-12-13T14:30:58Z

Fixed in PyMuPDF-1.21.1.

JorjMcKie added the duplicate label Nov 28, 2022

JorjMcKie changed the title ~~"Filter [/FlateDecode/JPXDecode]" is extracted as "('null', 'null')" and document.extract_image(xref) fails for such images~~ Image with Filter "[/FlateDecode/JPXDecode]" not extracted Nov 28, 2022

JorjMcKie added bug Fixed in next release and removed duplicate labels Nov 28, 2022

JorjMcKie mentioned this issue Nov 28, 2022

Multiple Fixes of open Issues #2091

Closed

JorjMcKie added a commit that referenced this issue Nov 30, 2022

Revert "Fixes #2094 and #2087"

8867791

This reverts commit 899ac3e.

JorjMcKie added a commit that referenced this issue Nov 30, 2022

Revert "Fixes #2094 and #2087"

5985fb9

This reverts commit 899ac3e.

julian-smith-artifex-com removed the Fixed in next release label Dec 13, 2022

julian-smith-artifex-com closed this as completed Dec 13, 2022
