Added support for embedding indexed images #443

Merged
merged 23 commits into from May 27, 2022
Conversation

RedShy

@RedShy RedShy commented May 19, 2022

I added support for embedding indexed images in a PDF. The code was already there, I just had to make some minor fixes. I also added a test to cover the new feature.

There is a problem though: previous tests now break.

The tests used to process indexed images by converting them to RGBA and then embedding them in the PDF; now they embed them directly as indexed images.

Not sure how to proceed now.
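
To give a rough idea of why embedding indexed data directly tends to produce smaller output than converting to RGBA first, here is a minimal stdlib-only sketch. The palette, the image dimensions and the helper `expand_to_rgba` are made up for illustration; none of this is fpdf2 or Pillow code.

```python
# Hypothetical 4-colour palette and 64x64 image, for illustration only.
PALETTE = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0)]
WIDTH = HEIGHT = 64


def expand_to_rgba(indices: bytes, palette: list) -> bytes:
    """Replace each palette index with its opaque RGBA quadruplet."""
    out = bytearray()
    for index in indices:
        red, green, blue = palette[index]
        out += bytes((red, green, blue, 255))
    return bytes(out)


# One byte per pixel: each pixel stores an index into PALETTE.
indices = bytes((x // 16 + y // 16) % 4 for y in range(HEIGHT) for x in range(WIDTH))
rgba = expand_to_rgba(indices, PALETTE)

indexed_size = len(indices) + 3 * len(PALETTE)  # pixel indices + RGB palette
rgba_size = len(rgba)  # four bytes per pixel
print(indexed_size, rgba_size)  # prints "4108 16384"
```

These are uncompressed sizes; the final size in a PDF also depends on the compression filter applied afterwards.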

Checklist:

  • The GitHub pipeline is OK (green),
    meaning that both pylint (static code analyzer) and black (code formatter) are happy with the changes of this PR.

  • A unit test is covering the code added / modified by this PR

  • This PR is ready to be merged

  • In case of a new feature, docstrings have been added, with also some documentation in the docs/ folder

  • A mention of the change is present in CHANGELOG.md

Member

@Lucas-C Lucas-C left a comment


Good job really! Well done, the code is clean and you seemingly reduced the size of the PDFs generated when they embed indexed images.

My idea is to generate the PDF again using generate=True.

Seems like the best approach here.
Just make sure the new reference PDF files are smaller than the previous ones.
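
One possible way to check this after regenerating the reference files is a small size comparison. This is only a sketch: `size_report` is a hypothetical helper, not part of the fpdf2 test suite, and the file names below are placeholders written to a temporary directory.

```python
import tempfile
from pathlib import Path


def size_report(old_pdf: Path, new_pdf: Path) -> tuple:
    """Return (old_size, new_size, shrank) for two files on disk."""
    old_size = old_pdf.stat().st_size
    new_size = new_pdf.stat().st_size
    return old_size, new_size, new_size < old_size


# Demo with placeholder files standing in for the old and new reference PDFs.
with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp) / "old_reference.pdf"
    new = Path(tmp) / "new_reference.pdf"
    old.write_bytes(b"x" * 2000)
    new.write_bytes(b"x" * 500)
    report = size_report(old, new)

print(report)
```

In practice the old size would come from the previously committed file, for example a copy saved before regenerating.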

@codecov

codecov bot commented May 20, 2022

Codecov Report

Merging #443 (838d33c) into master (1477519) will increase coverage by 0.12%.
The diff coverage is 90.00%.

@@            Coverage Diff             @@
##           master     #443      +/-   ##
==========================================
+ Coverage   91.88%   92.00%   +0.12%     
==========================================
  Files          22       22              
  Lines        6408     6431      +23     
  Branches     1290     1298       +8     
==========================================
+ Hits         5888     5917      +29     
+ Misses        298      293       -5     
+ Partials      222      221       -1     
Impacted Files Coverage Δ
fpdf/image_parsing.py 88.07% <87.50%> (+1.90%) ⬆️
fpdf/fpdf.py 89.51% <100.00%> (+0.40%) ⬆️
fpdf/drawing.py 93.24% <0.00%> (-0.15%) ⬇️
fpdf/svg.py 96.96% <0.00%> (+0.01%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@RedShy
Author

RedShy commented May 20, 2022

Good job really! Well done

Thank you! Much appreciated :)

I observed that if I change the image filter to "JPXDecode" or "DCTDecode", an error is thrown when executing img.save(compressed_bytes, format="JPEG2000"), so I forced the conversion to RGBA when image_filter != "FlateDecode" and the image mode is "P" or "PA".
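
The rule described above can be sketched roughly as follows. `target_mode` and `INDEXED_MODES` are hypothetical names used for illustration; this is not the actual code in fpdf/image_parsing.py.

```python
INDEXED_MODES = ("P", "PA")


def target_mode(mode: str, image_filter: str) -> str:
    """Pick the mode an image should have before being encoded.

    Indexed images are kept as-is only for FlateDecode; for any other
    filter they fall back to RGBA, as described in the comment above.
    """
    if mode in INDEXED_MODES and image_filter != "FlateDecode":
        return "RGBA"
    return mode


print(target_mode("P", "FlateDecode"))  # prints "P"
print(target_mode("P", "DCTDecode"))    # prints "RGBA"
print(target_mode("PA", "JPXDecode"))   # prints "RGBA"
```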

It also seems that Pillow doesn't work well with "PA" images.
I used pngquant to create an indexed-alpha PNG, but Pillow still opens it as a "P" image without an alpha channel. I then tried to convert it to "PA", but the resulting embedded image is:

To make sure that the code regarding the alpha channel works correctly, I manually set the alpha channel:

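import numpy as np
from PIL import Image

# img is assumed to be a Pillow image already converted to mode "PA"
palette = img.palette
a = np.array(img)  # writable copy, shape (height, width, 2): palette index + alpha
a[:, :, 1] = 100  # set the alpha value of every pixel to 100
img = Image.fromarray(a, mode="PA")
img.palette = palette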

And the resulting image has no artifacts:

@RedShy RedShy marked this pull request as ready for review May 20, 2022 17:19
@Lucas-C
Member

Lucas-C commented May 20, 2022

Thank you for your work on this!
I think I'll take the time to review this on Monday morning.
Have a good week-end 😊

PS: don't forget to add a mention of this in the CHANGELOG.md

@RedShy
Author

RedShy commented May 21, 2022

I think I'll take the time to review this on Monday morning.

Sure! Thank you! Have a good week-end too 😊

@Lucas-C
Member

Lucas-C commented May 23, 2022

I observed that if I change the image filter to JPXDecode or DCTDecode it throws an error when executing img.save(compressed_bytes, format="JPEG2000")

What was the error exactly?

It also seems that Pillow doesn't work well with "PA" images.
I used pngquant to create an indexed-alpha PNG, but Pillow still opens it as a "P" image without alpha channel.

Yeah, I've made some tests and Pillow definitely has trouble with those images...
It may be worth reporting this issue to them btw.
Regarding fpdf2, I don't have any advice/guidance regarding this case, I think you did well in this PR!

@RedShy
Author

RedShy commented May 23, 2022

It may be worth reporting this issue to them btw.

In the next few days I will try to open an issue on Pillow's GitHub page.

I think you did well in this PR!

Thank you!

What was the error exactly?

image_filter=JPXDecode: OSError: encoder error -2 when writing image file, raised when executing img.save(compressed_bytes, format="JPEG2000")

Traceback (most recent call last):
  File "C:\Program Files\Python3.10\lib\site-packages\PIL\ImageFile.py", line 500, in _save
    fh = fp.fileno()
io.UnsupportedOperation: fileno

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\Progetti OpenSource\playground_fpdf2\coverage.py", line 80, in <module>
    test_png_indexed_files(None)
  File "D:\Progetti OpenSource\playground_fpdf2\coverage.py", line 53, in test_png_indexed_files
    pdf.image(HERE / "palette_2.png", x=80, y=10, w=50, h=50)
  File "D:\Progetti OpenSource\forked_fpdf2\fpdf2\fpdf\fpdf.py", line 299, in wrapper
    return fn(self, *args, **kwargs)
  File "D:\Progetti OpenSource\forked_fpdf2\fpdf2\fpdf\fpdf.py", line 3235, in image
    info = get_img_info(img, self.image_filter)
  File "D:\Progetti OpenSource\forked_fpdf2\fpdf2\fpdf\image_parsing.py", line 92, in get_img_info
    info["data"] = _to_data(img, image_filter)
  File "D:\Progetti OpenSource\forked_fpdf2\fpdf2\fpdf\image_parsing.py", line 151, in _to_data
    img.save(compressed_bytes, format="JPEG2000")
  File "C:\Program Files\Python3.10\lib\site-packages\PIL\Image.py", line 2300, in save
    save_handler(self, fp, filename)
  File "C:\Program Files\Python3.10\lib\site-packages\PIL\Jpeg2KImagePlugin.py", line 348, in _save
    ImageFile._save(im, fp, [("jpeg2k", (0, 0) + im.size, 0, kind)])
  File "C:\Program Files\Python3.10\lib\site-packages\PIL\ImageFile.py", line 526, in _save
    raise OSError(f"encoder error {s} when writing image file") from exc
OSError: encoder error -2 when writing image file

image_filter=DCTDecode: OSError: cannot write mode P as JPEG, raised when executing img.save(compressed_bytes, format="JPEG")

Traceback (most recent call last):
  File "C:\Program Files\Python3.10\lib\site-packages\PIL\JpegImagePlugin.py", line 633, in _save
    rawmode = RAWMODE[im.mode]
KeyError: 'P'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\Progetti OpenSource\playground_fpdf2\coverage.py", line 80, in <module>
    test_png_indexed_files(None)
  File "D:\Progetti OpenSource\playground_fpdf2\coverage.py", line 56, in test_png_indexed_files
    pdf.image(HERE / "palette_3.png", x=150, y=10, w=50, h=50)
  File "D:\Progetti OpenSource\forked_fpdf2\fpdf2\fpdf\fpdf.py", line 299, in wrapper
    return fn(self, *args, **kwargs)
  File "D:\Progetti OpenSource\forked_fpdf2\fpdf2\fpdf\fpdf.py", line 3235, in image
    info = get_img_info(img, self.image_filter)
  File "D:\Progetti OpenSource\forked_fpdf2\fpdf2\fpdf\image_parsing.py", line 92, in get_img_info
    info["data"] = _to_data(img, image_filter)
  File "D:\Progetti OpenSource\forked_fpdf2\fpdf2\fpdf\image_parsing.py", line 146, in _to_data
    img.save(compressed_bytes, format="JPEG")
  File "C:\Program Files\Python3.10\lib\site-packages\PIL\Image.py", line 2300, in save
    save_handler(self, fp, filename)
  File "C:\Program Files\Python3.10\lib\site-packages\PIL\JpegImagePlugin.py", line 635, in _save
    raise OSError(f"cannot write mode {im.mode} as JPEG") from e
OSError: cannot write mode P as JPEG

@RedShy
Author

RedShy commented May 27, 2022

Over the last few days I dug into Pillow's code and discovered that my assumption was wrong: I thought "P" images don't support transparency and "PA" images do. In fact, both support transparency:

  • "P" supports transparency through the attribute info["transparency"]: (source)
    • int: the index of the palette color regarded as "transparent"; the others are fully opaque
    • bytes: a byte string with one alpha value for each palette entry
  • "PA" supports transparency through one alpha value for each pixel
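
The two forms of info["transparency"] listed above can be pictured as a per-palette-entry alpha table, from which a per-pixel alpha (as in "PA") follows by lookup. The sketch below is plain Python and does not use Pillow; `palette_alphas` and `pixel_alphas` are hypothetical helpers. Padding a short byte string with opaque entries is an assumption borrowed from how the PNG tRNS chunk behaves.

```python
def palette_alphas(transparency, palette_size: int) -> list:
    """Build one alpha value (0-255) per palette entry."""
    if isinstance(transparency, int):
        # A single palette index is fully transparent, all others opaque.
        return [0 if i == transparency else 255 for i in range(palette_size)]
    # A byte string gives one alpha per entry; missing entries are opaque.
    alphas = list(transparency[:palette_size])
    return alphas + [255] * (palette_size - len(alphas))


def pixel_alphas(indices: bytes, alphas: list) -> bytes:
    """Look up the alpha of each pixel from its palette index."""
    return bytes(alphas[index] for index in indices)


print(palette_alphas(1, 3))               # prints "[255, 0, 255]"
print(palette_alphas(b"\x00\x80", 3))     # prints "[0, 128, 255]"
print(list(pixel_alphas(bytes([0, 1, 2, 1]), [0, 128, 255])))  # prints "[0, 128, 255, 128]"
```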

It seems that Pillow has a problem generating the alpha layer when converting from "P" to "PA". This doesn't happen when converting from "P" to "RGBA". I opened the issue and the related PR that should fix it.

Regarding fpdf2 and "P" images, the transparency is currently not carried over into the PDF. To do this, I see 2 approaches:

  • Implement a homebrew solution using info["transparency"]
  • Use Pillow to get the alpha channel of "P" images and reuse the existing code. This can be achieved by converting the image to "RGBA" and extracting the alpha channel. This is the solution I have implemented now.
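
At the byte level, the second approach amounts to separating the alpha samples from the color samples once the pixels are in RGBA order. The following is only an illustrative sketch on raw bytes; `split_rgba` is a hypothetical helper, and with Pillow this would normally be done on image objects rather than by hand.

```python
def split_rgba(rgba: bytes) -> tuple:
    """Split interleaved RGBA bytes into (rgb_bytes, alpha_bytes)."""
    if len(rgba) % 4:
        raise ValueError("RGBA data length must be a multiple of 4")
    alpha = rgba[3::4]  # every fourth byte is an alpha sample
    rgb = bytes(value for i, value in enumerate(rgba) if i % 4 != 3)
    return rgb, alpha


# Two pixels: opaque red and half-transparent blue.
rgb, alpha = split_rgba(bytes([255, 0, 0, 255, 0, 0, 255, 128]))
print(list(rgb))    # prints "[255, 0, 0, 0, 0, 255]"
print(list(alpha))  # prints "[255, 128]"
```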

Regarding "PA" images, the code in fpdf2 seems to work correctly if the alpha layer is correctly set, so I suggest we keep supporting them.
I added tests that exercise "PA" images by inserting the alpha channel extracted from the converted "RGBA" image, and they are displayed correctly in the PDF.

(I also changed the sample images in the test putting some pics of flowers that I took near my house 😊)

@Lucas-C
Member

Lucas-C commented May 27, 2022

Wow, you did an excellent job!
You even fixed the bug upstream in Pillow, awesome!

I'm going to fix the minor file conflicts that prevent this PR from being merged,
and see if it can be closed afterwards.

@RedShy
Author

RedShy commented May 27, 2022

Wow, you did an excellent job!
You even fixed the bug upstream in Pillow, awesome!

Thank you, that means a lot to me! It's really rewarding and encourages me to do more! 😊

@Lucas-C
Member

Lucas-C commented May 27, 2022

It seems like some tests are failing:

FAILED test/html/test_html.py::test_img_inside_html_table - AssertionError: c...
FAILED test/html/test_html.py::test_img_inside_html_table_without_explicit_dimensions
FAILED test/html/test_html.py::test_img_inside_html_table_centered - Assertio...
FAILED test/html/test_html.py::test_img_inside_html_table_centered_with_align
FAILED test/image/test_image_info.py::test_get_img_info - assert {'bpc': 8,\n...
FAILED test/image/png_images/test_png_file.py::test_insert_png_files - Assert...
================= 6 failed, 914 passed, 18 warnings in 42.17s ==================

Do you have the same errors on your computer?

@RedShy
Author

RedShy commented May 27, 2022

Do you have the same errors on your computer?

Yes, I forgot to run all the tests again.

@Lucas-C Lucas-C merged commit d2fa052 into py-pdf:master May 27, 2022
@Lucas-C
Member

Lucas-C commented May 27, 2022

Merged! Thank you @RedShy 😊

@RedShy RedShy deleted the palette_images branch May 27, 2022 17:08
@Lucas-C
Member

Lucas-C commented Jun 17, 2022

This feature has been included in the new v2.5.5 release!
