Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: only convert GIF frames if palette actually changes #5974

Closed
wants to merge 1 commit into from

Conversation

josch
Copy link

@josch josch commented Jan 18, 2022

This is a WIP pull request which avoids converting image frame of GIF images in case all earlier palettes of the GIF are actually the same.

I wrote this because the suggestions by @radarhere in #5966 (comment) doesn't work for some images which already get converted in the _seek function. Thus, this pull request introduces a new variable that keeps track of the last used palette and only converts images if any of the former frames changed the palette (including the global palette).

What do you think?

@josch josch force-pushed the gif_same_palette branch 3 times, most recently from 1fbaf9c to e576c3d Compare January 18, 2022 21:05
@josch
Copy link
Author

josch commented Jan 18, 2022

Some dispose-related tests are still failing and I'm working on that (suggestions welcome!).

The advantages of only changing to the new behaviour if needed (i.e. if the palette does change from one frame to the next) are:

  • faster because we have to run convert less often
  • on all my test input the palette images were smaller than the images after they were converted to RGB(A)
  • software like img2pdf does not need to add special-casing to be compatible with both Pillow 9.0.0 and Pillow before 9.0.0. Other pieces of software might be in a similar situation.

@radarhere
Copy link
Member

radarhere commented Jan 18, 2022

software like img2pdf does not need to add special-casing to be compatible with both Pillow 9.0.0 and Pillow before 9.0.0

Could you provide some more detail on this point? If I'm understanding correctly, the img2pdf tests are failing just because they expect an indexed colorspace and are instead finding RGB. That doesn't sound like a problem for end users, just a problem for the test suite.

#5966 (comment)

what can software do that wants to really figure out whether frame X of an image is P or RGB?

At the moment in the main branch, if you seek past the first frame of a GIF, it will be either RGB or RGBA.

My primary concern with this PR is that it would no longer be predictable what mode the image would be in that scenario. It might have the same palette and be P, it might not and be RGB. Surely that sounds like it would make situations like the one happening in your test suite worse, not better.

Edit: To add some context, we've also had another user request that we go the other way, #5929, and set every frame to RGB.

@radarhere radarhere added the GIF label Jan 18, 2022
@josch
Copy link
Author

josch commented Jan 19, 2022

software like img2pdf does not need to add special-casing to be compatible with both Pillow 9.0.0 and Pillow before 9.0.0

Could you provide some more detail on this point? If I'm understanding correctly, the img2pdf tests are failing just because they expect an indexed colorspace and are instead finding RGB. That doesn't sound like a problem for end users, just a problem for the test suite.

Yes, this is a minor argument and only becomes the larger argument if there is more software or users having similar expectations. The bigger argument is, that the files produced by img2pdf are now bigger because they are not palette images anymore even though they could be palette images because all frames share the same palette.

The larger file size and the additional processing cost (due to converting) are my main arguments.

#5966 (comment)

what can software do that wants to really figure out whether frame X of an image is P or RGB?

At the moment in the main branch, if you seek past the first frame of a GIF, it will be either RGB or RGBA.

My primary concern with this PR is that it would no longer be predictable what mode the image would be in that scenario. It might have the same palette and be P, it might not and be RGB. Surely that sounds like it would make situations like the one happening in your test suite worse, not better.

But it would be predictable. If the input GIF has the same palette for all frames, then all frames would be P. I'd just add different test cases for the different situations.

Really, the img2pdf testsuite is of little concern here. If subsequent images have the same palette as all the frames before, then that's an advantage for filesize (because we can retain the palette when converting) and for processing speed (because frames don't have to be converted from P to RGB).

EDIT: If I want to obtain the same small file sizes for input images where all frames have the same palette with 9.0.0, then I'd now have to analyze all frames of the GIF after the first and check whether their colors are the same as in the first frame in which case they can become palette images. I'd not have to do that if Pillow would not throw away the information that the frame could be of type P if the palettes are all the same. So right now I have the choice between:

  • live with larger file sizes for GIF frames that are now stored as RGB instead of P
  • add code that checks whether subsequent frames could become P again by finding out if the frame contains less than 256 colors and thus making the overall execution slower but potentially generate files that are as small as they were with Pillow 8.4

This could be avoided if Pillow would not throw away the information that subsequent frames could be stored as P in case the palette does not change and only convert to RGB if the palette changes.

@radarhere radarhere changed the title WIP: src/PIL/GifImagePlugin.py: only convert image if palettes actually changes WIP: only convert GIF frames if palettes actually changes Jan 19, 2022
@radarhere radarhere changed the title WIP: only convert GIF frames if palettes actually changes WIP: only convert GIF frames if palette actually changes Jan 19, 2022
@radarhere
Copy link
Member

the files produced by img2pdf are now bigger

Just thinking aloud here for a second. Presuming you were talking about img2pdf's animation.gif, I ran

from PIL import Image
with Image.open("animation.gif") as im:
    im.save("out.gif", save_all=True)

and found that the saved GIF is actually smaller with Pillow 9.0 than it is with Pillow 8.4.0. That seemed to go against your point, but then I realised - you're not talking about Pillow resaving a GIF, you're talking about img2pdf saving a GIF.

@wiredfool
Copy link
Member

What is img2pdf actually doing with the multiple frames of gif images?

@josch
Copy link
Author

josch commented Jan 21, 2022

What is img2pdf actually doing with the multiple frames of gif images?

For images that contain more than one frame (so not only for GIF images) img2pdf outputs one page per frame unless the --first-frame-only option is given.

Thus, I very much welcome the fixes to Pillow concerning the automatic conversion to RGB if 256 colors are not enough to represent a frame. I think that's awesome! I just want to retain the palette in those cases were it does not change because storing an RGB image (even after paeth filter and zlib compression) in PDF requires more space than the palette based image.

Sorry, that I haven't pushed more fixes to this branch yet. I'll find more time to work on this during the weekend.

@josch
Copy link
Author

josch commented Jan 23, 2022

Hi,
I worked on it a bit more but I'm stuck with four tests failing:

FAILED Tests/test_file_gif.py::test_transparent_dispose - assert 2 == 1
FAILED Tests/test_file_gif.py::test_dispose_previous_first_frame - AssertionError: got different content
FAILED Tests/test_file_gif.py::test_dispose2_diff - assert (0, 0, 0, 255) == (255, 255, 255, 0)
FAILED Tests/test_file_gif.py::test_missing_background - AssertionError: got different content

Maybe somebody has some time to look into this and find out where I made a mistake?

Thanks!

@mara004
Copy link

mara004 commented Mar 19, 2022

Hi,
I was wondering whether there is any progress on this topic?
Thanks!

@radarhere
Copy link
Member

I've created PR #6150 as an alternative to this. In it, I've added a setting to control this behaviour. Running this following code with that PR should fix the img2pdf tests.

from PIL import GifImagePlugin
GifImagePlugin.PALETTE_TO_RGB = GifImagePlugin.ModeStrategy.DIFFERENT_PALETTE_ONLY 

However, I'm having trouble confirming that, due to https://gitlab.mister-muffin.de/josch/img2pdf/issues/132. If you would like to confirm that my change resolves this?

@mara004
Copy link

mara004 commented Mar 22, 2022

Thanks, that sounds great. I installed Pillow from your pull request:

gh pr checkout 6150
pip3 install . -v

and then set the configuration:

GifImagePlugin.PALETTE_TO_RGB = GifImagePlugin.ModeStrategy.DIFFERENT_PALETTE_ONLY 

I can confirm that all GIF tests in question pass now (in my local, patched img2pdf variant).

(I think you could also check yourself if you downgrade pikepdf to an earlier version. @josch says that 4.2.0 should work.)

@josch
Copy link
Author

josch commented Mar 22, 2022

Thanks for testing @mara004 -- I tried testing it too but now other tests fail. Minimal reproducer by using this image:

https://gitlab.mister-muffin.de/josch/img2pdf/src/branch/main/src/tests/input/animation.gif

And then running:

src/img2pdf.py -v src/tests/input/animation.gif -o out.pdf
[...]
ERROR:__main__:error: local variable 'frame_dispose_extent' referenced before assignment
Traceback (most recent call last):
  File "/home/josch/annex/personal/projects/img2pdf/src/img2pdf.py", line 3980, in main
    convert(
  File "/home/josch/annex/personal/projects/img2pdf/src/img2pdf.py", line 2280, in convert
    ) in read_images(
  File "/home/josch/annex/personal/projects/img2pdf/src/img2pdf.py", line 1654, in read_images
    imgdata.seek(img_page_count)
  File "/usr/lib/python3/dist-packages/PIL/GifImagePlugin.py", line 147, in seek
    self._seek(f)
  File "/usr/lib/python3/dist-packages/PIL/GifImagePlugin.py", line 319, in _seek
    self.dispose_extent = frame_dispose_extent
UnboundLocalError: local variable 'frame_dispose_extent' referenced before assignment

Can you confirm @mara004?

@radarhere thanks a lot for #6150 -- I'll continue to test your work. To work around the pikepdf issue in img2pdf you can either downgrade to 4.2.0 as @mara004 already suggested or apply the following patch:

--- a/src/img2pdf.py
+++ b/src/img2pdf.py
@@ -1108,7 +1110,10 @@ class pdfdoc(object):
 
         # the /OpenAction array must contain the page as an indirect object
         if self.engine == Engine.pikepdf:
-            initial_page = self.writer.make_indirect(initial_page)
+            if isinstance(initial_page, pikepdf.Page):
+                initial_page = self.writer.make_indirect(initial_page.obj)
+            else:
+                initial_page = self.writer.make_indirect(initial_page)
 
         if self.magnification == Magnification.fit:
             catalog[PdfName.OpenAction] = PdfArray([initial_page, PdfName.Fit])

@mara004
Copy link

mara004 commented Mar 22, 2022

No, I can't reproduce your issue, at least not with the state of my local repository. For me, converting animation.gif works just fine and all tests pass. Your traceback suggests that the issue is happening down in Pillow whilst seeking to the next frame. Maybe we installed a different state of the pull request?

@radarhere
Copy link
Member

@mara004 thanks for testing

@josch I applied the patch, and then the 4 failing tests from #5966 passed, so thanks for that.

As for

  File "/usr/lib/python3/dist-packages/PIL/GifImagePlugin.py", line 319, in _seek
    self.dispose_extent = frame_dispose_extent
UnboundLocalError: local variable 'frame_dispose_extent' referenced before assignment

I'm having trouble seeing how this is possible with my branch.

frame_dispose_extent = None

comes before
self.dispose_extent = frame_dispose_extent

So it should always be set.

@josch
Copy link
Author

josch commented Mar 22, 2022

Thank you @radarhere! Feel free to close this pull request in favor of #6150 -- thanks again! 👍

@josch
Copy link
Author

josch commented Mar 22, 2022

@radarhere I can now confirm the results of @mara004 and #6150 lets the img2pdf testsuite pass just fine after setting GifImagePlugin.PALETTE_TO_RGB = GifImagePlugin.ModeStrategy.DIFFERENT_PALETTE_ONLY. Thus, I'm closing this pull request in favor of #6150. Thanks!

@josch josch closed this Mar 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants