Raster image rendering performance dropped #1444

Closed
SvenBecker opened this issue Sep 10, 2021 · 11 comments
Labels
performance Too slow renderings
Comments

@SvenBecker

Rendering of raster images has become slower since WeasyPrint version 53; see the benchmarks in #1439.

@liZe liZe added the performance Too slow renderings label Sep 10, 2021
@liZe
Member

liZe commented Sep 13, 2021

The problem is particularly noticeable with images other than JPEGs, as we have to convert them to JPEG2000 to embed them in the PDF.

Maybe there’s a better solution. I’ve read that some PNG files can also be embedded with no conversion.

@aschmitz
Contributor

I'll also mention that Firefox's PDF.js is also very slow at decoding JPEG2000 images, so this affects both rendering and viewing speed in some cases.

I believe you're right that certain kinds of PNG can be embedded without significant image conversion (assuming a compatible filter, color choices, etc.), but if we're trying to load raster graphics (other than JPEG or JPEG2000), it might be fastest to convert to that subset of PNG that can be directly embedded, rather than JPEG2000 (which is a fairly complicated and slow format to encode).
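To make the "subset of PNG" idea concrete, here is a minimal sketch that reads the IHDR chunk of a PNG and applies a simple heuristic: no alpha channel and no Adam7 interlacing. The function names and the exact criteria are assumptions for illustration, not WeasyPrint code; palette transparency (tRNS) and colour profiles are ignored.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_ihdr(png_bytes):
    """Parse the IHDR chunk, which must directly follow the PNG signature."""
    if png_bytes[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # Chunk layout: 4-byte length, 4-byte type, data, 4-byte CRC.
    length, chunk_type = struct.unpack(">I4s", png_bytes[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("missing or malformed IHDR chunk")
    fields = struct.unpack(">IIBBBBB", png_bytes[16:29])
    names = ("width", "height", "bit_depth", "color_type",
             "compression", "filter", "interlace")
    return dict(zip(names, fields))


def is_directly_embeddable(png_bytes):
    """Heuristic (assumption): opaque and not Adam7-interlaced.

    tRNS transparency and colour profiles are not checked here.
    """
    ihdr = read_ihdr(png_bytes)
    has_alpha_channel = ihdr["color_type"] in (4, 6)  # gray+alpha, RGBA
    return not has_alpha_channel and ihdr["interlace"] == 0


def make_png_header(width, height, bit_depth, color_type, interlace):
    """Hypothetical demo helper: PNG signature plus an IHDR chunk only."""
    data = struct.pack(">IIBBBBB", width, height, bit_depth, color_type,
                       0, 0, interlace)
    crc = struct.pack(">I", zlib.crc32(b"IHDR" + data))
    return PNG_SIGNATURE + struct.pack(">I", len(data)) + b"IHDR" + data + crc


# Opaque 8-bit RGB, then 8-bit RGBA (both non-interlaced).
print(is_directly_embeddable(make_png_header(4, 4, 8, 2, 0)))
print(is_directly_embeddable(make_png_header(4, 4, 8, 6, 0)))
```

Images that fail the check would still need a conversion step, for example splitting the alpha channel into a separate mask.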

@liZe
Member

liZe commented Oct 2, 2021

> I'll also mention that Firefox's PDF.js is also very slow at decoding JPEG2000 images, so this affects both rendering and viewing speed in some cases.

That’s sad 😢.

> it might be fastest to convert to that subset of PNG that can be directly embedded, rather than JPEG2000

That would be the best solution. I’ve spent some time playing with PIL to get the information needed to include the PNG files, but I’m not sure that it’s the right tool for this job. Could a PNG-in-PDF guru help me with this issue 😄?

@liZe liZe changed the title raster image rendering performance dropped Raster image rendering performance dropped Oct 6, 2021
@gnyers

gnyers commented Oct 26, 2021

Continuing the conversation from #1475.

@liZe: you mention in the other ticket that you could analyze the document. Unfortunately I can't share it, but if you can provide some instructions/pointers/code here, I'd be happy to give it a try myself. I've some experience with generating PDFs from PyPDF2 and can find my way in a PDF doc.

@liZe
Member

liZe commented Oct 27, 2021

> @liZe: you mention in the other ticket that you could analyze the document. Unfortunately I can't share it, but if you can provide some instructions/pointers/code here, I'd be happy to give it a try myself. I've some experience with generating PDFs from PyPDF2 and can find my way in a PDF doc.

Before this return, you can check the size of the generated JPEG2000 image (using len(image_file.getvalue()), I think) and compare it to the size of your original images. If it’s much larger, then the problem comes from the images.

If you could share at least one image, it would help to find why there’s such a difference between the PNG and the JPEG2000 files.
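The size check suggested above can also be run outside WeasyPrint. The following is a rough sketch under assumptions: it uses Pillow (built with OpenJPEG support) to re-encode a file as JPEG2000 in memory, and the names `encode_as_jpeg2000`, `size_ratio`, and `report` are hypothetical. The mode conversion is a guess at what the JPEG2000 writer accepts and may not match what WeasyPrint does internally.

```python
import io
import os


def encode_as_jpeg2000(path):
    """Re-encode an image as JPEG2000 in memory and return the bytes.

    Pillow is imported lazily; it needs OpenJPEG support for this format.
    """
    from PIL import Image

    with Image.open(path) as image:
        # Assumption: convert modes the JPEG2000 writer may not accept
        # (for example palette images); this can drop transparency.
        if image.mode not in ("L", "LA", "RGB", "RGBA"):
            image = image.convert("RGB")
        image_file = io.BytesIO()
        image.save(image_file, format="JPEG2000")
    return image_file.getvalue()


def size_ratio(original_size, encoded_size):
    """Return how many times larger the encoded image is than the original."""
    if original_size <= 0:
        raise ValueError("original size must be positive")
    return encoded_size / original_size


def report(path):
    """Print the original and JPEG2000 sizes of one image file."""
    original_size = os.path.getsize(path)
    encoded_size = len(encode_as_jpeg2000(path))
    ratio = size_ratio(original_size, encoded_size)
    print(f"{path}: {original_size} B -> {encoded_size} B (x{ratio:.2f})")
```

Calling `report()` on each PNG used by the document should show whether the images alone account for the larger PDF.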

@vojkny

vojkny commented Oct 27, 2021

Can you give us more context on what caused this in v53? Is there some different tooling? What was the motivation for this change?

@liZe
Member

liZe commented Oct 27, 2021

> Can you give us more context on what caused this in v53? Is there some different tooling? What was the motivation for this change?

As explained in this article, we removed Cairo in version 53. Cairo, together with GDK-Pixbuf, was in charge of including images in the PDF. Now that we build the PDF without Cairo, we have to take care of images ourselves. We now include all images as JPEG or JPEG2000, because these formats can easily be embedded in the PDF. PNG images can’t be embedded directly, so we convert them to JPEG2000, but this conversion is slow and the resulting image is bigger. We have to find a way to include PNG files directly, but it’s not easy and we have to find out how other tools do it.
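For reference, one commonly described way to include a simple PNG without re-encoding is to concatenate its IDAT chunks and declare the stream as FlateDecode with a PNG predictor. The sketch below illustrates that technique only for opaque, non-interlaced grayscale or RGB files; it is an assumption-laden illustration with hypothetical names (`png_to_pdf_image`, `build_png`), not the approach taken in WeasyPrint.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# PNG colour type -> (PDF colour space, number of colour components)
COLOR_SPACES = {0: ("/DeviceGray", 1), 2: ("/DeviceRGB", 3)}


def iter_chunks(png_bytes):
    """Yield (type, data) for each chunk after the 8-byte signature."""
    if png_bytes[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    position = 8
    while position < len(png_bytes):
        length, chunk_type = struct.unpack(
            ">I4s", png_bytes[position:position + 8])
        yield chunk_type, png_bytes[position + 8:position + 8 + length]
        position += 12 + length  # length + type + data + CRC


def png_to_pdf_image(png_bytes):
    """Return (dictionary entries, stream bytes) for a simple PNG."""
    ihdr, idat = None, []
    for chunk_type, data in iter_chunks(png_bytes):
        if chunk_type == b"IHDR":
            ihdr = struct.unpack(">IIBBBBB", data)
        elif chunk_type == b"IDAT":
            idat.append(data)
    if ihdr is None:
        raise ValueError("missing IHDR chunk")
    width, height, bit_depth, color_type, _, _, interlace = ihdr
    if color_type not in COLOR_SPACES or interlace:
        raise ValueError("this PNG needs conversion before embedding")
    color_space, colors = COLOR_SPACES[color_type]
    stream = b"".join(idat)  # the zlib stream is reused unchanged
    entries = {
        "Type": "/XObject", "Subtype": "/Image",
        "Width": width, "Height": height,
        "ColorSpace": color_space, "BitsPerComponent": bit_depth,
        "Filter": "/FlateDecode",
        # Predictor 15 tells the PDF reader to undo PNG row filters.
        "DecodeParms": {"Predictor": 15, "Colors": colors,
                        "BitsPerComponent": bit_depth, "Columns": width},
        "Length": len(stream),
    }
    return entries, stream


def build_png(width, height, color_type, filtered_rows):
    """Hypothetical demo helper: wrap filtered rows in a minimal 8-bit PNG."""
    def chunk(chunk_type, data):
        crc = struct.pack(">I", zlib.crc32(chunk_type + data))
        return struct.pack(">I", len(data)) + chunk_type + data + crc

    ihdr = struct.pack(">IIBBBBB", width, height, 8, color_type, 0, 0, 0)
    return (PNG_SIGNATURE + chunk(b"IHDR", ihdr)
            + chunk(b"IDAT", zlib.compress(filtered_rows))
            + chunk(b"IEND", b""))


# A 2x2 black RGB image: each row is a filter byte (0) plus 6 sample bytes.
rows = (b"\x00" + bytes(6)) * 2
entries, stream = png_to_pdf_image(build_png(2, 2, 2, rows))
print(entries["ColorSpace"], entries["Length"])
```

Palette images, alpha channels, and interlaced files are deliberately rejected here, which reflects why the general case is harder than it first appears.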

@summersz

Perhaps the top answer and linked project from this Stack Overflow question, create-small-high-quality-pdf-embedding-optimized-png, could be of interest?

This file in particular from the project:
png.py

@liZe
Member

liZe commented Oct 27, 2021

> Perhaps the top answer and linked project from this Stack Overflow question, create-small-high-quality-pdf-embedding-optimized-png, could be of interest?

Yes, we’ve already seen this topic on SO (among other sources; the issue seems to be quite well documented in various places).

Rinohtype is released under AGPL, so we shouldn’t look at the code too closely 😉. But we can at least see that it’s complicated. As explained earlier, I’ve spent some time playing with PIL to get the right information and see whether a PNG file can be embedded, but I’m not sure that it’s the right tool.

@liZe
Member

liZe commented Oct 30, 2021

#1481 should help a lot. Tests with the current master are welcome!

@mpth

mpth commented Nov 1, 2021

Just ran a test with a real-world document. It had ~31 images pointing to PNG files and also 515 PNG images embedded via data URIs (src="data:image/png...").

Size before #1481
37.5 MB

Size after #1481
4.7 MB

Looks like an improvement to me 🎉

Thanks @liZe @aschmitz ❤️

@liZe liZe added this to the 55.0 milestone Jan 3, 2022
@grewn0uille grewn0uille modified the milestones: 55.0, 54.0 Apr 13, 2022

8 participants