Very slow svg rendering since version 53.0 #1439

Closed
SvenBecker opened this issue Sep 6, 2021 · 9 comments
Labels
performance Too slow renderings

@SvenBecker

SvenBecker commented Sep 6, 2021

The time it takes to create a PDF with multiple SVG images has increased drastically.
We create some figures with matplotlib and base64-encode those images so that we do not have to create image files:

from io import BytesIO
import base64

from matplotlib.figure import Figure
import weasyprint

fig = Figure(...)
...
buffer = BytesIO()
fig.savefig(buffer, format='svg')
img_str = base64.b64encode(buffer.getbuffer()).decode("ascii")
html_image = f"<img src='data:image/svg+xml;base64,{img_str}' />"
weasyprint.HTML(string=f"<html><head></head><body>{html_image}</body></html>").write_pdf(...)
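A minimal sketch of the same data-URI embedding, factored into two helpers so that several figures can go into one document. The names svg_data_uri and html_with_images are hypothetical and not part of WeasyPrint or matplotlib. Only base64 from the standard library is used.

```python
import base64


def svg_data_uri(svg_bytes: bytes) -> str:
    """Encode raw SVG bytes as a base64 data URI for an <img src=...> attribute."""
    encoded = base64.b64encode(svg_bytes).decode("ascii")
    return f"data:image/svg+xml;base64,{encoded}"


def html_with_images(data_uris) -> str:
    """Build a minimal HTML document with one <img> per data URI."""
    images = "".join(f"<img src='{uri}' />" for uri in data_uris)
    return f"<html><head></head><body>{images}</body></html>"
```

With the snippet above, html_with_images([svg_data_uri(buffer.getvalue())]) produces the same kind of string that is passed to weasyprint.HTML(string=...).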

We also tried inline SVG tags in the HTML (without base64 encoding) instead, but this was far too slow as well. There is also an issue with namespaces (an ElementTree.fromstring exception), so we had to use some regex substitutions to remove the metadata and namespace data from the SVG XML.

cProfile results for the write_pdf method (the underlying draw methods were called 15 times each):

  • Prior to 53.0:
    time: 8602 ms, 31.7%
  • After the 53.* update:
    time: 28502 ms, 71.4%
  • Using html svg tags:
    time: 23376 ms, 68.7%

These numbers got worse the more SVG images were included.
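Measurements like these can be collected with the standard library's cProfile and pstats modules. The sketch below is an assumption about how such numbers might be gathered, not the reporter's actual setup; profile_call is a hypothetical helper.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func(*args, **kwargs) under cProfile; return (result, report text).

    The report lists the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()
```

For example, profile_call(weasyprint.HTML(string=html).write_pdf) profiles a single rendering; called without a target, write_pdf returns the PDF as bytes.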

@liZe liZe added the performance Too slow renderings label Sep 6, 2021
@liZe
Member

liZe commented Sep 6, 2021

Hello!

Thanks for this bug report.

I’ve tried to reproduce the problem with a small SVG sample (included many times in the HTML sample), and I see no significant difference between 52.x and 53.x.

So, I’ve tried a random matplotlib graph and… I got no difference either 😒.

So… your problem probably comes from something special in the SVG files you generate. Could you please share one of the SVG files you use?

@SvenBecker
Author

SvenBecker commented Sep 8, 2021

Hey, thank you for the fast response. I also did some testing:

import base64
import time
import io

import numpy as np
import weasyprint
from matplotlib.figure import Figure

np.random.seed(7)
data = np.random.randint(0, 1_000_000, size=(10_000, 2))

fig = Figure()
ax = fig.add_subplot()
ax.plot(data)


buffer_svg = io.BytesIO()
fig.savefig(buffer_svg, format='svg')
img_data_svg = base64.b64encode(buffer_svg.getbuffer()).decode("ascii")
html_str_svg = f"""
<html>
  <head></head>
  <body>
    <img src='data:image/svg+xml;base64,{img_data_svg}' />
  </body>
</html>
"""
html_svg = weasyprint.HTML(string=html_str_svg)

duration = []
for _ in range(10):
    start = time.time()
    html_svg.write_pdf('example.pdf')
    duration.append(time.time() - start)
print(f'svg stats\navg={sum(duration) / len(duration)}, min={min(duration)}, max={max(duration)}')

buffer_png = io.BytesIO()
fig.savefig(buffer_png, format='png')
img_data_png = base64.b64encode(buffer_png.getbuffer()).decode("ascii")
html_str_png = f"""
<html>
  <head></head>
  <body>
    <img src='data:image/png;base64,{img_data_png}' />
  </body>
</html>
"""
html_png = weasyprint.HTML(string=html_str_png)

duration = []
for _ in range(10):
    start = time.time()
    html_png.write_pdf('example.pdf')
    duration.append(time.time() - start)
print(f'\npng stats\navg={sum(duration) / len(duration)}, min={min(duration)}, max={max(duration)}')

WeasyPrint version 53.3:

svg stats
avg=0.7405799388885498, min=0.6415700912475586, max=0.857792854309082

png stats
avg=0.30138554573059084, min=0.24124979972839355, max=0.3395390510559082

WeasyPrint version 52:

svg stats
avg=0.49631505012512206, min=0.429671049118042, max=0.644413948059082

png stats
avg=0.04460523128509521, min=0.03843402862548828, max=0.05540776252746582
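The three timing loops in this comment follow the same pattern, so a small helper could replace them. This is a sketch with hypothetical names (benchmark, slowdown). It uses time.perf_counter, a monotonic clock intended for timing short intervals, instead of time.time. Applied to the averages above, slowdown() gives roughly 1.49× for SVG and 6.76× for PNG between version 52 and 53.3.

```python
import statistics
import time


def benchmark(func, runs=10):
    """Call func() `runs` times and return avg/min/max wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic clock intended for timing short intervals
        func()
        durations.append(time.perf_counter() - start)
    return {"avg": statistics.mean(durations), "min": min(durations), "max": max(durations)}


def slowdown(new_seconds, old_seconds):
    """How many times slower the new timing is (values above 1 mean a regression)."""
    return new_seconds / old_seconds
```

benchmark(lambda: html_svg.write_pdf('example.pdf')) reproduces the first loop, and slowdown(0.7405799388885498, 0.49631505012512206) compares the two SVG averages.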

Using inline SVG (WeasyPrint 53.2 only):

buffer_svg_xml = io.StringIO()
fig.savefig(buffer_svg_xml, format='svg')
img_data_svg_xml = f"<svg {buffer_svg_xml.getvalue().split('<svg ')[1]}"
html_str_svg = f"""
<html>
  <head></head>
  <body>
    {img_data_svg_xml}
  </body>
</html>
"""
html_svg_xml = weasyprint.HTML(string=html_str_svg)

duration = []
for _ in range(10):
    start = time.time()
    html_svg_xml.write_pdf('example.pdf')
    duration.append(time.time() - start)
print(f'svg xml stats\navg={sum(duration) / len(duration)}, min={min(duration)}, max={max(duration)}')
...
WARNING:weasyprint:Ignored `fill:none` at 1:1, unknown property.
WARNING:weasyprint:Ignored `stroke:#000000` at 1:11, unknown property.
WARNING:weasyprint:Ignored `stroke-linecap:square` at 1:26, unknown property.
ERROR:weasyprint:Failed to load inline SVG: prefix must not be bound to one of the reserved namespace names: line 1, column 0

=> Adjustments so that the error is not raised:

import re

buffer_svg_xml = io.StringIO()
fig.savefig(buffer_svg_xml, format='svg')
img_data_svg_xml = f"<svg {buffer_svg_xml.getvalue().split('<svg ')[1]}"
img_data_svg_xml = img_data_svg_xml.replace(' xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"', '')
img_data_svg_xml = re.sub(r'(<metadata>[\S\s]*?</metadata>)', '', img_data_svg_xml)
html_str_svg = f"""
<html>
  <head></head>
  <body>
    {img_data_svg_xml}
  </body>
</html>
"""
html_svg_xml = weasyprint.HTML(string=html_str_svg)

duration = []
for _ in range(10):
    start = time.time()
    html_svg_xml.write_pdf('example.pdf')
    duration.append(time.time() - start)
print(f'svg xml stats\navg={sum(duration) / len(duration)}, min={min(duration)}, max={max(duration)}')

svg xml stats
avg=0.9133987426757812, min=0.8300449848175049, max=0.9757318496704102
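The same workaround could be wrapped in a function that does not depend on the exact order in which matplotlib writes the namespace attributes. This is a sketch assuming the input is a complete SVG file as produced by fig.savefig(..., format='svg'); inline_svg is a hypothetical name. Like the workaround above, it only removes the xmlns/xmlns:xlink declarations on the root element and the <metadata> block.

```python
import re

# xmlns="..." and xmlns:xlink="..." attributes, as written by matplotlib on the root <svg>.
NAMESPACE_ATTRS = re.compile(r'\s+xmlns(?::xlink)?="[^"]*"')
# The <metadata>...</metadata> block, which may span many lines.
METADATA_BLOCK = re.compile(r"<metadata>.*?</metadata>", re.DOTALL)


def inline_svg(svg_document: str) -> str:
    """Return the <svg> element of a full SVG file, without root namespaces or metadata."""
    start = svg_document.find("<svg ")
    if start == -1:
        raise ValueError("no <svg> root element found")
    svg = svg_document[start:]  # drops the XML declaration, DOCTYPE and comments before <svg>
    root_end = svg.find(">") + 1  # end of the opening <svg ...> tag
    root_tag = NAMESPACE_ATTRS.sub("", svg[:root_end])  # strip namespaces on the root only
    return METADATA_BLOCK.sub("", root_tag + svg[root_end:])
```

html_str_svg can then be built from inline_svg(buffer_svg_xml.getvalue()).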

@liZe
Member

liZe commented Sep 8, 2021

Thanks a lot for this benchmark.

The PNG part of the benchmark doesn’t surprise me so much, even if I would have expected better results (our benchmarks were not that bad). The way images are managed changed a lot in version 53, and we’ve seen a lot of (often bad) results regarding memory and speed for raster images. For sure, there’s room for improvement here, and dropping Cairo is definitely not a valid excuse to explain this huge gap.

The SVG part is more surprising. CairoSVG is often faster than the implementation we now use, but most of the code is the same and the difference shouldn’t be that large. Moreover, I don’t reproduce the problem with the SVG file referenced earlier. I thought that the high number of points was causing the problem, but I can definitely reproduce it with only 10 points 😒. There’s something different in this plot, and I’ll find out what!

So…

Could you please open new issues about:

  • raster image performance
  • SVG prefix crashing the rendering (we’ve silently ignored this one for too long)

(If you don’t want to follow these issues, tell me, I’ll open them myself.)

We’ll keep this issue for the slow rendering of your plot.

@liZe
Member

liZe commented Sep 8, 2021

> Moreover, I don’t reproduce the problem with the SVG file referenced earlier.

That’s just wrong, I have the problem with the other plot too.

@liZe
Member

liZe commented Sep 8, 2021

OK, I’ve found one of the causes of the problem.

Replacing the image with a letter shows the same behavior. The problem is not the image, it’s the font. That was the second point listed in the article linked previously, I should have read it again 😉.

The font is optimized with fonttools, which is obviously much slower than Cairo. The overhead is usually invisible because it’s paid once per font, but for very short documents it’s significant.

Not optimizing fonts (passing optimize_size=() to write_pdf()) makes the rendering time for JPG files similar to v52 (it’s actually faster with v53), but the generated PDF is larger.
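To check this trade-off on a given document, both settings can be timed and the resulting PDF sizes compared. This is a sketch assuming the WeasyPrint 53/54 write_pdf signature, where optimize_size=('fonts',) is the default and () disables optimization. compare_font_optimization is a hypothetical helper; the bound write_pdf method is passed in, so the helper itself does not import WeasyPrint.

```python
import time


def compare_font_optimization(write_pdf, runs=5):
    """Best-of-`runs` time and PDF size with and without font optimization.

    write_pdf is expected to be a bound weasyprint.HTML(...).write_pdf method; called
    without a target it returns PDF bytes (WeasyPrint 53/54 API assumed).
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    results = {}
    for label, optimize_size in (("optimized fonts", ("fonts",)), ("no optimization", ())):
        best = float("inf")
        for _ in range(runs):
            start = time.perf_counter()
            pdf = write_pdf(optimize_size=optimize_size)
            best = min(best, time.perf_counter() - start)  # best-of-N reduces timing noise
        results[label] = {"best_seconds": best, "pdf_bytes": len(pdf)}
    return results
```

For example, compare_font_optimization(weasyprint.HTML(string=html_str_svg).write_pdf) should show a shorter time but a larger pdf_bytes for "no optimization".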

For PNG files, there’s still a gap. It’s caused by the PNG to JPEG2000 conversion done by Pillow. A separate issue would be useful if we want to "fix" this use case.

For the SVG (that’s the main point of this issue), I have a gap too:

  • 2.2 times slower for version 53
  • 1.6 times slower for version 53 without optimized fonts

I’ll try to find why.

@SvenBecker
Author

Ok, thank you for the response. I opened two new issues. Btw, is there some documentation on how to use optimize_size or image_cache? I was not really aware of them and now I'm not really sure how to use them.

@liZe
Member

liZe commented Sep 10, 2021

> Ok, thank you for the response. I opened two new issues.

Thank you.

> Btw, is there some documentation on how to use optimize_size or image_cache? I was not really aware of them and now I'm not really sure how to use them.

You’re right, the documentation is not really clear about that. We’ll have to add a chapter about these parameters.
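Until such a chapter exists, here is a hedged sketch of how the two parameters might be combined when rendering several documents in one process. It assumes the WeasyPrint 53/54 API, in which write_pdf() accepts optimize_size (a tuple that may contain 'fonts' and 'images') and image_cache (a dict that can be shared between documents). render_reports is a hypothetical helper, and the HTML class is passed in so that the function can be tested without WeasyPrint.

```python
def render_reports(html_strings, html_class, optimize_fonts=True):
    """Render several HTML strings to PDF bytes, sharing one image cache between them.

    html_class is expected to be weasyprint.HTML; the keyword arguments below assume
    the WeasyPrint 53/54 write_pdf() signature (optimize_size, image_cache).
    """
    image_cache = {}  # one dict reused by every document, so loaded images can be reused
    # ('fonts',) optimizes embedded fonts (smaller PDF, slower); () skips optimization.
    optimize_size = ("fonts",) if optimize_fonts else ()
    pdfs = []
    for html_string in html_strings:
        document = html_class(string=html_string)
        # Without a target, write_pdf() returns the PDF as bytes.
        pdfs.append(document.write_pdf(optimize_size=optimize_size, image_cache=image_cache))
    return pdfs
```

For instance, render_reports(pages, weasyprint.HTML, optimize_fonts=False) favors speed over file size, which matters most for very short documents.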

liZe added a commit that referenced this issue Sep 10, 2021
@liZe
Member

liZe commented Sep 11, 2021

The speed regression is caused by the use tags. CairoSVG is much faster at finding the referenced tag, maybe because it has a cache, maybe for another reason.

I’ll try to find a way to make use tags faster.

@liZe liZe closed this as completed in 37af386 Sep 11, 2021
@liZe liZe added this to the 54.0 milestone Sep 11, 2021
@liZe
Member

liZe commented Sep 11, 2021

Caching use tags brings the speed closer to what it was. There’s still room for improvement, but I think that it’s enough for this performance problem.
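The idea behind caching use tags can be illustrated with the standard library's ElementTree. Instead of walking the whole SVG tree each time a <use> element is resolved, build an id → element index once and look references up in it. This is a minimal sketch of the general technique, not the code of commit 37af386; UseResolver and find_by_id_uncached are hypothetical names.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"  # ElementTree's name for xlink:href


def find_by_id_uncached(root, element_id):
    """Walk the whole tree on every call: O(n) per resolved <use>."""
    for element in root.iter():
        if element.get("id") == element_id:
            return element
    return None


class UseResolver:
    """Index every element by id once, then resolve each <use> with a dict lookup."""

    def __init__(self, root):
        self.index = {}
        for element in root.iter():
            element_id = element.get("id")
            if element_id is not None:
                # setdefault keeps the first element with a given id, like the linear search.
                self.index.setdefault(element_id, element)

    def resolve(self, use_element):
        # SVG 2 allows a plain href; SVG 1.1 files use xlink:href.
        href = use_element.get("href") or use_element.get(XLINK_HREF)
        if not href or not href.startswith("#"):
            return None  # only same-document references are handled in this sketch
        return self.index.get(href[1:])
```

With n elements and m <use> tags, repeated linear searches cost O(n·m) while the index costs O(n + m). matplotlib's SVG output draws markers, including tick marks, with <use> elements that reference definitions in <defs>.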
