Very slow svg rendering since version 53.0 #1439
Hello! Thanks for this bug report. I’ve tried to reproduce the problem with a small SVG sample (included many times in the HTML sample), and I see no significant difference between 52.x and 53.x. So I’ve tried with a random matplotlib graph and … I got no difference either 😒. So your problem probably comes from something special in the SVG files you generate. Could you please share one of the SVG files you use?
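A reproduction document like the one described above, with one small SVG repeated many times, can be generated with a few lines of standard-library Python. This is a sketch: the tiny circle SVG and the `repeated_svg_html` helper are made up for illustration, and the images are embedded as base64 data URIs the same way the reporter embeds their plots.

```python
import base64

# A tiny hand-written SVG used as a stand-in sample (assumption: any small,
# valid SVG works for this kind of repetition test).
SAMPLE_SVG = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="20" height="20">'
    '<circle cx="10" cy="10" r="8" fill="steelblue"/></svg>'
)


def repeated_svg_html(svg: str, copies: int) -> str:
    """Return an HTML page embedding `svg` `copies` times as base64 data URIs (hypothetical helper)."""
    data = base64.b64encode(svg.encode("utf-8")).decode("ascii")
    img = f"<img src='data:image/svg+xml;base64,{data}' />"
    body = "\n".join(img for _ in range(copies))
    return f"<html><head></head><body>\n{body}\n</body></html>"


html = repeated_svg_html(SAMPLE_SVG, 100)
# Render `html` with weasyprint.HTML(string=html).write_pdf(...) under each
# WeasyPrint version and compare the timings.
```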
Hey, thank you for the fast response. I also did some testing:

```python
import base64
import time
import io

import numpy as np
import weasyprint
from matplotlib.figure import Figure

# Random line plot: two series of 10,000 points each
np.random.seed(7)
data = np.random.randint(0, 1_000_000, size=(10_000, 2))
fig = Figure()
ax = fig.add_subplot()
ax.plot(data)

# Variant 1: the figure as a base64 SVG data URI
buffer_svg = io.BytesIO()
fig.savefig(buffer_svg, format='svg')
img_data_svg = base64.b64encode(buffer_svg.getbuffer()).decode("ascii")
html_str_svg = f"""
<html>
<head></head>
<body>
<img src='data:image/svg+xml;base64,{img_data_svg}' />
</body>
</html>
"""
html_svg = weasyprint.HTML(string=html_str_svg)
duration = []
for _ in range(10):
    start = time.time()
    html_svg.write_pdf('example.pdf')
    duration.append(time.time() - start)
print(f'svg stats\navg={sum(duration) / len(duration)}, min={min(duration)}, max={max(duration)}')

# Variant 2: the same figure as a base64 PNG data URI
buffer_png = io.BytesIO()
fig.savefig(buffer_png, format='png')
img_data_png = base64.b64encode(buffer_png.getbuffer()).decode("ascii")
html_str_png = f"""
<html>
<head></head>
<body>
<img src='data:image/png;base64,{img_data_png}' />
</body>
</html>
"""
html_png = weasyprint.HTML(string=html_str_png)
duration = []
for _ in range(10):
    start = time.time()
    html_png.write_pdf('example.pdf')
    duration.append(time.time() - start)
print(f'\npng stats\navg={sum(duration) / len(duration)}, min={min(duration)}, max={max(duration)}')
```

WeasyPrint version 53.3:
svg stats
png stats

WeasyPrint version 52:
svg stats
png stats

Usage of inline svg (WeasyPrint 53.2 only):

```python
buffer_svg_xml = io.StringIO()
fig.savefig(buffer_svg_xml, format='svg')
# Keep only the <svg> element, dropping the XML declaration and DOCTYPE
img_data_svg_xml = f"<svg {buffer_svg_xml.getvalue().split('<svg ')[1]}"
html_str_svg = f"""
<html>
<head></head>
<body>
{img_data_svg_xml}
</body>
</html>
"""
html_svg_xml = weasyprint.HTML(string=html_str_svg)
duration = []
for _ in range(10):
    start = time.time()
    html_svg_xml.write_pdf('example.pdf')
    duration.append(time.time() - start)
print(f'svg xml stats\navg={sum(duration) / len(duration)}, min={min(duration)}, max={max(duration)}')
```
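Each variant above repeats the same timing loop. A small helper can factor it out. This is a sketch, not part of the original benchmark. `benchmark` is a hypothetical name, and it uses `time.perf_counter()`, which is the standard-library clock intended for measuring intervals, in place of `time.time()`.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], runs: int = 10) -> Dict[str, float]:
    """Call `fn` `runs` times; return avg/min/max wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return {
        "avg": statistics.mean(durations),
        "min": min(durations),
        "max": max(durations),
    }


# With WeasyPrint installed and the objects from the snippets above:
# stats = benchmark(lambda: html_svg_xml.write_pdf("example.pdf"))
# print(f"svg xml stats\navg={stats['avg']}, min={stats['min']}, max={stats['max']}")
stats = benchmark(lambda: sum(range(10_000)), runs=5)
```

Passing a zero-argument callable keeps the helper independent of WeasyPrint, so the same function times the SVG, PNG and inline variants.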
=> Adjustments so the error does not get thrown:

```python
import re

buffer_svg_xml = io.StringIO()
fig.savefig(buffer_svg_xml, format='svg')
img_data_svg_xml = f"<svg {buffer_svg_xml.getvalue().split('<svg ')[1]}"
# Remove the namespace declarations from the root <svg> element
img_data_svg_xml = img_data_svg_xml.replace(' xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"', '')
# Remove the <metadata> block
img_data_svg_xml = re.sub(r'(<metadata>[\S\s]*?</metadata>)', '', img_data_svg_xml)
html_str_svg = f"""
<html>
<head></head>
<body>
{img_data_svg_xml}
</body>
</html>
"""
html_svg_xml = weasyprint.HTML(string=html_str_svg)
duration = []
for _ in range(10):
    start = time.time()
    html_svg_xml.write_pdf('example.pdf')
    duration.append(time.time() - start)
print(f'svg xml stats\navg={sum(duration) / len(duration)}, min={min(duration)}, max={max(duration)}')
```

svg xml stats
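The `str.replace` call above only matches when `xmlns` and `xmlns:xlink` appear in exactly that order and spacing, which depends on the matplotlib version. The sketch below bundles the same three clean-up steps into one function that does not depend on attribute order. `svg_for_inline_html` is a hypothetical helper. It assumes, as matplotlib's output does, that the root `<svg ...>` tag contains no `>` inside attribute values, and it does not address why WeasyPrint rejects the namespaced markup in the first place.

```python
import re

# Matches xmlns="..." or xmlns:xlink="..." together with the preceding whitespace
SVG_NS_ATTRS = re.compile(r'\s+xmlns(?::xlink)?="[^"]*"')
# Matches the whole <metadata>...</metadata> block, across newlines
METADATA = re.compile(r"<metadata>.*?</metadata>", re.DOTALL)


def svg_for_inline_html(svg_text: str) -> str:
    """Prepare an SVG file's text for pasting inline into HTML (hypothetical helper).

    Drops everything before the root <svg> element (XML declaration, DOCTYPE),
    strips xmlns / xmlns:xlink from the root tag only, and removes <metadata>.
    """
    start = svg_text.find("<svg")
    if start == -1:
        raise ValueError("no <svg> element found")
    svg = svg_text[start:]
    root_end = svg.find(">") + 1  # end of the root start tag
    root_tag = SVG_NS_ATTRS.sub("", svg[:root_end])
    return METADATA.sub("", root_tag + svg[root_end:])


# Usage with the buffer from the snippet above:
# img_data_svg_xml = svg_for_inline_html(buffer_svg_xml.getvalue())
```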
Thanks a lot for this benchmark. The PNG part of the benchmark doesn’t surprise me much, although I would have expected better results (our own benchmarks were not that bad). The way images are managed changed a lot in version 53, and we’ve seen a lot of (often bad) results regarding memory and speed for raster images. For sure, there’s room for improvement here, and dropping Cairo is definitely not a valid excuse for such a huge gap. The SVG part is more surprising. CairoSVG is often faster than the implementation we now use, but most of the code is the same and the difference shouldn’t be that large. So… Could you please open new issues about:
(If you don’t want to follow these issues, tell me and I’ll open them myself.) We’ll keep this issue for the slow rendering of your plot.

That’s just wrong, I have the problem with the other plot too.
OK, I’ve found one of the causes of the problem. Replacing the image with a letter shows the same behavior: the problem is not the image, it’s the font. That was the second point listed in the article linked previously, I should have read it again 😉. The font is optimized with fonttools, which is obviously much slower than Cairo. The overhead is generally invisible because it’s paid only once per font, but for very short documents it dominates. Not optimizing fonts (passing

For PNG files, there’s still a gap. It’s caused by the PNG to JPEG2000 conversion done by Pillow. A separate issue would be useful if we want to "fix" this use case.

For the SVG (that’s the main point of this issue), I have a gap too:

I’ll try to find out why.
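The font explanation is a fixed-versus-variable cost argument: a cost paid once per font weighs heavily on a one-page document and fades as the document grows. The calculation below illustrates it with hypothetical timings (0.5 s one-off, 0.05 s per page); these are not measurements from this issue, and `fixed_overhead_share` is a made-up name.

```python
def fixed_overhead_share(fixed_s: float, per_page_s: float, pages: int) -> float:
    """Fraction of total render time spent on a one-off cost such as font optimization."""
    return fixed_s / (fixed_s + per_page_s * pages)


# Hypothetical timings for illustration only, not measured values.
FIXED_S = 0.5       # one-off cost, e.g. optimizing an embedded font once
PER_PAGE_S = 0.05   # cost that grows with the number of pages

for pages in (1, 10, 100):
    share = fixed_overhead_share(FIXED_S, PER_PAGE_S, pages)
    print(f"{pages:>3} page(s): {share:.1%} of the time is the one-off cost")
```

With these numbers, the one-off share drops from about 91% for a single page to 50% at 10 pages and about 9% at 100 pages.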
OK, thank you for the response. I opened two new issues. By the way, is there any documentation on how to use
Thank you.
You’re right, the documentation is not really clear about that. We’ll have to add a chapter about these parameters.

The speed regression is caused by the I’ll try to find a way to make

Caching
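One generic way to avoid repeating the same expensive work for identical images is to memoize it by the raw image bytes. The sketch below shows that idea with `functools.lru_cache` and ElementTree. It is not WeasyPrint’s actual code or fix: `parse_svg` and `PARSE_CALLS` are hypothetical names, and because the cached `Element` is shared between callers, they must treat it as read-only.

```python
import functools
import xml.etree.ElementTree as ET

PARSE_CALLS = 0  # counts real parses, to show the cache working


@functools.lru_cache(maxsize=128)
def parse_svg(svg_bytes: bytes) -> ET.Element:
    """Parse SVG bytes once; identical bytes are served from the cache (hypothetical helper)."""
    global PARSE_CALLS
    PARSE_CALLS += 1
    return ET.fromstring(svg_bytes)


svg = b'<svg xmlns="http://www.w3.org/2000/svg"><rect width="1" height="1"/></svg>'
for _ in range(15):  # e.g. the same plot included 15 times
    tree = parse_svg(svg)
print(parse_svg.cache_info())  # CacheInfo(hits=14, misses=1, maxsize=128, currsize=1)
```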
The time it took to create a pdf with multiple svg images increased drastically.
We used to create some figures via matplotlib and base64 encode those images, so we do not have to create image files:
We also tried to use inline svg tags (without base64 encoding) instead, but this took far too long as well (plus there is an issue with namespaces (ElementTree.fromstring exception), so we had to use some regex substitutions to remove the metadata and namespace data from the svg xml).
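The namespace handling involved here is easier to see in isolation. This standard-library sketch uses a made-up two-element SVG, not the actual matplotlib output, to show how `xml.etree.ElementTree` stores namespaced tags and attributes. It does not reproduce the WeasyPrint exception itself.

```python
import xml.etree.ElementTree as ET

svg_text = (
    '<svg xmlns="http://www.w3.org/2000/svg" '
    'xmlns:xlink="http://www.w3.org/1999/xlink" width="10" height="10">'
    '<defs><path id="m" d="M0 0 L1 1"/></defs>'
    '<use xlink:href="#m" x="2" y="2"/></svg>'
)
root = ET.fromstring(svg_text)

# With a default namespace, ElementTree stores tags in "{uri}local" form.
print(root.tag)  # {http://www.w3.org/2000/svg}svg

# Plain tag names therefore match nothing; a prefix map is needed instead.
NS = {"svg": "http://www.w3.org/2000/svg", "xlink": "http://www.w3.org/1999/xlink"}
print(root.find("use"))  # None
use = root.find("svg:use", NS)

# Namespaced attributes are also keyed as "{uri}local".
print(use.get("{http://www.w3.org/1999/xlink}href"))  # #m
```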
Results from cProfile for the method `write_pdf` (the underlying `draw` methods got called 15 times each):
time: 8602 ms, 31.7%
time: 28502 ms, 71.4%
time: 23376 ms, 68.7%
Those stats got worse the more svg images were included.
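Profiles like the one above can be collected with the standard-library `cProfile` and `pstats` modules. The sketch below wraps that in a hypothetical `profile_call` helper and runs it on a stand-in function, so it works without WeasyPrint installed. It sorts by cumulative time, which is what surfaces expensive callees such as the `draw` methods.

```python
import cProfile
import io
import pstats


def profile_call(fn, *args, top: int = 10, **kwargs) -> str:
    """Run fn(*args, **kwargs) under cProfile; return the `top` entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(fn, *args, **kwargs)
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(top)
    return out.getvalue()


def slow_square_sum(n):
    """Stand-in workload for the sketch."""
    return sum(i * i for i in range(n))


report = profile_call(slow_square_sum, 100_000, top=5)
print(report)
# With WeasyPrint installed: print(profile_call(html_svg.write_pdf, "example.pdf"))
```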