Implement HTML 5 serialization in C #2596
I'm unsure where the downstream error …
I can't reproduce this locally. I guess I'll try on a Linux VM…but not tonight.
@stevecheckoway Thanks for doing this! The error you're seeing is because GitHub will rebase the PR onto current … I'm going to explicitly rebase this and update the get-struct macros, and then it should go green.
@stevecheckoway Looks like the downstream sanitize test suite is segfaulting: https://github.com/sparklemotion/nokogiri/runs/7392514046?check_suite_focus=true
I ran valgrind on the sanitize tests using nokogiri from this PR, and it says:
Looks like …
Oh, no, it's actually that fragments don't have a node name.
@stevecheckoway I pushed a commit, would love your feedback.
Some rough benchmarks:
So that's a pretty good 10x improvement, @stevecheckoway!!!
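For anyone who wants to get a rough number of their own, here is a minimal sketch of an iterations-per-second measurement using only core Ruby. The script used for the benchmarks in this thread isn't shown, so the `iterations_per_second` helper and the `gsub` workload are hypothetical stand-ins; swap in the serialization call you actually care about.

```ruby
# Hypothetical helper (not part of Nokogiri): run the block repeatedly for
# about `duration` seconds and report how many iterations completed per second.
def iterations_per_second(duration: 0.2)
  count = 0
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  deadline = start + duration
  while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
    yield
    count += 1
  end
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  count / elapsed
end

# Placeholder workload; replace it with the serializer call being measured.
html = "<p>hello</p>" * 1_000
rate = iterations_per_second { html.gsub("<", "&lt;") }
puts format("%.1f i/s", rate)
```

A single short run like this is noisy; a dedicated benchmarking tool with warmup and error estimates gives more trustworthy comparisons.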
I think I'd like to squash these commits but this approach looks good. My only changes were using …
HTML 5 serialization was previously done entirely in Ruby. The Ruby code is slow. This reimplements the serialization in C. Reencoding happens after UTF-8 serialization.

This is about 10x faster:

```
C - ruby 3.2.0dev (2022-07-18T21:06:30Z master 85ea46730d) [x86_64-linux]: 848.4 i/s
C - ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]: 812.0 i/s - same-ish: difference falls within error
ruby - ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) +YJIT [x86_64-linux]: 86.3 i/s - 9.83x (± 0.00) slower
ruby - ruby 3.2.0dev (2022-07-18T21:06:30Z master 85ea46730d) +YJIT [x86_64-linux]: 82.9 i/s - 10.24x (± 0.00) slower
ruby - ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]: 80.4 i/s - 10.55x (± 0.00) slower
ruby - ruby 3.2.0dev (2022-07-18T21:06:30Z master 85ea46730d) [x86_64-linux]: 74.7 i/s - 11.36x (± 0.00) slower
```

Fixes: #2569

Co-authored-by: Mike Dalessio <mike.dalessio@gmail.com>
I've squashed all the commits and updated the commit message with benchmark data. When this is green, I'd like to merge.
@stevecheckoway Any last-minute thoughts on this? Feel free to whack the "merge" button.
Your changes look good. Hopefully people can give it a test drive and shake out any bugs before the next release.
woo!
HTML 5 serialization was previously done entirely in Ruby.
The Ruby code is slow. This reimplements the serialization in C.
Reencoding happens after UTF-8 serialization.
Fixes: #2569
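To illustrate the "reencode after UTF-8 serialization" ordering, here is a minimal sketch in plain Ruby. It is not the code from this PR: `serialize_utf8` and `serialize` are hypothetical names, and the escaping is a toy stand-in for the real serializer. The only point is that output is always produced as UTF-8 first and converted once at the end.

```ruby
# Hypothetical stand-in for the serializer: it only ever emits UTF-8.
def serialize_utf8(tag, text)
  escaped = text.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
  "<#{tag}>#{escaped}</#{tag}>".encode(Encoding::UTF_8)
end

# Serialize to UTF-8 first, then reencode the finished string if the caller
# asked for a different output encoding.
def serialize(tag, text, encoding: Encoding::UTF_8)
  utf8 = serialize_utf8(tag, text)
  return utf8 if encoding == Encoding::UTF_8

  utf8.encode(encoding)
end

latin1 = serialize("p", "café & <b>", encoding: Encoding::ISO_8859_1)
puts latin1.encoding
```

Keeping the serializer UTF-8-only means the escaping logic never has to reason about other encodings; the conversion is a single pass over the final string. Characters that cannot be represented in the target encoding would need extra handling that this sketch leaves out.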
What problem is this PR intended to solve?
#2569
Have you included adequate test coverage?
Not yet.
Does this change affect the behavior of either the C or the Java implementations?
It should speed up the serialization of HTML 5 without a change in behavior.