Improve performance of DefaultEmitter #35

Open
untitaker opened this issue Feb 2, 2022 · 5 comments
Labels
bug (Something isn't working), good first issue (Good for newcomers)

Comments

@untitaker
Owner

While implementing lycheeverse/lychee#480 I realized how slow the default emitter really is. It makes link extraction 10-40% slower than html5ever. Currently it is not really possible to beat html5ever at all unless a custom emitter is implemented.

We could:

  • build another emitter that reuses strings and calls a callback with borrowed strings instead, making it much closer to lol-html's API.
  • allow for custom allocators for all the strings we create, similar to the strtendril magic html5ever does (but definitely not using that crate)
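
To make the first option concrete, here is a rough sketch of the idea, not html5gum's actual API: `CallbackEmitter`, `push_char`, `emit` and `scan_quoted` are all hypothetical names, and the "tokenizer" is a toy that only looks for double-quoted runs. The point is just that one buffer is reused for every string and the callback only ever sees a borrowed `&str`.

```rust
/// Hypothetical emitter: owns one scratch buffer and hands out borrows of it.
/// None of these names are html5gum's real API.
pub struct CallbackEmitter<F: FnMut(&str)> {
    buf: String,  // reused for every emitted string
    on_string: F, // receives a borrow that is only valid during the call
}

impl<F: FnMut(&str)> CallbackEmitter<F> {
    pub fn new(on_string: F) -> Self {
        CallbackEmitter { buf: String::new(), on_string }
    }

    /// Append one character to the string currently being built.
    pub fn push_char(&mut self, c: char) {
        self.buf.push(c);
    }

    /// Hand the finished string to the callback, then clear the buffer.
    /// `String::clear` keeps the capacity, so later strings reuse the allocation.
    pub fn emit(&mut self) {
        (self.on_string)(&self.buf);
        self.buf.clear();
    }
}

/// Toy stand-in for a tokenizer: reports every double-quoted run in `input`.
/// An unterminated quote at the end of the input is silently dropped.
pub fn scan_quoted<F: FnMut(&str)>(input: &str, on_string: F) {
    let mut emitter = CallbackEmitter::new(on_string);
    let mut in_quotes = false;
    for c in input.chars() {
        match (in_quotes, c) {
            (false, '"') => in_quotes = true,
            (true, '"') => {
                in_quotes = false;
                emitter.emit();
            }
            (true, _) => emitter.push_char(c),
            (false, _) => {}
        }
    }
}
```

A caller that only wants to count or filter strings never allocates per string; a caller that needs to keep one has to copy it explicitly (e.g. `s.to_owned()` inside the callback), which is the tradeoff compared to an emitter that returns owned tokens.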

@untitaker added the bug and good first issue labels Feb 2, 2022
@untitaker changed the title from "Improve performance ot DefaultEmitter" to "Improve performance of DefaultEmitter" Feb 2, 2022

@untitaker
Owner Author

untitaker commented Feb 4, 2022

@lebensterben if you're looking to contribute to html5gum, I think this would be a good start. It would create some overlap between lychee contributors and html5gum contributors, and it provides immediate value to lychee and hyperlink (if performance improves by a sufficient margin). Let me know if you are interested; I'm also happy to answer any questions.

@Ygg01

Ygg01 commented May 25, 2022

allow for custom allocators for all the strings we create, similar to the strtendril magic html5ever does (but definitely not using that crate)

Why not? Is strtendril not working?

@untitaker
Owner Author

@Ygg01 while lurking in the Servo Zulip and the public issue tracker, my impression was that the author, Simon Sapin, would like to rewrite the library or get rid of its usage in html5ever, e.g. servo/tendril#58

I also think it's fine if users of html5gum get only passable performance out of the box, as long as there are options to tweak things for optimal performance.

@Ygg01

Ygg01 commented May 27, 2022

Hmm, perhaps it's possible to use zbuf from the html5ever branch? Maybe fork it or something.

@untitaker
Owner Author

I believe that long-term, html5gum should implement a tree builder + DOM, so it may be wiser to postpone the discussion about string allocation until after then.


2 participants