Using html5ever in wasm package for an isomorphic html sanitizer #497

Open
dejang opened this issue May 6, 2023 · 3 comments

Comments

@dejang

dejang commented May 6, 2023

Hello,

I am looking at ways to build an HTML5 sanitizer that can run in browser, Node.js, and Java environments, with Java being the lowest priority at the moment. The most important requirement is that it must not rely on a DOM in any of these environments. I came across html5ever, and it looks like the perfect tool for my scenario, with the added benefit that it is part of the Servo project.

For browser and Node.js environments I would have to produce WASM artifacts. In Node.js this keeps multi-platform distribution simple, and it also covers environments where I may not be able to load native Node.js binary plugins. For browsers and mobile WebViews there is no option other than a WASM artifact. These are the constraints on distribution, and I am fine with them.

I am using Rust to build the sanitizer, so everything stays in one programming language throughout development, which keeps things easy to manage.

Currently, when compiling html5ever to WASM I get an output of 450kb, even after running it through wasm-opt with very aggressive size optimizations. Unfortunately, that is far too big for the Web. Ideally, if it were around 50kb, html5ever would be a much more desirable alternative to existing JavaScript sanitizers for the browser.

I would like to ask whether there is a way either to compile html5ever to WASM so that I reach my target size or, alternatively, to use only the parser features I currently need, in the hope that this shaves off a considerable amount of code.
My main scenario is the following: given a string containing HTML, produce a DOM tree that can be traversed to identify the tags, attributes, and attribute values that should be eliminated, then return the sanitized HTML as a string.
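
To make the traverse-and-filter step concrete, here is a minimal, std-only sketch. It assumes the HTML has already been parsed into a tree; the `Node` enum, the `Policy` struct, and the function names are hypothetical stand-ins for illustration and are not html5ever's API. The `javascript:` check is only an example of filtering on an attribute value, not a complete defense.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified DOM node (a stand-in for a real parser's tree).
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Element {
        tag: String,
        attrs: Vec<(String, String)>,
        children: Vec<Node>,
    },
    Text(String),
}

/// Hypothetical allowlist policy: anything not listed is removed.
struct Policy {
    allowed_tags: HashSet<&'static str>,
    allowed_attrs: HashSet<&'static str>,
}

/// Recursively drop disallowed elements (with their subtrees) and attributes.
fn sanitize(nodes: Vec<Node>, policy: &Policy) -> Vec<Node> {
    nodes
        .into_iter()
        .filter_map(|node| match node {
            Node::Text(text) => Some(Node::Text(text)),
            Node::Element { tag, attrs, children } => {
                if !policy.allowed_tags.contains(tag.as_str()) {
                    return None; // drop the element and everything inside it
                }
                let attrs = attrs
                    .into_iter()
                    .filter(|(name, value)| {
                        policy.allowed_attrs.contains(name.as_str())
                            // Illustrative value check only.
                            && !value
                                .trim_start()
                                .to_ascii_lowercase()
                                .starts_with("javascript:")
                    })
                    .collect();
                Some(Node::Element {
                    tag,
                    attrs,
                    children: sanitize(children, policy),
                })
            }
        })
        .collect()
}

/// Escape characters that are significant in HTML text and attribute values.
fn escape(input: &str, out: &mut String) {
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            other => out.push(other),
        }
    }
}

/// Serialize the tree back to a string. Void elements such as <br> are not
/// special-cased in this sketch; every element gets a closing tag.
fn serialize(nodes: &[Node], out: &mut String) {
    for node in nodes {
        match node {
            Node::Text(text) => escape(text, out),
            Node::Element { tag, attrs, children } => {
                out.push('<');
                out.push_str(tag);
                for (name, value) in attrs {
                    out.push(' ');
                    out.push_str(name);
                    out.push_str("=\"");
                    escape(value, out);
                    out.push('"');
                }
                out.push('>');
                serialize(children, out);
                out.push_str("</");
                out.push_str(tag);
                out.push('>');
            }
        }
    }
}

/// Sanitize a parsed tree and return the resulting HTML string.
fn sanitize_to_string(nodes: Vec<Node>, policy: &Policy) -> String {
    let mut out = String::new();
    serialize(&sanitize(nodes, policy), &mut out);
    out
}
```

In a real build, the tree would come from the parser rather than being constructed by hand, and the policy would need far more care than a pair of allowlists.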

Thank you for taking the time to read this issue. Hopefully, with your help, I'll be able to use html5ever to achieve my goals.

@jdm
Member

jdm commented May 6, 2023

I have no experience with attempting to minimize wasm builds, so I can't provide any assistance there. Html5ever is designed to follow a specific parsing algorithm that is web-compatible, and I'm unaware of any optional features that can be disabled as a result.

@tetsuharuohzeki

@dejang

From my limited experience with minimizing wasm build size, at least with Rust v1.69, enabling the lto option reduces the size more aggressively than post-processing with wasm-opt. Either full LTO or thin LTO is fine.
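
As a rough sketch, a size-oriented release profile in `Cargo.toml` might look like the following. Only the `lto` key corresponds to what is described above; the other keys are standard Cargo profile settings that are commonly combined with it when optimizing for size, and they are assumptions here rather than settings anyone in this thread confirmed using.

```toml
[profile.release]
lto = true          # full ("fat") LTO; "thin" is the alternative mentioned above
opt-level = "z"     # assumption: optimize for size rather than speed
codegen-units = 1   # assumption: fewer units give the optimizer a wider view
panic = "abort"     # assumption: omit unwinding machinery
strip = true        # assumption: strip symbols from the output
```

How much each setting helps depends on the crate and the toolchain version, so the resulting sizes should be measured rather than assumed.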

I hope you try it out :)

@dejang
Author

dejang commented May 24, 2023

@tetsuharuohzeki I was using nightly with LTO enabled for this one. It seems 450kb is the best I could do after a bit of fiddling with the optimization settings.
