WASM file size #557

Open
MartinKavik opened this issue Oct 19, 2019 · 13 comments

Comments

@MartinKavik

Hi,

When I add this library to a Seed app and then compile and optimize it, the output WASM file size grows from 466 KB to 745 KB (+279 KB).
Do you know how to reduce it? (feature flags, dependency changes, etc.)

Thank you!

@SimonSapin
Member

The Unicode tables used for IDNA come to mind, but there is no plan to make those optional.

https://github.com/rustwasm/twiggy is made to investigate exactly this kind of question. Let us know what you find.

@MartinKavik
Author

Thanks!

Let us know what you find.

WASM without rust-url
$ twiggy top -n 30 package_bg.wasm
 Shallow Bytes │ Shallow % │ Item
───────────────┼───────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
         51594 ┊    13.36% ┊ "function names" subsection
         48499 ┊    12.56% ┊ data[0]
         12522 ┊     3.24% ┊ todomvc::view::h97225a52d3c16381
          9025 ┊     2.34% ┊ seed::dom_entity_names::styles::style_names::St::as_str::h034572cb759631f0
          8152 ┊     2.11% ┊ <seed::events::Ev as alloc::string::ToString>::to_string::hcc1c264f86668900
          7330 ┊     1.90% ┊ seed::dom_types::At::as_str::h7851cbcb93205c52
          6673 ┊     1.73% ┊ todomvc::todo_item::h1df45a0a040d8b33
          6311 ┊     1.63% ┊ seed::patch::patch_els::h957d52591338c6a5
          5900 ┊     1.53% ┊ <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deserialize_struct::h59632fa41a2529e5
          4124 ┊     1.07% ┊ core::num::flt2dec::strategy::dragon::format_shortest::hc3fdbdc1b58d0107
          3958 ┊     1.03% ┊ core::num::flt2dec::strategy::dragon::mul_pow10::he029ff2298318da8
          3803 ┊     0.99% ┊ seed::vdom::App<Ms,Mdl,ElC,GMs>::rerender_vdom::hffd3fa7a285240be
          3587 ┊     0.93% ┊ seed::dom_types::Tag::as_str::hd5afd3749469c804
          3580 ┊     0.93% ┊ <seed::events::Ev as core::convert::From<&str>>::from::h61e1af94d01ac761
          3459 ┊     0.90% ┊ seed::vdom::App<Ms,Mdl,ElC,GMs>::run::h2f5f29613cf62e77
          3387 ┊     0.88% ┊ core::num::flt2dec::strategy::dragon::format_exact::hd48d2106360aebab
          3265 ┊     0.85% ┊ serde_json::read::parse_escape::h35d05d9b977340bf
          3160 ┊     0.82% ┊ dlmalloc::dlmalloc::Dlmalloc::malloc::hdeb1d8a00336bd7d
          2951 ┊     0.76% ┊ seed::vdom::App<Ms,Mdl,ElC,GMs>::process_cmd_and_msg_queue::h61adb145d03dbd19
          2555 ┊     0.66% ┊ serde_json::error::make_error::h1cc92dbd2c6d0ac9
          2368 ┊     0.61% ┊ seed::vdom::AppBuilder<Ms,Mdl,ElC,GMs>::finish::h5b4eaa26a1873c35
          2330 ┊     0.60% ┊ core::ptr::real_drop_in_place::h79875ed4f59e7acc
          2286 ┊     0.59% ┊ todomvc::_IMPL_SERIALIZE_FOR_Todo::<impl serde::ser::Serialize for todomvc::Todo>::serialize::h4580e1901b094566
          2285 ┊     0.59% ┊ core::ptr::real_drop_in_place::h7ae7c6f180dcdde7
          2255 ┊     0.58% ┊ core::num::flt2dec::strategy::grisu::format_shortest_opt::h3f9e6abbaaffd604
          2166 ┊     0.56% ┊ todomvc::update::h208900e1779716fc
          2078 ┊     0.54% ┊ seed::routing::push_route::h67bcc0c2821cfc75
          2030 ┊     0.53% ┊ <alloc::rc::Rc<T> as core::ops::drop::Drop>::drop::h5afa544d59e23393
          1970 ┊     0.51% ┊ todomvc::selection_li::hd214a0aee5f289be
          1948 ┊     0.50% ┊ seed::routing::setup_link_listener::{{closure}}::h5500e3ba84953659
        170489 ┊    44.16% ┊ ... and 892 more.
        386040 ┊   100.00% ┊ Σ [922 Total Rows]




$ twiggy top -n 30 --retained  package_bg.wasm
 Retained Bytes │ Retained % │ Item
────────────────┼────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────
         185236 ┊     47.98% ┊ table[0]
         143925 ┊     37.28% ┊ elem[2]
          55082 ┊     14.27% ┊ todomvc::view::h97225a52d3c16381
          51594 ┊     13.36% ┊ "function names" subsection
          48499 ┊     12.56% ┊ data[0]
          29694 ┊      7.69% ┊ seed::vdom::App<Ms,Mdl,ElC,GMs>::rerender_vdom::hffd3fa7a285240be
          27995 ┊      7.25% ┊ seed::routing::setup_popstate_listener::{{closure}}::h4266f23cdeb07c34
          26471 ┊      6.86% ┊ serde_json::de::from_str::hacb26ff04f7f2291
          25738 ┊      6.67% ┊ seed::patch::patch_els::h957d52591338c6a5
          25376 ┊      6.57% ┊ <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deserialize_struct::h59632fa41a2529e5
          19867 ┊      5.15% ┊ core::fmt::float::<impl core::fmt::Display for f64>::fmt::hec36d9763bb92887
          14770 ┊      3.83% ┊ seed::websys_bridge::patch_el_details::h59aec7fe0bdbcd88
          14176 ┊      3.67% ┊ elem[3]
          13612 ┊      3.53% ┊ seed::util::set_value::he1efe3e9d94600c2
          12965 ┊      3.36% ┊ render
          11858 ┊      3.07% ┊ core::num::dec2flt::dec2flt::hecb7c0c5227f8e08
          11466 ┊      2.97% ┊ <&T as core::fmt::Display>::fmt::hea41aae39fbc745a
          11452 ┊      2.97% ┊ <seed::dom_types::Style as core::fmt::Display>::fmt::he3b21f40a393b270
           9883 ┊      2.56% ┊ <alloc::vec::Vec<T> as alloc::vec::SpecExtend<T,I>>::from_iter::hfdce77303d539ef6
           9025 ┊      2.34% ┊ seed::dom_entity_names::styles::style_names::St::as_str::h034572cb759631f0
           8939 ┊      2.32% ┊ seed::routing::setup_link_listener::{{closure}}::h5500e3ba84953659
           8217 ┊      2.13% ┊ <alloc::vec::Vec<T> as alloc::vec::SpecExtend<T,I>>::from_iter::h1f098b6f32d27b75
           8153 ┊      2.11% ┊ todomvc::update::h208900e1779716fc
           8152 ┊      2.11% ┊ <seed::events::Ev as alloc::string::ToString>::to_string::hcc1c264f86668900
           8001 ┊      2.07% ┊ todomvc::selection_li::hd214a0aee5f289be
           7628 ┊      1.98% ┊ core::fmt::float::float_to_decimal_common_shortest::hfeed2ce960232b0f
           7532 ┊      1.95% ┊ seed::vdom::AppBuilder<Ms,Mdl,ElC,GMs>::finish::h5b4eaa26a1873c35
           7529 ┊      1.95% ┊ <core::iter::adapters::FilterMap<I,F> as core::iter::traits::iterator::Iterator>::next::h8f90343733163db1
           7330 ┊      1.90% ┊ seed::dom_types::At::as_str::h7851cbcb93205c52
           7030 ┊      1.82% ┊ todomvc::todo_item::h1df45a0a040d8b33
            ... ┊        ... ┊ ... and 892 more.
            ... ┊        ... ┊ Σ [922 Total Rows]
WASM with rust-url
$ twiggy top -n 30 package_bg.wasm
 Shallow Bytes │ Shallow % │ Item
───────────────┼───────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────
        154403 ┊    18.56% ┊ data[0]
         96585 ┊    11.61% ┊ unicode_normalization::tables::compatibility_fully_decomposed::h610a8da78adcc9a1
         61391 ┊     7.38% ┊ "function names" subsection
         53080 ┊     6.38% ┊ unicode_normalization::tables::canonical_fully_decomposed::haf0dd53cfd99b850
         23916 ┊     2.88% ┊ unicode_normalization::tables::composition_table::ha618e31638f3c9ff
         21658 ┊     2.60% ┊ unicode_normalization::tables::is_combining_mark::h1c0865ae6b8e00b7
         19871 ┊     2.39% ┊ core::iter::traits::iterator::Iterator::ne::h1630a628cdc19202
         19342 ┊     2.33% ┊ <alloc::string::String as core::iter::traits::collect::Extend<char>>::extend::h1ad93c9fe3031830
         13927 ┊     1.67% ┊ unicode_normalization::decompose::Decompositions<I>::push_back::h7cf7930ccccb37a6
         12526 ┊     1.51% ┊ todomvc::view::h97225a52d3c16381
          9025 ┊     1.08% ┊ seed::dom_entity_names::styles::style_names::St::as_str::h034572cb759631f0
          8439 ┊     1.01% ┊ url::parser::Parser::parse_file::h286a08fd89f7178d
          8152 ┊     0.98% ┊ <seed::events::Ev as alloc::string::ToString>::to_string::hcc1c264f86668900
          7330 ┊     0.88% ┊ seed::dom_types::At::as_str::h7851cbcb93205c52
          7237 ┊     0.87% ┊ idna::uts46::processing::h7bc547d566a42b45
          6675 ┊     0.80% ┊ todomvc::todo_item::h1df45a0a040d8b33
          6299 ┊     0.76% ┊ seed::patch::patch_els::h392e600e625a8eba
          5900 ┊     0.71% ┊ <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deserialize_struct::h59632fa41a2529e5
          4980 ┊     0.60% ┊ url::parser::Parser::after_double_slash::h4342ebb1373fbba2
          4547 ┊     0.55% ┊ idna::uts46::validate::hb7e9ea2ff48c68a8
          4325 ┊     0.52% ┊ url::host::Host::parse::h502bf6c1a17156ab
          4278 ┊     0.51% ┊ url::parser::Parser::parse_url::h58b7248043dc2fa9
          4128 ┊     0.50% ┊ core::num::flt2dec::strategy::dragon::format_shortest::hc3fdbdc1b58d0107
          3958 ┊     0.48% ┊ core::num::flt2dec::strategy::dragon::mul_pow10::he029ff2298318da8
          3803 ┊     0.46% ┊ seed::vdom::App<Ms,Mdl,ElC,GMs>::rerender_vdom::hffd3fa7a285240be
          3587 ┊     0.43% ┊ seed::dom_types::Tag::as_str::hd5afd3749469c804
          3580 ┊     0.43% ┊ <seed::events::Ev as core::convert::From<&str>>::from::h61e1af94d01ac761
          3567 ┊     0.43% ┊ seed::vdom::App<Ms,Mdl,ElC,GMs>::run::h2f5f29613cf62e77
          3395 ┊     0.41% ┊ url::parser::Parser::parse_relative::hae36df578e35a007
          3388 ┊     0.41% ┊ core::num::flt2dec::strategy::dragon::format_exact::hd48d2106360aebab
        248534 ┊    29.88% ┊ ... and 1036 more.
        831826 ┊   100.00% ┊ Σ [1066 Total Rows]



$ twiggy top -n 30 --retained package_bg.wasm
 Retained Bytes │ Retained % │ Item
────────────────┼────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────
         514673 ┊     61.87% ┊ table[0]
         471594 ┊     56.69% ┊ elem[2]
         329161 ┊     39.57% ┊ <seed::routing::Url as core::convert::TryFrom<alloc::string::String>>::try_from::h51ed6a9bac51d949
         329113 ┊     39.57% ┊ <seed::routing::Url as core::convert::TryFrom<&str>>::try_from::h3c8a298d9c804190
         325989 ┊     39.19% ┊ url::ParseOptions::parse::hfc8f64da49b49052
         325820 ┊     39.17% ┊ url::parser::Parser::parse_url::h58b7248043dc2fa9
         283405 ┊     34.07% ┊ url::host::Host::parse::h502bf6c1a17156ab
         273762 ┊     32.91% ┊ idna::domain_to_ascii::h350edb6a1946720e
         273747 ┊     32.91% ┊ idna::uts46::Config::to_ascii::h7c1693bf98e7c502
         267643 ┊     32.18% ┊ idna::uts46::processing::h7bc547d566a42b45
         167194 ┊     20.10% ┊ <&mut I as core::iter::traits::iterator::Iterator>::next::hc5f836fdf3b2c238
         154403 ┊     18.56% ┊ data[0]
          96585 ┊     11.61% ┊ unicode_normalization::tables::compatibility_fully_decomposed::h610a8da78adcc9a1
          61391 ┊      7.38% ┊ "function names" subsection
          55089 ┊      6.62% ┊ todomvc::view::h97225a52d3c16381
          53080 ┊      6.38% ┊ unicode_normalization::tables::canonical_fully_decomposed::haf0dd53cfd99b850
          29682 ┊      3.57% ┊ seed::vdom::App<Ms,Mdl,ElC,GMs>::rerender_vdom::hffd3fa7a285240be
          27253 ┊      3.28% ┊ seed::routing::setup_popstate_listener::{{closure}}::h90464e42c2b12b20
          26439 ┊      3.18% ┊ serde_json::de::from_str::hacb26ff04f7f2291
          26205 ┊      3.15% ┊ idna::uts46::validate::hb7e9ea2ff48c68a8
          25726 ┊      3.09% ┊ seed::patch::patch_els::h392e600e625a8eba
          25344 ┊      3.05% ┊ <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deserialize_struct::h59632fa41a2529e5
          24025 ┊      2.89% ┊ unicode_normalization::normalize::compose::h1986a9b3d9b06a39
          23916 ┊      2.88% ┊ unicode_normalization::tables::composition_table::ha618e31638f3c9ff
          21658 ┊      2.60% ┊ unicode_normalization::tables::is_combining_mark::h1c0865ae6b8e00b7
          19874 ┊      2.39% ┊ core::fmt::float::<impl core::fmt::Display for f64>::fmt::hec36d9763bb92887
          19871 ┊      2.39% ┊ core::iter::traits::iterator::Iterator::ne::h1630a628cdc19202
          19342 ┊      2.33% ┊ <alloc::string::String as core::iter::traits::collect::Extend<char>>::extend::h1ad93c9fe3031830
          14770 ┊      1.78% ┊ seed::websys_bridge::patch_el_details::hcca53e0da59c1d8f
          14221 ┊      1.71% ┊ unicode_normalization::decompose::Decompositions<I>::push_back::h7cf7930ccccb37a6
            ... ┊        ... ┊ ... and 1036 more.
            ... ┊        ... ┊ Σ [1066 Total Rows]
Differences
Without  With     Difference    Item
--------------------------------------------------------------------------------------------------------------------------------
48499    154403   105904        data[0]
0        96585    96585         unicode_normalization::tables::compatibility_fully_decomposed::h610a8da78adcc9a1
51594    61391    9797          "function names" subsection
0        53080    53080         unicode_normalization::tables::canonical_fully_decomposed::haf0dd53cfd99b850
0        23916    23916         unicode_normalization::tables::composition_table::ha618e31638f3c9ff
0        21658    21658         unicode_normalization::tables::is_combining_mark::h1c0865ae6b8e00b7
0/<2000  19871    19871         core::iter::traits::iterator::Iterator::ne::h1630a628cdc19202
0/<2000  19342    19342         <alloc::string::String as core::iter::traits::collect::Extend<char>>::extend::h1ad93c9fe3031830
0        13927    13927         unicode_normalization::decompose::Decompositions<I>::push_back::h7cf7930ccccb37a6
...      ...      ...
386040   831826   445786        Total

So if I created the Differences table correctly, it seems that you are right about the Unicode tables. I assume that the larger data[0] and the iterator functions are side effects.
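The Differences table above can be reproduced mechanically. The following is a minimal sketch, assuming rows in the `<bytes> ┊ <percent> ┊ <item>` layout shown in the `twiggy top` listings; the helper names `parse_top` and `diff` are hypothetical and not part of twiggy.

```rust
use std::collections::BTreeMap;

/// Parse `twiggy top` rows of the form `<bytes> ┊ <percent> ┊ <item>`.
/// Header, ruler and summary rows ("... and N more.", "Σ ...") are skipped.
fn parse_top(output: &str) -> BTreeMap<String, i64> {
    let mut sizes = BTreeMap::new();
    for line in output.lines() {
        let cols: Vec<&str> = line.split('┊').map(str::trim).collect();
        if cols.len() != 3 || cols[2].starts_with("...") || cols[2].starts_with('Σ') {
            continue;
        }
        if let Ok(bytes) = cols[0].parse::<i64>() {
            sizes.insert(cols[2].to_string(), bytes);
        }
    }
    sizes
}

/// Return (item, without, with, difference) rows, largest growth first.
/// Items missing from the "without" listing are counted as 0 bytes.
fn diff(
    without: &BTreeMap<String, i64>,
    with: &BTreeMap<String, i64>,
) -> Vec<(String, i64, i64, i64)> {
    let mut rows: Vec<_> = with
        .iter()
        .map(|(item, &w)| {
            let wo = without.get(item).copied().unwrap_or(0);
            (item.clone(), wo, w, w - wo)
        })
        .collect();
    rows.sort_by(|a, b| b.3.cmp(&a.3));
    rows
}

fn main() {
    // A few rows copied from the listings above.
    let without = parse_top(
        r#"
         51594 ┊    13.36% ┊ "function names" subsection
         48499 ┊    12.56% ┊ data[0]
"#,
    );
    let with = parse_top(
        r#"
        154403 ┊    18.56% ┊ data[0]
         96585 ┊    11.61% ┊ unicode_normalization::tables::compatibility_fully_decomposed::h610a8da78adcc9a1
         61391 ┊     7.38% ┊ "function names" subsection
"#,
    );
    for (item, wo, w, d) in diff(&without, &with) {
        println!("{wo:<8} {w:<8} {d:<8} {item}");
    }
}
```

Two caveats apply to this sketch. An item that is absent from a top-30 listing is treated as 0 bytes even though it may simply sit below the cutoff, and symbols whose hash suffix changes between builds (as `seed::patch::patch_els` does above) will not be matched.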

there is no plan to make those optional.

Do you think that there is a workaround? Or am I able to "hack" it somehow in my fork in a reasonable amount of time? That is, should I try to find a solution? Thank you.

@SimonSapin
Member

I was thinking of the idna crate, but unicode_normalization contributes a lot more. unicode-rs/unicode-normalization#37 improved things semi-recently, but there may be optimizations possible to lower the code size. Work in this direction would be very welcome.

Do you think that there is a workaround? Or am I able to "hack" it somehow in my fork

What kind of workaround do you mean? Those tables are required to correctly implement the parsing algorithm according to the spec.

Or, if you want to fork the url crate in order to remove IDNA support entirely, you can find uses of the idna crate easily enough. Of course this produces incorrect behavior, and you end up with a fork that either takes maintenance work or drifts out of sync as fixes are made here.

@MartinKavik
Author

there may be optimizations possible to lower the code size

Do you think that some optimizations would help to reduce rust-url size at least to XX KB instead of approximately 400 KB (as in my example above)?

What kind of workaround do you mean?

I hoped that we could leverage some native browser APIs, like the Encoding API or even URL.

Of course this produces incorrect behavior

I have practically zero knowledge about this domain or this project's architecture, so thanks for the explanation.
The main reasons why I've created this issue:

  1. We can't use rust-url in a Rust-wasm framework or in user apps directly because of size.
  2. We can't use libraries like reqwest, because it depends on rust-url. (That's why I suggested creating a fork: it would share the public API but probably use the browser's URL API under the hood.)

@SimonSapin
Member

reduce rust-url size at least to XX KB

That’s really hard to say without trying it.

leverage some native browser APIs

For wasm-in-the-browser that may be possible. But it’s an entirely different project from this crate.

@paolobarbolini

paolobarbolini commented Sep 14, 2020

I just found this issue. Seeing how the url crate is used by many crates that have to deal with URLs, I think it should support an optional feature that leverages the browser's APIs to avoid having to include idna.

@tmccombs
Contributor

How would that work? Wasm doesn't have direct access to browser APIs.

@paolobarbolini

paolobarbolini commented Sep 14, 2020

How would that work? Wasm doesn't have direct access to browser APIs.

In a wasm-bindgen project it can be done easily through web-sys.

Here's what getrandom does 1, 2, for example. Browser support is usually put behind a feature, so that downstream users can specify whether they are using the wasm32-unknown-unknown target for the browser or something else.

Some projects also use stdweb. I have never tried it, but I imagine it is similar.
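The feature-gating pattern described above can be sketched as follows. This is a minimal illustration, not code from getrandom or url: the feature name `browser` and the function `backend` are hypothetical, and the feature would have to be declared in the crate's Cargo.toml.

```rust
/// Selected only when targeting wasm32 AND the (hypothetical) `browser`
/// feature is enabled, so other wasm32 hosts are unaffected.
#[cfg(all(target_arch = "wasm32", feature = "browser"))]
fn backend() -> &'static str {
    "browser"
}

/// Fallback for every other combination of target and features.
#[cfg(not(all(target_arch = "wasm32", feature = "browser")))]
fn backend() -> &'static str {
    "pure-rust"
}

fn main() {
    // On a native target this takes the fallback path.
    println!("selected backend: {}", backend());
}
```

Gating on both the target and an opt-in feature matters because `wasm32-unknown-unknown` does not by itself say whether the module runs in a browser.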

@SimonSapin
Member

Libraries that use rust-url could have a switch that makes them use web-sys instead, but I feel that within rust-url itself is not the right place for such a switch.

@djc
Contributor

djc commented Sep 14, 2020

@SimonSapin why do you feel that way? It's not clear to me how this would work. If people would like to use browser APIs just for the unicode data tables, it seems like they would have to duplicate the entire URL parser and IDNA logic on their side.

But perhaps you meant that folks could just use the browser's built-in URL parser when their code is running inside the browser?

@SimonSapin
Member

Yes, using the browser’s URL parsing is what I had in mind:

https://url.spec.whatwg.org/#api
https://docs.rs/web-sys/0.3.45/web_sys/struct.Url.html
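As a rough illustration of that approach, the sketch below delegates parsing to the browser through `web_sys::Url` when compiled for wasm32. It assumes a wasm-bindgen project that depends on `web-sys`; the function name `normalized_href` and the native fallback that returns `None` are hypothetical choices made for this example.

```rust
/// Browser-backed parsing, compiled only for wasm32 targets.
#[cfg(target_arch = "wasm32")]
fn normalized_href(input: &str) -> Option<String> {
    // `Url::new` wraps the JavaScript `new URL(input)` constructor and
    // returns `Err` when the constructor throws on an invalid URL.
    web_sys::Url::new(input).ok().map(|url| url.href())
}

/// Native fallback: no browser is available, so this sketch gives up.
/// A real crate would call a pure-Rust parser here instead.
#[cfg(not(target_arch = "wasm32"))]
fn normalized_href(_input: &str) -> Option<String> {
    None
}

fn main() {
    match normalized_href("https://example.com/test") {
        Some(href) => println!("parsed by the browser: {href}"),
        None => println!("no browser URL parser on this target"),
    }
}
```

In a wasm-bindgen build, `web-sys` would need its `Url` feature enabled in Cargo.toml, since each web-sys type sits behind its own feature.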

@micolous
Contributor

micolous commented Nov 2, 2023

What I can see has happened here so far:

  1. IDNA support (a significant part of url's footprint) was made optional in init commit to make idna optional #728, with the feature defaulted to on.

  2. That was rolled back in Make IDNA dependency non-optional #790.

    There's limited context on that PR; I presume it is because users of the url crate with default-features = false lost IDNA support, which was a SemVer-breaking API change.

  3. There's a new proposal in Add feature "disable_idna" #836 to add a disable-idna feature.

    While this would avoid breaking those with default-features = false, it would violate the principle that features should be additive, and means that anything in the dependency chain setting that feature would opt-out for everything.

    @valenting suggested that one could disable IDNA support by patching out the idna crate, and so that PR appears to have stalled.
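The concern about non-additive features in item 3 can be made concrete with a small simulation. This is a simplified model, assuming Cargo builds one copy of a crate with the union of the features requested by all of its dependents; the dependents `app` and `http_client` and the helper functions are hypothetical.

```rust
use std::collections::BTreeSet;

/// Union the features requested by every dependent of a crate.
fn unify(requests: &[&[&str]]) -> BTreeSet<String> {
    requests
        .iter()
        .flat_map(|r| r.iter().copied())
        .map(String::from)
        .collect()
}

/// Additive design: IDNA is compiled in if any dependent asks for `idna`.
fn idna_enabled_additive(features: &BTreeSet<String>) -> bool {
    features.contains("idna")
}

/// Negative design: IDNA is removed if any dependent sets `disable-idna`.
fn idna_enabled_negative(features: &BTreeSet<String>) -> bool {
    !features.contains("disable-idna")
}

fn main() {
    // Additive: the app opts out of defaults, the HTTP client needs IDNA.
    let app: &[&str] = &[];
    let http_client: &[&str] = &["idna"];
    let unified = unify(&[app, http_client]);
    println!("additive, IDNA enabled: {}", idna_enabled_additive(&unified));

    // Negative: the app opts out, and the HTTP client silently loses IDNA too.
    let app: &[&str] = &["disable-idna"];
    let http_client: &[&str] = &[];
    let unified = unify(&[app, http_client]);
    println!("negative, IDNA enabled: {}", idna_enabled_negative(&unified));
}
```

Under the additive design, a dependent that needs IDNA always gets it. Under the negative design, a single opt-out anywhere in the dependency chain removes it for everyone.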

There's still a significant footprint from the url crate, and I saw similar size decreases to the original reporter when replacing url with Web APIs in web_sys. Thankfully I didn't have anything else pulling in the url crate for that project, so it's all just kinda clunky to use.

I feel that disabling IDNA support is a band-aid solution to the problem in WASM - the JavaScript/Web URL type actually supports IDNA just fine, so it would be nice to be able to use it:

>>> var url = new URL('https://example.com/test');

>>> url.toString()
"https://example.com/test"

>>> url.host = 'example.net'
"example.net"

>>> url.toString()
"https://example.net/test"

>>> url.host = 'straße.example.com'
"straße.example.com"

>>> url.toString()
"https://xn--strae-oqa.example.com/test"

One could instead patch out the url crate, and replace it with an API-compatible version which calls web_sys::Url and friends. However, this risks breaking and diverging over time.

Binding url to Web APIs could also be implemented as a --cfg option in the url crate itself, so that it could take advantage of its existing test cases, ensure the API remains consistent, and make it easy to access.

I feel that within rust-url itself is not the right place for such a switch.

The trouble is url::Url is the de-facto Rust URL type, used in many crates and exposed in their APIs, and is not part of the Rust standard library.

If it was part of the standard library, the toolchain could just swap the Url implementation for browser APIs when targeting a browser, provided they can be made like-for-like, and it becomes a set of assumptions on that platform. There are similar abstractions in Rust for other platform-specific core APIs, like file I/O, threading, and using environment variables.

But for a crate outside of the standard library - it's not the toolchain's problem to solve. Url could become part of the Rust standard library, but I suspect the bar for that is pretty high.

In the end, any WASM-specific implementation would need to end up on the Url type. 🤷

@hsivonen
Contributor

My PR to reimplement the internals on top of ICU4X makes the Wasm size smaller.
