Splitting into multiple lazily-loaded modules #3939

Open
jbms opened this issue Apr 27, 2024 · 16 comments

jbms commented Apr 27, 2024

Motivation

See motivation here:
rustwasm/team#52

Proposed Solution

I have implemented a (limited/hacky) prototype, based on the following components:

  • A #[wasm_split(xyz)] function attribute macro that serves to annotate a function as a split point. xyz is an identifier for the module that this function should be "split off" into. The same identifier can be used multiple times, in which case multiple functions will be "split off" into the same module. In my prototype the function must be non-async, and this macro turns it into an async function, but it wouldn't be hard to support both sync and async split points.

For example, the macro converts:

#[wasm_split(zstd)]
fn get_zstd_decoder(
    encoded_reader: Pin<Box<dyn futures::io::AsyncBufRead>>,
) -> Pin<Box<dyn futures::io::AsyncRead>> {
    Box::pin(async_compression::futures::bufread::ZstdDecoder::new(
        encoded_reader,
    ))
}

into

async fn get_zstd_decoder(
    __wasm_split_arg_0: Pin<Box<dyn futures::io::AsyncBufRead>>,
) -> Pin<Box<dyn futures::io::AsyncRead>> {
    thread_local! {
        static __wasm_split_loader: ::wasm_split::LazySplitLoader = unsafe { ::wasm_split::LazySplitLoader::new(__wasm_split_load_zstd) };
    }
    #[link(wasm_import_module = "./__wasm_split.js")]
    extern "C" {
        #[no_mangle]
        fn __wasm_split_load_zstd(
            callback: unsafe extern "C" fn(*const ::std::ffi::c_void, bool),
            data: *const ::std::ffi::c_void,
        ) -> ();
        #[allow(improper_ctypes)]
        #[no_mangle]
        fn __wasm_split_00zstd00_import_56925a789e8e525628ef50b9c566f070_get_zstd_decoder(
            encoded_reader: Pin<Box<dyn futures::io::AsyncBufRead>>,
        ) -> Pin<Box<dyn futures::io::AsyncRead>>;
    }
    #[allow(improper_ctypes_definitions)]
    #[no_mangle]
    pub extern "C" fn __wasm_split_00zstd00_export_56925a789e8e525628ef50b9c566f070_get_zstd_decoder(
        encoded_reader: Pin<Box<dyn futures::io::AsyncBufRead>>,
    ) -> Pin<Box<dyn futures::io::AsyncRead>> {
        Box::pin(async_compression::futures::bufread::ZstdDecoder::new(encoded_reader))
    }
    ::wasm_split::ensure_loaded(&__wasm_split_loader).await.unwrap();
    unsafe {
        __wasm_split_00zstd00_import_56925a789e8e525628ef50b9c566f070_get_zstd_decoder(
            __wasm_split_arg_0,
        )
    }
}

Note that the real body of the function is moved to a separate exported function (__wasm_split_00zstd00_export_56925a789e8e525628ef50b9c566f070_get_zstd_decoder) that is never called directly. The original function body is replaced by code that ensures the module is asynchronously loaded, and then calls a separate imported function (__wasm_split_00zstd00_import_56925a789e8e525628ef50b9c566f070_get_zstd_decoder). In a post-processing step, __wasm_split_00zstd00_import_56925a789e8e525628ef50b9c566f070_get_zstd_decoder will be changed to refer to a function that does an indirect call of __wasm_split_00zstd00_export_56925a789e8e525628ef50b9c566f070_get_zstd_decoder.

This effectively disconnects the call graph at this split point, which is important for the post-processing.

Then we compile and link the program using -Clink-args=--emit-relocs.

The post-processing reads in the linked .wasm file (before running wasm-bindgen, since wasm-bindgen does not preserve relocation information), identifies the split points based on the symbol names, and then determines the dependency graph of all symbols based on the relocation information.
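
For reference, here is a minimal sketch (not the prototype's actual code) of locating that information with wasmparser. It assumes that --emit-relocs leaves the "linking" symbol table and the "reloc.*" custom sections in the linked output; parsing their payloads is elided.

use wasmparser::{Parser, Payload};

fn scan_link_info(wasm: &[u8]) -> Result<(), Box<dyn std::error::Error>> {
    for payload in Parser::new(0).parse_all(wasm) {
        if let Payload::CustomSection(section) = payload? {
            match section.name() {
                // Symbol table: this is where names such as
                // __wasm_split_00zstd00_export_..._get_zstd_decoder appear,
                // which is how split points are recognized.
                "linking" => { let _symbol_table = section.data(); }
                // Relocation entries for the code/data sections: each entry
                // records "this location refers to that symbol", i.e. the
                // edges of the symbol dependency graph.
                name if name.starts_with("reloc.") => { let _relocs = section.data(); }
                _ => {}
            }
        }
    }
    Ok(())
}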

Note that the dependency graph includes both functions and data symbols, since data symbols such as vtables refer to functions via the indirect function table.

We then compute the contents of the "main" module as the transitive dependencies of:

  • The start function
  • Any exported function

For each split module, we then compute the transitive dependencies of the real implementation function (such as __wasm_split_00zstd00_export_56925a789e8e525628ef50b9c566f070_get_zstd_decoder) for each split point assigned to the module. When computing transitive dependencies here, we can stop once we encounter a symbol that is assigned to the main module.
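
As a sketch of those two reachability passes (the SymbolId type and function below are illustrative, not the prototype's actual API), given a dependency graph built from the relocation entries:

use std::collections::{HashMap, HashSet};

type SymbolId = usize; // hypothetical: index into the symbol table

/// Depth-first reachability over the symbol dependency graph, optionally
/// stopping at symbols that are already assigned to the main module.
fn reachable(
    roots: &[SymbolId],
    deps: &HashMap<SymbolId, Vec<SymbolId>>,
    stop_at: Option<&HashSet<SymbolId>>,
) -> HashSet<SymbolId> {
    let mut seen: HashSet<SymbolId> = HashSet::new();
    let mut stack: Vec<SymbolId> = roots.to_vec();
    while let Some(sym) = stack.pop() {
        if stop_at.map_or(false, |stop| stop.contains(&sym)) {
            continue; // owned by the main module; don't traverse further
        }
        if seen.insert(sym) {
            if let Some(children) = deps.get(&sym) {
                stack.extend(children.iter().copied());
            }
        }
    }
    seen
}

// Main module: everything reachable from the start function and the exports.
// let main_set = reachable(&main_roots, &deps, None);
// Each split module: everything reachable from its real implementation
// function(s), cutting the traversal at symbols already in the main module.
// let zstd_set = reachable(&[zstd_export_symbol], &deps, Some(&main_set));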

Symbols that are uniquely in the transitive dependencies of a single split module are assigned to that split module. Symbols that are in the transitive dependencies of more than one split module are assigned to a separate "chunk" module identified by the set of two or more split modules that have the symbol as a transitive dependency. Thus we may in general produce a large number of chunk modules. Various heuristics could be used to combine them.
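
Continuing the sketch above, grouping shared symbols into chunk modules keyed by the exact set of split modules that reach them could look roughly like this (again, names are illustrative):

use std::collections::{BTreeSet, HashMap, HashSet};

type SymbolId = usize;
type SplitModule = String; // e.g. "zstd"

/// Group non-main symbols by the set of split modules whose transitive
/// dependencies contain them: singleton sets are ordinary split modules,
/// larger sets become shared "chunk" modules.
fn assign_symbols(
    per_split: &HashMap<SplitModule, HashSet<SymbolId>>,
) -> HashMap<BTreeSet<SplitModule>, HashSet<SymbolId>> {
    let mut owners: HashMap<SymbolId, BTreeSet<SplitModule>> = HashMap::new();
    for (module, symbols) in per_split {
        for &sym in symbols {
            owners.entry(sym).or_default().insert(module.clone());
        }
    }
    let mut assignment: HashMap<BTreeSet<SplitModule>, HashSet<SymbolId>> = HashMap::new();
    for (sym, modules) in owners {
        assignment.entry(modules).or_default().insert(sym);
    }
    assignment
}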

The split point implementation functions, and any function that is called from more than one module, are added to the __indirect_function_table.

We then emit each module, using the relocation information to remap function references. In the prototype, although dependencies are computed as if data symbols were split out, all of the data segments in fact remain in the main module; it should be feasible to split the data as well. Calls to functions defined in other modules are replaced by calls to a stub function that does an indirect call. Each split module has no start function, but has an active element segment that initializes a portion of the __indirect_function_table.

The support JavaScript for loading the module looks something like:

import { initSync } from "./main.js";

export async function __wasm_split_load_zstd(callback_index, callback_data) {
  let mainExports = undefined;
  try {
    const response = await fetch(new URL("./zstd.wasm", import.meta.url));
    mainExports = initSync(undefined, undefined);
    const imports = {
      env: {
        memory: mainExports.memory,
      },
      __wasm_split: {
        __indirect_function_table: mainExports.__indirect_function_table,
        __stack_pointer: mainExports.__stack_pointer,
        __tls_base: mainExports.__tls_base,
      },
    };
    const module = await WebAssembly.instantiateStreaming(response, imports);
    mainExports.__indirect_function_table.get(callback_index)(
      callback_data,
      true,
    );
  } catch (e) {
    console.error("Failed to load zstd", e);
    if (mainExports === undefined) {
      mainExports = initSync(undefined, undefined);
    }
    mainExports.__indirect_function_table.get(callback_index)(
      callback_data,
      false,
    );
  }
}
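
For context, the callback contract visible above (an indirect call with (callback_data, success)) might be handled on the Rust side roughly like this. This is only a guess at the shape of the mechanism, not the actual wasm_split implementation:

use std::cell::RefCell;
use std::ffi::c_void;
use std::task::Waker;

// Hypothetical per-split-module loading state, owned by the pending future.
struct LoaderState {
    result: Option<bool>, // Some(true) = loaded, Some(false) = load failed
    waker: Option<Waker>,
}

// The JavaScript loader invokes this through the indirect function table,
// passing back the `data` pointer it was given plus a success flag.
unsafe extern "C" fn load_callback(data: *const c_void, success: bool) {
    // SAFETY (assumed by this sketch): `data` points at a RefCell<LoaderState>
    // that outlives the load and is only accessed from this thread.
    let state = unsafe { &*(data as *const RefCell<LoaderState>) };
    let mut state = state.borrow_mut();
    state.result = Some(success);
    if let Some(waker) = state.waker.take() {
        waker.wake(); // resume the future awaiting ensure_loaded()
    }
}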

Alternatives

This implementation was inspired by the description of the Emscripten wasm-split tool (https://emscripten.org/docs/optimizing/Module-Splitting.html#module-splitting), which differs from this proposal in the following ways:

  • It splits into only one main module and one secondary module, and the secondary module is loaded synchronously on demand.
  • The split is determined based on profiling rather than explicit annotations in the code.

While the Emscripten wasm-split approach could presumably be adapted to Rust fairly easily, I think there are a lot of advantages to explicitly annotated, asynchronously loaded split points.

Another alternative would be to provide something closer to dlopen, which I think may be along the lines of what is being proposed for a WebAssembly dynamic linking mechanism. The advantages of what I'm proposing here over a dlopen-style interface are:

  • Split points can be inserted at arbitrary locations, rather than only at crate boundaries.
  • The wasm_split macro provides a very ergonomic interface.
  • Code and data are automatically deduplicated across modules.

Additional Context

The current prototype implementation is essentially independent of wasm-bindgen: it works with an unmodified wasm-bindgen, but the module loading depends slightly on wasm-bindgen implementation details.

Ultimately, though, since this is a feature for which I think there is quite a lot of community interest, it would probably be better to integrate it into wasm-bindgen itself; that would allow the JavaScript code to be split along with the wasm module.

Towards that goal, I'd appreciate some guidance on whether this feature would likely be accepted, and if so, any comments on how best to integrate it.

The current prototype implementation uses wasm_encoder and wasmparser directly. I initially attempted to use walrus but found that its abstractions didn't work very well given the need to make use of the relocation information. Possibly walrus could be modified to provide the necessary functionality. Alternatively, the splitting could be done first using wasm_encoder and wasmparser directly, and then the remaining wasm-bindgen processing could be done using walrus.


tlively commented Apr 29, 2024

This is super cool work. FWIW, we would be happy to take PRs implementing the call-graph-based split analysis for wasm-split. Combined with PRs to add the ability to pass wasm-split function name patterns rather than complete function names to seed the list of functions to split out, I think we could recover all of the functionality you describe above without the need for a separate post-processing tool besides wasm-split. This could also simplify your macro somewhat because you wouldn't need to worry about disconnecting the call graph at the source level.

One downside is that you wouldn't be able to do the analysis of the vtables in wasm-split, but it would be extremely useful for projects in other languages as well if that information could also be passed to wasm-split.


jbms commented Apr 29, 2024

I'll try to get my prototype into a git repo so that others can take a look.

As for integrating it into wasm-split vs. wasm-bindgen: I think that for users of wasm-bindgen it would work better to integrate it into wasm-bindgen, because wasm-bindgen already post-processes the .wasm, and properly splitting the wasm-bindgen-generated JavaScript will also require integration.

However, the same strategy could certainly be employed for other languages. The only thing I've done that is Rust-specific is the async integration. But I think you could expose it in a similar (maybe slightly less ergonomic) way to C and C++ via a C preprocessor macro. Note that the analysis of data symbols like vtables is not Rust-specific at all: it is entirely based on the relocation information emitted by LLVM wasm-ld, so I don't think there would be any problem integrating that into wasm-split.

Disconnecting the call graph at the split points is not just to help the call graph analysis. Initially I tried leaving the call graph connected and just marking the function noinline. The problem was that the compiler/linker still inferred information across the call boundary, which made the splitting ineffective in the test case: in particular, in the example snippet I showed above, the split function returns Pin<Box<dyn futures::io::AsyncRead>>, which is represented as a pointer to the object and a pointer to the vtable. Because the compiler/linker inferred that the vtable pointer was a constant, it just propagated this constant to the caller, which led to the "main" module having a dependency on basically all of zstd via the vtable. While C++ doesn't separate the vtable pointers in the same way, I imagine something similar might happen in C++ due to devirtualization, etc. Disconnecting the call graph provides a pretty robust way to prevent this sort of thing from happening even in the presence of LTO.


jbms commented Apr 30, 2024

The code is now available here: https://github.com/jbms/wasm-split-prototype


gbj commented May 1, 2024

@jbms I threw together a quick prototype adaptation of this using Leptos to lazy-load a second page. It took me about 15-20 mins to get it to work perfectly with our reactive system. I ran into a bunch of odd little things along the way that I didn't investigate (panics when trying to use one type as return type vs. another, etc.) but I just have to say: I've been waiting for this moment for about 4 years of using WASM with Rust. This is honestly a game-changer for Rust front-end frameworks: code splitting is one of the last big drawbacks relative to JS. Thanks so much for your work on it.

Seeing our reactive system work in both directions across a split binary... Truly awesome.

If there are ways people in the community can help out with testing or development, let me know.

Screen.Recording.2024-05-01.at.7.41.42.PM.mov

https://github.com/gbj/wasm-split-prototype


jbms commented May 2, 2024

@gbj Glad that you were able to get a test with leptos working.

Indeed I also saw code splitting as a pretty critical limitation for moving certain types of applications to WebAssembly in Rust, and put together this prototype in order to verify that the limitation could be eliminated in the future.

In fact I don't have a ton of time to work on this, so if you or others are interested in helping turn this prototype into a real usable thing, that would be awesome.

The first step would be to determine whether this should be integrated into wasm-bindgen or made into a standalone library/tool. I think integrating it into wasm-bindgen is the best option, but that depends on the wasm-bindgen maintainers being supportive of its inclusion.


jbms commented May 3, 2024

By the way, in the prototype, functions marked as split points need to have a body that is compatible with a sync function, i.e. no use of await. Additionally, the return type needs to be a concrete type; existential types like impl Trait are not supported. Is that the return-type issue you ran into, @gbj, or was it a different issue?

Making real async function bodies work should be easy enough (though it may require boxing the Future). Making existential return types work might be possible, but they would run the risk of reducing the effectiveness of the splitting by propagating code dependencies across the split point.
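
For concreteness, a minimal illustration of the current constraint (the module name, function, and types below are made up for the example):

// Works in the current prototype: synchronous body, concrete return type.
#[wasm_split(compress)]
fn make_reader(input: Vec<u8>) -> Box<dyn std::io::Read> {
    Box::new(std::io::Cursor::new(input))
}

// Not yet supported: existential return types (and async fn bodies).
// #[wasm_split(compress)]
// fn make_reader_opaque(input: Vec<u8>) -> impl std::io::Read {
//     std::io::Cursor::new(input)
// }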


gbj commented May 3, 2024

I encountered the no-impl-Trait limitation early, and that makes perfect sense. I also ran into some panics (during the build process) that seemed hard to predict: for example, if I added a console log in the split function, building would panic, but if I removed it, it was fine; a function that returned -> AnyView<Dom> panicked during build, but -> HtmlElement<Button, (), (), Dom> was fine, etc.

I was just messing around at this point, so I didn't bother experimenting further. If reproductions would be helpful I'm happy to share them, but I was pulling in so much outside library code from the framework that I don't know whether an MRE would be easy to make.


jbms commented May 4, 2024

Overall I didn't put in a lot of effort to make the implementation robust, because I was just trying to create a proof-of-concept and I expect all or most of the code will end up being rewritten before this gets integrated into wasm-bindgen.

However, issues that indicate limitations of the current splitting strategy would be interesting to look at. Panics during building are probably relatively easy to analyze even without a reduced example. Crashes at runtime are significantly more annoying to debug...

@BlinkyStitt

One place I've been wanting a split wasm is in an Audio Worklet. It gets its own special thread that doesn't have everything (no TextEncoder/Decoder). Will this code help with that?


jbms commented May 4, 2024

One place I've been wanting a split wasm is in an Audio Worklet. It gets its own special thread that doesn't have everything (no TextEncoder/Decoder). Will this code help with that?

I don't have experience with audio worklets, but I think the prototype won't work as is because it generates some JavaScript that uses fetch to retrieve the additional modules. However, you could probably modify the generated JavaScript in __wasm_split.js to instead send a message to the main thread, have the main thread fetch and compile the module, and then send it back to the worklet.

To support multi-threading with a SharedArrayBuffer memory we would actually benefit from something similar, to avoid redundantly fetching and compiling the same module in more than one worker.


gbj commented May 4, 2024

However, issues that indicate limitations of the current splitting strategy would be interesting to look at. Panics during building are probably relatively easy to analyze even without a reduced example. Crashes at runtime are significantly more annoying to debug...

Just went back and checked and both panics I experienced were actually the assert at wasm_split_cli/src/emit.rs:310, with the result

thread 'main' panicked at crates/wasm_split_cli/src/emit.rs:310:21:
func_id=0


jbms commented May 4, 2024

Yeah previously the prototype did not properly handle references to imported functions from split modules. I just pushed out a fix for that.


jbms commented May 4, 2024

@daxpedda Would you be open to a PR that integrates module splitting into wasm-bindgen?


gbj commented May 5, 2024

Yeah previously the prototype did not properly handle references to imported functions from split modules. I just pushed out a fix for that.

Thanks, that fixed all my issues. With that additional commit, this is working perfectly for me.

Here it is, lazily loading code for each of three separate routes, as well as code shared between the three routes.

code-split-routing.mov

I'll stop spamming in this issue now :-) This is just very exciting, since WASM code-splitting has been an unrealized dream for a long time.

I will be writing our next release to support async functions in reasonable places, so that this can be a drop-in enhancement, whether it is included officially in wasm-bindgen (which would certainly have my vote) or we need to add it to our own build tooling separately.

ETA: Thinking out loud: it would be very useful to have a way to output a manifest of all the WASM bundles per split. You can see one waterfall in my example above, where it needs to load the shared chunk and also the "view A" chunk, but doesn't load A until after the shared chunk. Not sure if those can be done concurrently, but in any case, with server-side rendering, if we have access to a manifest (just a JSON file that says "entry point function A will require these 3 WASM files") we can start preloading them all immediately.


jbms commented May 5, 2024

ETA: Thinking out loud: it would be very useful to have a way to output a manifest of all the WASM bundles per split. You can see one waterfall in my example above, where it needs to load the shared chunk and also the "view A" chunk, but doesn't load A until after the shared chunk. Not sure if those can be done concurrently, but in any case, with server-side rendering, if we have access to a manifest (just a JSON file that says "entry point function A will require these 3 WASM files") we can start preloading them all immediately.

It would be pretty easy to fix the generated JavaScript to fetch and instantiate all the modules in parallel. The start functions of the split modules just add entries to the indirect function table, and there are currently no ordering constraints. If support for splitting C++ dynamic-initialization code for global variables were added, though, those would have ordering constraints and would need to be handled differently, probably by putting them in a separate function to be called after all the modules are instantiated.


jbms commented May 6, 2024

I pushed a change to fetch and instantiate the common chunks concurrently.
