Add WASM support #14

Closed
zacharywhitley opened this issue Mar 29, 2021 · 12 comments
Labels
enhancement New feature or request

Comments

@zacharywhitley

This isn't specific to lingua-rs, but I've been looking into WebAssembly lately and it would be great to be able to use lingua-rs in a Wasm project. I did an initial test, but it failed because of a problem with bzip2-sys. I'll keep looking into it and let you know what I find, but I thought you might be interested.

@pemistahl
Owner

Indeed, WebAssembly is definitely an interesting topic. I have not dealt with it so far but would like to at some point in the future. Do you know whether modern browsers are capable of storing all language models on the client side? Are there any memory limitations?

Yes, please, let me know about anything you find out. I'm curious about these things. Feel free to send me a pull request if you like. Adding WASM support to Lingua would surely be an exciting project.

pemistahl changed the title from "WASM support" to "Add WASM support" on Mar 29, 2021
pemistahl added the enhancement (New feature or request) label on Mar 29, 2021
@martindisch
Contributor

Cool to see there's interest in this, because I might have a solution. I'm looking to get lingua working on wasm32-unknown-unknown, not to use it in the browser (which should be possible too), but to embed it in other programming languages (e.g. C#) by running it in a Wasm runtime like wasmtime.

While looking into it, I found two issues, which I addressed in #19.

bzip2

As Zachary pointed out, the zip crate has a default feature that enables bzip2 support. This pulls in bzip2-sys, a binding to the libbz2 C library, which of course does not exist in a typical Wasm environment. This was easy to resolve, since we can simply disable the zip crate's default features. bzip2 is only needed for archive entries that explicitly use it, and the standard compression method for zip files is deflate, so I think there should be no change in behavior.
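To make the "only when specified" point concrete: every zip entry records its own compression method in its local file header, so a reader built without bzip2 support only fails on entries that actually use bzip2. The following is a minimal sketch, not code from the zip crate or lingua; `Method`, `entry_method`, and `needs_bzip2` are hypothetical names, and the method codes (0 = stored, 8 = deflate, 12 = bzip2) come from the ZIP format specification.

```rust
/// Compression methods as recorded in a ZIP entry's local file header.
#[derive(Debug, PartialEq)]
enum Method {
    Stored,     // code 0
    Deflate,    // code 8, the usual default
    Bzip2,      // code 12
    Other(u16), // anything else
}

/// Local file header signature "PK\x03\x04".
const LOCAL_HEADER_SIG: [u8; 4] = [0x50, 0x4b, 0x03, 0x04];

/// Reads the compression method of the entry whose local file header starts `bytes`.
/// Header layout: signature (4 bytes), version needed (2), flags (2), method (2, little-endian).
fn entry_method(bytes: &[u8]) -> Option<Method> {
    if bytes.len() < 10 || !bytes.starts_with(&LOCAL_HEADER_SIG) {
        return None; // not a local file header
    }
    let code = u16::from_le_bytes([bytes[8], bytes[9]]);
    Some(match code {
        0 => Method::Stored,
        8 => Method::Deflate,
        12 => Method::Bzip2,
        other => Method::Other(other),
    })
}

/// True only for entries that would require a bzip2 decoder.
fn needs_bzip2(bytes: &[u8]) -> bool {
    entry_method(bytes) == Some(Method::Bzip2)
}
```

Archives whose entries are all deflate-compressed never reach a bzip2 code path, which is why turning off the zip crate's default features should be behavior-neutral for them.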

rayon

In most Wasm environments (except maybe when using WASI in the future) there's no access to threading. Unfortunately, unlike the bzip2 case, this does not show up at compile time; the WebAssembly code instead panics at runtime. The slightly tricky question is how best to disable rayon on demand.

  • Ideally, we'd make rayon an optional (but default) feature, because that's what it is. But then anyone who has disabled default features (for example, to bundle only a small number of languages) would silently stop using rayon unless they knew they now have to re-enable it explicitly. That makes it technically a breaking change, and the documentation would have to be updated too.
  • That's why I chose the slightly less ergonomic route of adding a feature that disables rayon. It's not as nice, because we now use double negations for conditional compilation, like #[cfg(not(feature = "without-rayon"))] (if rayon isn't disabled, do this). But it's non-breaking.
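In plain std Rust, the runtime failure mode looks like this: `std::thread::spawn` panics when the platform cannot create a thread, whereas `std::thread::Builder::spawn` returns an `io::Result` that the caller can inspect. The sketch below is an illustration only, not what #19 does (that PR uses a compile-time feature switch); `sum_of_squares` is a hypothetical workload that degrades to sequential work instead of panicking.

```rust
use std::thread;

/// Sums the squares of `values`, trying to split the work across two threads.
/// If spawning fails (e.g. on a target without thread support), it falls back
/// to a sequential loop instead of panicking.
fn sum_of_squares(values: &[u64]) -> u64 {
    let mid = values.len() / 2;
    let (left, right) = values.split_at(mid);
    // Owned copy so the spawned closure is 'static.
    let right_owned: Vec<u64> = right.to_vec();

    match thread::Builder::new().spawn(move || right_owned.iter().map(|v| v * v).sum::<u64>()) {
        Ok(handle) => {
            // This thread handles the left half while the worker handles the right half.
            let left_sum: u64 = left.iter().map(|v| v * v).sum();
            left_sum + handle.join().expect("worker thread panicked")
        }
        // No threads available: do all the work on the current thread.
        Err(_) => values.iter().map(|v| v * v).sum(),
    }
}
```

The catch is that such a check only helps where the code spawns threads itself; a dependency that spawns internally and panics is not covered, which is one reason a compile-time switch is the simpler fix.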

@zacharywhitley
Author

Thanks for the reminder. I had totally forgotten about this and had gone down the rabbit hole of learning about wasm. If you're going to be using C# have you looked into using Blazor?

@martindisch
Contributor

I haven't actually had a use case for Blazor yet. As I understand it (and I might be wrong), Blazor works by compiling the full .NET runtime to Wasm, which can then load normal DLLs containing your code and its dependencies. So in that scenario you don't actually compile your own code to Wasm.

What I'm trying to do is quite different: I want to use Rust from C#, just not through the standard native FFI route you'd take with P/Invoke in .NET. It's basically WebAssembly as a portable compilation target for running lightweight, isolated modules on the server.

So far, with the modifications from the linked PR, I was able to compile some code using lingua into a Wasm module that I can load and use from C# without trouble. But I suspect language detectors such as lingua aren't the best candidates for this, since they rely on lots of data being loaded into memory (language models and the like), which comes with a relatively heavy startup cost. Ideally you'd keep this state around and reuse it for subsequent requests, but that goes against the typical WebAssembly execution model: instances are meant to be short-lived, and you'd recreate the environment for each request to maintain isolation.
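The trade-off above can be sketched in plain Rust: expensive state is initialized once per module instance (here with `std::sync::OnceLock`) and reused by every later call. `Models`, `load_models`, and `handle_request` are hypothetical stand-ins, not lingua's API. Statics live in the instance's linear memory, so if the host recreates the Wasm instance for every request to keep requests isolated, the `OnceLock` starts out empty each time and the load cost is paid again.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for expensive state such as loaded language models.
struct Models {
    entries: Vec<String>,
}

/// Counts how many times the models were actually loaded.
static LOADS: AtomicUsize = AtomicUsize::new(0);
/// Lazily initialized, then shared by all requests handled by this instance.
static MODELS: OnceLock<Models> = OnceLock::new();

fn load_models() -> Models {
    LOADS.fetch_add(1, Ordering::SeqCst);
    // A real detector would deserialize large model files here.
    Models { entries: vec!["en".to_string(), "de".to_string()] }
}

/// Handles one request: loads the models on the first call, reuses them afterwards.
fn handle_request(_text: &str) -> usize {
    let models = MODELS.get_or_init(load_models);
    // Placeholder result: a real handler would run detection on the text.
    models.entries.len()
}
```

Calling `handle_request` repeatedly on the same instance triggers `load_models` only once; that reuse is exactly what a fresh-instance-per-request model gives up.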

@pemistahl
Owner

Hi @martindisch,

I apologize for my late response. I was busy writing the Python implementation of the library. Thank you for your effort to make my library compatible with WASM environments.

As far as Rayon is concerned, I actually favor the first option of making it an optional but default feature. For users who have disabled all default features, the library will not break but only run on a single CPU core. I think people are capable of reading updated documentation and adding the Rayon feature again, that is not much work to do. The alternative, namely adding a feature that disables Rayon, is ugly in my opinion. That's not the approach that I want for my library.
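Under the optional-but-default approach, rayon would be declared as an optional dependency and listed in the crate's `default` features, and parallel code paths would be gated positively on `feature = "rayon"` rather than on the double negation `not(feature = "without-rayon")`. A minimal sketch, assuming hypothetical `score_all` and `score` functions standing in for lingua's real per-language computations:

```rust
// Only compiled when the (default) `rayon` feature is enabled.
#[cfg(feature = "rayon")]
use rayon::prelude::*;

/// Scores every candidate, in parallel when the `rayon` feature is on.
fn score_all(candidates: &[&str]) -> Vec<usize> {
    #[cfg(feature = "rayon")]
    let scores = candidates.par_iter().map(|c| score(c)).collect();

    // Single-threaded path, e.g. for `--no-default-features` Wasm builds.
    #[cfg(not(feature = "rayon"))]
    let scores = candidates.iter().map(|c| score(c)).collect();

    scores
}

/// Hypothetical placeholder scoring function: counts alphabetic characters.
fn score(candidate: &str) -> usize {
    candidate.chars().filter(|c| c.is_alphabetic()).count()
}
```

With `--no-default-features`, only the sequential branch is compiled and rayon is not built at all, so no thread pool is ever requested; users who disabled default features keep working code and only lose parallelism.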

I will make some updates to your pull request and then merge it nevertheless. Thanks again for your work. :)

@martindisch
Contributor

No worries, all in good time. I like your approach, have at it! Let me know if you want any help. And there's no rush, it's just something that came up in an experiment and there's no expectation this has to make it into the library. I'm just happy it exists!

By the way, we evaluated a bunch of language detection libraries (mainly from the C# ecosystem) at work and yours was the uncontested winner. Great job and thanks for all the effort you're putting in!

@pemistahl
Owner

> By the way, we evaluated a bunch of language detection libraries (mainly from the C# ecosystem) at work and yours was the uncontested winner.

Wow, that's cool. Thank you for your kind words. :-) I'm still a bit surprised that nobody has come up with the algorithm that I use. I've always wanted to contribute something useful to the open source community and I'm very happy that I've found something. Best regards to Switzerland. (-:

@pemistahl
Owner

pemistahl commented Mar 25, 2022

@martindisch @zacharywhitley I've finally found the time to make the library compile to WASM. Would you like to test it? The easiest way to compile is to use wasm-pack:

wasm-pack build --target web -- --no-default-features --features all-languages

--no-default-features disables parallelism, which is required because otherwise the library fails at runtime. Instead of enabling the all-languages feature, you can enable features for just the languages you want to use.

In your HTML, you can then call it like this, for instance:

<script type="module">
    import init, { LanguageDetectorBuilder } from './pkg/lingua.js';

    // init() fetches and instantiates the .wasm binary; the exports are usable once it resolves.
    init().then(_ => {
        const detector = LanguageDetectorBuilder.fromAllLanguages().build();
        console.log(detector.computeLanguageConfidenceValues("languages are awesome"));
    });
</script>

I will add unit tests for the WASM module later on.

@zacharywhitley
Author

I hope that means you are feeling better. That's great. I'll take a look, thanks.

@pemistahl
Owner

pemistahl commented Mar 25, 2022

> I hope that means you are feeling better.

Yes, I do. :) Fortunately, after three vaccinations I wasn't as sick as I had feared.

@martindisch
Contributor

Wow, you even made a nice wrapper that lets people conveniently use it from JS, opening up the library to a whole other ecosystem. That's going above and beyond! It's nice to see Rust really shine when it comes to compiling to, or interoperating with, other targets and languages. I can confirm that since it now compiles to Wasm, it already works for me; that's all that's needed for embedding it in different environments.

I'll post some news here about my latest experiment this weekend. Don't expect too much, though: as I said, this approach comes with some considerable downsides.

@martindisch
Contributor

All done, you can check it out at #54.


3 participants