Rust #1208
Comments
Support this, it would be a great feature. |
I asked the author of the Rust regex library, @BurntSushi, to help bring the Rust flavor to regex101.com.
So I think, it should be possible to just copy the Go flavor and give it the name I created a playground gist that can be compiled and run online to check special cases where the flavors can differ: |
Please don't do this. It's one thing to say, "Go is very similar to Rust, so using that as a stopgap for most cases on ASCII text will work fine." But please don't officially label it as Rust because users will ultimately get quite confused when it differs from the actual Rust implementation. :-) |
I wanted to say more precisely: |
@BurntSushi, I have a question about "flags/modifiers". |
I think special delimiters are not used in Rust. The regular expression is just a String. |
These questions seem off topic for this thread, but they can be readily answered by the docs:
|
Could the Rust regex engine be compiled into WASM and used on the website? If so, could someone create a PoC? That would speed up the process of actually getting this implemented. |
I've created a very rudimentary proof of concept following the wasm-bindgen guide. To test, clone the repo, then run |
@JonathanTroyer Thank you! Mind including a readme so I know how to run, build, etc? |
Done. Sorry for overlooking it, and thanks for working on this! Happy to help more in the future. |
@JonathanTroyer Thanks, I'll have a look this weekend most likely. Does this bundle the Rust regex engine into WASM, or are they just native bindings, relying on the user to have Rust installed locally? |
No bindings, it's fully compiled to WASM. I've got it hosted on Netlify for quick testing. |
Sweet! What size is it? |
In development mode with no optimizations, about 3MB everything included. The demo does not use all the features of Rust's regex package, so that size may grow depending on the final usage. |
@JonathanTroyer That is quite large, ideally we'd want it down to <500kb. I have followed their optimization guide, but I am unable to get |
Until they make it fully |
@firasdib How do the other implementations work? I'd assume there's less of a size restriction if you don't have to serve the binaries. Assuming it is just something like a CLI program that runs locally, would you be able to specify the required interface? If so, somebody here could likely quickly build a working implementation. |
@tgross35 They are compiled to WebAssembly and interfaced through JavaScript. The binaries will be downloaded from my server, so for the sake of both me and the users, they should be as small as possible. |
I also made a PWA (progressive web app) with Wasm/WebAssembly compiled from Rust, so it uses exactly the regex crate. |
That's pretty interesting @bestia-dev, what size were you able to get the wasm binaries down to? I think that is the main crux of supporting this. |
The rust_regex_explanation_pwa_bg.wasm file is 1MB. |
@bestia-dev are you building the regex crate with the perf features disabled? That might help reduce binary size. Not sure though. |
I took @JonathanTroyer's small example and modified the Cargo.toml a little, and rebuilt std + panic on abort on nightly. Building from https://github.com/akarras/wasm-regex The readme includes the exact wasm-pack command I used to create it. |
My bet is that you can disable some of the Unicode features too. Some are pretty arcane and not often used. I would recommend just using the following: unicode-bool, unicode-case, unicode-gencat, unicode-perl, unicode-script. In other words, disable unicode-age and unicode-segment. Probably not a huge win. If you wanted to go barebones, you could try just enabling unicode-case and unicode-perl. |
With @BurntSushi's suggestions, down to 445kb. I'm not sure what kind of API is needed, but I think that gives enough headroom to add a few things while staying under the <500kb goal. |
I wrote a quick manual json output and a replacer function to go with it https://github.com/tgross35/wasm-regex, my binary size is even smaller at 427kB. Newer versions maybe? I have npm LTS 8.19.2 and wasm-pack 0.10.3 @BurntSushi is there a good way to match up capture group numbers and names? It seems like you can iterate names |
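On matching capture group numbers to names: in the regex crate, `Regex::capture_names()` yields one `Option<&str>` per group in index order, with `None` for unnamed groups (including group 0, the whole match). A minimal sketch of turning such a sequence into a name-to-index map follows. The helper `name_to_index` is hypothetical, and it takes a plain iterator so that it compiles with the standard library alone; in real use you would pass `re.capture_names()`.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: pair each named group with its positional index.
/// `names` is assumed to be in group-index order, `None` meaning "unnamed",
/// which is the shape `Regex::capture_names()` produces.
fn name_to_index<'a, I>(names: I) -> BTreeMap<&'a str, usize>
where
    I: IntoIterator<Item = Option<&'a str>>,
{
    names
        .into_iter()
        .enumerate()
        // Keep only named groups, remembering their position.
        .filter_map(|(index, name)| name.map(|n| (n, index)))
        .collect()
}
```

The index stored for each name is the same number you would use to look the group up positionally in a match, so the map can be serialized next to the numbered groups.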
Thank you! I wasn't aware of this change.
I will check it out! |
@tgross35 While it's fine you return the values hex encoded, they are being treated as literals (transferred to js as P.s., |
Hm, is it rendering correctly on the page? It seems like console tends to print with the escape characters, but it shows up correctly in HTML.
Good catch, all gone :) |
Just a short progress update. Things are moving forward, albeit a bit slowly. I'm almost done implementing the regex parser for Rust, and will then proceed with the other necessary adjustments. As it stands right now, the |
Thanks for the update, that all sounds good for a start. I'll revisit the non unicode stuff after the other stuff is working 👍 |
How do you guys recommend we handle substitution strings? In other languages, you are able to insert |
Every regex engine has their own replacement string stuff. There isn't a ton of consistency there. The I'm not sure what you mean by inserting |
It would help to show an example. |
@BurntSushi You're right, clumsy formulation on my part. I meant regarding the string literals used. In the other languages, I've opted for a string type that allows for escapes to be included, i.e. |
Those are fine. |
Are you just talking about how js handles the escapes and how it's displayed for this kind of info?
|
Sorry, I may have confused you. I am talking specifically about the substitution string. Using the code @tgross35 provided, you can't insert a newline by using the string |
It does seem like it's working as I would expect from the library side That doesn't work in the little gui thingy in that wasm demo, but I think that's just because js automatically adds escapes to |
@tgross35 That's the problem, it needs to work from the GUI. The string needs to be expanded on the Rust side of things :). The users will insert If that's not possible, I'll have to expand them on the JS side. |
This is only needed for the replace string and not the content, since it's not multiline on the GUI - right? It shouldn't be too bad to do them on the Rust side. How do the other languages handle it? They need to handle all JS double escapes, correct? https://www.tutorialspoint.com/escape-characters-in-javascript |
Actually I guess they would technically need to do the Rust escapes, which also have a couple of other tricks https://doc.rust-lang.org/reference/tokens.html#quote-escapes and this might also apply to the input |
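To make the "expand on the Rust side" idea concrete, here is a minimal hand-rolled sketch that expands a small subset of Rust's quote escapes in a substitution string received from the GUI. It is not the rustc lexer and not the code used by any of the linked repos; the function name `unescape` and the error messages are assumptions, and `\x..` and `\u{...}` escapes are deliberately left out.

```rust
/// Hypothetical sketch: expand `\n`, `\r`, `\t`, `\0`, `\\`, `\"` and `\'`.
/// Any other escape, or a trailing backslash, is reported as an error
/// instead of being passed through silently.
fn unescape(input: &str) -> Result<String, String> {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            // Ordinary character: copy as-is (this includes `$1`-style references).
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('r') => out.push('\r'),
            Some('t') => out.push('\t'),
            Some('0') => out.push('\0'),
            Some('\\') => out.push('\\'),
            Some('"') => out.push('"'),
            Some('\'') => out.push('\''),
            Some(other) => return Err(format!("unknown escape: \\{}", other)),
            None => return Err("trailing backslash".to_string()),
        }
    }
    Ok(out)
}
```

With this, the two characters `\` and `n` typed into the substitution field become a real newline before the string reaches the replace call, while a plain `$` is left untouched for the regex engine to interpret.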
Is there Go code that already does this for the Go regex engine? If so, it might be good to be able to just port that. |
Honestly I guess we could just use the literal rustc lexer https://docs.rs/rustc_lexer/latest/rustc_lexer/unescape/index.html even though the published one is unfortunately a three-year-old version. Guess the implementation doesn't change much |
It's no problem, I can do it on my end - I just wanted to double check if there was a way to handle it in Rust without my intervention. |
No worries, writing a wrapper for the rustc lexer will be quick, and it already handles everything exactly how the language does. I'll add it in a bit |
Okay, cool - think this might be what you need. There are now 2 (for find) or 3 (for the replaces) optional parameters to set validation/unescaping for each of the 2/3 input strings. They accept the values Error example: (my fork is up to date) |
Thank you for all your help everybody, especially @tgross35! I will have an initial release for Rust in the near future, and we can improve on it where necessary. |
That is awesome news!! Feel free to tag me when bugfixes pop up. |
@firasdib just a minor nit - the delims are I think this propagates to the unescaping algorithm too, Anyway, thank you for the awesome site and all the work on this, it looks great! |
w00t! Awesome work everyone! |
@firasdib FYI, this ticket still needs to move on the project board: https://github.com/firasdib/Regex101/projects/3 |
Flavor Request
The syntax is similar to Perl, but I feel it has enough differences to justify a different flavor, especially when one considers the massive popularity of ripgrep (which is used by VSCode!) and the growth of Rust.