Closed
Description
What version of regex are you using?
If it isn't the latest version, then please upgrade and check whether the bug
is still present.
Describe the bug at a high level.
regex 1.4.4 breaks the Windows build of grpcio.
What are the steps to reproduce the behavior?
Just run `cargo test --all` on Windows.
What is the actual behavior?
The build fails when generating bindings with rust-bindgen, which uses regex to process sources.
After downgrading regex to a version before e040c1b, the build finishes successfully, so the bug was probably introduced by #749.
What is the expected behavior?
It compiles successfully.
What do you expect the output to be?
Activity
BurntSushi commented on Mar 13, 2021
I need a smaller reproduction please. There is no obvious reason that I see why any recent changes would cause a stack overflow. It also isn't clear whether you're reporting a compilation error or something that happens at regex runtime. If the former, that then sounds like a rustc bug, no? If the latter, a stack trace would be helpful.
BusyJay commented on Mar 13, 2021
I'm not familiar enough with the Windows platform to provide a stack trace, sorry. It's a compilation error because bindgen fails to generate bindings, and it fails to generate them because of a stack overflow. rust-bindgen is the only dependency of grpcio-sys that uses regex. I guess some code in rust-bindgen invokes regex heavily and causes the stack overflow.
I can confirm that wrapping `T` in a `Box` gets around the issue, although I'm not sure whether it's the correct fix:
regex/src/pool.rs, line 121 in 951b8b9
Before the PR, all values were allocated on the heap when the `perf-cache` feature was not enabled.
BurntSushi commented on Mar 13, 2021
I'm not familiar enough with Windows either.
Your patch that fixes the issue is quite weird. I wonder whether this is not a case of recursion causing a stack overflow but rather of too many things on the stack itself. If Windows has smaller stack sizes than, let's say, Linux or macOS, that could explain why Windows specifically is having a problem. But for this to be true, I think you would need to put a lot of regexes on the stack.
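A minimal sketch of the "too many things on the stack" hypothesis, and of why the `Box` workaround helps: a struct that stores large values inline is at least as big as all of them combined, while boxing each field leaves only one pointer per field in the struct itself. `Big`, `InlineOptions`, and `BoxedOptions` are hypothetical types, not regex or bindgen code; the 848-byte payload only mirrors the `Pool` size BusyJay reports from rustc.

```rust
use std::mem::size_of;

// Hypothetical stand-in for a large value such as a regex's pool.
#[allow(dead_code)]
struct Big {
    data: [u8; 848],
}

// Values stored inline: every field contributes its full size.
#[allow(dead_code)]
struct InlineOptions {
    a: Big,
    b: Big,
    c: Big,
}

// The same values stored behind Box: each field is just a heap pointer.
#[allow(dead_code)]
struct BoxedOptions {
    a: Box<Big>,
    b: Box<Big>,
    c: Box<Big>,
}

fn main() {
    println!("Big:           {} bytes", size_of::<Big>());
    println!("InlineOptions: {} bytes", size_of::<InlineOptions>());
    println!("BoxedOptions:  {} bytes", size_of::<BoxedOptions>());
}
```

A value of `InlineOptions` kept in a local variable occupies 2,544 bytes of stack, whereas `BoxedOptions` occupies three pointers (24 bytes on a 64-bit target); the 848-byte payloads move to the heap.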
BusyJay commented on Mar 13, 2021
According to the rustc output, the new `Pool` size is 848 bytes. Is that expected? The default stack size of a thread in Rust is 2 MiB. A full 2 MiB stack holds at most about 2,473 values of that size, and if half of the stack is used by other stack frames, users can't create more than about 1,236 regexes.
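The estimate above is plain integer division of a stack budget by the per-value size. A small sketch of that arithmetic follows; `values_that_fit` is a hypothetical helper, and the 1 MiB case reflects the Windows main-thread stack size reported later in the thread.

```rust
const KIB: usize = 1024;
const MIB: usize = 1024 * KIB;

/// How many values of `value_size` bytes fit into `budget` bytes of stack,
/// ignoring alignment padding and all other frame contents.
fn values_that_fit(budget: usize, value_size: usize) -> usize {
    budget / value_size
}

fn main() {
    let pool_size = 848; // `Pool` size reported by rustc

    // Whole 2 MiB default stack of a spawned Rust thread.
    println!("2 MiB, all of it:  {}", values_that_fit(2 * MIB, pool_size));
    // Half of that 2 MiB, the rest being used by other frames.
    println!("2 MiB, half of it: {}", values_that_fit(MIB, pool_size));
    // Half of a 1 MiB stack (the Windows main-thread default).
    println!("1 MiB, half of it: {}", values_that_fit(MIB / 2, pool_size));
}
```

This prints 2473, 1236, and 618 respectively, so the same program has roughly a quarter of the headroom on a 1 MiB Windows main thread that it has on a whole 2 MiB spawned thread.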
I think this is a problem common to all platforms. It may always overflow the stack on Windows because different platforms expose different symbols; perhaps rust-bindgen simply has to handle more symbols on Windows than on other platforms.
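The stack size of a spawned thread is configurable, which is a general workaround when stack-heavy work overflows the default (it is not the fix adopted in this thread, and it does not affect the main thread). A sketch using `std::thread::Builder::stack_size`; `stack_heavy_work` is a hypothetical workload that keeps a 3 MiB buffer on its stack, more than the 2 MiB spawned-thread default.

```rust
use std::thread;

const WORKER_STACK_SIZE: usize = 16 * 1024 * 1024; // 16 MiB

// Hypothetical workload: a 3 MiB array lives on this function's stack frame.
fn stack_heavy_work() -> usize {
    let buf = [1u8; 3 * 1024 * 1024];
    buf.iter().map(|&b| b as usize).sum()
}

// Runs the workload on a thread whose stack is explicitly enlarged.
fn run_on_big_stack() -> usize {
    thread::Builder::new()
        .name("big-stack-worker".into())
        .stack_size(WORKER_STACK_SIZE)
        .spawn(stack_heavy_work)
        .expect("failed to spawn worker thread")
        .join()
        .expect("worker thread panicked")
}

fn main() {
    println!("sum = {}", run_on_big_stack());
}
```

Calling `stack_heavy_work` directly on a thread with a 1 MiB or 2 MiB stack would be expected to overflow, while the 16 MiB worker has ample room.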
BusyJay commented on Mar 13, 2021
Actually, it's true. I got the 2 MiB figure from https://doc.rust-lang.org/std/thread/index.html#stack-size, but I missed the last line of that section.
After checking the docs and writing small snippets to verify, I found that the main thread's stack size on macOS and Linux is the same as `ulimit -s`, which is usually 8 MiB. On Windows it is 1 MiB.
BurntSushi commented on Mar 14, 2021
@jdm One thing that would be useful is a pointer to the source code where bindgen uses regexes. If there are a lot of them on the stack, then I think that would explain things here.
BurntSushi commented on Mar 14, 2021
servo/servo#28269 is the tracking bug in servo for this, as they are hitting it too.
BurntSushi commented on Mar 14, 2021
I'm working on a patch for this now.
impl: substantially reduce regex stack size
jdm commented on Mar 14, 2021
Every use of `RegexSet` in https://github.com/rust-lang/rust-bindgen/blob/dedbea5bc0317be4e7fd47a49392a6b080f47ac8/src/lib.rs#L1592 stores a `regex::RegexSet` inline. `BindgenOptions` is stored in https://github.com/rust-lang/rust-bindgen/blob/dedbea5bc0317be4e7fd47a49392a6b080f47ac8/src/lib.rs#L2050, and that is stored on the stack.
BurntSushi commented on Mar 14, 2021
@jdm OMG. That's so many regexes! Hahahaha. That has to be it.
I opened #752, which shrinks the size of `Regex` from 856 bytes to 16. Lol. It actually used to be 552 bytes before 1.4.4, so it was never particularly small. So it sounds like it must have crossed a stack size threshold somewhere.
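A sketch of the general technique for shrinking a type this way: keep only pointers in the handle and move the large state to the heap. The types below (`ReadOnly`, `Cache`, `FatHandle`, `ThinHandle`) are hypothetical and are not regex's actual internals; the sketch only shows that a struct holding one `Arc` and one `Box` is two pointers wide (16 bytes on a 64-bit target), however large the pointed-to data is.

```rust
use std::mem::size_of;
use std::sync::Arc;

// Hypothetical shareable, read-only state (e.g. compiled programs).
#[allow(dead_code)]
struct ReadOnly {
    programs: [u64; 64], // 512 bytes
}

// Hypothetical per-handle mutable scratch space.
#[allow(dead_code)]
struct Cache {
    scratch: [u64; 40], // 320 bytes
}

// Everything inline: each handle on the stack costs 832 bytes.
#[allow(dead_code)]
struct FatHandle {
    ro: ReadOnly,
    cache: Cache,
}

// Shared state behind Arc, scratch space behind Box: just two pointers.
#[allow(dead_code)]
struct ThinHandle {
    ro: Arc<ReadOnly>,
    cache: Box<Cache>,
}

fn main() {
    println!("FatHandle:  {} bytes", size_of::<FatHandle>());
    println!("ThinHandle: {} bytes", size_of::<ThinHandle>());
}
```

The trade-off is an extra heap allocation and pointer indirection per handle in exchange for a handle small enough that embedding many of them in a struct, as bindgen's options do, stays cheap on the stack.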