Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RegexSet references cannot be passed across unwind boundaries #576

Closed
bdonlan opened this issue Apr 18, 2019 · 7 comments · Fixed by #749
Closed

RegexSet references cannot be passed across unwind boundaries #576

bdonlan opened this issue Apr 18, 2019 · 7 comments · Fixed by #749

Comments

@bdonlan
Copy link

bdonlan commented Apr 18, 2019

The following minimal test case fails on regex 1.1.6:

fn main() {
    let rs = regex::RegexSet::new(vec![""]).unwrap();
    let rrs = &rs;
    std::panic::catch_unwind(move || rrs.matches(""));
}

with the following error output:

error[E0277]: the type `std::cell::UnsafeCell<std::option::Option<std::boxed::Box<std::cell::RefCell<regex::exec::ProgramCacheInner>>>>` may contain interior mutability and a reference may not be safely transferrable across a catch_unwind boundary
 --> src/main.rs:4:5
  |
4 |     std::panic::catch_unwind(move || rrs.matches(""));
  |     ^^^^^^^^^^^^^^^^^^^^^^^^ `std::cell::UnsafeCell<std::option::Option<std::boxed::Box<std::cell::RefCell<regex::exec::ProgramCacheInner>>>>` may contain interior mutability and a reference may not be safely transferrable across a catch_unwind boundary
  |
  = help: within `regex::re_set::unicode::RegexSet`, the trait `std::panic::RefUnwindSafe` is not implemented for `std::cell::UnsafeCell<std::option::Option<std::boxed::Box<std::cell::RefCell<regex::exec::ProgramCacheInner>>>>`
  = note: required because it appears within the type `thread_local::CachedThreadLocal<std::cell::RefCell<regex::exec::ProgramCacheInner>>`
  = note: required because it appears within the type `regex::exec::Exec`
  = note: required because it appears within the type `regex::re_set::unicode::RegexSet`
  = note: required because of the requirements on the impl of `std::panic::UnwindSafe` for `&regex::re_set::unicode::RegexSet`
  = note: required because it appears within the type `[closure@src/main.rs:4:30: 4:53 rrs:&regex::re_set::unicode::RegexSet]`
  = note: required by `std::panic::catch_unwind`

error[E0277]: the type `std::cell::UnsafeCell<regex::exec::ProgramCacheInner>` may contain interior mutability and a reference may not be safely transferrable across a catch_unwind boundary
 --> src/main.rs:4:5
  |
4 |     std::panic::catch_unwind(move || rrs.matches(""));
  |     ^^^^^^^^^^^^^^^^^^^^^^^^ `std::cell::UnsafeCell<regex::exec::ProgramCacheInner>` may contain interior mutability and a reference may not be safely transferrable across a catch_unwind boundary
  |
  = help: within `regex::re_set::unicode::RegexSet`, the trait `std::panic::RefUnwindSafe` is not implemented for `std::cell::UnsafeCell<regex::exec::ProgramCacheInner>`
  = note: required because it appears within the type `std::cell::RefCell<regex::exec::ProgramCacheInner>`
  = note: required because it appears within the type `std::marker::PhantomData<std::cell::RefCell<regex::exec::ProgramCacheInner>>`
  = note: required because it appears within the type `thread_local::ThreadLocal<std::cell::RefCell<regex::exec::ProgramCacheInner>>`
  = note: required because it appears within the type `thread_local::CachedThreadLocal<std::cell::RefCell<regex::exec::ProgramCacheInner>>`
  = note: required because it appears within the type `regex::exec::Exec`
  = note: required because it appears within the type `regex::re_set::unicode::RegexSet`
  = note: required because of the requirements on the impl of `std::panic::UnwindSafe` for `&regex::re_set::unicode::RegexSet`
  = note: required because it appears within the type `[closure@src/main.rs:4:30: 4:53 rrs:&regex::re_set::unicode::RegexSet]`
  = note: required by `std::panic::catch_unwind`

error[E0277]: the type `std::cell::UnsafeCell<isize>` may contain interior mutability and a reference may not be safely transferrable across a catch_unwind boundary
 --> src/main.rs:4:5
  |
4 |     std::panic::catch_unwind(move || rrs.matches(""));
  |     ^^^^^^^^^^^^^^^^^^^^^^^^ `std::cell::UnsafeCell<isize>` may contain interior mutability and a reference may not be safely transferrable across a catch_unwind boundary
  |
  = help: within `regex::re_set::unicode::RegexSet`, the trait `std::panic::RefUnwindSafe` is not implemented for `std::cell::UnsafeCell<isize>`
  = note: required because it appears within the type `std::cell::Cell<isize>`
  = note: required because it appears within the type `std::cell::RefCell<regex::exec::ProgramCacheInner>`
  = note: required because it appears within the type `std::marker::PhantomData<std::cell::RefCell<regex::exec::ProgramCacheInner>>`
  = note: required because it appears within the type `thread_local::ThreadLocal<std::cell::RefCell<regex::exec::ProgramCacheInner>>`
  = note: required because it appears within the type `thread_local::CachedThreadLocal<std::cell::RefCell<regex::exec::ProgramCacheInner>>`
  = note: required because it appears within the type `regex::exec::Exec`
  = note: required because it appears within the type `regex::re_set::unicode::RegexSet`
  = note: required because of the requirements on the impl of `std::panic::UnwindSafe` for `&regex::re_set::unicode::RegexSet`
  = note: required because it appears within the type `[closure@src/main.rs:4:30: 4:53 rrs:&regex::re_set::unicode::RegexSet]`
  = note: required by `std::panic::catch_unwind`

error: aborting due to 3 previous errors

For more information about this error, try `rustc --explain E0277`.
error: Could not compile `testapp`.

To learn more, run the command again with --verbose.
@bdonlan
Copy link
Author

bdonlan commented Apr 18, 2019

Note that this is an issue for Regex types as well, not just RegexSet.

@BurntSushi
Copy link
Member

Did this use to work? (I don't think it did.) So are you reported this as a feature request?

It's not immediately obvious to me that this should work. But it's probably okay for regex internals to assert that it is unwind safe since interior mutability is never used for correctness, only performance.

@bdonlan
Copy link
Author

bdonlan commented Apr 19, 2019

I'm not sure if this was supported in the past; it was a bit of a sharp edge that bit me, as I'd assume normally that immutable objects would be unwind safe. If the interior mutability is only used for caching, perhaps the safest approach would be to drop the cache if a panic occurs inside the regex engine?

BurntSushi added a commit that referenced this issue Sep 2, 2019
This makes the thread_local (and by consequence, lazy_static) crates
optional by providing a naive caching mechanism when perf-cache is
disabled. This is achieved by defining a common API and implementing it
via both approaches.

The one tricky bit here is to ensure our naive version implements the
same auto-traits as the fast version. Since we just use a plain mutex,
it impls RefUnwindSafe, but thread_local does not. So we forcefully
remove the RefUnwindSafe impl from our safe variant.

We should be able to implement RefUnwindSafe in both cases, but
this likely requires some mechanism for clearing the regex cache
automatically if a panic occurs anywhere during search. But that's a
more invasive change and is part of  #576.
BurntSushi added a commit that referenced this issue Sep 2, 2019
This makes the thread_local (and by consequence, lazy_static) crates
optional by providing a naive caching mechanism when perf-cache is
disabled. This is achieved by defining a common API and implementing it
via both approaches.

The one tricky bit here is to ensure our naive version implements the
same auto-traits as the fast version. Since we just use a plain mutex,
it impls RefUnwindSafe, but thread_local does not. So we forcefully
remove the RefUnwindSafe impl from our safe variant.

We should be able to implement RefUnwindSafe in both cases, but
this likely requires some mechanism for clearing the regex cache
automatically if a panic occurs anywhere during search. But that's a
more invasive change and is part of  #576.
BurntSushi added a commit that referenced this issue Sep 3, 2019
This makes the thread_local (and by consequence, lazy_static) crates
optional by providing a naive caching mechanism when perf-cache is
disabled. This is achieved by defining a common API and implementing it
via both approaches.

The one tricky bit here is to ensure our naive version implements the
same auto-traits as the fast version. Since we just use a plain mutex,
it impls RefUnwindSafe, but thread_local does not. So we forcefully
remove the RefUnwindSafe impl from our safe variant.

We should be able to implement RefUnwindSafe in both cases, but
this likely requires some mechanism for clearing the regex cache
automatically if a panic occurs anywhere during search. But that's a
more invasive change and is part of  #576.
@rtkaratekid
Copy link

Just wanted to chime in that when using libfuzz or afl.rs this is also an issue because you need std::panic::RefUnwindSafe implemented for objects within the closure. If you're trying to fuzz something that involves regex matching, you just wont be able to (at least from my rust newbie perspective).

@dpathakj
Copy link

The situation may have changed here because of changes upstream in thread_local. The statement in the #[cfg(not(feature = "perf-cache"))] impl of Cached<T> states that "CachedThreadLocal impls Send, Sync and UnwindSafe, but NOT RefUnwindSafe." That doesn't appear to be the case on tip of master, because (Cached)ThreadLocal is now RefUnwindSafe. If you remove that PhantomData you'll get an implementation that passes the testcase above regardless of whether perf-cache is enabled. (Undoing f6f8276 breaks it again.) Should the PhantomData be removed or is there another reason specific to this crate that it should remain?

(I noticed this because the build is broken due to a warning. CachedThreadLocal is deprecated and now just passes through to ThreadLocal. That relates to this issue because the fix (just use ThreadLocal) should probably also adjust the comment, but just replacing "CachedThreadLocal" with "ThreadLocal" in the comment doesn't seem to make it right.)

@BurntSushi
Copy link
Member

@dpathakj Yeah, I'm working on addressing this. I have a newborn at home though, so it's tough. Basically, my plan here is to actually remove the thread_local dependency. I've had this thought for a while since it is the root cause of some use cases resulting in bad memory leaks. Specifically, what happens is when a regex is used across lots of different threads and where (I suppose) some of those threads become idle but not actually destroyed, each thread winds up having a copy mutable scratch space used during matching. There have been a few issues about it I think. This one comes to mind: BurntSushi/rure-go#3

The main problem with removing the dependency is the reason why thread_local exists in the first place: it's really fast. So I need to do a bit of a benchmark investigation to see what I can do to replace it without sacrificing too much performance. Ideally without using any unsafe code, since the sort of unsafe required to make thread_local work is really quite tricky from my perspective and I don't really trust myself to get it right without a lot of deep thought and effort that I don't really have time for at the moment.

BurntSushi added a commit that referenced this issue Mar 12, 2021
This commit removes the thread_local dependency (even as an optional
dependency) and replaces it with a more purpose driven memory pool. The
comments in src/pool.rs explain this in more detail, but the short story
is that thread_local seems to be at the root of some memory leaks
happening in certain usage scenarios.

The great thing about thread_local though is how fast it is. Using a
simple Mutex<Vec<T>> is easily at least twice as slow. We work around
that a bit by coding a simplistic fast path for the "owner" of a pool.
This does require one new use of `unsafe`, of which we extensively
document.

This now makes the 'perf-cache' feature a no-op. We of course retain it
for compatibility purposes (and perhaps it will be used again in the
future), but for now, we always use the same pool.

As for benchmarks, it is likely that *some* cases will get a hair
slower. But there shouldn't be any dramatic difference. A careful review
of micro-benchmarks in addition to more holistic (albeit ad hoc)
benchmarks via ripgrep seems to confirm this.

Now that we have more explicit control over the memory pool, we also
clean stuff up with repsect to RefUnwindSafe.

Fixes #362, Fixes #576

Ref BurntSushi/rure-go#3
BurntSushi added a commit that referenced this issue Mar 12, 2021
This commit removes the thread_local dependency (even as an optional
dependency) and replaces it with a more purpose driven memory pool. The
comments in src/pool.rs explain this in more detail, but the short story
is that thread_local seems to be at the root of some memory leaks
happening in certain usage scenarios.

The great thing about thread_local though is how fast it is. Using a
simple Mutex<Vec<T>> is easily at least twice as slow. We work around
that a bit by coding a simplistic fast path for the "owner" of a pool.
This does require one new use of `unsafe`, of which we extensively
document.

This now makes the 'perf-cache' feature a no-op. We of course retain it
for compatibility purposes (and perhaps it will be used again in the
future), but for now, we always use the same pool.

As for benchmarks, it is likely that *some* cases will get a hair
slower. But there shouldn't be any dramatic difference. A careful review
of micro-benchmarks in addition to more holistic (albeit ad hoc)
benchmarks via ripgrep seems to confirm this.

Now that we have more explicit control over the memory pool, we also
clean stuff up with repsect to RefUnwindSafe.

Fixes #362, Fixes #576

Ref BurntSushi/rure-go#3
@BurntSushi
Copy link
Member

This should be fixed in regex 1.4.4. All regex types now impl RefUnwindSafe.

This was referenced Mar 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants