Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stack Overflow on Windows only (regression 1.1.0) #23

Closed
pragmatrix opened this issue Jan 9, 2021 · 12 comments
Closed

Stack Overflow on Windows only (regression 1.1.0) #23

pragmatrix opened this issue Jan 9, 2021 · 12 comments

Comments

@pragmatrix
Copy link

Hi, maintainer of rust-skia here. Since a two days we were getting stack overflow errors with a binding generation step on Windows only on our nightly builds, and the cause was a transitive SemVer update of thread_local dependency from 1.0.1 to 1.1.0.

Sticking to 1.0.1 fixed the issue.

Here is the dependency tree:

├── bindgen v0.56.0
│   ├── bitflags v1.2.1
│   ├── cexpr v0.4.0
│   │   └── nom v5.1.2
│   │       └── memchr v2.3.4
│   │       [build-dependencies]
│   │       └── version_check v0.9.2
│   ├── clang-sys v1.0.1
│   │   ├── glob v0.3.0
│   │   ├── libc v0.2.82
│   │   └── libloading v0.6.6
│   │       └── winapi v0.3.9
│   │   [build-dependencies]
│   │   └── glob v0.3.0
│   ├── clap v2.33.3
│   │   ├── atty v0.2.14
│   │   │   └── winapi v0.3.9
│   │   ├── bitflags v1.2.1
│   │   ├── strsim v0.8.0
│   │   ├── textwrap v0.11.0
│   │   │   └── unicode-width v0.1.8
│   │   ├── unicode-width v0.1.8
│   │   └── vec_map v0.8.2
│   ├── env_logger v0.8.2
│   │   ├── atty v0.2.14 (*)
│   │   ├── humantime v2.0.1
│   │   ├── log v0.4.11
│   │   │   └── cfg-if v0.1.10
│   │   ├── regex v1.4.3
│   │   │   ├── aho-corasick v0.7.15
│   │   │   │   └── memchr v2.3.4
│   │   │   ├── memchr v2.3.4
│   │   │   ├── regex-syntax v0.6.22
│   │   │   └── thread_local v1.0.1
│   │   │       └── lazy_static v1.4.0
│   │   └── termcolor v1.1.2
│   │       └── winapi-util v0.1.5
│   │           └── winapi v0.3.9
│   ├── lazy_static v1.4.0
│   ├── lazycell v1.3.0
│   ├── log v0.4.11 (*)
│   ├── peeking_take_while v0.1.2
│   ├── proc-macro2 v1.0.24
│   │   └── unicode-xid v0.2.1
│   ├── quote v1.0.8
│   │   └── proc-macro2 v1.0.24 (*)
│   ├── regex v1.4.3 (*)
│   ├── rustc-hash v1.1.0
│   ├── shlex v0.1.1
│   └── which v3.1.1
│       └── libc v0.2.82

Any ideas what could have caused that?

@Amanieu
Copy link
Owner

Amanieu commented Jan 9, 2021

The internal data structure was rewritten in #22 to use a concurrent vector instead of a hash table. It should behave identically to the previous version, just faster.

Can you get a backtrace of the crash>

@pragmatrix
Copy link
Author

Hmm, the stacktrace just points to the function where it happens ensure_libclang_is_loaded in bindgen, here is the suspicious code:

bindgen:

    if clang_sys::is_loaded() {
        return;
    }

    // XXX (issue #350): Ensure that our dynamically loaded `libclang`
    // doesn't get dropped prematurely, nor is loaded multiple times
    // across different threads.

    lazy_static! {
        static ref LIBCLANG: std::sync::Arc<clang_sys::SharedLibrary> = {
            clang_sys::load().expect("Unable to find libclang");
            clang_sys::get_library().expect(
                "We just loaded libclang and it had better still be \
                 here!",
            )
        };
    }

    clang_sys::set_library(Some(LIBCLANG.clone()));

clang_sys::is_loaded:

        pub fn is_loaded() -> bool {
            LIBRARY.with(|l| l.borrow().is_some())
        }

clang_sys::load:

        #[allow(dead_code)]
        pub fn load() -> Result<(), String> {
            let library = Arc::new(load_manually()?);
            LIBRARY.with(|l| *l.borrow_mut() = Some(library));
            Ok(())
        }
 thread_local!(static LIBRARY: RefCell<Option<Arc<SharedLibrary>>> = RefCell::new(None));

get_library:

        pub fn get_library() -> Option<Arc<SharedLibrary>> {
            LIBRARY.with(|l| l.borrow_mut().clone())
        }

set_library:

        pub fn set_library(library: Option<Arc<SharedLibrary>>) -> Option<Arc<SharedLibrary>> {
            LIBRARY.with(|l| mem::replace(&mut *l.borrow_mut(), library))
        }

I am not sure where exactly the stack overflow happens, it looks like a conflict with the use of thread_local from within a lazy_static initialization block.

@Amanieu
Copy link
Owner

Amanieu commented Jan 9, 2021

Can you give me instructions to reproduce this crash? I'll have a look.

@pragmatrix
Copy link
Author

I've prepared a branch with all unnecessary build steps removed.

With

git clone https://github.com/pragmatrix/rust-skia.git --branch thread-local-issue-23
cd rust-skia
cargo build -vv

the stack overflow should be reproducible on a Windows machine.

@pragmatrix
Copy link
Author

I've forgot, you also need a recent version of the Microsoft Build Tools and clang/libclang (LLVM) in the path.

@Amanieu
Copy link
Owner

Amanieu commented Jan 9, 2021

I'm running this on Linux but can't reproduce the issue.

@pragmatrix pragmatrix changed the title Stack Overflow (regression 1.1.0) Stack Overflow on Windows only (regression 1.1.0) Jan 10, 2021
@Amanieu
Copy link
Owner

Amanieu commented Jan 10, 2021

Can you post a full backtrace from the debugger? If it's a stack overflow then it should be repeating itself with the same functions getting called recursively.

@Amanieu
Copy link
Owner

Amanieu commented Jan 10, 2021

Also note that the thread_local! macro is provided by std, it is not related to this crate. In fact, from your dependency graph, ThreadLocal is only uaed within regex, so you might want to see if there's something wrong with your use of regex.

@pragmatrix
Copy link
Author

The actual recursion steps aren't shown in Visual Studio, the function ensure_libclang_is_loaded is shown as the last step of the call trace when I run the executable with VS as a debugger attached.

so you might want to see if there's something wrong with your use of regex.

Thank you for pointing to that. I've removed some of the regular expression strings that set up the options for Bindgen, and after a certain number, the stack overflow disappeared. For now it seems to be related to the number of regular expressions that are initialized before Bindgen is started.

@Amanieu
Copy link
Owner

Amanieu commented Jan 10, 2021

Maybe the regex are misbehaving somehow? Can you check if the same options are passed to bindgen with thread_local 1.0.1 vs 1.1.0?

@pragmatrix
Copy link
Author

pragmatrix commented Mar 13, 2021

A recent update to regex version 1.4.4 (which removes the thread_local dependency), causes a stack overflow on Windows at the same build step, too. The scope of this problem seems to widen: rust-lang/regex#750

@Amanieu
Copy link
Owner

Amanieu commented Mar 13, 2021

This doesn't seem to be a problem with thread_local, so I'm closing this issue.

@Amanieu Amanieu closed this as completed Mar 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants