Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mismatched use of fiber and thread locals #869

Open
Zoxc opened this issue Mar 23, 2024 · 9 comments
Open

Mismatched use of fiber and thread locals #869

Zoxc opened this issue Mar 23, 2024 · 9 comments

Comments

@Zoxc
Copy link

Zoxc commented Mar 23, 2024

On Windows, mimalloc uses fiber locals to detect the end of a fiber, then uses that to free the thread local heap, instead of freeing the thread local heap when the thread ends.

@res2k
Copy link
Contributor

res2k commented Mar 23, 2024

If you're not explicitly messing around with fibers on a thread, I think threads and fibers are practically the same (the thread could be seen as having exactly one fiber, I guess; or maybe thread and fiber are the same).
Also, the FLS functions seem to be consistently used for "per-thread" data, so it's plausible that even when dealing with fibers, things may work out OK.
Whether anyone did that, I don't know.
If you happen to work with multiple fibers, and ran into problems, maybe describe what's happening?

@daanx
Copy link
Collaborator

daanx commented Mar 23, 2024

Thanks @Zocx, @res2k. When mimalloc is statically linked, the fiber local storage is used to detect when a thread is terminated.

  • Thread start: how do we detect a thread start? Well, we don't :-) -- except by checking the thread local variable _mi_default_heap and initializing it if it is pointing to the default empty heap. So, on a first allocation in a new thread this is done (and all fibers associated with that thread share the thread local _mi_default_heap). The initialization also calls the prim_thread_associate_default_heap which sets the fiber local variable to a non NULL value -- this is for the initial fiber associated with this thread that allocates. Any further fibers associated with this thread won't set the value since the heap itself is still thread local (and is initialized now).

  • When any fiber is terminated, the mi_fls_done is called with the current fiber local variable but only when it is not NULL the mi_thread_done is called -- so only for the initial fiber the _mi_thread_done is called.

  • This works fine -- except if someone would explicitly create a new fiber (or fibers) to run on and then terminate the initial fiber associated with the thread... not sure how we can fix that robustly, or if that is worth fixing. I would say just don't do that and always keep the initial fiber alive while the thread is alive, and only terminate fibers that are subsequently explicitly created. Or link mimalloc dynamically as a DLL as in that case the DLL_THREAD_DETACH messages are used (and none of the fiber things apply)

Thanks for bringing this up, always good to rethink how this works. Maybe we should keep a thread local fiber count to fix the last point although I am not sure it would be worth fixing since fibers are hardly used nowadays.

@Zoxc
Copy link
Author

Zoxc commented Mar 23, 2024

There's also this trick / (in chromium). That's what I'm using in my Rust rewrite of mimalloc. It's also helpful in ensuring that the callback happens after the callback which destroys Rust's standard library's thread locals. There's apparently an issue with loaded DLLs though.

@res2k
Copy link
Contributor

res2k commented Mar 23, 2024

There's also this trick / (in chromium).

The thing is, FlsAlloc() allows to specify a callback, which is essentially a "thread end callback", at least in the (typical) case that fibers aren't really used explicitly. Additionally, since it's part of the Windows API, it's much easier to use this functionality across different compilers than your linked trick, which, as given, would only work on MSVC, perhaps clang, but would certainly need something else for gcc.

@daanx
Copy link
Collaborator

daanx commented Mar 23, 2024

Yes, as @res2k mentions, using the special data segment only works for MSVC; mimalloc actually uses that for process initialization detection for static libraries using msvc (see the end of init.c, using the .CRT$XIU data section). As such, the fiber API seems the most robust solution (if we use static linking, otherwise using DLL thread detach seems best). Do you happen to have a link to the rust trouble with thread locals?

@Zoxc
Copy link
Author

Zoxc commented Mar 23, 2024

Are you sure about mingw / GNU not supporting that section? I don't see a fallback path for GNU in Rust's standard library. It's kind of MSVC 6 or UCRT based anyway.

Do you happen to have a link to the rust trouble with thread locals?

Not sure what you mean here. Here's the implementation of thread local destructors in Rust's standard library if that's relevant.

@res2k
Copy link
Contributor

res2k commented Mar 24, 2024

Are you sure about mingw / GNU not supporting that section? I don't see a fallback path for GNU in Rust's standard library. It's kind of MSVC 6 or UCRT based anyway.

Well, to be sure, I tried it with Compiler Explorer: https://godbolt.org/z/4Gv9qdGKK
Unfortunately, MSVC binary execution is not supported on CE. However, compiling & running locally, the output/return code is 1, implying the "thread callback" was executed.
Binary execution is supported for MinGW gcc and clang cases and can be seen directly in CE. It is, in both cases, 0 - ie the callback didn't run.
Also unsurprisingly, enabling warnings for unknown pragmas reports the #pragma comments.

@Zoxc
Copy link
Author

Zoxc commented Mar 24, 2024

It seems to work if you properly specify the segment with __attribute__((section(".CRT$XLB"))).

@daanx
Copy link
Collaborator

daanx commented Mar 24, 2024

Nice -- yes, I think as long you link with the UCRT (the microsoft shared libc) and emit the right linker sections it should work as I think it is eventually just the UCRT that will inspect the linker sections and call the functions in there. It won't work with other libc's though but I guess none exist for Windows (?). As such, this technique should work robustly as well I think, especially if Rust uses it as well :-)

(I wont switch (yet) to this solution for now though -- maybe it won't work with older libc's before UCRT, and I would need to test more as it might change when the tls exit functions are called relative to the FlsAlloc solution.)

(also, I wonder how the UCRT is able to call the TLS exit routines reliably... it must somehow be notified as well ? Ah, I guess UCRT is always a dynamically linked and thus gets DLL_THREAD_DETACH messages.. it would be good to check how this works though)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants