fix possible race condition in `get_async_backend`, Fixes 425. #714

DavidJiricek · 2024-04-09T08:46:57Z

Related issue is #425

agronholm · 2024-04-09T16:25:44Z

My only concern with this is performance – this function is called a lot. I'd like to see some benchmarks against the previous version to ensure that there is no notable performance degradation.

DavidJiricek · 2024-04-13T20:08:40Z

I ran a benchmark and was surprised by the performance drop. It was noticeable.

So I made a new, more performant version - it is not as bulletproof as before, but it works and still fixes the problem.

It may be insecure if a following occurs:
The loaded module already has the `backend_class' attribute, but is not fully initialized.

In this case, get_async_backend() will return an incomplete object.

But in the current version of the code, the backend_class attribute is assigned at the very end of the module initialization. So the above situation should not occur. And it is still much safer than the current version of get_async_backend().

agronholm · 2024-04-15T20:33:04Z

Looks much better. Please add a changelog note (credit yourself and add the appropriate issue link) and I'll merge.

gschaffner · 2024-04-17T07:01:19Z

It may be insecure if a following occurs: The loaded module already has the backend_class attribute, but is not fully initialized.

In this case, get_async_backend() will return an incomplete object.

i believe that there are ways to do this that will incur neither the problem you mention in the quote nor major performance degradation. e.g. i think a technique like in master...9d2d95c should be good on both of these fronts¹.

i wrote this patchset a while back, but i got busy and thus never got around to submitting it :)

some notes on it that i had drafted at the time:

this should fix Thread race condition in _eventloop.get_asynclib() (with fix) #425 without degrading get_async_backend's performance.

regarding the problem in the current code:

The module will exist in sys.modules before the loader executes the module code. This is crucial because the module code may (directly or indirectly) import itself; adding it to sys.modules beforehand prevents unbounded recursion in the worst case and multiple loading in the best. --- https://docs.python.org/3/reference/import.html#loading

regarding this proposed fix, which relies on Python's import locking: https://docs.python.org/3/whatsnew/3.3.html#a-finer-grained-import-lock, https://github.com/python/cpython/blob/v3.12.1/Lib/importlib/_bootstrap.py#L1357

i did write a regression test for this but in order to force the race condition to occur it relies on (a) get_async_backend using importlib and (b) monkeypatching importlib privates. it also has a fixed runtime since the monkeypatched thread needs to hold the module mid-initialization for long enough that the main thread can check if get_async_backend can bypass the import lock.

IMO that test should probably not be merged (what do you think about it?), but even if not merged it does serve as a test now (just not as a regression test for later).

↩

agronholm · 2024-04-17T22:42:07Z

It may be insecure if a following occurs: The loaded module already has the backend_class attribute, but is not fully initialized.
In this case, get_async_backend() will return an incomplete object.

i believe that there are ways to do this that will incur neither the problem you mention in the quote nor major performance degradation. e.g. i think a technique like in master...9d2d95c should be good on both of these fronts1.

Footnotes

i wrote this patchset a while back, but i got busy and thus never got around to submitting it :)

some notes on it that i had drafted at the time:
this should fix Thread race condition in _eventloop.get_asynclib() (with fix) #425 without degrading get_async_backend's performance.
regarding the problem in the current code:

The module will exist in sys.modules before the loader executes the module code. This is crucial because the module code may (directly or indirectly) import itself; adding it to sys.modules beforehand prevents unbounded recursion in the worst case and multiple loading in the best. --- https://docs.python.org/3/reference/import.html#loading

regarding this proposed fix, which relies on Python's import locking: https://docs.python.org/3/whatsnew/3.3.html#a-finer-grained-import-lock, https://github.com/python/cpython/blob/v3.12.1/Lib/importlib/_bootstrap.py#L1357
i did write a regression test for this but in order to force the race condition to occur it relies on (a) get_async_backend using importlib and (b) monkeypatching importlib privates. it also has a fixed runtime since the monkeypatched thread needs to hold the module mid-initialization for long enough that the main thread can check if get_async_backend can bypass the import lock.
IMO that test should probably not be merged (what do you think about it?), but even if not merged it does serve as a test now (just not as a regression test for later).

↩

Can you elaborate on how this is different from what anyio does right now?

agronholm · 2024-04-21T20:34:41Z

@DavidJiricek would you mind adding a changelog entry?

davidbrochart · 2024-04-22T09:51:06Z

Just got hit by this race condition, thanks for the fix @DavidJiricek !

gschaffner · 2024-04-23T07:18:55Z

Can you elaborate on how this is different from what anyio does right now?

get_async_backend wants to use a cache for performance, so that it has a fast-path that avoids the expensive importlib call. right now, the cache it uses for this is sys.modules. the problem with this is that modules are placed into sys.modules before they have been initialized [ref]. thus when get_async_backend hits the cache (return sys.modules[...].backend_class) it can attempt to return an attribute of a module that is not initialized. (this happens when another thread is currently initializing the backend module.)

the import statement (and importlib.import_module) avoid returning partially-initialized modules by using an import lock [ref]. by accessing the module from sys.modules directly, AnyIO is currently bypassing said lock.

the way that master...9d2d95c is different is: the cache it uses is not sys.modules. it has AnyIO maintain its own cache instead of trying to piggyback on sys.modules. this cache is specifically for memoizing get_async_backend, i.e. we only place initialized items into it. so our cache has the invariant we want (it can only contain initialized objects), unlike sys.modules.

agronholm · 2024-04-23T08:42:53Z

Can you elaborate on how this is different from what anyio does right now?

get_async_backend wants to use a cache for performance, so that it has a fast-path that avoids the expensive importlib call. right now, the cache it uses for this is sys.modules. the problem with this is that modules are placed into sys.modules before they have been initialized [ref]. thus when get_async_backend hits the cache (return sys.modules[...].backend_class) it can attempt to return an attribute of a module that is not initialized. (this happens when another thread is currently initializing the backend module.)

the import statement (and importlib.import_module) avoid returning partially-initialized modules by using an import lock [ref]. by accessing the module from sys.modules directly, AnyIO is currently bypassing said lock.

the way that master...9d2d95c is different is: the cache it uses is not sys.modules. it has AnyIO maintain its own cache instead of trying to piggyback on sys.modules. this cache is specifically for memoizing get_async_backend, i.e. we only place initialized items into it. so our cache has the invariant we want (it can only contain initialized objects), unlike sys.modules.

Yes, I can see that this is indeed a preferable solution. Why the WeakValueDictionary though?

…get-async-backend

agronholm · 2024-04-23T22:22:58Z

Following the suggestion of @gschaffner, I think you should modify the solution to store the backend class in a dict on a successful import. It can be a normal dict though; the weak-value dictionary seems unnecessary.

gschaffner · 2024-04-23T22:46:38Z

Yes, I can see that this is indeed a preferable solution. Why the WeakValueDictionary though?

It can be a normal dict though; the weak-value dictionary seems unnecessary.

yeah, agreed. I think my thought process was that weakrefs would let the module get GC'd if it's no longer in use and someone forcibly unloads it (del sys.modules[...]). but probably that's not a case we should worry about supporting.

gschaffner

a couple minor comments, at your discretion. otherwise LGTM!

src/anyio/_core/_eventloop.py

docs/versionhistory.rst

Co-authored-by: Ganden Schaffner <github@xqzw.me>

agronholm · 2024-05-10T08:44:43Z

Thanks!

fix possible race condition in get_async_backend, Fixes 425.

e2f56ee

DavidJiricek mentioned this pull request Apr 9, 2024

Thread race condition in _eventloop.get_asynclib() (with fix) #425

Open

more performant version

f0eca4a

Merge branch 'master' into fix-get-async-backend

f347d4c

davidbrochart mentioned this pull request Apr 22, 2024

Release 7.0 ipython/ipykernel#1222

Open

6 tasks

DavidJiricek and others added 3 commits April 23, 2024 16:00

Merge branch 'agronholm:master' into fix-get-async-backend

cc32875

changelog entry

630fa9a

Merge remote-tracking branch 'origin/fix-get-async-backend' into fix-…

c5cc39c

…get-async-backend

agronholm added 3 commits May 9, 2024 14:43

Added issue link

a18dd5d

Replaced the implementation with a safer one

8b745b4

Merge branch 'master' into fix-get-async-backend

3734e25

agronholm requested a review from gschaffner May 9, 2024 12:25

gschaffner approved these changes May 10, 2024

View reviewed changes

src/anyio/_core/_eventloop.py Outdated Show resolved Hide resolved

src/anyio/_core/_eventloop.py Outdated Show resolved Hide resolved

docs/versionhistory.rst Outdated Show resolved Hide resolved

agronholm and others added 3 commits May 10, 2024 11:22

Update docs/versionhistory.rst

eee7a90

Co-authored-by: Ganden Schaffner <github@xqzw.me>

Addressed review comments

1b8ad98

Further simplified the implementation

10b173a

gschaffner approved these changes May 10, 2024

View reviewed changes

agronholm merged commit 8e8b7f5 into agronholm:master May 10, 2024
16 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix possible race condition in `get_async_backend`, Fixes 425. #714

fix possible race condition in `get_async_backend`, Fixes 425. #714

DavidJiricek commented Apr 9, 2024 •

edited

agronholm commented Apr 9, 2024

DavidJiricek commented Apr 13, 2024

agronholm commented Apr 15, 2024

gschaffner commented Apr 17, 2024 •

edited

agronholm commented Apr 17, 2024

Footnotes

agronholm commented Apr 21, 2024

davidbrochart commented Apr 22, 2024

gschaffner commented Apr 23, 2024 •

edited

agronholm commented Apr 23, 2024

agronholm commented Apr 23, 2024

gschaffner commented Apr 23, 2024

gschaffner left a comment

agronholm commented May 10, 2024

fix possible race condition in get_async_backend, Fixes 425. #714

fix possible race condition in get_async_backend, Fixes 425. #714

Conversation

DavidJiricek commented Apr 9, 2024 • edited

agronholm commented Apr 9, 2024

DavidJiricek commented Apr 13, 2024

agronholm commented Apr 15, 2024

gschaffner commented Apr 17, 2024 • edited

Footnotes

agronholm commented Apr 17, 2024

Footnotes

agronholm commented Apr 21, 2024

davidbrochart commented Apr 22, 2024

gschaffner commented Apr 23, 2024 • edited

agronholm commented Apr 23, 2024

agronholm commented Apr 23, 2024

gschaffner commented Apr 23, 2024

gschaffner left a comment

Choose a reason for hiding this comment

agronholm commented May 10, 2024

fix possible race condition in `get_async_backend`, Fixes 425. #714

fix possible race condition in `get_async_backend`, Fixes 425. #714

DavidJiricek commented Apr 9, 2024 •

edited

gschaffner commented Apr 17, 2024 •

edited

gschaffner commented Apr 23, 2024 •

edited