Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make TSAN tests pass with the GIL disabled in free-threaded builds #117657

Open
18 of 60 tasks
Tracked by #108219
mpage opened this issue Apr 8, 2024 · 3 comments
Open
18 of 60 tasks
Tracked by #108219

Make TSAN tests pass with the GIL disabled in free-threaded builds #117657

mpage opened this issue Apr 8, 2024 · 3 comments
Assignees
Labels
topic-free-threading type-feature A feature request or enhancement

Comments

@mpage
Copy link
Contributor

mpage commented Apr 8, 2024

Feature or enhancement

We need to make the TSAN tests pass with the GIL disabled before we can disable the GIL by default in free-threaded builds.

This should proceed in two phases:

  1. Add suppressions for existing TSAN warnings.
  2. Triage, fix, and remove the suppressions for the warnings enumerated in (1).

How to run the TSAN tests

  1. Build with TSAN:
    env CC=clang CXX=clang++ ./configure --disable-gil --with-thread-sanitizer --with-pydebug && make -j
  2. Run tests:
    env TSAN_OPTIONS=suppressions=<repo_root>/Tools/tsan/suppressions_free_threading.txt ./python -m test --tsan -j4

Working on a race

  1. Verify that the TSAN tests are passing using the steps from above.
  2. Pick a suppression from the section below and assign it to yourself (edit it and add your username next to it). Some of them may be related. If you find that other suppressions are related to the race you're working on please assign them to yourself or contact their owner if they're already assigned.
  3. Optionally, head over to the race index to see examples and which tests reproduce the race corresponding to the suppression.
  4. Delete the suppression from <repo_root>/Tools/tsan/suppressions_free_threading.txt , run the TSAN tests, and verify that the race is reported by TSAN. You may need to comment out unrelated functions (notably, _PyEval_EvalFrameDefault) in order to reproduce the race.
  5. Fix the race.

Suppressions

Tasks

Linked PRs

@mpage mpage added type-feature A feature request or enhancement topic-free-threading labels Apr 8, 2024
@mpage mpage self-assigned this Apr 8, 2024
colesbury added a commit to colesbury/cpython that referenced this issue Apr 12, 2024
colesbury pushed a commit that referenced this issue Apr 15, 2024
Additionally, reduce the iterations for a few weakref tests that would
otherwise take a prohibitively long amount of time (> 1 hour) when TSAN
is enabled and the GIL is disabled.
colesbury pushed a commit that referenced this issue Apr 15, 2024
…rld()` and `tstate_try_attach()` (#117828)

TSAN erroneously reports a data race between the `_Py_atomic_compare_exchange_int`
on `tstate->state` in `tstate_try_attach()` and the non-atomic load of
`tstate->state` in `start_the_world`. The `_Py_atomic_compare_exchange_int` fails,
but TSAN erroneously treats it as a store.
diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024
…ython#117736)

Additionally, reduce the iterations for a few weakref tests that would
otherwise take a prohibitively long amount of time (> 1 hour) when TSAN
is enabled and the GIL is disabled.
diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024
…the_world()` and `tstate_try_attach()` (python#117828)

TSAN erroneously reports a data race between the `_Py_atomic_compare_exchange_int`
on `tstate->state` in `tstate_try_attach()` and the non-atomic load of
`tstate->state` in `start_the_world`. The `_Py_atomic_compare_exchange_int` fails,
but TSAN erroneously treats it as a store.
diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024
DinoV pushed a commit that referenced this issue Apr 17, 2024
…#117954)

Fix data races in the method cache in free-threaded builds

These are technically data races, but I think they're benign (to
the extent that that is actually possible). We update cache entries
non-atomically but read them atomically from another thread, and there's
nothing that establishes a happens-before relationship between the
reads and writes that I can see.
DinoV added a commit to mpage/cpython that referenced this issue Apr 17, 2024
DinoV pushed a commit that referenced this issue Apr 17, 2024
…#117955)

Quiet erroneous TSAN reports of data races in `_PySeqLock`

TSAN reports a couple of data races between the compare/exchange in
`_PySeqLock_LockWrite` and the non-atomic loads in `_PySeqLock_{Abandon,Unlock}Write`.
This is another instance of TSAN incorrectly modeling failed compare/exchange
as a write instead of a load.
DinoV added a commit that referenced this issue Apr 19, 2024
)

Use relaxed load to check if dictkeys are immortal
DinoV added a commit that referenced this issue Apr 19, 2024
…th strong enough semantics (#118111)

Use acquire for load of ob_ref_shared
DinoV pushed a commit that referenced this issue Apr 23, 2024
… `tstate->state` (#118165)

Quiet TSAN warnings about remaining non-atomic accesses of `tstate->state`
colesbury pushed a commit to colesbury/cpython that referenced this issue Jun 3, 2024
…19268)

Fix itertools.count in free-threading mode
(cherry picked from commit 87939bd)

Co-authored-by: Arnon Yaari <wiggin15@yahoo.com>
colesbury added a commit that referenced this issue Jun 3, 2024
#120005)

The free-threaded build currently immortalizes objects that use deferred
reference counting (see gh-117783). This typically happens once the
first non-main thread is created, but the behavior can be suppressed for
tests, in subinterpreters, or during a compile() call.

This fixes a race condition involving the tracking of whether the
behavior is suppressed.

(cherry picked from commit 47fb432)
colesbury added a commit that referenced this issue Jun 3, 2024
)

Fix itertools.count in free-threading mode
(cherry picked from commit 87939bd)

Co-authored-by: Arnon Yaari <wiggin15@yahoo.com>
mliezun pushed a commit to mliezun/cpython that referenced this issue Jun 3, 2024
…ython#119921)

The GIL may be disabled concurrently with this call so we need to use a
relaxed atomic load.
mliezun pushed a commit to mliezun/cpython that referenced this issue Jun 3, 2024
The `sem_clockwait` function is not currently instrumented, which leads
to false positives.
mliezun pushed a commit to mliezun/cpython that referenced this issue Jun 3, 2024
)

The free-threaded build currently immortalizes objects that use deferred
reference counting (see pythongh-117783). This typically happens once the
first non-main thread is created, but the behavior can be suppressed for
tests, in subinterpreters, or during a compile() call.

This fixes a race condition involving the tracking of whether the
behavior is suppressed.
colesbury added a commit that referenced this issue Jun 4, 2024
The `_PyThreadState_Bind()` function is called before the first
`PyEval_AcquireThread()` so it's not synchronized with the stop the
world GC. We had a race where `gc_visit_heaps()` might visit a thread's
heap while it's being initialized.

Use a simple atomic int to avoid visiting heaps for threads that are not
yet fully initialized (i.e., before `tstate_mimalloc_bind()` is called).

The race was reproducible by running:
`python Lib/test/test_importlib/partial/pool_in_threads.py`.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jun 4, 2024
…nGH-119923)

The `_PyThreadState_Bind()` function is called before the first
`PyEval_AcquireThread()` so it's not synchronized with the stop the
world GC. We had a race where `gc_visit_heaps()` might visit a thread's
heap while it's being initialized.

Use a simple atomic int to avoid visiting heaps for threads that are not
yet fully initialized (i.e., before `tstate_mimalloc_bind()` is called).

The race was reproducible by running:
`python Lib/test/test_importlib/partial/pool_in_threads.py`.
(cherry picked from commit e69d068)

Co-authored-by: Sam Gross <colesbury@gmail.com>
colesbury added a commit to colesbury/cpython that referenced this issue Jun 4, 2024
colesbury added a commit that referenced this issue Jun 4, 2024
…19923) (#120038)

The `_PyThreadState_Bind()` function is called before the first
`PyEval_AcquireThread()` so it's not synchronized with the stop the
world GC. We had a race where `gc_visit_heaps()` might visit a thread's
heap while it's being initialized.

Use a simple atomic int to avoid visiting heaps for threads that are not
yet fully initialized (i.e., before `tstate_mimalloc_bind()` is called).

The race was reproducible by running:
`python Lib/test/test_importlib/partial/pool_in_threads.py`.
(cherry picked from commit e69d068)

Co-authored-by: Sam Gross <colesbury@gmail.com>
barneygale pushed a commit to barneygale/cpython that referenced this issue Jun 5, 2024
Only call `gc_restore_tid()` from stop-the-world contexts.
`worklist_pop()` can be called while other threads are running, so use a
relaxed atomic to modify `ob_tid`.
barneygale pushed a commit to barneygale/cpython that referenced this issue Jun 5, 2024
Due to a limitation in TSAN, all reads from `PyThreadState.state` must be
atomic to avoid reported races.
barneygale pushed a commit to barneygale/cpython that referenced this issue Jun 5, 2024
…#119908)

Seen in CI occasionally when running `test_weakref`.
barneygale pushed a commit to barneygale/cpython that referenced this issue Jun 5, 2024
…ython#119921)

The GIL may be disabled concurrently with this call so we need to use a
relaxed atomic load.
barneygale pushed a commit to barneygale/cpython that referenced this issue Jun 5, 2024
The `sem_clockwait` function is not currently instrumented, which leads
to false positives.
barneygale pushed a commit to barneygale/cpython that referenced this issue Jun 5, 2024
)

The free-threaded build currently immortalizes objects that use deferred
reference counting (see pythongh-117783). This typically happens once the
first non-main thread is created, but the behavior can be suppressed for
tests, in subinterpreters, or during a compile() call.

This fixes a race condition involving the tracking of whether the
behavior is suppressed.
barneygale pushed a commit to barneygale/cpython that referenced this issue Jun 5, 2024
…n#119923)

The `_PyThreadState_Bind()` function is called before the first
`PyEval_AcquireThread()` so it's not synchronized with the stop the
world GC. We had a race where `gc_visit_heaps()` might visit a thread's
heap while it's being initialized.

Use a simple atomic int to avoid visiting heaps for threads that are not
yet fully initialized (i.e., before `tstate_mimalloc_bind()` is called).

The race was reproducible by running:
`python Lib/test/test_importlib/partial/pool_in_threads.py`.
colesbury added a commit that referenced this issue Jun 6, 2024
This adds a `_PyRecursiveMutex` type based on `PyMutex` and uses that
for the import lock. This fixes some data races in the free-threaded
build and generally simplifies the import lock code.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jun 6, 2024
This adds a `_PyRecursiveMutex` type based on `PyMutex` and uses that
for the import lock. This fixes some data races in the free-threaded
build and generally simplifies the import lock code.
(cherry picked from commit e21057b)

Co-authored-by: Sam Gross <colesbury@gmail.com>
colesbury added a commit that referenced this issue Jun 6, 2024
…20169)

This adds a `_PyRecursiveMutex` type based on `PyMutex` and uses that
for the import lock. This fixes some data races in the free-threaded
build and generally simplifies the import lock code.
(cherry picked from commit e21057b)

Co-authored-by: Sam Gross <colesbury@gmail.com>
colesbury added a commit to colesbury/cpython that referenced this issue Jun 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-free-threading type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

3 participants