Fix deque steal race condition #726

kmaork · 2021-07-30T21:26:43Z

No description provided.

taiki-e

Thanks!

bors r+

bors · 2021-07-30T21:57:22Z

Build succeeded:

ci

SpadeA-Tang · 2024-03-24T06:30:50Z

crossbeam-deque/src/deque.rs

-            .compare_exchange(f, f.wrapping_add(1), Ordering::SeqCst, Ordering::Relaxed)
-            .is_err()
+        // If the buffer has been swapped or the increment fails, we retry.
+        if self.inner.buffer.load(Ordering::Acquire, guard) != buffer


I'm wondering why this can cause trouble?

The details are in the discussion here, but it seems that it's not public currently. @taiki-e is it possible to make the discussion visible? I could also copy here the explanation that I wrote there.

Thanks for the reply! I see it says that a task may be popped twice. But I still don't understand why this is the case, given the CAS below: https://github.com/crossbeam-rs/crossbeam/pull/726/files#diff-43da9e89d19dacafed5e7cc0d1ac11867f059a529d82daa4fdf2c5e39243a849R640-R644.

> @taiki-e is it possible to make the discussion visible?

Sorry, I don't know how to make the discussion in the GitHub advisory public.

Thanks @taiki-e, so I'm just quoting here:

The problem in the stealing functions is that we first load the pointer to the buffer, and then read a task from the buffer. Between loading the buffer and reading the task from the buffer, the buffer may have been swapped and we find ourselves reading stale data. But, if we verify that the buffer hasn't been swapped, the task is valid and we can increment/decrement the relevant atomic counter to steal the task. From this point onward we don't care if the buffer is swapped, because we already read a task from it and we're not planning to touch it again.

> Between loading the buffer and reading the task from the buffer, the buffer may have been swapped and we find ourselves reading stale data.

Even if the buffer changes during this gap, the old buffer and the new buffer should hold the same value at position f. Hmm, did we encounter a concrete case?

Yes, I was able to consistently recreate memory corruption with the old version of the code. Below is a more detailed explanation:

To exploit the bug, two things must happen:

1. The buffer must be swapped during a steal.

2. The data read from the old buffer must be different from the data at the same index in the new buffer.

Note that the buffer is both growable and shrinkable (shrinks when size <= capacity / 4). Therefore lots of pushing, popping and stealing will satisfy condition 1. For me it worked best with one pusher/popper thread and one stealer thread (as can be seen in the script above).

To satisfy condition 2 we need to empty the queue after the buffer has been swapped and then push (at least) one task. If that happens before the stealer has finished reading the task, the stealer will "declare" that it has stolen the newly pushed task, while it actually read an old task. Now the two threads have popped the same task (the newly pushed task is lost) and the program has entered an undefined state.

Note that both conditions must be met between loading the buffer and reading from it in the stealer thread. For that reason the bug is very likely to occur when using the steal_batch and steal_batch_and_pop functions with a lifo queue, as these functions load the buffer once and repeatedly read tasks from it. In the other cases, the likelihood might depend more heavily on the environment. For example, if many threads are running on the target system, the chances that the OS will preempt a thread at the "right" time for an exploit grow.
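For readers who want to see the interleaving in code, here is a minimal single-threaded sketch of it. This is a toy model and not crossbeam's implementation: `ToyDeque`, `steal_with`, and `scenario` are hypothetical names, buffers are identified by an index into a `Vec` instead of an epoch-protected pointer, and the interleaving is fixed by hand rather than produced by real threads. It only illustrates why re-checking the buffer before the CAS on `front` matters.

```rust
use std::sync::atomic::{AtomicIsize, AtomicUsize, Ordering};

/// Toy model (hypothetical, not crossbeam's types): every buffer ever
/// allocated stays in `buffers`, so a "stale pointer" is just an old index.
struct ToyDeque {
    buffers: Vec<Vec<u32>>,
    current: AtomicUsize, // index of the live buffer
    front: AtomicIsize,   // next position to steal from
}

impl ToyDeque {
    /// Steal using a buffer id that the stealer loaded earlier.
    /// With `check_buffer == false` this mimics the pre-fix behaviour.
    fn steal_with(&self, loaded: usize, check_buffer: bool) -> Option<u32> {
        let f = self.front.load(Ordering::Acquire);
        // Read the task from the buffer the stealer believes is live.
        let task = self.buffers[loaded][f as usize];
        // The fix: only trust the value if the buffer has not been swapped.
        if check_buffer && self.current.load(Ordering::Acquire) != loaded {
            return None; // the caller would retry with a freshly loaded buffer
        }
        // Claim the slot by advancing `front`.
        self.front
            .compare_exchange(f, f + 1, Ordering::SeqCst, Ordering::Relaxed)
            .ok()
            .map(|_| task)
    }
}

/// Replays the state after "swap, empty the queue, push one task".
fn scenario(check_buffer: bool) -> Option<u32> {
    // Buffer 0 is the old buffer: slot 1 still holds task 11, which the
    // owner has already popped. Buffer 1 is the new buffer: the owner has
    // since pushed task 99 into slot 1.
    let dq = ToyDeque {
        buffers: vec![vec![10, 11], vec![10, 99, 0, 0]],
        current: AtomicUsize::new(1),
        front: AtomicIsize::new(1),
    };
    let stale = 0; // the stealer loaded this buffer id before the swap
    dq.steal_with(stale, check_buffer)
}

fn main() {
    // Without the check the CAS succeeds and the stale task 11 is returned,
    // even though slot 1 now logically holds 99.
    println!("without check: {:?}", scenario(false));
    // With the check the swap is detected and the steal is abandoned.
    println!("with check:    {:?}", scenario(true));
}
```

In this model the CAS on `front` alone cannot catch the problem, because `front` has the value the stealer expects; only the identity of the buffer reveals that the data read is stale.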

I will read this tomorrow. Thank you very much for this.

fix deque steal race condition

38c07fc

taiki-e approved these changes Jul 30, 2021

taiki-e added the crossbeam-deque label Jul 30, 2021

bors bot merged commit 3e72cde into crossbeam-rs:master Jul 30, 2021

taiki-e mentioned this pull request Jul 30, 2021

v0.7: Fix deque steal race condition #728

Merged

kmaork deleted the deque_race_fix branch July 30, 2021 22:50

taiki-e mentioned this pull request Jul 30, 2021

Add fuzzing tests to deque #730

Open

daira mentioned this pull request Aug 25, 2021

Multicore improvements zkcrypto/bellman#69

Merged

SpadeA-Tang reviewed Mar 24, 2024
