context: introduce alternate global context scheme based on #346 #539

apoelstra · 2022-11-25T00:40:11Z

This implements, but does not use, the new global context scheme described in #346. As described in #538, we want to actually use the global context API throughout the entire codebase, rather than maintaining parallel global-context and non-global-context APIs.

This is in its own commit so that we can argue about it independently of the rest of the PR. I believe this is correct and safe because the docs for `AtomicPtr` say that (a) it is #[repr(C)] and (b) that it has the same in-memory representation as a *mut T.

This covers both the std and no-std cases. This is perhaps an unconventional strategy; see rust-bitcoin#346 for discussion leading up to it.

Weirdly we have no tests that don't require std or alloc. We should revisit this. In many cases it is because of the dependency on rand-std but we should really try to narrow that down to specific tests.

This commit should probably be removed before merging; it can be part of the next PR.

Kixunil

I believe this code is unsound. First how I understand what the code is doing:

There are two contexts, one gets re-randomized, the other doesn't. The pointer to currently usable context is stored in secp256k1_context_signing_1. When the context is being re-randomized it atomically sets a "busy", flag switches the pointers, modifies the context, then restores the pointers and finally unsets the busy flag.

The problem with this is that when setting the busy flag we don't know if the context is still being used by other thread (for reading). So it could happen that another thread holds the pointer it obtained before while it's being modified.

The only possible fix I see here is to implement RW lock - count the number of threads using the primary context and only rerandomize if there are none. It kinda worries me though that this may significantly decrease the frequency of rerandomizations. One thing we should take care of is to ensure that the rerandomized context is used for signing only. Verification should always use the alternate context.

I'm also thinking about alternative design:

Have 3 context: sig1, sig2, verify
verify is constant and accessed from appropriate operations
A thread accessing signing context tries sig1 and if it's busy goes for sig2
seed is 64B - rerandomization writes the remaining 32B into a (now locked) global buffer
A thread that is releasing read lock tries to obtain write lock for sig2 to rerandomize sig2
If successful it rerandomizes it with the seed obtained from global buffer
There needs to be some synchronization to ensure only one context is locked.
This whole thing probably should be modeled as a queue or something similar.

The expected outcome is that while it may happen that two threads reused the same context at least they wouldn't use a default, completely non-randomized one.

Kixunil · 2022-11-25T10:54:43Z

src/context.rs

+
+    thread_local! {
+        pub static CONTEXT: RefCell<Secp256k1<All>> = RefCell::new(Secp256k1::new());
+    }


We could actually use pure UnsafeCell and just make sure we don't access it twice from our methods nor call any user-provided callbacks. Maybe not worth optimizing though.

I think we should do this, but not on the first iteration. RefCell will let us do development "in debug mode" :)

Kixunil · 2022-11-25T10:57:24Z

src/context.rs

+    ///
+    /// Safety: you are being given a raw pointer. Do not mutate through it. Do not
+    /// move it, or any reference to it, across thread boundaries.
+    pub unsafe fn with_global_context<F: FnOnce(*const ffi::Context) -> R, R>(f: F) -> R {


Why not use &ffi::Context instead?

Oh, neat! I spent so long going back and forth on ffi::Context vs Secp256k1<All> vs some custom type that I didn't even consider the pointer type.

This is a good idea.

Kixunil · 2022-11-25T11:18:58Z

src/context.rs

+            // The reason is that a panic at this point in the code would leave
+            // `LOCK` at true (as well as both `signing_1` and `signing_2` pointing
+            // at the same, never-rerandomized, place) and therefore prevent any
+            // future rerandomizations from succeeding.


We could still panic after unlocking 🤷‍♂️

Hah, good idea! Again, I was so worked up over an unrelated thing (whether to call randomize or let the user pass in an arbitrary-maybe-panicking closure) that I didn't consider this.

Code review ftw.

Kixunil · 2022-11-25T11:47:08Z

src/context.rs

+
+        unsafe {
+            // Obtain a pointer to the alternate context.
+            let alt_ctx = ffi::secp256k1_context_signing_2.load(Ordering::Acquire);


This is never modified so we shouldn't need atomic.

Ah, right. It was modified in a first iteration of this, and my on-paper design (when I was trying to use AtomicPtr::swap on two AtomicPtrs directly, which annoyingly you cannot do).

But as you noticed in your above comment, this whole scheme doesn't seem to work, so it's a moot point.

apoelstra · 2022-11-25T15:14:46Z

The problem with this is that when setting the busy flag we don't know if the context is still being used by other thread (for reading). So it could happen that another thread holds the pointer it obtained before while it's being modified.

Ah, yes, you're correct. The pointers are okay but the actual context object may be seen while it's being modified. Dammit.

The only possible fix I see here is to implement RW lock

OK, this isn't too bad. I was scared of doing this because it could mean that reads might fail due to contention, but now I see that this won't be an issue.

I'll think a bit about whether having a rwlock would let me get rid of the second context object and the pointer-swaps, which would be nice (and might make this upstreamable).

It kinda worries me though that this may significantly decrease the frequency of rerandomizations

So, in theory if you have multiple threads constantly signing in a tight loop, it would reduce the frequency to 0 (since the RW lock would never count down to zero). It's not really a fair comparison, but this is the same as the current situation ;). Less unfairly, I don't think this is going to be a very common case -- even if you are signing in a multi-threaded context, I'd expect contention to only happen in short bursts, and if you can pull a rerandomization off even once in 100 sigantures that offers significant protection. (It's hard to quantify this stuff because the attack model is so murky "the attacker can see everything you're doing with the key...except not the actual key". But "typical" timing attacks on not-egregiously-broken code take well over 100 signatures to get enough signal to usefully weaken a key.)

In practice I think it'll be really rare to be signing in more than one thread anyway, and verification we already have planned to use a different context (the existing _no_precomp one) (with no synchronization). But if we can solve this, we should.

I'm also thinking about alternative design:

Heh, I like this in princple, but given that (a) we have no real locking facilities other than spinlocks which we've decided to eschew; (b) even having unbounded queues will be difficult.

Kixunil · 2022-11-27T12:36:49Z

having a rwlock would let me get rid of the second context object

I don't see how. I don't suggest traditional RW lock but fall back to the read-only context when the lock is locked for writing. This read only context must exist. Otherwise we would have to wait which we don't want to.

But "typical" timing attacks on not-egregiously-broken code take well over 100 signatures to get enough signal to usefully weaken a key.

That's pretty comforting but I was pointing out at fall backs: if it happens that signing falls back to the constant, attacker-known context enough times the attacker might extract the key. Ironically, the more often re-randomization happens the more likely it is to trigger this. My idea with three contexts prevents that. (We could have two but then verifications would cause contention.)

even having unbounded queues will be difficult.

I meant bounded queue containing two items. But since then I tried to think about that and came up with a design without a queue:

Data structures

Contexts

extern "C" static mut GLOBAL_SIGN_CONTEXTS: [ffi::Context; 2];
extern "C" static GLOBAL_VERIFY_CONTEXT: ffi::Context;
static CONTEXTS_DIRTY: [AtomicBool; 2]; (Should have better perf than single AtomicU8 but I think some platforms need to lock a chunk of memory so maybe not)

Main lock

The main lock is AtomicUsize which stores two flags in the lowest bits and the reader count in the remaining bits. Thus atomically adding or subtracting 4 from it modifies the counter.
The two flags are:

Which context is active - least significant (0b1)
Needs swap - second least significant (0b10)

Rerandomization lock

Simple AtomicBool - true is locked, false is unlocked

Seed

[AtomicU8; 32] stores the seed for RNG. Notably it doesn't matter that a thread may read "inconsistent" content because it's all random data. If the array is being overwritten while being read it cannot worsen entropy and the exact data doesn't matter.

RNG

RNG is HMAC(seed, counter) where each rerandomization bumps the counter. The counter is behind rerandomization lock and doesn't need to be atomic.

Signing thread operations

Add 4 with Acquire to main lock assigning the current lock value to last_lock
Panic if last_lock >= usize::MAX / 2 - having usize::MAX threads should be impossible so if this happens it's because of a bug
let ctx = &GLOBAL_SIGN_CONTEXTS[usize::from(last_lock & 1)];
Sign with ctx
CONTEXTS_DIRTY[usize::from(last_lock & 1)].set(true, Relaxed);
Subtract 4 with Relaxed from main lock assinngning the current lock value to last_lock
if last_lock & !1 == 0b10 { MAIN_LOCK.compare_xchange(last_lock, (last_lock & 1) ^ 1, Relaxed, Relaxed) }
If the CAS in the previous step succeeded attempt to re-randomize

Reseeding

Public reseeding function takes 64B seed.

RERAND_LOCK.compare_exchange(false, true, Acquire, Relaxed)
If successful rerandomize if needed using first 32B and release the lock with Release
Unconditionally overwrite the global seed with the remaining 32B

Alternatively we could take 32B, first overwrite the seed and then call rerandomization. Probably simpler code but calls into HMAC even in hypothetical case when supplying 64B is cheaper.

Rerandomization

(unless previous operations already locked the lock) RERAND_LOCK.compare_exchange(false, true, Acquire, Relaxed)
let prev_main = MAIN_LOCK.fetch_and(!0b10, Relaxed) - clear needs swap flag to avoid swap while this thread is mutating
if prev_main & 0b10 == 0b10 - the context was already rerandomzied and doesn't need to be again. Restore the needs swap flag first, release RERAND_LOCK, try to swap if prev_main & !0b11 == 0, try rerandomizing again if swap succeeded, return otherwise.
If !CONTEXTS_DIRTY[usize::from((prev_main & 0b01) ^ 1)].load(Relaxed) release RERAND_LOCK with Relaxed` and return
Rerandomize GLOBAL_SIGN_CONTEXTS[usize::from((prev_main & 0b01) ^ 1)]
CONTEXTS_DIRTY[usize::from((prev_main & 0b01) ^ 1)].store(false, Relaxed)
let prev_main = MAIN_LOCK.fetch_or(0b10, Release)
RERAND_LOCK.store(false, Release)
MAIN_LOCK.compare_exchange((prev_main & 1) | 0b10, (prev_main & 1) ^ 1, Relaxed, Relaxed) - we held the lock so the other thread may have failed at rerandomizing
If succeeded repeat rerandomization

apoelstra · 2022-11-29T14:51:39Z

Just letting you know that I haven't forgotten about this, but I've read it twice and need to read it some more times before I can grok everything that's going on.

Kixunil · 2022-11-29T15:50:05Z

If it helps the main idea is for signing threads to use whichever one the two contexts while the other is available for rerandomization. When there's no contention the contexts are swapped so that the used one can be re-randomized.

apoelstra added 5 commits November 24, 2022 22:55

secp256k1-sys: export two static signing-capable contexts from C

a986e8d

context: add methods to provide immutable access and rerandomization

a62f398

This covers both the std and no-std cases. This is perhaps an unconventional strategy; see rust-bitcoin#346 for discussion leading up to it.

add a simple end-to-end test not feature-gated on std or alloc

a03bedd

Weirdly we have no tests that don't require std or alloc. We should revisit this. In many cases it is because of the dependency on rand-std but we should really try to narrow that down to specific tests.

f try using the new global context

eac1b86

This commit should probably be removed before merging; it can be part of the next PR.

apoelstra mentioned this pull request Nov 25, 2022

Context overhaul tracking issue #538

Open

6 tasks

make cargo fmt shut up

087c2f5

Kixunil requested changes Nov 25, 2022

View reviewed changes

tcharding mentioned this pull request Apr 20, 2023

Implement new global context #605

Draft

apoelstra mentioned this pull request Apr 28, 2023

Introduce a primitive types crate rust-bitcoin/rust-bitcoin#1818

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

context: introduce alternate global context scheme based on #346 #539

context: introduce alternate global context scheme based on #346 #539

apoelstra commented Nov 25, 2022

Kixunil left a comment

Kixunil Nov 25, 2022

apoelstra Nov 25, 2022

Kixunil Nov 25, 2022

apoelstra Nov 25, 2022

Kixunil Nov 25, 2022

apoelstra Nov 25, 2022

Kixunil Nov 25, 2022

apoelstra Nov 25, 2022

apoelstra commented Nov 25, 2022

Kixunil commented Nov 27, 2022

apoelstra commented Nov 29, 2022

Kixunil commented Nov 29, 2022

context: introduce alternate global context scheme based on #346 #539

Are you sure you want to change the base?

context: introduce alternate global context scheme based on #346 #539

Conversation

apoelstra commented Nov 25, 2022

Kixunil left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

apoelstra commented Nov 25, 2022

Kixunil commented Nov 27, 2022

Data structures

Contexts

Main lock

Rerandomization lock

Seed

RNG

Signing thread operations

Reseeding

Rerandomization

apoelstra commented Nov 29, 2022

Kixunil commented Nov 29, 2022