
implement insecure-erase feature (was: Zeroize) #582

Merged
merged 1 commit on Feb 22, 2023

Conversation

kwantam
Contributor

@kwantam kwantam commented Feb 18, 2023

This PR adds Zeroize derivations for the following structs:

  • SecretKey
  • KeyPair
  • SharedSecret
  • Scalar
  • DisplaySecret

This is only a Zeroize impl, and does not make Zeroize happen automatically on drop (doing that would be a breaking change because it would preclude deriving Copy). But this is still useful, because it allows downstream libraries to implement ZeroizeOnDrop for structs that contain such secrets and/or simply to use the Zeroizing container struct.
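
For illustration, a sketch (not part of this PR) of how a downstream crate might use these impls, assuming a zeroize dependency with its derive feature enabled; the Signer struct and use_key function are hypothetical:

use secp256k1::SecretKey;
use zeroize::{Zeroize, ZeroizeOnDrop, Zeroizing};

// Because SecretKey implements Zeroize, a struct containing it can derive
// Zeroize and ZeroizeOnDrop, so its fields are overwritten when it is dropped.
#[derive(Zeroize, ZeroizeOnDrop)]
struct Signer {
    key: SecretKey,
}

fn use_key(key: SecretKey) {
    // Alternatively, wrap the secret in the Zeroizing container, which
    // zeroizes its contents on drop. Note that SecretKey is Copy, so earlier
    // copies of `key` may still exist elsewhere.
    let key = Zeroizing::new(key);
    drop(key); // the wrapped key is overwritten here
}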

Because these new impls are never invoked automatically, performance impact should be zero. Safety-wise, the Zeroize library appears to be widely used in cryptographic code. For example, Supranational's blst Rust bindings use it, and in turn are used in one of the most popular eth2 validator implementations.

Thanks for maintaining a really great library!

Member

@TheBlueMatt TheBlueMatt left a comment

Can you make this an optional, default-off feature so we're not adding a dependency for downstream crates that don't use this?

@kwantam
Contributor Author

kwantam commented Feb 19, 2023

Ah, excellent point. Done!

@apoelstra
Member

See also discussion on #553 .

Aside from issues of feature-gating, I don't understand how Zeroize can possibly provide value for Copy types, and I'm surprised this is even supported by them. Is there documentation about this?

@kwantam
Contributor Author

kwantam commented Feb 20, 2023

I agree that in many cases Zeroize isn't much help because of the Copy impls---but it doesn't hurt, and as long as it's feature gated it also doesn't add any dependency overhead.

Speaking to my particular application: I have a SecretKey field in a non-Copy struct, and I'm perfectly happy to pin that struct in order to ensure that Zeroize does its thing. I'd even be willing to manually implement Zeroize for that struct, but there's just no way that I can see (other than unsafe) to overwrite the contents of the SecretKey's backing array.

In other words: as far as I can tell, it's currently impossible to securely erase SecretKeys other than by using unsafe transmutations, even for libraries that are happy to pin, use std, avoid Copy, etc. This PR doesn't fully solve the problems, but it does change the situation from "impossible" to "possible but hard to do right".

Would it help to address your concerns if I added a paragraph in the crate docs with a brief rundown of the limitations and a pointer to the Zeroize documents for more info?

Cargo.toml Outdated
@@ -35,6 +35,7 @@ global-context = ["std"]
# if you are doing a no-std build, then this feature does nothing
# and is not necessary.)
global-context-less-secure = ["global-context"]
zeroizable = ["zeroize", "secp256k1-sys/zeroizable"]
Collaborator

Features should be named the same as crates. You can rename the crate if you need to activate additional features. (see how it's done for serde in bitcoin)

Contributor Author

Hmm. Unfortunately, renaming the crate doesn't seem to work here: it seems like the renaming breaks the derive macro. (I could easily be doing something wrong, though!)

I'll leave this open for now. If the direction of the conversation changes and it starts to look like this PR has a chance at being merged I can revisit.

Collaborator

If zeroize doesn't support an attribute similar to serde(crate = "...") then I suggest opening an issue there first.

secp256k1-sys/Cargo.toml Outdated (resolved)
@apoelstra
Member

@kwantam using unsafe code, there are a couple of ways you can modify the underlying backing store:

  • Using as_mut_c_ptr() to get a pointer to the raw bytes (the length is always 32)
  • Using negate and add_tweak to force the underlying value to a fixed one (though I actually doubt this would avoid copying)

I agree that from a purely technical point of view, there's "no harm" in adding a feature-gated zeroize implementation. But in the past we've had problems with stuff like this -- we tried something similar with schemars in bitcoin_hashes where one user really wanted support, but it didn't fit our MSRV and the rest of us didn't see the point, and so we wound up with an "unmaintained feature" which actually caused us huge headaches even though we pretended it didn't exist. It cluttered up the feature-gating, it required special CI support to deal with its wrong MSRV, it was enabled by --all-features so we had to tiptoe around that, etc., and when it (constantly) broke stuff it was much harder to deal with because none of us had a good idea of how it was supposed to work.

Instead I think we should just make it possible to get the underlying backing store out of a SecretKey. Because we have invariants to maintain (one of which is that the key is never all-bytes-zero!) we unfortunately can't give you direct mutable access. But would any of these help?:

  • an unsafe as_mut_inner method which took SecretKey -> &mut [u8] with a stern lecture about how if you set it to all-bits-zero you're not allowed to use the key anymore and if you do it may abort the process
  • into_inner method which took SecretKey -> [u8; 32]
  • insecure_erase method which just sets all the underlying bytes to 1s in-place or something, and has some warnings about how these types are Copy and we don't try to prevent compiler optimizations and that to use this effectively you need to pin the type and do some other extra work.
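
A rough sketch of the shapes these three options could take, assuming a [u8; 32]-backed key; the SecretKey here is a stand-in type and the bodies are illustrative, not the crate's actual implementation:

// Stand-in for the real type, which wraps a 32-byte secret.
pub struct SecretKey([u8; 32]);

impl SecretKey {
    /// Option 1: mutable access to the raw bytes; the caller must keep the
    /// key non-zero or stop using it afterwards.
    pub unsafe fn as_mut_inner(&mut self) -> &mut [u8] {
        &mut self.0
    }

    /// Option 2: consume the key and hand back the raw bytes.
    pub fn into_inner(self) -> [u8; 32] {
        self.0
    }

    /// Option 3: overwrite the bytes in place with a fixed, non-zero pattern.
    /// (As described above, this makes no attempt to defeat the optimizer.)
    pub fn insecure_erase(&mut self) {
        self.0 = [1u8; 32];
    }
}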

@Kixunil
Collaborator

Kixunil commented Feb 20, 2023

Great point @apoelstra! I personally like unsecure_erase the most because it doesn't close the door to niche optimization in the future. into_inner would not work.

Also I just realized that if we do not provide this then the only sound unsafe solution is to wrap SecretKey into MaybeUninit, which is kinda ugly and not many people would understand why they need to do that.

@kwantam
Contributor Author

kwantam commented Feb 20, 2023

@apoelstra I completely understand the maintenance burden, particularly regarding MSRV, and @Kixunil thanks very much for the helpful review and for your thoughts on alternatives.

Of the alternatives @apoelstra lists, my preference would be into_inner: it gives consuming libraries a direct path to a Zeroize implementation that has a chance of not being optimized away. In contrast, getting zeroizing right with insecure_erase probably requires reimplementing a few of the low-level tricks already used in Zeroize---or it might not work at all without resorting to unsafe, kind of defeating the point.

I understand @Kixunil's point about how this leaks an abstraction, meaning that future optimizations might become breaking changes. Y'all obviously know much better than I do, but as far as I can tell the underlying representation of SecretKey has never changed---so my guess is that the likelihood of causing a problem with into_inner is essentially zero. (And we might further reduce that likelihood by feature gating the into_inner method to tacitly discourage people from using it unless they absolutely have to, adding documentation stating that it could break, etc.)

If that works for you, I'll just transmute this PR into an into_inner PR for the relevant secrets.

Thanks again for the very helpful discussion on this!

@kwantam
Contributor Author

kwantam commented Feb 20, 2023

Apologies for the double comment. Coming back to correct myself.

I just realized that in fact insecure_erase would be reasonably straightforward to implement in exactly the style of the zeroize crate without breaking MSRV. In particular, something like the below should make it reasonable to manually implement Zeroize in consuming crates. (compiler_fence and write_volatile are both available in 1.41.x.)

// *** borrowed from zeroize ***
#[inline(always)]
fn atomic_fence() {
    std::sync::atomic::compiler_fence(std::sync::atomic::Ordering::SeqCst);
}

#[inline(always)]
fn volatile_write<T: Copy + Sized>(dst: &mut T, src: T) {
    unsafe { std::ptr::write_volatile(dst, src) }
}
// *** end borrowed from zeroize ***

impl SecretKey {
    /// big warning goes here
    pub fn insecure_erase(&mut self) {
        volatile_write(&mut self.0, [1u8; constants::SECRET_KEY_SIZE]);
        atomic_fence();
    }
}

// and likewise for other secret-containing structs

@apoelstra
Member

If that works for you, I'll just transmute this PR into an into_inner PR for the relevant secrets.

I'd be fine with this. I agree that it seems extremely unlikely that we'll ever change SecretKey to no longer be a [u8; 32] (maybe it could become "a [u8; 32] plus some phantomdata" or something but that's still compatible).

I just realized that in fact insecure_erase would be reasonably straightforward to implement in exactly the style of the zeroize crate without breaking MSRV.

I like it!! Thank you for keeping the name insecure_erase even though we're now making some effort to avoid compiler optimizations.

@Kixunil
Collaborator

Kixunil commented Feb 20, 2023

I specifically said into_inner would not work because it'd create a copy that'd then get overwritten.
Regarding internal representation, the memory layout will be forever [u8; 32], maybe aligned more but the issue is that we may signal to the compiler it's never zero to enable niche optimizations.
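
For context, a minimal illustration of the niche optimization being referred to, using std's NonZeroU32 rather than the secp256k1 types:

use std::mem::size_of;
use std::num::NonZeroU32;

fn main() {
    // With a plain u32, Option needs extra space for its discriminant.
    assert_eq!(size_of::<Option<u32>>(), 8);
    // When the compiler knows the value is never zero, it can reuse the
    // all-zero bit pattern for None, so no extra space is needed.
    assert_eq!(size_of::<Option<NonZeroU32>>(), 4);
}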

insecure

Not a native speaker, but to my knowledge "insecure" means having insecurity, essentially being afraid of something. So we should either say unsecure or non_secure.

And can someone explain why the atomic fence is used? I don't think it should be needed.

@kwantam kwantam changed the title from "derive Zeroize impl for all secret-related structs" to "implement insecure-erase feature (was: Zeroize)" on Feb 20, 2023
@kwantam
Contributor Author

kwantam commented Feb 20, 2023

OK, I've taken a first cut at insecure-erase. The place that was least obvious to me was what to do about KeyPair. In the end, I put a constant byte vector in secp256k1-sys that represents the KeyPair corresponding to the secret [1u8; 32], but there may be a better way that I missed. This way, insecure_erase() always results in a value that is valid.

README.md Outdated
@@ -38,6 +38,10 @@ Alternatively add symlinks in your `.git/hooks` directory to any of the githooks
We use a custom Rust compiler configuration conditional to guard the bench mark code. To run the
bench marks use: `RUSTFLAGS='--cfg=bench' cargo +nightly bench --features=recovery`.

### A note on the `insecure-erase` feature

The `insecure-erase` feature is provided to assist other libraries in building secure secret erasure. When the `insecure-erase` feature is enabled, secret types (`SecretKey`, `KeyPair`, `SharedSecret`, `Scalar`, and `DisplaySecret`) have a method called `insecure_erase` that *attempts* to overwrite the contained secret. This library makes no guarantees about the security of using `insecure_erase`. In particular, since all of these types are `Copy`, the compiler is free to move and copy them, thereby making additional copies in memory. For more information, consult the [`zeroize`](https://docs.rs/zeroize) documentation.
Collaborator

since all of these types are Copy, the compiler is free to move and copy them

This is misleading, even non-Copy types can be copied by the compiler. Copy literally means "doesn't run destructors", nothing else.

It should say something like "In particular, the compiler doesn't know that this is a secret so it may arbitrarily copy it anywhere it pleases."

Contributor Author

Great point. Reworded!

@kwantam
Contributor Author

kwantam commented Feb 20, 2023

I specifically said into_inner would not work because it'd create a copy that'd then get overwritten. Regarding internal representation, the memory layout will be forever [u8; 32], maybe aligned more but the issue is that we may signal to the compiler it's never zero to enable niche optimizations.

Apologies for misunderstanding! What you say here makes total sense.

insecure

Not a native speaker, but to my knowledge "insecure" means having insecurity, essentially being afraid of something. So we should either say unsecure or non_secure.

The word "insecure" can be used in the way that's meant here. For example, one could talk about an "insecure hash function". "Unsecure" is not in modern use (the Oxford English Dictionary lists it as obsolete). The closest is the adjective "unsecured", which (to my knowledge) is most commonly used to refer to financial products: "an unsecured loan".

"non_secure_erase" would be totally fine and is, as far as I can tell, almost perfectly synonymous with "insecure_erase". The only disadvantage is the slightly longer name.

And can someone explain why the atomic fence is used? I don't think it should be needed.

The zeroize crate has a comment saying that the atomic fence is to prevent later reads from being reordered. I'm guessing this is because of the semantics of std::ptr::write_volatile: according to the Rust stdlib docs, volatile operations are only guaranteed "to not be elided or reordered by the compiler across other volatile operations." To me this implies that non-volatile reads could get reordered before the volatile write. If that's true, the memfence would indeed prevent this.

@Kixunil
Collaborator

Kixunil commented Feb 20, 2023

Oh, I just realized this whole PR is not needed to implement zeroize! Just write this code:

let dummy = Secret::from_slice(&[1; 32]).unwrap();
volatile_write(&mut the_keypair_you_want_to_erase, dummy);
// fence, if it's actually needed

@Kixunil
Collaborator

Kixunil commented Feb 20, 2023

To me this implies that non-volatile reads could get reordered before the volatile write. If that's true, the memfence would indeed prevent this.

The code in question should run in a destructor, so there should be no more reads afterwards (other than some horrible UB accessing uninitialized memory - pretty certain the memory will get corrupted anyway). Even if we wanted to prevent those, Acquire should be used instead.

@kwantam
Contributor Author

kwantam commented Feb 20, 2023

To me this implies that non-volatile reads could get reordered before the volatile write. If that's true, the memfence would indeed prevent this.

The code in question should run in a destructor, so there should be no more reads afterwards (other than some horrible UB accessing uninitialized memory - pretty certain the memory will get corrupted anyway). Even if we wanted to prevent those, Acquire should be used instead.

Yes, I think you're 100% right if we assume that erasure will only ever happen in a destructor, but I think that's not an assumption the zeroize authors are making, hence their use of the memory fence.

Regarding memory ordering: I think you're right that it's possible to choose a slightly weaker constraint than SeqCst. I'm less sure that trying to weaken the fence here is worthwhile: if the goal of this change is to enable folks to implement zeroize in consuming libraries, it seems likely that in most cases they'll be mixing their implementation with implementations derived from the zeroize crate---at which point they're going to have a bunch of SeqCst memory fences no matter what. So the advantage of deviating in this implementation seems minimal, and it's potentially going to cause headaches of the form "hey guys why didn't you use SeqCst like the zeroize crate???"

@kwantam
Contributor Author

kwantam commented Feb 20, 2023

Oh, I just realized this whole PR is not needed to implement zeroize! Just write this code:

let dummy = Secret::from_slice(&[1; 32]).unwrap();
volatile_write(&mut the_keypair_you_want_to_erase, dummy);
// fence, if it's actually needed

Yes, 100% agreed. I guess this is an instance of the "zeroize with unsafe code" approach discussed earlier in the thread.

On the other hand, it would be nice to provide this implementation in one place so that consuming libraries don't all have to implement it themselves. This is especially handy since the only use of unsafe in this PR is in secp256k1-sys, which already has to use unsafe because of FFI. This means that consuming libraries can forbid(unsafe_code) and still implement zeroize.
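
A minimal sketch of what this would enable in a consuming crate, assuming the insecure_erase method from this PR and a zeroize dependency; the Wallet type is purely illustrative:

// Hypothetical consuming crate (top of its lib.rs).
#![forbid(unsafe_code)]

use secp256k1::SecretKey;
use zeroize::Zeroize;

struct Wallet {
    key: SecretKey,
}

impl Zeroize for Wallet {
    fn zeroize(&mut self) {
        // No unsafe needed here: the volatile write (and fence) live behind
        // insecure_erase, inside secp256k1/secp256k1-sys.
        self.key.insecure_erase();
    }
}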

@apoelstra
Member

I don't think we should feature-gate this. The point of calling it insecure is to discourage people from using it without understanding exactly what it provides. But if it's just a method we might as well make it always available.

I don't have any strong feelings about the memory fence. I guess we should have a clear justification for why we do what we do, so we're not cargo-culting, but I also feel like using anything but SeqCst is "just an optimization". Though I do agree that if the comment "prevent future accesses from being reordered to before erasure" is a complete description of the purpose of the fence, that Acquire does seem like it should be sufficient.

Other than that, ACK 6b0961c

@apoelstra
Member

Oh, and I do mildly prefer non_secure to insecure actually. These words do have different meanings (or at least, different connotations). non_secure means we have made no effort to make it secure. insecure means there is actually some vulnerability.

My reason for choosing insecure was simply to warn users that they shouldn't try to use this for erasing secrets unless they understand what they're doing and they read the docs and source code. But I think non_secure may communicate the intent more clearly.

@kwantam
Contributor Author

kwantam commented Feb 20, 2023

I don't think we should feature-gate this. The point of calling it insecure is to discourage people from using it without understanding exactly what it provides. But if it's just a method we might as well make it always available.

Just pushed a commit that removes the feature gates and rewords the text in the README accordingly. We can easily revert if the discussion pushes us back in the other direction.

I don't have any strong feelings about the memory fence. I guess we should have a clear justification for why we do what we do, so we're not cargo-culting, but I also feel like using anything but SeqCst is "just an optimization". Though I do agree that if the comment "prevent future accesses from being reordered to before erasure" is a complete description of the purpose of the fence, that Acquire does seem like it should be sufficient.

Other than that, ACK 6b0961c

If it would help, I can try to ping the authors of the zeroize crate and get more info about their use of SeqCst. But as I said above, it seems like the bar to choosing a weaker consistency model should be pretty high since it's unlikely to be a helpful optimization in practice and there are likely some subtleties that aren't captured by the terse comments in the zeroize code.

@apoelstra
Member

My vote is to just stick with SeqCst.

@kwantam oof, these CI failures do not look fun. I believe the reason is that you've hardcoded an internal representation of the all-1s keypair, but this internal representation is different on systems of different endianness. The most straightforward approach may be to hardcode two versions, switched based on cfg flags, because I think there are only two endiannesses that anybody uses anymore. But otherwise you may need to "construct" the keypair object properly, which would probably involve editing C code..

@kwantam
Contributor Author

kwantam commented Feb 20, 2023

My vote is to just stick with SeqCst.

@kwantam oof, these CI failures do not look fun. I believe the reason is that you've hardcoded an internal representation of the all-1s keypair, but this internal representation is different on systems of different endianness. The most straightforward approach may be to hardcode two versions, switched based on cfg flags, because I think there are only two endiannesses that anybody uses anymore. But otherwise you may need to "construct" the keypair object properly, which would probably involve editing C code..

Ah, hrm. Makes sense. I need to step away from this PR for the next few hours, but I'll take another look tonight. Thanks for the pointer!

@kwantam
Contributor Author

kwantam commented Feb 20, 2023

OK! All the cross tests work locally for me now. BE32 and BE64 needed different constants.

If having three different constants feels too ugly, a couple obvious alternatives:

  • give up on leaving KeyPair valid after erasure (e.g., make the secret key all-1s and ignore the public key)
  • make KeyPair::non_secure_erase take a context (but this will be slow because erasure will involve a point multiplication)
  • comedy option: find a KeyPair whose repr is stable across all endiannesses and pointer sizes

I'm in favor of keeping the constants; seems like the least bad option.

@apoelstra
Member

If having three different constants feels too ugly, a couple obvious alternatives:

  • give up on leaving KeyPair valid after erasure (e.g., make the secret key all-1s and ignore the public key)

I think I'd prefer the ugliness of the multiple constants to this. I may consider PR'ing upstream to see if they'll expose some "dummy key" constants.

  • make KeyPair::non_secure_erase take a context (but this will be slow because erasure will involve a point multiplication)

Heh, no, I think this is a really bad idea. It would be literally a 10000x performance hit.

  • comedy option: find a KeyPair whose repr is stable across all endiannesses and pointer sizes

:) I did consider this actually, but I am convinced that it's computationally impossible.

@apoelstra
Member

Could you squash/rebase this so that each commit is reviewable individually?

@apoelstra
Member

I think probably you could just squash everything into one commit.

@kwantam
Contributor Author

kwantam commented Feb 21, 2023

Done! Thanks a lot for thinking through this with me.

@kwantam
Contributor Author

kwantam commented Feb 21, 2023

Since you didn't mention it, I assume this PR should not bump the version number. But please let me know if you'd like me to do so.

tcharding
tcharding previously approved these changes Feb 21, 2023
Member

@tcharding tcharding left a comment

All my comments are superficial documentation improvements. Feel free to ignore them; the code changes look good to me. I've pushed the changes to my tree and I can do a follow-up PR if you want me to (branch 02-21-non-secure-erase-docs on my tree) or you can grab them from there.

ACK 2865863

README.md Outdated (resolved)
secp256k1-sys/src/lib.rs Outdated (resolved)
secp256k1-sys/src/lib.rs Outdated (resolved)
src/key.rs Outdated (resolved)
src/macros.rs Outdated (resolved)
@kwantam
Contributor Author

kwantam commented Feb 21, 2023

Thanks @tcharding! I squashed in the branch 02-21-non-secure-erase-docs from your tree and wrapped the new README paragraph at 100 columns.

@Kixunil
Collaborator

Kixunil commented Feb 21, 2023

but I think that's not an assumption the zeroize authors are making, hence their use of the memory fence.

Do they just blindly stick an (overly strong) fence everywhere? Sounds wrong. I would suggest we leave it off and document it.

I guess this is an instance of the "zeroize with unsafe code" approach discussed earlier in the thread.

I wouldn't say so. Without any support the code would either have to be unsound or the secret keys would have to be stored inside MaybeUninit, which would be annoying. But a simple volatile write that is trivial to audit is a completely different thing (I also wonder why there's no volatile_write<T>(&mut T, T) function).

This means that consuming libraries can forbid(unsafe_code) and still implement zeroize.

That seems to promote an unhealthy relationship with unsafe. unsafe is not some kind of boogeyman that will come to you at night and eat your head. I find it kinda annoying that the attribute even exists if it doesn't apply to dependencies. Ultimately, you have to review everything.

Looking at the implementation I find it highly concerning that we are manually writing the contents of structs that are supposed to be internal to libsecp256k1. This really should be upstreamed. If not we MUST have tests checking that the internal representation didn't change in a new version and we should document that people compiling the library for any platform we don't test in CI MUST test it themselves.

Also I noticed there are still several instances mentioning Copy which is totally irrelevant.

@kwantam
Contributor Author

kwantam commented Feb 21, 2023

Thanks again @Kixunil for the helpful feedback!

Do they just blindly stick an (overly strong) fence everywhere? Sounds wrong. I would suggest we leave it off and document it.

The default implementations use a memory fence. It's possible that it's overly strong. It seems more likely to me that the zeroize authors have thought about this a lot longer than I have, and have a reason for choosing SeqCst. Getting rid of it without understanding their reasoning seems a bit more aggressive than I'm comfortable with.

Stepping back: the goal, from my perspective, is to provide library implementors with the tool they need to implement a conforming zeroize implementation in structs containing types from this library. Given that, I have a strong bias towards remaining consistent with what zeroize would provide if we'd instead just added derive(Zeroize) to these structs.

That seems to promote an unhealthy relationship with unsafe. unsafe is not some kind of boogeyman that will come to you at night and eat your head. I find it kinda annoying that the attribute even exists if it doesn't apply to dependencies. Ultimately, you have to review everything.

I completely agree with you! We should certainly not be scared of unsafe. But it still might make sense to contain it to a small number of crates in one's dependency tree, because it narrows the scope of the unsafe-related auditing effort.

Looking at the implementation I find it highly concerning that we are manually writing the contents of structs that are supposed to be internal to libsecp256k1. This really should be upstreamed. If not we MUST have tests checking that the internal representation didn't change in a new version and we should document that people compiling the library for any platform we don't test in CI MUST test it themselves.

👍 absolutely crucial, as you say. There's currently a test in src/key.rs asserting that a non_secure_erased KeyPair is identical to a KeyPair initialized with the secret key [1u8; 32], which is what found the endianness issues.

Also I noticed there are still several instances mentioning Copy which is totally irrelevant.

Great catch, thank you. I'll fix those right now.

This PR implements a `non_secure_erase()` method on SecretKey,
KeyPair, SharedSecret, Scalar, and DisplaySecret. The purpose
of this method is to (attempt to) overwrite secret data with
valid default values. This method can be used by libraries
to implement Zeroize on structs containing secret values.

`non_secure_erase()` attempts to avoid being optimized away or
reordered using the same mechanism as the zeroize crate: first,
using `std::ptr::write_volatile` (which will not be optimized
away) to overwrite the memory, then using a memory fence to
prevent subtle issues due to load or store reordering.

Note, however, that this method is *very unlikely* to do anything
useful on its own. Effective use involves carefully placing these
values inside non-Copy structs and pinning those structs in place.
See the [`zeroize`](https://docs.rs/zeroize) documentation for tips
and tricks, and for further discussion.

[this commit includes a squashed-in commit from tcharding to fix docs
and helpful suggestions from apoelstra and Kixunil]
@Kixunil
Collaborator

Kixunil commented Feb 21, 2023

I have a strong bias towards remaining consistent with what zeroize would provide if we'd instead just added derive(Zeroize) to these structs.

People who want it can just call the fence right after calling our function. People who don't (e.g. they are only calling it in destructors) don't have to pay the cost. Libraries should strive to be flexible and this is more flexible without affecting ergonomics greatly.

@kwantam
Contributor Author

kwantam commented Feb 21, 2023

People who want it can just call the fence right after calling our function. People who don't (e.g. they are only calling it in destructors) don't have to pay the cost. Libraries should strive to be flexible and this is more flexible without affecting ergonomics greatly.

I think there are two important questions here: what are we optimizing for, and what will the result of that optimization be?

The second one is easier to answer: there will be almost exactly zero performance improvement as a result of removing this fence. Every other zeroize implementation under the sun already has a fence, and on every processor I'm aware of the performance impact of multiple fences stacked up is deeply sublinear in (or even negligibly dependent on) the number of fences. So when we call non_secure_erase on these values as part of the Zeroize implementation of some containing struct, in all likelihood we will have saved very little.

Case in point:

use std::{ptr, sync::atomic};

fn _bench_fence<const FENCE_PERIOD: usize>() {
    let mut foo = [16u8; 32];

    for i in 0..(1 << 25) {
        let j = (i % 256) as u8;
        unsafe { ptr::write_volatile(&mut foo, [j; 32]); }

        if i % FENCE_PERIOD == 0 {
            atomic::compiler_fence(atomic::Ordering::SeqCst);
        }
    }

    unsafe { ptr::write_volatile(&mut foo, [17u8; 32]); }
    atomic::compiler_fence(atomic::Ordering::SeqCst);
}

Results are below. bench_X corresponds to _bench_fence::<X>(). I used a const generic rather than a function argument to give the compiler more optimization opportunities, e.g., loop unrolling, when X is small; this results in code closer to what we'd expect to see in a typical Zeroize impl. [EDIT: previously misstated my point in the prior sentence. I updated the sentence to reflect what I actually meant. Apologies for misstating initially.]

running 5 tests
test bench_4096 ... bench:  26,266,191 ns/iter (+/- 402,691)
test bench_512  ... bench:  26,455,962 ns/iter (+/- 441,852)
test bench_64   ... bench:  27,717,758 ns/iter (+/- 306,653)
test bench_8    ... bench:  26,296,641 ns/iter (+/- 1,313,000)
test bench_1    ... bench:  26,190,808 ns/iter (+/- 637,916)

In sum: stacking up more fences just doesn't matter. This is partly because the volatile writes are far more expensive than the fences, and partly because the performance impact of going from zero fences to one fence completely swamps the performance impacts of additional fences. Removing this fence just isn't a meaningful optimization in the vast majority of expected uses of this function, i.e., as part of a larger zeroize implementation of a containing struct.


Back to the first question: what are we optimizing for? If we're optimizing for performance of clearing secrets out of memory, the above benchmarks give pretty good evidence that this optimization isn't doing much. On the other hand, if we're optimizing for the fewest ways for a user to shoot themselves in the foot, it seems pretty obvious that adding a memory fence is the right way to go.

So as far as I can tell, removing the fence gains us nothing (performance-wise) and potentially loses a lot (in terms of foot-guns).

@Kixunil
Collaborator

Kixunil commented Feb 21, 2023

Nice analysis! OK, let's keep the fence in.

Member

@tcharding tcharding left a comment

FWIW ACK 8fffbea

@apoelstra
Member

Wow! This is a lot of analysis to wake up to. Thanks for soldiering through @kwantam and thanks for your detailed analysis @Kixunil.

I agree with Kix's discomfort at hardcoding internal reprs like this. But it's fine for now. I'll open an issue upstream about it.

Member

@apoelstra apoelstra left a comment

ACK 8fffbea
