
Why does Zeroize use a fence? #988

Open
oconnor663 opened this issue Dec 1, 2023 · 16 comments

@oconnor663

oconnor663 commented Dec 1, 2023

The docs mention that the fence prevents reordering "as a precaution to help ensure reads are not reordered before memory has been zeroed". My understanding is that the compiler and the hardware are already obligated to make sure that a thread reads its own writes, so reorderings are only possible from the perspective of other threads (or interrupts). But zeroize is using a compiler fence rather than a full atomic fence, so it doesn't seem like we're worried about other threads here. Am I missing some other scenario? What reorderings could bite us here without the fence?
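For context, a rough sketch of the pattern under discussion (not zeroize's exact code; the function name is made up for illustration): volatile writes of zero bytes, followed by a compiler-only fence.

```rust
use std::ptr;
use std::sync::atomic::{compiler_fence, Ordering};

// Sketch of the volatile-write-plus-compiler-fence pattern. The fence
// prevents the *compiler* (but not the CPU) from reordering later memory
// accesses before the zeroing.
fn zero_in_place(buf: &mut [u8]) {
    for b in buf.iter_mut() {
        // Volatile write: the compiler must treat it as observable
        // and may not elide it.
        unsafe { ptr::write_volatile(b, 0) };
    }
    compiler_fence(Ordering::SeqCst);
}

fn main() {
    let mut secret = *b"hunter2";
    zero_in_place(&mut secret);
    assert!(secret.iter().all(|&b| b == 0));
}
```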

@tarcieri
Member

tarcieri commented Dec 1, 2023

I believe when it was originally added the concerns were things like interruption by signal handlers, though now I can't find the previous discussion

@newpavlov
Member

newpavlov commented Jan 22, 2024

I also think that the fences are not necessary. With the current crate MSRV we could even replace the volatile writes with plain writes followed by a "black box" asm! block:

asm!(
    "# {x}", // "use" named argument inside comment
    x = in(reg) x,
    options(nostack, readonly, preserves_flags),
);

As we can see, this is sufficient to prevent the compiler from eliminating the zeroization, and it should be universal across supported targets.
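A fuller, compilable sketch of that approach (the function name is hypothetical, and this assumes a target where `asm!` is stable):

```rust
use std::arch::asm;

// Plain writes followed by an empty asm! block that takes a pointer to the
// buffer. Because the block is not marked "pure", the compiler must assume
// the zeroed bytes may be observed, so the writes cannot be eliminated.
fn zeroize_bytes(buf: &mut [u8]) {
    for b in buf.iter_mut() {
        *b = 0;
    }
    unsafe {
        asm!(
            "# {ptr}", // "use" the pointer inside an assembler comment
            ptr = in(reg) buf.as_ptr(),
            options(nostack, readonly, preserves_flags),
        );
    }
}

fn main() {
    let mut secret = *b"hunter2";
    zeroize_bytes(&mut secret);
    assert!(secret.iter().all(|&b| b == 0));
}
```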

@tarcieri
Member

asm! is only supported on a limited number of platforms, so we still need a portable fallback.

BTW this is more or less what was proposed in #841.

@newpavlov
Member

asm! is only supported on a limited number of platforms

Hm, yes. I thought an empty asm! block would work even where the full syntax is not yet stable, but that's not the case.

We should probably then use the asm! black box on supported targets and fall back to core::hint::black_box on a best-effort basis on the others. I played with the latter on Godbolt, and on simple examples it works as expected, with slightly worse code generation (on x86 it adds two unnecessary instructions).
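A minimal sketch of that fallback (hypothetical function name; as discussed below, `black_box` carries no hard guarantee, so this is strictly best-effort):

```rust
use std::hint::black_box;

// Best-effort fallback for targets without asm! support: plain writes,
// then black_box on the buffer, which asks the optimizer (without any
// guarantee) to treat the contents as used so the writes stay live.
fn zeroize_fallback(buf: &mut [u8]) {
    for b in buf.iter_mut() {
        *b = 0;
    }
    black_box(buf);
}

fn main() {
    let mut secret = *b"hunter2";
    zeroize_fallback(&mut secret);
    assert!(secret.iter().all(|&b| b == 0));
}
```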

As for the fences, I think they are simply not necessary. The lack of explicit memory synchronization may cause other cores to briefly see non-zeroized data through relaxed reads, but IIUC that can only happen if the data was already cached by that core.

@tarcieri
Member

tarcieri commented Jan 22, 2024

We probably then should use the asm! black box on supported targets and use core::hint::black_box as a best-effort fallback on others.

The documentation for core::hint::black_box is pretty adamant that it shouldn't be used for these sorts of applications, and I think it would at least invite questions if we were to use it, even if the codegen spot-checks look fine:

https://doc.rust-lang.org/stable/core/hint/fn.black_box.html

"Note however, that black_box is only (and can only be) provided on a best-effort basis. The extent to which it can block optimisations may vary depending upon the platform and code-gen backend used. Programs cannot rely on black_box for correctness, beyond it behaving as the identity function. As such, it must not be relied upon to control critical program behavior. This immediately precludes any direct use of this function for cryptographic or security purposes."

As for the fences, I think they are simply not necessary.

I can't find a good justification for them, so they can probably be removed.

@newpavlov
Member

This is why I wrote that black_box is "best effort". According to the rust-lang Zulip discussion linked in the PR, the scary documentation is mostly a way for the Rust team to waive responsibility for potential surprises in black_box codegen. IIUC the main concern is about using black_box to enforce constant-time execution, not about elimination of writes. But it's the most suitable tool we currently have in the language outside of empty asm! blocks. Our use of write_volatile could itself be considered a misuse, since the function is intended for IO memory, so in theory the compiler could eliminate it when it's used on stack memory.

@tarcieri
Member

@newpavlov based on my discussions with the Unsafe Code Working Group, zeroize's usage of write_volatile is fine. In fact, it led to changing some of the language in the documentation which suggested otherwise:

rust-lang/rust#60972

cc @RalfJung

@oconnor663 changed the title from "Why does Zeroice use a fence?" to "Why does Zeroize use a fence?" Jan 22, 2024
@RalfJung

RalfJung commented Jan 22, 2024 via email

@newpavlov
Member

@RalfJung
Could you please comment on the difference between core::hint::black_box and an empty asm! block like the one here? I wonder why the former emits additional unnecessary instructions. Is it guaranteed that the asm! block makes the value "observed" and that the compiler cannot eliminate the last writes to it?

@RalfJung

asm! just guarantees that the assembly gets emitted, nothing else. black_box doesn't hard-guarantee anything, as stated in its docs.

Volatile writes seem like the most reliable approach to me, but I'm not an expert in what compilers will actually do when it comes to secret management. I can only comment on the language spec, which is not very useful in this case, since the spec does not account for secrets.

@newpavlov
Member

newpavlov commented Jan 22, 2024

asm! just guarantees that that assembly gets emitted

The question is whether the compiler is allowed to look inside a provided assembly block or not. If during optimization passes it may only analyze the provided input/output arguments and options (let's set aside register allocation for named arguments for now), then I think that would be a sufficient guarantee for us.

With the empty assembly block we rely on the fact that we do not mark the block as "pure" and that we pass a pointer to the zeroized data to it as an input argument. In other words, we make the compiler believe that the value may be observed and that this observation may have side effects, so the compiler cannot remove the zeroization preceding the block. An "unnecessarily smart" compiler might peek inside the assembly block and prove that the pointer is not actually used, or that the block is "pure", and then apply the undesirable optimizations.

Unfortunately, we have already encountered at least one case (honestly, it was a really unpleasant surprise) where the compiler does "peek" inside an assembly block (it does not allow using certain target-feature-dependent instructions without enabling the appropriate target features), so it would be nice to clarify what we can and cannot rely on regarding assembly blocks.

@RalfJung

I'm the wrong person for these questions. I was under the impression that it doesn't "peek" into asm blocks at all.

Cc @Amanieu

@newpavlov
Member

newpavlov commented Jan 22, 2024

I was under the impression that it doesn't "peek" into asm blocks at all.

Here is the relevant issue: rust-lang/rust#112709

aarch64-unknown-linux-gnu does not assume that AES instructions are always available, and attempting to use AES instructions in inline assembly without enabling the aes target feature will result in an error.

@Amanieu

Amanieu commented Jan 22, 2024

The question is whether the compiler is allowed to look inside a provided assembly block or not. If during optimization passes it may only analyze the provided input/output arguments and options (let's set aside register allocation for named arguments for now), then I think that would be a sufficient guarantee for us.

The problem isn't the compiler peeking into the asm. The current implementation as written does guarantee that a copy of the data is zeroed. But nothing in the compiler/optimizer spec guarantees that this is the only copy of the data that exists. You're relying on the optimizer not introducing unnecessary copies, which it is perfectly allowed to do, although it usually won't, because that tends to result in worse performance.
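One visible instance of the extra-copies problem (illustrative only; the function names are made up): a `Copy` array passed by value duplicates the secret bytes into the callee's frame, and zeroing that copy leaves the caller's copy, plus any spill slots the optimizer creates, untouched.

```rust
use std::ptr;

// Zeroes *this* copy of the key after use. The caller's copy, and any
// register spills the optimizer introduces, are not touched.
fn use_and_zero(mut key: [u8; 16]) -> u8 {
    let sum = key.iter().fold(0u8, |a, &b| a.wrapping_add(b));
    for b in key.iter_mut() {
        unsafe { ptr::write_volatile(b, 0) };
    }
    sum
}

fn main() {
    let key = [7u8; 16]; // this copy lives in main's frame
    let s = use_and_zero(key); // [u8; 16] is Copy: a second copy is passed
    assert_eq!(s, 112);
    assert_eq!(key[0], 7); // the original bytes are still present
}
```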

@tarcieri
Member

FWIW, much of what @Amanieu is talking about is highlighted in this paper: https://eprint.iacr.org/2023/1713.pdf

We should definitely add stack spilling to the list of caveats documented here:

utils/zeroize/src/lib.rs

Lines 193 to 195 in 160fa8c

//! - Moves and [`Copy`]
//! - Heap reallocation when using [`Vec`] and [`String`]
//! - Borrowers of a reference making copies of the data
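The heap-reallocation caveat in that list can be sketched as follows (whether the allocator actually moves the buffer is not guaranteed, hence no assertion on the pointer):

```rust
fn main() {
    let mut v: Vec<u8> = Vec::with_capacity(4);
    v.extend_from_slice(b"key!");
    // Pushing past capacity may move the contents to a new allocation;
    // the old buffer is then freed without being zeroed, and zeroizing
    // `v` afterwards only clears the *new* allocation.
    v.push(b'!');
    assert_eq!(v.len(), 5);
    assert!(v.capacity() > 4);
}
```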

@newpavlov
Member

You're relying on the optimizer not introducing unnecessary copies

Yes, that's true, but it's a separate issue. Guaranteeing that the explicit copy is zeroized is a fine start.
