Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can atomic-polyfill and portable-atomic soundly coexist in the same dependency graph? #39

Closed
cr1901 opened this issue Sep 13, 2022 · 11 comments
Labels
C-question Category: A question

Comments

@cr1901
Copy link

cr1901 commented Sep 13, 2022

Like it or not, atomic-polyfill and portable-atomic both exist as crates solving the same problem of "providing atomic primitives to targets that don't have them" in different ways. The major difference between the two crates is that:

  1. atomic-polyfill depends on critical-section, and expects another crate to define extern functions acquire and release. The critical-sectioncrate uses acquire and release to implement critical sections. If acquire and release aren't defined by an external crate, a link error will occur. In other words, the critical section implementation can be swapped out with another.
  2. portable-atomic implements critical sections for all targets within the portable-atomic crate, and does not defer to an external crate.

According to reverse dependencies on crates.io, these two crates don't have any shared reverse dependencies, and some of these reverse deps have 10 million+ downloads. This means it's plausible to have both crates in the same dependency graph. And this means it's also plausible that these two crates in the same dependency graph implement critical sections in different ways.

Is this a soundness problem in practice? I can't visualize one, since types from this crate won't share memory locations with types from atomic-polyfill. But it's enough to get me thinking and open an issue. If it is a soundness issue as opposed to simply "two crates doing the same thing in different ways are in the same dependency graph", is it possible to fail the build if both crates get pulled in, or automatically have portable-atomic be a shim for atomic-polyfill (or vice-versa)?

@taiki-e
Copy link
Owner

taiki-e commented Sep 13, 2022

Having both in the same dependency graph is not a problem.

A case where problems can occur is when atomic operations are performed on the same memory location, one as atomic-polyfill and the other as portable-atomic. (as you mentioned)

If you do this for portable-atomic and core::sync::atomic1, it is basically2 not a problem. However, doing this for atomic-polyfill may cause problems because the layout of atomic types is different from that of portable-atomic and core::sync::atomic. And it is possible that the implementation provided by the user may not be compatible with portable-atomic and core::sync::atomic3.

Footnotes

  1. Both have the same layout and also implement atomic types in a compatible way except for 2. Lock-based fallbacks in portable-atomic are not compatible with core, but there is no possibility of misuse because portable-atomic provides lock-based fallbacks only in cases where core cannot provide that type.

  2. If Optional (unsafe) cfg to use atomic builtins on targets where atomic CAS is not supported, or on atomic operations of unsupported sizes #33 or alternative is provided, core::sync::atomic and portable-atomic will be incompatible on targets where core::sync::atomic does not have CAS but has load/store. 2

  3. Also, depending on the user-provided implementation, but note that the atomic-polyfill may cause problems when used with implementations such as global allocators and some synchronization primitives.

@taiki-e taiki-e added the C-question Category: A question label Sep 13, 2022
@cr1901
Copy link
Author

cr1901 commented Sep 13, 2022

Re: footnote 2:

Does footnote 2 only applies to archs which have the potential to be multicore? I.e. msp430 core::sync::atomic types don't work properly, but if they did I would expect to be able mix and match accesses to the same atomic using portable-atomic and core::sync::atomic because disabling interrupts truly implies exclusive access to the memory location (though Idk if portable-atomic provides the transmute between core::sync::atomic types or not).

Re: footnote 3:

Also, depending on the user-provided implementation, but note that the atomic-polyfill may cause problems when used with implementations such as global allocators and some synchronization primitives.

Well the user-provided implementations are marked unsafe, so unsoundness that results from the safe wrappers calling into the unsafe extern functions is the implementor's problem :P. I don't think there's much that can be done for "safety variants are conditionally violated depending on the code that ultimately gets linked in" on the Rust end.

In any case, like it or not, for maximum crate compat, I'll eventually have to implement msp430 for atomic-polyfill as well as portable-atomic.

@taiki-e
Copy link
Owner

taiki-e commented Sep 13, 2022

Does footnote 2 only applies to archs which have the potential to be multicore?

Yes. footnote 2 is only for multicore systems.

Also, depending on the user-provided implementation, but note that the atomic-polyfill may cause problems when used with implementations such as global allocators and some synchronization primitives.

Well the user-provided implementations are marked unsafe, so unsoundness that results from the safe wrappers calling into the unsafe extern functions is the implementor's problem :P.

Well, since the critical-section safety requirements do not mention compatibility with these use cases, an implementation that is incompatible with them is also considered a sound critical-section implementation.

@taiki-e
Copy link
Owner

taiki-e commented Sep 17, 2022

Closing -- I believe I answered your question.

@taiki-e taiki-e closed this as completed Sep 17, 2022
@cr1901
Copy link
Author

cr1901 commented Sep 17, 2022

@taiki-e I didn't close this because I thought I had ask "should "Relationship with atomic-polyfill" be elaborate upon in the docs/README.md for this crate"? And you didn't have time to respond yet.

But doing CTRL+F, I've found that that I'd never asked such a thing :P. Oops.

Do you think such a mention of atomic-polyfill is worth a mention (along w/ comparison to portable-atomic) in the README.md? You can even copy and paste your responses/my questions into the README.md :D!

@taiki-e
Copy link
Owner

taiki-e commented Sep 17, 2022

Do you think such a mention of atomic-polyfill is worth a mention (along w/ comparison to portable-atomic) in the README.md? You can even copy and paste your responses/my questions into the README.md :D!

My thoughts on atomic-polyfill are that the atomic-polyfill and the feature-based approach it recommends can very easily introduce unsoundness (see smol-rs/atomic-waker#7 (comment) for more). So, my recommendation is that common libraries should avoid atomic-polyfill.

If we mention the atomic-polyfill in README we cannot avoid mentioning it. And when we mention it, it makes little sense to explain interoperability with atomic-polyfill in README because we recommend not the use of atomic-polyfill anyway. (The link to this issue is fine though.)

@cr1901
Copy link
Author

cr1901 commented Sep 17, 2022

My thoughts on atomic-polyfill are that the atomic-polyfill and the feature-based approach it recommends can very easily introduce unsoundness.

If you're referring to the multi-core unsoundness, that is gone in atomic-polyfill; atomic-polyfill does "either everything's emulated using critical-sections" or "nothing is emulated by critical-sections". I've asked about adding back "CAS only emulation" for single core targets.

That said, seeing at peripheral I/O access in Rust-embedded requires critical-section, I cannot get rid of at least critical-section as a dep. So I have to keep my implementation in the msp430 crate.

I'll get over it, but the duplication doesn't feel right to me:

  1. I will potentially have two pieces of the same nominally-inlineable code (acquire and release) provided by different crates.1 Thing probably work fine as-is, but for my own peace of mind, I'm happy to make a PR to export msp430's asm to portable-atomic as a dep if you're willing.
  2. The ecosystem is already fragmenting, where some crates are depending on portable-atomic and others on atomic-polyfill. And I'm not sure how comfortable I feel about adding a portable-atomic feature that overrides the atomic-polyfill dependency (to preserve feature union/avoid mutually-exclusive features) to every crate that chose to use atomic-polyfill over portable-atomic.2

Footnotes

  1. In msp430's case, the two different impls of interrupt-based critical sections are morally equivalent but may or may not emit the same insns; using a u16 in my case optimizes better for size thanks to extern in critical-section.
  2. I believe using portable-atomic anywhere atomic-polyfill is used is sound. atomic-polyfill being chosen as a dep for a downstream crate should be morally equivalent to a choosing the one best implementation of critical-section automatically using the cfgs provided by atomic-polyfill. However, such PRs to add a portable-atomic feature that overrides portable-atomic might not be accepted upstream.

@taiki-e
Copy link
Owner

taiki-e commented Sep 17, 2022

If you're referring to the multi-core unsoundness, that is gone in atomic-polyfill; atomic-polyfill does "either everything's emulated using critical-sections" or "nothing is emulated by critical-sections".

This seems to be about the unsoundness I mentioned in rust-lang/rust#99668 (comment), I see that it has been fixed.

However, my concern here is about the feature-based approach recommended by atomic-polyfill and critical-section and used by the crate that implements critical-section. In smol-rs/atomic-waker#7 (comment), I said:

The atomic-polyfill and the feature-based approach it recommends can very easily introduce unsoundness when the dependency enables that feature. (Even if they no longer provide a default implementation, even if they warn about it in their documentation. -- I have seen many libraries in the async ecosystem where the similar feature to select TLS or runtime is enabled by default or always enabled. So, I believe that such an approach is fundamentally fragile.)

See also mvdnes/spin-rs#114 and tokio-rs/bytes#461 (comment).

This is fine for some platforms (e,g., msp430 which is single core) and use cases, but can be problematic for libraries that are not embedded-specific.


Reading the rest of your comment, I felt you are hoping for portable-atomic to support critical-section crate. (Sorry if this is a misunderstanding.)

We have considered it in the past (#26) and would be fine with supporting it. However, since the version of critical-section is considered a public API (matklad/once_cell#190 (comment)), we may need to take an approach like "unsafe cfg to enable the implementation based on critical-section" + "feature to select the version of critical-section".

@cr1901
Copy link
Author

cr1901 commented Sep 17, 2022

Reading the rest of your comment, I felt you are hoping for portable-atomic to support critical-section crate. (Sorry if this is a misunderstanding.)

My concerns boil down to "we have two crates implementing the same functionality in two different ways, and the crates ecosystem is not converging on either of them." Which crate gets chosen seems to depend on who you know :P.

This leads to code duplication that is probably sound and or doesn't break1, such as you and I maintaining different versions of msp430 interrupt enable and disable. But I would find it weird to have "two versions of the same interrupt enable/disable code with potentially different codegen" in an application, when looking at e.g. objdump -D.

Additionally, when making changes to either of our interrupt enable/disable code, we have to make sure our two versions are semantically equivalent even if we have different codegen (see #40 (comment)). I think having the interrupt enable/disable code in one place, such as you depending on the msp430 crate, would remove that burden (and it becomes only my fault if I break the interrupt enable/disable code :P).

We have considered it in the past (#26) and would be fine with supporting it.

This would also solve my concern if I enabled the unsafe cfg. I would still need to add a portable-atomic feature to everything using atomic-polyfill that overrides the atomic-polyfill crate to get rid of the code dup to my satisfaction.

Footnotes

  1. Example of where things could break: If my application's dependency graph brings in both atomic-polyfill and portable-atomic, they should in principle coexist fine now that the CAS-only emulation was gone. atomic-polyfill doesn't work around backend bugs however, so when the CAS-only emulation was present, trying to compile atomic-polyfill even as a transitive dependency would result in an LLVM "cannot select" failure.

@taiki-e
Copy link
Owner

taiki-e commented Oct 17, 2022

we have to make sure our two versions are semantically equivalent even if we have different codegen

IIUC the problem occurs when this crate and another crate are disabling interrupts in incompatible ways. Examples of this are M-mode vs S-mode on RISC-V, disabling only irq vs disabling both irq & fiq on pre-v6 ARM, etc. Is there such a case with MSP430? (No as far as I know, but I'm not an expert on MSP430.)

FYI #44 has somewhat improved the situation on this, and the interrupt module's readme now has documentation on which way is used.

I think having the interrupt enable/disable code in one place, such as you depending on the msp430 crate, would remove that burden (and it becomes only my fault if I break the interrupt enable/disable code :P).

I basically agree with you on this. However, rust-embedded crates do not always generate the code I expect or support the older compilers that portable-atomic supports for historical reasons.

Some problems can be solved by contributions such as rust-embedded/msp430#12, but I don't want to make requests that clearly burden the maintainers of rust-embedded crates, such as support for older compilers. (Also, as mentioned in #40 (comment), it may allow for optimizations that are not possible with rust-embedded crates, which need to consider a variety of use cases.)

Therefore, there are no plans to change the current style of using our own implementation by default.
However, if some users do not want to use the default implementation of this crate, it is probably fine to support the msp430 crate as an optional dependency. (There is the issue of how to handle updates to dependency versions, though.)

@cr1901
Copy link
Author

cr1901 commented Oct 17, 2022

IIUC the problem occurs when this crate and another crate are disabling interrupts in incompatible ways. Examples of this are M-mode vs S-mode on RISC-V, disabling only irq vs disabling both irq & fiq on pre-v6 ARM, etc.

Indeed, this is another problem. However, I was approaching it from the "what happens if the implementations accidentally diverge in two or more crates and how do we efficiently make them all correct wrt each other?" angle. More implementations == more things can go wrong. But...

it may allow for optimizations that are not possible with rust-embedded crates, which need to consider a variety of use cases.)

This is also true, and compelling. I think I've changed my mind about "only one place to maintain the code" since I wrote that comment.

Is there such a case with MSP430? (No as far as I know, but I'm not an expert on MSP430.)

No, not right now. The critical-section provided by the msp430 crate can safely be used in place of the portable-atomic one. The converse isn't true, but... you've made a good argument that that's probably okay.

I have considered making an msp-threads crate for preemptive multithreading as an experiment, and an alternate critical-section implementation incompatible with disabling interrupts may be appropriate there. But I have not worked out the details about how such a critical-section would interact with portable-atomic.1

  1. Other than "a preemptive multithreading environment, to the extent possible, doesn't want other code to disable interrupts". And well, if you use portable-atomic with a hypothetical msp-threads, disabling ints is exactly what you get :D.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-question Category: A question
Projects
None yet
Development

No branches or pull requests

2 participants