
Contract of spi::FullDuplex in presence of hardware FIFOs? #130

Closed
edarc opened this issue Mar 10, 2019 · 17 comments
@edarc

edarc commented Mar 10, 2019

Background: I'm using a SPI device on an STM32F303, with the stm32f30x-hal crate. In this crate, the FullDuplex::send implementation does the following check (edited below for clarity):

if sr.txe().bit_is_set() {  // <-- note this register
    unsafe { ptr::write_volatile(&self.spi.dr as *const _ as *mut u8, byte) }
    Ok(())
} else {
    Err(nb::Error::WouldBlock)
}

I am guessing, based on the way FullDuplex is documented, that the intention here is to make sure the TXFIFO is empty before accepting a write, otherwise it should nb::WouldBlock. That would have the effect that if send accepts your write, then as soon as read returns a value (and not nb::WouldBlock) exactly once, you know that the SPI bus has quiesced.

In a driver I maintain, there is a string of blind writes (discarding data from MISO) to the device it owns through the SPI impl, and it tries to pipeline the writes by repeatedly doing the following:

match self.spi.send(word) {
    Ok(()) => {
        let _ = self.spi.read();
        Ok(())
    }
    Err(nb::Error::Other(_)) => Err(nb::Error::Other(())),
    Err(nb::Error::WouldBlock) => Err(nb::Error::WouldBlock),
}

Under the assumption I described above about read's behavior, I was originally synchronizing at the end of this string of blind writes by blocking once on read. However, this sometimes returned before the bus had quiesced. In trying to understand this, I discovered that the TXE bit of the STM32F303's SPI status register, despite its name, does not in fact indicate that the TXFIFO is empty; it indicates it is no more than half full:

[...] write access of a data frame to be transmitted is managed by the TXE event. This event is triggered when the TXFIFO level is less than or equal to half of its capacity. Otherwise TXE is cleared and the TXFIFO is considered as full.

-- STM32F303 reference manual, §30.5.9 (warning: PDF)

So this means that my match block can succeed at calling send multiple times before read ever returns Ok, and so I cannot really know how many values I need to read before the bus is guaranteed to be quiescent.

I was trying to determine how to fix my problem, but I am not sure how to proceed, because the embedded-hal FullDuplex trait does not document whether an impl that queues sends in a FIFO is conforming or not.

  • If queueing is non-conforming, then I suppose it is a bug in stm32f30x-hal: it should really check BSY, or perhaps FTLVL, not TXE. In fact, I noticed that a commented-out manual impl of blocking::spi traits does spin on BSY, so it would have actually behaved differently.
  • If allowing writes to queue is conforming, then FullDuplex needs to provide a way to detect when the SPI bus is quiescent; otherwise it seems to me as though it isn't possible to correctly synchronize with the SPI device, e.g. for managing a CS line. Trying to infer it from the behavior of read is not sufficient: although you can empty the RXFIFO this way, the TXFIFO may still have items in it, which will at some later point cause RXFIFO to become non-empty again, and there is no way to detect this. I "fixed" my driver by having it call read repeatedly until it returns nb::WouldBlock, but for the reasons just mentioned I believe this is still technically incorrect. (Edit: I suppose you can manually keep track of whether read has returned Ok exactly as many times as send returned Ok, but it seems like it would be far more ergonomic to just query the driver.)
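The manual bookkeeping described in the edit above can be sketched in plain Rust. Everything here is invented for illustration (a software `MockSpi` stands in for real hardware; none of these names come from embedded-hal): a wrapper counts accepted sends against successful reads, and the bus is treated as quiescent only when the counter reaches zero.

```rust
use std::collections::VecDeque;

// Hypothetical software model of a FIFO-backed SPI peripheral; just
// enough structure to demonstrate the counting, not a real HAL impl.
struct MockSpi {
    tx: VecDeque<u8>,
    rx: VecDeque<u8>,
}

impl MockSpi {
    fn new() -> Self {
        MockSpi { tx: VecDeque::new(), rx: VecDeque::new() }
    }
    fn send(&mut self, word: u8) -> Result<(), ()> {
        self.tx.push_back(word); // model: TXFIFO always has room
        Ok(())
    }
    // Model the bus clocking one word out and its reply in.
    fn clock(&mut self) {
        if let Some(word) = self.tx.pop_front() {
            self.rx.push_back(word); // loopback reply, for the model only
        }
    }
    fn read(&mut self) -> Result<u8, ()> {
        self.rx.pop_front().ok_or(())
    }
}

// Bookkeeping wrapper: the bus is only known to be quiescent once every
// accepted send() has been matched by a successful read().
struct Counted {
    spi: MockSpi,
    outstanding: usize,
}

impl Counted {
    fn send(&mut self, word: u8) -> Result<(), ()> {
        self.spi.send(word)?;
        self.outstanding += 1;
        Ok(())
    }
    fn read(&mut self) -> Result<u8, ()> {
        let word = self.spi.read()?;
        self.outstanding -= 1;
        Ok(word)
    }
    fn quiescent(&self) -> bool {
        self.outstanding == 0
    }
}
```

This works, but it pushes a responsibility onto every driver that (as argued in this thread) the HAL could expose as a single query.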

I guess my request here is to clarify the contract of FullDuplex with regard to FIFOs or other queueing behavior, so it's obvious what to do about this.

If it matters at all, I'd slightly prefer declaring that a queuing impl is conforming and then providing a way to detect quiescence. While digging around about this problem, I also found japaric/stm32f30x-hal/issues/28, in which it appears the reporter believes, as I originally did, that this impl is not using the FIFOs and requesting that it do so.

@david-sawatzke
Contributor

FYI: This is what #120 tries to tackle (I think)

@agalakhov

Another question: in full-duplex SPI mode, each send() results in reading the same number of bits that have been sent. I believe it makes sense to change the return value of send() to Result<T, Error> (just like that of read()) so that sending also implies reading. This would solve lots of problems.

And, please, please, add half duplex (send only and receive only) traits too.

@david-sawatzke
Contributor

david-sawatzke commented Mar 30, 2019

Returning values unfortunately doesn't work out that well. You can't use it at all with FIFOs, and even without them it's kind of strange and non-deterministic; it only really makes sense in combination with block!:

if Err(Error::WouldBlock) == spi.send(0x40) {
    // Do some stuff and decide something else needs to be sent
    let x = block!(spi.send(0x60));  // What byte gets sent out? Do two bytes get sent out?
}

I've opened a pr for send-only traits as well (#121), but it depends on #120. If you need this, you might want to bring it up to the hal team in the weekly meetings.
EDIT: Clarified example

@agalakhov

agalakhov commented Mar 30, 2019

Then probably a deferred read will do? Let write() return Result<PendingRead>:

let r = spi.write(42).unwrap();
let reply = r.read().unwrap();

or something like this. Anyway, on the hardware level a successful write of one byte guarantees exactly one incoming byte regardless of the FIFO, and this should be somehow reflected in the API.
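The deferred-read idea can be sketched in plain Rust. Everything below is invented for illustration (a software-only model where each write produces exactly one pending incoming word, here via simple loopback); on real hardware the token would instead wait on the RXFIFO.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::rc::Rc;

// Hypothetical types: a model SPI whose write() hands back a one-shot
// token for the single reply word that write is guaranteed to produce.
struct Spi {
    rx: Rc<RefCell<VecDeque<u8>>>,
}

struct PendingRead {
    rx: Rc<RefCell<VecDeque<u8>>>,
}

impl Spi {
    fn new() -> Self {
        Spi { rx: Rc::new(RefCell::new(VecDeque::new())) }
    }
    fn write(&mut self, word: u8) -> Result<PendingRead, ()> {
        // Model: the reply arrives immediately (loopback).
        self.rx.borrow_mut().push_back(word);
        Ok(PendingRead { rx: Rc::clone(&self.rx) })
    }
}

impl PendingRead {
    // Consuming self means each write's reply can be collected at most once.
    fn read(self) -> Result<u8, ()> {
        self.rx.borrow_mut().pop_front().ok_or(())
    }
}
```

Note that even this toy model exposes the ordering question raised in the replies: two tokens read out of order silently swap replies, because each read can only take the oldest word from the FIFO.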

Also, I believe the FullDuplex trait needs some guarantees about error recovery. Many implementations (all STM32 ones?) just return an error if error flags are seen, without resetting those flags, resulting in a permanent failure once the error has been observed. That is, code like if let Err(x) = spi.send(0x40) won't work, as the second send is guaranteed to fail immediately.

I know at least three important cases for send-only traits: driving shift registers, driving WS2812 LEDs, and driving the ST7565 character display. Especially in the WS2812 case, with its hard real-time requirements, calling read() to read back is a pain. It leads to very strange behaviour: the code runs fine alone, but stops working when you just add an extra pause in the main loop, because the pause results in a FIFO overrun.

@david-sawatzke
Contributor

My example code was a bit unclear, I didn't mean an error condition but a blocking "Error", sorry 'bout that, updated it.

The PendingRead idea is interesting, but in my opinion not that great in this case:

  • If you read them out of order, what should happen? Should the old bytes be discarded (since you can only read the oldest bytes)?
    • If you just block, you need two different blocking "Errors" (not yet received or not in front of fifo)
  • What happens if you read an already discarded byte (by the fifo or by the first bullet point)? Just return an Err(Other(Discarded))?
  • How would you handle fifo overruns where new bytes get ignored (for example on the stm32f0), since read overruns can happen easily?

While it's flexible, it only seems realistic to use it when you have hardware knowledge. Otherwise, you can (hopefully) only rely on the most recent byte, for not much additional gain compared to just adding the restrictions proposed by me in #120. Adding device-specific methods to be able to use the rx fifo efficiently seems like the way to go, but out of the scope of embedded-hal.

It also seems to add a lot of complexity to the hal.

I agree on the need for more "standardized" hal errors & recovery, for something universal like crc or similar things, but haven't come up with anything great yet. If you have anything, I'm sure the hal team would be receptive.

For my ws2812 implementation, I'm just doing a non-blocking read and it has worked on everything I tested it on (everything I've seen has larger rx than tx fifos, so even extended pauses shouldn't be an issue in that regard), but I agree it's not great.

@agalakhov

@david-sawatzke The implementation I'm talking about is exactly yours. And it does not work properly when combined with external delays on certain microcontrollers. Even in the simple blink example on an STM32F103 it results in a FIFO overrun. What's happening is, the read() call is done too early and returns WouldBlock, as do most of the subsequent calls, so the balance between read() and send() is violated. Then a FIFO overrun occurs.
(BTW, I believe I found one more issue and I believe it is in ws2812-spi and is protocol-related, investigating it now.)

@edarc
Author

edarc commented Mar 30, 2019

@david-sawatzke You've mentioned a couple of times that you don't think it's possible to have FullDuplex support use of hardware FIFOs. Can you explain why you think that? I read through #120 and still don't understand the reasoning.

It is really discouraging as a driver writer to hear "well if you want to use FIFOs, drop embedded-hal support and write directly against the device RAL". One of the drivers I maintain is a graphical display driver which streams O(thousands) of pixels through SPI that are derived from an iterator, and while yes it works fine without buffering, it gives a nontrivial performance improvement to allow buffering so that Iterator::next() does not need to strictly interleave with FullDuplex::send().

To be clear, I do not think the FullDuplex trait as it is defined today is sufficient to support correct FIFO usage, but that's what I'm wondering: what is the minimum change required to allow someone to use FullDuplex with FIFOs in a well-defined way? I think it's actually not that hard to fix; as far as I can determine the only things missing are:

  • The ability to query whether the SPI master has become idle, meaning "TXFIFO is empty and all bus traffic has finished". This is needed for (among other things) correctly managing CS lines. I put together a non-blocking version of shared-bus supporting FullDuplex. It works fine except that CS line changes are racy because you cannot reliably tell when it is safe to toggle them, and I believe having this primitive from the HAL would fix it.
  • Availability of a way to prevent overruns. A couple options:
    • Add language to the effect that, if you empty the RXFIFO by calling read() until it returns WouldBlock immediately before each send(), you have a guarantee that all accepted send() calls will have space in RXFIFO for the return data (assuming there is no insane hardware around where the capacity of RXFIFO is less than 1 + the capacity of TXFIFO). This enables the user to program to avoid overruns, but does not mandate it. It also does not require FullDuplex impls to do anything extra unless their hardware is insane in the aforementioned way.
    • Enforce in the impls by changing send()'s contract to not permit sends that will overflow the RXFIFO if no read() occurs, e.g. by returning WouldBlock. Said another way, enforce that all entries in TXFIFO have an open slot in RXFIFO ready to accept the return data at the time of the send() call. This is more work for the impl, but an overrun cannot occur due to API misuse. It has a downside of potentially creating the opportunity for block!(spi.send(...)) to block forever, but I think that might be possible already? I guess you can return some other Err to avoid that.
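The second option can be sketched against a software model of the peripheral. Everything here is hypothetical (the `Spi` struct, the FIFO representation, the `Error` enum are invented for the example); the point is only the admission check in `send()`: a word is accepted only if an RXFIFO slot is guaranteed to be free for its reply.

```rust
use std::collections::VecDeque;

#[derive(Debug)]
enum Error {
    WouldBlock,
}

// Hypothetical model of a FIFO-backed SPI master.
struct Spi {
    tx: VecDeque<u8>,
    rx: VecDeque<u8>,
    rx_capacity: usize,
}

impl Spi {
    // Every word queued in TXFIFO plus every unread word in RXFIFO
    // claims an RXFIFO slot; send() must leave room for one more.
    fn outstanding(&self) -> usize {
        self.tx.len() + self.rx.len()
    }

    fn send(&mut self, word: u8) -> Result<(), Error> {
        if self.outstanding() >= self.rx_capacity {
            return Err(Error::WouldBlock); // accepting would risk an overrun
        }
        self.tx.push_back(word);
        Ok(())
    }

    // Model the bus clocking one word out and its reply in.
    fn clock(&mut self) {
        if let Some(word) = self.tx.pop_front() {
            self.rx.push_back(word); // loopback reply, for the model only
        }
    }

    fn read(&mut self) -> Result<u8, Error> {
        self.rx.pop_front().ok_or(Error::WouldBlock)
    }
}
```

Under this rule an overrun cannot happen regardless of the relative TXFIFO/RXFIFO sizes, at the cost of send() sometimes refusing words even though the TXFIFO itself has room.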

@agalakhov

agalakhov commented Mar 30, 2019

@edarc The problem we're facing is that many crates that use SPI via FullDuplex fail to empty the incoming queue properly. While it seems to be doable, the number of errors made here is above average for the Rust world. This indicates that something important is missing.
I totally agree with you about the missing things, and especially the second one (the lack of a way to prevent overruns) is a pain. I personally like such an option:

  • Allow doing send() if a receive overflow occurs. Provide a way of dropping RXFIFO contents and clear its overflow flag (probably together with other errors). Reasoning:
    • How does one impl<T> SendOnly for T where T: FullDuplex? Or will it be the opposite, trait FullDuplex: SpiSend + SpiRead?
    • I generally expect that, if I only send and never read, FullDuplex acts like SendOnly.
    • I expect that most things like "A must always be called after B" are enforced by the Rust type system.
    • Some applications may combine write only access with full duplex or read only if multiple devices are connected to the same SPI bus. Then one may want to send (discarding all input), then reset, then send and receive. Sending and receiving are likely to be done with different drivers, and the sending driver is likely to be send-only. That is, a temporary send-only mode should be available even for full duplex SPI.
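The trait-direction question in the first bullet can be made concrete. In the sketch below all names are invented for illustration (none of these are embedded-hal traits): FullDuplex is defined as the conjunction of the two halves, so every full-duplex type is usable as send-only for free, and send-only consumers simply bound on the send half.

```rust
// Illustrative traits; signatures simplified (no nb, unit error type).
trait SpiSend<W> {
    fn send(&mut self, word: W) -> Result<(), ()>;
}
trait SpiRead<W> {
    fn read(&mut self) -> Result<W, ()>;
}

// "trait FullDuplex: SpiSend + SpiRead" direction: the blanket impl
// makes every type with both halves a FullDuplex automatically.
trait FullDuplex<W>: SpiSend<W> + SpiRead<W> {}
impl<W, T: SpiSend<W> + SpiRead<W>> FullDuplex<W> for T {}

// A send-only consumer only asks for the send half.
fn send_all<S: SpiSend<u8>>(spi: &mut S, data: &[u8]) -> Result<(), ()> {
    for &b in data {
        spi.send(b)?; // incoming MISO data is implicitly discarded
    }
    Ok(())
}

// Minimal one-word-deep loopback mock, just to show the bounds line up.
struct Loopback {
    last: Option<u8>,
}
impl SpiSend<u8> for Loopback {
    fn send(&mut self, word: u8) -> Result<(), ()> {
        self.last = Some(word);
        Ok(())
    }
}
impl SpiRead<u8> for Loopback {
    fn read(&mut self) -> Result<u8, ()> {
        self.last.take().ok_or(())
    }
}
```

The opposite direction (a blanket `impl<T: FullDuplex> SpiSend for T`) would also work, but only one of the two can be adopted, and the version above matches the "if I only send and never read, FullDuplex acts like SendOnly" expectation.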

@david-sawatzke
Contributor

@agalakhov Unfortunately I haven't used the stm32f103 yet, but that does sound bad.
If you find anything I could do (other than hoping something like #120 or an spi send trait being accepted/implemented), tell me please.

@edarc The TxFifo is fine, but the RxFifo isn't in my opinion. You can do the TxFifo potentially transparently, but not the RxFifo.
Unfortunately lots of hardware has RxFifos smaller than TxFifo + 1, many with only 1 byte of send/receive Fifo (see: lots of AVRs, stm32f103, stm32f413, probably loads of other ones).

Both of your options have flaws:

Your first option doesn't work (at least easily), because "insane" hardware like that is pretty widely spread.
It also makes for pretty horrid code (in my opinion).

while Err(WouldBlock) != spi.read() {}
spi.send(0);
while Err(WouldBlock) != spi.read() {}
spi.send(0);

I think there might also be race conditions in there for "normal" hardware, but I'm not that sure.
But it does reflect the current state the closest.

Your second option forces drivers to assume the worst case and we're back to the good ol' block!(spi.send()) block!(spi.read()) dance, which makes Fifos useless for the normal case

@agalakhov
Copy link

@david-sawatzke The only reliable option for now is to do blocking reads sometimes. But this may interfere with the realtime nature of WS2812. For now I just hacked the SPI implementation to ignore FIFO overruns, but this is not an option for long-term use. We really need something like #120 or, better, an API that has no implicit assumptions and does everything using the type system. Or, even better, the write-only SPI mode.

@agalakhov
Copy link

A good example of how things could be done is the command set of FTDI's MPSSE engine. It has the following commands:

  • Clock SCK for N cycles and receive N bits from MISO.
  • Clock SCK for N cycles and send N bits to MOSI. Discard MISO data.
  • Clock SCK for N cycles, send to MOSI and receive from MISO at the same time.
  • Idle clock SCK for N cycles, don't send and don't receive.

The fourth one is not needed unless SPI is abused for tasks other than serial data transfer, so probably nobody needs it in traits. The first three correspond to read-only, write-only, and read-write access. I imagine something like this:

trait ReadOnly<T> {
    type Error;
    fn read(&mut self) -> nb::Result<T, Self::Error>;
}
trait WriteOnly<T> {
    type Error;
    fn send(&mut self, data: T) -> nb::Result<(), Self::Error>;
}
trait FullDuplex<T>: ReadOnly<T> + WriteOnly<T> {
    fn send_read(&mut self, data: T) -> nb::Result<T, <Self as ReadOnly<T>>::Error>;
}

but it is hard to combine with both FIFO and non-blocking at the same time.

@edarc
Author

edarc commented Mar 31, 2019

@david-sawatzke

@edarc The TxFifo is fine, but the RxFifo isn't in my opinion. You can do the TxFifo potentially transparently, but not the RxFifo.
Unfortunately lots of hardware has RxFifos smaller than TxFifo + 1, many with only 1 byte of send/receive Fifo (see: lots of AVRs, stm32f103, stm32f413, probably loads of other ones).

Argh, that is unfortunate. I really should learn to not underestimate how user-hostile hardware can be :-)

Both of your options have flaws:

Your first option doesn't work (at least easily), because "insane" hardware like that is pretty widely spread.
It also makes for pretty horrid code (in my opinion).

while Err(WouldBlock) != spi.read() {}
spi.send(0);
while Err(WouldBlock) != spi.read() {}
spi.send(0);

I think there might also be race conditions in there for "normal" hardware, but I'm not that sure.
But it does reflect the current state the closest.

I agree it's not great; I was just trying to think of an option that doesn't involve breaking changes to the trait. You're probably right that it may race if there is an interrupt between read and send, but I can't think of a concrete example.

Your second option forces drivers to assume the worst case and we're back to the good ol' block!(spi.send()) block!(spi.read()) dance, which makes Fifos useless for the normal case

Sorry, maybe I'm being thick... what assumptions must be made if send guarantees there cannot be overruns? The idea of wanting to use a non-blocking trait with FIFOs is that, in cases where you don't have data dependencies between consecutive read and write calls, you can just do something like

let mut maybe_tx = iter.next();
loop {
  if let Some(tx) = maybe_tx {
    match spi.send(tx) {
      Err(WouldBlock) => {
        let _ = spi.read().map(|rx| my_buffer.push(rx)); // or whatever
      }
      Err(Other(_e)) => { /* propagate the error */ }
      Ok(()) => { maybe_tx = iter.next(); } // word accepted, fetch the next one
    }
  } else {
    break;
  }
}

If RXFIFO doesn't have space to accept the MOSI word for this send after accounting for what's already in TXFIFO, send does not accept the write until you make room by having read succeed. As far as I can determine this works regardless of the relative sizes of TXFIFO and RXFIFO, and if you have usable FIFOs then that code will utilize them safely. But, if you're seeing a problem I'm not, I'm happy to be corrected.


@agalakhov

I think splitting the trait does make sense, I was just trying to avoid it because it's a breaking change, but maybe the current API is not salvageable without breaking it or making many implementations non-conforming (which #120 does AFAICT). I maintain both a write-only driver and a read-write driver, and it would be useful in the former case to have the HAL worry about throwing away the MISO data for me so I don't have to deal with it. But I'd definitely want that to be opt-in behavior; having FullDuplex discard data silently if I screw up send vs read in my read-write driver would be really unfortunate, worse than the current situation IMO, because incorrect code that does care about MISO will go from crashing to causing silent corruption.

You'd need to think carefully about how devices switch from one of these modes to the other if the API remains non-blocking, otherwise it becomes extremely hard or impossible to do bus sharing correctly.

My only other observation would be that ReadOnly seems maybe not needed, and probably unusable in some cases where you'd otherwise be tempted to use it. Some SPI devices constrain what gets clocked in on MOSI when you're "just reading": with the MAX7301, for example, you have to send a specific "no-op" word so that it knows you are not pipelining a second command at the same time as you shift out the read data from the first command. I don't think it's too onerous to ask the user to supply the word that should be sent when they just want to read; in cases where it doesn't matter they can just supply an arbitrary constant.
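The "caller supplies the filler word" idea is cheap to express. A minimal sketch, with invented trait and function names (not embedded-hal), where "just reading" is a transfer of a caller-chosen word rather than an implicit 0x00:

```rust
// Illustrative trait: one full-duplex word transfer.
trait TransferWord {
    fn transfer(&mut self, word: u8) -> Result<u8, ()>;
}

// "Just reading" still clocks out a caller-chosen word, because some
// devices require a specific no-op on MOSI while data shifts out.
fn read_with<S: TransferWord>(spi: &mut S, noop: u8) -> Result<u8, ()> {
    spi.transfer(noop)
}

// Loopback mock for demonstration: echoes MOSI back on MISO.
struct Echo;
impl TransferWord for Echo {
    fn transfer(&mut self, word: u8) -> Result<u8, ()> {
        Ok(word)
    }
}
```

The `noop` value here is whatever the target device requires; 0x20 below is an arbitrary placeholder, not a claim about any real device's command set.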

@agalakhov

@edarc My personal opinion is that, taking into account the number of errors I have seen, the current API is not salvageable. I also believe that a breaking change here is even desirable, in order to force both HAL implementors and HAL users to fix the behaviour. It looks like virtually nobody (including me) does this correctly. I'd rather deprecate the existing trait and introduce new ones.

My only other observation would be that ReadOnly seems maybe not needed and probably unusable in some cases where you'd otherwise be tempted to use it.

It's about wiring, not about the peripheral. In some cases there is no MOSI at all, and the corresponding pin is configured as GPIO. The precise meaning of the traits should probably be "Has MOSI", "Has MISO", and "Has both".

@edarc
Author

edarc commented Mar 31, 2019

@edarc My personal opinion is that, taking into account the number of errors I have seen, the current API is not salvageable. I also believe that a breaking change here is even desirable, in order to force both HAL implementors and HAL users to fix the behaviour. It looks like virtually nobody (including me) does this correctly. I'd rather deprecate the existing trait and introduce new ones.

Agree that it is extremely difficult to use the current trait correctly in any arrangement that couldn't just be served by the current blocking Write or Transfer traits.

I'm not strongly opinionated on how it gets fixed, other than I'd obviously be sad if we ruled out FIFO support :-) ... I'm even fine if FIFO access is exposed through separate traits so that users who don't want to worry about ordering and overrun issues can just use the non-FIFO traits.

My only other observation would be that ReadOnly seems maybe not needed and probably unusable in some cases where you'd otherwise be tempted to use it.

It's about wiring, not about the peripheral. In some cases there is no MOSI at all, and the corresponding pin is configured as GPIO. The precise meaning of the traits should probably be "Has MOSI", "Has MISO", and "Has both".

Ah, yes that makes sense.

@agalakhov

other than I'd obviously be sad if we ruled out FIFO support

Strictly speaking, there is no SPI without a FIFO. At least one byte is always buffered. "No FIFO" just means a FIFO of size 1.

@eldruin
Member

eldruin commented Sep 16, 2020

Can this be closed since #120 has been merged?

@Dirbaio
Member

Dirbaio commented Apr 28, 2023

3 years later...

yes.

@Dirbaio Dirbaio closed this as completed Apr 28, 2023