Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support "ack/nack" packets for unreliable transports #137

Open
daniel5151 opened this issue Apr 13, 2023 · 6 comments
Open

Support "ack/nack" packets for unreliable transports #137

daniel5151 opened this issue Apr 13, 2023 · 6 comments
Labels
API-non-breaking Non-breaking API change design-required Getting this right will require some thought help wanted Extra attention is needed new-api Add a new feature to the API (possibly non-breaking)

Comments

@daniel5151
Copy link
Owner

While most users of gdbstub are likely debugging over transport mechanisms that are inherently reliable (i.e: pipes, TCP, emulated serial, etc...), it would be good to nice if gdbstub also properly supported use-cases where the transport mechanism is not reliable (e.g: a noisy physical serial line).

See docs at https://sourceware.org/gdb/onlinedocs/gdb/Packet-Acknowledgment.html#Packet-Acknowledgment

gdbstub today has pretty minimal support for running in ack-mode, essentially assuming that all packets will send/recv successfully.

As such, gdbstub will currently fail if:

  1. The host sends a "nack" packet (i.e: requesting re-transmission of the last packet)
    • gdbstub will simply error-out with a "nack not supported" error
  2. gdbstub receives a malformed packet (e.g: mismatched checksum, invalid packet format, etc...)
    • instead of sending a "nack" packet to the host and waiting for a response, gdbstub errors-out immediately

Possible Solutions

Solving 2 will likely require catching certain errors as they propagate up the call stack, and instead of bailing out, squelching them and sending a "nack" packet.

Solving 1 will be a bit tricker, as gdbstub currently require that outgoing packets get buffered anywhere.

One possible approach: augment the Connection trait with a fn retransmit(&mut self) -> Result<bool, Self::Error> method that downstream implementations can optionally implement? i.e: return true if supported, false if not supported (which would be the default impl).

After all - we guarantee that the flush() method is called after each packet, so it should be pretty trivial for a Connection to maintain a buffer of its last-sent packet.

I suspect that this can be done in a backwards-compatible way, without requiring a breaking API change, but I'm not 100% sure.

And of course - any solution should ideally minimize binary size impact when running over a reliable transport.

@daniel5151 daniel5151 added help wanted Extra attention is needed new-api Add a new feature to the API (possibly non-breaking) API-non-breaking Non-breaking API change design-required Getting this right will require some thought labels Apr 13, 2023
@daniel5151 daniel5151 changed the title Properly handle "ack/nack" ('+'/'-') packets over unreliable transports Support "ack/nack" ('+'/'-') packets over unreliable transports Apr 13, 2023
@daniel5151 daniel5151 changed the title Support "ack/nack" ('+'/'-') packets over unreliable transports Support "ack/nack" packets for unreliable transports Apr 13, 2023
@daniel5151
Copy link
Owner Author

@xobs this may or may not be relevant to betrusted-io/xous-core#361

You might want to double-check what errors the gdbstub is reporting when it dies. If it's related to packet parsing / or nack packets, then it might be worth looking into fixing this.

On the other hand, I would've expected any reasonably modern serial hardware to be reasonably error free, so unless you're working with some particularly shoddy hardware, it may very well not have anything to do with this.

@xobs
Copy link
Contributor

xobs commented Apr 13, 2023

It's difficult to do that because the device has only one serial port so when debugging there is no other way to get errors out. The best I could do is to panic, or maybe buffer the messages and send them later. Unless there is a way to send strings to gdb that doesn't involve being a callback in a Monitor.

I think what's going on in this particular case is that the FIFO is getting filled because gdb is blasting it with data. The solutions in this case are:

  1. Increase the FIFO so that we can weather the onslaught
  2. Use ACK mode to prevent deep buffering
  3. Rework the debugger so that it doesn't run in an ISR but runs in a bottom-half. This is more complicated and involves spawning a second thread in the kernel -- doable, but untested and potentially difficult

So the issue isn't that errors are appearing, the issue is that GDB is sending data too fast and characters are getting dropped, and this serial link doesn't have CTS/RTS signals wired up. Perhaps XON/XOFF could work, if GDB's remote serial protocol supports that.

@daniel5151
Copy link
Owner Author

I would definitely suggest setting up some kind of system to stash errors / panics across resets + report them when you get the chance (akin to something like panic_persist), and not just for debugging gdbstub issues (seems like a useful thing to have in general 😅)

the issue is that GDB is sending data too fast and characters are getting dropped

Hmmmm. On one hand, this does sound like something that you might potentially be able to solve on your end... but on the other hand, if characters are getting intermittently dropped (for whatever reason - be it a noisy line, or a line that's occasionally sending data too quickly), it might make sense to just work around that by investing in proper ack/nack infrastructure?

As I allude to in the top-level comment, hacking together a slap-dash fork of gdbstub with a POC nack/retransmission implementation might not be too hard. I would (selfishly) suggest taking a crack at that (if you're so inclined), and if it fixes the stability issues, I can absolutely help to polish it up / find some time to write a polished implementation myself.

@xobs
Copy link
Contributor

xobs commented May 13, 2023

Good news on this front -- part of the problem was due to the fact that we were rewriting the process pagetables as part of every access, AND every access was bytewise. By grouping accesses into 16- and 32-bit operations, and by simply setting a CPU register bit rather than flusing the pagetables, we've improved things to the point where it's more reliable. I'm still waiting to hear back on whether No-Ack is still required, but for the most part this seems to be much improved.

I'll leave this issue here anyway in order to track No Ack mode, but I thought you'd like to hear about this success.

betrusted-io/xous-core#380

@xobs
Copy link
Contributor

xobs commented May 13, 2023

Actually, it seems as though this is working for our purposes even WITH no-ack mode.

And with the latest patch making no-ack optional, I think this issue can be closed now.

@daniel5151
Copy link
Owner Author

daniel5151 commented May 13, 2023

Hey, that's great to hear! Glad you were able to sort out your reliability issues!

I'm going to leave this issue open, as ack mode is technically a part of the GDB RSP that gdbstub should have support for (eventually).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API-non-breaking Non-breaking API change design-required Getting this right will require some thought help wanted Extra attention is needed new-api Add a new feature to the API (possibly non-breaking)
Projects
None yet
Development

No branches or pull requests

2 participants