
Internal refactor to enable using 100% of PacketBuf as response buffer #70

Open · daniel5151 opened this issue Sep 8, 2021 · 15 comments
Labels: API-non-breaking (Non-breaking API change), design-required (Getting this right will require some thought)

@daniel5151 (Owner)

As discussed in #69, we'll gradually want to move various callback-based APIs over to "here is a &mut [u8] buffer, please fill it with data" APIs.

Instead of allocating a whole new outgoing packet buffer for this, it'd be nice to reuse the existing PacketBuf. This could work, since packets that request data can almost always be parsed into fixed-size, lifetime-free structs, which would leave the packet buffer available to be used as scratch space.

Unfortunately, the current implementation of the packet parser includes a "late decode" step, whereby target-dependent fields (such as memory addresses / breakpoint kinds / etc...) are parsed into &[u8] bufs in the packet parsing code, and are only converted into their concrete types later on, in the handler code (where the type of Target is known). This is an important property to maintain, as eventually, we'd want to support debugging multiple Target types at the same time (e.g: on macOS, a single gdbserver can debug both x86 and ARM code), and the only way to do this would be by having the packet parsing code be Target agnostic.
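
To illustrate the general shape of this "late decode" pattern (using hypothetical names, not the actual gdbstub types): the parser keeps target-dependent fields as raw byte slices borrowed from the packet buffer, and only the handler, which knows the concrete Target, finishes decoding them.

```rust
// Hypothetical sketch of the "late decode" pattern (not the actual gdbstub types).
//
// At parse time, target-dependent fields stay as raw ASCII-hex slices
// borrowed from the packet buffer:
struct ReadMem<'a> {
    addr: &'a [u8], // e.g. b"80000000"
    len: &'a [u8],  // e.g. b"40"
}

// Only the handler (which knows the concrete `Target` / arch) decodes them
// into their final types. Note that while `ReadMem` is alive it borrows the
// packet buffer, so that same buffer can't also be handed out as a
// `&mut [u8]` response buffer.
fn decode_hex_u64(raw: &[u8]) -> Option<u64> {
    core::str::from_utf8(raw)
        .ok()
        .and_then(|s| u64::from_str_radix(s, 16).ok())
}
```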

Instead, there should be some way of obtaining a reference to the entire, raw, underlying &mut [u8] PacketBuf after the late decode step has been completed, but this is harder than it seems. Getting the lifetimes to line up here will probably be tricky, and I suspect getting this working will require some real code-contortion, and possibly even a sprinkle of unsafe.


In the meantime, we'll be going with the approach used by the m packet, whereby the packet parsing code will "stash" a &mut [u8] pointing to the trailing unused bit of the buffer as part of the parsed struct.
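
Roughly speaking, that "stash" approach looks something like this (hypothetical names and sizes; the real m packet code differs in its details):

```rust
// Hypothetical sketch of stashing the trailing unused bytes of the PacketBuf.
struct ReadMemory<'a> {
    addr: u64,
    len: usize,
    // whatever is left of the PacketBuf after the packet text itself -
    // handed to the handler to fill with response data
    buf: &'a mut [u8],
}

fn parse_m_packet(packet_buf: &mut [u8]) -> Option<ReadMemory<'_>> {
    // ... parse "m<addr>,<len>" from the front of `packet_buf` ...
    let packet_text_len = 16; // pretend the packet text occupied 16 bytes
    let (_packet_text, rest) = packet_buf.split_at_mut(packet_text_len);
    Some(ReadMemory { addr: 0x8000_0000, len: 64, buf: rest })
}
```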

This works, but is a bit wasteful (as not 100% of the packet buffer is being utilized), and also a bit annoying to implement on a per-packet basis. Nonetheless, the GDB RSP allows targets to return less than the requested amount of data without there necessarily being an error, specifically because certain implementations might use differently sized buffers for incoming / outgoing data.

Given that this "workaround" works pretty well, and that losing ~30 bytes of a ~4096 byte PacketBuf isn't particularly noticeable, getting to 100% efficiency isn't a super high priority, but it's still something to think about, and potentially implement at some point.

daniel5151 added the API-non-breaking label on Sep 8, 2021
@bet4it (Contributor) commented Sep 10, 2021

Could we store a Range into the PacketBuf in ParseCommand, rather than a &[u8]?
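
For illustration, something along these lines (the field and type names are made up):

```rust
use core::ops::Range;

// Instead of borrowing the PacketBuf directly...
struct ReadLinkBorrowed<'a> {
    filename: &'a [u8],
}

// ...store index ranges into it, so the parsed command holds no borrow:
struct ReadLinkRanges {
    filename: Range<usize>,
}

fn handler(packet_buf: &mut [u8], cmd: ReadLinkRanges) {
    // the range is resolved against the buffer only at the point of use,
    // which (in principle) leaves `packet_buf` free to be reborrowed later
    let _filename = &packet_buf[cmd.filename.clone()];
}
```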

@daniel5151 (Owner, Author)

Some relevant discussion regarding this issue: #72 (comment)

@bet4it (Contributor) commented Sep 22, 2021

Summary of discussion in #72 (comment):

Because of the existence of vFile:readlink, we must use the PacketBuf to hold the input data, so it's impossible to use 100% of the PacketBuf as the response buffer.
There are three ways to solve this:

  1. Accept an optional PacketBufOut parameter; the handler code would check for its presence when deciding which buffer to pass to the target implementation.
    Drawback: if PacketBufOut is not provided, only part of the PacketBuf (smaller than PacketSize) can be used as the response buffer. In some situations this leads to unreasonable behavior, such as not having enough room to store the result of readlink.
  2. Split the user-provided PacketBuf in two, and set PacketSize to half the length of the PacketBuf.
    Drawback: because most packets don't need a huge PacketIn and PacketOut buffer at the same time, this results in a bunch of memory that isn't used all that often.
  3. Make the gdbstub constructor take a const-generic value for the PacketBuf length; we can then allocate a stack buffer of this size to hold the input data when needed.
    Drawback: we may not be able to allocate such a big stack buffer in some constrained environments.

I prefer the third way. Environments that can't allocate a big stack buffer are really rare, and it's very likely that the Host I/O extensions (or other extensions that need the stack buffer) aren't implemented in such environments; alternatively, the length of the PacketBuf could be set to a smaller value that can be allocated on the stack.
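
A rough sketch of what the third way might look like (none of these names exist in gdbstub today; this is just to illustrate the shape):

```rust
// Hypothetical sketch of option 3: the PacketBuf length is a const generic,
// so an equally sized scratch buffer can be stack-allocated on demand.
struct GdbStub<const PACKET_BUF_LEN: usize> {
    packet_buf: [u8; PACKET_BUF_LEN],
}

impl<const PACKET_BUF_LEN: usize> GdbStub<PACKET_BUF_LEN> {
    fn handle_readlink(&mut self) {
        // stack-allocate a temporary copy of the incoming data (e.g. the
        // filename), freeing up the entire `packet_buf` for the response
        let mut scratch = [0u8; PACKET_BUF_LEN];
        scratch.copy_from_slice(&self.packet_buf);
        // ... parse the filename out of `scratch`, then hand
        //     `&mut self.packet_buf` to the target as the response buffer ...
    }
}
```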

@daniel5151 (Owner, Author)

This is a great summary of our discussion - thank you!

Personally, I prefer the first way. It gives the end user the ultimate flexibility wrt. where they'd like to allocate this additional response buffer, and exactly how large it should be. Much of the "complexity" of this approach will be hidden within gdbstub itself, whereby the user-facing API will be a single additional GdbStubBuilder method.
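
To make that concrete, a minimal sketch of what such a builder method could look like (the method and field names here are placeholders, not actual gdbstub API):

```rust
// Hypothetical sketch of an opt-in output buffer on the builder
// (`with_output_buffer` is a placeholder name, not real gdbstub API).
pub struct GdbStubBuilder<'a> {
    packet_buf: &'a mut [u8],
    output_buf: Option<&'a mut [u8]>,
}

impl<'a> GdbStubBuilder<'a> {
    pub fn with_output_buffer(mut self, buf: &'a mut [u8]) -> Self {
        self.output_buf = Some(buf);
        self
    }
}

// Internally, the handler code would check `output_buf`: if it's `Some`, the
// full `packet_buf` remains usable for incoming data and the response goes
// into the separate buffer; if it's `None`, fall back to the current
// "stash the trailing bytes" behavior.
```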

That said, as I've stated earlier: I believe the current readlink implementation - even with its edge-case limitations - should work just fine in almost all use-cases. As such, I don't think this is a task we should prioritize at the moment; we can leave things as-is until someone expresses a need for a more robust solution.

daniel5151 added the design-required label on Sep 25, 2021
@bet4it (Contributor) commented Sep 27, 2021

I think the first way is quite meaningless. We should keep in mind why we want to use the whole PacketBuf as the response buffer in the first place: the spec tells us that the size of a response won't exceed PacketSize, which is the size of the PacketBuf. So if we can use the whole PacketBuf, we can be sure that our implementation is perfect: it can handle all the packets that it should be able to handle.

If we use the first way, our implementation is only perfect when the user provides a PacketBufOut that is no smaller than PacketSize. That's really weird: how could we tell the user when to use it? "The current implementation may not work in some situations, unless you provide a big enough PacketBufOut"? Or "If your code uses vFile:readlink, you need to provide a PacketBufOut of PacketSize"?

But if we choose the second or the third way, we can ensure our implementation is always perfect.

> That said, as I've stated earlier: I believe the current readlink implementation - even with its edge-case limitations - should work just fine in almost all use-cases. As such, I don't think this is a task we should prioritize at the moment; we can leave things as-is until someone expresses a need for a more robust solution.

I really agree with that! Situations where readlink is used should be really rare, and in most cases the data read shouldn't be very long. If you can bear the imperfection, we can leave it as it is.

@bet4it (Contributor) commented Sep 27, 2021

We may need to do something like what the bytes crate does:

```rust
use bytes::Bytes;

let mut mem = Bytes::from("Hello world");
let a = mem.slice(0..5);

assert_eq!(a, "Hello");

let b = mem.split_to(6);

assert_eq!(mem, "world");
assert_eq!(b, "Hello ");
```

(But bytes requires alloc.)

@daniel5151 (Owner, Author)

I'm not entirely sure what you're suggesting with that bytes example, but as you've pointed out, bytes requires alloc, which is a non-starter for core gdbstub code. Any solution will have to rely on built-in slice methods and/or custom no_std utility code.

@bet4it (Contributor) commented Sep 28, 2021

> Bytes keeps both a pointer to the shared state containing the full memory slice and a pointer to the start of the region visible by the handle. Bytes also tracks the length of its view into the memory.

Isn't this the approach you used for ReadLinkPacket in #72 (comment)?

And Bytes can track different parts of a buffer at the same time, which is what we need when parsing packets (for example, addr and kind when parsing a breakpoint packet).


I have a new idea: what if we only guarantee the code is perfect (as described above) when alloc is enabled? In most situations alloc should be enabled, so we can allocate a temporary buffer to store the filename. Limited environments that don't enable alloc most probably don't need the Host I/O extension anyway.

@daniel5151 (Owner, Author)

Ahh, yes, sure, that's certainly a valid way of doing it (i.e: instead of storing a slice reference directly, store the offsets into buf corresponding to the field data).

FWIW, that approach would enable using 100% of the PacketBuf for any fields of target-dependent size (e.g: addr, offset, kind, etc...). Those can be parsed into the correct type within gdbstub prior to calling the handler API, thereby allowing the entire buffer to be used in the handler as well.

The real complexity is how to handle packets like ReadLinkPacket, where the handler API needs access to variable-length incoming packet data (e.g: the filename field). In that case, you either have to do the aforementioned ownership juggling I sketched out in the linked comment above, or use an entirely separate buffer.

@bet4it (Contributor) commented Oct 2, 2021

We may need to figure out whether readlink is the only command that uses variable-length incoming packet data and also needs a buffer to prepare the response. If it's the only command with this problem, we can handle it in an easy way (use alloc when it's enabled, as I mentioned above, or just ignore it), and I still hope we can use the entire PacketBuf for all the other commands.
(Then we wouldn't need this loop:)

```rust
while n != 0 {
    let chunk_size = n.min(buf.len());

    use num_traits::NumCast;

    let addr = addr + NumCast::from(i).ok_or(Error::TargetMismatch)?;
    let data = &mut buf[..chunk_size];
    match target.base_ops() {
        BaseOps::SingleThread(ops) => ops.read_addrs(addr, data),
        BaseOps::MultiThread(ops) => {
            ops.read_addrs(addr, data, self.current_mem_tid)
        }
    }
    .handle_error()?;

    n -= chunk_size;
    i += chunk_size;

    res.write_hex_buf(data)?;
}
```


If what bytes does is what we need, maybe we could suggest that upstream not use alloc when the buffer is provided by the user?

@daniel5151 (Owner, Author)

So, to reiterate - it is entirely possible to rewrite the current m packet implementation (and many others) to use 100% of the packet buffer, using the "store a start..end of the addr/len instead of a &'a [u8]" approach I described above, without requiring any surface-level API changes.

I agree - now that we have the extra protocol constraint of "responses must fit in the packet buffer", we can absolutely remove that gnarly NumCast loop, and swap it out with a significantly simpler implementation. If you'd like to take a crack at that, feel free. Otherwise, it's something I'll get around to... eventually.
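
For reference, the simplified version might look something like this (a sketch that reuses the identifiers from the loop quoted above; untested, and the surrounding handler code is elided):

```rust
// With "the response must fit in the packet buffer" as an invariant, a single
// bounded read suffices (sketch only - error handling / encoding details elided).
let data = &mut buf[..n.min(buf.len())];
match target.base_ops() {
    BaseOps::SingleThread(ops) => ops.read_addrs(addr, data),
    BaseOps::MultiThread(ops) => ops.read_addrs(addr, data, self.current_mem_tid),
}
.handle_error()?;
res.write_hex_buf(data)?;
```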


Could you rephrase your comment regarding bytes?


Also, I should probably clarify something important about alloc: even with the alloc feature enabled, I would not want to introduce any "hidden" allocations inside gdbstub. i.e: any buffers gdbstub uses should be opt-in by the user, either by them providing the buffer directly, or by setting some kind of option to enable some kind of one-time internal heap allocation.

The rationale for this is to support the kind of use cases like gz's kernel debugger - a use-case where alloc is available + enabled, but gdbstub is still (sometimes) driven via interrupt handlers. Heap allocation from an interrupt handler can lead to Bad Things:tm: happening, and should usually be avoided.

Right now, the only time gdbstub does any kind of internal hidden heap allocation is when the trace-pkt feature is enabled, which is strictly for debugging purposes.

@bet4it (Contributor) commented Oct 3, 2021

I know we can solve it with ReadLinkPacket; I just think it's a bit ugly, so I want to find a more elegant way to do it. It would be better if there were already a crate that does the same thing, so we could use it directly.

I think the bytes crate does what we want to do: track the whole buffer and different parts of the buffer at the same time. But we can't use it, because it allocates memory by itself; we can't provide a buffer to it.

@daniel5151 (Owner, Author)

Again - and I really cannot stress this enough: using bytes or any similar alloc-dependent crates is not an option.

Period.

gdbstub is a no_std + no_alloc library. Every single packet must support no_std + no_alloc.

No exceptions.

With ReadLinkPacket, it really seems that there are only two approaches:

  1. Rewrite the read_link API as something like Pass PacketBuf as an argument of API #72 (comment), giving the implementation a chance to read+copy the filename into its own local buffer, before proceeding to write the result into the (now free to use) PacketBuf
  2. Add an optional output_buffer argument to the GdbStubBuilder, which gives users an option to provide a separate output buffer, enabling "transparent" use of 100% of the packet buffer without requiring any API contortions.

FWIW, I actually lean towards the first option, as while it results in a slightly less "elegant" API, it doesn't require the end user to allocate a buffer that will be left unused almost all of the time.
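
To sketch what that first option might look like from the implementation's side (the signature is purely illustrative, not the real Host I/O API):

```rust
// Illustrative shape of option 1 (not the real gdbstub Host I/O API): the
// handler receives the whole PacketBuf, with the filename occupying its first
// `filename_len` bytes. It must copy the filename out before reusing the
// buffer for the response.
trait HostIoReadlink {
    fn readlink(&mut self, buf: &mut [u8], filename_len: usize) -> Result<usize, ()>;
}

struct Emu;
impl HostIoReadlink for Emu {
    fn readlink(&mut self, buf: &mut [u8], filename_len: usize) -> Result<usize, ()> {
        // copy the filename into a local buffer (stack, heap, or a static -
        // the implementation gets to choose); this sketch assumes it fits in
        // 256 bytes
        let mut filename = [0u8; 256];
        filename[..filename_len].copy_from_slice(&buf[..filename_len]);

        // ... resolve the link based on `&filename[..filename_len]` ...
        let result = b"/resolved/target/path";

        // now the *entire* `buf` is free to hold the response
        buf[..result.len()].copy_from_slice(result);
        Ok(result.len())
    }
}
```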


Also, some more thoughts on gdbstub's approach to alloc:

If gdbstub was an alloc library, the entire API and internal structure of the library would look completely different.

It would probably look a lot closer to https://github.com/luser/rust-gdb-remote-protocol, with plenty of Vecs and Strings being used as part of the API, and the internal implementation of the library would look a lot "cleaner" to the typical Rust programmer, as it would be devoid of any lifetimes and complex ownership juggling.

But gdbstub is not an alloc library. The fact that it is no_std + no_alloc means that implementing code in gdbstub is harder and more annoying - that's just the name of the game when working with no_std code!

The constraints of the platforms that no_std code is expected to run on result in library code that doesn't look like the "typical" Rust code you'd see on hosted platforms. Instead of heap allocation + .clone(), you'll see delicate lifetime annotations + zero-copy references to buffers. Instead of allocating heap memory when a buffer is needed, you'll see that the buffer has to be provided by the user, so as not to "force" the implementation to have some kind of inherent memory requirement.

I honestly feel like I should put this explanation somewhere more visible, like a top-level CONTRIBUTING.md, since it really cuts to the core of gdbstub's implementation philosophy.

To really summarize the point: gdbstub is no_std first, and will only ever use a light sprinkle of alloc as an entirely optional way to make life easier on hosted platforms.

@bet4it (Contributor) commented Oct 4, 2021

> using bytes or any similar alloc-dependent crates is not an option

I know this, which is why I said that if what bytes does is what we need, we could suggest that upstream not use alloc when the buffer is provided by the user; then bytes could be no_std + no_alloc.


> giving the implementation a chance to read+copy the filename into its own local buffer

> I actually lean towards the first option, as while it results in a slightly less "elegant" API, it doesn't require the end user to allocate a buffer that will be left unused almost all of the time.

Didn't you say it's impossible for us to allocate a local buffer of unknown size?

@daniel5151 (Owner, Author)

Ah, apologies - I didn't catch the part about wanting to upstream these changes...

I don't believe that bytes would want to upstream that kind of change, as it doesn't really fit with what that crate is trying to do. Feel free to open an issue upstream to ask, but I would be very surprised if that's a change they'd be willing to make / implement.

For gdbstub, we'd want to roll our own abstraction.


Yes - "us", as in those of us who write the code in gdbstub, cannot allocate any buffers on the user's behalf. There is no reason why the user cannot allocate however they like from within the readlink handler itself - that code is out of our control.

If they want to allocate some buffer on the heap, sure. If they want to stack allocate it, also cool. If they want to write data into some long-lived global buffer, that's also fine. The important thing is we don't make the decision for them.
