Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: Seekable Buffer ("SeekBuf") and cursor/iterator support #657

Open
dylanplecki opened this issue Jan 19, 2024 · 2 comments · May be fixed by #658
Open

Feature Request: Seekable Buffer ("SeekBuf") and cursor/iterator support #657

dylanplecki opened this issue Jan 19, 2024 · 2 comments · May be fixed by #658

Comments

@dylanplecki
Copy link

dylanplecki commented Jan 19, 2024

(note that this is a somewhat large feature request with significant functionality additions)

While using the bytes crate, I noticed a use case that is not currently covered under the existing Buf type. Specifically, when returning non-contiguous chunks of memory, there is no ability to "look ahead" of the current chunk without also advancing the buffer and losing access to the current chunk. This is somewhat possible using the chunks_vectored method on Buf, but only in std environments and without a guarantee that it's implemented by the buffer (i.e. the default impl only allows partial access to the buffer).

In a network programming sense, it may be useful to scan further than the first chunk without consuming the original chunk of data, i.e. when parsing an HTTP header across a ring buffer (like VecDeque).

One way to implement this using the currently available interfaces is to have a Buf type that also implements Clone. The user could then simply advance a new, cloned buffer without advancing the original. But with this, the user is unaware whether Clone would clone the underlying buffer (an expensive operation) or just a cheap reference to a buffer. It is also not clear whether mutations to the new clone would alter the original (i.e. via an Arc). Buf does not provide any way to distinguish between an owned memory region and a reference to a memory region, so there is no way for an existing Buf to be iterated past the first chunk without possible mutations or copies for all existing implementations.

Any change to Buf to make it able to seek to any arbitrary point in the buffer would likely be backward-incompatible, since current implementations of Buf only provide access to the first chunk. So I propose adding a new parent type, SeekBuf, that also implements Buf but adds two new additional methods: chunk_from to get a chunk beginning at some offset in the buffer, and chunk_to to get a chunk ending at some offset in the buffer. Any type that implements Buf can also implement SeekBuf if it supports these features.

When a buffer type implements SeekBuf, we can also use a new BufCursor type to iterate and seek within the buffer without consuming its contents. This new cursor type can implement Iterator, Buf, and SeekBuf to provide full immutable access to the underlying buffer memory, useful for parsing, lookahead, search operations, etc. It can also be used to provide highly-tuned operations across any memory buffer, such as selective SIMD operations that provide search-like behavior (i.e. in HTTP parsing).

SeekBuf will be strictly a superset of the functionality of Buf and will not be automatically implemented for any type which already implements Buf, thus requiring opt-in support from consumers of this crate.

I have already prepared a PR for this feature and will attach it to this issue.

@dylanplecki dylanplecki linked a pull request Jan 19, 2024 that will close this issue
@atagen
Copy link

atagen commented Jan 24, 2024

Great PR, this is very much something I was looking for in bytes. If for whatever reason this isn't accepted, I would suggest it's worthwhile packaging it as a third-party "add-on" crate.

Regarding using BufCursor for parsing and lookahead - would it be possible to implement Clone for its iterator state?

This would allow passing cheap clones of a BufCursors to sub-functions in a main parsing loop, allowing us to eagerly peek at bytes to return a successful parse or continue on to the next parsing attempt, without disturbing the main cursor's position.

I note this could be achieved at present by checking cursor_position and re-slicing the main buffer into a new cursor each time, but in terms of elegance and transparency I feel like simply cloning the BufCursor would be far superior.

Edit: Okay, I just realised this is what subcursors are for, but it's still kind of suboptimal to deal with n levels of nested BufCursor<BufCursor<...>.

@dylanplecki
Copy link
Author

dylanplecki commented Jan 26, 2024

That's a great idea! And a very simple change (already added to the PR), thank you, it was definitely an oversight to not include a Clone impl for BufCursor.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants