Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider switching Bytes to a "trait object", supporting custom memory management strategies. #294

Closed
carllerche opened this issue Sep 9, 2019 · 4 comments · Fixed by #298
Milestone

Comments

@carllerche
Copy link
Member

carllerche commented Sep 9, 2019

Currently, Bytes uses its own internal implementation for memory management. It supports a few strategies:

  • Vec<u8>
  • Arc<Vec<u8>>
  • &'static [u8]
  • an inlined strategy.

However, using a fixed strategy prohibits alternate strategies that would be more appropriate to the use case in question. One example is initializing a Bytes instance backed by a file using mmap.

Proposal

The Bytes struct would be updated to:

struct Bytes {
    raw: RawBytes,
}

struct RawBytes {
    ptr: *const u8,
    len: usize,
    data: NonNull<()>,
    vtable: &'static Vtable,
}

struct Vtable {
    clone: unsafe fn(ptr: *const u8, len: usize, data: NonNull<()>) -> Bytes;
    drop: unsafe fn(data: NonNull<()>);
}

An additional Bytes::from_raw constructor would be added. The rest of the Bytes API can be implemented using the above APIs.

Provided strategies

By default, the bytes crate should provide some strategies for using Bytes. A default of Arc<Vec<u8>> seems sufficient. Feature flags can be used to opt-out of the default implementation.

BytesMut

The BytesMut structure is no longer a core part of the bytes API. It is unclear if it should stay in the bytes crate as an option or move out to a separate crate.

Inline representation

This proposal would remove the ability to initialize a Bytes instance using the "inline" representation. It is unclear how big of an impact this would have. It is also unclear if use cases that take advantage of "inline" representation would be able to use an alternate strategy enabled with this change.

Refs #269

@carllerche carllerche added this to the v0.5 milestone Sep 9, 2019
@cbeck88
Copy link
Contributor

cbeck88 commented Sep 9, 2019

how big of an impact this would have

I expect that in a lot of cases, llvm will be able to inline through these function pointers, as long as it can prove to itself that vtable only ever has one value in a given translation unit. Depending on how the bytes crate works it might or might not be able to prove that to itself -- if Bytes exposes multiple constructors for this trait object, and your lib has a pub interface that takes Bytes objects from the wild, then it probably can't prove that to itself, unless there is whole-program optimization

I think another way would be to use the C++ pattern for allocator-aware containers, where Bytes is generic over an allocator type. The allocator may be "stateful" in that model, and may e.g. contain a reference to an Arena on the stack, where memory is allocated from. In versions based on Vec, the allocator would be a ZST and there would be no overhead in the size of a Bytes instance.

Is there a good reason to prefer this sample code above with an explicit vtable pointer, to letting the compiler generate that and using a what the language calls a trait object? My impression was that the latter approach is more idiomatic but maybe there's something I'm missing

@DoumanAsh
Copy link
Contributor

DoumanAsh commented Sep 9, 2019

Personally I'd prefer C++ allocator-aware style container, where user defines allocator and controls all allocations via static interface rather than using plain function pointers.
Allocator interface like in C++ is a more clean solution than bunch of function pointers, which would have difficulty to store internal state, and instead put responsibility onto Bytes itself
From my experience writing custom wakers, vtable interface is just a mistake which is very limiting for library author.

It must be noted that stateful allocator is likely to be necessity, and it would make sense for data to be stored in allocator instance, rather than in Bytes
I know it might seem inflexible, but allocator would require Bytes to implement it's interface via Trait so that it could be dynamically dispatched, if necessary.
But it wouldn't restrict user option of static dispatch.

This is btw what alloc aware containers use in Rust too, but trait parameters are not exposed, sadly

I expect that in a lot of cases, llvm will be able to inline through these function pointers, as long as it can prove to itself that vtable only ever has one value in a given translation unit

I wouldn't trust compiler to do it properly.
There are a lot of iffy cases where it is hard to get rid of raw pointers, especially if your struct itself can modify vtable outside of construction or create instances at runtime without literal pointers

@carllerche
Copy link
Member Author

From my experience writing custom wakers, vtable interface is just a mistake which is very limiting for library author.

Could you provide a concrete example of this?

@DoumanAsh
Copy link
Contributor

The main concern is state, which will have to go as argument and be stored by Bytes

Another thing is that vtable is unlikely to be inlined unless you're carefully designing it, which would imply each allocation/reallocation incur extra cost (which is ofc minor, but still)

vtable interface is likely to be inflexible and limiting as user will be limited in available methods and most likely will just have to define allocator object, have Bytes to pass it to vtable and invoke methods using this pointer

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants