Consider switching `Bytes` to a "trait object", supporting custom memory management strategies. #294

carllerche · 2019-09-09T18:59:26Z

Currently, Bytes uses its own internal implementation for memory management. It supports a few strategies:

Vec<u8>
Arc<Vec<u8>>
&'static [u8]
an inlined strategy.

However, using a fixed strategy prohibits alternate strategies that would be more appropriate to the use case in question. One example is initializing a Bytes instance backed by a file using mmap.

Proposal

The Bytes struct would be updated to:

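use std::ptr::NonNull;

struct Bytes {
    raw: RawBytes,
}

struct RawBytes {
    // Start and length of the visible byte range.
    ptr: *const u8,
    len: usize,
    // Opaque, strategy-specific state handed back to the vtable functions.
    data: NonNull<()>,
    // Function table for the strategy that owns `data`.
    vtable: &'static Vtable,
}

struct Vtable {
    clone: unsafe fn(ptr: *const u8, len: usize, data: NonNull<()>) -> Bytes,
    drop: unsafe fn(data: NonNull<()>),
}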

An additional Bytes::from_raw constructor would be added. The rest of the Bytes API can be implemented using the above APIs.
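
To make the shape of this concrete, here is a minimal sketch of how `Clone`, `Drop`, and a `&'static [u8]` strategy could sit on top of the vtable. It flattens `RawBytes` into `Bytes` for brevity. The `from_raw` signature, the `from_static` and `as_slice` helpers, and the `STATIC_VTABLE` name are assumptions made for illustration, not the crate's actual API.

```rust
use std::ptr::NonNull;
use std::slice;

pub struct Vtable {
    pub clone: unsafe fn(ptr: *const u8, len: usize, data: NonNull<()>) -> Bytes,
    pub drop: unsafe fn(data: NonNull<()>),
}

pub struct Bytes {
    ptr: *const u8,
    len: usize,
    data: NonNull<()>,
    vtable: &'static Vtable,
}

impl Bytes {
    /// Hypothetical constructor (signature assumed).
    ///
    /// Safety: `ptr..ptr + len` must stay readable until `vtable.drop`
    /// has run for every handle cloned from this one.
    pub unsafe fn from_raw(
        ptr: *const u8,
        len: usize,
        data: NonNull<()>,
        vtable: &'static Vtable,
    ) -> Bytes {
        Bytes { ptr, len, data, vtable }
    }

    /// Static strategy: nothing to own, so `data` is a dangling placeholder.
    pub fn from_static(bytes: &'static [u8]) -> Bytes {
        unsafe {
            Bytes::from_raw(bytes.as_ptr(), bytes.len(), NonNull::dangling(), &STATIC_VTABLE)
        }
    }

    pub fn as_slice(&self) -> &[u8] {
        // Sound as long as the `from_raw` contract was upheld.
        unsafe { slice::from_raw_parts(self.ptr, self.len) }
    }
}

impl Clone for Bytes {
    fn clone(&self) -> Bytes {
        // Dispatch through the strategy's function table.
        unsafe { (self.vtable.clone)(self.ptr, self.len, self.data) }
    }
}

impl Drop for Bytes {
    fn drop(&mut self) {
        unsafe { (self.vtable.drop)(self.data) }
    }
}

// Cloning a static slice just copies the pointer and length.
unsafe fn static_clone(ptr: *const u8, len: usize, data: NonNull<()>) -> Bytes {
    unsafe { Bytes::from_raw(ptr, len, data, &STATIC_VTABLE) }
}

// Static data is never freed.
unsafe fn static_drop(_data: NonNull<()>) {}

static STATIC_VTABLE: Vtable = Vtable {
    clone: static_clone,
    drop: static_drop,
};
```

With this in place, `Bytes::from_static(b"hello").clone()` copies four words and never touches the heap.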

Provided strategies

By default, the bytes crate should provide some strategies for using Bytes. A default of Arc<Vec<u8>> seems sufficient. Feature flags can be used to opt-out of the default implementation.
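
A default `Arc<Vec<u8>>` strategy could be expressed through the same vtable roughly as follows. This is a sketch under assumptions: `from_vec`, `as_slice`, and `SHARED_VTABLE` are hypothetical names, `RawBytes` is flattened into `Bytes`, and the reference count is managed by storing the pointer from `Arc::into_raw` in `data`.

```rust
use std::ptr::NonNull;
use std::slice;
use std::sync::Arc;

pub struct Vtable {
    pub clone: unsafe fn(ptr: *const u8, len: usize, data: NonNull<()>) -> Bytes,
    pub drop: unsafe fn(data: NonNull<()>),
}

pub struct Bytes {
    ptr: *const u8,
    len: usize,
    data: NonNull<()>,
    vtable: &'static Vtable,
}

impl Bytes {
    /// Hypothetical constructor for the shared strategy.
    pub fn from_vec(vec: Vec<u8>) -> Bytes {
        let arc = Arc::new(vec);
        let (ptr, len) = (arc.as_slice().as_ptr(), arc.len());
        // Leak one strong reference into `data`; `shared_drop` reclaims it.
        let raw = Arc::into_raw(arc) as *mut ();
        // `Arc::into_raw` never returns null.
        let data = unsafe { NonNull::new_unchecked(raw) };
        Bytes { ptr, len, data, vtable: &SHARED_VTABLE }
    }

    pub fn as_slice(&self) -> &[u8] {
        // The Vec's buffer stays alive while any handle holds a reference.
        unsafe { slice::from_raw_parts(self.ptr, self.len) }
    }
}

impl Clone for Bytes {
    fn clone(&self) -> Bytes {
        unsafe { (self.vtable.clone)(self.ptr, self.len, self.data) }
    }
}

impl Drop for Bytes {
    fn drop(&mut self) {
        unsafe { (self.vtable.drop)(self.data) }
    }
}

// Bump the strong count, then hand out another view of the same buffer.
unsafe fn shared_clone(ptr: *const u8, len: usize, data: NonNull<()>) -> Bytes {
    unsafe { Arc::increment_strong_count(data.as_ptr() as *const Vec<u8>) };
    Bytes { ptr, len, data, vtable: &SHARED_VTABLE }
}

// Rebuild the Arc and let it go, releasing one strong reference.
unsafe fn shared_drop(data: NonNull<()>) {
    drop(unsafe { Arc::from_raw(data.as_ptr() as *const Vec<u8>) });
}

static SHARED_VTABLE: Vtable = Vtable {
    clone: shared_clone,
    drop: shared_drop,
};
```

The buffer is freed only when the last handle is dropped, so a clone remains readable after the original goes away.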

`BytesMut`

Under this proposal, the BytesMut structure would no longer be a core part of the bytes API. It is unclear if it should stay in the bytes crate as an option or move out to a separate crate.

Inline representation

This proposal would remove the ability to initialize a Bytes instance using the "inline" representation. It is unclear how big of an impact this would have. It is also unclear if use cases that take advantage of "inline" representation would be able to use an alternate strategy enabled with this change.

Refs #269


cbeck88 · 2019-09-09T19:33:13Z

> how big of an impact this would have

I expect that in a lot of cases, LLVM will be able to inline through these function pointers, as long as it can prove to itself that vtable only ever has one value in a given translation unit. Depending on how the bytes crate works, it might or might not be able to prove that. If Bytes exposes multiple constructors for this trait object, and your lib has a pub interface that takes Bytes objects from the wild, then it probably can't, unless there is whole-program optimization.

I think another way would be to use the C++ pattern for allocator-aware containers, where Bytes is generic over an allocator type. The allocator may be "stateful" in that model, and may e.g. contain a reference to an Arena on the stack, where memory is allocated from. In versions based on Vec, the allocator would be a ZST and there would be no overhead in the size of a Bytes instance.
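
A rough sketch of that allocator-generic shape, to show the zero-overhead claim for a ZST allocator. Everything here is hypothetical: `ByteAlloc`, `Global`, `Counting`, and `GenericBytes` are illustration names, and the trait returns a `Vec<u8>` instead of raw memory to keep the example safe and short.

```rust
use std::cell::Cell;
use std::mem::size_of;

/// Hypothetical allocator-like trait (not the unstable std `Allocator`).
pub trait ByteAlloc {
    fn alloc_zeroed(&self, len: usize) -> Vec<u8>;
}

/// Stateless allocator: a zero-sized type.
#[derive(Clone, Copy, Default)]
pub struct Global;

impl ByteAlloc for Global {
    fn alloc_zeroed(&self, len: usize) -> Vec<u8> {
        vec![0; len]
    }
}

/// Stateful allocator: borrows state that lives elsewhere, e.g. on the stack.
pub struct Counting<'a> {
    pub allocations: &'a Cell<usize>,
}

impl ByteAlloc for Counting<'_> {
    fn alloc_zeroed(&self, len: usize) -> Vec<u8> {
        self.allocations.set(self.allocations.get() + 1);
        vec![0; len]
    }
}

/// Byte container that is generic over its allocator, dispatched statically.
pub struct GenericBytes<A: ByteAlloc> {
    data: Vec<u8>,
    alloc: A,
}

impl<A: ByteAlloc> GenericBytes<A> {
    pub fn copy_from_slice_in(src: &[u8], alloc: A) -> Self {
        let mut data = alloc.alloc_zeroed(src.len());
        data.copy_from_slice(src);
        GenericBytes { data, alloc }
    }

    pub fn as_slice(&self) -> &[u8] {
        &self.data
    }

    pub fn allocator(&self) -> &A {
        &self.alloc
    }
}

/// True when the zero-sized allocator adds nothing to the handle's size.
pub fn zst_allocator_adds_no_size() -> bool {
    size_of::<GenericBytes<Global>>() == size_of::<Vec<u8>>()
}
```

The trade-off in this sketch is that the allocator becomes part of the type, so `GenericBytes<Global>` and `GenericBytes<Counting>` are different types and cannot be passed through the same non-generic interface.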

Is there a good reason to prefer the sample code above, with an explicit vtable pointer, over letting the compiler generate that by using what the language calls a trait object? My impression was that the latter approach is more idiomatic, but maybe there's something I'm missing.
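
For comparison, a compiler-generated trait object version might look like the sketch below. `Storage` and `DynBytes` are hypothetical names, and this is only one possible design. One difference that shows up in the sketch: `Arc<dyn Storage>` needs a heap allocation even when the backing data is a `&'static [u8]`, whereas a hand-written vtable can pair a static function table with a placeholder `data` pointer. That is an observation about this sketch, not a statement of the maintainers' reasoning.

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Hypothetical trait for anything that can back a byte view.
pub trait Storage: Send + Sync {
    fn as_bytes(&self) -> &[u8];
}

impl Storage for Vec<u8> {
    fn as_bytes(&self) -> &[u8] {
        self.as_slice()
    }
}

impl Storage for &'static [u8] {
    fn as_bytes(&self) -> &[u8] {
        *self
    }
}

/// Byte view whose vtable is generated by the compiler for `dyn Storage`.
#[derive(Clone)]
pub struct DynBytes {
    owner: Arc<dyn Storage>, // fat pointer: data pointer + vtable pointer
    start: usize,
    len: usize,
}

impl DynBytes {
    pub fn new(storage: impl Storage + 'static) -> Self {
        let len = storage.as_bytes().len();
        // Allocates the Arc's control block, even for static data.
        DynBytes { owner: Arc::new(storage), start: 0, len }
    }

    /// Sub-view sharing the same owner; panics if the range is out of bounds.
    pub fn slice(&self, start: usize, end: usize) -> Self {
        assert!(start <= end && end <= self.len);
        DynBytes {
            owner: Arc::clone(&self.owner),
            start: self.start + start,
            len: end - start,
        }
    }

    pub fn as_slice(&self) -> &[u8] {
        // One dynamic call per access to reach the underlying bytes.
        &self.owner.as_bytes()[self.start..self.start + self.len]
    }
}

/// Size of the handle in machine words.
pub fn handle_size_in_words() -> usize {
    size_of::<DynBytes>() / size_of::<usize>()
}
```

The handle is four words wide, the same as the `ptr`/`len`/`data`/`vtable` layout in the proposal, but every `as_slice` call goes through the trait object instead of reading a cached pointer.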

DoumanAsh · 2019-09-09T20:47:20Z

Personally I'd prefer a C++ allocator-aware style container, where the user defines an allocator and controls all allocations via a static interface rather than using plain function pointers.
An allocator interface like in C++ is a cleaner solution than a bunch of function pointers, which would have difficulty storing internal state and would instead put that responsibility onto Bytes itself.
From my experience writing custom wakers, a vtable interface is just a mistake that is very limiting for the library author.

It must be noted that a stateful allocator is likely to be a necessity, and it would make sense for data to be stored in the allocator instance rather than in Bytes.
I know it might seem inflexible, but the allocator approach would require Bytes to define its interface via a trait, so that it could still be dynamically dispatched if necessary.
But it wouldn't restrict the user's option of static dispatch.

This is btw what alloc-aware containers in Rust use too, but the trait parameters are not exposed, sadly.

> I expect that in a lot of cases, llvm will be able to inline through these function pointers, as long as it can prove to itself that vtable only ever has one value in a given translation unit

I wouldn't trust the compiler to do it properly.
There are a lot of iffy cases where it is hard to get rid of raw pointers, especially if your struct itself can modify the vtable outside of construction or create instances at runtime without literal pointers.

carllerche · 2019-09-09T20:48:56Z

> From my experience writing custom wakers, vtable interface is just a mistake which is very limiting for library author.

Could you provide a concrete example of this?

DoumanAsh · 2019-09-09T21:02:30Z

The main concern is state, which will have to be passed as an argument and stored by Bytes.

Another thing is that the vtable is unlikely to be inlined unless you design it carefully, which would imply that each allocation/reallocation incurs extra cost (which is ofc minor, but still).

A vtable interface is likely to be inflexible and limiting: the user will be restricted to the available methods and most likely will just have to define an allocator object, have Bytes pass it to the vtable, and invoke methods through this pointer.

carllerche added this to the v0.5 milestone Sep 9, 2019

seanmonstar mentioned this issue Oct 11, 2019

Refactor Bytes to use an internal vtable #298

Merged

seanmonstar closed this as completed in #298 Oct 16, 2019

This was referenced Nov 5, 2019

Use a trait-object representation also for BytesMut? #310

Open

Type erased (dynamically dispatched) allocators rust-lang/wg-allocators#33

Open

quark-zju mentioned this issue Jan 17, 2020

[Question] Next steps about vtable and mmap? #359

Open
