
Use NonZero* or NonNull, so that Option<Bytes> is no larger than Bytes #241

Closed

SimonSapin opened this issue Jan 17, 2019 · 20 comments

@SimonSapin

SimonSapin commented Jan 17, 2019

It would be nice if size_of::<Option<Bytes>>() was the same as size_of::<Bytes>(). The compiler does this if one of the fields inside Bytes / bytes::Inner has a type that is statically known to be non-zero or non-null: &T, Box<T>, num::NonZeroUsize, ptr::NonNull<T>, …

https://github.com/carllerche/bytes/blob/42b669690af1f358dd5234beb5078ebdf25b3e02/src/bytes.rs#L301-L304

I believe that arc is the only possible candidate without changing the data representation. It contains either a non-zero marker in its lower two bits, or a non-null pointer for the KIND_ARC case. Other fields can be entirely zero, at least in the KIND_INLINE case.

However, replacing the field with arc: NonNull<Shared> is not enough, since some of the accesses to it are atomic.

The standard library provides AtomicPtr and ptr::NonNull, but neither an AtomicNonNullPtr type nor atomic operations on raw pointers. However, maybe we could build an AtomicNonNullPtr type that contains UnsafeCell<NonNull<T>> and implements atomic operations by unsafely casting &UnsafeCell<NonNull<T>> to &AtomicPtr<T>:

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=811d20754131c6339802fb05d0908afe

@RalfJung I think this would be sound, but am I missing something?

@RalfJung
Contributor

As long as you make sure that you never write a NULL there, you should be good. Did you try writing a few tests and running them in Miri?

Also, notice that bytes initializes an Inner with mem::uninitialized. We are still debating whether that's legal for integers, but it certainly isn't legal for NonNull. So the init code would need some redoing.

@seanmonstar
Member

I feel it'd also help if we could change struct Inner to union Inner, but I just tried to do so, and non-Copy values in a union are still unstable (AtomicPtr and NonNull are both !Copy).

@carllerche
Member

@seanmonstar union would be nice, but we use a few bits to track the variant. I'm not sure if union would let us do that.

@SimonSapin
Author

SimonSapin commented Jun 10, 2019

The bit tricks would have to operate on one of the union variants.

(NonNull is Copy, but yes this leaves AtomicPtr.)

@carllerche
Member

@SimonSapin I don't know much about Rust unions. We would need the memory layout to be well defined in that case.

@mzabaluev
Contributor

mzabaluev commented Jun 17, 2019

Is it the representation tricks that necessitate the arc pointer itself to be atomically accessed, or is there some other interior mutability wizardry? For reference, Arc uses NonNull internally.

@SimonSapin
Author

@mzabaluev Yes. In the code link in the issue description you can see an arc: AtomicPtr<Shared> field. It’s called arc because atomic reference counting is involved, but it’s not using std::sync::Arc, and the access to the pointer itself is (in some cases) also atomic.

@mzabaluev
Contributor

@SimonSapin yes, I understand most of that, except this last part:

the access to the pointer itself is (in some cases) also atomic.

With AtomicPtr it should always be atomic, unless you explicitly bypass the atomicity?
And why must the pointer itself be atomic in Bytes/BytesMut, which exhibit no interior mutability, meaning that a non-mutating thread could never observe a partially updated pointer anyway?

@SimonSapin
Author

I mean, I’m only reading the code. You can presumably do so as well as I.

@mzabaluev
Contributor

I can't claim to be good enough to understand everything in there, and I have better uses of my work day than to read through screenfuls of pointers-as-bitfields arcana 😆 It's just on the principle, I can't see a good reason imposed by the API for these pointers to be atomic.

@carllerche
Member

@mzabaluev there is mutation that happens internally. When you create a Bytes initially, it only allocates once (similar to Vec). This works because there is only one outstanding handle. As soon as you clone, an arc is needed (see here).

This is done, among other reasons, to make conversions from Vec<u8> cheap.

@mzabaluev
Contributor

mzabaluev commented Jun 18, 2019

This is done, among other reasons, to make conversions from Vec<u8> cheap.

I suspect this is one of the cases where an implicit dynamic optimization made to benefit one use case penalizes all other users of the crate who may not need it at all.

What usage patterns channel people into starting off with Vec and then converting? In-place initialization could be provided via macros like bytes![...] and bytes_mut![...] which would also select the optimal representation at compile time.

Could the optimized conversions be addressed explicitly with a Cow-like enum type (there's already support for Either which cuts part of the cake), or, going forward, by abstracting over Buf and specializing for Bytes/BytesMut to achieve zero copy behavior?

@mzabaluev
Contributor

When you create a Bytes initially, it only allocates once (similar to Vec).

The double allocation - one for Vec, one for Shared - could be avoided by inlining the data into Shared as a DST.

@carllerche
Member

The problem is that buffer capacity is often allocated in multiples of page size. Inlining the metadata will break this and not play well with allocators.

@carllerche
Member

Either way, the path forward for any change would be to demonstrate it performs better than the current impl for common usage patterns.

@mzabaluev
Contributor

The problem is that buffer capacity is often allocated in multiples of page size. Inlining the metadata will break this and not play well with allocators.

The allocator is supposed to properly align any Layout thrown at it, and that covers the fixed-size array parameterizations of the generic type that would CoerceUnsized to the DST pointer after allocation.
As to the size, the buffer capacity chosen on reallocation could account for mem::size_of::<SharedHeader>() so that the total allocation still fits within the next power of two.

@carllerche
Member

Yep... so now a 4kb allocation takes 8kb... etc

@mzabaluev
Contributor

Does it matter how much of a 2^n allocated size is taken up by the header, as long as the rest is available for data? I did not notice any particular guarantees about the capacity growth strategy in the docs, so I assume the implementation is free to change it.

@mzabaluev
Contributor

One potential drawback with an inline shared buffer layout is reallocation of small buffers, where the overhead of copying the header to the new contiguous DST structure factors noticeably in the overall copying cost. I'd wager that smaller reserved capacities in BytesMut are more likely to be seen when encoding output buffers. The implementation could hedge against this by choosing a larger minimal allocated size, but forced allocation overcapacity has its own performance hazards...

I agree that benchmarking on typical application workloads is important.

@seanmonstar
Member

As of master (from #298), both Bytes and BytesMut take advantage of the non-null optimization for Option.
