
Use NonZero* or NonNull, so that Option<Bytes> is no larger than Bytes #241

Closed

SimonSapin opened this issue Jan 17, 2019 · 20 comments

@SimonSapin

SimonSapin commented Jan 17, 2019

It would be nice if size_of::<Option<Bytes>>() was the same as size_of::<Bytes>(). The compiler does this if one of the fields inside Bytes / bytes::Inner has a type that is statically known to be non-zero or non-null: &T, Box<T>, num::NonZeroUsize, ptr::NonNull<T>, …

https://github.com/carllerche/bytes/blob/42b669690af1f358dd5234beb5078ebdf25b3e02/src/bytes.rs#L301-L304

I believe that arc is the only possible candidate without changing the data representation. It contains either a non-zero marker in its lower two bits, or a non-null pointer for the KIND_ARC case. Other fields can be entirely zero, at least in the KIND_INLINE case.

However, replacing the field with arc: NonNull<Shared> is not enough, since some of the accesses to it are atomic.

The standard library provides AtomicPtr and ptr::NonNull, but neither an AtomicNonNullPtr type nor atomic operations on raw pointers. However, maybe we could build an AtomicNonNullPtr type that contains UnsafeCell<NonNull<T>> and implements atomic operations by unsafely casting &UnsafeCell<NonNull<T>> to &AtomicPtr<T>:

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=811d20754131c6339802fb05d0908afe

@RalfJung I think this would be sound, but am I missing something?

@RalfJung
Contributor

As long as you make sure that you never write a NULL there, you should be good. Did you try writing a few tests and running them in Miri?

Also, notice that bytes initializes an Inner with mem::uninitialized. We are still debating whether that's legal for integers, but it certainly isn't legal for NonNull. So the init code would need some redoing.

@seanmonstar
Member

I feel it'd also help if we could change struct Inner to union Inner, but I just tried to do so, and non-Copy values in a union are still unstable (AtomicPtr and NonNull are both !Copy).

@carllerche
Member

@seanmonstar union would be nice, but we use a few bits to track the variant. I'm not sure if union would let us do that.

@SimonSapin
Author

SimonSapin commented Jun 10, 2019

The bit tricks would have to operate on one of the union variants.

(NonNull is Copy, but yes this leaves AtomicPtr.)

@carllerche
Member

@SimonSapin I don't know much about Rust unions. We would need the memory layout to be well defined in that case.

@mzabaluev
Contributor

mzabaluev commented Jun 17, 2019

Is it the representation tricks that necessitate the arc pointer itself to be atomically accessed, or is there some other interior mutability wizardry? For reference, Arc uses NonNull internally.

@SimonSapin
Author

@mzabaluev Yes. In the code link in the issue description you can see an arc: AtomicPtr<Shared> field. It’s called arc because atomic reference counting is involved, but it’s not using std::sync::Arc, and the access to the pointer itself is (in some cases) also atomic.

@mzabaluev
Contributor

@SimonSapin yes, I understand most of that, except this last part:

the access to the pointer itself is (in some cases) also atomic.

With AtomicPtr it should always be atomic, unless you explicitly bypass the atomicity?
And why must the pointer itself be atomic in Bytes/BytesMut, which exhibit no interior mutability, meaning that a non-mutating thread could never observe a partially updated pointer anyway?

@SimonSapin
Author

I mean, I’m only reading the code. You can presumably do so as well as I.

@mzabaluev
Contributor

I can't claim to be good enough to understand everything in there, and I have better uses of my work day than to read through screenfuls of pointers-as-bitfields arcana 😆 It's just on the principle, I can't see a good reason imposed by the API for these pointers to be atomic.

@carllerche
Member

@mzabaluev there is mutation that happens internally. When you create a Bytes initially, it only allocates once (similar to Vec). This works because there is only one outstanding handle. As soon as you clone, an arc is needed (see here).

This is done, among other reasons, to make conversions from Vec<u8> cheap.

@mzabaluev
Contributor

mzabaluev commented Jun 18, 2019

This is done, among other reasons, to make conversions from Vec<u8> cheap.

I suspect this is one of the cases where an implicit dynamic optimization made to benefit one use case penalizes all other users of the crate who may not need it at all.

What usage patterns channel people into starting off with Vec and then converting? In-place initialization could be provided via macros like bytes![...] and bytes_mut![...] which would also select the optimal representation at compile time.

Could the optimized conversions be addressed explicitly with a Cow-like enum type (there's already support for Either which cuts part of the cake), or, going forward, by abstracting over Buf and specializing for Bytes/BytesMut to achieve zero copy behavior?

@mzabaluev
Contributor

When you create a Bytes initially, it only allocates once (similar to Vec).

The double allocation - one for Vec, one for Shared - could be avoided by inlining the data into Shared as a DST.

@carllerche
Member

The problem is that buffer capacity is often allocated in multiples of page size. Inlining the metadata will break this and not play well with allocators.

@carllerche
Member

Either way, the path forward for any change would be to demonstrate it performs better than the current impl for common usage patterns.

@mzabaluev
Contributor

The problem is that buffer capacity is often allocated in multiples of page size. Inlining the metadata will break this and not play well with allocators.

The allocator is supposed to properly align any Layout thrown at it, and that covers the fixed-size array parameterizations of the generic type that would CoerceUnsized to the DST pointer after allocation.
As to the size, the buffer capacity chosen on reallocation could account for mem::size_of::<SharedHeader>() so that the total allocation still fits within the next power of two.

@carllerche
Member

Yep... so now a 4kb allocation takes 8kb... etc

@mzabaluev
Contributor

Does it matter how much of a 2^n allocated size is taken up by the header, as long as the rest is available for data? I did not notice any particular guarantees about the capacity growth strategy in the docs, so I assume the implementation is free to change it.

@mzabaluev
Contributor

One potential drawback with an inline shared buffer layout is reallocation of small buffers, where the overhead of copying the header to the new contiguous DST structure factors noticeably in the overall copying cost. I'd wager that smaller reserved capacities in BytesMut are more likely to be seen when encoding output buffers. The implementation could hedge against this by choosing a larger minimal allocated size, but forced allocation overcapacity has its own performance hazards...

I agree that benchmarking on typical application workloads is important.

@seanmonstar
Member

As of master (from #298), both Bytes and BytesMut take advantage of the non-null optimization for Option.
