
Refactor Bytes to use an internal vtable #298

Merged
merged 1 commit from the bytes-vtable branch on Oct 16, 2019

Conversation

seanmonstar
Member

Bytes is a useful tool for managing multiple slices into the same region
of memory, and the other functionality it used to carry has been removed to
reduce complexity. The exact strategy for managing the multiple references
is no longer hard-coded, but is instead backed by a customizable vtable.

  • Removed the ability to mutate the underlying memory from the Bytes type.
  • Removed the "inline" (SBO) mechanism in Bytes. This removes a large
    amount of complexity and improves performance when accessing the
    slice of bytes, since a branch is no longer needed to check whether the
    data is inline.
  • Removed Bytes' knowledge of BytesMut (BytesMut may grow that
    knowledge back at a future point).

Closes #294
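
For context, here is a minimal sketch of the shape this design takes. The type names, field names, and signatures below are assumptions made for illustration (the vtable is not public in this PR): a handle holds a pointer and length into the memory, an opaque word of strategy-specific data, and a static vtable whose functions know how to clone and drop that data.

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

// Hypothetical, simplified handle; not the crate's actual layout.
struct SketchBytes {
    ptr: *const u8,
    len: usize,
    data: AtomicPtr<()>,           // strategy-specific state (e.g. refcount ptr, or null)
    vtable: &'static SketchVtable,
}

// Hypothetical vtable: each memory-management strategy supplies its own functions.
struct SketchVtable {
    clone: unsafe fn(&AtomicPtr<()>, *const u8, usize) -> SketchBytes,
    drop: unsafe fn(&AtomicPtr<()>, *const u8, usize),
}

// Example strategy for `&'static [u8]`: there is nothing to refcount,
// so clone just copies the fields and drop does nothing.
unsafe fn static_clone(data: &AtomicPtr<()>, ptr: *const u8, len: usize) -> SketchBytes {
    SketchBytes {
        ptr,
        len,
        data: AtomicPtr::new(data.load(Ordering::Relaxed)),
        vtable: &STATIC_VTABLE,
    }
}

unsafe fn static_drop(_data: &AtomicPtr<()>, _ptr: *const u8, _len: usize) {}

static STATIC_VTABLE: SketchVtable = SketchVtable {
    clone: static_clone,
    drop: static_drop,
};

impl SketchBytes {
    fn from_static(s: &'static [u8]) -> SketchBytes {
        SketchBytes {
            ptr: s.as_ptr(),
            len: s.len(),
            data: AtomicPtr::new(std::ptr::null_mut()),
            vtable: &STATIC_VTABLE,
        }
    }

    // Deref is branch-free: the slice is always reconstructed from ptr + len.
    fn as_slice(&self) -> &[u8] {
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}

impl Clone for SketchBytes {
    fn clone(&self) -> SketchBytes {
        unsafe { (self.vtable.clone)(&self.data, self.ptr, self.len) }
    }
}

impl Drop for SketchBytes {
    fn drop(&mut self) {
        unsafe { (self.vtable.drop)(&self.data, self.ptr, self.len) }
    }
}
```

A refcounted strategy would store an allocation pointer in `data` and bump or release the count in its own `clone`/`drop` functions; the handle itself never needs to know which strategy it is using.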

@seanmonstar
Member Author

Note, I didn't actually make the vtable public yet...

@seanmonstar force-pushed the bytes-vtable branch 4 times, most recently from c8b5531 to eb6eeb6 on October 14, 2019 at 20:01
@seanmonstar
Member Author

seanmonstar commented Oct 14, 2019

The benchmarks (below) are promising! They show that deref, the most common operation by far, is about twice as fast. Even cloning and dropping are a little faster in most cases.

The main regression is when the data would previously have fit in the "inline" variant. The assumption is that the likelihood of data fitting inline is sufficiently low compared to the amount of times we do the other operations.
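
To illustrate where the deref speedup comes from (the types below are simplified, hypothetical stand-ins, not the crate's actual code): with an inline variant present, every access to the slice has to branch on the representation, whereas the new layout always reads the same pointer and length fields.

```rust
// Old-style layout (hypothetical): small data stored inline, large data behind a pointer.
enum OldRepr {
    Inline { buf: [u8; 31], len: u8 },
    Shared { ptr: *const u8, len: usize },
}

impl OldRepr {
    fn as_slice(&self) -> &[u8] {
        // A branch runs on every deref to find out where the bytes live.
        match self {
            OldRepr::Inline { buf, len } => &buf[..*len as usize],
            OldRepr::Shared { ptr, len } => unsafe { std::slice::from_raw_parts(*ptr, *len) },
        }
    }
}

// New-style layout (hypothetical): the bytes are always behind ptr + len.
struct NewRepr {
    ptr: *const u8,
    len: usize,
}

impl NewRepr {
    fn as_slice(&self) -> &[u8] {
        // No branch: pointer and length are always in the same place.
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}
```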

Before

running 20 tests
test alloc_big                      ... bench:          67 ns/iter (+/- 1)
test alloc_mid                      ... bench:          62 ns/iter (+/- 0)
test alloc_small                    ... bench:       3,318 ns/iter (+/- 61)
test alloc_write_split_to_mid       ... bench:         114 ns/iter (+/- 1)
test clone_arc                      ... bench:      28,577 ns/iter (+/- 220)
test clone_inline                   ... bench:       5,366 ns/iter (+/- 477)
test clone_static                   ... bench:       5,281 ns/iter (+/- 88)
test deref_inline                   ... bench:       1,119 ns/iter (+/- 44)
test deref_shared                   ... bench:         979 ns/iter (+/- 40)
test deref_two                      ... bench:       1,236 ns/iter (+/- 79)
test deref_unique                   ... bench:         980 ns/iter (+/- 46)
test deref_unique_unroll            ... bench:       1,054 ns/iter (+/- 25)
test drain_write_drain              ... bench:         650 ns/iter (+/- 20)
test fmt_write                      ... bench:          36 ns/iter (+/- 1) = 1027 MB/s
test from_long_slice                ... bench:          82 ns/iter (+/- 3) = 1560 MB/s
test slice_avg_le_inline_from_arc   ... bench:      24,499 ns/iter (+/- 363)
test slice_empty                    ... bench:      19,038 ns/iter (+/- 461)
test slice_large_le_inline_from_arc ... bench:      23,458 ns/iter (+/- 621)
test slice_short_from_arc           ... bench:      22,943 ns/iter (+/- 367)
test split_off_and_drop             ... bench:     113,326 ns/iter (+/- 5,070)

After

bytes.rs:

test clone_arc_vec                  ... bench:      18,692 ns/iter (+/- 542)
test clone_shared                   ... bench:      21,608 ns/iter (+/- 864)
test clone_static                   ... bench:       4,590 ns/iter (+/- 281)
test deref_shared                   ... bench:         425 ns/iter (+/- 9)
test deref_static                   ... bench:         426 ns/iter (+/- 18)
test deref_unique                   ... bench:         427 ns/iter (+/- 13)
test from_long_slice                ... bench:          86 ns/iter (+/- 5) = 1488 MB/s
test slice_avg_le_inline_from_arc   ... bench:      33,667 ns/iter (+/- 2,360)
test slice_empty                    ... bench:       4,371 ns/iter (+/- 133)
test slice_large_le_inline_from_arc ... bench:      35,964 ns/iter (+/- 1,015)
test slice_short_from_arc           ... bench:      33,599 ns/iter (+/- 1,455)
test split_off_and_drop             ... bench:     122,869 ns/iter (+/- 5,183)

bytes_mut.rs:

test alloc_big                ... bench:          70 ns/iter (+/- 1)
test alloc_mid                ... bench:          63 ns/iter (+/- 1)
test alloc_small              ... bench:      39,019 ns/iter (+/- 2,061)
test alloc_write_split_to_mid ... bench:         115 ns/iter (+/- 5)
test clone_frozen             ... bench:      21,638 ns/iter (+/- 688)
test deref_shared             ... bench:         568 ns/iter (+/- 85)
test deref_two                ... bench:         426 ns/iter (+/- 7)
test deref_unique             ... bench:         599 ns/iter (+/- 138)
test deref_unique_unroll      ... bench:         442 ns/iter (+/- 23)
test drain_write_drain        ... bench:         582 ns/iter (+/- 24)
test fmt_write                ... bench:          28 ns/iter (+/- 0) = 1321 MB/s

src/buf/chain.rs (review thread; outdated, resolved)
@carllerche
Member

The assumption is that the likelihood of data fitting inline is sufficiently low compared to the amount of times we do the other operations.

Also, the added flexibility from the dynamic dispatch will allow handling those cases using other strategies.


impl PartialEq<Bytes> for Vec<u8> {
Member


I think that this can be compare_exchange with Acquire ordering on failure.
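
For illustration, the suggested pattern looks roughly like this (function and variable names are made up for the example, and handling such as freeing the losing allocation is omitted): compare_exchange allows a weaker ordering on the failure path, and Acquire there is enough to safely observe the pointer that the winning thread published.

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

// Hypothetical race to publish a shared state pointer into an empty slot.
fn promote(slot: &AtomicPtr<u8>, new: *mut u8) -> *mut u8 {
    match slot.compare_exchange(
        std::ptr::null_mut(), // expected: nothing published yet
        new,                  // our candidate value
        Ordering::AcqRel,     // success: publish and synchronize
        Ordering::Acquire,    // failure: only need to read the winner's value
    ) {
        Ok(_) => new,
        // Another thread won the race; use the pointer it stored.
        Err(existing) => existing,
    }
}
```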

src/bytes.rs (review thread; outdated, resolved)
1 << (repr + (MIN_ORIGINAL_CAPACITY_WIDTH - 1))
}

/*
Member


Delete or uncomment.

Member

@carllerche left a comment


Looks great! The only change I'd like to see go in before merging is switching the drop vtable fn to take &mut AtomicPtr<()>. This will allow skipping the atomic op in drop fns.
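
A rough sketch of what that signature buys (the function below is hypothetical, following the sketch earlier in this thread rather than the crate's actual drop fn): when the drop fn receives `&mut AtomicPtr<()>`, the exclusive borrow guarantees no other handle can touch the field, so `get_mut` reads the pointer as a plain load with no atomic instruction.

```rust
use std::sync::atomic::AtomicPtr;

// Hypothetical drop fn for a refcounted strategy.
unsafe fn shared_drop(data: &mut AtomicPtr<()>, _ptr: *const u8, _len: usize) {
    // `&mut` proves exclusive access, so no atomic load is needed here.
    let shared: *mut () = *data.get_mut();
    // ...decrement the refcount stored behind `shared` and free on zero...
    let _ = shared;
}
```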

@seanmonstar force-pushed the bytes-vtable branch 3 times, most recently from c114ba6 to 78534db on October 15, 2019 at 21:46
@seanmonstar
Member Author

Ah whoops, loom's AtomicPtr doesn't yet have get_mut. PR up: tokio-rs/loom#80

@carllerche
Member

Loom release coming: tokio-rs/loom#81

@seanmonstar force-pushed the bytes-vtable branch 2 times, most recently from 5089be1 to 493b37a on October 16, 2019 at 16:37
pub fn truncate(&mut self, len: usize) {
    self.inner.truncate(len);
    if len >= self.len {
        self.len = 0;
Contributor


I don't think this is correct. The documentation says truncate should have no effect in this case.
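
For reference, a minimal sketch of the documented behavior (using a hypothetical Buf type rather than the crate's actual code): truncate only ever shortens the view, and a requested length at or above the current length leaves it unchanged.

```rust
// Hypothetical, simplified buffer view.
struct Buf {
    ptr: *const u8,
    len: usize,
}

impl Buf {
    // Shortens the view to `len` bytes; has no effect when `len >= self.len`.
    fn truncate(&mut self, len: usize) {
        if len < self.len {
            self.len = len;
        }
    }
}
```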

tarcieri pushed a commit to iqlusioninc/crates that referenced this pull request Dec 2, 2019
- Removes the `bytes-preview` feature from `zeroize`
- Upgrades `secrecy` to use `bytes` v0.5

Now that `bytes` v0.5 is out, I've opened a PR to upstream the `Zeroize`
impl for `BytesMut`:

tokio-rs/bytes#335

Unfortunately it's no longer possible to impl `Zeroize` for `Bytes` as
of `bytes` v0.5, as the `try_mut` method was dropped in this PR:

tokio-rs/bytes#298

I have brought this up on the first PR.

In the meantime, this vendors the previous `BytesMut` impl of `Zeroize`
into `secrecy`'s `SecretBytesMut` type, and drops `SecretBytes` as it's
no longer possible to implement.
braddunbar added a commit to braddunbar/bytes that referenced this pull request Jan 28, 2024
These seem to have been commented by accident in tokio-rs#298, and are still
passing.
carllerche pushed a commit that referenced this pull request Feb 6, 2024
These seem to have been commented by accident in #298, and are still
passing.
Successfully merging this pull request may close these issues:
Consider switching Bytes to a "trait object", supporting custom memory management strategies (#294)