Rewrite BiLock lock/unlock to be allocation free #2384

Open · wants to merge 9 commits into master from the token-bilockimpl branch
Conversation

@exrook (Contributor) commented Apr 3, 2021

Inspired by a previous attempt to remove the allocation (#606), this version
uses three tokens to synchronize access to the locked value and the waker.
These tokens are swapped between the inner struct of the lock and the two
bilock halves. The tokens are initialized with the LOCK token held by the
inner struct, the WAKE token held by one bilock half, and the NULL token
held by the other half.
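
A minimal sketch of that layout (the type names and token values here are illustrative, not necessarily the PR's actual identifiers):

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::AtomicUsize;
use std::sync::Arc;
use std::task::Waker;

// Each token exists in exactly one place at a time: the shared inner
// slot or one of the two halves.
const NULL: usize = 0; // "nothing to report"
const WAKE: usize = 1; // grants exclusive access to the waker slot
const LOCK: usize = 2; // grants exclusive access to the value

struct Inner<T> {
    token: AtomicUsize,               // starts holding LOCK
    waker: UnsafeCell<Option<Waker>>, // guarded by the WAKE token
    value: UnsafeCell<T>,             // guarded by the LOCK token
}

struct BiLockHalf<T> {
    inner: Arc<Inner<T>>,
    token: usize, // starts as WAKE for one half, NULL for the other
}
```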

To poll the lock, our half swaps its token with the inner token (see the sketch after this list):

- if we get the LOCK token, we now hold the lock;
- if we get the NULL token, we swap again and loop;
- if we get the WAKE token, we store our waker in the inner and swap again: if we then get the LOCK token we now hold the lock, otherwise we return Poll::Pending.
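
Roughly, building on the sketch above (the PR's real code will differ in details such as memory orderings, and returns a guard rather than `()`):

```rust
use std::sync::atomic::Ordering;
use std::task::{Context, Poll};

impl<T> BiLockHalf<T> {
    // Sketch of the poll loop; uses the illustrative Inner/BiLockHalf
    // types and token constants from the previous block.
    fn poll_lock(&mut self, cx: &mut Context<'_>) -> Poll<()> {
        loop {
            // Trade our token for whatever the inner slot currently holds.
            self.token = self.inner.token.swap(self.token, Ordering::AcqRel);
            match self.token {
                LOCK => return Poll::Ready(()), // we now hold the lock
                NULL => continue,               // nothing useful yet; swap again
                _ /* WAKE */ => {
                    // Holding WAKE grants exclusive access to the waker slot.
                    unsafe { *self.inner.waker.get() = Some(cx.waker().clone()) };
                    // Swap WAKE back in: either LOCK arrived in the meantime,
                    // or our waker is now published for the unlocking half.
                    self.token = self.inner.token.swap(self.token, Ordering::AcqRel);
                    return if self.token == LOCK { Poll::Ready(()) } else { Poll::Pending };
                }
            }
        }
    }
}
```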

To unlock the lock, our half swaps its token (which we know must be the LOCK token) with the inner token (sketch below):

- if we get the NULL token, there was no contention, so we just return;
- if we get the WAKE token, we wake the waker stored in the inner.
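
The corresponding unlock path, under the same assumptions:

```rust
use std::sync::atomic::Ordering;

impl<T> BiLockHalf<T> {
    // Sketch of unlock; called while this half holds the LOCK token.
    fn unlock(&mut self) {
        debug_assert_eq!(self.token, LOCK);
        // Hand LOCK back to the inner slot and take whatever was there.
        self.token = self.inner.token.swap(LOCK, Ordering::AcqRel);
        if self.token == WAKE {
            // The other half parked its waker before returning Pending;
            // holding WAKE grants exclusive access to that slot.
            if let Some(waker) = unsafe { (*self.inner.waker.get()).take() } {
                waker.wake();
            }
        }
        // Getting NULL back means there was no contention: nothing to do.
    }
}
```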

Additionally, this change makes the BiLock methods take &mut self.

@exrook requested a review from taiki-e as a code owner on April 3, 2021 17:55
@exrook force-pushed the token-bilockimpl branch 3 times, most recently from 1de24ef to 7b5f662 on April 3, 2021 18:23
@taiki-e (Member) commented Apr 10, 2021

Thanks! It seems the previous attempt had a performance issue, so I'm curious whether this implementation fixes it.

@exrook (Contributor, author) commented Apr 10, 2021

Sure, here are some benchmarks with criterion on my Ryzen 2700X desktop. I also ran some preliminary benchmarks on my ARM phone with similar results, although with a slight (40us -> 50us) regression on lock_unlock for some reason. Additionally, with the old implementation the contended benchmark was leaking memory like crazy on ARM, allocating over 8GB during the ~5s benchmark run.

[criterion plots: concurrent, contended, lock_unlock]
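
For context, a hypothetical sketch of what the lock_unlock criterion bench might look like (the benches were ported from #606; the function body here is an assumption, and BiLock sits behind the unstable `bilock` feature):

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use futures::executor::block_on;
use futures::lock::BiLock; // requires the unstable `bilock` feature

fn lock_unlock(c: &mut Criterion) {
    c.bench_function("lock_unlock", |b| {
        // Uncontended path: one half repeatedly acquires and releases.
        let (mut half, _other) = BiLock::new(0u32);
        b.iter(|| {
            block_on(async {
                let guard = half.lock().await;
                drop(guard);
            })
        })
    });
}

criterion_group!(benches, lock_unlock);
criterion_main!(benches);
```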

@taiki-e added the A-lock (Area: futures::lock) label on Apr 10, 2021
Commits:
- Add concurrent bilock bench from rust-lang#606
- Add bilock exclusion test
- Run rustfmt on bilock.rs
@exrook (Contributor, author) commented Apr 11, 2021

Mobile aarch64 benchmarks:

[criterion plots: concurrent, contended]

Note the regression in lock_unlock below! I'd appreciate hearing any theories on why this may be happening.

[criterion plot: lock_unlock]

@exrook force-pushed the token-bilockimpl branch 2 times, most recently from 45e044f to aa207ca on April 11, 2021 04:31
@taiki-e (Member) commented Apr 18, 2021

FWIW, results on my MacBook Pro (Intel Core i7-9750H, macOS v11.2.3).

old (2be999d)

test bench::concurrent  ... bench:   4,328,809 ns/iter (+/- 2,353,379)
test bench::contended   ... bench:      94,411 ns/iter (+/- 12,618)
test bench::lock_unlock ... bench:      23,861 ns/iter (+/- 1,640)

new (37c66f6)

test bench::concurrent  ... bench:     486,154 ns/iter (+/- 92,153)
test bench::contended   ... bench:      41,488 ns/iter (+/- 10,598)
test bench::lock_unlock ... bench:      25,685 ns/iter (+/- 1,622)

new (3208bb5)

test bench::concurrent  ... bench:     480,203 ns/iter (+/- 73,745)
test bench::contended   ... bench:      39,459 ns/iter (+/- 2,696)
test bench::lock_unlock ... bench:      24,195 ns/iter (+/- 1,498)

new (08707e3)

test bench::concurrent  ... bench:     558,425 ns/iter (+/- 36,155)
test bench::contended   ... bench:      51,819 ns/iter (+/- 7,067)
test bench::lock_unlock ... bench:      32,035 ns/iter (+/- 4,968)

new (be2242e)

test bench::concurrent  ... bench:     491,844 ns/iter (+/- 96,733)
test bench::contended   ... bench:      39,943 ns/iter (+/- 11,236)
test bench::lock_unlock ... bench:      24,148 ns/iter (+/- 881)

new (aa5f129)

test bench::concurrent  ... bench:     554,916 ns/iter (+/- 75,457)
test bench::contended   ... bench:      38,764 ns/iter (+/- 6,630)
test bench::lock_unlock ... bench:      25,267 ns/iter (+/- 4,527)

new (c8788cd)

test bench::concurrent  ... bench:     448,243 ns/iter (+/- 54,914)
test bench::contended   ... bench:     247,089 ns/iter (+/- 47,252)
test bench::lock_unlock ... bench:      23,646 ns/iter (+/- 5,088)

Both 3208bb5 and be2242e look good on my machine (I tend to prefer 3208bb5, which is compatible with thread sanitizer).

@jonassmedegaard commented

Any progress here? I would love to see the bilock feature not requiring nightly.
