Use a list instead of a hash map #22
Conversation
Have you considered using a concurrent vector instead of a linked list? I would expect even better performance.
The basic idea is pretty simple: you have a `[AtomicPtr; 65]` which points to arrays of varying length. The first has 1 element, the second has 1, the third has 2, the fourth has 4, etc. (`array[N]` has length `2^(N-1)` for N ≥ 1). You can map an index to a bucket with a simple calculation: `bucket = usize::BITS - index.leading_zeros()`. Simply create the array at that bucket if it doesn't exist yet.
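As a rough illustration, the mapping could look like this (a hedged sketch, not the crate's actual code; `bucket_of` is a hypothetical helper name):

```rust
// Sketch of the index -> bucket mapping described above (hypothetical
// helper, not the crate's actual implementation).
//
// Bucket 0 and bucket 1 each hold 1 element; bucket N (N >= 1) holds
// 2^(N-1) elements, so 65 buckets cover every possible usize index.
fn bucket_of(index: usize) -> (usize, usize) {
    // Number of significant bits in `index`; index 0 maps to bucket 0.
    let bucket = (usize::BITS - index.leading_zeros()) as usize;
    // Length of the array stored in this bucket.
    let bucket_size = 1usize << bucket.saturating_sub(1);
    // Position within the bucket: clear the leading bit of the index.
    let index_in_bucket = index & (bucket_size - 1);
    (bucket, index_in_bucket)
}

fn main() {
    assert_eq!(bucket_of(0), (0, 0)); // first bucket, single slot
    assert_eq!(bucket_of(1), (1, 0));
    assert_eq!(bucket_of(4), (3, 0)); // bucket 3 holds indices 4..=7
    assert_eq!(bucket_of(7), (3, 3));
    println!("ok");
}
```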
OK, I have implemented three versions:

I haven't benchmarked them yet; I will tomorrow. I like the concurrent vector approach because it would allow us to implement #19. It also lets us avoid boxing the inner types, since their location won't change anyway.
I've done the benchmarks; these are the results:

It's a bit odd that the concurrent vector approach affects the cached performance so much. I suppose it's because I'm passing around four `usize`s (id, bucket, bucket size, index) instead of one?
Great work!
I'm concerned about adding a `Once` to each bucket, since this will drastically increase the size of `ThreadLocal`. Keep in mind that the bucket array is directly inlined into the type, so the consequences may be significant.

Because of this, I'm leaning towards the second version (concurrent vector with mutex). Insertions are relatively rare (only once per thread), so I don't expect much contention on the mutex.
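A minimal sketch of that design (assumed names, element type fixed to `u64` and a small bucket count for brevity; not the crate's actual code): lookups read the `AtomicPtr` entries lock-free, and only the rare allocation slow path takes the mutex, re-checking under the lock.

```rust
use std::sync::atomic::{AtomicPtr, Ordering};
use std::sync::Mutex;

const BUCKETS: usize = 8; // illustrative; the real array would have 65 entries

struct ConcVec {
    buckets: [AtomicPtr<u64>; BUCKETS],
    lock: Mutex<()>, // guards bucket allocation only, never lookups
}

impl ConcVec {
    fn new() -> Self {
        Self {
            buckets: std::array::from_fn(|_| AtomicPtr::new(std::ptr::null_mut())),
            lock: Mutex::new(()),
        }
    }

    /// Returns the bucket's array, allocating it on first use.
    /// NOTE: this sketch never frees the allocations; a real
    /// implementation needs a matching Drop.
    fn get_or_alloc_bucket(&self, bucket: usize, size: usize) -> *mut u64 {
        let ptr = self.buckets[bucket].load(Ordering::Acquire);
        if !ptr.is_null() {
            return ptr; // fast path: bucket already exists, no lock taken
        }
        let _guard = self.lock.lock().unwrap();
        // Re-check under the lock in case another thread allocated it first.
        let ptr = self.buckets[bucket].load(Ordering::Acquire);
        if !ptr.is_null() {
            return ptr;
        }
        let new = Box::into_raw(vec![0u64; size].into_boxed_slice()) as *mut u64;
        self.buckets[bucket].store(new, Ordering::Release);
        new
    }
}

fn main() {
    let v = ConcVec::new();
    let p = v.get_or_alloc_bucket(3, 4);
    assert!(!p.is_null());
    // A second call returns the same allocation via the lock-free fast path.
    assert_eq!(p, v.get_or_alloc_bucket(3, 4));
    println!("ok");
}
```

Since a thread only ever inserts itself once, the mutex sits entirely off the hot lookup path.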
Considering how fast a lookup is now, we probably don't need `CachedThreadLocal` any more. I would suggest just turning it into a typedef for `ThreadLocal` with a deprecation notice.
Ok, I've deprecated it.
Once again, great work!
```
@@ -375,30 +321,42 @@ impl<T: Send + UnwindSafe> UnwindSafe for ThreadLocal<T> {}

struct RawIter<T: Send> {
    remaining: usize,
    buckets: [*const UnsafeCell<Option<T>>; BUCKETS],
```
I missed this during my initial review: I would strongly prefer if the iterator didn't make a whole copy of the buckets and instead just accessed the bucket array by reference.
@Amanieu That would be a bit difficult to implement, because the buckets may be moved in memory in `IntoIter`, so we can't hold a pointer to them. We would have to either box the `ThreadLocal` in `IntoIter` or remove the `RawIterMut` abstraction entirely.
`RawIter` doesn't actually need to implement `Iterator`. It can use inherent methods instead, where the pointer to the array of buckets is passed as a parameter.

Running the benchmarks, this increases performance from 5 ns/iter to 2 ns/iter on my machine. As a side effect, this fixes the UB in #21.
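A sketch of that shape (simplified to safe code with `Vec<Option<T>>` buckets instead of raw pointers; the names are illustrative, not the crate's): the iterator owns only its cursor state, and the bucket array is borrowed per call, so no pointer can dangle when the container moves.

```rust
// Hypothetical sketch: RawIter holds only cursor state; the caller
// passes the bucket array into an inherent `next` method each time,
// instead of the iterator copying or pointing into the buckets.
struct RawIter {
    bucket: usize,    // which bucket we are scanning
    index: usize,     // position within the current bucket
    remaining: usize, // occupied slots left to yield
}

impl RawIter {
    // Inherent method (deliberately NOT the Iterator trait): the
    // returned reference borrows from `buckets`, not from `self`.
    fn next<'a, T>(&mut self, buckets: &'a [Vec<Option<T>>]) -> Option<&'a T> {
        while self.remaining > 0 && self.bucket < buckets.len() {
            let b = &buckets[self.bucket];
            if self.index < b.len() {
                let i = self.index;
                self.index += 1;
                if let Some(v) = &b[i] {
                    self.remaining -= 1;
                    return Some(v); // skip unoccupied (None) slots
                }
            } else {
                // Exhausted this bucket; move to the next one.
                self.bucket += 1;
                self.index = 0;
            }
        }
        None
    }
}

fn main() {
    // Two buckets with one empty slot, as the bucket scheme produces.
    let buckets = vec![vec![Some(1)], vec![None, Some(2)]];
    let mut iter = RawIter { bucket: 0, index: 0, remaining: 2 };
    let mut out = Vec::new();
    while let Some(v) = iter.next(&buckets) {
        out.push(*v);
    }
    assert_eq!(out, [1, 2]);
    println!("ok");
}
```

Because each call re-borrows the array, `IntoIter` can keep the buckets by value and lend them to the same cursor without any self-referential pointer.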