Use a list instead of a hash map #22
Conversation
Have you considered using a concurrent vector instead of a linked list? I would expect even better performance.
The basic idea is pretty simple: you have a `[AtomicPtr; 65]` which points to arrays of varying length. The first has 1 element, the second has 1, the third has 2, the fourth has 4, etc. (`array[N]` has length `2^(N-1)` for N ≥ 1). You can map an index to a bucket with a simple calculation: `bucket = usize::BITS - index.leading_zeros()`. Simply create the array at that bucket if it doesn't exist yet.
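As a rough illustration, the mapping could look like this (a hedged sketch, not the crate's actual code; `bucket_of` is a hypothetical helper name):

```rust
// Sketch of the index -> bucket mapping described above (hypothetical
// helper, not the crate's actual implementation).
//
// Bucket 0 and bucket 1 each hold 1 element; bucket N (N >= 1) holds
// 2^(N-1) elements, so 65 buckets cover every possible usize index.
fn bucket_of(index: usize) -> (usize, usize) {
    // Number of significant bits in `index`; index 0 maps to bucket 0.
    let bucket = (usize::BITS - index.leading_zeros()) as usize;
    // Length of the array stored in this bucket.
    let bucket_size = 1usize << bucket.saturating_sub(1);
    // Position within the bucket: clear the leading bit of the index.
    let index_in_bucket = index & (bucket_size - 1);
    (bucket, index_in_bucket)
}

fn main() {
    assert_eq!(bucket_of(0), (0, 0)); // first bucket, single slot
    assert_eq!(bucket_of(1), (1, 0));
    assert_eq!(bucket_of(4), (3, 0)); // bucket 3 holds indices 4..=7
    assert_eq!(bucket_of(7), (3, 3));
    println!("ok");
}
```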
OK, I have implemented three versions:

I haven't benchmarked them yet; I will tomorrow. I like the concurrent vector approach because it would allow us to implement #19. It also lets us avoid boxing the inner types, since their location won't change anyway.
I've done the benchmarks; these are the results:

It's a bit odd that the concurrent vector approach affects the cached performance so much. I suppose it's because I'm passing around four `usize`s (id, bucket, bucket size, index) instead of one?
Great work!
I'm concerned about adding a `Once` to each bucket, since this will drastically increase the size of `ThreadLocal`. Keep in mind that the bucket array is directly inlined into the type, so the consequences may be significant.

Because of this, I'm leaning towards the second version (concurrent vector with mutex). Insertions are relatively rare (only once per thread), so I don't expect much contention on the mutex.
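A minimal sketch of that design (assumed names, element type fixed to `u64` and a small bucket count for brevity; not the crate's actual code): lookups read the `AtomicPtr` entries lock-free, and only the rare allocation slow path takes the mutex, re-checking under the lock.

```rust
use std::sync::atomic::{AtomicPtr, Ordering};
use std::sync::Mutex;

const BUCKETS: usize = 8; // illustrative; the real array would have 65 entries

struct ConcVec {
    buckets: [AtomicPtr<u64>; BUCKETS],
    lock: Mutex<()>, // guards bucket allocation only, never lookups
}

impl ConcVec {
    fn new() -> Self {
        Self {
            buckets: std::array::from_fn(|_| AtomicPtr::new(std::ptr::null_mut())),
            lock: Mutex::new(()),
        }
    }

    /// Returns the bucket's array, allocating it on first use.
    /// NOTE: this sketch never frees the allocations; a real
    /// implementation needs a matching Drop.
    fn get_or_alloc_bucket(&self, bucket: usize, size: usize) -> *mut u64 {
        let ptr = self.buckets[bucket].load(Ordering::Acquire);
        if !ptr.is_null() {
            return ptr; // fast path: bucket already exists, no lock taken
        }
        let _guard = self.lock.lock().unwrap();
        // Re-check under the lock in case another thread allocated it first.
        let ptr = self.buckets[bucket].load(Ordering::Acquire);
        if !ptr.is_null() {
            return ptr;
        }
        let new = Box::into_raw(vec![0u64; size].into_boxed_slice()) as *mut u64;
        self.buckets[bucket].store(new, Ordering::Release);
        new
    }
}

fn main() {
    let v = ConcVec::new();
    let p = v.get_or_alloc_bucket(3, 4);
    assert!(!p.is_null());
    // A second call returns the same allocation via the lock-free fast path.
    assert_eq!(p, v.get_or_alloc_bucket(3, 4));
    println!("ok");
}
```

Since a thread only ever inserts itself once, the mutex sits entirely off the hot lookup path.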
Considering how fast a lookup is now, we probably don't need `CachedThreadLocal` any more. I would suggest just turning it into a typedef for `ThreadLocal` with a deprecation notice.
Ok, I've deprecated it.
Once again, great work!
```
@@ -375,30 +321,42 @@ impl<T: Send + UnwindSafe> UnwindSafe for ThreadLocal<T> {}

struct RawIter<T: Send> {
    remaining: usize,
    buckets: [*const UnsafeCell<Option<T>>; BUCKETS],
```
I missed this during my initial review: I would strongly prefer if the iterator didn't make a whole copy of the buckets and instead just accessed the bucket array by reference.
@Amanieu That would be a bit difficult to implement, because the buckets may be moved in memory in `IntoIter`, so we can't hold a pointer to them. We would have to either box the `ThreadLocal` in `IntoIter` or remove the `RawIterMut` abstraction entirely.
`RawIter` doesn't actually need to implement `Iterator`. It can use inherent methods instead, where the pointer to the array of buckets is passed as a parameter.

Running the benchmarks, this increases performance from 5 ns/iter to 2 ns/iter on my machine. As a side effect, this fixes the UB in #21.
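A sketch of that shape (simplified to safe code with `Vec<Option<T>>` buckets instead of raw pointers; the names are illustrative, not the crate's): the iterator owns only its cursor state, and the bucket array is borrowed per call, so no pointer can dangle when the container moves.

```rust
// Hypothetical sketch: RawIter holds only cursor state; the caller
// passes the bucket array into an inherent `next` method each time,
// instead of the iterator copying or pointing into the buckets.
struct RawIter {
    bucket: usize,    // which bucket we are scanning
    index: usize,     // position within the current bucket
    remaining: usize, // occupied slots left to yield
}

impl RawIter {
    // Inherent method (deliberately NOT the Iterator trait): the
    // returned reference borrows from `buckets`, not from `self`.
    fn next<'a, T>(&mut self, buckets: &'a [Vec<Option<T>>]) -> Option<&'a T> {
        while self.remaining > 0 && self.bucket < buckets.len() {
            let b = &buckets[self.bucket];
            if self.index < b.len() {
                let i = self.index;
                self.index += 1;
                if let Some(v) = &b[i] {
                    self.remaining -= 1;
                    return Some(v); // skip unoccupied (None) slots
                }
            } else {
                // Exhausted this bucket; move to the next one.
                self.bucket += 1;
                self.index = 0;
            }
        }
        None
    }
}

fn main() {
    // Two buckets with one empty slot, as the bucket scheme produces.
    let buckets = vec![vec![Some(1)], vec![None, Some(2)]];
    let mut iter = RawIter { bucket: 0, index: 0, remaining: 2 };
    let mut out = Vec::new();
    while let Some(v) = iter.next(&buckets) {
        out.push(*v);
    }
    assert_eq!(out, [1, 2]);
    println!("ok");
}
```

Because each call re-borrows the array, `IntoIter` can keep the buckets by value and lend them to the same cursor without any self-referential pointer.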