
Use a list instead of a hash map #22

Merged
merged 9 commits into Amanieu:master on Jan 7, 2021

Conversation

@Kestrer (Contributor) commented Jan 2, 2021

Running the benchmarks, this increases performance from 5ns/iter to 2ns/iter on my machine.

As a side effect this fixes the UB in #21.

@Amanieu (Owner) left a comment

Have you considered using a concurrent vector instead of a linked list? I would expect even better performance.

The basic idea is pretty simple: you have a `[AtomicPtr; 65]` which points to arrays of varying length. The first has 1 element, the second has 1, the third has 2, the fourth has 4, etc. (`array[N]` has length `2^(N-1)`).

You can map an index to a bucket with a simple calculation: `bucket = usize::BITS - index.leading_zeros()`. Simply create the array at that bucket if it doesn't exist yet.
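For concreteness, the mapping above can be sketched as a small function. This is an illustrative sketch of the layout described in the comment (bucket 0 and bucket 1 each hold 1 element, bucket N holds 2^(N-1) elements), not the crate's actual code; the function name `bucket_of` is hypothetical.

```rust
// Sketch of the index -> bucket mapping described above. Assumes the
// layout where bucket 0 and bucket 1 each hold 1 element and bucket N
// (N >= 1) holds 2^(N-1) elements.
fn bucket_of(index: usize) -> (usize, usize) {
    // bucket = usize::BITS - index.leading_zeros(), exactly as suggested.
    let bucket = (usize::BITS - index.leading_zeros()) as usize;
    if bucket == 0 {
        // Index 0 lives alone in bucket 0.
        (0, 0)
    } else {
        // Bucket N starts at index 2^(N-1), so the offset within the
        // bucket is the index with its top set bit cleared.
        let bucket_start = 1usize << (bucket - 1);
        (bucket, index - bucket_start)
    }
}

fn main() {
    assert_eq!(bucket_of(0), (0, 0));
    assert_eq!(bucket_of(1), (1, 0));
    assert_eq!(bucket_of(3), (2, 1)); // bucket 2 covers indices 2..4
    assert_eq!(bucket_of(7), (3, 3)); // bucket 3 covers indices 4..8
    println!("bucket mapping checks out");
}
```

Note that the bucket index never exceeds `usize::BITS`, which is why 65 `AtomicPtr`s suffice on a 64-bit target.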

(Review comment on src/thread_id.rs — outdated, resolved.)
@Kestrer (Contributor, Author) commented Jan 4, 2021

OK, I have implemented three versions:

  • The first (commit 0b0830e) uses a linked list of vectors.
  • The second (commit 34edea0) uses a concurrent vector, but still uses a single mutex for concurrent accesses.
  • The third (commit 4a4c93c) also uses a concurrent vector, and stores a Once with every single bucket to increase concurrency - there is no longer a single mutex.

I haven't benchmarked them yet; I will tomorrow. I like the concurrent vector approach because it would allow us to implement #19. It also lets us avoid boxing the inner types, as their location won't change anyway.
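For orientation, the "concurrent vector with mutex" shape (the second version) might look roughly like this. This is a simplified sketch, not the code in commit 34edea0: it stores plain `i64`s, uses a small bucket count, and omits the per-thread ID machinery. Reads are lock-free `AtomicPtr` loads; the mutex only serializes bucket allocation.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};
use std::sync::Mutex;

const BUCKETS: usize = 8; // 65 in the layout discussed above; smaller here

// Simplified sketch of the "concurrent vector with mutex" variant.
struct ConcurrentVec {
    buckets: [AtomicPtr<i64>; BUCKETS],
    lock: Mutex<()>,
}

impl ConcurrentVec {
    fn new() -> Self {
        Self {
            buckets: std::array::from_fn(|_| AtomicPtr::new(ptr::null_mut())),
            lock: Mutex::new(()),
        }
    }

    fn bucket_len(bucket: usize) -> usize {
        1usize << bucket.saturating_sub(1)
    }

    // Fast path: lock-free read. A bucket pointer, once published,
    // never moves, so element addresses are stable.
    fn get(&self, bucket: usize, offset: usize) -> Option<i64> {
        let p = self.buckets[bucket].load(Ordering::Acquire);
        if p.is_null() { None } else { Some(unsafe { *p.add(offset) }) }
    }

    // Slow path: take the mutex and allocate the bucket, unless another
    // thread already did so while we waited for the lock.
    fn ensure_bucket(&self, bucket: usize) -> *mut i64 {
        let _guard = self.lock.lock().unwrap();
        let existing = self.buckets[bucket].load(Ordering::Acquire);
        if !existing.is_null() {
            return existing;
        }
        let boxed = vec![0i64; Self::bucket_len(bucket)].into_boxed_slice();
        let raw = Box::into_raw(boxed) as *mut i64;
        self.buckets[bucket].store(raw, Ordering::Release);
        raw
    }
}

impl Drop for ConcurrentVec {
    fn drop(&mut self) {
        // Free every bucket that was allocated.
        for (i, b) in self.buckets.iter_mut().enumerate() {
            let p = *b.get_mut();
            if !p.is_null() {
                let len = Self::bucket_len(i);
                unsafe {
                    drop(Box::from_raw(ptr::slice_from_raw_parts_mut(p, len)));
                }
            }
        }
    }
}

fn main() {
    let v = ConcurrentVec::new();
    assert_eq!(v.get(3, 2), None);
    let p = v.ensure_bucket(3); // bucket 3 holds 4 elements
    unsafe { *p.add(2) = 42 };
    assert_eq!(v.get(3, 2), Some(42));
    println!("concurrent vector sketch works");
}
```

The key property is that buckets are never reallocated or moved once published, so a reference handed out to one thread stays valid even while other threads insert.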

@Kestrer (Contributor, Author) commented Jan 5, 2021

I've done the benchmarks; these are the results:

| Benchmark | Linked list of vectors | Concurrent vector with mutex | Concurrent vector with Once |
|---|---|---|---|
| Get | 2.0ns | 2.4ns | 2.5ns |
| Get cached | 2.0ns | 2.3ns | 2.3ns |
| Get cached (second thread) | 2.4ns | 2.8ns | 2.8ns |
| Insert | 52ns | 38ns | 26ns |
| Insert cached | 37ns | 48ns | 47ns |

It's a bit odd that the concurrent vector approach would affect the cached performance so much - I suppose it's because I'm passing around four usizes (id, bucket, bucket size, index) instead of one?

@Amanieu (Owner) left a comment

Great work!

I'm concerned about adding a Once to each bucket since this will drastically increase the size of ThreadLocal. Keep in mind that the bucket array is directly inlined into the type so the consequences may be significant.

Because of this, I'm tending towards the second version (Concurrent vector with mutex). Insertions are relatively rare (only once per thread) so I don't expect much contention on the mutex.

Considering how fast a lookup is now, we probably don't need CachedThreadLocal any more. I would suggest just turning it into a typedef to ThreadLocal with a deprecation notice.

(Review comment on src/lib.rs — resolved.)
@Kestrer (Contributor, Author) commented Jan 7, 2021

Ok, I've deprecated CachedThreadLocal. I didn't make it a type alias because it's possible that someone has implemented a trait for both ThreadLocal and CachedThreadLocal, and we don't want to break anything.
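The wrapper approach described here can be sketched as follows. `ThreadLocal` below is a stand-in stub (a single shared slot rather than one per thread), and the `get_or` signature is simplified from the crate's real API; the point is only the shape of the deprecation: a distinct type, not an alias, with every method forwarding to the inner `ThreadLocal`.

```rust
use std::sync::Mutex;

// Stand-in stub for the real ThreadLocal; the actual type stores one
// slot per thread rather than a single shared slot.
pub struct ThreadLocal<T: Send> {
    slot: Mutex<Option<T>>,
}

impl<T: Send + Clone> ThreadLocal<T> {
    pub fn new() -> Self {
        Self { slot: Mutex::new(None) }
    }

    pub fn get_or(&self, create: impl FnOnce() -> T) -> T {
        self.slot.lock().unwrap().get_or_insert_with(create).clone()
    }
}

// Kept as its own type (not a type alias), so downstream trait impls
// written separately for ThreadLocal and CachedThreadLocal keep
// compiling; every method simply forwards to the inner ThreadLocal.
#[deprecated(note = "use `ThreadLocal` instead")]
pub struct CachedThreadLocal<T: Send> {
    inner: ThreadLocal<T>,
}

#[allow(deprecated)]
impl<T: Send + Clone> CachedThreadLocal<T> {
    pub fn new() -> Self {
        Self { inner: ThreadLocal::new() }
    }

    pub fn get_or(&self, create: impl FnOnce() -> T) -> T {
        self.inner.get_or(create)
    }
}

#[allow(deprecated)]
fn main() {
    let cached = CachedThreadLocal::new();
    assert_eq!(cached.get_or(|| 5), 5);
    assert_eq!(cached.get_or(|| 99), 5); // already initialized
    println!("deprecated wrapper forwards correctly");
}
```

A type alias would have forced the two names to be the same type, which is exactly what could break a downstream crate with separate impls for each.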

@Amanieu Amanieu merged commit 1ac832a into Amanieu:master Jan 7, 2021
@Amanieu (Owner) commented Jan 7, 2021

Once again, great work!

```diff
@@ -375,30 +321,42 @@ impl<T: Send + UnwindSafe> UnwindSafe for ThreadLocal<T> {}

 struct RawIter<T: Send> {
     remaining: usize,
     buckets: [*const UnsafeCell<Option<T>>; BUCKETS],
```
@Amanieu (Owner):
I missed this during my initial review: I would strongly prefer if the iterator didn't make a whole copy of the buckets and instead just accessed the bucket array by reference.

@Kestrer (Contributor, Author):

@Amanieu That would be a bit difficult to implement because the buckets may be moved in memory in IntoIter, so we can't hold a pointer to them. We would have to either box the ThreadLocal in IntoIter or remove the RawIterMut abstraction entirely.

@Amanieu (Owner):

RawIter doesn't actually need to implement Iterator. It can use inherent methods instead where the pointer to the array of buckets is passed as a parameter.
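That suggestion might look roughly like this. The sketch is simplified (buckets of plain `i32`s rather than `UnsafeCell<Option<T>>`, a fixed small bucket count) and the names are illustrative, not the crate's final code:

```rust
const BUCKETS: usize = 4;

// Sketch of the suggestion above: RawIter stores only its cursor, not a
// copy of (or pointer into) the bucket array. The caller passes the
// bucket array by reference on every step, so nothing dangles if the
// owning value moves between calls (as happens in IntoIter).
struct RawIter {
    bucket: usize,
    index: usize,
}

impl RawIter {
    fn new() -> Self {
        RawIter { bucket: 0, index: 0 }
    }

    // Inherent method instead of Iterator::next: the bucket array is a
    // parameter, so RawIter itself never holds a pointer to it.
    fn next(&mut self, buckets: &[Vec<i32>; BUCKETS]) -> Option<i32> {
        while self.bucket < BUCKETS {
            let b = &buckets[self.bucket];
            if self.index < b.len() {
                let value = b[self.index];
                self.index += 1;
                return Some(value);
            }
            self.bucket += 1;
            self.index = 0;
        }
        None
    }
}

fn main() {
    let buckets = [vec![1], vec![2], vec![], vec![3, 4]];
    let mut iter = RawIter::new();
    let mut out = Vec::new();
    while let Some(v) = iter.next(&buckets) {
        out.push(v);
    }
    assert_eq!(out, vec![1, 2, 3, 4]);
    println!("raw iter sketch works");
}
```

The public `Iter`/`IterMut`/`IntoIter` wrappers can then implement `Iterator` themselves, passing their own borrow of the buckets into these inherent methods on each call.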
