Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for other underlying smart pointers for the data structures links #7

Closed
brendanzab opened this issue Jan 30, 2018 · 17 comments
Closed

Comments

@brendanzab
Copy link

It would be nice to either use Box or Rc as the underlying smart pointer type. Alas, until we have associated generic types, we can't be polymorphic over this, but perhaps in the interim code generation could be used to create concrete implementations of:

  • ArcList
  • RcList
  • BoxList
  • etc.

Then once associated generic types comes we could convert these to type aliases that refer to a common List type.

@orium
Copy link
Owner

orium commented Feb 1, 2018

We can't do Box since we need to be able to have multiple pointers to the underlying data (due to structural sharing).

But this is something I had in mind for a while, because of performance reasons. Why pay the price of having thread-safe reference counting when you do not need it (which is probably most of the time)? And there is a significant price to pay. For example, the HashTrieMap::insert() and HashTrieMap::remove() were more than twice as fast with Rc than with Arc!

So this is something that has a lot of value, however I really dislike the idea of solving this through code generation, specially since there are plans to support higher kinded types that would solve this nice and neatly. When rust supports them I will implement this.

@Marwes
Copy link
Contributor

Marwes commented Feb 2, 2018

If you have a stomach for hacks it is sort of possible to emulate the HKT that are needed for this to work :).

use std::rc::Rc;
use std::sync::Arc;
use std::ops::Deref;

trait Shared<T> {
    type Ptr: Clone + Deref<Target = T>;
}

impl<T> Shared<T> for Rc<()> {
    type Ptr = Rc<T>;
}

impl<T> Shared<T> for Arc<()> {
    type Ptr = Arc<T>;
}

struct List<T, S> where S: Shared<T> {
    field: S::Ptr,
}

struct Map<K, V, S> where S: Shared<K> + Shared<V> {
    key: <S as Shared<K>>::Ptr,
    value: <S as Shared<V>>::Ptr,
}

fn main() {
    Map::<i32, &str, Rc<()>> {
        key: Rc::new(1),
        value: Rc::new(""),
    };
}

https://play.rust-lang.org/?gist=8b67f30a72e00dc3c2e3865bb31ae797&version=stable

@Marwes
Copy link
Contributor

Marwes commented Mar 9, 2018

Hacked together #7 (comment) at https://github.com/Marwes/rpds/tree/rc . The speedups aren't very impressive so I am not sure if the added complexity is worth it.

running 8 tests
test vector_drop_last        ... bench:   8,240,820 ns/iter (+/- 1,384,333)
test vector_drop_last_mut    ... bench:     424,192 ns/iter (+/- 35,384)
test vector_get              ... bench:      83,473 ns/iter (+/- 6,715)
test vector_iterate          ... bench:      80,264 ns/iter (+/- 14,550)
test vector_push_back        ... bench:   9,929,261 ns/iter (+/- 197,915)
test vector_push_back_mut    ... bench:     963,687 ns/iter (+/- 176,121)
test vector_push_back_mut_rc ... bench:     847,343 ns/iter (+/- 126,324)
test vector_push_back_rc     ... bench:   7,367,774 ns/iter (+/- 582,476)

Marwes added a commit to Marwes/rust that referenced this issue Mar 10, 2018
Were seeing some odd performance problems when using incremental compilation where `Rc` pointers were actually slower than `Arc` pointers (the problem goes away when using non-incremental compilation). I haven't been able to build rustc locally to verify that this fixes it but these missing inline annotations seem to be the only thing that could affect performance (to this extent).

```
test vector_push_back        ... bench:  11,668,015 ns/iter (+/- 772,861)
test vector_push_back_mut    ... bench:   1,423,771 ns/iter (+/- 22,011)
test vector_push_back_mut_rc ... bench:   1,181,765 ns/iter (+/- 123,724)
test vector_push_back_rc     ... bench:  17,141,746 ns/iter (+/- 203,048)
```
(Source and non incremental benchmarks orium/rpds#7 (comment))
bors added a commit to rust-lang/rust that referenced this issue Mar 10, 2018
Add missing inline annotations to Cell

Were seeing some odd performance problems when using incremental compilation where `Rc` pointers were actually slower than `Arc` pointers (the problem goes away when using non-incremental compilation). I haven't been able to build rustc locally to verify that this fixes it but these missing inline annotations seem to be the only thing that could affect performance (to this extent).

```
test vector_push_back        ... bench:  11,668,015 ns/iter (+/- 772,861)
test vector_push_back_mut    ... bench:   1,423,771 ns/iter (+/- 22,011)
test vector_push_back_mut_rc ... bench:   1,181,765 ns/iter (+/- 123,724)
test vector_push_back_rc     ... bench:  17,141,746 ns/iter (+/- 203,048)
```
(Source and non incremental benchmarks orium/rpds#7 (comment))
@orium
Copy link
Owner

orium commented Mar 12, 2018

I measure a better speedup:

 name              bench-arc ns/iter  bench-rc ns/iter  diff ns/iter   diff %  speedup 
 vector_drop_last  81,000,464         43,920,265         -37,080,199  -45.78%   x 1.84 
 vector_push_back  84,319,145         47,405,064         -36,914,081  -43.78%   x 1.78 

but even with this significant speedup I still don't think it is worth it, given the "hacky nature" of the solution.

If everything goes well we should have generic associated types in rust by the end of this year (at least in nightly) and we will be able to have an elegant solution.

@Marwes
Copy link
Contributor

Marwes commented Mar 12, 2018

I'd be (pleasantly) surprised if generic associated types appear this year though you may be right that this is a bit to hacky.

That said, I fear that generic associated types will barely simplify anything. The main problem with this implementation is that deriving does not work when storing the associated type of a type parameter (deriving(Debug) gives us ERROR: P::Ptr does not implement Debug). However that will still be a problem with an implementation using generic associated types since that also entails using P::Ptr for the fields (https://github.com/rust-lang/rfcs/blob/master/text/1598-generic_associated_types.md#associated-type-constructors-of-type-arguments).

The only actual benefit of generic associated types is that we don't have to specify bounds explicitly for every type that is behind a pointer https://github.com/Marwes/rpds/blob/e482d5abbaa6c876d7c624e497affe7299bbeece/src/sequence/vector/mod.rs#L153.

@brendanzab
Copy link
Author

Would top-level higher kinded types solve this?

@Marwes
Copy link
Contributor

Marwes commented Mar 13, 2018

Would top-level higher kinded types solve this?

Maybe? This mostly depends on how the bounds in deriving gets specified. Something like below would be necessary.

#[derive(Debug)]
struct List<T, P<_>> {
}
// Generates
impl<T, P> Debug for List<T, P<_>> where for<U> P<U>: Debug { }

However that may actually be to broad of a bound in some case since we only need "for<U> where P<U> is a field of List" P<U>: Debug (basically the two bounds that are explicitly specified now).

rust-lang/rfcs#2353 might help with this issue though.

@Marwes
Copy link
Contributor

Marwes commented Mar 13, 2018

I guess for<U> P<U>: Debug ought to be fine in this case though and it is impossible to generate "perfect" bounds anyway. (Something to do with self-recursive types).

@Marwes
Copy link
Contributor

Marwes commented Mar 30, 2018

Semi-related, it would be nice to omit the reference counted pointer for the values themselves and just store values as-is. For small types such as plain integers or floats as well as types which are already reference counted the Arc just incurs additional overhead.

@Marwes
Copy link
Contributor

Marwes commented Apr 1, 2018

Actually, if the keys and values are stored unboxed I think a less hacky solution may be used. I looked through a few of the structures (not all), and it seems that only a single type is boxed in an Arc per type. So it would be enough to just parameterize on that type directly, avoiding the whole mess of associated types and manually implementing derive's.

struct HashTrieMap<K, V, Entry> {
    ...
    entry: Entry,
}

pub trait Shared: Deref {
     fn new(t: Self::Target) -> Self;
     fn make_mut(self_: &mut Self) -> &mut Self::Target;
}

impl<K, V, Entry> HashTrieMap<K, V, Entry> where Entry: Shared<Target = Entry<K, V>> { .. }

pub type ArcHashTrieMap<K, V> = HashTrieMap<K, V, Arc<Entry<K, V>>>;
pub type RcHashTrieMap<K, V> = HashTrieMap<K, V, Rc<Entry<K, V>>>;

@orium orium changed the title Support for other underlying smart pointers Support for other underlying smart pointers for the data structures links Jul 8, 2018
@orium
Copy link
Owner

orium commented Jul 8, 2018

Splited this in two issues, one for to control the container of the elements of the data structure, which can be Rc, Arc, and owned (using clone).

This one is now just for the links inside between the data structures nodes.

@orium
Copy link
Owner

orium commented Mar 6, 2019

I've implemented this for List:

pub struct List<T, P = SharedPointerKindRc>

This uses Archery, a project I created recently to be able to abstract from Rc/Arc easily and with good ergonomics. Take a look if you are interested.

@orium
Copy link
Owner

orium commented Mar 6, 2019

Work in progress:

  • List
  • Stack
  • Queue
  • Vector
  • HashTrieMap
  • HashTrieSet
  • RedBlackTreeMap
  • RedBlackTreeSet
  • Add info to the README.md and maybe to the documentation of the data structures.

@Marwes
Copy link
Contributor

Marwes commented Mar 6, 2019

Nice! I was wondering when this would show up after seeing Archery :)

@Marwes
Copy link
Contributor

Marwes commented Mar 6, 2019

I may end up stealing the ideas in Archery and try to do something similar for RefCell/RwLock/Mutex

@orium
Copy link
Owner

orium commented Mar 6, 2019

I may end up stealing the ideas in Archery and try to do something similar for RefCell/RwLock/Mutex

Awesome. Consider adding them to archery if they are general/flexible enough. In my mind I want archery to be more than just abstraction between Rc/Arc. For instance I want to add some sort of container that can be either owning T, or having a SharedPointer<T, Rc/Arc>.

orium added a commit that referenced this issue Mar 6, 2019
orium added a commit that referenced this issue Mar 6, 2019
@orium
Copy link
Owner

orium commented Sep 11, 2019

Release 0.7.0 where you can choose between Rc and Arc pointer. Note that #36 still stands.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants