
Iterators generating (many) vectors #722

Closed
Philippe-Cholet opened this issue Aug 1, 2023 · 8 comments

@Philippe-Cholet
Member

Problem: slow because of repetitive heap-allocation

The 5 iterators below generate Vec<...> items.

multi_cartesian_product(self)
combinations(self, k: usize)
combinations_with_replacement(self, k: usize)
permutations(self, k: usize)
powerset(self)

This can be slow because of the repeated allocations, even when the user does not need to permanently store each generated vector.

I think that in most cases, the user uses each vector to produce something else and then the vector is discarded. Because those iterators can generate a huge number of items (the product of the lengths, factorial-sized counts, powers of two), this creates many temporary vectors, which can slow down code execution.

Proposed solution: map slices

An alternative iterator could own a single (internal/private) vector and only pass a reference to it to a function implementing FnMut(&[...]) -> R. Such iterators would then yield R items instead of vectors.

// current methods unchanged, new methods:
multi_cartesian_product_map<R, F>(self, func: F)
combinations_map<R, F>(self, k: usize, func: F)
combinations_with_replacement_map<R, F>(self, k: usize, func: F)
permutations_map<R, F>(self, k: usize, func: F)
powerset_map<R, F>(self, func: F)
// all with  F: FnMut(&[...]) -> R

Usage examples

it.combinations(k) and it.combinations_map(k, ToOwned::to_owned) would produce the same elements, but the first should be preferred in that case!

it.combinations(k).map(closure) and it.combinations_map(k, closure) would produce the same elements if the closure does not require an owned vector. From what I tested, the _map version is then roughly 4 to 5 times faster to go through all combinations, unless the iterator has very few elements (in which case .map should be used).

In it.permutations(5).filter_map(|perm: Vec<_>| some_option), if some_option does not need perm to be an owned vector, then it could become it.permutations_map(5, |perm: &[_]| some_option).flatten(), with no heap allocation for each permutation.
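
For illustration, a minimal sketch of the second example above. Only the first half compiles against itertools today; combinations_map is hypothetical, as proposed in this issue.

use itertools::Itertools;

fn main() {
    // Today: every combination is materialized as a fresh Vec, even though
    // we only need a value computed from it.
    let total: u64 = (1u64..=20)
        .combinations(10)
        .map(|combo: Vec<u64>| combo.iter().sum::<u64>())
        .sum();
    println!("{total}");

    // Proposed (hypothetical): same result, but one internal buffer is
    // reused for every combination, so no per-item heap allocation.
    // let total: u64 = (1u64..=20)
    //     .combinations_map(10, |combo: &[u64]| combo.iter().sum::<u64>())
    //     .sum();
}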

Implementation (commits here)

New module vec_items (behind use_alloc feature) with:

  • pub trait VecItems<T> { type Output; fn new_item<I: Iterator<Item = T>>(&mut self, iter: I) -> Self::Output; }
  • pub struct CollectToVec; implementing VecItems with Vec<T> output and just iter.collect() (so a new vector each time, as it's currently done).
  • pub struct MapSlice<F, T> { func: F, vec: Vec<T> } implementing VecItems with R output where F: FnMut(&[T]) -> R and the internal vector is cleared but not dropped after use (and its length will not change often if ever).
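
For concreteness, here is a minimal sketch of that module as described above (the actual commits may differ in details):

pub trait VecItems<T> {
    type Output;
    fn new_item<I: Iterator<Item = T>>(&mut self, iter: I) -> Self::Output;
}

/// Collects into a fresh vector for each item (the current behavior).
pub struct CollectToVec;

impl<T> VecItems<T> for CollectToVec {
    type Output = Vec<T>;
    fn new_item<I: Iterator<Item = T>>(&mut self, iter: I) -> Vec<T> {
        iter.collect()
    }
}

/// Reuses one internal vector and maps a slice of it to `R`.
pub struct MapSlice<F, T> {
    func: F,
    vec: Vec<T>,
}

impl<T, R, F: FnMut(&[T]) -> R> VecItems<T> for MapSlice<F, T> {
    type Output = R;
    fn new_item<I: Iterator<Item = T>>(&mut self, iter: I) -> R {
        self.vec.clear();      // keep the allocation across items
        self.vec.extend(iter); // the length rarely changes between items
        (self.func)(&self.vec)
    }
}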

E.g. for combinations_map:

  • In Itertools trait: #[cfg(feature = "use_alloc")] fn combinations_map<R, F: FnMut(&[Self::Item]) -> R>(self, k: usize, f: F) -> CombinationsMap<Self, F> where ...
  • in src/combinations.rs:
    • Convert Combinations<I> to CombinationsBase<I, F> with the new field manager: F
    • New (fused) iterator constraint F: VecItems<I::Item> and item type F::Output
    • New pub type Combinations<I> = CombinationsBase<I, CollectToVec>;
    • New pub type CombinationsMap<I, F> = CombinationsBase<I, MapSlice<F, <I as Iterator>::Item>>;
    • New fn combinations_map<I, F>(iter: I, k: usize, f: F) -> CombinationsMap<I, F>
  • Test to compare with combinations.

Very similar for the 4 others. powerset[_map] is a bit simpler because it just uses combinations[_map] internally; and multi_cartesian_product was simpler than anticipated.
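
Roughly, the shape this takes for combinations, reusing the VecItems, CollectToVec and MapSlice sketches above. Field names other than manager are illustrative (pool stands in for the crate's LazyBuffer), and the index-stepping logic is elided:

pub struct CombinationsBase<I: Iterator, M> {
    iter: I,
    pool: Vec<I::Item>, // stand-in for itertools' internal LazyBuffer<I>
    indices: Vec<usize>,
    first: bool,
    manager: M, // the new field: CollectToVec or MapSlice<F, I::Item>
}

pub type Combinations<I> = CombinationsBase<I, CollectToVec>;
pub type CombinationsMap<I, F> =
    CombinationsBase<I, MapSlice<F, <I as Iterator>::Item>>;

impl<I, M> Iterator for CombinationsBase<I, M>
where
    I: Iterator,
    I::Item: Clone,
    M: VecItems<I::Item>,
{
    // The item type is now chosen by the manager: Vec<I::Item> for
    // CollectToVec, the closure's return type R for MapSlice<F, I::Item>.
    type Item = M::Output;

    fn next(&mut self) -> Option<Self::Item> {
        // Advance `self.indices` to the next combination exactly as the
        // current implementation does (elided); only the materialization
        // of the item changes:
        //     Some(self.manager.new_item(
        //         self.indices.iter().map(|&i| self.pool[i].clone())))
        todo!()
    }
}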

Closing thoughts

Avoid cloning in the MapSlice case? I think(?) it might be possible to store only references: MapSlice { func: F, vec: Vec<&T> }, with F: FnMut(&[&T]) -> R and no cloning. But I struggle with the lifetimes (or, more probably, some hard barrier). And is it worth it? Probably not for plain integers, but maybe otherwise?

The documentation, which I have not written yet, would suggest using the non-_map variants if the user needs owned vectors rather than slices (or if very few elements are expected).

PS: I got the idea while replacing (0..nb).permutations(nb) (after profiling with flamegraph and heaptrack) with my own _map version of permutohedron::Heap, to avoid any (internal) heap allocation while keeping a nice iterator structure. And for 8! == 40320 permutations, that was also 5 times faster.

@phimuemue
Member

Related: #682

@Philippe-Cholet
Member Author

Philippe-Cholet commented Aug 1, 2023

I saw that PR some weeks or months ago but forgot about it; I only remember the crate name and that it seemed quite complex to me. My approach with the VecItems trait is simple, yet quite efficient and flexible enough to build an alternative for any iterator generating vectors, as I did here for all 5 iterators.

@Philippe-Cholet
Member Author

@phimuemue After diving in a bit, I agree that the problem lies with "Iterator" itself and that a "lending iterator" would be a better choice. But I share the opinion that an (evolving) additional dependency should not become a maintenance burden, and that it would be preferable to wait until it reaches "core", if that is even a possibility (I don't know).

I like my workaround but I guess I should close this issue?

@Easyoakland
Contributor

@Philippe-Cholet
I don't think you should close it.

@phimuemue
This sounds similar to rust-lang/rust#94667 and rust-lang/rust#82413. This PR makes sense since it appears to be how this kind of thing is being handled in std. I still think the lending solution is better but it doesn't look like that will happen any time soon.

@phimuemue
Member

@Easyoakland Thanks for the link, I did not know about map_windows.

Still, I am unsure that going this way is the best solution:

  • As it stands, MapSlice populates the whole vector, even if many of the elements would stay in-place.

    Possible (maybe more efficient) alternatives:

    • Track the elements in the result vector, and alter them in accordance with the respective changes in indices.
    • Instead of populating a vector, hand out a proxy that generates the elements on demand.
  • We may be doing duplicate work in the first place: Imagine that someone wants to generate permutations of [0, 1, 2, 3, 4, 5]. The implementation internally generates indices and the resulting Item -- even though the indices themselves would be sufficient in this particular case.

  • For maximum performance, there might be more efficient algorithms (such as Heap's for permutations (which is in permutohedron)).

To sum up: The proposed solution improves performance, but does not offer maximum convenience (it complicates the API), and it is unclear whether the performance could be even better (my guess is "it could be, with a reasonable amount of work, depending on your use case").

As such, it "falls in the middle" (neither the most convenient nor the most performant). If it is impossible (or hard) to offer both convenience and performance, then the API should imho not sacrifice at both ends. Thus, I am reluctant to merge this, and would resort to other, more efficient algorithms if needed.

@Philippe-Cholet Do you happen to have a comparison for your permutations_map vs. your patched permutohedron::Heap_mapped? If Heap_mapped can be significantly faster than permutations_map, then I'd interpret this as a sign that having and maintaining permutations_map is questionable.

@Philippe-Cholet
Member Author

Philippe-Cholet commented Aug 8, 2023

@phimuemue
I agree it's in the middle, as you say. I think it remains convenient enough (and the old methods remain there unchanged) for a nice speed-up, but it definitely is not perfect (clone + populate the whole vector + clear...), mainly because I did not have to change the internals too much. This workaround was generic enough to modify all 5 iterators quite easily. Without it, I guess I would have to discard the lazy buffer for something else, and do different things for each iterator.

I did not compare permutations_map with "permutohedron::Heap_mapped" because the permutations are generated in different orders: Heap's algorithm is designed to be the fastest, while permutations[_map] create permutations in the natural order. Comparing them did not seem fair. But it might tell us something.
... Well, it does tell us there is real room for improvement for permutations_map:

Timings for permutations of (0..10).collect_vec():

  • 235 ms: permutohedron::Heap::new(&mut vec).map(|_| 0).last()
  • 324 ms: permutations(vec.into_iter(), 10).map(|_| 0).last()
  • 8.08 ms: heap::permutations_map(&mut vec, |_| 0).last() (permutohedron::Heap wrapped by me)
  • 87.6 ms: permutations_map(vec.into_iter(), 10, |_| 0).last() (MapSlice way)

The first two entries tell us how much faster Heap's algorithm is than generating the natural order; both allocate a vector for each item, while the last two do not.
I guess we could have expected permutations_map to be roughly 8 times faster. I did not think it would be that bad.

Well, I'm closing this. I'll reopen only if I try to really speed it up.

EDIT: It was a nice exercise, and I learned that if an API is not the most convenient, it has to be really fast; no compromise.

@phimuemue
Member

@Philippe-Cholet Wow, thanks a lot for your effort.

I agree with all your conclusions.

@Philippe-Cholet
Member Author

For permutations (and the 4 others), we could discard my Itertools::permutations_map<R, F> and instead add a method map_slices<R, F> that converts Permutations<I> into PermutationsMap<I, F> to do the same job as before (sketched below).
It's just a nicer API in my opinion, with the exact same performance (not great yet, but that might eventually change).

Naming it map would be a nasty breaking change because it would shadow Iterator::map while having quite a different signature.
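
A self-contained toy illustrating that API shape, with stand-in names (a hypothetical Pairs adaptor instead of Permutations; nothing here is itertools code):

struct Pairs<T> {
    pool: Vec<T>,
    i: usize,
    j: usize,
}

impl<T: Clone> Pairs<T> {
    fn new(pool: Vec<T>) -> Self {
        Self { pool, i: 0, j: 1 }
    }

    // The proposed shape: consume the adaptor, keep its state, and only
    // change how each item is materialized.
    fn map_slices<R, F: FnMut(&[T]) -> R>(self, f: F) -> PairsMap<T, F> {
        PairsMap { inner: self, buf: Vec::with_capacity(2), f }
    }

    // Shared index-stepping logic (the analogue of the unchanged internals).
    fn step(&mut self) -> Option<(usize, usize)> {
        if self.j >= self.pool.len() {
            self.i += 1;
            self.j = self.i + 1;
        }
        if self.j >= self.pool.len() {
            return None;
        }
        let out = (self.i, self.j);
        self.j += 1;
        Some(out)
    }
}

// The plain adaptor: one fresh allocation per item, like permutations today.
impl<T: Clone> Iterator for Pairs<T> {
    type Item = Vec<T>;
    fn next(&mut self) -> Option<Vec<T>> {
        let (i, j) = self.step()?;
        Some(vec![self.pool[i].clone(), self.pool[j].clone()])
    }
}

// The mapped adaptor: one reused buffer, yields f(&[T]) instead of Vec<T>.
struct PairsMap<T, F> {
    inner: Pairs<T>,
    buf: Vec<T>,
    f: F,
}

impl<T: Clone, R, F: FnMut(&[T]) -> R> Iterator for PairsMap<T, F> {
    type Item = R;
    fn next(&mut self) -> Option<R> {
        let (i, j) = self.inner.step()?;
        self.buf.clear();
        self.buf.push(self.inner.pool[i].clone());
        self.buf.push(self.inner.pool[j].clone());
        Some((self.f)(&self.buf))
    }
}

fn main() {
    let sums: Vec<u32> = Pairs::new(vec![1, 2, 3])
        .map_slices(|pair| pair.iter().sum())
        .collect();
    assert_eq!(sums, [3, 4, 5]);
}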
