Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More generic maps (in GroupingMap) #901

Draft
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

Philippe-Cholet
Copy link
Member

@Philippe-Cholet Philippe-Cholet commented Mar 12, 2024

At the moment, this is only a proof of concept, but it would close #588 (and eventually solve some issues, see generic-container Generic vector/hasher/map ).

  • I wrote a straightforward private trait generalizing both BTreeMap and HashMap (with any hasher).
    Maybe too much methods, or not enough. I will adjust it.
    I don't think we should generalize to more than those two maps but I guess it's possible.
  • Changed use_std to use_alloc where possible (because BTreeMap only needs use_alloc while HashMap needs use_std) and relaxed some Eq + Hash bounds.
  • I added a alternative aggregate_in to aggregate which takes one more argument:
    any empty map (or one we would .clear()).

The last point is where I struggled the most but that's pretty much the only real possibility because there is so many possibilities of initializations: BTreeMap::new(), HashMap::new(), HashMap::default() for some hashers, and with_capacity shenanigans. And I think it's the most general solution (I can even imagine one single HashMap being used multiple times to avoid reallocations).

This could extend to other areas than just GroupingMap, and a Set trait could be wrote to generalize BTreeSet and HashSet but it's for another PR.

For now, I would like some feedback.

CI currently fails because of temporary unused imports.

@Philippe-Cholet Philippe-Cholet added the generic-container Generic vector/hasher/map label Mar 12, 2024
@Philippe-Cholet
Copy link
Member Author

Currently, we can do it.into_grouping_map().product(); but we can't have a generic map.

let my_map = HashMap::new();
let my_map = HashMap::with_capacity(100);
let my_map = FxHashMap::default(); // rustc_hash
let mut my_map = FxHashMap::default(); my_map.reserve(100);
let my_map = BTreeMap::new();

But where to give the map? (without breaking change)

1. Initial idea: suffix all GroupingMap methods with _in. But it results in many weird names like minmax_by_key_in.

it.into_grouping_map().product_in(my_map);
//                            --- ------

For each GroupingMap method like product, old it.into_grouping_map().product(); would delegate to new it.into_grouping_map().product_in(HashMap::new());.

2. Suffix Itertools methods. The trait would be a bit more crowded. But only two redirections.

it.into_grouping_map_in(my_map).product();
//                  --- ------

Old it.into_grouping_map(); would delegate to new it.into_grouping_map_in(HashMap::new());.
Similar for into_grouping_map_by -> into_grouping_map_by_in but that's it.

That would be similar to other possible extensions like Itertools::counts_in and Itertools::all_unique_in.

3. A bit nested. But no _in suffix.

it.into_grouping_map().with_map(my_map).product();
//                    -----------------

For each GroupingMap method like product, old it.into_grouping_map().product(); would delegate to new it.into_grouping_map().with_map(HashMap::new()).product();.

@phimuemue
Copy link
Member

Hi @Philippe-Cholet, two thoughts:

  • Does into_grouping_map play nicely with type inference? (Currently, GroupingMap returns maps with different value types, e.g. minmax returns MinMaxResult<V>s, but max returns Vs.) If so, I would go with it (offers a central place to supply a map and generalizes well to counts et al).
  • I know that for many day-to-day cases, a plain old (possibly sorted) Vec<(K, V)> can be more efficient than {BTree|Hash}Map. Thus, I think we should ensure that trait Map could support this type.

@Philippe-Cholet
Copy link
Member Author

Philippe-Cholet commented Mar 13, 2024

@phimuemue
I see your performance point about Vec<(K, V)>. It can always be added later (in the next PR, or any release) as we would not break anything by implementing Map for a new type.

Well, silly me...!! I don't know how it could work in a central place (cases 2 and 3 in my last comment).
We would create a new struct GroupingMapIn that GroupingMap would alias, right?
pub type GroupingMap<I> = GroupingMapIn<I, HashMap<_, _>>;... Simply blocked! The first one would be <I as Iterator>::Item.0 (which I can't write) and the second is still unknown since it's the method using it that will determine it...
Then only remain my initial idea (case 1) method_name_in<..., M: Map<Key = ..., Value = ...>>(..., map: M) -> M.

Copy link

codecov bot commented Mar 15, 2024

Codecov Report

Attention: Patch coverage is 95.34368% with 21 lines in your changes are missing coverage. Please review.

Project coverage is 94.43%. Comparing base (6814180) to head (5a71e63).
Report is 40 commits behind head on master.

Files Patch % Lines
src/generic_containers.rs 54.54% 20 Missing ⚠️
src/grouping_map.rs 99.75% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #901      +/-   ##
==========================================
+ Coverage   94.38%   94.43%   +0.04%     
==========================================
  Files          48       49       +1     
  Lines        6665     7187     +522     
==========================================
+ Hits         6291     6787     +496     
- Misses        374      400      +26     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Philippe-Cholet Philippe-Cholet marked this pull request as ready for review March 18, 2024 10:20
Copy link
Member Author

@Philippe-Cholet Philippe-Cholet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add tests using BTreeMap.

EDIT: doctests added.

src/generic_containers.rs Outdated Show resolved Hide resolved
src/grouping_map.rs Outdated Show resolved Hide resolved
@Philippe-Cholet
Copy link
Member Author

@SkiFire13 I see you are the author of GroupingMap. What do you think of such extension?

Copy link
Contributor

@SkiFire13 SkiFire13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, silly me...!! I don't know how it could work in a central place (cases 2 and 3 in my last comment).
We would create a new struct GroupingMapIn that GroupingMap would alias, right?
pub type GroupingMap<I> = GroupingMapIn<I, HashMap<_, _>>;... Simply blocked! The first one would be <I as Iterator>::Item.0 (which I can't write) and the second is still unknown since it's the method using it that will determine it...

You could add two generics (one for the key and one for the value) to GroupingMap (which would feel a bit weird though, but I think this is fundamentally required if we want to move the "in" at the start of the chain) and then restrict the various methods to M: Map<Value = ExpectedValue>.


Overall, this looks good to me. The duplication of the methods is a bit unfortunate though.

Could this be extended to into_group_map[_by] too?

src/generic_containers.rs Outdated Show resolved Hide resolved
src/grouping_map.rs Show resolved Hide resolved
src/generic_containers.rs Show resolved Hide resolved
@Philippe-Cholet
Copy link
Member Author

@SkiFire13

The duplication of the methods is a bit unfortunate though.

I totally agree, I'm gonna investigate your "add two generics" idea.

Could this be extended to into_group_map[_by] too?

Yes, in the next PR. I intend to extend all HashMap usage in the crate (and HashSet later), most generic-container Generic vector/hasher/map issues should be solved with this Map.

Because I want to make all this work first, I'm gonna delay two of your ideas:

  • Optimize remove+insert in GroupingMap::aggregate_in.
  • impl<M: Map> Map for &mut M.

@SkiFire13
Copy link
Contributor

Because I want to make all this work first, I'm gonna delay two of your ideas:

* Optimize `remove+insert` in `GroupingMap::aggregate_in`.

* `impl<M: Map> Map for &mut M`.

Understandable. As long as Map is private this can be changed later, so I'm fine with it.

@Philippe-Cholet Philippe-Cholet marked this pull request as draft March 19, 2024 15:55
Philippe-Cholet added a commit to Philippe-Cholet/itertools that referenced this pull request Mar 21, 2024
rust-itertools#901 (comment)
SkiFire13 had the idea this could be optimized for vectors.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
generic-container Generic vector/hasher/map
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Is there any way to make the into_grouping_map family of functions support custom hashers
3 participants