Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add groupByTo and groupMapTo to IterableOnceOps #51

Open
BalmungSan opened this issue Dec 3, 2020 · 6 comments · May be fixed by #53
Open

Add groupByTo and groupMapTo to IterableOnceOps #51

BalmungSan opened this issue Dec 3, 2020 · 6 comments · May be fixed by #53
Labels
enhancement New feature or request library:collections status:pending This enhancement request lacks consensus on whether or not to include it

Comments

@BalmungSan
Copy link
Contributor

Similar to #8 the only reason why Iterators (or IterableOnce in general) do not have groupBy & groupMap is because, you can not have a Map[K, Iterator[V]] without consuming the Iterator as such the values of the Map would be empty.

But, since groupBy & groupMap already consume the Iterator I would say nobody would expect the values to be a lazy collection, I would even say that most people would be ok with either List or Seq.
However, I would guess that the best would be just giving users the possibility to choose which collection to use to collect the results; this even opens the possibility to a somewhat common requirement of grouping unique values by using a Set.

So the proposal is to add the following two methods to IterableOnceOps so they are not only available on Iterator but for all collections.

def groupByTo[C, K](factory: Factory[A, C])(key: A => K): Map[K, C]
def groupMapTo[C, K, B](factory: Factory[B, C])(key: A => K)(f: A => B): Map[K, C]
@NthPortal NthPortal added enhancement New feature or request library:collections status:pending This enhancement request lacks consensus on whether or not to include it labels Dec 3, 2020
@julienrf
Copy link
Contributor

julienrf commented Dec 3, 2020

Another operation a bit similar would be:

trait IterableOps[A, CC[_], C] {
  def groupByTo[MC[_, _], K](factory: Factory[(K, CC[A]), MC[K, CC[A]]])(f: A => K): MC[K, CC[A]]
}

The motivation would be to be able to put the results in a TreeMap without computing an intermediate Map, for instance:

users.groupByTo(TreeMap)(_.age)

@BalmungSan
Copy link
Contributor Author

BalmungSan commented Dec 3, 2020

@julienrf oh that is also a good idea!
I think both could be done in the same operation and leaving the normal Map as a default, something like:

def groupByTo[CC[_], MC[_, _], K](factory: Factory[A, CC[A]], mapFactory: MapFactory[MC] = Map)(key: A => K): MC[K, CC[A]]

And I think it is absolutely worth it.
I may have a couple of errors in the signature but as soon as #23 is merged I will give it a try to this one (even if it isn't formally accepted by then, unless it was formally rejected).

@NthPortal
Copy link
Contributor

I do think we should try to add as few ops that allow the most things, even if you have to specify a few more parameters in some cases (e.g. identity)

@BalmungSan
Copy link
Contributor Author

BalmungSan commented Dec 7, 2020

Ok, I started to prototype this and I have a small API problem.

I defined the method like this:

def groupMapTo[K, B, C1, MC](key: A => K)(f: A => B)
                            (colFactory: Factory[B, C1])
                            (mapFactory: Factory[(K, C1), MC] = immutable.Map): MC = {

Which works great if you use it like this:

data.groupMapTo(_.country)(_.user)(colFactory = SortedSet)(mapFactory = SortedMap

However, this has two "problems"

  1. The most important one, If I just want a normal Map then I need to add an empty parenthesis at the end (), which looks quite ugly.
  2. I believe it feels more natural to put the factories first and then the key and mapping functions; especially because the mapping function can be quite big.

I originally tried to define the function like this:

def groupMapTo[K, B, C1, MC](colFactory: Factory[B, C1], mapFactory: Factory[(K, C1), MC] = immutable.Map)(key: A => K)(f: A => B): MC = {

However, the problem here is type inference.
(Scala 3 allows to at least have both factories in the same last group, mitigating problem 1)

Does anyone have any ideas?

@BalmungSan
Copy link
Contributor Author

What do you all think of an API like this:

trait GroupMap[K, V, CC[_], MC[_, _]] {
  def collectValuesAs[CC1[_]](factory: Factory[V, CC1[V]]): GroupMap[K, V, CC1, MC]
  def collectResultsAs[MC1[_, _]](factory: Factory[(K, V), MC1[K, V]]): GroupMap[K, V, CC, MC1]

  def result: MC[K, CC[V]]
  def reduce(reduce: (V, V)): MC[K, V] // Optional.
}

def groupMapTo[K, V](key: A => K)(f: A => V): NextIterableOnceOpsExtensions.GroupMap[K, V, CC, Map] = ???

So it can be used like this:

data
  .groupMapTo(_.country)(_.user)
  .collectValuesAs(SortedSet)
  .collectResultsAs(SortedMap)
  .result

This way in the future all the current methods like groupBy or groupMapReduce can be implemented in terms of this.
This also opens the possibility of new methods like groupReduce and getting the results as a SortedMap.

@julienrf
Copy link
Contributor

julienrf commented Dec 7, 2020

This looks promising!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request library:collections status:pending This enhancement request lacks consensus on whether or not to include it
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants