Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GROOVY-6992: Added support for collecting over sets #498

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

mperry
Copy link

@mperry mperry commented Jul 31, 2014

Pull request for https://jira.codehaus.org/browse/GROOVY-6992.

Collecting over sets without passing in a collection to add to returns a list. We really want it to return a set.

I added support for this to DefaultGroovyMethods for a set using the default set of LinkedHashSet. I noticed this was used as the default. It worked for HashSet too, but I have kept the current default. I initially tried TreeSet, but this class is broken in Java and should always require an ordering operation for elements.

A warning, this may break existing code that relies on collecting over a set returning a list. For this behaviour, toList() can be called either before or after the collect method.

@paulk-asert
Copy link
Member

An example showing the ramifications of this breaking change:

def nums = [2,4,5,6]
Set a = [*nums]
Set b = [*nums, 'foo', 'bar']

// before
assert [1, 2, 3] as Set == a.collect { (int) (it / 2) }.toSet()
assert ['4 * Integer', '2 * String'] == b.collect { it.class }.groupBy().collect{ k, v -> "${v.size()} * $k.simpleName" }

// after
assert [1, 2, 3] as Set == a.collect { (int) (it / 2) }
assert ['4 * Integer', '2 * String'] == b.toList().collect { it.class }.groupBy().collect{ k, v -> "${v.size()} * $k.simpleName" }

So for a you would no longer need toSet() but for b you need to call toList() before collect. It's not clear to me that the proposed change is obviously better. Groovy doesn't assume the transformation in the collect closure produces something which should have a uniqueness property.

@paulk-asert
Copy link
Member

Another example:

Set a = [-100, 100, 10]
def avgAbsolute = a*.abs().with{ it.sum() / it.size() } // *. is collect shorthand
// before
assert 70 == avgAbsolute
// after
assert 55 == avgAbsolute

Obviously you can get the alternatives:

// before for 55
a*.abs().toSet().with{...}
// after for 70
a.toList()*.abs().with{...}

@mperry
Copy link
Author

mperry commented Jul 31, 2014

For me, I found the behaviour of mapping over sets surprising. I thought mapping over functors should return the same type and initially I thought a set would be a functor. For reference, the haskell signature is:

fmap :: (a -> b) -> f a -> f b

The bigger issue is should all collections be transformed into lists by default? I had a look at the JDK to see what this could also apply to and the main interfaces I saw were List, Set and Queue. Would this also apply to the Iterable interface? Hmm, after thought Iterable is probably less relevant because it is accessing a structure sequentially like a list.

Now perhaps a set is not a functor. You can't guarantee that composing two functions together will get the same result as applying them individually:

set.map(f).map(g) != set.map(f.andThen(g))

For more on the design issue of set and functor in Scalaz:
http://typelevel.org/blog/2014/06/22/mapping-sets.html
http://jwesleysmith.bitbucket.org/set-functor.pdf
http://failex.blogspot.jp/2013/06/fake-theorems-for-free.html

@mperry
Copy link
Author

mperry commented Jul 31, 2014

Paul, I had not seen using '*' to convert to sets, only the spread operator for collect. How does it convert to a set and what is the difference between a and b below (where a is a set and b is a list)

def nums = [2,4,5,6]
Set a = [*nums]
def b = [*nums]

@mperry
Copy link
Author

mperry commented Jul 31, 2014

Does the same thing apply to collectMany?

@paulk-asert
Copy link
Member

The list despread operator has nothing to do with Set assignment - it was just a way of removing some dupe in the above example (you could wonder why there is no set despread operator but that's perhaps for another day).

There is special target typing performed for list literals. The type given to the list literal is given special treatment for typed declarations with a Collection type. Think "diamond" operator in Java or method parameter target typing in Java 8. So basically Set a = [1, 2, 3] is the same as def a = [1, 2, 3] as Set or def a = new LinkedHashSet([1, 2, 3]).

@paulk-asert
Copy link
Member

Yes, the type is retained for methods which return a filter, subset or expansion of some original collection, e.g. things like: grep, findAll, split, take, drop, intersect, plus, minus, but where a transformation is (potentially) performed then a new default collection is returned (generally an ArrayList).

One thing that is being thought of for 2.4+ is improved target type inferencing for DGM methods. So for the methods above (but not collect or collectMany) then we know if you provide a Set, then a set is returned, rather than having to provide a special DGM Set variant, those methods might be annotated (with something like @RetainsType - working suggestion only). That way you'd get some of the type inference I imagine that you are after.

@mperry
Copy link
Author

mperry commented Jul 31, 2014

We are chasing our tails because we don't have higher-order types. I want the to use the functor typeclass, or alternatively in OO terms, have a co-variant return type that matches the structure we are performing the method on (which means duplicate code compared to typeclasses). My argument is we should be allowed to override the method with a co-variant return type, rather than keep it invariant as Collection.

Why should list be regarded as special that everything should return?

Why should collect and collectMany and list be treated specially for the 2.4 type inference proposal?

Would the target type inferencing for RetainType work statically? I can see this would be ok dynamically, I guess it could work statically by generating methods for each subclass at compile time. I guess the compiler could use that information to do type inference fro each call to the method.

Converting everything back to list makes it harder to write more general libraries if list is treated specially.

@mperry
Copy link
Author

mperry commented Jul 31, 2014

One idea for 2.4 that I have not thought through is to be able to attach a trait to an existing class, rather than have to write extension methods to add those methods. So if trait T is attached to class C, an object of type C could be passed around as a T.

@paulk-asert
Copy link
Member

+1 for higher-order kinds. But how to implement nicely? So many questions! :-)

There is no implementation of @RetainType at present - just design discussions and prototyping bits of it. The current early design ideas are the @RetainType would be used only for type inferencing during static compilation. It wouldn't generate a whole family of methods - so no e.g. Set variant would be visible from Java for instance - it would just know about the right type in use when checking for errors and could do appropriate casts in some places if needed at the bytecode level.

I'd have to think a bit more about your trait idea but I am sure the Groovy team is open to useful suggestions. Do you have some concrete examples of how the idea would work?

@yukoba
Copy link
Contributor

yukoba commented Jul 31, 2014

I think the better implementation way for this is to change the return type of Collection.collect(Closure) from List to Collection,
and use createSimilarCollection(self) instead of new ArrayList(self.size()).
You don't need to add defaultSet(). It is already in createSimilarCollection().

But I think one of the biggest problem is the upper compatibility as Paul said.

@melix
Copy link
Member

melix commented Jul 31, 2014

Hi guys. Note that everything that changes the method signature is unlikely to be accepted anytime soon as it is a binary breaking change.

@blackdrag
Copy link
Member

This discussion shows one fundamental problem in those collections based operations we offer, and that is trying to keep the characteristics of the origin, while at the same time not being able to really do that. For example, even if we produce a LinkedHashSet for an LinkedHashSet, as soon as a custom Set class is used, all bets are of. And maybe Groovy started to develop a bit into the wrong direction here

We started of with an List type (I think ArrayList) as basic result of operations. If this is known, then all users have to do is to convert. Looking at Java 8, that result type is Stream. If you feed in a Set, you will later have to convert the Stream back to a Set. Currently this looks to me like a better approach.

Then about functors... I am a bit confused, and surely you know much more about functional programming terms than I do... but I thought a functor is a function, not a type. So in my understanding you want collect to be a functor, not Set.

Going to
set.map(f).map(g) == set.map(f.andThen(g))

So for example:

Set set = [1,2,3]
def twoTimes = {it*2}
def intHalf = {it.intdiv(2)}
assert set.collect(intHalf).collect(twoTimes) == set.collect(twoTimes << intHalf)

This assert is already true and for this here it does not matter if the result of Set#collect is a List or a Set. Sure, it is no homomorphism then. I am aware

For me the practical problem is more, that without some kind of lazy evaluated structure you won't get any advantage of this. the left side of the assert will performane worse than the right side.

@yukoba
Copy link
Contributor

yukoba commented Jul 31, 2014

One option for the upper compatibility problem is to use a new method name map() and implement Iterable.map(Closure) like this.

public static <S,T> Collection<T> map(Iterable<S> self, @ClosureParams(FirstParam.FirstGenericType.class) Closure<T> transform) {
    return collect(self, (Collection<T>) createSimilarCollection(self), transform);
}

And write "We recommend you to use map() instead of collect()" in collect() JavaDoc.

My argument is we should be allowed to override the method with a co-variant return type, rather than keep it invariant as Collection.

In Groovy, if you don't use @TypeChecked or @CompileStatic, Collection is enough.

@mperry
Copy link
Author

mperry commented Jul 31, 2014

You are probably familiar with these references already @paulk-asert, but perhaps others aren't.

The story of how higher order types got into Scala is pretty interesting. Vincent Crement wrote a compiler for Featherweight Generic Java (FGJ) and generalised parameters representing types to type constructors. The paper from his work is from 2008 (http://www.jot.fm/issues/issue_2008_06/article2.pdf).

From the implementation section on p 40:
"Shortly after we implemented and made public a prototype compiler for FGJ-omega, type constructor parameterization was independently integrated into the Scala compiler under the name of type constructor polymorphism [MPO07]. The authors of the Scala implementation based their syntax on an unpublished version of the present paper where type constructor parameterization were described in the context of Scala. However they extended our syntax to make type constructor parameterization smoothly interact with pre-existing features of the language. ...Our work can be considered as a theoretical foundation for the subset of Scala that corresponds to FGJ-omega."

Odersky I think was at or around the Swiss university at the time and integrated this into Scala. His paper on the topic is "Generics of a Higher Kind", http://adriaanm.github.io/files/higher.pdf.

Perhaps in another dimension Java would have higher order types before Scala!

The site where this work was done was down for a long time, but I just followed the link tonight and it looks like it is up again (http://lampwww.epfl.ch/~cremet/FGJ-omega/). I have not had a chance to look around the site yet, but the compiler and the Coq proofs should be interesting.

I will have a think more to flesh out the trait idea. At the moment we can add methods to classes, but why not add whole traits/interfaces? I think this would mean Groovy had similar functionality to Haskell typeclasses and Scala implicits.

@yukoba made a comment on the return type that we should use createSimilarCollection instead of defaultSet. Note that all of this is not ideal as it will only create a similar collection for those few classes that we code for. This is not a general solution. A general solution here requires a function that creates the structure i.e. (Unit -> C), where C is a collection, most often known as unit on monad with signature (A -> M).

I have been exploring these ideas on a series of blog posts (http://mperry.github.io/). I am part of the way through writing up a post about implementing monads, but you can see the results already here (https://github.com/mperry/functionalgroovy/tree/master/typeclass/src/main/groovy/com/github/mperry/fg/typeclass). The concrete folder under the previous URL shows concrete functor, applicative and monad instances for lists and options. @blackdrag, you might find it useful to start here on functors. A functor is just any class with a single generic type T that implements the map/collect/fmap method with signature T<B> map(F<A, B>). I showed how to do this in Groovy here (http://mperry.github.io/2014/07/01/groovy-functors.html).

I have then been adding these methods to existing classes through Groovy extension modules. Here is an example for adding the monadic methods to List in ListMonadExtension (https://github.com/mperry/functionalgroovy/tree/master/main/src/main/groovy/com/github/mperry/fg). In essence we create a concrete method for each monad method whose implementation calls the appropriate monadic method. So we add flatMap/liftM2/other monad methods to SetExtension2, but the implementation calls the monad implementation for Set that just defines unit and flatMap. Here is where attaching the trait to an existing class would be useful. If we attached ListMonad to List then List would get all the monad methods: unit, flatMap, sequence, traverse, liftM2, etc. This would allow us incredible flexibility where the original class writer does not need to see all possible abstraction for their class. For example, this would allow Lists and other monads to be used in monad comprehensions that rely on the presence of flatMap and map.

Groovy is moving towards more commitment to type safety. The advantage of all this better abstraction, more reuse, better reasoning about code, fewer defects and faster development time.

This is a pretty long winded comment, so I hope others find it useful.

@blackdrag
Copy link
Member

Ok, so I wrote I longer comment here and decided to delete it all again, because there is one point I don't really have a clue about yet. You define a functor as any class that defines T<B> map(F<A, B>). But looking at http://mperry.github.io/2014/07/01/groovy-functors.html you right away start with M<B> map(M<A> m, F<A, B> f);. What is m? Because it smells to me like that m should be actually "this" - in other words the source of elements f will be applied on. But if that is right, then it should actually be abstract class M<A> { def <B> M<B> map(F<A,B> f) }

Of course you cannot define a functor trait like this. And I think that is really what you are after in the end.

@yukoba
Copy link
Contributor

yukoba commented Jul 31, 2014

return type that we should use createSimilarCollection instead of defaultSet. Note that all of this is not ideal as it will only create a similar collection for those few classes that we code for. This is not a general solution.

I think so too.
Therefore, to add immutable collections, I have to do it inside of Groovy #482,
and every time Oracle add a Collection class to Java, they have to be added to createSimilarCollection() #478 .

But isn't changing those system is quite hard task and lost compatibility?

@mperry
Copy link
Author

mperry commented Jul 31, 2014

I will try to explain the pseudo Java for the Mappable<M<Z>> interface in the blog. Here M is the generic type parameter which itself takes a single generic type Z. This is invalid Java. Say we can implement the Mappable<List<_>> type, or a Mappable<Set<_>> type or the Mappable<Option<_>> type. They are all structures that can be mapped over. So you want to parameterise the type that can be mapped over so that map takes a parameter of the correct type (M<A>) and returns the correct type (M<B>). This means map for a Mappable<Set<_>> will return Set<B>, but for a Mappable<Option<_>> will return an Option<B> for any A and B used in the map function.

This is not that important for Functor because it has just a single method. Applicative (http://mperry.github.io/2014/07/02/groovy-applicatives.html) extends Functor and has two abstract methods (pure and apply). From these two methods, many other useful concrete methods can be derived which all use the type parameter of the class. The examples I gave for Applicative were liftA3 and sequenceA which both use the type parameter for Applicative.

A good introduction to the topic might be http://learnyouahaskell.com/functors-applicative-functors-and-monoids.

What may be confusing is that my proposal to add a trait to the class has a mismatch on the usual 'this' parameter. I have not thought carefully about this yet and so it is unresolved.

@mperry
Copy link
Author

mperry commented Jul 31, 2014

@yukoba The problem you are having is that the map method of Functor can't be implemented by itself in a vacuum because it needs to know how to create the new structure. Any monad (with the unit and flatMap methods) can implement Functor's map method like this (https://github.com/mperry/functionalgroovy/blob/master/typeclass/src/main/groovy/com/github/mperry/fg/typeclass/Monad.groovy).

Groovy cheats by knowing about how to implement the unit method (to create the new structure) for some common collections and then defaults to List (I think) for those it does not know about. This works much of the time, but is fundamentally broken. We need specific map methods for each structure that have specific unit methods that create the appropriate similar structure. This is why implementing the general map method in Functor requires a unit implementation from Monad.

On my todo list is to have a close look at what is available in PCollections and compare it to the immutable data structures available in FunctionalJava.

@blackdrag
Copy link
Member

Mark introductions are often fine and all, but they do for example not really explain why a functor should be a homomorphism. And what implications it has if not... in other words I am missing the bigger picture.

@yukoba
Copy link
Contributor

yukoba commented Jul 31, 2014

I will write https://en.wikibooks.org/wiki/Haskell/Understanding_monads/List more simply.

If a collection library implements these two methods, those collection classes can handle in the same way.

1."Haskell return" == "Functional Groovy unit()"

static <T> List<T> createSingleton(T e) {
    [e]
}

2."Haskell binding operator" == "Functional Groovy flatMap()"
In precise, Haskell concat() and Groovy flatten() is not same. But please ignore.

/** @param f = { T e -> returns List<T> } */
List<T> collectThenFlatten(Closure f) {
    this.collect(f).flatten()
}

However, the characteristic feature of Groovy is Groovy uses normal Java collection classes.
java.util.Collection does not even require a zero arguments constructor.
Isn't it impossible if Groovy uses normal Java collection classes?

@yukoba
Copy link
Contributor

yukoba commented Jul 31, 2014

One possible solution might be this.

First create a marker interface.

public interface GroovyCollection<T> extends Collection<T> {}

GroovyCollection must implement this static method
and must have a public default constructor with no arguments.

public static <T> GroovyCollection<T> createSimilar(GroovyCollection<T> from);

Because asType(Collection col, Class<T> clazz) calls the zero argument constructor,
you can write [1, 2, 3] as MyList, if MyList implements GroovyCollection.

Also in createSimilarCollection() add this.

if (orig instanceof GroovyCollection) {
    return invokeStaticMethod(orig.getClass(), "createSimilar", orig);
}

@yukoba
Copy link
Contributor

yukoba commented Jul 31, 2014

I made a test implementation of above. See yukoba@45402ab .

Normal, @TypeChecked and @CompileStatic work properly,
but IntelliJ IDEA 14 EAP shows errors at @TypeChecked and @CompileStatic.

@yukoba
Copy link
Contributor

yukoba commented Aug 1, 2014

Sorry, GroovyCollection.createSimilar() does not need to be a static method.
I updated here yukoba@745e42a .

@melix
Copy link
Member

melix commented Aug 1, 2014

I think what Mark wants is something like this:

trait CollectionFactory<T> {
    abstract T createSimilar()
}

trait ArrayListFactory<T> implements CollectionFactory<ArrayList<T>> {
    public ArrayList<T> createSimilar() { return new ArrayList<T>() }
}

ArrayList<Integer> foo = ...
def sim = foo.createSimilar() // trait mixed into ArrayList

This cannot be done using an extension module. It also fails compiling, and it's a bug using:

def list = [1,2].withTraits(ArrayListFactory)
list.createSimilar()

Yet, one doesn't need traits to do this. With extension modules, you can already declare those:

class CreateSimilarCollections {
    static <T> ArrayList<T> createSimilar(ArrayList<T> self) { ... }
}

The problem with this approach is that even if the method is available at runtime, there is no marker interface added to the class which will let you know that the method is available. This problem cannot be fixed with the traits approach either: unless Groovy uses class rewriting at load time, the JVM will not let you add methods or interfaces to existing classes.

@yukoba: This approach requires to implement the marker interface, which is probably not what users want. Also calling InvokerHelper.invoke is in general not a good idea, because calls are uncached.

@blackdrag
Copy link
Member

So this problem boils down in the inability to specify generic enough signatures on the typing side and a missing unit morphism for Collections on the practical side. createSimilarCollection cannot be a replacement for unit, because it returns not the exact same container, only a similar one

The Java8 solution for this is the conversion to the Monad Stream and then convert back to whatever is needed. Of course at least the last step is done by the user. And cannot be done by us, since unit is missing or incomplete. For pcollections I can solve the problem of missing unit, but adding a fitting method to the interface everyone is supposed to overwrite.

@yukoba
Copy link
Contributor

yukoba commented Aug 1, 2014

So,

  • java.util.stream.Stream.of() is "Haskell return"
  • java.util.stream.Stream.flatMap() is "Haskell bind operator"

Therefore, java.util.stream.Stream is a list monad. I see.

@yukoba
Copy link
Contributor

yukoba commented Aug 1, 2014

The problem with this approach is that even if the method is available at runtime, there is no marker interface added to the class which will let you know that the method is available.

@melix, is it possible to check the existence of createSimilar() method created by the extension module? If Yes, then we can call it from DefaultGroovyMethodsSupport.createSimilarCollection(). Then marker interface is not needed. It is a duck typing system.

@melix
Copy link
Member

melix commented Aug 1, 2014

It is possible to check the existence of the method at runtime. Performance wise, this would however be terrific.

@yukoba
Copy link
Contributor

yukoba commented Aug 1, 2014

It is possible to check the existence of the method at runtime. Performance wise, this would however by terrific.

The first argument of createSimilarCollection() is Collection, so the checking class is always Collection. And Collection class is usually limited.

If you create two caches

  • Set<Class> groovyCollectionClassSet
  • Set<Class> notGroovyCollectionClassSet

and if you check it before searching createSimilar() method, then I think performance loss is limited.

@melix
Copy link
Member

melix commented Aug 1, 2014

Using cache classes is not really an option. It would be very difficult not to leak classes, especially in a language like Groovy where some classes can have a very short lifecycle. In any case checking against a Set does have a cost. The cascade of if is already a huge performance problem. I agree with Jochen that the JDK8 way is, performance wise, the best option.

@blackdrag
Copy link
Member

I want to add, that we don't have to use Stream. We can make a Monad list and Monad set that is returned by our operations plus conversions methods to other types. Just Java standard lists and sets would not be Monads - maybe not even functors. Wouldn't that be a possible way to go?

@mperry
Copy link
Author

mperry commented Aug 4, 2014

If @melix thinks we can't change the interfaces to existing classes, there is another way.

Scala and Haskell use implicits/typeclasses to add methods to existing types. We could implement something like this for Groovy that would have implications for the extensions modules.

Groovy has the range operator which you can use like this: (1..5).toList(). You can add a to method to Integers using extension modules to do something similar whose usage would be 1.to(5).

This could be done in a new, but similar, way by having compiler supported conversion between two types. This change would support something like the code below:

@Implicit(Integer)
class IntRangeProducer {
    Integer floor

    List<Integer> to(int ceiling) {
        (floor..ceiling).toList()
    }

    Integer toImplicit() {
        floor
    }

}

Then when compiling a call 1.to(5) the compiler would not see a to method on Integer, but knows there is an implicit class that supports converting between integers and IntRange (both ways) and that implicit class has a to method with the right signature. The implicit is created using a single one argument constructor and converted back using the toImplicit method. This could also be used when passing an Integer to a method expecting an IntRangeProducer and vice versa.

A compiler supporting this would, in effect, allow you to be adding interfaces to existing classes without changing the JVM interfaces of existing classes. This could be done in a type safe way. A potential problem is how to handle this using dynamic typing.

http://docs.scala-lang.org/overviews/core/implicit-classes.html

This would allow us to add abstractions to existing classes. Of course, this could be existing Java classes, classes from other Groovy libraries or separate abstractions and couple them in a structural way, rather than in the nominative Java way.

What would be worth exploring is the different ways this could be combined with existing features, like implicits implementing traits, or using the delegatesTo annotation.

Combined with some traits like Eq, Ord, Enum could make things interesting (https://en.wikibooks.org/wiki/Haskell/Classes_and_types).

It might even simplify some Groovy conversions that I suspect are currently done in an ad-hoc way.

traneHead pushed a commit to traneHead/groovy-core that referenced this pull request Jul 29, 2017
…ide a default undefined value for better Java compatibility (closes groovy#498)
@eric-milles
Copy link
Member

As of Groovy 4, the collect variants that accept a collector propagate the collector's type. For example:

@groovy.transform.TypeChecked
void test() {
  def one_two_three = [1,2,3]
  @groovy.transform.ASTTest({
    println node.getNodeMetaData(org.codehaus.groovy.transform.stc.StaticTypesMarker.INFERRED_TYPE)
  })
  def two_times = one_two_three.collect(new HashSet<Number>()) { it * 2 }
}

test()

This script prints "java.util.HashSet<java.lang.Number>" for the inferred type of two_times. This lets you control the type and carry on with that type instead of having to cast from Collection as before.

@mperry
Copy link
Author

mperry commented Sep 14, 2023

I found your example hard to follow @eric-milles, but it is promising if collect preserves the type.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants