Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GROOVY-6944: Add immutable collections #482

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

yukoba
Copy link
Contributor

@yukoba yukoba commented Jul 12, 2014

Add immutable collections.
Immutable collections are convenient for backtracking algorithms.

The implementation is based on Blackdrag's PCollections.
http://pcollections.org/

As the license of PCollections is MIT license,
it can be merged to Groovy, which is Apache license.

I modified API, and wrote unit tests, and fixed bugs.

What I changed.

  1. All the public class names start with Immutable.
  2. All the implementation classes are package scope and marked Serializable.
  3. Empty class is renamed to ImmutableCollections.
  4. ImmutableCollections can create not only empty collections but also convert from an Iterable to an immutable collection. HashTreePBag, HashTreePMap and HashTreePSet, which are static convenience classes, are removed.
  5. PVector is renamed to ImmutableList.
  6. POrderedSet is renamed to ImmutableListSet. The name ListSet comes from Scala. I didn't use the name OrderedSet because it does not have a comparator.
  7. minus() of PCollections can remove an element and also can remove a specified position element, but this API creates bugs on using ImmutableList<Integer>. Therefore I changed one to minusAt(). For consistency, I also renamed to plusAt(). And with() is renamed to replaceAt().
  8. Removed PSequence interface. ImmutableStack extends ImmutableList like Stack extends List.
  9. A new instance can be created by [] as ImmutableList.
  10. Iterator of ImmutableList and ImmutableStack are same as the original, but ListIterator is backed by Object[]. I think this is faster than the original.
  11. Added getAt() to ImmutableList, ImmutableStack and ImmutaleListSet.
  12. DefaultGroovyMethods.plus() , minus(), multiply() can handle ImmutableCollections.
  13. minus(no argument) of ImmutableQueue is renamed to tail().
  14. Renamed Collection.asImmutable() to Collection.asUnmodifiable(), and deprecated the original method.
  15. Implemented Object[] as Queue and Object[] as Stack.

@yukoba yukoba changed the title Add immutable collections GROOVY-6944: Add immutable collections Jul 12, 2014
@glaforge
Copy link
Member

"Persistent" might be less confusing than "Immutable" (although "persistent" has other connotations in other contexts)

@melix
Copy link
Member

melix commented Jul 12, 2014

We might ping @blackdrag about this. The idea was to have pcollections integrated into Groovy 3. A premature inclusion might increase the chances of breaking changes between 2 and 3. But it's indeed an idea we had in mind for long 👍

@yukoba
Copy link
Contributor Author

yukoba commented Jul 12, 2014

There is an argument among Scala people about the jargon "persistent" and "immutable" here.
http://stackoverflow.com/questions/3107151/persistent-data-structures-in-scala

I really don't know which jargon is better, but should I use "persistent"?

@melix
Copy link
Member

melix commented Jul 12, 2014

Persistent usually means you have a way to rollback to a previous state, while immutable doesn't guarantee this.

@glaforge
Copy link
Member

Indeed "persistent" means that former states persist even when you add other elements — which actually create a new (immutable) collection. I prefer "persist".

@yukoba
Copy link
Contributor Author

yukoba commented Jul 13, 2014

I was thinking this problem for one day.

Haskell and Scala uses "immutable", but Clojure uses "persistent".
Clojure uses "persistent", but basically programmers do not need to type this word, "persistent".
http://clojure.org/data_structures

There is javax.persistence package already. This is Java Persistence API.
http://docs.oracle.com/javaee/7/api/javax/persistence/package-summary.html
"Persistence" is already used for a database.
Haskell also uses this word for a database.
https://en.wikipedia.org/wiki/Persistence_(computer_science)

If Groovy uses "persistent" for this data structure,
of course data structure professionals do not confuse,
but I think many usual programmers confuse.
Many people think PersistentList will be saving to a database or java.io.Serializable.

Therefore to avoid confuse I think Groovy should also use "immutable" for this data structure,
and I think writing "immutable and persistent" to JavaDoc is enough.

@yukoba
Copy link
Contributor Author

yukoba commented Jul 13, 2014

I updated the patch.

  1. Changed ImmutableQueue to ImmutableDeque. Implemented Deque. Added first(), head() , last() and init().
  2. Changed the scope of ImmutableStack to package scope. Use ImmutableDeque instead. Computational complexities are same.
  3. Changed the scope of ImmutableBag to package scope. Use ImmutableList instead.
  4. Removed getAt() because it is already in DefaultGroovyMethods.

@glaforge
Copy link
Member

I know the confusion with "disk persistence", as that's what I was referring to when saying it had other "connotations". We'll have to make a call on that, perhaps discussing it on the mailing-list for broader attention and feedback.

On another topic, I see you deprecated the asImmutable() methods, and created asUnmodifiable() ones. Shouldn't (or should) asImmutable() now use those immutable / persistent collections instead?
It might be a minor breaking change in the sense we don't return the same collection though, but it'd still be a Set / Collection / List / etc anyway. What do you think?

@yukoba
Copy link
Contributor Author

yukoba commented Jul 13, 2014

OK, I joined dev@groovy.codehaus.org and user@groovy.codehaus.org .

The speed of Java's unmodifiable views are far faster than immutable and persistent collections.
And I think for some people, or might be many people, unmodifiable views are enough.

I think Groovy should keep the method as Collections.unmodifiableCollection(),
and list as ImmutableList is enough for converting from List to ImmutableList.
If you have to call it from Java code, use ImmutableCollections.list(Iterable).

By the way, http://beta.groovy-lang.org/ is very cool!
I found it in the mailing list archive.

@yukoba
Copy link
Contributor Author

yukoba commented Jul 13, 2014

I summarized the jargon usage.

Java Scala Haskell Clojure My opinion
mutable mutable mutable mutable
immutable and transient unmodifiable IArray transient unmodifiable
immutable and persistent immutable immutable persistent immutable
disk persistence persistence persistence persistence

Also I updated the patch, and I added these.

  • List.minusAt(int)
  • Object[].minusAt(int)
  • List.replaceAt(int, Object)
  • Object[].replaceAt(int, Object)

Now I can write as the same way for four types of objects.

// mutable
def list = [1, 2]
list += 3
list = list.minusAt(0)
assert [2, 3] == list

// array
def ary = [1, 2] as Integer[]
ary += 3
ary = ary.minusAt(0)
assert [2, 3] == ary as List

// immutable and transient
def list2 = [1, 2].asUnmodifiable()
list2 += 3
list2 = list2.minusAt(0)
assert [2, 3] == list2

// immutable and presistent
def list3 = [1, 2] as ImmutableList
list3 += 3
list3 = list3.minusAt(0)
assert [2, 3] == list3

@yukoba
Copy link
Contributor Author

yukoba commented Jul 13, 2014

I found that to support the following code correctly,
DefaultGroovyMethodsSupport.createSimilarCollection() should support
Collections.UnmodifiableRandomAccessList correctly.
Collections.UnmodifiableRandomAccessList becomes normal List after List.plus().

// immutable and transient
def list2 = [1, 2].asUnmodifiable()
list2 += 3
list2 = list2.minusAt(0)
assert [2, 3] == list2

@yukoba
Copy link
Contributor Author

yukoba commented Jul 13, 2014

I updated the patch and solved the above problem.
I did not modify DefaultGroovyMethodsSupport.createSimilarCollection().

First, I added isUnmodifiable() to DefaultGroovyMethodsSupport.
Because Collections.UnmodifiableList is a package scope class, I used getClass().getName().

/**
 * Checks whether the specified object is an unmodifiable view or not.
 *
 * @param object an object to check
 */
protected static boolean isUnmodifiable(Object object) {
    return object.getClass().getName().startsWith("java.util.Collections$Unmodifiable");
}

Then I added wrapSimilar() to DefaultGroovyMethodsSupport.
I created Collection, List, Set and Map versions.

/**
 * If the first object is an unmodifiable object, then wrap the second list to an unmodifiable list.
 * If the first object is an immutable collection, then wrap the second list to an immutable list.
 */
protected static <T> List<T> wrapSimilar(Object object, List<T> list) {
    if (object instanceof ImmutableCollection) {
        return ImmutableCollections.list(list);
    } else if (isUnmodifiable(object)) {
        return Collections.unmodifiableList(list);
    } else {
        return list;
    }
}

Then I added wrapSimilar() to DefaultGroovyMethods many places.

Now, unmodifiableList + 2 returns an unmodifiable list,
and such as collect() also returns an unmodifiable or immutable list.

@blackdrag
Copy link
Member

Where to start... first of all, I maintain pcollections now, but it is not originally my project. I am free to make changes to it though. And in that spirit I would prefer changing pcollections to work better with Groovy than to just copy the sources over. And if it is only to avoid having to maintain two coebases.

I guess I could get friendly with Immutable instead of Persistent.

For DGM-integration tough I see a tough task. Creating a persistent collection is usually not cheap. You want to reuse as much as possible to avoid having to create a new thing all the time. But if you look at why createSimilarCollection is used, then it is exactly because we do that bad thing of creating a new container and rebuild it from scratch. Instead to call for example minusAll or plusAll.... it really depends on the case.

That's also why in Java8 you have Streams, it allows internal iteration and operations, specialized for the type's inner structure - thanks to the lazy evaluation ability. Think for example of using grep on a tree like structure. Instead of collecting the values, that are valid and then create a new tree from that, you could go through the tree, and mark branches, that will be cut off, to then later create a new Tree in a much faster manner

All in all I think we should first think about changing pcollections itself.

Oh, one more thing... Speed. ultimatively a Java unmodifiable list can be backed by a simple array. There is nothing faster than a array for such a collection. But my tests show, that for example a HAMT can compete with a HashMap for get operations. It is slower to create from scratch of course, but if I use existing HAMTs to create a new one it can be still very superior to a HashMap.

The point where unmodifiable is not enough anymore is when it comes to multithreading. Here you will want either one of those concurrent structures Java offers, or you will want something immutable. And here I see the biggest advantage of structures like there are in the pcollections library. Because with those you can more easily create concurrent systems without the usage of any memory barriers - or at least, you can dose them very exactly.

@yukoba
Copy link
Contributor Author

yukoba commented Jul 16, 2014

I would prefer changing pcollections to work better with Groovy than to just copy the sources over.

OK, I will recreate the patch.

bad thing of creating a new container and rebuild it from scratch

Actually, many methods have to create a whole List from scratch.
For example, all the element of collect() is new.
Therefore, in this case creating a temporary ArrayList
and convert it to an immutable List is not very slow.
Of course some methods should implement specially.

a HAMT can compete with a HashMap

I read this Clojure implementation,
https://code.google.com/p/clojure/source/browse/trunk/src/jvm/clojure/lang/PersistentVector.java
and it is very fast!

Switching from a binary tree to a hash array mapped trie is just an implementation problem,
and it can be solved by someday someone (or me) implement it!
I mean it is not an API problem.

How the Groovy immutable API should be?

I was thinking the instance creation API, and I found there are four options.

Option 1 (current patch implmentation)

/** @deprecated */
public static List asImmutable(List self) {
    return Collections.unmodifiableList(self);
}
public static List asUnmodifiable(List self) {
    return Collections.unmodifiableList(self);
}
// Create ImmutableList by [] as ImmutableList

Option 2 (I'm now thinking this is best)

// Breaks compatibility but similar to Guillaume Laforge's idea
public static List asImmutable(List self) {
    return ImmutableCollections.list(self);
}
public static List asUnmodifiable(List self) {
    return Collections.unmodifiableList(self);
}
// Also can create ImmutableList by [] as ImmutableList

Option 3

// Let's forget about Collections.UnmodifiableList
public static List asImmutable(List self) {
    return ImmutableCollections.list(self);
}
// Also can create ImmutableList by [] as ImmutableList

Option 4

// Use "persistent" word
public static List asPersistent(List self) {
    return PersistentCollections.list(self);
}
public static List asImmutable(List self) {
    return Collections.unmodifiableList(self);
}
// Also can create PersistentList by [] as PersistentList

Is there any another option?

Option 2 breaks the compatibility a little bit,
but it is most simple and I'm now felling it might be best.
Just say "immutable is immutable, and unmodifiable is newly created" for option 2. 😜

@blackdrag
Copy link
Member

bad thing of creating a new container and rebuild it from scratch

Actually, many methods have to create a whole List from scratch.
For example, all the element of collect() is new.
Therefore, in this case creating a temporary ArrayList
and convert it to an immutable List is not very slow.

Yes, but why first create an ArrayList to then convert it to the immutable list with a possibly other base type (createSimilar doesn't recreate on the exact type)? I was thinking, that to avoid that I should add the empty() method to the base interface of the pcollections classes. That way something like create similiar collection is not required at all.

I read this Clojure implementation,
https://code.google.com/p/clojure/source/browse/trunk/src/jvm/clojure/lang/PersistentVector.java
and it is very fast!

yes, using some kind of not too big "chunk" is in general a good idea to be able to reuse structures, while at the same time, keeping things small if you have to create something new and the array overhead is small. The typical linked list approach for example requires some kind of node class per element I add. That alone already causes a big overhead, since object creation is slow on the JVM. Things get worse, if you have to change the last element in a single linked list, because then you have to create each node new from the end. Using "chunks", all but the last chunk can be reused. And the actual memory performance is better than ArrayList, since if I add too many elements to ArrayList, I have to create a new internal array and that may exceed the available memory already. For the chunk based list, I create a new chunk and a new root array maybe, but that array will stay much smaller.

Switching from a binary tree to a hash array mapped trie is just an implementation problem,
and it can be solved by someday someone (or me) implement it!
I mean it is not an API problem.

I have a HAMT implementation here, I am going to commit it to pcollections as soon as I am done with my tests and clear about the actual performance. Anyway... my comment was directed at persistent collections having bad performance only.

As for how the Groovy immutable API should be....

Option 2 looks best to me.

@yukoba
Copy link
Contributor Author

yukoba commented Jul 16, 2014

I changed to use PCollection and updated the patch.
PCollection update is here. hrldcpr/pcollections#27

As it is not in the Maven central repository, I put the jar file in the lib directory.
Please change it when it is in the Maven central repository,

I also implemented the option 2, but @Immutable still uses unmodifiable collections as original.

I am going to commit it to pcollections as soon as I am done with my tests and clear about the actual performance.

OK, I will wait for it and don't implement the HAMT.

Immutable collections are convenient for backtracking algorithms.

The implementation is based on Blackdrag's PCollections.
http://pcollections.org/

As the license of PCollections is MIT license,
it can be merged to Groovy, which is Apache license.

I modified API, and wrote unit tests, and fixed bugs.

What I changed.
1. All the public class names start with `Immutable`.
2. All the implementation classes are package scope and marked `Serializable`.
3. `Empty` class is renamed to `ImmutableCollections`.
4. `ImmutableCollections` can create not only empty collections but also convert from an `Iterable` to an immutable collection. HashTreePBag, HashTreePMap and  HashTreePSet, which are static convenience classes, are removed.
5. `PVector` is renamed to `ImmutableList`.
6. `POrderedSet` is renamed to `ImmutableListSet`. The name `ListSet` comes from Scala. I didn't use the name `OrderedSet` because it does not have a comparator.
7. `minus()` of PCollections can remove an element and also can remove a specified position element, but this API creates bugs on using `ImmutableList<Integer>`. Therefore I changed one to `minusAt()`. For consistency, I also renamed to `plusAt()`. And `with()` is renamed to `replaceAt()`.
8. Removed `PSequence` interface. `ImmutableStack` extends `ImmutableList` like `Stack` extends `List`.
9. A new instance can be created by `[] as ImmutableList`.
10. `Iterator` of `ImmutableList` and `ImmutableStack` are same as the original, but `ListIterator` is backed by `Object[]`. I think this is faster than the original.
11. `DefaultGroovyMethods.plus()` , `minus()`, `multiply()` can handle `ImmutableCollections`.
12. `minus(no argument)` of `ImmutableQueue` is renamed to `tail()`.
13. Renamed `Collection.asImmutable()` to `Collection.asUnmodifiable()`, and deprecated the original method.
14. Implemented `Object[] as Queue` and `Object[] as Stack`.
15. Changed `ImmutableQueue` to `ImmutableDeque`.
16. Changed scope of `ImmutableStack` to package scope. Use `ImmutableDeque` instead.
17. Changed scope of `ImmutableBag` to package scope. Use `ImmutableList` instead.
18. Added `minusAt()` and `replaceAt()` to `List` and `Object[]`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants