
java.lang.ArrayIndexOutOfBoundsException #3

Open

ludwig-nc opened this issue Nov 21, 2018 · 4 comments

@ludwig-nc
I am running on Spark 2.1.0 and could compile the library without problems. I can run the examples and a smaller subset of my problem (10,000 data points), but when I increase the problem size (100,000 data points) I get the following error. Any ideas? Do the data point IDs need to be contiguous from 0 to len(data_points)? I used hashed values for the IDs, and it almost looks like an array element is being accessed by ID.

java.lang.ArrayIndexOutOfBoundsException: 33567921
at org.apache.spark.graphx.impl.EdgePartition.aggregateMessagesEdgeScan(EdgePartition.scala:395)
at org.apache.spark.graphx.impl.GraphImpl$$anonfun$13$$anonfun$apply$3.apply(GraphImpl.scala:237)
at org.apache.spark.graphx.impl.GraphImpl$$anonfun$13$$anonfun$apply$3.apply(GraphImpl.scala:207)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:97)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
at org.apache.spark.scheduler.Task.run(Task.scala:100)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:317)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

@viirya
Owner

viirya commented Dec 23, 2018

What do you mean by hashed values for the IDs?

@ludwig-nc
Author

The IDs for my elements are strings, but the framework uses Longs for IDs. I map each string to a Long by taking the first 8 bytes of its MD5 hash as an integer. This results in ID numbers that are non-contiguous, which may or may not be the cause of the problem. In short, does the framework assume that the IDs are contiguous from 0 to N, where N is the number of data points?
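
For concreteness, a minimal sketch of that mapping (the helper name `stringToLongId` is hypothetical; this is just one straightforward way to read the first 8 bytes of an MD5 digest as a Long):

```scala
import java.nio.ByteBuffer
import java.security.MessageDigest

// Hypothetical helper: map a string ID to a Long by taking the first
// 8 bytes of its MD5 digest, as described above.
def stringToLongId(s: String): Long = {
  val digest = MessageDigest.getInstance("MD5").digest(s.getBytes("UTF-8"))
  // The resulting value spans the full Long range and can be negative,
  // so it cannot safely be used as a direct array index.
  ByteBuffer.wrap(digest, 0, 8).getLong
}
```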

@540667387

Did the author ever find out the reason? I think I made the same mistake.

@viirya
Owner

viirya commented May 30, 2021

Hmm, I don't think this requires contiguous IDs. As you can see, the input similarities are an RDD[(Long, Long, Double)], and the similarity matrix can be sparse.

Could you provide a complete stack trace?
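
For illustration, a minimal sketch of such an input, assuming a live SparkContext named `sc` (the IDs and weights here are made up):

```scala
import org.apache.spark.rdd.RDD

// Sparse similarity entries with arbitrary, non-contiguous Long IDs:
// only the pairs that actually have a similarity value are present.
val similarities: RDD[(Long, Long, Double)] = sc.parallelize(Seq(
  (33567921L, 42L, 0.9),
  (42L, 7000000000L, 0.3)
))
```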
