My environment is Spark 2.4.0 with xgboost4j-spark_2.11 1.0.0. When I run a pipeline containing a VectorAssembler and an ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier to fit Dataset-type data, I get an error.
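For context, here is a minimal sketch of that kind of pipeline. The column names, parameter values, and the `trainingDF` Dataset are illustrative assumptions, not taken from the report:

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.VectorAssembler
import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier

// Assemble the raw feature columns into a single vector column.
val assembler = new VectorAssembler()
  .setInputCols(Array("f0", "f1", "f2")) // hypothetical feature columns
  .setOutputCol("features")

// Configure the classifier to train on the assembled features.
val xgb = new XGBoostClassifier()
  .setFeaturesCol("features")
  .setLabelCol("label")   // hypothetical label column
  .setNumWorkers(1)       // matches DMLC_NUM_WORKER=1 in the log below
  .setNumRound(100)

// Fitting the pipeline is where the failure occurs.
val pipeline = new Pipeline().setStages(Array(assembler, xgb))
val model = pipeline.fit(trainingDF) // trainingDF: a hypothetical Dataset[Row]
```

Calling fit produces the following: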
```
Tracker started, with env={DMLC_NUM_SERVER=0, DMLC_TRACKER_URI=10.90.128.79, DMLC_TRACKER_PORT=9091, DMLC_NUM_WORKER=1}
20/05/12 12:06:02 ERROR XGBoostTaskFailedListener: Training Task Failed during XGBoost Training: TaskKilled(another attempt succeeded,Vector(AccumulableInfo(73,None,Some(332),None,false,true,None), AccumulableInfo(75,None,Some(0),None,false,true,None)),Vector(LongAccumulator(id: 73, name: Some(internal.metrics.executorRunTime), value: 332), LongAccumulator(id: 75, name: Some(internal.metrics.resultSize), value: 0))), stopping SparkContext
20/05/12 12:06:02 ERROR RabitTracker: Uncaught exception thrown by worker:
org.apache.spark.SparkException: Job 1 cancelled because SparkContext was shut down
at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:935)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:933)
at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:933)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onStop(DAGScheduler.scala:2139)
at org.apache.spark.util.EventLoop.stop(EventLoop.scala:84)
at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:2052)
at org.apache.spark.SparkContext$$anonfun$stop$6.apply$mcV$sp(SparkContext.scala:1963)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1340)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1962)
at org.apache.spark.TaskFailedListener$$anon$1$$anonfun$run$1.apply$mcV$sp(SparkParallelismTracker.scala:119)
at org.apache.spark.TaskFailedListener$$anon$1$$anonfun$run$1.apply(SparkParallelismTracker.scala:119)
at org.apache.spark.TaskFailedListener$$anon$1$$anonfun$run$1.apply(SparkParallelismTracker.scala:119)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.TaskFailedListener$$anon$1.run(SparkParallelismTracker.scala:118)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:740)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2075)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2096)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2115)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2140)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:935)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:933)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:933)
20/05/12 12:06:07 ERROR XGBoostSpark: the job was aborted due to
ml.dmlc.xgboost4j.java.XGBoostError: XGBoostModel training failed
at ml.dmlc.xgboost4j.scala.spark.XGBoost$.postTrackerReturnProcessing(XGBoost.scala:697)
at ml.dmlc.xgboost4j.scala.spark.XGBoost$.trainDistributed(XGBoost.scala:572)
at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.train(XGBoostClassifier.scala:190)
at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.train(XGBoostClassifier.scala:40)
at org.apache.spark.ml.Predictor.fit(Predictor.scala:118)
at org.apache.spark.ml.Predictor.fit(Predictor.scala:82)
at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:153)
at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:149)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableViewLike$Transformed$class.foreach(IterableViewLike.scala:44)
at scala.collection.SeqViewLike$AbstractTransformed.foreach(SeqViewLike.scala:37)
at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:149)
```
Can anyone help me? Thank you so much.
trivialfis changed the title to "[JVM Packages] ERROR XGBoostTaskFailedListener: Training Task Failed during XGBoost Training" on May 13, 2020.
#6019 added an option to avoid killing the SparkContext: set kill_spark_context_on_worker_failure to false.
Where do I set this parameter? When initializing the XGBoostClassifier, or somewhere else?
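If I understand #6019 correctly, the flag is an XGBoost parameter, so it can be passed when constructing the classifier. A minimal sketch, assuming a release that includes #6019 (it appears to have landed after 1.0.0, so upgrading may be necessary); the other parameter values are placeholders:

```scala
import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier

// Pass the flag via the params map when constructing the classifier.
// "kill_spark_context_on_worker_failure" -> false keeps the SparkContext
// alive when a training task fails, instead of shutting it down.
val xgb = new XGBoostClassifier(Map(
  "kill_spark_context_on_worker_failure" -> false,
  "num_round" -> 100, // placeholder
  "num_workers" -> 1  // placeholder
))
```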
I've reproduced this problem many times; it fails consistently, but succeeds with a smaller subset of my dataset.
Versions:
Spark 2.4
xgboost4j 0.90
JDK 1.8