RabitTracker fails to start on IPv6 Spark environment #9118
Comments
At the moment, only the Dask interface supports IPv6.
@trivialfis Any idea if using the …
Relevant changes staged here: master...dacort:xgboost:fix/spark-hostip
Maybe we should add a customization option, like what's currently in the Dask interface, so that users can pick the IP version.
Hmm, can we make PySpark XGBoost support IPv6, similar to the Dask solution? I'm not familiar with the related RABIT interface, though.
That would probably be ideal, though I'm also unsure how much effort that would take. Another short-term option might be to use only IPv4 until v6 support can be added?
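The short-term "IPv4 only" idea above could be sketched with the standard library alone: restricting name resolution to the `AF_INET` family means a dual-stack host never hands back an IPv6 address. This is only an illustration of the approach, not xgboost code; the helper name `resolve_ipv4` is hypothetical.

```python
import socket

def resolve_ipv4(host: str) -> str:
    # Hypothetical helper: restrict resolution to IPv4 (AF_INET) so a
    # dual-stack host never returns an IPv6 address for `host`.
    infos = socket.getaddrinfo(host, None, socket.AF_INET, socket.SOCK_STREAM)
    # getaddrinfo returns (family, type, proto, canonname, sockaddr);
    # for AF_INET the sockaddr is an (ip, port) tuple.
    return infos[0][4][0]

print(resolve_ipv4("localhost"))  # 127.0.0.1
```

Something along these lines could sit behind the `_get_host_ip` call until full IPv6 support lands.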
I will close this issue in favor of the original IPv6 support feature request here: #7725. Feel free to continue the discussion there.
When running a `pyspark.ml.Pipeline` fit on a `SparkXGBRegressor`, the PySpark code fails with the following error:

Stack trace

When testing `_get_host_ip` in `spark/utils.py`, it seems that `context.getTaskInfos()` returns IPv6 addresses when running in a dual-stack IPv4/IPv6 network. Additionally, the `_get_host_ip` function splits on `:`, so it only returns the first segment of the IPv6 address.

I know that there's some partial support for IPv6, but #7725 is still open. I tried fixing `_get_host_ip` to return the full IPv6 address, but still got the same error.

I was able to get it to work by instead using `get_host_ip` from `xgboost.tracker`, which returns IPv4 addresses, but I don't know if that's the "right" approach.