[RFC] 1.2.0 Release #5970

Closed
9 tasks done
hcho3 opened this issue Aug 2, 2020 · 24 comments

Comments

@hcho3
Collaborator

hcho3 commented Aug 2, 2020

Roadmap: #5734

We are about to release version 1.2.0 of XGBoost. In the next two weeks, we invite everyone to try out the release candidate (RC).

Feedback period: until the end of August 21, 2020. No new features will be added to the release; only critical bug fixes will be accepted.

@dmlc/xgboost-committer

Now available

  • Python package. RC2 available on PyPI. Try it out with the command
python3 -m pip install xgboost==1.2.0rc2
  • R package. RC2 available from the Releases section. Download the tarball file xgboost_1.2.0.1.tar.gz and run
R CMD INSTALL xgboost_1.2.0.1.tar.gz

Rendered R manual

  • JVM packages. RC2 available from our Maven repository. Add XGBoost4J as a dependency to your Java application.

Maven

<dependencies>
  ...
  <dependency>
      <groupId>ml.dmlc</groupId>
      <artifactId>xgboost4j_2.12</artifactId>
      <version>1.2.0-RC2</version>
  </dependency>
  <dependency>
      <groupId>ml.dmlc</groupId>
      <artifactId>xgboost4j-spark_2.12</artifactId>
      <version>1.2.0-RC2</version>
  </dependency>
</dependencies>

<repositories>
  <repository>
    <id>XGBoost4J Release Repo</id>
    <name>XGBoost4J Release Repo</name>
    <url>https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/release/</url>
  </repository>
</repositories>

SBT

libraryDependencies ++= Seq(
  "ml.dmlc" %% "xgboost4j" % "1.2.0-RC2",
  "ml.dmlc" %% "xgboost4j-spark" % "1.2.0-RC2"
)
resolvers += ("XGBoost4J Release Repo"
              at "https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/release/")

Starting from 1.2.0, XGBoost4J-Spark supports training with NVIDIA GPUs. To enable this capability, download artifacts suffixed with -gpu, as follows:


Maven

<dependencies>
  ...
  <dependency>
      <groupId>ml.dmlc</groupId>
      <artifactId>xgboost4j-gpu_2.12</artifactId>
      <version>1.2.0-RC2</version>
  </dependency>
  <dependency>
      <groupId>ml.dmlc</groupId>
      <artifactId>xgboost4j-spark-gpu_2.12</artifactId>
      <version>1.2.0-RC2</version>
  </dependency>
</dependencies>

<repositories>
  <repository>
    <id>XGBoost4J Release Repo</id>
    <name>XGBoost4J Release Repo</name>
    <url>https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/release/</url>
  </repository>
</repositories>

SBT

libraryDependencies ++= Seq(
  "ml.dmlc" %% "xgboost4j-gpu" % "1.2.0-RC2",
  "ml.dmlc" %% "xgboost4j-spark-gpu" % "1.2.0-RC2"
)
resolvers += ("XGBoost4J Release Repo"
              at "https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/release/")

Deprecation notices

  • Starting from this release, XGBoost requires Python 3.6 or later.
  • CUDA 10.0 or later is now required. (For the Windows platform, we require CUDA 10.1.)
  • XGBoost4J and XGBoost4J-Spark have fully transitioned to Spark 3.0.0. Therefore, Scala 2.12 is now required, and Scala 2.11 is no longer supported.

TODOs

  • Create a new branch release_1.2.0.
  • Create Python wheels and upload them to PyPI.
  • Upload RC1 to our Maven repo.
  • Create a tarball for the R package and upload it to the Releases section.
  • Write release notes.

PRs backported to the release branch.

@trivialfis
Member

Could you please add some docs on how to perform each of the listed steps as an XGBoost maintainer? (covering any XGBoost-specific issues, like the PyPI account, approvals, etc.)

@hcho3
Collaborator Author

hcho3 commented Aug 2, 2020

@trivialfis Here are the steps for the Python package:

  1. Create a new release branch.
  2. Push a commit to the branch to update the version number to RC1.
  3. Wait until the CI system builds artifacts. Currently, we use Jenkins and Travis CI to build Windows, Mac, and Linux binaries. The binaries are uploaded to https://s3-us-west-2.amazonaws.com/xgboost-nightly-builds/list.html.
  4. Now upload the binary wheels using the twine module: python -m twine upload *.whl. This will require PyPI credentials.

@trivialfis
Member

Got it. Thanks.

@hcho3
Collaborator Author

hcho3 commented Aug 2, 2020

I spent a fair amount of effort automating the building of release binaries. What's not automated is uploading them to distribution channels (PyPI, CRAN, Maven Central, etc.).

@hcho3 hcho3 pinned this issue Aug 2, 2020
@hcho3
Collaborator Author

hcho3 commented Aug 2, 2020

TODO: Investigate whether the GPU algorithm can be enabled in the JAR artifact. If the JAR file contains CUDA code, will it work on a cluster without a GPU? I will need to test and find out.

@Craigacp
Contributor

Craigacp commented Aug 4, 2020

That depends on how the library loading is done. If the GPU binary tries to dynamically load the CUDA libraries, you'll get an UnsatisfiedLinkError out of the native loader. You could probe for the presence of the CUDA libraries and then conditionally load the GPU binary, otherwise load the CPU one, but that's tricky: the loader could catch the UnsatisfiedLinkError, log it, and report that it is falling back to CPU, but the GPU binary can also fail with an UnsatisfiedLinkError for other reasons (e.g. a missing OpenMP runtime), which would be confusing.

Bundling both also blows up the JAR size, as you'd need two copies of everything. In most Java projects the CPU and GPU artifacts are separate (e.g. https://search.maven.org/search?q=g:com.microsoft.onnxruntime), but this can cause issues in downstream builds, as at some point a developer has to choose whether they want a CPU or a GPU binary. Fortunately, in production you can just drop the GPU one higher up the classpath and it'll load just fine.
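
A minimal sketch of the probe-and-fallback loading strategy described above (this is not XGBoost4J's actual loader; the class and library names are hypothetical):

// Hypothetical fallback loader: try the GPU-enabled native binary first,
// then fall back to the CPU-only binary. Library names are placeholders.
public final class FallbackNativeLoader {
    private FallbackNativeLoader() {}

    public static void load() {
        try {
            System.loadLibrary("xgboost4j_gpu");  // placeholder GPU binary
        } catch (UnsatisfiedLinkError gpuError) {
            // The GPU binary can fail for reasons unrelated to CUDA (e.g. a
            // missing OpenMP runtime), so log the original error rather than
            // swallowing it silently before falling back.
            System.err.println("GPU binary failed to load ("
                    + gpuError.getMessage() + "); falling back to CPU binary.");
            System.loadLibrary("xgboost4j");      // placeholder CPU binary
        }
    }
}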

@trivialfis
Member

A CUDA stub library is linked into XGBoost, so a link-time error should not happen. But I agree that we should have better tests.

@hcho3
Collaborator Author

hcho3 commented Aug 4, 2020

@Craigacp The GPU algorithm uses NCCL to perform allreduce, and including the NCCL library in the JAR file increases its size to 150 MB. Will Maven Central accept such a large artifact? cc @CodingCat @sperlingxx

@Craigacp
Contributor

Craigacp commented Aug 4, 2020

It does. For example, the libtensorflow jni GPU artifact for 1.15 is 355 MB.

@hcho3
Collaborator Author

hcho3 commented Aug 6, 2020

@Craigacp @wbo4958 Here are the JAR files I built with the GPU algorithm enabled:

To install:

mvn install:install-file -Dfile=./xgboost4j_2.12-1.2.0-RC1.jar -DgroupId=ml.dmlc \
    -DartifactId=xgboost4j_2.12 -Dversion=1.2.0-RC1 -Dpackaging=jar
mvn install:install-file -Dfile=./xgboost4j-spark_2.12-1.2.0-RC1.jar -DgroupId=ml.dmlc \
    -DartifactId=xgboost4j-spark_2.12 -Dversion=1.2.0-RC1 -Dpackaging=jar

@Craigacp
Contributor

Craigacp commented Aug 7, 2020

The xgboost4j_2.12-1.2.0-RC1.jar loads just fine on Oracle Linux 7 (roughly equivalent to RHEL/CentOS 7). The error message you get when you request the GPU algorithm on a CPU-only machine could do with a little prettying up, though:

|  Exception ml.dmlc.xgboost4j.java.XGBoostError: [22:26:24] /workspace/src/gbm/gbtree.cc:459: Check failed: common::AllVisibleGPUs() >= 1 (0 vs. 1) : No visible GPU is found for XGBoost.
Stack trace:
  [bt] (0) /tmp/libxgboost4j12036788363652521596.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x57) [0x7fc89b2dec37]
  [bt] (1) /tmp/libxgboost4j12036788363652521596.so(xgboost::gbm::GBTree::GetPredictor(xgboost::HostDeviceVector<float> const*, xgboost::DMatrix*) const+0x531) [0x7fc89b3c0a91]
  [bt] (2) /tmp/libxgboost4j12036788363652521596.so(xgboost::gbm::GBTree::PredictBatch(xgboost::DMatrix*, xgboost::PredictionCacheEntry*, bool, unsigned int)+0x32) [0x7fc89b3c0cc2]
  [bt] (3) /tmp/libxgboost4j12036788363652521596.so(xgboost::LearnerImpl::UpdateOneIter(int, std::shared_ptr<xgboost::DMatrix>)+0x2c1) [0x7fc89b3f1521]
  [bt] (4) /tmp/libxgboost4j12036788363652521596.so(XGBoosterUpdateOneIter+0x55) [0x7fc89b2e1785]
  [bt] (5) [0x7fc960a8a5d7]


|        at XGBoostJNI.checkCall (XGBoostJNI.java:48)
|        at Booster.update (Booster.java:180)
|        at XGBoost.trainAndSaveCheckpoint (XGBoost.java:202)
|        at XGBoost.train (XGBoost.java:284)
|        at XGBoost.train (XGBoost.java:112)
|        at XGBoost.train (XGBoost.java:83)

@hcho3
Collaborator Author

hcho3 commented Aug 7, 2020

@Craigacp Did you set tree_method='gpu_hist'? You should be able to use the CPU algorithm with tree_method='hist'.
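
For reference, here is a minimal XGBoost4J sketch of opting into a tree method; the class name, training-file path, and parameter values are illustrative, not taken from this thread:

import java.util.HashMap;
import java.util.Map;

import ml.dmlc.xgboost4j.java.Booster;
import ml.dmlc.xgboost4j.java.DMatrix;
import ml.dmlc.xgboost4j.java.XGBoost;
import ml.dmlc.xgboost4j.java.XGBoostError;

public class TreeMethodExample {
    public static void main(String[] args) throws XGBoostError {
        // "train.libsvm" is a placeholder path to LibSVM-format training data.
        DMatrix dtrain = new DMatrix("train.libsvm");

        Map<String, Object> params = new HashMap<>();
        params.put("objective", "binary:logistic");
        // "hist" runs on the CPU; "gpu_hist" is opt-in and raises the error
        // shown above when no GPU is visible.
        params.put("tree_method", "hist");

        Map<String, DMatrix> watches = new HashMap<>();
        watches.put("train", dtrain);

        Booster booster = XGBoost.train(dtrain, params, 10, watches, null, null);
    }
}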

@Craigacp
Contributor

Craigacp commented Aug 7, 2020

@hcho3 Yes, I intentionally set it to gpu_hist to see what the failure mode was on a CPU-only machine. I admit I didn't check the standard CPU algorithm in my quick test, but I assume that's still fine. The default isn't changed over to gpu_hist, right?

@hcho3
Collaborator Author

hcho3 commented Aug 7, 2020

@Craigacp No, you have to explicitly opt into gpu_hist.

@hcho3
Collaborator Author

hcho3 commented Aug 7, 2020

Thanks. We can perhaps clarify the error message about the GPU being unavailable.

@trivialfis
Member

@hcho3 All blocking PRs are merged into the master branch. I will backport them today.

@trivialfis
Member

Backport PR: #6002

@trivialfis
Member

"I will backport them today."

Merged.

@hcho3
Collaborator Author

hcho3 commented Aug 12, 2020

Great! I'm preparing RC2 now.

@hcho3 hcho3 changed the title from [RFC] 1.2.0 Release Candidate to [RFC] 1.2.0 Release Candidate 2 on Aug 12, 2020
@hcho3
Collaborator Author

hcho3 commented Aug 12, 2020

RC2 is now up. I've also uploaded the JVM packages xgboost4j-gpu and xgboost4j-spark-gpu, which have the GPU algorithm enabled.

@JohnZed
Contributor

JohnZed commented Aug 12, 2020

Maybe change the phrasing to "CUDA 10.0 or later is required"? Same for the Python version?

@hcho3
Collaborator Author

hcho3 commented Aug 12, 2020

@JohnZed Fixed.

@hcho3
Collaborator Author

hcho3 commented Aug 23, 2020

@dmlc/xgboost-committer XGBoost 1.2.0 has now been released to PyPI and our Maven repository.

@hetong007 Can we submit 1.2.0 to CRAN? Let's submit it after Aug 24, when the CRAN maintainers return from vacation.

@CodingCat We should make 1.1.1 and 1.2.0 available on Maven Central. Is there anything I can help with?

@hcho3 hcho3 closed this as completed Sep 4, 2020
@hcho3
Collaborator Author

hcho3 commented Sep 4, 2020

1.2.0 is now on CRAN.

@hcho3 hcho3 unpinned this issue Sep 4, 2020