Hi all, the XGBoost binary wheel size recently exceeded the 200 MB limit set by PyPI. I propose we remove the static linking of NCCL from our binary wheel to reduce the size. #7930 enabled distributed GPU training without NCCL, so XGBoost can run on MNMG setups without it, albeit with a possible performance loss. We can optionally emit a warning asking users to install NCCL as a runtime dependency, and dlopen the shared object when it is needed.
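For illustration, here is a minimal sketch of the dlopen approach. The function name `LoadNccl`, the soname fallback order, the warning text, and the use of `int` in place of `ncclResult_t` are all assumptions for the sketch, not XGBoost's actual loader:

```cpp
#include <dlfcn.h>
#include <cstdio>

// Hypothetical loader: try the versioned soname first, then the bare name.
// Returns nullptr (and warns) when NCCL is not installed.
void* LoadNccl() {
  void* handle = dlopen("libnccl.so.2", RTLD_NOW | RTLD_LOCAL);
  if (!handle) {
    handle = dlopen("libnccl.so", RTLD_NOW | RTLD_LOCAL);
  }
  if (!handle) {
    std::fprintf(stderr,
                 "NCCL not found; falling back to a slower non-NCCL "
                 "communicator. Install NCCL for best multi-GPU "
                 "performance.\n");
    return nullptr;
  }
  return handle;
}

// ncclGetVersion(int*) is a real NCCL symbol; it returns ncclResult_t,
// approximated here as int (ncclSuccess == 0).
using NcclGetVersionFn = int (*)(int*);

int main() {
  if (void* handle = LoadNccl()) {
    auto get_version =
        reinterpret_cast<NcclGetVersionFn>(dlsym(handle, "ncclGetVersion"));
    int version = 0;
    if (get_version && get_version(&version) == 0) {
      std::printf("Loaded NCCL version %d\n", version);
    }
    dlclose(handle);
  }
  return 0;
}
```

(Link with `-ldl` on Linux.) The point is that the wheel itself carries no NCCL bits; the library is resolved at runtime only when a multi-GPU code path actually needs it.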
Alternatives:
- Compile our own NCCL with a smaller set of supported archs. Removing sm_35 and sm_50 from the NCCL build yields a stripped static library (`build/lib/libnccl_static.a`) of 190 MB. I'm not sure this actually helps in the long term.
- Reduce the number of GPU archs supported by XGBoost.
- Maybe submit a feature request for NCCL to publish a binary wheel on PyPI?

Related: cupy/cupy#4850
@RAMitchell worries that the performance hit might be too large and that a simple warning is not sufficient to get users to install NCCL (which is indeed not an easy task for pip users).