CUDA Container Build Error (Ubuntu 20.04 base image) #5415

Closed
Andreas237 opened this issue Aug 11, 2022 · 6 comments

@Andreas237

Description

Building LightGBM and its Python interface goes smoothly. However, when I test the install I receive a "ModuleNotFound" error. My goal is to compile LightGBM for CUDA 11.5 for use with the TensorFlow 2.9.1-gpu base image. Docker build output is:

Reproducible example

FROM tensorflow/tensorflow:2.9.1-gpu

RUN mkdir /work
WORKDIR /work

#################################################################################################################
#           Global
#################################################################################################################
# apt-get to skip any interactive post-install configuration steps with DEBIAN_FRONTEND=noninteractive and apt-get install -y

ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
ARG DEBIAN_FRONTEND=noninteractive

#################################################################################################################
#           Global Path Setting
#################################################################################################################

ENV CUDA_HOME /usr/local/cuda
ENV LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:${CUDA_HOME}/lib64
ENV LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:/usr/local/lib

ENV OPENCL_LIBRARIES /usr/local/cuda/lib64
ENV OPENCL_INCLUDE_DIR /usr/local/cuda/include

ENV PYTHONPATH /usr/lib64/python3.8/site-packages:/work/src:/work/src/include:..:.

#################################################################################################################
#           TINI
#################################################################################################################

# Install tini
ENV TINI_VERSION v0.14.0
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /tini
RUN chmod +x /tini

#################################################################################################################
#           SYSTEM
#################################################################################################################
# update: downloads the package lists from the repositories and "updates" them to get information on the newest versions of packages and their
# dependencies. It will do this for all repositories and PPAs.
RUN echo "deb [trusted=yes] http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 /" > /etc/apt/sources.list.d/cuda.list
RUN apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub
RUN echo "deb [trusted=yes] http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64 /" > /etc/apt/sources.list.d/tensorRT.list 
RUN apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64/7fa2af80.pub



RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    build-essential \
    curl \
    bzip2 \
    ca-certificates \
    libglib2.0-0 \
    libxext6 \
    libsm6 \
    libxrender1 \
    git \
    cmake \
    libboost-dev \
    libboost-system-dev \
    libboost-filesystem-dev \
    gcc \
    g++ 



# Add OpenCL ICD files for LightGBM
RUN mkdir -p /etc/OpenCL/vendors && \
    echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd

#################################################################################################################
#           PIP
#################################################################################################################

# Python Dependencies

RUN pip3 install --upgrade pip
RUN pip3 install \
        numpy \
        protobuf \
        sklearn==0.0 \
        scikit-optimize==0.9.0 \
        imblearn==0.0 \
        pandas==1.4.3 \
        redis==4.3.4 \
        tensorflow-serving-api-gpu==2.9.1 \
        tensorflow_probability==0.17.0 \
        gast \
        connexion[swagger-ui]==2.14.0 \
        SharedArray \
        python-socketio[client]==5.7.1 \
        flask_cors \
        filelock \
        "setuptools>=58.2.0" \
        scipy \
        scikit-learn


#################################################################################################################
#           LightGBM
#################################################################################################################

RUN cd /usr/local/src && mkdir lightgbm && cd lightgbm && \
    git config --global http.sslverify false && \
    git clone --recursive --branch v3.1.1 --depth 1 https://github.com/microsoft/LightGBM && \
    cd LightGBM && mkdir build && cd build && \
    cmake -DUSE_GPU=1 -DOpenCL_LIBRARY=/usr/local/cuda/lib64/libOpenCL.so -DOpenCL_INCLUDE_DIR=/usr/local/cuda/include/ -DUSE_CUDA=1 -DUSE_DEBUG=1 .. && \
    make OPENCL_HEADERS=/usr/local/cuda/include/ LIBOPENCL=/usr/local/cuda/lib64/libOpenCL.so



ENV PATH /usr/local/src/lightgbm/LightGBM:${PATH}
RUN /bin/bash -c "cd /usr/local/src/lightgbm/LightGBM/python-package && python3 setup.py install --cuda --opencl-include-dir=/usr/local/cuda/include/ --opencl-library=/usr/local/cuda/lib64/libOpenCL.so"
# RUN pip3 install lightgbm

# Check that the Python lib at least exists
RUN cd /usr/local/src/lightgbm/LightGBM/examples/python-guide && python3 simple_example.py

Environment info

GPU: NVIDIA A100
Host OS: CentOS 7
Architecture: x86_64

LightGBM version or commit hash

v3.3.2

Command(s) you used to install LightGBM

Please see the Dockerfile. I found the install instructions at lightgbm.readthedocs.io. Note: the flag -DUSE_CUDA_EXP=1 produces the following warning during docker build, so I used -DUSE_CUDA=1 instead.

  CMake Warning:
      Manually-specified variables were not used by the project:
          USE_CUDA_EXP

I double-checked the Python-interface instructions on GitHub. Note that using python3 setup.py install --cuda-exp yields the following error during docker build.

Step 25/26 : RUN /bin/bash -c "cd /usr/local/src/lightgbm/LightGBM/python-package && python3 setup.py install --cuda-exp --opencl-include-dir=/usr/local/cuda/include/ --opencl-library=/usr/local/cuda/lib64/libOpenCL.so"
 ---> Running in 7520b681f51d
usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
   or: setup.py --help [cmd1 cmd2 ...]
   or: setup.py --help-commands
   or: setup.py cmd --help
error: option --cuda-exp not recognized

Additional Comments

If the line RUN /bin/bash -c "cd /usr/local/src/lightgbm/LightGBM/python-package && python3 setup.py install --cuda --opencl-include-dir=/usr/local/cuda/include/ --opencl-library=/usr/local/cuda/lib64/libOpenCL.so" is replaced with RUN pip3 install lightgbm then the image builds.

So somehow following the instructions isn't putting LightGBM into site packages?
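A lighter check than running simple_example.py is to ask the interpreter where it would load the module from. The following is a minimal sketch, not part of the original report; the helper name find_module_location is hypothetical, and it only uses the standard library's importlib.util.find_spec.

```python
import importlib.util


def find_module_location(module_name):
    """Return where a top-level module would be loaded from, or None if it is not importable."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    # Regular packages report their __init__.py in `origin`; namespace
    # packages have no single file, so fall back to their search locations.
    if spec.origin:
        return spec.origin
    return str(list(spec.submodule_search_locations or []))


if __name__ == "__main__":
    location = find_module_location("lightgbm")
    if location is None:
        print("lightgbm is not importable by this interpreter")
    else:
        print(f"lightgbm would be imported from {location}")
```

Run with the same python3 that the later build steps use, this separates "the package was never installed where this interpreter looks" from "the package imports but the example fails for another reason".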

@jameslamb
Collaborator

Thanks for using LightGBM, and for the excellent write-up with a reproducible example!

The issue you're facing is because of this line:

git clone --recursive --branch v3.1.1 --depth 1 https://github.com/microsoft/LightGBM

The cuda_exp build of LightGBM wasn't available in release v3.1.1. That work, originally introduced in #4630, was only added a few months ago.

I recommend removing --branch v3.1.1 from that command.
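One way to confirm that a given checkout knows about a build flag before passing it to cmake is to look for its declaration in the top-level CMakeLists.txt. This is a rough sketch under the assumption that the flag is declared with a CMake option() call in that file; the helper name declares_cmake_option is hypothetical.

```python
import re
from pathlib import Path


def declares_cmake_option(source_dir, option_name):
    """Return True if <source_dir>/CMakeLists.txt contains option(<option_name> ...)."""
    cmake_file = Path(source_dir) / "CMakeLists.txt"
    if not cmake_file.is_file():
        return False
    text = cmake_file.read_text(encoding="utf-8", errors="replace")
    # CMake command names are case-insensitive, variable names are not,
    # so only the word "option" is matched case-insensitively.
    pattern = r"^\s*(?i:option)\s*\(\s*" + re.escape(option_name) + r"\b"
    return re.search(pattern, text, flags=re.MULTILINE) is not None


if __name__ == "__main__":
    # Path taken from the Dockerfile in this thread.
    checkout = "/usr/local/src/lightgbm/LightGBM"
    for flag in ("USE_CUDA", "USE_CUDA_EXP"):
        print(flag, declares_cmake_option(checkout, flag))
```

A False result for USE_CUDA_EXP would be consistent with the "Manually-specified variables were not used by the project" warning quoted earlier in the thread.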

@Andreas237
Author

Hi @jameslamb,

Thank you for your quick reply! I added the step RUN python3 -m pip show lightgbm to the Dockerfile, and the build now fails there because LightGBM cannot be found.

Build log

Step 1/27 : FROM tensorflow/tensorflow:2.9.1-gpu
 ---> 2448d376890d
Step 2/27 : RUN mkdir /work
 ---> Using cache
 ---> e5f233caebbf
Step 3/27 : WORKDIR /work
 ---> Using cache
 ---> e2b3b12dcfbc
Step 4/27 : ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
 ---> Using cache
 ---> 139046b856cd
Step 5/27 : ARG DEBIAN_FRONTEND=noninteractive
 ---> Using cache
 ---> d64e8eedea1c
Step 6/27 : ENV CUDA_HOME /usr/local/cuda
 ---> Using cache
 ---> ff5ed87e94c6
Step 7/27 : ENV LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:${CUDA_HOME}/lib64
 ---> Using cache
 ---> 7781528dedbf
Step 8/27 : ENV LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:/usr/local/lib
 ---> Using cache
 ---> dc4049e5d742
Step 9/27 : ENV OPENCL_LIBRARIES /usr/local/cuda/lib64
 ---> Using cache
 ---> 266cf74eb22a
Step 10/27 : ENV OPENCL_INCLUDE_DIR /usr/local/cuda/include
 ---> Using cache
 ---> 2f1f6ff03a22
Step 11/27 : ENV PYTHONPATH /usr/lib64/python3.8/site-packages:/work/src:/work/src/include:..:.
 ---> Using cache
 ---> 9dd6d5336006
Step 12/27 : ENV TINI_VERSION v0.14.0
 ---> Using cache
 ---> 3156bebb0eb9
Step 13/27 : ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /tini
Downloading [==================================================>]  19.89kB/19.89kB
 ---> Using cache
 ---> 297a2008a059
Step 14/27 : RUN chmod +x /tini
 ---> Using cache
 ---> e054e9df7fc5
Step 15/27 : RUN echo "deb [trusted=yes] http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 /" > /etc/apt/sources.list.d/cuda.list
 ---> Using cache
 ---> ad8e44eed6ac
Step 16/27 : RUN apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub
 ---> Using cache
 ---> 7d71f173b772
Step 17/27 : RUN echo "deb [trusted=yes] http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64 /" > /etc/apt/sources.list.d/tensorRT.list
 ---> Using cache
 ---> c93573fb7496
Step 18/27 : RUN apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64/7fa2af80.pub
 ---> Using cache
 ---> 75a0558b34e2
Step 19/27 : RUN apt-get update &&     apt-get install -y --no-install-recommends     build-essential     curl     bzip2     ca-certificates     libglib2.0-0     libxext6     libsm6     libxrender1     git     cmake     libboost-dev     libboost-system-dev     libboost-filesystem-dev     gcc     g++
 ---> Using cache
 ---> 6608b5c003ae
Step 20/27 : RUN mkdir -p /etc/OpenCL/vendors &&     echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd
 ---> Using cache
 ---> d7c64f2fbd7a
Step 21/27 : RUN pip3 install --upgrade pip
 ---> Using cache
 ---> 5355356e1bbf
Step 22/27 : RUN pip3 install         numpy         protobuf         sklearn==0.0         scikit-optimize==0.9.0         imblearn==0.0         pandas==1.4.3         redis==4.3.4         tensorflow-serving-api-gpu==2.9.1         tensorflow_probability==0.17.0         gast         connexion[swagger-ui]==2.14.0         SharedArray         python-socketio[client]==5.7.1         flask_cors         filelock         setuptools>=58.2.0         scipy         scikit-learn
 ---> Using cache
 ---> 03700cadecf9
Step 23/27 : RUN cd /usr/local/src && mkdir lightgbm && cd lightgbm &&     git config --global http.sslverify false &&     git clone --recursive --depth 1 https://github.com/microsoft/LightGBM &&     cd LightGBM && mkdir build && cd build &&     cmake -DUSE_GPU=1 -DOpenCL_LIBRARY=/usr/local/cuda/lib64/libOpenCL.so -DOpenCL_INCLUDE_DIR=/usr/local/cuda/include/ -DUSE_CUDA=1 -DUSE_DEBUG=1 .. &&     make OPENCL_HEADERS=/usr/local/cuda/include/ LIBOPENCL=/usr/local/cuda/lib64/libOpenCL.so
 ---> Using cache
 ---> e4cc92783395
Step 24/27 : ENV PATH /usr/local/src/lightgbm/LightGBM:${PATH}
 ---> Using cache
 ---> 18c664a261e2
Step 25/27 : RUN /bin/bash -c "cd /usr/local/src/lightgbm/LightGBM/python-package && python3 setup.py install --cuda --opencl-include-dir=/usr/local/cuda/include/ --opencl-library=/usr/local/cuda/lib64/libOpenCL.so"
 ---> Using cache
 ---> e993681eb3cb
Step 26/27 : RUN python3 -m pip show lightgbm
 ---> Running in 9dedae496e14
WARNING: Package(s) not found: lightgbm
The command '/bin/bash -c python3 -m pip show lightgbm' returned a non-zero code: 1

Dockerfile

FROM tensorflow/tensorflow:2.9.1-gpu


RUN mkdir /work
WORKDIR /work

#################################################################################################################
#           Global
#################################################################################################################
# apt-get to skip any interactive post-install configuration steps with DEBIAN_FRONTEND=noninteractive and apt-get install -y

ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
ARG DEBIAN_FRONTEND=noninteractive

#################################################################################################################
#           Global Path Setting
#################################################################################################################

ENV CUDA_HOME /usr/local/cuda
ENV LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:${CUDA_HOME}/lib64
ENV LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:/usr/local/lib

ENV OPENCL_LIBRARIES /usr/local/cuda/lib64
ENV OPENCL_INCLUDE_DIR /usr/local/cuda/include

ENV PYTHONPATH /usr/lib64/python3.8/site-packages:/work/src:/work/src/include:..:.

#################################################################################################################
#           TINI
#################################################################################################################

# Install tini
ENV TINI_VERSION v0.14.0
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /tini
RUN chmod +x /tini

#################################################################################################################
#           SYSTEM
#################################################################################################################
# update: downloads the package lists from the repositories and "updates" them to get information on the newest versions of packages and their
# dependencies. It will do this for all repositories and PPAs.
# RUN echo "deb [trusted=yes] http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /" > /etc/apt/sources.list.d/cuda.list
# RUN apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub
RUN echo "deb [trusted=yes] http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 /" > /etc/apt/sources.list.d/cuda.list
RUN apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub
RUN echo "deb [trusted=yes] http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64 /" > /etc/apt/sources.list.d/tensorRT.list 
RUN apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64/7fa2af80.pub









RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    build-essential \
    curl \
    bzip2 \
    ca-certificates \
    libglib2.0-0 \
    libxext6 \
    libsm6 \
    libxrender1 \
    git \
    cmake \
    libboost-dev \
    libboost-system-dev \
    libboost-filesystem-dev \
    gcc \
    g++ 



# Add OpenCL ICD files for LightGBM
RUN mkdir -p /etc/OpenCL/vendors && \
    echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd

#################################################################################################################
#           PIP
#################################################################################################################

# Python Dependencies

RUN pip3 install --upgrade pip
RUN pip3 install \
        numpy \
        protobuf \
        sklearn==0.0 \
        scikit-optimize==0.9.0 \
        imblearn==0.0 \
        pandas==1.4.3 \
        redis==4.3.4 \
        tensorflow-serving-api-gpu==2.9.1 \
        tensorflow_probability==0.17.0 \
        gast \
        connexion[swagger-ui]==2.14.0 \
        SharedArray \
        python-socketio[client]==5.7.1 \
        flask_cors \
        filelock \
        "setuptools>=58.2.0" \
        scipy \
        scikit-learn


#################################################################################################################
#           LightGBM
#################################################################################################################

# Change line:     git clone --recursive --branch v3.1.1 --depth 1 https://github.com/microsoft/LightGBM && \
RUN cd /usr/local/src && mkdir lightgbm && cd lightgbm && \
    git config --global http.sslverify false && \
    git clone --recursive --depth 1 https://github.com/microsoft/LightGBM && \
    cd LightGBM && mkdir build && cd build && \
    cmake -DUSE_GPU=1 -DOpenCL_LIBRARY=/usr/local/cuda/lib64/libOpenCL.so -DOpenCL_INCLUDE_DIR=/usr/local/cuda/include/ -DUSE_CUDA=1 -DUSE_DEBUG=1 .. && \
    make OPENCL_HEADERS=/usr/local/cuda/include/ LIBOPENCL=/usr/local/cuda/lib64/libOpenCL.so



ENV PATH /usr/local/src/lightgbm/LightGBM:${PATH}
RUN /bin/bash -c "cd /usr/local/src/lightgbm/LightGBM/python-package && python3 setup.py install --cuda --opencl-include-dir=/usr/local/cuda/include/ --opencl-library=/usr/local/cuda/lib64/libOpenCL.so"
# RUN pip3 install lightgbm

RUN python3 -m pip show lightgbm

# Check that the Python lib at least exists
RUN cd /usr/local/src/lightgbm/LightGBM/examples/python-guide && \
    python3 simple_example.py

@jameslamb
Collaborator

Try changing

python3 -m pip show lightgbm

to

pip3 show lightgbm

I'm not familiar with the base image you're using, but it seems to me that your code cares about the difference between pip vs. pip3 and python vs. python3 so maybe its paths or aliases are set up in a way that you need to use those version-specific commands.
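To see whether the version-specific commands resolve to different executables, one can list what each name points to on PATH next to the interpreter that is currently running. This is only an illustrative sketch; the helper name locate_tools is hypothetical, and the result depends entirely on the image it runs in.

```python
import shutil
import sys


def locate_tools(names=("python", "python3", "pip", "pip3")):
    """Map each command name to the executable found on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # `python3 -m pip` always runs pip under the interpreter printed here,
    # whereas a bare `pip` or `pip3` runs whatever the PATH lookup below finds.
    print("running interpreter:", sys.executable)
    for name, path in locate_tools().items():
        print(f"{name:8s} -> {path}")
```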

@Andreas237
Author

I only use the python3 and pip3 notation for clarity; the base image has no Python 2.x. The issue ended up being that setup.py didn't place the module in site-packages. Adding the --user flag fixed it!
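The thread does not record why the default install location was not picked up, so the following is only a diagnostic sketch: it lists the system and per-user site-packages directories known to the interpreter and whether each is currently on sys.path. The helper name install_locations is hypothetical. It may be worth comparing the output with the PYTHONPATH value set in the Dockerfile above.

```python
import site
import sys


def install_locations():
    """List candidate site-packages directories and whether sys.path includes each."""
    candidates = list(site.getsitepackages())
    # The per-user directory is where `setup.py install --user` and
    # `pip install --user` place packages.
    candidates.append(site.getusersitepackages())
    return [(path, path in sys.path) for path in candidates]


if __name__ == "__main__":
    print("user site enabled:", site.ENABLE_USER_SITE)
    for path, on_path in install_locations():
        print(f"{'on sys.path' if on_path else 'NOT on sys.path':16s} {path}")
```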

@jameslamb
Collaborator

I see, ok. Glad that worked for you!

Thanks for using LightGBM, and for taking the time to come back and close this with an explanation that other people can find from search engines 😊

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 19, 2023