On a CPU-only machine, importing torch raises OSError: libcublas.so.11: cannot open shared object file #88869
Comments
[Edit] Hmm, I can reproduce this even though
And the reason for that is:
cc: @syed-ahmed @ptrblck
@weiliw-amz but to unblock yourself, please consider installing a CPU-only version of PyTorch, by running
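(The actual command was cut off in this capture. As a hedged reconstruction: CPU-only wheels are typically installed by pointing pip at PyTorch's CPU wheel index — the exact flag and URL vary by release, so check the install selector on pytorch.org for your version.)

```shell
# Hypothetical reconstruction; verify against pytorch.org before running.
pip install torch --index-url https://download.pytorch.org/whl/cpu
```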
Interestingly enough, I can reproduce this with the I wonder what makes the
@seemethere I guess On
And if one were to look into
@syed-ahmed @ptrblck Any recommendations on how to deal with this issue?
IMO RPATH would not work here, as it's not a relative path but rather a different absolute path, and it could be vastly different if users choose to use Or, maybe we should change the wheels to be non-purelib (that way PyTorch and its dependencies will always be installed in the same folder)
@malfet according to the definition at https://peps.python.org/pep-0427/#what-s-the-deal-with-purelib-vs-platlib
From what I see, PyTorch Linux wheels should be supported on any Linux platform, and so should the cuDNN wheels: https://pypi.org/project/nvidia-cudnn-cu11/ Should we just align to use the same
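For context, the purelib and platlib install locations can be inspected with the standard-library `sysconfig` module. On distros like Amazon Linux 2 they resolve to different directories (`lib/...` vs `lib64/...`), which is how a platlib `torch` wheel and purelib `nvidia-*` wheels end up in different folders:

```python
import sysconfig

# Pure-Python packages (purelib) and platform-specific packages (platlib)
# may be installed to different directories on some distros,
# e.g. .../lib/python3.x/site-packages vs .../lib64/python3.x/site-packages
print(sysconfig.get_path("purelib"))
print(sysconfig.get_path("platlib"))
```

On most Debian/Ubuntu and macOS systems the two paths are identical, which is why the bug only bites on lib64-style layouts.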
It could be done, but I think it will take a while. Something like this would fix the problem, and it seems like a much less intrusive change than modifying the package location:
```python
import ctypes

try:
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
except OSError:
    ctypes.CDLL("/usr/local/lib/python3.7/site-packages/nvidia/cublas/lib/libcublas.so.11")
    ctypes.CDLL("/usr/local/lib/python3.7/site-packages/nvidia/cudnn/lib/libcudnn.so.8")
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
```
If PyTorch is packaged into a wheel alongside [nvidia-cublas-cu11](https://pypi.org/project/nvidia-cublas-cu11/), which is designated as PureLib while the `torch` wheel is not, this can cause a torch_globals loading problem. Fix that by searching for `nvidia/cublas/lib/libcublas.so.11` and `nvidia/cudnn/lib/libcudnn.so.8` across all `sys.path` folders. Test plan:
```
docker pull amazonlinux:2
docker run --rm -t amazonlinux:2 bash -c 'yum install -y python3 python3-devel python3-distutils patch; python3 -m pip install torch==1.13.0; curl -OL https://patch-diff.githubusercontent.com/raw/pytorch/pytorch/pull/90411.diff; pushd /usr/local/lib64/python3.7/site-packages; patch -p1 </90411.diff; popd; python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"'
```
Fixes pytorch#88869
Pull Request resolved: pytorch#90411
Approved by: https://github.com/atalman
🐛 Describe the bug
On a CPU-only machine, in the latest Amazon Linux 2 Docker image, I installed the latest PyTorch (1.13) via pip, imported it, and received this error.
However, in my view, PyTorch should not load GPU-related components on a CPU-only machine without CUDA and NVIDIA-related packages installed?
Steps to reproduce:
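(The original steps were truncated in this capture. Reconstructed from the test plan quoted later in the thread, the reproduction is roughly:)

```shell
# Reconstruction based on the PR's test plan, minus the patch step.
docker run --rm -t amazonlinux:2 bash -c '
  yum install -y python3
  python3 -m pip install torch==1.13.0
  python3 -c "import torch"
'
```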
Then I get this error message:
Versions
Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A
OS: Amazon Linux 2 (x86_64)
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.2.5
Python version: 3.7.10 (default, Jun 3 2021, 00:02:01) [GCC 7.3.1 20180712 (Red Hat 7.3.1-13)] (64-bit runtime)
Python platform: Linux-5.4.214-134.408.amzn2int.x86_64-x86_64-with-glibc2.2.5
Is CUDA available: N/A
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A
Versions of relevant libraries:
[pip3] torch==1.13.0
[conda] Could not collect
cc @ezyang @gchanan @zou3519 @seemethere @malfet