Small wheels for 2.1.0 release candidate is not usable on AmazonLinux #109221
Comments
Can you also please include the install command and which version is getting installed?
Trying to install on Amazon Linux 2 installs 1.13.1 for some reason:
Manually downloading the correct wheel:
The same command works in our validation environment:
Here is metadata for this wheel:
Here is the metadata for 2.0.1:
Looks like we need to strip `Requires-Dist: pytorch-triton (==2.1.0)` from the metadata and rebuild this wheel.
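To make the metadata suggestion above concrete, here is a minimal sketch of listing and filtering `Requires-Dist` entries. The `SAMPLE_METADATA` text and the helper names are hypothetical and are not the real contents of the 2.1.0 wheel; only the `pytorch-triton (==2.1.0)` line comes from this thread. Actually rebuilding a wheel would involve more than this (for example, updating its `RECORD` file), which the sketch does not attempt.

```python
import re
from email.parser import Parser

# Hypothetical METADATA excerpt; only the pytorch-triton line is taken from the thread.
SAMPLE_METADATA = """\
Metadata-Version: 2.1
Name: torch
Version: 2.1.0
Requires-Dist: filelock
Requires-Dist: pytorch-triton (==2.1.0)
"""


def requires_dist(metadata_text):
    """Return every Requires-Dist value found in the metadata headers."""
    message = Parser().parsestr(metadata_text)
    return message.get_all("Requires-Dist") or []


def drop_requirement(metadata_text, project):
    """Return the metadata text without Requires-Dist lines naming `project`."""
    kept = []
    for line in metadata_text.splitlines(keepends=True):
        if line.startswith("Requires-Dist:"):
            requirement = line.split(":", 1)[1].strip()
            name = re.match(r"[A-Za-z0-9._-]+", requirement)
            if name and name.group(0).lower() == project.lower():
                continue  # skip the unwanted dependency line
        kept.append(line)
    return "".join(kept)


if __name__ == "__main__":
    print(requires_dist(SAMPLE_METADATA))
    print(requires_dist(drop_requirement(SAMPLE_METADATA, "pytorch-triton")))
```

Running the same listing against the real `METADATA` file from the downloaded wheel would show whether the extra dependency is present.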
@atalman remove 2nd
The same command works fine on Ubuntu 22.04, because Red Hat-flavored distros are the only ones that have different system and local installation folders.
OK, the regression is due to the library name changes between the nvidia-cuda-11 and nvidia-cuda-12 PyPI packages (i.e. the lack of `libcuXYZ.so.${MAJOR}.${MINOR}`). The following diff fixes the problem:
# diff -u /usr/local/lib64/python3.9/site-packages/torch/__init__.py __init__.py
--- /usr/local/lib64/python3.9/site-packages/torch/__init__.py 2023-09-13 19:25:52.602695425 +0000
+++ __init__.py 2023-09-13 19:25:27.010624132 +0000
@@ -174,13 +174,13 @@
ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
except OSError as err:
# Can only happen for wheel with cuda libs as PYPI deps
- # As PyTorch is not purelib, but nvidia-*-cu11 is
+ # As PyTorch is not purelib, but nvidia-*-cu12 is
cuda_libs: Dict[str, str] = {
'cublas': 'libcublas.so.*[0-9]',
'cudnn': 'libcudnn.so.*[0-9]',
- 'cuda_nvrtc': 'libnvrtc.so.*[0-9].*[0-9]',
- 'cuda_runtime': 'libcudart.so.*[0-9].*[0-9]',
- 'cuda_cupti': 'libcupti.so.*[0-9].*[0-9]',
+ 'cuda_nvrtc': 'libnvrtc.so.*[0-9]',
+ 'cuda_runtime': 'libcudart.so.*[0-9]',
+ 'cuda_cupti': 'libcupti.so.*[0-9]',
'cufft': 'libcufft.so.*[0-9]',
'curand': 'libcurand.so.*[0-9]',
'cusolver': 'libcusolver.so.*[0-9]',
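The effect of the pattern change in the diff can be checked in isolation. This is a minimal sketch that assumes the patterns are matched with glob-style rules (Python's `fnmatch` is what `glob` uses for file names); the file names are the ones reported in this thread for the cu11 and cu12 nvrtc wheels, and the helper name `matches` is hypothetical.

```python
import fnmatch

OLD_PATTERN = "libnvrtc.so.*[0-9].*[0-9]"  # requires MAJOR.MINOR in the file name
NEW_PATTERN = "libnvrtc.so.*[0-9]"         # accepts MAJOR alone or MAJOR.MINOR

CU11_NAME = "libnvrtc.so.11.2"  # shipped in nvidia_cuda_nvrtc_cu11-11.7.99
CU12_NAME = "libnvrtc.so.12"    # shipped in nvidia_cuda_nvrtc_cu12-12.1.105


def matches(pattern, filename):
    """Case-sensitive glob-style match of a bare file name."""
    return fnmatch.fnmatchcase(filename, pattern)


if __name__ == "__main__":
    for pattern in (OLD_PATTERN, NEW_PATTERN):
        for name in (CU11_NAME, CU12_NAME):
            print(f"{pattern!r} vs {name!r}: {matches(pattern, name)}")
```

Under these assumptions the old pattern matches only the cu11 name, while the new pattern matches both, which is consistent with the regression appearing after the move to CUDA-12.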
Or any other distro that has different purelib and platlib paths.

The regression was introduced when the small wheel base dependency was migrated from CUDA-11 to CUDA-12. Not sure why, but the minor version is no longer part of the library file names shipped in the following CUDA-12 packages:
- nvidia_cuda_nvrtc_cu12-12.1.105
- nvidia-cuda-cupti-cu12-12.1.105
- nvidia-cuda-cupti-cu12-12.1.105

But those were present in the CUDA-11 release.

Fixes #109221
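Whether a given interpreter has separate purelib and platlib directories can be checked with the standard `sysconfig` module. This is a small sketch; the function name `site_packages_layout` is hypothetical, and the result depends entirely on the distro and Python build it runs on.

```python
import sysconfig


def site_packages_layout():
    """Report the purelib/platlib install paths and whether they differ."""
    paths = sysconfig.get_paths()
    purelib = paths["purelib"]  # where pure-Python packages are installed
    platlib = paths["platlib"]  # where packages with compiled code are installed
    return {"purelib": purelib, "platlib": platlib, "split": purelib != platlib}


if __name__ == "__main__":
    layout = site_packages_layout()
    print("purelib:", layout["purelib"])
    print("platlib:", layout["platlib"])
    print("different paths:", layout["split"])
```

On a system where the two paths differ, a platlib package such as torch and purelib packages such as the nvidia-* wheels end up in different directories, which is the situation described above.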
This feels separate; should we file a different issue for it? Also, let's add a test for that (that wheel has only one
Or any other distro that have different purelib and platlib paths

Regression was introduced, when small wheel base dependency was migrated from CUDA-11 to CUDA-12

Not sure why, but minor version of the package is no longer shipped with following CUDA-12:
- nvidia_cuda_nvrtc_cu12-12.1.105
- nvidia-cuda-cupti-cu12-12.1.105
- nvidia-cuda-cupti-cu12-12.1.105

But those were present in CUDA-11 release, i.e:
``` shell
bash-5.2# curl -OL https://files.pythonhosted.org/packages/ef/25/922c5996aada6611b79b53985af7999fc629aee1d5d001b6a22431e18fec/nvidia_cuda_nvrtc_cu11-11.7.99-2-py3-none-manylinux1_x86_64.whl; unzip -t nvidia_cuda_nvrtc_cu11-11.7.99-2-py3-none-manylinux1_x86_64.whl |grep \.so
    testing: nvidia/cuda_nvrtc/lib/libnvrtc-builtins.so.11.7   OK
    testing: nvidia/cuda_nvrtc/lib/libnvrtc.so.11.2   OK
bash-5.2# curl -OL https://files.pythonhosted.org/packages/b6/9f/c64c03f49d6fbc56196664d05dba14e3a561038a81a638eeb47f4d4cfd48/nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl; unzip -t nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl|grep \.so
    testing: nvidia/cuda_nvrtc/lib/libnvrtc-builtins.so.12.1   OK
    testing: nvidia/cuda_nvrtc/lib/libnvrtc.so.12   OK
```

Fixes pytorch#109221

Pull Request resolved: pytorch#109244
Approved by: https://github.com/huydhn
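The same inspection can be sketched in Python, since a wheel is a zip archive. To stay self-contained, the example builds a stand-in archive in memory rather than downloading anything; the helper names `shared_libs` and `fake_wheel` are hypothetical, and the member names are copied from the cu12 listing above.

```python
import io
import re
import zipfile


def shared_libs(wheel_file):
    """List archive members whose names end in .so or .so.<digits>[.<digits>...]."""
    with zipfile.ZipFile(wheel_file) as wheel:
        return sorted(
            name for name in wheel.namelist() if re.search(r"\.so(\.\d+)*$", name)
        )


def fake_wheel(member_names):
    """Build an in-memory zip with empty members, standing in for a real wheel."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in member_names:
            archive.writestr(name, b"")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    wheel = fake_wheel([
        "nvidia/cuda_nvrtc/lib/libnvrtc-builtins.so.12.1",
        "nvidia/cuda_nvrtc/lib/libnvrtc.so.12",
        "nvidia/cuda_nvrtc/__init__.py",
    ])
    for name in shared_libs(wheel):
        print(name)
```

Passing the path of a downloaded `.whl` file to `shared_libs` instead of the in-memory stand-in should produce a listing comparable to the `unzip -t` output.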
Fixes #109221

This is a cherry-pick of #109244 into release/2.1 branch
🐛 Describe the bug
Run:
Please note that the above works as expected with the 2.0 release, i.e. #88869 was reintroduced in the trunk/2.1.0 branch.
Versions
2.1.0
cc @ezyang @gchanan @zou3519 @kadeng @seemethere @ptrblck