Add manual cuda deps search logic #90411

Closed
malfet wants to merge 3 commits

malfet commented Dec 7, 2022

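If PyTorch is packaged into a wheel alongside nvidia-cublas-cu11, which is designated as PureLib while the torch wheel is not, the two can be installed into different site-packages directories (for example lib and lib64, as in the test plan below), which can cause a torch_globals loading problem.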
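
To see whether a given interpreter separates the two install locations, the standard `sysconfig` module can be queried. This is only a diagnostic sketch; the helper name `site_packages_dirs` is made up for illustration and is not part of this PR.

```python
import sysconfig


def site_packages_dirs():
    """Return the (purelib, platlib) install directories for this interpreter."""
    paths = sysconfig.get_paths()
    return paths["purelib"], paths["platlib"]


purelib, platlib = site_packages_dirs()
print("purelib:", purelib)   # where pure-Python wheels are installed
print("platlib:", platlib)   # where platform-specific wheels are installed
print("same directory:", purelib == platlib)
```

When the last line prints `False`, a PureLib wheel and a platform wheel end up in different directories, which is the situation described above.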

Fix that by searching for nvidia/cublas/lib/libcublas.so.11 and nvidia/cudnn/lib/libcudnn.so.8 across all sys.path folders.
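
A minimal sketch of that search, assuming nothing beyond the two relative paths named above. The helper `find_in_sys_path` and its `search_dirs` parameter are hypothetical names used for illustration; this is not the code from the diff.

```python
import os
import sys

# Relative paths named in the PR description.
CUDA_DEPS = (
    os.path.join("nvidia", "cublas", "lib", "libcublas.so.11"),
    os.path.join("nvidia", "cudnn", "lib", "libcudnn.so.8"),
)


def find_in_sys_path(rel_path, search_dirs=None):
    """Return the first existing <dir>/<rel_path>, or None if there is none.

    search_dirs defaults to sys.path; it is a parameter only so the
    function can be exercised against a temporary directory.
    """
    dirs = sys.path if search_dirs is None else search_dirs
    for directory in dirs:
        candidate = os.path.join(directory, rel_path)
        if os.path.exists(candidate):
            return candidate
    return None


for rel in CUDA_DEPS:
    print(rel, "->", find_in_sys_path(rel))
```

On a machine without the NVIDIA wheels installed, both lookups simply return `None`.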

Test plan:

docker pull amazonlinux:2
docker run --rm -t amazonlinux:2 bash -c 'yum install -y python3 python3-devel python3-distutils patch;python3 -m pip install torch==1.13.0;curl -OL https://patch-diff.githubusercontent.com/raw/pytorch/pytorch/pull/90411.diff; pushd /usr/local/lib64/python3.7/site-packages; patch -p1 </90411.diff; popd; python3 -c "import torch;print(torch.__version__, torch.cuda.is_available())"'

Output:

1.13.0+cu117 True

Fixes #88869

malfet requested a review from a team December 7, 2022 21:12

pytorch-bot bot commented Dec 7, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/90411

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 87af78f:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

atalman commented Dec 7, 2022

atalman left a comment
LGTM, please fix lint as well


malfet commented Dec 7, 2022

> Looks good. Do we also need to rewrite this patch: https://github.com/pytorch/builder/blob/main/release/pypi/prep_binary_for_pypi.sh and this one? https://github.com/pytorch/builder/blob/main/manywheel/build_cuda.sh#L190

Not sure how those are related to the PR in question. It fixes a location problem, but this is a fallback path, i.e. it is not exercised by default.
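
The "fallback path" idea can be sketched as follows: attempt the normal load first, and only if that raises `OSError` search for the dependencies, preload them, and retry. The function name `load_with_fallback` and its parameters are hypothetical and chosen for illustration; this is not the code from the diff.

```python
import ctypes
import os
import sys


def load_with_fallback(lib_path, dep_rel_paths, loader=ctypes.CDLL, search_dirs=None):
    """Load lib_path; search for and preload dependencies only on failure.

    loader and search_dirs are injectable purely for testing; by default
    the loader is ctypes.CDLL and the search covers sys.path.
    """
    try:
        # Default path: the dynamic linker resolves everything on its own.
        return loader(lib_path)
    except OSError:
        # Fallback path: locate each dependency and preload it.
        dirs = sys.path if search_dirs is None else search_dirs
        for rel in dep_rel_paths:
            for directory in dirs:
                candidate = os.path.join(directory, rel)
                if os.path.exists(candidate):
                    loader(candidate)
                    break
        # Retry now that any dependencies that were found are loaded.
        return loader(lib_path)
```

If the first load succeeds, the search never runs, which is why such a fallback is not exercised on a default install.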

malfet changed the title from "Fix cuda deps search path" to "Add manual cuda deps search logic" Dec 7, 2022
malfet added the "release notes: releng" and "topic: bug fixes" labels Dec 7, 2022

malfet commented Dec 7, 2022

@pytorchbot merge -f "Lint is green, but otherwise it can not really be tested"

@pytorchmergebot

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).


atalman pushed a commit to atalman/pytorch that referenced this pull request Dec 7, 2022
If PyTorch is packaged into a wheel alongside [nvidia-cublas-cu11](https://pypi.org/project/nvidia-cublas-cu11/), which is designated as PureLib while the `torch` wheel is not, this can cause a torch_globals loading problem.

Fix that by searching for `nvidia/cublas/lib/libcublas.so.11` and `nvidia/cudnn/lib/libcudnn.so.8` across all `sys.path` folders.

Test plan:
```
docker pull amazonlinux:2
docker run --rm -t amazonlinux:2 bash -c 'yum install -y python3 python3-devel python3-distutils patch;python3 -m pip install torch==1.13.0;curl -OL https://patch-diff.githubusercontent.com/raw/pytorch/pytorch/pull/90411.diff; pushd /usr/local/lib64/python3.7/site-packages; patch -p1 </90411.diff; popd; python3 -c "import torch;print(torch.__version__, torch.cuda.is_available())"'
```

Fixes pytorch#88869

Pull Request resolved: pytorch#90411
Approved by: https://github.com/atalman
atalman added a commit that referenced this pull request Dec 8, 2022
Co-authored-by: Nikita Shulga <nshulga@meta.com>
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
malfet deleted the malfet/add-cuda-load-path branch June 7, 2023 20:40
Labels
Merged, release notes: releng, topic: bug fixes
Development

Successfully merging this pull request may close these issues.

On CPU-only machine received OSError from importing: libcublas.so.11: cannot open shared object file
3 participants