Windows: 2.3.0 wheel can not be imported if installed for a single user only #125109

Closed
DukeSniper opened this issue Apr 27, 2024 · 15 comments
Labels
high priority module: binaries Anything related to official binaries that we release to users module: regression It used to work, and now it doesn't module: windows Windows support for PyTorch triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Comments

@DukeSniper

DukeSniper commented Apr 27, 2024

🐛 Describe the bug

Upgrading torch from a working 2.2.2 install to 2.3.0 breaks import torch on Windows:

C:\stable\ComfyUI>python
Python 3.12.3 (tags/v3.12.3:f6650f9, Apr  9 2024, 14:05:25) [MSC v.1938 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\xyz\AppData\Roaming\Python\Python312\site-packages\torch\__init__.py", line 141, in <module>
    raise err
OSError: [WinError 126] The specified module could not be found. Error loading "C:\Users\xyz\AppData\Roaming\Python\Python312\site-packages\torch\lib\shm.dll" or one of its dependencies.

Versions

Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Enterprise LTSC
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.12.3 (tags/v3.12.3:f6650f9, Apr 9 2024, 14:05:25) [MSC v.1938 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19044-SP0
Is CUDA available: N/A
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4070 SUPER
Nvidia driver version: 551.86
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A

CPU:
Architecture=9
CurrentClockSpeed=3401
DeviceID=CPU0
Family=107
L2CacheSize=8192
L2CacheSpeed=
Manufacturer=AuthenticAMD
MaxClockSpeed=3401
Name=AMD Ryzen 9 5950X 16-Core Processor
ProcessorType=3
Revision=8450

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.3.0
[pip3] torchaudio==2.3.0
[pip3] torchsde==0.2.6
[pip3] torchvision==0.18.0
[conda] Could not collect

cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @seemethere @malfet @osalpekar @atalman @peterjc123 @mszhanyi @skyline75489 @nbcsm @vladimir-aubrecht @iremyux @Blackhex @cristianPanaite

@DollarAkshay

DollarAkshay commented Apr 27, 2024

I have the same problem and have tried everything:

  • Python 3.12
  • Python 3.10
  • With CUDA
  • CPU only
  • Different versions of CUDA

Update: I just installed 2.2.2 and the problem went away.

@acidbubbles

acidbubbles commented Apr 28, 2024

Here's the full English message to help with searching for the issue: [WinError 126] The specified module could not be found. Error loading "...\torch\lib\shm.dll" or one of its dependencies.

I also see this issue with Python 3.11.9, adding to @DollarAkshay's results.

Note that I use plain Python, not conda.

It would be nice to know which of shm.dll's dependencies is missing, though.

@acidbubbles

People on Stack Overflow suggest running conda install cudatoolkit, but I'm not using conda, so it would help to know which dependency is missing and how to upgrade from 2.2 to 2.3.

Links:

@cpuhrsch cpuhrsch added triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module high priority labels Apr 30, 2024
@cpuhrsch
Contributor

Seems high priority given traffic and regression. Marked for triage review.

@guilhrmeln

Same thing here. Only downgrading to 2.2.2 solved the issue.

@atalman atalman added this to the 2.3.1 milestone May 1, 2024
@atalman
Contributor

atalman commented May 1, 2024

Trying to repro this issue now.

We have the following validation workflow, where Windows Python 3.12 builds were validated, and I do not see this error there:
https://github.com/pytorch/builder/actions/runs/8839206170/job/24272076022#step:9:581

@DukeSniper
Author

Further analysis on a test machine using dlltracer:

C:\Windows\system32>python
Python 3.12.3 (tags/v3.12.3:f6650f9, Apr  9 2024, 14:05:25) [MSC v.1938 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import dlltracer
>>> import sys
>>>
>>> with dlltracer.Trace(out=sys.stdout):
...     import torch
...
LoadLibrary \Device\HarddiskVolume1\Windows\System32\kernel.appcore.dll
LoadLibrary \Device\HarddiskVolume1\Program Files\Python312\DLLs\_wmi.pyd
LoadLibrary \Device\HarddiskVolume1\Windows\System32\oleaut32.dll
LoadLibrary \Device\HarddiskVolume1\Windows\System32\combase.dll
LoadLibrary \Device\HarddiskVolume1\Windows\System32\propsys.dll
LoadLibrary \Device\HarddiskVolume1\Program Files\Python312\vcruntime140_1.dll
LoadLibrary \Device\HarddiskVolume1\Program Files\Python312\DLLs\_ctypes.pyd
LoadLibrary \Device\HarddiskVolume1\Windows\System32\ole32.dll
LoadLibrary \Device\HarddiskVolume1\Program Files\Python312\DLLs\libffi-8.dll
LoadLibrary \Device\HarddiskVolume1\Program Files\Python312\DLLs\_bz2.pyd
LoadLibrary \Device\HarddiskVolume1\Program Files\Python312\DLLs\_lzma.pyd
LoadLibrary \Device\HarddiskVolume1\Windows\System32\msvcp140.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\asmjit.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\c10.dll
LoadLibrary \Device\HarddiskVolume1\Windows\System32\dbghelp.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\c10_cuda.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\cudart64_12.dll
LoadLibrary \Device\HarddiskVolume1\Windows\System32\cryptbase.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\caffe2_nvrtc.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\nvrtc64_120_0.dll
LoadLibrary \Device\HarddiskVolume1\Windows\System32\shell32.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\cublas64_12.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\cublasLt64_12.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\cudnn64_8.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\cudnn_adv_infer64_8.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\cudnn_ops_infer64_8.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\cudnn_adv_train64_8.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\cudnn_ops_train64_8.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\cudnn_cnn_infer64_8.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\zlibwapi.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\cudnn_cnn_train64_8.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\cufft64_11.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\cufftw64_11.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\cupti64_2023.1.1.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\curand64_10.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\cusolver64_11.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\cusparse64_12.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\nvJitLink_120_0.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\cusolverMg64_11.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\fbgemm.dll
LoadLibrary \Device\HarddiskVolume1\Windows\System32\vcomp140.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\libiomp5md.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\libiompstubs5md.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\nvrtc-builtins64_121.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\nvToolsExt64_1.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\shm.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\torch_cpu.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\torch_cuda.dll
Failed \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\torch_cuda.dll
Failed \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\torch_cpu.dll
Failed \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\shm.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\shm.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\torch_cpu.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\uv.dll
LoadLibrary \Device\HarddiskVolume1\Windows\System32\psapi.dll
LoadLibrary \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\torch_cuda.dll
Failed \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\torch_cuda.dll
Failed \Device\HarddiskVolume1\Windows\System32\psapi.dll
Failed \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\uv.dll
Failed \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\torch_cpu.dll
Failed \Device\HarddiskVolume1\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\shm.dll
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "C:\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\__init__.py", line 141, in <module>
    raise err
OSError: [WinError 126] The specified module could not be found. Error loading "C:\Users\roger.moore\AppData\Roaming\Python\Python312\site-packages\torch\lib\shm.dll" or one of its dependencies.
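The trace prints one line per event: LoadLibrary <path> when a DLL is found and mapped, and Failed <path> for loads that did not complete. A small hypothetical helper (not part of dlltracer) can pull the failed modules out of such output. Note that the dependency that is actually missing does not show up in this trace at all; only the DLLs that depend on it are reported as failed.

```python
from __future__ import annotations

import ntpath


def failed_loads(trace: str) -> list[str]:
    """Return the basenames of DLLs reported as "Failed", in first-seen order.

    Hypothetical helper: it assumes each event line looks like
    "LoadLibrary <path>" or "Failed <path>", as in the dlltracer output above.
    """
    failed: list[str] = []
    for line in trace.splitlines():
        # Split only on the first space; paths may contain spaces ("Program Files").
        kind, _, path = line.strip().partition(" ")
        if kind != "Failed" or not path:
            continue
        # ntpath splits on backslashes, so this also works when run on Linux/macOS.
        name = ntpath.basename(path)
        if name not in failed:
            failed.append(name)
    return failed
```

Fed the trace above, it returns torch_cuda.dll, torch_cpu.dll, shm.dll, psapi.dll and uv.dll, none of which is an MKL library.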

@DukeSniper
Author

Next update: it works fine when both Python and torch are installed system-wide. However, with a system-wide Python, installing torch in a user context causes the DLL load error.
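Since the failure only appears when torch lives in the per-user site-packages while Python itself is system-wide, it can help to check where a package resolves without importing it. A minimal sketch using only the standard library; installed_in_user_site is a hypothetical helper name.

```python
from __future__ import annotations

import importlib.util
import os
import site


def installed_in_user_site(package: str) -> bool | None:
    """True if `package` resolves under the per-user site-packages directory,
    False if it resolves elsewhere, None if it cannot be found at all."""
    # For a top-level name, find_spec locates the package without executing it,
    # so this is safe even when "import torch" itself would fail.
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    user_site = os.path.normcase(os.path.abspath(site.getusersitepackages()))
    origin = os.path.normcase(os.path.abspath(spec.origin))
    try:
        return os.path.commonpath([user_site, origin]) == user_site
    except ValueError:
        # Paths on different Windows drives (e.g. C: vs D:) have no common path.
        return False
```

For the setup described in this comment, installed_in_user_site("torch") would be expected to return True.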

@atalman
Contributor

atalman commented May 1, 2024

@DukeSniper, does it work for you if you create a venv for it? https://docs.python.org/3/library/venv.html
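A minimal sketch of the venv route, using the standard-library venv module. The reasoning is an assumption, not something confirmed in this thread: inside a venv, pip places MKL's Library\bin DLLs under the environment directory (the venv interpreter's sys.exec_prefix), which is a location torch's loader already searches according to the maintainer comment further down. make_env is a hypothetical helper.

```python
import os
import venv


def make_env(env_dir: str, with_pip: bool = True) -> str:
    """Create a virtual environment in env_dir and return its interpreter path."""
    # with_pip=True bootstraps pip via ensurepip so torch can be installed into the env.
    venv.create(env_dir, with_pip=with_pip)
    if os.name == "nt":
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")
```

The returned interpreter would then be used as <interpreter> -m pip install torch, instead of pip install --user.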

@RichieHakim

I'm seeing this too. It is specific to torch==2.3.0, pip install --user, and Windows. You can see in this GitHub Actions run that it is specific to these parameters: https://github.com/RichieHakim/basic_neural_processing_modules/actions/runs/8916263499/job/24487346542#step:11:120

@DukeSniper
Author

I haven't had a chance to try venv yet, but I tried something else. I installed a system-wide Python on a clean system, ran "pip install torch torchvision torchaudio & pip uninstall torch" with admin privileges, then installed torch as a user, and it works. That got me tinkering a bit more. Since the only obvious major change between 2.2.x and 2.3.0 is the additional dependency on mkl<=2021.4.0,>=2021.1.1, I tried moving mkl from system to user scope, and that seems to break the shm.dll dependency tree.
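To see where pip actually put the MKL DLLs, the installed file list of the mkl distribution can be read with importlib.metadata. This is a sketch under the assumption that the mkl wheel's DLLs are listed in pip's RECORD file, as pip normally does for data files; for a --user install on Windows the expected result would be %APPDATA%\Python\Library\bin. dll_dirs is a hypothetical helper.

```python
from __future__ import annotations

import importlib.metadata as md
import os


def dll_dirs(dist_name: str) -> list[str]:
    """Directories holding the .dll files recorded for an installed distribution.

    Returns [] if the distribution is not installed or records no files.
    """
    try:
        files = md.files(dist_name) or []
    except md.PackageNotFoundError:
        return []
    # locate() resolves RECORD entries (e.g. "../../Library/bin/x.dll") relative
    # to the site-packages directory; normpath collapses the ".." parts.
    dirs = {
        os.path.dirname(os.path.normpath(str(f.locate())))
        for f in files
        if f.name.lower().endswith(".dll")
    }
    return sorted(dirs)
```

Comparing the output of dll_dirs("mkl") with os.environ["PATH"] shows whether the MKL directory is on the search path.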

@DukeSniper
Author

Aaaand I found the solution. When installed in a user context, the MKL package puts the libraries that shm.dll links against into %APPDATA%\Python\Library\Bin, which is not on the PATH environment variable by default. Adding that directory to PATH lets the DLL loader find the MKL DLLs.

@acidbubbles

acidbubbles commented May 2, 2024

In a self-contained Python install, libraries installed in APPDATA are problematic. The good news is that the DLLs can be found in site-packages\torch\lib, so a workaround would be to add that directory to the process's PATH. I'll update this issue if it works :)

UPDATE: I tried adding the torch/lib path to both the PATH environment variable and PythonPath, and it didn't work for me. I'm not sure why it worked for @DukeSniper; it may be because of my particular setup (PythonNet and a self-contained Python install). I'll wait for feedback from the PyTorch team.

@ibrhimAli

ibrhimAli commented May 5, 2024

Multiple .dll files that shm.dll depends on are missing. I tried:

  • Updating the C++ redistributable
  • Running sfc /scannow in case of broken system files

Neither worked, so I downgraded to version 2.2.2 for now.

@atalman
Contributor

atalman commented May 7, 2024

It looks like the issue was introduced by these PRs:
#102604
pytorch/builder#1467

Most likely the problem is that mkl_intel_thread.1.dll can't be found on some systems.
I can reproduce the issue by renaming that DLL and running import torch:

Python 3.11.8 (tags/v3.11.8:db85d51, Feb  6 2024, 22:03:32) [MSC v.1937 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\runneruser\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\__init__.py", line 141, in <module>
    raise err
OSError: [WinError 126] The specified module could not be found. Error loading "C:\Users\runneruser\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\lib\shm.dll" or one of its dependencies.
>>> exit()

The directory that needs to be added to PATH is the Python<version>\Library\bin folder, where all the MKL libraries are installed, for example:

C:\Users\runneruser\AppData\Local\Programs\Python\Python311\Library\bin

As per this comment: #125109 (comment)

However, we should already be loading DLLs from this path; here is the logic for it:
https://github.com/pytorch/pytorch/blob/main/torch/__init__.py#L70

This problem is specific to the case where Python is installed system-wide while torch is installed in a user context. In this case the MKL libraries are installed in the default user path:

C:\Users\Administrator\AppData\Roaming\Python\Library\bin
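To check which of the two locations discussed above actually holds mkl_intel_thread.1.dll on a given machine, a small search over candidate directories is enough. The candidate list (sys.exec_prefix\Library\bin, the per-interpreter folder the loader already covers, and the per-user site.getuserbase()\Library\bin) is an assumption drawn from this thread; find_dll and mkl_candidate_dirs are hypothetical helpers.

```python
from __future__ import annotations

import os
import site
import sys


def mkl_candidate_dirs() -> list[str]:
    """Library\\bin under the interpreter prefix and under the per-user base."""
    return [
        os.path.join(sys.exec_prefix, "Library", "bin"),  # system-wide or venv install
        os.path.join(site.getuserbase(), "Library", "bin"),  # pip install --user
    ]


def find_dll(name: str, dirs: list[str]) -> str | None:
    """Return the full path of the first `name` found in `dirs`, else None."""
    for d in dirs:
        candidate = os.path.join(d, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

If find_dll("mkl_intel_thread.1.dll", mkl_candidate_dirs()) returns a path under the second directory only, the machine is in the situation described here.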

@malfet malfet added the module: regression It used to work, and now it doesn't label May 7, 2024
@malfet malfet changed the title 2.3.0 on Windows missing dependency? Windows: 2.3.0 wheel can not be imported if installed for a single user only May 7, 2024