Missing DLL with SciPy 1.9.2 #24669

Closed
milos-korenciak opened this issue Oct 15, 2022 · 13 comments

milos-korenciak commented Oct 15, 2022

Describe the bug

scikit-learn crashes with the current scipy==1.9.2 on Windows (AMD64). With this combination, a joblib subprocess that uses scikit-learn fails. (Tested in Windows containers / Docker images python:3.9.13 and python:3.9.10.)
The problem does not reproduce on Linux or macOS, even with the same settings (vanilla official Python 3.9 and 3.10).
The previous bug-fix release of SciPy (1.9.1) works fine.

Steps/Code to Reproduce

On Windows (only), first install these packages:

python -m pip install -U pip setuptools scipy==1.9.2 joblib scikit-learn

Then run this code (e.g. interactively):

from joblib import Parallel, delayed
import sklearn
def a():
    from sklearn.model_selection import cross_val_score
    return cross_val_score
data_results = Parallel(n_jobs=4)(delayed(a)() for i in range(10))

The last line above fails.
To work around it, downgrade SciPy with the command below and rerun.

python -m pip install -U pip scipy==1.9.1
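
A minimal sketch of a startup guard for the combination described in this report. It assumes that only scipy==1.9.2 on Windows needs to be flagged; the helper names `is_affected` and `installed_scipy_version` are hypothetical and not part of scikit-learn or SciPy.

```python
import sys
from importlib import metadata

# Assumption for this sketch: only SciPy 1.9.2 on Windows is flagged,
# because that is the combination reported in this issue.
AFFECTED_SCIPY = "1.9.2"


def is_affected(scipy_version, platform=sys.platform):
    """Return True for the platform/version combination reported here."""
    return platform == "win32" and scipy_version == AFFECTED_SCIPY


def installed_scipy_version():
    """Return the installed SciPy version string, or None if SciPy is absent."""
    try:
        return metadata.version("scipy")
    except metadata.PackageNotFoundError:
        return None


version = installed_scipy_version()
if version is not None and is_affected(version):
    print(f"scipy=={version} on Windows: consider pinning scipy==1.9.1")
else:
    print(f"scipy version {version!r} is not the reported combination")
```

The check reads package metadata instead of importing SciPy, so it does not load any compiled extension.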

Expected Results

... nothing # if the minimal example above runs OK, nothing is displayed.

Actual Results

In [5]: data_results = Parallel(n_jobs=4)(delayed(a)() for i in range(10))
--------------------------------------------------------------------------- 
_RemoteTraceback                          Traceback (most recent call last)
_RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Python\lib\site-packages\joblib\externals\loky\process_executor.py", line 428, in _process_worker
    r = call_item()
  File "C:\Python\lib\site-packages\joblib\externals\loky\process_executor.py", line 275, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "C:\Python\lib\site-packages\joblib\_parallel_backends.py", line 620, in __call__
    return self.func(*args, **kwargs)
  File "C:\Python\lib\site-packages\joblib\parallel.py", line 288, in __call__
    return [func(*args, **kwargs)
  File "C:\Python\lib\site-packages\joblib\parallel.py", line 288, in <listcomp>
    return [func(*args, **kwargs)
  File "<ipython-input-1-bf041901e974>", line 2, in a
  File "C:\Python\lib\site-packages\sklearn\model_selection\__init__.py", line 23, in <module>
    from ._validation import cross_val_score
  File "C:\Python\lib\site-packages\sklearn\model_selection\_validation.py", line 32, in <module>
    from ..metrics import check_scoring
  File "C:\Python\lib\site-packages\sklearn\metrics\__init__.py", line 41, in <module>
    from . import cluster
  File "C:\Python\lib\site-packages\sklearn\metrics\cluster\__init__.py", line 22, in <module>
    from ._unsupervised import silhouette_samples
  File "C:\Python\lib\site-packages\sklearn\metrics\cluster\_unsupervised.py", line 16, in <module>
    from ..pairwise import pairwise_distances_chunked
  File "C:\Python\lib\site-packages\sklearn\metrics\pairwise.py", line 33, in <module>
    from ._pairwise_distances_reduction import PairwiseDistancesArgKmin
ImportError: DLL load failed while importing _pairwise_distances_reduction: The specified module could not be found.
"""

The above exception was the direct cause of the following exception:

ImportError                               Traceback (most recent call last)
Cell In [5], line 1
----> 1 data_results = Parallel(n_jobs=4)(delayed(a)() for i in range(10))

File C:\Python\lib\site-packages\joblib\parallel.py:1098, in Parallel.__call__(self, iterable)
   1095     self._iterating = False
   1097 with self._backend.retrieval_context():
-> 1098     self.retrieve() 
   1099 # Make sure that we get a last message telling us we are done
   1100 elapsed_time = time.time() - self._start_time

File C:\Python\lib\site-packages\joblib\parallel.py:975, in Parallel.retrieve(self)
    973 try:
    974     if getattr(self._backend, 'supports_timeout', False):
--> 975         self._output.extend(job.get(timeout=self.timeout))
    976     else:
    977         self._output.extend(job.get())
 
File C:\Python\lib\site-packages\joblib\_parallel_backends.py:567, in LokyBackend.wrap_future_result(future, timeout)
    564 """Wrapper for Future.result to implement the same behaviour as
    565 AsyncResults.get from multiprocessing."""
    566 try:
--> 567     return future.result(timeout=timeout)
    568 except CfTimeoutError as e:
    569     raise TimeoutError from e
 
File C:\Python\lib\concurrent\futures\_base.py:458, in Future.result(self, timeout)
    456     raise CancelledError()
    457 elif self._state == FINISHED:
--> 458     return self.__get_result()
    459 else:
    460     raise TimeoutError()
 
File C:\Python\lib\concurrent\futures\_base.py:403, in Future.__get_result(self)
    401 if self._exception:
    402     try:
--> 403         raise self._exception
    404     finally:
    405         # Break a reference cycle with the exception in self._exception
    406         self = None

ImportError: DLL load failed while importing _pairwise_distances_reduction: The specified module could not be found.
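
To narrow down which import fails, one option is to attempt each import in a fresh interpreter and keep only the last line of its error output. This is a sketch using the standard library only; `probe_import` is a hypothetical helper. The failure above was reported inside joblib (loky) worker processes, so a plain subprocess may or may not reproduce it.

```python
import subprocess
import sys


def probe_import(module_name):
    """Import module_name in a fresh interpreter.

    Returns (ok, last_stderr_line); the last line is "" when nothing
    was written to stderr.
    """
    result = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
    )
    lines = result.stderr.strip().splitlines()
    last_line = lines[-1] if lines else ""
    return result.returncode == 0, last_line


# Module names taken from the traceback above.
for name in ("scipy", "sklearn.metrics.pairwise"):
    ok, message = probe_import(name)
    print(name, "OK" if ok else f"FAILED: {message}")
```

Running each import in its own process keeps a failed or partially loaded extension module from affecting the next probe.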

Versions

System: 
    python: 3.10.7 (tags/v3.10.7:6cc6b13, Sep  5 2022, 14:08:36) [MSC v.1933 64 bit (AMD64)]
executable: C:\Python\python.exe
   machine: Windows-10-10.0.17763-SP0

Python dependencies:
      sklearn: 1.1.2
          pip: 22.2.2
   setuptools: 65.4.1
        numpy: 1.23.4
        scipy: 1.9.2
       Cython: None
       pandas: None
   matplotlib: None
       joblib: 1.2.0
threadpoolctl: 3.1.0

Built with OpenMP: True

threadpoolctl info: 
       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: C:\Python\Lib\site-packages\numpy\.libs\libopenblas.FB5AE2TYXYH2IJRDKGDGQ3XBKLKTF43H.gfortran-win_amd64.dll
        version: 0.3.20
threading_layer: pthreads
   architecture: Haswell
    num_threads: 2

       user_api: openmp
   internal_api: openmp
         prefix: vcomp
       filepath: C:\Python\Lib\site-packages\sklearn\.libs\vcomp140.dll
        version: None
    num_threads: 2

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: C:\Python\Lib\site-packages\scipy\.libs\libopenblas.PZA5WNOTOH6FZLB2KBVKAURAKVTFSNNU.gfortran-win_amd64.dll
        version: 0.3.18
threading_layer: pthreads
   architecture: Haswell
    num_threads: 2
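
The threadpoolctl section above lists OpenBLAS twice, once bundled with NumPy and once with SciPy. The sketch below groups such entries by library to make that visible. The data is copied from the report and trimmed to the fields used; the function names are hypothetical. Whether the duplicate matters for the failure is not established in this thread.

```python
from collections import defaultdict

# Entries copied from the report above, trimmed to the fields used here.
# threadpoolctl.threadpool_info() returns a list of dicts with these keys.
REPORTED_INFO = [
    {"internal_api": "openblas", "version": "0.3.20",
     "filepath": r"C:\Python\Lib\site-packages\numpy\.libs\libopenblas.FB5AE2TYXYH2IJRDKGDGQ3XBKLKTF43H.gfortran-win_amd64.dll"},
    {"internal_api": "openmp", "version": None,
     "filepath": r"C:\Python\Lib\site-packages\sklearn\.libs\vcomp140.dll"},
    {"internal_api": "openblas", "version": "0.3.18",
     "filepath": r"C:\Python\Lib\site-packages\scipy\.libs\libopenblas.PZA5WNOTOH6FZLB2KBVKAURAKVTFSNNU.gfortran-win_amd64.dll"},
]


def versions_by_library(info):
    """Map each internal_api to the versions reported for it, in input order."""
    grouped = defaultdict(list)
    for entry in info:
        grouped[entry["internal_api"]].append(entry["version"])
    return dict(grouped)


def loaded_more_than_once(info):
    """Return the sorted internal_api names that appear in several entries."""
    grouped = versions_by_library(info)
    return sorted(api for api, versions in grouped.items() if len(versions) > 1)


print(versions_by_library(REPORTED_INFO))
print(loaded_more_than_once(REPORTED_INFO))
```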
milos-korenciak added the Bug and Needs Triage labels Oct 15, 2022
@milos-korenciak
Author

Link to the SciPy counterpart of this bug (I don't know which project the bug belongs to): scipy/scipy#17232
Maybe also scipy/scipy#17191

glemaitre changed the title from "crash of" to "Missing DLL with SciPy 1.9.2" Oct 18, 2022
@glemaitre
Member

It should certainly be fixed by #24631.
However, we will need to publish a bug-fix release that includes this fix.

glemaitre removed the Needs Triage label Oct 18, 2022
@glemaitre
Member

@Micky774 @jeremiedbb Could you check whether this solves the problem, since you have a Windows machine ;) Be aware that you need to build scikit-learn from main. The nightly build is not enough, since we have not uploaded one for a while (because there is a broken wheel).

@ogrisel
Member

ogrisel commented Oct 18, 2022

We will also need the backport scipy/scipy#17224 for a full fix. It will be available in scipy 1.9.3, which should be released soonish (see scipy/scipy#17239).

But we can already prepare a backport of #24631 to the scikit-learn 1.1.X branch, to prepare a scikit-learn 1.1.3 bugfix release.

@glemaitre
Member

Since we cannot backport scipy/scipy#17224 in scikit-learn we should blacklist SciPy 1.9.2.

@rgommers
Contributor

> Since we cannot backport scipy/scipy#17224 in scikit-learn we should blacklist SciPy 1.9.2.

That issue should be present in earlier releases as well. The original bug report is scipy/scipy#16527, and says it happens with 1.8.1 and 1.9.0rc1. So blacklisting 1.9.2 doesn't help.

@glemaitre
Member

Thanks, @rgommers. It is weird that we did not trigger the bug in our various CI runs in the past.

@cmarmo
Member

cmarmo commented Oct 18, 2022

> Since we cannot backport scipy/scipy#17224 in scikit-learn we should blacklist SciPy 1.9.2.
>
> That issue should be present in earlier releases as well. The original bug report is scipy/scipy#16527, and says it happens with 1.8.1 and 1.9.0rc1. So blacklisting 1.9.2 doesn't help.

For completeness: the segfault (not related to the missing DLL) with scipy 1.9.2 appeared with Python 3.10 on Windows (see #24612). With Python 3.11, Windows, Linux and macOS all failed. So even if backporting scipy/scipy#16528 is a solution, the segfault is not only related to scipy: the OS and Python version had some effect.

@rgommers
Contributor

> So even if backporting scipy/scipy#16528 is a solution, the segfault is not only related to scipy: the OS and python version had some effect.

That will all be due to the version of NumPy that the SciPy wheels for that OS/Python version were built with - because the root cause is numpy.f2py related, and showed up in 1.22.x IIRC.

@cmarmo
Member

cmarmo commented Oct 18, 2022

> That will all be due to the version of NumPy that the SciPy wheels for that OS/Python version were built with - because the root cause is numpy.f2py related, and showed up in 1.22.x IIRC.

So the issue is that scipy 1.9.2 was built with numpy 1.22.3 for Python 3.10 on Windows (see https://github.com/scipy/scipy/actions/runs/3211264776/jobs/5249316864#step:6:434), as there are no constraints for that platform in https://github.com/scipy/oldest-supported-numpy/blob/main/setup.cfg?
Would it be relevant to add a constraint for Windows with Python 3.10 there?

@rgommers
Contributor

That would be a serious bug - and if true, is a problem for oldest-supported-numpy. I have no time to look at it, about to sign off for an extended period of time - could you please double check that and open an issue on oldest-supported-numpy @cmarmo?

For SciPy we don't use oldest-supported-numpy though. The 1.22.3 is on purpose: https://github.com/scipy/scipy/blob/3c0bc3f17551cc8e43597813777a176a1528bebd/pyproject.toml#L38-L41

@cmarmo
Member

cmarmo commented Oct 20, 2022

Hi @milos-korenciak, if by chance you have some time, could you check the wheels available here? With scipy 1.9.3 available and scikit-learn shipping the missing DLL, the issue should be gone.

@cmarmo
Member

cmarmo commented Oct 30, 2022

With the 1.1.3 release the wheel for Windows contains the missing library. I'm closing this one.

cmarmo closed this as completed Oct 30, 2022