PCA inner function collision with matplotlib.pyplot #19285

Closed
MichalRIcar opened this issue Jan 27, 2021 · 20 comments

@MichalRIcar

MichalRIcar commented Jan 27, 2021

Hello,

Describe the bug

sklearn.decomposition.PCA appears to have an internal collision with matplotlib.pyplot, as the code and data below show.

Steps/Code to Reproduce

import pandas as pd
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

DATA = pd.read_csv("DATA.csv")

#1 Calc PCA
x_pca = PCA(8).fit_transform(DATA)

#2 Set up the matplotlib figure matrix  →  this causes the collision if the code is run twice
fig, axes = plt.subplots(2, 2, figsize=(13, 8), sharex=False)

DATA.zip


#NOTES:
#A] If we run only #1, the code passes any number of times.
#B] If we run #1 and #2, the code passes the first time but crashes the second time; the third run passes again, the fourth crashes, and so on.

Expected Results

The full code passes on the first run (odd runs in general) but fails on the second run (even runs in general):

Actual Results for even runs

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-ad8aef2aa149> in <module>
      6 
      7 #1 Calc PCA
----> 8 x_pca = PCA(8).fit_transform(DATA)
      9 
     10 

~\anaconda3\lib\site-packages\sklearn\decomposition\_pca.py in fit_transform(self, X, y)
    374         C-ordered array, use 'np.ascontiguousarray'.
    375         """
--> 376         U, S, Vt = self._fit(X)
    377         U = U[:, :self.n_components_]
    378 

~\anaconda3\lib\site-packages\sklearn\decomposition\_pca.py in _fit(self, X)
    423             return self._fit_full(X, n_components)
    424         elif self._fit_svd_solver in ['arpack', 'randomized']:
--> 425             return self._fit_truncated(X, n_components, self._fit_svd_solver)
    426         else:
    427             raise ValueError("Unrecognized svd_solver='{0}'"

~\anaconda3\lib\site-packages\sklearn\decomposition\_pca.py in _fit_truncated(self, X, n_components, svd_solver)
    539         elif svd_solver == 'randomized':
    540             # sign flipping is done inside
--> 541             U, S, Vt = randomized_svd(X, n_components=n_components,
    542                                       n_iter=self.iterated_power,
    543                                       flip_sign=True,

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64 
     65             # extra_args > 0

~\anaconda3\lib\site-packages\sklearn\utils\extmath.py in randomized_svd(M, n_components, n_oversamples, n_iter, power_iteration_normalizer, transpose, flip_sign, random_state)
    355 
    356     # compute the SVD on the thin matrix: (k + p) wide
--> 357     Uhat, s, Vt = linalg.svd(B, full_matrices=False)
    358 
    359     del B

~\anaconda3\lib\site-packages\scipy\linalg\decomp_svd.py in svd(a, full_matrices, compute_uv, overwrite_a, check_finite, lapack_driver)
    104 
    105     """
--> 106     a1 = _asarray_validated(a, check_finite=check_finite)
    107     if len(a1.shape) != 2:
    108         raise ValueError('expected matrix')

~\anaconda3\lib\site-packages\scipy\_lib\_util.py in _asarray_validated(a, check_finite, sparse_ok, objects_ok, mask_ok, as_inexact)
    260             raise ValueError('masked arrays are not supported')
    261     toarray = np.asarray_chkfinite if check_finite else np.asarray
--> 262     a = toarray(a)
    263     if not objects_ok:
    264         if a.dtype is np.dtype('O'):

~\anaconda3\lib\site-packages\numpy\lib\function_base.py in asarray_chkfinite(a, dtype, order)
    483     a = asarray(a, dtype=dtype, order=order)
    484     if a.dtype.char in typecodes['AllFloat'] and not np.isfinite(a).all():
--> 485         raise ValueError(
    486             "array must not contain infs or NaNs")
    487     return a

ValueError: array must not contain infs or NaNs



Versions

pandas 1.2.1
matplotlib 3.3.3
sklearn 0.24.1
numpy 1.19.5

@MichalRIcar MichalRIcar changed the title from "PCA inner function colision with matplotlib.pyplot" to "PCA inner function collision with matplotlib.pyplot" on Jan 27, 2021

@NicolasHug
Member

There's probably something fishy going on in display, which you haven't defined above.

@MichalRIcar
Author

@NicolasHug please ignore the display(x_pca) call; it just prints the PCA result. I edited the code above.

@NicolasHug
Member

I can't reproduce your error, @MichalRIcar.
The code runs just fine no matter how many times it is executed.

Given the error message it is likely that you are somehow modifying DATA at some point.

I will close the issue until we can get an actual reproducible example.

@MichalRIcar
Author

@NicolasHug please check the 5-second recording of my screen, which shows what I described (I don't do anything with the data):

https://drive.google.com/file/d/1y3b67BbbcEI28dGvPX4F-1d3GMq4kOw3/view?usp=sharing

@NicolasHug
Member

I tried your code replacing read_csv by np.random.randn(100, 10) and it worked fine.

Can you make sure that DATA is the same before and after the first execution?
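
One way to run that check is sketched below. It assumes DATA is a purely numeric pandas DataFrame; a small random frame stands in for DATA.csv, and `snapshot_check` is a hypothetical helper name, not part of scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA


def snapshot_check(data: pd.DataFrame, n_components: int = 8) -> dict:
    """Run PCA once and report whether the input was modified or is non-finite."""
    # Independent copy of the values taken before PCA touches the frame.
    before = data.to_numpy(dtype=float, copy=True)
    PCA(n_components).fit_transform(data)
    after = data.to_numpy(dtype=float)
    return {
        "unchanged": bool(np.array_equal(before, after)),
        "all_finite": bool(np.isfinite(after).all()),
    }


# Stand-in for pd.read_csv("DATA.csv"); the real file is not reproduced here.
rng = np.random.default_rng(0)
DATA = pd.DataFrame(rng.standard_normal((100, 10)))
report = snapshot_check(DATA)
print(report)
```

If "unchanged" and "all_finite" are both true before a failing run, the "array must not contain infs or NaNs" error is unlikely to come from the input itself.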

@MichalRIcar
Author

I just checked, and the data are the same before and after the run.

I tried the same strategy and changed DATA to np.random.randn(100, 10). The behavior is the same: it crashes on every even run. However, the reason is different: LinAlgError: SVD did not converge
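
For anyone trying to reproduce this outside a notebook, the repeated cell execution can be approximated with a loop. This is only a sketch: a loop in one script is not identical to re-running a Jupyter cell, the Agg backend is an assumption so that no display is needed, and `run_once` is a hypothetical helper name.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display

import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA


def run_once(data):
    """One 'cell execution': step #1 (PCA) followed by step #2 (subplots)."""
    try:
        PCA(8).fit_transform(data)
        outcome = "ok"
    except (ValueError, np.linalg.LinAlgError) as exc:
        # The two error types reported in this thread.
        outcome = f"{type(exc).__name__}: {exc}"
    plt.subplots(2, 2, figsize=(13, 8), sharex=False)
    return outcome


data = np.random.randn(100, 10)
outcomes = [run_once(data) for _ in range(4)]
for i, outcome in enumerate(outcomes, start=1):
    print(f"run {i}: {outcome}")

plt.close("all")  # release the figures created by the loop
```

On an affected setup, the pattern described above would show up as alternating "ok" and error lines; on an unaffected one, every run should print "ok".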

@MichalRIcar
Author

Screen Recording with np.random.randn(100, 10)

https://drive.google.com/file/d/13i9TyA_zpYGtpwi9skJ58cUeO28GZxMF/view?usp=sharing

@NicolasHug
Member

I'm going to re-open the issue but TBH I have no idea what's going on. This is really unexpected

@NicolasHug NicolasHug reopened this Jan 27, 2021
@glemaitre
Member

I cannot reproduce the issue in notebook or ipython. This bug does not make any sense.

I am not sure that this is due to our code: with the first dataset the randomized solver is used, and with the second one the SciPy solver is used. So it is not even solver dependent.
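
To separate the two code paths, the solver can be pinned explicitly through PCA's `svd_solver` parameter instead of relying on the default "auto" choice. A minimal sketch on random data (the data shape and seed are assumptions for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))

shapes = {}
for solver in ("full", "randomized"):
    # random_state only matters for the randomized solver; fixed here for repeatability.
    pca = PCA(n_components=8, svd_solver=solver, random_state=0)
    shapes[solver] = pca.fit_transform(X).shape
    print(solver, shapes[solver])
```

If both solvers fail in the same alternating way, that would point away from the PCA code and toward the underlying linear algebra libraries.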

Could you give more information about the platform and the Jupyter and Jupyter Notebook versions?
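
A small sketch for collecting that information in one place. The package list is an assumption about what is relevant here, and `collect_versions` is a hypothetical helper; scikit-learn also ships `sklearn.show_versions()`, which prints a similar report.

```python
import platform
import sys
from importlib import metadata


def collect_versions(
    packages=("numpy", "scipy", "scikit-learn", "pandas", "matplotlib", "notebook", "ipython"),
):
    """Return platform, Python, and installed package versions as a dict."""
    info = {"platform": platform.platform(), "python": sys.version.split()[0]}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


info = collect_versions()
for key, value in info.items():
    print(f"{key}: {value}")
```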

@MichalRIcar
Author

Hello, I agree it doesn't make much sense. It took me a while to track down, as I didn't expect such a dependency.
Python is part of the Anaconda installation:
https://repo.anaconda.com/archive/Anaconda3-2020.11-Windows-x86_64.exe
I just checked my version: Python 3.8.3

The packages are installed via pip:
pip install pandas numpy sklearn matplotlib

@glemaitre
Member

@jeremiedbb Can you reproduce on Windows?

@MichalRIcar
Author

I played with it a little and swapped the order of the two steps; perhaps this points in some direction.

Recording:
https://drive.google.com/file/d/1hY1cGzAArthc3851YTJklvptTyu8ouvq/view?usp=sharing

@jnothman
Member

Related to #17788?

@glemaitre
Member

glemaitre commented Jan 28, 2021 via email

@jeremiedbb
Member

> @jeremiedbb Can you reproduce on Windows?

Nope, works fine.

@glemaitre
Member

@MichalRIcar While it will not solve this issue directly, could you create a conda environment from scratch, install the minimum number of libraries, and try to reproduce?

@MichalRIcar
Author

MichalRIcar commented Jan 29, 2021

@glemaitre, I gave it a shot:

  1. Ran "conda install anaconda-clean" as recommended and removed everything with "anaconda-clean --yes"
    https://docs.anaconda.com/anaconda/install/uninstall/#:~:text=Use%20simple%20remove%20to%20uninstall,or%20your%20version%20of%20Python.

  2. Deleted all folders connected with Python and conda: ...\ProgramData & Users\anaconda, pip...

  3. Uninstalled Anaconda in Windows

  4. Restarted

  5. Installed a fresh Anaconda with Python and installed only numpy, pandas, scikit-learn, and matplotlib

  6. Same result as before :(

@amueller
Member

amueller commented Feb 6, 2021

BTW, I ran into this as well (unrelated to matplotlib). In recent versions, PCA seems to error out randomly at times, either with the above error or with a LinAlgError: dabl/dabl#248

@MichalRIcar
Author

MichalRIcar commented Feb 10, 2021

Amazing! I just updated numpy (1.20.0→1.20.1) and the problem is solved. Amazing job, guys!
I truly respect how you see through things!!
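
Based only on the version numbers mentioned in this comment, a quick check for the numpy release in question could look like the sketch below. `is_reported_bad_numpy` is a hypothetical helper, and treating 1.20.0 as the sole affected release is an assumption drawn from this thread, not a confirmed diagnosis.

```python
import re

import numpy as np


def is_reported_bad_numpy(version: str) -> bool:
    """True only for the exact release the reporter upgraded away from (1.20.0)."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version)
    return bool(match) and tuple(map(int, match.groups())) == (1, 20, 0)


if is_reported_bad_numpy(np.__version__):
    print("numpy 1.20.0 detected; the reporter's problem went away after upgrading to 1.20.1")
else:
    print(f"numpy {np.__version__}: not the release named in this thread")
```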

@cmarmo
Member

cmarmo commented May 26, 2021

Hi @MichalRIcar, it seems that the issue has been solved then 🚀. I'm closing it.
