
IPCA did not converge, numpy.linalg.LinAlgError: SVD did not converge #15996

Closed · madanvnera opened this issue Apr 16, 2020 · 8 comments


madanvnera commented Apr 16, 2020

Incremental PCA consistently fails to converge on a DataFrame of shape 18000 × 18000.

Reproducing code example:

import numpy as np
import pandas as pd
from sklearn.decomposition import PCA, IncrementalPCA

df_data = pd.read_csv("/home/ubuntu/df_data_18000_18000_data1.csv")
df_data.set_index('Unnamed: 0', inplace=True)
df_data = df_data.astype('int8')
ipca = IncrementalPCA(n_components=3600, batch_size=3600)
data_ipca = ipca.fit_transform(df_data)
total_explained_variance_ratio = ipca.explained_variance_ratio_.sum()
print("Total explained variance in IPCA is {}".format(total_explained_variance_ratio))
df = pd.DataFrame(data_ipca, index=list(df_data.index))
print("Size of vector space after IncrementalPCA {}".format(df.shape))
[df_data_18000_18000_data2.csv.zip](https://github.com/numpy/numpy/files/4486707/df_data_18000_18000_data2.csv.zip)
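Since the failing step is the SVD inside `partial_fit`, a quick sanity check worth running first is whether the input contains non-finite values, which are a common trigger for "SVD did not converge". A minimal sketch, using random data as a stand-in for the attached CSV:

```python
import numpy as np
import pandas as pd

# Stand-in for the attached df_data CSV; any DataFrame works here.
df_check = pd.DataFrame(np.random.rand(100, 20))

# A NaN or inf anywhere in the matrix can make LAPACK's SVD fail.
finite = np.isfinite(df_check.to_numpy()).all()
print("all values finite:", finite)
```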

Error message:

Traceback (most recent call last):
  File "ipca_script.py", line 8, in <module>
    data_ipca = ipca.fit_transform(df_data)
  File "/home/ubuntu/miniconda3/lib/python3.7/site-packages/sklearn/base.py", line 553, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "/home/ubuntu/miniconda3/lib/python3.7/site-packages/sklearn/decomposition/incremental_pca.py", line 201, in fit
    self.partial_fit(X[batch], check_input=False)
  File "/home/ubuntu/miniconda3/lib/python3.7/site-packages/sklearn/decomposition/incremental_pca.py", line 279, in partial_fit
    U, S, V = linalg.svd(X, full_matrices=False)
  File "/home/ubuntu/miniconda3/lib/python3.7/site-packages/scipy/linalg/decomp_svd.py", line 132, in svd
    ...
numpy.linalg.LinAlgError: SVD did not converge

Numpy/Python version information:

numpy 1.17.4
Python 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]

print(sklearn.__version__)
0.21.2


charris commented Apr 16, 2020

What problem are you trying to solve with such large matrices?


madanvnera commented Apr 16, 2020

Yes, I understand it is a large matrix. In our problem the number of features is dynamic and can be very large in some cases (as discussed above). We need to find the principal features, and we are okay with a lower explained-variance ratio here. We are using Incremental PCA for memory optimization. Do you think IPCA is not suitable for a large number of features? I do not understand why it gives a convergence issue; it should still be able to produce a component set with a lower variance ratio. IPCA does not have an option for a target variance ratio. A LAPACK fallback should be triggered if it does not converge.
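If memory is the only reason for choosing IncrementalPCA, one alternative worth trying (my suggestion, not something the thread settled on) is PCA with `svd_solver='randomized'`, which estimates only the leading components instead of computing a full SVD. A rough sketch with toy data standing in for the real matrix:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.rand(200, 50)  # toy stand-in for the 18000 x 18000 matrix

# Randomized SVD only estimates the top n_components, which is much
# cheaper than a full decomposition when n_components << n_features.
pca = PCA(n_components=10, svd_solver='randomized', random_state=0)
Xt = pca.fit_transform(X)
print(Xt.shape)  # (200, 10)
```

The whole matrix still has to fit in memory, unlike with IncrementalPCA, so this only helps if the batching was purely a workaround for the solver.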


seberg commented Apr 16, 2020

We do have some 64-bit BLAS support now; I am not quite sure when it is active by default (i.e., in our wheels, Anaconda?). But you may get away with this if you update to a newer NumPy version. As a start, if you want to look into it: gh-15012 and gh-15114.
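To see which BLAS/LAPACK a given NumPy build is linked against (relevant to whether the 64-bit support mentioned above is active), a quick check is:

```python
import numpy as np

# Prints the BLAS/LAPACK libraries this NumPy build was compiled
# against; ILP64 (64-bit integer) builds are what gh-15012 and
# gh-15114 are about.
np.show_config()
print(np.__version__)
```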


charris commented Apr 16, 2020

One thing worth looking at is whether the approach you are using can be simplified, that is, whether the algorithm can be improved. The large array is somewhat suspicious in that regard; that is why I was asking for more details on what you are doing.

@madanvnera

Yes, the original matrix really is this large. We need to run clustering on this data, and we apply principal component analysis before feeding it to the clustering algorithm.
@charris any idea what can cause the convergence issue here?

Also, are you planning to add more options for the SVD solver in Incremental PCA, like PCA's {'auto', 'full', 'arpack', 'randomized'}?


charris commented Apr 16, 2020

Incremental PCA is a scikit-learn thing, not numpy. My naive thought is that incremental may not be the best approach here. What I am curious about is how the matrix is produced.

@madanvnera

Thanks for the reply. Sorry, I just realized I am on the NumPy GitHub; I should be asking this on scikit-learn.
Here I am more interested in knowing when and why numpy.linalg.LinAlgError can occur in the NumPy module. I see it intermittently in plain PCA as well. Thanks in advance.
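As a general answer to that question: numpy.linalg.LinAlgError is raised whenever an underlying LAPACK routine reports failure. For SVD, one easily reproducible trigger (and, in my experience, a common one in practice) is non-finite input. A minimal sketch:

```python
import numpy as np

# A NaN in the matrix makes the LAPACK SVD routine report failure,
# which NumPy surfaces as LinAlgError("SVD did not converge").
a = np.array([[1.0, np.nan],
              [0.0, 1.0]])
try:
    np.linalg.svd(a)
except np.linalg.LinAlgError as e:
    print("caught:", e)
```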


rossbar commented Jul 12, 2020

Closing as it seems the conversation has moved to another forum. Feel free to reopen if I've missed something.

@rossbar rossbar closed this as completed Jul 12, 2020