
Add LOBPCG solver for large symmetric positive definite eigenproblems #184

Merged: 30 commits into rust-ndarray:master on May 6, 2020

Conversation

@bytesnake (Contributor) commented Mar 12, 2020

This PR ports the LOBPCG algorithm from scipy to Rust. The algorithm is useful for symmetric eigenproblems when only a few eigenvalues are needed (for example for multidimensional scaling of Gaussian kernels). It solves issue #160.

I did not implement the generalized eigenproblem for a matrix B different from the identity, as it is an uncommon use case (at least in machine learning), but if required the modification should be minor.

It also adds access to the LAPACK functions ssygv, dsygv, zhegv, chegv for the generalized eigenvalue problem

A x = λ B x

with an additional mass matrix B. The traits are implemented for tuples of A and B, so you can use it like this:

let (eigvals, (eigvecs, B_cholesky)) = (A, B).eigh(UPLO::Upper);
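
For context, here is a slightly fuller, self-contained sketch of the same call. Only the tuple call itself is taken from this PR; the concrete arrays, the Eigh/UPLO imports, and the .unwrap() (assuming the call returns a Result like the other eigh implementations in this crate) are illustrative assumptions.

use ndarray::arr2;
use ndarray_linalg::{Eigh, UPLO};

fn main() {
    // symmetric matrix A and symmetric positive definite mass matrix B (here the identity)
    let a = arr2(&[[2.0, 1.0], [1.0, 2.0]]);
    let b = arr2(&[[1.0, 0.0], [0.0, 1.0]]);

    // generalized eigenproblem A x = λ B x, called on the tuple (A, B);
    // the .unwrap() is an assumption about the return type, see the note above
    let (eigvals, (eigvecs, b_cholesky)) = (a, b).eigh(UPLO::Upper).unwrap();
    println!("{:?}\n{:?}\n{:?}", eigvals, eigvecs, b_cholesky);
}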

Remaining issues:

  • Implement the orthogonalization to the constraint matrix Y
  • Improve documentation of the lobpcg.rs file and add examples
  • Implement truncated eigenvalue decomposition
  • Implement truncated SVD based on this
  • Benchmark the implementation
  • Add restart routine if Cholesky fails

Example:

// note: import paths assumed; they may differ slightly depending on the crate version
use ndarray::{arr2, Array2};
use ndarray_linalg::{TruncatedOrder, TruncatedSvd};

fn main() {
    let a = arr2(&[[3., 2., 2.], [2., 3., -2.]]);

    // calculate the truncated singular value decomposition for 2 singular values
    let result = TruncatedSvd::new(a, TruncatedOrder::Largest).decompose(2).unwrap();

    // acquire singular values, left-singular vectors and right-singular vectors
    let (u, sigma, v_t) = result.values_vectors();
    println!("Result of the singular value decomposition A = UΣV^T:");
    println!(" === U ===");
    println!("{:?}", u);
    println!(" === Σ ===");
    println!("{:?}", Array2::from_diag(&sigma));
    println!(" === V^T ===");
    println!("{:?}", v_t);
}

@bytesnake bytesnake changed the title Add generalized eigenvalue decomposition Add Locally Optimal Block Preconditioned Conjugated (LOBPC) for large symmetric positive definite eigenproblems Mar 13, 2020
@bytesnake bytesnake changed the title Add Locally Optimal Block Preconditioned Conjugated (LOBPC) for large symmetric positive definite eigenproblems Add Locally Optimal Block Preconditioned Conjugated (LOBPCG) for large symmetric positive definite eigenproblems Mar 13, 2020
@bytesnake bytesnake changed the title Add Locally Optimal Block Preconditioned Conjugated (LOBPCG) for large symmetric positive definite eigenproblems Add LOBPCG solver for large symmetric positive definite eigenproblems Mar 13, 2020
@lobpcg commented Mar 21, 2020

Please let me know if you need any advice on LOBPCG implementation tricks.

@bytesnake (Contributor, Author) commented Mar 22, 2020

Yes, it is always a bit difficult to re-implement something from code and to understand the reasoning behind it. I currently have some free time and decided to implement LOBPCG for manifold learning, and I brushed up some numerics classes on the conjugate gradient and Rayleigh-Ritz methods.
The code is almost identical to the scipy implementation, except for some minor language details (for example, there is no restart flag, because the conjugate matrix P is optional and is automatically restarted when it is None).

  • The (B-)orthonormalization routine uses a Cholesky decomposition instead of a QR decomposition (probably to save some cycles?) and computes the orthonormal matrix Q by explicitly calculating the inverse of R (here). Why not use cho_solve?
  • When does the algorithm start to calculate the Gram matrices XX' and XR' explicitly, and how is the threshold chosen?

Thanks for your help!

@lobpcg commented Mar 23, 2020

@bytesnake Thanks for your efforts! Please see my answers below:

> Yes, it is always a bit difficult to re-implement something from code and to understand the reasoning behind it. I currently have some free time and decided to implement LOBPCG for manifold learning, and I brushed up some numerics classes on the conjugate gradient and Rayleigh-Ritz methods.

Cf. https://scikit-learn.org/stable/modules/generated/sklearn.manifold.spectral_embedding.html

> The code is almost identical to the scipy implementation, except for some minor language details (for example, there is no restart flag, because the conjugate matrix P is optional and is automatically restarted when it is None).

Without the "conjugate" matrix P the iteration indeed runs, but its convergence can be (much) slower.

> The (B-)orthonormalization routine uses a Cholesky decomposition instead of a QR decomposition (probably to save some cycles?) and computes the orthonormal matrix Q by explicitly calculating the inverse of R (here). Why not use cho_solve?

Two reasons:

  1. The inverse of `R` is reused in some cases; otherwise `cho_solve` would have to be called twice, which would be more time consuming.
  2. The operations with "small" matrices like `R` are kept separate, and the only operation involving both "small" matrices like `R` and "tall" matrices like `blockVectorBV` is a matmul, which can, e.g., easily be performed in parallel or on a GPU if available, in contrast to `cho_solve` (see the sketch below).
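
In Rust terms the pattern is roughly the following (an illustrative sketch only, assuming ndarray-linalg's Cholesky and Inverse traits; b_orthonormalize is a hypothetical helper name, not the PR's code):

use ndarray::Array2;
use ndarray_linalg::{Cholesky, Inverse, UPLO};

// Hypothetical helper: B-orthonormalize a tall block V, given BV = B * V.
// Returns (V * R^-1, R^-1), where R is the upper Cholesky factor of V^T B V,
// so the small explicit inverse R^-1 can be reused by the caller.
fn b_orthonormalize(v: &Array2<f64>, bv: &Array2<f64>) -> (Array2<f64>, Array2<f64>) {
    let gram = v.t().dot(bv);                                     // "small" k x k matrix V^T B V
    let r = gram.cholesky(UPLO::Upper).expect("Cholesky failed"); // upper-triangular factor R
    let r_inv = r.inv().expect("inversion failed");               // small explicit inverse
    (v.dot(&r_inv), r_inv)                                        // only tall-by-small matmuls
}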

> When does the algorithm start to calculate the Gram matrices XX' and XR' explicitly, and how is the threshold chosen?

Manually, after a large battery of tests, checking for failures. "Explicit" is more stable but slower. 'float32' is less stable than 'float64', so it requires switching to "explicit" more aggressively.
The current values in https://github.com/scipy/scipy/blob/master/scipy/sparse/linalg/eigen/lobpcg/lobpcg.py

        if activeBlockVectorAR.dtype == 'float32':
            myeps = 1
        elif activeBlockVectorR.dtype == 'float32':
            myeps = 1e-4
        else:
            myeps = 1e-8

are more on the safe and slower side.
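
As a rough illustration only (the myeps values come from the snippet above; the function name and the latching behaviour are assumptions about scipy's internal explicitGramFlag logic, not a verbatim port):

// illustrative sketch of the switching rule discussed above
fn use_explicit_gram(residual_norms: &[f64], single_precision: bool, already_explicit: bool) -> bool {
    // tolerance values taken from the scipy snippet quoted above
    let myeps = if single_precision { 1e-4 } else { 1e-8 };
    // assumption: once the explicit (stable but slower) path is chosen it is kept,
    // and it kicks in when the largest residual norm drops to or below myeps
    already_explicit || residual_norms.iter().cloned().fold(0.0_f64, f64::max) <= myeps
}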

@bytesnake (Contributor, Author) commented:

> @bytesnake Thanks for your efforts! Please see my answers below:

> > Yes, it is always a bit difficult to re-implement something from code and to understand the reasoning behind it. I currently have some free time and decided to implement LOBPCG for manifold learning, and I brushed up some numerics classes on the conjugate gradient and Rayleigh-Ritz methods.

> Cf. https://scikit-learn.org/stable/modules/generated/sklearn.manifold.spectral_embedding.html

I already looked into the code. For now I am trying to keep the code as simple as possible, though support for the Lanczos method through ARPACK or for preconditioners should be added in the future. The main reason for the diffusion map PR is to provide some feedback for the machine learning group in Rust.

> > The code is almost identical to the scipy implementation, except for some minor language details (for example, there is no restart flag, because the conjugate matrix P is optional and is automatically restarted when it is None).

> Without the "conjugate" matrix P the iteration indeed runs, but its convergence can be (much) slower.

Sorry, I wasn't very clear in my comment. I expressed restart and blockVectorP as an Option<T> and changed the code flow a bit. This is just a language detail; of course the information in P is still used.
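
Roughly, the idea looks like this (an illustrative sketch, not the actual PR code; State and restart are hypothetical names):

use ndarray::Array2;

// illustrative sketch: the conjugate block P lives in an Option, so a restart is
// simply P = None and the next Rayleigh-Ritz step only sees the subspace [X, R]
struct State {
    x: Array2<f64>,         // current eigenvector approximations
    r: Array2<f64>,         // residuals
    p: Option<Array2<f64>>, // conjugate directions; None right after a restart
}

fn restart(state: &mut State) {
    // plays the role of scipy's restart flag: with no P present, the iteration
    // automatically falls back to the smaller [X, R] subspace
    state.p = None;
}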

> > The (B-)orthonormalization routine uses a Cholesky decomposition instead of a QR decomposition (probably to save some cycles?) and computes the orthonormal matrix Q by explicitly calculating the inverse of R (here). Why not use cho_solve?

> Two reasons:
>
> 1. The inverse of `R` is reused in some cases; otherwise `cho_solve` would have to be called twice, which would be more time consuming.
>
> 2. The operations with "small" matrices like `R` are kept separate, and the only operation involving both "small" matrices like `R` and "tall" matrices like `blockVectorBV` is a matmul, which can, e.g., easily be performed in parallel or on a GPU if available, in contrast to `cho_solve`.

Thank you for clarifying that.

> > When does the algorithm start to calculate the Gram matrices XX' and XR' explicitly, and how is the threshold chosen?

> Manually, after a large battery of tests, checking for failures. "Explicit" is more stable but slower. 'float32' is less stable than 'float64', so it requires switching to "explicit" more aggressively.
> The current values in https://github.com/scipy/scipy/blob/master/scipy/sparse/linalg/eigen/lobpcg/lobpcg.py
>
>         if activeBlockVectorAR.dtype == 'float32':
>             myeps = 1
>         elif activeBlockVectorR.dtype == 'float32':
>             myeps = 1e-4
>         else:
>             myeps = 1e-8
>
> are more on the safe and slower side.

Okay, I will just adopt these values. The second case happens when the preconditioner operates in 32 bit and the "stiffness matrix" in 64 bit.

@lobpcg commented Mar 25, 2020

@bytesnake you might also want to look at the LOBPCG C code for inspiration: https://github.com/lobpcg/blopex
Please feel free to ping me if you have any more questions. I would also be interested to see performance comparisons to my C and Python codes.

@bytesnake (Contributor, Author) commented:

I wrote a small benchmark this morning comparing BLOPEX and this implementation. For this I modified the blopex_serial_double example by splitting it into a setup part and a solving part and compiled it into a static library. I then created a criterion benchmark comparing ndarray-linalg against the static BLOPEX library. The initial matrix X is drawn from a standard uniform distribution and is the same for both libraries; A is a diagonal matrix with linearly decreasing values. The benchmark was run with a warmup of three seconds and a sample size of 100. Results for a varying number of eigenvalues and a fixed matrix size n = 60:
(violin plot: runtime for varying number of eigenvalues, n = 60)
Results for varying matrix size with the number of eigenvalues fixed to nvals = 1:
(violin plot: runtime for varying matrix size, nvals = 1)
The main difference is that ndarray-linalg uses the optimized matrix multiplication from OpenBLAS, so I don't think this benchmark actually says anything about the LOBPCG implementation itself. However, the runtime scales linearly with the number of eigenvalues and quadratically with the matrix size.
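
For reference, a rough sketch of what such a criterion benchmark can look like on the Rust side (illustrative only: it uses the TruncatedSvd API from the example above rather than the exact code I benchmarked, and it omits the BLOPEX comparison):

use criterion::{criterion_group, criterion_main, Criterion};
use ndarray::{Array1, Array2};
use ndarray_linalg::{TruncatedOrder, TruncatedSvd};

fn bench_truncated_svd(c: &mut Criterion) {
    // diagonal test matrix with linearly decreasing entries 60, 59, ..., 1
    let diag = Array1::linspace(60.0, 1.0, 60);
    let a = Array2::from_diag(&diag);

    c.bench_function("truncated_svd_nvals_1", |bencher| {
        bencher.iter(|| {
            TruncatedSvd::new(a.clone(), TruncatedOrder::Largest)
                .decompose(1)
                .unwrap()
        })
    });
}

criterion_group!(benches, bench_truncated_svd);
criterion_main!(benches);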

@bytesnake (Contributor, Author) commented:

@termoshtt please let me know when this is ready to merge

@termoshtt (Member) commented:

Thanks a lot, and sorry for the late response :<

My concern is about ndarray-rand = 0.11, which requires rand = 0.7 (related to #176).
num-complex 0.3.0-pre is still under development (rust-num/num-complex#70).

Is it possible to replace the ndarray-rand part with the ndarray-linalg::generate::random_* functions?

@bytesnake (Contributor, Author) commented Apr 25, 2020

Yes, of course; the ndarray-rand crate is only needed to generate the initial guess. I will update my PR later this week. (By the way, what distribution does random use?) I also encountered an issue with the restart routine, which I have to fix first.
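
The replacement is roughly this (a sketch; it assumes ndarray_linalg::generate::random takes a shape and fills a dense array, which is exactly why I asked about the distribution):

use ndarray::Array2;
use ndarray_linalg::generate::random;

fn main() {
    // initial guess X for LOBPCG: a 60 x 5 block of random starting vectors,
    // previously generated with ndarray-rand
    let x0: Array2<f64> = random((60, 5));
    println!("initial guess shape: {:?}", x0.dim());
}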

@bytesnake (Contributor, Author) commented:

Pushed a new commit which removes the dependency on ndarray-rand :)

@termoshtt merged commit c033cb9 into rust-ndarray:master on May 6, 2020
@lobpcg commented May 6, 2020

@bytesnake

I put a little advertisement at
https://www.linkedin.com/posts/andrew-knyazev_add-lobpcg-solver-for-large-symmetric-positive-activity-6663627147921375232-D2e_ (the direct link is somehow broken, so to see it navigate to posts from https://www.linkedin.com/in/andrew-knyazev/)

Please let me know if it is OK and ping me with your LinkedIn name if you want me to credit you there.
