
Add LOBPCG solver for large symmetric positive definite eigenproblems #184

Merged: 30 commits into rust-ndarray:master on May 6, 2020

Conversation

@bytesnake (Contributor) commented Mar 12, 2020

This PR ports the LOBPCG algorithm from scipy to Rust. The algorithm is useful for symmetric eigenproblems when only a few eigenvalues are needed (for example for multidimensional scaling of Gaussian kernels). It solves issue #160.

I did not implement the generalized eigenproblem for a matrix B different from the identity, as it is an uncommon use case (at least in machine learning), but if required the modification should be minor.

It also adds access to the LAPACK functions ssygv, dsygv, zhegv, chegv for the generalized eigenvalue problem

A x = λ B x

with an additional mass matrix B. The traits are implemented for tuples of A and B, so you can use it like this:

let (eigvals, (eigvecs, B_cholesky)) = (A, B).eigh(UPLO::Upper);
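
For context, here is a slightly fuller, self-contained sketch of the same call. Only the tuple call itself is taken from this PR; the concrete arrays, the Eigh/UPLO imports, and the .unwrap() (assuming the call returns a Result like the other eigh implementations in this crate) are illustrative assumptions.

use ndarray::arr2;
use ndarray_linalg::{Eigh, UPLO};

fn main() {
    // symmetric matrix A and symmetric positive definite mass matrix B (here the identity)
    let a = arr2(&[[2.0, 1.0], [1.0, 2.0]]);
    let b = arr2(&[[1.0, 0.0], [0.0, 1.0]]);

    // generalized eigenproblem A x = λ B x, called on the tuple (A, B);
    // the .unwrap() is an assumption about the return type, see the note above
    let (eigvals, (eigvecs, b_cholesky)) = (a, b).eigh(UPLO::Upper).unwrap();
    println!("{:?}\n{:?}\n{:?}", eigvals, eigvecs, b_cholesky);
}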

Remaining issues:

  • Implement the orthogonalization to the constraint matrix Y
  • Improve documentation of the lobpcg.rs file and add examples
  • Implement truncated eigenvalue decomposition
  • Implement truncated SVD based on this
  • Benchmark the implementation
  • Add restart routine if Cholesky fails

Example:

// note: import paths assumed; they may differ slightly depending on the crate version
use ndarray::{arr2, Array2};
use ndarray_linalg::{TruncatedOrder, TruncatedSvd};

fn main() {
    let a = arr2(&[[3., 2., 2.], [2., 3., -2.]]);

    // calculate the truncated singular value decomposition for 2 singular values
    let result = TruncatedSvd::new(a, TruncatedOrder::Largest).decompose(2).unwrap();

    // acquire singular values, left-singular vectors and right-singular vectors
    let (u, sigma, v_t) = result.values_vectors();
    println!("Result of the singular value decomposition A = UΣV^T:");
    println!(" === U ===");
    println!("{:?}", u);
    println!(" === Σ ===");
    println!("{:?}", Array2::from_diag(&sigma));
    println!(" === V^T ===");
    println!("{:?}", v_t);
}

@bytesnake bytesnake changed the title Add generalized eigenvalue decomposition Add Locally Optimal Block Preconditioned Conjugated (LOBPC) for large symmetric positive definite eigenproblems Mar 13, 2020
@bytesnake bytesnake changed the title Add Locally Optimal Block Preconditioned Conjugated (LOBPC) for large symmetric positive definite eigenproblems Add Locally Optimal Block Preconditioned Conjugated (LOBPCG) for large symmetric positive definite eigenproblems Mar 13, 2020
@bytesnake bytesnake changed the title Add Locally Optimal Block Preconditioned Conjugated (LOBPCG) for large symmetric positive definite eigenproblems Add LOBPCG solver for large symmetric positive definite eigenproblems Mar 13, 2020
@lobpcg commented Mar 21, 2020

Please let me know if you need any advice on LOBPCG implementation tricks.

@bytesnake (Contributor, Author) commented Mar 22, 2020

Yes, it is always a bit difficult to re-implement something from code and to understand the reasoning behind it. I currently have some free time and decided to implement LOBPCG for manifold learning, and I brushed up some numerics classes on the conjugate gradient and Rayleigh-Ritz methods.
The code is almost identical to the scipy implementation, except for some minor language details (for example, there is no restart flag, because the conjugate matrix P is optional and is automatically restarted when it is None).

  • The (B-)orthonormalization routine uses a Cholesky decomposition instead of a QR decomposition (probably to save some cycles?) and computes the orthonormal matrix Q by explicitly calculating the inverse of R (here). Why not use cho_solve?
  • When does the algorithm start to calculate the Gram matrices XX' and XR' explicitly, and how is the threshold chosen?

Thanks for your help!

@lobpcg commented Mar 23, 2020

@bytesnake Thanks for your efforts! Please see my answers below:

> Yes, it is always a bit difficult to re-implement something from code and to understand the reasoning behind it. I currently have some free time and decided to implement LOBPCG for manifold learning, and I brushed up some numerics classes on the conjugate gradient and Rayleigh-Ritz methods.

Cf. https://scikit-learn.org/stable/modules/generated/sklearn.manifold.spectral_embedding.html

> The code is almost identical to the scipy implementation, except for some minor language details (for example, there is no restart flag, because the conjugate matrix P is optional and is automatically restarted when it is None).

Without the "conjugate" matrix P the iteration indeed runs, but its convergence can be (much) slower.

> The (B-)orthonormalization routine uses a Cholesky decomposition instead of a QR decomposition (probably to save some cycles?) and computes the orthonormal matrix Q by explicitly calculating the inverse of R (here). Why not use cho_solve?

Two reasons:

  1. The inverse of `R` is reused in some cases; otherwise `cho_solve` would have to be called twice, which would be more time consuming.
  2. The operations with "small" matrices like `R` are kept separate, and the only operation involving both "small" matrices like `R` and "tall" matrices like `blockVectorBV` is a matmul, which can, e.g., easily be performed in parallel or on a GPU if available, in contrast to `cho_solve` (see the sketch below).
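
In Rust terms the pattern is roughly the following (an illustrative sketch only, assuming ndarray-linalg's Cholesky and Inverse traits; b_orthonormalize is a hypothetical helper name, not the PR's code):

use ndarray::Array2;
use ndarray_linalg::{Cholesky, Inverse, UPLO};

// Hypothetical helper: B-orthonormalize a tall block V, given BV = B * V.
// Returns (V * R^-1, R^-1), where R is the upper Cholesky factor of V^T B V,
// so the small explicit inverse R^-1 can be reused by the caller.
fn b_orthonormalize(v: &Array2<f64>, bv: &Array2<f64>) -> (Array2<f64>, Array2<f64>) {
    let gram = v.t().dot(bv);                                     // "small" k x k matrix V^T B V
    let r = gram.cholesky(UPLO::Upper).expect("Cholesky failed"); // upper-triangular factor R
    let r_inv = r.inv().expect("inversion failed");               // small explicit inverse
    (v.dot(&r_inv), r_inv)                                        // only tall-by-small matmuls
}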

> When does the algorithm start to calculate the Gram matrices XX' and XR' explicitly, and how is the threshold chosen?

Manually, after a large battery of tests, checking for failures. "Explicit" is more stable but slower. 'float32' is less stable than 'float64', so it requires switching to "explicit" more aggressively.
The current values in https://github.com/scipy/scipy/blob/master/scipy/sparse/linalg/eigen/lobpcg/lobpcg.py

        if activeBlockVectorAR.dtype == 'float32':
            myeps = 1
        elif activeBlockVectorR.dtype == 'float32':
            myeps = 1e-4
        else:
            myeps = 1e-8

are more on the safe and slower side.
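
As a rough illustration only (the myeps values come from the snippet above; the function name and the latching behaviour are assumptions about scipy's internal explicitGramFlag logic, not a verbatim port):

// illustrative sketch of the switching rule discussed above
fn use_explicit_gram(residual_norms: &[f64], single_precision: bool, already_explicit: bool) -> bool {
    // tolerance values taken from the scipy snippet quoted above
    let myeps = if single_precision { 1e-4 } else { 1e-8 };
    // assumption: once the explicit (stable but slower) path is chosen it is kept,
    // and it kicks in when the largest residual norm drops to or below myeps
    already_explicit || residual_norms.iter().cloned().fold(0.0_f64, f64::max) <= myeps
}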

@bytesnake (Contributor, Author) commented:

> @bytesnake Thanks for your efforts! Please see my answers below:

> > Yes, it is always a bit difficult to re-implement something from code and to understand the reasoning behind it. I currently have some free time and decided to implement LOBPCG for manifold learning, and I brushed up some numerics classes on the conjugate gradient and Rayleigh-Ritz methods.

> Cf. https://scikit-learn.org/stable/modules/generated/sklearn.manifold.spectral_embedding.html

I already looked into the code. For now I am trying to keep the code as simple as possible, though support for the Lanczos method through ARPACK or for preconditioners should be added in the future. The main reason for the diffusion map PR is to provide some feedback for the machine learning group in Rust.

> > The code is almost identical to the scipy implementation, except for some minor language details (for example, there is no restart flag, because the conjugate matrix P is optional and is automatically restarted when it is None).

> Without the "conjugate" matrix P the iteration indeed runs, but its convergence can be (much) slower.

Sorry, I wasn't very clear in my comment. I expressed restart and blockVectorP as an Option<T> and changed the code flow a bit. This is just a language detail; of course the information in P is still used.
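
Roughly, the idea looks like this (an illustrative sketch, not the actual PR code; State and restart are hypothetical names):

use ndarray::Array2;

// illustrative sketch: the conjugate block P lives in an Option, so a restart is
// simply P = None and the next Rayleigh-Ritz step only sees the subspace [X, R]
struct State {
    x: Array2<f64>,         // current eigenvector approximations
    r: Array2<f64>,         // residuals
    p: Option<Array2<f64>>, // conjugate directions; None right after a restart
}

fn restart(state: &mut State) {
    // plays the role of scipy's restart flag: with no P present, the iteration
    // automatically falls back to the smaller [X, R] subspace
    state.p = None;
}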

> > The (B-)orthonormalization routine uses a Cholesky decomposition instead of a QR decomposition (probably to save some cycles?) and computes the orthonormal matrix Q by explicitly calculating the inverse of R (here). Why not use cho_solve?

> Two reasons:
>
> 1. The inverse of `R` is reused in some cases; otherwise `cho_solve` would have to be called twice, which would be more time consuming.
>
> 2. The operations with "small" matrices like `R` are kept separate, and the only operation involving both "small" matrices like `R` and "tall" matrices like `blockVectorBV` is a matmul, which can, e.g., easily be performed in parallel or on a GPU if available, in contrast to `cho_solve`.

Thank you for clarifying that.

> > When does the algorithm start to calculate the Gram matrices XX' and XR' explicitly, and how is the threshold chosen?

> Manually, after a large battery of tests, checking for failures. "Explicit" is more stable but slower. 'float32' is less stable than 'float64', so it requires switching to "explicit" more aggressively.
> The current values in https://github.com/scipy/scipy/blob/master/scipy/sparse/linalg/eigen/lobpcg/lobpcg.py
>
>         if activeBlockVectorAR.dtype == 'float32':
>             myeps = 1
>         elif activeBlockVectorR.dtype == 'float32':
>             myeps = 1e-4
>         else:
>             myeps = 1e-8
>
> are more on the safe and slower side.

Okay, I will just adopt these values. The second case happens when the preconditioner operates in 32 bit and the "stiffness matrix" in 64 bit.

@lobpcg commented Mar 25, 2020

@bytesnake you might also want to look at the LOBPCG C code for inspiration: https://github.com/lobpcg/blopex
Please feel free to ping me if you have any more questions. I would also be interested to see performance comparisons to my C and Python codes.

@bytesnake (Contributor, Author) commented:

I wrote a small benchmark this morning comparing BLOPEX and this implementation. For this I modified the blopex_serial_double example by splitting it into a setup part and a solving part and compiled it into a static library. I then created a criterion benchmark comparing ndarray-linalg against the static BLOPEX library. The initial matrix X is drawn from a standard uniform distribution and is the same for both libraries; A is a diagonal matrix with linearly decreasing values. The benchmark was run with a warmup of three seconds and a sample size of 100. Results for a varying number of eigenvalues and a fixed matrix size n = 60:
(violin plot: runtime for varying number of eigenvalues, n = 60)
Results for varying matrix size with the number of eigenvalues fixed to nvals = 1:
(violin plot: runtime for varying matrix size, nvals = 1)
The main difference is that ndarray-linalg uses the optimized matrix multiplication from OpenBLAS, so I don't think this benchmark actually says anything about the LOBPCG implementation itself. However, the runtime scales linearly with the number of eigenvalues and quadratically with the matrix size.
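
For reference, a rough sketch of what such a criterion benchmark can look like on the Rust side (illustrative only: it uses the TruncatedSvd API from the example above rather than the exact code I benchmarked, and it omits the BLOPEX comparison):

use criterion::{criterion_group, criterion_main, Criterion};
use ndarray::{Array1, Array2};
use ndarray_linalg::{TruncatedOrder, TruncatedSvd};

fn bench_truncated_svd(c: &mut Criterion) {
    // diagonal test matrix with linearly decreasing entries 60, 59, ..., 1
    let diag = Array1::linspace(60.0, 1.0, 60);
    let a = Array2::from_diag(&diag);

    c.bench_function("truncated_svd_nvals_1", |bencher| {
        bencher.iter(|| {
            TruncatedSvd::new(a.clone(), TruncatedOrder::Largest)
                .decompose(1)
                .unwrap()
        })
    });
}

criterion_group!(benches, bench_truncated_svd);
criterion_main!(benches);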

@bytesnake (Contributor, Author) commented:

@termoshtt please let me know when this is ready to merge

@termoshtt (Member) commented:

Thanks a lot, and sorry for the late response :<

My concern is about ndarray-rand = 0.11, which requires rand = 0.7 (related to #176).
num-complex 0.3.0-pre is still under development (rust-num/num-complex#70).

Is it possible to replace the ndarray-rand part with the ndarray-linalg::generate::random_* functions?

@bytesnake (Contributor, Author) commented Apr 25, 2020

Yes, of course; the ndarray-rand crate is only needed to generate the initial guess. I will update my PR later this week. (By the way, what distribution does random use?) I also encountered an issue with the restart routine, which I have to fix first.
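
The replacement is roughly this (a sketch; it assumes ndarray_linalg::generate::random takes a shape and fills a dense array, which is exactly why I asked about the distribution):

use ndarray::Array2;
use ndarray_linalg::generate::random;

fn main() {
    // initial guess X for LOBPCG: a 60 x 5 block of random starting vectors,
    // previously generated with ndarray-rand
    let x0: Array2<f64> = random((60, 5));
    println!("initial guess shape: {:?}", x0.dim());
}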

@bytesnake (Contributor, Author) commented:

Pushed a new commit which removes the dependency on ndarray-rand :)

@termoshtt merged commit c033cb9 into rust-ndarray:master on May 6, 2020
@lobpcg commented May 6, 2020

@bytesnake

I put a little advertisement at
https://www.linkedin.com/posts/andrew-knyazev_add-lobpcg-solver-for-large-symmetric-positive-activity-6663627147921375232-D2e_ (the direct link is somehow broken, so to see it navigate to posts from https://www.linkedin.com/in/andrew-knyazev/)

Please let me know if it is OK and ping me with your LinkedIn name if you want me to credit you there.
