QZ-decomposition #1067
Conversation
…on failing across many modules
…est for generalized eigenvalues
@mkeeter While the decomposition itself works (as evidenced by the tests), not all of the tests for the calculated eigenvalues are passing. The next step is to figure out more clues by comparing against scipy's results. Wondering if @Andlon or @sebcrozet have any thoughts on why the returns from LAPACK aren't the greatest.
It doesn't seem to be a matrix conditioning problem that's triggering the issue. Even when generation was limited to matrices with condition number < 10, we're still getting failing tests.
@metric-space: I haven't looked at any details here, but I would advise caution when using determinants for testing. They are extremely sensitive and, in my experience, can never be used to reliably determine if a matrix is singular, for example. I'm not sure why you want to compute the determinant here.

I'm not too familiar with the QZ decomposition, so I'll comment on the case where you might, for example, solve a generalized eigenvalue problem, i.e. find generalized eigenvectors. It seems to me that it's better to directly test the property `A v = lambda * B v`:

```rust
use matrixcompare::assert_matrix_eq;

let a = ...;
let b = ...;
let av = a * v;
let lambda_bv = lambda * (b * v);

// This isn't necessarily a foolproof tolerance by any means,
// but it might be a good place to start?
let tol = 1e-12 * (av.amax() + lambda_bv.amax());

// in a standard test
assert_matrix_eq!(av, lambda_bv, comp = abs, tol = tol);

// in a proptest
prop_assert_matrix_eq!(av, lambda_bv, comp = abs, tol = tol);
```
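For a concrete sense of what this residual check does, here is a self-contained sketch in plain Rust, without nalgebra or matrixcompare. The 2x2 matrices and the eigenpair are hand-picked for illustration only (A = diag(2, 3), B = I, so (lambda, v) = (2, e1) is a generalized eigenpair):

```rust
// Sketch of the residual-based check A v = lambda * B v with a
// scale-aware absolute tolerance, on a hand-built 2x2 example.
fn residual_and_tol() -> (f64, f64) {
    let a = [[2.0_f64, 0.0], [0.0, 3.0]];
    let b = [[1.0_f64, 0.0], [0.0, 1.0]];
    let v = [1.0_f64, 0.0];
    let lambda = 2.0_f64;

    let matvec = |m: &[[f64; 2]; 2], x: &[f64; 2]| {
        [m[0][0] * x[0] + m[0][1] * x[1], m[1][0] * x[0] + m[1][1] * x[1]]
    };
    let amax = |x: &[f64; 2]| x[0].abs().max(x[1].abs());

    let av = matvec(&a, &v);
    let bv = matvec(&b, &v);
    let lambda_bv = [lambda * bv[0], lambda * bv[1]];

    // Tolerance scaled by the magnitudes involved, as suggested above.
    let tol = 1e-12 * (amax(&av) + amax(&lambda_bv));
    let residual = amax(&[av[0] - lambda_bv[0], av[1] - lambda_bv[1]]);
    (residual, tol)
}
```

For this exact eigenpair the residual is identically zero, so the check passes for any positive tolerance; the point is the shape of the test, not this particular example.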
Apart from that, except for maybe Givens rotations or Householder matrices, virtually all linear algebra algorithms are susceptible to particularly badly conditioned matrices (what this means depends on the context). I suspect you can find a particularly nasty adversarial example for most algorithms. I have no idea how robust the LAPACK QZ decompositions are in this case, but randomized testing of linear algebra is in general very, very difficult, because you cannot normally find a single tolerance that works across all examples: in fact, you'd need relatively sharp theoretical error bounds to compare with! And moreover, you would even need an "exact" answer to begin with, which by itself is difficult to obtain, sometimes requiring exact arithmetic.
Sorry for the accidental close - misclicked by about 2 millimeters.
@Andlon Really, really appreciate the feedback and help. A mistake on my part was not making things clearer by linking to proper literature in the PR description. Will be working to fix this.
I had assumed that, as per the wikipedia page on generalized eigenvalues and this description of QZ decomposition,
the ratios of the diagonal elements of S to those of T are the generalized eigenvalues, where the generalized eigenvalues are defined as the values lambda that satisfy `A v = lambda * B v` for some nonzero vector v.
Apologies if that does nothing for your understanding of the problem and I'm merely regurgitating what you already know. The determinant sensitivity problem is something I'll take note of and bear in mind from now on. Thank you.
This + everything else you've said has me convinced this is the way forward for the PR.
My apologies, the determinant equation certainly makes sense. I'm not sure what I was thinking: perhaps I first thought you were using it as a proxy to check if the matrix is singular. @metric-space: I think the LAPACK manual might interest you, as they write in detail about how they test their algorithms.
…ia LAPACK routines sggev/dggev
…est for QZ's calculation of eigenvalues
Thank you for the link, and I will keep this in mind.
Update: still working on this; at the moment chasing a particular detail in the testing portion of the pdf linked by @Andlon.
Apologies to @Andlon, I gave up on looking up why the LAPACK folks test with the ratios that they do. The folks at Eigen seem to be testing in a mix of the LAPACK way (sorta kinda) plus the usual tests: https://gitlab.com/libeigen/eigen/-/blob/master/test/eigensolver_generalized_real.cpp#L69

I decided to minimize how much I do with the raw results from LAPACK. The reasoning is that their tests still hold for the outputs from LAPACK, and the smaller my imprint, the better the chances of their verification of functionality still standing through these wrappers.

Still open to hunting down details. Just a little pooped.
@Andlon it turns out that there were problems with how the eigenvector matrix build-up was being handled by the code I wrote. Corrected that and removed the condition-number-based filtering of tests. Sorry for the trouble caused as a result, though the determinant part of things still stands.

@sebcrozet I'm not sure why tests associated with code I haven't touched are failing, and the CUDA build is failing as well.
@metric-space Let’s ignore the CUDA failure. I’m guessing something changed upstream with the CUDA installation detection. I will fix this in another PR. Regarding the other test failing, it isn’t uncommon that the svd or eigen-decomposition tests fail sometimes, most likely because a particularly badly conditioned matrix is generated randomly (or because of #1072 perhaps).
Changes look good to me, but in the process I noticed some things that escaped me the first time around. Maybe you have some thoughts? See comments!
@@ -265,23 +266,8 @@ where
out[i] = if self.beta[i].clone().abs() < T::RealField::default_epsilon() {
Complex::zero()
The same comment on the check applies here.
while c < n {
if eigenvalues[c].im.abs() > T::RealField::default_epsilon() && c + 1 < n && {
if eigenvalues[c].im.abs() > epsilon && c + 1 < n && {
I missed this the first time around, but at first glance this logic does not seem sound to me. What does it do? I think this needs a solid comment and/or documentation.
For example, imagine scaling a matrix by `1e20`; then all its eigenvalues get scaled by `1e20`, and this logic will behave entirely differently. It seems to me that at the very least we'd need to look relative to the maximum absolute eigenvalue, but even then it's not bulletproof.
The scaling point is something I've never considered; it's a pretty solid point.

But in this scenario, LAPACK uses the eigenvalues to signal how to assemble the eigenvectors. If an eigenvalue is complex and the subsequent eigenvalue is its complex conjugate, the eigenvector to be assembled has its real and imaginary parts in two subsequent columns.

Does the scaling edge case apply here, seeing that it's more of an assembly-by-signalling problem than a numeric one? In the documentation, the way to check for realness is by looking at the a_j values, which get mapped to the imaginary part of an eigenvalue. While it's interesting for me to look deeper in general, and perhaps into the LAPACK code, I was wondering if the burden of getting this edge case right falls more on LAPACK?

In either case, putting in a comment here does make sense. Is this not apt to explain what is being done? It's possible I've been in the trenches long enough to not know whether it maps well: https://github.com/dimforge/nalgebra/pull/1067/files#diff-a056a4c6b4fb629735a73fcb65e0d5223386cb47dedb6ab80d5a1b905ebd9d25R198
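To make the assembly rule being described concrete, here is a plain-Rust sketch (this is not the PR's actual code: complex numbers are `(re, im)` tuples, and `vr` stands in for LAPACK's packed real eigenvector output). The realness check uses an exact zero comparison, which is one of the two options debated in this thread:

```rust
type C = (f64, f64); // (re, im)

/// Assemble complex right eigenvectors from a LAPACK-style packed real
/// matrix. `eigenvalues[j]` is (re, im); `vr[i][j]` is row i, column j.
/// If eigenvalues j and j+1 form a conjugate pair, columns j and j+1 hold
/// the real and imaginary parts of one eigenvector:
///   v(j)   = VR(:,j) + i*VR(:,j+1)
///   v(j+1) = VR(:,j) - i*VR(:,j+1)
fn assemble(eigenvalues: &[C], vr: &[Vec<f64>]) -> Vec<Vec<C>> {
    let n = eigenvalues.len();
    let mut out = vec![vec![(0.0, 0.0); n]; n]; // out[j] = j-th eigenvector
    let mut j = 0;
    while j < n {
        if eigenvalues[j].1 != 0.0 && j + 1 < n {
            // Conjugate pair: two real columns encode one complex vector
            // and its conjugate.
            for i in 0..n {
                out[j][i] = (vr[i][j], vr[i][j + 1]);
                out[j + 1][i] = (vr[i][j], -vr[i][j + 1]);
            }
            j += 2;
        } else {
            // Real eigenvalue: the column is the eigenvector as-is.
            for i in 0..n {
                out[j][i] = (vr[i][j], 0.0);
            }
            j += 1;
        }
    }
    out
}
```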
I think for me it's not clear why we need to do any kind of "post-processing" at all. It would be nice to have a comment explaining this. Moreover, I would be very surprised if LAPACK expects you to use approximate comparison if it's signaling something through values - are you sure exact comparison is not more appropriate? I just don't have enough context to know what is the appropriate thing to do here.
This is fair. I think there is some level of post-processing expected, but not to the extent I had done, which was more a result of not trusting LAPACK, which may be unwarranted. The changes reflect what I think LAPACK minimally expects us to do.
relative_eq!(
((&l_a - &l_b)*vsr.column(i)).map(|x| x.modulus()),
OMatrix::zeros_generic(n, Const::<1>),
epsilon = 1.0e-5));
Relative comparisons tend to break down near zero - can't this make this test relatively brittle?
I see what you mean. I believe there is the absdiff catch as shown here https://docs.rs/approx/0.4.0/src/approx/relative_eq.rs.html#68 to tackle the concern. Hope I'm not preaching to the choir. If I am, apologies
I didn't know that `relative_eq!` also has an absolute tolerance check in addition - that's very unexpected.

It is a little unfortunate then that `approx` is used so liberally within `nalgebra`. Using default absolute tolerances has a number of pitfalls - in particular because it is generally impossible to know the right magnitude to use without further context.

With all that said, it might be fine in this case, if you know that the "scale" (for example, the largest entry) of your numbers is expected to be somewhat close to 1.
> With all that said, it might be fine in this case, if you know that the "scale" (for example, the largest entry) of your numbers is expected to be somewhat close to 1.

The eigenvectors seem to be guaranteed by LAPACK in their documentation:

> Each eigenvector is scaled so the largest component has abs(real part) + abs(imag. part) = 1.

So I think scale-wise we should be good.
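If one wanted to sanity-check that documented guarantee in a test, a minimal hypothetical helper (eigenvector components as `(re, im)` tuples) could look like this:

```rust
// Check LAPACK's documented normalization: the largest component of the
// eigenvector satisfies abs(real part) + abs(imag part) = 1 (up to tol).
fn is_lapack_normalized(v: &[(f64, f64)], tol: f64) -> bool {
    let max_1norm = v
        .iter()
        .map(|&(re, im)| re.abs() + im.abs())
        .fold(0.0_f64, f64::max);
    (max_1norm - 1.0).abs() <= tol
}
```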
@Andlon I asked for a re-review just to notify you that I have replies to your comments rather than code. Hope that isn't a bother.
/// u(j)**H * A = lambda(j) * u(j)**H * B .
/// where u(j)**H is the conjugate-transpose of u(j).
///
/// What is going on below?
As an answer to your question here: #1067 (comment)
I think this doc comment is very confusing because it's not clear what "below" is. Keep in mind that the source code is not generally displayed under the documentation when browsing docs, so you certainly wouldn't expect it to mean code in that case, but rather the docs that follow.

Anyway, for what comes below, is it possible to give a more intuitive explanation, perhaps in addition? Do generalized complex eigenvalues come in conjugate pairs? Maybe you can start by actually writing this out, since I don't have experience with generalized eigenvalues and it's not something I would immediately think of. I also don't really understand the `VSL` notation (although the MATLAB-like indexing is OK). What is `VSL` again? It's not explained here in the docs.
@Andlon As always, thank you for the review. Once again the ball is in your court.
The docs definitely improved a lot, but I'm still confused about the eigenvectors extraction, see comments.
/// Outputs two matrices, the first one containing the left eigenvectors of the generalized eigenvalues
/// as columns and the second matrix contains the right eigenvectors of the generalized eigenvalues
/// as columns
/// Outputs two matrices.
Doesn't there need to be an extra newline here to make sure only the first sentence ends up in the module overview?
re: module overview: isn't the module overview here? https://github.com/dimforge/nalgebra/pull/1067/files#diff-a056a4c6b4fb629735a73fcb65e0d5223386cb47dedb6ab80d5a1b905ebd9d25R16
Sorry, not module overview, but the docs for the struct. Please render them (`cargo doc`) and see :)
/// as columns and the second matrix contains the right eigenvectors of the generalized eigenvalues
/// as columns
/// Outputs two matrices.
/// The first output matix contains the left eigenvectors of the generalized eigenvalues
typo: "matix"
/// How the eigenvectors are build up:
///
/// Since the input entries are all real, the generalized eigenvalues if complex come in pairs
/// as a consequence of <https://en.wikipedia.org/wiki/Complex_conjugate_root_theorem>
Not sure how this link renders - I'm not familiar with the angle bracket syntax here, but maybe use "proper" linking to make the text overall more readable (URLs break the flow of reading)? E.g. [the complex conjugate root theorem](https://en.wikipedia.org/wiki/Complex_conjugate_root_theorem).
/// u(j)**H * A = lambda(j) * u(j)**H * B
/// where u(j)**H is the conjugate-transpose of u(j).
///
/// How the eigenvectors are build up:
"built"?
///
/// How the eigenvectors are build up:
///
/// Since the input entries are all real, the generalized eigenvalues if complex come in pairs
I'm very confused by this sentence. It says "if complex". But don't we always have to expect complex eigenvalues and eigenvectors?
I think I understand what you mean. Real numbers are a subset of the complex numbers; if we are talking about a mix of complex and real eigenvalues, why not refer to them by the generalized type?

LAPACK does it explicitly here:

> If ALPHAI(j) is zero, then the j-th eigenvalue is real;

My guess is that, computationally, in this case (going via the LAPACK docs) complex numbers, i.e. those with a non-zero imaginary part, seem like the exception, and the ones with a zero imaginary part don't seem to add any value (computational-resource-wise) when treated as explicitly complex. Or at least that seems to be what I gathered.

Fixed doc.

Edit: misunderstood intent
let e = eigenvalues[c + 1];
(&e_conj.re).ulps_eq(&e.re, epsilon, 6) && (&e_conj.im).ulps_eq(&e.im, epsilon, 6)
} {
if eigenvalues[c].0.im.abs() > epsilon && c + 1 < n {
I'm still uneasy about this check. Does the LAPACK manual/docs have any recommendations for how to interpret the data?
In the LAPACK docs:

> If ALPHAI(j) is zero, then the j-th eigenvalue is real;

`alphai` is a double precision array as per LAPACK. That's as far as I've gathered from the docs on how to interpret the data.

The `alphai` in the docs gets pushed into the imaginary portion of the first elements of the tuples in the list of tuples returned by `raw_eigenvalues`.
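For illustration, a caller-side sketch (hypothetical names, not the PR's actual API) of turning such raw `(alpha, beta)` data into eigenvalues lambda = alpha / beta, with the beta-zero case left to the caller rather than silently producing a value:

```rust
type C = (f64, f64); // (re, im)

/// Convert a raw (alpha, beta) pair into the eigenvalue lambda = alpha / beta.
/// Returns None when beta is exactly zero, leaving the infinite-eigenvalue
/// case to the caller to handle.
fn eigenvalue_from_raw(alpha: C, beta: f64) -> Option<C> {
    if beta == 0.0 {
        None
    } else {
        Some((alpha.0 / beta, alpha.1 / beta))
    }
}
```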
Do you know if LAPACK explicitly sets this to zero, or if it just happens to become "close enough" to zero? From reading that, I'd say that the only "safe" thing to do is to use an equality comparison (`==`), because any attempt at approximate comparison might have ill-intended effects. But I also don't know enough about what LAPACK is doing here and what it expects users to do...

Certainly the current epsilon check breaks down if the matrix doesn't happen to have its max eigenvalue scaled to roughly ~1.
@Andlon Apologies for the radio silence. I thought the best way to check whether LAPACK suggests equality over "close enough" was to sub the equality operator in and run the tests. It fails the tests.

You've been driving the scaled-eigenvalue point for a while; it was only now that the seriousness struck. Apologies for that.

So LAPACK definitely doesn't take care of the scaling problem, which is evident if you apply random scaling to both A and B (the input matrices) for the tests (the tests fail).

I can go ahead and, in the method that assembles the eigenvectors by looking at the eigenvalues, bring back the eigenvalue method I deleted, obtain the max eigenvalue, and scale the eigenvalue vector so that the max eigenvalue is ~1.

Are we in agreement that this is the way forward?

Again, I think the tests say LAPACK doesn't deal with the problem, and if for some reason it (LAPACK) did recommend actual equality, the tests sure say the opposite, i.e. "close enough" is what we need.
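To make the scaling point concrete, here is a tiny illustration (not nalgebra code) of the two checks being debated: a fixed absolute epsilon threshold is scale-dependent, while an exact-zero test relies only on LAPACK literally writing `alphai(j) = zero`:

```rust
// Two ways to decide whether a returned eigenvalue is "real",
// given its imaginary part `im`.
fn looks_real_fixed_eps(im: f64) -> bool {
    im.abs() <= f64::EPSILON // fixed absolute threshold (~2.2e-16)
}

fn looks_real_exact(im: f64) -> bool {
    im == 0.0 // relies on LAPACK setting alphai(j) exactly to zero
}
```

For a genuinely complex eigenvalue from a problem scaled down by `1e-20`, say with imaginary part `0.5e-20`, the fixed-threshold check classifies it as real (misclassification), while the exact check still sees it as complex.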
/// (Note: VL stands for the lapack real matrix output containing the left eigenvectors as columns,
/// VR stands for the lapack real matrix output containing the right eigenvectors as columns)
///
/// If the j-th and (j+1)-th eigenvalues form a complex conjugate pair,
How does the user know this?
Moved to a comment, so the caller doesn't have to know the details unless they inspect the source code
/// What is going on below?
/// If the j-th and (j+1)-th eigenvalues form a complex conjugate pair,
/// then u(j) = VSL(:,j)+i*VSL(:,j+1) and u(j+1) = VSL(:,j)-i*VSL(:,j+1).
/// and then v(j) = VSR(:,j)+i*VSR(:,j+1) and v(j+1) = VSR(:,j)-i*VSR(:,j+1).
pub fn eigenvectors(self) -> (OMatrix<Complex<T>, D, D>, OMatrix<Complex<T>, D, D>)
Another thing I did not realize before now: here we take `self` by value, but in the implementation we still clone `self.vsl` and `self.vsr`. This seems a little unfortunate, perhaps?
/// and
///
/// u(j) = VR(:,j)+i*VR(:,j+1)
/// v(j+1) = VR(:,j)-i*VR(:,j+1).
I'm still very confused about this. Isn't this what's going on internally in the method below? The docs here are written as if the user needs to do this after calling this method, but it looks to me that the method actually takes care of all this?
I agree the user doesn't need to know unless they inspect the source code. Portion moved to comment within function body
Hi @metric-space,
sorry for taking so long. I'm finding it very difficult to find the time for code reviews at the moment, and as I said at the start, I'm not very familiar with the LAPACK APIs.
I think the PR has greatly improved (especially the docs) since we started. What remains is mainly the somewhat open question about what to do about the "approximate" eigenvalue checks. I still feel very queasy about this, even though you say that the tests pass only with the "approximate" test. Looking at the LAPACK docs for `dhgeqz`, they say:

> If beta is nonzero, lambda = alpha / beta is an eigenvalue of the generalized nonsymmetric eigenvalue problem (GNEP)
>
>     A*x = lambda*B*x
>
> and if alpha is nonzero, mu = beta / alpha is an eigenvalue of the alternate form of the GNEP
>
>     mu*A*y = B*y.
I'd like to think that the LAPACK folks really mean nonzero when they write nonzero, and that they would otherwise specify how to deal with any kind of approximate check. But I don't know - as I said I have little experience with the LAPACK API.
However, at this point your expertise is certainly greater than mine, so I'll defer to your decision here.
I know you've already spent much time in implementing this and one reaches a point where further work is very tiresome, so I don't want to impose more burdens on you. In any case whether it's ready for merge is not ultimately my decision. That said, I feel that I should point out that I would feel better with some more tests. Proptests alone are prone to giving us a false sense of safety.
Hey @Andlon

Thank you for this. That said, though it may be tiresome, in the grand scheme of things it's a small price to pay to put out something trustworthy. So, given this short break, I will be able to look at this again, go through the docs again, and dig a little deeper to see if my take is at odds with the reality of LAPACK.
Basing the newest changes off this line of Fortran in the latest docs linked by @Andlon: `alphai( j ) = zero`. Hoping this change and the passing tests put an end to this PR. What say you, @Andlon and @sebcrozet?
Big thanks for working through this!
Thank you to both @Andlon and @sebcrozet
Aim

This hopes to close #1056

LAPACK routines wrapped around

sggev / dggev

Notes

Added a method `raw_eigenvalues` that outputs tuples of `alpha` and `beta` values straight from LAPACK without processing them. This is done so the user can decide how to handle cases where `beta` is almost zero.

How this was tested
The following commands were used to run tests
PROPTEST_MAX_SHRINK_ITERS=100000 cargo test qz --features arbitrary,proptest-support -- --nocapture;
PROPTEST_MAX_SHRINK_ITERS=100000 cargo test generalized_eigenvalues --features arbitrary,proptest-support -- --nocapture;