forked from Reference-LAPACK/lapack
-
Notifications
You must be signed in to change notification settings - Fork 0
Add tests for the branch qr+cs gsvd #4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
weslleyspereira
wants to merge
270
commits into
christoph-conrads:qr+cs-gsvd
from
weslleyspereira:add-tests-for-xGGQRCS-branch-qr+cs-gsvd
Closed
Add tests for the branch qr+cs gsvd #4
weslleyspereira
wants to merge
270
commits into
christoph-conrads:qr+cs-gsvd
from
weslleyspereira:add-tests-for-xGGQRCS-branch-qr+cs-gsvd
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Ease debugging
This bug was introduced in commit 446ac9a2.
Make the test slightly more robust by using non-diagonal test matrices.
This bug was discovered by the random test from commit dcd04cd. Minimal triggering example with m=1, n=2, p=1 with non-random entries: A = [1, 1], B = [1, 0].
Fix a problem where double precision code was built when single precision was requested.
Fix harmless out-of-bounds accesses to avoid spurious address sanitizer (ASAN) failures.
Accidentally, the code check for infinity when finite numbers were expected.
Fix the U2 column permutation when the pre-processing of the matrix B is enabled and rank(B) is larger than p - rank(B).
The pre-processing may make the computation of some angles by the CS decomposition superfluous. These missing angles should properly computed for any combination of pre-processing flags starting with this commit. The tests are still not passing because the pre-processing of B is still broken when B has full rank because the bottom right element of B containing the elementary reflector is overwritten by the scalar belonging to the elementary reflector
* add hints for pre-processing of input matrices A, B * add parameter for matrix X and its leading dimension LDX (this simplifies workspace management a bit as well) * reduce workspace size by evaluating matrix ranks * simplify workspace management * reserve workspace for scalar factors of QR factorization of A, B instead of re-using ALPHA, BETA * avoid unnecessary xGEMM (instead of xTRMM ) call when assembling X * insert debugging code overwriting arrays with NaNs
Whenever the norm of B was zero, the calculated norm of G would be NaN. The new checks for NaN caught this problem immediately.
Also swap the pre-processing hints when swapping the order of the input matrices.
Fix a stupid bug caused by replacing the identifier `L` with `RANK` causing a multiplication with a matrix from the *R*ight instead of the *L*eft side. Yesterday several more instances of this bug were fixed; the fixes were squashed into the commit changing the SGGQRCS API. This commit fixes the pre-processing of the input matrix A.
The matrix pre-processing passes all C++ tests in the branch christoph-conrads/qr+cs-gsvd now.
When pre-processing of B is enabled, fix the column order of U2 immediately after multiplying U2 with the orthogonal factor of the QR decomposition of B instead of performing other, unrelated computations in between.
Streamline, comment, and speed up the code adjusting the singular values and the rows of the matrix X for the matrix scaling. * Exploit pre-processing information to avoid calls to expensive trigonometric functions when results are known to be zero or one. * Avoid having to move angles computed by the CS decomposition * add extensive commments
* allow more matrices with more rows than columns for testing pre-processing * halve maximum xGGQRCS matrix size
SORMQR requires a workspace of size 4000+ for 1x1 input (bug?). Luckily, the call can be omitted.
With the new pre-processing enabled, some matrices have a larger backward error than before. Note that the constant factor in the tolerance expression was doubled but the matrix dimension factor reduced.
Show the output every 60 seconds in the infinite xGGQRCS test unless an iteration takes more than 60 seconds.
Both test problems were found by the random GSVD test.
The algorithm is a simple blocked matrix transposition. In comparison to a simple column-wise read, this approach reduces level 1 cache misses in Cachegrind from 17% to 12%; the last-level cache performance is near 6% for both solutions. Cachegrind cache configuration * level 1: 2048 bytes, 8-way associative, 64 bytes cache line size * last level: 8192 bytes, 16-way associative, 64 bytes cache line size Cachegrind command line: ``` valgrind --tool=cachegrind --D1=2048,8,64 --LL=8192,16,64 ./a.out ```
Previously, xGGQRCS would rely exclusively on QR factorizations for pre-processing the input matrices. These operations were numerically stable but if G = [A; B] had a large number of columns, then xGGQRCS would be significantly slowed down so an LQ decomposition of G was calculated but this caused backward errors 45 times larger than the tolerance for matrix pairs as small as 3x20. These observations match the numerical linear algebra theory saying that the LQ decomposition has a bounded norm-wise round-off error in every column but there are no such bounds for individual columns, cf. Higham: "Accuracy and Stability of Numerical Algorithms" (2nd edition), 2002, Section 19.4 "Pivoting and Row-Wise Stability". There are several possible solutions: * An LQ factorization with row pivoting. There is no such algorithm in LAPACK and porting the existing QR factorization with column pivoting in xGEQP3 will give horrible performance because xGEQP3 relies on matrix columns and elementary reflectors being laid out sequentially in memory. * An in-place matrix transposition followed by a call to xGEQP3 followed by another in-place matrix transposition. In-place matrix transposition is not implemented in LAPACK; implementing this functionality is no trivial and requires a good understanding of modern computer architecture and number theory. * A matrix transposition (not in place) followed by a call to xGEQP3 followed by another matrix transposition. This is possible with the extra memory available when X must be stored. This is the solution implemented in this commit.
The C++ code was used in the performance measurements for SGETRP.
Compute the LQ decomposition with row-pivoting immediately.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Adapt tests from #3 for the new interface of xGGQRCS.