Add tests for the branch qr+cs gsvd #4

weslleyspereira · 2021-07-29T14:16:16Z

Adapt tests from #3 for the new interface of xGGQRCS.

Ease debugging

This bug was introduced in commit 446ac9a2.

Make the test slightly more robust by using non-diagonal test matrices.

This bug was discovered by the random test from commit dcd04cd. Minimal triggering example with m=1, n=2, p=1 with non-random entries: A = [1, 1], B = [1, 0].

Fix a problem where double precision code was built when single precision was requested.

Fix harmless out-of-bounds accesses to avoid spurious address sanitizer (ASAN) failures.

The code accidentally checked for infinity when finite numbers were expected.

Fix the U2 column permutation when the pre-processing of the matrix B is enabled and rank(B) is larger than p - rank(B).

The pre-processing may make the computation of some angles by the CS decomposition superfluous. Starting with this commit, these missing angles should be computed properly for any combination of pre-processing flags. The tests are still not passing because the pre-processing of B remains broken when B has full rank: the bottom-right element of B containing the elementary reflector is overwritten by the scalar belonging to the elementary reflector.

* add hints for pre-processing of input matrices A, B
* add parameter for matrix X and its leading dimension LDX (this simplifies workspace management a bit as well)
* reduce workspace size by evaluating matrix ranks
* simplify workspace management
* reserve workspace for scalar factors of QR factorization of A, B instead of re-using ALPHA, BETA
* avoid unnecessary xGEMM (instead of xTRMM) call when assembling X
* insert debugging code overwriting arrays with NaNs
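The NaN-overwriting idea from the last item can be illustrated in a few lines. This is a minimal sketch in Python, not the Fortran debugging code from the commit; `poison`, `first_non_finite`, and `buggy_fill` are hypothetical names introduced only for illustration.

```python
import math


def poison(buffer):
    """Overwrite every entry with NaN so that reads of unset memory show up."""
    for i in range(len(buffer)):
        buffer[i] = math.nan


def first_non_finite(values):
    """Return the index of the first NaN or infinity, or None if all are finite."""
    for index, value in enumerate(values):
        if not math.isfinite(value):
            return index
    return None


def buggy_fill(out, count):
    """Hypothetical routine under test: it forgets to write the last entry."""
    for i in range(count - 1):
        out[i] = float(i)


results = [0.0] * 4
poison(results)          # every entry is NaN before the routine runs
buggy_fill(results, 4)   # writes entries 0..2 only
bad_index = first_non_finite(results)  # the unwritten entry is still NaN
```

Because NaN propagates through arithmetic, an entry that was never written tends to surface in the final results instead of silently contributing a stale but plausible value.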

Whenever the norm of B was zero, the calculated norm of G would be NaN. The new checks for NaN caught this problem immediately.

Also swap the pre-processing hints when swapping the order of the input matrices.

Fix a stupid bug caused by replacing the identifier `L` with `RANK` causing a multiplication with a matrix from the *R*ight instead of the *L*eft side. Yesterday several more instances of this bug were fixed; the fixes were squashed into the commit changing the SGGQRCS API. This commit fixes the pre-processing of the input matrix A.

The matrix pre-processing passes all C++ tests in the branch christoph-conrads/qr+cs-gsvd now.

When pre-processing of B is enabled, fix the column order of U2 immediately after multiplying U2 with the orthogonal factor of the QR decomposition of B instead of performing other, unrelated computations in between.

Streamline, comment, and speed up the code adjusting the singular values and the rows of the matrix X for the matrix scaling.

* exploit pre-processing information to avoid calls to expensive trigonometric functions when results are known to be zero or one
* avoid having to move angles computed by the CS decomposition
* add extensive comments
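The first item can be sketched as follows, under the assumption that the pre-processing tells us how many angles are exactly 0 and how many are exactly π/2, so that only the remaining angles need `cos` and `sin`. The function name, the argument layout, and the ordering of the angles are hypothetical and are not taken from xGGQRCS.

```python
import math


def cos_sin_pairs(num_zero, computed_angles, num_half_pi):
    """Return (cos, sin) pairs, calling cos/sin only for the computed angles."""
    pairs = [(1.0, 0.0)] * num_zero                  # angle 0: cos = 1, sin = 0
    pairs += [(math.cos(t), math.sin(t)) for t in computed_angles]
    pairs += [(0.0, 1.0)] * num_half_pi              # angle pi/2: cos = 0, sin = 1
    return pairs


pairs = cos_sin_pairs(1, [math.pi / 4], 1)
```

Besides saving function calls, writing the known values directly yields exact zeros and ones; evaluating `math.cos(math.pi / 2)` in floating point typically returns a tiny nonzero number instead.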

* allow more matrices with more rows than columns for testing pre-processing
* halve maximum xGGQRCS matrix size

SORMQR requires a workspace of size 4000+ for 1x1 input (bug?). Luckily, the call can be omitted.

With the new pre-processing enabled, some matrices have a larger backward error than before. Note that the constant factor in the tolerance expression was doubled but the matrix dimension factor reduced.

Show the output every 60 seconds in the infinite xGGQRCS test unless an iteration takes more than 60 seconds.
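A minimal sketch of such time-based reporting, assuming the test is a loop over independent iterations. The function and its parameters are hypothetical; the clock is injectable only so that the behaviour can be checked without waiting.

```python
import time


def run_with_periodic_output(iterations, interval=60.0,
                             clock=time.monotonic, report=print):
    """Run each callable and report progress at most once per interval."""
    last_report = clock()
    count = 0
    for run in iterations:
        run()
        count += 1
        now = clock()
        # An iteration longer than the interval delays the report until it ends.
        if now - last_report >= interval:
            report(f"{count} iterations completed")
            last_report = now
    return count


# Usage with a fake clock that advances 25 seconds per call.
fake_times = iter([0.0, 25.0, 50.0, 75.0, 100.0, 125.0])
messages = []
total = run_with_periodic_output(
    [lambda: None] * 5,
    clock=lambda: next(fake_times),
    report=messages.append,
)
```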

Both test problems were found by the random GSVD test.

The algorithm is a simple blocked matrix transposition. In comparison to a simple column-wise read, this approach reduces level 1 cache misses in Cachegrind from 17% to 12%; the last-level cache performance is near 6% for both solutions.

Cachegrind cache configuration:
* level 1: 2048 bytes, 8-way associative, 64 bytes cache line size
* last level: 8192 bytes, 16-way associative, 64 bytes cache line size

Cachegrind command line:
```
valgrind --tool=cachegrind --D1=2048,8,64 --LL=8192,16,64 ./a.out
```
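For readers unfamiliar with the technique, the loop structure of a blocked, out-of-place transposition looks roughly like this. It is a sketch in Python, not the code from the commit, and it shows only the access pattern; the block size of 8 and the flat column-major storage are assumptions, and Python will not reproduce the cache behaviour measured above.

```python
def transpose_blocked(a, rows, cols, block=8):
    """Transpose a column-major rows x cols matrix stored in a flat list.

    The result is the cols x rows transpose, also in column-major order.
    """
    out = [0.0] * (rows * cols)
    # Walk over block x block tiles so that reads and writes stay local.
    for jb in range(0, cols, block):
        for ib in range(0, rows, block):
            for j in range(jb, min(jb + block, cols)):
                for i in range(ib, min(ib + block, rows)):
                    # a(i, j) lives at i + j * rows; out(j, i) at j + i * cols.
                    out[j + i * cols] = a[i + j * rows]
    return out


# The 2 x 3 matrix [[1, 2, 3], [4, 5, 6]] in column-major order.
transposed = transpose_blocked([1, 4, 2, 5, 3, 6], 2, 3)
```

A plain column-wise read touches the output with a stride of `cols` for an entire column, whereas the tiled loops limit both the reads and the writes to a small working set at a time.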

Previously, xGGQRCS would rely exclusively on QR factorizations for pre-processing the input matrices. These operations were numerically stable, but if G = [A; B] had a large number of columns, then xGGQRCS would be significantly slowed down. An LQ decomposition of G was therefore calculated instead, but this caused backward errors 45 times larger than the tolerance for matrix pairs as small as 3x20. These observations match the numerical linear algebra theory: the LQ decomposition has a bounded norm-wise round-off error in every row, but there are no such bounds for individual columns, cf. Higham: "Accuracy and Stability of Numerical Algorithms" (2nd edition), 2002, Section 19.4 "Pivoting and Row-Wise Stability".

There are several possible solutions:
* An LQ factorization with row pivoting. There is no such algorithm in LAPACK, and porting the existing QR factorization with column pivoting in xGEQP3 will give horrible performance because xGEQP3 relies on matrix columns and elementary reflectors being laid out sequentially in memory.
* An in-place matrix transposition followed by a call to xGEQP3 followed by another in-place matrix transposition. In-place matrix transposition is not implemented in LAPACK; implementing this functionality is not trivial and requires a good understanding of modern computer architecture and number theory.
* A matrix transposition (not in place) followed by a call to xGEQP3 followed by another matrix transposition. This is possible with the extra memory available when X must be stored. This is the solution implemented in this commit.
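The reason a transposition turns xGEQP3 into an LQ factorization with row pivoting can be written in one line. xGEQP3 computes a factorization of the form A P = Q R with a permutation matrix P; applying it to the transpose of G and transposing the result gives a row-pivoted LQ factorization of G. The symbols Π, L, and Q̃ below are notation introduced here for illustration, not names used in the code.

```latex
G^{T} \Pi = Q R
\quad\Longrightarrow\quad
\Pi^{T} G = R^{T} Q^{T} = L \tilde{Q},
\qquad L := R^{T}, \quad \tilde{Q} := Q^{T}.
```

Here L is lower triangular (or lower trapezoidal) because R is upper triangular (or upper trapezoidal), and Π^T permutes the rows of G.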

The C++ code was used in the performance measurements for SGETRP.

Compute the LQ decomposition with row-pivoting immediately.

christoph-conrads added 30 commits April 20, 2021 15:43

DGGQRCS: change location of matrix R

04e3769

Tests: update extraction of xGGQRCS' matrix R

e36a12b

Tests: check for infinity in xGGQRCS' return values

6080251

DGGQRCS: overwrite unused memory with NaNs

190818e

Ease debugging

DGGQRCS: fix argument to DLACPY

e48f19c

This bug was introduced in commit 446ac9a2.

Tests: make existing xGGQRCS test more robust

aa4a813

Make the test slightly more robust by using non-diagonal test matrices.

Tests: more lenient xGGQRCS orthogonal matrix tests

44974a0

DGGQRCS: use correct leading dimension

455433f

Tests: add xGGQRCS test with random matrices

ffdb9c8

Tests: more lenient error bounds for xGGQRCS

2b6e345

DGGQRCS: fix triangular matrices copies again

54b89b8

This bug was discovered by the random test from commit dcd04cd. Minimal triggering example with m=1, n=2, p=1 with non-random entries: A = [1, 1], B = [1, 0].

DGGQRCS: more accurate comments

0e0189f

Tests: check optimal LWORK value of xGGQRCS

94e71e2

Add program comparing DGGQRCS, DGGSVD3

400330c

Tests: make Boost libraries dependency optional

143ebc9

Fix unused variable warnings in Release mode

42db830

Use BOOST_ASSERT instead of assert in tests

f92dd2c

Add single-precision GSVD via QR+CSD

4831af9

DGGQRCS: fix typos

4a65976

CGGQRCS: draft complex (2x32bit) GSVD via QR, CSD

5ea7dd6

CGGQRCS: draft tests

4aa1d8e

TEST: fix single/double precision code build

3934f6e

Fix a problem where double precision code was built when single precision was requested.

Test: fix Fortran ABI in C++ code

6a03cb5

Fix harmless out-of-bounds accesses for ASAN

3c79240

Fix harmless out-of-bounds accesses to avoid spurious address sanitizer (ASAN) failures.

Test: use BOOST_REQUIRE for inputs

b617a20

SGGQRCS: fix branch condition causing NaNs

eb28fc7

Test: fix a messed up check

69ece04

The code accidentally checked for infinity when finite numbers were expected.

Test: add checks in xGGQRCS test

6b98500

Test: replace deprecated header

8a28e44

CGGQRCS: fix an EXTERNAL statement

6002beb

christoph-conrads and others added 29 commits April 27, 2021 21:38

SGGQRCS: fix column permutation with high rank

75c4649

Fix the U2 column permutation when the pre-processing of the matrix B is enabled and rank(B) is larger than p - rank(B).

Test: adapt tests to new SGGQRCS API

f95d68f

SGGQRCS: handle case norm(B) = 0

2ccb240

Whenever the norm of B was zero, the calculated norm of G would be NaN. The new checks for NaN caught this problem immediately.

SGGQRCS: fix rank computation with pre-processing

3875c61

SGGQRCS: swap pre-processing hints if necessary

fce3cbd

Also swap the pre-processing hints when swapping the order of the input matrices.

SGGQRCS: enable matrix pre-processing

1da06ef

The matrix pre-processing passes all C++ tests in the branch christoph-conrads/qr+cs-gsvd now.

SGGQRCS: add debugging code

2961991

SGGQRCS: fix U2 column order as soon as possible

d444541

When pre-processing of B is enabled, fix the column order of U2 immediately after multiplying U2 with the orthogonal factor of the QR decomposition of B instead of performing other, unrelated computations in between.

Test: adjust xGGQRCS matrix size for pre-processing

6b67c30

* allow more matrices with more rows than columns for testing pre-processing
* halve maximum xGGQRCS matrix size

SGGQRCS: query SORMQR for optimal workspace size

abca431

SORMQR requires a workspace of size 4000+ for 1x1 input (bug?). Luckily, the call can be omitted.

SERRGG: adapt test to new SGGQRCS API

40b3e54

Test: increase xGGQRCS backward error tolerance

107f1ba

With the new pre-processing enabled, some matrices have a larger backward error than before. Note that the constant factor in the tolerance expression was doubled but the matrix dimension factor reduced.

SGGQRCS: update the documentation after API change

162d3e9

Test: show output every 60 seconds in xGGQRCS test

3a36a7a

Show the output every 60 seconds in the infinite xGGQRCS test unless an iteration takes more than 60 seconds.

SGGQRCS: avoid oversized workspaces

f7e328b

SGGQRCS: fix possibly oversized workspace

41b07d4

SGGQRCS: draft code reducing number of columns

22eb041

SGGQRCS: fix use of uninitialized variables

69ac078

Test: add tests for SGGQRCS column reduction

66fcbec

Both test problems were found by the random GSVD test.

SGGQRCS: shrink workspace when pre-processing G

f55fcd0

Test: add matrix transposition demo code

c99810f

The C++ code was used in the performance measurements for SGETRP.

SGGQRCS: modify pre-processing for G

38f3934

Compute the LQ decomposition with row-pivoting immediately.

Adapt tests for the new interface of xGGQRCS

edfaf44

weslleyspereira closed this Jul 29, 2021
