<< . .

. 17
( : 95)

. . >>

6 111111111111111
B 00000
5 111111111111111
A 000000000000000
00 000000000000000
11 000000000000000

FIGURE 3.8. Two steps of nested dissection. Graph partitioning (left) and matrix
reordering (right)

estimate (3.64) becomes

¤ nu 3 A + O(u2 ).
δA + 5n U (3.65)
∞ ∞ ∞

The bound for δA ∞ in (3.65) is of practical use only if it is possible to
provide an estimate for U ∞ . With this aim, backward analysis can be
carried out introducing the so-called growth factor
max|aij |
ρn = . (3.66)
max|aij |

Taking advantage of the fact that |uij | ¤ ρn max|aij |, the following result
due to Wilkinson can be drawn from (3.65),
¤ 8un3 ρn A + O(u2 ).
δA (3.67)
∞ ∞

The growth factor can be bounded by 2n’1 and, although in most of the
cases it is of the order of 10, there exist matrices for which the inequality in
(3.67) becomes an equality (see, for instance, Exercise 5). For some special
classes of matrices, a sharp bound for ρn can be found:
3.10 Accuracy of the Solution Achieved Using GEM 105

1. for banded matrices with upper and lower bands equal to p, ρn ¤
22p’1 ’ (p ’ 1)2p’2 . As a consequence, in the tridiagonal case one
gets ρn ¤ 2;
2. for Hessenberg matrices, ρn ¤ n;
3. for symmetric positive de¬nite matrices, ρn = 1;
4. for matrices strictly diagonally dominant by columns, ρn ¤ 2.
To achieve better stability when using GEM for arbitrary matrices, re-
sorting to complete pivoting would seem to be mandatory, since it ensures
that ρn ¤ n1/2 2 · 31/2 · . . . · n1/(n’1) . Indeed, this growth is slower
than 2 as n increases.
However, apart from very special instances, GEM with only partial piv-
oting exhibits acceptable growth factors. This make it the most commonly
employed method in the computational practice.

Example 3.7 Consider the linear system (3.2) with
µ 1 1+µ
A= , b= , (3.68)
1 0 1
which admits the exact solution x=1 for any value of µ. The matrix is well-
conditioned, having K∞ (A) = (1 + µ)2 . Attempting to solve the system for µ =
10’15 by the LU factorization with 16 signi¬cant digits, and using the Programs
5, 2 and 3, yields the solution x = [0.8881784197001253, 1.000000000000000]T ,
with an error greater than 11% on the ¬rst component. Some insight into the
causes of the inaccuracy of the computed solution can be drawn from (3.64).
Indeed this latter does not provide a uniformly small bound for all the entries of
matrix δA, rather
3.55 · 10’30 1.33 · 10’15
|δA| ¤ .
1.33 · 10’15 2.22

Notice that the entries of the corresponding matrices L and U are quite large in
module. Conversely, resorting to GEM with partial or complete pivoting yields

the exact solution of the system (see Exercise 6).

Let us now address the role of the condition number in the error analysis
for GEM. GEM yields a solution x that is typically characterized by having
a small residual r = b ’ Ax (see [GL89]). This feature, however, does not
ensure that the error x ’ x is small when K(A) 1 (see Example 3.8). In
fact, if δb in (3.11) is regarded as being the residual, then
x’x 1 r
¤ K(A) r ¤ K(A) .
x A x b
This result will be applied to devise methods, based on the a posteriori
analysis, for improving the accuracy of the solution of GEM (see Section
106 3. Direct Methods for the Solution of Linear Systems

Example 3.8 Consider the linear system Ax = b with
1 1.0001 1
A= , b= ,
1.0001 1 1

which admits the solution x = (0.499975 . . . , 0.499975 . . . )T . Assuming as an ap-
proximate solution the vector x = (’4.499775, 5.5002249)T , one ¬nds the residual
r (’0.001, 0)T , which is small although x is quite di¬erent from the exact so-
lution. The reason for this is due to the ill-conditioning of matrix A. Indeed in

this case K∞ (A) = 20001.

An estimate of the number of exact signi¬cant digits of a numerical
solution of a linear system can be given as follows. From (3.13), letting
γ = u and assuming that uK∞ (A) ¤ 1/2 we get
δx ∞ 2uK∞ (A)
¤ ¤ 4uK∞ (A).
1 ’ uK∞ (A)
As a consequence
x’x ∞
uK∞ (A). (3.69)
β ’t and K∞ (A) β m , one gets that the solution x
Assuming that u
computed by GEM will have at least t ’ m exact digits, t being the number
of digits available for the mantissa. In other words, the ill-conditioning of a
system depends both on the capability of the ¬‚oating-point arithmetic that
is being used and on the accuracy that is required in the solution.

3.11 An Approximate Computation of K(A)
Suppose that the linear system (3.2) has been solved by a factorization
method. To determine the accuracy of the computed solution, the analy-
sis carried out in Section 3.10 can be used if an estimate of the condition
number K(A) of A, which we denote by K(A), is available. Indeed, al-
though evaluating A can be an easy task if a suitable norm is chosen
(for instance, · 1 or · ∞ ), it is by no means reasonable (or compu-
tationally convenient) to compute A’1 if the only purpose is to evaluate
A’1 . For this reason, we describe in this section a procedure (proposed
in [CMSW79]) that approximates A’1 with a computational cost of the
order of n2 ¬‚ops.
The basic idea of the algorithm is as follows: ∀d ∈ Rn with d = 0, thanks
to the de¬nition of matrix norm, A’1 ≥ y / d = γ(d) with Ay = d.
Thus, we look for d in such a way that γ(d) is as large as possible and
assume the obtained value as an estimate of A’1 .
For the method to be e¬ective, the selection of d is crucial. To explain
how to do this, we start by assuming that the QR factorization of A has
3.11 An Approximate Computation of K(A) 107

been computed and that K2 (A) is to be approximated. In such an event,
since K2 (A) = K2 (R) due to Property 1.8, it su¬ces to estimate R’1 2
instead of A’1 2 . Considerations related to the SVD of R induce approx-
imating R’1 2 by the following algorithm:
compute the vectors x and y, solutions to the systems

RT x = d, Ry = x, (3.70)

then estimate R’1 2 by the ratio γ2 = y 2 / x 2 . The vector d appearing
in (3.70) should be determined in such a way that γ2 is as close as possible
to the value actually attained by R’1 2 . It can be shown that, except in
very special cases, γ2 provides for any choice of d a reasonable (although
not very accurate) estimate of R’1 2 (see Exercise 15). As a consequence,
a proper selection of d can encourage this natural trend.
Before going on, it is worth noting that computing K2 (R) is not an easy
matter even if an estimate of R’1 2 is available. Indeed, it would remain
to compute R 2 = ρ(RT R). To overcome this di¬culty, we consider
henceforth K1 (R) instead of K2 (R) since R 1 is easily computable. Then,
heuristics allows us to assume that the ratio γ1 = y 1 / x 1 is an estimate
of R’1 1 , exactly as γ2 is an estimate of R’1 2 .

Let us now deal with the choice of d. Since RT x = d, the generic compo-
nent xk of x can be formally related to x1 , . . . , xk’1 through the formulae
of forward substitution as

r11 x0 = d1 ,
rkk xk = dk ’ (r1k x1 + . . . + rk’1,k xk’1 ), k ≥ 1.

Assume that the components of d are of the form dk = ±θk , where θk
are random numbers and set arbitrarily d1 = θ1 . Then, x1 = θ1 /r11 is
completely determined, while x2 = (d2 ’ r12 x1 )/r22 depends on the sign of
d2 . We set the sign of d2 as the opposite of r12 x1 in such a way to make
x(1 : 2) 1 = |x1 | + |x2 |, for a ¬xed x1 , the largest possible. Once x2 is
known, we compute x3 following the same criterion, and so on, until xn .
This approach sets the sign of each component of d and yields a vector
x with a presumably large · 1 . However, it can fail since it is based on
the idea (which is in general not true) that maximizing x 1 can be done
by selecting at each step k in (3.71) the component xk which guarantees
the maximum increase of x(1 : k ’ 1) 1 (without accounting for the fact
that all the components are related).
Therefore, we need to modify the method by including a sort of “look-
ahead” strategy, which accounts for the way of choosing dk a¬ects all later
values xi , with i > k, still to be computed. Concerning this point, we notice
that for a generic row i of the system it is always possible to compute at
108 3. Direct Methods for the Solution of Linear Systems

step k the vector p(k’1) with components
i = 1, . . . , k ’ 1,
pi =0
pi = r1i x1 + . . . + rk’1,i xk’1 i = k, . . . , n.
Thus xk = (±θk ’ pk )/rkk . We denote the two possible values of xk by

xk and xk . The choice between them is now taken not only accounting for
which of the two most increases x(1 : k) 1 , but also evaluating the increase
of p(k) 1 . This second contribution accounts for the e¬ect of the choice of
dk on the components that are still to be computed. We can include both
criteria in a unique test. Denoting by
(k)+ (k)’
pi = 0, pi = 0, i = 1, . . . , k,
(k)+ (k)’
+ rki x’ , i = k + 1, . . . , n,
(k’1) (k’1)
+ rki x+ , pi
pi = pi = pi
k k
+ ’
the components of the vectors p(k) and p(k) respectively, we set each
k-th step dk = +θk or dk = ’θk according to whether |rkk x+ | + p(k) 1

is greater or less than |rkk x’ | + p(k) 1 .
Under this choice d is completely determined and the same holds for x.
Now, solving the system Ry = x, we are warranted that y 1 / x 1 is a reli-
able approximation to R’1 1 , so that we can set K1 (A) = R 1 y 1 / x 1 .

In practice the PA=LU factorization introduced in Section 3.5 is usually
available. Based on the previous considerations and on some heuristics, an
analogous procedure to that shown above can be conveniently employed
to approximate A’1 1 . Precisely, instead of systems (3.70), we must now
(LU)T x = d, LUy = x.
We set y 1 / x 1 as the approximation of A’1 1 and, consequently, we
de¬ne K1 (A). The strategy for selecting d can be the same as before;
indeed, solving (LU)T x = d amounts to solving
UT z = d, LT x = z, (3.72)
and thus, since UT is lower triangular, we can proceed as in the previous
case. A remarkable di¬erence concerns the computation of x. Indeed, while
the matrix RT in the second system of (3.70) has the same condition number
as R, the second system in (3.72) has a matrix LT which could be even more
ill-conditioned than UT . If this were the case, solving for x could lead to
an inaccurate outcome, thus making the whole process useless.
Fortunately, resorting to partial pivoting prevents this circumstance from
occurring, ensuring that any ill-condition in A is re¬‚ected in a correspond-
ing ill-condition in U. Moreover, picking θk randomly between 1/2 and 1
3.12 Improving the Accuracy of GEM 109

guarantees accurate results even in the special cases where L turns out to
be ill-conditioned.

The algorithm presented below is implemented in the LINPACK library
[BDMS79] and in the MATLAB function rcond. This function, in order
to avoid rounding errors, returns as output parameter the reciprocal of
K1 (A). A more accurate estimator, described in [Hig88], is implemented in
the MATLAB function condest.

Program 14 implements the approximate evaluation of K1 for a matrix
A of generic form. The input parameters are the size n of the matrix A, the
matrix A, the factors L, U of its PA=LU factorization and the vector theta
containing the random numbers θk , for k = 1, . . . , n.

Program 14 - cond est : Algorithm for the approximation of K1 (A)

function [k1] = cond est(n,A,L,U,theta)
for i=1:n, p(i)=0; end
for k=1:n
zplus=(theta(k)-p(k))/U(k,k); zminu=(-theta(k)-p(k))/U(k,k);
splus=abs(theta(k)-p(k)); sminu=abs(-theta(k)-p(k));
for i=(k+1):n
if splus >= sminu, z(k)=zplus; else, z(k)=zminu; end
for i=(k+1):n, p(i)=p(i)+U(k,i)*z(k); end
z = z™; x = backward col(L™,z);
w = forward col(L,x); y = backward col(U,w);

Example 3.9 Let us consider the Hilbert matrix H4 . Its condition number
K1 (H4 ), computed using the MATLAB function invhilb which returns the exact
inverse of H4 , is 2.8375 · 104 . Running Program 14 with theta=(1, 1, 1, 1)T gives
the reasonable estimate K1 (H4 ) = 2.1523 · 104 (which is the same as the output

of rcond), while the function condest returns the exact result.

3.12 Improving the Accuracy of GEM
As previously noted if the matrix of the system is ill-conditioned, the so-
lution generated by GEM could be inaccurate even though its residual is
small. In this section, we mention two techniques for improving the accu-
racy of the solution computed by GEM.
110 3. Direct Methods for the Solution of Linear Systems

3.12.1 Scaling
If the entries of A vary greatly in size, it is likely that during the elimination
process large entries are summed to small entries, with a consequent onset
of rounding errors. A remedy consists of performing a scaling of the matrix
A before the elimination is carried out.

Example 3.10 Consider again the matrix A of Remark 3.3. Multiplying it on
the right and on the left with matrix D=diag(0.0005, 1, 1), we obtain the scaled
® 
’0.0001 1 1
A = DAD = ° 0.78125 0 » .
˜ 1
1 0 0

Applying GEM to the scaled system A˜ = Db = (0.2, 1.3816, 1.9273)T , we get

the correct solution x = D˜.

Row scaling of A amounts to ¬nding a diagonal nonsingular matrix D1
such that the diagonal entries of D1 A are of the same size. The linear
system Ax = b transforms into

D1 Ax = D1 b.

When both rows and columns of A are to be scaled, the scaled version of
(3.2) becomes

with y = D’1 x,
(D1 AD2 )y = D1 b 2

having also assumed that D2 is invertible. Matrix D1 scales the equations
while D2 scales the unknowns. Notice that, to prevent rounding errors, the
scaling matrices are chosen in the form

D1 = diag(β r1 , . . . , β rn ), D2 = diag(β c1 , . . . , β cn ),

where β is the base of the used ¬‚oating-point arithmetic and the exponents
r1 , . . . , rn , c1 , . . . , cn must be determined. It can be shown that

D’1 (x ’ x) ∞
uK∞ (D1 AD2 ).
D’1 x ∞

Therefore, scaling will be e¬ective if K∞ (D1 AD2 ) is much less than K∞ (A).
Finding convenient matrices D1 and D2 is not in general an easy matter.
A strategy consists, for instance, of picking up D1 and D2 in such a way
that D1 AD2 ∞ and D1 AD2 1 belong to the interval [1/β, 1], where β is
the base of the used ¬‚oating-point arithmetic (see [McK62] for a detailed
analysis in the case of the Crout factorization).
3.12 Improving the Accuracy of GEM 111

Remark 3.7 (The Skeel condition number) The Skeel condition num-
ber, de¬ned as cond(A) = |A’1 | |A| ∞ , is the supremum over the set
x∈ Rn , with x = 0, of the numbers

|A’1 | |A| |x| ∞
cond(A, x) = .

Unlike what happens for K(A), cond(A,x) is invariant with respect to a
scaling by rows of A, that is, to transformations of A of the form DA, where
D is a nonsingular diagonal matrix. As a consequence, cond(A) provides a
sound indication of the ill-conditioning of a matrix, irrespectively of any
possible row diagonal scaling.

3.12.2 Iterative Re¬nement
Iterative re¬nement is a technique for improving the accuracy of a solution
yielded by a direct method. Suppose that the linear system (3.2) has been
solved by means of LU factorization (with partial or complete pivoting),
and denote by x(0) the computed solution. Having ¬xed an error tolerance,
toll, the iterative re¬nement performs as follows: for i = 0, 1, . . . , until

1. compute the residual r(i) = b ’ Ax(i) ;

2. solve the linear system Az = r(i) using the LU factorization of A;

3. update the solution setting x(i+1) = x(i) + z;

4. if z / x(i+1) < toll, then terminate the process returning the solu-
tion x(i+1) . Otherwise, the algorithm restarts at step 1.

In absence of rounding errors, the process would stop at the ¬rst step,
yielding the exact solution. The convergence properties of the method
can be improved by computing the residual r(i) in double precision, while
computing the other quantities in single precision. We call this procedure

<< . .

. 17
( : 95)

. . >>