[FIGURE 3.8. Two steps of nested dissection. Graph partitioning (left) and matrix reordering (right)]

estimate (3.64) becomes

$$\|\delta A\|_\infty \le nu\,\bigl(3\|A\|_\infty + 5n\|U\|_\infty\bigr) + O(u^2). \qquad (3.65)$$

The bound for $\|\delta A\|_\infty$ in (3.65) is of practical use only if it is possible to provide an estimate for $\|U\|_\infty$. With this aim, backward analysis can be carried out introducing the so-called growth factor

$$\rho_n = \frac{\max_{i,j,k} |a_{ij}^{(k)}|}{\max_{i,j} |a_{ij}|}. \qquad (3.66)$$

Taking advantage of the fact that $|u_{ij}| \le \rho_n \max_{i,j} |a_{ij}|$, the following result due to Wilkinson can be drawn from (3.65),

$$\|\delta A\|_\infty \le 8 u n^3 \rho_n \|A\|_\infty + O(u^2). \qquad (3.67)$$

The growth factor can be bounded by $2^{n-1}$ and, although in most cases it is of the order of 10, there exist matrices for which the inequality in (3.67) becomes an equality (see, for instance, Exercise 5). For some special classes of matrices, a sharp bound for $\rho_n$ can be found:

3.10 Accuracy of the Solution Achieved Using GEM 105

1. for banded matrices with upper and lower bands equal to $p$, $\rho_n \le 2^{2p-1} - (p-1)2^{p-2}$. As a consequence, in the tridiagonal case one gets $\rho_n \le 2$;

2. for Hessenberg matrices, $\rho_n \le n$;

3. for symmetric positive definite matrices, $\rho_n = 1$;

4. for matrices strictly diagonally dominant by columns, $\rho_n \le 2$.
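The growth factor lends itself to direct computation: run the elimination, track the largest intermediate entry, and compare with the bounds above. The following Python sketch (ours, not one of the book's Programs; the helper `growth_factor` and the test matrices are our choices) uses Wilkinson's classical worst-case matrix, for which $\rho_n = 2^{n-1}$ is attained even with partial pivoting:

```python
import numpy as np

def growth_factor(A):
    """rho_n for GE with partial pivoting: largest intermediate entry over largest entry of A."""
    A = A.astype(float).copy()
    n = A.shape[0]
    amax0 = np.abs(A).max()
    amax = amax0
    for k in range(n - 1):
        q = k + np.argmax(np.abs(A[k:, k]))   # partial pivoting
        A[[k, q]] = A[[q, k]]
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]
            A[i, k:] -= m * A[k, k:]
        amax = max(amax, np.abs(A).max())     # track the growth of the entries
    return amax / amax0

# Wilkinson's worst-case matrix: unit diagonal, -1 strictly below, +1 last column
n = 6
W = np.eye(n)
W[np.tril_indices(n, -1)] = -1.0
W[:, -1] = 1.0
print(growth_factor(W))   # attains the bound 2^(n-1) = 32
```

For the tridiagonal matrix with diagonal 2 and off-diagonals $-1$, the same routine confirms the sharp bound $\rho_n \le 2$ of item 1.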

To achieve better stability when using GEM for arbitrary matrices, resorting to complete pivoting would seem to be mandatory, since it ensures that $\rho_n \le n^{1/2}\bigl(2 \cdot 3^{1/2} \cdot \ldots \cdot n^{1/(n-1)}\bigr)^{1/2}$. Indeed, this growth is slower than $2^{n-1}$ as $n$ increases.

However, apart from very special instances, GEM with only partial pivoting exhibits acceptable growth factors. This makes it the most commonly employed method in computational practice.

Example 3.7 Consider the linear system (3.2) with

$$A = \begin{pmatrix} \mu & 1 \\ 1 & 0 \end{pmatrix}, \qquad b = \begin{pmatrix} 1+\mu \\ 1 \end{pmatrix}, \qquad (3.68)$$

which admits the exact solution $x = (1, 1)^T$ for any value of $\mu$. The matrix is well-conditioned, having $K_\infty(A) = (1+\mu)^2$. Attempting to solve the system for $\mu = 10^{-15}$ by the LU factorization with 16 significant digits, and using Programs 5, 2 and 3, yields the solution $\widehat{x} = [0.8881784197001253, 1.000000000000000]^T$, with an error greater than 11% on the first component. Some insight into the causes of the inaccuracy of the computed solution can be drawn from (3.64).

Indeed, the latter does not provide a uniformly small bound for all the entries of matrix $\delta A$; rather,

$$|\delta A| \le \begin{pmatrix} 3.55 \cdot 10^{-30} & 1.33 \cdot 10^{-15} \\ 1.33 \cdot 10^{-15} & 2.22 \end{pmatrix}.$$

Notice that the entries of the corresponding matrices L and U are quite large in modulus. Conversely, resorting to GEM with partial or complete pivoting yields the exact solution of the system (see Exercise 6). •
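The effect described in this example can be reproduced with a short Python sketch (ours, not one of the book's Programs): a hand-rolled LU without pivoting loses the first component, while a pivoted solver (NumPy's `solve`, which calls LAPACK's pivoted LU) recovers the exact solution.

```python
import numpy as np

def lu_no_pivot_solve(A, b):
    """Solve Ax = b by LU factorization *without* pivoting (for illustration only)."""
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    for k in range(n - 1):                 # elimination, no row exchanges
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]
            A[i, k + 1:] -= m * A[k, k + 1:]
            b[i] -= m * b[k]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):         # back substitution
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

mu = 1e-15
A = np.array([[mu, 1.0], [1.0, 0.0]])
b = np.array([1.0 + mu, 1.0])

x_nopiv = lu_no_pivot_solve(A, b)          # first component badly wrong
x_piv = np.linalg.solve(A, b)              # pivoted LU: essentially exact
print(x_nopiv, x_piv)
```

The tiny pivot $\mu$ produces the huge multiplier $1/\mu$, and the subtraction $1 - \mu^{-1}(1+\mu)$ destroys the information carried by $b$; pivoting simply avoids dividing by the small entry.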

Let us now address the role of the condition number in the error analysis for GEM. GEM yields a solution $\widehat{x}$ that is typically characterized by a small residual $r = b - A\widehat{x}$ (see [GL89]). This feature, however, does not ensure that the error $x - \widehat{x}$ is small when $K(A) \gg 1$ (see Example 3.8). In fact, if $\delta b$ in (3.11) is regarded as being the residual, then

$$\frac{\|x - \widehat{x}\|}{\|x\|} \le K(A)\,\frac{1}{\|A\|}\,\frac{\|r\|}{\|x\|} \le K(A)\,\frac{\|r\|}{\|b\|}.$$

This result will be applied to devise methods, based on the a posteriori

analysis, for improving the accuracy of the solution of GEM (see Section

3.12).

106 3. Direct Methods for the Solution of Linear Systems

Example 3.8 Consider the linear system Ax = b with

$$A = \begin{pmatrix} 1 & 1.0001 \\ 1.0001 & 1 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \\ 1 \end{pmatrix},$$

which admits the solution $x = (0.499975\ldots, 0.499975\ldots)^T$. Assuming as an approximate solution the vector $\widehat{x} = (-4.499775, 5.5002249)^T$, one finds the residual $r \simeq (-0.001, 0)^T$, which is small although $\widehat{x}$ is quite different from the exact solution. The reason for this is due to the ill-conditioning of matrix A. Indeed, in this case $K_\infty(A) = 20001$. •
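A few lines of Python (an illustrative sketch of ours, not from the book) confirm the numbers of the example: the residual is of order $10^{-3}$ while the error on $\widehat{x}$ is of order 5, consistently with $K_\infty(A) = 20001$.

```python
import numpy as np

A = np.array([[1.0, 1.0001],
              [1.0001, 1.0]])
b = np.array([1.0, 1.0])

x_exact = np.linalg.solve(A, b)                  # ~ (0.499975, 0.499975)
x_hat = np.array([-4.499775, 5.5002249])         # the candidate of the example

r = b - A @ x_hat
res_norm = np.linalg.norm(r, np.inf)             # small residual
err = np.linalg.norm(x_hat - x_exact, np.inf)    # large error
k_inf = np.linalg.cond(A, np.inf)                # ~ 20001
print(res_norm, err, k_inf)
```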

An estimate of the number of exact significant digits of a numerical solution of a linear system can be given as follows. From (3.13), letting $\gamma = u$ and assuming that $u K_\infty(A) \le 1/2$, we get

$$\frac{\|\delta x\|_\infty}{\|x\|_\infty} \le \frac{2 u K_\infty(A)}{1 - u K_\infty(A)} \le 4\, u K_\infty(A).$$

As a consequence,

$$\frac{\|x - \widehat{x}\|_\infty}{\|x\|_\infty} \lesssim u\, K_\infty(A). \qquad (3.69)$$

Assuming that $u \simeq \beta^{-t}$ and $K_\infty(A) \simeq \beta^{m}$, one gets that the solution $\widehat{x}$ computed by GEM will have at least $t - m$ exact digits, $t$ being the number of digits available for the mantissa. In other words, the ill-conditioning of a system depends both on the capability of the floating-point arithmetic that is being used and on the accuracy that is required in the solution.
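The $t - m$ rule of thumb can be checked numerically. The sketch below (ours; the Hilbert matrix is our choice of ill-conditioned example) solves a system whose matrix has $K_\infty$ of order $10^{10}$: with roughly 16 decimal digits available in double precision, one expects about 6 exact digits in the computed solution.

```python
import numpy as np

n = 8
H = 1.0 / (np.arange(n)[:, None] + np.arange(n)[None, :] + 1)   # Hilbert matrix
x = np.ones(n)
b = H @ x

x_hat = np.linalg.solve(H, b)               # GEM with partial pivoting (LAPACK)

u = np.finfo(float).eps / 2                 # roundoff unit, ~1.1e-16 (t ~ 16)
K = np.linalg.cond(H, np.inf)               # ~ 1e10, i.e. m ~ 10
rel_err = np.linalg.norm(x - x_hat, np.inf) / np.linalg.norm(x, np.inf)
print(f"u*K ~ {u * K:.1e}, relative error ~ {rel_err:.1e}")
# roughly t - m ~ 6 exact decimal digits survive
```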

3.11 An Approximate Computation of K(A)

Suppose that the linear system (3.2) has been solved by a factorization method. To determine the accuracy of the computed solution, the analysis carried out in Section 3.10 can be used if an estimate of the condition number $K(A)$ of A, which we denote by $\widehat{K}(A)$, is available. Indeed, although evaluating $\|A\|$ can be an easy task if a suitable norm is chosen (for instance, $\|\cdot\|_1$ or $\|\cdot\|_\infty$), it is by no means reasonable (or computationally convenient) to compute $A^{-1}$ if the only purpose is to evaluate $\|A^{-1}\|$. For this reason, we describe in this section a procedure (proposed in [CMSW79]) that approximates $\|A^{-1}\|$ with a computational cost of the order of $n^2$ flops.

The basic idea of the algorithm is as follows: for any $d \in \mathbb{R}^n$ with $d \neq 0$, thanks to the definition of matrix norm, $\|A^{-1}\| \ge \|y\| / \|d\| = \gamma(d)$ with $Ay = d$. Thus, we look for d in such a way that $\gamma(d)$ is as large as possible and assume the obtained value as an estimate of $\|A^{-1}\|$.
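The inequality $\|A^{-1}\| \ge \|y\|/\|d\|$ holds for every nonzero $d$, as the following Python sketch illustrates (ours; the seeded random matrix and right-hand side are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
d = rng.standard_normal(8)                  # any nonzero d gives a lower bound

y = np.linalg.solve(A, d)                   # A y = d
gamma = np.linalg.norm(y, 1) / np.linalg.norm(d, 1)
inv_norm = np.linalg.norm(np.linalg.inv(A), 1)
print(gamma, inv_norm)                      # gamma <= ||A^{-1}||_1
```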

For the method to be effective, the selection of d is crucial. To explain how to do this, we start by assuming that the QR factorization of A has been computed and that $K_2(A)$ is to be approximated. In such an event, since $K_2(A) = K_2(R)$ due to Property 1.8, it suffices to estimate $\|R^{-1}\|_2$ instead of $\|A^{-1}\|_2$. Considerations related to the SVD of R induce approximating $\|R^{-1}\|_2$ by the following algorithm:

compute the vectors x and y, solutions to the systems

$$R^T x = d, \qquad R y = x, \qquad (3.70)$$

then estimate $\|R^{-1}\|_2$ by the ratio $\gamma_2 = \|y\|_2 / \|x\|_2$. The vector d appearing in (3.70) should be determined in such a way that $\gamma_2$ is as close as possible to the value actually attained by $\|R^{-1}\|_2$. It can be shown that, except in very special cases, $\gamma_2$ provides for any choice of d a reasonable (although not very accurate) estimate of $\|R^{-1}\|_2$ (see Exercise 15). As a consequence, a proper selection of d can encourage this natural trend.
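The two triangular solves in (3.70) are easy to try out. The sketch below (ours; the random matrix and d are arbitrary) computes $\gamma_2$ and compares it with the true $\|R^{-1}\|_2$: by construction $\gamma_2$ never exceeds the true value, and for a generic d it typically lands within a modest factor of it.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 10))
_, R = np.linalg.qr(A)                      # only R is needed here

d = rng.standard_normal(10)
x = np.linalg.solve(R.T, d)                 # R^T x = d
y = np.linalg.solve(R, x)                   # R y = x
gamma2 = np.linalg.norm(y, 2) / np.linalg.norm(x, 2)

true_val = np.linalg.norm(np.linalg.inv(R), 2)
print(gamma2, true_val)                     # gamma2 <= ||R^{-1}||_2
```

The pair of solves applies $R^{-1}R^{-T}$ to d, which amplifies the component of d along the dominant singular direction of $R^{-1}$; this is why even a random d tends to give a usable estimate.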

Before going on, it is worth noting that computing $K_2(R)$ is not an easy matter even if an estimate of $\|R^{-1}\|_2$ is available. Indeed, it would remain to compute $\|R\|_2 = \sqrt{\rho(R^T R)}$. To overcome this difficulty, we consider henceforth $K_1(R)$ instead of $K_2(R)$, since $\|R\|_1$ is easily computable. Then, heuristics allows us to assume that the ratio $\gamma_1 = \|y\|_1 / \|x\|_1$ is an estimate of $\|R^{-1}\|_1$, exactly as $\gamma_2$ is an estimate of $\|R^{-1}\|_2$.

Let us now deal with the choice of d. Since $R^T x = d$, the generic component $x_k$ of x can be formally related to $x_1, \ldots, x_{k-1}$ through the formulae of forward substitution as

$$r_{11} x_1 = d_1,$$
$$r_{kk} x_k = d_k - (r_{1k} x_1 + \ldots + r_{k-1,k} x_{k-1}), \quad k \ge 2. \qquad (3.71)$$

Assume that the components of d are of the form $d_k = \pm\theta_k$, where $\theta_k$ are random numbers, and set arbitrarily $d_1 = \theta_1$. Then, $x_1 = \theta_1 / r_{11}$ is completely determined, while $x_2 = (d_2 - r_{12} x_1)/r_{22}$ depends on the sign of $d_2$. We set the sign of $d_2$ as the opposite of that of $r_{12} x_1$, in such a way to make $\|x(1{:}2)\|_1 = |x_1| + |x_2|$, for a fixed $x_1$, the largest possible. Once $x_2$ is known, we compute $x_3$ following the same criterion, and so on, until $x_n$.

This approach sets the sign of each component of d and yields a vector x with a presumably large $\|\cdot\|_1$. However, it can fail, since it is based on the idea (which is in general not true) that maximizing $\|x\|_1$ can be done by selecting at each step k in (3.71) the component $x_k$ which guarantees the maximum increase of $\|x(1{:}k-1)\|_1$ (without accounting for the fact that all the components are related).

Therefore, we need to modify the method by including a sort of "look-ahead" strategy, which accounts for the way the choice of $d_k$ affects all later values $x_i$, with $i > k$, still to be computed. Concerning this point, we notice that for a generic row i of the system it is always possible to compute at step k the vector $p^{(k-1)}$ with components

$$p_i^{(k-1)} = 0, \qquad i = 1, \ldots, k-1,$$
$$p_i^{(k-1)} = r_{1i} x_1 + \ldots + r_{k-1,i} x_{k-1}, \qquad i = k, \ldots, n.$$

Thus $x_k = (\pm\theta_k - p_k^{(k-1)})/r_{kk}$. We denote the two possible values of $x_k$ by $x_k^+$ and $x_k^-$. The choice between them is now taken not only accounting for

which of the two most increases $\|x(1{:}k)\|_1$, but also evaluating the increase of $\|p^{(k)}\|_1$. This second contribution accounts for the effect of the choice of $d_k$ on the components that are still to be computed. We can include both criteria in a unique test. Denoting by

$$p_i^{(k)+} = 0, \quad p_i^{(k)-} = 0, \qquad i = 1, \ldots, k,$$
$$p_i^{(k)+} = p_i^{(k-1)} + r_{ki} x_k^+, \quad p_i^{(k)-} = p_i^{(k-1)} + r_{ki} x_k^-, \qquad i = k+1, \ldots, n,$$

the components of the vectors $p^{(k)+}$ and $p^{(k)-}$ respectively, we set at each k-th step $d_k = +\theta_k$ or $d_k = -\theta_k$ according to whether $|r_{kk} x_k^+| + \|p^{(k)+}\|_1$ is greater or less than $|r_{kk} x_k^-| + \|p^{(k)-}\|_1$.

Under this choice, d is completely determined and the same holds for x. Now, solving the system Ry = x, we are warranted that $\|y\|_1 / \|x\|_1$ is a reliable approximation to $\|R^{-1}\|_1$, so that we can set $\widehat{K}_1(A) = \|R\|_1 \|y\|_1 / \|x\|_1$.

In practice, the PA=LU factorization introduced in Section 3.5 is usually available. Based on the previous considerations and on some heuristics, an analogous procedure to that shown above can be conveniently employed to approximate $\|A^{-1}\|_1$. Precisely, instead of the systems (3.70), we must now solve

$$(LU)^T x = d, \qquad LU y = x.$$

We set $\|y\|_1 / \|x\|_1$ as the approximation of $\|A^{-1}\|_1$ and, consequently, we define $\widehat{K}_1(A)$. The strategy for selecting d can be the same as before; indeed, solving $(LU)^T x = d$ amounts to solving

$$U^T z = d, \qquad L^T x = z, \qquad (3.72)$$

and thus, since $U^T$ is lower triangular, we can proceed as in the previous

case. A remarkable difference concerns the computation of x. Indeed, while the matrices $R^T$ and $R$ of the two systems in (3.70) have the same condition number, the second system in (3.72) has a matrix $L^T$ which could be even more ill-conditioned than $U^T$. If this were the case, solving for x could lead to an inaccurate outcome, thus making the whole process useless. Fortunately, resorting to partial pivoting prevents this circumstance from occurring, ensuring that any ill-conditioning in A is reflected in a corresponding ill-conditioning in U. Moreover, picking $\theta_k$ randomly between 1/2 and 1


guarantees accurate results even in the special cases where L turns out to

be ill-conditioned.

The algorithm presented below is implemented in the LINPACK library

[BDMS79] and in the MATLAB function rcond. This function, in order

to avoid rounding errors, returns as output parameter the reciprocal of

K1 (A). A more accurate estimator, described in [Hig88], is implemented in

the MATLAB function condest.

Program 14 implements the approximate evaluation of K1 for a matrix

A of generic form. The input parameters are the size n of the matrix A, the

matrix A, the factors L, U of its PA=LU factorization and the vector theta

containing the random numbers θk , for k = 1, . . . , n.

Program 14 - cond_est : Algorithm for the approximation of $K_1(A)$

function [k1] = cond_est(n,A,L,U,theta)
for i=1:n, p(i)=0; end
for k=1:n
  zplus=(theta(k)-p(k))/U(k,k); zminu=(-theta(k)-p(k))/U(k,k);
  splus=abs(theta(k)-p(k)); sminu=abs(-theta(k)-p(k));
  for i=(k+1):n
    splus=splus+abs(p(i)+U(k,i)*zplus);
    sminu=sminu+abs(p(i)+U(k,i)*zminu);
  end
  if splus >= sminu, z(k)=zplus; else, z(k)=zminu; end
  for i=(k+1):n, p(i)=p(i)+U(k,i)*z(k); end
end
z = z'; x = backward_col(L',z);
w = forward_col(L,x); y = backward_col(U,w);
k1 = norm(A,1)*norm(y,1)/norm(x,1);

Example 3.9 Let us consider the Hilbert matrix $H_4$. Its condition number $K_1(H_4)$, computed using the MATLAB function invhilb which returns the exact inverse of $H_4$, is $2.8375 \cdot 10^4$. Running Program 14 with theta $= (1, 1, 1, 1)^T$ gives the reasonable estimate $\widehat{K}_1(H_4) = 2.1523 \cdot 10^4$ (which is the same as the output of rcond), while the function condest returns the exact result. •
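The same algorithm is easy to port for experimentation. The Python sketch below is our translation, not from the book: `lu_pp` and the triangular solves via `np.linalg.solve` are our stand-ins for the book's Programs, following the PA = LU convention of Section 3.5 (row permutations leave 1-norms unchanged, so A can be used directly in the final norm).

```python
import numpy as np

def lu_pp(A):
    """LU factorization with partial pivoting: returns P, L, U with P A = L U."""
    U = A.astype(float).copy()
    n = U.shape[0]
    P = np.eye(n)
    L = np.eye(n)
    for k in range(n - 1):
        q = k + np.argmax(np.abs(U[k:, k]))
        U[[k, q]] = U[[q, k]]
        P[[k, q]] = P[[q, k]]
        L[[k, q], :k] = L[[q, k], :k]
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]
            U[i, k:] -= L[i, k] * U[k, k:]
    return P, L, np.triu(U)

def cond_est(A, L, U, theta):
    """1-norm condition estimate with the look-ahead selection of d (cf. Program 14)."""
    n = A.shape[0]
    p = np.zeros(n)
    z = np.zeros(n)
    for k in range(n):                      # solve U^T z = d, choosing d = +-theta
        zplus = (theta[k] - p[k]) / U[k, k]
        zminu = (-theta[k] - p[k]) / U[k, k]
        splus = abs(theta[k] - p[k])
        sminu = abs(-theta[k] - p[k])
        for i in range(k + 1, n):           # look-ahead contribution
            splus += abs(p[i] + U[k, i] * zplus)
            sminu += abs(p[i] + U[k, i] * zminu)
        z[k] = zplus if splus >= sminu else zminu
        p[k + 1:] += U[k, k + 1:] * z[k]
    x = np.linalg.solve(L.T, z)             # L^T x = z completes (LU)^T x = d
    w = np.linalg.solve(L, x)               # then LU y = x in two sweeps
    y = np.linalg.solve(U, w)
    return np.linalg.norm(A, 1) * np.linalg.norm(y, 1) / np.linalg.norm(x, 1)

n = 4
H = 1.0 / (np.arange(n)[:, None] + np.arange(n)[None, :] + 1)   # Hilbert H4
P, L, U = lu_pp(H)
k1 = cond_est(H, L, U, np.ones(n))
print(k1)   # a lower estimate of the true K1(H4) = 2.8375e4
```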

3.12 Improving the Accuracy of GEM

As previously noted, if the matrix of the system is ill-conditioned, the solution generated by GEM could be inaccurate even though its residual is small. In this section, we mention two techniques for improving the accuracy of the solution computed by GEM.


3.12.1 Scaling

If the entries of A vary greatly in size, it is likely that during the elimination

process large entries are summed to small entries, with a consequent onset

of rounding errors. A remedy consists of performing a scaling of the matrix

A before the elimination is carried out.

Example 3.10 Consider again the matrix A of Remark 3.3. Multiplying it on the right and on the left with the matrix D = diag(0.0005, 1, 1), we obtain the scaled matrix

$$\tilde{A} = DAD = \begin{pmatrix} -0.0001 & 1 & 1 \\ 1 & 0.78125 & 0 \\ 1 & 0 & 0 \end{pmatrix}.$$

Applying GEM to the scaled system $\tilde{A}\tilde{x} = Db = (0.2, 1.3816, 1.9273)^T$, we get the correct solution $x = D\tilde{x}$. •

Row scaling of A amounts to finding a diagonal nonsingular matrix $D_1$ such that the diagonal entries of $D_1 A$ are of the same size. The linear system Ax = b transforms into

$$D_1 A x = D_1 b.$$

When both rows and columns of A are to be scaled, the scaled version of (3.2) becomes

$$(D_1 A D_2)\, y = D_1 b \quad \text{with} \quad y = D_2^{-1} x,$$

having also assumed that $D_2$ is invertible. Matrix $D_1$ scales the equations while $D_2$ scales the unknowns. Notice that, to prevent rounding errors, the scaling matrices are chosen in the form

$$D_1 = \mathrm{diag}(\beta^{r_1}, \ldots, \beta^{r_n}), \qquad D_2 = \mathrm{diag}(\beta^{c_1}, \ldots, \beta^{c_n}),$$

where $\beta$ is the base of the floating-point arithmetic being used and the exponents $r_1, \ldots, r_n, c_1, \ldots, c_n$ must be determined. It can be shown that

$$\frac{\|D_2^{-1}(x - \widehat{x})\|_\infty}{\|D_2^{-1} x\|_\infty} \lesssim u\, K_\infty(D_1 A D_2).$$

Therefore, scaling will be effective if $K_\infty(D_1 A D_2)$ is much less than $K_\infty(A)$. Finding convenient matrices $D_1$ and $D_2$ is not in general an easy matter. A strategy consists, for instance, of picking $D_1$ and $D_2$ in such a way that $\|D_1 A D_2\|_\infty$ and $\|D_1 A D_2\|_1$ belong to the interval $[1/\beta, 1]$, where $\beta$ is the base of the floating-point arithmetic being used (see [McK62] for a detailed analysis in the case of the Crout factorization).
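The effect of such a scaling is easy to observe. The following Python sketch (ours; the badly scaled matrix is an arbitrary example) rescales each row by a power of the base $\beta = 2$, so that the scaling itself introduces no rounding error:

```python
import numpy as np

A = np.array([[2e-10, 1e-10],
              [1.0,   2.0  ]])              # rows of wildly different size

# D1 rescales each row to roughly unit infinity-norm using powers of 2
scales = 2.0 ** np.floor(np.log2(np.abs(A).max(axis=1)))
D1 = np.diag(1.0 / scales)

cond_before = np.linalg.cond(A, np.inf)
cond_after = np.linalg.cond(D1 @ A, np.inf)
print(cond_before, cond_after)              # huge vs. modest
```

Here $K_\infty(D_1 A)$ is smaller than $K_\infty(A)$ by many orders of magnitude, so the scaled system can be solved far more accurately.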


Remark 3.7 (The Skeel condition number) The Skeel condition number, defined as $\mathrm{cond}(A) = \|\,|A^{-1}|\,|A|\,\|_\infty$, is the supremum over the set of $x \in \mathbb{R}^n$, with $x \neq 0$, of the numbers

$$\mathrm{cond}(A, x) = \frac{\|\,|A^{-1}|\,|A|\,|x|\,\|_\infty}{\|x\|_\infty}.$$

Unlike what happens for K(A), cond(A, x) is invariant with respect to a scaling by rows of A, that is, to transformations of A of the form DA, where D is a nonsingular diagonal matrix. As a consequence, cond(A) provides a sound indication of the ill-conditioning of a matrix, irrespective of any possible row diagonal scaling.
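The row-scaling invariance is easy to verify numerically; a Python sketch (ours, with an arbitrary test matrix and scaling):

```python
import numpy as np

def skeel_cond(A):
    """Skeel condition number cond(A) = || |A^{-1}| |A| ||_inf."""
    return np.linalg.norm(np.abs(np.linalg.inv(A)) @ np.abs(A), np.inf)

A = np.array([[2e-10, 1e-10],
              [1.0,   2.0  ]])
D = np.diag([1e10, 1.0])                    # a row scaling D A

s_A, s_DA = skeel_cond(A), skeel_cond(D @ A)
k_A = np.linalg.cond(A, np.inf)
k_DA = np.linalg.cond(D @ A, np.inf)
print(s_A, s_DA)    # equal: cond is invariant under row scaling
print(k_A, k_DA)    # K_inf changes by many orders of magnitude
```

The invariance follows from $|(DA)^{-1}|\,|DA| = |A^{-1}|\,|D^{-1}|\,|D|\,|A| = |A^{-1}|\,|A|$ for diagonal D.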

3.12.2 Iterative Refinement

Iterative refinement is a technique for improving the accuracy of a solution yielded by a direct method. Suppose that the linear system (3.2) has been solved by means of LU factorization (with partial or complete pivoting), and denote by $x^{(0)}$ the computed solution. Having fixed an error tolerance, toll, the iterative refinement performs as follows: for $i = 0, 1, \ldots$, until convergence:

1. compute the residual $r^{(i)} = b - A x^{(i)}$;

2. solve the linear system $A z = r^{(i)}$ using the LU factorization of A;

3. update the solution setting $x^{(i+1)} = x^{(i)} + z$;

4. if $\|z\| / \|x^{(i+1)}\| <$ toll, then terminate the process returning the solution $x^{(i+1)}$. Otherwise, the algorithm restarts at step 1.
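The four steps above can be sketched in Python (an illustration of ours, not one of the book's Programs). To make the gain visible, the factorization step is carried out in float32 while the residual is accumulated in float64, mimicking a lower-precision solver refined with a higher-precision residual; for brevity the sketch calls a dense solver at each step, whereas a real implementation would reuse the LU factors of A.

```python
import numpy as np

n = 5
A = 1.0 / (np.arange(n)[:, None] + np.arange(n)[None, :] + 1)   # Hilbert matrix
x_exact = np.ones(n)
b = A @ x_exact

# "working precision" solver: factorize and solve in float32
solve32 = lambda r: np.linalg.solve(A.astype(np.float32),
                                    r.astype(np.float32)).astype(np.float64)

x = solve32(b)                                   # initial solution x^(0)
err0 = np.linalg.norm(x - x_exact, np.inf)
for _ in range(10):
    r = b - A @ x                                # 1. residual in double precision
    z = solve32(r)                               # 2. solve A z = r
    x = x + z                                    # 3. update the solution
    if np.linalg.norm(z, np.inf) < 1e-13 * np.linalg.norm(x, np.inf):
        break                                    # 4. stopping test
err = np.linalg.norm(x - x_exact, np.inf)
print(err0, err)                                 # the error drops sharply
```

Each sweep contracts the error roughly by a factor $u K_\infty(A)$ of the low-precision solver, so the iteration converges as long as that product is below one.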

In the absence of rounding errors, the process would stop at the first step, yielding the exact solution. The convergence properties of the method can be improved by computing the residual $r^{(i)}$ in double precision, while computing the other quantities in single precision. We call this procedure