K₂(A) = λ_max/λ_min,    (3.5)

where λ_max and λ_min are the maximum and minimum eigenvalues of A.

To check (3.5), notice that

ρ(AᵀA) = ρ(A²) = λ_max², so that ‖A‖₂ = λ_max.

Moreover, since λ(A⁻¹) = 1/λ(A), one gets ‖A⁻¹‖₂ = 1/λ_min, from which
(3.5) follows. For that reason, K₂(A) is called the spectral condition number.

Remark 3.1 Define the relative distance of A ∈ ℂⁿˣⁿ from the set of
singular matrices with respect to the p-norm by

dist_p(A) = min{ ‖δA‖_p/‖A‖_p : A + δA is singular }.

It can then be shown that ([Kah66], [Gas83])

dist_p(A) = 1/K_p(A).    (3.6)

Equation (3.6) suggests that a matrix A with a high condition number

can behave like a singular matrix of the form A+δA. In other words, null

60 3. Direct Methods for the Solution of Linear Systems

perturbations in the right-hand side do not necessarily yield nonvanishing
changes in the solution since, if A+δA is singular, the homogeneous system
(A + δA)z = 0 no longer admits only the null solution. From (3.6) it
also follows that A+δA is nonsingular if

‖δA‖_p ‖A⁻¹‖_p < 1.    (3.7)

Relation (3.6) seems to suggest that a natural candidate for measuring
the ill-conditioning of a matrix is its determinant, since from (3.3) one is
prompted to conclude that small determinants mean nearly singular matrices.
However, this conclusion is wrong, as there exist examples of matrices
with small (respectively, high) determinants and small (respectively, high)
condition numbers (see Exercise 2).
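The point is easy to check directly. The following sketch (plain Python, with hypothetical 2×2 diagonal matrices handled by explicit formulas; eps is chosen as a power of two so the arithmetic below is exact) exhibits a perfectly conditioned matrix with a tiny determinant next to a badly conditioned one with a huge determinant:

```python
# The determinant does not measure conditioning (illustrative sketch;
# the matrices are hypothetical 2x2 diagonal examples, and eps is a power
# of two so that all the arithmetic below is exact).

def cond_inf_diag(d1, d2):
    """K_inf of diag(d1, d2): ||A||_inf * ||A^-1||_inf."""
    return max(abs(d1), abs(d2)) * max(1.0 / abs(d1), 1.0 / abs(d2))

eps = 2.0 ** -26

# eps * I: tiny determinant, yet perfectly conditioned (K = 1)
det_small, k_small = eps * eps, cond_inf_diag(eps, eps)

# diag(1/eps, 1): huge determinant, yet badly conditioned (K = 1/eps)
det_big, k_big = 1.0 / eps, cond_inf_diag(1.0 / eps, 1.0)
```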

3.1.2 Forward a priori Analysis

In this section we introduce a measure of the sensitivity of the system to
changes in the data. These changes will be interpreted in Section 3.10 as
being the effects of rounding errors induced by the numerical method used
to solve the system. For a more comprehensive analysis of the subject we
refer to [Dat95], [GL89], [Ste73] and [Var62].

Due to rounding errors, a numerical method for solving (3.2) does not
provide the exact solution but only an approximate one, which satisfies a
perturbed system. In other words, a numerical method yields an (exact)
solution x + δx of the perturbed system

(A + δA)(x + δx) = b + δb.    (3.8)

The next result provides an estimate of δx in terms of δA and δb.

Theorem 3.1 Let A ∈ ℝⁿˣⁿ be a nonsingular matrix and δA ∈ ℝⁿˣⁿ be
such that (3.7) is satisfied for a matrix norm ‖·‖. Then, if x ∈ ℝⁿ is the
solution of Ax=b with b ∈ ℝⁿ (b ≠ 0) and δx ∈ ℝⁿ satisfies (3.8) for
δb ∈ ℝⁿ,

‖δx‖/‖x‖ ≤ K(A)/(1 − K(A)‖δA‖/‖A‖) (‖δA‖/‖A‖ + ‖δb‖/‖b‖).    (3.9)

Proof. From (3.7) it follows that the matrix A⁻¹δA has norm less than 1. Then,
due to Theorem 1.5, I + A⁻¹δA is invertible and from (1.26) it follows that

‖(I + A⁻¹δA)⁻¹‖ ≤ 1/(1 − ‖A⁻¹δA‖) ≤ 1/(1 − ‖A⁻¹‖ ‖δA‖).    (3.10)

On the other hand, solving for δx in (3.8) and recalling that Ax = b, one gets

δx = (I + A⁻¹δA)⁻¹ A⁻¹ (δb − δA x),

3.1 Stability Analysis of Linear Systems 61

from which, passing to the norms and using (3.10), it follows that

‖δx‖ ≤ ‖A⁻¹‖/(1 − ‖A⁻¹‖ ‖δA‖) (‖δb‖ + ‖δA‖ ‖x‖).

Finally, dividing both sides by ‖x‖ (which is nonzero since b ≠ 0 and A is
nonsingular) and noticing that ‖x‖ ≥ ‖b‖/‖A‖, the result follows. ∎

Well-conditioning alone is not enough to yield an accurate solution of the

linear system. It is indeed crucial, as pointed out in Chapter 2, to resort to

stable algorithms. Conversely, ill-conditioning does not necessarily exclude
that, for particular choices of the right-hand side b, the overall conditioning
of the system is good (see Exercise 4).

A particular case of Theorem 3.1 is the following.

Theorem 3.2 Assume that the conditions of Theorem 3.1 hold and let
δA = 0. Then

(1/K(A)) ‖δb‖/‖b‖ ≤ ‖δx‖/‖x‖ ≤ K(A) ‖δb‖/‖b‖.    (3.11)

Proof. We will prove only the first inequality since the second one directly
follows from (3.9). The relation δx = A⁻¹δb yields ‖δb‖ ≤ ‖A‖ ‖δx‖. Multiplying
both sides by ‖x‖ and recalling that ‖x‖ ≤ ‖A⁻¹‖ ‖b‖, it follows that ‖x‖ ‖δb‖ ≤
K(A) ‖b‖ ‖δx‖, which is the desired inequality. ∎
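The two-sided estimate (3.11) is easy to observe numerically. The following Python sketch uses a hypothetical 2×2 system in the ∞-norm (the matrix, right-hand side and perturbation are arbitrary choices, not taken from the text):

```python
# Checking (3.11) numerically in the infinity norm (illustrative sketch;
# the 2x2 data below are arbitrary choices, not from the text).

def matvec(A, v):
    return [A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]]

def norm_inf_vec(v):
    return max(abs(v[0]), abs(v[1]))

def norm_inf_mat(A):
    return max(abs(A[0][0]) + abs(A[0][1]), abs(A[1][0]) + abs(A[1][1]))

A = [[1.0, 2.0], [3.0, 4.0]]
Ainv = [[-2.0, 1.0], [1.5, -0.5]]          # exact inverse, det(A) = -2
K = norm_inf_mat(A) * norm_inf_mat(Ainv)   # K_inf(A) = 7 * 3 = 21

b = [1.0, 1.0]
db = [0.01, 0.0]                           # perturbation of the data, dA = 0
x = matvec(Ainv, b)                        # x  = A^-1 b
dx = matvec(Ainv, db)                      # dx = A^-1 db

rel_x = norm_inf_vec(dx) / norm_inf_vec(x)
rel_b = norm_inf_vec(db) / norm_inf_vec(b)

lower = rel_b / K                          # left-hand side of (3.11)
upper = K * rel_b                          # right-hand side of (3.11)
```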

In order to employ the inequalities (3.9) and (3.11) in the analysis of
propagation of rounding errors in the case of direct methods, ‖δA‖ and
‖δb‖ should be bounded in terms of the dimension of the system and of
the characteristics of the floating-point arithmetic that is being used.
It is indeed reasonable to expect that the perturbations induced by a
method for solving a linear system are such that ‖δA‖ ≤ γ‖A‖ and ‖δb‖ ≤
γ‖b‖, γ being a positive number that depends on the roundoff unit u (for
example, we shall assume henceforth that γ = β^(1−t), where β is the base
and t is the number of digits of the mantissa of the floating-point system
F). In such a case (3.9) can be completed by the following theorem.

Theorem 3.3 Assume that ‖δA‖ ≤ γ‖A‖, ‖δb‖ ≤ γ‖b‖ with γ ∈ ℝ⁺ and
δA ∈ ℝⁿˣⁿ, δb ∈ ℝⁿ. Then, if γK(A) < 1 the following inequalities hold

‖x + δx‖/‖x‖ ≤ (1 + γK(A))/(1 − γK(A)),    (3.12)

‖δx‖/‖x‖ ≤ 2γ K(A)/(1 − γK(A)).    (3.13)


Proof. From (3.8) it follows that (I + A⁻¹δA)(x + δx) = x + A⁻¹δb. Moreover,
since γK(A) < 1 and ‖δA‖ ≤ γ‖A‖, it turns out that I + A⁻¹δA is nonsingular.
Taking the inverse of such a matrix and passing to the norms we get ‖x + δx‖ ≤
‖(I + A⁻¹δA)⁻¹‖ (‖x‖ + γ‖A⁻¹‖ ‖b‖). From Theorem 1.5 it then follows that

‖x + δx‖ ≤ 1/(1 − ‖A⁻¹δA‖) (‖x‖ + γ‖A⁻¹‖ ‖b‖),

which implies (3.12), since ‖A⁻¹δA‖ ≤ γK(A) and ‖b‖ ≤ ‖A‖ ‖x‖.

Let us prove (3.13). Subtracting (3.2) from (3.8) it follows that

Aδx = −δA(x + δx) + δb.

Inverting A and passing to the norms, the following inequality is obtained

‖δx‖ ≤ ‖A⁻¹‖ ‖δA‖ ‖x + δx‖ + ‖A⁻¹‖ ‖δb‖
     ≤ γK(A) ‖x + δx‖ + γ‖A⁻¹‖ ‖b‖.    (3.14)

Dividing both sides by ‖x‖ and using the triangular inequality ‖x + δx‖ ≤ ‖δx‖ +
‖x‖, we finally get (3.13). ∎
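For concreteness, in IEEE 754 double precision arithmetic (β = 2, t = 53 mantissa digits) the parameter γ = β^(1−t) used above equals 2⁻⁵², the machine epsilon; the following quick Python check is illustrative, not from the text:

```python
# The bound gamma = beta^(1-t) for IEEE 754 double precision
# (beta = 2, t = 53): gamma = 2^-52, which Python exposes as the
# machine epsilon (illustrative check).
import sys

beta, t = 2, 53
gamma = float(beta) ** (1 - t)

# gamma is the gap between 1.0 and the next larger representable number
next_after_one = 1.0 + gamma
```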

Remarkable instances of perturbations δA and δb are those for which
|δA| ≤ γ|A| and |δb| ≤ γ|b| with γ ≥ 0. Hereafter, the absolute value
notation B = |A| denotes the n × n matrix having entries b_ij = |a_ij| with
i, j = 1, …, n, and the inequality C ≤ D, with C, D ∈ ℝᵐˣⁿ, has the
following meaning

If the norm ‖·‖_∞ is considered, from (3.14) it follows that

‖δx‖_∞/‖x‖_∞ ≤ γ ‖ |A⁻¹| |A| |x| + |A⁻¹| |b| ‖_∞ / ((1 − γ ‖ |A⁻¹| |A| ‖_∞) ‖x‖_∞)
             ≤ 2γ ‖ |A⁻¹| |A| ‖_∞ / (1 − γ ‖ |A⁻¹| |A| ‖_∞).    (3.15)

Estimate (3.15) is generally too pessimistic; however, the following componentwise
error estimates of δx can be derived from (3.15)

|δx_i| ≤ γ |rᵀ_(i)| |A| |x + δx|,   i = 1, …, n   if δb = 0,
                                                                  (3.16)
|δx_i|/|x_i| ≤ γ |rᵀ_(i)| |b| / |rᵀ_(i) b|,   i = 1, …, n   if δA = 0,

rᵀ_(i) being the row vector eᵢᵀA⁻¹. Estimates (3.16) are more stringent than
(3.15), as can be seen in Example 3.1. The first inequality in (3.16) can be
used when the perturbed solution x + δx is known, x + δx being henceforth
the solution computed by a numerical method.


In the case where |A⁻¹| |b| = |x|, the amplification factor multiplying γ in
(3.16) is equal to 1. For such systems the components of the solution are
insensitive to perturbations of the right-hand side. A slightly worse situation
occurs when A is a triangular M-matrix and b has positive entries. In such a
case the factor is bounded by 2n − 1, since

|rᵀ_(i)| |A| |x| ≤ (2n − 1)|x_i|.

For further details on the subject we refer to [Ske79], [CI95] and [Hig89].

Results linking componentwise estimates to normwise estimates through

the so-called hypernorms can be found in [ADR92].

Example 3.1 Consider the linear system Ax=b with

A = [ α  1 ; 0  1/α ],   b = (α² + 1, 1/α)ᵀ,

which has solution xᵀ = (α, 1), where 0 < α < 1. Let us compare the results
obtained using (3.15) and (3.16). From

|A⁻¹| |A| |x| = |A⁻¹| |b| = (α + 2/α, 1)ᵀ    (3.17)

it follows that the supremum of (3.17) is unbounded as α → 0, exactly as it
happens in the case of ‖A‖_∞. On the other hand, the amplification factor of
the error in (3.16) is bounded. Indeed, the component of the maximum absolute
value, x₂, of the solution, satisfies |rᵀ_(2)| |A| |x|/|x₂| = 1. •
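The quantities in Example 3.1 can also be verified numerically; the sketch below (Python, with the sample value α = 0.001) computes |A⁻¹||A||x|, |A⁻¹||b| and the componentwise amplification factor for x₂:

```python
# Numerical check of Example 3.1 for a sample alpha (illustrative sketch).
# A = [[a, 1], [0, 1/a]] has det(A) = 1, so A^-1 = [[1/a, -1], [0, a]].
a = 0.001

A    = [[a, 1.0], [0.0, 1.0 / a]]
Ainv = [[1.0 / a, -1.0], [0.0, a]]
x = [a, 1.0]
b = [a * a + 1.0, 1.0 / a]                 # b = A x

def absmv(M, v):
    """Componentwise product |M| |v| for a 2x2 matrix."""
    return [abs(M[0][0]) * abs(v[0]) + abs(M[0][1]) * abs(v[1]),
            abs(M[1][0]) * abs(v[0]) + abs(M[1][1]) * abs(v[1])]

Ax_abs = absmv(A, x)                       # |A| |x|
lhs = absmv(Ainv, Ax_abs)                  # |A^-1| |A| |x|
rhs = absmv(Ainv, b)                       # |A^-1| |b|

# Both equal (a + 2/a, 1)^T, which blows up as a -> 0 ...
expected = [a + 2.0 / a, 1.0]

# ... while the componentwise factor |r_(2)^T| |A| |x| / |x_2| stays at 1:
r2 = [abs(Ainv[1][0]), abs(Ainv[1][1])]    # |e_2^T A^-1|
factor = r2[0] * Ax_abs[0] + r2[1] * Ax_abs[1]   # |x_2| = 1
```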

3.1.3 Backward a priori Analysis

The numerical methods that we have considered thus far do not require the
explicit computation of the inverse of A to solve Ax=b. However, we can
always assume that they yield an approximate solution of the form x = Cb,
where the matrix C, due to rounding errors, is an approximation of A⁻¹.
In practice, C is very seldom constructed; in case this should happen, the
following result yields an estimate of the error that is made substituting C
for A⁻¹ (see [IK66], Chapter 2, Theorem 7).

Property 3.1 Let R = AC − I; if ‖R‖ < 1, then A and C are nonsingular
and

‖A⁻¹‖ ≤ ‖C‖/(1 − ‖R‖),   ‖R‖/‖A‖ ≤ ‖C − A⁻¹‖ ≤ ‖C‖ ‖R‖/(1 − ‖R‖).    (3.18)

In the frame of backward a priori analysis we can interpret C as being the
inverse of A + δA (for a suitable unknown δA). We are thus assuming that
C(A + δA) = I. This yields

δA = C⁻¹ − A = −(AC − I)C⁻¹ = −RC⁻¹


and, as a consequence, if ‖R‖ < 1 it turns out that

‖δA‖ ≤ ‖R‖ ‖A‖/(1 − ‖R‖),    (3.19)

having used the first inequality in (3.18), where A is assumed to be an
approximation of the inverse of C (notice that the roles of C and A can be
interchanged).

3.1.4 A posteriori Analysis

Approximating the inverse of A by a matrix C amounts to having an
approximation of the solution of the linear system (3.2). Let us denote by
y a known approximate solution. The aim of the a posteriori analysis is to
relate the (unknown) error e = y − x to quantities that can be computed
using y and C.

The starting point of the analysis relies on the fact that the residual
vector r = b − Ay is in general nonzero, since y is just an approximation
to the unknown exact solution. The residual can be related to the error
through Property 3.1 as follows. We have e = A⁻¹(Ay − b) = −A⁻¹r and
thus, if ‖R‖ < 1, then

‖e‖ ≤ ‖r‖ ‖C‖/(1 − ‖R‖).    (3.20)

Notice that the estimate does not necessarily require y to coincide with

the solution x = Cb of the backward a priori analysis. One could therefore

think of computing C only for the purpose of using the estimate (3.20) (for

instance, in the case where (3.2) is solved through the Gauss elimination

method, one can compute C a posteriori using the LU factorization of A,

see Sections 3.3 and 3.3.1).
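A small numerical illustration of the bound (3.20) follows (Python sketch; the 2×2 system and the deliberately inexact approximate inverse C are hypothetical choices, not from the text):

```python
# A posteriori bound (3.20) on a hypothetical 2x2 system (sketch):
# C is a deliberately perturbed inverse, y = C b the approximate solution.

def matvec(A, v):
    return [A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]]

def ninf_v(v):
    return max(abs(v[0]), abs(v[1]))

def ninf_m(A):
    return max(abs(A[0][0]) + abs(A[0][1]), abs(A[1][0]) + abs(A[1][1]))

A = [[4.0, 1.0], [1.0, 3.0]]
x = [1.0, -1.0]                       # exact solution (by construction)
b = matvec(A, x)                      # b = (3, -2)

C = [[0.28, -0.09], [-0.09, 0.37]]    # rough approximation of A^-1
y = matvec(C, b)                      # approximate solution
Ay = matvec(A, y)
r = [b[0] - Ay[0], b[1] - Ay[1]]      # residual r = b - A y

# R = A C - I, needed for the bound (3.20)
AC = [[A[0][0]*C[0][0] + A[0][1]*C[1][0], A[0][0]*C[0][1] + A[0][1]*C[1][1]],
      [A[1][0]*C[0][0] + A[1][1]*C[1][0], A[1][0]*C[0][1] + A[1][1]*C[1][1]]]
R = [[AC[0][0] - 1.0, AC[0][1]], [AC[1][0], AC[1][1] - 1.0]]

err = ninf_v([y[0] - x[0], y[1] - x[1]])           # true error ||e||
bound = ninf_v(r) * ninf_m(C) / (1.0 - ninf_m(R))  # estimate (3.20)
```

Since ‖R‖_∞ ≈ 0.04 here, the bound is guaranteed to dominate the true error.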

We conclude by noticing that if δb is interpreted in (3.11) as being the
residual of the computed solution y = x + δx, it also follows that

‖e‖/‖x‖ ≤ K(A) ‖r‖/‖b‖.    (3.21)

The estimate (3.21) is not used in practice since the computed residual
is affected by rounding errors. A more significant estimate (in the ‖·‖_∞
norm) is obtained letting r̂ = fl(b − Ay) and assuming that r = r̂ + δr
with |δr| ≤ γ_{n+1}(|A| |y| + |b|), where γ_{n+1} = (n + 1)u/(1 − (n + 1)u) > 0,
from which we have

‖e‖_∞/‖y‖_∞ ≤ ‖ |A⁻¹|(|r̂| + γ_{n+1}(|A| |y| + |b|)) ‖_∞ / ‖y‖_∞.

Formulae like this last one are implemented in the library for linear algebra
LAPACK (see [ABB+92]).


3.2 Solution of Triangular Systems

Consider the nonsingular 3 × 3 lower triangular system

| l₁₁   0    0  | | x₁ |   | b₁ |
| l₂₁  l₂₂   0  | | x₂ | = | b₂ | .
| l₃₁  l₃₂  l₃₃ | | x₃ |   | b₃ |

Since the matrix is nonsingular, its diagonal entries l_ii, i = 1, 2, 3, are
nonvanishing, hence we can solve sequentially for the unknown values
x_i, i = 1, 2, 3, as follows

x₁ = b₁/l₁₁,
x₂ = (b₂ − l₂₁x₁)/l₂₂,
x₃ = (b₃ − l₃₁x₁ − l₃₂x₂)/l₃₃.

This algorithm can be extended to n × n systems and is called forward
substitution. In the case of a system Lx=b, with L being a nonsingular
lower triangular matrix of order n (n ≥ 2), the method takes the form

x₁ = b₁/l₁₁,

x_i = (1/l_ii) ( b_i − Σ_{j=1}^{i−1} l_ij x_j ),   i = 2, …, n.    (3.22)

The number of multiplications and divisions to execute the algorithm is
equal to n(n+1)/2, while the number of sums and subtractions is n(n−1)/2.
The global operation count for (3.22) is thus n² flops.
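The operation counts for (3.22) can be confirmed by instrumenting a plain implementation; the Python sketch below is for illustration (the counters are not part of the algorithm, and the 3×3 system is a hypothetical example):

```python
# Forward substitution (3.22) in plain Python, instrumented to count
# operations (illustrative sketch; counters are not part of the algorithm).

def forward_substitution(L, b):
    n = len(b)
    x = [0.0] * n
    muls = adds = 0
    for i in range(n):
        s = b[i]
        for j in range(i):            # i-1 products and i-1 subtractions
            s -= L[i][j] * x[j]
            muls += 1
            adds += 1
        x[i] = s / L[i][i]            # one division per row
        muls += 1
    return x, muls, adds

L = [[2.0,  0.0, 0.0],
     [1.0,  3.0, 0.0],
     [4.0, -1.0, 5.0]]
b = [2.0, 4.0, 8.0]                   # chosen so that x = (1, 1, 1)
x, muls, adds = forward_substitution(L, b)
n = len(b)
```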

Similar conclusions can be drawn for a linear system Ux=b, where U
is a nonsingular upper triangular matrix of order n (n ≥ 2). In this case
the algorithm is called backward substitution and in the general case can
be written as

x_n = b_n/u_nn,

x_i = (1/u_ii) ( b_i − Σ_{j=i+1}^{n} u_ij x_j ),   i = n − 1, …, 1.    (3.23)

Its computational cost is still n² flops.
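A plain implementation of (3.23) can be sketched in Python for illustration (the text's own reference programs below are written in MATLAB; the 3×3 system here is a hypothetical example):

```python
# Backward substitution (3.23) in plain Python (illustrative sketch).

def backward_substitution(U, b):
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):    # i = n, n-1, ..., 1 in the text's numbering
        s = b[i]
        for j in range(i + 1, n):     # subtract the already computed unknowns
            s -= U[i][j] * x[j]
        x[i] = s / U[i][i]
    return x

U = [[2.0, 1.0, -1.0],
     [0.0, 3.0,  2.0],
     [0.0, 0.0,  4.0]]
x = backward_substitution(U, [2.0, 5.0, 4.0])   # solution is (1, 1, 1)
```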

3.2.1 Implementation of Substitution Methods

Each i-th step of algorithm (3.22) requires performing the scalar product
between the row vector L(i, 1 : i − 1) (this notation denoting the vector
extracted from matrix L taking the elements of the i-th row from the first
to the (i−1)-th column) and the column vector x(1 : i − 1). The access to
matrix L is thus by row; for that reason, the forward substitution algorithm,
when implemented in the form above, is called row-oriented.

Its coding is reported in Program 1 (the Program mat_square that is
called by forward_row merely checks that L is a square matrix).

Program 1 - forward row : Forward substitution: row-oriented version

function [x]=forward_row(L,b)
[n]=mat_square(L); x(1) = b(1)/L(1,1);
for i = 2:n, x(i) = (b(i)-L(i,1:i-1)*(x(1:i-1))')/L(i,i); end
x=x';

To obtain a column-oriented version of the same algorithm, we take
advantage of the fact that the i-th component of the vector x, once computed,
can be conveniently eliminated from the system.

An implementation of such a procedure, where the solution x is overwritten
on the right-hand side vector b, is reported in Program 2.

Program 2 - forward col : Forward substitution: column-oriented version

function [b]=forward_col(L,b)
[n]=mat_square(L);
for j=1:n-1,
  b(j) = b(j)/L(j,j); b(j+1:n) = b(j+1:n)-b(j)*L(j+1:n,j);
end; b(n) = b(n)/L(n,n);

Implementing the same algorithm by a row-oriented rather than a
column-oriented approach might dramatically change its performance (but, of
course, not the solution). The choice of the form of implementation must
therefore be subordinated to the specific hardware that is used.
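As an illustration that the two orderings compute identical solutions while traversing L differently, here is a Python sketch of both (hypothetical data; the text's own programs are written in MATLAB):

```python
# Row- and column-oriented forward substitution produce the same solution;
# only the memory access pattern differs (illustrative Python sketch).

def forward_row(L, b):
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        # dot product of L(i, 1:i-1) with the already computed x(1:i-1)
        s = sum(L[i][j] * x[j] for j in range(i))
        x[i] = (b[i] - s) / L[i][i]
    return x

def forward_col(L, b):
    n = len(b)
    x = list(b)                       # the solution overwrites a copy of b
    for j in range(n):
        x[j] /= L[j][j]
        for i in range(j + 1, n):     # eliminate x(j) from the remaining rows
            x[i] -= L[i][j] * x[j]
    return x

L = [[2.0, 0.0, 0.0], [1.0, 3.0, 0.0], [4.0, -1.0, 5.0]]
b = [2.0, 4.0, 8.0]                   # chosen so that x = (1, 1, 1)
```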

Similar considerations hold for the backward substitution method,
presented in (3.23) in its row-oriented version.

In Program 3 only the column-oriented version of the algorithm is coded.
As usual, the vector x is overwritten on b.

Program 3 - backward col : Backward substitution: column-oriented version

function [b]=backward_col(U,b)
[n]=mat_square(U);