and to define the Richardson iteration

(1.6) x_{k+1} = (I − A)x_k + b.

We will discuss more general methods in which {x_k} is given by

(1.7) x_{k+1} = Mx_k + c.

In (1.7) M is an N × N matrix called the iteration matrix. Iterative methods of this form are called stationary iterative methods because the transition from x_k to x_{k+1} does not depend on the history of the iteration. The Krylov methods discussed in Chapters 2 and 3 are not stationary iterative methods.
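To fix ideas, here is a minimal Python sketch of the stationary iteration (1.7). This is illustrative code, not part of the original text; the matrix, right-hand side, and tolerance are hypothetical choices.

    import numpy as np

    def stationary_iteration(M, c, x0, tol=1e-10, maxit=1000):
        # Iterate x_{k+1} = M x_k + c until successive iterates stop changing.
        x = x0
        for k in range(maxit):
            x_new = M @ x + c
            if np.linalg.norm(x_new - x) < tol:
                return x_new
            x = x_new
        return x

    # Richardson iteration (1.6) is the special case M = I - A, c = b.
    A = np.array([[1.2, 0.1], [0.1, 0.9]])
    b = np.array([1.0, 2.0])
    x = stationary_iteration(np.eye(2) - A, b, np.zeros(2))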

All our results are based on the following lemma.

Lemma 1.2.1. If M is an N × N matrix with ‖M‖ < 1 then I − M is nonsingular and

(1.8) ‖(I − M)^{-1}‖ ≤ 1/(1 − ‖M‖).

Proof. We will show that I − M is nonsingular and that (1.8) holds by showing that the series

∑_{l=0}^{∞} M^l = (I − M)^{-1}.

The partial sums

S_k = ∑_{l=0}^{k} M^l

form a Cauchy sequence in R^{N×N}. To see this note that for all m > k

‖S_k − S_m‖ ≤ ∑_{l=k+1}^{m} ‖M^l‖.

Now, ‖M^l‖ ≤ ‖M‖^l because ‖·‖ is a matrix norm that is induced by a vector norm. Hence

‖S_k − S_m‖ ≤ ∑_{l=k+1}^{m} ‖M‖^l = ‖M‖^{k+1} (1 − ‖M‖^{m−k})/(1 − ‖M‖) → 0

as m, k → ∞. Hence the sequence S_k converges, say to S. Since MS_k + I = S_{k+1}, we must have MS + I = S and hence (I − M)S = I. This proves that I − M is nonsingular and that S = (I − M)^{-1}.


Noting that

‖(I − M)^{-1}‖ ≤ ∑_{l=0}^{∞} ‖M‖^l = (1 − ‖M‖)^{-1}

proves (1.8) and completes the proof.
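As a quick numerical sanity check of Lemma 1.2.1 (an illustrative sketch, not from the original text), one can compare ‖(I − M)^{-1}‖ with the bound 1/(1 − ‖M‖) and with a truncated Neumann series:

    import numpy as np

    rng = np.random.default_rng(0)
    N = 5
    M = rng.standard_normal((N, N))
    M *= 0.5 / np.linalg.norm(M, 2)        # scale so that ||M||_2 = 0.5 < 1

    inv = np.linalg.inv(np.eye(N) - M)
    bound = 1.0 / (1.0 - np.linalg.norm(M, 2))
    # Truncated Neumann series S_k = sum_{l=0}^{k} M^l approximates (I - M)^{-1}.
    S = sum(np.linalg.matrix_power(M, l) for l in range(50))

    assert np.linalg.norm(inv, 2) <= bound + 1e-12
    assert np.linalg.norm(S - inv, 2) < 1e-12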

The following corollary is a direct consequence of Lemma 1.2.1.

Corollary 1.2.1. If ‖M‖ < 1 then the iteration (1.7) converges to x = (I − M)^{-1}c for all initial iterates x_0.

A consequence of Corollary 1.2.1 is that Richardson iteration (1.6) will converge if ‖I − A‖ < 1. It is sometimes possible to precondition a linear equation by multiplying both sides of (1.1) by a matrix B

BAx = Bb

so that convergence of iterative methods is improved. In the context of Richardson iteration, the matrices B that allow us to apply the Banach lemma and its corollary are called approximate inverses.

Definition 1.2.1. B is an approximate inverse of A if ‖I − BA‖ < 1.

The following theorem is often referred to as the Banach Lemma.

Theorem 1.2.1. If A and B are N × N matrices and B is an approximate inverse of A, then A and B are both nonsingular and

(1.9) ‖A^{-1}‖ ≤ ‖B‖ / (1 − ‖I − BA‖),    ‖B^{-1}‖ ≤ ‖A‖ / (1 − ‖I − BA‖),

and

(1.10) ‖A^{-1} − B‖ ≤ ‖B‖ ‖I − BA‖ / (1 − ‖I − BA‖),    ‖A − B^{-1}‖ ≤ ‖A‖ ‖I − BA‖ / (1 − ‖I − BA‖).

Proof. Let M = I − BA. By Lemma 1.2.1, I − M = I − (I − BA) = BA is nonsingular. Hence both A and B are nonsingular. By (1.8)

(1.11) ‖A^{-1}B^{-1}‖ = ‖(I − M)^{-1}‖ ≤ 1/(1 − ‖M‖) = 1/(1 − ‖I − BA‖).

Since A^{-1} = (I − M)^{-1}B, inequality (1.11) implies the first part of (1.9). The second part follows in a similar way from B^{-1} = A(I − M)^{-1}.

To complete the proof note that

A^{-1} − B = (I − BA)A^{-1},    B^{-1} − A = B^{-1}(I − BA),

and use (1.9).

Richardson iteration, preconditioned with approximate inversion, has the form

(1.12) x_{k+1} = (I − BA)x_k + Bb.
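A minimal Python sketch of (1.12) (illustrative only; B is any approximate inverse supplied by the caller, and the stopping test on the residual is a hypothetical choice):

    import numpy as np

    def preconditioned_richardson(A, B, b, x0, tol=1e-10, maxit=1000):
        # Iteration (1.12): x_{k+1} = (I - BA) x_k + B b.
        # Convergence is guaranteed when ||I - BA|| < 1 (Definition 1.2.1).
        x = x0
        Bb = B @ b
        for k in range(maxit):
            x = x - B @ (A @ x) + Bb           # same as (I - BA) x + B b
            if np.linalg.norm(b - A @ x) <= tol * np.linalg.norm(b):
                break
        return x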

If the norm of I − BA is small, then not only will the iteration converge rapidly, but, as Lemma 1.1.1 indicates, termination decisions based on the


preconditioned residual Bb − BAx will better reflect the actual error. This method is a very effective technique for solving differential equations, integral equations, and related problems [15], [6], [100], [117], [111]. Multigrid methods [19], [99], [126] can also be interpreted in this light. We mention one other approach, polynomial preconditioning, which tries to approximate A^{-1} by a polynomial in A [123], [179], [169].

1.3. The spectral radius

The analysis in § 1.2 related convergence of the iteration (1.7) to the norm of the matrix M. However ‖M‖ could be small in some norms and quite large in others. Hence the performance of the iteration is not completely described by ‖M‖. The concept of spectral radius allows us to make a complete description.

We let σ(A) denote the set of eigenvalues of A.

Definition 1.3.1. The spectral radius of an N × N matrix A is

(1.13) ρ(A) = max_{λ ∈ σ(A)} |λ| = lim_{n→∞} ‖A^n‖^{1/n}.

The term on the right-hand side of the second equality in (1.13) is the limit used by the radical test for convergence of the series ∑_n A^n.
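The two expressions in (1.13) can be compared numerically. In the following illustrative sketch (not from the original text), ρ(A) = 0.5 while ‖A‖_2 > 2:

    import numpy as np

    A = np.array([[0.5, 2.0],
                  [0.0, 0.3]])
    rho = max(abs(np.linalg.eigvals(A)))       # max |lambda| over sigma(A)
    n = 200
    root = np.linalg.norm(np.linalg.matrix_power(A, n), 2) ** (1.0 / n)
    # rho and root agree to several digits, while ||A||_2 > 2 is much larger.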

The spectral radius of M is independent of any particular matrix norm of M. It is clear, in fact, that

(1.14) ρ(A) ≤ ‖A‖

for any induced matrix norm. The inequality (1.14) has a partial converse that allows us to completely describe the performance of iteration (1.7) in terms of the spectral radius. We state that converse as a theorem and refer to [105] for a proof.

Theorem 1.3.1. Let A be an N × N matrix. Then for any ε > 0 there is a norm ‖·‖ on R^N such that

ρ(A) > ‖A‖ − ε.

A consequence of Theorem 1.3.1, Lemma 1.2.1, and Exercise 1.5.1 is a characterization of convergent stationary iterative methods. The proof is left as an exercise.

Theorem 1.3.2. Let M be an N × N matrix. The iteration (1.7) converges for all c ∈ R^N if and only if ρ(M) < 1.
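An illustrative numerical example (not from the original text): the matrix below has ‖M‖_∞ = 1.5 > 1, yet ρ(M) = 0.5, and the iteration (1.7) converges to (I − M)^{-1}c as Theorem 1.3.2 predicts.

    import numpy as np

    # ||M||_inf = 1.5 > 1, but rho(M) = 0.5, so (1.7) still converges.
    M = np.array([[0.5, 1.0],
                  [0.0, 0.5]])
    c = np.ones(2)
    x = np.zeros(2)
    for k in range(200):
        x = M @ x + c
    assert np.allclose(x, np.linalg.solve(np.eye(2) - M, c))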

1.4. Matrix splittings and classical stationary iterative methods

There are ways to convert Ax = b to a linear fixed-point iteration that are different from (1.5). Methods such as Jacobi, Gauss–Seidel, and successive overrelaxation (SOR) iteration are based on splittings of A of the form

A = A_1 + A_2,


where A_1 is a nonsingular matrix constructed so that equations with A_1 as coefficient matrix are easy to solve. Then Ax = b is converted to the fixed-point problem

x = A_1^{-1}(b − A_2 x).

The analysis of the method is based on an estimation of the spectral radius of the iteration matrix M = −A_1^{-1}A_2.

For a detailed description of the classical stationary iterative methods the reader may consult [89], [105], [144], [193], or [200]. These methods are usually less efficient than the Krylov methods discussed in Chapters 2 and 3 or the more modern stationary methods based on multigrid ideas. However the classical methods have a role as preconditioners. The limited description in this section is intended as a review that will set some notation to be used later.

As a first example we consider the Jacobi iteration that uses the splitting

A_1 = D,    A_2 = L + U,

where D is the diagonal of A and L and U are the (strict) lower and upper triangular parts. This leads to the iteration matrix

M_JAC = −D^{-1}(L + U).

Letting (x_k)_i denote the ith component of the kth iterate we can express Jacobi iteration concretely as

(1.15) (x_{k+1})_i = a_{ii}^{-1} ( b_i − ∑_{j≠i} a_{ij}(x_k)_j ).

Note that A_1 is diagonal and hence trivial to invert.
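For concreteness, a minimal Python sketch of (1.15) (illustrative, not from the original text):

    import numpy as np

    def jacobi(A, b, x0, tol=1e-10, maxit=500):
        # Jacobi iteration (1.15): all components updated from the old iterate.
        D = np.diag(A)                     # diagonal of A as a vector
        R = A - np.diagflat(D)             # L + U, the off-diagonal part
        x = x0
        for k in range(maxit):
            x_new = (b - R @ x) / D        # componentwise a_ii^{-1}(b_i - sum)
            if np.linalg.norm(x_new - x) < tol:
                return x_new
            x = x_new
        return x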

We present only one convergence result for the classical stationary iterative

methods.

Theorem 1.4.1. Let A be an N × N matrix and assume that for all 1 ≤ i ≤ N

(1.16) 0 < ∑_{j≠i} |a_{ij}| < |a_{ii}|.

Then A is nonsingular and the Jacobi iteration (1.15) converges to x* = A^{-1}b for all b.

Proof. Note that the ith row sum of M = M_JAC satisfies

∑_{j=1}^{N} |m_{ij}| = (∑_{j≠i} |a_{ij}|) / |a_{ii}| < 1.

Hence ‖M_JAC‖_∞ < 1 and the iteration converges to the unique solution of x = Mx + D^{-1}b. Also I − M = D^{-1}A is nonsingular and therefore A is nonsingular.


Gauss–Seidel iteration overwrites the approximate solution with the new value as soon as it is computed. This results in the iteration

(x_{k+1})_i = a_{ii}^{-1} ( b_i − ∑_{j<i} a_{ij}(x_{k+1})_j − ∑_{j>i} a_{ij}(x_k)_j ),

the splitting

A_1 = D + L,    A_2 = U,

and iteration matrix

M_GS = −(D + L)^{-1}U.
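A sketch of one forward Gauss–Seidel sweep (illustrative Python, not from the original text); the vector x is overwritten in place, so entries with j < i already hold new values when row i is processed:

    import numpy as np

    def gauss_seidel_sweep(A, b, x):
        # One forward Gauss-Seidel sweep: overwrite x_i as soon as it is known.
        N = len(b)
        for i in range(N):
            s = A[i, :i] @ x[:i] + A[i, i+1:] @ x[i+1:]
            x[i] = (b[i] - s) / A[i, i]
        return x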

Note that A_1 is lower triangular, and hence A_1^{-1}y is easy to compute for vectors y. Note also that, unlike Jacobi iteration, the iteration depends on the ordering of the unknowns. Backward Gauss–Seidel begins the update of x with the Nth coordinate rather than the first, resulting in the splitting

A_1 = D + U,    A_2 = L,

and iteration matrix

M_BGS = −(D + U)^{-1}L.

A symmetric Gauss–Seidel iteration is a forward Gauss–Seidel iteration followed by a backward Gauss–Seidel iteration. This leads to the iteration matrix

M_SGS = M_BGS M_GS = (D + U)^{-1}L(D + L)^{-1}U.

If A is symmetric then U = L^T. In that event

M_SGS = (D + U)^{-1}L(D + L)^{-1}U = (D + L^T)^{-1}L(D + L)^{-1}L^T.

From the point of view of preconditioning, one wants to write the stationary method as a preconditioned Richardson iteration. That means that one wants to find B such that M = I − BA and then use B as an approximate inverse. For the Jacobi iteration,

(1.17) B_JAC = D^{-1}.

For symmetric Gauss–Seidel

(1.18) B_SGS = (D + L^T)^{-1}D(D + L)^{-1}.
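Equality (1.18) is Exercise 1.5.3; a numerical spot check (an illustrative sketch, not a proof) that M_SGS = I − B_SGS A for a random symmetric A:

    import numpy as np

    rng = np.random.default_rng(1)
    N = 6
    A = rng.standard_normal((N, N))
    A = A + A.T + 2 * N * np.eye(N)        # symmetric with nonzero diagonal
    D = np.diagflat(np.diag(A))
    L = np.tril(A, -1)                      # strict lower triangle; U = L^T

    M_GS  = -np.linalg.solve(D + L, L.T)
    M_BGS = -np.linalg.solve(D + L.T, L)
    B_SGS = np.linalg.inv(D + L.T) @ D @ np.linalg.inv(D + L)

    assert np.allclose(M_BGS @ M_GS, np.eye(N) - B_SGS @ A)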

The successive overrelaxation iteration modifies Gauss–Seidel by adding a relaxation parameter ω to construct an iteration with iteration matrix

M_SOR = (D + ωL)^{-1}((1 − ω)D − ωU).

The performance can be dramatically improved with a good choice of ω but still is not competitive with Krylov methods. A further disadvantage is that the choice of ω is often difficult to make. References [200], [89], [193], [8], and the papers cited therein provide additional reading on this topic.
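For reference, one componentwise SOR sweep (an illustrative sketch, not from the original text); taking ω = 1 recovers the forward Gauss–Seidel sweep:

    import numpy as np

    def sor_sweep(A, b, x, omega):
        # One SOR sweep: a Gauss-Seidel update relaxed by the parameter omega.
        N = len(b)
        for i in range(N):
            s = A[i, :i] @ x[:i] + A[i, i+1:] @ x[i+1:]
            x[i] = (1 - omega) * x[i] + omega * (b[i] - s) / A[i, i]
        return x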


1.5. Exercises on stationary iterative methods

1.5.1. Show that if ρ(M) ≥ 1 then there are x_0 and c such that the iteration (1.7) fails to converge.

1.5.2. Prove Theorem 1.3.2.

1.5.3. Verify equality (1.18).

1.5.4. Show that if A is symmetric and positive definite (that is, A^T = A and x^T Ax > 0 for all x ≠ 0), then B_SGS is also symmetric and positive definite.


Chapter 2

Conjugate Gradient Iteration

2.1. Krylov methods and the minimization property

In the following two chapters we describe some of the Krylov space methods for linear equations. Unlike the stationary iterative methods, Krylov methods do not have an iteration matrix. The two such methods that we'll discuss in depth, conjugate gradient and GMRES, minimize, at the kth iteration, some measure of error over the affine space

x_0 + K_k,

where x_0 is the initial iterate and the kth Krylov subspace K_k is

K_k = span(r_0, Ar_0, . . . , A^{k−1}r_0)

for k ≥ 1.

The residual is

r = b − Ax.

So {r_k}_{k≥0} will denote the sequence of residuals

r_k = b − Ax_k.

As in Chapter 1, we assume that A is a nonsingular N × N matrix and let x* = A^{-1}b.

There are other Krylov methods that are not as well understood as CG or

GMRES. Brief descriptions of several of these methods and their properties

are in § 3.6, [12], and [78].

The conjugate gradient (CG) iteration was invented in the 1950s [103] as a

direct method. It has come into wide use over the last 15 years as an iterative

method and has generally superseded the Jacobi–Gauss–Seidel–SOR family of

methods.

CG is intended to solve symmetric positive definite (spd) systems. Recall that A is symmetric if A = A^T and positive definite if

x^T Ax > 0 for all x ≠ 0.


In this section we assume that A is spd. Since A is spd we may define a norm (you should check that this is a norm) by

(2.1) ‖x‖_A = √(x^T Ax).

‖·‖_A is called the A-norm. The development in these notes is different from the classical work and more like the analysis for GMRES and CGNR in [134]. In this section, and in the section on GMRES that follows, we begin with a description of what the algorithm does and the consequences of the minimization property of the iterates. After that we describe termination criteria, performance, preconditioning, and, at the very end, the implementation.
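For example (an illustrative sketch, not from the original text), the A-norm (2.1) is computed directly from the definition:

    import numpy as np

    def a_norm(A, x):
        # ||x||_A = sqrt(x^T A x); A must be spd for the square root to be real.
        return np.sqrt(x @ (A @ x))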

The kth iterate x_k of CG minimizes

(2.2) φ(x) = (1/2)x^T Ax − x^T b

over x_0 + K_k.
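To make the minimization property concrete, a small illustrative check (not from the original text) that the global minimizer of φ solves Ax = b:

    import numpy as np

    def phi(A, b, x):
        # phi(x) = 0.5 x^T A x - x^T b, the quadratic from (2.2)
        return 0.5 * x @ (A @ x) - x @ b

    A = np.array([[4.0, 1.0], [1.0, 3.0]])   # an spd matrix
    b = np.array([1.0, 2.0])
    x_star = np.linalg.solve(A, b)

    # The gradient A x - b vanishes at x_star, and nearby points give larger phi.
    assert np.allclose(A @ x_star - b, 0)
    rng = np.random.default_rng(2)
    for _ in range(5):
        assert phi(A, b, x_star) <= phi(A, b, x_star + 0.1 * rng.standard_normal(2))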

Note that if φ(x̃) is the minimal value (in R^N) then

∇φ(x̃) = Ax̃ − b = 0