Copyright ©1995 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.

18 ITERATIVE METHODS FOR LINEAR AND NONLINEAR EQUATIONS

Therefore (2.15) would predict a reduction in the size of the A-norm error by a factor of 10^{-3} when

    2 × (.05)^k < 10^{-3}

or when

    k > −log₁₀(2000)/log₁₀(.05) ≈ 3.3/1.3 ≈ 2.6,

which also predicts termination within three iterations.

We may have more precise information than a single interval containing

σ(A). When we do, the estimate in (2.15) can be very pessimistic. If the

eigenvalues cluster in a small number of intervals, the condition number can

be quite large, but CG can perform very well. We will illustrate this with an

example. Exercise 2.8.5 also covers this point.

Assume that x_0 = 0 and the eigenvalues of A lie in the two intervals (1, 1.5) and (399, 400). Based on this information the best estimate of the condition number of A is κ₂(A) ≤ 400, which, when inserted into (2.15), gives

    ‖x_k − x*‖_A / ‖x*‖_A ≤ 2 × (19/21)^k ≈ 2 × (.91)^k.

This would indicate fairly slow convergence. However, suppose we use as a residual polynomial p̄_{3k} ∈ P_{3k} the choice

    p̄_{3k}(z) = (1.25 − z)^k (400 − z)^{2k} / ((1.25)^k × 400^{2k}).

It is easy to see that

    max_{z ∈ σ(A)} |p̄_{3k}(z)| ≤ (.25/1.25)^k = (.2)^k,

which is a sharper estimate on convergence. In fact, (2.15) would predict that

    ‖x_k − x*‖_A ≤ 10^{-3} ‖x*‖_A

when 2 × (.91)^k < 10^{-3} or when

    k > −log₁₀(2000)/log₁₀(.91) ≈ 3.3/.04 = 82.5.

The estimate based on the clustering gives convergence in 3k iterations when

    (.2)^k ≤ 10^{-3}

or when

    k > −3/log₁₀(.2) = 4.3.

Hence (2.15) predicts 83 iterations and the clustering analysis 15 (the smallest integer multiple of 3 larger than 3 × 4.3 = 12.9).
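The two predicted iteration counts can be reproduced with a few lines of arithmetic. This is an illustrative script (ours, not the book's); note that using the factor .91 directly, rather than the rounded quotient 3.3/.04, lands at 81 rather than 83 iterations, leading to the same conclusion either way:

```python
import math

def first_k(bound, tol=1e-3):
    """Smallest k at which bound(k) drops below tol."""
    k = 1
    while bound(k) >= tol:
        k += 1
    return k

# Condition-number estimate (2.15) with kappa_2(A) <= 400: bound 2*(.91)^k
k_cond = first_k(lambda k: 2 * 0.91 ** k)
# Clustering estimate: max |pbar_3k| <= (.2)^k, reached after 3k iterations
k_cluster = first_k(lambda k: 0.2 ** k)

print(k_cond)         # 81 (the rounded arithmetic 3.3/.04 in the text gives ~83)
print(3 * k_cluster)  # 15
```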

From the results above one can see that if the condition number of A is near

one, the CG iteration will converge very rapidly. Even if the condition number



is large, the iteration will perform well if the eigenvalues are clustered in a few

small intervals. The transformation of the problem into one with eigenvalues

clustered near one (i.e., easier to solve) is called preconditioning. We used

this term before in the context of Richardson iteration and accomplished the

goal by multiplying A by an approximate inverse. In the context of CG, such

a simple approach can destroy the symmetry of the coefficient matrix and a

more subtle implementation is required. We discuss this in § 2.5.

2.4. Implementation

The implementation of CG depends on the amazing fact that once x_k has been determined, either x_k = x* or a search direction p_{k+1} ≠ 0 can be found very cheaply so that x_{k+1} = x_k + α_{k+1} p_{k+1} for some scalar α_{k+1}. Once p_{k+1} has been found, α_{k+1} is easy to compute from the minimization property of the iteration. In fact

(2.16)    dφ(x_k + αp_{k+1})/dα = 0

for the correct choice of α = α_{k+1}. Equation (2.16) can be written as

    p_{k+1}^T A x_k + α p_{k+1}^T A p_{k+1} − p_{k+1}^T b = 0,

leading to

(2.17)    α_{k+1} = p_{k+1}^T (b − A x_k) / (p_{k+1}^T A p_{k+1}) = p_{k+1}^T r_k / (p_{k+1}^T A p_{k+1}).

If x_k = x_{k+1} then the above analysis implies that α = 0. We show that this only happens if x_k is the solution.

Lemma 2.4.1. Let A be spd and let {x_k} be the conjugate gradient iterates. Then

(2.18)    r_k^T r_l = 0 for all 0 ≤ l < k.

Proof. Since x_k minimizes φ on x_0 + K_k, we have, for any ξ ∈ K_k,

    dφ(x_k + tξ)/dt = ∇φ(x_k + tξ)^T ξ = 0

at t = 0. Recalling that

    ∇φ(x) = Ax − b = −r

we have

(2.19)    ∇φ(x_k)^T ξ = −r_k^T ξ = 0 for all ξ ∈ K_k.

Since r_l ∈ K_k for all l < k (see Exercise 2.8.1), this proves (2.18).

Now, if x_k = x_{k+1}, then r_k = r_{k+1}. Lemma 2.4.1 then implies that ‖r_k‖_2^2 = r_k^T r_k = r_k^T r_{k+1} = 0 and hence x_k = x*.

The next lemma characterizes the search direction and, as a side effect, proves that (if we define p_0 = 0) p_l^T r_k = 0 for all 0 ≤ l < k ≤ n, unless the iteration terminates prematurely.


Lemma 2.4.2. Let A be spd and let {x_k} be the conjugate gradient iterates. If x_k ≠ x* then x_{k+1} = x_k + α_{k+1} p_{k+1} and p_{k+1} is determined up to a scalar multiple by the conditions

(2.20)    p_{k+1} ∈ K_{k+1},  p_{k+1}^T A ξ = 0 for all ξ ∈ K_k.

Proof. Since K_k ⊂ K_{k+1},

(2.21)    ∇φ(x_{k+1})^T ξ = (A x_k + α_{k+1} A p_{k+1} − b)^T ξ = 0

for all ξ ∈ K_k. (2.19) and (2.21) then imply that for all ξ ∈ K_k,

(2.22)    α_{k+1} p_{k+1}^T A ξ = −(A x_k − b)^T ξ = −∇φ(x_k)^T ξ = 0.

This uniquely specifies the direction of p_{k+1} as (2.22) implies that p_{k+1} ∈ K_{k+1} is A-orthogonal (i.e., in the scalar product (x, y) = x^T A y) to K_k, a subspace of dimension one less than K_{k+1}.

The condition p_{k+1}^T A ξ = 0 is called A-conjugacy of p_{k+1} to K_k. Now, any p_{k+1} satisfying (2.20) can, up to a scalar multiple, be expressed as

    p_{k+1} = r_k + w_k

with w_k ∈ K_k. While one might think that w_k would be hard to compute, it is, in fact, trivial. We have the following theorem.

Theorem 2.4.1. Let A be spd and assume that r_k ≠ 0. Define p_0 = 0. Then

(2.23)    p_{k+1} = r_k + β_{k+1} p_k for some β_{k+1} and k ≥ 0.

Proof. By Lemma 2.4.2 and the fact that K_k = span(r_0, . . . , r_{k−1}), we need only verify that a β_{k+1} can be found so that if p_{k+1} is given by (2.23) then

    p_{k+1}^T A r_l = 0

for all 0 ≤ l ≤ k − 1.

Let p_{k+1} be given by (2.23). Then for any l ≤ k

    p_{k+1}^T A r_l = r_k^T A r_l + β_{k+1} p_k^T A r_l.

If l ≤ k − 2, then r_l ∈ K_{l+1} ⊂ K_{k−1}. Lemma 2.4.2 then implies that

    p_{k+1}^T A r_l = 0 for 0 ≤ l ≤ k − 2.

It only remains to solve for β_{k+1} so that p_{k+1}^T A r_{k−1} = 0. Trivially

(2.24)    β_{k+1} = −r_k^T A r_{k−1} / (p_k^T A r_{k−1})

provided p_k^T A r_{k−1} ≠ 0. Since

    r_k = r_{k−1} − α_k A p_k


we have

    r_k^T r_{k−1} = ‖r_{k−1}‖_2^2 − α_k p_k^T A r_{k−1}.

Since r_k^T r_{k−1} = 0 by Lemma 2.4.1 we have

(2.25)    p_k^T A r_{k−1} = ‖r_{k−1}‖_2^2 / α_k ≠ 0.

This completes the proof.

The common implementation of conjugate gradient uses a different form for α_k and β_k than given in (2.17) and (2.24).

Lemma 2.4.3. Let A be spd and assume that r_k ≠ 0. Then

(2.26)    α_{k+1} = ‖r_k‖_2^2 / (p_{k+1}^T A p_{k+1})

and

(2.27)    β_{k+1} = ‖r_k‖_2^2 / ‖r_{k−1}‖_2^2.

Proof. Note that for k ≥ 0

(2.28)    p_{k+1}^T r_{k+1} = r_k^T r_{k+1} + β_{k+1} p_k^T r_{k+1} = 0

by Lemma 2.4.2. An immediate consequence of (2.28) is that p_k^T r_k = 0 and hence

(2.29)    p_{k+1}^T r_k = (r_k + β_{k+1} p_k)^T r_k = ‖r_k‖_2^2.

Taking scalar products of both sides of

    r_{k+1} = r_k − α_{k+1} A p_{k+1}

with p_{k+1} and using (2.29) gives

    0 = p_{k+1}^T r_k − α_{k+1} p_{k+1}^T A p_{k+1} = ‖r_k‖_2^2 − α_{k+1} p_{k+1}^T A p_{k+1},

which is equivalent to (2.26).

To get (2.27) note that p_{k+1}^T A p_k = 0 and hence (2.23) implies that

(2.30)    β_{k+1} = −r_k^T A p_k / (p_k^T A p_k).

Also note that

(2.31)    p_k^T A p_k = p_k^T A (r_{k−1} + β_k p_{k−1}) = p_k^T A r_{k−1} + β_k p_k^T A p_{k−1} = p_k^T A r_{k−1}.

Now combine (2.30), (2.31), and (2.25) to get

    β_{k+1} = −r_k^T A p_k α_k / ‖r_{k−1}‖_2^2.


Now take scalar products of both sides of

    r_k = r_{k−1} − α_k A p_k

with r_k and use Lemma 2.4.1 to get

    ‖r_k‖_2^2 = −α_k r_k^T A p_k.

Hence (2.27) holds.

The usual implementation reflects all of the above results. The goal is to find, for a given ε, a vector x so that ‖b − Ax‖_2 ≤ ε‖b‖_2. The input is the initial iterate x, which is overwritten with the solution, the right hand side b, and a routine which computes the action of A on a vector. We limit the number of iterations to kmax and return the solution, which overwrites the initial iterate x, and the residual norm.

Algorithm 2.4.1. cg(x, b, A, ε, kmax)

1. r = b − Ax, ρ_0 = ‖r‖_2^2, k = 1.

2. Do While √ρ_{k−1} > ε‖b‖_2 and k < kmax

   (a) if k = 1 then p = r
       else β = ρ_{k−1}/ρ_{k−2} and p = r + βp

   (b) w = Ap

   (c) α = ρ_{k−1}/p^T w

   (d) x = x + αp

   (e) r = r − αw

   (f) ρ_k = ‖r‖_2^2

   (g) k = k + 1

Note that the matrix A itself need not be formed or stored, only a routine

for matrix-vector products is required. Krylov space methods are often called

matrix-free for that reason.

Now, consider the costs. We need store only the four vectors x, w, p, and r. Each iteration requires a single matrix-vector product (to compute w = Ap), two scalar products (one for p^T w and one to compute ρ_k = ‖r‖_2^2), and three operations of the form ax + y, where x and y are vectors and a is a scalar.

It is remarkable that the iteration can progress without storing a basis for

the entire Krylov subspace. As we will see in the section on GMRES, this is

not the case in general. The spd structure buys quite a lot.
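The cost accounting above maps directly onto code. Here is a minimal matrix-free sketch of Algorithm cg in Python with NumPy (ours, not the book's code; the names `cg` and `matvec` are our own choices), storing only the four vectors x, w, p, and r:

```python
import numpy as np

def cg(x, b, matvec, eps, kmax):
    """Sketch of Algorithm 2.4.1: matvec applies the spd matrix A to a
    vector; A itself is never formed or stored."""
    x = x.copy()
    r = b - matvec(x)
    rho = np.dot(r, r)                 # rho_0 = ||r||_2^2
    bnorm = np.linalg.norm(b)
    k = 1
    while np.sqrt(rho) > eps * bnorm and k < kmax:
        if k == 1:
            p = r.copy()
        else:
            p = r + (rho / rho_old) * p   # beta = rho_{k-1}/rho_{k-2}
        w = matvec(p)                      # the only use of A per iteration
        alpha = rho / np.dot(p, w)
        x += alpha * p
        r -= alpha * w
        rho_old, rho = rho, np.dot(r, r)
        k += 1
    return x, np.sqrt(rho)

# solve a small spd system, passing only the action of A
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])
x, rnorm = cg(np.zeros(2), b, lambda v: A @ v, 1e-10, 100)
```

For this 2 × 2 spd system the iteration terminates (in exact arithmetic) in two steps, consistent with the finite-termination property of CG.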

2.5. Preconditioning

To reduce the condition number, and hence improve the performance of the

iteration, one might try to replace Ax = b by another spd system with the

same solution. If M is a spd matrix that is close to A^{-1}, then the eigenvalues


of M A will be clustered near one. However M A is unlikely to be spd, and

hence CG cannot be applied to the system M Ax = M b.

In theory one avoids this difficulty by expressing the preconditioned

problem in terms of B, where B is spd, A = B^2, and by using a two-sided preconditioner, S ≈ B^{-1} (so M = S^2). Then the matrix SAS is spd and its eigenvalues are clustered near one. Moreover the preconditioned system

    SASy = Sb

has y* = S^{-1}x* as a solution, where Ax* = b. Hence x* can be recovered from y* by multiplication by S.

S (or a subroutine for its action on a vector) would be necessary and that a

matrix-vector multiply by SAS would incur a cost of one multiplication by A

and two by S. Fortunately, this is not the case.
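These claims are easy to check numerically on a tiny example. The sketch below is ours, not the book's; it uses a diagonal (Jacobi-style) M, for which S = M^{1/2} is trivially available, and shows that MA need not be symmetric while SAS is, and that x* = Sy*:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])               # spd
M = np.diag(1.0 / np.diag(A))            # spd approximation to A^{-1}
S = np.sqrt(M)                           # M is diagonal, so S = M^{1/2}

MA, SAS = M @ A, S @ A @ S
print(np.allclose(MA, MA.T))             # False: M A need not be symmetric
print(np.allclose(SAS, SAS.T))           # True: SAS is symmetric (and spd)
print(np.linalg.eigvalsh(SAS))           # both eigenvalues lie near one

# x* is recovered from the preconditioned solution y* by x* = S y*
b = np.array([1.0, 2.0])
y = np.linalg.solve(SAS, S @ b)
print(np.allclose(S @ y, np.linalg.solve(A, b)))   # True
```

How tightly the eigenvalues of SAS cluster near one depends, of course, on how well M approximates A^{-1}.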

If ŷ_k, r̂_k, p̂_k are the iterate, residual, and search direction for CG applied to SAS and we let

    x_k = S ŷ_k,  r_k = S^{-1} r̂_k,  p_k = S p̂_k,  and  z_k = S r̂_k,

then one can perform the iteration directly in terms of x_k, A, and M. The reader should verify that the following algorithm does exactly that. The input is the same as that for Algorithm cg plus a routine to compute the action of the preconditioner on a vector. Aside from the preconditioner, the arguments to pcg are the same as those to Algorithm cg.

Algorithm 2.5.1. pcg(x, b, A, M, ε, kmax)

1. r = b − Ax, ρ_0 = ‖r‖_2^2, k = 1

2. Do While √ρ_{k−1} > ε‖b‖_2 and k < kmax

   (a) z = Mr

   (b) τ_{k−1} = z^T r

   (c) if k = 1 then β = 0 and p = z
       else β = τ_{k−1}/τ_{k−2}, p = z + βp

   (d) w = Ap

   (e) α = τ_{k−1}/p^T w

   (f) x = x + αp

   (g) r = r − αw

   (h) ρ_k = r^T r

   (i) k = k + 1

Note that the cost is identical to CG with the addition of

• the application of the preconditioner M in step 2a and

• the additional inner product required to compute τ_k in step 2b.
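As with cg, the preconditioned iteration translates almost line for line into code. A sketch in Python with NumPy (illustrative, not the book's code; the Jacobi preconditioner and the names `pcg`, `matvec`, `psolve` are our own choices):

```python
import numpy as np

def pcg(x, b, matvec, psolve, eps, kmax):
    """Sketch of Algorithm 2.5.1: matvec applies A and psolve applies the
    spd preconditioner M to a vector."""
    x = x.copy()
    r = b - matvec(x)
    rho = np.dot(r, r)
    bnorm = np.linalg.norm(b)
    k = 1
    while np.sqrt(rho) > eps * bnorm and k < kmax:
        z = psolve(r)                     # step 2a: z = M r
        tau = np.dot(z, r)                # step 2b: tau_{k-1} = z^T r
        if k == 1:
            p = z.copy()
        else:
            p = z + (tau / tau_old) * p   # beta = tau_{k-1}/tau_{k-2}
        w = matvec(p)
        alpha = tau / np.dot(p, w)
        x += alpha * p
        r -= alpha * w
        tau_old = tau
        rho = np.dot(r, r)                # step 2h
        k += 1
    return x, np.sqrt(rho)

# Jacobi preconditioning (M = inverse of diag(A)) of a small spd system
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])
d = np.diag(A)
x, rnorm = pcg(np.zeros(2), b, lambda v: A @ v, lambda v: v / d, 1e-10, 100)
```

Note that M, like A, enters only through its action on a vector, so it too need never be formed explicitly.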


Of these costs, the application of the preconditioner is usually the larger. In the remainder of this section we briefly mention some classes of preconditioners.