and hence $\tilde{x} = x^*$.

Minimizing $\phi$ over any subset of $R^N$ is the same as minimizing $\|x - x^*\|_A$ over that subset. We state this as a lemma.

Lemma 2.1.1. Let $S \subset R^N$. If $x_k$ minimizes $\phi$ over $S$ then $x_k$ also minimizes $\|x^* - x\|_A = \|r\|_{A^{-1}}$ over $S$.

Proof. Note that

$$\|x - x^*\|_A^2 = (x - x^*)^T A (x - x^*) = x^T A x - x^T A x^* - (x^*)^T A x + (x^*)^T A x^*.$$

Since $A$ is symmetric and $Ax^* = b$,

$$-x^T A x^* - (x^*)^T A x = -2 x^T A x^* = -2 x^T b.$$

Therefore

$$\|x - x^*\|_A^2 = 2\phi(x) + (x^*)^T A x^*.$$

Since $(x^*)^T A x^*$ is independent of $x$, minimizing $\phi$ is equivalent to minimizing $\|x - x^*\|_A^2$ and hence to minimizing $\|x - x^*\|_A$.

If $e = x - x^*$ then

$$\|e\|_A^2 = e^T A e = (A(x - x^*))^T A^{-1} (A(x - x^*)) = \|b - Ax\|_{A^{-1}}^2$$

and hence the $A$-norm of the error is also the $A^{-1}$-norm of the residual.

We will use this lemma in the particular case that $S = x_0 + \mathcal{K}_k$ for some $k$.
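As an aside, both identities in the proof are easy to check numerically. The sketch below is our own illustration (arbitrary random data, NumPy assumed), using the chapter's quadratic $\phi(x) = \frac{1}{2} x^T A x - x^T b$:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random symmetric positive definite matrix and right-hand side.
N = 5
M = rng.standard_normal((N, N))
A = M @ M.T + N * np.eye(N)        # spd by construction
b = rng.standard_normal(N)
x_star = np.linalg.solve(A, b)     # exact solution of A x* = b

def phi(x):
    """The quadratic phi(x) = 0.5 x^T A x - x^T b that CG minimizes."""
    return 0.5 * x @ A @ x - x @ b

x = rng.standard_normal(N)         # an arbitrary trial point

# Identity from the proof: ||x - x*||_A^2 = 2 phi(x) + (x*)^T A x*.
err_A_sq = (x - x_star) @ A @ (x - x_star)
assert np.isclose(err_A_sq, 2 * phi(x) + x_star @ A @ x_star)

# The A-norm of the error equals the A^{-1}-norm of the residual.
r = b - A @ x
res_Ainv_sq = r @ np.linalg.solve(A, r)   # ||b - A x||_{A^{-1}}^2
assert np.isclose(err_A_sq, res_Ainv_sq)
```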

Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR16.html.

Copyright ©1995 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.


CONJUGATE GRADIENT ITERATION

2.2. Consequences of the minimization property

Lemma 2.1.1 implies that, since $x_k$ minimizes $\phi$ over $x_0 + \mathcal{K}_k$,

(2.3) $$\|x^* - x_k\|_A \le \|x^* - w\|_A$$

for all $w \in x_0 + \mathcal{K}_k$. Since any $w \in x_0 + \mathcal{K}_k$ can be written as

$$w = \sum_{j=0}^{k-1} \gamma_j A^j r_0 + x_0$$

for some coefficients $\{\gamma_j\}$, we can express $x^* - w$ as

$$x^* - w = x^* - x_0 - \sum_{j=0}^{k-1} \gamma_j A^j r_0.$$

Since $Ax^* = b$ we have

$$r_0 = b - Ax_0 = A(x^* - x_0)$$

and therefore

$$x^* - w = x^* - x_0 - \sum_{j=0}^{k-1} \gamma_j A^{j+1}(x^* - x_0) = p(A)(x^* - x_0),$$

where the polynomial

$$p(z) = 1 - \sum_{j=0}^{k-1} \gamma_j z^{j+1}$$

has degree $k$ and satisfies $p(0) = 1$. Hence

(2.4) $$\|x^* - x_k\|_A = \min_{p \in \mathcal{P}_k,\, p(0)=1} \|p(A)(x^* - x_0)\|_A.$$

In (2.4) $\mathcal{P}_k$ denotes the set of polynomials of degree $k$.

The spectral theorem for spd matrices asserts that

$$A = U \Lambda U^T,$$

where $U$ is an orthogonal matrix whose columns are the eigenvectors of $A$ and $\Lambda$ is a diagonal matrix with the positive eigenvalues of $A$ on the diagonal. Since $U U^T = U^T U = I$ by orthogonality of $U$, we have

$$A^j = U \Lambda^j U^T.$$

Hence

$$p(A) = U p(\Lambda) U^T.$$

Define $A^{1/2} = U \Lambda^{1/2} U^T$ and note that

(2.5) $$\|x\|_A^2 = x^T A x = \|A^{1/2} x\|_2^2.$$
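The constructions above are straightforward to reproduce numerically. This illustrative sketch (our own, with random spd data) builds $A^{1/2}$ from the spectral decomposition and checks (2.5) and $p(A) = U p(\Lambda) U^T$ for a sample polynomial:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 6
M = rng.standard_normal((N, N))
A = M @ M.T + N * np.eye(N)               # spd

# Spectral decomposition A = U Lam U^T; eigh returns an orthonormal U.
lam, U = np.linalg.eigh(A)

# A^{1/2} = U Lam^{1/2} U^T squares back to A.
A_half = U @ np.diag(np.sqrt(lam)) @ U.T
assert np.allclose(A_half @ A_half, A)

# (2.5): ||x||_A^2 = x^T A x = ||A^{1/2} x||_2^2.
x = rng.standard_normal(N)
assert np.isclose(x @ A @ x, np.linalg.norm(A_half @ x) ** 2)

# p(A) = U p(Lam) U^T for a sample polynomial p(z) = 1 - 2 z + z^2.
p = lambda z: 1 - 2 * z + z ** 2
pA = np.eye(N) - 2 * A + A @ A
assert np.allclose(pA, U @ np.diag(p(lam)) @ U.T)
```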

ITERATIVE METHODS FOR LINEAR AND NONLINEAR EQUATIONS

Hence, for any $x \in R^N$,

$$\|p(A)x\|_A = \|A^{1/2} p(A) x\|_2 \le \|p(A)\|_2 \, \|A^{1/2} x\|_2 \le \|p(A)\|_2 \, \|x\|_A.$$

This, together with (2.4), implies that

(2.6) $$\|x_k - x^*\|_A \le \|x_0 - x^*\|_A \min_{p \in \mathcal{P}_k,\, p(0)=1} \ \max_{z \in \sigma(A)} |p(z)|.$$

Here $\sigma(A)$ is the set of all eigenvalues of $A$.

The following corollary is an important consequence of (2.6).

Corollary 2.2.1. Let $A$ be spd and let $\{x_k\}$ be the CG iterates. Let $k$ be given and let $\bar{p}_k$ be any $k$th degree polynomial such that $\bar{p}_k(0) = 1$. Then

(2.7) $$\frac{\|x_k - x^*\|_A}{\|x_0 - x^*\|_A} \le \max_{z \in \sigma(A)} |\bar{p}_k(z)|.$$

We will refer to the polynomial $\bar{p}_k$ as a residual polynomial [185].

Definition 2.2.1. The set of $k$th degree residual polynomials is

(2.8) $$\mathcal{P}_k = \{\, p \mid p \text{ is a polynomial of degree } k \text{ and } p(0) = 1 \,\}.$$

In specific contexts we try to construct sequences of residual polynomials,

based on information on σ(A), that make either the middle or the right term

in (2.7) easy to evaluate. This leads to an upper estimate for the number of

CG iterations required to reduce the A-norm of the error to a given tolerance.

One simple application of (2.7) is to show how the CG algorithm can be

viewed as a direct method.

Theorem 2.2.1. Let $A$ be spd. Then the CG algorithm will find the

solution within N iterations.

Proof. Let $\{\lambda_i\}_{i=1}^N$ be the eigenvalues of $A$. As a test polynomial, let

$$\bar{p}(z) = \prod_{i=1}^{N} (\lambda_i - z)/\lambda_i.$$

$\bar{p} \in \mathcal{P}_N$ because $\bar{p}$ has degree $N$ and $\bar{p}(0) = 1$. Hence, by (2.7) and the fact that $\bar{p}$ vanishes on $\sigma(A)$,

$$\|x_N - x^*\|_A \le \|x_0 - x^*\|_A \max_{z \in \sigma(A)} |\bar{p}(z)| = 0.$$

Note that our test polynomial had the eigenvalues of $A$ as its roots. In that way we showed (in the absence of all roundoff error!) that CG terminated in finitely many iterations with the exact solution. This is not as good as it sounds, since in most applications the number of unknowns $N$ is very large, and one cannot afford to perform $N$ iterations. It is best to regard CG as an iterative method. When doing that we seek to terminate the iteration when some specified error tolerance is reached.
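To make the iterative viewpoint concrete, here is a minimal dense-matrix CG sketch. It is our own illustration of the standard CG recurrence, not code from this book; the stopping test anticipates the relative-residual criterion (2.9) of the next section.

```python
import numpy as np

def cg(A, b, x0, eta=1e-10, maxit=None):
    """Conjugate gradient for spd A: stop when ||b - A x_k||_2 <= eta ||b||_2.

    A plain dense-matrix sketch of the standard CG recurrence; in exact
    arithmetic it reaches the solution in at most N iterations.
    """
    N = len(b)
    maxit = N if maxit is None else maxit
    x = x0.copy()
    r = b - A @ x          # residual r_k = b - A x_k
    p = r.copy()           # search direction
    rho = r @ r
    for k in range(maxit):
        if np.sqrt(rho) <= eta * np.linalg.norm(b):
            return x, k
        w = A @ p
        alpha = rho / (p @ w)
        x += alpha * p
        r -= alpha * w
        rho_new = r @ r
        p = r + (rho_new / rho) * p
        rho = rho_new
    return x, maxit

rng = np.random.default_rng(2)
N = 20
M = rng.standard_normal((N, N))
A = M @ M.T + N * np.eye(N)   # spd test matrix
b = rng.standard_normal(N)

x, its = cg(A, b, np.zeros(N))
assert its <= N                              # at most N steps were taken
assert np.allclose(A @ x, b, atol=1e-6)      # and the residual is small
```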


In the two examples that follow we look at some other easy consequences

of (2.7).

Theorem 2.2.2. Let $A$ be spd with eigenvectors $\{u_i\}_{i=1}^N$. Let $b$ be a linear combination of $k$ of the eigenvectors of $A$,

$$b = \sum_{l=1}^{k} \gamma_l u_{i_l}.$$

Then the CG iteration for $Ax = b$ with $x_0 = 0$ will terminate in at most $k$ iterations.

Proof. Let $\{\lambda_{i_l}\}$ be the eigenvalues of $A$ associated with the eigenvectors $\{u_{i_l}\}_{l=1}^k$. By the spectral theorem

$$x^* = \sum_{l=1}^{k} (\gamma_l / \lambda_{i_l}) u_{i_l}.$$

We use the residual polynomial

$$\bar{p}(z) = \prod_{l=1}^{k} (\lambda_{i_l} - z)/\lambda_{i_l}.$$

One can easily verify that $\bar{p} \in \mathcal{P}_k$. Moreover, $\bar{p}(\lambda_{i_l}) = 0$ for $1 \le l \le k$ and hence

$$\bar{p}(A) x^* = \sum_{l=1}^{k} \bar{p}(\lambda_{i_l}) (\gamma_l / \lambda_{i_l}) u_{i_l} = 0.$$

So, we have by (2.4) and the fact that $x_0 = 0$ that

$$\|x_k - x^*\|_A \le \|\bar{p}(A) x^*\|_A = 0.$$

This completes the proof.

If the spectrum of A has fewer than N points, we can use a similar technique

to prove the following theorem.

Theorem 2.2.3. Let $A$ be spd. Assume that there are exactly $k \le N$ distinct eigenvalues of $A$. Then the CG iteration terminates in at most $k$ iterations.
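Theorem 2.2.3 is easy to observe numerically. The following sketch (our own, with an artificial spd matrix built to have exactly three distinct eigenvalues) runs the standard CG recurrence for $k = 3$ steps and checks that the residual drops to roundoff level:

```python
import numpy as np

def cg_history(A, b, x0, maxit):
    """Run the standard CG recurrence; record ||r_k||_2 at every step."""
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    rho = r @ r
    hist = [np.sqrt(rho)]
    for _ in range(maxit):
        w = A @ p
        alpha = rho / (p @ w)
        x += alpha * p
        r -= alpha * w
        rho_new = r @ r
        hist.append(np.sqrt(rho_new))
        p = r + (rho_new / rho) * p
        rho = rho_new
    return np.array(hist)

rng = np.random.default_rng(3)
N, k = 48, 3
# spd matrix with exactly k = 3 distinct eigenvalues 1, 2, 3.
lam = np.repeat([1.0, 2.0, 3.0], N // k)
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))   # random orthogonal basis
A = Q @ np.diag(lam) @ Q.T
b = rng.standard_normal(N)

hist = cg_history(A, b, np.zeros(N), maxit=k)
# After k iterations the residual is (numerically) zero: Theorem 2.2.3.
assert hist[k] <= 1e-8 * hist[0]
```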

2.3. Termination of the iteration

In practice we do not run the CG iteration until an exact solution is found, but rather terminate once some criterion has been satisfied. One typical criterion is small (say $\le \eta$) relative residuals. This means that we terminate the iteration after

(2.9) $$\|b - Ax_k\|_2 \le \eta \|b\|_2.$$

The error estimates that come from the minimization property, however, are

based on (2.7) and therefore estimate the reduction in the relative A-norm of

the error.


Our next task is to relate the relative residual in the Euclidean norm to

the relative error in the A-norm. We will do this in the next two lemmas and

then illustrate the point with an example.

Lemma 2.3.1. Let $A$ be spd with eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_N$. Then for all $z \in R^N$,

(2.10) $$\|A^{1/2} z\|_2 = \|z\|_A$$

and

(2.11) $$\lambda_N^{1/2} \|z\|_A \le \|Az\|_2 \le \lambda_1^{1/2} \|z\|_A.$$

Proof. Clearly

$$\|z\|_A^2 = z^T A z = (A^{1/2} z)^T (A^{1/2} z) = \|A^{1/2} z\|_2^2,$$

which proves (2.10).

Let $u_i$ be a unit eigenvector corresponding to $\lambda_i$. We may write $A = U \Lambda U^T$ as

$$Az = \sum_{i=1}^{N} \lambda_i (u_i^T z) u_i.$$

Hence

$$\lambda_N \|A^{1/2} z\|_2^2 = \lambda_N \sum_{i=1}^{N} \lambda_i (u_i^T z)^2 \le \|Az\|_2^2 = \sum_{i=1}^{N} \lambda_i^2 (u_i^T z)^2 \le \lambda_1 \sum_{i=1}^{N} \lambda_i (u_i^T z)^2 = \lambda_1 \|A^{1/2} z\|_2^2.$$

Taking square roots and using (2.10) completes the proof.
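(2.10) and (2.11) can be checked directly on random data; the following is an illustrative sketch of ours, not part of the original text:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 8
M = rng.standard_normal((N, N))
A = M @ M.T + np.eye(N)              # spd
lam, U = np.linalg.eigh(A)           # eigenvalues in ascending order
lam_min, lam_max = lam[0], lam[-1]
A_half = U @ np.diag(np.sqrt(lam)) @ U.T

z = rng.standard_normal(N)
z_A = np.sqrt(z @ A @ z)             # ||z||_A

# (2.10): ||A^{1/2} z||_2 = ||z||_A.
assert np.isclose(np.linalg.norm(A_half @ z), z_A)

# (2.11): sqrt(lam_N) ||z||_A <= ||A z||_2 <= sqrt(lam_1) ||z||_A.
Az = np.linalg.norm(A @ z)
assert np.sqrt(lam_min) * z_A <= Az + 1e-12
assert Az <= np.sqrt(lam_max) * z_A + 1e-12
```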

Lemma 2.3.2.

(2.12) $$\frac{\|b\|_2}{\|r_0\|_2} \, \frac{\|b - Ax_k\|_2}{\|b\|_2} = \frac{\|b - Ax_k\|_2}{\|b - Ax_0\|_2} \le \sqrt{\kappa_2(A)} \, \frac{\|x_k - x^*\|_A}{\|x^* - x_0\|_A}$$

and

(2.13) $$\frac{\|b - Ax_k\|_2}{\|b\|_2} \le \frac{\sqrt{\kappa_2(A)} \, \|r_0\|_2}{\|b\|_2} \, \frac{\|x_k - x^*\|_A}{\|x^* - x_0\|_A}.$$

Proof. The equality on the left of (2.12) is clear and (2.13) follows directly from (2.12). To obtain the inequality on the right of (2.12), first recall that if $A = U \Lambda U^T$ is the spectral decomposition of $A$ and we order the eigenvalues such that $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_N > 0$, then $\|A\|_2 = \lambda_1$ and $\|A^{-1}\|_2 = 1/\lambda_N$. So

$$\kappa_2(A) = \lambda_1/\lambda_N.$$

Therefore, using (2.10) and (2.11) twice,

$$\frac{\|b - Ax_k\|_2}{\|b - Ax_0\|_2} = \frac{\|A(x^* - x_k)\|_2}{\|A(x^* - x_0)\|_2} \le \sqrt{\frac{\lambda_1}{\lambda_N}} \, \frac{\|x^* - x_k\|_A}{\|x^* - x_0\|_A}$$

as asserted.
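Since (2.12) and (2.13) hold for an arbitrary iterate $x_k$, they can be checked with any stand-in for the CG iterate. The sketch below (our own illustration, random data) does exactly that:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 10
M = rng.standard_normal((N, N))
A = M @ M.T + np.eye(N)              # spd
lam = np.linalg.eigvalsh(A)          # ascending eigenvalues
kappa = lam[-1] / lam[0]             # kappa_2(A) = lam_1 / lam_N
b = rng.standard_normal(N)
x_star = np.linalg.solve(A, b)

def norm_A(v):
    return np.sqrt(v @ A @ v)

x0 = rng.standard_normal(N)                   # arbitrary initial iterate
xk = x_star + 0.1 * rng.standard_normal(N)    # stand-in for a CG iterate
r0 = b - A @ x0

# (2.12): ||b - A xk||_2 / ||b - A x0||_2
#         <= sqrt(kappa) ||xk - x*||_A / ||x* - x0||_A.
lhs = np.linalg.norm(b - A @ xk) / np.linalg.norm(r0)
rhs = np.sqrt(kappa) * norm_A(xk - x_star) / norm_A(x_star - x0)
assert lhs <= rhs + 1e-12

# (2.13): ||b - A xk||_2 / ||b||_2
#         <= sqrt(kappa) ||r0||_2 / ||b||_2 * ||xk - x*||_A / ||x* - x0||_A.
lhs13 = np.linalg.norm(b - A @ xk) / np.linalg.norm(b)
rhs13 = (np.sqrt(kappa) * np.linalg.norm(r0) / np.linalg.norm(b)
         * norm_A(xk - x_star) / norm_A(x_star - x0))
assert lhs13 <= rhs13 + 1e-12
```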

So, to predict the performance of the CG iteration based on termination on

small relative residuals, we must not only use (2.7) to predict when the relative


A-norm error is small, but also use Lemma 2.3.2 to relate small A-norm errors

to small relative residuals.

We consider a very simple example. Assume that $x_0 = 0$ and that the eigenvalues of $A$ are contained in the interval $(9, 11)$. If we let $\bar{p}_k(z) = (10 - z)^k / 10^k$, then $\bar{p}_k \in \mathcal{P}_k$. This means that we may apply (2.7) to get

$$\|x_k - x^*\|_A \le \|x^*\|_A \max_{9 \le z \le 11} |\bar{p}_k(z)|.$$

It is easy to see that

$$\max_{9 \le z \le 11} |\bar{p}_k(z)| = 10^{-k}.$$

Hence, after $k$ iterations

(2.14) $$\|x_k - x^*\|_A \le \|x^*\|_A \, 10^{-k}.$$

So, the size of the A-norm of the error will be reduced by a factor of $10^{-3}$ when

$$10^{-k} \le 10^{-3},$$

that is, when

$$k \ge 3.$$

To use Lemma 2.3.2 we simply note that $\kappa_2(A) \le 11/9$. Hence, after $k$ iterations we have

$$\frac{\|Ax_k - b\|_2}{\|b\|_2} \le \sqrt{11} \times 10^{-k}/3.$$

So, the size of the relative residual will be reduced by a factor of $10^{-3}$ when

$$10^{-k} \le 3 \times 10^{-3}/\sqrt{11},$$

that is, when

$$k \ge 4.$$
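The two iteration counts in this example each follow from a one-line computation; the following sketch simply re-does the arithmetic:

```python
import math

eta = 1e-3                 # target reduction factor
c = math.sqrt(11.0) / 3.0  # sqrt(kappa_2(A)) <= sqrt(11/9) = sqrt(11)/3

# (2.14): A-norm error reduced by 10**(-k); first k with 10**(-k) <= eta.
k_err = 0
while 10.0 ** (-k_err) > eta:
    k_err += 1
assert k_err == 3

# Residual bound: ||A x_k - b||_2 / ||b||_2 <= sqrt(11) * 10**(-k) / 3;
# first k with sqrt(11) * 10**(-k) / 3 <= eta.
k_res = 0
while c * 10.0 ** (-k_res) > eta:
    k_res += 1
assert k_res == 4
```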

One can obtain a more precise estimate by using a polynomial other than $\bar{p}_k$ in the upper estimate for the right-hand side of (2.7). Note that it is always the case that the spectrum of a spd matrix is contained in the interval $[\lambda_N, \lambda_1]$ and that $\kappa_2(A) = \lambda_1/\lambda_N$. A result from [48] (see also [45]) that is, in one sense, the sharpest possible, is

(2.15) $$\|x_k - x^*\|_A \le 2 \|x_0 - x^*\|_A \left( \frac{\sqrt{\kappa_2(A)} - 1}{\sqrt{\kappa_2(A)} + 1} \right)^k.$$

In the case of the above example, we can estimate $\kappa_2(A)$ by $\kappa_2(A) \le 11/9$. Hence, since $(\sqrt{x} - 1)/(\sqrt{x} + 1)$ is an increasing function of $x$ on the interval $(1, \infty)$,

$$\frac{\sqrt{\kappa_2(A)} - 1}{\sqrt{\kappa_2(A)} + 1} \le \frac{\sqrt{11} - 3}{\sqrt{11} + 3} \approx .05.$$
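The final estimate is again a one-line computation. The sketch below verifies that $(\sqrt{\kappa_2(A)} - 1)/(\sqrt{\kappa_2(A)} + 1)$ with $\kappa_2(A) = 11/9$ equals $(\sqrt{11} - 3)/(\sqrt{11} + 3) \approx .05$:

```python
import math

kappa = 11.0 / 9.0
# Convergence factor in (2.15) for kappa_2(A) = 11/9.
factor = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)

# Equivalently (sqrt(11) - 3) / (sqrt(11) + 3), approximately .05.
alt = (math.sqrt(11.0) - 3.0) / (math.sqrt(11.0) + 3.0)
assert abs(factor - alt) < 1e-12
assert abs(factor - 0.05) < 0.002
```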