and hence $x = x^*$.

Minimizing $\phi$ over any subset of $\mathbf{R}^N$ is the same as minimizing $\|x - x^*\|_A$ over that subset. We state this as a lemma.

Lemma 2.1.1. Let $S \subset \mathbf{R}^N$. If $x_k$ minimizes $\phi$ over $S$ then $x_k$ also minimizes $\|x^* - x\|_A = \|r\|_{A^{-1}}$ over $S$.
Proof. Note that
$$\|x - x^*\|_A^2 = (x - x^*)^T A (x - x^*) = x^T A x - x^T A x^* - (x^*)^T A x + (x^*)^T A x^*.$$
Since $A$ is symmetric and $Ax^* = b$,
$$-x^T A x^* - (x^*)^T A x = -2 x^T A x^* = -2 x^T b.$$
Therefore
$$\|x - x^*\|_A^2 = 2\phi(x) + (x^*)^T A x^*.$$
Since $(x^*)^T A x^*$ is independent of $x$, minimizing $\phi$ is equivalent to minimizing $\|x - x^*\|_A^2$ and hence to minimizing $\|x - x^*\|_A$.

If $e = x - x^*$ then
$$\|e\|_A^2 = e^T A e = (A(x - x^*))^T A^{-1} (A(x - x^*)) = \|b - Ax\|_{A^{-1}}^2$$
and hence the $A$-norm of the error is also the $A^{-1}$-norm of the residual. We will use this lemma in the particular case that $S = x_0 + \mathcal{K}_k$ for some $k$.
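As a quick numerical sanity check of Lemma 2.1.1, the sketch below (an illustration, not from the book; it recalls from earlier in the chapter that $\phi(x) = \frac{1}{2}x^T A x - x^T b$) verifies both identities from the proof on a random spd matrix.

```python
# Numerical check of Lemma 2.1.1 (illustration only). We verify that
# ||x - x*||_A^2 = 2 phi(x) + (x*)^T A x*, and that the A-norm of the
# error equals the A^{-1}-norm of the residual.
import numpy as np

rng = np.random.default_rng(0)
N = 5
M = rng.standard_normal((N, N))
A = M @ M.T + N * np.eye(N)          # symmetric positive definite
b = rng.standard_normal(N)
x_star = np.linalg.solve(A, b)       # A x* = b

x = rng.standard_normal(N)           # an arbitrary trial point
phi = 0.5 * x @ A @ x - x @ b        # phi(x) = x^T A x / 2 - x^T b

e = x - x_star
r = b - A @ x
err_A_sq = e @ A @ e                 # ||x - x*||_A^2
print(np.isclose(err_A_sq, 2 * phi + x_star @ A @ x_star))   # True
print(np.isclose(err_A_sq, r @ np.linalg.solve(A, r)))       # True
```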




2.2. Consequences of the minimization property
Lemma 2.1.1 implies that since $x_k$ minimizes $\phi$ over $x_0 + \mathcal{K}_k$,
$$(2.3) \qquad \|x^* - x_k\|_A \le \|x^* - w\|_A$$
for all $w \in x_0 + \mathcal{K}_k$. Since any $w \in x_0 + \mathcal{K}_k$ can be written as
$$w = \sum_{j=0}^{k-1} \gamma_j A^j r_0 + x_0$$
for some coefficients $\{\gamma_j\}$, we can express $x^* - w$ as
$$x^* - w = x^* - x_0 - \sum_{j=0}^{k-1} \gamma_j A^j r_0.$$
Since $Ax^* = b$ we have
$$r_0 = b - Ax_0 = A(x^* - x_0)$$
and therefore
$$x^* - w = x^* - x_0 - \sum_{j=0}^{k-1} \gamma_j A^{j+1}(x^* - x_0) = p(A)(x^* - x_0),$$
where the polynomial
$$p(z) = 1 - \sum_{j=0}^{k-1} \gamma_j z^{j+1}$$
has degree $k$ and satisfies $p(0) = 1$. Hence
$$(2.4) \qquad \|x^* - x_k\|_A = \min_{p \in \mathcal{P}_k,\, p(0)=1} \|p(A)(x^* - x_0)\|_A.$$
In (2.4) $\mathcal{P}_k$ denotes the set of polynomials of degree $k$.
The spectral theorem for spd matrices asserts that
$$A = U \Lambda U^T,$$
where $U$ is an orthogonal matrix whose columns are the eigenvectors of $A$ and $\Lambda$ is a diagonal matrix with the positive eigenvalues of $A$ on the diagonal. Since $UU^T = U^TU = I$ by orthogonality of $U$, we have
$$A^j = U \Lambda^j U^T.$$
Hence
$$p(A) = U p(\Lambda) U^T.$$
Define $A^{1/2} = U \Lambda^{1/2} U^T$ and note that
$$(2.5) \qquad \|x\|_A^2 = x^T A x = \|A^{1/2} x\|_2^2.$$
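These identities are easy to check numerically. The following sketch (an illustration, not the book's code) builds $A^{1/2} = U\Lambda^{1/2}U^T$ from the spectral decomposition and verifies (2.5) along with $p(A) = U p(\Lambda) U^T$ for a sample polynomial.

```python
# Sketch (illustration only): construct A^{1/2} from the spectral
# decomposition, then check (2.5) and p(A) = U p(Lambda) U^T.
import numpy as np

rng = np.random.default_rng(1)
N = 6
M = rng.standard_normal((N, N))
A = M @ M.T + N * np.eye(N)                 # spd

lam, U = np.linalg.eigh(A)                  # A = U diag(lam) U^T, lam > 0
A_half = U @ np.diag(np.sqrt(lam)) @ U.T    # A^{1/2}

x = rng.standard_normal(N)
print(np.isclose(np.sqrt(x @ A @ x), np.linalg.norm(A_half @ x)))  # (2.5)

p = lambda z: 1.0 - 0.3 * z + 0.02 * z**2   # a sample polynomial
pA = np.eye(N) - 0.3 * A + 0.02 * A @ A
print(np.allclose(pA, U @ np.diag(p(lam)) @ U.T))                  # True
```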





Hence, for any $x \in \mathbf{R}^N$,
$$\|p(A)x\|_A = \|A^{1/2} p(A) x\|_2 \le \|p(A)\|_2\, \|A^{1/2} x\|_2 \le \|p(A)\|_2\, \|x\|_A.$$
This, together with (2.4), implies that
$$(2.6) \qquad \|x_k - x^*\|_A \le \|x_0 - x^*\|_A \min_{p \in \mathcal{P}_k,\, p(0)=1}\; \max_{z \in \sigma(A)} |p(z)|.$$
Here $\sigma(A)$ is the set of all eigenvalues of $A$.
The following corollary is an important consequence of (2.6).

Corollary 2.2.1. Let $A$ be spd and let $\{x_k\}$ be the CG iterates. Let $k$ be given and let $\bar{p}_k$ be any $k$th degree polynomial such that $\bar{p}_k(0) = 1$. Then
$$(2.7) \qquad \frac{\|x_k - x^*\|_A}{\|x_0 - x^*\|_A} \le \max_{z \in \sigma(A)} |\bar{p}_k(z)|.$$

We will refer to the polynomial $\bar{p}_k$ as a residual polynomial [185].

Definition 2.2.1. The set of $k$th degree residual polynomials is
$$(2.8) \qquad \mathcal{P}_k = \{\, p \mid p \text{ is a polynomial of degree } k \text{ and } p(0) = 1 \,\}.$$

In specific contexts we try to construct sequences of residual polynomials, based on information on $\sigma(A)$, that make either the middle or the right term in (2.7) easy to evaluate. This leads to an upper estimate for the number of CG iterations required to reduce the $A$-norm of the error to a given tolerance.
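For instance, the following sketch (an illustration under the assumption that $\sigma(A)$ lies in $[1, 3]$; neither the interval nor the polynomial comes from the book) uses the residual polynomial $\bar{p}_k(z) = (1 - z/2)^k$ to compute the smallest $k$ for which the right term in (2.7) falls below a tolerance.

```python
# Sketch: estimating an iteration count from (2.7), assuming sigma(A) is
# contained in [1, 3] and using the residual polynomial p_k(z) = (1 - z/2)^k.
import numpy as np

lam = np.linspace(1.0, 3.0, 1001)   # sample of the assumed spectral interval
tol = 1e-3                          # target reduction of the A-norm error

k = 1
while np.max(np.abs((1.0 - lam / 2.0) ** k)) > tol:
    k += 1
print(k)   # 10, since max |1 - z/2| = 1/2 on [1, 3] and 2^-10 <= 10^-3
```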
One simple application of (2.7) is to show how the CG algorithm can be
viewed as a direct method.
Theorem 2.2.1. Let $A$ be spd. Then the CG algorithm will find the solution within $N$ iterations.

Proof. Let $\{\lambda_i\}_{i=1}^N$ be the eigenvalues of $A$. As a test polynomial, let
$$\bar{p}(z) = \prod_{i=1}^{N} (\lambda_i - z)/\lambda_i.$$
$\bar{p} \in \mathcal{P}_N$ because $\bar{p}$ has degree $N$ and $\bar{p}(0) = 1$. Hence, by (2.7) and the fact that $\bar{p}$ vanishes on $\sigma(A)$,
$$\|x_N - x^*\|_A \le \|x_0 - x^*\|_A \max_{z \in \sigma(A)} |\bar{p}(z)| = 0.$$

Note that our test polynomial had the eigenvalues of $A$ as its roots. In that way we showed (in the absence of all roundoff error!) that CG terminated in finitely many iterations with the exact solution. This is not as good as it sounds, since in most applications the number of unknowns $N$ is very large, and one cannot afford to perform $N$ iterations. It is best to regard CG as an iterative method. When doing that we seek to terminate the iteration when some specified error tolerance is reached.
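The following minimal CG sketch (the standard recurrence, written here for illustration; it is not taken from the book) runs at most $N$ steps on a small random spd system. In floating point the final error is tiny rather than exactly zero, in line with the remark about roundoff.

```python
# A minimal CG iteration (standard recurrence, illustration only) run on a
# small spd system; by Theorem 2.2.1 at most N steps are needed.
import numpy as np

rng = np.random.default_rng(2)
N = 8
M = rng.standard_normal((N, N))
A = M @ M.T + N * np.eye(N)            # spd test matrix
b = rng.standard_normal(N)
x_star = np.linalg.solve(A, b)         # exact solution for comparison

x = np.zeros(N)
r = b - A @ x                          # residual
p = r.copy()                           # first search direction
for k in range(N):
    Ap = A @ p
    alpha = (r @ r) / (p @ Ap)
    x += alpha * p
    r_new = r - alpha * Ap
    if np.linalg.norm(r_new) < 1e-14 * np.linalg.norm(b):
        break                          # converged early
    p = r_new + (r_new @ r_new) / (r @ r) * p
    r = r_new
print(np.linalg.norm(x - x_star))      # tiny: "exact" up to roundoff
```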




In the two examples that follow we look at some other easy consequences of (2.7).

Theorem 2.2.2. Let $A$ be spd with eigenvectors $\{u_i\}_{i=1}^N$. Let $b$ be a linear combination of $k$ of the eigenvectors of $A$,
$$b = \sum_{l=1}^{k} \gamma_l u_{i_l}.$$
Then the CG iteration for $Ax = b$ with $x_0 = 0$ will terminate in at most $k$ iterations.
Proof. Let $\{\lambda_{i_l}\}$ be the eigenvalues of $A$ associated with the eigenvectors $\{u_{i_l}\}_{l=1}^k$. By the spectral theorem
$$x^* = \sum_{l=1}^{k} (\gamma_l/\lambda_{i_l})\, u_{i_l}.$$
We use the residual polynomial
$$\bar{p}(z) = \prod_{l=1}^{k} (\lambda_{i_l} - z)/\lambda_{i_l}.$$
One can easily verify that $\bar{p} \in \mathcal{P}_k$. Moreover, $\bar{p}(\lambda_{i_l}) = 0$ for $1 \le l \le k$ and hence
$$\bar{p}(A)x^* = \sum_{l=1}^{k} \bar{p}(\lambda_{i_l})(\gamma_l/\lambda_{i_l})\, u_{i_l} = 0.$$
So, we have by (2.4) and the fact that $x_0 = 0$ that
$$\|x_k - x^*\|_A \le \|\bar{p}(A)x^*\|_A = 0.$$
This completes the proof.
If the spectrum of A has fewer than N points, we can use a similar technique
to prove the following theorem.
Theorem 2.2.3. Let $A$ be spd. Assume that there are exactly $k \le N$ distinct eigenvalues of $A$. Then the CG iteration terminates in at most $k$ iterations.
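A numerical sketch of the mechanism behind Theorems 2.2.2 and 2.2.3 (an illustration, not from the book): when $b$ lies in the span of $k$ eigenvectors, the degree-$k$ residual polynomial built from the corresponding eigenvalues annihilates $x^*$, exactly as in the proof above.

```python
# Sketch of the argument behind Theorems 2.2.2-2.2.3 (illustration only):
# if b combines k eigenvectors, the degree-k residual polynomial built
# from their eigenvalues annihilates x*.
import numpy as np

rng = np.random.default_rng(3)
N, k = 8, 3
lam = np.arange(1.0, N + 1.0)          # eigenvalues 1, 2, ..., N
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
A = Q @ np.diag(lam) @ Q.T             # spd, eigenvectors = columns of Q

b = Q[:, :k] @ rng.standard_normal(k)  # b in the span of k eigenvectors
x_star = np.linalg.solve(A, b)

pA = np.eye(N)                         # p(A) = prod_l (lam_l I - A)/lam_l
for l in range(k):
    pA = pA @ (lam[l] * np.eye(N) - A) / lam[l]
print(np.linalg.norm(pA @ x_star))     # essentially zero: p(A) x* = 0
```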

2.3. Termination of the iteration
In practice we do not run the CG iteration until an exact solution is found, but rather terminate once some criterion has been satisfied. One typical criterion is small (say $\le \eta$) relative residuals. This means that we terminate the iteration after
$$(2.9) \qquad \|b - Ax_k\|_2 \le \eta\, \|b\|_2.$$
The error estimates that come from the minimization property, however, are based on (2.7) and therefore estimate the reduction in the relative $A$-norm of the error.
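A sketch of the criterion (2.9) in use (the same CG recurrence as in the earlier sketch; $\eta = 10^{-6}$ is an arbitrary choice for illustration):

```python
# The CG recurrence terminated by the relative-residual test (2.9).
# In practice one reuses the recursively updated residual r rather than
# recomputing b - A @ x on every pass; this sketch favors clarity.
import numpy as np

rng = np.random.default_rng(4)
N = 50
M = rng.standard_normal((N, N))
A = M @ M.T + N * np.eye(N)            # spd test matrix
b = rng.standard_normal(N)
eta = 1e-6

x = np.zeros(N)
r = b - A @ x
p = r.copy()
k = 0
while np.linalg.norm(b - A @ x) > eta * np.linalg.norm(b):   # test (2.9)
    Ap = A @ p
    alpha = (r @ r) / (p @ Ap)
    x += alpha * p
    r_new = r - alpha * Ap
    p = r_new + (r_new @ r_new) / (r @ r) * p
    r = r_new
    k += 1
print(k, np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```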




Our next task is to relate the relative residual in the Euclidean norm to
the relative error in the A-norm. We will do this in the next two lemmas and
then illustrate the point with an example.
Lemma 2.3.1. Let $A$ be spd with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_N$. Then for all $z \in \mathbf{R}^N$,
$$(2.10) \qquad \|A^{1/2} z\|_2 = \|z\|_A$$
and
$$(2.11) \qquad \lambda_N^{1/2}\, \|z\|_A \le \|Az\|_2 \le \lambda_1^{1/2}\, \|z\|_A.$$

Proof. Clearly
$$\|z\|_A^2 = z^T A z = (A^{1/2} z)^T (A^{1/2} z) = \|A^{1/2} z\|_2^2,$$
which proves (2.10).

Let $u_i$ be a unit eigenvector corresponding to $\lambda_i$. We may write $A = U \Lambda U^T$ as
$$Az = \sum_{i=1}^{N} \lambda_i (u_i^T z)\, u_i.$$
Hence
$$\lambda_N \|A^{1/2} z\|_2^2 = \lambda_N \sum_{i=1}^{N} \lambda_i (u_i^T z)^2 \le \|Az\|_2^2 = \sum_{i=1}^{N} \lambda_i^2 (u_i^T z)^2 \le \lambda_1 \sum_{i=1}^{N} \lambda_i (u_i^T z)^2 = \lambda_1 \|A^{1/2} z\|_2^2.$$
Taking square roots and using (2.10) complete the proof.
Lemma 2.3.2.
$$(2.12) \qquad \frac{\|b - Ax_k\|_2}{\|r_0\|_2} = \frac{\|b - Ax_k\|_2}{\|b - Ax_0\|_2} \le \sqrt{\kappa_2(A)}\, \frac{\|x_k - x^*\|_A}{\|x^* - x_0\|_A}$$
and
$$(2.13) \qquad \frac{\|b - Ax_k\|_2}{\|b\|_2} \le \frac{\sqrt{\kappa_2(A)}\, \|r_0\|_2}{\|b\|_2}\, \frac{\|x_k - x^*\|_A}{\|x^* - x_0\|_A}.$$
Proof. The equality on the left of (2.12) is clear and (2.13) follows directly from (2.12). To obtain the inequality on the right of (2.12), first recall that if $A = U \Lambda U^T$ is the spectral decomposition of $A$ and we order the eigenvalues such that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_N > 0$, then $\|A\|_2 = \lambda_1$ and $\|A^{-1}\|_2 = 1/\lambda_N$. So
$$\kappa_2(A) = \lambda_1/\lambda_N.$$
Therefore, using (2.10) and (2.11) twice,
$$\frac{\|b - Ax_k\|_2}{\|b - Ax_0\|_2} = \frac{\|A(x^* - x_k)\|_2}{\|A(x^* - x_0)\|_2} \le \sqrt{\frac{\lambda_1}{\lambda_N}}\, \frac{\|x^* - x_k\|_A}{\|x^* - x_0\|_A}$$
as asserted.
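Lemma 2.3.2 is easy to test numerically. The sketch below (an illustration, not from the book) checks the inequality in (2.12); note that the proof never uses that $x_k$ is a CG iterate, so the bound holds for arbitrary points.

```python
# Numerical check of (2.12) (illustration only). The bound holds for any
# pair of points x0, xk, since the proof only uses (2.10) and (2.11).
import numpy as np

rng = np.random.default_rng(5)
N = 6
M = rng.standard_normal((N, N))
A = M @ M.T + N * np.eye(N)            # spd
b = rng.standard_normal(N)
x_star = np.linalg.solve(A, b)
x0, xk = rng.standard_normal(N), rng.standard_normal(N)

lam = np.linalg.eigvalsh(A)            # ascending eigenvalues
kappa = lam[-1] / lam[0]               # kappa_2(A) = lam_1/lam_N

a_norm = lambda z: np.sqrt(z @ A @ z)
lhs = np.linalg.norm(b - A @ xk) / np.linalg.norm(b - A @ x0)
rhs = np.sqrt(kappa) * a_norm(xk - x_star) / a_norm(x0 - x_star)
print(lhs <= rhs)                      # True
```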
So, to predict the performance of the CG iteration based on termination on
small relative residuals, we must not only use (2.7) to predict when the relative




A-norm error is small, but also use Lemma 2.3.2 to relate small A-norm errors
to small relative residuals.
We consider a very simple example. Assume that $x_0 = 0$ and that the eigenvalues of $A$ are contained in the interval $(9, 11)$. If we let $\bar{p}_k(z) = (10 - z)^k/10^k$, then $\bar{p}_k \in \mathcal{P}_k$. This means that we may apply (2.7) to get
$$\|x_k - x^*\|_A \le \|x^*\|_A \max_{9 \le z \le 11} |\bar{p}_k(z)|.$$

It is easy to see that
$$\max_{9 \le z \le 11} |\bar{p}_k(z)| = 10^{-k}.$$
Hence, after $k$ iterations
$$(2.14) \qquad \|x_k - x^*\|_A \le \|x^*\|_A\, 10^{-k}.$$

So, the size of the $A$-norm of the error will be reduced by a factor of $10^{-3}$ when
$$10^{-k} \le 10^{-3},$$
that is, when $k \ge 3$.
To use Lemma 2.3.2 we simply note that $\kappa_2(A) \le 11/9$. Hence, after $k$ iterations we have
$$\frac{\|Ax_k - b\|_2}{\|b\|_2} \le \sqrt{11} \times 10^{-k}/3.$$
So, the size of the relative residual will be reduced by a factor of $10^{-3}$ when
$$10^{-k} \le 3 \times 10^{-3}/\sqrt{11},$$
that is, when $k \ge 4$.
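The two counts can be reproduced mechanically (a sketch; the logarithm form simply restates the inequalities above):

```python
# Reproducing the iteration counts of the example (illustration only):
# the A-norm bound needs 10^-k <= 10^-3, the residual bound needs
# 10^-k <= 3*10^-3/sqrt(11); taking log10 gives the smallest such k.
import numpy as np

target = 1e-3
k_err = int(np.ceil(-np.log10(target)))                        # (2.14) test
k_res = int(np.ceil(-np.log10(3.0 * target / np.sqrt(11.0))))  # residual test
print(k_err, k_res)                                            # 3 4
```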
One can obtain a more precise estimate by using a polynomial other than $\bar{p}_k$ in the upper estimate for the right-hand side of (2.7). Note that it is always the case that the spectrum of a spd matrix is contained in the interval $[\lambda_N, \lambda_1]$ and that $\kappa_2(A) = \lambda_1/\lambda_N$. A result from [48] (see also [45]) that is, in one sense, the sharpest possible, is
$$(2.15) \qquad \|x_k - x^*\|_A \le 2\, \|x_0 - x^*\|_A \left( \frac{\sqrt{\kappa_2(A)} - 1}{\sqrt{\kappa_2(A)} + 1} \right)^k.$$
In the case of the above example, we can estimate $\kappa_2(A)$ by $\kappa_2(A) \le 11/9$. Hence, since $(\sqrt{x} - 1)/(\sqrt{x} + 1)$ is an increasing function of $x$ on the interval $(1, \infty)$,
$$\frac{\sqrt{\kappa_2(A)} - 1}{\sqrt{\kappa_2(A)} + 1} \le \frac{\sqrt{11} - 3}{\sqrt{11} + 3} \approx .05.$$
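Evaluating (2.15) for this example (a sketch; `cg_iters` is a hypothetical helper name, not from the book): with $\kappa_2(A) \le 11/9$ the contraction factor is about $.05$ per step, and the predicted count matches the $k \ge 3$ obtained from the simple bound (2.14). The advantage of (2.15) shows up for larger condition numbers, where the $\sqrt{\kappa_2(A)}$ dependence dominates.

```python
# Evaluating (2.15) (illustration only): smallest k such that
# 2*((sqrt(kappa)-1)/(sqrt(kappa)+1))^k <= tol.
import numpy as np

def cg_iters(kappa, tol=1e-3):
    rho = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)
    k = 1
    while 2.0 * rho ** k > tol:
        k += 1
    return k

print(cg_iters(11.0 / 9.0))   # 3, matching the k >= 3 from (2.14)
print(cg_iters(100.0))        # 38: growth is like sqrt(kappa), not kappa
```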


