
Therefore (2.15) would predict a reduction in the size of the $A$-norm error by
a factor of $10^{-3}$ when
\[
2 \times (.05)^k < 10^{-3}
\]
or when
\[
k > -\log_{10}(2000)/\log_{10}(.05) \approx 3.3/1.3 \approx 2.6,
\]
which also predicts termination within three iterations.
We may have more precise information than a single interval containing
σ(A). When we do, the estimate in (2.15) can be very pessimistic. If the
eigenvalues cluster in a small number of intervals, the condition number can
be quite large, but CG can perform very well. We will illustrate this with an
example. Exercise 2.8.5 also covers this point.
Assume that $x_0 = 0$ and the eigenvalues of $A$ lie in the two intervals $(1, 1.5)$
and $(399, 400)$. Based on this information the best estimate of the condition
number of $A$ is $\kappa_2(A) \le 400$, which, when inserted into (2.15), gives
\[
\frac{\|x_k - x^*\|_A}{\|x^*\|_A} \le 2 \times (19/21)^k \approx 2 \times (.91)^k .
\]
This would indicate fairly slow convergence. However, we may use as a residual
polynomial the $\bar{p}_{3k} \in \mathcal{P}_{3k}$ given by
\[
\bar{p}_{3k}(z) = \frac{(1.25 - z)^k (400 - z)^{2k}}{(1.25)^k \times 400^{2k}} .
\]
It is easy to see that
\[
\max_{z \in \sigma(A)} |\bar{p}_{3k}(z)| \le (.25/1.25)^k = (.2)^k ,
\]

which is a sharper estimate on convergence. In fact, (2.15) would predict that

\[
\|x_k - x^*\|_A \le 10^{-3} \|x^*\|_A
\]
when $2 \times (.91)^k < 10^{-3}$ or when
\[
k > -\log_{10}(2000)/\log_{10}(.91) \approx 3.3/.04 = 82.5 .
\]

The estimate based on the clustering gives convergence in $3k$ iterations when
\[
(.2)^k \le 10^{-3}
\]
or when
\[
k > -3/\log_{10}(.2) = 4.3 .
\]
Hence (2.15) predicts 83 iterations and the clustering analysis 15 (the smallest
integer multiple of 3 larger than $3 \times 4.3 = 12.9$).
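This comparison is easy to check numerically. The sketch below is only an illustration written for this discussion, not part of the book's own codes; it builds a diagonal matrix with eigenvalues in the two intervals above, runs CG from $x_0 = 0$, and reports when the $A$-norm error has been reduced by a factor of $10^{-3}$. All names in it are ours.

```python
import numpy as np

# Diagonal test matrix with eigenvalues clustered in (1, 1.5) and (399, 400).
n = 200
eigenvalues = np.concatenate([np.linspace(1.01, 1.49, n // 2),
                              np.linspace(399.01, 399.99, n // 2)])
A = np.diag(eigenvalues)
x_star = np.ones(n)          # chosen exact solution
b = A @ x_star

def a_norm(v):
    """A-norm of v, i.e. sqrt(v^T A v)."""
    return np.sqrt(v @ (A @ v))

# Plain CG from x0 = 0, stopping when the A-norm error drops by 10^{-3}.
x = np.zeros(n)
r = b - A @ x
p = r.copy()
rho = r @ r
target = 1e-3 * a_norm(x_star)
for k in range(1, 101):
    w = A @ p
    alpha = rho / (p @ w)
    x = x + alpha * p
    r = r - alpha * w
    if a_norm(x - x_star) <= target:
        print(f"A-norm error reduced by 1e-3 in {k} iterations")
        break
    rho, rho_old = r @ r, rho
    beta = rho / rho_old
    p = r + beta * p
```

In experiments of this kind the loop exits well within the 15 iterations guaranteed by the clustering bound, and far sooner than the 83 suggested by the condition number alone.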
From the results above one can see that if the condition number of A is near
one, the CG iteration will converge very rapidly. Even if the condition number
is large, the iteration will perform well if the eigenvalues are clustered in a few
small intervals. The transformation of the problem into one with eigenvalues
clustered near one (i.e., easier to solve) is called preconditioning. We used
this term before in the context of Richardson iteration and accomplished the
goal by multiplying A by an approximate inverse. In the context of CG, such
a simple approach can destroy the symmetry of the coefficient matrix and a
more subtle implementation is required. We discuss this in § 2.5.

2.4. Implementation
The implementation of CG depends on the amazing fact that once $x_k$ has been
determined, either $x_k = x^*$ or a search direction $p_{k+1} \neq 0$ can be found very
cheaply so that $x_{k+1} = x_k + \alpha_{k+1} p_{k+1}$ for some scalar $\alpha_{k+1}$. Once $p_{k+1}$ has
been found, $\alpha_{k+1}$ is easy to compute from the minimization property of the
iteration. In fact
\[
(2.16) \qquad \frac{d\phi(x_k + \alpha p_{k+1})}{d\alpha} = 0
\]
for the correct choice of $\alpha = \alpha_{k+1}$. Equation (2.16) can be written as
\[
p_{k+1}^T A x_k + \alpha \, p_{k+1}^T A p_{k+1} - p_{k+1}^T b = 0 ,
\]
leading to
\[
(2.17) \qquad \alpha_{k+1} = \frac{p_{k+1}^T (b - A x_k)}{p_{k+1}^T A p_{k+1}} = \frac{p_{k+1}^T r_k}{p_{k+1}^T A p_{k+1}} .
\]
If $x_k = x_{k+1}$ then the above analysis implies that $\alpha_{k+1} = 0$. We show that
this only happens if $x_k$ is the solution.
Lemma 2.4.1. Let $A$ be spd and let $\{x_k\}$ be the conjugate gradient iterates.
Then
\[
(2.18) \qquad r_k^T r_l = 0 \quad \text{for all } 0 \le l < k .
\]
Proof. Since $x_k$ minimizes $\phi$ on $x_0 + \mathcal{K}_k$, we have, for any $\xi \in \mathcal{K}_k$,
\[
\frac{d\phi(x_k + t\xi)}{dt} = \nabla\phi(x_k + t\xi)^T \xi = 0
\]
at $t = 0$. Recalling that
\[
\nabla\phi(x) = Ax - b = -r
\]
we have
\[
(2.19) \qquad \nabla\phi(x_k)^T \xi = -r_k^T \xi = 0 \quad \text{for all } \xi \in \mathcal{K}_k .
\]
Since $r_l \in \mathcal{K}_k$ for all $l < k$ (see Exercise 2.8.1), this proves (2.18).
Now, if $x_k = x_{k+1}$, then $r_k = r_{k+1}$. Lemma 2.4.1 then implies that
$\|r_k\|_2^2 = r_k^T r_k = r_k^T r_{k+1} = 0$ and hence $x_k = x^*$.
The next lemma characterizes the search direction and, as a side effect,
proves that (if we define $p_0 = 0$) $p_l^T r_k = 0$ for all $0 \le l < k \le n$, unless the
iteration terminates prematurely.




Lemma 2.4.2. Let $A$ be spd and let $\{x_k\}$ be the conjugate gradient iterates.
If $x_k \neq x^*$ then $x_{k+1} = x_k + \alpha_{k+1} p_{k+1}$ and $p_{k+1}$ is determined up to a scalar
multiple by the conditions
\[
(2.20) \qquad p_{k+1} \in \mathcal{K}_{k+1}, \quad p_{k+1}^T A \xi = 0 \quad \text{for all } \xi \in \mathcal{K}_k .
\]

Proof. Since $\mathcal{K}_k \subset \mathcal{K}_{k+1}$,
\[
(2.21) \qquad \nabla\phi(x_{k+1})^T \xi = (A x_k + \alpha_{k+1} A p_{k+1} - b)^T \xi = 0
\]
for all $\xi \in \mathcal{K}_k$. (2.19) and (2.21) then imply that for all $\xi \in \mathcal{K}_k$,
\[
(2.22) \qquad \alpha_{k+1} p_{k+1}^T A \xi = -(A x_k - b)^T \xi = -\nabla\phi(x_k)^T \xi = 0 .
\]
This uniquely specifies the direction of $p_{k+1}$, as (2.22) implies that $p_{k+1} \in \mathcal{K}_{k+1}$
is $A$-orthogonal (i.e., in the scalar product $(x, y) = x^T A y$) to $\mathcal{K}_k$, a subspace
of dimension one less than $\mathcal{K}_{k+1}$.
The condition $p_{k+1}^T A \xi = 0$ is called $A$-conjugacy of $p_{k+1}$ to $\mathcal{K}_k$. Now, any
$p_{k+1}$ satisfying (2.20) can, up to a scalar multiple, be expressed as
\[
p_{k+1} = r_k + w_k
\]
with $w_k \in \mathcal{K}_k$. While one might think that $w_k$ would be hard to compute, it
is, in fact, trivial. We have the following theorem.
Theorem 2.4.1. Let $A$ be spd and assume that $r_k \neq 0$. Define $p_0 = 0$.
Then
\[
(2.23) \qquad p_{k+1} = r_k + \beta_{k+1} p_k \quad \text{for some } \beta_{k+1} \text{ and } k \ge 0 .
\]
Proof. By Lemma 2.4.2 and the fact that $\mathcal{K}_k = \mathrm{span}(r_0, \ldots, r_{k-1})$, we need
only verify that a $\beta_{k+1}$ can be found so that if $p_{k+1}$ is given by (2.23) then
\[
p_{k+1}^T A r_l = 0
\]
for all $0 \le l \le k - 1$.
Let $p_{k+1}$ be given by (2.23). Then for any $l \le k$
\[
p_{k+1}^T A r_l = r_k^T A r_l + \beta_{k+1} p_k^T A r_l .
\]
If $l \le k - 2$, then $r_l \in \mathcal{K}_{l+1} \subset \mathcal{K}_{k-1}$. Lemma 2.4.2 then implies that
\[
p_{k+1}^T A r_l = 0 \quad \text{for } 0 \le l \le k - 2 .
\]
It only remains to solve for $\beta_{k+1}$ so that $p_{k+1}^T A r_{k-1} = 0$. Trivially
\[
(2.24) \qquad \beta_{k+1} = -r_k^T A r_{k-1} / p_k^T A r_{k-1}
\]
provided $p_k^T A r_{k-1} \neq 0$. Since
\[
r_k = r_{k-1} - \alpha_k A p_k
\]
we have
\[
r_k^T r_{k-1} = \|r_{k-1}\|_2^2 - \alpha_k p_k^T A r_{k-1} .
\]
Since $r_k^T r_{k-1} = 0$ by Lemma 2.4.1 we have
\[
(2.25) \qquad p_k^T A r_{k-1} = \|r_{k-1}\|_2^2 / \alpha_k \neq 0 .
\]
This completes the proof.
The common implementation of conjugate gradient uses a different form
for $\alpha_k$ and $\beta_k$ than given in (2.17) and (2.24).
Lemma 2.4.3. Let $A$ be spd and assume that $r_k \neq 0$. Then
\[
(2.26) \qquad \alpha_{k+1} = \frac{\|r_k\|_2^2}{p_{k+1}^T A p_{k+1}}
\]
and
\[
(2.27) \qquad \beta_{k+1} = \frac{\|r_k\|_2^2}{\|r_{k-1}\|_2^2} .
\]
Proof. Note that for $k \ge 0$
\[
(2.28) \qquad p_{k+1}^T r_{k+1} = r_k^T r_{k+1} + \beta_{k+1} p_k^T r_{k+1} = 0
\]
by Lemma 2.4.2. An immediate consequence of (2.28) is that $p_k^T r_k = 0$ and
hence
\[
(2.29) \qquad p_{k+1}^T r_k = (r_k + \beta_{k+1} p_k)^T r_k = \|r_k\|_2^2 .
\]

Taking scalar products of both sides of
\[
r_{k+1} = r_k - \alpha_{k+1} A p_{k+1}
\]
with $p_{k+1}$ and using (2.29) gives
\[
0 = p_{k+1}^T r_k - \alpha_{k+1} p_{k+1}^T A p_{k+1} = \|r_k\|_2^2 - \alpha_{k+1} p_{k+1}^T A p_{k+1} ,
\]
which is equivalent to (2.26).
To get (2.27) note that $p_{k+1}^T A p_k = 0$ and hence (2.23) implies that
\[
(2.30) \qquad \beta_{k+1} = \frac{-r_k^T A p_k}{p_k^T A p_k} .
\]
Also note that
\[
(2.31) \qquad p_k^T A p_k = p_k^T A (r_{k-1} + \beta_k p_{k-1}) = p_k^T A r_{k-1} + \beta_k p_k^T A p_{k-1} = p_k^T A r_{k-1} .
\]
Now combine (2.30), (2.31), and (2.25) to get
\[
\beta_{k+1} = \frac{-r_k^T A p_k \, \alpha_k}{\|r_{k-1}\|_2^2} .
\]





Now take scalar products of both sides of
\[
r_k = r_{k-1} - \alpha_k A p_k
\]
with $r_k$ and use Lemma 2.4.1 to get
\[
\|r_k\|_2^2 = -\alpha_k r_k^T A p_k .
\]
Hence (2.27) holds.
The usual implementation reflects all of the above results. The goal is to
find, for a given $\epsilon$, a vector $x$ so that $\|b - Ax\|_2 \le \epsilon \|b\|_2$. The input is the initial
iterate $x$, which is overwritten with the solution, the right hand side $b$, and a
routine which computes the action of $A$ on a vector. We limit the number of
iterations to kmax and return the solution, which overwrites the initial iterate
$x$, and the residual norm.
Algorithm 2.4.1. cg(x, b, A, $\epsilon$, kmax)
1. $r = b - Ax$, $\rho_0 = \|r\|_2^2$, $k = 1$.

2. Do While $\sqrt{\rho_{k-1}} > \epsilon \|b\|_2$ and $k <$ kmax
   (a) if $k = 1$ then $p = r$
       else
       $\beta = \rho_{k-1}/\rho_{k-2}$ and $p = r + \beta p$
   (b) $w = Ap$
   (c) $\alpha = \rho_{k-1}/p^T w$
   (d) $x = x + \alpha p$
   (e) $r = r - \alpha w$
   (f) $\rho_k = \|r\|_2^2$
   (g) $k = k + 1$

Note that the matrix $A$ itself need not be formed or stored, only a routine
for matrix-vector products is required. Krylov space methods are often called
matrix-free for that reason.
Now, consider the costs. We need store only the four vectors $x$, $w$, $p$, and $r$.
Each iteration requires a single matrix-vector product (to compute $w = Ap$),
two scalar products (one for $p^T w$ and one to compute $\rho_k = \|r\|_2^2$), and three
operations of the form $ax + y$, where $x$ and $y$ are vectors and $a$ is a scalar.
It is remarkable that the iteration can progress without storing a basis for
the entire Krylov subspace. As we will see in the section on GMRES, this is
not the case in general. The spd structure buys quite a lot.
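For concreteness, here is a minimal Python/NumPy sketch of Algorithm 2.4.1. It is an illustration written for this discussion, not the book's own software; the argument matvec (a routine returning $Av$) and the other names are ours.

```python
import numpy as np

def cg(x, b, matvec, eps, kmax):
    """Minimal sketch of Algorithm 2.4.1 (cg).

    matvec(v) must return A v; the matrix A itself is never formed.
    Returns the approximate solution and the final residual norm.
    """
    r = b - matvec(x)
    rho = r @ r                      # rho_0 = ||r||_2^2
    rho_old = rho
    tol = eps * np.linalg.norm(b)
    k = 1
    while np.sqrt(rho) > tol and k < kmax:
        if k == 1:
            p = r.copy()
        else:
            beta = rho / rho_old     # beta = rho_{k-1} / rho_{k-2}
            p = r + beta * p
        w = matvec(p)                # the single matrix-vector product
        alpha = rho / (p @ w)        # alpha = rho_{k-1} / p^T w
        x = x + alpha * p
        r = r - alpha * w
        rho_old, rho = rho, r @ r    # rho_k = ||r||_2^2
        k += 1
    return x, np.sqrt(rho)

# Example use on a small diagonal system; the lambda supplies the
# matrix-free product.
# A = np.diag([1.0, 2.0, 3.0])
# x, res = cg(np.zeros(3), np.ones(3), lambda v: A @ v, 1e-8, 100)
```

Only the four vectors $x$, $w$, $p$, and $r$ are stored, and only matvec touches $A$, matching the cost accounting above.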

2.5. Preconditioning
To reduce the condition number, and hence improve the performance of the
iteration, one might try to replace Ax = b by another spd system with the
same solution. If $M$ is a spd matrix that is close to $A^{-1}$, then the eigenvalues
of $MA$ will be clustered near one. However $MA$ is unlikely to be spd, and
hence CG cannot be applied to the system $MAx = Mb$.
In theory one avoids this difficulty by expressing the preconditioned
problem in terms of $B$, where $B$ is spd, $A = B^2$, and by using a two-sided
preconditioner, $S \approx B^{-1}$ (so $M = S^2$). Then the matrix $SAS$ is spd and its
eigenvalues are clustered near one. Moreover the preconditioned system
\[
SASy = Sb
\]
has $y^* = S^{-1} x^*$ as a solution, where $Ax^* = b$. Hence $x^*$ can be recovered
from $y^*$ by multiplication by $S$. One might think, therefore, that computing
$S$ (or a subroutine for its action on a vector) would be necessary and that a
matrix-vector multiply by $SAS$ would incur a cost of one multiplication by $A$
and two by $S$. Fortunately, this is not the case.
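Before seeing why, it is worth recording the one-line verification of the claims about $SAS$ and $y^*$, using only $y^* = S^{-1}x^*$ and $Ax^* = b$:
\[
SAS\,y^* = SAS\,(S^{-1}x^*) = S\,(Ax^*) = Sb ,
\]
and $SAS$ is spd because $(SAS)^T = S^T A^T S^T = SAS$ and $v^T SAS\, v = (Sv)^T A\,(Sv) > 0$ for every $v \neq 0$, $S$ being symmetric and nonsingular.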
If $\hat{y}_k$, $\hat{r}_k$, $\hat{p}_k$ are the iterate, residual, and search direction for CG applied
to $SAS$ and we let
\[
x_k = S \hat{y}_k, \quad r_k = S^{-1} \hat{r}_k, \quad p_k = S \hat{p}_k, \quad \text{and} \quad z_k = S \hat{r}_k ,
\]
then one can perform the iteration directly in terms of $x_k$, $A$, and $M$.
reader should verify that the following algorithm does exactly that. The input
is the same as that for Algorithm cg and the routine to compute the action of
the preconditioner on a vector. Aside from the preconditioner, the arguments
to pcg are the same as those to Algorithm cg.
Algorithm 2.5.1. pcg(x, b, A, M, $\epsilon$, kmax)
1. $r = b - Ax$, $\rho_0 = \|r\|_2^2$, $k = 1$

2. Do While $\sqrt{\rho_{k-1}} > \epsilon \|b\|_2$ and $k <$ kmax
   (a) $z = Mr$
   (b) $\tau_{k-1} = z^T r$
   (c) if $k = 1$ then $\beta = 0$ and $p = z$
       else
       $\beta = \tau_{k-1}/\tau_{k-2}$, $p = z + \beta p$
   (d) $w = Ap$
   (e) $\alpha = \tau_{k-1}/p^T w$
   (f) $x = x + \alpha p$
   (g) $r = r - \alpha w$
   (h) $\rho_k = r^T r$
   (i) $k = k + 1$

Note that the cost is identical to CG with the addition of
• the application of the preconditioner $M$ in step 2a and

• the additional inner product required to compute $\tau_{k-1}$ in step 2b.




Of these costs, the application of the preconditioner is usually the larger. In
the remainder of this section we briefly mention some classes of preconditioners.
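Before turning to those classes, here is a Python/NumPy sketch of Algorithm 2.5.1 in the same style as the cg sketch in § 2.4. It is only an illustration: prec (a routine returning $Mv$) and the remaining names are ours, not the book's.

```python
import numpy as np

def pcg(x, b, matvec, prec, eps, kmax):
    """Minimal sketch of Algorithm 2.5.1 (pcg).

    matvec(v) must return A v and prec(v) must return M v, where M is a
    spd approximation to the inverse of A.  Returns the approximate
    solution and the final residual norm.
    """
    r = b - matvec(x)
    rho = r @ r                      # rho_0 = ||r||_2^2
    tau_old = 1.0                    # placeholder; only used for k >= 2
    tol = eps * np.linalg.norm(b)
    k = 1
    while np.sqrt(rho) > tol and k < kmax:
        z = prec(r)                  # step 2a: apply the preconditioner
        tau = z @ r                  # step 2b: the extra inner product
        if k == 1:
            p = z.copy()
        else:
            beta = tau / tau_old     # beta = tau_{k-1} / tau_{k-2}
            p = z + beta * p
        w = matvec(p)
        alpha = tau / (p @ w)
        x = x + alpha * p
        r = r - alpha * w
        rho = r @ r                  # rho_k = r^T r
        tau_old = tau
        k += 1
    return x, np.sqrt(rho)
```

With prec set to the identity (prec = lambda v: v) this reduces to the unpreconditioned iteration of Algorithm 2.4.1, apart from the extra inner product noted above.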
