
and
\[
\|e^{(k+1)}\|_{A} \le \frac{K_2(A)-1}{K_2(A)+1}\,\|e^{(k)}\|_{A},
\qquad k = 0, 1, \ldots,
\tag{4.38}
\]
where $\|\cdot\|_{A}$ is the energy norm defined in (1.28).

Proof. Let $x^{(k)}$ be the solution generated by the gradient method at the $k$-th
step. Then, let $x_R^{(k+1)}$ be the vector generated by taking one step of the
non-preconditioned Richardson method with optimal parameter starting from
$x^{(k)}$, i.e., $x_R^{(k+1)} = x^{(k)} + \alpha_{opt}\, r^{(k)}$.
Due to Corollary 4.1 and (4.28), we have
\[
\|e_R^{(k+1)}\|_{A} \le \frac{K_2(A)-1}{K_2(A)+1}\,\|e^{(k)}\|_{A},
\]
where $e_R^{(k+1)} = x_R^{(k+1)} - x$. Moreover, from (4.35) we have that the vector
$x^{(k+1)}$, generated by the gradient method, is the one that minimizes the A-norm
of the error among all vectors of the form $x^{(k)} + \theta r^{(k)}$, with
$\theta \in \mathbb{R}$. Therefore, $\|e^{(k+1)}\|_{A} \le \|e_R^{(k+1)}\|_{A}$,
which is the desired result. $\square$
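Although not stated explicitly here, iterating (4.38) from the initial error gives
the $k$-step estimate
\[
\|e^{(k)}\|_{A} \;\le\; \left(\frac{K_2(A)-1}{K_2(A)+1}\right)^{k} \|e^{(0)}\|_{A},
\qquad k = 0, 1, \ldots,
\]
so that, for instance, when $K_2(A) = 100$ the reduction factor per step is
$99/101 \simeq 0.98$, and about 115 iterations are needed to reduce the energy
norm of the error by one order of magnitude.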


We notice that the line through $x^{(k)}$ and $x^{(k+1)}$ is tangent at the point
$x^{(k+1)}$ to the ellipsoidal level surface
$\{x \in \mathbb{R}^n : \Phi(x) = \Phi(x^{(k+1)})\}$ (see also Figure 4.5).

Relation (4.38) shows that convergence of the gradient method can be quite
slow if $K_2(A) = \lambda_1/\lambda_n$ is large. A simple geometric
interpretation of this result can be given in the case $n = 2$. Suppose that
$A = \mathrm{diag}(\lambda_1, \lambda_2)$, with $0 < \lambda_2 \le \lambda_1$,
and $b = (b_1, b_2)^T$.
In such a case, the curves corresponding to $\Phi(x_1, x_2) = c$, as $c$ varies
in $\mathbb{R}^+$, form a sequence of concentric ellipses whose semi-axes have
lengths inversely proportional to the values $\lambda_1$ and $\lambda_2$. If
$\lambda_1 = \lambda_2$, the ellipses degenerate into circles and the direction
of the gradient passes directly through the center, in such a way that the
gradient method converges in one iteration. Conversely, if
$\lambda_1 \gg \lambda_2$, the ellipses become strongly eccentric and the
method converges quite slowly, as shown in Figure 4.5, moving along a
"zig-zag" trajectory.
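To make this picture concrete, the following short MATLAB sketch (not part of the
original text; the values $\lambda_1 = 9$, $\lambda_2 = 1$ and the starting point
are arbitrary choices) draws a few level curves of $\Phi$ and overlays the first
gradient iterates, reproducing qualitatively the zig-zag pattern of Figure 4.5.

% Level curves of Phi(x) = 0.5*x'*A*x - b'*x for a 2x2 SPD diagonal matrix,
% together with the first iterates of the gradient method (illustrative only).
lambda1 = 9; lambda2 = 1;                 % eigenvalues (arbitrary, lambda1 >> lambda2)
A = diag([lambda1, lambda2]); b = [0; 0]; % exact solution is the origin
[X1, X2] = meshgrid(-2:0.05:2, -2:0.05:2);
Phi = 0.5*(lambda1*X1.^2 + lambda2*X2.^2) - b(1)*X1 - b(2)*X2;
contour(X1, X2, Phi, 15); hold on; axis equal;
x = [1.8; 1.5];                           % arbitrary starting point x^(0)
iter = [x];
for k = 1:5
   r = b - A*x;                           % residual = descent direction
   alpha = (r'*r)/(r'*A*r);               % optimal step length of the gradient method (cf. (4.37))
   x = x + alpha*r;
   iter = [iter, x];
end
plot(iter(1,:), iter(2,:), 'o-');         % zig-zag trajectory of the iterates
title('Gradient method iterates on the level curves of \Phi');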


FIGURE 4.5. The first iterates of the gradient method on the level curves of $\Phi$



Program 19 provides an implementation of the gradient method with dynamic
parameter. Here and in the programs reported in the remainder of the section,
the input parameters A, x, b, M, maxit and tol respectively represent the
coefficient matrix of the linear system, the initial datum $x^{(0)}$, the
right-hand side, a possible preconditioner, the maximum number of admissible
iterations and a tolerance for the stopping test. This stopping test checks
whether the ratio $\|r^{(k)}\|_2 / \|b\|_2$ is less than tol. The output
parameters of the code are the number of iterations niter required to fulfill
the stopping test, the vector x with the solution computed after niter
iterations and the normalized residual error $= \|r^{(\mathrm{niter})}\|_2 / \|b\|_2$.
A null value of the parameter flag warns the user that the algorithm has
actually satisfied the stopping test and has not terminated due to reaching the
maximum admissible number of iterations.

Program 19 - gradient : Gradient method with dynamic parameter

function [x, error, niter, flag] = gradient(A, x, b, M, maxit, tol)
flag = 0; niter = 0; bnrm2 = norm( b );
if ( bnrm2 == 0.0 ), bnrm2 = 1.0; end
r = b - A*x; error = norm( r ) / bnrm2;
if ( error < tol ) return, end
for niter = 1:maxit
   z = M \ r;  rho = (r'*z);
   q = A*z;  alpha = rho / (z'*q);
   x = x + alpha * z;  r = r - alpha*q;
   error = norm( r ) / bnrm2;
   if ( error <= tol ), break, end
end
if ( error > tol ) flag = 1; end

Example 4.6 Let us solve with the gradient method the linear system with matrix
$A_m \in \mathbb{R}^{m \times m}$ generated with the MATLAB commands G=numgrid('S',n);
A=delsq(G), where $m = (n-2)^2$. This matrix is associated with the discretization
of the differential Laplace operator on the domain $[-1,1]^2$. The right-hand side
$b_m$ is selected in such a way that the exact solution is the vector
$\mathbf{1}^T \in \mathbb{R}^m$. The matrix $A_m$ is symmetric and positive
definite for any $m$ and becomes ill-conditioned for large values of $m$. We run
Program 19 in the cases $m = 16$ and $m = 400$, with $x^{(0)} = \mathbf{0}^T$,
tol=$10^{-10}$ and maxit=200. If $m = 400$, the method fails to satisfy the
stopping test within the admissible maximum number of iterations and exhibits an
extremely slow reduction of the residual (see Figure 4.6). Actually,
$K_2(A_{400}) \simeq 258$. If, however, we precondition the system with the
matrix $P = R_{in}^T R_{in}$, where $R_{in}$ is the lower triangular matrix in
the incomplete Cholesky factorization of A, the algorithm satisfies the stopping
test within the maximum admissible number of iterations (indeed, now
$K_2(P^{-1}A_{400}) \simeq 38$). $\bullet$
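Purely as a usage illustration (not part of the original text), the experiment of
Example 4.6 for $m = 400$ could be reproduced along the following lines; the call
to ichol for the incomplete Cholesky factor and the file name gradient.m for
Program 19 are assumptions about the MATLAB environment in use.

% Sketch reproducing Example 4.6 for m = 400 (i.e. n = 22), assuming
% Program 19 is available on the path as gradient.m.
n = 22;                                 % so that m = (n-2)^2 = 400
G = numgrid('S', n);  A = delsq(G);     % discrete Laplacian on [-1,1]^2
m = size(A, 1);
b = A * ones(m, 1);                     % right-hand side: exact solution is 1
x0 = zeros(m, 1);  tol = 1e-10;  maxit = 200;

% Non-preconditioned run: M is the identity.
[x1, err1, it1, flag1] = gradient(A, x0, b, speye(m), maxit, tol);

% Preconditioned run: P built from an incomplete Cholesky factor
% (ichol returns a lower triangular L with L*L' ~ A; its availability
% is an assumption about the MATLAB release in use).
L = ichol(A);
P = L * L';
[x2, err2, it2, flag2] = gradient(A, x0, b, P, maxit, tol);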


FIGURE 4.6. The residual normalized to the starting one, as a function of the
number of iterations, for the gradient method applied to the systems in Example
4.6. The curves labelled (a) and (b) refer to the case m = 16 with the
non-preconditioned and preconditioned method, respectively, while the curves
labelled (c) and (d) refer to the case m = 400 with the non-preconditioned and
preconditioned method, respectively



4.3.4 The Conjugate Gradient Method
The gradient method consists essentially of two phases: choosing a descent
direction (that of the residual) and picking a point of local minimum for
$\Phi$ along that direction. The second phase is independent of the first one
since, for a given direction $p^{(k)}$, we can determine $\alpha_k$ as the value
of the parameter $\alpha$ such that $\Phi(x^{(k)} + \alpha p^{(k)})$ is
minimized. Differentiating with respect to $\alpha$ and setting the derivative
to zero at the minimizer yields
\[
\alpha_k = \frac{{p^{(k)}}^T r^{(k)}}{{p^{(k)}}^T A p^{(k)}},
\tag{4.39}
\]

instead of (4.37). The question is how to determine $p^{(k)}$. An approach
different from the one that led to identifying $p^{(k)}$ with $r^{(k)}$ is
suggested by the following definition.

Definition 4.4 A vector $x^{(k)}$ is said to be optimal with respect to a
direction $p \ne 0$ if
\[
\Phi(x^{(k)}) \le \Phi(x^{(k)} + \lambda p), \qquad \forall \lambda \in \mathbb{R}.
\tag{4.40}
\]
If $x^{(k)}$ is optimal with respect to any direction in a vector space V, we say
that $x^{(k)}$ is optimal with respect to V.

From the definition of optimality, it turns out that p must be orthogonal to
the residual $r^{(k)}$. Indeed, from (4.40) we conclude that $\Phi$ admits a
local minimum along p at $\lambda = 0$, and thus the partial derivative of
$\Phi$ with respect to $\lambda$ must vanish at $\lambda = 0$. Since
\[
\frac{\partial \Phi}{\partial \lambda}(x^{(k)} + \lambda p)
= p^T(Ax^{(k)} - b) + \lambda\, p^T A p,
\]
we therefore have
\[
\frac{\partial \Phi}{\partial \lambda}(x^{(k)})\Big|_{\lambda=0} = 0
\quad \text{iff} \quad p^T r^{(k)} = 0,
\]
that is, $p \perp r^{(k)}$. Notice that the iterate $x^{(k+1)}$ of the gradient
method is optimal with respect to $r^{(k)}$ since, due to the choice of
$\alpha_k$, we have $r^{(k+1)} \perp r^{(k)}$, but this property no longer holds
for the successive iterate $x^{(k+2)}$ (see Exercise 12). It is then natural to
ask whether there exist descent directions that maintain the optimality of
iterates. Let
\[
x^{(k+1)} = x^{(k)} + q,
\]

and assume that $x^{(k)}$ is optimal with respect to a direction p (thus,
$r^{(k)} \perp p$). Let us require that $x^{(k+1)}$ is still optimal with
respect to p, that is, $r^{(k+1)} \perp p$. We obtain
\[
0 = p^T r^{(k+1)} = p^T(r^{(k)} - Aq) = -p^T A q.
\]
The conclusion is that, in order to preserve optimality between successive
iterates, the descent directions must be mutually A-orthogonal or A-conjugate,
i.e.,
\[
p^T A q = 0.
\]

A method employing A-conjugate descent directions is called conjugate.
The next step is how to generate automatically a sequence of conjugate

directions. This can be done as follows. Let $p^{(0)} = r^{(0)}$ and search for
directions of the form
\[
p^{(k+1)} = r^{(k+1)} - \beta_k p^{(k)}, \qquad k = 0, 1, \ldots
\tag{4.41}
\]
where $\beta_k \in \mathbb{R}$ must be determined in such a way that
\[
(Ap^{(j)})^T p^{(k+1)} = 0, \qquad j = 0, 1, \ldots, k.
\tag{4.42}
\]
Requiring that (4.42) is satisfied for $j = k$, we get from (4.41)
\[
\beta_k = \frac{(Ap^{(k)})^T r^{(k+1)}}{(Ap^{(k)})^T p^{(k)}}, \qquad k = 0, 1, \ldots
\]

We must now verify that (4.42) also holds for $j = 0, 1, \ldots, k-1$. To do
this, let us proceed by induction on k. Due to the choice of $\beta_0$, relation
(4.42) holds for $k = 0$; let us thus assume that the directions
$p^{(0)}, \ldots, p^{(k-1)}$ are mutually A-orthogonal and, without loss of
generality, that
\[
(p^{(j)})^T r^{(k)} = 0, \qquad j = 0, 1, \ldots, k-1, \quad k \ge 1.
\tag{4.43}
\]
Then, from (4.41) it follows that
\[
(Ap^{(j)})^T p^{(k+1)} = (Ap^{(j)})^T r^{(k+1)}, \qquad j = 0, 1, \ldots, k-1.
\]
Moreover, due to (4.43) and by the assumption of A-orthogonality, we get
\[
(p^{(j)})^T r^{(k+1)} = (p^{(j)})^T r^{(k)} - \alpha_k (p^{(j)})^T A p^{(k)} = 0,
\qquad j = 0, \ldots, k-1,
\tag{4.44}
\]
i.e., we conclude that $r^{(k+1)}$ is orthogonal to every vector of the space
$V_k = \mathrm{span}(p^{(0)}, \ldots, p^{(k-1)})$. Since $p^{(0)} = r^{(0)}$,
from (4.41) it follows that $V_k$ is also equal to
$\mathrm{span}(r^{(0)}, \ldots, r^{(k-1)})$. Then, (4.41) implies that
$Ap^{(j)} \in V_{j+1}$ and thus, due to (4.44),
\[
(Ap^{(j)})^T r^{(k+1)} = 0, \qquad j = 0, 1, \ldots, k-1.
\]
As a consequence, (4.42) holds for $j = 0, \ldots, k$.

The conjugate gradient method (CG) is the method obtained by choosing the
descent directions $p^{(k)}$ given by (4.41) and the acceleration parameter
$\alpha_k$ as in (4.39). As a consequence, setting $r^{(0)} = b - Ax^{(0)}$ and
$p^{(0)} = r^{(0)}$, the $k$-th iteration of the conjugate gradient method takes
the following form
\[
\begin{aligned}
\alpha_k &= \frac{{p^{(k)}}^T r^{(k)}}{{p^{(k)}}^T A p^{(k)}}, \\
x^{(k+1)} &= x^{(k)} + \alpha_k p^{(k)}, \\
r^{(k+1)} &= r^{(k)} - \alpha_k A p^{(k)}, \\
\beta_k &= \frac{(Ap^{(k)})^T r^{(k+1)}}{(Ap^{(k)})^T p^{(k)}}, \\
p^{(k+1)} &= r^{(k+1)} - \beta_k p^{(k)}.
\end{aligned}
\]

FIGURE 4.7. Descent directions for the conjugate gradient method (denoted by
CG, dashed line) and the gradient method (denoted by G, solid line). Notice
that the CG method reaches the solution after two iterations
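As an illustration only (this is not one of the book's programs), the iteration
above translates almost line by line into the following MATLAB sketch; the
function name cg_sketch and the stopping test on the relative residual mirror
the conventions of Program 19 but are otherwise assumptions.

function [x, error, niter, flag] = cg_sketch(A, x, b, maxit, tol)
% Non-preconditioned conjugate gradient iteration, following the formulas
% above: alpha_k, x update, r update, beta_k, p update (illustrative sketch).
flag = 0; bnrm2 = norm(b);
if ( bnrm2 == 0.0 ), bnrm2 = 1.0; end
r = b - A*x;  p = r;                  % r^(0) = b - A x^(0),  p^(0) = r^(0)
error = norm(r)/bnrm2;
if ( error < tol ), niter = 0; return, end
for niter = 1:maxit
   Ap = A*p;
   alpha = (p'*r)/(p'*Ap);            % alpha_k as in (4.39)
   x = x + alpha*p;
   r = r - alpha*Ap;                  % r^(k+1) = r^(k) - alpha_k A p^(k)
   beta = (Ap'*r)/(Ap'*p);            % beta_k = (Ap^(k))' r^(k+1) / (Ap^(k))' p^(k)
   p = r - beta*p;                    % p^(k+1) = r^(k+1) - beta_k p^(k)
   error = norm(r)/bnrm2;
   if ( error <= tol ), break, end
end
if ( error > tol ), flag = 1; end

Successive directions generated in this way satisfy $(Ap^{(k)})^T p^{(k+1)} = 0$
up to rounding errors, which can be checked by displaying Ap'*p immediately
after the update of p.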

It can also be shown (see Exercise 13) that the two parameters $\alpha_k$ and
$\beta_k$ may be alternatively expressed as
\[
\alpha_k = \frac{\|r^{(k)}\|_2^2}{{p^{(k)}}^T A p^{(k)}}, \qquad
\beta_k = \frac{\|r^{(k+1)}\|_2^2}{\|r^{(k)}\|_2^2}.
\tag{4.45}
\]
We finally notice that, eliminating the descent directions from
$r^{(k+1)} = r^{(k)} - \alpha_k A p^{(k)}$, the following recursive three-term
relation is obtained for the residuals (see Exercise 14)
\[
A r^{(k)} = -\frac{1}{\alpha_k}\, r^{(k+1)}
+ \left(\frac{1}{\alpha_k} - \frac{\beta_{k-1}}{\alpha_{k-1}}\right) r^{(k)}
+ \frac{\beta_{k-1}}{\alpha_{k-1}}\, r^{(k-1)}.
\tag{4.46}
\]
As for the convergence of the CG method, we have the following results.

Theorem 4.11 Let A be a symmetric and positive definite matrix. Any method
which employs conjugate directions to solve (3.2) terminates after at most n
steps, yielding the exact solution.

Proof. The directions $p^{(0)}, p^{(1)}, \ldots, p^{(n-1)}$ form an A-orthogonal
basis in $\mathbb{R}^n$. Moreover, since $x^{(k)}$ is optimal with respect to
all the directions $p^{(j)}$, $j = 0, \ldots, k-1$, it follows that $r^{(k)}$ is
orthogonal to the space
$S_{k-1} = \mathrm{span}(p^{(0)}, p^{(1)}, \ldots, p^{(k-1)})$. As a
consequence, $r^{(n)} \perp S_{n-1} = \mathbb{R}^n$ and thus $r^{(n)} = 0$,
which implies $x^{(n)} = x$. $\square$
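As a quick numerical check of the finite-termination property (again, not part
of the text), one can run MATLAB's built-in pcg routine on a small, arbitrarily
chosen symmetric positive definite system; in floating-point arithmetic the
residual after n steps is of roundoff size rather than exactly zero.

% Illustration of finite termination (Theorem 4.11) on a small SPD system,
% using MATLAB's built-in conjugate gradient solver pcg.
n = 5;
B = randn(n);  A = B*B' + n*eye(n);   % an arbitrary symmetric positive definite matrix
b = randn(n, 1);
[x, flag, relres, iter, resvec] = pcg(A, b, 1e-14, n);
disp(resvec')                         % residual history: near roundoff level by step n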


Theorem 4.12 Let A be a symmetric and positive definite matrix and let
$\lambda_1$, $\lambda_n$ be its maximum and minimum eigenvalues, respectively.
The conjugate gradient method for solving (3.2) converges after at most n
steps. Moreover, the error $e^{(k)}$ at the $k$-th iteration (with $k < n$) is
orthogonal to $p^{(j)}$, for $j = 0, \ldots, k-1$, and
\[
\|e^{(k)}\|_{A} \le \frac{2 c^k}{1 + c^{2k}}\, \|e^{(0)}\|_{A},
\qquad \text{with } c = \frac{\sqrt{K_2(A)} - 1}{\sqrt{K_2(A)} + 1}.
\tag{4.47}
\]

Proof. The convergence of the CG method in n steps is a consequence of
Theorem 4.11.
Let us prove the error estimate, assuming for simplicity that $x^{(0)} = 0$.
Notice first that, for fixed k,
\[
x^{(k+1)} = \sum_{j=0}^{k} \gamma_j A^j b,
\]
for suitable $\gamma_j \in \mathbb{R}$. Moreover, by construction, $x^{(k+1)}$
is the vector which minimizes the A-norm of the error at step $k+1$ among all
vectors of the form $z = \sum_{j=0}^{k} \delta_j A^j b = p_k(A)b$, where
$p_k(\xi) = \sum_{j=0}^{k} \delta_j \xi^j$ is a polynomial of degree k and
$p_k(A)$ denotes the corresponding matrix polynomial. As a consequence,
\[
\|e^{(k+1)}\|_{A}^2 \le (x - z)^T A (x - z) = x^T q_{k+1}(A)\, A\, q_{k+1}(A)\, x,
\tag{4.48}
\]
where $q_{k+1}(\xi) = 1 - p_k(\xi)\xi \in \mathbb{P}^{0,1}_{k+1}$, being
$\mathbb{P}^{0,1}_{k+1} = \{q \in \mathbb{P}_{k+1} : q(0) = 1\}$ and
$q_{k+1}(A)$ the associated matrix polynomial. From (4.48) we get
\[
\|e^{(k+1)}\|_{A}^2 = \min_{q_{k+1} \in \mathbb{P}^{0,1}_{k+1}}
x^T q_{k+1}(A)\, A\, q_{k+1}(A)\, x.
\tag{4.49}
\]
Since A is symmetric positive definite, there exists an orthogonal matrix Q
such that $A = Q \Lambda Q^T$ with
$\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$. Noticing that
$q_{k+1}(A) = Q\, q_{k+1}(\Lambda)\, Q^T$, we get from (4.49)
\[
\|e^{(k+1)}\|_{A}^2 = \min_{q_{k+1} \in \mathbb{P}^{0,1}_{k+1}}
x^T Q\, q_{k+1}(\Lambda)\, Q^T Q \Lambda Q^T Q\, q_{k+1}(\Lambda)\, Q^T x
\]
