‖e(k+1)‖_A ≤ (K2(A) − 1)/(K2(A) + 1) ‖e(k)‖_A,   k = 0, 1, . . . ,    (4.38)

where ‖·‖_A is the energy norm defined in (1.28).

Proof. Let x(k) be the solution generated by the gradient method at the k-th
step. Then, let xR(k+1) be the vector generated by taking one step of the non-
preconditioned Richardson method with optimal parameter starting from x(k),
i.e., xR(k+1) = x(k) + αopt r(k).

Due to Corollary 4.1 and (4.28), we have

‖eR(k+1)‖_A ≤ (K2(A) − 1)/(K2(A) + 1) ‖e(k)‖_A,

where eR(k+1) = xR(k+1) − x. Moreover, from (4.35) we have that the vector x(k+1),
generated by the gradient method, is the one that minimizes the A-norm of
the error among all vectors of the form x(k) + θr(k), with θ ∈ R. Therefore,

‖e(k+1)‖_A ≤ ‖eR(k+1)‖_A,

which is the desired result. □
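The estimate (4.38) can be checked numerically. The following MATLAB fragment is a minimal sketch (the matrix, right-hand side and initial guess are arbitrary illustrative choices): it performs a single gradient step with the optimal parameter α = rT r/(rT Ar) and compares the reduction of the A-norm of the error with the factor (K2(A) − 1)/(K2(A) + 1).

% Check of (4.38) on an arbitrary SPD system: one gradient step with the
% optimal parameter, then comparison with the bound (K2(A)-1)/(K2(A)+1).
n = 10;
A = diag(1:n);                        % SPD test matrix, K2(A) = 10
x_ex = ones(n,1);  b = A*x_ex;        % exact solution and right-hand side
x = zeros(n,1);                       % initial guess x(0)
r = b - A*x;
alpha = (r'*r)/(r'*(A*r));            % optimal (dynamic) parameter
x_new = x + alpha*r;                  % one step of the gradient method
normA = @(v) sqrt(v'*(A*v));          % energy norm of (1.28)
ratio = normA(x_ex - x_new)/normA(x_ex - x);
K2 = cond(A);
fprintf('reduction = %.4f, bound = %.4f\n', ratio, (K2-1)/(K2+1))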

We notice that the line through x(k) and x(k+1) is tangent at the point

x(k+1) to the ellipsoidal level surface {x ∈ Rn : Φ(x) = Φ(x(k+1))} (see

also Figure 4.5).

Relation (4.38) shows that convergence of the gradient method can be

quite slow if K2(A) = λ1/λn is large. A simple geometric interpretation of
this result can be given in the case n = 2. Suppose that A = diag(λ1, λ2),
with 0 < λ2 ≤ λ1 and b = (b1, b2)T.

In such a case, the curves corresponding to Φ(x1, x2) = c, as c varies
in R+, form a sequence of concentric ellipses whose semi-axes have length
inversely proportional to the values λ1 and λ2. If λ1 = λ2, the ellipses
degenerate into circles and the direction of the gradient passes directly
through the center, in such a way that the gradient method converges in one iteration.
Conversely, if λ1 ≫ λ2, the ellipses become strongly eccentric and the

method converges quite slowly, as shown in Figure 4.5, moving along a

“zig-zag” trajectory.
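The slow, zig-zagging convergence for λ1 ≫ λ2 can be reproduced with a few lines of MATLAB. The fragment below is purely illustrative (the eigenvalues, right-hand side and starting point are arbitrary choices): it applies the gradient iteration to a 2×2 diagonal system and prints the error norm at each step.

% Illustrative sketch: gradient method on A = diag(lambda1, lambda2).
% With lambda1 >> lambda2 the error decreases very slowly.
lambda1 = 100; lambda2 = 1;
A = diag([lambda1, lambda2]);
x_ex = [1; 1];  b = A*x_ex;          % exact solution and right-hand side
x = [0; 0];                          % starting point x(0)
for k = 1:10
    r = b - A*x;
    alpha = (r'*r)/(r'*(A*r));       % dynamic parameter
    x = x + alpha*r;                 % each step partially undoes the previous
    fprintf('k = %2d, ||x - x_ex|| = %e\n', k, norm(x - x_ex));
end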


FIGURE 4.5. The first iterates of the gradient method on the level curves of Φ

Program 19 provides an implementation of the gradient method with dy-

namic parameter. Here and in the programs reported in the remainder of

the section, the input parameters A, x, b, M, maxit and tol respectively

represent the coefficient matrix of the linear system, the initial datum x(0),
the right-hand side, a possible preconditioner, the maximum number of admis-
sible iterations and a tolerance for the stopping test. This stopping test
checks whether the ratio ‖r(k)‖2/‖b‖2 is less than tol. The output parameters of
the code are the number of iterations niter required to fulfill the stop-
ping test, the vector x with the solution computed after niter iterations
and the normalized residual error = ‖r(niter)‖2/‖b‖2. A null value of the
parameter flag warns the user that the algorithm has actually satisfied

the stopping test and it has not terminated due to reaching the maximum

admissible number of iterations.

Program 19 - gradient : Gradient method with dynamic parameter

function [x, error, niter, flag] = gradient(A, x, b, M, maxit, tol)
flag = 0; niter = 0; bnrm2 = norm( b );
if ( bnrm2 == 0.0 ), bnrm2 = 1.0; end
r = b - A*x; error = norm( r ) / bnrm2;
if ( error < tol ), return, end
for niter = 1:maxit
  z = M \ r; rho = (r'*z);
  q = A*z; alpha = rho / (z'*q);
  x = x + alpha*z; r = r - alpha*q;
  error = norm( r ) / bnrm2;
  if ( error <= tol ), break, end
end
if ( error > tol ), flag = 1; end


Example 4.6 Let us solve with the gradient method the linear system with ma-
trix Am ∈ Rm×m generated with the MATLAB commands G=numgrid('S',n);
A=delsq(G), where m = (n − 2)². This matrix is associated with the discretiza-
tion of the Laplace differential operator on the domain [−1, 1]². The right-hand
side bm is selected in such a way that the exact solution is the vector 1T ∈ Rm.
The matrix Am is symmetric and positive definite for any m and becomes ill-
conditioned for large values of m. We run Program 19 in the cases m = 16 and
m = 400, with x(0) = 0T, tol=10⁻¹⁰ and maxit=200. If m = 400, the method
fails to satisfy the stopping test within the maximum admissible number of it-
erations and exhibits an extremely slow reduction of the residual (see Figure
4.6). Actually, K2(A400) ≈ 258. If, however, we precondition the system with the
matrix P = RinT Rin, where Rin is the lower triangular matrix in the incomplete
Cholesky factorization of A, the algorithm satisfies the stopping test within the
maximum admissible number of iterations (indeed, now K2(P⁻¹A400) ≈ 38). •
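The computations of this example can be reproduced along the following lines. The fragment is indicative only: it assumes that Program 19 has been saved as gradient.m on the MATLAB path and that the incomplete Cholesky factorization is available through the built-in ichol (older releases provided cholinc instead).

% Indicative reproduction of Example 4.6 for m = 400 (i.e. n = 22).
n = 22;                                 % m = (n-2)^2 = 400
G = numgrid('S', n);  A = delsq(G);
m = size(A, 1);
b = A*ones(m, 1);                       % exact solution is the vector of ones
x0 = zeros(m, 1);  tol = 1e-10;  maxit = 200;

% Non-preconditioned run (M = identity): fails to meet tol within maxit.
[x1, err1, it1, flag1] = gradient(A, x0, b, speye(m), maxit, tol);

% Preconditioned run: L is a lower triangular incomplete Cholesky factor
% (ichol assumed available), so L*L' plays the role of the preconditioner
% P described in the text.
L = ichol(A);
[x2, err2, it2, flag2] = gradient(A, x0, b, L*L', maxit, tol);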

FIGURE 4.6. The residual normalized to the starting one, as a function of the number of iterations, for the gradient method applied to the systems in Example 4.6. The curves labelled (a) and (b) refer to the case m = 16 with the non-preconditioned and preconditioned method, respectively, while the curves labelled (c) and (d) refer to the case m = 400 with the non-preconditioned and preconditioned method, respectively

4.3.4 The Conjugate Gradient Method

The gradient method consists essentially of two phases: choosing a descent

direction (the one of the residual) and picking a point of local minimum
for Φ along that direction. The second phase is independent of the first one
since, for a given direction p(k), we can determine αk as the value
of the parameter α such that Φ(x(k) + αp(k)) is minimized. Differentiating
with respect to α and setting the derivative to zero at the minimizer yields

αk = p(k)T r(k) / (p(k)T Ap(k)),    (4.39)


instead of (4.37). The question is how to determine p(k). A different ap-
proach from the one which led to identifying p(k) with r(k) is suggested by the
following definition.

Definition 4.4 A vector x(k) is said to be optimal with respect to a
direction p ≠ 0 if

Φ(x(k)) ≤ Φ(x(k) + λp),   ∀λ ∈ R.    (4.40)

If x(k) is optimal with respect to any direction in a vector space V, we say

that x(k) is optimal with respect to V.

From the definition of optimality, it turns out that p must be orthogonal
to the residual r(k). Indeed, from (4.40) we conclude that Φ admits a local
minimum along p for λ = 0, and thus the partial derivative of Φ with
respect to λ must vanish at λ = 0. Since

∂Φ/∂λ (x(k) + λp) = pT (Ax(k) − b) + λ pT Ap,

we therefore have

∂Φ/∂λ (x(k) + λp)|λ=0 = 0   iff   pT r(k) = 0,

that is, p ⊥ r(k) . Notice that the iterate x(k+1) of the gradient method

is optimal with respect to r(k) since, due to the choice of αk, we have

r(k+1) ⊥ r(k) , but this property no longer holds for the successive iterate

x(k+2) (see Exercise 12). It is then natural to ask whether there exist descent

directions that maintain the optimality of iterates. Let

x(k+1) = x(k) + q,

and assume that x(k) is optimal with respect to a direction p (thus, r(k) ⊥

p). Let us impose that x(k+1) is still optimal with respect to p, that is,

r(k+1) ⊥ p. We obtain

0 = pT r(k+1) = pT (r(k) − Aq) = −pT Aq.

The conclusion is that, in order to preserve optimality between succes-

sive iterates, the descent directions must be mutually A-orthogonal or A-

conjugate, i.e.

pT Aq = 0.

A method employing A-conjugate descent directions is called conjugate.

The next step is how to generate automatically a sequence of conjugate

152 4. Iterative Methods for Solving Linear Systems

directions. This can be done as follows. Let p(0) = r(0) and search for the

directions of the form

p(k+1) = r(k+1) − βk p(k),   k = 0, 1, . . .    (4.41)

where βk ∈ R must be determined in such a way that

(Ap(j) )T p(k+1) = 0, j = 0, 1, . . . , k. (4.42)

Requiring that (4.42) is satisfied for j = k, we get from (4.41)

βk = (Ap(k))T r(k+1) / ((Ap(k))T p(k)),   k = 0, 1, . . .

We must now verify that (4.42) holds also for j = 0, 1, . . . , k − 1. To do this,
let us proceed by induction on k. Due to the choice of β0, relation (4.42)
holds for k = 0; let us thus assume that the directions p(0), . . . , p(k−1) are
mutually A-orthogonal and, without loss of generality, that

(p(j))T r(k) = 0,   j = 0, 1, . . . , k − 1,  k ≥ 1.    (4.43)

Then, from (4.41) it follows that

(Ap(j))T p(k+1) = (Ap(j))T r(k+1),   j = 0, 1, . . . , k − 1.

Moreover, due to (4.43) and by the assumption of A-orthogonality we
get

(p(j))T r(k+1) = (p(j))T r(k) − αk (p(j))T Ap(k) = 0,   j = 0, . . . , k − 1,    (4.44)

i.e., we conclude that r(k+1) is orthogonal to every vector of the space Vk =

span(p(0), . . . , p(k−1)). Since p(0) = r(0), from (4.41) it follows that Vk is
also equal to span(r(0), . . . , r(k−1)). Then, (4.41) implies that Ap(j) ∈ Vj+1
and thus, due to (4.44),

(Ap(j))T r(k+1) = 0,   j = 0, 1, . . . , k − 1.

As a consequence, (4.42) holds for j = 0, . . . , k.

The conjugate gradient method (CG) is the method obtained by choosing

the descent directions p(k) given by (4.41) and the acceleration parameter

αk as in (4.39). As a consequence, setting r(0) = b − Ax(0) and p(0) = r(0),

the k-th iteration of the conjugate gradient method takes the following


FIGURE 4.7. Descent directions for the conjugate gradient method (denoted by CG, dashed line) and the gradient method (denoted by G, solid line). Notice that the CG method reaches the solution after two iterations

form

αk = p(k)T r(k) / (p(k)T Ap(k))

x(k+1) = x(k) + αk p(k)

r(k+1) = r(k) − αk Ap(k)

βk = (Ap(k))T r(k+1) / ((Ap(k))T p(k))

p(k+1) = r(k+1) − βk p(k).
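A minimal MATLAB sketch of this iteration might look as follows (no preconditioning, no stopping test; the function name cg_sketch and the fixed iteration count are illustrative choices, not taken from the text):

function x = cg_sketch(A, b, x, nmax)
% Minimal sketch of the conjugate gradient iteration described above:
% no preconditioning, no stopping test, nmax iterations at most.
r = b - A*x;                 % r(0) = b - A x(0)
p = r;                       % p(0) = r(0)
for k = 1:nmax
  Ap = A*p;
  alpha = (p'*r)/(p'*Ap);    % alpha_k = p(k)' r(k) / (p(k)' A p(k))
  x = x + alpha*p;           % x(k+1) = x(k) + alpha_k p(k)
  r = r - alpha*Ap;          % r(k+1) = r(k) - alpha_k A p(k)
  beta = (Ap'*r)/(Ap'*p);    % beta_k = (A p(k))' r(k+1) / ((A p(k))' p(k))
  p = r - beta*p;            % p(k+1) = r(k+1) - beta_k p(k)
end

The parameters αk and βk could equivalently be computed through the expressions (4.45) given below.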

It can also be shown (see Exercise 13) that the two parameters αk and βk
may be alternatively expressed as

αk = ‖r(k)‖₂² / (p(k)T Ap(k)),   βk = −‖r(k+1)‖₂² / ‖r(k)‖₂².    (4.45)

We finally notice that, eliminating the descent directions from r(k+1) =
r(k) − αk Ap(k), the following recursive three-term relation is obtained for
the residuals (see Exercise 14)

Ar(k) = −(1/αk) r(k+1) + (1/αk − βk−1/αk−1) r(k) + (βk−1/αk−1) r(k−1).    (4.46)

As for the convergence of the CG method, we have the following results.

Theorem 4.11 Let A be a symmetric and positive definite matrix. Any

method which employs conjugate directions to solve (3.2) terminates after

at most n steps, yielding the exact solution.


Proof. The directions p(0), p(1), . . . , p(n−1) form an A-orthogonal basis in Rn.
Moreover, since x(k) is optimal with respect to all the directions p(j), j =
0, . . . , k − 1, it follows that r(k) is orthogonal to the space Sk−1 = span(p(0), p(1),
. . . , p(k−1)). As a consequence, r(n) ⊥ Sn−1 = Rn and thus r(n) = 0, which
implies x(n) = x. □
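The finite-termination property can be observed numerically on a small example. The fragment below is illustrative and relies on the cg_sketch function sketched above; the test matrix is an arbitrary SPD matrix with n distinct eigenvalues, so that all n steps are actually needed.

% Illustrative check of the finite termination stated in Theorem 4.11.
% Requires the cg_sketch function from the previous sketch.
n = 8;
A = diag(1:n);  b = ones(n,1);        % SPD matrix with n distinct eigenvalues
x = cg_sketch(A, b, zeros(n,1), n);   % at most n conjugate gradient steps
fprintf('residual after n steps: %e\n', norm(b - A*x))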

Theorem 4.12 Let A be a symmetric and positive definite matrix and
let λ1, λn be its maximum and minimum eigenvalues, respectively. The
conjugate gradient method for solving (3.2) converges after at most n steps.
Moreover, the error e(k) at the k-th iteration (with k < n) is orthogonal to
p(j), for j = 0, . . . , k − 1 and

‖e(k)‖_A ≤ (2c^k / (1 + c^(2k))) ‖e(0)‖_A,   with c = (√K2(A) − 1)/(√K2(A) + 1).    (4.47)

Proof. The convergence of the CG method in n steps is a consequence of The-

orem 4.11.

Let us prove the error estimate, assuming for simplicity that x(0) = 0. Notice
first that, for fixed k,

x(k+1) = Σ_{j=0}^{k} γj A^j b,

for suitable γj ∈ R. Moreover, by construction, x(k+1) is the vector which min-
imizes the A-norm of the error at step k + 1 among all vectors of the form
z = Σ_{j=0}^{k} δj A^j b = pk(A)b, where pk(ξ) = Σ_{j=0}^{k} δj ξ^j is a polynomial of degree
k and pk(A) denotes the corresponding matrix polynomial. As a consequence,

‖e(k+1)‖²_A ≤ (x − z)T A(x − z) = xT qk+1(A) A qk+1(A) x,    (4.48)

where qk+1(ξ) = 1 − pk(ξ)ξ belongs to P_{k+1}^{0,1} = {q ∈ P_{k+1} : q(0) = 1}, and
qk+1(A) is the associated matrix polynomial. From (4.48) we get

‖e(k+1)‖²_A = min_{qk+1 ∈ P_{k+1}^{0,1}} xT qk+1(A) A qk+1(A) x.    (4.49)

Since A is symmetric positive definite, there exists an orthogonal matrix Q
such that A = QΛQT with Λ = diag(λ1, . . . , λn). Noticing that qk+1(A) =
Q qk+1(Λ) QT, we get from (4.49)

‖e(k+1)‖²_A = min_{qk+1 ∈ P_{k+1}^{0,1}} xT Q qk+1(Λ) QT QΛQT Q qk+1(Λ) QT x