$$
\min_{q_{k+1}\in P_{k+1}^{0,1}} x^T Q\, q_{k+1}(\Lambda)\,\Lambda\, q_{k+1}(\Lambda)\, Q^T x
= \min_{q_{k+1}\in P_{k+1}^{0,1}} y^T \mathrm{diag}\big(q_{k+1}(\lambda_i)\,\lambda_i\, q_{k+1}(\lambda_i)\big)\, y
= \min_{q_{k+1}\in P_{k+1}^{0,1}} \sum_{i=1}^{n} y_i^2\,\lambda_i\,\big(q_{k+1}(\lambda_i)\big)^2,
$$

4.3 Stationary and Nonstationary Iterative Methods 155

having set y = Qx. Thus, we can conclude that

$$
\|e^{(k+1)}\|_A^2 \le \min_{q_{k+1}\in P_{k+1}^{0,1}} \max_{\lambda_i\in\sigma(A)} \big(q_{k+1}(\lambda_i)\big)^2 \sum_{i=1}^{n} y_i^2\,\lambda_i .
$$

Recalling that $\sum_{i=1}^{n} y_i^2\,\lambda_i = \|e^{(0)}\|_A^2$, we have

$$
\frac{\|e^{(k+1)}\|_A}{\|e^{(0)}\|_A} \le \min_{q_{k+1}\in P_{k+1}^{0,1}} \max_{\lambda_i\in\sigma(A)} |q_{k+1}(\lambda_i)|.
$$

Let us now recall the following property

Property 4.6 The problem of minimizing $\max_{\lambda_n \le z \le \lambda_1} |q(z)|$ over the space $P_{k+1}^{0,1}([\lambda_n, \lambda_1])$ admits a unique solution, given by the polynomial

$$
p_{k+1}(\xi) = T_{k+1}\!\left(\frac{\lambda_1 + \lambda_n - 2\xi}{\lambda_1 - \lambda_n}\right) \Big/ C_{k+1}, \qquad \xi \in [\lambda_n, \lambda_1],
$$

where $C_{k+1} = T_{k+1}\!\left(\frac{\lambda_1+\lambda_n}{\lambda_1-\lambda_n}\right)$ and $T_{k+1}$ is the Chebyshev polynomial of degree $k+1$ (see Section 10.10). The value of the minimum is $1/C_{k+1}$.

Using this property we get

$$
\frac{\|e^{(k+1)}\|_A}{\|e^{(0)}\|_A} \le \frac{1}{T_{k+1}\!\left(\dfrac{\lambda_1+\lambda_n}{\lambda_1-\lambda_n}\right)},
$$

from which the thesis follows, since in the case of a symmetric positive definite matrix

$$
\frac{1}{C_{k+1}} = \frac{2c^{k+1}}{1 + c^{2(k+1)}}. \qquad \square
$$
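This last identity is easy to check numerically. The sketch below (illustrative, not from the text) uses the closed form $T_n(x) = \cosh(n\,\mathrm{arccosh}\,x)$ valid for $x > 1$, together with $c = (\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)$ where $\kappa = \lambda_1/\lambda_n$ as in estimate (4.47); the value $\kappa = 50$ is an arbitrary choice.

```python
import math

def cheb_T(n, x):
    # For x > 1, T_n(x) = cosh(n * arccosh(x)).
    return math.cosh(n * math.acosh(x))

kappa = 50.0                                       # arbitrary condition number lambda_1/lambda_n
c = (math.sqrt(kappa) - 1) / (math.sqrt(kappa) + 1)
for k in range(8):
    C = cheb_T(k + 1, (kappa + 1) / (kappa - 1))   # C_{k+1} = T_{k+1}((l1+ln)/(l1-ln))
    bound = 2 * c**(k + 1) / (1 + c**(2 * (k + 1)))
    assert abs(1.0 / C - bound) < 1e-12 * bound    # 1/C_{k+1} = 2c^{k+1}/(1+c^{2(k+1)})
```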

The generic k-th iteration of the conjugate gradient method is well defined only if the descent direction p(k) is non null. Besides, if p(k) = 0, then the iterate x(k) must necessarily coincide with the solution x of the system. Moreover, irrespective of the choice of the parameters βk, one can show (see [Axe94], p. 463) that the sequence x(k) generated by the CG method is such that either x(k) ≠ x, p(k) ≠ 0, αk ≠ 0 for any k, or there must exist an integer m such that x(m) = x, where x(k) ≠ x, p(k) ≠ 0 and αk ≠ 0 for k = 0, 1, . . . , m − 1.

The particular choice made for βk in (4.45) ensures that m ≤ n. In the absence of rounding errors, the CG method can thus be regarded as a direct method, since it terminates after a finite number of steps. However, for matrices of large size, it is usually employed as an iterative scheme,

156 4. Iterative Methods for Solving Linear Systems

where the iterations are stopped when the error gets below a fixed tolerance. In this respect, the dependence of the error reduction factor on the condition number of the matrix is more favorable than for the gradient method. We also notice that estimate (4.47) is often overly pessimistic and does not account for the fact that in this method, unlike what happens for the gradient method, the convergence is influenced by the whole spectrum of A, and not only by its extremal eigenvalues.

Remark 4.3 (Effect of rounding errors) The termination property of the CG method is rigorously valid only in exact arithmetic. Accumulating rounding errors prevent the descent directions from being A-conjugate and can even generate null denominators in the computation of the coefficients αk and βk. This latter phenomenon, known as breakdown, can be avoided by introducing suitable stabilization procedures; in such an event, we speak of stabilized gradient methods.

Despite the use of these strategies, it may happen that the CG method fails to converge (in finite arithmetic) after n iterations. In such a case, the only reasonable possibility is to restart the iterative process, taking as residual the last computed one. By so doing, the cyclic CG method or CG method with restart is obtained, for which, however, the convergence properties of the original CG method are no longer valid.

4.3.5 The Preconditioned Conjugate Gradient Method

If P is a symmetric and positive definite preconditioning matrix, the preconditioned conjugate gradient method (PCG) consists of applying the CG method to the preconditioned system

$$
P^{-1/2} A P^{-1/2} y = P^{-1/2} b, \qquad \text{with } y = P^{1/2} x.
$$

In practice, the method is implemented without explicitly requiring the computation of $P^{1/2}$ or $P^{-1/2}$. After some algebra, the following scheme is obtained: given $x^{(0)}$, and setting $r^{(0)} = b - Ax^{(0)}$, $z^{(0)} = P^{-1} r^{(0)}$ and $p^{(0)} = z^{(0)}$, the k-th iteration reads


$$
\begin{aligned}
\alpha_k &= \frac{{p^{(k)}}^T r^{(k)}}{{p^{(k)}}^T A p^{(k)}},\\
x^{(k+1)} &= x^{(k)} + \alpha_k p^{(k)},\\
r^{(k+1)} &= r^{(k)} - \alpha_k A p^{(k)},\\
P z^{(k+1)} &= r^{(k+1)},\\
\beta_k &= \frac{(A p^{(k)})^T z^{(k+1)}}{(A p^{(k)})^T p^{(k)}},\\
p^{(k+1)} &= z^{(k+1)} - \beta_k p^{(k)}.
\end{aligned}
$$
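A minimal NumPy sketch of the scheme above may help fix ideas (an illustration, not the book's implementation; the diagonal Jacobi preconditioner P = diag(A) and the test matrix are hypothetical choices):

```python
import numpy as np

def pcg(A, b, P, x0, maxit=200, tol=1e-12):
    # Preconditioned CG following the displayed scheme:
    # alpha_k, x/r updates, solve P z = r, beta_k, p update.
    x = x0.copy()
    r = b - A @ x
    z = np.linalg.solve(P, r)
    p = z.copy()
    for _ in range(maxit):
        Ap = A @ p
        alpha = (p @ r) / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = np.linalg.solve(P, r)
        beta = (Ap @ z) / (Ap @ p)
        p = z - beta * p
    return x

n = 10
A = 4 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # SPD test matrix
b = np.ones(n)
P = np.diag(np.diag(A))                                 # Jacobi preconditioner
x = pcg(A, b, P, np.zeros(n))
assert np.allclose(A @ x, b)
```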

The computational cost is increased with respect to the CG method, as one needs to solve at each step the linear system $Pz^{(k+1)} = r^{(k+1)}$. For this system the symmetric preconditioners examined in Section 4.3.2 can be used. The error estimate is the same as for the nonpreconditioned method, provided the matrix A is replaced by $P^{-1}A$.

In Program 20 an implementation of the PCG method is reported. For a description of the input/output parameters, see Program 19.

Program 20 - conjgrad : Preconditioned conjugate gradient method

function [x, error, niter, flag] = conjgrad(A, x, b, P, maxit, tol)
flag = 0; niter = 0; bnrm2 = norm( b );
if ( bnrm2 == 0.0 ), bnrm2 = 1.0; end
r = b - A*x; error = norm( r ) / bnrm2;
if ( error < tol ), return, end
for niter = 1:maxit
   z = P \ r; rho = (r'*z);
   if niter > 1
      beta = rho / rho1; p = z + beta*p;
   else
      p = z;
   end
   q = A*p; alpha = rho / (p'*q);
   x = x + alpha*p; r = r - alpha*q;
   error = norm( r ) / bnrm2;
   if ( error <= tol ), break, end
   rho1 = rho;
end
if ( error > tol ), flag = 1; end


Example 4.7 Let us consider again the linear system of Example 4.6. The CG method has been run with the same input data as in the previous example. It converges in 3 iterations for m = 16 and in 45 iterations for m = 400. Using the same preconditioner as in Example 4.6, the number of iterations decreases from 45 to 26 in the case m = 400. •

FIGURE 4.8. Behavior of the residual, normalized to the right-hand side, as a function of the number of iterations for the conjugate gradient method applied to the systems of Example 4.6 in the case m = 400. The dashed curve refers to the nonpreconditioned method, while the solid curve refers to the preconditioned one

4.3.6 The Alternating-Direction Method

Assume that A = A1 + A2, with A1 and A2 symmetric and positive definite. The alternating-direction method (ADI), as introduced by Peaceman and Rachford [PJ55], is an iterative scheme for (3.2) which consists of solving the following systems ∀k ≥ 0

$$
\begin{aligned}
(I + \alpha_1 A_1)\, x^{(k+1/2)} &= (I - \alpha_1 A_2)\, x^{(k)} + \alpha_1 b,\\
(I + \alpha_2 A_2)\, x^{(k+1)} &= (I - \alpha_2 A_1)\, x^{(k+1/2)} + \alpha_2 b,
\end{aligned}
\qquad (4.50)
$$

where α1 and α2 are two real parameters. The ADI method can be cast in the form (4.2) setting

$$
B = (I + \alpha_2 A_2)^{-1}(I - \alpha_2 A_1)(I + \alpha_1 A_1)^{-1}(I - \alpha_1 A_2),
$$
$$
f = (I + \alpha_2 A_2)^{-1}\left[\alpha_1 (I - \alpha_2 A_1)(I + \alpha_1 A_1)^{-1} + \alpha_2 I\right] b.
$$

Both B and f depend on α1 and α2. The following estimate holds

$$
\rho(B) \le \max_{i=1,\dots,n} \left|\frac{1 - \alpha_2 \lambda_i^{(1)}}{1 + \alpha_1 \lambda_i^{(1)}}\right| \;\max_{i=1,\dots,n} \left|\frac{1 - \alpha_1 \lambda_i^{(2)}}{1 + \alpha_2 \lambda_i^{(2)}}\right|,
$$


where $\lambda_i^{(1)}$ and $\lambda_i^{(2)}$, for i = 1, . . . , n, are the eigenvalues of A1 and A2, respectively. The method converges if ρ(B) < 1, which is always verified if α1 = α2 = α > 0. Moreover (see [Axe94]), if $\gamma \le \lambda_i^{(j)} \le \delta$ ∀i = 1, . . . , n, ∀j = 1, 2, for suitable γ and δ, then the ADI method converges with the choice $\alpha_1 = \alpha_2 = 1/\sqrt{\delta\gamma}$, provided that γ/δ tends to 0 as the size of A grows. In such an event the corresponding spectral radius satisfies

$$
\rho(B) \le \left(\frac{1 - \sqrt{\gamma/\delta}}{1 + \sqrt{\gamma/\delta}}\right)^2.
$$
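The ADI sweep (4.50) can be sketched in a few lines of NumPy (an illustration only; the splitting A1 = A2 = A/2 of a small tridiagonal SPD matrix is a hypothetical choice, and the parameter follows the prescription $\alpha_1 = \alpha_2 = 1/\sqrt{\delta\gamma}$ above):

```python
import numpy as np

n = 8
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # SPD tridiagonal matrix
A1 = A2 = A / 2                                         # hypothetical SPD splitting A = A1 + A2
b = np.ones(n)

lam = np.linalg.eigvalsh(A1)
alpha = 1.0 / np.sqrt(lam.max() * lam.min())            # alpha1 = alpha2 = 1/sqrt(delta*gamma)
I = np.eye(n)

x = np.zeros(n)
for _ in range(200):
    # half step: (I + alpha A1) x^{k+1/2} = (I - alpha A2) x^k + alpha b
    x_half = np.linalg.solve(I + alpha * A1, (I - alpha * A2) @ x + alpha * b)
    # full step: (I + alpha A2) x^{k+1} = (I - alpha A1) x^{k+1/2} + alpha b
    x = np.linalg.solve(I + alpha * A2, (I - alpha * A1) @ x_half + alpha * b)

assert np.allclose(A @ x, b)   # the fixed point solves Ax = b
```

Note that the fixed point of the two half-steps satisfies exactly Ax = b, since adding the two relations eliminates the intermediate iterate.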

4.4 Methods Based on Krylov Subspace Iterations

In this section we introduce iterative methods based on Krylov subspace

iterations. For the proofs and further analysis, we refer to [Saa96], [Axe94]

and [Hac94].

Consider the Richardson method (4.24) with P = I; the residual at the k-th step can be related to the initial residual as

$$
r^{(k)} = \prod_{j=0}^{k-1} (I - \alpha_j A)\, r^{(0)}, \qquad (4.51)
$$

so that $r^{(k)} = p_k(A) r^{(0)}$, where $p_k(A)$ is a polynomial in A of degree k. If we introduce the space

$$
K_m(A; v) = \mathrm{span}\left\{ v, Av, \dots, A^{m-1} v \right\}, \qquad (4.52)
$$

it immediately appears from (4.51) that $r^{(k)} \in K_{k+1}(A; r^{(0)})$. The space defined in (4.52) is called the Krylov subspace of order m. It is a subspace of the set spanned by all the vectors $u \in \mathbb{R}^n$ that can be written as $u = p_{m-1}(A)v$, where $p_{m-1}$ is a polynomial in A of degree ≤ m − 1.

In an analogous manner as for (4.51), it is seen that the iterate $x^{(k)}$ of the Richardson method is given by

$$
x^{(k)} = x^{(0)} + \sum_{j=0}^{k-1} \alpha_j r^{(j)},
$$

so that $x^{(k)}$ belongs to the following space

$$
W_k = \left\{ v = x^{(0)} + y,\; y \in K_k(A; r^{(0)}) \right\}. \qquad (4.53)
$$

Notice also that $\sum_{j=0}^{k-1} \alpha_j r^{(j)}$ is a polynomial in A of degree at most k − 1 applied to $r^{(0)}$.
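These relations are straightforward to verify numerically. The sketch below (illustrative; the matrix and the parameters $\alpha_j$ are arbitrary) checks that the Richardson residual satisfies (4.51) and that $x^{(k)} - x^{(0)}$ lies in $K_k(A; r^{(0)})$:

```python
import numpy as np

n = 5
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
alphas = [0.3, 0.5, 0.4]            # arbitrary Richardson parameters alpha_j
x0 = np.zeros(n)
r0 = b - A @ x0

x = x0.copy()
for a in alphas:
    x = x + a * (b - A @ x)         # Richardson step: x <- x + alpha_j r^(j)
rk = b - A @ x

# (4.51): r^(k) = prod_j (I - alpha_j A) r^(0)
prod = r0.copy()
for a in alphas:
    prod = prod - a * (A @ prod)
assert np.allclose(rk, prod)

# x^(k) - x^(0) lies in K_3(A; r^(0)) = span{r0, A r0, A^2 r0}
K = np.column_stack([r0, A @ r0, A @ A @ r0])
coef, _, _, _ = np.linalg.lstsq(K, x - x0, rcond=None)
assert np.allclose(K @ coef, x - x0)
```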

In the nonpreconditioned Richardson method we are thus looking for an approximate solution to x in the space $W_k$. More generally, we can think of devising methods that search for approximate solutions of the form

$$
x^{(k)} = x^{(0)} + q_{k-1}(A)\, r^{(0)}, \qquad (4.54)
$$

where $q_{k-1}$ is a polynomial selected in such a way that $x^{(k)}$ be, in a sense that must be made precise, the best approximation of x in $W_k$. A method that looks for a solution of the form (4.54) with $W_k$ defined as in (4.53) is called a Krylov method.

A first question concerning Krylov subspace iterations is whether the dimension of $K_m(A; v)$ increases as the order m grows. A partial answer is provided by the following result.

Property 4.7 Let $A \in \mathbb{R}^{n\times n}$ and $v \in \mathbb{R}^n$. The Krylov subspace $K_m(A; v)$ has dimension equal to m iff the degree of v with respect to A, denoted by $\deg_A(v)$, is not less than m, where the degree of v is defined as the minimum degree of a monic nonnull polynomial p for which p(A)v = 0.

The dimension of $K_m(A; v)$ is thus equal to the minimum between m and the degree of v with respect to A and, as a consequence, the dimension of the Krylov subspaces is certainly a nondecreasing function of m. Notice that the degree of v cannot be greater than n, due to the Cayley-Hamilton Theorem (see Section 1.7).

Example 4.8 Consider the matrix $A = \mathrm{tridiag}_4(-1, 2, -1)$. The vector $v = (1, 1, 1, 1)^T$ has degree 2 with respect to A, since $p_2(A)v = 0$ with $p_2(A) = I_4 - 3A + A^2$, while there is no monic polynomial $p_1$ of degree 1 for which $p_1(A)v = 0$. As a consequence, all Krylov subspaces from $K_2(A; v)$ on have dimension equal to 2. The vector $w = (1, 1, -1, 1)^T$ has, instead, degree 4 with respect to A. •
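The claims of this example can be checked directly (an illustrative verification, not part of the original text):

```python
import numpy as np

A = 2 * np.eye(4) - np.eye(4, k=1) - np.eye(4, k=-1)   # tridiag_4(-1, 2, -1)
v = np.ones(4)

p2_of_A = np.eye(4) - 3 * A + A @ A
assert np.allclose(p2_of_A @ v, 0)        # p_2(A)v = 0, so deg_A(v) <= 2

K = np.column_stack([v, A @ v, A @ A @ v, A @ A @ A @ v])
assert np.linalg.matrix_rank(K) == 2      # Krylov subspaces stop growing at dimension 2

w = np.array([1.0, 1.0, -1.0, 1.0])
Kw = np.column_stack([w, A @ w, A @ A @ w, A @ A @ A @ w])
assert np.linalg.matrix_rank(Kw) == 4     # w has degree 4 with respect to A
```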

For a fixed m, it is possible to compute an orthonormal basis for $K_m(A; v)$ using the so-called Arnoldi algorithm. Setting $v_1 = v/\|v\|_2$, this method generates an orthonormal basis $\{v_i\}$ for $K_m(A; v_1)$ using the Gram-Schmidt procedure (see Section 3.4.3). For k = 1, . . . , m, the Arnoldi algorithm computes

$$
\begin{aligned}
h_{ik} &= v_i^T A v_k, \qquad i = 1, 2, \dots, k,\\
w_k &= A v_k - \sum_{i=1}^{k} h_{ik} v_i, \qquad h_{k+1,k} = \|w_k\|_2.
\end{aligned}
\qquad (4.55)
$$

If $w_k = 0$ the process terminates and in such a case we say that a breakdown of the algorithm has occurred; otherwise, we set $v_{k+1} = w_k/\|w_k\|_2$ and the algorithm restarts, incrementing k by 1.


It can be shown that if the method terminates at the step m, then the vectors $v_1, \dots, v_m$ form a basis for $K_m(A; v)$. In such a case, if we denote by $V_m \in \mathbb{R}^{n\times m}$ the matrix whose columns are the vectors $v_i$, we have

$$
V_m^T A V_m = H_m, \qquad V_{m+1}^T A V_m = \widehat{H}_m, \qquad (4.56)
$$

where $\widehat{H}_m \in \mathbb{R}^{(m+1)\times m}$ is the upper Hessenberg matrix whose entries $h_{ij}$ are given by (4.55), and $H_m \in \mathbb{R}^{m\times m}$ is the restriction of $\widehat{H}_m$ to the first m rows and m columns.

The algorithm terminates at an intermediate step k < m iff $\deg_A(v_1) = k$. As for the stability of the procedure, all the considerations valid for the Gram-Schmidt method hold. For more efficient and stable computational variants of (4.55), we refer to [Saa96].
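As an illustration of (4.55)-(4.56), here is a compact NumPy Arnoldi sketch (not the book's program; it uses the modified Gram-Schmidt variant for numerical stability, which in exact arithmetic produces the same coefficients $h_{ik} = v_i^T A v_k$ as above):

```python
import numpy as np

def arnoldi(A, v, m):
    # Builds an orthonormal basis V of K_m(A; v) and the Hessenberg matrix H.
    n = len(v)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v / np.linalg.norm(v)
    for k in range(m):
        w = A @ V[:, k]
        for i in range(k + 1):
            H[i, k] = V[:, i] @ w          # h_{ik}
            w = w - H[i, k] * V[:, i]
        H[k + 1, k] = np.linalg.norm(w)    # h_{k+1,k}
        if H[k + 1, k] <= 1e-12:           # breakdown: deg_A(v) reached
            return V[:, :k + 1], H[:k + 1, :k + 1]
        V[:, k + 1] = w / H[k + 1, k]
    return V, H

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
v = rng.standard_normal(6)
m = 4
V, H = arnoldi(A, v, m)
Vm = V[:, :m]
assert np.allclose(Vm.T @ Vm, np.eye(m))        # orthonormal basis
assert np.allclose(Vm.T @ A @ Vm, H[:m, :m])    # V_m^T A V_m = H_m
assert np.allclose(V.T @ A @ Vm, H)             # V_{m+1}^T A V_m = H_hat_m
```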

The functions arnoldi_alg and GSarnoldi, invoked by Program 21, provide an implementation of the Arnoldi algorithm. In output, the columns of V contain the vectors of the generated basis, while the matrix H stores the coefficients $h_{ik}$ computed by the algorithm. If m steps are carried out, $V = V_m$ and $H(1:m, 1:m) = H_m$.

Program 21 - arnoldi_alg : The Arnoldi algorithm

function [V,H]=arnoldi_alg(A,v,m)
v=v/norm(v,2); V=[v]; H=[]; k=0;
while k <= m-1
   [k,V,H] = GSarnoldi(A,m,k,V,H);
end

function [k,V,H]=GSarnoldi(A,m,k,V,H)
k=k+1; H=[H,V(:,1:k)'*A*V(:,k)];
s=0; for i=1:k, s=s+H(i,k)*V(:,i); end
w=A*V(:,k)-s; H(k+1,k)=norm(w,2);
if ( H(k+1,k) > eps ) & ( k < m )
   V=[V,w/H(k+1,k)];
else
   k=m+1;
end

Having introduced an algorithm for generating the basis for a Krylov subspace of any order, we can now solve the linear system (3.2) by a Krylov method. As already noticed, for all of these methods the iterate $x^{(k)}$ is always of the form (4.54) and, for a given $r^{(0)}$, the vector $x^{(k)}$ is selected as being the unique element in $W_k$ which satisfies a criterion of minimal distance from x. Thus, the feature distinguishing two different Krylov methods is the criterion for selecting $x^{(k)}$.

The most natural idea consists of searching for $x^{(k)} \in W_k$ as the vector which minimizes the Euclidean norm of the error. This approach, how-