
$$
\|e^{(k+1)}\|_A^2
= \min_{q_{k+1}\in\mathbb{P}^{0,1}_{k+1}} x^T Q\, q_{k+1}(\Lambda)\,\Lambda\, q_{k+1}(\Lambda)\, Q^T x
= \min_{q_{k+1}\in\mathbb{P}^{0,1}_{k+1}} y^T \mathrm{diag}\bigl(q_{k+1}(\lambda_i)\,\lambda_i\, q_{k+1}(\lambda_i)\bigr)\, y
= \min_{q_{k+1}\in\mathbb{P}^{0,1}_{k+1}} \sum_{i=1}^{n} y_i^2\,\lambda_i\,\bigl(q_{k+1}(\lambda_i)\bigr)^2,
$$

having set y = Q^T x. Thus, we can conclude that

$$
\|e^{(k+1)}\|_A^2 \le \min_{q_{k+1}\in\mathbb{P}^{0,1}_{k+1}}\;\max_{\lambda_i\in\sigma(A)} \bigl(q_{k+1}(\lambda_i)\bigr)^2 \,\sum_{i=1}^{n} y_i^2\,\lambda_i.
$$

Recalling that $\sum_{i=1}^{n} y_i^2 \lambda_i = \|e^{(0)}\|_A^2$, we have

$$
\frac{\|e^{(k+1)}\|_A}{\|e^{(0)}\|_A} \le \min_{q_{k+1}\in\mathbb{P}^{0,1}_{k+1}}\;\max_{\lambda_i\in\sigma(A)} |q_{k+1}(\lambda_i)|.
$$


Let us now recall the following property.

Property 4.6 The problem of minimizing $\max_{\lambda_n\le z\le\lambda_1}|q(z)|$ over the space $\mathbb{P}^{0,1}_{k+1}([\lambda_n,\lambda_1])$ admits a unique solution, given by the polynomial

$$
p_{k+1}(\xi) = T_{k+1}\!\left(\frac{\lambda_1+\lambda_n-2\xi}{\lambda_1-\lambda_n}\right)\Big/ C_{k+1}, \qquad \xi\in[\lambda_n,\lambda_1],
$$

where $C_{k+1} = T_{k+1}\!\left(\frac{\lambda_1+\lambda_n}{\lambda_1-\lambda_n}\right)$ and $T_{k+1}$ is the Chebyshev polynomial of degree k + 1 (see Section 10.10). The value of the minimum is $1/C_{k+1}$.

Using this property we get

$$
\frac{\|e^{(k+1)}\|_A}{\|e^{(0)}\|_A} \le \frac{1}{T_{k+1}\!\left(\dfrac{\lambda_1+\lambda_n}{\lambda_1-\lambda_n}\right)},
$$

from which the thesis follows, since in the case of a symmetric positive definite matrix

$$
\frac{1}{C_{k+1}} = \frac{2\,c^{k+1}}{1 + c^{2(k+1)}}.
$$
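The last identity can be checked directly; assuming, as in estimate (4.47), that c denotes the quantity $(\sqrt{K_2(A)}-1)/(\sqrt{K_2(A)}+1)$ with $K_2(A)=\lambda_1/\lambda_n$, one can use the representation $T_{k+1}(x)=\tfrac12\bigl[(x+\sqrt{x^2-1})^{k+1}+(x-\sqrt{x^2-1})^{k+1}\bigr]$, valid for $x\ge 1$. Indeed

$$
x = \frac{\lambda_1+\lambda_n}{\lambda_1-\lambda_n} = \frac{K_2(A)+1}{K_2(A)-1},
\qquad
x+\sqrt{x^2-1} = \frac{\sqrt{K_2(A)}+1}{\sqrt{K_2(A)}-1} = \frac{1}{c},
$$

so that

$$
C_{k+1} = \frac{1}{2}\bigl(c^{-(k+1)} + c^{k+1}\bigr)
\quad\Longrightarrow\quad
\frac{1}{C_{k+1}} = \frac{2\,c^{k+1}}{1 + c^{2(k+1)}}.
$$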

The generic k-th iteration of the conjugate gradient method is well defined only if the descent direction p(k) is non null. Besides, if p(k) = 0, then the iterate x(k) must necessarily coincide with the solution x of the system. Moreover, irrespective of the choice of the parameters βk, one can show (see [Axe94], p. 463) that the sequence x(k) generated by the CG method is such that either x(k) ≠ x, p(k) ≠ 0, αk ≠ 0 for any k, or there must exist an integer m such that x(m) = x, where x(k) ≠ x, p(k) ≠ 0 and αk ≠ 0 for k = 0, 1, . . . , m − 1.
The particular choice made for βk in (4.45) ensures that m ≤ n. In the absence of rounding errors, the CG method can thus be regarded as a direct method, since it terminates after a finite number of steps. However, for matrices of large size, it is usually employed as an iterative scheme,

where the iterations are stopped when the error gets below a fixed tolerance. In this respect, the dependence of the error reduction factor on the condition number of the matrix is more favorable than for the gradient method. We also notice that estimate (4.47) is often overly pessimistic and does not account for the fact that in this method, unlike what happens for the gradient method, the convergence is influenced by the whole spectrum of A, and not only by its extremal eigenvalues.
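To give a feeling for the gap between the two methods, the following MATLAB sketch compares, for a sample condition number, the gradient-method reduction factor ((K2(A) − 1)/(K2(A) + 1))^k with the CG factor 2c^k/(1 + c^{2k}); it assumes that estimate (4.47) and the classical gradient-method bound have these standard forms, with c as above, which are not shown explicitly in this excerpt.

% Compare the theoretical A-norm error reduction factors of the gradient
% and conjugate gradient methods for a sample condition number K = K2(A).
% (Illustrative sketch; the bound forms are assumed, see the lead-in.)
K = 100;                                 % sample condition number
c = (sqrt(K)-1)/(sqrt(K)+1);             % CG contraction constant
k = (0:50)';
grad_bound = ((K-1)/(K+1)).^k;           % gradient method bound
cg_bound   = 2*c.^k ./ (1 + c.^(2*k));   % CG bound
semilogy(k, grad_bound, '--', k, cg_bound, '-')
xlabel('k'), legend('gradient bound', 'CG bound')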




Remark 4.3 (Effect of rounding errors) The termination property of the CG method is rigorously valid only in exact arithmetic. Accumulating rounding errors prevent the descent directions from being A-conjugate and can even generate null denominators in the computation of the coefficients αk and βk. This latter phenomenon, known as breakdown, can be avoided by introducing suitable stabilization procedures; in such an event, we speak of stabilized gradient methods.
Despite the use of these strategies, it may happen that the CG method fails to converge (in finite arithmetic) after n iterations. In such a case, the only reasonable possibility is to restart the iterative process, taking as residual the last computed one. This yields the cyclic CG method, or CG method with restart, for which, however, the convergence properties of the original CG method are no longer valid.
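A minimal sketch of the restart strategy is shown below; it reuses the routine conjgrad of Program 20 in the next section (here with the identity as a trivial preconditioner), and the test matrix, right-hand side and number of restart cycles are purely illustrative.

% Cyclic (restarted) CG: after each block of at most n iterations the
% process is restarted from the last computed iterate, i.e. from the
% last computed residual. Sketch based on Program 20.
n = 50; A = full(gallery('tridiag', n, -1, 2, -1));   % sample SPD matrix
b = ones(n,1); x = zeros(n,1); P = eye(n);            % no preconditioning
tol = 1e-10; ncycles = 5;
for cycle = 1:ncycles
  [x, err, niter, flag] = conjgrad(A, x, b, P, n, tol);
  if flag == 0, break, end     % tolerance reached within this cycle
end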




4.3.5 The Preconditioned Conjugate Gradient Method
If P is a symmetric and positive definite preconditioning matrix, the preconditioned conjugate gradient method (PCG) consists of applying the CG method to the preconditioned system

$$
P^{-1/2} A P^{-1/2} y = P^{-1/2} b, \qquad \text{with } y = P^{1/2} x.
$$


In practice, the method is implemented without explicitly requiring the computation of P^{1/2} or P^{-1/2}. After some algebra, the following scheme is obtained: given x(0), and setting r(0) = b − Ax(0), z(0) = P^{-1} r(0) and p(0) = z(0), the k-th iteration reads



$$
\begin{aligned}
\alpha_k &= \frac{{p^{(k)}}^T r^{(k)}}{{p^{(k)}}^T A p^{(k)}},\\
x^{(k+1)} &= x^{(k)} + \alpha_k\, p^{(k)},\\
r^{(k+1)} &= r^{(k)} - \alpha_k\, A p^{(k)},\\
P z^{(k+1)} &= r^{(k+1)},\\
\beta_k &= \frac{(A p^{(k)})^T z^{(k+1)}}{(A p^{(k)})^T p^{(k)}},\\
p^{(k+1)} &= z^{(k+1)} - \beta_k\, p^{(k)}.
\end{aligned}
$$

The computational cost is increased with respect to the CG method, as one needs to solve at each step the linear system Pz(k+1) = r(k+1). For this system the symmetric preconditioners examined in Section 4.3.2 can be used. The error estimate is the same as for the nonpreconditioned method, provided the matrix A is replaced by P^{-1}A.

In Program 20 an implementation of the PCG method is reported. For
a description of the input/output parameters, see Program 19.

Program 20 - conjgrad : Preconditioned conjugate gradient method
function [x, error, niter, flag] = conjgrad(A, x, b, P, maxit, tol)
% CONJGRAD Preconditioned conjugate gradient method for Ax = b.
flag = 0; niter = 0; bnrm2 = norm( b );
if ( bnrm2 == 0.0 ), bnrm2 = 1.0; end
r = b - A*x; error = norm( r ) / bnrm2;
if ( error < tol ), return, end
for niter = 1:maxit
  z = P \ r; rho = (r'*z);
  if niter > 1
    beta = rho / rho1; p = z + beta*p;
  else
    p = z;
  end
  q = A*p; alpha = rho / (p'*q);
  x = x + alpha*p; r = r - alpha*q;
  error = norm( r ) / bnrm2;
  if ( error <= tol ), break, end
  rho1 = rho;
end
if ( error > tol ), flag = 1; end
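A possible calling sequence (with illustrative data, and the diagonal of A as a simple Jacobi-type choice for P) is the following.

% Solve a sample SPD tridiagonal system with PCG, using the diagonal
% (Jacobi) preconditioner P = diag(diag(A)). Illustrative driver for
% Program 20.
n = 100; A = full(gallery('tridiag', n, -1, 2, -1));
b = ones(n,1); x0 = zeros(n,1);
P = diag(diag(A));
[x, err, niter, flag] = conjgrad(A, x0, b, P, n, 1e-12);
fprintf('iterations = %d, relative residual = %e\n', niter, err)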

Example 4.7 Let us consider again the linear system of Example 4.6. The CG method has been run with the same input data as in the previous example. It converges in 3 iterations for m = 16 and in 45 iterations for m = 400. Using the same preconditioner as in Example 4.6, the number of iterations decreases from 45 to 26 in the case m = 400.


FIGURE 4.8. Behavior of the residual, normalized to the right-hand side, as a function of the number of iterations for the conjugate gradient method applied to the system of Example 4.6 in the case m = 400. The curve in dashed line refers to the nonpreconditioned method, while the curve in solid line refers to the preconditioned one



4.3.6 The Alternating-Direction Method
Assume that A = A1 + A2, with A1 and A2 symmetric and positive definite. The alternating-direction method (ADI), as introduced by Peaceman and Rachford [PJ55], is an iterative scheme for (3.2) which consists of solving the following systems for all k ≥ 0

$$
\begin{aligned}
(I + \alpha_1 A_1)\, x^{(k+1/2)} &= (I - \alpha_1 A_2)\, x^{(k)} + \alpha_1 b,\\
(I + \alpha_2 A_2)\, x^{(k+1)} &= (I - \alpha_2 A_1)\, x^{(k+1/2)} + \alpha_2 b,
\end{aligned}
\qquad (4.50)
$$
where α1 and α2 are two real parameters. The ADI method can be cast in the form (4.2) by setting

$$
B = (I + \alpha_2 A_2)^{-1} (I - \alpha_2 A_1)(I + \alpha_1 A_1)^{-1}(I - \alpha_1 A_2),
\qquad
f = (I + \alpha_2 A_2)^{-1}\bigl[\alpha_1 (I - \alpha_2 A_1)(I + \alpha_1 A_1)^{-1} + \alpha_2 I\bigr]\, b.
$$
Both B and f depend on α1 and α2. The following estimate holds

$$
\rho(B) \le \max_{i=1,\ldots,n}\left|\frac{1 - \alpha_2 \lambda_i^{(1)}}{1 + \alpha_1 \lambda_i^{(1)}}\right|\,
\max_{i=1,\ldots,n}\left|\frac{1 - \alpha_1 \lambda_i^{(2)}}{1 + \alpha_2 \lambda_i^{(2)}}\right|,
$$

where λ_i^{(1)} and λ_i^{(2)}, for i = 1, . . . , n, are the eigenvalues of A1 and A2, respectively. The method converges if ρ(B) < 1, which is always verified if α1 = α2 = α > 0. Moreover (see [Axe94]), if γ ≤ λ_i^{(j)} ≤ δ for all i = 1, . . . , n and j = 1, 2, for suitable γ and δ, then the ADI method converges with the choice α1 = α2 = 1/√(δγ), provided that γ/δ tends to 0 as the size of A grows. In such an event the corresponding spectral radius satisfies

$$
\rho(B) \le \left(\frac{1 - \sqrt{\gamma/\delta}}{1 + \sqrt{\gamma/\delta}}\right)^{2}.
$$
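A minimal MATLAB sketch of the ADI iteration (4.50) is given below; the splitting A1, A2, the parameters α1, α2 and the number of iterations are illustrative choices, not prescribed by the method.

% Peaceman-Rachford ADI iteration (4.50) for A = A1 + A2, with A1, A2
% symmetric positive definite. Illustrative sketch.
n  = 50;
A1 = full(gallery('tridiag', n, -1, 2, -1));    % sample SPD matrix
A2 = 2*eye(n);                                  % so that A = A1 + A2 is SPD
b  = ones(n,1); x = zeros(n,1); I = eye(n);
alpha1 = 1; alpha2 = 1;                         % equal positive parameters
for k = 1:100
  xh = (I + alpha1*A1) \ ((I - alpha1*A2)*x  + alpha1*b);   % half step
  x  = (I + alpha2*A2) \ ((I - alpha2*A1)*xh + alpha2*b);   % full step
end
resid = norm(b - (A1 + A2)*x) / norm(b);        % final relative residual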


4.4 Methods Based on Krylov Subspace Iterations
In this section we introduce iterative methods based on Krylov subspace
iterations. For the proofs and further analysis, we refer to [Saa96], [Axe94]
and [Hac94].

Consider the Richardson method (4.24) with P = I; the residual at the k-th step can be related to the initial residual as

$$
r^{(k)} = \prod_{j=0}^{k-1} (I - \alpha_j A)\, r^{(0)}, \qquad (4.51)
$$


so that r(k) = pk(A)r(0), where pk(A) is a polynomial in A of degree k. If we introduce the space

$$
K_m(A; v) = \mathrm{span}\bigl\{ v, Av, \ldots, A^{m-1}v \bigr\}, \qquad (4.52)
$$

it immediately appears from (4.51) that r(k) ∈ Kk+1(A; r(0)). The space defined in (4.52) is called the Krylov subspace of order m. It is the subspace of R^n formed by all the vectors u that can be written as u = pm−1(A)v, where pm−1 is a polynomial in A of degree ≤ m − 1.
In an analogous manner as for (4.51), it is seen that the iterate x(k) of the Richardson method is given by

$$
x^{(k)} = x^{(0)} + \sum_{j=0}^{k-1} \alpha_j\, r^{(j)},
$$

so that x(k) belongs to the following space

$$
W_k = \bigl\{ v = x^{(0)} + y,\ y \in K_k(A; r^{(0)}) \bigr\}. \qquad (4.53)
$$

Notice also that $\sum_{j=0}^{k-1}\alpha_j r^{(j)}$ can be written as qk−1(A)r(0), where qk−1 is a polynomial in A of degree at most k − 1. In the nonpreconditioned Richardson method we are thus looking for an

approximate solution to x in the space Wk. More generally, we can think of devising methods that search for approximate solutions of the form

$$
x^{(k)} = x^{(0)} + q_{k-1}(A)\, r^{(0)}, \qquad (4.54)
$$

where qk−1 is a polynomial selected in such a way that x(k) be, in a sense that must be made precise, the best approximation of x in Wk. A method that looks for a solution of the form (4.54) with Wk defined as in (4.53) is called a Krylov method.
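As a concrete check of this containment, the following MATLAB sketch runs a few steps of the stationary Richardson method (P = I, fixed α) and verifies, via a least-squares fit, that x(k) − x(0) lies in Kk(A; r(0)); matrix, right-hand side and parameter values are illustrative.

% Check that the Richardson iterates belong to W_k = x0 + K_k(A; r0), cf. (4.53).
n = 20; A = full(gallery('tridiag', n, -1, 2, -1)); b = ones(n,1);
alpha = 0.4;                           % fixed acceleration parameter
x0 = zeros(n,1); r0 = b - A*x0;
k = 5; x = x0;
for j = 1:k
  x = x + alpha*(b - A*x);             % stationary Richardson step
end
K = zeros(n,k); K(:,1) = r0;           % Krylov basis [r0, A r0, ..., A^{k-1} r0]
for j = 2:k, K(:,j) = A*K(:,j-1); end
dist = norm((x - x0) - K*(K \ (x - x0)));   % distance from K_k(A; r0), ~ roundoff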
A first question concerning Krylov subspace iterations is whether the dimension of Km(A; v) increases as the order m grows. A partial answer is provided by the following result.

Property 4.7 Let A ∈ R^{n×n} and v ∈ R^n. The Krylov subspace Km(A; v) has dimension equal to m iff the degree of v with respect to A, denoted by degA(v), is not less than m, where the degree of v is defined as the minimum degree of a monic non-null polynomial p for which p(A)v = 0.

The dimension of Km(A; v) is thus equal to the minimum between m and the degree of v with respect to A and, as a consequence, the dimension of the Krylov subspaces is certainly a nondecreasing function of m. Notice that the degree of v cannot be greater than n, due to the Cayley-Hamilton Theorem (see Section 1.7).

Example 4.8 Consider the matrix A = tridiag4(−1, 2, −1). The vector v = (1, 1, 1, 1)^T has degree 2 with respect to A since p2(A)v = 0 with p2(A) = I4 − 3A + A^2, while there is no monic polynomial p1 of degree 1 for which p1(A)v = 0. As a consequence, all Krylov subspaces from K2(A; v) on have dimension equal to 2. The vector w = (1, 1, −1, 1)^T has, instead, degree 4 with respect to A. •
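The statement of the example is easy to verify numerically; the sketch below computes the dimension of Km(A; v) and Km(A; w) as the rank of the matrices [v, Av, . . . , A^{m−1}v] for m = 1, . . . , 4 (variable names are illustrative).

% Numerical check of Example 4.8: dim K_m(A;v) stalls at 2, dim K_m(A;w) = m.
A = full(gallery('tridiag', 4, -1, 2, -1));
v = [1 1 1 1]'; w = [1 1 -1 1]';
for m = 1:4
  Kv = v; Kw = w;
  for j = 2:m, Kv = [Kv, A*Kv(:,end)]; Kw = [Kw, A*Kw(:,end)]; end
  fprintf('m = %d: dim K_m(A;v) = %d, dim K_m(A;w) = %d\n', ...
          m, rank(Kv), rank(Kw));
end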


For a ¬xed m, it is possible to compute an orthonormal basis for Km (A; v)
using the so-called Arnoldi algorithm.
Setting v1 = v/ v 2 , this method generates an orthonormal basis {vi }
for Km (A; v1 ) using the Gram-Schmidt procedure (see Section 3.4.3). For
k = 1, . . . , m, the Arnoldi algorithm computes
T
hik = vi Avk , i = 1, 2, . . . , k,
(4.55)
k
wk = Avk ’ hik vi , hk+1,k = wk 2.
i=1

If wk = 0 the process terminates and in such a case we say that a breakdown
of the algorithm has occurred; otherwise, we set vk+1 = wk / wk 2 and the
algorithm restarts, incrementing k by 1.

It can be shown that if the method terminates at the step m, then the vectors v1, . . . , vm form a basis for Km(A; v). In such a case, if we denote by Vm ∈ R^{n×m} the matrix whose columns are the vectors vi, we have

$$
V_m^T A V_m = H_m, \qquad V_{m+1}^T A V_m = \widehat{H}_m, \qquad (4.56)
$$

where $\widehat{H}_m$ ∈ R^{(m+1)×m} is the upper Hessenberg matrix whose entries hij are given by (4.55) and Hm ∈ R^{m×m} is the restriction of $\widehat{H}_m$ to the first m rows and m columns.
The algorithm terminates at an intermediate step k < m iff degA(v1) = k. As for the stability of the procedure, all the considerations valid for the Gram-Schmidt method hold. For more efficient and stable computational variants of (4.55), we refer to [Saa96].
The functions arnoldi_alg and GSarnoldi of Program 21 provide an implementation of the Arnoldi algorithm. In output, the columns of V contain the vectors of the generated basis, while the matrix H stores the coefficients hik computed by the algorithm. If m steps are carried out, V = Vm and H(1:m, 1:m) = Hm.


Program 21 - arnoldi alg : The Arnoldi algorithm
function [V,H] = arnoldi_alg(A,v,m)
% ARNOLDI_ALG Orthonormal basis V of K_m(A;v) and Hessenberg matrix H, cf. (4.55).
v = v/norm(v,2); V = [v]; H = []; k = 0;
while k <= m-1
  [k,V,H] = GSarnoldi(A,m,k,V,H);
end

function [k,V,H] = GSarnoldi(A,m,k,V,H)
% One Gram-Schmidt step (4.55) of the Arnoldi algorithm.
k = k+1; H = [H, V(:,1:k)'*A*V(:,k)];
s = 0; for i = 1:k, s = s + H(i,k)*V(:,i); end
w = A*V(:,k) - s; H(k+1,k) = norm(w,2);
if ( H(k+1,k) > eps ) & ( k < m )
  V = [V, w/H(k+1,k)];     % no breakdown: append the next basis vector
else
  k = m+1;                 % breakdown (w = 0) or last step: stop
end
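A quick numerical check of Program 21 and of relation (4.56) can be carried out as follows (illustrative data; no breakdown occurs for this choice of A, v and m).

% Driver for Program 21: orthonormality of V and the relation V'*A*V = H_m.
n = 30; m = 6;
A = full(gallery('tridiag', n, -1, 2, -1));
v = ones(n,1);
[V, H] = arnoldi_alg(A, v, m);
mk = size(V,2);                               % number of Arnoldi vectors built
orth_err = norm(V'*V - eye(mk));              % should be of the order of eps
hess_err = norm(V'*A*V - H(1:mk,1:mk));       % check of (4.56)
fprintf('orthonormality error = %e, Hessenberg error = %e\n', orth_err, hess_err)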


Having introduced an algorithm for generating the basis for a Krylov sub-
space of any order, we can now solve the linear system (3.2) by a Krylov
method. As already noticed, for all of these methods the iterate x(k) is
always of the form (4.54) and, for a given r(0) , the vector x(k) is selected as
being the unique element in Wk which satis¬es a criterion of minimal dis-
tance from x. Thus, the feature distinguishing two di¬erent Krylov methods
is the criterion for selecting x(k) .
The most natural idea consists of searching for x(k) ∈ Wk as the vector
which minimizes the Euclidean norm of the error. This approach, how-
