
In the context of GMRES iteration, however, we can incrementally perform
the QR factorization of $H$ as the GMRES iteration progresses [167]. To see
this, note that if $R_k = Q_k H_k$ and, after orthogonalization, we add the new
column $h_{k+2}$ to $H_k$, we can update both $Q_k$ and $R_k$ by first multiplying $h_{k+2}$
by $Q_k$ (that is, applying the first $k$ Givens rotations to $h_{k+2}$), then computing
the Givens rotation $G_{k+1}$ that annihilates the $(k+2)$nd element of $Q_k h_{k+2}$,
and finally setting $Q_{k+1} = G_{k+1} Q_k$ and forming $R_{k+1}$ by augmenting $R_k$ with
$G_{k+1} Q_k h_{k+2}$.
The MATLAB implementation of Algorithm gmres stores $Q_k$ by storing
the sequences $\{c_j\}$ and $\{s_j\}$ and then computing the action of $Q_k$ on a vector
$x \in R^{k+1}$ by applying $G_j(c_j, s_j)$ in turn to obtain

$$Q_k x = G_k(c_k, s_k) \dots G_1(c_1, s_1) x.$$

We overwrite the upper triangular part of $H_k$ with the triangular part of the
QR factorization of $H_k$ in the MATLAB code. The MATLAB implementation
of Algorithm gmres uses (3.10) to test for loss of orthogonality.
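To make the stored-rotation representation concrete, here is a hypothetical MATLAB helper (the name apply_qk and its interface are ours, not part of the book's code collection) that applies $Q_k$ to a vector $x \in R^{k+1}$ using the saved cosines and sines:

    % Apply Q_k x = G_k(c_k,s_k)...G_1(c_1,s_1) x from the stored rotations.
    % Hypothetical helper for illustration; not from the book's codes.
    function x = apply_qk(c, s, x, k)
      for j = 1:k
        t1 = c(j)*x(j) - s(j)*x(j+1);   % G_j acts on entries j and j+1
        t2 = s(j)*x(j) + c(j)*x(j+1);
        x(j) = t1; x(j+1) = t2;
      end
    end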
Algorithm 3.5.1. gmres(x, b, A, $\epsilon$, kmax, $\rho$)

1. $r = b - Ax$, $v_1 = r/\|r\|_2$, $\rho = \|r\|_2$, $\beta = \rho$,
   $k = 0$; $g = \rho(1, 0, \dots, 0)^T \in R^{kmax+1}$

2. While $\rho > \epsilon\|b\|_2$ and $k < kmax$ do

   (a) $k = k + 1$
   (b) $v_{k+1} = A v_k$
       for $j = 1, \dots, k$
         i. $h_{jk} = v_{k+1}^T v_j$
         ii. $v_{k+1} = v_{k+1} - h_{jk} v_j$
   (c) $h_{k+1,k} = \|v_{k+1}\|_2$
   (d) Test for loss of orthogonality and reorthogonalize if necessary.
   (e) $v_{k+1} = v_{k+1}/\|v_{k+1}\|_2$
   (f) i. If $k > 1$ apply $Q_{k-1}$ to the $k$th column of $H$.
       ii. $\nu = \sqrt{h_{k,k}^2 + h_{k+1,k}^2}$.
       iii. $c_k = h_{k,k}/\nu$, $s_k = -h_{k+1,k}/\nu$
            $h_{k,k} = c_k h_{k,k} - s_k h_{k+1,k}$, $h_{k+1,k} = 0$
       iv. $g = G_k(c_k, s_k) g$.
   (g) $\rho = |(g)_{k+1}|$.

3. Set $r_{i,j} = h_{i,j}$ for $1 \le i, j \le k$.
   Set $(w)_i = (g)_i$ for $1 \le i \le k$.
   Solve the upper triangular system $R y^k = w$.

4. $x_k = x_0 + V_k y^k$.
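A direct MATLAB transcription of Algorithm gmres, assuming a dense matrix $A$ and omitting the reorthogonalization test of step 2d, might look like the following sketch; the gmres.m in the book's code collection differs in details:

    % Sketch of Algorithm gmres; step 2d (reorthogonalization) is omitted
    % and a happy breakdown at H(k+1,k) == 0 is not handled.
    function x = gmres_sketch(x, b, A, epsilon, kmax)
      n = length(b);
      r = b - A*x; rho = norm(r);
      V = zeros(n, kmax+1); V(:,1) = r/rho;
      H = zeros(kmax+1, kmax);
      g = zeros(kmax+1, 1); g(1) = rho;
      c = zeros(kmax, 1); s = zeros(kmax, 1);
      k = 0;
      while rho > epsilon*norm(b) && k < kmax
        k = k + 1;
        V(:,k+1) = A*V(:,k);
        for j = 1:k                        % modified Gram-Schmidt
          H(j,k) = V(:,k+1)'*V(:,j);
          V(:,k+1) = V(:,k+1) - H(j,k)*V(:,j);
        end
        H(k+1,k) = norm(V(:,k+1));
        V(:,k+1) = V(:,k+1)/H(k+1,k);
        for j = 1:k-1                      % apply Q_{k-1} to column k of H
          t = c(j)*H(j,k) - s(j)*H(j+1,k);
          H(j+1,k) = s(j)*H(j,k) + c(j)*H(j+1,k);
          H(j,k) = t;
        end
        nu = sqrt(H(k,k)^2 + H(k+1,k)^2);  % form and apply G_k
        c(k) = H(k,k)/nu; s(k) = -H(k+1,k)/nu;
        H(k,k) = c(k)*H(k,k) - s(k)*H(k+1,k); H(k+1,k) = 0;
        g(k+1) = s(k)*g(k); g(k) = c(k)*g(k);
        rho = abs(g(k+1));
      end
      y = H(1:k,1:k) \ g(1:k);             % upper triangular solve
      x = x + V(:,1:k)*y;
    end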



Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR16.html.
Copyright ©1995 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.




46 ITERATIVE METHODS FOR LINEAR AND NONLINEAR EQUATIONS

We close with an example of an implementation of GMRES(m). This
implementation does not test for success and may, therefore, fail to terminate.
You are asked to repair this in exercise 7. Aside from the parameter m, the
arguments to Algorithm gmresm are the same as those for Algorithm gmres.
Algorithm 3.5.2. gmresm(x, b, A, $\epsilon$, kmax, m, $\rho$)

1. gmres(x, b, A, $\epsilon$, m, $\rho$)

2. $k = m$

3. While $\rho > \epsilon\|b\|_2$ and $k < kmax$ do

   (a) gmres(x, b, A, $\epsilon$, m, $\rho$)
   (b) $k = k + m$
The storage costs of $m$ iterations of gmres or of gmresm are the $m + 2$
vectors $b$, $x$, and $\{v_k\}_{k=1}^{m}$.
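Built on the gmres sketch above, a restarted driver in the spirit of Algorithm gmresm is shown below (a sketch; it recomputes the residual norm at each restart rather than passing $\rho$ back, and, like Algorithm 3.5.2, it never reports whether the tolerance was met):

    % Sketch of Algorithm gmresm; restarts gmres_sketch every m iterations.
    % Like Algorithm 3.5.2, it does not report success or failure.
    function x = gmresm_sketch(x, b, A, epsilon, kmax, m)
      x = gmres_sketch(x, b, A, epsilon, m);
      k = m;
      while norm(b - A*x) > epsilon*norm(b) && k < kmax
        x = gmres_sketch(x, b, A, epsilon, m);   % restart from current x
        k = k + m;
      end
    end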

3.6. Other methods for nonsymmetric systems
The methods for nonsymmetric linear systems that receive most attention in
this book, GMRES, CGNR, and CGNE, share the properties that they are easy
to implement, can be analyzed by a common residual polynomial approach, and
only terminate if an acceptable approximate solution has been found. CGNR
and CGNE have the disadvantages that a transpose-vector product must be
computed for each iteration and that the coefficient matrix $A^T A$ or $A A^T$ has
condition number the square of that of the original matrix. In § 3.8 we give
an example of how this squaring of the condition number can lead to failure
of the iteration. GMRES needs only matrix-vector products and uses A alone,
but, as we have seen, a basis for Kk must be stored to compute xk . Hence,
the storage requirements increase as the iteration progresses. For large and
ill-conditioned problems, it may be impossible to store enough basis vectors
and the iteration may have to be restarted. Restarting can seriously degrade
performance.
An ideal method would, like CG, only need matrix-vector products, be
based on some kind of minimization principle or conjugacy, have modest
storage requirements that do not depend on the number of iterations needed
for convergence, and converge in $N$ iterations for all nonsingular $A$. However,
as shown in [74], methods based on short-term recurrences such as CG that also satisfy
minimization or conjugacy conditions cannot be constructed for general
matrices. The methods we describe in this section fall short of the ideal, but
can still be quite useful. We discuss only a small subset of these methods and
refer the reader to [12] and [78] for pointers to more of the literature on this
subject. All the methods we present in this section require two matrix-vector
products for each iteration.
Consistently with our implementation of GMRES, we take the view that
preconditioners will be applied externally to the iteration. However, as with
CG, these methods can also be implemented in a manner that incorporates




the preconditioner and uses the residual for the original system to control
termination.

3.6.1. Bi-CG. The earliest such method, Bi-CG (Biconjugate gradient)
[122], [76], does not enforce a minimization principle; instead, the $k$th residual
must satisfy the bi-orthogonality condition

(3.16)  $r_k^T w = 0$ for all $w \in \hat{\mathcal{K}}_k$,

where
$$\hat{\mathcal{K}}_k = \mathrm{span}(\hat{r}_0, A^T \hat{r}_0, \dots, (A^T)^{k-1} \hat{r}_0)$$
is the Krylov space for $A^T$ and the vector $\hat{r}_0$. $\hat{r}_0$ is a user-supplied vector and is
often set to $r_0$. The algorithm gets its name because it produces sequences of
residuals $\{r_k\}$, $\{\hat{r}_k\}$ and search directions $\{p_k\}$, $\{\hat{p}_k\}$ such that bi-orthogonality
holds, i.e., $\hat{r}_k^T r_l = 0$ if $k \neq l$, and the search directions $\{p_k\}$ and $\{\hat{p}_k\}$ satisfy
the bi-conjugacy property

$$\hat{p}_k^T A p_l = 0 \ \text{if}\ k \neq l.$$

In the symmetric positive definite case (3.16) is equivalent to the minimization
principle (2.2) for CG [89].
Using the notation of Chapter 2 and [191] we give an implementation of Bi-CG
making the choice $\hat{r}_0 = r_0$. This algorithmic description is explained and
motivated in more detail in [122], [76], [78], and [191]. We do not recommend
use of Bi-CG and present this algorithm only as a basis for discussion of some
of the properties of this class of algorithms.
Algorithm 3.6.1. bicg(x, b, A, $\epsilon$, kmax)

1. $r = b - Ax$, $\hat{r} = r$, $\rho_0 = 1$, $\hat{p} = p = 0$, $k = 0$

2. Do While $\|r\|_2 > \epsilon\|b\|_2$ and $k < kmax$

   (a) $k = k + 1$
   (b) $\rho_k = \hat{r}^T r$, $\beta = \rho_k/\rho_{k-1}$
   (c) $p = r + \beta p$, $\hat{p} = \hat{r} + \beta \hat{p}$
   (d) $v = A p$
   (e) $\alpha = \rho_k/(\hat{p}^T v)$
   (f) $x = x + \alpha p$
   (g) $r = r - \alpha v$; $\hat{r} = \hat{r} - \alpha A^T \hat{p}$
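Although the book's code collection deliberately omits Bi-CG, a literal MATLAB transcription of Algorithm bicg (a sketch for discussion only, with no safeguards against the breakdowns described below) would be:

    % Sketch of Algorithm bicg with rhat = r at the start; no breakdown
    % safeguards. Not part of the book's code collection.
    function x = bicg_sketch(x, b, A, epsilon, kmax)
      r = b - A*x; rhat = r;
      rho_old = 1; p = zeros(size(b)); phat = p; k = 0;
      while norm(r) > epsilon*norm(b) && k < kmax
        k = k + 1;
        rho = rhat'*r; beta = rho/rho_old;   % breakdown if rho_old == 0
        p = r + beta*p; phat = rhat + beta*phat;
        v = A*p;
        alpha = rho/(phat'*v);               % breakdown if phat'*v == 0
        x = x + alpha*p;
        r = r - alpha*v;
        rhat = rhat - alpha*(A'*phat);       % the transpose-vector product
        rho_old = rho;
      end
    end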

Note that if $A = A^T$ is spd, Bi-CG produces the same iterations as CG
(but computes everything except $x$ twice). If, however, $A$ is not spd, there is
no guarantee that $\rho_k$ in step 2b or $\hat{p}^T A p$, the denominator in step 2e, will not
vanish. If either $\rho_{k-1} = 0$ or $\hat{p}^T A p = 0$ we say that a breakdown has taken
place. Even if these quantities are nonzero but very small, the algorithm can
become unstable or produce inaccurate results. While there are approaches




[80], [78], [148], [79], [81], to avoid some breakdowns in many variations of
Bi-CG, there are no methods that both limit storage and completely eliminate
the possibility of breakdown [74]. All of the methods we present in this section
can break down and should be implemented with that possibility in mind.
Once breakdown has occurred, one can restart the Bi-CG iteration or pass
the computation to another algorithm, such as GMRES. The possibility of
breakdown is small and certainly should not eliminate the algorithms discussed
below from consideration if there is not enough storage for GMRES to perform
well.
Breakdowns aside, there are other problems with Bi-CG. A transpose-
vector product is needed, which at the least will require additional program-
ming and may not be possible at all. The performance of the algorithm can
be erratic or even unstable with residuals increasing by several orders of mag-
nitude from one iteration to the next. Moreover, the effort in computing $\hat{r}$
at each iteration is wasted in that $\hat{r}$ makes no contribution to $x$. However,
Bi-CG sometimes performs extremely well and the remaining algorithms in
this section represent attempts to capture this good performance and damp
the erratic behavior when it occurs.
We can compare the best-case performance of Bi-CG with that of GMRES.
Note that there is $\bar{p}_k \in \mathcal{P}_k$ such that both

(3.17)  $r_k = \bar{p}_k(A) r_0$ and $\hat{r}_k = \bar{p}_k(A^T) \hat{r}_0$.

Hence, by the minimization property for GMRES

$$\| r_k^{GMRES} \|_2 \le \| r_k^{Bi-CG} \|_2,$$

reminding us that GMRES, if sufficient storage is available, will always
reduce the residual more rapidly than Bi-CG (in terms of iterations, but not
necessarily in terms of computational work). One should also keep in mind that
a single Bi-CG iteration requires two matrix-vector products and a GMRES
iterate only one, but that the cost of the GMRES iteration increases (in terms
of floating-point operations) as the iteration progresses.

3.6.2. CGS. A remedy for one of the problems with Bi-CG is the Conjugate
Gradient Squared (CGS) algorithm [180]. The algorithm takes advantage of
the fact that (3.17) implies that the scalar product $\hat{r}^T r$ in Bi-CG (step 2b of
Algorithm bicg) can be represented without using $A^T$ as

$$\hat{r}_k^T r_k = (\bar{p}_k(A) r_0)^T (\bar{p}_k(A^T) \hat{r}_0) = (\bar{p}_k(A)^2 r_0)^T \hat{r}_0.$$

The other references to $A^T$ can be eliminated in a similar fashion and an
iteration satisfying

(3.18)  $r_k = \bar{p}_k(A)^2 r_0$

is produced, where $\bar{p}_k$ is the same polynomial produced by Bi-CG and used in
(3.17). This explains the name, Conjugate Gradient Squared.




The work used in Bi-CG to compute $\hat{r}$ is now used to update $x$. CGS
replaces the transpose-vector product with an additional matrix-vector product
and applies the square of the Bi-CG polynomial to $r_0$ to produce $r_k$. This may,
of course, change the convergence properties for the worse and either improve
good convergence or magnify erratic behavior [134], [191]. CGS has the same
potential for breakdown as Bi-CG. Since $\bar{p}_k^2 \in \mathcal{P}_{2k}$ we have

(3.19)  $\| r_{2k}^{GMRES} \|_2 \le \| r_k^{CGS} \|_2$.

We present an algorithmic description of CGS that we will refer to in our
discussion of TFQMR in § 3.6.4. In our description of CGS we will index the
vectors and scalars that would be overwritten in an implementation so that we
can describe some of the ideas in TFQMR later on. This description is taken
from [77].
Algorithm 3.6.2. cgs(x, b, A, $\epsilon$, kmax)

1. $x_0 = x$; $p_0 = u_0 = r_0 = b - Ax$

2. $v_0 = A p_0$; $\hat{r}_0 = r_0$

3. $\rho_0 = \hat{r}_0^T r_0$; $k = 0$

4. Do While $\|r\|_2 > \epsilon\|b\|_2$ and $k < kmax$

   (a) $k = k + 1$
   (b) $\sigma_{k-1} = \hat{r}_0^T v_{k-1}$
   (c) $\alpha_{k-1} = \rho_{k-1}/\sigma_{k-1}$
   (d) $q_k = u_{k-1} - \alpha_{k-1} v_{k-1}$
   (e) $x_k = x_{k-1} + \alpha_{k-1}(u_{k-1} + q_k)$
       $r_k = r_{k-1} - \alpha_{k-1} A(u_{k-1} + q_k)$
   (f) $\rho_k = \hat{r}_0^T r_k$; $\beta_k = \rho_k/\rho_{k-1}$
       $u_k = r_k + \beta_k q_k$
       $p_k = u_k + \beta_k(q_k + \beta_k p_{k-1})$
       $v_k = A p_k$
Breakdowns take place when either $\rho_{k-1}$ or $\sigma_{k-1}$ vanish. If the algorithm
does not break down, then $\alpha_{k-1} \neq 0$ for all $k$. One can show [180], [77], that
if $\bar{p}_k$ is the residual polynomial from (3.17) and (3.18) then

(3.20)  $\bar{p}_k(z) = \bar{p}_{k-1}(z) - \alpha_{k-1} z \bar{q}_{k-1}(z)$

where the auxiliary polynomials $\bar{q}_k \in \mathcal{P}_k$ are given by $\bar{q}_0 = 1$ and for $k \ge 1$ by

(3.21)  $\bar{q}_k(z) = \bar{p}_k(z) + \beta_k \bar{q}_{k-1}(z)$.

We provide no MATLAB implementation of Bi-CG or CGS. Bi-CGSTAB
is the most effective representative of this class of algorithms.




3.6.3. Bi-CGSTAB. The Bi-conjugate gradient stabilized method (Bi-
CGSTAB) [191] attempts to smooth the convergence of CGS by replacing
(3.18) with

(3.22)  $r_k = q_k(A) \bar{p}_k(A) r_0$

where
$$q_k(z) = \prod_{i=1}^{k} (1 - \omega_i z).$$
The constant $\omega_i$ is selected to minimize $\|r_i\|_2 = \|q_i(A) \bar{p}_i(A) r_0\|_2$ as a function of $\omega_i$.
The examples in [191] indicate the effectiveness of this approach, which can
be thought of [12] as a blend of Bi-CG and GMRES(1). The performance in
cases where the spectrum has a significant imaginary part can be improved by
constructing $q_k$ to have complex conjugate pairs of roots [98].
There is no convergence theory for Bi-CGSTAB and no estimate of the
residuals that is better than that for CGS, (3.19). In our description of the
implementation, which is described and motivated fully in [191], we use $\hat{r}_0 = r_0$
and follow the notation of [191].
Algorithm 3.6.3. bicgstab(x, b, A, $\epsilon$, kmax)

1. $r = b - Ax$, $\hat{r}_0 = \hat{r} = r$, $\rho_0 = \alpha = \omega = 1$, $v = p = 0$, $k = 0$, $\rho_1 = \hat{r}_0^T r$

2. Do While $\|r\|_2 > \epsilon\|b\|_2$ and $k < kmax$

   (a) $k = k + 1$
   (b) $\beta = (\rho_k/\rho_{k-1})(\alpha/\omega)$
   (c) $p = r + \beta(p - \omega v)$
   (d) $v = A p$
   (e) $\alpha = \rho_k/(\hat{r}_0^T v)$
   (f) $s = r - \alpha v$, $t = A s$
   (g) $\omega = t^T s/\|t\|_2^2$, $\rho_{k+1} = -\omega \hat{r}_0^T t$
   (h) $x = x + \alpha p + \omega s$
   (i) $r = s - \omega t$
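Transcribed literally, the iteration reads as the sketch below; the bicgstab.m in the book's code collection differs in details (for example, in how vectors are overwritten):

    % Sketch of Algorithm bicgstab; the book's bicgstab.m differs in details.
    function x = bicgstab_sketch(x, b, A, epsilon, kmax)
      r = b - A*x; rhat0 = r;
      alpha = 1; omega = 1; rho_old = 1;
      v = zeros(size(b)); p = v;
      rho = rhat0'*r; k = 0;
      while norm(r) > epsilon*norm(b) && k < kmax
        k = k + 1;
        beta = (rho/rho_old)*(alpha/omega);  % breakdown if rho_old or omega == 0
        p = r + beta*(p - omega*v);
        v = A*p;
        alpha = rho/(rhat0'*v);              % breakdown if rhat0'*v == 0
        s = r - alpha*v; t = A*s;
        omega = (t'*s)/(t'*t);               % minimizes ||s - omega*t||_2
        rho_old = rho; rho = -omega*(rhat0'*t);
        x = x + alpha*p + omega*s;
        r = s - omega*t;
      end
    end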

Note that the iteration can break down in steps 2b and 2e. We provide an
implementation of Algorithm bicgstab in the collection of MATLAB codes.
The cost in storage and in floating-point operations per iteration re-
mains bounded for the entire iteration. One must store seven vectors
$(x, b, r, p, v, \hat{r}_0, t)$, letting $s$ overwrite $r$ when needed. A single iteration re-
quires four scalar products. In a situation where many GMRES iterations are
needed and the matrix-vector product is fast, Bi-CGSTAB can have a much lower
average cost per iterate than GMRES. The reason for this is that the cost of
orthogonalization in GMRES can be much more than that of a matrix-vector
product if the dimension of the Krylov space is large. We will present such a
case in § 3.8.




3.6.4. TFQMR. Now we consider the quasi-minimal residual (QMR)
family of algorithms [80], [81], [77], [37], [202]. We will focus on the transpose-
free algorithm TFQMR proposed in [77] to illustrate the quasi-minimization
idea. All algorithms in this family minimize the norm of an easily computable
quantity $q = f - Hz$, a quasi-residual, over $z \in R^N$. The quasi-residual is
related to the true residual by a full-rank linear transformation $r = Lq$, and
reduction of the residual can be approximately measured by reduction in $q$.
