45

GMRES ITERATION

In the context of GMRES iteration, however, we can incrementally perform the QR factorization of H as the GMRES iteration progresses [167]. To see this, note that if Rk = Qk Hk and, after orthogonalization, we add the new column hk+2 to Hk, we can update both Qk and Rk by first multiplying hk+2 by Qk (that is, applying the first k Givens rotations to hk+2), then computing the Givens rotation Gk+1 that annihilates the (k+2)nd element of Qk hk+2, and finally setting Qk+1 = Gk+1 Qk and forming Rk+1 by augmenting Rk with Gk+1 Qk hk+2.

The MATLAB implementation of Algorithm gmres stores Qk by storing the sequences {cj} and {sj} and then computing the action of Qk on a vector x ∈ R^{k+1} by applying Gj(cj, sj) in turn to obtain

Qk x = Gk(ck, sk) · · · G1(c1, s1)x.

We overwrite the upper triangular part of Hk with the triangular part of the QR factorization of Hk in the MATLAB code. The MATLAB implementation of Algorithm gmres uses (3.10) to test for loss of orthogonality.
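To make the rotation bookkeeping concrete, here is a sketch in Python/NumPy (the book's own codes are in MATLAB; the function and variable names here are mine, not from those codes) of storing {cj} and {sj} and applying them to reduce an upper Hessenberg matrix to triangular form one column at a time:

```python
import numpy as np

def apply_rotations(cs, ss, h):
    """Apply the stored Givens rotations G_1(c_1,s_1), ..., in turn to a new column h."""
    h = h.copy()
    for j in range(len(cs)):
        h[j], h[j + 1] = (cs[j] * h[j] - ss[j] * h[j + 1],
                          ss[j] * h[j] + cs[j] * h[j + 1])
    return h

# Triangularize a random (n+1) x n upper Hessenberg matrix column by column.
rng = np.random.default_rng(0)
n = 5
H = np.triu(rng.standard_normal((n + 1, n)), -1)
cs, ss = [], []
R = np.zeros_like(H)
for k in range(n):
    col = apply_rotations(cs, ss, H[:, k])      # apply the first k rotations
    nu = np.hypot(col[k], col[k + 1])           # nu = (h_kk^2 + h_{k+1,k}^2)^(1/2)
    c, s = col[k] / nu, -col[k + 1] / nu        # rotation that zeros col[k+1]
    col[k], col[k + 1] = c * col[k] - s * col[k + 1], 0.0
    cs.append(c); ss.append(s)
    R[:, k] = col

assert np.allclose(np.tril(R, -1), 0.0)         # R is upper triangular
```

Since each Gj is orthogonal, the column norms of R and H agree, which gives a cheap sanity check on the implementation.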

Algorithm 3.5.1. gmres(x, b, A, ε, kmax, ρ)

1. r = b − Ax, v1 = r/‖r‖2, ρ = ‖r‖2, β = ρ, k = 0; g = ρ(1, 0, . . . , 0)^T ∈ R^{kmax+1}

2. While ρ > ε‖b‖2 and k < kmax do

   (a) k = k + 1
   (b) vk+1 = Avk
       for j = 1, . . . , k
         i. hjk = vk+1^T vj
         ii. vk+1 = vk+1 − hjk vj
   (c) hk+1,k = ‖vk+1‖2
   (d) Test for loss of orthogonality and reorthogonalize if necessary.
   (e) vk+1 = vk+1/‖vk+1‖2
   (f) i. If k > 1 apply Qk−1 to the kth column of H.
       ii. ν = (hk,k^2 + hk+1,k^2)^(1/2).
       iii. ck = hk,k/ν, sk = −hk+1,k/ν
            hk,k = ck hk,k − sk hk+1,k, hk+1,k = 0
       iv. g = Gk(ck, sk)g.
   (g) ρ = |(g)k+1|.

3. Set rij = hij for 1 ≤ i, j ≤ k.
   Set (w)i = (g)i for 1 ≤ i ≤ k.
   Solve the upper triangular system Ry^k = w.

4. xk = x0 + Vk y^k.
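The following Python/NumPy transcription of Algorithm gmres may help in reading it (the book's implementation is in MATLAB; this sketch is mine, omits the reorthogonalization test of step (d), and has no safeguard against a lucky breakdown hk+1,k = 0):

```python
import numpy as np

def gmres(x, b, A, eps, kmax):
    """GMRES with an incrementally updated Givens-rotation QR of H;
    returns the approximate solution and the final residual-norm estimate rho."""
    n = len(b)
    r = b - A @ x
    rho = np.linalg.norm(r)
    tol = eps * np.linalg.norm(b)
    V = np.zeros((n, kmax + 1)); H = np.zeros((kmax + 1, kmax))
    g = np.zeros(kmax + 1); g[0] = rho
    cs, ss = np.zeros(kmax), np.zeros(kmax)
    V[:, 0] = r / rho
    k = 0
    while rho > tol and k < kmax:
        # Arnoldi step with modified Gram-Schmidt (steps (b)-(e))
        w = A @ V[:, k]
        for j in range(k + 1):
            H[j, k] = w @ V[:, j]
            w -= H[j, k] * V[:, j]
        H[k + 1, k] = np.linalg.norm(w)
        V[:, k + 1] = w / H[k + 1, k]
        # step (f): apply the previous rotations to the new column ...
        for j in range(k):
            H[j, k], H[j + 1, k] = (cs[j] * H[j, k] - ss[j] * H[j + 1, k],
                                    ss[j] * H[j, k] + cs[j] * H[j + 1, k])
        # ... then form and apply the new rotation annihilating H[k+1, k]
        nu = np.hypot(H[k, k], H[k + 1, k])
        cs[k], ss[k] = H[k, k] / nu, -H[k + 1, k] / nu
        H[k, k], H[k + 1, k] = nu, 0.0
        g[k], g[k + 1] = (cs[k] * g[k] - ss[k] * g[k + 1],
                          ss[k] * g[k] + cs[k] * g[k + 1])
        rho = abs(g[k + 1])            # step (g): residual norm for free
        k += 1
    y = np.linalg.solve(H[:k, :k], g[:k])   # upper triangular solve (step 3)
    return x + V[:, :k] @ y, rho            # step 4

# demo on a small, well-conditioned nonsymmetric system
rng = np.random.default_rng(1)
n = 20
A = np.eye(n) + 0.1 * rng.standard_normal((n, n)) / np.sqrt(n)
b = rng.standard_normal(n)
xk, rho = gmres(np.zeros(n), b, A, 1e-10, n)
```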

Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR16.html.

Copyright ©1995 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.

46 ITERATIVE METHODS FOR LINEAR AND NONLINEAR EQUATIONS

We close with an example of an implementation of GMRES(m). This

implementation does not test for success and may, therefore, fail to terminate.

You are asked to repair this in exercise 7. Aside from the parameter m, the

arguments to Algorithm gmresm are the same as those for Algorithm gmres.

Algorithm 3.5.2. gmresm(x, b, A, ε, kmax, m, ρ)

1. gmres(x, b, A, ε, m, ρ)

2. k = m

3. While ρ > ε‖b‖2 and k < kmax do

   (a) gmres(x, b, A, ε, m, ρ)
   (b) k = k + m

The storage costs of m iterations of gmres or of gmresm are the m + 2 vectors b, x, and {vk}_{k=1}^{m}.
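A restarted driver in the spirit of Algorithm gmresm can be sketched as follows (Python/NumPy rather than the book's MATLAB; for brevity the inner cycle here solves the small least-squares problem directly instead of updating a QR factorization, and there is no breakdown safeguard — all names are mine):

```python
import numpy as np

def gmres_cycle(x, b, A, m, eps):
    """One GMRES cycle: m Arnoldi steps, then a small least-squares solve."""
    n = len(b)
    r = b - A @ x
    beta = np.linalg.norm(r)
    if beta <= eps * np.linalg.norm(b):
        return x, beta
    V = np.zeros((n, m + 1)); H = np.zeros((m + 1, m))
    V[:, 0] = r / beta
    for k in range(m):
        w = A @ V[:, k]
        for j in range(k + 1):
            H[j, k] = w @ V[:, j]
            w -= H[j, k] * V[:, j]
        H[k + 1, k] = np.linalg.norm(w)
        V[:, k + 1] = w / H[k + 1, k]
    e1 = np.zeros(m + 1); e1[0] = beta
    y, *_ = np.linalg.lstsq(H, e1, rcond=None)   # min || beta e1 - H y ||
    x = x + V[:, :m] @ y
    return x, np.linalg.norm(b - A @ x)

def gmresm(x, b, A, eps, kmax, m):
    """Restarted GMRES(m): repeat the cycle until the residual test passes."""
    x, rho = gmres_cycle(x, b, A, m, eps)
    k = m
    while rho > eps * np.linalg.norm(b) and k < kmax:
        x, rho = gmres_cycle(x, b, A, m, eps)
        k += m
    return x, rho

rng = np.random.default_rng(2)
n = 50
A = np.eye(n) + 0.5 * rng.standard_normal((n, n)) / n
b = rng.standard_normal(n)
xm, rho = gmresm(np.zeros(n), b, A, 1e-8, 400, 5)
```

Like the book's gmresm, this driver gives no indication of failure when kmax is exhausted.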

3.6. Other methods for nonsymmetric systems

The methods for nonsymmetric linear systems that receive most attention in

this book, GMRES, CGNR, and CGNE, share the properties that they are easy

to implement, can be analyzed by a common residual polynomial approach, and

only terminate if an acceptable approximate solution has been found. CGNR

and CGNE have the disadvantages that a transpose-vector product must be

computed for each iteration and that the coefficient matrix A^T A or AA^T has

condition number the square of that of the original matrix. In § 3.8 we give

an example of how this squaring of the condition number can lead to failure

of the iteration. GMRES needs only matrix-vector products and uses A alone,

but, as we have seen, a basis for Kk must be stored to compute xk . Hence,

the storage requirements increase as the iteration progresses. For large and

ill-conditioned problems, it may be impossible to store enough basis vectors

and the iteration may have to be restarted. Restarting can seriously degrade

performance.
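The squaring of the condition number is easy to observe numerically; a small NumPy check (my example, not from the book):

```python
import numpy as np

# The singular values of A^T A are the squares of those of A, so
# cond(A^T A) = cond(A)^2; CGNR and CGNE iterate with this squared system.
rng = np.random.default_rng(3)
A = rng.standard_normal((50, 50))
kappa = np.linalg.cond(A)
kappa_normal = np.linalg.cond(A.T @ A)
assert np.isclose(kappa_normal, kappa ** 2, rtol=1e-4)
```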

An ideal method would, like CG, only need matrix-vector products, be

based on some kind of minimization principle or conjugacy, have modest

storage requirements that do not depend on the number of iterations needed

for convergence, and converge in N iterations for all nonsingular A. However, as shown in [74], methods based on short-term recurrences such as CG that also satisfy

minimization or conjugacy conditions cannot be constructed for general

matrices. The methods we describe in this section fall short of the ideal, but

can still be quite useful. We discuss only a small subset of these methods and

refer the reader to [12] and [78] for pointers to more of the literature on this

subject. All the methods we present in this section require two matrix-vector

products for each iteration.

Consistently with our implementation of GMRES, we take the view that

preconditioners will be applied externally to the iteration. However, as with

CG, these methods can also be implemented in a manner that incorporates


the preconditioner and uses the residual for the original system to control

termination.

3.6.1. Bi-CG. The earliest such method, Bi-CG (biconjugate gradient) [122], [76], does not enforce a minimization principle; instead, the kth residual must satisfy the bi-orthogonality condition

(3.16) rk^T w = 0 for all w ∈ K̂k,

where

K̂k = span(r̂0, A^T r̂0, . . . , (A^T)^{k−1} r̂0)

is the Krylov space for A^T and the vector r̂0. r̂0 is a user-supplied vector and is often set to r0. The algorithm gets its name because it produces sequences of residuals {rk}, {r̂k} and search directions {pk}, {p̂k} such that bi-orthogonality holds, i.e., r̂k^T rl = 0 if k ≠ l, and the search directions {pk} and {p̂k} satisfy the bi-conjugacy property

p̂k^T A pl = 0 if k ≠ l.

In the symmetric positive definite case (3.16) is equivalent to the minimization principle (2.2) for CG [89].

Using the notation of Chapter 2 and [191] we give an implementation of Bi-CG making the choice r̂0 = r0. This algorithmic description is explained and motivated in more detail in [122], [76], [78], and [191]. We do not recommend use of Bi-CG and present this algorithm only as a basis for discussion of some of the properties of this class of algorithms.

Algorithm 3.6.1. bicg(x, b, A, ε, kmax)

1. r = b − Ax, r̂ = r, ρ0 = 1, p̂ = p = 0, k = 0

2. Do While ‖r‖2 > ε‖b‖2 and k < kmax

   (a) k = k + 1
   (b) ρk = r̂^T r, β = ρk/ρk−1
   (c) p = r + βp, p̂ = r̂ + βp̂
   (d) v = Ap
   (e) α = ρk/(p̂^T v)
   (f) x = x + αp
   (g) r = r − αv; r̂ = r̂ − αA^T p̂
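A direct Python/NumPy transcription of Algorithm bicg (mine, with r̂0 = r0 and, deliberately, no breakdown safeguards) reads:

```python
import numpy as np

def bicg(x, b, A, eps, kmax):
    """Bi-CG as in Algorithm bicg; breakdown (rho = 0 or phat^T v = 0) is not caught."""
    r = b - A @ x
    rhat = r.copy()                      # the choice rhat_0 = r_0
    rho_old = 1.0
    p = np.zeros_like(b); phat = np.zeros_like(b)
    tol = eps * np.linalg.norm(b)
    k = 0
    while np.linalg.norm(r) > tol and k < kmax:
        k += 1
        rho = rhat @ r                   # step (b)
        beta = rho / rho_old
        p = r + beta * p                 # step (c)
        phat = rhat + beta * phat
        v = A @ p                        # step (d)
        alpha = rho / (phat @ v)         # step (e)
        x = x + alpha * p                # step (f)
        r = r - alpha * v                # step (g), incl. the transpose product
        rhat = rhat - alpha * (A.T @ phat)
        rho_old = rho
    return x, np.linalg.norm(r)

# on an spd matrix Bi-CG reproduces the CG iteration (at twice the work)
rng = np.random.default_rng(4)
n = 30
C = rng.standard_normal((n, n))
A = C @ C.T + n * np.eye(n)
b = rng.standard_normal(n)
xk, res = bicg(np.zeros(n), b, A, 1e-10, 200)
```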

Note that if A = A^T is spd, Bi-CG produces the same iterations as CG (but computes everything except x twice). If, however, A is not spd, there is no guarantee that ρk in step 2b or p̂^T Ap, the denominator in step 2e, will not vanish. If either ρk−1 = 0 or p̂^T Ap = 0 we say that a breakdown has taken place. Even if these quantities are nonzero but very small, the algorithm can become unstable or produce inaccurate results. While there are approaches


[80], [78], [148], [79], [81], to avoid some breakdowns in many variations of

Bi-CG, there are no methods that both limit storage and completely eliminate

the possibility of breakdown [74]. All of the methods we present in this section

can break down and should be implemented with that possibility in mind.

Once breakdown has occurred, one can restart the Bi-CG iteration or pass

the computation to another algorithm, such as GMRES. The possibility of

breakdown is small and certainly should not eliminate the algorithms discussed

below from consideration if there is not enough storage for GMRES to perform

well.

Breakdowns aside, there are other problems with Bi-CG. A transpose-

vector product is needed, which at the least will require additional program-

ming and may not be possible at all. The performance of the algorithm can

be erratic or even unstable, with residuals increasing by several orders of magnitude from one iteration to the next. Moreover, the effort in computing r̂ at each iteration is wasted in that r̂ makes no contribution to x. However, Bi-CG sometimes performs extremely well and the remaining algorithms in

this section represent attempts to capture this good performance and damp

the erratic behavior when it occurs.

We can compare the best-case performance of Bi-CG with that of GMRES. Note that there is p̄k ∈ Pk such that both

(3.17) rk = p̄k(A)r0 and r̂k = p̄k(A^T)r̂0.

Hence, by the minimization property for GMRES

‖rk^GMRES‖2 ≤ ‖rk^Bi-CG‖2,

reminding us that GMRES, if sufficient storage is available, will always

reduce the residual more rapidly than Bi-CG (in terms of iterations, but not

necessarily in terms of computational work). One should also keep in mind that

a single Bi-CG iteration requires two matrix-vector products and a GMRES

iterate only one, but that the cost of the GMRES iteration increases (in terms

of floating-point operations) as the iteration progresses.

3.6.2. CGS. A remedy for one of the problems with Bi-CG is the Conjugate Gradient Squared (CGS) algorithm [180]. The algorithm takes advantage of the fact that (3.17) implies that the scalar product r̂^T r in Bi-CG (step 2b of Algorithm bicg) can be represented without using A^T as

r̂k^T rk = (p̄k(A)r0)^T (p̄k(A^T)r̂0) = (p̄k(A)^2 r0)^T r̂0.

The other references to A^T can be eliminated in a similar fashion and an iteration satisfying

(3.18) rk = p̄k(A)^2 r0

is produced, where p̄k is the same polynomial produced by Bi-CG and used in (3.17). This explains the name, Conjugate Gradient Squared.


The work used in Bi-CG to compute r̂ is now used to update x. CGS replaces the transpose-vector product with an additional matrix-vector product and applies the square of the Bi-CG polynomial to r0 to produce rk. This may, of course, change the convergence properties for the worse and either improve good convergence or magnify erratic behavior [134], [191]. CGS has the same potential for breakdown as Bi-CG. Since p̄k^2 ∈ P2k we have

(3.19) ‖r2k^GMRES‖2 ≤ ‖rk^CGS‖2.

We present an algorithmic description of CGS that we will refer to in our

discussion of TFQMR in § 3.6.4. In our description of CGS we will index the

vectors and scalars that would be overwritten in an implementation so that we

can describe some of the ideas in TFQMR later on. This description is taken

from [77].

Algorithm 3.6.2. cgs(x, b, A, ε, kmax)

1. x0 = x; p0 = u0 = r0 = b − Ax

2. v0 = Ap0; r̂0 = r0

3. ρ0 = r̂0^T r0; k = 0

4. Do While ‖r‖2 > ε‖b‖2 and k < kmax

   (a) k = k + 1
   (b) σk−1 = r̂0^T vk−1
   (c) αk−1 = ρk−1/σk−1
   (d) qk = uk−1 − αk−1 vk−1
   (e) xk = xk−1 + αk−1(uk−1 + qk)
       rk = rk−1 − αk−1 A(uk−1 + qk)
   (f) ρk = r̂0^T rk; βk = ρk/ρk−1
       uk = rk + βk qk
       pk = uk + βk(qk + βk pk−1)
       vk = Apk
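In Python/NumPy, with the indices dropped because each vector overwrites its predecessor, Algorithm cgs can be sketched as (my transcription; no breakdown checks):

```python
import numpy as np

def cgs(x, b, A, eps, kmax):
    """CGS as in Algorithm cgs; breakdown (sigma = 0 or rho = 0) is not caught."""
    r = b - A @ x
    u = r.copy(); p = r.copy()
    rhat0 = r.copy()
    v = A @ p
    rho_old = rhat0 @ r                  # rho_0
    tol = eps * np.linalg.norm(b)
    k = 0
    while np.linalg.norm(r) > tol and k < kmax:
        k += 1
        sigma = rhat0 @ v                # step (b)
        alpha = rho_old / sigma          # step (c)
        q = u - alpha * v                # step (d)
        x = x + alpha * (u + q)          # step (e)
        r = r - alpha * (A @ (u + q))
        rho = rhat0 @ r                  # step (f)
        beta = rho / rho_old
        u = r + beta * q
        p = u + beta * (q + beta * p)
        v = A @ p
        rho_old = rho
    return x, np.linalg.norm(r)

rng = np.random.default_rng(5)
n = 40
A = np.eye(n) + 0.5 * rng.standard_normal((n, n)) / n
b = rng.standard_normal(n)
xk, res = cgs(np.zeros(n), b, A, 1e-8, 100)
```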

Breakdowns take place when either ρk−1 or σk−1 vanish. If the algorithm does not break down, then αk−1 ≠ 0 for all k. One can show [180], [77], that if p̄k is the residual polynomial from (3.17) and (3.18) then

(3.20) p̄k(z) = p̄k−1(z) − αk−1 z q̄k−1(z),

where the auxiliary polynomials q̄k ∈ Pk are given by q̄0 = 1 and for k ≥ 1 by

(3.21) q̄k(z) = p̄k(z) + βk q̄k−1(z).

We provide no MATLAB implementation of Bi-CG or CGS. Bi-CGSTAB

is the most effective representative of this class of algorithms.


3.6.3. Bi-CGSTAB. The biconjugate gradient stabilized method (Bi-CGSTAB) [191] attempts to smooth the convergence of CGS by replacing (3.18) with

(3.22) rk = qk(A)p̄k(A)r0

where

qk(z) = (1 − ω1 z)(1 − ω2 z) · · · (1 − ωk z).

The constant ωi is selected to minimize ‖ri‖2 = ‖qi(A)p̄i(A)r0‖2 as a function of ωi. The examples in [191] indicate the effectiveness of this approach, which can be thought of [12] as a blend of Bi-CG and GMRES(1). The performance in cases where the spectrum has a significant imaginary part can be improved by constructing qk to have complex conjugate pairs of roots, [98].

There is no convergence theory for Bi-CGSTAB and no estimate of the residuals that is better than that for CGS, (3.19). In our description of the implementation, which is described and motivated fully in [191], we use r̂0 = r0 and follow the notation of [191].

Algorithm 3.6.3. bicgstab(x, b, A, ε, kmax)

1. r = b − Ax, r̂0 = r̂ = r, ρ0 = α = ω = 1, v = p = 0, k = 0, ρ1 = r̂0^T r

2. Do While ‖r‖2 > ε‖b‖2 and k < kmax

   (a) k = k + 1
   (b) β = (ρk/ρk−1)(α/ω)
   (c) p = r + β(p − ωv)
   (d) v = Ap
   (e) α = ρk/(r̂0^T v)
   (f) s = r − αv, t = As
   (g) ω = t^T s/‖t‖2^2, ρk+1 = −ω r̂0^T t
   (h) x = x + αp + ωs
   (i) r = s − ωt
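A Python/NumPy transcription of Algorithm bicgstab (mine; the possible breakdowns in steps 2b and 2e are not guarded against) is:

```python
import numpy as np

def bicgstab(x, b, A, eps, kmax):
    """Bi-CGSTAB as in Algorithm bicgstab, with rhat_0 = r_0."""
    r = b - A @ x
    rhat0 = r.copy()
    rho_old = alpha = omega = 1.0        # rho_0 = alpha = omega = 1
    v = np.zeros_like(b); p = np.zeros_like(b)
    rho = rhat0 @ r                      # rho_1
    tol = eps * np.linalg.norm(b)
    k = 0
    while np.linalg.norm(r) > tol and k < kmax:
        k += 1
        beta = (rho / rho_old) * (alpha / omega)   # step (b)
        p = r + beta * (p - omega * v)             # step (c)
        v = A @ p                                  # step (d)
        alpha = rho / (rhat0 @ v)                  # step (e)
        s = r - alpha * v                          # step (f)
        t = A @ s
        omega = (t @ s) / (t @ t)                  # step (g)
        rho_old, rho = rho, -omega * (rhat0 @ t)
        x = x + alpha * p + omega * s              # step (h)
        r = s - omega * t                          # step (i)
    return x, np.linalg.norm(r)

rng = np.random.default_rng(6)
n = 40
A = np.eye(n) + 0.5 * rng.standard_normal((n, n)) / n
b = rng.standard_normal(n)
xk, res = bicgstab(np.zeros(n), b, A, 1e-8, 100)
```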

Note that the iteration can break down in steps 2b and 2e. We provide an

implementation of Algorithm bicgstab in the collection of MATLAB codes.

The cost in storage and in floating-point operations per iteration remains bounded for the entire iteration. One must store seven vectors (x, b, r, p, v, r̂0, t), letting s overwrite r when needed. A single iteration requires four scalar products. In a situation where many GMRES iterations are needed and the matrix-vector product is fast, Bi-CGSTAB can have a much lower

average cost per iterate than GMRES. The reason for this is that the cost of

orthogonalization in GMRES can be much more than that of a matrix-vector

product if the dimension of the Krylov space is large. We will present such a

case in § 3.8.


3.6.4. TFQMR. Now we consider the quasi-minimal residual (QMR) family of algorithms [80], [81], [77], [37], [202]. We will focus on the transpose-free algorithm TFQMR proposed in [77], to illustrate the quasi-minimization idea. All algorithms in this family minimize the norm of an easily computable quantity q = f − Hz, a quasi-residual, over z ∈ R^N. The quasi-residual is related to the true residual by a full-rank linear transformation r = Lq and reduction of the residual can be approximately measured by reduction in q.