Returning to Algorithm cgs we define sequences

  y_m = { u_{k−1}   if m = 2k − 1 is odd,
        { q_k       if m = 2k is even,

and

  w_m = { (p̄_k(A))² r_0            if m = 2k − 1 is odd,
        { p̄_k(A) p̄_{k−1}(A) r_0   if m = 2k is even.

Here the sequences {p̄_k} and {q̄_k} are given by (3.20) and (3.21). Hence, the CGS residual satisfies

  r_k^CGS = w_{2k+1}.

We assume that CGS does not break down and, therefore, α_k ≠ 0 for all k ≥ 0. If we let ⌊r⌋ be the nearest integer less than or equal to a real r we have

(3.23)  A y_m = (w_m − w_{m+1})/α_{⌊(m−1)/2⌋},

where the denominator is nonzero by assumption. We express (3.23) in matrix form as

(3.24)  A Y_m = W_{m+1} B_m,

where Y_m is the N × m matrix with columns {y_j}_{j=1}^m, W_{m+1} the N × (m + 1) matrix with columns {w_j}_{j=1}^{m+1}, and B_m is the (m + 1) × m matrix given by

  B_m = ⎡  1    0   ...   0 ⎤
        ⎢ −1    1         ⋮ ⎥
        ⎢  0   −1   ⋱     ⋮ ⎥  diag(α_0, α_0, ..., α_{⌊(m−1)/2⌋})^{−1}.
        ⎢  ⋮         ⋱    1 ⎥
        ⎣  0   ...   0   −1 ⎦
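The action of B_m is easy to check in code. The following NumPy sketch (an illustration only, not part of the book's MATLAB collection) assembles B_m for hypothetical values of the α's; each column j carries 1 and −1 scaled by 1/α_{⌊(j−1)/2⌋}, which is exactly the column-wise form of (3.23):

```python
import numpy as np

def build_Bm(alpha, m):
    """Assemble the (m+1) x m matrix B_m of (3.24): ones on the
    diagonal, -1 on the subdiagonal, with column j (1-based) divided
    by alpha_{floor((j-1)/2)}."""
    B = np.zeros((m + 1, m))
    for j in range(m):                 # 0-based column index
        B[j, j] = 1.0
        B[j + 1, j] = -1.0
    # columns 1,2 use alpha_0; columns 3,4 use alpha_1; and so on
    scale = np.array([alpha[j // 2] for j in range(m)])
    return B / scale                   # broadcast: divide each column

# Hypothetical alphas, m = 3: columns 1,2 scaled by 1/2, column 3 by 1/4.
alpha = [2.0, 4.0]
B = build_Bm(alpha, 3)
```

With this B_m, the j-th column of W_{m+1}B_m is (w_j − w_{j+1})/α_{⌊(j−1)/2⌋}, recovering (3.23).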

Our assumptions that no breakdown takes place imply that K_m is the span of {y_j}_{j=1}^m and hence

(3.25)  x_m = x_0 + Y_m z

for some z ∈ R^m. Therefore,

(3.26)  r_m = r_0 − A Y_m z = W_{m+1}(e_1 − B_m z),

Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR16.html.

Copyright ©1995 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.

52 ITERATIVE METHODS FOR LINEAR AND NONLINEAR EQUATIONS

where, as in our discussion of GMRES, e_1 = (1, 0, ..., 0)^T ∈ R^{m+1}. In a sense, the conditioning of the matrix W_{m+1} can be improved by scaling. Let

(3.27)  Ω_{m+1} = diag(ω_1, ω_2, ..., ω_{m+1})

and rewrite (3.26) as

(3.28)  r_m = r_0 − A Y_m z = W_{m+1} Ω_{m+1}^{−1}(f_{m+1} − H_m z)

where f_{m+1} = Ω_{m+1} e_1 = ω_1 e_1 and H_m = Ω_{m+1} B_m is bidiagonal and hence upper Hessenberg. The diagonal entries in the matrix Ω_{m+1} are called weights. The weights make it possible to derive straightforward estimates of the residual in terms of easily computed terms in the iteration.

If W_{m+1} Ω_{m+1}^{−1} were an orthogonal matrix, we could minimize r_m by solving the least squares problem

(3.29)  minimize_{z ∈ R^m} ‖f_{m+1} − H_m z‖_2.

The quasi-minimization idea is to solve (3.29) (thereby quasi-minimizing the residual) to obtain z^m and then set

(3.30)  x_m = x_0 + Y_m z^m.
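In exact arithmetic the quasi-minimization step is nothing more than a small least squares solve. The NumPy sketch below works a hypothetical instance of (3.29) with ω_1 = 1, using a dense lstsq where the actual implementation uses a Givens-rotation update; all data here are invented for illustration:

```python
import numpy as np

# Hypothetical small instance of (3.29): H_m is (m+1) x m and
# bidiagonal (hence upper Hessenberg), f_{m+1} = omega_1 * e_1.
m = 3
H = np.array([[ 1.0,  0.0,  0.0],
              [-1.0,  1.0,  0.0],
              [ 0.0, -1.0,  1.0],
              [ 0.0,  0.0, -1.0]])
f = np.zeros(m + 1)
f[0] = 1.0

# Solve minimize_z ||f - H z||_2 to obtain z^m.
z, *_ = np.linalg.lstsq(H, f, rcond=None)

# Quasi-residual norm tau_m = ||f - H z^m||_2; x_m would then be
# x_0 + Y_m z as in (3.30).
tau = np.linalg.norm(f - H @ z)
```

For this H and f the normal equations are tridiagonal and z = (3/4, 1/2, 1/4) with τ_m = 1/2.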

Note that if k corresponds to the iteration counter for CGS, each k produces two approximate solutions (corresponding to m = 2k − 1 and m = 2k).

The least squares problem (3.29) can be solved by using Givens rotations to update the factorization of the upper Hessenberg matrix H_m [77]; this is reflected in the implementation description below. As one can see from that description, approximate residuals are not computed in Algorithm tfqmr. Instead we have the quasi-residual norm

  τ_m = ‖f_{m+1} − H_m z^m‖_2.

Now if we pick the weights ω_i = ‖w_i‖_2 then the columns of W_{m+1} Ω_{m+1}^{−1} have norm one. Hence

(3.31)  ‖r_m‖_2 = ‖W_{m+1} Ω_{m+1}^{−1}(f_{m+1} − H_m z)‖_2 ≤ τ_m √(m + 1).

We can, therefore, base our termination criterion on τ_m, and we do this in Algorithm tfqmr.
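The bound (3.31) uses only the fact that the scaled columns have unit norm, so that the Frobenius norm of W_{m+1}Ω_{m+1}^{−1} is √(m + 1). A quick numerical check on random data (illustration only, with dimensions chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
N, m = 8, 3
W = rng.standard_normal((N, m + 1))

# Weights omega_i = ||w_i||_2 make the scaled columns unit vectors.
omega = np.linalg.norm(W, axis=0)
U = W / omega                        # W_{m+1} Omega_{m+1}^{-1}

v = rng.standard_normal(m + 1)       # stands in for f_{m+1} - H_m z
lhs = np.linalg.norm(U @ v)          # ||r_m||_2 in (3.31)
rhs = np.linalg.norm(v) * np.sqrt(m + 1)   # tau_m * sqrt(m+1)
```

Since ‖Uv‖_2 ≤ ‖U‖_F ‖v‖_2 and ‖U‖_F = √(m + 1), the inequality lhs ≤ rhs holds for any v, not just for this random sample.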

One can [80], [78], also use the quasi-minimization condition to estimate the TFQMR residual in terms of residual polynomials.

Theorem 3.6.1. Let A be an N × N matrix and let x_0, b ∈ R^N be given. Assume that the TFQMR iteration does not break down or terminate with the exact solution for k iterations and let 1 ≤ m ≤ 2k. Let r_m^GMRES be the residual for the mth GMRES iteration. Let ξ_m be the smallest singular value of W_{m+1} Ω_{m+1}^{−1}. Then if ξ_m > 0

(3.32)  τ_m ≤ ‖r_m^GMRES‖_2 / ξ_m.


Proof. We may write

  r_m^GMRES = r_0 − A Y_m z_m^GMRES

and obtain

  ‖r_m^GMRES‖_2 = ‖W_{m+1} Ω_{m+1}^{−1}(f_{m+1} − H_m z_m^GMRES)‖_2 ≥ ξ_m ‖f_{m+1} − H_m z_m^GMRES‖_2.

Hence, by the quasi-minimization property

  ‖r_m^GMRES‖_2 ≥ ξ_m τ_m

as desired.

As a corollary we have a finite termination result and a convergence estimate for diagonalizable matrices similar to Theorem 3.1.3.

Corollary 3.6.1. Let A be an N × N matrix and let x_0, b ∈ R^N be given. Then within (N + 1)/2 iterations the TFQMR iteration will either break down or terminate with the solution.

We apply Theorem 3.1.3 to (3.32) and use (3.31) to obtain the following result.

Theorem 3.6.2. Let A = V Λ V^{−1} be a nonsingular diagonalizable matrix. Assume that TFQMR does not break down or terminate with the solution for k iterations. For 1 ≤ m ≤ 2k let ξ_m be the smallest singular value of W_{m+1} Ω_{m+1}^{−1} and let x_m be given by (3.30). Then, if ξ_m > 0,

  ‖r_m‖_2 / ‖r_0‖_2 ≤ √(m + 1) ξ_m^{−1} κ(V) max_{z ∈ σ(A)} |φ(z)|

for all φ ∈ P_m.

The implementation below follows [77] with r̂_0 = r_0 used throughout.

Algorithm 3.6.4. tfqmr(x, b, A, ε, kmax)

1. k = 0, w_1 = y_1 = r_0 = b − Ax, u_1 = v = A y_1, d = 0,
   ρ_0 = r_0^T r_0, τ = ‖r‖_2, θ = 0, η = 0
2. Do While k < kmax
   (a) k = k + 1
   (b) σ_{k−1} = r_0^T v, α_{k−1} = ρ_{k−1}/σ_{k−1}, y_2 = y_1 − α_{k−1} v, u_2 = A y_2
   (c) For j = 1, 2 (m = 2k − 2 + j)
       i.   w = w − α_{k−1} u_j
       ii.  d = y_j + (θ² η/α_{k−1}) d
       iii. θ = ‖w‖_2/τ, c = 1/√(1 + θ²)
       iv.  τ = τ θ c, η = c² α_{k−1}
       v.   x = x + η d
       vi.  If τ √(m + 1) ≤ ε ‖b‖_2 terminate successfully
   (d) ρ_k = r_0^T w, β = ρ_k/ρ_{k−1}


   (e) y_1 = w + β y_2, u_1 = A y_1
   (f) v = u_1 + β(u_2 + β v)

Note that y_2 and u_2 = A y_2 need only be computed if the loop in step 2c does not terminate when j = 1. We take advantage of this in the MATLAB code tfqmr to potentially reduce the matrix-vector product cost by one.
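A direct transliteration of Algorithm 3.6.4 into NumPy reads as follows. This is a sketch for illustration, not the book's MATLAB code tfqmr; in particular it always computes y_2 and u_2 up front rather than deferring them as just noted, and it assumes no breakdown (σ_{k−1} ≠ 0):

```python
import numpy as np

def tfqmr(x, b, matvec, eps, kmax):
    """Sketch of Algorithm 3.6.4 (TFQMR). matvec(v) applies A to v.
    Returns x and the final error estimate tau*sqrt(m+1)."""
    r0 = b - matvec(x)
    w = y1 = r0.copy()
    u1 = v = matvec(y1)
    d = np.zeros_like(b)
    rho = r0 @ r0
    tau = np.linalg.norm(r0)
    theta = eta = 0.0
    bnorm = np.linalg.norm(b)
    for k in range(1, kmax + 1):
        sigma = r0 @ v                        # sigma_{k-1} = r0^T v
        alpha = rho / sigma                   # alpha_{k-1}
        y2 = y1 - alpha * v
        u2 = matvec(y2)
        for j in (1, 2):                      # m = 2k - 2 + j
            m = 2 * k - 2 + j
            uj, yj = (u1, y1) if j == 1 else (u2, y2)
            w = w - alpha * uj
            d = yj + (theta**2 * eta / alpha) * d
            theta = np.linalg.norm(w) / tau
            c = 1.0 / np.sqrt(1.0 + theta**2)
            tau = tau * theta * c
            eta = c**2 * alpha
            x = x + eta * d
            if tau * np.sqrt(m + 1) <= eps * bnorm:
                return x, tau * np.sqrt(m + 1)
        rho_new = r0 @ w                      # rho_k = r0^T w
        beta = rho_new / rho
        rho = rho_new
        y1 = w + beta * y2
        u1 = matvec(y1)
        v = u1 + beta * (u2 + beta * v)
    return x, tau * np.sqrt(m + 1)

# Illustration on a small symmetric tridiagonal system:
A = np.array([[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]])
b = np.ones(3)
x, est = tfqmr(np.zeros(3), b, lambda z: A @ z, 1e-10, 100)
```

By (3.31) the quasi-residual test in step vi guarantees ‖b − Ax‖_2 ≤ ε‖b‖_2 on exit, up to rounding.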

3.7. Examples for GMRES iteration

These examples use the code gmres from the collection of MATLAB codes. The

inputs, described in the comment lines, are the initial iterate x0 , the right-hand

side vector b, a MATLAB function for the matrix-vector product, and iteration

parameters to specify the maximum number of iterations and the termination

criterion. We do not pass the preconditioner to the GMRES code, rather

we pass the preconditioned problem. In all the GMRES examples, we limit

the number of GMRES iterations to 60. For methods like GMRES(m), Bi-

CGSTAB, CGNR, and TFQMR, whose storage requirements do not increase

with the number of iterations, we allow for more iterations.

We consider the discretization of the partial differential equation

(3.33)  (Lu)(x, y) = −(u_xx(x, y) + u_yy(x, y)) + a_1(x, y) u_x(x, y) + a_2(x, y) u_y(x, y) + a_3(x, y) u(x, y) = f(x, y)

on 0 < x, y < 1 subject to homogeneous Dirichlet boundary conditions

  u(x, 0) = u(x, 1) = u(0, y) = u(1, y) = 0,  0 < x, y < 1.

For general coefficients {a_j} the operator L is not self-adjoint and its discretization is not symmetric. As in § 2.7 we discretize with a five-point centered difference scheme with n² points and mesh width h = 1/(n + 1). The unknowns are

  u_ij ≈ u(x_i, x_j)

where x_i = ih for 1 ≤ i ≤ n. We compute Lu in a matrix-free manner as we did in § 2.7. We let n = 31 to create a system with 961 unknowns. As before, we expect second-order accuracy and terminate the iteration when

  ‖r_k‖_2 ≤ h² ‖b‖_2 ≈ 9.8 × 10^−4 ‖b‖_2.
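A matrix-free application of the five-point discretization can be sketched as follows. This NumPy illustration is not the book's MATLAB code; it assumes the unknowns u_ij are stored row by row with the first index for x and the second for y, and uses the coefficients a_1 = 1, a_2 = 20y, a_3 = 1 of this section:

```python
import numpy as np

def apply_L(u, n):
    """Matrix-free five-point discretization of (3.33) with a1 = 1,
    a2 = 20y, a3 = 1, homogeneous Dirichlet data, h = 1/(n+1)."""
    h = 1.0 / (n + 1)
    U = np.zeros((n + 2, n + 2))        # zero padding = boundary values
    U[1:-1, 1:-1] = u.reshape(n, n)     # U[i, j] ~ u(x_i, y_j)
    inner = U[1:-1, 1:-1]
    # -(u_xx + u_yy): standard five-point stencil
    lap = (4 * inner - U[:-2, 1:-1] - U[2:, 1:-1]
           - U[1:-1, :-2] - U[1:-1, 2:]) / h**2
    ux = (U[2:, 1:-1] - U[:-2, 1:-1]) / (2 * h)   # centered d/dx
    uy = (U[1:-1, 2:] - U[1:-1, :-2]) / (2 * h)   # centered d/dy
    y = (np.arange(1, n + 1) * h)[None, :]        # y_j along axis 1
    return (lap + ux + 20.0 * y * uy + inner).ravel()
```

For n = 1 (a single interior point with h = 1/2), a unit value at the center gives 4/h² + u = 17, which is a convenient sanity check.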

As a preconditioner we use the fast Poisson solver fish2d described in § 2.7.

The motivation for this choice is similar to that for the conjugate gradient

computations and has been partially explained theoretically in [33], [34], [121],

and [139]. If we let Gu denote the action of the Poisson solver on u, the

preconditioned system is GLu = Gf .
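The point that the iterative code receives the preconditioned problem, not the preconditioner, can be made concrete. In the sketch below G is a stand-in dense Poisson solve on a tiny grid (the book uses the FFT-based fish2d), and the composed map is what one would hand to gmres:

```python
import numpy as np

n = 3
h = 1.0 / (n + 1)
T = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
# Discrete negative Laplacian on the n x n interior grid.
Lap = (np.kron(np.eye(n), T) + np.kron(T, np.eye(n))) / h**2

def G(u):
    """Stand-in for the fish2d Poisson solver: dense solve here."""
    return np.linalg.solve(Lap, u)

def GL(u, apply_L):
    """The preconditioned matrix-vector product G(Lu)."""
    return G(apply_L(u))

# gmres is then called with the map u -> GL(u), i.e. with the system
# G L u = G f, rather than with G and L separately.
```

Taking L itself to be the Laplacian reduces GL to the identity, which is the extreme case of the preconditioning heuristic.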

For the computations reported in this section we took

a1 (x, y) = 1, a2 (x, y) = 20y, and a3 (x, y) = 1.


u_0 = 0 was the initial iterate. As in § 2.7 we took the right hand side so that the exact solution was the discretization of

  10xy(1 − x)(1 − y) exp(x^4.5).

In Figs. 3.1 and 3.2 we plot iteration histories corresponding to preconditioned or unpreconditioned GMRES. For the restarted iteration, we restarted the algorithm after three iterations, GMRES(3) in the terminology of § 3.4. Note that the plots are semilog plots of ‖r‖_2/‖b‖_2 and that the right-hand sides in the preconditioned and unpreconditioned cases are not the same. In both figures the solid line is the history of the preconditioned iteration and the dashed line that of the unpreconditioned. Note the importance of preconditioning. The MATLAB flops command indicates that the unpreconditioned iteration required 7.6 million floating point operations and converged in 56 iterations. The preconditioned iteration required 1.3 million floating point operations and terminated after 8 iterations. Restarting after 3 iterations is more costly in terms of the iteration count than not restarting. However, even in the unpreconditioned case, the restarted iteration terminated successfully. In this case the MATLAB flops command indicates that the unpreconditioned iteration terminated after 223 iterations and 9.7 million floating point operations and the preconditioned iteration required 13 iterations and 2.6 million floating point operations.

The execution times in our environment indicate that the preconditioned iterations are even faster than the difference in operation counts indicates. The reason for this is that the FFT routine is a MATLAB intrinsic and is not interpreted MATLAB code. In other computing environments preconditioners which vectorize or parallelize particularly well might perform in the same way.

3.8. Examples for CGNR, Bi-CGSTAB, and TFQMR iteration

When a transpose is available, CGNR (or CGNE) is an alternative to GMRES that does not require storage of the iteration history. For elliptic problems like (3.33), the transpose (adjoint) of L is given by

  L* u = −u_xx − u_yy − a_1 u_x − a_2 u_y + a_3 u

as one can derive by integration by parts. To apply CGNR to the preconditioned problem, we require the transpose of GL, which is given by

  (GL)* u = L* G* u = L* G u.

The cost of the application of the transpose of L or GL is the same as that of the application of L or GL itself. Hence, in the case where the cost of the matrix-vector product dominates the computation, a single CGNR iteration costs roughly the same as two GMRES iterations.
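CGNR is simply CG applied to the normal equations A^T A x = A^T b. The sketch below (an illustration only, not the book's cgnr code) makes the two matrix-vector products per iteration, one with A and one with A^T, explicit:

```python
import numpy as np

def cgnr(x, b, matvec, matvec_T, eps, kmax):
    """CG on the normal equations A^T A x = A^T b. Each pass through
    the loop costs one product with A and one with A^T."""
    r = b - matvec(x)               # residual of the original system
    s = matvec_T(r)                 # residual of the normal equations
    p = s.copy()
    snorm2 = s @ s
    bnorm = np.linalg.norm(b)
    for _ in range(kmax):
        if np.linalg.norm(r) <= eps * bnorm:
            break
        q = matvec(p)               # first product: A p
        alpha = snorm2 / (q @ q)
        x = x + alpha * p
        r = r - alpha * q
        s = matvec_T(r)             # second product: A^T r
        snorm2_new = s @ s
        p = s + (snorm2_new / snorm2) * p
        snorm2 = snorm2_new
    return x

# Illustration on a small nonsymmetric system:
A = np.array([[3.0, 1.0], [0.0, 2.0]])
b = np.array([1.0, 1.0])
x = cgnr(np.zeros(2), b, lambda z: A @ z, lambda z: A.T @ z, 1e-12, 20)
```

The squared condition number of A^T A shows up in the convergence rate, which is the effect discussed next.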

However, CGNR has the effect of squaring the condition number; this effect has serious consequences, as Fig. 3.3 shows. The dashed line corresponds to


[Fig. 3.1. GMRES for 2-D elliptic equation. Semilog plot of relative residual (10^0 down to 10^-4) against iterations (0 to 60).]

[Fig. 3.2. GMRES(3) for 2-D elliptic equation. Semilog plot of relative residual (10^0 down to 10^-4) against iterations (0 to 250).]