reason for this is that the Krylov methods use the scalar product and the estimates in Chapters 2 and 3 are in terms of the Euclidean norm. In the examples discussed in § 6.4 we will scale the $l^2$ norm by a factor of $1/\sqrt{N}$ so that the results for the differential and integral equations will be independent of the computational mesh.

Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR16.html.

Copyright ©1995 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.


INEXACT NEWTON METHODS

6.2.1. Newton-GMRES. We provide a detailed analysis of the Newton-GMRES iteration. We begin by discussing the effects of a forward difference approximation to the action of the Jacobian on a vector.

If the linear iterative method is any one of the Krylov subspace methods discussed in Chapters 2 and 3, then each inner iteration requires at least one evaluation of the action of $F'(x_c)$ on a vector. In the case of CGNR and CGNE, an evaluation of the action of $F'(x_c)^T$ on a vector is also required. In many implementations [20], [24], the action of $F'(x_c)$ on a vector $w$ is approximated by a forward difference, (5.15), $D_h F(x : w)$ for some $h$. It is important to note that this is entirely different from forming the finite difference Jacobian $\nabla_h F(x)$ and applying that matrix to $w$. In fact, as pointed out in [20], application of GMRES to the linear equation for the Newton step with matrix-vector products approximated by finite differences is the same as the application of GMRES to the matrix $G_h F(x)$ whose last $k - 1$ columns are the vectors

$$v_k = D_h F(x : v_{k-1}).$$

The sequence $\{v_k\}$ is formed in the course of the forward-difference GMRES iteration.
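Schematically, the forward difference directional derivative $D_h F(x : w)$ can be sketched in Python/NumPy as follows. The book's own codes are MATLAB, and the exact scaling of the increment varies between implementations, so `fd_dirder` and its scaling should be read as an illustrative assumption rather than as the book's dirder:

```python
import numpy as np

def fd_dirder(F, x, w, h=1e-7):
    """Forward-difference approximation to the action of F'(x) on w.

    A minimal sketch: differentiates along the unit vector w/||w||_2 and
    rescales, so the result approximates F'(x)w to first order in h.
    """
    nw = np.linalg.norm(w)
    if nw == 0.0:
        return np.zeros_like(x)
    u = w / nw                       # unit direction
    return nw * (F(x + h * u) - F(x)) / h
```

For a smooth $F$ the error is $O(h)$, which is the first order accuracy referred to below.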

To illustrate this point, we give the forward-difference GMRES algorithm for computation of the Newton step, $s = -F'(x)^{-1} F(x)$. Note that matrix-vector products are replaced by forward difference approximations to $F'(x)$, that a sequence of approximate steps $\{s_k\}$ is produced, that $b = -F(x)$, and that the initial iterate for the linear problem is the zero vector. We give a MATLAB implementation of this algorithm in the collection of MATLAB codes.

Algorithm 6.2.1. fdgmres$(s, x, F, h, \eta, kmax, \rho)$

1. $s = 0$, $r = -F(x)$, $v_1 = r/\|r\|_2$, $\rho = \|r\|_2$, $\beta = \rho$, $k = 0$
2. While $\rho > \eta \|F(x)\|_2$ and $k < kmax$ do
   (a) $k = k + 1$
   (b) $v_{k+1} = D_h F(x : v_k)$
       for $j = 1, \dots, k$
         i. $h_{jk} = v_{k+1}^T v_j$
         ii. $v_{k+1} = v_{k+1} - h_{jk} v_j$
   (c) $h_{k+1,k} = \|v_{k+1}\|_2$
   (d) $v_{k+1} = v_{k+1}/\|v_{k+1}\|_2$
   (e) $e_1 = (1, 0, \dots, 0)^T \in R^{k+1}$
       Minimize $\|\beta e_1 - H_k y^k\|_{R^{k+1}}$ to obtain $y^k \in R^k$.
   (f) $\rho = \|\beta e_1 - H_k y^k\|_{R^{k+1}}$.
3. $s = V_k y^k$.

In our MATLAB implementation we solve the least squares problem in step 2e by the Givens rotation approach used in Algorithm gmres.
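In Python with NumPy, Algorithm 6.2.1 can be sketched as follows. This is an illustrative reimplementation, not the book's MATLAB code: the least squares problem in step 2e is solved with `numpy.linalg.lstsq` rather than the Givens rotation update, and the function name, defaults, and argument order are our own assumptions.

```python
import numpy as np

def fdgmres(x, F, h=1e-7, eta=1e-1, kmax=40):
    """Forward-difference GMRES sketch of Algorithm 6.2.1.

    Computes an approximate Newton step s with residual
    ||F'(x)s + F(x)||_2 roughly bounded by eta*||F(x)||_2, replacing
    matrix-vector products with forward differences.
    """
    def dirder(w):
        # D_h F(x : w): forward difference along the unit vector w/||w||_2
        nw = np.linalg.norm(w)
        if nw == 0.0:
            return np.zeros_like(x)
        return nw * (F(x + h * (w / nw)) - F(x)) / h

    fx = F(x)
    r = -fx
    beta = rho = np.linalg.norm(r)
    n = len(x)
    V = np.zeros((n, kmax + 1))      # Krylov basis v_1, ..., v_{k+1}
    H = np.zeros((kmax + 1, kmax))   # upper Hessenberg matrix H_k
    V[:, 0] = r / beta
    k = 0
    y = np.zeros(0)
    while rho > eta * np.linalg.norm(fx) and k < kmax:
        v = dirder(V[:, k])                    # step 2b: v_{k+1}
        for j in range(k + 1):                 # modified Gram-Schmidt
            H[j, k] = v @ V[:, j]
            v = v - H[j, k] * V[:, j]
        H[k + 1, k] = np.linalg.norm(v)        # step 2c
        V[:, k + 1] = v / H[k + 1, k]          # step 2d
        k += 1
        e1 = np.zeros(k + 1)                   # step 2e: minimize
        e1[0] = beta                           #   ||beta*e1 - H_k y||
        y, *_ = np.linalg.lstsq(H[:k + 1, :k], e1, rcond=None)
        rho = np.linalg.norm(e1 - H[:k + 1, :k] @ y)   # step 2f
    return V[:, :k] @ y                        # step 3: s = V_k y^k
```

Applied to a linear $F(z) = Az - b$ from $x = 0$, the computed step approximates $A^{-1}b$, which is the exact Newton step.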

ITERATIVE METHODS FOR LINEAR AND NONLINEAR EQUATIONS

As should be clear from inspection of Algorithm gmres, the difference between $\nabla_h F$ and $G_h F$ is that the columns are computed by taking directional derivatives based on two different orthonormal bases: $\nabla_h F$ using the basis of unit vectors in coordinate directions and $G_h F$ the basis for the Krylov space $\mathcal{K}_k$ constructed by Algorithm fdgmres. The action of $G_h F$ on $\mathcal{K}_k^\perp$, the orthogonal complement of $\mathcal{K}_k$, is not used by the algorithm and, for the purposes of analysis, could be specified by the action of $\nabla_h F$ on any basis for $\mathcal{K}_k^\perp$. Hence $G_h F$ is also first order accurate in $h$. Assuming that there is no error in the evaluation of $F$ we may summarize the observations above in the following proposition, which is a special case of the more general results in [20].

Proposition 6.2.1. Let the standard assumptions hold. Let $\eta \in (0,1)$. Then there are $C_G$, $\bar h$, and $\delta$ such that if $x \in B(\delta)$, $h \le \bar h$, and Algorithm fdgmres terminates with $k < kmax$, then the computed step $s$ satisfies

$$(6.13) \qquad \|F'(x)s + F(x)\|_2 < (\eta + C_G h)\,\|F(x)\|_2.$$

Proof. First let $\delta$ be small enough so that $B(\delta) \subset \Omega$ and the conclusions of Lemma 4.3.1 hold. Let $\{u_j\}$ be any orthonormal basis for $R^N$ such that $u_j = v_j$ for $1 \le j \le k$, where $\{v_j\}_{j=1}^k$ is the orthonormal basis for $\mathcal{K}_k$ generated by Algorithm fdgmres.

Consider the linear map defined by its action on $\{u_j\}$

$$B u_j = D_h F(x : u_j).$$

Note that

$$D_h F(x : u_j) = \int_0^1 F'(x + t h \|x\|_2 u_j)\, u_j \, dt
= F'(x) u_j + \int_0^1 \left( F'(x + t h \|x\|_2 u_j) - F'(x) \right) u_j \, dt.$$

Since $F'$ is Lipschitz continuous and $\{u_j\}$ is an orthonormal basis for $R^N$ we have, with $\bar\gamma = \gamma(\|x^*\|_2 + \delta)$,

$$(6.14) \qquad \|B - F'(x)\|_2 \le \gamma h \|x\|_2 / 2 \le h \bar\gamma / 2.$$

Since the linear iteration (fdgmres) terminates in $k < kmax$ iterations, we have, since $B$ and $G_h F$ agree on $\mathcal{K}_k$,

$$\|B s + F(x)\|_2 \le \eta \|F(x)\|_2$$

and therefore

$$(6.15) \qquad \|F'(x)s + F(x)\|_2 \le \eta \|F(x)\|_2 + h \bar\gamma \|s\|_2 / 2.$$

Assume that

$$h \bar\gamma \le \|F'(x^*)^{-1}\|_2^{-1} / 2.$$


Then Lemma 4.3.1 and (6.15) imply

$$\|F'(x^*)^{-1}\|_2^{-1} \|s\|_2 / 2 \le \|F'(x)^{-1}\|_2^{-1} \|s\|_2
\le \|F'(x)s\|_2 \le (1 + \eta)\|F(x)\|_2 + h \bar\gamma \|s\|_2 / 2.$$

Therefore,

$$(6.16) \qquad \|s\|_2 \le 4(1 + \eta)\,\|F'(x^*)^{-1}\|_2\, \|F(x)\|_2.$$

Combining (6.16) with (6.15) completes the proof with $C_G = 4\bar\gamma(1 + \eta)\|F'(x^*)^{-1}\|_2$.

Proposition 6.2.1 implies that a finite difference implementation will not affect the performance of Newton-GMRES if the steps in the forward difference approximation of the derivatives are sufficiently small. In an implementation, therefore, we must specify not only the sequence $\{\eta_n\}$, but also the steps $\{h_n\}$ used to compute forward differences. Note also that Proposition 6.2.1 applies to a restarted GMRES because upon successful termination (6.15) will hold.

We summarize our results so far in the analogs of Theorem 6.1.4 and Theorem 6.1.2. The proofs are immediate consequences of Theorems 6.1.2, 6.1.4, and Proposition 6.2.1 and are left as (easy) exercises.

Theorem 6.2.1. Assume that the assumptions of Proposition 6.2.1 hold. Then there are $\delta$, $\bar\sigma$ such that if $x_0 \in B(\delta)$ and the sequences $\{\eta_n\}$ and $\{h_n\}$ satisfy

$$\sigma_n = \eta_n + C_G h_n \le \bar\sigma$$

then the forward difference Newton-GMRES iteration

$$x_{n+1} = x_n + s_n,$$

where $s_n$ is computed by Algorithm fdgmres with arguments $(s_n, x_n, F, h_n, \eta_n, kmax, \rho)$, converges q-linearly and $s_n$ satisfies

$$(6.17) \qquad \|F'(x_n)s_n + F(x_n)\|_2 < \sigma_n \|F(x_n)\|_2.$$

Moreover,
• if $\sigma_n \to 0$ the convergence is q-superlinear, and
• if $\sigma_n \le K_\eta \|F(x_n)\|_2^p$ for some $K_\eta > 0$ the convergence is q-superlinear with q-order $1 + p$.

Theorem 6.2.2. Assume that the assumptions of Proposition 6.2.1 hold. Then there is $\delta$ such that if $x_0 \in B(\delta)$ and the sequences $\{\eta_n\}$ and $\{h_n\}$ satisfy

$$\sigma_n = \eta_n + C_G h_n \in [0, \bar\sigma]$$

with $0 < \bar\sigma < 1$, then the forward difference Newton-GMRES iteration

$$x_{n+1} = x_n + s_n,$$

where $s_n$ is computed by Algorithm fdgmres with arguments $(s_n, x_n, F, h_n, \eta_n, kmax, \rho)$, converges q-linearly with respect to $\|\cdot\|_*$ and $s_n$ satisfies (6.17). Moreover,
• if $\sigma_n \to 0$ the convergence is q-superlinear, and
• if $\sigma_n \le K_\eta \|F(x_n)\|_2^p$ for some $K_\eta > 0$ the convergence is q-superlinear with q-order $1 + p$.

In the case where $h_n$ is approximately the square root of machine roundoff, $\sigma_n \approx \eta_n$ if $\eta_n$ is not too small, and Theorem 6.2.2 states that the observed behavior of the forward difference implementation will be the same as that of an implementation with exact derivatives.
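For IEEE double precision, machine roundoff is about $2.2 \times 10^{-16}$, so its square root is about $1.5 \times 10^{-8}$; the default $h = 10^{-7}$ used later in this chapter is of this order. A quick check in Python (variable names are ours):

```python
import sys

# Machine roundoff (unit roundoff) for IEEE double precision and the
# corresponding forward difference increment h ~ sqrt(roundoff).
eps = sys.float_info.epsilon   # about 2.22e-16
h = eps ** 0.5                 # about 1.49e-8, close to the 1e-7 default
```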

6.2.2. Other Newton-iterative methods. Other iterative methods may be used as the solver for the linear equation for the Newton step. Newton-multigrid [99] is one approach using a stationary iterative method. See also [6], [111]. If $F'$ is symmetric and positive definite, as it is in the context of unconstrained minimization [63], Newton-CG is a possible approach.

When storage is restricted or the problem is very large, GMRES may not be practical. GMRES(m), for a small m, may not converge rapidly enough to be useful. CGNR and CGNE offer alternatives, but require evaluation of transpose-vector products. Bi-CGSTAB and TFQMR should be considered in all cases where there is not enough storage for GMRES(m) to perform well.

If a transpose is needed, as it will be if CGNR or CGNE is used as the iterative method, a forward difference formulation is not an option because approximation of the transpose-vector product by differences is not possible. Computation of $\nabla_h F(x)$ and its transpose analytically is one option, as is differentiation of $F$ automatically or by hand. The reader may explore this further in Exercise 6.5.13.

The reader may also be interested in experimenting with the other Krylov subspace methods described in § 3.6, [78], and [12].

6.3. Newton-GMRES implementation

We provide a MATLAB code nsolgm in the collection that implements a forward-difference Newton-GMRES algorithm. This algorithm requires different inputs than nsol. As input we must give the initial iterate, the function, and the vector of termination tolerances, as we did for nsol. In addition, we must provide a method for forming the sequence of forcing terms $\{\eta_n\}$.

We adjust $\eta$ as the iteration progresses with a variation of a choice from [69]. This issue is independent of the particular linear solver and our discussion is in the general inexact Newton method setting. Setting $\eta$ to a constant for the entire iteration is often a reasonable strategy, as we see in § 6.4, but the choice of that constant depends on the problem. If a constant $\eta$ is too small, much effort can be wasted in the initial stages of the iteration. The choice in [69] is motivated by a desire to avoid such oversolving. Oversolving means that the linear equation for the Newton step is solved to a precision far beyond what is needed to correct the nonlinear iteration. As a measure of the degree to which the nonlinear iteration approximates the solution we begin with

$$\eta_n^A = \gamma \|F(x_n)\|_2^2 / \|F(x_{n-1})\|_2^2,$$

where $\gamma \in (0, 1]$ is a parameter. If $\eta_n^A$ is uniformly bounded away from 1, then setting $\eta_n = \eta_n^A$ for $n > 0$ would guarantee q-quadratic convergence by Theorem 6.1.1. In this way, the most information possible would be extracted from the inner iteration. In order to specify the choice at $n = 0$ and bound the sequence away from 1 we set

$$(6.18) \qquad \eta_n^B = \begin{cases} \eta_{\max}, & n = 0, \\ \min(\eta_{\max}, \eta_n^A), & n > 0. \end{cases}$$

In (6.18) the parameter $\eta_{\max}$ is an upper limit on the sequence $\{\eta_n\}$. In [69] the choices $\gamma = .9$ and $\eta_{\max} = .9999$ are used.

It may happen that $\eta_n^B$ is small for one or more iterations while $x_n$ is still far from the solution. A method of safeguarding was suggested in [69] to avoid volatile decreases in $\eta_n$. The idea is that if $\eta_{n-1}$ is sufficiently large we do not let $\eta_n$ decrease by much more than a factor of $\eta_{n-1}$:

$$(6.19) \qquad \eta_n^C = \begin{cases} \eta_{\max}, & n = 0, \\ \min(\eta_{\max}, \eta_n^A), & n > 0, \ \gamma\eta_{n-1}^2 \le .1, \\ \min(\eta_{\max}, \max(\eta_n^A, \gamma\eta_{n-1}^2)), & n > 0, \ \gamma\eta_{n-1}^2 > .1. \end{cases}$$

The constant .1 is somewhat arbitrary. This safeguarding does improve the performance of the iteration.

There is a chance that the final iterate will reduce $\|F\|$ far beyond the desired level and that the solution of the linear equation for the last step will be more accurate than is really needed. This oversolving on the final step can be controlled by comparing the norm of the current nonlinear residual $\|F(x_n)\|$ to the nonlinear residual norm at which the iteration would terminate,

$$\tau_t = \tau_a + \tau_r \|F(x_0)\|,$$

and bounding $\eta_n$ from below by a constant multiple of $\tau_t / \|F(x_n)\|$. The algorithm nsolgm does this and uses

$$(6.20) \qquad \eta_n = \min(\eta_{\max}, \max(\eta_n^C, .5\,\tau_t / \|F(x_n)\|)).$$
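The combination of (6.18), (6.19), and (6.20) can be sketched as a single Python function. This is an illustrative sketch with our own name and argument conventions; the book's nsolgm is a MATLAB code:

```python
def forcing_term(n, eta_prev, fnorm, fnorm_prev, tau_t,
                 gamma=0.9, eta_max=0.9999):
    """Forcing term eta_n from (6.18)-(6.20): the choice
    eta_n^A = gamma*||F(x_n)||^2/||F(x_{n-1})||^2 from [69], safeguarded
    against volatile decreases, then bounded below near termination to
    avoid oversolving the final step."""
    if n == 0:
        eta_C = eta_max                                    # (6.18)
    else:
        eta_A = gamma * fnorm ** 2 / fnorm_prev ** 2
        if gamma * eta_prev ** 2 <= 0.1:                   # (6.19)
            eta_C = min(eta_max, eta_A)
        else:                                              # safeguard
            eta_C = min(eta_max, max(eta_A, gamma * eta_prev ** 2))
    return min(eta_max, max(eta_C, 0.5 * tau_t / fnorm))   # (6.20)
```

For example, with a large $\eta_{n-1}$ the safeguard $\gamma\eta_{n-1}^2$ dominates a sharply reduced $\eta_n^A$, and near termination the $.5\,\tau_t/\|F(x_n)\|$ floor takes over.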

Exercise 6.5.9 asks the reader to modify nsolgm so that other choices of the sequence $\{\eta_n\}$ can be used.

We use dirder to approximate directional derivatives and use the default value of $h = 10^{-7}$. In all the examples in this book we use the value $\gamma = .9$ as recommended in [69].


Algorithm 6.3.1. nsolgm$(x, F, \tau, \eta)$

1. $r_c = r_0 = \|F(x)\|_2 / \sqrt{N}$
2. Do while $\|F(x)\|_2 / \sqrt{N} > \tau_r r_0 + \tau_a$
   (a) Select $\eta$.
   (b) fdgmres$(s, x, F, \eta)$
   (c) $x = x + s$
   (d) Evaluate $F(x)$.
   (e) $r_+ = \|F(x)\|_2 / \sqrt{N}$, $\sigma = r_+ / r_c$, $r_c = r_+$
   (f) If $\|F(x)\|_2 \le \tau_r r_0 + \tau_a$ exit.
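As an illustration, Algorithm 6.3.1 with a constant forcing term can be sketched in Python/NumPy as below. This is a self-contained sketch, not the book's MATLAB nsolgm: the inner forward-difference GMRES is written inline, the least squares problem is solved by `numpy.linalg.lstsq` rather than Givens rotations, and all names and defaults are our own assumptions.

```python
import numpy as np

def nsolgm(x, F, tau_r=1e-8, tau_a=1e-8, eta=1e-1, kmax=40, maxit=40):
    """Newton-GMRES sketch of Algorithm 6.3.1 with constant eta.
    Terminates on the mesh-independent scaled norm ||F(x)||_2/sqrt(N)."""
    N = len(x)
    h = 1e-7                                   # forward difference step
    scale = 1.0 / np.sqrt(N)
    r0 = np.linalg.norm(F(x)) * scale
    for _ in range(maxit):
        fx = F(x)
        if np.linalg.norm(fx) * scale <= tau_r * r0 + tau_a:
            break
        # Inner loop: forward-difference GMRES for F'(x) s = -F(x).
        beta = np.linalg.norm(fx)
        V = np.zeros((N, kmax + 1))
        H = np.zeros((kmax + 1, kmax))
        V[:, 0] = -fx / beta
        rho, k = beta, 0
        y = np.zeros(0)
        while rho > eta * beta and k < kmax:
            v = (F(x + h * V[:, k]) - fx) / h  # F'(x)v_k by differences
            for j in range(k + 1):             # modified Gram-Schmidt
                H[j, k] = v @ V[:, j]
                v = v - H[j, k] * V[:, j]
            H[k + 1, k] = np.linalg.norm(v)
            V[:, k + 1] = v / H[k + 1, k]
            k += 1
            e1 = np.zeros(k + 1)
            e1[0] = beta
            y, *_ = np.linalg.lstsq(H[:k + 1, :k], e1, rcond=None)
            rho = np.linalg.norm(e1 - H[:k + 1, :k] @ y)
        x = x + V[:, :k] @ y                   # Newton step s = V_k y^k
    return x
```

On a small nonlinear system such as $F(z) = Az + z^3 - b$ (componentwise cube) with a well-conditioned $A$, the outer iteration converges q-linearly with a contraction governed by $\eta$, as Theorem 6.2.2 predicts.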

Note that the cost of an outer iterate is one evaluation of $F$ to compute the value at the current iterate and other evaluations of $F$ to compute the forward differences for each inner iterate. Hence if a large value of $\eta$ can be used, the cost of the entire iteration can be greatly reduced. If a maximum of $m$ GMRES iterations is needed (or GMRES(m) is used as a linear solver), the storage requirements of Algorithm nsolgm are the $m + 5$ vectors $x$, $F(x)$, $x + hv$, $F(x + hv)$, $s$, and the Krylov basis $\{v_k\}_{k=1}^m$.

Since GMRES forms the inner iterates and makes termination decisions based on scalar products and $l^2$ norms, we also terminate the outer iteration on small $l^2$ norms of the nonlinear residuals. However, because of our applications to differential and integral equations, we scale the $l^2$ norm of the nonlinear residual by a factor of $1/\sqrt{N}$ so that constant functions will have norms that are independent of the computational mesh.

For large and poorly conditioned problems, GMRES will have to be restarted. We illustrate the consequences of this in § 6.4 and ask the reader to make the necessary modifications to nsolgm in Exercise 6.5.9.

The preconditioners we use in § 6.4 are independent of the outer iteration. Since the Newton steps for $MF(x) = 0$ are the same as those for $F(x) = 0$, it is equivalent to precondition the nonlinear equation before calling nsolgm.

6.4. Examples for Newton-GMRES

In this section we consider two examples. We revisit the H-equation and solve a preconditioned nonlinear convection-diffusion equation.

In all the figures we plot the relative nonlinear residual $\|F(x_n)\|_2 / \|F(x_0)\|_2$ against the number of function evaluations required by all inner and outer iterations to compute $x_n$. Counts of function evaluations corresponding to outer iterations are indicated by circles. From these plots one can compare not only the number of outer iterations, but also the total cost. This enables us to directly compare the costs of different strategies for selection of $\eta$. Note that if only a single inner iteration is needed, the total cost of the outer iteration will be two function evaluations, since $F(x_c)$ will be known. One new function evaluation will be needed to approximate the action of $F'(x_c)$ on a vector
