analytic Jacobian rather than evaluate a difference Jacobian. How do the execution times and flop counts change?

5.7.22. Modify dirder by setting the forward difference step h = 10⁻². How does this change affect the convergence rate observations that you have made?

Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR16.html.

Copyright ©1995 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.

94 ITERATIVE METHODS FOR LINEAR AND NONLINEAR EQUATIONS

5.7.23. Add noise to your function evaluation with a random number generator, such as the MATLAB rand function. If the size of the noise is O(10⁻⁴) and you make the appropriate adjustments to dirder, how are the convergence rates different? What is an appropriate termination criterion? At what point does the iteration stagnate?
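As a rough illustration (a Python sketch rather than the book's MATLAB codes; the function F and the noise amplitude are invented for the demonstration), a forward-difference directional derivative in the spirit of dirder shows why the difference step must grow with the noise level:

```python
import numpy as np

def fd_dirder(f, x, w, f0, h):
    """Forward-difference model of F'(x)w with step h (a simplified
    analog of the book's dirder.m; w is assumed to have unit norm)."""
    return (f(x + h * w) - f0) / h

# Invented smooth test function with a known directional derivative.
def F(x):
    return np.array([x[0] ** 2, x[0] * x[1]])

rng = np.random.default_rng(0)
def F_noisy(x, amp=1e-4):           # F plus O(1e-4) noise, as in the exercise
    return F(x) + amp * rng.uniform(-1.0, 1.0, size=2)

x = np.array([1.0, 2.0])
w = np.array([1.0, 0.0])
exact = np.array([2.0, 2.0])        # F'(x)w computed by hand

clean = fd_dirder(F, x, w, F(x), h=1e-7)
# A step tuned for exact evaluations amplifies the O(1e-4) noise by 1/h ...
noisy_small_h = fd_dirder(F_noisy, x, w, F_noisy(x), h=1e-7)
# ... while h ~ sqrt(noise) = 1e-2 balances noise against truncation error.
noisy_big_h = fd_dirder(F_noisy, x, w, F_noisy(x), h=1e-2)
```

With exact evaluations a step near the square root of machine epsilon is nearly optimal; with noise of size ε̄ the analogous balance puts the step near the square root of ε̄, which is the "appropriate adjustment" the exercise asks for.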

5.7.24. Assume that the standard assumptions hold, that the cost of a function evaluation is O(N²) floating-point operations, the cost of a Jacobian is O(N) function evaluations, and that x₀ is near enough to x* so that the Newton iteration converges q-quadratically to x*. Show that the number n of Newton iterations needed to obtain ‖e_n‖ ≤ ε‖e_0‖ is O(log(ε)) and that the number of floating-point operations required is O(N³ log(ε)). What about the chord method?
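The contrast in iteration counts can be checked numerically on a scalar example (a Python sketch with an invented test equation; the book's setting is systems, but the counting argument is the same). Newton roughly doubles the number of correct digits per step, while the chord method, which freezes the derivative at x₀, is only q-linear:

```python
import numpy as np

def newton_count(f, fprime, x0, tol=1e-12, chord=False, maxit=100):
    """Count iterations of Newton's method (or the chord method, which
    reuses the derivative at x0) needed to drive |f(x)| below tol."""
    x, fp0 = x0, fprime(x0)
    for n in range(1, maxit + 1):
        d = fprime(x) if not chord else fp0
        x = x - f(x) / d
        if abs(f(x)) < tol:
            return n
    return maxit

f = lambda x: x ** 2 - 2.0          # root is sqrt(2)
fp = lambda x: 2.0 * x
newton_its = newton_count(f, fp, x0=2.0)
chord_its = newton_count(f, fp, x0=2.0, chord=True)
```

The chord method trades a cheaper step (no new derivative) for many more steps, which is the tradeoff the exercise asks you to quantify in flops.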

5.7.25. Develop a sparse version of nsol for problems with tridiagonal Jacobian. In MATLAB, for example, you could modify diffjac to use the techniques of [47] and nsol to use the sparse matrix factorization in MATLAB. Apply your code to the central difference discretization of the two-point boundary value problem

−u″ = sin(u) + f(x), u(0) = u(1) = 0.

Here, the solution is u*(x) = x(1 − x), and

f(x) = 2 − sin(x(1 − x)).

Let u₀ = 0 be the initial iterate.
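A minimal sketch of such a solver (Python/NumPy rather than the book's MATLAB nsol and diffjac; the grid size N and iteration limits are arbitrary choices). The tridiagonal Jacobian is formed analytically and factored by the Thomas algorithm, which plays the role of the sparse factorization:

```python
import numpy as np

def solve_tridiag(a, b, c, d):
    """Thomas algorithm: solve a tridiagonal system with sub-, main-, and
    super-diagonals a, b, c (a[0] and c[-1] unused) and right-hand side d."""
    n = len(d)
    cp, dp = np.empty(n), np.empty(n)
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# Central-difference discretization of -u'' = sin(u) + f(x), u(0)=u(1)=0,
# with f chosen so that u*(x) = x(1 - x).
N = 100                              # interior grid points (arbitrary)
h = 1.0 / (N + 1)
x = np.linspace(h, 1.0 - h, N)
f = 2.0 - np.sin(x * (1.0 - x))

def G(u):
    """Discrete residual (-u_{i-1} + 2u_i - u_{i+1})/h^2 - sin(u_i) - f_i."""
    lap = 2.0 * u - np.roll(u, 1) - np.roll(u, -1)
    lap[0] = 2.0 * u[0] - u[1]       # zero boundary values
    lap[-1] = 2.0 * u[-1] - u[-2]
    return lap / h ** 2 - np.sin(u) - f

u = np.zeros(N)                      # initial iterate u0 = 0
for _ in range(20):                  # Newton iteration
    # Tridiagonal Jacobian: diagonal 2/h^2 - cos(u_i), off-diagonals -1/h^2.
    a = np.full(N, -1.0 / h ** 2)
    c = np.full(N, -1.0 / h ** 2)
    b = 2.0 / h ** 2 - np.cos(u)
    u = u + solve_tridiag(a, b, c, -G(u))
    if np.linalg.norm(G(u)) < 1e-8:
        break
```

Since the central difference is exact for the quadratic u*(x) = x(1 − x), the discrete problem reproduces u* at the grid points up to rounding.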

5.7.26. Suppose you try to find an eigenvector-eigenvalue pair (φ, λ) of an N × N matrix A by solving the system of N + 1 nonlinear equations

(5.23)  F(φ, λ) = ( Aφ − λφ ; φᵀφ − 1 ) = ( 0 ; 0 )

for the vector x = (φᵀ, λ)ᵀ ∈ R^{N+1}. What is the Jacobian of this system? If x = (φ, λ) ∈ R^{N+1} is an eigenvector-eigenvalue pair, when is F′(x) nonsingular? Relate the application of Newton's method to (5.23) to the inverse power method. See [150] for more development of this idea.
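A sketch of Newton's method applied to (5.23) (Python rather than MATLAB; the test matrix and the starting guess are invented). The Jacobian used in the code is the bordered matrix with blocks A − λI, −φ, and 2φᵀ; deriving it is the first part of the exercise:

```python
import numpy as np

def eig_newton(A, phi0, lam0, maxit=50, tol=1e-12):
    """Newton's method for F(phi, lam) = (A phi - lam phi, phi^T phi - 1) = 0.
    The Jacobian is the bordered matrix
        [ A - lam I   -phi ]
        [ 2 phi^T       0  ].
    """
    N = A.shape[0]
    phi, lam = phi0.astype(float).copy(), float(lam0)
    for _ in range(maxit):
        F = np.concatenate([A @ phi - lam * phi, [phi @ phi - 1.0]])
        if np.linalg.norm(F) < tol:
            break
        J = np.zeros((N + 1, N + 1))
        J[:N, :N] = A - lam * np.eye(N)
        J[:N, N] = -phi
        J[N, :N] = 2.0 * phi
        s = np.linalg.solve(J, -F)
        phi, lam = phi + s[:N], lam + s[N]
    return phi, lam

A = np.diag([1.0, 2.0, 3.0])
phi, lam = eig_newton(A, phi0=np.array([0.9, 0.2, 0.1]), lam0=1.2)
```

Each Newton step solves a system with A − λI as its leading block, which is the structural link to the inverse power method the exercise asks about.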


Chapter 6

Inexact Newton Methods

Theorem 5.4.1 describes how errors in the derivative/function affect the progress in the Newton iteration. Another way to look at this is to ask how an approximate solution of the linear equation for the Newton step affects the iteration. This was the view taken in [55], where inexact Newton methods in which the step satisfies

(6.1)  ‖F′(x_c)s + F(x_c)‖ ≤ η_c ‖F(x_c)‖

are considered. Any approximate step is accepted provided that the relative residual of the linear equation is small. This is quite useful because conditions like (6.1) are precisely the small linear residual termination conditions for iterative solution of the linear system for the Newton step. Such methods are not new. See [145] and [175], for example, for discussion of these ideas in the context of the classical stationary iterative methods. In most of this chapter we will focus on the approximate solution of the equation for the Newton step by GMRES, but the other Krylov subspace methods discussed in Chapters 2 and 3 and elsewhere can also be used. We follow [69] and refer to the term η_c on the right-hand side of (6.1) as the forcing term.
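A minimal sketch of the idea in Python (the test problem, the Jacobi inner iteration, and all names are invented for illustration; in practice the inner solver would be a Krylov method such as GMRES). The inner loop stops as soon as the relative linear residual satisfies (6.1):

```python
import numpy as np

def inexact_newton(F, J, x0, eta=0.5, tol=1e-10, maxit=50):
    """Inexact Newton: the system J(x) s = -F(x) is solved only until the
    relative linear residual drops below the forcing term eta, here by a
    simple Jacobi inner iteration."""
    x = x0.copy()
    for _ in range(maxit):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            break
        A = J(x)
        D = np.diag(A)
        s = np.zeros_like(x)
        # Inner iteration: stop when ||A s + F(x)|| <= eta ||F(x)||, i.e. (6.1).
        while np.linalg.norm(A @ s + Fx) > eta * np.linalg.norm(Fx):
            s = s - (A @ s + Fx) / D     # one Jacobi sweep on A s = -F(x)
        x = x + s
    return x

# A small diagonally dominant test problem (chosen so Jacobi converges).
A0 = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
F = lambda x: A0 @ x + 0.1 * x ** 3 - b
J = lambda x: A0 + np.diag(0.3 * x ** 2)
x = inexact_newton(F, J, np.zeros(3))
```

Note that the outer iteration never needs an exact linear solve; a handful of inner sweeps per step suffices once (6.1) holds.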

6.1. The basic estimates

We will discuss specific implementation issues in § 6.2. Before that we will give the basic result from [55] to show how a step that satisfies (6.1) affects the convergence. We will present this result in two parts. In § 6.1.1 we give a straightforward analysis that uses the techniques of Chapter 5. In § 6.1.2 we show how the requirements of the simple analysis can be changed to admit a more aggressive (i.e., larger) choice of the parameter η.

6.1.1. Direct analysis. The proof and application of Theorem 6.1.1 should be compared to that of Theorem 5.1.1 and the other results in Chapter 5.

Theorem 6.1.1. Let the standard assumptions hold. Then there are δ and K_I such that if x_c ∈ B(δ), s satisfies (6.1), and x_+ = x_c + s, then

(6.2)  ‖e_+‖ ≤ K_I (‖e_c‖ + η_c) ‖e_c‖.


Proof. Let δ be small enough so that the conclusions of Lemma 4.3.1 and Theorem 5.1.1 hold. To prove the first assertion (6.2) note that if²

r = −F′(x_c)s − F(x_c)

is the linear residual then

s + F′(x_c)⁻¹F(x_c) = −F′(x_c)⁻¹r

and

(6.3)  e_+ = e_c + s = e_c − F′(x_c)⁻¹F(x_c) − F′(x_c)⁻¹r.

Now, (6.1), (4.7), and (4.6) imply that

‖s + F′(x_c)⁻¹F(x_c)‖ ≤ ‖F′(x_c)⁻¹‖ η_c ‖F(x_c)‖ ≤ 4κ(F′(x*)) η_c ‖e_c‖.

Hence, using (6.3) and Theorem 5.1.1,

‖e_+‖ ≤ ‖e_c − F′(x_c)⁻¹F(x_c)‖ + 4κ(F′(x*)) η_c ‖e_c‖ ≤ K ‖e_c‖² + 4κ(F′(x*)) η_c ‖e_c‖,

where K is the constant from (5.2). If we set

K_I = K + 4κ(F′(x*)),

the proof is complete.

We summarize the implications of Theorem 6.1.1 for iterative methods in the next result, which is a direct consequence of (6.2) and will suffice to explain most computational observations.

Theorem 6.1.2. Let the standard assumptions hold. Then there are δ and η̄ such that if x_0 ∈ B(δ), {η_n} ⊂ [0, η̄], then the inexact Newton iteration

x_{n+1} = x_n + s_n,

where

‖F′(x_n)s_n + F(x_n)‖ ≤ η_n ‖F(x_n)‖

converges q-linearly to x*. Moreover

• if η_n → 0 the convergence is q-superlinear, and

• if η_n ≤ K_η ‖F(x_n)‖^p for some K_η > 0 the convergence is q-superlinear with q-order 1 + p.

² Our definition of r is consistent with the idea that the residual in a linear equation is b − Ax. In [55] r = F′(x_c)s + F(x_c).


Proof. Let δ be small enough so that (6.2) holds for x_c ∈ B(δ). Reduce δ and η̄ if needed so that

K_I(δ + η̄) < 1,

where K_I is from (6.2). Then if n ≥ 0 and x_n ∈ B(δ) we have

‖e_{n+1}‖ ≤ K_I (‖e_n‖ + η_n) ‖e_n‖ ≤ K_I(δ + η̄) ‖e_n‖ < ‖e_n‖.

This proves q-linear convergence with a q-factor of K_I(δ + η̄).

If η_n → 0 then q-superlinear convergence follows from the definition. If

η_n ≤ K_η ‖F(x_n)‖^p

then we may use (4.7) and (6.2) to conclude

‖e_{n+1}‖ ≤ K_I (‖e_n‖^{1−p} + K_η 2^p ‖F′(x*)‖^p) ‖e_n‖^{1+p}

which completes the proof.
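The two convergence regimes of Theorem 6.1.2 can be observed numerically. The sketch below (Python; the test problem and the way the inexact step is manufactured are invented for illustration) computes an exact Newton step and perturbs it so that the linear residual has norm exactly η_n‖F(x_n)‖, then counts outer iterations for a constant forcing term and for η_n → 0:

```python
import numpy as np

def inexact_newton_count(F, J, x0, forcing, tol=1e-12, maxit=100):
    """Count outer iterations of an inexact Newton method whose step has a
    linear residual of norm exactly eta_n ||F(x_n)|| -- a cheap way to
    emulate an inner solve stopped by condition (6.1)."""
    x = x0.copy()
    for n in range(maxit):
        Fx = F(x)
        nF = np.linalg.norm(Fx)
        if nF < tol:
            return n
        A = J(x)
        eta = forcing(nF)
        s = np.linalg.solve(A, -Fx)              # exact Newton step
        r = eta * nF * np.ones_like(x) / np.sqrt(len(x))
        s = s + np.linalg.solve(A, r)            # now ||A s + Fx|| = eta*nF
        x = x + s
    return maxit

A0 = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
F = lambda x: A0 @ x + 0.1 * x ** 3 - b
J = lambda x: A0 + np.diag(0.3 * x ** 2)

n_const = inexact_newton_count(F, J, np.zeros(3), forcing=lambda nF: 0.5)
n_super = inexact_newton_count(F, J, np.zeros(3),
                               forcing=lambda nF: min(0.5, nF))  # eta_n -> 0
```

With η_n = .5 the residual norms shrink by roughly a factor of two per outer iteration; with η_n = min(.5, ‖F(x_n)‖) the forcing term vanishes near the solution and far fewer outer iterations are needed, as the second bullet of the theorem predicts.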

6.1.2. Weighted norm analysis. Since K_I = O(κ(F′(x*))), one might conclude from Theorem 6.1.2 that if F′(x*) is ill conditioned very small forcing terms must be used. This is not the case and the purpose of this subsection is to describe more accurately the necessary restrictions on the forcing terms. The results here differ from those in § 6.1.1 in that no restriction is put on the sequence {η_n} ⊂ [0, 1) other than requiring that 1 not be an accumulation point.

Theorem 6.1.3. Let the standard assumptions hold. Then there is δ such that if x_c ∈ B(δ), s satisfies (6.1), x_+ = x_c + s, and η_c ≤ η < η̄ < 1, then

(6.4)  ‖F′(x*)e_+‖ ≤ η̄ ‖F′(x*)e_c‖.

Proof. To prove (6.4) note that Theorem 4.0.1 implies that

‖F(x_c)‖ ≤ ‖F′(x*)e_c‖ + γ‖e_c‖²/2.

Since

‖e_c‖ = ‖F′(x*)⁻¹F′(x*)e_c‖ ≤ ‖F′(x*)⁻¹‖ ‖F′(x*)e_c‖

we have, with

M_0 = γ‖F′(x*)⁻¹‖/2,

(6.5)  ‖F(x_c)‖ ≤ (1 + M_0 δ) ‖F′(x*)e_c‖.

Now,

F′(x*)e_+ = F′(x*)(e_c + s) = F′(x*)(e_c − F′(x_c)⁻¹F(x_c) − F′(x_c)⁻¹r).


By Theorem 5.1.1

‖F′(x*)(e_c − F′(x_c)⁻¹F(x_c))‖ ≤ K ‖F′(x*)‖ ‖e_c‖².

Hence,

(6.6)  ‖F′(x*)e_+‖ ≤ ‖F′(x*)F′(x_c)⁻¹r‖ + K ‖F′(x*)‖ ‖e_c‖².

Since

‖F′(x*)F′(x_c)⁻¹r‖ ≤ ‖r‖ + ‖(F′(x*) − F′(x_c))F′(x_c)⁻¹r‖ ≤ (1 + 2γ‖F′(x*)⁻¹‖ ‖e_c‖) ‖r‖,

we may set

M_1 = 2γ‖F′(x*)⁻¹‖ and M_2 = K‖F′(x*)‖

and obtain, using (4.7) and (6.5),

(6.7)  ‖F′(x*)e_+‖ ≤ (1 + M_1 δ)‖r‖ + M_2 δ‖e_c‖
              ≤ (1 + M_1 δ)(1 + M_0 δ) η_c ‖F′(x*)e_c‖ + M_2 δ ‖F′(x*)⁻¹‖ ‖F′(x*)e_c‖
              ≤ ((1 + M_1 δ)(1 + M_0 δ) η + M_2 δ ‖F′(x*)⁻¹‖) ‖F′(x*)e_c‖.

Now let δ be small enough so that

(1 + M_1 δ)(1 + M_0 δ) η + M_2 δ ‖F′(x*)⁻¹‖ ≤ η̄

and the proof is complete.

Note that the distance δ from the initial iterate to the solution may have to be smaller for (6.4) to hold than for (6.2). However, (6.4) is remarkable in its assertion that any method that produces a step with a linear residual less than that of the zero step will reduce the norm of the error if the error is measured with the weighted norm

‖ · ‖_* = ‖F′(x*) · ‖.

In fact, Theorem 6.1.3 asserts q-linear convergence in the weighted norm if the initial iterate is sufficiently near x*. This is made precise in Theorem 6.1.4.

The application of Theorem 6.1.3 described in Theorem 6.1.4 differs from Theorem 6.1.2 in that we do not demand that {η_n} be bounded away from 1 by a sufficiently large amount, just that 1 not be an accumulation point. The importance of this theorem for implementation is that choices of the sequence of forcing terms {η_n} (such as η_n = .5 for all n) that try to minimize the number of inner iterations are completely justified by this result if the initial iterate is sufficiently near the solution. We make such a choice in one of the examples considered in § 6.4 and compare it to a modification of a choice from [69] that decreases η_n rapidly with a view toward minimizing the number of outer iterations.

Theorem 6.1.4. Let the standard assumptions hold. Then there is δ such that if x_0 ∈ B(δ), {η_n} ⊂ [0, η] with η < η̄ < 1, then the inexact Newton iteration

x_{n+1} = x_n + s_n,

where

‖F′(x_n)s_n + F(x_n)‖ ≤ η_n ‖F(x_n)‖

converges q-linearly with respect to ‖ · ‖_* to x*. Moreover

• if η_n → 0 the convergence is q-superlinear, and

• if η_n ≤ K_η ‖F(x_n)‖^p for some K_η > 0 the convergence is q-superlinear with q-order 1 + p.

Proof. The proof uses both (6.4) and (6.2). Our first task is to relate the norms ‖ · ‖ and ‖ · ‖_* so that we can estimate δ. Let δ_0 be such that (6.4) holds for ‖e_c‖ < δ_0.

For all x ∈ R^N,

(6.8)  ‖x‖_* ≤ ‖F′(x*)‖ ‖x‖ ≤ κ(F′(x*)) ‖x‖_*.

Note that ‖e‖ < δ_0 if

‖e‖_* < δ_* = ‖F′(x*)‖ δ_0.

Set δ = ‖F′(x*)‖⁻¹ δ_*. Then ‖e_0‖_* < δ_* if ‖e_0‖ < δ.

Our proof does not rely on the (possibly false) assertions that ‖e_n‖ < δ for all n or that x_n → x* q-linearly with respect to the unweighted norm. Rather we note that (6.4) implies that if ‖e_n‖_* < δ_* then

(6.9)  ‖e_{n+1}‖_* ≤ η̄ ‖e_n‖_* < δ_*

and hence x_n → x* q-linearly with respect to the weighted norm. This proves the first assertion.

To prove the assertions on q-superlinear convergence, note that since x_n → x*, eventually ‖e_n‖ will be small enough so that the conclusions of Theorem 6.1.1 hold. The superlinear convergence results then follow exactly as in the proof of Theorem 6.1.2.

We close this section with some remarks on the two types of results in Theorems 6.1.2 and 6.1.4. The most significant difference is the norm in which q-linear convergence takes place. q-linear convergence with respect to the weighted norm ‖ · ‖_* is equivalent to q-linear convergence of the sequence of nonlinear residuals {‖F(x_n)‖}. We state this result as Proposition 6.1.1 and leave the proof to the reader in Exercise 6.5.2. The implications for the choice of the forcing terms {η_n} are also different. While Theorem 6.1.2 might lead one to believe that a small value of η_n is necessary for convergence, Theorem 6.1.4 shows that the sequence of forcing terms need only be kept bounded away from 1. A constant sequence such as η_n = .5 may well suffice for linear convergence, but at a price of a greater number of outer iterations. We will discuss other factors in the choice of forcing terms in § 6.3.

Proposition 6.1.1. Let the standard assumptions hold and let x_n → x*. Then ‖F(x_n)‖ converges q-linearly to 0 if and only if ‖e_n‖_* does.

6.1.3. Errors in the function. In this section, we state two theorems on inexact local improvement. We leave the proofs, which are direct analogs to the proofs of Theorems 6.1.1 and 6.1.3, to the reader as exercises. Note that errors in the derivative, such as those arising from a difference approximation of the action of F′(x_c) on a vector, can be regarded as part of the inexact solution of the equation for the Newton step. See [55] or Proposition 6.2.1 and its proof for an illustration of this point.

Theorem 6.1.5. Let the standard assumptions hold. Then there are δ and K_I such that if x_c ∈ B(δ), s satisfies

(6.10)  ‖F′(x_c)s + F(x_c) + ϵ(x_c)‖ ≤ η_c ‖F(x_c) + ϵ(x_c)‖

and x_+ = x_c + s, then

(6.11)  ‖e_+‖ ≤ K_I ((‖e_c‖ + η_c) ‖e_c‖ + ‖ϵ(x_c)‖).

Theorem 6.1.6. Let the standard assumptions hold. Then there is δ such that if x_c ∈ B(δ), s satisfies (6.10), x_+ = x_c + s, and η_c ≤ η < η̄ < 1, then there is K_E such that

(6.12)  ‖F′(x*)e_+‖ ≤ η̄ ‖F′(x*)e_c‖ + K_E ‖ϵ(x_c)‖.

6.2. Newton-iterative methods

A Newton-iterative method realizes (6.1) with an iterative method for the linear system for the Newton step, terminating when the relative linear residual is smaller than η_c (i.e., when (6.1) holds). The name of the method indicates which linear iterative method is used; for example, Newton-SOR, Newton-CG, and Newton-GMRES use SOR, CG, or GMRES, respectively, to solve the linear system. This naming convention is taken from [175] and [145]. These methods have also been called truncated Newton methods [56], [136] in the context of optimization. Typically the nonlinear iteration that generates the sequence {x_n} is called the outer iteration and the linear iteration that generates the approximations to the steps is called the inner iteration.