For definiteness in the description of Algorithm newton we use an LU factorization of F′. Any other appropriate factorization, such as QR or Cholesky, could be used as well. The inputs of the algorithm are the initial iterate x, the nonlinear map F, and a vector of termination tolerances τ = (τr, τa) ∈ R².

Algorithm 5.3.1. newton(x, F, τ)

1. r0 = ‖F(x)‖

2. Do while ‖F(x)‖ > τr r0 + τa

(a) Compute F′(x)

(b) Factor F′(x) = LU.

(c) Solve LUs = −F(x)

(d) x = x + s

(e) Evaluate F(x).
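As an illustration, Algorithm newton can be sketched in a few lines of Python. This is a minimal sketch, not the book's code: the names `newton`, `F`, and `Fp` are our own, `np.linalg.solve` stands in for the explicit LU factor/solve pair of steps 2b–2c, and a `max_it` safeguard is added.

```python
import numpy as np

def newton(x, F, Fp, tau_r, tau_a, max_it=50):
    """Newton's method: terminate when ||F(x)|| <= tau_r*r0 + tau_a."""
    r0 = np.linalg.norm(F(x))
    for _ in range(max_it):
        if np.linalg.norm(F(x)) <= tau_r * r0 + tau_a:
            break
        # Steps 2a-2c: form F'(x) and solve F'(x) s = -F(x);
        # np.linalg.solve performs an LU-based solve internally.
        s = np.linalg.solve(Fp(x), -F(x))
        x = x + s  # step 2d; F is re-evaluated in the loop test (step 2e)
    return x
```

For example, with the (hypothetical) test map F(x) = (x1² + x2² − 2, x1 − x2), whose root is (1, 1), a few iterations from (2, 1) reach the root to high accuracy.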

One approach to reducing the cost of items 2a and 2b in the Newton iteration is to move them outside of the main loop. This means that the linear approximation of F(x) = 0 that is solved at each iteration has derivative determined by the initial iterate:

x+ = xc − F′(x0)⁻¹ F(xc).

This method is called the chord method. The inputs are the same as for Newton's method.

Algorithm 5.3.2. chord(x, F, τ)

1. r0 = ‖F(x)‖

2. Compute F′(x)

3. Factor F′(x) = LU.

Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR16.html.

Copyright ©1995 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.

NEWTON'S METHOD

4. Do while ‖F(x)‖ > τr r0 + τa

(a) Solve LUs = −F(x)

(b) x = x + s

(c) Evaluate F(x).
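A matching Python sketch of the chord method, under the same assumptions as the earlier sketch (hypothetical names `chord`, `F`, `Fp`; `np.linalg.solve` in place of a stored LU factorization — a production code would factor `J0` once and reuse the L and U factors):

```python
import numpy as np

def chord(x, F, Fp, tau_r, tau_a, max_it=500):
    """Chord method: the Jacobian is computed and factored before the loop."""
    r0 = np.linalg.norm(F(x))
    J0 = Fp(x)  # steps 2-3: compute and (conceptually) factor F'(x0)
    for _ in range(max_it):
        if np.linalg.norm(F(x)) <= tau_r * r0 + tau_a:
            break
        x = x + np.linalg.solve(J0, -F(x))  # reuse J0 at every iteration
    return x
```

The iteration trades the quadratic convergence of Newton's method for a much cheaper step; on the same test problem it converges q-linearly when started near the root.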

The only difference in implementation from Newton's method is that the computation and factorization of the Jacobian are done before the iteration is begun. The difference in the iteration itself is that an approximation to F′(xc) is used. Similar differences arise if F′(xc) is numerically approximated by differences. We continue in the next section with an analysis of the effects of errors in the function and Jacobian in a general context and then consider the chord method and difference approximation to F′ as applications.

5.4. Errors in the function and derivative

Suppose that F and F′ are computed inaccurately so that F + ε and F′ + Δ are used instead of F and F′ in the iteration. If Δ is sufficiently small, the resulting iteration can return a result that is an O(ε) accurate approximation to x*. This is different from convergence and was called "local improvement" in [65]. These issues are discussed in [201] as well. If, for example, ε is entirely due to floating-point roundoff, there is no reason to expect that ‖F(xn)‖ will ever be smaller than ε in general. We will refer to this phase of the iteration in which the nonlinear residual is no longer being reduced as stagnation of the iteration.

Theorem 5.4.1. Let the standard assumptions hold. Then there are K̄ > 0, δ > 0, and δ1 > 0 such that if xc ∈ B(δ) and ‖Δ(xc)‖ < δ1 then

x+ = xc − (F′(xc) + Δ(xc))⁻¹ (F(xc) + ε(xc))

is defined (i.e., F′(xc) + Δ(xc) is nonsingular) and satisfies

(5.5) ‖e+‖ ≤ K̄(‖ec‖² + ‖Δ(xc)‖ ‖ec‖ + ‖ε(xc)‖).

Proof. Let δ be small enough so that the conclusions of Lemma 4.3.1 and Theorem 5.1.1 hold. Let

x^N_+ = xc − F′(xc)⁻¹ F(xc)

and note that

x+ = x^N_+ + (F′(xc)⁻¹ − (F′(xc) + Δ(xc))⁻¹) F(xc) − (F′(xc) + Δ(xc))⁻¹ ε(xc).

Lemma 4.3.1 and Theorem 5.1.1 imply

(5.6) ‖e+‖ ≤ K‖ec‖² + 2‖F′(xc)⁻¹ − (F′(xc) + Δ(xc))⁻¹‖ ‖F′(x*)‖ ‖ec‖ + ‖(F′(xc) + Δ(xc))⁻¹‖ ‖ε(xc)‖.


ITERATIVE METHODS FOR LINEAR AND NONLINEAR EQUATIONS

If

‖Δ(xc)‖ ≤ ‖F′(x*)⁻¹‖⁻¹/4

then Lemma 4.3.1 implies that

‖Δ(xc)‖ ≤ ‖F′(xc)⁻¹‖⁻¹/2

and the Banach Lemma implies that F′(xc) + Δ(xc) is nonsingular and

‖(F′(xc) + Δ(xc))⁻¹‖ ≤ 2‖F′(xc)⁻¹‖ ≤ 4‖F′(x*)⁻¹‖.

Hence

‖F′(xc)⁻¹ − (F′(xc) + Δ(xc))⁻¹‖ ≤ 8‖F′(x*)⁻¹‖² ‖Δ(xc)‖.

We use these estimates and (5.6) to obtain

‖e+‖ ≤ K‖ec‖² + 16‖F′(x*)⁻¹‖² ‖F′(x*)‖ ‖Δ(xc)‖ ‖ec‖ + 4‖F′(x*)⁻¹‖ ‖ε(xc)‖.

Setting

K̄ = K + 16‖F′(x*)⁻¹‖² ‖F′(x*)‖ + 4‖F′(x*)⁻¹‖

completes the proof.

The remainder of this section is devoted to applications of Theorem 5.4.1.

5.4.1. The chord method. Recall that the chord method is given by

x+ = xc − F′(x0)⁻¹ F(xc).

In the language of Theorem 5.4.1,

ε(xc) = 0,  Δ(xc) = F′(x0) − F′(xc).

Hence, if xc, x0 ∈ B(δ) ⊂ Ω,

(5.7) ‖Δ(xc)‖ ≤ γ‖x0 − xc‖ ≤ γ(‖e0‖ + ‖ec‖).

We apply Theorem 5.4.1 to obtain the following result.

Theorem 5.4.2. Let the standard assumptions hold. Then there are KC > 0 and δ > 0 such that if x0 ∈ B(δ) the chord iterates converge q-linearly to x* and

(5.8) ‖en+1‖ ≤ KC ‖e0‖ ‖en‖.

Proof. Let δ be small enough so that B(δ) ⊂ Ω and the conclusions of Theorem 5.4.1 hold. Assume that xn ∈ B(δ). Combining (5.7) and (5.5) implies

‖en+1‖ ≤ K̄(‖en‖(1 + γ) + γ‖e0‖)‖en‖ ≤ K̄(1 + 2γ)δ‖en‖.

Hence if δ is small enough so that

K̄(1 + 2γ)δ = η < 1


then the chord iterates converge q-linearly to x*. Q-linear convergence implies that ‖en‖ < ‖e0‖ and hence (5.8) holds with KC = K̄(1 + 2γ).

Another variation of the chord method is

x+ = xc − A⁻¹ F(xc),

where A ≈ F′(x*). Methods of this type may be viewed as preconditioned nonlinear Richardson iteration. Since

‖Δ(xc)‖ = ‖A − F′(xc)‖ ≤ ‖A − F′(x*)‖ + ‖F′(x*) − F′(xc)‖,

if xc ∈ B(δ) ⊂ Ω then

‖Δ(xc)‖ ≤ ‖A − F′(x*)‖ + γ‖ec‖ ≤ ‖A − F′(x*)‖ + γδ.

Theorem 5.4.3. Let the standard assumptions hold. Then there are KA > 0, δ > 0, and δ1 > 0 such that if x0 ∈ B(δ) and ‖A − F′(x*)‖ < δ1 then the iteration

xn+1 = xn − A⁻¹ F(xn)

converges q-linearly to x* and

(5.9) ‖en+1‖ ≤ KA(‖e0‖ + ‖A − F′(x*)‖)‖en‖.

5.4.2. Approximate inversion of F′. Another way to implement chord-type methods is to provide an approximate inverse of F′. Here we replace F′(x)⁻¹ by B(x), where the action of B on a vector is less expensive to compute than a solution using the LU factorization. Rather than express the iteration in terms of B⁻¹(x) − F′(x) and using Theorem 5.4.3, one can proceed directly from the definition of approximate inverse.

Note that if B is constant (independent of x), then the iteration

x+ = xc − BF(xc)

can be viewed as a preconditioned nonlinear Richardson iteration.
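A sketch of this constant-B iteration in Python (our names; in practice B would come from factoring, or approximately inverting, a Jacobian evaluated at a nearby point):

```python
import numpy as np

def richardson(x, F, B, tau_r, tau_a, max_it=1000):
    """Preconditioned nonlinear Richardson iteration: x+ = xc - B F(xc)."""
    r0 = np.linalg.norm(F(x))
    for _ in range(max_it):
        if np.linalg.norm(F(x)) <= tau_r * r0 + tau_a:
            break
        x = x - B @ F(x)  # constant approximate inverse B of F'(x*)
    return x
```

Theorem 5.4.4 below predicts q-linear convergence whenever ‖I − B F′(x*)‖ stays well below 1 near the root.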

We have the following result.

Theorem 5.4.4. Let the standard assumptions hold. Then there are KB > 0, δ > 0, and δ1 > 0 such that if x0 ∈ B(δ) and the matrix-valued function B(x) satisfies

(5.10) ‖I − B(x)F′(x*)‖ = ρ(x) < δ1

for all x ∈ B(δ), then the iteration

xn+1 = xn − B(xn)F(xn)

converges q-linearly to x* and

(5.11) ‖en+1‖ ≤ KB(ρ(xn) + ‖en‖)‖en‖.


Proof. On one hand, this theorem could be regarded as a corollary of the Banach Lemma and Theorem 5.4.1. We give a direct proof.

First, by (5.10) we have

(5.12) ‖B(x)‖ = ‖B(x)F′(x*)F′(x*)⁻¹‖ ≤ ‖B(x)F′(x*)‖ ‖F′(x*)⁻¹‖ ≤ MB = (1 + δ1)‖F′(x*)⁻¹‖.

Using (5.12) and

e+ = ec − B(xc)F(xc) = ∫₀¹ (I − B(xc)F′(x* + t ec)) ec dt

= (I − B(xc)F′(x*)) ec + B(xc) ∫₀¹ (F′(x*) − F′(x* + t ec)) ec dt,

we have

‖e+‖ ≤ ρ(xc)‖ec‖ + MB γ‖ec‖²/2.

This completes the proof with KB = 1 + MB γ/2.

5.4.3. The Shamanskii method. Alternation of a Newton step with a sequence of chord steps leads to a class of high-order methods, that is, methods that converge q-superlinearly with q-order larger than 2. We follow [18] and name this method for Shamanskii [174], who considered the finite difference Jacobian formulation of the method from the point of view of efficiency. The method itself was analyzed in [190]. Other high-order methods, especially for problems in one dimension, are described in [190] and [146].

We can describe the transition from xc to x+ by

(5.13)
y1 = xc − F′(xc)⁻¹ F(xc),
yj+1 = yj − F′(xc)⁻¹ F(yj) for 1 ≤ j ≤ m − 1,
x+ = ym.

Note that m = 1 is Newton's method and m = ∞ is the chord method with {yj} playing the role of the chord iterates.

Algorithmically a second loop surrounds the factor/solve step. The inputs for the Shamanskii method are the same as for Newton's method or the chord method except for the addition of the parameter m. Note that we can overwrite xc with the sequence of yj's. Note that we apply the termination test after computation of each yj.

Algorithm 5.4.1. sham(x, F, τ, m)

1. r0 = ‖F(x)‖

2. Do while ‖F(x)‖ > τr r0 + τa


(a) Compute F′(x)

(b) Factor F′(x) = LU.

(c) For j = 1, . . . , m

i. Solve LUs = −F(x)

ii. x = x + s

iii. Evaluate F(x).

iv. If ‖F(x)‖ ≤ τr r0 + τa exit.
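The loop structure of Algorithm sham translates directly into Python (same hypothetical setup as the earlier sketches; the inner loop reuses the Jacobian `J` computed in step 2a, and the termination test runs after every inner step):

```python
import numpy as np

def sham(x, F, Fp, tau_r, tau_a, m, max_it=50):
    """Shamanskii method: one Jacobian per outer iteration, m solves with it."""
    r0 = np.linalg.norm(F(x))
    for _ in range(max_it):
        if np.linalg.norm(F(x)) <= tau_r * r0 + tau_a:
            return x
        J = Fp(x)  # steps (a)-(b): computed (and factored) once per outer pass
        for _ in range(m):  # step (c): m chord-like steps with the same J
            x = x + np.linalg.solve(J, -F(x))
            if np.linalg.norm(F(x)) <= tau_r * r0 + tau_a:
                return x  # step (c)iv: test after each inner step
    return x
```

Setting m = 1 reproduces Newton's method, while large m approaches the chord method, as noted above.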

The convergence result is a simple application of Theorem 5.4.2.

Theorem 5.4.5. Let the standard assumptions hold and let m ≥ 1 be given. Then there are KS > 0 and δ > 0 such that if x0 ∈ B(δ) the Shamanskii iterates converge q-superlinearly to x* with q-order m + 1 and

(5.14) ‖en+1‖ ≤ KS ‖en‖^{m+1}.

Proof. Let δ be small enough so that B(δ) ⊂ Ω and that the conclusions of Theorem 5.4.2 hold. Then if xn ∈ B(δ) all the intermediate iterates {yj} are in B(δ) by Theorem 5.4.2. In fact, if we set y0 = xn, (5.8) implies that for 1 ≤ j ≤ m

‖yj − x*‖ ≤ KC ‖xn − x*‖ ‖yj−1 − x*‖ = KC ‖en‖ ‖yj−1 − x*‖ ≤ . . . ≤ KC^j ‖en‖^{j+1}.

Hence xn+1 ∈ B(δ). Setting j = m in the above inequality completes the proof.

The advantage of the Shamanskii method over Newton's method is that high q-orders can be obtained with far fewer Jacobian evaluations or factorizations. Optimal choices of m as a function of N can be derived under assumptions consistent with dense matrix algebra [18] by balancing the O(N³) cost of a matrix factorization with the cost of function and Jacobian evaluation. The analysis in [18] and the expectation that, if the initial iterate is sufficiently accurate, only a small number of iterations will be taken, indicate that the chord method is usually the best option for very large problems. Algorithm nsol is based on this idea and, using an idea from [114], computes and factors the Jacobian only until an estimate of the q-factor for the linear convergence rate is sufficiently low.

5.4.4. Difference approximation to F′. Assume that we compute F(x) + ε(x) instead of F(x) and attempt to approximate the action of F′ on a vector by a forward difference. We would do this, for example, when building an approximate Jacobian. Here the jth column of F′(x) is F′(x)ej, where ej is the unit vector with jth component 1 and other components 0. A forward difference approximation to F′(x)w would be

(F(x + hw) + ε(x + hw) − F(x) − ε(x)) / h.


Assume that ‖ε(x)‖ ≤ ε̄. Then

‖F′(x)w − (F(x + hw) + ε(x + hw) − F(x) − ε(x))/h‖ = O(h + ε̄/h).

The quantity inside the O-term is minimized when h = √ε̄. This means, for instance, that very small difference steps h can lead to inaccurate results. In the special case where ε(x) is a result of floating-point roundoff in full precision (ε̄ ≈ 10⁻¹⁵), the analysis here indicates that h ≈ 10⁻⁷ is a reasonable choice. The choice h ≈ 10⁻¹⁵ in this case can lead to disaster (try it!). Here we have made the implicit assumption that x and w are of roughly the same size. If this is not the case, h should be scaled to reflect that. The choice

h = ε̄^{1/2} ‖x‖ / ‖w‖

reflects all the discussion above.
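The scaled-step rule can be packaged as a small Python helper. This is a sketch under the assumptions above: the names `dirder` and `eps_bar` are ours, with `eps_bar` the bound ε̄ on the error in F, and the unscaled step √ε̄ is used as a fallback when x = 0.

```python
import numpy as np

def dirder(F, x, w, eps_bar=1e-15):
    """Forward-difference approximation to F'(x)w with the scaled step
    h = sqrt(eps_bar)*||x||/||w||, falling back to sqrt(eps_bar) if x = 0."""
    h = np.sqrt(eps_bar)
    xnorm, wnorm = np.linalg.norm(x), np.linalg.norm(w)
    if xnorm > 0:
        h = h * xnorm / wnorm  # rescale when x and w differ in size
    return (F(x + h * w) - F(x)) / h
```

With ε̄ ≈ 10⁻¹⁵ this gives h on the order of 10⁻⁸ to 10⁻⁷, consistent with the discussion above.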

A more important consequence of this analysis is that the choice of step size in a forward difference approximation to the action of the Jacobian on a vector must take into account the error in the evaluation of F. This can become very important if part of F is computed from measured data.

If F(x) has already been computed, the cost of a forward difference approximation of F′(x)w is an additional evaluation of F (at the point x + hw). Hence the evaluation of a full finite difference Jacobian would cost N function evaluations, one for each column of the Jacobian. Exploitation of special structure could reduce this cost.

The forward difference approximation to the directional derivative is not a linear operator in w. The reason for this is that the derivative has been approximated, and so linearity has been lost. Because of this we must carefully specify what we mean by a difference approximation to the Jacobian. Definition 5.4.1 below specifies the usual choice.

Definition 5.4.1. Let F be defined in a neighborhood of x ∈ R^N. (∇h F)(x) is the N × N matrix whose jth column is given by

(∇h F)(x)_j = (F(x + h‖x‖ej) − F(x)) / (h‖x‖)  if x ≠ 0,

(∇h F)(x)_j = (F(hej) − F(x)) / h  if x = 0.
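Definition 5.4.1 can be transcribed into Python as follows (a sketch; `diff_jac` is our name, and `h` defaults to the 10⁻⁷ scale suggested earlier for double-precision roundoff):

```python
import numpy as np

def diff_jac(F, x, h=1e-7):
    """(grad_h F)(x): forward-difference Jacobian, column j per Definition 5.4.1."""
    n = x.size
    F0 = F(x)
    xnorm = np.linalg.norm(x)
    step = h * xnorm if xnorm > 0 else h  # step h*||x|| e_j, or h e_j when x = 0
    I = np.eye(n)
    J = np.empty((n, n))
    for j in range(n):
        J[:, j] = (F(x + step * I[:, j]) - F0) / step
    return J
```

As noted above, forming the full matrix costs N extra evaluations of F, one per column.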

We make a similar definition of the difference approximation of the directional derivative.

Definition 5.4.2. Let F be defined in a neighborhood of x ∈ R^N and let