Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR16.html.

Copyright ©1995 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.

NEWTON'S METHOD

w ∈ R^N. We have

(5.15)    D_h F(x : w) =
              0,                                           w = 0,
              ‖w‖ (F(x + h‖x‖ w/‖w‖) − F(x)) / (h‖x‖),     w, x ≠ 0,
              ‖w‖ (F(hw/‖w‖) − F(x)) / h,                  x = 0, w ≠ 0.
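The three cases in (5.15) are easy to mistranscribe, so a small sketch may help. The Python function below is our own illustration (not part of the book's MATLAB suite) of the scaled forward-difference directional derivative, using the Euclidean norm:

```python
import numpy as np

def dh_f(F, x, w, h):
    """Scaled forward-difference directional derivative D_h F(x : w),
    following the three cases of (5.15)."""
    norm_w = np.linalg.norm(w)
    if norm_w == 0.0:
        return np.zeros_like(F(x))          # first case: w = 0
    norm_x = np.linalg.norm(x)
    if norm_x == 0.0:                        # third case: x = 0, w != 0
        return norm_w * (F(h * w / norm_w) - F(x)) / h
    # second case: w, x != 0; the step is scaled by both ||x|| and ||w||
    return norm_w * (F(x + h * norm_x * w / norm_w) - F(x)) / (h * norm_x)

# Example: F(x) = (x1^2, x1*x2); the exact F'(x)w at x = (1,1), w = (1,0)
# is (2, 1), and the difference approximation should agree to O(h).
F = lambda x: np.array([x[0] ** 2, x[0] * x[1]])
approx = dh_f(F, np.array([1.0, 1.0]), np.array([1.0, 0.0]), 1.0e-7)
```

Note how the scaling makes the actual difference increment proportional to ‖x‖ and independent of the size of w.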

The standard assumptions and the Banach Lemma imply the following result.

Lemma 5.4.1. Let the standard assumptions hold. Then there are δ, ϵ̄, h̄ > 0 such that if x ∈ B(δ), h ≤ h̄, and ‖ϵ(x)‖ ≤ ϵ̄ for all x ∈ B(δ), then ∇_h(F + ϵ)(x) is nonsingular for all x ∈ B(δ) and there is M_F > 0 such that

    ‖F′(x) − ∇_h(F + ϵ)(x)‖ ≤ M_F (h + ϵ̄/h).

If we assume that the forward difference step has been selected so that h = O(√ϵ̄), then ∆(x) = O(√ϵ̄) by Lemma 5.4.1. Theorem 5.4.1 implies the following result.

Theorem 5.4.6. Let the standard assumptions hold. Then there are positive δ, ϵ̄, and K_D such that if x_c ∈ B(δ), ‖ϵ(x)‖ ≤ ϵ̄ for all x ∈ B(δ), and there is M_− > 0 such that

    h_c > M_− √ϵ̄,

then

    x_+ = x_c − (∇_{h_c}(F + ϵ)(x_c))^{−1} (F(x_c) + ϵ(x_c))

satisfies

    ‖e_+‖ ≤ K_D (ϵ̄ + (‖e_c‖ + h_c) ‖e_c‖).

Note that Theorem 5.4.6 does not imply that an iteration will converge to x*. Even if ϵ is entirely due to floating-point roundoff, there is no guarantee that ϵ(x_n) → 0 as x_n → x*. Hence, one should expect the sequence {‖e_n‖} to stagnate and cease to decrease once ‖e_n‖ ≈ √ϵ̄. Therefore in the early phases of the iteration, while ‖e_n‖ ≫ √ϵ̄, the progress of the iteration will be indistinguishable from Newton's method as implemented with an exact Jacobian. When ‖e_n‖ ≤ √ϵ̄, Theorem 5.4.6 says that ‖e_{n+1}‖ = O(ϵ̄). Hence one will see quadratic convergence until the error in F admits no further reduction.
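This stagnation is easy to observe in one dimension. The Python sketch below is our own illustration; in particular the noise model ϵ(x) = ϵ̄ sin(10⁶x) is an artificial choice, made only so that the simulated evaluation error is deterministic. It runs a forward-difference Newton iteration on f(x) = x² − 4 with h = √ϵ̄:

```python
import math

def noisy(f, x, eps_bar):
    # epsilon(x): artificial, deterministic stand-in for evaluation error
    return f(x) + eps_bar * math.sin(1.0e6 * x)

def fd_newton(f, root, x, eps_bar, nits):
    h = math.sqrt(eps_bar)      # forward-difference step h = O(sqrt(eps_bar))
    errs = []
    for _ in range(nits):
        fx = noisy(f, x, eps_bar)
        d = (noisy(f, x + h, eps_bar) - fx) / h   # difference derivative
        x = x - fx / d
        errs.append(abs(x - root))                # |e_n| against the known root
    return errs

# f(x) = x^2 - 4 with root x* = 2; eps_bar = 1e-12, starting from x0 = 3
errs = fd_newton(lambda x: x * x - 4.0, 2.0, 3.0, 1.0e-12, 8)
```

The early errors decrease quadratically, indistinguishably from exact Newton, and then the sequence levels off near the noise floor rather than continuing to zero.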

A difference approximation to a full Jacobian does not use any special structure of the Jacobian and will probably be less efficient than a hand coded analytic Jacobian. If information on the sparsity pattern of the Jacobian is available it is sometimes possible [47], [42], to evaluate ∇_h F with far fewer than N evaluations of F. If the sparsity pattern is such that F′ can be reduced to


a block triangular form, the entire solution process can take advantage of this structure [59].

For problems in which the Lipschitz constant of F′ is large, the error in the difference approximation will be large, too. Moreover, D_h F(x : w) is not, in general, a linear function of w and it is usually the case that

    (∇_h F(x)) w ≠ D_h F(x : w).

If F′(x*) is ill conditioned or the Lipschitz constant of F′ is large, the difference between them may be significant and a difference approximation to a directional derivative, which uses the size of w, is likely to be more accurate than ∇_h F(x) w.

While we used full matrix approximations to numerical Jacobians in all the examples reported here, they should be used with caution.

5.4.5. The secant method. The results in this section are for single equations f(x) = 0, where f is a real-valued function of one real variable. We assume for the sake of simplicity that there are no errors in the evaluation of f, i.e., ϵ(x) = 0. The standard assumptions, then, imply that f′(x*) ≠ 0. There is no reason to use a difference approximation to f′ for equations in a single variable because one can use previously computed data to approximate f′(x_n) by

    a_n = (f(x_n) − f(x_{n−1})) / (x_n − x_{n−1}).

The resulting method is called the secant method and is due to Newton [140], [137]. We will prove that the secant method is locally q-superlinearly convergent if the standard assumptions hold.
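A minimal implementation of the secant iteration needs only the two most recent iterates and one new function evaluation per step. The sketch below is ours in Python (the book's codes are MATLAB), with a hypothetical stopping rule on the step size:

```python
def secant(f, x0, xm1, tol=1.0e-12, maxit=20):
    """Secant iteration: x_{n+1} = x_n - f(x_n)/a_n with
    a_n = (f(x_n) - f(x_{n-1}))/(x_n - x_{n-1})."""
    fm1 = f(xm1)
    for _ in range(maxit):
        f0 = f(x0)
        if f0 == 0.0:
            return x0
        a = (f0 - fm1) / (x0 - xm1)        # secant slope a_n
        x0, xm1, fm1 = x0 - f0 / a, x0, f0  # shift the iterate history
        if abs(x0 - xm1) < tol:             # step-size test (our choice)
            break
    return x0

# Example: the real root of x^3 = 2, starting from x0 = 1.5, x_{-1} = 1.4
root = secant(lambda x: x ** 3 - 2.0, 1.5, 1.4)
```

Note that only one evaluation of f per iteration is needed after startup, half the cost of a forward-difference Newton step.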

In order to start, one needs two approximations to x*, x_0 and x_{−1}. The local convergence theory can be easily described by Theorem 5.4.1. Let x_c be the current iterate and x_− the previous iterate. We have ϵ(x_c) = 0 and

(5.16)    ∆(x_c) = (f(x_c) − f(x_−)) / (x_c − x_−) − f′(x_c).

We have the following theorem.

Theorem 5.4.7. Let the standard assumptions hold with N = 1. Then there is δ > 0 such that if x_0, x_{−1} ∈ B(δ) and x_0 ≠ x_{−1} then the secant iterates converge q-superlinearly to x*.

Proof. Let δ be small enough so that B(δ) ⊂ Ω and the conclusions of Theorem 5.4.1 hold. Assume that x_{−1}, x_0, …, x_n ∈ B(δ). If x_n = x* for some finite n, we are done. Otherwise x_n ≠ x_{n−1}. If we set s = x_n − x_{n−1} we have by (5.16) and the fundamental theorem of calculus

    ∆(x_n) = ∫_0^1 (f′(x_{n−1} + ts) − f′(x_n)) dt


and hence

    |∆(x_n)| ≤ γ|x_{n−1} − x_n|/2 ≤ γ(|e_n| + |e_{n−1}|)/2.

Applying Theorem 5.4.1 implies

(5.17)    |e_{n+1}| ≤ K̄((1 + γ/2)|e_n|² + γ|e_n||e_{n−1}|/2).

If we reduce δ so that

    K̄(1 + γ)δ = η < 1

then x_n → x* q-linearly. Therefore

    |e_{n+1}| / |e_n| ≤ K̄((1 + γ/2)|e_n| + γ|e_{n−1}|/2) → 0

as n → ∞. This completes the proof.

The secant method and the classical bisection method (see Exercise 5.7.8) are the basis for the popular method of Brent [17] for the solution of single equations.

5.5. The Kantorovich Theorem

In this section we present the result of Kantorovich [107], [106], which states that if the standard assumptions "almost" hold at a point x_0, then there is a root and the Newton iteration converges r-quadratically to that root. This result is of use, for example, in proving that discretizations of nonlinear differential and integral equations have solutions that are near to that of the continuous problem.

We require the following assumption.

Assumption 5.5.1. There are β, η, r̄, and γ with βηγ ≤ 1/2 and x_0 ∈ R^N such that

1. F is differentiable at x_0,

    ‖F′(x_0)^{−1}‖ ≤ β, and ‖F′(x_0)^{−1} F(x_0)‖ ≤ η.

2. F′ is Lipschitz continuous with Lipschitz constant γ in a ball of radius r̄ ≥ r_− about x_0 where

    r_− = (1 − √(1 − 2βηγ)) / (βγ).

We will let B_0 be the closed ball of radius r_− about x_0

    B_0 = {x | ‖x − x_0‖ ≤ r_−}.
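The quantities in Assumption 5.5.1 are simple to compute numerically. The Python fragment below (our sketch, with illustrative parameter values of our own choosing) evaluates r_− together with the radius r_+ that appears later in the uniqueness assertion; note that r_− is, by construction, the smaller root of the quadratic η − r + βγr²/2 = 0, an identity used in the proof of Theorem 5.5.1:

```python
import math

def kantorovich_radii(beta, eta, gamma):
    """r_- and r_+ from Assumption 5.5.1; requires beta*eta*gamma <= 1/2."""
    disc = 1.0 - 2.0 * beta * eta * gamma
    if disc < 0.0:
        raise ValueError("Assumption 5.5.1 fails: beta*eta*gamma > 1/2")
    s = math.sqrt(disc)
    return (1.0 - s) / (beta * gamma), (1.0 + s) / (beta * gamma)

# Illustrative values (our choice): beta = gamma = 1, eta = 0.1, so
# beta*eta*gamma = 0.1 < 1/2.  One can verify the identity
# eta + beta*gamma*r_minus**2 / 2 == r_minus used in the contraction proof.
r_minus, r_plus = kantorovich_radii(beta=1.0, eta=0.1, gamma=1.0)
```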

We do not prove the result in its full generality and refer to [106], [145], [57], and [58] for a complete proof and for discussions of related results. However, we give a simplified version of a Kantorovich-like result for the chord


method to show how existence of a solution can be derived from estimates on the function and Jacobian. Unlike the chord method results in [106], [145], and [57], Theorem 5.5.1 assumes a bit more than Assumption 5.5.1 but has a simple proof that uses only the contraction mapping theorem and the fundamental theorem of calculus and obtains q-linear convergence instead of r-linear convergence.

Theorem 5.5.1. Let Assumption 5.5.1 hold and let

(5.18)    βγη < 1/2.

Then there is a unique root x* of F in B_0, the chord iteration with x_0 as the initial iterate converges to x* q-linearly with q-factor βγr_−, and x_n ∈ B_0 for all n. Moreover x* is the unique root of F in the ball of radius

    min( r̄, (1 + √(1 − 2βηγ)) / (βγ) )

about x_0.

Proof. We will show that the map

    φ(x) = x − F′(x_0)^{−1} F(x)

is a contraction on the set

    S = {x | ‖x − x_0‖ ≤ r_−}.

By the fundamental theorem of calculus and Assumption 5.5.1 we have for all x ∈ S

          F′(x_0)^{−1} F(x) = F′(x_0)^{−1} F(x_0)
                  + F′(x_0)^{−1} ∫_0^1 F′(x_0 + t(x − x_0)) (x − x_0) dt
(5.19)
              = F′(x_0)^{−1} F(x_0) + x − x_0
                  + F′(x_0)^{−1} ∫_0^1 (F′(x_0 + t(x − x_0)) − F′(x_0)) (x − x_0) dt

and

    φ(x) − x_0 = −F′(x_0)^{−1} F(x_0)
                 − F′(x_0)^{−1} ∫_0^1 (F′(x_0 + t(x − x_0)) − F′(x_0)) (x − x_0) dt.

Hence,

    ‖φ(x) − x_0‖ ≤ η + βγ r_−²/2 = r_−


and φ maps S into itself.

To show that φ is a contraction on S and to prove the q-linear convergence we will show that

    ‖φ(x) − φ(y)‖ ≤ βγ r_− ‖x − y‖

for all x, y ∈ S. If we do this, the contraction mapping theorem (Theorem 4.2.1) will imply that there is a unique fixed point x* of φ in S and that x_n → x* q-linearly with q-factor βγr_−. Clearly x* is a root of F because it is a fixed point of φ. We know that βγr_− < 1 by our assumption that βηγ < 1/2.

Note that for all x ∈ B_0

    φ′(x) = I − F′(x_0)^{−1} F′(x) = F′(x_0)^{−1} (F′(x_0) − F′(x))

and hence

    ‖φ′(x)‖ ≤ βγ ‖x − x_0‖ ≤ βγ r_− < 1.

Therefore, for all x, y ∈ B_0,

    ‖φ(x) − φ(y)‖ = ‖ ∫_0^1 φ′(y + t(x − y)) (x − y) dt ‖ ≤ βγ r_− ‖x − y‖.

This proves the convergence result and the uniqueness of x* in B_0.

To prove the remainder of the uniqueness assertion let

    r_+ = (1 + √(1 − 2βηγ)) / (βγ)

and note that r_+ > r_− because βηγ < 1/2. If r̄ = r_−, then we are done by the uniqueness of x* in B_0. Hence we may assume that r̄ > r_−. We must show that if x is such that

    r_− < ‖x − x_0‖ < min(r̄, r_+)

then F(x) ≠ 0. Letting r = ‖x − x_0‖ we have by (5.19)

    ‖F′(x_0)^{−1} F(x)‖ ≥ r − η − βγr²/2 > 0

because r_− < r < r_+. This completes the proof.

The Kantorovich theorem is more precise than Theorem 5.5.1 and has a very different proof. We state the theorem in the standard way [145], [106], [63], using the r-quadratic estimates from [106].

Theorem 5.5.2. Let Assumption 5.5.1 hold. Then there is a unique root x* of F in B_0, the Newton iteration with x_0 as the initial iterate converges to x*, and x_n ∈ B_0 for all n. x* is the unique root of F in the ball of radius

    min( r̄, (1 + √(1 − 2βηγ)) / (βγ) )

about x_0 and the errors satisfy

(5.20)    ‖e_n‖ ≤ (2βηγ)^{2^n} / (2^n βγ).


If βηγ < 1/2 then (5.20) implies that the convergence is r-quadratic (see Exercise 5.7.16).
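The doubly exponential decay in (5.20) is visible after only a few terms. The fragment below is our own illustration (the parameter values are our choice) and simply tabulates the right-hand side of the bound:

```python
def kantorovich_bound(beta, eta, gamma, n):
    """Right-hand side of (5.20): (2*b*e*g)**(2**n) / (2**n * b*g)."""
    return (2.0 * beta * eta * gamma) ** (2 ** n) / (2 ** n * beta * gamma)

# With beta = gamma = 1 and eta = 0.1 (illustrative), 2*beta*eta*gamma = 0.2,
# and the bounds for n = 0..3 are approximately 0.2, 0.02, 4.0e-4, 3.2e-7.
bounds = [kantorovich_bound(1.0, 0.1, 1.0, n) for n in range(4)]
```

The exponent 2^n is what makes the estimate r-quadratic rather than merely r-linear.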

5.6. Examples for Newton's method

In the collection of MATLAB codes we provide an implementation nsol of Newton's method, the chord method, the Shamanskii method, or a combination of all of them based on the reduction in the nonlinear residual. This latter option decides if the Jacobian should be recomputed based on the ratio of successive residuals. If

    ‖F(x_+)‖ / ‖F(x_c)‖

is below a given threshold, the Jacobian is not recomputed. If the ratio is too large, however, we compute and factor F′(x_+) for use in the subsequent chord steps. In addition to the inputs of Algorithm sham, a threshold for the ratio of successive residuals is also input. The Jacobian is recomputed and factored if either the ratio of successive residuals exceeds the threshold 0 < ρ < 1 or the number of iterations without an update exceeds m. The output of the MATLAB implementation of nsol is the solution, the history of the iteration stored as the vector of l∞ norms of the nonlinear residuals, and an error flag. nsol uses diffjac to approximate the Jacobian by ∇_h F with h = 10^{−7}.

While our examples have dense Jacobians, the ideas in the implementation of nsol also apply to situations in which the Jacobian is sparse and direct methods for solving the equation for the Newton step are necessary. One would still want to minimize the number of Jacobian computations (by the methods of [47] or [42], say) and sparse matrix factorizations. In Exercise 5.7.25, which is not easy, the reader is asked to create a sparse form of nsol.

Algorithm 5.6.1. nsol(x, F, τ, m, ρ)

1. r_c = r_0 = ‖F(x)‖

2. Do while ‖F(x)‖ > τ_r r_0 + τ_a

   (a) Compute ∇_h F(x) ≈ F′(x)
   (b) Factor ∇_h F(x) = LU.
   (c) is = 0, σ = 0
   (d) Do while is < m and σ ≤ ρ
       i. is = is + 1
       ii. Solve LU s = −F(x)
       iii. x = x + s
       iv. Evaluate F(x)
       v. r_+ = ‖F(x)‖, σ = r_+/r_c, r_c = r_+
       vi. If ‖F(x)‖ ≤ τ_r r_0 + τ_a exit.
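For readers without MATLAB, Algorithm nsol translates almost line by line into the Python sketch below. This is our own illustration, not the book's code: for simplicity it re-solves with the stored difference Jacobian instead of reusing an LU factorization, which gives the same chord iterates at higher cost per step.

```python
import numpy as np

def diffjac(F, x, Fx, h=1.0e-7):
    """Forward-difference Jacobian approximation, column by column."""
    n = x.size
    J = np.empty((n, n))
    for j in range(n):
        xp = x.copy()
        xp[j] += h
        J[:, j] = (F(xp) - Fx) / h
    return J

def nsol(x, F, tau_r, tau_a, m=1000, rho=0.5, maxit=40):
    """Hybrid Newton/chord iteration in the spirit of Algorithm 5.6.1."""
    x = np.asarray(x, dtype=float)
    Fx = F(x)
    r0 = rc = np.linalg.norm(Fx, np.inf)
    hist, its = [r0], 0
    while np.linalg.norm(Fx, np.inf) > tau_r * r0 + tau_a and its < maxit:
        J = diffjac(F, x, Fx)            # recompute the Jacobian
        i_s, sigma = 0, 0.0
        while i_s < m and sigma <= rho and its < maxit:
            i_s += 1
            its += 1
            s = np.linalg.solve(J, -Fx)  # chord step with the stored Jacobian
            x = x + s
            Fx = F(x)
            rp = np.linalg.norm(Fx, np.inf)
            sigma, rc = rp / rc, rp       # ratio of successive residuals
            hist.append(rp)
            if rp <= tau_r * r0 + tau_a:
                return x, hist
    return x, hist

# Example: F(x) = (x1^2 + x2^2 - 2, x1 - x2) with root (1, 1).
F = lambda x: np.array([x[0] ** 2 + x[1] ** 2 - 2.0, x[0] - x[1]])
sol, hist = nsol(np.array([2.0, 0.5]), F, 1.0e-10, 1.0e-10)
```

With the default ρ = 0.5 the Jacobian is refactored only when the residual reduction per step degrades, exactly the trade-off the text describes.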

In the MATLAB code the iteration is terminated with an error condition if σ ≥ 1 at any point. This indicates that the local convergence analysis in this


section is not applicable. In Chapter 8 we discuss how to recover from this. nsol is based on the assumption that a difference Jacobian will be used. In Exercise 5.7.21 you are asked to modify the code to accept an analytic Jacobian. Many times the Jacobian can be computed efficiently by reusing results that are already available from the function evaluation. The parameters m and ρ are assigned the default values of 1000 and .5. With these default parameters the decision to update the Jacobian is made based entirely on the reduction in the norm of the nonlinear residual. nsol allows the user to specify a maximum number of iterations; the default is 40.

The Chandrasekhar H-equation. The Chandrasekhar H-equation, [41], [30],

(5.21)    F(H)(µ) = H(µ) − ( 1 − (c/2) ∫_0^1 µ H(ν) / (µ + ν) dν )^{−1} = 0,