Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR16.html.

Copyright ©1995 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.

FIXED-POINT ITERATION

Before discussing convergence of fixed-point iteration we make two definitions.

Definition 4.2.1. Let $\Omega \subset R^N$ and let $G : \Omega \to R^M$. $G$ is Lipschitz continuous on $\Omega$ with Lipschitz constant $\gamma$ if
$$\|G(x) - G(y)\| \le \gamma \|x - y\|$$
for all $x, y \in \Omega$.

Definition 4.2.2. Let $\Omega \subset R^N$. $K : \Omega \to R^N$ is a contraction mapping on $\Omega$ if $K$ is Lipschitz continuous on $\Omega$ with Lipschitz constant $\gamma < 1$.

The standard result for fixed-point iteration is the Contraction Mapping Theorem [9]. Compare the proof to that of Lemma 1.2.1 and Corollary 1.2.1.

Theorem 4.2.1. Let $\Omega$ be a closed subset of $R^N$ and let $K$ be a contraction mapping on $\Omega$ with Lipschitz constant $\gamma < 1$ such that $K(x) \in \Omega$ for all $x \in \Omega$. Then there is a unique fixed point of $K$, $x^* \in \Omega$, and the iteration defined by (4.3) converges q-linearly to $x^*$ with q-factor $\gamma$ for all initial iterates $x_0 \in \Omega$.

Proof. Let $x_0 \in \Omega$. Note that $\{x_n\} \subset \Omega$ because $x_0 \in \Omega$ and $K(x) \in \Omega$ whenever $x \in \Omega$. The sequence $\{x_n\}$ remains bounded since for all $i \ge 1$
$$\|x_{i+1} - x_i\| = \|K(x_i) - K(x_{i-1})\| \le \gamma \|x_i - x_{i-1}\| \le \dots \le \gamma^i \|x_1 - x_0\|,$$
and therefore
$$\|x_n - x_0\| = \Big\| \sum_{i=0}^{n-1} (x_{i+1} - x_i) \Big\| \le \sum_{i=0}^{n-1} \|x_{i+1} - x_i\| \le \|x_1 - x_0\| \sum_{i=0}^{n-1} \gamma^i \le \|x_1 - x_0\| / (1 - \gamma).$$

Now, for all $n, k \ge 0$,
$$\|x_{n+k} - x_n\| = \|K(x_{n+k-1}) - K(x_{n-1})\| \le \gamma \|x_{n+k-1} - x_{n-1}\| \le \gamma \|K(x_{n+k-2}) - K(x_{n-2})\| \le \gamma^2 \|x_{n+k-2} - x_{n-2}\| \le \dots \le \gamma^n \|x_k - x_0\| \le \gamma^n \|x_1 - x_0\| / (1 - \gamma).$$

Hence
$$\lim_{n,k \to \infty} \|x_{n+k} - x_n\| = 0,$$
and therefore the sequence $\{x_n\}$ is a Cauchy sequence and has a limit $x^*$ [162].

If $K$ has two fixed points $x^*$ and $y^*$ in $\Omega$, then
$$\|x^* - y^*\| = \|K(x^*) - K(y^*)\| \le \gamma \|x^* - y^*\|.$$


Since $\gamma < 1$ this can only hold if $\|x^* - y^*\| = 0$, i.e., $x^* = y^*$. Hence the fixed point is unique.

Finally we note that
$$\|x_{n+1} - x^*\| = \|K(x_n) - K(x^*)\| \le \gamma \|x_n - x^*\|,$$
which completes the proof.
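The iteration in the proof is short enough to sketch directly. The following Python fragment (the codes accompanying the book are in MATLAB; the function name `fixed_point` and the example map $K(x) = \cos x$, a contraction on $[0, 1]$ with $\gamma = \sin 1 < 1$, are our own illustrative choices, not from the text) exhibits the q-linear convergence guaranteed by Theorem 4.2.1:

```python
import math

def fixed_point(K, x0, tol=1e-10, maxit=1000):
    """Iterate x_{n+1} = K(x_n). Converges q-linearly with q-factor gamma
    when K is a contraction mapping with Lipschitz constant gamma < 1."""
    x = x0
    for _ in range(maxit):
        x_new = K(x)
        if abs(x_new - x) <= tol:
            return x_new
        x = x_new
    raise RuntimeError("fixed-point iteration did not converge")

# K(x) = cos(x) maps [0, 1] into itself and |K'| <= sin(1) < 1 there,
# so Theorem 4.2.1 applies with Omega = [0, 1].
xstar = fixed_point(math.cos, 1.0)
print(xstar)  # ~0.739085, the unique solution of cos(x) = x
```

With $\gamma \approx 0.84$ the error shrinks by that factor each step, so roughly 130 iterations are needed for a tolerance of $10^{-10}$; this slowness is exactly why the faster methods of the following chapters are of interest.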

One simple application of this theorem is to numerical integration of ordinary differential equations by implicit methods. For example, consider the initial value problem
$$y' = f(y), \quad y(t_0) = y_0$$
and its solution by the backward Euler method with time step $h$,
$$(4.4) \qquad y^{n+1} - y^n = h f(y^{n+1}).$$
In (4.4) $y^k$ denotes the approximation of $y(t_0 + kh)$. Advancing in time from $y^n$ to $y^{n+1}$ requires solution of the fixed-point problem
$$y = K(y) = y^n + h f(y).$$

For simplicity we assume that $f$ is bounded and Lipschitz continuous, say
$$|f(y)| \le M \quad \text{and} \quad |f(x) - f(y)| \le M |x - y|$$
for all $x, y$.

If $hM < 1$, we may apply the contraction mapping theorem with $\Omega = R$ since for all $x, y$
$$|K(x) - K(y)| = h |f(x) - f(y)| \le h M |x - y|.$$

From this we conclude that for $h$ sufficiently small we can integrate forward in time and that the time step $h$ can be chosen independently of $y$. This time step may well be too small to be practical, and the methods based on Newton's method which we present in the subsequent chapters are used much more often [83]. This type of analysis was used by Picard to prove existence of solutions of initial value problems [152]. Nonlinear variations of the classical stationary iterative methods such as Gauss–Seidel have also been used; see [145] for a more complete discussion of these algorithms.
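The backward Euler step solved by fixed-point iteration can be sketched as follows (Python rather than the book's MATLAB; the test problem $y' = -y$, $y(0) = 1$ and all function names are our own choices for illustration). Here $f$ has Lipschitz constant $M = 1$, so $hM = 0.01 < 1$ and the contraction argument above applies:

```python
import math

def backward_euler_step(f, y_n, h, tol=1e-12, maxit=100):
    """One backward Euler step: solve y = K(y) = y_n + h*f(y) by
    fixed-point iteration. If h*M < 1, K is a contraction and the
    iteration converges with q-factor h*M for any initial iterate."""
    y = y_n  # the previous time step is a natural initial iterate
    for _ in range(maxit):
        y_new = y_n + h * f(y)
        if abs(y_new - y) <= tol:
            return y_new
        y = y_new
    raise RuntimeError("fixed-point iteration failed; reduce h")

# Integrate y' = -y, y(0) = 1 to t = 1 with h = 0.01 (so h*M = 0.01 < 1)
f = lambda y: -y
y, h = 1.0, 0.01
for _ in range(100):
    y = backward_euler_step(f, y, h)
print(y)  # ~0.3697; the exact solution y(1) = exp(-1) ~ 0.3679
```

Since the q-factor here is $hM = 0.01$, each inner iteration gains two digits and the fixed-point solve is cheap; for stiffer problems, where a usable $h$ gives $hM \ge 1$, the Newton-based alternatives discussed next become necessary.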

4.3. The standard assumptions

We will make the standard assumptions on $F$.

Assumption 4.3.1.

1. Equation (4.1) has a solution $x^*$.

2. $F' : \Omega \to R^{N \times N}$ is Lipschitz continuous with Lipschitz constant $\gamma$.


3. $F'(x^*)$ is nonsingular.

These assumptions can be weakened [108] without sacrificing convergence of the methods we consider here. However, the classical result on quadratic convergence of Newton's method requires them and we make them throughout. Exercise 5.7.1 illustrates one way to weaken the standard assumptions.

Throughout this part, we will always denote a root of $F$ by $x^*$. We let $B(r)$ denote the ball of radius $r$ about $x^*$,
$$B(r) = \{x \mid \|e\| < r\},$$
where
$$e = x - x^*.$$
The notation introduced above will be used consistently. If $x_n$ is the $n$th iterate of a sequence, $e_n = x_n - x^*$ is the error in that iterate.

Lemma 4.3.1 is an important consequence of the standard assumptions.

Lemma 4.3.1. Assume that the standard assumptions hold. Then there is $\delta > 0$ so that for all $x \in B(\delta)$
$$(4.5) \qquad \|F'(x)\| \le 2 \|F'(x^*)\|,$$
$$(4.6) \qquad \|F'(x)^{-1}\| \le 2 \|F'(x^*)^{-1}\|,$$
and
$$(4.7) \qquad \|e\| / (2 \|F'(x^*)^{-1}\|) \le \|F(x)\| \le 2 \|F'(x^*)\| \|e\|.$$

Proof. Let $\delta$ be small enough so that $B(\delta) \subset \Omega$. For all $x \in B(\delta)$ we have
$$\|F'(x)\| \le \|F'(x^*)\| + \gamma \|e\|.$$
Hence (4.5) holds if $\gamma \delta < \|F'(x^*)\|$.

The next result (4.6) is a direct consequence of the Banach Lemma if
$$\|I - F'(x^*)^{-1} F'(x)\| < 1/2.$$
This will follow from
$$\delta < \frac{\|F'(x^*)^{-1}\|^{-1}}{2 \gamma}$$
since then
$$(4.8) \qquad \|I - F'(x^*)^{-1} F'(x)\| = \|F'(x^*)^{-1} (F'(x^*) - F'(x))\| \le \gamma \|F'(x^*)^{-1}\| \|e\| \le \gamma \delta \|F'(x^*)^{-1}\| < 1/2.$$

To prove the final inequality (4.7), we note that if $x \in B(\delta)$ then $x^* + te \in B(\delta)$ for all $0 \le t \le 1$. We use (4.5) and Theorem 4.0.1 to obtain
$$\|F(x)\| \le \int_0^1 \|F'(x^* + te)\| \|e\| \, dt \le 2 \|F'(x^*)\| \|e\|,$$


which is the right inequality in (4.7).

To prove the left inequality in (4.7) note that
$$F'(x^*)^{-1} F(x) = \int_0^1 F'(x^*)^{-1} F'(x^* + te) e \, dt = e - \int_0^1 (I - F'(x^*)^{-1} F'(x^* + te)) e \, dt,$$
and hence, by (4.8),
$$\|F'(x^*)^{-1} F(x)\| \ge \|e\| \Big( 1 - \int_0^1 \|I - F'(x^*)^{-1} F'(x^* + te)\| \, dt \Big) \ge \|e\| / 2.$$
Therefore
$$\|e\| / 2 \le \|F'(x^*)^{-1} F(x)\| \le \|F'(x^*)^{-1}\| \|F(x)\|,$$
which completes the proof.


Chapter 5

Newton's Method

5.1. Local convergence of Newton's method

We will describe iterative methods for nonlinear equations in terms of the transition from a current iterate $x_c$ to a new iterate $x_+$. In this language, Newton's method is
$$(5.1) \qquad x_+ = x_c - F'(x_c)^{-1} F(x_c).$$
We may also view $x_+$ as the root of the two-term Taylor expansion or linear model of $F$ about $x_c$,
$$M_c(x) = F(x_c) + F'(x_c)(x - x_c).$$
In the context of single systems this method appeared in the 17th century [140], [156], [13], [137].

The convergence result on Newton's method follows from Lemma 4.3.1.

Theorem 5.1.1. Let the standard assumptions hold. Then there are $K > 0$ and $\delta > 0$ such that if $x_c \in B(\delta)$ the Newton iterate from $x_c$ given by (5.1) satisfies
$$(5.2) \qquad \|e_+\| \le K \|e_c\|^2.$$

Proof. Let $\delta$ be small enough so that the conclusions of Lemma 4.3.1 hold. By Theorem 4.0.1,
$$e_+ = e_c - F'(x_c)^{-1} F(x_c) = F'(x_c)^{-1} \int_0^1 (F'(x_c) - F'(x^* + t e_c)) e_c \, dt.$$
By Lemma 4.3.1 and the Lipschitz continuity of $F'$,
$$\|e_+\| \le (2 \|F'(x^*)^{-1}\|) \gamma \|e_c\|^2 / 2.$$
This completes the proof of (5.2) with $K = \gamma \|F'(x^*)^{-1}\|$.

The proof of convergence of the complete Newton iteration will be complete if we reduce $\delta$ if needed so that $K \delta < 1$.

Theorem 5.1.2. Let the standard assumptions hold. Then there is $\delta$ such that if $x_0 \in B(\delta)$ the Newton iteration
$$x_{n+1} = x_n - F'(x_n)^{-1} F(x_n)$$
converges q-quadratically to $x^*$.


Proof. Let $\delta$ be small enough so that the conclusions of Theorem 5.1.1 hold. Reduce $\delta$ if needed so that $K \delta = \eta < 1$. Then if $n \ge 0$ and $x_n \in B(\delta)$, Theorem 5.1.1 implies that
$$(5.3) \qquad \|e_{n+1}\| \le K \|e_n\|^2 \le \eta \|e_n\| < \|e_n\|$$
and hence $x_{n+1} \in B(\eta \delta) \subset B(\delta)$. Therefore if $x_n \in B(\delta)$ we may continue the iteration. Since $x_0 \in B(\delta)$ by assumption, the entire sequence $\{x_n\} \subset B(\delta)$. (5.3) then implies that $x_n \to x^*$ q-quadratically.
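The q-quadratic convergence of Theorem 5.1.2 is easy to observe numerically. The following Python sketch (our own scalar example $F(x) = x^2 - 2$ with root $x^* = \sqrt{2}$, not anything from the text's MATLAB collection) prints errors that roughly square at every step:

```python
import math

def newton(F, Fprime, x0, tol=1e-13, maxit=20):
    """Newton's method x_{n+1} = x_n - F'(x_n)^{-1} F(x_n), scalar case.
    Returns the final iterate and the list of intermediate iterates."""
    x = x0
    iterates = []
    for _ in range(maxit):
        fx = F(x)
        if abs(fx) <= tol:
            break
        x = x - fx / Fprime(x)
        iterates.append(x)
    return x, iterates

# F(x) = x^2 - 2; x0 = 1.5 is "sufficiently near" the root sqrt(2)
root, iterates = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.5)
for xn in iterates:
    print(abs(xn - math.sqrt(2)))  # errors ~e, e^2, e^4, ...: q-quadratic
```

Only a handful of iterations are needed to reach machine precision from $x_0 = 1.5$, in contrast to the roughly hundred iterations a fixed-point scheme with q-factor near 1 would require.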

The assumption that the initial iterate be "sufficiently near" the solution ($x_0 \in B(\delta)$) may seem artificial at first look. There are, however, many situations in which the initial iterate is very near the root. Two examples are implicit integration of ordinary differential equations and differential algebraic equations [105], [83], [16], where the initial iterate is derived from the solution at the previous time step, and solution of discretizations of partial differential equations, where the initial iterate is an interpolation of a solution from a coarser computational mesh [99], [126]. Moreover, when the initial iterate is far from the root and methods such as those discussed in Chapter 8 are needed, local convergence results like those in this and the following two chapters describe the terminal phase of the iteration.

5.2. Termination of the iteration

We can base our termination decision on Lemma 4.3.1. If $x \in B(\delta)$, where $\delta$ is small enough so that the conclusions of Lemma 4.3.1 hold, then if $F'(x^*)$ is well conditioned, we may terminate the iteration when the relative nonlinear residual $\|F(x)\| / \|F(x_0)\|$ is small. We have, as a direct consequence of applying Lemma 4.3.1 twice, a nonlinear form of Lemma 1.1.1.

Lemma 5.2.1. Assume that the standard assumptions hold. Let $\delta > 0$ be small enough so that the conclusions of Lemma 4.3.1 hold for $x \in B(\delta)$. Then for all $x \in B(\delta)$
$$\frac{\|e\|}{4 \kappa(F'(x^*)) \|e_0\|} \le \frac{\|F(x)\|}{\|F(x_0)\|} \le \frac{4 \kappa(F'(x^*)) \|e\|}{\|e_0\|},$$
where $\kappa(F'(x^*)) = \|F'(x^*)\| \|F'(x^*)^{-1}\|$ is the condition number of $F'(x^*)$ relative to the norm $\|\cdot\|$.

From Lemma 5.2.1 we conclude that if $F'(x^*)$ is well conditioned, the size of the relative nonlinear residual is a good indicator of the size of the error. However, if there is error in the evaluation of $F$ or the initial iterate is near a root, a termination decision based on the relative residual may be made too late in the iteration or, as we will see in the next section, the iteration may not terminate at all. We raised this issue in the context of linear equations in Chapter 1 when we compared (1.2) and (1.4). In the nonlinear case, there is no "right-hand side" to use as a scaling factor and we must balance the relative and absolute size of the nonlinear residuals in some other way. In all


of our MATLAB codes and algorithms for nonlinear equations our termination criterion is to stop the iteration if
$$(5.4) \qquad \|F(x)\| \le \tau_r \|F(x_0)\| + \tau_a,$$
where the relative error tolerance $\tau_r$ and absolute error tolerance $\tau_a$ are input to the algorithm. Combinations of relative and absolute error tolerances are commonly used in numerical methods for ordinary differential equations or differential algebraic equations [16], [21], [23], [151].
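Criterion (5.4) can be sketched as a one-line test (Python; the names $\tau_r$, $\tau_a$ follow (5.4), and the sample tolerance values are our own illustrative choices):

```python
def should_stop(norm_F, norm_F0, tau_r, tau_a):
    """Termination test (5.4): stop when ||F(x)|| <= tau_r*||F(x0)|| + tau_a."""
    return norm_F <= tau_r * norm_F0 + tau_a

# The absolute tolerance tau_a matters when the initial iterate is already
# near a root: if ||F(x0)|| is tiny, a purely relative test
# ||F(x)|| <= tau_r*||F(x0)|| might never be satisfied in floating point.
print(should_stop(1e-9, 1.0, 1e-6, 1e-12))     # True: relative term dominates
print(should_stop(1e-13, 1e-10, 1e-6, 1e-12))  # True: absolute term dominates
print(should_stop(1e-3, 1.0, 1e-6, 1e-12))     # False: keep iterating
```

The second call illustrates the point made above: with $\|F(x_0)\| = 10^{-10}$, the relative part of the bound is $10^{-16}$, below what evaluation error typically allows, and it is the absolute term $\tau_a$ that lets the iteration terminate.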

Another way to decide whether to terminate is to look at the Newton step
$$s = -F'(x_c)^{-1} F(x_c) = x_+ - x_c,$$
and terminate the iteration when $\|s\|$ is sufficiently small. This criterion is based on Theorem 5.1.1, which implies that
$$e_c = -s + O(\|e_c\|^2).$$
Hence, near the solution $s$ and $e_c$ are essentially the same size.

For methods other than Newton's method, the relation between the step and the error is less clear. If the iteration is q-linearly convergent, say, then
$$\|e_+\| \le \sigma \|e_c\|$$
implies that
$$(1 - \sigma) \|e_c\| \le \|s\| \le (1 + \sigma) \|e_c\|.$$
Hence the step is a reliable indicator of the error provided $\sigma$ is not too near 1. The differential equations codes discussed in [16], [21], [23], and [151] make use of this in the special case of the chord method.

However, in order to estimate $\|e_c\|$ this way, one must do most of the work needed to compute $x_+$, whereas by terminating on small relative residuals or on a condition like (5.4) the decision to stop the iteration can be made before computation and factorization of $F'(x_c)$. If computation and factorization of the Jacobian are inexpensive, then termination on small steps becomes more attractive.

5.3. Implementation of Newton's method

In this section we assume that direct methods will be used to solve the linear equation for the Newton step. Our examples and discussions of operation counts make the implicit assumption that the Jacobian is dense. However, the issues for sparse Jacobians when direct methods are used to solve linear systems are very similar. In Chapter 6 we consider algorithms in which iterative methods are used to solve the equation for the Newton step.

In order to compute the Newton iterate $x_+$ from a current point $x_c$ one must first evaluate $F(x_c)$ and decide whether to terminate the iteration. If one decides to continue, the Jacobian $F'(x_c)$ must be computed and factored.


Then the step is computed as the solution of $F'(x_c) s = -F(x_c)$ and the iterate is updated $x_+ = x_c + s$. Of these steps, the evaluation and factorization of $F'$ are the most costly. Factorization of $F'$ in the dense case costs $O(N^3)$ floating-point operations. Evaluation of $F'$ by finite differences should be expected to cost $N$ times the cost of an evaluation of $F$ because each column in $F'$ requires an evaluation of $F$ to form the difference approximation. Hence the cost of a Newton step may be roughly estimated as $N + 1$ evaluations of $F$ and $O(N^3)$ floating-point operations. In many cases $F'$ can be computed more efficiently, accurately, and directly than with differences and the analysis above for the cost of a Newton iterate is very pessimistic. See Exercise 5.7.21 for an example. For general nonlinearities or general purpose codes such as the MATLAB code nsol from the collection, there may be little alternative to difference Jacobians.
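A forward-difference Jacobian of the kind just described can be sketched as follows (Python rather than the MATLAB of nsol; the example system, the difference increment $h = 10^{-7}$, and the function names are our own assumptions, not taken from the text). Note the $N$ extra evaluations of $F$, one per column, which is the source of the $N + 1$ cost estimate above:

```python
def fd_jacobian(F, x, h=1e-7):
    """Forward-difference Jacobian: column j is (F(x + h*e_j) - F(x)) / h.
    Costs N evaluations of F beyond F(x) itself, i.e. N + 1 in total."""
    N = len(x)
    F0 = F(x)                                # base evaluation, reused below
    J = [[0.0] * N for _ in range(len(F0))]
    for j in range(N):
        xp = list(x)
        xp[j] += h                           # perturb the j-th coordinate
        Fj = F(xp)
        for i in range(len(F0)):
            J[i][j] = (Fj[i] - F0[i]) / h    # j-th column of the Jacobian
    return J

# F(x, y) = (x^2 + y - 3, x + y^2 - 5); exact Jacobian [[2x, 1], [1, 2y]]
F = lambda v: [v[0] ** 2 + v[1] - 3.0, v[0] + v[1] ** 2 - 5.0]
J = fd_jacobian(F, [1.0, 2.0])
print(J)  # entries within O(h) of [[2, 1], [1, 4]]
```

The columns agree with the exact Jacobian to within the $O(h)$ truncation error of the forward difference; the choice of $h$ must also balance this against floating-point cancellation in $F(x + h e_j) - F(x)$.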

In the future, automatic differentiation (see [94] for a collection of articles on