(8.17) $Df$ is Lipschitz continuous in an open neighbourhood $C$ of $\bar x$.

Then for every $\varepsilon > 0$ there exists a $\delta > 0$ such that for $x, y \in B_\delta(\bar x)$,

$$\|f(y)\|\,\|x - \bar x\| \;\le\; (1 + \varepsilon)\,\kappa(Df(\bar x))\,\|f(x)\|\,\|y - \bar x\| .$$

Proof: See [22, p. 69, p. 72] and Exercise 8.4. □

Here $\kappa$ is the condition number in a matrix norm that is consistent with the chosen vector norm. For $x = x^{(k)}$ and $y = x^{(0)}$ we get (locally) the generalization of (5.16).

8.2.2 Modifications of Newton's Method

Modifications of Newton's method aim in two directions:

• reduction of the cost of assembling and solving the system of equations (8.12), without a significant deterioration of the convergence properties;

• enlargement of the range of convergence.

We can account for the first aspect by simplifying the matrix in (8.12) (modified or simplified Newton's method). The extreme case is the replacement of $Df(x^{(k)})$ by the identity matrix; this leads us to the fixed-point iteration (8.9). If the mapping $f$ consists of a nonlinear and a linear part,

$$f(x) := Ax + g(x) = 0 , \qquad (8.18)$$

then the system of equations (8.12) of the Newton iteration reads

$$\bigl(A + Dg(x^{(k)})\bigr)\,\delta^{(k)} = -f(x^{(k)}) .$$

A straightforward simplification in this case is the fixed-point iteration

$$A\,\delta^{(k)} = -f(x^{(k)}) . \qquad (8.19)$$

It may be interpreted as the fixed-point iteration (8.9) of the system that is preconditioned with $A$, i.e., of

$$A^{-1} f(x) = 0 .$$

In (8.19) the matrix is identical in every iteration step; therefore, it has to be assembled only once, and if we use a direct method (cf. Section 2.5), the LU factorization has to be carried out only once. Each subsequent step then requires only the forward and backward substitutions, which have lower computational cost. For iterative methods we cannot rely on this advantage, but we can expect that $x^{(k+1)}$ is close to $x^{(k)}$, so that $\delta^{(k,0)} = 0$ constitutes a good initial guess. Accordingly, the assembling of the matrix becomes more important with respect to the overall computational cost, and savings during the assembling become relevant.

We get a system of equations similar to (8.19) by applying the chord method (see Exercise 8.3), where the linear approximation of the initial iterate is maintained, i.e.,

$$Df(x^{(0)})\,\delta^{(k)} = -f(x^{(k)}) . \qquad (8.20)$$
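To make the cost structure concrete, the following is a minimal sketch in Python/NumPy of the chord method (8.20); it is our own illustration, and the example data $A$ and $g$ are invented. The Jacobian of the initial iterate is LU-factorized once, so each step costs one function evaluation plus one forward and one backward substitution.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def chord_method(f, Df, x0, tol=1e-10, maxit=50):
    """Chord method (8.20): keep the linearization of the initial iterate."""
    x = np.asarray(x0, dtype=float)
    lu_piv = lu_factor(Df(x))             # assemble and factorize Df(x^(0)) once
    for _ in range(maxit):
        fx = f(x)
        if np.linalg.norm(fx) <= tol:
            break
        x = x + lu_solve(lu_piv, -fx)     # solve Df(x^(0)) delta = -f(x^(k))
    return x

# Illustrative problem of the form (8.18): f(x) = Ax + g(x)
A = np.array([[4.0, -1.0], [-1.0, 4.0]])
g = lambda x: x**3 - np.array([1.0, 2.0])
f = lambda x: A @ x + g(x)
Df = lambda x: A + np.diag(3.0 * x**2)    # Dg(x) = diag(3 x_i^2)
print(chord_method(f, Df, np.zeros(2)))
```

The same pattern realizes the fixed-point iteration (8.19): factorize the linear part $A$ instead of $Df(x^{(0)})$.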

If the matrix $B(x^{(k)})$, which approximates $Df(x^{(k)})$, is changing in each iteration step, i.e.,

$$B(x^{(k)})\,\delta^{(k)} = -f(x^{(k)}) , \qquad (8.21)$$

then the only advantage can be a possibly easier assembling or solvability of the system of equations. If the partial derivatives $\partial_j f_i(x)$ are more difficult to evaluate than the function $f_i$ itself (or possibly not evaluable at all), then the approximation of $Df(x^{(k)})$ by difference quotients can be taken into consideration. This corresponds to

$$B(x^{(k)})\,e_j = \frac{1}{h}\bigl(f(x + h e_j) - f(x)\bigr) \qquad (8.22)$$

for column $j$ of $B(x^{(k)})$ with a fixed $h > 0$.

The number of computations for the assembling of the matrix remains the same: $m^2$ evaluations for the full matrix and analogously for the sparse matrix (see Exercise 8.6). Observe that numerical differentiation is an ill-posed problem, which means that we should ideally choose $h \sim \delta^{1/2}$, where $\delta > 0$ is the error level in the evaluation of $f$. Even then we can merely expect

$$\bigl\|Df(x^{(k)}) - B(x^{(k)})\bigr\| \le C\,\delta^{1/2}$$

(see [22, pp. 80 f.]). Thus in the best case we can expect only half of the significant digits of the machine precision.
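A hedged sketch of the column-wise approximation (8.22), using the step-size rule just discussed with $\delta$ taken as the machine precision; this choice is an assumption, appropriate only if $f$ is evaluated to full accuracy.

```python
import numpy as np

def fd_jacobian(f, x, h=None):
    """Difference-quotient approximation B(x) of Df(x), column by column (8.22)."""
    x = np.asarray(x, dtype=float)
    if h is None:
        # h ~ delta^(1/2) with delta = machine precision (about 1e-8 in double)
        h = np.sqrt(np.finfo(float).eps)
    fx = f(x)
    B = np.empty((fx.size, x.size))
    for j in range(x.size):
        xh = x.copy()
        xh[j] += h
        B[:, j] = (f(xh) - fx) / h        # column j: (f(x + h e_j) - f(x)) / h
    return B
```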

The second aspect, a facilitated solvability of (8.21), occurs if "small" entries appear in the Jacobian, due to a problem-dependent weak coupling of the components; these entries may then be skipped. Take, for example, a $Df(x^{(k)})$ with a block structure as in (5.39),

$$Df(x^{(k)}) = (A_{ij})_{ij}\,, \qquad A_{ij} \in \mathbb{R}^{m_i, m_j},$$

such that the blocks $A_{ij}$ may be neglected for $j > i$. Then there results a nested system of equations of the dimensions $m_1, m_2, \ldots, m_p$.
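If the blocks neglected are exactly those above the diagonal, the approximate Jacobian becomes block lower triangular, and the nested system is solved by block forward substitution. A sketch under this assumption; the list-of-lists storage and the partition `sizes` are our own conventions.

```python
import numpy as np

def block_forward_solve(blocks, rhs, sizes):
    """Solve a block lower triangular system (A_ij neglected for j > i).

    blocks[i][j] holds A_ij for j <= i; rhs and the solution are
    partitioned according to sizes = [m_1, ..., m_p].
    """
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    x = np.empty(offsets[-1])
    for i in range(len(sizes)):
        ri = rhs[offsets[i]:offsets[i + 1]].copy()
        for j in range(i):                       # subtract already-known couplings
            ri -= blocks[i][j] @ x[offsets[j]:offsets[j + 1]]
        x[offsets[i]:offsets[i + 1]] = np.linalg.solve(blocks[i][i], ri)
    return x
```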

The possible advantages of such simplified Newton's methods have to be weighed against the disadvantage of a deterioration in the order of convergence: Instead of an estimate like that in Theorem 8.11, (3), we have to expect an additional term

$$\bigl\|B(x^{(k)}) - Df(x^{(k)})\bigr\|\,\bigl\|x^{(k)} - \bar x\bigr\| .$$

This means only linear or, by successive improvement of the approximation, superlinear convergence (see [22, pp. 75 ff.]). If we have a good initial iterate, it may often be advantageous to perform a small number of steps of Newton's method. Therefore, in the following we again treat Newton's method, although the subsequent considerations can also be transferred to its modifications.

If the linear problems (8.12) are solved with an iterative scheme, we have the possibility to adjust the accuracy of the algorithm in order to reduce the number of inner iterations, without a (severe) deterioration of the convergence of the outer Newton iteration. Dealing with such inexact Newton's methods, we determine instead of $\delta^{(k)}$ from (8.12) only a $\tilde\delta^{(k)}$, which fulfils (8.12) only up to an inner residual $r^{(k)}$, i.e.,

$$Df(x^{(k)})\,\tilde\delta^{(k)} = -f(x^{(k)}) + r^{(k)} .$$

The new iterate is given by

$$x^{(k+1)} := x^{(k)} + \tilde\delta^{(k)} .$$

The accuracy of $\tilde\delta^{(k)}$ is controlled by the requirement

$$\|r^{(k)}\| \le \eta_k\,\|f(x^{(k)})\| \qquad (8.23)$$

with certain properties for the sequence $(\eta_k)_k$ that still have to be determined. Since the natural choice of the initial iterate for solving (8.12) is $\delta^{(k,0)} = 0$, (8.23) corresponds to the termination criterion (5.15).

Conditions for $\eta_k$ can be deduced from the following theorem:

Theorem 8.13 Let (8.17) hold and consider compatible matrix and vector norms. Then for every $\varepsilon > 0$ there exists a $\delta > 0$ such that for $x^{(k)} \in B_\delta(\bar x)$,

$$\|x^{(k+1)} - \bar x\| \;\le\; \bigl\|x^{(k)} - Df(x^{(k)})^{-1} f(x^{(k)}) - \bar x\bigr\| + (1 + \varepsilon)\,\kappa\bigl(Df(\bar x)\bigr)\,\eta_k\,\|x^{(k)} - \bar x\| . \qquad (8.24)$$

Proof: By the choice of $\delta$ we can ensure the nonsingularity of $Df(x^{(k)})$. From

$$\tilde\delta^{(k)} = -Df(x^{(k)})^{-1} f(x^{(k)}) + Df(x^{(k)})^{-1} r^{(k)}$$

it follows that

$$\|x^{(k+1)} - \bar x\| = \bigl\|x^{(k)} - \bar x + \tilde\delta^{(k)}\bigr\| \le \bigl\|x^{(k)} - \bar x - Df(x^{(k)})^{-1} f(x^{(k)})\bigr\| + \bigl\|Df(x^{(k)})^{-1} r^{(k)}\bigr\| .$$

The assertion can be deduced from the estimate

$$\bigl\|Df(x^{(k)})^{-1} r^{(k)}\bigr\| \le (1 + \varepsilon)^{1/2}\,\bigl\|Df(\bar x)^{-1}\bigr\|\,\|r^{(k)}\| \le (1 + \varepsilon)^{1/2}\,\bigl\|Df(\bar x)^{-1}\bigr\|\,\eta_k\,(1 + \varepsilon)^{1/2}\,\|Df(\bar x)\|\,\|x^{(k)} - \bar x\| .$$

Here we used Exercise 8.4 (2), (3) and (8.23). □

The first part of the bound corresponds to the error of the exact Newton step, which can be estimated using the same argument as in Theorem 8.11, (3) (with Exercise 8.4, (2)) by

$$\bigl\|x^{(k)} - Df(x^{(k)})^{-1} f(x^{(k)}) - \bar x\bigr\| \le (1 + \varepsilon)^{1/2}\,\bigl\|Df(\bar x)^{-1}\bigr\|\,\frac{\gamma}{2}\,\|x^{(k)} - \bar x\|^2 .$$

This implies the following result:

Corollary 8.14 Let the assumptions of Theorem 8.13 be satisfied. Then there exist $\delta > 0$ and $\bar\eta > 0$ such that for $x^{(0)} \in B_\delta(\bar x)$ and $\eta_k \le \bar\eta$ for all $k \in \mathbb{N}$, the following hold for the inexact Newton's method:

(1) The sequence $(x^{(k)})_k$ converges linearly to $\bar x$.

(2) If $\eta_k \to 0$ for $k \to \infty$, then $(x^{(k)})_k$ converges superlinearly.

(3) If $\eta_k \le K \|f(x^{(k)})\|$ for a $K > 0$, then $(x^{(k)})_k$ converges quadratically.

Proof: Exercise 8.5. □

The estimate (8.24) seems to suggest choosing a very fine level of accuracy $\bar\eta$ for the inner iteration in order to guarantee the above convergence statements, particularly for an ill-conditioned $Df(\bar x)$ (which is common for discretization matrices: see (5.34)). In fact, the analysis in the weighted norm $\|\cdot\| = \|Df(\bar x)\,\cdot\,\|$ shows that only $\eta_k \le \bar\eta < 1$ has to be ensured (cf. [22, pp. 97 ff.]). With this and on the basis of

$$\tilde\eta_k = \alpha\,\|f(x^{(k)})\|^2 / \|f(x^{(k-1)})\|^2$$

for some $\alpha \le 1$ we can construct $\eta_k$ in an adaptive way (see [22, p. 105]).
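As a sketch (our own construction, not code from the reference): an inexact Newton iteration whose inner GMRES solve is stopped by the relative residual criterion (8.23), with the adaptive forcing terms just described, safeguarded by an upper bound `eta_max` below 1.

```python
import numpy as np
from scipy.sparse.linalg import gmres

def inexact_newton_adaptive(f, Df, x0, alpha=0.9, eta0=0.5, eta_max=0.9,
                            tol=1e-10, maxit=30):
    """Inexact Newton method with adaptively chosen forcing terms eta_k."""
    x = np.asarray(x0, dtype=float)
    eta = eta0
    norm_f = np.linalg.norm(f(x))
    for _ in range(maxit):
        if norm_f <= tol:
            break
        # inner solve up to the relative residual eta_k, cf. (8.23);
        # the keyword is 'rtol' in recent SciPy ('tol' in older releases)
        delta, _ = gmres(Df(x), -f(x), rtol=eta, atol=0.0)
        x = x + delta
        norm_f_old, norm_f = norm_f, np.linalg.norm(f(x))
        # eta_{k+1} = alpha * (||f(x^(k+1))|| / ||f(x^(k))||)^2, capped below 1
        eta = min(alpha * (norm_f / norm_f_old) ** 2, eta_max)
    return x
```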

Most of the iterative methods introduced in Chapter 5 do not require explicit knowledge of the matrix $Df(x^{(k)})$. It suffices that the operation $y \mapsto Df(x^{(k)})\,y$ be feasible for vectors $y$, in general for fewer than $m$ of them; i.e., the directional derivative of $f$ at $x^{(k)}$ in the direction $y$ is needed. Thus, in case a difference scheme for the derivatives of $f$ should be necessary or reasonable, it is more convenient to choose directly a difference scheme for the directional derivative.
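A minimal matrix-free sketch: the Jacobian is never assembled; its action on a vector is approximated by the directional difference quotient from Exercise 8.6 and wrapped in a `LinearOperator` that Krylov solvers accept in place of a matrix. The step-size choice is again our assumption.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator

def jacobian_operator(f, x, h=None):
    """Matrix-free approximation of Df(x) via directional difference quotients.

    The action on y is ||y|| * (f(x + h*y/||y||) - f(x)) / h, i.e., the
    scheme of Exercise 8.6; no Jacobian entries are ever assembled.
    """
    x = np.asarray(x, dtype=float)
    if h is None:
        h = np.sqrt(np.finfo(float).eps)   # assumes f is accurate to machine eps
    fx = f(x)

    def matvec(y):
        ny = np.linalg.norm(y)
        if ny == 0.0:
            return np.zeros_like(fx)
        return (f(x + h * y / ny) - fx) * (ny / h)

    return LinearOperator((fx.size, x.size), matvec=matvec)
```

Such an operator can be passed directly to an iterative solver (e.g., GMRES) inside an inexact Newton iteration, which avoids the $m$ columns of (8.22) entirely.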

Since we cannot expect convergence of Newton's method in general, we require indicators for the convergence behaviour of the iteration. The solution $\bar x$ is in particular also the solution of

$$\text{Minimize } \|f(x)\|^2 \quad\text{for } x \in \mathbb{R}^m .$$

Let $x^{(0)}$, $\tau > 0$, $\eta_0$, $\bar\Theta \in (0, 1)$, $k = 0$, $i = 0$ be given.

(1) $\tilde\delta^{(k,0)} := 0$, $i := 1$.

(2) Determine the $i$th iterate $\tilde\delta^{(k,i)}$ for $Df(x^{(k)})\,\tilde\delta^{(k)} = -f(x^{(k)})$ and calculate $r^{(i)} := Df(x^{(k)})\,\tilde\delta^{(k,i)} + f(x^{(k)})$.

(3) If $\|r^{(i)}\| \le \eta_k\,\|f(x^{(k)})\|$, then go to (4); else set $i := i + 1$ and go to (2).

(4) $\tilde\delta^{(k)} := \tilde\delta^{(k,i)}$.

(5) $x^{(k+1)} := x^{(k)} + \tilde\delta^{(k)}$.

(6) If $\|f(x^{(k+1)})\| > \bar\Theta\,\|f(x^{(k)})\|$, interrupt.

(7) If $\|f(x^{(k+1)})\| \le \tau\,\|f(x^{(0)})\|$, end. Else calculate $\eta_{k+1}$, set $k := k + 1$, and go to (1).

Table 8.1. Inexact Newton's method with monotonicity test.

Thus we could expect a descent of the sequence of iterates $(x^{(k)})_k$ in this functional, i.e.,

$$\|f(x^{(k+1)})\| \le \bar\Theta\,\|f(x^{(k)})\|$$

for a $\bar\Theta < 1$. If this monotonicity test is not fulfilled, the iteration is terminated. An example of such an inexact Newton's method is given in Table 8.1, and a sketch of an implementation follows below.
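In the following Python sketch of Table 8.1, the inner iteration of steps (2)–(3) is realized by Jacobi sweeps, an illustrative choice of ours that converges only for suitable (e.g., diagonally dominant) Jacobians, and the forcing term $\eta$ is kept constant rather than updated in step (7).

```python
import numpy as np

def inexact_newton_monotone(f, Df, x0, tau=1e-8, eta=0.1, theta_bar=0.9,
                            maxit=50, max_inner=500):
    """Inexact Newton's method with monotonicity test (Table 8.1)."""
    x = np.asarray(x0, dtype=float)
    norm_f0 = np.linalg.norm(f(x))
    for _ in range(maxit):
        J, fx = Df(x), f(x)
        delta = np.zeros_like(x)                   # (1) delta^(k,0) := 0
        d = np.diag(J)                             # Jacobi needs nonzero diagonal
        for _ in range(max_inner):                 # (2) inner Jacobi sweeps
            r = J @ delta + fx                     # inner residual r^(i)
            if np.linalg.norm(r) <= eta * np.linalg.norm(fx):
                break                              # (3) inner accuracy reached
            delta -= r / d
        x_new = x + delta                          # (4), (5)
        if np.linalg.norm(f(x_new)) > theta_bar * np.linalg.norm(fx):
            raise RuntimeError("monotonicity test failed")   # (6) interrupt
        x = x_new
        if np.linalg.norm(f(x)) <= tau * norm_f0:  # (7) relative termination
            return x
    return x
```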

In order to avoid the termination of the method due to divergence, continuation methods have been developed. They embed the problem $f(x) = 0$ into a family of problems in order to provide successively good initial iterates. The approach presented at the end of Section 8.3 for time-dependent problems is similar to the continuation methods. Another approach (which can be combined with continuation) modifies the (inexact) Newton's method so that the range of convergence is enlarged: Applying the damped (inexact) Newton's method means reducing the step length from $x^{(k)}$ to $x^{(k+1)}$ until a decrease in accordance with the monotonicity test is observed. One strategy of damping, termed Armijo's rule, is described in Table 8.2; it replaces the steps (1), (5), and (6) in Table 8.1.

Thus damping Newton's method also amounts to a relaxation similar to (5.30), where $\omega = \lambda_k$ is adjusted to the iteration step as in (5.41). In the formulation of Table 8.2 the iteration may fail to terminate if $\lambda_k$ is reduced indefinitely in step (5); this must be avoided in a practical implementation of the method. But except for situations where divergence is obvious, this situation will not appear, because we have the following theorem:

Let additionally $\alpha, \beta \in (0, 1)$ be given.

(1) $\tilde\delta^{(k,0)} := 0$, $i := 1$, $\lambda_k := 1$.

(5) If $\|f(x^{(k)} + \lambda_k \tilde\delta^{(k)})\| \ge (1 - \alpha\lambda_k)\,\|f(x^{(k)})\|$, set $\lambda_k := \beta\lambda_k$ and go to (5).

(6) $x^{(k+1)} := x^{(k)} + \lambda_k \tilde\delta^{(k)}$.

Table 8.2. Damped inexact Newton step according to Armijo's rule.
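The replacement steps as code, continuing the sketch above; the parameter values are illustrative, and the `lam_min` safeguard implements the practical termination discussed above.

```python
import numpy as np

def armijo_step(f, x, delta, alpha=1e-4, beta=0.5, lam_min=1e-10):
    """Damped Newton step by Armijo's rule (Table 8.2, steps (1), (5), (6))."""
    norm_fx = np.linalg.norm(f(x))
    lam = 1.0                                       # (1) lambda_k := 1
    # (5) reduce the step until sufficient decrease is achieved
    while np.linalg.norm(f(x + lam * delta)) >= (1.0 - alpha * lam) * norm_fx:
        lam *= beta
        if lam < lam_min:
            raise RuntimeError("step length underflow: divergence suspected")
    return x + lam * delta                          # (6)
```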

Theorem 8.15 Let $\alpha, \beta, \gamma > 0$ exist such that conditions (i), (ii) of Theorem 8.11 hold on $\bigcup_{k\in\mathbb{N}} B_r(x^{(k)})$ for the sequence $(x^{(k)})_k$ defined according to Table 8.2. Let $\eta_k \le \bar\eta$ for an $\bar\eta < 1 - \alpha$.

Then if $f(x^{(0)}) \ne 0$, there exists a $\bar\lambda > 0$ such that $\lambda_k \ge \bar\lambda$ for all $k \in \mathbb{N}$. If furthermore $(x^{(k)})_k$ is bounded, then there exists a zero $\bar x$ satisfying (8.17), and

$$x^{(k)} \to \bar x \quad\text{for } k \to \infty .$$

Moreover, there exists a $k_0 \in \mathbb{N}$ such that for $k \ge k_0$ the relation $\lambda_k = 1$ holds.

Proof: See [22, pp. 139 ff.]. □

We see that in the final stage of the iteration we again deal with the (inexact) Newton's method, with the previously described convergence behaviour.

Finally, the following should be mentioned: The problem $f(x) = 0$ and Newton's method are affine-invariant in the sense that a transition to $Af(x) = 0$ with a nonsingular $A \in \mathbb{R}^{m,m}$ changes neither the problem nor the iteration method, since

$$D(Af)(x)^{-1}\,(Af)(x) = Df(x)^{-1} f(x) .$$

Among the assumptions of Theorem 8.11, (8.14) is not affine-invariant. A possible alternative would be

$$\bigl\|Df(y)^{-1}\bigl(Df(x) - Df(y)\bigr)\bigr\| \le \gamma\,\|x - y\| ,$$

which fulfils the requirement. As in the proof of Lemma 8.10 it follows that

$$\bigl\|Df(y)^{-1}\bigl(f(x) - f(y) - Df(y)(x - y)\bigr)\bigr\| \le \frac{\gamma}{2}\,\|x - y\|^2 .$$

With this argument a similar variant of Theorem 8.11 can be proven.

The monotonicity test is not affine-invariant either, so presumably the natural monotonicity test

$$\bigl\|Df(x^{(k)})^{-1} f(x^{(k+1)})\bigr\| \;\le\; \bar\Theta\,\bigl\|Df(x^{(k)})^{-1} f(x^{(k)})\bigr\|$$

is to be preferred. The vector on the right-hand side has already been calculated, being, except for the sign, the Newton correction $\delta^{(k)}$. But for the vector on the left-hand side, $-\bar\delta^{(k+1)}$, the system of equations

$$Df(x^{(k)})\,\bar\delta^{(k+1)} = -f(x^{(k+1)})$$

additionally has to be solved.
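Since the factorization of $Df(x^{(k)})$ that produced the Newton correction can be reused, the extra cost is just one forward and one backward substitution. A sketch under this assumption (function and parameter names are ours):

```python
import numpy as np
from scipy.linalg import lu_solve

def natural_monotonicity_ok(lu_piv, delta, f_new, theta_bar=0.9):
    """Affine-invariant test: ||Df(x_k)^{-1} f(x_{k+1})|| <= theta_bar * ||delta||.

    lu_piv is the LU factorization of Df(x_k) (e.g., from scipy.linalg.lu_factor),
    delta is the Newton correction (so ||delta|| = ||Df(x_k)^{-1} f(x_k)||),
    and f_new = f(x_{k+1}).
    """
    delta_bar = lu_solve(lu_piv, -f_new)   # one extra pair of substitutions
    return np.linalg.norm(delta_bar) <= theta_bar * np.linalg.norm(delta)
```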

Exercises

8.3 Consider the chord method as described in (8.20). Prove the convergence of this method to the solution $\bar x$ under the following assumptions:

(1) Let (8.17) with $\bar B_r(\bar x) \subset C$ hold,

(2) $\|Df(x^{(0)})^{-1}\| \le \beta$,

(3) $2\beta\gamma r < 1$,

(4) $x^{(0)} \in \bar B_r(\bar x)$.

8.4 Let assumption (8.17) hold. Prove for compatible matrix and vector norms that for every $\varepsilon > 0$ there exists a $\delta > 0$ such that for every $x \in B_\delta(\bar x)$:

(1) $\|Df(x)\| \le (1 + \varepsilon)^{1/2}\,\|Df(\bar x)\|$,

(2) $\|Df(x)^{-1}\| \le (1 + \varepsilon)^{1/2}\,\|Df(\bar x)^{-1}\|$ (employ $\|(I - M)^{-1}\| \le 1/(1 - \|M\|)$ for $\|M\| < 1$),

(3) $(1 + \varepsilon)^{-1/2}\,\|Df(\bar x)^{-1}\|^{-1}\,\|x - \bar x\| \le \|f(x)\| \le (1 + \varepsilon)^{1/2}\,\|Df(\bar x)\|\,\|x - \bar x\|$,

(4) Theorem 8.12.

8.5 Prove Corollary 8.14.

8.6 Let $U \subset \mathbb{R}^m$ be open and convex. Consider problem (8.2) with continuously differentiable $f : U \to \mathbb{R}^m$. For $i = 1, \ldots, m$ let $J_i \subset \{1, \ldots, m\}$ be defined by

$$\partial_j f_i(x) = 0 \quad\text{for } j \notin J_i \text{ and every } x \in U .$$

Then the operator $f$ is called sparsely occupied if $l_i := |J_i| < m$, or sparsely occupied in the strict sense if $l_i \le l$ for all $i = 1, \ldots, m$ and $l < m$ is independent of $m$ for a sequence of problems (8.2) of dimension $m$.

Then the evaluation of $Df(x)$ and its approximation according to (8.22) both need $\sum_{k=1}^{m} l_k$ evaluations of $\partial_j f_i$ or of $f_i$, respectively. What is the computational effort for a difference approximation

$$\|\delta\|\,\frac{f(x + h\delta/\|\delta\|) - f(x)}{h}$$

of the directional derivative $Df(x)\delta$?

8.3 Semilinear Boundary Value Problems for Elliptic and Parabolic Equations

In this section we treat semilinear problems as the simplest nonlinear case, where the nonlinearities do not occur in parts containing derivatives. Hence we want to examine differential equations of the form (0.33) that satisfy (0.42) and (0.43).

Stationary Problems

As a stationary problem we consider the differential equation

$$Lu(x) + \psi(u(x)) = 0 \quad\text{for } x \in \Omega \qquad (8.25)$$

with the linear elliptic differential operator $L$ according to (3.12) and linear boundary conditions on $\partial\Omega$ according to (3.18)–(3.20). Here $\psi : \mathbb{R} \to \mathbb{R}$ denotes a mapping that is supposed to be continuously differentiable.

A Galerkin discretization in $V_h \subset V$ with $H_0^1(\Omega) \subset V \subset H^1(\Omega)$ according to the type of boundary condition and $V_h = \operatorname{span}\{\varphi_1, \ldots, \varphi_M\}$, with the approximate solution $u_h \in V_h$ in the representation $u_h = \sum_{i=1}^{M} \xi_i \varphi_i$, gives

$$S\xi + G(\xi) = b \qquad (8.26)$$

with the stiffness matrix $S = (a(\varphi_j, \varphi_i))_{i,j}$ and a vector $b$ that contains the contributions of the inhomogeneous boundary conditions. Here the nonlinear mapping $G : \mathbb{R}^M \to \mathbb{R}^M$ is defined by

M