Theorem 8.12 Assume

there exists a zero $\bar x$ of $f$ such that $Df(\bar x)$ is nonsingular and
$Df$ is Lipschitz continuous in an open neighbourhood $C$ of $\bar x$. $\qquad$ (8.17)

Then for every $\varepsilon > 0$ there exists a $\delta > 0$ such that for $x, y \in B_\delta(\bar x)$,

$$ \|x - \bar x\| \;\le\; (1+\varepsilon)\,\kappa\big(Df(\bar x)\big)\,\frac{\|f(x)\|}{\|f(y)\|}\,\|y - \bar x\| . $$

Proof: See [22, p. 69, p. 72] and Exercise 8.4. □

Here $\kappa$ is the condition number in a matrix norm that is consistent with
the chosen vector norm. For $x = x^{(k)}$ and $y = x^{(0)}$ we get (locally) the
generalization of (5.16).


8.2.2 Modifications of Newton's Method

Modifications of Newton's method aim in two directions:
• Reduction of the cost of assembling and solving the system of
equations (8.12) (without a significant deterioration of the convergence
properties).
• Enlargement of the range of convergence.

We can account for the first aspect by simplifying the matrix in (8.12)
(modified or simplified Newton's method). The extreme case is the replacement
of $Df(x^{(k)})$ by the identity matrix; this leads us to the fixed-point
iteration (8.9). If the mapping $f$ consists of a nonlinear and a linear part,
$$ f(x) := Ax + g(x) = 0 , \qquad (8.18) $$

then the system of equations (8.12) of the Newton iteration reads

$$ \big(A + Dg\big(x^{(k)}\big)\big)\,\delta^{(k)} = -f\big(x^{(k)}\big) . $$

A straightforward simplification in this case is the fixed-point iteration

$$ A\,\delta^{(k)} = -f\big(x^{(k)}\big) . \qquad (8.19) $$

It may be interpreted as the fixed-point iteration (8.9) of the system that
is preconditioned with $A$, i.e., of

$$ A^{-1} f(x) = 0 . $$
In (8.19) the matrix is identical in every iteration step; therefore, it has to
be assembled only once, and if we use a direct method (cf. Section 2.5), the
LU factorization has to be carried out only once. Each further step then
requires only the forward and backward substitutions, which have a much lower
computational cost. For iterative methods we cannot rely on this advantage,
but we can expect that $x^{(k+1)}$ is close to $x^{(k)}$, and consequently
$\delta^{(k,0)} = 0$ constitutes a good initial guess. Accordingly, the
assembling of the matrix becomes more important with respect to the overall
computational cost, and savings during the assembling become relevant.
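To make this concrete, the following minimal sketch realizes the iteration
(8.19) for a splitting $f(x) = Ax + g(x)$; it assumes dense linear algebra via
SciPy, and the tolerance and iteration limit are illustrative choices, not
values from the text.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def fixed_matrix_iteration(A, g, x0, tol=1e-10, max_iter=100):
    """Fixed-point iteration (8.19): A delta = -f(x) with f(x) = A x + g(x).

    The matrix A is assembled and LU-factorized only once; every step
    afterwards costs just one forward and one backward substitution.
    """
    lu_piv = lu_factor(A)                  # factorization carried out once
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        fx = A @ x + g(x)                  # residual f(x^(k))
        if np.linalg.norm(fx) <= tol:
            break
        x += lu_solve(lu_piv, -fx)         # substitutions only
    return x
```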
We get a system of equations similar to (8.19) by applying the chord
method (see Exercise 8.3), in which the linearization at the initial
iterate is kept throughout, i.e.,

$$ Df\big(x^{(0)}\big)\,\delta^{(k)} = -f\big(x^{(k)}\big) . \qquad (8.20) $$

If the matrix $B(x^{(k)})$, which approximates $Df(x^{(k)})$, changes in each
iteration step, i.e.,

$$ B\big(x^{(k)}\big)\,\delta^{(k)} = -f\big(x^{(k)}\big) , \qquad (8.21) $$

then the only advantage can be a possibly easier assembling or solvability of
the system of equations. If the partial derivatives $\partial_j f_i(x)$ are
more difficult to evaluate than the function $f_i(y)$ itself (or possibly not
evaluable at all), then the approximation of $Df(x^{(k)})$ by difference
quotients can be taken into consideration. This corresponds to

$$ B\big(x^{(k)}\big)\,e_j = \frac{1}{h}\Big(f(x + h\,e_j) - f(x)\Big) \qquad (8.22) $$
for column $j$ of $B(x^{(k)})$ with a fixed $h > 0$. The number of computations
for the assembling of the matrix remains the same: $m^2$ for the full matrix
and analogously for the sparse matrix (see Exercise 8.6). Observe that
numerical differentiation is an ill-posed problem, which means that we should
ideally choose $h \sim \delta^{1/2}$, where $\delta > 0$ is the error level in
the evaluation of $f$. Even then we can merely expect

$$ \big\| Df\big(x^{(k)}\big) - B\big(x^{(k)}\big) \big\| \le C\,\delta^{1/2} $$

(see [22, pp. 80 f.]). Thus in the best case we can expect only half of the
significant digits of the machine precision.
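A minimal sketch of the column-wise approximation (8.22) might read as
follows; taking the machine precision as the error level $\delta$ is an
assumption that holds only if $f$ is evaluated to full accuracy.

```python
import numpy as np

def fd_jacobian(f, x, h=None):
    """Approximate Df(x) column by column via (8.22):
    B e_j = (f(x + h e_j) - f(x)) / h."""
    if h is None:
        # h ~ delta^(1/2); here delta = machine precision (an assumption)
        h = np.sqrt(np.finfo(float).eps)
    fx = f(x)
    B = np.empty((fx.size, x.size))
    for j in range(x.size):
        xh = x.copy()
        xh[j] += h
        B[:, j] = (f(xh) - fx) / h
    return B
```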
The second aspect, facilitated solvability of (8.21), occurs if "small"
entries appear in the Jacobian, due to a problem-dependent weak coupling of
the components, and these entries may be skipped. Take, for example, a
$Df(x^{(k)})$ with a block structure as in (5.39),

$$ Df\big(x^{(k)}\big) = \big( A_{ij} \big)_{ij}\,, \qquad A_{ij} \in \mathbb{R}^{m_i, m_j} , $$

such that the blocks $A_{ij}$ may be neglected for $j > i$. Then there results
a nested system of equations of the dimensions $m_1, m_2, \ldots, m_p$, which
can be solved successively, as sketched below.
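The following sketch (an illustration, with the block partition assumed to be
given as nested lists of NumPy arrays) shows how such a block lower
triangular system decomposes into $p$ successive subsystems:

```python
import numpy as np

def block_forward_solve(A, b):
    """Solve a block lower triangular system (A_ij neglected for j > i):
    the i-th block equation  A[i][i] d_i = b[i] - sum_{j<i} A[i][j] d_j
    only has dimension m_i."""
    d = []
    for i in range(len(b)):
        rhs = b[i].copy()
        for j in range(i):
            rhs -= A[i][j] @ d[j]
        d.append(np.linalg.solve(A[i][i], rhs))
    return d
```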
The possible advantages of such simplified Newton's methods have to be
weighed against the disadvantage of a deterioration in the order of
convergence: Instead of an estimate like that in Theorem 8.11, (3), we have to
expect an additional term

$$ \big\| B\big(x^{(k)}\big) - Df\big(x^{(k)}\big) \big\|\, \big\| x^{(k)} - \bar x \big\| . $$

This means only linear or, by successive improvement of the approximation,
superlinear convergence (see [22, pp. 75 ff.]). If we have a good initial
iterate, it may often be advantageous to perform a small number of steps of
Newton's method. So in the following we will again treat Newton's method,
although the subsequent considerations can also be transferred to its
modifications.
If the linear problems (8.12) are solved by an iterative scheme, we have the
possibility to adjust the accuracy of the inner iteration in order to reduce
the number of inner iterations, without a (severe) deterioration of the
convergence of the outer Newton iteration. In such inexact Newton's methods
we determine, instead of $\delta^{(k)}$ from (8.12), only an approximation
$\tilde\delta^{(k)}$, which fulfils (8.12) up to an inner residual $r^{(k)}$,
i.e.,

$$ Df\big(x^{(k)}\big)\,\tilde\delta^{(k)} = -f\big(x^{(k)}\big) + r^{(k)} . $$

The new iterate is given by

$$ x^{(k+1)} := x^{(k)} + \tilde\delta^{(k)} . $$

The accuracy of $\tilde\delta^{(k)}$ is controlled by the requirement

$$ \big\| r^{(k)} \big\| \le \eta_k\, \big\| f\big(x^{(k)}\big) \big\| \qquad (8.23) $$

with certain properties for the sequence $(\eta_k)_k$ that still have to be
determined. Since the natural choice of the initial iterate for solving (8.12)
is $\delta^{(k,0)} = 0$, (8.23) corresponds to the termination criterion (5.15).
Conditions for $\eta_k$ can be deduced from the following theorem:

Theorem 8.13 Let (8.17) hold and consider compatible matrix and vector
norms. Then for every $\varepsilon > 0$ there exists a $\delta > 0$ such that
for $x^{(k)} \in B_\delta(\bar x)$,

$$ \big\| x^{(k+1)} - \bar x \big\| \le \big\| x^{(k)} - Df\big(x^{(k)}\big)^{-1} f\big(x^{(k)}\big) - \bar x \big\| + (1+\varepsilon)\,\kappa\big(Df(\bar x)\big)\,\eta_k\, \big\| x^{(k)} - \bar x \big\| . \qquad (8.24) $$

Proof: By the choice of $\delta$ we can ensure the nonsingularity of
$Df(x^{(k)})$. From

$$ \tilde\delta^{(k)} = -Df\big(x^{(k)}\big)^{-1} f\big(x^{(k)}\big) + Df\big(x^{(k)}\big)^{-1} r^{(k)} $$

it follows that

$$ \big\| x^{(k+1)} - \bar x \big\| = \big\| x^{(k)} - \bar x + \tilde\delta^{(k)} \big\| \le \big\| x^{(k)} - \bar x - Df\big(x^{(k)}\big)^{-1} f\big(x^{(k)}\big) \big\| + \big\| Df\big(x^{(k)}\big)^{-1} r^{(k)} \big\| . $$

The assertion can be deduced from the estimate

$$ \big\| Df\big(x^{(k)}\big)^{-1} r^{(k)} \big\| \le (1+\varepsilon)^{1/2} \big\| Df(\bar x)^{-1} \big\|\, \big\| r^{(k)} \big\| \le (1+\varepsilon)^{1/2} \big\| Df(\bar x)^{-1} \big\|\, \eta_k\, (1+\varepsilon)^{1/2} \big\| Df(\bar x) \big\|\, \big\| x^{(k)} - \bar x \big\| . $$

Here we used Exercise 8.4 (2), (3) and (8.23). □

The first part of the estimate corresponds to the error of the exact Newton
step, which can be bounded by the same argument as in Theorem 8.11, (3)
(with Exercise 8.4, (2)) by

$$ \big\| x^{(k)} - Df\big(x^{(k)}\big)^{-1} f\big(x^{(k)}\big) - \bar x \big\| \le (1+\varepsilon)^{1/2} \big\| Df(\bar x)^{-1} \big\|\, \frac{\gamma}{2}\, \big\| x^{(k)} - \bar x \big\|^2 . $$
This implies the following result:

Corollary 8.14 Let the assumptions of Theorem 8.13 be satisfied. Then there
exist $\delta > 0$ and $\bar\eta > 0$ such that for $x^{(0)} \in B_\delta(\bar x)$
and $\eta_k \le \bar\eta$ for all $k \in \mathbb{N}$, the inexact Newton's
method satisfies the following:

(1) The sequence $\big(x^{(k)}\big)_k$ converges linearly to $\bar x$.

(2) If $\eta_k \to 0$ for $k \to \infty$, then $\big(x^{(k)}\big)_k$ converges superlinearly.

(3) If $\eta_k \le K \big\| f\big(x^{(k)}\big) \big\|$ for a $K > 0$, then $\big(x^{(k)}\big)_k$ converges quadratically.

Proof: Exercise 8.5. □

The estimate (8.24) suggests choosing a rather small accuracy level $\bar\eta$
for the inner iteration in order to guarantee the above convergence
statements. This is particularly true for an ill-conditioned $Df(\bar x)$
(which is common for discretization matrices: see (5.34)). In fact, the
analysis in the weighted norm $\|\cdot\|_* = \|Df(\bar x)\,\cdot\,\|$ shows
that only $\eta_k \le \bar\eta < 1$ has to be ensured (cf. [22, pp. 97 ff.]).
With this and on the basis of

$$ \tilde\eta_k = \alpha\, \big\| f\big(x^{(k)}\big) \big\|^2 \big/ \big\| f\big(x^{(k-1)}\big) \big\|^2 $$

for some $\alpha \le 1$ we can construct $\eta_k$ in an adaptive way (see
[22, p. 105]).
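A sketch of such an adaptive choice, with a safeguard cap $\eta_{\max} < 1$
(the cap value is an illustrative assumption, not from the text):

```python
def forcing_term(norm_f_k, norm_f_km1, alpha=1.0, eta_max=0.9):
    """Adaptive inner tolerance eta_k = alpha ||f(x^(k))||^2 / ||f(x^(k-1))||^2,
    capped so that eta_k <= eta_bar < 1 (here eta_max, an assumed bound)."""
    eta = alpha * (norm_f_k / norm_f_km1) ** 2
    return min(eta, eta_max)
```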
Most of the iterative methods introduced in Chapter 5 do not require explicit
knowledge of the matrix $Df(x^{(k)})$. It suffices that the operation
$y \mapsto Df(x^{(k)})\, y$ be feasible for vectors $y$, in general for fewer
than $m$ of them; i.e., the directional derivative of $f$ at $x^{(k)}$ in
direction $y$ is needed. Thus in case a difference scheme for the derivatives
of $f$ should be necessary or reasonable, it is more convenient to choose
directly a difference scheme for the directional derivative.
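Such a matrix-free difference scheme might look as follows; the scaling of
the step by $\|y\|$ follows the quotient in Exercise 8.6 below, and tying $h$
to the square root of the machine precision is again an assumption on the
error level of $f$:

```python
import numpy as np

def jacvec(f, x, y, fx=None):
    """Approximate the directional derivative Df(x) y by a forward
    difference quotient, without forming the Jacobian."""
    ny = np.linalg.norm(y)
    if ny == 0.0:
        return np.zeros_like(y)
    h = np.sqrt(np.finfo(float).eps)   # error-level assumption, see above
    if fx is None:
        fx = f(x)
    return (f(x + (h / ny) * y) - fx) * (ny / h)
```

Wrapped as a matrix-free linear operator, such a routine makes the Krylov
methods of Chapter 5 directly applicable to the Newton systems.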
Since we cannot expect convergence of Newton's method in general, we require
indicators for the convergence behaviour of the iteration. The solution
$\bar x$ is in particular also the solution of

$$ \text{Minimize } \big\| f(x) \big\|^2 \quad \text{for } x \in \mathbb{R}^m . $$


Let $x^{(0)}$, $\tau > 0$, $\eta_0, \bar\Theta \in (0,1)$, $k = 0$, $i = 0$ be given.

(1) $\tilde\delta^{(k,0)} := 0$, $i := 1$.

(2) Determine the $i$th iterate $\tilde\delta^{(k,i)}$ for
    $Df(x^{(k)})\,\tilde\delta^{(k)} = -f(x^{(k)})$ and calculate
    $r^{(i)} := Df(x^{(k)})\,\tilde\delta^{(k,i)} + f(x^{(k)})$.

(3) If $\|r^{(i)}\| \le \eta_k\, \|f(x^{(k)})\|$, then go to (4);
    else set $i := i + 1$ and go to (2).

(4) $\tilde\delta^{(k)} := \tilde\delta^{(k,i)}$.

(5) $x^{(k+1)} := x^{(k)} + \tilde\delta^{(k)}$.

(6) If $\|f(x^{(k+1)})\| > \bar\Theta\, \|f(x^{(k)})\|$, interrupt.

(7) If $\|f(x^{(k+1)})\| \le \tau\, \|f(x^{(0)})\|$, end.
    Else calculate $\eta_{k+1}$, set $k := k + 1$, and go to (1).

Table 8.1. Inexact Newton's method with monotonicity test.

Thus we could expect a descent of the sequence of iterates
$\big(x^{(k)}\big)_k$ in this functional, i.e.,

$$ \big\| f\big(x^{(k+1)}\big) \big\| \le \bar\Theta\, \big\| f\big(x^{(k)}\big) \big\| $$

for a $\bar\Theta < 1$. If this monotonicity test is not fulfilled, the
iteration is terminated. An example of such an inexact Newton's method is
given in Table 8.1, and a possible transcription into code is sketched below.
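The sketch follows the steps of Table 8.1, delegating the inner iteration
(steps (2)-(3)) to GMRES from SciPy as an illustrative stand-in for the
methods of Chapter 5; the parameter values and the constant choice of
$\eta_k$ are assumptions for the illustration, not prescriptions of the text.

```python
import numpy as np
from scipy.sparse.linalg import gmres

def inexact_newton(f, Df, x0, tau=1e-8, eta=0.5, theta=0.9, max_outer=50):
    """Inexact Newton's method with monotonicity test (Table 8.1).

    Df(x) may be a matrix or a matrix-free LinearOperator; the relative
    tolerance eta realizes the inner criterion ||r|| <= eta ||f(x^(k))||."""
    x = np.asarray(x0, dtype=float).copy()
    norm_f0 = np.linalg.norm(f(x))
    for k in range(max_outer):
        fx = f(x)
        norm_fx = np.linalg.norm(fx)
        if norm_fx <= tau * norm_f0:         # step (7): target reached
            return x
        # steps (1)-(4): inner iteration, started from delta = 0 by default
        delta, _ = gmres(Df(x), -fx, rtol=eta, atol=0.0)  # rtol: SciPy >= 1.12
        x_new = x + delta                    # step (5)
        if np.linalg.norm(f(x_new)) > theta * norm_fx:
            raise RuntimeError("monotonicity test violated")  # step (6)
        x = x_new                            # eta_{k+1} kept constant here
    return x
```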
In order to avoid the termination of the method due to divergence,
continuation methods have been developed. They embed the problem $f(x) = 0$
into a family of problems in order to provide successively good initial
iterates. The approach presented at the end of Section 8.3 for time-dependent
problems is similar to the continuation methods. Another approach (which can
be combined with the continuation methods) modifies the (inexact) Newton's
method so that the range of convergence is enlarged: Applying the damped
(inexact) Newton's method means reducing the step length from $x^{(k)}$ to
$x^{(k+1)}$ until we observe a decrease conforming to the monotonicity test.
One strategy of damping, termed Armijo's rule, is described in Table 8.2 and
replaces the steps (1), (5), and (6) in Table 8.1.

Thus damping Newton's method also amounts to a relaxation similar to (5.30),
where $\omega = \lambda_k$ is adjusted to the iteration step as in (5.41).
In the formulation of Table 8.2 the iteration may fail to terminate if
$\lambda_k$ is reduced indefinitely in (5). This must be avoided in a
practical implementation of the method. But except for situations where
divergence is obvious, this situation will not appear, because we have the
following theorem:


Let additionally $\alpha, \beta \in (0,1)$ be given.

(1) $\tilde\delta^{(k,0)} := 0$, $i := 1$, $\lambda_k := 1$.

(5) If $\big\| f\big(x^{(k)} + \lambda_k \tilde\delta^{(k)}\big) \big\| \ge (1 - \alpha\lambda_k)\, \big\| f\big(x^{(k)}\big) \big\|$,
    set $\lambda_k := \beta\lambda_k$ and go to (5).

(6) $x^{(k+1)} := x^{(k)} + \lambda_k \tilde\delta^{(k)}$.

Table 8.2. Damped inexact Newton step according to Armijo's rule.
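Steps (1), (5), and (6) of Table 8.2 might be coded as follows; the
safeguard lam_min, which realizes the practical termination guard discussed
above, is an illustrative addition:

```python
import numpy as np

def armijo_step(f, x, delta, alpha=1e-4, beta=0.5, lam_min=1e-10):
    """Damped Newton step (Table 8.2): reduce lambda_k until
    ||f(x + lam*delta)|| < (1 - alpha*lam) * ||f(x)||."""
    norm_fx = np.linalg.norm(f(x))
    lam = 1.0                                    # step (1): lambda_k := 1
    while np.linalg.norm(f(x + lam * delta)) >= (1.0 - alpha * lam) * norm_fx:
        lam *= beta                              # step (5)
        if lam < lam_min:                        # practical safeguard against
            raise RuntimeError("damping failed") # endless reduction of lam
    return x + lam * delta                       # step (6)
```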


Theorem 8.15 Let $\alpha, \beta, \gamma > 0$ exist such that conditions (i),
(ii) of Theorem 8.11 hold on $\bigcup_{k\in\mathbb{N}} B_r\big(x^{(k)}\big)$
for the sequence $\big(x^{(k)}\big)_k$ defined according to Table 8.2. Let
$\eta_k \le \bar\eta$ for an $\bar\eta < 1 - \alpha$.

Then if $f\big(x^{(0)}\big) \ne 0$, there exists a $\bar\lambda > 0$ such that
$\lambda_k \ge \bar\lambda$ for all $k \in \mathbb{N}$. If furthermore
$\big(x^{(k)}\big)_k$ is bounded, then there exists a zero $\bar x$ satisfying
(8.17) and

$$ x^{(k)} \to \bar x \quad \text{for } k \to \infty . $$

There exists a $k_0 \in \mathbb{N}$ such that for $k \ge k_0$ the relation
$\lambda_k = 1$ holds.

Proof: See [22, pp. 139 ff.]. □

We see that in the final stage of the iteration we again deal with the
(inexact) Newton's method with the previously described convergence
behaviour.

Finally, the following should be mentioned: The problem $f(x) = 0$ and
Newton's method are affine-invariant in the sense that a transition to
$Af(x) = 0$ with a nonsingular $A \in \mathbb{R}^{m,m}$ changes neither the
problem nor the iteration method, since

$$ D(Af)(x)^{-1} \big(Af\big)(x) = Df(x)^{-1} f(x) . $$

Among the assumptions of Theorem 8.11, (8.14) is not affine-invariant. A
possible alternative would be

$$ \big\| Df(y)^{-1} \big( Df(x) - Df(y) \big) \big\| \le \gamma\, \| x - y \| , $$

which fulfils the requirement. As in the proof of Lemma 8.10 it follows that

$$ \big\| Df(y)^{-1} \big( f(x) - f(y) - Df(y)(x - y) \big) \big\| \le \frac{\gamma}{2}\, \| x - y \|^2 . $$

With this argument a similar variant of Theorem 8.11 can be proven.

The monotonicity test above is likewise not affine-invariant, so presumably
the natural monotonicity test

$$ \big\| Df\big(x^{(k)}\big)^{-1} f\big(x^{(k+1)}\big) \big\| \le \bar\Theta\, \big\| Df\big(x^{(k)}\big)^{-1} f\big(x^{(k)}\big) \big\| $$

is to be preferred. The vector on the right-hand side has already been
calculated, being, except for the sign, the Newton correction $\delta^{(k)}$.
But for the vector on the left-hand side, $-\bar\delta^{(k+1)}$, the system of
equations

$$ Df\big(x^{(k)}\big)\, \bar\delta^{(k+1)} = -f\big(x^{(k+1)}\big) $$

additionally has to be solved.
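Since both systems share the matrix $Df(x^{(k)})$, a factorization computed
for the Newton step can be reused, so the test costs only one additional pair
of substitutions; a sketch under the assumption of a dense, LU-factorized
Jacobian:

```python
import numpy as np
from scipy.linalg import lu_solve

def natural_monotonicity_ok(lu_piv, f_new, newton_delta, theta):
    """Affine-invariant test ||Df(x^(k))^{-1} f(x^(k+1))||
    <= theta * ||Df(x^(k))^{-1} f(x^(k))||, reusing the LU factors
    lu_piv of Df(x^(k)) and the already computed Newton correction."""
    # delta_bar solves Df(x^(k)) delta_bar = -f(x^(k+1))
    delta_bar = lu_solve(lu_piv, -f_new)
    return np.linalg.norm(delta_bar) <= theta * np.linalg.norm(newton_delta)
```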



Exercises
8.3 Consider the chord method as described in (8.20). Prove the convergence
of this method to the solution $\bar x$ under the following assumptions:

(1) Let (8.17) with $\overline{B_r}(\bar x) \subset C$ hold,

(2) $\big\| Df\big(x^{(0)}\big)^{-1} \big\| \le \beta$,

(3) $2\beta\gamma r < 1$,

(4) $x^{(0)} \in \overline{B_r}(\bar x)$.

8.4 Let assumption (8.17) hold. Prove for compatible matrix and vector
norms that for every $\varepsilon > 0$ there exists a $\delta > 0$ such that
for every $x \in B_\delta(\bar x)$:

(1) $\| Df(x) \| \le (1+\varepsilon)^{1/2}\, \| Df(\bar x) \|$,

(2) $\| Df(x)^{-1} \| \le (1+\varepsilon)^{1/2}\, \| Df(\bar x)^{-1} \|$
    (employ $\|(I - M)^{-1}\| \le 1/(1 - \|M\|)$ for $\|M\| < 1$),

(3) $(1+\varepsilon)^{-1/2}\, \| Df(\bar x)^{-1} \|^{-1}\, \| x - \bar x \| \le \| f(x) \| \le (1+\varepsilon)^{1/2}\, \| Df(\bar x) \|\, \| x - \bar x \|$,

(4) Theorem 8.12.

8.5 Prove Corollary 8.14.

8.6 Let $U \subset \mathbb{R}^m$ be open and convex. Consider problem (8.2)
with continuously differentiable $f : U \to \mathbb{R}^m$. For
$i = 1, \ldots, m$ let $J_i \subset \{1, \ldots, m\}$ be defined by

$$ \partial_j f_i(x) = 0 \quad \text{for } j \notin J_i \text{ and every } x \in U . $$

Then the operator $f$ is called sparsely occupied if $l_i := |J_i| < m$, and
sparsely occupied in the strict sense if $l_i \le l$ for all
$i = 1, \ldots, m$, where $l < m$ is independent of $m$ for a sequence of
problems (8.2) of dimension $m$.

Then the evaluation of $Df(x)$ and its approximation according to (8.22) both
need $\sum_{k=1}^{m} l_k$ evaluations of $\partial_j f_i$ or of $f_i$,
respectively. What is the computational effort for a difference approximation

$$ \frac{\|\delta\|}{h} \Big( f\big(x + h\,\delta/\|\delta\|\big) - f(x) \Big) $$

of the directional derivative $Df(x)\,\delta$?


8.3 Semilinear Boundary Value Problems for
Elliptic and Parabolic Equations
In this section we treat semilinear problems as the simplest nonlinear case,
where nonlinearities do not occur in parts containing derivatives. Hence we
want to examine differential equations of the form (0.33) that satisfy (0.42)
and (0.43).

Stationary Problems
As a stationary problem we consider the differential equation

$$ Lu(x) + \psi(u(x)) = 0 \quad \text{for } x \in \Omega \qquad (8.25) $$

with the linear elliptic differential operator $L$ according to (3.12) and
linear boundary conditions on $\partial\Omega$ according to (3.18)-(3.20).
Here $\psi : \mathbb{R} \to \mathbb{R}$ denotes a mapping that is supposed to
be continuously differentiable.

A Galerkin discretization in $V_h \subset V$ with
$H_0^1(\Omega) \subset V \subset H^1(\Omega)$ according to the type of
boundary condition and $V_h = \mathrm{span}\,\{\varphi_1, \ldots, \varphi_M\}$,
with the approximate solution $u_h \in V_h$ in the representation
$u_h = \sum_{i=1}^{M} \xi_i \varphi_i$, gives

$$ S\xi + G(\xi) = b \qquad (8.26) $$

with the stiffness matrix $S = \big( a(\varphi_j, \varphi_i) \big)_{i,j}$ and
a vector $b$ that contains the contributions of the inhomogeneous boundary
conditions. Here the nonlinear mapping $G : \mathbb{R}^M \to \mathbb{R}^M$ is
defined by
