
this topic) may provide such an alternative.
For definiteness in the description of Algorithm newton we use an LU
factorization of $F'$. Any other appropriate factorization, such as QR or
Cholesky, could be used as well. The inputs of the algorithm are the initial
iterate $x$, the nonlinear map $F$, and a vector of termination tolerances
$\tau = (\tau_r, \tau_a) \in R^2$.
Algorithm 5.3.1. newton($x, F, \tau$)
1. $r_0 = \|F(x)\|$

2. Do while $\|F(x)\| > \tau_r r_0 + \tau_a$
   (a) Compute $F'(x)$
   (b) Factor $F'(x) = LU$.
   (c) Solve $LU s = -F(x)$
   (d) $x = x + s$
   (e) Evaluate $F(x)$.
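
For readers who want to experiment, the following is a minimal Python sketch
of Algorithm newton. It is not the code that accompanies this book; the names
newton, F, and jac, and the use of scipy.linalg.lu_factor/lu_solve for the
factor/solve steps, are illustrative assumptions.

    # Minimal sketch of Algorithm newton. F: R^N -> R^N; jac(x) is
    # assumed to return the Jacobian F'(x) as an N x N array.
    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    def newton(x, F, jac, tau_r, tau_a, maxit=100):
        Fx = F(x)
        r0 = np.linalg.norm(Fx)                      # 1. r_0 = ||F(x)||
        for _ in range(maxit):
            if np.linalg.norm(Fx) <= tau_r * r0 + tau_a:   # 2. termination test
                break
            lu, piv = lu_factor(jac(x))              # 2a-b. compute and factor F'(x)
            s = lu_solve((lu, piv), -Fx)             # 2c. solve LU s = -F(x)
            x = x + s                                # 2d. update the iterate
            Fx = F(x)                                # 2e. evaluate F
        return x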

One approach to reducing the cost of items 2a and 2b in the Newton
iteration is to move them outside of the main loop. This means that the linear
approximation of $F(x) = 0$ that is solved at each iteration has its derivative
determined by the initial iterate:

$$x_+ = x_c - F'(x_0)^{-1} F(x_c).$$

This method is called the chord method. The inputs are the same as for
Newton's method.
Algorithm 5.3.2. chord($x, F, \tau$)
1. $r_0 = \|F(x)\|$

2. Compute $F'(x)$

3. Factor $F'(x) = LU$.




4. Do while $\|F(x)\| > \tau_r r_0 + \tau_a$
   (a) Solve $LU s = -F(x)$
   (b) $x = x + s$
   (c) Evaluate $F(x)$.
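
Under the same illustrative assumptions as the newton sketch above, the
chord iteration differs only in that the Jacobian is computed and factored
once, before the loop.

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    def chord(x, F, jac, tau_r, tau_a, maxit=100):
        Fx = F(x)
        r0 = np.linalg.norm(Fx)          # 1. r_0 = ||F(x)||
        lu, piv = lu_factor(jac(x))      # 2-3. factor F'(x_0) once, outside the loop
        for _ in range(maxit):
            if np.linalg.norm(Fx) <= tau_r * r0 + tau_a:   # 4. termination test
                break
            s = lu_solve((lu, piv), -Fx)  # 4a. solve LU s = -F(x)
            x = x + s                     # 4b. update the iterate
            Fx = F(x)                     # 4c. evaluate F
        return x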

The only difference in implementation from Newton's method is that the
computation and factorization of the Jacobian are done before the iteration
is begun. The difference in the iteration itself is that an approximation to
$F'(x_c)$ is used. Similar differences arise if $F'(x_c)$ is numerically approximated
by differences. We continue in the next section with an analysis of the effects
of errors in the function and Jacobian in a general context and then consider
the chord method and difference approximation to $F'$ as applications.

5.4. Errors in the function and derivative
Suppose that $F$ and $F'$ are computed inaccurately so that $F + \epsilon$ and $F' + \Delta$
are used instead of $F$ and $F'$ in the iteration. If $\Delta$ is sufficiently small, the
resulting iteration can return a result that is an $O(\epsilon)$ accurate approximation
to $x^*$. This is different from convergence and was called "local improvement"
in [65]. These issues are discussed in [201] as well. If, for example, $\epsilon$ is entirely
due to floating-point roundoff, there is no reason to expect that $\|F(x_n)\|$ will
ever be smaller than $\epsilon$ in general. We will refer to this phase of the iteration,
in which the nonlinear residual is no longer being reduced, as stagnation of the
iteration.
Theorem 5.4.1. Let the standard assumptions hold. Then there are
$\bar K > 0$, $\delta > 0$, and $\delta_1 > 0$ such that if $x_c \in \mathcal{B}(\delta)$ and $\|\Delta(x_c)\| < \delta_1$ then

$$x_+ = x_c - (F'(x_c) + \Delta(x_c))^{-1} (F(x_c) + \epsilon(x_c))$$

is defined (i.e., $F'(x_c) + \Delta(x_c)$ is nonsingular) and satisfies

(5.5) $\|e_+\| \le \bar K (\|e_c\|^2 + \|\Delta(x_c)\| \, \|e_c\| + \|\epsilon(x_c)\|).$

Proof. Let $\delta$ be small enough so that the conclusions of Lemma 4.3.1 and
Theorem 5.1.1 hold. Let

$$x_+^N = x_c - F'(x_c)^{-1} F(x_c)$$

and note that

$$x_+ = x_+^N + (F'(x_c)^{-1} - (F'(x_c) + \Delta(x_c))^{-1}) F(x_c) - (F'(x_c) + \Delta(x_c))^{-1} \epsilon(x_c).$$

Lemma 4.3.1 and Theorem 5.1.1 imply

(5.6) $\|e_+\| \le K \|e_c\|^2 + 2 \|F'(x_c)^{-1} - (F'(x_c) + \Delta(x_c))^{-1}\| \, \|F'(x^*)\| \, \|e_c\| + \|(F'(x_c) + \Delta(x_c))^{-1} \epsilon(x_c)\|.$




If
$$\|\Delta(x_c)\| \le \|F'(x^*)^{-1}\|^{-1}/4$$
then Lemma 4.3.1 implies that
$$\|\Delta(x_c)\| \le \|F'(x_c)^{-1}\|^{-1}/2$$
and the Banach Lemma implies that $F'(x_c) + \Delta(x_c)$ is nonsingular and
$$\|(F'(x_c) + \Delta(x_c))^{-1}\| \le 2 \|F'(x_c)^{-1}\| \le 4 \|F'(x^*)^{-1}\|.$$
Hence
$$\|F'(x_c)^{-1} - (F'(x_c) + \Delta(x_c))^{-1}\| \le 8 \|F'(x^*)^{-1}\|^2 \|\Delta(x_c)\|.$$

We use these estimates and (5.6) to obtain
$$\|e_+\| \le K \|e_c\|^2 + 16 \|F'(x^*)^{-1}\|^2 \|F'(x^*)\| \, \|\Delta(x_c)\| \, \|e_c\| + 4 \|F'(x^*)^{-1}\| \, \|\epsilon(x_c)\|.$$
Setting
$$\bar K = K + 16 \|F'(x^*)^{-1}\|^2 \|F'(x^*)\| + 4 \|F'(x^*)^{-1}\|$$
completes the proof.
The remainder of this section is devoted to applications of Theorem 5.4.1.

5.4.1. The chord method. Recall that the chord method is given by

$$x_+ = x_c - F'(x_0)^{-1} F(x_c).$$

In the language of Theorem 5.4.1,

$$\epsilon(x_c) = 0, \quad \Delta(x_c) = F'(x_0) - F'(x_c).$$

Hence, if $x_c, x_0 \in \mathcal{B}(\delta) \subset \Omega$,

(5.7) $\|\Delta(x_c)\| \le \gamma \|x_0 - x_c\| \le \gamma (\|e_0\| + \|e_c\|).$

We apply Theorem 5.4.1 to obtain the following result.
Theorem 5.4.2. Let the standard assumptions hold. Then there are
$K_C > 0$ and $\delta > 0$ such that if $x_0 \in \mathcal{B}(\delta)$ the chord iterates converge q-linearly
to $x^*$ and
(5.8) $\|e_{n+1}\| \le K_C \|e_0\| \, \|e_n\|.$
Proof. Let $\delta$ be small enough so that $\mathcal{B}(\delta) \subset \Omega$ and the conclusions of
Theorem 5.4.1 hold. Assume that $x_n \in \mathcal{B}(\delta)$. Combining (5.7) and (5.5)
implies
$$\|e_{n+1}\| \le \bar K (\|e_n\|(1 + \gamma) + \gamma \|e_0\|) \|e_n\| \le \bar K (1 + 2\gamma) \delta \|e_n\|.$$
Hence if $\delta$ is small enough so that
$$\bar K (1 + 2\gamma) \delta = \eta < 1$$




then the chord iterates converge q-linearly to $x^*$. Q-linear convergence implies
that $\|e_n\| < \|e_0\|$ and hence (5.8) holds with $K_C = \bar K (1 + 2\gamma)$.
Another variation of the chord method is

$$x_+ = x_c - A^{-1} F(x_c),$$

where $A \approx F'(x^*)$. Methods of this type may be viewed as preconditioned
nonlinear Richardson iteration. Since

$$\|\Delta(x_c)\| = \|A - F'(x_c)\| \le \|A - F'(x^*)\| + \|F'(x^*) - F'(x_c)\|,$$

if $x_c \in \mathcal{B}(\delta) \subset \Omega$ then

$$\|\Delta(x_c)\| \le \|A - F'(x^*)\| + \gamma \|e_c\| \le \|A - F'(x^*)\| + \gamma \delta.$$

Theorem 5.4.3. Let the standard assumptions hold. Then there are
$K_A > 0$, $\delta > 0$, and $\delta_1 > 0$ such that if $x_0 \in \mathcal{B}(\delta)$ and $\|A - F'(x^*)\| < \delta_1$ then
the iteration
$$x_{n+1} = x_n - A^{-1} F(x_n)$$
converges q-linearly to $x^*$ and

(5.9) $\|e_{n+1}\| \le K_A (\|e_0\| + \|A - F'(x^*)\|) \|e_n\|.$


5.4.2. Approximate inversion of $F'$. Another way to implement chord-
type methods is to provide an approximate inverse of $F'$. Here we replace
$F'(x)^{-1}$ by $B(x)$, where the action of $B$ on a vector is less expensive to compute
than a solution using the LU factorization. Rather than express the iteration
in terms of $B^{-1}(x) - F'(x)$ and using Theorem 5.4.3, one can proceed directly
from the definition of approximate inverse.
Note that if $B$ is constant (independent of $x$), then the iteration

$$x_+ = x_c - B F(x_c)$$

can be viewed as a preconditioned nonlinear Richardson iteration.
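
As an illustrative sketch (the name richardson and the argument B are
assumptions, not from the text), a fixed approximate inverse $B$, for example
the inverse of $F'(x_0)$ or of a nearby Jacobian, can be applied as follows.

    import numpy as np

    def richardson(x, F, B, tau_r, tau_a, maxit=100):
        # Preconditioned nonlinear Richardson sketch: B is a constant
        # N x N approximate inverse of F'(x*). Illustrative only.
        Fx = F(x)
        r0 = np.linalg.norm(Fx)
        for _ in range(maxit):
            if np.linalg.norm(Fx) <= tau_r * r0 + tau_a:
                break
            x = x - B @ Fx          # x_+ = x_c - B F(x_c)
            Fx = F(x)
        return x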
We have the following result.
Theorem 5.4.4. Let the standard assumptions hold. Then there are
$K_B > 0$, $\delta > 0$, and $\delta_1 > 0$ such that if $x_0 \in \mathcal{B}(\delta)$ and the matrix-valued
function $B(x)$ satisfies

(5.10) $\|I - B(x) F'(x^*)\| = \rho(x) < \delta_1$

for all $x \in \mathcal{B}(\delta)$, then the iteration

$$x_{n+1} = x_n - B(x_n) F(x_n)$$

converges q-linearly to $x^*$ and

(5.11) $\|e_{n+1}\| \le K_B (\rho(x_n) + \|e_n\|) \|e_n\|.$




Proof. This theorem could be regarded as a corollary of the Banach
Lemma and Theorem 5.4.1, but we give a direct proof.
First, by (5.10) we have

(5.12) $\|B(x)\| = \|B(x) F'(x^*) F'(x^*)^{-1}\| \le \|B(x) F'(x^*)\| \, \|F'(x^*)^{-1}\| \le M_B = (1 + \delta_1) \|F'(x^*)^{-1}\|.$

Using (5.12) and

$$e_+ = e_c - B(x_c) F(x_c) = \int_0^1 (I - B(x_c) F'(x^* + t e_c)) e_c \, dt$$

$$= (I - B(x_c) F'(x^*)) e_c + B(x_c) \int_0^1 (F'(x^*) - F'(x^* + t e_c)) e_c \, dt$$

we have
$$\|e_+\| \le \rho(x_c) \|e_c\| + M_B \gamma \|e_c\|^2 / 2.$$
This completes the proof with $K_B = 1 + M_B \gamma / 2$.

5.4.3. The Shamanskii method. Alternation of a Newton step with a
sequence of chord steps leads to a class of high-order methods, that is, methods
that converge q-superlinearly with q-order larger than 2. We follow [18] and
name this method for Shamanskii [174], who considered the finite difference
Jacobian formulation of the method from the point of view of efficiency. The
method itself was analyzed in [190]. Other high-order methods, especially for
problems in one dimension, are described in [190] and [146].
We can describe the transition from $x_c$ to $x_+$ by

(5.13)
$$y_1 = x_c - F'(x_c)^{-1} F(x_c),$$
$$y_{j+1} = y_j - F'(x_c)^{-1} F(y_j) \quad \text{for } 1 \le j \le m - 1,$$
$$x_+ = y_m.$$

Note that $m = 1$ is Newton's method and $m = \infty$ is the chord method with
$\{y_j\}$ playing the role of the chord iterates.
Algorithmically a second loop surrounds the factor/solve step. The inputs
for the Shamanskii method are the same as for Newton's method or the chord
method, except for the addition of the parameter $m$. Note that we can overwrite
$x_c$ with the sequence of $y_j$'s and that we apply the termination test after the
computation of each $y_j$.
Algorithm 5.4.1. sham($x, F, \tau, m$)
1. $r_0 = \|F(x)\|$

2. Do while $\|F(x)\| > \tau_r r_0 + \tau_a$




   (a) Compute $F'(x)$
   (b) Factor $F'(x) = LU$.
   (c) For $j = 1, \dots, m$
      i. Solve $LU s = -F(x)$
      ii. $x = x + s$
      iii. Evaluate $F(x)$.
      iv. If $\|F(x)\| \le \tau_r r_0 + \tau_a$ exit.
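
Continuing the illustrative Python sketches under the same assumptions as
before, Algorithm sham adds the inner loop of $m$ solves around each
factorization, with the termination test applied after every step.

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    def sham(x, F, jac, tau_r, tau_a, m, maxit=100):
        Fx = F(x)
        r0 = np.linalg.norm(Fx)                      # 1. r_0 = ||F(x)||
        for _ in range(maxit):
            if np.linalg.norm(Fx) <= tau_r * r0 + tau_a:   # 2. termination test
                break
            lu, piv = lu_factor(jac(x))              # 2a-b. factor F'(x) = LU
            for _ in range(m):                       # 2c. m chord steps per factorization
                s = lu_solve((lu, piv), -Fx)         # i. solve LU s = -F(x)
                x = x + s                            # ii. update (overwrites x_c with y_j)
                Fx = F(x)                            # iii. evaluate F
                if np.linalg.norm(Fx) <= tau_r * r0 + tau_a:
                    break                            # iv. test after each y_j
        return x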

The convergence result is a simple application of Theorem 5.4.2.
Theorem 5.4.5. Let the standard assumptions hold and let $m \ge 1$ be
given. Then there are $K_S > 0$ and $\delta > 0$ such that if $x_0 \in \mathcal{B}(\delta)$ the Shamanskii
iterates converge q-superlinearly to $x^*$ with q-order $m + 1$ and

(5.14) $\|e_{n+1}\| \le K_S \|e_n\|^{m+1}.$

Proof. Let $\delta$ be small enough so that $\mathcal{B}(\delta) \subset \Omega$ and the conclusions
of Theorem 5.4.2 hold. Then if $x_n \in \mathcal{B}(\delta)$ all the intermediate iterates $\{y_j\}$
are in $\mathcal{B}(\delta)$ by Theorem 5.4.2. In fact, if we set $y_0 = x_n$, (5.8) implies that for
$1 \le j \le m$

$$\|y_j - x^*\| \le K_C \|x_n - x^*\| \, \|y_{j-1} - x^*\| = K_C \|e_n\| \, \|y_{j-1} - x^*\| \le \dots \le K_C^j \|e_n\|^{j+1}.$$

Hence $x_{n+1} \in \mathcal{B}(\delta)$. Setting $j = m$ in the above inequality completes the
proof.
The advantage of the Shamanskii method over Newton's method is
that high q-orders can be obtained with far fewer Jacobian evaluations or
factorizations. Optimal choices of $m$ as a function of $N$ can be derived
under assumptions consistent with dense matrix algebra [18] by balancing the
$O(N^3)$ cost of a matrix factorization with the cost of function and Jacobian
evaluation. The analysis in [18], and the expectation that only a small number
of iterations will be taken if the initial iterate is sufficiently accurate, indicate
that the chord method is usually the best option for very large problems.
Algorithm nsol is based on this idea: using an idea from [114], it computes
and factors the Jacobian only until an estimate of the q-factor for the linear
convergence rate is sufficiently low.

5.4.4. Difference approximation to $F'$. Assume that we compute
$F(x) + \epsilon(x)$ instead of $F(x)$ and attempt to approximate the action of $F'$
on a vector by a forward difference. We would do this, for example, when
building an approximate Jacobian. Here the $j$th column of $F'(x)$ is $F'(x) e_j$,
where $e_j$ is the unit vector with $j$th component 1 and other components 0.
A forward difference approximation to $F'(x) w$ would be

$$\frac{F(x + hw) + \epsilon(x + hw) - F(x) - \epsilon(x)}{h}.$$



Assume that $\|\epsilon(x)\| \le \bar\epsilon$. Then

$$F'(x) w - \frac{F(x + hw) + \epsilon(x + hw) - F(x) - \epsilon(x)}{h} = O(h + \bar\epsilon/h).$$

The quantity inside the $O$-term is minimized when $h = \sqrt{\bar\epsilon}$. This means, for
instance, that very small difference steps $h$ can lead to inaccurate results. In
the special case where $\epsilon(x)$ is a result of floating-point roundoff in full precision
($\bar\epsilon \approx 10^{-15}$), the analysis here indicates that $h \approx 10^{-7}$ is a reasonable choice.
The choice $h \approx 10^{-15}$ in this case can lead to disaster (try it!). Here we have
made the implicit assumption that $x$ and $w$ are of roughly the same size. If
this is not the case, $h$ should be scaled to reflect that. The choice

$$h = \bar\epsilon^{1/2} \|x\| / \|w\|$$

reflects all the discussion above.
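
A sketch of this scaled step choice follows; the name dirder is illustrative,
and eps_bar defaults to machine roundoff when no better bound on the error
in $F$ is available (an assumption, not a prescription from the text).

    import numpy as np

    def dirder(F, x, w, Fx, eps_bar=np.finfo(float).eps):
        # Forward difference approximation to F'(x)w with the scaled step
        # h = sqrt(eps_bar) * ||x|| / ||w||. Fx is the already-computed F(x),
        # so only one extra F-evaluation is needed. Illustrative sketch.
        normw = np.linalg.norm(w)
        if normw == 0.0:
            return np.zeros_like(x)
        normx = np.linalg.norm(x)
        # Fall back to an unscaled step if x = 0 (cf. Definition 5.4.1 below).
        h = np.sqrt(eps_bar) * (normx if normx > 0.0 else 1.0) / normw
        return (F(x + h * w) - Fx) / h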
A more important consequence of this analysis is that the choice of step
size in a forward difference approximation to the action of the Jacobian on
a vector must take into account the error in the evaluation of $F$. This can
become very important if part of $F$ is computed from measured data.
If $F(x)$ has already been computed, the cost of a forward difference
approximation of $F'(x) w$ is an additional evaluation of $F$ (at the point $x + hw$).
Hence the evaluation of a full finite difference Jacobian would cost $N$ function
evaluations, one for each column of the Jacobian. Exploitation of special
structure could reduce this cost.
The forward difference approximation to the directional derivative is not
a linear operator in $w$. The reason for this is that the derivative has been
approximated, and so linearity has been lost. Because of this we must
carefully specify what we mean by a difference approximation to the Jacobian.
Definition 5.4.1 below specifies the usual choice.
Definition 5.4.1. Let $F$ be defined in a neighborhood of $x \in R^N$.
$(\nabla_h F)(x)$ is the $N \times N$ matrix whose $j$th column is given by

$$(\nabla_h F)(x)_j = \begin{cases} \dfrac{F(x + h \|x\| e_j) - F(x)}{h \|x\|}, & x \ne 0, \\[1.5ex] \dfrac{F(h e_j) - F(x)}{h}, & x = 0. \end{cases}$$
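
A sketch of Definition 5.4.1 in the same illustrative style (the name
diff_jacobian is an assumption); note that it incurs exactly the $N$ extra
evaluations of $F$ discussed above, since $F(x)$ is computed once and reused.

    import numpy as np

    def diff_jacobian(F, x, h=1.0e-7):
        # Forward difference Jacobian per Definition 5.4.1: the jth column is
        # (F(x + h*||x||*e_j) - F(x)) / (h*||x||), with step h alone if x = 0.
        N = x.size
        Fx = F(x)
        J = np.empty((N, N))
        normx = np.linalg.norm(x)
        step = h * normx if normx > 0.0 else h
        for j in range(N):
            xp = x.copy()
            xp[j] += step                 # perturb the jth coordinate
            J[:, j] = (F(xp) - Fx) / step
        return J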

We make a similar definition of the difference approximation of the
directional derivative.
Definition 5.4.2. Let $F$ be defined in a neighborhood of $x \in R^N$ and let
