less than twice that of an evaluation of F. Modify nsol to accept this
analytic Jacobian rather than evaluate a difference Jacobian. How do
the execution times and flop counts change?

5.7.22. Modify dirder by setting the forward difference step $h = 10^{-2}$. How
does this change affect the convergence rate observations that you have
made?
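A minimal sketch of such a forward difference directional derivative, with the step $h$ exposed as an input so it can be set to $10^{-2}$, follows. This is our own illustration of the idea, not the dirder code itself; the function handle f and the precomputed value f0 = f(x) are assumptions.

    % Forward difference approximation z of the directional derivative
    % F'(x)*w. The step h is an input; a typical default would be on the
    % order of sqrt(eps), and this exercise sets it to 1.e-2.
    function z = fddirder(x, w, f, f0, h)
    % f0 = f(x) is passed in to avoid recomputing it.
    if norm(w) == 0
        z = zeros(size(x));
        return
    end
    % Scale the step to the size of x in the direction w.
    xs = (x'*w)/norm(w);
    if xs ~= 0
        h = h*max(abs(xs), 1)*sign(xs);
    end
    h = h/norm(w);
    z = (f(x + h*w) - f0)/h;
    end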








5.7.23. Add noise to your function evaluation with a random number generator,
such as the MATLAB rand function. If the size of the noise is
$O(10^{-4})$ and you make the appropriate adjustments to dirder, how are
the convergence rates different? What is an appropriate termination
criterion? At what point does the iteration stagnate?
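One way to set up this experiment is sketched below, reusing the fddirder sketch from Exercise 5.7.22; the wrapper and the step choice are our own illustration. The rule $h \approx \sqrt{\text{noise size}}$ comes from balancing the $O(h)$ truncation error of a forward difference against the $O(\text{noise}/h)$ error from the noise, which for noise of size $10^{-4}$ gives $h = 10^{-2}$.

    % Wrap a nonlinear function f: R^N -> R^N with uniform noise of
    % size O(1.e-4).
    noise_size = 1.e-4;
    fnoisy = @(x) f(x) + noise_size*(2*rand(length(x),1) - 1);

    % Balance truncation error O(h) against noise error O(noise_size/h)
    % by taking h = sqrt(noise_size) = 1.e-2 in the difference step.
    h = sqrt(noise_size);
    z = fddirder(x, w, fnoisy, fnoisy(x), h);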

5.7.24. Assume that the standard assumptions hold, that the cost of a function
evaluation is $O(N^2)$ floating-point operations, the cost of a Jacobian is
$O(N)$ function evaluations, and that $x_0$ is near enough to $x^*$ so that the
Newton iteration converges q-quadratically to $x^*$. Show that the number
$n$ of Newton iterations needed to obtain $\|e_n\| \le \epsilon \|e_0\|$ is $O(\log(\epsilon))$ and
that the number of floating-point operations required is $O(N^3 \log(\epsilon))$.
What about the chord method?

5.7.25. Develop a sparse version of nsol for problems with tridiagonal Jacobians.
In MATLAB, for example, you could modify diffjac to use the
techniques of [47] and nsol to use the sparse matrix factorization in
MATLAB. Apply your code to the central difference discretization of the
two-point boundary value problem

$-u'' = \sin(u) + f(x), \quad u(0) = u(1) = 0.$

Here, the solution is $u^*(x) = x(1-x)$, and

$f(x) = 2 - \sin(x(1-x)).$

Let $u_0 = 0$ be the initial iterate.
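A sketch of the discretized problem setup follows; it is our own illustration (the number of mesh points n and the use of spdiags are assumptions, and nsol itself is not shown). Since $u^*$ is quadratic, the central second difference is exact, so $u^*$ restricted to the mesh is an exact root of the discrete system.

    % Central difference discretization of -u'' = sin(u) + f(x) with
    % u(0) = u(1) = 0 on n interior mesh points.
    n = 100;
    hm = 1/(n + 1);                 % mesh width
    x = (1:n)'*hm;                  % interior mesh points
    fx = 2 - sin(x.*(1 - x));
    % Tridiagonal second difference matrix, stored as a MATLAB sparse matrix.
    D2 = spdiags(ones(n,1)*[-1 2 -1], -1:1, n, n)/hm^2;
    % Nonlinear residual whose root is sought; its Jacobian
    % D2 - diag(cos(u)) is tridiagonal.
    bvp = @(u) D2*u - sin(u) - fx;
    u0 = zeros(n, 1);               % initial iterate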

5.7.26. Suppose you try to find an eigenvector-eigenvalue pair $(\phi, \lambda)$ of an $N \times N$
matrix $A$ by solving the system of $N+1$ nonlinear equations

(5.23) $F(\phi, \lambda) = \begin{pmatrix} A\phi - \lambda\phi \\ \phi^T\phi - 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$

for the vector $x = (\phi^T, \lambda)^T \in R^{N+1}$. What is the Jacobian of this
system? If $x = (\phi, \lambda) \in R^{N+1}$ is an eigenvector-eigenvalue pair, when is
$F'(x)$ nonsingular? Relate the application of Newton's method to (5.23)
to the inverse power method. See [150] for more development of this
idea.
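As a starting point for the first question, one can check that differentiating $F$ blockwise with respect to $\phi$ and $\lambda$ gives the $(N+1) \times (N+1)$ matrix

$F'(\phi, \lambda) = \begin{pmatrix} A - \lambda I & -\phi \\ 2\phi^T & 0 \end{pmatrix}.$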








Chapter 6

Inexact Newton Methods




Theorem 5.4.1 describes how errors in the derivative/function affect the
progress in the Newton iteration. Another way to look at this is to ask how
an approximate solution of the linear equation for the Newton step affects the
iteration. This was the view taken in [55], where inexact Newton methods in
which the step satisfies

(6.1) $\|F'(x_c)s + F(x_c)\| \le \eta_c \|F(x_c)\|$

are considered. Any approximate step is accepted provided that the relative
residual of the linear equation is small. This is quite useful because conditions
like (6.1) are precisely the small linear residual termination conditions for
iterative solution of the linear system for the Newton step. Such methods are
not new. See [145] and [175], for example, for discussion of these ideas in the
context of the classical stationary iterative methods. In most of this chapter
we will focus on the approximate solution of the equation for the Newton step
by GMRES, but the other Krylov subspace methods discussed in Chapters 2
and 3 and elsewhere can also be used. We follow [69] and refer to the term $\eta_c$
on the right-hand side of (6.1) as the forcing term.

6.1. The basic estimates
We will discuss specific implementation issues in § 6.2. Before that we will
give the basic result from [55] to show how a step that satisfies (6.1) affects
the convergence. We will present this result in two parts. In § 6.1.1 we give a
straightforward analysis that uses the techniques of Chapter 5. In § 6.1.2 we
show how the requirements of the simple analysis can be changed to admit a
more aggressive (i.e., larger) choice of the parameter $\eta$.

6.1.1. Direct analysis. The proof and application of Theorem 6.1.1 should
be compared to that of Theorem 5.1.1 and the other results in Chapter 5.
Theorem 6.1.1. Let the standard assumptions hold. Then there are $\delta$ and
$K_I$ such that if $x_c \in B(\delta)$, $s$ satisfies (6.1), and $x_+ = x_c + s$, then

(6.2) $\|e_+\| \le K_I(\|e_c\| + \eta_c)\|e_c\|.$






Proof. Let $\delta$ be small enough so that the conclusions of Lemma 4.3.1 and
Theorem 5.1.1 hold. To prove the first assertion (6.2) note that if²

$r = -F'(x_c)s - F(x_c)$

is the linear residual, then

$s + F'(x_c)^{-1}F(x_c) = -F'(x_c)^{-1}r$

and

(6.3) $e_+ = e_c + s = e_c - F'(x_c)^{-1}F(x_c) - F'(x_c)^{-1}r.$

Now, (6.1), (4.7), and (4.6) imply that

$\|s + F'(x_c)^{-1}F(x_c)\| \le \|F'(x_c)^{-1}\|\eta_c\|F(x_c)\| \le 4\kappa(F'(x^*))\eta_c\|e_c\|.$

Hence, using (6.3) and Theorem 5.1.1,

$\|e_+\| \le \|e_c - F'(x_c)^{-1}F(x_c)\| + 4\kappa(F'(x^*))\eta_c\|e_c\| \le K\|e_c\|^2 + 4\kappa(F'(x^*))\eta_c\|e_c\|,$

where $K$ is the constant from (5.2). If we set

$K_I = K + 4\kappa(F'(x^*)),$

the proof is complete.
We summarize the implications of Theorem 6.1.1 for iterative methods in
the next result, which is a direct consequence of (6.2) and will suffice to explain
most computational observations.
Theorem 6.1.2. Let the standard assumptions hold. Then there are $\delta$ and
$\bar\eta$ such that if $x_0 \in B(\delta)$ and $\{\eta_n\} \subset [0, \bar\eta]$, then the inexact Newton iteration

$x_{n+1} = x_n + s_n,$

where

$\|F'(x_n)s_n + F(x_n)\| \le \eta_n\|F(x_n)\|,$

converges q-linearly to $x^*$. Moreover
• if $\eta_n \to 0$ the convergence is q-superlinear, and
• if $\eta_n \le K_\eta\|F(x_n)\|^p$ for some $K_\eta > 0$ the convergence is q-superlinear
with q-order $1 + p$.

² Our definition of $r$ is consistent with the idea that the residual in a linear equation is $b - Ax$. In
[55], $r = F'(x_c)s + F(x_c)$.









Proof. Let $\delta$ be small enough so that (6.2) holds for $x_c \in B(\delta)$. Reduce $\delta$
and $\bar\eta$ if needed so that

$K_I(\delta + \bar\eta) < 1,$

where $K_I$ is from (6.2). Then if $n \ge 0$ and $x_n \in B(\delta)$ we have

$\|e_{n+1}\| \le K_I(\|e_n\| + \eta_n)\|e_n\| \le K_I(\delta + \bar\eta)\|e_n\| < \|e_n\|.$

This proves q-linear convergence with a q-factor of $K_I(\delta + \bar\eta)$.
If $\eta_n \to 0$ then q-superlinear convergence follows from the definition. If

$\eta_n \le K_\eta\|F(x_n)\|^p,$

then we may use (4.7) and (6.2) to conclude

$\|e_{n+1}\| \le K_I(\|e_n\|^{1-p} + K_\eta 2^p\|F'(x^*)\|^p)\|e_n\|^{1+p},$

which completes the proof.

6.1.2. Weighted norm analysis. Since $K_I = O(\kappa(F'(x^*)))$, one might
conclude from Theorem 6.1.2 that if $F'(x^*)$ is ill conditioned, very small forcing
terms must be used. This is not the case, and the purpose of this subsection
is to describe more accurately the necessary restrictions on the forcing terms.
The results here differ from those in § 6.1.1 in that no restriction is put on
the sequence $\{\eta_n\} \subset [0, 1)$ other than requiring that 1 not be an accumulation
point.
Theorem 6.1.3. Let the standard assumptions hold. Then there is $\delta$ such
that if $x_c \in B(\delta)$, $s$ satisfies (6.1), $x_+ = x_c + s$, and $\eta_c \le \eta < \bar\eta < 1$, then

(6.4) $\|F'(x^*)e_+\| \le \bar\eta\|F'(x^*)e_c\|.$

Proof. To prove (6.4) note that Theorem 4.0.1 implies that

$\|F(x_c)\| \le \|F'(x^*)e_c\| + \frac{\gamma\|e_c\|^2}{2}.$

Since

$\|e_c\| = \|F'(x^*)^{-1}F'(x^*)e_c\| \le \|F'(x^*)^{-1}\|\|F'(x^*)e_c\|,$

we have, with

$M_0 = \frac{\gamma\|F'(x^*)^{-1}\|}{2},$

(6.5) $\|F(x_c)\| \le (1 + M_0\delta)\|F'(x^*)e_c\|.$

Now,

$F'(x^*)e_+ = F'(x^*)(e_c + s) = F'(x^*)(e_c - F'(x_c)^{-1}F(x_c) - F'(x_c)^{-1}r).$





By Theorem 5.1.1,

$\|F'(x^*)(e_c - F'(x_c)^{-1}F(x_c))\| \le K\|F'(x^*)\|\|e_c\|^2.$

Hence,

(6.6) $\|F'(x^*)e_+\| \le \|F'(x^*)F'(x_c)^{-1}r\| + K\|F'(x^*)\|\|e_c\|^2.$

Since

$\|F'(x^*)F'(x_c)^{-1}r\| \le \|r\| + \|(F'(x^*) - F'(x_c))F'(x_c)^{-1}r\| \le (1 + 2\gamma\|F'(x^*)^{-1}\|\|e_c\|)\|r\|,$

we may set

$M_1 = 2\gamma\|F'(x^*)^{-1}\| \quad\text{and}\quad M_2 = K\|F'(x^*)\|$

and obtain, using (4.7) and (6.5),

(6.7) $\|F'(x^*)e_+\| \le (1 + M_1\delta)\|r\| + M_2\delta\|e_c\| \le (1 + M_1\delta)(1 + M_0\delta)\eta_c\|F'(x^*)e_c\| + M_2\delta\|F'(x^*)^{-1}\|\|F'(x^*)e_c\| \le ((1 + M_1\delta)(1 + M_0\delta)\eta + M_2\delta\|F'(x^*)^{-1}\|)\|F'(x^*)e_c\|.$

Now let $\delta$ be small enough so that

$(1 + M_1\delta)(1 + M_0\delta)\eta + M_2\delta\|F'(x^*)^{-1}\| \le \bar\eta,$

and the proof is complete.
Note that the distance $\delta$ from the initial iterate to the solution may have
to be smaller for (6.4) to hold than for (6.2). However, (6.4) is remarkable in its
assertion that any method that produces a step with a linear residual less than
that of the zero step will reduce the norm of the error if the error is measured
with the weighted norm

$\|\cdot\|_* = \|F'(x^*)\,\cdot\,\|.$

In fact, Theorem 6.1.3 asserts q-linear convergence in the weighted norm if the
initial iterate is sufficiently near $x^*$. This is made precise in Theorem 6.1.4.
The application of Theorem 6.1.3 described in Theorem 6.1.4 differs from
Theorem 6.1.2 in that we do not demand that $\{\eta_n\}$ be bounded away from 1
by a sufficiently large amount, just that 1 not be an accumulation point. The
importance of this theorem for implementation is that choices of the sequence
of forcing terms $\{\eta_n\}$ (such as $\eta_n = .5$ for all $n$) that try to minimize the
number of inner iterations are completely justified by this result if the initial
iterate is sufficiently near the solution. We make such a choice in one of the
examples considered in § 6.4 and compare it to a modification of a choice from
[69] that decreases $\eta_n$ rapidly with a view toward minimizing the number of
outer iterations.
Theorem 6.1.4. Let the standard assumptions hold. Then there is $\delta$ such
that if $x_0 \in B(\delta)$ and $\{\eta_n\} \subset [0, \eta]$ with $\eta < \bar\eta < 1$, then the inexact Newton
iteration

$x_{n+1} = x_n + s_n,$

where

$\|F'(x_n)s_n + F(x_n)\| \le \eta_n\|F(x_n)\|,$

converges q-linearly with respect to $\|\cdot\|_*$ to $x^*$. Moreover
• if $\eta_n \to 0$ the convergence is q-superlinear, and
• if $\eta_n \le K_\eta\|F(x_n)\|^p$ for some $K_\eta > 0$ the convergence is q-superlinear
with q-order $1 + p$.
Proof. The proof uses both (6.4) and (6.2). Our first task is to relate the
norms $\|\cdot\|$ and $\|\cdot\|_*$ so that we can estimate $\delta$. Let $\delta_0$ be such that (6.4) holds
for $\|e_c\| < \delta_0$.
For all $x \in R^N$,

(6.8) $\|x\|_* \le \|F'(x^*)\|\|x\| \le \kappa(F'(x^*))\|x\|_*.$

Note that $\|e\| < \delta_0$ if

$\|e\|_* < \delta_* = \|F'(x^*)^{-1}\|^{-1}\delta_0.$

Set $\delta = \|F'(x^*)\|^{-1}\delta_*$. Then $\|e_0\|_* < \delta_*$ if $\|e_0\| < \delta$.
Our proof does not rely on the (possibly false) assertions that $\|e_n\| < \delta$ for
all $n$ or that $x_n \to x^*$ q-linearly with respect to the unweighted norm. Rather
we note that (6.4) implies that if $\|e_n\|_* < \delta_*$ then

(6.9) $\|e_{n+1}\|_* \le \bar\eta\|e_n\|_* < \delta_*,$

and hence $x_n \to x^*$ q-linearly with respect to the weighted norm. This proves
the first assertion.
To prove the assertions on q-superlinear convergence, note that since
$x_n \to x^*$, eventually $\|e_n\|$ will be small enough so that the conclusions of
Theorem 6.1.1 hold. The superlinear convergence results then follow exactly
as in the proof of Theorem 6.1.2.
We close this section with some remarks on the two types of results in
Theorems 6.1.2 and 6.1.4. The most significant difference is the norm in which
q-linear convergence takes place. q-linear convergence with respect to the
weighted norm $\|\cdot\|_*$ is equivalent to q-linear convergence of the sequence of
nonlinear residuals $\{\|F(x_n)\|\}$. We state this result as Proposition 6.1.1 and
leave the proof to the reader in Exercise 6.5.2. The implications for the choice of
the forcing terms $\{\eta_n\}$ are also different. While Theorem 6.1.2 might lead one
to believe that a small value of $\eta_n$ is necessary for convergence, Theorem 6.1.4
shows that the sequence of forcing terms need only be kept bounded away from
1. A constant sequence such as $\eta_n = .5$ may well suffice for linear convergence,
but at a price of a greater number of outer iterations. We will discuss other
factors in the choice of forcing terms in § 6.3.
Proposition 6.1.1. Let the standard assumptions hold and let $x_n \to x^*$.
Then $\|F(x_n)\|$ converges q-linearly to 0 if and only if $\|e_n\|_*$ does.
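For illustration, a minimal MATLAB sketch of a residual-based forcing term in the spirit of [69] follows; the parameter values gamma and eta_max are our own assumptions, not prescriptions from the text. A constant choice such as eta = 0.5 needs no code; the rule below instead shrinks $\eta_n$ with the most recent nonlinear residual reduction.

    % Forcing term in the spirit of [69]: proportional to the square of
    % the most recent residual reduction norm(F(x_n))/norm(F(x_{n-1})).
    % The cap eta_max keeps the sequence bounded away from 1, as the
    % theory requires; gamma and eta_max are illustrative values.
    function eta = forcing(normFnew, normFold)
    gamma = 0.9; eta_max = 0.9;
    eta = min(eta_max, gamma*(normFnew/normFold)^2);
    end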

6.1.3. Errors in the function. In this section, we state two theorems on
inexact local improvement. We leave the proofs, which are direct analogs to
the proofs of Theorems 6.1.1 and 6.1.3, to the reader as exercises. Note that
errors in the derivative, such as those arising from a difference approximation
of the action of $F'(x_c)$ on a vector, can be regarded as part of the inexact
solution of the equation for the Newton step. See [55] or Proposition 6.2.1 and
its proof for an illustration of this point. In the statements below, $\epsilon(x_c)$
denotes the error in the evaluation of $F$ at $x_c$.
Theorem 6.1.5. Let the standard assumptions hold. Then there are $\delta$ and
$K_I$ such that if $x_c \in B(\delta)$, $s$ satisfies

(6.10) $\|F'(x_c)s + F(x_c) + \epsilon(x_c)\| \le \eta_c\|F(x_c) + \epsilon(x_c)\|$

and $x_+ = x_c + s$, then

(6.11) $\|e_+\| \le K_I((\|e_c\| + \eta_c)\|e_c\| + \|\epsilon(x_c)\|).$

Theorem 6.1.6. Let the standard assumptions hold. Then there is $\delta$ such
that if $x_c \in B(\delta)$, $s$ satisfies (6.10), $x_+ = x_c + s$, and $\eta_c \le \eta < \bar\eta < 1$, then
there is $K_E$ such that

(6.12) $\|F'(x^*)e_+\| \le \bar\eta\|F'(x^*)e_c\| + K_E\|\epsilon(x_c)\|.$

6.2. Newton-iterative methods
A Newton-iterative method realizes (6.1) with an iterative method for the linear
system for the Newton step, terminating when the relative linear residual is
smaller than $\eta_c$ (i.e., when (6.1) holds). The name of the method indicates
which linear iterative method is used; for example, Newton-SOR, Newton-CG,
and Newton-GMRES use SOR, CG, and GMRES, respectively, to solve the
linear system. This naming convention is taken from [175] and [145]. These
methods have also been called truncated Newton methods [56], [136] in the
context of optimization. Typically the nonlinear iteration that generates the
sequence $\{x_n\}$ is called the outer iteration and the linear iteration that
generates the approximations to the steps is called the inner iteration.
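A minimal Newton-GMRES sketch follows, assuming MATLAB's gmres as the inner iteration, a fixed forcing term eta, and a function handle jac that returns the Jacobian matrix. This is our own illustration of the inner/outer coupling in (6.1), not the implementation developed in this chapter.

    % Inexact Newton iteration with GMRES as the inner solver. The inner
    % iteration terminates when the relative linear residual is below
    % eta, so that the step satisfies (6.1).
    function x = newton_gmres(f, jac, x, eta, tol, maxit)
    for n = 1:maxit
        fx = f(x);
        if norm(fx) <= tol
            return
        end
        % Solve F'(x)s = -F(x) to relative residual eta; [] means no
        % restarts, and length(x) caps the inner iteration count.
        % Requesting the flag output suppresses gmres's printed messages.
        [s, flag] = gmres(jac(x), -fx, [], eta, length(x));
        x = x + s;
    end
    end

For a matrix-free variant, the first argument of gmres can instead be a function handle that applies $F'(x)$ to a vector, for example by a forward difference as in dirder.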
