
\[
\|\xi_n\| \le \eta_n \|F(x_n)\| \le \bar\eta \|F(x_n)\|.
\]

The step acceptance condition in nsola implies that {‖F(x_n)‖} is a decreasing
sequence and therefore
\[
\|d_n\| = \|F'(x_n)^{-1}(\xi_n - F(x_n))\| \le m_f (1 + \bar\eta) \|F(x_n)\|.
\]

Therefore, by Assumption 8.2.1, F' is Lipschitz continuous on the line segment
[x_n, x_n + λd_n] whenever
\[
\lambda \le \bar\lambda_1 = r/(m_f (1 + \bar\eta) \|F(x_0)\|).
\]
¯
Lipschitz continuity of F' implies that if λ ≤ λ̄₁ then
\[
\begin{aligned}
\|F(x_n + \lambda d_n)\| &\le (1-\lambda)\|F(x_n)\| + \lambda\bar\eta\|F(x_n)\|
  + \frac{\gamma}{2}\lambda^2\|d_n\|^2 \\
&\le (1-\lambda)\|F(x_n)\| + \lambda\bar\eta\|F(x_n)\|
  + \frac{m_f^2\gamma(1+\bar\eta)^2\lambda^2\|F(x_n)\|^2}{2} \\
&\le (1-\lambda+\bar\eta\lambda)\|F(x_n)\|
  + \lambda\,\frac{m_f^2\gamma\lambda(1+\bar\eta)^2\|F(x_0)\|}{2}\,\|F(x_n)\| \\
&< (1-\alpha\lambda)\|F(x_n)\|,
\end{aligned}
\]

which will be implied by

\[
\lambda \le \bar\lambda_2 = \min\left\{\bar\lambda_1,\
\frac{2(1-\alpha-\bar\eta)}{(1+\bar\eta)^2 m_f^2 \gamma \|F(x_0)\|}\right\}.
\]

This shows that λ can be no smaller than σ₀λ̄₂, which completes the proof.
Now we can show directly that the sequence of Armijo iterates either is
unbounded or converges to a solution. Theorem 8.2.1 says even more. The




sequence converges to a root at which the standard assumptions hold, so in the
terminal phase of the iteration the step lengths are all 1 and the convergence
is q-quadratic.
Theorem 8.2.1. Let x₀ ∈ R^N and α ∈ (0, 1) be given. Assume that {x_n}
is given by Algorithm nsola, {η_n} satisfies (8.3), {x_n} is bounded, and that
Assumption 8.2.1 holds. Then {x_n} converges to a root x* of F at which the
standard assumptions hold, full steps are taken for n sufficiently large, and the
convergence behavior in the final phase of the iteration is that given by the local
theory for inexact Newton methods (Theorem 6.1.2).
Proof. If F(x_n) = 0 for some n, then we are done because Assumption
8.2.1 implies that the standard assumptions hold at x* = x_n. Otherwise
Lemma 8.2.1 implies that ‖F(x_n)‖ converges q-linearly to zero with q-factor at
most (1 − αλ̄).
The boundedness of the sequence {x_n} implies that a subsequence {x_{n_k}}
converges, say to x*. Since F is continuous, F(x*) = 0. Eventually
‖x_{n_k} − x*‖ < r, where r is the radius in Assumption 8.2.1, and therefore the
standard assumptions hold at x*.
Since the standard assumptions hold at x*, there is δ such that if x ∈ B(δ),
the Newton iteration with x₀ = x will remain in B(δ) and converge q-
quadratically to x*. Hence as soon as x_{n_k} ∈ B(δ), the entire sequence, not
just the subsequence, will remain in B(δ) and converge to x*. Moreover,
Theorem 6.1.2 applies and hence full steps can be taken.
At this stage we have shown that if the Armijo iteration fails to converge
to a root, then either the continuity properties of F' or the nonsingularity of F'
break down as the iteration progresses (Assumption 8.2.1 is violated) or the
iteration becomes unbounded. This is all that one could ask for in terms of
robustness of such an algorithm.
Thus far we have used for d the inexact Newton direction. It could well be
advantageous to use a chord or Broyden direction, for example. All that one
needs to make the theorems and proofs in § 8.2 hold is (8.2), which is easy to
verify, in principle, via a forward difference approximation to F'(x_n)d.
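Such a test is simple to sketch in MATLAB. In the fragment below, f is the
nonlinear residual, x the current point, d the trial direction, and eta the forcing
term; the function name checkinexact and the difference increment are
illustrative assumptions, not part of the codes in the collection.

% Test the inexact Newton condition (8.2),
%   || F'(x)d + F(x) || <= eta || F(x) ||,
% with a forward difference directional derivative.
% Illustrative sketch only; not one of the distributed codes.
function ok = checkinexact(f, x, d, eta)
h = 1.e-7;                               % difference increment
fx = feval(f, x);
dirder = (feval(f, x + h*d) - fx)/h;     % approximates F'(x)d
ok = (norm(dirder + fx) <= eta*norm(fx));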

8.3. Implementation of the Armijo rule
The MATLAB code nsola is a modification of nsolgm which provides for a
choice of several Krylov methods for computing an inexact Newton direction
and globalizes Newton's method with the Armijo rule. We use the l² norm in
that code and in the numerical results reported in § 8.4. If GMRES is used
as the linear solver, then the storage requirements are roughly the same as for
Algorithm nsolgm if one reuses the storage location for the finite difference
directional derivative to test for sufficient decrease.
In Exercise 8.5.9 you are asked to do this with nsol and, with each
iteration, numerically determine if the chord direction satisfies the inexact
Newton condition. In this case the storage requirements are roughly the same
as for Algorithm nsol. The iteration can be very costly if the Jacobian must be




evaluated and factored many times during the phase of the iteration in which
the approximate solution is far from the root.
We consider two topics in this section. The first, polynomial interpolation
as a way to choose λ, is applicable to all methods. The second, how Broyden's
method must be implemented, is again based on [67] and is more specialized.
Our Broyden–Armijo code does not explicitly check the inexact Newton
condition and thereby saves an evaluation of F. Evaluation of F'(x_c)d_c if the
full step is not accepted (here d_c is the Broyden direction) would not only verify
the inexact Newton condition but provide enough information for a two-point
parabolic line search. The reader is asked to pursue this in Exercise 8.5.10.

8.3.1. Polynomial line searches. In this section we consider an elabora-
tion of the simple line search scheme of reduction of the steplength by a fixed
factor. The motivation for this is that some problems respond well to one or
two reductions in the steplength by modest amounts (such as 1/2) and others
require many such reductions, but might respond well to a more aggressive
steplength reduction (by factors of 1/10, say). If one can model the decrease
in ‖F‖₂ as the steplength is reduced, one might expect to be able to better
estimate the appropriate reduction factor. In practice such methods usually
perform better than constant reduction factors.
If we have rejected k steps we have in hand the values
\[
\|F(x_n)\|_2,\ \|F(x_n + \lambda_1 d_n)\|_2,\ \ldots,\ \|F(x_n + \lambda_{k-1} d_n)\|_2.
\]

We can use this iteration history to model the scalar function
\[
f(\lambda) = \|F(x_n + \lambda d_n)\|_2^2
\]
with a polynomial and use the minimum of that polynomial as the next
steplength. We consider two ways of doing this that use second degree
polynomials which we compute using previously computed information.
After λ_c has been rejected and a model polynomial computed, we compute
the minimum λ_t of that polynomial analytically and set
(8.5)
\[
\lambda_+ =
\begin{cases}
\sigma_0\lambda_c & \text{if } \lambda_t < \sigma_0\lambda_c, \\
\sigma_1\lambda_c & \text{if } \lambda_t > \sigma_1\lambda_c, \\
\lambda_t & \text{otherwise.}
\end{cases}
\]
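In MATLAB the safeguarding rule (8.5) is a few lines of logic. The sketch
below is illustrative rather than a fragment of the distributed codes; sigma0
and sigma1 are the safeguarding parameters (.1 and .5 in nsola).

% Safeguarded steplength update (8.5): clamp the model minimizer
% lambdat into the interval [sigma0*lambdac, sigma1*lambdac].
function lambdap = safeguard(lambdat, lambdac, sigma0, sigma1)
if lambdat < sigma0*lambdac
    lambdap = sigma0*lambdac;
elseif lambdat > sigma1*lambdac
    lambdap = sigma1*lambdac;
else
    lambdap = lambdat;
end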

Two-point parabolic model. In this approach we use values of f and f'
at λ = 0 and the value of f at the current value of λ to construct a second degree
interpolating polynomial for f.
f(0) = ‖F(x_n)‖₂² is known. Clearly f'(0) can be computed as
(8.6)
\[
f'(0) = 2(F'(x_n)d_n)^T F(x_n) = 2F(x_n)^T (F'(x_n)d_n).
\]




Use of (8.6) requires evaluation of F'(x_n)d_n, which can be obtained by
examination of the final residual from GMRES or by expending an additional
function evaluation to compute F'(x_n)d_n with a forward difference. In any
case, our approximation to f'(0) should be negative. If it is not, it may
be necessary to compute a new search direction. This is a possibility with
directions other than inexact Newton directions, such as the Broyden direction.
The polynomial p(λ) such that p, p' agree with f, f' at 0 and p(λ_c) = f(λ_c)
is
\[
p(\lambda) = f(0) + f'(0)\lambda + c\lambda^2,
\]
where
\[
c = \frac{f(\lambda_c) - f(0) - f'(0)\lambda_c}{\lambda_c^2}.
\]
Our approximation of f'(0) < 0, so if f(λ_c) > f(0), then c > 0 and p(λ) has a
minimum at
\[
\lambda_t = -f'(0)/(2c) > 0.
\]
We then compute λ₊ with (8.5).
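The two-point step computation can be sketched as follows; the function name
parab2p is an illustrative assumption, fp0 is the approximation to f'(0) from
(8.6), and safeguard is the sketch of (8.5) given above.

% Two-point parabolic model: minimize
%   p(lambda) = f(0) + f'(0)*lambda + c*lambda^2
% and safeguard the result with (8.5). Illustrative sketch only.
function lambdap = parab2p(f0, fp0, flc, lambdac, sigma0, sigma1)
c = (flc - f0 - fp0*lambdac)/lambdac^2;
if c > 0
    lambdat = -fp0/(2*c);              % minimizer of the parabola
    lambdap = safeguard(lambdat, lambdac, sigma0, sigma1);
else
    lambdap = sigma1*lambdac;          % no interior minimum; be cautious
end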

Three-point parabolic model. An alternative to the two-point model that
avoids the need to approximate f'(0) is a three-point model, which uses f(0)
and the two most recently rejected steps to create the parabola. The MATLAB
code parab3p implements the three-point parabolic line search and is called
by the two codes nsola and brsola.
In this approach one evaluates f(0) and f(1) as before. If the full step is
rejected, we set λ = σ₁ and try again. After the second step is rejected, we
have the values
\[
f(0),\ f(\lambda_c),\ \text{and}\ f(\lambda_-),
\]
where λ_c and λ_− are the most recently rejected values of λ. The polynomial
that interpolates f at 0, λ_c, λ_− is
\[
p(\lambda) = f(0) + \frac{\lambda}{\lambda_c - \lambda_-}
\left(\frac{(\lambda - \lambda_-)(f(\lambda_c) - f(0))}{\lambda_c}
+ \frac{(\lambda_c - \lambda)(f(\lambda_-) - f(0))}{\lambda_-}\right).
\]
We must consider two situations. If
\[
p''(0) = \frac{2}{\lambda_c \lambda_- (\lambda_c - \lambda_-)}
\left(\lambda_-(f(\lambda_c) - f(0)) - \lambda_c(f(\lambda_-) - f(0))\right)
\]
is positive, then we set λ_t to the minimum of p,
\[
\lambda_t = -p'(0)/p''(0),
\]
and apply the safeguarding step (8.5) to compute λ₊. If p''(0) ≤ 0 one could
either set λ₊ to be the minimum of p on the interval [σ₀λ, σ₁λ] or reject the
parabolic model and simply set λ₊ to σ₀λ_c or σ₁λ_c. We take the latter approach
and set λ₊ = σ₁λ_c. In the MATLAB code nsola from the collection, we
implement this method with σ₀ = .1, σ₁ = .5.
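The logic can be sketched in a few lines of MATLAB. This is a sketch of the
formulas above under the same notation (lambdam standing for λ_−), not a
copy of the parab3p code from the collection; the function name parab3pt is
an illustrative assumption.

% Three-point parabolic model: interpolate f at 0, lambdac, lambdam,
% where lambdac and lambdam are the two most recently rejected
% steplengths, and safeguard the minimizer with (8.5).
function lambdap = parab3pt(f0, flc, flm, lambdac, lambdam, sigma0, sigma1)
denom = lambdac*lambdam*(lambdac - lambdam);
ppp = 2*(lambdam*(flc - f0) - lambdac*(flm - f0))/denom;    % p''(0)
if ppp <= 0
    lambdap = sigma1*lambdac;          % reject the parabolic model
    return
end
pp0 = (lambdac^2*(flm - f0) - lambdam^2*(flc - f0))/denom;  % p'(0)
lambdap = safeguard(-pp0/ppp, lambdac, sigma0, sigma1);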




Interpolation with a cubic polynomial has also been suggested [87], [86],
[63], [133]. Clearly, the polynomial modeling approach is heuristic. The
safeguarding and Theorem 8.2.1 provide insurance that progress will be made
in reducing ‖F‖.

8.3.2. Broyden's method. In Chapter 7 we implemented Broyden's
method at a storage cost of one vector for each iteration. The relation
y − B_c s = F(x₊) was not critical to this, and we may also incorporate a line
search at a cost of a bit of complexity in the implementation. As in § 7.3, our
approach is a nonlinear version of the method in [67].
The difference from the development in § 7.3 is that the simple relation
between the sequences {w_n} and {v_n} is changed. If a line search is used, then
\[
s_n = -\lambda_n B_n^{-1} F(x_n)
\]
and hence
(8.7)
\[
y_n - B_n s_n = F(x_{n+1}) - (1 - \lambda_n) F(x_n).
\]
If, using the notation in § 7.3, we set
\[
u_n = \frac{y_n - B_n s_n}{\|s_n\|_2}, \quad
v_n = \frac{s_n}{\|s_n\|_2}, \quad \text{and} \quad
w_n = \frac{B_n^{-1} u_n}{1 + v_n^T B_n^{-1} u_n},
\]
we can use (8.7) and (7.38) to obtain
(8.8)
\[
w_n = \lambda_n \frac{-d_{n+1}}{\|s_n\|_2} + (\lambda_n^{-1} - 1)\frac{s_n}{\|s_n\|_2}
    = \lambda_n \frac{-s_{n+1}/\lambda_{n+1}}{\|s_n\|_2} + (\lambda_n^{-1} - 1)\frac{s_n}{\|s_n\|_2},
\]
where
\[
d_{n+1} = -B_{n+1}^{-1} F(x_{n+1})
\]
is the search direction used to compute x_{n+1}. Note that (8.8) reduces to (7.41)
when λ_n = 1 and λ_{n+1} = 1 (so d_{n+1} = s_{n+1}).
We can use the first equality in (8.8) and the relation
\[
d_{n+1} = -\left(I - \frac{w_n s_n^T}{\|s_n\|_2}\right) B_n^{-1} F(x_{n+1})
\]
to solve for d_{n+1} and obtain an analog of (7.44),
(8.9)
\[
d_{n+1} = -\frac{\|s_n\|_2^2\, B_n^{-1} F(x_{n+1})
  - (1 - \lambda_n)\, s_n^T B_n^{-1} F(x_{n+1})\, s_n}
  {\|s_n\|_2^2 + \lambda_n\, s_n^T B_n^{-1} F(x_{n+1})}.
\]
We then use the second equality in (8.8) to construct B_c^{-1}F(x₊) as we did in
Algorithm brsol. Verification of (8.8) and (8.9) is left to Exercise 8.5.5.




Algorithm brsola uses these ideas. We do not include details of the line
search or provide for restarts or a limited memory implementation. The
MATLAB implementation, provided in the collection of MATLAB codes, uses
a three-point parabolic line search and performs a restart when storage is
exhausted, as Algorithm brsol does.
It is possible that the Broyden direction fails to satisfy (8.2). In fact, there
is no guarantee that the Broyden direction is a direction of decrease for ‖F‖₂.
The MATLAB code returns with an error message if more than a fixed number
of reductions in the step length are needed. Another approach would be to
compute F'(x_c)d numerically if a full step is rejected and then test (8.2) before
beginning the line search. As a side effect, an approximation of f'(0) can be
obtained at no additional cost and a parabolic line search based on f(0), f'(0),
and f(1) can be used. In Exercise 8.5.10 the reader is asked to fully develop
and implement these ideas.
Algorithm 8.3.1. brsola(x, F, τ, maxit, nmax)
1. Initialize:
   F_c = F(x), r₀ = ‖F(x)‖₂, n = −1,
   d = −F(x), compute λ₀, s₀ = λ₀d.
2. Do while n ≤ maxit
   (a) n = n + 1
   (b) x = x + s_n
   (c) Evaluate F(x)
   (d) If ‖F(x)‖₂ ≤ τ_r r₀ + τ_a exit.
   (e) i. z = −F(x)
       ii. for j = 0, n − 1
           a = −λ_j/λ_{j+1}, b = 1 − λ_j
           z = z + (a s_{j+1} + b s_j) s_j^T z/‖s_j‖₂²
   (f) d = (‖s_n‖₂² z + (1 − λ_n) s_n^T z s_n)/(‖s_n‖₂² − λ_n s_n^T z)
   (g) Compute λ_{n+1} with a line search.
   (h) Set s_{n+1} = λ_{n+1} d, store s_{n+1} and λ_{n+1}.
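In MATLAB, steps 2(e) and 2(f) might be sketched as follows, assuming the
steps s₀, ..., s_n are stored as the columns S(:,1), ..., S(:,n+1) and the
steplengths as lambda(1), ..., lambda(n+1); the one-based indexing and the
array names are illustrative assumptions, and this is not the loop from the
distributed brsola.

% Steps 2(e)-2(f): apply the stored Broyden updates to z = -F(x) in
% product form, then assemble the new direction d via (8.9).
z = -fx;                                 % fx = F(x) at the current point
for j = 0:n-1
    a = -lambda(j+1)/lambda(j+2);        % a = -lambda_j/lambda_{j+1}
    b = 1 - lambda(j+1);                 % b = 1 - lambda_j
    sj = S(:,j+1);                       % s_j
    z = z + (a*S(:,j+2) + b*sj)*(sj'*z)/norm(sj)^2;
end
sn = S(:,n+1);                           % s_n
d = (norm(sn)^2*z + (1 - lambda(n+1))*(sn'*z)*sn) / ...
    (norm(sn)^2 - lambda(n+1)*(sn'*z));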

Since d_{n+1} and s_{n+1} can occupy the same location, the storage require-
ments of Algorithm brsola would appear to be essentially the same as those
of brsol. However, computation of the steplength requires storage of both the
current point x and the trial point x + λs before a step can be accepted. So,
the storage of the globalized methods exceeds that of the local method by one
vector. The MATLAB implementation of brsola in the collection illustrates
this point.

8.4. Examples for Newton–Armijo
The MATLAB code nsola is a modification of nsolgm that incorporates the
three-point parabolic line search from § 8.3.1 and also changes η using (6.20)



once the iteration is near the root. We compare this strategy, using GMRES
as the linear solver, with the constant η method as we did in § 6.4, with γ = .9.
As before, in all the figures we plot ‖F(x_n)‖₂/‖F(x₀)‖₂ against the number
of function evaluations required by all line searches and inner and outer
iterations to compute x_n. Counts of function evaluations corresponding to
outer iterations are indicated by circles. We base our absolute error criterion
on the norm ‖·‖₂/√N as we did in § 6.4.2 and § 7.4.
In § 8.4.3 we compare the Broyden–Armijo method with Newton-GMRES
for those problems for which both methods were successful. We caution the
reader now, and will repeat the caution later, that if the initial iterate is far
from the solution, an inexact Newton method such as Newton-GMRES can
succeed in many cases in which a quasi-Newton method can fail, because the
quasi-Newton direction may not be an inexact Newton direction. However,
when both methods succeed, the quasi-Newton method, which requires a single
function evaluation for each outer iteration when full steps can be taken, may
well be the most efficient.

8.4.1. Inverse tangent function. Since we have easy access to analytic
derivatives in this example, we can use the two-point parabolic line search.
We compare the two-point parabolic line search with the constant reduction
search (σ₀ = σ₁ = .5) for the arctan function. In Fig. 8.2 we plot the iteration
history corresponding to the parabolic line search with the solid line and that
for the constant reduction with the dashed line. We use x₀ = 10 as the initial
iterate with τ_r = τ_a = 10⁻⁸. The parabolic line search required 7 outer iterates
and 21 function evaluations, in contrast with the constant reduction search's 11
outer iterates and 33 function evaluations. In the first outer iteration, both line
searches take three steplength reductions. However, the parabolic line search
takes only one reduction in the next three iterations and none thereafter. The
constant reduction line search took three reductions in the first two outer
iterations and two each in the following two.
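For readers who want to experiment without the code collection, the following
self-contained MATLAB script runs a scalar Newton–Armijo iteration with a
two-point parabolic search on F(x) = arctan(x). It is a toy illustration of the
ideas in § 8.3.1, not the nsola run reported above, so its iteration counts need
not match those in Fig. 8.2.

% Scalar Newton-Armijo for F(x) = atan(x), two-point parabolic search.
x = 10; sigma0 = .1; sigma1 = .5; alpha = 1.e-4;
fx = atan(x); stoptol = 1.e-8*abs(fx) + 1.e-8;   % tau_r |F(x_0)| + tau_a
while abs(fx) > stoptol
    d = -(1 + x^2)*fx;                   % Newton direction -F(x)/F'(x)
    f0 = fx^2; fp0 = 2*fx*d/(1 + x^2);   % f(0) and f'(0) for f = F^2
    lambda = 1; ft = atan(x + lambda*d)^2;
    while ft >= (1 - alpha*lambda)^2*f0  % sufficient decrease test
        c = (ft - f0 - fp0*lambda)/lambda^2;
        lambdat = -fp0/(2*c);            % parabola minimizer
        lambda = min(sigma1*lambda, max(sigma0*lambda, lambdat));
        ft = atan(x + lambda*d)^2;
    end
    x = x + lambda*d; fx = atan(x);
end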

8.4.2. Convection-diffusion equation. We consider a much more diffi-