\[ \|\xi_n\| \le \eta_n \|F(x_n)\| \le \bar\eta \|F(x_n)\|. \]

The step acceptance condition in nsola implies that $\{\|F(x_n)\|\}$ is a decreasing sequence and therefore

\[ \|d_n\| = \|F'(x_n)^{-1}(\xi_n - F(x_n))\| \le m_f (1 + \bar\eta) \|F(x_n)\|. \]

Therefore, by Assumption 8.2.1, $F'$ is Lipschitz continuous on the line segment $[x_n, x_n + \lambda d_n]$ whenever

\[ \lambda \le \bar\lambda_1 = r/(m_f \|F(x_0)\|). \]

Lipschitz continuity of $F'$ implies that if $\lambda \le \bar\lambda_1$ then

\[
\begin{aligned}
\|F(x_n + \lambda d_n)\| &\le (1 - \lambda)\|F(x_n)\| + \lambda\bar\eta\|F(x_n)\| + \frac{\gamma}{2}\lambda^2\|d_n\|^2 \\
&\le (1 - \lambda)\|F(x_n)\| + \lambda\bar\eta\|F(x_n)\| + \frac{m_f^2\gamma(1+\bar\eta)^2\lambda^2\|F(x_n)\|^2}{2} \\
&\le (1 - \lambda + \bar\eta\lambda)\|F(x_n)\| + \frac{m_f^2\gamma\lambda(1+\bar\eta)^2\|F(x_0)\|}{2}\,\lambda\|F(x_n)\| \\
&< (1 - \alpha\lambda)\|F(x_n)\|,
\end{aligned}
\]

which will be implied by

\[ \lambda \le \bar\lambda_2 = \min\left\{\bar\lambda_1,\ \frac{2(1-\alpha-\bar\eta)}{(1+\bar\eta)^2 m_f^2 \gamma \|F(x_0)\|}\right\}. \]

This shows that $\lambda$ can be no smaller than $\sigma_0\bar\lambda_2$, which completes the proof.

Now we can show directly that the sequence of Armijo iterates either is

unbounded or converges to a solution. Theorem 8.2.1 says even more. The

sequence converges to a root at which the standard assumptions hold, so in the

terminal phase of the iteration the step lengths are all 1 and the convergence

is q-quadratic.

Theorem 8.2.1. Let $x_0 \in \mathbb{R}^N$ and $\alpha \in (0, 1)$ be given. Assume that $\{x_n\}$ is given by Algorithm nsola, $\{\eta_n\}$ satisfies (8.3), $\{x_n\}$ is bounded, and that Assumption 8.2.1 holds. Then $\{x_n\}$ converges to a root $x^*$ of $F$ at which the standard assumptions hold, full steps are taken for $n$ sufficiently large, and the convergence behavior in the final phase of the iteration is that given by the local theory for inexact Newton methods (Theorem 6.1.2).

Proof. If $F(x_n) = 0$ for some $n$, then we are done because Assumption 8.2.1 implies that the standard assumptions hold at $x^* = x_n$. Otherwise Lemma 8.2.1 implies that $\|F(x_n)\|$ converges q-linearly to zero with q-factor at most $(1 - \alpha\bar\lambda)$.

The boundedness of the sequence $\{x_n\}$ implies that a subsequence $\{x_{n_k}\}$ converges, say to $x^*$. Since $F$ is continuous, $F(x^*) = 0$. Eventually $\|x_{n_k} - x^*\| < r$, where $r$ is the radius in Assumption 8.2.1, and therefore the standard assumptions hold at $x^*$.

Since the standard assumptions hold at $x^*$, there is $\delta$ such that if $x \in B(\delta)$, the Newton iteration with $x_0 = x$ will remain in $B(\delta)$ and converge q-quadratically to $x^*$. Hence as soon as $x_{n_k} \in B(\delta)$, the entire sequence, not just the subsequence, will remain in $B(\delta)$ and converge to $x^*$. Moreover, Theorem 6.1.2 applies and hence full steps can be taken.

At this stage we have shown that if the Armijo iteration fails to converge to a root, then either the continuity properties of $F$ or the nonsingularity of $F'$ break down as the iteration progresses (Assumption 8.2.1 is violated) or the iteration becomes unbounded. This is all that one could ask for in terms of robustness of such an algorithm.

Thus far we have used for $d$ the inexact Newton direction. It could well be advantageous to use a chord or Broyden direction, for example. All that one needs to make the theorems and proofs in § 8.2 hold is (8.2), which is easy to verify, in principle, via a forward difference approximation to $F'(x_n)d$.
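For example, one such check might look like the following MATLAB fragment, a minimal sketch in which the function handle F, point x, direction d, forcing term eta, and difference increment h are all assumed to be in hand:

% Sketch: test the inexact Newton condition (8.2) for a direction d
% with a forward difference approximation to F'(x)*d.
Fx = F(x);                          % current residual
w  = (F(x + h*d) - Fx)/h;           % w is approximately F'(x)*d
ok = norm(w + Fx) <= eta*norm(Fx);  % true if (8.2) is satisfied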

8.3. Implementation of the Armijo rule

The MATLAB code nsola is a modification of nsolgm which provides for a choice of several Krylov methods for computing an inexact Newton direction and globalizes Newton's method with the Armijo rule. We use the $l^2$ norm in that code and in the numerical results reported in § 8.4. If GMRES is used as the linear solver, then the storage requirements are roughly the same as for Algorithm nsolgm if one reuses the storage location for the finite difference directional derivative to test for sufficient decrease.

In Exercise 8.5.9 you are asked to do this with nsol and, with each

iteration, numerically determine if the chord direction satisfies the inexact

Newton condition. In this case the storage requirements are roughly the same

as for Algorithm nsol. The iteration can be very costly if the Jacobian must be

evaluated and factored many times during the phase of the iteration in which

the approximate solution is far from the root.

We consider two topics in this section. The first, polynomial interpolation as a way to choose $\lambda$, is applicable to all methods. The second, how Broyden's method must be implemented, is again based on [67] and is more specialized.

Our Broyden–Armijo code does not explicitly check the inexact Newton condition and thereby saves an evaluation of $F$. Evaluation of $F'(x_c)d_c$ if the full step is not accepted (here $d_c$ is the Broyden direction) would not only verify the inexact Newton condition but provide enough information for a two-point parabolic line search. The reader is asked to pursue this in Exercise 8.5.10.

8.3.1. Polynomial line searches. In this section we consider an elaboration of the simple line search scheme of reduction of the steplength by a fixed factor. The motivation for this is that some problems respond well to one or two reductions in the steplength by modest amounts (such as 1/2) and others require many such reductions, but might respond well to a more aggressive steplength reduction (by factors of 1/10, say). If one can model the decrease in $\|F\|_2$ as the steplength is reduced, one might expect to be able to better estimate the appropriate reduction factor. In practice such methods usually perform better than constant reduction factors.

If we have rejected $k$ steps we have in hand the values

\[ \|F(x_n)\|_2,\ \|F(x_n + \lambda_1 d_n)\|_2,\ \dots,\ \|F(x_n + \lambda_{k-1} d_n)\|_2. \]

We can use this iteration history to model the scalar function

\[ f(\lambda) = \|F(x_n + \lambda d_n)\|_2^2 \]

with a polynomial and use the minimum of that polynomial as the next steplength. We consider two ways of doing this that use second degree polynomials which we compute using previously computed information.

After $\lambda_c$ has been rejected and a model polynomial computed, we compute the minimum $\lambda_t$ of that polynomial analytically and set

\[ (8.5)\qquad \lambda_+ = \begin{cases} \sigma_0\lambda_c & \text{if } \lambda_t < \sigma_0\lambda_c, \\ \sigma_1\lambda_c & \text{if } \lambda_t > \sigma_1\lambda_c, \\ \lambda_t & \text{otherwise.} \end{cases} \]
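In MATLAB the safeguarding step (8.5) is only a few lines. The following is a minimal sketch with illustrative names (the codes in the collection embed the same logic directly in their line search loops):

function lambdap = safeguard(lambdat, lambdac, sigma0, sigma1)
% SAFEGUARD  Apply (8.5): keep the new steplength within
% [sigma0*lambdac, sigma1*lambdac].
if lambdat < sigma0*lambdac
    lambdap = sigma0*lambdac;
elseif lambdat > sigma1*lambdac
    lambdap = sigma1*lambdac;
else
    lambdap = lambdat;
end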

Two-point parabolic model. In this approach we use values of $f$ and $f'$ at $\lambda = 0$ and the value of $f$ at the current value of $\lambda$ to construct a 2nd degree interpolating polynomial for $f$.

$f(0) = \|F(x_n)\|_2^2$ is known. Clearly $f'(0)$ can be computed as

\[ (8.6)\qquad f'(0) = 2(F'(x_n)^T F(x_n))^T d_n = 2F(x_n)^T (F'(x_n) d_n). \]

Use of (8.6) requires evaluation of $F'(x_n)d_n$, which can be obtained by examination of the final residual from GMRES or by expending an additional function evaluation to compute $F'(x_n)d_n$ with a forward difference. In any case, our approximation to $f'(0)$ should be negative. If it is not, it may be necessary to compute a new search direction. This is a possibility with directions other than inexact Newton directions, such as the Broyden direction.
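As a sketch, the forward difference option costs one extra evaluation of $F$; the function name, the argument Fx (the stored value of $F(x_n)$), and the increment h are assumptions here, and a production code would scale h with the sizes of x and d:

function z = fddir(F, x, Fx, d, h)
% FDDIR  Forward difference approximation to F'(x)*d.
% Fx = F(x) is passed in so that no function evaluation is wasted.
z = (F(x + h*d) - Fx)/h;

With $z \approx F'(x_n)d_n$ in hand, (8.6) gives $f'(0) \approx 2F(x_n)^T z$.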

The polynomial $p(\lambda)$ such that $p, p'$ agree with $f, f'$ at 0 and $p(\lambda_c) = f(\lambda_c)$ is

\[ p(\lambda) = f(0) + f'(0)\lambda + c\lambda^2, \]

where

\[ c = \frac{f(\lambda_c) - f(0) - f'(0)\lambda_c}{\lambda_c^2}. \]

Our approximation of $f'(0)$ is negative, so if $f(\lambda_c) > f(0)$, then $c > 0$ and $p(\lambda)$ has a minimum at

\[ \lambda_t = -f'(0)/(2c) > 0. \]

We then compute $\lambda_+$ with (8.5).
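The whole two-point step fits in a few lines of MATLAB. The following is a minimal sketch with illustrative names, not the interface of the codes in the collection:

function lambdap = parab2p(f0, fp0, fc, lambdac, sigma0, sigma1)
% PARAB2P  Two-point parabolic model step: f0 = f(0), fp0 = f'(0)
% (which should be negative), fc = f(lambdac) for the rejected
% steplength lambdac.
c = (fc - f0 - fp0*lambdac)/lambdac^2;   % model curvature
if c <= 0
    lambdap = sigma1*lambdac;            % reject the model; cut the step
    return
end
lambdat = -fp0/(2*c);                    % minimizer of p
lambdap = min(max(lambdat, sigma0*lambdac), sigma1*lambdac);  % (8.5)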

Three-point parabolic model. An alternative to the two-point model that avoids the need to approximate $f'(0)$ is a three-point model, which uses $f(0)$ and the two most recently rejected steps to create the parabola. The MATLAB code parab3p implements the three-point parabolic line search and is called by the two codes nsola and brsola.

In this approach one evaluates $f(0)$ and $f(1)$ as before. If the full step is rejected, we set $\lambda = \sigma_1$ and try again. After the second step is rejected, we have the values

\[ f(0),\ f(\lambda_c),\ \text{and}\ f(\lambda_-), \]

where $\lambda_c$ and $\lambda_-$ are the most recently rejected values of $\lambda$. The polynomial that interpolates $f$ at $0, \lambda_c, \lambda_-$ is

\[ p(\lambda) = f(0) + \frac{\lambda}{\lambda_c - \lambda_-}\left( \frac{(\lambda - \lambda_-)(f(\lambda_c) - f(0))}{\lambda_c} + \frac{(\lambda_c - \lambda)(f(\lambda_-) - f(0))}{\lambda_-} \right). \]

We must consider two situations. If

\[ p''(0) = \frac{2}{\lambda_c \lambda_- (\lambda_c - \lambda_-)}\left( \lambda_-(f(\lambda_c) - f(0)) - \lambda_c(f(\lambda_-) - f(0)) \right) \]

is positive, then we set $\lambda_t$ to the minimum of $p$,

\[ \lambda_t = -p'(0)/p''(0), \]

and apply the safeguarding step (8.5) to compute $\lambda_+$. If $p''(0) \le 0$ one could either set $\lambda_+$ to be the minimum of $p$ on the interval $[\sigma_0\lambda, \sigma_1\lambda]$ or reject the parabolic model and simply set $\lambda_+$ to $\sigma_0\lambda_c$ or $\sigma_1\lambda_c$. We take the latter approach and set $\lambda_+ = \sigma_1\lambda_c$. In the MATLAB code nsola from the collection, we implement this method with $\sigma_0 = .1$, $\sigma_1 = .5$.
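The fragment below sketches the three-point step. It is a simplified illustration in the spirit of parab3p and may not match the collection's code line for line:

function lambdap = parab3p_sk(lambdac, lambdam, f0, fc, fm, sigma0, sigma1)
% PARAB3P_SK  Three-point parabolic model: f0 = f(0), fc = f(lambdac),
% fm = f(lambdam) for the two most recently rejected steplengths.
ppp = 2*(lambdam*(fc - f0) - lambdac*(fm - f0))/ ...
      (lambdac*lambdam*(lambdac - lambdam));       % p''(0), as in the text
if ppp <= 0
    lambdap = sigma1*lambdac;                      % reject the model
    return
end
pp0 = (lambdam^2*(fc - f0) - lambdac^2*(fm - f0))/ ...
      (lambdac*lambdam*(lambdam - lambdac));       % p'(0)
lambdat = -pp0/ppp;                                % lambdat = -p'(0)/p''(0)
lambdap = min(max(lambdat, sigma0*lambdac), sigma1*lambdac);  % (8.5)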

Interpolation with a cubic polynomial has also been suggested [87], [86], [63], [133]. Clearly, the polynomial modeling approach is heuristic. The safeguarding and Theorem 8.2.1 provide insurance that progress will be made in reducing $\|F\|$.

8.3.2. Broyden's method. In Chapter 7 we implemented Broyden's method at a storage cost of one vector for each iteration. The relation $y - B_c s = F(x_+)$ was not critical to this and we may also incorporate a line search at a cost of a bit of complexity in the implementation. As in § 7.3, our approach is a nonlinear version of the method in [67].

The difference from the development in § 7.3 is that the simple relation between the sequences $\{w_n\}$ and $\{v_n\}$ is changed. If a line search is used, then

\[ s_n = -\lambda_n B_n^{-1} F(x_n) \]

and hence, since $y_n = F(x_{n+1}) - F(x_n)$ and $B_n s_n = -\lambda_n F(x_n)$,

\[ (8.7)\qquad y_n - B_n s_n = F(x_{n+1}) - (1 - \lambda_n)F(x_n). \]

If, using the notation in § 7.3, we set

\[ u_n = \frac{y_n - B_n s_n}{\|s_n\|_2}, \quad v_n = \frac{s_n}{\|s_n\|_2}, \quad \text{and} \quad w_n = \frac{B_n^{-1} u_n}{1 + v_n^T B_n^{-1} u_n}, \]

we can use (8.7) and (7.38) to obtain

\[ (8.8)\qquad w_n = \lambda_n \frac{-d_{n+1}}{\|s_n\|_2} + (\lambda_n^{-1} - 1)\frac{s_n}{\|s_n\|_2} = \lambda_n \frac{-s_{n+1}/\lambda_{n+1}}{\|s_n\|_2} + (\lambda_n^{-1} - 1)\frac{s_n}{\|s_n\|_2}, \]

where

\[ d_{n+1} = -B_{n+1}^{-1} F(x_{n+1}) \]

is the search direction used to compute $x_{n+2}$. Note that (8.8) reduces to (7.41) when $\lambda_n = 1$ and $\lambda_{n+1} = 1$ (so $d_{n+1} = s_{n+1}$).

We can use the first equality in (8.8) and the relation

\[ d_{n+1} = -\left(I - \frac{w_n s_n^T}{\|s_n\|_2}\right) B_n^{-1} F(x_{n+1}) \]

to solve for $d_{n+1}$ and obtain an analog of (7.44),

\[ (8.9)\qquad d_{n+1} = -\frac{\|s_n\|_2^2\, B_n^{-1} F(x_{n+1}) - (1 - \lambda_n)\, s_n^T B_n^{-1} F(x_{n+1})\, s_n}{\|s_n\|_2^2 + \lambda_n\, s_n^T B_n^{-1} F(x_{n+1})}. \]

We then use the second equality in (8.8) to construct $B_c^{-1} F(x_+)$ as we did in Algorithm brsol. Verification of (8.8) and (8.9) is left to Exercise 8.5.5.

Algorithm brsola uses these ideas. We do not include details of the line search or provide for restarts or a limited memory implementation. The MATLAB implementation, provided in the collection of MATLAB codes, uses a three-point parabolic line search and performs a restart when storage is exhausted, as Algorithm brsol does.

It is possible that the Broyden direction fails to satisfy (8.2). In fact, there is no guarantee that the Broyden direction is a direction of decrease for $\|F\|_2$. The MATLAB code returns with an error message if more than a fixed number of reductions in the step length are needed. Another approach would be to compute $F'(x_c)d$ numerically if a full step is rejected and then test (8.2) before beginning the line search. As a side effect, an approximation of $f'(0)$ can be obtained at no additional cost and a parabolic line search based on $f(0)$, $f'(0)$, and $f(1)$ can be used. In Exercise 8.5.10 the reader is asked to fully develop and implement these ideas.

Algorithm 8.3.1. brsola$(x, F, \tau, \mathrm{maxit}, \mathrm{nmax})$

1. Initialize: $F_c = F(x)$, $r_0 = \|F(x)\|_2$, $n = -1$,
   $d = -F(x)$, compute $\lambda_0$, $s_0 = \lambda_0 d$.

2. Do while $n \le \mathrm{maxit}$

   (a) $n = n + 1$

   (b) $x = x + s_n$

   (c) Evaluate $F(x)$.

   (d) If $\|F(x)\|_2 \le \tau_r r_0 + \tau_a$, exit.

   (e) i. $z = -F(x)$
       ii. for $j = 0, \dots, n - 1$
           $a = -\lambda_j/\lambda_{j+1}$, $b = 1 - \lambda_j$
           $z = z + (a s_{j+1} + b s_j)\, s_j^T z / \|s_j\|_2^2$

   (f) $d = (\|s_n\|_2^2\, z + (1 - \lambda_n)\, s_n^T z\, s_n)/(\|s_n\|_2^2 - \lambda_n\, s_n^T z)$

   (g) Compute $\lambda_{n+1}$ with a line search.

   (h) Set $s_{n+1} = \lambda_{n+1} d$; store $s_{n+1}$ and $\lambda_{n+1}$.
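To make steps (e) and (f) concrete, here is a MATLAB sketch of the direction computation. The storage layout (columns of S holding the steps $s_j$ and a vector lambda of steplengths) is an illustration only; brsola in the collection manages storage more carefully:

function d = broyden_dir(Fx, S, lambda, n)
% BROYDEN_DIR  Steps (e)-(f) of Algorithm brsola.
% Fx = F(x) at the current point; S(:,j+1) holds s_j and
% lambda(j+1) holds lambda_j (MATLAB indices begin at 1).
z = -Fx;                                   % step (e)i
for j = 0:n-1                              % step (e)ii
    sj  = S(:, j+1);
    sj1 = S(:, j+2);
    a = -lambda(j+1)/lambda(j+2);
    b = 1 - lambda(j+1);
    z = z + (a*sj1 + b*sj)*(sj'*z)/norm(sj)^2;
end
sn = S(:, n+1);                            % step (f)
d = (norm(sn)^2*z + (1 - lambda(n+1))*(sn'*z)*sn)/ ...
    (norm(sn)^2 - lambda(n+1)*(sn'*z));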

Since $d_{n+1}$ and $s_{n+1}$ can occupy the same location, the storage requirements of Algorithm brsola would appear to be essentially the same as those of brsol. However, computation of the step length requires storage of both the current point $x$ and the trial point $x + \lambda s$ before a step can be accepted. So, the storage of the globalized methods exceeds that of the local method by one vector. The MATLAB implementation of brsola in the collection illustrates this point.

8.4. Examples for Newton–Armijo

The MATLAB code nsola is a modification of nsolgm that incorporates the three-point parabolic line search from § 8.3.1 and also changes $\eta$ using (6.20)

once the iteration is near the root. We compare this strategy, using GMRES

as the linear solver, with the constant $\eta$ method as we did in § 6.4 with $\gamma = .9$. As before, in all the figures we plot $\|F(x_n)\|_2/\|F(x_0)\|_2$ against the number of function evaluations required by all line searches and inner and outer iterations to compute $x_n$. Counts of function evaluations corresponding to outer iterations are indicated by circles. We base our absolute error criterion on the norm $\|\cdot\|_2/\sqrt{N}$ as we did in § 6.4.2 and 7.4.
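In MATLAB such a plot can be produced with semilogy. The sketch below assumes a hypothetical two-column history array hist, with cumulative function evaluation counts in the first column and $\|F(x_n)\|_2$ in the second (the histories returned by the collection's codes are similar but not necessarily identical):

semilogy(hist(:,1), hist(:,2)/hist(1,2), '-', ...
         hist(:,1), hist(:,2)/hist(1,2), 'o')  % circles mark outer iterations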

In § 8.4.3 we compare the Broyden–Armijo method with Newton-GMRES for those problems for which both methods were successful. We caution the reader now, and will repeat the caution later, that if the initial iterate is far from the solution, an inexact Newton method such as Newton-GMRES can succeed in many cases in which a quasi-Newton method can fail, because the quasi-Newton direction may not be an inexact Newton direction. However, when both methods succeed, the quasi-Newton method, which requires a single function evaluation for each outer iteration when full steps can be taken, may well be the most efficient.

8.4.1. Inverse tangent function. Since we have easy access to analytic

derivatives in this example, we can use the two-point parabolic line search.

We compare the two-point parabolic line search with the constant reduction

search (σ0 = σ1 = .5) for the arctan function. In Fig. 8.2 we plot the iteration

history corresponding to the parabolic line search with the solid line and that

for the constant reduction with the dashed line. We use x0 = 10 as the initial

iterate with $\tau_r = \tau_a = 10^{-8}$. The parabolic line search required 7 outer iterates and 21 function evaluations, in contrast with the constant reduction search's 11 outer iterates and 33 function evaluations. In the first outer iteration, both line searches take three steplength reductions. However, the parabolic line search takes only one reduction in the next three iterations and none thereafter. The constant reduction line search took three reductions in the first two outer iterations and two each in the following two.

8.4.2. Convection-diffusion equation. We consider a much more diffi-