the initial Jacobian evaluation. How does this compare with nsol on the examples in Chapter 5?

7.5.8. Compare the performance of Broyden's method and Newton-GMRES on the Modified Bratu Problem

$$-\nabla^2 u + d u_x + e^u = 0$$

on $(0,1) \times (0,1)$ with homogeneous Dirichlet boundary conditions. Precondition with the Poisson solver fish2d. Experiment with various mesh widths, initial iterates, and values for the convection coefficient d.
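For concreteness, a minimal MATLAB residual for one possible discretization is sketched below; the centered-difference stencil, the grid layout (first index in the x direction), and the name bratu_res are assumptions of mine, not part of the book's codes, and the sketch leaves the fish2d preconditioning to the solver.

% Hedged sketch of the modified Bratu residual on a uniform mesh with
% n interior points per side, h = 1/(n+1), and homogeneous Dirichlet
% data supplied by zero padding; u is the (n*n)-vector of interior values.
function r = bratu_res(u, n, d)
h = 1/(n+1);
U = zeros(n+2);
U(2:n+1,2:n+1) = reshape(u, n, n);
i = 2:n+1; j = 2:n+1;
lap = (4*U(i,j) - U(i-1,j) - U(i+1,j) - U(i,j-1) - U(i,j+1))/h^2;  % -Laplacian
ux  = (U(i+1,j) - U(i-1,j))/(2*h);                                 % centered u_x
R   = lap + d*ux + exp(U(i,j));
r   = R(:);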

7.5.9. The bad Broyden method [26], so named because of its inferior performance in practice [63], updates an approximation to the inverse of the Jacobian at the root, so that $B^{-1} \approx F'(x^*)^{-1}$ satisfies the inverse secant equation

(7.45) $B_+^{-1} y = s$


with the rank-one update

$$B_+^{-1} = B_c^{-1} + \frac{(s - B_c^{-1} y)\, y^T}{\|y\|_2^2}.$$

Show that the bad Broyden method is locally superlinearly convergent if the standard assumptions hold. Try to implement the bad Broyden method in a storage-efficient manner using the ideas from [67] and § 7.3. Does anything go wrong? See [67] for more about this.
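Before attempting the storage-efficient version, it may help to see the dense form of one bad Broyden iteration; this is a sketch under the assumption that Binv holds the current inverse approximation and F is a function handle (both names are mine, not from the book's codes):

% One bad Broyden step with the inverse approximation stored densely;
% the update is the rank-one formula above, with denominator ||y||_2^2.
s    = -Binv*F(x);                        % quasi-Newton step
xp   = x + s;
y    = F(xp) - F(x);
Binv = Binv + ((s - Binv*y)*y')/(y'*y);   % inverse update (7.45)
x    = xp;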

7.5.10. The Schubert or sparse Broyden algorithm [172], [27] is a quasi-Newton update for sparse matrices that enforces both the secant equation and the sparsity pattern. For problems with tridiagonal Jacobian, for example, the update is

$$(B_+)_{ij} = (B_c)_{ij} + \frac{(y - B_c s)_i \, s_j}{\sum_{k=i-1}^{i+1} s_k^2}$$

for $1 \le i, j \le N$ and $|i - j| \le 1$. Note that only the subdiagonal, superdiagonal, and main diagonal are updated. Read about the algorithm in [172] and prove local superlinear convergence under the standard assumptions and the additional assumption that the sparsity pattern of $B_0$ is the same as that of $F'(x^*)$. How would the Schubert algorithm be affected by preconditioning? Compare your analysis with those in [158], [125], and [189].
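As a point of departure for Exercises 7.5.10 and 7.5.12, a loop implementing the tridiagonal update might look like the sketch below; it assumes B is stored as a sparse matrix and that out-of-range indices at the boundaries are simply dropped from the sum (my reading of the formula, not something the book states):

% Hedged sketch of one Schubert update for a tridiagonal B; s and y
% are column vectors and r is the residual of the secant equation.
r = y - B*s;
for i = 1:N
    lo = max(i-1,1); hi = min(i+1,N);
    denom = sum(s(lo:hi).^2);
    if denom > 0                          % skip rows with zero local step
        B(i,lo:hi) = B(i,lo:hi) + r(i)*s(lo:hi)'/denom;
    end
end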

7.5.11. Implement the bad Broyden method, apply it to the examples in § 7.4, and compare the performance to that of Broyden's method. Discuss the differences in implementation from Broyden's method.

7.5.12. Implement the Schubert algorithm on an unpreconditioned discretized partial differential equation (such as the Bratu problem from Exercise 7.5.8) and compare it to Newton-GMRES and Broyden's method. Does the relative performance of the methods change as h is decreased? This is interesting even in one space dimension, where all matrices are tridiagonal. References [101], [93], [92], and [112] are relevant to this exercise.

7.5.13. Use your result from Exercise 5.7.14 in Chapter 5 to numerically estimate the q-order of Broyden's method for some of the examples in this section (both linear and nonlinear). What can you conclude from your observations?


Chapter 8

Global Convergence

By a globally convergent algorithm we mean an algorithm with the property

that for any initial iterate the iteration either converges to a root of F or fails

to do so in one of a small number of ways. Of the many such algorithms we

will focus on the class of line search methods and the Armijo rule [3], [88],

implemented inexactly [24], [25], [70].

Other methods, such as trust region [153], [154], [181], [129], [63], [24],

[25], [173], [70], and continuation/homotopy methods [1], [2], [109], [159],

[198], can be used to accomplish the same objective. We select the line search

paradigm because of its simplicity, and because it is trivial to add a line search

to an existing locally convergent implementation of Newton's method. The

implementation and analysis of the line search method we develop in this

chapter do not depend on whether iterative or direct methods are used to

compute the Newton step or upon how or if the Jacobian is computed and

stored. We begin with single equations, where the algorithms can be motivated

and intuition developed with a very simple example.

8.1. Single equations

If we apply Newton's method to find the root $x^* = 0$ of the function $f(x) = \arctan(x)$ with initial iterate $x_0 = 10$, we find that the initial iterate is too far from the root for the local convergence theory in Chapter 5 to be applicable. The reason for this is that $f'(x) = (1 + x^2)^{-1}$ is small for large $x$; $f(x) = \arctan(x) \approx \pm\pi/2$, and hence the magnitude of the Newton step is much larger than that of the iterate itself. We can see this effect in the sequence of iterates:

$$10,\ -138,\ 2.9 \times 10^4,\ -1.5 \times 10^9,\ 9.9 \times 10^{17},$$

a failure of Newton's method that results from an inaccurate initial iterate.
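This divergence is easy to reproduce; a throwaway MATLAB check (not one of the book's codes) is

% Undamped Newton for f(x) = arctan(x) from x0 = 10; each step is
% x - f(x)/f'(x) = x - atan(x)*(1 + x^2), which grows without bound.
x = 10;
for k = 1:4
    x = x - atan(x)*(1 + x^2);
    disp(x)
end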

If we look more closely, we see that the Newton step $s = -f(x_c)/f'(x_c)$ is pointing toward the root in the sense that $s x_c < 0$, but the length of $s$ is too large. This observation motivates the simple fix of reducing the size of the step

until the size of the nonlinear residual is decreased. A prototype algorithm is

given below.


Algorithm 8.1.1. lines1(x, f, τ)

1. $r_0 = |f(x)|$

2. Do while $|f(x)| > \tau_r r_0 + \tau_a$

(a) If $f'(x) = 0$ terminate with failure.

(b) $s = -f(x)/f'(x)$ (search direction)

(c) $x_t = x + s$ (trial point)

(d) If $|f(x_t)| < |f(x)|$ then $x = x_t$ (accept the step)

else

$s = s/2$; goto 2c (reject the step)
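A minimal MATLAB sketch of this prototype for scalar equations might read as follows; f and fp are assumed function handles for f and f', and the name lines1_sketch is mine (this is not one of the book's codes, and it omits the safeguards discussed later in this chapter):

% Sketch of Algorithm lines1: Newton's method with step halving until
% simple decrease in |f| is obtained.
function x = lines1_sketch(x, f, fp, tau_r, tau_a)
r0 = abs(f(x));
while abs(f(x)) > tau_r*r0 + tau_a
    if fp(x) == 0, error('lines1: derivative vanished'); end
    s = -f(x)/fp(x);                  % search direction
    xt = x + s;                       % trial point
    while abs(f(xt)) >= abs(f(x))     % reject until simple decrease
        s = s/2;
        xt = x + s;
    end
    x = xt;                           % accept the step
end

A call such as x = lines1_sketch(10, @atan, @(x) 1/(1+x^2), 1e-10, 1e-10) should behave like the iteration described next.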

When we apply Algorithm lines1 to our simple example, we obtain the

sequence

$$10,\ -8.5,\ 4.9,\ -3.8,\ 1.4,\ -1.3,\ 1.2,\ -.99,\ .56,\ -.1,\ 9 \times 10^{-4},\ -6 \times 10^{-10}.$$

The quadratic convergence behavior of Newton's method is only apparent

very late in the iteration. Iterates 1, 2, 3, and 4 required 3, 3, 2, and 2

steplength reductions. After the fourth iterate, decrease in the size of the

nonlinear residual was obtained with a full Newton step. This is typical of the

performance of the algorithms we discuss in this chapter.

We plot the progress of the iteration in Fig. 8.1. We plot

$|\arctan(x)/\arctan(x_0)|$

as a function of the number of function evaluations with the solid line. The

outer iterations are identified with circles as in § 6.4. One can see that most of

the work is in the initial phase of the iteration. In the terminal phase, where

the local theory is valid, the minimum number of two function evaluations per

iterate is required. However, even in this terminal phase rapid convergence

takes place only in the final three outer iterations.

Note the differences between Algorithm lines1 and Algorithm nsol. One does not set $x_+ = x + s$ without testing the step to see if the absolute value of the nonlinear residual has been decreased. We call this method a line search because we search along the line segment

$$(x_c,\ x_c - f(x_c)/f'(x_c))$$

to find a decrease in $|f(x_c)|$. As we move from the right endpoint to the left,

some authors [63] refer to this as backtracking.

Algorithm lines1 almost always works well. In theory, however, there is

the possibility of oscillation about a solution without convergence [63]. To

remedy this and to make analysis possible we replace the test for simple

decrease with one for sufficient decrease. The algorithm is to compute a search

direction d which for us will be the Newton direction

$d = -f(x_c)/f'(x_c)$


[Fig. 8.1. Newton–Armijo iteration for inverse tangent: relative nonlinear residual (from $10^0$ down to $10^{-10}$, log scale) versus function evaluations (0 to 35).]

and then test steps of the form $s = \lambda d$, with $\lambda = 2^{-j}$ for some $j \ge 0$, until $f(x + s)$ satisfies

(8.1) $|f(x_c + \lambda d)| < (1 - \alpha\lambda)|f(x_c)|.$

The condition in (8.1) is called sufficient decrease of $|f|$. The parameter $\alpha \in (0, 1)$ is a small, but positive, number intended to make (8.1) as easy as possible to satisfy. We follow the recent optimization literature [63] and set $\alpha = 10^{-4}$. Once sufficient decrease has been obtained we accept the step $s = \lambda d$. This strategy, from [3] (see also [88]), is called the Armijo rule.

that we must now distinguish between the step and the Newton direction,

something we did not have to do in the earlier chapters.

Algorithm 8.1.2. nsola1(x, f, τ)

1. $r_0 = |f(x)|$

2. Do while $|f(x)| > \tau_r r_0 + \tau_a$

(a) If $f'(x) = 0$ terminate with failure.

(b) $d = -f(x)/f'(x)$ (search direction)

(c) $\lambda = 1$

i. $x_t = x + \lambda d$ (trial point)

ii. If $|f(x_t)| < (1 - \alpha\lambda)|f(x)|$ then $x = x_t$ (accept the step)

else

$\lambda = \lambda/2$; goto 2(c)i (reject the step)
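Relative to the lines1 sketch above, only the acceptance test changes; a hedged MATLAB fragment of the inner loop, with $\alpha = 10^{-4}$ as in the text, is

% Armijo acceptance: require |f(xt)| < (1 - alpha*lambda)|f(x)| rather
% than simple decrease; alpha = 1e-4 as recommended in the text.
alpha = 1.e-4;
d = -f(x)/fp(x);                      % search direction
lambda = 1; xt = x + lambda*d;
while abs(f(xt)) >= (1 - alpha*lambda)*abs(f(x))
    lambda = lambda/2;                % reject the step
    xt = x + lambda*d;
end
x = xt;                               % accept the step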

This is essentially the algorithm implemented in the MATLAB code nsola.

We will analyze Algorithm nsola in § 8.2. We remark here that there is

nothing critical about the reduction factor of 2 in the line search. A factor of


10 could well be better in situations in which small values of $\lambda$ are needed for

several consecutive steps (such as our example with the arctan function). In

that event a reduction of a factor of 8 would require three passes through the

loop in nsola1 but only a single reduction by a factor of 10. This could be

quite important if the search direction were determined by an inexact Newton

method or an approximation to Newton's method such as the chord method or Broyden's method.

On the other hand, reducing $\lambda$ by too much can be costly as well. Taking

full Newton steps ensures fast local convergence. Taking as large a fraction as

possible helps move the iteration into the terminal phase in which full steps

may be taken and fast convergence expected. We will return to the issue of

reduction in $\lambda$ in § 8.3.

8.2. Analysis of the Armijo rule

In this section we extend the one-dimensional algorithm in two ways before

proceeding with the analysis. First, we accept any direction that satisfies the

inexact Newton condition (6.1). We write this for the Armijo sequence as

(8.2) $\|F'(x_n) d_n + F(x_n)\| \le \eta_n \|F(x_n)\|.$

Our second extension is to allow more flexibility in the reduction of $\lambda$. We allow for any choice that produces a reduction that satisfies

$$\sigma_0 \lambda_{old} \le \lambda_{new} \le \sigma_1 \lambda_{old},$$

where $0 < \sigma_0 < \sigma_1 < 1$. One such method, developed in § 8.3, is to minimize

an interpolating polynomial based on the previous trial steps. The danger is

that the minimum may be too near zero to be of much use and, in fact, the

iteration may stagnate as a result. The parameter σ0 safeguards against that.

Safeguarding is an important aspect of many globally convergent methods and

has been used for many years in the context of the polynomial interpolation

algorithms we discuss in § 8.3.1 [54], [63], [86], [133], [87]. Typical values of σ0

and σ1 are .1 and .5. Our algorithm nsola incorporates these ideas.
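In code, such a safeguarded choice is a single clipping operation; here is a MATLAB sketch with the typical values above, where lamm stands for the minimizer of the interpolating polynomial (assumed computed elsewhere; the variable names are mine):

% Clip the polynomial minimizer lamm into [sigma0*lambda, sigma1*lambda]
% so the line search can neither stagnate nor reduce too slowly.
sigma0 = .1; sigma1 = .5;
lambda = min(sigma1*lambda, max(sigma0*lambda, lamm));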

While (8.2) is motivated by the Newton-iterative paradigm, the analysis

here applies equally to methods that solve the equations for the Newton step

directly (so $\eta_n = 0$), approximate the Newton step by a quasi-Newton method, or even use the chord method, provided that the resulting direction $d_n$ satisfies

(8.2). In Exercise 8.5.9 you are asked to implement the Armijo rule in the

context of Algorithm nsol.

Algorithm 8.2.1. nsola(x, F, τ, η).

1. $r_0 = \|F(x)\|$

2. Do while $\|F(x)\| > \tau_r r_0 + \tau_a$

(a) Find d such that $\|F'(x)d + F(x)\| \le \eta\|F(x)\|$

If no such d can be found, terminate with failure.


(b) $\lambda = 1$

i. $x_t = x + \lambda d$

ii. If $\|F(x_t)\| < (1 - \alpha\lambda)\|F(x)\|$ then $x = x_t$

else

Choose $\sigma \in [\sigma_0, \sigma_1]$

$\lambda = \sigma\lambda$

goto 2(b)i
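To make the structure concrete, here is a hedged dense-Jacobian sketch of the loop above, with direct solves (so $\eta = 0$), a fixed reduction factor $\sigma = 1/2$ in place of the interpolation-based choice of § 8.3, and an assumed function handle jac returning $F'(x)$; this is not the book's nsola code.

% Dense-matrix sketch of Algorithm nsola; d satisfies (8.2) trivially
% because the Newton equations are solved directly.
function x = nsola_sketch(x, F, jac, tau_r, tau_a)
alpha = 1.e-4; sigma = .5;
r0 = norm(F(x));
while norm(F(x)) > tau_r*r0 + tau_a
    d = -(jac(x)\F(x));               % Newton direction
    lambda = 1; xt = x + lambda*d;
    while norm(F(xt)) >= (1 - alpha*lambda)*norm(F(x))
        lambda = sigma*lambda;        % safeguarded reduction
        xt = x + lambda*d;
    end
    x = xt;
end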

Note that step 2a must allow for the possibility that $F'$ is ill-conditioned and that no direction can be found that satisfies (8.2). If, for example, step 2a were implemented with a direct factorization method, ill-conditioning of $F'$ might be detected as part of the factorization.

might be detected as part of the factorization.

Let $\{x_n\}$ be the iteration produced by Algorithm nsola with initial iterate $x_0$. The algorithm can fail in some obvious ways:

1. $F'(x_n)$ is singular for some n. The inner iteration could terminate with

a failure.

2. $x_n \to \bar{x}$, a local minimum of $\|F\|$ which is not a root.

3. $x_n \to \infty$.

Clearly if F has no roots, the algorithm must fail. We will see that if F has no

roots then either $F'(x_n)$ has a limit point which is singular or $\{x_n\}$ becomes

unbounded.

The convergence theorem is the remarkable statement that if $\{x_n\}$ and $\{\|F'(x_n)^{-1}\|\}$ are bounded (thereby eliminating premature failure and local minima of $\|F\|$ that are not roots) and $F'$ is Lipschitz continuous in a neighborhood of the entire sequence $\{x_n\}$, then the iteration converges to a

root of F at which the standard assumptions hold, full steps are taken in the

terminal phase, and the convergence is q-quadratic.

We begin with a formal statement of our assumption that $F'$ is uniformly Lipschitz continuous and bounded away from zero on the sequence $\{x_n\}$.

Assumption 8.2.1. There are $r, \gamma, m_f > 0$ such that $F$ is defined, $F'$ is Lipschitz continuous with Lipschitz constant $\gamma$, and $\|F'(x)^{-1}\| \le m_f$ on the set

$$\Omega(\{x_n\}, r) = \bigcup_{n=0}^{\infty} \{x \mid \|x - x_n\| \le r\}.$$

Assumption 8.2.1 implies that the line search phase (step 2b in Algorithm

nsola) will terminate in a number of cycles that is independent of n. We show

this, as do [25] and [70], by giving a uniform lower bound on $\lambda_n$ (assuming that $F(x_0) \neq 0$). Note how the safeguard bound $\sigma_0$ enters this result. Note also the very modest conditions on $\eta_n$.

Lemma 8.2.1. Let $x_0 \in R^N$ and $\alpha \in (0, 1)$ be given. Assume that $\{x_n\}$ is given by Algorithm nsola with

(8.3) $\{\eta_n\} \subset (0, \bar{\eta}] \subset (0, 1 - \alpha)$


and that Assumption 8.2.1 holds. Then either $F(x_0) = 0$ or

(8.4) $\displaystyle \lambda_n \ge \bar{\lambda} = \sigma_0 \min\left\{ \frac{r}{m_f \|F(x_0)\|},\ \frac{2(1 - \alpha - \bar{\eta})}{m_f^2 \|F(x_0)\| (1 + \bar{\eta})^2 \gamma} \right\}.$

Proof. Let $n \ge 0$. By the fundamental theorem of calculus and (8.2) we have, for any $\lambda \in [0, 1]$,

$$F(x_n + \lambda d_n) = F(x_n) + \lambda F'(x_n) d_n + \int_0^1 \left(F'(x_n + t\lambda d_n) - F'(x_n)\right) \lambda d_n \, dt$$

$$= (1 - \lambda) F(x_n) + \lambda \xi_n + \int_0^1 \left(F'(x_n + t\lambda d_n) - F'(x_n)\right) \lambda d_n \, dt,$$

where, by the bound $\eta_n \le \bar{\eta}$,