
7.5.7. Modify nsol to use Broyden's method instead of the chord method after
the initial Jacobian evaluation. How does this compare with nsol on the
examples in Chapter 5?
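A minimal MATLAB sketch of the modification this exercise asks for might look like the following. It is not the nsol code itself; the function name broyden_nsol, its argument list, and the simple relative-residual test are illustrative assumptions. The Jacobian is formed only once, after which Broyden rank-one updates replace the frozen Jacobian of the chord method; a serious implementation would update a factorization or use the storage scheme of § 7.3 rather than re-solving with a dense B.

% Sketch only: Broyden's method seeded with a single Jacobian evaluation.
function [x, reshist] = broyden_nsol(x, f, jac, tol, maxit)
  B  = jac(x);                    % one Jacobian evaluation, as in nsol
  fx = f(x);
  reshist = norm(fx);
  for n = 1:maxit
      s  = -B \ fx;               % quasi-Newton step
      x  = x + s;
      fnew = f(x);
      y  = fnew - fx;
      B  = B + ((y - B*s) * s') / (s' * s);   % Broyden rank-one update
      fx = fnew;
      reshist = [reshist; norm(fx)];
      if norm(fx) <= tol * reshist(1)
          return
      end
  end
end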

7.5.8. Compare the performance of Broyden's method and Newton-GMRES on
the Modified Bratu Problem

−∇^2 u + d u_x + e^u = 0

on (0, 1) × (0, 1) with homogeneous Dirichlet boundary conditions.
Precondition with the Poisson solver fish2d. Experiment with various
mesh widths, initial iterates, and values for the convection coefficient d.
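One possible finite-difference residual for experimenting with this problem is sketched below; bratu_res, its argument list, and the convention that the first grid index is the x direction are illustrative assumptions, not the book's codes. The fish2d preconditioning and the Newton-GMRES and Broyden drivers are left to the exercise.

% Sketch only: 5-point residual for -laplacian(u) + d*u_x + exp(u) = 0 on a
% uniform n-by-n interior grid of the unit square with u = 0 on the boundary.
function r = bratu_res(u, d, n)
  h  = 1/(n + 1);
  U  = reshape(u, n, n);                  % interior unknowns
  Ue = zeros(n + 2, n + 2);
  Ue(2:n+1, 2:n+1) = U;                   % pad with the Dirichlet boundary values
  lap = (4*Ue(2:n+1,2:n+1) - Ue(1:n,2:n+1) - Ue(3:n+2,2:n+1) ...
         - Ue(2:n+1,1:n) - Ue(2:n+1,3:n+2)) / h^2;   % discrete -laplacian
  ux  = (Ue(3:n+2,2:n+1) - Ue(1:n,2:n+1)) / (2*h);   % centered difference in x
  r   = reshape(lap + d*ux + exp(U), n*n, 1);
end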

7.5.9. The bad Broyden method [26], so named because of its inferior perfor-
mance in practice [63], updates an approximation to the inverse of the
Jacobian at the root so that B^{-1} ≈ F'(x*)^{-1} satisfies the inverse secant
equation

(7.45)    B+^{-1} y = s




with the rank-one update

B+^{-1} = Bc^{-1} + (s − Bc^{-1} y) y^T / ‖y‖_2^2.

Show that the bad Broyden method is locally superlinearly convergent
if the standard assumptions hold. Try to implement the bad Broyden
method in a storage-efficient manner using the ideas from [67] and § 7.3.
Does anything go wrong? See [67] for more about this.
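For the implementation part of the exercise, the dense form of the inverse update is a one-line change to a Broyden iteration. The fragment below is only an illustration (Binv, f, and fx are assumed names) and makes no attempt at the storage-efficient form from [67].

% Sketch only: one step of the bad Broyden method with a dense inverse approximation.
s    = -Binv * fx;                              % quasi-Newton step
x    = x + s;
fnew = f(x);
y    = fnew - fx;
Binv = Binv + ((s - Binv*y) * y') / (y' * y);   % enforces the inverse secant equation (7.45)
fx   = fnew;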

7.5.10. The Schubert or sparse Broyden algorithm [172], [27] is a quasi-Newton
update for sparse matrices that enforces both the secant equation and the
sparsity pattern. For problems with tridiagonal Jacobian, for example,
the update is
(B+)_{ij} = (Bc)_{ij} + (y − Bc s)_i s_j / Σ_{k=i−1}^{i+1} s_k^2

for 1 ≤ i, j ≤ N and |i − j| ≤ 1. Note that only the subdiagonal, su-
perdiagonal, and main diagonal are updated. Read about the algorithm
in [172] and prove local superlinear convergence under the standard as-
sumptions and the additional assumption that the sparsity pattern of B0
is the same as that of F'(x*). How would the Schubert algorithm be
affected by preconditioning? Compare your analysis with those in [158],
[125], and [189].
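For the tridiagonal case the update displayed above can be coded directly. The fragment below is a sketch under the assumption that Bc is stored as a sparse tridiagonal matrix; at the boundary rows the sum in the denominator is truncated to the indices that exist, which is an implementation choice rather than part of the formula.

% Sketch only: Schubert (sparse Broyden) update for a tridiagonal Bc.
% s is the step, y the difference of residuals, N = length(s).
r  = y - Bc*s;                              % secant residual
Bp = Bc;
for i = 1:N
    lo  = max(i-1, 1);
    hi  = min(i+1, N);                      % columns in row i's sparsity pattern
    den = sum(s(lo:hi).^2);
    if den > 0
        Bp(i, lo:hi) = Bc(i, lo:hi) + r(i) * s(lo:hi)' / den;
    end
end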

7.5.11. Implement the bad Broyden method, apply it to the examples in § 7.4,
and compare the performance to that of Broyden's method. Discuss the
differences in implementation from Broyden's method.

7.5.12. Implement the Schubert algorithm on an unpreconditioned discretized
partial differential equation (such as the Bratu problem from Exer-
cise 7.5.8) and compare it to Newton-GMRES and Broyden's method.
Does the relative performance of the methods change as h is decreased?
This is interesting even in one space dimension where all matrices are
tridiagonal. References [101], [93], [92], and [112] are relevant to this
exercise.

7.5.13. Use your result from Exercise 5.7.14 in Chapter 5 to numerically estimate
the q-order of Broyden's method for some of the examples in this
section (both linear and nonlinear). What can you conclude from your
observations?
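Whatever estimator Exercise 5.7.14 led to, one common way to extract an observed order from an iteration history is to compare successive logarithms of the residual norms; the fragment below is one such illustrative estimate, not the book's prescription, and reshist is an assumed name.

% Sketch only: crude order estimate p_n = log(r_{n+1}/r_n) / log(r_n/r_{n-1})
% from a column of residual norms reshist; the later entries are the informative ones.
r = reshist(:);
p = log(r(3:end) ./ r(2:end-1)) ./ log(r(2:end-1) ./ r(1:end-2));
disp(p)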








Chapter 8

Global Convergence




By a globally convergent algorithm we mean an algorithm with the property
that for any initial iterate the iteration either converges to a root of F or fails
to do so in one of a small number of ways. Of the many such algorithms we
will focus on the class of line search methods and the Armijo rule [3], [88],
implemented inexactly [24], [25], [70].
Other methods, such as trust region [153], [154], [181], [129], [63], [24],
[25], [173], [70], and continuation/homotopy methods [1], [2], [109], [159],
[198], can be used to accomplish the same objective. We select the line search
paradigm because of its simplicity, and because it is trivial to add a line search
to an existing locally convergent implementation of Newton's method. The
implementation and analysis of the line search method we develop in this
chapter do not depend on whether iterative or direct methods are used to
compute the Newton step or upon how or if the Jacobian is computed and
stored. We begin with single equations, where the algorithms can be motivated
and intuition developed with a very simple example.

8.1. Single equations
If we apply Newton's method to find the root x* = 0 of the function
f(x) = arctan(x) with initial iterate x0 = 10 we find that the initial iterate
is too far from the root for the local convergence theory in Chapter 5 to be
applicable. The reason for this is that f'(x) = (1 + x^2)^{-1} is small for large
x; f(x) = arctan(x) ≈ ±π/2, and hence the magnitude of the Newton step
is much larger than that of the iterate itself. We can see this effect in the
sequence of iterates:

10, −138, 2.9 × 10^4, −1.5 × 10^9, 9.9 × 10^17,

a failure of Newton's method that results from an inaccurate initial iterate.
If we look more closely, we see that the Newton step s = −f(xc)/f'(xc) is
pointing toward the root in the sense that s xc < 0, but the length of s is too
large. This observation motivates the simple fix of reducing the size of the step
until the size of the nonlinear residual is decreased. A prototype algorithm is
given below.
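The divergence itself is easy to reproduce before any fix is applied; for example, the following few lines of MATLAB (an illustration, not one of the book's codes) generate the iterates listed above.

% Newton's method for arctan(x) = 0 from x0 = 10; f'(x) = 1/(1 + x^2).
x = 10;
for n = 1:4
    x = x - atan(x)*(1 + x^2);   % full Newton step
    fprintf('%10.2e\n', x)
end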

Algorithm 8.1.1. lines1(x, f, τ)
1. r0 = |f(x)|
2. Do while |f(x)| > τr r0 + τa
(a) If f'(x) = 0 terminate with failure.
(b) s = −f(x)/f'(x) (search direction)
(c) xt = x + s (trial point)
(d) If |f(xt)| < |f(x)| then x = xt (accept the step)
else
s = s/2 goto 2c (reject the step)
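A direct MATLAB transcription of this prototype might read as follows; the function name, the argument list, and the use of a separate handle for f' are illustrative assumptions, and no limit is placed on the number of step reductions.

% Sketch only: Algorithm lines1 for a scalar equation; f and fprime are function handles.
function x = lines1_sketch(x, f, fprime, tau_r, tau_a)
  r0 = abs(f(x));
  while abs(f(x)) > tau_r*r0 + tau_a
      if fprime(x) == 0
          error('lines1_sketch: f''(x) = 0');
      end
      s  = -f(x)/fprime(x);              % search direction
      xt = x + s;                        % trial point
      while abs(f(xt)) >= abs(f(x))      % reject until simple decrease
          s  = s/2;
          xt = x + s;
      end
      x = xt;                            % accept the step
  end
end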
When we apply Algorithm lines1 to our simple example, we obtain the
sequence

10, −8.5, 4.9, −3.8, 1.4, −1.3, 1.2, −.99, .56, −0.1, 9 × 10^{-4}, −6 × 10^{-10}.

The quadratic convergence behavior of Newton's method is only apparent
very late in the iteration. Iterates 1, 2, 3, and 4 required 3, 3, 2, and 2
steplength reductions. After the fourth iterate, decrease in the size of the
nonlinear residual was obtained with a full Newton step. This is typical of the
performance of the algorithms we discuss in this chapter.
We plot the progress of the iteration in Fig. 8.1. We plot

|arctan(x)/arctan(x0 )|

as a function of the number of function evaluations with the solid line. The
outer iterations are identified with circles as in § 6.4. One can see that most of
the work is in the initial phase of the iteration. In the terminal phase, where
the local theory is valid, the minimum number of two function evaluations per
iterate is required. However, even in this terminal phase rapid convergence
takes place only in the final three outer iterations.
Note the differences between Algorithm lines1 and Algorithm nsol. One
does not set x+ = x + s without testing the step to see if the absolute value of
the nonlinear residual has been decreased. We call this method a line search
because we search along the line segment

(xc, xc − f(xc)/f'(xc))

to find a decrease in |f(xc)|. As we move from the right endpoint to the left,
some authors [63] refer to this as backtracking.
Algorithm lines1 almost always works well. In theory, however, there is
the possibility of oscillation about a solution without convergence [63]. To
remedy this and to make analysis possible we replace the test for simple
decrease with one for sufficient decrease. The algorithm is to compute a search
direction d which for us will be the Newton direction

d = −f(xc)/f'(xc)





[Figure 8.1: semilog plot of the relative nonlinear residual (from 10^0 down to 10^-10) against the number of function evaluations (0 to 35).]

Fig. 8.1. Newton–Armijo iteration for inverse tangent.


and then test steps of the form s = λd, with λ = 2^{-j} for some j ≥ 0, until
f(x + s) satisfies
(8.1)    |f(xc + λd)| < (1 − αλ)|f(xc)|.
The condition in (8.1) is called sufficient decrease of |f|. The parameter
α ∈ (0, 1) is a small, but positive, number intended to make (8.1) as easy
as possible to satisfy. We follow the recent optimization literature [63] and
set α = 10^{-4}. Once sufficient decrease has been obtained we accept the step
s = λd. This strategy, from [3] (see also [88]), is called the Armijo rule. Note
that we must now distinguish between the step and the Newton direction,
something we did not have to do in the earlier chapters.
Algorithm 8.1.2. nsola1(x, f, τ)
1. r0 = |f(x)|

2. Do while |f(x)| > τr r0 + τa
(a) If f'(x) = 0 terminate with failure.
(b) d = −f(x)/f'(x) (search direction)
(c) λ = 1
i. xt = x + λd (trial point)
ii. If |f(xt)| < (1 − αλ)|f(x)| then x = xt (accept the step)
else
λ = λ/2 goto 2(c)i (reject the step)

This is essentially the algorithm implemented in the MATLAB code nsola.
We will analyze Algorithm nsola in § 8.2. We remark here that there is
nothing critical about the reduction factor of 2 in the line search. A factor of




10 could well be better in situations in which small values of λ are needed for
several consecutive steps (such as our example with the arctan function). In
that event a reduction by a factor of 8 would require three passes through the
loop in nsola1 but only a single reduction by a factor of 10. This could be
quite important if the search direction were determined by an inexact Newton
method or an approximation to Newton's method such as the chord method
or Broyden's method.
On the other hand, reducing λ by too much can be costly as well. Taking
full Newton steps ensures fast local convergence. Taking as large a fraction as
possible helps move the iteration into the terminal phase in which full steps
may be taken and fast convergence expected. We will return to the issue of
reduction in λ in § 8.3.

8.2. Analysis of the Armijo rule
In this section we extend the one-dimensional algorithm in two ways before
proceeding with the analysis. First, we accept any direction that satisfies the
inexact Newton condition (6.1). We write this for the Armijo sequence as

(8.2)    ‖F'(xn)dn + F(xn)‖ ≤ ηn ‖F(xn)‖.

Our second extension is to allow more flexibility in the reduction of λ. We
allow for any choice that produces a reduction that satisfies

σ0 λold ≤ λnew ≤ σ1 λold,

where 0 < σ0 < σ1 < 1. One such method, developed in § 8.3, is to minimize
an interpolating polynomial based on the previous trial steps. The danger is
that the minimum may be too near zero to be of much use and, in fact, the
iteration may stagnate as a result. The parameter σ0 safeguards against that.
Safeguarding is an important aspect of many globally convergent methods and
has been used for many years in the context of the polynomial interpolation
algorithms we discuss in § 8.3.1 [54], [63], [86], [133], [87]. Typical values of σ0
and σ1 are .1 and .5. Our algorithm nsola incorporates these ideas.
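One concrete realization of such a safeguarded reduction, in the spirit of the polynomial models of § 8.3.1 but not necessarily the choice made in nsola, fits a parabola to φ(λ) = ‖F(x + λd)‖^2 using φ(0), an approximate slope at zero, and the rejected trial value, and then clamps the minimizer to [σ0 λ, σ1 λ]. The MATLAB fragment below is such an illustrative sketch; the slope −2‖F(x)‖^2 is exact only for the exact Newton direction.

% Sketch only: safeguarded parabolic reduction of a rejected steplength lambda.
% normFx = ||F(x)||, normFt = ||F(x + lambda*d)||, sigma0/sigma1 are the safeguards.
function lambda_new = parab_reduce(lambda, normFx, normFt, sigma0, sigma1)
  phi0  = normFx^2;
  dphi0 = -2*normFx^2;                          % model slope at lambda = 0
  c     = (normFt^2 - phi0 - dphi0*lambda) / lambda^2;
  if c <= 0
      lambda_new = sigma1*lambda;               % no interior minimizer; take the largest allowed cut
  else
      lambda_new = -dphi0 / (2*c);              % minimizer of the quadratic model
      lambda_new = max(sigma0*lambda, min(sigma1*lambda, lambda_new));
  end
end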
While (8.2) is motivated by the Newton-iterative paradigm, the analysis
here applies equally to methods that solve the equations for the Newton step
directly (so ηn = 0), approximate the Newton step by a quasi-Newton method,
or even use the chord method provided that the resulting direction dn satisfies
(8.2). In Exercise 8.5.9 you are asked to implement the Armijo rule in the
context of Algorithm nsol.
Algorithm 8.2.1. nsola(x, F, τ, η).
1. r0 = ‖F(x)‖

2. Do while ‖F(x)‖ > τr r0 + τa
(a) Find d such that ‖F'(x)d + F(x)‖ ≤ η‖F(x)‖
If no such d can be found, terminate with failure.




(b) λ = 1
i. xt = x + λd
ii. If ‖F(xt)‖ < (1 − αλ)‖F(x)‖ then x = xt
else
Choose σ ∈ [σ0, σ1]
λ = σλ
goto 2(b)i

Note that step 2a must allow for the possibility that F' is ill-conditioned
and that no direction can be found that satisfies (8.2). If, for example, step
2a were implemented with a direct factorization method, ill-conditioning of F'
might be detected as part of the factorization.
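For orientation, one pass through steps 2a and 2b with a direct solve for the direction (so η = 0) and a fixed reduction factor might be written as below. This is only an illustrative sketch; the names are not those of the MATLAB code nsola, the rcond test is a crude stand-in for detecting that no usable direction exists, and no limit is placed on the number of reductions.

% Sketch only: one Newton-Armijo iteration with a direct solve and sigma = 1/2.
alpha = 1.e-4;                           % sufficient decrease parameter
Fx = F(x);
Jx = J(x);
if rcond(full(Jx)) < eps
    error('nsola_sketch: Jacobian appears singular to working precision');
end
d = -Jx \ Fx;                            % inexact Newton condition (8.2) holds with eta = 0
lambda = 1;
xt = x + lambda*d;
while norm(F(xt)) >= (1 - alpha*lambda)*norm(Fx)
    lambda = lambda/2;                   % fixed sigma; see the safeguarded reduction sketched above
    xt = x + lambda*d;
end
x = xt;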
Let {xn} be the iteration produced by Algorithm nsola with initial iterate
x0. The algorithm can fail in some obvious ways:
1. F'(xn) is singular for some n. The inner iteration could terminate with
a failure.

2. xn → x̄, a local minimum of ‖F‖ which is not a root.

3. ‖xn‖ → ∞.
Clearly if F has no roots, the algorithm must fail. We will see that if F has no
roots then either F'(xn) has a limit point which is singular or {xn} becomes
unbounded.
The convergence theorem is the remarkable statement that if {xn} and
{‖F'(xn)^{-1}‖} are bounded (thereby eliminating premature failure and local
minima of ‖F‖ that are not roots) and F' is Lipschitz continuous in a
neighborhood of the entire sequence {xn}, then the iteration converges to a
root of F at which the standard assumptions hold, full steps are taken in the
terminal phase, and the convergence is q-quadratic.
We begin with a formal statement of our assumption that F' is uniformly
Lipschitz continuous and bounded away from zero on the sequence {xn}.
Assumption 8.2.1. There are r, γ, mf > 0 such that F is defined, F' is
Lipschitz continuous with Lipschitz constant γ, and ‖F'(x)^{-1}‖ ≤ mf on the
set

Ω({xn}, r) = ∪_{n=0}^∞ {x | ‖x − xn‖ ≤ r}.

Assumption 8.2.1 implies that the line search phase (step 2b in Algorithm
nsola) will terminate in a number of cycles that is independent of n. We show
this, as do [25] and [70], by giving a uniform lower bound on λn (assuming
that F(x0) ≠ 0). Note how the safeguard bound σ0 enters this result. Note
also the very modest conditions on ηn.
Lemma 8.2.1. Let x0 ∈ R^N and α ∈ (0, 1) be given. Assume that {xn} is
given by Algorithm nsola with

(8.3)    {ηn} ⊂ (0, η̄] ⊂ (0, 1 − α)




and that Assumption 8.2.1 holds. Then either F(x0) = 0 or

(8.4)    λn ≥ λ̄ = σ0 min{ r / (mf ‖F(x0)‖), 2(1 − α − η̄) / (mf^2 ‖F(x0)‖ (1 + η̄)^2 γ) }.

Proof. Let n ≥ 0. By the fundamental theorem of calculus and (8.2) we
have, for any λ ∈ [0, 1]

F(xn + λdn) = F(xn) + λF'(xn)dn + ∫_0^1 (F'(xn + tλdn) − F'(xn)) λdn dt

            = (1 − λ)F(xn) + λξn + ∫_0^1 (F'(xn + tλdn) − F'(xn)) λdn dt,

where, by the bound ηn ≤ η̄,
