problem d. This algorithm is not the whole story, as once the trust region problem is solved approximately, one must use f(x_c + d) to compute ared and then make a decision on how the trust region radius should be changed. Our formulation differs from that in [247] in that the termination criterion measures relative residuals in the l2-norm rather than in the C-norm. This change in the norm has no effect on the analysis in [247], and, therefore, we can apply the results in §2.5 directly to draw conclusions about local convergence.

Algorithm 3.3.7. trcg(d, x, f, M, η, Δ, kmax)

1. r = −∇f(x), ρ_0 = ‖r‖_2^2, k = 1, d = 0

2. Do While √ρ_{k−1} > η‖∇f(x)‖_2 and k < kmax

   (a) z = Mr

   (b) τ_{k−1} = z^T r

   (c) if k = 1 then β = 0 and p = z
       else
       β = τ_{k−1}/τ_{k−2}, p = z + βp

   (d) w = ∇²f(x)p
       If p^T w ≤ 0 then
       Find τ such that ‖d + τp‖_C = Δ
       d = d + τp; return

   (e) α = τ_{k−1}/p^T w

   (f) r = r − αw

   (g) ρ_k = r^T r

   (h) d̂ = d + αp

   (i) If ‖d̂‖_C > Δ then
       Find τ such that ‖d + τp‖_C = Δ
       d = d + τp; return

   (j) d = d̂; k = k + 1
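As a concrete illustration, the loop above can be sketched in Python. This is an illustrative translation, not the book's MATLAB trcg; the function names, the helper `_to_boundary`, and the calling convention are ours. The running vectors `cp` and `cd` carry Cp and Cd along with the iteration so that the C-norm never requires forming C itself.

```python
import numpy as np

def trcg(grad, hessvec, mprec, delta, eta, kmax):
    """Sketch of Algorithm 3.3.7: preconditioned CG for the trust region
    subproblem.  grad is the gradient vector at x, hessvec(p) applies the
    Hessian, and mprec(r) applies the preconditioner M; C = M^{-1} defines
    the trust region norm, and Cp, Cd are carried along so the C-norm is
    available without C."""
    n = grad.size
    d, cd = np.zeros(n), np.zeros(n)
    r = -grad
    rho = r @ r                                  # rho_0 = ||r||_2^2
    tol = eta * np.linalg.norm(grad)
    p = cp = tau_prev = None
    k = 1
    while np.sqrt(rho) > tol and k < kmax:
        z = mprec(r)                             # (a) z = M r
        tau = z @ r                              # (b)
        if k == 1:
            p, cp = z, r.copy()                  # Cz = C M r = r
        else:
            beta = tau / tau_prev                # (c)
            p = z + beta * p
            cp = r + beta * cp                   # update Cp without C
        w = hessvec(p)                           # (d) w = Hessian * p
        curv = p @ w
        if curv <= 0.0:                          # negative curvature
            return _to_boundary(d, cd, p, cp, delta)
        alpha = tau / curv                       # (e)
        r = r - alpha * w                        # (f)
        rho = r @ r                              # (g)
        d_hat, cd_hat = d + alpha * p, cd + alpha * cp   # (h)
        if np.sqrt(d_hat @ cd_hat) > delta:      # (i) ||d_hat||_C > Delta
            return _to_boundary(d, cd, p, cp, delta)
        d, cd = d_hat, cd_hat                    # (j)
        tau_prev = tau
        k += 1
    return d

def _to_boundary(d, cd, p, cp, delta):
    # positive root of ||d + t p||_C^2 = delta^2, written with Cd and Cp
    a, b, c = p @ cp, 2.0 * (d @ cp), d @ cd - delta ** 2
    t = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return d + t * p
```

With M = I and a generous radius this reduces to plain CG on the Newton system, and with a small radius it returns a step of exact C-norm Δ.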

Algorithm trcg does what we would expect a dogleg algorithm to do in that the piecewise linear path determined by the iteration moves monotonically away from x (in the ‖·‖_C-norm!) and the quadratic model decreases on that path [247]. Algorithm trcg will, therefore, compute the same Newton step as Algorithm fdpcg. One might think that it may be difficult to compute the C-norm if one has, for example, a way to compute the action of M on a vector that does not require computation of the matrix C. However, at the cost of storing two additional vectors we can update Cp and Cd as the iteration progresses. So, when p is updated to z + βp, then Cp = r + βCp can be updated at the same time without computing the product of C with p. Then ‖p‖_C = √(p^T Cp). Similarly, d = d + τp implies that Cd = Cd + τCp.
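The update rule is easy to check numerically. In the sketch below we build a random SPD preconditioner M and, purely for verification, its inverse C (which one would never form in practice), then confirm that the running copy of Cp tracks the true product through several CG-style updates. The random residuals stand in for the actual CG residual sequence; the identity Cz = r holds for whatever r produced z = Mr.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
# random SPD preconditioner; C = M^{-1} is formed here only to verify
B = rng.standard_normal((n, n))
M = B @ B.T + n * np.eye(n)
C = np.linalg.inv(M)

r = rng.standard_normal(n)
z = M @ r
p, cp = z.copy(), r.copy()         # k = 1: p = z = M r, so C p = r
for _ in range(3):                 # a few CG-style direction updates
    beta = rng.uniform(0.5, 1.5)
    r = rng.standard_normal(n)     # stand-in for the updated residual
    z = M @ r
    p = z + beta * p
    cp = r + beta * cp             # Cp = Cz + beta*Cp = r + beta*Cp
    assert np.allclose(C @ p, cp)  # running copy matches the true C p
    assert np.isclose(np.sqrt(p @ cp), np.sqrt(p @ C @ p))  # ||p||_C
```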

Algorithm cgtrust combines the solution of the trust region problem from trcg, the trust region radius adjustment scheme from trtest, and (indirectly) the locally convergent algorithm newtcg. The result fits nicely into our paradigm algorithm trgen.

Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR18.html.

Copyright ©1999 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.

GLOBAL CONVERGENCE 65

Algorithm 3.3.8. cgtrust(x, f, τ)

1. Initialize Δ, M, η, kmax.

2. Do forever

   (a) Let x_c = x. Compute ∇f(x_c).

   (b) Call trcg(d, x, f, M, η, Δ, kmax) to solve the trust region subproblem.
       Set x_t = x + d.

   (c) Call trtest(x_c, x_t, x, f, Δ), solving the trust region subproblem with Algorithm trcg.

   (d) Update η.
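A self-contained sketch of this outer loop follows. To keep it short we specialize to M = I (so the C-norm is the ordinary ℓ2-norm), inline trtest as a standard ared/pred test using the parameter values quoted later in (3.51), and hold the forcing term η fixed; all names and defaults are ours, not those of Kelley's MATLAB cgtrust.

```python
import numpy as np

def trcg_euclid(g, hv, delta, eta, kmax):
    # Algorithm 3.3.7 with M = I, so tau_{k-1} = rho_{k-1}
    d = np.zeros_like(g)
    r = -g
    p = r.copy()
    rho = r @ r
    tol = eta * np.sqrt(rho)
    for _ in range(kmax):
        if np.sqrt(rho) <= tol:
            break
        w = hv(p)
        curv = p @ w
        if curv <= 0.0:                      # negative curvature: to the boundary
            return _boundary(d, p, delta)
        alpha = rho / curv
        d_new = d + alpha * p
        if np.linalg.norm(d_new) > delta:    # step leaves the trust region
            return _boundary(d, p, delta)
        d = d_new
        r = r - alpha * w
        rho_new = r @ r
        p = r + (rho_new / rho) * p
        rho = rho_new
    return d

def _boundary(d, p, delta):
    # positive root of ||d + t p||^2 = delta^2
    a, b, c = p @ p, 2.0 * (d @ p), d @ d - delta ** 2
    return d + ((-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)) * p

def cgtrust(x, f, grad, hv, tau=1e-8, delta=1.0, maxit=200):
    # Sketch of Algorithm 3.3.8; mu/omega values are those of (3.51)
    mu0, mu_low, mu_high = 0.25, 0.25, 0.75
    omega_down, omega_up = 0.5, 2.0
    eta = 0.1                                # fixed forcing term for this sketch
    for _ in range(maxit):
        g = grad(x)
        if np.linalg.norm(g) < tau:
            break
        hvx = lambda p: hv(x, p)
        while True:                          # trtest: shrink until acceptable
            d = trcg_euclid(g, hvx, delta, eta, 40)
            xt = x + d
            ared = f(x) - f(xt)                          # actual reduction
            pred = -(g @ d + 0.5 * (d @ hvx(d)))         # model reduction
            if ared >= mu0 * pred:
                break
            delta *= omega_down
        if ared / pred >= mu_high:
            delta *= omega_up                # very successful: expand radius
        elif ared / pred < mu_low:
            delta *= omega_down
        x = xt
    return x
```

On a convex quadratic the model is exact, so every trial step is accepted and the radius grows until the loop is taking (inexact) Newton steps.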

Theorem 3.3.10 combines several results from [247].

Theorem 3.3.10. Let f be twice Lipschitz continuously differentiable. Let M be a given positive definite matrix and let {η_n} satisfy 0 < η_n < 1 for all n. Let {x_n} be the sequence generated by Algorithm cgtrust and assume that {‖∇²f(x_n)‖} is bounded. Then

lim_{n→∞} ‖∇f(x_n)‖ = 0.    (3.49)

Moreover, if x* is a local minimizer for which the standard assumptions hold and x_n → x*, then

• if η_n → 0 the convergence is q-superlinear, and

• if η_n ≤ K_η ‖∇f(x_n)‖^p for some K_η > 0 the convergence is q-superlinear with q-order 1 + p.

Finally, there are δ and Δ̄ such that if ‖x_0 − x*‖ ≤ δ and Δ_0 ≤ Δ̄ then x_n → x*.

One can, as we do in the MATLAB code cgtrust, replace the Hessian–vector product with a difference Hessian. The accuracy of the difference Hessian and the loss of symmetry present the potential problem that was mentioned in §2.5. Another, very different, approach is to approximate the exact solution of the trust region subproblem with an iterative method [243].
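A minimal sketch of such a difference Hessian–vector product follows. It forward-differences the gradient along the normalized direction; the increment h and the scaling are illustrative choices, not the exact heuristics of the MATLAB cgtrust.

```python
import numpy as np

def hessvec_fd(grad, x, p, h=1e-7):
    """Approximate the product of the Hessian of f at x with p by a forward
    difference of the gradient along p/||p||, then rescale by ||p||.
    The increment h is an illustrative choice."""
    normp = np.linalg.norm(p)
    if normp == 0.0:
        return np.zeros_like(p)
    return normp * (grad(x + (h / normp) * p) - grad(x)) / h
```

For a quadratic (linear gradient) the difference is exact up to roundoff, which makes the routine easy to sanity-check.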

3.4 Examples

The results we report here used the MATLAB implementations of steepest descent, steep.m, damped Gauss–Newton, gaussn.m, the dogleg trust region algorithm for Newton's method, ntrust.m, and the PCG–dogleg algorithms, cgtrust.m, from the software collection.

Our MATLAB implementation of Algorithm steep guards against extremely poor scaling and very long steps by setting λ to

λ_0 = min(1, 100/(1 + ‖∇f(x)‖))    (3.50)

at the beginning of the line search. We invite the reader in Exercise 3.5.3 to attempt the control example with λ_0 = 1.
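The guard (3.50) is a one-liner; a sketch (the function name is ours):

```python
def lambda0(grad_norm):
    # initial steplength from (3.50): roughly 100/||grad f|| when the
    # gradient is huge, and exactly 1 once ||grad f|| <= 99
    return min(1.0, 100.0 / (1.0 + grad_norm))
```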

We not only present plots, which are an efficient way to understand convergence rates, but we also report counts of function, gradient, and Hessian evaluations and the results of the MATLAB flops command.




[Figure omitted: two pairs of log-scale plots of function value and gradient norm against iterations, for the dogleg iteration (top) and steepest descent (bottom).]

Figure 3.1: Steepest Descent and Newton–Dogleg for Parameter ID Problem

[Figure omitted: two pairs of log-scale plots of function value and gradient norm against iterations, for damped Gauss–Newton (top) and Levenberg–Marquardt (bottom).]

Figure 3.2: Gauss–Newton and Levenberg–Marquardt for Parameter ID Problem




[Figure omitted: two pairs of log-scale plots of function value and gradient norm against iterations, for dogleg–CG (top) and steepest descent (bottom).]

Figure 3.3: Steepest Descent and Dogleg–CG for Discrete Control Problem

3.4.1 Parameter Identification

We consider the problem from §2.6.1 except we use the initial data x_0 = (5, 5)^T. Both the Gauss–Newton and Newton methods will fail to converge with this initial data without globalization (see Exercise 3.5.14). Newton's method has particular trouble with this problem because the Newton direction is not a descent direction in the early phases of the iteration. The termination criterion and difference increment for the finite difference Hessian were the same as for the computation in §2.6.1.

In Figure 3.1 we compare the performance of the Newton dogleg algorithm with the steepest descent algorithm. Our implementation of the classical dogleg in ntrust uses the standard values

ω_down = .5, ω_up = 2, µ_0 = µ_low = .25, and µ_high = .75.    (3.51)

The plots clearly show the locally superlinear convergence of Newton's method and the linear convergence of steepest descent. However, the graphs do not completely show the difference in computational costs. In terms of gradient evaluations, steepest descent was marginally better than the Newton dogleg algorithm, requiring 50 gradients as opposed to 55 (which includes those needed for the 18 difference Hessian evaluations) for the Newton dogleg algorithm. However, the steepest descent algorithm required 224 function evaluations, while the Newton dogleg needed only 79. As a result, the Newton dogleg code was much more efficient, needing roughly 5 million floating point operations instead of the 10 million needed by the steepest descent code.

In Figure 3.2 we plot the performance of the damped Gauss–Newton and Levenberg–Marquardt algorithms. These exploit the least squares structure of the problem and are locally superlinearly convergent because this is a zero residual problem. They also show that algorithms that effectively exploit the structure of the least squares problem are much more efficient. Gauss–Newton required 6 gradient evaluations, 14 function evaluations, and 750 thousand floating point operations, and Levenberg–Marquardt required 12 gradients, 23 functions, and 1.3




million floating point operations.

3.4.2 Discrete Control Problem

We consider the discrete control problem from §1.6.1 with N = 400, T = 1, y_0 = 0,

L(y, u, t) = (y − 3)² + .5u², and φ(y, u, t) = uy + t².

We chose the poor initial iterate

u_0(t) = 5 + 300 sin(20πt).

This problem can be solved very efficiently with Algorithm cgtrust. In our implementation we use the same parameters from (3.51). In Figure 3.3 we compare the dogleg–CG iteration with steepest descent. We terminated both iterations when ‖∇f‖ < 10⁻⁸. For the dogleg–CG code we used η = .01 throughout the entire iteration and an initial trust region radius of ‖u_0‖. The steepest descent computation required 48 gradient evaluations, 95 function evaluations, and roughly 1 million floating point operations, and dogleg–CG needed 17 gradient evaluations, 21 function evaluations, and roughly 530 thousand floating point operations. Note that the steepest descent algorithm performed very well in the terminal phase of the iteration. The reason for this is that, in this example, the Hessian is near the identity.
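For readers who want to reproduce the setup, here is one plausible discretization of the control problem above: forward Euler for the state equation y′ = φ(y, u, t) and a rectangle rule for the integral of L. The grid layout for the controls and the choice of quadrature are our assumptions for illustration, not necessarily the formulation of §1.6.1.

```python
import numpy as np

N, T, y0 = 400, 1.0, 0.0
h = T / N
t = np.linspace(0.0, T, N + 1)[1:]   # time grid for the controls (assumed layout)

def objective(u):
    # forward-Euler rollout of y' = u*y + t^2 from y(0) = y0, accumulating
    # a rectangle-rule sum of L(y, u, t) = (y - 3)^2 + 0.5*u^2
    y = y0
    total = 0.0
    for j in range(N):
        total += h * ((y - 3.0) ** 2 + 0.5 * u[j] ** 2)
        y = y + h * (u[j] * y + t[j] ** 2)
    return total

u0 = 5.0 + 300.0 * np.sin(20.0 * np.pi * t)   # the poor initial iterate
```

With u ≡ 0 the state is y(t) ≈ t³/3 and the objective is close to ∫(t³/3 − 3)² dt ≈ 8.5, a quick sanity check on the discretization.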

3.5 Exercises on Global Convergence

3.5.1. Let F be a nonlinear function from R^N → R^N. Let

f(x) = ‖F(x)‖²/2.

What is ∇f? When is the Newton step for the nonlinear equation F(x) = 0,

d = −F′(x)⁻¹F(x),

a descent direction for f at x?

3.5.2. Prove Lemma 3.2.1.

3.5.3. Implement Algorithm steep without the scaling fixup in (3.50). Apply this crippled algorithm to the control problem example from §3.4.2. What happens and why?

3.5.4. Show that if f is a convex quadratic then f is bounded from below.

3.5.5. Verify (3.40).

3.5.6. Show that the Levenberg“Marquardt steps computed by (3.20) and (3.21) are the same.

3.5.7. Prove Theorem 3.2.7.

3.5.8. Complete the proof of Theorem 3.3.2.

3.5.9. Prove Theorem 3.3.4.

3.5.10. Look at the trust region algorithm for nonlinear equations from [218] or [84]. What are the

costs of that algorithm that are not present in a line search? When might this trust region

approach have advantages for solving nonlinear equations? Could it be implemented

inexactly?




3.5.11. The double dogleg method [80], [84] puts a new node on the dogleg path in the Newton