
and the current trust region radius ∆. The output is the approximate solution d of the trust region
problem. This algorithm is not the whole story: once the trust region problem is solved
approximately, one must use f(xc + d) to compute ared and then decide how the
trust region radius should be changed. Our formulation differs from that in [247] in that the
termination criterion measures relative residuals in the l2-norm rather than in the C-norm. This
change in the norm has no effect on the analysis in [247], and, therefore, we can apply the results
in §2.5 directly to draw conclusions about local convergence.
Algorithm 3.3.7. trcg(d, x, f, M, η, ∆, kmax)
1. r = −∇f(x), ρ0 = ‖r‖_2², k = 1, d = 0
2. Do While √ρ_{k−1} > η‖∇f(x)‖_2 and k < kmax
   (a) z = M r
   (b) τ_{k−1} = z^T r
   (c) if k = 1 then β = 0 and p = z
       else β = τ_{k−1}/τ_{k−2}, p = z + βp
   (d) w = ∇²f(x)p
       If p^T w ≤ 0 then
           Find τ such that ‖d + τp‖_C = ∆
           d = d + τp; return
   (e) α = τ_{k−1}/p^T w
   (f) r = r − αw
   (g) ρk = r^T r
   (h) d̂ = d + αp
   (i) If ‖d̂‖_C > ∆ then
           Find τ such that ‖d + τp‖_C = ∆
           d = d + τp; return
   (j) d = d̂; k = k + 1

Algorithm trcg does what we would expect a dogleg algorithm to do in that the piecewise
linear path determined by the iteration moves monotonically away from x (in the ‖·‖_C-norm!)
and the quadratic model decreases on that path [247]. Algorithm trcg will, therefore, compute
the same Newton step as Algorithm fdpcg. One might think that it may be difficult to compute
the C-norm if one has, for example, a way to compute the action of M on a vector that does
not require computation of the matrix C. However, at the cost of storing two additional vectors
we can update Cp and Cd as the iteration progresses. So, when p is updated to z + βp, then
Cp = r + βCp can be updated at the same time without computing the product of C with p.
Then ‖p‖_C = √(p^T Cp). Similarly, d = d + τp implies that Cd = Cd + τCp.
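This bookkeeping can be checked numerically on a small example. In the sketch below, C is formed explicitly, but only to verify the result; the whole point of the recursion is that the algorithm itself never needs C. The setup (random SPD C, stand-in residuals) is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
C = B @ B.T + 4 * np.eye(4)      # an SPD C, formed here only to verify
M = np.linalg.inv(C)             # preconditioner: M = C^{-1}

# first step (k = 1): p = z = M r, so C p = C M r = r
r = rng.standard_normal(4)
p, Cp = M @ r, r.copy()
tau_prev = (M @ r) @ r

# a later step: p = z + beta p is mirrored by Cp = r + beta Cp
r = rng.standard_normal(4)       # stand-in for the updated residual
z = M @ r
beta = (z @ r) / tau_prev
p, Cp = z + beta * p, r + beta * Cp

print(np.allclose(Cp, C @ p))    # the invariant Cp = C p holds
print(np.sqrt(p @ Cp))           # ||p||_C computed without forming C
```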
Algorithm cgtrust combines the solution of the trust region problem from trcg, the
trust region radius adjustment scheme from trtest, and (indirectly) the locally convergent
algorithm newtcg. The result fits nicely into our paradigm algorithm trgen.



Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR18.html.
Copyright ©1999 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.




GLOBAL CONVERGENCE 65

Algorithm 3.3.8. cgtrust(x, f, τ)
1. Initialize ∆, M, η, kmax.
2. Do forever
   (a) Let xc = x. Compute ∇f(xc).
   (b) Call trcg(d, x, f, M, η, ∆, kmax) to solve the trust region subproblem.
       Set xt = x + d.
   (c) Call trtest(xc, xt, x, f, ∆),
       solving the trust region subproblem with Algorithm trcg.
   (d) Update η.
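A simplified Python sketch may make the outer control flow concrete. The radius-update logic here is a bare-bones stand-in for trtest, using the ω and μ values of (3.51); the actual trtest and the MATLAB cgtrust.m differ in detail, and solve_tr is a hypothetical stand-in for trcg.

```python
import numpy as np

def cgtrust_sketch(x, f, grad, hessvec, solve_tr, Delta, tol=1e-8, maxit=200):
    # Outer loop in the spirit of Algorithm cgtrust.  solve_tr(x, Delta)
    # plays the role of trcg; the ared/pred test and radius update are a
    # minimal stand-in for trtest with the parameters of (3.51).
    for _ in range(maxit):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        d = solve_tr(x, Delta)
        ared = f(x) - f(x + d)                     # actual reduction
        pred = -(g @ d + 0.5 * d @ hessvec(x, d))  # predicted reduction
        ratio = ared / pred
        if ratio < 0.25:            # mu_0 = mu_low: reject, shrink radius
            Delta *= 0.5            # omega_down
        else:
            x = x + d               # accept the trial point
            if ratio > 0.75:        # mu_high: expand the radius
                Delta *= 2.0        # omega_up
    return x
```

For a quadratic model that matches f exactly, ared/pred is 1, every step is accepted, and the radius grows until the full Newton step fits.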

Theorem 3.3.10 combines several results from [247].
Theorem 3.3.10. Let f be twice Lipschitz continuously differentiable. Let M be a given
positive definite matrix and let {ηn} satisfy 0 < ηn < 1 for all n. Let {xn} be the sequence
generated by Algorithm cgtrust and assume that {‖∇²f(xn)‖} is bounded. Then

(3.49)                lim_{n→∞} ‖∇f(xn)‖ = 0.

Moreover, if x∗ is a local minimizer for which the standard assumptions hold and xn → x∗,
then
• if ηn → 0 the convergence is q-superlinear, and
• if ηn ≤ Kη‖∇f(xn)‖^p for some Kη > 0 the convergence is q-superlinear with q-order
1 + p.
Finally, there are δ and ∆̄ such that if ‖x0 − x∗‖ ≤ δ and ∆0 ≤ ∆̄ then xn → x∗.
One can, as we do in the MATLAB code cgtrust, replace the Hessian–vector product
with a difference Hessian. The accuracy of the difference Hessian and the loss of symmetry
present the potential problem that was mentioned in §2.5. Another, very different, approach is
to approximate the exact solution of the trust region subproblem with an iterative method [243].
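For instance, a forward-difference Hessian–vector product can be sketched as follows. This is a generic illustration, not the MATLAB code's interface; the increment and its scaling are assumptions.

```python
import numpy as np

def fd_hessvec(grad, x, p, h=1e-7):
    # Forward-difference approximation to the Hessian-vector product:
    # (grad(x + delta p) - grad(x)) / delta ~= Hessian(x) @ p.
    # The result is not exactly linear/symmetric in p, which is the
    # loss of symmetry referred to in the text.
    norm_p = np.linalg.norm(p)
    if norm_p == 0.0:
        return np.zeros_like(x)
    delta = h / norm_p        # perturb x by roughly h in norm
    return (grad(x + delta * p) - grad(x)) / delta

# check on a quadratic, where the formula is exact up to rounding
A = np.array([[2.0, 1.0], [1.0, 3.0]])
grad = lambda x: A @ x
print(fd_hessvec(grad, np.array([1.0, -1.0]), np.array([0.5, 2.0])))
# approximately A @ p
```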


3.4 Examples
The results we report here used the MATLAB implementations of steepest descent, steep.m,
damped Gauss–Newton, gaussn.m, the dogleg trust region algorithm for Newton's method,
ntrust.m, and the PCG–dogleg algorithm, cgtrust.m, from the software collection.
Our MATLAB implementation of Algorithm steep guards against extremely poor scaling
and very long steps by setting λ to

(3.50)                λ0 = min(1, 100/(1 + ‖∇f(x)‖))

at the beginning of the line search. We invite the reader in Exercise 3.5.3 to attempt the control
example with λ0 = 1.
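The safeguard (3.50) is simple enough to state in a few lines; this sketch (plain Python, not the book's steep.m) shows its effect for a moderate and a very large gradient:

```python
def initial_lambda(grad_norm):
    # lambda_0 = min(1, 100/(1 + ||grad f(x)||)) from (3.50): for
    # moderate gradients the full step lambda = 1 is tried first; a
    # huge gradient cuts the first step length back proportionally.
    return min(1.0, 100.0 / (1.0 + grad_norm))

print(initial_lambda(10.0))   # 1.0: no cutback needed
print(initial_lambda(1.0e6))  # roughly 1e-4: the long step is guarded
```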
We not only present plots, which are an efficient way to understand convergence rates, but we
also report counts of function, gradient, and Hessian evaluations and the results of the MATLAB
flops command.





[Figure 3.1: Steepest Descent and Newton–Dogleg for Parameter ID Problem. Four log-scale panels plot gradient norm and function value against iterations for the dogleg and steepest descent runs.]



[Figure 3.2: Gauss–Newton and Levenberg–Marquardt for Parameter ID Problem. Four log-scale panels plot gradient norm and function value against iterations for the damped Gauss–Newton and Levenberg–Marquardt runs.]





[Figure 3.3: Steepest Descent and Dogleg–CG for Discrete Control Problem. Four log-scale panels plot gradient norm and function value against iterations for the dogleg–CG and steepest descent runs.]


3.4.1 Parameter Identification
We consider the problem from §2.6.1 except that we use the initial data x0 = (5, 5)^T. Both the Gauss–
Newton and Newton methods will fail to converge with this initial data without globalization (see
Exercise 3.5.14). Newton's method has particular trouble with this problem because the Newton
direction is not a descent direction in the early phases of the iteration. The termination criterion
and difference increment for the finite difference Hessian were the same as for the computation
in §2.6.1.
In Figure 3.1 we compare the performance of the Newton dogleg algorithm with the steepest
descent algorithm. Our implementation of the classical dogleg in ntrust uses the standard
values

(3.51)                ωdown = .5, ωup = 2, μ0 = μlow = .25, and μhigh = .75.

The plots clearly show the locally superlinear convergence of Newton's method and the linear
convergence of steepest descent. However, the graphs do not completely show the difference
in computational costs. In terms of gradient evaluations, steepest descent was marginally better
than the Newton dogleg algorithm, requiring 50 gradients as opposed to 55 (which includes those
needed for the 18 difference Hessian evaluations) for the Newton dogleg algorithm. However, the
steepest descent algorithm required 224 function evaluations, while the Newton dogleg needed
only 79. As a result, the Newton dogleg code was much more efficient, needing roughly 5 million
floating point operations instead of the 10 million needed by the steepest descent code.
In Figure 3.2 we plot the performance of the damped Gauss–Newton and Levenberg–
Marquardt algorithms. These exploit the least squares structure of the problem and are locally
superlinearly convergent because this is a zero residual problem. They also show that algo-
rithms that effectively exploit the structure of the least squares problem are much more efficient.
Gauss–Newton required 6 gradient evaluations, 14 function evaluations, and 750 thousand float-
ing point operations, and Levenberg–Marquardt required 12 gradients, 23 functions, and 1.3



million floating point operations.

3.4.2 Discrete Control Problem
We consider the discrete control problem from §1.6.1 with N = 400, T = 1, y0 = 0,

L(y, u, t) = (y − 3)² + .5u²   and   φ(y, u, t) = uy + t².

We chose the poor initial iterate

u0(t) = 5 + 300 sin(20πt).
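The initial iterate is easy to reproduce on a grid. The sketch below assumes a uniform mesh of N = 400 points on (0, T]; the exact grid used by the MATLAB codes may differ.

```python
import numpy as np

N, T = 400, 1.0
t = np.linspace(T / N, T, N)            # assumed uniform grid on (0, T]
u0 = 5 + 300 * np.sin(20 * np.pi * t)   # the poor initial iterate
print(u0.min(), u0.max())               # oscillates between -295 and 305
```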

This problem can be solved very efficiently with Algorithm cgtrust. In our implementa-
tion we use the same parameters from (3.51). In Figure 3.3 we compare the dogleg–CG iteration
with steepest descent. We terminated both iterations when ‖∇f‖ < 10⁻⁸. For the dogleg–CG
code we used η = .01 throughout the entire iteration and an initial trust region radius of ‖u0‖.
The steepest descent computation required 48 gradient evaluations, 95 function evaluations, and
roughly 1 million floating point operations, and dogleg–CG needed 17 gradient evaluations, 21
function evaluations, and roughly 530 thousand floating point operations. Note that the steepest
descent algorithm performed very well in the terminal phase of the iteration. The reason for this
is that, in this example, the Hessian is near the identity.


3.5 Exercises on Global Convergence
3.5.1. Let F be a nonlinear function from R^N to R^N. Let

f(x) = ‖F(x)‖²/2.

What is ∇f? When is the Newton step for the nonlinear equation F(x) = 0,

d = −F′(x)⁻¹F(x),

a descent direction for f at x?

3.5.2. Prove Lemma 3.2.1.

3.5.3. Implement Algorithm steep without the scaling fixup in (3.50). Apply this crippled
algorithm to the control problem example from §3.4.2. What happens and why?

3.5.4. Show that if f is a convex quadratic then f is bounded from below.

3.5.5. Verify (3.40).

3.5.6. Show that the Levenberg–Marquardt steps computed by (3.20) and (3.21) are the same.

3.5.7. Prove Theorem 3.2.7.

3.5.8. Complete the proof of Theorem 3.3.2.

3.5.9. Prove Theorem 3.3.4.

3.5.10. Look at the trust region algorithm for nonlinear equations from [218] or [84]. What are the
costs of that algorithm that are not present in a line search? When might this trust region
approach have advantages for solving nonlinear equations? Could it be implemented
inexactly?



3.5.11. The double dogleg method [80], [84] puts a new node on the dogleg path in the Newton
