    ‖s_k‖ ≥ ‖∇f(x_k)‖/M.

Hence, we need only consider the case in which

    ‖s_k‖ = ∆_k  and  ‖s_k‖ < ‖∇f(x_k)‖,                                        (3.34)

since if (3.34) does not hold then (3.32) holds with M_T = min(1, 1/M).
We will complete the proof by showing that if (3.34) holds and s_k is accepted, then

    ‖s_k‖ = ∆_k ≥ (2σ min(1 − µ_high, (1 − µ_0)ω_up^{-2})/(M + L)) ‖∇f(x_k)‖.   (3.35)

This will complete the proof with

    M_T = min(1, 1/M, 2σ min(1 − µ_high, (1 − µ_0)ω_up^{-2})/(M + L)).
Now increase the constant M > 0 in part 1 of Assumption 3.3.1 if needed so that

    ‖H_k‖ ≤ M  for all k.                                                       (3.36)

We prove (3.35) by showing that if (3.34) holds and (3.35) does not hold for a trial step s_t,
then the trust region radius will be expanded and the step corresponding to the larger radius will
be acceptable. Let s_t be a trial step such that ‖s_t‖ < ‖∇f(x_k)‖ and

    ‖s_t‖ = ∆_t < (2σ min(1 − µ_high, (1 − µ_0)ω_up^{-2})/(M + L)) ‖∇f(x_k)‖.   (3.37)
We use the Lipschitz continuity of ∇f and (3.36) to obtain

    ared_t = −∇f(x_k)^T s_t − ∫_0^1 (∇f(x_k + t s_t) − ∇f(x_k))^T s_t dt

           = pred_t + s_t^T H_k s_t/2 − ∫_0^1 (∇f(x_k + t s_t) − ∇f(x_k))^T s_t dt

           ≥ pred_t − (M + L)‖s_t‖^2/2.


Therefore, using (3.30) from Assumption 3.3.1, we have

    ared_t/pred_t ≥ 1 − (M + L)‖s_t‖^2/(2 pred_t)
                  ≥ 1 − (M + L)‖s_t‖^2/(2σ‖∇f(x_k)‖ min(‖∇f(x_k)‖, ‖s_t‖)).    (3.38)

Now since ‖s_t‖ < ‖∇f(x_k)‖ by (3.34) we have

    min(‖∇f(x_k)‖, ‖s_t‖) = ‖s_t‖

and hence

    ared_t/pred_t ≥ 1 − (M + L)‖s_t‖/(2σ‖∇f(x_k)‖) > µ_high                     (3.39)
by (3.37). Hence, an expansion step will be taken by replacing ∆_t by ∆_t^+ = ω_up ∆_t and s_t by
s_t^+, the minimizer of the quadratic model in the new trust region.
Now (3.38) still holds and, after the expansion,

    ‖s_t^+‖ ≤ ω_up ‖s_t‖ < ω_up ‖∇f(x_k)‖.

So

    min(‖∇f(x_k)‖, ‖s_t^+‖) > ‖s_t^+‖/ω_up.

Hence,

    ared_t^+/pred_t^+ ≥ 1 − (M + L)‖s_t^+‖^2/(2σ‖∇f(x_k)‖ min(‖∇f(x_k)‖, ‖s_t^+‖))

                      ≥ 1 − (M + L)ω_up ‖s_t^+‖/(2σ‖∇f(x_k)‖) ≥ 1 − (M + L)ω_up^2 ‖s_t‖/(2σ‖∇f(x_k)‖) ≥ µ_0
by (3.37). Hence, the expansion will produce an acceptable step. This means that if the final
accepted step satisfies (3.34), it must also satisfy (3.35). This completes the proof.


3.3.3 A Unidirectional Trust Region Algorithm
The most direct way to compute a trial point that satisfies Assumption 3.3.1 is to mimic the line
search and simply minimize the quadratic model in the steepest descent direction subject to the
trust region bound constraints.
In this algorithm, given a current point x_c and trust region radius ∆_c, our trial point is the
minimizer of

    ψ_c(λ) = m_c(x_c − λ∇f(x_c))

subject to the constraint that

    x(λ) = x_c − λ∇f(x_c) ∈ T(∆_c).

Clearly the solution is x(λ̂), where

    λ̂ = ∆_c/‖∇f(x_c)‖                                              if ∇f(x_c)^T H_c ∇f(x_c) ≤ 0,
                                                                                (3.40)
    λ̂ = min( ‖∇f(x_c)‖^2/(∇f(x_c)^T H_c ∇f(x_c)), ∆_c/‖∇f(x_c)‖ )  if ∇f(x_c)^T H_c ∇f(x_c) > 0.



x(λ̂), the minimizer of the quadratic model in the steepest descent direction, subject to the trust
region bounds, is called the Cauchy point. We will denote the Cauchy point by x_c^{CP}.¹

¹ In some of the literature, [84], for example, H_c is assumed to be positive definite and the
Cauchy point is taken to be the global minimizer of the quadratic model.
Then with x^{CP} as trial point, one can use Theorem 3.3.1 to derive a global convergence
theorem for the unidirectional trust region.
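As a concrete illustration (ours, not part of the text), the computation (3.40) is only a few lines
of Python. The sketch below assumes NumPy, takes the gradient ∇f(x_c), the model Hessian H_c, and
the radius ∆_c as arguments, and presumes ∇f(x_c) ≠ 0, since otherwise the iteration has terminated.

import numpy as np

def cauchy_point(x, grad, H, delta):
    # Sketch of (3.40).  The steplength lam is the boundary value
    # delta/||grad|| when the curvature grad^T H grad is not positive;
    # otherwise it is the smaller of the unconstrained minimizer along
    # -grad and the boundary value.
    gnorm = np.linalg.norm(grad)
    curv = grad @ H @ grad
    lam = delta / gnorm
    if curv > 0:
        lam = min(gnorm**2 / curv, delta / gnorm)
    return x - lam * grad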
Theorem 3.3.2. Let ∇f be Lipschitz continuous with Lipschitz constant L. Let {x_k} be
generated by Algorithm trgen with x_t = x^{CP} and (3.40). Assume that the matrices {H_k} are
bounded. Then either f(x_k) is unbounded from below, ∇f(x_k) = 0 for some finite k, or

    lim_{k→∞} ‖∇f(x_k)‖ = 0.



Proof. We show that x_t satisfies part 2 of Assumption 3.3.1. If ‖s_t‖ = ∆_c then the assertion
holds trivially. If ‖s_t‖ < ∆_c then, by definition of x_c^{CP},

    s_t = −(‖∇f(x_c)‖^2/(∇f(x_c)^T H_c ∇f(x_c))) ∇f(x_c).

Hence, if ‖H_c‖ ≤ M,

    ‖s_t‖ ≥ ‖∇f(x_c)‖/M

as asserted.
We leave the proof that x_t satisfies part 1 for the reader (exercise 3.5.8).
The assumptions we used are stronger than those in, for example, [104] and [223], where

    lim inf_{k→∞} ‖∇f(x_k)‖ = 0

rather than ‖∇f(x_k)‖ → 0 is proved.


3.3.4 The Exact Solution of the Trust Region Problem
The theory of constrained optimization [117], [104] leads to a characterization of the solutions
of the trust region problem. In this section we derive that characterization via an elementary
argument (see also [84], [242], and [109]). This book focuses on approximate solutions, but the
reader should be aware that the exact solution can be computed accurately [192], [243].
Theorem 3.3.3. Let g ∈ R^N and let A be a symmetric N × N matrix. Let

    m(s) = g^T s + s^T A s/2.

A vector s is a solution to

    min_{‖s‖≤∆} m(s)                                                            (3.41)

if and only if there is ν ≥ 0 such that

    (A + νI)s = −g

and either ν = 0 or ‖s‖ = ∆.
Proof. If ‖s‖ < ∆ then ∇m(s) = g + As = 0, and the conclusion follows with ν = 0. To
consider the case where ‖s‖ = ∆, let λ_1 ≤ λ_2 ≤ · · · ≤ λ_N be the eigenvalues of A.



Clearly, for any ν,

    m(s) = g^T s + s^T A s/2
         = g^T s + s^T (A + νI)s/2 − ν∆^2/2.

Consider the function, defined for ν > ν_0 = max(0, −λ_1),

    s(ν) = −(A + νI)^{-1} g.

Since

    lim_{ν→∞} ‖s(ν)‖ = 0

and ‖s(ν)‖ is a continuous decreasing function of ν ∈ (ν_0, ∞) we see that if

    lim_{ν→ν_0} ‖s(ν)‖ > ∆

then there is a unique ν such that ‖s(ν)‖ = ∆. Since ν ≥ ν_0, A + νI is positive semidefinite;
therefore, s(ν) is a global minimizer of

    g^T s + s^T (A + νI)s/2.

Hence, we must have

    m(s) ≥ m(s(ν))

for all s such that ‖s‖ = ∆. Hence, s(ν) is a solution of (3.41).
The remaining case is

    lim_{ν→ν_0} ‖s(ν)‖ ≤ ∆.

This implies that g is orthogonal to the nontrivial space S_0 of eigenvectors corresponding to
−ν_0 (for otherwise the limit would be infinite). If we let s = s_1 + s_2, where s_2 is the projection
of s onto S_0, we have

    m(s) = s_1^T g + s_1^T (A + ν_0 I)s_1/2 + s_2^T (A + ν_0 I)s_2/2 − ν_0 ∆^2/2

         = s_1^T g + s_1^T (A + ν_0 I)s_1/2 − ν_0 ∆^2/2.

Hence, m(s) is minimized by setting s_1 equal to the minimum norm solution of (A + ν_0 I)x = −g
(which exists by orthogonality of g to S_0) and letting s_2 be any element of S_0 such that

    ‖s_2‖^2 = ∆^2 − ‖s_1‖^2.

This completes the proof.
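The proof suggests a computation. The Python sketch below (ours; NumPy assumed) recovers the pair
(s, ν) of Theorem 3.3.3 by bisection on ‖s(ν)‖ = ∆. It ignores the hard case handled in the last
paragraph of the proof, which the careful algorithms of [192] and [243] treat properly, and is
meant only to make the characterization concrete.

import numpy as np

def exact_tr_step(A, g, delta, tol=1e-10):
    # Return s and nu >= 0 with (A + nu I)s = -g and either nu = 0 or
    # ||s|| = delta, as in Theorem 3.3.3.
    lam1 = np.linalg.eigvalsh(A)[0]          # smallest eigenvalue of A
    nu0 = max(0.0, -lam1)
    if lam1 > 0:                             # A spd: try the interior solution
        s = np.linalg.solve(A, -g)
        if np.linalg.norm(s) <= delta:
            return s, 0.0
    # ||s(nu)|| is decreasing on (nu0, inf); bracket and bisect for
    # ||s(nu)|| = delta.
    I = np.eye(len(g))
    lo, hi = nu0, nu0 + 1.0
    while np.linalg.norm(np.linalg.solve(A + hi * I, -g)) > delta:
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(np.linalg.solve(A + mid * I, -g)) > delta:
            lo = mid
        else:
            hi = mid
    return np.linalg.solve(A + hi * I, -g), hi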

3.3.5 The Levenberg–Marquardt Parameter
The solution of the trust region problem presented in §3.3.4 suggests that, rather than controlling
∆, one could set

    s_t = −(νI + H_c)^{-1} g,

adjust ν in response to ared/pred instead of ∆, and still maintain global convergence. A natural
application of this idea is control of the Levenberg–Marquardt parameter. This results in a
much simpler algorithm than Levenberg–Marquardt–Armijo in that the stepsize control can be
eliminated. We need only vary the Levenberg–Marquardt parameter as the iteration progresses.


We present the algorithm from [190] to illustrate this point. For an inexact formulation, see
[276].
The Levenberg–Marquardt quadratic model of the least squares objective

    f(x) = (1/2) Σ_{i=1}^M r_i(x)^2 = (1/2) R(x)^T R(x)

with parameter ν_c at the point x_c is

    m_c(x) = f(x_c) + (x − x_c)^T R′(x_c)^T R(x_c)
             + (1/2)(x − x_c)^T (R′(x_c)^T R′(x_c) + ν_c I)(x − x_c).           (3.42)

The minimizer of the quadratic model is the trial point

    x_t = x_c − (R′(x_c)^T R′(x_c) + ν_c I)^{-1} R′(x_c)^T R(x_c),              (3.43)

the step is s = x_t − x_c, and the predicted reduction is

    pred = m(x_c) − m(x_t) = −s^T R′(x_c)^T R(x_c) − (1/2) s^T (R′(x_c)^T R′(x_c) + ν_c I)s

         = −s^T R′(x_c)^T R(x_c) + (1/2) s^T R′(x_c)^T R(x_c) = −(1/2) s^T ∇f(x_c).
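A brief Python sketch of (3.43) follows (ours; NumPy assumed; R and Rprime are user-supplied
callables returning the residual vector and the Jacobian):

import numpy as np

def lm_trial_point(xc, R, Rprime, nu):
    # Solve the regularized normal equations (3.43):
    # (R'(xc)^T R'(xc) + nu I) s = -R'(xc)^T R(xc).
    J, r = Rprime(xc), R(xc)
    s = np.linalg.solve(J.T @ J + nu * np.eye(xc.size), -J.T @ r)
    return xc + s

Forming J^T J squares the conditioning of the problem; an equivalent and more stable route is to
solve the augmented linear least squares problem with matrix [R′(x_c); √ν I] and right-hand side
−[R(x_c); 0].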

The algorithm we present below follows the trust region paradigm and decides on accepting
the trial point and on adjustments in the Levenberg–Marquardt parameter by examining the
ratio

    ared/pred = (f(x_c) − f(x_t)) / (m(x_c) − m(x_t)) = −2 (f(x_c) − f(x_t)) / (s^T ∇f(x_c)).

In addition to the trust region parameters 0 < ω_down < 1 < ω_up and µ_0 ≤ µ_low < µ_high
we require a default value ν_0 of the Levenberg–Marquardt parameter.
The algorithm for testing the trial point differs from Algorithm trtest in that we decrease
(increase) ν rather than increasing (decreasing) a trust region radius if ared/pred is large (small).
We also attempt to set the Levenberg–Marquardt parameter to zero when possible in order to
recover the Gauss–Newton iteration's fast convergence for small residual problems.
Algorithm 3.3.4. trtestlm(x_c, x_t, x_+, f, ν)

1. z = x_c

2. Do while z = x_c

   (a) ared = f(x_c) − f(x_t), s_t = x_t − x_c, pred = −∇f(x_c)^T s_t/2.
   (b) If ared/pred < µ_0 then set z = x_c, ν = max(ω_up ν, ν_0), and recompute the trial
       point with the new value of ν.
   (c) If µ_0 ≤ ared/pred < µ_low, then set z = x_t and ν = max(ω_up ν, ν_0).
   (d) If µ_low ≤ ared/pred, then set z = x_t.
       If µ_high < ared/pred, then set ν = ω_down ν.
       If ν < ν_0, then set ν = 0.

3. x_+ = z.


The Levenberg–Marquardt version of Algorithm trgen is simple to describe and implement.

Algorithm 3.3.5. levmar(x, R, kmax)

1. Set ν = ν_0.

2. For k = 1, . . . , kmax

   (a) Let x_c = x.
   (b) Compute R, f, R′, and ∇f; test for termination.
   (c) Compute x_t using (3.43).
   (d) Call trtestlm(x_c, x_t, x, f, ν)
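The two algorithms are short enough to sketch in full. The Python below is ours, not a reference
implementation: the parameter values are illustrative choices satisfying 0 < ω_down < 1 < ω_up and
µ_0 ≤ µ_low < µ_high, the callables R and Rprime return the residual and Jacobian, and the
termination test is a simple gradient norm check.

import numpy as np

OMEGA_DOWN, OMEGA_UP = 0.5, 2.0            # illustrative; the text requires only
MU_0, MU_LOW, MU_HIGH = 1e-4, 0.25, 0.75   # the orderings stated above
NU_0 = 1e-3                                # default Levenberg-Marquardt parameter

def grad_f(R, Rprime, x):
    return Rprime(x).T @ R(x)              # grad f = R'(x)^T R(x)

def trial_point(R, Rprime, x, nu):
    J, r = Rprime(x), R(x)                 # equation (3.43)
    return x + np.linalg.solve(J.T @ J + nu * np.eye(x.size), -J.T @ r)

def trtestlm(R, Rprime, xc, xt, nu):
    # Sketch of Algorithm trtestlm: accept or reject the trial point and
    # update nu from ared/pred = -2(f(xc) - f(xt))/(s^T grad f(xc)).
    fc, gc = 0.5 * R(xc) @ R(xc), grad_f(R, Rprime, xc)
    while True:
        st = xt - xc
        ared = fc - 0.5 * R(xt) @ R(xt)
        pred = -0.5 * gc @ st
        if ared / pred < MU_0:             # step (b): reject, increase nu, retry
            nu = max(OMEGA_UP * nu, NU_0)
            xt = trial_point(R, Rprime, xc, nu)
            continue
        if ared / pred < MU_LOW:           # step (c): accept but increase nu
            return xt, max(OMEGA_UP * nu, NU_0)
        if ared / pred > MU_HIGH:          # step (d): very good step, decrease nu
            nu = OMEGA_DOWN * nu
            if nu < NU_0:
                nu = 0.0                   # try to recover Gauss-Newton
        return xt, nu

def levmar(x, R, Rprime, kmax, gtol=1e-8):
    # Sketch of Algorithm levmar.
    nu = NU_0
    for _ in range(kmax):
        if np.linalg.norm(grad_f(R, Rprime, x)) <= gtol:   # termination test
            break
        xt = trial_point(R, Rprime, x, nu)
        x, nu = trtestlm(R, Rprime, x, xt, nu)
    return x

For example, with R returning the residuals of a data-fitting problem and Rprime its Jacobian,
levmar(np.zeros(n), R, Rprime, 100) runs the iteration.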

We state a convergence result [190], [276] without proof.

Theorem 3.3.4. Let R be Lipschitz continuously differentiable. Let {x_k} and {ν_k} be the
sequence of iterates and Levenberg–Marquardt parameters generated by Algorithm levmar
with kmax = ∞. Assume that {ν_k} is bounded from above. Then either R′(x_k)^T R(x_k) = 0
for some finite k or

    lim_{k→∞} R′(x_k)^T R(x_k) = 0.

Moreover, if x^* is a limit point of {x_k} for which R(x^*) = 0 and R′(x^*) has full rank, then
x_k → x^* q-quadratically and ν_k = 0 for k sufficiently large.


3.3.6 Superlinear Convergence: The Dogleg
The convergence of the unidirectional trust region iteration can be as slow as that for steepest
descent. To improve the convergence speed in the terminal phase we must allow for approximations
to the Newton direction. The power of trust region methods is the ease with which
the transition from steepest descent, with its good global properties, to Newton's method can be
managed.
We define the Newton point at x_c as

    x_c^N = x_c − H_c^{-1} ∇f(x_c).

If H_c is spd, the Newton point is the global minimizer of the local quadratic model. On the other
hand, if H_c has directions of negative curvature the local quadratic model will not have a finite
minimizer, but the Newton point is still useful. Note that if H = I the Newton point and the
Cauchy point are the same if the Newton point is inside the trust region.
We will restrict our attention to a special class of algorithms that approximate the solution
of the trust region problem by minimizing m_c along a piecewise linear path S ⊂ T(∆). These
paths are sometimes called doglegs because of the shapes of the early examples [84], [80], [218],
[217], [220]. In the case where ∇²f(x) is spd, one may think of the dogleg path as a piecewise
linear approximation to the path with parametric representation

    {x − (λI + ∇²f(x))^{-1} ∇f(x) | 0 ≤ λ}.

This is the path on which the exact solution of the trust region problem lies.
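To make the construction concrete before turning to the details, here is a minimal Python sketch
of the classical dogleg step for the case in which H_c is spd (ours; NumPy assumed). It returns
the step s rather than the point x_c + s, and it omits the safeguards needed for indefinite H_c
and for the node condition (3.44) discussed next.

import numpy as np

def dogleg_step(g, H, delta):
    # Follow the path x_c -> x_CP* -> x_N, stopping where it first
    # leaves the trust region.
    curv = g @ H @ g
    s_cp = -(g @ g / curv) * g              # step to x_CP* (curv > 0 for spd H)
    if np.linalg.norm(s_cp) >= delta:       # boundary hit on the first leg
        return -(delta / np.linalg.norm(g)) * g
    s_n = np.linalg.solve(H, -g)            # step to the Newton point
    if np.linalg.norm(s_n) <= delta:        # whole path inside: take Newton
        return s_n
    # Second leg: s_cp + t (s_n - s_cp); pick t in [0, 1] on the boundary.
    d = s_n - s_cp
    a, b, c = d @ d, 2.0 * (s_cp @ d), s_cp @ s_cp - delta**2
    t = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return s_cp + t * d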
The next step up from the unidirectional path, the classical dogleg path [220], has as many
as three nodes, x_c, x_c^{CP*}, and x_c^N. Here x_c^{CP*} is the global minimizer of the quadratic model in
the steepest descent direction, which will exist if and only if ∇f(x_c)^T H_c ∇f(x_c) > 0. If x_c^{CP*}
exists and

    (x_c^N − x_c^{CP*})^T (x_c^{CP*} − x_c) > 0,                                (3.44)
