\[ \|s_k\| \ge \|\nabla f(x_k)\|/M. \]

Hence, we need only consider the case in which

\[ \|s_k\| = \Delta_k \quad\text{and}\quad \|s_k\| < \|\nabla f(x_k)\|, \tag{3.34} \]

since if (3.34) does not hold then (3.32) holds with $M_T = \min(1, 1/M)$.

We will complete the proof by showing that if (3.34) holds and $s_k$ is accepted, then

\[ \|s_k\| = \Delta_k \ge \frac{2\sigma \min\bigl(1 - \mu_{high},\, (1 - \mu_0)\omega_{up}^{-2}\bigr)}{M + L}\, \|\nabla f(x_k)\|. \tag{3.35} \]

This will complete the proof with

\[ M_T = \min\left(1,\; 1/M,\; \frac{2\sigma \min\bigl(1 - \mu_{high},\, (1 - \mu_0)\omega_{up}^{-2}\bigr)}{M + L}\right). \]

Now increase the constant $M > 0$ in part 1 of Assumption 3.3.1 if needed so that

\[ \|H_k\| \le M \ \text{for all } k. \tag{3.36} \]

We prove (3.35) by showing that if (3.34) holds and (3.35) does not hold for a trial step $s_t$,
then the trust region radius will be expanded and the step corresponding to the larger radius will
be acceptable. Let $s_t$ be a trial step such that $\|s_t\| < \|\nabla f(x_k)\|$ and

\[ \|s_t\| = \Delta_t < \frac{2\sigma \min\bigl(1 - \mu_{high},\, (1 - \mu_0)\omega_{up}^{-2}\bigr)}{M + L}\, \|\nabla f(x_k)\|. \tag{3.37} \]

We use the Lipschitz continuity of $\nabla f$ and (3.36) to obtain

\[
\begin{aligned}
ared_t &= -\nabla f(x_k)^T s_t - \int_0^1 (\nabla f(x_k + t s_t) - \nabla f(x_k))^T s_t \, dt \\
&= pred_t + s_t^T H_k s_t/2 - \int_0^1 (\nabla f(x_k + t s_t) - \nabla f(x_k))^T s_t \, dt \\
&\ge pred_t - (M + L)\|s_t\|^2/2.
\end{aligned}
\]

Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR18.html.

Copyright ©1999 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.

54 ITERATIVE METHODS FOR OPTIMIZATION

Therefore, using (3.30) from Assumption 3.3.1, we have

\[
\frac{ared_t}{pred_t} \ge 1 - \frac{(M + L)\|s_t\|^2}{2\, pred_t}
\ge 1 - \frac{(M + L)\|s_t\|^2}{2\sigma \|\nabla f(x_k)\| \min(\|\nabla f(x_k)\|, \|s_t\|)}. \tag{3.38}
\]

Now since $\|s_t\| < \|\nabla f(x_k)\|$ by (3.34) we have

\[ \min(\|\nabla f(x_k)\|, \|s_t\|) = \|s_t\| \]

and hence

\[ \frac{ared_t}{pred_t} \ge 1 - \frac{(M + L)\|s_t\|}{2 \|\nabla f(x_k)\| \sigma} > \mu_{high} \tag{3.39} \]

by (3.37). Hence, an expansion step will be taken by replacing $\Delta_t$ by $\Delta_t^+ = \omega_{up}\Delta_t$ and $s_t$ by
$s_t^+$, the minimizer of the quadratic model in the new trust region.

Now (3.38) still holds and, after the expansion,

\[ \|s_t^+\| \le \omega_{up} \|s_t\| < \omega_{up} \|\nabla f(x_k)\|. \]

So

\[ \min(\|\nabla f(x_k)\|, \|s_t^+\|) > \|s_t^+\| / \omega_{up}. \]

Hence,

\[
\frac{ared_t^+}{pred_t^+} \ge 1 - \frac{(M + L)\|s_t^+\|^2}{2\sigma \|\nabla f(x_k)\| \min(\|\nabla f(x_k)\|, \|s_t^+\|)}
\ge 1 - \frac{(M + L)\omega_{up} \|s_t^+\|}{2 \|\nabla f(x_k)\| \sigma}
\ge 1 - \frac{(M + L)\omega_{up}^2 \|s_t\|}{2 \|\nabla f(x_k)\| \sigma} \ge \mu_0
\]

by (3.37). Hence, the expansion will produce an acceptable step. This means that if the final
accepted step satisfies (3.34), it must also satisfy (3.35). This completes the proof.

3.3.3 A Unidirectional Trust Region Algorithm

The most direct way to compute a trial point that satis¬es Assumption 3.3.1 is to mimic the line

search and simply minimize the quadratic model in the steepest descent direction subject to the

trust region bound constraints.

In this algorithm, given a current point $x_c$ and trust region radius $\Delta_c$, our trial point is the
minimizer of

\[ \psi_c(\lambda) = m_c(x_c - \lambda \nabla f(x_c)) \]

subject to the constraint that

\[ x(\lambda) = x_c - \lambda \nabla f(x_c) \in \mathcal{T}(\Delta_c). \]

Clearly the solution is $x(\hat\lambda)$ where

\[
\hat\lambda =
\begin{cases}
\dfrac{\Delta_c}{\|\nabla f(x_c)\|} & \text{if } \nabla f(x_c)^T H_c \nabla f(x_c) \le 0, \\[2ex]
\min\left( \dfrac{\|\nabla f(x_c)\|^2}{\nabla f(x_c)^T H_c \nabla f(x_c)},\; \dfrac{\Delta_c}{\|\nabla f(x_c)\|} \right) & \text{if } \nabla f(x_c)^T H_c \nabla f(x_c) > 0.
\end{cases}
\tag{3.40}
\]


$x(\hat\lambda)$, the minimizer of the quadratic model in the steepest descent direction, subject to the trust
region bounds, is called the Cauchy point. We will denote the Cauchy point by $x_c^{CP}$.¹

Then with $x_c^{CP}$ as trial point, one can use Theorem 3.3.1 to derive a global convergence
theorem for the unidirectional trust region.
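Formula (3.40) translates almost line for line into code. The sketch below is an illustration only, not the book's implementation: pure Python, vectors stored as lists, and a hypothetical function name `cauchy_point`.

```python
import math

def cauchy_point(x, g, H, delta):
    """Minimizer of the quadratic model along -g subject to the trust region bound.

    x is the current point x_c, g the gradient grad f(x_c), H the model Hessian
    H_c (list of rows), and delta the trust region radius; implements (3.40).
    """
    gnorm = math.sqrt(sum(gi * gi for gi in g))
    if gnorm == 0.0:
        return list(x)  # x_c is already a critical point
    n = len(g)
    Hg = [sum(H[i][j] * g[j] for j in range(n)) for i in range(n)]
    curv = sum(g[i] * Hg[i] for i in range(n))  # grad f^T H_c grad f
    if curv <= 0.0:
        lam = delta / gnorm                     # nonpositive curvature: step to the boundary
    else:
        lam = min(gnorm ** 2 / curv, delta / gnorm)
    return [x[i] - lam * g[i] for i in range(n)]
```

With $H_c = I$ this reduces to a steepest descent step of length $\min(\|\nabla f(x_c)\|, \Delta_c)$.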

Theorem 3.3.2. Let $\nabla f$ be Lipschitz continuous with Lipschitz constant $L$. Let $\{x_k\}$ be
generated by Algorithm trgen with $x_t = x_c^{CP}$ and (3.40). Assume that the matrices $\{H_k\}$ are
bounded. Then either $f(x_k)$ is unbounded from below, $\nabla f(x_k) = 0$ for some finite $k$, or

\[ \lim_{k \to \infty} \|\nabla f(x_k)\| = 0. \]

Proof. We show that $x_t$ satisfies part 2 of Assumption 3.3.1. If $\|s_t\| = \Delta_c$ then the assertion
holds trivially. If $\|s_t\| < \Delta_c$ then, by definition of $x_c^{CP}$,

\[ s_t = -\frac{\|\nabla f(x_c)\|^2}{\nabla f(x_c)^T H_c \nabla f(x_c)}\, \nabla f(x_c). \]

Hence, if $\|H_c\| \le M$,

\[ \|s_t\| \ge \|\nabla f(x_c)\|/M \]

as asserted.

We leave the proof that $x_t$ satisfies part 1 for the reader (exercise 3.5.8).

The assumptions we used are stronger than those in, for example, [104] and [223], where

\[ \liminf_{k \to \infty} \|\nabla f(x_k)\| = 0 \]

rather than $\|\nabla f(x_k)\| \to 0$ is proved.

3.3.4 The Exact Solution of the Trust Region Problem

The theory of constrained optimization [117], [104] leads to a characterization of the solutions

of the trust region problem. In this section we derive that characterization via an elementary

argument (see also [84], [242], and [109]). This book focuses on approximate solutions, but the

reader should be aware that the exact solution can be computed accurately [192], [243].

Theorem 3.3.3. Let $g \in R^N$ and let $A$ be a symmetric $N \times N$ matrix. Let

\[ m(s) = g^T s + s^T A s/2. \]

A vector $s$ is a solution to

\[ \min_{\|s\| \le \Delta} m(s) \tag{3.41} \]

if and only if there is $\nu \ge 0$ such that

\[ (A + \nu I)s = -g \]

and either $\nu = 0$ or $\|s\| = \Delta$.

Proof. If $\|s\| < \Delta$ then $\nabla m(s) = g + As = 0$, and the conclusion follows with $\nu = 0$. To
consider the case where $\|s\| = \Delta$, let $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_N$ be the eigenvalues of $A$.

¹In some of the literature, [84], for example, $H_c$ is assumed to be positive definite and the Cauchy point is taken to
be the global minimizer of the quadratic model.


Clearly, for any $\nu$ and any $s$ with $\|s\| = \Delta$,

\[
m(s) = g^T s + s^T A s/2 = g^T s + s^T (A + \nu I)s/2 - \nu \Delta^2/2.
\]

Consider the function, defined for $\nu > \nu_0 = \max(0, -\lambda_1)$,

\[ s(\nu) = -(A + \nu I)^{-1} g. \]

Since

\[ \lim_{\nu \to \infty} \|s(\nu)\| = 0 \]

and $\|s(\nu)\|$ is a continuous decreasing function of $\nu \in (\nu_0, \infty)$ we see that if

\[ \lim_{\nu \to \nu_0} \|s(\nu)\| > \Delta \]

then there is a unique $\nu$ such that $\|s(\nu)\| = \Delta$. Since $\nu \ge \nu_0$, $A + \nu I$ is positive semidefinite;
therefore, $s(\nu)$ is a global minimizer of

\[ g^T s + s^T (A + \nu I)s/2. \]

Hence, we must have

\[ m(s) \ge m(s(\nu)) \]

for all $s$ such that $\|s\| = \Delta$. Hence, $s(\nu)$ is a solution of (3.41).

The remaining case is

\[ \lim_{\nu \to \nu_0} \|s(\nu)\| \le \Delta. \]

This implies that $g$ is orthogonal to the nontrivial space $S_0$ of eigenfunctions corresponding to
$-\nu_0$ (for otherwise the limit would be infinite). If we let $s = s_1 + s_2$, where $s_2$ is the projection
of $s$ onto $S_0$, we have

\[
m(s) = s_1^T g + s_1^T (A + \nu_0 I)s_1/2 + s_2^T (A + \nu_0 I)s_2/2 - \nu_0 \Delta^2/2
= s_1^T g + s_1^T (A + \nu_0 I)s_1/2 - \nu_0 \Delta^2/2.
\]

Hence, $m(s)$ is minimized by setting $s_1$ equal to the minimum norm solution of $(A + \nu_0 I)x = -g$
(which exists by orthogonality of $g$ to $S_0$) and letting $s_2$ be any element of $S_0$ such that

\[ \|s_2\|^2 = \Delta^2 - \|s_1\|^2. \]

This completes the proof.
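Theorem 3.3.3 also suggests a computational approach: either $\nu = 0$ gives an interior solution, or one solves the scalar equation $\|s(\nu)\| = \Delta$ on $(\nu_0, \infty)$. The sketch below is a toy illustration for diagonal $A$, so that $s(\nu)$ is available componentwise; it uses plain bisection, does not handle the hard case in which no root exists, and is no substitute for the careful methods of [192] and [243]. All names are hypothetical.

```python
import math

def tr_solve_diagonal(lam, g, delta, tol=1e-12):
    """Solve min g^T s + s^T A s / 2 over ||s|| <= delta for A = diag(lam).

    Either the unconstrained minimizer is interior (nu = 0), or we find
    nu > nu_0 = max(0, -lam_1) with ||s(nu)|| = delta by bisection, using the
    fact that ||s(nu)|| decreases on (nu_0, infinity).
    """
    n = len(g)
    nu0 = max(0.0, -min(lam))

    def s_of(nu):
        return [-g[i] / (lam[i] + nu) for i in range(n)]

    def norm(v):
        return math.sqrt(sum(vi * vi for vi in v))

    # Interior case: A positive definite and the Newton step fits in the region.
    if min(lam) > 0 and norm(s_of(0.0)) <= delta:
        return s_of(0.0), 0.0

    # Boundary case: bracket then bisect on ||s(nu)|| = delta.
    lo, hi = nu0, nu0 + 1.0
    while norm(s_of(hi)) > delta:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if norm(s_of(mid)) > delta:
            lo = mid
        else:
            hi = mid
    return s_of(hi), hi
```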

3.3.5 The Levenberg–Marquardt Parameter

The solution of the trust region problem presented in §3.3.4 suggests that, rather than controlling
$\Delta$, one could set

\[ s_t = -(\nu I + H_c)^{-1} g, \]

adjust $\nu$ in response to $ared/pred$ instead of $\Delta$, and still maintain global convergence. A natural
application of this idea is control of the Levenberg–Marquardt parameter. This results in a
much simpler algorithm than Levenberg–Marquardt–Armijo in that the stepsize control can be
eliminated. We need only vary the Levenberg–Marquardt parameter as the iteration progresses.


We present the algorithm from [190] to illustrate this point. For an inexact formulation, see

[276].

The Levenberg–Marquardt quadratic model of the least squares objective

\[ f(x) = \frac{1}{2} \sum_{i=1}^{M} r_i(x)^2 = \frac{1}{2} R(x)^T R(x) \]

with parameter $\nu_c$ at the point $x_c$ is

\[
m_c(x) = f(x_c) + (x - x_c)^T R'(x_c)^T R(x_c)
+ \tfrac{1}{2} (x - x_c)^T (R'(x_c)^T R'(x_c) + \nu_c I)(x - x_c). \tag{3.42}
\]

The minimizer of the quadratic model is the trial point

\[ x_t = x_c - (R'(x_c)^T R'(x_c) + \nu_c I)^{-1} R'(x_c)^T R(x_c), \tag{3.43} \]

the step is $s = x_t - x_c$, and the predicted reduction is

\[
\begin{aligned}
pred &= m(x_c) - m(x_t) = -s^T R'(x_c)^T R(x_c) - \tfrac{1}{2} s^T (R'(x_c)^T R'(x_c) + \nu_c I)s \\
&= -s^T R'(x_c)^T R(x_c) + \tfrac{1}{2} s^T R'(x_c)^T R(x_c) = -\tfrac{1}{2} s^T \nabla f(x_c).
\end{aligned}
\]
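The trial point (3.43) and the identity $pred = -\tfrac{1}{2}s^T \nabla f(x_c)$ can be transcribed directly. The sketch below is illustrative, with hypothetical names and dense Gaussian elimination for the small linear solve; `R` is the residual vector $R(x_c)$ and `J` the Jacobian $R'(x_c)$.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for small dense systems."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            fac = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= fac * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def lm_trial_point(xc, R, J, nu):
    """Levenberg-Marquardt trial point (3.43) and predicted reduction.

    Forms J^T J + nu I and the gradient J^T R, solves for the step s,
    and returns (xt, pred) with pred = -s^T grad f / 2.
    """
    n, m = len(xc), len(R)
    JtJ = [[sum(J[k][i] * J[k][j] for k in range(m)) for j in range(n)]
           for i in range(n)]
    for i in range(n):
        JtJ[i][i] += nu
    g = [sum(J[k][i] * R[k] for k in range(m)) for i in range(n)]  # grad f(xc)
    s = solve(JtJ, [-gi for gi in g])
    xt = [xc[i] + s[i] for i in range(n)]
    pred = -0.5 * sum(s[i] * g[i] for i in range(n))
    return xt, pred
```

For a linear residual the quadratic model with $\nu = 0$ is exact, so $pred$ equals the actual reduction.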

The algorithm we present below follows the trust region paradigm and decides on accepting
the trial point and on adjustments in the Levenberg–Marquardt parameter by examining the
ratio

\[
\frac{ared}{pred} = \frac{f(x_c) - f(x_t)}{m(x_c) - m(x_t)} = -2\,\frac{f(x_c) - f(x_t)}{s^T \nabla f(x_c)}.
\]

In addition to the trust region parameters $0 < \omega_{down} < 1 < \omega_{up}$ and $\mu_0 \le \mu_{low} < \mu_{high}$,
we require a default value $\nu_0$ of the Levenberg–Marquardt parameter.

The algorithm for testing the trial point differs from Algorithm trtest in that we decrease
(increase) $\nu$ rather than increasing (decreasing) a trust region radius if $ared/pred$ is large (small).
We also attempt to set the Levenberg–Marquardt parameter to zero when possible in order to
recover the Gauss–Newton iteration's fast convergence for small residual problems.

Algorithm 3.3.4. $\mathrm{trtestlm}(x_c, x_t, x_+, f, \nu)$

1. $z = x_c$

2. Do while $z = x_c$

   (a) $ared = f(x_c) - f(x_t)$, $s_t = x_t - x_c$, $pred = -\nabla f(x_c)^T s_t / 2$.

   (b) If $ared/pred < \mu_0$, then set $z = x_c$, $\nu = \max(\omega_{up}\nu, \nu_0)$, and recompute the trial
       point with the new value of $\nu$.

   (c) If $\mu_0 \le ared/pred < \mu_{low}$, then set $z = x_t$ and $\nu = \max(\omega_{up}\nu, \nu_0)$.

   (d) If $\mu_{low} \le ared/pred$, then set $z = x_t$.
       If $\mu_{high} < ared/pred$, then set $\nu = \omega_{down}\nu$.
       If $\nu < \nu_0$, then set $\nu = 0$.

3. $x_+ = z$.
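A compact transcription of Algorithm trtestlm follows. The callable `compute_trial` stands in for recomputing the trial point from (3.43) with the updated $\nu$; the numerical defaults for $\mu_0$, $\mu_{low}$, $\mu_{high}$, $\omega_{down}$, $\omega_{up}$ are hypothetical placeholders, since the text only constrains their ordering.

```python
def trtestlm(xc, xt, f, grad_f, compute_trial, nu, nu0,
             mu0=1e-4, mulow=0.25, muhigh=0.75, wdown=0.5, wup=2.0):
    """Accept or reject the trial point and adjust the LM parameter nu.

    Loops until a trial point is accepted, increasing nu on rejection,
    and tries to drive nu to zero to recover Gauss-Newton steps.
    Returns (accepted point, updated nu).
    """
    while True:
        st = [xt[i] - xc[i] for i in range(len(xc))]
        ared = f(xc) - f(xt)
        g = grad_f(xc)
        pred = -0.5 * sum(st[i] * g[i] for i in range(len(xc)))
        ratio = ared / pred
        if ratio < mu0:
            nu = max(wup * nu, nu0)        # reject: damp more and retry
            xt = compute_trial(nu)
            continue
        if ratio < mulow:
            nu = max(wup * nu, nu0)        # accept, but increase nu
        elif ratio > muhigh:
            nu = wdown * nu                # accept and decrease nu
            if nu < nu0:
                nu = 0.0                   # try a Gauss-Newton step next
        return xt, nu
```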


The Levenberg–Marquardt version of Algorithm trgen is simple to describe and implement.

Algorithm 3.3.5. $\mathrm{levmar}(x, R, kmax)$

1. Set $\nu = \nu_0$.

2. For $k = 1, \dots, kmax$

   (a) Let $x_c = x$.

   (b) Compute $R$, $f$, $R'$, and $\nabla f$; test for termination.

   (c) Compute $x_t$ using (3.43).

   (d) Call $\mathrm{trtestlm}(x_c, x_t, x, f, \nu)$.
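For a single unknown, where the linear algebra in (3.43) degenerates to scalar division, the pieces combine into a complete, if minimal, iteration in the spirit of Algorithm levmar. The constants and names are again illustrative, not prescribed by the text.

```python
import math

def levmar1(x, r, rprime, kmax, nu0=1e-3,
            mu0=1e-4, mulow=0.25, muhigh=0.75, wdown=0.5, wup=2.0, gtol=1e-10):
    """One-variable Levenberg-Marquardt sketch for f(x) = r(x)^2 / 2.

    r, rprime: residual and its derivative.  The inner loop mirrors
    trtestlm; the outer loop mirrors levmar.
    """
    f = lambda y: 0.5 * r(y) ** 2
    nu = nu0
    for _ in range(kmax):
        g = rprime(x) * r(x)                 # gradient of f
        if abs(g) < gtol:
            break
        while True:                          # test/retest the trial point
            s = -g / (rprime(x) ** 2 + nu)   # scalar version of (3.43)
            xt = x + s
            ared = f(x) - f(xt)
            pred = -0.5 * s * g              # = g^2 / (2 (r'^2 + nu)) > 0
            ratio = ared / pred
            if ratio < mu0:
                nu = max(wup * nu, nu0)      # reject and damp more
                continue
            if ratio < mulow:
                nu = max(wup * nu, nu0)
            elif ratio > muhigh:
                nu = wdown * nu
                if nu < nu0:
                    nu = 0.0                 # recover the Gauss-Newton step
            x = xt
            break
    return x
```

On a zero residual problem such as $r(x) = e^x - 2$ the parameter is driven to zero after a few steps and the fast local convergence of Gauss–Newton takes over.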

We state a convergence result [190], [276] without proof.

Theorem 3.3.4. Let $R$ be Lipschitz continuously differentiable. Let $\{x_k\}$ and $\{\nu_k\}$ be the
sequence of iterates and Levenberg–Marquardt parameters generated by Algorithm levmar
with $kmax = \infty$. Assume that $\{\nu_k\}$ is bounded from above. Then either $R'(x_k)^T R(x_k) = 0$
for some finite $k$ or

\[ \lim_{k \to \infty} R'(x_k)^T R(x_k) = 0. \]

Moreover, if $x^*$ is a limit point of $\{x_k\}$ for which $R(x^*) = 0$ and $R'(x^*)$ has full rank, then
$x_k \to x^*$ q-quadratically and $\nu_k = 0$ for $k$ sufficiently large.

3.3.6 Superlinear Convergence: The Dogleg

The convergence of the unidirectional trust region iteration can be as slow as that for steepest
descent. To improve the convergence speed in the terminal phase we must allow for
approximations to the Newton direction. The power of trust region methods is the ease with which
the transition from steepest descent, with its good global properties, to Newton's method can be
managed.

We define the Newton point at $x_c$ as

\[ x_c^N = x_c - H_c^{-1} \nabla f(x_c). \]

If $H_c$ is spd, the Newton point is the global minimizer of the local quadratic model. On the other
hand, if $H_c$ has directions of negative curvature the local quadratic model will not have a finite
minimizer, but the Newton point is still useful. Note that if $H = I$ the Newton point and the
Cauchy point are the same if the Newton point is inside the trust region.
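For a two-variable model the Newton point can be written out explicitly, which makes the coincidence with the Cauchy point for $H_c = I$ easy to check by hand. This is a toy sketch with a hypothetical name and an explicit $2 \times 2$ inverse; real implementations factor $H_c$ instead.

```python
def newton_point_2d(xc, g, H):
    """Newton point xN = xc - Hc^{-1} grad f(xc) for a 2x2 model Hessian H.

    Uses the closed-form inverse of a 2x2 matrix; assumes H is nonsingular.
    With H = I the step is -g, i.e., a full steepest descent step, which is
    also the Cauchy point whenever it lies inside the trust region.
    """
    a, b = H[0]
    c, d = H[1]
    det = a * d - b * c
    s = [-(d * g[0] - b * g[1]) / det,
         -(-c * g[0] + a * g[1]) / det]
    return [xc[0] + s[0], xc[1] + s[1]]
```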

We will restrict our attention to a special class of algorithms that approximate the solution
of the trust region problem by minimizing $m_c$ along a piecewise linear path $S \subset \mathcal{T}(\Delta)$. These
paths are sometimes called doglegs because of the shapes of the early examples [84], [80], [218],
[217], [220]. In the case where $\nabla^2 f(x)$ is spd, one may think of the dogleg path as a piecewise
linear approximation to the path with parametric representation

\[ \{\, x - (\lambda I + \nabla^2 f(x))^{-1} \nabla f(x) \mid 0 \le \lambda \,\}. \]

This is the path on which the exact solution of the trust region problem lies.

The next step up from the unidirectional path, the classical dogleg path [220], has as many
as three nodes, $x_c$, $x_c^{CP*}$, and $x_c^N$. Here $x_c^{CP*}$ is the global minimizer of the quadratic model in
the steepest descent direction, which will exist if and only if $\nabla f(x_c)^T H_c \nabla f(x_c) > 0$. If $x_c^{CP*}$
exists and

\[ (x_c^N - x_c^{CP*})^T (x_c^{CP*} - x_c) > 0, \tag{3.44} \]