Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR18.html.

Copyright ©1999 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.

GLOBAL CONVERGENCE 59

we will let $x^N_c$ be the terminal node. If (3.44) holds, as it always will if $H_c$ is spd, then the path can be parameterized by the distance from $x_c$ and, moreover, $m_c$ decreases along the path. If (3.44) does not hold, we do not use $x^N_c$ as a node and revert to the unidirectional path in the steepest descent direction.

Note that (3.44) implies

$$\nabla f(x_c)^T (x^N_c - x_c) < 0. \tag{3.45}$$

We can express the conditions for using the three-node path rather than the unidirectional path very simply. If $x^{CP}_c$ is on the boundary of the trust region, then we accept $x^{CP}_c$ as the trial point. If $x^{CP}_c = x^{CP*}_c$ is in the interior of the trust region, then we test (3.44) to decide what to do.

With this in mind, our trial point for the classical dogleg algorithm will be

$$x^D(\Delta) = \begin{cases}
x^{CP}_c & \text{if } \|x_c - x^{CP}_c\| = \Delta \text{ or } x^{CP*}_c \text{ exists and (3.44) fails},\\[4pt]
x^N_c & \text{if } \|x_c - x^{CP}_c\| < \|x_c - x^N_c\| \le \Delta \text{ and (3.44) holds},\\[4pt]
y^D(\Delta) & \text{otherwise}.
\end{cases} \tag{3.46}$$

Here $y^D(\Delta)$ is the unique point between $x^{CP}_c$ and $x^N_c$ such that $\|x^D - x_c\| = \Delta$.
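For concreteness, the case analysis in (3.46) can be sketched in code. The fragment below is our own illustration, not the book's MATLAB implementation: the helper names are invented, the test for (3.44) is stood in by the condition (3.47) of Lemma 3.3.5, and $y^D(\Delta)$ is found with the quadratic formula.

```python
from math import sqrt

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return sqrt(dot(u, u))

def dogleg_trial_point(x, g, hess_mv, newton_step, delta):
    """Classical dogleg trial point, a sketch of (3.46).

    g = grad f(x); hess_mv(v) applies H_c; newton_step = s^N = -H_c^{-1} g.
    """
    gHg = dot(g, hess_mv(g))
    # Cauchy step: minimizer of the model along -g, cut back to the boundary.
    if gHg <= 0:
        lam = delta / norm(g)            # model decreases forever; go to boundary
    else:
        lam = min(dot(g, g) / gHg, delta / norm(g))
    s_cp = [-lam * gi for gi in g]
    s_n = newton_step
    # (3.44)-type test, here in the form (3.47): (s^N - s^{CP*})^T s^{CP*} > 0.
    ok = dot([a - b for a, b in zip(s_n, s_cp)], s_cp) > 0
    if abs(norm(s_cp) - delta) < 1e-12 or not ok:
        s = s_cp                          # take the (possibly boundary) Cauchy point
    elif norm(s_n) <= delta:
        s = s_n                           # full Newton step fits in the trust region
    else:
        # y^D(delta): point on the segment [s_cp, s_n] with ||s|| = delta
        d = [a - b for a, b in zip(s_n, s_cp)]
        a, b, c = dot(d, d), 2 * dot(s_cp, d), dot(s_cp, s_cp) - delta ** 2
        t = (-b + sqrt(b * b - 4 * a * c)) / (2 * a)
        s = [p + t * q for p, q in zip(s_cp, d)]
    return [xi + si for xi, si in zip(x, s)]
```

With $H_c = \mathrm{diag}(1,4)$ and $\nabla f = (1,1)^T$, a large radius returns the Newton point, while a radius between the Cauchy and Newton step lengths returns a point on the dogleg segment at distance exactly $\Delta$ from $x_c$.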

The important properties of dogleg methods are as follows:

• No two points on the path have the same distance from $x_c$; hence the path may be parameterized as $x(s)$, where $s = \|x(s) - x_c\|$.

• $m_c(x(s))$ is a strictly decreasing function of $s$.

This enables us to show that the dogleg approximate solution of the trust region problem satisfies Assumption 3.3.1 and to apply Theorem 3.3.1 to conclude global convergence. Superlinear convergence will follow if $H_k$ is a sufficiently good approximation to $\nabla^2 f(x_k)$.

Lemma 3.3.5. Let $x_c$, $H_c$, and $\Delta$ be given. Let $H_c$ be nonsingular,

$$s^N = -H_c^{-1}\nabla f(x_c), \quad\text{and}\quad x^N = x_c + s^N.$$

Assume that $\nabla f(x_c)^T H_c \nabla f(x_c) > 0$ and let

$$s^{CP*} = x^{CP*} - x_c = -\frac{\|\nabla f(x_c)\|^2}{\nabla f(x_c)^T H_c \nabla f(x_c)}\,\nabla f(x_c).$$

Let $S$ be the piecewise linear path from $x_c$ to $x^{CP*}$ to $x^N$. Then if

$$(s^N - s^{CP*})^T s^{CP*} > 0, \tag{3.47}$$

for any $\delta \le \|s^N\|$ there is a unique point $x(\delta)$ on $S$ such that $\|x(\delta) - x_c\| = \delta$.

Proof. Clearly the statement of the result holds on the segment of the path from $x_c$ to $x^{CP*}$. To prove the result on the segment from $x^{CP*}$ to $x^N$ we must show that

$$\varphi(\lambda) = \frac{1}{2}\,\|(1-\lambda)s^{CP*} + \lambda s^N\|^2$$

is strictly monotone increasing for $\lambda \in (0,1)$.

Since (3.47) implies that

$$\|s^N\|\,\|s^{CP*}\| \ge (s^N)^T s^{CP*} > \|s^{CP*}\|^2,$$

and therefore that $\|s^N\| > \|s^{CP*}\|$, we have

$$\begin{aligned}
\varphi'(\lambda) &= (s^N - s^{CP*})^T\bigl((1-\lambda)s^{CP*} + \lambda s^N\bigr)\\
&= -(1-\lambda)\|s^{CP*}\|^2 + (1-\lambda)(s^N)^T s^{CP*} + \lambda\|s^N\|^2 - \lambda (s^N)^T s^{CP*}\\
&> \lambda\bigl(\|s^N\|^2 - (s^N)^T s^{CP*}\bigr) \ge \lambda\bigl(\|s^N\| - \|s^{CP*}\|\bigr)\|s^N\| > 0.
\end{aligned}$$

Hence, $\varphi$ is an increasing function and the proof is complete.
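The monotonicity of $\varphi$ is easy to check numerically. The following is our own illustration, using the same toy data as above ($H_c = \mathrm{diag}(1,4)$, $\nabla f = (1,1)^T$), for which (3.47) holds:

```python
def phi(lam, s_cp, s_n):
    # phi(lambda) = 0.5 * || (1-lambda) s^{CP*} + lambda s^N ||^2
    v = [(1 - lam) * a + lam * b for a, b in zip(s_cp, s_n)]
    return 0.5 * sum(c * c for c in v)

def condition_347(s_cp, s_n):
    # (s^N - s^{CP*})^T s^{CP*} > 0
    return sum((b - a) * a for a, b in zip(s_cp, s_n)) > 0

# s^{CP*} and s^N for H = diag(1, 4) and grad = (1, 1): (3.47) holds here
s_cp, s_n = [-0.4, -0.4], [-1.0, -0.25]
assert condition_347(s_cp, s_n)
vals = [phi(k / 100.0, s_cp, s_n) for k in range(101)]
assert all(v2 > v1 for v1, v2 in zip(vals, vals[1:]))  # strictly increasing
```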

The next stage is to show that the local quadratic model decreases on the dogleg path S.

Lemma 3.3.6. Let the assumptions of Lemma 3.3.5 hold. Then the local quadratic model

$$m_c(x) = f(x_c) + \nabla f(x_c)^T(x - x_c) + \frac{1}{2}(x - x_c)^T H_c (x - x_c)$$

is strictly monotone decreasing on $S$.

Proof. Since $x^{CP*}_c$ is the minimum of the local quadratic model in the steepest descent direction, we need only show that $m_c$ is strictly decreasing on the segment of the path between $x^{CP*}_c$ and $x^N$. Set

$$\begin{aligned}
\psi(\lambda) &= m_c\bigl(x_c + (1-\lambda)s^{CP*} + \lambda s^N\bigr)\\
&= f(x_c) + \nabla f(x_c)^T\bigl((1-\lambda)s^{CP*} + \lambda s^N\bigr)
+ \tfrac{1}{2}\bigl((1-\lambda)s^{CP*} + \lambda s^N\bigr)^T H_c \bigl((1-\lambda)s^{CP*} + \lambda s^N\bigr).
\end{aligned}$$

Noting that $H_c s^N = -\nabla f(x_c)$ and $s^{CP*} = -\hat\lambda\,\nabla f(x_c)$, we obtain

$$\begin{aligned}
\psi(\lambda) &= f(x_c) - \hat\lambda(1-\lambda)^2\|\nabla f(x_c)\|^2
+ \lambda(1 - \lambda/2)\,\nabla f(x_c)^T s^N\\
&\quad + \tfrac{1}{2}\hat\lambda^2(1-\lambda)^2\,\nabla f(x_c)^T H_c \nabla f(x_c).
\end{aligned}$$

Therefore,

$$\psi'(\lambda) = 2\hat\lambda(1-\lambda)\|\nabla f(x_c)\|^2
+ (1-\lambda)\,\nabla f(x_c)^T s^N - (1-\lambda)\hat\lambda^2\,\nabla f(x_c)^T H_c \nabla f(x_c).$$

Since

$$\hat\lambda\,\nabla f(x_c)^T H_c \nabla f(x_c) = \|\nabla f(x_c)\|^2,$$

we have, using (3.44),

$$\begin{aligned}
\psi'(\lambda) &= (1-\lambda)\bigl(\hat\lambda\|\nabla f(x_c)\|^2 - \nabla f(x_c)^T H_c^{-1} \nabla f(x_c)\bigr)\\
&= (1-\lambda)\,\nabla f(x_c)^T\bigl(\hat\lambda\nabla f(x_c) - H_c^{-1}\nabla f(x_c)\bigr)\\
&= \frac{1-\lambda}{\hat\lambda}\,(x_c - x^{CP*}_c)^T(x^N_c - x^{CP*}_c) < 0,
\end{aligned}$$


completing the proof.

At this point we have shown that the approximate trust region problem has a unique solution.

To prove global convergence we need only verify that the approximate solution of the trust region

problem $x^D$ satisfies Assumption 3.3.1.

Theorem 3.3.7. Let $\nabla f$ be Lipschitz continuous with Lipschitz constant $L$. Let $\{x_k\}$ be generated by Algorithm trgen and let the solutions of the trust region problem be given by (3.46). Assume that the matrices $\{H_k\}$ are bounded. Then either $f(x_k)$ is unbounded from below, $\nabla f(x_k) = 0$ for some finite $k$, or

$$\lim_{k\to\infty} \|\nabla f(x_k)\| = 0. \tag{3.48}$$

Proof. We need to check that the solutions of the trust region problem satisfy Assumption 3.3.1. Part 2 of the assumption follows from the definition, (3.46), of $x^D$ and the boundedness of the approximate Hessians. Let

$$\|H_k\| \le M$$

for all $k$. If $\|s_k\| < \Delta$, then (3.46) implies that (3.44) must hold and so $x_t = x^N_k$ is the Newton point. Hence,

$$\|s_k\| = \|x_k - x^N_k\| = \|H_k^{-1}\nabla f(x_k)\| \ge \|\nabla f(x_k)\|/M,$$

which proves part 2.

Verification of part 1 will complete the proof. There are several cases to consider, depending on how $x^D$ is computed.

If $x^D = x^{CP}$ then either $\|s^{CP}\| = \Delta_c$ or (3.44) fails. We first consider the case where $\nabla f(x_c)^T H_c \nabla f(x_c) \le 0$. In that case $\|s^{CP}\| = \Delta_c$ and $\hat\lambda = \Delta_c/\|\nabla f(x_c)\|$. Therefore,

$$\begin{aligned}
pred &= \hat\lambda\,\|\nabla f(x_c)\|^2 - \frac{\hat\lambda^2}{2}\,\nabla f(x_c)^T H_c \nabla f(x_c)\\
&= \Delta_c\|\nabla f(x_c)\| - \frac{\Delta_c^2}{2}\,\frac{\nabla f(x_c)^T H_c \nabla f(x_c)}{\|\nabla f(x_c)\|^2}\\
&\ge \Delta_c\|\nabla f(x_c)\| = \|s\|\,\|\nabla f(x_c)\|.
\end{aligned}$$

Hence (3.30) holds with $\sigma = 1$.

Now assume that $\nabla f(x_c)^T H_c \nabla f(x_c) > 0$ and $\|s^{CP}\| = \Delta_c$. In this case

$$\frac{\|\nabla f(x_c)\|^2}{\nabla f(x_c)^T H_c \nabla f(x_c)} \ge \frac{\Delta_c}{\|\nabla f(x_c)\|}$$

and so

$$\begin{aligned}
pred &= \hat\lambda\,\|\nabla f(x_c)\|^2 - \frac{\hat\lambda^2}{2}\,\nabla f(x_c)^T H_c \nabla f(x_c)\\
&= \Delta_c\|\nabla f(x_c)\| - \frac{\Delta_c^2}{2}\,\frac{\nabla f(x_c)^T H_c \nabla f(x_c)}{\|\nabla f(x_c)\|^2}\\
&\ge \Delta_c\|\nabla f(x_c)\|/2,
\end{aligned}$$

which is (3.30) with $\sigma = 1/2$.

If (3.44) fails, $\nabla f(x_c)^T H_c \nabla f(x_c) > 0$, and $\|s^{CP}\| < \Delta_c$, then

$$\hat\lambda = \frac{\|\nabla f(x_c)\|^2}{\nabla f(x_c)^T H_c \nabla f(x_c)},$$

and

$$\begin{aligned}
pred &= \hat\lambda\,\|\nabla f(x_c)\|^2 - \frac{\hat\lambda^2}{2}\,\nabla f(x_c)^T H_c \nabla f(x_c)\\
&= \frac{\|\nabla f(x_c)\|^4}{2\,\nabla f(x_c)^T H_c \nabla f(x_c)}
= \frac{\hat\lambda\,\|\nabla f(x_c)\|^2}{2}
= \frac{\|s\|\,\|\nabla f(x_c)\|}{2},
\end{aligned}$$

which is (3.30) with $\sigma = 1/2$.
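The identity used in this case, that the model decrease at the interior Cauchy point is exactly $\hat\lambda\,\|\nabla f(x_c)\|^2/2$, can be confirmed numerically. The check below is our own, on an ad hoc $2\times 2$ spd example:

```python
def model_decrease(g, hess_mv, s):
    # pred = m_c(x_c) - m_c(x_c + s) = -g^T s - 0.5 s^T H_c s
    Hs = hess_mv(s)
    return -sum(a * b for a, b in zip(g, s)) - 0.5 * sum(a * b for a, b in zip(s, Hs))

g = [1.0, 1.0]
hess_mv = lambda v: [v[0], 4.0 * v[1]]            # H_c = diag(1, 4), spd
gHg = sum(a * b for a, b in zip(g, hess_mv(g)))   # grad^T H grad = 5
lam_hat = sum(a * a for a in g) / gHg             # lambda-hat = ||g||^2 / (g^T H g)
s_cp = [-lam_hat * a for a in g]                  # interior Cauchy step
pred = model_decrease(g, hess_mv, s_cp)
# pred should equal lambda-hat * ||g||^2 / 2
assert abs(pred - 0.5 * lam_hat * sum(a * a for a in g)) < 1e-12
```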

The final case is if (3.44) holds and $x^D \ne x^{CP}$. In that case the predicted reduction is more than Cauchy decrease, i.e., the decrease obtained by taking the Cauchy point, and hence

$$pred \ge \frac{\|\nabla f(x_c)\|^4}{2\,\nabla f(x_c)^T H_c \nabla f(x_c)} \ge \frac{\|\nabla f(x_c)\|^2}{2M},$$

which is (3.30) with $\sigma = 1/(2M)$. This completes the proof.

The last part of the proof of this theorem is very important: it asserts that any solution of the trust region problem for which pred is at least a fixed fraction of the Cauchy decrease will give global convergence. We refer the reader to [232] and [104] for a more general and detailed treatment from this point of view.

Corollary 3.3.8. Any algorithm for solving the trust region problem that satisfies, for some $\tau > 0$,

$$pred \ge \tau\,\bigl(m_c(x_c) - m_c(x^{CP}_c)\bigr)$$

satisfies (3.30) for $\sigma = \tau/2$.

The trust region CG algorithm we present in §3.3.7 can be analyzed with this corollary. If $H_k = \nabla^2 f(x_k)$ or a sufficiently good approximation, then the classical dogleg will become Newton's method (or a superlinearly convergent method) as the iterations approach a minimizer that satisfies the standard assumptions. Hence, the algorithm makes a smooth and automatic transition into the superlinearly convergent stage.

Theorem 3.3.9. Let $\nabla f$ be Lipschitz continuous with Lipschitz constant $L$. Let $\{x_k\}$ be generated by Algorithm trgen and let the solutions of the trust region problem be given by (3.46). Assume that $H_k = \nabla^2 f(x_k)$ and that the matrices $\{H_k\}$ are bounded. Let $f$ be bounded from below. Let $x^*$ be a minimizer of $f$ at which the standard assumptions hold. Then if $x^*$ is a limit point of $\{x_k\}$, then $x_k \to x^*$ and the convergence is locally q-quadratic.

Proof. Since $x^*$ is a limit point of $\{x_k\}$, there is, for any $\rho > 0$, a $k$ sufficiently large so that

$$\|e_k\| < \rho,\quad \|H_k\| \le 2\,\|\nabla^2 f(x^*)\|,\quad \|H_k^{-1}\| \le 2\,\|(\nabla^2 f(x^*))^{-1}\|,$$

and $x_k$ is near enough for the assumptions of Theorem 2.3.2 to hold. If $H_k$ is spd, so is $H_k^{-1}$, and for such $k$, (3.44) holds. Hence, the dogleg path has the nodes $x_k$, $x^{CP}_k$, and $x^N_k$. Moreover, if $\rho$ is sufficiently small, then

$$\|H_k^{-1}\nabla f(x_k)\| \le 2\|e_k\| \le 2\rho.$$

We complete the proof by showing that if $\rho$ is sufficiently small, the trust region radius will be expanded if necessary until the Newton step is in the trust region. Once we do this, the proof is complete, as then the local quadratic convergence of Newton's method will take over.


Now

$$pred_k \ge \|s_k\|\,\|\nabla f(x_k)\|/2$$

by the proof of Theorem 3.3.7. Using $H_k = \nabla^2 f(x_k)$ we have

$$\begin{aligned}
ared_k &= -\nabla f(x_k)^T s^t_k - \int_0^1 \bigl(\nabla f(x_k + t s^t_k) - \nabla f(x_k)\bigr)^T s^t_k \, dt\\
&= pred_k + (s^t_k)^T \nabla^2 f(x_k)\, s^t_k/2 - \int_0^1 \bigl(\nabla f(x_k + t s^t_k) - \nabla f(x_k)\bigr)^T s^t_k \, dt\\
&= pred_k + O\bigl(\|s_k\|\,\|\nabla f(x_k)\|\,\rho\bigr)
\end{aligned}$$

and therefore $ared_k/pred_k = 1 - O(\rho)$. Hence, for $\rho$ sufficiently small, the trust region radius will be increased, if necessary, until the Newton point is inside the trust region, and then a Newton step will be taken. This completes the proof.

The classical dogleg algorithm is implemented in Algorithm ntrust, which uses the trust

radius adjustment scheme from Algorithm trtest. It is to be understood that trtest is

implemented so that xt is given by (3.46) and hence trtest only samples points on the

piecewise linear search path determined by the Cauchy point, the Newton point, and (3.44).

Algorithm 3.3.6. ntrust$(x, f, \tau)$

1. Compute $f(x)$ and $\nabla f(x)$.

2. $\tau = \tau_a + \tau_r\|\nabla f(x)\|$.

3. Do while $\|\nabla f(x)\| > \tau$

   (a) Compute and factor $\nabla^2 f(x)$.

   (b) Compute the Cauchy and Newton points and test (3.44).

   (c) Call trtest$(x, x_t, x_+, f, \Delta)$.

   (d) Compute $f(x_+)$ and $\nabla f(x_+)$; $x = x_+$.
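A minimal executable sketch of this loop, in Python rather than the book's MATLAB, is given below. The radius update here is a simplified stand-in for Algorithm trtest (a single accept/shrink/expand rule of our own choosing), and the trial-point rule is a stripped-down version of (3.46); the function names and tolerances are ours.

```python
from math import sqrt

def ntrust_sketch(x, f, grad, hess_solve, quad_form,
                  tau_a=1e-10, tau_r=1e-10, delta=1.0, max_it=100):
    """Dogleg trust region loop in the spirit of Algorithm ntrust.

    hess_solve(x, v) returns H(x)^{-1} v; quad_form(x, v) returns v^T H(x) v.
    The radius update is a simplified stand-in for Algorithm trtest.
    """
    def nrm(v):
        return sqrt(sum(a * a for a in v))

    g = grad(x)
    tau = tau_a + tau_r * nrm(g)                   # step 2 of Algorithm 3.3.6
    for _ in range(max_it):
        if nrm(g) <= tau:
            break
        s_n = [-a for a in hess_solve(x, g)]       # Newton step
        curv = quad_form(x, g)
        lam = sum(a * a for a in g) / curv if curv > 0 else float("inf")
        lam = min(lam, delta / nrm(g))
        s_cp = [-lam * a for a in g]               # Cauchy step
        # trial step in the spirit of (3.46), simplified: Newton if it fits
        s = s_n if nrm(s_n) <= delta else s_cp
        xt = [a + b for a, b in zip(x, s)]
        ared = f(x) - f(xt)                        # actual reduction
        pred = -sum(a * b for a, b in zip(g, s)) - 0.5 * quad_form(x, s)
        if pred > 0 and ared > 0.25 * pred:        # accept; possibly expand
            x, g = xt, grad(xt)
            delta = max(delta, 2.0 * nrm(s))
        else:                                      # reject; shrink the radius
            delta *= 0.5
    return x
```

On a convex quadratic the iteration takes a Cauchy step while the Newton step exceeds the radius, then expands the radius and finishes with a Newton step, illustrating the transition described above.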

We implement Algorithm ntrust in the collection of MATLAB codes.

3.3.7 A Trust Region Method for Newton–CG

In this section we present a brief account of an algorithm from [247] (see also [257]) that combines the trust region paradigm of §3.3.6 with the inexact Newton ideas of §2.5.2. We follow §2.5.2 and denote the preconditioner by $M$ and let $C = M^{-1}$. We solve the scaled trust region problem

$$\min_{\|d\|_C \le \Delta} \phi(d),$$

where the quadratic model is still

$$\phi(d) = \nabla f(x)^T d + \frac{1}{2}\, d^T \nabla^2 f(x)\, d.$$

Here the $C$-norm is

$$\|d\|_C = (d^T C d)^{1/2}.$$
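The $C$-norm is straightforward to compute when $C$ is available as an operator. A toy check, with a diagonal $C$ of our own choosing:

```python
def c_norm(d, C_mv):
    # ||d||_C = (d^T C d)^{1/2}, with C applied as an operator C_mv
    return sum(a * b for a, b in zip(d, C_mv(d))) ** 0.5

# toy example: C = M^{-1} = diag(4, 1)
C_mv = lambda v: [4.0 * v[0], 1.0 * v[1]]
assert abs(c_norm([1.0, 2.0], C_mv) - 8.0 ** 0.5) < 1e-12
```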

The algorithmic description of the trust region problem solver from the TR–CG method given below is from [162]. In [247] the algorithm is expressed in terms of $C$ rather than $M$.


This is a dogleg method in that the approximate solution of the trust region problem lies on a piecewise linear path with the CG iterations as nodes. As long as CG is performing properly (i.e., $p^T w > 0$) nodes are added to the path until the path intersects the trust region boundary. If a direction of indefiniteness is found ($p^T w \le 0$), then that direction is followed to the boundary. In this way a negative curvature direction, if found in the course of the CG iteration, can be exploited.
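The node-by-node construction can be sketched as follows. This is a generic unpreconditioned Steihaug-style CG loop (i.e., $C = M = I$), not the exact algorithm of [247]; the termination tolerance, the Hessian-vector interface, and the boundary handling are our own choices.

```python
from math import sqrt

def trcg_sketch(g, hess_mv, delta, eta=1e-6, max_it=50):
    """Approximately solve min phi(d) = g^T d + 0.5 d^T H d over ||d|| <= delta.

    CG nodes are added while p^T w > 0; on indefiniteness (p^T w <= 0) or
    when an iterate leaves the trust region, follow p to the boundary.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def to_boundary(d, p):
        # largest t >= 0 with ||d + t p|| = delta
        a, b, c = dot(p, p), 2 * dot(d, p), dot(d, d) - delta ** 2
        t = (-b + sqrt(b * b - 4 * a * c)) / (2 * a)
        return [di + t * pi for di, pi in zip(d, p)]

    d = [0.0] * len(g)
    r = [-gi for gi in g]                 # residual of H d = -g
    p = r[:]
    rho = dot(r, r)
    for _ in range(max_it):
        if sqrt(rho) <= eta * sqrt(dot(g, g)):
            break
        w = hess_mv(p)
        pw = dot(p, w)
        if pw <= 0:                       # negative curvature: go to the boundary
            return to_boundary(d, p)
        alpha = rho / pw
        d_new = [di + alpha * pi for di, pi in zip(d, p)]
        if dot(d_new, d_new) > delta ** 2:
            return to_boundary(d, p)      # path meets the trust region boundary
        d = d_new
        r = [ri - alpha * wi for ri, wi in zip(r, w)]
        rho_new = dot(r, r)
        p = [ri + (rho_new / rho) * pi for ri, pi in zip(r, p)]
        rho = rho_new
    return d
```

For an spd Hessian and a large radius this reduces to plain CG and returns (approximately) the Newton step; for a small radius it stops on the boundary along the current CG direction.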

The inputs to Algorithm trcg are the current point $x$, the objective $f$, the forcing term $\eta$,