BOUND CONSTRAINTS 99

Hence, using Corollary 5.4.4, (5.33), (5.34), and (5.37), we obtain

∇f (x)T (x ’ xH, (»)) T

’ xH, (»))

= (PA (x) ∇f (x)) (x

T

’ xH, (»))

+(PI (x) ∇f (x)) (x

’ x(»)) + »’1 (PI

T T

≥ (PA (x) ∇f (x)) (x (x) ∇f (x)) (x ’ x(»))

l

min(1,»’1 )

≥ min(1, »’1 )∇f (x)T (x ’ x(»)) ≥ x ’ x(») 2 .

l

l »

(5.38)

The remainder of the proof is almost identical to that for Theorem 5.4.5. The fundamental

theorem of calculus and the Lipschitz continuity assumption imply that

f (xH, (»)) ’ f (x) ¤ ’∇f (x)T (x ’ xH, (»)) + L x ’ xH, (») 2 .

We apply (5.38) and obtain

f (xH, (»)) ’ f (x) ¤ ’(1 ’ L» max(1, »l ))∇f (x)T (x ’ xH, (»)),

which implies (5.31) if 1 ’ L» max(1, »l ) ≥ ± which will follow from

(1 ’ ±)

¯

» ¤ »3 = .

(5.39)

max(1, »l )L

¯ ¯¯¯

This completes the proof with » = min(»1 , »2 , »3 ).

An algorithm based on these ideas is the scaled gradient projection algorithm. The name

comes from the scaling matrix H that is used to computed the direction. The inputs are the initial

iterate, the vectors of upper and lower bounds u and l, the relative-absolute residual tolerance

vector „ = („r , „a ), and a limit on the number of iterations. Left unstated in the algorithmic

description are the manner in which the parameter is computed and the way in which the

approximate Hessians are constructed.

Algorithm 5.5.1. sgradproj(x, f, „, nmax)

1. For n = 1, . . . , nmax

(a) Compute f and ∇f ; test for termination using (5.18).

(b) Compute and an spd H.

(c) Solve

R(x, , Hc )d = ’∇f (xc ).

(d) Find the least integer m such that (5.13) holds for » = β m .

(e) x = x(»).

2. If n = nmax and the termination test is failed, signal failure.

If our model reduced Hessians remain uniformly positive de¬nite, a global convergence

result completely analogous to Theorem 3.2.4 holds.

Theorem 5.5.2. Let ∇f be Lipschitz continuous with Lipschitz constant L. Assume that the

matrices Hn are symmetric positive de¬nite and that there are κ and »l such that κ(Hn ) ¤ κ,

¯ ¯

and Hn ¤ »l for all n. Assume that there is ¯ > 0 such that ¯ ¤ n < min(Ui ’ Li )/2 for

all n.

Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR18.html.

Copyright ©1999 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.

100 ITERATIVE METHODS FOR OPTIMIZATION

Then

lim xn ’ xn (1) = 0,

(5.40)

n’∞

and hence any limit point of the sequence of iterates produced by Algorithm sgradproj is a

stationary point.

In particular, if xnl ’ x— is any convergent subsequence of {xn }, then x— = x— (1). If xn

converges to a nondegenerate local minimizer x— , then the active set of xn is the same as that of

x— after ¬nitely many iterations.

Proof. With Lemma 5.5.1 and its proof in hand, the proof follows the outline of the proof of

Theorem 5.4.6. We invite the reader to work through it in exercise 5.8.3.

5.5.2 The Projected Newton Method

The requirement in the hypothesis of Theorem 5.5.2 that the sequence { n } be bounded away

from zero is used to guarantee that the steplengths »n are bounded away from zero. This is

needed because appears in the numerator in (5.36). However, once the active set has been

identi¬ed and one is near enough to a nondegenerate local minimizer for the reduced Hessians to

be spd, one is solving an unconstrained problem. Moreover, once near enough to that minimizer,

the convergence theory for Newton™s method will hold. Then one can, in principle, set n = 0

and the iteration will be q-quadratically convergent. In this section we discuss an approach from

[19] for making a transition from the globally convergent regime described in Theorem 5.5.2 to

the locally convergent setting where Newton™s method converges rapidly.

If the initial iterate x0 is suf¬ciently near a nondegenerate local minimizer x— and we take

Hn = ∇2 f (xn )

R

in Algorithm sgradproj, then the resulting projected Newton method will take full steps (i.e.,

» = 1) and, if n is chosen with care, converge q-quadratically to x— .

A speci¬c form of the recommendation from [19], which we use here, is

= min( xn ’ xn (1) , min(Ui ’ Li )/2).

(5.41) n

Note that while xn is far from a stationary point and the reduced Hessian is spd, then n will be

bounded away from zero and Theorem 5.5.2 will be applicable. The convergence result is like

Theorem 2.3.3 for local convergence but makes the strong assumption that Hn is spd (valid near

x— , of course) in order to get a global result.

Algorithm projnewt is the formal description of the projected Newton algorithm. It is a

bit more than just a speci¬c instance of Algorithm gradproj. Keep in mind that if the initial

iterate is far from x— and the reduced Hessian is not spd, then the line search (and hence the

entire iteration) may fail. The algorithm tests for this. This possibility of inde¬niteness is the

weakness in any line search method that uses ∇2 f when far from the minimizer. The inputs to

Algorithm projnewt are the same as those for Algorithm gradproj. The algorithm exploits

the fact that

R(x, , ∇2 f (x)) = R(x, , ∇2 f (x))

(5.42) R

which follows from A(x) ‚ A (x).

Algorithm 5.5.2. projnewt(x, f, „, nmax)

1. For n = 1, . . . , nmax

(a) Compute f and ∇f ; test for termination using (5.18).

(b) Set = x ’ x(1) .

Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR18.html.

Copyright ©1999 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.

BOUND CONSTRAINTS 101

(c) Compute and factor R = R(x, , ∇2 f (x)). If R is not spd, terminate with a failure

R

message.

(d) Solve Rd = ’∇f (xc ).

(e) Find the least integer m such that (5.13) holds for » = β m .

(f) x = x(»).

2. If n = nmax and the termination test is failed, signal failure.

Theorem 5.5.3. Let x— be a nondegenerate local minimizer. Then if x0 is suf¬ciently near

to x— and A(x0 ) = A(x— ) then the projected Newton iteration, with n = xn ’ xn (1) , will

converge q-quadratically to x— .

Proof. Our assumption that the active set has been identi¬ed, i.e.,

A(xc ) = A(x+ ) = A(x— ),

implies that

PA(xc ) ec = PA(xc ) e+ = 0.

Hence, we need only estimate PI(xc ) e+ to prove the result.

Let

δ — = min— (|(x)i ’ Ui |, |(x)i ’ Li |) > 0.

i∈I(x )

We reduce e if necessary so that

e ¤ δ — /M,

where M is the constant in Theorem 5.4.2. We may then apply Theorem 5.4.2 to conclude that

both c < δ — and ec < δ — . Then any index i ∈ A c (xc ) must also be in A(xc ) = A(x— ).

Hence

A c (xc ) = A(xc ) = A(x— ).

(5.43)

From this we have

R(xc , c , ∇2 f (xc )) = ∇2 f (xc ).

(5.44) R R

Hence, for ec suf¬ciently small the projected Newton iteration is

x+ = P(xc ’ (∇2 f (xc ))’1 ∇f (xc )).

R

By the fundamental theorem of calculus,

∇f (xc ) = ∇f (x— ) + ∇2 f (xc )ec + E1 ,

(5.45)

where

1

(∇2 f (x— + tec ) ’ ∇2 f (xc ))ec dt

E1 =

0

and hence E1 ¤ K1 ec 2 for some K1 > 0.

By the necessary conditions,

PI(x) ∇f (x— ) = PI(x— ) ∇f (x— ) = 0.

(5.46)

By the fact that I(xc ) = I(x— ), we have the equivalent statements

ec = PI(xc ) ec and PA(xc ) ec = 0.

(5.47)

Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR18.html.

Copyright ©1999 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.

102 ITERATIVE METHODS FOR OPTIMIZATION

Therefore, combining (5.45), (5.46), (5.47),

= PI(xc ) ∇2 f (xc )PI(xc ) ec + PI(xc ) E1

PI(xc ) ∇f (xc )

= PA(xc ) ec + PI(xc ) ∇2 f (xc )PI(xc ) ec + PI(xc ) E1

(5.48)

= ∇2 f (xc )ec + PI(xc ) E1 .

R

So, by de¬nition of ∇2 ,

R

PI(xc ) (∇2 f (xc ))’1 ∇f (xc ) = (∇2 f (xc ))’1 PI(xc ) ∇f (xc ) = ec + E2 ,

R R

where E2 ¤ K2 ec 2 for some K2 > 0.

Since PI(xc ) Pw = PPI(xc ) w for all w ∈ RN ,

= PI(xc ) P(xc ’ (∇2 f (xc ))’1 ∇f (xc ))

PI(xc ) x+ R

= PPI(xc ) (xc ’ (∇2 f (xc ))’1 ∇f (xc )) = P(x— ’ E2 ).

R

2

Therefore, e+ ¤ K2 ec as asserted.

5.5.3 A Projected BFGS“Armijo Algorithm

We can apply the structured quasi-Newton updating scheme from §4.3 with

C(x) = PA

(5.49) (x)

and update an approximation to the part of the model Hessian that acts on the inactive set. In

this way we can hope to maintain a positive de¬nite model reduced Hessian with, say, a BFGS

update. So if our model reduced Hessian is

R = C(x) + A,

we can use (4.42) to update A (with A0 = PI 0 (x0 ) , for example), as long as the active set does

not change. If one begins the iteration near a nondegenerate local minimizer with an accurate

approximation to the Hessian, then one would expect, based on Theorem 5.5.3, that the active

set would remain constant and that the iteration would converge q-superlinearly.

However, if the initial data is far from a local minimizer, the active set can change with each

iteration and the update must be designed to account for this. One way to do this is to use a

projected form of the BFGS update of A from (4.42),

T

y# y# (Ac s)(Ac s)T

A+ = PI+ Ac PI+ + ’ PI+ PI+ ,

(5.50) T sT Ac s

y# s

with

y # = PI+ (∇f (x+ ) ’ ∇f (xc )).

Here I+ = I + (x+ ). This update carries as much information as possible from the previous

model reduced Hessian while taking care about proper approximation of the active set. As in

T

the unconstrained case, if y # s ¤ 0 we can either skip the update or reinitialize A to PI .

A is not spd if any constraints are active. However, we can demand that A be symmetric

inde¬nite, and a generalized inverse A† exists. We have

(PA + A)’1 = PA + A† .

(5.51)

Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR18.html.

Copyright ©1999 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.

BOUND CONSTRAINTS 103

If A(xc ) = A(x+ ) any of the low-storage methods from Chapter 4 can be used to update A† .

In this case we have

T

sy # y # sT ssT

A† A†

= I’ I’ + .

(5.52) c

+ T T T

y# s y# s y# s

Since s = PI+ s if A(x+ ) = A(xc ), we could replace s by s# = PI+ s in (5.52).

If the set of active constraints changes, then (5.52) is no longer true and we cannot replace

s by s# . One approach is to store A, update it as a full matrix, and refactor with each nonlinear

iteration. This is very costly for even moderately large problems. Another approach, used in

[250], is to reinitialize A = PI whenever the active set changes. The problem with this is that

in the terminal phase of the iteration, when most of the active set has been identi¬ed, too much

information is lost when A is reinitialized.

In this book we suggest an approach based on the recursive BFGS update that does not

discard information corresponding to that part of the inactive set that is not changed. The idea is

that even if the active set has changed, we can still maintain an approximate generalized inverse

with

T T T

s# y # y # s# s# s#

† †

A+ = I ’ PI+ Ac PI+ I ’ + .

(5.53) T T T

y # s# y # s# y # s#

The formulation we use in the MATLAB code bfgsbound is based on (5.52) and Al-

gorithm bfgsrec. Algorithm bfgsrecb stores the sequences {yk } and {s# } and uses

#

k

Algorithm bfgsrec and the projection PI+ to update A† as the iteration progresses. The data

are the same as for Algorithm bfgsrec with the addition of

PIn = PI .

n (xn )

Note that the sequences {yk } and {s# } are changed early in the call and then the unconstrained

#

k

algorithm bfgsrec is used to do most of the work.

Algorithm 5.5.3. bfgsrecb(n, {s# }, {yk }, A† , d, PIn )

#

0

k

1. d = PIn d.

2. If n = 0, d = A† d; return

0

T

T # #

3. ± = s# n’1 d/yn’1 s# ; d = d ’ ±yn’1

4. call bfgsrec(n ’ 1, {s# k }, {yk }, A† , d)

#

0

T T

# #

5. d = d + (± ’ (yn’1 d/yn’1 s# n’1 ))s# n’1

6. d = PIn d.

The projected BFGS“Armijo algorithm that we used in the example problems in §5.7 is

based on Algorithm bfgsrecb. Note that we reinitialize ns to zero (i.e., reinitialize A to PI )

#T

when yns s ¤ 0. We found experimentally that this was better than skipping the update.

Algorithm 5.5.4. bfgsoptb(x, f, „, u, l)

1. ns = n = 0; pg0 = pg = x ’ P(x ’ ∇f (x))

= min(min(Ui ’ Li )/2, pg ); A = A (x); I = I (x); A0 = PI

2.

3. While pg ¤ „a + „r pg0

Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR18.html.

Copyright ©1999 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.

104 ITERATIVE METHODS FOR OPTIMIZATION

(a) d = ’∇f (x); Call bfgsrecb(ns, {s# }, {yk }, A† , d, PI )

#

0

k

(b) d = ’PA ∇f (x) + d

(c) Find the least integer m such that (5.13) holds for » = β m . Set s# = PI (x(») ’ x)

ns

#

(d) xp = x(»); y = ∇f (xp) ’ ∇f (x); x=xp; yns = PI (∇f (xp) ’ ∇f (x))

T

(e) If yns s# s > 0 then ns = ns + 1, else ns = 0

#

n

(f) x = xp; pg = x ’ P(x ’ ∇f (x))

= min(min(Ui ’ Li )/2, pg ); A = A (x); I = I (x)

(g)

(h) n = n + 1

Theorem 4.1.3 can be applied directly once the active set has been identi¬ed and a good

initial approximation to the reduced Hessian is available. The reader is invited to construct the

(easy!) proof in exercise 5.8.6.

Theorem 5.5.4. Let x— be a nondegenerate local minimizer. Then if x0 is suf¬ciently near

to x— , A(x0 ) = A(x— ), and A0 suf¬ciently near to PI(x— ) ∇2 f (x— )PI(x— ) , then the projected

BFGS iteration, with n = xn ’ xn (1) , will converge q-superlinearly to x— .

A global convergence result for this projected BFGS algorithm can be derived by combining

Theorems 5.5.2 and 4.1.9.

Theorem 5.5.5. Let ∇f be Lipschitz continuous on „¦. Assume that the matrices Hn are

constructed with the projected BFGS method (5.50) and satisfy the assumptions of Theorem 5.5.2.

Then (5.40) and the conclusions of Theorem 5.5.2 hold.

Moreover, if x— is a nondegenerate local minimizer such that there is n0 such that A(xn ) =

A(x— ) for all n ≥ n0 , Hn0 is spd, and the set

D = {x | f (x) ¤ f (xn0 ) and A(x) = A(x— )}