for all u ∈ RN .

Therefore, since e = PI — e,

1

2

eT PI — ∇2 f (x— + te)e dt

PI — (x ’ x(1)) =

0

1

eT PI — ∇2 f (x— + te)PI — e dt

=

0

≥ µ PI e 2 /2.

—

Therefore, x ’ x(1) ≥ min(1, µ/2) e and setting M = max{2 + L, 1, 2/µ} completes

the proof.

Following the unconstrained case, we formulate a termination criterion based on relative and

absolute reductions in the measure of stationarity x ’ x(1) . Given r0 = x0 ’ x0 (1) and

relative and absolute tolerances „r and „a the termination criterion for Algorithm gradproj is

x ’ x(1) ¤ „a + „r r0 .

(5.18)

5.4.2 Convergence Analysis

The convergence analysis is more complicated than that for the steepest descent algorithm be-

cause of the care that must be taken with the constraints. Our analysis begins with several

preliminary lemmas.

Lemma 5.4.3. For all x, y ∈ „¦,

(y ’ x(»))T (x(») ’ x + »∇f (x)) ≥ 0.

(5.19)

Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR18.html.

Copyright ©1999 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.

94 ITERATIVE METHODS FOR OPTIMIZATION

Proof. By de¬nition of P

x(») ’ x + »∇f (x) ¤ y ’ x + »∇f (x)

for all y ∈ „¦. Hence t = 0 is a local minimum for

φ(t) = (1 ’ t)x(») + ty ’ x + »∇f (x) 2 /2

and, therefore,

0 ¤ φ (0) = (y ’ x(»))T (x(») ’ x + »∇f (x))

as asserted.

We will most often use the equivalent form of (5.19)

(x ’ x(»))T (y ’ x(»)) ¤ »∇f (x)T (y ’ x(»)).

(5.20)

Setting y = x in (5.20), we state Corollary 5.4.4.

Corollary 5.4.4. For all x ∈ „¦ and » ≥ 0,

2

¤ »∇f (x)T (x ’ x(»)).

x ’ x(»)

(5.21)

An important result in any line search analysis is that the steplengths remain bounded away

from 0.

Theorem 5.4.5. Assume that ∇f is Lipschitz continuous with Lipschitz constant L. Let

x ∈ „¦. Then the suf¬cient decrease condition (5.13) holds for all » such that

2(1 ’ ±)

0<»¤ .

(5.22)

L

Proof. We begin with the fundamental theorem of calculus. Setting y = x ’ x(») we have

1

∇f (x ’ ty)T y dt.

f (x ’ y) ’ f (x) = f (x(»)) ’ f (x) = ’

0

Hence,

= f (x) + ∇f (x)T (x(») ’ x)

f (x(»))

(5.23) 1

(∇f (x ’ ty) ’ ∇f (x))T y dt.

’

0

Rearranging terms in (5.23) gives

»(f (x) ’ f (x(»))) = »∇f (x)T (x ’ x(»)) + »E,

(5.24)

where

1

(∇f (x ’ ty) ’ ∇f (x))T y dt

E=

0

and hence

E ¤ L x ’ x(») 2 /2.

Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR18.html.

Copyright ©1999 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.

BOUND CONSTRAINTS 95

So

»(f (x) ’ f (x(»))) ≥ »∇f (x)T (x ’ x(»)) ’ »L x ’ x(») 2 /2.

(5.25)

Therefore, using Corollary 5.4.4 we obtain

2

»(f (x) ’ f (x(»))) ≥ (1 ’ »L/2) x ’ x(»)

which completes the proof.

The consequence for the Armijo rule is that the line search will terminate when

2(1 ’ ±)

βm ¤ ¤ β m’1

L

if not before. Hence, a lower bound for the steplengths is

¯ 2β(1 ’ ±) .

»=

(5.26)

L

Theorem 5.4.6. Assume that ∇f is Lipschitz continuous with Lipschitz constant L. Let

{xn } be the sequence generated by the gradient projection method. Then every limit point of

the sequence is a stationary point.

Proof. Since the sequence {f (xn )} is decreasing and f is bounded from below on „¦, f (xn )

has a limit f — . The suf¬cient decrease condition, as in the proof of Theorem 3.2.4, and (5.26)

imply that

2

xn ’ xn+1 ¤ »(f (xn ) ’ f (xn+1 ))/± ¤ (f (xn ) ’ f (xn+1 ))/± ’ 0

as n ’ ∞.

Now let y ∈ „¦ and n ≥ 0. By (5.20) we have

∇f (xn )T (xn ’ y) = ∇f (xn )T (xn+1 ’ y) + ∇f (xn )T (xn ’ xn+1 )

¤ »’1 (xn ’ xn+1 )T (xn+1 ’ y) + ∇f (xn )T (xn ’ xn+1 ).

n

Therefore, by (5.26),

∇f (xn )T (xn ’ y) ¤ xn ’ xn+1 (»’1 xn+1 ’ y + ∇f (xn ) ),

n

(5.27)

¯

∇f (xn )T (xn ’ y) ¤ xn ’ xn+1 (»’1 xn+1 ’ y + ∇f (xn ) ).

If xnl ’ x— is a convergence subsequence of {xn }, then we may take limits in (5.27) as

l ’ ∞ and complete the proof.

5.4.3 Identi¬cation of the Active Set

The gradient projection iteration has the remarkable property that if it converges to a nondegen-

erate local minimizer, then the active set An of xn is the same as A— after only ¬nitely many

iterations.

Theorem 5.4.7. Assume that f is Lipschitz continuously differentiable and that the gradient

projection iterates {xn } converge to a nondegenerate local minimizer x— . Then there is n0 such

that A(xn ) = A(x— ) for all n ≥ n0 .

¯

Proof. Let » be the lower bound for the steplength. Let δ be such that the conclusions of

¯ ¯

Lemma 5.4.1 hold for » = » (and hence for all » ≥ »). Let n0 be such that en < δ for all

n ≥ n0 ’ 1 and the proof is complete.

Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR18.html.

Copyright ©1999 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.

96 ITERATIVE METHODS FOR OPTIMIZATION

5.4.4 A Proof of Theorem 5.2.4

We close this section with a proof of Theorem 5.2.4. We de¬ne a nonsmooth function

F (x) = x ’ P(x ’ ∇f (x)).

(5.28)

Using (5.12),

F (x) = x ’ x(1).

We now prove Theorem 5.2.4.

Proof. Corollary 5.4.4 states that

x— ’ x— (») 2

¤ »∇f (x— )T (x— ’ x— (»)).

If we set x = x— (») in the de¬nition of stationarity (5.7) we have

∇f (x— )T (x— ’ x— (»)) ¤ 0

and hence x— = x— (»).

Conversely assume that x— = x— (») for all » > 0. This implies that x— is left invariant by

the gradient projection iteration and is therefore a stationary point.

By setting » = 1 we obtain a simple consequence of Theorem 5.2.4.

Corollary 5.4.8. Let f be a Lipschitz continuously differentiable function on „¦. Then if

x is stationary then F (x— ) = 0.

—

5.5 Superlinear Convergence

Once the gradient projection iteration has identi¬ed the active constraints, PA(x— ) x— is known.

At that point the minimization problem for PI x— is unconstrained and, in principal, any super-

linearly convergent method for unconstrained optimization could then be used.

The problem with this idea is, of course, that determining when the active set has been

identi¬ed is possible only after the problem has been solved and an error in estimating the active

set can have devastating effects on convergence. In this section we discuss two approaches: one,

based on Newton™s method, is presented only as a local method; the other is a BFGS“Armijo

method similar to Algorithm bfgsopt.

We will begin with the development of the local theory for the projected Newton method

[19]. This analysis illustrates the important problem of estimation of the active set. As with the

unconstrained minimization problem, the possibility of negative curvature makes this method

dif¬cult to globalize (but see §5.6 for pointers to the literature on trust region methods). Following

the approach in §4.2 we describe a projected BFGS“Armijo scheme in §5.5.3.

5.5.1 The Scaled Gradient Projection Algorithm

One might think that the theory developed in §5.4 applies equally well to iterations of the form

’1

x+ = P(xc ’ »Hc ∇f (xc ))

where Hc is spd. This is not the case as the following simple example illustrates. Let N = 2,

Li = 0, and Ui = 1 for all i. Let

f (x) = x ’ (’1, 1/2)T 2

/2;

Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR18.html.

Copyright ©1999 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.

BOUND CONSTRAINTS 97

then the only local minimizer for (5.3) is x— = (0, 1/2)T . Let xc = (0, 0) (not a local mini-

mizer!); then ∇f (xc ) = (1, ’1/2)T . If

21

’1

Hc =

12

’1

then Hc (and hence Hc ) is spd, and

2 1 1 3/2

’1

Hc ∇f (xc ) = = .

1 2 ’1/2 0

Therefore, for all » > 0,

0 1 ’3»/2 0

’1

xc (») = P ’ »Hc =P = = xc .

0 ’1/2 0 0

The reason that xc (») = xc for all » > 0 is that the search direction for the unconstrained

’1

problem has been rotated by Hc to be orthogonal to the direction of decrease in the inactive

directions for the constrained problem. Hence, unlike the constrained case, positivity of the

model Hessian is not suf¬cient and we must be able to estimate the active set and model the

reduced Hessian (rather than the Hessian) if we expect to improve convergence.

The solution proposed in [19] is to underestimate the inactive set in a careful way and

therefore maintain a useful spd approximate reduced Hessian. For x ∈ „¦ and

0 ¤ < min(Ui ’ Li )/2,

we de¬ne A (x), the -active set at x, by

A (x) = {i | Ui ’ (x)i ¤ or (x)i ’ Li ¤ }.

(5.29)

And let I (x), the -inactive set, be the complement of A (x).

Given 0 ¤ c < min(Ui ’ Li )/2, xc , and an spd matrix Hc , we model the reduced Hessian

with Rc , the matrix with entries

±

δij if i ∈ A c (xc ) or j ∈ A c (xc ),

Rc = PA c (xc ) + PI c (xc )Hc PI c (xc ) =

(Hc )ij otherwise.

(5.30)

When the explicit dependence on xc , c , and Hc is important we will write

R(xc , c , Hc ).

So, for example,

∇2 f (xc ) = R(xc , 0, ∇2 f (xc )).

R

Given 0 < < min(Ui ’ Li )/2 and an spd H, de¬ne

xH, (») = P(x ’ »R(x, , H)’1 ∇f (x)).

It requires proof that

f (xH, (»)) < f (x)

for » suf¬ciently small. We prove more and show that the suf¬cient decrease condition

f (xH, (»)) ’ f (x) ¤ ’±∇f (x)T (x ’ xH, (»))

(5.31)

Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR18.html.

Copyright ©1999 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.

98 ITERATIVE METHODS FOR OPTIMIZATION

holds for suf¬ciently small ».

Lemma 5.5.1. Let x ∈ „¦, 0 < < min(Ui ’ Li )/2, and H be spd with smallest and largest

eigenvalues 0 < »s ¤ »l . Let ∇f be Lipschitz continuous on „¦ with Lipschitz constant L. Then

¯

there is »( , H) such that (5.31) holds for all

¯

» ¤ »( , H).

(5.32)

Proof. The proof will show that

∇f (x)T (x ’ xH, (»)) ≥ »’1 ∇f (x)T (x ’ x(»))

l

and then use the method of proof from Theorem 5.4.5. We do this by writing

∇f (x)T (x ’ xH, (»)) = (PA T

’ xH, (»)) + (PI T

’ xH, (»))

(x) ∇f (x)) (x (x) ∇f (x)) (x

and considering the two terms on the right side separately.

We begin by looking at (PA (x) ∇f (x))T (x ’ xH, (»)). Note that

(xH, (»))i = (x(»))i for i ∈ A (x)

and, therefore,

T

’ xH, (»)) = (PA T

(PA (x) ∇f (x)) (x (x) ∇f (x)) (x ’ x(»)).

(5.33)

We will need to show that

T

(PA (x) ∇f (x)) (x ’ x(»)) ≥ 0.

(5.34)

Now assume that

min(Ui ’ Li )

¯

» < »1 = .

(5.35)

2 maxx∈„¦ ∇f (x) ∞

Since A(x) ‚ A (x) we can investigate the contributions of A(x) and A (x) © I(x) separately.

If i ∈ A(x) then (5.35) implies that either (x ’ x(»))i = »(∇f (x))i or (x ’ x(»))i = 0. In

either case (x ’ x(»))i (∇f (x))i ≥ 0. If i ∈ A (x) © I(x) and (x ’ x(»))i = »(∇f (x))i , then

it must be the case that i ∈ A(x(»)) and therefore we must still have (x ’ x(»))i (∇f (x))i ≥ 0.

Hence (5.34) holds.

Now if i ∈ I (x) then, by de¬nition,

Li + ¤ (x)i ¤ Ui ’

and, hence, if

¯

» ¤ »2 =

(5.36)

maxx∈„¦ R(x, , H)’1 ∇f (x) ∞

then i is in the inactive set for both xH, (») and x(»). Therefore, if (5.36) holds then

T

’ xH, (»)) = »(PI T ’1

(PI (x) ∇f (x)) (x (x) ∇f (x)) H PI (x) ∇f (x)

≥ »’1 »’1 PI 2

(x) (x ’ x(»))

(5.37) l

= »’1 (PI T

(x) ∇f (x)) (x ’ x(»)).

l

Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR18.html.

Copyright ©1999 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.