5.1 Problem Statement

The goal of this chapter is to show how the techniques of Chapters 2, 3, and 4 can be used to

solve a simple constrained optimization problem. The algorithm we suggest at the end in §5.5.3

is a useful extension of the BFGS“Armijo algorithm from Chapter 4. We will continue this line

of development when we solve noisy problems in Chapter 7.

Let {Li }N and {Ui }N be sequences of real numbers such that

i=1 i=1

’∞ < Li < Ui < +∞.

(5.1)

The bound constrained optimization problem is to ¬nd a local minimizer x— of a function f of

N variables subject to the constraint that

x— ∈ „¦ = {x ∈ RN | Li ¤ (x)i ¤ Ui }.

(5.2)

By this we mean that x— satis¬es

f (x— ) ¤ f (x) for all x ∈ „¦ near x— .

(5.3)

It is standard to express this problem as

min f (x)

(5.4)

x∈„¦

or as min„¦ f . The set „¦ is called the feasible set and a point in „¦ is called a feasible point.

Because the set „¦ is compact there is always a solution to our minimization problem [229].

The inequalities Li ¤ (x)i ¤ Ui are called inequality constraints or simply constraints.

We will say that the ith constraint is active at x ∈ „¦ if either (x)i = Li or (x)i = Ui . If the

ith constraint is not active we will say that it is inactive. The set of indices i such that the ith

constraint is active (inactive) will be called the set of active (inactive) indices at x.

We will write A(x) and I(x) for the active and inactive sets at x.

5.2 Necessary Conditions for Optimality

For a continuously differentiable function of one variable, the necessary conditions for uncon-

strained optimality at x— are simply f (x— ) = 0 and, if f is twice continuously differentiable,

f (x— ) ≥ 0. A bound constrained problem in one variable restricts the domain of f to an

interval [a, b], and the necessary conditions must be changed to admit the possibility that the

87

Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR18.html.

Copyright ©1999 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.

88 ITERATIVE METHODS FOR OPTIMIZATION

minimizer is one of the endpoints. If x— = a is a local minimizer, then it need not be the case

that f (a) = 0; however, because a is a local minimizer, f (x) ≥ f (a) for all a ¤ x suf¬ciently

near a. Hence f (a) ≥ 0. Nothing, however, can be said about f . Similarly, if x— = b is a

local minimizer, f (b) ¤ 0. If f is differentiable on [a, b] (i.e., on an open set containing [a, b]),

then the necessary conditions for all three possibilities, x— = a, x— = b, and a < x— < b can be

neatly expressed by the following theorem.

Theorem 5.2.1. Let f be a continuously differentiable function of one variable on the

interval [a, b]. Let x— be a local minimum of f on [a, b]. Then

f (x— )(x ’ x— ) ≥ 0 for all x ∈ [a, b]

(5.5)

and, if f is twice continuously differentiable on [a, b],

f (x— )(x— ’ a)(b ’ x— ) ≥ 0.

(5.6)

The analogue (5.5) is expressed by the idea of stationarity.

De¬nition 5.2.1. A point x— ∈ „¦ is stationary for problem (5.4) if

∇f (x— )T (x ’ x— ) ≥ 0 for all x ∈ „¦.

(5.7)

As in the unconstrained case, stationary points are said to satisfy the ¬rst-order necessary

conditions.

The fact that optimality implies stationarity is proved with Taylor™s theorem just as it was in

the unconstrained case.

Theorem 5.2.2. Let f be continuously differentiable on „¦ and let x— be a solution of problem

(5.4). Then x— is a stationary point for problem (5.4).

Proof. Let x— be a solution of problem (5.4) and let y ∈ „¦. As „¦ is convex, the line segment

joining x— and y is entirely in „¦. Hence, the function

φ(t) = f (x— + t(y ’ x— ))

is de¬ned for t ∈ [0, 1] and has a local minimum at t = 0. Therefore, by Theorem 5.2.1

0 ¤ φ (0) = ∇f (x— )T (y ’ x— )

as asserted.

The case of a function of a single variable is less useful in explaining the role of the second

derivative. However, we can get a complete picture by looking at functions of two variables.

To illustrate the ideas we let N = 2 and let f be a twice Lipschitz continuously differentiable

function on „¦ = [0, 1] — [0, 1]. If x— is a solution of (5.4) and no constraints are active, then

∇2 f (x— ) is positive semide¬nite by the same arguments used in the unconstrained case. If one

or more constraints are active, however, then, just as in the one variable case, one cannot draw

conclusions about the positivity of ∇2 f (x— ). Suppose the minimizer is at x— = (ξ, 0) with

0 < ξ < 1. While nothing can be said about ‚ 2 f (x— )/‚x2 , the function φ(t) = f (t, 0), de¬ned

2

on [0, 1], must satisfy

φ (ξ) = ‚ 2 f (x— )/‚x2 ≥ 0.

1

Hence, second partials in directions corresponding to the inactive constraints must be nonnega-

tive, while nothing can be said about directions corresponding to active constraints.

Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR18.html.

Copyright ©1999 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.

BOUND CONSTRAINTS 89

We de¬ne the reduced Hessian to help make this idea precise.

De¬nition 5.2.2. Let f be twice differentiable at x ∈ „¦. The reduced Hessian ∇2 f (x) is

R

the matrix whose entries are

±

δij if i ∈ A(x) or j ∈ A(x),

2

(∇R f (x))ij =

(5.8)

(∇2 f (x))ij otherwise.

We can now present the second-order necessary conditions.

Theorem 5.2.3. Let f be twice Lipschitz continuously differentiable and let x— be the solution

of problem (5.4). Then the reduced Hessian ∇2 f (x— ) is positive semide¬nite.

R

Proof. Assume that there are M inactive indices and N ’ M active indices. We partition

x ∈ „¦, reordering the variables if needed, into x = (ξ, ζ) with ξ corresponding to the inactive

indices and ζ to the active. The map

φ(ξ) = f (ξ, ζ — )

has an unconstrained local minimizer at ξ — ∈ RM and hence ∇2 φ is positive semide¬nite. Since

the reduced Hessian can be written as

∇2 φ(x— ) 0

∇2 f (x— ) =

R 0 I

if the variables are partitioned in this way, the proof is complete.

We let P denote the projection onto „¦, that is, the map that takes x into the nearest point (in

the l2 -norm) in „¦ to x. We have that

±

Li if (x)i ¤ Li ,

(x)i if Li < (x)i < Ui ,

P(x)i =

(5.9)

Ui if (x)i ≥ Ui .

Theorem 5.2.4 states our ¬nal necessary condition; we defer the proof to §5.4.4.

Theorem 5.2.4. Let f be continuously differentiable. A point x— ∈ „¦ is stationary for

problem (5.4) if and only if

x— = P(x— ’ »∇f (x— ))

(5.10)

for all » ≥ 0.

5.3 Suf¬cient Conditions

With the de¬nition of the reduced Hessian in hand, the suf¬cient conditions are easy to formulate.

We begin by strengthening the notion of stationarity. If x— is stationary, i ∈ I(x— ), and ei is a

unit vector in the ith coordinate direction, then x— ± tei ∈ „¦ for all t suf¬ciently small. Since

df (x— ± tei )

= ±∇f (x— )T ei ≥ 0,

dt

therefore

(∇f (x— ))i = 0 for all i ∈ I(x— ).

We will use the concept of nondegenerate stationary point or strict complementarity in our

formulation of the suf¬cient conditions.

Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR18.html.

Copyright ©1999 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.

90 ITERATIVE METHODS FOR OPTIMIZATION

De¬nition 5.3.1. A point x— ∈ „¦ is a nondegenerate stationary point for problem (5.4) if

x— is a stationary point and

(∇f (x— ))i = 0 for all i ∈ A(x— ).

(5.11)

If x— is also a solution of problem (5.4) we say that x— is a nondegenerate local minimizer.

Our nondegeneracy condition is also referred to as strict complementarity.

If S is any set of indices de¬ne

±

(x)i , i ∈ S,

(PS x)i =

0, i ∈ S.

Nondegeneracy is important not only in the formulation of suf¬cient conditions but also in

the design of termination criteria. The ¬rst step in the use of nondegeneracy is Lemma 5.3.1.

Lemma 5.3.1. Let x— be a nondegenerate stationary point. Assume that A = A(x— ) is not

empty. Then there is σ such that

∇f (x— )T (x ’ x— ) = ∇f (x— )T PA (x ’ x— ) ≥ σ PA (x ’ x— )

for all x ∈ „¦.

Proof. If i ∈ A then nondegeneracy and stationarity imply that there is σ > 0 such that

either

(x— ) = Li and (∇f (x— ))i ≥ σ or (x— ) = Ui and (∇f (x— ))i ¤ ’σ.

i i

If x ∈ „¦ then for all i ∈ A,

(∇f (x— ))i (x ’ x— )i ≥ σ|(x ’ x— )i |.

Therefore, since x ≥ x 2,

1

∇f (x— )T PA (x ’ x— ) ≥ σ PA (x ’ x— ) ,

as asserted.

For a nondegenerate stationary point the suf¬ciency conditions are very similar to the un-

constrained case.

Theorem 5.3.2. Let x— ∈ „¦ be a nondegenerate stationary point for problem (5.4). Let

f be twice differentiable in a neighborhood of x— and assume that the reduced Hessian at x—

is positive de¬nite. Then x— is a solution of problem (5.4) (and hence a nondegenerate local

minimizer).

Proof. Let x ∈ „¦ and de¬ne φ(t) = f (x— + t(x ’ x— )). We complete the proof by showing

that either (i) φ (0) > 0 or (ii) φ (0) = 0, φ (0) > 0. Let e = x ’ x— and note that

φ (0) = ∇f (x— )T e = ∇f (x— )T (PA e + PI e).

Stationarity implies that ∇f (x— )T PI e = 0. If PA e = 0 then nondegeneracy implies that

∇f (x— )T PA e > 0

and hence (i) holds. If PA e = 0 then

φ (0) = (x ’ x— )T PI ∇2 f (x— )PI (x ’ x— ) = (x ’ x— )T ∇2 f (x— )(x ’ x— ) > 0,

R

proving (ii).

Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR18.html.

Copyright ©1999 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.

BOUND CONSTRAINTS 91

5.4 The Gradient Projection Algorithm

The gradient projection algorithm is the natural extension of the steepest descent algorithm to

bound constrained problems. It shares all the advantages and disadvantages of that algorithm.

Our approach follows that of [18]. Given a current iterate xc the new iterate is

x+ = P(xc ’ »∇f (xc )),

where » is a steplength parameter given by the Armijo rule or some other line search scheme.

In this section we will restrict our attention to the simplest form of the Armijo rule. In order to

implement any line search scheme, we must specify what we mean by suf¬cient decrease. For

» > 0 de¬ne

x(») = P(x ’ »∇f (x)).

(5.12)

For bound constrained problems we will express the suf¬cient decrease condition for line searches

(compare with (3.4)) as

’±

x ’ x(») 2 .

f (x(»)) ’ f (x) ¤

(5.13)

»

As with (3.4), ± is a parameter and is typically set to 10’4 [84].

The general algorithmic description follows in Algorithm 5.4.1.

Algorithm 5.4.1. gradproj(x, f, nmax)

1. For n = 1, . . . , nmax

(a) Compute f and ∇f ; test for termination.

(b) Find the least integer m such that (5.13) holds for » = β m .

(c) x = x(»).

2. If n = nmax and the termination test is failed, signal failure.

The next step is to elaborate on the termination criterion.

5.4.1 Termination of the Iteration

The termination criterion for unconstrained optimization that we have used previously must be

modi¬ed if we are to properly take the constraints into account. ∇f need not be zero at the

solution, but a natural substitute is to terminate the iteration if the difference between x and x(1)

is small. As in the case of unconstrained optimization or nonlinear equations, we must invoke

the suf¬cient conditions to show that such a termination criterion will accurately measure the

error.

As usual, we let e = x ’ x— .

We begin with a lemma that connects the active and inactive sets at a nondegenerate local

minimizer with nearby points.

Lemma 5.4.1. Let f be twice continuously differentiable on „¦ and let x— be a nondegenerate

stationary point for problem (5.4). Let » ∈ (0, 1]. Then for x suf¬ciently near x— ,

1. A(x) ‚ A(x— ) and (x)i = (x— )i for all i ∈ A(x).

2. A(x(»)) = A(x— ) and (x(»))i = (x— )i for all i ∈ A(x— ).

Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR18.html.

Copyright ©1999 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.

92 ITERATIVE METHODS FOR OPTIMIZATION

Proof. Let

A— = A(x— ), I — = I(x— ), A = A(x), and I = I(x).

Let

δ1 = min {(Ui ’ (x— )i ), ((x— )i ’ Li ), (Ui ’ Li )/2}.

—i∈I

If i ∈ I — and e < δ1 then Li < (x)i < Ui . Hence,

I— ‚ I

proving the ¬rst assertion that A ‚ A— . Moreover, since

e < δ1 ¤ min {(Ui ’ Li )/2} ,

then (x)i = (x— )i for all i ∈ A.

Now let A» and I » be the active and inactive sets for x(») = P(x ’ »∇f (x)). Let i ∈ A— .

By Lemma 5.3.1 and continuity of ∇f there is δ2 such that if e < δ2 then

(∇f (x— + e))i (x ’ x— )i > σ|x ’ x— |i /2.

Therefore, if

e < δ3 < min(σ/2, δ2 ),

then i ∈ A» and (x(»))i = (x— )i . Hence A— ‚ A» .

It remains to prove that A» ‚ A— . By de¬nition of P we have

P(x) ’ P(y) ¤ x ’ y

for all x, y ∈ RN . Continuity of ∇2 f implies that ∇f is Lipschitz continuous. We let L denote

the Lipschitz constant of ∇f in „¦. By stationarity and Theorem 5.2.4,

x— = x— (») = P(x— ’ »∇f (x— )),

and, therefore,

x— ’ x(») = P(x— ’ »∇f (x— )) ’ P(x ’ »∇f (x))

(5.14)

¤ e + » ∇f (x— ) ’ ∇f (x) ¤ (1 + L») e .

If there is i ∈ A» © I — then we must have

x— ’ x(») ≥ δ1 = min {(Ui ’ x— ), (x— ’ Li )}.

(5.15) — i∈I

However, if

e < δ4 = min(δ3 , δ1 /(1 + L))

then (5.14) implies that (5.15) cannot hold. This completes the proof.

Theorem 5.4.2. Let f be twice continuously differentiable on „¦ and let x— be a nondegenerate

stationary point for problem (5.4). Assume that suf¬cient conditions hold at x— . Then there are

δ and M such that if e < δ and A(x) = A(x— ) then

e /M ¤ x ’ x(1) ¤ M e .

(5.16)

Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR18.html.

Copyright ©1999 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.

BOUND CONSTRAINTS 93

Proof. Again we let L denote the Lipschitz constant of ∇f in „¦ and

A— = A(x— ), I — = I(x— ), A = A(x), and I = I(x).

Using stationarity we obtain

= e ’ (x(1) ’ x— (1))

x ’ x(1)

¤ e + P(x ’ ∇f (x)) ’ P(x— ’ ∇f (x— ))

¤ 2 e + ∇f (x) ’ ∇f (x— ) ¤ (2 + L) e .

Hence, the right inequality in (5.16) holds.

To verify the left inequality in (5.16) we apply Lemma 5.4.1. Let δ1 be such that e < δ1

implies that the conclusions of Lemma 5.4.1 hold for » = 1. The lemma implies that

±

(∇f (x))i , i ∈ I — ,

(x ’ x(1))i =

(5.17)

(e)i = 0, i ∈ A— .

The remaining case is if i ∈ I = I — . The suf¬ciency conditions imply that there is µ > 0

such that

uT PI — ∇2 f (x— )PI — u ≥ µ PI — u 2

for all u ∈ RN . Hence, there is δ2 so that if e < δ2 then