is convex, then the projected BFGS-Armijo algorithm converges q-superlinearly to x*.

5.6 Other Approaches

Our simple projected BFGS method is effective for small- to medium-sized problems and for very large problems that are discretizations of infinite-dimensional problems with the appropriate compactness properties. The example in §4.4.2 nicely illustrates this point. For other kinds of large problems, however, more elaborate methods are needed, and we give some pointers to the literature in this section.

The limited memory BFGS method for unconstrained problems described in [44] and [176]

has also been extended to bound constrained problems [42], [280]. More general work on line

search methods for bound constrained problems can be found in [47], [194], and [42].
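To make the limited-memory idea concrete, the inverse-Hessian approximation built from the stored (s, y) pairs can be applied to a gradient via the well-known two-loop recursion. The sketch below is a generic Python illustration, not the code from the references above; the function name and interface are ours.

```python
import numpy as np

def lbfgs_direction(g, s_list, y_list):
    """Two-loop recursion: apply the limited-memory BFGS inverse-Hessian
    approximation, built from the stored (s, y) pairs, to the gradient g.
    The search direction is then the negative of the returned vector."""
    q = g.copy()
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = []
    # first loop: newest pair to oldest
    for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):
        a = rho * np.dot(s, q)
        alphas.append(a)
        q -= a * y
    # scale by gamma = s^T y / y^T y from the newest pair
    if s_list:
        s, y = s_list[-1], y_list[-1]
        q *= np.dot(s, y) / np.dot(y, y)
    # second loop: oldest pair to newest
    for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        b = rho * np.dot(y, q)
        q += (a - b) * s
    return q  # approximates H^{-1} g
```

Because the BFGS inverse update satisfies the secant condition, applying the recursion to the most recent y reproduces the most recent s exactly.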

Very general theories have been developed for convergence of trust region methods for

bound constrained problems. The notion of Cauchy decrease can, for example, be replaced by

the decrease from a gradient projection step for the quadratic model [191], [259], [66]. One

could look for minima of the quadratic model along the projection path [63], [64], or attempt to

project the solution of an unconstrained model using the reduced Hessian [162].
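As a rough illustration of the first of these ideas, one can compare the quadratic model's value at points along the projection path with its value at the current iterate. The Python sketch below samples the path on a grid; it is our crude simplification of the exact piecewise-quadratic minimization in the references, and all names are hypothetical.

```python
import numpy as np

def projected_cauchy_point(xc, g, H, lo, hi, nsteps=50, tmax=1.0):
    """Crudely minimize the quadratic model
        m(x) = g^T (x - xc) + 0.5 (x - xc)^T H (x - xc)
    over the projection path x(t) = P(xc - t g), sampling t on a grid.
    P is the projection onto the box {lo <= x <= hi}."""
    def model(x):
        d = x - xc
        return np.dot(g, d) + 0.5 * np.dot(d, H @ d)
    best_x, best_m = xc, 0.0   # t = 0 gives m = 0
    for t in np.linspace(0.0, tmax, nsteps + 1)[1:]:
        x = np.clip(xc - t * g, lo, hi)   # projected gradient point
        m = model(x)
        if m < best_m:
            best_x, best_m = x, m
    return best_x, best_m
```

Any point returned this way gives at least as much model decrease as the best sampled projected gradient step, which is the role the Cauchy point plays in the trust region theory.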

A completely different approach can be based on interior point methods. This is an active research area, and the algorithms are not, at least at this moment, easy to implement or analyze. This line of research began with [57] and [58]. We refer the reader to [86] and [266] for more recent accounts of this aspect of the field and to [140] and [79] for some applications to control problems and an account of the difficulties in infinite dimensions.

Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR18.html.

Copyright ©1999 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.


Figure 5.1: Solution to Constrained Parameter ID Problem. (Plots of projected gradient norm and function value against iterations, for the gradient projection and projected BFGS algorithms; graphics not reproduced.)

Figure 5.2: Solution to Discrete Control Problem: First Example. (Plot of the solution to the bound constrained control problem on a grid of 2000 points; graphic not reproduced.)



5.6.1 Infinite-Dimensional Problems

The results in this part of the book do not extend in a direct way to infinite-dimensional problems. One reason for this is that infinite-dimensional problems often have countably infinitely many constraints or even a continuum of constraints; hence Ω is not compact in the norm topology of the Banach space in which the problem is posed, and appeals to various types of weak continuity must be made (see [122] for an example of such arguments and [122] and [10] for applications). Moreover, identification of an active set in finitely many iterations is not always possible. A more complete account of this issue may be found in [254], [162], and [161].

These are not the only complications that can arise in infinite dimensions. Even the projected gradient method presents challenges, especially if the minima fail to be nondegenerate in the sense of this book [94], [95]. Convergence behavior for discretized problems can differ from that for the continuous problem [97]. Nonequivalence of norms makes convergence results difficult to formulate and analyze for both line search [96], [254], [98] and trust region [140], [162] methods.

The functional analytic structure of many control problems can be exploited with fast multilevel methods. Both second-kind multigrid methods from [138] and variants of the Atkinson-Brakhage method [9], [31] have been applied to fixed point formulations of parabolic boundary control problems in one space dimension [136], [137], [153], [162], [161].

5.7 Examples

The computations in this section were done with the MATLAB code bfgsbound. In this code the storage is limited to five pairs of vectors, and β = .1 was used in the line search.
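bfgsbound itself is MATLAB and is not reproduced here. As a hedged illustration of the simpler of the two algorithms compared below, the following Python sketch implements a gradient projection iteration with a backtracking line search using the reduction factor β = .1 from the text; the sufficient decrease parameter α = 10^-4 and all names are our assumptions.

```python
import numpy as np

def proj(x, lo, hi):
    """Projection onto the box {lo <= x <= hi}: componentwise clipping."""
    return np.clip(x, lo, hi)

def grad_proj(f, grad, x0, lo, hi, beta=0.1, alpha=1e-4, tol=1e-6, maxit=500):
    """Gradient projection with a backtracking (Armijo-type) line search.
    Terminates on the projected gradient criterion ||x - x(1)|| <= tol,
    where x(lam) = P(x - lam * grad f(x))."""
    x = proj(x0, lo, hi)
    for _ in range(maxit):
        g = grad(x)
        if np.linalg.norm(x - proj(x - g, lo, hi)) <= tol:
            break
        lam = 1.0
        for _ in range(30):          # bounded backtracking
            xt = proj(x - lam * g, lo, hi)
            # sufficient decrease measured by the step actually taken
            if f(xt) <= f(x) - (alpha / lam) * np.linalg.norm(x - xt) ** 2:
                break
            lam *= beta
        x = xt
    return x
```

On a convex quadratic with the minimizer outside the box, the iteration lands on the projection of the unconstrained minimizer.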

5.7.1 Parameter ID Problem

We consider the parameter identification problem from §3.4.1 with bounds L = (2, 0)^T and U = (20, 5)^T. The initial iterate x0 = (5, 5)^T is feasible, but the global minimum of (1, 1)^T is not. As one might expect, the lower bound constraint on (x)_1 is active at the optimal point x* ≈ (2, 1.72)^T. The termination criterion for both the gradient projection and projected BFGS algorithms was ‖u − u(1)‖ ≤ 10^-6.
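The quantity in this criterion, with u(1) = P(u − ∇f(u)) and P the projection onto the box, vanishes exactly at stationary points of the bound constrained problem, which is what makes it a usable stopping measure. A minimal Python sketch (function name ours):

```python
import numpy as np

def pg_norm(u, g, lo, hi):
    """Stationarity measure ||u - u(1)||, where u(1) = P(u - g),
    g = grad f(u), and P clips to the box {lo <= u <= hi}.
    It is zero exactly at stationary points of the bound
    constrained problem."""
    return np.linalg.norm(u - np.clip(u - g, lo, hi))
```

At a point where a lower bound is active and the gradient points out of the feasible set, the measure is zero; at an interior point it reduces to the usual gradient norm.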

The gradient projection algorithm failed. While the value of the objective function was correct, the projected gradient norm failed to converge and the active set was not identified. The projected BFGS iteration converged in 35 iterations. One can see the local superlinear convergence in Figure 5.1 from the plot of the projected gradient norms. The cost of the BFGS iteration was 121 function evaluations, 36 gradients, and roughly 5.3 million floating point operations.

5.7.2 Discrete Control Problem

We base the two control problem examples on the example from §1.6.1.

Our first example takes N = 2000, T = 1, y0 = 0,

L(y, u, t) = (y − 3)^2 + .1 u^2, and φ(y, u, t) = uy + t^2,

with the bound constraints

.5 ≤ u ≤ 2,

and the initial iterate u0 = 2. We terminated the iteration when ‖u − u(1)‖ ≤ 10^-5. In Figure 5.2 we plot the solution of this problem. Clearly the active set is not empty for the constrained problem.



Figure 5.3: Constrained Discrete Control Problem I. (Plots of projected gradient norm and function value against iterations, for the projected BFGS and gradient projection algorithms; graphics not reproduced.)

Figure 5.4: Constrained Discrete Control Problem II. (Plots of projected gradient norm and function value against iterations, for the projected BFGS and gradient projection algorithms; graphics not reproduced.)



We solve the constrained problem with Algorithm gradproj and Algorithm bfgsoptb. In Figure 5.3 we plot the function value and the norm of the projected gradient ‖u − u(1)‖. The projected BFGS iteration required 71 function evaluations, 36 gradient evaluations, and roughly 5.6 million floating point operations, while the gradient projection iteration needed 183 function evaluations, 92 gradient evaluations, and roughly 10.4 million floating point operations.

Our second control problem example solves the same problem as in §3.4.2 using the constraints

−206 ≤ u ≤ 206.

We terminate the iteration when ‖u − u(1)‖ ≤ 10^-6, which is exactly the condition used in §3.4.2 when the active set is empty. The solution to the unconstrained problem is feasible, the active set is empty, and the initial iterate is feasible. Both the gradient projection iteration and the projected BFGS iteration converge to the solution of the unconstrained problem. The constraints are not active at either the initial iterate or the final solution, but they are active inside the line search for the first iterate and for the second iterate. As is clear from a comparison of Figures 5.4 and 3.3, this small change has a dramatic effect on the cost of the optimization, eliminating the need for the scaling fixup (3.50). The gradient projection method, requiring 15 function evaluations, 8 gradient evaluations, and roughly 167 thousand floating point operations, is far more efficient than the steepest descent iteration reported in §3.4.2. The projected BFGS iteration was somewhat worse, needing 223 thousand operations but only 13 function evaluations and 7 gradient evaluations. In this example the cost of maintaining the BFGS update was not compensated by a significantly reduced iteration count.

5.8 Exercises on Bound Constrained Optimization

5.8.1. Suppose that f is continuously differentiable, that x* is a nondegenerate local minimizer for problem (5.4), and that all constraints are active. Show that there is δ such that

1. if x ∈ B(δ) then x* = P(x − ∇f(x)), and

2. the gradient projection algorithm converges in one iteration if x0 ∈ B(δ).

5.8.2. Show that if H = I then (5.31) and (5.13) are equivalent.

5.8.3. Prove Theorem 5.5.2.

5.8.4. Verify (5.42).

5.8.5. Suppose the unconstrained problem (1.2) has a solution x* at which the standard assumptions for unconstrained optimization hold. Consider the bound constrained problem (5.3) for u and l such that x* ∈ Ω and A(x*) is not empty. Is x* a nondegenerate local minimizer? If not, how are the results in this chapter changed? You might try a computational example to see what's going on.

5.8.6. Prove Theorem 5.5.4.

5.8.7. Verify (5.51).

5.8.8. Verify (5.52).

5.8.9. Formulate a generalization of (4.33) for updating A† .

5.8.10. What would happen in the examples if we increased the number of (y, s) pairs that were

stored? By how much would the BFGS cost be increased?


Part II

Optimization of Noisy Functions


Chapter 6

Basic Concepts and Goals

The algorithms in Part I cannot be implemented at all if the gradient of f is not available,

either analytically or via a difference. Even if gradients are available, these algorithms are

not satisfactory if f has many local minima that are not of interest. We limit our coverage to

deterministic sampling algorithms which are generally applicable and are more or less easy to

implement. Of these algorithms, only the DIRECT algorithm [150] covered in §8.4.2 is truly

intended to be a global optimizer.

The study of optimization methods that do not require gradients is an active research area (see

[227] for a survey of some of this activity), even for smooth problems [61], [62]. Even though