

is convex, then the projected BFGS–Armijo algorithm converges q-superlinearly to x*.


5.6 Other Approaches
Our simple projected-BFGS method is effective for small- to medium-sized problems and for very
large problems that are discretizations of infinite-dimensional problems with the appropriate
compactness properties. The example in §4.4.2 nicely illustrates this point. For other kinds of
large problems, however, more elaborate methods are needed, and we present some pointers to
the literature in this section.
The limited memory BFGS method for unconstrained problems described in [44] and [176]
has also been extended to bound constrained problems [42], [280]. More general work on line
search methods for bound constrained problems can be found in [47], [194], and [42].
Very general theories have been developed for convergence of trust region methods for
bound constrained problems. The notion of Cauchy decrease can, for example, be replaced by
the decrease from a gradient projection step for the quadratic model [191], [259], [66]. One
could look for minima of the quadratic model along the projection path [63], [64], or attempt to
project the solution of an unconstrained model using the reduced Hessian [162].
A completely different approach can be based on interior point methods. This is an active
research area and the algorithms are not, at least at this moment, easy to implement or analyze.
This line of research began with [57] and [58]. We refer the reader to [86] and [266] for more
recent accounts of this aspect of the field and to [140] and [79] for some applications to control
problems and an account of the difficulties in infinite dimensions.










[Figure 5.1: Solution to Constrained Parameter ID Problem. Panels plot the projected gradient norm and the function value against iterations for the gradient projection iteration (about 2000 iterations) and for the projected BFGS iteration (about 40 iterations).]



[Figure 5.2: Solution to Discrete Control Problem: First Example. Plot titled "Solution to bound constrained control problem": the computed control over the 2000-point grid, with values between roughly 0.5 and 1.4.]









5.6.1 Infinite-Dimensional Problems
The results in this part of the book do not extend in a direct way to infinite-dimensional problems.
One reason for this is that infinite-dimensional problems often have countably infinitely many
constraints or even a continuum of constraints; hence Ω is not compact in the norm topology of
the Banach space in which the problem is posed, and appeals to various types of weak continuity
must be made (see [122] for an example of such arguments and [122] and [10] for applications).
Moreover, identification of an active set in finitely many iterations is not always possible. A
more complete account of this issue may be found in [254], [162], [161].
These are not the only complications that can arise in infinite dimensions. Even the projected
gradient method presents challenges, especially if the minima fail to be nondegenerate in the
sense of this book [94], [95]. Convergence behavior for discretized problems can differ
from that for the continuous problem [97]. Nonequivalence of norms makes convergence results
difficult to formulate and analyze for both line search [96], [254], [98] and trust region [140],
[162] methods.
The functional analytic structure of many control problems can be exploited with fast multilevel
methods. Both second-kind multigrid methods from [138] and variants of the Atkinson–Brakhage
method [9], [31] have been applied to fixed point formulations of parabolic boundary
control problems in one space dimension [136], [137], [153], [162], [161].


5.7 Examples
The computations in this section were done with the MATLAB code bfgsbound. In this code
the storage is limited to five pairs of vectors, and β = .1 was used in the line search.
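Neither the book's codes nor their calling sequences are reproduced here, but the two ingredients the iterations in this section share are easy to sketch: projection onto the box {x : l ≤ x ≤ u} and a projected Armijo backtracking line search with reduction factor β = .1. The MATLAB fragment below is a minimal sketch of one gradient projection step under these assumptions; the handles f and gradf, the sufficient decrease parameter alpha, and the function name are illustrative and not part of the book's codes.

% Minimal sketch (not the book's gradproj or bfgsbound codes) of one
% gradient projection step with an Armijo backtracking line search.
% Assumptions: l, u, x are column vectors; f and gradf are handles to the
% objective and its gradient; alpha = 1e-4 is an illustrative choice.
function x = gradproj_step(x, f, gradf, l, u)
    P      = @(z) max(l, min(z, u));   % projection onto the feasible box
    alpha  = 1e-4;                     % sufficient decrease parameter
    beta   = 0.1;                      % step reduction factor, as in the text
    lambda = 1;
    fx = f(x); gx = gradf(x);
    xt = P(x - lambda*gx);             % projected trial point x(lambda)
    % backtrack until a projected sufficient decrease condition holds
    while f(xt) > fx - (alpha/lambda)*norm(x - xt)^2
        lambda = beta*lambda;
        xt = P(x - lambda*gx);
    end
    x = xt;
end

The termination test used in the examples below, ‖u − u(1)‖ ≤ τ with u(1) = P(u − ∇f(u)), can be evaluated with the same projection operator.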

5.7.1 Parameter ID Problem
We consider the parameter identification problem from §3.4.1 with bounds L = (2, 0)^T and U = (20, 5)^T.
The initial iterate x_0 = (5, 5)^T is feasible, but the global minimizer (1, 1)^T of the unconstrained problem is not. As one
might expect, the lower bound constraint on (x)_1 is active at the optimal point x* ≈ (2, 1.72)^T.
The termination criterion for both the gradient projection and projected BFGS algorithms was
‖u − u(1)‖ ≤ 10^{-6}.
The gradient projection algorithm failed. While the value of the objective function was
correct, the projected gradient norm failed to converge and the active set was not identified.
The projected BFGS iteration converged in 35 iterations. One can see the local superlinear
convergence in Figure 5.1 from the plot of the projected gradient norms. The cost of the BFGS
iteration was 121 function evaluations, 36 gradient evaluations, and roughly 5.3 million floating point
operations.
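To make the stopping rule concrete, here is a hypothetical MATLAB fragment for this example. The bounds are those given above; P and gradf are illustrative names, gradf standing for an assumed handle to the gradient of the objective from §3.4.1.

L = [2; 0]; U = [20; 5];                 % bounds from the text
P = @(x) max(L, min(x, U));              % projection onto {x : L <= x <= U}
% stop when the projected gradient step is small:
% done = norm(x - P(x - gradf(x))) <= 1e-6;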

5.7.2 Discrete Control Problem
We base the two control problem examples on the example from §1.6.1.
Our first example takes N = 2000, T = 1, y_0 = 0,

L(y, u, t) = (y − 3)^2 + .1 u^2, and φ(y, u, t) = uy + t^2,

with the bound constraints
.5 ≤ u ≤ 2,
and the initial iterate u_0 = 2. We terminated the iteration when ‖u − u(1)‖ ≤ 10^{-5}. In
Figure 5.2 we plot the solution of this problem. Clearly the active set is not empty for the
constrained problem.
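As a concrete illustration, the following MATLAB sketch assembles this first example under the assumption that the discrete problem of §1.6.1 approximates minimizing the integral of L(y, u, t) over [0, T] subject to y' = φ(y, u, t), y(0) = y_0, via forward Euler and the rectangle rule on N intervals. Only the data N, T, y_0, L, φ, the bounds, and u_0 come from the text; the function names and the particular discretization details are illustrative assumptions.

% Sketch of the first control example: data from the text, discretization
% details assumed (forward Euler state update, rectangle rule for the cost).
N  = 2000; T = 1; y0 = 0; h = T/N;
t  = h*(1:N)';                           % time grid
Lrun = @(y,u,t) (y - 3).^2 + .1*u.^2;    % running cost L(y,u,t)
phi  = @(y,u,t) u.*y + t.^2;             % right-hand side phi(y,u,t)
lo = .5*ones(N,1); up = 2*ones(N,1);     % bound constraints .5 <= u <= 2
u0 = 2*ones(N,1);                        % initial iterate
f  = @(u) ctrlobj(u, y0, t, h, Lrun, phi);   % objective handle for the optimizer

function val = ctrlobj(u, y0, t, h, Lrun, phi)
% Integrate the state with forward Euler and accumulate the running cost.
    y = y0; val = 0;
    for i = 1:numel(u)
        y   = y + h*phi(y, u(i), t(i));      % state update
        val = val + h*Lrun(y, u(i), t(i));   % rectangle rule contribution
    end
end

One would then pass f, a gradient (obtained, for instance, with an adjoint recursion), u0, and the bounds lo and up to the bound constrained optimizer.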








[Figure 5.3: Constrained Discrete Control Problem I. Panels plot the projected gradient norm and the function value against iterations for the projected BFGS iteration (about 40 iterations) and for the gradient projection iteration (about 100 iterations).]


[Figure 5.4: Constrained Discrete Control Problem II. Panels plot the projected gradient norm and the function value against iterations for the projected BFGS iteration (about 6 iterations) and for the gradient projection iteration (about 8 iterations).]









We solve the constrained problem with Algorithm gradproj and Algorithm bfgsoptb.
In Figure 5.3 we plot the function value and the norm of the projected gradient ‖u − u(1)‖.
The projected BFGS iteration required 71 function evaluations, 36 gradient evaluations, and
roughly 5.6 million floating point operations, while the gradient projection iteration needed 183 function
evaluations, 92 gradient evaluations, and roughly 10.4 million floating point operations.
Our second control problem example solves the same problem as in §3.4.2 using the constraints
−206 ≤ u ≤ 206.
We terminate the iteration when ‖u − u(1)‖ ≤ 10^{-6}, which is exactly the condition used in
§3.4.2 when the active set is empty. The solution to the unconstrained problem is feasible,
the active set is empty, and the initial iterate is feasible. Both the gradient projection iteration
and the projected BFGS iteration converge to the solution of the unconstrained problem. The
constraints are not active at either the initial iterate or the final solution but are active inside
the line search for the first iterate and for the second iterate. As is clear from a comparison
of Figures 5.4 and 3.3, this small change has a dramatic effect on the cost of the optimization,
eliminating the need for the scaling fixup (3.50). The gradient projection method, requiring 15
function evaluations, 8 gradient evaluations, and roughly 167 thousand floating point operations,
is far more efficient than the steepest descent iteration reported in §3.4.2. The projected BFGS
iteration was somewhat worse, needing 223 thousand operations, but only 13 function evaluations
and 7 gradient evaluations. In this example the cost of maintaining the BFGS update was not
compensated by a significantly reduced iteration count.


5.8 Exercises on Bound Constrained Optimization
5.8.1. Suppose that f is continuously differentiable, that x* is a nondegenerate local minimizer
for problem (5.4), and all constraints are active. Show that there is δ such that
1. if x ∈ B(δ) then x* = P(x − ∇f(x)), and
2. the gradient projection algorithm converges in one iteration if x_0 ∈ B(δ).
5.8.2. Show that if H = I then (5.31) and (5.13) are equivalent.
5.8.3. Prove Theorem 5.5.2.

5.8.4. Verify (5.42).
5.8.5. Suppose the unconstrained problem (1.2) has a solution x* at which the standard assumptions
for unconstrained optimization hold. Consider the bound constrained problem (5.3)
for u and l such that x* ∈ Ω and A(x*) is not empty. Is x* a nondegenerate local minimizer?
If not, how are the results in this chapter changed? You might try a computational
example to see what's going on.
5.8.6. Prove Theorem 5.5.4.
5.8.7. Verify (5.51).
5.8.8. Verify (5.52).
5.8.9. Formulate a generalization of (4.33) for updating A† .

5.8.10. What would happen in the examples if we increased the number of (y, s) pairs that were
stored? By how much would the BFGS cost be increased?








Part II

Optimization of Noisy Functions












Chapter 6

Basic Concepts and Goals

The algorithms in Part I cannot be implemented at all if the gradient of f is not available,
either analytically or via a difference. Even if gradients are available, these algorithms are
not satisfactory if f has many local minima that are not of interest. We limit our coverage to
deterministic sampling algorithms which are generally applicable and are more or less easy to
implement. Of these algorithms, only the DIRECT algorithm [150] covered in §8.4.2 is truly
intended to be a global optimizer.
The study of optimization methods that do not require gradients is an active research area (see
[227] for a survey of some of this activity), even for smooth problems [61], [62]. Even though
