

Theorem 6.2.8, like Theorem 6.2.4, motivates using a small simplex gradient as a test for
convergence. Suppose $\|\phi\|_\infty \le \varepsilon$ and an algorithm generates sequences of simplices whose
vertices are intended to approximate a minimizer of $f_s$. We can use the results in §2.3.1 to conclude
that simplices with $\sigma_+(S) \ll \varepsilon^{1/2}$ will result in inaccurate forward difference gradients
and those with $\sigma_+(S) \ll \varepsilon^{2/3}$ in inaccurate central difference gradients. This indicates that
the central simplex gradient is less sensitive to noise than the forward one. While such a loss of
accuracy is not usually critical in computing a difference Hessian, where it may merely cause slow
convergence, it can cause failure of the iteration if one is computing a difference gradient.
If one wants to terminate the algorithm when the simplex gradient is small, say, $\le \tau$, a rough
estimate of the minimal possible value of $\tau$ is $\tau = O(\varepsilon^{1/2})$ for a forward difference simplex
gradient and $\tau = O(\varepsilon^{2/3})$ for a central simplex gradient.
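
To make these estimates concrete, here is a small illustrative sketch (ours, not the book's) comparing forward and central simplex gradients for the right simplex with $V = hI$ on a noisy quadratic. The test function, the noise amplitude, and the increments $h \approx \varepsilon^{1/2}$ (forward) and $h \approx \varepsilon^{1/3}$ (central), which yield the $O(\varepsilon^{1/2})$ and $O(\varepsilon^{2/3})$ accuracies quoted above, are assumptions made for the example.

```python
import numpy as np

def forward_simplex_gradient(f, x, h):
    # Forward simplex gradient for the right simplex with V = h*I;
    # here D(f:S) reduces to the usual forward difference.
    return np.array([(f(x + h * e) - f(x)) / h for e in np.eye(len(x))])

def central_simplex_gradient(f, x, h):
    # Central simplex gradient D_C(f:S) for the same simplex.
    return np.array([(f(x + h * e) - f(x - h * e)) / (2 * h)
                     for e in np.eye(len(x))])

eps = 1e-6                                        # noise amplitude ||phi||_inf
rng = np.random.default_rng(0)
f = lambda z: z @ z + eps * rng.uniform(-1, 1)    # f = f_s + phi, f_s(z) = z^T z

x = np.array([1.0, -0.5])
exact = 2 * x                                     # gradient of the smooth part
ef = np.linalg.norm(forward_simplex_gradient(f, x, eps ** 0.5) - exact)
ec = np.linalg.norm(central_simplex_gradient(f, x, eps ** (1 / 3)) - exact)
print(f"forward error, h = eps^1/2: {ef:.1e}")    # roughly O(eps^1/2)
print(f"central error, h = eps^1/3: {ec:.1e}")    # roughly O(eps^2/3)
```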
Moreover, if one is using a centered difference, one has information on the values of $f$ at
enough points to make an important qualitative judgment. In order to evaluate a central simplex
gradient, $f$ must be sampled at $x_1$ and $x_1 \pm v_j$ for $1 \le j \le N$. If $f(x_1) \le f(x_1 \pm v_j)$ for
all $1 \le j \le N$, then one can question the validity of using the simplex gradient as a descent
direction or as a measure of stationarity. We call this stencil failure. We will use stencil failure
as a termination criterion in most of the algorithms we discuss in this part of the book. Our basis
for that is a result from [29], which only requires differentiability of $f_s$.
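
The stencil-failure test is simple to state in code. A minimal sketch, assuming the right simplex with $v_j = h e_j$ and using the nonstrict comparison of the preceding paragraph:

```python
import numpy as np

def stencil_failure(f, x, h):
    """Return True if f(x) <= f(x + h*e_j) and f(x) <= f(x - h*e_j) for
    every coordinate direction, i.e., the centered stencil sees no descent."""
    fx = f(x)
    return all(f(x + h * e) >= fx and f(x - h * e) >= fx
               for e in np.eye(len(x)))
```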
Theorem 6.2.9. Let $S$ be a nonsingular simplex such that for some $\mu_- \in (0, 1)$ and $\kappa_+ > 0$,

$$(6.6)\qquad \kappa(V) \le \kappa_+ \quad\text{and}\quad x^T V V^T x \ge \mu_- \sigma_+(S)^2 \|x\|^2 \ \text{ for all } x.$$

Let $f$ satisfy (6.1) and let $\nabla f_s$ be Lipschitz continuous in a ball $B$ of radius
$2\sigma_+(S)$ about $x_1$. Assume that

$$(6.7)\qquad f(x_1) < \min_j \{ f(x_1 \pm v_j) \}.$$

Then, if $K$ is the constant from Lemma 6.2.2,

$$(6.8)\qquad \|\nabla f_s(x_1)\| \le 8\mu_-^{-1} K \kappa_+ \left( \sigma_+(S) + \frac{\|\phi\|_B}{\sigma_+(S)} \right).$$


Proof. Let $R = R(S)$, the reflected simplex, have vertices $x_1$ and $\{r_j\}_{j=1}^N$. (6.7) implies that
each component of $\delta(f : S)$ and $\delta(f : R)$ is positive. Now since

$$V = V(S) = -V(R),$$

we must have

$$(6.9)\qquad 0 < \delta(f : S)^T \delta(f : R) = \left(V^T V^{-T} \delta(f : S)\right)^T \left(V(R)^T V(R)^{-T} \delta(f : R)\right) = -D(f : S)^T V V^T D(f : R).$$
We apply Lemma 6.2.2 to both $D(f : S)$ and $D(f : R)$ to obtain

$$D(f : S) = \nabla f_s(x_1) + E_1 \quad\text{and}\quad D(f : R) = \nabla f_s(x_1) + E_2,$$

where, since $\kappa(V) = \kappa(V(R)) \le \kappa_+$,

$$\|E_k\| \le K \kappa_+ \left( \sigma_+(S) + \frac{\|\phi\|_B}{\sigma_+(S)} \right).$$



Since $\|V\| \le 2\sigma_+(S)$ we have by (6.9)

$$(6.10)\qquad \nabla f_s(x_1)^T V V^T \nabla f_s(x_1) \le 4\sigma_+(S)^2 \|\nabla f_s(x_1)\| \left( \|E_1\| + \|E_2\| \right) + 4\sigma_+(S)^2 \|E_1\|\,\|E_2\|.$$

The assumptions of the theorem give a lower estimate of the left side of (6.10),

$$w^T V V^T w \ge \mu_- \sigma_+(S)^2 \|w\|^2.$$

Hence,

$$\|\nabla f_s(x_1)\|^2 \le b \|\nabla f_s(x_1)\| + c,$$

where, using (6.10),

$$b = 8\mu_-^{-1} K \kappa_+ \left( \sigma_+(S) + \frac{\|\phi\|_B}{\sigma_+(S)} \right)$$

and

$$c = 4\mu_-^{-1} (K \kappa_+)^2 \left( \sigma_+(S) + \frac{\|\phi\|_B}{\sigma_+(S)} \right)^2 = \frac{\mu_-}{16}\, b^2.$$

So $b^2 - 4c = b^2(1 - \mu_-/4)$ and the quadratic formula then implies that

$$\|\nabla f_s(x_1)\| \le \frac{b + \sqrt{b^2 - 4c}}{2} = b\,\frac{1 + \sqrt{1 - \mu_-/4}}{2} \le b,$$

as asserted.


6.3 Examples
Our examples are selected to represent a variety of problems that can be attacked by the methods
in this part of the book and, at the same time, are easy for the reader to implement. Many of the
problems to which these methods have been applied have complex objective functions and have
been solved as team efforts [107], [250], [121], [70], [69]. In many such cases the objective
function is not even available as a single subroutine because the optimizer, simulator, and design tool
are one package. Hence, the examples we present in this part of the book are even more artificial
than the ones in the first part. The cost of an evaluation of f is much less in these examples than
it is in practice.

6.3.1 Weber's Problem
Our discussion of this problem is based on [182]. Weber's problem is to locate a central facility
(a warehouse or factory, for example) so that the total cost associated with distribution to several
demand centers is minimized. The model is that the cost is proportional to the distance from the
facility. The proportionality constant may be positive, reflecting transportation costs, or negative,
reflecting environmental concerns.
If the locations of the demand centers are $\{z_i\} \subset R^2$ and the corresponding weights are
$\{w_i\}$, then the objective function is

$$(6.11)\qquad f(x) = \sum_i w_i \|x - z_i\| = \sum_i w_i \sqrt{[(x)_1 - (z_i)_1]^2 + [(x)_2 - (z_i)_2]^2}.$$

We will assume that
$$\sum_i w_i > 0,$$




so that a global optimum exists. If $\sum_i w_i < 0$ then $\inf f = -\infty$ and there is no global optimum.
Weber's problem is not differentiable at $x = z_i$ because the square root function is not
differentiable at 0. A gradient-based algorithm, applied in a naive way, will have difficulty with
this problem. There are special-purpose algorithms (see [182] for a survey) for Weber's problem,
especially if all the weights are positive. Our main interest is in the case where at least one weight
is negative. In that case there may be multiple local minima.
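
A direct implementation of (6.11) is straightforward. In the sketch below (function and variable names are ours, not the book's), the demand centers are stored as the columns of a $2 \times m$ array:

```python
import numpy as np

def weber(x, z, w):
    # Weber objective (6.11): sum_i w_i * ||x - z_i||_2, where the
    # demand centers z_i are the columns of z and w holds the weights.
    return float(w @ np.linalg.norm(x[:, None] - z, axis=0))
```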
We will consider three examples. The first, and simplest, is from [182]. This example has
three demand centers with
$$w = (2, 4, -5)^T \quad\text{and}\quad (z_1, z_2, z_3) = \begin{pmatrix} 2 & 90 & 43 \\ 42 & 11 & 88 \end{pmatrix}.$$
The global minimum is at $x^* = (90, 11)^T$, at which the gradient is not defined. The complex
contours near the minimizer in Figure 6.2 illustrate the difficulty of the problem.
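
Using the weber sketch above, the first example is encoded as follows; evaluating at the reported minimizer $x^* = (90, 11)^T$ is a quick sanity check:

```python
w = np.array([2.0, 4.0, -5.0])
z = np.array([[2.0, 90.0, 43.0],
              [42.0, 11.0, 88.0]])               # columns are z_1, z_2, z_3
print(weber(np.array([90.0, 11.0]), z, w))       # value at the global minimizer
```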
Figure 6.2: Contour/Surface for Weber's Function: First Example


Our second example has two local minimizers, at $(-10, -10)$ and $(25, 30)$, with the global
minimizer at $(25, 30)$. There are four demand centers with
$$w = (2, -4, 2, 1)^T \quad\text{and}\quad (z_1, z_2, z_3, z_4) = \begin{pmatrix} -10 & 0 & 5 & 25 \\ -10 & 0 & 8 & 30 \end{pmatrix}.$$
See Figure 6.3.
Our third example adds the oscillatory function
$$\phi(x) = \sin(.0035\, x^T x) + 5 \sin(.003\, (x - y)^T (x - y))$$
to the second example, where $y = (-20, 0)^T$. This complicates the optimization landscape
significantly, as the surface and contour plots in Figure 6.4 show.
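
Continuing the sketch from the first example, the third example can be written as the second example's objective plus $\phi$; the data below are from the text:

```python
w2 = np.array([2.0, -4.0, 2.0, 1.0])
z2 = np.array([[-10.0, 0.0, 5.0, 25.0],
               [-10.0, 0.0, 8.0, 30.0]])         # columns are z_1, ..., z_4
y = np.array([-20.0, 0.0])

def weber3(x):
    # Second Weber example plus the oscillatory term phi(x).
    phi = np.sin(0.0035 * (x @ x)) + 5.0 * np.sin(0.003 * ((x - y) @ (x - y)))
    return weber(x, z2, w2) + phi
```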

6.3.2 Perturbed Convex Quadratics
The sum of a simple convex quadratic and a low-amplitude, high-frequency perturbation will serve
as a model problem for all the algorithms in this section. For example, the function graphed on
the right in Figure 6.1,
$$f(x) = 2x^2(1 + .75 \cos(80x)/12) + \cos^2(100x)/24,$$
is one of the examples in [120]. Our general form will be

$$(6.12)\qquad f(x) = (x - \xi_0)^T H (x - \xi_0)\left(1 + a_1 \cos\!\left(b_1^T(x - \xi_1) + c_1 (x - \xi_1)^T (x - \xi_1)\right)\right) + a_2\left(1 + \cos\!\left(b_2^T(x - \xi_2) + c_2 (x - \xi_2)^T (x - \xi_2)\right)\right) + a_3 |\mathrm{rand}|,$$









Figure 6.3: Contour/Surface for Weber's Function: Second Example





Figure 6.4: Contour/Surface plots for Weber's Function: Third Example


where $\{\xi_j\}$, $\{a_j\}$, $\{b_j\}$, $\{c_j\}$, and $H$ are given and rand is a random number generator. $f$ has been
designed so that the minimum value is $O(a_1 + a_2 + a_3)$. The unperturbed case $a_1 = a_2 = a_3 = 0$
is also of interest for many of the algorithms in this part of the book.
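
One possible rendering of (6.12) follows; the argument packing (tuples xi, a, b, c) is our convention, and rand is taken uniform on $[0, 1)$:

```python
import numpy as np

rng = np.random.default_rng()

def perturbed_quadratic(x, H, xi, a, b, c):
    # Evaluate (6.12) with xi = (xi0, xi1, xi2), a = (a1, a2, a3),
    # b = (b1, b2), and c = (c1, c2).
    d0, d1, d2 = x - xi[0], x - xi[1], x - xi[2]
    term1 = (d0 @ H @ d0) * (1.0 + a[0] * np.cos(b[0] @ d1 + c[0] * (d1 @ d1)))
    term2 = a[1] * (1.0 + np.cos(b[1] @ d2 + c[1] * (d2 @ d2)))
    return term1 + term2 + a[2] * abs(rng.random())   # a3 * |rand|
```

Setting a1 = a2 = a3 = 0 recovers the unperturbed convex quadratic.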


6.3.3 Lennard–Jones Problem
The objective function is a simple model of the potential energy in a molecule of identical atoms.
Assume that there are $M$ atoms and that $\xi_i \in R^3$ is the position of the $i$th atom. Letting
$$d_{ij} = \|\xi_i - \xi_j\|$$
and
$$x = (\xi_1^T, \ldots, \xi_M^T)^T \in R^N,$$
where $N = 3M$, we have that the Lennard–Jones energy function is

$$(6.13)\qquad f(x) = \sum_{i=1}^{M} \sum_{j=1}^{i-1} \left( d_{ij}^{-12} - 2 d_{ij}^{-6} \right).$$

$f$ has many local minimizers ($O(e^{M^2})$ is one conjecture [142]) and the values at the mini-
mizers are close. Hence, the Lennard–Jones function does not conform to the noisy perturbation
of a smooth function paradigm. The reader is asked in some of the exercises to see how the
methods perform.
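
A direct evaluation of (6.13) might look like the following sketch; the coordinate packing follows the definition of $x$ above:

```python
import numpy as np

def lennard_jones(x):
    # Lennard-Jones energy (6.13); x = (xi_1^T, ..., xi_M^T)^T with N = 3M.
    xi = np.asarray(x).reshape(-1, 3)            # row i is atom position xi_i
    f = 0.0
    for i in range(len(xi)):
        for j in range(i):
            d2 = np.sum((xi[i] - xi[j]) ** 2)    # d_ij^2
            f += d2 ** -6 - 2.0 * d2 ** -3       # d_ij^{-12} - 2 d_ij^{-6}
    return f
```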



6.4 Exercises on Basic Concepts
6.4.1. Show that if $w_i > 0$ for all $i$ then Weber's problem has a unique local minimum.

6.4.2. Prove Lemma 6.2.6.
6.4.3. Try to minimize the Lennard–Jones functional using some of the algorithms from the first
part of the book. Vary the initial iterate and $M$. Compare your best results with those in
[142], [40], and [210].








Chapter 7

Implicit Filtering

7.1 Description and Analysis of Implicit Filtering
The implicit filtering algorithm was originally formulated in [270], [251], and [271], as a
difference-gradient implementation of the gradient projection algorithm [18] in which the dif-
ference increment is reduced in size as the iteration progresses. A different formulation for
unconstrained problems with certain convexity properties was introduced at about the same time
in [279]. From the point of view of this book, the simplex gradient is used in a direct way. The
algorithmic description and analysis in this chapter use the results from §6.2 directly. We will
focus on unconstrained problems and derive the convergence results that implicit filtering shares
with the search algorithms in Chapter 8.
Implicit filtering, by using an approximate gradient directly, offers the possibility of im-
proved performance with quasi-Newton methods and can be easily applied to bound constrained
problems. We explore these two possibilities in §§7.2 and 7.4.
In its simplest unconstrained form, implicit filtering is the steepest descent algorithm with
difference gradients, where the difference increment varies as the iteration progresses. Because
the gradient is only an approximation, the computed steepest descent direction may fail to be a
descent direction and the line search may fail. In this event, the difference increment is reduced.
For a given $x \in R^N$ and $h > 0$ we let the simplex $S(x, h)$ be the right simplex from $x$ with
edges having length $h$. Hence the vertices are $x$ and $x + h v_i$ for $1 \le i \le N$, with $V = hI$, so
$\kappa(V) = 1$. The performance of implicit filtering with a central difference gradient is far superior
to that with the forward difference gradient [120], [187], [250]. We will, therefore, use centered
differences in the discussion. We illustrate the performance of forward difference gradients in
§7.6.
We set
$$\nabla_h f(x) = D_C(f : S(x, h)).$$
We use a simple Armijo [7] line search and demand that the sufficient decrease condition

$$(7.1)\qquad f(x - \lambda \nabla_h f(x)) - f(x) < -\alpha \lambda \|\nabla_h f(x)\|^2$$

holds (compare with (3.4)) for some $\alpha > 0$.
Our central difference steepest descent algorithm fdsteep terminates when

$$(7.2)\qquad \|\nabla_h f(x)\| \le \tau h$$

for some $\tau > 0$, when more than pmax iterations have been taken, after a stencil failure, or
when the line search fails by taking more than amax backtracks. Even the failures of fdsteep
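
The loop below is a minimal sketch of fdsteep as described above, not the book's reference code; the defaults for tau, alpha, pmax, and amax are placeholders:

```python
import numpy as np

def fdsteep(f, x, h, tau=1.0, alpha=1e-4, pmax=200, amax=10):
    # Central difference steepest descent: terminate on stencil failure,
    # on the small-gradient test (7.2), on a line search failure, or
    # after pmax iterations.
    for _ in range(pmax):
        fx = f(x)
        I = np.eye(len(x))
        fp = np.array([f(x + h * e) for e in I])
        fm = np.array([f(x - h * e) for e in I])
        if fx < min(fp.min(), fm.min()):          # stencil failure (6.7)
            return x
        g = (fp - fm) / (2 * h)                   # grad_h f(x) = D_C(f:S(x,h))
        if np.linalg.norm(g) <= tau * h:          # small simplex gradient (7.2)
            return x
        lam, backtracks = 1.0, 0
        while f(x - lam * g) - fx >= -alpha * lam * (g @ g):  # enforce (7.1)
            lam /= 2.0
            backtracks += 1
            if backtracks > amax:                 # line search failure
                return x
        x = x - lam * g
    return x
```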




