
[Figure: function value and simplex gradient norm plotted against function evaluations; the lower pair of panels is labeled Forward Differences.]

Figure 7.9: Perturbed Quadratic, N = 32









If a_3 ≠ 0 then f may not return the same value when called with the same argument twice.
The reader is invited to explore the consequences of this in exercise 7.7.3.
The performance of the algorithms in this part of the book sometimes depends on the size of
the problem much more strongly than the Newton-based methods in Part I. In the case of implicit
filtering, that dependence is mostly a result of the cost of evaluation of the simplex gradient. To
illustrate this we consider our quadratic problems for N = 4 (Figures 7.6 and 7.8) and N = 32
(Figures 7.7 and 7.9).
For all the quadratic examples the initial iterate was

x_0 = (1, 2, ..., N)^T / (10N)

and the sequence of scales was {2^{-k}}_{k=0}^{10}.
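A minimal Python sketch of this setup, assuming only the formulas just stated (the quadratic objectives themselves are not reproduced here):

import numpy as np

def quadratic_test_setup(N):
    """Initial iterate x_0 = (1, 2, ..., N)^T / (10 N) and the sequence of
    scales {2^{-k}}, k = 0, ..., 10, used for the quadratic examples."""
    x0 = np.arange(1, N + 1, dtype=float) / (10.0 * N)
    scales = np.array([2.0 ** (-k) for k in range(11)])
    return x0, scales

# The N = 32 case of Figures 7.7 and 7.9:
x0, scales = quadratic_test_setup(32)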



7.7 Exercises on Implicit Filtering

7.7.1. Let S be a nonsingular simplex. Show that D_C(f : S) = ∇f(x_1) if f is a quadratic function.

7.7.2. How would you expect forward and centered difference implicit filtering to perform when applied to f(x) = x^T x? Would the performance be independent of dimension? Test your expectation with numerical experimentation.

7.7.3. Use implicit filtering to minimize the perturbed quadratic function with nonzero values of a_3.

7.7.4. Try to solve the Lennard–Jones problem with implicit filtering for various values of M and various initial iterates. Compare your best results with those in [142], [40], and [210]. Are you doing any better than you did in exercise 6.4.3?

7.7.5. Show that Theorems 5.5.4 and 5.5.5 hold if the projected BFGS update is implemented using (7.9) and (7.10). How would these formulas affect an implementation like bfgsrecb, which is designed for problems in which full matrices cannot be stored?
















Chapter 8

Direct Search Algorithms

In this chapter we discuss the class of direct search algorithms. These methods use values of f taken from a set of sample points and use that information to continue the sampling. Unlike implicit filtering, these methods do not explicitly use approximate gradient information. We will focus on three such methods: the Nelder–Mead simplex algorithm [204], the multidirectional search method [85], [261], [262], and the Hooke–Jeeves algorithm [145]. Each of these can be analyzed using the simplex gradient techniques from Chapter 6. We will not discuss the very general results based on the taxonomies of direct search methods from [263], [174], and [179] or the recent research on the application of these methods to bound [173] or linear [175] constraints. We include at the end of this chapter a short discussion of methods based on surrogate models and a brief account of a very different search method, the DIRECT algorithm [150]. These two final topics do not lead to algorithms that are easy to implement, and our discussions will be very general with pointers to the literature.


8.1 The Nelder–Mead Algorithm
8.1.1 Description and Implementation
The Nelder–Mead [204] simplex algorithm maintains a simplex S of approximations to an optimal point. In this algorithm the vertices {x_j}_{j=1}^{N+1} are sorted according to the objective function values

f(x_1) ≤ f(x_2) ≤ ··· ≤ f(x_{N+1}).    (8.1)

x_1 is called the best vertex and x_{N+1} the worst. If several vertices have the same objective value as x_1, the best vertex is not uniquely defined, but this ambiguity has little effect on the performance of the algorithm.
The algorithm attempts to replace the worst vertex x_{N+1} with a new point of the form

x(µ) = (1 + µ)x̄ − µx_{N+1},    (8.2)

where x̄ is the centroid of the convex hull of {x_i}_{i=1}^{N},

x̄ = (1/N) Σ_{i=1}^{N} x_i.    (8.3)

The value of µ is selected from a sequence

−1 < µ_ic < 0 < µ_oc < µ_r < µ_e






by rules that we formally describe in Algorithm nelder. Our formulation of the algorithm allows for termination if either f(x_{N+1}) − f(x_1) is sufficiently small or a user-specified number of function evaluations has been expended.

Algorithm 8.1.1. nelder(S, f, τ, kmax)

1. Evaluate f at the vertices of S and sort the vertices of S so that (8.1) holds.

2. Set fcount = N + 1.

3. While f(x_{N+1}) − f(x_1) > τ

   (a) Compute x̄, (8.3), x(µ_r), (8.2), and f_r = f(x(µ_r)). fcount = fcount + 1.
   (b) Reflect: If fcount = kmax then exit. If f(x_1) ≤ f_r < f(x_N), replace x_{N+1} with x(µ_r) and go to step 3g.
   (c) Expand: If fcount = kmax then exit. If f_r < f(x_1) then compute f_e = f(x(µ_e)). fcount = fcount + 1. If f_e < f_r, replace x_{N+1} with x(µ_e); otherwise replace x_{N+1} with x(µ_r). Go to step 3g.
   (d) Outside Contraction: If fcount = kmax then exit. If f(x_N) ≤ f_r < f(x_{N+1}), compute f_c = f(x(µ_oc)). fcount = fcount + 1. If f_c ≤ f_r replace x_{N+1} with x(µ_oc) and go to step 3g; otherwise go to step 3f.
   (e) Inside Contraction: If fcount = kmax then exit. If f_r ≥ f(x_{N+1}) compute f_c = f(x(µ_ic)). fcount = fcount + 1. If f_c < f(x_{N+1}), replace x_{N+1} with x(µ_ic) and go to step 3g; otherwise go to step 3f.
   (f) Shrink: If fcount ≥ kmax − N, exit. For 2 ≤ i ≤ N + 1: set x_i = x_1 − (x_i − x_1)/2; compute f(x_i).
   (g) Sort: Sort the vertices of S so that (8.1) holds.

A typical sequence [169] of candidate values for µ is

{µ_r, µ_e, µ_oc, µ_ic} = {1, 2, 1/2, −1/2}.
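The following Python sketch of Algorithm nelder, using the typical parameter values above, may help clarify the control flow. It is only an illustration of the description given here, not the implementation distributed with the book's software, and for brevity the separate budget checks of steps 3b through 3f are collapsed into a single test at the top of the loop.

import numpy as np

def nelder_mead(f, S, tau=1.0e-6, kmax=200,
                mu_r=1.0, mu_e=2.0, mu_oc=0.5, mu_ic=-0.5):
    """Nelder-Mead iteration on a simplex S given as an (N+1) x N array of
    vertices.  Terminates when f(x_{N+1}) - f(x_1) <= tau or roughly kmax
    function evaluations have been used."""
    S = np.asarray(S, dtype=float)
    N = S.shape[1]
    fS = np.array([f(x) for x in S])              # step 1: evaluate ...
    order = np.argsort(fS)
    S, fS = S[order], fS[order]                   # ... and sort so that (8.1) holds
    fcount = N + 1                                # step 2

    while fS[-1] - fS[0] > tau and fcount < kmax:
        xbar = S[:-1].mean(axis=0)                # centroid of the best N vertices, (8.3)
        cand = lambda mu: (1.0 + mu) * xbar - mu * S[-1]   # candidate points, (8.2)
        xr = cand(mu_r)
        fr = f(xr); fcount += 1

        if fS[0] <= fr < fS[-2]:                  # (b) reflect
            S[-1], fS[-1] = xr, fr
        elif fr < fS[0]:                          # (c) expand
            xe = cand(mu_e); fe = f(xe); fcount += 1
            S[-1], fS[-1] = (xe, fe) if fe < fr else (xr, fr)
        else:
            outside = fS[-2] <= fr < fS[-1]
            xc = cand(mu_oc) if outside else cand(mu_ic)   # (d) or (e) contraction
            fc = f(xc); fcount += 1
            accept = (fc <= fr) if outside else (fc < fS[-1])
            if accept:
                S[-1], fS[-1] = xc, fc
            else:                                 # (f) shrink toward the best vertex
                S[1:] = S[0] - (S[1:] - S[0]) / 2.0
                fS[1:] = [f(x) for x in S[1:]]
                fcount += N

        order = np.argsort(fS)                    # (g) sort so that (8.1) holds
        S, fS = S[order], fS[order]

    return S, fS, fcount

# Example: minimize f(x) = x^T x in two dimensions from a small simplex near (1, 1).
# S0 = np.array([[1.0, 1.0], [1.1, 1.0], [1.0, 1.1]])
# S, fS, fcount = nelder_mead(lambda x: float(x @ x), S0)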

Figure 8.1 is an illustration of the options in two dimensions. The vertices labeled 1, 2, and 3 are those of the original simplex.
The Nelder–Mead algorithm is not guaranteed to converge, even for smooth problems [89], [188]. The failure mode is stagnation at a nonoptimal point. In §8.1.3 we will present some examples from [188] that illustrate this failure. However, the performance of the Nelder–Mead algorithm in practice is generally good [169], [274]. The shrink step is rare in practice and we will assume in the analysis in §8.1.2 that shrinks do not occur. In that case, while a Nelder–Mead iterate may not result in a reduction in the best function value, the average value

f̄ = (1/(N+1)) Σ_{j=1}^{N+1} f(x_j)

will be reduced.







[Figure: the original simplex with vertices 1, 2, and 3, together with the candidate points r, e, oc, and ic.]

Figure 8.1: Nelder–Mead Simplex and New Points


8.1.2 Sufficient Decrease and the Simplex Gradient
Our study of the Nelder–Mead algorithm is based on the simple ideas in §3.1. We will denote the vertices of the simplex S^k at the kth iteration by {x_j^k}_{j=1}^{N+1}. We will simplify notation by suppressing explicit mention of S^k in what follows by denoting

V^k = V(S^k), δ^k = δ(f : S^k), K^k = K(S^k), and D^k(f) = D(f : S^k).

If V^0 is nonsingular then V^k is nonsingular for all k > 0 [169]. Hence if S^0 is nonsingular so is S^k for all k and hence D^k(f) is defined for all k.
We formalize this by assuming that our sequence of simplices satisfies the following assumption.
Assumption 8.1.1. For all k,
• S^k is nonsingular.
• The vertices satisfy (8.1).
• f̄^{k+1} < f̄^k.

Assumption 8.1.1 is satisfied by the Nelder–Mead sequence if no shrink steps are taken and the initial simplex directions are linearly independent [169]. The Nelder–Mead algorithm demands that the average function value improve, but no control is possible on which value is improved, and the simplex condition number can become unbounded.
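Since Chapter 6 is not reproduced in this excerpt, the following minimal Python sketch of these quantities rests on the assumed Chapter 6 definitions: V(S) has columns x_{j+1} − x_1, δ(f : S) has entries f(x_{j+1}) − f(x_1), the simplex gradient is D(f : S) = V^{−T} δ(f : S), and σ_+(S) is the maximum oriented length max_j ‖x_{j+1} − x_1‖.

import numpy as np

def simplex_diagnostics(f, S):
    """Simplex gradient and the diagnostics appearing in Assumption 8.1.1 and
    Theorem 8.1.1.  S is an (N+1) x N array of vertices ordered so that (8.1)
    holds.  Assumes the Chapter 6 definitions of V(S), delta(f:S), D(f:S),
    and sigma_+(S) stated in the text above."""
    S = np.asarray(S, dtype=float)
    x1, rest = S[0], S[1:]
    V = (rest - x1).T                               # columns are the simplex directions
    delta = np.array([f(x) for x in rest]) - f(x1)  # objective differences
    D = np.linalg.solve(V.T, delta)                 # simplex gradient D(f : S) = V^{-T} delta
    sigma_plus = np.max(np.linalg.norm(rest - x1, axis=1))  # maximum oriented length
    kappa = np.linalg.cond(V)                       # simplex condition number kappa(V)
    return D, sigma_plus, kappa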







We can define a sufficient decrease condition for search algorithms that is analogous to the sufficient decrease condition for steepest descent and related algorithms (3.2). We will ask that the k + 1st iteration satisfy

f̄^{k+1} − f̄^k < −α ‖D^k f‖^2.    (8.4)

Here α > 0 is a small parameter. Our choice of sufficient decrease condition is motivated by the smooth case and steepest descent, where (3.2) and the lower bound λ̄ on λ from Lemma 3.2.3 lead to

f(x^{k+1}) − f(x^k) ≤ −λ̄ α ‖∇f(x^k)‖^2,

which is a smooth form of (8.4). Unlike the smooth case, however, we have no descent direction and must incorporate λ̄ into α. This leads to the possibility that if the simplex diameter is much smaller than ‖D^k f‖, (8.4) could fail on the first iterate. We address this problem with the scaling

α = α_0 σ_+(S^0) / ‖D^0 f‖.

A typical choice in line search methods, which we use in our numerical results, is α_0 = 10^{−4}.
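A small Python sketch of the test (8.4) and of this scaling of α follows; the argument names sigma_plus_S0 and D0_f_norm are hypothetical stand-ins for σ_+(S^0) and ‖D^0 f‖ computed from the initial simplex.

import numpy as np

def sufficient_decrease(fbar_new, fbar_old, D_f, alpha):
    """Test (8.4): accept the new simplex only if the average function value
    drops by at least alpha * ||D^k f||^2."""
    return fbar_new - fbar_old < -alpha * float(np.dot(D_f, D_f))

def scaled_alpha(sigma_plus_S0, D0_f_norm, alpha0=1.0e-4):
    """Scaling alpha = alpha0 * sigma_+(S^0) / ||D^0 f|| from the initial simplex,
    with the line-search-style choice alpha0 = 1e-4."""
    return alpha0 * sigma_plus_S0 / D0_f_norm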
The convergence result for smooth functions follows easily from Lemma 6.2.1.
Theorem 8.1.1. Let a sequence of simplices satisfy Assumption 8.1.1 and let the assumptions of Lemma 6.2.1 hold, with the Lipschitz constants K^k uniformly bounded. Assume that {f̄^k} is bounded from below. Then if (8.4) holds for all but finitely many k and

lim_{k→∞} σ_+(S^k) κ(V^k) = 0,

then any accumulation point of the simplices is a critical point of f.
Proof. The boundedness from below of {f̄^k} and (8.4) imply that f̄^{k+1} − f̄^k → 0. Assumption 8.1.1 and (8.4) imply that lim_{k→∞} ‖D^k f‖ = 0. Hence (6.2) implies

lim_{k→∞} ‖∇f(x_1^k)‖ ≤ lim_{k→∞} ( K κ(V^k) σ_+(S^k) + ‖D^k f‖ ) = 0.

Hence, if x* is any accumulation point of the sequence {x_1^k} then ∇f(x*) = 0. This completes the proof since κ(V^k) ≥ 1 and therefore σ_+(S^k) → 0.
The result for the noisy functions that satisfy (6.1) with f_s smooth reflects the fact that the resolution is limited by the size of φ. In fact, if σ_+(S^k) is much smaller than ‖φ‖_{S^k}, no information on f_s can be obtained by evaluating f at the vertices of S^k and once σ_+(S^k) is smaller than ‖φ‖_{S^k}^{1/2} no conclusions on ∇f_s can be drawn. If, however, the noise decays to zero sufficiently rapidly near the optimal point, the conclusions of Theorem 8.1.1 still hold.
Theorem 8.1.2. Let a sequence of simplices satisfy Assumption 8.1.1 and let the assumptions of Lemma 6.2.2 hold with the Lipschitz constants K_s^k uniformly bounded. Assume that {f̄^k} is bounded from below. Then if (8.4) holds for all but finitely many k and if

lim_{k→∞} κ(V^k) ( σ_+(S^k) + ‖φ‖_{S^k} / σ_+(S^k) ) = 0,    (8.5)

then any accumulation point of the simplices is a critical point of f_s.
Proof. Our assumptions, as in the proof of Theorem 8.1.1, imply that ‖D^k f‖ → 0. Recall that Lemma 6.2.2 implies that

‖D^k f_s‖ ≤ ‖D^k f‖ + K^k κ(V^k) ( σ_+(S^k) + ‖φ‖_{S^k} / σ_+(S^k) ),    (8.6)

and the sequence {K^k} is bounded because {K_s^k} is. Hence, by (8.5), ‖D^k f_s‖ → 0 as k → ∞.









[Figure: four panels labeled Function Differences, Max Oriented Length, Simplex Gradient Norm, and Simplex Condition.]

Figure 8.2: Unmodified Nelder–Mead, (τ, θ, φ) = (1, 15, 10)


8.1.3 McKinnon's Examples
In this set of three examples from [188], N = 2, and

f(x) = θφ|(x)_1|^τ + (x)_2 + (x)_2^2,    (x)_1 ≤ 0,
