


Figure 8.8: MDS Simplices and New Points


1. Evaluate f at the vertices of S and sort the vertices of S so that (8.1) holds.

2. Set fcount = N + 1.

3. While f(x_{N+1}) − f(x_1) > τ

   (a) Reflect: If fcount = kmax then exit.
       For j = 2, ..., N+1: r_j = x_1 − (x_j − x_1); Compute f(r_j); fcount = fcount + 1.
       If f(x_1) > min_j {f(r_j)} then goto step 3b else goto step 3c.
   (b) Expand:
        i. For j = 2, ..., N+1: e_j = x_1 − μ_e (x_j − x_1); Compute f(e_j); fcount = fcount + 1.
       ii. If min_j {f(r_j)} > min_j {f(e_j)} then
           for j = 2, ..., N+1: x_j = e_j
           else
           for j = 2, ..., N+1: x_j = r_j
      iii. Goto step 3d.
   (c) Contract: For j = 2, ..., N+1: x_j = x_1 + μ_c (x_j − x_1); Compute f(x_j).
   (d) Sort: Sort the vertices of S so that (8.1) holds.

If the function values at the vertices of S^c are known, then the cost of computing S^+ is 2N
additional evaluations. Just as with the Nelder–Mead algorithm, the expansion step is optional
but has been observed to improve performance.
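To make the reflect/expand/contract logic concrete, here is a minimal sketch of a single MDS sweep in Python. It is illustrative only (the book's reference codes are in MATLAB); the array layout, the factors mu_e = 2 and mu_c = 1/2, and the name mds_step are assumptions, not part of the text.

    import numpy as np

    def mds_step(S, fS, f, mu_e=2.0, mu_c=0.5):
        """One reflect/expand/contract sweep of MDS.

        S is an (N+1) x N array of vertices sorted so that (8.1) holds
        (fS[0] = f(x_1) is the best value); returns the new sorted simplex.
        """
        x1 = S[0]
        # Reflect every vertex except the best one through x_1: r_j = x_1 - (x_j - x_1).
        R = x1 - (S[1:] - x1)
        fR = np.array([f(r) for r in R])
        if fS[0] > fR.min():
            # Reflection produced a better point; try the (optional) expansion step.
            E = x1 - mu_e * (S[1:] - x1)
            fE = np.array([f(e) for e in E])
            if fR.min() > fE.min():
                new, fnew = E, fE
            else:
                new, fnew = R, fR
        else:
            # Reflection failed: contract toward the best vertex.
            new = x1 + mu_c * (S[1:] - x1)
            fnew = np.array([f(c) for c in new])
        S2 = np.vstack([x1[None, :], new])
        f2 = np.concatenate([[fS[0]], fnew])
        order = np.argsort(f2)            # re-sort so that (8.1) holds again
        return S2[order], f2[order]

With the expansion step included, a successful reflection costs 2N evaluations per sweep, matching the count quoted above.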
The extension of MDS to bound constrained and linearly constrained problems is not trivial.
We refer the reader to [173] and [175] for details.

8.2.2 Convergence and the Simplex Gradient
Assume that the simplices are either equilateral or right simplices (having one vertex from which
all N edges are at right angles). In those cases, as pointed out in [262], the possible vertices
created by expansion and reflection steps form a regular lattice of points. If the MDS simplices
remain bounded, only finitely many reflections and expansions are possible before every point
on that lattice has been visited and a contraction to a new maximal simplex size must take place.







This exhaustion of a lattice takes place under more general conditions [262] but is most clear for
the equilateral and right simplex cases.
Theorem 6.2.9 implies that infinitely many contractions and convergence of the simplex
diameters to zero imply convergence of the simplex gradient to zero. The similarity of Theorem
6.2.9 to Lemma 6.2.2 and of Theorem 8.2.1, the convergence result for multidirectional
search, to Theorem 8.1.2 is no accident. The Nelder–Mead iteration, which is more aggressive
than the multidirectional search iteration, requires far stronger assumptions (well conditioning
and sufficient decrease) for convergence, but the ideas are the same. Theorems 6.2.9 and 8.2.1
can be used to extend the results in [262] to the noisy case. The observation in [85] that one
can apply any heuristic or machine-dependent idea to improve performance, say, by exploring
far away points on spare processors (the "speculative function evaluations" of [46]) without
affecting the analysis is still valid here.
Theorem 8.2.1. Let f satisfy (6.1) and assume that the set

    {x | f(x) ≤ f(x_1^0)}

is bounded. Assume that the simplex shape is such that

(8.9)    lim_{k→∞} σ_+(S^k) = 0.

Let B^k be a ball of radius 2σ_+(S^k) about x_1^k. Then if

    lim_{k→∞} ‖φ‖_{B^k} / σ_+(S^k) = 0,

then every limit point of the vertices is a critical point of f_s.
Recall that if the simplices are equilateral or right simplices, then (8.9) holds (see exercise 8.6.2).
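Since the convergence results above are stated in terms of the simplex gradient, the fragment below shows how it can be computed. It assumes the Chapter 6 definition D(f : S) = V^{-T} δ(f : S), with V the matrix whose columns are the edges x_{j+1} − x_1 and δ the vector of differences f(x_{j+1}) − f(x_1); this is an illustrative Python sketch, not part of the MATLAB suite.

    import numpy as np

    def simplex_gradient(S, fS):
        """Simplex gradient D(f:S) = V^{-T} delta(f:S) for a nonsingular simplex.

        S is an (N+1) x N array of vertices with the best vertex first,
        fS the corresponding function values.
        """
        V = (S[1:] - S[0]).T                # columns are the edges x_{j+1} - x_1
        delta = fS[1:] - fS[0]              # differences f(x_{j+1}) - f(x_1)
        return np.linalg.solve(V.T, delta)  # V^{-T} delta

    # Example: for f(x) = ||x||^2 and a small right simplex at the origin the
    # simplex gradient is O(h), consistent with the critical point at 0.
    f = lambda x: float(x @ x)
    h = 1.0e-3
    S = np.array([[0.0, 0.0], [h, 0.0], [0.0, h]])
    print(simplex_gradient(S, np.array([f(v) for v in S])))   # approximately [h, h]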


8.3 The Hooke–Jeeves Algorithm
8.3.1 Description and Implementation
The Hooke–Jeeves algorithm is like implicit filtering in that the objective is evaluated on a stencil
and the function values are used to compute a search direction. However, unlike implicit filtering,
there are only finitely many possible search directions and only qualitative information about
the function values is used.
The algorithm begins with a base point x and pattern size h, which is like the scale in implicit
filtering. In the next phase of the algorithm, called the exploratory move in [145], the function is
sampled at successive perturbations of the base point in the search directions {v_j}, where v_j is
the jth column of a direction matrix V. In [145] and our MATLAB implementation V = I. The
current best value f_cb = f(x_cb) and best point x_cb are recorded and returned. x_cb is initialized
to x. The sampling is managed by first evaluating f at x_cb + v_j and only testing x_cb − v_j
if f(x_cb + v_j) ≥ f(x_cb). The exploratory phase will either produce a new base point or fail
(meaning that x_cb = x). Note that this phase depends on the ordering of the coordinates of x.
Applying a permutation to x could change the output of the exploration.
If the exploratory phase has succeeded, the search direction is

(8.10)    d_HJ = x_cb − x

and the new base point is x_cb. The subtle part of the algorithm begins here. Rather than center
the next exploration at x_cb, which would use some of the same points that were examined in







the previous exploration, the Hooke–Jeeves pattern move step is aggressive and tries to move
further. The algorithm centers the next exploratory move at

    x_C = x + 2d_HJ = x_cb + d_HJ.

If this second exploratory move fails to improve upon f(x_cb), then an exploratory move with
x_cb as the center is tried. If that fails, h is reduced, x is set to x_cb, and the process is started over.
Note that when h has just been set, the base point and the center of the stencil for the exploratory
moves are the same, but afterward they are not.
If, after the first exploratory move, x_cb = x (i.e., as it will be if x is the best point in the
pattern), then x is left unchanged and h is reduced.
Therefore, whenever h is reduced, the stencil centered at x has x itself as the best point.
This is exactly the situation that led to a shrink in the MDS algorithm and, as you might expect,
will enable us to prove a convergence result like those in the previous sections. In [145] h was
simply multiplied by a constant factor. Our description in Algorithm hooke follows the model
of implicit filtering and uses a sequence of scales. Choice of perturbation directions could be
generalized to any simplex shape, not just the right simplices used in [145].
Figure 8.9 illustrates the idea for N = 2. The base point x lies at the center of the stencil. If

    f(x_1^+) < f(x), f(x_2^+) < f(x), f(x_1^-) ≥ f(x), and f(x_2^-) ≥ f(x),

then the new base point x_b will be located above and to the right of x. The next exploratory
move will be centered at x_C, which is the center of the stencil in the upper right corner of the
figure.
The reader, especially one who plans to implement this method, must be mindful that points
may be sampled more than once. For example, in the figure, if the exploratory move centered
at x_C fails, f will be evaluated for the second time at the four points in the stencil centered
at x_b unless the algorithm is implemented to avoid this. The MDS method is also at risk of
sampling points more than once. The implementations of Hooke–Jeeves and MDS in our suite
of MATLAB codes keep the most recent 4N iterations in memory to guard against this. This
reevaluation is much less likely for the Nelder–Mead and implicit filtering methods. One should
also be aware that the Hooke–Jeeves algorithm, like Nelder–Mead, does not have the natural
parallelism that implicit filtering and MDS do.
One could implement a variant of the Hooke–Jeeves iteration by using x_C = x + d_HJ
instead of x_C = x + 2d_HJ and shrinking the size of the simplex on stencil failure. This is the
discrete form of the classical coordinate descent algorithm [180] and can also be analyzed by
the methods of this section (see [279] for a different view).
Our implementation follows the model of implicit filtering as well as the description in
[145]. We begin with the exploratory phase, which uses a base point x_b, base function value
f_b = f(x_b), and stencil center x_C. Note that in the algorithm x_b = x_C for the first exploration
and x_C = x_b + d_HJ thereafter. Algorithm hjexplore takes a base point and a scale and returns
a direction and the value at the trial point x + d. We let V = I be the matrix of coordinate
directions, but any nonsingular matrix of search directions could be used. The status flag sf is
used to signal failure and trigger a shrink step.
Algorithm 8.3.1. hjexplore(x_b, x_C, f, h, sf)

1. f_b = f(x_b); d = 0; sf = 0; x_cb = x_b; f_cb = f(x_b); x_t = x_C

2. For j = 1, ..., N: p = x_t + h v_j; if f(p) ≥ f_b then p = x_t − h v_j;
   if f(p) < f_b then x_t = x_cb = p; f_b = f(x_cb)

3. If x_cb ≠ x_b; sf = 1; x_b = x_cb
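A minimal Python transcription of the exploratory move is sketched below. It is not the MATLAB hjexplore from our code suite; it uses V = I by default, caches function values rather than reevaluating them, and returns the updated base point and success flag instead of modifying its arguments in place.

    import numpy as np

    def hjexplore(xb, xC, f, h, V=None):
        """Exploratory move; returns (new base point, its value, sf), sf = 1 on success."""
        N = xb.size
        V = np.eye(N) if V is None else V        # coordinate directions by default
        fb = f(xb)
        xt, xcb = xC.copy(), xb.copy()
        for j in range(N):
            p = xt + h * V[:, j]
            fp = f(p)
            if fp >= fb:                         # test x_t - h v_j only if + direction fails
                p = xt - h * V[:, j]
                fp = f(p)
            if fp < fb:
                xt = xcb = p
                fb = fp
        sf = 0 if np.array_equal(xcb, xb) else 1
        return xcb, fb, sf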







Figure 8.9: Hooke–Jeeves Pattern and New Points


The exploration is coupled to the pattern move to complete the algorithm for a single value
of the scale. The inputs for Algorithm hjsearch are an initial iterate x, the function, and the
scale. On output, a point x is returned for which the exploration has failed. There are other
considerations, such as the budget for function evaluations, that should trigger a return from the
exploratory phase in a good implementation. In our MATLAB code hooke.m we pay attention
to the number of function evaluations and change in the function value as part of the decision to
return from the exploratory phase.

Algorithm 8.3.2. hjsearch(x, f, h)

1. x_b = x; x_C = x; sf = 1

2. Call hjexplore(x, x_C, f, h, sf)

3. While sf = 1

   (a) d = x − x_b; x_b = x; x_C = x + d
   (b) Call hjexplore(x, x_C, f, h, sf);
       If sf = 0; x_C = x; Call hjexplore(x, x_C, f, h, sf)








Step 3b requires care in implementation. If sf = 0 on exit from the first call to hjexplore,
one should only test f at those points on the stencil centered at x that have not been evaluated
before.
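Continuing the sketch, the search at a single scale can be written as follows. It reuses the illustrative hjexplore above and follows the pattern-move logic of Algorithm hjsearch; it is again Python, not the MATLAB hooke.m.

    def hjsearch(x, f, h):
        """Hooke-Jeeves search at a single scale h; returns the final base point."""
        xb = x.copy()
        x, fb, sf = hjexplore(xb, xb, f, h)        # first exploration: center = base point
        while sf == 1:
            d = x - xb                             # pattern direction d_HJ
            xb = x.copy()
            xC = x + d                             # x_C = x_cb + d_HJ (= old base + 2 d_HJ)
            xnew, fnew, sf = hjexplore(x, xC, f, h)
            if sf == 0:
                # Pattern move failed: retry an exploration centered at x itself.
                xnew, fnew, sf = hjexplore(x, x, f, h)
            if sf == 1:
                x, fb = xnew, fnew
        return x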
The Hooke–Jeeves algorithm simply calls hjsearch repeatedly as h varies over a sequence
{h_k} of scales.
Algorithm 8.3.3. hooke(x, f, {h_k})

1. For k = 1, ...
   Call hjsearch(x, f, h_k)
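Finally, a driver in the spirit of Algorithm hooke, again reusing the sketches above; the quadratic test function and the scale sequence are illustrative choices, not taken from the text.

    import numpy as np

    def hooke(x0, f, scales):
        """Run hjsearch over a sequence of scales {h_k}."""
        x = np.asarray(x0, dtype=float).copy()
        for h in scales:
            x = hjsearch(x, f, h)
        return x

    # Example: a smooth quadratic whose minimizer is (1, -2).
    f = lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2
    print(hooke([0.0, 0.0], f, scales=[2.0 ** (-k) for k in range(1, 9)]))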

As is the case with implicit filtering, the Hooke–Jeeves algorithm can be applied to bound
constrained problems in a completely natural way [145], [227]: one simply restricts the exploratory
and pattern moves to stencil points that satisfy the bounds and remain in the feasible region.

8.3.2 Convergence and the Simplex Gradient
As with MDS, if the set of sampling points remains bounded, only finitely many explorations
can take place before hjsearch returns and the scale must be reduced. The conditions for
reduction in the scale include failure of an exploratory move centered at the current best point x.
This means that we can apply Theorem 6.2.9 with κ_+ = 1 to prove the same result we obtained
for MDS.
Theorem 8.3.1. Let f satisfy (6.1). Let {x_k} be the sequence of Hooke–Jeeves best points.
Assume that the set
    {x | f(x) ≤ f(x_0)}
is bounded. Then let h_k → 0 and if

    lim_{k→∞} ‖φ‖_{B^k} / σ_+(S^k) = 0,

where B^k is the ball of radius 2h_k about x_k, then every limit point of {x_k} is a critical point of
f_s.



8.4 Other Approaches
In this section we briefly discuss two methods that have been used successfully for noisy problems.
These methods are substantially more difficult to implement than the ones that we have
discussed so far and we will give few details. The pointers to the literature are a good starting
place for the interested and energetic reader.

8.4.1 Surrogate Models
As any sampling method progresses, the function values can be used to build a (possibly)
quadratic model based, for example, on interpolation or least squares fit to data. Such models
are called surrogates or response surfaces. Even for smooth f there are risks in doing this.
Points from early in the iteration may corrupt an accurate model that could be built from the
more recent points; however, the most recent points alone may not provide a rich enough set of







interpolatory data. The function being modeled could be too complex to be modeled in a simple
way (think of the Lennard–Jones function), and very misleading results could be obtained.
However, this approach is often very productive even for smooth problems in which evaluation
of f is very expensive (see [28] for a high-flying example).
Initialization of the model requires an initial set of points at which to sample f. Selection of
this point set is not a trivial issue, and the regular stencils used in implicit filtering and the direct
search algorithms are very poor choices. The study of this issue alone is a field in itself, called
design and analysis of computer experiments (DACE) [27], [167], [230].
Having built such a model, one then finds one or more local minima of the model. One can
use either a conventional gradient-based method, a sampling algorithm of the type discussed in
Chapters 7 or 8, or an algorithm that is itself based on building models like the one described in
[62], the nongradient-based approaches being used when the model is expected to be multimodal
or nonconvex. Upon minimizing the model, one then evaluates f again at one or more new points.
The implementation of such a scheme requires careful coordination between the sampling
of the function, the optimization of the model, and the changing of the set of sample points. We
refer the reader to [28] and [4] for more information on recent progress in this area.
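As a toy illustration of the surrogate idea (this is not any of the methods of [28], [4], or [62]), the sketch below fits a quadratic response surface to scattered samples by linear least squares and then minimizes the model; the test function, sample set, and names are assumptions made for the example.

    import numpy as np

    def fit_quadratic_surrogate(X, y):
        """Least-squares fit of m(x) = c + g'x + x'Hx/2 to samples (X, y) in two dimensions."""
        x1, x2 = X[:, 0], X[:, 1]
        # Design matrix for the six monomials of a quadratic in two variables.
        A = np.column_stack([np.ones_like(x1), x1, x2,
                             0.5 * x1**2, x1 * x2, 0.5 * x2**2])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        c, g = coef[0], coef[1:3]
        H = np.array([[coef[3], coef[4]], [coef[4], coef[5]]])
        return c, g, H

    # Sample a slightly noisy quadratic on a scattered point set and minimize the model.
    rng = np.random.default_rng(0)
    f = lambda x: (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 1.0) ** 2 + 1e-3 * rng.standard_normal()
    X = rng.uniform(-3.0, 3.0, size=(30, 2))
    y = np.array([f(x) for x in X])
    c, g, H = fit_quadratic_surrogate(X, y)
    print(np.linalg.solve(H, -g))        # model minimizer; close to (1, -1) here

In practice the surrogate would be refit as new points arrive, and the model minimizer would be checked against f before being trusted, in the spirit of the coordination issues described above.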

8.4.2 The DIRECT Algorithm
Suppose f is a Lipschitz continuous function on [a, b] with Lipschitz constant L. If one has a
priori knowledge of L, one can use this in a direct search algorithm to eliminate intervals of
possible optimal points based on the function values at the endpoints of these intervals. The
Shubert algorithm [146], [214], [241] is the simplest way to use this idea. The method begins
with the fact that

(8.11)    f(x) ≥ f_low(x, a, b) = max(f(a) − L(x − a), f(b) − L(b − x))

for all x ∈ [a, b]. If one samples f repeatedly, one can use (8.11) on a succession of intervals
and obtain a piecewise linear approximation to f. If I_n = [a_n, b_n] ⊂ [a, b] then f(x) ≥
f_low(x, a_n, b_n) on I_n, the minimum value of f_low(x, a_n, b_n) is

    V_n = (f(a_n) + f(b_n) − L(b_n − a_n))/2,

and the minimizer is

    M_n = (f(a_n) − f(b_n) + L(b_n + a_n))/(2L).

The algorithm begins with I_0 = [a, b], selects the interval for which V_n is least, and divides at
M_n. This means that if K intervals have been stored we have, replacing I_n and adding I_{K+1} to
the list,
    I_n = [a_n, M_n] and I_{K+1} = [M_n, b_n].
The sequence of intervals is only ordered by the iteration counter, not by location. In this way
the data structure for the intervals is easy to manage.
If there are p and k such that p ≠ k and V_p ≥ max(f(a_k), f(b_k)), then I_p need not be
searched any longer, since the best value from I_p is worse than the best value in I_k. The
algorithm's rule for division automatically incorporates this information and will never sample
from I_p.
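When L is known, the Shubert algorithm itself is only a few lines; the sketch below (illustrative Python, not the DIRECT algorithm) stores intervals as tuples, always divides the interval with the smallest lower bound V_n at its minimizer M_n, and tracks the best sampled value. As a check on the formulas above, for f(a) = 1, f(b) = 0 on [0, 1] with L = 2 one gets V = (1 + 0 − 2)/2 = −1/2 and M = (1 − 0 + 2)/4 = 3/4.

    import math

    def shubert(f, a, b, L, nmax=50):
        """Shubert's algorithm with a known Lipschitz constant L (illustrative sketch)."""
        fa, fb = f(a), f(b)
        intervals = [(a, fa, b, fb)]               # entries are (a_n, f(a_n), b_n, f(b_n))
        best = min((fa, a), (fb, b))               # (best value, best point) so far
        for _ in range(nmax):
            # Lower bound V_n = (f(a_n) + f(b_n) - L (b_n - a_n)) / 2 on each interval.
            V = [(fan + fbn - L * (bn - an)) / 2.0 for (an, fan, bn, fbn) in intervals]
            n = min(range(len(intervals)), key=lambda i: V[i])
            an, fan, bn, fbn = intervals[n]
            # Divide the chosen interval at the minimizer M_n of the lower bound.
            M = (fan - fbn + L * (bn + an)) / (2.0 * L)
            fM = f(M)
            best = min(best, (fM, M))
            intervals[n] = (an, fan, M, fM)        # replace I_n by [a_n, M_n] ...
            intervals.append((M, fM, bn, fbn))     # ... and append I_{K+1} = [M_n, b_n]
        return best

    # f(x) = sin(3x) + x/2 on [0, 4] has Lipschitz constant at most 3.5.
    print(shubert(lambda x: math.sin(3.0 * x) + 0.5 * x, 0.0, 4.0, 3.5))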
There are two problems with this algorithm. One cannot expect to know the Lipschitz
constant L, so it must be estimated. An estimated Lipschitz constant that is too low can lead to
erroneous rejection of an interval. An estimate that is too large will lead to slow convergence,
since intervals that should have been discarded will be repeatedly divided. The second problem







