Figure 8.8: MDS Simplices and New Points

1. Evaluate f at the vertices of S and sort the vertices of S so that (8.1) holds.

2. Set fcount = N + 1.

3. While f(x_{N+1}) − f(x_1) > τ

(a) Reflect: If fcount = kmax then exit.

For j = 2, ..., N + 1: r_j = x_1 − (x_j − x_1); Compute f(r_j); fcount = fcount + 1.

If f(x_1) > min_j {f(r_j)} then goto step 3b else goto step 3c.

(b) Expand:

i. For j = 2, ..., N + 1: e_j = x_1 − μ_e(x_j − x_1); Compute f(e_j); fcount = fcount + 1.

ii. If min_j {f(r_j)} > min_j {f(e_j)} then

for j = 2, ..., N + 1: x_j = e_j

else

for j = 2, ..., N + 1: x_j = r_j

iii. Goto step 3d.

(c) Contract: For j = 2, ..., N + 1: x_j = x_1 + μ_c(x_j − x_1); Compute f(x_j).

(d) Sort: Sort the vertices of S so that (8.1) holds.

If the function values at the vertices of S^c are known, then the cost of computing S^+ is 2N additional evaluations. Just as with the Nelder–Mead algorithm, the expansion step is optional but has been observed to improve performance.
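The reflect–expand–contract loop above is compact enough to sketch directly. The following Python fragment is an illustrative sketch, not one of the MATLAB codes that accompany the book; the values μ_e = 2 and μ_c = 1/2 and the re-evaluation of f during sorting are simplifications of this sketch.

```python
import numpy as np

def mds_step(f, S, mu_e=2.0, mu_c=0.5):
    """One reflect/expand/contract step of multidirectional search.

    S is an (N+1) x N array of vertices, sorted so that S[0] = x_1 is
    the best vertex.  Returns the new simplex, again sorted by f.
    For simplicity f is re-evaluated when sorting; a real code would
    store the values (and count them against the budget).
    """
    x1 = S[0]
    # Reflect: r_j = x_1 - (x_j - x_1) for every vertex but the best.
    R = x1 - (S[1:] - x1)
    fR = np.array([f(r) for r in R])
    if f(x1) > fR.min():
        # Reflection succeeded: try the optional expansion step.
        E = x1 - mu_e * (S[1:] - x1)
        fE = np.array([f(e) for e in E])
        new = E if fE.min() < fR.min() else R
    else:
        # Reflection failed: contract toward the best vertex.
        new = x1 + mu_c * (S[1:] - x1)
    S_new = np.vstack([x1, new])
    return S_new[np.argsort([f(v) for v in S_new])]
```

A full solver would wrap this step in the termination test of step 3 and track fcount against kmax.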

The extension of MDS to bound constrained and linearly constrained problems is not trivial.

We refer the reader to [173] and [175] for details.

8.2.2 Convergence and the Simplex Gradient

Assume that the simplices are either equilateral or right simplices (having one vertex from which

all N edges are at right angles). In those cases, as pointed out in [262], the possible vertices

created by expansion and reflection steps form a regular lattice of points. If the MDS simplices

remain bounded, only finitely many reflections and expansions are possible before every point

on that lattice has been visited and a contraction to a new maximal simplex size must take place.

Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR18.html.

Copyright ©1999 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.

SEARCH ALGORITHMS 145

This exhaustion of a lattice takes place under more general conditions [262] but is most clear for

the equilateral and right simplex cases.

Theorem 6.2.9 implies that infinitely many contractions and convergence of the simplex diameters to zero imply convergence of the simplex gradient to zero. The similarity of Theorem 6.2.9 to Lemma 6.2.2 and of Theorem 8.2.1, the convergence result for multidirectional search, to Theorem 8.1.2 is no accident. The Nelder–Mead iteration, which is more aggressive than the multidirectional search iteration, requires far stronger assumptions (well conditioning and sufficient decrease) for convergence, but the ideas are the same. Theorems 6.2.9 and 8.2.1 can be used to extend the results in [262] to the noisy case. The observation in [85] that one can apply any heuristic or machine-dependent idea to improve performance, say, by exploring far away points on spare processors (the "speculative function evaluations" of [46]) without affecting the analysis is still valid here.

Theorem 8.2.1. Let f satisfy (6.1) and assume that the set

{x | f(x) ≤ f(x_1^0)}

is bounded. Assume that the simplex shape is such that

lim_{k→∞} σ_+(S^k) = 0.   (8.9)

Let B^k be a ball of radius 2σ_+(S^k) about x_1^k. Then if

lim_{k→∞} ‖φ‖_{B^k} / σ_+(S^k) = 0,

then every limit point of the vertices is a critical point of f_s.

Recall that if the simplices are equilateral or right simplices, then (8.9) holds (see exercise 8.6.2).

8.3 The Hooke“Jeeves Algorithm

8.3.1 Description and Implementation

The Hooke–Jeeves algorithm is like implicit filtering in that the objective is evaluated on a stencil and the function values are used to compute a search direction. However, unlike implicit filtering, there are only finitely many possible search directions and only qualitative information about the function values is used.

The algorithm begins with a base point x and pattern size h, which is like the scale in implicit filtering. In the next phase of the algorithm, called the exploratory move in [145], the function is sampled at successive perturbations of the base point in the search directions {vj}, where vj is the jth column of a direction matrix V. In [145] and our MATLAB implementation V = I. The current best value fcb = f(xcb) and best point xcb are recorded and returned. xcb is initialized to x. The sampling is managed by first evaluating f at xcb + vj and only testing xcb − vj if f(xcb + vj) ≥ f(xcb). The exploratory phase will either produce a new base point or fail (meaning that xcb = x). Note that this phase depends on the ordering of the coordinates of x. Applying a permutation to x could change the output of the exploration.

If the exploratory phase has succeeded, the search direction is

dHJ = xcb − x   (8.10)

and the new base point is xcb. The subtle part of the algorithm begins here. Rather than center the next exploration at xcb, which would use some of the same points that were examined in



the previous exploration, the Hooke–Jeeves pattern move step is aggressive and tries to move

further. The algorithm centers the next exploratory move at

xC = x + 2dHJ = xcb + dHJ.

If this second exploratory move fails to improve upon f(xcb), then an exploratory move with xcb as the center is tried. If that fails, h is reduced, x is set to xcb, and the process is started over.

Note that when h has just been set, the base point and the center of the stencil for the exploratory

moves are the same, but afterward they are not.

If, after the ¬rst exploratory move, xcb = x (i.e., as it will be if x is the best point in the

pattern), then x is left unchanged and h is reduced.

Therefore, whenever h is reduced, the stencil centered at x has x itself as the best point.

This is exactly the situation that led to a shrink in the MDS algorithm and, as you might expect,

will enable us to prove a convergence result like those in the previous sections. In [145] h was simply multiplied by a constant factor. Our description in Algorithm hooke follows the model of implicit filtering and uses a sequence of scales. Choice of perturbation directions could be generalized to any simplex shape, not just the right simplices used in [145].

Figure 8.9 illustrates the idea for N = 2. The base point x lies at the center of the stencil. If

f(x_1^+) < f(x), f(x_2^+) < f(x), f(x_1^−) ≥ f(x), and f(x_2^−) ≥ f(x),

then the new base point xb will be located above and to the right of x. The next exploratory move will be centered at xC, which is the center of the stencil in the upper right corner of the figure.

The reader, especially one who plans to implement this method, must be mindful that points may be sampled more than once. For example, in the figure, if the exploratory move centered at xC fails, f will be evaluated for the second time at the four points in the stencil centered at xb unless the algorithm is implemented to avoid this. The MDS method is also at risk of sampling points more than once. The implementations of Hooke–Jeeves and MDS in our suite of MATLAB codes keep the most recent 4N iterations in memory to guard against this. This reevaluation is much less likely for the Nelder–Mead and implicit filtering methods. One should also be aware that the Hooke–Jeeves algorithm, like Nelder–Mead, does not have the natural parallelism that implicit filtering and MDS do.
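A simple way to guard against such reevaluation is to wrap the objective in a bounded cache, in the spirit of the 4N-point history just described. The wrapper below is an illustrative Python sketch; the names memoize_f and maxlen, and the rounding used to form keys, are our choices rather than anything from the text or the MATLAB codes.

```python
from collections import OrderedDict

def memoize_f(f, maxlen=100):
    """Wrap an objective so recently sampled points are not re-evaluated.

    A bounded history, keyed on a rounded tuple form of the point,
    stands in for the most-recent-iterations cache described in the
    text; maxlen and the rounding are illustrative choices.
    """
    cache = OrderedDict()
    calls = {"count": 0}           # number of true function evaluations

    def fm(x):
        key = tuple(round(float(xi), 12) for xi in x)
        if key in cache:
            cache.move_to_end(key)      # refresh this point's recency
            return cache[key]
        calls["count"] += 1
        val = f(x)
        cache[key] = val
        if len(cache) > maxlen:
            cache.popitem(last=False)   # discard the oldest point
        return val

    fm.calls = calls
    return fm
```

Such a wrapper leaves the search logic untouched, which is why a history cache is a convenient place to put the bookkeeping.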

One could implement a variant of the Hooke“Jeeves iteration by using xC = x + dHJ

instead of xC = x + 2dHJ and shrinking the size of the simplex on stencil failure. This is the

discrete form of the classical coordinate descent algorithm [180] and can also be analyzed by

the methods of this section (see [279] for a different view).

Our implementation follows the model of implicit filtering as well as the description in [145]. We begin with the exploratory phase, which uses a base point xb, base function value fb = f(xb), and stencil center xC. Note that in the algorithm xb = xC for the first exploration and xC = xb + dHJ thereafter. Algorithm hjexplore takes a base point and a scale and returns a direction and the value at the trial point x + d. We let V = I be the matrix of coordinate directions, but any nonsingular matrix of search directions could be used. The status flag sf is used to signal failure and trigger a shrink step.

Algorithm 8.3.1. hjexplore(xb , xC , f, h, sf )

1. fb = f (xb ); d = 0; sf = 0; xcb = xb ; fcb = f (xb ); xt = xC

2. for j = 1, ..., N: p = xt + hvj; if f(p) ≥ fb then p = xt − hvj;

if f(p) < fb then xt = xcb = p; fb = f(xcb)

3. if xcb ≠ xb; sf = 1; xb = xcb
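A Python transcription of Algorithm hjexplore might look like the following sketch. The MATLAB codes of the suite are the reference implementation; here sf and the new base point are returned rather than passed by reference, and budget and tolerance issues are ignored.

```python
import numpy as np

def hjexplore(f, xb, xC, h, V=None):
    """Exploratory move of Algorithm hjexplore (a Python sketch).

    Samples f at xt + h*v_j, falling back to xt - h*v_j, for each
    search direction v_j (the columns of V, the identity by default),
    accepting any improvement over the base value fb = f(xb).
    Returns (sf, xb): sf = 1 signals success, and xb is then the new
    base point; on failure (sf = 0) xb is returned unchanged.
    """
    xb = np.asarray(xb, dtype=float)
    N = xb.size
    V = np.eye(N) if V is None else V
    fb = f(xb)
    sf = 0
    xcb = xb.copy()
    xt = np.asarray(xC, dtype=float).copy()
    for j in range(N):
        p = xt + h * V[:, j]
        if f(p) >= fb:            # forward step fails: try the back step
            p = xt - h * V[:, j]
        if f(p) < fb:             # accept the improvement
            xt = p.copy()
            xcb = p.copy()
            fb = f(xcb)
    if not np.array_equal(xcb, xb):
        sf = 1
        xb = xcb
    return sf, xb
```

Note that, as in the text, the acceptance test for each coordinate compares against the current best value fb, so the exploration depends on the ordering of the coordinates.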



Figure 8.9: Hooke–Jeeves Pattern and New Points

The exploration is coupled to the pattern move to complete the algorithm for a single value

of the scale. The inputs for Algorithm hjsearch are an initial iterate x, the function, and the

scale. On output, a point x is returned for which the exploration has failed. There are other

considerations, such as the budget for function evaluations, that should trigger a return from the

exploratory phase in a good implementation. In our MATLAB code hooke.m we pay attention

to the number of function evaluations and change in the function value as part of the decision to

return from the exploratory phase.

Algorithm 8.3.2. hjsearch(x, f, h)

1. xb = x; xC = x; sf = 1

2. Call hjexplore(x, xC , f, h, sf )

3. While sf = 1

(a) d = x − xb ; xb = x; xC = x + d

(b) Call hjexplore(x, xC , f, h, sf );

If sf = 0; xC = x; Call hjexplore(x, xC , f, h, sf )



Step 3b requires care in implementation. If sf = 0 on exit from the first call to hjexplore, one should only test f at those points on the stencil centered at x that have not been evaluated before.

The Hooke“Jeeves algorithm simply calls hjsearch repeatedly as h varies over a sequence

{hk } of scales.

Algorithm 8.3.3. hooke(x, f, {hk })

1. For k = 1, . . .

Call hjsearch(x, f, hk )
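Putting the pieces together, hjsearch and hooke can be sketched in Python as a single routine; the nested explore plays the role of hjexplore with V = I. This is an illustrative reading of Algorithms 8.3.2 and 8.3.3, not the MATLAB code hooke.m, and it omits the function-evaluation budget and the reuse of previously sampled points discussed above.

```python
import numpy as np

def hooke(f, x, scales):
    """Hooke-Jeeves sketch: hjsearch over a decreasing list of scales.

    For each scale h, explore around the base point; on success, take
    the pattern move xC = x + 2*d_HJ and re-explore, falling back to an
    exploration centered at the new base point before shrinking h.
    """
    x = np.asarray(x, dtype=float)
    N = x.size

    def explore(xb, xC, h):
        # Exploratory move with V = I; returns (sf, new base point).
        fb, xcb, xt = f(xb), xb.copy(), xC.copy()
        for j in range(N):
            v = np.eye(N)[:, j]
            p = xt + h * v
            if f(p) >= fb:
                p = xt - h * v
            if f(p) < fb:
                xt, xcb, fb = p.copy(), p.copy(), f(p)
        return (1, xcb) if not np.array_equal(xcb, xb) else (0, xb)

    for h in scales:                       # hjsearch for each scale
        sf, xnew = explore(x, x, h)
        while sf == 1:
            d = xnew - x                   # d_HJ, the search direction
            x, xC = xnew, xnew + d         # pattern move: xC = x + 2*d_HJ
            sf, xnew = explore(x, xC, h)
            if sf == 0:                    # pattern move failed; recenter
                sf, xnew = explore(x, x, h)
    return x
```

As in the text, each scale is abandoned only after an exploration centered at the current best point fails, which is the situation the convergence analysis below exploits.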

As is the case with implicit filtering, the Hooke–Jeeves algorithm can be applied to bound constrained problems in a completely natural way [145], [227]: one simply restricts the exploratory and pattern moves to points that satisfy the bounds, so that every move stays in the feasible region.

8.3.2 Convergence and the Simplex Gradient

As with MDS, if the set of sampling points remains bounded, only finitely many explorations can take place before hjsearch returns and the scale must be reduced. The conditions for

reduction in the scale include failure of an exploratory move centered at the current best point x.

This means that we can apply Theorem 6.2.9 with κ+ = 1 to prove the same result we obtained

for MDS.

Theorem 8.3.1. Let f satisfy (6.1). Let {x^k} be the sequence of Hooke–Jeeves best points. Assume that the set

{x | f(x) ≤ f(x^0)}

is bounded. Then if h_k → 0 and

lim_{k→∞} ‖φ‖_{B^k} / σ_+(S^k) = 0,

where B^k is the ball of radius 2h_k about x^k, then every limit point of {x^k} is a critical point of f_s.

8.4 Other Approaches

In this section we briefly discuss two methods that have been used successfully for noisy problems. These methods are substantially more difficult to implement than the ones that we have discussed so far and we will give few details. The pointers to the literature are a good starting place for the interested and energetic reader.

8.4.1 Surrogate Models

As any sampling method progresses, the function values can be used to build a (possibly) quadratic model based, for example, on interpolation or least squares fit-to-data. Such models are called surrogates or response surfaces. Even for smooth f there are risks in doing this. Points from early in the iteration may corrupt an accurate model that could be built from the more recent points; however, the most recent points alone may not provide a rich enough set of



interpolatory data. The function being modeled could be too complex to be modeled in a simple way (think of the Lennard–Jones function), and very misleading results could be obtained. However, this approach is often very productive even for smooth problems in which evaluation of f is very expensive (see [28] for a high-flying example).

Initialization of the model requires an initial set of points at which to sample f. Selection of this point set is not a trivial issue, and the regular stencils used in implicit filtering and the direct search algorithms are very poor choices. The study of this issue alone is a field in itself, called design and analysis of computer experiments (DACE) [27], [167], [230].

Having built such a model, one then finds one or more local minima of the model. One can use either a conventional gradient-based method, a sampling algorithm of the type discussed in Chapters 7 or 8, or an algorithm that is itself based on building models like the one described in [62], the nongradient-based approaches being used when the model is expected to be multimodal or nonconvex. Upon minimizing the model, one then evaluates f again at one or more new points.

The implementation of such a scheme requires careful coordination between the sampling

of the function, the optimization of the model, and the changing of the set of sample points. We

refer the reader to [28] and [4] for more information on recent progress in this area.
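As a concrete (and deliberately naive) illustration of the surrogate idea, one can fit a full quadratic model to sampled points by linear least squares and minimize it. Everything below, from the function name to the positive definiteness test, is our illustrative choice, not a method from the references cited above.

```python
import numpy as np

def quadratic_surrogate(X, fX):
    """Fit f(x) ~ c + g.x + sum_{i<=j} q_ij x_i x_j by least squares.

    X is an m x N array of sample points and fX the m sampled values.
    Returns the minimizer of the fitted quadratic when its Hessian is
    positive definite, and None otherwise.
    """
    m, N = X.shape
    idx = [(i, j) for i in range(N) for j in range(i, N)]
    # Design matrix: constant, linear, and upper-triangular quadratic terms.
    cols = [np.ones(m)] + [X[:, i] for i in range(N)]
    cols += [X[:, i] * X[:, j] for i, j in idx]
    A = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(A, fX, rcond=None)
    g = coef[1:N + 1]
    H = np.zeros((N, N))
    for k, (i, j) in enumerate(idx):
        q = coef[N + 1 + k]
        if i == j:
            H[i, i] = 2.0 * q          # second derivative of q x_i^2
        else:
            H[i, j] = H[j, i] = q      # mixed derivative of q x_i x_j
    try:
        np.linalg.cholesky(H)          # positive definiteness check
    except np.linalg.LinAlgError:
        return None
    return np.linalg.solve(H, -g)      # solve grad = g + H x = 0
```

A scheme of the kind discussed above would evaluate f at this model minimizer, add the result to the sample set, and refit, with all the coordination caveats noted in the text.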

8.4.2 The DIRECT Algorithm

Suppose f is a Lipschitz continuous function on [a, b] with Lipschitz constant L. If one has a

priori knowledge of L, one can use this in a direct search algorithm to eliminate intervals of

possible optimal points based on the function values at the endpoints of these intervals. The

Shubert algorithm [146], [214], [241] is the simplest way to use this idea. The method begins

with the fact that

f(x) ≥ flow(x, a, b) = max(f(a) − L(x − a), f(b) − L(b − x))   (8.11)

for all x ∈ [a, b]. If one samples f repeatedly, one can use (8.11) on a succession of intervals and obtain a piecewise linear approximation to f. If In = [an, bn] ⊂ [a, b] then f(x) ≥ flow(x, an, bn) on In, the minimum value of flow(x, an, bn) is

Vn = (f(an) + f(bn) − L(bn − an))/2,

and the minimizer is

Mn = (f(an) − f(bn) + L(bn + an))/(2L).

The algorithm begins with I0 = [a, b], selects the interval for which Vn is least, and divides at

Mn . This means that if K intervals have been stored we have, replacing In and adding IK+1 to

the list,

In = [an , Mn ] and IK+1 = [Mn , bn ].

The sequence of intervals is only ordered by the iteration counter, not by location. In this way

the data structure for the intervals is easy to manage.

If there are p and k such that p ≠ k and Vp ≥ max(f(ak), f(bk)), then Ip need not be searched any longer, since the best value from Ip is worse than the best value in Ik. The algorithm's rule for division automatically incorporates this information and will never sample from Ip.
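The subdivision rule just described is easy to sketch. The Python fragment below is an illustrative implementation of the Shubert algorithm built directly on the formulas for Vn and Mn; it selects the interval with the least Vn using a heap, which is a convenience of this sketch rather than the iteration-ordered list the text describes.

```python
import heapq

def shubert(f, a, b, L, nmax=50):
    """Sketch of the Shubert algorithm for Lipschitz f on [a, b].

    Repeatedly splits the stored interval with the least lower bound
    V_n at the minimizer M_n of the piecewise linear underestimate.
    Returns the best sampled point and value after nmax subdivisions.
    """
    vals = {a: f(a), b: f(b)}

    def V(an, bn):                        # minimum of flow on [an, bn]
        return (vals[an] + vals[bn] - L * (bn - an)) / 2.0

    def M(an, bn):                        # minimizer of flow on [an, bn]
        return (vals[an] - vals[bn] + L * (bn + an)) / (2.0 * L)

    heap = [(V(a, b), a, b)]
    for _ in range(nmax):
        _, an, bn = heapq.heappop(heap)   # interval with least V_n
        mid = M(an, bn)
        vals[mid] = f(mid)
        heapq.heappush(heap, (V(an, mid), an, mid))
        heapq.heappush(heap, (V(mid, bn), mid, bn))
    xbest = min(vals, key=vals.get)
    return xbest, vals[xbest]
```

Note that the sketch assumes L is at least the true Lipschitz constant; as discussed next, an underestimate of L can cause erroneous rejection of intervals.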

There are two problems with this algorithm. One cannot expect to know the Lipschitz

constant L, so it must be estimated. An estimated Lipschitz constant that is too low can lead to

erroneous rejection of an interval. An estimate that is too large will lead to slow convergence,

since intervals that should have been discarded will be repeatedly divided. The second problem
