
Figure 8.10: Selection of Intervals in DIRECT

is far more serious. The obvious generalization of the Shubert algorithm to more than one

dimension would replace intervals by N-dimensional hyperrectangles and require sampling at each of the 2^N vertices of the rectangle to be divided. This exponential complexity makes this

trivial generalization of the Shubert algorithm completely impractical.

The DIRECT algorithm [150] attempts to address these problems by sampling at the midpoint

of the hyperrectangle rather than the vertices and indirectly estimating the Lipschitz constant as

the optimization progresses. The scheme is not completely successful in that the mesh of sample

points becomes everywhere dense as the optimization progresses. Hence the algorithm becomes

an exhaustive search, a fact that is used in [150] to assert global convergence. In spite of the

exponential complexity of exhaustive search, even one with a fixed-size mesh (a problem with

any deterministic algorithm that is truly global [248]), DIRECT has been reported to perform well

in the early phases of the iteration [150], [108] and for suites of small test problems. DIRECT is

worth consideration as an intermediate algorithmic level between methods like implicit filtering, Nelder–Mead, Hooke–Jeeves, or MDS on the conservative side and nondeterministic methods

like simulated annealing or genetic algorithms on the radical side.

We will describe DIRECT completely only for the case N = 1. This will make clear how

the algorithm implicitly estimates the Lipschitz constant. The extension to larger values of N

requires careful management of the history of subdivision of the hyperrectangles, and we will give

a simple pictorial account of that. For more details we refer to [150], [147], or the documentation

[108] of the FORTRAN implementation of DIRECT from the software collection.

As with the Shubert algorithm we begin with an interval [a, b] but base our lower bound and

our subdivision strategy on the midpoint c = (a + b)/2. If the Lipschitz constant L is known


then

f(x) ≥ f(c) − L(b − a)/2.

If we are to divide an interval and also retain the current value c as the midpoint of an interval

in the set of intervals, we must divide an interval into three parts. If there are K intervals on

the list and an interval I_n = [a_n, b_n] with midpoint c_n has been selected for division, the new intervals are

    I_{K+1} = [a_n, a_n + (b_n − a_n)/3],   I_n = [a_n + (b_n − a_n)/3, b_n − (b_n − a_n)/3],   and   I_{K+2} = [b_n − (b_n − a_n)/3, b_n].

So c_n is still the midpoint of I_n and two new midpoints have been added.
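For concreteness, a minimal Python sketch of this trisection step might look as follows; the list-of-triples representation and the names are our own and are not taken from the reference implementation [108].

    def trisect(intervals, f, n):
        """Divide intervals[n] into three equal parts, keeping its midpoint.

        Each interval is stored as a triple (a, b, fc) with fc = f((a + b)/2).
        The middle third replaces intervals[n], so the current midpoint and its
        function value are reused; the two outer thirds are appended to the
        list, which costs two new evaluations of f.
        """
        a, b, fc = intervals[n]
        h = (b - a) / 3.0
        # Middle third: same midpoint (a + b)/2, so fc is reused.
        intervals[n] = (a + h, b - h, fc)
        # Outer thirds: sample f at their midpoints a + h/2 and b - h/2.
        intervals.append((a, a + h, f(a + h / 2.0)))
        intervals.append((b - h, b, f(b - h / 2.0)))

Each call therefore adds exactly two intervals and two midpoint values to the list, as described above.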

The remaining part of the algorithm is the estimation of the Lipschitz constant and the

simultaneous selection of the intervals to be divided. If the Lipschitz constant were known, an

interval would be selected for division if f(c) − L(b − a)/2 were smallest. This is similar to

the Shubert algorithm. In order for there to even exist a Lipschitz constant that would force

an interval to be selected for division in this way, that interval must have the smallest midpoint

value of all intervals having the same length. Moreover, there should be no interval of a different

length for which f(c) − L(b − a)/2 was smaller.

The DIRECT algorithm applies this rule to all possible combinations of possible Lipschitz

constants and interval sizes. If one plots the values of f at the midpoints against the lengths of

the intervals in the list to obtain a plot like the one in Figure 8.10, one can visually eliminate

all but one interval for each interval length. By taking the convex hull of the lowest points, one

can eliminate interval lengths for which all function values are so high that f(c) − L(b − a)/2

would be smaller for the best point at a different length no matter what L was. For example, the

three points that intersect the line in Figure 8.10 would correspond to intervals that would be

subdivided at this step. The slopes of the line segments through the three points are estimates

of the Lipschitz constant. These estimates are not used explicitly, as they would be in the

Shubert algorithm, but implicitly in the process of selection of intervals to be divided. Unlike

the Shubert algorithm, where the Lipschitz constant is assumed known, the DIRECT algorithm

will eventually subdivide every interval.
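The selection step can be written down almost directly from this description. The sketch below, which reuses the (a, b, fc) triples from the previous fragment, keeps the best midpoint value for each interval length, builds the lower convex hull of the resulting (length, value) points, and returns the indices of the intervals to be divided; the helper name, the data layout, and the omission of the sufficient-decrease safeguard used in [150] are our own simplifications.

    def potentially_optimal(intervals):
        """Return indices of intervals that minimize f(c) - L*(b - a)/2 for
        some Lipschitz constant L > 0, i.e., the lower-right portion of the
        convex hull of the (interval length, midpoint value) scatter plot.
        """
        # For each distinct length keep only the interval with the smallest
        # midpoint value; no Lipschitz constant can select any of the others.
        best = {}
        for i, (a, b, fc) in enumerate(intervals):
            d = b - a
            if d not in best or fc < best[d][0]:
                best[d] = (fc, i)
        # Build the lower convex hull of the surviving points, sorted by length.
        pts = sorted((d, fc, i) for d, (fc, i) in best.items())
        hull = []
        for d, fc, i in pts:
            while len(hull) >= 2:
                (d1, f1, _), (d2, f2, _) = hull[-2], hull[-1]
                # Pop hull[-1] if it lies on or above the chord from hull[-2]
                # to the new point.
                if (d2 - d1) * (fc - f1) - (f2 - f1) * (d - d1) <= 0:
                    hull.pop()
                else:
                    break
            hull.append((d, fc, i))
        # Only hull points from the smallest midpoint value out to the largest
        # length can win for some positive L.
        k = min(range(len(hull)), key=lambda j: hull[j][1])
        return [i for _, _, i in hull[k:]]

A one-dimensional DIRECT iteration then amounts to calling potentially_optimal and passing each index it returns to trisect.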

The resulting algorithm may divide more than a single interval at each stage and the number

of intervals to be divided may vary. This is easy to implement for a single variable. However,

for more than one variable there are several ways to divide a hyperrectangle into parts and one must keep track of how a hyperrectangle has previously been divided in order not to cluster sample points prematurely by repeatedly dividing it in the same way. Figures 8.11 and 8.12,

taken from [108], illustrate this issue for N = 2. In Figure 8.11 the entire rectangle will be

divided. Shading indicates that the rectangle has been selected for division. Four new midpoints

are sampled. The subdivision into new rectangles could be done in two ways: the figure shows

an initial horizontal subdivision followed by a vertical division of the rectangle that contains the

original center. The second division is shown in Figure 8.12. The two shaded rectangles are

selected for division. Note that four new centers are added to the small square and two to the

larger, nonsquare, rectangle. In this way the minimum number of new centers is added.
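A possible realization of this bookkeeping for general N, patterned after the dividing rule in [150] (sample along the longest sides, then split first along the direction with the best new value, as in the figures), is sketched below; the (center, halfwidth, value) storage and the function name are our own.

    import numpy as np

    def divide_box(center, halfwidth, fc, f):
        """Trisect a hyperrectangle along each of its longest sides.

        A box is stored as (center, halfwidth, fc) with fc = f(center), where
        center and halfwidth are float NumPy arrays.  New points are sampled
        at center +/- (2/3)*halfwidth[i] along every longest side i; the sides
        are then split in order of their best sampled value, so the smallest
        values end up at the centers of the largest new boxes.
        """
        dmax = halfwidth.max()
        long_sides = [i for i in range(center.size) if np.isclose(halfwidth[i], dmax)]
        samples = []
        for i in long_sides:
            step = np.zeros_like(center, dtype=float)
            step[i] = 2.0 * halfwidth[i] / 3.0
            lo, hi = f(center - step), f(center + step)
            samples.append((min(lo, hi), i, lo, hi))
        boxes = []
        w = halfwidth.astype(float)
        for _, i, lo, hi in sorted(samples):   # best new value is divided first
            step = np.zeros_like(center, dtype=float)
            step[i] = 2.0 * w[i] / 3.0
            w = w.copy()
            w[i] /= 3.0        # every box created from now on is thinner along side i
            boxes.append((center - step, w, lo))
            boxes.append((center + step, w, hi))
        boxes.append((center, w, fc))          # the original center keeps the smallest box
        return boxes

Dividing the longest sides in this order is consistent with the behavior shown in Figures 8.11 and 8.12: the lowest sampled values inherit the largest of the new rectangles, and only the minimum number of new centers is added.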

DIRECT parallelizes in a natural way. All hyperrectangles that are candidates for division

may be divided simultaneously, and for each hyperrectangle the function evaluations at each of

the new midpoints can also be done in parallel. We refer the reader to [150] and [108] for details

on the data structures and to [108] for a FORTRAN implementation and additional discussion

on the exploitation of parallelism.
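As a small illustration of the second level of parallelism, the new midpoint values for one division could be computed concurrently as sketched below; the pool-based dispatch and the helper name are our own illustration and are not part of the FORTRAN implementation [108].

    from concurrent.futures import ThreadPoolExecutor

    def evaluate_midpoints(f, midpoints, max_workers=4):
        """Evaluate f at the new midpoints of one division concurrently.

        For expensive simulations each call would typically be dispatched to a
        separate process or remote worker rather than a thread, but the
        structure of the loop is the same.
        """
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(f, midpoints))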


Figure 8.11: Initial Division of Rectangles with DIRECT

Figure 8.12: Second Division of Rectangles with DIRECT

8.5 Examples

In each of the examples we compare the central difference BFGS form of implicit filtering from §7.6 (solid line) with the Nelder–Mead (dashed line), Hooke–Jeeves (solid line with circles), and MDS (dashed-dotted line) algorithms.

For each example we specified both an initial iterate and choice of scales. This is sufficient to initialize both implicit filtering and Hooke–Jeeves. We used the implicit filtering forward difference stencil as the initial simplex for both Nelder–Mead and MDS.

The plots reflect the differences in the startup procedures for the varying algorithms. In particular, Nelder–Mead and MDS sort the simplex and hence, if the initial iterate is not the best point, report the lower value as the first iterate.

The relative performance of the various methods on these example problems should not be

taken as a definitive evaluation, nor should these examples be thought of as a complete suite of test problems. One very significant factor that is not reflected in the results in this section is that both implicit filtering [69], [55] and multidirectional search [85] are easy to implement in parallel, while Nelder–Mead and Hooke–Jeeves are inherently sequential. The natural parallelism of implicit filtering and multidirectional search can be further exploited by using idle processors to

explore other points on the line search direction or the pattern.

8.5.1 Weber's Problem

The initial data were the same as those in §7.6.1. Implicit filtering does relatively poorly for this problem because of the nonsmoothness at optimality. The results for these problems are plotted in Figures 8.13, 8.14, and 8.15.


Figure 8.13: First Weber Example (function value versus function evaluations)

The other three algorithms perform equally well. Note in the third example that MDS finds a local minimum that is not the global minimum.

8.5.2 Parameter ID

In the computations reported in this section each algorithm was allowed 500 evaluations of f and the sequence of scales was {2^{−j}}_{j=1}^{12}.

We begin with the two examples from §7.6.2. With the initial iterate of (5, 5)^T, the exact solution to the continuous problem lies on the grid that the Hooke–Jeeves algorithm uses to search for the solution. This explains the unusually good performance of the Hooke–Jeeves optimization shown in both Figures 8.16 and 8.17. When the initial iterate is changed to (5.1, 5.3)^T, the performance of Hooke–Jeeves is very different as one can see from Figures 8.18 and 8.19. The other algorithms do not have such a sensitivity to the initial iterate for this example. We have no explanation for the good performance turned in by the Nelder–Mead algorithm on this problem.

8.5.3 Convex Quadratics

The problems and initial data are the same as those in §7.6.3. This is an example of how sampling

algorithms can perform poorly for very simple problems and how this poor performance is made

worse by increasing the problem size. Exercise 7.7.4 illustrates this point very directly. One

would expect implicit filtering to do well since a central difference gradient has no error for quadratic problems. For the larger problem (N = 32, Figures 8.21 and 8.23), both the Nelder–Mead and MDS algorithms perform poorly while the Hooke–Jeeves algorithm does surprisingly

well. The difference in performance of the algorithms is much smaller for the low-dimensional

problem (N = 4, Figures 8.20 and 8.22).


Figure 8.14: Second Weber Example (function value versus function evaluations)

Figure 8.15: Third Weber Example (function value versus function evaluations)


Figure 8.16: Parameter ID, tol = 10^{−3}, x_0 = (5, 5)^T (function value versus function evaluations)
