

Figure 8.10: Selection of Intervals in DIRECT


is far more serious. The obvious generalization of the Shubert algorithm to more than one
dimension would replace intervals by N-dimensional hyperrectangles and require sampling at
each of the 2^N vertices of the rectangle to be divided. This exponential complexity makes this
trivial generalization of the Shubert algorithm completely impractical.
The DIRECT algorithm [150] attempts to address these problems by sampling at the midpoint
of the hyperrectangle rather than the vertices and indirectly estimating the Lipschitz constant as
the optimization progresses. The scheme is not completely successful in that the mesh of sample
points becomes everywhere dense as the optimization progresses. Hence the algorithm becomes
an exhaustive search, a fact that is used in [150] to assert global convergence. In spite of the
exponential complexity of exhaustive search, even one with a fixed-size mesh (a problem with
any deterministic algorithm that is truly global [248]), DIRECT has been reported to perform well
in the early phases of the iteration [150], [108] and for suites of small test problems. DIRECT is
worth consideration as an intermediate algorithmic level between methods like implicit filtering,
Nelder–Mead, Hooke–Jeeves, or MDS on the conservative side and nondeterministic methods
like simulated annealing or genetic algorithms on the radical side.
We will describe DIRECT completely only for the case N = 1. This will make clear how
the algorithm implicitly estimates the Lipschitz constant. The extension to larger values of N
requires careful management of the history of subdivision of the hyperrectangles, and we will give
a simple pictorial account of that. For more details we refer to [150], [147], or the documentation
[108] of the FORTRAN implementation of DIRECT from the software collection.
As with the Shubert algorithm we begin with an interval [a, b] but base our lower bound and
our subdivision strategy on the midpoint c = (a + b)/2. If the Lipschitz constant L is known



then, since |x − c| ≤ (b − a)/2 for every x ∈ [a, b],

f(x) ≥ f(c) − L(b − a)/2.


If we are to divide an interval and also retain the current value c as the midpoint of an interval
in the set of intervals, we must divide an interval into three parts. If there are K intervals on
the list and an interval I_n = [a_n, b_n] with midpoint c_n has been selected for division, the new
intervals are

I_{K+1} = [a_n, a_n + (b_n − a_n)/3],  I_n = [a_n + (b_n − a_n)/3, b_n − (b_n − a_n)/3],  and
I_{K+2} = [b_n − (b_n − a_n)/3, b_n].

So c_n is still the midpoint of I_n and two new midpoints have been added.
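
As a concrete illustration, here is a minimal Python sketch of this trisection step; the list-of-pairs representation and the function name are assumptions made for the illustration, not the data structures of the implementation in [108].

```python
def trisect(intervals, n):
    """Divide interval n of a list of (a, b) pairs into thirds, keeping its
    current midpoint c_n as the midpoint of the retained middle third."""
    a, b = intervals[n]
    h = (b - a) / 3.0
    intervals[n] = (a + h, b - h)   # middle third; the old midpoint c_n is unchanged
    intervals.append((a, a + h))    # new interval I_{K+1}
    intervals.append((b - h, b))    # new interval I_{K+2}
```

Only the two new midpoints, at a_n + (b_n − a_n)/6 and b_n − (b_n − a_n)/6, require new function evaluations.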
The remaining part of the algorithm is the estimation of the Lipschitz constant and the
simultaneous selection of the intervals to be divided. If the Lipschitz constant were known, an
interval would be selected for division if f(c) − L(b − a)/2 were smallest. This is similar to
the Shubert algorithm. In order for there to even exist a Lipschitz constant that would force
an interval to be selected for division in this way, that interval must have the smallest midpoint
value of all intervals having the same length. Moreover, there should be no interval of a different
length for which f(c) − L(b − a)/2 was smaller.
The DIRECT algorithm applies this rule to all possible combinations of possible Lipschitz
constants and interval sizes. If one plots the values of f at the midpoints against the lengths of
the intervals in the list to obtain a plot like the one in Figure 8.10, one can visually eliminate
all but one interval for each interval length. By taking the convex hull of the lowest points, one
can eliminate interval lengths for which all function values are so high that f(c) − L(b − a)/2
would be smaller for the best point at a different length no matter what L was. For example, the
three points that intersect the line in Figure 8.10 would correspond to intervals that would be
subdivided at this step. The slopes of the line segments through the three points are estimates
of the Lipschitz constant. These estimates are not used explicitly, as they would be in the
Shubert algorithm, but implicitly in the process of selection of intervals to be divided. Unlike
the Shubert algorithm, where the Lipschitz constant is assumed known, the DIRECT algorithm
will eventually subdivide every interval.
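
A Python sketch of this selection rule, under the same assumptions as before: for each distinct interval length keep only the interval with the smallest midpoint value, then retain the points on the lower convex hull of the (length, value) plot from the overall best value out to the largest length. The refinement in [150] that discards divisions offering only a negligible improvement over the current best value is omitted here.

```python
def potentially_optimal(lengths, fvals):
    """Return the indices of the intervals to be divided.  lengths[i] and
    fvals[i] are the length and midpoint value of interval i.  (Lengths are
    assumed to compare exactly; an actual implementation would group the
    intervals by their level of subdivision.)"""
    best = {}                       # lowest midpoint value seen for each length
    for i, d in enumerate(lengths):
        if d not in best or fvals[i] < fvals[best[d]]:
            best[d] = i
    cand = sorted(best.values(), key=lambda i: lengths[i])
    hull = []                       # lower convex hull of the (length, value) points
    for i in cand:
        while len(hull) >= 2:
            i0, i1 = hull[-2], hull[-1]
            s01 = (fvals[i1] - fvals[i0]) / (lengths[i1] - lengths[i0])
            s1i = (fvals[i] - fvals[i1]) / (lengths[i] - lengths[i1])
            if s1i <= s01:          # hull[-1] lies on or above the segment i0 -> i
                hull.pop()
            else:
                break
        hull.append(i)
    imin = min(hull, key=lambda i: fvals[i])
    return [i for i in hull if lengths[i] >= lengths[imin]]
```

The slopes s01 and s1i between consecutive hull points play the role of the implicit estimates of the Lipschitz constant described above.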
The resulting algorithm may divide more than a single interval at each stage and the number
of intervals to be divided may vary. This is easy to implement for a single variable. However,
for more than one variable there are several ways to divide a hyperrectangle into parts and one
must keep track of how an interval has previously been divided in order not to cluster sample
points prematurely by repeatedly dividing an interval in the same way. Figures 8.11 and 8.12,
taken from [108], illustrate this issue for N = 2. In Figure 8.11 the entire rectangle will be
divided. Shading indicates that the rectangle has been selected for division. Four new midpoints
are sampled. The subdivision into new rectangles could be done in two ways: the ¬gure shows
an initial horizontal subdivision followed by a vertical division of the rectangle that contains the
original center. The second division is shown in Figure 8.12. The two shaded rectangles are
selected for division. Note that four new centers are added to the small square and two to the
larger, nonsquare, rectangle. In this way the minimum number of new centers is added.
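
A sketch of the division rule illustrated in Figures 8.11 and 8.12, for a hyperrectangle stored as a center and a vector of side lengths (the representation and names are assumptions, not those of [108]): sample f at c ± (d/3)e_i along each longest side of length d, then split along those coordinates one at a time, smallest sampled value first, so that the lowest values are left in the largest of the new rectangles, following the ordering used in [150].

```python
import numpy as np

def divide_rect(center, sides, fc, f):
    """Divide a hyperrectangle along its longest sides and return
    (center, sides, value) triples for the new rectangles, including the
    shrunken rectangle that keeps the original center (value fc)."""
    center = np.asarray(center, dtype=float)
    sides = np.asarray(sides, dtype=float)
    longest = np.flatnonzero(np.isclose(sides, sides.max()))
    delta = sides.max() / 3.0
    eye = np.eye(center.size)
    # sample the new midpoints: two per longest coordinate
    samples = {i: (f(center + delta * eye[i]), f(center - delta * eye[i]))
               for i in longest}
    out, s = [], sides.copy()
    # split along the coordinate with the smallest sampled value first
    for i in sorted(longest, key=lambda i: min(samples[i])):
        s[i] /= 3.0
        out.append((center + delta * eye[i], s.copy(), samples[i][0]))
        out.append((center - delta * eye[i], s.copy(), samples[i][1]))
    out.append((center, s.copy(), fc))   # original center keeps the smallest box
    return out
```

For the square in Figure 8.11 this samples four new midpoints; for the larger, nonsquare rectangle in Figure 8.12 it samples only two, matching the counts noted above.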
DIRECT parallelizes in a natural way. All hyperrectangles that are candidates for division
may be divided simultaneously, and for each hyperrectangle the function evaluations at each of
the new midpoints can also be done in parallel. We refer the reader to [150] and [108] for details
on the data structures and to [108] for a FORTRAN implementation and additional discussion
on the exploitation of parallelism.
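
Returning to the case N = 1, the two sketches above combine into a complete, if simplistic, outline of the method; the budget-based stopping rule and the function name are assumptions made for the illustration.

```python
def direct_1d(f, a, b, budget=100):
    """Sketch of DIRECT for N = 1: repeatedly select the potentially optimal
    intervals and trisect them, keeping the midpoint values in fvals."""
    intervals = [(a, b)]
    fvals = [f((a + b) / 2.0)]
    evals = 1
    while evals < budget:
        # round so that intervals of nominally equal length compare equal
        lengths = [round(bb - aa, 12) for aa, bb in intervals]
        for n in potentially_optimal(lengths, fvals):
            an, bn = intervals[n]
            h = (bn - an) / 3.0
            trisect(intervals, n)            # appends I_{K+1} and I_{K+2}
            fvals.append(f(an + h / 2.0))    # midpoint of [a_n, a_n + h]
            fvals.append(f(bn - h / 2.0))    # midpoint of [b_n - h, b_n]
            evals += 2
    i = min(range(len(fvals)), key=fvals.__getitem__)
    return sum(intervals[i]) / 2.0, fvals[i]
```

Each pass may divide several intervals and, as noted above, every interval is eventually divided, so the sample points become dense in [a, b].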





Figure 8.11: Initial Division of Rectangles with DIRECT




Figure 8.12: Second Division of Rectangles with DIRECT


8.5 Examples
In each of the examples we compare the central difference BFGS form of implicit filtering from
§7.6 (solid line) with the Nelder–Mead (dashed line), Hooke–Jeeves (solid line with circles), and
MDS (dashed-dotted line) algorithms.
For each example we specified both an initial iterate and choice of scales. This is sufficient
to initialize both implicit filtering and Hooke–Jeeves. We used the implicit filtering forward
difference stencil as the initial simplex for both Nelder–Mead and MDS.
The plots reflect the differences in the startup procedures for the varying algorithms. In
particular, Nelder–Mead and MDS sort the simplex and hence, if the initial iterate is not the best
point, report the lower value as the first iterate.
The relative performance of the various methods on these example problems should not be
taken as a definitive evaluation, nor should these examples be thought of as a complete suite of test
problems. One very significant factor that is not reflected in the results in this section is that both
implicit filtering [69], [55] and multidirectional search [85] are easy to implement in parallel,
while Nelder–Mead and Hooke–Jeeves are inherently sequential. The natural parallelism of
implicit filtering and multidirectional search can be further exploited by using idle processors to
explore other points on the line search direction or the pattern.


8.5.1 Weber's Problem
The initial data were the same as that in §7.6.1. Implicit filtering does relatively poorly for this
problem because of the nonsmoothness at optimality. The results for these problems are plotted



[Plot of function value against function evaluations.]


Figure 8.13: First Weber Example


in Figures 8.13, 8.14, and 8.15. The other three algorithms perform equally well. Note in the
third example that MDS finds a local minimum that is not the global minimum.

8.5.2 Parameter ID
In the computations reported in this section each algorithm was allowed 500 evaluations of f
and the sequence of scales was {2^{-j}}_{j=1}^{12}.
We begin with the two examples from §7.6.2. With the initial iterate of (5, 5)^T, the exact
solution to the continuous problem lies on the grid that the Hooke–Jeeves algorithm uses to search
for the solution. This explains the unusually good performance of the Hooke–Jeeves optimization
shown in both Figures 8.16 and 8.17. When the initial iterate is changed to (5.1, 5.3)^T, the
performance of Hooke–Jeeves is very different as one can see from Figures 8.18 and 8.19. The
other algorithms do not have such a sensitivity to the initial iterate for this example. We have no
explanation for the good performance turned in by the Nelder–Mead algorithm on this problem.

8.5.3 Convex Quadratics
The problems and initial data are the same as those in §7.6.3. This is an example of how sampling
algorithms can perform poorly for very simple problems and how this poor performance is made
worse by increasing the problem size. Exercise 7.7.4 illustrates this point very directly. One
would expect implicit filtering to do well since a central difference gradient has no error for
quadratic problems. For the larger problem (N = 32, Figures 8.21 and 8.23), both the Nelder–
Mead and MDS algorithms perform poorly while the Hooke–Jeeves algorithm does surprisingly
well. The difference in performance of the algorithms is much smaller for the low-dimensional
problem (N = 4, Figures 8.20 and 8.22).







[Plot of function value against function evaluations.]


Figure 8.14: Second Weber Example




[Plot of function value against function evaluations.]


Figure 8.15: Third Weber Example







[Plot of function value (log scale) against function evaluations.]


Figure 8.16: Parameter ID, tol = 10^{-3}, x_0 = (5, 5)^T



