BASIC CONCEPTS AND GOALS 117

Theorem 6.2.8, like Theorem 6.2.4, motivates using a small simplex gradient as a test for convergence. Suppose ‖φ‖_∞ ≤ ε and an algorithm generates sequences of simplices whose vertices are intended to approximate a minimizer of f_s. We can use the results in §2.3.1 to conclude that simplices with σ_+(S) ≪ ε^{1/2} will result in inaccurate forward difference gradients and those with σ_+(S) ≪ ε^{2/3} in inaccurate central difference gradients. This indicates that the central simplex gradient will be less sensitive to noise than the forward. While this is not usually critical in computing a difference Hessian, where the loss of accuracy may cause slow convergence, it can cause failure of the iteration if one is computing a difference gradient. If one wants to terminate the algorithm when the simplex gradient is small, say, ≤ τ, a rough estimate of the minimal possible value of τ is τ = O(ε^{1/2}) for a forward difference simplex gradient and τ = O(ε^{2/3}) for a central simplex gradient.

Moreover, if one is using a centered difference, one has information on the values of f at enough points to make an important qualitative judgment. In order to evaluate a central simplex gradient, f must be sampled at x_1 and x_1 ± v_j for 1 ≤ j ≤ N. If f(x_1) ≤ f(x_1 ± v_j) for all 1 ≤ j ≤ N, then one can question the validity of using the simplex gradient as a descent direction or as a measure of stationarity. We call this stencil failure. We will use stencil failure as a termination criterion in most of the algorithms we discuss in this part of the book. Our basis for that is a result from [29], which only requires differentiability of f_s.
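The stencil-failure test described above is easy to code. The sketch below is ours, not from the text: f is a function handle, x1 the stencil center, and the columns of V the directions v_j. It returns True exactly when f(x_1) ≤ f(x_1 ± v_j) for every j, i.e., when the center is the best point sampled.

```python
import numpy as np

def stencil_failure(f, x1, V):
    """Detect stencil failure for a central-difference stencil.

    Returns True when f(x1) <= f(x1 + v_j) and f(x1) <= f(x1 - v_j)
    for every column v_j of V, so the simplex gradient is suspect as a
    descent direction or as a measure of stationarity.
    """
    fc = f(x1)
    for j in range(V.shape[1]):
        v = V[:, j]
        if f(x1 + v) < fc or f(x1 - v) < fc:
            return False
    return True
```

For a smooth function the test fires near a minimizer; for a noisy f it can also fire when the stencil is too small to see through the noise, which is why it is useful as a termination criterion.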

Theorem 6.2.9. Let S be a nonsingular simplex such that for some µ_− ∈ (0, 1) and κ_+ > 0,

κ(V) ≤ κ_+ and x^T V V^T x ≥ µ_− σ_+(S)² ‖x‖² for all x.   (6.6)

Let f satisfy (6.1) and let f_s be Lipschitz continuously differentiable in a ball B of radius 2σ_+(S) about x_1. Assume that

f(x_1) < min_j {f(x_1 ± v_j)}.   (6.7)

Then, if K is the constant from Lemma 6.2.2,

‖∇f_s(x_1)‖ ≤ 8µ_−^{-1} Kκ_+ (σ_+(S) + ‖φ‖_B / σ_+(S)).   (6.8)

Proof. Let R(S), the reflected simplex, have vertices x_1 and {r_j}_{j=1}^N. (6.7) implies that each component of δ(f : S) and δ(f : R) is positive. Now since

V = V(S) = −V(R),

we must have

0 < δ(f : S)^T δ(f : R)
  = (V^T V^{-T} δ(f : S))^T (V(R)^T V(R)^{-T} δ(f : R))   (6.9)
  = −D(f : S)^T V V^T D(f : R).

We apply Lemma 6.2.2 to both D(f : S) and D(f : R) to obtain

D(f : S) = ∇f_s(x_1) + E_1 and D(f : R) = ∇f_s(x_1) + E_2,

where, since κ(V) = κ(V(R)) ≤ κ_+,

‖E_k‖ ≤ Kκ_+ (σ_+(S) + ‖φ‖_B / σ_+(S)).

Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR18.html.

Copyright ©1999 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.

118 ITERATIVE METHODS FOR OPTIMIZATION

Since ‖V‖ ≤ 2σ_+(S) we have by (6.9)

∇f_s(x_1)^T V V^T ∇f_s(x_1) ≤ 4σ_+(S)² ‖∇f_s(x_1)‖ (‖E_1‖ + ‖E_2‖) + 4σ_+(S)² ‖E_1‖ ‖E_2‖.   (6.10)

The assumptions of the lemma give a lower estimate of the left side of (6.10),

w^T V V^T w ≥ µ_− σ_+(S)² ‖w‖².

Hence,

‖∇f_s(x_1)‖² ≤ b ‖∇f_s(x_1)‖ + c,

where, using (6.10),

b = 8µ_−^{-1} Kκ_+ (σ_+(S) + ‖φ‖_B / σ_+(S))

and

c = 4µ_−^{-1} (Kκ_+)² (σ_+(S) + ‖φ‖_B / σ_+(S))² = (µ_− / 16) b².

So b² − 4c = b²(1 − µ_−/4) and the quadratic formula then implies that

‖∇f_s(x_1)‖ ≤ (b + √(b² − 4c)) / 2 = b (1 + √(1 − µ_−/4)) / 2 ≤ b

as asserted.

6.3 Examples

Our examples are selected to represent a variety of problems that can be attacked by the methods

in this part of the book and, at the same time, are easy for the reader to implement. Many of the

problems to which these methods have been applied have complex objective functions and have

been solved as team efforts [107], [250], [121], [70], [69]. In many such cases the objective

function is not even available as a single subroutine as the optimizer, simulator, and design tool

are one package. Hence, the examples we present in this part of the book are even more artificial
than the ones in the first part. The cost of an evaluation of f is much less in these examples than

it is in practice.

6.3.1 Weber's Problem

Our discussion of this problem is based on [182]. Weber's problem is to locate a central facility (a warehouse or factory, for example) so that the total cost associated with distribution to several demand centers is minimized. The model is that the cost is proportional to the distance from the facility. The proportionality constant may be positive, reflecting transportation costs, or negative, reflecting environmental concerns.

If the locations of the demand centers are {z_i} ⊂ R² and the corresponding weights are {w_i}, then the objective function is

f(x) = ∑_i w_i ‖x − z_i‖ = ∑_i w_i √([(x)_1 − (z_i)_1]² + [(x)_2 − (z_i)_2]²).   (6.11)

We will assume that

∑_i w_i > 0,



so that a global optimum exists. If ∑_i w_i < 0, then inf f = −∞ and there is no global optimum.

Weber's problem is not differentiable at x = z_i because the square root function is not differentiable at 0. A gradient-based algorithm, applied in a naive way, will have difficulty with this problem. There are special-purpose algorithms (see [182] for a survey) for Weber's problem, especially if all the weights are positive. Our main interest is in the case where at least one weight is negative. In that case there may be multiple local minima.

We will consider three examples. The first, and simplest, is from [182]. This example has three demand centers with

w = (2, 4, −5)^T and (z_1, z_2, z_3) = (  2  90  43 )
                                       ( 42  11  88 ).

The global minimum is at x* = (90, 11)^T, at which the gradient is not defined. The complex contours near the minimizer in Figure 6.2 illustrate the difficulty of the problem.
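A direct translation of (6.11) for this example is below; the layout (demand centers as columns of Z) mirrors the display above, and the function and variable names are ours.

```python
import numpy as np

# Demand centers (columns of Z) and weights for the first example,
# as printed above.
Z = np.array([[ 2.0, 90.0, 43.0],
              [42.0, 11.0, 88.0]])
w = np.array([2.0, 4.0, -5.0])

def weber(x):
    """Weber objective (6.11): f(x) = sum_i w_i * ||x - z_i||."""
    # x[:, None] - Z broadcasts x against every center; norms taken columnwise.
    return float(np.sum(w * np.linalg.norm(x[:, None] - Z, axis=0)))
```

Note that weber is perfectly computable at x* = (90, 11)^T even though the gradient is undefined there, which is exactly why sampling methods are attractive for this problem.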


Figure 6.2: Contour/Surface for Weber's Function: First Example

Our second example has two local minimizers, at (−10, −10) and (25, 30), with the global minimizer at (25, 30). There are four demand centers with

w = (2, −4, 2, 1)^T and (z_1, z_2, z_3, z_4) = ( −10  0  5  25 )
                                               ( −10  0  8  30 ).

See Figure 6.3.

Our third example adds the oscillatory function

φ(x) = sin(.0035 x^T x) + 5 sin(.003 (x − y)^T (x − y))

to the second example, where y = (−20, 0)^T. This complicates the optimization landscape significantly, as the surface and contour plots in Figure 6.4 show.

6.3.2 Perturbed Convex Quadratics

The sum of a simple convex quadratic and a low-amplitude, high-frequency perturbation will serve as a model problem for all the algorithms in this section. For example, the function graphed on the right in Figure 6.1,

f(x) = 2x²(1 + .75 cos(80x)/12) + cos(100x)²/24,

is one of the examples in [120]. Our general form will be

f(x) = (x − ξ_0)^T H (x − ξ_0) (1 + a_1 cos(b_1^T (x − ξ_1) + c_1 (x − ξ_1)^T (x − ξ_1)))
       + a_2 (1 + cos(b_2^T (x − ξ_2) + c_2 (x − ξ_2)^T (x − ξ_2))) + a_3 |rand|,   (6.12)




Figure 6.3: Contour/Surface for Weber's Function: Second Example


Figure 6.4: Contour/Surface plots for Weber's Function: Third Example

where {ξj }, {aj }, {bj }, {cj } are given and rand is a random number generator. f has been

designed so that the minimum value is O(a1 +a2 +a3 ). The unperturbed case a1 = a2 = a3 = 0

is also of interest for many of the algorithms in this part of the book.
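For reference, the one-dimensional example from [120] quoted above can be coded in a single expression. This is a sketch of that specific instance only, not of the general form (6.12); the function name is ours.

```python
import numpy as np

def perturbed_quadratic(x):
    """1-D example from [120]: the convex quadratic 2x^2 modulated by a
    small high-frequency cosine, plus a small oscillatory additive term."""
    return 2.0 * x**2 * (1.0 + 0.75 * np.cos(80.0 * x) / 12.0) \
           + np.cos(100.0 * x)**2 / 24.0
```

The perturbation is bounded independently of x near the origin, so the minimum value of f is a small constant rather than zero, consistent with the O(a_1 + a_2 + a_3) design of (6.12).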

6.3.3 Lennard–Jones Problem

The objective function is a simple model of the potential energy in a molecule of identical atoms. Assume that there are M atoms and that ξ_i ∈ R³ is the position of the ith atom. Letting

d_ij = ‖ξ_i − ξ_j‖

and

x = (ξ_1^T, . . . , ξ_M^T)^T ∈ R^N,

where N = 3M, we have that the Lennard–Jones energy function is

f(x) = ∑_{i=1}^M ∑_{j=1}^{i−1} ( d_ij^{-12} − 2 d_ij^{-6} ).   (6.13)
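The energy (6.13) can be coded directly from the double sum. This minimal sketch (function and variable names are ours) stores the stacked positions ξ_i as rows of an M × 3 array:

```python
import numpy as np

def lennard_jones(x, M):
    """Lennard-Jones energy (6.13); x is the length-3M vector of stacked
    atomic positions (xi_1^T, ..., xi_M^T)^T."""
    xi = np.asarray(x, dtype=float).reshape(M, 3)
    f = 0.0
    for i in range(M):
        for j in range(i):
            d6 = np.sum((xi[i] - xi[j]) ** 2) ** 3   # d_ij^6
            f += 1.0 / d6**2 - 2.0 / d6              # d_ij^{-12} - 2 d_ij^{-6}
    return f
```

Each pair term is minimized at d_ij = 1 with value −1, so two atoms at unit distance give f = −1; the combinatorial interplay of many such pair terms is what produces the enormous number of local minimizers discussed below.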

f has many local minimizers (O(e^{M²}) is one conjecture [142]) and the values at the minimizers are close. Hence, the Lennard–Jones function does not conform to the noisy perturbation of a smooth function paradigm. The reader is asked in some of the exercises to see how the methods perform.



6.4 Exercises on Basic Concepts

6.4.1. Show that if w_i > 0 for all i then Weber's problem has a unique local minimum.

6.4.2. Prove Lemma 6.2.6.

6.4.3. Try to minimize the Lennard–Jones functional using some of the algorithms from the first part of the book. Vary the initial iterate and M. Compare your best results with those in [142], [40], and [210].


Chapter 7

Implicit Filtering

7.1 Description and Analysis of Implicit Filtering

The implicit filtering algorithm was originally formulated in [270], [251], and [271] as a difference-gradient implementation of the gradient projection algorithm [18] in which the difference increment is reduced in size as the iteration progresses. A different formulation for unconstrained problems with certain convexity properties was introduced at about the same time in [279]. From the point of view of this book, the simplex gradient is used in a direct way. The algorithmic description and analysis in this chapter use the results from §6.2 directly. We will focus on unconstrained problems and derive the convergence results that implicit filtering shares with the search algorithms in Chapter 8.

Implicit filtering, by using an approximate gradient directly, offers the possibility of improved performance with quasi-Newton methods and can be easily applied to bound constrained problems. We explore these two possibilities in §§7.2 and 7.4.

In its simplest unconstrained form, implicit filtering is the steepest descent algorithm with difference gradients, where the difference increment varies as the iteration progresses. Because the gradient is only an approximation, the computed steepest descent direction may fail to be a descent direction and the line search may fail. In this event, the difference increment is reduced.

For a given x ∈ R^N and h > 0 we let the simplex S(x, h) be the right simplex from x with edges having length h. Hence the vertices are x and x + hv_i for 1 ≤ i ≤ N, with V = hI, so κ(V) = 1. The performance of implicit filtering with a central difference gradient is far superior to that with the forward difference gradient [120], [187], [250]. We will, therefore, use centered differences in the discussion. We illustrate the performance of forward difference gradients in §7.6.

We set

∇_h f(x) = D_C(f : S(x, h)).
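Because S(x, h) is the right simplex with edge length h, this central simplex gradient reduces to the standard componentwise central difference, at a cost of 2N evaluations of f. A minimal sketch (the function name is ours):

```python
import numpy as np

def grad_h(f, x, h):
    """Central simplex gradient D_C(f : S(x, h)) for the right simplex
    with vertices x and x +/- h*e_i: componentwise central differences."""
    n = len(x)
    g = np.zeros(n)
    for i in range(n):
        e = np.zeros(n)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g
```

Central differences are exact for quadratics, which is one reason the central simplex gradient behaves so much better than the forward one on the perturbed-quadratic model problems of §6.3.2.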

We use a simple Armijo [7] line search and demand that the sufficient decrease condition

f(x − λ∇_h f(x)) − f(x) < −αλ ‖∇_h f(x)‖²   (7.1)

holds (compare with (3.4)) for some α > 0.
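A backtracking search enforcing (7.1) can be sketched as follows; the difference gradient g = ∇_h f(x) is passed in as a vector, and the values of alpha and the backtrack limit are illustrative choices of ours, not prescriptions from the text.

```python
import numpy as np

def armijo_step(f, x, g, alpha=1e-4, max_backtracks=10):
    """Backtracking line search for the sufficient decrease condition (7.1):
    f(x - lam*g) - f(x) < -alpha*lam*||g||^2.  Returns the new point, or
    None if no step satisfies (7.1) within max_backtracks halvings."""
    fx = f(x)
    lam = 1.0
    for _ in range(max_backtracks):
        if f(x - lam * g) - fx < -alpha * lam * np.dot(g, g):
            return x - lam * g
        lam *= 0.5
    return None
```

Because g is only a difference approximation, the search can fail even on smooth functions; as described above, implicit filtering responds to such failures by reducing the difference increment h rather than aborting.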

Our central difference steepest descent algorithm fdsteep terminates when

‖∇_h f(x)‖ ≤ τh   (7.2)

for some τ > 0, when more than pmax iterations have been taken, after a stencil failure, or when the line search fails by taking more than amax backtracks. Even the failures of fdsteep

