Copyright ©1999 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.




Contents

Preface xiii

How to Get the Software xv

I Optimization of Smooth Functions 1
1 Basic Concepts 3
1.1 The Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Necessary Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Sufficient Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5 Quadratic Objective Functions . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5.1 Positive Definite Hessian . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5.2 Indefinite Hessian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.6 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.6.1 Discrete Optimal Control . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.6.2 Parameter Identification . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.6.3 Convex Quadratics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.7 Exercises on Basic Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2 Local Convergence of Newton's Method 13
2.1 Types of Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 The Standard Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3 Newton's Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3.1 Errors in Functions, Gradients, and Hessians . . . . . . . . . . . . . . 17
2.3.2 Termination of the Iteration . . . . . . . . . . . . . . . . . . . . . . . 21
2.4 Nonlinear Least Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.4.1 Gauss–Newton Iteration . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.4.2 Overdetermined Problems . . . . . . . . . . . . . . . . . . . . . . . . 24
2.4.3 Underdetermined Problems . . . . . . . . . . . . . . . . . . . . . . . 25
2.5 Inexact Newton Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.5.1 Convergence Rates . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.5.2 Implementation of Newton–CG . . . . . . . . . . . . . . . . . . . . . 30
2.6 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.6.1 Parameter Identification . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.6.2 Discrete Control Problem . . . . . . . . . . . . . . . . . . . . . . . . 34
2.7 Exercises on Local Convergence . . . . . . . . . . . . . . . . . . . . . . . . . 35

Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR18.html.

3 Global Convergence 39
3.1 The Method of Steepest Descent . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.2 Line Search Methods and the Armijo Rule . . . . . . . . . . . . . . . . . . . . 40
3.2.1 Stepsize Control with Polynomial Models . . . . . . . . . . . . . . . . 43
3.2.2 Slow Convergence of Steepest Descent . . . . . . . . . . . . . . . . . 45
3.2.3 Damped Gauss–Newton Iteration . . . . . . . . . . . . . . . . . . . . 47
3.2.4 Nonlinear Conjugate Gradient Methods . . . . . . . . . . . . . . . . . 48
3.3 Trust Region Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.3.1 Changing the Trust Region and the Step . . . . . . . . . . . . . . . . . 51
3.3.2 Global Convergence of Trust Region Algorithms . . . . . . . . . . . . 52
3.3.3 A Unidirectional Trust Region Algorithm . . . . . . . . . . . . . . . . 54
3.3.4 The Exact Solution of the Trust Region Problem . . . . . . . . . . . . 55
3.3.5 The Levenberg–Marquardt Parameter . . . . . . . . . . . . . . . . . . 56
3.3.6 Superlinear Convergence: The Dogleg . . . . . . . . . . . . . . . . . . 58
3.3.7 A Trust Region Method for Newton–CG . . . . . . . . . . . . . . . . . 63
3.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.4.1 Parameter Identification . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.4.2 Discrete Control Problem . . . . . . . . . . . . . . . . . . . . . . . . 68
3.5 Exercises on Global Convergence . . . . . . . . . . . . . . . . . . . . . . . . 68

4 The BFGS Method 71
4.1 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.1.1 Local Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.1.2 Global Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.2.1 Storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.2.2 A BFGS–Armijo Algorithm . . . . . . . . . . . . . . . . . . . . . . . 80
4.3 Other Quasi-Newton Methods . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.4.1 Parameter ID Problem . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.4.2 Discrete Control Problem . . . . . . . . . . . . . . . . . . . . . . . . 83
4.5 Exercises on BFGS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

5 Simple Bound Constraints 87
5.1 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
5.2 Necessary Conditions for Optimality . . . . . . . . . . . . . . . . . . . . . . . 87
5.3 Sufficient Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
5.4 The Gradient Projection Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 91
5.4.1 Termination of the Iteration . . . . . . . . . . . . . . . . . . . . . . . 91
5.4.2 Convergence Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.4.3 Identification of the Active Set . . . . . . . . . . . . . . . . . . . . . . 95
5.4.4 A Proof of Theorem 5.2.4 . . . . . . . . . . . . . . . . . . . . . . . . 96
5.5 Superlinear Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5.5.1 The Scaled Gradient Projection Algorithm . . . . . . . . . . . . . . . . 96
5.5.2 The Projected Newton Method . . . . . . . . . . . . . . . . . . . . . . 100
5.5.3 A Projected BFGS–Armijo Algorithm . . . . . . . . . . . . . . . . . . 102
5.6 Other Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.6.1 Infinite-Dimensional Problems . . . . . . . . . . . . . . . . . . . . . . 106
5.7 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.7.1 Parameter ID Problem . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.7.2 Discrete Control Problem . . . . . . . . . . . . . . . . . . . . . . . . 106



5.8 Exercises on Bound Constrained Optimization . . . . . . . . . . . . . . . . . . 108


II Optimization of Noisy Functions 109
6 Basic Concepts and Goals 111
6.1 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
6.2 The Simplex Gradient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
6.2.1 Forward Difference Simplex Gradient . . . . . . . . . . . . . . . . . . 113
6.2.2 Centered Difference Simplex Gradient . . . . . . . . . . . . . . . . . . 115
6.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
6.3.1 Weber's Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
6.3.2 Perturbed Convex Quadratics . . . . . . . . . . . . . . . . . . . . . . 119
6.3.3 Lennard–Jones Problem . . . . . . . . . . . . . . . . . . . . . . . . . 120
6.4 Exercises on Basic Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

7 Implicit Filtering 123
7.1 Description and Analysis of Implicit Filtering . . . . . . . . . . . . . . . . . . 123
7.2 Quasi-Newton Methods and Implicit Filtering . . . . . . . . . . . . . . . . . . 124
7.3 Implementation Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . 125
7.4 Implicit Filtering for Bound Constrained Problems . . . . . . . . . . . . . . . 126
7.5 Restarting and Minima at All Scales . . . . . . . . . . . . . . . . . . . . . . . 127
7.6 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
7.6.1 Weber's Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
7.6.2 Parameter ID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
7.6.3 Convex Quadratics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
7.7 Exercises on Implicit Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . 133

8 Direct Search Algorithms 135
8.1 The Nelder–Mead Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
8.1.1 Description and Implementation . . . . . . . . . . . . . . . . . . . . . 135
8.1.2 Sufficient Decrease and the Simplex Gradient . . . . . . . . . . . . . . 137
8.1.3 McKinnon's Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 139
8.1.4 Restarting the Nelder–Mead Algorithm . . . . . . . . . . . . . . . . . 141
8.2 Multidirectional Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
8.2.1 Description and Implementation . . . . . . . . . . . . . . . . . . . . . 143
8.2.2 Convergence and the Simplex Gradient . . . . . . . . . . . . . . . . . 144
8.3 The Hooke–Jeeves Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
8.3.1 Description and Implementation . . . . . . . . . . . . . . . . . . . . . 145
8.3.2 Convergence and the Simplex Gradient . . . . . . . . . . . . . . . . . 148
8.4 Other Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
8.4.1 Surrogate Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
8.4.2 The DIRECT Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 149
8.5 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
8.5.1 Weber's Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
8.5.2 Parameter ID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
8.5.3 Convex Quadratics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
8.6 Exercises on Search Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 159



Bibliography 161

Index 177








Preface

This book on unconstrained and bound constrained optimization can be used as a tutorial for
self-study or a reference by those who solve such problems in their work. It can also serve as a
textbook in an introductory optimization course.
As in my earlier book [154] on linear and nonlinear equations, we treat a small number of
methods in depth, giving a less detailed description of only a few (for example, the nonlinear
conjugate gradient method and the DIRECT algorithm). We aim for clarity and brevity rather
than complete generality and confine our scope to algorithms that are easy to implement (by the
reader!) and understand.
One consequence of this approach is that the algorithms in this book are often special cases
of more general ones in the literature. For example, in Chapter 3, we provide details only
for trust region globalizations of Newton's method for unconstrained problems and line search
globalizations of the BFGS quasi-Newton method for unconstrained and bound constrained
problems. We refer the reader to the literature for more general results. Our intention is that
both our algorithms and proofs, being special cases, are more concise and simple than others in
the literature and illustrate the central issues more clearly than a fully general formulation.
Part II of this book covers some algorithms for noisy or global optimization or both. There
are many interesting algorithms in this class, and this book is limited to those deterministic
algorithms that can be implemented in a more-or-less straightforward way. We do not, for
example, cover simulated annealing, genetic algorithms, response surface methods, or random
search procedures.
The reader of this book should be familiar with the material in an elementary graduate level
course in numerical analysis, in particular direct and iterative methods for the solution of linear
equations and linear least squares problems. The material in texts such as [127] and [264] is
sufficient.
A suite of MATLAB* codes has been written to accompany this book. These codes were
used to generate the computational examples in the book, but the algorithms do not depend
on the MATLAB environment and the reader can easily implement the algorithms in another
language, either directly from the algorithmic descriptions or by translating the MATLAB code.
The MATLAB environment is an excellent choice for experimentation, doing the exercises, and
small-to-medium-scale production work. Large-scale work on high-performance computers is
best done in another language. The reader should also be aware that there is a large amount of
high-quality software available for optimization. The book [195], for example, provides pointers
to several useful packages.
Parts of this book are based upon work supported by the National Science Foundation over
several years, most recently under National Science Foundation grants DMS-9321938, DMS-
9700569, and DMS-9714811, and by allocations of computing resources from the North Carolina
Supercomputing Center. Any opinions, findings, and conclusions or recommendations expressed
* MATLAB is a registered trademark of The MathWorks, Inc., 24 Prime Park Way, Natick, MA 01760, USA, (508) 653-1415, info@mathworks.com, http://www.mathworks.com.







in this material are those of the author and do not necessarily reflect the views of the National
Science Foundation or of the North Carolina Supercomputing Center.
The list of students and colleagues who have helped me with this project, directly, through
collaborations/discussions on issues that I treat in the manuscript, by providing pointers to the
literature, or as a source of inspiration, is long. I am particularly indebted to Tom Banks, Jim
Banoczi, John Betts, David Bortz, Steve Campbell, Tony Choi, Andy Conn, Douglas Cooper, Joe
David, John Dennis, Owen Eslinger, Jörg Gablonsky, Paul Gilmore, Matthias Heinkenschloß,
Laura Helfrich, Lea Jenkins, Vickie Kearn, Carl and Betty Kelley, Debbie Lockhart, Casey Miller,
Jorge Moré, Mary Rose Muccie, John Nelder, Chung-Wei Ng, Deborah Poulson, Ekkehard
Sachs, Dave Shanno, Joseph Skudlarek, Dan Sorensen, John Strikwerda, Mike Tocci, Jon Tolle,
Virginia Torczon, Floria Tosca, Hien Tran, Margaret Wright, Steve Wright, and Kevin Yoemans.



C. T. Kelley
Raleigh, North Carolina








How to Get the Software

All computations reported in this book were done in MATLAB (version 5.2 on various SUN
SPARCstations and on an Apple Macintosh Powerbook 2400). The suite of MATLAB codes that
we used for the examples is available by anonymous ftp from ftp.math.ncsu.edu in the directory

FTP/kelley/optimization/matlab

or from SIAM's World Wide Web server at

http://www.siam.org/books/fr18/

One can obtain MATLAB from
The MathWorks, Inc.
3 Apple Hill Drive
Natick, MA 01760-2098
(508) 647-7000
Fax: (508) 647-7001
E-mail: info@mathworks.com
WWW: http://www.mathworks.com








Part I

Optimization of Smooth Functions












Chapter 1

Basic Concepts

1.1 The Problem
The unconstrained optimization problem is to minimize a real-valued function f of N variables.
By this we mean to find a local minimizer, that is, a point x* such that

(1.1)    f(x*) ≤ f(x) for all x near x*.

It is standard to express this problem as

(1.2)    min_x f(x)

or to say that we seek to solve the problem min f. The understanding is that (1.1) means that we
seek a local minimizer. We will refer to f as the objective function and to f(x*) as the minimum
or minimum value. If a local minimizer x* exists, we say a minimum is attained at x*.
We say that problem (1.2) is unconstrained because we impose no conditions on the independent
variables x and assume that f is defined for all x.
The local minimization problem is different from (and much easier than) the global
minimization problem in which a global minimizer, a point x* such that

(1.3)    f(x*) ≤ f(x) for all x,

is sought.
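To make the distinction concrete: the function f(x) = x⁴ − 3x² + x (an illustrative objective chosen here, not one of the book's examples) has a local minimizer near x ≈ 1.13 whose value exceeds that of the global minimizer near x ≈ −1.30. The sketch below locates both on a fine grid; it is written in Python/NumPy for convenience, although the codes accompanying this book are in MATLAB.

```python
import numpy as np

# An illustrative objective with two minimizers (not from the book).
def f(x):
    return x**4 - 3.0 * x**2 + x

xs = np.linspace(-2.0, 2.0, 400001)
fs = f(xs)

# Global minimizer: smallest value over the whole interval.
x_global = xs[np.argmin(fs)]

# Local (non-global) minimizer: restrict attention to x > 0.
pos = xs > 0
x_local = xs[pos][np.argmin(fs[pos])]

print(x_global)                  # about -1.30
print(x_local)                   # about  1.13
print(f(x_global) < f(x_local))  # True: this local minimum is not global
```

A gradient-based method started near x = 2 would typically converge to the local minimizer at x ≈ 1.13 and never see the global one, which is exactly why the local problem is the easier one.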
The constrained optimization problem is to minimize a function f over a set U ⊂ R^N. A
local minimizer, therefore, is an x* ∈ U such that

(1.4)    f(x*) ≤ f(x) for all x ∈ U near x*.

Similar to (1.2) we express this as

(1.5)    min_{x ∈ U} f(x)

or say that we seek to solve the problem min_U f. A global minimizer is a point x* ∈ U such
that

(1.6)    f(x*) ≤ f(x) for all x ∈ U.
We consider only the simplest constrained problems in this book (Chapter 5 and §7.4) and refer
the reader to [104], [117], [195], and [66] for deeper discussions of constrained optimization
and pointers to software.
Having posed an optimization problem one can proceed in the classical way and use methods
that require smoothness of f. That is the approach we take in this first part of the book. These


methods can fail if the objective function has discontinuities or irregularities. Such nonsmooth
effects are common and can be caused, for example, by truncation error in internal calculations
for f , noise in internal probabilistic modeling in f , table lookup within f , or use of experimental
data in f . We address a class of methods for dealing with such problems in Part II.


1.2 Notation
In this book, following the convention in [154], vectors are to be understood as column vectors.
The vector x* will denote a solution, x a potential solution, and {x_k}_{k≥0} the sequence of iterates.
We will refer to x_0 as the initial iterate. x_0 is sometimes timidly called the initial guess. We will
denote the ith component of a vector x by (x)_i (note the parentheses) and the ith component
of x_k by (x_k)_i. We will rarely need to refer to individual components of vectors. We will let
∂f/∂x_i denote the partial derivative of f with respect to (x)_i. As is standard [154], e = x − x*
will denote the error, e_n = x_n − x* the error in the nth iterate, and B(r) the ball of radius r
about x*:

B(r) = {x | ‖e‖ < r}.

For x ∈ R^N we let ∇f(x) ∈ R^N denote the gradient of f,

∇f(x) = (∂f/∂x_1, . . . , ∂f/∂x_N),

when it exists.
We let ∇²f denote the Hessian of f,

(∇²f)_ij = ∂²f/∂x_i ∂x_j,

when it exists. Note that ∇²f is the Jacobian of ∇f. However, ∇²f has more structure than
a Jacobian for a general nonlinear function. If f is twice continuously differentiable, then the
Hessian is symmetric ((∇²f)_ij = (∇²f)_ji) by equality of mixed partial derivatives [229].
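The symmetry of the Hessian is easy to observe numerically. The fragment below (a Python/NumPy illustration with a made-up test function, not part of the book's MATLAB suite) approximates ∇²f by second-order central differences; for a smooth f the mixed partials agree to within the differencing error.

```python
import numpy as np

def hessian_fd(f, x, h=1e-5):
    """Central-difference approximation to the Hessian of f at x."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            # (H)_ij ~ [f(x+ei+ej) - f(x+ei-ej) - f(x-ei+ej) + f(x-ei-ej)] / (4 h^2)
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * h * h)
    return H

# A twice continuously differentiable example, chosen only for illustration.
f = lambda x: x[0]**2 * x[1] + np.sin(x[0] * x[1])
H = hessian_fd(f, np.array([0.7, -0.4]))

print(np.max(np.abs(H - H.T)))  # near zero: the mixed partials agree
```

For a general nonlinear map the finite-difference Jacobian would have no reason to be symmetric; the near-zero residual above is the extra structure the text refers to.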
In this book we will consistently use the Euclidean norm

‖x‖ = ( Σ_{i=1}^N (x)_i² )^{1/2}.

When we refer to a matrix norm we will mean the matrix norm induced by the Euclidean norm,

‖A‖ = max_{x ≠ 0} ‖Ax‖ / ‖x‖.
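As a quick illustration of both norms (again in Python/NumPy rather than the book's MATLAB, with a made-up matrix), the induced norm of a diagonal matrix is its largest-magnitude diagonal entry, and the ratio ‖Ax‖/‖x‖ never exceeds ‖A‖ for any nonzero x:

```python
import numpy as np

x = np.array([3.0, -4.0])
norm_x = np.sqrt(np.sum(x**2))      # Euclidean norm: 5.0

A = np.array([[2.0, 0.0],
              [0.0, -3.0]])
# The induced 2-norm equals the largest singular value of A (here 3.0).
norm_A = np.linalg.norm(A, 2)

# ||Ax|| <= ||A|| ||x|| for every nonzero x; the bound is nearly attained.
rng = np.random.default_rng(0)
ratios = [np.linalg.norm(A @ v) / np.linalg.norm(v)
          for v in rng.standard_normal((20000, 2))]

print(norm_x)       # 5.0
print(norm_A)       # 3.0
print(max(ratios))  # just below 3.0
```

The maximum in the definition is attained at the corresponding singular vector (here, a multiple of the second coordinate axis), which is why random sampling approaches ‖A‖ from below.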
