k → ∞.

For the proofs of the above results, see [Wol69] and [Wol71].

7.2.7 Secant-like methods

In quasi-Newton methods the Hessian matrix H is replaced by a suitable approximation. Precisely, the generic iterate is

x^(k+1) = x^(k) − B_k⁻¹ ∇f(x^(k)) = x^(k) + s^(k).

Assume that f : ℝ^n → ℝ is of class C² on an open convex set D ⊂ ℝ^n. In such a case, H is symmetric and, as a consequence, the approximants B_k of H ought to be symmetric. Moreover, if B_k were symmetric at a point x^(k), we would also like the next approximant B_{k+1} to be symmetric at x^(k+1) = x^(k) + s^(k).

To generate B_{k+1} starting from B_k, consider the Taylor expansion

∇f(x^(k)) = ∇f(x^(k+1)) + B_{k+1}(x^(k) − x^(k+1)),

from which we get the secant condition

B_{k+1} s^(k) = y^(k),   with y^(k) = ∇f(x^(k+1)) − ∇f(x^(k)).

Using again a series expansion of B, we end up with the following first-order approximation of H

B_{k+1} = B_k + (y^(k) − B_k s^(k)) c^T / (c^T s^(k)),     (7.42)

where c ∈ ℝ^n and having assumed that c^T s^(k) ≠ 0. We notice that taking c = s^(k) yields Broyden's method, already discussed in Section 7.1.4 in the case of systems of nonlinear equations.
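The algebra behind (7.42) can be checked numerically. The following Python/NumPy sketch (our illustration, not a program from the text) builds the rank-one update for an arbitrary admissible c and verifies that the secant condition B_{k+1} s^(k) = y^(k) holds; with c = s^(k) it is exactly Broyden's update.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
Bk = rng.standard_normal((n, n))   # current approximant B_k of H
s = rng.standard_normal(n)         # step s^(k)
y = rng.standard_normal(n)         # gradient difference y^(k)

def update_742(Bk, s, y, c):
    """Rank-one update (7.42): B_{k+1} = B_k + (y - B_k s) c^T / (c^T s)."""
    assert abs(c @ s) > 1e-12      # the update requires c^T s != 0
    return Bk + np.outer(y - Bk @ s, c) / (c @ s)

# any admissible c enforces the secant condition B_{k+1} s = y
for c in (s, rng.standard_normal(n)):
    B1 = update_742(Bk, s, y, c)
    assert np.allclose(B1 @ s, y)
```

Note that the update changes B_k only on the rank-one range spanned by c, which is why a single secant condition suffices to determine it.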

310 7. Nonlinear Systems and Numerical Optimization

Since (7.42) does not guarantee that B_{k+1} is symmetric, it must be suitably modified. A way of constructing a symmetric approximant B_{k+1} consists of choosing c = y^(k) − B_k s^(k) in (7.42), assuming that (y^(k) − B_k s^(k))^T s^(k) ≠ 0. By so doing, the following symmetric first-order approximation is obtained

B_{k+1} = B_k + (y^(k) − B_k s^(k)) (y^(k) − B_k s^(k))^T / ((y^(k) − B_k s^(k))^T s^(k)).     (7.43)

From a computational standpoint, disposing of an approximation of H is not completely satisfactory, since it is the inverse of this approximation that appears in the iterative methods we are dealing with. Using the Sherman-Morrison formula (3.57), with C_k = B_k⁻¹, yields the following recursive formula for computing the inverse

C_{k+1} = C_k + (s^(k) − C_k y^(k)) (s^(k) − C_k y^(k))^T / ((s^(k) − C_k y^(k))^T y^(k)),   k = 0, 1, . . .     (7.44)

having assumed that y^(k) = B s^(k), where B is a symmetric nonsingular matrix, and that (s^(k) − C_k y^(k))^T y^(k) ≠ 0.
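The duality between (7.43) and (7.44) can be verified directly: if C_k = B_k⁻¹ and B_k is symmetric, the inverse produced by (7.44) coincides with the inverse of the matrix produced by (7.43). A minimal NumPy check of ours (random test data, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n))
Bk = A + A.T + 2 * n * np.eye(n)        # symmetric, nonsingular B_k
Ck = np.linalg.inv(Bk)                  # C_k = B_k^{-1}
s = rng.standard_normal(n)
y = rng.standard_normal(n)

v = y - Bk @ s
B1 = Bk + np.outer(v, v) / (v @ s)      # symmetric rank-one update (7.43)

w = s - Ck @ y
C1 = Ck + np.outer(w, w) / (w @ y)      # inverse update (7.44)

assert np.allclose(B1, B1.T)            # symmetry is preserved
assert np.allclose(B1 @ s, y)           # secant condition B_{k+1} s = y
assert np.allclose(C1 @ y, s)           # dual secant condition C_{k+1} y = s
assert np.allclose(C1, np.linalg.inv(B1))  # (7.44) really inverts (7.43)
```

Maintaining C_k directly avoids solving a linear system at every quasi-Newton step.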

An algorithm that employs the approximations (7.43) or (7.44) is potentially unstable when (s^(k) − C_k y^(k))^T y^(k) ≈ 0, due to rounding errors. For this reason, it is convenient to set up the previous scheme in a more stable form. To this end, instead of (7.42), we introduce the approximation

B^(1)_{k+1} = B_k + (y^(k) − B_k s^(k)) c^T / (c^T s^(k)),

then we define B^(2)_{k+1} as being its symmetric part

B^(2)_{k+1} = (B^(1)_{k+1} + (B^(1)_{k+1})^T) / 2.

The procedure can be iterated as follows

B^(2j+1)_{k+1} = B^(2j)_{k+1} + (y^(k) − B^(2j)_{k+1} s^(k)) c^T / (c^T s^(k)),

B^(2j+2)_{k+1} = (B^(2j+1)_{k+1} + (B^(2j+1)_{k+1})^T) / 2,     (7.45)

with j = 0, 1, . . . and having set B^(0)_{k+1} = B_k. It can be shown that the limit as j tends to infinity of (7.45) is

lim_{j→∞} B^(j)_{k+1} = B_{k+1} = B_k + [(y^(k) − B_k s^(k)) c^T + c (y^(k) − B_k s^(k))^T] / (c^T s^(k))
                         − [(y^(k) − B_k s^(k))^T s^(k) / (c^T s^(k))²] c c^T,     (7.46)


having assumed that c^T s^(k) ≠ 0. If c = s^(k), the method employing (7.46) is known as the symmetric Powell-Broyden method. Denoting by B_SPB the corresponding matrix B_{k+1}, it can be shown that B_SPB is the unique solution to the problem:

find B̄ such that ‖B̄ − B_k‖_F is minimized,

where B̄ s^(k) = y^(k) and ‖ · ‖_F is the Frobenius norm.

As for the error made approximating H(x^(k+1)) by B_SPB, it can be proved that

‖B_SPB − H(x^(k+1))‖_F ≤ ‖B_k − H(x^(k))‖_F + 3L ‖s^(k)‖,

where it is assumed that H is Lipschitz continuous, with Lipschitz constant L, and that the iterates x^(k+1) and x^(k) belong to D.

To deal with the particular case in which the Hessian matrix is not only symmetric but also positive definite, we refer to [DS83], Section 9.2.

7.3 Constrained Optimization

The simplest case of constrained optimization can be formulated as follows. Given f : ℝ^n → ℝ,

minimize f(x), with x ∈ Ω ⊂ ℝ^n.     (7.47)

More precisely, the point x* is said to be a global minimizer in Ω if it satisfies (7.47), while it is a local minimizer if ∃R > 0 such that

f(x*) ≤ f(x),   ∀x ∈ B(x*; R) ⊂ Ω.

Existence of solutions to problem (7.47) is, for instance, ensured by the Weierstrass theorem in the case in which f is continuous and Ω is a closed and bounded set. Under the assumption that Ω is a convex set, the following optimality conditions hold.

Property 7.9 Let Ω ⊂ ℝ^n be a convex set, x* ∈ Ω and f ∈ C¹(B(x*; R)), for a suitable R > 0. Then:

1. if x* is a local minimizer of f, then

∇f(x*)^T (x − x*) ≥ 0,   ∀x ∈ Ω;     (7.48)

2. moreover, if f is convex on Ω (see (7.21)) and (7.48) is satisfied, then x* is a global minimizer of f.
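Condition (7.48) is most interesting when the minimizer lies on the boundary of Ω, where ∇f(x*) need not vanish. The toy problem below (our own choice of f and Ω, purely illustrative) takes Ω = [0,1]² and f(x) = (x₁+1)² + (x₂+1)², whose minimizer over Ω is x* = (0,0), and samples the variational inequality:

```python
import numpy as np

# f(x) = (x1+1)^2 + (x2+1)^2 is convex; over Omega = [0,1]^2 its minimizer
# is x* = (0,0), a boundary point where grad f(x*) = (2,2) != 0.
grad_f = lambda x: 2.0 * (x + 1.0)
x_star = np.zeros(2)

rng = np.random.default_rng(3)
pts = rng.uniform(0.0, 1.0, size=(1000, 2))    # samples of Omega
vals = (pts - x_star) @ grad_f(x_star)         # grad f(x*)^T (x - x*)
assert np.all(vals >= 0)                       # condition (7.48) holds
```

Since f is also convex on Ω, part 2 of Property 7.9 then guarantees that x* is in fact a global minimizer over Ω.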


We recall that f : Ω → ℝ is a strongly convex function if ∃ρ > 0 such that

f[αx + (1 − α)y] ≤ αf(x) + (1 − α)f(y) − α(1 − α)ρ ‖x − y‖₂²,     (7.49)

∀x, y ∈ Ω and ∀α ∈ [0, 1]. The following result holds.

Property 7.10 Let Ω ⊂ ℝ^n be a closed and convex set and f be a strongly convex function in Ω. Then there exists a unique local minimizer x* ∈ Ω.
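The defining inequality (7.49) can be checked numerically for a concrete function. The snippet below (our illustration, not from the text) verifies it for f(x) = ‖x‖₂² with ρ = 1 on random samples; a direct expansion shows that for this f the two sides of (7.49) actually agree, so the bound is sharp.

```python
import numpy as np

# f(x) = ||x||_2^2 satisfies (7.49) with rho = 1; expanding both sides shows
# they differ by exactly zero, i.e. equality holds for this particular f.
f = lambda x: x @ x
rho = 1.0

rng = np.random.default_rng(4)
for _ in range(100):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    a = rng.uniform()
    lhs = f(a * x + (1 - a) * y)
    rhs = a * f(x) + (1 - a) * f(y) - a * (1 - a) * rho * ((x - y) @ (x - y))
    assert lhs <= rhs + 1e-10      # inequality (7.49), up to rounding
```

By Property 7.10, such an f admits a unique minimizer on any closed convex Ω.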

Throughout this section we refer to [Avr76], [Ber82], [CCP70], [Lue73] and [Man69] for the proofs of the quoted results and for further details.

A remarkable instance of (7.47) is the following problem: given f : ℝ^n → ℝ,

minimize f(x), under the constraint that h(x) = 0,     (7.50)

where h : ℝ^n → ℝ^m, with m ≤ n, is a given function of components h_1, . . . , h_m. The analogues of the critical points in problem (7.50) are called the regular points.

Definition 7.2 A point x* ∈ ℝ^n, such that h(x*) = 0, is said to be regular if the column vectors of the Jacobian matrix J_h(x*) are linearly independent, having assumed that h_i ∈ C¹(B(x*; R)), for a suitable R > 0 and i = 1, . . . , m.

Our aim now is to convert problem (7.50) into an unconstrained minimization problem of the form (7.2), to which the methods introduced in Section 7.2 can be applied. For this purpose, we introduce the Lagrangian function L : ℝ^{n+m} → ℝ

L(x, λ) = f(x) + λ^T h(x),

where the vector λ is called the Lagrange multiplier. Moreover, let us denote by J_L the Jacobian matrix associated with L, but where the partial derivatives are taken only with respect to the variables x_1, . . . , x_n. The link between (7.2) and (7.50) is then expressed by the following result.

Property 7.11 Let x* be a local minimizer for (7.50) and suppose that, for a suitable R > 0, f, h_i ∈ C¹(B(x*; R)) for i = 1, . . . , m. Then there exists a unique vector λ* ∈ ℝ^m such that J_L(x*, λ*) = 0.

Conversely, assume that x* ∈ ℝ^n satisfies h(x*) = 0 and that, for a suitable R > 0 and i = 1, . . . , m, f, h_i ∈ C²(B(x*; R)). Let H_L be the matrix of entries ∂²L/∂x_i∂x_j for i, j = 1, . . . , n. If there exists a vector λ* ∈ ℝ^m such that J_L(x*, λ*) = 0 and

z^T H_L(x*, λ*) z > 0   ∀z ≠ 0 with ∇h(x*)^T z = 0,

then x* is a strict local minimizer of (7.50).
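Both parts of Property 7.11 can be exercised on a small hypothetical example (our own, not from the text): minimize f(x) = x₁ + x₂ subject to h(x) = x₁² + x₂² − 2 = 0, whose solution is x* = (−1, −1) with multiplier λ* = 1/2.

```python
import numpy as np

# Hypothetical example: minimize f(x) = x1 + x2 subject to
# h(x) = x1^2 + x2^2 - 2 = 0.  Geometry gives x* = (-1,-1), lambda* = 1/2.
grad_f = lambda x: np.array([1.0, 1.0])
grad_h = lambda x: 2.0 * x
x_star = np.array([-1.0, -1.0])
lam_star = 0.5

# first-order condition: J_L(x*, lambda*) = grad f + lambda* grad h = 0
JL = grad_f(x_star) + lam_star * grad_h(x_star)
assert np.allclose(JL, 0.0)

# second-order condition: H_L = lambda* * 2I is positive definite on the
# tangent space {z : grad h(x*)^T z = 0}, so x* is a strict local minimizer
HL = lam_star * 2.0 * np.eye(2)
z = np.array([1.0, -1.0])           # spans the tangent space at x*
assert grad_h(x_star) @ z == 0.0
assert z @ HL @ z > 0.0
```

Note that ∇h(x*) = (−2, −2) ≠ 0, so x* is a regular point in the sense of Definition 7.2 and the multiplier λ* is unique.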


The last class of problems that we are going to deal with includes the case where inequality constraints are also present, i.e.: given f : ℝ^n → ℝ,

minimize f(x), under the constraints that h(x) = 0 and g(x) ≤ 0,     (7.51)

where h : ℝ^n → ℝ^m, with m ≤ n, and g : ℝ^n → ℝ^r are two given functions. It is understood that g(x) ≤ 0 means g_i(x) ≤ 0 for i = 1, . . . , r. Inequality constraints give rise to some extra formal complication with respect to the case previously examined, but do not prevent converting the solution of (7.51) into the minimization of a suitable Lagrangian function.

In particular, Definition 7.2 becomes

Definition 7.3 Assume that h_i, g_j ∈ C¹(B(x*; R)) for a suitable R > 0, with i = 1, . . . , m and j = 1, . . . , r, and denote by J(x*) the set of indices j such that g_j(x*) = 0. A point x* ∈ ℝ^n such that h(x*) = 0 and g(x*) ≤ 0 is said to be regular if the column vectors of the Jacobian matrix J_h(x*), together with the vectors ∇g_j(x*), j ∈ J(x*), form a set of linearly independent vectors.

Finally, an analogue of Property 7.11 holds, provided that the following Lagrangian function is used instead of L

M(x, λ, μ) = f(x) + λ^T h(x) + μ^T g(x),

and that further assumptions on the constraints are made. For the sake of simplicity, we report in this case only the following necessary condition for the optimality of problem (7.51).

Property 7.12 Let x* be a regular local minimizer for (7.51) and suppose that, for a suitable R > 0, f, h_i, g_j ∈ C¹(B(x*; R)) with i = 1, . . . , m, j = 1, . . . , r. Then there exist two vectors λ* ∈ ℝ^m and μ* ∈ ℝ^r such that J_M(x*, λ*, μ*) = 0, with μ* ≥ 0 and μ_j* g_j(x*) = 0 for all j = 1, . . . , r.
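The conditions of Property 7.12 (stationarity, sign condition and complementarity) can be verified on a small hypothetical instance of (7.51) with no equality constraints (our own example, not from the text): minimize f(x) = x₁² + x₂² subject to g(x) = 1 − x₁ − x₂ ≤ 0.

```python
import numpy as np

# Hypothetical instance of (7.51): minimize f(x) = x1^2 + x2^2 subject to
# g(x) = 1 - x1 - x2 <= 0. The constraint is active at the solution
# x* = (1/2, 1/2), with multiplier mu* = 1.
grad_f = lambda x: 2.0 * x
g = lambda x: 1.0 - x[0] - x[1]
grad_g = lambda x: np.array([-1.0, -1.0])

x_star = np.array([0.5, 0.5])
mu_star = 1.0

JM = grad_f(x_star) + mu_star * grad_g(x_star)  # gradient of M in x
assert np.allclose(JM, 0.0)                     # stationarity of M
assert mu_star >= 0.0                           # sign condition mu* >= 0
assert abs(mu_star * g(x_star)) < 1e-15         # complementarity mu* g(x*) = 0
```

Since the single active gradient ∇g(x*) = (−1, −1) is nonzero, x* is regular in the sense of Definition 7.3, as Property 7.12 requires.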

7.3.1 Kuhn-Tucker Necessary Conditions for Nonlinear Programming

In this section we recall some results, known as the Kuhn-Tucker conditions [KT51], that in general ensure the existence of a local solution of the nonlinear programming problem. Under suitable assumptions they also guarantee the existence of a global solution. Throughout this section we suppose that a minimization problem can always be reformulated as a maximization one.


Let us consider the general nonlinear programming problem: given f : ℝ^n → ℝ,

maximize f(x), subject to
g_i(x) ≤ b_i,   i = 1, . . . , l,
g_i(x) ≥ b_i,   i = l + 1, . . . , k,     (7.52)
g_i(x) = b_i,   i = k + 1, . . . , m,
x ≥ 0.

A vector x that satisfies the constraints above is called a feasible solution of (7.52), and the set of feasible solutions is called the feasible region. We assume henceforth that f, g_i ∈ C¹(ℝ^n), i = 1, . . . , m, and, having denoted by x* a local maximizer of f, we define the sets

I_= = {i : g_i(x*) = b_i},   I_≠ = {i : g_i(x*) ≠ b_i},   J_= = {i : x_i* = 0},   J_> = {i : x_i* > 0}.

We associate with (7.52) the following Lagrangian

L(x, λ) = f(x) + Σ_{i=1}^{m} λ_i [b_i − g_i(x)] − Σ_{i=m+1}^{m+n} λ_i x_{i−m}.

The following result can be proved.

Property 7.13 (Kuhn-Tucker conditions I and II) If f has a constrained local maximum at the point x = x*, it is necessary that a vector λ* ∈ ℝ^{m+n} exists such that (first Kuhn-Tucker condition)

∇_x L(x*, λ*) ≤ 0,

where strict equality holds for every component i ∈ J_>. Moreover (second Kuhn-Tucker condition)

(∇_x L(x*, λ*))^T x* = 0.

The other two necessary Kuhn-Tucker conditions are as follows.

Property 7.14 Under the same hypotheses as in Property 7.13, the third Kuhn-Tucker condition requires that

(∇_λ L(x*, λ*))_i ≥ 0,   i = 1, . . . , l,
(∇_λ L(x*, λ*))_i ≤ 0,   i = l + 1, . . . , k,
(∇_λ L(x*, λ*))_i = 0,   i = k + 1, . . . , m.

Moreover (fourth Kuhn-Tucker condition)

(∇_λ L(x*, λ*))^T λ* = 0.


It is worth noticing that the Kuhn-Tucker conditions hold provided that the vector λ* exists. To ensure this, it is necessary to introduce a further geometric condition, known as a constraint qualification (see [Wal75], p. 48).

We conclude this section with the following fundamental theorem, which establishes when the Kuhn-Tucker conditions become also sufficient for the existence of a global maximizer of f.

Property 7.15 Assume that the function f in (7.52) is concave (i.e., −f is convex) in the feasible region. Suppose also that the point (x*, λ*) satisfies all the Kuhn-Tucker necessary conditions and that the functions g_i for which λ_i* > 0 are convex, while those for which λ_i* < 0 are concave. Then x* is the constrained global maximizer of f for problem (7.52).

7.3.2 The Penalty Method

The basic idea of this method is to eliminate, partly or completely, the constraints, thus transforming the constrained problem into an unconstrained one. The new problem is characterized by the presence of a parameter that measures the accuracy with which the constraint is actually imposed.

Let us consider the constrained problem (7.50), assuming that we search for the solution x* only in Ω ⊂ ℝ^n. Suppose that this problem admits at least one solution in Ω and write it in the following penalized form:

minimize L_α(x) for x ∈ Ω,     (7.53)

where

L_α(x) = f(x) + (α/2) ‖h(x)‖₂².

The function L_α : ℝ^n → ℝ is called the penalized Lagrangian and α the penalty parameter. It is clear that if the constraint were exactly satisfied, then minimizing f would be equivalent to minimizing L_α.

The penalty method is an iterative technique for solving (7.53). For k = 0, 1, . . . , until convergence, one solves the sequence of problems

minimize L_{α_k}(x) with x ∈ Ω,     (7.54)

where {α_k} is a monotonically increasing sequence of positive penalty parameters such that α_k → ∞ as k → ∞. As a consequence, after choosing α_k, at each step of the penalty process we have to solve a minimization problem with respect to the variable x, leading to a sequence of values x_k*, solutions to (7.54). By doing so, the objective function L_{α_k}(x) tends to infinity unless h(x) is equal to zero.


These minimization problems can be solved by any of the methods introduced in Section 7.2. The following property ensures the convergence of the penalty method in the form (7.53).

Property 7.16 Assume that f : ℝ^n → ℝ and h : ℝ^n → ℝ^m, with m ≤ n, are continuous functions on a closed set Ω ⊂ ℝ^n, and suppose that the sequence of penalty parameters α_k > 0 is monotonically divergent. Finally, let x_k* be the global minimizer of problem (7.54) at step k. Then, taking the limit as k → ∞, the sequence x_k* converges to x*, which is a global minimizer of f in Ω satisfying the constraint h(x*) = 0.

Regarding the selection of the parameters α_k, it can be shown that large values of α_k make the minimization problem (7.54) ill-conditioned, thus making its solution quite prohibitive unless the initial guess is particularly close to x*. On the other hand, the sequence α_k must not grow too slowly, since this would negatively affect the overall convergence of the method. A choice that is commonly made in practice is to pick a moderate value of α₀ and then set α_k = βα_{k−1} for k > 0, where β is an integer between 4 and 10 (see [Ber82]). Finally, the starting point for the numerical method used to solve the minimization problem (7.54) can be taken equal to the last computed iterate.

The penalty method is implemented in Program 63. It requires as input parameters the functions f and h, an initial value alpha0 for the penalty parameter and the factor beta.

Program 63 - lagrpen : Penalty method

function [x,vinc,nit]=lagrpen(x0,alpha0,beta,f,h,toll)
x = x0; [r,c] = size(h); vinc = 0;
for i=1:r, vinc = max(vinc,abs(eval(h(i,1:c)))); end
norm2h = ['(',h(1,1:c),')^2'];
for i=2:r, norm2h = [norm2h,'+(',h(i,1:c),')^2']; end
alpha = alpha0; options(1) = 0; options(2) = toll*0.1; nit = 0;
while vinc > toll
  g = [f,'+0.5*',num2str(alpha,16),'*',norm2h];
  [x] = fmins(g,x,options);
  vinc = 0; nit = nit + 1;
  for i=1:r, vinc = max(vinc,abs(eval(h(i,1:c)))); end
  alpha = alpha*beta;
end

Example 7.6 Let us employ the penalty method to compute the minimizer of f(x) = 100(x₂ − x₁²)² + (1 − x₁)² under the constraint h(x) = (x₁ + 0.5)² + (x₂ + 0.5)² − 0.25 = 0. The crosses in Figure 7.3 denote the sequence of iterates computed by Program 63, starting from x^(0) = (1, 1)^T and choosing α₀ = 0.1, β = 6. The method converges in 12 iterations to the value x = (−0.2463, −0.0691)^T, satisfying the constraint up to a tolerance of 10⁻⁴. •
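Example 7.6 can also be reproduced outside MATLAB. The Python sketch below is our re-creation, not the book's program: it uses scipy.optimize.minimize with the Nelder-Mead method in place of fmins, applies the schedule α_k = βα_{k−1}, and warm-starts each subproblem at the last computed iterate, as suggested above; the subproblem tolerances and the iteration cap are our own choices.

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2
h = lambda x: (x[0] + 0.5)**2 + (x[1] + 0.5)**2 - 0.25

def penalty(f, h, x0, alpha0=0.1, beta=6.0, toll=1e-4, maxit=30):
    """Penalty method for min f subject to h(x) = 0 (scalar constraint)."""
    x, alpha = np.asarray(x0, dtype=float), alpha0
    for nit in range(1, maxit + 1):
        # penalized Lagrangian L_alpha of (7.53) for the current alpha_k
        L = lambda z: f(z) + 0.5 * alpha * h(z)**2
        # warm start: reuse the last computed iterate
        x = minimize(L, x, method='Nelder-Mead',
                     options={'xatol': 1e-9, 'fatol': 1e-9}).x
        if abs(h(x)) <= toll:
            return x, nit
        alpha *= beta                  # schedule alpha_k = beta * alpha_{k-1}
    return x, nit

x, nit = penalty(f, h, np.array([1.0, 1.0]))
```

With these settings the iterates should end up near the point reported in Example 7.6, with the constraint satisfied to the prescribed tolerance.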
