
of f. If the sequence x(k) is bounded, then ∇f(x(k)) tends to zero as
k → ∞.
For the proofs of the above results, see [Wol69] and [Wol71].


7.2.7 Secant-like methods
In quasi-Newton methods the Hessian matrix H is replaced by a suitable
approximation. Precisely, the generic iterate is

x(k+1) = x(k) − Bk⁻¹ ∇f(x(k)) = x(k) + s(k).

Assume that f : Rⁿ → R is of class C² on an open convex set D ⊂ Rⁿ.
In such a case, H is symmetric and, as a consequence, the approximants Bk
of H ought to be symmetric. Moreover, if Bk were symmetric at a point
x(k), we would also like the next approximant Bk+1 to be symmetric at
x(k+1) = x(k) + s(k).
To generate Bk+1 starting from Bk, consider the Taylor expansion

∇f(x(k)) = ∇f(x(k+1)) + Bk+1 (x(k) − x(k+1)),

from which we get

Bk+1 s(k) = y(k),  with  y(k) = ∇f(x(k+1)) − ∇f(x(k)).

Using again a series expansion of B, we end up with the following first-order
approximation of H

Bk+1 = Bk + (y(k) − Bk s(k)) cT / (cT s(k)),                          (7.42)

where c ∈ Rⁿ and having assumed that cT s(k) ≠ 0. We notice that taking
c = s(k) yields Broyden's method, already discussed in Section 7.1.4 in the
case of systems of nonlinear equations.
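As a concrete check of (7.42) with c = s(k), here is a small Python sketch (an illustration added here, not the book's own code; the function name `broyden_update` is ours). The defining property of the update is that the new matrix satisfies the secant equation.

```python
import numpy as np

def broyden_update(B, s, y):
    """Rank-one secant update (7.42) with c = s: Broyden's method.

    Returns B_new satisfying the secant equation B_new @ s = y.
    Assumes s is nonzero, so that s @ s != 0.
    """
    return B + np.outer(y - B @ s, s) / (s @ s)

# Check the secant equation on random data.
rng = np.random.default_rng(0)
B = rng.standard_normal((3, 3))
s = rng.standard_normal(3)
y = rng.standard_normal(3)
B1 = broyden_update(B, s, y)
assert np.allclose(B1 @ s, y)   # secant equation B_{k+1} s(k) = y(k)
```

Note that the update touches B only by a rank-one correction, which is what makes quasi-Newton iterations cheap compared with recomputing the Hessian.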
Since (7.42) does not guarantee that Bk+1 is symmetric, it must be
suitably modified.

310 7. Nonlinear Systems and Numerical Optimization

A way of constructing a symmetric approximant Bk+1 consists of choosing
c = y(k) − Bk s(k) in (7.42), assuming that (y(k) − Bk s(k))T s(k) ≠ 0.
By so doing, the following symmetric first-order approximation is obtained

Bk+1 = Bk + (y(k) − Bk s(k)) (y(k) − Bk s(k))T / ((y(k) − Bk s(k))T s(k)).   (7.43)

From a computational standpoint, having an approximation of H available
is not completely satisfactory, since it is the inverse of the approximation of
H that appears in the iterative methods we are dealing with. Using the
Sherman-Morrison formula (3.57), with Ck = Bk⁻¹, yields the following
recursive formula for the computation of the inverse

Ck+1 = Ck + (s(k) − Ck y(k)) (s(k) − Ck y(k))T / ((s(k) − Ck y(k))T y(k)),   k = 0, 1, . . .   (7.44)

having assumed that y(k) = B s(k), where B is a symmetric nonsingular
matrix, and that (s(k) − Ck y(k))T y(k) ≠ 0.
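The two defining properties of (7.44), namely the inverse secant equation Ck+1 y(k) = s(k) and preservation of symmetry, can be verified directly; the following Python sketch (ours, with a hypothetical helper name) does so on random data.

```python
import numpy as np

def inverse_update(C, s, y):
    """Symmetric rank-one update (7.44) of the approximate inverse Hessian.

    Assumes the denominator (s - C @ y) @ y is nonzero.
    """
    u = s - C @ y
    return C + np.outer(u, u) / (u @ y)

rng = np.random.default_rng(1)
C = np.eye(3)                        # symmetric initial approximation
s = rng.standard_normal(3)
y = rng.standard_normal(3)
C1 = inverse_update(C, s, y)
assert np.allclose(C1 @ y, s)        # inverse secant equation C_{k+1} y(k) = s(k)
assert np.allclose(C1, C1.T)         # symmetry is preserved
```

The first assertion follows algebraically: C1 y = C y + u (uT y)/(uT y) = C y + (s − C y) = s.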
An algorithm that employs the approximations (7.43) or (7.44) is
potentially unstable when (s(k) − Ck y(k))T y(k) ≃ 0, due to rounding errors.
For this reason, it is convenient to set up the previous scheme in a more
stable form. To this end, instead of (7.42), we introduce the approximation

B^(1)_{k+1} = Bk + (y(k) − Bk s(k)) cT / (cT s(k)),

then we define B^(2)_{k+1} as its symmetric part

B^(2)_{k+1} = (B^(1)_{k+1} + (B^(1)_{k+1})T) / 2.
The procedure can be iterated as follows:

B^(2j+1)_{k+1} = B^(2j)_{k+1} + (y(k) − B^(2j)_{k+1} s(k)) cT / (cT s(k)),
                                                                         (7.45)
B^(2j+2)_{k+1} = (B^(2j+1)_{k+1} + (B^(2j+1)_{k+1})T) / 2,

for j = 0, 1, . . . , with k = 0, 1, . . . and having set B^(0)_{k+1} = Bk. It can
be shown that the limit of (7.45) as j tends to infinity is

lim_{j→∞} B^(j)_{k+1} = Bk+1
   = Bk + [(y(k) − Bk s(k)) cT + c (y(k) − Bk s(k))T] / (cT s(k))
        − [(y(k) − Bk s(k))T s(k) / (cT s(k))^2] c cT,                   (7.46)
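The agreement between the iteration (7.45) and its closed-form limit (7.46) can be checked numerically. The following Python sketch (an illustration we add here; the helper names are ours) takes c = s(k), alternates the rank-one correction with symmetrization, and compares the result with (7.46); the residual of the symmetrized iterate is roughly halved at each double step, so a few dozen steps reach machine precision.

```python
import numpy as np

def psb_limit(B, s, y, c):
    """Closed-form limit (7.46) of the symmetrization iteration (7.45)."""
    u = y - B @ s
    cs = c @ s
    return (B + (np.outer(u, c) + np.outer(c, u)) / cs
              - (u @ s) / cs**2 * np.outer(c, c))

def symmetrized_iteration(B, s, y, c, nsteps=60):
    """Alternate the rank-one correction with symmetrization, as in (7.45)."""
    Bj = B.copy()
    for _ in range(nsteps):
        Bj = Bj + np.outer(y - Bj @ s, c) / (c @ s)   # odd step: secant correction
        Bj = (Bj + Bj.T) / 2                          # even step: symmetrize
    return Bj

rng = np.random.default_rng(2)
B = np.eye(3)
s = rng.standard_normal(3)
y = rng.standard_normal(3)
Blim = psb_limit(B, s, y, s)       # c = s(k): symmetric Powell-Broyden update
assert np.allclose(symmetrized_iteration(B, s, y, s), Blim)
assert np.allclose(Blim, Blim.T)   # the limit is symmetric
assert np.allclose(Blim @ s, y)    # and satisfies the secant equation
```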

having assumed that cT s(k) ≠ 0. If c = s(k), the method employing (7.46)
is known as the symmetric Powell-Broyden method. Denoting by B_SPB
the corresponding matrix Bk+1, it can be shown that B_SPB is the unique
solution to the problem:

find B̄ such that ‖B̄ − Bk‖_F is minimized,

where B̄ s(k) = y(k) and ‖ · ‖_F is the Frobenius norm.
As for the error made in approximating H(x(k+1)) by B_SPB, it can be proved
that

‖B_SPB − H(x(k+1))‖_F ≤ ‖Bk − H(x(k))‖_F + 3L ‖s(k)‖,

where it is assumed that H is Lipschitz continuous, with Lipschitz constant
L, and that the iterates x(k+1) and x(k) belong to D.

To deal with the particular case in which the Hessian matrix is not only
symmetric but also positive definite, we refer to [DS83], Section 9.2.



7.3 Constrained Optimization
The simplest case of constrained optimization can be formulated as follows.
Given f : Rⁿ → R,

minimize f(x), with x ∈ Ω ⊂ Rⁿ.                                  (7.47)

More precisely, the point x∗ is said to be a global minimizer in Ω if it
satisfies (7.47), while it is a local minimizer if ∃R > 0 such that

f(x∗) ≤ f(x), ∀x ∈ B(x∗; R) ⊂ Ω.

Existence of solutions to problem (7.47) is, for instance, ensured by the
Weierstrass theorem, in the case in which f is continuous and Ω is a closed
and bounded set. Under the assumption that Ω is a convex set, the following
optimality conditions hold.

Property 7.9 Let Ω ⊂ Rⁿ be a convex set, x∗ ∈ Ω and f ∈ C¹(B(x∗; R)),
for a suitable R > 0. Then:

1. if x∗ is a local minimizer of f then

∇f(x∗)T (x − x∗) ≥ 0, ∀x ∈ Ω;                                    (7.48)

2. moreover, if f is convex on Ω (see (7.21)) and (7.48) is satisfied, then
x∗ is a global minimizer of f.
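Condition (7.48) is weaker than ∇f(x∗) = 0: when the minimizer sits on the boundary of Ω, the gradient need not vanish, it only has to form a non-negative inner product with every feasible direction. A one-dimensional Python illustration (added here, not from the book):

```python
import numpy as np

# Omega = [1, 3] is convex; f(x) = x**2 attains its minimum over Omega at
# x* = 1, which is NOT a critical point (f'(1) = 2 != 0). Still, (7.48) holds:
# f'(x*) * (x - x*) = 2 * (x - 1) >= 0 for every x in Omega.
x_star = 1.0
grad = 2 * x_star                          # f'(x*) for f(x) = x^2
xs = np.linspace(1.0, 3.0, 201)            # sample points of Omega
assert np.all(grad * (xs - x_star) >= 0)   # first-order condition (7.48)
```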

We recall that f : Ω → R is a strongly convex function if ∃ρ > 0 such
that

f[αx + (1 − α)y] ≤ αf(x) + (1 − α)f(y) − α(1 − α)ρ ‖x − y‖₂²,    (7.49)

∀x, y ∈ Ω and ∀α ∈ [0, 1]. The following result holds.

Property 7.10 Let Ω ⊂ Rⁿ be a closed and convex set and f be a strongly
convex function in Ω. Then there exists a unique local minimizer x∗ ∈ Ω.

Throughout this section, we refer to [Avr76], [Ber82], [CCP70], [Lue73] and
[Man69] for the proofs of the quoted results and for further details.
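As a quick numerical illustration of definition (7.49) (a sketch added here, not from the book): for f(x) = ‖x‖₂² a direct computation gives αf(x) + (1 − α)f(y) − f(αx + (1 − α)y) = α(1 − α)‖x − y‖₂², so f is strongly convex with ρ = 1, and (7.49) holds with equality.

```python
import numpy as np

# Check (7.49) with rho = 1 for f(x) = ||x||_2^2 on random pairs of points.
rng = np.random.default_rng(3)
f = lambda v: v @ v
for _ in range(100):
    x, y = rng.standard_normal(4), rng.standard_normal(4)
    a = rng.uniform()
    lhs = f(a * x + (1 - a) * y)
    rhs = a * f(x) + (1 - a) * f(y) - a * (1 - a) * ((x - y) @ (x - y))
    assert lhs <= rhs + 1e-12   # inequality (7.49), here in fact an equality
```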

A remarkable instance of (7.47) is the following problem: given f : Rⁿ → R,

minimize f(x), under the constraint that h(x) = 0,               (7.50)

where h : Rⁿ → Rᵐ, with m ≤ n, is a given function of components
h1, . . . , hm. The analogues of the critical points of problem (7.50) are called
the regular points.

Definition 7.2 A point x∗ ∈ Rⁿ, such that h(x∗) = 0, is said to be
regular if the column vectors of the Jacobian matrix Jh(x∗) are linearly
independent, having assumed that hi ∈ C¹(B(x∗; R)), for a suitable R > 0
and i = 1, . . . , m.
Our aim now is to convert problem (7.50) into an unconstrained
minimization problem of the form (7.2), to which the methods introduced
in Section 7.2 can be applied. For this purpose, we introduce the Lagrangian
function L : Rⁿ⁺ᵐ → R,

L(x, λ) = f(x) + λT h(x),

where the vector λ is called the Lagrange multiplier. Moreover, let us denote
by JL the Jacobian matrix associated with L, but where the partial
derivatives are only taken with respect to the variables x1, . . . , xn. The link
between (7.2) and (7.50) is then expressed by the following result.

Property 7.11 Let x∗ be a local minimizer for (7.50) and suppose that,
for a suitable R > 0, f, hi ∈ C¹(B(x∗; R)) for i = 1, . . . , m. Then there
exists a unique vector λ∗ ∈ Rᵐ such that JL(x∗, λ∗) = 0.
Conversely, assume that x∗ ∈ Rⁿ satisfies h(x∗) = 0 and that, for a
suitable R > 0 and i = 1, . . . , m, f, hi ∈ C²(B(x∗; R)). Let HL be the
matrix of entries ∂²L/∂xi∂xj for i, j = 1, . . . , n. If there exists a vector
λ∗ ∈ Rᵐ such that JL(x∗, λ∗) = 0 and

zT HL(x∗, λ∗) z > 0  ∀z ≠ 0 with ∇h(x∗)T z = 0,

then x∗ is a strict local minimizer of (7.50).
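Property 7.11 can be checked by hand on a small problem (an illustration added here, not from the book): minimize f(x) = x1 + x2 subject to h(x) = x1² + x2² − 1 = 0. Stationarity of L gives 1 + 2λ xi = 0 for i = 1, 2, which together with the constraint yields x∗ = (−1/√2, −1/√2) and λ∗ = 1/√2; the Python sketch below verifies JL(x∗, λ∗) = 0 numerically.

```python
import numpy as np

# Minimize f(x) = x1 + x2 subject to h(x) = x1^2 + x2^2 - 1 = 0.
x_star = np.array([-1.0, -1.0]) / np.sqrt(2)
lam_star = 1 / np.sqrt(2)
grad_f = np.array([1.0, 1.0])        # gradient of f
grad_h = 2 * x_star                  # gradient of h at x*
assert np.allclose(grad_f + lam_star * grad_h, 0)   # J_L(x*, lam*) = 0
assert np.isclose(x_star @ x_star - 1, 0)            # constraint h(x*) = 0
# Moreover H_L = 2*lam* * I is positive definite, so by Property 7.11
# x* is a strict local minimizer.
```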



The last class of problems that we are going to deal with includes the case
where inequality constraints are also present, i.e.: given f : Rⁿ → R,

minimize f(x), under the constraints h(x) = 0 and g(x) ≤ 0,      (7.51)

where h : Rⁿ → Rᵐ, with m ≤ n, and g : Rⁿ → Rʳ are two given functions.
It is understood that g(x) ≤ 0 means gi(x) ≤ 0 for i = 1, . . . , r. Inequality
constraints give rise to some extra formal complication with respect to the
case previously examined, but do not prevent converting the solution of
(7.51) into the minimization of a suitable Lagrangian function.
In particular, Definition 7.2 becomes

Definition 7.3 Assume that hi, gj ∈ C¹(B(x∗; R)) for a suitable R > 0,
with i = 1, . . . , m and j = 1, . . . , r, and denote by J(x∗) the set of
indices j such that gj(x∗) = 0. A point x∗ ∈ Rⁿ such that h(x∗) = 0 and
g(x∗) ≤ 0 is said to be regular if the column vectors of the Jacobian matrix
Jh(x∗), together with the vectors ∇gj(x∗), j ∈ J(x∗), form a set of linearly
independent vectors.

Finally, an analogue of Property 7.11 holds, provided that the following
Lagrangian function is used instead of L

M(x, λ, µ) = f(x) + λT h(x) + µT g(x),

and that further assumptions on the constraints are made.
For the sake of simplicity, we report in this case only the following
necessary condition for optimality of problem (7.51) to hold.

Property 7.12 Let x∗ be a regular local minimizer for (7.51) and suppose
that, for a suitable R > 0, f, hi, gj ∈ C¹(B(x∗; R)) with i = 1, . . . , m,
j = 1, . . . , r. Then, there exist only two vectors λ∗ ∈ Rᵐ and µ∗ ∈ Rʳ
such that JM(x∗, λ∗, µ∗) = 0, with µ∗j ≥ 0 and µ∗j gj(x∗) = 0 ∀j = 1, . . . , r.
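The conditions of Property 7.12 can be verified on a small inequality-constrained example (a sketch added here, not from the book): minimize f(x) = x1² + x2² subject to the single constraint g(x) = 1 − x1 − x2 ≤ 0. The constraint is active at the solution x∗ = (1/2, 1/2), with multiplier µ∗ = 1.

```python
import numpy as np

# Minimize f(x) = x1^2 + x2^2 subject to g(x) = 1 - x1 - x2 <= 0
# (no equality constraints, so the lambda-term of M is absent).
x_star = np.array([0.5, 0.5])
mu_star = 1.0
grad_f = 2 * x_star                  # gradient of f at x*
grad_g = np.array([-1.0, -1.0])      # gradient of g
assert np.allclose(grad_f + mu_star * grad_g, 0)    # stationarity of M
assert mu_star >= 0                                  # sign condition on mu*
assert np.isclose(mu_star * (1 - x_star.sum()), 0)   # mu* g(x*) = 0 (active)
```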



7.3.1 Kuhn-Tucker Necessary Conditions for Nonlinear
Programming
In this section we recall some results, known as Kuhn-Tucker conditions
[KT51], that ensure in general the existence of a local solution for the non-
linear programming problem. Under suitable assumptions they also guar-
antee the existence of a global solution. Throughout this section we suppose
that a minimization problem can always be reformulated as a maximization
one.
314 7. Nonlinear Systems and Numerical Optimization

Let us consider the general nonlinear programming problem:

given f : Rⁿ → R,
maximize f(x), subject to
gi(x) ≤ bi,   i = 1, . . . , l,
gi(x) ≥ bi,   i = l + 1, . . . , k,                               (7.52)
gi(x) = bi,   i = k + 1, . . . , m,
x ≥ 0.

A vector x that satisfies the constraints above is called a feasible solution of
(7.52) and the set of the feasible solutions is called the feasible region. We
assume henceforth that f, gi ∈ C¹(Rⁿ), i = 1, . . . , m, and define the sets
I= = {i : gi(x∗) = bi}, I≠ = {i : gi(x∗) ≠ bi}, J= = {i : x∗i = 0} and
J> = {i : x∗i > 0}, having denoted by x∗ a local maximizer of f. We associate
with (7.52) the following Lagrangian

L(x, λ) = f(x) + Σ_{i=1}^{m} λi [bi − gi(x)] − Σ_{i=m+1}^{m+n} λi x_{i−m}.

The following result can be proved.

Property 7.13 (Kuhn-Tucker conditions I and II) If f has a constrained
local maximum at the point x = x∗, it is necessary that a vector
λ∗ ∈ Rᵐ⁺ⁿ exists such that (first Kuhn-Tucker condition)

∇x L(x∗, λ∗) ≤ 0,

where strict equality holds for every component i ∈ J>. Moreover (second
Kuhn-Tucker condition)

(∇x L(x∗, λ∗))T x∗ = 0.


The other two necessary Kuhn-Tucker conditions are as follows.

Property 7.14 Under the same hypothesis as in Property 7.13, the third
Kuhn-Tucker condition requires that:

[∇λ L(x∗, λ∗)]i ≥ 0,   i = 1, . . . , l,
[∇λ L(x∗, λ∗)]i ≤ 0,   i = l + 1, . . . , k,
[∇λ L(x∗, λ∗)]i = 0,   i = k + 1, . . . , m.

Moreover (fourth Kuhn-Tucker condition)

(∇λ L(x∗, λ∗))T λ∗ = 0.

It is worth noticing that the Kuhn-Tucker conditions hold provided that
the vector λ∗ exists. To ensure this, it is necessary to introduce a further
geometric condition, known as constraint qualification (see [Wal75],
p. 48).
We conclude this section with the following fundamental theorem, which
establishes when the Kuhn-Tucker conditions become also sufficient for the
existence of a global maximizer for f.

Property 7.15 Assume that the function f in (7.52) is a concave function
(i.e., −f is convex) in the feasible region. Suppose also that the point
(x∗, λ∗) satisfies all the Kuhn-Tucker necessary conditions and that the
functions gi for which λ∗i > 0 are convex, while those for which λ∗i < 0 are
concave. Then f(x∗) is the constrained global maximum of f for problem
(7.52).


7.3.2 The Penalty Method
The basic idea of this method is to eliminate, partly or completely, the
constraints in order to transform the constrained problem into an
unconstrained one. This new problem is characterized by the presence of a
parameter that yields a measure of the accuracy at which the constraint is
actually imposed.
Let us consider the constrained problem (7.50), assuming that we are
searching for the solution x∗ only in Ω ⊂ Rⁿ. Suppose that such a problem
admits at least one solution in Ω and write it in the following penalized form

minimize Lα(x) for x ∈ Ω,                                        (7.53)

where

Lα(x) = f(x) + (1/2) α ‖h(x)‖₂².

The function Lα : Rⁿ → R is called the penalized Lagrangian, and α is
called the penalty parameter. It is clear that if the constraint were exactly
satisfied then minimizing f would be equivalent to minimizing Lα.
The penalty method is an iterative technique for solving (7.53).
For k = 0, 1, . . . , until convergence, one must solve the sequence of problems

minimize Lαk(x) with x ∈ Ω,                                      (7.54)

where {αk} is a monotonically increasing sequence of positive penalty
parameters such that αk → ∞ as k → ∞. As a consequence, after choosing
αk, at each step of the penalty process we have to solve a minimization
problem with respect to the variable x, leading to a sequence of values x∗k,
solutions to (7.54). By doing so, the objective function Lαk(x) tends to
infinity, unless h(x) is equal to zero.

The minimization problems can then be solved by one of the methods
introduced in Section 7.2. The following property ensures the convergence
of the penalty method in the form (7.53).

Property 7.16 Assume that f : Rⁿ → R and h : Rⁿ → Rᵐ, with m ≤ n,
are continuous functions on a closed set Ω ⊂ Rⁿ and suppose that the
sequence of penalty parameters αk > 0 is monotonically divergent. Finally,
let x∗k be the global minimizer of problem (7.54) at step k. Then, taking
the limit as k → ∞, the sequence x∗k converges to x∗, which is a global
minimizer of f in Ω satisfying the constraint h(x∗) = 0.
Regarding the selection of the parameters αk, it can be shown that large
values of αk make the minimization problem in (7.54) ill-conditioned, thus
making its solution quite prohibitive unless the initial guess is particularly
close to x∗. On the other hand, the sequence αk must not grow too slowly,
since this would negatively affect the overall convergence of the method.
A choice that is commonly made in practice is to pick a not too large
value of α0 and then set αk = βαk−1 for k > 0, where β is an integer
number between 4 and 10 (see [Ber82]). Finally, the starting point for the
numerical method used to solve the minimization problem (7.54) can be
set equal to the last computed iterate.
The penalty method is implemented in Program 63. This requires as
input parameters the functions f and h, an initial value alpha0 for the
penalty parameter and the factor beta.

Program 63 - lagrpen : Penalty method

function [x,vinc,nit]=lagrpen(x0,alpha0,beta,f,h,toll)
x = x0; [r,c] = size(h); vinc = 0;
for i = 1:r, vinc = max(vinc,abs(eval(h(i,1:c)))); end
norm2h = ['(',h(1,1:c),')^2'];
for i = 2:r, norm2h = [norm2h,'+(',h(i,1:c),')^2']; end
alpha = alpha0; options(1) = 0; options(2) = toll*0.1; nit = 0;
while vinc > toll
  g = [f,'+0.5*',num2str(alpha,16),'*',norm2h];
  [x] = fmins(g,x,options);
  vinc = 0; nit = nit + 1;
  for i = 1:r, vinc = max(vinc,abs(eval(h(i,1:c)))); end
  alpha = alpha*beta;
end
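Program 63 relies on the MATLAB routine fmins (the predecessor of today's fminsearch) for the inner unconstrained solves. For readers outside MATLAB, the scheme (7.54) can also be sketched in Python (our illustration, not the book's code; the helper names and the crude gradient-descent inner solver are our own choices, under the assumption that f and h are smooth with available gradients).

```python
import numpy as np

def minimize_gd(f, grad, x0, tol=1e-7, maxit=20000):
    """Crude inner solver: gradient descent with Armijo backtracking."""
    x = x0
    for _ in range(maxit):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        t = 1.0
        for _ in range(60):                      # backtracking line search
            if f(x - t * g) <= f(x) - 0.5 * t * (g @ g):
                break
            t *= 0.5
        x = x - t * g
    return x

def penalty_method(f, grad_f, h, grad_h, x0, alpha0=1.0, beta=6.0, toll=1e-6):
    """Penalty scheme (7.53)-(7.54): minimize f + 0.5*alpha*h(x)^2 for a
    monotonically increasing sequence alpha_k = beta * alpha_{k-1},
    warm-starting each inner solve from the previous iterate."""
    x, alpha = np.asarray(x0, dtype=float), alpha0
    while abs(h(x)) > toll:
        L = lambda v, a=alpha: f(v) + 0.5 * a * h(v) ** 2
        gL = lambda v, a=alpha: grad_f(v) + a * h(v) * grad_h(v)
        x = minimize_gd(L, gL, x)
        alpha *= beta
    return x

# Minimize f(x) = ||x||^2 under the constraint h(x) = x1 + x2 - 1 = 0;
# the exact constrained minimizer is x* = (1/2, 1/2).
x = penalty_method(lambda v: v @ v, lambda v: 2 * v,
                   lambda v: v[0] + v[1] - 1, lambda v: np.ones(2),
                   np.zeros(2))
assert np.allclose(x, [0.5, 0.5], atol=1e-3)
```

On this problem the inner minimizer of Lα lies at x = α/(2 + 2α) (1, 1)T, so the constraint violation decays like 1/(1 + α), matching the behavior described by Property 7.16; the growing stiffness of the inner problems for large α illustrates the ill-conditioning mentioned above.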


Example 7.6 Let us employ the penalty method to compute the minimizer of
f(x) = 100(x2 − x1²)² + (1 − x1)² under the constraint h(x) = (x1 + 0.5)² +
(x2 + 0.5)² − 0.25 = 0. The crosses in Figure 7.3 denote the sequence of iterates
computed by Program 63 starting from x(0) = (1, 1)T and choosing α0 = 0.1,
β = 6. The method converges in 12 iterations to the value x = (−0.2463, −0.0691)T,
satisfying the constraint up to a tolerance of 10⁻⁴. •
[Figure 7.3: the iterates of the penalty method for Example 7.6]