FIGURE 7.3. Convergence history of the penalty method in Example 7.6

7.3.3 The Method of Lagrange Multipliers

A variant of the penalty method makes use of the augmented Lagrangian function G_α : R^n × R^m → R (instead of L_α(x) in (7.53)), given by

G_α(x, λ) = f(x) + λ^T h(x) + (1/2) α ‖h(x)‖₂²,   (7.55)

where λ ∈ R^m is a Lagrange multiplier. Clearly, if x∗ is a solution to problem (7.50), then it is also a solution to (7.55), but with the advantage, with respect to (7.53), of having the further degree of freedom λ at one's disposal. The penalty method applied to (7.55) reads: for k = 0, 1, . . . , until convergence, solve the sequence of problems

minimize G_{α_k}(x, λ_k) for x ∈ Ω,   (7.56)

where {λ_k} is a bounded sequence of unknown vectors in R^m, and the parameters α_k are defined as above (notice that if λ_k were zero, then we would recover method (7.54)).

Property 7.16 also holds for method (7.56), provided that the multipliers are assumed to be bounded. Notice that the existence of the minimizer of (7.56) is not guaranteed, even in the case where f has a unique global minimizer (see Example 7.7). This circumstance can be overcome by adding further nonquadratic terms to the augmented Lagrangian function (e.g., of the form ‖h‖₂^p, with p large).

318 7. Nonlinear Systems and Numerical Optimization

Example 7.7 Let us find the minimizer of f(x) = −x⁴ under the constraint x = 0. This problem clearly admits the solution x∗ = 0. If, instead, one considers the augmented Lagrangian function

L_{α_k}(x, λ_k) = −x⁴ + λ_k x + (1/2) α_k x²,

one finds that it no longer admits a minimum at x = 0, though vanishing there, for any α_k different from zero. •
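The claims in this example are easy to check numerically. In the following sketch, lam and alpha are arbitrary nonzero illustrative values (the text prescribes none): the augmented Lagrangian vanishes at x = 0 but has nonzero slope there, and it is unbounded below because −x⁴ dominates.

```python
# Numerical check of Example 7.7; lam and alpha are arbitrary nonzero
# illustrative values, not prescribed by the text.
lam, alpha = 1.0, 10.0

def L(x):          # augmented Lagrangian of Example 7.7
    return -x**4 + lam*x + 0.5*alpha*x**2

def dL(x):         # its derivative
    return -4*x**3 + lam + alpha*x

print(L(0.0))      # 0.0: the function vanishes at x = 0
print(dL(0.0))     # 1.0 (= lam): x = 0 is not even a stationary point
print(L(100.0))    # large and negative: no global minimum exists
```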

As far as the choice of the multipliers is concerned, the sequence of vectors λ_k is typically assigned by the following formula

λ_{k+1} = λ_k + α_k h(x^{(k)}),

where λ_0 is a given value, while the sequence of α_k can be set a priori or modified at run-time.
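To make the scheme (7.56) together with this multiplier update concrete, here is a minimal Python sketch on a small hypothetical problem (minimize ‖x‖₂² subject to x₁ + x₂ = 1, whose solution is x∗ = (1/2, 1/2) with multiplier λ∗ = −1); this toy problem and all parameter values are our own choices, not taken from the text. Since f is quadratic and h is linear, the inner minimization of G_α can be done exactly by solving a linear system.

```python
import numpy as np

# Toy problem (illustrative, not from the text):
#   minimize f(x) = x1^2 + x2^2  subject to  h(x) = x1 + x2 - 1 = 0,
# with solution x* = (1/2, 1/2) and multiplier lambda* = -1.
f = lambda x: x @ x
h = lambda x: x[0] + x[1] - 1.0

def lagrange_mult(alpha0=1.0, beta=2.0, tol=1e-10, maxit=50):
    """Method of multipliers: minimize G_alpha(., lambda_k) exactly
    (the inner problem is quadratic here), then update
    lambda_{k+1} = lambda_k + alpha_k * h(x^{(k)})."""
    lam, alpha = 0.0, alpha0
    e = np.ones(2)
    x = np.zeros(2)
    for k in range(maxit):
        # grad G = 2x + lam*e + alpha*h(x)*e = 0
        #   ->  (2I + alpha*e e^T) x = (alpha - lam) e
        A = 2.0*np.eye(2) + alpha*np.outer(e, e)
        x = np.linalg.solve(A, (alpha - lam)*e)
        if abs(h(x)) < tol:
            return x, lam, k + 1
        lam += alpha*h(x)
        alpha *= beta       # the penalty parameter may grow, but need not blow up
    return x, lam, maxit

x, lam, nit = lagrange_mult()
print(x, lam, nit, f(x))
```

Note that, in agreement with the discussion above, the multiplier update drives h(x^{(k)}) to zero without requiring α_k to tend to infinity as fast as in the pure penalty method.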

As for the convergence properties of the method of Lagrange multipliers, the following local result holds.

Property 7.17 Assume that x∗ is a regular strict local minimizer of (7.50) and that:

1. f, h_i ∈ C²(B(x∗; R)), with i = 1, . . . , m and for a suitable R > 0;

2. the pair (x∗, λ∗) satisfies z^T H_{G_0}(x∗, λ∗) z > 0, ∀z ≠ 0 such that J_h(x∗)^T z = 0;

3. ∃ᾱ > 0 such that H_{G_ᾱ}(x∗, λ∗) > 0.

Then, there exist three positive scalars δ, γ and M such that, for any pair (λ, α) ∈ V = {(λ, α) ∈ R^{m+1} : ‖λ − λ∗‖₂ < δα, α ≥ ᾱ}, the problem

minimize G_α(x, λ), with x ∈ B(x∗; γ),

admits a unique solution x(λ, α), differentiable with respect to its arguments. Moreover, ∀(λ, α) ∈ V,

‖x(λ, α) − x∗‖₂ ≤ M ‖λ − λ∗‖₂.

Under further assumptions (see [Ber82], Proposition 2.7), it can be proved that the Lagrange multipliers method converges. Moreover, if α_k → ∞ as k → ∞, then

lim_{k→∞} ‖λ_{k+1} − λ∗‖₂ / ‖λ_k − λ∗‖₂ = 0,

and the convergence of the method is more than linear. In the case where the sequence α_k has an upper bound, the method converges linearly.

7.4 Applications 319

Finally, we notice that, unlike the penalty method, it is no longer necessary that the sequence of α_k tends to infinity. This, in turn, limits the ill-conditioning of problem (7.56) as α_k grows. Another advantage concerns the convergence rate, which in the case of the Lagrange multipliers technique turns out to be independent of the growth rate of the penalty parameter. This, of course, implies a considerable reduction of the computational cost.

The method of Lagrange multipliers is implemented in Program 64. Compared with Program 63, this program additionally requires as input the initial value lambda0 of the multiplier.

Program 64 - lagrmult : Method of Lagrange multipliers

function [x,vinc,nit]=lagrmult(x0,lambda0,alpha0,beta,f,h,toll)
x = x0; [r,c] = size(h); vinc = 0; lambda = lambda0;
for i=1:r, vinc = max(vinc,eval(h(i,1:c))); end
norm2h = ['(',h(1,1:c),')^2'];
for i=2:r, norm2h = [norm2h,'+(',h(i,1:c),')^2']; end
alpha = alpha0; options(1) = 0; options(2) = toll*0.1; nit = 0;
while vinc > toll
  lh = ['(',h(1,1:c),')*',num2str(lambda(1))];
  for i=2:r, lh = [lh,'+(',h(i,1:c),')*',num2str(lambda(i))]; end
  g = [f,'+0.5*',num2str(alpha,16),'*',norm2h,'+',lh];
  [x] = fmins(g,x,options);
  vinc = 0; nit = nit + 1;
  for i=1:r, vinc = max(vinc,eval(h(i,1:c))); end
  alpha = alpha*beta;
  for i=1:r, lambda(i) = lambda(i)+alpha*eval(h(i,1:c)); end
end

Example 7.8 We use the method of Lagrange multipliers to solve the problem presented in Example 7.6, setting λ_0 = 10 and leaving the remaining parameters unchanged. The method converges in 6 iterations and the crosses in Figure 7.4 show the iterates computed by Program 64. The constraint is here satisfied up to machine precision. •

7.4 Applications

The two applications of this section are concerned with nonlinear systems

arising in the simulation of the electric potential in a semiconductor device

and in the triangulation of a two-dimensional polygon.


FIGURE 7.4. Convergence history for the method of Lagrange multipliers in Example 7.8

7.4.1 Solution of a Nonlinear System Arising from

Semiconductor Device Simulation

Let us consider the nonlinear system in the unknown u ∈ R^n

F(u) = Au + φ(u) − b = 0,   (7.57)

where A = (λ/h)² tridiag_n(−1, 2, −1), for h = 1/(n+1), and φ_i(u) = 2K sinh(u_i) for i = 1, . . . , n, where λ and K are two positive constants and b ∈ R^n is a given vector. Problem (7.57) arises in the numerical simulation of semiconductor devices in microelectronics, where u and b represent the electric potential and the doping profile, respectively.

In Figure 7.5 (left) we show schematically the particular device considered in the numerical example, a p–n junction diode of unit normalized length, subject to an external bias V = V_b − V_a, together with the doping profile of the device, normalized to 1 (right). Notice that b_i = b(x_i), for i = 1, . . . , n, where x_i = ih. The mathematical model of the problem at hand comprises a nonlinear Poisson equation for the electric potential and two continuity equations of advection-diffusion type, as those addressed in Chapter 12, for the current densities. For the complete derivation of the model and its analysis see, for instance, [Mar86] and [Jer96].

Solving system (7.57) corresponds to finding the minimizer in R^n of the function f : R^n → R defined as

f(u) = (1/2) u^T A u + 2K Σ_{i=1}^n cosh(u_i) − b^T u.   (7.58)


FIGURE 7.5. Scheme of a semiconductor device (left); doping profile (right)

It can be checked (see Exercise 5) that for any u, v ∈ R^n with u ≠ v and for any λ ∈ (0, 1)

λf(u) + (1 − λ)f(v) − f(λu + (1 − λ)v) > (1/2) λ(1 − λ) ‖u − v‖²_A,

where ‖·‖_A denotes the energy norm introduced in (1.28). This implies that f(u) is a uniformly convex function in R^n, that is, it strictly satisfies (7.49) with ρ = 1/2.
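The inequality above can be spot-checked numerically; the following sketch (not a proof) samples random pairs u, v and convex weights and verifies a strictly positive gap. The sizes and constants below are small illustrative choices of ours, not those of the experiment described later in the section.

```python
import numpy as np

# Spot-check of the uniform-convexity inequality for f in (7.58).
# n, lam2 (= lambda^2), K and b are illustrative choices.
rng = np.random.default_rng(0)
n = 8
h = 1.0/(n + 1)
lam2, K = 1.0, 0.5
A = (lam2/h**2)*(2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
b = rng.standard_normal(n)

def f(u):
    return 0.5*u @ A @ u + 2*K*np.sum(np.cosh(u)) - b @ u

def energy_sq(w):          # squared energy norm ||w||_A^2
    return w @ A @ w

gaps = []
for _ in range(100):
    u, v = rng.standard_normal(n), rng.standard_normal(n)
    t = rng.uniform(0.01, 0.99)
    lhs = t*f(u) + (1 - t)*f(v) - f(t*u + (1 - t)*v)
    gaps.append(lhs - 0.5*t*(1 - t)*energy_sq(u - v))
print(min(gaps))           # strictly positive on every sample
```

The quadratic part of f contributes exactly (1/2)t(1−t)‖u−v‖²_A to the left-hand side, so the strictly positive gap observed comes entirely from the strict convexity of cosh.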

Property 7.10 ensures, in turn, that the function in (7.58) admits a unique minimizer u∗ ∈ R^n, and it can be shown (see Theorem 14.4.3, p. 503 of [OR70]) that there exists a sequence {α_k} such that the iterates of the damped Newton method introduced in Section 7.2.6 converge to u∗ (at least) superlinearly.

Thus, using the damped Newton method for solving system (7.57) leads to the following sequence of linearized problems: given u^{(0)} ∈ R^n, ∀k ≥ 0 solve

[A + 2K diag_n(cosh(u_i^{(k)}))] δu^{(k)} = b − (Au^{(k)} + φ(u^{(k)})),   (7.59)

then set u^{(k+1)} = u^{(k)} + α_k δu^{(k)}.

Let us now address two possible choices of the acceleration parameters α_k. The first one has been proposed in [BR81] and is

α_k = 1 / (1 + ρ_k ‖F(u^{(k)})‖),   k = 0, 1, . . . ,   (7.60)

where ‖·‖ denotes a vector norm, for instance ‖·‖ = ‖·‖_∞, and the coefficients ρ_k ≥ 0 are suitable acceleration parameters picked in such a way that the descent condition ‖F(u^{(k)} + α_k δu^{(k)})‖_∞ < ‖F(u^{(k)})‖_∞ is satisfied (see [BR81] for the implementation details of the algorithm).


We notice that, as ‖F(u^{(k)})‖_∞ → 0, (7.60) yields α_k → 1, thus recovering the full (quadratic) convergence of Newton's method. Otherwise, as typically happens in the first iterations, ‖F(u^{(k)})‖_∞ ≫ 1 and α_k is quite close to zero, with a strong reduction of the Newton variation (damping).

As an alternative to (7.60), the sequence {α_k} can be generated using the simpler formula, suggested in [Sel84], Chapter 7,

α_k = 2^{−i(i−1)/2},   k = 0, 1, . . . ,   (7.61)

where i is the first integer in the interval [1, It_max] such that the descent condition above is satisfied, It_max being the maximum admissible number of damping cycles for any Newton iteration (fixed equal to 10 in the numerical experiments).
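The iteration (7.59) with the damping rule (7.61) fits in a few lines of Python. In the sketch below the sizes and constants are deliberately mild choices of ours (they are not the values of the experiment described next), so the full step is accepted at almost every iteration; with stiffer constants the inner damping loop becomes essential.

```python
import numpy as np

# Damped Newton for F(u) = Au + 2K*sinh(u) - b = 0, cf. (7.57)/(7.59),
# with the damping rule (7.61). Mild illustrative constants, not those
# of the semiconductor experiment.
n, Itmax = 9, 10
h = 1.0/(n + 1)
lam2, K = h**2, 0.5          # lam2 = lambda^2, chosen so A = tridiag(-1,2,-1)
A = (lam2/h**2)*(2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
b = np.where(np.arange(1, n + 1) <= n//2, -1.0, 1.0)

def F(u):
    return A @ u + 2*K*np.sinh(u) - b

def J(u):                    # Jacobian: A + 2K*diag(cosh(u_i))
    return A + 2*K*np.diag(np.cosh(u))

u, nit = np.zeros(n), 0
while np.linalg.norm(F(u), np.inf) > 1e-10 and nit < 100:
    du = np.linalg.solve(J(u), -F(u))
    for i in range(1, Itmax + 1):        # damping cycle (7.61)
        alpha = 2.0**(-i*(i - 1)/2)
        if np.linalg.norm(F(u + alpha*du), np.inf) < np.linalg.norm(F(u), np.inf):
            break
    u = u + alpha*du
    nit += 1
print(nit, np.linalg.norm(F(u), np.inf))
```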

As a comparison, both the damped and the standard Newton method have been implemented, the former with both choices (7.60) and (7.61) for the coefficients α_k. In the case of the standard Newton method, we have set α_k = 1 in (7.59) for any k ≥ 0.

The numerical examples have been performed with n = 49, b_i = −1 for i ≤ n/2 and the remaining values b_i equal to 1. Moreover, we have taken λ² = 1.67·10⁻⁴, K = 6.77·10⁻⁶, and fixed the first n/2 components of the initial vector u^{(0)} equal to V_a and the remaining ones equal to V_b, where V_a = 0 and V_b = 10.

The tolerance on the maximum change between two successive iterates, which monitors the convergence of the damped Newton method (7.59), has been set equal to 10⁻⁴.

FIGURE 7.6. Absolute error (left) and damping parameters α_k (right). The error curve for the standard Newton method is denoted by (1), while (2) and (3) refer to the damped Newton method with the choices (7.61) and (7.60) for the coefficients α_k, respectively

Figure 7.6 (left) shows the log-scale absolute error for the three algorithms as a function of the iteration number. Notice the rapid convergence of the damped Newton method (8 and 10 iterations in the case of (7.60) and (7.61), respectively), compared with the extremely slow convergence of the standard Newton method (192 iterations). Moreover, it is interesting to analyze, in Figure 7.6 (right), the plot of the sequences of parameters α_k as functions of the iteration number.

The starred and the circled curves refer to the choices (7.60) and (7.61) for the coefficients α_k, respectively. As previously observed, the α_k's start from very small values, then converge quickly to 1 as the damped Newton method (7.59) enters the attraction region of the minimizer u∗.

7.4.2 Nonlinear Regularization of a Discretization Grid

In this section we go back to the problem of regularizing a discretization grid that was introduced in Section 3.14.2. There, we considered the technique of barycentric regularization, which leads to solving a linear system, typically of large size and featuring a sparse coefficient matrix.

In this section we address two alternative techniques, denoted as regularization by edges and by areas. The main difference with respect to the method described in Section 3.14.2 lies in the fact that these new approaches lead to systems of nonlinear equations.

Using the notation of Section 3.14.2, for each pair of nodes x_j, x_k ∈ Z_i, denote by l_{jk} the edge on the boundary ∂P_i of P_i which connects them and by x_{jk} the midpoint of l_{jk}, while for each triangle T ∈ P_i we denote by x_{b,T} the centroid of T. Moreover, let n_i = dim(Z_i) and denote for any geometric entity (side or triangle) by |·| its measure in R¹ or R².

In the case of regularization by edges, we let

x_i = ( Σ_{l_{jk} ∈ ∂P_i} x_{jk} |l_{jk}| ) / |∂P_i|,   ∀x_i ∈ N_h,   (7.62)

while in the case of regularization by areas, we let

x_i = ( Σ_{T ∈ P_i} x_{b,T} |T| ) / |P_i|,   ∀x_i ∈ N_h.   (7.63)
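As a concrete illustration of the area-weighted average (7.63), the following Python sketch computes the update for a single internal node whose patch is a made-up set of four triangles partitioning the unit square. Because the centroids are weighted by area and the patch covers the whole square, the update returns the centroid of the square, (0.5, 0.5), regardless of where the node currently sits.

```python
import numpy as np

# Regularization by areas, one node: new position is the area-weighted
# average of the centroids of the triangles in the node's patch (7.63).
# The patch geometry below is made up for illustration.
def tri_area(p, q, r):
    return 0.5*abs((q[0] - p[0])*(r[1] - p[1]) - (r[0] - p[0])*(q[1] - p[1]))

def area_update(patch):
    """patch: list of triangles, each a (3, 2) array of vertex coordinates."""
    num, den = np.zeros(2), 0.0
    for T in patch:
        area = tri_area(*T)
        num += T.mean(axis=0)*area   # centroid of T times its area
        den += area
    return num/den

# a node at (0.3, 0.2) fanned to the four corners of the unit square
xi = np.array([0.3, 0.2])
corners = [np.array([0., 0.]), np.array([1., 0.]),
           np.array([1., 1.]), np.array([0., 1.])]
patch = [np.array([corners[j], corners[(j + 1) % 4], xi]) for j in range(4)]
print(area_update(patch))    # -> [0.5 0.5], the centroid of the patch
```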

In both regularization procedures we assume that x_i = x_i^{(∂D)} if x_i ∈ ∂D, that is, the nodes lying on the boundary of the domain D are fixed. Letting n = N − N_b be the number of internal nodes, relation (7.62) amounts to solving the following two systems of nonlinear equations for the coordinates


{x_i} and {y_i} of the internal nodes, with i = 1, . . . , n:

x_i − ( Σ_{l_{jk} ∈ ∂P_i} (1/2)(x_j + x_k) |l_{jk}| ) / ( Σ_{l_{jk} ∈ ∂P_i} |l_{jk}| ) = 0,
                                                                         (7.64)
y_i − ( Σ_{l_{jk} ∈ ∂P_i} (1/2)(y_j + y_k) |l_{jk}| ) / ( Σ_{l_{jk} ∈ ∂P_i} |l_{jk}| ) = 0.

Similarly, (7.63) leads to the following nonlinear systems, for i = 1, . . . , n:

x_i − ( Σ_{T ∈ P_i} (1/3)(x_{1,T} + x_{2,T} + x_{3,T}) |T| ) / ( Σ_{T ∈ P_i} |T| ) = 0,
                                                                         (7.65)
y_i − ( Σ_{T ∈ P_i} (1/3)(y_{1,T} + y_{2,T} + y_{3,T}) |T| ) / ( Σ_{T ∈ P_i} |T| ) = 0,

where x_{s,T} = (x_{s,T}, y_{s,T}), for s = 1, 2, 3, are the coordinates of the vertices of each triangle T ∈ P_i. Notice that the nonlinearity of systems (7.64) and (7.65) is due to the presence of the terms |l_{jk}| and |T|.

Both systems (7.64) and (7.65) can be cast in the form (7.1), denoting, as usual, by f_i the i-th nonlinear equation of the system, for i = 1, . . . , n. The complex functional dependence of f_i on the unknowns makes it prohibitive to use Newton's method (7.4), which would require the explicit computation of the Jacobian matrix J_F.

A convenient alternative is provided by the nonlinear Gauss-Seidel method (see [OR70], Chapter 7), which generalizes the corresponding method proposed in Chapter 4 for linear systems and can be formulated as follows. Denote by z_i, for i = 1, . . . , n, either of the unknowns x_i or y_i. Given the initial vector z^{(0)} = (z_1^{(0)}, . . . , z_n^{(0)})^T, for k = 0, 1, . . . until convergence, solve

f_i(z_1^{(k+1)}, . . . , z_{i−1}^{(k+1)}, ξ, z_{i+1}^{(k)}, . . . , z_n^{(k)}) = 0,   i = 1, . . . , n,   (7.66)
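A minimal Python sketch of the iteration (7.66) follows, applied to a small stand-in system of our own choosing (not the grid-regularization equations themselves, which would require a full mesh data structure): each scalar equation f_i = 0 is solved for ξ by an inner scalar Newton iteration, sweeping over the components in Gauss-Seidel fashion.

```python
import numpy as np

# Nonlinear Gauss-Seidel, cf. (7.66), on the illustrative system
#   f_i(z) = z_i^3 + 2 z_i - z_{i-1} - z_{i+1} - 1 = 0,  z_0 = z_{n+1} = 0,
# which is diagonally dominant, so the sweep converges.
n = 5

def f_i(i, xi, z):
    left = z[i - 1] if i > 0 else 0.0
    right = z[i + 1] if i < n - 1 else 0.0
    return xi**3 + 2.0*xi - left - right - 1.0

z, nsweep = np.zeros(n), 0
while nsweep < 100:
    z_old = z.copy()
    for i in range(n):
        xi = z[i]
        for _ in range(50):          # scalar Newton for f_i(., xi, z) = 0
            step = f_i(i, xi, z)/(3.0*xi**2 + 2.0)   # df_i/dxi = 3 xi^2 + 2
            xi -= step
            if abs(step) < 1e-14:
                break
        z[i] = xi                    # component updated immediately (Gauss-Seidel)
    nsweep += 1
    if np.linalg.norm(z - z_old, np.inf) < 1e-13:
        break
res = max(abs(f_i(i, z[i], z)) for i in range(n))
print(nsweep, res)
```

The design mirrors (7.66): within a sweep, components already updated enter the later equations at the new level k+1, while the remaining ones are taken at level k.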