LAB software package developed in [Kel99].

7.1 Solution of Systems of Nonlinear Equations 287

3. Difference approximations of the Jacobian matrix

Another possibility consists of replacing J_F(x^(k)) (whose explicit computation is often very expensive) with an approximation through n-dimensional differences of the form

$$\bigl(\mathrm{J}_h^{(k)}\bigr)_j = \frac{\mathbf{F}(\mathbf{x}^{(k)} + h_j^{(k)} \mathbf{e}_j) - \mathbf{F}(\mathbf{x}^{(k)})}{h_j^{(k)}}, \qquad \forall k \ge 0, \qquad (7.9)$$

where e_j is the j-th vector of the canonical basis of Rⁿ and h_j^(k) > 0 are increments to be suitably chosen at each step k of the iteration (7.4). The following result can be shown.

Property 7.1 Let F and x* be such that the hypotheses of Theorem 7.1 are fulfilled, where ‖·‖ denotes the ‖·‖₁ vector norm and the corresponding induced matrix norm. If there exist two positive constants ε and h such that x^(0) ∈ B(x*, ε) and 0 < |h_j^(k)| ≤ h for j = 1, ..., n, then the sequence defined by

$$\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} - \bigl[\mathrm{J}_h^{(k)}\bigr]^{-1} \mathbf{F}(\mathbf{x}^{(k)}) \qquad (7.10)$$

is well defined and converges linearly to x*. Moreover, if there exists a positive constant C such that max_j |h_j^(k)| ≤ C ‖x^(k) − x*‖ or, equivalently, there exists a positive constant c such that max_j |h_j^(k)| ≤ c ‖F(x^(k))‖, then the sequence (7.10) converges quadratically.

This result does not provide any constructive indication as to how to compute the increments h_j^(k). In this regard, the following remarks can be made. The first-order truncation error with respect to h_j^(k), which arises from the divided difference (7.9), can be reduced by reducing the size of h_j^(k). On the other hand, too small a value of h_j^(k) can lead to large rounding errors. A trade-off must therefore be made between the need to limit the truncation error and the need to ensure a certain accuracy in the computations.

A possible choice is to take

$$h_j^{(k)} = \sqrt{\epsilon_M}\,\max\bigl\{|x_j^{(k)}|,\, M_j\bigr\}\,\mathrm{sign}(x_j^{(k)}),$$

where ε_M is the machine roundoff unit and M_j is a parameter that characterizes the typical size of the component x_j of the solution. Further improvements can be achieved using higher-order divided differences to approximate derivatives, such as

$$\bigl(\mathrm{J}_h^{(k)}\bigr)_j = \frac{\mathbf{F}(\mathbf{x}^{(k)} + h_j^{(k)} \mathbf{e}_j) - \mathbf{F}(\mathbf{x}^{(k)} - h_j^{(k)} \mathbf{e}_j)}{2 h_j^{(k)}}, \qquad \forall k \ge 0.$$

For further details on this subject, see, for instance, [BS90].
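As an illustration of the divided difference (7.9) and of the increment choice above, the following Python sketch builds the approximate Jacobian column by column. It is our own helper (not one of the book's programs); the name fd_jacobian and the default M_j = 1 are assumptions.

```python
import numpy as np

def fd_jacobian(F, x, M=None, centered=False):
    """Approximate the Jacobian of F at x by divided differences.

    The increment for column j is h_j = sqrt(eps_M) * max(|x_j|, M_j) * sign(x_j),
    which balances truncation against rounding errors as discussed in the text.
    """
    n = x.size
    M = np.ones(n) if M is None else np.asarray(M, dtype=float)
    sgn = np.where(x >= 0, 1.0, -1.0)                    # sign(0) taken as +1
    h = np.sqrt(np.finfo(float).eps) * np.maximum(np.abs(x), M) * sgn
    J = np.empty((n, n))
    Fx = F(x)
    for j in range(n):
        e = np.zeros(n)
        e[j] = 1.0
        if centered:                                     # second-order accurate
            J[:, j] = (F(x + h[j] * e) - F(x - h[j] * e)) / (2 * h[j])
        else:                                            # first-order accurate (7.9)
            J[:, j] = (F(x + h[j] * e) - Fx) / h[j]
    return J
```

For instance, for F(x) = (x₁ + x₂ − 3, x₁² + x₂² − 9)^T at x = (2, 4)^T both variants reproduce the exact Jacobian [1 1; 4 8] up to small rounding and truncation errors.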

288 7. Nonlinear Systems and Numerical Optimization

7.1.3 Quasi-Newton Methods

By this term, we denote all those schemes in which globally convergent methods are coupled with Newton-like methods that are only locally convergent, but with an order greater than one.
In a quasi-Newton method, given a continuously differentiable function F : Rⁿ → Rⁿ and an initial value x^(0) ∈ Rⁿ, at each step k one has to accomplish the following operations:

1. compute F(x^(k));
2. choose J̃_F(x^(k)) as being either the exact J_F(x^(k)) or an approximation of it;
3. solve the linear system J̃_F(x^(k)) δx^(k) = −F(x^(k));
4. set x^(k+1) = x^(k) + α_k δx^(k), where α_k are suitable damping parameters.

Step 4 is thus the characterizing element of this family of methods. It will be addressed in Section 7.2.6, where a criterion for selecting the "direction" δx^(k) will be provided.
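The four operations above can be sketched as a generic loop in Python (an illustrative transcription of ours, with a fixed damping parameter alpha; the routine jac, standing in for the choice of the exact or approximate Jacobian, is an assumption):

```python
import numpy as np

def quasi_newton(F, jac, x0, alpha=1.0, nmax=100, tol=1e-10):
    """Generic quasi-Newton loop: evaluate F, form an (approximate)
    Jacobian, solve for the correction, apply a damped update."""
    x = np.asarray(x0, dtype=float)
    for k in range(1, nmax + 1):
        Fx = F(x)                         # step 1: compute F(x_k)
        Jx = jac(x)                       # step 2: exact or approximate Jacobian
        dx = np.linalg.solve(Jx, -Fx)     # step 3: solve the linear system
        x = x + alpha * dx                # step 4: damped update (alpha_k fixed here)
        if np.linalg.norm(dx, np.inf) < tol:
            break
    return x, k
```

With jac returning the exact Jacobian and alpha = 1, the loop reduces to Newton's method.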

7.1.4 Secant-like Methods

These methods are constructed starting from the secant method introduced in Section 6.2 for scalar functions. Precisely, given two vectors x^(0) and x^(1), at the generic step k ≥ 1 we solve the linear system

$$\mathrm{Q}_k\, \delta\mathbf{x}^{(k+1)} = -\mathbf{F}(\mathbf{x}^{(k)}) \qquad (7.11)$$

and we set x^(k+1) = x^(k) + δx^(k+1). Q_k is an n × n matrix such that

$$\mathrm{Q}_k\, \delta\mathbf{x}^{(k)} = \mathbf{F}(\mathbf{x}^{(k)}) - \mathbf{F}(\mathbf{x}^{(k-1)}) = \mathbf{b}^{(k)}, \qquad k \ge 1,$$

and is obtained by a formal generalization of (6.13). However, the algebraic relation above does not suffice to uniquely determine Q_k. For this purpose we require Q_k, for k ≥ n, to be a solution to the following set of n systems

$$\mathrm{Q}_k \bigl(\mathbf{x}^{(k)} - \mathbf{x}^{(k-j)}\bigr) = \mathbf{F}(\mathbf{x}^{(k)}) - \mathbf{F}(\mathbf{x}^{(k-j)}), \qquad j = 1, \ldots, n. \qquad (7.12)$$

If the vectors x^(k−n), ..., x^(k) are linearly independent, system (7.12) allows for calculating all the unknown coefficients {(Q_k)_lm, l, m = 1, ..., n} of Q_k. Unfortunately, in practice these vectors tend to become linearly dependent and the resulting scheme is unstable, not to mention the need for storing all the previous n iterates.

For these reasons, an alternative approach is pursued which aims at preserving the information already provided by the method at step k. Precisely,


Q_k is looked for in such a way that the difference between the following linear approximants to F(x^(k−1)) and F(x^(k)), respectively,

$$\mathbf{F}(\mathbf{x}^{(k)}) + \mathrm{Q}_k(\mathbf{x} - \mathbf{x}^{(k)}), \qquad \mathbf{F}(\mathbf{x}^{(k-1)}) + \mathrm{Q}_{k-1}(\mathbf{x} - \mathbf{x}^{(k-1)}),$$

is minimized, jointly with the constraint that Q_k satisfies system (7.12). Using (7.12) with j = 1, the difference between the two approximants is found to be

$$\mathbf{d}_k = (\mathrm{Q}_k - \mathrm{Q}_{k-1})\bigl(\mathbf{x} - \mathbf{x}^{(k-1)}\bigr). \qquad (7.13)$$

Let us decompose the vector x − x^(k−1) as

$$\mathbf{x} - \mathbf{x}^{(k-1)} = \alpha\, \delta\mathbf{x}^{(k)} + \mathbf{s},$$

where α ∈ R and s^T δx^(k) = 0. Therefore, (7.13) becomes

$$\mathbf{d}_k = \alpha\,(\mathrm{Q}_k - \mathrm{Q}_{k-1})\,\delta\mathbf{x}^{(k)} + (\mathrm{Q}_k - \mathrm{Q}_{k-1})\,\mathbf{s}.$$

Only the second term in the relation above can be minimized, since the first one is independent of Q_k, being

$$(\mathrm{Q}_k - \mathrm{Q}_{k-1})\,\delta\mathbf{x}^{(k)} = \mathbf{b}^{(k)} - \mathrm{Q}_{k-1}\,\delta\mathbf{x}^{(k)}.$$

The problem has thus become: find the matrix Q_k such that ‖(Q_k − Q_{k−1}) s‖ is minimized for all s orthogonal to δx^(k), with the constraint that (7.12) holds. It can be shown that such a matrix exists and can be recursively computed as follows

$$\mathrm{Q}_k = \mathrm{Q}_{k-1} + \frac{\bigl(\mathbf{b}^{(k)} - \mathrm{Q}_{k-1}\,\delta\mathbf{x}^{(k)}\bigr)\bigl(\delta\mathbf{x}^{(k)}\bigr)^T}{\bigl(\delta\mathbf{x}^{(k)}\bigr)^T \delta\mathbf{x}^{(k)}}. \qquad (7.14)$$

The method (7.11), with the choice (7.14) of the matrix Q_k, is known as Broyden's method. To initialize (7.14), we set Q_0 equal to the matrix J_F(x^(0)) or to any approximation of it, for instance the one yielded by (7.9). As for the convergence of Broyden's method, the following result holds.

Property 7.2 If the assumptions of Theorem 7.1 are satisfied and there exist two positive constants ε and γ such that

$$\|\mathbf{x}^{(0)} - \mathbf{x}^*\| \le \varepsilon, \qquad \|\mathrm{Q}_0 - \mathrm{J}_F(\mathbf{x}^*)\| \le \gamma,$$

then the sequence of vectors x^(k) generated by Broyden's method is well defined and converges superlinearly to x*, that is,

$$\|\mathbf{x}^{(k)} - \mathbf{x}^*\| \le c_k \|\mathbf{x}^{(k-1)} - \mathbf{x}^*\| \qquad (7.15)$$

where the constants c_k are such that lim_{k→∞} c_k = 0.


Under further assumptions, it is also possible to prove that the sequence Q_k converges to J_F(x*), a property that does not necessarily hold for the above method, as demonstrated in Example 7.3.
There exist several variants of Broyden's method which aim at reducing its computational cost, but are usually less stable (see [DS83], Chapter 8).

Program 58 implements Broyden's method (7.11)-(7.14). We have denoted by Q the initial approximation Q_0 in (7.14).

Program 58 - broyden : Broyden's method for nonlinear systems

function [x,it]=broyden(x,Q,nmax,toll,f)
[n,m]=size(f); it=0; err=1;
fk=zeros(n,1); fk1=fk;
for i=1:n, fk(i)=eval(f(i,:)); end
while it < nmax & err > toll
  s=-Q\fk; x=s+x; err=norm(s,inf);
  if err > toll
    for i=1:n, fk1(i)=eval(f(i,:)); end
    Q=Q+1/(s'*s)*fk1*s';
  end
  it=it+1; fk=fk1;
end

Example 7.2 Let us solve, using Broyden's method, the nonlinear system of Example 7.1. The method converges in 35 iterations to the value (0.7·10⁻⁸, 0.7·10⁻⁸)^T, compared with the 26 iterations required by Newton's method starting from the same initial guess x^(0) = (0.1, 0.1)^T. The matrix Q_0 has been set equal to the Jacobian matrix evaluated at x^(0). Figure 7.1 shows the behavior of the Euclidean norm of the error for both methods. •

Example 7.3 Suppose we wish to solve, using Broyden's method, the nonlinear system F(x) = (x₁ + x₂ − 3, x₁² + x₂² − 9)^T = 0. This system admits the two solutions (0, 3)^T and (3, 0)^T. Broyden's method converges in 8 iterations to the solution (0, 3)^T starting from x^(0) = (2, 4)^T. However, the sequence of Q_k, stored in the variable Q of Program 58, does not converge to the Jacobian matrix, since

$$\lim_{k\to\infty} \mathrm{Q}^{(k)} = \begin{bmatrix} 1 & 1 \\ 1.5 & 7.5 \end{bmatrix} \neq \mathrm{J}_F\bigl[(0,3)^T\bigr] = \begin{bmatrix} 1 & 1 \\ 0 & 6 \end{bmatrix}. \qquad \bullet$$
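The behavior reported in this example can be reproduced with a short Python transcription of the Broyden update (7.14). This is a sketch with our own variable names; it mirrors Program 58 but takes F as a callable.

```python
import numpy as np

def broyden(F, x, Q, nmax=100, tol=1e-10):
    """Broyden's method: solve Q_k s = -F(x_k), update x, then apply
    the rank-one correction (7.14) to Q_k."""
    Fx = F(x)
    for it in range(1, nmax + 1):
        s = np.linalg.solve(Q, -Fx)
        x = x + s
        if np.linalg.norm(s, np.inf) < tol:
            break
        Fnew = F(x)
        # b - Q s = (Fnew - Fx) - Q s = Fnew, since Q s = -Fx
        Q = Q + np.outer(Fnew, s) / (s @ s)
        Fx = Fnew
    return x, Q, it

# the system of this example
F = lambda x: np.array([x[0] + x[1] - 3.0, x[0]**2 + x[1]**2 - 9.0])
x0 = np.array([2.0, 4.0])
Q0 = np.array([[1.0, 1.0], [2 * x0[0], 2 * x0[1]]])   # J_F(x^(0))
x, Q, it = broyden(F, x0, Q0)
```

The iterates converge to (0, 3)^T, while the matrices Q_k stabilize away from J_F((0, 3)^T), in line with the example.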

7.1.5 Fixed-point Methods

We conclude the analysis of methods for solving systems of nonlinear equations by extending to n dimensions the fixed-point techniques introduced in the scalar case. For this, we reformulate problem (7.1) as:

given G : Rⁿ → Rⁿ, find x* ∈ Rⁿ such that G(x*) = x*. (7.16)


FIGURE 7.1. Euclidean norm of the error for the Newton method (solid line)

and the Broyden method (dashed line) in the case of the nonlinear system of

Example 7.1

where G is related to F through the following property: if x* is a fixed point of G, then F(x*) = 0.

Analogously to what was done in Section 6.3, we introduce iterative methods for the solution of (7.16) of the form:

given x^(0) ∈ Rⁿ, for k = 0, 1, ..., until convergence, find

$$\mathbf{x}^{(k+1)} = \mathbf{G}(\mathbf{x}^{(k)}). \qquad (7.17)$$

In order to analyze the convergence of the fixed-point iteration (7.17), the following definition will be useful.

Definition 7.1 A mapping G : D ⊂ Rⁿ → Rⁿ is contractive on a set D₀ ⊂ D if there exists a constant α < 1 such that ‖G(x) − G(y)‖ ≤ α‖x − y‖ for all x, y in D₀, where ‖·‖ is a suitable vector norm.

The existence and uniqueness of a fixed point for G is ensured by the following theorem.

Theorem 7.2 (contraction-mapping theorem) Suppose that G : D ⊂ Rⁿ → Rⁿ is contractive on a closed set D₀ ⊂ D and that G(x) ∈ D₀ for all x ∈ D₀. Then G has a unique fixed point in D₀.

Proof. Let us first prove the uniqueness of the fixed point. For this, assume that there exist two distinct fixed points x*, y*. Then

$$\|\mathbf{x}^* - \mathbf{y}^*\| = \|\mathbf{G}(\mathbf{x}^*) - \mathbf{G}(\mathbf{y}^*)\| \le \alpha \|\mathbf{x}^* - \mathbf{y}^*\|,$$

from which (1 − α)‖x* − y*‖ ≤ 0. Since (1 − α) > 0, it must necessarily be that ‖x* − y*‖ = 0, i.e., x* = y*.


To prove the existence, we show that x^(k) given by (7.17) is a Cauchy sequence. This in turn implies that x^(k) is convergent to a point x^(*) ∈ D₀. Take x^(0) arbitrarily in D₀. Then, since the image of G is included in D₀, the sequence x^(k) is well defined and

$$\|\mathbf{x}^{(k+1)} - \mathbf{x}^{(k)}\| = \|\mathbf{G}(\mathbf{x}^{(k)}) - \mathbf{G}(\mathbf{x}^{(k-1)})\| \le \alpha \|\mathbf{x}^{(k)} - \mathbf{x}^{(k-1)}\|.$$

After p steps, p ≥ 1, we obtain

$$\|\mathbf{x}^{(k+p)} - \mathbf{x}^{(k)}\| \le \sum_{i=1}^{p} \|\mathbf{x}^{(k+i)} - \mathbf{x}^{(k+i-1)}\| \le \bigl(\alpha^{p-1} + \ldots + 1\bigr)\,\|\mathbf{x}^{(k+1)} - \mathbf{x}^{(k)}\| \le \frac{\alpha^k}{1-\alpha}\,\|\mathbf{x}^{(1)} - \mathbf{x}^{(0)}\|.$$

Owing to the continuity of G, it follows that lim_{k→∞} G(x^(k)) = G(x^(*)), which proves that x^(*) is a fixed point for G. □
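The geometric bound ‖x^(k+p) − x^(k)‖ ≤ α^k/(1−α) ‖x^(1) − x^(0)‖ used in the proof can be checked numerically on a concrete contraction. The Python sketch below uses a linear map of our own choosing, with ‖A‖_∞ = 1/2 as contraction constant.

```python
import numpy as np

# A linear contraction G(x) = A x + c with ||A||_inf = 1/2 < 1;
# its unique fixed point solves (I - A) x* = c.
A = np.array([[0.25, 0.25],
              [0.10, 0.40]])
c = np.array([1.0, 2.0])
G = lambda x: A @ x + c
alpha = np.linalg.norm(A, np.inf)          # contraction constant: 0.5

x = np.zeros(2)
iterates = [x]
for _ in range(60):                        # fixed-point iteration (7.17)
    x = G(x)
    iterates.append(x)

x_star = np.linalg.solve(np.eye(2) - A, c)
d10 = np.linalg.norm(iterates[1] - iterates[0], np.inf)

# the a priori bound of the proof, checked for a range of k and p
for k in range(20):
    for p in range(1, 20):
        gap = np.linalg.norm(iterates[k + p] - iterates[k], np.inf)
        assert gap <= alpha**k / (1.0 - alpha) * d10 + 1e-12
```

Since the map is exactly contractive in the infinity norm, every gap between iterates stays below the geometric envelope, and the sequence converges to the fixed point x*.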

The following result provides a sufficient condition for the iteration (7.17) to converge (for the proof, see [OR70], pp. 299-301), and extends the analogous Theorem 6.3 for the scalar case.

Property 7.3 Suppose that G : D ⊂ Rⁿ → Rⁿ has a fixed point x* in the interior of D and that G is continuously differentiable in a neighborhood of x*. Denote by J_G the Jacobian matrix of G and assume that ρ(J_G(x*)) < 1. Then there exists a neighborhood S of x* such that S ⊂ D and, for any x^(0) ∈ S, the iterates defined by (7.17) all lie in D and converge to x*.

As usual, since the spectral radius is the infimum of the induced matrix norms, in order for convergence to hold it suffices to check that ‖J_G(x)‖ < 1 for some matrix norm.

Example 7.4 Consider the nonlinear system

$$\mathbf{F}(\mathbf{x}) = \bigl(x_1^2 + x_2^2 - 1,\; 2x_1 + x_2 - 1\bigr)^T = \mathbf{0},$$

whose solutions are x₁* = (0, 1)^T and x₂* = (4/5, −3/5)^T. To solve it, let us use two fixed-point schemes, respectively defined by the following iteration functions

$$\mathbf{G}_1(\mathbf{x}) = \begin{bmatrix} \dfrac{1-x_2}{2} \\[2mm] \sqrt{1-x_1^2} \end{bmatrix}, \qquad \mathbf{G}_2(\mathbf{x}) = \begin{bmatrix} \dfrac{1-x_2}{2} \\[2mm] -\sqrt{1-x_1^2} \end{bmatrix}. \qquad (7.18)$$

It can be checked that Gᵢ(xᵢ*) = xᵢ* for i = 1, 2 and that the Jacobian matrices of G₁ and G₂, evaluated at x₁* and x₂* respectively, are

$$\mathrm{J}_{G_1}(\mathbf{x}_1^*) = \begin{bmatrix} 0 & -\tfrac12 \\ 0 & 0 \end{bmatrix}, \qquad \mathrm{J}_{G_2}(\mathbf{x}_2^*) = \begin{bmatrix} 0 & -\tfrac12 \\ \tfrac43 & 0 \end{bmatrix}.$$

The spectral radii are ρ(J_{G₁}(x₁*)) = 0 and ρ(J_{G₂}(x₂*)) = √(2/3) ≈ 0.817 < 1, so that both methods are convergent in a suitable neighborhood of their respective fixed points.

Running Program 59, with a tolerance of 10⁻¹⁰ on the maximum absolute difference between two successive iterates, the first scheme converges to x₁* in 9 iterations, starting from x^(0) = (−0.9, 0.9)^T, while the second one converges to x₂* in 115 iterations, starting from x^(0) = (0.9, 0.9)^T. The dramatic change in the convergence behavior of the two methods can be explained in view of the difference between the spectral radii of the corresponding iteration matrices. •
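The two schemes of this example are straightforward to reproduce with a Python sketch of iteration (7.17); the guard max(0, ·) inside the square root is our own safeguard against rounding outside the domain.

```python
import numpy as np

def fixpoint(G, x0, nmax=500, tol=1e-10):
    """Fixed-point iteration x^(k+1) = G(x^(k)), stopped on the maximum
    absolute difference between two successive iterates."""
    x = np.asarray(x0, dtype=float)
    for k in range(1, nmax + 1):
        xnew = G(x)
        if np.max(np.abs(xnew - x)) < tol:
            return xnew, k
        x = xnew
    return x, nmax

G1 = lambda x: np.array([(1 - x[1]) / 2,  np.sqrt(max(0.0, 1 - x[0]**2))])
G2 = lambda x: np.array([(1 - x[1]) / 2, -np.sqrt(max(0.0, 1 - x[0]**2))])

x1, it1 = fixpoint(G1, [-0.9, 0.9])   # rho(J_G1(x1*)) = 0: fast
x2, it2 = fixpoint(G2, [ 0.9, 0.9])   # rho(J_G2(x2*)) ~ 0.817: slow
```

The gap between it1 and it2 mirrors the gap between the two spectral radii.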

Remark 7.1 Newton's method can be regarded as a fixed-point method with iteration function

$$\mathbf{G}_N(\mathbf{x}) = \mathbf{x} - \mathrm{J}_F^{-1}(\mathbf{x})\,\mathbf{F}(\mathbf{x}). \qquad (7.19)$$

If we denote by r^(k) = F(x^(k)) the residual at step k, from (7.19) it turns out that Newton's method can be alternatively formulated as

$$\bigl[\mathrm{I} - \mathrm{J}_{G_N}(\mathbf{x}^{(k)})\bigr]\bigl(\mathbf{x}^{(k+1)} - \mathbf{x}^{(k)}\bigr) = -\mathbf{r}^{(k)}.$$

This equation allows us to interpret Newton's method as a preconditioned stationary Richardson method. This prompts introducing a parameter α_k in order to accelerate the convergence of the iteration

$$\bigl[\mathrm{I} - \mathrm{J}_{G_N}(\mathbf{x}^{(k)})\bigr]\bigl(\mathbf{x}^{(k+1)} - \mathbf{x}^{(k)}\bigr) = -\alpha_k \mathbf{r}^{(k)}.$$

The problem of how to select α_k will be addressed in Section 7.2.6.
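To make the role of the parameter α_k concrete before Section 7.2.6, here is an illustrative Python sketch of Newton's method with a simple residual-based backtracking rule; the halving strategy is our own choice, not the book's selection criterion.

```python
import numpy as np

def damped_newton(F, jac, x0, nmax=100, tol=1e-12):
    """Newton's method with a residual-based backtracking choice of alpha_k:
    halve alpha until the new residual norm decreases."""
    x = np.asarray(x0, dtype=float)
    for k in range(1, nmax + 1):
        r = F(x)                                   # residual r^(k)
        if np.linalg.norm(r) < tol:
            return x, k
        dx = np.linalg.solve(jac(x), -r)
        alpha = 1.0
        while (np.linalg.norm(F(x + alpha * dx)) >= np.linalg.norm(r)
               and alpha > 1e-8):                  # crude safeguard
            alpha /= 2.0
        x = x + alpha * dx
    return x, nmax
```

Far from the solution the damping keeps the residual decreasing; near the solution alpha_k = 1 is accepted and the usual quadratic convergence of Newton's method is recovered.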

An implementation of the fixed-point method (7.17) is provided in Program 59. We have denoted by dim the size of the nonlinear system and by Phi the variable containing the functional expressions of the iteration function G. In output, the vector alpha contains the approximation of the sought zero of F and the vector res contains the sequence of the maximum norms of the residuals of F(x^(k)).

Program 59 - fixposys : Fixed-point method for nonlinear systems

function [alpha, res, nit]=fixposys(dim, x0, nmax, toll, Phi, F)
x = x0; alpha=[x']; res = 0;
for k=1:dim,
  r=abs(eval(F(k,:))); if (r > res), res = r; end
end;
nit = 0; residual(1)=res;
while ((nit <= nmax) & (res >= toll)),
  nit = nit + 1;
  for k = 1:dim, xnew(k) = eval(Phi(k,:)); end
  x = xnew; res = 0; alpha=[alpha;x]; x=x';
  for k = 1:dim,
    r = abs(eval(F(k,:)));
    if (r > res), res=r; end,
  end
  residual(nit+1)=res;
end
res=residual';