
For an efficient implementation of these techniques we refer to the MATLAB software package developed in [Kel99].

3. Difference approximations of the Jacobian matrix
Another possibility consists of replacing J_F(x^(k)) (whose explicit computation is often very expensive) with an approximation through n-dimensional differences of the form

    (J_h^{(k)})_j = \frac{F(x^{(k)} + h_j^{(k)} e_j) - F(x^{(k)})}{h_j^{(k)}}, \qquad \forall k \geq 0,    (7.9)

where e_j is the j-th vector of the canonical basis of R^n and h_j^(k) > 0 are increments to be suitably chosen at each step k of the iteration (7.4). The following result can be shown.

Property 7.1 Let F and x* be such that the hypotheses of Theorem 7.1 are fulfilled, where ‖·‖ denotes the ‖·‖_1 vector norm and the corresponding induced matrix norm. If there exist two positive constants ε and h such that x^(0) ∈ B(x*, ε) and 0 < |h_j^(k)| ≤ h for j = 1, ..., n, then the sequence defined by

    x^{(k+1)} = x^{(k)} - \left( J_h^{(k)} \right)^{-1} F(x^{(k)}),    (7.10)

is well defined and converges linearly to x*. Moreover, if there exists a positive constant C such that max_j |h_j^(k)| ≤ C ‖x^(k) − x*‖ or, equivalently, there exists a positive constant c such that max_j |h_j^(k)| ≤ c ‖F(x^(k))‖, then the sequence (7.10) converges quadratically.
This result does not provide any constructive indication as to how to compute the increments h_j^(k). In this regard, the following remarks can be made. The first-order truncation error with respect to h_j^(k), which arises from the divided difference (7.9), can be reduced by reducing the size of h_j^(k). On the other hand, too small a value for h_j^(k) can lead to large rounding errors. A trade-off must therefore be made between the need of limiting the truncation errors and ensuring a certain accuracy in the computations.
A possible choice is to take

    h_j^{(k)} = \sqrt{\epsilon_M} \, \max\left\{ |x_j^{(k)}|, M_j \right\} \mathrm{sign}(x_j^{(k)}),

where ε_M is the machine roundoff unit and M_j is a parameter that characterizes the typical size of the component x_j of the solution. Further improvements can be achieved using higher-order divided differences to approximate derivatives, like

    (J_h^{(k)})_j = \frac{F(x^{(k)} + h_j^{(k)} e_j) - F(x^{(k)} - h_j^{(k)} e_j)}{2 h_j^{(k)}}, \qquad \forall k \geq 0.
For further details on this subject, see, for instance, [BS90].
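For illustration, a divided-difference approximation of the Jacobian with the above choice of increments might be coded in MATLAB as follows. This is a sketch only, not one of the programs of this chapter: F is assumed to be a function handle, M a vector containing the typical sizes M_j, and eps denotes MATLAB's roundoff unit.

function J = diffjac(F, x, M)
% Forward divided-difference approximation (7.9) of the Jacobian of F at x,
% with increments h_j = sqrt(eps)*max(|x_j|,M_j)*sign(x_j)
n = length(x); Fx = F(x); J = zeros(n,n);
for j = 1:n
  h = sqrt(eps)*max(abs(x(j)), M(j))*sign(x(j));
  if h == 0, h = sqrt(eps)*M(j); end     % sign(0)=0: guard the case x(j)=0
  e = zeros(n,1); e(j) = 1;              % j-th vector of the canonical basis
  J(:,j) = (F(x + h*e) - Fx)/h;          % j-th column of (7.9)
end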

7.1.3 Quasi-Newton Methods
By this term, we denote all those schemes in which globally convergent methods are coupled with Newton-like methods that are only locally convergent, but with an order greater than one.
In a quasi-Newton method, given a continuously differentiable function F : R^n → R^n and an initial value x^(0) ∈ R^n, at each step k one has to accomplish the following operations:

1. compute F(x^(k));
2. choose \tilde{J}_F(x^(k)) as being either the exact J_F(x^(k)) or an approximation of it;
3. solve the linear system \tilde{J}_F(x^(k)) δx^(k) = −F(x^(k));
4. set x^(k+1) = x^(k) + α_k δx^(k), where α_k are suitable damping parameters.

Step 4. is thus the characterizing element of this family of methods. It will be addressed in Section 7.2.6, where a criterion for selecting the "direction" δx^(k) will be provided.
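A minimal MATLAB sketch of this template might read as follows. The function name, the function handles F and Jtilde, and the simple residual-halving choice of α_k are illustrative assumptions; the actual selection of α_k is the subject of Section 7.2.6.

function [x, it] = quasinewton(F, Jtilde, x, nmax, toll)
% Template quasi-Newton iteration implementing steps 1-4 above.
% F, Jtilde: function handles returning F(x) and an (approximate) Jacobian
it = 0; fx = F(x);
while it < nmax & norm(fx) > toll
  dx = -Jtilde(x) \ fx;                  % steps 2-3: Newton-like direction
  alpha = 1;                             % step 4: halve the damping until
  while norm(F(x + alpha*dx)) >= norm(fx) & alpha > 1e-4
    alpha = alpha/2;                     %   the residual norm decreases
  end
  x = x + alpha*dx; fx = F(x); it = it + 1;
end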


7.1.4 Secant-like Methods
These methods are constructed starting from the secant method introduced in Section 6.2 for scalar functions. Precisely, given two vectors x^(0) and x^(1), at the generic step k ≥ 1 we solve the linear system

    Q_k \, \delta x^{(k+1)} = -F(x^{(k)})    (7.11)

and we set x^(k+1) = x^(k) + δx^(k+1). Q_k is an n × n matrix such that

    Q_k \, \delta x^{(k)} = F(x^{(k)}) - F(x^{(k-1)}) = b^{(k)}, \qquad k \geq 1,

and is obtained by a formal generalization of (6.13). However, the algebraic relation above does not suffice to uniquely determine Q_k. For this purpose we require Q_k for k ≥ n to be a solution to the following set of n systems

    Q_k \left( x^{(k)} - x^{(k-j)} \right) = F(x^{(k)}) - F(x^{(k-j)}), \qquad j = 1, \ldots, n.    (7.12)

If the vectors x^(k−j), ..., x^(k) are linearly independent, system (7.12) allows for calculating all the unknown coefficients {(Q_k)_{lm}, l, m = 1, ..., n} of Q_k. Unfortunately, in practice the above vectors tend to become linearly dependent and the resulting scheme is unstable, not to mention the need for storing all the previous n iterates.
For these reasons, an alternative approach is pursued which aims at preserving the information already provided by the method at step k. Precisely, Q_k is looked for in such a way that the difference between the following linear approximants to F at x^(k) and x^(k−1), respectively,

    F(x^{(k)}) + Q_k (x - x^{(k)}), \qquad F(x^{(k-1)}) + Q_{k-1} (x - x^{(k-1)}),

is minimized, jointly with the constraint that Q_k satisfies system (7.12). Using (7.12) with j = 1, the difference between the two approximants is found to be

    d_k = (Q_k - Q_{k-1}) \left( x - x^{(k-1)} \right).    (7.13)

Let us decompose the vector x − x^(k−1) as

    x - x^{(k-1)} = \alpha \, \delta x^{(k)} + s,

where α ∈ R and s^T δx^(k) = 0. Therefore, (7.13) becomes

    d_k = \alpha \, (Q_k - Q_{k-1}) \, \delta x^{(k)} + (Q_k - Q_{k-1}) \, s.

Only the second term in the relation above can be minimized, since the first one is independent of Q_k, being

    (Q_k - Q_{k-1}) \, \delta x^{(k)} = b^{(k)} - Q_{k-1} \, \delta x^{(k)}.

The problem has thus become: find the matrix Q_k such that ‖(Q_k − Q_{k−1}) s‖ is minimized for all s orthogonal to δx^(k), with the constraint that (7.12) holds. It can be shown that such a matrix exists and can be recursively computed as follows

    Q_k = Q_{k-1} + \frac{ \left( b^{(k)} - Q_{k-1} \, \delta x^{(k)} \right) \left( \delta x^{(k)} \right)^T }{ \left( \delta x^{(k)} \right)^T \delta x^{(k)} }.    (7.14)
The method (7.11), with the choice (7.14) of the matrix Q_k, is known as the Broyden method. To initialize (7.14), we set Q_0 equal to the matrix J_F(x^(0)) or to any approximation of it, for instance, the one yielded by (7.9). As for the convergence of Broyden's method, the following result holds.

Property 7.2 If the assumptions of Theorem 7.1 are satisfied and there exist two positive constants ε and γ such that

    \|x^{(0)} - x^*\| \leq \varepsilon, \qquad \|Q_0 - J_F(x^*)\| \leq \gamma,

then the sequence of vectors x^(k) generated by Broyden's method is well defined and converges superlinearly to x*, that is

    \|x^{(k)} - x^*\| \leq c_k \|x^{(k-1)} - x^*\|,    (7.15)

where the constants c_k are such that \lim_{k \to \infty} c_k = 0.

Under further assumptions, it is also possible to prove that the sequence Q_k converges to J_F(x*), a property that does not necessarily hold for the above method, as demonstrated in Example 7.3.
There exist several variants of Broyden's method which aim at reducing its computational cost, but are usually less stable (see [DS83], Chapter 8).

Program 58 implements Broyden's method (7.11)-(7.14). We have denoted by Q the initial approximation Q_0 in (7.14).

Program 58 - broyden : Broyden's method for nonlinear systems
function [x,it]=broyden(x,Q,nmax,toll,f)
% f is a string matrix: f(i,:) contains the expression of the i-th
% component of F, written in terms of the entries x(1),...,x(n)
[n,m]=size(f); it=0; err=1;
fk=zeros(n,1); fk1=fk;
for i=1:n, fk(i)=eval(f(i,:)); end
while it < nmax & err > toll
  s=-Q \ fk; x=s+x; err=norm(s,inf);
  if err > toll
    for i=1:n, fk1(i)=eval(f(i,:)); end
    % Broyden update (7.14): since Q*s=-fk, b-Q*s reduces to fk1
    Q=Q+1/(s'*s)*fk1*s';
  end
  it=it+1; fk=fk1;
end

Example 7.2 Let us solve using Broyden's method the nonlinear system of Example 7.1. The method converges in 35 iterations to the value (0.7 · 10^{−8}, 0.7 · 10^{−8})^T, compared with the 26 iterations required by Newton's method starting from the same initial guess (x^(0) = (0.1, 0.1)^T). The matrix Q_0 has been set equal to the Jacobian matrix evaluated at x^(0). Figure 7.1 shows the behavior of the Euclidean norm of the error for both methods. •


Example 7.3 Suppose we wish to solve using the Broyden method the nonlinear system F(x) = (x_1 + x_2 − 3; x_1^2 + x_2^2 − 9)^T = 0. This system admits the two solutions (0, 3)^T and (3, 0)^T. Broyden's method converges in 8 iterations to the solution (0, 3)^T starting from x^(0) = (2, 4)^T. However, the sequence of Q_k, stored in the variable Q of Program 58, does not converge to the Jacobian matrix, since

    \lim_{k \to \infty} Q^{(k)} = \begin{bmatrix} 1 & 1 \\ 1.5 & 1.75 \end{bmatrix} \neq J_F[(0, 3)^T] = \begin{bmatrix} 1 & 1 \\ 0 & 6 \end{bmatrix}. \qquad •
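As an illustration of how Program 58 is invoked, the data of this example might be set up as follows. This is a sketch only: the components of F are encoded as a string matrix, as the program expects; initializing Q with the Jacobian at x^(0), as was done in Example 7.2, and the value of the tolerance are assumptions made for illustration.

f  = char('x(1)+x(2)-3', 'x(1)^2+x(2)^2-9');   % F as a string matrix
x0 = [2; 4];                                   % initial guess of Example 7.3
Q0 = [1 1; 2*x0(1) 2*x0(2)];                   % Jacobian of F evaluated at x0
[x, it] = broyden(x0, Q0, 100, 1.e-10, f);     % x should approach (0,3)'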




7.1.5 Fixed-point Methods
We conclude the analysis of methods for solving systems of nonlinear equations by extending to n dimensions the fixed-point techniques introduced in the scalar case. For this, we reformulate problem (7.1) as:

    given G : R^n → R^n, find x* ∈ R^n such that G(x*) = x*.    (7.16)


FIGURE 7.1. Euclidean norm of the error for the Newton method (solid line)
and the Broyden method (dashed line) in the case of the nonlinear system of
Example 7.1

where G is related to F through the following property: if x* is a fixed point of G, then F(x*) = 0.
Analogously to what was done in Section 6.3, we introduce iterative methods for the solution of (7.16) of the form:

given x^(0) ∈ R^n, for k = 0, 1, ... until convergence, find

    x^{(k+1)} = G(x^{(k)}).    (7.17)

In order to analyze the convergence of the fixed-point iteration (7.17), the following definition will be useful.

Definition 7.1 A mapping G : D ⊂ R^n → R^n is contractive on a set D_0 ⊂ D if there exists a constant α < 1 such that ‖G(x) − G(y)‖ ≤ α ‖x − y‖ for all x, y in D_0, where ‖·‖ is a suitable vector norm.
The existence and uniqueness of a fixed point for G is ensured by the following theorem.

Theorem 7.2 (contraction-mapping theorem) Suppose that G : D ⊂ R^n → R^n is contractive on a closed set D_0 ⊂ D and that G(x) ∈ D_0 for all x ∈ D_0. Then G has a unique fixed point in D_0.
Proof. Let us first prove the uniqueness of the fixed point. For this, assume that there exist two distinct fixed points x*, y*. Then

    \|x^* - y^*\| = \|G(x^*) - G(y^*)\| \leq \alpha \|x^* - y^*\|,

from which (1 − α) ‖x* − y*‖ ≤ 0. Since (1 − α) > 0, it must necessarily be that ‖x* − y*‖ = 0, i.e., x* = y*.

To prove the existence we show that x^(k) given by (7.17) is a Cauchy sequence. This in turn implies that x^(k) is convergent to a point x^(*) ∈ D_0. Take x^(0) arbitrarily in D_0. Then, since the image of G is included in D_0, the sequence x^(k) is well defined and

    \|x^{(k+1)} - x^{(k)}\| = \|G(x^{(k)}) - G(x^{(k-1)})\| \leq \alpha \|x^{(k)} - x^{(k-1)}\|.

After p steps, p ≥ 1, we obtain

    \|x^{(k+p)} - x^{(k)}\| \leq \sum_{i=1}^{p} \|x^{(k+i)} - x^{(k+i-1)}\| \leq \left( \alpha^{p-1} + \ldots + 1 \right) \|x^{(k+1)} - x^{(k)}\| \leq \frac{\alpha^k}{1 - \alpha} \|x^{(1)} - x^{(0)}\|.

Owing to the continuity of G, it follows that \lim_{k \to \infty} G(x^{(k)}) = G(x^{(*)}), which proves that x^(*) is a fixed point for G. □

The following result provides a sufficient condition for the iteration (7.17) to converge (for the proof see [OR70], pp. 299-301), and extends the analogous Theorem 6.3 in the scalar case.

Property 7.3 Suppose that G : D ⊂ R^n → R^n has a fixed point x* in the interior of D and that G is continuously differentiable in a neighborhood of x*. Denote by J_G the Jacobian matrix of G and assume that ρ(J_G(x*)) < 1. Then there exists a neighborhood S of x* such that S ⊂ D and, for any x^(0) ∈ S, the iterates defined by (7.17) all lie in D and converge to x*.

As usual, since the spectral radius is the infimum of the induced matrix norms, in order for convergence to hold it suffices to check that ‖J_G(x)‖ < 1 for some matrix norm.

Example 7.4 Consider the nonlinear system

    F(x) = \left( x_1^2 + x_2^2 - 1, \; 2x_1 + x_2 - 1 \right)^T = 0,

whose solutions are x_1^* = (0, 1)^T and x_2^* = (4/5, −3/5)^T. To solve it, let us use two fixed-point schemes, respectively defined by the following iteration functions

    G_1(x) = \begin{bmatrix} \dfrac{1 - x_2}{2} \\[4pt] \sqrt{1 - x_1^2} \end{bmatrix}, \qquad G_2(x) = \begin{bmatrix} \dfrac{1 - x_2}{2} \\[4pt] -\sqrt{1 - x_1^2} \end{bmatrix}.    (7.18)

It can be checked that G_i(x_i^*) = x_i^* for i = 1, 2 and that the Jacobian matrices of G_1 and G_2, evaluated at x_1^* and x_2^* respectively, are

    J_{G_1}(x_1^*) = \begin{bmatrix} 0 & -\dfrac{1}{2} \\[2pt] 0 & 0 \end{bmatrix}, \qquad J_{G_2}(x_2^*) = \begin{bmatrix} 0 & -\dfrac{1}{2} \\[2pt] \dfrac{4}{3} & 0 \end{bmatrix}.

The spectral radii are ρ(J_{G_1}(x_1^*)) = 0 and ρ(J_{G_2}(x_2^*)) = \sqrt{2/3} ≃ 0.817 < 1 (the eigenvalues of J_{G_2}(x_2^*) satisfy λ^2 = −2/3), so that both methods are convergent in a suitable neighborhood of their respective fixed points.
Running Program 59, with a tolerance of 10^{−10} on the maximum absolute difference between two successive iterates, the first scheme converges to x_1^* in 9 iterations, starting from x^(0) = (−0.9, 0.9)^T, while the second one converges to x_2^* in 115 iterations, starting from x^(0) = (0.9, 0.9)^T. The dramatic change in the convergence behavior of the two methods can be explained in view of the difference between the spectral radii of the corresponding iteration matrices. •

Remark 7.1 Newton's method can be regarded as a fixed-point method with iteration function

    G_N(x) = x - J_F^{-1}(x) F(x).    (7.19)

If we denote by r^(k) = F(x^(k)) the residual at step k, from (7.19) it turns out that Newton's method can be alternatively formulated as

    \left[ I - J_{G_N}(x^{(k)}) \right] \left( x^{(k+1)} - x^{(k)} \right) = -r^{(k)}.

This equation allows us to interpret Newton's method as a preconditioned stationary Richardson method. This prompts introducing a parameter α_k in order to accelerate the convergence of the iteration

    \left[ I - J_{G_N}(x^{(k)}) \right] \left( x^{(k+1)} - x^{(k)} \right) = -\alpha_k r^{(k)}.

The problem of how to select α_k will be addressed in Section 7.2.6.
An implementation of the fixed-point method (7.17) is provided in Program 59. We have denoted by dim the size of the nonlinear system and by Phi the variable containing the functional expressions of the iteration function G. In output, the vector alpha contains the approximations of the sought zero of F and the vector res contains the sequence of the maximum norms of the residuals F(x^(k)).

Program 59 - fixposys : Fixed-point method for nonlinear systems
function [alpha, res, nit]=fixposys(dim, x0, nmax, toll, Phi, F)
% Phi and F are string matrices: Phi(k,:) and F(k,:) contain the expressions
% of the k-th components of G and F in terms of the entries x(1),...,x(dim)
x = x0; alpha=[x']; res = 0;
for k=1:dim,
  r=abs(eval(F(k,:))); if (r > res), res = r; end
end;
nit = 0; residual(1)=res;
while ((nit <= nmax) & (res >= toll)),
  nit = nit + 1;
  for k = 1:dim, xnew(k) = eval(Phi(k,:)); end
  x = xnew; res = 0; alpha=[alpha;x]; x=x';
  for k = 1:dim,
    r = abs(eval(F(k,:)));
    if (r > res), res=r; end,
  end
  residual(nit+1)=res;
end
res=residual';
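For instance, the first scheme of Example 7.4 might be run as follows (a sketch following the string-matrix convention of the program, with the tolerance and initial guess of the example):

Phi = char('(1-x(2))/2', 'sqrt(1-x(1)^2)');       % iteration function G1 of (7.18)
F   = char('x(1)^2+x(2)^2-1', '2*x(1)+x(2)-1');   % the system of Example 7.4
[alpha, res, nit] = fixposys(2, [-0.9; 0.9], 100, 1.e-10, Phi, F);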
