
In this section we use the l2 norm to measure nonlinear residuals. The
reason for this is that the Krylov methods use the scalar product and the
estimates in Chapters 2 and 3 are in terms of the Euclidean norm. In the
examples discussed in § 6.4 we will scale the l2 norm by a factor of 1/√N so
that the results for the differential and integral equations will be independent
of the computational mesh.




6.2.1. Newton GMRES. We provide a detailed analysis of the Newton-
GMRES iteration. We begin by discussing the effects of a forward difference
approximation to the action of the Jacobian on a vector.
If the linear iterative method is any one of the Krylov subspace methods
discussed in Chapters 2 and 3, then each inner iteration requires at least one
evaluation of the action of F'(x_c) on a vector. In the case of CGNR and CGNE,
an evaluation of the action of F'(x_c)^T on a vector is also required. In many
implementations [20], [24], the action of F'(x_c) on a vector w is approximated
by a forward difference, (5.15), D_h F(x : w) for some h. It is important to note
that this is entirely different from forming the finite difference Jacobian ∇_h F(x)
and applying that matrix to w. In fact, as pointed out in [20], application
of GMRES to the linear equation for the Newton step with matrix-vector
products approximated by finite differences is the same as the application of
GMRES to the matrix G_h F(x) whose last k − 1 columns are the vectors

v_k = D_h F(x : v_{k−1}).

The sequence {v_k} is formed in the course of the forward-difference GMRES
iteration.
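The difference D_h F(x : w) costs only one new evaluation of F when F(x) is
already available. A minimal MATLAB sketch of such a forward-difference
directional derivative is given below. It is in the spirit of the code dirder from
the collection, but the name fd_dirder, the explicit step argument h, and the
handling of the x = 0 case are our own illustrative choices and not the book's
code.

function z = fd_dirder(x, w, f, f0, h)
% FD_DIRDER  Approximate the action of F'(x) on w by a forward difference.
% x, w: current point and direction; f: function (handle or name) for F;
% f0 = f(x), passed in to avoid an extra evaluation of F; h: difference step.
if norm(w) == 0
    z = zeros(size(x));         % the directional derivative along w = 0 is 0
    return
end
epsnew = h*norm(x)/norm(w);     % step scaled by ||x||_2, as in (5.15)
if norm(x) == 0
    epsnew = h/norm(w);         % fall back to an unscaled step when x = 0
end
f1 = feval(f, x + epsnew*w);
z = (f1 - f0)/epsnew;           % first order accurate in h
end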
To illustrate this point, we give the forward-difference GMRES algorithm
for computation of the Newton step, s = −F'(x)^{−1} F(x). Note that matrix-
vector products are replaced by forward difference approximations to F'(x),
that a sequence of approximate steps {s_k} is produced, that b = −F(x), and
that the initial iterate for the linear problem is the zero vector. We give a
MATLAB implementation of this algorithm in the collection of MATLAB codes.
Algorithm 6.2.1. fdgmres(s, x, F, h, η, kmax, ρ)

1. s = 0, r = −F(x), v_1 = r/‖r‖_2, ρ = ‖r‖_2, β = ρ, k = 0

2. While ρ > η‖F(x)‖_2 and k < kmax do

   (a) k = k + 1
   (b) v_{k+1} = D_h F(x : v_k)
       for j = 1, . . . , k
          i. h_{jk} = v_{k+1}^T v_j
          ii. v_{k+1} = v_{k+1} − h_{jk} v_j
   (c) h_{k+1,k} = ‖v_{k+1}‖_2
   (d) v_{k+1} = v_{k+1}/‖v_{k+1}‖_2
   (e) e_1 = (1, 0, . . . , 0)^T ∈ R^{k+1}
       Minimize ‖βe_1 − H_k y^k‖_{R^{k+1}} to obtain y^k ∈ R^k.
   (f) ρ = ‖βe_1 − H_k y^k‖_{R^{k+1}}.

3. s = V_k y^k.
In our MATLAB implementation we solve the least squares problem in
step 2e by the Givens rotation approach used in Algorithm gmres.
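For comparison with the statement of Algorithm 6.2.1, the following MATLAB
fragment is a minimal sketch of the same loop, not the book's fdgmres code.
It uses the fd_dirder helper sketched above and solves the least squares
problem of step 2e with backslash rather than the Givens rotations used in
the MATLAB collection.

function [s, rho] = fdgmres_sketch(x, f, h, eta, kmax)
% FDGMRES_SKETCH  Forward-difference GMRES for the Newton step,
% an approximate solution of F'(x)s = -F(x).
f0 = feval(f, x);
s = zeros(size(x));
rho = norm(f0);
if rho == 0, return; end                % nothing to do for a zero residual
beta = rho;
V = -f0/rho;                            % v_1 = r/||r||_2 with r = -F(x)
H = [];
k = 0;
while rho > eta*norm(f0) && k < kmax
    k = k + 1;
    V(:,k+1) = fd_dirder(x, V(:,k), f, f0, h);   % v_{k+1} = D_h F(x : v_k)
    for j = 1:k                                  % modified Gram-Schmidt
        H(j,k) = V(:,k+1)'*V(:,j);
        V(:,k+1) = V(:,k+1) - H(j,k)*V(:,j);
    end
    H(k+1,k) = norm(V(:,k+1));          % a zero here is a lucky breakdown,
    V(:,k+1) = V(:,k+1)/H(k+1,k);       % which this sketch does not handle
    e1 = [beta; zeros(k,1)];            % beta*e_1 in R^{k+1}
    y = H(1:k+1,1:k)\e1;                % least squares solution y^k
    rho = norm(e1 - H(1:k+1,1:k)*y);    % new linear residual norm
end
if k > 0
    s = V(:,1:k)*y;                     % s = V_k y^k
end
end

In the Newton-GMRES setting the call would be s = fdgmres_sketch(x, f, 1e-7,
eta, kmax), with eta the current forcing term.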




As should be clear from inspection of Algorithm gmres, the difference
between ∇_h F and G_h F is that the columns are computed by taking directional
derivatives based on two different orthonormal bases: ∇_h F using the basis of
unit vectors in coordinate directions and G_h F the basis for the Krylov space K_k
constructed by Algorithm fdgmres. The action of G_h F on K_k^⊥, the orthogonal
complement of K_k, is not used by the algorithm and, for the purposes of
analysis, could be specified by the action of ∇_h F on any basis for K_k^⊥. Hence
G_h F is also first order accurate in h. Assuming that there is no error in the
evaluation of F we may summarize the observations above in the following
proposition, which is a special case of the more general results in [20].
Proposition 6.2.1. Let the standard assumptions hold. Let η ∈ (0, 1).
Then there are C_G, h̄, and δ such that if x ∈ B(δ), h ≤ h̄, and Algorithm
fdgmres terminates with k < kmax, then the computed step s satisfies

(6.13)              ‖F'(x)s + F(x)‖_2 < (η + C_G h)‖F(x)‖_2.


Proof. First let δ be small enough so that B(δ) ⊂ Ω and the conclusions
to Lemma 4.3.1 hold. Let {u_j} be any orthonormal basis for R^N such that
u_j = v_j for 1 ≤ j ≤ k, where {v_j}_{j=1}^k is the orthonormal basis for K_k
generated by Algorithm fdgmres.
Consider the linear map defined by its action on {u_j}

B u_j = D_h F(x : u_j).

Note that

D_h F(x : u_j) = ∫_0^1 F'(x + t h‖x‖_2 u_j) u_j dt

              = F'(x) u_j + ∫_0^1 (F'(x + t h‖x‖_2 u_j) − F'(x)) u_j dt.

Since F' is Lipschitz continuous and {u_j} is an orthonormal basis for R^N we
have, with γ̄ = γ(‖x*‖_2 + δ),

(6.14)              ‖B − F'(x)‖_2 ≤ h‖x‖_2 γ/2 ≤ h γ̄/2.


Since the linear iteration (fdgmres) terminates in k < kmax iterations, we
have, since B and G_h F agree on K_k,

‖Bs + F(x)‖_2 ≤ η‖F(x)‖_2

and therefore

(6.15)              ‖F'(x)s + F(x)‖_2 ≤ η‖F(x)‖_2 + h γ̄ ‖s‖_2/2.

Assume that

h γ̄ ≤ ‖F'(x*)^{−1}‖_2^{−1}/2.





Then Lemma 4.3.1 and (6.15) imply

‖F'(x*)^{−1}‖_2^{−1} ‖s‖_2/2 ≤ ‖F'(x)^{−1}‖_2^{−1} ‖s‖_2
                             ≤ ‖F'(x)s‖_2 ≤ (1 + η)‖F(x)‖_2 + h γ̄ ‖s‖_2/2.

Therefore,

(6.16)              ‖s‖_2 ≤ 4(1 + η)‖F'(x*)^{−1}‖_2 ‖F(x)‖_2.

Combining (6.16) with (6.15) completes the proof with
C_G = 4γ̄(1 + η)‖F'(x*)^{−1}‖_2.
Proposition 6.2.1 implies that a finite difference implementation will not
affect the performance of Newton-GMRES if the steps in the forward difference
approximation of the derivatives are sufficiently small. In an implementation,
therefore, we must specify not only the sequence {η_n}, but also the steps {h_n}
used to compute forward differences. Note also that Proposition 6.2.1 applies
to a restarted GMRES because upon successful termination (6.15) will hold.
We summarize our results so far in the analogs of Theorem 6.1.4 and
Theorem 6.1.2. The proofs are immediate consequences of Theorems 6.1.2,
6.1.4, and Proposition 6.2.1 and are left as (easy) exercises.
Theorem 6.2.1. Assume that the assumptions of Proposition 6.2.1 hold.
Then there are δ and σ̄ such that if x_0 ∈ B(δ) and the sequences {η_n} and
{h_n} satisfy

σ_n = η_n + C_G h_n ≤ σ̄

then the forward difference Newton-GMRES iteration

x_{n+1} = x_n + s_n

where s_n is computed by Algorithm fdgmres with arguments

(s_n, x_n, F, h_n, η_n, kmax, ρ)

converges q-linearly and s_n satisfies

(6.17)              ‖F'(x_n)s_n + F(x_n)‖_2 < σ_n ‖F(x_n)‖_2.

Moreover,
• if σ_n → 0 the convergence is q-superlinear, and
• if σ_n ≤ K_η ‖F(x_n)‖_2^p for some K_η > 0 the convergence is q-superlinear
with q-order 1 + p.
Theorem 6.2.2. Assume that the assumptions of Proposition 6.2.1 hold.
Then there is δ such that if x_0 ∈ B(δ) and the sequences {η_n} and {h_n} satisfy

σ_n = η_n + C_G h_n ∈ [0, σ̄]

with 0 < σ̄ < 1 then the forward difference Newton-GMRES iteration

x_{n+1} = x_n + s_n




where s_n is computed by Algorithm fdgmres with arguments

(s_n, x_n, F, h_n, η_n, kmax, ρ)

converges q-linearly with respect to ‖ · ‖_* and s_n satisfies (6.17). Moreover
• if σ_n → 0 the convergence is q-superlinear, and
• if σ_n ≤ K_η ‖F(x_n)‖_2^p for some K_η > 0 the convergence is q-superlinear
with q-order 1 + p.
In the case where h_n is approximately the square root of machine roundoff,
σ_n ≈ η_n if η_n is not too small and Theorem 6.2.2 states that the observed
behavior of the forward difference implementation will be the same as that of
an implementation with exact derivatives.

6.2.2. Other Newton-iterative methods. Other iterative methods may
be used as the solver for the linear equation for the Newton step. Newton-
multigrid [99] is one approach using a stationary iterative method. See also
[6], [111]. If F' is symmetric and positive definite, as it is in the context of
unconstrained minimization, [63], Newton-CG is a possible approach.
When storage is restricted or the problem is very large, GMRES may not
be practical. GMRES(m), for a small m, may not converge rapidly enough
to be useful. CGNR and CGNE offer alternatives, but require evaluation of
transpose-vector products. Bi-CGSTAB and TFQMR should be considered in
all cases where there is not enough storage for GMRES(m) to perform well.
If a transpose is needed, as it will be if CGNR or CGNE is used as the
iterative method, a forward difference formulation is not an option because
approximation of the transpose-vector product by differences is not possible.
Computation of ∇_h F(x) and its transpose analytically is one option, as is
differentiation of F automatically or by hand. The reader may explore this
further in Exercise 6.5.13.
The reader may also be interested in experimenting with the other Krylov
subspace methods described in § 3.6, [78], and [12].

6.3. Newton-GMRES implementation
We provide a MATLAB code nsolgm in the collection that implements
a forward-difference Newton-GMRES algorithm. This algorithm requires
different inputs than nsol. As input we must give the initial iterate, the
function, and the vector of termination tolerances, as we did for nsol. In
addition, we must provide a method for forming the sequence of forcing terms
{η_n}.
We adjust η as the iteration progresses with a variation of a choice from
[69]. This issue is independent of the particular linear solver and our discussion
is in the general inexact Newton method setting. Setting η to a constant for
the entire iteration is often a reasonable strategy as we see in § 6.4, but the
choice of that constant depends on the problem. If a constant η is too small,
much effort can be wasted in the initial stages of the iteration. The choice in



[69] is motivated by a desire to avoid such oversolving. Oversolving means
that the linear equation for the Newton step is solved to a precision far beyond
what is needed to correct the nonlinear iteration. As a measure of the degree
to which the nonlinear iteration approximates the solution we begin with

η_n^A = γ ‖F(x_n)‖_2^2 / ‖F(x_{n−1})‖_2^2,

where γ ∈ (0, 1] is a parameter. If η_n^A is uniformly bounded away from 1,
then setting η_n = η_n^A for n > 0 would guarantee q-quadratic convergence by
Theorem 6.1.1. In this way, the most information possible would be extracted
from the inner iteration. In order to specify the choice at n = 0 and bound
the sequence away from 1 we set

(6.18)     η_n^B =  η_max,                n = 0,
                    min(η_max, η_n^A),    n > 0.

In (6.18) the parameter η_max is an upper limit on the sequence {η_n}. In [69]
the choices γ = .9 and η_max = .9999 are used.
It may happen that η_n^B is small for one or more iterations while x_n is still
far from the solution. A method of safeguarding was suggested in [69] to avoid
volatile decreases in η_n. The idea is that if η_{n−1} is sufficiently large we do not
let η_n decrease by much more than a factor of η_{n−1}:

(6.19)     η_n^C =  η_max,                                  n = 0,
                    min(η_max, η_n^A),                      n > 0, γ η_{n−1}^2 ≤ .1,
                    min(η_max, max(η_n^A, γ η_{n−1}^2)),    n > 0, γ η_{n−1}^2 > .1.
The constant .1 is somewhat arbitrary. This safeguarding does improve the
performance of the iteration.
There is a chance that the final iterate will reduce ‖F‖ far beyond the
desired level and that the linear equation for the last step will be solved to
far more accuracy than is really needed. This oversolving on the final step
can be controlled by comparing the norm of the current nonlinear residual
‖F(x_n)‖ to the nonlinear residual norm at which the iteration would terminate

τ_t = τ_a + τ_r ‖F(x_0)‖

and bounding η_n from below by a constant multiple of τ_t/‖F(x_n)‖. The
algorithm nsolgm does this and uses

(6.20)              η_n = min(η_max, max(η_n^C, .5 τ_t/‖F(x_n)‖)).
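A minimal MATLAB sketch of this selection of η, collecting (6.18)-(6.20), is
given below. Here fnrm and fnrmo are the current and previous residual
norms, etaold is the previous forcing term, stop_tol is the termination level
τ_t measured in the same norm, and n is the outer iteration counter; the names
and the packaging into a single function are ours, not those of nsolgm.

function etanew = forcing_term(fnrm, fnrmo, etaold, stop_tol, n)
% FORCING_TERM  Select the forcing term eta_n following (6.18)-(6.20).
gamma  = 0.9;                   % the value of gamma recommended in [69]
etamax = 0.9999;                % upper limit on the sequence {eta_n}
if n == 0
    etanew = etamax;                              % eta_0 = eta_max
else
    etaA = gamma*(fnrm/fnrmo)^2;                  % eta_n^A
    if gamma*etaold^2 <= 0.1                      % safeguard (6.19)
        etanew = min(etamax, etaA);
    else
        etanew = min(etamax, max(etaA, gamma*etaold^2));
    end
end
% bound eta_n from below to limit oversolving on the final step, (6.20)
etanew = min(etamax, max(etanew, 0.5*stop_tol/fnrm));
end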
Exercise 6.5.9 asks the reader to modify nsolgm so that other choices of
the sequence {η_n} can be used.
We use dirder to approximate directional derivatives and use the default
value of h = 10^{−7}. In all the examples in this book we use the value γ = .9
as recommended in [69].




Algorithm 6.3.1. nsolgm(x, F, τ, η)

1. r_c = r_0 = ‖F(x)‖_2/√N

2. Do while ‖F(x)‖_2/√N > τ_r r_0 + τ_a

   (a) Select η.
   (b) fdgmres(s, x, F, η)
   (c) x = x + s
   (d) Evaluate F(x)
   (e) r_+ = ‖F(x)‖_2/√N, σ = r_+/r_c, r_c = r_+
   (f) If ‖F(x)‖_2/√N ≤ τ_r r_0 + τ_a exit.
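A minimal MATLAB sketch of this outer loop, built from the fdgmres_sketch
and forcing_term helpers above, might look as follows. The fixed inner
iteration limit kmax and the difference step h = 1e-7 are illustrative defaults;
the book's nsolgm code handles its inputs, outputs, and failure modes more
carefully.

function x = nsolgm_sketch(x, f, tau_r, tau_a)
% NSOLGM_SKETCH  Outer Newton-GMRES loop in the spirit of Algorithm nsolgm.
N = length(x);
kmax = 40;                          % limit on inner GMRES iterations
h = 1e-7;                           % forward difference step
fx = feval(f, x);
r0 = norm(fx)/sqrt(N);              % scaled l2 norm of F(x_0)
rc = r0;
stop_tol = tau_r*r0 + tau_a;        % scaled termination level
eta = 0.9999;                       % eta_0 = eta_max
n = 0;
while norm(fx)/sqrt(N) > stop_tol
    s = fdgmres_sketch(x, f, h, eta, kmax);        % inexact Newton step
    x = x + s;
    fx = feval(f, x);
    rplus = norm(fx)/sqrt(N);
    n = n + 1;
    eta = forcing_term(rplus, rc, eta, stop_tol, n);   % select eta_n
    rc = rplus;
end
end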


Note that the cost of an outer iterate is one evaluation of F to compute
the value at the current iterate and other evaluations of F to compute the
forward differences for each inner iterate. Hence if a large value of η can be
used, the cost of the entire iteration can be greatly reduced. If a maximum
of m GMRES iterations is needed (or GMRES(m) is used as a linear solver)
the storage requirements of Algorithm nsolgm are the m + 5 vectors x, F(x),
x + hv, F(x + hv), s, and the Krylov basis {v_k}_{k=1}^m.
Since GMRES forms the inner iterates and makes termination decisions
based on scalar products and l2 norms, we also terminate the outer iteration on
small l2 norms of the nonlinear residuals. However, because of our applications
to differential and integral equations, we scale the l2 norm of the nonlinear
residual by a factor of 1/√N so that constant functions will have norms that
are independent of the computational mesh.
For large and poorly conditioned problems, GMRES will have to be
restarted. We illustrate the consequences of this in § 6.4 and ask the reader to
make the necessary modifications to nsolgm in Exercise 6.5.9.
The preconditioners we use in § 6.4 are independent of the outer iteration.
Since the Newton steps for M F (x) = 0 are the same as those for F (x) = 0, it
is equivalent to precondition the nonlinear equation before calling nsolgm.
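For a preconditioner M that does not change during the iteration, this amounts
to composing M with F before the solve. A hypothetical MATLAB fragment,
in which M is assumed to be a function handle applying the preconditioner to
a vector, f evaluates F, and x0, tau_r, tau_a are as before, is:

mf = @(x) M(f(x));                        % Newton steps for M F(x) = 0 match those for F(x) = 0
x = nsolgm_sketch(x0, mf, tau_r, tau_a);  % solve the preconditioned problem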

6.4. Examples for Newton-GMRES
In this section we consider two examples. We revisit the H-equation and solve
a preconditioned nonlinear convection-diffusion equation.
In all the figures we plot the relative nonlinear residual ‖F(x_n)‖_2/‖F(x_0)‖_2
against the number of function evaluations required by all inner and outer
iterations to compute x_n. Counts of function evaluations corresponding to
outer iterations are indicated by circles. From these plots one can compare
not only the number of outer iterations, but also the total cost. This enables
us to directly compare the costs of different strategies for selection of η. Note
that if only a single inner iteration is needed, the total cost of the outer iteration
will be two function evaluations since F(x_c) will be known. One new function
evaluation will be needed to approximate the action of F'(x_c) on a vector


