A nice insight into the properties of this local estimator was given by

Rodr´±guez ([64], see also [35]), who compared it with the corresponding

residual estimator (4.9). Namely, neglecting in the residual estimator just

the residual part, i.e., setting

1 2

hE [νE · ∇uh ]E

˜2 and · 2 := ˜2

·K := ˜ ·K ,

0,E

2

K∈Th

E‚‚K\‚„¦

then the following result is true:

Theorem 4.3 There exist two constants c1 , c2 > 0 depending only on the

family of triangulations such that

c1 · ¤ · ¤ c2 · .

˜ ˜

The motivation for the method of gradient recovery is to be seen in the

fact that Rh uh possesses special convergence properties. Namely, under

certain assumptions the recovered gradient Rh uh converges asymptoti-

cally to ∇u faster than ∇uh does. In such a case Rh uh is said to be a

superconvergent approximation to ∇u.

If superconvergence holds, the simple decomposition

∇u ’ ∇uh = Rh uh ’ ∇uh + ∇u ’ Rh uh

demonstrates that the ¬rst di¬erence on the right-hand side represents

the asymptotically dominating, computable part of the gradient error

∇u ’ ∇uh . In other words, if we could de¬ne, for the class of problems

under consideration, a superconvergent gradient recovery Rh uh that is com-

putable with moderate expense, then the quantities · K and · de¬ned above

may serve as a tool for a posteriori error estimation.

194 4. Grid Generation and A Posteriori Error Estimation

Unfortunately, such superconvergence properties are valid only under

rather restrictive assumptions (especially with respect to the grid and to the

regularity of the weak solution). Thus it is di¬cult to obtain a full math-

ematical foundation in practice. Nevertheless, gradient recovery is often

applied and yields satisfactory results in many situations.

The following example, which is due to Repin [63], shows that a recovered

gradient does not have to re¬‚ect the real behaviour of the error.

Example 4.4 Consider the following boundary value problem for d = 1

and „¦ = (0, 1):

’u = f u(0) = u(1) ’ 1 = 0 .

in „¦ ,

If f is constant, the exact solution reads u(x) = x(2 + (1 ’ x)f )/2. Suppose

we have found the function vh = x as an approximate solution. For an

arbitrary partition of „¦ into subintervals, this function is piecewise linear

and it satis¬es the boundary conditions formulated above. Now let Rh be

an arbitrary gradient recovery operator that is able to reproduce at least

constants. Since vh = 1, we have vh ’ Rh vh = 0, whereas the real error is

vh ’ u = (x ’ 1 )f.

2

An interpretation of this e¬ect is that the function vh does not solve the

corresponding discrete (Galerkin) equations. But this property of uh is used

for the proof of superconvergence. This property also plays an important

role in the derivation of the residual error estimates, because the error

equation is used therein.

Dual-Weighted Residual Error Estimators

The aforementioned a posteriori error estimates have two disadvantages:

On the one hand, certain global constants, which are not known in general,

are part of the bounds. Typical examples are ±’1 in (4.6) and the constants

C1 , C2 in the equivalence relation (4.10). On the other hand, we obtained

√

scaling factors like hK and hE simply by using a particular approximation

operator.

In the following, we will outline a method that attempts to circumvent

these drawbacks. It is especially appropriate for the estimation of errors of

functionals depending linearly on the solution.

So let J : V ’ R denote a linear, continuous functional. We are interested

in an estimate of |J(u) ’ J(uh )|. Therefore, the following auxiliary dual

problem is considered:

Find w ∈ V such that a(v, w) = J(v) for all v ∈ V.

Taking v = u ’ uh , we get immediately

J(u) ’ J(uh ) = J(u ’ uh ) = a(u ’ uh , w) .

If wh ∈ Vh is an arbitrary element, the error equation (2.39) yields

J(u) ’ J(uh ) = a(u ’ uh , w ’ wh ) .

4.2. A Posteriori Error Estimates and Grid Adaptation 195

Obviously, the right-hand side is of the same structure as in the derivation

of the estimate (4.8). Consequently, by using the same arguments it follows

that

|J(u) ’ J(uh )| ¤ w ’ wh

rK (uh ) 0,K 0,K

K∈Th

[νE · ∇uh ]E w ’ wh

+ .

0,E

0,E

E∈Eh

In contrast to the previous approaches, here the norms of w ’ wh will not

be theoretically analyzed but numerically approximated. This can be done

by an approximation of the dual solution w. There are several (more or less

heuristic) ways to do this.

(1) Estimation of the approximation error: Here, the norms of w ’ wh

are estimated as in the case of residual error estimators. Since the

result depends on the unknown H 1 -seminorm of w, which is equiva-

lent to the L2 -norm of ∇w, the ¬nite element solution wh ∈ Vh of the

auxiliary problem is used to approximate ∇w. It is a great disadvan-

tage of this approach that again global constants enter in the ¬nal

estimate through the estimation of the approximation error. Further-

more, the discrete auxiliary problem is of similar complexity to that

of the original discrete problem.

(2) Higher-order discretizations of the auxiliary problem: The auxiliary

problem is solved numerically by using a method that is more accu-

rate than the original method to determine a solution in Vh . Then

w is replaced by that solution and wh ∈ Vh by an interpolant of

that solution. Unfortunately, since the discrete auxiliary problem is

of comparatively large dimension, this approach is rather expensive.

(3) Approximation by means of higher-order recovery: This method

works similarly to the approach described in the previous subsection;

w is replaced by an element that is recovered from the ¬nite element

solution wh ∈ Vh of the auxiliary problem. The recovered element

approximates w with higher order in both norms than wh does. This

method exhibits two problems: On the one hand, the auxiliary prob-

lem has to be solved numerically, and on the other hand, ensuring

the corresponding superconvergence properties may be di¬cult.

At the end of this section we want to mention how the method could be

used to estimate certain norms of the error. In the case where the norms

are induced by particular scalar products, there is a simple, formal way.

For example, for the L2 -norm we have

u ’ uh , u ’ uh 0

u ’ uh = .

0

u ’ uh 0

196 4. Grid Generation and A Posteriori Error Estimation

Keeping u and uh ¬xed, we get with the de¬nition

v, u ’ uh 0

J(v) :=

u ’ uh 0

a linear, continuous functional J : H 1 („¦) ’ R such that J(u) ’ J(uh ) =

u ’ uh 0 .

The practical di¬culty of this approach consists in the fact that to be

able to ¬nd the solution w of the auxiliary problem we have to know the

values of J, but they depend on the unknown element u ’ uh . The idea

of approximating these values immediately implies two problems: There is

additional expense, and the in¬‚uence of the approximation quality on the

accuracy of the obtained bounds has to be analyzed.

Exercises

4.4 Let „¦ ‚ R2 be a bounded domain with a polygonal, Lipschitz con-

1

tinuous boundary and V := H0 („¦). Now consider a V -elliptic, continuous

bilinear form a and a continuous linear form b. The problem

u∈V : a(u, v) = b(v) for all v ∈ V

is discretized using piecewise linear, continuous ¬nite elements. If Ei de-

notes the support of the nodal basis functions of Vh associated with the

vertex ai , show that the abstract local error indicators

a(e, v)

·i := sup

v

1

v∈H0 (Ei )

can be estimated by means of the solutions ei ∈ H0 (Ei ) of the local

1

boundary value problems

ei ∈ H0 (Ei ) : a(ei , v) = b(v) ’ a(uh , v) for all v ∈ H0 (Ei )

1 1

as follows (M and ± denote the constants appearing in the continuity and

ellipticity conditions on a):

± ei ¤ ·i ¤ M ei .

1

If necessary, the elements of H0 (Ei ) are extended by zero to the whole

domain „¦.

4.5 A linear polynomial on some triangle is uniquely de¬ned either by

its values at the vertices or by its values at the edge midpoints. For a

¬xed triangulation of a polygonally bounded, simply connected domain

„¦ ‚ R2 , there can be de¬ned two ¬nite element spaces by identifying

common degrees of freedom of adjacent triangles.

Exercises 197

(a) Show that the dimension of the space de¬ned by the degrees of free-

dom located at the vertices is less than the dimension of the other

space (provided that the triangulation consists of more than one

triangle).

(b) How can one explain this “loss of degrees of freedom”?

4.6 Denote by Th a triangulation of the domain „¦ ‚ Rd . Show that for

a function v : „¦ ’ R that is continuously di¬erentiable on each element

the jump [νE · ∇v]E of the normal derivative of v across an element edge

E does not depend on the orientation of the normal νE .

4.7 Let a regular family of triangulations (Th ) of a domain „¦ ‚ R2 be

given. Show that there exist constants C > 0 that depend only on the

family (Th ) such that

|v|2 ¤ for all v ∈ L2 („¦) ,

2

Cv 0

0,∆(K)

K∈Th

|v|2 ¤ for all v ∈ L2 („¦) .

2

Cv 0

0,∆(E)

E∈Eh

4.8 Let „¦ ‚ Rd be a bounded domain. Show that there are constants

C1 , C2 > 0 such that for all v ∈ H0 („¦),

1

C1 |v|1 ¤ v ¤ C2 |v|1 .

1

5

Iterative Methods

for Systems of Linear Equations

We consider again the system of linear equations

Ax = b (5.1)

with nonsingular matrix A ∈ Rm,m , right-hand side b ∈ Rm , and solution

x ∈ Rm . As shown in Chapters 2 and 3, such systems of equations arise

from ¬nite element discretizations of elliptic boundary value problems. The

matrix A is the sti¬ness matrix and thus sparse, as can be seen from (2.37).

A sparse matrix is vaguely a matrix with so many vanishing elements that

using this structure in the solution of (5.1) is advantageous. Taking advan-

tage of a band or hull structure was discussed in Section 2.5. More precisely,

if (5.1) represents a ¬nite element discretization, then it is not su¬cient to

know the properties of the solution method for a ¬xed m. It is on the con-

trary necessary to study a sequence of problems with growing dimension

m, as it appears by the re¬nement of a triangulation. In the strict sense we

understand by the notion sparse matrices a sequence of matrices in which

the number of nonzero elements per row is bounded independently of the

dimension. This is the case for the sti¬ness matrices due to (2.37) if the un-

derlying sequence of triangulations is regular in the sense of De¬nition 3.28,

for example. In ¬nite element discretizations of time-dependent problems

(Chapter 7) as well as in ¬nite volume discretizations (Chapter 6) systems

of equations of equal properties arise, so that the following considerations

can be also applied there.

The described matrix structure is best applied in iterative methods that

have the operation matrix — vector as an essential module, where either

the system matrix A or a matrix of similar structure derived from it is

5. Iterative Methods for Systems of Linear Equations 199

concerned. If the matrix is sparse in the strict sense, then O(m) elementary

operations are necessary. In particular, list-oriented storage schemes can be

of use, as pointed out in Section 2.5.

The e¬ort for the approximative solution of (5.1) by an iterative method

is determined by the number of elementary operations per iteration step

and the number of iterations k that are necessary in order to reach the

desired relative error level µ > 0, i.e., to meet the demand

x(k) ’ x ¤ µ x(0) ’ x . (5.2)

Here x(k) k is the sequence of iterates for the initial value x(0) , · a ¬xed

norm in Rm , and x = A’1 b the exact solution of (5.1).

For all methods to be discussed we will have linear convergence of the

kind

x(k) ’ x ¤ x(0) ’ x

k

(5.3)

with a contraction number with 0 < < 1, which in general depends on

the dimension m. To satisfy (5.2), k iterations are thus su¬cient, with

1 1

k≥ ln ln . (5.4)

µ

The computational e¬ort of a method obviously depends on the size of µ,

although this will be seen as ¬xed and only the dependence on the dimen-

sion m is considered: often µ will be omitted in the corresponding Landau™s

symbols. The methods di¬er therefore by their convergence behaviour, de-

scribed by the contraction number and especially by its dependence

on m (for speci¬c classes of matrices and boundary value problems). A

method is (asymptotically) optimal if the contraction numbers are bounded

independently of m:

(m) ¤ < 1. (5.5)

In this case the total e¬ort for a sparse matrix is O(m) elementary opera-

tions, as for a matrix — vector step. Of course, for a more exact comparison,

the corresponding constants, which also re¬‚ect the e¬ort of an iteration

step, have to be exactly estimated.

While direct methods solve the system of equations (5.1) with machine

precision, provided it is solvable in a stable manner, one can freely choose

the accuracy with iterative methods. If (5.1) is generated by the discretiza-

tion of a boundary value problem, it is recommended to solve it only with

that accuracy with which (5.1) approximates the boundary value prob-

lem. Asymptotic statements hereto have, among others, been developed in

(3.89), (7.129) and give an estimation of the approximation error by Ch± ,

with constants C, ± > 0, whereby h is the mesh size of the corresponding

triangulation. Since the constants in these estimates are usually unknown,

the error level can be adapted only asymptotically in m, in order to gain

200 5. Iterative Methods for Systems of Linear Equations

an algorithmic error of equal asymptotics compared to the error of ap-

proximation. Although this contradicts the above-described point of view

of a constant error level, it does not alter anything in the comparison of

the methods: The respective e¬ort always has to be multiplied by a fac-

tor O(ln m) if in d space dimensions m ∼ h’d is valid, and the relations

between the methods compared remain the same.

Furthermore, the choice of the error level µ will be in¬‚uenced by the

quality of the initial iterate. Generally, statements about the initial iterate

are only possible for special situations: For parabolic initial boundary value

problems (Chapter 7) and a one-step time discretization it is recommended

to use the approximation of the old time level as initial iterate. In the

case of a hierarchy of space discretizations, a nested iteration is possible

(Section 5.6), where the initial iterates will naturally result.

5.1 Linear Stationary Iterative Methods

5.1.1 General Theory

We begin with the study of the following class of a¬ne-linear iteration

functions,

¦(x) := M x + N b , (5.6)

with matrices M, N ∈ Rm,m to be speci¬ed later. By means of ¦ an iter-

ation sequence x(0) , x(1) , x(2) , . . . is de¬ned through a ¬xed-point iteration

x(k+1) := ¦ x(k) , k = 0, 1, . . . , (5.7)

from an initial approximation x(0) . Methods of this kind are called linear

stationary, because of their form (5.6) with a ¬xed iteration matrix M . The

function ¦ : Rm ’ Rm is continuous, so that in case of convergence of x(k)

for k ’ ∞, for the limit x we have

x = ¦(x) = M x + N b .

In order to achieve that the ¬xed-point iteration de¬ned by (5.6) is con-

sistent with Ax = b, i.e., each solution of (5.1) is also a ¬xed point, we

must require

A’1 b = M A’1 b + N b for arbitrary b ∈ Rm ,

i.e., A’1 = M A’1 + N , and thus

I = M + NA . (5.8)

On the other hand, if N is nonsingular, which will always be the case in

the following, then (5.8) also implies that a ¬xed point of (5.6) solves the

system of equations.

5.1. Linear Stationary Iterative Methods 201

Assuming the validity of (5.8), the ¬xed-point iteration for (5.6) can also

be written as

x(k+1) = x(k) ’ N Ax(k) ’ b , (5.9)

because

M x(k) + N b = (I ’ N A) x(k) + N b .

If N is nonsingular, we have additionally an equivalent form given by

W x(k+1) ’ x(k) = ’ Ax(k) ’ b (5.10)

with W := N ’1 . The correction x(k+1) ’x(k) for x(k) is given by the residual

g (k) := Ax(k) ’ b

through (5.9) or (5.10), possibly by solving a system of equations. In order

to compete with the direct method, the solution of (5.10) should require

one order in m fewer elementary operations. For dense matrices no more

operations than O(m2 ) should be necessary as are already necessary for

the calculation of g (k) . The same holds for sparse matrices, for example

band matrices. On the other side the method should converge, and that as

quickly as possible.

In the form (5.6) ¦ is Lipschitz continuous for a given norm · on

R with Lipschitz constant M , where · is a norm on Rm,m that is

m

consistent with the vector norm (see (A3.9)).

More precisely, for a consistent iteration the error

e(k) := x(k) ’ x ,

with x = A’1 b still denoting the exact solution, even satis¬es

e(k+1) = M e(k) ,

because (5.7) and (5.8) imply

e(k+1) = x(k+1) ’ x = M x(k) + N b ’ M x ’ N Ax = M e(k) . (5.11)

The spectral radius of M , that is, the maximum of the absolute values of

the (complex) eigenvalues of M , will be denoted by (M ).

The following general convergence theorem holds:

Theorem 5.1 A ¬xed-point iteration given by (5.6) to solve Ax = b is

globally and linearly convergent if

(M ) < 1 . (5.12)

· on Rm,m induced by a norm ·

This is satis¬ed if for a matrix norm

on Rm we have

M < 1. (5.13)

202 5. Iterative Methods for Systems of Linear Equations

If the consistency condition (5.8) holds and the matrix and vector norms

applied are consistent, then the convergence is monotone in the following

sense:

e(k+1) ¤ M e(k) . (5.14)

Proof: Assuming (5.12), then for µ = (1 ’ (M )) /2 > 0 there is a norm

· S on Rm such that the induced norm · S on Rm,m satis¬es

¤ (M ) + µ < 1

M S

(see [16, p. 34]). The function ¦ is a contraction with respect to this special

norm on Rm . Therefore, Banach™s ¬xed-point theorem (Theorem 8.4) can

be applied on X = (Rm , · S ), which ensures the global convergence of

the sequence x(k) k to a ¬xed point x of ¦.

¯

If (5.13) holds, ¦ is a contraction even with respect to the norm ·

on Rm , and M is the Lipschitz constant. Finally relation (5.14) follows

2

from (5.11).

In any case, we have convergence in any norm on Rm , since they are all

equivalent. Linear convergence for (5.12) holds only in the generally not

available norm · S with M S as contraction number.

As termination criterion for the concrete iteration methods to be

introduced, often

g (k) ¤ δ g (0) (5.15)

is used with a control parameter δ > 0, abbreviated as g (k) = 0. The

connection to the desired reduction of the relative error according to (5.2)

is given by