the computed approximations of an eigenvalue/eigenvector pair (», x) of

A. De¬ning the residual as

r = Ax ’ »x, x = 0,

it then follows that

r 2

min |» ’ »i | ¤ . (5.13)

x

»i ∈σ(A) 2

Proof. Since A is hermitian, it admits a system of orthonormal eigenvectors {uk }

n

which can be taken as a basis of C . In particular, x =

n

±i ui with ±i = uH x,

i

i=1

n

±i (»i ’ »)ui . As a consequence

and thus r =

i=1

n n

2

r 2

βi (»i ’ ») , with βi = |±k | /( |±j |2 ).

2 2

= (5.14)

x 2

i=1 j=1

n

3

Since βi = 1, the inequality (5.13) immediately follows from (5.14).

i=1

The estimate (5.13) ensures that a small absolute error corresponds to a

small relative residual in the computation of the eigenvalue of the matrix

A which is closest to ».

Let us now consider the following a posteriori estimate for the eigenvector

x (for the proof, see [IK66], pp. 142-143).

Property 5.6 Under the same assumptions of Theorem 5.5, suppose that

|»i ’ »| ¤ r 2 for i = 1, . . . , m and that |»i ’ »| ≥ δ > 0 for i =

m + 1, . . . , n. Then

r 2

d(x, Um ) ¤ (5.15)

δ

where d(x, Um ) is the Euclidean distance between x and the space Um gen-

erated by the eigenvectors ui , i = 1, . . . , m associated with the eigenvalues

»i of A.

192 5. Approximation of Eigenvalues and Eigenvectors

Notice that the a posteriori estimate (5.15) ensures that a small absolute

error corresponds to a small residual in the approximation of the eigenvec-

tor associated with the eigenvalue of A that is closest to », provided that

the eigenvalues of A are well-separated (that is, if δ is su¬ciently large).

In the general case of a nonhermitian matrix A, an a posteriori estimate

can be given for the eigenvalue » only when the matrix of the eigenvectors

of A is available. We have the following result (for the proof, we refer to

[IK66], p. 146).

Property 5.7 Let A ∈ Cn—n be a diagonalizable matrix, with matrix of

eigenvectors X = [x1 , . . . , xn ]. If, for some µ > 0,

¤ µ x 2,

r 2

then

min |» ’ »i | ¤ µ X’1 X 2.

2

»i ∈σ(A)

This estimate is of little practical use, since it requires the knowledge of all

the eigenvectors of A. Examples of a posteriori estimates that can actually

be implemented in a numerical algorithm will be provided in Sections 5.3.1

and 5.3.2.

5.3 The Power Method

The power method is very good at approximating the extremal eigenvalues

of the matrix, that is, the eigenvalues having largest and smallest module,

denoted by »1 and »n respectively, as well as their associated eigenvectors.

Solving such a problem is of great interest in several real-life applications

(geosysmic, machine and structural vibrations, electric network analysis,

quantum mechanics, . . . ) where the computation of »n (and its associated

eigenvector xn ) arises in the determination of the proper frequency (and

the corresponding fundamental mode) of a given physical system. We shall

come back to this point in Section 5.12.

Having approximations of »1 and »n can also be useful in the analysis of

numerical methods. For instance, if A is symmetric and positive de¬nite,

one can compute the optimal value of the acceleration parameter of the

Richardson method and estimate its error reducing factor (see Chapter

4), as well as perform the stability analysis of discretization methods for

systems of ordinary di¬erential equations (see Chapter 11).

5.3.1 Approximation of the Eigenvalue of Largest Module

Let A ∈ Cn—n be a diagonalizable matrix and let X ∈ Cn—n be the matrix of

its eigenvectors xi , for i = 1, . . . , n. Let us also suppose that the eigenvalues

5.3 The Power Method 193

of A are ordered as

|»1 | > |»2 | ≥ |»3 | . . . ≥ |»n |, (5.16)

where »1 has algebraic multiplicity equal to 1. Under these assumptions,

»1 is called the dominant eigenvalue of matrix A.

Given an arbitrary initial vector q(0) ∈ Cn of unit Euclidean norm, consider

for k = 1, 2, . . . the following iteration based on the computation of powers

of matrices, commonly known as the power method:

z(k) = Aq(k’1)

q(k) = z(k) / z(k) (5.17)

2

ν (k) = (q(k) )H Aq(k) .

Let us analyze the convergence properties of method (5.17). By induction

on k one can check that

Ak q(0)

k ≥ 1.

(k)

q = , (5.18)

Ak q(0) 2

This relation explains the role played by the powers of A in the method.

Because A is diagonalizable, its eigenvectors xi form a basis of Cn ; it is

thus possible to represent q(0) as

n

±i ∈ C,

(0)

q = ±i xi , i = 1, . . . , n. (5.19)

i=1

Moreover, since Axi = »i xi , we have

n k

±i »i

k (0)

±1 »k

Aq = x1 + xi , k = 1, 2, . . . (5.20)

1

± »1

i=2 1

Since |»i /»1 | < 1 for i = 2, . . . , n, as k increases the vector Ak q(0) (and

thus also q(k) , due to (5.18)), tends to assume an increasingly signi¬cant

component in the direction of the eigenvector x1 , while its components in

the other directions xj decrease. Using (5.18) and (5.20), we get

±1 »k (x1 + y(k) ) x1 + y(k)

1

(k)

q = = µk ,

±1 »k (x1 + y(k) ) 2 x1 + y(k) 2

1

where µk is the sign of ±1 »k and y(k) denotes a vector that vanishes as

1

k ’ ∞.

As k ’ ∞, the vector q(k) thus aligns itself along the direction of eigen-

vector x1 , and the following error estimate holds at each step k.

194 5. Approximation of Eigenvalues and Eigenvectors

Theorem 5.6 Let A ∈ Cn—n be a diagonalizable matrix whose eigenvalues

satisfy (5.16). Assuming that ±1 = 0, there exists a constant C > 0 such

that

k

»2

’ x1 ¤C k ≥ 1,

(k)

˜

q , (5.21)

2

»1

where

n k

q(k) Ak q(0) ±i »i

2

(k)

˜

q = = x1 + xi , k = 1, 2, . . . (5.22)

±1 »k ± »1

i=2 1

1

Proof. Since A is diagonalizable, without losing generality, we can pick up the

nonsingular matrix X in such a way that its columns have unit Euclidean length,

that is xi 2 = 1 for i = 1, . . . , n. From (5.20) it thus follows that

n n

k k

±i »i ±i »i

xi ’ x1

x1 + = xi

2 2

±1 »1 ±1 »1

i=2 i=2

1/2 1/2

n n

2 2k k 2

±i »i »2 ±i

¤ ¤ ,

±1 »1 »1 ±1

i=2 i=2

1/2

n

3

2

that is (5.21) with C = (±i /±1 ) .

i=2

Estimate (5.21) expresses the convergence of the sequence q(k) towards x1 .

˜

Therefore the sequence of Rayleigh quotients

H

((˜ (k) )H A˜ (k) )/ q(k) 2

= q(k) Aq(k) = ν (k)

˜

q q 2

will converge to »1 . As a consequence, limk’∞ ν (k) = »1 , and the conver-

gence will be faster when the ratio |»2 /»1 | is smaller.

If the matrix A is real and symmetric it can be proved, always assuming

that ±1 = 0, that (see [GL89], pp. 406-407)

2k

»2

|»1 ’ ν | ¤ |»1 ’ »n | tan (θ0 )

(k) 2

, (5.23)

»1

where cos(θ0 ) = |xT q(0) | = 0. Inequality (5.23) outlines that the conver-

1

gence of the sequence ν (k) to »1 is quadratic with respect to the ratio |»2 /»1 |

(we refer to Section 5.3.3 for numerical results).

We conclude the section by providing a stopping criterion for the iteration

(5.17). For this purpose, let us introduce the residual at step k

r(k) = Aq(k) ’ ν (k) q(k) , k ≥ 1,

H

and, for µ > 0, the matrix µE(k) = ’r(k) q(k) ∈ Cn—n with E(k) = 1.

2

Since

µE(k) q(k) = ’r(k) , k ≥ 1, (5.24)

5.3 The Power Method 195

we obtain A + µE(k) q(k) = ν (k) q(k) . As a result, at each step of the

power method ν (k) is an eigenvalue of the perturbed matrix A + µE(k) .

From (5.24) and from de¬nition (1.20) it also follows that µ = r(k) 2 for

k = 1, 2, . . . . Plugging this identity back into (5.10) and approximating the

partial derivative in (5.10) by the incremental ratio |»1 ’ ν (k) |/µ, we get

r(k) 2

|»1 ’ ν | k ≥ 1,

(k)

, (5.25)

| cos(θ» )|

where θ» is the angle between the right and the left eigenvectors, x1 and y1 ,

associated with »1 . Notice that, if A is an hermitian matrix, then cos(θ» ) =

1, so that (5.25) yields an estimate which is analogue to (5.13).

In practice, in order to employ the estimate (5.25) it is necessary at each

step k to replace | cos(θ» )| with the module of the scalar product between

two approximations q(k) and w(k) of x1 and y1 , computed by the power

method. The following a posteriori estimate is thus obtained

r(k) 2

|»1 ’ ν | k ≥ 1.

(k)

, (5.26)

|(w(k) )H q(k) |

Examples of applications of (5.26) will be provided in Section 5.3.3.

5.3.2 Inverse Iteration

In this section we look for an approximation of the eigenvalue of a matrix

A ∈ Cn—n which is closest to a given number µ ∈ C, where µ ∈ σ(A).

’1

For this, the power iteration (5.17) can be applied to the matrix (Mµ ) =

’1

(A ’ µI) , yielding the so-called inverse iteration or inverse power method.

The number µ is called a shift.

The eigenvalues of M’1 are ξi = (»i ’ µ)’1 ; let us assume that there

µ

exists an integer m such that

|»m ’ µ| < |»i ’ µ|, ∀i = 1, . . . , n and i = m. (5.27)

This amounts to requiring that the eigenvalue »m which is closest to µ has

multiplicity equal to 1. Moreover, (5.27) shows that ξm is the eigenvalue of

M’1 with largest module; in particular, if µ = 0, »m turns out to be the

µ

eigenvalue of A with smallest module.

Given an arbitrary initial vector q(0) ∈ Cn of unit Euclidean norm, for

k = 1, 2, . . . the following sequence is constructed:

(A ’ µI) z(k) = q(k’1)

q(k) = z(k) / z(k) (5.28)

2

σ (k) = (q(k) )H Aq(k) .

196 5. Approximation of Eigenvalues and Eigenvectors

Notice that the eigenvectors of Mµ are the same as those of A since Mµ =

X (Λ ’ µIn ) X’1 , where Λ = diag(»1 , . . . , »n ). For this reason, the Rayleigh

quotient in (5.28) is computed directly on the matrix A (and not on M’1 ). µ

The main di¬erence with respect to (5.17) is that at each step k a linear

system with coe¬cient matrix Mµ = A ’ µI must be solved. For numerical

convenience, the LU factorization of Mµ is computed once for all at k = 1,

so that at each step only two triangular systems are to be solved, with a

cost of the order of n2 ¬‚ops.

Although being more computationally expensive than the power method

(5.17), the inverse iteration has the advantage that it can converge to any

desired eigenvalue of A (namely, the one closest to the shift µ). Inverse iter-

ation is thus ideally suited for re¬ning an initial estimate µ of an eigenvalue

of A, which can be obtained, for instance, by applying the localization tech-

niques introduced in Section 5.1. Inverse iteration can be also e¬ectively

employed to compute the eigenvector associated with a given (approximate)

eigenvalue, as described in Section 5.8.1.

In view of the convergence analysis of the iteration (5.28) we assume that

A is diagonalizable, so that q(0) can be represented in the form (5.19).

Proceeding in the same way as in the power method, we let

n k

±i ξi

(k)

˜

q = xm + xi ,

±m ξm

i=1,i=m

where xi are the eigenvectors of M’1 (and thus also of A), while ±i are as

µ

in (5.19). As a consequence, recalling the de¬nition of ξi and using (5.27),

we get

lim q(k) = xm , lim σ (k) = »m .

˜

k’∞ k’∞

Convergence will be faster when µ is closer to »m . Under the same assump-

tions made for proving (5.26), the following a posteriori estimate can be

obtained for the approximation error on »m

r(k) 2

|»m ’ σ | k ≥ 1,

(k)

, (5.29)

|(w(k) )H q(k) |

where r(k) = Aq(k) ’ σ (k) q(k) and w(k) is the k-th iterate of the inverse

power method to approximate the left eigenvector associated with »m .

5.3.3 Implementation Issues

The convergence analysis of Section 5.3.1 shows that the e¬ectiveness of

the power method strongly depends on the dominant eigenvalues being

well-separated (that is, |»2 |/|»1 | 1). Let us now analyze the behavior of

iteration (5.17) when two dominant eigenvalues of equal module exist (that

is, |»2 | = |»1 |). Three cases must be distinguished:

5.3 The Power Method 197

1. »2 = »1 : the two dominant eigenvalues are coincident. The method

is still convergent, since for k su¬ciently large (5.20) yields

Ak q(0) »k (±1 x1 + ±2 x2 )

1

which is an eigenvector of A. For k ’ ∞, the sequence q(k) (after

˜

a suitable rede¬nition) converges to a vector lying in the subspace

spanned by the eigenvectors x1 and x2 , while the sequence ν (k) still

converges to »1 .

2. »2 = ’»1 : the two dominant eigenvalues are opposite. In this case

the eigenvalue of largest module can be approximated by applying the

power method to the matrix A2 . Indeed, for i = 1, . . . , n, »i (A2 ) =

2

[»i (A)] , so that »2 = »2 and the analysis falls into the previous case,

1 2

where the matrix is now A2 .

3. »2 = »1 : the two dominant eigenvalues are complex conjugate. Here,

undamped oscillations arise in the sequence of vectors q(k) and the

power method is not convergent (see [Wil65], Chapter 9, Section 12).

As for the computer implementation of (5.17), it is worth noting that nor-

malizing the vector q(k) to 1 keeps away from over¬‚ow (when |»1 | > 1) or

under¬‚ow (when |»1 | < 1) in (5.20). We also point out that the requirement

±1 = 0 (which is a priori impossible to ful¬l when no information about

the eigenvector x1 is available) is not essential for the actual convergence

of the algorithm.

Indeed, although it can be proved that, working in exact arithmetic, the

sequence (5.17) converges to the pair (»2 , x2 ) if ±1 = 0 (see Exercise 10),

the arising of (unavoidable) rounding errors ensures that in practice the

vector q(k) contains a non-null component also in the direction of x1 . This

allows for the eigenvalue »1 to “show-up” and the power method to quickly

converge to it.

An implementation of the power method is given in Program 26. Here

and in the following algorithm, the convergence check is based on the a

posteriori estimate (5.26).

Here and in the remainder of the chapter, the input data z0, toll and

nmax are the initial vector, the tolerance for the stopping test and the

maximum admissible number of iterations, respectively. In output, the vec-

tors nu1 and err contain the sequences {ν (k) } and { r(k) 2 /| cos(θ» )|} (see

(5.26)), whilst x1 and niter are the approximation of the eigenvector x1

and the number of iterations taken by the algorithm to converge, respec-

tively.