FIGURE 3.2. Partial pivoting by row (left) or complete pivoting (right). Shaded areas of the matrix are those involved in the search for the pivotal entry

Example 3.6 Let us consider the linear system Ax = b with

$$A = \begin{bmatrix} 10^{-13} & 1 \\ 1 & 1 \end{bmatrix}$$

and where b is chosen in such a way that x = (1, 1)^T is the exact solution. Suppose we use base 2 and 16 significant digits. GEM without pivoting would give x_MEG = (0.99920072216264, 1)^T, while GEM with partial pivoting furnishes the exact solution up to the 16th digit.
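The effect described in this example can be reproduced in standard double precision, which carries roughly 16 significant digits. The following fragment is an illustrative sketch of ours, not one of the book's programs; the helper name solve2x2 is invented.

```python
# 2x2 Gaussian elimination, with or without partial pivoting (a sketch).
def solve2x2(a11, a12, a21, a22, b1, b2, pivot):
    if pivot and abs(a21) > abs(a11):
        # partial pivoting: bring the entry of largest modulus to the top
        a11, a12, b1, a21, a22, b2 = a21, a22, b2, a11, a12, b1
    m = a21 / a11                 # multiplier
    a22 -= m * a12                # eliminate the (2,1) entry
    b2 -= m * b1
    x2 = b2 / a22                 # back substitution
    x1 = (b1 - a12 * x2) / a11
    return x1, x2

eps = 1.0e-13
b1, b2 = 1.0 + eps, 2.0           # right-hand side so that x = (1, 1)^T
x_np = solve2x2(eps, 1.0, 1.0, 1.0, b1, b2, pivot=False)  # no pivoting
x_pp = solve2x2(eps, 1.0, 1.0, 1.0, b1, b2, pivot=True)   # partial pivoting
```

Without pivoting the tiny pivot 10^{-13} amplifies the rounding error in x_2 by a factor 10^{13} during back substitution; with pivoting both components are accurate to machine precision.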

Let us analyze how partial pivoting affects the LU factorization induced by GEM. At the first step of GEM with partial pivoting, after finding the entry a_{r1} of maximum modulus in the first column, the elementary permutation matrix P_1 which exchanges the first row with the r-th row is constructed (if r = 1, P_1 is the identity matrix). Next, the first Gaussian transformation matrix M_1 is generated and we set A^{(2)} = M_1 P_1 A^{(1)}. A similar approach is now taken on A^{(2)}, searching for a new permutation matrix P_2 and a new matrix M_2 such that

A^{(3)} = M_2 P_2 A^{(2)} = M_2 P_2 M_1 P_1 A^{(1)}.

Executing all the elimination steps, the resulting upper triangular matrix U is now given by

U = A^{(n)} = M_{n-1} P_{n-1} · · · M_1 P_1 A^{(1)}.    (3.51)

Letting M = M_{n-1} P_{n-1} · · · M_1 P_1 and P = P_{n-1} · · · P_1, we obtain that U = MA and, thus, U = (MP^{-1})PA. It can easily be checked that the matrix L = PM^{-1} is unit lower triangular, so that the LU factorization reads

PA = LU.    (3.52)

One should not be worried by the presence of the inverse of M, since M^{-1} = P_1^{-1} M_1^{-1} · · · P_{n-1}^{-1} M_{n-1}^{-1}, P_i^{-1} = P_i^T, while M_i^{-1} = 2I_n − M_i.

Once L, U and P are available, solving the initial linear system amounts

to solving the triangular systems Ly = Pb and Ux = y. Notice that the

entries of the matrix L coincide with the multipliers computed by the LU factorization, without pivoting, when applied to the matrix PA.
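The construction above can be sketched in a few lines of code. The following pure-Python routine is an illustration of ours, not one of the book's MATLAB programs, and lu_partial_pivot is an invented name; it returns P as a permutation vector together with L and U, so that the identity PA = LU can be verified directly.

```python
# GEM with partial pivoting in the factored form PA = LU (a sketch).
def lu_partial_pivot(A):
    """Return (perm, L, U) with A[perm[i]] the i-th row of PA and PA = LU."""
    n = len(A)
    U = [row[:] for row in A]
    L = [[0.0] * n for _ in range(n)]
    perm = list(range(n))
    for k in range(n - 1):
        # pick the entry of largest modulus in column k (partial pivoting)
        r = max(range(k, n), key=lambda i: abs(U[i][k]))
        U[k], U[r] = U[r], U[k]
        L[k], L[r] = L[r], L[k]      # swap the multipliers computed so far too
        perm[k], perm[r] = perm[r], perm[k]
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]    # multiplier becomes an entry of L
            L[i][k] = m
            for j in range(k, n):
                U[i][j] -= m * U[k][j]
    for i in range(n):
        L[i][i] = 1.0                # L is unit lower triangular
    return perm, L, U
```

Once perm, L and U are available, Ax = b is solved through Ly = Pb and Ux = y, exactly as described above.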

If complete pivoting is performed, at the first step of the process, once the element a_{qr} of largest modulus in the submatrix A(1 : n, 1 : n) has been found, we must exchange the first row and column with the q-th row and the r-th column. This generates the matrix P_1 A^{(1)} Q_1, where P_1 and Q_1 are permutation matrices by rows and by columns, respectively.

As a consequence, the action of matrix M_1 is now such that A^{(2)} = M_1 P_1 A^{(1)} Q_1. Repeating the process, at the last step, instead of (3.51) we obtain

U = A^{(n)} = M_{n-1} P_{n-1} · · · M_1 P_1 A^{(1)} Q_1 · · · Q_{n-1}.

In the case of complete pivoting the LU factorization becomes

PAQ = LU,

where Q = Q_1 · · · Q_{n-1} is a permutation matrix accounting for all the permutations that have been operated. By construction, the matrix L is still lower triangular, with entries whose absolute values are less than or equal to 1. As happens in partial pivoting, the entries of L are the multipliers produced by the LU factorization process without pivoting, when applied to the matrix PAQ.

Program 9 is an implementation of the LU factorization with complete pivoting. For an efficient computer implementation of the LU factorization with partial pivoting, we refer to the MATLAB intrinsic function lu.

Program 9 - LUpivtot : LU factorization with complete pivoting

function [L,U,P,Q] = LUpivtot(A,n)
P = eye(n); Q = P; Minv = P;
for k = 1:n-1
  [Pk,Qk] = pivot(A,k,n); A = Pk*A*Qk;
  [Mk,Mkinv] = MGauss(A,k,n);
  A = Mk*A; P = Pk*P; Q = Q*Qk;
  Minv = Minv*Pk*Mkinv;
end
U = triu(A); L = P*Minv;

function [Mk,Mkinv] = MGauss(A,k,n)
Mk = eye(n);
for i = k+1:n, Mk(i,k) = -A(i,k)/A(k,k); end
Mkinv = 2*eye(n) - Mk;

function [Pk,Qk] = pivot(A,k,n)
[y,i] = max(abs(A(k:n,k:n))); [piv,jpiv] = max(y);
ipiv = i(jpiv); jpiv = jpiv+k-1; ipiv = ipiv+k-1;
Pk = eye(n); Pk(ipiv,ipiv) = 0; Pk(k,k) = 0; Pk(k,ipiv) = 1; Pk(ipiv,k) = 1;
Qk = eye(n); Qk(jpiv,jpiv) = 0; Qk(k,k) = 0; Qk(k,jpiv) = 1; Qk(jpiv,k) = 1;

Remark 3.3 The presence of large pivotal entries is not in itself sufficient to guarantee accurate solutions, as demonstrated by the following example (taken from [JM92]). For the linear system Ax = b

$$\begin{bmatrix} -4000 & 2000 & 2000 \\ 2000 & 0.78125 & 0 \\ 2000 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 400 \\ 1.3816 \\ 1.9273 \end{bmatrix}$$

at the first step the pivotal entry coincides with the diagonal entry −4000 itself. However, executing GEM on such a matrix yields the solution

x = [0.00096365, −0.698496, 0.90042329]^T,

whose first component drastically differs from that of the exact solution x = [1.9273, −0.698496, 0.9004233]^T. The cause of this behaviour should be ascribed to the wide variations among the system coefficients. Such cases can be remedied by a suitable scaling of the matrix (see Section 3.12.1).

Remark 3.4 (Pivoting for symmetric matrices) As already noticed, pivoting is not strictly necessary if A is symmetric and positive definite. A separate comment is deserved when A is symmetric but not positive definite, since pivoting could then destroy the symmetry of the matrix. This can be avoided by employing a complete pivoting of the form PAP^T, even though this pivoting can only amount to a reordering of the diagonal entries of A. As a consequence, the presence of small entries on the diagonal of A might inhibit the advantages of the pivoting. To deal with matrices of this kind, special algorithms are needed (like the Parlett-Reid method [PR70] or the Aasen method [Aas71]), for whose description we refer to [GL89], and to [JM92] for the case of sparse matrices.

3.6 Computing the Inverse of a Matrix

The explicit computation of the inverse of a matrix can be carried out using the LU factorization as follows. Denoting by X the inverse of a nonsingular matrix A ∈ R^{n×n}, the column vectors of X are the solutions to the linear systems Ax_i = e_i, for i = 1, . . . , n.

Supposing that PA = LU, where P is the partial pivoting permutation matrix, we must solve 2n triangular systems of the form

Ly_i = Pe_i,   Ux_i = y_i,   i = 1, . . . , n,

i.e., a succession of linear systems having the same coefficient matrix but different right-hand sides. The computation of the inverse of a matrix is a costly procedure which can sometimes be even less stable than MEG (see [Hig88]).
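The 2n triangular solves can be sketched as follows. This is a pure-Python illustration of ours, not one of the book's programs; invert_via_lu is an invented name, the multipliers of L are stored in place of A, and P is kept as a permutation vector.

```python
# Inverting A column by column: factor PA = LU once, then solve
# L y_i = P e_i and U x_i = y_i for each unit vector e_i (a sketch).
def invert_via_lu(A):
    n = len(A)
    M = [row[:] for row in A]      # will hold L (strict lower part) and U
    perm = list(range(n))
    for k in range(n - 1):         # GEM with partial pivoting
        r = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[r] = M[r], M[k]
        perm[k], perm[r] = perm[r], perm[k]
        for i in range(k + 1, n):
            M[i][k] /= M[k][k]     # multiplier, stored in place
            for j in range(k + 1, n):
                M[i][j] -= M[i][k] * M[k][j]
    cols = []
    for i in range(n):             # one column of the inverse per pair of solves
        e = [float(perm[r] == i) for r in range(n)]       # P e_i
        y = [0.0] * n
        for r in range(n):         # L y = P e_i (L is unit lower triangular)
            y[r] = e[r] - sum(M[r][c] * y[c] for c in range(r))
        x = [0.0] * n
        for r in range(n - 1, -1, -1):                    # U x = y
            x[r] = (y[r] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
        cols.append(x)
    # cols holds the columns x_i; transpose to obtain X = A^{-1}
    return [[cols[j][i] for j in range(n)] for i in range(n)]
```

Note that the factorization is performed only once; each extra right-hand side costs only the two triangular substitutions.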

An alternative approach for computing the inverse of A is provided by the Faddeev or Leverrier formula which, letting B_0 = I, recursively computes

α_k = (1/k) tr(AB_{k−1}),   B_k = −AB_{k−1} + α_k I,   k = 1, 2, . . . , n.

Since B_n = 0, if α_n ≠ 0 we get

A^{−1} = (1/α_n) B_{n−1},

and the computational cost of the method for a full matrix is equal to (n − 1)n^3 flops (for further details see [FF63], [Bar89]).
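The recursion transcribes directly into code; the following pure-Python sketch is ours (faddeev_inverse is an invented name). Each step costs one n × n matrix product, in agreement with the flop count quoted above.

```python
# Faddeev/Leverrier recursion for A^{-1} (a sketch): B_0 = I,
# alpha_k = tr(A B_{k-1}) / k, B_k = -A B_{k-1} + alpha_k I, k = 1, ..., n.
def faddeev_inverse(A):
    n = len(A)
    I = [[float(i == j) for j in range(n)] for i in range(n)]
    def matmul(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]
    B = I                                            # B_0 = I
    for k in range(1, n + 1):
        AB = matmul(A, B)
        alpha = sum(AB[i][i] for i in range(n)) / k  # alpha_k = tr(A B_{k-1}) / k
        Bprev = B                                    # keep B_{k-1}
        B = [[-AB[i][j] + (alpha if i == j else 0.0) for j in range(n)]
             for i in range(n)]                      # B_k = -A B_{k-1} + alpha_k I
    # after the loop, Bprev = B_{n-1} and alpha = alpha_n
    return [[Bprev[i][j] / alpha for j in range(n)] for i in range(n)]
```

For n = 2 one step reproduces the familiar formula A^{−1} = (tr(A)I − A)/det(A), since α_2 = det(A).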


3.7 Banded Systems

Discretization methods for boundary value problems often lead to solving linear systems with matrices having banded, block or sparse forms. Exploiting the structure of the matrix allows for a dramatic reduction in the computational costs of the factorization and of the substitution algorithms. In the present and forthcoming sections, we shall address special variants of MEG or LU factorization that are properly devised for dealing with matrices of this kind. For the proofs and a more comprehensive treatment, we refer to [GL89] and [Hig88] for banded or block matrices, while we refer to [JM92], [GL81] and [Saa96] for sparse matrices and the techniques for their storage.

The main result for banded matrices is the following.

Property 3.4 Let A ∈ R^{n×n}. Suppose that there exists an LU factorization of A. If A has upper bandwidth q and lower bandwidth p, then L has lower bandwidth p and U has upper bandwidth q.
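Property 3.4 can be checked numerically. The sketch below is ours, not the book's: it runs GEM without pivoting on a diagonally dominant matrix with lower bandwidth p = 2 and upper bandwidth q = 1, and verifies that no fill-in appears outside the band.

```python
# In-place LU without pivoting: multipliers of L end up below the diagonal,
# U on and above it (a sketch).
def lu_in_place(A):
    n = len(A)
    M = [row[:] for row in A]
    for k in range(n - 1):
        for i in range(k + 1, n):
            M[i][k] /= M[k][k]
            for j in range(k + 1, n):
                M[i][j] -= M[i][k] * M[k][j]
    return M

# banded test matrix with p = 2, q = 1 (diagonally dominant, so no pivoting needed)
n, p, q = 6, 2, 1
A = [[4.0 if i == j else
      -1.0 if 0 < i - j <= p or 0 < j - i <= q else 0.0
      for j in range(n)] for i in range(n)]
M = lu_in_place(A)
```

In exact arithmetic the entries outside the band are never touched by the elimination, which is precisely the content of Property 3.4.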

In particular, notice that the same memory area used for A is enough to also store its LU factorization. Consider, indeed, that a matrix A having upper bandwidth q and lower bandwidth p is usually stored in a (p + q + 1) × n matrix B, assuming that

b_{i−j+q+1, j} = a_{ij}

for all the indices i, j that fall into the band of the matrix. For instance, in the case of the tridiagonal matrix A = tridiag_5(−1, 2, −1) (where q = p = 1), the compact storage reads

$$B = \begin{bmatrix} 0 & -1 & -1 & -1 & -1 \\ 2 & 2 & 2 & 2 & 2 \\ -1 & -1 & -1 & -1 & 0 \end{bmatrix}.$$

The same format can be used for storing the factorization LU of A. It is clear that this storage format can be quite inconvenient in the case where only a few bands of the matrix are large. In the limit, if only one column and one row were full, we would have p = q = n − 1 and thus B would be a full matrix with a lot of zero entries.
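The mapping b_{i−j+q+1, j} = a_{ij} can be checked on the tridiagonal example above. The following sketch of ours uses 0-based indices, so the mapping becomes B[i − j + q][j] = A[i][j].

```python
# Building the (p+q+1) x n compact storage B for tridiag_5(-1, 2, -1),
# where p = q = 1 (a sketch, 0-based version of the book's mapping).
n, p, q = 5, 1, 1
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
      for j in range(n)] for i in range(n)]
B = [[0.0] * n for _ in range(p + q + 1)]
for i in range(n):
    for j in range(n):
        if -q <= i - j <= p:               # (i, j) falls inside the band
            B[i - j + q][j] = A[i][j]      # b_{i-j+q+1, j} = a_{ij}, 0-based
```

Row q of B carries the main diagonal, the rows above it the superdiagonals and the rows below it the subdiagonals, exactly as in the displayed matrix B.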

Finally, we notice that the inverse of a banded matrix is generally full

(as happens for the matrix A considered above).


3.7.1 Tridiagonal Matrices

Consider the particular case of a linear system with nonsingular tridiagonal matrix A given by

$$A = \begin{bmatrix} a_1 & c_1 & & 0 \\ b_2 & a_2 & \ddots & \\ & \ddots & \ddots & c_{n-1} \\ 0 & & b_n & a_n \end{bmatrix}.$$

In such an event, the matrices L and U of the LU factorization of A are bidiagonal matrices of the form

$$L = \begin{bmatrix} 1 & & & 0 \\ \beta_2 & 1 & & \\ & \ddots & \ddots & \\ 0 & & \beta_n & 1 \end{bmatrix}, \qquad U = \begin{bmatrix} \alpha_1 & c_1 & & 0 \\ & \alpha_2 & \ddots & \\ & & \ddots & c_{n-1} \\ 0 & & & \alpha_n \end{bmatrix}.$$

The coefficients α_i and β_i can easily be computed by the following relations

α_1 = a_1,   β_i = b_i / α_{i−1},   α_i = a_i − β_i c_{i−1},   i = 2, . . . , n.    (3.53)

This is known as the Thomas algorithm and can be regarded as a particular instance of the Doolittle factorization, without pivoting. When one is not interested in storing the coefficients of the original matrix, the entries α_i and β_i can be overwritten on A.

The Thomas algorithm can also be extended to solve the whole tridiagonal system Ax = f. This amounts to solving two bidiagonal systems Ly = f and Ux = y, for which the following formulae hold

(Ly = f)   y_1 = f_1,   y_i = f_i − β_i y_{i−1},   i = 2, . . . , n,    (3.54)

(Ux = y)   x_n = y_n / α_n,   x_i = (y_i − c_i x_{i+1}) / α_i,   i = n − 1, . . . , 1.    (3.55)

The algorithm requires only 8n − 7 flops: precisely, 3(n − 1) flops for the factorization (3.53) and 5n − 4 flops for the substitution procedure (3.54)-(3.55).
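Formulae (3.53)-(3.55) translate directly into code. The following pure-Python sketch is an illustration of ours (the book's programs are in MATLAB, and the name thomas is invented); a, b, c hold the diagonal, sub- and super-diagonal of A, with b[0] and c[n−1] unused.

```python
# Thomas algorithm: factorization (3.53), then the two bidiagonal
# substitutions (3.54)-(3.55); O(n) operations overall (a sketch).
def thomas(a, b, c, f):
    n = len(a)
    alpha = [0.0] * n
    beta = [0.0] * n
    alpha[0] = a[0]
    for i in range(1, n):                # factorization (3.53): 3(n-1) flops
        beta[i] = b[i] / alpha[i - 1]
        alpha[i] = a[i] - beta[i] * c[i - 1]
    y = [0.0] * n
    y[0] = f[0]
    for i in range(1, n):                # forward substitution (3.54)
        y[i] = f[i] - beta[i] * y[i - 1]
    x = [0.0] * n
    x[n - 1] = y[n - 1] / alpha[n - 1]
    for i in range(n - 2, -1, -1):       # backward substitution (3.55)
        x[i] = (y[i] - c[i] * x[i + 1]) / alpha[i]
    return x
```

For the matrix tridiag(−1, 2, −1) with right-hand side chosen so that x = (1, . . . , 1)^T, the routine recovers the exact solution up to rounding.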

As for the stability of the method, if A is a nonsingular tridiagonal matrix and L and U are the factors actually computed, then

|δA| ≤ (4u + 3u² + u³) |L| |U|,

where δA is implicitly defined by the relation A + δA = LU, while u is the roundoff unit. In particular, if A is also symmetric and positive definite or it is an M-matrix, we have

|δA| ≤ ((4u + 3u² + u³)/(1 − u)) |A|,

which implies the stability of the factorization procedure in such cases. A similar result holds even if A is diagonally dominant.

3.7.2 Implementation Issues

An implementation of the LU factorization for banded matrices is shown

in Program 10.

Program 10 - lu_band : LU factorization for a banded matrix

function [A] = lu_band(A,p,q)
[n,n] = size(A);
for k = 1:n-1
  for i = k+1:min(k+p,n), A(i,k) = A(i,k)/A(k,k); end
  for j = k+1:min(k+q,n)
    for i = k+1:min(k+p,n), A(i,j) = A(i,j) - A(i,k)*A(k,j); end
  end
end

In the case where n ≫ p and n ≫ q, this algorithm takes approximately 2npq flops, a considerable saving with respect to the case in which A is a full matrix.

Similarly, ad hoc versions of the substitution methods can be devised (see Programs 11 and 12). Their costs are, respectively, of the order of 2np flops and 2nq flops, always assuming that n ≫ p and n ≫ q.

Program 11 - forw_band : Forward substitution for a banded matrix L

function [b] = forw_band(L,p,b)
[n,n] = size(L);
for j = 1:n
  for i = j+1:min(j+p,n), b(i) = b(i) - L(i,j)*b(j); end
end

Program 12 - back_band : Backward substitution for a banded matrix U

function [b] = back_band(U,q,b)
[n,n] = size(U);
for j = n:-1:1
  b(j) = b(j)/U(j,j);
  for i = max(1,j-q):j-1, b(i) = b(i) - U(i,j)*b(j); end
end


The programs assume that the whole matrix is stored (including also the zero entries).

Concerning the tridiagonal case, the Thomas algorithm can be implemented in several ways. In particular, when implementing it on computers where divisions are more costly than multiplications, it is possible (and convenient) to devise a version of the algorithm without divisions in (3.54) and (3.55), by resorting to the following form of the factorization

$$A = LDM^T = \begin{bmatrix} \gamma_1^{-1} & & & 0 \\ b_2 & \gamma_2^{-1} & & \\ & \ddots & \ddots & \\ 0 & & b_n & \gamma_n^{-1} \end{bmatrix} \begin{bmatrix} \gamma_1 & & & 0 \\ & \gamma_2 & & \\ & & \ddots & \\ 0 & & & \gamma_n \end{bmatrix} \begin{bmatrix} \gamma_1^{-1} & c_1 & & 0 \\ & \gamma_2^{-1} & \ddots & \\ & & \ddots & c_{n-1} \\ 0 & & & \gamma_n^{-1} \end{bmatrix}.$$

The coefficients γ_i can be recursively computed by the formulae

γ_i = (a_i − b_i γ_{i−1} c_{i−1})^{−1},   i = 1, . . . , n,

where γ_0 = 0, b_1 = 0 and c_n = 0 have been assumed. The forward and backward substitution algorithms respectively read

(Ly = f)   y_1 = γ_1 f_1,   y_i = γ_i (f_i − b_i y_{i−1}),   i = 2, . . . , n,

(Ux = y)   x_n = y_n,   x_i = y_i − γ_i c_i x_{i+1},   i = n − 1, . . . , 1.    (3.56)

In Program 13 we show an implementation of the Thomas algorithm in the form (3.56), without divisions. The input vectors a, b and c contain the coefficients of the tridiagonal matrix {a_i}, {b_i} and {c_i}, respectively, while the vector f contains the components f_i of the right-hand side f.

Program 13 - mod_thomas : Thomas algorithm, modified version

function [x] = mod_thomas(a,b,c,f)
n = length(a); b = [0; b]; c = [c; 0];
gamma(1) = 1/a(1);
for i = 2:n, gamma(i) = 1/(a(i) - b(i)*gamma(i-1)*c(i-1)); end
y(1) = gamma(1)*f(1);
for i = 2:n, y(i) = gamma(i)*(f(i) - b(i)*y(i-1)); end
x(n) = y(n);
for i = n-1:-1:1, x(i) = y(i) - gamma(i)*c(i)*x(i+1); end

3.8 Block Systems

In this section we deal with the LU factorization of block-partitioned matrices, where each block can possibly be of a different size. Our aim is twofold:

optimizing the storage occupation by suitably exploiting the structure of

the matrix and reducing the computational cost of the solution of the sys-