FIGURE 3.2. Partial pivoting by row (left) or complete pivoting (right). Shaded
areas of the matrix are those involved in the search for the pivotal entry



Example 3.6 Let us consider the linear system Ax = b with

    A = [ 10^{-13}   1 ]
        [     1      1 ]

and where b is chosen in such a way that x = (1, 1)^T is the exact solution.
Suppose we use base 2 and 16 significant digits. GEM without pivoting would
give x_GEM = (0.99920072216264, 1)^T, while GEM plus partial pivoting furnishes
the exact solution up to the 16th digit.
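The following MATLAB fragment, which we add as an illustration, reproduces the example by carrying out the elimination with the tiny pivot by hand (standard double precision is used, so the digits obtained may differ slightly from those quoted above):

A = [1.e-13, 1; 1, 1]; b = [1+1.e-13; 2];      % b such that x = (1,1)^T exactly
m21 = A(2,1)/A(1,1);                           % huge multiplier, 1.e13
x2 = (b(2) - m21*b(1))/(A(2,2) - m21*A(1,2));  % second unknown
x1 = (b(1) - x2)/A(1,1);                       % spoiled by cancellation in b(1)-x2
disp([x1; x2]); disp(A\b)                      % backslash pivots: exact result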

Let us analyze how partial pivoting affects the LU factorization induced
by GEM. At the first step of GEM with partial pivoting, after finding
the entry a_{r1} of maximum modulus in the first column, the elementary
permutation matrix P_1 which exchanges the first row with the r-th row is
constructed (if r = 1, P_1 is the identity matrix). Next, the first Gaussian
transformation matrix M_1 is generated and we set A^(2) = M_1 P_1 A^(1). A
similar approach is now taken on A^(2), searching for a new permutation
matrix P_2 and a new matrix M_2 such that

A^(3) = M_2 P_2 A^(2) = M_2 P_2 M_1 P_1 A^(1).

Executing all the elimination steps, the resulting upper triangular matrix
U is now given by

U = A^(n) = M_{n-1} P_{n-1} ... M_1 P_1 A^(1).     (3.51)

Letting M = M_{n-1} P_{n-1} ... M_1 P_1 and P = P_{n-1} ... P_1, we obtain that
U = MA and, thus, U = (MP^{-1})PA. It can easily be checked that the matrix
L = PM^{-1} is unit lower triangular, so that the LU factorization reads

PA = LU.     (3.52)

One should not be worried by the presence of the inverse of M, since
M^{-1} = P_1^{-1} M_1^{-1} ... P_{n-1}^{-1} M_{n-1}^{-1}, and P_i^{-1} = P_i^T while M_i^{-1} = 2I_n - M_i.

Once L, U and P are available, solving the initial linear system amounts
to solving the triangular systems Ly = Pb and Ux = y. Notice that the
entries of the matrix L coincide with the multipliers computed by the LU
factorization, without pivoting, when applied to the matrix PA.
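In MATLAB the factorization (3.52) is returned by the intrinsic function lu, so the solution process just described can be sketched as follows (a minimal illustration we add here, on arbitrary sample data):

A = rand(5); b = rand(5,1);   % sample data
[L,U,P] = lu(A);              % partial pivoting: P*A = L*U
y = L \ (P*b);                % Ly = Pb (forward substitution)
x = U \ y;                    % Ux = y  (backward substitution)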
If complete pivoting is performed, at the first step of the process, once the
element a_{qr} of largest modulus in the submatrix A(1:n, 1:n) has been found,
we must exchange the first row and column with the q-th row and the
r-th column. This generates the matrix P_1 A^(1) Q_1, where P_1 and Q_1 are
permutation matrices by rows and by columns, respectively.
As a consequence, the action of matrix M_1 is now such that A^(2) =
M_1 P_1 A^(1) Q_1. Repeating the process, at the last step, instead of (3.51) we
obtain

U = A^(n) = M_{n-1} P_{n-1} ... M_1 P_1 A^(1) Q_1 ... Q_{n-1}.

In the case of complete pivoting the LU factorization becomes

PAQ = LU,

where Q = Q_1 ... Q_{n-1} is a permutation matrix accounting for all the permu-
tations that have been operated. By construction, the matrix L is still lower
triangular, with entries of modulus less than or equal to 1. As happens in
partial pivoting, the entries of L are the multipliers produced by the LU
factorization process without pivoting, when applied to the matrix PAQ.
Program 9 is an implementation of the LU factorization with complete
pivoting. For an efficient computer implementation of the LU factorization
with partial pivoting, we refer to the MATLAB intrinsic function lu.

Program 9 - LUpivtot : LU factorization with complete pivoting
function [L,U,P,Q] = LUpivtot(A,n)
P = eye(n); Q = P; Minv = P;
for k = 1:n-1
  [Pk,Qk] = pivot(A,k,n); A = Pk*A*Qk;   % bring the pivot to position (k,k)
  [Mk,Mkinv] = MGauss(A,k,n);            % k-th Gaussian transformation
  A = Mk*A; P = Pk*P; Q = Q*Qk;
  Minv = Minv*Pk*Mkinv;                  % accumulate P_1^{-1}M_1^{-1}...
end
U = triu(A); L = P*Minv;                 % factors such that PAQ = LU

function [Mk,Mkinv] = MGauss(A,k,n)
% Gaussian transformation matrix M_k and its inverse M_k^{-1} = 2I - M_k
Mk = eye(n);
for i = k+1:n, Mk(i,k) = -A(i,k)/A(k,k); end
Mkinv = 2*eye(n) - Mk;

function [Pk,Qk] = pivot(A,k,n)
% search for the entry of largest modulus in the submatrix A(k:n,k:n)
[y,i] = max(abs(A(k:n,k:n))); [piv,jpiv] = max(y);
ipiv = i(jpiv); jpiv = jpiv+k-1; ipiv = ipiv+k-1;
% row permutation P_k (swap rows k, ipiv); column permutation Q_k (cols k, jpiv)
Pk = eye(n); Pk(ipiv,ipiv) = 0; Pk(k,k) = 0; Pk(k,ipiv) = 1; Pk(ipiv,k) = 1;
Qk = eye(n); Qk(jpiv,jpiv) = 0; Qk(k,k) = 0; Qk(k,jpiv) = 1; Qk(jpiv,k) = 1;
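As an example of use (our own check, not part of the original program), one can verify on random data that the computed factors satisfy PAQ = LU up to roundoff:

n = 4; A = rand(n);
[L,U,P,Q] = LUpivtot(A,n);
norm(P*A*Q - L*U)    % should be of the order of the roundoff unit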



Remark 3.3 The presence of large pivotal entries is not in itself sufficient
to guarantee accurate solutions, as demonstrated by the following example
(taken from [JM92]). For the linear system Ax = b

    [ -4000   2000      2000 ] [ x1 ]   [ 400    ]
    [  2000   0.78125      0 ] [ x2 ] = [ 1.3816 ]
    [  2000   0            0 ] [ x3 ]   [ 1.9273 ]

at the first step the pivotal entry coincides with the diagonal entry -4000
itself. However, executing GEM on such a matrix yields the solution

x = [0.00096365, -0.698496, 0.90042329]^T

whose first component drastically differs from that of the exact solution
x = [1.9273, -0.698496, 0.9004233]^T. The cause of this behaviour should
be ascribed to the wide variations among the system coefficients. Such cases
can be remedied by a suitable scaling of the matrix (see Section 3.12.1).

Remark 3.4 (Pivoting for symmetric matrices) As already noticed,
pivoting is not strictly necessary if A is symmetric and positive definite.
A separate comment is deserved when A is symmetric but not positive
definite, since pivoting could destroy the symmetry of the matrix. This
can be avoided by employing a complete pivoting of the form PAP^T, even
though this pivoting can only amount to a reordering of the diagonal
entries of A. As a consequence, the presence on the diagonal of A of small
entries might inhibit the advantages of the pivoting. To deal with matrices
of this kind, special algorithms are needed (like the Parlett-Reid method
[PR70] or the Aasen method [Aas71]), for whose description we refer to
[GL89], and to [JM92] for the case of sparse matrices.



3.6 Computing the Inverse of a Matrix

The explicit computation of the inverse of a matrix can be carried out using
the LU factorization as follows. Denoting by X the inverse of a nonsingular
matrix A ∈ R^{n×n}, the column vectors of X are the solutions to the linear
systems Ax_i = e_i, for i = 1, ..., n.
Supposing that PA = LU, where P is the partial pivoting permutation
matrix, we must solve 2n triangular systems of the form

Ly_i = Pe_i,   Ux_i = y_i,   i = 1, ..., n,

i.e., a succession of linear systems having the same coefficient matrix but
different right-hand sides. The computation of the inverse of a matrix is a
costly procedure which can sometimes be even less stable than GEM (see
[Hig88]).
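A minimal MATLAB sketch of this procedure (added here for illustration, using the intrinsic function lu to obtain PA = LU):

A = rand(5); n = size(A,1);   % sample data
[L,U,P] = lu(A);              % PA = LU with partial pivoting
I = eye(n); X = zeros(n);
for i = 1:n
  y = L \ (P*I(:,i));         % Ly_i = Pe_i (forward substitution)
  X(:,i) = U \ y;             % Ux_i = y_i  (backward substitution)
end                           % X is the inverse of A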

An alternative approach for computing the inverse of A is provided by
the Faddeev or Leverrier formula which, letting B_0 = I, recursively computes

α_k = (1/k) tr(AB_{k-1}),   B_k = -AB_{k-1} + α_k I,   k = 1, 2, ..., n.

Since B_n = 0, if α_n ≠ 0 we get

A^{-1} = (1/α_n) B_{n-1},

and the computational cost of the method for a full matrix is equal to
(n - 1)n^3 flops (for further details see [FF63], [Bar89]).
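The recursion is straightforward to code; the following MATLAB sketch (our own illustration) keeps B_{n-1} and α_n so as to form the inverse at the end:

A = rand(4); n = size(A,1);        % sample data
B = eye(n);                        % B_0 = I
for k = 1:n
  alphak = trace(A*B)/k;           % alpha_k = (1/k) tr(A*B_{k-1})
  Bprev = B;                       % save B_{k-1} (B_{n-1} on exit)
  B = -A*Bprev + alphak*eye(n);    % B_k = -A*B_{k-1} + alpha_k*I
end                                % now B = B_n = 0 (up to roundoff)
Ainv = Bprev/alphak;               % A^{-1} = B_{n-1}/alpha_n, if alpha_n ~= 0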

3.7 Banded Systems

Discretization methods for boundary value problems often lead to solving
linear systems with matrices having banded, block or sparse forms. Exploiting
the structure of the matrix allows for a dramatic reduction in the
computational costs of the factorization and of the substitution algorithms.
In the present and forthcoming sections, we shall address special variants
of GEM or LU factorization that are properly devised for dealing with
matrices of this kind. For the proofs and a more comprehensive treatment, we
refer to [GL89] and [Hig88] for banded or block matrices, while we refer to
[JM92], [GL81] and [Saa96] for sparse matrices and the techniques for their
storage.
The main result for banded matrices is the following.



Property 3.4 Let A ∈ R^{n×n}. Suppose that there exists an LU factorization
of A. If A has upper bandwidth q and lower bandwidth p, then L has lower
bandwidth p and U has upper bandwidth q.



In particular, notice that the same memory area used for A is enough to
also store its LU factorization. Consider, indeed, that a matrix A having
upper bandwidth q and lower bandwidth p is usually stored in a
(p + q + 1) × n matrix B, assuming that

b_{i-j+q+1,j} = a_{ij}

for all the indices i, j that fall into the band of the matrix. For instance, in
the case of the tridiagonal matrix A = tridiag_5(-1, 2, -1) (where q = p = 1),
the compact storage reads

    B = [  0  -1  -1  -1  -1 ]
        [  2   2   2   2   2 ]
        [ -1  -1  -1  -1   0 ].

The same format can be used for storing the LU factorization of A. It is
clear that this storage format can be quite inconvenient in the case where
only a few bands of the matrix are large. In the limit, if only one column
and one row were full, we would have p = q = n - 1 and thus B would be a
full matrix with a lot of zero entries.
Finally, we notice that the inverse of a banded matrix is generally full
(as happens for the matrix A considered above).
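The compact storage described above is easily programmed; the following MATLAB fragment (which we add as an illustration) copies the band of A into the (p + q + 1) × n array B according to the rule b_{i-j+q+1,j} = a_{ij}:

n = 5; p = 1; q = 1;
A = diag(2*ones(n,1)) - diag(ones(n-1,1),1) - diag(ones(n-1,1),-1);  % tridiag_5(-1,2,-1)
B = zeros(p+q+1,n);
for j = 1:n
  for i = max(1,j-q):min(n,j+p)    % indices inside the band
    B(i-j+q+1,j) = A(i,j);
  end
end
B    % reproduces the 3 x 5 compact array shown above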

3.7.1 Tridiagonal Matrices
Consider the particular case of a linear system with nonsingular tridiagonal
matrix A given by
    A = [ a_1  c_1                    0    ]
        [ b_2  a_2   .                     ]
        [        .      .                  ]
        [           .      .     c_{n-1}   ]
        [ 0            b_n       a_n       ].

In such an event, the matrices L and U of the LU factorization of A are
bidiagonal matrices of the form
    L = [ 1                  0 ]        U = [ α_1  c_1                 0    ]
        [ β_2   1              ]            [      α_2   .                  ]
        [      .     .         ]            [           .      c_{n-1}      ]
        [ 0      β_n    1      ]            [ 0                α_n          ].

The coefficients α_i and β_i can easily be computed by the following relations:

α_1 = a_1,   β_i = b_i/α_{i-1},   α_i = a_i - β_i c_{i-1},   i = 2, ..., n.     (3.53)

This is known as the Thomas algorithm and can be regarded as a particular
instance of the Doolittle factorization, without pivoting. When one is not
interested in storing the coefficients of the original matrix, the entries α_i
and β_i can be overwritten on A.
The Thomas algorithm can also be extended to solve the whole tridiagonal
system Ax = f. This amounts to solving two bidiagonal systems
Ly = f and Ux = y, for which the following formulae hold:

(Ly = f)   y_1 = f_1,        y_i = f_i - β_i y_{i-1},          i = 2, ..., n,     (3.54)

(Ux = y)   x_n = y_n/α_n,    x_i = (y_i - c_i x_{i+1})/α_i,    i = n-1, ..., 1.   (3.55)

The algorithm requires only 8n - 7 flops: precisely, 3(n - 1) flops for the
factorization (3.53) and 5n - 4 flops for the substitution procedure (3.54)-
(3.55).
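A plain MATLAB implementation of (3.53)-(3.55) may read as follows (a sketch we add for illustration, with the hypothetical name thomas; a, b and c are column vectors of length n, n-1 and n-1 containing the diagonal, the subdiagonal and the superdiagonal, respectively):

function [x] = thomas(a,b,c,f)
n = length(a);
alpha = zeros(n,1); beta = zeros(n,1); y = zeros(n,1); x = zeros(n,1);
alpha(1) = a(1);
for i = 2:n                                                % factorization (3.53)
  beta(i) = b(i-1)/alpha(i-1); alpha(i) = a(i) - beta(i)*c(i-1);
end
y(1) = f(1);
for i = 2:n, y(i) = f(i) - beta(i)*y(i-1); end             % forward (3.54)
x(n) = y(n)/alpha(n);
for i = n-1:-1:1, x(i) = (y(i) - c(i)*x(i+1))/alpha(i); end  % backward (3.55)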
As for the stability of the method, if A is a nonsingular tridiagonal matrix
and L and U are the factors actually computed, then

|δA| ≤ (4u + 3u^2 + u^3) |L| |U|,

where δA is implicitly defined by the relation A + δA = LU while u is the
roundoff unit. In particular, if A is also symmetric and positive definite or
it is an M-matrix, we have

|δA| ≤ (4u + 3u^2 + u^3)/(1 - u) |A|,

which implies the stability of the factorization procedure in such cases. A
similar result holds even if A is diagonally dominant.


3.7.2 Implementation Issues
An implementation of the LU factorization for banded matrices is shown
in Program 10.

Program 10 - lu_band : LU factorization for a banded matrix
function [A] = lu_band(A,p,q)
% LU factorization of a banded matrix with lower bandwidth p and upper
% bandwidth q; L (unit diagonal not stored) and U are overwritten on A
[n,n] = size(A);
for k = 1:n-1
  for i = k+1:min(k+p,n), A(i,k) = A(i,k)/A(k,k); end    % multipliers
  for j = k+1:min(k+q,n)
    for i = k+1:min(k+p,n), A(i,j) = A(i,j) - A(i,k)*A(k,j); end
  end
end
In the case where n ≫ p and n ≫ q, this algorithm approximately takes
2npq flops, with a considerable saving with respect to the case in which A
is a full matrix.
Similarly, ad hoc versions of the substitution methods can be devised
(see Programs 11 and 12). Their costs are, respectively, of the order of 2np
flops and 2nq flops, always assuming that n ≫ p and n ≫ q.

Program 11 - forw_band : Forward substitution for a banded matrix L
function [b] = forw_band(L,p,b)
% forward substitution for a banded lower triangular matrix L with unit
% diagonal and lower bandwidth p; the solution is overwritten on b
[n,n] = size(L);
for j = 1:n
  for i = j+1:min(j+p,n), b(i) = b(i) - L(i,j)*b(j); end
end


Program 12 - back_band : Backward substitution for a banded matrix U
function [b] = back_band(U,q,b)
% backward substitution for a banded upper triangular matrix U with
% upper bandwidth q; the solution is overwritten on b
[n,n] = size(U);
for j = n:-1:1
  b(j) = b(j)/U(j,j);
  for i = max(1,j-q):j-1, b(i) = b(i) - U(i,j)*b(j); end
end
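Programs 10, 11 and 12 can be combined to solve a banded system Ax = b; as an example of use (our own illustration; no pivoting is performed, so A must admit an LU factorization):

n = 6; p = 1; q = 1;
A = diag(4*ones(n,1)) - diag(ones(n-1,1),1) - diag(ones(n-1,1),-1);
b = A*ones(n,1);          % exact solution: the vector of all ones
A = lu_band(A,p,q);       % L and U overwritten on A
b = forw_band(A,p,b);     % Ly = b (the unit diagonal of L is implicit)
x = back_band(A,q,b)      % Ux = y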

The programs assume that the whole matrix is stored (including also the
zero entries).
Concerning the tridiagonal case, the Thomas algorithm can be implemented
in several ways. In particular, when implementing it on computers
where divisions are more costly than multiplications, it is possible (and
convenient) to devise a version of the algorithm without divisions in (3.54)
and (3.55), by resorting to the following form of the factorization
A = LDM^T, with

    L = [ γ_1^{-1}                        0       ]
        [ b_2       γ_2^{-1}                      ]
        [            .          .                 ]
        [ 0             b_n        γ_n^{-1}       ]

    D = diag(γ_1, γ_2, ..., γ_n),

    M^T = [ γ_1^{-1}   c_1                 0      ]
          [           γ_2^{-1}   .                ]
          [                      .    c_{n-1}     ]
          [ 0                       γ_n^{-1}      ].

The coefficients γ_i can be recursively computed by the formulae

γ_i = (a_i - b_i γ_{i-1} c_{i-1})^{-1},   for i = 1, ..., n,

where γ_0 = 0, b_1 = 0 and c_n = 0 have been assumed. The forward and
backward substitution algorithms respectively read

(Ly = f)   y_1 = γ_1 f_1,   y_i = γ_i (f_i - b_i y_{i-1}),   i = 2, ..., n,
(Ux = y)   x_n = y_n,       x_i = y_i - γ_i c_i x_{i+1},     i = n-1, ..., 1.     (3.56)
In Program 13 we show an implementation of the Thomas algorithm in
the form (3.56), without divisions. The input vectors a, b and c contain
the coefficients of the tridiagonal matrix {a_i}, {b_i} and {c_i}, respectively,
while the vector f contains the components f_i of the right-hand side f.

Program 13 - mod_thomas : Thomas algorithm, modified version
function [x] = mod_thomas(a,b,c,f)
% division-free form (3.56) of the Thomas algorithm; a, b, c are the
% diagonal, sub- and superdiagonal (column vectors of length n, n-1, n-1)
n = length(a); b = [0; b]; c = [c; 0];
gamma(1) = 1/a(1);
for i = 2:n, gamma(i) = 1/(a(i)-b(i)*gamma(i-1)*c(i-1)); end
y(1) = gamma(1)*f(1);
for i = 2:n, y(i) = gamma(i)*(f(i)-b(i)*y(i-1)); end
x(n) = y(n);
for i = n-1:-1:1, x(i) = y(i)-gamma(i)*c(i)*x(i+1); end
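As an example of use (our own illustration) on the matrix tridiag_5(-1, 2, -1) considered above, with the right-hand side chosen so that the exact solution is the vector of all ones:

a = 2*ones(5,1); b = -ones(4,1); c = -ones(4,1);
f = [1; 0; 0; 0; 1];        % f = A*ones(5,1)
x = mod_thomas(a,b,c,f)     % returns x = (1,1,1,1,1)^T up to roundoff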



3.8 Block Systems

In this section we deal with the LU factorization of block-partitioned
matrices, where each block can possibly be of a different size. Our aim is twofold:
optimizing the storage occupation by suitably exploiting the structure of
the matrix and reducing the computational cost of the solution of the sys-