94 3. Direct Methods for the Solution of Linear Systems

3.8.1 Block LU Factorization

Let $A \in \mathbb{R}^{n\times n}$ be the following block partitioned matrix

\[
A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix},
\]

where $A_{11} \in \mathbb{R}^{r\times r}$ is a nonsingular square matrix whose factorization
$L_{11} D_1 R_{11}$ is known, while $A_{22} \in \mathbb{R}^{(n-r)\times(n-r)}$. In such a case it is possible
to factorize $A$ using only the LU factorization of the block $A_{11}$. Indeed, it
is true that

\[
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
=
\begin{bmatrix} L_{11} & 0 \\ L_{21} & I_{n-r} \end{bmatrix}
\begin{bmatrix} D_1 & 0 \\ 0 & \Delta_2 \end{bmatrix}
\begin{bmatrix} R_{11} & R_{12} \\ 0 & I_{n-r} \end{bmatrix},
\]

where

\[
L_{21} = A_{21} R_{11}^{-1} D_1^{-1}, \qquad
R_{12} = D_1^{-1} L_{11}^{-1} A_{12}, \qquad
\Delta_2 = A_{22} - L_{21} D_1 R_{12}.
\]

If necessary, the reduction procedure can be repeated on the matrix $\Delta_2$,
thus obtaining a block version of the LU factorization.

If $A_{11}$ were a scalar, the above approach would reduce by one the size of
the factorization of a given matrix. Applying this method iteratively yields
an alternative way of performing Gaussian elimination.
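To make the step above concrete, the following is a minimal NumPy sketch of one block elimination step. The function names `ldu` and `block_lu_step` are ours, and no pivoting is performed, so nonzero pivots in $A_{11}$ are assumed.

```python
import numpy as np

def ldu(M):
    # LDU factorization without pivoting: M = L @ D @ U, with L unit lower
    # triangular, D diagonal, U unit upper triangular (nonzero pivots assumed).
    n = M.shape[0]
    L, U, D = np.eye(n), np.eye(n), np.zeros((n, n))
    S = M.astype(float).copy()
    for k in range(n):
        D[k, k] = S[k, k]
        L[k+1:, k] = S[k+1:, k] / D[k, k]
        U[k, k+1:] = S[k, k+1:] / D[k, k]
        S[k+1:, k+1:] -= np.outer(L[k+1:, k], U[k, k+1:]) * D[k, k]
    return L, D, U

def block_lu_step(A, r):
    # One block elimination step for A = [[A11, A12], [A21, A22]] with
    # nonsingular A11 of size r x r. Only the L11 D1 R11 factorization of
    # A11 is used; Delta2 is the block to which the procedure may be
    # applied again.
    A11, A12, A21, A22 = A[:r, :r], A[:r, r:], A[r:, :r], A[r:, r:]
    L11, D1, R11 = ldu(A11)
    L21 = np.linalg.solve(D1.T, np.linalg.solve(R11.T, A21.T)).T  # A21 R11^-1 D1^-1
    R12 = np.linalg.solve(D1, np.linalg.solve(L11, A12))          # D1^-1 L11^-1 A12
    Delta2 = A22 - L21 @ D1 @ R12
    return L11, D1, R11, L21, R12, Delta2
```

Assembling the three block factors and multiplying them back reproduces $A$, which is a convenient correctness check.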

We also notice that the proof of Theorem 3.4 can be extended to the

case of block matrices, obtaining the following result.

Theorem 3.7 Let $A \in \mathbb{R}^{n\times n}$ be partitioned into $m \times m$ blocks $A_{ij}$ with
$i, j = 1, \ldots, m$. $A$ admits a unique block LU factorization (with $L$ having
unit diagonal entries) iff the $m - 1$ dominant principal block minors of $A$
are nonzero.

Since the block factorization is an equivalent formulation of the standard
LU factorization of $A$, the stability analysis carried out for the latter holds
for its block version as well. Improved results concerning the efficient use
of fast forms of matrix-matrix product in block algorithms are dealt with
in [Hig88]. In the forthcoming section we focus solely on block tridiagonal
matrices.

3.8.2 Inverse of a Block-partitioned Matrix

The inverse of a block matrix can be constructed using the LU factorization

introduced in the previous section. A remarkable application is when A is

a block matrix of the form

A = C + UBV,


where C is a block matrix that is “easy” to invert (for instance, when C

is given by the diagonal blocks of A), while U, B and V take into account

the connections between the diagonal blocks. In such an event A can be

inverted by using the Sherman-Morrison or Woodbury formula

\[
A^{-1} = \left(C + UBV\right)^{-1}
= C^{-1} - C^{-1} U \left(I + B V C^{-1} U\right)^{-1} B V C^{-1},
\qquad (3.57)
\]

having assumed that $C$ and $I + B V C^{-1} U$ are two nonsingular matrices.

This formula has several practical and theoretical applications, and is particularly effective if the connections between blocks are of modest relevance.
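A small NumPy sketch of (3.57) follows; the function name and the use of a dense `np.linalg.inv` as a stand-in for the "easy" inverse of $C$ are ours, purely for illustration.

```python
import numpy as np

def woodbury_inverse(C, U, B, V):
    # (C + U B V)^{-1} = C^{-1} - C^{-1} U (I + B V C^{-1} U)^{-1} B V C^{-1},
    # assuming C and I + B V C^{-1} U nonsingular. Besides inverting C
    # (assumed "easy"), only a small k x k system is solved, k being the
    # number of columns of U.
    Ci = np.linalg.inv(C)                    # stand-in for the "easy" inverse
    Ci_U = Ci @ U                            # C^{-1} U
    k = B.shape[0]
    small = np.eye(k) + B @ V @ Ci_U         # I + B V C^{-1} U
    return Ci - Ci_U @ np.linalg.solve(small, B @ V @ Ci)
```

The payoff is that the correction to $C^{-1}$ only requires solving a $k \times k$ system, which is cheap when the coupling between blocks is of low rank.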

3.8.3 Block Tridiagonal Systems

Consider block tridiagonal systems of the form

\[
A_n x =
\begin{bmatrix}
A_{11} & A_{12} &        & 0 \\
A_{21} & A_{22} & \ddots &   \\
       & \ddots & \ddots & A_{n-1,n} \\
0      &        & A_{n,n-1} & A_{nn}
\end{bmatrix}
\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}
=
\begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix},
\qquad (3.58)
\]

where $A_{ij}$ are matrices of order $n_i \times n_j$ and $x_i$ and $b_i$ are column vectors of
size $n_i$, for $i, j = 1, \ldots, n$. We assume that the diagonal blocks are square,
although not necessarily of the same size. For $k = 1, \ldots, n$, set

\[
A_k =
\begin{bmatrix}
I_{n_1} &         &         & 0 \\
L_1     & I_{n_2} &         &   \\
        & \ddots  & \ddots  &   \\
0       &         & L_{k-1} & I_{n_k}
\end{bmatrix}
\begin{bmatrix}
U_1 & A_{12} &        & 0 \\
    & U_2    & \ddots &   \\
    &        & \ddots & A_{k-1,k} \\
0   &        &        & U_k
\end{bmatrix}.
\]

Equating, for $k = n$, the matrix above with the corresponding blocks of $A_n$,
it turns out that $U_1 = A_{11}$, while the remaining blocks can be obtained
by solving sequentially, for $i = 2, \ldots, n$, the systems $L_{i-1} U_{i-1} = A_{i,i-1}$ for
the columns of $L_{i-1}$ and computing $U_i = A_{ii} - L_{i-1} A_{i-1,i}$.

This procedure is well defined only if all the matrices $U_i$ are nonsingular,
which is the case if, for instance, the matrices $A_1, \ldots, A_n$ are nonsingular.
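The procedure just described can be sketched in NumPy as follows. The function names are ours; nonsingular blocks $U_i$ are assumed, and $L_{i-1}$ is obtained from $L_{i-1} U_{i-1} = A_{i,i-1}$ with a dense solve.

```python
import numpy as np

def block_tridiag_lu(diag, lower, upper):
    # diag:  [A11, ..., Ann]       square diagonal blocks (sizes may differ)
    # lower: [A21, ..., A_{n,n-1}] sub-diagonal blocks
    # upper: [A12, ..., A_{n-1,n}] super-diagonal blocks
    # Sets U1 = A11 and then, for i = 2, ..., n, solves
    # L_{i-1} U_{i-1} = A_{i,i-1} and computes U_i = A_ii - L_{i-1} A_{i-1,i}.
    U = [np.asarray(diag[0], dtype=float)]
    L = []
    for i in range(1, len(diag)):
        Li = np.linalg.solve(U[-1].T, np.asarray(lower[i-1], dtype=float).T).T
        L.append(Li)
        U.append(np.asarray(diag[i], dtype=float)
                 - Li @ np.asarray(upper[i-1], dtype=float))
    return L, U

def block_tridiag_solve(diag, lower, upper, b):
    # Solve A x = b with b given as blocks [b1, ..., bn]: a block forward
    # substitution followed by a block back substitution.
    L, U = block_tridiag_lu(diag, lower, upper)
    y = [np.asarray(b[0], dtype=float)]
    for i in range(1, len(b)):
        y.append(np.asarray(b[i], dtype=float) - L[i-1] @ y[-1])
    x = [None] * len(b)
    x[-1] = np.linalg.solve(U[-1], y[-1])
    for i in range(len(b) - 2, -1, -1):
        x[i] = np.linalg.solve(U[i],
                               y[i] - np.asarray(upper[i], dtype=float) @ x[i+1])
    return x
```

Note that only the $2n-1$ nonzero block diagonals are ever stored, which is the advantage over a plain banded factorization.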

As an alternative, one could resort to factorization methods for banded

matrices, even if this requires the storage of a large number of zero entries

(unless a suitable reordering of the rows of the matrix is performed).

A remarkable instance is when the matrix is block tridiagonal and symmetric, with symmetric and positive definite blocks. In such a case (3.58)
takes the form

\[
\begin{bmatrix}
A_{11} & A_{21}^{T} &        & 0 \\
A_{21} & A_{22}     & \ddots &   \\
       & \ddots     & \ddots & A_{n,n-1}^{T} \\
0      &            & A_{n,n-1} & A_{nn}
\end{bmatrix}
\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}
=
\begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}.
\]

Here we consider an extension to the block case of the Thomas algorithm,
which aims at transforming $A$ into a block bidiagonal matrix. To this purpose, we first have to eliminate the block corresponding to the matrix $A_{21}$.
Assume that the Cholesky factorization of $A_{11}$ is available and denote by
$H_{11}$ the Cholesky factor. If we multiply the first row of the block system
by $H_{11}^{-T}$, we find

\[
H_{11} x_1 + H_{11}^{-T} A_{21}^{T} x_2 = H_{11}^{-T} b_1.
\]

Letting $H_{21} = H_{11}^{-T} A_{21}^{T}$ and $c_1 = H_{11}^{-T} b_1$, it follows that $A_{21} = H_{21}^{T} H_{11}$ and
thus the first two rows of the system are

\[
H_{11} x_1 + H_{21} x_2 = c_1,
\]
\[
H_{21}^{T} H_{11} x_1 + A_{22} x_2 + A_{32}^{T} x_3 = b_2.
\]

As a consequence, multiplying the first row by $H_{21}^{T}$ and subtracting it from
the second one, the unknown $x_1$ is eliminated and the following equivalent
equation is obtained

\[
A_{22}^{(1)} x_2 + A_{32}^{T} x_3 = b_2 - H_{21}^{T} c_1,
\]

with $A_{22}^{(1)} = A_{22} - H_{21}^{T} H_{21}$. At this point, the factorization of $A_{22}^{(1)}$ is carried
out and the unknown $x_2$ is eliminated from the third row of the system,
and the same is repeated for the remaining rows of the system. At the end
of the procedure, which requires solving $\sum_{j=1}^{n-1} n_j$ linear systems to
compute the matrices $H_{i+1,i}$, $i = 1, \ldots, n-1$, we end up with the following
block bidiagonal system

\[
\begin{bmatrix}
H_{11} & H_{21} &        & 0 \\
       & H_{22} & \ddots &   \\
       &        & \ddots & H_{n,n-1} \\
0      &        &        & H_{nn}
\end{bmatrix}
\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}
=
\begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix},
\]

which can be solved with a (block) back substitution method. If all the blocks
have the same size $p$, then the algorithm requires about $(7/6)(n-1)p^3$
flops (assuming both $p$ and $n$ very large).
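A possible NumPy realization of this block Thomas elimination is sketched below. The function name is ours; the diagonal blocks and all the reduced blocks produced during the elimination are assumed symmetric positive definite, so that every Cholesky factorization exists.

```python
import numpy as np

def block_thomas_spd(diag, sub, b):
    # diag: [A11, ..., Ann] (each symmetric positive definite),
    # sub:  [A21, ..., A_{n,n-1}], b: [b1, ..., bn].
    # Eliminates x_i from row i+1 as described above, then performs a
    # block back substitution on the resulting bidiagonal system.
    n = len(diag)
    H, Hsub, c = [], [], []
    Ai = np.asarray(diag[0], dtype=float)
    bi = np.asarray(b[0], dtype=float)
    for i in range(n):
        Hii = np.linalg.cholesky(Ai).T              # reduced block = Hii^T Hii
        ci = np.linalg.solve(Hii.T, bi)             # c_i = Hii^{-T} (reduced b_i)
        H.append(Hii)
        c.append(ci)
        if i < n - 1:
            Hi1 = np.linalg.solve(Hii.T, np.asarray(sub[i], dtype=float).T)
            Hsub.append(Hi1)                        # H_{i+1,i} = Hii^{-T} A_{i+1,i}^T
            Ai = np.asarray(diag[i+1], dtype=float) - Hi1.T @ Hi1
            bi = np.asarray(b[i+1], dtype=float) - Hi1.T @ ci
    # block back substitution on the bidiagonal system H x = c
    x = [None] * n
    x[-1] = np.linalg.solve(H[-1], c[-1])
    for i in range(n - 2, -1, -1):
        x[i] = np.linalg.solve(H[i], c[i] - Hsub[i] @ x[i+1])
    return x
```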


3.9 Sparse Matrices

In this section we briefly address the numerical solution of sparse linear
systems, that is, systems where the matrix $A \in \mathbb{R}^{n\times n}$ has a number of
nonzero entries of the order of $n$ (and not $n^2$). We call the pattern of a sparse
matrix the set of its nonzero coefficients.

Banded matrices with sufficiently small bands are sparse matrices. Obviously, for a sparse matrix the full matrix structure itself is redundant and it
can be more conveniently replaced by a vector-like structure by means
of matrix compacting techniques, like the banded matrix format discussed
in Section 3.7.
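As an illustration of such compacting, here is a minimal sketch of the widely used compressed sparse row (CSR) layout; CSR is not discussed in the text, and the function names are ours.

```python
import numpy as np

def to_csr(A):
    # Store only the nonzero entries (the pattern) of A, row by row:
    # vals[k] and cols[k] give the k-th nonzero and its column index,
    # while ptr[i]:ptr[i+1] delimits row i. Memory is O(nnz + n)
    # instead of O(n^2).
    vals, cols, ptr = [], [], [0]
    for row in A:
        for j, a in enumerate(row):
            if a != 0:
                vals.append(float(a))
                cols.append(j)
        ptr.append(len(vals))
    return vals, cols, ptr

def csr_matvec(vals, cols, ptr, x):
    # y = A x using only the stored nonzeros
    y = np.zeros(len(ptr) - 1)
    for i in range(len(y)):
        for k in range(ptr[i], ptr[i+1]):
            y[i] += vals[k] * x[cols[k]]
    return y
```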

FIGURE 3.3. Pattern of a symmetric sparse matrix (left) and of its associated
graph (right). For the sake of clarity, the loops have not been drawn; moreover,
since the matrix is symmetric, only one of the two sides associated with each
$a_{ij} \neq 0$ has been reported. [Figure not reproduced.]

For the sake of convenience, we associate with a sparse matrix $A$ an oriented
graph $G(A)$. A graph is a pair $(V, X)$ where $V$ is a set of $p$ points and $X$
is a set of $q$ ordered pairs of elements of $V$ that are linked by a line. The
elements of $V$ are called the vertices of the graph, while the connection lines
are called the paths of the graph.

The graph $G(A)$ associated with a matrix $A \in \mathbb{R}^{m\times n}$ can be constructed
by identifying the vertices with the set of the indices from 1 to the maximum
between $m$ and $n$ and supposing that a path exists which connects two
vertices $i$ and $j$ if $a_{ij} \neq 0$ and is directed from $i$ to $j$, for $i = 1, \ldots, m$ and
$j = 1, \ldots, n$. For a diagonal entry $a_{ii} \neq 0$, the path joining the vertex $i$
with itself is called a loop. Since an orientation is associated with each side,
the graph is called oriented (or finite directed). As an example, Figure 3.3
displays the pattern of a symmetric and sparse $12 \times 12$ matrix, together
with its associated graph.
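A small sketch of the construction of $G(A)$ follows; for simplicity we use 0-based indices and a square matrix, whereas the text allows a rectangular $A$ with $\max(m, n)$ vertices. The function name is ours.

```python
def matrix_graph(A):
    # Oriented graph G(A) of a square matrix A: one vertex per index and a
    # directed edge i -> j for every a_ij != 0; an edge i -> i is a loop.
    n = len(A)
    vertices = list(range(n))
    edges = [(i, j) for i in range(n) for j in range(n) if A[i][j] != 0]
    return vertices, edges
```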

As previously noticed, during the factorization procedure, nonzero entries
can be generated in memory positions that correspond to zero entries in
the starting matrix. This phenomenon is referred to as fill-in. Figure 3.4 shows the
effect of fill-in on the sparse matrix whose pattern is shown in Figure 3.3.
Since the use of pivoting in the factorization process makes things even more
complicated, we shall only consider the case of symmetric positive definite
matrices, for which pivoting is not necessary.

A first remarkable result concerns the amount of fill-in. Let $m_i(A) =
i - \min \{\, j \le i : a_{ij} \neq 0 \,\}$ and denote by $E(A)$ the convex hull of $A$, given
by

\[
E(A) = \{(i, j) : 0 < i - j \le m_i(A)\}. \qquad (3.59)
\]

For a symmetric positive definite matrix,

\[
E(A) = E(H + H^{T}), \qquad (3.60)
\]

where $H$ is the Cholesky factor, so that fill-in is confined within the convex
hull of $A$ (see Figure 3.4). Moreover, if we denote by $l_k(A)$ the number of
active rows at the $k$-th step of the factorization (i.e., the number of rows
of $A$ with $i > k$ and $a_{ik} \neq 0$), the computational cost of the factorization
process is

\[
\sum_{k=1}^{n} \frac{1}{2}\, l_k(A) \left(l_k(A) + 3\right) \ \text{flops}, \qquad (3.61)
\]

having accounted for all the nonzero entries of the convex hull. Confinement
of fill-in within $E(A)$ ensures that the LU factorization of $A$ can be stored
without extra memory areas, simply by storing all the entries of $E(A)$ (including the null elements). However, such a procedure might still be highly
inefficient due to the large number of zero entries in the hull (see Exercise
11).
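The quantities entering (3.59) and (3.61) are easy to compute from the pattern alone. A small sketch (our own function, 0-based indices; a symmetric positive definite $A$ is assumed, so every diagonal entry is nonzero):

```python
def envelope_info(A):
    # m_i(A), the (lower part of the) convex hull E(A) of (3.59), and the
    # flop estimate (3.61) based on the active-row numbers l_k(A).
    n = len(A)
    m = [i - min(j for j in range(i + 1) if A[i][j] != 0) for i in range(n)]
    hull = {(i, j) for i in range(n) for j in range(n) if 0 < i - j <= m[i]}
    # l_k(A): rows i > k with a_ik != 0
    l = [sum(1 for i in range(k + 1, n) if A[i][k] != 0) for k in range(n)]
    flops = sum(lk * (lk + 3) / 2 for lk in l)
    return m, hull, flops
```

For a tridiagonal matrix, for instance, every $m_i$ equals 1 (except $m_1 = 0$) and the hull contains only the subdiagonal positions, consistently with the absence of fill-in in that case.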

On the other hand, from (3.60) one gets that a reduction of the convex
hull reflects a reduction of fill-in and, in turn, due to (3.61), of the number
of operations needed to perform the factorization. For this reason several
strategies for reordering the graph of the matrix have been devised. Among
them, we recall the Cuthill-McKee method, which will be addressed in the
next section.

An alternative consists of decomposing the matrix into sparse subma-

trices, with the aim of reducing the original problem to the solution of

subproblems of reduced size, where matrices can be stored in full format.

This approach leads to submatrix decomposition methods which will be

addressed in Section 3.9.2.

3.9.1 The Cuthill-McKee Algorithm

The Cuthill-McKee algorithm is a simple and effective method for reordering the system variables.


FIGURE 3.4. The shaded regions in the left figure show the areas of the matrix
that can be affected by fill-in, for the matrix considered in Figure 3.3. Solid
lines denote the boundary of $E(A)$. The right figure displays the factors that
have actually been computed. Black dots denote the elements of $A$ that were
originally equal to zero. [Figure not reproduced.]

The first step of the algorithm consists of associating with each vertex of
the graph the number of its connections with neighboring vertices, called
the degree of the vertex. Next, the following steps are taken:

1. a vertex with a low number of connections is chosen as the first vertex
of the graph;

2. the vertices connected to it are progressively re-labeled, starting from
those having the lowest degrees;

3. the procedure is repeated starting from the vertices connected to the
second vertex in the updated list. The nodes already re-labeled are
ignored. Then, a third new vertex is considered, and so on, until all
the vertices have been explored.

The usual way to improve the efficiency of the algorithm is based on the
so-called reverse form of the Cuthill-McKee method. This consists of executing the Cuthill-McKee algorithm described above where, at the end,
the $i$-th vertex is moved into the $(n - i + 1)$-th position of the list, $n$ being
the number of nodes in the graph. Figure 3.5 reports, for comparison, the
graphs obtained using the direct and reverse Cuthill-McKee reordering in
the case of the matrix pattern represented in Figure 3.3, while in Figure
3.6 the factors L and U are compared. Notice the absence of fill-in when
the reverse Cuthill-McKee method is used.
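The steps above, together with the final reversal, can be sketched as follows. The function name is ours; the graph of a symmetric matrix is given through adjacency lists (loops omitted), and the relabeling is returned as the new order of the old vertex indices.

```python
from collections import deque

def cuthill_mckee(adj, reverse=False):
    # adj: dict mapping each vertex to the list of its neighbors.
    # A breadth-first traversal that starts from a low-degree vertex and
    # visits neighbors by increasing degree; reverse=True flips the final
    # list, giving the reverse Cuthill-McKee ordering.
    degree = {v: len(adj[v]) for v in adj}
    order, seen = [], set()
    for start in sorted(adj, key=degree.get):   # low-degree starting vertex
        if start in seen:
            continue
        queue = deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            order.append(v)
            # neighbors re-labeled by increasing degree, skipping labeled ones
            for w in sorted(adj[v], key=degree.get):
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
    return order[::-1] if reverse else order
```

The outer loop over starting vertices makes the sketch work also for graphs with several connected components.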

Remark 3.5 For an efficient solution of linear systems with sparse matrices, we mention the public domain libraries SPARSKIT [Saa90], UMFPACK [DD95] and the MATLAB sparfun package.


[Figure 3.5, showing the graphs obtained with the direct and reverse Cuthill-McKee reorderings of the matrix pattern of Figure 3.3 (each vertex labeled with its new number, the original number in parentheses), is not reproduced here.]