3.3 Storage Modes
3.3.2 Compressed Matrix Storage Mode
The first mode we discuss is known as the compressed matrix storage mode, which essentially "compresses" the non-zero elements in each row to the left. The mode uses two matrices to store the sparse m x n matrix A: an m x l matrix AS and an m x l matrix K, where l is the maximum number of non-zero elements in a row of A. AS contains the non-zero elements of A, row by row (in the variant shown below, the diagonal element of each row comes first), padding with 0's each row of A containing fewer than l non-zero elements. K contains the corresponding column indices of each non-zero element in A. For example, to store the 6 x 6 matrix
\[
A = \begin{pmatrix}
11 & 0 & 13 & 0 & 0 & 0 \\
21 & 22 & 0 & 24 & 0 & 0 \\
0 & 32 & 33 & 0 & 35 & 0 \\
0 & 0 & 43 & 44 & 0 & 46 \\
51 & 0 & 0 & 54 & 55 & 0 \\
61 & 62 & 0 & 0 & 65 & 66
\end{pmatrix}
\tag{3.12}
\]
we would use

\[
AS = \begin{pmatrix}
11 & 13 & 0 & 0 \\
22 & 21 & 24 & 0 \\
33 & 32 & 35 & 0 \\
44 & 43 & 46 & 0 \\
55 & 51 & 54 & 0 \\
66 & 61 & 62 & 65
\end{pmatrix},
\qquad
K = \begin{pmatrix}
1 & 3 & * & * \\
2 & 1 & 4 & * \\
3 & 2 & 5 & * \\
4 & 3 & 6 & * \\
5 & 1 & 4 & * \\
6 & 1 & 2 & 5
\end{pmatrix}
\tag{3.13}
\]

(The asterisks in K mark the positions corresponding to the 0 padding in AS; any valid column index may be stored there, since the matching elements of AS are zero.)
It is easy to see that this method is most effective for sparse matrices with approximately the same number of non-zero elements in each row.
When A is to be used in matrix-vector products, the compressed matrix storage mode brings up the problem of random gather (with its close cousin, random scatter [7]). Random gather involves "randomly accessing" data stored in computer memory. For example, in the compressed matrix mode, the matrix-vector product $v \leftarrow Aw$ would be coded as
do i = 1, m
   v(i) = 0
   do j = 1, l
      ! gather w(K(i,j)) from a possibly random address
      v(i) = v(i) + AS(i,j)*w(K(i,j))
   end do
end do
If the non-zero elements of A occur in random positions in each row of AS, the above code will jump around randomly in memory, gathering the correct elements of w to form the product. (Usually memory access is fastest when successively accessed elements are stored with unit stride, i.e., at consecutive addresses. Thus, in the above example, even if the elements of w gathered for successive row elements of AS are not completely randomly distributed, the algorithm will bog down if they are not arranged contiguously in memory.) On most present computer architectures, random gather is a very inefficient process. Sparse matrices with an irregular sparsity pattern therefore present a special challenge in terms of storage and processing.
3.3.3 Storage by Indices
Another mode, well suited to matrices with a random sparsity pattern, is storage by indices. In this mode, A is stored in three data structures: AS, IA, and JA, all vectors of length l, where l is the number of non-zero elements of A. AS contains the non-zero elements of A, in any order, and IA and JA contain the row and column indices, respectively, of the corresponding elements of AS. For example, the matrix
\[
A = \begin{pmatrix}
11 & 0 & 13 & 0 & 0 & 0 \\
21 & 22 & 0 & 24 & 0 & 0 \\
0 & 32 & 33 & 0 & 35 & 0 \\
0 & 0 & 43 & 44 & 0 & 46 \\
0 & 0 & 0 & 0 & 0 & 0 \\
61 & 62 & 0 & 0 & 65 & 66
\end{pmatrix}
\tag{3.14}
\]
might be stored as
AS = (11, 22, 32, 33, 13, 21, 43, 24, 66, 46, 35, 62, 61, 65, 44),
IA = (1, 2, 3, 3, 1, 2, 4, 2, 6, 4, 3, 6, 6, 6, 4),
JA = (1, 2, 2, 3, 3, 1, 3, 4, 6, 6, 5, 2, 1, 5, 4).
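The matrix-vector product $v \leftarrow Aw$ is equally simple in this mode; the loop below is a minimal sketch in the style of the code above (not taken from an original listing), with AS, IA, and JA as just defined:

do i = 1, m
   v(i) = 0
end do
do k = 1, l
   ! accumulate the element A(IA(k),JA(k)) = AS(k) into the product
   v(IA(k)) = v(IA(k)) + AS(k)*w(JA(k))
end do

Note that the accumulation into v(IA(k)) is a random scatter, the close cousin of the random gather discussed above, so this mode suffers from the same memory-access inefficiencies.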
Storage by indices is intuitive and simple to implement, but it is not storage efficient for matrices that are not particularly sparse, since each non-zero element costs three stored numbers: its value and its two indices. For example, if A is an integer matrix with more than 1/3 of its entries non-zero, this mode actually consumes more memory than storing all of A!
3.3.4 Storage by Columns or Rows
Another mode is storage by columns (or, analogously, by rows). Storage by columns stores the m x n matrix A in three data structures: two arrays AS and IA of length l, and an array JA of length n + 1, where l is the number of non-zero elements of A. AS contains the non-zero elements of A, column by column, left to right. IA contains the row indices of the corresponding elements in AS, and JA lists the positions in AS at which each new column of A begins. (The last element of JA is l + 1.) For example, the matrix

\[
A = \begin{pmatrix}
11 & 0 & 13 & 0 & 0 & 0 \\
21 & 22 & 0 & 24 & 0 & 0 \\
0 & 32 & 33 & 0 & 0 & 0 \\
0 & 0 & 43 & 44 & 0 & 46 \\
0 & 0 & 0 & 0 & 0 & 0 \\
61 & 62 & 0 & 0 & 0 & 66
\end{pmatrix}
\tag{3.15}
\]
would be stored as
AS = (11, 61, 21, 62, 32, 22, 13, 33, 43, 44, 24, 46, 66),
IA = (1, 6, 2, 6, 3, 2, 1, 3, 4, 4, 2, 4, 6),
JA = (1, 4, 7, 10, 12, 12, 14).
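For illustration, the matrix-vector product $v \leftarrow Aw$ in this mode might be coded as follows (again a sketch in the style of the earlier loops, not an original listing; JA(j) and JA(j+1) - 1 delimit the entries of column j within AS):

do i = 1, m
   v(i) = 0
end do
do j = 1, n
   ! loop over the non-zero elements of column j of A
   do k = JA(j), JA(j+1) - 1
      v(IA(k)) = v(IA(k)) + AS(k)*w(j)
   end do
end do

An empty column, such as the fifth column of (3.15), simply contributes no iterations, since then JA(j) > JA(j+1) - 1.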
3.3.5 Compressed Diagonal Mode
The final mode which we describe in detail is the compressed diagonal storage mode. This is the mode we use for storing most of the matrix in (2.19), as it is designed for square sparse matrices whose elements are concentrated along a few diagonals. As we shall see, this mode also lends itself well to fast matrix-vector products. The compressed diagonal storage mode stores the m x m matrix A in two data structures: an m x l matrix AS and a vector LA of length l, where l is the number of non-zero diagonals in A. The elements of LA give the positions of the non-zero diagonals relative to the major diagonal, and the columns of AS give the diagonals, padded with leading zeroes for diagonals below the major diagonal and with trailing zeroes for diagonals above the major diagonal. For example, the
6 x 6 matrix

\[
A = \begin{pmatrix}
11 & 0 & 13 & 0 & 0 & 0 \\
21 & 22 & 0 & 24 & 0 & 0 \\
0 & 32 & 33 & 0 & 35 & 0 \\
0 & 0 & 43 & 44 & 0 & 46 \\
51 & 0 & 0 & 54 & 55 & 0 \\
61 & 62 & 0 & 0 & 65 & 66
\end{pmatrix}
\tag{3.16}
\]
would be stored as
\[
AS = \begin{pmatrix}
11 & 13 & 0 & 0 & 0 \\
22 & 24 & 21 & 0 & 0 \\
33 & 35 & 32 & 0 & 0 \\
44 & 46 & 43 & 0 & 0 \\
55 & 0 & 54 & 51 & 0 \\
66 & 0 & 65 & 62 & 61
\end{pmatrix}
\tag{3.17}
\]
and
\[
LA = (0, 2, -1, -4, -5).
\tag{3.18}
\]

The compressed diagonal method is storage efficient for sparse matrices whose elements are concentrated along a few diagonals. In addition, it is well suited to matrix-vector products. One can verify that to calculate the matrix-vector product $v \leftarrow Aw$, one simply accumulates into v the element-wise products of the columns of AS with w (properly aligned according to LA). These multiply-add accumulations run very quickly on most architectures, as they access contiguous pieces of memory, one address after another. They run particularly fast on the IBM RS/6000s we used for the simulations in this thesis, on account of their superscalar capability, which allows a multiply and an add instruction to be performed in one cycle.
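In code, this accumulation might look as follows (a sketch rather than the exact routine we used; the loop bounds keep the index i + LA(d) of w within range, and the zero padding of AS makes the skipped positions irrelevant):

do i = 1, m
   v(i) = 0
end do
do d = 1, l
   ! accumulate diagonal d, offset LA(d) from the major diagonal
   do i = max(1, 1 - LA(d)), min(m, m - LA(d))
      v(i) = v(i) + AS(i,d)*w(i + LA(d))
   end do
end do

Both v and w are accessed with unit stride in the inner loop, which is precisely what makes this mode fast.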
The compressed diagonal method is thus extremely well suited to our application, both from the point of view of storage and of speed.
3.3.6 Supercell Application
Having explored some sparse matrix storage modes, we now present our method for storing the matrix in (2.19). Since we have found the quasi-minimal residual (QMR) iterative method best for solving our system, our goal is to store the matrix in (2.19) so as to conserve space and make matrix-vector products as fast as possible.
We begin by analyzing the matrix. As we show below, aside from the two blocks of the form $UVU^t$, the matrix is sparse with all non-zero elements concentrated along 11 diagonals. This part of the matrix is therefore stored using the compressed diagonal mode. The remaining two blocks are composed of the dense matrix U and the diagonal matrix V (see Section 2.1). Since dense matrix-matrix products are costly ($O(M^3)$, where $M = n_x n_y$), we perform the matrix-vector products involving these blocks, $UVU^t c$, one matrix-vector product at a time: $U(V(U^t c))$. We thus store U in an ordinary two-dimensional array and V in a one-dimensional array.
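A sketch of this factored application follows (assuming real arithmetic for simplicity and hypothetical work arrays t1 and t2; the actual routine may differ in detail):

! t1 = U^t c
do j = 1, M
   t1(j) = 0
   do i = 1, M
      t1(j) = t1(j) + U(i,j)*c(i)
   end do
end do
! t2 = V t1, with the diagonal matrix V stored as a vector
do j = 1, M
   t2(j) = V(j)*t1(j)
end do
! v = v + U t2
do j = 1, M
   do i = 1, M
      v(i) = v(i) + U(i,j)*t2(j)
   end do
end do

Each application then costs $O(M^2)$ operations, and the $O(M^3)$ cost of forming $UVU^t$ explicitly is avoided altogether.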
The rest of the matrix (aside from the identity matrix blocks) describes the onsite energies and nearest neighbor interactions in the tight-binding Hamiltonian.
It is therefore not surprising that, if we order the supercell basis correctly, the matrix will have its non-zero elements concentrated along a few diagonals. We can arrange this by ordering the basis elements as in Figure 3.1. (Recall that the basis consists of $n_x n_y n_z$ orbitals, one for each site in the supercell representation of the device; see Section 2.1.) Since the supercell model enforces periodic boundary conditions by connecting sites on opposite edges of the supercell, there are two categories of orbitals to consider in determining the sparsity pattern of the matrix: those on the edges of the supercell and those in the interior. The analysis of orbital connectivity in Table 3.1 shows that there are only eleven different values of $i - j$ for which orbital $|i\rangle$ is connected with orbital $|j\rangle$ to produce a non-zero element in the matrix. Since one of them is the major diagonal $i - j = 0$ (on which the 1's in the unit matrix blocks lie), the part of the matrix excluding the blocks of the form $UVU^t$ has all non-zero elements concentrated on 11 diagonals. We thus use