DAILY RESEARCH LOG
PARTNERSHIP RESEARCH
ITS FUNDING 2020
DEVELOPMENT OF ERROR-FREE TRANSFORMATION ALGORITHMS FOR COMPLEX MATRIX MULTIPLICATION
Research Team:
Dr. Imam Mukhlash, S.Si., MT (Mathematics/FSAD)
Drs. Bandung Arry Sanjoyo, MIKom. (Mathematics/FSAD)
Drs. Nurul Hidayat, MKom (Mathematics/FSAD)
Nurul Yakim Kazal (Mathematics/FSAD)
DIRECTORATE OF RESEARCH AND COMMUNITY SERVICE
INSTITUT TEKNOLOGI SEPULUH NOPEMBER
SURABAYA
2020
In general, this research consisted of preparatory discussions within the team, held before each discussion with the partner, together with the discussions with the partner themselves. Notes on the research activities are summarized in the following table:
No   Date         Activity
1    09/06/2020   Preparation for the first discussion with Prof. Ozaki
                  (supporting documents: Appendix 1)
2    12/06/2020   Discussion between the research team and the partner
                  (supporting documents: Appendix 2)
3    22/06/2020   Preparation for the second discussion; the concept of the
                  ExtractScalar algorithm in error-free transformation
4    26/06/2020   Implementation of the ExtractScalar algorithm concept in
                  error-free transformation for complex matrix multiplication
                  (Proposed Method 1) (supporting documents: Appendix 3)
5    30/06/2020   Discussion of the implementation of the ExtractScalar
                  algorithm concept (supporting documents: Appendix 4)
6    10/07/2020   Implementation of the ExtractScalar algorithm concept in
                  error-free transformation for complex matrix multiplication
                  (Proposed Methods 2 and 3) (supporting documents: Appendix 4)
7    21/07/2020   Preparation of a draft overview paper
                  (supporting documents: Appendix 5)
8    24/07/2020   Draft paper giving an overview of error-free transformation
                  for matrix multiplication and the related methods
                  (supporting documents: Appendix 5)
9    03/08/2020   Preparation of the ICoMPAC paper material
10   06/08/2020   Discussion of the draft paper for ICoMPAC 2020
                  (supporting documents: Appendix 6)
12   12/08/2020   Discussion of the draft paper for ICoMPAC 2020
13   13/08/2020   Discussion of the draft paper for ICoMPAC 2020
14   14/08/2020   Final review of the draft ICoMPAC 2020 paper with
                  Prof. Ozaki (supporting documents: Appendix 7)
15   15/08/2020   Discussion of the final review of the draft ICoMPAC 2020
                  paper based on Prof. Ozaki's feedback
16   18/08/2020   Paper submitted to ICoMPAC
17   19/08/2020   Draft paper for the Journal of Computational and Applied
                  Mathematics (Section 2)
18   24/08/2020   Draft paper for the Journal of Computational and Applied
                  Mathematics (Section 3)
19   28/08/2020   Draft paper for the Journal of Computational and Applied
                  Mathematics (Sections 2 and 3, continued)
                  (supporting documents: Appendix 8)
20   29/08/2020   Revision of the draft paper for the Journal of Computational
                  and Applied Mathematics (Sections 2 and 3) based on
                  Prof. Ozaki's feedback
21   01/09/2020   Revision of the draft paper (Sections 2 and 3) based on
                  Prof. Ozaki's feedback, continued
22   08/09/2020   Revision of the draft paper (Sections 2 and 3) based on
                  Prof. Ozaki's feedback, continued
23   11/09/2020   Revision of the draft paper (Sections 2 and 3): Zoom meeting
                  with Prof. Ozaki (supporting documents: Appendix 9)
24   14/09/2020   Revision of the draft paper (Sections 2 and 3) based on the
                  results of the Zoom meeting with Prof. Ozaki
25   17/09/2020   Revision of the draft paper (Sections 2 and 3) based on the
                  results of the Zoom meeting with Prof. Ozaki, continued
26   24/09/2020   Revision of the draft paper (Sections 2 and 3) based on the
                  results of the Zoom meeting with Prof. Ozaki
27   25/09/2020   Adding numerical experiments to the draft paper for the
                  Journal of Computational and Applied Mathematics: Zoom
                  meeting with Prof. Ozaki
28   29/09/2020   Adding numerical experiments to the draft paper: Zoom
                  meeting with Prof. Ozaki, continued
29   06/10/2020   Adding numerical experiments to the draft paper: Zoom
                  meeting with Prof. Ozaki, continued
30   08/10/2020   Adding numerical experiments to the draft paper: Zoom
                  meeting with Prof. Ozaki, continued; revision of the
                  ICoMPAC paper
31   12/10/2020   Preparation for the Zoom meeting with Prof. Ozaki
32   13/10/2020   Zoom meeting with Prof. Ozaki: discussion of the numerical
                  experiment results in the draft paper for the Journal of
                  Computational and Applied Mathematics
33   15/10/2020   Repeating the experiments based on the results of the
                  discussion in the Zoom meeting
34   20/10/2020   Repeating the experiments based on the results of the
                  discussion in the Zoom meeting, continued
35   22/10/2020   Discussion on improving the ICoMPAC paper for publication
                  with IOP
36   26/10/2020   Discussion on improving the ICoMPAC paper for publication
                  with IOP, continued
37   27/10/2020   Zoom meeting with Prof. Ozaki: further discussion of the
                  ICoMPAC paper for publication with IOP
38   02/11/2020   Submission of the revised ICoMPAC paper for publication
                  with IOP
39   03/11/2020   Discussion of the numerical experiments on the condition
                  number
40   04/11/2020   Numerical experiments on the condition number, continued
41   05/11/2020   Discussion of the numerical experiments on the condition
                  number, continued
42   06/11/2020   Numerical experiments on the condition number, continued
43   07/11/2020   Discussion of the numerical experiments on the condition
                  number, continued
44   -            -
45   10/11/2020   Zoom meeting with Prof. Ozaki: adding the condition-number
                  analysis to the draft paper for the Journal of Computational
                  and Applied Mathematics
46   11/11/2020   Discussion of the condition-number analysis in the draft
                  paper
47   12/11/2020   Discussion of the condition-number analysis in the draft
                  paper; condition-number experiments for complex matrix
                  multiplication with inverses
48   13/11/2020   Discussion of the results of the condition-number
                  experiments for complex matrix multiplication with inverses
49   16/11/2020   Discussion of the results of the condition-number
                  experiments, continued
50   17/11/2020   Discussion of the results of the condition-number
                  experiments, continued
51   23/11/2020   Preparation for the Zoom meeting with Prof. Ozaki
52   24/11/2020   Zoom meeting with Prof. Ozaki: revising the condition-number
                  analysis in the draft paper for the Journal of Computational
                  and Applied Mathematics
53   26/11/2020   Repeating the condition-number experiments; drafting the
                  final report
54   27/11/2020   Repeating the condition-number experiments; drafting the
                  final report, continued
55   28/11/2020   Improving the draft paper; drafting the final report,
                  continued
56   30/11/2020   Finalizing the final report

Note: the supporting documents for each activity may take the form of photos, graphs, tables, notes, documents, data, and so on.
Extending the Use of ExtractScalar Algorithm for Matrix
Splitting
Nurul Yakim Kazal1, Imam Mukhlash2, and Bandung Arry S.3
1,2,3 Department of Mathematics, Institut Teknologi Sepuluh Nopember
June 12, 2020
Rump et al. introduced an algorithm called ExtractScalar, whose analysis is given by Lemma 3.3 in [1], stating that:

Let q and p′ be the results of Algorithm 3.2 (ExtractScalar) applied to floating-point numbers σ and p. Assume σ = 2^k ∈ F for some k ∈ Z, and assume |p| ≤ 2^(−M)·σ for some 0 ≤ M ∈ N. Then

p = q + p′,  |p′| ≤ eps·σ,  |q| ≤ 2^(−M)·σ,  and  q ∈ eps·σ·Z   (1)

According to this lemma, we need to find M and σ such that |p| ≤ 2^(−M)·σ to make the algorithm work well. Rump et al. also introduced the concept of the unit in the first place (ufp) in [1], which is defined by

ufp(r) := 2^⌊log2 |r|⌋  for 0 ≠ r ∈ R   (2)

and one of its properties states that

0 ≠ r ∈ R  ⟹  ufp(r) ≤ |r| < 2·ufp(r)   (3)

so it is reasonable to use the last inequality in (3) to set

2·ufp(r) = 2^(−M)·σ  for 0 ≠ r ∈ R   (4)

But we also know that 2·ufp(r) = 2·ufp(r)·2^M·2^(−M), which means

σ = 2·ufp(r)·2^M = 2^(M+1)·ufp(r)   (5)

for 0 ≠ r ∈ R. Therefore, we can define σ := 2^(M+1)·ufp(r), for 0 ≤ M ∈ N, when applying ExtractScalar to a nonzero floating-point number r as input.
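As a quick numerical check of this derivation, the sketch below implements the scalar split in Python under IEEE 754 double precision (the names `ufp` and `extract_scalar` are ours, not from [1]) and verifies the properties stated in (1):

```python
import math

def ufp(r):
    # Unit in the first place: the largest power of two not exceeding |r|.
    return 0.0 if r == 0.0 else 2.0 ** math.floor(math.log2(abs(r)))

def extract_scalar(sigma, p):
    # Rump's ExtractScalar: q carries the leading bits of p, p_rest the rest.
    q = (sigma + p) - sigma
    return q, p - q

eps = 2.0 ** -53
M = 10
r = math.pi                       # any nonzero double
sigma = 2.0 ** (M + 1) * ufp(r)   # the choice of sigma derived in (5)
q, p_rest = extract_scalar(sigma, r)

assert q + p_rest == r                    # the split is error-free: p = q + p'
assert abs(p_rest) <= eps * sigma         # |p'| <= eps * sigma
assert abs(q) <= 2.0 ** -M * sigma        # |q| <= 2^(-M) * sigma
assert q / (eps * sigma) == int(q / (eps * sigma))  # q is a multiple of eps * sigma
```

The three assertions correspond term by term to the conclusion of Lemma 3.3; the choice σ = 2^(M+1)·ufp(r) guarantees the hypothesis |p| ≤ 2^(−M)·σ for any nonzero r.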
We can now adapt the ExtractScalar algorithm to perform the split of a matrix A ∈ F^(m×n), since matrix splitting is basically the process of applying ExtractScalar to every element a_ij of A, for 1 ≤ i ≤ m and 1 ≤ j ≤ n. This is done by transforming the matrix A ∈ F^(m×n) into A^(1) and A^(2) such that

A = A^(1) + A^(2)   (6)

In order to do this, we first define a constant γ by

γ = M = ⌊log2(n)⌋   (7)

As there are only 53 bits to store the significand of a floating-point number in double precision, we need to assume that n ≪ eps^(−1), where eps = 2^(−53). Next, we can choose to apply the ExtractScalar algorithm either column-by-column or row-by-row. For the latter, we need to find the vector P ∈ F^m whose elements are given by

P_i = max_{1≤j≤n} |a_ij|   (8)

where a_ij represents each element of the matrix A. Finding the vector P is useful for computing the vector Q, of the same size as P, whose elements are defined by

Q_i = 2^(γ+1)·ufp(P_i)   (9)

Obviously, (9) is just another form of (5) and, by (3), we have

max_{1≤j≤n} |a_ij| < Q_i   (10)

Therefore, we can set σ_i = Q_i and apply the ExtractScalar algorithm to every element of the i-th row. After applying the algorithm to all rows of A, we end up with two matrices A^(1) and A^(2) satisfying (6).

Correspondingly, if we want to apply the algorithm column-by-column, we need to find the vector R ∈ F^n whose elements are given by

R_j = max_{1≤i≤m} |a_ij|   (11)

where a_ij represents each element of the matrix A. Finding the vector R is useful for computing the vector S, of the same size as R, whose elements are defined by

S_j = 2^(γ+1)·ufp(R_j)   (12)

Again, (12) is just another form of (5) and, by (3), we have

max_{1≤i≤m} |a_ij| < S_j   (13)

Therefore, we can set σ_j = S_j and apply the ExtractScalar algorithm to every element of the j-th column. After applying the algorithm to all columns of A, we end up with two matrices A^(1) and A^(2) satisfying (6).
The procedure explained before (either row-by-row or column-by-column) can be applied again to A^(2), and it gives as outputs A^(2) and A^(3) satisfying

A^(2) = A^(2) + A^(3)   (14)

From (6) and (14), we then have

A = A^(1) + A^(2) + A^(3)   (15)

Applying the procedure repeatedly (k − 1) times, we end up having

A = A^(1) + A^(2) + · · · + A^(k)   (16)

Algorithm 1 ExtractScalar
function [q, p′] = ExtractScalar(σ, p)
    q = fl((σ + p) − σ)
    p′ = fl(p − q)
end
Algorithm 2 UnitFirstPlace
function [ufp] = UnitFirstPlace(r)
    if r == 0
        ufp = 0
    else
        a = floor(log2(abs(r)))
        ufp = 2^a
    end
end

Algorithm 3 MatrixSplittingByRow
function [S] = MatrixSplittingByRow(A)
    [m, n] = size(A)
    γ = floor(log2(n))
    S{1} = zeros(size(A))
    k = 0
    while norm(A, inf) ~= 0
        k = k + 1
        mu = max(abs(A), [], 2)          % row-wise maxima
        ufp = UnitFirstPlace(mu)
        σ = 2^(γ+1) .* ufp
        for i = 1:m
            for j = 1:n
                S{k}(i, j) = ExtractScalar(σ(i), A(i, j))
            end
        end
        A = A − S{k}
    end
end

Algorithm 4 MatrixSplittingByColumn
function [S] = MatrixSplittingByColumn(A)
    [m, n] = size(A)
    γ = floor(log2(n))
    S{1} = zeros(size(A))
    k = 0
    while norm(A, inf) ~= 0
        k = k + 1
        mu = max(abs(A))                 % column-wise maxima
        ufp = UnitFirstPlace(mu)
        σ = 2^(γ+1) .* ufp
        for j = 1:n
            for i = 1:m
                S{k}(i, j) = ExtractScalar(σ(j), A(i, j))
            end
        end
        A = A − S{k}
    end
end
References
[1] Siegfried M. Rump, Takeshi Ogita, and Shin'ichi Oishi. Accurate floating-point summation, part I: Faithful rounding. SIAM Journal on Scientific Computing, 31(1):189–224, 2008.
Error-Free Transformation for Complex Matrix
Multiplication
Nurul Yakim Kazal1, Imam Mukhlash2, Chairul Imron3, and Bandung Arry S.4
1,2,3,4 Department of Mathematics, Institut Teknologi Sepuluh Nopember
June 25, 2020
1 ExtractScalar Algorithm for Error-Free Splitting of Floating-Point Numbers
Rump et al. introduced the concept of the unit in the first place (ufp), or leading bit, of a real number in [2], which is defined by

ufp(r) := 2^⌊log2 |r|⌋ for 0 ≠ r ∈ R, and ufp(0) := 0   (1)
Based on this definition, we construct an algorithm called UnitFirstPlace as follows:
Algorithm 1 UnitFirstPlace
function [ufp] = UnitFirstPlace(r)
    if r == 0
        ufp = 0
    else
        a = floor(log2(abs(r)))
        ufp = 2^a
    end
end
One of the ufp properties given in [2] states that

0 ≠ r ∈ R  ⟹  ufp(r) ≤ |r| < 2·ufp(r)   (2)

Proof of (2) (the property is given, but not proven, in [2]):
Assume that 0 ≠ r ∈ R. Then log2 |r| ∈ R, and it holds that 0 ≤ log2 |r| − ⌊log2 |r|⌋ < 1. Consequently,

0 ≤ log2 |r| − ⌊log2 |r|⌋ < 1
  ⟹ ⌊log2 |r|⌋ ≤ log2 |r| < 1 + ⌊log2 |r|⌋
  ⟹ 2^⌊log2 |r|⌋ ≤ 2^(log2 |r|) < 2^(1+⌊log2 |r|⌋)
  ⟹ 2^⌊log2 |r|⌋ ≤ 2^(log2 |r|) < 2·2^⌊log2 |r|⌋
  ⟹ ufp(r) ≤ |r| < 2·ufp(r)

Rump et al. also constructed an algorithm named ExtractScalar in [2] as follows:
Algorithm 2 ExtractScalar
function [q, p′] = ExtractScalar(σ, p)
    q = fl((σ + p) − σ)
    p′ = fl(p − q)
end
Algorithm 2 is basically used for splitting a floating-point number p into two parts, namely q and p′, such that p = q + p′. The analysis of Algorithm 2 is given by the following lemma:

Lemma 1 (proven as Lemma 3.3 in [2])
Let q and p′ be the results of Algorithm 2 (ExtractScalar) applied to floating-point numbers σ and p. Assume σ = 2^k ∈ F for some k ∈ Z, and assume |p| ≤ 2^(−M)·σ for some 0 ≤ M ∈ N. Then

p = q + p′,  |p′| ≤ u·σ,  |q| ≤ 2^(−M)·σ,  and  q ∈ u·σ·Z   (3)

with u = 2^(−53) for IEEE 754 binary64 (double precision).
According to this lemma, we need to find M and σ such that |p| ≤ 2^(−M)·σ is satisfied to make the algorithm work well. Since the second inequality in (2) always holds for 0 ≠ r ∈ R, it is reasonable to set

2^(−M)·σ = 2·ufp(r)  for 0 ≠ r ∈ R   (4)

such that

|r| < 2^(−M)·σ   (5)

so the assumption required by the lemma is always satisfied. However, we also know that

2·ufp(r) = 2·ufp(r)·2^M·2^(−M)   (6)

From (4) and (6), we have 2·ufp(r)·2^M·2^(−M) = 2^(−M)·σ, and applying the cancellation law to this equation gives

σ = 2·ufp(r)·2^M = 2^(M+1)·ufp(r)   (7)

for 0 ≠ r ∈ R. Therefore, we can define

σ := 2^(M+1)·ufp(r)   (8)

for 0 ≤ M ∈ N when applying ExtractScalar to a nonzero floating-point number r as input.
2 Extending the Use of the ExtractScalar Algorithm for Matrix Splitting and Matrix Multiplication
2.1 Matrix Splitting
We can now adapt the ExtractScalar algorithm to perform the transformation of a matrix A ∈ F^(m×n) into A^(1) and A^(2) of the same size as A such that

A = A^(1) + A^(2)   (9)

This is done by applying the ExtractScalar algorithm to every element a_ij of A, for 1 ≤ i ≤ m and 1 ≤ j ≤ n, either row-by-row or column-by-column. For the former, we need to find a vector P ∈ F^m whose elements are given by

P_i = max_{1≤j≤n} |a_ij|   (10)

for 1 ≤ i ≤ m, where a_ij represents each element of the matrix A. The vector P is used for computing the vector σ, of the same size as P, whose elements are defined by

σ_i = 2^(M+1)·ufp(P_i)   (11)

Obviously, (11) is just another form of (8). From (11) and (5), we have

P_i < σ_i   (12)

and, more precisely, |a_ij| ≤ P_i < 2·ufp(P_i) = 2^(−M)·σ_i for 1 ≤ j ≤ n. This result shows that σ_i, for 1 ≤ i ≤ m, satisfies the required assumption of Lemma 1. Therefore, we can now apply the ExtractScalar algorithm to every element of the i-th row of A and end up with two matrices A^(1) and A^(2) satisfying (9). This procedure is carried out by the following algorithm, called MatrixSplittingByRow:
Algorithm 3 MatrixSplittingByRow
function [A1, A2] = MatrixSplittingByRow(A, M)
    [m, n] = size(A)
    P = max(abs(A), [], 2)           % row-wise maxima
    ufp = UnitFirstPlace(P)
    σ = 2^(M+1) .* ufp
    A1 = zeros(size(A))
    A2 = zeros(size(A))
    for i = 1:m
        for j = 1:n
            [A1(i, j), A2(i, j)] = ExtractScalar(σ(i), A(i, j))
        end
    end
end

Here A1 and A2 denote A^(1) and A^(2), and the constant M is passed as an input to the function.
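For illustration, a direct Python transcription of this row-wise splitting (hypothetical names; plain nested lists in place of MATLAB matrices) confirms that the split is error-free elementwise:

```python
import math

def ufp(r):
    # Unit in the first place: the largest power of two not exceeding |r|.
    return 0.0 if r == 0.0 else 2.0 ** math.floor(math.log2(abs(r)))

def extract_scalar(sigma, p):
    # ExtractScalar in double precision: p == q + p_rest exactly.
    q = (sigma + p) - sigma
    return q, p - q

def matrix_splitting_by_row(A, M):
    # One sigma_i per row, as in eq. (10) and (11), shared by the whole row.
    A1 = [[0.0] * len(A[0]) for _ in A]
    A2 = [[0.0] * len(A[0]) for _ in A]
    for i, row in enumerate(A):
        P_i = max(abs(x) for x in row)
        sigma_i = 2.0 ** (M + 1) * ufp(P_i)
        for j, x in enumerate(row):
            A1[i][j], A2[i][j] = extract_scalar(sigma_i, x)
    return A1, A2

A = [[0.1, 0.2, 0.3],
     [1e-3, 2.5, -0.7]]
A1, A2 = matrix_splitting_by_row(A, M=3)
assert all(A1[i][j] + A2[i][j] == A[i][j]
           for i in range(2) for j in range(3))   # A = A(1) + A(2), error-free
```

Because σ_i is a power of two bounding the whole row, every elementwise split satisfies the hypothesis of Lemma 1, so the reconstruction A = A^(1) + A^(2) holds with no rounding error.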
Correspondingly, if we want to apply the ExtractScalar algorithm to A column-by-column, we need to find the vector Q ∈ F^n whose elements are given by

Q_j = max_{1≤i≤m} |a_ij|   (13)

for 1 ≤ j ≤ n, where a_ij represents each element of the matrix A. The vector Q is used for computing the vector τ, of the same size as Q, whose elements are defined by

τ_j = 2^(M+1)·ufp(Q_j)   (14)

Again, (14) is just another form of (8). Using (14) and (5), we have

Q_j < τ_j   (15)

implying that |a_ij| < τ_j for 1 ≤ i ≤ m. This demonstrates that τ_j, for 1 ≤ j ≤ n, meets the necessary assumption of Lemma 1. Therefore, we can then apply the ExtractScalar algorithm to every element of the j-th column of A and obtain two matrices A^(1) and A^(2) such that (9) holds. This procedure is carried out by the following algorithm, called MatrixSplittingByColumn:
Algorithm 4 MatrixSplittingByColumn
function [A1, A2] = MatrixSplittingByColumn(A, M)
    [m, n] = size(A)
    Q = max(abs(A))                  % column-wise maxima
    ufp = UnitFirstPlace(Q)
    τ = 2^(M+1) .* ufp
    A1 = zeros(size(A))
    A2 = zeros(size(A))
    for j = 1:n
        for i = 1:m
            [A1(i, j), A2(i, j)] = ExtractScalar(τ(j), A(i, j))
        end
    end
end
2.2 Matrix Multiplication
As explained in [1], we first define M by

M := ⌈(log2(n) + 53) / 2⌉   (16)

and use this M in Algorithm 3 and Algorithm 4 to split matrices A ∈ F^(m×n) and B ∈ F^(n×p), respectively and repeatedly. Then there exist n_A, n_B ∈ N such that

A = Σ_{r=1}^{n_A} D^(r),  B = Σ_{s=1}^{n_B} E^(s),  D^(n_A+1) = O_mn,  and  E^(n_B+1) = O_np   (17)

where O_mn is a zero matrix of size m × n.
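The role of M in (16) can be checked on a single dot product: after one split, the leading pieces of a row of A and a column of B multiply in ordinary floating point without any rounding. The sketch below (illustrative names; exact rational arithmetic via `fractions` serves as the reference) assumes double precision:

```python
import math
from fractions import Fraction

def ufp(r):
    # Unit in the first place: the largest power of two not exceeding |r|.
    return 0.0 if r == 0.0 else 2.0 ** math.floor(math.log2(abs(r)))

def extract(sigma, p):
    # ExtractScalar: returns the leading piece q and the remainder.
    q = (sigma + p) - sigma
    return q, p - q

n = 4
M = math.ceil((math.log2(n) + 53) / 2)          # eq. (16)

a = [0.123, -0.456, 0.789, 0.321]               # a row of A
b = [0.987, 0.654, -0.321, 0.111]               # a column of B

sigma = 2.0 ** (M + 1) * ufp(max(abs(x) for x in a))
tau = 2.0 ** (M + 1) * ufp(max(abs(x) for x in b))
d = [extract(sigma, x)[0] for x in a]           # leading piece of the row
e = [extract(tau, x)[0] for x in b]             # leading piece of the column

fl_dot = 0.0
for k in range(n):
    fl_dot += d[k] * e[k]                       # plain floating-point ops

exact = sum(Fraction(d[k]) * Fraction(e[k]) for k in range(n))
assert Fraction(fl_dot) == exact                # no roundoff in the dot product
```

With M chosen as in (16), each piece carries at most about (53 − log2 n)/2 significant bits, so every product and every partial sum fits into a 53-bit significand and the floating-point dot product is exact.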
Next, we modify Algorithms 3 and 4 to construct Algorithms 5 and 6, in order to transform the matrices A ∈ F^(m×n) and B ∈ F^(n×p) such that (17) is obtained.
Algorithm 5 MatrixSplittingByRow_Mod
function [D] = MatrixSplittingByRow_Mod(A)
    [m, n] = size(A)
    M = ceil((log2(n) + 53) / 2)
    D{1} = zeros(size(A))
    k = 0
    while norm(A, inf) ~= 0
        k = k + 1
        P = max(abs(A), [], 2)       % row-wise maxima
        ufp = UnitFirstPlace(P)
        σ = 2^(M+1) .* ufp
        for i = 1:m
            for j = 1:n
                [D{k}(i, j), A(i, j)] = ExtractScalar(σ(i), A(i, j))
            end
        end
    end
end
Algorithm 6 MatrixSplittingByColumn_Mod
function [E] = MatrixSplittingByColumn_Mod(B)
    [n, p] = size(B)
    M = ceil((log2(n) + 53) / 2)
    E{1} = zeros(size(B))
    k = 0
    while norm(B, inf) ~= 0
        k = k + 1
        Q = max(abs(B))              % column-wise maxima
        ufp = UnitFirstPlace(Q)
        τ = 2^(M+1) .* ufp
        for i = 1:n
            for j = 1:p
                [E{k}(i, j), B(i, j)] = ExtractScalar(τ(j), B(i, j))
            end
        end
    end
end
Next, Theorem 1 in [1] guarantees that

fl(D^(r) E^(s)) = D^(r) E^(s)   (18)

which also implies an error-free transformation of the matrix product:

AB = Σ_{1≤i≤n_A, 1≤j≤n_B} fl(D^(i) E^(j))   (19)
Next, we adapt the EFT_Mul algorithm constructed by Ozaki et al. in [1], using the slightly different matrix splitting algorithms above (Algorithms 5 and 6), to form the EFT_MatMul algorithm. This algorithm transforms the product of A ∈ F^(m×n) and B ∈ F^(n×p) into an unevaluated sum of floating-point matrices without rounding errors, such that

AB = Σ_{i=1}^{n_A·n_B} C^(i),  where C^(i) ∈ F^(m×p)   (20)
Algorithm 7 EFT_MatMul
function [C] = EFT_MatMul(A, B)
    P = MatrixSplittingByRow_Mod(A)
    Q = MatrixSplittingByColumn_Mod(B)
    NA = length(P)
    NB = length(Q)
    k = 1
    for i = 1:NA
        for j = 1:NB
            C{k} = P{i} * Q{j}
            k = k + 1
        end
    end
end
To obtain an accurate result from (20), we can apply the accurate summation algorithm AccSum, developed by Rump et al. in [2], to the output of Algorithm 7, which gives the following algorithm:

Algorithm 8 AccMatMul
function [Result_AB] = AccMatMul(A, B)
    [m, n] = size(A)
    [n, p] = size(B)
    C = EFT_MatMul(A, B)
    Result_AB = zeros(m, p)
    for i = 1:m
        for j = 1:p
            vector = zeros(size(C))
            for k = 1:length(vector)
                vector(k) = C{k}(i, j)
            end
            Result_AB(i, j) = AccSum(vector)
        end
    end
end
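AccSum itself is not reproduced here, but the error-free property (20) can still be demonstrated end-to-end: the sketch below (hypothetical helper names, a small 2×2 case) splits A row-wise and B column-wise until exhaustion, multiplies every pair of pieces in ordinary floating point, and checks with exact rational arithmetic that the unevaluated sum of the products equals AB:

```python
import math
from fractions import Fraction

def ufp(r):
    # Unit in the first place: the largest power of two not exceeding |r|.
    return 0.0 if r == 0.0 else 2.0 ** math.floor(math.log2(abs(r)))

def split_rows(A, M):
    # One row-wise splitting step: returns (leading pieces, remainders).
    D = [[0.0] * len(A[0]) for _ in A]
    R = [[0.0] * len(A[0]) for _ in A]
    for i, row in enumerate(A):
        sigma = 2.0 ** (M + 1) * ufp(max(abs(x) for x in row))
        for j, x in enumerate(row):
            q = (sigma + x) - sigma
            D[i][j], R[i][j] = q, x - q
    return D, R

def split_cols(B, M):
    # Column-wise splitting via transpose, row split, transpose back.
    tr = lambda X: [list(col) for col in zip(*X)]
    Dt, Rt = split_rows(tr(B), M)
    return tr(Dt), tr(Rt)

def split_all(X, M, by_rows):
    # Split repeatedly until the remainder is the zero matrix.
    pieces = []
    while any(x != 0.0 for row in X for x in row):
        P, X = (split_rows if by_rows else split_cols)(X, M)
        pieces.append(P)
    return pieces

def matmul(X, Y):
    # Ordinary floating-point matrix product.
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

n = 2
M = math.ceil((math.log2(n) + 53) / 2)
A = [[0.1, 0.2], [0.3, 0.4]]
B = [[0.5, 0.6], [0.7, 0.8]]

DA = split_all(A, M, True)
EB = split_all(B, M, False)
C = [matmul(D, E) for D in DA for E in EB]   # the pieces C(i) of eq. (20)

exact_AB = [[sum(Fraction(A[i][k]) * Fraction(B[k][j]) for k in range(n))
             for j in range(n)] for i in range(n)]
eft_AB = [[sum(Fraction(Ci[i][j]) for Ci in C) for j in range(n)] for i in range(n)]
assert eft_AB == exact_AB   # the unevaluated sum of the C(i) equals AB exactly
```

In a full implementation the final summation would be done by AccSum per entry, as in Algorithm 8; the rational sum above only serves to verify that no rounding error is left in the pieces.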
3 Complex Matrix Multiplication

3.1 Simple Application

If we have matrices A, B ∈ F^(m×n) and C, D ∈ F^(n×p), then
(A + Bi)(C + Di) = (AC − BD) + (AD + BC)i (21)
From Section 2, we know that accurate results for the matrix products AC, BD, AD, and BC can be achieved by implementing Algorithm 8. Then, applying AccSum to AC and −BD, we get (AC − BD). Similarly, executing AccSum with AD and BC as inputs yields (AD + BC). Using these ideas, we construct an algorithm called AccCompMatMul to obtain accurate results for (21), as follows:

Algorithm 9 AccCompMatMul
function [RealPart, ImaginaryPart] = AccCompMatMul(A, B, C, D)
    [m, n] = size(A)
    [n, p] = size(C)
    Real{1} = AccMatMul(A, C)
    Real{2} = −AccMatMul(B, D)
    Imaginary{1} = AccMatMul(A, D)
    Imaginary{2} = AccMatMul(B, C)
    RealPart = zeros(m, p)
    ImaginaryPart = zeros(m, p)
    for i = 1:m
        for j = 1:p
            RealVector = zeros(size(Real))
            ImaginaryVector = zeros(size(Imaginary))
            for k = 1:length(RealVector)
                RealVector(k) = Real{k}(i, j)
                ImaginaryVector(k) = Imaginary{k}(i, j)
            end
            RealPart(i, j) = AccSum(RealVector)
            ImaginaryPart(i, j) = AccSum(ImaginaryVector)
        end
    end
end
RealPart and ImaginaryPart, the outputs of Algorithm 9, represent (AC − BD) and (AD + BC), respectively.
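The decomposition (21) underlying Algorithm 9 is easy to verify independently. The snippet below uses small integer matrices, so every floating-point operation is exact, and compares the four real products against multiplication with Python's built-in complex numbers:

```python
def matmul(X, Y):
    # Plain matrix product over nested lists.
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matadd(X, Y, sign=1):
    # Entrywise X + sign * Y.
    return [[X[i][j] + sign * Y[i][j] for j in range(len(X[0]))]
            for i in range(len(X))]

A = [[1, 2], [3, 4]]; B = [[5, 6], [7, 8]]   # real/imag parts of the left factor
C = [[2, 0], [1, 3]]; D = [[4, 1], [0, 2]]   # real/imag parts of the right factor

real_part = matadd(matmul(A, C), matmul(B, D), sign=-1)   # AC - BD
imag_part = matadd(matmul(A, D), matmul(B, C))            # AD + BC

# Reference: multiply directly with Python's complex numbers.
Z = matmul([[complex(A[i][j], B[i][j]) for j in range(2)] for i in range(2)],
           [[complex(C[i][j], D[i][j]) for j in range(2)] for i in range(2)])
assert all(Z[i][j] == complex(real_part[i][j], imag_part[i][j])
           for i in range(2) for j in range(2))
```

With floating-point inputs the four products would each carry rounding errors, which is exactly why Algorithm 9 routes them through AccMatMul and AccSum.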
References
[1] Katsuhisa Ozaki, Takeshi Ogita, Shin'ichi Oishi, and Siegfried M. Rump. Error-free transformations of matrix multiplication by using fast routines of matrix multiplication and its applications. Numerical Algorithms, 59(1):95–118, 2012.
[2] Siegfried M. Rump, Takeshi Ogita, and Shin'ichi Oishi. Accurate floating-point summation, part I: Faithful rounding. SIAM Journal on Scientific Computing, 31(1):189–224, 2008.
Error-Free Transformation for Complex Matrix
Multiplication
Nurul Yakim Kazal1, Imam Mukhlash2, Chairul Imron3, and Bandung Arry S.4
1,2,3,4 Department of Mathematics, Institut Teknologi Sepuluh Nopember
July 10, 2020
1 Tentative Goal 3.2
If we are given complex matrices

A + Bi and C + Di   (1)

for A, B ∈ F^(m×n) and C, D ∈ F^(n×p), then

(A + Bi)(C + Di) = (AC − BD) + (AD + BC)i   (2)
Ozaki et al. in [1] defined β as

β := ⌈(log2(n) − log2(u)) / 2⌉   (3)

From the proof of Theorem 1 in [1], we find that β needs to satisfy the condition

(log2(n) − log2(u)) / 2 ≤ β   (4)

such that

fl(A^(r) C^(s)) = A^(r) C^(s),  fl(B^(r) D^(s)) = B^(r) D^(s),  fl(A^(r) D^(s)) = A^(r) D^(s),  and  fl(B^(r) C^(s)) = B^(r) C^(s)   (5)

hold, where A^(r), B^(r), C^(s), and D^(s) are given by the following:

A = Σ_{r=1}^{n_A} A^(r),  B = Σ_{r=1}^{n_B} B^(r),  C = Σ_{s=1}^{n_C} C^(s),  D = Σ_{s=1}^{n_D} D^(s)   (6)

for A^(r), B^(r) ∈ F^(m×n) and C^(s), D^(s) ∈ F^(n×p). We need to construct a new splitting algorithm to obtain (6) such that the following are satisfied:

fl(A^(r) C^(s) − B^(r) D^(s)) = A^(r) C^(s) − B^(r) D^(s)  and  fl(A^(r) D^(s) + B^(r) C^(s)) = A^(r) D^(s) + B^(r) C^(s)   (7)
In order to do that, we first define a constant γ as follows:

γ := ⌈(log2(n) − log2(u) + 1) / 2⌉   (8)

From (3), (4), and (8), we have

(log2(n) − log2(u)) / 2 ≤ β = ⌈(log2(n) − log2(u)) / 2⌉ ≤ ⌈(log2(n) − log2(u) + 1) / 2⌉ = γ   (9)

which means that γ is valid to guarantee that (5) is satisfied.
Similar to the procedures done in [1], two vectors σ^(1) ∈ F^m and τ^(1) ∈ F^p are defined by

σ^(1)_i = 2^γ · 2^(P^(1)_i)  and  τ^(1)_j = 2^γ · 2^(Q^(1)_j)   (10)

where P^(1)_i and Q^(1)_j are given by

P^(1)_i = ⌈log2( max_{1≤j≤n} |a_ij| + max_{1≤j≤n} |b_ij| )⌉  and  Q^(1)_j = ⌈log2( max_{1≤i≤n} |c_ij| + max_{1≤i≤n} |d_ij| )⌉   (11)
Using σ^(1) and the concept of ExtractScalar on every element of the matrices A and B, and then using τ^(1) and the concept of ExtractScalar on every element of the matrices C and D, results in the following:

A = A^(1) + A^(2),  B = B^(1) + B^(2),  C = C^(1) + C^(2),  and  D = D^(1) + D^(2)   (12)

Again, σ^(2) ∈ F^m and τ^(2) ∈ F^p are defined by

σ^(2)_i = 2^γ · 2^(P^(2)_i)  and  τ^(2)_j = 2^γ · 2^(Q^(2)_j)   (13)

where P^(2)_i and Q^(2)_j are given by

P^(2)_i = ⌈log2( max_{1≤j≤n} |a^(2)_ij| + max_{1≤j≤n} |b^(2)_ij| )⌉  and  Q^(2)_j = ⌈log2( max_{1≤i≤n} |c^(2)_ij| + max_{1≤i≤n} |d^(2)_ij| )⌉   (14)

Using σ^(2) and the concept of ExtractScalar on every element of the matrices A^(2) and B^(2), and then using τ^(2) and the concept of ExtractScalar on every element of the matrices C^(2) and D^(2), results in the following:

A^(2) = A^(2) + A^(3),  B^(2) = B^(2) + B^(3),  C^(2) = C^(2) + C^(3),  and  D^(2) = D^(2) + D^(3)   (15)
The general idea is to define σ^(w) ∈ F^m and τ^(w) ∈ F^p as

σ^(w)_i = 2^γ · 2^(P^(w)_i)  and  τ^(w)_j = 2^γ · 2^(Q^(w)_j)   (16)

where P^(w)_i and Q^(w)_j are given by

P^(w)_i = ⌈log2( max_{1≤j≤n} |a^(w)_ij| + max_{1≤j≤n} |b^(w)_ij| )⌉  and  Q^(w)_j = ⌈log2( max_{1≤i≤n} |c^(w)_ij| + max_{1≤i≤n} |d^(w)_ij| )⌉   (17)

Using σ^(w) and the concept of ExtractScalar on every element of the matrices A^(w) and B^(w), and then using τ^(w) and the concept of ExtractScalar on every element of the matrices C^(w) and D^(w), results in the following:

A^(w) = A^(w) + A^(w+1),  B^(w) = B^(w) + B^(w+1),  C^(w) = C^(w) + C^(w+1),  and  D^(w) = D^(w) + D^(w+1)   (18)

This general idea is applied repeatedly until (6) and the following conditions hold:

A^(n_A+1) = O_mn,  B^(n_B+1) = O_mn,  C^(n_C+1) = O_np,  and  D^(n_D+1) = O_np   (19)

where O_mn and O_np represent zero matrices of size m × n and n × p, respectively.
Theorem A
Assume that A, B ∈ F^(m×n) and C, D ∈ F^(n×p). Implementing (16) and the general idea above repeatedly results in (6) and (19). It also implies that (7) holds.

Proof
We need to show that (6) holds and that the following are true:

fl(A^(r) C^(s) − B^(r) D^(s)) = A^(r) C^(s) − B^(r) D^(s)  and  fl(A^(r) D^(s) + B^(r) C^(s)) = A^(r) D^(s) + B^(r) C^(s)

It suffices to show that fl(A^(r) D^(s) + B^(r) C^(s)) = A^(r) D^(s) + B^(r) C^(s) is satisfied, since fl(A^(r) C^(s) − B^(r) D^(s)) = A^(r) C^(s) − B^(r) D^(s) follows by exactly the same idea with different signs.
As explained before, if we use σ^(r) to split the matrices A and B, then we have

a^(r)_ij ∈ u·σ^(r)_i·Z  and  b^(r)_ij ∈ u·σ^(r)_i·Z   (20)

|a^(r)_ij| ≤ 2^(−γ)·σ^(r)_i  and  |b^(r)_ij| ≤ 2^(−γ)·σ^(r)_i   (21)

Similarly, if we use τ^(s) to split the matrices C and D, then we have

c^(s)_ij ∈ u·τ^(s)_j·Z  and  d^(s)_ij ∈ u·τ^(s)_j·Z   (22)

|c^(s)_ij| ≤ 2^(−γ)·τ^(s)_j  and  |d^(s)_ij| ≤ 2^(−γ)·τ^(s)_j   (23)

(20), (21), (22), and (23) are all consequences of applying the concept of the ExtractScalar algorithm, as shown by Lemma 3.3 in [2].
From (20) and (22), we obtain

a^(r)_ik·d^(s)_kj ∈ u²·σ^(r)_i·τ^(s)_j·Z,  b^(r)_ik·c^(s)_kj ∈ u²·σ^(r)_i·τ^(s)_j·Z,  and  a^(r)_ik·d^(s)_kj + b^(r)_ik·c^(s)_kj ∈ u²·σ^(r)_i·τ^(s)_j·Z   (24)

which then implies

Σ_{k=1}^{n} a^(r)_ik·d^(s)_kj ∈ u²·σ^(r)_i·τ^(s)_j·Z,  Σ_{k=1}^{n} b^(r)_ik·c^(s)_kj ∈ u²·σ^(r)_i·τ^(s)_j·Z,  and  Σ_{k=1}^{n} a^(r)_ik·d^(s)_kj + Σ_{k=1}^{n} b^(r)_ik·c^(s)_kj ∈ u²·σ^(r)_i·τ^(s)_j·Z   (25)

From (21) and (23), we obtain

|a^(r)_ik·d^(s)_kj| ≤ 2^(−2γ)·σ^(r)_i·τ^(s)_j  and  |b^(r)_ik·c^(s)_kj| ≤ 2^(−2γ)·σ^(r)_i·τ^(s)_j   (26)

which then implies

Σ_{k=1}^{n} |a^(r)_ik·d^(s)_kj| ≤ n·2^(−2γ)·σ^(r)_i·τ^(s)_j  and  Σ_{k=1}^{n} |b^(r)_ik·c^(s)_kj| ≤ n·2^(−2γ)·σ^(r)_i·τ^(s)_j   (27)

Obviously, the following hold:

|Σ_{k=1}^{n} a^(r)_ik·d^(s)_kj| ≤ Σ_{k=1}^{n} |a^(r)_ik·d^(s)_kj| ≤ n·2^(−2γ)·σ^(r)_i·τ^(s)_j  and  |Σ_{k=1}^{n} b^(r)_ik·c^(s)_kj| ≤ Σ_{k=1}^{n} |b^(r)_ik·c^(s)_kj| ≤ n·2^(−2γ)·σ^(r)_i·τ^(s)_j   (28)

and (28) implies that

|Σ_{k=1}^{n} a^(r)_ik·d^(s)_kj + Σ_{k=1}^{n} b^(r)_ik·c^(s)_kj| ≤ |Σ_{k=1}^{n} a^(r)_ik·d^(s)_kj| + |Σ_{k=1}^{n} b^(r)_ik·c^(s)_kj| ≤ 2·n·2^(−2γ)·σ^(r)_i·τ^(s)_j = n·2^(−2γ+1)·σ^(r)_i·τ^(s)_j   (29)
Using the definition of γ, we find that

n·2^(−2γ+1) = n·2^(−2·⌈(log2(n) − log2(u) + 1)/2⌉ + 1) ≤ n·2^(−(log2(n) − log2(u) + 1) + 1) = n·2^(−log2(n) + log2(u)) = u   (30)

From (29) and (30), we find that

|Σ_{k=1}^{n} a^(r)_ik·d^(s)_kj + Σ_{k=1}^{n} b^(r)_ik·c^(s)_kj| ≤ u·σ^(r)_i·τ^(s)_j   (31)
Using (24), (25), and (31), we obtain the following: if Σ_{k=1}^{n} a^(r)_ik·d^(s)_kj + Σ_{k=1}^{n} b^(r)_ik·c^(s)_kj ≠ 0, then it is a nonzero element of u²·σ^(r)_i·τ^(s)_j·Z whose magnitude is at most u·σ^(r)_i·τ^(s)_j; and if the sum equals 0, then fl(Σ_{k=1}^{n} a^(r)_ik·d^(s)_kj + Σ_{k=1}^{n} b^(r)_ik·c^(s)_kj) = 0.   (32)

From Remark 2 in [1], this means that there is no roundoff in fl(A^(r) D^(s) + B^(r) C^(s)), and this completes the proof.
Based on this theorem, we develop the following algorithms:

Algorithm 1 Split_AB
function [E, F] = Split_AB(A, B)
    [~, n] = size(A)
    k = 1
    u = 2^(−53)
    gamma = ceil((log2(n) − log2(u) + 1) / 2)
    E{1} = zeros(size(A))
    F{1} = zeros(size(B))
    while norm(A, inf) ~= 0 && norm(B, inf) ~= 0
        mu_A = max(abs(A), [], 2)
        mu_B = max(abs(B), [], 2)
        mu = mu_A + mu_B                 % shared row-wise scale for A and B
        w = 2.^(ceil(log2(mu)) + gamma)
        S = repmat(w, 1, n)
        E{k} = (A + S) − S
        F{k} = (B + S) − S
        A = A − E{k}
        B = B − F{k}
        k = k + 1
    end
end
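A Python sketch of the joint splitting in Algorithm 1 (illustrative names; nested lists instead of MATLAB cell arrays, and the loop condition relaxed to `or` so that both matrices are split to exhaustion) confirms that the pieces reconstruct A and B exactly:

```python
import math

def split_AB(A, B):
    # Joint row-wise splitting of A and B with one shared scale w per row.
    m, n = len(A), len(A[0])
    u = 2.0 ** -53
    gamma = math.ceil((math.log2(n) - math.log2(u) + 1) / 2)
    E, F = [], []
    nonzero = lambda X: any(x != 0.0 for row in X for x in row)
    while nonzero(A) or nonzero(B):
        Ek = [[0.0] * n for _ in range(m)]
        Fk = [[0.0] * n for _ in range(m)]
        for i in range(m):
            mu = max(map(abs, A[i])) + max(map(abs, B[i]))
            w = 2.0 ** (math.ceil(math.log2(mu)) + gamma) if mu else 0.0
            for j in range(n):
                Ek[i][j] = (A[i][j] + w) - w      # leading bits of A(i, j)
                Fk[i][j] = (B[i][j] + w) - w      # leading bits of B(i, j)
                A[i][j] -= Ek[i][j]
                B[i][j] -= Fk[i][j]
        E.append(Ek)
        F.append(Fk)
    return E, F

A0 = [[0.1, -0.2, 0.3], [0.4, 0.5, -0.6]]
B0 = [[0.7, 0.8, -0.9], [1.1, -1.2, 1.3]]
E, F = split_AB([row[:] for row in A0], [row[:] for row in B0])
for i in range(2):
    for j in range(3):
        assert math.fsum(Ek[i][j] for Ek in E) == A0[i][j]
        assert math.fsum(Fk[i][j] for Fk in F) == B0[i][j]
```

Sharing one scale w per row across A and B is what makes the mixed sums a^(r)_ik d^(s)_kj + b^(r)_ik c^(s)_kj of the proof land on a common grid u²·σ^(r)_i·τ^(s)_j·Z.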
Algorithm 2 Split_CD
function [G, H] = Split_CD(C, D)
    [n, ~] = size(C)                     % n is the inner dimension (rows of C)
    k = 1
    u = 2^(−53)
    gamma = ceil((log2(n) − log2(u) + 1) / 2)
    G{1} = zeros(size(C))
    H{1} = zeros(size(D))
    while norm(C, inf) ~= 0 && norm(D, inf) ~= 0
        mu_C = max(abs(C))
        mu_D = max(abs(D))
        mu = mu_C + mu_D                 % shared column-wise scale for C and D
        w = 2.^(ceil(log2(mu)) + gamma)
        S = repmat(w, n, 1)
        G{k} = (C + S) − S
        H{k} = (D + S) − S
        C = C − G{k}
        D = D − H{k}
        k = k + 1
    end
end

Algorithm 3 EFT_ComMatMul
function [K] = EFT_ComMatMul(A, B, C, D)
    [E, F] = Split_AB(A, B)
    [G, H] = Split_CD(C, D)
    q = 1
    for i = 1:length(E)
        for j = 1:length(H)
            I{q} = E{i} * H{j}
            q = q + 1
        end
    end
    r = 1
    for i = 1:length(F)
        for j = 1:length(G)
            J{r} = F{i} * G{j}
            r = r + 1
        end
    end
    for i = 1:length(I)
        K{i} = I{i} + J{i}
    end
end
2 Tentative Goal 3.3
In the simple application, there are four matrix multiplications, namely AC, BD, AD, and BC. If we let

P = A(C + D),  Q = (A + B)D,  and  R = B(C − D)   (33)

then

(A + Bi)(C + Di) = (P − Q) + (Q + R)i   (34)

In (34), there are only three matrix multiplications. We need to find a new splitting algorithm for P, Q, and R to obtain the following:

A = Σ_{r=1}^{n_A} A^(r),  C + D = Σ_{s=1}^{n_CpD} S^(s),  A^(r) ∈ F^(m×n),  S^(s) ∈ F^(n×p)   (35)

A + B = Σ_{r=1}^{n_ApB} T^(r),  D = Σ_{s=1}^{n_D} D^(s),  T^(r) ∈ F^(m×n),  D^(s) ∈ F^(n×p)   (36)

B = Σ_{r=1}^{n_B} B^(r),  C − D = Σ_{s=1}^{n_CmD} U^(s),  B^(r) ∈ F^(m×n),  U^(s) ∈ F^(n×p)   (37)

where n_A, n_CpD, n_ApB, n_D, n_B, n_CmD ∈ N, such that:

A^(r) S^(s) = fl(A^(r) S^(s)),  T^(r) D^(s) = fl(T^(r) D^(s)),  and  B^(r) U^(s) = fl(B^(r) U^(s))   (38)
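Identity (34) itself can be verified directly. The check below uses small integer matrices, so floating-point arithmetic is exact, and confirms that P − Q and Q + R reproduce the real and imaginary parts in (2):

```python
def matmul(X, Y):
    # Plain matrix product over nested lists.
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matop(X, Y, sign):
    # Entrywise X + sign * Y.
    return [[X[i][j] + sign * Y[i][j] for j in range(len(X[0]))]
            for i in range(len(X))]

A = [[1, 2], [3, 4]]; B = [[0, 1], [2, 3]]
C = [[5, 6], [7, 8]]; D = [[1, 0], [2, 1]]

P = matmul(A, matop(C, D, +1))   # A(C + D)
Q = matmul(matop(A, B, +1), D)   # (A + B)D
R = matmul(B, matop(C, D, -1))   # B(C - D)

assert matop(P, Q, -1) == matop(matmul(A, C), matmul(B, D), -1)  # P - Q = AC - BD
assert matop(Q, R, +1) == matop(matmul(A, D), matmul(B, C), +1)  # Q + R = AD + BC
```

Only three products P, Q, R are needed, at the cost of the extra additions C + D, A + B, and C − D, which is why the splittings in (35) through (37) are taken over these sums and differences rather than over C and D separately.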
To illustrate the new splitting algorithm, we only use (35); (36) and (37) follow accordingly. Firstly, we can define a vector σ^(1) ∈ F^m as

σ^(1)_i = 2^γ · 2^(V^(1)_i)   (39)

where

V^(1)_i = ⌈log2( max_{1≤j≤n} |a_ij| )⌉   (40)

and a_ij represents the elements of the matrix A. Then, we use σ^(1) and implement the concept of the ExtractScalar algorithm on every element of the matrix A to obtain A = A^(1) + A^(2). Again, we define σ^(2) ∈ F^m as

σ^(2)_i = 2^γ · 2^(V^(2)_i)   (41)

where

V^(2)_i = ⌈log2( max_{1≤j≤n} |a^(2)_ij| )⌉   (42)

and a^(2)_ij represents the elements of the matrix A^(2). Then, we use σ^(2) and implement the concept of the ExtractScalar algorithm on every element of the matrix A^(2) to obtain A^(2) = A^(2) + A^(3). The basic idea is to define σ^(w) ∈ F^m as

σ^(w)_i = 2^γ · 2^(V^(w)_i)   (43)

where

V^(w)_i = ⌈log2( max_{1≤j≤n} |a^(w)_ij| )⌉   (44)

and a^(w)_ij represents the elements of the matrix A^(w). Then, we use σ^(w) and implement the concept of the ExtractScalar algorithm on every element of the matrix A^(w) to obtain A^(w) = A^(w) + A^(w+1). Implementing (43) and the basic idea explained before repeatedly results in

A = Σ_{r=1}^{n_A} A^(r)  and  A^(n_A+1) = O_mn   (45)

where O_mn is a zero matrix of size m × n.
Next, we define $\tau^{(1)} \in \mathbb{F}^p$ as
\[ \tau^{(1)}_j = 2^{\gamma} \cdot 2^{W^{(1)}_j}, \tag{46} \]
where
\[ W^{(1)}_j = \left\lceil \log_2 \Big( \max_{1 \le i \le n} |c_{ij}| + \max_{1 \le i \le n} |d_{ij}| \Big) \right\rceil, \tag{47} \]
and $c_{ij}$ and $d_{ij}$ represent the elements of matrices C and D. Then, we use $\tau^{(1)}$ and implement the concept of the ExtractScalar algorithm on every element of the matrices C and D to obtain $C = C^{(1)} + C^{(2)}$ and $D = D^{(1)} + D^{(2)}$. Again, we define $\tau^{(2)} \in \mathbb{F}^p$ as
\[ \tau^{(2)}_j = 2^{\gamma} \cdot 2^{W^{(2)}_j}, \tag{48} \]
where
\[ W^{(2)}_j = \left\lceil \log_2 \Big( \max_{1 \le i \le n} \big|c^{(2)}_{ij}\big| + \max_{1 \le i \le n} \big|d^{(2)}_{ij}\big| \Big) \right\rceil, \tag{49} \]
and $c^{(2)}_{ij}$ and $d^{(2)}_{ij}$ represent the elements of matrices $C^{(2)}$ and $D^{(2)}$. Then, we use $\tau^{(2)}$ and implement the concept of the ExtractScalar algorithm on every element of the matrices $C^{(2)}$ and $D^{(2)}$ to obtain $C^{(2)} = C^{(2)} + C^{(3)}$ and $D^{(2)} = D^{(2)} + D^{(3)}$. The basic idea is to define $\tau^{(w)} \in \mathbb{F}^p$ as
\[ \tau^{(w)}_j = 2^{\gamma} \cdot 2^{W^{(w)}_j}, \tag{50} \]
where
\[ W^{(w)}_j = \left\lceil \log_2 \Big( \max_{1 \le i \le n} \big|c^{(w)}_{ij}\big| + \max_{1 \le i \le n} \big|d^{(w)}_{ij}\big| \Big) \right\rceil, \tag{51} \]
and $c^{(w)}_{ij}$ and $d^{(w)}_{ij}$ represent the elements of matrices $C^{(w)}$ and $D^{(w)}$. Then, we use $\tau^{(w)}$ and implement the concept of the ExtractScalar algorithm on every element of the matrices $C^{(w)}$ and $D^{(w)}$ to obtain $C^{(w)} = C^{(w)} + C^{(w+1)}$ and $D^{(w)} = D^{(w)} + D^{(w+1)}$. We then implement (50) and the basic idea explained before repeatedly until the following hold:
\[ C = \sum_{s=1}^{n_C} C^{(s)}, \quad C^{(n_C+1)} = O_{np} \quad \text{and} \quad D = \sum_{s=1}^{n_D} D^{(s)}, \quad D^{(n_D+1)} = O_{np}, \tag{52} \]
where $O_{np}$ is a zero matrix of the size $n \times p$. Through this step, we get $n_C = n_D$ and we can find
\[ S^{(s)} = C^{(s)} + D^{(s)} \tag{53} \]
for $1 \le s \le n_C = n_D$.
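As a quick sanity check of this joint splitting, the following Python sketch (binary64 floats, so $u = 2^{-53}$; the function and variable names are mine, not from this note) applies the shared column-wise shifts of (46)-(47) to small test matrices C and D and verifies that both splits terminate, contain the same number of pieces, and rebuild their matrices without any rounding error:

```python
import math

def joint_split(C, D):
    """Split C and D against the same per-column shifts tau, as in (46)-(53)."""
    n, p = len(C), len(C[0])
    gamma = math.ceil((math.log2(n) + 53 + 1) / 2.0)   # gamma with u = 2^-53
    C = [row[:] for row in C]
    D = [row[:] for row in D]
    Cs, Ds = [], []
    while any(x for row in C + D for x in row):
        # W_j and tau_j from (46)-(47): column-wise maxima of |C| plus |D|
        mu = [max(abs(C[i][j]) for i in range(n)) +
              max(abs(D[i][j]) for i in range(n)) for j in range(p)]
        tau = [2.0 ** (gamma + math.ceil(math.log2(m))) if m else 0.0 for m in mu]
        Ck = [[(C[i][j] + tau[j]) - tau[j] for j in range(p)] for i in range(n)]
        Dk = [[(D[i][j] + tau[j]) - tau[j] for j in range(p)] for i in range(n)]
        C = [[C[i][j] - Ck[i][j] for j in range(p)] for i in range(n)]
        D = [[D[i][j] - Dk[i][j] for j in range(p)] for i in range(n)]
        Cs.append(Ck)
        Ds.append(Dk)
    return Cs, Ds

C = [[1.0, 1e-8], [3.0, 4.0]]
D = [[5.0, 6.0], [1e-9, 8.0]]
Cs, Ds = joint_split(C, D)
assert len(Cs) == len(Ds)              # the shared tau forces n_C = n_D
for M, Ms in ((C, Cs), (D, Ds)):       # both splits are error-free
    R = [[0.0, 0.0], [0.0, 0.0]]
    for P in reversed(Ms):             # accumulate from the smallest piece
        R = [[P[i][j] + R[i][j] for j in range(2)] for i in range(2)]
    assert R == M
```

Accumulating from the smallest piece upward keeps every intermediate addition exact, which is why the reconstruction can be compared with `==`.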
Theorem B
Let $A \in \mathbb{F}^{m \times n}$ and $C, D \in \mathbb{F}^{n \times p}$. Implementing (43), (50) and the basic idea explained before repeatedly yields (45) and (52). It then implies that $A^{(r)}S^{(s)} = \mathrm{fl}\big(A^{(r)}S^{(s)}\big)$, where $S^{(s)}$ is given by (53).
Proof
If we split A using σ(r), then the consequences of Lemma 3.3 in [2] yield the following:
\[ a^{(r)}_{ij} \in u\sigma^{(r)}_i\mathbb{Z} \tag{54} \]
and
\[ \big|a^{(r)}_{ij}\big| \le 2^{-\gamma}\sigma^{(r)}_i. \tag{55} \]
Similarly, if we use $\tau^{(s)}$ to split C and D, then
\[ c^{(s)}_{ij} \in u\tau^{(s)}_j\mathbb{Z} \quad \text{and} \quad d^{(s)}_{ij} \in u\tau^{(s)}_j\mathbb{Z}, \tag{56} \]
and
\[ \big|c^{(s)}_{ij}\big| \le 2^{-\gamma}\tau^{(s)}_j \quad \text{and} \quad \big|d^{(s)}_{ij}\big| \le 2^{-\gamma}\tau^{(s)}_j. \tag{57} \]
Using (53) and (56), we have
\[ s^{(s)}_{ij} = c^{(s)}_{ij} + d^{(s)}_{ij} \in u\tau^{(s)}_j\mathbb{Z}. \tag{58} \]
From (54) and (58), we also have
\[ a^{(r)}_{ik}s^{(s)}_{kj} \in u^2\sigma^{(r)}_i\tau^{(s)}_j\mathbb{Z}, \tag{59} \]
which then implies
\[ \sum_{k=1}^{n} a^{(r)}_{ik}s^{(s)}_{kj} \in u^2\sigma^{(r)}_i\tau^{(s)}_j\mathbb{Z}. \tag{60} \]
Using (53) and (57), we have
\[ \big|s^{(s)}_{ij}\big| = \big|c^{(s)}_{ij} + d^{(s)}_{ij}\big| \le \big|c^{(s)}_{ij}\big| + \big|d^{(s)}_{ij}\big| \le 2^{-\gamma}\tau^{(s)}_j + 2^{-\gamma}\tau^{(s)}_j = 2 \cdot 2^{-\gamma}\tau^{(s)}_j = 2^{-\gamma+1}\tau^{(s)}_j, \tag{61} \]
or we can write
\[ \big|s^{(s)}_{ij}\big| \le 2^{-\gamma+1}\tau^{(s)}_j. \tag{62} \]
Using (55) and (62), we obtain
\[ \big|a^{(r)}_{ik}s^{(s)}_{kj}\big| = \big|a^{(r)}_{ik}\big|\big|s^{(s)}_{kj}\big| \le 2^{-\gamma}\sigma^{(r)}_i \cdot 2^{-\gamma+1}\tau^{(s)}_j = 2^{-2\gamma+1}\sigma^{(r)}_i\tau^{(s)}_j, \tag{63} \]
which then implies
\[ \Big|\sum_{k=1}^{n} a^{(r)}_{ik}s^{(s)}_{kj}\Big| \le \sum_{k=1}^{n} \big|a^{(r)}_{ik}s^{(s)}_{kj}\big| \le n2^{-2\gamma+1}\sigma^{(r)}_i\tau^{(s)}_j. \tag{64} \]
Using the definition of γ given by (8) as we did in (30), we find that
\[ n2^{-2\gamma+1} \le u. \tag{65} \]
From (64) and (65), we obtain
\[ \Big|\sum_{k=1}^{n} a^{(r)}_{ik}s^{(s)}_{kj}\Big| \le u\sigma^{(r)}_i\tau^{(s)}_j. \tag{66} \]
Using relations given by (59), (60) and (66), we conclude that
\[ u^2\sigma^{(r)}_i\tau^{(s)}_j \le \Big|\sum_{k=1}^{n} a^{(r)}_{ik}s^{(s)}_{kj}\Big| \le u\sigma^{(r)}_i\tau^{(s)}_j \quad \text{if } \sum_{k=1}^{n} a^{(r)}_{ik}s^{(s)}_{kj} \ne 0, \qquad \mathrm{fl}\Big(\sum_{k=1}^{n} a^{(r)}_{ik}s^{(s)}_{kj}\Big) = 0 \quad \text{if } \sum_{k=1}^{n} a^{(r)}_{ik}s^{(s)}_{kj} = 0. \tag{67} \]
From Remark 2 in [1], this means that there is no roundoff in $\mathrm{fl}\big(A^{(r)}S^{(s)}\big)$ and this completes the proof.
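Theorem B can be spot-checked numerically. The sketch below (Python, binary64; the test matrices and helper names are my own choices) performs a single extraction step for A with the row-wise $\sigma$ of (39)-(40) and for C and D with the shared column-wise $\tau$ of (46)-(47), then uses exact rational arithmetic to confirm the grid membership (60), the bound (66), and, as a consequence, that the floating-point dot products of the pieces are evaluated without error:

```python
import math
from fractions import Fraction

u, n = 2.0 ** -53, 2
gamma = math.ceil((math.log2(n) + 54) / 2.0)    # (log2(n) - log2(u) + 1)/2

A = [[3.0, 1e-7], [0.25, 7.0]]                  # A in F^(2x2)
C = [[1.5, 2.0], [5.0, 1e-6]]                   # C, D in F^(2x2)
D = [[0.5, 3.0], [1.0, 6.0]]

# one extraction step for A with the row-wise sigma of (39)-(40)
sigma = [2.0 ** (gamma + math.ceil(math.log2(max(abs(x) for x in row))))
         for row in A]
A1 = [[(A[i][j] + sigma[i]) - sigma[i] for j in range(n)] for i in range(n)]

# one extraction step for C and D with the shared column-wise tau of (46)-(47)
mu = [max(abs(C[i][j]) for i in range(n)) + max(abs(D[i][j]) for i in range(n))
      for j in range(n)]
tau = [2.0 ** (gamma + math.ceil(math.log2(m))) for m in mu]
C1 = [[(C[i][j] + tau[j]) - tau[j] for j in range(n)] for i in range(n)]
D1 = [[(D[i][j] + tau[j]) - tau[j] for j in range(n)] for i in range(n)]
S1 = [[C1[i][j] + D1[i][j] for j in range(n)] for i in range(n)]    # (53)

for i in range(n):
    for j in range(n):
        dot = sum(Fraction(A1[i][k]) * Fraction(S1[k][j]) for k in range(n))
        grid = Fraction(u) ** 2 * Fraction(sigma[i]) * Fraction(tau[j])
        assert (dot / grid).denominator == 1     # (60): dot lies on the grid
        assert abs(dot) <= Fraction(u) * Fraction(sigma[i]) * Fraction(tau[j])
        # hence the plain floating-point evaluation is exact:
        assert float(dot) == A1[i][0] * S1[0][j] + A1[i][1] * S1[1][j]
```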
Based on this theorem, we develop the following algorithms:
Algorithm 4 Split_CpD

function [F, G] = Split_CpD(C, D)
    q = size(C, 1)
    k = 1
    u = 2^-53
    gamma = ceil((-log2(u) + log2(q) + 1)/2)  % the extra +1 matches the gamma used in Theorem B
    F{1} = zeros(size(C))
    G{1} = zeros(size(D))
    while norm(C, inf) ~= 0 || norm(D, inf) ~= 0  % continue until both remainders vanish
        mu_C = max(abs(C))
        mu_D = max(abs(D))
        mu = zeros(size(mu_C))
        for i = 1 : length(mu)
            mu(i) = mu_C(i) + mu_D(i)
        end
        w = 2.^(ceil(log2(mu)) + gamma)
        S = repmat(w, q, 1)
        F{k} = (C + S) - S
        G{k} = (D + S) - S
        C = C - F{k}
        D = D - G{k}
        k = k + 1
    end
end
Algorithm 5 Equation12

function [H] = Equation12(A, C, D)
    E = SplitA(A)
    [F, G] = Split_CD(C, D)
    S{1} = zeros(size(F{1}))
    for k = 1 : length(F)
        S{k} = F{k} + G{k}
    end
    l = 1
    for i = 1 : length(E)
        for j = 1 : length(S)
            H{l} = E{i} * S{j}
            l = l + 1
        end
    end
end
References
[1] Katsuhisa Ozaki, Takeshi Ogita, Shin'ichi Oishi, and Siegfried M. Rump. Error-free transformations of matrix multiplication by using fast routines of matrix multiplication and its applications. Numerical Algorithms, 59(1):95–118, 2012.
[2] Siegfried M. Rump, Takeshi Ogita, and Shin'ichi Oishi. Accurate floating-point summation part I: Faithful rounding. SIAM Journal on Scientific Computing, 31(1):189–224, 2008.
Error-Free Transformation for Complex Matrix Multiplication
Nurul Yakim Kazal1, Imam Mukhlash2, Chairul Imron3, Bandung Arry S.4, and Katsuhisa Ozaki5
1Department of Mathematics, Institut Teknologi Sepuluh Nopember
2Department of Mathematics, Institut Teknologi Sepuluh Nopember
3Department of Mathematics, Institut Teknologi Sepuluh Nopember
4Department of Mathematics, Institut Teknologi Sepuluh Nopember
5Department of Mathematical Sciences, Shibaura Institute of Technology
July 24, 2020
1 Introduction

2 Overview of Error-Free Transformation for (Real) Matrix Multiplication
Let $\mathbb{F}$ be the set of floating-point numbers in IEEE 754. $\mathrm{fl}(\cdot)$ means that all operations in the parentheses are evaluated by floating-point arithmetic. Let $u$ be the roundoff unit, i.e., $u = 2^{-24}$ for binary32 in IEEE 754 and $u = 2^{-53}$ for binary64 in IEEE 754.
We briefly introduce the error-free transformation of matrix multiplication developed in [2]. For two given matrices $A \in \mathbb{F}^{m \times n}$ and $B \in \mathbb{F}^{n \times p}$, we consider the error-free transformation of $AB$. Let
\[ A = A^{(1)} \quad \text{and} \quad B = B^{(1)}. \tag{1} \]
We set two vectors $v^{(k)} \in \mathbb{F}^m$ and $w^{(k)} \in \mathbb{F}^p$ as follows:
\[ v^{(k)}_i = \left\lceil \log_2 \max_{1 \le j \le n} \big|a^{(k)}_{ij}\big| \right\rceil, \qquad w^{(k)}_j = \left\lceil \log_2 \max_{1 \le i \le n} \big|b^{(k)}_{ij}\big| \right\rceil. \tag{2} \]
Here, we define exceptions:
\[ \max_{1 \le j \le n} \big|a^{(k)}_{ij}\big| = 0 \implies v^{(k)}_i = 0, \qquad \max_{1 \le i \le n} \big|b^{(k)}_{ij}\big| = 0 \implies w^{(k)}_j = 0. \]
Next, the vectors $v^{(k)}$ and $w^{(k)}$ are respectively used for computing two other vectors, namely $\sigma^{(k)} \in \mathbb{F}^m$ and $\tau^{(k)} \in \mathbb{F}^p$, defined as
\[ \sigma^{(k)}_i = \mathrm{fl}\big(2^{\beta} \cdot 2^{v^{(k)}_i}\big) = 2^{\beta} \cdot 2^{v^{(k)}_i}, \qquad \tau^{(k)}_j = \mathrm{fl}\big(2^{\beta} \cdot 2^{w^{(k)}_j}\big) = 2^{\beta} \cdot 2^{w^{(k)}_j}, \tag{3} \]
where $\beta$ is given by
\[ \beta = \left\lceil \frac{\log_2 n - \log_2 u}{2} \right\rceil. \tag{4} \]
As stated by Remark 3 in [2], the aim of obtaining the vectors $\sigma^{(k)}$ and $\tau^{(k)}$ is to find $f$ and $g$, which are powers of 2, such that
\[ \max_{1 \le j \le n} \big|a^{(k)}_{ij}\big| \le f \quad \text{and} \quad \max_{1 \le i \le n} \big|b^{(k)}_{ij}\big| \le g. \]
Then, $\widehat{A}^{(k)}$, $A^{(k+1)}$, $\widehat{B}^{(k)}$ and $B^{(k+1)}$ satisfying $A^{(k)} = \widehat{A}^{(k)} + A^{(k+1)}$ and $B^{(k)} = \widehat{B}^{(k)} + B^{(k+1)}$, where the hat marks the extracted part, are obtained by
\[ \widehat{a}^{(k)}_{ij} = \mathrm{fl}\big(\big(a^{(k)}_{ij} + \sigma^{(k)}_i\big) - \sigma^{(k)}_i\big), \qquad a^{(k+1)}_{ij} = \mathrm{fl}\big(a^{(k)}_{ij} - \widehat{a}^{(k)}_{ij}\big), \]
\[ \widehat{b}^{(k)}_{ij} = \mathrm{fl}\big(\big(b^{(k)}_{ij} + \tau^{(k)}_j\big) - \tau^{(k)}_j\big), \qquad b^{(k+1)}_{ij} = \mathrm{fl}\big(b^{(k)}_{ij} - \widehat{b}^{(k)}_{ij}\big). \tag{5} \]
The procedures involving (3) and (5) are simply adopted from Algorithm 3.2 constructed by Rump et al. [3], which is known as ExtractScalar. Consequently, we have the following:
\[ a^{(k)}_{ij} = \widehat{a}^{(k)}_{ij} + a^{(k+1)}_{ij}, \quad \big|\widehat{a}^{(k)}_{ij}\big| \le 2^{-\beta}\sigma^{(k)}_i, \quad \widehat{a}^{(k)}_{ij} \in u\sigma^{(k)}_i\mathbb{Z} \quad \text{and} \quad \big|a^{(k+1)}_{ij}\big| \le u\sigma^{(k)}_i, \]
\[ b^{(k)}_{ij} = \widehat{b}^{(k)}_{ij} + b^{(k+1)}_{ij}, \quad \big|\widehat{b}^{(k)}_{ij}\big| \le 2^{-\beta}\tau^{(k)}_j, \quad \widehat{b}^{(k)}_{ij} \in u\tau^{(k)}_j\mathbb{Z} \quad \text{and} \quad \big|b^{(k+1)}_{ij}\big| \le u\tau^{(k)}_j, \tag{6} \]
where the hatted quantities are the parts extracted by (5).
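The scalar mechanism behind (5)-(6) is easy to reproduce. A minimal Python sketch, assuming binary64 arithmetic ($u = 2^{-53}$) and helper names of my own, splits one value and verifies the error-free property and the bounds of (6):

```python
import math

def extract_scalar(sigma, a):
    """One ExtractScalar step: a = q + r with q = fl((a + sigma) - sigma)."""
    q = (a + sigma) - sigma   # high-order part, a multiple of u*sigma
    r = a - q                 # low-order part; this subtraction is exact
    return q, r

u = 2.0 ** -53                # roundoff unit of binary64
beta = 27                     # (4) with n = 1: ceil((0 + 53)/2)
a = math.pi
sigma = 2.0 ** (beta + math.ceil(math.log2(abs(a))))
q, r = extract_scalar(sigma, a)

assert q + r == a                        # the transformation is error-free
assert abs(q) <= 2.0 ** -beta * sigma    # |q| <= 2^-beta * sigma, as in (6)
assert abs(r) <= u * sigma               # |r| <= u * sigma, as in (6)
assert q / (u * sigma) == int(q / (u * sigma))   # q lies in u*sigma*Z
```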
Theoretically, if we implement (2), (3) and (5) to $A^{(k)}$ and $B^{(k)}$, for $k = 1, 2, \cdots$, then there exist $n_A, n_B \in \mathbb{N}$ such that
\[ A = \sum_{r=1}^{n_A} A^{(r)}, \quad B = \sum_{s=1}^{n_B} B^{(s)}, \quad A^{(n_A+1)} = O_{mn} \quad \text{and} \quad B^{(n_B+1)} = O_{np}, \tag{7} \]
where $O_{mn}$ and $O_{np}$, respectively, represent zero matrices of the size $m \times n$ and $n \times p$. In (7), $n_A$ and $n_B$ depend on $n$ and the difference in the magnitude of elements in rows of $A$ and columns of $B$.
Practically, Ozaki et al. [2] developed the following algorithm, called Split_Mat, based on (2), (3) and (5) to obtain matrices $D^{(r)}$ such that
\[ A = \sum_{r=1}^{\bar{l}} D^{(r)}, \qquad \bar{l} \le l. \tag{8} \]
It is important to note that (8) is achieved without rounding errors; here $\bar{l}$ is the number of pieces actually produced when at most $l$ are requested.
Algorithm 1 Split_Mat

function D = Split_Mat(A, l, delta)
    [m, n] = size(A)
    k = 1
    u = 2^-53  % roundoff unit of binary64
    beta = ceil((-log2(u) + log2(n))/2)
    D{1} = zeros(size(A))
    while (k < l)
        mu = max(abs(A), [], 2);
        if (max(mu) == 0)
            return
        end
        w = 2.^(ceil(log2(mu)) + beta)
        S = repmat(w, 1, n)
        D{k} = (A + S) - S
        A = A - D{k}
        if (nnz(D{k}) < delta*m*n)
            D{k} = sparse(D{k})
        end
        k = k + 1
    end
    if (k == l)
        D{k} = A
    end
end
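A rough Python transcription of Algorithm 1 may help clarify the control flow; it is only a sketch under my own naming, the sparse branch is omitted, and $u = 2^{-53}$ is used as in the text. The assertions confirm that the pieces rebuild A with no rounding error, i.e., that (8) holds:

```python
import math

def split_mat(A, l=None):
    """Pieces D^(1), D^(2), ... of A, following the structure of Split_Mat."""
    m, n = len(A), len(A[0])
    beta = math.ceil((53 + math.log2(n)) / 2.0)    # (4) with u = 2^-53
    A = [row[:] for row in A]
    parts = []
    while l is None or len(parts) < l - 1:
        mu = [max(abs(x) for x in row) for row in A]   # row-wise maxima
        if max(mu) == 0.0:
            return parts
        w = [2.0 ** (beta + math.ceil(math.log2(v))) if v else 0.0 for v in mu]
        P = [[(A[i][j] + w[i]) - w[i] for j in range(n)] for i in range(m)]
        A = [[A[i][j] - P[i][j] for j in range(n)] for i in range(m)]
        parts.append(P)
    parts.append(A)            # k == l: the remainder becomes the last piece
    return parts

A = [[1.0, 2.0 ** -30, 3.0], [4.0, 5.0, 2.0 ** 20]]
parts = split_mat(A)
assert len(parts) == 2         # two pieces suffice for these magnitudes
R = [[0.0] * 3 for _ in range(2)]
for P in reversed(parts):      # rebuild A from the smallest piece upward
    R = [[P[i][j] + R[i][j] for j in range(3)] for i in range(2)]
assert R == A                  # no rounding error anywhere: (8) holds
```

The reconstruction is accumulated from the smallest piece upward so that every intermediate addition is itself exact.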
When Algorithm 1 is implemented, $\delta$ (satisfying $0 \le \delta < 1$) is determined as the criterion to use the sparse format. Practically, the sparse representation is used if the number of nonzero entries in a matrix of the size $m \times n$ is less than $\delta mn$. Also, we need to set $l = \infty$ if (7) is required. Therefore, $n_A$ is obtained such that $A = \sum_{r=1}^{n_A} D^{(r)}$ holds or, written in MATLAB notation, $A = \sum_{r=1}^{n_A} D\{r\}$. Correspondingly, Algorithm 1 can also be used to split matrix $B$ such that $B = \sum_{s=1}^{l} E^{(s)}$, and this is done by applying $E = \texttt{Split\_Mat}(B^T, l, \delta)^T$.
Next, for matrices $A^{(r)}$ and $B^{(s)}$ given by (7),
\[ A^{(r)}B^{(s)} = \mathrm{fl}\big(A^{(r)}B^{(s)}\big), \quad \text{for } 1 \le r \le n_A \text{ and } 1 \le s \le n_B, \tag{9} \]
is satisfied. It means that if we use floating-point arithmetic for $A^{(r)}B^{(s)}$, rounding errors never occur in the evaluation. Hence, we can obtain $C^{(k)}$ such that
\[ AB = \sum_{k=1}^{n_A n_B} C^{(k)}, \quad C^{(1)} = \mathrm{fl}\big(A^{(1)}B^{(1)}\big), \; \cdots, \; C^{(n_A n_B)} = \mathrm{fl}\big(A^{(n_A)}B^{(n_B)}\big). \tag{10} \]
Here, $AB$ is transformed into an unevaluated sum of $n_A n_B$ floating-point matrices. If there is no big difference in the magnitude of the elements of the multiplied matrices, $n_A$ and $n_B$ become 3, 4 or 5 in many cases. In practical implementation, Ozaki et al. [2] designed the following algorithm, named EFT_Mul, to compute $C^{(k)}$ satisfying (10).
Algorithm 2 EFT_Mul

function C = EFT_Mul(A, B, delta)
    [m, n] = size(A)
    [n, p] = size(B)
    D = Split_Mat(A, inf, delta)
    nA = length(D)
    E = Split_Mat(B.', inf, delta)
    nB = length(E)
    for r = 1 : nB
        E{r} = E{r}.'
    end
    t = 1
    for r = 1 : nA
        for s = 1 : nB
            C{t} = D{r} * E{s}  % evaluated in floating-point, exactly, by (9)
            t = t + 1
        end
    end
end
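The whole pipeline of Algorithms 1-2 can be checked end-to-end. In the Python sketch below (a compact splitting helper of my own, not the authors' code), A is split by rows and B by columns, every pair of pieces is multiplied in plain binary64 arithmetic as in (9), and the unevaluated sum (10) is compared against an exact rational reference:

```python
import math
from fractions import Fraction

def split(M, axis):
    """Row-wise (axis=0) or column-wise (axis=1) version of the splitting."""
    m, n = len(M), len(M[0])
    inner = n if axis == 0 else m              # length of the dot products
    beta = math.ceil((53 + math.log2(inner)) / 2.0)
    M = [row[:] for row in M]
    parts = []
    while any(x for row in M for x in row):
        if axis == 0:
            mu = [max(abs(x) for x in row) for row in M]
            S = [[2.0 ** (beta + math.ceil(math.log2(mu[i]))) if mu[i] else 0.0] * n
                 for i in range(m)]
        else:
            mu = [max(abs(M[i][j]) for i in range(m)) for j in range(n)]
            S = [[2.0 ** (beta + math.ceil(math.log2(mu[j]))) if mu[j] else 0.0
                  for j in range(n)] for i in range(m)]
        P = [[(M[i][j] + S[i][j]) - S[i][j] for j in range(n)] for i in range(m)]
        M = [[M[i][j] - P[i][j] for j in range(n)] for i in range(m)]
        parts.append(P)
    return parts

def matmul(X, Y):              # plain binary64 matrix product
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1 / 3, 2.0 ** -26], [4.0, 5.0]]
B = [[7.0, 1 / 7], [2.0 ** 30, 9.0]]
pieces = [matmul(Dr, Es) for Dr in split(A, 0) for Es in split(B, 1)]   # (9)
exactAB = [[sum(Fraction(A[i][k]) * Fraction(B[k][j]) for k in range(2))
            for j in range(2)] for i in range(2)]
total = [[sum(Fraction(P[i][j]) for P in pieces) for j in range(2)]
         for i in range(2)]
assert total == exactAB        # the unevaluated sum (10) carries AB exactly
```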
If we apply Algorithm 4.5 in [3] and Algorithm 7.4 in [4] to the sum of $C^{(k)}$ in (10) componentwise, we can obtain accurate numerical results. Let $R \in \mathbb{F}^{m \times p}$ be the result computed using Algorithm 4.5 in [3] and $S \in \mathbb{F}^{m \times p}$ that computed using Algorithm 7.4 in [4]; then
\[ |R - AB| \le 2u|AB| \quad \text{and} \quad |S - AB| \le u|AB| \tag{11} \]
are satisfied. Here, an inequality between matrices means that it holds elementwise. If $(AB)_{ij} \ne 0$, then we have from (11)
\[ \frac{\big|R_{ij} - (AB)_{ij}\big|}{|AB|_{ij}} \le 2u \quad \text{and} \quad \frac{\big|S_{ij} - (AB)_{ij}\big|}{|AB|_{ij}} \le u. \tag{12} \]
If we use floating-point arithmetic directly for $AB$, and let $\hat{C}$ be the computed result, then
\[ \big|\hat{C} - AB\big| \le nu|A||B| \tag{13} \]
is satisfied [1]. Assuming $(AB)_{ij} \ne 0$, from (13) we have
\[ \frac{\big|\hat{C}_{ij} - (AB)_{ij}\big|}{|AB|_{ij}} \le nu\,\frac{(|A||B|)_{ij}}{|AB|_{ij}}. \tag{14} \]
In (14), the accuracy of the numerical result depends on $n$ and the condition number of the dot product. On the other hand, the relative error in (11) is bounded by the constants $2u$ and $u$; namely, it is independent of $n$ and the condition number.
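The condition-number dependence in (14) is easy to provoke. In the following small Python illustration (values chosen by me), a dot product with heavy cancellation loses every significant digit under plain binary64 evaluation, which is precisely the situation in which the condition-independent bounds (11) are valuable:

```python
from fractions import Fraction

a = [1e16, 1.0, -1e16]
b = [1.0, 2.0 ** -30, 1.0]
naive = 0.0
for x, y in zip(a, b):
    naive += x * y             # plain binary64 recurrence, bound (13) applies
exact = sum(Fraction(x) * Fraction(y) for x, y in zip(a, b))
assert naive == 0.0            # the tiny middle term is wiped out entirely
assert exact == Fraction(2) ** -30
rel_err = abs((Fraction(naive) - exact) / exact)
assert rel_err >= 1            # 100% relative error: no correct digits remain
```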
3 Extended Use of Error-Free Transformation for Complex Matrix Multiplication
Given any complex matrices
\[ \tilde{A} = A + Bi \quad \text{and} \quad \tilde{C} = C + Di, \qquad \text{for } A, B \in \mathbb{F}^{m \times n} \text{ and } C, D \in \mathbb{F}^{n \times p}, \]
where $i$ denotes the imaginary unit, we want to compute the product of these matrices, namely
\[ \tilde{A}\tilde{C} = (A + Bi)(C + Di). \]
3.1 Simple Application
We start the discussion from the following equation:
\[ \tilde{A}\tilde{C} = (A + Bi)(C + Di) = (AC - BD) + (AD + BC)i. \tag{15} \]
By adapting the concept of splitting the real matrices, we firstly need to set
\[ A + Bi = A^{(1)} + B^{(1)}i \quad \text{and} \quad C + Di = C^{(1)} + D^{(1)}i. \tag{16} \]
Then, vectors v(k), w(k)∈ Fm and x(k), y(k)∈ Fp are defined as follows:
\[ v^{(k)}_i = \left\lceil \log_2 \max_{1 \le j \le n} \big|a^{(k)}_{ij}\big| \right\rceil, \qquad w^{(k)}_i = \left\lceil \log_2 \max_{1 \le j \le n} \big|b^{(k)}_{ij}\big| \right\rceil, \]
\[ x^{(k)}_j = \left\lceil \log_2 \max_{1 \le i \le n} \big|c^{(k)}_{ij}\big| \right\rceil, \qquad y^{(k)}_j = \left\lceil \log_2 \max_{1 \le i \le n} \big|d^{(k)}_{ij}\big| \right\rceil. \tag{17} \]
Here, $a^{(k)}_{ij}$, $b^{(k)}_{ij}$, $c^{(k)}_{ij}$ and $d^{(k)}_{ij}$ represent the elements of matrices $A^{(k)}$, $B^{(k)}$, $C^{(k)}$ and $D^{(k)}$, respectively. Next, we also define vectors $\sigma^{(k)}_A, \sigma^{(k)}_B \in \mathbb{F}^m$ and $\tau^{(k)}_C, \tau^{(k)}_D \in \mathbb{F}^p$ by
\[ \sigma^{(k)}_{A,i} = \mathrm{fl}\big(2^{\beta} \cdot 2^{v^{(k)}_i}\big) = 2^{\beta} \cdot 2^{v^{(k)}_i}, \qquad \sigma^{(k)}_{B,i} = \mathrm{fl}\big(2^{\beta} \cdot 2^{w^{(k)}_i}\big) = 2^{\beta} \cdot 2^{w^{(k)}_i}, \]
\[ \tau^{(k)}_{C,j} = \mathrm{fl}\big(2^{\beta} \cdot 2^{x^{(k)}_j}\big) = 2^{\beta} \cdot 2^{x^{(k)}_j}, \qquad \tau^{(k)}_{D,j} = \mathrm{fl}\big(2^{\beta} \cdot 2^{y^{(k)}_j}\big) = 2^{\beta} \cdot 2^{y^{(k)}_j}, \tag{18} \]
where $\beta$ is given by (4) in Section 2. To find $\big(\widehat{A}^{(k)} + \widehat{B}^{(k)}i\big)$, $\big(A^{(k+1)} + B^{(k+1)}i\big)$, $\big(\widehat{C}^{(k)} + \widehat{D}^{(k)}i\big)$ and $\big(C^{(k+1)} + D^{(k+1)}i\big)$, where the hat marks the extracted parts, such that
\[ \big(A^{(k)} + B^{(k)}i\big) = \big(\widehat{A}^{(k)} + \widehat{B}^{(k)}i\big) + \big(A^{(k+1)} + B^{(k+1)}i\big) \quad \text{and} \]
\[ \big(C^{(k)} + D^{(k)}i\big) = \big(\widehat{C}^{(k)} + \widehat{D}^{(k)}i\big) + \big(C^{(k+1)} + D^{(k+1)}i\big) \tag{19} \]
hold, we can now apply the concept of Algorithm 3.2 in [3] as follows:
\[ \widehat{a}^{(k)}_{ij} = \mathrm{fl}\big(\big(a^{(k)}_{ij} + \sigma^{(k)}_{A,i}\big) - \sigma^{(k)}_{A,i}\big), \qquad a^{(k+1)}_{ij} = \mathrm{fl}\big(a^{(k)}_{ij} - \widehat{a}^{(k)}_{ij}\big), \]
\[ \widehat{b}^{(k)}_{ij} = \mathrm{fl}\big(\big(b^{(k)}_{ij} + \sigma^{(k)}_{B,i}\big) - \sigma^{(k)}_{B,i}\big), \qquad b^{(k+1)}_{ij} = \mathrm{fl}\big(b^{(k)}_{ij} - \widehat{b}^{(k)}_{ij}\big), \]
\[ \widehat{c}^{(k)}_{ij} = \mathrm{fl}\big(\big(c^{(k)}_{ij} + \tau^{(k)}_{C,j}\big) - \tau^{(k)}_{C,j}\big), \qquad c^{(k+1)}_{ij} = \mathrm{fl}\big(c^{(k)}_{ij} - \widehat{c}^{(k)}_{ij}\big), \]
\[ \widehat{d}^{(k)}_{ij} = \mathrm{fl}\big(\big(d^{(k)}_{ij} + \tau^{(k)}_{D,j}\big) - \tau^{(k)}_{D,j}\big), \qquad d^{(k+1)}_{ij} = \mathrm{fl}\big(d^{(k)}_{ij} - \widehat{d}^{(k)}_{ij}\big), \tag{20} \]
where $a^{(k)}_{ij}$, $b^{(k)}_{ij}$, $c^{(k)}_{ij}$ and $d^{(k)}_{ij}$ are the elements of matrices $A^{(k)}$, $B^{(k)}$, $C^{(k)}$ and $D^{(k)}$, respectively, and the hatted quantities are their extracted parts. Note that
the procedures (16), (17), (18) and (20) are based on (1), (2), (3) and (5), respectively. Then, applying (17), (18) and (20) to $\big(A^{(k)} + B^{(k)}i\big)$ and $\big(C^{(k)} + D^{(k)}i\big)$, for $k = 1, 2, \cdots$, yields
\[ A + Bi = \sum_{r=1}^{n_{AB}} \big(A^{(r)} + B^{(r)}i\big), \qquad A^{(n_{AB}+1)} + B^{(n_{AB}+1)}i = O_{mn}, \]
\[ C + Di = \sum_{s=1}^{n_{CD}} \big(C^{(s)} + D^{(s)}i\big), \qquad C^{(n_{CD}+1)} + D^{(n_{CD}+1)}i = O_{np}, \tag{21} \]
where $A^{(r)}, B^{(r)} \in \mathbb{F}^{m \times n}$ and $C^{(s)}, D^{(s)} \in \mathbb{F}^{n \times p}$. Since (7) and (21) are obtained from mathematically similar processes, (9) suggests that
\[ \mathrm{fl}\big(A^{(r)}C^{(s)}\big) = A^{(r)}C^{(s)}, \qquad \mathrm{fl}\big(B^{(r)}D^{(s)}\big) = B^{(r)}D^{(s)}, \]
\[ \mathrm{fl}\big(A^{(r)}D^{(s)}\big) = A^{(r)}D^{(s)} \qquad \text{and} \qquad \mathrm{fl}\big(B^{(r)}C^{(s)}\big) = B^{(r)}C^{(s)} \tag{22} \]
hold for $1 \le r \le n_{AB}$, $1 \le s \le n_{CD}$ and $n_{AB}, n_{CD} \in \mathbb{N}$. Hence, we have
\[ AC = \sum_{i=1}^{n_{AB}n_{CD}} E^{(i)}, \quad BD = \sum_{j=1}^{n_{AB}n_{CD}} F^{(j)}, \quad AD = \sum_{k=1}^{n_{AB}n_{CD}} G^{(k)} \quad \text{and} \quad BC = \sum_{l=1}^{n_{AB}n_{CD}} H^{(l)}, \tag{23} \]
which means that $AC$, $BD$, $AD$ and $BC$ are all transformed into unevaluated sums of $n_{AB}n_{CD}$ floating-point matrices. Then, (23) implies the following:
\[ (AC - BD) = \sum_{i=1}^{n_{AB}n_{CD}} E^{(i)} - \sum_{j=1}^{n_{AB}n_{CD}} F^{(j)} \quad \text{and} \quad (AD + BC) = \sum_{k=1}^{n_{AB}n_{CD}} G^{(k)} + \sum_{l=1}^{n_{AB}n_{CD}} H^{(l)}. \tag{24} \]
Both summations $(AC - BD)$ and $(AD + BC)$ given by (24) are kept unevaluated, which means that both are transformed into an unevaluated sum of $2n_{AB}n_{CD}$ floating-point matrices. It is also worth noting that the procedures for obtaining (21), (23) and (24) are performed with no rounding errors involved. Therefore, an error-free transformation for complex matrix multiplication satisfying (15) is achieved.
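A compact Python sketch of the splitting in this subsection may be helpful: the real and imaginary parts receive their own row-wise shifts as in (17)-(18), and the extracted complex pieces rebuild $\tilde{A}$ exactly, which is the content of (21). Python complex numbers are pairs of binary64 floats; the names below are mine:

```python
import math

def split_complex(M):
    """Split a complex matrix with separate row-wise shifts for Re and Im."""
    m, n = len(M), len(M[0])
    beta = math.ceil((53 + math.log2(n)) / 2.0)     # (4) with u = 2^-53
    M = [row[:] for row in M]
    parts = []
    while any(z != 0 for row in M for z in row):
        sh = []
        for row in M:
            muA = max(abs(z.real) for z in row)     # row-wise maxima, (17)
            muB = max(abs(z.imag) for z in row)
            sA = 2.0 ** (beta + math.ceil(math.log2(muA))) if muA else 0.0
            sB = 2.0 ** (beta + math.ceil(math.log2(muB))) if muB else 0.0
            sh.append((sA, sB))
        P = [[complex((M[i][j].real + sh[i][0]) - sh[i][0],
                      (M[i][j].imag + sh[i][1]) - sh[i][1])
              for j in range(n)] for i in range(m)]
        M = [[M[i][j] - P[i][j] for j in range(n)] for i in range(m)]
        parts.append(P)
    return parts

At = [[complex(1 / 3, 5.0), complex(2.0, 1e-9)],
      [complex(7.0, 0.5), complex(0.125, 6.0)]]
parts = split_complex(At)
R = [[0j, 0j], [0j, 0j]]
for P in reversed(parts):      # rebuild from the smallest piece upward
    R = [[P[i][j] + R[i][j] for j in range(2)] for i in range(2)]
assert R == At                 # the pieces reproduce (A + Bi) exactly, as in (21)
```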
In practical implementation, we present the Split_CompMat_1 algorithm to split the complex matrix $\tilde{A} = (A + Bi)$ based on (16), (17), (18) and (20) such that
\[ \tilde{A} = \sum_{r=1}^{\bar{l}} E^{(r)}, \qquad E^{(r)} = A^{(r)} + B^{(r)}i \quad \text{and} \quad \bar{l} \le l \tag{25} \]
is satisfied without rounding errors, where $\bar{l}$ denotes the number of pieces actually produced.
Algorithm 3 Split_CompMat_1

function E = Split_CompMat_1(At, l, delta)
    % At is a complex matrix (A + Bi)
    [m, n] = size(At)
    k = 1
    u = 2^-53;  % double precision (binary64)
    beta = ceil((-log2(u) + log2(n))/2)
    % E{1} = zeros(size(At)) + zeros(size(At))*1i
    while (k < l)
        mu_A = max(abs(real(At)), [], 2)
        mu_B = max(abs(imag(At)), [], 2)
        if (max(max(mu_A), max(mu_B)) == 0)  % guard, as in Algorithm 1
            return
        end
        w_A = 2.^(ceil(log2(mu_A)) + beta)
        w_B = 2.^(ceil(log2(mu_B)) + beta)
        S = complex(w_A, w_B)
        E{k} = (At + S) - S
        At = At - E{k}
        if (nnz(E{k}) < delta*m*n)
            E{k} = sparse(E{k})
        end
        k = k + 1
    end
    if (k == l)
        E{k} = At
    end
end
Since Split_CompMat_1 is basically an adaptation of Split_Mat in [2] for splitting complex matrices, the values of $\delta$ and $l$ are similar to those set when executing Algorithm 1. Also, this algorithm can be used for splitting the complex matrix $\tilde{C} = (C + Di)$ such that
\[ \tilde{C} = \sum_{s=1}^{\bar{l}} F^{(s)}, \qquad F^{(s)} = C^{(s)} + D^{(s)}i \quad \text{and} \quad \bar{l} \le l \tag{26} \]
holds, and this is done by executing
F = cellfun(@transpose, Split_CompMat_1(Ct.', l, delta), 'UniformOutput', false)
in MATLAB notation, where Ct denotes $\tilde{C}$.
In practical terms, Algorithm 3 leads to the next algorithm, called EFT_CompMul_1. Given any complex matrices $\tilde{A} = (A + Bi)$ and $\tilde{C} = (C + Di)$, the EFT_CompMul_1 algorithm computes $(AC - BD)$ and $(AD + BC)$ satisfying (24) such that (15) holds.
Algorithm 4 EFT_CompMul_1

function res = EFT_CompMul_1(At, Ct)
    % At and Ct are complex matrices
    ApB = Split_CompMat_1(At, l, delta);
    CpD = cellfun(@transpose, Split_CompMat_1(Ct.', l, delta), 'UniformOutput', false);
    N_AB = length(ApB);
    N_CD = length(CpD);
    k = 1;
    for r = 1 : N_AB
        for s = 1 : N_CD
            AC{k} = real(ApB{r}) * real(CpD{s});
            BD{k} = -imag(ApB{r}) * imag(CpD{s});
            AD{k} = real(ApB{r}) * imag(CpD{s});
            BC{k} = imag(ApB{r}) * real(CpD{s});
            k = k + 1;
        end
    end
    N_AC = length(AC);
    for i = (N_AC + 1) : (N_AC + length(BD))
        AC{i} = BD{i - N_AC};
        AD{i} = BC{i - N_AC};
    end
    N_AC = length(AC);
    for j = 1 : N_AC
        res{j} = complex(AC{j}, AD{j});
    end
end
3.2 Error-Free Transformation of Complex Matrices

Here, the error-free transformation for complex matrix multiplication is obtained by proposing a new splitting algorithm for the complex matrices $\tilde{A} = (A + Bi)$ and $\tilde{C} = (C + Di)$, such that (21) and
\[ \mathrm{fl}\big(A^{(r)}C^{(s)} - B^{(r)}D^{(s)}\big) = A^{(r)}C^{(s)} - B^{(r)}D^{(s)} \quad \text{and} \quad \mathrm{fl}\big(A^{(r)}D^{(s)} + B^{(r)}C^{(s)}\big) = A^{(r)}D^{(s)} + B^{(r)}C^{(s)} \tag{27} \]
are satisfied. In order to do that, we firstly define a constant $\gamma$ as follows:
\[ \gamma := \left\lceil \frac{\log_2(n) - \log_2(u) + 1}{2} \right\rceil. \tag{28} \]
By recalling the proof of Theorem 1 in [2], we find that $\beta$ needs to satisfy
\[ \frac{\log_2(n) - \log_2(u)}{2} \le \beta \tag{29} \]
such that (22) holds. Since (4), (28) and (29) suggest that
\[ \frac{\log_2(n) - \log_2(u)}{2} \le \beta = \left\lceil \frac{\log_2(n) - \log_2(u)}{2} \right\rceil \le \left\lceil \frac{\log_2(n) - \log_2(u) + 1}{2} \right\rceil = \gamma, \tag{30} \]
$\gamma$ is also valid to guarantee that (22) is still satisfied.
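Both the chain (30) and the bound $n \cdot 2^{-2\gamma+1} \le u$ that the choice (28) is designed to deliver (it is used later in (44)) can be checked mechanically for a few dimensions, assuming binary64 with $u = 2^{-53}$:

```python
import math

u = 2.0 ** -53
for n in (1, 2, 3, 10, 1000, 10 ** 6):
    beta = math.ceil((math.log2(n) + 53) / 2.0)         # (4)
    gamma = math.ceil((math.log2(n) + 53 + 1) / 2.0)    # (28)
    assert (math.log2(n) + 53) / 2.0 <= beta <= gamma   # the chain (30)
    assert n * 2.0 ** (-2 * gamma + 1) <= u             # the bound used in (44)
```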
Next, similar to the procedures done in Subsection 3.1, (16) is set and two vectors, namely $p^{(k)} \in \mathbb{F}^m$ and $q^{(k)} \in \mathbb{F}^p$, are defined by
\[ p^{(k)}_i = \left\lceil \log_2 \max\Big( \max_{1 \le j \le n} \big|a^{(k)}_{ij}\big|, \; \max_{1 \le j \le n} \big|b^{(k)}_{ij}\big| \Big) \right\rceil \quad \text{and} \quad q^{(k)}_j = \left\lceil \log_2 \max\Big( \max_{1 \le i \le n} \big|c^{(k)}_{ij}\big|, \; \max_{1 \le i \le n} \big|d^{(k)}_{ij}\big| \Big) \right\rceil. \tag{31} \]
Then, $p^{(k)}_i$ and $q^{(k)}_j$ are used for computing $\sigma^{(k)} \in \mathbb{F}^m$ and $\tau^{(k)} \in \mathbb{F}^p$ as follows:
\[ \sigma^{(k)}_i = 2^{\gamma} \cdot 2^{p^{(k)}_i} \quad \text{and} \quad \tau^{(k)}_j = 2^{\gamma} \cdot 2^{q^{(k)}_j}. \tag{32} \]
Note that $a^{(k)}_{ij}$, $b^{(k)}_{ij}$, $c^{(k)}_{ij}$ and $d^{(k)}_{ij}$ are the elements of $A^{(k)}$, $B^{(k)}$, $C^{(k)}$ and $D^{(k)}$ in (16), respectively. In order to satisfy (19), $\big(\widehat{A}^{(k)} + \widehat{B}^{(k)}i\big)$, $\big(A^{(k+1)} + B^{(k+1)}i\big)$, $\big(\widehat{C}^{(k)} + \widehat{D}^{(k)}i\big)$ and $\big(C^{(k+1)} + D^{(k+1)}i\big)$ are then computed by applying the concept of Algorithm 3.2 in [3] as follows:
\[ \widehat{a}^{(k)}_{ij} = \mathrm{fl}\big(\big(a^{(k)}_{ij} + \sigma^{(k)}_i\big) - \sigma^{(k)}_i\big), \qquad a^{(k+1)}_{ij} = \mathrm{fl}\big(a^{(k)}_{ij} - \widehat{a}^{(k)}_{ij}\big), \]
\[ \widehat{b}^{(k)}_{ij} = \mathrm{fl}\big(\big(b^{(k)}_{ij} + \sigma^{(k)}_i\big) - \sigma^{(k)}_i\big), \qquad b^{(k+1)}_{ij} = \mathrm{fl}\big(b^{(k)}_{ij} - \widehat{b}^{(k)}_{ij}\big), \]
\[ \widehat{c}^{(k)}_{ij} = \mathrm{fl}\big(\big(c^{(k)}_{ij} + \tau^{(k)}_j\big) - \tau^{(k)}_j\big), \qquad c^{(k+1)}_{ij} = \mathrm{fl}\big(c^{(k)}_{ij} - \widehat{c}^{(k)}_{ij}\big), \]
\[ \widehat{d}^{(k)}_{ij} = \mathrm{fl}\big(\big(d^{(k)}_{ij} + \tau^{(k)}_j\big) - \tau^{(k)}_j\big), \qquad d^{(k+1)}_{ij} = \mathrm{fl}\big(d^{(k)}_{ij} - \widehat{d}^{(k)}_{ij}\big). \tag{33} \]
In (33), the hatted quantities denote the extracted parts of the elements of matrices $A^{(k)}$, $B^{(k)}$, $C^{(k)}$ and $D^{(k)}$, respectively. Again, the procedures (16), (31), (32) and (33) are based on (1), (2), (3) and (5). Next, implementing (31), (32) and (33) to $\big(A^{(k)} + B^{(k)}i\big)$ and $\big(C^{(k)} + D^{(k)}i\big)$, for $k = 1, 2, \cdots$, yields results satisfying (21).
Theorem A
Let $\tilde{A} = (A + Bi) \in \mathbb{F}^{m \times n}$ and $\tilde{C} = (C + Di) \in \mathbb{F}^{n \times p}$ be two complex matrices. Implementing (31), (32) and (33) to $\big(A^{(k)} + B^{(k)}i\big)$ and $\big(C^{(k)} + D^{(k)}i\big)$, for $k = 1, 2, \cdots$, results in (21), and it implies that (27) holds.
Proof
Assume that $\tilde{A} = (A + Bi) \in \mathbb{F}^{m \times n}$ and $\tilde{C} = (C + Di) \in \mathbb{F}^{n \times p}$ are two complex matrices. We have shown that applying (31), (32) and (33) to $\big(A^{(k)} + B^{(k)}i\big)$ and $\big(C^{(k)} + D^{(k)}i\big)$, for $k = 1, 2, \cdots$, results in (21). Hence, it suffices to show that (27) is satisfied. Firstly, we want to demonstrate that
\[ \mathrm{fl}\big(A^{(r)}C^{(s)} - B^{(r)}D^{(s)}\big) = A^{(r)}C^{(s)} - B^{(r)}D^{(s)}. \]
Since we use $\sigma^{(r)}$ to obtain $\big(A^{(r)} + B^{(r)}i\big)$ satisfying (21), (6) suggests that
\[ a^{(r)}_{ij} \in u\sigma^{(r)}_i\mathbb{Z} \quad \text{and} \quad b^{(r)}_{ij} \in u\sigma^{(r)}_i\mathbb{Z}, \tag{34} \]
\[ \big|a^{(r)}_{ij}\big| \le 2^{-\gamma}\sigma^{(r)}_i \quad \text{and} \quad \big|b^{(r)}_{ij}\big| \le 2^{-\gamma}\sigma^{(r)}_i. \tag{35} \]
Similarly, since $\tau^{(s)}$ is used to split the matrix $\tilde{C} = (C + Di)$ in order to obtain (21), we have the following:
\[ c^{(s)}_{ij} \in u\tau^{(s)}_j\mathbb{Z} \quad \text{and} \quad d^{(s)}_{ij} \in u\tau^{(s)}_j\mathbb{Z}, \tag{36} \]
\[ \big|c^{(s)}_{ij}\big| \le 2^{-\gamma}\tau^{(s)}_j \quad \text{and} \quad \big|d^{(s)}_{ij}\big| \le 2^{-\gamma}\tau^{(s)}_j. \tag{37} \]
From (34) and (36), we obtain
\[ a^{(r)}_{ik}c^{(s)}_{kj} \in u^2\sigma^{(r)}_i\tau^{(s)}_j\mathbb{Z}, \quad b^{(r)}_{ik}d^{(s)}_{kj} \in u^2\sigma^{(r)}_i\tau^{(s)}_j\mathbb{Z} \quad \text{and} \quad a^{(r)}_{ik}c^{(s)}_{kj} - b^{(r)}_{ik}d^{(s)}_{kj} \in u^2\sigma^{(r)}_i\tau^{(s)}_j\mathbb{Z}, \tag{38} \]
which implies
\[ \sum_{k=1}^{n} a^{(r)}_{ik}c^{(s)}_{kj} \in u^2\sigma^{(r)}_i\tau^{(s)}_j\mathbb{Z}, \quad \sum_{k=1}^{n} b^{(r)}_{ik}d^{(s)}_{kj} \in u^2\sigma^{(r)}_i\tau^{(s)}_j\mathbb{Z} \quad \text{and} \quad \sum_{k=1}^{n} a^{(r)}_{ik}c^{(s)}_{kj} - \sum_{k=1}^{n} b^{(r)}_{ik}d^{(s)}_{kj} \in u^2\sigma^{(r)}_i\tau^{(s)}_j\mathbb{Z}. \tag{39} \]
From (35) and (37), we obtain
\[ \big|a^{(r)}_{ik}c^{(s)}_{kj}\big| \le 2^{-2\gamma}\sigma^{(r)}_i\tau^{(s)}_j \quad \text{and} \quad \big|b^{(r)}_{ik}d^{(s)}_{kj}\big| \le 2^{-2\gamma}\sigma^{(r)}_i\tau^{(s)}_j, \tag{40} \]
which implies
\[ \sum_{k=1}^{n} \big|a^{(r)}_{ik}c^{(s)}_{kj}\big| \le n2^{-2\gamma}\sigma^{(r)}_i\tau^{(s)}_j \quad \text{and} \quad \sum_{k=1}^{n} \big|b^{(r)}_{ik}d^{(s)}_{kj}\big| \le n2^{-2\gamma}\sigma^{(r)}_i\tau^{(s)}_j. \tag{41} \]
Using the properties of absolute value, we find that
\[ \Big|\sum_{k=1}^{n} a^{(r)}_{ik}c^{(s)}_{kj}\Big| \le \sum_{k=1}^{n} \big|a^{(r)}_{ik}c^{(s)}_{kj}\big| \le n2^{-2\gamma}\sigma^{(r)}_i\tau^{(s)}_j \quad \text{and} \quad \Big|\sum_{k=1}^{n} b^{(r)}_{ik}d^{(s)}_{kj}\Big| \le \sum_{k=1}^{n} \big|b^{(r)}_{ik}d^{(s)}_{kj}\big| \le n2^{-2\gamma}\sigma^{(r)}_i\tau^{(s)}_j, \tag{42} \]
and (42) implies that
\[ \Big|\sum_{k=1}^{n} a^{(r)}_{ik}c^{(s)}_{kj} - \sum_{k=1}^{n} b^{(r)}_{ik}d^{(s)}_{kj}\Big| \le \Big|\sum_{k=1}^{n} a^{(r)}_{ik}c^{(s)}_{kj}\Big| + \Big|\sum_{k=1}^{n} b^{(r)}_{ik}d^{(s)}_{kj}\Big| \le n2^{-2\gamma}\sigma^{(r)}_i\tau^{(s)}_j + n2^{-2\gamma}\sigma^{(r)}_i\tau^{(s)}_j = 2n2^{-2\gamma}\sigma^{(r)}_i\tau^{(s)}_j = n2^{-2\gamma+1}\sigma^{(r)}_i\tau^{(s)}_j. \tag{43} \]
Using the definition of $\gamma$, we find that
\[ n2^{-2\gamma+1} = n2^{-2\left\lceil \frac{\log_2(n)-\log_2(u)+1}{2} \right\rceil + 1} \le n2^{-(\log_2(n)-\log_2(u)+1)+1} = n2^{-\log_2(n)+\log_2(u)} = u. \tag{44} \]
From (43) and (44), we find that
\[ \Big|\sum_{k=1}^{n} a^{(r)}_{ik}c^{(s)}_{kj} - \sum_{k=1}^{n} b^{(r)}_{ik}d^{(s)}_{kj}\Big| \le u\sigma^{(r)}_i\tau^{(s)}_j. \tag{45} \]
Using (38), (39) and (45), we obtain
\[ u^2\sigma^{(r)}_i\tau^{(s)}_j \le \Big|\sum_{k=1}^{n} a^{(r)}_{ik}c^{(s)}_{kj} - \sum_{k=1}^{n} b^{(r)}_{ik}d^{(s)}_{kj}\Big| \le u\sigma^{(r)}_i\tau^{(s)}_j \quad \text{if } \sum_{k=1}^{n} a^{(r)}_{ik}c^{(s)}_{kj} - \sum_{k=1}^{n} b^{(r)}_{ik}d^{(s)}_{kj} \ne 0, \]
\[ \mathrm{fl}\Big(\sum_{k=1}^{n} a^{(r)}_{ik}c^{(s)}_{kj} - \sum_{k=1}^{n} b^{(r)}_{ik}d^{(s)}_{kj}\Big) = 0 \quad \text{if } \sum_{k=1}^{n} a^{(r)}_{ik}c^{(s)}_{kj} - \sum_{k=1}^{n} b^{(r)}_{ik}d^{(s)}_{kj} = 0. \tag{46} \]
From Remark 2 in [2], this means that there are no rounding errors in the evaluation of $\mathrm{fl}\big(A^{(r)}C^{(s)} - B^{(r)}D^{(s)}\big)$. In other words, $\mathrm{fl}\big(A^{(r)}C^{(s)} - B^{(r)}D^{(s)}\big) = A^{(r)}C^{(s)} - B^{(r)}D^{(s)}$ is satisfied. Using a similar idea, we also obtain $\mathrm{fl}\big(A^{(r)}D^{(s)} + B^{(r)}C^{(s)}\big) = A^{(r)}D^{(s)} + B^{(r)}C^{(s)}$, and this completes the proof.
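Theorem A can be verified numerically on the first extracted pieces. In the Python sketch below (test matrices and helper names are my own), one extraction step per matrix with the shared shifts of (31)-(32) is performed, and the binary64 evaluation of $A^{(1)}C^{(1)} - B^{(1)}D^{(1)}$ is compared entry by entry against exact rational arithmetic:

```python
import math
from fractions import Fraction

n = 2
gamma = math.ceil((math.log2(n) + 53 + 1) / 2.0)        # (28), u = 2^-53

A = [[1 / 3, 2.0], [0.5, 9.0]]
B = [[4.0, 1e-8], [2.5, 1.0]]
C = [[6.0, 1 / 7], [3.0, 2.0]]
D = [[0.25, 8.0], [5.0, 1e-5]]

# shared row-wise sigma for (A + Bi) and column-wise tau for (C + Di), (31)-(32)
sigma, tau = [], []
for i in range(n):
    m_i = max(max(abs(x) for x in A[i]), max(abs(x) for x in B[i]))
    sigma.append(2.0 ** (gamma + math.ceil(math.log2(m_i))))
for j in range(n):
    m_j = max(max(abs(C[i][j]) for i in range(n)),
              max(abs(D[i][j]) for i in range(n)))
    tau.append(2.0 ** (gamma + math.ceil(math.log2(m_j))))

def extract(M, shift):          # one step of (33) on every element of M
    return [[(M[i][j] + shift(i, j)) - shift(i, j) for j in range(n)]
            for i in range(n)]

A1 = extract(A, lambda i, j: sigma[i])
B1 = extract(B, lambda i, j: sigma[i])
C1 = extract(C, lambda i, j: tau[j])
D1 = extract(D, lambda i, j: tau[j])

for i in range(n):
    for j in range(n):
        flt = sum(A1[i][k] * C1[k][j] for k in range(n)) \
            - sum(B1[i][k] * D1[k][j] for k in range(n))     # plain binary64
        exact = sum(Fraction(A1[i][k]) * Fraction(C1[k][j]) -
                    Fraction(B1[i][k]) * Fraction(D1[k][j]) for k in range(n))
        assert Fraction(flt) == exact     # no rounding at all, as (27) claims
```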
Practically, we introduce the following algorithm, named Split_CompMat_2, which is used for splitting the complex matrix $\tilde{A} = (A + Bi)$ based on (16), (31), (32) and (33) such that (27) is satisfied.
Algorithm 5 Split_CompMat_2

function E = Split_CompMat_2(At, l, delta)
    % At is a complex matrix (A + Bi)
    [m, n] = size(At)
    k = 1
    u = 2^-53;  % double precision (binary64)
    gamma = ceil((-log2(u) + log2(n) + 1)/2)
    % E{1} = zeros(size(At)) + zeros(size(At))*1i
    while (k < l)
        mu_A = max(abs(real(At)), [], 2)
        mu_B = max(abs(imag(At)), [], 2)
        mu = max(mu_A, mu_B)
        if (max(mu) == 0)  % guard, as in Algorithm 1
            return
        end
        w = 2.^(ceil(log2(mu)) + gamma)
        S = complex(w, w)
        E{k} = (At + S) - S
        At = At - E{k}
        if (nnz(E{k}) < delta*m*n)
            E{k} = sparse(E{k})
        end
        k = k + 1
    end
    if (k == l)
        E{k} = At
    end
end
Again, $l$ and $\delta$ are set similarly to the ones acting as inputs of Algorithms 1 and 3. Moreover, to split the complex matrix $\tilde{C} = (C + Di)$ based on (16), (31), (32) and (33) such that
\[ \tilde{C} = \sum_{s=1}^{\bar{l}} F^{(s)}, \qquad F^{(s)} = C^{(s)} + D^{(s)}i \quad \text{and} \quad \bar{l} \le l \]
and (27) hold, we need to run
F = cellfun(@transpose, Split_CompMat_2(Ct.', l, delta), 'UniformOutput', false)
in MATLAB notation.
Next, we construct the EFT_CompMul_2 algorithm, which is an error-free transformation for the complex matrix multiplication between $\tilde{A} = (A + Bi)$ and $\tilde{C} = (C + Di)$ such that (15) is achieved. It is worth noting that the construction of this algorithm is based on Theorem A.

Algorithm 6 EFT_CompMul_2

function G = EFT_CompMul_2(At, Ct)
    % At and Ct are complex matrices
    E = Split_CompMat_2(At, l, delta);
    F = cellfun(@transpose, Split_CompMat_2(Ct.', l, delta), 'UniformOutput', false);
    k = 1;
    for i = 1 : length(E)
        for j = 1 : length(F)
            G{k} = E{i} * F{j};
            k = k + 1;
        end
    end
end
3.3 Other Forms

In Subsection 3.1, there are four matrix multiplications, namely AC, BD, AD and BC. If we let
\[ P = A(C + D), \quad Q = (A + B)D \quad \text{and} \quad R = B(C - D), \tag{47} \]
then
\[ (A + Bi)(C + Di) = (P - Q) + (Q + R)i. \tag{48} \]
In (47), there are only three matrix multiplications. We want to find a new splitting algorithm for P, Q and R to obtain
\[ A = \sum_{r=1}^{n_A} A^{(r)}, \quad C + D = \sum_{s=1}^{n_{CpD}} S^{(s)}, \quad A^{(r)} \in \mathbb{F}^{m \times n}, \; S^{(s)} \in \mathbb{F}^{n \times p}, \tag{49} \]
\[ A + B = \sum_{r=1}^{n_{ApB}} T^{(r)}, \quad D = \sum_{s=1}^{n_D} D^{(s)}, \quad T^{(r)} \in \mathbb{F}^{m \times n}, \; D^{(s)} \in \mathbb{F}^{n \times p}, \tag{50} \]
\[ B = \sum_{r=1}^{n_B} B^{(r)}, \quad C - D = \sum_{s=1}^{n_{CmD}} U^{(s)}, \quad B^{(r)} \in \mathbb{F}^{m \times n}, \; U^{(s)} \in \mathbb{F}^{n \times p}, \tag{51} \]
where $n_A, n_{CpD}, n_{ApB}, n_D, n_B, n_{CmD} \in \mathbb{N}$, such that
\[ A^{(r)}S^{(s)} = \mathrm{fl}\big(A^{(r)}S^{(s)}\big), \quad T^{(r)}D^{(s)} = \mathrm{fl}\big(T^{(r)}D^{(s)}\big) \quad \text{and} \quad B^{(r)}U^{(s)} = \mathrm{fl}\big(B^{(r)}U^{(s)}\big). \tag{52} \]
To illustrate the new splitting algorithm, we only use (49) and show that $A^{(r)}S^{(s)} = \mathrm{fl}\big(A^{(r)}S^{(s)}\big)$ holds, while (50) and (51) follow accordingly.
Firstly, (16) is set and the vector $t^{(k)} \in \mathbb{F}^m$ is defined as
\[ t^{(k)}_i = \left\lceil \log_2 \max_{1 \le j \le n} \big|a^{(k)}_{ij}\big| \right\rceil. \tag{53} \]
Using (53), we define the vector $\sigma^{(k)} \in \mathbb{F}^m$ by
\[ \sigma^{(k)}_i = 2^{\gamma} \cdot 2^{t^{(k)}_i}, \tag{54} \]
where $a^{(k)}_{ij}$ indicates every element of matrix $A^{(k)}$ in (16). To compute $\widehat{A}^{(k)}$ satisfying $A^{(k)} = \widehat{A}^{(k)} + A^{(k+1)}$, the concept of Algorithm 3.2 in [3] is implemented as follows:
\[ \widehat{a}^{(k)}_{ij} = \mathrm{fl}\big(\big(a^{(k)}_{ij} + \sigma^{(k)}_i\big) - \sigma^{(k)}_i\big), \qquad a^{(k+1)}_{ij} = \mathrm{fl}\big(a^{(k)}_{ij} - \widehat{a}^{(k)}_{ij}\big), \tag{55} \]
where $\widehat{a}^{(k)}_{ij}$ represents the extracted part of the entry $a^{(k)}_{ij}$ of the matrix $A^{(k)}$. Now, if the procedures (53), (54) and (55) are applied to $A^{(k)}$, for $k = 1, 2, \cdots$, then we obtain
\[ A = \sum_{r=1}^{n_A} A^{(r)} \quad \text{and} \quad A^{(n_A+1)} = O_{mn}, \tag{56} \]
where $O_{mn}$ is a zero matrix of the size $m \times n$. It is worth noting that (53), (54) and (55) are respectively based on (2), (3) and (5). Concretely, we present the algorithm called Split_Mat_Mod, which is used for splitting the real part of the complex matrix $\tilde{A} = (A + Bi)$ such that (8) is achieved.