
CATATAN HARIAN

PENELITIAN KEMITRAAN

DANA ITS 2020

PENGEMBANGAN ALGORITMA TRANSFORMASI ERROR-FREE UNTUK

PERKALIAN MATRIKS KOMPLEKS

Research Team:

Dr. Imam Mukhlash, S.Si., MT (Matematika/FSAD)

Drs. Bandung Arry Sanjoyo, MIKom. (Matematika/FSAD)

Drs. Nurul Hidayat, MKom (Matematika/FSAD)

Nurul Yakim Kazal (Matematika/FSAD)

DIREKTORAT RISET DAN PENGABDIAN KEPADA MASYARAKAT

INSTITUT TEKNOLOGI SEPULUH NOPEMBER

SURABAYA

2020


In general, this research consisted of discussion sessions, including preparatory discussions held before each discussion with the partner. The research activities are summarized in the following table:

No  Date        Activity
1   09/06/2020  Preparation for the first discussion with Prof. Ozaki (supporting document: Appendix 1)
2   12/06/2020  Discussion between the research team and the partner (supporting document: Appendix 2)
3   22/06/2020  Preparation for the second discussion; the concept of the ExtractScalar algorithm in error-free transformation
4   26/06/2020  Implementation of the ExtractScalar algorithm concept in error-free transformation for complex matrix multiplication (Proposed Method 1) (supporting document: Appendix 3)
5   30/06/2020  Discussion on the implementation of the ExtractScalar algorithm concept (supporting document: Appendix 4)
6   10/07/2020  Implementation of the ExtractScalar algorithm concept in error-free transformation for complex matrix multiplication (Proposed Methods 2 and 3) (supporting document: Appendix 4)
7   21/07/2020  Preparation of a draft overview paper (supporting document: Appendix 5)
8   24/07/2020  Draft paper on the overview of error-free transformation for matrix multiplication and the methods ... (supporting document: Appendix 5)
9   03/08/2020  Preparation of material for the ICoMPAC paper
10  06/08/2020  Discussion of the draft paper for ICoMPAC 2020 (supporting document: Appendix 6)
12  12/08/2020  Discussion of the draft paper for ICoMPAC 2020
13  13/08/2020  Discussion of the draft paper for ICoMPAC 2020
14  14/08/2020  Final review of the draft paper for ICoMPAC 2020 with Prof. Ozaki (supporting document: Appendix 7)
15  15/08/2020  Discussion of the final review of the draft paper for ICoMPAC 2020 based on feedback from Prof. Ozaki
16  18/08/2020  Paper submitted to ICoMPAC
17  19/08/2020  Draft paper for the Journal of Computational and Applied Mathematics (Section 2)
18  24/08/2020  Draft paper for the Journal of Computational and Applied Mathematics (Section 3)
19  28/08/2020  Draft paper for the Journal of Computational and Applied Mathematics (continuation of Sections 2 and 3) (supporting document: Appendix 8)
20  29/08/2020  Revision of the draft paper for the Journal of Computational and Applied Mathematics (Sections 2 and 3) based on feedback from Prof. Ozaki
21  01/09/2020  Revision of the draft paper (Sections 2 and 3), continued, based on feedback from Prof. Ozaki
22  08/09/2020  Revision of the draft paper (Sections 2 and 3), continued, based on feedback from Prof. Ozaki
23  11/09/2020  Revision of the draft paper (Sections 2 and 3): Zoom meeting with Prof. Ozaki (supporting document: Appendix 9)
24  14/09/2020  Revision of the draft paper (Sections 2 and 3) based on the results of the Zoom meeting with Prof. Ozaki
25  17/09/2020  Revision of the draft paper (Sections 2 and 3), continued, based on the results of the Zoom meeting with Prof. Ozaki
26  24/09/2020  Revision of the draft paper (Sections 2 and 3) based on the results of the Zoom meeting with Prof. Ozaki
27  25/09/2020  Adding numerical experiments to the draft paper for the Journal of Computational and Applied Mathematics: Zoom meeting with Prof. Ozaki
28  29/09/2020  Adding numerical experiments to the draft paper: Zoom meeting with Prof. Ozaki, continued
29  06/10/2020  Adding numerical experiments to the draft paper: Zoom meeting with Prof. Ozaki, continued
30  08/10/2020  Adding numerical experiments to the draft paper, continued; revision of the ICoMPAC paper
31  12/10/2020  Preparation for a Zoom meeting with Prof. Ozaki
32  13/10/2020  Zoom meeting with Prof. Ozaki: discussion of the numerical experiment results in the draft paper for the Journal of Computational and Applied Mathematics
33  15/10/2020  Rerunning experiments based on the results of the discussion in the Zoom meeting
34  20/10/2020  Continuation of rerunning experiments based on the results of the discussion in the Zoom meeting
35  22/10/2020  Discussion of revisions to the ICoMPAC paper for publication in IOP
36  26/10/2020  Discussion of revisions to the ICoMPAC paper for publication in IOP, continued
37  27/10/2020  Zoom meeting with Prof. Ozaki: further discussion of the ICoMPAC paper for publication in IOP
38  02/11/2020  Submission of the revised paper to ICoMPAC for publication in IOP
39  03/11/2020  Discussion of numerical experiments for the condition number
40  04/11/2020  Numerical experiments for the condition number, continued
41  05/11/2020  Discussion of numerical experiments for the condition number, continued
42  06/11/2020  Numerical experiments for the condition number, continued
43  07/11/2020  Discussion of numerical experiments for the condition number, continued
45  10/11/2020  Zoom meeting with Prof. Ozaki: adding an analysis of the condition number to the draft paper for the Journal of Computational and Applied Mathematics
46  11/11/2020  Discussion of the condition-number analysis in the draft paper
47  12/11/2020  Discussion of the condition-number analysis in the draft paper; condition-number experiment for complex matrix multiplication with an inverse
48  13/11/2020  Discussion of the results of the condition-number experiment for complex matrix multiplication with an inverse
49  16/11/2020  Discussion of the results of the condition-number experiment for complex matrix multiplication with an inverse, continued
50  17/11/2020  Discussion of the results of the condition-number experiment for complex matrix multiplication with an inverse, continued
51  23/11/2020  Preparation for a Zoom meeting with Prof. Ozaki
52  24/11/2020  Zoom meeting with Prof. Ozaki: revision of the condition-number analysis in the draft paper for the Journal of Computational and Applied Mathematics
53  26/11/2020  Rerunning condition-number experiments; drafting the final report
54  27/11/2020  Rerunning condition-number experiments; drafting the final report, continued
55  28/11/2020  Improving the draft paper; drafting the final report, continued
56  30/11/2020  Finalization of the final report

Note: the supporting document for each activity may take the form of photos, graphs, tables, notes, documents, data, and so on.


Extending the Use of ExtractScalar Algorithm for Matrix Splitting

Nurul Yakim Kazal, Imam Mukhlash, and Bandung Arry S.

Department of Mathematics, Institut Teknologi Sepuluh Nopember

June 12, 2020

Rump et al. introduced an algorithm called ExtractScalar; its analysis is given by Lemma 3.3 in [1], which states:

Let $q$ and $p'$ be the results of Algorithm 3.2 (ExtractScalar) applied to floating-point numbers $\sigma$ and $p$. Assume $\sigma = 2^k \in \mathbb{F}$ for some $k \in \mathbb{Z}$, and assume $|p| \le 2^{-M}\sigma$ for some $0 \le M \in \mathbb{N}$. Then

$$p = q + p', \qquad |p'| \le \mathrm{eps}\cdot\sigma, \qquad |q| \le 2^{-M}\sigma, \qquad q \in \mathrm{eps}\cdot\sigma\,\mathbb{Z} \tag{1}$$

According to this lemma, we need to find $M$ and $\sigma$ such that $|p| \le 2^{-M}\sigma$ to make the algorithm work well. Rump et al. also introduced the concept of unit in the first place (ufp) in [1], which is defined by

$$\mathrm{ufp}(r) := 2^{\lfloor \log_2 |r| \rfloor} \quad \text{for } 0 \ne r \in \mathbb{R} \tag{2}$$

One of its properties states that

$$0 \ne r \in \mathbb{R} \implies \mathrm{ufp}(r) \le |r| < 2\cdot\mathrm{ufp}(r) \tag{3}$$

It is therefore reasonable to use the last inequality in (3) to set

$$2\cdot\mathrm{ufp}(r) = 2^{-M}\sigma \quad \text{for } 0 \ne r \in \mathbb{R} \tag{4}$$

Since $2\cdot\mathrm{ufp}(r) = 2\cdot\mathrm{ufp}(r)\cdot 2^{M}\cdot 2^{-M}$, this means

$$\sigma = 2\cdot\mathrm{ufp}(r)\cdot 2^{M} = 2^{M+1}\cdot\mathrm{ufp}(r) \tag{5}$$

for $0 \ne r \in \mathbb{R}$. Therefore, we can define $\sigma := 2^{M+1}\cdot\mathrm{ufp}(r)$, for $0 \le M \in \mathbb{N}$, when applying ExtractScalar to a nonzero floating-point number $r$ as input.
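The derivation above can be checked numerically. The following is a minimal Python sketch, not part of the report's own MATLAB-style algorithms; it assumes IEEE binary64 arithmetic, under which the two subtractions behave exactly as the lemma describes:

```python
import math

def ufp(r):
    """Unit in the first place from (2): largest power of two not exceeding |r|."""
    return 0.0 if r == 0.0 else 2.0 ** math.floor(math.log2(abs(r)))

def extract_scalar(sigma, p):
    """ExtractScalar: split p into a high part q and an exact remainder p'."""
    q = (sigma + p) - sigma   # fl(sigma + p) - sigma
    return q, p - q           # fl(p - q), exact under the lemma's assumptions

p = 0.1
M = 0
sigma = 2.0 ** (M + 1) * ufp(p)   # sigma = 2^(M+1) * ufp(p), as in (5)
q, p_low = extract_scalar(sigma, p)

assert q + p_low == p                  # the transformation is error-free
assert abs(p_low) <= 2.0**-53 * sigma  # |p'| <= eps * sigma
```

With this choice of $\sigma$, the lemma's assumption $|p| \le 2^{-M}\sigma$ holds automatically for any nonzero input.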

We can now adapt the ExtractScalar algorithm to perform the splitting of a matrix $A \in \mathbb{F}^{m\times n}$, since matrix splitting is basically the process of applying ExtractScalar to every element $a_{ij}$ of the matrix $A$, for $1 \le i \le m$ and $1 \le j \le n$. This is done by transforming the matrix $A \in \mathbb{F}^{m\times n}$ into $A^{(1)}$ and $A^{(2)}$ such that

$$A = A^{(1)} + A^{(2)} \tag{6}$$

In order to do this, we first define a constant $\gamma$ by

$$\gamma = M = \lfloor \log_2(n) \rfloor \tag{7}$$

As there are only 53 bits to store the significand of a double-precision floating-point number, we need to assume that $n \ll \mathrm{eps}^{-1}$, where $\mathrm{eps} = 2^{-53}$. Next, we can choose to apply the ExtractScalar algorithm either column-by-column or row-by-row. For the latter, we need to find the vector $P \in \mathbb{F}^m$ whose elements are given by

$$P_i = \max_{1 \le j \le n} |a_{ij}| \tag{8}$$

where $a_{ij}$ represents each element of the matrix $A$. Finding the vector $P$ is useful for computing the vector $Q$, of the same size as $P$, whose elements are defined by

$$Q_i = 2^{\gamma+1}\cdot\mathrm{ufp}(P_i) \tag{9}$$

Obviously, (9) is just another form of (5), and we have

$$\max_{1 \le j \le n} |a_{ij}| \le Q_i \tag{10}$$

Therefore, we can set $\sigma_i = Q_i$ and apply the ExtractScalar algorithm to every element of the $i$-th row. After applying the algorithm to all rows of $A$, we end up with two matrices $A^{(1)}$ and $A^{(2)}$ satisfying (6).

Correspondingly, if we want to apply the algorithm column-by-column, we need to find the vector $R \in \mathbb{F}^n$ whose elements are given by

$$R_j = \max_{1 \le i \le m} |a_{ij}| \tag{11}$$

where $a_{ij}$ represents each element of the matrix $A$. Finding the vector $R$ is useful for computing the vector $S$, of the same size as $R$, whose elements are defined by

$$S_j = 2^{\gamma+1}\cdot\mathrm{ufp}(R_j) \tag{12}$$

Again, (12) is just another form of (5), and we have

$$\max_{1 \le i \le m} |a_{ij}| \le S_j \tag{13}$$

Therefore, we can set $\sigma_j = S_j$ and apply the ExtractScalar algorithm to every element of the $j$-th column. After applying the algorithm to all columns of $A$, we end up with two matrices $A^{(1)}$ and $A^{(2)}$ satisfying (6).

The procedure explained above (either row-by-row or column-by-column) can be applied again to $A^{(2)}$, and it gives outputs $A^{(2)}$ and $A^{(3)}$ (the symbol $A^{(2)}$ being reused for the newly extracted part) satisfying

$$A^{(2)} = A^{(2)} + A^{(3)} \tag{14}$$

From (6) and (14), we then have

$$A = A^{(1)} + A^{(2)} + A^{(3)} \tag{15}$$

Applying the procedure repeatedly $(k-1)$ times, we end up with

$$A = A^{(1)} + A^{(2)} + \cdots + A^{(k)} \tag{16}$$

Algorithm 1 ExtractScalar

    function [q, p0] = ExtractScalar(sigma, p)
        q = fl(sigma + p) - sigma;
        p0 = fl(p - q);
    end


Algorithm 2 UnitFirstPlace

    function [ufp] = UnitFirstPlace(r)
        if r == 0
            ufp = 0;
        else
            a = floor(log2(abs(r)));
            ufp = 2^a;
        end
    end

Algorithm 3 MatrixSplittingByRow

    function [S] = MatrixSplittingByRow(A)
        [m, n] = size(A);
        gamma = floor(log2(n));
        S{1} = zeros(size(A));
        k = 0;
        while norm(A, inf) ~= 0
            k = k + 1;
            mu = max(abs(A), [], 2);        % row maxima, cf. (8)
            ufp = UnitFirstPlace(mu);       % applied elementwise
            sigma = 2^(gamma + 1) .* ufp;   % cf. (9)
            for i = 1 : m
                for j = 1 : n
                    S{k}(i, j) = ExtractScalar(sigma(i), A(i, j));
                end
            end
            A = A - S{k};
        end
    end


Algorithm 4 MatrixSplittingByColumn

    function [S] = MatrixSplittingByColumn(A)
        [m, n] = size(A);
        gamma = floor(log2(n));
        S{1} = zeros(size(A));
        k = 0;
        while norm(A, inf) ~= 0
            k = k + 1;
            mu = max(abs(A));               % column maxima, cf. (11)
            ufp = UnitFirstPlace(mu);
            sigma = 2^(gamma + 1) .* ufp;   % cf. (12)
            for j = 1 : n
                for i = 1 : m
                    S{k}(i, j) = ExtractScalar(sigma(j), A(i, j));
                end
            end
            A = A - S{k};
        end
    end

References

[1] Siegfried M. Rump, Takeshi Ogita, and Shin'ichi Oishi. Accurate floating-point summation part I: Faithful rounding. SIAM J. Sci. Comput., 2008.


Error-Free Transformation for Complex Matrix Multiplication

Nurul Yakim Kazal, Imam Mukhlash, Chairul Imron, and Bandung Arry S.

Department of Mathematics, Institut Teknologi Sepuluh Nopember

June 25, 2020

1 ExtractScalar Algorithm for Error-Free Splitting of Floating-Point Numbers

Rump et al. introduced the concept of unit in the first place (ufp), or leading bit, of a real number in [2], which is defined by

$$\mathrm{ufp}(r) := 2^{\lfloor \log_2 |r| \rfloor} \text{ for } 0 \ne r \in \mathbb{R}, \quad \text{and} \quad \mathrm{ufp}(0) := 0 \tag{1}$$

Based on this definition, we construct an algorithm called UnitFirstPlace as follows:

Algorithm 1 UnitFirstPlace

    function [ufp] = UnitFirstPlace(r)
        if r == 0
            ufp = 0;
        else
            a = floor(log2(abs(r)));
            ufp = 2^a;
        end
    end

One of the ufp properties given in [2] states that

$$0 \ne r \in \mathbb{R} \implies \mathrm{ufp}(r) \le |r| < 2\cdot\mathrm{ufp}(r) \tag{2}$$

Proof of (2).¹ Assume that $0 \ne r \in \mathbb{R}$. Then $\log_2|r| \in \mathbb{R}$, and it holds that $0 \le \log_2|r| - \lfloor\log_2|r|\rfloor < 1$. Consequently,

$$0 \le \log_2|r| - \lfloor\log_2|r|\rfloor < 1 \implies \lfloor\log_2|r|\rfloor \le \log_2|r| < 1 + \lfloor\log_2|r|\rfloor \implies 2^{\lfloor\log_2|r|\rfloor} \le 2^{\log_2|r|} < 2\cdot 2^{\lfloor\log_2|r|\rfloor} \implies \mathrm{ufp}(r) \le |r| < 2\cdot\mathrm{ufp}(r)$$

Rump et al. also constructed an algorithm named ExtractScalar in [2] as follows:

¹ The property is given but not proven in [2].


Algorithm 2 ExtractScalar

    function [q, p0] = ExtractScalar(sigma, p)
        q = fl(sigma + p) - sigma;
        p0 = fl(p - q);
    end

Algorithm 2 is basically used for splitting a floating-point number $p$ into two parts, namely $q$ and $p'$, such that $p = q + p'$. The analysis of Algorithm 2 is given by the following lemma:

Lemma 1.² Let $q$ and $p'$ be the results of Algorithm 2 (ExtractScalar) applied to floating-point numbers $\sigma$ and $p$. Assume $\sigma = 2^k \in \mathbb{F}$ for some $k \in \mathbb{Z}$, and assume $|p| \le 2^{-M}\sigma$ for some $0 \le M \in \mathbb{N}$. Then

$$p = q + p', \qquad |p'| \le u\cdot\sigma, \qquad |q| \le 2^{-M}\sigma, \qquad q \in u\cdot\sigma\,\mathbb{Z} \tag{3}$$

with $u = 2^{-53}$ for IEEE 754 binary64 (double precision).

According to this lemma, we need to find $M$ and $\sigma$ such that $|p| \le 2^{-M}\sigma$ is satisfied to make the algorithm work well. Since the second inequality in (2) always holds for $0 \ne r \in \mathbb{R}$, it is reasonable to set

$$2^{-M}\sigma = 2\cdot\mathrm{ufp}(r) \quad \text{for } 0 \ne r \in \mathbb{R} \tag{4}$$

so that

$$|r| < 2^{-M}\sigma \tag{5}$$

which is the assumption required by the lemma, is always satisfied. However, we also know that

$$2\cdot\mathrm{ufp}(r) = 2\cdot\mathrm{ufp}(r)\cdot 2^{M}\cdot 2^{-M} \tag{6}$$

From (4) and (6), we have $2\cdot\mathrm{ufp}(r)\cdot 2^{M}\cdot 2^{-M} = 2^{-M}\sigma$, and cancelling $2^{-M}$ on both sides gives

$$\sigma = 2\cdot\mathrm{ufp}(r)\cdot 2^{M} = 2^{M+1}\cdot\mathrm{ufp}(r) \tag{7}$$

for $0 \ne r \in \mathbb{R}$. Therefore, we can define

$$\sigma := 2^{M+1}\cdot\mathrm{ufp}(r) \tag{8}$$

for $0 \le M \in \mathbb{N}$ when applying ExtractScalar to a nonzero floating-point number $r$ as input.

2 Extending the Use of the ExtractScalar Algorithm for Matrix Splitting and Matrix Multiplication

2.1 Matrix Splitting

We can now adapt the ExtractScalar algorithm to perform the transformation of a matrix $A \in \mathbb{F}^{m\times n}$ into $A^{(1)}$ and $A^{(2)}$ of the same size as $A$ such that

$$A = A^{(1)} + A^{(2)} \tag{9}$$

This is done by applying the ExtractScalar algorithm to every element $a_{ij}$ of $A$, for $1 \le i \le m$ and $1 \le j \le n$, either row-by-row or column-by-column. For the former, we need to find a vector $P \in \mathbb{F}^m$ whose elements are given by

$$P_i = \max_{1 \le j \le n} |a_{ij}| \tag{10}$$

for $1 \le i \le m$, where $a_{ij}$ represents each element of the matrix $A$. The vector $P$ is used for computing the vector $\sigma$, of the same size as $P$, whose elements are defined by

$$\sigma_i = 2^{M+1}\cdot\mathrm{ufp}(P_i) \tag{11}$$

Obviously, (11) is just another form of (8). From (11) and (5), we have

$$P_i < \sigma_i \tag{12}$$

which implies that $|a_{ij}| < \sigma_i$ for $1 \le j \le n$. This result shows that $\sigma_i$, for $1 \le i \le m$, satisfies the required assumption of Lemma 1. Therefore, we can now apply the ExtractScalar algorithm to every element of the $i$-th row of $A$ and end up with two matrices $A^{(1)}$ and $A^{(2)}$ satisfying (9). This procedure is carried out by the following algorithm, called MatrixSplittingByRow:

² The lemma has been proven in [2].

Algorithm 3 MatrixSplittingByRow

    function [A1, A2] = MatrixSplittingByRow(A)
        % M is a given nonnegative integer, cf. (16)
        [m, n] = size(A);
        P = max(abs(A), [], 2);
        ufp = UnitFirstPlace(P);
        sigma = 2^(M + 1) .* ufp;
        A1 = zeros(size(A));
        A2 = zeros(size(A));
        for i = 1 : m
            for j = 1 : n
                [A1(i, j), A2(i, j)] = ExtractScalar(sigma(i), A(i, j));
            end
        end
    end

Correspondingly, if we want to apply the ExtractScalar algorithm to $A$ column-by-column, we need to find the vector $Q \in \mathbb{F}^n$ whose elements are given by

$$Q_j = \max_{1 \le i \le m} |a_{ij}| \tag{13}$$

for $1 \le j \le n$, where $a_{ij}$ represents each element of the matrix $A$. The vector $Q$ is used for computing the vector $\tau$, of the same size as $Q$, whose elements are defined by

$$\tau_j = 2^{M+1}\cdot\mathrm{ufp}(Q_j) \tag{14}$$

Again, (14) is just another form of (8). Using (14) and (5), we have

$$Q_j < \tau_j \tag{15}$$

implying that $|a_{ij}| < \tau_j$ for $1 \le i \le m$. This demonstrates that $\tau_j$, for $1 \le j \le n$, meets the necessary assumption of Lemma 1. Therefore, we can apply the ExtractScalar algorithm to every element of the $j$-th column of $A$ and obtain two matrices $A^{(1)}$ and $A^{(2)}$ such that (9) holds. This procedure is carried out by the following algorithm, called MatrixSplittingByColumn:


Algorithm 4 MatrixSplittingByColumn

    function [A1, A2] = MatrixSplittingByColumn(A)
        % M is a given nonnegative integer, cf. (16)
        [m, n] = size(A);
        Q = max(abs(A));
        ufp = UnitFirstPlace(Q);
        tau = 2^(M + 1) .* ufp;
        A1 = zeros(size(A));
        A2 = zeros(size(A));
        for i = 1 : m
            for j = 1 : n
                [A1(i, j), A2(i, j)] = ExtractScalar(tau(j), A(i, j));
            end
        end
    end
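A single row-wise splitting pass can be sketched in Python for a small example (the helper names here are illustrative, not the report's own; Python floats are IEEE binary64, and M is chosen as in Section 2.2). It verifies that the split is elementwise error-free:

```python
import math

def ufp(r):
    """Unit in the first place: 2**floor(log2(|r|)), with ufp(0) = 0."""
    return 0.0 if r == 0.0 else 2.0 ** math.floor(math.log2(abs(r)))

def split_by_row(A, M):
    """One row-wise splitting pass: returns A1, A2 with A == A1 + A2 exactly."""
    m, n = len(A), len(A[0])
    A1 = [[0.0] * n for _ in range(m)]
    A2 = [[0.0] * n for _ in range(m)]
    for i in range(m):
        P_i = max(abs(a) for a in A[i])       # row maximum, cf. (10)
        sigma = 2.0 ** (M + 1) * ufp(P_i)     # cf. (11)
        for j in range(n):
            A1[i][j] = (sigma + A[i][j]) - sigma   # ExtractScalar high part
            A2[i][j] = A[i][j] - A1[i][j]          # exact remainder, by Lemma 1
    return A1, A2

A = [[0.1, 0.2, 0.3], [1.0 / 3.0, 2.0 / 7.0, 0.5]]
M = math.ceil((math.log2(3) + 53) / 2)   # M as chosen in Section 2.2, n = 3
A1, A2 = split_by_row(A, M)

# The split loses no information: a_ij = a1_ij + a2_ij with no rounding.
assert all(A1[i][j] + A2[i][j] == A[i][j] for i in range(2) for j in range(3))
```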

2.2 Matrix Multiplication

As explained in [1], we first define $M$ by

$$M := \left\lceil \frac{\log_2(n) + 53}{2} \right\rceil \tag{16}$$

and use this $M$ in Algorithm 3 and Algorithm 4 to split the matrices $A \in \mathbb{F}^{m\times n}$ and $B \in \mathbb{F}^{n\times p}$, respectively and repeatedly. Then there exist $n_A, n_B \in \mathbb{N}$ such that

$$A = \sum_{r=1}^{n_A} D^{(r)}, \qquad B = \sum_{s=1}^{n_B} E^{(s)}, \qquad D^{(n_A+1)} = O_{mn}, \qquad E^{(n_B+1)} = O_{np} \tag{17}$$

where $O_{mn}$ is a zero matrix of size $m \times n$.

Next, we modify Algorithms 3 and 4 to construct Algorithms 5 and 6, in order to transform the matrices $A \in \mathbb{F}^{m\times n}$ and $B \in \mathbb{F}^{n\times p}$ such that (17) is obtained.

Algorithm 5 MatrixSplittingByRow_Mod

    function [D] = MatrixSplittingByRow_Mod(A)
        [m, n] = size(A);
        M = ceil((log2(n) + 53) / 2);
        D{1} = zeros(size(A));
        k = 0;
        while norm(A, inf) ~= 0
            k = k + 1;
            P = max(abs(A), [], 2);
            ufp = UnitFirstPlace(P);
            sigma = 2^(M + 1) .* ufp;
            for i = 1 : m
                for j = 1 : n
                    [D{k}(i, j), A(i, j)] = ExtractScalar(sigma(i), A(i, j));
                end
            end
        end
    end


Algorithm 6 MatrixSplittingByColumn_Mod

    function [E] = MatrixSplittingByColumn_Mod(B)
        [n, p] = size(B);
        M = ceil((log2(n) + 53) / 2);
        E{1} = zeros(size(B));
        k = 0;
        while norm(B, inf) ~= 0
            k = k + 1;
            Q = max(abs(B));              % column maxima of B
            ufp = UnitFirstPlace(Q);
            tau = 2^(M + 1) .* ufp;
            for i = 1 : n
                for j = 1 : p
                    [E{k}(i, j), B(i, j)] = ExtractScalar(tau(j), B(i, j));
                end
            end
        end
    end

Next, Theorem 1 in [1] guarantees that

$$\mathrm{fl}\big(D^{(r)}E^{(s)}\big) = D^{(r)}E^{(s)} \tag{18}$$

which also implies an error-free transformation of the matrix product:

$$AB = \sum_{1 \le r \le n_A,\; 1 \le s \le n_B} \mathrm{fl}\big(D^{(r)}E^{(s)}\big) \tag{19}$$

Next, we adapt the EFT_Mul algorithm constructed by Ozaki et al. in [1], using the slightly different matrix splitting algorithms, namely Algorithms 5 and 6, to form the EFT_MatMul algorithm. This algorithm transforms the product of $A \in \mathbb{F}^{m\times n}$ and $B \in \mathbb{F}^{n\times p}$ into an unevaluated sum of floating-point matrices without rounding errors, such that

$$AB = \sum_{i=1}^{n_A \cdot n_B} C^{(i)}, \quad \text{where } C^{(i)} \in \mathbb{F}^{m\times p} \tag{20}$$

Algorithm 7 EFT_MatMul

    function [C] = EFT_MatMul(A, B)
        P = MatrixSplittingByRow_Mod(A);
        Q = MatrixSplittingByColumn_Mod(B);
        NA = length(P);
        NB = length(Q);
        k = 1;
        for i = 1 : NA
            for j = 1 : NB
                C{k} = P{i} * Q{j};
                k = k + 1;
            end
        end
    end

To obtain an accurate result from (20), we can apply the accurate summation algorithm AccSum, developed by Rump et al. in [2], to the output of Algorithm 7, which gives the following algorithm:


Algorithm 8 AccMatMul

    function [Result_AB] = AccMatMul(A, B)
        [m, n] = size(A);
        [n, p] = size(B);
        C = EFT_MatMul(A, B);
        Result_AB = zeros(m, p);
        for i = 1 : m
            for j = 1 : p
                vector = zeros(length(C), 1);
                for k = 1 : length(vector)
                    vector(k) = C{k}(i, j);
                end
                Result_AB(i, j) = AccSum(vector);
            end
        end
    end
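The pipeline of Algorithms 5 to 8 can be sketched end-to-end in Python (an illustration under stated assumptions, not the report's implementation: the helper names are hypothetical, and `math.fsum`, Python's correctly rounded summation, stands in for AccSum):

```python
import math
from fractions import Fraction

def ufp(r):
    """Unit in the first place of r (0 for r == 0)."""
    return 0.0 if r == 0.0 else 2.0 ** math.floor(math.log2(abs(r)))

def split_rows(A, M):
    """Split A row-wise, repeatedly, until nothing remains: A == sum of pieces."""
    pieces, A = [], [row[:] for row in A]
    while any(a != 0.0 for row in A for a in row):
        piece = [[0.0] * len(A[0]) for _ in A]
        for i, row in enumerate(A):
            sigma = 2.0 ** (M + 1) * ufp(max(abs(a) for a in row))
            for j, a in enumerate(row):
                piece[i][j] = (sigma + a) - sigma   # ExtractScalar high part
                A[i][j] = a - piece[i][j]           # exact remainder
        pieces.append(piece)
    return pieces

def split_cols(B, M):
    """Column-wise splitting, implemented by transposing."""
    pieces = split_rows([list(col) for col in zip(*B)], M)
    return [[list(row) for row in zip(*p)] for p in pieces]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[0.1, 1.0 / 3.0], [0.7, 2.0 / 7.0]]
B = [[0.2, 0.9], [1.0 / 7.0, 0.3]]
n = 2
M = math.ceil((math.log2(n) + 53) / 2)   # eq. (16)

# Each product of split pieces is exact; fsum plays the role of AccSum.
products = [matmul(D, E) for D in split_rows(A, M) for E in split_cols(B, M)]
C = [[math.fsum(P[i][j] for P in products) for j in range(n)] for i in range(n)]

# Reference: correctly rounded exact product, via rational arithmetic.
exact = [[sum(Fraction(A[i][k]) * Fraction(B[k][j]) for k in range(n))
          for j in range(n)] for i in range(n)]
assert all(C[i][j] == float(exact[i][j]) for i in range(n) for j in range(n))
```

The final assertion checks that the accurately summed result matches the correctly rounded value of the exact product, which is the point of the error-free transformation.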

3 Complex Matrix Multiplication

3.1 Simple Application

If we have matrices $A, B \in \mathbb{F}^{m\times n}$ and $C, D \in \mathbb{F}^{n\times p}$, then

$$(A + Bi)(C + Di) = (AC - BD) + (AD + BC)i \tag{21}$$

From Section 2, we know that accurate results for the matrix products $AC$, $BD$, $AD$, and $BC$ can be achieved with Algorithm 8. Applying AccSum to $AC$ and $-BD$ then gives $(AC - BD)$; similarly, executing AccSum with $AD$ and $BC$ as inputs yields $(AD + BC)$. Using these ideas, we construct an algorithm called AccCompMatMul to obtain accurate results for (21):

Algorithm 9 AccCompMatMul

    function [RealPart, ImaginaryPart] = AccCompMatMul(A, B, C, D)
        [m, n] = size(A);
        [n, p] = size(C);
        Real{1} = AccMatMul(A, C);
        Real{2} = -AccMatMul(B, D);
        Imaginary{1} = AccMatMul(A, D);
        Imaginary{2} = AccMatMul(B, C);
        RealPart = zeros(m, p);
        ImaginaryPart = zeros(m, p);
        for i = 1 : m
            for j = 1 : p
                RealVector = zeros(length(Real), 1);
                ImaginaryVector = zeros(length(Imaginary), 1);
                for k = 1 : length(RealVector)
                    RealVector(k) = Real{k}(i, j);
                    ImaginaryVector(k) = Imaginary{k}(i, j);
                end
                RealPart(i, j) = AccSum(RealVector);
                ImaginaryPart(i, j) = AccSum(ImaginaryVector);
            end
        end
    end


RealPart and ImaginaryPart, the outputs of Algorithm 9, represent $(AC - BD)$ and $(AD + BC)$, respectively.
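A scalar example shows why the real part $(AC - BD)$ is delicate when each product is rounded individually before the subtraction (a Python illustration with naive floating-point products, not the accurate ones produced by Algorithm 8):

```python
from fractions import Fraction

# Scalar instance of the real part AC - BD from (21), with a*c close to b*d.
a, c = 1.0 + 2.0**-30, 1.0 - 2.0**-30   # both exactly representable in binary64
b, d = 1.0, 1.0
naive = a * c - b * d            # each product rounded before subtracting
exact = Fraction(a) * Fraction(c) - Fraction(b) * Fraction(d)
accurate = float(exact)          # correctly rounded true value: -2**-60
assert naive == 0.0              # the rounded products cancel to nothing
assert accurate == -2.0**-60     # the true result is nonzero
```

This kind of cancellation is the motivation for keeping the products in an unevaluated, error-free form before the final accurate summation.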

References

[1] Katsuhisa Ozaki, Takeshi Ogita, Shin’ichi Oishi, and Siegfried M Rump. Error-free transformations of matrix multiplication by using fast routines of matrix multiplication and its applications. Numerical Algorithms, 59(1):95–118, 2012.

[2] Siegfried M. Rump, Takeshi Ogita, and Shin'ichi Oishi. Accurate floating-point summation part I: Faithful rounding. SIAM J. Sci. Comput., 2008.


Error-Free Transformation for Complex Matrix Multiplication

Nurul Yakim Kazal, Imam Mukhlash, Chairul Imron, and Bandung Arry S.

Department of Mathematics, Institut Teknologi Sepuluh Nopember

July 10, 2020

1 Tentative Goal 3.2

If we are given complex matrices

$$A + Bi \quad \text{and} \quad C + Di \tag{1}$$

for $A, B \in \mathbb{F}^{m\times n}$ and $C, D \in \mathbb{F}^{n\times p}$, then

$$(A + Bi)(C + Di) = (AC - BD) + (AD + BC)i \tag{2}$$

Ozaki et al. in [1] defined $\beta$ as

$$\beta := \left\lceil \frac{\log_2(n) - \log_2(u)}{2} \right\rceil \tag{3}$$

From the proof of Theorem 1 in [1], we find that $\beta$ needs to satisfy the condition

$$\frac{\log_2(n) - \log_2(u)}{2} \le \beta \tag{4}$$

such that

$$\mathrm{fl}\big(A^{(r)}C^{(s)}\big) = A^{(r)}C^{(s)}, \quad \mathrm{fl}\big(B^{(r)}D^{(s)}\big) = B^{(r)}D^{(s)}, \quad \mathrm{fl}\big(A^{(r)}D^{(s)}\big) = A^{(r)}D^{(s)}, \quad \text{and} \quad \mathrm{fl}\big(B^{(r)}C^{(s)}\big) = B^{(r)}C^{(s)} \tag{5}$$

hold, where $A^{(r)}$, $B^{(r)}$, $C^{(s)}$, and $D^{(s)}$ are given by

$$A = \sum_{r=1}^{n_A} A^{(r)}, \quad B = \sum_{r=1}^{n_B} B^{(r)}, \quad C = \sum_{s=1}^{n_C} C^{(s)}, \quad D = \sum_{s=1}^{n_D} D^{(s)} \tag{6}$$

for $A^{(r)}, B^{(r)} \in \mathbb{F}^{m\times n}$ and $C^{(s)}, D^{(s)} \in \mathbb{F}^{n\times p}$. We need to construct a new splitting algorithm to obtain (6) such that the following are satisfied:

$$\mathrm{fl}\big(A^{(r)}C^{(s)} - B^{(r)}D^{(s)}\big) = A^{(r)}C^{(s)} - B^{(r)}D^{(s)} \quad \text{and} \quad \mathrm{fl}\big(A^{(r)}D^{(s)} + B^{(r)}C^{(s)}\big) = A^{(r)}D^{(s)} + B^{(r)}C^{(s)} \tag{7}$$

In order to do that, we first define a constant $\gamma$ as follows:

$$\gamma := \left\lceil \frac{\log_2(n) - \log_2(u) + 1}{2} \right\rceil \tag{8}$$

From (3), (4), and (8), we have

$$\frac{\log_2(n) - \log_2(u)}{2} \le \beta = \left\lceil \frac{\log_2(n) - \log_2(u)}{2} \right\rceil \le \left\lceil \frac{\log_2(n) - \log_2(u) + 1}{2} \right\rceil = \gamma \tag{9}$$

which means that $\gamma$ also satisfies (4) and is therefore valid to guarantee that (5) is satisfied.

Similar to the procedure in [1], two vectors, namely $\sigma^{(1)} \in \mathbb{F}^m$ and $\tau^{(1)} \in \mathbb{F}^p$, are defined by

$$\sigma^{(1)}_i = 2^{\gamma} \cdot 2^{P^{(1)}_i} \quad \text{and} \quad \tau^{(1)}_j = 2^{\gamma} \cdot 2^{Q^{(1)}_j} \tag{10}$$

where $P^{(1)}_i$ and $Q^{(1)}_j$ are given by

$$P^{(1)}_i = \left\lceil \log_2\Big( \max_{1 \le j \le n} |a_{ij}| + \max_{1 \le j \le n} |b_{ij}| \Big) \right\rceil \quad \text{and} \quad Q^{(1)}_j = \left\lceil \log_2\Big( \max_{1 \le i \le m} |c_{ij}| + \max_{1 \le i \le m} |d_{ij}| \Big) \right\rceil \tag{11}$$

Using $\sigma^{(1)}$ and the concept of ExtractScalar on every element of the matrices $A$ and $B$, and then $\tau^{(1)}$ and the concept of ExtractScalar on every element of the matrices $C$ and $D$, results in the following:

$$A = A^{(1)} + A^{(2)}, \quad B = B^{(1)} + B^{(2)}, \quad C = C^{(1)} + C^{(2)}, \quad \text{and} \quad D = D^{(1)} + D^{(2)} \tag{12}$$

Again, $\sigma^{(2)} \in \mathbb{F}^m$ and $\tau^{(2)} \in \mathbb{F}^p$ are defined by

$$\sigma^{(2)}_i = 2^{\gamma} \cdot 2^{P^{(2)}_i} \quad \text{and} \quad \tau^{(2)}_j = 2^{\gamma} \cdot 2^{Q^{(2)}_j} \tag{13}$$

where $P^{(2)}_i$ and $Q^{(2)}_j$ are given by

$$P^{(2)}_i = \left\lceil \log_2\Big( \max_{1 \le j \le n} |a^{(2)}_{ij}| + \max_{1 \le j \le n} |b^{(2)}_{ij}| \Big) \right\rceil \quad \text{and} \quad Q^{(2)}_j = \left\lceil \log_2\Big( \max_{1 \le i \le m} |c^{(2)}_{ij}| + \max_{1 \le i \le m} |d^{(2)}_{ij}| \Big) \right\rceil \tag{14}$$

Using $\sigma^{(2)}$ and the concept of ExtractScalar on every element of the matrices $A^{(2)}$ and $B^{(2)}$, and then $\tau^{(2)}$ and the concept of ExtractScalar on every element of $C^{(2)}$ and $D^{(2)}$, results in the following (each superscript-(2) symbol on the right denoting the newly extracted part):

$$A^{(2)} = A^{(2)} + A^{(3)}, \quad B^{(2)} = B^{(2)} + B^{(3)}, \quad C^{(2)} = C^{(2)} + C^{(3)}, \quad \text{and} \quad D^{(2)} = D^{(2)} + D^{(3)} \tag{15}$$

The general idea is to define $\sigma^{(w)} \in \mathbb{F}^m$ and $\tau^{(w)} \in \mathbb{F}^p$ as

$$\sigma^{(w)}_i = 2^{\gamma} \cdot 2^{P^{(w)}_i} \quad \text{and} \quad \tau^{(w)}_j = 2^{\gamma} \cdot 2^{Q^{(w)}_j} \tag{16}$$

where $P^{(w)}_i$ and $Q^{(w)}_j$ are given by

$$P^{(w)}_i = \left\lceil \log_2\Big( \max_{1 \le j \le n} |a^{(w)}_{ij}| + \max_{1 \le j \le n} |b^{(w)}_{ij}| \Big) \right\rceil \quad \text{and} \quad Q^{(w)}_j = \left\lceil \log_2\Big( \max_{1 \le i \le m} |c^{(w)}_{ij}| + \max_{1 \le i \le m} |d^{(w)}_{ij}| \Big) \right\rceil \tag{17}$$

Using $\sigma^{(w)}$ and the concept of ExtractScalar on every element of $A^{(w)}$ and $B^{(w)}$, and then $\tau^{(w)}$ and the concept of ExtractScalar on every element of $C^{(w)}$ and $D^{(w)}$, results in the following:

$$A^{(w)} = A^{(w)} + A^{(w+1)}, \quad B^{(w)} = B^{(w)} + B^{(w+1)}, \quad C^{(w)} = C^{(w)} + C^{(w+1)}, \quad \text{and} \quad D^{(w)} = D^{(w)} + D^{(w+1)} \tag{18}$$

This general idea is implemented repeatedly until (6) and the following conditions hold:

$$A^{(n_A+1)} = O_{mn}, \quad B^{(n_B+1)} = O_{mn}, \quad C^{(n_C+1)} = O_{np}, \quad \text{and} \quad D^{(n_D+1)} = O_{np} \tag{19}$$

where $O_{mn}$ and $O_{np}$ represent zero matrices of size $m \times n$ and $n \times p$, respectively.

Theorem A. Assume that $A, B \in \mathbb{F}^{m\times n}$ and $C, D \in \mathbb{F}^{n\times p}$. Implementing (16) and the general idea repeatedly results in (19), and it also implies that (7) holds.

Proof. We need to show that (6) holds and that the following are true:

$$\mathrm{fl}\big(A^{(r)}C^{(s)} - B^{(r)}D^{(s)}\big) = A^{(r)}C^{(s)} - B^{(r)}D^{(s)} \quad \text{and} \quad \mathrm{fl}\big(A^{(r)}D^{(s)} + B^{(r)}C^{(s)}\big) = A^{(r)}D^{(s)} + B^{(r)}C^{(s)}$$

It suffices to show that $\mathrm{fl}\big(A^{(r)}D^{(s)} + B^{(r)}C^{(s)}\big) = A^{(r)}D^{(s)} + B^{(r)}C^{(s)}$ is satisfied, since $\mathrm{fl}\big(A^{(r)}C^{(s)} - B^{(r)}D^{(s)}\big) = A^{(r)}C^{(s)} - B^{(r)}D^{(s)}$ follows by exactly the same argument with a different sign.

As explained before, if we use $\sigma^{(r)}$ to split the matrices $A$ and $B$, then we have

$$a^{(r)}_{ij} \in u\,\sigma^{(r)}_i\,\mathbb{Z} \quad \text{and} \quad b^{(r)}_{ij} \in u\,\sigma^{(r)}_i\,\mathbb{Z} \tag{20}$$

$$|a^{(r)}_{ij}| \le 2^{-\gamma}\sigma^{(r)}_i \quad \text{and} \quad |b^{(r)}_{ij}| \le 2^{-\gamma}\sigma^{(r)}_i \tag{21}$$

Similarly, if we use $\tau^{(s)}$ to split the matrices $C$ and $D$, then we have

$$c^{(s)}_{ij} \in u\,\tau^{(s)}_j\,\mathbb{Z} \quad \text{and} \quad d^{(s)}_{ij} \in u\,\tau^{(s)}_j\,\mathbb{Z} \tag{22}$$

$$|c^{(s)}_{ij}| \le 2^{-\gamma}\tau^{(s)}_j \quad \text{and} \quad |d^{(s)}_{ij}| \le 2^{-\gamma}\tau^{(s)}_j \tag{23}$$

(20), (21), (22), and (23) are all consequences of applying the concept of the ExtractScalar algorithm, as shown by Lemma 3.3 in [2].

From (20) and (22), we obtain

$$a^{(r)}_{ik} d^{(s)}_{kj} \in u^2 \sigma^{(r)}_i \tau^{(s)}_j \mathbb{Z}, \quad b^{(r)}_{ik} c^{(s)}_{kj} \in u^2 \sigma^{(r)}_i \tau^{(s)}_j \mathbb{Z}, \quad \text{and} \quad a^{(r)}_{ik} d^{(s)}_{kj} + b^{(r)}_{ik} c^{(s)}_{kj} \in u^2 \sigma^{(r)}_i \tau^{(s)}_j \mathbb{Z} \tag{24}$$

which then implies

$$\sum_{k=1}^{n} a^{(r)}_{ik} d^{(s)}_{kj} \in u^2 \sigma^{(r)}_i \tau^{(s)}_j \mathbb{Z}, \quad \sum_{k=1}^{n} b^{(r)}_{ik} c^{(s)}_{kj} \in u^2 \sigma^{(r)}_i \tau^{(s)}_j \mathbb{Z}, \quad \text{and} \quad \sum_{k=1}^{n} a^{(r)}_{ik} d^{(s)}_{kj} + \sum_{k=1}^{n} b^{(r)}_{ik} c^{(s)}_{kj} \in u^2 \sigma^{(r)}_i \tau^{(s)}_j \mathbb{Z} \tag{25}$$

From (21) and (23), we obtain

$$|a^{(r)}_{ik} d^{(s)}_{kj}| \le 2^{-2\gamma} \sigma^{(r)}_i \tau^{(s)}_j \quad \text{and} \quad |b^{(r)}_{ik} c^{(s)}_{kj}| \le 2^{-2\gamma} \sigma^{(r)}_i \tau^{(s)}_j \tag{26}$$

which then implies

$$\sum_{k=1}^{n} |a^{(r)}_{ik} d^{(s)}_{kj}| \le n 2^{-2\gamma} \sigma^{(r)}_i \tau^{(s)}_j \quad \text{and} \quad \sum_{k=1}^{n} |b^{(r)}_{ik} c^{(s)}_{kj}| \le n 2^{-2\gamma} \sigma^{(r)}_i \tau^{(s)}_j \tag{27}$$

Obviously, the following hold:

$$\Big|\sum_{k=1}^{n} a^{(r)}_{ik} d^{(s)}_{kj}\Big| \le \sum_{k=1}^{n} |a^{(r)}_{ik} d^{(s)}_{kj}| \le n 2^{-2\gamma} \sigma^{(r)}_i \tau^{(s)}_j \quad \text{and} \quad \Big|\sum_{k=1}^{n} b^{(r)}_{ik} c^{(s)}_{kj}\Big| \le \sum_{k=1}^{n} |b^{(r)}_{ik} c^{(s)}_{kj}| \le n 2^{-2\gamma} \sigma^{(r)}_i \tau^{(s)}_j \tag{28}$$

and (28) implies that

$$\Big|\sum_{k=1}^{n} a^{(r)}_{ik} d^{(s)}_{kj} + \sum_{k=1}^{n} b^{(r)}_{ik} c^{(s)}_{kj}\Big| \le \Big|\sum_{k=1}^{n} a^{(r)}_{ik} d^{(s)}_{kj}\Big| + \Big|\sum_{k=1}^{n} b^{(r)}_{ik} c^{(s)}_{kj}\Big| \le 2n 2^{-2\gamma} \sigma^{(r)}_i \tau^{(s)}_j = n 2^{-2\gamma+1} \sigma^{(r)}_i \tau^{(s)}_j \tag{29}$$

Using the definition of $\gamma$, we find that

$$n 2^{-2\gamma+1} = n 2^{-2\left\lceil \frac{\log_2(n) - \log_2(u) + 1}{2} \right\rceil + 1} \le n 2^{-(\log_2(n) - \log_2(u) + 1) + 1} = n 2^{-\log_2(n) + \log_2(u)} = u \tag{30}$$

From (29) and (30), we find that

$$\Big|\sum_{k=1}^{n} a^{(r)}_{ik} d^{(s)}_{kj} + \sum_{k=1}^{n} b^{(r)}_{ik} c^{(s)}_{kj}\Big| \le u\,\sigma^{(r)}_i \tau^{(s)}_j \tag{31}$$

Using (24), (25), and (31), we obtain

$$\begin{cases} u^2 \sigma^{(r)}_i \tau^{(s)}_j \le \Big|\sum_{k=1}^{n} a^{(r)}_{ik} d^{(s)}_{kj} + \sum_{k=1}^{n} b^{(r)}_{ik} c^{(s)}_{kj}\Big| \le u\,\sigma^{(r)}_i \tau^{(s)}_j, & \text{if } \sum_{k=1}^{n} a^{(r)}_{ik} d^{(s)}_{kj} + \sum_{k=1}^{n} b^{(r)}_{ik} c^{(s)}_{kj} \ne 0 \\[4pt] \mathrm{fl}\Big(\sum_{k=1}^{n} a^{(r)}_{ik} d^{(s)}_{kj} + \sum_{k=1}^{n} b^{(r)}_{ik} c^{(s)}_{kj}\Big) = 0, & \text{if } \sum_{k=1}^{n} a^{(r)}_{ik} d^{(s)}_{kj} + \sum_{k=1}^{n} b^{(r)}_{ik} c^{(s)}_{kj} = 0 \end{cases} \tag{32}$$

From Remark 2 in [1], this means that there is no roundoff in $\mathrm{fl}\big(A^{(r)}D^{(s)} + B^{(r)}C^{(s)}\big)$, and this completes the proof. □
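The key inequality (30) can be sanity-checked numerically for several dimensions (a quick Python check, assuming $u = 2^{-53}$ for binary64):

```python
import math

u = 2.0**-53   # unit roundoff for binary64
for n in (1, 2, 10, 1000, 10**6):
    gamma = math.ceil((math.log2(n) - math.log2(u) + 1) / 2)   # eq. (8)
    # inequality (30): n * 2^(-2*gamma + 1) <= u
    assert n * 2.0 ** (-2 * gamma + 1) <= u
```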

Based on this theorem, we develop the following algorithms:

Algorithm 1 Split_AB

    function [E, F] = Split_AB(A, B)
        [~, n] = size(A);
        k = 1;
        u = 2^-53;
        gamma = ceil((log2(n) - log2(u) + 1) / 2);
        E{1} = zeros(size(A));
        F{1} = zeros(size(B));
        while norm(A, inf) ~= 0 && norm(B, inf) ~= 0
            mu_A = max(abs(A), [], 2);
            mu_B = max(abs(B), [], 2);
            mu = mu_A + mu_B;
            w = 2 .^ (ceil(log2(mu)) + gamma);
            S = repmat(w, 1, n);
            E{k} = (A + S) - S;
            F{k} = (B + S) - S;
            A = A - E{k};
            B = B - F{k};
            k = k + 1;
        end
    end


Algorithm 2 Split_CD

    function [G, H] = Split_CD(C, D)
        [n, ~] = size(C);   % n is the inner dimension (rows of C)
        k = 1;
        u = 2^-53;
        gamma = ceil((log2(n) - log2(u) + 1) / 2);
        G{1} = zeros(size(C));
        H{1} = zeros(size(D));
        while norm(C, inf) ~= 0 && norm(D, inf) ~= 0
            mu_C = max(abs(C));
            mu_D = max(abs(D));
            mu = mu_C + mu_D;
            w = 2 .^ (ceil(log2(mu)) + gamma);
            S = repmat(w, n, 1);
            G{k} = (C + S) - S;
            H{k} = (D + S) - S;
            C = C - G{k};
            D = D - H{k};
            k = k + 1;
        end
    end

Algorithm 3 EFT_ComMatMul

    function [K] = EFT_ComMatMul(A, B, C, D)
        [E, F] = Split_AB(A, B);
        [G, H] = Split_CD(C, D);
        q = 1;
        for i = 1 : length(E)
            for j = 1 : length(H)
                I{q} = E{i} * H{j};
                q = q + 1;
            end
        end
        r = 1;
        for i = 1 : length(F)
            for j = 1 : length(G)
                J{r} = F{i} * G{j};
                r = r + 1;
            end
        end
        for i = 1 : length(I)
            K{i} = I{i} + J{i};
        end
    end

2 Tentative Goal 3.3

In the simple application, there are four matrix multiplications, namely $AC$, $BD$, $AD$, and $BC$. If we let

$$P = A(C + D), \quad Q = (A + B)D, \quad \text{and} \quad R = B(C - D) \tag{33}$$

then

$$(A + Bi)(C + Di) = (P - Q) + (Q + R)i \tag{34}$$

In (34), there are only three matrix multiplications. We need to find a new splitting algorithm for $P$, $Q$, and $R$ to obtain the following:

A = Σ_{r=1}^{n_A} A^{(r)}, C + D = Σ_{s=1}^{n_{CpD}} S^{(s)}, A^{(r)} ∈ F^{m×n}, S^{(s)} ∈ F^{n×p}    (35)

A + B = Σ_{r=1}^{n_{ApB}} T^{(r)}, D = Σ_{s=1}^{n_D} D^{(s)}, T^{(r)} ∈ F^{m×n}, D^{(s)} ∈ F^{n×p}    (36)

B = Σ_{r=1}^{n_B} B^{(r)}, C − D = Σ_{s=1}^{n_{CmD}} U^{(s)}, B^{(r)} ∈ F^{m×n}, U^{(s)} ∈ F^{n×p}    (37)

where n_A, n_{CpD}, n_{ApB}, n_D, n_B, n_{CmD} ∈ N, such that

A^{(r)} S^{(s)} = fl(A^{(r)} S^{(s)}), T^{(r)} D^{(s)} = fl(T^{(r)} D^{(s)}) and B^{(r)} U^{(s)} = fl(B^{(r)} U^{(s)})    (38)

To illustrate the new splitting algorithm, we only use (35), while (36) and (37) follow accordingly.
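Before looking at the splitting itself, the three-multiplication identity (34) is easy to sanity-check numerically; the random matrices below are our own illustrative example:

```python
import numpy as np

# Check (A + Bi)(C + Di) = (P - Q) + (Q + R)i with
# P = A(C + D), Q = (A + B)D, R = B(C - D).
rng = np.random.default_rng(1)
A, B = rng.standard_normal((3, 4)), rng.standard_normal((3, 4))
C, D = rng.standard_normal((4, 5)), rng.standard_normal((4, 5))

P = A @ (C + D)
Q = (A + B) @ D
R = B @ (C - D)

ref = (A + 1j * B) @ (C + 1j * D)     # direct complex product
out = (P - Q) + 1j * (Q + R)          # three real products instead of four
print(np.allclose(out, ref))
```

The identity is exact in real arithmetic (P − Q = AC − BD and Q + R = AD + BC), so the two results agree up to ordinary rounding.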

Firstly, we can define a vector σ^{(1)} ∈ F^m as

σ_i^{(1)} = 2^γ · 2^{V_i^{(1)}}    (39)

where

V_i^{(1)} = ⌈log2 max_{1≤j≤n} |a_{ij}|⌉    (40)

and a_{ij} represents an element of the matrix A. Then, we use σ^{(1)} and implement the concept of the ExtractScalar algorithm on every element of A to obtain A = A^{(1)} + A^{(2)}. Again, we define σ^{(2)} ∈ F^m as

σ_i^{(2)} = 2^γ · 2^{V_i^{(2)}}    (41)

where

V_i^{(2)} = ⌈log2 max_{1≤j≤n} |a_{ij}^{(2)}|⌉    (42)

and a_{ij}^{(2)} represents an element of the matrix A^{(2)}. Then, we use σ^{(2)} and implement the concept of the ExtractScalar algorithm on every element of A^{(2)} to obtain the splitting A^{(2)} = \hat{A}^{(2)} + A^{(3)}, the hat marking the extracted part (again denoted A^{(2)} afterwards). The basic idea is to define σ^{(w)} ∈ F^m as

σ_i^{(w)} = 2^γ · 2^{V_i^{(w)}}    (43)

where

V_i^{(w)} = ⌈log2 max_{1≤j≤n} |a_{ij}^{(w)}|⌉    (44)

and a_{ij}^{(w)} represents an element of the matrix A^{(w)}. Then, we use σ^{(w)} and implement the concept of the ExtractScalar algorithm on every element of A^{(w)} to obtain A^{(w)} = \hat{A}^{(w)} + A^{(w+1)}. Implementing (43) and the basic idea explained above repeatedly results in

A = Σ_{r=1}^{n_A} A^{(r)} and A^{(n_A+1)} = O_{mn}    (45)

where O_{mn} is the zero matrix of size m × n.

Next, we define a vector τ^{(1)} ∈ F^p as

τ_j^{(1)} = 2^γ · 2^{W_j^{(1)}}    (46)

where

W_j^{(1)} = ⌈log2 ( max_{1≤i≤n} |c_{ij}| + max_{1≤i≤n} |d_{ij}| )⌉    (47)

and c_{ij} and d_{ij} represent the elements of the matrices C and D. Then, we use τ^{(1)} and implement the concept of the ExtractScalar algorithm on every element of C and D to obtain C = C^{(1)} + C^{(2)} and D = D^{(1)} + D^{(2)}. Again, we define τ^{(2)} ∈ F^p as

τ_j^{(2)} = 2^γ · 2^{W_j^{(2)}}    (48)

where

W_j^{(2)} = ⌈log2 ( max_{1≤i≤n} |c_{ij}^{(2)}| + max_{1≤i≤n} |d_{ij}^{(2)}| )⌉    (49)

and c_{ij}^{(2)} and d_{ij}^{(2)} represent the elements of the matrices C^{(2)} and D^{(2)}. Then, we use τ^{(2)} and implement the concept of the ExtractScalar algorithm on every element of C^{(2)} and D^{(2)} to obtain the splittings C^{(2)} = \hat{C}^{(2)} + C^{(3)} and D^{(2)} = \hat{D}^{(2)} + D^{(3)}, the hat marking the extracted parts. The basic idea is to define τ^{(w)} ∈ F^p as

τ_j^{(w)} = 2^γ · 2^{W_j^{(w)}}    (50)

where

W_j^{(w)} = ⌈log2 ( max_{1≤i≤n} |c_{ij}^{(w)}| + max_{1≤i≤n} |d_{ij}^{(w)}| )⌉    (51)

and c_{ij}^{(w)} and d_{ij}^{(w)} represent the elements of C^{(w)} and D^{(w)}. Then, we use τ^{(w)} and implement the concept of the ExtractScalar algorithm on every element of C^{(w)} and D^{(w)} to obtain C^{(w)} = \hat{C}^{(w)} + C^{(w+1)} and D^{(w)} = \hat{D}^{(w)} + D^{(w+1)}. We then implement (50) and the basic idea explained above repeatedly until the following hold:

C = Σ_{s=1}^{n_C} C^{(s)}, C^{(n_C+1)} = O_{np} and D = Σ_{s=1}^{n_D} D^{(s)}, D^{(n_D+1)} = O_{np}    (52)

where O_{np} is the zero matrix of size n × p. Through this step, we get n_C = n_D, and we can form

S^{(s)} = C^{(s)} + D^{(s)}    (53)

for 1 ≤ s ≤ n_C = n_D.
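The reason (53) involves no rounding is that c_{ij}^{(s)} and d_{ij}^{(s)} share the grid u τ_j^{(s)} Z and are small relative to τ_j^{(s)}. A scalar Python sketch (the sample values, and the use of a single entry instead of full matrices, are our own):

```python
import math
from fractions import Fraction

u = 2.0 ** -53
n = 100                                    # inner dimension of the product
gamma = math.ceil((math.log2(n) - math.log2(u) + 1) / 2)

c, d = 0.123456789, -3.14159e-3            # one entry of C and one of D
W = math.ceil(math.log2(abs(c) + abs(d)))  # joint magnitude exponent, cf. (47)
tau = 2.0 ** gamma * 2.0 ** W              # shared shift, cf. (46)

c1 = (c + tau) - tau                       # extracted parts lie on the same grid
d1 = (d + tau) - tau
s1 = c1 + d1                               # claimed exact

# verify in rational arithmetic that no rounding occurred
print(Fraction(s1) == Fraction(c1) + Fraction(d1))
```

Both c1 and d1 are multiples of the same power of two and bounded by 2^{−γ+1}τ, so their sum fits in a double without rounding.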

Theorem B

Let A ∈ F^{m×n} and C, D ∈ F^{n×p}. Implementing (43), (50) and the basic idea explained above repeatedly yields (45) and (52). It then implies that A^{(r)} S^{(s)} = fl(A^{(r)} S^{(s)}), where S^{(s)} is given by (53).

Proof

If we split A using σ^{(r)}, then the consequences of Lemma 3.3 in [2] yield the following:

a_{ij}^{(r)} ∈ u σ_i^{(r)} Z    (54)

and

|a_{ij}^{(r)}| ≤ 2^{−γ} σ_i^{(r)}    (55)

Similarly, if we use τ^{(s)} to split C and D, then

c_{ij}^{(s)} ∈ u τ_j^{(s)} Z and d_{ij}^{(s)} ∈ u τ_j^{(s)} Z    (56)

and

|c_{ij}^{(s)}| ≤ 2^{−γ} τ_j^{(s)} and |d_{ij}^{(s)}| ≤ 2^{−γ} τ_j^{(s)}    (57)

Using (53) and (56), we have

s_{ij}^{(s)} = c_{ij}^{(s)} + d_{ij}^{(s)} ∈ u τ_j^{(s)} Z    (58)

From (54) and (58), we also have

a_{ik}^{(r)} s_{kj}^{(s)} ∈ u² σ_i^{(r)} τ_j^{(s)} Z    (59)

which then implies

Σ_{k=1}^{n} a_{ik}^{(r)} s_{kj}^{(s)} ∈ u² σ_i^{(r)} τ_j^{(s)} Z    (60)

Using (53) and (57), we have

|s_{ij}^{(s)}| = |c_{ij}^{(s)} + d_{ij}^{(s)}| ≤ |c_{ij}^{(s)}| + |d_{ij}^{(s)}| ≤ 2^{−γ} τ_j^{(s)} + 2^{−γ} τ_j^{(s)} = 2 · 2^{−γ} τ_j^{(s)} = 2^{−γ+1} τ_j^{(s)}    (61)

or, more compactly,

|s_{ij}^{(s)}| ≤ 2^{−γ+1} τ_j^{(s)}    (62)

Using (55) and (62), we obtain

|a_{ik}^{(r)} s_{kj}^{(s)}| = |a_{ik}^{(r)}| |s_{kj}^{(s)}| ≤ 2^{−γ} σ_i^{(r)} · 2^{−γ+1} τ_j^{(s)} = 2^{−2γ+1} σ_i^{(r)} τ_j^{(s)}    (63)

which then implies

|Σ_{k=1}^{n} a_{ik}^{(r)} s_{kj}^{(s)}| ≤ Σ_{k=1}^{n} |a_{ik}^{(r)} s_{kj}^{(s)}| ≤ n 2^{−2γ+1} σ_i^{(r)} τ_j^{(s)}    (64)

Using the definition of γ given by (8), as we did in (30), we find that

n 2^{−2γ+1} ≤ u    (65)

From (64) and (65), we obtain

|Σ_{k=1}^{n} a_{ik}^{(r)} s_{kj}^{(s)}| ≤ u σ_i^{(r)} τ_j^{(s)}    (66)

Using the relations given by (59), (60) and (66), we conclude that

u² σ_i^{(r)} τ_j^{(s)} ≤ |Σ_{k=1}^{n} a_{ik}^{(r)} s_{kj}^{(s)}| ≤ u σ_i^{(r)} τ_j^{(s)}, if Σ_{k=1}^{n} a_{ik}^{(r)} s_{kj}^{(s)} ≠ 0;
fl(Σ_{k=1}^{n} a_{ik}^{(r)} s_{kj}^{(s)}) = 0, if Σ_{k=1}^{n} a_{ik}^{(r)} s_{kj}^{(s)} = 0    (67)

From Remark 2 in [1], this means that there is no roundoff in fl(A^{(r)} S^{(s)}), and this completes the proof.
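The key inequality n 2^{−2γ+1} ≤ u used in (65) can be spot-checked directly; this small Python loop is our own verification, evaluating the bound for several inner dimensions n:

```python
import math

u = 2.0 ** -53
for n in (2, 10, 100, 1000, 10 ** 6):
    # gamma = ceil((log2(n) - log2(u) + 1) / 2), as in the memo
    gamma = math.ceil((math.log2(n) - math.log2(u) + 1) / 2)
    assert n * 2.0 ** (-2 * gamma + 1) <= u, n
print("bound n * 2^(-2*gamma+1) <= u holds for all tested n")
```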

Based on this theorem, we develop the following algorithms:

Algorithm 4 Split_CpD

function [F, G] = Split_CpD(C, D)
    q = size(C, 1);
    k = 1;
    u = 2^-53;
    gamma = ceil((log2(q) - log2(u) + 1)/2);    % the constant required by Theorem B
    F{1} = zeros(size(C));
    G{1} = zeros(size(D));
    while norm(C, inf) ~= 0 && norm(D, inf) ~= 0
        mu_C = max(abs(C));
        mu_D = max(abs(D));
        mu = zeros(size(mu_C));
        for i = 1 : length(mu)
            mu(i) = mu_C(i) + mu_D(i);
        end
        w = 2.^(ceil(log2(mu)) + gamma);
        S = repmat(w, q, 1);
        F{k} = (C + S) - S;
        G{k} = (D + S) - S;
        C = C - F{k};
        D = D - G{k};
        k = k + 1;
    end
end

Algorithm 5 Equation12

function H = Equation12(A, C, D)
    E = SplitA(A);
    [F, G] = Split_CpD(C, D);
    S{1} = zeros(size(F{1}));
    for k = 1 : length(F)
        S{k} = F{k} + G{k};
    end
    l = 1;
    for i = 1 : length(E)
        for j = 1 : length(S)
            H{l} = E{i} * S{j};
            l = l + 1;
        end
    end
end

References

[1] Katsuhisa Ozaki, Takeshi Ogita, Shin'ichi Oishi, and Siegfried M. Rump. Error-free transformations of matrix multiplication by using fast routines of matrix multiplication and its applications. Numerical Algorithms, 59(1):95–118, 2012.

[2] Siegfried M. Rump, Takeshi Ogita, and Shin'ichi Oishi. Accurate floating-point summation part I: Faithful rounding. SIAM Journal on Scientific Computing, 31(1):189–224, 2008.


Error-Free Transformation for Complex Matrix Multiplication

Nurul Yakim Kazal1, Imam Mukhlash2, Chairul Imron3, Bandung Arry S.4, and Katsuhisa Ozaki5

1,2,3,4 Department of Mathematics, Institut Teknologi Sepuluh Nopember
5 Department of Mathematical Sciences, Shibaura Institute of Technology

July 24, 2020

1 Introduction

2 Overview of Error-Free Transformation for (Real) Matrix Multiplication

Let F be the set of floating-point numbers in IEEE 754. fl(·) means that all operations in the parentheses are evaluated by floating-point arithmetic. Let u be the roundoff unit, i.e., u = 2^{−24} for binary32 in IEEE 754 and u = 2^{−53} for binary64 in IEEE 754.

We briefly introduce the error-free transformation of matrix multiplication developed in [2]. For two given matrices A ∈ F^{m×n} and B ∈ F^{n×p}, we consider the error-free transformation of AB. Let

A^{(1)} = A and B^{(1)} = B.    (1)

We set two vectors v^{(k)} ∈ F^m and w^{(k)} ∈ F^p as follows:

v_i^{(k)} = ⌈log2 max_{1≤j≤n} |a_{ij}^{(k)}|⌉, w_j^{(k)} = ⌈log2 max_{1≤i≤n} |b_{ij}^{(k)}|⌉.    (2)

Here, we define exceptions:

max_{1≤j≤n} |a_{ij}^{(k)}| = 0 ⟹ v_i^{(k)} = 0, max_{1≤i≤n} |b_{ij}^{(k)}| = 0 ⟹ w_j^{(k)} = 0.

Next, the vectors v^{(k)} and w^{(k)} are respectively used to compute two further vectors, namely σ^{(k)} ∈ F^m and τ^{(k)} ∈ F^p, defined as

σ_i^{(k)} = fl(2^β · 2^{v_i^{(k)}}) = 2^β · 2^{v_i^{(k)}}, τ_j^{(k)} = fl(2^β · 2^{w_j^{(k)}}) = 2^β · 2^{w_j^{(k)}},    (3)

where β is given by

β = ⌈(log2 n − log2 u) / 2⌉.    (4)

As stated in Remark 3 of [2], the aim of obtaining the vectors σ^{(k)} and τ^{(k)} is to find f and g, powers of 2, such that

max_{1≤j≤n} |a_{ij}^{(k)}| ≤ f and max_{1≤i≤n} |b_{ij}^{(k)}| ≤ g.

Then, \hat{A}^{(k)}, A^{(k+1)}, \hat{B}^{(k)} and B^{(k+1)} satisfying A^{(k)} = \hat{A}^{(k)} + A^{(k+1)} and B^{(k)} = \hat{B}^{(k)} + B^{(k+1)}, where the hat marks the extracted part, are obtained by

\hat{a}_{ij}^{(k)} = fl((a_{ij}^{(k)} + σ_i^{(k)}) − σ_i^{(k)}), a_{ij}^{(k+1)} = fl(a_{ij}^{(k)} − \hat{a}_{ij}^{(k)})
\hat{b}_{ij}^{(k)} = fl((b_{ij}^{(k)} + τ_j^{(k)}) − τ_j^{(k)}), b_{ij}^{(k+1)} = fl(b_{ij}^{(k)} − \hat{b}_{ij}^{(k)})    (5)

The procedures involving (3) and (5) are simply adopted from Algorithm 3.2 constructed by Rump et al. [3], known as ExtractScalar. Consequently, we have the following:

a_{ij}^{(k)} = \hat{a}_{ij}^{(k)} + a_{ij}^{(k+1)}, |\hat{a}_{ij}^{(k)}| ≤ 2^{−β} σ_i^{(k)}, \hat{a}_{ij}^{(k)} ∈ u σ_i^{(k)} Z and |a_{ij}^{(k+1)}| ≤ u σ_i^{(k)}
b_{ij}^{(k)} = \hat{b}_{ij}^{(k)} + b_{ij}^{(k+1)}, |\hat{b}_{ij}^{(k)}| ≤ 2^{−β} τ_j^{(k)}, \hat{b}_{ij}^{(k)} ∈ u τ_j^{(k)} Z and |b_{ij}^{(k+1)}| ≤ u τ_j^{(k)}    (6)
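The properties in (6) can be observed directly on a single scalar. The following Python sketch of ExtractScalar (the sample value and the per-scalar choice of σ are our own) verifies that the split is error-free, that the high part lies on the grid uσZ, and that the remainder is bounded by uσ:

```python
import math
from fractions import Fraction

u = 2.0 ** -53
n = 50
beta = math.ceil((math.log2(n) - math.log2(u)) / 2)          # cf. (4)

a = math.pi / 10
sigma = 2.0 ** beta * 2.0 ** math.ceil(math.log2(abs(a)))    # cf. (2)-(3)

q = (a + sigma) - sigma        # extracted high-order part
r = a - q                      # low-order remainder, computed exactly

print(q + r == a)                                  # the split is error-free
print(Fraction(q) % Fraction(u * sigma) == 0)      # q lies in (u*sigma)*Z
print(abs(r) <= u * sigma)                         # remainder bound from (6)
```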

Theoretically, if we implement (2), (3) and (5) to A^{(k)} and B^{(k)}, for k = 1, 2, ..., then there exist n_A, n_B ∈ N such that

A = Σ_{r=1}^{n_A} A^{(r)}, B = Σ_{s=1}^{n_B} B^{(s)}, A^{(n_A+1)} = O_{mn} and B^{(n_B+1)} = O_{np}    (7)

where O_{mn} and O_{np}, respectively, represent zero matrices of size m × n and n × p. In (7), n_A and n_B depend on n and on the difference in the magnitudes of the elements in the rows of A and the columns of B. Practically, Ozaki et al. [2] developed the following algorithm, called Split_Mat, based on (2), (3) and (5) to obtain matrices D^{(r)} such that

A = Σ_{r=1}^{l′} D^{(r)}, l′ ≤ l.    (8)

It is important to note that (8) is achieved without rounding errors.

Algorithm 1 Split_Mat

function D = Split_Mat(A, l, delta)
    [m, n] = size(A);
    k = 1;
    u = 2^-52;
    beta = ceil((-log2(u) + log2(n))/2);
    D{1} = zeros(size(A));
    while k < l
        mu = max(abs(A), [], 2);
        if max(mu) == 0
            return
        end
        w = 2.^(ceil(log2(mu)) + beta);
        S = repmat(w, 1, n);
        D{k} = (A + S) - S;
        A = A - D{k};
        if nnz(D{k}) < delta*m*n
            D{k} = sparse(D{k});
        end
        k = k + 1;
    end
    if k == l
        D{k} = A;
    end
end

When Algorithm 1 is implemented, δ (satisfying 0 ≤ δ < 1) is the criterion for using the sparse format: the sparse representation is used if the number of nonzero entries in a matrix of size m × n is less than δmn. Also, we need to set l = ∞ if (7) is required. Then n_A is obtained such that A = Σ_{r=1}^{n_A} D^{(r)} holds or, written in MATLAB notation, A = Σ_{r=1}^{n_A} D{r}. Correspondingly, Algorithm 1 can also be used to split the matrix B such that B = Σ_{s=1}^{l′} E^{(s)}; this is done by applying E = Split_Mat(B', l, δ) and transposing each E{s} afterwards.

Next, for the matrices A^{(r)} and B^{(s)} given by (7),

A^{(r)} B^{(s)} = fl(A^{(r)} B^{(s)}), for 1 ≤ r ≤ n_A and 1 ≤ s ≤ n_B    (9)

is satisfied. It means that if we use floating-point arithmetic for A^{(r)} B^{(s)}, rounding errors never occur in the evaluation. Hence, we can obtain C^{(k)} such that

AB = Σ_{k=1}^{n_A n_B} C^{(k)}, C^{(1)} = fl(A^{(1)} B^{(1)}), ..., C^{(n_A n_B)} = fl(A^{(n_A)} B^{(n_B)}).    (10)

Here, AB is transformed into an unevaluated sum of n_A n_B floating-point matrices. If there is no large difference in the magnitudes of the elements of the multiplied matrices, n_A and n_B become 3, 4 or 5 in many cases. In practical implementation, Ozaki et al. [2] designed the following algorithm, named EFT_Mul, to compute C^{(k)} satisfying (10).

Algorithm 2 EFT_Mul

function C = EFT_Mul(A, B, delta)
    [m, n] = size(A);
    [n, p] = size(B);
    D = Split_Mat(A, inf, delta);
    nA = length(D);
    E = Split_Mat(B', inf, delta);
    nB = length(E);
    for r = 1 : nB
        E{r} = E{r}';
    end
    t = 1;
    for r = 1 : nA
        for s = 1 : nB
            C{t} = D{r} * E{s};
            t = t + 1;
        end
    end
end
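A compact Python/NumPy rendition of Split_Mat and EFT_Mul (our own sketch: no sparse handling, l fixed to ∞, u = 2^{−53}) lets one verify (10) end to end with exact rational arithmetic:

```python
import math
from fractions import Fraction
import numpy as np

def split_mat(A):
    """Row-wise error-free splitting, a sketch of Algorithm 1 without sparsity."""
    n = A.shape[1]
    beta = math.ceil((math.log2(n) + 53) / 2)        # cf. (4) with u = 2^-53
    parts = []
    while np.any(A):
        mu = np.max(np.abs(A), axis=1)
        with np.errstate(divide="ignore"):
            w = np.where(mu > 0,
                         2.0 ** (np.ceil(np.log2(np.where(mu > 0, mu, 1))) + beta),
                         0.0)
        S = w[:, None]
        D = (A + S) - S
        parts.append(D)
        A = A - D
    return parts

def eft_mul(A, B):
    """AB as an unevaluated sum of exactly computed products, cf. (10)."""
    Ds = split_mat(A.copy())
    Es = [Et.T for Et in split_mat(B.T.copy())]      # split B via its transpose
    return [D @ E for D in Ds for E in Es]

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 8))
B = rng.standard_normal((8, 2))
Cs = eft_mul(A, B)

# exact check: the unevaluated sum of the pieces equals the exact product
AB = [[sum(Fraction(A[i, k]) * Fraction(B[k, j]) for k in range(8))
       for j in range(2)] for i in range(3)]
ok = all(sum(Fraction(Ck[i, j]) for Ck in Cs) == AB[i][j]
         for i in range(3) for j in range(2))
print(len(Cs), ok)
```

The pieces C^{(k)} sum to the exact product entry by entry, which is precisely the error-free transformation (10).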

If we apply Algorithm 4.5 in [3] and Algorithm 7.4 in [4] to the sum of C^{(k)} in (10) componentwise, we can obtain accurate numerical results. Let R ∈ F^{m×p} be the result computed using Algorithm 4.5 in [3] and S ∈ F^{m×p} the result of Algorithm 7.4 in [4]; then

|R − AB| ≤ 2u|AB| and |S − AB| ≤ u|AB|    (11)

are satisfied. Here, an inequality between matrices holds elementwise. If (AB)_{ij} ≠ 0, then from (11) we have

|R_{ij} − (AB)_{ij}| / |AB|_{ij} ≤ 2u and |S_{ij} − (AB)_{ij}| / |AB|_{ij} ≤ u    (12)

If we use floating-point arithmetic directly on AB, and let Ĉ be the computed result, then

|Ĉ − AB| ≤ nu|A||B|    (13)

is satisfied [1]. Assuming (AB)_{ij} ≠ 0, from (13) we have

|Ĉ_{ij} − (AB)_{ij}| / |AB|_{ij} ≤ nu (|A||B|)_{ij} / |AB|_{ij}    (14)

In (14), the accuracy of the numerical result depends on n and the condition number of the dot product. On the other hand, the relative error in (11) is bounded by the constants 2u and u, i.e., it is independent of n and the condition number.
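The dependence on the condition number in (14) is easy to provoke. In the toy example below (our own data, with products chosen to be exactly representable so that only the summation matters), plain recursive summation loses every digit, while an accurate summation of the pieces — math.fsum standing in for Algorithm 4.5 in [3] — recovers the exact answer:

```python
import math

x = [1e16, 1.0, -1e16]
y = [1.0, 1.0, 1.0]             # exact dot product is 1.0, heavy cancellation

prods = [xi * yi for xi, yi in zip(x, y)]    # each product is exact here
naive = 0.0
for p in prods:
    naive += p                  # plain recursive summation, cf. (13)

accurate = math.fsum(prods)     # error-free summation of the pieces, cf. (11)
print(naive, accurate)          # 0.0 1.0
```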

3 Extended Use of Error-Free Transformation for Complex Matrix Multiplication

Given any complex matrices

Ã = A + Bi and C̃ = C + Di, for A, B ∈ F^{m×n} and C, D ∈ F^{n×p},

where i denotes the imaginary unit, we want to compute the multiplication of these matrices, namely

Ã C̃ = (A + Bi)(C + Di)

3.1 Simple Application

We start the discussion from the following equation:

Ã C̃ = (A + Bi)(C + Di) = (AC − BD) + (AD + BC)i    (15)

By adapting the concept of splitting real matrices, we first need to set

A + Bi = A^{(1)} + B^{(1)} i and C + Di = C^{(1)} + D^{(1)} i.    (16)

Then, vectors v^{(k)}, w^{(k)} ∈ F^m and x^{(k)}, y^{(k)} ∈ F^p are defined as follows:

v_i^{(k)} = ⌈log2 max_{1≤j≤n} |a_{ij}^{(k)}|⌉, w_i^{(k)} = ⌈log2 max_{1≤j≤n} |b_{ij}^{(k)}|⌉,
x_j^{(k)} = ⌈log2 max_{1≤i≤n} |c_{ij}^{(k)}|⌉, y_j^{(k)} = ⌈log2 max_{1≤i≤n} |d_{ij}^{(k)}|⌉.    (17)

Here, a_{ij}^{(k)}, b_{ij}^{(k)}, c_{ij}^{(k)} and d_{ij}^{(k)} represent the elements of the matrices A^{(k)}, B^{(k)}, C^{(k)} and D^{(k)}, respectively. Next, we also define vectors σ_A^{(k)}, σ_B^{(k)} ∈ F^m and τ_C^{(k)}, τ_D^{(k)} ∈ F^p by

σ_{A,i}^{(k)} = fl(2^β · 2^{v_i^{(k)}}) = 2^β · 2^{v_i^{(k)}}, σ_{B,i}^{(k)} = fl(2^β · 2^{w_i^{(k)}}) = 2^β · 2^{w_i^{(k)}},
τ_{C,j}^{(k)} = fl(2^β · 2^{x_j^{(k)}}) = 2^β · 2^{x_j^{(k)}}, τ_{D,j}^{(k)} = fl(2^β · 2^{y_j^{(k)}}) = 2^β · 2^{y_j^{(k)}},    (18)

where β is given by (4) in Section 2. To find (\hat{A}^{(k)} + \hat{B}^{(k)} i), (A^{(k+1)} + B^{(k+1)} i), (\hat{C}^{(k)} + \hat{D}^{(k)} i) and (C^{(k+1)} + D^{(k+1)} i), with the hat marking the extracted part, such that

(A^{(k)} + B^{(k)} i) = (\hat{A}^{(k)} + \hat{B}^{(k)} i) + (A^{(k+1)} + B^{(k+1)} i) and
(C^{(k)} + D^{(k)} i) = (\hat{C}^{(k)} + \hat{D}^{(k)} i) + (C^{(k+1)} + D^{(k+1)} i)    (19)

hold, we can now apply the concept of Algorithm 3.2 in [3] as follows:

\hat{a}_{ij}^{(k)} = fl((a_{ij}^{(k)} + σ_{A,i}^{(k)}) − σ_{A,i}^{(k)}), a_{ij}^{(k+1)} = fl(a_{ij}^{(k)} − \hat{a}_{ij}^{(k)})
\hat{b}_{ij}^{(k)} = fl((b_{ij}^{(k)} + σ_{B,i}^{(k)}) − σ_{B,i}^{(k)}), b_{ij}^{(k+1)} = fl(b_{ij}^{(k)} − \hat{b}_{ij}^{(k)})
\hat{c}_{ij}^{(k)} = fl((c_{ij}^{(k)} + τ_{C,j}^{(k)}) − τ_{C,j}^{(k)}), c_{ij}^{(k+1)} = fl(c_{ij}^{(k)} − \hat{c}_{ij}^{(k)})
\hat{d}_{ij}^{(k)} = fl((d_{ij}^{(k)} + τ_{D,j}^{(k)}) − τ_{D,j}^{(k)}), d_{ij}^{(k+1)} = fl(d_{ij}^{(k)} − \hat{d}_{ij}^{(k)})    (20)

where a_{ij}^{(k)}, b_{ij}^{(k)}, c_{ij}^{(k)} and d_{ij}^{(k)} are the elements of the matrices A^{(k)}, B^{(k)}, C^{(k)} and D^{(k)}, respectively. Note that the procedures (16), (17), (18) and (20) are based on (1), (2), (3) and (5), respectively. Then, applying (17), (18) and (20) to (A^{(k)} + B^{(k)} i) and (C^{(k)} + D^{(k)} i), for k = 1, 2, ..., yields

A + Bi = Σ_{r=1}^{n_{AB}} (A^{(r)} + B^{(r)} i), (A^{(n_{AB}+1)} + B^{(n_{AB}+1)} i) = O_{mn}
C + Di = Σ_{s=1}^{n_{CD}} (C^{(s)} + D^{(s)} i), (C^{(n_{CD}+1)} + D^{(n_{CD}+1)} i) = O_{np},    (21)

where A^{(r)}, B^{(r)} ∈ F^{m×n} and C^{(s)}, D^{(s)} ∈ F^{n×p}. Since (7) and (21) are obtained from mathematically similar processes, (9) suggests that

fl(A^{(r)} C^{(s)}) = A^{(r)} C^{(s)}, fl(B^{(r)} D^{(s)}) = B^{(r)} D^{(s)},
fl(A^{(r)} D^{(s)}) = A^{(r)} D^{(s)} and fl(B^{(r)} C^{(s)}) = B^{(r)} C^{(s)}    (22)

hold for 1 ≤ r ≤ n_{AB} and 1 ≤ s ≤ n_{CD}, with n_{AB}, n_{CD} ∈ N. Hence, we have

AC = Σ_{i=1}^{n_{AB} n_{CD}} E^{(i)}, BD = Σ_{j=1}^{n_{AB} n_{CD}} F^{(j)}, AD = Σ_{k=1}^{n_{AB} n_{CD}} G^{(k)} and BC = Σ_{l=1}^{n_{AB} n_{CD}} H^{(l)}    (23)

which means that AC, BD, AD and BC are all transformed into unevaluated sums of n_{AB} n_{CD} floating-point matrices. Then, (23) implies the following:

AC − BD = Σ_{i=1}^{n_{AB} n_{CD}} E^{(i)} − Σ_{j=1}^{n_{AB} n_{CD}} F^{(j)} and AD + BC = Σ_{k=1}^{n_{AB} n_{CD}} G^{(k)} + Σ_{l=1}^{n_{AB} n_{CD}} H^{(l)}.    (24)

Both summations (AC − BD) and (AD + BC) given by (24) are kept unevaluated, which means that each is transformed into an unevaluated sum of 2 n_{AB} n_{CD} floating-point matrices. It is also worth noting that the procedures for obtaining (21), (23) and (24) are performed with no rounding errors involved. Therefore, an error-free transformation for complex matrix multiplication satisfying (15) is achieved.
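The splitting of this subsection can be sketched in Python by carrying the real and imaginary shifts in a single complex number, which is also how the algorithm presented next operates; this NumPy version (our own sketch, without sparse handling) checks exact reconstruction of Ã in rational arithmetic:

```python
import math
from fractions import Fraction
import numpy as np

def split_comp(At):
    """Split a complex matrix row-wise; real/imag parts use their own shifts,
    carried together in one complex shift (a sketch, not the MATLAB code)."""
    n = At.shape[1]
    beta = math.ceil((math.log2(n) + 53) / 2)

    def shift(M):
        mu = np.max(np.abs(M), axis=1)
        with np.errstate(divide="ignore"):
            return np.where(mu > 0,
                            2.0 ** (np.ceil(np.log2(np.where(mu > 0, mu, 1))) + beta),
                            0.0)

    parts = []
    while np.any(At):
        S = (shift(At.real) + 1j * shift(At.imag))[:, None]
        Ek = (At + S) - S          # extracts real and imaginary parts at once
        parts.append(Ek)
        At = At - Ek
    return parts

rng = np.random.default_rng(3)
At = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5)) * 1e6
parts = split_comp(At.copy())

ok = all(sum(Fraction(P[i, j].real) for P in parts) == Fraction(At[i, j].real)
         and sum(Fraction(P[i, j].imag) for P in parts) == Fraction(At[i, j].imag)
         for i in range(3) for j in range(5))
print(len(parts), ok)
```

Complex addition acts componentwise, so one complex shift performs the two real extractions of (20) simultaneously.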

In practical implementation, we present the Split_CompMat_1 algorithm to split the complex matrix Ã = (A + Bi) based on (16), (17), (18) and (20) such that

Ã = Σ_{r=1}^{l′} E^{(r)}, E^{(r)} = A^{(r)} + B^{(r)} i and l′ ≤ l    (25)

is satisfied without rounding errors.

Algorithm 3 Split_CompMat_1

function E = Split_CompMat_1(At, l, delta)
    % At is a complex matrix
    [m, n] = size(At);
    k = 1;
    u = 2^-53;    % double precision (binary64)
    beta = ceil((-log2(u) + log2(n))/2);
    while k < l
        mu_A = max(abs(real(At)), [], 2);
        mu_B = max(abs(imag(At)), [], 2);
        w_A = 2.^(ceil(log2(mu_A)) + beta);
        w_B = 2.^(ceil(log2(mu_B)) + beta);
        S = repmat(complex(w_A, w_B), 1, n);
        E{k} = (At + S) - S;
        At = At - E{k};
        if nnz(E{k}) < delta*m*n
            E{k} = sparse(E{k});
        end
        k = k + 1;
    end
    if k == l
        E{k} = At;
    end
end

Since Split_CompMat_1 is basically an adaptation of Split_Mat in [2] for splitting complex matrices, the values of δ and l are set to those used when executing Algorithm 1. Also, this algorithm can be used for splitting the complex matrix C̃ = (C + Di) such that

C̃ = Σ_{s=1}^{l′} F^{(s)}, F^{(s)} = C^{(s)} + D^{(s)} i and l′ ≤ l    (26)

holds, and this is done by executing

F = cellfun(@transpose, Split_CompMat_1(Ct.', l, delta), 'UniformOutput', false)

in MATLAB notation, where Ct stores C̃.

Next, Algorithm 3 leads to the next algorithm, called EFT_CompMul_1. Given any complex matrices Ã = (A + Bi) and C̃ = (C + Di), the EFT_CompMul_1 algorithm computes (AC − BD) and (AD + BC) satisfying (24) such that (15) holds.

Algorithm 4 EFT_CompMul_1

function res = EFT_CompMul_1(At, Ct, l, delta)
    % At and Ct are complex matrices
    ApB = Split_CompMat_1(At, l, delta);
    CpD = cellfun(@transpose, Split_CompMat_1(Ct.', l, delta), 'UniformOutput', false);
    N_AB = length(ApB);
    N_CD = length(CpD);
    k = 1;
    for r = 1 : N_AB
        for s = 1 : N_CD
            AC{k} = real(ApB{r}) * real(CpD{s});
            BD{k} = -imag(ApB{r}) * imag(CpD{s});
            AD{k} = real(ApB{r}) * imag(CpD{s});
            BC{k} = imag(ApB{r}) * real(CpD{s});
            k = k + 1;
        end
    end
    N_AC = length(AC);
    for i = (N_AC + 1) : (N_AC + length(BD))
        AC{i} = BD{i - N_AC};
        AD{i} = BC{i - N_AC};
    end
    N_AC = length(AC);
    for j = 1 : N_AC
        res{j} = complex(AC{j}, AD{j});
    end
end

3.2 Error-Free Transformation of Complex Matrices

Here, the error-free transformation for complex matrix multiplication is obtained by proposing a new splitting algorithm for the complex matrices Ã = (A + Bi) and C̃ = (C + Di), such that (21) and

fl(A^{(r)} C^{(s)} − B^{(r)} D^{(s)}) = A^{(r)} C^{(s)} − B^{(r)} D^{(s)} and fl(A^{(r)} D^{(s)} + B^{(r)} C^{(s)}) = A^{(r)} D^{(s)} + B^{(r)} C^{(s)}    (27)

are satisfied. In order to do that, we first define a constant γ as follows:

γ := ⌈(log2(n) − log2(u) + 1) / 2⌉    (28)

By recalling the proof of Theorem 1 in [2], we find that β needs to satisfy

(log2(n) − log2(u)) / 2 ≤ β    (29)

such that (22) holds. Since (4), (28) and (29) suggest that

(log2(n) − log2(u)) / 2 ≤ β = ⌈(log2(n) − log2(u)) / 2⌉ ≤ ⌈(log2(n) − log2(u) + 1) / 2⌉ = γ,    (30)

γ is valid to guarantee that (22) is still satisfied.

Next, similar to the procedures done in Subsection 3.1, (16) is set and two vectors, namely p^{(k)} ∈ F^m and q^{(k)} ∈ F^p, are defined by

p_i^{(k)} = ⌈log2 max( max_{1≤j≤n} |a_{ij}^{(k)}|, max_{1≤j≤n} |b_{ij}^{(k)}| )⌉ and q_j^{(k)} = ⌈log2 max( max_{1≤i≤n} |c_{ij}^{(k)}|, max_{1≤i≤n} |d_{ij}^{(k)}| )⌉.    (31)

Then, p_i^{(k)} and q_j^{(k)} are used for computing σ^{(k)} ∈ F^m and τ^{(k)} ∈ F^p as follows:

σ_i^{(k)} = 2^γ · 2^{p_i^{(k)}} and τ_j^{(k)} = 2^γ · 2^{q_j^{(k)}}.    (32)

Note that a_{ij}^{(k)}, b_{ij}^{(k)}, c_{ij}^{(k)} and d_{ij}^{(k)} are the elements of A^{(k)}, B^{(k)}, C^{(k)} and D^{(k)} in (16), respectively. In order to satisfy (19), (\hat{A}^{(k)} + \hat{B}^{(k)} i), (A^{(k+1)} + B^{(k+1)} i), (\hat{C}^{(k)} + \hat{D}^{(k)} i) and (C^{(k+1)} + D^{(k+1)} i), with the hat marking the extracted part, are then computed by applying the concept of Algorithm 3.2 in [3] as follows:

\hat{a}_{ij}^{(k)} = fl((a_{ij}^{(k)} + σ_i^{(k)}) − σ_i^{(k)}), a_{ij}^{(k+1)} = fl(a_{ij}^{(k)} − \hat{a}_{ij}^{(k)})
\hat{b}_{ij}^{(k)} = fl((b_{ij}^{(k)} + σ_i^{(k)}) − σ_i^{(k)}), b_{ij}^{(k+1)} = fl(b_{ij}^{(k)} − \hat{b}_{ij}^{(k)})
\hat{c}_{ij}^{(k)} = fl((c_{ij}^{(k)} + τ_j^{(k)}) − τ_j^{(k)}), c_{ij}^{(k+1)} = fl(c_{ij}^{(k)} − \hat{c}_{ij}^{(k)})
\hat{d}_{ij}^{(k)} = fl((d_{ij}^{(k)} + τ_j^{(k)}) − τ_j^{(k)}), d_{ij}^{(k+1)} = fl(d_{ij}^{(k)} − \hat{d}_{ij}^{(k)})    (33)

Again, the procedures (16), (31), (32) and (33) are based on (1), (2), (3) and (5). Next, implementing (31), (32) and (33) on (A^{(k)} + B^{(k)} i) and (C^{(k)} + D^{(k)} i), for k = 1, 2, ..., yields results satisfying (21).

Theorem A

Let Ã = (A + Bi) and C̃ = (C + Di), with A, B ∈ F^{m×n} and C, D ∈ F^{n×p}, be two complex matrices. Implementing (31), (32) and (33) on (A^{(k)} + B^{(k)} i) and (C^{(k)} + D^{(k)} i), for k = 1, 2, ..., results in (21), and it implies that (27) holds.

Proof

We have shown that applying (31), (32) and (33) to (A^{(k)} + B^{(k)} i) and (C^{(k)} + D^{(k)} i), for k = 1, 2, ..., results in (21). Hence, it suffices to show that (27) is satisfied. Firstly, we want to demonstrate that

fl(A^{(r)} C^{(s)} − B^{(r)} D^{(s)}) = A^{(r)} C^{(s)} − B^{(r)} D^{(s)}

Since we use σ^{(r)} to obtain (A^{(r)} + B^{(r)} i) satisfying (21), (6) suggests that

a_{ij}^{(r)} ∈ u σ_i^{(r)} Z and b_{ij}^{(r)} ∈ u σ_i^{(r)} Z    (34)

|a_{ij}^{(r)}| ≤ 2^{−γ} σ_i^{(r)} and |b_{ij}^{(r)}| ≤ 2^{−γ} σ_i^{(r)}    (35)

Similarly, τ^{(s)} is used to split the matrix C̃ = (C + Di) in order to obtain (21); then we have the following:

c_{ij}^{(s)} ∈ u τ_j^{(s)} Z and d_{ij}^{(s)} ∈ u τ_j^{(s)} Z    (36)

|c_{ij}^{(s)}| ≤ 2^{−γ} τ_j^{(s)} and |d_{ij}^{(s)}| ≤ 2^{−γ} τ_j^{(s)}    (37)

From (34) and (36), we obtain

a_{ik}^{(r)} c_{kj}^{(s)} ∈ u² σ_i^{(r)} τ_j^{(s)} Z, b_{ik}^{(r)} d_{kj}^{(s)} ∈ u² σ_i^{(r)} τ_j^{(s)} Z and a_{ik}^{(r)} c_{kj}^{(s)} − b_{ik}^{(r)} d_{kj}^{(s)} ∈ u² σ_i^{(r)} τ_j^{(s)} Z    (38)

which implies

Σ_{k=1}^{n} a_{ik}^{(r)} c_{kj}^{(s)} ∈ u² σ_i^{(r)} τ_j^{(s)} Z, Σ_{k=1}^{n} b_{ik}^{(r)} d_{kj}^{(s)} ∈ u² σ_i^{(r)} τ_j^{(s)} Z and Σ_{k=1}^{n} a_{ik}^{(r)} c_{kj}^{(s)} − Σ_{k=1}^{n} b_{ik}^{(r)} d_{kj}^{(s)} ∈ u² σ_i^{(r)} τ_j^{(s)} Z.    (39)

From (35) and (37), we obtain

|a_{ik}^{(r)} c_{kj}^{(s)}| ≤ 2^{−2γ} σ_i^{(r)} τ_j^{(s)} and |b_{ik}^{(r)} d_{kj}^{(s)}| ≤ 2^{−2γ} σ_i^{(r)} τ_j^{(s)}    (40)

which implies

Σ_{k=1}^{n} |a_{ik}^{(r)} c_{kj}^{(s)}| ≤ n 2^{−2γ} σ_i^{(r)} τ_j^{(s)} and Σ_{k=1}^{n} |b_{ik}^{(r)} d_{kj}^{(s)}| ≤ n 2^{−2γ} σ_i^{(r)} τ_j^{(s)}.    (41)

Using the properties of absolute value, we find that

|Σ_{k=1}^{n} a_{ik}^{(r)} c_{kj}^{(s)}| ≤ Σ_{k=1}^{n} |a_{ik}^{(r)} c_{kj}^{(s)}| ≤ n 2^{−2γ} σ_i^{(r)} τ_j^{(s)} and |Σ_{k=1}^{n} b_{ik}^{(r)} d_{kj}^{(s)}| ≤ Σ_{k=1}^{n} |b_{ik}^{(r)} d_{kj}^{(s)}| ≤ n 2^{−2γ} σ_i^{(r)} τ_j^{(s)}    (42)

and (42) implies that

|Σ_{k=1}^{n} a_{ik}^{(r)} c_{kj}^{(s)} − Σ_{k=1}^{n} b_{ik}^{(r)} d_{kj}^{(s)}| ≤ |Σ_{k=1}^{n} a_{ik}^{(r)} c_{kj}^{(s)}| + |Σ_{k=1}^{n} b_{ik}^{(r)} d_{kj}^{(s)}| ≤ 2n 2^{−2γ} σ_i^{(r)} τ_j^{(s)} = n 2^{−2γ+1} σ_i^{(r)} τ_j^{(s)}    (43)

Using the definition of γ, we find that

n 2^{−2γ+1} = n 2^{−2⌈(log2(n) − log2(u) + 1)/2⌉ + 1} ≤ n 2^{−(log2(n) − log2(u) + 1) + 1} = n 2^{−log2(n) + log2(u)} = u    (44)

From (43) and (44), we find that

|Σ_{k=1}^{n} a_{ik}^{(r)} c_{kj}^{(s)} − Σ_{k=1}^{n} b_{ik}^{(r)} d_{kj}^{(s)}| ≤ u σ_i^{(r)} τ_j^{(s)}    (45)

Using (38), (39) and (45), we obtain

u² σ_i^{(r)} τ_j^{(s)} ≤ |Σ_{k=1}^{n} a_{ik}^{(r)} c_{kj}^{(s)} − Σ_{k=1}^{n} b_{ik}^{(r)} d_{kj}^{(s)}| ≤ u σ_i^{(r)} τ_j^{(s)}, if the difference is nonzero;
fl(Σ_{k=1}^{n} a_{ik}^{(r)} c_{kj}^{(s)} − Σ_{k=1}^{n} b_{ik}^{(r)} d_{kj}^{(s)}) = 0, if it is zero    (46)

From Remark 2 in [2], this means that there are no rounding errors in the evaluation of fl(A^{(r)} C^{(s)} − B^{(r)} D^{(s)}). In other words, fl(A^{(r)} C^{(s)} − B^{(r)} D^{(s)}) = A^{(r)} C^{(s)} − B^{(r)} D^{(s)} is satisfied. Using a similar idea, we also obtain fl(A^{(r)} D^{(s)} + B^{(r)} C^{(s)}) = A^{(r)} D^{(s)} + B^{(r)} C^{(s)}, and this completes the proof.
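The effect of the stronger constant γ can be seen even at the scalar level: with the shared shifts of (31)–(32), the floating-point evaluation of a^{(r)} c^{(s)} − b^{(r)} d^{(s)} commits no rounding. A Python check with toy values of our own choosing:

```python
import math
from fractions import Fraction

u = 2.0 ** -53
n = 64
gamma = math.ceil((math.log2(n) - math.log2(u) + 1) / 2)     # cf. (28)

a, b = 0.7853981633974483, -0.577215664901532    # one entry of A and of B
c, d = 1.4142135623730951, 2.718281828459045     # one entry of C and of D

# shared shifts over the real/imaginary pair, cf. (31)-(32)
sigma = 2.0 ** gamma * 2.0 ** math.ceil(math.log2(max(abs(a), abs(b))))
tau = 2.0 ** gamma * 2.0 ** math.ceil(math.log2(max(abs(c), abs(d))))

a1, b1 = (a + sigma) - sigma, (b + sigma) - sigma
c1, d1 = (c + tau) - tau, (d + tau) - tau

lhs = a1 * c1 - b1 * d1                          # evaluated in floating point
exact = Fraction(a1) * Fraction(c1) - Fraction(b1) * Fraction(d1)
print(Fraction(lhs) == exact)
```

All four extracted values lie on grids coarse enough that the two products and their difference stay exactly representable.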

∎

Practically, we introduce the following algorithm, named Split_CompMat_2, which is used for splitting the complex matrix Ã = (A + Bi) based on (16), (31), (32) and (33) such that (27) is satisfied.

Algorithm 5 Split_CompMat_2

function E = Split_CompMat_2(At, l, delta)
    % At is a complex matrix
    [m, n] = size(At);
    k = 1;
    u = 2^-53;    % double precision (binary64)
    gamma = ceil((-log2(u) + log2(n) + 1)/2);
    while k < l
        mu_A = max(abs(real(At)), [], 2);
        mu_B = max(abs(imag(At)), [], 2);
        mu = max(mu_A, mu_B);
        w = 2.^(ceil(log2(mu)) + gamma);
        S = repmat(complex(w, w), 1, n);
        E{k} = (At + S) - S;
        At = At - E{k};
        if nnz(E{k}) < delta*m*n
            E{k} = sparse(E{k});
        end
        k = k + 1;
    end
    if k == l
        E{k} = At;
    end
end

Again, l and δ are set to the values acting as inputs of Algorithms 1 and 3. Moreover, to split the complex matrix C̃ = (C + Di) based on (16), (31), (32) and (33) such that

C̃ = Σ_{s=1}^{l′} F^{(s)}, F^{(s)} = C^{(s)} + D^{(s)} i and l′ ≤ l

and (27) hold, we need to run

F = cellfun(@transpose, Split_CompMat_2(Ct.', l, delta), 'UniformOutput', false)

in MATLAB notation, where Ct stores C̃.

Next, we construct the EFT_CompMul_2 algorithm, an error-free transformation for the multiplication of the complex matrices Ã = (A + Bi) and C̃ = (C + Di) such that (15) is achieved. It is worth noting that the construction of this algorithm is based on Theorem A.

Algorithm 6 EFT_CompMul_2

function G = EFT_CompMul_2(At, Ct, l, delta)
    % At and Ct are complex matrices
    E = Split_CompMat_2(At, l, delta);
    F = cellfun(@transpose, Split_CompMat_2(Ct.', l, delta), 'UniformOutput', false);
    k = 1;
    for i = 1 : length(E)
        for j = 1 : length(F)
            G{k} = E{i} * F{j};
            k = k + 1;
        end
    end
end

3.3 Other Forms

In Subsection 3.1, there are four matrix multiplications, namely AC, BD, AD and BC. If we let

P = A(C + D), Q = (A + B)D and R = B(C − D)    (47)

then

(A + Bi)(C + Di) = (P − Q) + (Q + R)i    (48)

In (47), there are only three matrix multiplications. We want to find a new splitting algorithm for P, Q and R to obtain

A = Σ_{r=1}^{n_A} A^{(r)}, C + D = Σ_{s=1}^{n_{CpD}} S^{(s)}, A^{(r)} ∈ F^{m×n}, S^{(s)} ∈ F^{n×p}    (49)

A + B = Σ_{r=1}^{n_{ApB}} T^{(r)}, D = Σ_{s=1}^{n_D} D^{(s)}, T^{(r)} ∈ F^{m×n}, D^{(s)} ∈ F^{n×p}    (50)

B = Σ_{r=1}^{n_B} B^{(r)}, C − D = Σ_{s=1}^{n_{CmD}} U^{(s)}, B^{(r)} ∈ F^{m×n}, U^{(s)} ∈ F^{n×p}    (51)

where n_A, n_{CpD}, n_{ApB}, n_D, n_B, n_{CmD} ∈ N, such that

A^{(r)} S^{(s)} = fl(A^{(r)} S^{(s)}), T^{(r)} D^{(s)} = fl(T^{(r)} D^{(s)}) and B^{(r)} U^{(s)} = fl(B^{(r)} U^{(s)})    (52)

To illustrate the new splitting algorithm, we only use (49) and show that A^{(r)} S^{(s)} = fl(A^{(r)} S^{(s)}) holds, while (50) and (51) follow accordingly.

Firstly, (16) is set and the vector t^{(k)} ∈ F^m is defined as

t_i^{(k)} = ⌈log2 max_{1≤j≤n} |a_{ij}^{(k)}|⌉.    (53)

Using (53), we define the vector σ^{(k)} ∈ F^m by

σ_i^{(k)} = 2^γ · 2^{t_i^{(k)}}    (54)

where a_{ij}^{(k)} denotes an element of the matrix A^{(k)} in (16). To compute \hat{A}^{(k)} satisfying A^{(k)} = \hat{A}^{(k)} + A^{(k+1)}, with the hat marking the extracted part, the concept of Algorithm 3.2 in [3] is implemented as follows:

\hat{a}_{ij}^{(k)} = fl((a_{ij}^{(k)} + σ_i^{(k)}) − σ_i^{(k)}), a_{ij}^{(k+1)} = fl(a_{ij}^{(k)} − \hat{a}_{ij}^{(k)})    (55)

where a_{ij}^{(k)} represents an entry of the matrix A^{(k)}. Now, if the procedures (53), (54) and (55) are applied to A^{(k)}, for k = 1, 2, ..., then we obtain

A = Σ_{r=1}^{n_A} A^{(r)} and A^{(n_A+1)} = O_{mn}    (56)

where O_{mn} is the zero matrix of size m × n. It is worth noting that (53), (54) and (55) are respectively based on (2), (3) and (5). Concretely, we present the algorithm called Split_Mat_Mod, which is used for splitting the real part of the complex matrix Ã = (A + Bi) such that (8) is achieved.

[Tables referenced in the draft:
Table 1: Comparison of computing time ratio for all methods
Table 2: Comparison of relative error (Real part) for n = 1000
Table 3: Comparison of relative error (Imaginary part) for n = 1000]
