2.8 An Application: Randomized Computation of Singular Vectors
2.8.1 Rank-$r$ Approximation of an Arbitrary Matrix
Although the assumption $\|A\|_2 = 1$ required by Algorithm 2 can easily be satisfied by normalizing the data matrix $A$ with its largest singular value, computing the largest singular value itself may not be practical, especially when $A$ has large dimensions.
It is, in fact, possible to remove this assumption by introducing a normalization step into the algorithm. Furthermore, it is also possible to extend Algorithm 2 in such a way that it converges to the dominant $r$ singular vectors of $A$ together with the top $r$ singular values for an arbitrary value of $r$. The extended version of the algorithm is presented in Algorithm 3.
Algorithm 3 differs from Algorithm 2 in three ways. Firstly, the vector variables $u$ and $v$ in Algorithm 2 are extended to matrices with $r$ columns. Secondly, Algorithm 3 uses an auxiliary variable $C$. Thirdly, and most importantly, Algorithm 3 uses a QR decomposition (Line 6) that serves as the normalization step.
More precisely, instead of updating the variable $U$ directly, Algorithm 3 first updates the auxiliary variable $C$ (Line 5), and then updates $U$ as the unitary part of the QR decomposition of $C$. We note that the matrix $T$ in Line 6 of the algorithm denotes the upper-triangular part of the QR decomposition of $C$. Without loss of generality, it is assumed that $T$ has non-negative diagonal entries and that its diagonal entries are in descending order.
Although the convergence of Algorithm 2 is ensured by Theorem 2.6, we do not provide an explicit proof for the convergence of Algorithm 3. Nevertheless, by virtue of Theorem 2.6 we can argue for the convergence of Algorithm 3, since it is a natural extension of Algorithm 2 with an additional normalization step. We observe that the variables of Algorithm 3 converge as follows:
$$U \to \tilde{U}_r, \qquad T \to \Sigma_r^{2}, \qquad V \to \tilde{V}_r\,\Sigma_r, \qquad (2.104)$$
where $\tilde{U}_r$ and $\tilde{V}_r$ are the first $r$ columns of $\tilde{U}$ and $\tilde{V}$, respectively, and $\Sigma_r$ is the top-left $r \times r$ block of $\Sigma$. Thus, the product $UV^{H}$ converges to $\tilde{U}_r (\tilde{V}_r \Sigma_r)^{H} = A_r$, which is the best rank-$r$ approximation of $A$, i.e., $A_r = \tilde{U}_r \Sigma_r \tilde{V}_r^{H}$.
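Since the listing of Algorithm 3 is not reproduced in this excerpt, the following NumPy sketch only mirrors the structure described above: a random index (Line 3) triggers either an update of one row of the auxiliary variable $C$ followed by the QR-based normalization (Lines 5 and 6), or an update of one row of $V$ (Line 9). The function name `async_rank_r_sketch` and the exact update formulas $c_i \leftarrow a_i V$ and $v_j \leftarrow a_j^{H} U$ (with $a_i$ the $i$-th row and $a_j$ the $j$-th column of $A$) are illustrative assumptions rather than the algorithm itself; the `gamma` argument anticipates the modified implementation discussed later in this section.

```python
import numpy as np

def async_rank_r_sketch(A, r, gamma=1.0, num_iters=100_000, rng=None):
    """Node-asynchronous rank-r iteration in the spirit of Algorithm 3.

    Only the overall structure (random index selection, auxiliary matrix C,
    QR-based normalization) follows the text; the exact row/column update
    formulas are assumptions made for illustration.
    """
    rng = np.random.default_rng(rng)
    M, N = A.shape
    C = rng.standard_normal((M, r))           # auxiliary variable C (M rows)
    U, T = np.linalg.qr(C)                    # initial normalization
    V = rng.standard_normal((N, r))
    for _ in range(num_iters):
        k = rng.integers(M + N)               # Line 3: random update index
        if k < M:                             # happens with probability M/(M+N)
            C[k, :] = A[k, :] @ V             # assumed form of Line 5, O(N r)
            if rng.random() < gamma:          # Line 6 executed with probability gamma
                U, T = np.linalg.qr(C)        # normalization step, O(M r^2)
                s = np.sign(np.diag(T))
                s[s == 0] = 1.0
                U, T = U * s, s[:, None] * T  # enforce non-negative diag(T)
        else:                                 # happens with probability N/(M+N)
            j = k - M                         # row j of V uses column j of A
            V[j, :] = A[:, j].conj() @ U      # assumed form of Line 9, O(M r)
    return U, T, V
```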
In order to verify its convergence, we simulate Algorithm 3 on the test matrix MEDLINE [188], which is a full-rank and sparse matrix of size $1033 \times 5735$.
We measure the convergence of the algorithm in terms of the squared Frobenius norm of the difference between $A_r$ and the product $UV^{H}$. Since the update index is selected randomly (Line 3) in every iteration of Algorithm 3, the error term $\|A_r - UV^{H}\|_F^{2}$ is a random variable as well, so we compute the expected error by averaging over $10^3$ independent runs of the algorithm. These results are presented in Figure 2.9a for the cases of $r \in \{1, 2, 3, 10\}$, which numerically verify the convergence of the algorithm. Note that the algorithm requires more iterations to converge as the value of $r$ gets larger.
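As a quick, hypothetical sanity check of the sketch above (the MEDLINE matrix itself is not included in this excerpt), one can build a small synthetic matrix with a decaying spectrum, compute its best rank-$r$ approximation $A_r$ from a full SVD, and average the error $\|A_r - UV^{H}\|_F^{2}$ over a handful of independent runs:

```python
# Hypothetical check on a small synthetic stand-in with a decaying spectrum
# (the 1033 x 5735 MEDLINE matrix is not reproduced in this excerpt).
rng = np.random.default_rng(0)
M, N, r = 60, 150, 3
B = rng.standard_normal((M, N))
Ub, _, Vbh = np.linalg.svd(B, full_matrices=False)
A = (Ub * 2.0 ** -np.arange(M)) @ Vbh                   # singular values 1, 1/2, 1/4, ...
A_r = (Ub[:, :r] * 2.0 ** -np.arange(r)) @ Vbh[:r, :]   # best rank-r approximation of A
errs = []
for seed in range(10):                                  # far fewer runs than the 10^3 used in the text
    U, T, V = async_rank_r_sketch(A, r, num_iters=40_000, rng=seed)
    errs.append(np.linalg.norm(A_r - U @ V.conj().T, 'fro') ** 2)
print(np.mean(errs))                                    # expected to be close to zero
```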
[Figure 2.9: plots of $\mathbb{E}\bigl[\|A_r - UV^{H}\|_F^{2}\bigr]$ (panel (a), $r \in \{1, 2, 3, 10\}$) and $\mathbb{E}\bigl[\|A_3 - UV^{H}\|_F^{2}\bigr]$ (panel (b), $\gamma \in \{1, 10^{-2}, 10^{-3}, 10^{-4}\}$) versus the normalized iteration index $k/(M+N)$.]
Figure 2.9: (a) Convergence of Algorithm 3 for various values of $r$. (b) Convergence of Algorithm 3 for the case $r = 3$ when the normalization step in Line 6 is executed with probability $\gamma$. Here $k$ denotes the number of iterations.
We note that Line 5 of Algorithm 3 updates only one row of the auxiliary variable $C$ in every iteration. Since the matrix $C$ is not expected to change significantly during a single iteration, the normalization step in Line 6 can be skipped in some iterations in order to reduce the overall computational complexity of the algorithm. In order to verify this claim, we modify the implementation of the algorithm so that Line 6 is executed with probability $\gamma$; the modified implementation reduces to Algorithm 3 when $\gamma = 1$. For the case of $r = 3$, we compute the expected error of the modified implementation by averaging over $10^3$ independent runs. These results are presented in Figure 2.9b for the values $\gamma \in \{1, 10^{-2}, 10^{-3}, 10^{-4}\}$, which show that the modified implementation keeps converging for a wide range of values of $\gamma$. More interestingly, the rate of convergence remains visually the same even when the normalization step is executed with probability as low as $\gamma = 10^{-2}$. Moreover, the rate of convergence decreases only marginally when $\gamma = 10^{-3} \approx 1/M$. This is consistent with the fact that an iteration of Algorithm 3 updates only one row of $C$, which has $M$ rows in total. Nevertheless, when $\gamma$ has a very small value, e.g., $\gamma = 10^{-4}$, the algorithm indeed gets significantly slower.
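A hypothetical, scaled-down version of this experiment can be run with the sketch and the synthetic stand-in defined earlier; note that the $\gamma$ values below are the ones quoted in the text for the much larger MEDLINE matrix, so on a tiny stand-in the smallest values mainly illustrate the slowdown rather than reproduce Figure 2.9b:

```python
# Hypothetical mirror of the Figure 2.9b experiment on the synthetic stand-in above:
# r = 3, and the QR step of the sketch is executed only with probability gamma.
for gamma in (1.0, 1e-2, 1e-3, 1e-4):
    errs = []
    for seed in range(10):
        U, T, V = async_rank_r_sketch(A, 3, gamma=gamma, num_iters=40_000, rng=seed)
        errs.append(np.linalg.norm(A_r - U @ V.conj().T, 'fro') ** 2)
    print(f"gamma = {gamma:g}: mean squared error = {np.mean(errs):.3e}")
```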
Regarding the computational complexity of the algorithm, note that the costs of Line 5, Line 6, and Line 9 are $O(Nr)$, $O(Mr^{2})$, and $O(Mr)$, respectively. However, the algorithm executes Lines 5 and 6 with probability $M/(M+N)$, and it executes Line 9 with probability $N/(M+N)$. When we further assume that Line 6 is executed with probability $\gamma$, the average cost of an iteration of Algorithm 3 can be found as follows:
$$\mathbb{E}[\text{computational cost per iteration}] = O\!\left(\frac{MNr + \gamma\, M^{2} r^{2}}{M+N}\right) \approx O\!\left(\frac{MNr}{M+N}\right), \qquad (2.105)$$
where the approximation is valid when $\gamma \leq N/(Mr)$, which is acceptable in practice as suggested by Figure 2.9b. On the other hand, the synchronous form of (2.101) requires $O(MNr)$ multiplications per iteration. In order to compensate for the additional factor of $M+N$, the iteration index $k$ is normalized by $M+N$ in both Figures 2.9a and 2.9b.
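For completeness, the expectation in (2.105) follows by conditioning on which update an iteration performs, using the per-line costs and probabilities stated above; the final approximation is exactly the condition $\gamma \leq N/(Mr)$:

```latex
\begin{align*}
\mathbb{E}[\text{computational cost per iteration}]
  &= \frac{M}{M+N}\,\Bigl(O(Nr) + \gamma\, O(Mr^{2})\Bigr) + \frac{N}{M+N}\, O(Mr) \\
  &= O\!\left(\frac{MNr + \gamma\, M^{2} r^{2}}{M+N}\right)
   \;\approx\; O\!\left(\frac{MNr}{M+N}\right),
   \qquad \text{since } \gamma\, M^{2} r^{2} \le MNr \iff \gamma \le \tfrac{N}{Mr}.
\end{align*}
```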
The relevance of Algorithm 3 follows from its suitability for asynchronous and distributed implementations. Since a single iteration of the algorithm requires only partial information about the matrix $A$ (i.e., a single column or row), multiple processors can operate on the same matrix $A$ simultaneously without requiring any ordering among them. More importantly, it is possible to extend Algorithm 3 in such a way that the data matrix $A$ is partitioned into multiple smaller pieces, each stored on a different processing core, as we discuss next.