2.8 An Application: Randomized Computation of Singular Vectors
2.8.1 Rank-$r$ Approximation of an Arbitrary Matrix
Although the assumption $\|A\|_2 = 1$ required by Algorithm 2 can easily be satisfied by normalizing the data matrix $A$ with its largest singular value, computing the largest singular value itself may not be practical, especially when $A$ has large dimensions.
It is, in fact, possible to remove this assumption by introducing a normalization step into the algorithm. Furthermore, it is also possible to extend Algorithm 2 in such a way that it converges to the dominant $r$ singular vectors of $A$ together with the top $r$ singular values for an arbitrary value of $r$. The extended version of the algorithm is presented in Algorithm 3.
Algorithm 3 differs from Algorithm 2 in three ways. Firstly, the vector variables $u$ and $v$ in Algorithm 2 are extended to matrices with $r$ columns. Secondly, Algorithm 3 uses an auxiliary variable $C$. Thirdly, and most importantly, Algorithm 3 uses a QR decomposition (Line 6) that serves as the normalization step.
More precisely, instead of updating the variable $U$ directly, Algorithm 3 first updates the auxiliary variable $C$ (Line 5), and then updates $U$ as the unitary part of the QR decomposition of $C$. We note that the matrix $T$ in Line 6 of the algorithm denotes the upper-triangular part of the QR decomposition of $C$. Without loss of generality, it is assumed that $T$ has non-negative diagonal entries and that its diagonal entries are in descending order.
Although the convergence of Algorithm 2 is ensured by Theorem 2.6, we do not provide an explicit proof for the convergence of Algorithm 3. Nevertheless, by virtue of Theorem 2.6 we can argue for the convergence of Algorithm 3, since it is a natural extension of Algorithm 2 with an additional normalization step. We observe that the variables of Algorithm 3 converge as follows:
$$U \to \tilde{U}_r, \qquad T \to \Sigma_r^{2}, \qquad V \to \tilde{V}_r\,\Sigma_r, \qquad (2.104)$$
where $\tilde{U}_r$ and $\tilde{V}_r$ are the first $r$ columns of $\tilde{U}$ and $\tilde{V}$, respectively, and $\Sigma_r$ is the top-left $r \times r$ block of $\Sigma$. Thus, the product $UV^{H}$ converges to $\tilde{U}_r (\tilde{V}_r \Sigma_r)^{H} = A_r$, which is the best rank-$r$ approximation of $A$, i.e., $A_r = \tilde{U}_r \Sigma_r \tilde{V}_r^{H}$.
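Since the listing of Algorithm 3 is not reproduced in this excerpt, the following NumPy sketch only mirrors the structure described above: a random index (Line 3) triggers either an update of one row of the auxiliary variable $C$ followed by the QR-based normalization (Lines 5 and 6), or an update of one row of $V$ (Line 9). The function name `async_rank_r_sketch` and the exact update formulas $c_i \leftarrow a_i V$ and $v_j \leftarrow a_j^{H} U$ (with $a_i$ the $i$-th row and $a_j$ the $j$-th column of $A$) are illustrative assumptions rather than the algorithm itself; the `gamma` argument anticipates the modified implementation discussed later in this section.

```python
import numpy as np

def async_rank_r_sketch(A, r, gamma=1.0, num_iters=100_000, rng=None):
    """Node-asynchronous rank-r iteration in the spirit of Algorithm 3.

    Only the overall structure (random index selection, auxiliary matrix C,
    QR-based normalization) follows the text; the exact row/column update
    formulas are assumptions made for illustration.
    """
    rng = np.random.default_rng(rng)
    M, N = A.shape
    C = rng.standard_normal((M, r))           # auxiliary variable C (M rows)
    U, T = np.linalg.qr(C)                    # initial normalization
    V = rng.standard_normal((N, r))
    for _ in range(num_iters):
        k = rng.integers(M + N)               # Line 3: random update index
        if k < M:                             # happens with probability M/(M+N)
            C[k, :] = A[k, :] @ V             # assumed form of Line 5, O(N r)
            if rng.random() < gamma:          # Line 6 executed with probability gamma
                U, T = np.linalg.qr(C)        # normalization step, O(M r^2)
                s = np.sign(np.diag(T))
                s[s == 0] = 1.0
                U, T = U * s, s[:, None] * T  # enforce non-negative diag(T)
        else:                                 # happens with probability N/(M+N)
            j = k - M                         # row j of V uses column j of A
            V[j, :] = A[:, j].conj() @ U      # assumed form of Line 9, O(M r)
    return U, T, V
```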
In order to verify its convergence, we simulate Algorithm 3 on the test matrix MEDLINE [188], which is a full-rank and sparse matrix of size $1033 \times 5735$.
We measure the convergence of the algorithm in terms of the squared Frobenius norm of the difference between $A_r$ and the product $UV^{H}$. Since the update index is selected randomly (Line 3) in every iteration of Algorithm 3, the error term $\|A_r - UV^{H}\|_F^{2}$ is a random variable as well, so we compute the expected error by averaging over $10^3$ independent runs of the algorithm. These results are presented in Figure 2.9a for the cases of $r \in \{1, 2, 3, 10\}$, which numerically verify the convergence of the algorithm. Note that the algorithm requires more iterations to converge as the value of $r$ gets larger.
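As a quick, hypothetical sanity check of the sketch above (the MEDLINE matrix itself is not included in this excerpt), one can build a small synthetic matrix with a decaying spectrum, compute its best rank-$r$ approximation $A_r$ from a full SVD, and average the error $\|A_r - UV^{H}\|_F^{2}$ over a handful of independent runs:

```python
# Hypothetical check on a small synthetic stand-in with a decaying spectrum
# (the 1033 x 5735 MEDLINE matrix is not reproduced in this excerpt).
rng = np.random.default_rng(0)
M, N, r = 60, 150, 3
B = rng.standard_normal((M, N))
Ub, _, Vbh = np.linalg.svd(B, full_matrices=False)
A = (Ub * 2.0 ** -np.arange(M)) @ Vbh                   # singular values 1, 1/2, 1/4, ...
A_r = (Ub[:, :r] * 2.0 ** -np.arange(r)) @ Vbh[:r, :]   # best rank-r approximation of A
errs = []
for seed in range(10):                                  # far fewer runs than the 10^3 used in the text
    U, T, V = async_rank_r_sketch(A, r, num_iters=40_000, rng=seed)
    errs.append(np.linalg.norm(A_r - U @ V.conj().T, 'fro') ** 2)
print(np.mean(errs))                                    # expected to be close to zero
```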
[Figure 2.9: plots of $\mathbb{E}\bigl[\|A_r - UV^{H}\|_F^{2}\bigr]$ (panel (a), $r \in \{1, 2, 3, 10\}$) and $\mathbb{E}\bigl[\|A_3 - UV^{H}\|_F^{2}\bigr]$ (panel (b), $\gamma \in \{1, 10^{-2}, 10^{-3}, 10^{-4}\}$) versus the normalized iteration index $k/(M+N)$.]
Figure 2.9: (a) Convergence of Algorithm 3 for various values of $r$. (b) Convergence of Algorithm 3 for the case $r = 3$ when the normalization step in Line 6 is executed with probability $\gamma$. Here $k$ denotes the number of iterations.
We note that Line 5 of Algorithm 3 updates only one row of the auxiliary variable $C$ in every iteration. Since the matrix $C$ is not expected to change significantly during a single iteration, the normalization step in Line 6 can be skipped in some iterations in order to reduce the overall computational complexity of the algorithm. In order to verify this claim, we modify the implementation of the algorithm so that Line 6 is executed with probability $\gamma$; the modified implementation reduces to Algorithm 3 when $\gamma = 1$. For the case of $r = 3$, we compute the expected error of the modified implementation by averaging over $10^3$ independent runs. These results are presented in Figure 2.9b for the values $\gamma \in \{1, 10^{-2}, 10^{-3}, 10^{-4}\}$, which show that the modified implementation keeps converging for a wide range of values of $\gamma$. More interestingly, the rate of convergence remains visually the same even when the normalization step is executed with probability as low as $\gamma = 10^{-2}$. Moreover, the rate of convergence decreases only marginally when $\gamma = 10^{-3} \approx 1/M$. This is consistent with the fact that an iteration of Algorithm 3 updates only one row of $C$, which has $M$ rows in total. Nevertheless, when $\gamma$ has a very small value, e.g., $\gamma = 10^{-4}$, the algorithm indeed gets significantly slower.
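A hypothetical, scaled-down version of this experiment can be run with the sketch and the synthetic stand-in defined earlier; note that the $\gamma$ values below are the ones quoted in the text for the much larger MEDLINE matrix, so on a tiny stand-in the smallest values mainly illustrate the slowdown rather than reproduce Figure 2.9b:

```python
# Hypothetical mirror of the Figure 2.9b experiment on the synthetic stand-in above:
# r = 3, and the QR step of the sketch is executed only with probability gamma.
for gamma in (1.0, 1e-2, 1e-3, 1e-4):
    errs = []
    for seed in range(10):
        U, T, V = async_rank_r_sketch(A, 3, gamma=gamma, num_iters=40_000, rng=seed)
        errs.append(np.linalg.norm(A_r - U @ V.conj().T, 'fro') ** 2)
    print(f"gamma = {gamma:g}: mean squared error = {np.mean(errs):.3e}")
```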
Regarding the computational complexity of the algorithm, note that the costs of Line 5, Line 6, and Line 9 are $O(Nr)$, $O(Mr^{2})$, and $O(Mr)$, respectively. However, the algorithm executes Lines 5 and 6 with probability $M/(M+N)$, and it executes Line 9 with probability $N/(M+N)$. When we further assume that Line 6 is executed with probability $\gamma$, the average cost of an iteration of Algorithm 3 can be found as follows:
$$\mathbb{E}[\text{computational cost per iteration}] = O\!\left(\frac{MNr + \gamma\, M^{2} r^{2}}{M+N}\right) \approx O\!\left(\frac{MNr}{M+N}\right), \qquad (2.105)$$
where the approximation is valid when $\gamma \leq N/(Mr)$, which is acceptable in practice as suggested by Figure 2.9b. On the other hand, the synchronous form of (2.101) requires $O(MNr)$ multiplications per iteration. In order to compensate for the additional factor of $M+N$, the iteration index $k$ is normalized by $M+N$ in both Figures 2.9a and 2.9b.
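For completeness, the expectation in (2.105) follows by conditioning on which update an iteration performs, using the per-line costs and probabilities stated above; the final approximation is exactly the condition $\gamma \leq N/(Mr)$:

```latex
\begin{align*}
\mathbb{E}[\text{computational cost per iteration}]
  &= \frac{M}{M+N}\,\Bigl(O(Nr) + \gamma\, O(Mr^{2})\Bigr) + \frac{N}{M+N}\, O(Mr) \\
  &= O\!\left(\frac{MNr + \gamma\, M^{2} r^{2}}{M+N}\right)
   \;\approx\; O\!\left(\frac{MNr}{M+N}\right),
   \qquad \text{since } \gamma\, M^{2} r^{2} \le MNr \iff \gamma \le \tfrac{N}{Mr}.
\end{align*}
```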
The relevance of Algorithm 3 follows from its suitability for asynchronous and distributed implementations. Since a single iteration of the algorithm requires only partial information about the matrix $A$ (i.e., a single column or row), multiple processors can operate on the same matrix $A$ simultaneously without requiring any ordering among them. More importantly, it is possible to extend Algorithm 3 in such a way that the data matrix $A$ is partitioned into multiple smaller pieces, each stored on a different processing core, as we discuss next.