RANDOM NODE-ASYNCHRONOUS UPDATES ON GRAPHS
2.3 Cascade of Asynchronous Updates
2.3.3 Rate of Convergence
Assuming that the indices are selected sufficiently often (see [32] for the precise definition), the study [32] showed that the linear asynchronous model in (2.5) converges for any index sequence if and only if the spectral radius of $|\mathbf{A}|$ is strictly less than unity, where $|\mathbf{A}|$ denotes the matrix of element-wise absolute values of $\mathbf{A}$. On the other hand, our Corollary 2.2 allows eigenvalues with magnitudes greater than unity. Although these two results appear to be contradictory (when $\mathbf{A}$ consists of non-negative elements), the key difference is the notion of convergence. As an example, consider the matrix $\mathbf{A}_2$ defined in (2.55). Its spectral radius is exactly 1, and [32] proved that there exists a sequence of indices under which iterations on $\mathbf{A}_2$ do not converge. For example, assuming $N$ is odd, consider the index sequence generated as $i_t = (2t - 1) \;(\mathrm{mod}\ N) + 1$. However, Corollary 2.2 proves convergence in a statistical, mean-squared sense. (See Figure 2.5.) In short, when compared with [32], Corollary 2.2 requires a weaker condition on $\mathbf{A}$ and guarantees convergence in a weaker (and probabilistic) sense.
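The gap between the two criteria is easy to observe numerically. The following minimal sketch uses an illustrative $2 \times 2$ rotation matrix of our own choosing (not the matrix $\mathbf{A}_2$ of (2.55)): every eigenvalue of $\mathbf{A}$ has magnitude exactly 1, yet the spectral radius of $|\mathbf{A}|$ exceeds unity, so the condition of [32] fails even though no eigenvalue magnitude exceeds 1.

```python
import numpy as np

# Illustrative example (not the matrix A_2 of (2.55)): a rotation matrix.
# Every eigenvalue of A has magnitude exactly 1, but the element-wise
# absolute matrix |A| has spectral radius 1.4 > 1, so the deterministic
# criterion of [32] is strictly more demanding than a condition on the
# eigenvalue magnitudes alone.
A = np.array([[0.6, -0.8],
              [0.8,  0.6]])
print(np.abs(np.linalg.eigvals(A)))          # [1.  1. ]
print(np.abs(np.linalg.eigvals(np.abs(A))))  # [1.4 0.2]
```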
Numerical Simulations
In order to compare the regular power method with its random component-wise variant fairly, we fix the total number of inner products computed, which will be denoted by $K$. Thus, the regular power method will run $\lceil K/N \rceil$ iterations, whereas the component-wise variant will run $K$ iterations.
For the numerical experiments we consider three symmetric matrices of size $N = 100$. All three matrices are constructed such that $\lambda = 1$ is an eigenvalue with multiplicity 1, and the remaining $N-1$ eigenvalues are selected to satisfy $|\lambda_i| < 1$, so that the power method (hence any random variant) is guaranteed to converge to an eigenvector of the eigenvalue $\lambda = 1$. (See Corollary 2.2.) In the first two examples we consider a pair of simultaneously diagonalizable matrices: the non-unit eigenvalues of the first matrix are selected to be positive (visualized in Figure 2.3a), and the non-unit eigenvalues of the second matrix are the negatives of those of the first (visualized in Figure 2.3b). In the third example we take a random symmetric matrix with non-unit eigenvalues satisfying $-0.5 < \lambda_i < 0.5$ (visualized in Figure 2.3c). Figures 2.3d, 2.3e, and 2.3f show the value of $\mathbb{E}\big[\|\mathbf{r}_K\|_2^2\big] / \|\mathbf{r}_0\|_2^2$ as a function of $K$ for the three matrices described above.
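A minimal sketch of this experiment for the third example is given below. The matrix construction, the random seed, and the use of a single run (rather than the $10^4$-run average reported in Figure 2.3) are illustrative assumptions; the component-wise variant updates one index per iteration, so each of its iterations costs one inner product.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 100, 2000

# Symmetric matrix with eigenvalue 1 (multiplicity 1) and the remaining
# N-1 eigenvalues drawn from (-0.5, 0.5), as in the third example.
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))   # random orthonormal basis
eigs = np.concatenate(([1.0], rng.uniform(-0.5, 0.5, N - 1)))
A = Q @ np.diag(eigs) @ Q.T

def residual_sq(x):
    r = A @ x - x                 # residual with respect to the eigenvalue 1
    return r @ r

x0 = rng.standard_normal(N)
r0 = residual_sq(x0)

# Regular power method: ceil(K / N) iterations, N inner products each.
x = x0.copy()
for _ in range(-(-K // N)):
    x = A @ x
print("regular power method :", residual_sq(x) / r0)

# Random component-wise variant: K iterations, 1 inner product each.
x = x0.copy()
for _ in range(K):
    i = rng.integers(N)
    x[i] = A[i, :] @ x            # update only the i-th component
print("component-wise       :", residual_sq(x) / r0)
```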
[Figure 2.3 appears here. Panels (a)-(c): eigenvalue histograms with eigenvalue gaps 0.0026, 0.0026, and 0.5011, respectively. Panels (d)-(f): average residual norm squared versus $K$ (total number of inner products) for the random component-wise updates and the regular power method.]
Figure 2.3: Non-unit eigenvalues of the (a) first, (b) second, and (c) third examples. The eigenvalue gap is defined as the difference between 1 and the magnitude of the largest non-unit eigenvalue. Normalized residual errors in the (d) first, (e) second, and (f) third examples. Since the regular power method requires $N$ inner products per iteration, its residual error appears only at integer multiples of $N = 100$. Results are obtained by averaging over $10^4$ independent runs.
We first compare the results in Figures 2.3d and 2.3e. Since the eigenvalues have the same magnitudes, the regular power method behaves the same in both cases.
Although the eigenvalue gap is the same in both cases, the random component-wise method converges significantly faster when the second dominant eigenvalue is negative. When the second dominant eigenvalue is positive, both the regular and the component-wise updates behave similarly. In the third example, Figure 2.3f, the matrix has a large eigenvalue gap, in which case the random component-wise updates do not converge as fast as the regular power method.
Theoretical Justification
In order to explain the behavior in Figure 2.3, in this section we will assume a slightly simplified stochastic model for the selection of the update sets in (2.5). Namely, we will assume that the scheme (2.5) updates exactly $\mu_T$ indices per iteration. Thus, the random variable $T$ (which denotes the size of the update sets) becomes a deterministic quantity, and $\sigma_T^2 = 0$. So, the parameter $\delta_T$ (the amount of asynchronicity) reduces to the following form:
$$
\delta_T = \frac{N - \mu_T}{N - 1}. \tag{2.50}
$$
In this setting we note that an update in the form of (2.5) requires $\mu_T$ inner products per iteration. So, the cost of a single power iteration, which requires $N$ inner products, is equivalent to the cost of $N/\mu_T$ asynchronous iterations in which $\mu_T$ indices are updated simultaneously. Since the associated cost of an eigenvalue defined in (2.40) disregards the cost of an iteration, we consider the following quantity instead:
$$
f(\lambda;\, \mu_T,\, p) = \left( 1 + \frac{\mu_T}{N} \left( |\lambda|^2 - 1 + \delta_T\,(p-1)\,|\lambda - 1|^2 \right) \right)^{N/\mu_T}, \tag{2.51}
$$
which results in a fair comparison among component-wise updates with different amounts of asynchronicity. The quantity $f(\lambda;\mu_T,p)$ can be interpreted as the amount of reduction in the residual error when the eigenvalue $\lambda$ is present in the matrix $\mathbf{A}$ with the eigenspace parameter $p$, and the model (2.5) updates $\mu_T$ indices simultaneously. Thus, smaller values of $f(\lambda;\mu_T,p)$ indicate better (faster) convergence of the randomized scheme in (2.5).
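For concreteness, the following sketch evaluates (2.51) directly; the function name and the sample eigenvalues are illustrative choices only.

```python
import numpy as np

def f(lam, mu_T, p, N=100):
    """Residual-reduction factor of eq. (2.51) for eigenvalue lam."""
    delta_T = (N - mu_T) / (N - 1)      # eq. (2.50), since sigma_T^2 = 0
    base = 1 + (mu_T / N) * (abs(lam) ** 2 - 1
                             + delta_T * (p - 1) * abs(lam - 1) ** 2)
    return base ** (N / mu_T)

# A negative eigenvalue is damped far more than a positive one of the
# same magnitude once mu_T < N and p < 1:
print(f(0.97, 1, 0.8))    # close to 1: slow reduction
print(f(-0.97, 1, 0.8))   # well below 1: fast reduction
```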
We first note that the quantity $f(\lambda;\mu_T,p)$ can be equivalently re-written as follows:
$$
f(\lambda;\, \mu_T,\, p) = \left( 1 + \frac{\mu_T}{N}\,(\alpha + 1) \left( \left| \lambda - \frac{\alpha}{\alpha+1} \right|^2 - \frac{1}{(\alpha+1)^2} \right) \right)^{N/\mu_T}, \tag{2.52}
$$
where $\alpha = \delta_T\,(p-1)$ as in Corollary 2.2. Then, it is clear that the point $\lambda^\star = \alpha/(\alpha+1)$ minimizes $f(\lambda;\mu_T,p)$ over the variable $\lambda$, and $f(\lambda;\mu_T,p)$ (as a function of $\lambda$) is circularly symmetric with respect to the point $\lambda^\star$. In addition, the inequality (2.36) ensures that $f(\lambda;\mu_T,p) \geq 0$. In order to demonstrate its behavior, we evaluate $f(\lambda;\mu_T,p)$ numerically over the unit disk (as a function of $\lambda$) for different values of $\mu_T$ and $p$. These computations are visualized in Figure 2.4.
[Figure 2.4 appears here. Panels (a)-(d) show $f(\lambda;\, N,\, 0.8)$, $f(\lambda;\, N/2,\, 0.8)$, $f(\lambda;\, 1,\, 0.8)$, and $f(\lambda;\, 1,\, 0.6)$, respectively, plotted over the unit disk (axes $\mathrm{Re}(\lambda)$ and $\mathrm{Im}(\lambda)$) with a color scale from 0 to 1.]
Figure 2.4: Numerical evaluation of $f(\lambda;\mu_T,p)$ for various values of $\mu_T$ and $p$. The value of $N$ is set to be $N = 100$.
In the case of synchronous updates we have $\mu_T = N$, thus $\alpha = 0$, and the quantity defined in (2.51) reduces to $f(\lambda;\, N,\, p) = |\lambda|^2$ irrespective of the value of $p$, which can be seen clearly from Figure 2.4a. Thus, as $|\lambda|$ approaches 1, the value of $f(\lambda;\, N,\, p)$ approaches 1 irrespective of the phase of $\lambda$. So, only the magnitude of an eigenvalue affects the convergence rate of the regular power iteration, which is a well-known result.
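A quick numerical check of this reduction, for a few arbitrarily chosen eigenvalues:

```python
import numpy as np

# With mu_T = N we have delta_T = 0 (hence alpha = 0), and eq. (2.51)
# collapses to the classical synchronous rate |lam|^2 for every p:
N = 100
for lam in (0.5, -0.5, 0.3 + 0.4j):
    val = (1 + (N / N) * (abs(lam) ** 2 - 1)) ** (N / N)
    assert np.isclose(val, abs(lam) ** 2)
```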
In the case of asynchronous updates we have $\mu_T < N$, and we will assume $p < 1$ (which is the case in most practical applications). Thus, we have $\alpha < 0$, and the phase of an eigenvalue becomes important since $f(\lambda;\mu_T,p)$ is no longer a circularly symmetric function of $\lambda$ with respect to the origin. Figures 2.4b, 2.4c, and 2.4d visualize this behavior clearly. In particular, note that as $\lambda$ approaches 1, the quantity $f(\lambda;\mu_T,p)$ approaches 1 as well. On the other hand, as $\lambda$ approaches $-1$, the quantity $f(\lambda;\mu_T,p)$ stays bounded away from 1. More precisely,
$$
f(1;\, \mu_T,\, p) = 1, \qquad f(-1;\, \mu_T,\, p) = \left( 1 + \frac{4\,\alpha\,\mu_T}{N} \right)^{N/\mu_T}. \tag{2.53}
$$
So, eigenvalues that are close to 1 result in slower convergence, whereas eigenvalues can be arbitrarily close to $-1$ without necessarily slowing the convergence down. Therefore, in light of (2.53) and Figure 2.4, we can conclude that the random component-wise updates favor negative eigenvalues over positive ones. This conclusion is consistent with the numerical observations made in Figures 2.3d and 2.3e: when the second dominant eigenvalue is close to 1, both the random component-wise updates and the regular power iteration converge slowly. On the contrary, when the second dominant eigenvalue is close to $-1$, the random component-wise updates converge faster than the synchronous (regular) counterpart. In fact, it is possible to construct a matrix $\mathbf{A}$ (by placing the second dominant eigenvalue sufficiently close to $-1$) such that the randomized updates converge arbitrarily faster than the regular power iteration.
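The asymmetry in (2.53) can be made concrete as follows; the parameter choices are illustrative.

```python
def f(lam, mu_T, p, N=100):            # eq. (2.51)
    delta_T = (N - mu_T) / (N - 1)
    return (1 + (mu_T / N) * (abs(lam) ** 2 - 1
            + delta_T * (p - 1) * abs(lam - 1) ** 2)) ** (N / mu_T)

# As lam -> -1 the reduction factor stays bounded away from 1 (eq. (2.53)),
# while the synchronous rate |lam|^2 approaches 1:
for lam in (-0.9, -0.99, -0.999):
    print(lam, f(lam, 1, 0.8), abs(lam) ** 2)
```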
Although random component-wise updates converge faster when the second dominant eigenvalue is close to $-1$, Figure 2.3f shows that randomized updates are not always faster than the synchronous counterpart. In order to explain the behavior observed in Figure 2.3f, we consider $f(\lambda;\mu_T,p)$ evaluated at $\lambda = 0$. More precisely,
$$
f(0;\, \mu_T,\, p) = \left( 1 + \frac{\mu_T}{N} \left( \delta_T\,(p-1) - 1 \right) \right)^{N/\mu_T} \;\geq\; \left( 1 - \frac{\mu_T}{N} \right)^{2N/\mu_T}, \tag{2.54}
$$
where the lower bound follows from (2.36). As long as the updates are randomized (the case of $\mu_T < N$), it is clear from (2.54) that $f(0;\mu_T,p)$ is bounded away from zero. Figures 2.4b, 2.4c, and 2.4d visualize this behavior as well. We can therefore conclude that, in the case of random component-wise updates, the associated cost of an eigenvalue is bounded away from zero even when the eigenvalue itself is close to zero. This conclusion is consistent with the simulation results presented in Figure 2.3f: when the non-unit eigenvalues are close to zero, the regular power iteration converges faster than its randomized variant.
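The following sketch evaluates (2.54) for a few illustrative update-set sizes, confirming that $f(0;\mu_T,p)$ indeed stays above the lower bound and away from zero.

```python
N, p = 100, 0.8

def f0(mu_T):
    """Eq. (2.54): the reduction factor evaluated at lam = 0."""
    delta_T = (N - mu_T) / (N - 1)
    return (1 + (mu_T / N) * (delta_T * (p - 1) - 1)) ** (N / mu_T)

for mu_T in (1, 10, 50):
    bound = (1 - mu_T / N) ** (2 * N / mu_T)   # lower bound in (2.54)
    print(mu_T, f0(mu_T), bound)               # f0 stays above the bound
```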
As a concluding remark, we note that the results presented in this section are valid when $\mathbf{A}$ is a normal matrix, i.e., $\mathbf{A}$ is unitarily diagonalizable. The results of this section may not hold true when $\mathbf{A}$ is an arbitrary matrix. Nevertheless, the normality condition is not a loss of generality when dealing with undirected graphs as in Section 2.7, or when our goal is to construct a random component-wise method that can compute the singular vectors of an arbitrary matrix as in Section 2.8.