
RANDOM NODE-ASYNCHRONOUS UPDATES ON GRAPHS

2.3 Cascade of Asynchronous Updates

2.3.3 Rate of Convergence

are selected sufficiently often (see [32] for the precise definition), the study [32] showed that the linear asynchronous model in (2.5) converges for any index sequence if and only if the spectral radius of |A| is strictly less than unity, where |A| denotes the matrix of element-wise absolute values of A. On the other hand, our Corollary 2.2 allows eigenvalues with magnitudes greater than unity. Although these two results appear to be contradictory (when A consists of non-negative elements), the key difference is the notion of convergence. As an example, consider the matrix A2 defined in (2.55). Its spectral radius is exactly 1, and [32] proved that there exists a sequence of indices under which iterations on A2 do not converge. For example, assuming N is odd, consider the index sequence generated as i = (2k − 1) (mod N) + 1. However, Corollary 2.2 proves the convergence in a statistical, mean-square averaged sense.

(See Figure 2.5.) In short, when compared with [32], Corollary 2.2 requires a weaker condition on A and guarantees convergence in a weaker (and probabilistic) sense.

which will be denoted by K. Thus, the regular power method will run ⌈K/N⌉ iterations, whereas the component-wise variant will run K iterations.

For the numerical experiment we consider three symmetric matrices of size N = 100.

All three matrices are constructed such that λN = 1 is an eigenvalue with multiplicity M = 1, and the remaining N − 1 eigenvalues are selected to satisfy |λi| < 1, so that the power method (hence any random variant) is guaranteed to converge to an eigenvector of the eigenvalue λN = 1. (See Corollary 2.2.) In the first two examples we consider a pair of simultaneously diagonalizable matrices. The non-unit eigenvalues of the first matrix are selected to be positive (visualized in Figure 2.3a), and the non-unit eigenvalues of the second matrix are selected to be the negatives of those of the first matrix (visualized in Figure 2.3b). In the third example we take a random symmetric matrix with non-unit eigenvalues satisfying −0.5 < λi < 0.5 (visualized in Figure 2.3c). Figures 2.3d, 2.3e and 2.3f show the value of E[‖rK‖₂²]/‖r0‖₂² as a function of K for the three matrices described above.
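The setup above can be reproduced in a short simulation. The sketch below is illustrative rather than the exact experiment: the random orthonormal eigenbasis, the uniform draw of the non-unit eigenvalues from (−0.5, 0.5) (mimicking the third example), and the fixed seed are all our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 100, 2000  # matrix size and total budget of inner products

# Illustrative matrix: eigenvalue 1 with multiplicity 1, the remaining
# eigenvalues drawn uniformly from (-0.5, 0.5) as in the third example.
eigvals = np.concatenate(([1.0], rng.uniform(-0.5, 0.5, N - 1)))
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))  # random orthonormal eigenbasis
A = Q @ np.diag(eigvals) @ Q.T

x0 = rng.standard_normal(N)
r0 = np.linalg.norm(A @ x0 - x0)  # initial residual norm

# Random component-wise variant: each update recomputes one randomly
# chosen coordinate, costing a single inner product.
x_cw = x0.copy()
for _ in range(K):
    i = rng.integers(N)
    x_cw[i] = A[i] @ x_cw
res_cw = np.linalg.norm(A @ x_cw - x_cw) / r0

# Regular power method: each iteration costs N inner products, so the same
# budget allows only K // N full iterations. No normalization is needed
# because the dominant eigenvalue is exactly 1.
x_pm = x0.copy()
for _ in range(K // N):
    x_pm = A @ x_pm
res_pm = np.linalg.norm(A @ x_pm - x_pm) / r0

print(res_cw, res_pm)  # final normalized residual norms of the two methods
```

With a large eigenvalue gap as here, a run like this typically shows the regular power method ahead of the component-wise variant, in line with the discussion of Figure 2.3f below.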

[Figure 2.3 appears here. Panels (a), (b), (c): non-unit eigenvalue distributions on [−1, 1], with eigenvalue gaps 0.0026, 0.0026, and 0.5011, respectively. Panels (d), (e), (f): average residual norm squared versus K (total number of inner products) for the random component-wise method and the regular power method.]

Figure 2.3: Non-unit eigenvalues of the (a) first, (b) second, and (c) third examples. The eigenvalue gap is defined as the difference between 1 and the magnitude of the largest non-unit eigenvalue. Normalized residual errors in the (d) first, (e) second, and (f) third examples. Since the regular power method requires N inner products per iteration, its residual error appears only at integer multiples of N = 100. Results are obtained by averaging over 10⁴ independent runs.

We first compare the results in Figures 2.3d and 2.3e. Since the eigenvalues have the same magnitudes, the regular power method behaves the same in both cases.

Although the eigenvalue gap is the same in both cases, the random component-wise method converges significantly faster when the second dominant eigenvalue is negative. When the second dominant eigenvalue is positive, both the regular and the component-wise updates behave similarly. In the third example, Figure 2.3f, the matrix has a large eigenvalue gap, in which case the random component-wise updates do not converge as fast as the regular power method.

Theoretical Justification

In order to explain the behavior in Figure 2.3, in this section we will assume a slightly simplified stochastic model for the selection of the update sets in (2.5). Namely, we will assume that the scheme (2.5) updates exactly μT indices per iteration. Thus, the random variable T (which denotes the size of the update sets) becomes a deterministic quantity, and σT² = 0. So, the parameter δT (the amount of asynchronicity) reduces to the following form:

δT = (N − μT) / (N − 1). (2.50)

In this setting we note that an update in the form of (2.5) requires μT inner products per iteration. So, the cost of a single power iteration, which requires N inner products, is equivalent to the cost of N/μT asynchronous iterations in which μT indices are updated simultaneously. Since the associated cost of an eigenvalue defined in (2.40) disregards the cost of an iteration, we consider the following quantity instead:

r(λ; μT, ρ) = ( 1 + (μT/N) ( |λ|² − 1 + δT (ρ − 1) |λ − 1|² ) )^(N/μT), (2.51)

which results in a fair comparison among component-wise updates with different amounts of asynchronicity. The quantity r(λ; μT, ρ) can be interpreted as the amount of reduction in the residual error when the eigenvalue λ is present in the matrix A with the eigenspace parameter ρ, and the model (2.5) updates μT indices simultaneously. Thus, smaller values of r(λ; μT, ρ) indicate a better (faster) convergence of the randomized scheme in (2.5).
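A minimal implementation of the cost (2.51) makes this comparison concrete. The function name r and the default N = 100 are our own choices for illustration:

```python
def r(lam, mu_T, rho, N=100):
    """Residual reduction factor (2.51), per N inner products.

    lam may be complex; delta_T = (N - mu_T) / (N - 1) as in (2.50).
    """
    delta_T = (N - mu_T) / (N - 1)
    base = 1 + (mu_T / N) * (
        abs(lam) ** 2 - 1 + delta_T * (rho - 1) * abs(lam - 1) ** 2
    )
    return base ** (N / mu_T)

# Synchronous case (mu_T = N): delta_T = 0 and r reduces to |lam|^2,
# independently of rho.
print(r(0.9, 100, 0.8))  # ~ 0.81

# Fully asynchronous case (mu_T = 1): the negative eigenvalue yields the
# smaller (faster) cost even though both have magnitude 0.9.
print(r(-0.9, 1, 0.8), r(0.9, 1, 0.8))
```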

We first note that the quantity r(λ; μT, ρ) can be equivalently re-written as follows:

r(λ; μT, ρ) = ( 1 + (μT/N) (α + 1) ( |λ − α/(α+1)|² − 1/(α+1)² ) )^(N/μT), (2.52)

where α = δT (ρ − 1) as in Corollary 2.2. Then, it is clear that the point λ★ = α/(α+1) minimizes r(λ; μT, ρ) over the variable λ, and r(λ; μT, ρ) (as a function of λ) is circularly symmetric with respect to the point λ★. In addition, the inequality (2.36) ensures that r(λ; μT, ρ) ≥ 0. In order to demonstrate its behavior, we evaluate r(λ; μT, ρ) numerically over the unit disk (as a function of λ) for different values of μT and ρ. These computations are visualized in Figure 2.4.
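The equivalence of (2.51) and (2.52) is a completing-the-square identity, and it can be verified numerically. The helper names r_51 and r_52 below are hypothetical; the check draws random complex λ and confirms both forms agree:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100

def r_51(lam, mu_T, rho):
    # Original form (2.51).
    delta_T = (N - mu_T) / (N - 1)
    base = 1 + (mu_T / N) * (
        abs(lam) ** 2 - 1 + delta_T * (rho - 1) * abs(lam - 1) ** 2
    )
    return base ** (N / mu_T)

def r_52(lam, mu_T, rho):
    # Completed-square form (2.52), with alpha = delta_T * (rho - 1).
    alpha = (N - mu_T) / (N - 1) * (rho - 1)
    base = 1 + (mu_T / N) * (alpha + 1) * (
        abs(lam - alpha / (alpha + 1)) ** 2 - 1 / (alpha + 1) ** 2
    )
    return base ** (N / mu_T)

# The two forms agree on random complex lambda.
for _ in range(5):
    lam = complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
    assert abs(r_51(lam, 1, 0.8) - r_52(lam, 1, 0.8)) < 1e-9
print("forms agree")
```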

[Figure 2.4 appears here. Each panel is a heat map over the unit disk of the complex plane, with Re(λ) on the horizontal axis, Im(λ) on the vertical axis, and values ranging from 0 to 1: (a) r(λ; N, 0.8), (b) r(λ; N/2, 0.8), (c) r(λ; 1, 0.8), (d) r(λ; 1, 0.6).]

Figure 2.4: Numerical evaluation of r(λ; μT, ρ) for various values of μT and ρ. The value of N is set to N = 100.

In the case of synchronous updates we have μT = N, thus α = 0, and the quantity defined in (2.51) reduces to r(λ; N, ρ) = |λ|² irrespective of the value of ρ, which can be seen clearly from Figure 2.4a. Thus, as |λ| approaches 1, the value of r(λ; N, ρ) approaches 1 irrespective of the phase of λ. So, only the magnitude of an eigenvalue affects the convergence rate of the regular power iteration, which is a well-known result.

In the case of asynchronous updates we have μT < N, and we will assume ρ < 1 (which is the case in most practical applications). Thus, we have α < 0, and the phase of an eigenvalue becomes important since r(λ; μT, ρ) is no longer a circularly symmetric function of λ with respect to the origin. Figures 2.4b, 2.4c and 2.4d visualize this behavior clearly. In particular, note that as λ approaches 1, the quantity r(λ; μT, ρ) approaches 1 as well. On the other hand, as λ approaches −1, the quantity r(λ; μT, ρ) stays bounded away from 1. More precisely,

r(1; μT, ρ) = 1,  r(−1; μT, ρ) = ( 1 + (μT/N) 4α )^(N/μT). (2.53)
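Both values in (2.53) follow directly from the definition (2.51) by substituting λ = ±1, and can be checked numerically; μT = 1 and ρ = 0.8 below are illustrative choices:

```python
N = 100

def r(lam, mu_T, rho):
    # The cost (2.51); lam may be complex.
    delta_T = (N - mu_T) / (N - 1)
    base = 1 + (mu_T / N) * (
        abs(lam) ** 2 - 1 + delta_T * (rho - 1) * abs(lam - 1) ** 2
    )
    return base ** (N / mu_T)

mu_T, rho = 1, 0.8
alpha = (N - mu_T) / (N - 1) * (rho - 1)

# r(1) = 1: an eigenvalue at 1 gives no residual reduction at all.
assert abs(r(1, mu_T, rho) - 1) < 1e-12

# r(-1) matches the closed form in (2.53) and stays bounded away from 1.
closed_form = (1 + (mu_T / N) * 4 * alpha) ** (N / mu_T)
assert abs(r(-1, mu_T, rho) - closed_form) < 1e-12
print(closed_form)
```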

So, eigenvalues that are close to 1 result in a slower convergence, whereas eigenvalues can be arbitrarily close to −1, yet the convergence does not necessarily slow down. Therefore, in light of (2.53) and Figure 2.4 we can conclude that the random component-wise updates favor negative eigenvalues over positive ones. This conclusion is consistent with the numerical observations made in Figures 2.3d and 2.3e: when the second dominant eigenvalue is close to 1, both the random component-wise updates and the regular power iteration converge slowly. On the contrary, when the second dominant eigenvalue is close to −1, the random component-wise updates converge faster than the synchronous (regular) counterpart. In fact, it is possible to construct a matrix A (by placing the second dominant eigenvalue sufficiently close to −1) such that the randomized updates converge arbitrarily faster than the regular power iteration.

Although random component-wise updates converge faster when the second dominant eigenvalue is close to −1, Figure 2.3f shows that randomized updates are not always faster than the synchronous counterpart. In order to explain the behavior observed in Figure 2.3f, we consider r(λ; μT, ρ) evaluated at λ = 0. More precisely,

r(0; μT, ρ) = ( 1 + (μT/N) ( δT (ρ − 1) − 1 ) )^(N/μT) ≥ ( 1 − μT/N )^(2N/μT), (2.54)

where the lower bound follows from (2.36). As long as the updates are randomized (the case of μT < N), it is clear from (2.54) that r(0; μT, ρ) is bounded away from zero. Figures 2.4b, 2.4c and 2.4d visualize this behavior as well. Then, we can conclude that in the case of random component-wise updates the associated cost of an eigenvalue is bounded away from zero even when the eigenvalue itself is close to zero. This conclusion is consistent with the simulation results presented in Figure 2.3f: when the non-unit eigenvalues are close to zero, the regular power iteration converges faster than its randomized variant.
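The bound (2.54) can also be checked numerically from the definition (2.51); the parameter choices below (ρ = 0.8 and a few values of μT) are illustrative:

```python
N = 100

def r(lam, mu_T, rho):
    # The cost (2.51); lam may be complex.
    delta_T = (N - mu_T) / (N - 1)
    base = 1 + (mu_T / N) * (
        abs(lam) ** 2 - 1 + delta_T * (rho - 1) * abs(lam - 1) ** 2
    )
    return base ** (N / mu_T)

# For mu_T < N the cost at lambda = 0 stays above the positive lower
# bound (1 - mu_T/N)^(2N/mu_T) from (2.54).
for mu_T in (1, 10, 50):
    lower = (1 - mu_T / N) ** (2 * N / mu_T)
    val = r(0, mu_T, 0.8)
    assert val >= lower > 0
    print(mu_T, round(val, 4), round(lower, 4))

# Synchronous case for contrast: r(0; N, rho) = 0, i.e., a zero
# eigenvalue is annihilated in a single regular power iteration.
assert r(0, 100, 0.8) == 0.0
```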

As a concluding remark, we note that the results presented in this section are valid when A is a normal matrix, i.e., A is unitarily diagonalizable. The results of this section may not hold true when A is an arbitrary matrix. Nevertheless, the normality condition is not a loss of generality when dealing with undirected graphs as in Section 2.7, or if our goal is to construct a random component-wise method that can compute the singular vectors of an arbitrary matrix as in Section 2.8.