
RANDOM NODE-ASYNCHRONOUS UPDATES ON GRAPHS

2.3 Cascade of Asynchronous Updates

2.3.3 Rate of Convergence

are selected sufficiently often (see [32] for the precise definition), the study [32] showed that the linear asynchronous model in (2.5) converges for any index sequence if and only if the spectral radius of |A| is strictly less than unity, where |A| denotes the matrix of element-wise absolute values of A. On the other hand, our Corollary 2.2 allows eigenvalues with magnitudes greater than unity. Although these two results appear to be contradictory (when A consists of non-negative elements), the key difference is the notion of convergence. As an example, consider the matrix A2 defined in (2.55). Its spectral radius is exactly 1, and [32] proved that there exists a sequence of indices under which iterations on A2 do not converge. For example, assuming N is odd, consider the index sequence generated as i = (2k − 1) (mod N) + 1. However, Corollary 2.2 proves the convergence in a statistical, mean-square averaged sense.

(See Figure 2.5.) In short, when compared with [32], Corollary 2.2 requires a weaker condition on A and guarantees convergence in a weaker (and probabilistic) sense.

which will be denoted by K. Thus, the regular power method will run ⌈K/N⌉ iterations, whereas the component-wise variant will run K iterations.

For the numerical experiment we consider three symmetric matrices of size N = 100.

All three matrices are constructed such that λN = 1 is an eigenvalue with multiplicity M = 1, and the remaining N − 1 eigenvalues are selected to satisfy |λi| < 1, so that the power method (hence any random variant) is guaranteed to converge to an eigenvector of the eigenvalue λN = 1. (See Corollary 2.2.) In the first two examples we consider a pair of simultaneously diagonalizable matrices. The non-unit eigenvalues of the first matrix are selected to be positive (visualized in Figure 2.3a), and the non-unit eigenvalues of the second matrix are selected to be the negatives of those of the first matrix (visualized in Figure 2.3b). In the third example we take a random symmetric matrix with non-unit eigenvalues satisfying −0.5 < λi < 0.5 (visualized in Figure 2.3c). Figures 2.3d, 2.3e and 2.3f show the value of E[‖rK‖₂²]/‖r0‖₂² as a function of K for the three matrices described above.
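The setup above can be reproduced in a short simulation. The sketch below is illustrative rather than the exact experiment: the random orthonormal eigenbasis, the uniform draw of the non-unit eigenvalues from (−0.5, 0.5) (mimicking the third example), and the fixed seed are all our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 100, 2000  # matrix size and total budget of inner products

# Illustrative matrix: eigenvalue 1 with multiplicity 1, the remaining
# eigenvalues drawn uniformly from (-0.5, 0.5) as in the third example.
eigvals = np.concatenate(([1.0], rng.uniform(-0.5, 0.5, N - 1)))
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))  # random orthonormal eigenbasis
A = Q @ np.diag(eigvals) @ Q.T

x0 = rng.standard_normal(N)
r0 = np.linalg.norm(A @ x0 - x0)  # initial residual norm

# Random component-wise variant: each update recomputes one randomly
# chosen coordinate, costing a single inner product.
x_cw = x0.copy()
for _ in range(K):
    i = rng.integers(N)
    x_cw[i] = A[i] @ x_cw
res_cw = np.linalg.norm(A @ x_cw - x_cw) / r0

# Regular power method: each iteration costs N inner products, so the same
# budget allows only K // N full iterations. No normalization is needed
# because the dominant eigenvalue is exactly 1.
x_pm = x0.copy()
for _ in range(K // N):
    x_pm = A @ x_pm
res_pm = np.linalg.norm(A @ x_pm - x_pm) / r0

print(res_cw, res_pm)  # final normalized residual norms of the two methods
```

With a large eigenvalue gap as here, a run like this typically shows the regular power method ahead of the component-wise variant, in line with the discussion of Figure 2.3f below.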

[Figure 2.3 appears here. Panels (a), (b), (c): non-unit eigenvalue distributions on [−1, 1], with eigenvalue gaps 0.0026, 0.0026, and 0.5011, respectively. Panels (d), (e), (f): average residual norm squared versus K (total number of inner products) for the random component-wise method and the regular power method.]

Figure 2.3: Non-unit eigenvalues of the (a) first, (b) second, and (c) third examples. The eigenvalue gap is defined as the difference between 1 and the magnitude of the largest non-unit eigenvalue. Normalized residual errors in the (d) first, (e) second, and (f) third examples. Since the regular power method requires N inner products per iteration, its residual error appears only at integer multiples of N = 100. Results are obtained by averaging over 10⁴ independent runs.

We first compare the results in Figures 2.3d and 2.3e. Since the eigenvalues have the same magnitudes, the regular power method behaves the same in both cases.

Although the eigenvalue gap is the same in both cases, the random component-wise method converges significantly faster when the second dominant eigenvalue is negative. When the second dominant eigenvalue is positive, both the regular and the component-wise updates behave similarly. In the third example, Figure 2.3f, the matrix has a large eigenvalue gap, in which case the random component-wise updates do not converge as fast as the regular power method.

Theoretical Justification

In order to explain the behavior in Figure 2.3, in this section we will assume a slightly simplified stochastic model for the selection of the update sets in (2.5). Namely, we will assume that the scheme (2.5) updates exactly μT indices per iteration. Thus, the random variable T (which denotes the size of the update sets) becomes a deterministic quantity, and σT² = 0. So, the parameter δT (the amount of asynchronicity) reduces to the following form:

δT = (N − μT) / (N − 1). (2.50)

In this setting we note that an update in the form of (2.5) requires μT inner products per iteration. So, the cost of a single power iteration, which requires N inner products, is equivalent to the cost of N/μT asynchronous iterations in which μT indices are updated simultaneously. Since the associated cost of an eigenvalue defined in (2.40) disregards the cost of an iteration, we consider the following quantity instead:

r(λ; μT, ρ) = ( 1 + (μT/N) ( |λ|² − 1 + δT (ρ − 1) |λ − 1|² ) )^(N/μT), (2.51)

which results in a fair comparison among component-wise updates with different amounts of asynchronicity. The quantity r(λ; μT, ρ) can be interpreted as the amount of reduction in the residual error when the eigenvalue λ is present in the matrix A with the eigenspace parameter ρ, and the model (2.5) updates μT indices simultaneously. Thus, smaller values of r(λ; μT, ρ) indicate a better (faster) convergence of the randomized scheme in (2.5).
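A minimal implementation of the cost (2.51) makes this comparison concrete. The function name r and the default N = 100 are our own choices for illustration:

```python
def r(lam, mu_T, rho, N=100):
    """Residual reduction factor (2.51), per N inner products.

    lam may be complex; delta_T = (N - mu_T) / (N - 1) as in (2.50).
    """
    delta_T = (N - mu_T) / (N - 1)
    base = 1 + (mu_T / N) * (
        abs(lam) ** 2 - 1 + delta_T * (rho - 1) * abs(lam - 1) ** 2
    )
    return base ** (N / mu_T)

# Synchronous case (mu_T = N): delta_T = 0 and r reduces to |lam|^2,
# independently of rho.
print(r(0.9, 100, 0.8))  # ~ 0.81

# Fully asynchronous case (mu_T = 1): the negative eigenvalue yields the
# smaller (faster) cost even though both have magnitude 0.9.
print(r(-0.9, 1, 0.8), r(0.9, 1, 0.8))
```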

We first note that the quantity r(λ; μT, ρ) can be equivalently re-written as follows:

r(λ; μT, ρ) = ( 1 + (μT/N) (α + 1) ( |λ − α/(α+1)|² − 1/(α+1)² ) )^(N/μT), (2.52)

where α = δT (ρ − 1) as in Corollary 2.2. Then, it is clear that the point λ★ = α/(α+1) minimizes r(λ; μT, ρ) over the variable λ, and r(λ; μT, ρ) (as a function of λ) is circularly symmetric with respect to the point λ★. In addition, the inequality (2.36) ensures that r(λ; μT, ρ) ≥ 0. In order to demonstrate its behavior, we evaluate r(λ; μT, ρ) numerically over the unit disk (as a function of λ) for different values of μT and ρ. These computations are visualized in Figure 2.4.
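The equivalence of (2.51) and (2.52) is a completing-the-square identity, and it can be verified numerically. The helper names r_51 and r_52 below are hypothetical; the check draws random complex λ and confirms both forms agree:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100

def r_51(lam, mu_T, rho):
    # Original form (2.51).
    delta_T = (N - mu_T) / (N - 1)
    base = 1 + (mu_T / N) * (
        abs(lam) ** 2 - 1 + delta_T * (rho - 1) * abs(lam - 1) ** 2
    )
    return base ** (N / mu_T)

def r_52(lam, mu_T, rho):
    # Completed-square form (2.52), with alpha = delta_T * (rho - 1).
    alpha = (N - mu_T) / (N - 1) * (rho - 1)
    base = 1 + (mu_T / N) * (alpha + 1) * (
        abs(lam - alpha / (alpha + 1)) ** 2 - 1 / (alpha + 1) ** 2
    )
    return base ** (N / mu_T)

# The two forms agree on random complex lambda.
for _ in range(5):
    lam = complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
    assert abs(r_51(lam, 1, 0.8) - r_52(lam, 1, 0.8)) < 1e-9
print("forms agree")
```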

[Figure 2.4 appears here. Each panel is a heat map over the unit disk of the complex plane, with Re(λ) on the horizontal axis, Im(λ) on the vertical axis, and values ranging from 0 to 1: (a) r(λ; N, 0.8), (b) r(λ; N/2, 0.8), (c) r(λ; 1, 0.8), (d) r(λ; 1, 0.6).]

Figure 2.4: Numerical evaluation of r(λ; μT, ρ) for various values of μT and ρ. The value of N is set to N = 100.

In the case of synchronous updates we have μT = N, thus α = 0, and the quantity defined in (2.51) reduces to r(λ; N, ρ) = |λ|² irrespective of the value of ρ, which can be seen clearly from Figure 2.4a. Thus, as |λ| approaches 1, the value of r(λ; N, ρ) approaches 1 irrespective of the phase of λ. So, only the magnitude of an eigenvalue affects the convergence rate of the regular power iteration, which is a well-known result.

In the case of asynchronous updates we have μT < N, and we will assume ρ < 1 (which is the case in most practical applications). Thus, we have α < 0, and the phase of an eigenvalue becomes important since r(λ; μT, ρ) is no longer a circularly symmetric function of λ with respect to the origin. Figures 2.4b, 2.4c and 2.4d visualize this behavior clearly. In particular, note that as λ approaches 1, the quantity r(λ; μT, ρ) approaches 1 as well. On the other hand, as λ approaches −1, the quantity r(λ; μT, ρ) stays bounded away from 1. More precisely,

r(1; μT, ρ) = 1,  r(−1; μT, ρ) = ( 1 + (μT/N) 4α )^(N/μT). (2.53)
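Both values in (2.53) follow directly from the definition (2.51) by substituting λ = ±1, and can be checked numerically; μT = 1 and ρ = 0.8 below are illustrative choices:

```python
N = 100

def r(lam, mu_T, rho):
    # The cost (2.51); lam may be complex.
    delta_T = (N - mu_T) / (N - 1)
    base = 1 + (mu_T / N) * (
        abs(lam) ** 2 - 1 + delta_T * (rho - 1) * abs(lam - 1) ** 2
    )
    return base ** (N / mu_T)

mu_T, rho = 1, 0.8
alpha = (N - mu_T) / (N - 1) * (rho - 1)

# r(1) = 1: an eigenvalue at 1 gives no residual reduction at all.
assert abs(r(1, mu_T, rho) - 1) < 1e-12

# r(-1) matches the closed form in (2.53) and stays bounded away from 1.
closed_form = (1 + (mu_T / N) * 4 * alpha) ** (N / mu_T)
assert abs(r(-1, mu_T, rho) - closed_form) < 1e-12
print(closed_form)
```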

So, eigenvalues that are close to 1 result in a slower convergence, whereas eigenvalues can be arbitrarily close to −1, yet the convergence does not necessarily slow down. Therefore, in light of (2.53) and Figure 2.4 we can conclude that the random component-wise updates favor negative eigenvalues over positive ones. This conclusion is consistent with the numerical observations made in Figures 2.3d and 2.3e: when the second dominant eigenvalue is close to 1, both the random component-wise updates and the regular power iteration converge slowly. On the contrary, when the second dominant eigenvalue is close to −1, the random component-wise updates converge faster than the synchronous (regular) counterpart. In fact, it is possible to construct a matrix A (by placing the second dominant eigenvalue sufficiently close to −1) such that the randomized updates converge arbitrarily faster than the regular power iteration.

Although random component-wise updates converge faster when the second dominant eigenvalue is close to −1, Figure 2.3f shows that randomized updates are not always faster than the synchronous counterpart. In order to explain the behavior observed in Figure 2.3f, we consider r(λ; μT, ρ) evaluated at λ = 0. More precisely,

r(0; μT, ρ) = ( 1 + (μT/N) ( δT (ρ − 1) − 1 ) )^(N/μT) ≥ ( 1 − μT/N )^(2N/μT), (2.54)

where the lower bound follows from (2.36). As long as the updates are randomized (the case of μT < N), it is clear from (2.54) that r(0; μT, ρ) is bounded away from zero. Figures 2.4b, 2.4c and 2.4d visualize this behavior as well. Then, we can conclude that in the case of random component-wise updates the associated cost of an eigenvalue is bounded away from zero even when the eigenvalue itself is close to zero. This conclusion is consistent with the simulation results presented in Figure 2.3f: when the non-unit eigenvalues are close to zero, the regular power iteration converges faster than its randomized variant.
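The bound (2.54) can also be checked numerically from the definition (2.51); the parameter choices below (ρ = 0.8 and a few values of μT) are illustrative:

```python
N = 100

def r(lam, mu_T, rho):
    # The cost (2.51); lam may be complex.
    delta_T = (N - mu_T) / (N - 1)
    base = 1 + (mu_T / N) * (
        abs(lam) ** 2 - 1 + delta_T * (rho - 1) * abs(lam - 1) ** 2
    )
    return base ** (N / mu_T)

# For mu_T < N the cost at lambda = 0 stays above the positive lower
# bound (1 - mu_T/N)^(2N/mu_T) from (2.54).
for mu_T in (1, 10, 50):
    lower = (1 - mu_T / N) ** (2 * N / mu_T)
    val = r(0, mu_T, 0.8)
    assert val >= lower > 0
    print(mu_T, round(val, 4), round(lower, 4))

# Synchronous case for contrast: r(0; N, rho) = 0, i.e., a zero
# eigenvalue is annihilated in a single regular power iteration.
assert r(0, 100, 0.8) == 0.0
```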

As a concluding remark, we note that the results presented in this section are valid when A is a normal matrix, i.e., A is unitarily diagonalizable. The results of this section may not hold true when A is an arbitrary matrix. Nevertheless, the normality condition is not a loss of generality when dealing with undirected graphs as in Section 2.7, or if our goal is to construct a random component-wise method that can compute the singular vectors of an arbitrary matrix as in Section 2.8.