

2.3 Cascade of Asynchronous Updates

2.3.2 Convergence in Mean-Squared Sense

In the following we will assume that $\mathbf{A}$ has a unit eigenvalue with multiplicity $M \ge 1$.

This assumption ensures that the asynchronous update equation has a fixed point (Lemma 2.2). Without loss of generality we will order the eigenvalues of $\mathbf{A}$ such that $\lambda_j \neq 1$ for $1 \le j \le N-M$. Notice that non-unit eigenvalues are allowed to be complex in general, and complex eigenvalues on or outside the unit circle are not ruled out. Then, the eigenvalue decomposition of $\mathbf{A}$ can be written as

$$ \mathbf{A} = [\,\mathbf{U} \;\; \mathbf{V}_1\,]\; \mathrm{diag}\big(\lambda_1, \,\dots,\, \lambda_{N-M},\, 1, \,\dots,\, 1\big)\; [\,\mathbf{U} \;\; \mathbf{V}_1\,]^H, \tag{2.29} $$

where $\mathbf{V}_1 \in \mathbb{C}^{N \times M}$ is an orthonormal basis for the eigenspace of the unit eigenvalue, and $\mathbf{U} \in \mathbb{C}^{N \times (N-M)}$ corresponds to the eigenvectors of the non-unit eigenvalues.

Since $\mathbf{A}$ is assumed to be a normal matrix, we have $\mathbf{U}^H \mathbf{V}_1 = \mathbf{0}$ and $\mathbf{U}^H \mathbf{U} = \mathbf{I}$. We now define the following quantities:

𝜌 =πœ†

max UH diag(U UH)U

, (2.30)

Β― 𝜌 =πœ†

min UH diag(U UH)U

, (2.31)

which will play a crucial role in the analysis of convergence. Notice that $\rho$ and $\bar{\rho}$ do not depend on the particular selection of the basis matrix $\mathbf{U}$; only the column space of $\mathbf{U}$ determines their values.
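As a quick numerical illustration (our own sketch, not from the original text), the following numpy snippet builds an arbitrary orthonormal basis $\mathbf{U}$, evaluates $\rho$ and $\bar{\rho}$ as the extreme eigenvalues of $\mathbf{U}^H \mathrm{diag}(\mathbf{U}\mathbf{U}^H)\mathbf{U}$, and checks the basis independence by rotating $\mathbf{U}$ with a random unitary matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 8, 1

# An arbitrary unitary matrix; its first N-M columns play the role of U.
Q, _ = np.linalg.qr(rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))
U = Q[:, : N - M]

# G = U^H diag(U U^H) U is Hermitian, so its eigenvalues are real.
D = np.diag(np.diag(U @ U.conj().T).real)
G = U.conj().T @ D @ U
rho_bar, rho = np.linalg.eigvalsh(G)[[0, -1]]   # eigvalsh sorts ascending
print(rho_bar, rho)

# Rotating U by any (N-M)x(N-M) unitary W leaves the column space, hence
# rho and rho_bar, unchanged: diag(U U^H) is unaffected and G becomes the
# similarity transform W^H G W.
W, _ = np.linalg.qr(rng.standard_normal((N - M, N - M))
                    + 1j * rng.standard_normal((N - M, N - M)))
G2 = (U @ W).conj().T @ D @ (U @ W)
assert np.allclose(np.linalg.eigvalsh(G2), np.linalg.eigvalsh(G))
```

More importantly, we have the following property: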

Lemma 2.3. The following holds true for any $\mathbf{U} \in \mathbb{C}^{N \times (N-M)}$ with $\mathbf{U}^H\mathbf{U} = \mathbf{I}$:

$$ \frac{1}{N}\,\mathbf{I} \;\preceq\; \mathbf{U}^H \, \mathrm{diag}\!\left(\mathbf{U}\mathbf{U}^H\right) \mathbf{U} \;\preceq\; \mathbf{I}. \tag{2.32} $$

Proof. We note that $\mathbf{U} \in \mathbb{C}^{N \times (N-M)}$ has orthonormal columns, i.e., $\mathbf{U}^H\mathbf{U} = \mathbf{I}$, and prove the upper bound in (2.32) first. Note that $\mathbf{U}\mathbf{U}^H \preceq \mathbf{I}$. Then we can write the following:

$$ (\mathbf{U}\mathbf{U}^H)_{i,i} = \mathbf{e}_i^H\, \mathbf{U}\mathbf{U}^H \mathbf{e}_i \le 1 \;\Longrightarrow\; \mathrm{diag}\!\left(\mathbf{U}\mathbf{U}^H\right) \preceq \mathbf{I} \;\Longrightarrow\; \mathbf{U}^H\,\mathrm{diag}\!\left(\mathbf{U}\mathbf{U}^H\right)\mathbf{U} \preceq \mathbf{I}, \tag{2.33} $$

where $\mathbf{e}_i$ denotes the $i^{th}$ column of the identity matrix of dimension $N$.

We now prove the lower bound in (2.32). Let $\mathbf{u}(i)$ denote the $i^{th}$ row of $\mathbf{U}$, so that $(\mathbf{U}\mathbf{U}^H)_{i,j} = \mathbf{u}(i)\,\mathbf{u}^H(j)$. Let $\mathbf{x} \in \mathbb{C}^N$ be an arbitrary vector. Then,

$$
\begin{aligned}
\mathbf{x}^H \mathbf{U}\mathbf{U}^H \mathbf{x}
&= \left| \mathbf{x}^H \mathbf{U}\mathbf{U}^H \mathbf{x} \right|
= \left| \sum_{i=1}^{N}\sum_{j=1}^{N} x_i^*\,(\mathbf{U}\mathbf{U}^H)_{i,j}\,x_j \right|
= \left| \sum_{i=1}^{N}\sum_{j=1}^{N} x_i^*\,\mathbf{u}(i)\,\mathbf{u}^H(j)\,x_j \right| \\
&\le \sum_{i=1}^{N}\sum_{j=1}^{N} |x_i|\,\big|\mathbf{u}(i)\,\mathbf{u}^H(j)\big|\,|x_j|
\le \sum_{i=1}^{N}\sum_{j=1}^{N} |x_i|\,\|\mathbf{u}(i)\|_2\,\|\mathbf{u}(j)\|_2\,|x_j| \\
&= \left( \sum_{i=1}^{N} |x_i|\,\|\mathbf{u}(i)\|_2 \right)^{\!2}
\le N \sum_{i=1}^{N} |x_i|^2\,\|\mathbf{u}(i)\|_2^2
= N\,\mathbf{x}^H\,\mathrm{diag}\!\left(\mathbf{U}\mathbf{U}^H\right)\mathbf{x},
\end{aligned}
\tag{2.34}
$$

where the second and the last inequalities both follow from the Cauchy–Schwarz inequality (applied to the rows $\mathbf{u}(i)$, $\mathbf{u}(j)$, and to the vector with entries $|x_i|\,\|\mathbf{u}(i)\|_2$ paired with the all-ones vector, respectively).

Then, the inequality (2.34) implies that

$$ \mathbf{U}\mathbf{U}^H \preceq N\,\mathrm{diag}\!\left(\mathbf{U}\mathbf{U}^H\right) \;\Longrightarrow\; \mathbf{U}^H\mathbf{U}\,\mathbf{U}^H\mathbf{U} \preceq N\,\mathbf{U}^H\,\mathrm{diag}\!\left(\mathbf{U}\mathbf{U}^H\right)\mathbf{U}, \tag{2.35} $$

which proves the lower bound due to the fact that $\mathbf{U}^H\mathbf{U} = \mathbf{I}$.

So, Lemma 2.3 implies the following inequality regarding the quantities $\bar{\rho}$ and $\rho$:

$$ \frac{1}{N} \;\le\; \bar{\rho} \;\le\; \rho \;\le\; 1. \tag{2.36} $$
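As a sanity check (again our own sketch, not part of the original text), the sandwich in (2.36) can be verified numerically over many random draws of the basis $\mathbf{U}$; the bounds should never be violated:

```python
import numpy as np

N, M = 8, 2
for seed in range(100):
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))
    U = Q[:, : N - M]                                    # orthonormal columns
    G = U.conj().T @ np.diag(np.diag(U @ U.conj().T).real) @ U
    e = np.linalg.eigvalsh(G)                            # [rho_bar, ..., rho]
    assert 1 / N - 1e-12 <= e[0] <= e[-1] <= 1 + 1e-12   # Lemma 2.3 / (2.36)
```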

For an arbitrary xπ‘˜, let rπ‘˜ denote the residual from the projection of xπ‘˜ onto the column space ofV1. That is,

rπ‘˜ =xπ‘˜βˆ’V1VH1 xπ‘˜ =U UH xπ‘˜. (2.37)

Then, the convergence of $\mathbf{x}_k$ to an eigenvector of the unit eigenvalue is equivalent to the convergence of $\mathbf{r}_k$ to zero. The following theorem, whose proof is presented in Appendix 2.10.2, provides bounds on $\mathbf{r}_k$ as follows:

Theorem 2.2. The expected squared $\ell_2$-norm of the residual at the $k^{th}$ iteration is bounded as follows:

$$ \psi^k\,\|\mathbf{r}_0\|_2^2 \;\le\; \mathbb{E}\big[\|\mathbf{r}_k\|_2^2\big] \;\le\; \Psi^k\,\|\mathbf{r}_0\|_2^2, \tag{2.38} $$

where

$$ \Psi = \max_{1 \le j \le N-M} c(\lambda_j), \qquad \psi = \min_{1 \le j \le N-M} \bar{c}(\lambda_j), \tag{2.39} $$

$$ c(\lambda) = 1 + \frac{\mu_{\mathcal{T}}}{N}\Big( |\lambda|^2 - 1 + \delta_{\mathcal{T}}\,(\rho - 1)\,|\lambda - 1|^2 \Big), \tag{2.40} $$

$$ \bar{c}(\lambda) = 1 + \frac{\mu_{\mathcal{T}}}{N}\Big( |\lambda|^2 - 1 + \delta_{\mathcal{T}}\,(\bar{\rho} - 1)\,|\lambda - 1|^2 \Big). $$

The importance of Theorem 2.2 is twofold: First, it reveals the effect of the eigenvalues ($\lambda_j$), the eigenspace geometry ($\rho$, $\bar{\rho}$), and the amount of asynchronicity of the updates ($\delta_{\mathcal{T}}$) on the rate of convergence. In the synchronous case $\delta_{\mathcal{T}} = 0$ and $\mu_{\mathcal{T}} = N$, hence we have $\Psi = \max_{1 \le j \le N-M} |\lambda_j|^2$. This result is consistent with the well-known fact that the rate of convergence of the power iteration is determined by the second largest eigenvalue. However, in the asynchronous case ($\delta_{\mathcal{T}} > 0$), not just the eigenvalues but also the eigenspace geometry of $\mathbf{A}$ has an effect. As a result, similar matrices may have different convergence rates due to their different eigenspaces. This point will be elaborated in Section 2.4. Furthermore, in order to guarantee that $\mathbb{E}[\|\mathbf{r}_k\|_2^2] \le \varepsilon\,\|\mathbf{r}_0\|_2^2$ for a given error threshold $\varepsilon$, the inequalities in (2.38) show that it is necessary to have at least $\lfloor \log(\varepsilon) / \log(\psi) \rfloor$ iterations, and sufficient to have $\lceil \log(\varepsilon) / \log(\Psi) \rceil$ iterations.
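To make the bounds concrete, here is a small sketch (ours; $\mu_{\mathcal{T}}$ and $\delta_{\mathcal{T}}$ are taken as given parameters of the random update model, and the example values below are assumptions chosen purely for illustration) that evaluates $\Psi$ and $\psi$ from (2.39)–(2.40) and the resulting iteration counts:

```python
import numpy as np

def rate_bounds(lambdas, rho, rho_bar, mu_T, delta_T, N):
    """Psi and psi of Theorem 2.2 from the non-unit eigenvalues of A."""
    c = lambda lam, r: 1 + (mu_T / N) * (abs(lam) ** 2 - 1
                                         + delta_T * (r - 1) * abs(lam - 1) ** 2)
    Psi = max(c(lam, rho) for lam in lambdas)      # governs the upper bound
    psi = min(c(lam, rho_bar) for lam in lambdas)  # governs the lower bound
    return Psi, psi

# Hypothetical example: single-node updates (mu_T = 1, delta_T = 1).
lambdas = [0.9, 0.5 + 0.4j, -0.7]
Psi, psi = rate_bounds(lambdas, rho=0.8, rho_bar=0.4, mu_T=1, delta_T=1, N=8)

eps = 1e-6  # target: E[||r_k||^2] <= eps * ||r_0||^2
if Psi < 1:
    print("sufficient:", int(np.ceil(np.log(eps) / np.log(Psi))), "iterations")
if 0 < psi < 1:
    print("necessary :", int(np.floor(np.log(eps) / np.log(psi))), "iterations")
```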

Secondly, Theorem 2.2 reveals a region for the eigenvalues such that the residual error of the asynchronous updates is guaranteed to converge to zero in the mean-squared sense. The following corollary presents this result formally.

Corollary 2.2. Assume that all non-unit eigenvalues of $\mathbf{A}$ satisfy the following condition:

$$ \left| \lambda - \frac{\alpha}{\alpha+1} \right| < \frac{1}{\alpha+1}, \tag{2.41} $$

where

$$ \alpha = \delta_{\mathcal{T}}\,(\rho - 1). \tag{2.42} $$

Then,

$$ \lim_{k \to \infty} \mathbb{E}\big[\|\mathbf{r}_k\|_2^2\big] = 0. \tag{2.43} $$

Proof. From (2.39) it is clear that $\Psi < 1$ if and only if

$$ |\lambda|^2 - 1 + \alpha\,|\lambda - 1|^2 < 0 \tag{2.44} $$

for all non-unit eigenvalues $\lambda$. The inequality in (2.44) can be equivalently written as in (2.41). Since it implies that $\Psi < 1$, Theorem 2.2 guarantees the convergence of $\mathbb{E}[\|\mathbf{r}_k\|_2^2]$ to zero as the number of updates, $k$, goes to infinity.
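The equivalence of (2.44) and (2.41) follows by completing the square in $\lambda$; a quick randomized check of this step (our own, not from the text):

```python
import numpy as np

rng = np.random.default_rng(3)
for _ in range(100_000):
    lam = complex(rng.uniform(-2, 2), rng.uniform(-2, 2))
    alpha = rng.uniform(-0.99, 0.0)
    q = abs(lam) ** 2 - 1 + alpha * abs(lam - 1) ** 2           # condition (2.44)
    if abs(q) < 1e-9:
        continue  # skip points numerically on the boundary circle
    in_disk = abs(lam - alpha / (alpha + 1)) < 1 / (alpha + 1)  # condition (2.41)
    assert (q < 0) == in_disk
```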

An important remark is as follows: Corollary 2.2 provides a condition under which $\mathbf{r}_k$ is guaranteed to converge to a point (zero) as $k$ goes to infinity. On the other hand, $\mathbf{x}_k$ itself only converges to a random variable defined over the eigenspace of the unit eigenvalue. This is illustrated in Figure 2.1, where the eigenspace of the unit eigenvalue is spanned by the vector $[1 \;\; 1]^H$, and $\mathbf{x}_0 = [-1 \;\; 1]^H$. In the synchronous case the signal converges to a point through a deterministic trajectory, as shown in Figure 2.1a. For the random asynchronous case, Figure 2.1b illustrates the trajectories of the signals for different realizations. Convergence of $\mathbf{r}_k$ to zero implies that the limit of $\mathbf{x}_k$ always lies in the eigenspace of the unit eigenvalue (with a random orientation). Since any point in the eigenspace is an eigenvector, we can safely say that $\mathbf{x}_k$ converges to an eigenvector of the unit eigenvalue.
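The following Monte Carlo sketch (our illustration, with a randomly generated normal $\mathbf{A}$; single-node updates are one instance of the random asynchronous scheme) shows both effects: the averaged $\|\mathbf{r}_k\|_2^2$ decays toward zero, while the limit of $\mathbf{x}_k$ is a random point in the span of $\mathbf{V}_1$:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8

# Normal A with a unit eigenvalue (M = 1) and non-unit eigenvalues of
# magnitude 0.8, well inside the convergence region.
Q, _ = np.linalg.qr(rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))
lam = np.concatenate(([1.0], 0.8 * np.exp(2j * np.pi * rng.random(N - 1))))
A = Q @ np.diag(lam) @ Q.conj().T
V1, U = Q[:, :1], Q[:, 1:]
P = U @ U.conj().T                     # r_k = P x_k, the residual projector

K, trials = 300, 100
err = np.zeros(K)
limits = []
for _ in range(trials):
    x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
    for k in range(K):
        err[k] += np.linalg.norm(P @ x) ** 2
        i = rng.integers(N)            # pick one node uniformly at random
        x[i] = (A @ x)[i]              # ... and update only that node
    limits.append((V1.conj().T @ x).item())

print("mean ||r_K||^2:", err[-1] / trials)   # ~ 0: residual dies out
print("limit coords:", limits[:3])           # differ per run: random orientation
```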

Notice that the convergence region for the eigenvalues defined in (2.41) is parametrized by $\alpha$, and it is a disk on the complex plane with radius $1/(\alpha+1)$ centered at $\alpha/(\alpha+1)$. This region is visualized in Figure 2.2. Notice that $0 \le \delta_{\mathcal{T}} \le 1$ and $0 < \rho \le 1$ always hold true. As a result, $\alpha$ satisfies $-1 < \alpha \le 0$. The key observation is that the region in (2.41) grows as $\alpha$ approaches $-1$, and it is the smallest (and corresponds to the unit disk) when $\alpha = 0$. The quantity $\beta$ and the large circle in Figure 2.2 will be explained after Corollary 2.4.

Corollary 2.2 reveals the combined effect of the eigenspace geometry of $\mathbf{A}$ (quantified by $\rho$) and the amount of asynchronicity (quantified by $\delta_{\mathcal{T}}$) on the convergence of the iterations. In the case of $\delta_{\mathcal{T}} = 0$ the region reduces to the unit disk, which is the well-known condition on the eigenvalues for the synchronous updates to converge. This is an expected result since the case of $\delta_{\mathcal{T}} = 0$ corresponds to the synchronous update itself. More importantly, the synchronous updates imply $\alpha = 0$ independent of the eigenspace geometry of $\mathbf{A}$. Therefore, the convergence is determined entirely by the eigenvalues of $\mathbf{A}$ in the synchronous case.

Figure 2.1: Some realizations of the trajectories of the signal through updates for (a) the non-random synchronous case, (b) the random asynchronous case.

On the other hand, the case of asynchronous updates results in a larger convergence region for the eigenvalues. First of all, it should be noted that asynchronous updates increase the convergence region only if the eigenspace geometry of $\mathbf{A}$ permits. If $\rho = 1$ then $\alpha = 0$, and the region of convergence is not improved by asynchronous iterations. However, if $\rho < 1$ (which is the case in most practical applications), then it is possible to enlarge the region of convergence using asynchronous iterations. As $\delta_{\mathcal{T}}$ gets larger (fewer nodes are updated concurrently), $\alpha$ gets smaller, hence the convergence region gets larger. Even if just one index is left unchanged in some of the iterations, we have $\delta_{\mathcal{T}} > 0$, and the residual $\mathbf{r}_k$ can converge to zero, even when non-unit eigenvalues outside the unit circle exist. This is a remarkable property of the asynchronous updates, since the residual (hence the signal itself) would blow up in the case of synchronous updates. Notice that in the extreme case of $\delta_{\mathcal{T}} = 1$, the region of convergence is the largest possible. That is to say, updating exactly one node in each iteration maximizes the region of convergence for the eigenvalues.

At the other extreme, the synchronous update is the most restrictive case, which is formally stated in the following corollary:

Corollary 2.3. If the synchronous updates on $\mathbf{A}$ converge, then

$$ \lim_{k \to \infty} \mathbb{E}\big[\|\mathbf{r}_k\|_2^2\big] = 0 \tag{2.45} $$

for random updates on $\mathbf{A}$ with any amount of asynchronicity.

Proof. If the synchronous updates converge, then all non-unit eigenvalues of $\mathbf{A}$ satisfy $|\lambda| < 1$. Hence, they also satisfy (2.41) for any value of $\alpha$. Therefore, Corollary 2.2 ensures the convergence of the updates irrespective of the value of $\delta_{\mathcal{T}}$.

It should be clear that the converse of Corollary 2.3 is not true. To see an implication of this, consider a scenario in which a signal over a network of nodes with autonomous (asynchronous) behavior stays in the steady state. If the nodes start to operate synchronously, then it is possible for the signal to blow up. This happens if some of the eigenvalues fall outside of the reduced convergence region due to the reduction in the amount of asynchronicity.

In fact, the study in [142] claims that large-scale synchronization of neurons is an underlying mechanism of epileptic seizures. Similarly, the study in [196] presents the relation between increased neural synchrony and epilepsy as well as Parkinson’s disease. It should be noted that neural networks follow nonlinear models whereas the model we consider here is linear. Thus, results presented here do not apply to brain networks. Nevertheless, these neurobiological observations are consistent with the implications of Corollary 2.2 and Corollary 2.3 from a conceptual point of view.

Apart from the convergence of the iterations, Theorem 2.2 is also useful to characterize the case of non-converging iterations. In this regard, the following corollary presents a region for the eigenvalues such that asynchronous updates are guaranteed not to converge.

Corollary 2.4. Assume that all non-unit eigenvalues of $\mathbf{A}$ satisfy the following:

$$ \left| \lambda - \frac{\beta}{\beta+1} \right| \ge \frac{1}{\beta+1}, \tag{2.46} $$

where

$$ \beta = \delta_{\mathcal{T}}\,(\bar{\rho} - 1). \tag{2.47} $$

Then,

$$ \mathbb{E}\big[\|\mathbf{r}_k\|_2^2\big] \ge \|\mathbf{r}_0\|_2^2. \tag{2.48} $$

Furthermore, if (2.46) is satisfied with strict inequality, then $\mathbb{E}[\|\mathbf{r}_k\|_2^2]$ grows unboundedly as $k$ goes to infinity.

Figure 2.2: Regions (given in (2.41) and (2.46)) for the eigenvalues such that random asynchronous updates are guaranteed to converge and diverge, respectively. On the complex $\lambda$-plane, the figure marks the synchronous convergence region (the unit disk), the larger random-asynchronous convergence region (the disk of radius $1/(\alpha+1)$ centered at $\alpha/(\alpha+1)$), the region of no convergence for the random asynchronous case (on or outside the circle of radius $1/(\beta+1)$ centered at $\beta/(\beta+1)$), and the in-between region where both (2.41) and (2.46) are violated and convergence is inconclusive.

Proof. From (2.39) it is clear that $\psi \ge 1$ if and only if

$$ |\lambda|^2 - 1 + \beta\,|\lambda - 1|^2 \ge 0 \tag{2.49} $$

for all non-unit eigenvalues $\lambda$. The inequality in (2.49) can be equivalently written as in (2.46). Since (2.46) implies that $\psi \ge 1$, Theorem 2.2 indicates that $\mathbb{E}[\|\mathbf{r}_k\|_2^2]$ is lower bounded by $\|\mathbf{r}_0\|_2^2$. If (2.46) is satisfied strictly, then $\psi > 1$. As a result, $\mathbb{E}[\|\mathbf{r}_k\|_2^2]$ grows unboundedly as $k$ goes to infinity.

From the definitions in (2.42) and (2.47), note that $\alpha \ge \beta$ is always true due to the fact that $\rho \ge \bar{\rho}$. Therefore, the conditions in (2.41) and (2.46) describe disjoint regions on the complex plane; see Figure 2.2. Corollary 2.4 also shows that the condition $|\lambda - \beta/(\beta+1)| < 1/(\beta+1)$ is necessary for the iterations to converge, whereas the condition in (2.41) is sufficient for the convergence (both in the mean-square sense). If there exists an eigenvalue that violates both (2.41) and (2.46), then convergence is inconclusive. This region is also indicated in Figure 2.2.
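These three regions can be summarized in a small helper (our own, not from the text), classifying a given non-unit eigenvalue for given $\delta_{\mathcal{T}}$, $\rho$, and $\bar{\rho}$:

```python
def classify(lam, delta_T, rho, rho_bar):
    """Place a non-unit eigenvalue into one of the regions of Figure 2.2."""
    alpha = delta_T * (rho - 1)
    beta = delta_T * (rho_bar - 1)
    if abs(lam - alpha / (alpha + 1)) < 1 / (alpha + 1):   # (2.41)
        return "mean-squared convergence guaranteed"
    if abs(lam - beta / (beta + 1)) >= 1 / (beta + 1):     # (2.46)
        return "no convergence"
    return "inconclusive"

# Hypothetical values: lambda = -1.2 lies outside the unit circle, yet for
# delta_T = 1 and rho = 0.5 it falls inside the asynchronous convergence disk.
print(classify(-1.2, delta_T=1.0, rho=0.5, rho_bar=0.2))
```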

At this point it is important to compare the implications of Corollary 2.2 with the classical result presented in [14, 32]. Under the mild assumption that all the indices are selected sufficiently often (see [32] for the precise definition), the study [32] showed that the linear asynchronous model in (2.5) converges for any index sequence if and only if the spectral radius of $|\mathbf{A}|$ is strictly less than unity, where $|\mathbf{A}|$ denotes the matrix with element-wise absolute values of $\mathbf{A}$. On the other hand, our Corollary 2.2 allows eigenvalues with magnitudes greater than unity. Although these two results appear to be contradictory (when $\mathbf{A}$ consists of non-negative elements), the key difference is the notion of convergence. As an example, consider the matrix $\mathbf{A}_2$ defined in (2.55). Its spectral radius is exactly 1, and [32] proved that there exists a sequence of indices under which iterations on $\mathbf{A}_2$ do not converge. For example, assuming $N$ is odd, consider the index sequence generated as $i = (2k-1) \ (\mathrm{mod}\ N) + 1$. However, Corollary 2.2 proves convergence in a statistical, mean-squared sense. (See Figure 2.5.) In short, when compared with [32], Corollary 2.2 requires a weaker condition on $\mathbf{A}$ and guarantees convergence in a weaker (and probabilistic) sense.
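To see the contrast concretely, here is a hedged stand-in experiment (ours; the matrix $\mathbf{A}_2$ of (2.55) is not reproduced in this section, so an $N = 5$ cyclic shift, which likewise has spectral radius exactly 1, is used in its place): the criterion of [32] fails, while every non-unit eigenvalue still satisfies (2.41):

```python
import numpy as np

N = 5
A = np.roll(np.eye(N), 1, axis=0)      # cyclic shift; all |lambda_j| = 1

# Criterion of [32]: spectral radius of |A| strictly below 1. Here it is
# exactly 1, so convergence for *every* index sequence is not guaranteed.
print(np.abs(np.linalg.eigvals(np.abs(A))).max())        # -> 1.0

# Corollary 2.2: with single-node updates (delta_T = 1), every non-unit
# eigenvalue lies strictly inside the alpha-disk of (2.41).
lam, Q = np.linalg.eig(A)              # A is normal with distinct eigenvalues,
idx = np.argmin(np.abs(lam - 1))       # so Q is unitary up to rounding
U = np.delete(Q, idx, axis=1)
G = U.conj().T @ np.diag(np.diag(U @ U.conj().T).real) @ U
rho = np.linalg.eigvalsh(G).max()      # = (N-1)/N for this circulant
alpha = rho - 1
ok = [abs(l - alpha / (alpha + 1)) < 1 / (alpha + 1) for l in np.delete(lam, idx)]
print(all(ok))                         # -> True: mean-squared convergence
```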