• Tidak ada hasil yang ditemukan


5.3 Randomized Kaczmarz and Gauss-Seidel Algorithms

Given a complex matrix A ∈ C^{MΓ—N} and a complex vector u ∈ C^M, we will consider the following linear system:

Ax = u, (5.18)

where we assume that M β‰₯ N, i.e., the system is overdetermined, and the matrix A has full column rank. When the system is consistent we will use x★ to denote the solution of the system. When the system is inconsistent, we will use x_LS to denote the least-squares solution, that is,

x_LS = arg min_ξ β€–Aξ βˆ’ uβ€–β‚‚Β² = (A^H A)⁻¹ A^H u. (5.19)

5.3.1 Randomized Kaczmarz Algorithm

In order to solve the linear system of equations in (5.18), the Kaczmarz algorithm [93] considers the following iterative updates on the solution vector x_k:

xπ‘˜+1 =xπ‘˜ + 𝑒𝑖

π‘˜ βˆ’aH(𝑖

π‘˜)xπ‘˜

ka(π‘–π‘˜)k2

2

a(𝑖

π‘˜), (5.20)

where a^H(j) denotes the j-th row of the matrix A, and i_k denotes the index selected at the k-th iteration of the algorithm. In words, the Kaczmarz algorithm selects a row from the matrix A, and then updates the solution x_k accordingly.

In the randomized variant of the algorithm, the j-th row is selected randomly and independently with probability p_j [81, 165]. In this case, the Kaczmarz algorithm can be represented as a randomly switching system, where the sets A and U consist of the following elements for 1 ≀ j ≀ M:

A_j = I βˆ’ (1/β€–a(j)β€–β‚‚Β²) a(j) a^H(j),   u_j = (u_j / β€–a(j)β€–β‚‚Β²) a(j). (5.21)

Note that the A_j's are orthogonal projections; thus the Kaczmarz algorithm "switches between orthogonal projections."
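As a concrete illustration, the randomized update can be sketched in Python/NumPy as follows. This is our own sketch (the function name and test problem are illustrative, not from the source), using the row-norm probabilities p_j = β€–a(j)β€–β‚‚Β²/β€–Aβ€–_FΒ² as the default:

```python
import numpy as np

def randomized_kaczmarz(A, u, num_iters=5000, p=None, seed=0):
    """Randomized Kaczmarz (5.20): pick row i_k w.p. p[i_k] and project
    x_k onto the hyperplane a^H(i_k) x = u_{i_k}."""
    rng = np.random.default_rng(seed)
    M, N = A.shape
    if p is None:
        # row-norm probabilities p_j = ||a(j)||_2^2 / ||A||_F^2
        p = np.linalg.norm(A, axis=1) ** 2 / np.linalg.norm(A, 'fro') ** 2
    x = np.zeros(N, dtype=A.dtype)
    for _ in range(num_iters):
        i = rng.choice(M, p=p)
        a = A[i]
        x = x + (u[i] - a.conj() @ x) / (a.conj() @ a) * a
    return x

# Consistent overdetermined system: x_k converges to the unique solution.
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))
x_star = rng.standard_normal(5)
u = A @ x_star
x = randomized_kaczmarz(A, u)
print(np.linalg.norm(x - x_star))  # tiny: x has converged to x_star
```

By Corollary 5.1 below, any set of nonzero probabilities would also work on this consistent system; the row-norm choice affects the convergence rate and, for inconsistent systems, the fixed-point of the average system.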

When the formalism of the switching systems is applied to the Kaczmarz algorithm, we get the following average state-transition matrix and the average input signal:

AΜ„ = I βˆ’ A^H P W⁻² A,   ū = A^H P W⁻² u, (5.22)

where P and W are diagonal matrices of size M with j-th diagonal entries p_j and β€–a(j)β€–β‚‚, respectively. Thus, the fixed-point of the average system can be found as follows:

xΜ„ = (A^H P W⁻² A)⁻¹ A^H P W⁻² u. (5.23)

So, the fixed-point of the average system depends on the update probabilities in general. Nevertheless, when the probabilities are selected as p_j = β€–a(j)β€–β‚‚Β²/β€–Aβ€–_FΒ² (i.e., P = WΒ²/β€–Aβ€–_FΒ²), the fixed-point of the average system becomes xΜ„ = x_LS. When the linear system of equations in (5.18) is inconsistent, the fixed-point of the average system does not satisfy the individual systems in (5.21). Namely, xΜ„ = A_j xΜ„ + u_j does not hold true for all 1 ≀ j ≀ M. Thus, the random vector x_k does not converge to xΜ„ even when xΜ„ corresponds to x_LS. As a result, Kaczmarz iterations cannot obtain the least-squares solution.

When the linear system of equations in (5.18) is consistent, i.e., there exists x★ such that A x★ = u, the fixed-point of the average system becomes xΜ„ = x★ irrespective of the update probabilities. Furthermore, xΜ„ = A_j xΜ„ + u_j holds true for all 1 ≀ j ≀ M. Thus, the random vector x_k converges to the solution of the linear system x★ as long as the condition given by Lemma 5.2 is met. In fact, the convergence of the randomized Kaczmarz algorithm can be guaranteed for any set of probabilities.

More precisely, we have the following:

Corollary 5.1. When the linear system in (5.18) is consistent, the randomized Kaczmarz algorithm converges to the unique solution of (5.18) in the mean-squared sense (and almost surely) for any set of nonzero probabilities.

Proof. We note the following inequalities:

Ξ¦ = Ξ£_{j=1}^{M} p_j A_j* βŠ— A_j βͺ― Ξ£_{j=1}^{M} p_j I_N βŠ— A_j = I_N βŠ— AΜ„ (5.24)
  = I_{NΒ²} βˆ’ I_N βŠ— (A^H P W⁻² A) β‰Ί I_{NΒ²}, (5.25)

where I_N denotes the identity matrix of size N. The inequality in (5.24) follows from the fact that 0 βͺ― A_j βͺ― I. The strict inequality in (5.25) follows from the assumption that the matrix A has full column rank, and the probabilities satisfy p_j > 0. The fact that Ξ¦ βͺ° 0 and the inequality (5.25) imply that ρ(Ξ¦) < 1. Then, Lemma 5.2 proves the claimed convergence.

We note that the mean-squared convergence of the randomized Kaczmarz algorithm was first proved explicitly in [165] for the probabilities p_j = β€–a(j)β€–β‚‚Β²/β€–Aβ€–_FΒ². The almost sure convergence of the algorithm was shown in [36]. The study [46] considered the optimal set of probabilities that minimizes the upper bound in (5.24), and [3] considered the optimal set of probabilities that minimizes ρ(Ξ¦) itself. Since Ξ¦ is a positive semi-definite matrix by construction in the case of the Kaczmarz algorithm, the optimal selection of the probabilities was based on semi-definite programming in both [46] and [3].

Although the first convergence proof of the randomized Kaczmarz algorithm is attributed to the study in [165], it is also possible to find convergence proofs for the algorithm in the control theory literature. In particular, the study [23] considered the updates in (5.21) as an "adaptive filtering" (see [23, Eq. (20)]) and proved the almost sure convergence of the iterations (see [23, Theorem 7]). In addition, the book [42] considered the same update scheme as an application of its results (see [42, Section 3.6.2]) and proved the almost sure convergence for the more general case of indices being selected according to an ergodic Markov chain. We note that the original form of the Kaczmarz algorithm [93] considers the use of consecutive indices, which is, in fact, equivalent to a Markov chain on a directed cycle graph. So, [42, Lemma 3.53] proves the convergence of the original Kaczmarz algorithm as well as its randomized variant from the viewpoint of switching systems.
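As a small numerical check of this viewpoint (our sketch, not code from [93] or [42]), deterministic cyclic sweeps, i.e., the index chain on the directed cycle, also drive the error to zero on a consistent system:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((15, 4))
x_star = rng.standard_normal(4)
u = A @ x_star                        # consistent system

x = np.zeros(4)
for k in range(4500):
    i = k % A.shape[0]                # consecutive (cyclic) index selection
    a = A[i]
    x = x + (u[i] - a @ x) / (a @ a) * a

print(np.linalg.norm(x - x_star))  # tiny: cyclic Kaczmarz converged
```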

Although the random vector x_k in the Kaczmarz algorithm does not converge to the least-squares solution x_LS in the case of an inconsistent system of equations, it is in fact possible to obtain the solution x_LS using a sample average. In this regard, we first define the sample average (of the first K iterations) y_K as follows:

y_K = (1/K) Ξ£_{k=1}^{K} x_k. (5.26)

We now note thatE[xπ‘˜]isthe ensemble averageof the random vectorxπ‘˜, and as the iterations progress the ensemble average converges to Β―x. See (5.10). This is due to the stability of the matrix Β―Aproven in Corollary 5.1. Since the independent selection

of the indices forms anergodic Markov chain, the sample average converges to the ensemble average as more samples are used. More precisely,

πΎβ†’βˆžlim ky𝐾 βˆ’xkΒ― 2

2=0, (5.27)

which follows from the stability of the matrix Ξ¦. Thus, the random vector y_K converges to xΜ„ in (5.23) in the mean-squared sense whether the system of equations is consistent or not. Although this convergence behavior holds true for any set of index selection probabilities, the fixed-point of the average system xΜ„ becomes the least-squares solution only when the probabilities are selected as P = WΒ²/β€–Aβ€–_FΒ². With this particular selection of the probabilities, we can claim the following:

p_j = β€–a(j)β€–β‚‚Β²/β€–Aβ€–_FΒ²  β‡’  lim_{Kβ†’βˆž} β€–y_K βˆ’ x_LSβ€–β‚‚Β² = 0, (5.28)

which shows that the sample average y_K converges to the least-squares solution in the mean-squared sense.
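The following sketch (ours; it uses `numpy.linalg.lstsq` only to obtain the reference x_LS) illustrates this behavior on an inconsistent system: the iterate x_k keeps fluctuating, while the running sample average y_K settles near x_LS.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 30, 4
A = rng.standard_normal((M, N))
u = rng.standard_normal(M)                 # generic u: inconsistent system
x_ls = np.linalg.lstsq(A, u, rcond=None)[0]

# Row-norm probabilities, so that the average fixed-point is x_LS.
p = np.linalg.norm(A, axis=1) ** 2 / np.linalg.norm(A, 'fro') ** 2
K = 200_000
idx = rng.choice(M, size=K, p=p)           # pre-draw the random row indices
x = np.zeros(N)
y = np.zeros(N)                            # running sample average y_K
for k, i in enumerate(idx, start=1):
    a = A[i]
    x = x + (u[i] - a @ x) / (a @ a) * a   # Kaczmarz update (5.20)
    y += (x - y) / k                       # y_k = ((k-1) y_{k-1} + x_k) / k

print(np.linalg.norm(x - x_ls))            # stays bounded away from 0
print(np.linalg.norm(y - x_ls))            # small: y_K approaches x_LS
```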

5.3.2 Randomized Gauss-Seidel Algorithm

Another approach for solving the system of equations in (5.18) is the Gauss-Seidel algorithm, which updates the solution vector x_k iteratively according to the following scheme:

xπ‘˜+1 =xπ‘˜+ aH𝑖

π‘˜ uβˆ’A xπ‘˜) kaπ‘–π‘˜k2

2

eπ‘–π‘˜, (5.29)

where a_j and e_j denote the j-th columns of the matrix A and the identity matrix I, respectively. The index selected at the k-th iteration is denoted by i_k. In words, the Gauss-Seidel algorithm selects a column from the matrix A, and then updates only the corresponding entry of the solution vector. In the randomized variant, the j-th column is selected randomly and independently with probability p_j [107]. So, the algorithm can be represented as a randomly switching system, where the sets A and U consist of the following elements for 1 ≀ j ≀ N:

A_j = I βˆ’ (1/β€–a_jβ€–β‚‚Β²) e_j a^H_j A,   u_j = e_j a^H_j u / β€–a_jβ€–β‚‚Β². (5.30)

We note that the A_j's defined in (5.30) satisfy A_jΒ² = A_j, but they are not Hermitian, i.e., A_j^H β‰  A_j, in general. Thus, the Gauss-Seidel algorithm "switches between oblique projections."
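A minimal real-valued sketch of the randomized Gauss-Seidel update (our code, not from [107]), using the column-norm probabilities p_j = β€–a_jβ€–β‚‚Β²/β€–Aβ€–_FΒ²; note that, unlike Kaczmarz, the iterate itself approaches x_LS even for an inconsistent system:

```python
import numpy as np

def randomized_gauss_seidel(A, u, num_iters=20000, seed=0):
    """Randomized Gauss-Seidel (5.29): pick column j w.p. ~ ||a_j||^2
    and update only entry j of x."""
    rng = np.random.default_rng(seed)
    M, N = A.shape
    p = np.linalg.norm(A, axis=0) ** 2 / np.linalg.norm(A, 'fro') ** 2
    x = np.zeros(N)
    for _ in range(num_iters):
        j = rng.choice(N, p=p)
        a = A[:, j]
        x[j] += a @ (u - A @ x) / (a @ a)   # coordinate update of entry j
    return x

# Inconsistent system: x_k itself converges to the least-squares solution.
rng = np.random.default_rng(2)
A = rng.standard_normal((25, 6))
u = rng.standard_normal(25)
x = randomized_gauss_seidel(A, u)
x_ls = np.linalg.lstsq(A, u, rcond=None)[0]
print(np.linalg.norm(x - x_ls))  # tiny: x has converged to x_LS
```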

When the formalism of the switching systems is applied to the Gauss-Seidel algorithm, we get the following average state-transition matrix and the average input signal:

AΜ„ = I βˆ’ P W⁻² A^H A,   ū = P W⁻² A^H u, (5.31)

where P and W are diagonal matrices of size N with j-th diagonal entries p_j and β€–a_jβ€–β‚‚, respectively. So, the fixed-point of the average system becomes xΜ„ = x_LS irrespective of the update probabilities. More importantly, unlike the Kaczmarz method, the point xΜ„ satisfies xΜ„ = A_j xΜ„ + u_j for all 1 ≀ j ≀ N whether the set of linear equations (5.18) is consistent or not. Thus, the random vector x_k indeed converges to the least-squares solution in the mean-squared sense as long as the condition given by Lemma 5.2 is met. In fact, the convergence of the randomized Gauss-Seidel algorithm can be guaranteed for any set of probabilities. More precisely, we have the following:

Corollary 5.2. The randomized Gauss-Seidel method converges to the least-squares solution of (5.18) in the mean-squared sense (and almost surely) for any set of nonzero probabilities.

Proof. We will show that the third statement in Lemma 5.3 holds, which in turn implies the convergence of the iterations. In this regard, take X = (A^H A)⁻¹, and note that X ≻ 0. Then,

X βˆ’ Ξ£_{j=1}^{N} p_j A_j X A_j^H = P W⁻² ≻ 0, (5.32)

where the positive-definiteness follows from the fact that all the probabilities are nonzero. This proves the claim.

We note that the mean-squared convergence of the randomized Gauss-Seidel algorithm was first proved explicitly in [107] for the probabilities p_j = β€–a_jβ€–β‚‚Β²/β€–Aβ€–_FΒ². We refer to [132, 80, 110, 122] (and references therein) for more results involving randomized Kaczmarz and Gauss-Seidel algorithms and their extensions.

5.3.3 Randomized Asynchronous Fixed-Point Iterations

When the system in (5.18) is "square," i.e., M = N, fixed-point iterations provide an alternative approach for obtaining numerical solutions to linear systems of equations [30, 31, 190]. Asynchronous variants of fixed-point iterations have also been studied in detail in both non-random [32, 14, 19, 20] and random [9] settings. We have also studied random asynchronous fixed-point iterations in the context of graph signal processing for the distributed and asynchronous implementation of graph filters in Chapters 3 and 4.

For a given matrix A ∈ C^{NΓ—N} and an input signal u ∈ C^N, in this section we consider the following random asynchronous fixed-point iterations:

(xπ‘˜+1)𝑖 =





ο£²



ο£³

(A xπ‘˜)𝑖+𝑒𝑖, w.p. 𝑝𝑖, (xπ‘˜)𝑖, w.p. 1βˆ’π‘π‘–,

(5.33) where the𝑖𝑑 β„Ž index of the vectorxπ‘˜ is updated with probability 𝑝𝑖independently in every iteration. So, there are 2𝑁 different ways of updating the vectorxπ‘˜ in every iteration, and the updates in (5.33) can be written as a randomly switching system with the setsAandUconsisting of 2𝑁 elements.
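A sketch of the model (5.33) follows (ours; the test matrix is deliberately made contractive, β€–Aβ€–β‚‚ < 1, so that the sufficient condition A^H P A β‰Ί P discussed later in this section holds for any probabilities in [0.2, 1]):

```python
import numpy as np

# Random asynchronous fixed-point iterations (5.33): entry i of x is
# replaced by (A x)_i + u_i with probability p_i, independently.
rng = np.random.default_rng(3)
N = 6
A = rng.standard_normal((N, N))
A *= 0.4 / np.linalg.norm(A, 2)       # enforce ||A||_2 = 0.4 < 1
u = rng.standard_normal(N)
p = rng.uniform(0.2, 1.0, size=N)     # nonuniform update probabilities

x = np.zeros(N)
for _ in range(5000):
    mask = rng.random(N) < p          # which entries fire this iteration
    x = np.where(mask, A @ x + u, x)

x_fixed = np.linalg.solve(np.eye(N) - A, u)
print(np.linalg.norm(x - x_fixed))  # tiny: x reached (I - A)^{-1} u
```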

More precisely, first enumerate all subsets of {1, Β·Β·Β·, N}, and let T_j denote the j-th subset. Then, the j-th elements of the sets A and U can be written as follows for 1 ≀ j ≀ 2^N:

A_j = I βˆ’ D_{T_j} (I βˆ’ A),   u_j = D_{T_j} u, (5.34)

where D_{T_j} is a diagonal matrix that has 1's at the indices belonging to the set T_j and 0's elsewhere. Furthermore, the probability q_j of switching to A_j can be written as follows:

π‘žπ‘— = Γ–

π‘–βˆˆT𝑗

𝑝𝑖 Γ–

π‘–βˆ‰T𝑗

(1βˆ’ 𝑝𝑖)

. (5.35)

When the formalism of the switching systems is applied to the model (5.33), we get the following average state-transition matrix and the average input signal:

AΜ„ = I βˆ’ P (I βˆ’ A),   ū = P u, (5.36)

where P = diag([p_1 p_2 Β·Β·Β· p_N]) is a diagonal matrix consisting of the update probabilities of the model (5.33). So, the fixed point of the average system is the same as the fixed point of the pair (A, u), namely, xΜ„ = (I βˆ’ A)⁻¹ u irrespective of the update probabilities. Furthermore, the point xΜ„ satisfies xΜ„ = A_j xΜ„ + u_j for all 1 ≀ j ≀ 2^N. Thus, the random vector x_k converges to xΜ„ as long as the condition (5.13) is met.

Unlike the randomized Kaczmarz and Gauss-Seidel algorithms, random asynchronous fixed-point iterations may not converge for an arbitrary set of probabilities. Nevertheless, the convergence of the updates can be verified via the stability of the matrix Ξ¦ defined in (5.13). With the matrices in (5.34) and the probabilities in (5.35), the matrix Ξ¦ can be written explicitly as follows in the case of asynchronous fixed-point iterations (see Corollary 4.2 and the proof in Section 4.8.6):

Ξ¦ = AΜ„* βŠ— AΜ„ + ((I βˆ’ P) βŠ— P) J ((A* βˆ’ I) βŠ— (A βˆ’ I)), (5.37)

where J is a diagonal matrix of size NΒ² with diagonal entries being equal to the vectorized identity matrix of size N; see (4.39).

We know that stability of A, i.e., ρ(A) < 1, is both necessary and sufficient for synchronous fixed-point iterations to converge. However, the most important observation regarding (5.37) is that the stabilities of the matrices Ξ¦ and A do not imply each other. So, an unstable system (in the synchronous world) may converge with randomized asynchronicity. Furthermore, the eigenvectors (and not just the eigenvalues) of the matrix A are also important in determining the convergence in the random asynchronous case (see Section 4.4.4). We also note that the condition A^H P A β‰Ί P is shown to be sufficient for the convergence of the updates in Corollary 3.1 and Corollary 4.3, and it is more relaxed than the necessary condition in the non-random setting [32, 14] (see Lemma 3.1).
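The expression (5.37) is straightforward to evaluate numerically. The sketch below (ours) builds Ξ¦ and checks two consistency points: with p_i = 1 (fully synchronous) the second term vanishes and Ξ¦ reduces to A* βŠ— A, and with uniform probabilities and β€–Aβ€–β‚‚ < 1 the sufficient condition A^H P A β‰Ί P holds, so ρ(Ξ¦) < 1:

```python
import numpy as np

def phi_matrix(A, p):
    """Build Phi of (5.37) for update probabilities p (P = diag(p))."""
    N = A.shape[0]
    I = np.eye(N)
    P = np.diag(p)
    A_bar = I - P @ (I - A)                 # average system matrix (5.36)
    J = np.diag(np.eye(N).ravel())          # diag of vec(I_N), size N^2
    return (np.kron(A_bar.conj(), A_bar)
            + np.kron(I - P, P) @ J @ np.kron(A.conj() - I, A - I))

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
A *= 0.5 / np.linalg.norm(A, 2)             # enforce ||A||_2 = 0.5 < 1

# Synchronous limit p_i = 1: Phi reduces to conj(A) kron A.
print(np.allclose(phi_matrix(A, np.ones(3)), np.kron(A.conj(), A)))

# Uniform p with ||A||_2 < 1 satisfies A^H P A < P, so rho(Phi) < 1.
rho = max(abs(np.linalg.eigvals(phi_matrix(A, 0.5 * np.ones(3)))))
print(rho)  # < 1
```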

The randomized model (5.33) can also be extended to have time-varying input signals, where the vector u changes as the iterations progress. In this case, the updates become a randomized and asynchronous variant of the discrete-time state-space model, and it is possible to study the "frequency response" of such systems in a statistical sense. This aspect is studied in Chapter 4.