• Tidak ada hasil yang ditemukan


5.3 Randomized Kaczmarz and Gauss-Seidel Algorithms

Given a complex matrix A ∈ C^{MΓ—N} and a complex vector u ∈ C^M, we will consider the following linear system:

Ax = u, (5.18)

where we assume that M β‰₯ N, i.e., the system is overdetermined, and the matrix A has full column rank. When the system is consistent we will use x★ to denote the solution of the system. When the system is inconsistent, we will use x_LS to denote the least-squares solution, that is,

x_LS = arg min_ξ β€–Aξ βˆ’ uβ€–β‚‚Β² = (A^H A)⁻¹ A^H u. (5.19)

5.3.1 Randomized Kaczmarz Algorithm

In order to solve the linear system of equations in (5.18), the Kaczmarz algorithm [93] considers the following iterative updates on the solution vector x_k:

xπ‘˜+1 =xπ‘˜ + 𝑒𝑖

π‘˜ βˆ’aH(𝑖

π‘˜)xπ‘˜

ka(π‘–π‘˜)k2

2

a(𝑖

π‘˜), (5.20)

where a^H(j) denotes the j-th row of the matrix A, and i_k denotes the index selected at the k-th iteration of the algorithm. In words, the Kaczmarz algorithm selects a row from the matrix A, and then updates the solution x_k accordingly.

In the randomized variant of the algorithm, the j-th row is selected randomly and independently with probability p_j [81, 165]. In this case, the Kaczmarz algorithm can be represented as a randomly switching system, where the sets A and U consist of the following elements for 1 ≀ j ≀ M:

A_j = I βˆ’ (1/β€–a(j)β€–β‚‚Β²) a(j) a^H(j),   u_j = (u_j / β€–a(j)β€–β‚‚Β²) a(j). (5.21)

Note that the A_j's are orthogonal projections; thus the Kaczmarz algorithm "switches between orthogonal projections."
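As a concrete illustration, the randomized update can be sketched in Python/NumPy as follows. This is our own sketch (the function name and test problem are illustrative, not from the source), using the row-norm probabilities p_j = β€–a(j)β€–β‚‚Β²/β€–Aβ€–_FΒ² as the default:

```python
import numpy as np

def randomized_kaczmarz(A, u, num_iters=5000, p=None, seed=0):
    """Randomized Kaczmarz (5.20): pick row i_k w.p. p[i_k] and project
    x_k onto the hyperplane a^H(i_k) x = u_{i_k}."""
    rng = np.random.default_rng(seed)
    M, N = A.shape
    if p is None:
        # row-norm probabilities p_j = ||a(j)||_2^2 / ||A||_F^2
        p = np.linalg.norm(A, axis=1) ** 2 / np.linalg.norm(A, 'fro') ** 2
    x = np.zeros(N, dtype=A.dtype)
    for _ in range(num_iters):
        i = rng.choice(M, p=p)
        a = A[i]
        x = x + (u[i] - a.conj() @ x) / (a.conj() @ a) * a
    return x

# Consistent overdetermined system: x_k converges to the unique solution.
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))
x_star = rng.standard_normal(5)
u = A @ x_star
x = randomized_kaczmarz(A, u)
print(np.linalg.norm(x - x_star))  # tiny: x has converged to x_star
```

By Corollary 5.1 below, any set of nonzero probabilities would also work on this consistent system; the row-norm choice affects the convergence rate and, for inconsistent systems, the fixed-point of the average system.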

When the formalism of the switching systems is applied to the Kaczmarz algorithm, we get the following average state-transition matrix and the average input signal:

AΜ„ = I βˆ’ A^H P W⁻² A,   ū = A^H P W⁻² u, (5.22)

where P and W are diagonal matrices of size M with j-th diagonal entries p_j and β€–a(j)β€–β‚‚, respectively. Thus, the fixed-point of the average system can be found as follows:

xΜ„ = (A^H P W⁻² A)⁻¹ A^H P W⁻² u. (5.23)

So, the fixed-point of the average system depends on the update probabilities in general. Nevertheless, when the probabilities are selected as p_j = β€–a(j)β€–β‚‚Β²/β€–Aβ€–_FΒ² (i.e., P = WΒ²/β€–Aβ€–_FΒ²), the fixed-point of the average system becomes xΜ„ = x_LS. When the linear system of equations in (5.18) is inconsistent, the fixed-point of the average system does not satisfy the individual systems in (5.21). Namely, xΜ„ = A_j xΜ„ + u_j does not hold true for all 1 ≀ j ≀ M. Thus, the random vector x_k does not converge to xΜ„ even when xΜ„ corresponds to x_LS. As a result, Kaczmarz iterations cannot obtain the least-squares solution.

When the linear system of equations in (5.18) is consistent, i.e., there exists x★ such that A x★ = u, the fixed-point of the average system becomes xΜ„ = x★ irrespective of the update probabilities. Furthermore, xΜ„ = A_j xΜ„ + u_j holds true for all 1 ≀ j ≀ M. Thus, the random vector x_k converges to the solution of the linear system x★ as long as the condition given by Lemma 5.2 is met. In fact, the convergence of the randomized Kaczmarz algorithm can be guaranteed for any set of probabilities.

More precisely, we have the following:

Corollary 5.1. When the linear system in (5.18) is consistent, the randomized Kaczmarz algorithm converges to the unique solution of (5.18) in the mean-squared sense (and almost surely) for any set of nonzero probabilities.

Proof. We note the following inequalities:

Ξ¦ = Ξ£_{j=1}^{M} p_j A_j* βŠ— A_j βͺ― Ξ£_{j=1}^{M} p_j I_N βŠ— A_j = I_N βŠ— AΜ„ (5.24)
  = I_{NΒ²} βˆ’ I_N βŠ— (A^H P W⁻² A) β‰Ί I_{NΒ²}, (5.25)

where I_N denotes the identity matrix of size N. The inequality in (5.24) follows from the fact that 0 βͺ― A_j βͺ― I. The strict inequality in (5.25) follows from the assumption that the matrix A has full column rank, and the probabilities satisfy p_j > 0. The fact that Ξ¦ βͺ° 0 and the inequality (5.25) imply that ρ(Ξ¦) < 1. Then, Lemma 5.2 proves the claimed convergence.

We note that the mean-squared convergence of the randomized Kaczmarz algorithm was first proved explicitly in [165] for the probabilities p_j = β€–a(j)β€–β‚‚Β²/β€–Aβ€–_FΒ². The almost sure convergence of the algorithm was shown in [36]. The study [46] considered the optimal set of probabilities that minimizes the upper bound in (5.24), and [3] considered the optimal set of probabilities that minimizes ρ(Ξ¦) itself. Since Ξ¦ is a positive semi-definite matrix by construction in the case of the Kaczmarz algorithm, the optimal selection of the probabilities was based on semi-definite programming in both [46] and [3].

Although the first convergence proof of the randomized Kaczmarz algorithm is attributed to the study in [165], it is also possible to find convergence proofs for the algorithm in the control theory literature. In particular, the study [23] considered the updates in (5.21) as an "adaptive filtering" (see [23, Eq. (20)]) and proved the almost sure convergence of the iterations (see [23, Theorem 7]). In addition, the book [42] considered the same update scheme as an application of its results (see [42, Section 3.6.2]) and proved the almost sure convergence for the more general case of indices being selected according to an ergodic Markov chain. We note that the original form of the Kaczmarz algorithm [93] considers the use of consecutive indices, which is, in fact, equivalent to a Markov chain on a directed cycle graph. So, [42, Lemma 3.53] proves the convergence of the original Kaczmarz algorithm as well as its randomized variant from the viewpoint of switching systems.
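As a small numerical check of this viewpoint (our sketch, not code from [93] or [42]), deterministic cyclic sweeps, i.e., the index chain on the directed cycle, also drive the error to zero on a consistent system:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((15, 4))
x_star = rng.standard_normal(4)
u = A @ x_star                        # consistent system

x = np.zeros(4)
for k in range(4500):
    i = k % A.shape[0]                # consecutive (cyclic) index selection
    a = A[i]
    x = x + (u[i] - a @ x) / (a @ a) * a

print(np.linalg.norm(x - x_star))  # tiny: cyclic Kaczmarz converged
```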

Although the random vector x_k in the Kaczmarz algorithm does not converge to the least-squares solution x_LS in the case of an inconsistent system of equations, it is in fact possible to obtain the solution x_LS using a sample average. In this regard, we first define the sample average (of the first K iterations) y_K as follows:

y_K = (1/K) Ξ£_{k=1}^{K} x_k. (5.26)

We now note thatE[xπ‘˜]isthe ensemble averageof the random vectorxπ‘˜, and as the iterations progress the ensemble average converges to Β―x. See (5.10). This is due to the stability of the matrix Β―Aproven in Corollary 5.1. Since the independent selection

of the indices forms anergodic Markov chain, the sample average converges to the ensemble average as more samples are used. More precisely,

πΎβ†’βˆžlim ky𝐾 βˆ’xkΒ― 2

2=0, (5.27)

which follows from the stability of the matrix Ξ¦. Thus, the random vector y_K converges to xΜ„ in (5.23) in the mean-squared sense whether the system of equations is consistent or not. Although this convergence behavior holds true for any set of index selection probabilities, the fixed-point of the average system xΜ„ becomes the least-squares solution only when the probabilities are selected as P = WΒ²/β€–Aβ€–_FΒ². With this particular selection of the probabilities, we can claim the following:

p_j = β€–a(j)β€–β‚‚Β²/β€–Aβ€–_FΒ²  β‡’  lim_{Kβ†’βˆž} β€–y_K βˆ’ x_LSβ€–β‚‚Β² = 0, (5.28)

which shows that the sample average y_K converges to the least-squares solution in the mean-squared sense.
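The following sketch (ours; it uses `numpy.linalg.lstsq` only to obtain the reference x_LS) illustrates this behavior on an inconsistent system: the iterate x_k keeps fluctuating, while the running sample average y_K settles near x_LS.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 30, 4
A = rng.standard_normal((M, N))
u = rng.standard_normal(M)                 # generic u: inconsistent system
x_ls = np.linalg.lstsq(A, u, rcond=None)[0]

# Row-norm probabilities, so that the average fixed-point is x_LS.
p = np.linalg.norm(A, axis=1) ** 2 / np.linalg.norm(A, 'fro') ** 2
K = 200_000
idx = rng.choice(M, size=K, p=p)           # pre-draw the random row indices
x = np.zeros(N)
y = np.zeros(N)                            # running sample average y_K
for k, i in enumerate(idx, start=1):
    a = A[i]
    x = x + (u[i] - a @ x) / (a @ a) * a   # Kaczmarz update (5.20)
    y += (x - y) / k                       # y_k = ((k-1) y_{k-1} + x_k) / k

print(np.linalg.norm(x - x_ls))            # stays bounded away from 0
print(np.linalg.norm(y - x_ls))            # small: y_K approaches x_LS
```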

5.3.2 Randomized Gauss-Seidel Algorithm

Another approach for solving the system of equations in (5.18) is the Gauss-Seidel algorithm, which updates the solution vector x_k iteratively according to the following scheme:

xπ‘˜+1 =xπ‘˜+ aH𝑖

π‘˜ uβˆ’A xπ‘˜) kaπ‘–π‘˜k2

2

eπ‘–π‘˜, (5.29)

where a_j and e_j denote the j-th columns of the matrix A and the identity matrix I, respectively. The index selected at the k-th iteration is denoted by i_k. In words, the Gauss-Seidel algorithm selects a column from the matrix A, and then updates only the corresponding entry of the solution vector. In the randomized variant, the j-th column is selected randomly and independently with probability p_j [107]. So, the algorithm can be represented as a randomly switching system, where the sets A and U consist of the following elements for 1 ≀ j ≀ N:

A_j = I βˆ’ (1/β€–a_jβ€–β‚‚Β²) e_j a^H_j A,   u_j = e_j a^H_j u / β€–a_jβ€–β‚‚Β². (5.30)

We note that the A_j's defined in (5.30) satisfy A_jΒ² = A_j, but they are not Hermitian, i.e., A_j^H β‰  A_j, in general. Thus, the Gauss-Seidel algorithm "switches between oblique projections."
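A minimal real-valued sketch of the randomized Gauss-Seidel update (our code, not from [107]), using the column-norm probabilities p_j = β€–a_jβ€–β‚‚Β²/β€–Aβ€–_FΒ²; note that, unlike Kaczmarz, the iterate itself approaches x_LS even for an inconsistent system:

```python
import numpy as np

def randomized_gauss_seidel(A, u, num_iters=20000, seed=0):
    """Randomized Gauss-Seidel (5.29): pick column j w.p. ~ ||a_j||^2
    and update only entry j of x."""
    rng = np.random.default_rng(seed)
    M, N = A.shape
    p = np.linalg.norm(A, axis=0) ** 2 / np.linalg.norm(A, 'fro') ** 2
    x = np.zeros(N)
    for _ in range(num_iters):
        j = rng.choice(N, p=p)
        a = A[:, j]
        x[j] += a @ (u - A @ x) / (a @ a)   # coordinate update of entry j
    return x

# Inconsistent system: x_k itself converges to the least-squares solution.
rng = np.random.default_rng(2)
A = rng.standard_normal((25, 6))
u = rng.standard_normal(25)
x = randomized_gauss_seidel(A, u)
x_ls = np.linalg.lstsq(A, u, rcond=None)[0]
print(np.linalg.norm(x - x_ls))  # tiny: x has converged to x_LS
```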

When the formalism of the switching systems is applied to the Gauss-Seidel algorithm, we get the following average state-transition matrix and the average input signal:

AΜ„ = I βˆ’ P W⁻² A^H A,   ū = P W⁻² A^H u, (5.31)

where P and W are diagonal matrices of size N with j-th diagonal entries p_j and β€–a_jβ€–β‚‚, respectively. So, the fixed-point of the average system becomes xΜ„ = x_LS irrespective of the update probabilities. More importantly, unlike the Kaczmarz method, the point xΜ„ satisfies xΜ„ = A_j xΜ„ + u_j for all 1 ≀ j ≀ N whether the set of linear equations (5.18) is consistent or not. Thus, the random vector x_k indeed converges to the least-squares solution in the mean-squared sense as long as the condition given by Lemma 5.2 is met. In fact, the convergence of the randomized Gauss-Seidel algorithm can be guaranteed for any set of probabilities. More precisely, we have the following:

Corollary 5.2. The randomized Gauss-Seidel method converges to the least-squares solution of (5.18) in the mean-squared sense (and almost surely) for any set of nonzero probabilities.

Proof. We will show that the third statement in Lemma 5.3 holds, which in turn implies the convergence of the iterations. In this regard, take X = (A^H A)⁻¹, and note that X ≻ 0. Then,

X βˆ’ Ξ£_{j=1}^{N} p_j A_j X A_j^H = P W⁻² ≻ 0, (5.32)

where the positive-definiteness follows from the fact that all the probabilities are nonzero. This proves the claim.

We note that the mean-squared convergence of the randomized Gauss-Seidel algorithm was first proved explicitly in [107] for the probabilities p_j = β€–a_jβ€–β‚‚Β²/β€–Aβ€–_FΒ². We refer to [132, 80, 110, 122] (and references therein) for more results involving randomized Kaczmarz and Gauss-Seidel algorithms and their extensions.

5.3.3 Randomized Asynchronous Fixed-Point Iterations

When the system in (5.18) is "square," i.e., M = N, fixed-point iterations provide an alternative approach for obtaining numerical solutions to linear systems of equations [30, 31, 190]. Asynchronous variants of fixed-point iterations have also been studied in detail in both non-random [32, 14, 19, 20] and random [9] settings. We have also studied random asynchronous fixed-point iterations in the context of graph signal processing for the distributed and asynchronous implementation of graph filters in Chapters 3 and 4.

For a given matrix A ∈ C^{NΓ—N} and an input signal u ∈ C^N, in this section we consider the following random asynchronous fixed-point iterations:

(xπ‘˜+1)𝑖 =





ο£²



ο£³

(A xπ‘˜)𝑖+𝑒𝑖, w.p. 𝑝𝑖, (xπ‘˜)𝑖, w.p. 1βˆ’π‘π‘–,

(5.33) where the𝑖𝑑 β„Ž index of the vectorxπ‘˜ is updated with probability 𝑝𝑖independently in every iteration. So, there are 2𝑁 different ways of updating the vectorxπ‘˜ in every iteration, and the updates in (5.33) can be written as a randomly switching system with the setsAandUconsisting of 2𝑁 elements.
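A sketch of the model (5.33) follows (ours; the test matrix is deliberately made contractive, β€–Aβ€–β‚‚ < 1, so that the sufficient condition A^H P A β‰Ί P discussed later in this section holds for any probabilities in [0.2, 1]):

```python
import numpy as np

# Random asynchronous fixed-point iterations (5.33): entry i of x is
# replaced by (A x)_i + u_i with probability p_i, independently.
rng = np.random.default_rng(3)
N = 6
A = rng.standard_normal((N, N))
A *= 0.4 / np.linalg.norm(A, 2)       # enforce ||A||_2 = 0.4 < 1
u = rng.standard_normal(N)
p = rng.uniform(0.2, 1.0, size=N)     # nonuniform update probabilities

x = np.zeros(N)
for _ in range(5000):
    mask = rng.random(N) < p          # which entries fire this iteration
    x = np.where(mask, A @ x + u, x)

x_fixed = np.linalg.solve(np.eye(N) - A, u)
print(np.linalg.norm(x - x_fixed))  # tiny: x reached (I - A)^{-1} u
```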

More precisely, first enumerate all subsets of {1, Β·Β·Β·, N}, and let T_j denote the j-th subset. Then, the j-th elements of the sets A and U can be written as follows for 1 ≀ j ≀ 2^N:

A_j = I βˆ’ D_{T_j} (I βˆ’ A),   u_j = D_{T_j} u, (5.34)

where D_{T_j} is a diagonal matrix that has 1's at the indices belonging to the set T_j and 0's elsewhere. Furthermore, the probability q_j of switching to A_j can be written as follows:

π‘žπ‘— = Γ–

π‘–βˆˆT𝑗

𝑝𝑖 Γ–

π‘–βˆ‰T𝑗

(1βˆ’ 𝑝𝑖)

. (5.35)

When the formalism of the switching systems is applied to the model (5.33), we get the following average state-transition matrix and the average input signal:

AΜ„ = I βˆ’ P (I βˆ’ A),   ū = P u, (5.36)

where P = diag([p_1 p_2 Β·Β·Β· p_N]) is a diagonal matrix consisting of the update probabilities of the model (5.33). So, the fixed point of the average system is the same as the fixed point of the pair (A, u), namely, xΜ„ = (I βˆ’ A)⁻¹ u irrespective of the update probabilities. Furthermore, the point xΜ„ satisfies xΜ„ = A_j xΜ„ + u_j for all 1 ≀ j ≀ 2^N. Thus, the random vector x_k converges to xΜ„ as long as the condition (5.13) is met.

Unlike the randomized Kaczmarz and Gauss-Seidel algorithms, random asynchronous fixed-point iterations may not converge for an arbitrary set of probabilities. Nevertheless, the convergence of the updates can be verified via the stability of the matrix Ξ¦ defined in (5.13). With the matrices in (5.34) and the probabilities in (5.35), the matrix Ξ¦ can be written explicitly as follows in the case of asynchronous fixed-point iterations (see Corollary 4.2 and the proof in Section 4.8.6):

Ξ¦ = AΜ„* βŠ— AΜ„ + ((I βˆ’ P) βŠ— P) J ((A* βˆ’ I) βŠ— (A βˆ’ I)), (5.37)

where J is a diagonal matrix of size NΒ² with diagonal entries being equal to the vectorized identity matrix of size N; see (4.39).

We know that stability of A, i.e., ρ(A) < 1, is both necessary and sufficient for synchronous fixed-point iterations to converge. However, the most important observation regarding (5.37) is that the stabilities of the matrices Ξ¦ and A do not imply each other. So, an unstable system (in the synchronous world) may converge with randomized asynchronicity. Furthermore, the eigenvectors (and not just the eigenvalues) of the matrix A are also important in determining the convergence in the random asynchronous case (see Section 4.4.4). We also note that the condition A^H P A β‰Ί P is shown to be sufficient for the convergence of the updates in Corollary 3.1 and Corollary 4.3, and it is more relaxed than the necessary condition in the non-random setting [32, 14] (see Lemma 3.1).
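The expression (5.37) is straightforward to evaluate numerically. The sketch below (ours) builds Ξ¦ and checks two consistency points: with p_i = 1 (fully synchronous) the second term vanishes and Ξ¦ reduces to A* βŠ— A, and with uniform probabilities and β€–Aβ€–β‚‚ < 1 the sufficient condition A^H P A β‰Ί P holds, so ρ(Ξ¦) < 1:

```python
import numpy as np

def phi_matrix(A, p):
    """Build Phi of (5.37) for update probabilities p (P = diag(p))."""
    N = A.shape[0]
    I = np.eye(N)
    P = np.diag(p)
    A_bar = I - P @ (I - A)                 # average system matrix (5.36)
    J = np.diag(np.eye(N).ravel())          # diag of vec(I_N), size N^2
    return (np.kron(A_bar.conj(), A_bar)
            + np.kron(I - P, P) @ J @ np.kron(A.conj() - I, A - I))

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
A *= 0.5 / np.linalg.norm(A, 2)             # enforce ||A||_2 = 0.5 < 1

# Synchronous limit p_i = 1: Phi reduces to conj(A) kron A.
print(np.allclose(phi_matrix(A, np.ones(3)), np.kron(A.conj(), A)))

# Uniform p with ||A||_2 < 1 satisfies A^H P A < P, so rho(Phi) < 1.
rho = max(abs(np.linalg.eigvals(phi_matrix(A, 0.5 * np.ones(3)))))
print(rho)  # < 1
```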

The randomized model (5.33) can also be extended to have time-varying input signals, where the vector u changes as the iterations progress. In this case, the updates become a randomized and asynchronous variant of the discrete-time state-space model, and it is possible to study the "frequency response" of such systems in a statistical sense. This aspect is studied in Chapter 4.