RANDOMIZED ALGORITHMS AS SWITCHING SYSTEMS
5.3 Randomized Kaczmarz and Gauss-Seidel Algorithms
Given a complex matrix $\mathbf{A} \in \mathbb{C}^{m \times n}$ and a complex vector $\mathbf{u} \in \mathbb{C}^m$, we will consider the following linear system:
$$\mathbf{A}\,\mathbf{x} = \mathbf{u}, \tag{5.18}$$
where we assume that $m \geq n$, i.e., the system is overdetermined, and the matrix $\mathbf{A}$ has full column rank. When the system is consistent we will use $\mathbf{x}^\star$ to denote the solution of the system. When the system is inconsistent, we will use $\mathbf{x}_{LS}$ to denote the least-squares solution, that is,
$$\mathbf{x}_{LS} = \arg\min_{\mathbf{x}} \|\mathbf{A}\mathbf{x} - \mathbf{u}\|_2^2 = (\mathbf{A}^H \mathbf{A})^{-1}\mathbf{A}^H \mathbf{u}. \tag{5.19}$$

5.3.1 Randomized Kaczmarz Algorithm
In order to solve the linear system of equations in (5.18), the Kaczmarz algorithm [93] considers the following iterative updates on the solution vector $\mathbf{x}_k$:
$$\mathbf{x}_{k+1} = \mathbf{x}_k + \frac{u_{r_k} - \mathbf{a}^H(r_k)\,\mathbf{x}_k}{\|\mathbf{a}(r_k)\|_2^2}\;\mathbf{a}(r_k), \tag{5.20}$$
where $\mathbf{a}^H(i)$ denotes the $i$-th row of the matrix $\mathbf{A}$, and $r_k$ denotes the index selected at the $k$-th iteration of the algorithm. In words, the Kaczmarz algorithm selects a row from the matrix $\mathbf{A}$, and then updates the solution $\mathbf{x}_k$ accordingly.
In the randomized variant of the algorithm, the $i$-th row is selected randomly and independently with probability $p_i$ [81, 165]. In this case, the Kaczmarz algorithm can be represented as a randomly switching system, where the sets $\mathcal{A}$ and $\mathcal{U}$ consist of the following elements for $1 \leq i \leq m$:
$$\mathbf{A}_i = \mathbf{I} - \frac{1}{\|\mathbf{a}(i)\|_2^2}\,\mathbf{a}(i)\,\mathbf{a}^H(i), \qquad \mathbf{u}_i = \frac{u_i}{\|\mathbf{a}(i)\|_2^2}\,\mathbf{a}(i). \tag{5.21}$$
Note that the $\mathbf{A}_i$'s are orthogonal projections; thus the Kaczmarz algorithm "switches between orthogonal projections."
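As a concrete illustration, the update (5.20) can be sketched in a few lines of NumPy. The helper below is ours (not from the text); it defaults to the row-norm probabilities $p_i = \|\mathbf{a}(i)\|_2^2/\|\mathbf{A}\|_F^2$ discussed later in this section, and the final demo runs it on a consistent system.

```python
import numpy as np

def randomized_kaczmarz(A, u, iters=5000, p=None, seed=0):
    """Sketch of the randomized Kaczmarz iterations (5.20).

    Each step projects the current iterate onto the hyperplane
    defined by a randomly chosen row of A.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_sq = np.sum(np.abs(A) ** 2, axis=1)   # ||a(i)||_2^2
    if p is None:                             # default: p_i = ||a(i)||^2 / ||A||_F^2
        p = row_sq / row_sq.sum()
    x = np.zeros(n, dtype=A.dtype)
    for i in rng.choice(m, size=iters, p=p):
        # a^H(i) is the i-th row of A, so a(i) is its conjugate transpose
        x = x + (u[i] - A[i] @ x) / row_sq[i] * A[i].conj()
    return x

# Consistent system: the iterate approaches the unique solution x*
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))
x_star = rng.standard_normal(5)
u = A @ x_star
x = randomized_kaczmarz(A, u)
```

For a consistent system the iterate approaches $\mathbf{x}^\star$ for any nonzero probabilities, in line with Corollary 5.1 below.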
When the formalism of the switching systems is applied to the Kaczmarz algorithm, we get the following average state-transition matrix and the average input signal:
$$\bar{\mathbf{A}} = \mathbf{I} - \mathbf{A}^H \mathbf{P} \mathbf{W}^{-2} \mathbf{A}, \qquad \bar{\mathbf{u}} = \mathbf{A}^H \mathbf{P} \mathbf{W}^{-2} \mathbf{u}, \tag{5.22}$$
where $\mathbf{P}$ and $\mathbf{W}$ are diagonal matrices of size $m$ whose $i$-th diagonal entries are $p_i$ and $\|\mathbf{a}(i)\|_2$, respectively. Thus, the fixed point of the average system can be found as follows:
$$\bar{\mathbf{x}} = (\mathbf{A}^H \mathbf{P} \mathbf{W}^{-2} \mathbf{A})^{-1} \mathbf{A}^H \mathbf{P} \mathbf{W}^{-2} \mathbf{u}. \tag{5.23}$$
So, the fixed point of the average system depends on the update probabilities in general. Nevertheless, when the probabilities are selected as $p_i = \|\mathbf{a}(i)\|_2^2 / \|\mathbf{A}\|_F^2$ (i.e., $\mathbf{P} = \mathbf{W}^2 / \|\mathbf{A}\|_F^2$), the fixed point of the average system becomes $\bar{\mathbf{x}} = \mathbf{x}_{LS}$. When the linear system of equations in (5.18) is inconsistent, the fixed point of the average system does not satisfy the individual systems in (5.21). Namely, $\bar{\mathbf{x}} = \mathbf{A}_i \bar{\mathbf{x}} + \mathbf{u}_i$ does not hold true for all $1 \leq i \leq m$. Thus, the random vector $\mathbf{x}_k$ does not converge to $\bar{\mathbf{x}}$ even when $\bar{\mathbf{x}}$ corresponds to $\mathbf{x}_{LS}$. As a result, the Kaczmarz iterations by themselves cannot obtain the least-squares solution.
When the linear system of equations in (5.18) is consistent, i.e., there exists $\mathbf{x}^\star$ such that $\mathbf{A}\mathbf{x}^\star = \mathbf{u}$, the fixed point of the average system becomes $\bar{\mathbf{x}} = \mathbf{x}^\star$ irrespective of the update probabilities. Furthermore, $\bar{\mathbf{x}} = \mathbf{A}_i \bar{\mathbf{x}} + \mathbf{u}_i$ holds true for all $1 \leq i \leq m$. Thus, the random vector $\mathbf{x}_k$ converges to the solution $\mathbf{x}^\star$ of the linear system as long as the condition given by Lemma 5.2 is met. In fact, the convergence of the randomized Kaczmarz algorithm can be guaranteed for any set of probabilities. More precisely, we have the following:
Corollary 5.1. When the linear system in (5.18) is consistent, the randomized Kaczmarz algorithm converges to the unique solution of (5.18) in the mean-squared sense (and almost surely) for any set of nonzero probabilities.
Proof. We note the following inequalities:
$$\mathbf{E} = \sum_{i=1}^{m} p_i\, \mathbf{A}_i^* \otimes \mathbf{A}_i \preceq \sum_{i=1}^{m} p_i\, \mathbf{I}_n \otimes \mathbf{A}_i = \mathbf{I}_n \otimes \bar{\mathbf{A}} \tag{5.24}$$
$$= \mathbf{I}_{n^2} - \mathbf{I}_n \otimes \big( \mathbf{A}^H \mathbf{P} \mathbf{W}^{-2} \mathbf{A} \big) \prec \mathbf{I}_{n^2}, \tag{5.25}$$
where $\mathbf{I}_n$ denotes the identity matrix of size $n$. The inequality in (5.24) follows from the fact that $\mathbf{0} \preceq \mathbf{A}_i \preceq \mathbf{I}$. The strict inequality in (5.25) follows from the assumption that the matrix $\mathbf{A}$ has full column rank, and the probabilities satisfy $p_i > 0$. The fact that $\mathbf{E} \succeq \mathbf{0}$ and the inequality (5.25) imply that $\rho(\mathbf{E}) < 1$. Then, Lemma 5.2 proves the claimed convergence. □
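The Kronecker-product bound in the proof is easy to check numerically. The sketch below (an illustration of ours, not part of the text) forms $\mathbf{E} = \sum_i p_i \mathbf{A}_i^* \otimes \mathbf{A}_i$ from the projections in (5.21) for a random real matrix with uniform probabilities and confirms that its spectral radius is below 1:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 3
A = rng.standard_normal((m, n))   # full column rank with probability 1
p = np.full(m, 1.0 / m)           # any nonzero probabilities will do

E = np.zeros((n * n, n * n))
for i in range(m):
    ai = A[i]                                     # (real) i-th row of A
    Ai = np.eye(n) - np.outer(ai, ai) / (ai @ ai) # orthogonal projection (5.21)
    E += p[i] * np.kron(Ai, Ai)                   # real case: A_i^* = A_i
rho = max(abs(np.linalg.eigvals(E)))              # spectral radius of E
```

Here `rho` comes out strictly below 1, consistent with the proof's conclusion $\rho(\mathbf{E}) < 1$.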
We note that the mean-squared convergence of the randomized Kaczmarz algorithm was first proved explicitly in [165] for the probabilities $p_i = \|\mathbf{a}(i)\|_2^2 / \|\mathbf{A}\|_F^2$. The almost sure convergence of the algorithm was shown in [36]. The study [46] considered the optimal set of probabilities that minimizes the upper bound in (5.24), and [3] considered the optimal set of probabilities that minimizes $\rho(\mathbf{E})$ itself. Since $\mathbf{E}$ is a positive semi-definite matrix by construction in the case of the Kaczmarz algorithm, the optimal selection of the probabilities was based on semi-definite programming in both [46] and [3].
Although the first convergence proof of the randomized Kaczmarz algorithm is attributed to the study in [165], it is also possible to find convergence proofs for the algorithm in the control theory literature. In particular, the study [23] considered the updates in (5.21) as "adaptive filtering" (see [23, Eq. (20)]) and proved the almost sure convergence of the iterations (see [23, Theorem 7]). In addition, the book [42] considered the same update scheme as an application of its results (see [42, Section 3.6.2]) and proved almost sure convergence for the more general case of the indices being selected according to an ergodic Markov chain. We note that the original form of the Kaczmarz algorithm [93] considers the use of consecutive indices, which is, in fact, equivalent to a Markov chain on a directed cycle graph. So, [42, Lemma 3.53] proves the convergence of the original Kaczmarz algorithm as well as its randomized variant from the viewpoint of switching systems.
Although the random vector $\mathbf{x}_k$ in the Kaczmarz algorithm does not converge to the least-squares solution $\mathbf{x}_{LS}$ in the case of an inconsistent system of equations, it is in fact possible to obtain the solution $\mathbf{x}_{LS}$ using sample averaging. In this regard, we first define the sample average (of the first $K$ iterations) $\mathbf{y}_K$ as follows:
$$\mathbf{y}_K = \frac{1}{K} \sum_{k=1}^{K} \mathbf{x}_k. \tag{5.26}$$
We now note that $\mathbb{E}[\mathbf{x}_k]$ is the ensemble average of the random vector $\mathbf{x}_k$, and as the iterations progress the ensemble average converges to $\bar{\mathbf{x}}$. See (5.10). This is due to the stability of the matrix $\bar{\mathbf{A}}$ proven in Corollary 5.1. Since the independent selection of the indices forms an ergodic Markov chain, the sample average converges to the ensemble average as more samples are used. More precisely,
$$\lim_{K \to \infty} \mathbb{E}\big[ \|\mathbf{y}_K - \bar{\mathbf{x}}\|_2^2 \big] = 0, \tag{5.27}$$
which follows from the stability of the matrix $\mathbf{E}$. Thus, the random vector $\mathbf{y}_K$ converges to $\bar{\mathbf{x}}$ in (5.23) in the mean-squared sense whether the system of equations is consistent or not. Although this convergence behavior holds true for any set of index selection probabilities, the fixed point of the average system $\bar{\mathbf{x}}$ becomes the least-squares solution only when the probabilities are selected as $\mathbf{P} = \mathbf{W}^2 / \|\mathbf{A}\|_F^2$. With this particular selection of the probabilities, we can claim the following:
$$p_i = \|\mathbf{a}(i)\|_2^2 / \|\mathbf{A}\|_F^2 \;\implies\; \lim_{K \to \infty} \mathbb{E}\big[ \|\mathbf{y}_K - \mathbf{x}_{LS}\|_2^2 \big] = 0, \tag{5.28}$$
which shows that the sample average $\mathbf{y}_K$ converges to the least-squares solution in the mean-squared sense.
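The sample-averaging idea in (5.26)-(5.28) can be illustrated with a short simulation. The sketch below is ours (not from the text); the tolerance is deliberately loose because the averaged iterate only converges at a Monte-Carlo rate in $K$:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 30, 4
A = rng.standard_normal((m, n))
u = rng.standard_normal(m)                    # generic u: the system is inconsistent
x_ls, *_ = np.linalg.lstsq(A, u, rcond=None)  # least-squares solution (5.19)

row_sq = np.sum(A ** 2, axis=1)
p = row_sq / row_sq.sum()                     # p_i = ||a(i)||^2 / ||A||_F^2

x = np.zeros(n)
y = np.zeros(n)                               # running sample average y_K of (5.26)
K = 200_000
for k, i in enumerate(rng.choice(m, size=K, p=p), start=1):
    x = x + (u[i] - A[i] @ x) / row_sq[i] * A[i]   # Kaczmarz step (5.20)
    y += (x - y) / k                               # y_K = y_{K-1} + (x_K - y_{K-1}) / K
err = np.linalg.norm(y - x_ls)                     # shrinks as K grows
```

The iterate `x` itself keeps fluctuating around $\mathbf{x}_{LS}$, while the running average `y` settles down to it, as predicted by (5.28).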
5.3.2 Randomized Gauss-Seidel Algorithm
Another approach for solving the system of equations in (5.18) is the Gauss-Seidel algorithm, which updates the solution vector $\mathbf{x}_k$ iteratively according to the following scheme:
$$\mathbf{x}_{k+1} = \mathbf{x}_k + \frac{\mathbf{a}_{c_k}^H (\mathbf{u} - \mathbf{A}\,\mathbf{x}_k)}{\|\mathbf{a}_{c_k}\|_2^2}\;\mathbf{e}_{c_k}, \tag{5.29}$$
where $\mathbf{a}_i$ and $\mathbf{e}_i$ denote the $i$-th columns of the matrix $\mathbf{A}$ and the identity matrix $\mathbf{I}$, respectively. The index selected at the $k$-th iteration is denoted by $c_k$. In words, the Gauss-Seidel algorithm selects a column from the matrix $\mathbf{A}$, and then updates only the corresponding entry of the solution vector. In the randomized variant, the $i$-th column is selected randomly and independently with probability $p_i$ [107]. So, the algorithm can be represented as a randomly switching system, where the sets $\mathcal{A}$ and $\mathcal{U}$ consist of the following elements for $1 \leq i \leq n$:
$$\mathbf{A}_i = \mathbf{I} - \frac{1}{\|\mathbf{a}_i\|_2^2}\,\mathbf{e}_i\,\mathbf{a}_i^H \mathbf{A}, \qquad \mathbf{u}_i = \frac{\mathbf{e}_i\,\mathbf{a}_i^H \mathbf{u}}{\|\mathbf{a}_i\|_2^2}. \tag{5.30}$$
We note that the $\mathbf{A}_i$'s defined in (5.30) satisfy $\mathbf{A}_i^2 = \mathbf{A}_i$, but they are not Hermitian, i.e., $\mathbf{A}_i^H \neq \mathbf{A}_i$, in general. Thus, the Gauss-Seidel algorithm "switches between oblique projections."
When the formalism of the switching systems is applied to the Gauss-Seidel algorithm, we get the following average state-transition matrix and the average input signal:
$$\bar{\mathbf{A}} = \mathbf{I} - \mathbf{P} \mathbf{W}^{-2} \mathbf{A}^H \mathbf{A}, \qquad \bar{\mathbf{u}} = \mathbf{P} \mathbf{W}^{-2} \mathbf{A}^H \mathbf{u}, \tag{5.31}$$
where $\mathbf{P}$ and $\mathbf{W}$ are diagonal matrices of size $n$ whose $i$-th diagonal entries are $p_i$ and $\|\mathbf{a}_i\|_2$, respectively. So, the fixed point of the average system becomes $\bar{\mathbf{x}} = \mathbf{x}_{LS}$ irrespective of the update probabilities. More importantly, unlike the Kaczmarz method, the point $\bar{\mathbf{x}}$ satisfies $\bar{\mathbf{x}} = \mathbf{A}_i \bar{\mathbf{x}} + \mathbf{u}_i$ for all $1 \leq i \leq n$ whether the set of linear equations (5.18) is consistent or not. Thus, the random vector $\mathbf{x}_k$ indeed converges to the least-squares solution in the mean-squared sense as long as the condition given by Lemma 5.2 is met. In fact, the convergence of the randomized Gauss-Seidel algorithm can be guaranteed for any set of probabilities. More precisely, we have the following:
Corollary 5.2. The randomized Gauss-Seidel method converges to the least-squares solution of (5.18) in the mean-squared sense (and almost surely) for any set of nonzero probabilities.
Proof. We will show that the third statement in Lemma 5.3 holds, which in turn implies the convergence of the iterations. In this regard, take $\mathbf{X} = (\mathbf{A}^H \mathbf{A})^{-1}$, and note that $\mathbf{X} \succ \mathbf{0}$. Then,
$$\mathbf{X} - \sum_{i=1}^{n} p_i\, \mathbf{A}_i \mathbf{X} \mathbf{A}_i^H = \mathbf{P} \mathbf{W}^{-2} \succ \mathbf{0}, \tag{5.32}$$
where the positive-definiteness follows from the fact that all the probabilities are nonzero. This proves the claim. □
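Identity (5.32) is in fact an exact equality, which can be checked numerically. The snippet below is an illustrative verification of ours on randomly drawn data:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 10, 3
A = rng.standard_normal((m, n))
p = rng.dirichlet(np.ones(n))                 # random probabilities summing to 1
col_sq = np.sum(A ** 2, axis=0)               # ||a_i||_2^2

X = np.linalg.inv(A.T @ A)                    # X = (A^H A)^{-1}, positive definite
I = np.eye(n)
S = np.zeros((n, n))
for i in range(n):
    # oblique projection (5.30): A_i = I - e_i a_i^H A / ||a_i||^2 (real case)
    Ai = I - np.outer(I[i], A[:, i] @ A) / col_sq[i]
    S += p[i] * Ai @ X @ Ai.T
lhs = X - S                                   # left-hand side of (5.32)
rhs = np.diag(p / col_sq)                     # P W^{-2}
```

The two sides agree to machine precision, confirming the computation in the proof.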
We note that the mean-squared convergence of the randomized Gauss-Seidel algorithm was first proved explicitly in [107] for the probabilities $p_i = \|\mathbf{a}_i\|_2^2 / \|\mathbf{A}\|_F^2$. We refer to [132, 80, 110, 122] (and references therein) for more results involving the randomized Kaczmarz and Gauss-Seidel algorithms and their extensions.
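A minimal NumPy sketch of the randomized Gauss-Seidel update (5.29) (our illustration, not the implementation of [107]) shows the iterate itself reaching $\mathbf{x}_{LS}$ even for an inconsistent system, in contrast with the Kaczmarz iterate:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 25, 4
A = rng.standard_normal((m, n))
u = rng.standard_normal(m)                    # generic u: inconsistent system
x_ls, *_ = np.linalg.lstsq(A, u, rcond=None)  # least-squares solution (5.19)

col_sq = np.sum(A ** 2, axis=0)               # ||a_j||_2^2
p = np.full(n, 1.0 / n)                       # any nonzero probabilities (Corollary 5.2)

x = np.zeros(n)
for j in rng.choice(n, size=20_000, p=p):
    # update (5.29): only the j-th entry of x changes
    x[j] += A[:, j] @ (u - A @ x) / col_sq[j]
err = np.linalg.norm(x - x_ls)
```

Because $\bar{\mathbf{x}} = \mathbf{A}_i \bar{\mathbf{x}} + \mathbf{u}_i$ holds for every $i$ here, there is no residual "noise floor": the random iterate converges to $\mathbf{x}_{LS}$ itself.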
5.3.3 Randomized Asynchronous Fixed-Point Iterations
When the system in (5.18) is "square," i.e., $m = n$, fixed-point iterations provide an alternative approach for obtaining numerical solutions to linear systems of equations [30, 31, 190]. Asynchronous variants of fixed-point iterations have also been studied in detail in non-random [32, 14, 19, 20] and random settings [9]. We have also studied random asynchronous fixed-point iterations in the context of graph signal processing for the distributed and asynchronous implementation of graph filters in Chapters 3 and 4.
For a given matrix $\mathbf{A} \in \mathbb{C}^{n \times n}$ and an input signal $\mathbf{u} \in \mathbb{C}^n$, in this section we consider the following random asynchronous fixed-point iterations:
$$(\mathbf{x}_{k+1})_i = \begin{cases} (\mathbf{A}\,\mathbf{x}_k)_i + u_i, & \text{w.p. } p_i, \\ (\mathbf{x}_k)_i, & \text{w.p. } 1 - p_i, \end{cases} \tag{5.33}$$
where the $i$-th index of the vector $\mathbf{x}_k$ is updated with probability $p_i$ independently in every iteration. So, there are $2^n$ different ways of updating the vector $\mathbf{x}_k$ in every iteration, and the updates in (5.33) can be written as a randomly switching system with the sets $\mathcal{A}$ and $\mathcal{U}$ consisting of $2^n$ elements.
More precisely, first enumerate all subsets of $\{1, \cdots, n\}$, and let $\mathcal{T}_j$ denote the $j$-th subset. Then, the $j$-th elements of the sets $\mathcal{A}$ and $\mathcal{U}$ can be written as follows for $1 \leq j \leq 2^n$:
$$\mathbf{A}_j = \mathbf{I} - \mathbf{D}_{\mathcal{T}_j} (\mathbf{I} - \mathbf{A}), \qquad \mathbf{u}_j = \mathbf{D}_{\mathcal{T}_j}\,\mathbf{u}, \tag{5.34}$$
where $\mathbf{D}_{\mathcal{T}_j}$ is a diagonal matrix that has 1's at the indices belonging to the set $\mathcal{T}_j$ and 0's elsewhere. Furthermore, the probability $p_j$ of switching to $\mathbf{A}_j$ can be written as follows:
$$p_j = \prod_{i \in \mathcal{T}_j} p_i \prod_{i \notin \mathcal{T}_j} (1 - p_i). \tag{5.35}$$
When the formalism of the switching systems is applied to the model (5.33), we get the following average state-transition matrix and the average input signal:
$$\bar{\mathbf{A}} = \mathbf{I} - \mathbf{P} (\mathbf{I} - \mathbf{A}), \qquad \bar{\mathbf{u}} = \mathbf{P}\,\mathbf{u}, \tag{5.36}$$
where $\mathbf{P} = \operatorname{diag}([\,p_1 \; p_2 \; \cdots \; p_n\,])$ is a diagonal matrix consisting of the update probabilities of the model (5.33). So, the fixed point of the average system is the same as the fixed point of the pair $(\mathbf{A}, \mathbf{u})$, namely, $\bar{\mathbf{x}} = (\mathbf{I} - \mathbf{A})^{-1} \mathbf{u}$, irrespective of the update probabilities. Furthermore, the point $\bar{\mathbf{x}}$ satisfies $\bar{\mathbf{x}} = \mathbf{A}_j \bar{\mathbf{x}} + \mathbf{u}_j$ for all $1 \leq j \leq 2^n$. Thus, the random vector $\mathbf{x}_k$ converges to $\bar{\mathbf{x}}$ as long as the condition (5.13) is met.
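The componentwise model (5.33) is straightforward to simulate. The sketch below is our illustration: it scales a random matrix so that $\|\mathbf{A}\|_2 = 0.4$, which (together with the chosen probabilities) satisfies the sufficient convergence condition discussed later in this subsection, so the iterate settles at the fixed point:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
G = rng.standard_normal((n, n))
A = 0.4 * G / np.linalg.norm(G, 2)            # scale so that ||A||_2 = 0.4
u = rng.standard_normal(n)
x_bar = np.linalg.solve(np.eye(n) - A, u)     # fixed point (I - A)^{-1} u
p = np.array([0.2, 0.5, 0.8, 1.0])            # per-coordinate update probabilities

x = np.zeros(n)
for _ in range(3000):
    mask = rng.random(n) < p                  # entry i refreshed w.p. p_i, independently
    x_new = A @ x + u                         # the synchronous update (A x + u)
    x[mask] = x_new[mask]                     # unselected entries keep their old values
err = np.linalg.norm(x - x_bar)
```

Since $\bar{\mathbf{x}} = \mathbf{A}_j \bar{\mathbf{x}} + \mathbf{u}_j$ for every subset, the asynchronous iterate converges to $\bar{\mathbf{x}}$ itself rather than fluctuating around it.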
Unlike the randomized Kaczmarz and Gauss-Seidel algorithms, random asynchronous fixed-point iterations may not converge for an arbitrary set of probabilities.
Nevertheless, the convergence of the updates can be verified via the stability of the matrix $\mathbf{E}$ defined in (5.13). With the matrices in (5.34) and the probabilities in (5.35), the matrix $\mathbf{E}$ can be written explicitly as follows in the case of asynchronous fixed-point iterations (see Corollary 4.2 and the proof in Section 4.8.6):
$$\mathbf{E} = \bar{\mathbf{A}}^* \otimes \bar{\mathbf{A}} + \big( (\mathbf{I} - \mathbf{P}) \otimes \mathbf{P} \big)\,\mathbf{J}\,\big( (\mathbf{A}^* - \mathbf{I}) \otimes (\mathbf{A} - \mathbf{I}) \big), \tag{5.37}$$
where $\mathbf{J}$ is a diagonal matrix of size $n^2$ with diagonal entries equal to the vectorized identity matrix of size $n$. See (4.39).
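For a small $n$, the closed form (5.37) can be cross-checked against a brute-force enumeration of all $2^n$ switching matrices in (5.34)-(5.35). The snippet below is such a sanity check of ours for a random real $\mathbf{A}$ (so $\mathbf{A}^* = \mathbf{A}$):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(6)
n = 3
A = 0.3 * rng.standard_normal((n, n))
p = np.array([0.3, 0.6, 0.9])                    # update probabilities
P, I = np.diag(p), np.eye(n)

# Brute force: E = sum over all 2^n subsets of p_j (A_j kron A_j)
E_enum = np.zeros((n * n, n * n))
for bits in product([0, 1], repeat=n):
    D = np.diag(np.asarray(bits, dtype=float))   # D_T for this subset
    Aj = I - D @ (I - A)                         # switching matrix (5.34)
    prob = np.prod(np.where(bits, p, 1 - p))     # subset probability (5.35)
    E_enum += prob * np.kron(Aj, Aj)

# Closed form (5.37)
Abar = I - P @ (I - A)
J = np.diag(I.flatten())                         # diagonal equals vec of the identity
E_formula = np.kron(Abar, Abar) + np.kron(I - P, P) @ J @ np.kron(A - I, A - I)
```

The enumerated and closed-form matrices agree entry by entry, confirming (5.37) on this instance.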
We know that stability of $\mathbf{A}$, i.e., $\rho(\mathbf{A}) < 1$, is both necessary and sufficient for the synchronous fixed-point iterations to converge. However, the most important observation regarding (5.37) is that the stability of the matrix $\mathbf{E}$ and the stability of $\mathbf{A}$ do not imply each other. So, an unstable system (in the synchronous world) may converge with randomized asynchronicity. Furthermore, the eigenvectors (and not just the eigenvalues) of the matrix $\mathbf{A}$ are also important in determining the convergence in the random asynchronous case (see Section 4.4.4). We also note that the condition $\mathbf{A}^H \mathbf{P} \mathbf{A} \prec \mathbf{P}$ is shown to be sufficient for the convergence of the updates in Corollary 3.1 and Corollary 4.3, and it is more relaxed than the necessary condition in the non-random setting [32, 14] (see Lemma 3.1).
The randomized model (5.33) can also be extended to have time-varying input signals, where the vector $\mathbf{u}$ changes as the iterations progress. In this case, the updates become a randomized and asynchronous variant of the discrete-time state-space model, and it is possible to study the "frequency response" of such systems in a statistical sense. This aspect is studied in Chapter 4.