7.2.2 Computational Trade-offs
In a distributed scheme that does not employ redundancy, the taskmaster has to wait for all the workers to finish in order to compute the full gradient. However, in the scheme outlined above, the taskmaster needs to wait for the fastest $f$ machines to recover the full gradient. Clearly, this requires more computation by each machine.
Note that in the uncoded setting, the amount of computation that each worker does is $1/n$ of the total work, whereas in the coded setting each machine performs a $w/k$ fraction of the total work. From (7.2), we know that if a scheme can tolerate $s$ stragglers, the fraction of computation that each worker does is $w/k \geq (s+1)/n$. Therefore, the computation load of each worker increases by a factor of at least $(s+1)$. As will be explained further in Section 7.5, there is a sweet spot for $w/k$ (and consequently $s$) that minimizes the expected total time that the master waits in order to recover the full gradient update.
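For concreteness, consider the parameters of the example in Section 7.3.1 (used here purely for illustration): $n = 8$ workers, $k = 4$ partitions, and row weight $w = 3$. Each worker then computes a $w/k = 3/4$ fraction of the partial gradients instead of $1/8$, and the bound $w/k \geq (s+1)/n$ gives $s + 1 \leq nw/k = 6$, so such a scheme can tolerate at most $s = 5$ stragglers.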
It is worth noting that it is often assumed [197, 123, 70] that the decoding vectors are precomputed for all possible combinations of returning machines, and that the decoding cost is not taken into account in the total computation time. In a practical system, however, it is not reasonable to compute and store all the decoding vectors, especially as there are $\binom{n}{f}$ such vectors, a number that grows quickly with $n$. In this work, we introduce an online algorithm that computes the decoding vector on the fly for the indices of the $f$ workers that respond first. The approach is based on the idea of inverting Vandermonde matrices, which can be done very efficiently. In the sequel, we show how to construct an encoding matrix $\mathbf{B}$ for any $w$, $k$ and $n$, such that the system is resilient to $\lfloor nw/k \rfloor - 1$ stragglers, along with an efficient algorithm for computing the decoding vectors $\{\mathbf{a}_{\mathcal{F}} : \mathcal{F} \subset [n], |\mathcal{F}| = f\}$.
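To get a sense of the numbers (an illustrative calculation, not taken from the text): already for $n = 20$ workers and $f = 15$ responders there are $\binom{20}{15} = 15504$ decoding vectors to precompute and store, and the count grows combinatorially with $n$; the online approach instead computes only the single vector $\mathbf{a}_{\mathcal{F}}$ for the set $\mathcal{F}$ that actually responds.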
7.3 Code Construction
7.3.1 Balanced Mask Matrices
We will utilize techniques from [90, 91] to construct the matrix $\mathbf{M}$ (and then $\mathbf{B}$). For that, we present the following definition.
Definition 7 (Balanced Matrix). A matrix $\mathbf{M} \in \{0,1\}^{n \times k}$ is column- (row-) balanced if, for a fixed row (column) weight, the weights of any two columns (rows) differ by at most 1.
Ultimately, we are interested in a matrix $\mathbf{M}$ with row weight $w$ that prescribes a mask for the encoding matrix $\mathbf{B}$. As an example, let $n = 8$, $k = 4$ and $w = 3$. Then, $\mathbf{M}$ is given by
\[
\mathbf{M} =
\begin{bmatrix}
1 & 1 & 1 & 0 \\
1 & 1 & 1 & 0 \\
1 & 1 & 0 & 1 \\
1 & 1 & 0 & 1 \\
1 & 0 & 1 & 1 \\
1 & 0 & 1 & 1 \\
0 & 1 & 1 & 1 \\
0 & 1 & 1 & 1
\end{bmatrix},
\tag{7.5}
\]
where each column is of weight $nw/k = 6$. The following algorithm produces a balanced mask matrix. For a fixed column weight $d$, each row has weight either $\lfloor kd/n \rfloor$ or $\lceil kd/n \rceil$.

Algorithm 6 RowBalancedMaskMatrix($n$, $k$, $d$, $t$)
Input:
$n$: Number of rows
$k$: Number of columns
$d$: Weight of each column
$t$: Offset parameter
Output: Row-balanced $\mathbf{M} \in \{0,1\}^{n \times k}$
$\mathbf{M} \leftarrow \mathbf{0}_{n \times k}$
for $j = 0$ to $k - 1$ do
  for $i = 0$ to $d - 1$ do
    $r = (i + jd + t)_n$   ▷ The quantity $(x)_n$ denotes $x$ modulo $n$.
    $M_{r,j} = 1$
  end for
end for
return $\mathbf{M}$
As a result, when $d$ is chosen as $nw/k \in \mathbb{Z}$, all rows will be of weight $w$. As an example, the matrix $\mathbf{M}$ in (7.5) is generated by calling RowBalancedMaskMatrix(8, 4, 6, 0).
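The following Python sketch is a direct transcription of Algorithm 6 (the function name and the use of NumPy are incidental choices, not part of the original construction).

import numpy as np

def row_balanced_mask_matrix(n, k, d, t=0):
    """Algorithm 6: place d ones in each of the k columns of an n-by-k binary
    matrix, assigning row indices cyclically so that row weights differ by at
    most one."""
    M = np.zeros((n, k), dtype=int)
    for j in range(k):            # column index
        for i in range(d):        # i-th nonzero entry of column j
            r = (i + j * d + t) % n
            M[r, j] = 1
    return M

# Reproduces the matrix in (7.5).
M = row_balanced_mask_matrix(8, 4, 6, 0)
print(M)                 # the 8-by-4 mask of (7.5)
print(M.sum(axis=1))     # row weights: all equal to w = 3
print(M.sum(axis=0))     # column weights: all equal to d = 6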
Algorithm 6 can be used to generate a mask matrix $\mathbf{M}$ for the encoding matrix $\mathbf{B}$: the $j$th column of $\mathbf{B}$ will be chosen as a Reed–Solomon codeword whose support is that of the $j$th column of $\mathbf{M}$.
7.3.2 Correctness of Algorithm 6
To lighten notation, we prove correctness for $t = 0$. The general case follows immediately.
Proposition 34. Let $n$, $k$ and $d$ be integers where $d < n$. The row weights of the matrix $\mathbf{M} \in \{0,1\}^{n \times k}$ produced by Algorithm 6 for $t = 0$ are
\[
w_i =
\begin{cases}
\left\lceil \frac{kd}{n} \right\rceil, & i \in \{0, \ldots, (kd - 1)_n\}, \\
\left\lfloor \frac{kd}{n} \right\rfloor, & i \in \{(kd)_n, \ldots, n - 1\}.
\end{cases}
\]
Proof. The nonzero entries in column $j$ of $\mathbf{M}$ are given by
\[
\mathcal{S}_j = \{jd, \ldots, (j+1)d - 1\}_n,
\]
where the subscript $n$ denotes reducing the elements of the set modulo $n$. Collectively, the nonzero indices in all columns are given by
\[
\mathcal{S} = \{0, \ldots, d - 1, \ldots, (k-1)d, \ldots, kd - 1\}_n.
\]
In case $n \mid kd$, each element in $\mathcal{S}$, after reducing modulo $n$, appears the same number of times. As a result, those indices correspond to rows of equal weight, namely $kd/n$. Hence, the two cases of $w_i$ are identical, along with their corresponding index sets.
In the case where $n \nmid kd$, each of the first $\lfloor \frac{kd}{n} \rfloor n$ elements, after reducing modulo $n$, appears the same number of times. As a result, the nonzero entries corresponding to those indices are distributed evenly amongst the $n$ rows, each of which is of weight $\lfloor \frac{kd}{n} \rfloor$. The remaining indices $\{\lfloor \frac{kd}{n} \rfloor n, \ldots, kd - 1\}_n$ contribute an additional nonzero entry to their respective rows, namely those indexed by $\{0, \ldots, (kd - 1)_n\}$. Finally, we have that the first $(kd)_n$ rows are of weight $\lfloor \frac{kd}{n} \rfloor + 1 = \lceil \frac{kd}{n} \rceil$, while the remaining ones are of weight $\lfloor \frac{kd}{n} \rfloor$.
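As a quick numerical sanity check of Proposition 34 (reusing the row_balanced_mask_matrix sketch above; the specific parameters are chosen for illustration), take $n = 8$, $k = 3$, $d = 6$, so that $kd = 18$, $(kd)_n = 2$, $\lceil kd/n \rceil = 3$ and $\lfloor kd/n \rfloor = 2$:

M = row_balanced_mask_matrix(8, 3, 6, 0)
print(M.sum(axis=1))   # row weights: [3 3 2 2 2 2 2 2], i.e., the first (kd)_n = 2 rows are heavier by one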
Now consider the case when $t$ is not necessarily equal to zero. This amounts to shifting (cyclically) the entries in each column by $t$ positions downwards. As a result, the rows themselves are shifted by the same amount, allowing us to conclude the following.
Corollary 35. Let $n$, $k$, and $d$ be integers where $d < n$. The row weights of the matrix $\mathbf{M} \in \{0,1\}^{n \times k}$ produced by Algorithm 6 are
\[
w_i =
\begin{cases}
\left\lceil \frac{kd}{n} \right\rceil, & i \in \{t, \ldots, (t + kd - 1)_n\}, \\
\left\lfloor \frac{kd}{n} \right\rfloor, & i \in \{0, \ldots, t - 1\} \cup \{(t + kd)_n, \ldots, n - 1\}.
\end{cases}
\]
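As a brief illustration of the corollary (continuing the illustrative parameters $n = 8$, $k = 3$, $d = 6$ from above, now with offset $t = 2$): since $(kd)_n = 2$, the heavy rows are $\{t, \ldots, (t + kd - 1)_n\} = \{2, 3\}$ with weight $\lceil 18/8 \rceil = 3$, while rows $\{0, 1\} \cup \{4, \ldots, 7\}$ have weight $\lfloor 18/8 \rfloor = 2$.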
7.3.3 Reed–Solomon Codes
This subsection provides a quick overview of Reed–Solomon codes. A Reed–Solomon code of length $n$ and dimension $k$ is a linear subspace $\mathrm{RS}[n, k]$ of $\mathbb{C}^n$ corresponding to the evaluation of polynomials of degree less than $k$, with coefficients in $\mathbb{C}$, on a set of $n$ distinct points $\{\alpha_1, \ldots, \alpha_n\}$, also chosen from $\mathbb{C}$. When $\alpha_i = \alpha^i$, where $\alpha \in \mathbb{C}$ is an $n$th root of unity, the evaluations of the polynomial $t(x) = \sum_{i=0}^{k-1} t_i x^i$ on $\{1, \alpha, \ldots, \alpha^{n-1}\}$ correspond to
\[
\begin{bmatrix}
t(1) \\
t(\alpha) \\
\vdots \\
t(\alpha^{n-1})
\end{bmatrix}
=
\begin{bmatrix}
1 & 1 & \cdots & 1 \\
1 & \alpha & \cdots & \alpha^{k-1} \\
\vdots & \vdots & \ddots & \vdots \\
1 & \alpha^{n-1} & \cdots & \alpha^{(n-1)(k-1)}
\end{bmatrix}
\begin{bmatrix}
t_0 \\
t_1 \\
\vdots \\
t_{k-1}
\end{bmatrix}
= \mathbf{G}\mathbf{t}.
\tag{7.6}
\]
It is well known that any $k$ rows of $\mathbf{G}$ form an invertible matrix, which implies that specifying any $k$ evaluations $\{t(\alpha_{i_1}), \ldots, t(\alpha_{i_k})\}$ of a polynomial $t(x)$ of degree at most $k - 1$ characterizes it. In particular, fixing $k - 1$ evaluations of the polynomial to zero characterizes $t(x)$ uniquely up to scaling. This property will give us the ability to construct $\mathbf{B}$ from $\mathbf{M}$.
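The following Python sketch illustrates this property numerically: a codeword that must vanish on a prescribed set of positions is obtained by evaluating the polynomial whose roots are exactly the evaluation points of those positions. The function name and the use of complex floating-point arithmetic are illustrative assumptions, not part of the construction in the text.

import numpy as np

def codeword_with_prescribed_zeros(n, zero_positions):
    # Build a length-n Reed-Solomon codeword (evaluations at the n-th roots of
    # unity) that vanishes exactly on zero_positions. The polynomial
    # t(x) = prod_{m in zero_positions} (x - alpha^m) has degree
    # |zero_positions|, and by the uniqueness-up-to-scaling argument above it
    # is essentially the only such codeword.
    alpha = np.exp(2j * np.pi / n)                 # primitive n-th root of unity
    roots = alpha ** np.array(zero_positions)
    points = alpha ** np.arange(n)                 # evaluation points 1, alpha, ..., alpha^{n-1}
    return np.prod(points[:, None] - roots[None, :], axis=1)

# Example: a candidate first column of B for the mask in (7.5); that column of M
# is supported on rows {0,...,5}, so the codeword must vanish on rows {6, 7}.
b0 = codeword_with_prescribed_zeros(8, [6, 7])
print(np.round(np.abs(b0), 3))                     # zero exactly in positions 6 and 7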
7.3.4 General Construction
In case $d = nw/k \notin \mathbb{Z}$, the chosen row weight $w$ prevents the existence of an $\mathbf{M}$ in which all columns have the same weight. We resort to Algorithm 7, which yields an $\mathbf{M}$ comprised of two matrices $\mathbf{M}_\ell$ and $\mathbf{M}_r$ according to
\[
\mathbf{M} =
\begin{bmatrix}
\mathbf{M}_\ell & \mathbf{M}_r
\end{bmatrix}.
\]
The matrices $\mathbf{M}_\ell$ and $\mathbf{M}_r$ are constructed using Algorithm 6. Each column of $\mathbf{M}_\ell$ has weight $d_\ell := \lceil \frac{nw}{k} \rceil$, and each column of $\mathbf{M}_r$ has weight $d_r := \lfloor \frac{nw}{k} \rfloor$. Note that according to (7.2), we require $d_r \geq 2$ in order to tolerate a positive number of stragglers.
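For instance (an illustrative choice of parameters, not from the text), with $n = 8$, $k = 5$ and $w = 3$ we have $nw/k = 24/5 \notin \mathbb{Z}$, so the columns of $\mathbf{M}_\ell$ have weight $d_\ell = \lceil 24/5 \rceil = 5$ and the columns of $\mathbf{M}_r$ have weight $d_r = \lfloor 24/5 \rfloor = 4 \geq 2$, consistent with the resilience to $\lfloor nw/k \rfloor - 1 = 3$ stragglers claimed earlier.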
Algorithm 7 Column-balanced Mask Matrix $\mathbf{M}$