CODED COMPUTATION FOR DISTRIBUTED GRADIENT DESCENT

7.2.2 Computational Trade-offs

In a distributed scheme that does not employ redundancy, the taskmaster has to wait for all the workers to finish in order to compute the full gradient. However, in the scheme outlined above, the taskmaster needs to wait only for the fastest ๐‘“ machines to recover the full gradient. Clearly, this comes at the cost of more computation by each machine.

Note that in the uncoded setting, the amount of computation that each worker does is 1/๐‘› of the total work, whereas in the coded setting each machine performs a ๐‘ค/๐‘˜ fraction of the total work. From (7.2), we know that if a scheme can tolerate ๐‘  stragglers, the fraction of computation that each worker does is ๐‘ค/๐‘˜ ≥ (๐‘  + 1)/๐‘›. Therefore, the computation load of each worker increases by a factor of (๐‘  + 1). As will be explained further in Section 7.5, there is a sweet spot for ๐‘ค/๐‘˜ (and consequently ๐‘ ) that minimizes the expected total time that the master waits in order to recover the full gradient update.
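These relations can be checked numerically; a minimal sketch, using the example parameters ๐‘› = 8, ๐‘˜ = 4, ๐‘ค = 3 that appear later in this section:

```python
# Load/straggler trade-off, using the example parameters from this section:
# n = 8 workers, k = 4 data parts, w = 3 parts held per worker.
n, k, w = 8, 4, 3

uncoded_load = 1 / n        # per-worker share of the work without redundancy
coded_load = w / k          # per-worker share in the coded scheme
s = w * n // k - 1          # stragglers tolerated, from w/k = (s + 1)/n

# The per-worker load grows by exactly the factor (s + 1).
assert abs(coded_load / uncoded_load - (s + 1)) < 1e-9
```

Here the scheme tolerates ๐‘  = 5 stragglers, and each worker's load is (๐‘  + 1) = 6 times the uncoded load.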

It is worth noting that it is often assumed [197, 123, 70] that the decoding vectors are precomputed for all possible combinations of returning machines, and that the decoding cost is not taken into account in the total computation time. In a practical system, however, computing and storing all the decoding vectors is unreasonable, especially as there are (๐‘› choose ๐‘“) such vectors, a number that grows quickly with ๐‘›. In this work, we introduce an online algorithm for computing the decoding vector on the fly, for the indices of the ๐‘“ workers that respond first. The approach is based on the idea of inverting Vandermonde matrices, which can be done very efficiently. In the sequel, we show how to construct an encoding matrix B for any ๐‘ค, ๐‘˜, and ๐‘› such that the system is resilient to ๐‘ค๐‘›/๐‘˜ − 1 stragglers, along with an efficient algorithm for computing the decoding vectors {a_F : F ⊂ [๐‘›], |F| = ๐‘“}.
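The size of such a precomputed table can be made concrete: one decoding vector is needed per subset of ๐‘“ responders. A small sketch (the parameter values are illustrative only):

```python
from math import comb

# One decoding vector per subset F of [n] with |F| = f.
# Illustrative (n, f) pairs showing how fast the table grows with n:
for n, f in [(10, 5), (20, 10), (40, 20)]:
    print(n, f, comb(n, f))
# For f ~ n/2 the count is roughly 2^n / sqrt(n), so precomputing and
# storing all decoding vectors quickly becomes infeasible.
```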

7.3 Code Construction

7.3.1 Balanced Mask Matrices

We will utilize techniques from [90, 91] to construct the matrix M (and then B). For that, we present the following definition.

Definition 7 (Balanced Matrix). A matrix M ∈ {0, 1}^{๐‘›×๐‘˜} is column (row)-balanced if, for fixed row (column) weight, the weights of any two columns (rows) differ by at most 1.

Ultimately, we are interested in a matrix M with row weight ๐‘ค that prescribes a mask for the encoding matrix B. As an example, let ๐‘› = 8, ๐‘˜ = 4, and ๐‘ค = 3. Then M is given by

\[
\mathbf{M} =
\begin{bmatrix}
1 & 1 & 1 & 0 \\
1 & 1 & 1 & 0 \\
1 & 1 & 0 & 1 \\
1 & 1 & 0 & 1 \\
1 & 0 & 1 & 1 \\
1 & 0 & 1 & 1 \\
0 & 1 & 1 & 1 \\
0 & 1 & 1 & 1
\end{bmatrix},
\tag{7.5}
\]

where each column is of weight ๐‘›๐‘ค/๐‘˜ = 6. The following algorithm produces a balanced mask matrix: for a fixed column weight ๐‘‘, each row has weight either ⌊๐‘˜๐‘‘/๐‘›⌋ or ⌈๐‘˜๐‘‘/๐‘›⌉.

Algorithm 6 RowBalancedMaskMatrix(๐‘›, ๐‘˜, ๐‘‘, ๐‘ก)

Input:
    ๐‘›: number of rows
    ๐‘˜: number of columns
    ๐‘‘: weight of each column
    ๐‘ก: offset parameter
Output: row-balanced M ∈ {0, 1}^{๐‘›×๐‘˜}
    M ← 0_{๐‘›×๐‘˜}
    for ๐‘— = 0 to ๐‘˜ − 1 do
        for ๐‘– = 0 to ๐‘‘ − 1 do
            ๐‘Ÿ ← (๐‘– + ๐‘—๐‘‘ + ๐‘ก)_๐‘›    ⊲ The quantity (๐‘ฅ)_๐‘› denotes ๐‘ฅ modulo ๐‘›.
            M_{๐‘Ÿ,๐‘—} ← 1
        end for
    end for
    return M

As a result, when ๐‘‘ is chosen as ๐‘‘ = ๐‘›๐‘ค/๐‘˜ ∈ ℤ, all rows will be of weight ๐‘ค. As an example, the matrix M in (7.5) is generated by calling RowBalancedMaskMatrix(8, 4, 6, 0).
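A direct Python transcription of Algorithm 6 (a sketch; the function name simply mirrors the pseudocode) reproduces the matrix in (7.5):

```python
def row_balanced_mask_matrix(n, k, d, t=0):
    """Algorithm 6: column j gets ones at rows (i + j*d + t) mod n, i < d."""
    M = [[0] * k for _ in range(n)]
    for j in range(k):
        for i in range(d):
            r = (i + j * d + t) % n
            M[r][j] = 1
    return M

# Reproduce the example M of (7.5): n = 8, k = 4, d = 6, t = 0.
M = row_balanced_mask_matrix(8, 4, 6)
for row in M:
    print(row)
# Every column has weight d = 6 and every row has weight kd/n = 3.
```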

Algorithm 6 can be used to generate a mask matrix M for the encoding matrix B: the ๐‘—th column of B will be chosen as a Reed–Solomon codeword whose support is that of the ๐‘—th column of M.

7.3.2 Correctness of Algorithm 6

To lighten notation, we prove correctness for ๐‘ก = 0. The general case follows immediately.

Proposition 34. Let ๐‘˜, ๐‘‘, and ๐‘› be integers where ๐‘‘ < ๐‘›. The row weights of the matrix M ∈ {0, 1}^{๐‘›×๐‘˜} produced by Algorithm 6 for ๐‘ก = 0 are

\[
w_i =
\begin{cases}
\left\lceil \dfrac{kd}{n} \right\rceil, & i \in \{0, \ldots, (kd-1)_n\}, \\[6pt]
\left\lfloor \dfrac{kd}{n} \right\rfloor, & i \in \{(kd)_n, \ldots, n-1\}.
\end{cases}
\]

Proof. The nonzero entries in column ๐‘— of M are given by

S๐‘— ={๐‘— ๐‘‘ , . . . ,(๐‘—+1)๐‘‘โˆ’1}๐‘›,

where the subscript๐‘›denotes reducing the elements of the set modulo๐‘›. Collectively, the nonzero indices in all columns are given by

S ={0, . . . ๐‘‘โˆ’1, . . . ,(๐‘˜โˆ’1)๐‘‘ , . . . ๐‘˜ ๐‘‘โˆ’1}๐‘›.

In case๐‘› | ๐‘˜ ๐‘‘, each element inS, after reducing modulo๐‘›, appears the same number of times. As a result, those indices correspond to columns of equal weight, namely

๐‘˜ ๐‘‘

๐‘› . Hence, the two cases of๐‘ค๐‘– are identical along with their corresponding index sets.

In the case where๐‘›- ๐‘˜ ๐‘‘, each of the first ๐‘˜ ๐‘‘ ๐‘›

๐‘›elements, after reducing modulo๐‘›, appears the same number of times. As a result, the nonzero entries corresponding to those indices are distributed evenly amongst the๐‘›rows, each of which is of weight

๐‘˜ ๐‘‘ ๐‘›

. The remaining indices{๐‘˜ ๐‘‘ ๐‘›

๐‘›, . . . , ๐‘˜ ๐‘‘โˆ’1}๐‘›contribute an additional nonzero entry to their respective rows, those indexed by{0, . . . ,(๐‘˜ ๐‘‘โˆ’1)๐‘›}. Finally, we have that the first(๐‘˜ ๐‘‘)๐‘›rows are of weight ๐‘˜ ๐‘‘

๐‘›

+1=๐‘˜ ๐‘‘ ๐‘›

, while the remaining ones are of weight ๐‘˜ ๐‘‘

๐‘›

.
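Proposition 34 can also be verified numerically. A small sketch over several parameter choices (the transcription of Algorithm 6 below is named `mask` for brevity):

```python
from math import ceil, floor

def mask(n, k, d, t=0):
    """Python transcription of Algorithm 6."""
    M = [[0] * k for _ in range(n)]
    for j in range(k):
        for i in range(d):
            M[(i + j * d + t) % n][j] = 1
    return M

# Row-weight pattern claimed by Proposition 34 (t = 0): rows
# 0 .. (kd - 1) mod n have weight ceil(kd/n); the rest have floor(kd/n).
for n, k, d in [(8, 4, 6), (7, 3, 5), (10, 4, 7)]:
    weights = [sum(row) for row in mask(n, k, d)]
    hi, lo = ceil(k * d / n), floor(k * d / n)
    b = (k * d - 1) % n
    assert weights[: b + 1] == [hi] * (b + 1)
    assert weights[b + 1 :] == [lo] * (n - b - 1)
```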

Now consider the case when ๐‘ก is not necessarily equal to zero. This amounts to shifting (cyclically) the entries in each column by ๐‘ก positions downwards. As a result, the rows themselves are shifted by the same amount, allowing us to conclude the following.

Corollary 35. Let ๐‘˜, ๐‘‘, and ๐‘› be integers where ๐‘‘ < ๐‘›. The row weights of the matrix M ∈ {0, 1}^{๐‘›×๐‘˜} produced by Algorithm 6 are

\[
w_i =
\begin{cases}
\left\lceil \dfrac{kd}{n} \right\rceil, & i \in \{t, \ldots, (t+kd-1)_n\}, \\[6pt]
\left\lfloor \dfrac{kd}{n} \right\rfloor, & i \in \{0, \ldots, t-1\} \cup \{(t+kd)_n, \ldots, n-1\}.
\end{cases}
\]

7.3.3 Reed–Solomon Codes

This subsection provides a quick overview of Reed–Solomon codes. A Reed–Solomon code of length ๐‘› and dimension ๐‘“ is a linear subspace RS[๐‘›, ๐‘“] of ℂ^๐‘› corresponding to the evaluation of polynomials of degree less than ๐‘“, with coefficients in ℂ, on a set of ๐‘› distinct points {๐›ผ_1, …, ๐›ผ_๐‘›} also chosen from ℂ. When ๐›ผ_๐‘– = ๐›ผ^{๐‘–−1}, where ๐›ผ ∈ ℂ is an ๐‘›th root of unity, the evaluation of the polynomial ๐‘ก(๐‘ฅ) = Σ_{๐‘–=0}^{๐‘“−1} ๐‘ก_๐‘– ๐‘ฅ^๐‘– on {1, ๐›ผ, …, ๐›ผ^{๐‘›−1}} corresponds to

\[
\begin{bmatrix}
t(1) \\ t(\alpha) \\ \vdots \\ t(\alpha^{n-1})
\end{bmatrix}
=
\begin{bmatrix}
1 & 1 & \cdots & 1 \\
1 & \alpha & \cdots & \alpha^{f-1} \\
\vdots & \vdots & \ddots & \vdots \\
1 & \alpha^{n-1} & \cdots & \alpha^{(n-1)(f-1)}
\end{bmatrix}
\begin{bmatrix}
t_0 \\ t_1 \\ \vdots \\ t_{f-1}
\end{bmatrix}
= \mathbf{G}\mathbf{t}.
\tag{7.6}
\]

It is well known that any ๐‘“ rows of G form an invertible matrix, which implies that specifying any ๐‘“ evaluations {๐‘ก(๐›ผ_{๐‘–_1}), …, ๐‘ก(๐›ผ_{๐‘–_๐‘“})} of a polynomial ๐‘ก(๐‘ฅ) of degree at most ๐‘“ − 1 characterizes it. In particular, fixing ๐‘“ − 1 evaluations of the polynomial to zero characterizes ๐‘ก(๐‘ฅ) uniquely up to scaling. This property will give us the ability to construct B from M.
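The "prescribed zeros pin down the codeword" property can be sketched in a few lines. Here ๐‘› = 8 and ๐‘“ = 3 are illustrative values, and the prescribed zero set plays the role of a column's zero pattern in M:

```python
import cmath

# Illustrative parameters: codeword length n = 8, dimension f = 3,
# so a codeword may be forced to vanish at f - 1 = 2 prescribed positions.
n, f = 8, 3
alpha = cmath.exp(2j * cmath.pi / n)   # primitive n-th root of unity

Z = [2, 5]                             # prescribed zero positions (|Z| = f - 1)

def t(x):
    # t(x) = (x - alpha^2)(x - alpha^5): degree f - 1, hence t is in RS[n, f].
    p = 1.0
    for j in Z:
        p *= (x - alpha ** j)
    return p

codeword = [t(alpha ** i) for i in range(n)]
support = [i for i, c in enumerate(codeword) if abs(c) > 1e-9]
# t has exactly the roots alpha^2 and alpha^5, so the codeword vanishes
# precisely on Z; up to scaling, it is the unique such codeword in RS[n, f].
```

Since a degree-(๐‘“ − 1) polynomial has at most ๐‘“ − 1 roots, the resulting codeword is nonzero everywhere outside the prescribed zero set, which is exactly the support condition required of a column of B.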

7.3.4 General Construction

In case ๐‘‘ = ๐‘›๐‘ค/๐‘˜ ∉ ℤ, the chosen row weight ๐‘ค prevents the existence of an M in which every column has the minimal weight ⌊๐‘›๐‘ค/๐‘˜⌋. We resort to Algorithm 7, which yields an M comprised of two matrices Mโ„Ž and M๐‘™ according to

\[
\mathbf{M} = \begin{bmatrix} \mathbf{M}_h & \mathbf{M}_l \end{bmatrix}.
\]

The matrices Mโ„Ž and M๐‘™ are constructed using Algorithm 6. Each column of Mโ„Ž has weight ๐‘‘โ„Ž := ⌈๐‘›๐‘ค/๐‘˜⌉ and each column of M๐‘™ has weight ๐‘‘๐‘™ := ⌊๐‘›๐‘ค/๐‘˜⌋. Note that according to (7.2), we require ๐‘‘๐‘™ ≥ 2 in order to tolerate a positive number of stragglers.
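For concreteness, when ๐‘›๐‘ค/๐‘˜ is not an integer the two column weights are the nearest integers on either side. A small sketch with illustrative values ๐‘› = 10, ๐‘˜ = 4, ๐‘ค = 3:

```python
from math import ceil, floor

# Illustrative parameters for which d = nw/k is not an integer.
n, k, w = 10, 4, 3          # nw/k = 7.5

d_h = ceil(n * w / k)       # column weight of M_h -> 8
d_l = floor(n * w / k)      # column weight of M_l -> 7

# (7.2) requires d_l >= 2 so that at least one straggler can be tolerated.
assert d_l >= 2
```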

Algorithm 7 Column-BalancedMaskMatrix M