Problem Setup - Entropy Vectors, Frames, and Constrained Coding

5.1.1 Prior Work

The problem of constructing error-correcting codes with constrained encoding has been addressed by a variety of authors. Dau et al. [28–30] considered the problem of finding linear MDS codes with constrained generator matrices. They have shown that, under certain assumptions, such codes exist over large enough finite fields, as well as over small fields in a special case. A similar problem known as the weakly secure data exchange problem was studied in [111], [112]. The problem deals with a set of users, each with a subset of messages, who are interested in broadcasting their information securely when an eavesdropper is present. In particular, the authors of [112] conjecture the existence of secure codes based on Reed-Solomon codes and present a randomized algorithm to produce them.

The problem was also considered in the context of multisource multicast network coding in [33,50,51].

In [51], the capacity region of a simple multiple access network with three sources is achieved using Reed-Solomon codes. An analogous result is derived in [50] for general multicast networks with 3 sources using Gabidulin codes.

There has been a recent line of work involving what are known as locally repairable codes (LRCs), in which every parity symbol is a function of a predetermined set of data symbols. Codes with local repair properties were described as early as 2007 in the works of [24, 54, 58]. In [48], Gopalan et al. introduced bounds on code distance in terms of the locality constraints of LRCs, and since then there have been a number of new specific code constructions and extensions of these bounds [62,76,78,81,92].

Our work will also include theoretical distance bounds reminiscent of those in [48]. Another recent paper is that of Mazumdar [74] in which code symbols are represented as vertices of a partially connected graph. Each code symbol is a function of its neighbors and, if erased, can be recovered from them. Our code also utilizes a graph structure, though solely to describe the encoding procedure.

In other words, there is not necessarily a notion of an individual code symbol being repairable from a local subset of the other code symbols.

Figure 5.1: A bipartite graph representing the coding constraints. Here, there are 3 message symbols and 7 codeword symbols. Each $c_i$ is a function of the message symbols to which it is connected. For example, $c_1$ is a function of $\{m_1, m_2\}$.

some of these signals. Alternatively, the $m_i$ could be data files which must be stored in each member of a set of file servers. Each server might only have access to a local set of data files and seeks to store a function of these files, represented as $c_i$. In either case, we would like to select our encoding scheme subject to these constraints so that the original message $\mathbf{m}$ can be determined from $\mathbf{c}$ even in the case that some of the symbols $c_i$ are erased or corrupted.

We can represent our encoding constraints in the form of a bipartite graph $\mathcal{G} = (M, V, E)$, with vertex sets $M$ of size $s$ and $V$ of size $n$ representing the message symbols and codeword symbols respectively. As such, we will label the vertices in $M$ as $\{m_i\}_{i=1}^{s}$ and the vertices in $V$ as $\{c_i\}_{i=1}^{n}$. A pair $(m_i, c_j) \in M \times V$ is in the edge set $E$ if and only if $m_i \in I_{c_j}$, that is, $c_j$ is a function of $m_i$. For example, in Figure 5.1 we have $I_{c_1} = \{m_1, m_2\}$ and $I_{c_2} = \{m_2\}$.

Let us quickly establish some notation. For any vertex $m_i \in M$, we will let $N(m_i)$ denote the neighborhood of $m_i$ in $V$:

$$N(m_i) := \{c_j \in V : (m_i, c_j) \in E\}.$$

Likewise, we will consider neighborhoods of arbitrary subsets $M' \subseteq M$:

$$N(M') := \bigcup_{m_i \in M'} N(m_i).$$

We will denote neighborhoods of elements $c_j \in V$ and subsets $V' \subseteq V$ similarly. With this notation, it is clear that $N(c_j) = I_{c_j}$.
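As a small illustration, the neighborhoods can be computed directly from the edge set. The sketch below is not part of the original construction: it assumes the edges of Figure 5.1 as read off the adjacency matrix given for that example in (5.2), and the function names are hypothetical.

```python
# Edge set E of the Figure 5.1 example, stored as N(c_j) for each codeword
# symbol c_j. Indices are 1-based to match the text (m1..m3, c1..c7).
# Assumption: the edges are those encoded by the adjacency matrix (5.2).
N_c = {
    1: {1, 2},
    2: {2},
    3: {2, 3},
    4: {1, 3},
    5: {1, 2, 3},
    6: {1, 2, 3},
    7: {1, 2, 3},
}


def neighborhood_of_message(i):
    """N(m_i): indices j of the codeword symbols c_j that depend on m_i."""
    return {j for j, messages in N_c.items() if i in messages}


def neighborhood_of_messages(subset):
    """N(M'): union of N(m_i) over all m_i in the subset M'."""
    result = set()
    for i in subset:
        result |= neighborhood_of_message(i)
    return result


print(sorted(neighborhood_of_message(1)))
print(sorted(neighborhood_of_messages({2, 3})))
```

Storing the graph as the sets $N(c_j)$ mirrors the identity $N(c_j) = I_{c_j}$; the neighborhoods of message symbols are then obtained by inverting that map.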

When the $m_i$ are assigned values from $\mathbb{F}_q$, each $c_j$ has an associated function

$$f_j : \mathbb{F}_q^{|N(c_j)|} \to \mathbb{F}_q$$

which maps the set of values $\{m_i \in N(c_j)\}$ to a value of $c_j$. By abuse of notation, we will sometimes simply write $c_j = f_j(\mathbf{m})$, with the understanding that $c_j$ depends only on the coordinates of $\mathbf{m}$ which are in $N(c_j)$. If we let $[\mathbf{c}]_J$ be the subvector of $\mathbf{c}$ with elements indexed by $J \subseteq \{1, \ldots, n\}$, then we will write

$$f_J : \mathbb{F}_q^{|N(\{c_j : j \in J\})|} \to \mathbb{F}_q^{|J|}$$

for the function which sends $[\mathbf{m}]_{N(\{c_j : j \in J\})}$ to the vector $[\mathbf{c}]_J = (f_j(\mathbf{m}),\ j \in J)$. Under this notation, we have $\mathbf{c} = f_{[n]}(\mathbf{m})$, where $[n] := \{1, \ldots, n\}$. If we restrict the functions $f_j(\cdot)$ to be linear, then $C$ becomes a linear code.

If we define

$$C := \{\mathbf{c} \in \mathbb{F}_q^n : \exists\, \mathbf{m} \in \mathbb{F}_q^s \text{ s.t. } \mathbf{c} = f_{[n]}(\mathbf{m})\},$$

then $C$ is the set of all valid codewords, which is an error-correcting code of length $n$ and size at most $q^s$. Let $d(C)$ be the minimum distance of this code:

$$d(C) := \min_{\mathbf{c}_1, \mathbf{c}_2 \in C \,:\, \mathbf{c}_1 \neq \mathbf{c}_2} d_H(\mathbf{c}_1, \mathbf{c}_2),$$

where $d_H(\cdot,\cdot)$ denotes the Hamming distance between two vectors. In the case that our $f_j(\cdot)$ are linear, we have the following well-known equivalent definition of the code's minimum distance:

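Lemma 14. If $C$ is a linear code, then $d(C) = \min_{\mathbf{c} \in C,\ \mathbf{c} \neq \mathbf{0}} \|\mathbf{c}\|_H$, where $\|\mathbf{c}\|_H$ is the Hamming weight (the number of nonzero entries) of $\mathbf{c}$.

Proof. Since $C$ is linear, the all-zero codeword $\mathbf{0}$ is in $C$. Also, for any $\mathbf{c}_1, \mathbf{c}_2 \in C$, we have that $\mathbf{c}_1 - \mathbf{c}_2 \in C$. For distinct $\mathbf{c}_1, \mathbf{c}_2$ this difference is a nonzero codeword, and $d_H(\mathbf{c}_1, \mathbf{c}_2) = d_H(\mathbf{c}_1 - \mathbf{c}_2, \mathbf{0}) = \|\mathbf{c}_1 - \mathbf{c}_2\|_H$, so $d(C) \geq \min_{\mathbf{c} \in C,\ \mathbf{c} \neq \mathbf{0}} \|\mathbf{c}\|_H$. On the other hand, for any nonzero $\mathbf{c} \in C$ we have $\|\mathbf{c}\|_H = d_H(\mathbf{c}, \mathbf{0}) \geq d(C)$, so the reverse inequality also holds.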
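The lemma can be checked by brute force on a small example. The following is only a sanity check under stated assumptions, not part of the proof: the field is taken to be $\mathbb{F}_2$, and the generator matrix is an illustrative 3 × 7 binary matrix chosen for this sketch.

```python
from itertools import combinations, product

q = 2  # assumed field size; q is prime, so arithmetic mod q is field arithmetic

# Illustrative generator matrix (an assumption made for this sketch).
G = [
    [1, 0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1, 1],
]


def encode(m, G, q):
    """Return the codeword c = mG over F_q as a tuple."""
    s, n = len(G), len(G[0])
    return tuple(sum(m[i] * G[i][j] for i in range(s)) % q for j in range(n))


def hamming_distance(x, y):
    """Number of positions in which x and y differ."""
    return sum(a != b for a, b in zip(x, y))


# Enumerate the whole code by encoding every message in F_q^s.
codewords = {encode(m, G, q) for m in product(range(q), repeat=len(G))}

# Minimum distance from the definition: minimum over pairs of distinct codewords.
min_distance = min(hamming_distance(x, y) for x, y in combinations(codewords, 2))

# Minimum Hamming weight over the nonzero codewords.
min_weight = min(sum(a != 0 for a in c) for c in codewords if any(c))

print(min_distance, min_weight)
```

The pairwise search costs on the order of $|C|^2$ comparisons while the weight search costs $|C|$, which is the practical reason the weight characterization is preferred for linear codes.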

Let us assume our functions $f_j(\cdot)$ are linear, and $C$ is a linear code. This means that for each $j \in \{1, \ldots, n\}$, there is a column vector $\mathbf{g}_j \in \mathbb{F}_q^{s \times 1}$ such that $c_j = f_j(\mathbf{m}) = \mathbf{m} \cdot \mathbf{g}_j$. Since $c_j$ is a function of only the $m_i \in N(c_j)$, we see that the support of $\mathbf{g}_j$ must lie in the entries indexed by the elements of $N(c_j)$. If we concatenate the columns $\mathbf{g}_j$, we form the generator matrix of $C$,

$$G = [\mathbf{g}_1, \ldots, \mathbf{g}_n] \in \mathbb{F}_q^{s \times n}.$$

For any message vector $\mathbf{m}$, the corresponding codeword will be given by $\mathbf{c} = \mathbf{m}G$.
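To make the role of the support concrete, here is a minimal encoding sketch. It assumes a prime field size $q = 7$ (so that integers mod $q$ form $\mathbb{F}_q$) and a hypothetical generator matrix whose zero pattern follows the Figure 5.1 example; the nonzero entries are arbitrary choices for illustration, not a construction from the text.

```python
q = 7  # assumed prime field size, so integers mod q form F_q

# Hypothetical generator matrix with s = 3 rows and n = 7 columns.
# Column j is g_j; its nonzero entries lie only in rows indexed by N(c_j).
G = [
    [1, 0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 2, 3, 4],
    [0, 0, 1, 1, 4, 2, 2],
]


def encode(m, G, q):
    """Return c = mG over F_q, where symbol c_j is the product m . g_j."""
    s, n = len(G), len(G[0])
    return [sum(m[i] * G[i][j] for i in range(s)) % q for j in range(n)]


c = encode([3, 5, 6], G, q)
c_alt = encode([3, 5, 0], G, q)  # same m1 and m2, different m3

# c1 and c2 are not connected to m3, so they are unaffected by the change.
print(c[:2] == c_alt[:2])
print(c)
```

Because the columns $\mathbf{g}_1$ and $\mathbf{g}_2$ have a zero in the row of $m_3$, changing $m_3$ leaves $c_1$ and $c_2$ untouched, which is exactly the constraint imposed by the graph.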

We can describe the support of $G$ by examining the adjacency matrix $A \in \{0,1\}^{s \times n}$ of the bipartite graph $\mathcal{G} = (M, V, E)$ describing our code:

$$[A]_{i,j} := \begin{cases} 1 & \text{if } (m_i, c_j) \in E, \\ 0 & \text{otherwise.} \end{cases} \tag{5.1}$$

Thus the $j$th column of $A$ has support precisely on $N(c_j)$. Hence by our discussion above, a matrix $G$ will be a “valid” generator matrix for a code $C$ with constraints defined by the bipartite graph $\mathcal{G}$ if the support of $G$ is a subset of the support of $A$. In the example given in Figure 5.1, our adjacency matrix is

$$A = \begin{bmatrix} 1 & 0 & 0 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 & 1 & 1 & 1 \end{bmatrix}. \tag{5.2}$$
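The validity condition, together with the rank and minimum distance of a candidate code, can be checked mechanically for small parameters. The sketch below is a brute-force illustration only: it assumes a prime field size $q = 7$, and the matrix `G` is a hypothetical choice of entries on the support of $A$, not a construction from the text.

```python
from itertools import product
from math import ceil

q = 7  # assumed prime field size

# Adjacency matrix (5.2) of the Figure 5.1 example.
A = [
    [1, 0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1, 1],
]

# Hypothetical candidate generator matrix supported inside A.
G = [
    [1, 0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 2, 3, 4],
    [0, 0, 1, 1, 4, 2, 2],
]


def is_valid(G, A):
    """True if every nonzero entry of G sits where A has a 1."""
    return all(
        A[i][j] == 1 or G[i][j] % q == 0
        for i in range(len(A))
        for j in range(len(A[0]))
    )


def all_codewords(G, q):
    """Enumerate the code {mG : m in F_q^s} by brute force."""
    s, n = len(G), len(G[0])
    return {
        tuple(sum(m[i] * G[i][j] for i in range(s)) % q for j in range(n))
        for m in product(range(q), repeat=s)
    }


C = all_codewords(G, q)
full_rank = len(C) == q ** len(G)  # q^s distinct codewords iff G has rank s
d = min(sum(x != 0 for x in c) for c in C if any(c))  # min nonzero weight
t = ceil(d / 2) - 1  # number of errors that can always be corrected

print(is_valid(G, A), full_rank, d, t)
```

Enumerating all $q^s$ messages is only feasible for toy parameters; it is used here because it follows the definitions directly.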

The choice of support entries for a valid generator matrix $G$ determines both the rank of the code (which can be between 0 and $s$) and its minimum distance. In general, we seek to find a valid generator matrix which produces a full-rank code (yielding the maximum number of distinct codewords, $q^s$) while simultaneously maximizing the minimum distance of our code (allowing us to correctly determine a codeword even in the presence of up to $\lceil d(C)/2 \rceil - 1$ errors). Furthermore, we would like to ensure efficient methods to decode our codewords in the presence of errors. To this end, we will look to constructing our codes from Reed-Solomon codes, a common class of error-correcting
