Systematic Codes - Entropy Vectors, Frames, and Constrained Coding

In many scenarios, it is desirable to have the symbols of our original message $m$ appear as a subset of the symbols of the corresponding codeword $c$. This allows $m$ to be retrieved immediately in the absence of errors in $c$, without resorting to a lookup table, inverting the encoding function $f_M: m \mapsto c$, or performing any other decoding method that could be costly in computation or storage. For example, if our original message symbols $m = [m_1, \ldots, m_s]$ collectively represent a collection of data files, the codeword symbols $c = [c_1, \ldots, c_n]$ could represent encoded files stored on $n$ different servers, where $n > s$ to protect the data in case some servers crash. Suppose $c_i = m_i$ for $i = 1, \ldots, s$.

Then, if crashes occur only in the servers corresponding to $c_{s+1}, \ldots, c_n$, we can still easily access our original data $m = [c_1, \ldots, c_s]$, which can be used to quickly recompute the $c_j$ on the servers that have crashed (for $j > s$).

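Definition 6. Let $s$ and $n$ be integers with $n \ge s$, and let $q$ be a power of a prime. For any vector $c \in \mathbb{F}_q^n$ and any subset $I \subseteq [n]$, let $[c]_I$ denote the subvector of $c$ in the entries indexed by $I$. Let $f: \mathbb{F}_q^s \to \mathbb{F}_q^n$ be a function such that, for some fixed subset $I^{\mathrm{sys}} \subseteq [n]$ of size $s$, we have $[f(m)]_{I^{\mathrm{sys}}} = m$ for all $m \in \mathbb{F}_q^s$. Then the set $\mathcal{C} = \{c \in \mathbb{F}_q^n : c = f(m),\ m \in \mathbb{F}_q^s\}$ is called a systematic code, or a code in systematic form. For any $c = [c_1, \ldots, c_n] \in \mathcal{C}$, the symbols $c_j$, $j \in I^{\mathrm{sys}}$, are called the systematic symbols of $c$, and the remaining $c_j$, $j \notin I^{\mathrm{sys}}$, are the parity symbols.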
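As an illustration of Definition 6 and the storage example above, here is a minimal sketch over a prime field, assuming $q = p = 7$, $s = 3$, $n = 5$ and 0-based indices (so $I^{\mathrm{sys}} = \{0, 1, 2\}$ stands for $\{1, \ldots, s\}$). The encoder `encode` and its two parity rules are hypothetical, chosen only so that the first $s$ coordinates copy the message.

```python
from itertools import product

def encode(m, p):
    """Hypothetical systematic encoder f: F_p^s -> F_p^(s+2).
    The first s coordinates copy m; two parity symbols follow."""
    parity_sum = sum(m) % p
    parity_weighted = sum((k + 1) * x for k, x in enumerate(m)) % p
    return list(m) + [parity_sum, parity_weighted]

def restrict(c, I):
    """[c]_I: the subvector of c on the indices in I, in increasing order."""
    return [c[j] for j in sorted(I)]

p, s = 7, 3
I_sys = set(range(s))  # systematic positions; positions s and s+1 are parity
for m in product(range(p), repeat=s):
    c = encode(m, p)
    # Definition 6: [f(m)]_{I^sys} = m for every message m.
    assert restrict(c, I_sys) == list(m)
    # If only parity servers crash, re-encoding the systematic part restores them.
    assert encode(restrict(c, I_sys), p) == c
```

Reading $m$ off the systematic positions requires no decoding step at all, which is exactly the advantage described above.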

If $\mathcal{C}$ is a linear code with generator matrix $G \in \mathbb{F}_q^{s \times n}$ (and encoder $m \mapsto mG$), then $\mathcal{C}$ being systematic is equivalent to the columns of the $s \times s$ identity matrix $I_s$ arising as a subset of the columns of $G$. Let us examine what this means in the context of codes with constraints. As before, let $\mathcal{G} = (\mathcal{M}, \mathcal{V}, \mathcal{E})$ be a bipartite graph with $|\mathcal{M}| = s$ and $|\mathcal{V}| = n$, where we identify the message symbols $m_1, \ldots, m_s$ with the vertices of $\mathcal{M}$ and the codeword symbols $c_1, \ldots, c_n$ with those of $\mathcal{V}$. For each $c_j \in \mathcal{V}$, we have an associated function $c_j = f_j(\{m_i \in N(c_j)\})$. Thus, if $c_j$ is a systematic symbol in a systematic code $\mathcal{C}$ such that $c_j = m_i$, it must be that $m_i \in N(c_j)$; in other words, $(m_i, c_j) \in \mathcal{E}$. With this in mind, we recall the following definition from basic graph theory:

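Definition 7. Let $\mathcal{G} = (\mathcal{M}, \mathcal{V}, \mathcal{E})$ be a bipartite graph. A subset $\tilde{\mathcal{E}} \subseteq \mathcal{E}$ is called a matching for $\mathcal{G}$ if no two edges in $\tilde{\mathcal{E}}$ share a common vertex. $\tilde{\mathcal{E}}$ is said to be a maximal matching if it is not a proper subset of any other matching. A subset $\mathcal{S} \subseteq \mathcal{M} \cup \mathcal{V}$ is said to be covered by $\tilde{\mathcal{E}}$ if each vertex in $\mathcal{S}$ is incident to an edge in $\tilde{\mathcal{E}}$.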
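The three notions in Definition 7 can be checked mechanically. The sketch below assumes edges are `(m, c)` pairs whose message and code vertices carry distinct labels (here strings such as `"m1"` and `"c1"`); all function names are hypothetical. Maximality is tested as "no further edge of $\mathcal{E}$ can be added", which is equivalent to not being a proper subset of another matching.

```python
def is_matching(edges):
    """True if no two edges (m, c) share a message vertex or a code vertex."""
    ms = [m for m, _ in edges]
    cs = [c for _, c in edges]
    return len(set(ms)) == len(ms) and len(set(cs)) == len(cs)

def is_maximal(matching, E):
    """A matching is maximal iff no further edge of E can be added to it."""
    matching = set(matching)
    return is_matching(matching) and all(
        e in matching or not is_matching(matching | {e}) for e in E)

def covers(S, matching):
    """True if every vertex of S is incident to some edge of the matching."""
    touched = {v for edge in matching for v in edge}
    return set(S) <= touched

E = {("m1", "c1"), ("m1", "c2"), ("m2", "c2")}
# {(m1, c2)} is maximal (both other edges clash with it) yet leaves m2 uncovered,
# whereas {(m1, c1), (m2, c2)} is a matching covering M = {m1, m2}.
assert is_maximal({("m1", "c2")}, E)
assert not covers({"m1", "m2"}, {("m1", "c2")})
assert covers({"m1", "m2"}, {("m1", "c1"), ("m2", "c2")})
```

The example shows why the lemma below asks for an $\mathcal{M}$-covering matching rather than merely a maximal one.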

Under this terminology, the following is clear:

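Lemma 15. Let $\mathcal{G} = (\mathcal{M}, \mathcal{V}, \mathcal{E})$ be a bipartite graph with $|\mathcal{M}| = s$ and $|\mathcal{V}| = n$. Then there exists a systematic code $\mathcal{C} = \{c \in \mathbb{F}_q^n : c = f_{[n]}(m),\ m \in \mathbb{F}_q^s\}$, for some $f_{[n]}(\cdot)$, which fits the constraints of $\mathcal{G}$ only if there is an $\mathcal{M}$-covering matching $\tilde{\mathcal{E}} \subseteq \mathcal{E}$ for $\mathcal{G}$.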

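Proof. Let $I^{\mathrm{sys}} \subseteq [n]$ be the indices of the systematic symbols of each $c = [c_1, \ldots, c_n] \in \mathcal{C}$. For each $j \in I^{\mathrm{sys}}$, let $i_j \in [s]$ be such that $c_j = m_{i_j}$. Since $\mathcal{C}$ is constrained by $\mathcal{G}$, we necessarily have $(m_{i_j}, c_j) \in \mathcal{E}$. Note that, by the nature of the systematic code, for any two distinct $j_1$ and $j_2$ in $I^{\mathrm{sys}}$ we necessarily have $i_{j_1} \neq i_{j_2}$. Furthermore, for any $m_i \in \mathcal{M}$, there must be some $j \in I^{\mathrm{sys}}$ such that $i = i_j$. Thus, the set of edges $\tilde{\mathcal{E}} := \{(m_{i_j}, c_j) : j \in I^{\mathrm{sys}}\}$ is a matching for $\mathcal{G}$ which covers $\mathcal{M}$.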
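For a linear systematic code, the proof of Lemma 15 can be traced directly on a generator matrix. The sketch below assumes a prime field $\mathbb{F}_5$, a hypothetical $2 \times 4$ matrix `G`, 0-based indices, and takes the constraint graph to be the support of `G` (edge $(i, j)$ whenever $G_{i,j} \neq 0$), the smallest graph that `G` fits. The helper names are hypothetical.

```python
def support_edges(G, p):
    """Edges (i, j) of the constraint graph read off from the nonzero entries of G."""
    return {(i, j) for i, row in enumerate(G) for j, g in enumerate(row) if g % p}

def matching_from_systematic(G, p):
    """Mirror the proof of Lemma 15: for each message index i, pick a column j
    of G equal to the unit vector e_i (so c_j = m_i) and record the edge (i, j)."""
    s, n = len(G), len(G[0])
    edges = []
    for i in range(s):
        unit = [int(r == i) for r in range(s)]
        j = next((col for col in range(n)
                  if [G[r][col] % p for r in range(s)] == unit), None)
        if j is None:
            raise ValueError(f"G is not systematic: no column equals e_{i}")
        edges.append((i, j))
    return edges

G = [[1, 3, 0, 2],
     [0, 4, 1, 1]]
p = 5
E_tilde = matching_from_systematic(G, p)
# The edges lie in the support graph, use distinct code symbols, and cover M.
assert set(E_tilde) <= support_edges(G, p)
assert len({j for _, j in E_tilde}) == len(E_tilde) == len(G)
```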

A crucial tool in examining matchings is Hall’s Theorem:

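Theorem 27 (Hall’s Theorem). Let $\mathcal{G} = (\mathcal{M}, \mathcal{V}, \mathcal{E})$ be a bipartite graph. An $\mathcal{M}$-covering matching exists if and only if $|\mathcal{M}'| \le |N(\mathcal{M}')|$ for all subsets $\mathcal{M}' \subseteq \mathcal{M}$.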

Proof. This is a well-known result in graph theory, proven by Philip Hall in [52]. An accessible proof appears on p. 53 of [67].
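Hall's Theorem is existential; in practice an $\mathcal{M}$-covering matching can be found with an augmenting-path method. The sketch below uses Kuhn's augmenting-path algorithm (a standard choice, not one prescribed by the text) and compares it against a brute-force check of Hall's condition on every bipartite graph with $|\mathcal{M}| = |\mathcal{V}| = 3$. The graph is given as a dict `N` mapping each message vertex to its neighbourhood; names are hypothetical.

```python
from itertools import combinations, product

def covering_matching(M, N):
    """Kuhn's augmenting-path algorithm. Returns {m: code vertex} for a maximum
    matching; it covers M iff its size equals len(M)."""
    owner = {}  # code vertex -> message vertex currently matched to it

    def augment(m, seen):
        for v in sorted(N[m]):
            if v in seen:
                continue
            seen.add(v)
            # Take v if it is free, or if its owner can be re-routed elsewhere.
            if v not in owner or augment(owner[v], seen):
                owner[v] = m
                return True
        return False

    for m in M:
        augment(m, set())
    return {m: v for v, m in owner.items()}

def hall_condition(M, N):
    """Brute-force check of |M'| <= |N(M')| over all nonempty M' of M."""
    return all(len(set().union(*(N[m] for m in sub))) >= r
               for r in range(1, len(M) + 1)
               for sub in combinations(M, r))

# Exhaustive sanity check on all 2^9 bipartite graphs with |M| = |V| = 3.
M, V = ["m1", "m2", "m3"], ["c1", "c2", "c3"]
for bits in product([0, 1], repeat=len(M) * len(V)):
    N = {m: {v for b, v in enumerate(V) if bits[a * len(V) + b]}
         for a, m in enumerate(M)}
    assert hall_condition(M, N) == (len(covering_matching(M, N)) == len(M))
```

The brute-force check is exponential in $|\mathcal{M}|$, whereas the augmenting-path search runs in polynomial time, which is why the latter is the natural way to realise the matching used below.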

5.5.1 Systematic Code Construction Using Reed-Solomon Codes

We now present a sufficient condition on our bipartite constraint graph $\mathcal{G} = (\mathcal{M}, \mathcal{V}, \mathcal{E})$ which allows us to construct a systematic code that meets our constraints and achieves the upper bound on minimum distance from Corollary 6. Our code will be a linear subcode of a Reed-Solomon code and will have dimension equal to $|\mathcal{M}|$. Loosely speaking, our construction relies on a sufficient amount of connectivity in $\mathcal{G}$.

Theorem 28. Let $\mathcal{G} = (\mathcal{M}, \mathcal{V}, \mathcal{E})$ be a bipartite graph with $N(\mathcal{M}) = \mathcal{V}$, $|\mathcal{M}| = s$, and $|\mathcal{V}| = n$.

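Define the set $\mathcal{A} := \{c_j \in \mathcal{V} : N(c_j) = \mathcal{M}\}$, the set of code symbols which are connected to all the message symbols. Let $d_{\min} := \min_{\emptyset \neq \mathcal{M}' \subseteq \mathcal{M}} \left(|N(\mathcal{M}')| - |\mathcal{M}'|\right) + 1$ and $k_{\min} := n - d_{\min} + 1$. Then, if $q$ is a prime power with $q \ge n$, a linear code $\mathcal{C} \subseteq \mathbb{F}_q^n$ fitting the constraints of $\mathcal{G}$ can be constructed with a generator matrix $G \in \mathbb{F}_q^{s \times n}$ in systematic form, provided that $k_{\min} \ge |\mathcal{V} \setminus \mathcal{A}|$.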
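The quantities in Theorem 28 are easy to compute for small graphs. The sketch below assumes the graph is given as a dict `N` of message-vertex neighbourhoods and minimises over nonempty subsets $\mathcal{M}'$ (the empty set would trivially give the value 1); the function name and both example graphs are hypothetical.

```python
from itertools import combinations

def theorem28_parameters(M, V, N):
    """Return (A, d_min, k_min, ok), where ok says whether the hypothesis
    k_min >= |V minus A| of Theorem 28 holds. N[m] is the neighbourhood of m."""
    n = len(V)
    A = {c for c in V if all(c in N[m] for m in M)}  # fully connected symbols
    d_min = min(len(set().union(*(N[m] for m in sub))) - len(sub) + 1
                for r in range(1, len(M) + 1)
                for sub in combinations(M, r))
    k_min = n - d_min + 1
    return A, d_min, k_min, k_min >= len(set(V) - A)

M = ["m1", "m2"]
V = ["c1", "c2", "c3", "c4", "c5"]
N = {"m1": {"c1", "c3", "c4", "c5"}, "m2": {"c2", "c3", "c4", "c5"}}
A, d_min, k_min, ok = theorem28_parameters(M, V, N)
```

Here $\mathcal{A} = \{c_3, c_4, c_5\}$, every nonempty $\mathcal{M}'$ gives $|N(\mathcal{M}')| - |\mathcal{M}'| + 1 = 4$, so $d_{\min} = 4$, $k_{\min} = 2$, and the hypothesis $2 \ge |\{c_1, c_2\}|$ holds. By contrast, two message symbols with disjoint neighbourhoods of size 3 give $\mathcal{A} = \emptyset$ and violate the hypothesis.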

Proof. By our hypotheses, we have

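$$n = |\mathcal{V} \setminus \mathcal{A}| + |\mathcal{A}| \le k_{\min} + |\mathcal{A}|, \tag{5.9}$$

and by our definition of $k_{\min}$, this gives us $|\mathcal{A}| \ge d_{\min} - 1$. Let $\mathcal{B} \subseteq \mathcal{A}$ be a set of size $d_{\min} - 1$, and set $\mathcal{A}^* := \mathcal{A} \setminus \mathcal{B}$, $\mathcal{V}^* := \mathcal{V} \setminus \mathcal{B}$, and $\mathcal{E}^* := \{(m_i, c_j) \in \mathcal{E} : c_j \in \mathcal{V}^*\}$. Then define the corresponding subgraph of our bipartite graph, $\mathcal{G}^* = (\mathcal{M}, \mathcal{V}^*, \mathcal{E}^*)$, in which $\mathcal{A}^*$ is precisely the set of vertices in $\mathcal{V}^*$ that are connected to all of $\mathcal{M}$. Its cardinality is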

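$$|\mathcal{A}^*| = |\mathcal{A}| - (d_{\min} - 1). \tag{5.10}$$

To avoid confusion, for any subset $\mathcal{M}' \subseteq \mathcal{M}$, we denote the neighborhood of $\mathcal{M}'$ in $\mathcal{V}^*$ by $N^*(\mathcal{M}')$, while still using the notation $N(\mathcal{M}')$ for the neighborhood of $\mathcal{M}'$ in the entire set $\mathcal{V}$. For nonempty $\mathcal{M}'$ we have $\mathcal{A} \subseteq N(\mathcal{M}')$, so $N^*(\mathcal{M}') = N(\mathcal{M}') \setminus \mathcal{B}$ can be expressed as the disjoint union $(N(\mathcal{M}') \setminus \mathcal{A}) \sqcup \mathcal{A}^*$, and hence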

$$|N^*(\mathcal{M}')| = |N(\mathcal{M}') \setminus \mathcal{A}| + |\mathcal{A}^*|. \tag{5.11}$$

On the other hand, by the definition of $d_{\min}$, for every nonempty $\mathcal{M}' \subseteq \mathcal{M}$ we have

$$|\mathcal{M}'| \le |N(\mathcal{M}')| - (d_{\min} - 1) = |N(\mathcal{M}') \setminus \mathcal{A}| + |\mathcal{A}| - (d_{\min} - 1). \tag{5.12}$$

Combining our relations from (5.10), (5.11), and (5.12), we obtain

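$$|\mathcal{M}'| \le |N^*(\mathcal{M}')| \quad \forall\, \mathcal{M}' \subseteq \mathcal{M}, \tag{5.13}$$

where the case $\mathcal{M}' = \emptyset$ is trivial. Thus we can apply Hall's Theorem to the subgraph $\mathcal{G}^*$ to find a matching $\tilde{\mathcal{E}} \subseteq \mathcal{E}^*$ which covers $\mathcal{M}$. If we let $c_{j(i)}$ be the vertex matched to $m_i$, then we can write this matching as $\tilde{\mathcal{E}} = \{(m_i, c_{j(i)})\}_{i=1}^s \subseteq \mathcal{E}^*$. Let $\tilde{\mathcal{V}} = \{c_{j(i)}\}_{i=1}^s$ be the subset of $\mathcal{V}^*$ which is covered by $\tilde{\mathcal{E}}$.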
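The reduction from $\mathcal{G}$ to $\mathcal{G}^*$ can be checked numerically. The sketch below assumes a hypothetical graph with $\mathcal{M} = \{m_1, m_2\}$ and $\mathcal{V} = \{c_1, \ldots, c_4\}$, picks $\mathcal{B}$ as the first $d_{\min} - 1$ elements of $\mathcal{A}$ in sorted order (the proof allows any choice), and verifies (5.11) and Hall's condition (5.13) in $\mathcal{G}^*$ for every nonempty $\mathcal{M}'$. Function names are hypothetical.

```python
from itertools import combinations

def nbhd(sub, N):
    """Neighbourhood of a set of message vertices."""
    return set().union(*(N[m] for m in sub)) if sub else set()

def nonempty_subsets(M):
    return [sub for r in range(1, len(M) + 1) for sub in combinations(M, r)]

def reduced_graph(M, V, N):
    """Build G* as in the proof of Theorem 28 by removing a set B of
    d_min - 1 fully connected code vertices. Returns (A, B, V_star, N_star)."""
    d_min = min(len(nbhd(sub, N)) - len(sub) + 1 for sub in nonempty_subsets(M))
    A = sorted(c for c in V if all(c in N[m] for m in M))
    if len(A) < d_min - 1:
        raise ValueError("hypothesis of Theorem 28 fails")
    B = set(A[:d_min - 1])  # any d_min - 1 elements of A would do
    V_star = [c for c in V if c not in B]
    N_star = {m: N[m] - B for m in M}
    return set(A), B, V_star, N_star

M, V = ["m1", "m2"], ["c1", "c2", "c3", "c4"]
N = {"m1": {"c1", "c2", "c3", "c4"}, "m2": {"c2", "c3", "c4"}}
A, B, V_star, N_star = reduced_graph(M, V, N)
A_star = A - B
for sub in nonempty_subsets(M):
    # (5.11): |N*(M')| = |N(M') \ A| + |A*|, and (5.13): Hall's condition in G*.
    assert len(nbhd(sub, N_star)) == len(nbhd(sub, N) - A) + len(A_star)
    assert len(sub) <= len(nbhd(sub, N_star))
```

In this example $d_{\min} = 3$, so two of the three fully connected symbols are set aside and $\mathcal{A}^* = \{c_4\}$ remains in $\mathcal{G}^*$.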

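The symbol $c_{j(i)}$ will correspond to the systematic coordinate of our codeword that equals message symbol $m_i$. Consequently, any edge $(m_{i'}, c_{j(i)})$ with $i' \neq i$ is effectively ignored, since $m_{i'}$ must not contribute to $c_{j(i)}$. We therefore define the set of ignored edges

$$\mathcal{E}^{\mathrm{neg}} := \{(m_i, c_j) \in \mathcal{E} : c_j \in \tilde{\mathcal{V}},\ j \neq j(i)\}.$$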

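Let $A_{\tilde{\mathcal{E}}}$ be the $s \times n$ adjacency matrix of the graph $\tilde{\mathcal{G}} := (\mathcal{M}, \mathcal{V}, \mathcal{E} \setminus \mathcal{E}^{\mathrm{neg}})$, which is the graph $\mathcal{G}$ after removing the ignored edges; that is, $[A_{\tilde{\mathcal{E}}}]_{i,j} = 1$ if $(m_i, c_j) \in \mathcal{E} \setminus \mathcal{E}^{\mathrm{neg}}$ and $0$ otherwise. Note that any code fitting the constraints imposed by $\tilde{\mathcal{G}}$ will automatically fit those of the original graph $\mathcal{G}$. We claim that the number of zeros in any row of $A_{\tilde{\mathcal{E}}}$ is at most $n - d_{\min}$. Indeed, each message symbol vertex $m_i$ remains connected to its matched vertex $c_{j(i)} \in \mathcal{V}^*$ and to all $d_{\min} - 1$ vertices in $\mathcal{B}$ (no edge into $\mathcal{B}$ is ignored, since $\tilde{\mathcal{V}} \subseteq \mathcal{V}^*$), so the corresponding row of $A_{\tilde{\mathcal{E}}}$ has at least $d_{\min}$ ones.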
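The pruning step and the row-weight claim can be sketched as follows, assuming 0-based indices ($i$ for $m_{i+1}$, $j$ for $c_{j+1}$), a hypothetical graph with $s = 2$, $n = 4$ in which $m_1$ sees every code symbol and $m_2$ sees $c_2, c_3, c_4$ (so $d_{\min} = 3$ and $\mathcal{B}$ can be taken as $\{c_2, c_3\}$), and a hard-coded $\mathcal{M}$-covering matching of $\mathcal{G}^*$. Function names are hypothetical.

```python
def ignored_edges(E, j_of):
    """E^neg: edges (i, j) into a systematic position j that is reserved for
    another message symbol. j_of[i] is the code position matched to message i."""
    V_tilde = set(j_of.values())
    return {(i, j) for (i, j) in E if j in V_tilde and j != j_of[i]}

def adjacency(s, n, edges):
    """s x n 0/1 matrix with a 1 in entry (i, j) iff (i, j) is an edge."""
    return [[1 if (i, j) in edges else 0 for j in range(n)] for i in range(s)]

s, n, d_min = 2, 4, 3
E = {(0, 0), (0, 1), (0, 2), (0, 3), (1, 1), (1, 2), (1, 3)}
j_of = {0: 0, 1: 3}  # a covering matching in G* after setting aside B = {1, 2}
E_neg = ignored_edges(E, j_of)
A_tilde = adjacency(s, n, E - E_neg)
zeros_per_row = [row.count(0) for row in A_tilde]
# Each row keeps its matched position and all of B, so at most n - d_min zeros.
assert all(z <= n - d_min for z in zeros_per_row)
```

Only the edge from $m_1$ into $c_4$, the systematic position reserved for $m_2$, is ignored, and each row ends up with exactly $n - d_{\min} = 1$ zero.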

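Now we can construct a linear code with a generator matrix $G$ having the same support as $A_{\tilde{\mathcal{E}}}$, and thus meeting the constraints imposed by the graph $\tilde{\mathcal{G}}$ (and therefore $\mathcal{G}$). We will form our code as a linear subcode of a Reed-Solomon code, as described in Section 5.4. Select distinct elements $\{\alpha_1, \ldots, \alpha_n\} \subseteq \mathbb{F}_q$ (possible since $q \ge n$), and form the generator matrix $G_{\mathrm{RS}}$ of equation (5.5) for an $[n, k_{\min}]_q$ Reed-Solomon code. To each $m_i \in \mathcal{M}$, associate the polynomial

$$t_i(x) := \prod_{\{j \,:\, [A_{\tilde{\mathcal{E}}}]_{i,j} = 0\}} (x - \alpha_j).$$

By our above discussion, $\deg(t_i(x)) \le n - d_{\min} = k_{\min} - 1$ for each $i$. Now, for each $i$, expressing $t_i(x)$ in the form $\sum_{i'=1}^{k_{\min}} t_{i,i'} x^{i'-1}$, we define the coefficient vector

$$\mathbf{t}_i := \frac{1}{t_i(\alpha_{j(i)})} [t_{i,1}, \ldots, t_{i,k_{\min}}],$$

where we have normalized the polynomial's coefficients so that its evaluation at $\alpha_{j(i)}$ is 1. This is possible because $[A_{\tilde{\mathcal{E}}}]_{i,j(i)} = 1$, so $\alpha_{j(i)}$ is not a root of $t_i(x)$. Then, if we stack the vectors $\mathbf{t}_i$ to form the matrix $T$ as in (5.8), and set

$$G = T G_{\mathrm{RS}} = \left[ \frac{t_i(\alpha_j)}{t_i(\alpha_{j(i)})} \right]_{i \in [s],\, j \in [n]}, \tag{5.14}$$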

we see that $G$ has zeros precisely in the locations of the zeros of $A_{\tilde{\mathcal{E}}}$ (the $\alpha_j$ being distinct), so it is the generator matrix for a linear code $\mathcal{C}$ fitting our constraints. It is in systematic form: for $i' \neq i$ we have $[A_{\tilde{\mathcal{E}}}]_{i',j(i)} = 0$, so column $j(i)$ of $G$ is the $i$-th unit vector, and the columns indexed by $\{c_{j(i)}\}_{i=1}^s$ form a permutation of the columns of the $s \times s$ identity matrix. This also immediately shows that the code has full rank $s$. Finally, the minimum distance $d(\mathcal{C})$ of our code must be at least that of the $[n, k_{\min}]_q$ Reed-Solomon code from which it is derived, namely $n - k_{\min} + 1 = d_{\min}$; thus $d(\mathcal{C}) \ge d_{\min}$. But by Corollary 6 the reverse inequality also holds, so we must have $d(\mathcal{C}) = d_{\min}$.
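The whole construction of Theorem 28 fits in a short sketch. It assumes a prime field $\mathbb{F}_p$ with $p \ge n$ (the text allows any prime power $q \ge n$), evaluation points $\alpha_j = j$ (0-based), and that $G_{\mathrm{RS}}$ in (5.5) is the Vandermonde matrix with rows $\alpha_j^0, \ldots, \alpha_j^{k_{\min}-1}$, consistent with the coefficient ordering of $t_i(x)$ above. $\mathcal{B}$ is taken as the first $d_{\min} - 1$ elements of $\mathcal{A}$, the matching is found by Kuhn's augmenting-path algorithm, and the distance is checked by brute force, which is only feasible for tiny parameters. All names are hypothetical.

```python
from itertools import combinations, product

def construct(s, n, E, p):
    """Sketch of the Theorem 28 construction over a prime field F_p, p >= n.
    E holds edges (i, j): message symbol i may influence code symbol j.
    Returns (G, j_of, kept, d_min)."""
    if p < n:
        raise ValueError("need at least n distinct field elements")
    N = {i: {j for (a, j) in E if a == i} for i in range(s)}
    d_min = min(len(set().union(*(N[i] for i in sub))) - r + 1
                for r in range(1, s + 1) for sub in combinations(range(s), r))
    k_min = n - d_min + 1
    A = [j for j in range(n) if all(j in N[i] for i in range(s))]
    if k_min < n - len(A):
        raise ValueError("hypothesis k_min >= |V minus A| fails")
    B = set(A[:d_min - 1])

    # M-covering matching in G* via augmenting paths (exists by Hall's Theorem).
    owner = {}  # code position -> message index
    def augment(i, seen):
        for j in sorted(N[i] - B):
            if j not in seen:
                seen.add(j)
                if j not in owner or augment(owner[j], seen):
                    owner[j] = i
                    return True
        return False
    for i in range(s):
        if not augment(i, set()):
            raise ValueError("no M-covering matching in G*")
    j_of = {i: j for j, i in owner.items()}

    # Remove the ignored edges E^neg; 'kept' is the edge set of G~.
    V_tilde = set(j_of.values())
    kept = {(i, j) for (i, j) in E if not (j in V_tilde and j != j_of[i])}

    alpha = list(range(n))  # n distinct elements of F_p
    G_RS = [[pow(a, k, p) for a in alpha] for k in range(k_min)]  # Vandermonde
    T = []
    for i in range(s):
        t = [1]  # coefficients of t_i(x), lowest degree first
        for j in range(n):
            if (i, j) not in kept:  # t(x) <- x*t(x) - alpha_j*t(x)
                t = [(up - alpha[j] * c) % p for up, c in zip([0] + t, t + [0])]
        t += [0] * (k_min - len(t))  # deg t_i <= k_min - 1 by the row-weight claim
        value = sum(c * pow(alpha[j_of[i]], k, p) for k, c in enumerate(t)) % p
        inv = pow(value, p - 2, p)  # value != 0 because (i, j(i)) is kept
        T.append([c * inv % p for c in t])
    G = [[sum(T[i][k] * G_RS[k][j] for k in range(k_min)) % p
          for j in range(n)] for i in range(s)]
    return G, j_of, kept, d_min

def min_distance(G, p):
    """Brute-force minimum Hamming weight over all nonzero codewords mG."""
    s, n = len(G), len(G[0])
    return min(sum(1 for j in range(n) if sum(m[i] * G[i][j] for i in range(s)) % p)
               for m in product(range(p), repeat=s) if any(m))

E = {(0, 0), (0, 1), (0, 2), (0, 3), (1, 1), (1, 2), (1, 3)}
G, j_of, kept, d_min = construct(2, 4, E, p=5)
# Support of G equals the kept edges, columns j(i) are unit vectors, and d(C) = d_min.
assert all((G[i][j] != 0) == ((i, j) in kept) for i in range(2) for j in range(4))
assert all(G[r][j_of[i]] == int(r == i) for i in range(2) for r in range(2))
assert min_distance(G, 5) == d_min
```

For this graph the sketch yields $G = \begin{bmatrix} 1 & 4 & 2 & 0 \\ 0 & 2 & 4 & 1 \end{bmatrix}$ over $\mathbb{F}_5$, with systematic positions $c_1$ and $c_4$ and minimum distance $d_{\min} = 3$.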

In the document Entropy Vectors, Frames, and Constrained Coding (pages 109-113)