In many scenarios, it is desirable to have the symbols of our original message $\mathbf{m}$ appear as a subset of the symbols of the corresponding codeword $\mathbf{c}$. This allows $\mathbf{m}$ to be retrieved immediately in the absence of errors in $\mathbf{c}$, without consulting a lookup table, inverting the function $f_{\mathcal{M}} : \mathbf{m} \mapsto \mathbf{c}$, or performing any other method of decoding which could be costly in computation or storage. For example, if our original message symbols $\mathbf{m} = [m_1, \ldots, m_s]$ collectively represent a collection of data files, the codeword symbols $\mathbf{c} = [c_1, \ldots, c_n]$ could represent encoded files stored on $n$ different servers, where $n > s$ to protect the data in the case of some servers crashing. Suppose $c_i = m_i$ for $i = 1, \ldots, s$. Then in the case where crashes occur only in the servers corresponding to $c_{s+1}, \ldots, c_n$, we can still easily access our original data $\mathbf{m} = [c_1, \ldots, c_s]$, which can be used to quickly recompute the $c_j$ on the servers which have crashed (for $j > s$).
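Definition 6. Let $s$ and $n$ be integers with $n \ge s$, and let $q$ be a power of a prime. For any vector $\mathbf{c} \in \mathbb{F}_q^n$ and any subset $\mathcal{I} \subseteq [n]$, let $[\mathbf{c}]_{\mathcal{I}}$ denote the subvector of $\mathbf{c}$ consisting of the entries indexed by $\mathcal{I}$. Let $f : \mathbb{F}_q^s \to \mathbb{F}_q^n$ be a function such that, for some fixed subset $\mathcal{I}_{sys} \subseteq [n]$ of size $s$, we have $[f(\mathbf{m})]_{\mathcal{I}_{sys}} = \mathbf{m}$. Then the set $\mathcal{C} = \{\mathbf{c} \in \mathbb{F}_q^n : \mathbf{c} = f(\mathbf{m}),\ \mathbf{m} \in \mathbb{F}_q^s\}$ is called a systematic code, or a code in systematic form. For any $\mathbf{c} = [c_1, \ldots, c_n] \in \mathcal{C}$, the symbols $c_j$, $j \in \mathcal{I}_{sys}$, are called the systematic symbols of $\mathbf{c}$, and the remaining $c_j$, $j \notin \mathcal{I}_{sys}$, are the parity symbols.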
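As a minimal illustration of Definition 6, the following Python sketch encodes a message over a small prime field by copying it into the first $s$ positions and appending linear parity symbols. The field size `P`, the parity coefficients, and the helper names `encode` and `subvector` are assumptions chosen for illustration and do not come from the text; indices are 0-based here, whereas the text uses $[n] = \{1, \ldots, n\}$.

```python
P = 7  # assumed prime field size, for illustration only


def encode(m, parity_rows, p=P):
    """Systematic encoder sketch: copy m, then append linear parity symbols."""
    parities = [sum(a * x for a, x in zip(row, m)) % p for row in parity_rows]
    return list(m) + parities


def subvector(c, index_set):
    """Return [c]_I, the entries of c indexed by I (0-based, in increasing order)."""
    return [c[j] for j in sorted(index_set)]


m = [3, 5]                      # message symbols (s = 2)
parity_rows = [[1, 1], [1, 2]]  # hypothetical parity coefficients (n - s = 2)
c = encode(m, parity_rows)      # codeword of length n = 4
I_sys = {0, 1}                  # systematic positions

# Reading off the systematic symbols recovers m without any decoding.
recovered = subvector(c, I_sys)
# The parity symbols (the "crashed servers") can be recomputed from m alone.
rebuilt_parities = encode(recovered, parity_rows)[len(m):]
```

Here `recovered` equals `m`, and `rebuilt_parities` reproduces the last two symbols of `c`, mirroring the storage scenario described above.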
If $\mathcal{C}$ is a linear code with generator matrix $\mathbf{G} \in \mathbb{F}_q^{s \times n}$, then $\mathcal{C}$ being systematic is equivalent to the columns of the $s \times s$ identity matrix $\mathbf{I}_s$ arising as a subset of the columns of $\mathbf{G}$. Let us examine what this means in the context of codes with constraints. As before, let $\mathcal{G} = (\mathcal{M}, \mathcal{V}, \mathcal{E})$ be a bipartite graph with $|\mathcal{M}| = s$ and $|\mathcal{V}| = n$, where we identify the message symbols $m_1, \ldots, m_s$ with the vertices of $\mathcal{M}$ and the codeword symbols $c_1, \ldots, c_n$ with those of $\mathcal{V}$. For each $c_j \in \mathcal{V}$, we have an associated function $c_j = f_j(\{m_i \in N(c_j)\})$. Thus if $c_j$ is a systematic symbol in a systematic code $\mathcal{C}$ such that $c_j = m_i$, it must be that $m_i \in N(c_j)$. In other words, $(m_i, c_j) \in \mathcal{E}$. On this note, we refer to the following definition from basic graph theory:
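Definition 7. Let $\mathcal{G} = (\mathcal{M}, \mathcal{V}, \mathcal{E})$ be a bipartite graph. A subset $\tilde{\mathcal{E}} \subseteq \mathcal{E}$ is called a matching for $\mathcal{G}$ if no two edges in $\tilde{\mathcal{E}}$ share a common vertex. $\tilde{\mathcal{E}}$ is said to be a maximal matching if it is not a proper subset of any other matching. A subset $\mathcal{S} \subseteq \mathcal{M} \cup \mathcal{V}$ is said to be covered by $\tilde{\mathcal{E}}$ if each vertex in $\mathcal{S}$ is incident to an edge in $\tilde{\mathcal{E}}$.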
Under this terminology, the following is clear:
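Lemma 15. Let $\mathcal{G} = (\mathcal{M}, \mathcal{V}, \mathcal{E})$ be a bipartite graph, with $|\mathcal{M}| = s$ and $|\mathcal{V}| = n$. Then there exists a systematic code $\mathcal{C} = \{\mathbf{c} \in \mathbb{F}_q^n : \mathbf{c} = f_{[n]}(\mathbf{m}),\ \mathbf{m} \in \mathbb{F}_q^s\}$ for some $f_{[n]}(\cdot)$ which fits the constraints of $\mathcal{G}$ only if there is an $\mathcal{M}$-covering matching $\tilde{\mathcal{E}} \subseteq \mathcal{E}$ for $\mathcal{G}$.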
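Proof. Let $\mathcal{I}_{sys} \subseteq [n]$ be the indices of the systematic symbols of each $\mathbf{c} = [c_1, \ldots, c_n] \in \mathcal{C}$. For each $j \in \mathcal{I}_{sys}$, let $i_j \in [s]$ be such that $c_j = m_{i_j}$. Since $\mathcal{C}$ is constrained by $\mathcal{G}$, we necessarily have $(m_{i_j}, c_j) \in \mathcal{E}$. Note that by the nature of the systematic code, for any two distinct $j_1$ and $j_2$ in $\mathcal{I}_{sys}$, we necessarily have $i_{j_1} \neq i_{j_2}$. Furthermore, for any $m_i \in \mathcal{M}$, there must be some $j \in \mathcal{I}_{sys}$ such that $i = i_j$. Thus, the set of edges $\tilde{\mathcal{E}} := \{(m_{i_j}, c_j) : j \in \mathcal{I}_{sys}\}$ is a matching for $\mathcal{G}$ which covers $\mathcal{M}$.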
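The argument of this proof can be traced on a small linear example. The sketch below locates the identity columns of a generator matrix, reads off the pairs $(m_i, c_{j})$ they define, and checks that each pair is an edge of the constraint graph. The matrix `G`, the neighborhoods `nbrs`, and the function name `systematic_positions` are hypothetical illustrations (0-based indices, entries assumed already reduced modulo a prime), not objects defined in the text.

```python
def systematic_positions(G):
    """Return {i: j} where column j of G is the i-th unit vector, or None."""
    s, n = len(G), len(G[0])
    found = {}
    for j in range(n):
        col = [G[i][j] for i in range(s)]
        # A unit vector has exactly one entry equal to 1 and all others 0.
        if sorted(col) == [0] * (s - 1) + [1]:
            found.setdefault(col.index(1), j)
    return found if len(found) == s else None


# Hypothetical example over GF(5): s = 2 message symbols, n = 4 code symbols.
G = [[1, 4, 2, 0],
     [0, 2, 4, 1]]
# nbrs[i] = set of code symbol indices adjacent to message symbol i.
nbrs = [{0, 1, 2, 3}, {1, 2, 3}]

matching = systematic_positions(G)
# Every systematic pair (m_i, c_j) must be an edge of the constraint graph.
edges_ok = all(j in nbrs[i] for i, j in matching.items())
# Distinct message symbols use distinct code symbols, so this is a matching.
is_matching = len(set(matching.values())) == len(matching)
```

In this example the identity columns sit in positions 0 and 3, giving the $\mathcal{M}$-covering matching $\{(m_0, c_0), (m_1, c_3)\}$ in 0-based indexing.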
A crucial tool in examining matchings is Hall’s Theorem:
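Theorem 27 (Hall’s Theorem). Let $\mathcal{G} = (\mathcal{M}, \mathcal{V}, \mathcal{E})$ be a bipartite graph. An $\mathcal{M}$-covering matching exists if and only if $|\mathcal{M}'| \le |N(\mathcal{M}')|$ for all subsets $\mathcal{M}' \subseteq \mathcal{M}$.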
Proof. This is a well-known result in graph theory, proven by Philip Hall in [52]. An accessible proof appears on p. 53 of [67].
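For small graphs, both sides of Hall’s Theorem can be checked directly. The following sketch tests Hall’s condition by brute force over all nonempty subsets of $\mathcal{M}$ and, separately, searches for an $\mathcal{M}$-covering matching with the standard augmenting-path method. The augmenting-path search is a common textbook technique swapped in here for illustration; it is not taken from the cited proofs, and the function names and example graphs are assumptions.

```python
from itertools import combinations


def neighborhood(nbrs, subset):
    """N(M'): union of the neighborhoods of the message vertices in subset."""
    out = set()
    for i in subset:
        out |= nbrs[i]
    return out


def hall_condition(nbrs):
    """Brute-force check of |M'| <= |N(M')| over all nonempty subsets M'."""
    s = len(nbrs)
    return all(len(neighborhood(nbrs, S)) >= len(S)
               for r in range(1, s + 1)
               for S in combinations(range(s), r))


def covering_matching(nbrs):
    """Return {i: j} matching every message vertex, or None if none exists."""
    match_of_c = {}  # code vertex j -> message vertex i currently matched to it

    def augment(i, seen):
        for j in sorted(nbrs[i]):
            if j in seen:
                continue
            seen.add(j)
            # Take j if it is free, or if its current partner can be re-matched.
            if j not in match_of_c or augment(match_of_c[j], seen):
                match_of_c[j] = i
                return True
        return False

    for i in range(len(nbrs)):
        if not augment(i, set()):
            return None
    return {i: j for j, i in match_of_c.items()}


nbrs = [{0, 1, 2, 3}, {1, 2, 3}]  # hypothetical graph with s = 2, n = 4
has_hall = hall_condition(nbrs)
matching = covering_matching(nbrs)
```

On a graph such as `[{0}, {0}]`, where two message vertices share a single neighbor, `hall_condition` fails and `covering_matching` returns `None`, in agreement with the theorem.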
5.5.1 Systematic Code Construction Using Reed-Solomon Codes
We now present a sufficient condition on our bipartite constraint graph $\mathcal{G} = (\mathcal{M}, \mathcal{V}, \mathcal{E})$ which allows us to construct a systematic code meeting our constraints which achieves the upper bound on distance from Corollary 6. Our code will be a linear subcode of a Reed-Solomon code, and will have dimension equal to $|\mathcal{M}|$. Loosely speaking, our construction relies on a sufficient amount of connectivity in $\mathcal{G}$.
Theorem 28. Let $\mathcal{G} = (\mathcal{M}, \mathcal{V}, \mathcal{E})$ be a bipartite graph where $N(\mathcal{M}) = \mathcal{V}$, with $|\mathcal{M}| = s$ and $|\mathcal{V}| = n$. Define the set $\mathcal{A} := \{c_j \in \mathcal{V} : N(c_j) = \mathcal{M}\}$, the set of code symbols which are connected to all the message symbols. Let $d_{\min} := \min_{\mathcal{M}' \subseteq \mathcal{M}} |N(\mathcal{M}')| - |\mathcal{M}'| + 1$ and $k_{\min} := n - d_{\min} + 1$. Then if $q$ is a prime power greater than or equal to $n$, a linear code $\mathcal{C} \subseteq \mathbb{F}_q^n$ can be constructed with a generator matrix $\mathbf{G} \in \mathbb{F}_q^{s \times n}$ in systematic form provided that $k_{\min} \ge |\mathcal{V} \setminus \mathcal{A}|$.
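The quantities in the theorem statement are easy to compute for a small graph. The sketch below evaluates $\mathcal{A}$, $d_{\min}$, $k_{\min}$, and the hypothesis $k_{\min} \ge |\mathcal{V} \setminus \mathcal{A}|$ by brute force. It assumes that the minimum is taken over nonempty subsets $\mathcal{M}'$ and that $N(\mathcal{M}) = \mathcal{V}$ holds for the input; the function name and the example graph are hypothetical.

```python
from itertools import combinations


def theorem_quantities(nbrs, n):
    """Compute A, d_min, k_min and the hypothesis k_min >= |V \\ A|.

    nbrs[i] is the set of (0-based) code symbol indices adjacent to message i.
    Assumption: the minimum defining d_min ranges over nonempty subsets M'.
    """
    s = len(nbrs)
    # A: code symbols adjacent to every message symbol.
    A = set(range(n)).intersection(*nbrs)
    d_min = min(
        len(set().union(*(nbrs[i] for i in S))) - len(S) + 1
        for r in range(1, s + 1)
        for S in combinations(range(s), r)
    )
    k_min = n - d_min + 1
    return {"A": A, "d_min": d_min, "k_min": k_min,
            "condition": k_min >= n - len(A)}


# Hypothetical graph: m_0 ~ {c_0, c_1, c_2, c_3}, m_1 ~ {c_1, c_2, c_3}.
nbrs = [{0, 1, 2, 3}, {1, 2, 3}]
q = theorem_quantities(nbrs, n=4)
```

For this graph the subsets $\{m_0\}$, $\{m_1\}$, and $\{m_0, m_1\}$ give the values 4, 3, and 3 respectively, so $d_{\min} = 3$ and $k_{\min} = 2$, while $|\mathcal{V} \setminus \mathcal{A}| = 1$, and the hypothesis is satisfied.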
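Proof. By our hypotheses, we have
$$n = |\mathcal{V} \setminus \mathcal{A}| + |\mathcal{A}| \le k_{\min} + |\mathcal{A}|, \tag{5.9}$$
and by our definition of $k_{\min}$, this gives us $|\mathcal{A}| \ge d_{\min} - 1$. Let $\mathcal{B} \subseteq \mathcal{A}$ be a set of size $d_{\min} - 1$, and set $\mathcal{A}^* := \mathcal{A} \setminus \mathcal{B}$, $\mathcal{V}^* := \mathcal{V} \setminus \mathcal{B}$, and $\mathcal{E}^* := \{(m_i, c_j) \in \mathcal{E} : c_j \in \mathcal{V}^*\}$. Then define the corresponding subgraph of our bipartite graph, $\mathcal{G}^* = (\mathcal{M}, \mathcal{V}^*, \mathcal{E}^*)$, in which $\mathcal{A}^*$ is precisely the set of vertices in $\mathcal{V}^*$ which are connected to all of $\mathcal{M}$. Its cardinality is
$$|\mathcal{A}^*| = |\mathcal{A}| - (d_{\min} - 1). \tag{5.10}$$
To avoid confusion, for any subset $\mathcal{M}' \subseteq \mathcal{M}$, we will denote the neighborhood of $\mathcal{M}'$ in $\mathcal{V}^*$ as $N^*(\mathcal{M}')$, while still using the notation $N(\mathcal{M}')$ to denote the neighborhood of $\mathcal{M}'$ in the entire set $\mathcal{V}$. For nonempty $\mathcal{M}'$, every vertex of $\mathcal{A}$ is adjacent to $\mathcal{M}'$, so we can express $N^*(\mathcal{M}')$ as the disjoint union $(N(\mathcal{M}') \setminus \mathcal{A}) \sqcup \mathcal{A}^*$, and we have
$$|N^*(\mathcal{M}')| = |N(\mathcal{M}') \setminus \mathcal{A}| + |\mathcal{A}^*|. \tag{5.11}$$
On the other hand, by the definition of $d_{\min}$ we have
$$|\mathcal{M}'| \le |N(\mathcal{M}')| - (d_{\min} - 1) = |N(\mathcal{M}') \setminus \mathcal{A}| + |\mathcal{A}| - (d_{\min} - 1). \tag{5.12}$$
Combining our relations from (5.10), (5.11), and (5.12), we obtain
$$|\mathcal{M}'| \le |N^*(\mathcal{M}')|, \quad \forall\, \mathcal{M}' \subseteq \mathcal{M}, \tag{5.13}$$
where the case $\mathcal{M}' = \emptyset$ holds trivially. Thus we can apply Hall’s Theorem to the subgraph $\mathcal{G}^*$ to find a matching $\tilde{\mathcal{E}} \subseteq \mathcal{E}^*$ which covers $\mathcal{M}$. If we let $c_{j(i)}$ be the vertex matched to $m_i$, then we can write this matching as $\tilde{\mathcal{E}} = \{(m_i, c_{j(i)})\}_{i=1}^{s} \subseteq \mathcal{E}^*$. Let $\tilde{\mathcal{V}} = \{c_{j(i)}\}_{i=1}^{s}$ be the subset of $\mathcal{V}^*$ which is covered by $\tilde{\mathcal{E}}$.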
The symbol $c_{j(i)}$ will correspond to the systematic coordinate of our codeword which is equal to message symbol $m_i$. As such, any edge $(m_{i'}, c_{j(i)})$ with $i' \neq i$ is effectively ignored. Accordingly, define the set of ignored edges
$$\mathcal{E}_{neg} := \{(m_i, c_j) \in \mathcal{E} : c_j \in \tilde{\mathcal{V}},\ j \neq j(i)\}.$$
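Let $\mathbf{A}_{\tilde{\mathcal{E}}}$ be the adjacency matrix of the graph $\tilde{\mathcal{G}} := (\mathcal{M}, \mathcal{V}, \mathcal{E} \setminus \mathcal{E}_{neg})$, which is the graph $\mathcal{G}$ after removing the ignored edges. Note that any code fitting the constraints imposed by $\tilde{\mathcal{G}}$ will automatically fit those of the original graph $\mathcal{G}$. We claim that the number of zeros in any row of $\mathbf{A}_{\tilde{\mathcal{E}}}$ is at most $n - d_{\min}$. Indeed, each message symbol vertex $m_i$ is connected to its matched vertex $c_{j(i)} \in \mathcal{V}^*$ and to all $d_{\min} - 1$ vertices in $\mathcal{B}$, none of which lie in $\tilde{\mathcal{V}}$, so the corresponding row of $\mathbf{A}_{\tilde{\mathcal{E}}}$ must have at least $d_{\min}$ ones.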
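The pruning step can be sketched as follows. Given the neighborhoods and a matching $i \mapsto j(i)$, the code removes every edge into a matched code symbol that does not come from its own message symbol, and returns the resulting $s \times n$ adjacency matrix. The function name, the example graph, and the particular matching (which corresponds to choosing $\mathcal{B} = \{c_1, c_2\}$ in 0-based indexing) are assumptions made for illustration.

```python
def pruned_adjacency(nbrs, n, matching):
    """Adjacency matrix of the graph after removing the ignored edges.

    nbrs[i]    : set of code symbol indices adjacent to message symbol i
    matching[i]: index j(i) of the code symbol matched to message symbol i
    """
    owner = {j: i for i, j in matching.items()}  # matched code symbol -> message
    rows = []
    for i, nb in enumerate(nbrs):
        # Keep edge (m_i, c_j) unless c_j is matched to a different message symbol.
        rows.append([1 if (j in nb and owner.get(j, i) == i) else 0
                     for j in range(n)])
    return rows


# Hypothetical graph with s = 2, n = 4, d_min = 3.
nbrs = [{0, 1, 2, 3}, {1, 2, 3}]
matching = {0: 0, 1: 3}  # m_0 -> c_0, m_1 -> c_3
A_tilde = pruned_adjacency(nbrs, 4, matching)
zeros_per_row = [row.count(0) for row in A_tilde]
```

Here the only ignored edge is $(m_0, c_3)$, and each row has exactly one zero, which matches the bound $n - d_{\min} = 1$.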
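Now we can construct a linear code with a generator matrix $\mathbf{G}$ having the same support set as $\mathbf{A}_{\tilde{\mathcal{E}}}$, and thus meeting the constraints imposed by the graph $\tilde{\mathcal{G}}$ (and therefore $\mathcal{G}$). We will form our code as a linear subcode of a Reed-Solomon code as described in Section 5.4. Select distinct elements $\{\alpha_1, \ldots, \alpha_n\} \subseteq \mathbb{F}_q$, and form the generator matrix $\mathbf{G}_{RS}$ of equation (5.5) for an $[n, k_{\min}]_q$ Reed-Solomon code. To each $m_i \in \mathcal{M}$, associate the polynomial
$$t_i(x) := \prod_{\{j \,:\, [\mathbf{A}_{\tilde{\mathcal{E}}}]_{i,j} = 0\}} (x - \alpha_j).$$
By our above discussion, we have $\deg(t_i(x)) \le n - d_{\min} = k_{\min} - 1$ for each $i$. Now for each $i$, expressing $t_i(x)$ in the form $\sum_{i'=1}^{k_{\min}} t_{i,i'} x^{i'-1}$, we define the coefficient vector
$$\mathbf{t}_i := \frac{1}{t_i(\alpha_{j(i)})} [t_{i,1}, \ldots, t_{i,k_{\min}}],$$
where we have normalized the polynomial’s coefficients so that its evaluation at $\alpha_{j(i)}$ is 1. Then if we stack the vectors $\mathbf{t}_i$ to form the matrix $\mathbf{T}$ as in (5.8), and set
$$\mathbf{G} = \mathbf{T}\mathbf{G}_{RS} = \left[ \frac{t_i(\alpha_j)}{t_i(\alpha_{j(i)})} \right]_{i \in [s],\ j \in [n]}, \tag{5.14}$$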
we see that $\mathbf{G}$ has zeros precisely in the locations of the zeros of $\mathbf{A}_{\tilde{\mathcal{E}}}$, so it is the generator matrix for a linear code $\mathcal{C}$ fitting our constraints. It is in systematic form, since the columns in the indices corresponding to $\{c_{j(i)}\}_{i=1}^{s}$ form a permutation of the columns of the $s \times s$ identity matrix. This also immediately shows that the code has full rank $s$. Finally, the minimum distance $d(\mathcal{C})$ of our code must be at least that of the $[n, k_{\min}]_q$ Reed-Solomon code from which it is derived, thus $d(\mathcal{C}) \ge n - k_{\min} + 1 = d_{\min}$. But by Corollary 6, the reverse inequality also holds, and we see that we must have $d(\mathcal{C}) = d_{\min}$.
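The final step of the construction can be checked numerically on a toy instance. The sketch below builds the entries $t_i(\alpha_j)/t_i(\alpha_{j(i)})$ directly from a pruned adjacency matrix and a matching, then verifies the support pattern, the systematic columns, and the minimum distance by exhaustive search. It is restricted to a prime field so that integer arithmetic modulo $p$ suffices (the theorem itself allows any prime power $q \ge n$); the inputs `adj`, `matching`, `alphas`, and the function names are hypothetical and assumed to have been computed beforehand.

```python
from itertools import product


def build_generator(adj, matching, alphas, p):
    """G[i][j] = t_i(alpha_j) / t_i(alpha_{j(i)}) over GF(p), p assumed prime."""
    s, n = len(adj), len(adj[0])
    G = []
    for i in range(s):
        roots = [alphas[j] for j in range(n) if adj[i][j] == 0]

        def t(x):
            # Evaluate t_i(x) = prod (x - alpha_j) over the zero positions of row i.
            val = 1
            for a in roots:
                val = val * (x - a) % p
            return val

        # Inverse via Fermat's little theorem (valid because p is prime).
        inv = pow(t(alphas[matching[i]]), p - 2, p)
        G.append([t(a) * inv % p for a in alphas])
    return G


def min_distance(G, p):
    """Minimum Hamming weight over all nonzero codewords (exhaustive search)."""
    s, n = len(G), len(G[0])
    best = n
    for m in product(range(p), repeat=s):
        if any(m):
            c = [sum(m[i] * G[i][j] for i in range(s)) % p for j in range(n)]
            best = min(best, sum(1 for x in c if x))
    return best


p = 5                           # prime with p >= n = 4
alphas = [0, 1, 2, 3]           # distinct evaluation points in GF(5)
adj = [[1, 1, 1, 0],            # hypothetical pruned adjacency matrix
       [0, 1, 1, 1]]
matching = {0: 0, 1: 3}         # m_0 -> c_0, m_1 -> c_3
G = build_generator(adj, matching, alphas, p)
support_ok = all((G[i][j] != 0) == (adj[i][j] == 1)
                 for i in range(2) for j in range(4))
d = min_distance(G, p)
```

For this instance the resulting matrix has identity columns in positions 0 and 3, its support agrees with `adj`, and the exhaustive search returns a minimum distance of 3, which equals $d_{\min}$ for the underlying toy graph.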