Chapter V: Coding over Sets for DNA Storage: Substitution Errors
5.5 Codes for a Single Substitution
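\[
\begin{aligned}
&\ge \log\left(\frac{\left(\sqrt{\epsilon/2}\,ML\right)^{k}}{k^{k}}\right) - k\log(k) - O(1)\\
&= k\left(\log\sqrt{\epsilon/2} + \log(ML)\right) - 2k\log(k) - O(1)\\
&\ge k\log(ML) - 2k\log(k) - O(k). \qquad \square
\end{aligned}
\]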
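[Figure 5.1 shows the strings of a codeword as the rows of a matrix with column markers at $L/3$, $2L/3$, and $L$, in three panels. Top panel, ordered by $\mathbf{a}_i$: rows $\mathbf{x}_1,\ldots,\mathbf{x}_M$ contain $\mathbf{a}_1,\ldots,\mathbf{a}_M$, $\mathbf{b}_{\sigma(1)},\ldots,\mathbf{b}_{\sigma(M)}$, and $\mathbf{c}_{\pi(1)},\ldots,\mathbf{c}_{\pi(M)}$; column $L/3$ holds the string $\mathbf{z}_1$, and columns $2L/3$ and $L$ hold scrambled bits. Middle panel, ordered by $\mathbf{b}_i$: rows $\mathbf{x}_{\sigma^{-1}(1)},\ldots,\mathbf{x}_{\sigma^{-1}(M)}$; column $2L/3$ holds the string $\mathbf{z}_2$, and the other two columns hold scrambled bits. Bottom panel, ordered by $\mathbf{c}_i$: rows $\mathbf{x}_{\pi^{-1}(1)},\ldots,\mathbf{x}_{\pi^{-1}(M)}$; column $L$ holds the string $\mathbf{z}_3$, and the other two columns hold scrambled bits.]
Figure 5.1: Illustration of single-substitution correcting codes over unordered sets.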
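\[
\begin{aligned}
F(d_3) &= (\{\mathbf{b}_1,\ldots,\mathbf{b}_M\}, \sigma),\\
F(d_5) &= (\{\mathbf{c}_1,\ldots,\mathbf{c}_M\}, \pi),
\end{aligned}
\tag{5.6}
\]
where $\mathbf{a}_i,\mathbf{b}_i,\mathbf{c}_i \in \{0,1\}^{L/3-1}$ for every $i \in [M]$, the permutations $\sigma$ and $\pi$ are in $S_M$, and the indexing of $\{\mathbf{a}_i\}_{i=1}^{M}$, $\{\mathbf{b}_i\}_{i=1}^{M}$, and $\{\mathbf{c}_i\}_{i=1}^{M}$ is lexicographic. Further, let $\mathbf{d}_2$, $\mathbf{d}_4$, and $\mathbf{d}_6$ be the binary strings that correspond to $d_2$, $d_4$, and $d_6$, respectively, and let
\[
\begin{aligned}
\mathbf{s}_1 &= (\mathbf{a}_1,\ldots,\mathbf{a}_M,\ \mathbf{b}_{\sigma(1)},\ldots,\mathbf{b}_{\sigma(M)},\ \mathbf{c}_{\pi(1)},\ldots,\mathbf{c}_{\pi(M)}),\\
\mathbf{s}_2 &= (\mathbf{a}_{\sigma^{-1}(1)},\ldots,\mathbf{a}_{\sigma^{-1}(M)},\ \mathbf{b}_1,\ldots,\mathbf{b}_M,\ \mathbf{c}_{\sigma^{-1}\pi(1)},\ldots,\mathbf{c}_{\sigma^{-1}\pi(M)}), \text{ and}\\
\mathbf{s}_3 &= (\mathbf{a}_{\pi^{-1}(1)},\ldots,\mathbf{a}_{\pi^{-1}(M)},\ \mathbf{b}_{\pi^{-1}\sigma(1)},\ldots,\mathbf{b}_{\pi^{-1}\sigma(M)},\ \mathbf{c}_1,\ldots,\mathbf{c}_M).
\end{aligned}
\tag{5.7}
\]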
Without loss of generality,$^{4}$ assume that there exists an integer $t$ for which the length $|\mathbf{s}_i| = (L-3)M = 2^t - t - 1$ for all $i \in [3]$. Then, each $\mathbf{s}_i$ can be encoded by using a systematic $[2^t - 1, 2^t - t - 1]_2$ Hamming code, by introducing $t$ redundant bits. That is, the encoding function is of the form $\mathbf{s}_i \mapsto (\mathbf{s}_i, E_H(\mathbf{s}_i))$, where $E_H(\mathbf{s}_i)$ are the $t$ redundant bits, and $t \le \lceil \log(ML) \rceil$. Similarly, we assume that there exists an integer $h$ for which the length $|\mathbf{d}_i| = 2^h - h - 1$ for $i \in \{2,4,6\}$, and let $E_H(\mathbf{d}_i)$ be the corresponding $h$ bits of redundancy that result from encoding $\mathbf{d}_i$ by using a $[2^h - 1, 2^h - h - 1]$ Hamming code. By the properties of a Hamming code, and by the definition of $h$, we have that $h \le \lceil \log(M) \rceil$.
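As an illustration of the map $\mathbf{s} \mapsto (\mathbf{s}, E_H(\mathbf{s}))$, the following is a minimal Python sketch of one standard systematic Hamming encoder with a single-error decoder. It is an assumption made for illustration and not necessarily the exact code intended by the construction: data bit $j$ is labeled with the $j$-th positive integer that is not a power of two, the redundancy is the XOR of the labels of the nonzero data bits (returned as an integer whose binary digits are the redundant bits), and the names `data_labels`, `hamming_parity`, and `hamming_decode` are hypothetical.

```python
def data_labels(n_data):
    """Labels 3, 5, 6, 7, 9, ...: positive integers that are not powers of two."""
    labels, v = [], 1
    while len(labels) < n_data:
        if v & (v - 1):  # v has at least two set bits, so it is not a power of two
            labels.append(v)
        v += 1
    return labels


def hamming_parity(bits):
    """Redundancy of `bits` as an integer: XOR of the labels of the set data bits."""
    parity = 0
    for label, bit in zip(data_labels(len(bits)), bits):
        if bit:
            parity ^= label
    return parity


def hamming_decode(bits, parity):
    """Return the data bits, assuming at most one flipped bit in (bits, parity)."""
    syndrome = hamming_parity(bits) ^ parity
    fixed = list(bits)
    # A zero syndrome means no error; a power of two means that a redundancy
    # bit was flipped. In both cases the data bits are already correct.
    if syndrome & (syndrome - 1):
        fixed[data_labels(len(bits)).index(syndrome)] ^= 1
    return fixed


bits = [1, 0, 1, 1]
parity = hamming_parity(bits)
corrupted = [1, 0, 0, 1]          # third data bit flipped
print(hamming_decode(corrupted, parity) == bits)
```

In this sketch, an input shorter than $2^t - t - 1$ simply uses fewer labels, which has the same effect as padding with zeros.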
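The data $d \in D$ is mapped to a codeword $C = \{\mathbf{x}_1,\ldots,\mathbf{x}_M\}$ as follows, and the reader is encouraged to refer to Figure 5.1 for clarifications. First, we place $\{\mathbf{a}_i\}_{i=1}^{M}$, $\{\mathbf{b}_i\}_{i=1}^{M}$, and $\{\mathbf{c}_i\}_{i=1}^{M}$ in the different thirds of the $\mathbf{x}_i$'s, sorted by $\sigma$ and $\pi$. That is, denoting $\mathbf{x}_i = (x_{i,1},\ldots,x_{i,L})$, we define
\[
\begin{aligned}
(x_{i,1},\ldots,x_{i,L/3-1}) &= \mathbf{a}_i,\\
(x_{i,L/3+1},\ldots,x_{i,2L/3-1}) &= \mathbf{b}_{\sigma(i)}, \text{ and}\\
(x_{i,2L/3+1},\ldots,x_{i,L-1}) &= \mathbf{c}_{\pi(i)}.
\end{aligned}
\tag{5.8}
\]
The remaining bits $\{x_{i,L/3}\}_{i=1}^{M}$, $\{x_{i,2L/3}\}_{i=1}^{M}$, and $\{x_{i,L}\}_{i=1}^{M}$ are used to accommodate the information bits of $\mathbf{d}_2$, $\mathbf{d}_4$, $\mathbf{d}_6$, and the redundancy bits $\{E_H(\mathbf{s}_i)\}_{i=1}^{3}$ and $\{E_H(\mathbf{d}_i)\}_{i\in\{2,4,6\}}$, in the following manner.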
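\[
x_{i,L/3} =
\begin{cases}
d_{2,i}, & \text{if } i \in [M - \lceil\log ML\rceil - \lceil\log M\rceil],\\
E_H(\mathbf{d}_2)_{i-(M-\lceil\log ML\rceil-\lceil\log M\rceil)}, & \text{if } i \in [M - \lceil\log ML\rceil - \lceil\log M\rceil + 1,\ M - \lceil\log ML\rceil],\\
E_H(\mathbf{s}_1)_{i-(M-\lceil\log ML\rceil)}, & \text{if } i \in [M - \lceil\log ML\rceil + 1,\ M],
\end{cases}
\]
\[
x_{i,2L/3} =
\begin{cases}
d_{4,\sigma^{-1}(i)}, & \text{if } \sigma^{-1}(i) \in [M - \lceil\log ML\rceil - \lceil\log M\rceil],\\
E_H(\mathbf{d}_4)_{\sigma^{-1}(i)-(M-\lceil\log ML\rceil-\lceil\log M\rceil)}, & \text{if } \sigma^{-1}(i) \in [M - \lceil\log ML\rceil - \lceil\log M\rceil + 1,\ M - \lceil\log ML\rceil],\\
E_H(\mathbf{s}_2)_{\sigma^{-1}(i)-(M-\lceil\log ML\rceil)}, & \text{if } \sigma^{-1}(i) \in [M - \lceil\log ML\rceil + 1,\ M],
\end{cases}
\]
\[
x_{i,L} =
\begin{cases}
d_{6,\pi^{-1}(i)}, & \text{if } \pi^{-1}(i) \in [M - \lceil\log ML\rceil - \lceil\log M\rceil],\\
E_H(\mathbf{d}_6)_{\pi^{-1}(i)-(M-\lceil\log ML\rceil-\lceil\log M\rceil)}, & \text{if } \pi^{-1}(i) \in [M - \lceil\log ML\rceil - \lceil\log M\rceil + 1,\ M - \lceil\log ML\rceil],\\
E_H(\mathbf{s}_3)_{\pi^{-1}(i)-(M-\lceil\log ML\rceil)}, & \text{if } \pi^{-1}(i) \in [M - \lceil\log ML\rceil + 1,\ M].
\end{cases}
\tag{5.9}
\]

$^{4}$ Every string can be padded with zeros to extend its length to $2^t - t - 1$ for some $t$. It is readily verified that this operation extends the string by at most a factor of two, and by the properties of the Hamming code, this increases the number of redundant bits by at most 1.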
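That is, if the strings $\{\mathbf{x}_i\}_{i=1}^{M}$ are sorted according to the content of the bits $(x_{i,1},\ldots,x_{i,L/3-1}) = \mathbf{a}_i$, then the top $M - \lceil\log ML\rceil - \lceil\log M\rceil$ bits of the $(L/3)$-th column$^{5}$ contain $\mathbf{d}_2$, the middle $\lceil\log M\rceil$ bits contain $E_H(\mathbf{d}_2)$, and the bottom $\lceil\log ML\rceil$ bits contain $E_H(\mathbf{s}_1)$. Similarly, if the strings are sorted according to $(x_{i,L/3+1},\ldots,x_{i,2L/3-1}) = \mathbf{b}_{\sigma(i)}$, then the top $M - \lceil\log ML\rceil - \lceil\log M\rceil$ bits of the $(2L/3)$-th column contain $\mathbf{d}_4$, the middle $\lceil\log M\rceil$ bits contain $E_H(\mathbf{d}_4)$, and the bottom $\lceil\log ML\rceil$ bits contain $E_H(\mathbf{s}_2)$, and so on. Equations (5.8) and (5.9) complete the definition of the encoding function $E$ of Theorem 5.5.1. It can be readily verified that $E$ is injective, since different messages result in either different $(\{\mathbf{a}_i\}_{i=1}^{M},\{\mathbf{b}_i\}_{i=1}^{M},\{\mathbf{c}_i\}_{i=1}^{M})$ or the same $(\{\mathbf{a}_i\}_{i=1}^{M},\{\mathbf{b}_i\}_{i=1}^{M},\{\mathbf{c}_i\}_{i=1}^{M})$ with different $(\mathbf{d}_2,\mathbf{d}_4,\mathbf{d}_6)$. In either case, the resulting codewords $\{\mathbf{x}_i\}_{i=1}^{M}$ of the two messages are different.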
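The placement in (5.8), together with the sorted-column reading of the three extra columns, can be sketched in Python as follows. This is a toy illustration under stated assumptions: indices are 0-based, strings are Python `str` objects over `"0"`/`"1"`, and the three extra columns are passed in as ready-made bit strings `col1`, `col2`, `col3` (hypothetical names; in the construction they would hold the bits of $\mathbf{d}_2$, $E_H(\mathbf{d}_2)$, $E_H(\mathbf{s}_1)$, and so on). The bits are placed so that each column reads as intended once the strings are sorted by the corresponding third.

```python
def place_strings(a, b, c, sigma, pi, col1, col2, col3):
    """Build the unordered codeword from sorted lists a, b, c of distinct strings.

    sigma and pi are permutations of range(M) given as lists.
    """
    codeword = set()
    for i in range(len(a)):
        # Row i carries a_i, b_sigma(i), c_pi(i). Its rank is i when sorting by
        # the a-part, sigma[i] when sorting by the b-part, and pi[i] when
        # sorting by the c-part, which fixes the bit it receives in each column.
        codeword.add(a[i] + col1[i]
                     + b[sigma[i]] + col2[sigma[i]]
                     + c[pi[i]] + col3[pi[i]])
    return codeword


def read_column(codeword, part, column):
    """Sort the strings by the slice `part` and read off position `column`."""
    rows = sorted(codeword, key=lambda x: x[part])
    return "".join(x[column] for x in rows)


# Toy parameters: M = 3 strings of length L = 9, so each third has 2 + 1 bits.
a, b, c = ["00", "01", "11"], ["00", "10", "11"], ["01", "10", "11"]
C = place_strings(a, b, c, [2, 0, 1], [1, 2, 0], "101", "011", "110")
print(read_column(C, slice(0, 2), 2))   # column L/3 after sorting by the a-part
print(read_column(C, slice(3, 5), 5))   # column 2L/3 after sorting by the b-part
print(read_column(C, slice(6, 8), 8))   # column L after sorting by the c-part
```

Because the result is a `set`, no information is carried by the order of the strings; the columns are recovered purely from the sorted views.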
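To verify that the image of $E$ is a 1-substitution code, observe first that since $\{\mathbf{a}_i\}_{i=1}^{M}$, $\{\mathbf{b}_i\}_{i=1}^{M}$, and $\{\mathbf{c}_i\}_{i=1}^{M}$ are sets, any two strings in the same set are distinct. Hence, according to (5.8), any two distinct strings $\mathbf{x}_i$ and $\mathbf{x}_j$ differ in at least one position in each of their three thirds, and it follows that $d_H(\mathbf{x}_i,\mathbf{x}_j) \ge 3$ for every distinct $i$ and $j$ in $[M]$. Therefore, no 1-substitution error can cause one $\mathbf{x}_i$ to be equal to another, and consequently, the result of a 1-substitution error is always in $\binom{\{0,1\}^L}{M}$, i.e., it is again a set of $M$ distinct strings.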
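The distance property can be checked numerically on a small instance. The sketch below assumes a toy codeword with $M = 3$ and $L = 9$ whose thirds hold pairwise distinct strings (the specific strings and the helper names `hamming` and `min_distance` are illustrative, not taken from the text).

```python
from itertools import combinations


def hamming(x, y):
    """Number of positions in which two equal-length strings differ."""
    return sum(p != q for p, q in zip(x, y))


def min_distance(codeword):
    """Smallest Hamming distance between two distinct strings of the set."""
    return min(hamming(x, y) for x, y in combinations(codeword, 2))


# Each string is (a-part, bit, b-part, bit, c-part, bit); the a-, b- and
# c-parts are pairwise distinct across the three strings.
toy_codeword = {"001111101", "010000110", "111101011"}
print(min_distance(toy_codeword) >= 3)
```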
$^{5}$ Sorting the strings $\{\mathbf{x}_i\}_{i=1}^{M}$ by any ordering method yields a matrix in a natural way, so one can consider the columns of this matrix.
In what follows, a decoding algorithm is presented whose input is a codeword that was distorted by at most a single substitution, and whose output is $d$. The algorithm is summarized in Algorithm 3.
Algorithm 3: Decoding
Input: A word $C' \in \mathcal{B}_1(C)$ for some codeword $C$.
Output: The message $d$ encoded as $C$.
Sort and index the strings in $C' = \{\mathbf{x}'_1,\ldots,\mathbf{x}'_M\}$ lexicographically;
Compute the strings $\hat{\mathbf{a}}_i$, $\hat{\mathbf{b}}_i$, and $\hat{\mathbf{c}}_i$ for $i \in [M]$, according to (5.10);
Compute the strings $\mathbf{s}'_1$, $\mathbf{s}'_2$, and $\mathbf{s}'_3$ according to (5.12);
Compute the strings $E_H(\mathbf{s}_1)'$, $E_H(\mathbf{s}_2)'$, and $E_H(\mathbf{s}_3)'$ according to (5.11);
Use a Hamming decoder to decode $(\mathbf{s}'_i, E_H(\mathbf{s}_i)')$ and obtain a candidate for $\mathbf{s}_i$, for $i \in [3]$;
According to Lemma 5.5.1, apply a majority vote on the recovered candidates for $\{\mathbf{s}_i\}_{i=1}^{3}$ to obtain the correct strings $\{\mathbf{a}_i\}_{i=1}^{M}$, $\{\mathbf{b}_i\}_{i=1}^{M}$, and $\{\mathbf{c}_i\}_{i=1}^{M}$, and the permutations $\sigma$ and $\pi$. Then determine $d_1$, $d_3$, $d_5$ using the combinatorial map (5.6);
Compute $(\mathbf{d}'_i, E_H(\mathbf{d}_i)')$ for $i \in \{2,4,6\}$ according to (5.11), and use a Hamming decoder to decode $(\mathbf{d}'_i, E_H(\mathbf{d}_i)')$ and obtain $\mathbf{d}_i$ for $i \in \{2,4,6\}$;
Output $d = (d_1, d_2, d_3, d_4, d_5, d_6)$.
Upon receiving a word $C' = \{\mathbf{x}'_1,\ldots,\mathbf{x}'_M\} \in \mathcal{B}_1(C)$ for some codeword $C$ (once again, the indexing of the elements of $C'$ is lexicographic), we define
\[
\begin{aligned}
\hat{\mathbf{a}}_i &= (x'_{i,1},\ldots,x'_{i,L/3-1}),\\
\hat{\mathbf{b}}_i &= (x'_{\tau^{-1}(i),L/3+1},\ldots,x'_{\tau^{-1}(i),2L/3-1}),\\
\hat{\mathbf{c}}_i &= (x'_{\rho^{-1}(i),2L/3+1},\ldots,x'_{\rho^{-1}(i),L-1}),
\end{aligned}
\tag{5.10}
\]
where $\tau$ is the permutation by which $\{\mathbf{x}'_i\}_{i=1}^{M}$ are sorted according to their $L/3+1,\ldots,2L/3-1$ entries, and $\rho$ is the permutation by which they are sorted according to their $2L/3+1,\ldots,L-1$ entries (we emphasize that $\tau$ and $\rho$ are unrelated to the original $\pi$ and $\sigma$, and those will be decoded later). Further, when ordering $\{\mathbf{x}'_i\}_{i=1}^{M}$ by either the lexicographic ordering, by $\tau$, or by $\rho$, we obtain candidates for each one of $\mathbf{d}_2$, $\mathbf{d}_4$, $\mathbf{d}_6$, $E_H(\mathbf{d}_2)$, $E_H(\mathbf{d}_4)$, $E_H(\mathbf{d}_6)$, $E_H(\mathbf{s}_1)$, $E_H(\mathbf{s}_2)$, and $E_H(\mathbf{s}_3)$, that we similarly denote with an additional apostrophe,$^{6}$ as follows for $i \in [M]$.
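\[
\begin{aligned}
d'_{2,i} &= x'_{i,L/3}, && \text{for } i \in [M - \lceil\log ML\rceil - \lceil\log M\rceil],\\
E_H(\mathbf{d}_2)'_i &= x'_{i+(M-\lceil\log ML\rceil-\lceil\log M\rceil),\,L/3}, && \text{for } i \in [\lceil\log M\rceil],\\
E_H(\mathbf{s}_1)'_i &= x'_{i+(M-\lceil\log ML\rceil),\,L/3}, && \text{for } i \in [\lceil\log ML\rceil],\\
d'_{4,i} &= x'_{\tau(i),\,2L/3}, && \text{for } i \in [M - \lceil\log ML\rceil - \lceil\log M\rceil],\\
E_H(\mathbf{d}_4)'_i &= x'_{\tau(i+(M-\lceil\log ML\rceil-\lceil\log M\rceil)),\,2L/3}, && \text{for } i \in [\lceil\log M\rceil],\\
E_H(\mathbf{s}_2)'_i &= x'_{\tau(i+(M-\lceil\log ML\rceil)),\,2L/3}, && \text{for } i \in [\lceil\log ML\rceil],\\
d'_{6,i} &= x'_{\rho(i),\,L}, && \text{for } i \in [M - \lceil\log ML\rceil - \lceil\log M\rceil],\\
E_H(\mathbf{d}_6)'_i &= x'_{\rho(i+(M-\lceil\log ML\rceil-\lceil\log M\rceil)),\,L}, && \text{for } i \in [\lceil\log M\rceil],\\
E_H(\mathbf{s}_3)'_i &= x'_{\rho(i+(M-\lceil\log ML\rceil)),\,L}, && \text{for } i \in [\lceil\log ML\rceil].
\end{aligned}
\tag{5.11}
\]

$^{6}$ That is, each one of $\mathbf{d}'_2$, $\mathbf{d}'_4$, etc., is obtained from $\mathbf{d}_2$, $\mathbf{d}_4$, etc., by at most a single substitution.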
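The slicing in (5.11) amounts to cutting one column into a top, a middle, and a bottom segment. Here is a minimal Python sketch, assuming the column has already been read off in the relevant order (lexicographic, $\tau$, or $\rho$), that logarithms are base 2, and that $M > \lceil\log ML\rceil + \lceil\log M\rceil$; the function name `split_column` is hypothetical.

```python
import math


def split_column(column, M, L):
    """Split a length-M column into (d', E_H(d)', E_H(s)') as in (5.11)."""
    n_s = math.ceil(math.log2(M * L))   # bottom segment: candidate for E_H(s_k)
    n_d = math.ceil(math.log2(M))       # middle segment: candidate for E_H(d_2k)
    top = M - n_s - n_d                 # top segment: candidate for d_2k
    return column[:top], column[top:top + n_d], column[top + n_d:]


# Toy parameters M = 16, L = 36: the segments have 2, 4 and 10 bits.
d_part, ed_part, es_part = split_column("1100110011001100", M=16, L=36)
print(len(d_part), len(ed_part), len(es_part))
```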
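For example, if we order $\{\mathbf{x}'_i\}_{i=1}^{M}$ according to $\tau$, then the bottom $\lceil\log(ML)\rceil$ bits of the $(2L/3)$-th column are $E_H(\mathbf{s}_2)'$, the middle $\lceil\log M\rceil$ bits are $E_H(\mathbf{d}_4)'$, and the top $M - \lceil\log ML\rceil - \lceil\log M\rceil$ bits are $\mathbf{d}'_4$ (see Eq. (5.9)). Now, let
\[
\begin{aligned}
\mathbf{s}'_1 &= (\hat{\mathbf{a}}_1,\ldots,\hat{\mathbf{a}}_M,\ \hat{\mathbf{b}}_{\tau(1)},\ldots,\hat{\mathbf{b}}_{\tau(M)},\ \hat{\mathbf{c}}_{\rho(1)},\ldots,\hat{\mathbf{c}}_{\rho(M)}),\\
\mathbf{s}'_2 &= (\hat{\mathbf{a}}_{\tau^{-1}(1)},\ldots,\hat{\mathbf{a}}_{\tau^{-1}(M)},\ \hat{\mathbf{b}}_1,\ldots,\hat{\mathbf{b}}_M,\ \hat{\mathbf{c}}_{\tau^{-1}\rho(1)},\ldots,\hat{\mathbf{c}}_{\tau^{-1}\rho(M)}), \text{ and}\\
\mathbf{s}'_3 &= (\hat{\mathbf{a}}_{\rho^{-1}(1)},\ldots,\hat{\mathbf{a}}_{\rho^{-1}(M)},\ \hat{\mathbf{b}}_{\rho^{-1}\tau(1)},\ldots,\hat{\mathbf{b}}_{\rho^{-1}\tau(M)},\ \hat{\mathbf{c}}_1,\ldots,\hat{\mathbf{c}}_M).
\end{aligned}
\tag{5.12}
\]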
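Equivalently, each $\mathbf{s}'_k$ is obtained by sorting the received strings by one of their thirds and then concatenating all first thirds, all second thirds, and all last thirds, skipping columns $L/3$, $2L/3$, and $L$. The Python sketch below illustrates this reading under a few assumptions: indices are 0-based, $L$ is divisible by 3, ties in the sort key are broken by the full string (a choice made here only for determinism), and the name `candidate_strings` is hypothetical.

```python
def candidate_strings(received, L):
    """Return [s'_1, s'_2, s'_3] for a received set of length-L bit strings."""
    third = L // 3
    # 0-based slices of the a-, b- and c-parts; the extra columns are skipped.
    parts = [slice(0, third - 1),
             slice(third, 2 * third - 1),
             slice(2 * third, L - 1)]
    result = []
    for key in parts:
        # Order the rows by one part, breaking ties by the whole string.
        rows = sorted(received, key=lambda x: (x[key], x))
        # Vectorize part by part: all a-parts, then all b-parts, then all c-parts.
        result.append("".join(x[p] for p in parts for x in rows))
    return result


sent = {"001111101", "010000110", "111101011"}
received = {"101111101", "010000110", "111101011"}   # one flipped bit in an a-part
for s, s_prime in zip(candidate_strings(sent, 9), candidate_strings(received, 9)):
    print(sum(p != q for p, q in zip(s, s_prime)))
```

In this toy run the error lies in a first third, so the ordering by that third is disturbed while the other two orderings are not.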
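The following lemma shows that at least two of the above $\mathbf{s}'_i$, together with their received redundancies, are close in Hamming distance to their encoded counterparts $(\mathbf{s}_i, E_H(\mathbf{s}_i))$.
Lemma 5.5.1. There exist distinct integers $k, \ell \in [3]$ such that $d_H((\mathbf{s}'_k, E_H(\mathbf{s}_k)'), (\mathbf{s}_k, E_H(\mathbf{s}_k))) \le 1$ and $d_H((\mathbf{s}'_\ell, E_H(\mathbf{s}_\ell)'), (\mathbf{s}_\ell, E_H(\mathbf{s}_\ell))) \le 1$.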
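Proof. If the substitution did not occur in any of the index sets $\{1,\ldots,L/3-1\}$, $\{L/3+1,\ldots,2L/3-1\}$, or $\{2L/3+1,\ldots,L-1\}$ (which correspond to the values of the strings $\{\mathbf{a}_i\}_{i=1}^{M}$, $\{\mathbf{b}_i\}_{i=1}^{M}$, and $\{\mathbf{c}_i\}_{i=1}^{M}$, respectively), then the orders among the strings $\{\mathbf{a}_i\}_{i=1}^{M}$, $\{\mathbf{b}_i\}_{i=1}^{M}$, and $\{\mathbf{c}_i\}_{i=1}^{M}$ are maintained, respectively. That is, we have that $\tau = \sigma$ and $\rho = \pi$. This implies that
\[
\begin{aligned}
\mathbf{s}'_1 &= (\mathbf{a}_1,\ldots,\mathbf{a}_M,\ \mathbf{b}_{\sigma(1)},\ldots,\mathbf{b}_{\sigma(M)},\ \mathbf{c}_{\pi(1)},\ldots,\mathbf{c}_{\pi(M)}),\\
\mathbf{s}'_2 &= (\mathbf{a}_{\sigma^{-1}(1)},\ldots,\mathbf{a}_{\sigma^{-1}(M)},\ \mathbf{b}_1,\ldots,\mathbf{b}_M,\ \mathbf{c}_{\sigma^{-1}\pi(1)},\ldots,\mathbf{c}_{\sigma^{-1}\pi(M)}),\\
\mathbf{s}'_3 &= (\mathbf{a}_{\pi^{-1}(1)},\ldots,\mathbf{a}_{\pi^{-1}(M)},\ \mathbf{b}_{\pi^{-1}\sigma(1)},\ldots,\mathbf{b}_{\pi^{-1}\sigma(M)},\ \mathbf{c}_1,\ldots,\mathbf{c}_M),
\end{aligned}
\]
and that $d_H(E_H(\mathbf{s}_i), E_H(\mathbf{s}_i)') \le 1$ for $i \in [3]$, according to (5.9) and (5.11). In this case, the claim is clear. It remains to show the other cases, and due to symmetry, assume without loss of generality that the substitution occurred in one of the $\mathbf{a}_i$'s, i.e., in an entry which is indexed by an integer in $[L/3-1]$.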
Let $A \in \{0,1\}^{M\times L}$ be a matrix whose rows are the $\mathbf{x}_i$'s, in any order. Let $A_{\text{left}}$ be the result of ordering the rows of $A$ according to the lexicographic order of their $1,\ldots,L/3-1$ entries. Similarly, let $A_{\text{mid}}$ and $A_{\text{right}}$ be the results of ordering the rows of $A$ by their $L/3+1,\ldots,2L/3-1$ and $2L/3+1,\ldots,L-1$ entries, respectively, and let $A'_{\text{left}}$, $A'_{\text{mid}}$, and $A'_{\text{right}}$ be defined analogously with $\{\mathbf{x}'_i\}_{i=1}^{M}$ instead of $\{\mathbf{x}_i\}_{i=1}^{M}$.
It is readily verified that there exist permutation matrices $P_1$ and $P_2$ such that $A_{\text{mid}} = P_1 A_{\text{left}}$ and $A_{\text{right}} = P_2 A_{\text{left}}$. Moreover, since $\{\mathbf{b}_i\}_{i=1}^{M} = \{\hat{\mathbf{b}}_i\}_{i=1}^{M}$ and $\{\mathbf{c}_i\}_{i=1}^{M} = \{\hat{\mathbf{c}}_i\}_{i=1}^{M}$, it follows that $A'_{\text{mid}} = P_1(A_{\text{left}} + R)$ and $A'_{\text{right}} = P_2(A_{\text{left}} + R)$, where $R \in \{0,1\}^{M\times L}$ is a matrix of Hamming weight 1; this clearly implies that $A'_{\text{mid}} = A_{\text{mid}} + P_1 R$ and that $A'_{\text{right}} = A_{\text{right}} + P_2 R$. Now, notice that $\mathbf{s}_2$ results from vectorizing some submatrix $M_2$ of $A_{\text{mid}}$, and $\mathbf{s}'_2$ results from vectorizing some submatrix $M'_2$ of $A'_{\text{mid}}$. Moreover, the matrices $M_2$ and $M'_2$ are taken from their parent matrices by omitting the same rows and columns, and both vectorizing operations consider the entries of $M_2$ and $M'_2$ in the same order. In addition, no substitution occurs in the $L/3,\ldots,L$ entries of the $\mathbf{x}_i$'s, which implies that $x'_{\tau(i),2L/3} = x_{\pi(i),2L/3}$. Then, the redundancies $E_H(\mathbf{s}_2)' = E_H(\mathbf{s}_2)$ and $E_H(\mathbf{s}_3)' = E_H(\mathbf{s}_3)$ can be identified from (5.11).
Therefore, it follows from $A'_{\text{mid}} = A_{\text{mid}} + P_1 R$ that $d_H((\mathbf{s}'_2, E_H(\mathbf{s}_2)'), (\mathbf{s}_2, E_H(\mathbf{s}_2))) \le 1$. The claim for $\mathbf{s}_3$ is similar. $\square$
By applying a Hamming decoder to each of the $\mathbf{s}'_i$'s, the decoder obtains possible candidates for $\{\mathbf{a}_i\}_{i=1}^{M}$, $\{\mathbf{b}_i\}_{i=1}^{M}$, and $\{\mathbf{c}_i\}_{i=1}^{M}$, and by Lemma 5.5.1, it follows that these sets of candidates will coincide in at least two cases. Therefore, the decoder can apply a majority vote on the candidates from the decoding of each $\mathbf{s}'_i$, and the winning values are $\{\mathbf{a}_i\}_{i=1}^{M}$, $\{\mathbf{b}_i\}_{i=1}^{M}$, and $\{\mathbf{c}_i\}_{i=1}^{M}$. Having these correct values, the decoder can sort $\{\mathbf{x}'_i\}_{i=1}^{M}$ according to their $\mathbf{a}_i$ columns, and deduce the values of $\sigma$ and $\pi$ by observing the resulting permutation in the $\mathbf{b}_i$ and $\mathbf{c}_i$ columns, with respect to their lexicographic ordering. This concludes the decoding of the values $d_1$, $d_3$, and $d_5$ of the data $d$.
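The majority vote and the recovery of $\sigma$ and $\pi$ can be sketched as follows. The sketch assumes that the three Hamming-decoded candidates are given as bit strings laid out as in (5.7), compares them in an order-free form (the set of triples formed by the three thirds of each row), and uses 0-based permutations; the function names are hypothetical.

```python
from collections import Counter


def rows_from_s(s, M):
    """Turn a candidate string into the set of its (a, b, c) rows."""
    w = len(s) // (3 * M)                       # length of each a-, b-, c-string
    blocks = [s[j * M * w:(j + 1) * M * w] for j in range(3)]
    cols = [[blk[i * w:(i + 1) * w] for i in range(M)] for blk in blocks]
    return frozenset(zip(*cols))


def majority_rows(decoded, M):
    """Return the rows on which at least two candidates agree, sorted by a-part."""
    counts = Counter(rows_from_s(s, M) for s in decoded)
    rows, votes = counts.most_common(1)[0]
    if votes < 2:
        raise ValueError("no two candidates agree")
    return sorted(rows)


def permutations_from_rows(rows):
    """Read sigma and pi off rows sorted by a-part: row i holds b_sigma(i), c_pi(i)."""
    b_sorted = sorted(r[1] for r in rows)
    c_sorted = sorted(r[2] for r in rows)
    return ([b_sorted.index(r[1]) for r in rows],
            [c_sorted.index(r[2]) for r in rows])


candidates = ["000111110010101101", "011100001011110110", "0" * 18]
rows = majority_rows(candidates, M=3)
print(permutations_from_rows(rows))
```

Comparing sets of rows rather than the raw strings is a design choice of this sketch: the three candidates list the same rows in different orders, so they cannot be compared position by position.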
We are left to extract $d_2$, $d_4$, and $d_6$. To this end, observe that since the correct values of $\{\mathbf{a}_i\}_{i=1}^{M}$, $\{\mathbf{b}_i\}_{i=1}^{M}$, and $\{\mathbf{c}_i\}_{i=1}^{M}$ are known at this point, the decoder can extract the true positions of $\mathbf{d}_2$, $\mathbf{d}_4$, and $\mathbf{d}_6$, as well as their respective redundancy bits $E_H(\mathbf{d}_2)$, $E_H(\mathbf{d}_4)$, $E_H(\mathbf{d}_6)$. Hence, we have that
\[
d_H((\mathbf{d}'_i, E_H(\mathbf{d}_i)'), (\mathbf{d}_i, E_H(\mathbf{d}_i))) \le 1
\]
for $i \in \{2,4,6\}$, and thus the decoding algorithm is completed by applying a Hamming decoder.
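We now turn to compute the redundancy of the above code $\mathcal{C}$. Note that there are two sources of redundancy: the Hamming code redundancy, which is at most $3(\log ML + \log M + 2)$, and the fact that the sets $\{\mathbf{a}_i\}_{i=1}^{M}$, $\{\mathbf{b}_i\}_{i=1}^{M}$, and $\{\mathbf{c}_i\}_{i=1}^{M}$ contain distinct strings. By a straightforward computation, for $4 \le M \le 2^{L/6}$ we have
\[
\begin{aligned}
r(\mathcal{C}) &= \log\binom{2^L}{M} - \log\left(\binom{2^{L/3-1}}{M}^{3} \cdot (M!)^2 \cdot 2^{3(M - \log ML - \log M - 2)}\right)\\
&= \log\prod_{i=0}^{M-1}(2^L - i) - \log\prod_{i=0}^{M-1}(2^{L/3-1} - i)^3 - 3M + 3\log ML + 3\log M + 6\\
&= \log\prod_{i=0}^{M-1}\frac{2^L - i}{(2^{L/3} - 2i)^3} + 3\log ML + 3\log M + 6\\
&\le 3M\log\frac{2^{L/3}}{2^{L/3} - 2M} + 3\log ML + 3\log M + 6\\
&\overset{(a)}{\le} 12\log e + 3\log ML + 3\log M + 6,
\end{aligned}
\tag{5.13}
\]
where inequality $(a)$ is derived in Appendix 5.8.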
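As a numerical sanity check of (5.13), the first line can be evaluated exactly for small parameters and compared with the final bound. The sketch below assumes base-2 logarithms, $L$ divisible by 3, and non-rounded values of $\log ML$ and $\log M$, exactly as they appear in the displayed formula; the function names are hypothetical.

```python
import math


def redundancy(L, M):
    """Evaluate r(C) from the first line of (5.13), in bits."""
    log_all = math.log2(math.comb(2 ** L, M))
    log_third = math.log2(math.comb(2 ** (L // 3 - 1), M))
    log_fact = math.log2(math.factorial(M))
    free_bits = 3 * (M - math.log2(M * L) - math.log2(M) - 2)
    return log_all - (3 * log_third + 2 * log_fact + free_bits)


def upper_bound(L, M):
    """Right-hand side of the last line of (5.13)."""
    return 12 * math.log2(math.e) + 3 * math.log2(M * L) + 3 * math.log2(M) + 6


L, M = 36, 16                      # satisfies 4 <= M <= 2**(L/6) = 64
print(redundancy(L, M), upper_bound(L, M))
```

For these parameters the exact value stays below the bound, and almost all of it comes from the $3\log ML + 3\log M + 6$ term.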
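For the case when $M < \log ML + \log M$, we generate $\{\mathbf{a}_i\}_{i=1}^{M}$, $\{\mathbf{b}_i\}_{i=1}^{M}$, and $\{\mathbf{c}_i\}_{i=1}^{M}$ with length $L/3 - \lceil\frac{\log ML + \log M}{M}\rceil$. As a result, in each part we have $\lceil\frac{\log ML + \log M}{M}\rceil$ bits $x_{i,j}$, $i \in [M]$, $j \in \{L/3 - \lceil\frac{\log ML + \log M}{M}\rceil + 1,\ldots,L/3\} \cup \{2L/3 - \lceil\frac{\log ML + \log M}{M}\rceil + 1,\ldots,2L/3\} \cup \{L - \lceil\frac{\log ML + \log M}{M}\rceil + 1,\ldots,L\}$, to accommodate the information bits $\mathbf{d}_2$, $\mathbf{d}_4$, $\mathbf{d}_6$ and the redundancy bits $\{E_H(\mathbf{s}_i)\}_{i=1}^{3}$ and $\{E_H(\mathbf{d}_i)\}_{i\in\{2,4,6\}}$.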
Remark 5.5.1. The above construction is valid whenever $M \le 2^{L/3-1}$. However, an asymptotically optimal amount of redundancy is achieved for $M \le 2^{L/6}$.
Remark 5.5.2. In this construction, the separate storage of the Hamming code redundancies $E_H(\mathbf{d}_2)$, $E_H(\mathbf{d}_4)$, and $E_H(\mathbf{d}_6)$ is not necessary. Instead, storing $E_H(\mathbf{d}_2,\mathbf{d}_4,\mathbf{d}_6)$ is sufficient, since the true position of those can be inferred after $\{\mathbf{a}_i\}_{i=1}^{M}$, $\{\mathbf{b}_i\}_{i=1}^{M}$, and $\{\mathbf{c}_i\}_{i=1}^{M}$ were successfully decoded. This approach results in redundancy of $3\log ML + \log 3M + O(1)$, and a similar approach can be utilized in the next section as well.