• Tidak ada hasil yang ditemukan

Chapter V: Coding over Sets for DNA Storage: Substitution Errors

5.4 Bounds

In this section we use sphere packing arguments in order to establish an existence result of codes with low redundancy, and a lower bound on the redundancy of any๐‘˜- substitution code. The latter bound demonstrates the asymptotic optimality of the construction in Sec.5.5for๐‘˜ =1, up to constants. Our techniques rely on upper and lower bounds on the size of the ballB๐‘˜ (Definition5.2.1), which are given below.

However, since our measure for distance is not a metric, extra care is needed when applying sphere-packing arguments. We begin with the existential upper bound in Subsection5.4, continue to provide a lower bound for๐‘˜ =1 in Subsection5.4, and extend this bound to larger values of๐‘˜ in Subsection5.4.

Existential upper bound

In this subsection, let๐‘˜,๐‘€, and๐ฟbe positive integers such that๐‘˜ โ‰ค ๐‘€ ๐ฟand๐‘€ โ‰ค 2๐ฟ. The subsequent series of lemmas will eventually lead to the following upper bound.

Theorem 5.4.1. There exists a ๐‘˜-substitution code C โІ {0,1}๐ฟ

๐‘€

such that๐‘Ÿ(C) โ‰ค 2๐‘˜log(๐‘€ ๐ฟ) +2.

We begin with a simple upper bound on the size of the ballB๐‘˜. Lemma 5.4.1. For every word๐‘Š ={x๐‘–}๐‘€

๐‘–=1 โˆˆ {0,1}

๐ฟ

๐‘€

and every positive integer๐‘˜ โ‰ค ๐‘€ ๐ฟ, we have that|B๐‘˜(๐‘Š) | โ‰คร๐‘˜

โ„“=0 ๐‘€ ๐ฟ

โ„“

.

Proof. Every word inB๐‘˜(๐‘Š) is obtained by flipping the bits inx๐‘– that are indexed by some ๐ฝ๐‘– โІ [๐ฟ], for every ๐‘– โˆˆ [๐‘€], where ร๐‘€

๐‘–=1|๐ฝ๐‘–| โ‰ค ๐‘˜. Clearly, there are at mostร๐‘˜

โ„“=0 ๐‘€ ๐ฟ

โ„“

ways to choose the index sets {๐ฝ๐‘–}๐‘€

๐‘–=1. โ–ก

For ๐‘Š โˆˆ {0,1}

๐ฟ

โ‰ค๐‘€

let R๐‘˜(๐‘Š) be the set of all words ๐‘ˆ โˆˆ {0,1}

๐ฟ

๐‘€

such that ๐‘Š โˆˆ B๐‘˜(๐‘ˆ). That is, for a channel output ๐‘Š, the set R๐‘˜(๐‘Š) contains all potential codewords๐‘ˆ whose transmission through the channel can result in๐‘Š, given that at most๐‘˜ substitutions occur. Further, for๐‘Š โˆˆ {0,1}๐ฟ

๐‘€

define theconfusable setof๐‘Š asG๐‘˜(๐‘Š) โ‰œ โˆช๐‘Šโ€ฒโˆˆB๐‘˜(๐‘Š)R๐‘˜(๐‘Šโ€ฒ). It is readily seen that the words in the confusable set G๐‘˜(๐‘Š) of a word๐‘Š cannot reside in the same ๐‘˜-substitution code as ๐‘Š, and therefore we have the following lemma.

Lemma 5.4.2. For every๐‘˜, ๐‘€, and ๐ฟsuch that ๐‘˜ โ‰ค ๐‘€ ๐ฟ and๐‘€ โ‰ค 2๐ฟ there exists a๐‘˜-substitution codeCsuch that

|C | โ‰ฅ

$ 2๐ฟ

๐‘€

๐ท

%

, where

๐ท โ‰œ max

๐‘Šโˆˆ({0,๐‘€1}๐ฟ)

|G๐‘˜(๐‘Š) |.

Proof. Initialize a listP = {0,1}๐ฟ

๐‘€

, and repeat the following process.

1. Choose๐‘Š โˆˆ P.

2. RemoveG๐‘˜(๐‘Š)fromP.

Clearly, the resulting codeCis of the aforementioned size. It remains to show thatC corrects๐‘˜ substitutions, i.e., thatB๐‘˜(๐ถ) โˆฉ B๐‘˜(๐ถโ€ฒ) =โˆ…for every distinct๐ถ , ๐ถโ€ฒ โˆˆ C.

Recall Definition5.2.1that B๐‘˜(๐ถ) can be obtained by substitution any ๐‘˜ symbols in strings of word๐ถ.

Assume for contradiction that there exist distinct๐ถ , ๐ถโ€ฒ โˆˆ C and๐‘‰ โˆˆ {0โ‰ค,1}๐ฟ

๐‘€

such that๐‘‰ โˆˆ B๐‘˜(๐ถ) โˆฉ B๐‘˜(๐ถโ€ฒ), and w.l.o.g assume that๐ถ was chosen earlier than๐ถโ€ฒin the above process. Since๐‘‰ โˆˆ B๐‘˜(๐ถ), it follows that R๐‘˜(๐‘‰) โІ G๐‘˜(๐ถ). In addition, since๐‘‰ โˆˆ B๐‘˜(๐ถโ€ฒ), it follows that๐ถโ€ฒโˆˆ R๐‘˜(๐‘‰). Therefore, a contradiction is obtained, since๐ถโ€ฒis in G๐‘˜(๐ถ), that was removed from the listPwhen๐ถwas chosen. โ–ก Lemma 5.4.3. For an nonnegative integer ๐‘‡ โ‰ค ๐‘˜ and ๐‘Š โˆˆ {๐‘€โˆ’๐‘‡0,1}๐ฟ

we have that|R๐‘˜(๐‘Š) | โ‰ค2(2๐‘€ ๐ฟ)๐‘˜.

Proof. Denote ๐‘Š = {y1, . . . ,y๐‘€โˆ’๐‘‡} and let ๐‘ˆ โˆˆ R๐‘˜(๐‘Š). Notice that by the definition of R๐‘˜(๐‘Š), there exists a ๐‘˜-substitution operation which turns๐‘ˆ to ๐‘Š. Therefore, everyy๐‘–in๐‘Š is a result of a certain nonnegative number of substitutions in one or more strings in ๐‘ˆ. Hence, we denote by z11, . . . ,z๐‘–1

1 the strings in ๐‘ˆ that resulted in y1 after the ๐‘˜-substitution operation, we denote by z21, . . . ,z2๐‘–

2 the strings which resulted in y2, and so on, up to z๐‘€โˆ’๐‘‡1 , . . . ,z๐‘–๐‘€โˆ’๐‘‡

๐‘€โˆ’๐‘‡, which resulted in y๐‘€โˆ’๐‘‡. Therefore, since ๐‘ˆ = โˆช๐‘€โˆ’๐‘‡

๐‘—=1 {z1๐‘—, . . . ,z๐‘–๐‘—

๐‘—

}, it follows that there exists a

setL โІ [๐‘€] ร— [๐ฟ], of size at most๐‘˜, such that

ยฉ

ยญ

ยญ

ยญ

ยญ

ยญ

ยญ

ยญ

ยญ

ยญ

ยญ

ยญ

ยญ

ยญ

ยญ

ยญ

ยญ

ยญ

ยญ

ยญ

ยซ z11

.. . z๐‘–1

1

z21 .. . z๐‘–๐‘€โˆ’๐‘‡โˆ’1

๐‘€โˆ’๐‘‡โˆ’1

z1๐‘€โˆ’๐‘‡ .. . z๐‘–๐‘€โˆ’๐‘‡

๐‘€โˆ’๐‘‡

ยช

ยฎ

ยฎ

ยฎ

ยฎ

ยฎ

ยฎ

ยฎ

ยฎ

ยฎ

ยฎ

ยฎ

ยฎ

ยฎ

ยฎ

ยฎ

ยฎ

ยฎ

ยฎ

ยฎ

ยฌ

=

ยฉ

ยญ

ยญ

ยญ

ยญ

ยญ

ยญ

ยญ

ยญ

ยญ

ยญ

ยญ

ยญ

ยญ

ยญ

ยญ

ยญ

ยญ

ยญ

ยญ

ยซ y1

.. . y1 y2 .. . y๐‘€โˆ’๐‘‡โˆ’1

y๐‘€โˆ’๐‘‡

.. . y๐‘€โˆ’๐‘‡

ยช

ยฎ

ยฎ

ยฎ

ยฎ

ยฎ

ยฎ

ยฎ

ยฎ

ยฎ

ยฎ

ยฎ

ยฎ

ยฎ

ยฎ

ยฎ

ยฎ

ยฎ

ยฎ

ยฎ

ยฌ

(L)

, (5.1)

where (ยท)(L) is a matrix operator, which corresponds to flipping the bits that are indexed by L in the matrix on which it operates. In what follows, we bound the number of ways to chooseL, which will consequently provide a bound on|R๐‘˜(๐‘Š) |.

Note that the number of ways to choose L is summed over all possible tuples (๐‘–1, . . . , ๐‘–๐‘€โˆ’๐‘‡).

First, defineP={๐‘:๐‘–๐‘ >1}, and denote๐‘ƒ โ‰œ |P |. Therefore, sinceร๐‘€โˆ’๐‘‡

๐‘—=1 ๐‘–๐‘— =๐‘€, it follows that

โˆ‘๏ธ

๐‘โˆˆP

๐‘–๐‘ =

๐‘€โˆ’๐‘‡

โˆ‘๏ธ

๐‘—=1

๐‘–๐‘— โˆ’โˆ‘๏ธ

๐‘—โˆ‰P

๐‘–๐‘—

=๐‘€ โˆ’ (๐‘€ โˆ’๐‘‡ โˆ’๐‘ƒ)

=๐‘‡+๐‘ƒ. (5.2)

Second, notice that for every ๐‘ โˆˆ P, the set {z1๐‘, . . . ,z๐‘–๐‘

๐‘

} contains ๐‘–๐‘ different strings. Hence, since after the ๐‘˜-substitution operation they are all equal to y๐‘, it follows that at least๐‘–๐‘โˆ’1 of them must undergo at least one substitution. Clearly, there are ๐‘–๐‘–๐‘

๐‘โˆ’1

=๐‘–๐‘ different ways to choose who will these๐‘–๐‘โˆ’1 strings be, and additional ๐ฟ๐‘–๐‘โˆ’1different ways to determine the locations of the substitutions, and therefore๐‘–๐‘ยท๐ฟ๐‘–๐‘โˆ’1ways to choose these๐‘–๐‘โˆ’1 substitutions.

Third, notice that

๐‘˜ โˆ’โˆ‘๏ธ

๐‘โˆˆP

(๐‘–๐‘โˆ’1) =๐‘˜ โˆ’โˆ‘๏ธ

๐‘โˆˆP

๐‘–๐‘+๐‘ƒ

(5.2)

= ๐‘˜ โˆ’๐‘‡ , (5.3)

and hence, there are at most ๐‘˜ โˆ’๐‘‡ remaining positions to be chosen to L, after choosing the๐‘–๐‘โˆ’1 positions for every ๐‘ โˆˆ P as described above.

Now, letIbe the set of all tuples(๐‘–1, . . . , ๐‘–๐‘€โˆ’๐‘‡) of positive integers that sum to๐‘€ (whose size is ๐‘€๐‘€โˆ’โˆ’๐‘‡1โˆ’1

by the famousstars and barstheorem). Let๐‘ :I โ†’ Nbe a function which maps (๐‘–1, . . . , ๐‘–๐‘€โˆ’๐‘‡) โˆˆ I to the number of different๐‘ˆ โˆˆ R๐‘˜(๐‘Š) for which there exist L โІ [๐‘€] ร— [๐ฟ] of size at most ๐‘˜ such that (5.1) is satisfied.

Since this quantity is at most the number of ways to choose a suitableL, the above arguments demonstrate that

๐‘(๐‘–1, . . . , ๐‘–๐‘€โˆ’๐‘‡) โ‰ค ๐‘€ ๐ฟ

๐‘˜ โˆ’๐‘‡ ร–

๐‘โˆˆP

๐‘–๐‘๐ฟ๐‘–๐‘โˆ’1. Then, we have

|R๐‘˜(๐‘Š) | โ‰คโˆ‘๏ธ

I

๐‘(๐‘–1, . . . , ๐‘–๐‘€โˆ’๐‘‡)

โ‰ค โˆ‘๏ธ

I

๐‘€ ๐ฟ ๐‘˜โˆ’๐‘‡

ร–

๐‘โˆˆP

๐‘–๐‘๐ฟ๐‘–๐‘โˆ’1

โ‰ค โˆ‘๏ธ

I

(๐‘€ ๐ฟ)๐‘˜โˆ’๐‘‡๐ฟ

ร

๐‘(๐‘–๐‘โˆ’1) ร–

๐‘โˆˆP

๐‘–๐‘

(5.3)

โ‰ค โˆ‘๏ธ

I

(๐‘€ ๐ฟ)๐‘˜โˆ’๐‘‡๐ฟ๐‘‡ ร–

๐‘โˆˆP

๐‘–๐‘. (5.4)

Since the geometric mean of positive numbers is always less than the arithmetic one, we have

รŽ

๐‘โˆˆP๐‘–๐‘ 1/๐‘ƒ

โ‰ค 1

๐‘ƒ

ร

๐‘โˆˆP๐‘–๐‘, and hence, (5.4) โ‰ค โˆ‘๏ธ

I

(๐‘€ ๐ฟ)๐‘˜โˆ’๐‘‡๐ฟ๐‘‡ ร

๐‘๐‘–๐‘

๐‘ƒ

๐‘ƒ

(5.2)

โ‰ค โˆ‘๏ธ

I

(๐‘€ ๐ฟ)๐‘˜โˆ’๐‘‡๐ฟ๐‘‡( (๐‘‡ +๐‘ƒ)/๐‘ƒ)๐‘ƒ

โ‰ค

๐‘€ โˆ’1 ๐‘€โˆ’๐‘‡ โˆ’1

(๐‘€ ๐ฟ)๐‘˜โˆ’๐‘‡๐ฟ๐‘‡( (๐‘‡+๐‘ƒ)/๐‘ƒ)๐‘ƒ

โ‰ค ๐‘€

๐‘‡

(๐‘€ ๐ฟ)๐‘˜โˆ’๐‘‡๐ฟ๐‘‡( (๐‘‡ +๐‘ƒ)/๐‘ƒ)๐‘ƒ

โ‰ค (๐‘€ ๐ฟ)๐‘˜โˆ’๐‘‡(๐‘€ ๐ฟ)๐‘‡( (๐‘‡+๐‘ƒ)/๐‘ƒ)๐‘ƒ

โ‰ค (๐‘€ ๐ฟ)๐‘˜( (๐‘‡ +๐‘ƒ)/๐‘ƒ)๐‘ƒ

(๐‘Ž)

โ‰ค (๐‘€ ๐ฟ)๐‘˜2๐‘‡

โ‰ค (2๐‘€ ๐ฟ)๐‘˜, (5.5)

where(๐‘Ž)will be proved in Appendix5.9. โ–ก

Proof. (of Theorem 5.4.1) It follows from Lemma5.4.1, Lemma 5.4.3, and from the definition of๐ทthat

๐ท โ‰ค max

๐‘Šโˆˆ({0,๐‘€1}๐ฟ)

|B๐‘˜(๐‘Š) | ยท max

๐‘Šโˆˆ({0,๐‘€1}๐ฟ)

|R๐‘˜(๐‘Š) |

โ‰ค

๐‘˜

โˆ‘๏ธ

โ„“=0

๐‘€ ๐ฟ โ„“

!

(2๐‘€ ๐ฟ)๐‘˜.

Therefore, the codeCthat is constructed in Lemma5.4.2satisfies ๐‘Ÿ(C) โ‰ค log

2๐ฟ ๐‘€

โˆ’log|C |

โ‰ค log ๐‘˜

โˆ‘๏ธ

โ„“=0

๐‘€ ๐ฟ โ„“

!

(2๐‘€ ๐ฟ)๐‘˜

!

=log

๐‘˜

โˆ‘๏ธ

โ„“=0

๐‘€ ๐ฟ โ„“

!

+๐‘˜(log(๐‘€ ๐ฟ) +1)

โ‰ค log

๐‘˜ ๐‘€ ๐ฟ

๐‘˜

+๐‘˜(log(๐‘€ ๐ฟ) +1)

โ‰ค

log(๐‘˜) โˆ’log(๐‘˜!) +log(๐‘€ ๐ฟ๐‘˜) +๐‘˜(log(๐‘€ ๐ฟ) +1)

โ‰ค 2๐‘˜log(๐‘€ ๐ฟ) +2. โ–ก

Lower bound for a single substitution code

Notice that the bound in Lemma5.4.1is tight, e.g., in cases where๐‘‘๐ป(x๐‘–,x๐‘—) โ‰ฅ2๐‘˜+1 for all distinct๐‘–, ๐‘— โˆˆ [๐‘€]. This might occur only if ๐‘€ is less than the maximum size of a ๐‘˜-substitution correcting vector-code, i.e., when๐‘€ โ‰ค 2๐ฟ/(ร๐‘˜

๐‘–=0 ๐ฟ ๐‘–

) [79, Sec. 4.2]. When the minimum Hamming distance between the strings in a codeword is not large enough, different substitution errors might result in identical words, and the size of the ball is smaller than the given upper bound. This is illustrated in the following example.

Example 5.4.1. For ๐ฟ = 3 and ๐‘€ = 2, consider the word ๐‘Š = {010,011}. By flipping either the two underlined symbols, or the two bold symbols, the word๐‘Šโ€ฒ= {010,110} is obtained. Hence, different substitution operations might result in identical words. As a result, the size of the ball

B2(๐‘Š) ={{010,011},{110,011},{000,011},{011}, {010,111},{010,001},{010},{100,011},

{111,011},{110,111},{110,001},{110,010}, {001,011},{000,111},{000,001},{000,010}, {010,101}}

is17, which is smaller than the upper bound of theB2(๐‘Š)ball size ๐‘€ ๐ฟ0 + ๐‘€ ๐ฟ

1

+

๐‘€ ๐ฟ 2

=22.

However, in some cases it is possible to bound the size ofB๐‘˜ from below by using tools from Fourier analysis of Boolean functions. In the following it is assumed that that๐‘˜ =1. A word๐‘Š โˆˆ {0,1}

๐ฟ

๐‘€

corresponds to a Boolean function ๐‘“๐‘Š :{ยฑ1}๐ฟ โ†’ {ยฑ1} as follows. For x โˆˆ {0,1}๐ฟ let x โˆˆ {ยฑ1}๐ฟ be the vector which is obtained fromxbe replacing every 0 by 1 and every 1 byโˆ’1. Then, we define ๐‘“๐‘Š(x) = โˆ’1 if x โˆˆ ๐‘Š, and 1 otherwise. Considering the set {ยฑ1}๐ฟ as the hypercube graph3, theboundary๐œ• ๐‘“๐‘Š of ๐‘“๐‘Š is the set of all edges{x1,x2} โˆˆ {ยฑ1}๐ฟ

2

in this graph such that ๐‘“๐‘Š(x1) โ‰  ๐‘“๐‘Š(x2).

Lemma 5.4.4. For every word๐‘Š โˆˆ {0,1}๐ฟ

๐‘€

we have that |B1(๐‘Š) | โ‰ฅ |๐œ• ๐‘“๐‘Š| +1.

Proof. Let๐‘Š = {x1, . . . ,x๐‘€} be a word. Every edge๐‘’ ={x๐‘–,xโ€ฒ} on the boundary of ๐‘“๐‘Šcorresponds to a substitution operation inx๐‘–from๐‘Šthat results in a word๐‘Š๐‘’ = {x1, . . . ,x๐‘–โˆ’1,xโ€ฒ,x๐‘–+1, . . . ,x๐‘€} โˆˆ (B1(๐‘Š)\{๐‘Š}) โˆฉ {0,1}๐ฟ

๐‘€

. To show that every edge ๐‘’ on the boundary corresponds to a unique word ๐‘Š๐‘’ in B1(๐‘Š), assume for contradiction that๐‘Š๐‘’ =๐‘Š๐‘’โ€ฒ for two distinct edges๐‘’ = {x1,x2} and๐‘’โ€ฒ = {y1,y2}, where x1,y1 โˆˆ ๐‘Š andx2,y2 โˆ‰ ๐‘Š. Since both ๐‘Š๐‘’ and ๐‘Š๐‘’โ€ฒ contain precisely one element which is not in๐‘Š, and are missing one element which is in๐‘Š, it follows that x1 = y1 and x2 = y2, a contradiction. Therefore, there exists an injective mapping between the boundary of ๐‘“๐‘Š andB1(๐‘Š)\{๐‘Š}, and the claim follows. โ–ก Notice that the bound in Lemma5.4.4 is tight, e.g., in cases where the minimum Hamming distance between the strings of๐‘Šis at least 2. This implies the tightness of the bound which is given below in these cases. Having established the connection betweenB1(๐‘Š)and the boundary of ๐‘“๐‘Š, the following Fourier analytic claim will aid in proving a lower bound. Let thetotal influenceof ๐‘“๐‘Šbe๐ผ(๐‘“๐‘Š) โ‰œ ร๐ฟ

๐‘–=1Prx(๐‘“๐‘Š(x) โ‰  ๐‘“๐‘Š(xโŠ•๐‘–)), wherexโŠ•๐‘–is obtained fromxby changing the sign of the๐‘–-th entry, andx is chosen uniformly at random.

3The nodes of the hypercube graph of dimension๐ฟare identified by{ยฑ1}๐ฟ, and every two nodes are connected if and only if the Hamming distance between them is 1.

Lemma 5.4.5. [72, Theorem 2.39] For every function ๐‘“ : {ยฑ1}๐ฟ โ†’ R, we have that๐ผ(๐‘“) โ‰ฅ2๐›ผlog(1/๐›ผ), where๐›ผ=๐›ผ(๐‘“) โ‰œ min{Prx(๐‘“(x) =1),Prx(๐‘“(x)=โˆ’1)}, andx โˆˆ {ยฑ1}๐ฟ is chosen uniformly at random.

Lemma 5.4.6. Let ๐ฟ and ๐‘€ be integers such that ๐‘€ โ‰ค 2(1โˆ’๐œ–)๐ฟ and ๐ฟ > 1

๐œ– for some0 < ๐œ– <1. For every word๐‘Š โˆˆ {0,1}

๐ฟ

๐‘€

we have that |๐œ• ๐‘“๐‘Š| โ‰ฅ ๐œ– ๐‘€ ๐ฟ.

Proof. Since๐‘€ โ‰ค 2(1โˆ’๐œ–)๐ฟ and๐›ผ= ๐›ผ(๐‘“๐‘Š) =min{(2๐ฟ โˆ’๐‘€)/2๐ฟ, ๐‘€/2๐ฟ}, it follows that๐›ผ= ๐‘€/2๐ฟwhenever๐ฟ > 1

๐œ–, which holds for every non-constant๐ฟ. In addition, since Prx(๐‘“๐‘Š(x) โ‰  ๐‘“๐‘Š(xโŠ•๐‘–))equals the fraction of dimension๐‘–edges that lie on the boundary of ๐‘“๐‘Š ([Oโ€™Donnell]), Lemma5.4.4implies that

๐ผ(๐‘“๐‘Š) = |๐œ• ๐‘“๐‘Š| 2๐ฟโˆ’1

.

Therefore, since ๐‘€ โ‰ค 2(1โˆ’๐œ–)๐ฟ and from Lemma 5.4.5 we have that |๐œ• ๐‘“๐‘Š| = 2๐ฟโˆ’1๐ผ(๐‘“๐‘Š) โ‰ฅ2๐ฟ๐›ผlog(1/๐›ผ) = ๐‘€log(2๐ฟ/๐‘€) โ‰ฅ ๐œ– ๐‘€ ๐ฟ. โ–ก Corollary 5.4.1. For integers ๐ฟ and ๐‘€ and a constant 0 < ๐œ– < 1such that ๐‘€ โ‰ค 2(1โˆ’๐œ–)๐ฟ and ๐ฟ > 1

๐œ–, any 1-substitution code C โІ {0,1}๐ฟ

๐‘€

satisfies that ๐‘Ÿ(C) โ‰ฅ log(๐‘€ ๐ฟ) โˆ’log 1/๐œ–.

Proof. According to Lemma 5.4.4 and Lemma 5.4.6, every codeword of every C excludes at least ๐œ– ๐‘€ ๐ฟ other words from belonging to C. Hence, we have that

|C | โ‰ค 2

๐ฟ

๐‘€

/๐œ– ๐‘€ ๐ฟ, and by the definition of redundancy, it follows that ๐‘Ÿ(C)=log

2๐ฟ ๐‘€

โˆ’log(|C |)

โ‰ฅ log(๐œ– ๐‘€ ๐ฟ)

=log(๐‘€ ๐ฟ) โˆ’log 1/๐œ– . โ–ก

Remark 5.4.1. A similar lower bound was presented in [62], where it was shown that for a code correcting๐‘ loss of strings,๐‘žsubstitution errors in each of๐‘กstrings, the redundancy is lower bounded by

๐‘  ๐ฟ+๐‘กlog๐‘€+๐‘ก ๐‘žlog๐ฟโˆ’log(๐‘ !๐‘ก!๐‘ž!๐‘ก) +๐‘œ(1),

when๐‘€ =2๐›ฝ ๐ฟ for some0< ๐›ฝ < 1. Taking๐‘ =0and๐‘ž =๐‘ก =1results in the lower boundlog(๐‘€ ๐ฟ) +๐‘œ(1), which is order-wise the same as, and stronger by a constant log 1/๐œ– than the lower boundlog(๐‘€ ๐ฟ) โˆ’log 1/๐œ– in Corollary5.4.1. Compared to the bound in [62], the bound in Corollary5.4.1does not require๐‘€to be exponential in๐ฟ, and thus applies to broader parameter range for๐‘€ and๐ฟ.

Lower bound for more than one substitution

Similar techniques to the ones in Subsection5.4can be used to obtain a lower bound for larger values of๐‘˜. Specifically, we have the following theorem.

Theorem 5.4.2. For integers๐ฟ,๐‘€,๐‘˜, and positive constants๐œ– , ๐‘ <1such that๐‘€ โ‰ค 2(1โˆ’๐œ–)๐ฟ, ๐ฟ > 1

๐œ–, and ๐‘˜ โ‰ค ๐‘๐œ–

โˆš

๐‘€, a ๐‘˜-substitution code C โІ {0,1}๐ฟ

๐‘€

satisfies that๐‘Ÿ(C) โ‰ฅ ๐‘˜(log(๐‘€ ๐ฟ) โˆ’2 log(๐‘˜)) โˆ’๐‘‚(1).

To prove this theorem, it is shown that certain special๐‘˜-subsets of๐œ• ๐‘“๐‘Šcorrespond to words inB๐‘˜(๐‘Š), and by bounding the number of these special subsets from below, the lower bound is attained. A subset of ๐‘˜ boundary edges is called special, if it does not contain two edges that intersect on a node (i.e., a string) in๐‘Š. Formally, a subset S โІ ๐œ• ๐‘“๐‘Š is special if |S | = ๐‘˜, and for every {x1,y1},{x2,y2} โˆˆ S with ๐‘“๐‘Š(x1) = ๐‘“๐‘Š(x2) =โˆ’1 and ๐‘“๐‘Š(y1) = ๐‘“๐‘Š(y2) = 1 we have thatx1 โ‰  x2. We begin by showing how special sets are relevant to proving Theorem5.4.2.

Lemma 5.4.7. For every word๐‘Š โˆˆ {0,1}๐ฟ

๐‘€

we have that

|B๐‘˜(๐‘Š) | โ‰ฅ |{S โІ๐œ• ๐‘“๐‘Š|S is special}|

๐‘˜๐‘˜

.

Proof. It is shown that every special set corresponds to a word in B๐‘˜(๐‘Š), and at most ๐‘˜๐‘˜ different special sets can correspond to the same word (namely, there exists a mapping from the family of special sets to B๐‘˜(๐‘Š), which is at most ๐‘˜๐‘˜ to 1). Let S = {{x๐‘–,y๐‘–}}๐‘˜

๐‘–=1 be special, where ๐‘“๐‘Š(x๐‘–) = โˆ’1 and ๐‘“๐‘Š(y๐‘–) = 1 for every ๐‘– โˆˆ [๐‘˜]. Let ๐‘ŠS โˆˆ {0,1}

๐ฟ

โ‰ค๐‘€

be obtained from ๐‘Š by removing the x๐‘–โ€™s and adding the y๐‘–โ€™s, i.e., ๐‘ŠS โ‰œ (๐‘Š \ {x๐‘–}๐‘˜

๐‘–=1) โˆช {y๐‘–}๐‘‡

๐‘–=1 for some๐‘‡ โ‰ค ๐‘˜; notice that there are exactly ๐‘˜ distinctx๐‘–โ€™s but at most ๐‘˜ distinct y๐‘–โ€™s, since S is special, and therefore we assume w.l.o.g thaty1, . . . ,y๐‘‡ are the distincty๐‘–โ€™s. It is readily verified that๐‘ŠS โˆˆ B๐‘˜(๐‘Š), since๐‘ŠS can be obtained from๐‘Š by performing๐‘˜ substitution operations in ๐‘Š, each of which corresponds to an edge in S. Moreover, every S corresponds to a unique surjective function ๐‘“S : [๐‘˜] โ†’ [๐‘‡] such that ๐‘“S(๐‘–) = ๐‘— if there exists ๐‘— โ‰ค ๐‘‡ such that {x๐‘–,y๐‘—} โˆˆ S, and hence at most ๐‘˜๐‘‡ โ‰ค ๐‘˜๐‘˜ different special setsScan correspond to the same word inB๐‘˜(๐‘Š). โ–ก We now turn to prove a lower bound on the number of special sets.

Lemma 5.4.8. Let ๐ฟ and ๐‘€ be integers such that ๐‘€ โ‰ค 2(1โˆ’๐œ–)๐ฟ and ๐ฟ > 1

๐œ– for some0 < ๐œ– < 1. If there exists a positive constant๐‘ < 1 such that ๐‘˜ โ‰ค ๐‘ยท๐œ–

โˆš ๐‘€, then there are at least(1โˆ’๐‘2) |๐œ• ๐‘“๐‘Š|

๐‘˜

special setsS โІ ๐œ• ๐‘“๐‘Š.

Proof. Clearly, the number of ways to choose a๐‘˜-subset of๐œ• ๐‘“๐‘Šwhich isnotspecial, i.e., contains ๐‘˜ distinct edges of ๐œ• ๐‘“๐‘Š but at least two of those are adjacent to the samex โˆˆ๐‘Š, is at most

๐‘€ ยท ๐ฟ

2

ยท

|๐œ• ๐‘“๐‘Š| ๐‘˜ โˆ’2

=๐‘€ ยท ๐ฟ

2

ยท ๐‘˜(๐‘˜โˆ’1)

(|๐œ• ๐‘“๐‘Š| โˆ’๐‘˜+2) (|๐œ• ๐‘“๐‘Š| โˆ’๐‘˜+1) ยท

|๐œ• ๐‘“๐‘Š| ๐‘˜

. Observe that the multiplier of |๐œ• ๐‘“๐‘˜๐‘Š|

in the above expression can be bounded as follows.

๐‘€ ยท ๐ฟ

2

ยท ๐‘˜(๐‘˜โˆ’1)

(|๐œ• ๐‘“๐‘Š| โˆ’๐‘˜+2) (|๐œ• ๐‘“๐‘Š| โˆ’๐‘˜+1)

โ‰ค๐‘€ ยท ๐ฟ

2

ยท ๐‘˜(๐‘˜โˆ’1)

(๐œ– ๐‘€ ๐ฟโˆ’๐‘˜+2) (๐œ– ๐‘€ ๐ฟโˆ’๐‘˜ +1)

โ‰ค๐‘€ ยท๐ฟ2ยท ๐‘˜2 ๐œ–2๐‘€2๐ฟ2

,

where the former inequality follows since|๐œ• ๐‘“๐‘Š| โ‰ฅ ๐œ– ๐‘€ ๐ฟby Lemma5.4.6; the latter inequality follows since ๐‘˜ โ‰ค ๐‘๐œ–

โˆš

๐‘€ implies that๐œ– ๐‘€ ๐ฟ โˆ’๐‘˜ +2 โ‰ฅ ๐œ– ๐‘€ ๐ฟโˆ’๐‘˜ +1 โ‰ฅ

โˆš1

2 ยท๐œ– ๐‘€ ๐ฟ whenever ๐‘

โˆš

โˆš 2

2โˆ’1 โ‰ค ๐ฟ

โˆš

๐‘€, which holds for every non-constant ๐‘€ and ๐ฟ. Therefore, since

๐‘€ ยท๐ฟ2ยท ๐‘˜2 ๐œ–2๐‘€2๐ฟ2

= ๐‘˜2 ๐œ–2๐‘€

โ‰ค ๐‘2, it follows that these are at least(1โˆ’๐‘2) ยท |๐œ• ๐‘“๐‘Š|

๐‘˜

special subsets in๐œ• ๐‘“๐‘Š. โ–ก Lemma 5.4.7 and Lemma 5.4.8 readily imply that |B๐‘˜(๐‘Š) | โ‰ฅ (1โˆ’๐‘2)

๐‘˜๐‘˜ ๐œ– ๐‘€ ๐ฟ

๐‘˜

for every๐‘Š โˆˆ {0,1}

๐ฟ

๐‘€

, from which we can prove Theorem5.4.2.

Proof. (of Theorem 5.4.2) Clearly, no two ๐‘˜-balls around codewords in C can intersect, and therefore we must have|C | โ‰ค 2

๐ฟ

๐‘€

/min๐‘ŠโˆˆC|B๐‘˜(๐‘Š) |. Therefore, ๐‘Ÿ(C) =log

2๐ฟ ๐‘€

โˆ’log|C |

โ‰ฅ log

(1โˆ’๐‘2) ๐‘˜๐‘˜

๐œ– ๐‘€ ๐ฟ ๐‘˜

=log ๐œ– ๐‘€ ๐ฟ

๐‘˜

โˆ’๐‘˜log(๐‘˜) โˆ’๐‘‚(1)

โ‰ฅ log

(๐œ– ๐‘€ ๐ฟโˆ’๐‘˜+1)๐‘˜ ๐‘˜๐‘˜

โˆ’ ๐‘˜log(๐‘˜) โˆ’๐‘‚(1)

โ‰ฅ logยฉ

ยญ

ยญ

ยซ โˆš1

2๐œ– ๐‘€ ๐ฟ ๐‘˜

๐‘˜๐‘˜ ยช

ยฎ

ยฎ

ยฌ

โˆ’ ๐‘˜log(๐‘˜) โˆ’๐‘‚(1)

=๐‘˜

log(โˆš๐œ–

2) +log(๐‘€ ๐ฟ))

โˆ’2๐‘˜log(๐‘˜) โˆ’๐‘‚(1)

โ‰ฅ ๐‘˜log(๐‘€ ๐ฟ) โˆ’2๐‘˜log(๐‘˜) โˆ’๐‘‚(๐‘˜). โ–ก