
Chapter VI: Robust Indexing: Optimal Codes Correcting Deletion/Insertion

6.5 Robust Indexing for Deletion/Insertion Errors

Let $\ell^*$ be the most significant bit where $\mathbf{a}_{i_1}$ and $\mathbf{a}_{i_2}$ differ, i.e., $(a_{i_1,1},\ldots,a_{i_1,\ell^*-1}) = (a_{i_2,1},\ldots,a_{i_2,\ell^*-1})$, $a_{i_1,\ell^*}=1$, and $a_{i_2,\ell^*}=0$. Then, according to the if statement in the encoding procedure, we have that

$$q_{i_1} - \sum_{\ell:\, a_{i_1,\ell}=1 \text{ and } \ell\in[\ell^*]} \Big(2^{L'-\ell} - N_H\big((a_{i_1,1},\ldots,a_{i_1,\ell-1},0),\{\mathbf{a}_j\}_{j=1}^{i_1-1}\big)\Big) > 0$$

and

$$q_{i_2} - \sum_{\ell:\, a_{i_1,\ell}=1 \text{ and } \ell\in[\ell^*]} \Big(2^{L'-\ell} - N_H\big((a_{i_1,1},\ldots,a_{i_1,\ell-1},0),\{\mathbf{a}_j\}_{j=1}^{i_2-1}\big)\Big) \le 0,$$

which implies that

$$\begin{aligned}
q_{i_2}-q_{i_1} &< \sum_{\ell:\, a_{i_1,\ell}=1 \text{ and } \ell\in[\ell^*]} \Big(2^{L'-\ell} - N_H\big((a_{i_1,1},\ldots,a_{i_1,\ell-1},0),\{\mathbf{a}_j\}_{j=1}^{i_2-1}\big)\Big) \\
&\quad - \sum_{\ell:\, a_{i_1,\ell}=1 \text{ and } \ell\in[\ell^*]} \Big(2^{L'-\ell} - N_H\big((a_{i_1,1},\ldots,a_{i_1,\ell-1},0),\{\mathbf{a}_j\}_{j=1}^{i_1-1}\big)\Big) \\
&= \sum_{\ell:\, a_{i_1,\ell}=1 \text{ and } \ell\in[\ell^*]} \Big(N_H\big((a_{i_1,1},\ldots,a_{i_1,\ell-1},0),\{\mathbf{a}_j\}_{j=1}^{i_1-1}\big) - N_H\big((a_{i_1,1},\ldots,a_{i_1,\ell-1},0),\{\mathbf{a}_j\}_{j=1}^{i_2-1}\big)\Big) \\
&= \sum_{\ell:\, a_{i_1,\ell}=1 \text{ and } \ell\in[\ell^*]} N_H\big((a_{i_1,1},\ldots,a_{i_1,\ell-1},0),\{\mathbf{a}_j\}_{j=i_2}^{i_1-1}\big) \\
&\overset{(a)}{\le} \sum_{j=i_2}^{i_1-1} \big|\{\mathbf{c} : d_H(\mathbf{c},\mathbf{a}_j)\le 2k\}\big| \\
&= (i_1-i_2)Q, 
\end{aligned} \tag{6.13}$$

where $(a)$ follows from the definition of $N_H(\mathbf{a}, A)$ and the fact that the strings which have $(a_{i_1,1},\ldots,a_{i_1,\ell_1-1},0)$ and $(a_{i_1,1},\ldots,a_{i_1,\ell_2-1},0)$ as prefixes, respectively, where $a_{i_1,\ell_1}=1$, $a_{i_1,\ell_2}=1$, and $\ell_1\ne\ell_2$, are different. Eq. (6.13) contradicts the fact that the integers $(q_1,\ldots,q_M) = F_Q^H(d)$ satisfy $q_i-q_{i+1} > Q$ for $i\in[M-1]$, which, since $i_1 > i_2$, implies $q_{i_2}-q_{i_1} > (i_1-i_2)Q$.

Since the calculation of $N_H(\mathbf{a}, A)$ has polynomial complexity, the complexity of the encoding/decoding procedure is polynomial in $M$ and $L'$.

Theorem 6.5.1. Let $M$, $L$, and $k$ be integers and let $L' \triangleq 3\log M + 4k^2 + 1$. If $L' + 4kL' + 2k\log(4kL') \le L$, there exists a $k$-deletion code, computable in $\mathrm{poly}(M, L)$ time, that has redundancy $8k\log ML + (12k+2)\log M + O(k^3) + o(\log ML)$.
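As a quick numerical illustration of the parameters in Theorem 6.5.1, the following sketch evaluates $L'$ and the length condition for given $M$, $L$, and $k$. The function names are hypothetical, and the base-2 logarithm with a ceiling on $3\log M$ is an assumption made here so that $L'$ is an integer; the theorem statement itself does not specify rounding.

```python
import math


def index_length(M: int, k: int) -> int:
    """L' = 3 log M + 4 k^2 + 1 (base-2 log and ceiling are assumptions)."""
    return math.ceil(3 * math.log2(M)) + 4 * k * k + 1


def length_condition_holds(M: int, L: int, k: int) -> bool:
    """Check L' + 4 k L' + 2 k log(4 k L') <= L from the theorem statement."""
    Lp = index_length(M, k)
    return Lp + 4 * k * Lp + 2 * k * math.log2(4 * k * Lp) <= L


if __name__ == "__main__":
    # Example: M = 1024 strings, k = 1 deletion per string.
    print(index_length(1024, 1))
    print(length_condition_holds(1024, 200, 1))
```

For $M = 1024$ and $k = 1$ this gives $L' = 35$, and the condition requires $L$ to be at least roughly 190 bits.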

Similar to the construction in Sec. 6.3, we use the first $L'$ bits $(x_{i,1},\ldots,x_{i,L'})$, $i\in[M]$, in each string $\mathbf{x}_i$ as indexing bits and sort the strings $\{\mathbf{x}_i\}$ according to the lexicographic order of $\{(x_{i,1},\ldots,x_{i,L'})\}_{i=1}^{M}$. To protect the ordering, we use a Reed-Solomon code to protect the characteristic vector $\mathbb{1}(\{(x_{i,1},\ldots,x_{i,L'})\}_{i=1}^{M})$. The difference is that in this section, we construct the indexing bits $\{(x_{i,1},\ldots,x_{i,L'})\}_{i=1}^{M}$ such that the mutual deletion distance among $\{(x_{i,1},\ldots,x_{i,L'})\}_{i=1}^{M}$, rather than the mutual Hamming distance considered in Sec. 6.3, is at least $2k+1$, i.e., the deletion balls $\mathcal{D}_k((x_{i,1},\ldots,x_{i,L'}))$ and $\mathcal{D}_k((x_{j,1},\ldots,x_{j,L'}))$ do not intersect for $i\ne j$, where the deletion ball $\mathcal{D}_k(\mathbf{u})$ of a string $\mathbf{u}\in\{0,1\}^n$ is the set of all length $n-k$ subsequences of $\mathbf{u}$. Define

$$\mathcal{S}_D = \big\{\{\mathbf{a}_1,\ldots,\mathbf{a}_M\} : \mathcal{D}_k(\mathbf{a}_i)\cap\mathcal{D}_k(\mathbf{a}_j) = \emptyset \text{ for } i\ne j\big\}.$$
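To make the definition of the deletion ball and of the set $\mathcal{S}_D$ concrete, here is a minimal brute-force sketch. The names `deletion_ball` and `in_S_D` are hypothetical helpers introduced only for illustration; the enumeration is exponential in $k$ and is meant for small examples, not as the construction used in this section.

```python
from itertools import combinations


def deletion_ball(u: str, k: int) -> set[str]:
    """All length n-k subsequences of u, i.e. the deletion ball D_k(u)."""
    n = len(u)
    # Choosing which n-k positions to keep is the same as choosing k to delete.
    return {"".join(u[i] for i in kept) for kept in combinations(range(n), n - k)}


def in_S_D(strings: list[str], k: int) -> bool:
    """True if the deletion balls of all given strings are pairwise disjoint."""
    balls = [deletion_ball(s, k) for s in strings]
    return all(
        balls[i].isdisjoint(balls[j])
        for i in range(len(balls))
        for j in range(i + 1, len(balls))
    )


if __name__ == "__main__":
    print(sorted(deletion_ball("0110", 1)))
    print(in_S_D(["1111", "0000"], 1))   # disjoint balls
    print(in_S_D(["0110", "0100"], 1))   # both balls contain "010"
```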

The construction is based on the following two lemmas, where the first one is robust indexing for deletion/insertion errors, which will be proved in Sec. 6.5 and the second one is a deletion code construction, which we presented in Ch. 4.

Lemma 6.5.1. For $P = 2^k\binom{L'}{k}^2$, there exists an invertible mapping $F_S^D : \left[\binom{2^{L'}-(M-1)P+M-1}{M-1}\right] \to \binom{\{0,1\}^{L'}}{M}$, computable in $\mathrm{poly}(M, L)$ time, such that for any $d\in\left[\left\lceil\frac{(2^{L'}-MP)^{M-1}}{(M-1)!}\right\rceil\right]$, we have that $F_S^D(d)\in\mathcal{S}_D$.

Lemma 6.5.2. (Corollary of Theorem 4.1.2) For any integer $n$ and $N = n + 4k\log n + o(\log n)$, there exists a systematic encoding function $\mathit{Enc} : \{0,1\}^n \to \{0,1\}^N$, computed in $O(n^{2k+1})$ time, and a decoding function $\mathit{Dec} : \{0,1\}^{N-k} \to \{0,1\}^n$, computed in $O(n^{k+1})$ time, such that for any $\mathbf{c}\in\{0,1\}^n$ and any length $N-k$ subsequence $\mathbf{d}\in\{0,1\}^{N-k}$ of $\mathit{Enc}(\mathbf{c})$, we have that $\mathit{Dec}(\mathbf{d}) = \mathbf{c}$.

Code Constructions

The code construction is the same as that in Sec. 6.3 except that here, the indexing bits $\{(x_{i,1},\ldots,x_{i,L'})\}_{i=1}^{M}$ are generated using the map $F_S^D$. In addition, a deletion code in Lemma 6.5.2 is used to protect the concatenated string.

Let the data $\mathbf{d}\in D$ to be encoded be a tuple $\mathbf{d} = (d_1,\mathbf{d}_2)$, where $d_1\in\left[\left\lceil\frac{(2^{L'}-MP)^{M-1}}{(M-1)!}\right\rceil\right]$ and $\mathbf{d}_2\in\{0,1\}^n$ such that $n + 4k\log n + o(\log n) = M(L-L') - 4kL'$, which implies that $n = M(L-L') - 4kL' - 4k\lceil\log ML\rceil - o(\log ML)$. We briefly present the encoding/decoding procedure as follows.

Encoding:

(1) Let $F_S^D(d_1) = \{\mathbf{a}_1,\ldots,\mathbf{a}_M\}\in\mathcal{S}_D$ such that $\mathbf{a}_1 = 1^{L'}$ and $\mathbf{a}_1 > \mathbf{a}_2 > \ldots > \mathbf{a}_M$. Let $(x_{i,1},\ldots,x_{i,L'}) = \mathbf{a}_i$ for $i\in[M]$.

(2) Let
$$(x_{1,L'+1},\ldots,x_{1,L'+4kL'+4k\log(4kL')+o(\log(4kL'))}) = \mathit{Enc}(RS_{2k}(\mathbb{1}(\{\mathbf{a}_1,\ldots,\mathbf{a}_M\}))).$$

(3) Place the deletion code $\mathit{Enc}(\mathbf{d}_2)$ in bits $(x_{1,L'+4kL'+4k\log(4kL')+o(\log(4kL'))+1},\ldots,x_{1,L})$ and $(x_{i,L'+1},\ldots,x_{i,L})$ for $i\in[2,M]$.

Upon receiving $\{\mathbf{x}'_i\}_{i=1}^{M}$, the decoding procedure is as follows.

Decoding:

(1) Find the unique string $\mathbf{x}'_{i_0}$ such that $(x'_{i_0,1},\ldots,x'_{i_0,L'-k}) = 1^{L'-k}$. Then $\mathbf{x}'_{i_0}$ is an erroneous copy of $\mathbf{x}_1$ and the string
$$(x'_{i_0,L'+1},\ldots,x'_{i_0,L'+4kL'+4k\log(4kL')+o(\log(4kL'))-k})$$
is an erroneous copy of
$$(x_{1,L'+1},\ldots,x_{1,L'+4kL'+4k\log(4kL')+o(\log(4kL'))}) = \mathit{Enc}(RS_{2k}(\mathbb{1}(\{\mathbf{a}_i\}_{i=1}^{M}))).$$
Correct the vector $RS_{2k}(\mathbb{1}(\{\mathbf{a}_i\}_{i=1}^{M}))$ and use it to recover $\mathbb{1}(\{\mathbf{a}_i\}_{i=1}^{M})$, and thus the indexing bits $\{(x_{i,1},\ldots,x_{i,L'})\}_{i=1}^{M}$. Recover $d_1 = (F_S^D)^{-1}(\{\mathbf{a}_i\}_{i=1}^{M})$.

(2) For each $i\in[M]$, find the unique $\pi(i)\in[M]$ such that $(x'_{\pi(i),1},\ldots,x'_{\pi(i),L'-k})$ is a length $L'-k$ subsequence of $(x_{i,1},\ldots,x_{i,L'})$ (note that $\pi(1) = i_0$). Checking whether a string is a subsequence of another can be done in linear time using a greedy algorithm.

(3) Since $\mathbf{x}'_{\pi(i)}$ is an erroneous copy of $\mathbf{x}_i$, $i\in[M]$, the concatenation
$$\mathbf{m}' = \big((x'_{\pi(1),L'+4kL'+4k\log(4kL')+o(\log(4kL'))+1},\ldots,x'_{\pi(1),L_1}),\mathbf{x}'_{\pi(2)},\ldots,\mathbf{x}'_{\pi(M)}\big),$$
where $L_1$ is the length of $\mathbf{x}'_{\pi(1)}$, is an erroneous copy of $\mathit{Enc}(\mathbf{d}_2)$. Use the decoder to obtain $\mathit{Dec}(\mathbf{m}') = \mathbf{d}_2$.

(4) Output $(d_1,\mathbf{d}_2)$.
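The matching step of the decoding procedure relies on a greedy subsequence test. The sketch below shows one way to write it; `is_subsequence` and `match_received` are hypothetical names, strings are modeled as Python `str` over `'0'`/`'1'`, and indices are 0-based rather than the 1-based indices used in the text.

```python
def is_subsequence(short: str, long: str) -> bool:
    """Greedy linear-time test: scan `long` once, matching symbols of `short` in order."""
    it = iter(long)
    # `ch in it` advances the iterator until ch is found (or it is exhausted).
    return all(ch in it for ch in short)


def match_received(index_strings: list[str], received: list[str], k: int) -> list[int]:
    """For each index string a_i, find the unique received string whose first
    L'-k bits form a subsequence of a_i. Returns pi as a 0-based list."""
    Lp = len(index_strings[0])
    pi = []
    for a in index_strings:
        hits = [j for j, x in enumerate(received) if is_subsequence(x[: Lp - k], a)]
        if len(hits) != 1:
            raise ValueError("matching is not unique; index strings may not be in S_D")
        pi.append(hits[0])
    return pi


if __name__ == "__main__":
    # Two index strings with L' = 4 and k = 1; received strings arrive unordered.
    print(match_received(["1111", "0000"], ["000110", "11101"], 1))
```

Uniqueness of the match is exactly what disjoint deletion balls provide, which is why the sketch raises an error when more than one candidate is found.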

The proof of correctness is similar to that in Sec. 6.3. The redundancy of the code is

$$\begin{aligned}
r(\mathcal{C}) &= \log\binom{2^{L}}{M} - \log\left\lceil\frac{\prod_{i=1}^{M-1}(2^{L'}-iP)}{(M-1)!}\right\rceil \\
&\quad - \big[M(L-L') - 4kL' - 8k\log(4kL') - o(\log(4kL')) - 4k\lceil\log ML\rceil - o(\log ML)\big] \\
&\le 8k\log ML + (12k+2)\log M + O(k^3) + o(k\log ML).
\end{aligned}$$

Computing $F_S^D$

We now prove Lemma 6.5.1. The robust indexing algorithm for generating the indexing strings $\{x_{i,1},\ldots,x_{i,L'}\}$ is the same as in Sec. 6.4 except that we replace the notations $N_H(\mathbf{a}, A)$ and $Q$, which are based on Hamming distance, with their deletion distance counterparts. For a string $\mathbf{c}\in\{0,1\}^{\ell}$ and a set of indices $\Delta = \{\delta_1,\ldots,\delta_r\}\subset[\ell]$, let $\mathbf{c}(\Delta)$ be the length $\ell-r$ subsequence of $\mathbf{c}$ obtained by deleting bits $(c_{\delta_1},c_{\delta_2},\ldots,c_{\delta_r})$ in $\mathbf{c}$.

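For sequences $\mathbf{c}_1\in\{0,1\}^{\ell_1}$ and $\mathbf{c}_2\in\{0,1\}^{\ell_2}$ and nonnegative integers $r_1, r_2$, define the set

$$\mathcal{I}(\mathbf{c}_1,\mathbf{c}_2,r_1,r_2) = \{(\Delta_1,\Delta_2) : \Delta_1\subseteq[\ell_1],\ |\Delta_1|\le r_1,\ \Delta_2\subseteq[\ell_2],\ |\Delta_2|\le r_2,\ \mathbf{c}_1(\Delta_1) = \mathbf{c}_2(\Delta_2)\}$$

and the number

$$N(\mathbf{c}_1,\mathbf{c}_2,r_1,r_2) = |\mathcal{I}(\mathbf{c}_1,\mathbf{c}_2,r_1,r_2)|, \tag{6.14}$$

which is the number of ways to delete no more than $r_1$ and $r_2$ bits in $\mathbf{c}_1$ and $\mathbf{c}_2$, respectively, such that the resulting subsequences are the same. For a sequence $\mathbf{a}\in\{0,1\}^{\ell}$ of length $\ell\in[0,L']$ and a set of sequences $A\subset\{0,1\}^{L'}$, define

$$N_D(\mathbf{a}, A) = \sum_{\mathbf{c}\in A}\ \sum_{\mathbf{c}':\, \mathbf{c}'\in\{0,1\}^{L'} \text{ and } (c'_1,\ldots,c'_{\ell}) = \mathbf{a}} N(\mathbf{c}',\mathbf{c},k,k).$$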

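For an empty sequence $\mathbf{a}$ and a sequence $\mathbf{c}$, we have that

$$N_D(\mathbf{a},\mathbf{c}) = P \triangleq \sum_{r=0}^{k}\binom{L'}{r}^{2} 2^{r}, \tag{6.15}$$

since $N_D(\mathbf{a},\mathbf{c})$ is the number of tuples $(\mathbf{c}',\Delta_1,\Delta_2)$ of sequences $\mathbf{c}'\in\{0,1\}^{L'}$ and index sets $\Delta_1,\Delta_2\subset[L']$ such that after no more than $k$ deletions in indices $\Delta_1$ and $\Delta_2$ in $\mathbf{c}$ and $\mathbf{c}'$, respectively, we obtain the same subsequence $\mathbf{c}(\Delta_1) = \mathbf{c}'(\Delta_2)$.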
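The quantities $N(\mathbf{c}_1,\mathbf{c}_2,r_1,r_2)$, $N_D(\mathbf{a},A)$, and $P$ can be checked on small instances by direct enumeration. The following is a brute-force sketch, exponential in $L'$ and intended only as a sanity check of Eq. (6.14) and Eq. (6.15); the function names are hypothetical and strings are modeled as Python `str`.

```python
from collections import Counter
from itertools import combinations, product
from math import comb


def deletion_counts(s: str, r_max: int) -> Counter:
    """Map each subsequence reachable by deleting at most r_max positions of s
    to the number of index sets Delta that produce it."""
    counts = Counter()
    for r in range(min(r_max, len(s)) + 1):
        for delta in combinations(range(len(s)), r):
            removed = set(delta)
            counts["".join(ch for i, ch in enumerate(s) if i not in removed)] += 1
    return counts


def N(c1: str, c2: str, r1: int, r2: int) -> int:
    """Number of pairs (Delta1, Delta2) with c1(Delta1) == c2(Delta2), Eq. (6.14)."""
    a, b = deletion_counts(c1, r1), deletion_counts(c2, r2)
    return sum(v * b[s] for s, v in a.items())


def N_D(prefix: str, A: list[str], Lp: int, k: int) -> int:
    """Sum of N(c', c, k, k) over c in A and all c' of length Lp starting with prefix."""
    total = 0
    for tail in product("01", repeat=Lp - len(prefix)):
        c_prime = prefix + "".join(tail)
        total += sum(N(c_prime, c, k, k) for c in A)
    return total


def P(Lp: int, k: int) -> int:
    """Closed form of Eq. (6.15)."""
    return sum(comb(Lp, r) ** 2 * 2 ** r for r in range(k + 1))


if __name__ == "__main__":
    print(N_D("", ["0110"], 4, 1), P(4, 1))
```

For $L' = 4$ and $k = 1$ both values equal 33, independent of the choice of $\mathbf{c}$, as Eq. (6.15) predicts.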

The algorithm for computing $F_S^D$ is the same as that for computing $F_S^H$, by replacing the numbers $N_H((a_{i,1},\ldots,a_{i,\ell-1},0),\{\mathbf{a}_j\}_{j=1}^{i-1})$ and $Q$ with the numbers $N_D((a_{i,1},\ldots,a_{i,\ell-1},0),\{\mathbf{a}_j\}_{j=1}^{i-1})$ and $P$. To prove the correctness of the algorithm, we need to show that $N_D(\mathbf{a}, A)$ satisfies two properties similar to the ones in Eq. (6.5) and Eq. (6.6). The first is that

$$N_D(\mathbf{a}, A) = N_D((\mathbf{a},0), A) + N_D((\mathbf{a},1), A) \tag{6.16}$$

for a sequence $\mathbf{a}\in\{0,1\}^{\ell}$ of length $\ell\in[L'-1]$ and a set $A\subset\{0,1\}^{L'}$, which is a deletion counterpart of Eq. (6.5). This can be proved by noticing that

$$N_D(\mathbf{a}, A) = \sum_{\mathbf{c}':\, \mathbf{c}'\in\{0,1\}^{L'} \text{ and } (c'_1,\ldots,c'_{\ell}) = \mathbf{a}}\ \sum_{\mathbf{c}\in A} N(\mathbf{c}',\mathbf{c},k,k)$$

and that for every sequence $\mathbf{c}'\in\{0,1\}^{L'}$ that satisfies $(c'_1,\ldots,c'_{\ell}) = \mathbf{a}$, we have either $c'_{\ell+1} = 1$ or $c'_{\ell+1} = 0$.

The second property is that the number $N_D(\mathbf{a}, A)$ is computable in polynomial time.

Since obtaining an explicit expression as in Eq. (6.6) is challenging, we compute the number $N_D(\mathbf{a},\mathbf{c})$ using dynamic programming for two sequences $\mathbf{a}\in\{0,1\}^{\ell}$ and $\mathbf{c}\in\{0,1\}^{L'}$ such that $\ell\in[0,L']$. Given $\mathbf{a}$ and $\mathbf{c}$, we compute

$$n(k_1,k_2,r_1,r_2) = \sum_{\mathbf{c}':\, \mathbf{c}'\in\{0,1\}^{L'-\ell+k_1} \text{ and } (c'_1,\ldots,c'_{k_1}) = (a_{\ell-k_1+1},\ldots,a_{\ell})} N\big(\mathbf{c}',(c_{L'-k_2+1},\ldots,c_{L'}),r_1,r_2\big).$$

Note that $N_D(\mathbf{a},\mathbf{c}) = n(\ell,L',k,k)$. In addition, by definition of $N_D(\mathbf{a}, A)$, we have that $N_D(\mathbf{a}, A) = \sum_{\mathbf{c}\in A} N_D(\mathbf{a},\mathbf{c})$. Hence, $N_D(\mathbf{a}, A)$ can be computed efficiently when $N_D(\mathbf{a},\mathbf{c})$ is computed.

For $k_1 = 0$, we have that

$$n(0,k_2,r_1,r_2) = \sum_{\mathbf{c}':\, \mathbf{c}'\in\{0,1\}^{L'-\ell}} N\big(\mathbf{c}',(c_{L'-k_2+1},\ldots,c_{L'}),r_1,r_2\big), \tag{6.17}$$

which by Eq. (6.14) equals 0 when $L'-\ell-r_1 > k_2$ or $k_2-r_2 > L'-\ell$. When $L'-\ell-r_1\le k_2$ and $k_2-r_2\le L'-\ell$, we show that

$$n(0,k_2,r_1,r_2) = \sum_{i=k_2-(L'-\ell)}^{r_2}\binom{k_2}{i}\binom{L'-\ell}{L'-\ell-(k_2-i)} 2^{L'-\ell-(k_2-i)} \tag{6.18}$$

for $k_2\ge L'-\ell$ and that

$$n(0,k_2,r_1,r_2) = \sum_{i=L'-\ell-k_2}^{r_1}\binom{k_2}{k_2-(L'-\ell-i)}\binom{L'-\ell}{i} 2^{i} \tag{6.19}$$

for $k_2 < L'-\ell$. For $k_2\ge L'-\ell$ and sets $(\Delta_1,\Delta_2)\in\mathcal{I}(\mathbf{c}',\mathbf{c} = (c_{L'-k_2+1},\ldots,c_{L'}),r_1,r_2)$, the cardinality $|\Delta_2|$ satisfies $k_2-(L'-\ell)\le|\Delta_2|\le r_2$ because $\mathbf{c}'(\Delta_1) = (c_{L'-k_2+1},\ldots,c_{L'})(\Delta_2)$. For given $|\Delta_2|$, there are $\binom{k_2}{|\Delta_2|}$ ways to select $\Delta_2$ and $\binom{L'-\ell}{L'-\ell-(k_2-|\Delta_2|)}$ choices of $\Delta_1$. Moreover, given $\mathbf{c}$, $\Delta_1$, and $\Delta_2$, there are $2^{L'-\ell-(k_2-|\Delta_2|)}$ choices of $\mathbf{c}'$ such that $\mathbf{c}(\Delta_2) = \mathbf{c}'(\Delta_1)$. Hence we have Eq. (6.18). Similarly, we have Eq. (6.19). Therefore, the number $n(k_1,k_2,r_1,r_2)$ can be computed when $k_1 = 0$.

For $k_1 > 0$, we compute $n(k_1,k_2,r_1,r_2)$ iteratively from $k_1 = 0$ to $k_1 = \ell$ using the following recursion:

$$n(k_1,k_2,r_1,r_2) = \sum_{k:\, k\in[L'-k_2+1,\,L'],\ c_k = a_{\ell-k_1+1}} n(k_1-1,\,L'-k,\,r_1,\,r_2-k+L'-k_2+1) + 2n(k_1-1,\,k_2,\,r_1-1,\,r_2), \tag{6.20}$$

where $n(k',k'',r',r'') = 0$ if $r' < 0$ or $r'' < 0$.

Note that for any $(\Delta_1,\Delta_2)\in\mathcal{I}(\mathbf{c}',\mathbf{c} = (c_{L'-k_2+1},\ldots,c_{L'}),r_1,r_2)$, we have either $1\in\Delta_1$ or $1\notin\Delta_1$.

When $1\in\Delta_1$, we have $\mathbf{c}''(\Delta_1\backslash\{1\}-1) = (c_{L'-k_2+1},\ldots,c_{L'})(\Delta_2)$, where $\mathbf{c}'' = (c'_2,\ldots,c'_{L'-\ell+k_1})$ and $\Delta-i = \{j-i : j\in\Delta\}$ for any set $\Delta$ and integer $i$. Note that there are $n(k_1-1,k_2,r_1-1,r_2)$ choices of $(\mathbf{c}'',\Delta_1\backslash\{1\}-1,\Delta_2)$ such that $(c''_1,\ldots,c''_{k_1-1}) = (a_{\ell-k_1+2},\ldots,a_{\ell})$ and $\mathbf{c}''(\Delta_1\backslash\{1\}-1) = (c_{L'-k_2+1},\ldots,c_{L'})(\Delta_2)$. Since $c'_1$ can be either 0 or 1 when $1\in\Delta_1$, we have $2n(k_1-1,k_2,r_1-1,r_2)$ choices of $(\mathbf{c}',\Delta_1,\Delta_2)$ such that $(c'_2,\ldots,c'_{k_1}) = (c''_1,\ldots,c''_{k_1-1}) = (a_{\ell-k_1+2},\ldots,a_{\ell})$ and $\mathbf{c}'(\Delta_1) = (c_{L'-k_2+1},\ldots,c_{L'})(\Delta_2)$ when $1\in\Delta_1$.

When $1\notin\Delta_1$, let $k$ be the minimum index such that $k\in[L'-k_2+1,L']$ and $(k-L'+k_2)\notin\Delta_2$. Then, we have that $c_k = c'_1 = a_{\ell-k_1+1}$, $[1,k-L'+k_2-1]\subseteq\Delta_2$, and $\mathbf{c}''(\Delta_1-1) = (c_{k+1},\ldots,c_{L'})(\Delta_2\backslash[1,k-L'+k_2-1]-k+L'-k_2)$, where $\mathbf{c}'' = (c'_2,\ldots,c'_{L'-\ell+k_1})$. There are $n(k_1-1,L'-k,r_1,r_2-k+L'-k_2+1)$ choices of $(\mathbf{c}'',\Delta_1-1,\Delta_2\backslash[1,k-L'+k_2-1]-k+L'-k_2)$ such that $\mathbf{c}''(\Delta_1-1) = (c_{k+1},\ldots,c_{L'})(\Delta_2\backslash[1,k-L'+k_2-1]-k+L'-k_2)$ and $(c''_1,\ldots,c''_{k_1-1}) = (a_{\ell-k_1+2},\ldots,a_{\ell})$. Therefore, there are $n(k_1-1,L'-k,r_1,r_2-k+L'-k_2+1)$ choices of $(\mathbf{c}',\Delta_1,\Delta_2)$ such that $(c'_1,\ldots,c'_{k_1}) = (a_{\ell-k_1+1},\ldots,a_{\ell})$ and $\mathbf{c}'(\Delta_1) = (c_{L'-k_2+1},\ldots,c_{L'})(\Delta_2)$. Note that for each $k$ satisfying $k\in[L'-k_2+1,L']$ and $c_k = c'_1 = a_{\ell-k_1+1}$, there are $n(k_1-1,L'-k,r_1,r_2-k+L'-k_2+1)$ choices of such $(\mathbf{c}',\Delta_1,\Delta_2)$. In addition, different $k$ correspond to different choices since $k$ is the minimum index such that $(k-L'+k_2)\notin\Delta_2$. Hence, we have Eq. (6.20).

By Eq. (6.17), (6.18), (6.19), and (6.20), the number $N_D(\mathbf{a},\mathbf{c}) = n(\ell,L',k,k)$ can be recursively computed for any $\mathbf{a}\in\{0,1\}^{\ell}$ and $\mathbf{c}\in\{0,1\}^{L'}$. Therefore, the encoding/decoding can be computed in $\mathrm{poly}(M, L')$ time.
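The case split used in the recursion (either the first symbol of one sequence is deleted, or it is kept and matched to the first surviving symbol of the other) can be illustrated on the simpler quantity $N(\mathbf{c}_1,\mathbf{c}_2,r_1,r_2)$ of Eq. (6.14) for two fixed sequences. The sketch below is a memoized dynamic program written for this illustration; it is not the recursion for $n(k_1,k_2,r_1,r_2)$ in the text, which additionally sums over the free suffix of $\mathbf{c}'$. The name `count_deletion_pairs` is hypothetical.

```python
from functools import lru_cache


def count_deletion_pairs(c1: str, c2: str, r1: int, r2: int) -> int:
    """Count pairs (Delta1, Delta2) with |Delta1| <= r1, |Delta2| <= r2 and
    c1(Delta1) == c2(Delta2), in time polynomial in the lengths and budgets."""
    n1, n2 = len(c1), len(c2)

    @lru_cache(maxsize=None)
    def f(i: int, j: int, d1: int, d2: int) -> int:
        # State: suffixes c1[i:] and c2[j:], with d1 and d2 deletions still allowed.
        if i == n1:
            # Every remaining symbol of c2 must be deleted.
            return 1 if n2 - j <= d2 else 0
        total = 0
        if d1 > 0:
            # Case 1: position i of c1 is deleted.
            total += f(i + 1, j, d1 - 1, d2)
        # Case 2: position i of c1 is kept and matched to c2[jp], the first
        # surviving position of c2; positions j..jp-1 of c2 are all deleted.
        for jp in range(j, min(n2, j + d2 + 1)):
            if c2[jp] == c1[i]:
                total += f(i + 1, jp + 1, d1, d2 - (jp - j))
        return total

    return f(0, 0, r1, r2)


if __name__ == "__main__":
    print(count_deletion_pairs("0110", "0100", 1, 1))
```

Each pair $(\Delta_1,\Delta_2)$ is counted once because the delete/keep decisions determine $\Delta_1$ and the matched positions determine $\Delta_2$; the number of states is at most $(\ell_1+1)(\ell_2+1)(r_1+1)(r_2+1)$.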

We are now ready to present the algorithm that computes $F_S^D(d)$ for an integer $d\in\left[\binom{2^{L'}-(M-1)P+M-1}{M-1}\right]$. The algorithm is the same as the encoding procedure in Sec. 6.4, by replacing $N_H(\mathbf{a}, A)$ with $N_D(\mathbf{a}, A)$ for any sequence $\mathbf{a}$ and set of sequences $A$. In addition, the integers $q_i$ are generated such that $q_1 = 2^{L'}$ and $q_i-q_{i+1} > P$ for $i\in[M-1]$. Such $q_i$, $i\in[M]$, can be generated following the same argument as in Lemma 6.4.1, since $d\in\left[\binom{2^{L'}-(M-1)P+M-1}{M-1}\right]$. Given integers $q_i$, $i\in[M]$, satisfying $q_1 = 2^{L'}$ and $q_i-q_{i+1} > P$ for $i\in[M-1]$, the encoding procedure for generating $\{\mathbf{a}_1,\ldots,\mathbf{a}_M\}$ is given as follows.

Encoding:

for $i\in[M]$ do
    $q = q_i$.
    for $\ell\in[L']$ do
        if $2^{L'-\ell} - N_D((a_{i,1},\ldots,a_{i,\ell-1},0),\{\mathbf{a}_j\}_{j=1}^{i-1}) \ge q$ then
            $a_{i,\ell} = 0$.
        else
            $q = q - \big(2^{L'-\ell} - N_D((a_{i,1},\ldots,a_{i,\ell-1},0),\{\mathbf{a}_j\}_{j=1}^{i-1})\big)$, $a_{i,\ell} = 1$.
        end if
    end for
end for

return $\{\mathbf{a}_1,\ldots,\mathbf{a}_M\}$.
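The encoding loop above can be prototyped directly for small parameters. The sketch below evaluates $N_D$ by brute-force enumeration instead of the polynomial-time dynamic program, so it is only practical for small $L'$; the names `robust_index`, `N_D`, and `deletion_counts` are hypothetical, and the input integers are assumed to already satisfy $q_1 = 2^{L'}$ and the spacing condition with $q_i \le 2^{L'} - (i-1)P$.

```python
from collections import Counter
from itertools import combinations, product


def deletion_counts(s: str, k: int) -> Counter:
    """Subsequences of s after at most k deletions, with multiplicities."""
    counts = Counter()
    for r in range(min(k, len(s)) + 1):
        for delta in combinations(range(len(s)), r):
            removed = set(delta)
            counts["".join(ch for i, ch in enumerate(s) if i not in removed)] += 1
    return counts


def N_D(prefix: str, A: list[str], Lp: int, k: int) -> int:
    """Brute-force N_D(prefix, A): sum of N(c', c, k, k) over c in A and all
    length-Lp strings c' that start with prefix."""
    if not A:
        return 0
    counts_A = [deletion_counts(c, k) for c in A]
    total = 0
    for tail in product("01", repeat=Lp - len(prefix)):
        cp = deletion_counts(prefix + "".join(tail), k)
        for ca in counts_A:
            total += sum(v * ca[s] for s, v in cp.items())
    return total


def robust_index(qs: list[int], Lp: int, k: int) -> list[str]:
    """Greedy bit-by-bit construction of a_1, ..., a_M from q_1, ..., q_M."""
    codewords: list[str] = []
    for q in qs:
        bits = ""
        for l in range(1, Lp + 1):
            # Lower bound on the number of admissible strings with prefix (bits, 0).
            gap = 2 ** (Lp - l) - N_D(bits + "0", codewords, Lp, k)
            if gap >= q:
                bits += "0"
            else:
                q -= gap
                bits += "1"
        codewords.append(bits)
    return codewords


if __name__ == "__main__":
    # L' = 10, k = 1, so P = 1 + 2 * 10**2 = 201; consecutive q differ by more than P.
    print(robust_index([1024, 700, 300], 10, 1))
```

With $q_1 = 2^{L'}$ and an empty set of previous strings, every comparison fails and the first output is the all-ones string, matching the requirement $\mathbf{a}_1 = 1^{L'}$ in the code construction.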

The correctness of the encoding procedure follows an argument similar to the one in Sec. 6.4. We prove that the input $(q_1,\ldots,q_M)$ and output $\{\mathbf{a}_1,\ldots,\mathbf{a}_M\}$ satisfy

$$\mathrm{decimal}(\mathbf{a}_i) = q_i - 1 + \sum_{\ell:\, a_{i,\ell}=1 \text{ and } \ell\in[L']} N_D\big((a_{i,1},\ldots,a_{i,\ell-1},0),\{\mathbf{a}_j\}_{j=1}^{i-1}\big) \tag{6.21}$$

and $\{\mathbf{a}_1,\ldots,\mathbf{a}_M\}\in\mathcal{S}_D$. The following is a deletion metric version of Lemma 6.4.3, obtained by replacing $N_H(\mathbf{a}, A)$ with $N_D(\mathbf{a}, A)$ for any sequence $\mathbf{a}\in\{0,1\}^{\ell}$ and set $A\subset\{0,1\}^{L'}$.

Lemma 6.5.3. After the $\ell$-th, $\ell\in[L']$, inner for loop in the $i$-th, $i\in[M]$, outer for loop in the encoding procedure, we have that

$$0 < q \le 2^{L'-\ell} - N_D\big((a_{i,1},\ldots,a_{i,\ell}),\{\mathbf{a}_j\}_{j=1}^{i-1}\big). \tag{6.22}$$

At the end of the $i$-th outer for loop, we have that $q = 1$.

Proof. The proof is the same as that of Lemma 6.4.3, by noticing that

$$N_D(0,\{\mathbf{a}_j\}_{j=1}^{i-1}) + N_D(1,\{\mathbf{a}_j\}_{j=1}^{i-1}) = \sum_{j=1}^{i-1} N_D(\emptyset,\mathbf{a}_j) \overset{(a)}{=} (i-1)P,$$

where $\emptyset$ is the empty sequence and $(a)$ follows from (6.15) and the fact that $N_D(\mathbf{a}, A) = \sum_{\mathbf{c}\in A} N_D(\mathbf{a},\mathbf{c})$. In addition, we have (6.16), which is the deletion metric version of (6.5). The rest of the proof is the same as in Lemma 6.4.3. $\square$

From Lemma 6.5.3, we have

$$q = 2^{L'-L'} - N_D(\mathbf{a}_i,\{\mathbf{a}_j\}_{j=1}^{i-1}) = 1$$

at the end of the $i$-th outer for loop, $i\in[M]$. Hence, $N_D(\mathbf{a}_i,\{\mathbf{a}_j\}_{j=1}^{i-1}) = 0$ for $i\in[M]$ and $\mathcal{D}_k(\mathbf{a}_i)\cap\mathcal{D}_k(\mathbf{a}_j) = \emptyset$ for any $i\ne j$, $i,j\in[M]$. Then, we have that $\{\mathbf{a}_i\}_{i=1}^{M}\in\mathcal{S}_D$. In addition, similar to Lemma 6.4.4, we can use Lemma 6.5.3 to show that the output $\{\mathbf{a}_i\}_{i=1}^{M}$ satisfies Eq. (6.21).

Therefore, we have the following decoding algorithm, similar to the one in Sec. 6.4.

Decoding:

(1) Order the strings $\{\mathbf{a}_i\}_{i=1}^{M}$ such that $\mathbf{a}_1 > \mathbf{a}_2 > \ldots > \mathbf{a}_M$.

(2) For $i\in[M]$, compute

$$q_i = \mathrm{decimal}(\mathbf{a}_i) + 1 - \sum_{\ell:\, a_{i,\ell}=1 \text{ and } \ell\in[L']} N_D\big((a_{i,1},\ldots,a_{i,\ell-1},0),\{\mathbf{a}_j\}_{j=1}^{i-1}\big), \tag{6.23}$$

which is Eq. (6.21) solved for $q_i$.

Finally, the correctness of decoding is guaranteed by (6.21) and the fact that $\mathbf{a}_1 > \mathbf{a}_2 > \ldots > \mathbf{a}_M$, where $\mathbf{a}_i$ is the output generated in the $i$-th outer loop. The latter follows from a proof similar to the one in Sec. 6.4. Suppose there exist $i_1 > i_2$ such that $\mathbf{a}_{i_1} > \mathbf{a}_{i_2}$ lexicographically, and let $\ell^*$ be the most significant bit where $\mathbf{a}_{i_1}$ and $\mathbf{a}_{i_2}$ differ. Then we have that

$$\begin{aligned}
q_{i_2}-q_{i_1} &< \sum_{\ell:\, a_{i_1,\ell}=1 \text{ and } \ell\in[\ell^*]} \Big(2^{L'-\ell} - N_D\big((a_{i_1,1},\ldots,a_{i_1,\ell-1},0),\{\mathbf{a}_j\}_{j=1}^{i_2-1}\big)\Big) \\
&\quad - \sum_{\ell:\, a_{i_1,\ell}=1 \text{ and } \ell\in[\ell^*]} \Big(2^{L'-\ell} - N_D\big((a_{i_1,1},\ldots,a_{i_1,\ell-1},0),\{\mathbf{a}_j\}_{j=1}^{i_1-1}\big)\Big) \\
&= \sum_{\ell:\, a_{i_1,\ell}=1 \text{ and } \ell\in[\ell^*]} \Big(N_D\big((a_{i_1,1},\ldots,a_{i_1,\ell-1},0),\{\mathbf{a}_j\}_{j=1}^{i_1-1}\big) - N_D\big((a_{i_1,1},\ldots,a_{i_1,\ell-1},0),\{\mathbf{a}_j\}_{j=1}^{i_2-1}\big)\Big) \\
&= \sum_{\ell:\, a_{i_1,\ell}=1 \text{ and } \ell\in[\ell^*]} N_D\big((a_{i_1,1},\ldots,a_{i_1,\ell-1},0),\{\mathbf{a}_j\}_{j=i_2}^{i_1-1}\big) \\
&\overset{(a)}{\le} N_D\big(\emptyset,\{\mathbf{a}_j\}_{j=i_2}^{i_1-1}\big) \\
&\overset{(b)}{\le} (i_1-i_2)P,
\end{aligned} \tag{6.24}$$

where $\emptyset$ is the empty sequence and $(a)$ follows from the definition of $N_D(\mathbf{a}, A)$ and the fact that the strings which have $(a_{i_1,1},\ldots,a_{i_1,\ell_1-1},0)$ and $(a_{i_1,1},\ldots,a_{i_1,\ell_2-1},0)$ as prefixes, respectively, are different. Inequality $(b)$ follows from (6.15) and the fact that $N_D(\mathbf{a}, A) = \sum_{\mathbf{c}\in A} N_D(\mathbf{a},\mathbf{c})$.
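The decoding rule for recovering the integers $q_i$ from the index strings can be sketched in the same brute-force style. The code below uses the sign that makes it the inverse of the encoding loop: each bit set to 1 at position $\ell$ subtracts $2^{L'-\ell} - N_D(\cdot)$ from $q$ and the loop ends with $q = 1$, so $q_i = \mathrm{decimal}(\mathbf{a}_i) + 1 - \sum N_D(\cdot)$. The names `recover_q`, `N_D`, and `deletion_counts` are hypothetical, and $N_D$ is evaluated by enumeration, which is only feasible for small $L'$.

```python
from collections import Counter
from itertools import combinations, product


def deletion_counts(s: str, k: int) -> Counter:
    """Subsequences of s after at most k deletions, with multiplicities."""
    counts = Counter()
    for r in range(min(k, len(s)) + 1):
        for delta in combinations(range(len(s)), r):
            removed = set(delta)
            counts["".join(ch for i, ch in enumerate(s) if i not in removed)] += 1
    return counts


def N_D(prefix: str, A: list[str], Lp: int, k: int) -> int:
    """Brute-force N_D(prefix, A) over all length-Lp completions of prefix."""
    if not A:
        return 0
    counts_A = [deletion_counts(c, k) for c in A]
    total = 0
    for tail in product("01", repeat=Lp - len(prefix)):
        cp = deletion_counts(prefix + "".join(tail), k)
        for ca in counts_A:
            total += sum(v * ca[s] for s, v in cp.items())
    return total


def recover_q(codewords: list[str], k: int) -> list[int]:
    """Sort the index strings in decreasing order and invert the encoding loop."""
    Lp = len(codewords[0])
    ordered = sorted(codewords, reverse=True)  # a_1 > a_2 > ... > a_M
    qs = []
    for i, a in enumerate(ordered):
        # a[:l] is the prefix (a_{i,1}, ..., a_{i,l}) in 0-based slicing, i.e.
        # the bits before position l+1 of the text; a "0" is appended to it.
        correction = sum(
            N_D(a[:l] + "0", ordered[:i], Lp, k) for l in range(Lp) if a[l] == "1"
        )
        qs.append(int(a, 2) + 1 - correction)
    return qs


if __name__ == "__main__":
    print(recover_q(["1111111111"], 1))
```

For the all-ones string with no earlier strings, the correction term vanishes and the sketch returns $q_1 = 2^{L'}$.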