Codes Capable of Correcting Bursts of Deletions

Chapter IV: Correcting Deletions/Insertions-Generalizations

4.5 Codes Capable of Correcting Bursts of Deletions

length $N_2 - k$ subsequence of $Rep_{k+1}(g_c(g_c(\mathbf{c})))$, which is a $k$-deletion correcting code. Therefore $g_c(g_c(\mathbf{c}))$ can be recovered. In addition, $(z_{n+1}, \dots, z_{n+N_1-k})$ is a length $N_1 - k$ subsequence of $g_c(\mathbf{c})$. Since $g_c(g_c(\mathbf{c}))$ is a $k$-deletion correcting hash of $g_c(\mathbf{c})$, the hash $g_c(\mathbf{c})$ can be recovered. Finally, since $(z_1, \dots, z_{n-k})$ is a length $n-k$ subsequence of $\mathbf{c}$, we can use $g_c(\mathbf{c})$ to recover $\mathbf{c}$. The decoding of $\mathbf{c}$ from $g_c(\mathbf{c})$ is done by brute force over all sequences $\mathbf{c}'$ that satisfy $\mathbf{d} \in \mathcal{B}_k(\mathbf{c}')$.

The computation of $g_c(\mathbf{c})$ is done by brute force over sequences $\mathbf{c}' \in \mathcal{B}_k(\mathbf{c})$. Hence the encoding and decoding complexities are $O(n^{2k+1})$ and $O(n^{k+1})$, respectively.

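of a Varshamov-Tenengolts code $k(k+1)/2$ times. For notational convenience, suppose $i \mid n$ for $i \in [k]$, where $[k] = \{1, \dots, k\}$.² Then for a vector $\mathbf{u} \in \{0,1\}^n$, define

$$f^{bc}_k : \{0,1\}^n \to [[\tfrac{n}{k}+1]]^k \times [[\tfrac{n}{k-1}+1]]^{k-1} \times \cdots \times [[n+1]]$$

as

$$
f^{bc}_k(\mathbf{u}) = \left(
\begin{array}{l}
\sum_{j=0}^{\frac{n}{k}-1} (j+1)\, u_{k j+1} \bmod (\tfrac{n}{k}+1),\ \ \sum_{j=0}^{\frac{n}{k}-1} (j+1)\, u_{k j+2} \bmod (\tfrac{n}{k}+1),\ \dots,\ \sum_{j=0}^{\frac{n}{k}-1} (j+1)\, u_{k j+k} \bmod (\tfrac{n}{k}+1), \\[4pt]
\sum_{j=0}^{\frac{n}{k-1}-1} (j+1)\, u_{(k-1) j+1} \bmod (\tfrac{n}{k-1}+1),\ \ \sum_{j=0}^{\frac{n}{k-1}-1} (j+1)\, u_{(k-1) j+2} \bmod (\tfrac{n}{k-1}+1),\ \dots,\ \sum_{j=0}^{\frac{n}{k-1}-1} (j+1)\, u_{(k-1) j+k-1} \bmod (\tfrac{n}{k-1}+1), \\[4pt]
\sum_{j=0}^{\frac{n}{k-2}-1} (j+1)\, u_{(k-2) j+1} \bmod (\tfrac{n}{k-2}+1),\ \ \sum_{j=0}^{\frac{n}{k-2}-1} (j+1)\, u_{(k-2) j+2} \bmod (\tfrac{n}{k-2}+1),\ \dots,\ \sum_{j=0}^{\frac{n}{k-2}-1} (j+1)\, u_{(k-2) j+k-2} \bmod (\tfrac{n}{k-2}+1), \\[4pt]
\quad\vdots \\[4pt]
\sum_{j=1}^{n} j\, u_j \bmod (n+1)
\end{array}
\right),
$$

so that $f^{bc}_k(\mathbf{u}) \in [[\tfrac{n}{k}+1]]^k \times [[\tfrac{n}{k-1}+1]]^{k-1} \times \cdots \times [[n+1]]$. Each component is the Varshamov-Tenengolts syndrome of one interleaved subsequence of $\mathbf{u}$. For convenience we will sometimes assume that the image of $f^{bc}_k(\mathbf{u})$ is an integer from $[[\prod_{s=1}^{k} (\tfrac{n}{s}+1)^s]]$, where $\prod_{s=1}^{k} (\tfrac{n}{s}+1)^s \le O\big((n+k)^{k^2}/k!\big)$. It is straightforward to show that $f^{bc}_k$ satisfies the confusability property, but we include the following lemma for completeness.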
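As a rough illustration of the labeling just defined, the following Python sketch computes one syndrome per interleaved subsequence. It assumes that each component is the standard Varshamov-Tenengolts syndrome $\sum_j j\,x_j \bmod (\ell+1)$ of a length-$\ell$ subsequence and that every $s \in [k]$ divides $n$. The names `vt_syndrome` and `burst_label` are illustrative and do not come from the text.

```python
def vt_syndrome(bits):
    """Standard VT syndrome: sum of j * bits[j-1] for j = 1..len, mod (len + 1)."""
    return sum(j * b for j, b in enumerate(bits, start=1)) % (len(bits) + 1)


def burst_label(u, k):
    """Return [(a_{k,1..k}), (a_{k-1,1..k-1}), ..., (a_{1,1},)] for the bit list u.

    Assumption of this sketch: every s in 1..k divides len(u).
    """
    n = len(u)
    label = []
    for s in range(k, 0, -1):
        if n % s != 0:
            raise ValueError("this sketch assumes s divides n for every s <= k")
        # u[i::s] is the (i+1)-th interleaved subsequence (u_{i+1}, u_{i+1+s}, ...).
        label.append(tuple(vt_syndrome(u[i::s]) for i in range(s)))
    return label


print(burst_label([1, 0, 1, 1, 0, 0], k=2))  # [(3, 2), (1,)]
```

The label has $k(k+1)/2$ components in total, matching the number of Varshamov-Tenengolts syndromes mentioned above.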

Lemma 4.5.1. Suppose $\mathbf{u} \in \{0,1\}^n$. Then for any $\mathbf{y} \in \mathcal{B}^{bc}_k(\mathbf{u})$, $f^{bc}_k(\mathbf{u}) \ne f^{bc}_k(\mathbf{y})$.

Proof. To prove the result, assume that $\mathbf{z}$ is the result of a burst of deletions of length at most $k$ occurring to $\mathbf{u}$ and that we are given $f^{bc}_k(\mathbf{u})$. We will show that it is possible to uniquely recover $\mathbf{u}$ from $\mathbf{z}$ given $f^{bc}_k(\mathbf{u})$, which is equivalent to showing that for any $\mathbf{y} \in \mathcal{B}^{bc}_k(\mathbf{u})$, $f^{bc}_k(\mathbf{u}) \ne f^{bc}_k(\mathbf{y})$. Suppose $f^{bc}_k(\mathbf{u}) = (a_{k,1}, \dots, a_{k,k}, a_{k-1,1}, \dots, a_{k-1,k-1}, \dots, a_{1,1})$ and that $|\mathbf{z}| = n - s$, so that $\mathbf{z}$ is the result of a burst of $s \le k$ consecutive deletions occurring to $\mathbf{u}$.

Consider the sequences:

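$\mathbf{z}^{(1)} = (z_1, z_{1+s}, z_{1+2s}, \dots, z_{n-2s+1})$,

$\mathbf{z}^{(2)} = (z_2, z_{2+s}, z_{2+2s}, \dots, z_{n-2s+2})$,

$\vdots$

$\mathbf{z}^{(s)} = (z_s, z_{2s}, z_{3s}, \dots, z_{n-s})$.

Footnote 2. We can replace $n/i$ with $\lceil n/i \rceil$ if $i \nmid n$.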

Also, let

$\mathbf{u}^{(1)} = (u_1, u_{1+s}, u_{1+2s}, \dots, u_{n-s+1})$,

$\mathbf{u}^{(2)} = (u_2, u_{2+s}, u_{2+2s}, \dots, u_{n-s+2})$,

$\vdots$

$\mathbf{u}^{(s)} = (u_s, u_{2s}, u_{3s}, \dots, u_n)$.

Since for $i \in [s]$, $\mathbf{z}^{(i)}$ is the result of a single deletion occurring to $\mathbf{u}^{(i)}$, it is possible to recover $\mathbf{u}^{(i)}$ given $\mathbf{z}^{(i)}$ and $a_{s,1}, a_{s,2}, \dots, a_{s,s}$, since

$$\Big\{ \mathbf{u}^{(i)} = (u^{(i)}_1, \dots, u^{(i)}_{n/s}) \in \{0,1\}^{n/s} : \sum_{j=1}^{n/s} j\, u^{(i)}_j \equiv a_{s,i} \bmod (\tfrac{n}{s}+1) \Big\}$$

is a code capable of correcting a single deletion. □
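The recovery step in this proof can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the burst has exactly $s$ consecutive deletions, $s$ divides $n$, and each $a_{s,i}$ is the standard Varshamov-Tenengolts syndrome $\sum_j j\,x_j \bmod (\ell+1)$ of the $i$-th interleaved subsequence. The single-deletion decoder is a brute-force stand-in chosen for clarity, and the names `vt_insert` and `recover_from_burst` are illustrative.

```python
def vt_syndrome(bits):
    """Standard VT syndrome: sum of j * bits[j-1] for j = 1..len, mod (len + 1)."""
    return sum(j * b for j, b in enumerate(bits, start=1)) % (len(bits) + 1)


def vt_insert(y, a):
    """Brute-force single-deletion decoder.

    Tries every way of inserting one bit into y and returns the first candidate
    whose VT syndrome equals a (None if there is no such candidate).
    """
    for pos in range(len(y) + 1):
        for bit in (0, 1):
            cand = y[:pos] + [bit] + y[pos:]
            if vt_syndrome(cand) == a:
                return cand
    return None


def recover_from_burst(z, syndromes, s):
    """Recover u from z, where z is u with s consecutive symbols deleted.

    syndromes[i] is the VT syndrome of the interleaved subsequence u[i::s].
    """
    u = [0] * (len(z) + s)
    for i in range(s):
        # z[i::s] is u[i::s] with exactly one symbol deleted.
        part = vt_insert(z[i::s], syndromes[i])
        if part is None:
            raise ValueError("no candidate matches the given syndrome")
        u[i::s] = part
    return u


u = [1, 0, 1, 1, 0, 0, 1, 0]
s = 2
syndromes = [vt_syndrome(u[i::s]) for i in range(s)]
z = u[:3] + u[3 + s:]  # burst of s deletions starting at 0-indexed position 3
assert recover_from_burst(z, syndromes, s) == u
```

A linear-time single-deletion decoder could replace `vt_insert`; the brute-force version is used here only because it is easy to check by hand.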

From Lemma 4.5.1, the mapping $f^{bc}_k$ satisfies the confusability property. Furthermore, $f^{bc}_k$ satisfies the redundancy property, since $\log \prod_{s=1}^{k} (\tfrac{n}{s}+1)^s \le O\big(k^2 \log(n+k)\big)$ and $k$ is assumed to be a constant. Therefore, from Theorem 4.2.1, for any $\mathbf{u} \in \{0,1\}^n$ there exists an integer $a$ such that $a \le 2^{\log|\mathcal{B}^{bc}_k(\mathbf{u})| + o(\log n)}$ and, for any $\mathbf{y} \in \mathcal{B}^{bc}_k(\mathbf{u})$, $f^{bc}_k(\mathbf{u}) \not\equiv f^{bc}_k(\mathbf{y}) \bmod a$.
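For small parameters, the existence of such a modulus $a$ can be explored by exhaustive search. The sketch below is a brute-force stand-in for the existence argument of Theorem 4.2.1, not the procedure from the text. It assumes that each label component is the standard Varshamov-Tenengolts syndrome, that the label is packed into one integer with mixed-radix digits, and that the confusable set consists of all $\mathbf{y} \ne \mathbf{u}$ of length $n$ sharing a burst-deletion outcome with $\mathbf{u}$. All function names are illustrative.

```python
from itertools import product


def vt_syndrome(bits):
    """Standard VT syndrome: sum of j * bits[j-1] for j = 1..len, mod (len + 1)."""
    return sum(j * b for j, b in enumerate(bits, start=1)) % (len(bits) + 1)


def label_as_int(u, k):
    """Pack the k(k+1)/2 syndromes into one integer (mixed-radix digits)."""
    n, value = len(u), 0
    for s in range(k, 0, -1):
        for i in range(s):
            value = value * (n // s + 1) + vt_syndrome(u[i::s])
    return value


def burst_confusable(u, k):
    """All tuples y != u of the same length that share with u the result of a
    burst of at most k consecutive deletions."""
    n, out = len(u), set()
    for s in range(1, k + 1):
        for start in range(n - s + 1):
            z = u[:start] + u[start + s:]
            for pos in range(len(z) + 1):
                for ins in product((0, 1), repeat=s):
                    y = z[:pos] + ins + z[pos:]
                    if y != u:
                        out.add(y)
    return out


def smallest_modulus(u, k):
    """Smallest a with label(u) != label(y) (mod a) for every confusable y."""
    fu = label_as_int(u, k)
    diffs = {abs(fu - label_as_int(y, k)) for y in burst_confusable(u, k)}
    if 0 in diffs:
        raise ValueError("label does not separate u from a confusable y")
    a = 1
    while any(d % a == 0 for d in diffs):
        a += 1
    return a


u = (1, 0, 1, 1, 0, 0)  # tuple of bits; 1 and 2 both divide len(u)
a = smallest_modulus(u, k=2)
fu = label_as_int(u, 2)
assert all((fu - label_as_int(y, 2)) % a != 0 for y in burst_confusable(u, 2))
```

The search always terminates, because any $a$ larger than the largest label difference separates $\mathbf{u}$ from every confusable $\mathbf{y}$. The point of syndrome compression is that a much smaller $a$ exists.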

We define our code $\mathcal{C}^{bc}(N, k)$ with $N = n + 2\log|\mathcal{B}^{bc}_k(\mathbf{x})| + o(\log n)$ as follows:

$$\mathcal{C}^{bc}(N, k) = \Big\{ \big(\mathbf{u}, 1, 0^k, 1^k, 0, a, f^{bc}_k(\mathbf{u}) \bmod a\big) : \mathbf{u} \in \{0,1\}^n \Big\}. \qquad (4.7)$$

We now prove the following theorem and thus prove Theorem 4.1.3. In the statement below, $\mathbf{u}$ is the information portion of the sequence (the non-redundancy part) from (4.7).

Theorem 4.5.1. Let $\mathbf{z}$ be the result of a burst of consecutive deletions of length at most $k$ occurring to $\mathbf{x} \in \mathcal{C}^{bc}(N, k)$. Then we can uniquely determine $\mathbf{x}$ from $\mathbf{z}$.

Proof. To prove the result, we show how to recover $\mathbf{u}$ from $\mathbf{z}$. To do so, we show that it is possible to separate $\mathbf{z}$ into two parts $\mathbf{z}_1$ and $\mathbf{z}_2$, where either a) $\mathbf{z}_1$ is the result of a burst of deletions of length at most $k$ occurring to $\mathbf{u}$, or b) $\mathbf{z}_2$ is the result of a burst of deletions of length at most $k$ occurring to $\mathbf{r} = (1, 0^k, 1^k, 0, a, f^{bc}_k(\mathbf{u}) \bmod a)$. Note that if $\mathbf{z}_1 \ne \mathbf{u}$, then (due to the length of the burst) $\mathbf{z}_2 = \mathbf{r}$, and the fact that we can recover $\mathbf{u}$ from $\mathbf{z}_1$ given $\mathbf{r}$ follows immediately from Theorem 4.2.1. If b) holds and $\mathbf{z}_2 \ne \mathbf{r}$, then by similar logic $\mathbf{u} = \mathbf{z}_1$. Whether $\mathbf{z}_1 \ne \mathbf{u}$ can be determined immediately from the length of $\mathbf{z}_1$ (due to the deletions), and similarly we can detect when $\mathbf{z}_2 \ne \mathbf{r}$ from the length of $\mathbf{z}_2$. Therefore, in the remainder of the proof we show how to recover $\mathbf{z}_1, \mathbf{z}_2$ from $\mathbf{z}$, assuming that a burst of $s \le k$ deletions has occurred to $\mathbf{x}$, resulting in $\mathbf{z}$.

In order to separate $\mathbf{z}$ into $\mathbf{z}_1$ and $\mathbf{z}_2$, we make use of the marker sequence $1, 0^k, 1^k, 0$, which is embedded into every codeword in our code according to (4.7). Let $|\mathbf{z}| = N - s$. If

$$(z_{n+1}, z_{n+2}, \dots, z_{n+2k-s+1}) = (0^{k-s+1}, 1^k), \qquad (4.8)$$

then it is straightforward to observe that $\mathbf{z}_2 = \mathbf{r}$, where $\mathbf{z}_2$ is equal to the last $N - n$ bits of $\mathbf{z}$. We set $\mathbf{z}_1$ to be equal to the first $n - s$ bits of $\mathbf{z}$, so that by the previous discussion we can recover $\mathbf{u}$ from $\mathbf{z}$.

Next, suppose that

$$(z_{n+1}, z_{n+2}, \dots, z_{n+k+1}) = (1, 0^k). \qquad (4.9)$$

In this case the burst of length at most $k$ could not have started in any of the positions from the set $[n] = \{1, 2, \dots, n\}$, which implies that $\mathbf{u}$ is equal to the first $n$ bits of $\mathbf{z}$.

The only case left to consider is where the deletion begins in the marker sequence $1, 0^k, 1^k, 0$. First note that if the deletion occurs in the marker sequence, then (4.8) can hold only if the deletion begins in position $n+1$ in $\mathbf{x}$. In this case, it is straightforward to verify that the decoding described for (4.8) will still generate $\mathbf{u}$, since $\mathbf{r}$ is still equal to the last $N - n$ bits of $\mathbf{z}$. If the deletion begins in one of the positions $\{n+2, n+3, \dots, n+k+1\}$, then

$$(z_{n+1}, z_{n+2}, \dots, z_{n+1+k}) = (1, 0^j, 1^{k-j})$$

for some $j < k$, so that neither (4.8) nor (4.9) can hold. If the deletion begins in the marker sequence after position $n+k+1$ in $\mathbf{x}$, then (4.9) holds and the decoding is correct in this case as well. □
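The case analysis of this proof can be sketched as a small classifier. The sketch assumes a codeword laid out as $(\mathbf{u}, 1, 0^k, 1^k, 0, \text{tail})$, where the tail stands for the bits encoding $a$ and $f^{bc}_k(\mathbf{u}) \bmod a$, whose binary representation the text does not specify. Treating the "neither test holds" case as "the first $n$ bits equal $\mathbf{u}$" is an inference from the proof: a burst that starts after position $n$ leaves $\mathbf{u}$ untouched. The function name and return tags are illustrative.

```python
def split_received(z, n, k, N):
    """Classify a received word z of a length-N codeword (u, marker, tail).

    Returns ("4.8", z1) when test (4.8) holds, where z1 is u after a burst of
    s = N - len(z) deletions. Otherwise returns ("4.9", u) or ("neither", u),
    where u is read off as the first n bits of z.
    """
    s = N - len(z)
    if s == 0:
        return "no_deletion", z[:n]
    # 0-indexed slice for (z_{n+1}, ..., z_{n+2k-s+1}).
    if z[n:n + 2 * k - s + 1] == [0] * (k - s + 1) + [1] * k:
        return "4.8", z[:n - s]
    # 0-indexed slice for (z_{n+1}, ..., z_{n+k+1}).
    if z[n:n + k + 1] == [1] + [0] * k:
        return "4.9", z[:n]
    # Assumption: the burst started inside the marker, so u is intact.
    return "neither", z[:n]


n, k = 4, 2
u = [1, 0, 1, 1]
marker = [1] + [0] * k + [1] * k + [0]
tail = [0, 1, 1]                      # placeholder for (a, f mod a)
x = u + marker + tail
z = x[:1] + x[3:]                     # burst of 2 deletions inside u
print(split_received(z, n, k, len(x)))  # ('4.8', [1, 1])
```

When the tag is `"4.8"`, the returned $\mathbf{z}_1$ still has to be decoded with the syndrome information in the tail, as described in the proof.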

Codes correcting bursts of deletions

Next, we consider a more general type of burst error pattern. In this section, we want to correct $k$ bursts, each occurring within a window of length at most $t_L$, where the deletions in each burst need not occur consecutively. For shorthand, we refer to these codes as $(k, t_L)$-burst codes. The main result here is that, when $k, t_L$ are constants, there exist $(k, t_L)$-burst codes with redundancy $4k(1+\epsilon)\log n$ for $k, n$ large enough.

We begin by introducing some notation, and then we proceed to our code construction. We say that $\mathbf{z} \in \{0,1\}^{n-|J|}$ is the result of $k$ bursts, each occurring within a window of length at most $t_L$, occurring to $\mathbf{x} \in \{0,1\}^n$ if there exist sets $J, J_b \subseteq [n]$ with $|J| \le k \cdot t_L$ and $|J_b| = k$ such that the following hold:

1. $\mathbf{z}$ can be obtained by deleting symbols from $\mathbf{x}$ in positions $J$.
2. For any $j \in J$, there exists an $i \in J_b$ where $|j - i| < t_L$.

We illustrate this notation in the following example.

Example 4.5.1. Suppose $\mathbf{x} = (0,1,1,1,0,1,0,0,0,1,1,0,0) \in \{0,1\}^{13}$ is in a $(2,3)$-burst code. Let

$\mathbf{z} = (0,1,1,0,0,0,0,1,1,0) \in \{0,1\}^{10}$.

Then we can claim that $\mathbf{z}$ is the result of 2 bursts of deletions, each within a window of length at most 3, since we can write $J = \{4, 6, 12\}$ and $J_b = \{4, 12\}$ with $t_L = 3$. It follows that, given $\mathbf{z}$, it is possible to uniquely recover $\mathbf{x}$ provided $\mathbf{x}$ is in a $(2,3)$-burst code.
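The two conditions of the definition can be checked mechanically for this example. The following Python sketch uses 1-indexed positions as in the text; the helper names `delete_positions` and `is_burst_pattern` are illustrative.

```python
def delete_positions(x, positions):
    """Delete the 1-indexed positions in `positions` from the list x."""
    drop = set(positions)
    return [b for j, b in enumerate(x, start=1) if j not in drop]


def is_burst_pattern(J, J_b, k, t_l):
    """Check |J| <= k * t_l, |J_b| = k, and that every j in J lies at
    distance < t_l from some i in J_b."""
    return (
        len(J) <= k * t_l
        and len(J_b) == k
        and all(any(abs(j - i) < t_l for i in J_b) for j in J)
    )


x = [0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0]
z = [0, 1, 1, 0, 0, 0, 0, 1, 1, 0]
J, J_b = {4, 6, 12}, {4, 12}

assert delete_positions(x, J) == z              # condition 1
assert is_burst_pattern(J, J_b, k=2, t_l=3)     # condition 2
```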

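For a vector $\mathbf{x} \in \{0,1\}^m$, let $B_{k,t_L}(\mathbf{x})$ be the set of vectors that can result when $k$ bursts, each occurring within a window of length at most $t_L$, occur to $\mathbf{x}$. Then, define $\mathcal{B}^{b}_{k,t_L}(\mathbf{x}) \subseteq \{0,1\}^m$ so that

$$\mathcal{B}^{b}_{k,t_L}(\mathbf{x}) = \{ \mathbf{y} \in \{0,1\}^m : B_{k,t_L}(\mathbf{x}) \cap B_{k,t_L}(\mathbf{y}) \ne \emptyset,\ \mathbf{y} \ne \mathbf{x} \}.$$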
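For very short vectors, both sets can be enumerated directly from the definitions. The sketch below is exponential-time and meant only as an illustration. It follows the definition literally, with $|J_b| = k$ distinct centers and $J$ allowed to be empty, and the names `burst_ball` and `confusable_set` are illustrative.

```python
from itertools import combinations, product


def burst_ball(x, k, t_l):
    """All tuples obtainable from x by deleting a set J of positions such that
    |J| <= k * t_l and every j in J is at distance < t_l from one of k centers."""
    n = len(x)
    ball = set()
    for centers in combinations(range(n), k):
        allowed = [j for j in range(n) if any(abs(j - i) < t_l for i in centers)]
        for size in range(min(len(allowed), k * t_l) + 1):
            for dels in combinations(allowed, size):
                drop = set(dels)
                ball.add(tuple(b for j, b in enumerate(x) if j not in drop))
    return ball


def confusable_set(x, k, t_l):
    """All y != x of the same length whose ball intersects the ball of x."""
    bx = burst_ball(x, k, t_l)
    return {
        y
        for y in product((0, 1), repeat=len(x))
        if y != tuple(x) and bx & burst_ball(y, k, t_l)
    }


x = (0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0)
z = (0, 1, 1, 0, 0, 0, 0, 1, 1, 0)
assert z in burst_ball(x, k=2, t_l=3)          # the pair from Example 4.5.1

print(sorted(confusable_set((0, 1, 0), k=1, t_l=1)))
```

With $k = 1$ and $t_L = 1$ the definition reduces to at most one deletion, so the last line lists the length-3 words that are confusable with $(0,1,0)$ under a single deletion.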

Clearly, if $\mathbf{x}$ is in a $(k, t_L)$-burst code, then $\mathbf{y}$ cannot be in the same code for any $\mathbf{y} \in \mathcal{B}^{b}_{k,t_L}(\mathbf{x})$. The following claim follows from straightforward counting arguments.

Claim 4.5.2. For integers $k, t_L, m$, and any $\mathbf{u} \in \{0,1\}^m$,

$$|\mathcal{B}^{b}_{k,t_L}(\mathbf{u})| \le m^{2k} \cdot \big( (t_L+1)^k\, 2^{k \cdot t_L} \big)^2.$$

In order to apply the syndrome compression technique, we need to specify the labeling and also to show that the redundancy and confusability properties hold.

For this setup, we will use the same systematic labeling used to correct multiple deletions that was introduced in Sec. 4.4. More specifically, we will use the labeling $g$ defined in (4.6). It follows immediately from our definitions and Lemma 4.4.4 that if $\mathbf{u}, \mathbf{y} \in \{0,1\}^n$ and $\mathbf{y} \in \mathcal{B}^{b}_{k,t_L}(\mathbf{u}) \setminus \{\mathbf{u}\}$, then

$$f^{sys}_k(\mathbf{u}) \ne f^{sys}_k(\mathbf{y}),$$

so that the confusability property holds. The redundancy property also follows immediately from the definition of $g$, since $k, t_L$ are constants. Thus, to construct $(k, t_L)$-burst codes, we can apply the same syndrome compression procedure as described in Sec. 4.3 and Sec. 4.4, except that we will search for an $a \in [[\, n^{2k} \cdot \big( (t_L+1)^k\, 2^{k \cdot t_L} \big)^2 \,]]$ such that $f^{sys}_k(\mathbf{u}) \not\equiv f^{sys}_k(\mathbf{y}) \bmod a$ for any $\mathbf{y} \in \mathcal{B}^{b}_{k,t_L}(\mathbf{u})$. Since $\log a \le 2k \log n + o(\log n)$ for $n$ large enough, the resulting construction is systematic and has redundancy $4k \log n + o(\log n)$ for $k, n$ large enough. Hence, we have Theorem 4.1.4.

From the document: Correcting Errors in DNA Storage, California Institute of Technology (pages 119-124)