

Chapter IV: Correcting Deletions/Insertions-Generalizations

4.5 Codes Capable of Correcting Bursts of Deletions

length-$(N_2-k)$ subsequence of $\mathrm{Rep}_{k+1}(g^c(g^c(\mathbf{c})))$, which is a $k$-deletion correcting code. Therefore $g^c(g^c(\mathbf{c}))$ can be recovered. In addition, $(z_{n+1}, \ldots, z_{n+N_1-k})$ is a length-$(N_1-k)$ subsequence of $g^c(\mathbf{c})$. Since $g^c(g^c(\mathbf{c}))$ is a $k$-deletion correcting hash of $g^c(\mathbf{c})$, the hash $g^c(\mathbf{c})$ can be recovered. Finally, since $(z_1, \ldots, z_{n-k})$ is a length-$(n-k)$ subsequence of $\mathbf{c}$, we can use $g^c(\mathbf{c})$ to recover $\mathbf{c}$. The decoding of $\mathbf{c}$ from $g^c(\mathbf{c})$ is done by brute force over all sequences $\mathbf{c}'$ that satisfy $\mathbf{d} \in \mathcal{B}_k(\mathbf{c}')$.

The hash $g^c(\mathbf{c})$ is computed by brute force over the sequences $\mathbf{c}' \in \mathcal{B}_k(\mathbf{c})$. Hence the encoding and decoding complexities are $O(n^{2k+1})$ and $O(n^{k+1})$, respectively.

of a Varshamov-Tenengolts code $k(k+1)/2$ times. For notational convenience, suppose $i \mid n$ for $i \in [k]$, where $[k] = \{1, \ldots, k\}$.² Then for a vector $\mathbf{u} \in \{0,1\}^n$, define

$$
f^{bc}_k : \{0,1\}^n \to [\![\tfrac{n}{k}+1]\!]^{k} \times [\![\tfrac{n}{k-1}+1]\!]^{k-1} \times \cdots \times [\![n+1]\!]
$$

as

$$
\begin{aligned}
f^{bc}_k(\mathbf{u}) = \Bigg(\, & \sum_{j=0}^{n/k-1} (j+1)\,u_{k\cdot j+1} \bmod \left(\tfrac{n}{k}+1\right),\ \ldots,\ \sum_{j=0}^{n/k-1} (j+1)\,u_{k\cdot j+k} \bmod \left(\tfrac{n}{k}+1\right), \\
& \sum_{j=0}^{n/(k-1)-1} (j+1)\,u_{(k-1)\cdot j+1} \bmod \left(\tfrac{n}{k-1}+1\right),\ \ldots,\ \sum_{j=0}^{n/(k-1)-1} (j+1)\,u_{(k-1)\cdot j+k-1} \bmod \left(\tfrac{n}{k-1}+1\right), \\
& \sum_{j=0}^{n/(k-2)-1} (j+1)\,u_{(k-2)\cdot j+1} \bmod \left(\tfrac{n}{k-2}+1\right),\ \ldots,\ \sum_{j=0}^{n/(k-2)-1} (j+1)\,u_{(k-2)\cdot j+k-2} \bmod \left(\tfrac{n}{k-2}+1\right), \\
& \qquad \vdots \\
& \sum_{j=1}^{n} j\,u_j \bmod (n+1) \Bigg) \in [\![\tfrac{n}{k}+1]\!]^{k} \times [\![\tfrac{n}{k-1}+1]\!]^{k-1} \times \cdots \times [\![n+1]\!].
\end{aligned}
$$

In other words, for each $s \in [k]$ and $i \in [s]$, the component $a_{s,i}$ of $f^{bc}_k(\mathbf{u})$ is the Varshamov-Tenengolts syndrome of the subsequence $(u_i, u_{i+s}, \ldots, u_{n-s+i})$. For convenience, we will sometimes regard $f^{bc}_k(\mathbf{u})$ as an integer in $[\![\prod_{s=1}^{k}(\tfrac{n}{s}+1)^{s}]\!]$, where $\prod_{s=1}^{k}(\tfrac{n}{s}+1)^{s} \le O\!\left(\tfrac{(n+k)^{k^2}}{k!}\right)$. It is straightforward to show that $f^{bc}_k$ satisfies the confusability property, but we include the following lemma for completeness.

Lemma 4.5.1. Suppose $\mathbf{u} \in \{0,1\}^n$. Then for any $\mathbf{y} \in \mathcal{B}^{bc}_k(\mathbf{u})$, $f^{bc}_k(\mathbf{u}) \neq f^{bc}_k(\mathbf{y})$.

Proof. Assume that $\mathbf{z}$ is the result of a burst of deletions of length at most $k$ occurring to $\mathbf{u}$, and that we are given $f^{bc}_k(\mathbf{u})$. We show that $\mathbf{u}$ can be uniquely recovered from $\mathbf{z}$ given $f^{bc}_k(\mathbf{u})$, which is equivalent to showing that $f^{bc}_k(\mathbf{u}) \neq f^{bc}_k(\mathbf{y})$ for any $\mathbf{y} \in \mathcal{B}^{bc}_k(\mathbf{u})$. Suppose $f^{bc}_k(\mathbf{u}) = (a_{k,1}, \ldots, a_{k,k}, a_{k-1,1}, \ldots, a_{k-1,k-1}, \ldots, a_{1,1})$ and that $|\mathbf{z}| = n-s$, so that $\mathbf{z}$ is the result of a burst of $s \le k$ consecutive deletions occurring to $\mathbf{u}$.

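Consider the sequences

$$
\begin{aligned}
\mathbf{z}^{(1)} &= (z_1, z_{1+s}, z_{1+2s}, \ldots, z_{n-2s+1}),\\
\mathbf{z}^{(2)} &= (z_2, z_{2+s}, z_{2+2s}, \ldots, z_{n-2s+2}),\\
&\ \ \vdots\\
\mathbf{z}^{(s)} &= (z_s, z_{2s}, z_{3s}, \ldots, z_{n-s}).
\end{aligned}
$$

² We can replace $n/i$ with $\lceil n/i \rceil$ if $i \nmid n$.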

Also, let

$$
\begin{aligned}
\mathbf{u}^{(1)} &= (u_1, u_{1+s}, u_{1+2s}, \ldots, u_{n-s+1}),\\
\mathbf{u}^{(2)} &= (u_2, u_{2+s}, u_{2+2s}, \ldots, u_{n-s+2}),\\
&\ \ \vdots\\
\mathbf{u}^{(s)} &= (u_s, u_{2s}, u_{3s}, \ldots, u_n).
\end{aligned}
$$

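Because the burst deletes $s$ consecutive positions, every symbol of $\mathbf{u}$ after the burst moves left by exactly $s$ positions and therefore stays in the same residue class modulo $s$; moreover, the burst removes exactly one symbol from each residue class. Hence, for $i \in [s]$, $\mathbf{z}^{(i)}$ is the result of a single deletion occurring to $\mathbf{u}^{(i)}$, and it is possible to recover $\mathbf{u}^{(i)}$ given $\mathbf{z}^{(i)}$ and $a_{s,i}$, since

$$
\left\{ \mathbf{u}^{(i)} = \big(u^{(i)}_1, \ldots, u^{(i)}_{n/s}\big) \in \{0,1\}^{n/s} : \sum_{j=1}^{n/s} j\,u^{(i)}_j \equiv a_{s,i} \bmod \left(\tfrac{n}{s}+1\right) \right\}
$$

is a code capable of correcting a single deletion. Interleaving $\mathbf{u}^{(1)}, \ldots, \mathbf{u}^{(s)}$ then yields $\mathbf{u}$. □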
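The recovery step of the proof can be sketched in Python. The single-deletion decoder is Levenshtein's standard decoder for the VT code $\{\mathbf{x} \in \{0,1\}^m : \sum_j j\,x_j \equiv a \bmod (m+1)\}$. We assume that the labeling stores $a_{s,i} = \sum_j j\,u^{(i)}_j \bmod (n/s+1)$ in the order $(a_{k,1}, \ldots, a_{k,k}, \ldots, a_{1,1})$ and that $s \mid n$; all function names are hypothetical.

```python
from itertools import product


def vt_syndrome(x, mod):
    return sum(j * b for j, b in enumerate(x, start=1)) % mod


def vt_correct_deletion(y, a, m):
    """Recover x in {0,1}^m with sum_j j*x_j = a (mod m+1) from y, which is x
    with one bit deleted (Levenshtein's VT decoder)."""
    y = list(y)
    w = sum(y)
    d = (a - vt_syndrome(y, m + 1)) % (m + 1)
    if d <= w:
        # A 0 was deleted: reinsert it so that exactly d ones lie to its right.
        idx, ones = len(y), 0
        while ones < d:
            idx -= 1
            ones += y[idx]
        return tuple(y[:idx] + [0] + y[idx:])
    # A 1 was deleted: reinsert it so that exactly d - w - 1 zeros lie to its left.
    idx, zeros = 0, 0
    while zeros < d - w - 1:
        zeros += 1 - y[idx]
        idx += 1
    return tuple(y[:idx] + [1] + y[idx:])


def f_bc(u, k):
    """Assumed labeling: a_{s,i} = VT syndrome of u^{(i)} = (u_i, u_{i+s}, ...)
    modulo n/s + 1, ordered (a_{k,1}, ..., a_{k,k}, ..., a_{1,1})."""
    n = len(u)
    return tuple(vt_syndrome(u[i - 1::s], n // s + 1)
                 for s in range(k, 0, -1) for i in range(1, s + 1))


def recover_from_burst(z, n, k, label):
    """Recover u from z, where z is u after a burst of s = n - |z| <= k deletions."""
    s = n - len(z)
    if s == 0:
        return tuple(z)
    offset = sum(range(s + 1, k + 1))  # components a_{k,.}, ..., a_{s+1,.} come first
    m = n // s
    u = [0] * n
    for i in range(1, s + 1):
        z_i = z[i - 1::s]  # z^{(i)}: u^{(i)} with exactly one bit deleted
        u[i - 1::s] = vt_correct_deletion(z_i, label[offset + i - 1], m)
    return tuple(u)


# Exhaustive check for n = 6, k = 3: every burst of length <= 3 is corrected.
n, k = 6, 3
for u in product((0, 1), repeat=n):
    for s in range(1, k + 1):
        for p in range(n - s + 1):
            z = u[:p] + u[p + s:]
            assert recover_from_burst(z, n, k, f_bc(u, k)) == u
print("all bursts corrected")
```

Note that $s$ itself is read off from the length of $\mathbf{z}$, so the decoder only consults the $s$ components $a_{s,1}, \ldots, a_{s,s}$.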

From Lemma 4.5.1, the mapping $f^{bc}_k$ satisfies the confusability property. Furthermore, $f^{bc}_k$ satisfies the redundancy property since

$$
\log\left(\prod_{s=1}^{k}\left(\tfrac{n}{s}+1\right)^{s}\right) \le O\big(k^2 \log(n+k)\big),
$$

and $k$ is assumed to be a constant. Therefore, from Theorem 4.2.1, for any $\mathbf{u} \in \{0,1\}^n$ there exists an integer $a$ such that $a \le 2^{\log|\mathcal{B}^{bc}_k(\mathbf{u})| + o(\log n)}$ and, for any $\mathbf{y} \in \mathcal{B}^{bc}_k(\mathbf{u})$, $f^{bc}_k(\mathbf{u}) \not\equiv f^{bc}_k(\mathbf{y}) \bmod a$.

We define our code $\mathcal{C}^{bc}(N,k)$, with $N = n + 2\log|\mathcal{B}^{bc}_k(\mathbf{x})| + o(\log n)$, as follows:

$$
\mathcal{C}^{bc}(N,k) = \left\{ \mathbf{x} = \left(\mathbf{u}, 1, 0^k, 1^k, 0, a, f^{bc}_k(\mathbf{u}) \bmod a\right) : \mathbf{u} \in \{0,1\}^n \right\}. \tag{4.7}
$$

We now prove the following theorem, which proves Theorem 4.1.3. In the statement below, $\mathbf{u}$ is the information portion of the sequence (the non-redundant part) from (4.7).

Theorem 4.5.1. Let $\mathbf{z}$ be the result of a burst of at most $k$ consecutive deletions occurring to $\mathbf{x} \in \mathcal{C}^{bc}(N,k)$. Then we can uniquely determine $\mathbf{x}$ from $\mathbf{z}$.

Proof. To prove the result, we show how to recover $\mathbf{u}$ from $\mathbf{z}$. To do so, we show that it is possible to separate $\mathbf{z}$ into two parts, $\mathbf{z}_1$ and $\mathbf{z}_2$, where either a) $\mathbf{z}_1$ is the result of a burst of deletions of length at most $k$ occurring to $\mathbf{u}$, or b) $\mathbf{z}_2$ is the result of a burst of deletions of length at most $k$ occurring to $\mathbf{r} = (1, 0^k, 1^k, 0, a, f^{bc}_k(\mathbf{u}) \bmod a)$. Note that if $\mathbf{z}_1 \neq \mathbf{u}$, then (due to the length of the burst) $\mathbf{z}_2 = \mathbf{r}$, and the fact that we can recover $\mathbf{u}$ from $\mathbf{z}_1$ given $\mathbf{r}$ follows immediately from Theorem 4.2.1. If b) holds and $\mathbf{z}_2 \neq \mathbf{r}$, then by similar logic $\mathbf{u} = \mathbf{z}_1$. Whether $\mathbf{z}_1 \neq \mathbf{u}$ can be determined immediately from the length of $\mathbf{z}_1$ (due to the deletions), and similarly we can detect whether $\mathbf{z}_2 \neq \mathbf{r}$ from the length of $\mathbf{z}_2$. Therefore, in the remainder of the proof we show how to recover $\mathbf{z}_1, \mathbf{z}_2$ from $\mathbf{z}$, assuming that a burst of $s \le k$ deletions has occurred to $\mathbf{x}$, resulting in $\mathbf{z}$.

To separate $\mathbf{z}$ into $\mathbf{z}_1$ and $\mathbf{z}_2$, we make use of the marker sequence $1, 0^k, 1^k, 0$, which is embedded into every codeword of our code according to (4.7). Let $|\mathbf{z}| = N - s$. If

$$
(z_{n+1}, z_{n+2}, \ldots, z_{n+2k-s+1}) = (0^{k-s+1}, 1^k), \tag{4.8}
$$

then it is straightforward to observe that $\mathbf{z}_2 = \mathbf{r}$, where $\mathbf{z}_2$ is equal to the last $N - n$ bits of $\mathbf{z}$. We set $\mathbf{z}_1$ to be equal to the first $n - s$ bits of $\mathbf{z}$ so that, by the previous discussion, we can recover $\mathbf{u}$ from $\mathbf{z}$.

Next, suppose that

$$
(z_{n+1}, z_{n+2}, \ldots, z_{n+k+1}) = (1, 0^k). \tag{4.9}
$$

In this case the burst could not have started in any of the positions in $[n] = \{1, 2, \ldots, n\}$ (otherwise $z_{n+1} = x_{n+1+s} = 0$), which implies that $\mathbf{u}$ is equal to the first $n$ bits of $\mathbf{z}$.

The only case left to consider is when the burst begins in the marker sequence $1, 0^k, 1^k, 0$. First note that if the burst begins in the marker sequence, then (4.8) can hold only if the burst begins in position $n+1$ of $\mathbf{x}$. In this case, it is straightforward to verify that the decoding described above still generates $\mathbf{u}$: the first $n-s$ bits of $\mathbf{z}$ are obtained from $\mathbf{u}$ by deleting its last $s$ bits (a burst of length $s$), and since the burst does not reach $a$ or $f^{bc}_k(\mathbf{u}) \bmod a$, these still occupy the last bits of $\mathbf{z}$. If the burst begins in one of the positions $\{n+2, n+3, \ldots, n+k+1\}$, then

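$$
(z_{n+1}, z_{n+2}, \ldots, z_{n+k+1}) = (1, 0^{j}, 1^{k-j}) \quad \text{for some } j < k,
$$

so that neither (4.8) nor (4.9) can hold; in this case the burst began after position $n+1$, so $\mathbf{u}$ is again equal to the first $n$ bits of $\mathbf{z}$. If the burst begins in the marker sequence after position $n+k+1$ in $\mathbf{x}$, then (4.9) holds and the decoding is correct in this case as well. □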
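The marker-based case analysis can be checked exhaustively for small parameters with the Python sketch below. Codewords are laid out as in (4.7), with a placeholder tuple `info` standing in for the binary encodings of $a$ and $f^{bc}_k(\mathbf{u}) \bmod a$ (their exact encoding is not specified here). When (4.8) holds, the sketch returns the first $n-s$ bits of $\mathbf{z}$, and the check confirms that they are $\mathbf{u}$ after a burst of $s$ deletions while `info` survives at the end of $\mathbf{z}$; otherwise it returns the first $n$ bits, which must equal $\mathbf{u}$. Correcting a burst inside $\mathbf{u}$ is left to the labeling, as in the proof; the function names are hypothetical.

```python
from itertools import product


def marker(k):
    """The marker 1, 0^k, 1^k, 0 placed between u and the redundancy."""
    return (1,) + (0,) * k + (1,) * k + (0,)


def split_received(z, n, k, N):
    """Classify z (x after a burst of s = N - |z| <= k deletions) via the marker.

    Returns ("case_4_8", first n - s bits of z) when (4.8) holds, and
    ("u_intact", first n bits of z) otherwise.
    """
    s = N - len(z)
    if s == 0:
        return "u_intact", tuple(z[:n])
    if tuple(z[n:n + 2 * k - s + 1]) == (0,) * (k - s + 1) + (1,) * k:  # (4.8)
        return "case_4_8", tuple(z[:n - s])
    # Either (4.9) holds, or z_{n+1..n+k+1} = (1, 0^j, 1^(k-j)) with j < k;
    # in both cases the burst started after position n + 1.
    return "u_intact", tuple(z[:n])


def check(n, k, info):
    """Exhaustively test split_received on x = (u, marker, info) for all u."""
    N = n + 2 * k + 2 + len(info)
    for u in product((0, 1), repeat=n):
        x = u + marker(k) + info
        for s in range(1, k + 1):
            for p in range(N - s + 1):
                z = x[:p] + x[p + s:]
                kind, part = split_received(z, n, k, N)
                if kind == "case_4_8":
                    # part is u after a burst of s deletions; the tail is intact
                    assert any(part == u[:q] + u[q + s:] for q in range(n - s + 1))
                    assert z[len(z) - len(info):] == info
                else:
                    assert part == u
    return True


print(check(5, 2, info=(1, 0, 1)))  # True
```

The marker costs only $2k+2$ bits, a constant, so it does not affect the leading term of the redundancy.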

Codes correcting bursts of deletions

Next, we consider a more general type of burst error pattern. In this section, we want to correct $k$ bursts, each occurring within a window of length at most $t_L$, where the deletions in each burst need not occur consecutively. For shorthand, we refer to these codes as $(k, t_L)$-burst codes. The main result here is that, for the case where $k$ and $t_L$ are constants, there exist $(k, t_L)$-burst codes with redundancy $4k(1+\epsilon)\log n$ for $k, n$ large enough.

We begin by introducing some notation, and then proceed to our code construction. We say that $\mathbf{z} \in \{0,1\}^{n-|J|}$ is the result of $k$ bursts, each occurring within a window of length at most $t_L$, occurring to $\mathbf{x} \in \{0,1\}^n$ if there exist sets $J, J_b \subseteq [n]$ with $|J| \le k \cdot t_L$ and $|J_b| = k$ such that the following hold:

1. $\mathbf{z}$ can be obtained by deleting the symbols of $\mathbf{x}$ in positions $J$.
2. For any $j \in J$, there exists an $i \in J_b$ such that $|j - i| < t_L$.

We illustrate these notations in the following example.

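Example 4.5.1. Suppose $\mathbf{x} = (0,1,1,1,0,1,0,0,0,1,1,0,0) \in \{0,1\}^{13}$ is in a $(2,3)$-burst code. Let

$$
\mathbf{z} = (0,1,1,0,0,0,0,1,1,0) \in \{0,1\}^{10}.
$$

Then $\mathbf{z}$ is the result of 2 bursts of deletions, each within a window of length at most 3, since we can write $J = \{4,6,12\}$ and $J_b = \{4,12\}$ with $t_L = 3$: deleting positions 4, 6, and 12 of $\mathbf{x}$ gives $\mathbf{z}$, and each position in $J$ lies at distance less than 3 from a position in $J_b$. It follows that, given $\mathbf{z}$, it is possible to uniquely recover $\mathbf{x}$, provided $\mathbf{x}$ is in a $(2,3)$-burst code.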
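The example can be checked mechanically. The Python sketch below deletes the positions in $J$ from $\mathbf{x}$ (1-indexed, as in the text) and tests the two conditions of the definition; the helper names are illustrative.

```python
def delete_positions(x, J):
    """Delete the symbols of x at the 1-indexed positions in J."""
    return tuple(b for pos, b in enumerate(x, start=1) if pos not in J)


def is_burst_pattern(J, J_b, k, t_L):
    """|J| <= k*t_L, |J_b| = k, and every j in J is within distance < t_L of J_b."""
    return (len(J_b) == k and len(J) <= k * t_L
            and all(any(abs(j - i) < t_L for i in J_b) for j in J))


x = (0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0)
J, J_b = {4, 6, 12}, {4, 12}
z = delete_positions(x, J)
print(z)                               # (0, 1, 1, 0, 0, 0, 0, 1, 1, 0)
print(is_burst_pattern(J, J_b, 2, 3))  # True
```

Moving the second deletion from position 6 to position 8, for instance, would violate condition 2, since 8 is at distance 4 from both 4 and 12.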

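For a vector $\mathbf{x} \in \{0,1\}^m$, let $B_{k,t_L}(\mathbf{x})$ be the set of vectors that can result from $k$ bursts, each occurring within a window of length at most $t_L$, occurring to $\mathbf{x}$. Then define $\mathcal{B}^{b}_{k,t_L}(\mathbf{x}) \subseteq \{0,1\}^m$ as

$$
\mathcal{B}^{b}_{k,t_L}(\mathbf{x}) = \left\{ \mathbf{y} \in \{0,1\}^m : B_{k,t_L}(\mathbf{x}) \cap B_{k,t_L}(\mathbf{y}) \neq \emptyset,\ \mathbf{y} \neq \mathbf{x} \right\}.
$$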

Clearly, if $\mathbf{x}$ is in a $(k, t_L)$-burst code, then no $\mathbf{y} \in \mathcal{B}^{b}_{k,t_L}(\mathbf{x})$ can be in the same code. The following claim follows from straightforward counting arguments.

Claim 4.5.2. For integers $k, t_L, m$ and any $\mathbf{u} \in \{0,1\}^m$,

$$
|\mathcal{B}^{b}_{k,t_L}(\mathbf{u})| \le m^{2k} \cdot (t_L+1)^{k}\, 2^{k \cdot t_L^{2}}.
$$
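For very small $m$, $B_{k,t_L}(\mathbf{x})$ and $\mathcal{B}^{b}_{k,t_L}(\mathbf{x})$ can be enumerated directly from the definition with the sets $J$ and $J_b$. The Python sketch below does so (with 0-indexed positions internally) and compares the largest confusable set with the bound of Claim 4.5.2, taken here as $m^{2k}(t_L+1)^k 2^{k t_L^2}$. It is a sanity check for small parameters, not a proof of the claim; the function names are illustrative.

```python
from itertools import combinations, product


def burst_ball(x, k, t_L):
    """B_{k,t_L}(x): all z obtained by deleting a set J from x, where |J| <= k*t_L
    and every j in J satisfies |j - i| < t_L for some i in J_b, |J_b| = k."""
    m = len(x)
    ball = set()
    for J_b in combinations(range(m), k):
        window = sorted({j for i in J_b for j in range(m) if abs(j - i) < t_L})
        for size in range(min(len(window), k * t_L) + 1):
            for J in combinations(window, size):
                ball.add(tuple(b for j, b in enumerate(x) if j not in J))
    return ball


def confusable_sets(m, k, t_L):
    """Map each x in {0,1}^m to B^b_{k,t_L}(x) = {y != x : balls of x and y meet}."""
    balls = {x: burst_ball(x, k, t_L) for x in product((0, 1), repeat=m)}
    owners = {}
    for x, ball in balls.items():
        for z in ball:
            owners.setdefault(z, set()).add(x)
    return {x: set().union(*(owners[z] for z in ball)) - {x}
            for x, ball in balls.items()}


m, k, t_L = 6, 2, 2
sets = confusable_sets(m, k, t_L)
largest = max(len(s) for s in sets.values())
bound = m ** (2 * k) * (t_L + 1) ** k * 2 ** (k * t_L ** 2)
print(largest, bound, largest <= bound)
```

For such small $m$ the bound is far from tight, since every confusable set lies inside $\{0,1\}^m$; its role is the $m^{2k}$ growth for fixed $k$ and $t_L$.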

In order to apply the syndrome compression technique, we need to specify the labeling and also to show that the redundancy and confusability properties hold.

For this setup, we use the same systematic labeling used to correct multiple deletions that was introduced in Sec. 4.4. More specifically, we use the labeling $g$ defined in (4.6). It follows immediately from our definitions and Lemma 4.4.4 that if $\mathbf{u}, \mathbf{y} \in \{0,1\}^n$ and $\mathbf{y} \in \mathcal{B}^{b}_{k,t_L}(\mathbf{u}) \setminus \{\mathbf{u}\}$, then

$$
f^{sys}_k(\mathbf{u}) \neq f^{sys}_k(\mathbf{y}),
$$

so that the confusability property holds. The redundancy property also follows immediately from the definition of $g$, since $k$ and $t_L$ are constants. Thus, to construct $(k, t_L)$-burst codes, we can apply the same syndrome compression procedure as described in Sec. 4.3 and Sec. 4.4, except that we search for an

$$
a \in [\![\, n^{2k} \cdot (t_L+1)^{k}\, 2^{k \cdot t_L^{2}} \,]\!]
$$

such that $f^{sys}_k(\mathbf{u}) \not\equiv f^{sys}_k(\mathbf{y}) \bmod a$ for any $\mathbf{y} \in \mathcal{B}^{b}_{k,t_L}(\mathbf{u})$. Since $\log a \le 2k\log n + o(\log n)$ for $n$ large enough, the resulting construction is systematic and has redundancy $4k\log n + o(\log n)$ for $k, n$ large enough. Hence, we have Theorem 4.1.4.
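To see how the constants enter, the following Python arithmetic evaluates $\log_2$ of the search range $n^{2k}(t_L+1)^k 2^{k t_L^2}$ (the bound of Claim 4.5.2 with $m = n$, as written in this section) and compares twice that value with the leading term $4k\log n$. It assumes the redundancy is about $2\log a$ bits, one field for $a$ and one for $f^{sys}_k(\mathbf{u}) \bmod a$. The ratio tends to 1 as $n$ grows, which is the sense in which the constant contributions are $o(\log n)$.

```python
from math import log2


def search_range_bits(n, k, t_L):
    """log2 of n^{2k} * (t_L + 1)^k * 2^{k * t_L^2}."""
    return 2 * k * log2(n) + k * log2(t_L + 1) + k * t_L ** 2


for exponent in (20, 40, 80):
    n, k, t_L = 2 ** exponent, 2, 3
    redundancy = 2 * search_range_bits(n, k, t_L)  # about 2 log a bits
    leading = 4 * k * log2(n)
    print(exponent, redundancy, leading, redundancy / leading)
```

For $k = 2$ and $t_L = 3$, the constant part adds $2(k\log_2(t_L+1) + k t_L^2) = 44$ bits regardless of $n$, so the ratio shrinks roughly like $1 + 44/(8\log_2 n)$.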