Chapter IV: Correcting Deletions/Insertions-Generalizations
4.5 Codes Capable of Correcting Bursts of Deletions
length $N_2 - k$ subsequence of $\mathrm{Rep}_{k+1}(g_k(g_k(\mathbf{c})))$, which is a $k$-deletion correcting code. Therefore $g_k(g_k(\mathbf{c}))$ can be recovered. In addition, $(z_{n+1}, \dots, z_{n+N_1-k})$ is a length $N_1 - k$ subsequence of $g_k(\mathbf{c})$. Since $g_k(g_k(\mathbf{c}))$ is a $k$-deletion correcting hash of $g_k(\mathbf{c})$, the hash $g_k(\mathbf{c})$ can be recovered. Finally, since $(z_1, \dots, z_{n-k})$ is a length $n-k$ subsequence of $\mathbf{c}$, we can use $g_k(\mathbf{c})$ to recover $\mathbf{c}$. The decoding of $\mathbf{c}$ from $g_k(\mathbf{c})$ is done by brute force over all sequences $\mathbf{c}'$ that satisfy $\mathbf{d} \in \mathcal{B}_k(\mathbf{c}')$.
The computation of $g_k(\mathbf{c})$ is also done by brute force, over sequences $\mathbf{c}' \in \mathcal{B}_k(\mathbf{c})$. Hence the encoding and decoding complexities are $O(n^{2k+1})$ and $O(n^{k+1})$, respectively.
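of a Varshamov-Tenengolts code $\frac{b(b+1)}{2}$ times. For notational convenience, suppose $i \mid n$ for $i \in [b]$, where $[b] = \{1, \dots, b\}$ (see footnote 2). Then for a vector $\mathbf{u} \in \{0,1\}^n$, define
\[
f^{bd}_b : \{0,1\}^n \to [[\tfrac{n}{b}+1]]^{b} \times [[\tfrac{n}{b-1}+1]]^{b-1} \times \cdots \times [[n+1]]
\]
as
\[
\begin{aligned}
f^{bd}_b(\mathbf{u}) = \Bigg(
& \sum_{j=0}^{\frac{n}{b}-1} (j+1)\, u_{jb+1} \bmod \big(\tfrac{n}{b}+1\big),\;
  \sum_{j=0}^{\frac{n}{b}-1} (j+1)\, u_{jb+2} \bmod \big(\tfrac{n}{b}+1\big),\; \dots,\;
  \sum_{j=0}^{\frac{n}{b}-1} (j+1)\, u_{jb+b} \bmod \big(\tfrac{n}{b}+1\big), \\
& \sum_{j=0}^{\frac{n}{b-1}-1} (j+1)\, u_{j(b-1)+1} \bmod \big(\tfrac{n}{b-1}+1\big),\;
  \sum_{j=0}^{\frac{n}{b-1}-1} (j+1)\, u_{j(b-1)+2} \bmod \big(\tfrac{n}{b-1}+1\big),\; \dots,\;
  \sum_{j=0}^{\frac{n}{b-1}-1} (j+1)\, u_{j(b-1)+b-1} \bmod \big(\tfrac{n}{b-1}+1\big), \\
& \sum_{j=0}^{\frac{n}{b-2}-1} (j+1)\, u_{j(b-2)+1} \bmod \big(\tfrac{n}{b-2}+1\big),\;
  \sum_{j=0}^{\frac{n}{b-2}-1} (j+1)\, u_{j(b-2)+2} \bmod \big(\tfrac{n}{b-2}+1\big),\; \dots,\;
  \sum_{j=0}^{\frac{n}{b-2}-1} (j+1)\, u_{j(b-2)+b-2} \bmod \big(\tfrac{n}{b-2}+1\big), \\
& \qquad \vdots \\
& \sum_{j=1}^{n} j\, u_j \bmod (n+1) \Bigg)
\end{aligned}
\]
which lies in $[[\tfrac{n}{b}+1]]^{b} \times [[\tfrac{n}{b-1}+1]]^{b-1} \times \cdots \times [[n+1]]$. Each entry is the Varshamov-Tenengolts syndrome, with weights $1, 2, \dots$, of one interleaved subsequence of $\mathbf{u}$. For convenience we will sometimes assume that the image of $f^{bd}_b(\mathbf{u})$ is an integer from $[[\prod_{s=1}^{b} (\tfrac{n}{s}+1)^{s}]]$, where $\prod_{s=1}^{b} (\tfrac{n}{s}+1)^{s} \le O\big(\tfrac{(n+b)^{b^2}}{b!}\big)$. It is straightforward to show that $f^{bd}_b$ satisfies the confusability property, but we include the following lemma for completeness.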
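As a rough illustration of the labeling above, the following Python sketch computes the interleaved Varshamov-Tenengolts syndromes of a bit list. The names `vt_syndrome` and `burst_labeling` and the parameter name `b` (the maximum burst length) are hypothetical, the weights $1, 2, \dots$ are the standard VT choice, and the sketch assumes that every integer from 1 to `b` divides the length of the input.

```python
def vt_syndrome(bits):
    """Standard VT syndrome: sum of j * bits[j-1] over 1-indexed j, mod (len + 1)."""
    return sum(j * bit for j, bit in enumerate(bits, start=1)) % (len(bits) + 1)


def burst_labeling(u, b):
    """Flat tuple of syndromes, ordered by interleaving depth s = b, b-1, ..., 1.

    For a fixed depth s, the i-th entry (i = 1..s) is the VT syndrome of the
    interleaved subsequence (u_i, u_{i+s}, u_{i+2s}, ...).
    """
    n = len(u)
    if any(n % s for s in range(1, b + 1)):
        raise ValueError("this sketch assumes every s in 1..b divides len(u)")
    label = []
    for s in range(b, 0, -1):
        for i in range(s):
            label.append(vt_syndrome(u[i::s]))
    return tuple(label)


u = [1, 0, 1, 1, 0, 0]
print(burst_labeling(u, 2))  # (3, 2, 1)
```

The tuple has $b(b+1)/2$ entries, one VT syndrome per interleaved subsequence, matching the count of VT codes used in the construction.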
Lemma 4.5.1. Suppose $\mathbf{u} \in \{0,1\}^n$. Then for any $\mathbf{y} \in \mathcal{B}^{bd}_b(\mathbf{u})$, $f^{bd}_b(\mathbf{u}) \neq f^{bd}_b(\mathbf{y})$.
Proof. To prove the result, assume that $\mathbf{z}$ is the result of a burst of deletions of length at most $b$ occurring to $\mathbf{u}$ and that we are given $f^{bd}_b(\mathbf{u})$. We will show that it is possible to uniquely recover $\mathbf{u}$ from $\mathbf{z}$ given $f^{bd}_b(\mathbf{u})$, which is equivalent to showing that for any $\mathbf{y} \in \mathcal{B}^{bd}_b(\mathbf{u})$, $f^{bd}_b(\mathbf{u}) \neq f^{bd}_b(\mathbf{y})$. Suppose $f^{bd}_b(\mathbf{u}) = (a_{b,1}, \dots, a_{b,b}, a_{b-1,1}, \dots, a_{b-1,b-1}, \dots, a_{1,1})$ and that $|\mathbf{z}| = n - s$, so that $\mathbf{z}$ is the result of a burst of $s \le b$ consecutive deletions occurring to $\mathbf{u}$.
Consider the sequences:
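\[
\begin{aligned}
\mathbf{z}^{(1)} &= (z_1, z_{1+s}, z_{1+2s}, \dots, z_{n-2s+1}), \\
\mathbf{z}^{(2)} &= (z_2, z_{2+s}, z_{2+2s}, \dots, z_{n-2s+2}), \\
&\;\;\vdots \\
\mathbf{z}^{(s)} &= (z_s, z_{2s}, z_{3s}, \dots, z_{n-s}).
\end{aligned}
\]
Also, let
\[
\begin{aligned}
\mathbf{u}^{(1)} &= (u_1, u_{1+s}, u_{1+2s}, \dots, u_{n-s+1}), \\
\mathbf{u}^{(2)} &= (u_2, u_{2+s}, u_{2+2s}, \dots, u_{n-s+2}), \\
&\;\;\vdots \\
\mathbf{u}^{(s)} &= (u_s, u_{2s}, u_{3s}, \dots, u_n).
\end{aligned}
\]
(Footnote 2: We can replace $\tfrac{n}{i}$ with $\lfloor \tfrac{n}{i} \rfloor$ if $i \nmid n$.)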
Since for $i \in [s]$, $\mathbf{z}^{(i)}$ is the result of a single deletion occurring to $\mathbf{u}^{(i)}$, it is possible to recover $\mathbf{u}^{(i)}$ given $\mathbf{z}^{(i)}$ and $a_{s,1}, a_{s,2}, \dots, a_{s,s}$, since
\[
\Big\{ \mathbf{u}^{(i)} = (u^{(i)}_1, \dots, u^{(i)}_{n/s}) \in \{0,1\}^{n/s} : \sum_{j=1}^{n/s} j\, u^{(i)}_j \equiv a_{s,i} \bmod \big(\tfrac{n}{s}+1\big) \Big\}
\]
is a code capable of correcting a single deletion. □
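The recovery argument in the proof can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the label is stored as a flat tuple ordered by interleaving depth `b, b-1, ..., 1`, the VT syndromes use the standard weights $1, 2, \dots$, and the single-deletion decoder simply tries every reinsertion rather than using the linear-time VT decoder.

```python
def vt_syndrome(bits):
    """Standard VT syndrome: sum of j * bits[j-1] over 1-indexed j, mod (len + 1)."""
    return sum(j * bit for j, bit in enumerate(bits, start=1)) % (len(bits) + 1)


def burst_labeling(u, b):
    """Syndromes of all interleaved subsequences, for depths s = b, b-1, ..., 1."""
    return tuple(vt_syndrome(u[i::s]) for s in range(b, 0, -1) for i in range(s))


def vt_decode(z, syndrome):
    """Undo one deletion by brute force: reinsert a bit so the syndrome matches.

    Any candidate with the right syndrome is a VT codeword containing z, and a
    VT code corrects one deletion, so all such candidates coincide.
    """
    for pos in range(len(z) + 1):
        for bit in (0, 1):
            candidate = z[:pos] + [bit] + z[pos:]
            if vt_syndrome(candidate) == syndrome:
                return candidate
    raise ValueError("no codeword is consistent with the given syndrome")


def recover_from_burst(z, n, b, label):
    """Recover u (length n) from z, a copy of u with one burst of s <= b deletions."""
    s = n - len(z)
    if s == 0:
        return list(z)
    if not 1 <= s <= b:
        raise ValueError("burst length outside 1..b")
    offset = sum(range(s + 1, b + 1))  # skip the entries for depths b, ..., s+1
    u = [0] * n
    for i in range(s):
        # z[i::s] is u[i::s] with exactly one symbol deleted.
        u[i::s] = vt_decode(list(z[i::s]), label[offset + i])
    return u


u = [1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1]
label = burst_labeling(u, 3)
z = u[:4] + u[6:]  # burst of two consecutive deletions
print(recover_from_burst(z, len(u), 3, label) == u)  # True
```

Only the `s` syndromes of depth `s` are used for a burst of length exactly `s`; the remaining entries of the label serve the other possible burst lengths.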
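From Lemma 4.5.1, the mapping $f^{bd}_b$ satisfies the confusability property. Furthermore, $f^{bd}_b$ satisfies the redundancy property since
\[
\log \Big( \prod_{s=1}^{b} \big(\tfrac{n}{s}+1\big)^{s} \Big) \le O\big( b^2 \log(n+b) \big),
\]
and $b$ is assumed to be a constant. Therefore, from Theorem 4.2.1, for any $\mathbf{u} \in \{0,1\}^n$ there exists an integer $a$ such that $a \le 2^{\log|\mathcal{B}^{bd}_b(\mathbf{u})| + o(\log n)}$, and for any $\mathbf{y} \in \mathcal{B}^{bd}_b(\mathbf{u})$, $f^{bd}_b(\mathbf{u}) \not\equiv f^{bd}_b(\mathbf{y}) \bmod a$.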
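We define our code $\mathcal{C}_{bd}(n, N)$ with $N = n + 2\log|\mathcal{B}^{bd}_b(\mathbf{u})| + o(\log n)$ as follows:
\[
\mathcal{C}_{bd}(n, N) = \Big\{ \mathbf{x} = \big( \mathbf{u}, 1, 0^{b}, 1^{b}, 0, a, f^{bd}_b(\mathbf{u}) \bmod a \big) : \mathbf{u} \in \{0,1\}^n \Big\}. \tag{4.7}
\]
We now prove the following theorem and thus prove Theorem 4.1.3. In the statement below, $\mathbf{u}$ is the information portion of the sequence (the non-redundancy part) from (4.7).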
Theorem 4.5.1. Let $\mathbf{z}$ be the result of a burst of consecutive deletions of length at most $b$ occurring to $\mathbf{x} \in \mathcal{C}_{bd}(n, N)$. Then we can uniquely determine $\mathbf{x}$ from $\mathbf{z}$.
Proof. To prove the result, we show how to recover $\mathbf{u}$ from $\mathbf{z}$. To do so, we show that it is possible to separate $\mathbf{z}$ into two parts, $\mathbf{z}_1$ and $\mathbf{z}_2$, where either a) $\mathbf{z}_1$ is the result of a burst of deletions of length at most $b$ occurring to $\mathbf{u}$, or b) $\mathbf{z}_2$ is the result of a burst of deletions of length at most $b$ occurring to $\mathbf{r} = (1, 0^{b}, 1^{b}, 0, a, f^{bd}_b(\mathbf{u}) \bmod a)$. Note that if $\mathbf{z}_1 \neq \mathbf{u}$, then (due to the length of the burst) $\mathbf{z}_2 = \mathbf{r}$, and the fact that we can recover $\mathbf{u}$ from $\mathbf{z}_1$ provided $\mathbf{r}$ follows immediately from Theorem 4.2.1. If b) holds and $\mathbf{z}_2 \neq \mathbf{r}$, then by similar logic, $\mathbf{u} = \mathbf{z}_1$. Note that whether $\mathbf{z}_1 \neq \mathbf{u}$ can be determined immediately from the length of $\mathbf{z}_1$ (due to the deletions), and similarly we can easily detect when $\mathbf{z}_2 \neq \mathbf{r}$ by considering the length of $\mathbf{z}_2$. Therefore, in the remainder of the proof we show how to recover $\mathbf{z}_1, \mathbf{z}_2$ from $\mathbf{z}$, assuming a burst of $s \le b$ deletions has occurred to $\mathbf{x}$, resulting in $\mathbf{z}$.
In order to separate $\mathbf{z}$ into $\mathbf{z}_1$ and $\mathbf{z}_2$, we make use of the marker sequence $1, 0^{b}, 1^{b}, 0$, which is embedded into every codeword in our code according to (4.7). Let $|\mathbf{z}| = N - s$. If
\[
(z_{n+1}, z_{n+2}, \dots, z_{n+2b-s+1}) = (0^{b-s+1}, 1^{b}), \tag{4.8}
\]
then it is straightforward to observe that $\mathbf{z}_2 = \mathbf{r}$, where $\mathbf{z}_2$ is equal to the last $N - n$ bits of $\mathbf{z}$. We set $\mathbf{z}_1$ to be equal to the first $n - s$ bits of $\mathbf{z}$, so that by the previous discussion we can recover $\mathbf{u}$ from $\mathbf{z}$.
Next, suppose that
\[
(z_{n+1}, z_{n+2}, \dots, z_{n+b+1}) = (1, 0^{b}). \tag{4.9}
\]
In this case the burst of length $s$ could not have started in any of the positions from the set $[n] = \{1, 2, \dots, n\}$, which implies $\mathbf{u}$ is equal to the first $n$ bits of $\mathbf{z}$.
The only case left to consider is where the deletion begins in the marker sequence $1, 0^{b}, 1^{b}, 0$. First note that if the deletion occurs in the marker sequence, then (4.8) can hold only if the deletion begins in position $n+1$ in $\mathbf{x}$. In this case, it is straightforward to verify that the decoding described for this case will still generate $\mathbf{u}$ since $\mathbf{r}$ is still equal to the last $N - n$ bits of $\mathbf{z}$. If the deletion begins in one of the positions $\{n+2, n+3, \dots, n+b+1\}$, then
\[
(z_{n+1}, z_{n+2}, \dots, z_{n+1+b}) = (1, 0^{j}, 1^{b-j})
\]
for some $j < b$, so that neither (4.8) nor (4.9) can hold. If the deletion begins in the marker sequence after position $n+b+1$ in $\mathbf{x}$, then (4.9) holds and the decoding is correct in this case as well. □
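The marker test used in the proof can be sketched in Python as below. The function names, the parameter name `b` for the maximum burst length, and the placeholder `tail` (standing in for the bits of the modulus and of the reduced label) are all hypothetical. The sketch relies on one reading of the case analysis: condition (4.8) is checked first, and whenever it fails the first `n` received bits are taken as the information part.

```python
def build_codeword(u, b, tail):
    """x = (u, 1, 0^b, 1^b, 0, tail), following the shape of (4.7)."""
    return list(u) + [1] + [0] * b + [1] * b + [0] + list(tail)


def split_received(z, n, b, total_len):
    """Return (information_part_hit, bits).

    If the flag is True, `bits` are the first n - s received bits, a copy of u
    that has lost a burst of s symbols and still needs the label to be repaired.
    If the flag is False, `bits` are the first n received bits, taken as u.
    """
    s = total_len - len(z)  # number of deleted symbols
    if s == 0:
        return False, list(z[:n])
    if not 1 <= s <= b:
        raise ValueError("burst length outside 1..b")
    window = list(z[n:n + 2 * b - s + 1])  # (z_{n+1}, ..., z_{n+2b-s+1})
    if window == [0] * (b - s + 1) + [1] * b:  # condition (4.8)
        return True, list(z[:n - s])
    return False, list(z[:n])


u = [1, 0, 1, 1, 0, 1]
tail = [1, 0, 0, 1, 1]
x = build_codeword(u, 3, tail)
z = x[:2] + x[4:]  # a burst of two deletions inside the information part
print(split_received(z, len(u), 3, len(x)))
```

Checking only the bits right after position `n` keeps the test independent of the contents of `u` and of the tail, which is the purpose of the marker.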
Codes correcting bursts of deletions
Next, we consider a more general type of burst error pattern. In this section, we want to correct $s$ bursts, each occurring within a window of length at most $t_L$, where the deletions in each burst need not occur consecutively. For shorthand, we refer to these codes as $(s, t_L)$-burst codes. The main result here is that when $s, t_L$ are constants, there exist $(s, t_L)$-burst codes with redundancy $4s(1+\epsilon)\log n$ for $s, n$ large enough.
We begin by introducing some notation, and then we proceed to our code construction. We say that $\mathbf{z} \in \{0,1\}^{n-|J|}$ is the result of $s$ bursts, each occurring within a window of length at most $t_L$, occurring to $\mathbf{x} \in \{0,1\}^n$ if there exist sets $J, J_S \subseteq [n]$, with $|J| \le s \cdot t_L$ and $|J_S| = s$, such that the following holds:
1. $\mathbf{z}$ can be obtained by deleting symbols from $\mathbf{x}$ in positions $J$.
2. For any $j \in J$, there exists an $i \in J_S$ where $|j - i| < t_L$.
We illustrate this notation in the following example.
Example 4.5.1. Suppose $\mathbf{x} = (0,1,1,1,0,1,0,0,0,1,1,0,0) \in \{0,1\}^{13}$ is in a $(2,3)$-burst code. Let
\[
\mathbf{z} = (0,1,1,0,0,0,0,1,1,0) \in \{0,1\}^{10}.
\]
Then we can claim that $\mathbf{z}$ is the result of 2 bursts of deletions of length at most 3, since we can write $J = \{4, 6, 12\}$ and $J_S = \{4, 12\}$ with $t_L = 3$. It follows that given $\mathbf{z}$, it is possible to uniquely recover $\mathbf{x}$ provided $\mathbf{x}$ is in a $(2,3)$-burst code.
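The example can be checked mechanically. The short Python sketch below uses hypothetical helper names, with positions 1-indexed as in the text; it deletes the positions $\{4, 6, 12\}$ from the 13-bit vector and verifies the two conditions of the definition with burst anchors $\{4, 12\}$.

```python
def delete_positions(x, positions):
    """Delete the given 1-indexed positions from x."""
    drop = set(positions)
    return [bit for idx, bit in enumerate(x, start=1) if idx not in drop]


def is_valid_burst_pattern(deleted, anchors, s, t_L):
    """Check the size bounds and that each deleted position is within t_L of an anchor."""
    return (
        len(anchors) == s
        and len(deleted) <= s * t_L
        and all(any(abs(j - i) < t_L for i in anchors) for j in deleted)
    )


x = [0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0]
z = [0, 1, 1, 0, 0, 0, 0, 1, 1, 0]
deleted, anchors = {4, 6, 12}, {4, 12}

print(delete_positions(x, deleted) == z)                     # True
print(is_valid_burst_pattern(deleted, anchors, s=2, t_L=3))  # True
```

Positions 4 and 6 belong to the same burst even though position 5 survives, which is exactly the non-consecutive behaviour this error model allows.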
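For a vector $\mathbf{x} \in \{0,1\}^n$, let $B_{s,t_L}(\mathbf{x})$ be the set of vectors possible given that $s$ bursts, each occurring within a window of length at most $t_L$, occur to $\mathbf{x}$. Then define $\mathcal{B}^{n}_{s,t_L}(\mathbf{x}) \subseteq \{0,1\}^n$ so that
\[
\mathcal{B}^{n}_{s,t_L}(\mathbf{x}) = \{ \mathbf{y} \in \{0,1\}^n : B_{s,t_L}(\mathbf{x}) \cap B_{s,t_L}(\mathbf{y}) \neq \emptyset,\ \mathbf{y} \neq \mathbf{x} \}.
\]
Clearly, if $\mathbf{x}$ is in an $(s, t_L)$-burst code, then $\mathbf{y}$ cannot be in the same code for any $\mathbf{y} \in \mathcal{B}^{n}_{s,t_L}(\mathbf{x})$. The following claim follows from straightforward counting arguments.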
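For very small parameters, the two sets just defined can be enumerated by brute force. The Python sketch below is only an illustration: the function names are hypothetical, indices are 0-based, the empty deletion pattern is allowed, and the running time is exponential, so it is meant for toy lengths only.

```python
from itertools import combinations, product


def burst_ball(x, s, t_L):
    """All vectors obtainable from x by s bursts, each within a window of length t_L."""
    n = len(x)
    results = set()
    for anchors in combinations(range(n), s):
        # Positions j with |j - i| < t_L for some anchor i.
        near = sorted({j for i in anchors
                       for j in range(max(0, i - t_L + 1), min(n, i + t_L))})
        for size in range(min(len(near), s * t_L) + 1):
            for deleted in combinations(near, size):
                drop = set(deleted)
                results.add(tuple(bit for k, bit in enumerate(x) if k not in drop))
    return results


def confusable_set(x, s, t_L):
    """All y != x of the same length whose burst ball intersects that of x."""
    ball_x = burst_ball(x, s, t_L)
    return {y for y in product((0, 1), repeat=len(x))
            if y != tuple(x) and ball_x & burst_ball(y, s, t_L)}


x = (0, 1, 1, 0, 1, 0)
print(len(burst_ball(x, 1, 2)), len(confusable_set(x, 1, 2)))
```

A code for this error model must pick at most one vector from each pair related by `confusable_set`, which is why the size of this set drives the redundancy of the construction.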
Claim 4.5.2. For integers $s, t_L, n$, and any $\mathbf{u} \in \{0,1\}^n$,
\[
|\mathcal{B}^{n}_{s,t_L}(\mathbf{u})| \le n^{2s} \cdot \big( (t_L+1)^{s}\, 2^{s \cdot t_L} \big)^{2}.
\]
In order to apply the syndrome compression technique, we need to specify the labeling and also to show that the redundancy and confusability properties hold.
For this setup, we will use the same systematic labeling used to correct multiple deletions that was introduced in Sec. 4.4. More specifically, we will use the labeling $f$ defined in (4.6). It follows immediately from our definitions and Lemma 4.4.4 that if $\mathbf{u}, \mathbf{y} \in \{0,1\}^n$ and $\mathbf{y} \in \mathcal{B}^{n}_{s,t_L}(\mathbf{u}) \setminus \{\mathbf{u}\}$, then
\[
f(\mathbf{u}) \neq f(\mathbf{y}),
\]
so that the confusability property holds. The redundancy property also follows immediately from the definition of $f$, since $s, t_L$ are constants. Thus, to construct $(s, t_L)$-burst codes, we can apply the same syndrome compression procedure as described in Sec. 4.3 and Sec. 4.4, except that we will search for an $a \in [[\, n^{2s} \cdot ( (t_L+1)^{s}\, 2^{s \cdot t_L} )^{2} \,]]$ such that $f(\mathbf{u}) \not\equiv f(\mathbf{y}) \bmod a$ for any $\mathbf{y} \in \mathcal{B}^{n}_{s,t_L}(\mathbf{u})$. Since $\log a \le 2s\log n + o(\log n)$ for $n$ large enough, the resulting construction is systematic and has redundancy $4s\log n + o(\log n)$ for $s, n$ large enough. Hence, we have Theorem 4.1.4.
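The search for the modulus in the syndrome compression step can be pictured with the following Python sketch. It is a simplified illustration, not the procedure of Sec. 4.3: labels are assumed to be given as integers, the function name `find_modulus` is hypothetical, and the loop simply returns the smallest modulus that separates the label of $\mathbf{u}$ from the labels of all confusable vectors.

```python
def find_modulus(label_u, confusable_labels):
    """Smallest a with label_u % a != label_y % a for every confusable label_y.

    Two integers agree mod a exactly when a divides their difference, so the
    search looks for the first a that divides none of the differences.
    """
    diffs = [abs(label_u - label_y) for label_y in confusable_labels]
    if any(d == 0 for d in diffs):
        raise ValueError("labels must differ (confusability property)")
    a = 1
    while any(d % a == 0 for d in diffs):
        a += 1
    return a


label_u = 100
confusable_labels = [88, 94, 97]
a = find_modulus(label_u, confusable_labels)
print(a, label_u % a)  # 5 0
```

The redundancy of the final code comes from storing the modulus and the reduced label, two numbers of about $\log a$ bits each, which is consistent with the factor of two between the bound on $\log a$ and the stated redundancy.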