Properties of Factor Precedence Multiset - On the Multiset of Factors of a String

In this section we explore various properties of OP_k^s.

Lemma 4.2. Let uand v be two strings from Σ^∗. Then,OP_k(u) = OP_k(v) implies m_k(u) =m_k(v).

Proof. OP_kû =OP_k^v ⇒ ∀s ∈ Σ^k, OP_kû(s, s) = OP_k^v(s, s). Observe that OP_kû(s, s) is

nothing but the number of ways to choose two distinct occurrences of s as a factor in u from|u|_s such occurrences. So,OP_k^u(s, s) = ^(|u|^s⁾²₂^−|u|^s.

Case OP_k^u(s, s)>0: In this case, |u|_s =|v|_s= ¹⁺

√1+8∗OP_k^u(s,s)

2 .

Case OP_kû(s, s) = 0: If there exists a string s⁰ such that either OP_kû(s, s⁰) > 0 or OP_kû(s⁰, s)>0, then|u|_s =|v|_s= 1. Otherwise, |u|_s =|v|_s= 0.

This also shows that we can derivemk(u) fromOP_k^uand hence we have the proof.

Theorem 4.3. Letuandv be two strings fromΣ^∗. Then,OP_k(u) = OP_k(v) implies OP_i(u) = OP_i(v) ∀i∈ {1, . . . , n}.

We prove the claim after proving the following results.

Lemma 4.4. LetΣ be a finite alphabet, p, q ∈Σ^k−1 and v ∈Σ^∗. Then, OP_k−1^v (p, q) = X

a,b∈Σ

OP_k^v(pa, bq) +OL_v(p, q)

where,OL_v(p, q) =







|v|_p.suff₁_q, if suffk−2(p) = pref_k−2(q)

0, otherwise

Proof. The ordered pair (p, q) can be in the stringv in the formpaandbqask-length factors wherepa is appearing before bq for all possiblea, b∈Σ. The only other way (p, q) is in v, where that occurrence of (p, q) cannot be captured by precedence data of factors of lengthk, is when prefk−2(p) = suffk−2(q). In that casep.suff1(q) contains (p, q). So counting all possible (pa, bq) ∈ OP_k^v and |v|_p.suff₁_q in case of prefk−2(p) = suffk−2(q) gives us OP_k−1^v (p, q). Here, OL_v(p, q) gives the frequency of p.suff₁q only when prefk−2(p) = suffk−2(q).

Example. Ifv = 1001, then we can easily verify thatOP₁^v(1,0) = 2. But OP₂^v(10,00) +OP₂^v(10,10) +OP₂^v(11,00) +OP₂^v(11,10)

= 1 + 0 + 0 + 0

= 1

Another occurrence of 1 appearing before 0 is hidden in the occurrence of the factor 10 as pref₀(1) = suff₀(0) = . This is captured by OL_v(1,0) = 1 and hence we get the correct value of OP₁^v(1,0) = 1 +OL_v(1,0) = 2.

Lemma 4.5. Let Σ be a finite alphabet. Letp, q ∈Σ^k−1 and v ∈Σ^∗. Then, OP_k−1^v (p, q) = X

a,b∈Σ

OP_k^v(ap, bq) +PR_v(p, q)

where, PR_v(p, q) =











|v|_q, if p is prefix of v and p6=q

|v|q−1, if p is prefix of v and p=q

0, otherwise

Proof. The ordered pair (p, q) can be in the stringv in the formapandbqask-length factors whereap is appearing beforebq for all possiblea, b∈Σ. The only other way (p, q) is in v, where that occurrence of (p, q) cannot be caught by occurrence of (ap, bq), is whenp is a prefix ofv. In that case the number of occurrences of (p, q) is missed the same number of times q is appears in v without being a prefix, which is

|v|_q if p6=q and |v|_q−1 ifp=q. So adding this number of missed cases with count of (ap, bq) in v, for all a, b∈Σ gives us OP_k−1^v (p, q).

Example. Ifv = 1000010001⁰ then OP₂^v(00,01) = 8 =

OP₂^v(000,001) +OP₂^v(000,101) +OP₂^v(100,001) +OP₂^v(100,101)

= 5 + 0 + 3 + 0 = 8.

But

OP₂^v(10,01) = 36=

OP₂⁽010,001) +OP₂^v(010,101) +OP₂^v(110,001) +OP₂^v(110,101)

= 1 + 0 + 0 + 0 = 1.

Here if we add |v|₀₁= 2, then we get the right answer.

Again,

OP₂^v(10,10) = 16=

OP₂⁽010,010) +OP₂^v(010,110) +OP₂^v(110,010) +OP₂^v(110,110) +|v|₁₀

= 0 + 0 + 0 + 0 + 2 = 2.

Here we have to subtract 1 from the obtained result as we are counting the prefix also while counting the occurrence after the prefix.

Lemma 4.6. LetΣ be an alphabet. Let p, q ∈Σ^k−1 and v ∈Σ^∗. Then, OP_k−1^v (p, q) = X

a,b∈Σ

OP_k^v(pa, qb) +SF_v(p, q)

where, SF_v(p, q) =











|v|_p, if q is suffix of v and p6=q

|v|_p −1, if q is suffix of v and p=q

0, otherwise

Proof. The ordered pair (p, q) can be in the stringv in the formpaandqbask-length factors wherepa is appearing before qbfor all possiblea, b∈Σ. The only other way (p, q) is in v, where that occurrence of (p, q) cannot be caught by occurrence of (pa, qb), is whenq is a suffix of v. In that case the number of occurrences of (p, q) is missed the same number of times p is appears in v without being a suffix, which is

|v|p if p6=q and |v|p−1 ifp=q. So adding this number of missed cases with count of (pa, qb) in v, for all a, b∈Σ gives us OP_k−1^v (p, q).

An example to demonstrate this result can be easily constructed by taking the reverse of the string which was taken to demonstrate the previous lemma.

Lemma 4.7. LetΣ be a finite alphabet. Letp, q ∈Σ^k−1 and v ∈Σ^∗. Then, OP_k−1^v (p, q) = X

a,b∈Σ

OP_k^v(ap, qb) +PR_v(p, q) +SF_v(p, q)− MD_v(p, q)

where, MD_v(p, q) =







β−δ−γ+|v|_p, if p=q

β, otherwise

where, δ=







1, if p is prefix of v 0, otherwise

, and, γ =







1, if q is suffix of v 0, otherwise

and, β =







1, if p is prefix andq is suffix of v 0, otherwise

where SF_v(p, q) and PR_v(p, q) are as defined in Lemma 4.5 and 4.6, respectively.

Proof. The ordered pair (p, q) can be in the stringv in the formapandqbask-length factors where ap is appearing before qbfor all possible a, b∈Σ.

As we have seen in Lemma 4.5 and Lemma 4.6 we have to add both PR_v(p, q) andSF_v(p, q) to get the result as in this case count can be affected forpbeing prefix and q being suffix of v. But if p = q then we are counting (p, q) twice while p is at prefix position and q be at suffix position. So, we need to subtract 1 thereafter.

There are further terms to be subtracted. While calculating X

a,b∈Σ

OP_k^v(ap, qb) we have added the cases where for each occurrence of p (= q) we have added some (ap, pb) to the list but here p and q coincides and cannot create a pair (p, q). Thus we have to subtract the number of such occurrences of p in v, that is, |v|p, except the cases where p(= q) is either suffix or prefix or both. These cases are taken care of byδ and γ and β.

In Figure4.2 and Figure4.3we present a demonstration of the Lemma 4.7show- ing different cases that may occur while findingOP_v^k−1(p, q) from OP_v^k.

Proof. (of Theorem 4.3)

From Lemma4.4we get that for everyp, q ∈Σ^k−1,OP_k−1^v (p, q) can be determined unambiguously from OP_k^v. Hence we get OPk−1(v) from OP_k(v) and then we get OPk−2(v) from OPk−1(v) and so on. Hence we get OPi(v) from OPi+1(v) ∀1 ≤i ≤ (k−1). Also we have OP_k(u) = OP_k(v). So, in a similar manner we get OPk−1(u)

=10001100010110 is a prefix of ? is a suffix of ?

||ℛ(,)ℱ(,)ℳ(,) Case-1=000 =100 (0

00,100)=1

××22000000(

000,100)=0 (

000,100)=1 (

000,100)=0 Case-2=100 =001 (1

00,001)=3

✓×22200100(

100,001)=0 (

100,001)=1 (

100,001)=0 Case-3=000 =110 (0

00,110)=3

×✓22020010(

000,110)=0 (

000,110)=1 (

000,110)=0 Case-4=100 =110 (1

00,110)=3

✓✓22221111(

100,110)=0 (

100,110)=0 Case-5=100 =100 (1

00,100)=1

✓×22100101(

100,100)=0 (

100,100)=1 (

100,100)=0 Case-6=110 =110 (1

10,110)=1

×✓22010011(

110,110)=1 (

110,110)=0 (

110,110)=0

00 01

00 10

00 11

01 01

01 10

10 00

10 11

11 00

121231210001 000110100010 110121110011 =000010100101 110111110110 321231211000 000010001011 110111101100 00

0 00 1

01 0

01 1

10 0

10 1

11 0

1323123000 1123123001 0001011010 =1111113011 3323123100 0001001101 1111111110

Figure 4.2: Examples of finding OP_v^k−1(p, q) from OP_v^k.

Figure 4.3: Examples of finding OP_v^k−1(p, q) from OP_v^k where p = q and p is both suffix and prefix of v.

from OP_k(u) and then we get OPk−2(u) from OPk−1(u) and so on. Hence we have the result.

Corollary 4.8. Let u, v ∈Σ^∗ for some finite alphabet Σ, then OP_k^v =OP_k^v implies for all m_[1,k](u) = m_[1,k](v).

Proof. By Theorem 4.3, ∀i∈ {1, ..., k} we have OP_i^v =OP_i^v and by Lemma 4.2 we have now m_[1,k](u) =m_[1,k](v).

Corollary 4.9. Let u and v be two strings from Σ^∗. Then, OP_k(u) = OP_k(v) implies prefk−1(u) =prefk−1(v) and suffk−1(u) =suffk−1(v).

Proof. Lemma4.2and Corollary4.8together gives us the result. An alternative way to prove it is to look for ap, q ∈Σ^k−1 such thatOP_k−1^u (p, q)6= X

a,b∈Σ

OP_k(ap, bq) Such pwill be prefix ofu. The same can be done withv and asOPk(u) = OPk(v) we shall get the same prefix.

For suffix part, we need to look for again p, q ∈ Σ^k−1 such that OP_k−1^u (p, q) 6=

a,b∈Σ

OP_k(pa, qb) Then q will be suffix for both u and v.

Lemma 4.10. Letu∈Σ^∗ and v, w∈Σ^k with v 6=w. Then, OP_k^u(v, w) +OP_k^u(w, v) =|u|_v|u|_w

Proof. The value OP_k^u(v, w) is the number of ways v appears before w in u, and OP_k^u(w, v) is the number of ways w appears before v in u. Together they form the statement number of waysv andw occurs anywhere inuas an unordered pair which is nothing but |u|_v|u|_w.

The above results are all necessary conditions of a given precedence information to be valid but we have not arrived at any sufficient condition.

Dalam dokumen On the Multiset of Factors of a String (Halaman 55-62)