

In the document Randomness and Noise in Information Systems (pages 82-89)

Random Number Generation from Markov Chains

3.5 Algorithm B: Generalization of Blum’s Algorithm

In the first variant of algorithm A, instead of applying Ψ directly to Λi = πi(X) for i ≠ χ (or to Λi = πi(X) with its last element removed for i = χ), we first split Λi into several segments with lengths ki1, ki2, ..., then apply Ψ to each of the segments separately. It can be proved that this variant of algorithm A can generate independent unbiased sequences from an arbitrary Markov chain, as long as ki1, ki2, ... do not depend on the order of elements in each exit sequence. For example, we can split Λi into two segments of lengths ⌈|Λi|/2⌉ and ⌊|Λi|/2⌋; we can also split it into three segments of lengths (a, a, |Λi| − 2a), and so on. Generally, the shorter each segment is, the faster we can obtain the final output, but at the same time we may have to sacrifice a little information efficiency.
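As a concrete illustration of the first variant, the following minimal Python sketch (the function names and the toy Ψ are ours, not from the text) splits an exit sequence into two halves of lengths ⌈|Λ|/2⌉ and ⌊|Λ|/2⌋ and applies a caller-supplied Ψ to each half; here a pairwise von Neumann extractor stands in for Ψ.

```python
def variant_a1(exit_seq, psi):
    """Split Λ into halves of lengths ceil(|Λ|/2) and floor(|Λ|/2),
    then apply Ψ to each half separately (first variant of algorithm A)."""
    m = (len(exit_seq) + 1) // 2  # depends only on |Λ|, not on element order
    return psi(exit_seq[:m]) + psi(exit_seq[m:])

def vn_pairs(seq):
    """Toy Ψ: von Neumann's scheme applied to consecutive pairs."""
    out = []
    for a, b in zip(seq[0::2], seq[1::2]):
        if a != b:                # discard equal pairs
            out.append("0" if a < b else "1")
    return "".join(out)

bits = variant_a1(["h", "t", "t", "t", "h", "t"], vn_pairs)
```

The segment lengths are a function of |Λ| alone, which is exactly the condition stated above for the output to remain independent and unbiased.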

The second variant of algorithm A is based on the following idea: for a given sequence from a Markov chain, we can split it into shorter sequences that are independent of each other, apply algorithm A to each of them, and then concatenate their output sequences into the final one. To do this, given a sequence X = x1x2... with x1 = sα, we can use sα as a special state at which to split X. For example, in practice, we can set a constant k: if there exists a minimal integer i such that xi = sα and i > k, then we split X into two sequences x1x2...xi and xixi+1... (note that both sequences contain the element xi). For the second sequence xixi+1..., we repeat the same procedure. Iteratively, we split the sequence X into several sequences that are independent of each other. These sequences, with the exception of the last one, start and end with sα, and their lengths are usually slightly longer than k.
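The splitting procedure of the second variant can be sketched in a few lines of Python (the function name and the choice k = 3 below are ours):

```python
def split_at_returns(X, s_alpha, k):
    """Split X at each first return to s_alpha after position k
    (second variant of algorithm A); adjacent pieces share the
    boundary symbol."""
    pieces, start = [], 0
    for i in range(1, len(X)):
        # (i - start + 1) is the 1-indexed position of x_i in the current piece
        if X[i] == s_alpha and (i - start + 1) > k:
            pieces.append(X[start:i + 1])   # piece ends with s_alpha
            start = i                       # next piece starts with the same x_i
    pieces.append(X[start:])
    return pieces

pieces = split_at_returns(list("abbababa"), "a", k=3)
# every piece except the last starts and ends with "a"
```

Because the chain is memoryless, conditioning on the shared boundary state makes the pieces independent, so algorithm A may be run on each piece separately.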

As the window sizes grow, the efficiency of the generalized scheme below approaches the information-theoretic upper bound.

Algorithm B

Input: A sequence (or a stream) x1x2... produced by a Markov chain, where xi ∈ {s1, s2, ..., sn}.

Parameter: n positive integer functions (window sizes) ϖi(k) with k ≥ 1, for all 1 ≤ i ≤ n.

Output: A sequence (or a stream) Y of 0s and 1s.

Main Function:

Ei = ϕ (empty) for all 1 ≤ i ≤ n.
ki = 1 for all 1 ≤ i ≤ n.
c: the index of the current state, namely, sc = x1.
while the next input symbol is sj (≠ null) do
    Ec = Ec sj (add sj to Ec).
    if |Ej| ≥ ϖj(kj) then
        Output Ψ(Ej).
        Ej = ϕ.
        kj = kj + 1.
    end if
    c = j.
end while

In the algorithm above, we apply the function Ψ to Ej to generate random bits if and only if the window for Ej is completely filled and the Markov chain is currently at state sj.

For example, we set ϖi(k) = 4 for all 1 ≤ i ≤ n and for all k ≥ 1, and assume that the input sequence is

X =s1s1s1s2s2s2s1s2s2.

After reading the second to last (8th) symbol s2, we have

E1 = s1s1s2s2, E2 = s2s2s1.

In this case, |E1| ≥ 4, so the window for E1 is full, but we do not apply Ψ to E1 because the current state of the Markov chain is s2, not s1.

By reading the last (9th) symbol s2, we get

E1 = s1s1s2s2, E2 = s2s2s1s2.

Since the current state of the Markov chain is s2 and |E2| ≥ 4, we produce Ψ(E2) = Ψ(s2s2s1s2) and reset E2 to ϕ.

In the example above, treating X as input to algorithm B, we get the output sequence Ψ(s2s2s1s2).

The algorithm does not output Ψ(E1 = s1s1s2s2) until the Markov chain reaches state s1 again.

Timing is crucial!
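The behavior traced above can be reproduced with a short Python sketch of algorithm B (the function names and the toy Ψ are ours; a real instantiation would use the Elias function for Ψ):

```python
def algorithm_b(stream, n, window, psi):
    """Streaming sketch of algorithm B.

    stream -- iterable of state indices in {0, ..., n-1}
    window -- window(i, k) gives the k-th window size ϖ_i(k) for state s_i
    psi    -- Ψ, mapping a full window (a tuple of states) to a bit string
    """
    E = [[] for _ in range(n)]   # exit-sequence buffers E_i
    k = [1] * n                  # window counters k_i
    it = iter(stream)
    c = next(it)                 # index of the current state, s_c = x_1
    out = []
    for j in it:                 # next input symbol is s_j
        E[c].append(j)           # add s_j to E_c
        # apply Ψ only when E_j's window is full AND the chain is at s_j
        if len(E[j]) >= window(j, k[j]):
            out.append(psi(tuple(E[j])))
            E[j] = []
            k[j] += 1
        c = j
    return "".join(out)

# The worked example: X = s1 s1 s1 s2 s2 s2 s1 s2 s2 with ϖ_i(k) = 4,
# encoding s1 -> 0 and s2 -> 1.  Only E_2's window fills (as s2 s2 s1 s2);
# this toy Ψ just echoes its window so the buffering is visible.
bits = algorithm_b([0, 0, 0, 1, 1, 1, 0, 1, 1], n=2,
                   window=lambda i, k: 4,
                   psi=lambda w: "".join(map(str, w)))
```

Setting `window=lambda i, k: 2` and using von Neumann's scheme as Ψ reduces this sketch to Blum's algorithm.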

Note that Blum’s algorithm is the special case of algorithm B obtained by setting the window size functions ϖi(k) = 2 for all 1 ≤ i ≤ n and k ∈ {1, 2, ...}. Namely, algorithm B is a generalization of Blum’s algorithm; the key point is that when we increase the window sizes, we can apply more efficient schemes (compared to von Neumann’s scheme) for Ψ. Assume a sequence of symbols X = x1x2...xN with xN = sχ has been read by the algorithm above. We want to show that for any N, the output sequence is always independent and unbiased. Unfortunately, Blum’s proof for the case of ϖi(k) = 2 cannot be applied to our proposed scheme.

For all i with 1 ≤ i ≤ n, we can write

πi(X) = Fi1 Fi2 ... Fimi Ei,

where the Fij with 1 ≤ j ≤ mi are the segments used to generate outputs. For all i, j, we have

|Fij| = ϖi(j),

and

0 ≤ |Ei| < ϖi(mi + 1)   if i = χ,
0 < |Ei| ≤ ϖi(mi + 1)   otherwise.

See figure 3.5 for a simple illustration.

Figure 3.5. The simplified expressions for the exit sequences of X: π1(X) = F11 F12 F13 E1; π2(X) = F21 F22 E2; π3(X) = F31 F32 F33 E3.

Theorem 3.9 (Algorithm B). Let the sequence generated by a Markov chain be used as input to algorithm B. Then algorithm B generates an independent unbiased sequence of bits in expected linear time.

Proof. In the following proof, we use the same idea as in the proof for algorithm A.

Let us first divide all the possible input sequences in {s1, s2, ..., sn}^N into classes, and use G to denote the set consisting of all the classes. Two sequences X and X′ are in the same class if and only if

1. x1 = x′1 and xN = x′N.
2. For all i with 1 ≤ i ≤ n,

πi(X) = Fi1 Fi2 ... Fimi Ei,
πi(X′) = F′i1 F′i2 ... F′imi E′i,

where Fij and F′ij are the segments used to generate outputs.

3. For all i, j, Fij ≡ F′ij.
4. For all i, Ei = E′i.

Let us use ΨB to denote algorithm B. For Y ∈ {0,1}∗, let BY be the set of sequences X of length N such that ΨB(X) = Y. We show that for any S ∈ G, |S ∩ BY| = |S ∩ BY′| whenever |Y| = |Y′|. If S is empty, this conclusion is trivial. In the following, we only consider the case that S is not empty.

Now, given a class S, let us define Sij as the set consisting of all the permutations of Fij for X ∈ S. Given Yij ∈ {0,1}∗, we continue to define

Sij(Yij) = {Λij ∈ Sij | Ψ(Λij) = Yij}

for all 1 ≤ i ≤ n and 1 ≤ j ≤ mi, which is the subset of Sij consisting of all sequences yielding Yij. According to lemma 2.1, we know that |Sij(Yij)| = |Sij(Y′ij)| whenever |Yij| = |Y′ij|. This implies that |Sij(Yij)| is a function of |Yij|, which can be written as Mij(|Yij|).

Let l11, l12, ..., l1m1, l21, ..., lnmn be nonnegative integers such that their sum is |Y|. We want to prove that

|S ∩ BY| = ∑_{l11+...+lnmn=|Y|} ∏_{i=1}^{n} ∏_{j=1}^{mi} Mij(lij).

The proof is by induction. Let w = ∑_{i=1}^{n} mi. First, the conclusion holds for w = 1. Assume the conclusion holds for some w ≥ 1; we want to prove that it also holds for w + 1.

Note that for all 1 ≤ i ≤ n, if j1 < j2, then Fij1 generates an output before Fij2 in algorithm B. So given an input sequence X ∈ S, the last segment that generates an output (the output can be an empty string) is Fimi for some i with 1 ≤ i ≤ n. Now, we show that this i is fixed for all the sequences in S, i.e., the position of the last segment generating an output is unchanged. To prove this, given a sequence X ∈ S, consider the first a symbols of X, denoted Xa, such that the last segment Fimi generates an output just after reading xa when the input sequence is X. Based on our algorithm, Xa has the following properties.

1. The last symbol xa = si.
2. πi(Xa) = Fi1 Fi2 ... Fimi.
3. πj(Xa) = Fj1 Fj2 ... Fjmj Ẽj for j ≠ i, where |Ẽj| > 0.

Now, let us permute each segment of F11, F12, ..., Fnmn to F′11, F′12, ..., F′nmn; then we get another sequence X′ ∈ S. According to lemma 3.4, if we consider the first a symbols of X′, i.e., X′a, it has properties similar to those of Xa:

1. The last symbol x′a = si.
2. πi(X′a) = F′i1 F′i2 ... F′imi.
3. πj(X′a) = F′j1 F′j2 ... F′jmj Ẽj for j ≠ i, where |Ẽj| > 0.

This implies that when the input sequence is X′, F′imi generates an output just after reading x′a, and it is the last segment to do so. So we can conclude that for all the sequences in S, the last segments generating outputs are at the same position.

Let us fix the last segment Fimi and assume that Fimi generates the last limi bits of Y. We want to know how many sequences in S ∩ BY have Fimi as the last segment that generates an output.

To get the answer, we concatenate Fimi with Ei as the new Ei. As a result, we have ∑_{i=1}^{n} mi − 1 = w segments to generate the first |Y| − limi bits of Y. Based on the induction hypothesis, the number of such sequences is

∑_{l11+...+li(mi−1)+...=|Y|−limi} (1/Mimi(limi)) ∏_{k=1}^{n} ∏_{j=1}^{mk} Mkj(lkj),

where l11, ..., li(mi−1), l(i+1)1, ..., lnmn are nonnegative integers. For each limi, there are Mimi(limi) different choices for Fimi. Therefore, |S ∩ BY| can be obtained by multiplying the number above by Mimi(limi) and summing over limi. This yields the conclusion above.

According to this conclusion, we know that if |Y| = |Y′|, then |S ∩ BY| = |S ∩ BY′|. Using the same argument as in Theorem 3.7, we complete the proof of the theorem.

In general, the window size functions ϖi(k) for 1 ≤ i ≤ n can be any positive integer functions.

Here, we fix all of these window size functions to a constant ϖ. By increasing the value of ϖ, we can increase the efficiency of the scheme, but at the cost of more storage space and longer waiting time. It is therefore helpful to analyze the relationship between the efficiency of the scheme and the window size ϖ.

Theorem 3.10 (Efficiency). Let X be a sequence of length N generated by a Markov chain with transition matrix P, used as input to algorithm B with constant window size ϖ. Then as the length of the sequence goes to infinity, the limiting efficiency of algorithm B is

η(ϖ) = ∑_{i=1}^{n} ui ηi(ϖ),

where U = (u1, u2, ..., un) is the stationary distribution of the Markov chain, and ηi(ϖ) is the efficiency of Ψ when its input sequence of length ϖ is generated by an n-face coin with distribution (pi1, pi2, ..., pin).

Proof. When N → ∞, there exists an ϵN → 0 such that, with probability 1 − ϵN,

(ui − ϵN)N < |πi(X)| < (ui + ϵN)N for all 1 ≤ i ≤ n.

The efficiency of algorithm B can be written as η(ϖ), which satisfies

(1/N) ∑_{i=1}^{n} (⌊|πi(X)|/ϖ⌋ − 1) ηi(ϖ) ϖ ≤ η(ϖ) ≤ (1/N) ∑_{i=1}^{n} ⌊|πi(X)|/ϖ⌋ ηi(ϖ) ϖ.

With probability 1 − ϵN, we have

(1/N) ∑_{i=1}^{n} ((ui − ϵN)N/ϖ − 1) ηi(ϖ) ϖ ≤ η(ϖ) ≤ (1/N) ∑_{i=1}^{n} ((ui + ϵN)N/ϖ) ηi(ϖ) ϖ.

So when N → ∞, we have

η(ϖ) = ∑_{i=1}^{n} ui ηi(ϖ).

This completes the proof.

Let us define α(N) = ∑_k nk 2^{nk}, where N = ∑_k 2^{nk} is the standard binary expansion of N. Assume Ψ is the Elias function; then

ηi(ϖ) = (1/ϖ) ∑_{k1+...+kn=ϖ} α(ϖ!/(k1! k2! ... kn!)) pi1^{k1} pi2^{k2} ... pin^{kn}.

Based on this formula, we can numerically study the relationship between the limiting efficiency and the window size (see section 3.7). In fact, as the window size ϖ → ∞, the limiting efficiency approaches the information-theoretic upper bound.
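Assuming Ψ is the Elias function, the formula for ηi(ϖ) can be evaluated numerically; the sketch below (function names are ours) computes α(N) from the binary expansion of N and then sums over all symbol-count vectors (k1, ..., kn) with k1 + ... + kn = ϖ.

```python
from math import factorial, prod

def alpha(N):
    """α(N) = sum of n_k * 2^{n_k} over the binary expansion N = sum 2^{n_k}."""
    return sum(k * (1 << k) for k in range(N.bit_length()) if (N >> k) & 1)

def compositions(total, parts):
    """All tuples of `parts` nonnegative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def eta_i(w, p):
    """η_i(ϖ) for window size w and an n-face coin p = (p_i1, ..., p_in)."""
    s = 0.0
    for ks in compositions(w, len(p)):
        multinomial = factorial(w) // prod(factorial(k) for k in ks)
        s += alpha(multinomial) * prod(pi ** k for pi, k in zip(p, ks))
    return s / w

# For a fair two-face coin: η_i(2) = 0.25 (von Neumann's rate) and
# η_i(4) = 0.40625, illustrating the gain from a larger window.
```

The ϖ = 2 value matches the classical expected output of von Neumann's scheme on a fair coin, and the efficiency grows with ϖ as claimed above.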
