Random Number Generation from Markov Chains
3.6 Algorithm C: An Optimal Algorithm
Let us define $\alpha(N) = \sum_k n_k 2^{n_k}$, where $N = \sum_k 2^{n_k}$ with $n_1 > n_2 > \cdots$ is the standard binary expansion of $N$. Assume $\Psi$ is Elias's function; then
\[
\eta_i(\varpi) = \frac{1}{\varpi}\sum_{k_1+k_2+\cdots+k_n=\varpi} \alpha\!\left(\frac{\varpi!}{k_1!\,k_2!\cdots k_n!}\right) p_{i1}^{k_1} p_{i2}^{k_2}\cdots p_{in}^{k_n}.
\]
Based on this formula, we can numerically study the relationship between the limiting efficiency and the window size (see section 3.7). In fact, when the window size becomes large ($\varpi \to \infty$), the limiting efficiency approaches the information-theoretic upper bound.
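To illustrate how this expression can be evaluated numerically, the following Python sketch (with function names of our choosing) computes $\alpha(N)$ from the binary expansion and then $\eta_i(\varpi)$ for one row of transition probabilities by enumerating all compositions $k_1+\cdots+k_n=\varpi$; the enumeration grows exponentially in $n$, so it is only meant for small windows.

\begin{verbatim}
from math import factorial
from itertools import product

def alpha(N):
    # alpha(N) = sum_k n_k * 2^{n_k}, where N = sum_k 2^{n_k} is the
    # standard binary expansion of N.
    return sum(k * (1 << k) for k in range(N.bit_length()) if (N >> k) & 1)

def limiting_efficiency(p_row, window):
    # eta_i(varpi) = (1/varpi) * sum over k_1+...+k_n = varpi of
    #   alpha(varpi! / (k_1! ... k_n!)) * p_i1^k_1 * ... * p_in^k_n
    n = len(p_row)
    total = 0.0
    for ks in product(range(window + 1), repeat=n):
        if sum(ks) != window:
            continue
        multinomial = factorial(window)
        prob = 1.0
        for k, p in zip(ks, p_row):
            multinomial //= factorial(k)
            prob *= p ** k
        total += alpha(multinomial) * prob
    return total / window

# Example: one row of a two-state transition matrix, window size 8.
print(limiting_efficiency([0.3, 0.7], 8))
\end{verbatim}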
Proof (of lemma 3.11). It is easy to see that if $|S(\alpha,K)\cap B_Y| = |S(\alpha,K)\cap B_{Y'}|$ for all $(\alpha,K)$ whenever $|Y| = |Y'|$, then $Y$ and $Y'$ have the same probability of being generated. In this case, $f$ can generate random bits from an arbitrary Markov chain. It remains to prove the converse claim.
If $f$ can generate random bits from an arbitrary Markov chain, then $P[f(X)=Y] = P[f(X)=Y']$ for any two binary sequences $Y$ and $Y'$ of the same length. Letting $p_{ij}$ be the transition probability from state $s_i$ to state $s_j$ for all $1\le i,j\le n$, we can write
\[
P[f(X)=Y] = \sum_{\alpha,\,K\in G} |S(\alpha,K)\cap B_Y|\,\phi_K(p_{11},p_{12},\ldots,p_{nn})\,P(x_1=s_\alpha),
\]
where
\[
G = \Bigl\{K \,\Big|\, k_{ij}\in\{0\}\cup\mathbb{Z}^+,\ \sum_{i,j} k_{ij} = N-1\Bigr\},
\]
and
\[
\phi_K(p_{11},p_{12},\ldots,p_{nn}) = \prod_{i=1}^{n}\prod_{j=1}^{n} p_{ij}^{k_{ij}}.
\]
Similarly,
\[
P[f(X)=Y'] = \sum_{\alpha,\,K\in G} |S(\alpha,K)\cap B_{Y'}|\,\phi_K(p_{11},p_{12},\ldots,p_{nn})\,P(x_1=s_\alpha).
\]
As a result,
\[
\sum_{\alpha,\,K\in G} \bigl(|S(\alpha,K)\cap B_{Y'}| - |S(\alpha,K)\cap B_Y|\bigr)\,\phi_K(p_{11},\ldots,p_{nn})\,P(x_1=s_\alpha) = 0.
\]
Since $P(x_1=s_\alpha)$ can be any value in $[0,1]$, for all $1\le\alpha\le n$ we have
\[
\sum_{K\in G} \bigl(|S(\alpha,K)\cap B_{Y'}| - |S(\alpha,K)\cap B_Y|\bigr)\,\phi_K(p_{11},\ldots,p_{nn}) = 0.
\]
It can be proved that the functions in $\bigcup_{K\in G}\{\phi_K(p_{11},p_{12},\ldots,p_{nn})\}$ are linearly independent in the vector space of functions on the set of transition probabilities, namely
\[
\{(p_{11},p_{12},\ldots,p_{nn}) \mid p_{ij}\in[0,1],\ \sum_{j=1}^{n} p_{ij} = 1\}.
\]
Based on this fact, we can conclude that $|S(\alpha,K)\cap B_Y| = |S(\alpha,K)\cap B_{Y'}|$ for all $(\alpha,K)$ if $|Y| = |Y'|$.
Let us define $\alpha(N) = \sum_k n_k 2^{n_k}$, where $N = \sum_k 2^{n_k}$ is the standard binary expansion of $N$; then we have the following sufficient condition for an optimal function.
Lemma 3.12 (Sufficient condition for an optimal function). Let $f^*$ be a function that generates random bits from an arbitrary Markov chain with unknown transition probabilities. If for any $\alpha$ and any $n\times n$ nonnegative integer matrix $K$ with $\sum_{i=1}^{n}\sum_{j=1}^{n} k_{ij} = N-1$ the following equation is satisfied,
\[
\sum_{X\in S(\alpha,K)} |f^*(X)| = \alpha(|S(\alpha,K)|),
\]
then $f^*$ generates independent unbiased random bits with optimal information efficiency. Note that $|f^*(X)|$ is the length of $f^*(X)$ and $|S(\alpha,K)|$ is the size of $S(\alpha,K)$.
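For instance, if $|S(\alpha,K)| = 11 = 2^3 + 2^1 + 2^0$, then $\alpha(11) = 3\cdot 2^3 + 1\cdot 2^1 + 0\cdot 2^0 = 26$, so the condition asks that the total output length over the $11$ sequences in $S(\alpha,K)$ be exactly $26$ bits; this is achieved, for example, by an Elias-type assignment in which eight of the sequences receive $3$ bits, two receive $1$ bit, and one receives $0$ bits, which is precisely what step 4 of Algorithm C below does.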
Proof. Let $h$ denote an arbitrary function that is able to generate random bits from any Markov chain. According to lemma 2.9 in [88], we know that
\[
\sum_{X\in S(\alpha,K)} |h(X)| \le \alpha(|S(\alpha,K)|).
\]
Then the average output length of $h$ per input symbol is
\begin{align*}
\frac{E(|h(X)|)}{N} &= \frac{1}{N}\sum_{(\alpha,K)}\ \sum_{X\in S(\alpha,K)} |h(X)|\,\phi_K\,P[x_1=s_\alpha] \\
&\le \frac{1}{N}\sum_{(\alpha,K)} \alpha(|S(\alpha,K)|)\,\phi_K\,P[x_1=s_\alpha] \\
&= \frac{1}{N}\sum_{(\alpha,K)}\ \sum_{X\in S(\alpha,K)} |f^*(X)|\,\phi_K\,P[x_1=s_\alpha] \\
&= \frac{E(|f^*(X)|)}{N}.
\end{align*}
So $f^*$ is optimal. This completes the proof.
Here, we construct the following algorithm (Algorithm C) which satisfies all the conditions in lemma 3.11 and lemma 3.12. As a result, it can generate unbiased random bits from an arbitrary Markov chain with optimal information efficiency.
Algorithm C

Input: A sequence $X = x_1x_2\ldots x_N$ produced by a Markov chain, where $x_i\in S=\{s_1,s_2,\ldots,s_n\}$.

Output: A sequence $Y$ of $0$'s and $1$'s.

Main Function:

1) Get the matrix $K=\{k_{ij}\}$ with $k_{ij} = k_j(\pi_i(X))$.

2) Define $S(X)$ as
\[
S(X) = \{X' \mid k_j(\pi_i(X')) = k_{ij}\ \forall\, i,j;\ x'_1 = x_1\},
\]
then compute $|S(X)|$.

3) Compute the rank $r(X)$ of $X$ in $S(X)$ with respect to a given order. The rank with respect to a lexicographic order will be given later.

4) According to $|S(X)|$ and $r(X)$, determine the output sequence. Let $\sum_k 2^{n_k}$ with $n_1 > n_2 > \cdots$ be the standard binary expansion of $|S(X)|$, and assume the starting value of $r(X)$ is $0$. If $r(X) < 2^{n_1}$, the output is the $n_1$-digit binary representation of $r(X)$. If $\sum_{k=1}^{i} 2^{n_k} \le r(X) < \sum_{k=1}^{i+1} 2^{n_k}$, the output is the $n_{i+1}$-digit binary representation of $r(X) - \sum_{k=1}^{i} 2^{n_k}$.

Comment: Fast calculations of $|S(X)|$ and $r(X)$ will be given in the rest of this section.
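To make the four steps concrete, here is a minimal Python sketch of Algorithm C. The helper names are ours; $S(X)$ and the rank are obtained by brute-force enumeration (the fast computations come later in this section), the rank is taken with respect to the plain lexicographic order of the sequences (one valid choice of the order in step 3, though not the $\theta$/$\sigma$ order defined below), and step 4 follows the Elias-style output rule above. This is only practical for short inputs.

\begin{verbatim}
from itertools import permutations

def exit_sequences(X, states):
    # pi_i(X): the states that immediately follow each occurrence of s_i in X.
    pi = {s: [] for s in states}
    for a, b in zip(X, X[1:]):
        pi[a].append(b)
    return pi

def counts(X, states):
    # k_ij = number of s_j's in pi_i(X).
    pi = exit_sequences(X, states)
    return {(i, j): pi[i].count(j) for i in states for j in states}

def enumerate_S(X, states):
    # Brute force: all sequences X' with x'_1 = x_1 and the same counts k_ij as X.
    K = counts(X, states)
    return sorted(set(
        Xp for Xp in permutations(X)
        if Xp[0] == X[0] and counts(Xp, states) == K
    ))

def algorithm_C(X, states):
    S = enumerate_S(X, states)      # step 2: S(X) and |S(X)|
    r = S.index(tuple(X))           # step 3: rank of X in S(X)
    size = len(S)
    # Step 4: walk the binary expansion of |S(X)| from the largest power down;
    # output r (minus the consumed prefix) in n_k bits once it falls inside
    # the current block of size 2^{n_k}.
    for nk in range(size.bit_length() - 1, -1, -1):
        if not (size >> nk) & 1:
            continue
        if r < (1 << nk):
            return format(r, 'b').zfill(nk) if nk > 0 else ''
        r -= 1 << nk
    return ''

# Example with two states (7 input symbols, so the brute force is feasible).
print(algorithm_C([1, 2, 1, 1, 2, 2, 1], [1, 2]))
\end{verbatim}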
In algorithm A, when we use Elias's function as $\Psi$, the limiting efficiency $\eta_N = \frac{E[M]}{N}$ (as $N\to\infty$) realizes the bound $\frac{H(X)}{N}$. Algorithm C is optimal, so it has the same or higher efficiency. Therefore, the limiting efficiency of algorithm C as $N\to\infty$ also realizes the bound $\frac{H(X)}{N}$.
In algorithm C, for an input sequence $X$ with $x_N = s_\chi$, we can rank it with respect to the lexicographic order of $\theta(X)$ and $\sigma(X)$. Here, we define
\[
\theta(X) = (\pi_1(X)[|\pi_1(X)|], \ldots, \pi_n(X)[|\pi_n(X)|]),
\]
which is the vector of the last symbols of $\pi_i(X)$ for $1\le i\le n$. And $\sigma(X)$ is the complement of $\theta(X)$ in $\pi(X)$, namely,
\[
\sigma(X) = (\pi_1(X)[1,|\pi_1(X)|-1], \ldots, \pi_n(X)[1,|\pi_n(X)|-1]).
\]
For example, when the input sequence is
\[
X = s_1s_4s_2s_1s_3s_2s_3s_1s_1s_2s_3s_4s_1,
\]
its exit sequences are
\[
\pi(X) = [s_4s_3s_1s_2,\ s_1s_3s_3,\ s_2s_1s_4,\ s_2s_1].
\]
Then for this input sequence $X$, we have that
\[
\theta(X) = [s_2, s_3, s_4, s_1],
\qquad
\sigma(X) = [s_4s_3s_1,\ s_1s_3,\ s_2s_1,\ s_2].
\]
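For concreteness, the following short Python sketch computes the exit sequences $\pi(X)$, the last-symbol vector $\theta(X)$, and the complement $\sigma(X)$ for the example above. The symbols $s_1,\ldots,s_4$ are represented by the integers 1 to 4, the function name is ours, and we assume every state occurs before position $N$ so that no exit sequence is empty.

\begin{verbatim}
def exit_sequences(X, n):
    # pi_i(X): the states that immediately follow each occurrence of s_i in X.
    pi = [[] for _ in range(n)]
    for a, b in zip(X, X[1:]):
        pi[a - 1].append(b)
    return pi

X = [1, 4, 2, 1, 3, 2, 3, 1, 1, 2, 3, 4, 1]
pi = exit_sequences(X, 4)
theta = [p[-1] for p in pi]      # last symbol of each exit sequence
sigma = [p[:-1] for p in pi]     # exit sequences with their last symbol removed

print(pi)     # [[4, 3, 1, 2], [1, 3, 3], [2, 1, 4], [2, 1]]
print(theta)  # [2, 3, 4, 1]
print(sigma)  # [[4, 3, 1], [1, 3], [2, 1], [2]]
\end{verbatim}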
Based on the lexicographic order defined above, both $|S(X)|$ and $r(X)$ can be obtained using a brute-force search. However, this approach is not computationally efficient. Here, we describe an efficient algorithm for computing $|S(X)|$ and $r(X)$ when $n$ is a small constant, such that algorithm C is computable in $O(N\log^3 N\log\log N)$ time. This method is inspired by the algorithm for computing the Elias function that is described in [99]. However, when $n$ is not small, the complexity of computing $|S(X)|$ (or $r(X)$) has an exponential dependence on $n$, which makes this algorithm much slower than the previous algorithms.
Lemma 3.13. Let
\[
Z = \prod_{i=1}^{n} \frac{(k_{i1}+k_{i2}+\cdots+k_{in})!}{k_{i1}!\,k_{i2}!\cdots k_{in}!},
\]
and let $N = \sum_{i=1}^{n}\sum_{j=1}^{n} k_{ij}$; then $Z$ is computable in $O(N\log^3 N\log\log N)$ time (independently of $n$).
Proof. It is known that, given two numbers of $b$ bits each, their multiplication or division is computable in $O(b\log b\log\log b)$ time based on the Schönhage-Strassen algorithm [4]. We can calculate $Z$ based on this fast multiplication.

For simplicity, we denote $k_i = \sum_{j=1}^{n} k_{ij}$. Note that we can write $Z$ as a product of $N$ terms: for each $1\le i\le n$,
\[
\frac{k_i!}{k_{i1}!\,k_{i2}!\cdots k_{in}!} = \prod_{m=1}^{k_i} \frac{a^{(i)}_m}{b^{(i)}_m},
\]
where the numerators $a^{(i)}_1, a^{(i)}_2, \ldots, a^{(i)}_{k_i}$ are $1, 2, \ldots, k_i$ and the denominators $b^{(i)}_1, b^{(i)}_2, \ldots, b^{(i)}_{k_i}$ are $1, 2, \ldots, k_{i1}, 1, 2, \ldots, k_{i2}, \ldots, 1, 2, \ldots, k_{in}$. Collecting these fractions over all $1\le i\le n$ gives $N$ terms in total, which we denote by
\[
\rho^0_1, \rho^0_2, \ldots, \rho^0_{N-1}, \rho^0_N.
\]
It is easy to see that every $\rho^0_i$ can be represented using $2\log_2 N$ bits ($\log_2 N$ for the numerator and $\log_2 N$ for the denominator). The total time to compute all of them is much less than $O(N\log^3 N\log\log N)$.

Based on this notation, we write $Z$ as
\[
Z = \rho^0_1\rho^0_2\cdots\rho^0_{N-1}\rho^0_N.
\]
Suppose that $\log_2 N$ is an integer; otherwise, we can add trivial terms (equal to $1$) to the product above to make $\log_2 N$ an integer. In order to calculate $Z$ quickly, the following calculations are performed:
\[
\rho^s_i = \rho^{s-1}_{2i-1}\rho^{s-1}_{2i},
\qquad s = 1,2,\ldots,\log_2 N;\ \ i = 1,2,\ldots,2^{-s}N.
\]
Then we are able to compute $Z$ iteratively and finally get
\[
Z = \rho^{\log_2 N}_1.
\]
To calculate $\rho^1_i$ for $i = 1,2,\ldots,N/2$, it takes $2(N/2)$ multiplications of numbers of length $\log_2 N$ bits. Similarly, to calculate $\rho^s_i$ for $i = 1,2,\ldots,N/2^s$, it takes $2(N/2^s)$ multiplications of numbers of length at most $2^{s}\log_2 N$ bits. So the time complexity of computing $Z$ is
\[
\sum_{s=1}^{\log_2 N} 2(N/2^s)\, O\bigl(2^{s}\log_2 N\,\log(2^{s}\log_2 N)\,\log\log(2^{s}\log_2 N)\bigr).
\]
This value is not greater than
\[
O\bigl(N\log^2 N\,\log(N\log N)\,\log\log(N\log N)\bigr),
\]
which yields the result in the lemma.
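The divide-and-conquer product in this proof is easy to sketch in Python. The version below, which is ours, represents the terms with fractions.Fraction and relies on Python's built-in big-integer multiplication (subquadratic, though not exactly Schönhage-Strassen); it pairs the fractions level by level exactly as in the recursion $\rho^s_i = \rho^{s-1}_{2i-1}\rho^{s-1}_{2i}$ and compares the result against the direct multinomial product.

\begin{verbatim}
from fractions import Fraction
from math import factorial, prod

def Z_product_tree(K):
    # K is the n x n nonnegative integer matrix {k_ij}.
    # Build the small fractions rho^0_1, ..., rho^0_N described in the proof:
    # for row i, numerators 1..k_i and denominators 1..k_i1, 1..k_i2, ..., 1..k_in.
    rhos = []
    for row in K:
        numerators = range(1, sum(row) + 1)
        denominators = [d for k in row for d in range(1, k + 1)]
        rhos += [Fraction(a, b) for a, b in zip(numerators, denominators)]
    if not rhos:
        return Fraction(1)
    # Pairwise (balanced) multiplication: rho^s_i = rho^{s-1}_{2i-1} * rho^{s-1}_{2i}.
    while len(rhos) > 1:
        if len(rhos) % 2:
            rhos.append(Fraction(1))   # trivial padding term
        rhos = [rhos[m] * rhos[m + 1] for m in range(0, len(rhos), 2)]
    return rhos[0]                     # equals Z

# Example: compare against the direct product of multinomial coefficients.
K = [[2, 1, 0], [0, 3, 1], [1, 0, 2]]
direct = prod(factorial(sum(row)) // prod(map(factorial, row)) for row in K)
print(Z_product_tree(K), direct)       # 36 36
\end{verbatim}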
Lemma 3.14. Let $n$ be a small constant and $N$ be the input length; then $|S(X)|$ in algorithm C is computable in $O(N\log^3 N\log\log N)$ time.

Proof. The idea for computing $|S(X)|$ in algorithm C is to divide $S(X)$ into classes, denoted by $S(X,\theta)$ for the different possible $\theta$, such that
\[
S(X,\theta) = \{X' \mid \forall\, i,j,\ k_j(\pi_i(X')) = k_{ij},\ \theta(X') = \theta\},
\]
where $k_{ij} = k_j(\pi_i(X))$ is the number of $s_j$'s in $\pi_i(X)$ for all $1\le i,j\le n$, and $\theta(X)$ is the vector of the last symbols of $\pi(X)$ defined above. As a result, we have $|S(X)| = \sum_\theta |S(X,\theta)|$. Although it is not easy to calculate $|S(X)|$ directly, it is much easier to compute $|S(X,\theta)|$ for a given $\theta$.

For a given $\theta = (\theta_1,\theta_2,\ldots,\theta_n)$, we first need to determine whether $S(X,\theta)$ is empty or not. To do this, we construct a collection of exit sequences $\Lambda = [\Lambda_1,\Lambda_2,\ldots,\Lambda_n]$ by moving the first occurrence of $\theta_i$ in $\pi_i(X)$ to the end of $\pi_i(X)$, for all $1\le i\le n$. According to the main lemma, $S(X,\theta)$ is empty if and only if $\pi_i(X)$ does not include $\theta_i$ for some $i$, or $(x_1,\Lambda)$ is not feasible.
If $S(X,\theta)$ is not empty, then $(x_1,\Lambda)$ is feasible. In this case, based on the main lemma, we have
\begin{align*}
|S(X,\theta)| &= \prod_{i=1}^{n} \frac{(k_{i1}+k_{i2}+\cdots+k_{in}-1)!}{k_{i1}!\cdots(k_{i\theta_i}-1)!\cdots k_{in}!} \\
&= \left(\prod_{i=1}^{n} \frac{(k_{i1}+k_{i2}+\cdots+k_{in})!}{k_{i1}!\,k_{i2}!\cdots k_{in}!}\right)
\left(\prod_{i=1}^{n} \frac{k_{i\theta_i}}{k_{i1}+k_{i2}+\cdots+k_{in}}\right).
\end{align*}
Here, we let
\[
Z = \prod_{i=1}^{n} \frac{(k_{i1}+k_{i2}+\cdots+k_{in})!}{k_{i1}!\,k_{i2}!\cdots k_{in}!}.
\]
Then we can get
\[
|S(X)| = \sum_\theta |S(X,\theta)| = Z\left(\sum_\theta \prod_{i=1}^{n} \frac{k_{i\theta_i}}{k_{i1}+k_{i2}+\cdots+k_{in}}\right).
\]
According to lemma 3.13, $Z$ is computable in $O(N\log^3 N\log\log N)$ time. So if $n$ is a small constant, then $|S(X)|$ is also computable in $O(N\log^3 N\log\log N)$ time. However, when $n$ is not small, we have to enumerate all $O(n^n)$ possible choices of $\theta$, which is not computationally efficient.
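A small Python sketch of this decomposition is given below (the function name is ours). It computes $Z$ directly and sums the correction factor $\prod_i k_{i\theta_i}/(k_{i1}+\cdots+k_{in})$ over all candidate vectors $\theta$, as in the displayed formula; the feasibility test of the main lemma is not spelled out in this section and is omitted here, and every row sum is assumed to be positive.

\begin{verbatim}
from fractions import Fraction
from itertools import product
from math import factorial, prod

def size_S(K):
    # |S(X)| = Z * sum over theta of prod_i k_{i,theta_i} / k_i,
    # where k_i = k_i1 + ... + k_in (assumed positive for every i).
    n = len(K)
    Z = prod(factorial(sum(row)) // prod(map(factorial, row)) for row in K)
    row_sums = [sum(row) for row in K]
    total = Fraction(0)
    for theta in product(range(n), repeat=n):   # candidate last-symbol vectors
        factor = Fraction(1)
        for i in range(n):
            factor *= Fraction(K[i][theta[i]], row_sums[i])
        total += factor
    return Z * total

print(size_S([[2, 1], [2, 1]]))   # 9
\end{verbatim}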
Lemma 3.15. Let $n$ be a small constant and $N$ be the input length; then $r(X)$ in algorithm C is computable in $O(N\log^3 N\log\log N)$ time.

Proof. Based on the calculations in the lemma above, we can obtain $r(X)$ when $X$ is ranked with respect to the lexicographic order of $\theta(X)$ and $\sigma(X)$. Let $r(X,\theta(X))$ denote the rank of $X$ in $S(X,\theta(X))$; then we have
\[
r(X) = \sum_{\theta<\theta(X)} |S(X,\theta)| + r(X,\theta(X)),
\]
where $<$ is based on the lexicographic order. In this formula, when $n$ is a small constant, $\sum_{\theta<\theta(X)} |S(X,\theta)|$ can be obtained in $O(N\log^3 N\log\log N)$ time by computing
\[
Z \sum_{\theta<\theta(X)\,:\,|S(X,\theta)|>0} \frac{\prod_{i=1}^{n} k_{i\theta_i}}{\prod_{i=1}^{n} (k_{i1}+k_{i2}+\cdots+k_{in})},
\]
where $Z$ is defined in the last lemma and the second factor can be calculated quickly when $n$ is a small constant. (However, $n$ cannot be big, since the complexity of computing the second factor has an exponential dependence on $n$.)
It remains to compute $r(X,\theta(X))$, i.e., the rank of $X$ in $S(X,\theta(X))$ with respect to the lexicographic order of $\sigma(X)$. Here, we write $\sigma(X)$ as the concatenation of a group of sequences, namely
\[
\sigma(X) = \sigma_1(X)\ast\sigma_2(X)\ast\cdots\ast\sigma_n(X),
\]
such that for all $1\le i\le n$, $\sigma_i(X) = \pi_i(X)[1,|\pi_i(X)|-1]$.
There are $M = (N-1)-n$ symbols in $\sigma(X)$. Let $r_i(X)$ be the number of sequences in $S(X,\theta(X))$ whose first $M-i$ symbols are $\sigma(X)[1,M-i]$ and whose $(M-i+1)$th symbol is smaller than $\sigma(X)[M-i+1]$. Then we can get
\[
r(X,\theta(X)) = \sum_{i=1}^{M} r_i(X).
\]
Let us assume that $\sigma(X)[M-i+1] = s_{w_i}$ for some $w_i$, and that it is the $u_i$th symbol in $\sigma_{v_i}(X)$. For simplicity, we denote $\sigma_{v_i}(X)[u_i,|\sigma_{v_i}(X)|]$ by $\zeta_i$. For example, when $n=3$ and $[\sigma_1(X),\sigma_2(X),\sigma_3(X)] = [s_1s_2,\ s_2s_3,\ s_1s_1s_1]$, we have
\[
\zeta_1 = s_1,\quad \zeta_2 = s_1s_1,\quad \zeta_3 = s_1s_1s_1,\quad \zeta_4 = s_3,\quad \zeta_5 = s_2s_3,\ \ldots
\]
To calculate $r_i(X)$, we can count all the sequences generated by permuting the symbols of $\zeta_i, \sigma_{v_i+1}(X), \ldots, \sigma_n(X)$ such that the $(M-i+1)$th symbol of the resulting sequence is smaller than $s_{w_i}$. Then we can get
\[
r_i(X) = \sum_{j<w_i} \frac{(|\zeta_i|-1)!}{k_1(\zeta_i)!\cdots(k_j(\zeta_i)-1)!\cdots k_n(\zeta_i)!}\,
\prod_{l=v_i+1}^{n} \frac{|\sigma_l(X)|!}{k_1(\sigma_l(X))!\,k_2(\sigma_l(X))!\cdots k_n(\sigma_l(X))!},
\]
where $k_j(\cdot)$ counts the number of $s_j$'s in a sequence. Let us define the values
\[
\rho^0_{i-1} = \frac{|\zeta_i|}{k_{w_i}(\zeta_i)}
\]
for all $1\le i\le M$. In this expression, $k_{w_i}(\zeta_i)$ is the number of $s_{w_i}$'s in $\zeta_i$, and $s_{w_i}$ is the first symbol of $\zeta_i$.
It is easy to show that for $1\le i\le M$,
\[
\rho^0_{i-1}\rho^0_{i-2}\cdots\rho^0_2\rho^0_1 = \frac{|\zeta_i|!}{k_1(\zeta_i)!\cdots k_j(\zeta_i)!\cdots k_n(\zeta_i)!}\,
\prod_{l=v_i+1}^{n} \frac{|\sigma_l(X)|!}{k_1(\sigma_l(X))!\,k_2(\sigma_l(X))!\cdots k_n(\sigma_l(X))!}.
\]
If we also define the values
\[
\lambda^0_i = \frac{\sum_{j<w_i} k_j(\zeta_i)}{|\zeta_i|}
\]
for all $1\le i\le M$, then we have
\[
r_i(X) = \lambda^0_i\,\rho^0_{i-1}\rho^0_{i-2}\cdots\rho^0_1,
\]
and
\[
r(X,\theta(X)) = \sum_{i=1}^{M} \lambda^0_i\,\rho^0_{i-1}\rho^0_{i-2}\cdots\rho^0_2\rho^0_1.
\]
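Concretely, the quantities $\zeta_i$, $\rho^0_{i-1}$, and $\lambda^0_i$ can be read off from $\sigma(X)$ by a single backward scan, as in the following Python sketch (our own helper, with states represented by the integers $1,\ldots,n$ and $\sigma(X)$ scanned from the last symbol of $\sigma_n(X)$ backwards, matching the definition of $r_i(X)$):

\begin{verbatim}
from fractions import Fraction

def lambda_rho_from_sigma(sigma, n):
    # sigma = [sigma_1(X), ..., sigma_n(X)] as lists of state indices in 1..n.
    # Returns ([lambda^0_1, ..., lambda^0_M], [rho^0_0, ..., rho^0_{M-1}]).
    lambdas, rhos = [], []
    for v in range(n - 1, -1, -1):          # sigma_n(X), ..., sigma_1(X)
        counts = [0] * (n + 1)              # symbol counts of the current zeta_i
        for u in range(len(sigma[v]) - 1, -1, -1):
            w = sigma[v][u]                 # s_{w_i}: first symbol of zeta_i
            counts[w] += 1
            size = sum(counts)              # |zeta_i|
            rhos.append(Fraction(size, counts[w]))             # rho^0_{i-1}
            lambdas.append(Fraction(sum(counts[1:w]), size))    # lambda^0_i
    return lambdas, rhos

# The example above: sigma(X) = [s1 s2, s2 s3, s1 s1 s1] with n = 3.
lam, rho = lambda_rho_from_sigma([[1, 2], [2, 3], [1, 1, 1]], 3)
print([str(f) for f in rho[:5]])   # ['1', '1', '1', '1', '2']
\end{verbatim}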
Suppose that $\log_2 M$ is an integer; otherwise, we can add trivial terms to the formula above to make $\log_2 M$ an integer. In order to quickly calculate $r(X,\theta(X))$, the following calculations are performed for $s$ from $1$ to $\log_2 M$:
\begin{align*}
\rho^s_i &= \rho^{s-1}_{2i}\,\rho^{s-1}_{2i-1}, & i &= 1,2,\ldots,2^{-s}M-1, \\
\lambda^s_i &= \lambda^{s-1}_{2i-1} + \lambda^{s-1}_{2i}\,\rho^{s-1}_{2i-1}, & i &= 1,2,\ldots,2^{-s}M.
\end{align*}
By computing all $\rho^s_i$ and $\lambda^s_i$ for $s$ from $1$ to $\log_2 M$ iteratively, we can get
\[
r(X,\theta(X)) = \lambda^{\log_2 M}_1.
\]
Now, we use the same idea as in [99] to analyze the computational complexity. Note that every $\rho^0_i$ and $\lambda^0_i$ can be represented using $2\log_2 M$ bits ($\log_2 M$ for the numerator and $\log_2 M$ for the denominator), and all of them can be calculated quickly. To calculate $\rho^1_i$ for $i = 1,2,\ldots,M/2-1$, it takes at most $2(M/2)$ multiplications of numbers of length $\log_2 M$ bits. To calculate $\lambda^1_i$ for $i = 1,2,\ldots,M/2$, it takes $3(M/2)$ multiplications of numbers of length $\log_2 M$ bits; this is because we can write $\lambda^1_i$ as $\frac{a}{b}+\frac{c}{d} = \frac{ad+bc}{bd}$ for some integers $a,b,c,d$ of length $\log_2 M$ bits. Similarly, to calculate all $\rho^s_i$ and $\lambda^s_i$ for a given $s$, it takes at most $5(M/2^s)$ multiplications of numbers of length $2^{s}\log_2 M$ bits. As a result, the time complexity of computing $r(X,\theta(X))$ is
\[
\sum_{s=1}^{\log_2 M} 5(M/2^s)\, O\bigl(2^{s}\log_2 M\,\log(2^{s}\log_2 M)\,\log\log(2^{s}\log_2 M)\bigr),
\]
which is $O(M\log^3 M\log\log M)$. As a result, for a small constant $n$, $r(X)$ is computable in $O(N\log^3 N\log\log N)$ time.
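The doubling recursion for $r(X,\theta(X))$ can be sketched directly from these formulas. The helper below (with names of our choosing) takes the lists $[\lambda^0_1,\ldots,\lambda^0_M]$ and $[\rho^0_1,\ldots,\rho^0_{M-1}]$ as fractions.Fraction values and combines them pairwise, exactly as in $\rho^s_i = \rho^{s-1}_{2i}\rho^{s-1}_{2i-1}$ and $\lambda^s_i = \lambda^{s-1}_{2i-1} + \lambda^{s-1}_{2i}\rho^{s-1}_{2i-1}$; padding with trivial terms plays the role of making $\log_2 M$ an integer.

\begin{verbatim}
from fractions import Fraction

def rank_from_lambda_rho(lam, rho):
    # lam = [lambda^0_1, ..., lambda^0_M], rho = [rho^0_1, ..., rho^0_{M-1}].
    # Returns sum_{i=1}^M lambda^0_i * rho^0_{i-1} * ... * rho^0_1,
    # computed by pairwise combining as in the proof.
    lam, rho = list(lam), list(rho)
    while len(lam) > 1:
        if len(lam) % 2:                     # pad with a trivial block
            lam.append(Fraction(0))
            rho.append(Fraction(1))
        new_lam, new_rho = [], []
        for i in range(0, len(lam), 2):
            # lambda^s = lambda^{s-1}_{2i-1} + lambda^{s-1}_{2i} * rho^{s-1}_{2i-1}
            new_lam.append(lam[i] + lam[i + 1] * rho[i])
            if i + 1 < len(rho):
                # rho^s = rho^{s-1}_{2i} * rho^{s-1}_{2i-1}
                new_rho.append(rho[i + 1] * rho[i])
        lam, rho = new_lam, new_rho
    return lam[0]

# Tiny check against the direct formula.
lam = [Fraction(1, 2), Fraction(1, 3), Fraction(2, 5)]
rho = [Fraction(3, 2), Fraction(5, 3)]
direct = lam[0] + lam[1] * rho[0] + lam[2] * rho[1] * rho[0]
print(rank_from_lambda_rho(lam, rho), direct)   # 2 2
\end{verbatim}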
Based on the discussion above, we know that algorithm C is computable in $O(N\log^3 N\log\log N)$ time when $n$ is a small constant. However, when $n$ is not a constant, this algorithm is not computationally efficient, since its time complexity depends exponentially on $n$.