Random Number Generation from Markov Chains
3.6 Algorithm C: An Optimal Algorithm
Let us define $\alpha(N) = \sum_k n_k 2^{n_k}$, where $N = \sum_k 2^{n_k}$ with $n_1 > n_2 > \cdots$ is the standard binary expansion of $N$. Assume $\Psi$ is Elias's function; then
\[
\eta_i(\varpi) = \frac{1}{\varpi}\sum_{k_1+k_2+\cdots+k_n=\varpi} \alpha\!\left(\frac{\varpi!}{k_1!\,k_2!\cdots k_n!}\right) p_{i1}^{k_1} p_{i2}^{k_2}\cdots p_{in}^{k_n}.
\]
Based on this formula, we can numerically study the relationship between the limiting efficiency and the window size (see section 3.7). In fact, when the window size becomes large ($\varpi \to \infty$), the limiting efficiency approaches the information-theoretic upper bound.
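To illustrate how this expression can be evaluated numerically, the following Python sketch (with function names of our choosing) computes $\alpha(N)$ from the binary expansion and then $\eta_i(\varpi)$ for one row of transition probabilities by enumerating all compositions $k_1+\cdots+k_n=\varpi$; the enumeration grows exponentially in $n$, so it is only meant for small windows.

\begin{verbatim}
from math import factorial
from itertools import product

def alpha(N):
    # alpha(N) = sum_k n_k * 2^{n_k}, where N = sum_k 2^{n_k} is the
    # standard binary expansion of N.
    return sum(k * (1 << k) for k in range(N.bit_length()) if (N >> k) & 1)

def limiting_efficiency(p_row, window):
    # eta_i(varpi) = (1/varpi) * sum over k_1+...+k_n = varpi of
    #   alpha(varpi! / (k_1! ... k_n!)) * p_i1^k_1 * ... * p_in^k_n
    n = len(p_row)
    total = 0.0
    for ks in product(range(window + 1), repeat=n):
        if sum(ks) != window:
            continue
        multinomial = factorial(window)
        prob = 1.0
        for k, p in zip(ks, p_row):
            multinomial //= factorial(k)
            prob *= p ** k
        total += alpha(multinomial) * prob
    return total / window

# Example: one row of a two-state transition matrix, window size 8.
print(limiting_efficiency([0.3, 0.7], 8))
\end{verbatim}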
Proof (of lemma 3.11). It is easy to see that if $|S(\alpha,K)\cap B_Y| = |S(\alpha,K)\cap B_{Y'}|$ for all $(\alpha,K)$ whenever $|Y| = |Y'|$, then $Y$ and $Y'$ have the same probability of being generated. In this case, $f$ can generate random bits from an arbitrary Markov chain. It remains to prove the converse claim.
If $f$ can generate random bits from an arbitrary Markov chain, then $P[f(X)=Y] = P[f(X)=Y']$ for any two binary sequences $Y$ and $Y'$ of the same length. Letting $p_{ij}$ be the transition probability from state $s_i$ to state $s_j$ for all $1\le i,j\le n$, we can write
\[
P[f(X)=Y] = \sum_{\alpha,\,K\in G} |S(\alpha,K)\cap B_Y|\,\phi_K(p_{11},p_{12},\ldots,p_{nn})\,P(x_1=s_\alpha),
\]
where
\[
G = \Bigl\{K \,\Big|\, k_{ij}\in\{0\}\cup\mathbb{Z}^+,\ \sum_{i,j} k_{ij} = N-1\Bigr\},
\]
and
\[
\phi_K(p_{11},p_{12},\ldots,p_{nn}) = \prod_{i=1}^{n}\prod_{j=1}^{n} p_{ij}^{k_{ij}}.
\]
Similarly,
\[
P[f(X)=Y'] = \sum_{\alpha,\,K\in G} |S(\alpha,K)\cap B_{Y'}|\,\phi_K(p_{11},p_{12},\ldots,p_{nn})\,P(x_1=s_\alpha).
\]
As a result,
\[
\sum_{\alpha,\,K\in G} \bigl(|S(\alpha,K)\cap B_{Y'}| - |S(\alpha,K)\cap B_Y|\bigr)\,\phi_K(p_{11},\ldots,p_{nn})\,P(x_1=s_\alpha) = 0.
\]
Since $P(x_1=s_\alpha)$ can be any value in $[0,1]$, for all $1\le\alpha\le n$ we have
\[
\sum_{K\in G} \bigl(|S(\alpha,K)\cap B_{Y'}| - |S(\alpha,K)\cap B_Y|\bigr)\,\phi_K(p_{11},\ldots,p_{nn}) = 0.
\]
It can be proved that the functions in $\bigcup_{K\in G}\{\phi_K(p_{11},p_{12},\ldots,p_{nn})\}$ are linearly independent in the vector space of functions on the set of transition probabilities, namely
\[
\{(p_{11},p_{12},\ldots,p_{nn}) \mid p_{ij}\in[0,1],\ \sum_{j=1}^{n} p_{ij} = 1\}.
\]
Based on this fact, we can conclude that $|S(\alpha,K)\cap B_Y| = |S(\alpha,K)\cap B_{Y'}|$ for all $(\alpha,K)$ if $|Y| = |Y'|$.
Let us define $\alpha(N) = \sum_k n_k 2^{n_k}$, where $N = \sum_k 2^{n_k}$ is the standard binary expansion of $N$; then we have the following sufficient condition for an optimal function.
Lemma 3.12 (Sufficient condition for an optimal function). Let $f^*$ be a function that generates random bits from an arbitrary Markov chain with unknown transition probabilities. If for any $\alpha$ and any $n\times n$ nonnegative integer matrix $K$ with $\sum_{i=1}^{n}\sum_{j=1}^{n} k_{ij} = N-1$ the following equation is satisfied,
\[
\sum_{X\in S(\alpha,K)} |f^*(X)| = \alpha(|S(\alpha,K)|),
\]
then $f^*$ generates independent unbiased random bits with optimal information efficiency. Note that $|f^*(X)|$ is the length of $f^*(X)$ and $|S(\alpha,K)|$ is the size of $S(\alpha,K)$.
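For instance, if $|S(\alpha,K)| = 11 = 2^3 + 2^1 + 2^0$, then $\alpha(11) = 3\cdot 2^3 + 1\cdot 2^1 + 0\cdot 2^0 = 26$, so the condition asks that the total output length over the $11$ sequences in $S(\alpha,K)$ be exactly $26$ bits; this is achieved, for example, by an Elias-type assignment in which eight of the sequences receive $3$ bits, two receive $1$ bit, and one receives $0$ bits, which is precisely what step 4 of Algorithm C below does.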
Proof. Let $h$ denote an arbitrary function that is able to generate random bits from any Markov chain. According to lemma 2.9 in [88], we know that
\[
\sum_{X\in S(\alpha,K)} |h(X)| \le \alpha(|S(\alpha,K)|).
\]
Then the average output length of $h$ per input symbol is
\begin{align*}
\frac{E(|h(X)|)}{N} &= \frac{1}{N}\sum_{(\alpha,K)}\ \sum_{X\in S(\alpha,K)} |h(X)|\,\phi_K\,P[x_1=s_\alpha] \\
&\le \frac{1}{N}\sum_{(\alpha,K)} \alpha(|S(\alpha,K)|)\,\phi_K\,P[x_1=s_\alpha] \\
&= \frac{1}{N}\sum_{(\alpha,K)}\ \sum_{X\in S(\alpha,K)} |f^*(X)|\,\phi_K\,P[x_1=s_\alpha] \\
&= \frac{E(|f^*(X)|)}{N}.
\end{align*}
So $f^*$ is optimal. This completes the proof.
Here, we construct the following algorithm (Algorithm C) which satisfies all the conditions in lemma 3.11 and lemma 3.12. As a result, it can generate unbiased random bits from an arbitrary Markov chain with optimal information efficiency.
Algorithm C

Input: A sequence $X = x_1x_2\ldots x_N$ produced by a Markov chain, where $x_i\in S=\{s_1,s_2,\ldots,s_n\}$.

Output: A sequence $Y$ of $0$'s and $1$'s.

Main Function:

1) Get the matrix $K=\{k_{ij}\}$ with $k_{ij} = k_j(\pi_i(X))$.

2) Define $S(X)$ as
\[
S(X) = \{X' \mid k_j(\pi_i(X')) = k_{ij}\ \forall\, i,j;\ x'_1 = x_1\},
\]
then compute $|S(X)|$.

3) Compute the rank $r(X)$ of $X$ in $S(X)$ with respect to a given order. The rank with respect to a lexicographic order will be given later.

4) According to $|S(X)|$ and $r(X)$, determine the output sequence. Let $\sum_k 2^{n_k}$ with $n_1 > n_2 > \cdots$ be the standard binary expansion of $|S(X)|$, and assume the starting value of $r(X)$ is $0$. If $r(X) < 2^{n_1}$, the output is the $n_1$-digit binary representation of $r(X)$. If $\sum_{k=1}^{i} 2^{n_k} \le r(X) < \sum_{k=1}^{i+1} 2^{n_k}$, the output is the $n_{i+1}$-digit binary representation of $r(X) - \sum_{k=1}^{i} 2^{n_k}$.

Comment: Fast calculations of $|S(X)|$ and $r(X)$ will be given in the rest of this section.
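To make the four steps concrete, here is a minimal Python sketch of Algorithm C. The helper names are ours; $S(X)$ and the rank are obtained by brute-force enumeration (the fast computations come later in this section), the rank is taken with respect to the plain lexicographic order of the sequences (one valid choice of the order in step 3, though not the $\theta$/$\sigma$ order defined below), and step 4 follows the Elias-style output rule above. This is only practical for short inputs.

\begin{verbatim}
from itertools import permutations

def exit_sequences(X, states):
    # pi_i(X): the states that immediately follow each occurrence of s_i in X.
    pi = {s: [] for s in states}
    for a, b in zip(X, X[1:]):
        pi[a].append(b)
    return pi

def counts(X, states):
    # k_ij = number of s_j's in pi_i(X).
    pi = exit_sequences(X, states)
    return {(i, j): pi[i].count(j) for i in states for j in states}

def enumerate_S(X, states):
    # Brute force: all sequences X' with x'_1 = x_1 and the same counts k_ij as X.
    K = counts(X, states)
    return sorted(set(
        Xp for Xp in permutations(X)
        if Xp[0] == X[0] and counts(Xp, states) == K
    ))

def algorithm_C(X, states):
    S = enumerate_S(X, states)      # step 2: S(X) and |S(X)|
    r = S.index(tuple(X))           # step 3: rank of X in S(X)
    size = len(S)
    # Step 4: walk the binary expansion of |S(X)| from the largest power down;
    # output r (minus the consumed prefix) in n_k bits once it falls inside
    # the current block of size 2^{n_k}.
    for nk in range(size.bit_length() - 1, -1, -1):
        if not (size >> nk) & 1:
            continue
        if r < (1 << nk):
            return format(r, 'b').zfill(nk) if nk > 0 else ''
        r -= 1 << nk
    return ''

# Example with two states (7 input symbols, so the brute force is feasible).
print(algorithm_C([1, 2, 1, 1, 2, 2, 1], [1, 2]))
\end{verbatim}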
In algorithm A, when we use Elias's function as $\Psi$, the limiting efficiency $\eta_N = \frac{E[M]}{N}$ (as $N\to\infty$) realizes the bound $\frac{H(X)}{N}$. Algorithm C is optimal, so it has the same or higher efficiency. Therefore, the limiting efficiency of algorithm C as $N\to\infty$ also realizes the bound $\frac{H(X)}{N}$.
In algorithm C, for an input sequence $X$ with $x_N = s_\chi$, we can rank it with respect to the lexicographic order of $\theta(X)$ and $\sigma(X)$. Here, we define
\[
\theta(X) = (\pi_1(X)[|\pi_1(X)|], \ldots, \pi_n(X)[|\pi_n(X)|]),
\]
which is the vector of the last symbols of $\pi_i(X)$ for $1\le i\le n$. And $\sigma(X)$ is the complement of $\theta(X)$ in $\pi(X)$, namely,
\[
\sigma(X) = (\pi_1(X)[1,|\pi_1(X)|-1], \ldots, \pi_n(X)[1,|\pi_n(X)|-1]).
\]
For example, when the input sequence is
\[
X = s_1s_4s_2s_1s_3s_2s_3s_1s_1s_2s_3s_4s_1,
\]
its exit sequences are
\[
\pi(X) = [s_4s_3s_1s_2,\ s_1s_3s_3,\ s_2s_1s_4,\ s_2s_1].
\]
Then for this input sequence $X$, we have that
\[
\theta(X) = [s_2, s_3, s_4, s_1],
\qquad
\sigma(X) = [s_4s_3s_1,\ s_1s_3,\ s_2s_1,\ s_2].
\]
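For concreteness, the following short Python sketch computes the exit sequences $\pi(X)$, the last-symbol vector $\theta(X)$, and the complement $\sigma(X)$ for the example above. The symbols $s_1,\ldots,s_4$ are represented by the integers 1 to 4, the function name is ours, and we assume every state occurs before position $N$ so that no exit sequence is empty.

\begin{verbatim}
def exit_sequences(X, n):
    # pi_i(X): the states that immediately follow each occurrence of s_i in X.
    pi = [[] for _ in range(n)]
    for a, b in zip(X, X[1:]):
        pi[a - 1].append(b)
    return pi

X = [1, 4, 2, 1, 3, 2, 3, 1, 1, 2, 3, 4, 1]
pi = exit_sequences(X, 4)
theta = [p[-1] for p in pi]      # last symbol of each exit sequence
sigma = [p[:-1] for p in pi]     # exit sequences with their last symbol removed

print(pi)     # [[4, 3, 1, 2], [1, 3, 3], [2, 1, 4], [2, 1]]
print(theta)  # [2, 3, 4, 1]
print(sigma)  # [[4, 3, 1], [1, 3], [2, 1], [2]]
\end{verbatim}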
Based on the lexicographic order defined above, both $|S(X)|$ and $r(X)$ can be obtained using a brute-force search. However, this approach is not computationally efficient. Here, we describe an efficient algorithm for computing $|S(X)|$ and $r(X)$ when $n$ is a small constant, such that algorithm C is computable in $O(N\log^3 N\log\log N)$ time. This method is inspired by the algorithm for computing the Elias function that is described in [99]. However, when $n$ is not small, the complexity of computing $|S(X)|$ (or $r(X)$) has an exponential dependence on $n$, which makes this algorithm much slower than the previous algorithms.
Lemma 3.13. Let
\[
Z = \prod_{i=1}^{n} \frac{(k_{i1}+k_{i2}+\cdots+k_{in})!}{k_{i1}!\,k_{i2}!\cdots k_{in}!},
\]
and let $N = \sum_{i=1}^{n}\sum_{j=1}^{n} k_{ij}$; then $Z$ is computable in $O(N\log^3 N\log\log N)$ time (independently of $n$).
Proof. It is known that, given two numbers of $b$ bits each, their multiplication or division is computable in $O(b\log b\log\log b)$ time based on the Schönhage-Strassen algorithm [4]. We can calculate $Z$ based on this fast multiplication.

For simplicity, we denote $k_i = \sum_{j=1}^{n} k_{ij}$. Note that we can write $Z$ as a product of $N$ terms: for each $1\le i\le n$,
\[
\frac{k_i!}{k_{i1}!\,k_{i2}!\cdots k_{in}!} = \prod_{m=1}^{k_i} \frac{a^{(i)}_m}{b^{(i)}_m},
\]
where the numerators $a^{(i)}_1, a^{(i)}_2, \ldots, a^{(i)}_{k_i}$ are $1, 2, \ldots, k_i$ and the denominators $b^{(i)}_1, b^{(i)}_2, \ldots, b^{(i)}_{k_i}$ are $1, 2, \ldots, k_{i1}, 1, 2, \ldots, k_{i2}, \ldots, 1, 2, \ldots, k_{in}$. Collecting these fractions over all $1\le i\le n$ gives $N$ terms in total, which we denote by
\[
\rho^0_1, \rho^0_2, \ldots, \rho^0_{N-1}, \rho^0_N.
\]
It is easy to see that every $\rho^0_i$ can be represented using $2\log_2 N$ bits ($\log_2 N$ for the numerator and $\log_2 N$ for the denominator). The total time to compute all of them is much less than $O(N\log^3 N\log\log N)$.

Based on this notation, we write $Z$ as
\[
Z = \rho^0_1\rho^0_2\cdots\rho^0_{N-1}\rho^0_N.
\]
Suppose that $\log_2 N$ is an integer; otherwise, we can add trivial terms (equal to $1$) to the product above to make $\log_2 N$ an integer. In order to calculate $Z$ quickly, the following calculations are performed:
\[
\rho^s_i = \rho^{s-1}_{2i-1}\rho^{s-1}_{2i},
\qquad s = 1,2,\ldots,\log_2 N;\ \ i = 1,2,\ldots,2^{-s}N.
\]
Then we are able to compute $Z$ iteratively and finally get
\[
Z = \rho^{\log_2 N}_1.
\]
To calculate $\rho^1_i$ for $i = 1,2,\ldots,N/2$, it takes $2(N/2)$ multiplications of numbers of length $\log_2 N$ bits. Similarly, to calculate $\rho^s_i$ for $i = 1,2,\ldots,N/2^s$, it takes $2(N/2^s)$ multiplications of numbers of length at most $2^{s}\log_2 N$ bits. So the time complexity of computing $Z$ is
\[
\sum_{s=1}^{\log_2 N} 2(N/2^s)\, O\bigl(2^{s}\log_2 N\,\log(2^{s}\log_2 N)\,\log\log(2^{s}\log_2 N)\bigr).
\]
This value is not greater than
\[
O\bigl(N\log^2 N\,\log(N\log N)\,\log\log(N\log N)\bigr),
\]
which yields the result in the lemma.
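The divide-and-conquer product in this proof is easy to sketch in Python. The version below, which is ours, represents the terms with fractions.Fraction and relies on Python's built-in big-integer multiplication (subquadratic, though not exactly Schönhage-Strassen); it pairs the fractions level by level exactly as in the recursion $\rho^s_i = \rho^{s-1}_{2i-1}\rho^{s-1}_{2i}$ and compares the result against the direct multinomial product.

\begin{verbatim}
from fractions import Fraction
from math import factorial, prod

def Z_product_tree(K):
    # K is the n x n nonnegative integer matrix {k_ij}.
    # Build the small fractions rho^0_1, ..., rho^0_N described in the proof:
    # for row i, numerators 1..k_i and denominators 1..k_i1, 1..k_i2, ..., 1..k_in.
    rhos = []
    for row in K:
        numerators = range(1, sum(row) + 1)
        denominators = [d for k in row for d in range(1, k + 1)]
        rhos += [Fraction(a, b) for a, b in zip(numerators, denominators)]
    if not rhos:
        return Fraction(1)
    # Pairwise (balanced) multiplication: rho^s_i = rho^{s-1}_{2i-1} * rho^{s-1}_{2i}.
    while len(rhos) > 1:
        if len(rhos) % 2:
            rhos.append(Fraction(1))   # trivial padding term
        rhos = [rhos[m] * rhos[m + 1] for m in range(0, len(rhos), 2)]
    return rhos[0]                     # equals Z

# Example: compare against the direct product of multinomial coefficients.
K = [[2, 1, 0], [0, 3, 1], [1, 0, 2]]
direct = prod(factorial(sum(row)) // prod(map(factorial, row)) for row in K)
print(Z_product_tree(K), direct)       # 36 36
\end{verbatim}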
Lemma 3.14. Let $n$ be a small constant and $N$ be the input length; then $|S(X)|$ in algorithm C is computable in $O(N\log^3 N\log\log N)$ time.

Proof. The idea for computing $|S(X)|$ in algorithm C is to divide $S(X)$ into classes, denoted by $S(X,\theta)$ for the different possible $\theta$, such that
\[
S(X,\theta) = \{X' \mid \forall\, i,j,\ k_j(\pi_i(X')) = k_{ij},\ \theta(X') = \theta\},
\]
where $k_{ij} = k_j(\pi_i(X))$ is the number of $s_j$'s in $\pi_i(X)$ for all $1\le i,j\le n$, and $\theta(X)$ is the vector of the last symbols of $\pi(X)$ defined above. As a result, we have $|S(X)| = \sum_\theta |S(X,\theta)|$. Although it is not easy to calculate $|S(X)|$ directly, it is much easier to compute $|S(X,\theta)|$ for a given $\theta$.

For a given $\theta = (\theta_1,\theta_2,\ldots,\theta_n)$, we first need to determine whether $S(X,\theta)$ is empty or not. To do this, we construct a collection of exit sequences $\Lambda = [\Lambda_1,\Lambda_2,\ldots,\Lambda_n]$ by moving the first occurrence of $\theta_i$ in $\pi_i(X)$ to the end of $\pi_i(X)$, for all $1\le i\le n$. According to the main lemma, $S(X,\theta)$ is empty if and only if $\pi_i(X)$ does not include $\theta_i$ for some $i$, or $(x_1,\Lambda)$ is not feasible.
If $S(X,\theta)$ is not empty, then $(x_1,\Lambda)$ is feasible. In this case, based on the main lemma, we have
\begin{align*}
|S(X,\theta)| &= \prod_{i=1}^{n} \frac{(k_{i1}+k_{i2}+\cdots+k_{in}-1)!}{k_{i1}!\cdots(k_{i\theta_i}-1)!\cdots k_{in}!} \\
&= \left(\prod_{i=1}^{n} \frac{(k_{i1}+k_{i2}+\cdots+k_{in})!}{k_{i1}!\,k_{i2}!\cdots k_{in}!}\right)
\left(\prod_{i=1}^{n} \frac{k_{i\theta_i}}{k_{i1}+k_{i2}+\cdots+k_{in}}\right).
\end{align*}
Here, we let
\[
Z = \prod_{i=1}^{n} \frac{(k_{i1}+k_{i2}+\cdots+k_{in})!}{k_{i1}!\,k_{i2}!\cdots k_{in}!}.
\]
Then we can get
\[
|S(X)| = \sum_\theta |S(X,\theta)| = Z\left(\sum_\theta \prod_{i=1}^{n} \frac{k_{i\theta_i}}{k_{i1}+k_{i2}+\cdots+k_{in}}\right).
\]
According to lemma 3.13, $Z$ is computable in $O(N\log^3 N\log\log N)$ time. So if $n$ is a small constant, then $|S(X)|$ is also computable in $O(N\log^3 N\log\log N)$ time. However, when $n$ is not small, we have to enumerate all $O(n^n)$ possible choices of $\theta$, which is not computationally efficient.
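A small Python sketch of this decomposition is given below (the function name is ours). It computes $Z$ directly and sums the correction factor $\prod_i k_{i\theta_i}/(k_{i1}+\cdots+k_{in})$ over all candidate vectors $\theta$, as in the displayed formula; the feasibility test of the main lemma is not spelled out in this section and is omitted here, and every row sum is assumed to be positive.

\begin{verbatim}
from fractions import Fraction
from itertools import product
from math import factorial, prod

def size_S(K):
    # |S(X)| = Z * sum over theta of prod_i k_{i,theta_i} / k_i,
    # where k_i = k_i1 + ... + k_in (assumed positive for every i).
    n = len(K)
    Z = prod(factorial(sum(row)) // prod(map(factorial, row)) for row in K)
    row_sums = [sum(row) for row in K]
    total = Fraction(0)
    for theta in product(range(n), repeat=n):   # candidate last-symbol vectors
        factor = Fraction(1)
        for i in range(n):
            factor *= Fraction(K[i][theta[i]], row_sums[i])
        total += factor
    return Z * total

print(size_S([[2, 1], [2, 1]]))   # 9
\end{verbatim}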
Lemma 3.15. Let $n$ be a small constant and $N$ be the input length; then $r(X)$ in algorithm C is computable in $O(N\log^3 N\log\log N)$ time.

Proof. Based on the calculations in the lemma above, we can obtain $r(X)$ when $X$ is ranked with respect to the lexicographic order of $\theta(X)$ and $\sigma(X)$. Let $r(X,\theta(X))$ denote the rank of $X$ in $S(X,\theta(X))$; then we have
\[
r(X) = \sum_{\theta<\theta(X)} |S(X,\theta)| + r(X,\theta(X)),
\]
where $<$ is based on the lexicographic order. In this formula, when $n$ is a small constant, $\sum_{\theta<\theta(X)} |S(X,\theta)|$ can be obtained in $O(N\log^3 N\log\log N)$ time by computing
\[
Z \sum_{\theta<\theta(X)\,:\,|S(X,\theta)|>0} \frac{\prod_{i=1}^{n} k_{i\theta_i}}{\prod_{i=1}^{n} (k_{i1}+k_{i2}+\cdots+k_{in})},
\]
where $Z$ is defined in the last lemma and the second factor can be calculated quickly when $n$ is a small constant. (However, $n$ cannot be big, since the complexity of computing the second factor has an exponential dependence on $n$.)
It remains to compute $r(X,\theta(X))$, i.e., the rank of $X$ in $S(X,\theta(X))$ with respect to the lexicographic order of $\sigma(X)$. Here, we write $\sigma(X)$ as the concatenation of a group of sequences, namely
\[
\sigma(X) = \sigma_1(X)\ast\sigma_2(X)\ast\cdots\ast\sigma_n(X),
\]
such that for all $1\le i\le n$, $\sigma_i(X) = \pi_i(X)[1,|\pi_i(X)|-1]$.
There are $M = (N-1)-n$ symbols in $\sigma(X)$. Let $r_i(X)$ be the number of sequences in $S(X,\theta(X))$ whose first $M-i$ symbols are $\sigma(X)[1,M-i]$ and whose $(M-i+1)$th symbol is smaller than $\sigma(X)[M-i+1]$. Then we can get
\[
r(X,\theta(X)) = \sum_{i=1}^{M} r_i(X).
\]
Let us assume that $\sigma(X)[M-i+1] = s_{w_i}$ for some $w_i$, and that it is the $u_i$th symbol in $\sigma_{v_i}(X)$. For simplicity, we denote $\sigma_{v_i}(X)[u_i,|\sigma_{v_i}(X)|]$ by $\zeta_i$. For example, when $n=3$ and $[\sigma_1(X),\sigma_2(X),\sigma_3(X)] = [s_1s_2,\ s_2s_3,\ s_1s_1s_1]$, we have
\[
\zeta_1 = s_1,\quad \zeta_2 = s_1s_1,\quad \zeta_3 = s_1s_1s_1,\quad \zeta_4 = s_3,\quad \zeta_5 = s_2s_3,\ \ldots
\]
To calculate $r_i(X)$, we can count all the sequences generated by permuting the symbols of $\zeta_i, \sigma_{v_i+1}(X), \ldots, \sigma_n(X)$ such that the $(M-i+1)$th symbol of the resulting sequence is smaller than $s_{w_i}$. Then we can get
\[
r_i(X) = \sum_{j<w_i} \frac{(|\zeta_i|-1)!}{k_1(\zeta_i)!\cdots(k_j(\zeta_i)-1)!\cdots k_n(\zeta_i)!}\,
\prod_{l=v_i+1}^{n} \frac{|\sigma_l(X)|!}{k_1(\sigma_l(X))!\,k_2(\sigma_l(X))!\cdots k_n(\sigma_l(X))!},
\]
where $k_j(\cdot)$ counts the number of $s_j$'s in a sequence. Let us define the values
\[
\rho^0_{i-1} = \frac{|\zeta_i|}{k_{w_i}(\zeta_i)}
\]
for all $1\le i\le M$. In this expression, $k_{w_i}(\zeta_i)$ is the number of $s_{w_i}$'s in $\zeta_i$, and $s_{w_i}$ is the first symbol of $\zeta_i$.
It is easy to show that for $1\le i\le M$,
\[
\rho^0_{i-1}\rho^0_{i-2}\cdots\rho^0_2\rho^0_1 = \frac{|\zeta_i|!}{k_1(\zeta_i)!\cdots k_j(\zeta_i)!\cdots k_n(\zeta_i)!}\,
\prod_{l=v_i+1}^{n} \frac{|\sigma_l(X)|!}{k_1(\sigma_l(X))!\,k_2(\sigma_l(X))!\cdots k_n(\sigma_l(X))!}.
\]
If we also define the values
\[
\lambda^0_i = \frac{\sum_{j<w_i} k_j(\zeta_i)}{|\zeta_i|}
\]
for all $1\le i\le M$, then we have
\[
r_i(X) = \lambda^0_i\,\rho^0_{i-1}\rho^0_{i-2}\cdots\rho^0_1,
\]
and
\[
r(X,\theta(X)) = \sum_{i=1}^{M} \lambda^0_i\,\rho^0_{i-1}\rho^0_{i-2}\cdots\rho^0_2\rho^0_1.
\]
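Concretely, the quantities $\zeta_i$, $\rho^0_{i-1}$, and $\lambda^0_i$ can be read off from $\sigma(X)$ by a single backward scan, as in the following Python sketch (our own helper, with states represented by the integers $1,\ldots,n$ and $\sigma(X)$ scanned from the last symbol of $\sigma_n(X)$ backwards, matching the definition of $r_i(X)$):

\begin{verbatim}
from fractions import Fraction

def lambda_rho_from_sigma(sigma, n):
    # sigma = [sigma_1(X), ..., sigma_n(X)] as lists of state indices in 1..n.
    # Returns ([lambda^0_1, ..., lambda^0_M], [rho^0_0, ..., rho^0_{M-1}]).
    lambdas, rhos = [], []
    for v in range(n - 1, -1, -1):          # sigma_n(X), ..., sigma_1(X)
        counts = [0] * (n + 1)              # symbol counts of the current zeta_i
        for u in range(len(sigma[v]) - 1, -1, -1):
            w = sigma[v][u]                 # s_{w_i}: first symbol of zeta_i
            counts[w] += 1
            size = sum(counts)              # |zeta_i|
            rhos.append(Fraction(size, counts[w]))             # rho^0_{i-1}
            lambdas.append(Fraction(sum(counts[1:w]), size))    # lambda^0_i
    return lambdas, rhos

# The example above: sigma(X) = [s1 s2, s2 s3, s1 s1 s1] with n = 3.
lam, rho = lambda_rho_from_sigma([[1, 2], [2, 3], [1, 1, 1]], 3)
print([str(f) for f in rho[:5]])   # ['1', '1', '1', '1', '2']
\end{verbatim}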
Suppose that $\log_2 M$ is an integer; otherwise, we can add trivial terms to the formula above to make $\log_2 M$ an integer. In order to quickly calculate $r(X,\theta(X))$, the following calculations are performed for $s$ from $1$ to $\log_2 M$:
\begin{align*}
\rho^s_i &= \rho^{s-1}_{2i}\,\rho^{s-1}_{2i-1}, & i &= 1,2,\ldots,2^{-s}M-1, \\
\lambda^s_i &= \lambda^{s-1}_{2i-1} + \lambda^{s-1}_{2i}\,\rho^{s-1}_{2i-1}, & i &= 1,2,\ldots,2^{-s}M.
\end{align*}
By computing all $\rho^s_i$ and $\lambda^s_i$ for $s$ from $1$ to $\log_2 M$ iteratively, we can get
\[
r(X,\theta(X)) = \lambda^{\log_2 M}_1.
\]
Now, we use the same idea as in [99] to analyze the computational complexity. Note that every $\rho^0_i$ and $\lambda^0_i$ can be represented using $2\log_2 M$ bits ($\log_2 M$ for the numerator and $\log_2 M$ for the denominator), and all of them can be calculated quickly. To calculate $\rho^1_i$ for $i = 1,2,\ldots,M/2-1$, it takes at most $2(M/2)$ multiplications of numbers of length $\log_2 M$ bits. To calculate $\lambda^1_i$ for $i = 1,2,\ldots,M/2$, it takes $3(M/2)$ multiplications of numbers of length $\log_2 M$ bits; this is because we can write $\lambda^1_i$ as $\frac{a}{b}+\frac{c}{d} = \frac{ad+bc}{bd}$ for some integers $a,b,c,d$ of length $\log_2 M$ bits. Similarly, to calculate all $\rho^s_i$ and $\lambda^s_i$ for a given $s$, it takes at most $5(M/2^s)$ multiplications of numbers of length $2^{s}\log_2 M$ bits. As a result, the time complexity of computing $r(X,\theta(X))$ is
\[
\sum_{s=1}^{\log_2 M} 5(M/2^s)\, O\bigl(2^{s}\log_2 M\,\log(2^{s}\log_2 M)\,\log\log(2^{s}\log_2 M)\bigr),
\]
which is $O(M\log^3 M\log\log M)$. As a result, for a small constant $n$, $r(X)$ is computable in $O(N\log^3 N\log\log N)$ time.
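The doubling recursion for $r(X,\theta(X))$ can be sketched directly from these formulas. The helper below (with names of our choosing) takes the lists $[\lambda^0_1,\ldots,\lambda^0_M]$ and $[\rho^0_1,\ldots,\rho^0_{M-1}]$ as fractions.Fraction values and combines them pairwise, exactly as in $\rho^s_i = \rho^{s-1}_{2i}\rho^{s-1}_{2i-1}$ and $\lambda^s_i = \lambda^{s-1}_{2i-1} + \lambda^{s-1}_{2i}\rho^{s-1}_{2i-1}$; padding with trivial terms plays the role of making $\log_2 M$ an integer.

\begin{verbatim}
from fractions import Fraction

def rank_from_lambda_rho(lam, rho):
    # lam = [lambda^0_1, ..., lambda^0_M], rho = [rho^0_1, ..., rho^0_{M-1}].
    # Returns sum_{i=1}^M lambda^0_i * rho^0_{i-1} * ... * rho^0_1,
    # computed by pairwise combining as in the proof.
    lam, rho = list(lam), list(rho)
    while len(lam) > 1:
        if len(lam) % 2:                     # pad with a trivial block
            lam.append(Fraction(0))
            rho.append(Fraction(1))
        new_lam, new_rho = [], []
        for i in range(0, len(lam), 2):
            # lambda^s = lambda^{s-1}_{2i-1} + lambda^{s-1}_{2i} * rho^{s-1}_{2i-1}
            new_lam.append(lam[i] + lam[i + 1] * rho[i])
            if i + 1 < len(rho):
                # rho^s = rho^{s-1}_{2i} * rho^{s-1}_{2i-1}
                new_rho.append(rho[i + 1] * rho[i])
        lam, rho = new_lam, new_rho
    return lam[0]

# Tiny check against the direct formula.
lam = [Fraction(1, 2), Fraction(1, 3), Fraction(2, 5)]
rho = [Fraction(3, 2), Fraction(5, 3)]
direct = lam[0] + lam[1] * rho[0] + lam[2] * rho[1] * rho[0]
print(rank_from_lambda_rho(lam, rho), direct)   # 2 2
\end{verbatim}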
Based on the discussion above, we know that algorithm C is computable in $O(N\log^3 N\log\log N)$ time when $n$ is a small constant. However, when $n$ is not a constant, this algorithm is not computationally efficient, since its time complexity depends exponentially on $n$.