The Markov approximation of the sequences of N-valued random variables and a class of small deviation theorems

Wen Liu^a,∗, Weiguo Yang^b

^a Department of Mathematics, Hebei University of Technology, Tianjin 300130, People's Republic of China
^b Teaching and Research Division of Mathematics, Hebei Institute of Architectural Science and Technology, Handan 056038, People's Republic of China

∗ Corresponding author.
Received 2 March 1999; received in revised form 10 February 2000; accepted 14 February 2000
Abstract
Let {X_n, n ≥ 0} be a sequence of random variables on the probability space (Ω, F, P) taking values in the alphabet S = {1, 2, ..., N}, and let Q be another probability measure on F under which {X_n, n ≥ 0} is a Markov chain. Let h(P|Q) be the sample divergence rate of P with respect to Q relative to {X_n}. In this paper the Markov approximation of {X_n, n ≥ 0} under P is discussed by using the notion of h(P|Q), and a class of small deviation theorems for the averages of the bivariate functions of {X_n, n ≥ 0} is obtained. In the proof an analytic technique in the study of a.e. convergence together with the martingale convergence theorem is applied. © 2000 Elsevier Science B.V. All rights reserved.
MSC: primary 94A17; secondary 60F15
Keywords: Small deviation theorem; Entropy; Entropy density; Sample divergence rate; Shannon–McMillan theorem
1. Introduction
Let (Ω, F, P) be a probability space, and let {X_n, n ≥ 0} be a sequence of random variables taking values in S = {1, 2, ..., N} with the joint distribution

P(X_0 = x_0, ..., X_n = x_n) = p(x_0, ..., x_n),  x_i ∈ S, 0 ≤ i ≤ n.  (1)

Let

f_n(ω) = −n^{−1} log p(X_0, ..., X_n),  (2)

where log is the natural logarithm, ω is the sample point, and X_i stands for X_i(ω). f_n(ω) is called the entropy density of {X_i, 0 ≤ i ≤ n}. Also let Q be another probability measure on F, and set

Q(X_0 = x_0, ..., X_n = x_n) = q(x_0, ..., x_n),  x_i ∈ S, 0 ≤ i ≤ n.  (3)
If Q is a Markovian measure relative to {X_n, n ≥ 0}, then there exist a distribution on S

(q(1), q(2), ..., q(N)),  (4)

and a sequence of stochastic matrices

(p_n(i, j)),  i, j ∈ S, n ≥ 1,  (5)

such that

q(x_0, ..., x_n) = q(x_0) ∏_{k=1}^{n} p_k(x_{k−1}, x_k).  (6)
In this case

−n^{−1} log q(X_0, ..., X_n) = −n^{−1} [ log q(X_0) + ∑_{k=1}^{n} log p_k(X_{k−1}, X_k) ].  (7)
To avoid technicalities, assume that p(x_0, ..., x_n), q(x_0, ..., x_n), q(i), and p_n(i, j) are all positive.
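As a concrete, purely illustrative companion to (2), (6) and (7), the following minimal Python sketch simulates a short nonhomogeneous Markov chain under Q and evaluates −n^{−1} log q(X_0, ..., X_n) through the factorization (7); the alphabet size, initial distribution and transition matrices below are invented for the example and are not taken from the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    N = 3                                    # alphabet size; states are coded 0, ..., N-1 here
    q0 = np.array([0.5, 0.3, 0.2])           # hypothetical initial distribution (4)

    def P_k(k):
        """Hypothetical transition matrix p_k(i, j) of (5); rows sum to 1."""
        base = np.array([[0.6, 0.3, 0.1],
                         [0.2, 0.5, 0.3],
                         [0.3, 0.3, 0.4]])
        M = base + (0.1 / (k + 1)) * (np.eye(N) - 1.0 / N)   # small k-dependent perturbation
        return M / M.sum(axis=1, keepdims=True)

    n = 5000
    X = np.empty(n + 1, dtype=int)
    X[0] = rng.choice(N, p=q0)
    log_q = np.log(q0[X[0]])                 # accumulates log q(X_0, ..., X_n) as in (6)
    for k in range(1, n + 1):
        Pk = P_k(k)
        X[k] = rng.choice(N, p=Pk[X[k - 1]])
        log_q += np.log(Pk[X[k - 1], X[k]])

    # Entropy density under Q via (7): -(1/n)[log q(X_0) + sum_k log p_k(X_{k-1}, X_k)]
    print("entropy density under Q:", -log_q / n)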
Definition 1. Let p and q be given as in (1) and (3), and let

h(P|Q) = lim sup_n n^{−1} log [ p(X_0, ..., X_n) / q(X_0, ..., X_n) ].  (8)

h(P|Q) is called the sample divergence rate of P relative to Q.
If Q is Markovian relative to {X_n, n ≥ 0}, i.e., (6) holds, then

h(P|Q) = lim sup_n n^{−1} log [ p(X_0, ..., X_n) / ( q(X_0) ∏_{k=1}^{n} p_k(X_{k−1}, X_k) ) ].  (9)
Notice that h(P|Q) is the superior limit of the difference between the entropy density of {X_i, 0 ≤ i ≤ n} under Q and that under P.
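A minimal numerical sketch of (8) and (9), under the purely illustrative assumption that P makes {X_n} i.i.d. uniform on S while Q is a (here homogeneous) Markov measure: the finite-n quantity n^{−1} log[p(X_0, ..., X_n)/q(X_0, ..., X_n)], whose superior limit is h(P|Q), is evaluated along a path sampled under P. All numerical values are invented for the example.

    import numpy as np

    rng = np.random.default_rng(1)
    N = 3
    q0 = np.full(N, 1.0 / N)                 # hypothetical initial law, shared by P and Q
    Q_mat = np.array([[0.6, 0.3, 0.1],       # hypothetical transition matrix p_k(i, j) under Q
                      [0.2, 0.5, 0.3],
                      [0.3, 0.3, 0.4]])

    n = 20000
    X = rng.choice(N, size=n + 1)            # path sampled under P: i.i.d. uniform on S

    log_p = (n + 1) * np.log(1.0 / N)        # log p(X_0, ..., X_n) for the i.i.d. uniform P
    log_q = np.log(q0[X[0]]) + np.log(Q_mat[X[:-1], X[1:]]).sum()   # log q via (6)

    # Finite-n version of (8)/(9); its superior limit over n is h(P|Q).
    print("n^{-1} log[p/q] =", (log_p - log_q) / n)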
A question of importance in information theory is the limit properties of the entropy density. Since Shannon's initial work (Shannon, 1948) was published, there has been a great deal of investigation of this question (e.g., see Algoet and Cover, 1988; Barron, 1985; Breiman, 1957; Chung, 1961; Gray and Kieffer, 1980; Kieffer, 1973, 1974; Liu, 1990; Liu and Yang, 1996; McMillan, 1953).
In this paper a class of small deviation theorems (i.e., strong limit theorems represented by inequalities) for the averages of the bivariate functions of sequences of arbitrary N-valued random variables is established by using the notion of h(P|Q), and an extension of the Shannon–McMillan theorem to nonhomogeneous Markov information sources is given.
2. Main results
For convenience we first give a limit property of the likelihood ratio as a lemma.
Lemma 1. Letting p and q be given as in (1) and (3), we have

lim sup_n n^{−1} log [ q(X_0, ..., X_n) / p(X_0, ..., X_n) ] ≤ 0  P-a.e.  (10)

Proof. It is well known that the likelihood ratio Z_n = q(X_0, ..., X_n)/p(X_0, ..., X_n) is a nonnegative supermartingale which converges almost everywhere (under P) to a finite limit. This implies (10) directly. Obviously (10) implies that

h(P|Q) ≥ 0  P-a.e.  (11)
Theorem 1. Let {X_n, n ≥ 0} be a sequence of random variables on the measurable space (Ω, F) taking values in S = {1, 2, ..., N}, let P and Q be two probability measures on F such that {X_n, n ≥ 0} is Markovian under Q, i.e., (6) holds, let h(P|Q) be defined by (9), let {f_n(x, y), n ≥ 1} be a sequence of functions defined on S², and let c ≥ 0 be a constant. Let

D(c) = {ω: h(P|Q) ≤ c}.  (12)

Assume that there exist constants α > 0 and β > 0 such that for all i ∈ S,

b_α(i) = lim sup_{n→∞} n^{−1} ∑_{k=1}^{n} E_Q[ exp{α |f_k(X_{k−1}, X_k)|} | X_{k−1} = i ] ≤ β,  (13)

where E_Q denotes expectation under Q. Set

H_t = 2β / [ e² (t − α)² ],  (14)

where 0 < t < α. Then when 0 ≤ c ≤ t² H_t, we have

lim sup_{n→∞} | n^{−1} ∑_{k=1}^{n} { f_k(X_{k−1}, X_k) − E_Q[ f_k(X_{k−1}, X_k) | X_{k−1} ] } | ≤ 2√(c H_t)  P-a.e. on D(c).  (15)

In particular,

lim_{n→∞} n^{−1} ∑_{k=1}^{n} { f_k(X_{k−1}, X_k) − E_Q[ f_k(X_{k−1}, X_k) | X_{k−1} ] } = 0  P-a.e. on D(0).  (16)
Remark. Obviously h(P|Q) ≡ 0 if P = Q, and it has been shown in (11) that in the general case h(P|Q) ≥ 0 P-a.e. Hence h(P|Q) can be used to measure the deviation between {X_n, n ≥ 0} under P and that under Q. The smaller h(P|Q) is, the smaller the deviation will be.
Proof. Let

I_{x_0 ··· x_n} = {ω: X_k = x_k, x_k ∈ S, 0 ≤ k ≤ n}.

I_{x_0 ··· x_n} is called the nth-order cylinder set. We have by (1)

P(I_{x_0 ··· x_n}) = P(X_0 = x_0, ..., X_n = x_n) = p(x_0, ..., x_n).
Letting the collection consisting of Ω and all cylinders be denoted by A, and letting λ be a nonnegative constant, define a set function μ on A as follows:

μ(I_{x_0 ··· x_n}) = q(x_0) ∏_{k=1}^{n} exp{λ f_k(x_{k−1}, x_k)} p_k(x_{k−1}, x_k) / E_Q[ exp{λ f_k(X_{k−1}, X_k)} | X_{k−1} = x_{k−1} ],

μ(I_{x_0}) = q(x_0),

where the summation ∑_{x_n} below extends over all possible values of x_n. Noticing that

E_Q[ exp{λ f_n(X_{n−1}, X_n)} | X_{n−1} = x_{n−1} ] = ∑_{x_n} exp{λ f_n(x_{n−1}, x_n)} p_n(x_{n−1}, x_n),
it follows from (21) and (22) that

∑_{x_n} μ(I_{x_0 ··· x_n}) = μ(I_{x_0 ··· x_{n−1}}).

By virtue of the inequalities log x ≤ x − 1 (x > 0) and e^x − 1 − x ≤ (x²/2) e^{|x|}, and by (12) and (25), since P(A(√(c/H_t))) = 1 we have by (28),
lim sup_{n→∞} n^{−1} ∑_{k=1}^{n} { f_k(X_{k−1}, X_k) − E_Q[ f_k(X_{k−1}, X_k) | X_{k−1} ] } ≤ 2√(c H_t)  P-a.e. on D(c).  (29)
When c = 0, choose λ_i ∈ (0, t] (i = 1, 2, ...) such that λ_i → 0 (i → ∞), and let A* = ∩_{i=1}^{∞} A(λ_i). Then for all i we have by (26),

lim sup_{n→∞} n^{−1} ∑_{k=1}^{n} { f_k(X_{k−1}, X_k) − E_Q[ f_k(X_{k−1}, X_k) | X_{k−1} ] } ≤ λ_i H_t,  ω ∈ A* ∩ D(0).

Since P(A*) = 1, (29) also holds when c = 0.
When −α < −t < λ < 0, by virtue of (34) it can be shown in a similar way that

lim inf_{n→∞} n^{−1} ∑_{k=1}^{n} { f_k(X_{k−1}, X_k) − E_Q[ f_k(X_{k−1}, X_k) | X_{k−1} ] } ≥ −2√(c H_t)  P-a.e. on D(c).  (30)
Eq. (15) follows from (29) and (30), and (16) follows from (15) directly. This completes the proof of the theorem.
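To get a feel for the size of the bound in (15), a small sketch (with invented values of α, β, t and c, chosen so that 0 < t < α and 0 ≤ c ≤ t²H_t) evaluates H_t from (14) and the resulting deviation bound 2√(cH_t).

    import numpy as np

    alpha, beta = 1.0, 3.0                   # hypothetical constants from assumption (13)
    t = 0.5                                  # any 0 < t < alpha
    H_t = 2.0 * beta / (np.e ** 2 * (t - alpha) ** 2)    # eq. (14)

    for c in [0.0, 0.01, 0.05, t ** 2 * H_t]:            # admissible range: 0 <= c <= t^2 H_t
        print(f"c = {c:.4f}   2*sqrt(c*H_t) = {2.0 * np.sqrt(c * H_t):.4f}")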
Now we give two examples of function sequences {f_n(x, y), n ≥ 1} which are not bounded and satisfy the exponential assumption (13).
Example 1. Let

f_n(i, j) = −log p_n(i, j),   δ_n = min{p_n(i, j), i, j ∈ S} = p_n(i_n, j_n),  i_n, j_n ∈ S.

Assume that δ_n → 0 as n → ∞. Then f_n(i_n, j_n) → ∞, and {f_n, n ≥ 1} is not bounded. Let 0 < α < 1. For all i ∈ S, we have

E_Q[ exp{α |f_k(X_{k−1}, X_k)|} | X_{k−1} = i ] = ∑_{j=1}^{N} [p_k(i, j)]^{1−α} < N.

Hence the exponential assumption (13) is satisfied.
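A quick numerical check of the computation in Example 1, with an arbitrary (randomly generated) stochastic matrix standing in for (p_k(i, j)): for 0 < α < 1 the conditional exponential moment reduces to ∑_j [p_k(i, j)]^{1−α}, which indeed stays below N.

    import numpy as np

    N, alpha = 4, 0.5
    rng = np.random.default_rng(2)
    Pk = rng.dirichlet(np.ones(N), size=N)            # arbitrary stochastic matrix p_k(i, j)

    for i in range(N):
        fk = -np.log(Pk[i])                           # f_k(i, j) = -log p_k(i, j)
        moment = np.sum(np.exp(alpha * np.abs(fk)) * Pk[i])   # E_Q[exp{alpha|f_k|} | X_{k-1} = i]
        closed_form = np.sum(Pk[i] ** (1.0 - alpha))  # sum_j p_k(i, j)^{1 - alpha}
        print(i, moment, closed_form, "< N:", closed_form < N)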
Example 2. Let {g_n(x, y), n ≥ 1} be a sequence of functions defined on S² satisfying the following conditions:

0 ≤ g_n(x, y) ≤ M < ∞,  n ≥ 1,

δ_n = max{g_n(x, y), x, y ∈ S} = g_n(i_n, j_n) ≥ m > 0,

let {a_n, n ≥ 1} be a sequence of positive numbers satisfying the following conditions:

lim sup_n a_n = ∞,

lim sup_n n^{−1} ∑_{k=1}^{n} exp{a_k M} < ∞,

and let f_n(x, y) = a_n g_n(x, y). Then lim sup_n f_n(i_n, j_n) = ∞, and {f_n, n ≥ 1} is not bounded. For all i ∈ S, we have

E_Q[ exp{f_k(X_{k−1}, X_k)} | X_{k−1} = i ] = ∑_{j=1}^{N} exp{a_k g_k(i, j)} p_k(i, j) ≤ e^{a_k M}.

Hence the assumption (13) is satisfied.
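Similarly, a small check of the displayed bound in Example 2, with hypothetical g_k, a_k and a randomly generated transition matrix: the conditional exponential moment never exceeds e^{a_k M}.

    import numpy as np

    N, M_bound = 3, 2.0                               # M_bound plays the role of M
    rng = np.random.default_rng(3)
    Pk = rng.dirichlet(np.ones(N), size=N)            # hypothetical transition matrix p_k
    gk = rng.uniform(0.0, M_bound, size=(N, N))       # 0 <= g_k(x, y) <= M
    a_k = 0.7                                         # one term of the sequence {a_n}

    for i in range(N):
        lhs = np.sum(np.exp(a_k * gk[i]) * Pk[i])     # E_Q[exp{a_k g_k(i, X_k)} | X_{k-1} = i]
        print(i, lhs, "<= exp(a_k M):", lhs <= np.exp(a_k * M_bound))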
Theorem 2. Let

H_t = 2N / [ e² (t − 1)² ],  0 < t < 1.  (31)
Under the conditions of Theorem 1, when 0 ≤ c ≤ t² H_t, we have

lim sup_{n→∞} { f_n(ω) − n^{−1} ∑_{k=1}^{n} H[ p_k(X_{k−1}, 1), ..., p_k(X_{k−1}, N) ] } ≤ 2√(c H_t)  P-a.e. on D(c),  (32)

lim inf_{n→∞} { f_n(ω) − n^{−1} ∑_{k=1}^{n} H[ p_k(X_{k−1}, 1), ..., p_k(X_{k−1}, N) ] } ≥ −2√(c H_t) − c  P-a.e. on D(c),  (33)

where H(p_1, ..., p_N) denotes the entropy of the distribution (p_1, ..., p_N), i.e.,

H(p_1, ..., p_N) = −∑_{j=1}^{N} p_j log p_j.
Proof. In Theorem 1 let f_k(x, y) = −log p_k(x, y) (k ≥ 1) and α = 1. Since

E_Q[ exp{|f_k(X_{k−1}, X_k)|} | X_{k−1} = i ] = ∑_{j=1}^{N} exp{|−log p_k(i, j)|} p_k(i, j) = ∑_{j=1}^{N} p_k(i, j)/p_k(i, j) = N,

we have, for all i ∈ S,

b_1(i) = lim sup_{n→∞} n^{−1} ∑_{k=1}^{n} E_Q[ exp{|f_k(X_{k−1}, X_k)|} | X_{k−1} = i ] ≤ N.  (34)

Noticing that

E_Q[ −log p_k(X_{k−1}, X_k) | X_{k−1} ] = −∑_{j=1}^{N} p_k(X_{k−1}, j) log p_k(X_{k−1}, j) = H[ p_k(X_{k−1}, 1), ..., p_k(X_{k−1}, N) ],
when 0 ≤ c ≤ t² H_t we have by (34), (31), and (15),

lim sup_{n→∞} { n^{−1} ∑_{k=1}^{n} (−log p_k(X_{k−1}, X_k)) − n^{−1} ∑_{k=1}^{n} H[ p_k(X_{k−1}, 1), ..., p_k(X_{k−1}, N) ] } ≤ 2√(c H_t)  P-a.e. on D(c),

together with the corresponding lower bound −2√(c H_t) for the lim inf; (32) and (33) then follow. This completes the proof of Theorem 2.
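As a numerical illustration of (32)–(33) in the simplest situation P = Q (so that h(P|Q) ≡ 0 and the case c = 0 applies), the following sketch simulates a nonhomogeneous chain and compares the entropy density f_n(ω) with the averaged conditional entropies n^{−1} ∑_{k=1}^{n} H[p_k(X_{k−1}, 1), ..., p_k(X_{k−1}, N)]; the transition matrices are invented for the example.

    import numpy as np

    rng = np.random.default_rng(4)
    N = 3
    q0 = np.array([0.4, 0.4, 0.2])           # hypothetical initial distribution

    def P_k(k):
        """Hypothetical transition matrix p_k(i, j); varies slowly with k."""
        base = np.array([[0.50, 0.30, 0.20],
                         [0.25, 0.50, 0.25],
                         [0.20, 0.30, 0.50]])
        M = base + (0.05 / np.sqrt(k)) * (np.eye(N) - 1.0 / N)
        return M / M.sum(axis=1, keepdims=True)

    n = 50000
    X = np.empty(n + 1, dtype=int)
    X[0] = rng.choice(N, p=q0)
    log_p = np.log(q0[X[0]])                 # log p(X_0, ..., X_n); here P = Q is the Markov law
    cond_entropy_sum = 0.0
    for k in range(1, n + 1):
        row = P_k(k)[X[k - 1]]
        X[k] = rng.choice(N, p=row)
        log_p += np.log(row[X[k]])
        cond_entropy_sum += -np.sum(row * np.log(row))   # H[p_k(X_{k-1}, 1), ..., p_k(X_{k-1}, N)]

    f_n = -log_p / n                         # entropy density (2)
    print("f_n(omega)                  :", f_n)
    print("average conditional entropy :", cond_entropy_sum / n)
    print("difference (small for large n, by (32)-(33) with c = 0):", f_n - cond_entropy_sum / n)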
In the proof of Theorem 3 we have, when c > 0,

lim sup_{n→∞} n^{−1} ∑_{k=1}^{n} { f_k(X_{k−1}, X_k) − E_Q[ f_k(X_{k−1}, X_k) | X_{k−1} ] } ≤ M√c + cM/log(1 + √c) ≤ M(2√c + c),  ω ∈ A( (1/M) log(1 + √c) ) ∩ D(c).  (46)
Since P(A((1/M) log(1 + √c))) = 1, we have by (46),

lim sup_{n→∞} n^{−1} ∑_{k=1}^{n} { f_k(X_{k−1}, X_k) − E_Q[ f_k(X_{k−1}, X_k) | X_{k−1} ] } ≤ M(c + 2√c)  P-a.e. on D(c).  (47)
When λ < 0, it follows from (46) that

lim inf_{n→∞} n^{−1} ∑_{k=1}^{n} { f_k(X_{k−1}, X_k) − E_Q[ f_k(X_{k−1}, X_k) | X_{k−1} ] } ≥ −M(e^{−λM} − 1) + c/λ,  ω ∈ A(λ) ∩ D(c).  (48)
Taking λ = −(1/M) log(1 + √c) in (48), and using (45), we have, for c > 0,

lim inf_{n→∞} n^{−1} ∑_{k=1}^{n} { f_k(X_{k−1}, X_k) − E_Q[ f_k(X_{k−1}, X_k) | X_{k−1} ] } ≥ −M√c − cM/log(1 + √c) ≥ −M(c + 2√c),  ω ∈ A( −(1/M) log(1 + √c) ) ∩ D(c).  (49)
Since P(A(−(1/M) log(1 + √c))) = 1, it follows from (49) that

lim inf_{n→∞} n^{−1} ∑_{k=1}^{n} { f_k(X_{k−1}, X_k) − E_Q[ f_k(X_{k−1}, X_k) | X_{k−1} ] } ≥ −M(c + 2√c)  P-a.e. on D(c).  (50)
In a similar way it can be shown that (47) and (50) also hold when c = 0. Eq. (40) follows from (47) and (50) directly. This completes the proof of Theorem 3.
3. Some extensions of Shannon–McMillan theorem to nonhomogeneous Markov information sources
Theorem 4. Let {X_n, n ≥ 0} be, under the probability measure P, a nonhomogeneous Markov information source with state space S, initial distribution (4), and transition matrices (5), i.e.,

P(X_0 = x_0, ..., X_n = x_n) = p(x_0, ..., x_n) = q(x_0) ∏_{k=1}^{n} p_k(x_{k−1}, x_k).  (51)

Let i, j ∈ S, and let S_n(i, ω) be the number of occurrences of i in the sequence X_0(ω), X_1(ω), ..., X_{n−1}(ω), where δ_j(·) denotes the Kronecker delta function. Also let

(p(i, j)),  p(i, j) > 0,  i, j ∈ S.  (52)
Proof. Let Q be the homogeneous Markov measure determined by the initial distribution (4) and the transition matrix (52), i.e.,

q(x_0, ..., x_n) = q(x_0) ∏_{k=1}^{n} p(x_{k−1}, x_k).  (57)
In this case it follows from (9), (51), and (57) that

h(P|Q) = lim sup_{n→∞} n^{−1} ∑_{k=1}^{n} log [ p_k(X_{k−1}, X_k) / p(X_{k−1}, X_k) ].  (58)

It is easy to see from Theorem 3 that
Eqs. (59) and (60) imply that
By the entropy inequality we have
This implies that P(D(0)) = 1. Hence (54)–(56) can be established by using Theorem 3 (cf. Liu and Yang, 1996). This completes the proof of the theorem.
Corollary. Letting [a]^+ = max{a, 0}, if
Now we give an approach to constructing examples satisfying assumption (53).
Example. Let σ = min{p(i, j), i, j ∈ S}, let τ = max{p(i, j), i, j ∈ S}, and let (δ_n(i, j)), i, j ∈ S, be a sequence of matrices satisfying the following conditions:

∑_{j=1}^{N} [δ_n(i, j)]^+ = ∑_{j=1}^{N} [δ_n(i, j)]^−,  i ∈ S, n ≥ 1,

d_n = max{|δ_n(i, j)|, i, j ∈ S} < min{σ, 1 − τ},

∑_{n=1}^{∞} d_n / n < ∞.
Let p_n(i, j) = p(i, j) + δ_n(i, j). Then (p_n(i, j)) is a stochastic matrix, and by Kronecker's lemma we have

lim_{n→∞} n^{−1} ∑_{k=1}^{n} |p_k(i, j) − p(i, j)| ≤ lim_{n→∞} n^{−1} ∑_{k=1}^{n} d_k = 0.

This implies (53) evidently.
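A minimal sketch of this construction, with an invented base matrix (p(i, j)) and perturbations δ_n(i, j) whose rows sum to zero and whose sizes d_n satisfy the conditions above; it checks numerically that n^{−1} ∑_{k=1}^{n} |p_k(i, j) − p(i, j)| becomes small, in agreement with the limit displayed above.

    import numpy as np

    N = 3
    P_base = np.array([[0.5, 0.3, 0.2],      # hypothetical matrix (p(i, j)) of (52)
                       [0.3, 0.4, 0.3],
                       [0.2, 0.3, 0.5]])

    def delta_n(n):
        """Perturbation delta_n(i, j): rows sum to 0, |delta_n| <= d_n with sum_n d_n/n finite."""
        d_n = 0.05 / n ** 1.5                # d_n < min{sigma, 1 - tau} = 0.2 and sum d_n/n converges
        D = np.zeros((N, N))
        D[:, 0], D[:, 1] = d_n, -d_n         # each row: +d_n, -d_n, 0, so positive and negative parts balance
        return D

    n_max = 20000
    running = np.zeros((N, N))
    for k in range(1, n_max + 1):
        running += np.abs((P_base + delta_n(k)) - P_base)   # |p_k(i, j) - p(i, j)|
    print("max over (i, j) of n^{-1} sum_k |p_k - p|:", (running / n_max).max())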
4. For further reading
Billingsley, 1986; Cover and Thomas, 1991; Laha and Rohatgi, 1979; Karlin and Taylor, 1975
Acknowledgements
The authors are thankful to the referee for his valuable suggestions.
References
Algoet, P.H., Cover, T.M., 1988. A sandwich proof of the Shannon–McMillan–Breiman theorem. Ann. Probab. 16, 899–909.
Barron, A.R., 1985. The strong ergodic theorem for densities: generalized Shannon–McMillan–Breiman theorem. Ann. Probab. 13, 1292–1303.
Billingsley, P., 1986. Probability and Measure. Wiley, New York.
Breiman, L., 1957. The individual ergodic theorem of information theory. Ann. Math. Statist. 28, 809–811.
Chung, K.L., 1961. The ergodic theorem of information theory. Ann. Math. Statist. 32, 612–614.
Cover, T.M., Thomas, J.A., 1991. Elements of Information Theory. Wiley, New York.
Gray, R.M., 1990. Entropy and Information Theory. Springer, New York.
Gray, R.M., Kieffer, J.C., 1980. Asymptotically mean stationary measures. Ann. Probab. 8, 962–973.
Kieffer, J.C., 1973. A counterexample to Perez's generalization of the Shannon–McMillan theorem. Ann. Probab. 1, 362–364. Correction 4, 153–154, 1976.
Kieffer, J.C., 1974. A simple proof of the Moy–Perez generalization of the Shannon–McMillan theorem. Pacific J. Math. 51, 203–204.
Liu, W., 1990. Relative entropy densities and a class of limit theorems of the sequence of m-valued random variables. Ann. Probab. 18, 829–839.
Liu, W., Yang, W., 1996. An extension of Shannon–McMillan theorem and some limit properties for nonhomogeneous Markov chains. Stochastic Process. Appl. 61, 129–145.
McMillan, B., 1953. The basic theorems of information theory. Ann. Math. Statist. 24, 196–216.