The Markov approximation of the sequences of N-valued random variables and a class of small deviation theorems

Wen Liu^a,∗, Weiguo Yang^b

^a Department of Mathematics, Hebei University of Technology, Tianjin 300130, People's Republic of China
^b Teaching and Research Division of Mathematics, Hebei Institute of Architectural Science and Technology, Handan 056038, People's Republic of China

∗ Corresponding author.
Received 2 March 1999; received in revised form 10 February 2000; accepted 14 February 2000
Abstract
Let {X_n, n ≥ 0} be a sequence of random variables on the probability space (Ω, F, P) taking values in the alphabet S = {1, 2, ..., N}, and let Q be another probability measure on F under which {X_n, n ≥ 0} is a Markov chain. Let h(P|Q) be the sample divergence rate of P with respect to Q relative to {X_n}. In this paper the Markov approximation of {X_n, n ≥ 0} under P is discussed by using the notion of h(P|Q), and a class of small deviation theorems for the averages of the bivariate functions of {X_n, n ≥ 0} is obtained. In the proof an analytic technique in the study of a.e. convergence together with the martingale convergence theorem is applied. © 2000 Elsevier Science B.V. All rights reserved.
MSC: primary 94A17; secondary 60F15
Keywords: Small deviation theorem; Entropy; Entropy density; Sample divergence rate; Shannon–McMillan theorem
1. Introduction
Let (Ω, F, P) be a probability space, and let {X_n, n ≥ 0} be a sequence of random variables taking values in S = {1, 2, ..., N} with the joint distribution

P(X_0 = x_0, ..., X_n = x_n) = p(x_0, ..., x_n),  x_i ∈ S, 0 ≤ i ≤ n.  (1)

Let

f_n(ω) = −n^{−1} log p(X_0, ..., X_n),  (2)

where log is the natural logarithm, ω is the sample point, and X_i stands for X_i(ω). f_n(ω) is called the entropy density of {X_i, 0 ≤ i ≤ n}. Also let Q be another probability measure on F, and set

Q(X_0 = x_0, ..., X_n = x_n) = q(x_0, ..., x_n),  x_i ∈ S, 0 ≤ i ≤ n.  (3)
If Q is a Markovian measure relative to {X_n, n ≥ 0}, then there exist a distribution on S

(q(1), q(2), ..., q(N)),  (4)

and a sequence of stochastic matrices

(p_n(i, j)),  i, j ∈ S, n ≥ 1,  (5)

such that

q(x_0, ..., x_n) = q(x_0) ∏_{k=1}^{n} p_k(x_{k−1}, x_k).  (6)
In this case

−n^{−1} log q(X_0, ..., X_n) = −n^{−1} [ log q(X_0) + ∑_{k=1}^{n} log p_k(X_{k−1}, X_k) ].  (7)
To avoid technicalities, assume that p(x_0, ..., x_n), q(x_0, ..., x_n), q(i), and p_n(i, j) are all positive.
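As a concrete, purely illustrative companion to (2), (6) and (7), the following minimal Python sketch simulates a short nonhomogeneous Markov chain under Q and evaluates −n^{−1} log q(X_0, ..., X_n) through the factorization (7); the alphabet size, initial distribution and transition matrices below are invented for the example and are not taken from the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    N = 3                                    # alphabet size; states are coded 0, ..., N-1 here
    q0 = np.array([0.5, 0.3, 0.2])           # hypothetical initial distribution (4)

    def P_k(k):
        """Hypothetical transition matrix p_k(i, j) of (5); rows sum to 1."""
        base = np.array([[0.6, 0.3, 0.1],
                         [0.2, 0.5, 0.3],
                         [0.3, 0.3, 0.4]])
        M = base + (0.1 / (k + 1)) * (np.eye(N) - 1.0 / N)   # small k-dependent perturbation
        return M / M.sum(axis=1, keepdims=True)

    n = 5000
    X = np.empty(n + 1, dtype=int)
    X[0] = rng.choice(N, p=q0)
    log_q = np.log(q0[X[0]])                 # accumulates log q(X_0, ..., X_n) as in (6)
    for k in range(1, n + 1):
        Pk = P_k(k)
        X[k] = rng.choice(N, p=Pk[X[k - 1]])
        log_q += np.log(Pk[X[k - 1], X[k]])

    # Entropy density under Q via (7): -(1/n)[log q(X_0) + sum_k log p_k(X_{k-1}, X_k)]
    print("entropy density under Q:", -log_q / n)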
Definition 1. Let p and q be given as in (1) and (3), and let

h(P|Q) = lim sup_n n^{−1} log [ p(X_0, ..., X_n) / q(X_0, ..., X_n) ].  (8)

h(P|Q) is called the sample divergence rate of P relative to Q.
If Q is Markovian relative to {X_n, n ≥ 0}, i.e., (6) holds, then

h(P|Q) = lim sup_n n^{−1} log [ p(X_0, ..., X_n) / ( q(X_0) ∏_{k=1}^{n} p_k(X_{k−1}, X_k) ) ].  (9)
Notice that h(P|Q) is the superior limit of the difference between the entropy density of {X_i, 0 ≤ i ≤ n} under Q and that under P.
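A minimal numerical sketch of (8) and (9), under the purely illustrative assumption that P makes {X_n} i.i.d. uniform on S while Q is a (here homogeneous) Markov measure: the finite-n quantity n^{−1} log[p(X_0, ..., X_n)/q(X_0, ..., X_n)], whose superior limit is h(P|Q), is evaluated along a path sampled under P. All numerical values are invented for the example.

    import numpy as np

    rng = np.random.default_rng(1)
    N = 3
    q0 = np.full(N, 1.0 / N)                 # hypothetical initial law, shared by P and Q
    Q_mat = np.array([[0.6, 0.3, 0.1],       # hypothetical transition matrix p_k(i, j) under Q
                      [0.2, 0.5, 0.3],
                      [0.3, 0.3, 0.4]])

    n = 20000
    X = rng.choice(N, size=n + 1)            # path sampled under P: i.i.d. uniform on S

    log_p = (n + 1) * np.log(1.0 / N)        # log p(X_0, ..., X_n) for the i.i.d. uniform P
    log_q = np.log(q0[X[0]]) + np.log(Q_mat[X[:-1], X[1:]]).sum()   # log q via (6)

    # Finite-n version of (8)/(9); its superior limit over n is h(P|Q).
    print("n^{-1} log[p/q] =", (log_p - log_q) / n)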
A question of importance in information theory is the limit properties of the entropy density. Since Shannon's initial work (Shannon, 1948) was published, there has been a great deal of investigation of this question (e.g., see Algoet and Cover, 1988; Barron, 1985; Breiman, 1957; Chung, 1961; Gray and Kieffer, 1980; Kieffer, 1973, 1974; Liu, 1990; Liu and Yang, 1996; McMillan, 1953).
In this paper a class of small deviation theorems (i.e., strong limit theorems represented by inequalities) for the averages of the bivariate functions of sequences of arbitrary N-valued random variables is established by using the notion of h(P|Q), and an extension of the Shannon–McMillan theorem to nonhomogeneous Markov information sources is given.
2. Main results
For convenience we first give a limit property of the likelihood ratio as a lemma.
Lemma 1. Letting p and q be given as in (1) and (3), we have

lim sup_n n^{−1} log [ q(X_0, ..., X_n) / p(X_0, ..., X_n) ] ≤ 0  P-a.e.  (10)

Proof. It is well known that the likelihood ratio Z_n = q(X_0, ..., X_n)/p(X_0, ..., X_n) is a nonnegative supermartingale which converges almost everywhere (under P) to a finite limit. This implies (10) directly. Obviously (10) implies that

h(P|Q) ≥ 0  P-a.e.  (11)
Theorem 1. Let {X_n, n ≥ 0} be a sequence of random variables on the measurable space (Ω, F) taking values in S = {1, 2, ..., N}, let P and Q be two probability measures on F such that {X_n, n ≥ 0} is Markovian under Q, i.e., (6) holds, let h(P|Q) be defined by (9), let {f_n(x, y), n ≥ 1} be a sequence of functions defined on S², and let c ≥ 0 be a constant. Let

D(c) = {ω: h(P|Q) ≤ c}.  (12)

Assume that there exist constants α > 0 and β > 0 such that for all i ∈ S,

b_α(i) = lim sup_{n→∞} n^{−1} ∑_{k=1}^{n} E_Q[ exp{α |f_k(X_{k−1}, X_k)|} | X_{k−1} = i ] ≤ β,  (13)

where E_Q denotes expectation under Q. Set

H_t = 2β / [ e² (t − α)² ],  (14)

where 0 < t < α. Then when 0 ≤ c ≤ t² H_t, we have

lim sup_{n→∞} | n^{−1} ∑_{k=1}^{n} { f_k(X_{k−1}, X_k) − E_Q[ f_k(X_{k−1}, X_k) | X_{k−1} ] } | ≤ 2√(c H_t)  P-a.e. on D(c).  (15)

In particular,

lim_{n→∞} n^{−1} ∑_{k=1}^{n} { f_k(X_{k−1}, X_k) − E_Q[ f_k(X_{k−1}, X_k) | X_{k−1} ] } = 0  P-a.e. on D(0).  (16)
Remark. Obviously h(P|Q) ≡ 0 if P = Q, and it has been shown in (11) that in the general case h(P|Q) ≥ 0 P-a.e. Hence h(P|Q) can be used to measure the deviation between {X_n, n ≥ 0} under P and that under Q. The smaller h(P|Q) is, the smaller the deviation will be.
Proof. Let

I_{x_0 ··· x_n} = {ω: X_k = x_k, x_k ∈ S, 0 ≤ k ≤ n}.

I_{x_0 ··· x_n} is called the nth-order cylinder set. We have by (1)

P(I_{x_0 ··· x_n}) = P(X_0 = x_0, ..., X_n = x_n) = p(x_0, ..., x_n).
Letting the collection consisting of Ω and all cylinders be denoted by A, and letting λ be a nonnegative constant, define a set function μ on A as follows:

μ(I_{x_0 ··· x_n}) = q(x_0) ∏_{k=1}^{n} exp{λ f_k(x_{k−1}, x_k)} p_k(x_{k−1}, x_k) / E_Q[ exp{λ f_k(X_{k−1}, X_k)} | X_{k−1} = x_{k−1} ],

μ(I_{x_0}) = q(x_0),

where the summation ∑_{x_n} below extends over all possible values of x_n. Noticing that

E_Q[ exp{λ f_n(X_{n−1}, X_n)} | X_{n−1} = x_{n−1} ] = ∑_{x_n} exp{λ f_n(x_{n−1}, x_n)} p_n(x_{n−1}, x_n),
it follows from (21) and (22) that

∑_{x_n} μ(I_{x_0 ··· x_n}) = μ(I_{x_0 ··· x_{n−1}}).

By virtue of the inequalities log x ≤ x − 1 (x > 0) and e^x − 1 − x ≤ (x²/2) e^{|x|}, and by (12) and (25), since P(A(√(c/H_t))) = 1 we have by (28),
lim sup_{n→∞} n^{−1} ∑_{k=1}^{n} { f_k(X_{k−1}, X_k) − E_Q[ f_k(X_{k−1}, X_k) | X_{k−1} ] } ≤ 2√(c H_t)  P-a.e. on D(c).  (29)
When c = 0, choose λ_i ∈ (0, t] (i = 1, 2, ...) such that λ_i → 0 (i → ∞), and let A* = ∩_{i=1}^{∞} A(λ_i). Then for all i we have by (26),

lim sup_{n→∞} n^{−1} ∑_{k=1}^{n} { f_k(X_{k−1}, X_k) − E_Q[ f_k(X_{k−1}, X_k) | X_{k−1} ] } ≤ λ_i H_t,  ω ∈ A* ∩ D(0).

Since P(A*) = 1, (29) also holds when c = 0.
When −α < −t < λ < 0, by virtue of (34) it can be shown in a similar way that

lim inf_{n→∞} n^{−1} ∑_{k=1}^{n} { f_k(X_{k−1}, X_k) − E_Q[ f_k(X_{k−1}, X_k) | X_{k−1} ] } ≥ −2√(c H_t)  P-a.e. on D(c).  (30)
Eq. (15) follows from (29) and (30), and (16) follows from (15) directly. This completes the proof of the theorem.
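To get a feel for the size of the bound in (15), a small sketch (with invented values of α, β, t and c, chosen so that 0 < t < α and 0 ≤ c ≤ t²H_t) evaluates H_t from (14) and the resulting deviation bound 2√(cH_t).

    import numpy as np

    alpha, beta = 1.0, 3.0                   # hypothetical constants from assumption (13)
    t = 0.5                                  # any 0 < t < alpha
    H_t = 2.0 * beta / (np.e ** 2 * (t - alpha) ** 2)    # eq. (14)

    for c in [0.0, 0.01, 0.05, t ** 2 * H_t]:            # admissible range: 0 <= c <= t^2 H_t
        print(f"c = {c:.4f}   2*sqrt(c*H_t) = {2.0 * np.sqrt(c * H_t):.4f}")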
Now we give two examples of function sequences {f_n(x, y), n ≥ 1} which are not bounded and satisfy the exponential assumption (13).
Example 1. Let

f_n(i, j) = −log p_n(i, j),   δ_n = min{p_n(i, j), i, j ∈ S} = p_n(i_n, j_n),  i_n, j_n ∈ S.

Assume that δ_n → 0 as n → ∞. Then f_n(i_n, j_n) → ∞, and {f_n, n ≥ 1} is not bounded. Let 0 < α < 1. For all i ∈ S, we have

E_Q[ exp{α |f_k(X_{k−1}, X_k)|} | X_{k−1} = i ] = ∑_{j=1}^{N} [p_k(i, j)]^{1−α} < N.

Hence the exponential assumption (13) is satisfied.
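A quick numerical check of the computation in Example 1, with an arbitrary (randomly generated) stochastic matrix standing in for (p_k(i, j)): for 0 < α < 1 the conditional exponential moment reduces to ∑_j [p_k(i, j)]^{1−α}, which indeed stays below N.

    import numpy as np

    N, alpha = 4, 0.5
    rng = np.random.default_rng(2)
    Pk = rng.dirichlet(np.ones(N), size=N)            # arbitrary stochastic matrix p_k(i, j)

    for i in range(N):
        fk = -np.log(Pk[i])                           # f_k(i, j) = -log p_k(i, j)
        moment = np.sum(np.exp(alpha * np.abs(fk)) * Pk[i])   # E_Q[exp{alpha|f_k|} | X_{k-1} = i]
        closed_form = np.sum(Pk[i] ** (1.0 - alpha))  # sum_j p_k(i, j)^{1 - alpha}
        print(i, moment, closed_form, "< N:", closed_form < N)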
Example 2. Let {g_n(x, y), n ≥ 1} be a sequence of functions defined on S² satisfying the following conditions:

0 ≤ g_n(x, y) ≤ M < ∞,  n ≥ 1,

δ_n = max{g_n(x, y), x, y ∈ S} = g_n(i_n, j_n) ≥ m > 0,

let {a_n, n ≥ 1} be a sequence of positive numbers satisfying the following conditions:

lim sup_n a_n = ∞,

lim sup_n n^{−1} ∑_{k=1}^{n} exp{a_k M} < ∞,

and let f_n(x, y) = a_n g_n(x, y). Then lim sup_n f_n(i_n, j_n) = ∞, and {f_n, n ≥ 1} is not bounded. For all i ∈ S, we have

E_Q[ exp{f_k(X_{k−1}, X_k)} | X_{k−1} = i ] = ∑_{j=1}^{N} exp{a_k g_k(i, j)} p_k(i, j) ≤ e^{a_k M}.

Hence the assumption (13) is satisfied.
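Similarly, a small check of the displayed bound in Example 2, with hypothetical g_k, a_k and a randomly generated transition matrix: the conditional exponential moment never exceeds e^{a_k M}.

    import numpy as np

    N, M_bound = 3, 2.0                               # M_bound plays the role of M
    rng = np.random.default_rng(3)
    Pk = rng.dirichlet(np.ones(N), size=N)            # hypothetical transition matrix p_k
    gk = rng.uniform(0.0, M_bound, size=(N, N))       # 0 <= g_k(x, y) <= M
    a_k = 0.7                                         # one term of the sequence {a_n}

    for i in range(N):
        lhs = np.sum(np.exp(a_k * gk[i]) * Pk[i])     # E_Q[exp{a_k g_k(i, X_k)} | X_{k-1} = i]
        print(i, lhs, "<= exp(a_k M):", lhs <= np.exp(a_k * M_bound))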
Theorem 2. Let

H_t = 2N / [ e² (t − 1)² ],  0 < t < 1.  (31)
Under the conditions of Theorem 1, when 0 ≤ c ≤ t² H_t, we have

lim sup_{n→∞} { f_n(ω) − n^{−1} ∑_{k=1}^{n} H[ p_k(X_{k−1}, 1), ..., p_k(X_{k−1}, N) ] } ≤ 2√(c H_t)  P-a.e. on D(c),  (32)

lim inf_{n→∞} { f_n(ω) − n^{−1} ∑_{k=1}^{n} H[ p_k(X_{k−1}, 1), ..., p_k(X_{k−1}, N) ] } ≥ −2√(c H_t) − c  P-a.e. on D(c),  (33)

where H(p_1, ..., p_N) denotes the entropy of the distribution (p_1, ..., p_N), i.e.,

H(p_1, ..., p_N) = −∑_{j=1}^{N} p_j log p_j.
Proof. In Theorem 1 let f_k(x, y) = −log p_k(x, y) (k ≥ 1) and α = 1. Since

E_Q[ exp{|f_k(X_{k−1}, X_k)|} | X_{k−1} = i ] = ∑_{j=1}^{N} exp{|−log p_k(i, j)|} p_k(i, j) = ∑_{j=1}^{N} p_k(i, j)/p_k(i, j) = N,

we have, for all i ∈ S,

b_1(i) = lim sup_{n→∞} n^{−1} ∑_{k=1}^{n} E_Q[ exp{|f_k(X_{k−1}, X_k)|} | X_{k−1} = i ] ≤ N.  (34)

Noticing that

E_Q[ −log p_k(X_{k−1}, X_k) | X_{k−1} ] = −∑_{j=1}^{N} p_k(X_{k−1}, j) log p_k(X_{k−1}, j) = H[ p_k(X_{k−1}, 1), ..., p_k(X_{k−1}, N) ],
when 0 ≤ c ≤ t² H_t we have by (34), (31), and (15),

lim sup_{n→∞} { n^{−1} ∑_{k=1}^{n} (−log p_k(X_{k−1}, X_k)) − n^{−1} ∑_{k=1}^{n} H[ p_k(X_{k−1}, 1), ..., p_k(X_{k−1}, N) ] } ≤ 2√(c H_t)  P-a.e. on D(c),

together with the corresponding lower bound −2√(c H_t) for the lim inf; (32) and (33) then follow. This completes the proof of Theorem 2.
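As a numerical illustration of (32)–(33) in the simplest situation P = Q (so that h(P|Q) ≡ 0 and the case c = 0 applies), the following sketch simulates a nonhomogeneous chain and compares the entropy density f_n(ω) with the averaged conditional entropies n^{−1} ∑_{k=1}^{n} H[p_k(X_{k−1}, 1), ..., p_k(X_{k−1}, N)]; the transition matrices are invented for the example.

    import numpy as np

    rng = np.random.default_rng(4)
    N = 3
    q0 = np.array([0.4, 0.4, 0.2])           # hypothetical initial distribution

    def P_k(k):
        """Hypothetical transition matrix p_k(i, j); varies slowly with k."""
        base = np.array([[0.50, 0.30, 0.20],
                         [0.25, 0.50, 0.25],
                         [0.20, 0.30, 0.50]])
        M = base + (0.05 / np.sqrt(k)) * (np.eye(N) - 1.0 / N)
        return M / M.sum(axis=1, keepdims=True)

    n = 50000
    X = np.empty(n + 1, dtype=int)
    X[0] = rng.choice(N, p=q0)
    log_p = np.log(q0[X[0]])                 # log p(X_0, ..., X_n); here P = Q is the Markov law
    cond_entropy_sum = 0.0
    for k in range(1, n + 1):
        row = P_k(k)[X[k - 1]]
        X[k] = rng.choice(N, p=row)
        log_p += np.log(row[X[k]])
        cond_entropy_sum += -np.sum(row * np.log(row))   # H[p_k(X_{k-1}, 1), ..., p_k(X_{k-1}, N)]

    f_n = -log_p / n                         # entropy density (2)
    print("f_n(omega)                  :", f_n)
    print("average conditional entropy :", cond_entropy_sum / n)
    print("difference (small for large n, by (32)-(33) with c = 0):", f_n - cond_entropy_sum / n)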
In the proof of Theorem 3 we have, when c > 0,

lim sup_{n→∞} n^{−1} ∑_{k=1}^{n} { f_k(X_{k−1}, X_k) − E_Q[ f_k(X_{k−1}, X_k) | X_{k−1} ] } ≤ M√c + cM/log(1 + √c) ≤ M(2√c + c),  ω ∈ A( (1/M) log(1 + √c) ) ∩ D(c).  (46)
Since P(A((1/M) log(1 + √c))) = 1, we have by (46),

lim sup_{n→∞} n^{−1} ∑_{k=1}^{n} { f_k(X_{k−1}, X_k) − E_Q[ f_k(X_{k−1}, X_k) | X_{k−1} ] } ≤ M(c + 2√c)  P-a.e. on D(c).  (47)
When λ < 0, it follows from (46) that

lim inf_{n→∞} n^{−1} ∑_{k=1}^{n} { f_k(X_{k−1}, X_k) − E_Q[ f_k(X_{k−1}, X_k) | X_{k−1} ] } ≥ −M(e^{−λM} − 1) + c/λ,  ω ∈ A(λ) ∩ D(c).  (48)
Taking λ = −(1/M) log(1 + √c) in (48), and using (45), we have, for c > 0,

lim inf_{n→∞} n^{−1} ∑_{k=1}^{n} { f_k(X_{k−1}, X_k) − E_Q[ f_k(X_{k−1}, X_k) | X_{k−1} ] } ≥ −M√c − cM/log(1 + √c) ≥ −M(c + 2√c),  ω ∈ A( −(1/M) log(1 + √c) ) ∩ D(c).  (49)
Since P(A(−(1/M) log(1 + √c))) = 1, it follows from (49) that

lim inf_{n→∞} n^{−1} ∑_{k=1}^{n} { f_k(X_{k−1}, X_k) − E_Q[ f_k(X_{k−1}, X_k) | X_{k−1} ] } ≥ −M(c + 2√c)  P-a.e. on D(c).  (50)
In a similar way it can be shown that (47) and (50) also hold when c = 0. Eq. (40) follows from (47) and (50) directly. This completes the proof of Theorem 3.
3. Some extensions of Shannon–McMillan theorem to nonhomogeneous Markov information sources
Theorem 4. Let {X_n, n ≥ 0} be, under the probability measure P, a nonhomogeneous Markov information source with state space S, initial distribution (4), and transition matrices (5), i.e.,

P(X_0 = x_0, ..., X_n = x_n) = p(x_0, ..., x_n) = q(x_0) ∏_{k=1}^{n} p_k(x_{k−1}, x_k).  (51)

Let i, j ∈ S, and let S_n(i, ω) be the number of occurrences of i in the sequence X_0(ω), X_1(ω), ..., X_{n−1}(ω), where δ_j(·) denotes the Kronecker delta function. Also let

(p(i, j)),  p(i, j) > 0,  i, j ∈ S.  (52)
Proof. Let Q be the homogeneous Markov measure determined by the initial distribution (4) and the transition matrix (52), i.e.,

q(x_0, ..., x_n) = q(x_0) ∏_{k=1}^{n} p(x_{k−1}, x_k).  (57)
In this case it follows from (9), (51), and (57) that

h(P|Q) = lim sup_{n→∞} n^{−1} ∑_{k=1}^{n} log [ p_k(X_{k−1}, X_k) / p(X_{k−1}, X_k) ].  (58)

It is easy to see from Theorem 3 that
Eqs. (59) and (60) imply that
By the entropy inequality we have
This implies that P(D(0)) = 1. Hence (54)–(56) can be established by using Theorem 3 (cf. Liu and Yang, 1996). This completes the proof of the theorem.
Corollary. Letting [a]^+ = max{a, 0}, if
Now we give an approach to constructing examples satisfying assumption (53).
Example. Let σ = min{p(i, j), i, j ∈ S}, let τ = max{p(i, j), i, j ∈ S}, and let (δ_n(i, j)), i, j ∈ S, be a sequence of matrices satisfying the following conditions:

∑_{j=1}^{N} [δ_n(i, j)]^+ = ∑_{j=1}^{N} [δ_n(i, j)]^−,  i ∈ S, n ≥ 1,

d_n = max{|δ_n(i, j)|, i, j ∈ S} < min{σ, 1 − τ},

∑_{n=1}^{∞} d_n / n < ∞.
Let p_n(i, j) = p(i, j) + δ_n(i, j). Then (p_n(i, j)) is a stochastic matrix, and by Kronecker's lemma we have

lim_{n→∞} n^{−1} ∑_{k=1}^{n} |p_k(i, j) − p(i, j)| ≤ lim_{n→∞} n^{−1} ∑_{k=1}^{n} d_k = 0.

This implies (53) evidently.
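A minimal sketch of this construction, with an invented base matrix (p(i, j)) and perturbations δ_n(i, j) whose rows sum to zero and whose sizes d_n satisfy the conditions above; it checks numerically that n^{−1} ∑_{k=1}^{n} |p_k(i, j) − p(i, j)| becomes small, in agreement with the limit displayed above.

    import numpy as np

    N = 3
    P_base = np.array([[0.5, 0.3, 0.2],      # hypothetical matrix (p(i, j)) of (52)
                       [0.3, 0.4, 0.3],
                       [0.2, 0.3, 0.5]])

    def delta_n(n):
        """Perturbation delta_n(i, j): rows sum to 0, |delta_n| <= d_n with sum_n d_n/n finite."""
        d_n = 0.05 / n ** 1.5                # d_n < min{sigma, 1 - tau} = 0.2 and sum d_n/n converges
        D = np.zeros((N, N))
        D[:, 0], D[:, 1] = d_n, -d_n         # each row: +d_n, -d_n, 0, so positive and negative parts balance
        return D

    n_max = 20000
    running = np.zeros((N, N))
    for k in range(1, n_max + 1):
        running += np.abs((P_base + delta_n(k)) - P_base)   # |p_k(i, j) - p(i, j)|
    print("max over (i, j) of n^{-1} sum_k |p_k - p|:", (running / n_max).max())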
4. For further reading
Billingsley, 1986; Cover and Thomas, 1991; Laha and Rohatgi, 1979; Karlin and Taylor, 1975
Acknowledgements
The authors are thankful to the referee for his valuable suggestions.
References
Algoet, P.H., Cover, T.M., 1988. A sandwich proof of the Shannon–McMillan–Breiman theorem. Ann. Probab. 16, 899–909.
Barron, A.R., 1985. The strong ergodic theorem for densities: generalized Shannon–McMillan–Breiman theorem. Ann. Probab. 13, 1292–1303.
Billingsley, P., 1986. Probability and Measure. Wiley, New York.
Breiman, L., 1957. The individual ergodic theorem of information theory. Ann. Math. Statist. 28, 809–811.
Chung, K.L., 1961. The ergodic theorem of information theory. Ann. Math. Statist. 32, 612–614.
Cover, T.M., Thomas, J.A., 1991. Elements of Information Theory. Wiley, New York.
Gray, R.M., 1990. Entropy and Information Theory. Springer, New York.
Gray, R.M., Kieffer, J.C., 1980. Asymptotically mean stationary measures. Ann. Probab. 8, 962–973.
Kieffer, J.C., 1973. A counterexample to Perez's generalization of the Shannon–McMillan theorem. Ann. Probab. 1, 362–364. Correction 4, 153–154, 1976.
Kieffer, J.C., 1974. A simple proof of the Moy–Perez generalization of the Shannon–McMillan theorem. Pacific J. Math. 51, 203–204.
Liu, W., 1990. Relative entropy densities and a class of limit theorems of the sequence of m-valued random variables. Ann. Probab. 18, 829–839.
Liu, W., Yang, W., 1996. An extension of Shannon–McMillan theorem and some limit properties for nonhomogeneous Markov chains. Stochastic Process. Appl. 61, 129–145.
McMillan, B., 1953. The basic theorems of information theory. Ann. Math. Statist. 24, 196–216.