ELECTRONIC COMMUNICATIONS in PROBABILITY

ASYMPTOTIC VARIANCE OF FUNCTIONALS OF DISCRETE-TIME MARKOV CHAINS VIA THE DRAZIN INVERSE

DAN J. SPITZNER

Department of Statistics (0439), Virginia Tech, Blacksburg, VA 24061, USA. email: dspitzne@vt.edu

THOMAS R. BOUCHER

Department of Mathematics, Plymouth State University, Plymouth, NH 03264, USA. email: tboucher1@plymouth.edu

Submitted August 30, 2006; accepted in final form April 9, 2007. AMS 2000 Subject classification:

Keywords: General state space Markov chains; f-regularity; Markov chain central limit theorem; Drazin inverse; fundamental matrix; asymptotic variance

Abstract

We consider a ψ-irreducible, discrete-time Markov chain on a general state space with transition kernel P. Under suitable conditions on the chain, kernels can be treated as bounded linear operators between spaces of functions or measures, and the Drazin inverse of the kernel operator I − P exists. The Drazin inverse provides a unifying framework for objects governing the chain. This framework is applied to derive a computational technique for the asymptotic variance in the central limit theorems of univariate and higher-order partial sums. Higher-order partial sums are treated as univariate sums on a 'sliding-window' chain. Our results are demonstrated on a simple AR(1) model and suggest a potential for computational simplification.

1 Introduction

Assume {Φ_i} is a ψ-irreducible, discrete-time Markov chain with transition kernel P evolving on a Polish state space X. Denote by B(X) the Borel sets associated with X. A "univariate partial sum" will refer to S_n(g) = Σ_{i=1}^n g(Φ_i) for a function g : X → R. Define the "sliding-window chain" as {Φ̃_i = (Φ_i, ..., Φ_{i+p−1})} for some integer p. A "high-order partial sum" is S_n(g̃) = Σ_{i=1}^n g̃(Φ_i, ..., Φ_{i+p−1}) for some function g̃ : X^p → R.

We consider the role of linear operators in formulating the limiting behavior of the chain. Of special interest are the "fundamental kernel" Z and the "Drazin inverse" (or "group inverse") Q^# of the kernel Q = I − P. Our study is somewhat unconventional in that it focuses on Q^# rather than Z, which has traditionally been used in investigations of a Markov chain's limiting behavior. The framework centering on Q^# is particularly tidy and convenient, and can simplify computation. As an example, we show that Q^# allows an easy correspondence between the objects governing sliding-window chains and their analogues on the original chain.


A considerable literature is available on central limit theorems for univariate partial sums, with a variety of formulas for the asymptotic variance of S_n(g). The sliding-window chain inherits the stability properties of {Φ_i} and admits a viewpoint in which S_n(g̃) is treated as a univariate partial sum, Σ_{i=1}^n g̃(Φ̃_i). Connections between the limiting properties of S_n(g̃) and S_n(g) are made by describing the probability laws governing {Φ̃_i} in terms of those of {Φ_i}.

Our investigation builds upon the central limit theorem derived in Meyn and Tweedie (1993, Ch. 17) and related ideas in Glynn and Meyn (1996). Our main reference on the Drazin inverse of a linear operator is Koliha (1996a), but Campbell and Meyer's (1979) investigations in finite-dimensional state space also provide inspiration. References for Markov-chain central limit theorems abound, including Kemeny and Snell (1960), Cogburn (1972), Nummelin (1984), Kipnis and Varadhan (1986), Tóth (1986), Geyer (1992), Chan and Geyer (1994), Tierney (1994), Robert (1995), Robert and Casella (1999, Ch. 4), Roberts and Tweedie (1996), Nummelin (2002), Hobert et al. (2002), Roberts and Rosenthal (2004), and Jones (2004).

The main results are in Section 2. Markov chain stability conditions and solutions to the "Poisson equation" are discussed in Section 2.1; these establish the fundamental kernel and Drazin inverse as bounded linear operators on spaces of functions and signed measures. Section 2.2 expresses representations for asymptotic variance associated with Meyn and Tweedie's (1993) central limit theorem in terms of the Drazin inverse. Extensions of the Drazin inverse methodology to the sliding-window chain are in Sections 2.3 and 2.4. Demonstrations on linear autoregressive processes are in Section 3, and conclusions are made in Section 4.

2 Results

Let M denote the space of bounded signed measures on (X, B(X)). Let ν ∈ M, let g : X → R be a measurable function, let A and B be kernels, x ∈ X, and E ∈ B(X). We use the following notation: νA(E) := ∫ A(x, E) ν(dx); Ag(x) := ∫ g(y) A(x, dy); BA(x, E) := ∫ A(y, E) B(x, dy); νg := ∫ g(x) ν(dx); I(x, E) := 1 if x ∈ E and 0 otherwise; I_g(x, E) := g(x) I(x, E). Define the n-step transition probability kernel P^n of {Φ_i} as the n-fold product of P with itself.

2.1 The Poisson equation and linear operator solutions

Define the kernel R = Σ_{i=0}^∞ q(i) P^i, where q is a probability measure on the nonnegative integers. A set C is petite if it holds that R(x, E) ≥ ν(E) for every x ∈ C and E ∈ B(X), for some q as above and some nontrivial measure ν on (X, B(X)).

A rather weak stability condition for {Φ_i} which yields a central limit theorem is

P V(x) ≤ V(x) − f(x) + b I(x, C) for every x ∈ X,   (1)

where b is a finite constant, f : X → [1, ∞), V : X → [1, ∞), and C ∈ B(X) is a petite set. Such a chain is termed f-regular and has a stationary distribution π with πf < ∞ (see, e.g., Glynn and Meyn, 1996, Thm. 2.2). Define the kernel Π(x, E) := π(E) and note it satisfies ΠP = PΠ = Π. For a functional g : X → R, the limiting properties of S_n(g) can be understood through a solution ĝ : X → R to the "Poisson equation,"

ĝ − Pĝ = g − πg.   (2)


Glynn and Meyn (1996, Thm. 2.3) established that when (1) holds, the fundamental kernel Z = {I − P + Π}^{−1} provides solutions ĝ = Zg to the Poisson equation (2). Meyn and Tweedie (1993, eq. 17.39) give solutions to the Poisson equation through ĝ = Σ_{i≥0} {P^i g − πg}. Define the kernel Q^# := Σ_{i≥0} {P^i − Π}; then ĝ = Q^# g is a solution to the Poisson equation. Thus both Z and Q^# are kernels operating on a space of functions (defined below) to yield solutions to the Poisson equation.

In a similar spirit, we formalize a framework for operators A that send g to its solutions ĝ = Ag of the Poisson equation (2). Following convention, for a function h : X → [1, ∞) define the linear space L∞_h as all measurable functions g : X → R for which |g(x)|/h(x) is bounded over x ∈ X. The space L∞_h can be equipped with the norm ‖g‖_h = sup_x |g(x)|/h(x), making it a Banach space. The following is essentially Theorem 17.4.2 of Meyn and Tweedie (1993):

Proposition 1. Suppose (1) is satisfied. If πV < ∞, then there is an a ∈ R such that for each |g| ≤ f the Poisson equation has a solution ĝ for which |ĝ| ≤ a(V + 1).

Following Glynn and Meyn (1996, Thm. 2.3), Proposition 1 implies that a linear operator which gives solutions to the Poisson equation is a bounded linear operator from L∞_f to L∞_{V+1}. When the operator is a kernel, it may also be defined on linear spaces of signed measures. Define M_h as the space of signed measures ν ∈ M for which |νg| is bounded over g ∈ L∞_h; M_h is a Banach space under the norm ‖ν‖_h = sup_{g : |g| ≤ h} |νg|.

Proposition 2. Suppose (1) is satisfied and πV < ∞. The kernels Π, Q^# and Z are bounded linear operators from M_{V+1} to M_f.

Proof. Suppose the kernel A is such that ĝ = Ag satisfies the Poisson equation (2) for g ∈ L∞_f. Choose g_1 ∈ L∞_f and ν ∈ M_{V+1} arbitrarily, and set g_2 = Ag_1. Then Proposition 1 implies g_2 ∈ L∞_{V+1}, in turn implying |νg_2| < ∞. Since νg_2 = {νA}g_1, then νA ∈ M_f. Thus A is a bounded linear operator from M_{V+1} to M_f, and the conclusion regarding Q^# and Z follows. Since πV < ∞, clearly π ∈ M_{V+1}. Since (1) holds, this implies πf < ∞, and we have |νΠg_1| < ∞, implying νΠ ∈ M_f. Thus Π is a bounded linear operator from M_{V+1} to M_f.

For a kernel A to be the Drazin inverse of the kernel Q = I − P, it must satisfy the algebraic criteria

QA = AQ,  QAQ = Q,  and  AQA = A,   (3)

in which case it is unique in that respect. The next proposition establishes that Q^# is the Drazin inverse of Q and delineates what will be useful relationships among Z, Q^#, and Π.

Proposition 3. Suppose (1) holds and πV < ∞. Consider the kernels Π, Q^# and Z defined as operators either from M_{V+1} to M_f or from L∞_f to L∞_{V+1}.

(i) Q^# satisfies the criteria (3) and is therefore the unique Drazin inverse of Q,
(ii) Q^# = Z{I − Π} = {I − Π}Z,
(iii) Π = I − Q^#Q = I − QQ^#,
(iv) Z = I + PQ^#, and
(v) Q^# = P^j Q^# + Σ_{i=0}^{j−1} {P^i − Π} for every j ≥ 1.


Proof. Note that when (1) is true and πV < ∞, the kernels Π, Q^# and Z exist and are well-defined due to Propositions 1 and 2.

To prove statement (i), it is straightforward to use the definitions of Q and Q^# to verify (3). The uniqueness of Q^# with respect to the criteria (3) follows from Koliha (1996a, Lem. 2.4). Statement (ii) follows from {I − Π} commuting with Z^{−1}: {I − Π}Z = Z Z^{−1}{I − Π}Z = Z{I − Π}Z^{−1}Z = Z{I − Π}.

Statement (iii) now follows from (ii), recalling Q = I − P.

Statement (iv) also follows immediately from (ii), since Z = {I − P + Π}^{−1} and

{I + PQ^#}Z^{−1} = Z^{−1} + P{I − Π}Z Z^{−1} = {I − P + Π} + P{I − Π} = I.

Statement (v) for the case j = 1 is immediate from statement (iii), since I − Π = QQ^# is equivalent to Q^# = PQ^# + {I − Π}. To prove the general case, apply this in the induction step

Q^# = P^{j−1} Q^# + Σ_{i=0}^{j−2} {P^i − Π} = P^{j−1}[PQ^# + {I − Π}] + Σ_{i=0}^{j−2} {P^i − Π} = P^j Q^# + Σ_{i=0}^{j−1} {P^i − Π}.
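For a chain on a finite state space, these kernels are ordinary matrices and the identities of Proposition 3 can be checked numerically. The following sketch (our illustration, using an arbitrarily chosen 3-state transition matrix) computes Π, Z, and Q^# = Z{I − Π}, and verifies the Drazin criteria (3) together with identities (iii) and (iv):

```python
# Finite-state sketch (our illustration): for a finite chain, Pi has rows equal
# to the stationary distribution pi, Z = (I - P + Pi)^{-1}, and Q# = Z(I - Pi).
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

# Stationary distribution: left eigenvector of P for eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1))])
pi = pi / pi.sum()

I = np.eye(3)
Pi = np.outer(np.ones(3), pi)          # kernel Pi(x, .) = pi
Z = np.linalg.inv(I - P + Pi)          # fundamental matrix
Q = I - P
Qd = Z @ (I - Pi)                      # Q# = Z(I - Pi), Proposition 3.ii

assert np.allclose(Q @ Qd, Qd @ Q)     # QA = AQ
assert np.allclose(Q @ Qd @ Q, Q)      # QAQ = Q
assert np.allclose(Qd @ Q @ Qd, Qd)    # AQA = A
assert np.allclose(Pi, I - Qd @ Q)     # Proposition 3.iii
assert np.allclose(Z, I + P @ Qd)      # Proposition 3.iv
```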

The results to this point, and those to follow, require the drift condition (1) as a sufficient condition. We pause for a remark on the necessity of (1). The existence of the operator Q^# follows from f-regularity with πV < ∞, since for a ψ-irreducible chain these imply Σ_{i≥0} |P^i g(x) − πg| < ∞ for any initial x and any function g with |g| ≤ f (Meyn and Tweedie, 1993, Thm. 14.0.1). This is not equivalent to f-ergodicity, which merely has ‖P^n − π‖_f → 0.

Of course, if a chain is not f-ergodic there is no hope of Q^# existing, so an investigation into the necessity of (1) must involve f-ergodic chains. A ψ-irreducible chain is f-regular if and only if it satisfies (1) (Meyn and Tweedie, 1993, Thm. 14.2.6), and, while an aperiodic f-regular chain is f-ergodic, aperiodic f-ergodic chains need not be f-regular (Meyn and Tweedie, 1993, Thm. 14.3.3). However, an f-ergodic chain restricted to a full (a set A with ψ(A) = 1) absorbing set is f-regular. Thus, when the maximal irreducibility measure ψ is Lebesgue measure, an aperiodic f-ergodic chain is f-regular almost everywhere. In this case, which we suspect covers most real applications, the drift condition (1) is 'almost' necessary as well as sufficient for the existence of Q^# when πV < ∞.

The question of the necessity of the drift condition (1) in establishing the existence of the operator Z is resolved similarly, the difference being that Glynn and Meyn (1996, Thm. 2.3) require for any x ∈ X that E_x[Σ_{k=0}^{τ_B − 1} f(Φ_k)] < V(x) + c(B), with V as above, τ_B the hitting time for a set B ∈ B(X), and c(·) a set function with c(B) < ∞. This is slightly stronger than the characterization, equivalent to (1), of a ψ-irreducible chain being f-regular when it has a countable covering of the state space X by regular sets C_i with sup_{x ∈ C_i} E_x[Σ_{k=0}^{τ_B − 1} f(Φ_k)] < ∞.

The special case of V-uniform ergodicity occurs when the bounding function V in (1) is a multiple of f. The operator norm ‖A‖_V = sup_{x∈X} sup_{g:|g|≤V} |Ag(x)|/V(x) makes the relevant kernels bounded linear operators from either L∞_V to itself or M_V to itself. It also allows an equivalent characterization of V-uniform ergodicity: ‖P^i − Π‖_V ≤ cρ^i for some c ∈ (0, ∞) and 0 < ρ < 1, providing an immediate proof of the existence of Q^#, since Σ_{i≥0} {P^i − Π} converges in the norm ‖·‖_V, considered over either spaces of functions or measures.


Under V-uniform ergodicity, Q must have an isolated eigenvalue at zero, which admits the study of "second eigenvalue" properties of Markov chains. Such ideas have been used to investigate a Markov chain's rate of convergence to its invariant distribution. See, e.g., Lawler and Sokal (1988) and more recent work by François (1999) and León (2001).

2.2 Representations of asymptotic variance

Considering the kernels as operators between spaces of functions makes sense when calculating asymptotic quantities of partial sums. Considering them as operators between spaces of measures is natural when concerned with convergence of measures, such as convergence of the transition probabilities to the stationary distribution. A combination of the two applies to studying convergence of normalized partial sums to a normal distribution in the central limit theorem. We recount some important asymptotic properties of the partial sum S_n(g). Statements (i) and (ii) are Theorems 14.0.1 and 17.4.4 of Meyn and Tweedie (1993), respectively.

Proposition 4. Suppose (1) is satisfied and πV < ∞.

(i) The asymptotic mean of n^{−1} S_n(g) is

μ_g = πg = ∫ g(x) π(dx).   (4)

(ii) Suppose ĝ solves (2) and π(ĝ²) < ∞. If σ²_g = π(ĝ² − {Pĝ}²) > 0, then

n^{−1/2} {S_n(g) − nμ_g} → N(0, σ²_g).   (5)

The quantity σ²_g is the asymptotic variance of the partial sum S_n(g). It is a quantity of interest in Markov chain theory, appearing for example in the CLT of Proposition 4.ii and in MCMC methodology. We express the asymptotic variance in terms of the Drazin inverse Q^#.

Proposition 5. When Proposition 4.ii holds, the asymptotic variance may be written

σ²_g = 2 π I_g Q^# g − π I_g {I − Π} g,   (6)

σ²_g = 2 π I_g Z{I − Π} g − π I_g {I − Π} g,   (7)

σ²_g = π I_g {I + P} Q^# g,   (8)

σ²_g = 2 π I_g P^j Q^# g + 2 Σ_{i=0}^{j−1} π I_g {P^i − Π} g − π I_g {I − Π} g.   (9)

Proof. Write σ²_g = π(ĝ² − {Pĝ}²) = π(ĝ² − {ĝ − ḡ}²). Substituting ḡ = {I − Π}g and ĝ = Q^# g and simplifying yields σ²_g = 2π(ḡ Q^# g) − π(ḡ²) = 2 π I_ḡ Q^# g − π I_ḡ {I − Π} g. Now use the identities π I_ḡ = π I_g {I − Π} and ΠQ^# = 0 (by Proposition 3.iii) to establish (6). Substituting Q^# = Z{I − Π}, as in Proposition 3.ii, gives the expression (7) in terms of the fundamental matrix. Using I − Π = QQ^#, by Proposition 3.iii, one is led to the compact formula (8). Alternatively, Proposition 3.v leads to the expression (9) for any j ≥ 1.

Note (9) may be interpreted in terms of the autocovariances of {Φ_i}. Assuming Φ_0 is initially distributed according to π, we have π I_g {P^i − Π} g = Cov[g(Φ_0), g(Φ_i)]. Under (1), P^j → Π, and since ΠQ^# = 0 it follows that the first term in (9) disappears. The resulting formula is then the familiar (Meyn and Tweedie, 1993, Thm. 17.5.3)

σ²_g = 2 Σ_{i=0}^∞ Cov[g(Φ_0), g(Φ_i)] − Var[g(Φ_0)].   (10)
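On a finite state space the variance formulas of Proposition 5 can be evaluated directly. The following sketch (our illustration; the transition matrix and the function g are arbitrary choices, not from the paper) checks that the compact formula (8) agrees with a truncation of the autocovariance series (10):

```python
# Numerical sketch (our illustration): evaluate formula (8) via the Drazin
# inverse and compare with the autocovariance series (10).
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
g = np.array([1.0, -2.0, 3.0])

w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1))])
pi = pi / pi.sum()

I = np.eye(3)
Pi = np.outer(np.ones(3), pi)
Qd = np.linalg.inv(I - P + Pi) @ (I - Pi)    # Drazin inverse of I - P

gbar = g - pi @ g                            # centered g
# Formula (8): sigma^2 = pi I_g (I + P) Q# g.
var8 = pi @ (g * ((I + P) @ Qd @ g))
# Series (10): 2 * sum_i Cov[g(Phi_0), g(Phi_i)] - Var[g(Phi_0)], Phi_0 ~ pi.
cov = [pi @ (gbar * (np.linalg.matrix_power(P, i) @ gbar)) for i in range(200)]
var10 = 2 * sum(cov) - cov[0]

assert np.allclose(var8, var10)
```

The truncation at 200 terms is ample here because the second-largest eigenvalue modulus of this P is well below one, so the autocovariances decay geometrically.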

2.3 Representations for a sliding window chain in terms of the original chain

Recall {Φ̃_i = (Φ_i, ..., Φ_{i+p−1})} and S_n(g̃) = Σ_{i=1}^n g̃(Φ̃_i). Under suitable regularity conditions on {Φ̃_i}, the asymptotic properties of S_n(g̃) are detailed in formulas (4)–(6) with π, P, and Q^# replaced by π̃, P̃, and Q̃^#, the analogous objects governing {Φ̃_i}. In this section, the connection of π̃, P̃, and Q̃^# to π, P, and Q^# is made explicit.

The transition probability kernel P̃ of the sliding-window chain is defined on the product space (X^p, B(X)^p). We extend measures from {Φ_i} to {Φ̃_i} as follows. For x_1 ∈ X and Ẽ = E_1 × ··· × E_p ∈ B(X)^p, define P̃*(x_1, Ẽ) := P(Φ_2 ∈ E_2, ..., Φ_p ∈ E_p | Φ_1 = x_1). Suppose ν ∈ M. For Ẽ ∈ B(X)^p, define ν̃(Ẽ) := ∫_{E_1} P̃*(x_1, Ẽ) ν(dx_1); for a function g̃ : X^p → R define ν̃g̃ accordingly. Define from g̃ the function g̃* : X → R by g̃* := ∫ g̃ dP̃*, and note we can write ν̃g̃ = νg̃*.

An analogue to the drift condition (1) holds for the sliding-window chain. From f and V in (1), define f̃(x̃) = f(x_p) and Ṽ(x̃) = V(x_p); from the petite set C define C̃ = {x̃ ∈ X^p : x_p ∈ C}. The stability condition (1) may now be rewritten

P̃ Ṽ(x̃) ≤ Ṽ(x̃) − f̃(x̃) + b I(x̃, C̃), for every x̃ ∈ X^p.   (11)

It remains to show that C̃ is petite.

Proposition 6. If C is petite for {Φ_i}, then C̃ = {(x_1, ..., x_p) ∈ X^p : x_p ∈ C} is petite for {Φ̃_i}.

Proof. Pick x̃ ∈ X^p and Ẽ ∈ B(X)^p. Note P̃^{i+p−1}(x̃, Ẽ) = ∫_{E_1} P̃*(x_{p+1}, Ẽ) P^i(x_p, dx_{p+1}). Define R̃ = Σ_{i=0}^∞ q̃(i) P̃^i, where q̃(i) = 0 if i < p − 1 and q̃(i) = q(i − p + 1) otherwise, so that R̃(x̃, Ẽ) = ∫_{E_1} P̃*(x_{p+1}, Ẽ) R(x_p, dx_{p+1}). Since C is petite in X, this leads to R̃(x̃, Ẽ) ≥ ν̃(Ẽ) for every x̃ ∈ C̃ and Ẽ ∈ B(X)^p, implying C̃ is petite.

We now define a number of relevant linear spaces. Let M^p be the space of bounded signed measures on the product space (X^p, B(X)^p). Recall the spaces L∞_h and M_h defined in Section 2.1. Let L̃∞_h denote the space of measurable functions g̃ : X^p → R such that g̃* ∈ L∞_h. Let M̃_h denote the space of bounded signed measures ν̃ ∈ M^p for which there is a signed measure ν ∈ M_h such that ν̃g̃ = νg̃* for any g̃ ∈ L̃∞_h.

Based on Proposition 6 and the preceding discussion, we expect that if the original chain is stable, so is the sliding-window chain. Then, by the results in Section 2.2, an invariant measure must exist, a Drazin inverse must exist, and so must solutions to the sliding-window version of the Poisson equation. The following shows this is the case.

Proposition 7. Let g̃ : X^p → R be a measurable function. Suppose (1) holds and πV < ∞.

(i) The invariant measure π̃ of {Φ̃_i} exists, is unique, and satisfies π̃g̃ = πg̃* with π ∈ M_f.

(ii) The Drazin inverse of Q̃, denoted Q̃^#, exists as a bounded linear operator from L̃∞_f to L̃∞_{V+1} and from M̃_{V+1} to M̃_f, is unique, and provides solutions ĝ̃ = Q̃^# g̃ to the Poisson equation.

Proof. Statement (i): If ν̃ ∈ M̃_f is such that ν̃g̃ = νg̃* for ν ∈ M_f and any g̃ ∈ L̃∞_f, then ν̃P̃^i g̃ = νP^i g̃* for any i ≥ 0. Since (11) holds, an invariant measure exists and is unique. The measure π̃ ∈ M̃_f such that π̃g̃ = πg̃* with π ∈ M_f is therefore the invariant measure of the sliding-window chain.

Statement (ii): Since πV < ∞, then π̃Ṽ < ∞. Define Q̃ = I − P̃. As implied by Proposition 3.i, Q̃^# exists, is unique, and may be written Q̃^# = Σ_{i=0}^∞ {P̃^i − Π̃}. It is easily verified this provides solutions to the Poisson equation.

Since π̃Ṽ < ∞, the central limit theorem and asymptotic variance expressions in Propositions 4 and 5 apply to {Φ̃_i} with P̃, Π̃, Q̃^# replacing P, Π, Q^#. We have already related P̃ and Π̃ to P and Π. We next connect Q̃^# to its analogous object Q^# on the original chain.

Proposition 8. For a measure ν̃ ∈ M̃_{V+1}, the measure ν̃Q̃^# is in M̃_f and acts on functions g̃ ∈ L̃∞_f by ν̃Q̃^# g̃ = νQ^# g̃*.

Proof. Recall that by definition we extended measures ν ∈ M_{V+1} by ν̃g̃ = νg̃* for any g̃ ∈ L̃∞_{V+1}. By Proposition 7, Q̃^# takes measures ν̃ ∈ M̃_{V+1} to M̃_f. Since ν̃Q̃^# is the extension of νQ^# to the product space, this implies ν̃Q̃^# g̃ = νQ^# g̃*.
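The finite-state case gives a concrete check of the sliding-window construction. For window length p = 2 the pair chain has states (x_1, x_2), invariant measure π̃(x_1, x_2) = π(x_1)P(x_1, x_2), and g̃*(x_1) = Σ_y P(x_1, y) g̃(x_1, y). The sketch below (our illustration, with an arbitrary P and a randomly chosen g̃) verifies that π̃ is invariant for P̃ and that π̃g̃ = πg̃*, as in Proposition 7:

```python
# Finite-state sketch (our illustration) of the sliding-window chain, p = 2.
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
n = P.shape[0]

w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1))])
pi = pi / pi.sum()

rng = np.random.default_rng(0)
gt = rng.normal(size=(n, n))            # arbitrary gtilde on pairs

pit = pi[:, None] * P                   # pitilde(x1, x2) = pi(x1) P(x1, x2)
gstar = (P * gt).sum(axis=1)            # gtilde*(x1) = E[gtilde(x1, Phi_2) | Phi_1 = x1]
assert np.isclose((pit * gt).sum(), pi @ gstar)   # pitilde(gtilde) = pi(gtilde*)

Pt = np.zeros((n * n, n * n))           # Ptilde on pair states, row-major index x1*n + x2
for x1 in range(n):
    for x2 in range(n):
        for y2 in range(n):
            Pt[x1 * n + x2, x2 * n + y2] = P[x2, y2]
assert np.allclose(pit.reshape(-1) @ Pt, pit.reshape(-1))   # pitilde is invariant for Ptilde
```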

2.4 Asymptotic variance of high-order partial sums

We apply these results to show how the asymptotic variance of S_n(g̃) can be expressed in terms of Q^#.

Proposition 9. When Proposition 4.ii holds for the analogous objects of the sliding-window chain, the asymptotic variance of S_n(g̃) is

σ²_g̃ = 2 π P̃_g̃ Q^# g̃* + 2 { Σ_{i=0}^{p−2} π̃ I_g̃ P̃^i g̃ − (p − 1)(π̃g̃)² } − π̃ I_g̃ {I − Π̃} g̃,   (12)

where

P̃_g̃(x_1, E) = ∫ I(x_p, E) g̃(x_1, ..., x_p) P(x_{p−1}, dx_p) P(x_{p−2}, dx_{p−1}) ··· P(x_1, dx_2).

Proof. It follows almost directly from Proposition 8 that π̃ I_g̃ P̃^{p−1} Q̃^# g̃ = π P̃_g̃ Q^# g̃*. Combine this with Proposition 3.v with j = p − 1 and (6), in which the term π̃ I_g̃ {P̃^i − Π̃} g̃ has been expanded to π̃ I_g̃ P̃^i g̃ − (π̃g̃)².

3 Applications to autoregressive processes

At this point, we have established the Drazin inverse methodology for {Φ_i} and extended it to {Φ̃_i}; we now demonstrate it on a concrete model.

We consider a simple time series model, the linear autoregressive process of order one, AR(1). The AR(1) model is Φ_{i+1} = φΦ_i + ε_{i+1}, where the {ε_i} are mean-zero Gaussian random variables with constant variance σ². We suppose |φ| < 1, so that {Φ_i} is a mean-zero stationary process. The transition probability kernel identifies P(x, ·) with a Gaussian distribution having mean φx and variance σ². A straightforward iterative argument then shows that P^i(x, ·) is a Gaussian probability measure with mean φ^i x and variance σ²_i = σ²(1 − φ^{2i})/(1 − φ²). Known results (see, e.g., Brockwell and Davis, 1991, Ch. 3) establish that the invariant distribution π is mean-zero Gaussian with variance σ²_π = σ² Σ_{i=0}^∞ φ^{2i} = σ²/(1 − φ²).

We verify the drift condition (1) for V-uniform ergodicity (and thus f-regularity, since f is a multiple of V) using the bounding function V(x) = e^{c|x|}, c > 0, which satisfies V ≥ 1. Note P V(x) < ∞, and {Φ_i} is ψ-irreducible with the irreducibility measure ψ being Lebesgue measure. Since |φ| < 1 and E[e^{c|ε_i|}] < ∞, for |x| large enough it holds that P V(x)/V(x) < 1, implying there exists β > 0 so that P V(x) + βV(x) ≤ V(x). Let C = {x : P V(x) ≥ V(x)} and let b = max_{x∈C} V(x); then C is petite and the drift condition for V-uniform ergodicity is satisfied: P V(x) − V(x) ≤ −βV(x) + b I(x, C).
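This drift computation can be checked numerically. The sketch below (our illustration, with arbitrarily chosen φ, σ, and c, none prescribed by the paper) uses the closed form of E[e^{c|Y|}] for Gaussian Y to evaluate P V(x)/V(x), confirming the ratio exceeds one near the origin (inside C) but drops below one for large |x|:

```python
# Numerical check (our illustration): for the AR(1) kernel,
# P V(x) = E[exp(c*|phi*x + eps|)] with eps ~ N(0, sig^2), which has the
# closed form below via the Gaussian MGF restricted to half-lines.
import math

def norm_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def PV(x, phi, sig, c):
    m = phi * x                          # mean of phi*x + eps
    a = math.exp(c * m + 0.5 * c**2 * sig**2) * norm_cdf((m + c * sig**2) / sig)
    b = math.exp(-c * m + 0.5 * c**2 * sig**2) * norm_cdf((-m + c * sig**2) / sig)
    return a + b                         # E[exp(c*|N(m, sig^2)|)]

phi, sig, c = 0.5, 1.0, 1.0
V = lambda x: math.exp(c * abs(x))

assert PV(0.0, phi, sig, c) / V(0.0) > 1.0     # drift fails near the origin
assert PV(10.0, phi, sig, c) / V(10.0) < 1.0   # drift holds for large |x|
assert PV(-10.0, phi, sig, c) / V(-10.0) < 1.0
```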

First, in Example 1, to demonstrate the basic techniques described in Section 2.2, we treat a problem involving a univariate partial sum only. Next, in Example 2, to demonstrate the techniques derived in Section 2.4, we treat a problem involving a high-order partial sum. A third example is also given, which combines the formulas derived in Example 2 to rederive the asymptotic variance of the least squares estimator for the autoregressive parameter.

Although the examples deduce fairly broad formulas, they have been laid out so as to sketch how our high-order asymptotic formulas might be used in a general methodology: the first problem provides an understanding of the action of Q^# as applied to a univariate partial sum, which is then applied in the second, more complicated problem involving higher-order partial sums; finally, the third problem pulls the results together in a useful context. The amount of computational effort required in each example is roughly the same as that of a more direct approach, such as applying (10), which does not involve Q^#. Nevertheless, we anticipate advantages in future applications to general autoregressive time series in which Q^# is known with uncertainty and at considerable computational cost. By following a sequence of steps similar to these examples, one avoids calculating the Drazin inverse of the sliding-window chain, thereby heading off additional uncertainty and computational costs that would arise in working with this more complicated chain directly. Further discussion of potential applications is given in Section 4.

Example 1. We calculate the asymptotic variance (6) for g(x) = x^d for some integer d. By rescaling P^i(x, ·) with respect to the distribution of a standard Gaussian random variable Z ∼ G(0, 1), the binomial theorem provides (σ_i Z + φ^i x)^d = Σ_{j′=0}^d C(d, j′) {σ_i Z}^{j′} {φ^i x}^{d−j′}. This leads to the formula

P^i g(x) = Σ_{j=0}^{⌊d/2⌋} C(d, 2j) σ^{2j} {(1 − φ^{2i})/(1 − φ²)}^j φ^{i(d−2j)} x^{d−2j} E[Z^{2j}],   (13)

in which the indexing uses j′ = 2j only, since the odd moments of Z are zero. Here ⌊d/2⌋ represents the greatest integer not exceeding d/2, and C(n, k) denotes a binomial coefficient. The binomial theorem may be used again to expand (1 − φ^{2i})^j = Σ_{k=0}^j C(j, k) (−1)^k φ^{2ik}. Recalling Q^# g(x) = Σ_{i=0}^∞ {P^i g(x) − πg}, the sum over i may then be evaluated with the geometric-series identity Σ_{i=0}^∞ φ^{mi} = 1/(1 − φ^m). Thus the asymptotic variance (6) for g(x) = x^d, and the expressions for the example values d = 1, 2, 3, follow in closed form; for instance, d = 1 gives σ²_g = σ²_π(1 + φ)/(1 − φ) = σ²/(1 − φ)². In general settings, the even moments of Z may be calculated through the well-known formula E[Z^{2j}] = 2^j π^{−1/2} Γ(j + 1/2), where Γ is the gamma function.
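The d = 1 case admits a quick numerical sanity check (our addition, not part of the paper): for g(x) = x the stationary autocovariances are Cov[Φ_0, Φ_i] = φ^i σ²_π, so a truncation of the series (10) should reproduce the closed form σ²/(1 − φ)²:

```python
# Numerical sanity check (our addition): for AR(1) and g(x) = x,
# Cov[g(Phi_0), g(Phi_i)] = phi**i * sig2_pi, so formula (10) gives
# sigma_g^2 = sig2_pi * (1 + phi)/(1 - phi) = sig2 / (1 - phi)**2.
phi, sig2 = 0.6, 1.0
sig2_pi = sig2 / (1 - phi**2)                 # stationary variance

series = 2 * sum(phi**i * sig2_pi for i in range(2000)) - sig2_pi  # truncated (10)
closed = sig2 / (1 - phi)**2

assert abs(series - closed) < 1e-8
```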

Example 2. In this example, the sliding-window chain formulas derived in Section 2.4 are used to evaluate the asymptotic behavior of the partial sums S_1 = Σ_{i=1}^n Φ_i Φ_{i+r−1} and S_2 = Σ_{i=1}^n Φ_i Φ_{i+p−1}, with r and p fixed, positive integers. The specific objective is to evaluate μ_i ≈ n^{−1} E[S_i] and σ²_i ≈ n^{−1} Var[S_i] for i = 1, 2, and σ_{12} ≈ n^{−1} Cov[S_1, S_2], under the simple AR(1) model. The parameter r is restricted to 1 ≤ r < (p + 1)/2, which identifies the simpler cases in calculating the second term in (12).

Set g̃(x̃) = a_1 x_1 x_r + a_2 x_1 x_p for arbitrary constants a_1 and a_2. Expressions for the asymptotic mean and variance of S_n(g̃) will have the form μ_g̃ = a_1 μ_1 + a_2 μ_2 and σ²_g̃ = a_1² σ_1² + 2 a_1 a_2 σ_{12} + a_2² σ_2². Thus, once μ_g̃ and σ²_g̃ are derived, the desired quantities may simply be read off.

We begin the derivation of σ²_g̃ by calculating the first term in (12) from Proposition 9. We need the following lemma.

Lemma 1. Define ẽ_i(x̃) = x_i, ẽ_{ij}(x̃) = x_i x_j, and ẽ_{ijk}(x̃) = x_i x_j x_k, where 1 < i ≤ j ≤ k ≤ p. Then P̃ applied to each of these monomials has a closed form obtained directly from formula (13).

Proof. The expressions follow directly from formula (13).

In particular, g̃*(x_1) = (a_1 φ^{r−1} + a_2 φ^{p−1}) x_1², to which one may immediately apply (14) with d = 2 in order to evaluate Q^# g̃*(x_1); hence, with (13) and some straightforward manipulation, the first term of (12), π P̃_g̃ Q^# g̃*, is obtained in closed form as (16).

To calculate the second term in (12), observe that (15) immediately provides

π g̃* = (a_1 φ^{r−1} + a_2 φ^{p−1}) σ²_π.   (17)

The remaining terms π̃ I_g̃ P̃^i g̃ for 0 ≤ i ≤ p − 2 are evaluated by cases according to the position of i (for instance, Case 2 covers r − 1 ≤ i < p − r), yielding the expressions (19) and (20). Putting together (16), (17), (19), and (20), the asymptotic variance (12) is given in closed form (21), of the form σ²_g̃ = a_1² σ_1² + 2 a_1 a_2 σ_{12} + a_2² σ_2².

Example 3. Under the AR(1) model, a least squares estimator for φ arises by regressing one-step-ahead values Φ_{i+1} onto current values Φ_i in the manner of no-intercept simple linear regression. Calculating this from a trajectory of n + 1 time points, Φ_1, ..., Φ_{n+1}, the estimator is

φ̂_n = Σ_{i=1}^n Φ_i Φ_{i+1} / Σ_{i=1}^n Φ_i².

Observe that φ̂_n is calculated from the partial sums of the previous example with p = 2 and r = 1: the partial sums are S_1 = Σ_{i=1}^n Φ_i² and S_2 = Σ_{i=1}^n Φ_i Φ_{i+1}, and the least squares estimator is φ̂_n = S_2/S_1.

The asymptotic properties of φ̂_n may therefore be calculated from the asymptotic means, variances, and covariance of S_1 and S_2 by Taylor expansion. A first-order expansion of φ̂(S_1, S_2) about (μ_1, μ_2) implies that the asymptotic mean of φ̂_n is μ_2/μ_1 and its asymptotic variance is

n Var[φ̂] ≈ {∂_1 φ̂}² σ_1² + 2{∂_1 φ̂}{∂_2 φ̂} σ_{12} + {∂_2 φ̂}² σ_2²,   (22)

where ∂_i φ̂ = (∂/∂S_i) φ̂(μ_1, μ_2). Note that ∂_1 φ̂ = −μ_2 μ_1^{−2} and ∂_2 φ̂ = μ_1^{−1}. According to the formulas (17) and (21), these are μ_1 = σ²_π and μ_2 = φ σ²_π,

σ_1² = 2 {(1 + φ²)/(1 − φ²)} σ⁴_π,  σ_2² = {1 + φ² + 4φ²/(1 − φ²)} σ⁴_π,  and  σ_{12} = {4φ/(1 − φ²)} σ⁴_π.

The asymptotic mean of φ̂_n is therefore μ_2/μ_1 = φ. Plugging σ_1², σ_2², and σ_{12} into (22) greatly simplifies the expression and leads to the well-known formula for the asymptotic variance of φ̂, given by n Var[φ̂] ≈ 1 − φ².
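The delta-method simplification can be verified numerically. The sketch below (our addition; the values of φ and σ² are arbitrary) plugs μ_1, μ_2, σ_1², σ_2², and σ_{12} into (22) and recovers 1 − φ²; note that the factors of σ⁴_π cancel, so the result does not depend on σ²:

```python
# Numerical check (our addition) of the delta-method calculation in Example 3:
# plugging mu_1, mu_2, sigma_1^2, sigma_2^2, sigma_12 into (22) should give
# n Var[phi_hat] ~ 1 - phi**2 for any |phi| < 1 and sigma^2 > 0.
phi, sig2 = 0.7, 2.5
s2pi = sig2 / (1 - phi**2)               # stationary variance sigma_pi^2

mu1, mu2 = s2pi, phi * s2pi
s1 = 2 * (1 + phi**2) / (1 - phi**2) * s2pi**2
s2 = (1 + phi**2 + 4 * phi**2 / (1 - phi**2)) * s2pi**2
s12 = 4 * phi / (1 - phi**2) * s2pi**2

d1, d2 = -mu2 / mu1**2, 1 / mu1          # partial derivatives of S2/S1 at (mu1, mu2)
nvar = d1**2 * s1 + 2 * d1 * d2 * s12 + d2**2 * s2

assert abs(nvar - (1 - phi**2)) < 1e-9
```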

4 Conclusions and discussion

A linear operator framework was established for discrete-time Markov chains, and the Drazin inverse was shown to unify objects central to Markov chain theory. This framework was applied to asymptotic variance calculation for univariate and higher-order partial sums, with the formulas expressed in terms of objects governing the original chain, {Φ_i}, which presumably would be obtained by statistical analysis.

A potential application is to analysis of a general autoregressive time series. The present study does not go as far as to provide a methodology for the analysis of general autoregressive time series, but our hope is that this technique might be exploited in future applications in order to reduce computational expense. Another extension of these methods would be to continuous-time Markov chains.

We note some existing literature which used asymptotic variance formulas to monitor the convergence of simulated Markov chains to their invariant distributions. For instance, Peskun (1973) studied convergence in chains with finite state space using the fundamental matrix. More recent work by Chauveau and Diebolt (2003) builds upon the formula (10) to study convergence in general state-space chains. We conjecture the present results would be useful in these applications as well.

It is prudent to remark on continuity and approximation of π and Q^#. Continuity results for Q^# are found in Campbell (1980) and Koliha and Rakočević (1998). These can be extended to the continuity of π through our Proposition 3.iii. Meyer (1975) contains an early algorithm for calculating Q^# for finite state-space Markov chains. Working with abstract Banach spaces, Gonzalez and Koliha (1999) study semi-iterative methods for solving linear equations such as π = πP in cases where Q^# exists. More recently, iterative methods for approximating Q^# and other generalized inverses in general state-space settings are studied in Djordjevic, Stanimirovic, and Wei (2004). We expect the results of these investigations would be helpful in developing efficient computational algorithms for Markov chain applications.

5 Acknowledgements


References

[1] Brockwell, P. J., and Davis, R. A. (1991), Time Series: Theory and Methods (second edition), New York: Springer. MR1093459

[2] Campbell, S. L. (1980), Continuity of the Drazin inverse,Linear and Multilinear Algebra, 8:265-268. MR0560571

[3] Campbell, S. L., and Meyer, C. D. (1979),Generalized Inverses of Linear Transformations, London: Pitman. MR0533666

[4] Chauveau, D., and Diebolt, J. (2003), Estimation of the asymptotic variance in the CLT for Markov chains, Stochastic Models, 19(4): 449-465. MR2013368

[5] Chan, K. S., and Geyer, C. J. (1994), Discussion of "Markov chains for exploring posterior distributions," Annals of Statistics, 22: 1747-1758. MR1329166

[6] Cogburn, R. (1972), The central limit theorem for Markov processes, Proceedings of the 6th Berkeley symposium on mathematical statistics and probability,485-512.

[7] Djordjevic, D. S., Stanimirovic, P. S., and Wei, Y. (2004), The representation and approximations of outer generalized inverses, Acta Math. Hungar., 104(1-2): 1-26. MR2069959

[8] François, O. (1999), On the spectral gap of a time reversible Markov chain, Probability in the Engineering and Informational Sciences, 13: 95-101. MR1666392

[9] Geyer, C. J. (1992), Practical Markov chain Monte Carlo (with discussion), Statistical Science,7:473-511.

[10] Gonzalez, C. N., and Koliha, J. J. (1999), Semi-iterative methods for the Drazin inverse solution of linear equations in Banach spaces,Numer. Funct. Anal. Optim.,20(5-6): 405-418. MR1704952

[11] Glynn, P. W., and Meyn, S. P. (1996), A Liapunov Bound for Solutions of the Poisson Equation,Annals of Probability, 24:916-931. MR1404536

[12] Hobert, J. P., Jones, G. L., Presnell, B., and Rosenthal, J. S. (2002), On the applicability of regenerative simulation in Markov chain Monte Carlo, Biometrika, 89: 731-743. MR1946508

[13] Jones, G. L. (2004), On the Markov chain central limit theorem, Probability Surveys, 1:299-320. MR2068475

[14] Kemeny, J. G., and Snell, J. L. (1960),Finite Markov Chains, New York: Van Nostrand. MR0115196

[15] Kipnis, C., and Varadhan, S. R. S. (1986), Central limit theorem for additive functionals of reversible Markov processes and applications to simple exclusions,Communications in Mathematical Physics,104:1-19. MR0834478

[16] Koliha, J. J. (1996a), A generalized Drazin inverse, Glasgow Mathematical Journal, 38: 367-381.

[17] Koliha, J. J. (1996b), Isolated spectral points,Proceedings of the American Mathematical Society,124(11):3417-3424. MR1342031

[18] Koliha, J. J., and Rakočević, V. (1998), Continuity of the Drazin inverse II, Studia Mathematica, 131(2): 167-177. MR1636348

[19] Lawler, G. F., and Sokal, A. D. (1988), Bounds on the L² spectrum for Markov chains and Markov processes: a generalization of Cheeger's inequality, Transactions of the American Mathematical Society, 309(2): 557-580. MR0930082

[20] León, C. A. (2001), Maximum asymptotic variance of sums of finite Markov chains, Statistics and Probability Letters, 54: 413-415. MR1861387

[21] Meyer, C. D. (1975), The role of the group generalized inverse in the theory of finite Markov chains,SIAM Review,17(3): 443-464. MR0383538

[22] Meyn, S. P., and Tweedie, R. L. (1993),Markov Chains and Stochastic Stability,London, U.K.: Springer-Verlag. MR1287609

[23] Nummelin, E. (1984), General Irreducible Markov Chains and Non-negative Operators, New York: Cambridge University Press. MR0776608

[24] Nummelin, E. (2002), MC's for MCMC'ists, International Statistical Review, 70: 215-240.

[25] Peskun, P. H. (1973), Optimum Monte-Carlo sampling using Markov chains, Biometrika, 60(3): 607-612. MR0362823

[26] Robert, C. P. (1995), Convergence control methods for Markov chain Monte Carlo algorithms, Statistical Science, 10: 231-253. MR1390517

[27] Robert, C. P., and Casella, G. (1999), Monte Carlo Statistical Methods, New York: Springer. MR1707311

[28] Roberts, G. O., and Tweedie, R. L. (1996), Geometric convergence and central limit theorems for multidimensional Hastings and Metropolis algorithms,Biometrika,83:95-110. MR1399158

[29] Roberts, G. O., and Rosenthal, J. S. (2004), General state space Markov Chains and MCMC algorithms,Probability Surveys,1:20-71. MR2095565

[30] Tierney, L. (1994), Markov chains for exploring posterior distributions (with discussion), Annals of Statistics,22:1701-1762. MR1329166
