Lecture 2.2 Finite State Markov Chains
A. Banerji
Department of Economics
February 24, 2014
Outline
Markov Chains: Introduction, Marginal Distributions, Identities
Stability of Finite State MCs: Stationary Distributions, Dobrushin Coefficient
Stochastic Kernels
Finite state space $S = \{x_1, x_2, \ldots, x_N\}$.

A distribution on $S$ is a function $\phi : S \to \mathbb{R}$ s.t. $\phi(x) \ge 0$ for all $x \in S$ and $\sum_{x \in S} \phi(x) = 1$.

The set of all distributions on $S$, $\mathcal{P}(S)$, is the $(N-1)$-dimensional unit simplex in $\mathbb{R}^N$.

Definition
A stochastic kernel on $S$ is a function $p : S \times S \to [0,1]$ s.t. $\sum_{y \in S} p(x,y) = 1$, for all $x \in S$.

For each $x \in S$, we call the corresponding distribution on $S$ $p(x, dy)$. For a finite state space $S$, we can write down the $N$ distributions as the rows of an $N \times N$ matrix.
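In code, a stochastic kernel on a finite state space is just an $N \times N$ array whose rows are the distributions $p(x, dy)$. A minimal sketch (assuming NumPy; the kernel values below are made up purely for illustration):

```python
import numpy as np

# A purely illustrative 2-state kernel: row x is the distribution p(x, dy).
M = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# A valid stochastic kernel has nonnegative entries and rows that sum to one.
assert (M >= 0).all() and np.allclose(M.sum(axis=1), 1.0)
```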
Markov Chains
\[
M = \big(p(x, dy)\big)_{x \in S} =
\begin{pmatrix}
p(x_1, x_1) & \cdots & p(x_1, x_N) \\
\vdots & \ddots & \vdots \\
p(x_N, x_1) & \cdots & p(x_N, x_N)
\end{pmatrix}
\]

Definition
The Markov chain on $S$ generated by stochastic kernel $p$ and initial condition $\psi \in \mathcal{P}(S)$ is the sequence $(X_t)_{t=0}^{\infty}$ of random variables defined by
(i) $X_0 \sim \psi$
(ii) for $t = 0, 1, 2, \ldots$, $X_{t+1} \sim p(X_t, dy)$

So if $X_t = x$, then $P(X_{t+1} = y \mid X_t = x) = p(x, y)$. We call $(X_t)$ a Markov-$(p, \psi)$ chain. Discuss Hamilton (2005), Quah (1993).
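The definition translates directly into a simulation routine: draw $X_0$ from $\psi$, then repeatedly draw the next state from the row of $M$ indexed by the current state. A minimal sketch (assuming NumPy; states are indexed $0, \ldots, N-1$ and `simulate_chain` is an illustrative name):

```python
import numpy as np

def simulate_chain(M, psi, T, rng=None):
    """Draw one path X_0, ..., X_T of the Markov-(p, psi) chain.

    M   : (N, N) Markov matrix, row x is the distribution p(x, dy)
    psi : length-N initial distribution
    """
    rng = np.random.default_rng() if rng is None else rng
    N = len(psi)
    path = np.empty(T + 1, dtype=int)
    path[0] = rng.choice(N, p=psi)                 # X_0 ~ psi
    for t in range(T):
        path[t + 1] = rng.choice(N, p=M[path[t]])  # X_{t+1} ~ p(X_t, dy)
    return path
```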
Marginal Distribution - Approximation
Let $(X_t)_{t=0}^{\infty}$ be a Markov chain on $S$ generated by a stochastic kernel $p$ and initial condition $\psi$. The marginal distribution is $\psi_t(y) \equiv P(X_t = y)$, for all $y \in S$.

Approximating $\psi_t(y)$ by Monte Carlo: draw $X_t$ a large number of times and compute the relative frequency of $y$. Specifically:

Draw $X_0$ from $\psi$ a large number $n$ of times. Each of these times:
For $k = 1, \ldots, t$, draw $X_k$ from $p(X_{k-1}, dy)$.
For $y \in S$,
\[
\psi_t(y) \approx \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{X_t^i = y\}.
\]
Now do JS exercises 4.2.1, 4.2.2.
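A minimal Monte Carlo sketch of this procedure (assuming NumPy; `marginal_mc` is an illustrative name):

```python
import numpy as np

def marginal_mc(M, psi, t, n=10_000, rng=None):
    """Monte Carlo estimate of psi_t(y) = P{X_t = y} by relative frequency."""
    rng = np.random.default_rng() if rng is None else rng
    N = len(psi)
    counts = np.zeros(N)
    for _ in range(n):
        x = rng.choice(N, p=psi)          # X_0 ~ psi
        for _ in range(t):
            x = rng.choice(N, p=M[x])     # X_k ~ p(X_{k-1}, dy)
        counts[x] += 1                    # record the terminal state X_t
    return counts / n
```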
Marginal Distribution - Recursion
By the Law of Total Probability,
\[
P\{X_{t+1} = y\} = \sum_{x \in S} P\{X_{t+1} = y \mid X_t = x\}\, P\{X_t = x\}
\]
(the above just integrates out $X_t$ from the joint distribution of $(X_{t+1}, X_t)$).

That is,
\[
\psi_{t+1}(y) = \sum_{x \in S} p(x, y)\, \psi_t(x),
\]
the inner product of $(\psi_t(x))_{x \in S}$ with the $y$-th column $(p(x, y))_{x \in S}$ of $M$.

Stacking these for all $y \in S$ in one row,
\[
\psi_{t+1} = (\psi_{t+1}(y))_{y \in S} = \psi_t M.
\]

By $t$ recursions, we get
\[
\psi_{t+1} = \psi M^{t+1}. \qquad (\dagger)
\]

The probabilities of the states at $t+1$ are a weighted average of the transitions $p(x, dy)$ (rows of $M$), weighted by the probabilities of the states at $t$.
Example
Quah: starting in extreme poverty (State 1), what is the marginal distribution after 10, 60, and 160 periods?
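The recursion $(\dagger)$ makes this a one-line computation. A sketch (assuming NumPy; Quah's (1993) matrix $M_Q$ is not reproduced here and should be filled in from the source):

```python
import numpy as np
from numpy.linalg import matrix_power

def marginal_after(M, psi, t):
    """Exact marginal distribution psi_t = psi M^t (recursion (dagger))."""
    return psi @ matrix_power(M, t)

# Usage sketch: start in extreme poverty (state 1, i.e. index 0) and look at
# t = 10, 60, 160.  M_Q stands for Quah's transition matrix (not shown here).
# psi0 = np.eye(M_Q.shape[0])[0]
# for t in (10, 60, 160):
#     print(t, marginal_after(M_Q, psi0, t))
```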
Powers of M
Let $(p^k(x, y))_{N \times N} \equiv M^k$.

Lemma
$p^k(x, y) = P\{X_{t+k} = y \mid X_t = x\}$

Proof.
Let $\delta_x \in \mathcal{P}(S)$ be the degenerate distribution that gives $x$ with probability 1.
So $P\{X_{t+k} = y \mid X_t = x\} = P\{X_{t+k} = y \mid X_t \sim \delta_x\}$.
This is just the marginal distribution $\psi_{t+k}(y)$ with initial condition $X_t \sim \delta_x$. By recursion $(\dagger)$,
\[
\psi_{t+k} = \delta_x M^k = p^k(x, dy).
\]
Expectation
Suppose $X_t \sim \psi \in \mathcal{P}(S)$. Then the marginal distribution of $X_{t+k}$ is $\psi_{t+k} = \psi M^k$. So if $h : S \to \mathbb{R}$, the expectation
\[
E[h(X_{t+k}) \mid X_t \sim \psi] = \sum_{y \in S} \psi M^k(y)\, h(y) = \psi M^k h,
\]
where $h \equiv (h(y))_{y \in S}$ (we have taken an inner product).

Example
If $\psi = \delta_x$, we have
\[
E[h(X_{t+k}) \mid X_t = x] = \sum_{y \in S} p^k(x, y)\, h(y) = \delta_x M^k h.
\]
This is just the $x$-th row of the matrix $M^k$ multiplied by the vector $h$.

For Hamilton (2005), let $h = (1000, 0, -1000)$ be the profits of a firm in the 3 states. What is expected profit 5 periods from now, if we are currently in severe recession (state 1)? Do JS 4.2.4-5.
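A sketch of the computation (assuming NumPy; the entries of $M_H$ are Hamilton's (2005) matrix as reported in Stachurski, and the mapping of "state 1" to the first row follows the slide's labelling; both are assumptions to check against the source):

```python
import numpy as np
from numpy.linalg import matrix_power

# Hamilton's (2005) Markov matrix (assumed values) and the profit vector h.
M_H = np.array([[0.971, 0.029, 0.000],
                [0.145, 0.778, 0.077],
                [0.000, 0.508, 0.492]])
h = np.array([1000.0, 0.0, -1000.0])

# Expected profit 5 periods ahead starting from state 1 (index 0):
# the first row of M_H**5 times h, i.e. delta_x M^k h.
print(matrix_power(M_H, 5)[0] @ h)
```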
Chapman-Kolmogorov Equation
The equation:
\[
p^{k+j}(x, y) = \sum_{z \in S} p^k(x, z)\, p^j(z, y)
\]

Proof.
$M^{k+j} = M^k M^j$. So the $(x, y)$-th element of $M^{k+j}$ is the inner product of the $x$-th row of $M^k$ and the $y$-th column of $M^j$.

To go from state $x$ to state $y$ in $k + j$ steps, we must go to some state $z \in S$ in $k$ steps, then from there to $y$ in $j$ steps. For fixed $z$, multiply the 2 probabilities; then add over all (mutually exclusive) $z$'s. In other standard notation,
\[
P\{X_{k+j} = y \mid X_0 = x\} = \sum_{z \in S} P\{X_{k+j} = y \mid X_k = z\}\, P\{X_k = z \mid X_0 = x\}.
\]
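Since the equation is just $M^{k+j} = M^k M^j$, it can be checked numerically for any Markov matrix. A small sketch (assuming NumPy; `check_ck` is an illustrative name):

```python
import numpy as np
from numpy.linalg import matrix_power

def check_ck(M, k, j):
    """Numerically verify p^{k+j}(x, y) = sum_z p^k(x, z) p^j(z, y)."""
    return np.allclose(matrix_power(M, k + j),
                       matrix_power(M, k) @ matrix_power(M, j))
```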
Exercise
JS 4.2.6. In terms of sums,
\[
p^k(x, y) = \sum_{z_1 \in S} p(x, z_1) \sum_{z_2 \in S} p(z_1, z_2) \cdots \sum_{z_{k-1} \in S} p(z_{k-2}, z_{k-1})\, p(z_{k-1}, y).
\]

Proof.
$p^k(x, y)$ is the sum of the probabilities of all mutually exclusive outcome paths of the type $\{x\, z_1\, z_2 \ldots z_{k-1}\, y\}$, which equals
\[
\sum_{\text{all } \{x z_1 z_2 \ldots z_{k-1} y\}} p(x, z_1)\, p(z_1, z_2) \cdots p(z_{k-1}, y)
= \sum_{\text{all } \{x z_1 \ldots z_{k-2}\}} p(x, z_1) \cdots p(z_{k-3}, z_{k-2}) \sum_{z_{k-1} \in S} p(z_{k-2}, z_{k-1})\, p(z_{k-1}, y),
\]
where we've fixed $\{x z_1 \ldots z_{k-2}\}$ and summed across the last stretch of the paths ending at $y$. Working backwards all the way we get
\[
\sum_{z_1 \in S} p(x, z_1) \sum_{z_2 \in S} p(z_1, z_2) \sum_{z_3 \in S} \cdots \sum_{z_{k-1} \in S} p(z_{k-2}, z_{k-1})\, p(z_{k-1}, y).
\]
Introduction
- Investigate the sequence $(\psi_t)$ of marginal distributions for Quah, as $t$ grows large.
- $(\psi_t)$ settles at some $\psi^*$, regardless of where we start.
- Global asymptotic stability of Markov chains refers to the settling down of the marginal distribution to a unique distribution, regardless of the initial condition.
- Known as ergodicity.
Dynamical System Corresponding to FSMC
- The marginal distributions of the Markov process $(X_t)$ with matrix $M$ are $(\psi_t) = (\psi M^t)$, if $X_0 \sim \psi \in \mathcal{P}(S)$.
- Notice that $M : \mathcal{P}(S) \to \mathcal{P}(S)$ (JS 4.3.1). Indeed, for any $\psi \in \mathcal{P}(S)$, $\psi M = \sum_{x \in S} \psi(x)\, p(x, dy)$. So, for all $y \in S$, the $y$-th coordinate of $\psi M$ is
\[
\psi M(y) = \sum_{x \in S} \psi(x)\, p(x, y) \ge 0.
\]
Also,
\[
\sum_{y \in S} \psi M(y) = \sum_{y \in S} \sum_{x \in S} \psi(x)\, p(x, y) = \sum_{x \in S} \psi(x) \sum_{y \in S} p(x, y) = \sum_{x \in S} \psi(x) = 1.
\]
So $\psi M \in \mathcal{P}(S)$. Basically, $\psi M$ is a convex combination of points from $\mathcal{P}(S)$ and therefore belongs to $\mathcal{P}(S)$.
- Impose the norm $\|\cdot\|_1$ and the corresponding metric $d_1$ on $\mathcal{P}(S)$. Then $(\mathcal{P}(S), M)$ is a dynamical system, with $\psi_{t+1} = \psi_t M$, $t = 0, 1, 2, \ldots$.
Stationary Distributions
Definition
A distribution $\psi^* \in \mathcal{P}(S)$ is stationary or invariant for $M$ if $\psi^* M = \psi^*$. That is, $\psi^*$ is a fixed point of the dynamical system $(\mathcal{P}(S), M)$.

Theorem
Every Markov chain on a finite state space has at least one stationary distribution.

Proof.
$\mathcal{P}(S)$ is compact and convex (it's just the $(N-1)$-dimensional unit simplex), and $M$ is linear and hence continuous. So by Brouwer's fixed point theorem, $M$ has a fixed point in $\mathcal{P}(S)$.

Note: There could be many fixed points, e.g. JS 4.3.4. For the Markov matrix $I_N$ (the $N \times N$ identity matrix), every $\psi \in \mathcal{P}(S)$ is stationary.
Some Implications
Lemma
$M$ is $d_1$-nonexpansive on $\mathcal{P}(S)$. That is, for all $\psi, \psi' \in \mathcal{P}(S)$, $d_1(\psi M, \psi' M) \le d_1(\psi, \psi')$.

Proof.
\[
\|\psi M - \psi' M\|_1 = \sum_{y \in S} |\psi M(y) - \psi' M(y)|
= \sum_{y \in S} \Big| \sum_{x \in S} (\psi(x) - \psi'(x))\, p(x, y) \Big|
\]
\[
\le \sum_{y \in S} \sum_{x \in S} |(\psi(x) - \psi'(x))\, p(x, y)|
= \sum_{x \in S} |\psi(x) - \psi'(x)| \sum_{y \in S} p(x, y)
= \sum_{x \in S} |\psi(x) - \psi'(x)| = \|\psi - \psi'\|_1.
\]
The inequality follows from the triangle inequality.
Computing Stationary Distributions
$\psi \in \mathcal{P}(S)$ is stationary (a fixed point) iff $\psi(I_N - M) = 0$, i.e. $(I_N - M)^T \psi^T = 0$. We can solve this system of equations and normalize $\psi^T$ by dividing by the sum of its entries, so that it lies in $\mathcal{P}(S)$.

Alternatively (JS 4.3.5): let $\mathbf{1}_N \equiv (1, 1, \ldots, 1)$ and $\mathbf{1}_{N \times N}$ be an $N \times N$ matrix of ones. If $\psi \in \mathcal{P}(S)$ is a fixed point of $M$, then $\psi(I_N - M + \mathbf{1}_{N \times N}) = \mathbf{1}_N$, since $\psi(I_N - M) = 0$ and $\psi \mathbf{1}_{N \times N} = \mathbf{1}_N$. Conversely, if $\psi \in \mathcal{P}(S)$ satisfies $\psi(I_N - M + \mathbf{1}_{N \times N}) = \mathbf{1}_N$, then $\psi \mathbf{1}_{N \times N} = \mathbf{1}_N$, so $\psi(I_N - M) = 0$ and $\psi$ is a fixed point of $M$. (If $\psi \notin \mathcal{P}(S)$, it is not necessarily true that $\psi(I_N - M + \mathbf{1}_{N \times N}) = \mathbf{1}_N$.)

So solving $(I_N - M + \mathbf{1}_{N \times N})^T \psi^T = \mathbf{1}_N^T$ works. (Do JS 4.3.6-7.)
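A minimal sketch of the second approach (assuming NumPy; `stationary` is an illustrative name):

```python
import numpy as np

def stationary(M):
    """Stationary distribution via (I_N - M + 1_{NxN})^T psi^T = 1_N^T (JS 4.3.5)."""
    N = M.shape[0]
    A = np.identity(N) - M + np.ones((N, N))
    return np.linalg.solve(A.T, np.ones(N))
```

When $M$ has a unique stationary distribution, this linear system has a unique solution; with multiple stationary distributions (e.g. $M = I_N$) the matrix is singular and the solver will fail.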
Stability
Definition
The dynamical system $(\mathcal{P}(S), M)$ is globally stable if it has a unique stationary distribution (fixed point) $\psi^* \in \mathcal{P}(S)$, and for all $\psi \in \mathcal{P}(S)$,
\[
d_1(\psi M^t, \psi^*) \equiv \|\psi M^t - \psi^*\|_1 \to 0, \quad \text{as } t \to \infty.
\]

We need more than nonexpansiveness of $M$ for stability. Stability fails for $M = I_N$. It succeeds 'best' if $p(x, dy)$ is identical for all $x \in S$ (we then jump to the unique fixed point in a single step, from any $\psi \in \mathcal{P}(S)$).

Example
\[
M = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
\]
$\psi^* = (1/2, 1/2)$ is the unique fixed point; every other $\psi = (\psi_1, 1 - \psi_1)$ satisfies $\psi \ne \psi M = (1 - \psi_1, \psi_1)$. This also shows that $(\psi M^t)$ oscillates with $t$, so the system is not globally stable.
Dobrushin Coefficient
Definition
The Dobrushin coefficient of a stochastic kernel $p$ is
\[
\alpha(p) \equiv \min_{(x, x') \in S \times S} \sum_{y \in S} p(x, y) \wedge p(x', y),
\]
where $a \wedge b \equiv \min\{a, b\}$.

Remarks.
1. Hamilton and Quah: $\alpha(p)$ equals 0.029 for $M_H$ and 0 for $M_Q$ (see the 1st and 5th rows of $M_Q$).
2. $\alpha(p) \in [0, 1]$, for all $p$. It equals 1 iff $p(x, dy)$ is identical for all $x \in S$. It equals 0 for $I_N$ and the periodic kernel on the previous slide.
3. $\alpha(p) > 0$ iff for every pair $(x, x') \in S \times S$, $p(x, dy)$ and $p(x', dy)$ overlap (assign positive probability to at least one common state $y$). From any 2 states, then, there is positive probability that the chains will meet next period.
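The definition translates directly into code. A minimal sketch (assuming NumPy; `dobrushin` is an illustrative name), which should return roughly 0.029 for Hamilton's matrix and 0 for Quah's, per Remark 1:

```python
import numpy as np

def dobrushin(M):
    """Dobrushin coefficient: min over pairs (x, x') of sum_y p(x,y) ^ p(x',y)."""
    N = M.shape[0]
    return min(np.minimum(M[x], M[z]).sum()
               for x in range(N) for z in range(N))
```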
Dobrushin Coefficient and Stability
Theorem
Let $p$ be a stochastic kernel with Markov matrix $M$. Then for every $\phi, \psi \in \mathcal{P}(S)$,
\[
\|\phi M - \psi M\|_1 \le (1 - \alpha(p))\, \|\phi - \psi\|_1.
\]
Moreover this bound is tight: for any $\lambda < 1 - \alpha(p)$, there exists a pair $\phi, \psi$ violating $\|\phi M - \psi M\|_1 \le \lambda \|\phi - \psi\|_1$.

The proof consists of 3 steps/lemmas.

Lemma
JS C.2.1. Let $\phi, \psi \in \mathcal{P}(S)$ and $h : S \to \mathbb{R}_+$. Then
\[
\Big| \sum_{x \in S} h(x)\, \phi(x) - \sum_{x \in S} h(x)\, \psi(x) \Big| \le \frac{1}{2} \sup_{x, x'} |h(x) - h(x')| \cdot \|\phi - \psi\|_1.
\]
Stability 2
JS C.2.1 provides an upper bound for the (absolute) difference between the expectations of $h$ under $\phi$ and under $\psi$. See the proof in Stachurski (appendix).

Lemma C.2.2.
\[
\|\phi M - \psi M\|_1 \le \frac{1}{2} \sup_{x, x'} \|p(x, dy) - p(x', dy)\|_1 \cdot \|\phi - \psi\|_1.
\]

Proof.
See Stachurski (appendix). The inequality looks similar to C.2.1. Exercise 4.3.2 implies
\[
\|\phi M - \psi M\|_1 = 2 \sup_{A \subset S} |\phi M(A) - \psi M(A)|.
\]
We introduce the function $h$ used in C.2.1 by noting that
\[
|\phi M(A) - \psi M(A)| = \Big| \sum_{x \in S} P(x, A)\, \phi(x) - \sum_{x \in S} P(x, A)\, \psi(x) \Big|,
\]
where $P(x, A) = \sum_{y \in A} p(x, y)$.
Stability 3
To prove the first claim of the theorem, we now show that
\[
\frac{1}{2} \sup_{x, x'} \|p(x, dy) - p(x', dy)\|_1 = 1 - \inf_{x, x'} \sum_{y \in S} p(x, y) \wedge p(x', y).
\]
It suffices to show that for every $x, x'$,
\[
\frac{1}{2} \|p(x, dy) - p(x', dy)\|_1 = 1 - \sum_{y \in S} p(x, y) \wedge p(x', y).
\]
This is true for any pair of distributions, as below.

Lemma
C.2.3. For every pair $\mu, \nu \in \mathcal{P}(S)$ we have
\[
\frac{1}{2} \|\mu - \nu\|_1 = 1 - \sum_{y \in S} \mu(y) \wedge \nu(y).
\]
Stability 4
To show that the bound is tight, note that
\[
1 - \alpha(p) = \frac{1}{2} \sup_{x, x'} \|p(x, dy) - p(x', dy)\|_1
= \sup_{x \ne x'} \frac{\|p(x, dy) - p(x', dy)\|_1}{\|\delta_x - \delta_{x'}\|_1}
\le \sup_{\mu \ne \nu} \frac{\|\mu M - \nu M\|_1}{\|\mu - \nu\|_1}.
\]
The second equality holds since $\|\delta_x - \delta_{x'}\|_1 = 2$. The final inequality holds because the set of degenerate distributions like $\delta_x, \delta_{x'}$ is a subset of the set of all distributions. More simply, just put $M = I_N$ (so $\alpha(p) = 0$).
Now for the main theorem.
Theorem
Let $p$ be a stochastic kernel on a finite set $S$, and $M$ the corresponding Markov matrix. The dynamical system $(\mathcal{P}(S), M)$ is globally stable iff there exists a $t \in \mathbb{N}$ s.t. $\alpha(p^t) > 0$.
Stability 5
Proof.
$M$ is nonexpansive. From the earlier theorem, we know that since $\alpha(p^t) > 0$, $(\mathcal{P}(S), M^t)$ is globally stable. So by Lemma 4.1.1, $(\mathcal{P}(S), M)$ is globally stable.

Conversely, suppose $(\mathcal{P}(S), M)$ is globally stable. So there is a unique stationary distribution $\psi^*$, and $\psi M^t \to \psi^*$ for all $\psi \in \mathcal{P}(S)$. In particular, $\delta_x M^t = p^t(x, dy) \to \psi^*$, for every $x \in S$. So for all $x \in S$, $p^t(x, y) \to \psi^*(y)$, for all $y \in S$. Since $\psi^*$ is a distribution, there is at least one $y \in S$ s.t. $\psi^*(y) > 0$. So for this $y$, we have from the convergence that for $t$ large enough, $p^t(x, y) > 0$, for all $x \in S$. Thus there exists $t$ such that all rows $p^t(x, dy)$ of $M^t$ overlap at this $y$; thus the Dobrushin coefficient $\alpha(p^t) > 0$.
Exercises
The theorem shows: for every pair $(x, x')$ of states, $p^t(x, dy)$ and $p^t(x', dy)$ overlap. Starting at any 2 different points today, the chains meet with positive probability $t$ periods later. Extreme form: $p^t(x, dy)$ is the same for all $x$, i.e. convergence in $t$ periods.

- $\alpha(p) > 0$ for Hamilton's matrix but zero for Quah's matrix. However, for the 23rd iterate $M_Q^{23}$ reported by Quah, $\alpha(p^{23}) > 0$. So $(\mathcal{P}(S), M_Q)$ is globally stable.
- JS 4.3.20. Code to calculate $\alpha(p^t)$, $t = 1, 2, \ldots, T$, for a given Markov matrix $M$, stopping at the first $t$ s.t. $\alpha(p^t) > 0$. Show that this $t = 2$ for Quah's matrix. (See the sketch after this list.)
- $(s, S)$ (or $(q, Q)$) inventory dynamics. A firm with inventory $X_t$ at the start of period $t$ has the option of ordering inventory up to its maximum storage capacity $Q$. At the end of period $t$, demand $D_{t+1}$ is observed (all non-negative integers). The firm meets demand up to its current stock level; remaining inventory is carried over to the next period.
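A possible sketch for JS 4.3.20 (assuming NumPy; `first_positive_alpha` is an illustrative name, and `dobrushin` is the coefficient routine sketched earlier):

```python
import numpy as np

def dobrushin(M):
    """Dobrushin coefficient of the kernel represented by M."""
    N = M.shape[0]
    return min(np.minimum(M[x], M[z]).sum()
               for x in range(N) for z in range(N))

def first_positive_alpha(M, T=100):
    """Return the first t <= T with alpha(p^t) > 0, or None if there is none."""
    Mt = M.copy()
    for t in range(1, T + 1):
        if dobrushin(Mt) > 0:
            return t
        Mt = Mt @ M
    return None
```

The exercise asks to verify that this returns 2 for Quah's matrix.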
Inventory Dynamics exercise
$(D_t)_{t \ge 1}$ is an i.i.d. sequence of random variables taking nonnegative integer values according to the distribution $b(d) \equiv P\{D_t = d\} = (1/2)^{d+1}$.

The firm follows a stationary policy: if $X_t \le q$, order inventory to top up to $Q$; otherwise, order no inventory (the choice of $q$ is the firm's pre-decided policy choice).

So,
\[
X_{t+1} =
\begin{cases}
\max\{Q - D_{t+1}, 0\} & \text{if } X_t \le q \\
\max\{X_t - D_{t+1}, 0\} & \text{if } X_t > q
\end{cases}
\]
Let $S = \{0, 1, \ldots, Q\}$. What is the stochastic kernel $M_q = (p(x, y))$ corresponding to restocking policy $q$?
(q,Q) Dynamics
Let $x \le q$. Then
\[
X_{t+1} =
\begin{cases}
Q - i & \text{with probability } (1/2)^{i+1}, \quad i = 0, 1, \ldots, Q - 1 \\
0 & \text{with probability } (1/2)^{Q}
\end{cases}
\]
Let $x > q$. Then
\[
X_{t+1} =
\begin{cases}
x - i & \text{with probability } (1/2)^{i+1}, \quad i = 0, 1, \ldots, x - 1 \\
0 & \text{with probability } (1/2)^{x}
\end{cases}
\]
With states ordered $0, 1, \ldots, Q$, every row $x \le q$ (and the row $x = Q$) is $\big((1/2)^Q, (1/2)^Q, (1/2)^{Q-1}, \ldots, (1/2)\big)$, while a row $x > q$ is $\big((1/2)^x, (1/2)^x, (1/2)^{x-1}, \ldots, (1/2), 0, \ldots, 0\big)$ with zeros in columns $y > x$:
\[
M_q =
\begin{pmatrix}
(1/2)^Q & (1/2)^Q & (1/2)^{Q-1} & \cdots & \cdots & (1/2) \\
\vdots & \vdots & \vdots & & & \vdots \\
(1/2)^Q & (1/2)^Q & (1/2)^{Q-1} & \cdots & \cdots & (1/2) \\
(1/2)^x & (1/2)^x & (1/2)^{x-1} & \cdots & (1/2) & 0 \\
\vdots & \vdots & \vdots & & & \vdots \\
(1/2)^Q & (1/2)^Q & (1/2)^{Q-1} & \cdots & \cdots & (1/2)
\end{pmatrix}
\]
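A sketch that builds $M_q$ row by row (assuming NumPy; `inventory_kernel` is an illustrative name):

```python
import numpy as np

def inventory_kernel(q, Q):
    """Markov matrix M_q on S = {0, 1, ..., Q} for the (q, Q) restocking policy,
    with geometric demand b(d) = (1/2)**(d + 1)."""
    M = np.zeros((Q + 1, Q + 1))
    for x in range(Q + 1):
        stock = Q if x <= q else x            # stock on hand after (any) restocking
        for i in range(stock):                # demand i leaves stock - i units
            M[x, stock - i] += 0.5 ** (i + 1)
        M[x, 0] += 0.5 ** stock               # demand >= stock exhausts inventory
    return M
```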
(q,Q) Dynamics cont
Staring at $M_q$, we see that regardless of $q$ (and $Q$), $\alpha(p) > 0$: every row assigns positive probability to the state $y = 0$, so any two rows overlap there. So $(\mathcal{P}(S), M_q)$ is globally stable.

JS 4.3.23. Compute the stationary distribution when $(q, Q) = (2, 5)$.

JS 4.3.24. Suppose $Q = 20$, and the fixed cost of ordering inventory in any period is 0.1. The firm buys the product at zero cost per unit and sells at USD 1 per unit. For each $q \in \{0, 1, 2, \ldots, 20\}$, evaluate the stationary distribution $\psi^*_q$, and evaluate the firm's expected per-period profit at this stationary distribution (i.e. compute the firm's long-run average profits with restocking policy $q$). Show that this profit is maximized at $q = 7$.
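One way to set up JS 4.3.24 (assuming NumPy and reusing `inventory_kernel` and `stationary` from the earlier sketches; the profit specification below is an assumption: the fixed cost is paid whenever the firm restocks, i.e. $X_t \le q$, and expected revenue from a post-restocking stock level $s$ is $E[\min\{s, D\}] = 1 - (1/2)^s$ at USD 1 per unit):

```python
import numpy as np

def long_run_profit(q, Q=20, fixed_cost=0.1):
    """Expected per-period profit at the stationary distribution of M_q."""
    M = inventory_kernel(q, Q)        # kernel sketched after the (q, Q) slide
    psi = stationary(M)               # solver sketched earlier (JS 4.3.5)
    profit = 0.0
    for x in range(Q + 1):
        stock = Q if x <= q else x
        expected_sales = 1 - 0.5 ** stock          # E[min(stock, D)]
        profit += psi[x] * (expected_sales - fixed_cost * (x <= q))
    return profit

# Usage sketch: the exercise claims the maximizer is q = 7.
# best_q = max(range(21), key=long_run_profit)
```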