Lecture 5: Topics in Finite State Markov Chains
A. Banerji
Department of Economics
February 26, 2015
Introduction
THE PROBLEM
Col. Kurtz, infinitely lived on fish, has a stock $X_t$ of fish at noon on date $t$. He stores $R_t \le M$ and eats the rest, $C_t = X_t - R_t$ (he does not throw anything away, because his utility is increasing in consumption). $M$ is his storage capacity.
Next morning he catches $W_{t+1}$ fish, where $W_{t+1} \stackrel{iid}{\sim} \phi$ on $\{0, 1, \ldots, B\}$. So the stock of fish tomorrow noon is $X_{t+1} = R_t + W_{t+1}$.
Kurtz's period utility function is $U(c) = c^\beta$, $0 < \beta < 1$. He uses a policy function $\sigma$ mapping the current stock $X_t$ into saving $R_t$ (i.e. a stationary policy; this induces a Markovian decision process). Since $X_t \le B + M$, let $S = \{0, 1, \ldots, B + M\}$. So $\sigma : S \to \{0, \ldots, M\}$, with $\sigma(x) \le x$ for all $x \in S$. Let $\Sigma$ be the set of all such maps. Then the problem is
$$\max_{\sigma \in \Sigma} \; E\left[\sum_{t=0}^{\infty} \rho^t \big(X_t - \sigma(X_t)\big)^\beta\right] \tag{F}$$
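A small simulation may help fix ideas. The following sketch is illustrative only: the parameter values ($B$, $M$, $\beta$, $\rho$), the uniform catch distribution, and the example policy are assumptions, not part of the lecture.

```python
import numpy as np

# Illustrative parameters -- assumptions, not from the lecture
B, M = 10, 5           # maximum catch, storage capacity
beta, rho = 0.5, 0.9   # utility exponent, discount factor
phi = np.ones(B + 1) / (B + 1)   # catch W ~ uniform on {0,...,B}
S = np.arange(B + M + 1)         # state space S = {0,...,B+M}

# An example policy: store as much as possible, sigma(x) = min(x, M)
sigma = np.minimum(S, M)

def simulate_payoff(x0, T, seed=0):
    """Simulate X_{t+1} = sigma(X_t) + W_{t+1} and the realized discounted payoff."""
    rng = np.random.default_rng(seed)
    x, payoff = x0, 0.0
    for t in range(T):
        c = x - sigma[x]               # consumption C_t = X_t - sigma(X_t)
        payoff += rho**t * c**beta     # discounted period utility
        w = rng.choice(B + 1, p=phi)   # tomorrow's catch W_{t+1} ~ phi
        x = sigma[x] + w               # tomorrow noon's stock
    return payoff

print(simulate_payoff(x0=5, T=200))    # one realized discounted payoff, truncated at T
```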
Policy Function Induced Chain
The objective depends on the initial stock. If Kurtz starts with stock $x$, a given $\sigma \in \Sigma$ induces a stochastic recursive sequence
$$X_{t+1} = \sigma(X_t) + W_{t+1}, \quad (W_t) \stackrel{iid}{\sim} \phi, \quad X_0 = x \qquad (\ddagger)$$
For each level $a \in \{0, \ldots, M\}$ of stock left over after consumption, let $\gamma(a, dy)$ be the distribution of $a + W$. So $X_{t+1} \sim \gamma(\sigma(X_t), dy)$, $t \ge 0$.
Let $M_\sigma \equiv (p_\sigma(x, y))$ be the Markov matrix corresponding to policy $\sigma$, so that for all states $x, y$,
$$p_\sigma(x, y) \equiv \gamma(\sigma(x), y).$$
For every state $x$, let $\Gamma(x) = \{0, 1, \ldots, \min\{x, M\}\}$ denote the set of feasible actions for that state.
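As a concrete sketch, the kernel $p_\sigma(x, y) = \gamma(\sigma(x), y)$ can be assembled into the matrix $M_\sigma$ row by row. The parameters and the example policy below are the same illustrative assumptions as before.

```python
import numpy as np

B, M = 10, 5                          # assumptions
phi = np.ones(B + 1) / (B + 1)        # W ~ uniform on {0,...,B} (assumption)
n = B + M + 1                         # |S|, with S = {0,...,B+M}
sigma = np.minimum(np.arange(n), M)   # example policy (assumption)

def gamma(a):
    """gamma(a, .): the distribution of a + W on S, i.e. gamma(a, y) = phi(y - a)."""
    g = np.zeros(n)
    g[a:a + B + 1] = phi
    return g

# Markov matrix M_sigma: row x is p_sigma(x, .) = gamma(sigma(x), .)
M_sigma = np.array([gamma(sigma[x]) for x in range(n)])
assert np.allclose(M_sigma.sum(axis=1), 1.0)   # each row is a distribution on S
```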
Kurtz’s Objective Function
Regarding each $W_t$ as a function on a common outcome space $\Omega$: if chance selects $\omega \in \Omega$, the entire sequence $(W_t(\omega))$ of shocks is determined, and via ($\ddagger$), so is the entire path or time series $(X_t(\omega))$. The payoff from this path is
$$Y_\sigma(\omega) = \sum_{t=0}^{\infty} \rho^t \big(X_t(\omega) - \sigma(X_t(\omega))\big)^\beta$$
Kurtz's objective function $F$ is the expectation of $Y_\sigma$ w.r.t. the probability measure over $\Omega$. This is complicated, since $\Omega$ is infinite and complicated. For now, note that if we truncate time at $T$,
$$E\left[\sum_{t=0}^{T} \rho^t (X_t - \sigma(X_t))^\beta\right] = \sum_{\mathbf{x} \in S^{T+1}} F(\mathbf{x})\, q_{T+1}(\mathbf{x})$$
where the payoffs $F(\mathbf{x}) = \sum_{t=0}^{T} \rho^t (x_t - \sigma(x_t))^\beta$ for the paths $\mathbf{x} \equiv (x_0, \ldots, x_T)$ are weighted by the path probabilities $q_{T+1}(\mathbf{x})$.
Computing Objective Function
Lemma
$$E\left[\sum_{t=0}^{T} \rho^t (X_t - \sigma(X_t))^\beta\right] = \sum_{t=0}^{T} \rho^t \big(M_\sigma^t r_\sigma\big)(x)$$
where $r_\sigma = \big((y - \sigma(y))^\beta\big)_{y \in S}$ is a vector and $\big(M_\sigma^t r_\sigma\big)(x)$ is the $x$th element of the vector $M_\sigma^t r_\sigma$. Also write $r_\sigma[y] = (y - \sigma(y))^\beta$.
Proof.
Note that $x$ above is the initial stock/state. By linearity,
$$E\left[\sum_{t=0}^{T} \rho^t (X_t - \sigma(X_t))^\beta\right] = \sum_{t=0}^{T} \rho^t\, E(X_t - \sigma(X_t))^\beta.$$
Now $E(X_t - \sigma(X_t))^\beta = E\, r_\sigma[X_t] = \sum_{y \in S} p_\sigma^t(x, y)\, r_\sigma[y]$, where $p_\sigma^t(x, y)$ is the probability that the state is $y$, $t$ periods after starting from the initial state $x$. This is the $(x, y)$th element of $M_\sigma^t$, so the last expression is just $\big(M_\sigma^t r_\sigma\big)(x)$.
Bellman’s Equation
NOTE 1. The lemma says that rather than taking an expectation over all paths, we can take, for every period $t$, the expected utility of consumption if $X_0 = x$ and the kernel is $p_\sigma$, and then take the discounted sum of these over all $t \in \{0, 1, \ldots, T\}$.
2. We show later that the limit of the expression in the lemma, that is $\sum_{t=0}^{\infty} \rho^t \big(M_\sigma^t r_\sigma\big)(x)$, is Kurtz's expected payoff from policy $\sigma$ if $X_0 = x$. Call this $v_\sigma(x)$. Define the value function $v^*(x) = \sup_{\sigma \in \Sigma} v_\sigma(x)$.
(The sup is attained, since $\Sigma$ is finite.)
Theorem
$v^*$ satisfies Bellman's Equation. Let $\Gamma(x) \equiv \{0, 1, \ldots, \min\{x, M\}\}$.
Bellman's Equation
$$v^*(x) = \max_{a \in \Gamma(x)} \Big\{ (x - a)^\beta + \rho \sum_{y \in S} v^*(y)\, \gamma(a, y) \Big\}, \quad x \in S \qquad (\dagger)$$
For each $x \in S$, the optimal action trades off current reward against discounted expected future reward.
Interpretation
Bellman's equation reduces the infinite-horizon problem to a two-period problem, for every initial stock $x$ (provided we already know the value function $v^*$!).
How will Kurtz maximize his objective function if the initial stock is $x$? Suppose he can do so starting tomorrow, for every stock $y$ that he can have at noon tomorrow. Then his payoff tomorrow will be $v^*(y)$ if that stock is $y$. If he chooses to store $a$ today, his expected payoff from tomorrow, discounted one period, is $\rho \sum_{y \in S} v^*(y)\, \gamma(a, y)$.
This is his continuation payoff. His current reward from choosing $a$ is $(x - a)^\beta$, and his total payoff is the sum of these two. Since $v^*(x)$ is the max, $a$ must be chosen optimally to attain it. A lower $a$ implies a higher current payoff, but a stochastically lower distribution $\gamma(a, \cdot)$ of tomorrow's stock. This tradeoff is best resolved at the optimal $a$, which depends on the initial $x$.
Bellman’s Equation - Proof
Proof.
Let $v_\sigma$ be the payoff from a possibly mixed (behavior) policy $\sigma$. For all $x \in S$,
$$v_\sigma(x) = \sum_{a \in \Gamma(x)} \sigma(x)(a)\Big\{ (x - a)^\beta + \rho \sum_{y \in S} \gamma(a, y)\, v_\sigma(y) \Big\}$$
$$\le \sum_{a \in \Gamma(x)} \sigma(x)(a)\Big\{ (x - a)^\beta + \rho \sum_{y \in S} \gamma(a, y)\, v^*(y) \Big\}$$
$$\le \sum_{a \in \Gamma(x)} \sigma(x)(a)\, \max_{a' \in \Gamma(x)} \Big[ (x - a')^\beta + \rho \sum_{y \in S} \gamma(a', y)\, v^*(y) \Big]$$
$$= \max_{a \in \Gamma(x)} \Big[ (x - a)^\beta + \rho \sum_{y \in S} \gamma(a, y)\, v^*(y) \Big] \sum_{a \in \Gamma(x)} \sigma(x)(a)$$
$$= \max_{a \in \Gamma(x)} \Big[ (x - a)^\beta + \rho \sum_{y \in S} \gamma(a, y)\, v^*(y) \Big]$$
(Here $\sigma(x)(a)$ is the probability of choosing $a$ in state $x$.)
Since this inequality holds for all $\sigma$ and $v_\sigma$, it holds for the sup $v^*$.
For pure policy functions $\sigma$, $v_\sigma(x) \le$ RHS of Bellman's equation is transparent, and so $\le$ must hold for the sup, $v^*(x)$.
Proof continued
Proof (continued).
Conversely, fix a state $x \in S$ and an $\epsilon > 0$. Let $\sigma$ be such that at time $t = 0$, $\sigma(x) = a_0$, where $a_0$ maximizes $\big[(x - a)^\beta + \rho \sum_{y \in S} \gamma(a, y)\, v^*(y)\big]$, and subsequently, for every state $y \in S$, $\sigma$ specifies a policy (function) $\sigma_y$ whose value satisfies $v_{\sigma_y}(y) \ge v^*(y) - \epsilon$. So,
$$v_\sigma(x) = (x - a_0)^\beta + \rho \sum_{y \in S} \gamma(a_0, y)\, v_{\sigma_y}(y)$$
$$\ge (x - a_0)^\beta + \rho \sum_{y \in S} \gamma(a_0, y)\, v^*(y) - \rho\epsilon$$
$$= \max_{a \in \Gamma(x)} \Big[ (x - a)^\beta + \rho \sum_{y \in S} \gamma(a, y)\, v^*(y) \Big] - \rho\epsilon$$
by the choice of $a_0$. Since $v^*(x) \ge v_\sigma(x)$, the above inequality holds with $v^*(x)$ on the LHS. Since it holds for every $\epsilon > 0$, $v^*(x) \ge \max_{a \in \Gamma(x)} \big[(x - a)^\beta + \rho \sum_{y \in S} \gamma(a, y)\, v^*(y)\big]$.
Optimal Policy
Theorem
$\sigma^*$ is optimal (i.e., $v_{\sigma^*}(x) = v^*(x)$ for all $x \in S$) if and only if
$$\sigma^*(x) = \arg\max_{a \in \Gamma(x)} \Big\{ (x - a)^\beta + \rho \sum_{y \in S} v^*(y)\, \gamma(a, y) \Big\}, \quad \forall x \in S \qquad (\dagger\dagger)$$
Proof.
Suppose $\sigma^*$ is optimal.
So for all $x$, $v_{\sigma^*}(x) = v^*(x)$. (1)
Since $v^*$ satisfies Bellman's equation, we have, for all $x \in S$,
$$v^*(x) = \max_{a \in \Gamma(x)} \Big\{ (x - a)^\beta + \rho \sum_{y \in S} v^*(y)\, \gamma(a, y) \Big\}. \qquad (2)$$
Optimal Policy 2
Proof (continued).
Also, by definition, we have
$$v_{\sigma^*}(x) = (x - \sigma^*(x))^\beta + \rho \sum_{y \in S} v_{\sigma^*}(y)\, \gamma(\sigma^*(x), y). \qquad (3)$$
Substituting (1) into (3) yields
$$v^*(x) = (x - \sigma^*(x))^\beta + \rho \sum_{y \in S} v^*(y)\, \gamma(\sigma^*(x), y). \qquad (4)$$
Comparing (2) and (4), we get ($\dagger\dagger$).
Optimal Policy contd.
Proof (continued).
Conversely, suppose $\sigma^*$ satisfies ($\dagger\dagger$). Then for $X_0 = x$, we can attain $v^*(x)$ by playing $\sigma^*(x)$ for one period and getting continuation payoffs according to $v^*$. Extending this, we can get the continuation payoffs $v^*(y)$ in the second stage by playing $\sigma^*(y)$ then, and getting continuation payoffs according to $v^*$ from period $t = 2$ onwards. So $v^*(x)$ equals the payoff from playing according to $\sigma^*$ for 2 periods, then getting continuation payoffs according to $v^*$ (discounted by $\rho^2$) from the 3rd period on. Extending this to $N$ periods: the payoff $v_{\sigma^*_N}(x)$ from the strategy that follows $\sigma^*$ for $N$ periods and subsequently gets continuation payoffs according to $v^*$ equals $v^*(x)$. The difference between this payoff and the payoff from following $\sigma^*$ forever is therefore $\rho^N$ times some bounded number. This difference goes to zero as $N \to \infty$. So $v^*(x) = v_{\sigma^*}(x)$.
Fixed Point Iteration
Note: We’ve seen a more general one-deviation property in game theory for subgame perfect equilibria. The logic is the same.
How to solve for $v^*, \sigma^*$? Choose any $v : S \to \mathbb{R}$, and define the map $Tv : S \to \mathbb{R}$ by
$$Tv(x) = \max_{a \in \Gamma(x)} \Big\{ (x - a)^\beta + \rho \sum_{y \in S} v(y)\, \gamma(a, y) \Big\}, \quad x \in S$$
It will be shown that $T$ is a contraction with modulus $\rho$ on $V = \{v : S \to \mathbb{R}\}$ with metric $d_\infty$. By Banach's theorem, $v^*$ is the unique fixed point of $T$. Since $(T^n(v))_n$ is a Cauchy sequence converging to $v^*$ (for any $v \in V$), we start with any $v$ and successively apply $T$ until $d_\infty(T^n(v), T^{n-1}(v))$ is small.
Value Iteration Algorithm
Start with any $v : S \to \mathbb{R}$. Set a Tolerance. Set a Difference.
While Difference > Tolerance: compute $Tv$; set Difference $= d_\infty(Tv, v)$; set $v = Tv$.
Having arrived at a $v$ that is $\epsilon$-close to $v^*$, solve for a $v$-greedy policy $\sigma$ (i.e. $\sigma$ is optimal w.r.t. $v$; greedy = locally optimal). That is,
$$\sigma(x) = \arg\max_{a \in \Gamma(x)} \Big[ (x - a)^\beta + \rho \sum_{y \in S} v(y)\, \gamma(a, y) \Big], \quad \forall x \in S.$$
Note that we appeal to a continuity argument (the computed $v$ is close to $v^*$, so the optimizer of $v$ is close to the optimizer $\sigma^*$).
For practical purposes, don't use functions $v$ and recursions on them; code $v$ as a vector.
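A minimal sketch of the algorithm in code, with $v$ coded as a vector as suggested; the parameter values, catch distribution, and tolerance are illustrative assumptions.

```python
import numpy as np

B, M = 10, 5                       # assumptions
beta, rho = 0.5, 0.9               # assumptions
phi = np.ones(B + 1) / (B + 1)     # catch distribution (assumption)
n = B + M + 1                      # |S|

def gamma(a):
    """Distribution of a + W on S = {0,...,B+M}."""
    g = np.zeros(n)
    g[a:a + B + 1] = phi
    return g

def T(v):
    """Bellman operator: (Tv)(x) = max_{a in Gamma(x)} (x-a)^beta + rho sum_y v(y) gamma(a,y)."""
    Tv = np.empty(n)
    for x in range(n):
        Tv[x] = max((x - a)**beta + rho * gamma(a) @ v
                    for a in range(min(x, M) + 1))
    return Tv

def value_iteration(tol=1e-8):
    v = np.zeros(n)                     # any starting v will do
    diff = tol + 1.0
    while diff > tol:                   # while Difference > Tolerance
        Tv = T(v)
        diff = np.max(np.abs(Tv - v))   # Difference = d_infinity(Tv, v)
        v = Tv
    return v

def greedy(v):
    """A v-greedy policy sigma."""
    return np.array([max(range(min(x, M) + 1),
                         key=lambda a: (x - a)**beta + rho * gamma(a) @ v)
                     for x in range(n)])

v_approx = value_iteration()        # approximately v*
sigma_approx = greedy(v_approx)     # approximately sigma*
```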
Howard’s Improvement Algo
- Start with some $\sigma \in \Sigma$.
- Compute $v_\sigma$.
- Compute a $v_\sigma$-greedy policy $\sigma'$. That is, for every $x \in S$, $\sigma'(x)$ maximizes, over all $a \in \Gamma(x)$, $\big\{U(x - a) + \rho \sum_{z \in Z} v_\sigma(a + z)\, \phi(z)\big\}$ (where $Z = \{0, \ldots, B\}$ is the support of the shock distribution $\phi$).
- Evaluate the difference $\|\sigma - \sigma'\|$.
- Set $\sigma = \sigma'$.
- Repeat until successive policies have difference zero (a code sketch follows below).
Note how values are mapped and remapped from round to round.
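A sketch of the loop. Here $v_\sigma$ is obtained exactly by solving the linear system $v_\sigma = r_\sigma + \rho M_\sigma v_\sigma$; the truncated-sum evaluation described on the next slide is an equally valid alternative. Parameters and the starting policy are illustrative assumptions.

```python
import numpy as np

B, M = 10, 5                       # assumptions
beta, rho = 0.5, 0.9               # assumptions
phi = np.ones(B + 1) / (B + 1)
n = B + M + 1

def gamma(a):
    g = np.zeros(n)
    g[a:a + B + 1] = phi           # distribution of a + W
    return g

def value_of(sigma):
    """v_sigma, here from the exact linear system v = r_sigma + rho * M_sigma v."""
    M_sig = np.array([gamma(sigma[x]) for x in range(n)])
    r_sig = np.array([(x - sigma[x])**beta for x in range(n)])
    return np.linalg.solve(np.eye(n) - rho * M_sig, r_sig)

sigma = np.zeros(n, dtype=int)                 # start with some policy in Sigma
while True:
    v_sig = value_of(sigma)                    # compute v_sigma
    sigma_new = np.array([max(range(min(x, M) + 1),   # v_sigma-greedy policy sigma'
                              key=lambda a: (x - a)**beta + rho * gamma(a) @ v_sig)
                          for x in range(n)])
    if np.array_equal(sigma_new, sigma):       # difference zero: stop
        break
    sigma = sigma_new                          # set sigma = sigma'
```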
Howard’s Improvement 2
- To evaluate $v_\sigma$ when starting the optimization problem from a particular initial state $x$, we may evaluate $\sum_{t=0}^{T} \rho^t \big(M_\sigma^t r_\sigma\big)(x)$ for large $T$.
- The code writes this as a function that takes an array 'sigma' (one action for each state) and returns an array 'v_sigma' (one value $v_\sigma(x)$ for every initial state $x$).
- The code defines a stochastic kernel $p_\sigma(x, y)$, then uses it to define a function 'M_sigma' corresponding to the linear transformation $M_\sigma$.
- It creates the array 'r_sigma', then steps through 50 terms of $\rho^t M_\sigma^t r_\sigma$, adding each one to the return value (see the sketch below).
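A sketch along these lines (the array names follow the slide; the parameter values are illustrative assumptions, and $M_\sigma$ is represented directly as a matrix rather than as a separate function):

```python
import numpy as np

B, M = 10, 5                       # assumptions
beta, rho = 0.5, 0.9               # assumptions
phi = np.ones(B + 1) / (B + 1)
n = B + M + 1

def compute_v_sigma(sigma, T=50):
    """Take an array sigma (one action per state); return v_sigma (one value per initial state)."""
    # stochastic kernel p_sigma(x, y) = gamma(sigma(x), y), stored as the matrix M_sigma
    M_sigma = np.zeros((n, n))
    for x in range(n):
        M_sigma[x, sigma[x]:sigma[x] + B + 1] = phi
    # reward vector r_sigma[y] = (y - sigma(y))^beta
    r_sigma = np.array([(y - sigma[y])**beta for y in range(n)])
    # step through T terms of rho^t M_sigma^t r_sigma, adding each to the return value
    v_sigma = np.zeros(n)
    term = r_sigma.copy()               # M_sigma^0 r_sigma
    for t in range(T):
        v_sigma += rho**t * term
        term = M_sigma @ term           # becomes M_sigma^{t+1} r_sigma
    return v_sigma

v_sigma = compute_v_sigma(np.minimum(np.arange(n), M))   # e.g. the "store everything" policy
```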
Stochastic Recursive Sequences
Stochastic Recursive Sequence
$$X_{t+1} = F(X_t, W_{t+1}), \quad X_0 \sim \psi$$
$(W_t)_{t \ge 0}$ is a sequence of independent r.v.s (shocks). Random variables are functions defined on an outcome space.
One way to model the SRS above: chance moves once, at the beginning, selecting an outcome $\omega \in \Omega$. This determines values for all the shocks, $(W_t(\omega))_{t \ge 0}$, as well as a realization $X_0(\omega)$ of $X_0$ (with $P\{\omega : X_0(\omega) = x_i\} = \psi(x_i)$).
Thus the entire time path (time series) $(X_t(\omega))$ is determined recursively as $X_1(\omega) = F(X_0(\omega), W_1(\omega))$, $X_2(\omega) = F(X_1(\omega), W_2(\omega))$, and so on.
From the SRS, we get a stochastic kernel $p$ on $S$:
$$p(x, y) = P\{F(x, W_t) = y\} \equiv P\{\omega \in \Omega : F(x, W_t(\omega)) = y\}$$
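When the shocks are i.i.d. on a finite set, the kernel can be read off from $F$ by summing shock probabilities. The particular $F$, state space, and shock distribution below are illustrative assumptions.

```python
import numpy as np

# Illustrative SRS (assumptions): states 0..3, shocks w in {0, 1} with equal probability
S = [0, 1, 2, 3]
W_vals, W_probs = [0, 1], [0.5, 0.5]

def F(x, w):
    """An arbitrary example rule: move up (capped at 3) if w = 1, down (floored at 0) if w = 0."""
    return min(x + 1, 3) if w == 1 else max(x - 1, 0)

# p(x, y) = P{F(x, W_t) = y}
p = np.zeros((len(S), len(S)))
for x in S:
    for w, pw in zip(W_vals, W_probs):
        p[x, F(x, w)] += pw

assert np.allclose(p.sum(axis=1), 1.0)   # each row of the kernel is a distribution
```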
MC as SRS
Conversely, here is how we can represent a Markov-$(p, \psi)$ chain as an SRS. We want $X_0 \sim \psi$, $X_1 \sim p(X_0, dy)$, and so forth.
We do this in two steps. First, we define a function $\tau : (0, 1] \to S$, parameterized by a distribution $\phi$ on $S$, such that if $W$ is uniformly distributed on $(0, 1]$, then $\tau(W; \phi)$ has distribution $\phi$ on $S$.
Once we have this, we use the $W$'s to model shocks. Suppose $(W_t)$ is an independent, uniformly $(0, 1]$-distributed sequence of r.v.s. Then
$$X_0 = \tau(W_0; \psi), \quad X_{t+1} = F(X_t, W_{t+1}), \quad \text{where } F(X_t, W_{t+1}) = \tau\big(W_{t+1}; p(X_t, dy)\big).$$
Defining τ(W; φ)
Let $S = \{x_1, \ldots, x_N\}$. Partition $(0, 1]$ into subintervals of lengths $\phi(x_i)$, $i = 1, \ldots, N$.
We could do this by defining, for $i = 1, \ldots, N$,
$$I(x_i; \phi) = \Big( \sum_{s=1}^{i-1} \phi(x_s),\; \sum_{s=1}^{i} \phi(x_s) \Big].$$
Then define $z \mapsto \tau(z; \phi)$ by
$$\tau(z; \phi) = \sum_{x \in S} x\, \mathbf{1}\{z \in I(x; \phi)\}$$
Note that $\tau$ is a simple function (a linear combination of indicator functions). Note also that we use the function $\tau(\cdot\,; \phi)$, whose value has distribution $\phi$ when its argument is uniform on $(0, 1]$, rather than simply the distribution $\phi$ itself, in order to get an SRS where $X_{t+1}$ has the desired distribution.
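A sketch of $\tau$ and of simulating a Markov-$(p, \psi)$ chain with it. The states (taken to be the indices $0, 1, 2$), the kernel, and the initial distribution below are illustrative assumptions.

```python
import numpy as np

def tau(z, phi):
    """tau(z; phi): index i such that z lies in the i-th subinterval I(x_i; phi) of (0,1]."""
    cum = np.cumsum(phi)                   # right endpoints of the intervals I(x_i; phi)
    return min(int(np.searchsorted(cum, z)), len(phi) - 1)   # guard against rounding at 1.0

# Illustrative Markov-(p, psi) chain with states {0, 1, 2} (assumptions)
psi = np.array([0.2, 0.5, 0.3])
p = np.array([[0.9, 0.1, 0.0],
              [0.4, 0.4, 0.2],
              [0.1, 0.1, 0.8]])

rng = np.random.default_rng(1)
W = rng.uniform(size=100)                  # iid (approximately Uniform(0,1]) shocks
X = [tau(W[0], psi)]                       # X_0 = tau(W_0; psi)
for t in range(1, 100):
    X.append(tau(W[t], p[X[-1]]))          # X_{t+1} = tau(W_{t+1}; p(X_t, .))
```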
Path Probabilities
Let $(X_t)_{t=0}^{\infty}$ be a Markov process with $X_0 \sim \psi$.
Then $(X_0, X_1)$ is a random vector distributed on $S^2 = S \times S$, with $q_2(x_0, x_1) = P\{X_0 = x_0, X_1 = x_1\} = P(\{X_0 = x_0\} \cap \{X_1 = x_1\})$. So
$$q_2(x_0, x_1) = P\{X_0 = x_0\}\, P\{X_1 = x_1 \mid X_0 = x_0\} = \psi(x_0)\, p(x_0, x_1)$$
Similarly,
$$q_3(x_0, x_1, x_2) = P\{X_0 = x_0, X_1 = x_1\}\, P\{X_2 = x_2 \mid X_0 = x_0, X_1 = x_1\} = \psi(x_0)\, p(x_0, x_1)\, p(x_1, x_2),$$
where the last equality follows from the Markov assumption. More generally,
$$q_{T+1}(x_0, \ldots, x_T) = \psi(x_0) \prod_{t=0}^{T-1} p(x_t, x_{t+1})$$
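The path probability formula is straightforward to compute directly; the kernel and initial distribution below are the same illustrative assumptions as in the previous sketch.

```python
import numpy as np

psi = np.array([0.2, 0.5, 0.3])            # initial distribution (assumption)
p = np.array([[0.9, 0.1, 0.0],
              [0.4, 0.4, 0.2],
              [0.1, 0.1, 0.8]])            # kernel (assumption)

def path_probability(path):
    """q_{T+1}(x_0,...,x_T) = psi(x_0) * prod_t p(x_t, x_{t+1})."""
    q = psi[path[0]]
    for xt, xt1 in zip(path[:-1], path[1:]):
        q *= p[xt, xt1]
    return q

print(path_probability([1, 0, 0, 1]))      # probability of the path (1, 0, 0, 1)
```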
Joint Distributions
Lemma
Let $D_t \subset S$, for $t = 0, 1, \ldots, T$. Then
$$P\Big(\bigcap_{t \le T} \{X_t \in D_t\}\Big) = \sum_{x_0 \in D_0} \psi(x_0) \sum_{x_1 \in D_1} p(x_0, x_1) \cdots \sum_{x_T \in D_T} p(x_{T-1}, x_T)$$
Proof.
Sketch. The LHS equals the sum of the probabilities of all paths $\mathbf{x} = (x_0, \ldots, x_T) \in \times_{t=0}^{T} D_t$. That is, it equals $\sum_{\mathbf{x} \in \times_t D_t} q_{T+1}(\mathbf{x})$. By the path probability formula on the previous slide, $q_{T+1}(\mathbf{x})$ is a product. Now begin summing over these products by holding $(x_0, \ldots, x_{T-1})$ fixed and summing over all $x_T \in D_T$; then recursively work backward.
Coupling
Coupling - a probabilistic technique to study stochastic stability.
Let $p$ be a stochastic kernel with Markov matrix $M$.
Global stability of $(\mathcal{P}(S), M)$ is characterized by $\alpha(p^t) > 0$ for some $t$. For simplicity, what follows deals with the $t = 1$ case. NOTE:
$\alpha(p) > 0$ is equivalent to $\epsilon > 0$, where
$$\epsilon \equiv \min\Big\{ \sum_{y \in S} p(x, y)\, p(x', y) \;:\; x, x' \in S \Big\}$$
The idea behind $\epsilon > 0$ is also that of positive overlap between all pairs of distributions: if we start from different $x, x'$, there is positive probability that the chains meet next period.
We give a different argument for why $\epsilon > 0$ implies global stability.
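$\epsilon$ is easy to compute from the kernel, since the matrix $p\,p^{\top}$ has $(x, x')$ entry $\sum_y p(x, y)\, p(x', y)$. The kernel below is an illustrative assumption.

```python
import numpy as np

p = np.array([[0.9, 0.1, 0.0],
              [0.4, 0.4, 0.2],
              [0.1, 0.1, 0.8]])    # illustrative kernel (assumption)

overlaps = p @ p.T                 # (x, x') entry: sum_y p(x, y) p(x', y)
epsilon = overlaps.min()           # epsilon = min over all pairs x, x'
print(epsilon)                     # positive here: every pair of rows overlaps
```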
Coupling 2
Let $(X_t)$ and $(X_t^*)$ be independent Markov chains generated by $p$, where $X_0 \sim \psi$ and $X_0^* \sim \psi^*$, where $\psi^*$ is any stationary distribution of $M$. Consider a Markov process $(X_t')$ which follows $(X_t)$ until the random time $\nu \equiv \min\{t \ge 1 : X_t = X_t^*\}$, and then switches to following $(X_t^*)$. $\nu$ is known as the coupling time.
Claim: The distributions of $X_t$ and $X_t'$ are the same, for all $t$. (So $\psi_t' = \psi M^t$, $\forall t$.)
We need to show that $(X_t')_{t \ge 0}$ is Markov-$(p, \psi)$, just as $(X_t)_{t \ge 0}$ is. (Note that $(X_t^*)_{t \ge 0}$ is Markov-$(p, \psi^*)$.) That is, $X_0' \sim \psi$ (which is true) and $X_{t+1}' \sim p(X_t', dy)$, $t = 0, 1, 2, \ldots$.
For $t < \nu$, this is true since $X_s' = X_s$, $s = t, t + 1$, can be used on either side of the $\sim$ relation. For $t \ge \nu$, using the fact that $X_{t+1}^* \sim p(X_t^*, dy)$, and $X_s' = X_s^*$, $s = t, t + 1$, on either side of $\sim$ completes the argument.
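A simulation sketch of the construction: run $(X_t)$ and $(X_t^*)$ independently, stop at the first $t \ge 1$ where they meet (the coupling time $\nu$), and splice $(X_t')$ together from the two paths. The kernel and $\psi$ are illustrative assumptions, and $\psi^*$ is approximated by iterating the kernel rather than taken as given.

```python
import numpy as np

p = np.array([[0.9, 0.1, 0.0],
              [0.4, 0.4, 0.2],
              [0.1, 0.1, 0.8]])                    # illustrative kernel (assumption)
psi = np.array([1.0, 0.0, 0.0])                    # arbitrary initial distribution (assumption)
psi_star = psi @ np.linalg.matrix_power(p, 500)    # approximately stationary for this kernel

rng = np.random.default_rng(2)

def draw(dist):
    """Draw a state index from the distribution dist."""
    return int(rng.choice(3, p=dist))

# independent chains: X_t is Markov-(p, psi), X*_t is Markov-(p, psi*)
X, X_star = [draw(psi)], [draw(psi_star)]
while len(X) == 1 or X[-1] != X_star[-1]:          # run until they first meet at some t >= 1
    X.append(draw(p[X[-1]]))
    X_star.append(draw(p[X_star[-1]]))

nu = len(X) - 1                                    # coupling time
X_prime = X[:nu] + X_star[nu:]                     # X' follows X before nu, X* from nu on
print("coupling time nu =", nu)
```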
Coupling 3
We want to show that if $\epsilon > 0$, then $\psi M^t$, the marginal distribution of (the arbitrary) $X_t$, converges to $\psi^*$ (this will also show that $\psi^*$ is the unique stationary distribution, and establish global stability). First, a lemma.
Lemma
Suppose $X$ and $Y$ are r.v.s taking values in $S$, with distributions $\phi_X, \phi_Y$. Then
$$\|\phi_X - \phi_Y\|_\infty \equiv \max_{x \in S} |\phi_X(x) - \phi_Y(x)| \le P\{X \ne Y\}$$
Proof to be done later. The lemma says that if $X, Y$ are close (equal with high probability), so are their distributions.
Applied to the r.v.s $X_t', X_t^*$, we have
$$\|\psi M^t - \psi^*\|_\infty \le P\{X_t' \ne X_t^*\}$$
Coupling 4
To show that the LHS goes to 0 as $t \to \infty$, we show that the RHS does. Here's where $X_t'$ is used:
$$P\{X_t' \ne X_t^*\} = P\Big(\bigcap_{j \le t} \{X_j \ne X_j^*\}\Big),$$
for if $X_j = X_j^*$ for some $j \le t$, then $X_t'$ and $X_t^*$ would have been coupled, and hence equal, by time $t$.
We show that the term on the RHS goes to 0 as $t \to \infty$.
Theorem
$$P\Big(\bigcap_{j \le t} \{X_j \ne X_j^*\}\Big) \le (1 - \epsilon)^t, \quad \forall t \in \mathbb{N},$$
where $\epsilon \equiv \min\Big\{ \sum_{y \in S} p(x, y)\, p(x', y) : x, x' \in S \Big\}$.
Coupling 5
Proof.
$(X_t, X_t^*)$ is a Markov chain on $S \times S$. By independence, the initial condition is
$$P\{(X_0, X_0^*) = (x, s)\} = \psi(x)\, \psi^*(s) \qquad (\text{call this } \psi \times \psi^*(\mathbf{x}_0), \text{ where } \mathbf{x}_0 = (x, s)),$$
and the kernel is
$$k(\mathbf{x}, \mathbf{x}') = k\big((x, s), (x', s')\big) = p(x, x')\, p(s, s').$$
Letting $D = \{(x, s) \in S \times S : x = s\}$,
$$P\Big(\bigcap_{j \le t} \{X_j \ne X_j^*\}\Big) = P\Big(\bigcap_{j \le t} \{(X_j, X_j^*) \in D^c\}\Big).$$
This equals
$$\sum_{\mathbf{x}_0 \in D^c} \psi \times \psi^*(\mathbf{x}_0) \sum_{\mathbf{x}_1 \in D^c} k(\mathbf{x}_0, \mathbf{x}_1) \cdots \sum_{\mathbf{x}_t \in D^c} k(\mathbf{x}_{t-1}, \mathbf{x}_t) \qquad (\dagger)$$
Now $\sum_{\mathbf{x}_t \in D^c} k(\mathbf{x}_{t-1}, \mathbf{x}_t) = 1 - \sum_{\mathbf{x}_t \in D} k(\mathbf{x}_{t-1}, \mathbf{x}_t)$, and
$$\sum_{\mathbf{x}_t \in D} k(\mathbf{x}_{t-1}, \mathbf{x}_t) = \sum_{(x_t, s_t) \in D} p(x_{t-1}, x_t)\, p(s_{t-1}, s_t),$$
which, by definition of $D$, equals $\sum_{y \in S} p(x_{t-1}, y)\, p(s_{t-1}, y)$, which is $\ge \epsilon$. So $\sum_{\mathbf{x}_t \in D^c} k(\mathbf{x}_{t-1}, \mathbf{x}_t) \le 1 - \epsilon$. The same argument holds for each of the $t$ inner sums of ($\dagger$), which yields the bound $(1 - \epsilon)^t$.
Proof of the Lemma
Proof
Take any event $B$.
$$P\{X \in B\} = P(\{X \in B\} \cap \{X = Y\}) + P(\{X \in B\} \cap \{X \ne Y\})$$
$$P\{Y \in B\} = P(\{Y \in B\} \cap \{X = Y\}) + P(\{Y \in B\} \cap \{X \ne Y\})$$
The first terms on the RHS of both equations are equal. So
$$P\{X \in B\} - P\{Y \in B\} = P(\{X \in B\} \cap \{X \ne Y\}) - P(\{Y \in B\} \cap \{X \ne Y\})$$
So $P\{X \in B\} - P\{Y \in B\} \le P(\{X \in B\} \cap \{X \ne Y\}) \le P\{X \ne Y\}$.
We could switch $X$ and $Y$ above as well. So
$$|P\{X \in B\} - P\{Y \in B\}| \le P\{X \ne Y\}.$$
Taking $B = \{x\}$ and maximizing over $x \in S$ gives the lemma.