
Stochastic Dynamics
Lecture 5: Topics in Finite State Markov Chains

A. Banerji
Department of Economics
February 26, 2015


Introduction

THE PROBLEM

Col. Kurtz, infinitely lived, lives on fish. He has a stock $X_t$ of fish at noon on date $t$. He stores $R_t \le M$ and eats the rest, $C_t = X_t - R_t$ (he does not throw anything away, because his utility is increasing in consumption). $M$ is his storage capacity.

Next morning he catches $W_{t+1}$ fish, where $W_{t+1} \stackrel{iid}{\sim} \phi$ on $\{0, 1, \ldots, B\}$. So the stock of fish at noon tomorrow is $X_{t+1} = R_t + W_{t+1}$.

Kurtz's period utility function is $U(c) = c^\beta$, $0 < \beta < 1$. He uses a policy function $\sigma$ mapping the current stock $X_t$ into the saving $R_t$ (i.e. a stationary policy; this induces a Markovian decision process). Since $X_t \le B + M$, let $S = \{0, 1, \ldots, B + M\}$. So $\sigma : S \to \{0, \ldots, M\}$, with $\sigma(x) \le x$ for all $x \in S$. Let $\Sigma$ be the set of all such maps. The problem is then

$$\max_{\sigma \in \Sigma} \; E\left[\sum_{t=0}^{\infty} \rho^t \, (X_t - \sigma(X_t))^\beta\right]$$

The expectation being maximized is Kurtz's objective function $F$.


Policy Function Induced Chain

The objective depends on the initial stock. If Kurtz starts with stock $x$, a given $\sigma \in \Sigma$ induces a stochastic recursive sequence

$$X_{t+1} = \sigma(X_t) + W_{t+1}, \qquad (W_t) \stackrel{iid}{\sim} \phi, \qquad X_0 = x \qquad (\ddagger)$$

For each level $a \in \{0, \ldots, M\}$ of stock left over after consumption, let $\gamma(a, dy)$ be the distribution of $a + W$. So $X_{t+1} \sim \gamma(\sigma(X_t), dy)$, $t \ge 0$.

Let $M_\sigma \equiv (p_\sigma(x, y))$ be the Markov matrix corresponding to policy $\sigma$, so that for all states $x, y$,

$$p_\sigma(x, y) \equiv \gamma(\sigma(x), y).$$

For every state $x$, let $\Gamma(x) = \{0, 1, \ldots, \min\{x, M\}\}$ denote the set of feasible actions at that state.


Kurtz’s Objective Function

Regarding each $W_t$ as a function on a common outcome space $\Omega$: if chance selects $\omega \in \Omega$, the entire sequence $(W_t(\omega))$ of shocks is determined, and via $(\ddagger)$, so is the entire path or time series $(X_t(\omega))$. The payoff from this path is

$$Y_\sigma(\omega) = \sum_{t=0}^{\infty} \rho^t \, (X_t(\omega) - \sigma(X_t(\omega)))^\beta$$

Kurtz's objective function $F$ is the expectation of $Y_\sigma$ with respect to the probability measure over $\Omega$. This is complicated, since $\Omega$ is an infinite and complicated space. For now, note that if we truncate time at $T$,

$$E\left[\sum_{t=0}^{T} \rho^t (X_t - \sigma(X_t))^\beta\right] = \sum_{\mathbf{x} \in S^{T+1}} F(\mathbf{x}) \, q_{T+1}(\mathbf{x})$$

where the payoffs $F(\mathbf{x}) = \sum_{t=0}^{T} \rho^t (x_t - \sigma(x_t))^\beta$ for the paths $\mathbf{x} \equiv (x_0, \ldots, x_T)$ are weighted by the path probabilities $q_{T+1}(\mathbf{x})$.


Computing Objective Function

Lemma

$$E\left[\sum_{t=0}^{T} \rho^t (X_t - \sigma(X_t))^\beta\right] = \sum_{t=0}^{T} \rho^t \, M_\sigma^t r_\sigma(x)$$

where $r_\sigma = \left((y - \sigma(y))^\beta\right)_{y \in S}$ is a vector, and $M_\sigma^t r_\sigma(x)$ is the $x$th element of the vector $M_\sigma^t r_\sigma$. Also, write $r_\sigma[y] = (y - \sigma(y))^\beta$.

Proof.

Note that $x$ above is the initial stock/state. By linearity,

$$E\left[\sum_{t=0}^{T} \rho^t (X_t - \sigma(X_t))^\beta\right] = \sum_{t=0}^{T} \rho^t \, E(X_t - \sigma(X_t))^\beta.$$

Now $E(X_t - \sigma(X_t))^\beta = E\, r_\sigma[X_t] = \sum_{y \in S} p_\sigma^t(x, y)\, r_\sigma[y]$, where $p_\sigma^t(x, y)$ is the probability that the state is $y$, $t$ periods after starting from initial state $x$. This is the $(x, y)$th element of $M_\sigma^t$, so the last expression is just $M_\sigma^t r_\sigma(x)$.
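For concreteness, a minimal numerical sketch of this formula (not from the lecture; names are illustrative): given a Markov matrix `M_sigma` and reward vector `r_sigma` as NumPy arrays, it accumulates $\sum_{t=0}^{T} \rho^t M_\sigma^t r_\sigma$ for every initial state at once.

```python
import numpy as np

def truncated_value(M_sigma, r_sigma, rho, T):
    """Return the vector of truncated payoffs sum_{t=0}^{T} rho^t (M_sigma^t r_sigma)."""
    v = np.zeros_like(r_sigma, dtype=float)
    term = r_sigma.astype(float)          # M_sigma^0 r_sigma
    for t in range(T + 1):
        v += rho**t * term
        term = M_sigma @ term             # advance to M_sigma^{t+1} r_sigma
    return v
```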


Bellman’s Equation

NOTE 1. The lemma says that rather than taking an expectation over all paths, we can take, for every period $t$, the expected utility of consumption when $X_0 = x$ and the kernel is $p_\sigma$, and then take the discounted sum of these over all $t \in \{0, 1, \ldots, T\}$.

2. We show later that the limit of the expression in the lemma, that is $\sum_{t=0}^{\infty} \rho^t M_\sigma^t r_\sigma(x)$, is Kurtz's expected payoff from policy $\sigma$ if $X_0 = x$. Call this $v_\sigma(x)$. Define the value function $v(x) = \sup_{\sigma \in \Sigma} v_\sigma(x)$. (The sup exists since $\Sigma$ is finite.)

Theorem

$v$ satisfies Bellman's Equation. Let $\Gamma(x) \equiv \{0, 1, \ldots, \min\{x, M\}\}$.

Bellman's Equation

$$v(x) = \max_{a \in \Gamma(x)} \left\{ (x - a)^\beta + \rho \sum_{y \in S} v(y)\, \gamma(a, y) \right\}, \quad x \in S \qquad (\dagger)$$

For each $x \in S$, the optimal action trades off current reward against discounted expected future reward.


Interpretation

Bellman's equation reduces the infinite horizon problem to a two-period problem, for every initial stock $x$ (provided we already know the value function $v$!).

How will Kurtz maximize his objective function if the initial stock is $x$? Suppose he can do so starting tomorrow, for every stock $y$ that he may have at noon tomorrow. Then his payoff tomorrow will be $v(y)$ if that stock is $y$. If he chooses to store $a$ today, his expected payoff from tomorrow onward, discounted one period, is $\rho \sum_{y \in S} v(y) \gamma(a, y)$. This is his continuation payoff. His current reward from choosing $a$ is $(x - a)^\beta$, and his total payoff is the sum of these two. Since $v(x)$ is the max, $a$ must be chosen optimally to attain it. A lower $a$ implies a higher current payoff, but a stochastically lower distribution $\gamma(a, dy)$ over tomorrow's stock. This tradeoff is best resolved at the optimal $a$, which depends on the initial $x$.


Bellman’s Equation - Proof

Proof.

Let $v_\sigma$ be the payoff from a possibly mixed behavior policy $\sigma$. For all $x \in S$,

$$\begin{aligned}
v_\sigma(x) &= \sum_{a \in \Gamma(x)} \sigma(x)(a) \left\{ (x - a)^\beta + \rho \sum_{y \in S} \gamma(a, y)\, v_\sigma(y) \right\} \\
&\le \sum_{a \in \Gamma(x)} \sigma(x)(a) \left\{ (x - a)^\beta + \rho \sum_{y \in S} \gamma(a, y)\, v(y) \right\} \\
&\le \sum_{a \in \Gamma(x)} \sigma(x)(a) \max_{a' \in \Gamma(x)} \left[ (x - a')^\beta + \rho \sum_{y \in S} \gamma(a', y)\, v(y) \right] \\
&= \max_{a \in \Gamma(x)} \left[ (x - a)^\beta + \rho \sum_{y \in S} \gamma(a, y)\, v(y) \right] \sum_{a \in \Gamma(x)} \sigma(x)(a) \\
&= \max_{a \in \Gamma(x)} \left[ (x - a)^\beta + \rho \sum_{y \in S} \gamma(a, y)\, v(y) \right]
\end{aligned}$$

(Here $\sigma(x)(a)$ is the probability of choosing $a$ in state $x$.)

Since this inequality holds for every $\sigma$ and $v_\sigma$, it holds for the sup, $v$.

For pure policy functions $\sigma$, the inequality $v_\sigma(x) \le$ RHS of Bellman's equation is transparent, and so $\le$ must hold for the sup, $v(x)$.


Proof continued

Proof.

Conversely, fix a state $x \in S$ and an $\varepsilon > 0$. Let $\sigma$ be such that at time $t = 0$, $\sigma(x) = a_0$, where $a_0$ maximizes $\left[ (x - a)^\beta + \rho \sum_y \gamma(a, y)\, v(y) \right]$, and subsequently, for every state $y \in S$, $\sigma$ specifies a policy (function) $\sigma_y$ whose value satisfies $v_{\sigma_y}(y) \ge v(y) - \varepsilon$. So,

$$\begin{aligned}
v_\sigma(x) &= (x - a_0)^\beta + \rho \sum_y \gamma(a_0, y)\, v_{\sigma_y}(y) \\
&\ge (x - a_0)^\beta + \rho \sum_y \gamma(a_0, y)\, v(y) - \rho\varepsilon \\
&= \max_a \left[ (x - a)^\beta + \rho \sum_y \gamma(a, y)\, v(y) \right] - \rho\varepsilon
\end{aligned}$$

by the choice of $a_0$. Since $v(x) \ge v_\sigma(x)$, the above inequality holds with $v(x)$ on the LHS. Since it holds for every $\varepsilon > 0$, $v(x) \ge \max_a \left[ (x - a)^\beta + \rho \sum_y \gamma(a, y)\, v(y) \right]$.


Optimal Policy

Theorem

$\sigma$ is optimal (i.e., $v_\sigma(x) = v(x)$, $\forall x \in S$) if and only if

$$\sigma(x) \in \operatorname{argmax}_{a \in \Gamma(x)} \left\{ (x - a)^\beta + \rho \sum_{y \in S} v(y)\, \gamma(a, y) \right\}, \quad \forall x \in S \qquad (\dagger\dagger)$$

Proof.

Suppose $\sigma$ is optimal. So for all $x$, $v_\sigma(x) = v(x)$ (1).

Since $v$ satisfies Bellman's equation, we have, for all $x \in S$,

$$v(x) = \max_{a \in \Gamma(x)} \left\{ (x - a)^\beta + \rho \sum_{y \in S} v(y)\, \gamma(a, y) \right\} \quad (2).$$


Optimal Policy 2

Proof contd.

Also, by definition, we have

$$v_\sigma(x) = (x - \sigma(x))^\beta + \rho \sum_{y \in S} v_\sigma(y)\, \gamma(\sigma(x), y) \quad (3).$$

Substituting Eq. (1) into Eq. (3) yields

$$v(x) = (x - \sigma(x))^\beta + \rho \sum_{y \in S} v(y)\, \gamma(\sigma(x), y) \quad (4).$$

Comparing (2) and (4), we get Eq. $(\dagger\dagger)$.


Optimal Policy contd.

Proof.

Contd. Conversely, suppose $\sigma$ satisfies $(\dagger\dagger)$. Then, for $X_0 = x$, we can attain $v(x)$ by playing $\sigma(x)$ for one period and getting continuation payoffs according to $v$. Extending this, we can get the continuation payoffs $v(y)$ in the second stage by playing $\sigma(y)$ then, and getting continuation payoffs according to $v$ from period $t = 2$ onwards. So $v(x)$ equals the payoff from playing according to $\sigma$ for 2 periods and then getting continuation payoffs $\rho^2 \sum_y \gamma(\cdot, y) v(y)$ from the 3rd period on. Extending this to $N$ periods, the payoff $v_N^\sigma(x)$ from the strategy that follows $\sigma$ for $N$ periods, and subsequently gets continuation payoffs according to $v$, equals $v(x)$. The difference between this payoff and the payoff from following $\sigma$ forever is therefore $\rho^N$ times some bounded number. This difference goes to zero as $N \to \infty$. So $v(x) = v_\sigma(x)$.


Fixed Point Iteration

Note: We've seen a more general one-deviation property in game theory for subgame perfect equilibria. The logic is the same.

How to solve for $v, \sigma$? Choose any $v : S \to \mathbb{R}$, and define the map $Tv : S \to \mathbb{R}$ by

$$Tv(x) = \max_{a \in \Gamma(x)} \left\{ (x - a)^\beta + \rho \sum_{y \in S} v(y)\, \gamma(a, y) \right\}, \quad x \in S$$

It will be shown that $T$ is a contraction with modulus $\rho$ on $V = \{v : S \to \mathbb{R}\}$ with metric $d$. The value function $v$ is the unique fixed point of $T$ (by Banach's theorem). Since $(T^n(v))_n$ is a Cauchy sequence (for any $v \in V$), we start with any $v$ and successively apply $T$ until $d(T^n(v), T^{n-1}(v))$ is small.


Value Iteration Algorithm

Start with any $v : S \to \mathbb{R}$. Set a Tolerance and a Difference. While Difference > Tolerance: compute $Tv$; set Difference $= d(Tv, v)$; set $v = Tv$. Having arrived at a $v$ that is $\varepsilon$-close to the value function, solve for a $v$-greedy policy $\sigma$ (i.e. $\sigma$ is optimal w.r.t. $v$; greedy = locally optimal). That is,

$$\sigma(x) \in \operatorname{argmax}_a \left[ (x - a)^\beta + \rho \sum_y v(y)\, \gamma(a, y) \right], \quad \forall x \in S.$$

Note that we appeal to a continuity argument (the computed $v$ is close to the true value function, so a maximizer w.r.t. the computed $v$ is close to the optimal policy $\sigma$).

For practical purposes, don't work with functions $v$ and recursions on them; code $v$ as a vector.
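A minimal NumPy sketch of the loop, coding $v$ as a vector. The primitives $B$, $M$, $\beta$, $\rho$ and the shock distribution $\phi$ below are illustrative placeholders, not values from the lecture; the helpers `gamma` and `bellman_operator` are reused in the sketches that follow.

```python
import numpy as np

B, M = 10, 5                         # illustrative catch bound and storage capacity
beta, rho = 0.5, 0.9                 # illustrative utility curvature and discount factor
phi = np.ones(B + 1) / (B + 1)       # illustrative shock distribution on {0,...,B}
S = np.arange(B + M + 1)             # state space {0,...,B+M}

def gamma(a):
    """Distribution of a + W over S, i.e. gamma(a, dy)."""
    dist = np.zeros(len(S))
    dist[a:a + B + 1] = phi
    return dist

def bellman_operator(v):
    """Return (Tv, a v-greedy policy)."""
    Tv = np.empty_like(v, dtype=float)
    sigma = np.empty(len(S), dtype=int)
    for x in S:
        actions = np.arange(min(x, M) + 1)                         # Gamma(x)
        vals = [(x - a)**beta + rho * gamma(a) @ v for a in actions]
        Tv[x] = np.max(vals)
        sigma[x] = actions[np.argmax(vals)]
    return Tv, sigma

v = np.zeros(len(S))
tol, diff = 1e-6, 1.0
while diff > tol:
    Tv, sigma = bellman_operator(v)
    diff = np.max(np.abs(Tv - v))    # sup metric d(Tv, v)
    v = Tv
# v now approximates the value function; sigma is a v-greedy policy
```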


Howard’s Improvement Algo

- Start with some $\sigma \in \Sigma$.
- Compute $v_\sigma$.
- Compute a $v_\sigma$-greedy policy $\sigma'$. That is, for every $x \in S$, $\sigma'(x)$ maximizes, over all $a \in \Gamma(x)$, $\{ U(x - a) + \rho \sum_{z \in Z} v_\sigma(a + z)\, \phi(z) \}$.
- Evaluate the difference $\|\sigma - \sigma'\|$.
- Set $\sigma = \sigma'$.
- Repeat until successive policies have difference zero.

Note how values are mapped and remapped from round to round; a sketch of the loop is given below.
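A rough sketch of this loop, reusing `S`, `M`, `beta`, `rho`, `gamma` and `bellman_operator` from the value-iteration sketch above (all illustrative, not the lecture's own code). Here $v_\sigma$ is evaluated exactly by solving the linear system $(I - \rho M_\sigma)\, v_\sigma = r_\sigma$; the next slide instead evaluates it by truncating the series.

```python
def evaluate_policy_exact(sigma):
    """Solve (I - rho * M_sigma) v = r_sigma for v_sigma."""
    M_sig = np.array([gamma(sigma[x]) for x in S])                 # rows p_sigma(x, .)
    r_sig = np.array([(x - sigma[x])**beta for x in S], dtype=float)
    return np.linalg.solve(np.eye(len(S)) - rho * M_sig, r_sig)

sigma = np.zeros(len(S), dtype=int)          # start with some sigma in Sigma
while True:
    v_sig = evaluate_policy_exact(sigma)     # compute v_sigma
    _, sigma_new = bellman_operator(v_sig)   # a v_sigma-greedy policy sigma'
    if np.array_equal(sigma_new, sigma):     # successive policies coincide: stop
        break
    sigma = sigma_new                        # set sigma = sigma' and repeat
```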


Howard’s Improvement 2

- To evaluate $v_\sigma$ when starting the optimization problem from a particular initial state $x$, we may evaluate $\sum_{t=0}^{T} \rho^t M_\sigma^t r_\sigma(x)$ for large $T$.
- The code writes this as a function that takes an array `sigma` (one action for each state) and returns an array `v_sigma` (one value $v_\sigma(x)$ for every initial state $x$).
- The code defines a stochastic kernel $p_\sigma(x, y)$, then uses it to define a function `M_sigma` corresponding to the linear transformation $M_\sigma$.
- It creates the array `r_sigma`, then steps through 50 terms of $\rho^t M_\sigma^t r_\sigma$, adding each one to the return value.
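One way the evaluation step just described might look: a sketch under the same illustrative primitives (`S`, `beta`, `rho`, `gamma`) as above, not the lecture's own code.

```python
def evaluate_policy(sigma, n_terms=50):
    """Array of values v_sigma(x), one per initial state, via a truncated series."""
    p_sigma = np.array([gamma(sigma[x]) for x in S])      # stochastic kernel p_sigma(x, y)
    M_sigma = lambda h: p_sigma @ h                       # the linear transformation M_sigma
    r_sigma = np.array([(x - sigma[x])**beta for x in S], dtype=float)
    v_sigma = np.zeros(len(S))
    term = r_sigma.copy()                                 # M_sigma^0 r_sigma
    for t in range(n_terms):
        v_sigma += rho**t * term                          # add rho^t M_sigma^t r_sigma
        term = M_sigma(term)                              # advance to the next term
    return v_sigma
```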


Stochastic Recursive Sequences

Stochastic Recursive Sequence

$$X_{t+1} = F(X_t, W_{t+1}), \qquad X_0 \sim \psi$$

$(W_t)_{t \ge 0}$ is a sequence of independent r.v.s (shocks). Random variables are functions defined on an outcome space.

One way to model the SRS above: chance moves once, at the beginning, selecting an outcome $\omega \in \Omega$. This determines values for all the shocks, $(W_t(\omega))_{t \ge 0}$, as well as a realization $X_0(\omega)$ of $X_0$ (with $P\{\omega : X_0(\omega) = x_i\} = \psi(x_i)$).

Thus the entire time path (time series) $(X_t(\omega))$ is determined recursively as $X_1(\omega) = F(X_0(\omega), W_1(\omega))$, $X_2(\omega) = F(X_1(\omega), W_2(\omega))$, ...

From the SRS, we get a stochastic kernel $p$ on $S$:

$$p(x, y) = P\{F(x, W_t) = y\} \equiv P\{\omega \in \Omega : F(x, W_t(\omega)) = y\}$$
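As a small illustration of this last formula (a sketch under stated assumptions: states are identified with indices $0, \ldots, n-1$, and the shock takes finitely many values), the kernel can be tabulated directly from $F$ and the shock distribution:

```python
import numpy as np

def kernel_from_srs(F, n, w_vals, phi):
    """Kernel p(x, y) = P{F(x, W) = y} for shock values w_vals with probabilities phi."""
    p = np.zeros((n, n))
    for x in range(n):
        for w, prob in zip(w_vals, phi):
            p[x, F(x, w)] += prob      # mass of shocks that send x to F(x, w)
    return p

# e.g. Kurtz's chain under a fixed policy sigma: F = lambda x, w: sigma[x] + w
```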


MC as SRS

Conversely, here is how we can represent a Markov-$(p, \psi)$ chain as an SRS. We want $X_0 \sim \psi$, $X_1 \sim p(X_0, dy)$, and so forth.

We do this in two steps. First, we define a function $\tau : (0, 1] \to S$, parameterized by a distribution $\phi$ on $S$, such that if $W$ is uniformly distributed on $(0, 1]$, then $\tau(W; \phi)$ has distribution $\phi$ on $S$.

Once we have this, we use the $W$'s to model shocks. Suppose $(W_t)$ is a sequence of independent r.v.s, each uniformly distributed on $(0, 1]$. Then

$$X_0 = \tau(W_0; \psi), \qquad X_{t+1} = F(X_t, W_{t+1}), \quad \text{where } F(X_t, W_{t+1}) = \tau(W_{t+1}; p(X_t, dy)).$$


Defining τ(W; φ)

Let $S = \{x_1, \ldots, x_N\}$. Partition $(0, 1]$ into subintervals of lengths $\phi(x_i)$, $i = 1, \ldots, N$.

We could do this by defining, for $i = 1, \ldots, N$,

$$I(x_i; \phi) = \left( \sum_{s=1}^{i-1} \phi(x_s), \; \sum_{s=1}^{i} \phi(x_s) \right].$$

Then define $z \mapsto \tau(z; \phi)$ by

$$\tau(z; \phi) = \sum_{x \in S} x \, \mathbf{1}\{z \in I(x; \phi)\}$$

Note that $\tau$ is a simple function (a linear combination of indicator functions). Note also that we use the function $\tau(\cdot\,; \phi)$, whose value at a uniform draw has distribution $\phi$, rather than simply the distribution $\phi$ itself, in order to get an SRS where $X_{t+1}$ has the desired distribution.
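A minimal NumPy sketch of $\tau$ and of the induced SRS (illustrative: states are identified with indices $0, \ldots, N-1$, and `p` is the kernel stored as a matrix):

```python
import numpy as np

def tau(z, phi):
    """Inverse-transform map: the index i with z in I(x_i; phi)."""
    cum = np.cumsum(phi)                     # right endpoints of the intervals I(x_i; phi)
    return int(np.searchsorted(cum, z))      # first i with cum[i] >= z

def simulate_chain(p, psi, T, rng=np.random.default_rng()):
    """Simulate X_0, ..., X_T with X_0 ~ psi and X_{t+1} ~ p(X_t, dy)."""
    W = rng.uniform(size=T + 1)              # iid uniform shocks (0 occurs with prob. 0)
    X = np.empty(T + 1, dtype=int)
    X[0] = tau(W[0], psi)                    # X_0 ~ psi
    for t in range(T):
        X[t + 1] = tau(W[t + 1], p[X[t]])    # X_{t+1} ~ p(X_t, dy)
    return X
```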


Path Probabilities

Let $(X_t)_{t \ge 0}$ be a Markov process with $X_0 \sim \psi$.

Then $(X_0, X_1)$ is a random vector distributed on $S^2 = S \times S$, and $q_2(x_0, x_1) = P\{X_0 = x_0, X_1 = x_1\} = P(\{X_0 = x_0\} \cap \{X_1 = x_1\})$. So,

$$q_2(x_0, x_1) = P\{X_0 = x_0\}\, P\{X_1 = x_1 \mid X_0 = x_0\} = \psi(x_0)\, p(x_0, x_1).$$

Similarly,

$$q_3(x_0, x_1, x_2) = P\{X_0 = x_0, X_1 = x_1\}\, P\{X_2 = x_2 \mid X_0 = x_0, X_1 = x_1\} = \psi(x_0)\, p(x_0, x_1)\, p(x_1, x_2).$$

The last equality follows from the Markov assumption.

More generally, $q_{T+1}(x_0, \ldots, x_T) = \psi(x_0) \prod_{t=0}^{T-1} p(x_t, x_{t+1})$.
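A one-function sketch of this product formula (illustrative names: `p` is the kernel as a NumPy matrix, `psi` the initial distribution as an array, `path` a sequence of states):

```python
def path_probability(path, psi, p):
    """q_{T+1}(x_0, ..., x_T) = psi(x_0) * prod_t p(x_t, x_{t+1})."""
    q = psi[path[0]]
    for x, y in zip(path[:-1], path[1:]):
        q *= p[x, y]                       # multiply by p(x_t, x_{t+1})
    return q
```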


Joint Distributions

Lemma

Let $D_t \subset S$, for $t = 0, 1, \ldots, T$. Then

$$P\left( \bigcap_{t \le T} \{X_t \in D_t\} \right) = \sum_{x_0 \in D_0} \psi(x_0) \sum_{x_1 \in D_1} p(x_0, x_1) \cdots \sum_{x_T \in D_T} p(x_{T-1}, x_T)$$

Proof.

Sketch. The LHS equals the sum of the probabilities of all paths $\mathbf{x} = (x_0, \ldots, x_T) \in \times_{t=0}^{T} D_t$. That is, it equals $\sum_{\mathbf{x} \in \times_t D_t} q_{T+1}(\mathbf{x})$. By the path probability formula on the previous slide, $q_{T+1}(\mathbf{x})$ is a product. Now begin summing over these products by holding $(x_0, \ldots, x_{T-1})$ fixed and summing over all $x_T \in D_T$; then recursively work backward.


Coupling

Coupling: a probabilistic technique to study stochastic stability.

Let $p$ be a stochastic kernel with Markov matrix $M$. Global stability of $(P(S), M)$ is characterized by $\alpha(p^t) > 0$ for some $t$. For simplicity, what follows deals with the $t = 1$ case. NOTE: $\alpha(p) > 0$ is equivalent to $\varepsilon > 0$, where

$$\varepsilon \equiv \min \left\{ \sum_{y \in S} p(x, y)\, p(x', y) \; : \; x, x' \in S \right\}$$

The idea behind $\varepsilon > 0$ is also that of positive overlap between all pairs of distributions: if we start two independent chains from different states $x, x'$, there is positive probability that they meet next period.

We give a different argument for why $\varepsilon > 0$ implies global stability.

Coupling 2

Let $(X_t)$ and $(X_t^*)$ be independent Markov chains generated by $p$, where $X_0 \sim \psi$ and $X_0^* \sim \psi^*$, with $\psi^*$ any stationary distribution of $M$. Consider a Markov process $(X_t')$ which follows $(X_t)$ until the random time $\nu \equiv \min\{t \ge 1 : X_t = X_t^*\}$, and then switches to following $(X_t^*)$. $\nu$ is known as the coupling time.

Claim: The distributions of $X_t$ and $X_t'$ are the same, for all $t$ (so $\psi_t' = \psi M^t$, $\forall t$).

We need to show that $(X_t')_{t \ge 0}$ is Markov-$(p, \psi)$, just as $(X_t)_{t \ge 0}$ is. That is, $X_0' \sim \psi$ (which is true) and $X_{t+1}' \sim p(X_t', dy)$, $t = 0, 1, 2, \ldots$.

For $t < \nu$, this is true since $X_s' = X_s$, $s = t, t+1$, can be used on either side of the $\sim$ relation. For $t \ge \nu$, using the fact that $X_{t+1}^* \sim p(X_t^*, dy)$, and $X_s' = X_s^*$, $s = t, t+1$, on either side of $\sim$ completes the argument.
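A small simulation sketch of the coupled process (illustrative; it reuses `simulate_chain` and NumPy from the earlier SRS sketch, and assumes a stationary distribution `psi_star` is supplied by the user):

```python
def coupled_path(p, psi, psi_star, T, rng=np.random.default_rng()):
    """Simulate X'_t: follow (X_t) until the coupling time nu, then follow (X*_t)."""
    X = simulate_chain(p, psi, T, rng)              # X_t, with X_0 ~ psi
    X_star = simulate_chain(p, psi_star, T, rng)    # X*_t, with X*_0 ~ psi_star
    X_prime = X.copy()
    meet = np.where(X[1:] == X_star[1:])[0]
    if meet.size > 0:
        nu = meet[0] + 1                            # coupling time nu
        X_prime[nu:] = X_star[nu:]                  # switch to (X*_t) from nu on
    return X_prime
```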


Coupling 3

We want to show that if $\varepsilon > 0$, then $\psi M^t$, the marginal distribution of (the arbitrary) $X_t$, converges to $\psi^*$ (this will also show that $\psi^*$ is the unique stationary distribution, and establish global stability). First, a lemma.

Lemma

Suppose $X$ and $Y$ are r.v.s taking values in $S$, with distributions $\phi_X, \phi_Y$. Then

$$\|\phi_X - \phi_Y\| \equiv \max_{x \in S} |\phi_X(x) - \phi_Y(x)| \le P\{X \ne Y\}$$

The proof is given later. The lemma says that if $X$ and $Y$ are close, in the sense that they are equal with high probability, then so are their distributions.

Applied to the r.v.s $X_t'$ and $X_t^*$, we have

$$\|\psi M^t - \psi^*\| \le P\{X_t' \ne X_t^*\}$$


Coupling 4

To show that the LHS goes to 0 as $t \to \infty$, we show that the RHS does. Here is where $X_t'$ is used:

$$P\{X_t' \ne X_t^*\} = P\left( \bigcap_{j \le t} \{X_j \ne X_j^*\} \right)$$

for $X_t' \ne X_t^*$ exactly when the two chains have not met by time $t$: if $X_j = X_j^*$ for some $j \le t$, then $X'$ and $X^*$ would have coupled, and be equal, by time $t$.

We show that the term on the RHS goes to 0 as $t \to \infty$.

Theorem

$$P\left( \bigcap_{j \le t} \{X_j \ne X_j^*\} \right) \le (1 - \varepsilon)^t, \quad \forall t \in \mathbb{N}$$

where $\varepsilon \equiv \min \left\{ \sum_{y \in S} p(x, y)\, p(x', y) : x, x' \in S \right\}$.


Coupling 5

Proof.

$(X_t, X_t^*)$ is a Markov chain on $S \times S$. By independence, the initial condition is

$$P\{(X_0, X_0^*) = \mathbf{x}_0 = (x, s)\} = \psi(x)\, \psi^*(s)$$

(call this $\psi \times \psi^*(\mathbf{x}_0)$); and the kernel is

$$k(\mathbf{x}, \mathbf{x}') = k((x, s), (x', s')) = p(x, x')\, p(s, s').$$

Letting $D = \{(x, s) \in S \times S : x = s\}$,

$$P\left( \bigcap_{j \le t} \{X_j \ne X_j^*\} \right) = P\left( \bigcap_{j \le t} \{(X_j, X_j^*) \in D^c\} \right).$$

This equals

$$\sum_{\mathbf{x}_0 \in D^c} \psi \times \psi^*(\mathbf{x}_0) \sum_{\mathbf{x}_1 \in D^c} k(\mathbf{x}_0, \mathbf{x}_1) \cdots \sum_{\mathbf{x}_t \in D^c} k(\mathbf{x}_{t-1}, \mathbf{x}_t) \qquad (\ast)$$

Now $\sum_{\mathbf{x}_t \in D^c} k(\mathbf{x}_{t-1}, \mathbf{x}_t) = 1 - \sum_{\mathbf{x}_t \in D} k(\mathbf{x}_{t-1}, \mathbf{x}_t)$, and

$$\sum_{\mathbf{x}_t \in D} k(\mathbf{x}_{t-1}, \mathbf{x}_t) = \sum_{(x_t, s_t) \in D} p(x_{t-1}, x_t)\, p(s_{t-1}, s_t),$$

which, by definition of $D$, equals $\sum_{y \in S} p(x_{t-1}, y)\, p(s_{t-1}, y)$, which is $\ge \varepsilon$. So $\sum_{\mathbf{x}_t \in D^c} k(\mathbf{x}_{t-1}, \mathbf{x}_t) \le 1 - \varepsilon$. The same argument applies to each of the $t$ inner sums in $(\ast)$, and the first sum is at most 1, so the whole expression is at most $(1 - \varepsilon)^t$.


Proof of the Lemma

Proof

Take any event $B$.

$$P\{X \in B\} = P(\{X \in B\} \cap \{X = Y\}) + P(\{X \in B\} \cap \{X \ne Y\})$$

$$P\{Y \in B\} = P(\{Y \in B\} \cap \{X = Y\}) + P(\{Y \in B\} \cap \{X \ne Y\})$$

The first terms on the RHS of both equations are equal. So

$$P\{X \in B\} - P\{Y \in B\} = P(\{X \in B\} \cap \{X \ne Y\}) - P(\{Y \in B\} \cap \{X \ne Y\})$$

So $P\{X \in B\} - P\{Y \in B\} \le P(\{X \in B\} \cap \{X \ne Y\}) \le P\{X \ne Y\}$.

We could switch $X$ and $Y$ above as well. So

$$|P\{X \in B\} - P\{Y \in B\}| \le P\{X \ne Y\}.$$
