Lecture 5: Topics in Finite State Markov Chains
A. Banerji
Department of Economics
February 26, 2015
Introduction
THE PROBLEM
Col. Kurtz, infinitely lived on fish, has a stock $X_t$ of fish at noon on date $t$. He stores $R_t \le M$ and eats the rest, $C_t = X_t - R_t$ (he does not throw anything away, because his utility is increasing in consumption). $M$ is his storage capacity.
Next morning he catches $W_{t+1}$ fish, where $W_{t+1} \stackrel{iid}{\sim} \phi$ on $\{0, 1, \ldots, B\}$. So the stock of fish tomorrow noon is $X_{t+1} = R_t + W_{t+1}$.
Kurtz's period utility function is $U(c) = c^\beta$, $0 < \beta < 1$. He uses a policy function $\sigma$ mapping the current stock $X_t$ into saving $R_t$ (i.e. a stationary policy; this induces a Markovian decision process). Since $X_t \le B + M$, let $S = \{0, 1, \ldots, B + M\}$. So $\sigma : S \to \{0, \ldots, M\}$, with $\sigma(x) \le x$ for all $x \in S$. Let $\Sigma$ be the set of all such maps. Then the problem is
$$\max_{\sigma \in \Sigma} \; E\left[\sum_{t=0}^{\infty} \rho^t \big(X_t - \sigma(X_t)\big)^\beta\right] \tag{F}$$
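A small simulation may help fix ideas. The following sketch is illustrative only: the parameter values ($B$, $M$, $\beta$, $\rho$), the uniform catch distribution, and the example policy are assumptions, not part of the lecture.

```python
import numpy as np

# Illustrative parameters -- assumptions, not from the lecture
B, M = 10, 5           # maximum catch, storage capacity
beta, rho = 0.5, 0.9   # utility exponent, discount factor
phi = np.ones(B + 1) / (B + 1)   # catch W ~ uniform on {0,...,B}
S = np.arange(B + M + 1)         # state space S = {0,...,B+M}

# An example policy: store as much as possible, sigma(x) = min(x, M)
sigma = np.minimum(S, M)

def simulate_payoff(x0, T, seed=0):
    """Simulate X_{t+1} = sigma(X_t) + W_{t+1} and the realized discounted payoff."""
    rng = np.random.default_rng(seed)
    x, payoff = x0, 0.0
    for t in range(T):
        c = x - sigma[x]               # consumption C_t = X_t - sigma(X_t)
        payoff += rho**t * c**beta     # discounted period utility
        w = rng.choice(B + 1, p=phi)   # tomorrow's catch W_{t+1} ~ phi
        x = sigma[x] + w               # tomorrow noon's stock
    return payoff

print(simulate_payoff(x0=5, T=200))    # one realized discounted payoff, truncated at T
```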
Policy Function Induced Chain
The objective depends on the initial stock. If Kurtz starts with stock $x$, a given $\sigma \in \Sigma$ induces a stochastic recursive sequence
$$X_{t+1} = \sigma(X_t) + W_{t+1}, \quad (W_t) \stackrel{iid}{\sim} \phi, \quad X_0 = x \qquad (\ddagger)$$
For each level $a \in \{0, \ldots, M\}$ of stock left over after consumption, let $\gamma(a, dy)$ be the distribution of $a + W$. So $X_{t+1} \sim \gamma(\sigma(X_t), dy)$, $t \ge 0$.
Let $M_\sigma \equiv (p_\sigma(x, y))$ be the Markov matrix corresponding to policy $\sigma$, so that for all states $x, y$,
$$p_\sigma(x, y) \equiv \gamma(\sigma(x), y).$$
For every state $x$, let $\Gamma(x) = \{0, 1, \ldots, \min\{x, M\}\}$ denote the set of feasible actions for that state.
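As a concrete sketch, the kernel $p_\sigma(x, y) = \gamma(\sigma(x), y)$ can be assembled into the matrix $M_\sigma$ row by row. The parameters and the example policy below are the same illustrative assumptions as before.

```python
import numpy as np

B, M = 10, 5                          # assumptions
phi = np.ones(B + 1) / (B + 1)        # W ~ uniform on {0,...,B} (assumption)
n = B + M + 1                         # |S|, with S = {0,...,B+M}
sigma = np.minimum(np.arange(n), M)   # example policy (assumption)

def gamma(a):
    """gamma(a, .): the distribution of a + W on S, i.e. gamma(a, y) = phi(y - a)."""
    g = np.zeros(n)
    g[a:a + B + 1] = phi
    return g

# Markov matrix M_sigma: row x is p_sigma(x, .) = gamma(sigma(x), .)
M_sigma = np.array([gamma(sigma[x]) for x in range(n)])
assert np.allclose(M_sigma.sum(axis=1), 1.0)   # each row is a distribution on S
```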
Kurtz’s Objective Function
Regarding each $W_t$ as a function on a common outcome space $\Omega$: if chance selects $\omega \in \Omega$, the entire sequence $(W_t(\omega))$ of shocks is determined, and via ($\ddagger$), so is the entire path or time series $(X_t(\omega))$. The payoff from this path is
$$Y_\sigma(\omega) = \sum_{t=0}^{\infty} \rho^t \big(X_t(\omega) - \sigma(X_t(\omega))\big)^\beta$$
Kurtz's objective function $F$ is the expectation of $Y_\sigma$ w.r.t. the probability measure over $\Omega$. This is complicated, since $\Omega$ is infinite and complicated. For now, note that if we truncate time at $T$,
$$E\left[\sum_{t=0}^{T} \rho^t (X_t - \sigma(X_t))^\beta\right] = \sum_{\mathbf{x} \in S^{T+1}} F(\mathbf{x})\, q_{T+1}(\mathbf{x})$$
where the payoffs $F(\mathbf{x}) = \sum_{t=0}^{T} \rho^t (x_t - \sigma(x_t))^\beta$ for the paths $\mathbf{x} \equiv (x_0, \ldots, x_T)$ are weighted by the path probabilities $q_{T+1}(\mathbf{x})$.
Computing Objective Function
Lemma
$$E\left[\sum_{t=0}^{T} \rho^t (X_t - \sigma(X_t))^\beta\right] = \sum_{t=0}^{T} \rho^t \big(M_\sigma^t r_\sigma\big)(x)$$
where $r_\sigma = \big((y - \sigma(y))^\beta\big)_{y \in S}$ is a vector and $\big(M_\sigma^t r_\sigma\big)(x)$ is the $x$th element of the vector $M_\sigma^t r_\sigma$. Also write $r_\sigma[y] = (y - \sigma(y))^\beta$.
Proof.
Note that $x$ above is the initial stock/state. By linearity,
$$E\left[\sum_{t=0}^{T} \rho^t (X_t - \sigma(X_t))^\beta\right] = \sum_{t=0}^{T} \rho^t\, E(X_t - \sigma(X_t))^\beta.$$
Now $E(X_t - \sigma(X_t))^\beta = E\, r_\sigma[X_t] = \sum_{y \in S} p_\sigma^t(x, y)\, r_\sigma[y]$, where $p_\sigma^t(x, y)$ is the probability that the state is $y$, $t$ periods after starting from the initial state $x$. This is the $(x, y)$th element of $M_\sigma^t$, so the last expression is just $\big(M_\sigma^t r_\sigma\big)(x)$.
Bellman’s Equation
NOTE 1. The lemma says that rather than taking an expectation over all paths, we can take, for every period $t$, the expected utility of consumption if $X_0 = x$ and the kernel is $p_\sigma$, and then take the discounted sum of these over all $t \in \{0, 1, \ldots, T\}$.
2. We show later that the limit of the expression in the lemma, that is $\sum_{t=0}^{\infty} \rho^t \big(M_\sigma^t r_\sigma\big)(x)$, is Kurtz's expected payoff from policy $\sigma$ if $X_0 = x$. Call this $v_\sigma(x)$. Define the value function $v^*(x) = \sup_{\sigma \in \Sigma} v_\sigma(x)$.
(The sup is attained, since $\Sigma$ is finite.)
Theorem
$v^*$ satisfies Bellman's Equation. Let $\Gamma(x) \equiv \{0, 1, \ldots, \min\{x, M\}\}$.
Bellman's Equation
$$v^*(x) = \max_{a \in \Gamma(x)} \Big\{ (x - a)^\beta + \rho \sum_{y \in S} v^*(y)\, \gamma(a, y) \Big\}, \quad x \in S \qquad (\dagger)$$
For each $x \in S$, the optimal action trades off current reward against discounted expected future reward.
Interpretation
Bellman's equation reduces the infinite-horizon problem to a two-period problem, for every initial stock $x$ (provided we already know the value function $v^*$!).
How will Kurtz maximize his objective function if the initial stock is $x$? Suppose he can do so starting tomorrow, for every stock $y$ that he can have at noon tomorrow. Then his payoff tomorrow will be $v^*(y)$ if that stock is $y$. If he chooses to store $a$ today, his expected payoff from tomorrow, discounted one period, is $\rho \sum_{y \in S} v^*(y)\, \gamma(a, y)$.
This is his continuation payoff. His current reward from choosing $a$ is $(x - a)^\beta$, and his total payoff is the sum of these two. Since $v^*(x)$ is the max, $a$ must be chosen optimally to attain it. A lower $a$ implies a higher current payoff, but a stochastically lower distribution $\gamma(a, \cdot)$ of tomorrow's stock. This tradeoff is best resolved at the optimal $a$, which depends on the initial $x$.
Bellman’s Equation - Proof
Proof.
Let $v_\sigma$ be the payoff from a possibly mixed (behavior) policy $\sigma$. For all $x \in S$,
$$v_\sigma(x) = \sum_{a \in \Gamma(x)} \sigma(x)(a)\Big\{ (x - a)^\beta + \rho \sum_{y \in S} \gamma(a, y)\, v_\sigma(y) \Big\}$$
$$\le \sum_{a \in \Gamma(x)} \sigma(x)(a)\Big\{ (x - a)^\beta + \rho \sum_{y \in S} \gamma(a, y)\, v^*(y) \Big\}$$
$$\le \sum_{a \in \Gamma(x)} \sigma(x)(a)\, \max_{a' \in \Gamma(x)} \Big[ (x - a')^\beta + \rho \sum_{y \in S} \gamma(a', y)\, v^*(y) \Big]$$
$$= \max_{a \in \Gamma(x)} \Big[ (x - a)^\beta + \rho \sum_{y \in S} \gamma(a, y)\, v^*(y) \Big] \sum_{a \in \Gamma(x)} \sigma(x)(a)$$
$$= \max_{a \in \Gamma(x)} \Big[ (x - a)^\beta + \rho \sum_{y \in S} \gamma(a, y)\, v^*(y) \Big]$$
(Here $\sigma(x)(a)$ is the probability of choosing $a$ in state $x$.)
Since this inequality holds for all $\sigma$ and $v_\sigma$, it holds for the sup $v^*$.
For pure policy functions $\sigma$, $v_\sigma(x) \le$ RHS of Bellman's equation is transparent, and so $\le$ must hold for the sup, $v^*(x)$.
Proof continued
Proof (continued).
Conversely, fix a state $x \in S$ and an $\epsilon > 0$. Let $\sigma$ be such that at time $t = 0$, $\sigma(x) = a_0$, where $a_0$ maximizes $\big[(x - a)^\beta + \rho \sum_{y \in S} \gamma(a, y)\, v^*(y)\big]$, and subsequently, for every state $y \in S$, $\sigma$ specifies a policy (function) $\sigma_y$ whose value satisfies $v_{\sigma_y}(y) \ge v^*(y) - \epsilon$. So,
$$v_\sigma(x) = (x - a_0)^\beta + \rho \sum_{y \in S} \gamma(a_0, y)\, v_{\sigma_y}(y)$$
$$\ge (x - a_0)^\beta + \rho \sum_{y \in S} \gamma(a_0, y)\, v^*(y) - \rho\epsilon$$
$$= \max_{a \in \Gamma(x)} \Big[ (x - a)^\beta + \rho \sum_{y \in S} \gamma(a, y)\, v^*(y) \Big] - \rho\epsilon$$
by the choice of $a_0$. Since $v^*(x) \ge v_\sigma(x)$, the above inequality holds with $v^*(x)$ on the LHS. Since it holds for every $\epsilon > 0$, $v^*(x) \ge \max_{a \in \Gamma(x)} \big[(x - a)^\beta + \rho \sum_{y \in S} \gamma(a, y)\, v^*(y)\big]$.
Optimal Policy
Theorem
$\sigma^*$ is optimal (i.e., $v_{\sigma^*}(x) = v^*(x)$ for all $x \in S$) if and only if
$$\sigma^*(x) = \arg\max_{a \in \Gamma(x)} \Big\{ (x - a)^\beta + \rho \sum_{y \in S} v^*(y)\, \gamma(a, y) \Big\}, \quad \forall x \in S \qquad (\dagger\dagger)$$
Proof.
Suppose $\sigma^*$ is optimal.
So for all $x$, $v_{\sigma^*}(x) = v^*(x)$. (1)
Since $v^*$ satisfies Bellman's equation, we have, for all $x \in S$,
$$v^*(x) = \max_{a \in \Gamma(x)} \Big\{ (x - a)^\beta + \rho \sum_{y \in S} v^*(y)\, \gamma(a, y) \Big\}. \qquad (2)$$
Optimal Policy 2
Proof (continued).
Also, by definition, we have
$$v_{\sigma^*}(x) = (x - \sigma^*(x))^\beta + \rho \sum_{y \in S} v_{\sigma^*}(y)\, \gamma(\sigma^*(x), y). \qquad (3)$$
Substituting (1) into (3) yields
$$v^*(x) = (x - \sigma^*(x))^\beta + \rho \sum_{y \in S} v^*(y)\, \gamma(\sigma^*(x), y). \qquad (4)$$
Comparing (2) and (4), we get ($\dagger\dagger$).
Optimal Policy contd.
Proof (continued).
Conversely, suppose $\sigma^*$ satisfies ($\dagger\dagger$). Then for $X_0 = x$, we can attain $v^*(x)$ by playing $\sigma^*(x)$ for one period and getting continuation payoffs according to $v^*$. Extending this, we can get the continuation payoffs $v^*(y)$ in the second stage by playing $\sigma^*(y)$ then, and getting continuation payoffs according to $v^*$ from period $t = 2$ onwards. So $v^*(x)$ equals the payoff from playing according to $\sigma^*$ for 2 periods, then getting continuation payoffs according to $v^*$ (discounted by $\rho^2$) from the 3rd period on. Extending this to $N$ periods: the payoff $v_{\sigma^*_N}(x)$ from the strategy that follows $\sigma^*$ for $N$ periods and subsequently gets continuation payoffs according to $v^*$ equals $v^*(x)$. The difference between this payoff and the payoff from following $\sigma^*$ forever is therefore $\rho^N$ times some bounded number. This difference goes to zero as $N \to \infty$. So $v^*(x) = v_{\sigma^*}(x)$.
Fixed Point Iteration
Note: We’ve seen a more general one-deviation property in game theory for subgame perfect equilibria. The logic is the same.
How to solve for $v^*, \sigma^*$? Choose any $v : S \to \mathbb{R}$, and define the map $Tv : S \to \mathbb{R}$ by
$$Tv(x) = \max_{a \in \Gamma(x)} \Big\{ (x - a)^\beta + \rho \sum_{y \in S} v(y)\, \gamma(a, y) \Big\}, \quad x \in S$$
It will be shown that $T$ is a contraction with modulus $\rho$ on $V = \{v : S \to \mathbb{R}\}$ with metric $d_\infty$. By Banach's theorem, $v^*$ is the unique fixed point of $T$. Since $(T^n(v))_n$ is a Cauchy sequence converging to $v^*$ (for any $v \in V$), we start with any $v$ and successively apply $T$ until $d_\infty(T^n(v), T^{n-1}(v))$ is small.
Value Iteration Algorithm
Start with any $v : S \to \mathbb{R}$. Set a Tolerance. Set a Difference.
While Difference > Tolerance: compute $Tv$; set Difference $= d_\infty(Tv, v)$; set $v = Tv$.
Having arrived at a $v$ that is $\epsilon$-close to $v^*$, solve for a $v$-greedy policy $\sigma$ (i.e. $\sigma$ is optimal w.r.t. $v$; greedy = locally optimal). That is,
$$\sigma(x) = \arg\max_{a \in \Gamma(x)} \Big[ (x - a)^\beta + \rho \sum_{y \in S} v(y)\, \gamma(a, y) \Big], \quad \forall x \in S.$$
Note that we appeal to a continuity argument (the computed $v$ is close to $v^*$, so the optimizer of $v$ is close to the optimizer $\sigma^*$).
For practical purposes, don't use functions $v$ and recursions on them; code $v$ as a vector.
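A minimal sketch of the algorithm in code, with $v$ coded as a vector as suggested; the parameter values, catch distribution, and tolerance are illustrative assumptions.

```python
import numpy as np

B, M = 10, 5                       # assumptions
beta, rho = 0.5, 0.9               # assumptions
phi = np.ones(B + 1) / (B + 1)     # catch distribution (assumption)
n = B + M + 1                      # |S|

def gamma(a):
    """Distribution of a + W on S = {0,...,B+M}."""
    g = np.zeros(n)
    g[a:a + B + 1] = phi
    return g

def T(v):
    """Bellman operator: (Tv)(x) = max_{a in Gamma(x)} (x-a)^beta + rho sum_y v(y) gamma(a,y)."""
    Tv = np.empty(n)
    for x in range(n):
        Tv[x] = max((x - a)**beta + rho * gamma(a) @ v
                    for a in range(min(x, M) + 1))
    return Tv

def value_iteration(tol=1e-8):
    v = np.zeros(n)                     # any starting v will do
    diff = tol + 1.0
    while diff > tol:                   # while Difference > Tolerance
        Tv = T(v)
        diff = np.max(np.abs(Tv - v))   # Difference = d_infinity(Tv, v)
        v = Tv
    return v

def greedy(v):
    """A v-greedy policy sigma."""
    return np.array([max(range(min(x, M) + 1),
                         key=lambda a: (x - a)**beta + rho * gamma(a) @ v)
                     for x in range(n)])

v_approx = value_iteration()        # approximately v*
sigma_approx = greedy(v_approx)     # approximately sigma*
```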
Howard’s Improvement Algo
- Start with some $\sigma \in \Sigma$.
- Compute $v_\sigma$.
- Compute a $v_\sigma$-greedy policy $\sigma'$. That is, for every $x \in S$, $\sigma'(x)$ maximizes, over all $a \in \Gamma(x)$, $\big\{U(x - a) + \rho \sum_{z \in Z} v_\sigma(a + z)\, \phi(z)\big\}$ (where $Z = \{0, \ldots, B\}$ is the support of the shock distribution $\phi$).
- Evaluate the difference $\|\sigma - \sigma'\|$.
- Set $\sigma = \sigma'$.
- Repeat until successive policies have difference zero (a code sketch follows below).
Note how values are mapped and remapped from round to round.
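A sketch of the loop. Here $v_\sigma$ is obtained exactly by solving the linear system $v_\sigma = r_\sigma + \rho M_\sigma v_\sigma$; the truncated-sum evaluation described on the next slide is an equally valid alternative. Parameters and the starting policy are illustrative assumptions.

```python
import numpy as np

B, M = 10, 5                       # assumptions
beta, rho = 0.5, 0.9               # assumptions
phi = np.ones(B + 1) / (B + 1)
n = B + M + 1

def gamma(a):
    g = np.zeros(n)
    g[a:a + B + 1] = phi           # distribution of a + W
    return g

def value_of(sigma):
    """v_sigma, here from the exact linear system v = r_sigma + rho * M_sigma v."""
    M_sig = np.array([gamma(sigma[x]) for x in range(n)])
    r_sig = np.array([(x - sigma[x])**beta for x in range(n)])
    return np.linalg.solve(np.eye(n) - rho * M_sig, r_sig)

sigma = np.zeros(n, dtype=int)                 # start with some policy in Sigma
while True:
    v_sig = value_of(sigma)                    # compute v_sigma
    sigma_new = np.array([max(range(min(x, M) + 1),   # v_sigma-greedy policy sigma'
                              key=lambda a: (x - a)**beta + rho * gamma(a) @ v_sig)
                          for x in range(n)])
    if np.array_equal(sigma_new, sigma):       # difference zero: stop
        break
    sigma = sigma_new                          # set sigma = sigma'
```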
Howard’s Improvement 2
- To evaluate $v_\sigma$ when starting the optimization problem from a particular initial state $x$, we may evaluate $\sum_{t=0}^{T} \rho^t \big(M_\sigma^t r_\sigma\big)(x)$ for large $T$.
- The code writes this as a function that takes an array 'sigma' (one action for each state) and returns an array 'v_sigma' (one value $v_\sigma(x)$ for every initial state $x$).
- The code defines a stochastic kernel $p_\sigma(x, y)$, then uses it to define a function 'M_sigma' corresponding to the linear transformation $M_\sigma$.
- It creates the array 'r_sigma', then steps through 50 terms of $\rho^t M_\sigma^t r_\sigma$, adding each one to the return value (see the sketch below).
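A sketch along these lines (the array names follow the slide; the parameter values are illustrative assumptions, and $M_\sigma$ is represented directly as a matrix rather than as a separate function):

```python
import numpy as np

B, M = 10, 5                       # assumptions
beta, rho = 0.5, 0.9               # assumptions
phi = np.ones(B + 1) / (B + 1)
n = B + M + 1

def compute_v_sigma(sigma, T=50):
    """Take an array sigma (one action per state); return v_sigma (one value per initial state)."""
    # stochastic kernel p_sigma(x, y) = gamma(sigma(x), y), stored as the matrix M_sigma
    M_sigma = np.zeros((n, n))
    for x in range(n):
        M_sigma[x, sigma[x]:sigma[x] + B + 1] = phi
    # reward vector r_sigma[y] = (y - sigma(y))^beta
    r_sigma = np.array([(y - sigma[y])**beta for y in range(n)])
    # step through T terms of rho^t M_sigma^t r_sigma, adding each to the return value
    v_sigma = np.zeros(n)
    term = r_sigma.copy()               # M_sigma^0 r_sigma
    for t in range(T):
        v_sigma += rho**t * term
        term = M_sigma @ term           # becomes M_sigma^{t+1} r_sigma
    return v_sigma

v_sigma = compute_v_sigma(np.minimum(np.arange(n), M))   # e.g. the "store everything" policy
```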
Stochastic Recursive Sequences
Stochastic Recursive Sequence
$$X_{t+1} = F(X_t, W_{t+1}), \quad X_0 \sim \psi$$
$(W_t)_{t \ge 0}$ is a sequence of independent r.v.s (shocks). Random variables are functions defined on an outcome space.
One way to model the SRS above: chance moves once, at the beginning, selecting an outcome $\omega \in \Omega$. This determines values for all the shocks, $(W_t(\omega))_{t \ge 0}$, as well as a realization $X_0(\omega)$ of $X_0$ (with $P\{\omega : X_0(\omega) = x_i\} = \psi(x_i)$).
Thus the entire time path (time series) $(X_t(\omega))$ is determined recursively as $X_1(\omega) = F(X_0(\omega), W_1(\omega))$, $X_2(\omega) = F(X_1(\omega), W_2(\omega))$, and so on.
From the SRS, we get a stochastic kernel $p$ on $S$:
$$p(x, y) = P\{F(x, W_t) = y\} \equiv P\{\omega \in \Omega : F(x, W_t(\omega)) = y\}$$
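When the shocks are i.i.d. on a finite set, the kernel can be read off from $F$ by summing shock probabilities. The particular $F$, state space, and shock distribution below are illustrative assumptions.

```python
import numpy as np

# Illustrative SRS (assumptions): states 0..3, shocks w in {0, 1} with equal probability
S = [0, 1, 2, 3]
W_vals, W_probs = [0, 1], [0.5, 0.5]

def F(x, w):
    """An arbitrary example rule: move up (capped at 3) if w = 1, down (floored at 0) if w = 0."""
    return min(x + 1, 3) if w == 1 else max(x - 1, 0)

# p(x, y) = P{F(x, W_t) = y}
p = np.zeros((len(S), len(S)))
for x in S:
    for w, pw in zip(W_vals, W_probs):
        p[x, F(x, w)] += pw

assert np.allclose(p.sum(axis=1), 1.0)   # each row of the kernel is a distribution
```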
MC as SRS
Conversely, here is how we can represent a Markov-$(p, \psi)$ chain as an SRS. We want $X_0 \sim \psi$, $X_1 \sim p(X_0, dy)$, and so forth.
We do this in two steps. First, we define a function $\tau : (0, 1] \to S$, parameterized by a distribution $\phi$ on $S$, such that if $W$ is uniformly distributed on $(0, 1]$, then $\tau(W; \phi)$ has distribution $\phi$ on $S$.
Once we have this, we use the $W$'s to model shocks. Suppose $(W_t)$ is an independent, uniformly $(0, 1]$-distributed sequence of r.v.s. Then
$$X_0 = \tau(W_0; \psi), \quad X_{t+1} = F(X_t, W_{t+1}), \quad \text{where } F(X_t, W_{t+1}) = \tau\big(W_{t+1}; p(X_t, dy)\big).$$
Defining τ(W; φ)
Let $S = \{x_1, \ldots, x_N\}$. Partition $(0, 1]$ into subintervals of lengths $\phi(x_i)$, $i = 1, \ldots, N$.
We could do this by defining, for $i = 1, \ldots, N$,
$$I(x_i; \phi) = \Big( \sum_{s=1}^{i-1} \phi(x_s),\; \sum_{s=1}^{i} \phi(x_s) \Big].$$
Then define $z \mapsto \tau(z; \phi)$ by
$$\tau(z; \phi) = \sum_{x \in S} x\, \mathbf{1}\{z \in I(x; \phi)\}$$
Note that $\tau$ is a simple function (a linear combination of indicator functions). Note also that we use the function $\tau(\cdot\,; \phi)$, whose value has distribution $\phi$ when its argument is uniform on $(0, 1]$, rather than simply the distribution $\phi$ itself, in order to get an SRS where $X_{t+1}$ has the desired distribution.
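A sketch of $\tau$ and of simulating a Markov-$(p, \psi)$ chain with it. The states (taken to be the indices $0, 1, 2$), the kernel, and the initial distribution below are illustrative assumptions.

```python
import numpy as np

def tau(z, phi):
    """tau(z; phi): index i such that z lies in the i-th subinterval I(x_i; phi) of (0,1]."""
    cum = np.cumsum(phi)                   # right endpoints of the intervals I(x_i; phi)
    return min(int(np.searchsorted(cum, z)), len(phi) - 1)   # guard against rounding at 1.0

# Illustrative Markov-(p, psi) chain with states {0, 1, 2} (assumptions)
psi = np.array([0.2, 0.5, 0.3])
p = np.array([[0.9, 0.1, 0.0],
              [0.4, 0.4, 0.2],
              [0.1, 0.1, 0.8]])

rng = np.random.default_rng(1)
W = rng.uniform(size=100)                  # iid (approximately Uniform(0,1]) shocks
X = [tau(W[0], psi)]                       # X_0 = tau(W_0; psi)
for t in range(1, 100):
    X.append(tau(W[t], p[X[-1]]))          # X_{t+1} = tau(W_{t+1}; p(X_t, .))
```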
Path Probabilities
Let $(X_t)_{t=0}^{\infty}$ be a Markov process with $X_0 \sim \psi$.
Then $(X_0, X_1)$ is a random vector distributed on $S^2 = S \times S$, with $q_2(x_0, x_1) = P\{X_0 = x_0, X_1 = x_1\} = P(\{X_0 = x_0\} \cap \{X_1 = x_1\})$. So
$$q_2(x_0, x_1) = P\{X_0 = x_0\}\, P\{X_1 = x_1 \mid X_0 = x_0\} = \psi(x_0)\, p(x_0, x_1)$$
Similarly,
$$q_3(x_0, x_1, x_2) = P\{X_0 = x_0, X_1 = x_1\}\, P\{X_2 = x_2 \mid X_0 = x_0, X_1 = x_1\} = \psi(x_0)\, p(x_0, x_1)\, p(x_1, x_2),$$
where the last equality follows from the Markov assumption. More generally,
$$q_{T+1}(x_0, \ldots, x_T) = \psi(x_0) \prod_{t=0}^{T-1} p(x_t, x_{t+1})$$
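The path probability formula is straightforward to compute directly; the kernel and initial distribution below are the same illustrative assumptions as in the previous sketch.

```python
import numpy as np

psi = np.array([0.2, 0.5, 0.3])            # initial distribution (assumption)
p = np.array([[0.9, 0.1, 0.0],
              [0.4, 0.4, 0.2],
              [0.1, 0.1, 0.8]])            # kernel (assumption)

def path_probability(path):
    """q_{T+1}(x_0,...,x_T) = psi(x_0) * prod_t p(x_t, x_{t+1})."""
    q = psi[path[0]]
    for xt, xt1 in zip(path[:-1], path[1:]):
        q *= p[xt, xt1]
    return q

print(path_probability([1, 0, 0, 1]))      # probability of the path (1, 0, 0, 1)
```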
Joint Distributions
Lemma
Let $D_t \subset S$, for $t = 0, 1, \ldots, T$. Then
$$P\Big(\bigcap_{t \le T} \{X_t \in D_t\}\Big) = \sum_{x_0 \in D_0} \psi(x_0) \sum_{x_1 \in D_1} p(x_0, x_1) \cdots \sum_{x_T \in D_T} p(x_{T-1}, x_T)$$
Proof.
Sketch. The LHS equals the sum of the probabilities of all paths $\mathbf{x} = (x_0, \ldots, x_T) \in \times_{t=0}^{T} D_t$. That is, it equals $\sum_{\mathbf{x} \in \times_t D_t} q_{T+1}(\mathbf{x})$. By the path probability formula on the previous slide, $q_{T+1}(\mathbf{x})$ is a product. Now begin summing over these products by holding $(x_0, \ldots, x_{T-1})$ fixed and summing over all $x_T \in D_T$; then recursively work backward.
Coupling
Coupling - a probabilistic technique to study stochastic stability.
Let $p$ be a stochastic kernel with Markov matrix $M$.
Global stability of $(\mathcal{P}(S), M)$ is characterized by $\alpha(p^t) > 0$ for some $t$. For simplicity, what follows deals with the $t = 1$ case. NOTE:
$\alpha(p) > 0$ is equivalent to $\epsilon > 0$, where
$$\epsilon \equiv \min\Big\{ \sum_{y \in S} p(x, y)\, p(x', y) \;:\; x, x' \in S \Big\}$$
The idea behind $\epsilon > 0$ is also that of positive overlap between all pairs of distributions: if we start from different $x, x'$, there is positive probability that the chains meet next period.
We give a different argument for why $\epsilon > 0$ implies global stability.
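$\epsilon$ is easy to compute from the kernel, since the matrix $p\,p^{\top}$ has $(x, x')$ entry $\sum_y p(x, y)\, p(x', y)$. The kernel below is an illustrative assumption.

```python
import numpy as np

p = np.array([[0.9, 0.1, 0.0],
              [0.4, 0.4, 0.2],
              [0.1, 0.1, 0.8]])    # illustrative kernel (assumption)

overlaps = p @ p.T                 # (x, x') entry: sum_y p(x, y) p(x', y)
epsilon = overlaps.min()           # epsilon = min over all pairs x, x'
print(epsilon)                     # positive here: every pair of rows overlaps
```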
Coupling 2
Let $(X_t)$ and $(X_t^*)$ be independent Markov chains generated by $p$, where $X_0 \sim \psi$ and $X_0^* \sim \psi^*$, where $\psi^*$ is any stationary distribution of $M$. Consider a Markov process $(X_t')$ which follows $(X_t)$ until the random time $\nu \equiv \min\{t \ge 1 : X_t = X_t^*\}$, and then switches to following $(X_t^*)$. $\nu$ is known as the coupling time.
Claim: The distributions of $X_t$ and $X_t'$ are the same, for all $t$. (So $\psi_t' = \psi M^t$, $\forall t$.)
We need to show that $(X_t')_{t \ge 0}$ is Markov-$(p, \psi)$, just as $(X_t)_{t \ge 0}$ is. (Note that $(X_t^*)_{t \ge 0}$ is Markov-$(p, \psi^*)$.) That is, $X_0' \sim \psi$ (which is true) and $X_{t+1}' \sim p(X_t', dy)$, $t = 0, 1, 2, \ldots$.
For $t < \nu$, this is true since $X_s' = X_s$, $s = t, t + 1$, can be used on either side of the $\sim$ relation. For $t \ge \nu$, using the fact that $X_{t+1}^* \sim p(X_t^*, dy)$, and $X_s' = X_s^*$, $s = t, t + 1$, on either side of $\sim$ completes the argument.
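A simulation sketch of the construction: run $(X_t)$ and $(X_t^*)$ independently, stop at the first $t \ge 1$ where they meet (the coupling time $\nu$), and splice $(X_t')$ together from the two paths. The kernel and $\psi$ are illustrative assumptions, and $\psi^*$ is approximated by iterating the kernel rather than taken as given.

```python
import numpy as np

p = np.array([[0.9, 0.1, 0.0],
              [0.4, 0.4, 0.2],
              [0.1, 0.1, 0.8]])                    # illustrative kernel (assumption)
psi = np.array([1.0, 0.0, 0.0])                    # arbitrary initial distribution (assumption)
psi_star = psi @ np.linalg.matrix_power(p, 500)    # approximately stationary for this kernel

rng = np.random.default_rng(2)

def draw(dist):
    """Draw a state index from the distribution dist."""
    return int(rng.choice(3, p=dist))

# independent chains: X_t is Markov-(p, psi), X*_t is Markov-(p, psi*)
X, X_star = [draw(psi)], [draw(psi_star)]
while len(X) == 1 or X[-1] != X_star[-1]:          # run until they first meet at some t >= 1
    X.append(draw(p[X[-1]]))
    X_star.append(draw(p[X_star[-1]]))

nu = len(X) - 1                                    # coupling time
X_prime = X[:nu] + X_star[nu:]                     # X' follows X before nu, X* from nu on
print("coupling time nu =", nu)
```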
Coupling 3
We want to show that if $\epsilon > 0$, then $\psi M^t$, the marginal distribution of (the arbitrary) $X_t$, converges to $\psi^*$ (this will also show that $\psi^*$ is the unique stationary distribution, and establish global stability). First, a lemma.
Lemma
Suppose $X$ and $Y$ are r.v.s taking values in $S$, with distributions $\phi_X, \phi_Y$. Then
$$\|\phi_X - \phi_Y\|_\infty \equiv \max_{x \in S} |\phi_X(x) - \phi_Y(x)| \le P\{X \ne Y\}$$
Proof to be done later. The lemma says that if $X, Y$ are close (equal with high probability), so are their distributions.
Applied to the r.v.s $X_t', X_t^*$, we have
$$\|\psi M^t - \psi^*\|_\infty \le P\{X_t' \ne X_t^*\}$$
Coupling 4
To show that the LHS goes to 0 as $t \to \infty$, we show that the RHS does. Here's where $X_t'$ is used:
$$P\{X_t' \ne X_t^*\} = P\Big(\bigcap_{j \le t} \{X_j \ne X_j^*\}\Big),$$
for if $X_j = X_j^*$ for some $j \le t$, then $X_t'$ and $X_t^*$ would have been coupled, and hence equal, by time $t$.
We show that the term on the RHS goes to 0 as $t \to \infty$.
Theorem
$$P\Big(\bigcap_{j \le t} \{X_j \ne X_j^*\}\Big) \le (1 - \epsilon)^t, \quad \forall t \in \mathbb{N},$$
where $\epsilon \equiv \min\Big\{ \sum_{y \in S} p(x, y)\, p(x', y) : x, x' \in S \Big\}$.
Coupling 5
Proof.
$(X_t, X_t^*)$ is a Markov chain on $S \times S$. By independence, the initial condition is
$$P\{(X_0, X_0^*) = (x, s)\} = \psi(x)\, \psi^*(s) \qquad (\text{call this } \psi \times \psi^*(\mathbf{x}_0), \text{ where } \mathbf{x}_0 = (x, s)),$$
and the kernel is
$$k(\mathbf{x}, \mathbf{x}') = k\big((x, s), (x', s')\big) = p(x, x')\, p(s, s').$$
Letting $D = \{(x, s) \in S \times S : x = s\}$,
$$P\Big(\bigcap_{j \le t} \{X_j \ne X_j^*\}\Big) = P\Big(\bigcap_{j \le t} \{(X_j, X_j^*) \in D^c\}\Big).$$
This equals
$$\sum_{\mathbf{x}_0 \in D^c} \psi \times \psi^*(\mathbf{x}_0) \sum_{\mathbf{x}_1 \in D^c} k(\mathbf{x}_0, \mathbf{x}_1) \cdots \sum_{\mathbf{x}_t \in D^c} k(\mathbf{x}_{t-1}, \mathbf{x}_t) \qquad (\dagger)$$
Now $\sum_{\mathbf{x}_t \in D^c} k(\mathbf{x}_{t-1}, \mathbf{x}_t) = 1 - \sum_{\mathbf{x}_t \in D} k(\mathbf{x}_{t-1}, \mathbf{x}_t)$, and
$$\sum_{\mathbf{x}_t \in D} k(\mathbf{x}_{t-1}, \mathbf{x}_t) = \sum_{(x_t, s_t) \in D} p(x_{t-1}, x_t)\, p(s_{t-1}, s_t),$$
which, by definition of $D$, equals $\sum_{y \in S} p(x_{t-1}, y)\, p(s_{t-1}, y)$, which is $\ge \epsilon$. So $\sum_{\mathbf{x}_t \in D^c} k(\mathbf{x}_{t-1}, \mathbf{x}_t) \le 1 - \epsilon$. The same argument holds for each of the $t$ inner sums of ($\dagger$), which yields the bound $(1 - \epsilon)^t$.
Proof of the Lemma
Proof
Take any event $B$.
$$P\{X \in B\} = P(\{X \in B\} \cap \{X = Y\}) + P(\{X \in B\} \cap \{X \ne Y\})$$
$$P\{Y \in B\} = P(\{Y \in B\} \cap \{X = Y\}) + P(\{Y \in B\} \cap \{X \ne Y\})$$
The first terms on the RHS of both equations are equal. So
$$P\{X \in B\} - P\{Y \in B\} = P(\{X \in B\} \cap \{X \ne Y\}) - P(\{Y \in B\} \cap \{X \ne Y\})$$
So $P\{X \in B\} - P\{Y \in B\} \le P(\{X \in B\} \cap \{X \ne Y\}) \le P\{X \ne Y\}$.
We could switch $X$ and $Y$ above as well. So
$$|P\{X \in B\} - P\{Y \in B\}| \le P\{X \ne Y\}.$$
Taking $B = \{x\}$ and maximizing over $x \in S$ gives the lemma.