Lec 6 Infinite State Space

1 Introduction

We have a state space S ⊂ ℝ and a shock space Z ⊂ ℝ. The state tomorrow evolves stochastically as a function of today's state and a realization of a shock, according to the function F : S × Z → S. We have the SRS (stochastic recursive sequence)

Xt+1 = F(Xt, Wt+1),   X0 ∼ Ψ,   (Wt) iid ∼ Φ    (1)

Here, Φ is now a cumulative distribution function, as is Ψ, and X0 is independent of the shocks. Xt and Wt+1 are independent, since Xt depends only on W1, …, Wt, which are all independent of Wt+1.

Example 1 Stochastic Solow-Swan model.

kt+1 = F(kt, Wt+1) = s f(kt, Wt+1) + (1−δ)kt

where s is the savings rate, δ is the rate of depreciation, the state space S = ℝ+ or (0,∞), and the shock space Z = (0,∞).

The file srs.py defines a class SRS to implement the SRS in Equation 1.

An instance of it requires a function F as in Equation 1, a distribution φ for the shocks, and an initial condition for X. There is an update method using F, which is applied recursively in a sample_path method.
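A minimal sketch of what such a class might look like (the names update and sample_path follow the text; everything else, such as treating φ as a callable that returns one shock draw, is an assumption, and the actual srs.py may differ):

    import numpy as np

    class SRS:
        """Stochastic recursive sequence X_{t+1} = F(X_t, W_{t+1})."""

        def __init__(self, F=None, phi=None, X=None):
            self.F = F      # transition function F(x, w)
            self.phi = phi  # callable returning one draw of the shock W
            self.X = X      # current state

        def update(self):
            """Advance the state one period using a fresh shock draw."""
            self.X = self.F(self.X, self.phi())

        def sample_path(self, n):
            """Return a time series of length n starting from the current state."""
            path = np.empty(n)
            for t in range(n):
                path[t] = self.X
                self.update()
            return path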


The file solowtest.py creates an instance of this by first defining F as in the Solow-Swan model of Example 1, specifying φ to be lognormal, choosing an initial X, and putting all these things in the instance solow_srs = SRS(F=..., phi=..., X=...). A command of the form solow_srs.sample_path(n) then gives a (capital) time series of length n for the model.
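Concretely, the construction might look like the following sketch, reusing the SRS sketch above (the parameter values and lognormal parameters are illustrative assumptions, not taken from solowtest.py):

    import numpy as np

    # Illustrative parameters for the stochastic Solow-Swan model
    s, delta, alpha = 0.25, 0.1, 0.3

    def F(k, w):
        # k_{t+1} = s f(k_t, W_{t+1}) + (1 - delta) k_t with f(k, w) = k**alpha * w
        return s * k**alpha * w + (1 - delta) * k

    phi = lambda: np.random.lognormal(mean=0.0, sigma=0.2)  # one shock draw

    solow_srs = SRS(F=F, phi=phi, X=1.0)
    series = solow_srs.sample_path(250)  # capital time series of length 250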

1.1 Distribution Dynamics

We’re interested in tracking the distribution of Xt defined in Equation 1.

Using a Markov matrix is not possible with an infinite state space. We use simulation instead.

For a given t, we sample Xt a large number of times: sample X0 from Ψ independently each time, and use the class SRS to generate a large number of time series of length t. Then use the empirical distribution function of the resulting Xt's.

The empirical distribution function Fn of a r.v. X, from a sample (X1, …, Xn) of size n, specifies, for each x, Fn(x) to be the proportion of observations in the sample that are less than or equal to x. That is,

Fn(x) = (1/n) ∑_{i=1}^{n} 1{Xi ≤ x},   ∀ x ∈ ℝ    (2)

By the law of large numbers, Fn(x) converges in probability to F(x) for each x, where F is the distribution function of the r.v. X.

The file ecdf.py defines a class ECDF; at the moment, this just has one function, the empirical distribution function. The file solowtest.py computes the empirical distribution function of kt for t = 20 from a sample of 1000 kt's generated using SRS: it generates 1000 time series of length 20 (independently sampling the initial condition each time), retains the final observation kt of each series, and feeds these to ECDF.
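A minimal sketch of such a class (the callable interface is an assumption):

    import numpy as np

    class ECDF:
        """Empirical distribution function of a given sample."""

        def __init__(self, observations):
            self.observations = np.asarray(observations)

        def __call__(self, x):
            # Proportion of observations less than or equal to x
            return np.mean(self.observations <= x)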


We then plot this empirical distribution function using plot(). For the x-axis, we use the (sorted) observations vector; for the y-axis, the ECDF evaluated at all points of this vector.
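Put together, the simulation and plot might look like this sketch (reusing the SRS and ECDF sketches above; the lognormal initial distribution is an illustrative assumption):

    import numpy as np
    import matplotlib.pyplot as plt

    # 1000 independent series; keep the final observation k_t (t = 20) of each
    sample = np.array([SRS(F=F, phi=phi, X=np.random.lognormal()).sample_path(21)[-1]
                       for _ in range(1000)])

    ecdf = ECDF(sample)
    xs = np.sort(sample)
    plt.plot(xs, [ecdf(x) for x in xs])
    plt.xlabel("k_t")
    plt.ylabel("F_n(k_t)")
    plt.show()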

The file threshold_ext.py produces time series for the threshold externalities model with a lognormal output shock. You can start with capital stock below and above the threshold, plot, say, time series of length 1000 for these two initial conditions, and observe the first time at which the low-initial-capital series has a kt that exceeds the threshold capital stock kb (the first passage time). You can also compute other objects, e.g. the ECDF at time t, and so forth.

1.2 Density Dynamics

We use a slightly more specific functional form than Equation 1, mainly to get a short expression for the stochastic kernel.

Xt+1 = g(Xt) + Wt+1,   X0 ∼ ψ,   (Wt) iid ∼ φ    (3)

with Z = S = ℝ, and where ψ and φ are densities on ℝ. Then the marginal densities ψt of Xt, t = 1, 2, …, follow the recursion

ψt+1(y) = ∫ p(x, y) ψt(x) dx,   where p(x, y) = φ(y − g(x))    (4)

We have taken the joint probability density that the state at time t is x and at time t+1 is y, and integrated out x. Here p(x, y) = φ(y − g(x)) since the shock W is distributed according to φ, and w = y − g(x) is the relevant inverse function.

Simulating densities

Using Equation 4 directly is not efficient; one needs to do the recursive integration on a grid, and so forth. Nor can one differentiate the empirical distribution function that we computed earlier: on the stretches where it is differentiable, the derivative is zero.

One standard approach in other contexts is to generate a sample of Xt's and estimate a kernel density. Indeed, this is akin to taking a discrete derivative of the empirical distribution. Suppose we have a sample (Y1, …, Yn) of points, and let Fn be the empirical distribution function. Consider the discrete derivative at x:

f̂(x) = [Fn(x + h) − Fn(x − h)] / 2h

This is the proportion of the Yi's that lie within a band of length h on each side of x, divided by 2h. This equals

(1/2nh) ∑_{i=1}^{n} 1{ |Yi − x| / h ≤ 1 },   or

f̂(x) = (1/nh) ∑_{i=1}^{n} k((x − Yi)/h)

where k is the uniform density on [−1, 1], thus equalling 1/2 if Yi is within h of x on either side, and 0 otherwise. f̂ is an example of a kernel density estimate, and k is an example of a kernel function. More generally, a kernel function is any function on ℝ satisfying ∫_{−∞}^{∞} k(x) dx = 1 (so if such a function is nonnegative, then it is a density). h is the bandwidth, which affects the degree of smoothing; h can be a function of the sample size.
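As a quick illustration, here is a direct transcription of this estimator into code (the function name and interface are ours):

    import numpy as np

    def kde(sample, h):
        """Kernel density estimate with the uniform (box) kernel on [-1, 1]."""
        sample = np.asarray(sample)
        n = len(sample)

        def f_hat(x):
            # k((x - Y_i)/h) = 1/2 whenever |Y_i - x| <= h, else 0
            k = 0.5 * (np.abs((x - sample) / h) <= 1.0)
            return k.sum() / (n * h)

        return f_hat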

For the Markov problem, there is an even better way, from the point of view of finite-sample and asymptotic properties: the look-ahead estimator ψ_t^n of ψt. This is defined as

ψ_t^n(y) = (1/n) ∑_{i=1}^{n} p(x_{t−1}^i, y),   y ∈ ℝ    (5)

where (x_{t−1}^i)_{i=1}^{n} is a sample of n independent draws of X_{t−1}, and p(x, y) = φ(y − g(x)).
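A sketch of this estimator in code (the names are ours; phi_density is the shock density φ, assumed vectorized, and g is the map in Equation 3):

    import numpy as np

    def look_ahead(sample_prev, phi_density, g):
        """Look-ahead estimator of psi_t from n independent draws of X_{t-1}."""
        sample_prev = np.asarray(sample_prev)

        def psi_n(y):
            # Average the stochastic kernel p(x, y) = phi(y - g(x)) over the sample
            return np.mean(phi_density(y - g(sample_prev)))

        return psi_n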

We have the following

Lemma 1 The look-ahead estimator ψ_t^n is pointwise unbiased and consistent for ψt.

Proof. Fix y ∈ S. We want to first show that E ψ_t^n(y) = ψt(y). This follows from

E p(X_{t−1}, y) = ∫ p(x, y) ψ_{t−1}(x) dx = ψt(y)

Hence the pointwise unbiasedness. Moreover, we know that the sample mean of an iid sample of r.v.s (here, of (X_{t−1}^i)_{i=1}^{n}) is a consistent estimator of the mean.

1.3 Stationary Densities

ψ is a stationary density for (Xt) given by Equation (3) if it satisfies

ψ(y) = ∫ p(x, y) ψ(x) dx   (y ∈ ℝ)    (6)

The SRS is globally stable if there is a unique stationary density. If there is global stability, a law of large numbers applies to each Markov chain generated by p(x, y): for any real-valued function h s.t. ∫ |h(x)| ψ(x) dx is finite, we have

(1/n) ∑_{t=1}^{n} h(Xt) → ∫ h(x) ψ(x) dx   as n → ∞    (7)

We can use two methods to approximate the stationary distribution ψ, arising out of the law of large numbers.

The less powerful one is to use the empirical distribution of a long Markov chain from the SRS. For, fixing x and letting h(y) = 1{y ≤ x}, notice that as n → ∞,

Fn(x) = (1/n) ∑_{t=1}^{n} 1{Xt ≤ x} → ∫ 1{y ≤ x} ψ(y) dy = Ψ(x)

The above method approximates the stationary distribution. A more powerful method uses the stochastic kernel at every step of the Markov chain, and approximates the stationary density. Notice by the law of large numbers that, as n → ∞,

ψ_n(y) ≡ (1/n) ∑_{t=1}^{n} p(Xt, y) → ∫ p(x, y) ψ(x) dx = ψ(y)

For instance, consider the stochastic Solow-Swan model with δ = 1. In a sense this is a trivial example, because with lognormal shocks it is easy to work out the stationary distribution analytically. Nevertheless, consider the model

kt+1 = s kt^α Wt+1

with the Wt's being iid φ (for concreteness, say lognormal). Conditioning on kt = x, we have

p(x, y) = φ(y / (s x^α)) (1 / (s x^α))

We can first generate a long Markov chain (xt), then evaluate (1/n) ∑_{t=1}^{n} p(xt, y) over a grid of y's to approximate ψ.
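A sketch of this procedure (the parameter values, grid, and lognormal shape parameter are illustrative assumptions):

    import numpy as np
    from scipy.stats import lognorm

    s, alpha, sigma = 0.25, 0.3, 0.2
    phi = lognorm(sigma).pdf            # shock density, lognormal for concreteness

    def p(x, y):
        # Stochastic kernel for k' = s k^alpha W: change of variables w = y/(s x^alpha)
        return phi(y / (s * x**alpha)) / (s * x**alpha)

    # Long Markov chain (x_t) from the model
    n = 10_000
    k = np.empty(n)
    k[0] = 1.0
    for t in range(n - 1):
        k[t + 1] = s * k[t]**alpha * np.random.lognormal(sigma=sigma)

    # psi_n(y) = (1/n) sum_t p(x_t, y), evaluated over a grid of y's
    ygrid = np.linspace(0.01, 1.0, 100)
    psi_hat = np.array([p(k, y).mean() for y in ygrid])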

2 Optimal Growth

2.1 Optimization

Output is the state variable. It evolves according to yt+1 = f(kt, Wt+1), where the shocks Wt are iid, distributed according to φ on Z = (0,∞). The agent uses a policy σ that maps from output yt to savings kt, with the rest being consumed. So, capital is used up completely in production. σ satisfies 0 ≤ σ(y) ≤ y for every y ∈ S, and the set of feasible maps is Σ.

Each σ induces an SRS

yt+1 = f(σ(yt), Wt+1),   (Wt) iid ∼ φ,   y0 = y    (8)

y is the initial output or income. The agent has a felicity function U and discount factor β ∈ (0,1), and maximizes vσ(y) over all σ ∈ Σ, where

vσ(y) ≡ E[ ∑_{t=0}^{∞} β^t U(yt − σ(yt)) | y0 = y ]    (9)

We are assuming that U is bounded and continuous and f is continuous.

The expectations operator can be taken inside:

vσ(y) = ∑_{t=0}^{∞} β^t E[ U(yt − σ(yt)) | y0 = y ]

The simplification is that the expectations are now with respect to the marginal densities at each time t, and hence are integrals over ℝ rather than over a complicated set of paths.

The value function v, defined by v(y) = sup{vσ(y) : σ ∈ Σ} for all y ∈ S, satisfies a Bellman equation. Letting Γ(y) = [0, y] be the feasible actions/savings when output is y, the Bellman equation here is

v(y) = max_{k ∈ Γ(y)} { U(y − k) + β ∫ v(f(k, z)) φ(z) dz }   (y ∈ S)    (10)

Note that the choice of k today determines the distribution of states (here, outputs) f(k, z) tomorrow, via the random shock z. So each state f(k, z) is weighted with density φ(z).


It can also be shown that v is continuous. On the set bcS of bounded continuous real-valued functions on S, a policy σ ∈ Σ is w-greedy (for w ∈ bcS) if

σ(y) ∈ argmax_{k ∈ Γ(y)} { U(y − k) + β ∫ w(f(k, z)) φ(z) dz }   (y ∈ S)    (11)

We can show that a policy function σ is optimal if and only if it is v-greedy.

Define the Bellman operator T carrying maps w ∈ bcS to Tw ∈ bcS as follows:

Tw(y) = max_{k ∈ Γ(y)} { U(y − k) + β ∫ w(f(k, z)) φ(z) dz }   (y ∈ S)    (12)

We can show that T is a uniform contraction with modulus β on the metric space (bcS, d), where d(v, w) = sup_{y ∈ S} |v(y) − w(y)|. So, by Banach's theorem, the value function v is the unique fixed point of the map T. This also suggests value iteration as a method to approximate v, after which a v-greedy policy may be computed.

2.2 Fitted Value Iteration

The value function is now over an infinite state space: there is one maximization problem for each such state. One way around this was discussed in the first part of Chapter 3 of Ljungqvist and Sargent: convert the problem into a finite-state problem with a very large number of states, representing a grid over ℝ. The approximation properties could be poor or okay; we don't discuss this here.

Another alternative is to use functions that can be stored with a finite set of parameters: polynomials. But again, w ↦ Tw may not be easy to represent with lower-order polynomials. Orthogonal polynomials may be viable, but we don't do this here.

Yet another alternative is fitted value iteration. Start with some value function w, and on a suitable grid G of points (states), evaluate Tw. Then extend this to the rest of the state space using interpolation. The simplest is linear interpolation, which joins the set of points P = {(g, Tw(g)) : g ∈ G} with line segments. If (x1, y1), (x2, y2) are two points in P, and we wish to evaluate the linear interpolant at x ∈ (x1, x2), it is the point y s.t.

y − y1 = [(y2 − y1) / (x2 − x1)] (x − x1)

While other interpolations (e.g. a cubic spline) are smoother, more accurate, etc., what we are after is how the Bellman iterations converge: this property is good for linear interpolation, and need not be good for other interpolations even though they may be more accurate as interpolations.

Thus consider the composition T̂ = L ∘ T, where T is the Bellman operator and L is the interpolation operator. It can be shown that L is non-expansive. Then, since T is uniformly contracting, so is L ∘ T. Indeed, for the modulus λ w.r.t. which T is uniformly contracting, we have for any value functions v, w:

d(L(T(v)), L(T(w))) ≤ d(T(v), T(w)) ≤ λ d(v, w)

where the first inequality is due to the non-expansiveness of L. So, by Banach's theorem, any sequence (T̂^n(w))_n is Cauchy and converges in the metric space bcS. (Question: how do we know convergence is to the value function, the limit of (T^n(w))_n?)

Fitted value iteration is implemented in fvi.py. This uses the scipy function interp, which does linear interpolation. Alternatively, you could follow Stachurski and work with the LinInterp class.
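For concreteness, here is a self-contained sketch of fitted value iteration under illustrative assumptions (log utility U(c) = log c, production f(k, z) = k^α z, lognormal shocks, and a Monte Carlo approximation of the integral; none of this is taken from fvi.py):

    import numpy as np

    alpha, beta = 0.3, 0.95
    grid = np.linspace(1e-2, 4.0, 100)                  # grid G over the state space
    shocks = np.random.lognormal(sigma=0.2, size=500)   # draws approximating the integral

    def T(w_vals):
        """One step of T-hat = L o T: apply T on the grid, extend by linear interpolation."""
        w = lambda y: np.interp(y, grid, w_vals)        # L: piecewise-linear extension
        Tw = np.empty_like(grid)
        for i, y in enumerate(grid):
            # Maximize over savings k on a sub-grid of the feasible set Gamma(y) = [0, y]
            ks = np.linspace(1e-3, y - 1e-3, 50)
            vals = [np.log(y - k) + beta * np.mean(w(k**alpha * shocks)) for k in ks]
            Tw[i] = max(vals)
        return Tw

    w = np.zeros_like(grid)
    for _ in range(50):                                 # iterate toward the fixed point
        w = T(w)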


2.3 Fitted Policy Iteration

The idea is the same as policy iteration in the case of finite state processes.

Start with a policy σ, evaluate the value vσ of following it, then find a vσ-greedy policy, and iterate.

How do we evaluate vσ(y) = ∑_{t=0}^{∞} β^t E[U(yt − σ(yt))]?

One alternative is to evaluate E[U(yt − σ(yt))] for each t = 1, 2, …, T (T large) by Monte Carlo simulation: get a large random sample of yt's (e.g. by using the class SRS), and then compute the mean of the U(yt − σ(yt))'s. Note that we must do this for a grid of initial states and interpolate to recover the function.

Stachurski suggests, for the present single-state-variable problem, an alternative based on the following iteration. For σ ∈ Σ, define Tσ, which maps a value function w ∈ bcS into Tσw ∈ bcS:

Tσw(y) = U(y − σ(y)) + β ∫ w(f(σ(y), z)) φ(z) dz   (y ∈ S)    (13)

It can be shown that Tσ is uniformly contracting with modulus β, and its unique fixed point is vσ. Based on this and Listing 6.6, do a fitted policy iteration for the growth problem for homework during Spring Break (i.e. Stachurski exercise 6.2.4).
