Lec 6 Infinite State Space

1 Introduction

We have a state space S ⊂ ℝ and a shock space Z ⊂ ℝ. The state tomorrow evolves stochastically as a function of today's state and a realization of a shock, according to the function F : S × Z → S. We have the SRS (stochastic recursive sequence)

Xt+1 = F(Xt, Wt+1),   X0 ∼ Ψ,   (Wt) iid ∼ Φ    (1)

Here, Φ is now a cumulative distribution function, as is Ψ, and X0 is independent of the shocks. Xt and Wt+1 are independent, since Xt depends only on W1, …, Wt, which are all independent of Wt+1.

Example 1 Stochastic Solow-Swan model.

kt+1 = F(kt, Wt+1) = s f(kt, Wt+1) + (1−δ)kt

where s is the savings rate, δ is the rate of depreciation, the state space S = ℝ+ or (0,∞), and the shock space Z = (0,∞).

The file srs.py defines a class SRS to implement the SRS in Equation 1.

An instance of it requires a function F as in Equation 1, a distribution φ for the shocks, and an initial condition for X. There is an update method using F, which is applied recursively in a sample_path method.
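A minimal sketch of what such a class might look like (the names update and sample_path follow the text; everything else, such as treating φ as a callable that returns one shock draw, is an assumption, and the actual srs.py may differ):

    import numpy as np

    class SRS:
        """Stochastic recursive sequence X_{t+1} = F(X_t, W_{t+1})."""

        def __init__(self, F=None, phi=None, X=None):
            self.F = F      # transition function F(x, w)
            self.phi = phi  # callable returning one draw of the shock W
            self.X = X      # current state

        def update(self):
            """Advance the state one period using a fresh shock draw."""
            self.X = self.F(self.X, self.phi())

        def sample_path(self, n):
            """Return a time series of length n starting from the current state."""
            path = np.empty(n)
            for t in range(n):
                path[t] = self.X
                self.update()
            return path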


The file solowtest.py creates an instance of this by first defining F as in the Solow-Swan model of Example 1, specifying φ to be lognormal, choosing an initial X, and putting all these things in the instance solow_srs = SRS(F=..., phi=..., X=...). A command of the form solow_srs.sample_path(n) then gives a (capital) time series of length n for the model.
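Concretely, the construction might look like the following sketch, reusing the SRS sketch above (the parameter values and lognormal parameters are illustrative assumptions, not taken from solowtest.py):

    import numpy as np

    # Illustrative parameters for the stochastic Solow-Swan model
    s, delta, alpha = 0.25, 0.1, 0.3

    def F(k, w):
        # k_{t+1} = s f(k_t, W_{t+1}) + (1 - delta) k_t with f(k, w) = k**alpha * w
        return s * k**alpha * w + (1 - delta) * k

    phi = lambda: np.random.lognormal(mean=0.0, sigma=0.2)  # one shock draw

    solow_srs = SRS(F=F, phi=phi, X=1.0)
    series = solow_srs.sample_path(250)  # capital time series of length 250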

1.1 Distribution Dynamics

We’re interested in tracking the distribution of Xt defined in Equation 1.

Using a Markov matrix is not possible with an infinite state space. We use simulation instead.

For a given t, we sample Xt a large number of times: sample X0 from Ψ independently each time, and use the class SRS to generate a large number of time series of length t. Then use the empirical distribution function of the resulting Xt's.

The empirical distribution function Fn of a r.v. X, from a sample (X1, …, Xn) of size n, specifies, for each x, Fn(x) to be the proportion of observations in the sample that are less than or equal to x. That is,

Fn(x) = (1/n) ∑_{i=1}^{n} 1{Xi ≤ x},   ∀ x ∈ ℝ    (2)

By the law of large numbers, Fn(x) converges in probability to F(x) for each x, where F is the distribution function of the r.v. X.

The file ecdf.py defines a class ECDF; at the moment, this just has one function, the empirical distribution function. The file solowtest.py computes the empirical distribution function of kt for t = 20 from a sample of 1000 kt's generated using SRS: it generates 1000 time series of length 20 (independently sampling the initial condition each time), retains the final observation kt of each series, and feeds these to ECDF.
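A minimal sketch of such a class (the callable interface is an assumption):

    import numpy as np

    class ECDF:
        """Empirical distribution function of a given sample."""

        def __init__(self, observations):
            self.observations = np.asarray(observations)

        def __call__(self, x):
            # Proportion of observations less than or equal to x
            return np.mean(self.observations <= x)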


We then plot this empirical distribution function using plot(). For the x-axis, we use the (sorted) observations vector; for the y-axis, the ECDF evaluated at all points of this vector.
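Put together, the simulation and plot might look like this sketch (reusing the SRS and ECDF sketches above; the lognormal initial distribution is an illustrative assumption):

    import numpy as np
    import matplotlib.pyplot as plt

    # 1000 independent series; keep the final observation k_t (t = 20) of each
    sample = np.array([SRS(F=F, phi=phi, X=np.random.lognormal()).sample_path(21)[-1]
                       for _ in range(1000)])

    ecdf = ECDF(sample)
    xs = np.sort(sample)
    plt.plot(xs, [ecdf(x) for x in xs])
    plt.xlabel("k_t")
    plt.ylabel("F_n(k_t)")
    plt.show()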

The file threshold_ext.py produces time series for the threshold externalities model with a lognormal output shock. You can start with capital stock below and above the threshold, plot, say, time series of length 1000 for these two initial conditions, and observe the first time at which the low-initial-capital series has a kt that exceeds the threshold capital stock kb (the first passage time). You can also compute other objects, e.g. the ECDF at time t, and so forth.

1.2 Density Dynamics

We use a slightly more specific functional form than Equation 1, mainly to get a short expression for the stochastic kernel.

Xt+1 = g(Xt) + Wt+1,   X0 ∼ ψ,   (Wt) iid ∼ φ    (3)

with Z = S = ℝ, and where ψ and φ are densities on ℝ. Then the marginal densities ψt of Xt, t = 1, 2, …, follow the recursion

ψt+1(y) = ∫ p(x, y) ψt(x) dx,   where p(x, y) = φ(y − g(x))    (4)

We have taken the joint probability density that the state at time t is x and at time t+1 is y, and integrated out x. Here p(x, y) = φ(y − g(x)) since the shock W is distributed according to φ, and w = y − g(x) is the relevant inverse function.

Simulating densities

Using Equation 4 directly is not efficient; one needs to do the recursive integration on a grid, and so forth. Nor can one differentiate the empirical distribution function that we computed earlier: on the stretches where it is differentiable, the derivative is zero.

One standard approach in other contexts is to generate a sample of Xt's and estimate a kernel density. Indeed, this is akin to taking a discrete derivative of the empirical distribution. Suppose we have a sample (Y1, …, Yn) of points, and let Fn be the empirical distribution function. Consider the discrete derivative at x:

f̂(x) = [Fn(x + h) − Fn(x − h)] / 2h

This is the proportion of the Yi's that lie within a band of length h on each side of x, divided by 2h. This equals

(1/2nh) ∑_{i=1}^{n} 1{ |Yi − x| / h ≤ 1 },   or

f̂(x) = (1/nh) ∑_{i=1}^{n} k((x − Yi)/h)

where k is the uniform density on [−1, 1], thus equalling 1/2 if Yi is within h of x on either side, and 0 otherwise. f̂ is an example of a kernel density estimate, and k is an example of a kernel function. More generally, a kernel function is any function on ℝ satisfying ∫_{−∞}^{∞} k(x) dx = 1 (so if such a function is nonnegative, then it is a density). h is the bandwidth, which affects the degree of smoothing; h can be a function of the sample size.
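As a quick illustration, here is a direct transcription of this estimator into code (the function name and interface are ours):

    import numpy as np

    def kde(sample, h):
        """Kernel density estimate with the uniform (box) kernel on [-1, 1]."""
        sample = np.asarray(sample)
        n = len(sample)

        def f_hat(x):
            # k((x - Y_i)/h) = 1/2 whenever |Y_i - x| <= h, else 0
            k = 0.5 * (np.abs((x - sample) / h) <= 1.0)
            return k.sum() / (n * h)

        return f_hat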

For the Markov problem, there is an even better way, from the point of view of finite-sample and asymptotic properties: the look-ahead estimator ψ_t^n of ψt. This is defined as

ψ_t^n(y) = (1/n) ∑_{i=1}^{n} p(x_{t−1}^i, y),   y ∈ ℝ    (5)

where (x_{t−1}^i)_{i=1}^{n} is a sample of n independent draws of X_{t−1}, and p(x, y) = φ(y − g(x)).
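A sketch of this estimator in code (the names are ours; phi_density is the shock density φ, assumed vectorized, and g is the map in Equation 3):

    import numpy as np

    def look_ahead(sample_prev, phi_density, g):
        """Look-ahead estimator of psi_t from n independent draws of X_{t-1}."""
        sample_prev = np.asarray(sample_prev)

        def psi_n(y):
            # Average the stochastic kernel p(x, y) = phi(y - g(x)) over the sample
            return np.mean(phi_density(y - g(sample_prev)))

        return psi_n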

We have the following

Lemma 1 The look-ahead estimator ψ_t^n is pointwise unbiased and consistent for ψt.

Proof. Fix y ∈ S. We want to first show that E ψ_t^n(y) = ψt(y). This follows from

E p(X_{t−1}, y) = ∫ p(x, y) ψ_{t−1}(x) dx = ψt(y)

Hence the pointwise unbiasedness. Moreover, we know that the sample mean of an iid sample of r.v.s (here, of (X_{t−1}^i)_{i=1}^{n}) is a consistent estimator of the mean.

1.3 Stationary Densities

ψ is a stationary density for (Xt) given by Equation (3) if it satisfies

ψ(y) = ∫ p(x, y) ψ(x) dx   (y ∈ ℝ)    (6)

The SRS is globally stable if there is a unique stationary density. If there is global stability, a law of large numbers applies to each Markov chain generated by p(x, y): for any real-valued function h s.t. ∫ |h(x)| ψ(x) dx is finite, we have

(1/n) ∑_{t=1}^{n} h(Xt) → ∫ h(x) ψ(x) dx   as n → ∞    (7)

We can use two methods to approximate the stationary distribution ψ, arising out of the law of large numbers.

The less powerful one is to use the empirical distribution of a long Markov chain from the SRS. For, fixing x and letting h(y) = 1{y ≤ x}, notice that as n → ∞,

Fn(x) = (1/n) ∑_{t=1}^{n} 1{Xt ≤ x} → ∫ 1{y ≤ x} ψ(y) dy = Ψ(x)

The above method approximates the stationary distribution. A more powerful method uses the stochastic kernel at every step of the Markov chain, and approximates the stationary density. Notice by the law of large numbers that, as n → ∞,

ψ_n(y) ≡ (1/n) ∑_{t=1}^{n} p(Xt, y) → ∫ p(x, y) ψ(x) dx = ψ(y)

For instance, consider the stochastic Solow-Swan model with δ = 1. In a sense this is a trivial example, because with lognormal shocks it is easy to work out the stationary distribution analytically. Nevertheless, consider the model

kt+1 = s kt^α Wt+1

with the Wt's being iid φ (for concreteness, say lognormal). Conditioning on kt = x, we have

p(x, y) = φ(y / (s x^α)) (1 / (s x^α))

We can first generate a long Markov chain (xt), then evaluate (1/n) ∑_{t=1}^{n} p(xt, y) over a grid of y's to approximate ψ.
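A sketch of this procedure (the parameter values, grid, and lognormal shape parameter are illustrative assumptions):

    import numpy as np
    from scipy.stats import lognorm

    s, alpha, sigma = 0.25, 0.3, 0.2
    phi = lognorm(sigma).pdf            # shock density, lognormal for concreteness

    def p(x, y):
        # Stochastic kernel for k' = s k^alpha W: change of variables w = y/(s x^alpha)
        return phi(y / (s * x**alpha)) / (s * x**alpha)

    # Long Markov chain (x_t) from the model
    n = 10_000
    k = np.empty(n)
    k[0] = 1.0
    for t in range(n - 1):
        k[t + 1] = s * k[t]**alpha * np.random.lognormal(sigma=sigma)

    # psi_n(y) = (1/n) sum_t p(x_t, y), evaluated over a grid of y's
    ygrid = np.linspace(0.01, 1.0, 100)
    psi_hat = np.array([p(k, y).mean() for y in ygrid])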

2 Optimal Growth

2.1 Optimization

Output is the state variable. It evolves according to yt+1 = f(kt, Wt+1), where the shocks Wt are iid, distributed according to φ on Z = (0,∞). The agent uses a policy σ that maps from output yt to savings kt, with the rest being consumed. So, capital is used up completely in production. σ satisfies 0 ≤ σ(y) ≤ y for every y ∈ S, and the set of feasible maps is Σ.

Each σ induces an SRS

yt+1 = f(σ(yt), Wt+1),   (Wt) iid ∼ φ,   y0 = y    (8)

y is the initial output or income. The agent has a felicity function U and discount factor β ∈ (0,1), and maximizes vσ(y) over all σ ∈ Σ, where

vσ(y) ≡ E[ ∑_{t=0}^{∞} β^t U(yt − σ(yt)) | y0 = y ]    (9)

We are assuming that U is bounded and continuous and f is continuous.

The expectations operator can be taken inside:

vσ(y) = ∑_{t=0}^{∞} β^t E[ U(yt − σ(yt)) | y0 = y ]

The simplification is that the expectations are now with respect to the marginal densities at each time t, and hence are integrals over ℝ rather than over a complicated set of paths.

The value function v, defined by v(y) = sup{vσ(y) : σ ∈ Σ} for all y ∈ S, satisfies a Bellman equation. Letting Γ(y) = [0, y] be the feasible actions/savings when output is y, the Bellman equation here is

v(y) = max_{k ∈ Γ(y)} { U(y − k) + β ∫ v(f(k, z)) φ(z) dz }   (y ∈ S)    (10)

Note that the choice of k today determines the distribution of states (here, outputs) f(k, z) tomorrow, via the random shock z. So each state f(k, z) is weighted with density φ(z).


It can also be shown that v is continuous. On the set bcS of bounded continuous real-valued functions on S, a policy σ ∈ Σ is w-greedy (for w ∈ bcS) if

σ(y) ∈ argmax_{k ∈ Γ(y)} { U(y − k) + β ∫ w(f(k, z)) φ(z) dz }   (y ∈ S)    (11)

We can show that a policy function σ is optimal if and only if it is v-greedy.

Define the Bellman operator T carrying maps w ∈ bcS to Tw ∈ bcS as follows:

Tw(y) = max_{k ∈ Γ(y)} { U(y − k) + β ∫ w(f(k, z)) φ(z) dz }   (y ∈ S)    (12)

We can show that T is a uniform contraction with modulus β on the metric space (bcS, d), where d(v, w) = sup_{y ∈ S} |v(y) − w(y)|. So, by Banach's theorem, the value function v is the unique fixed point of the map T. This also suggests value iteration as a method to approximate v, after which a v-greedy policy may be computed.

2.2 Fitted Value Iteration

The value function is now over an infinite state space: there is one maximization problem for each such state. One way around this was discussed in the first part of Chapter 3 of Ljungqvist and Sargent: convert the problem into a finite-state problem with a very large number of states, representing a grid over ℝ. The approximation properties could be poor or okay; we don't discuss this here.

Another alternative is to use functions that can be stored with a finite set of parameters: polynomials. But again, w ↦ Tw may not be easy to represent with lower-order polynomials. Orthogonal polynomials may be viable, but we don't do this here.

Yet another alternative is fitted value iteration. Start with some value function w, and on a suitable grid G of points (states), evaluate Tw. Then extend this to the rest of the state space using interpolation. The simplest is linear interpolation, which joins the set of points P = {(g, Tw(g)) : g ∈ G} with line segments. If (x1, y1), (x2, y2) are two points in P, and we wish to evaluate the linear interpolant at x ∈ (x1, x2), it is the point y s.t.

y − y1 = [(y2 − y1) / (x2 − x1)] (x − x1)

While other interpolations (e.g. a cubic spline) are smoother, more accurate, etc., what we are after is how the Bellman iterations converge: this property is good for linear interpolation, and need not be good for other interpolations even though they may be more accurate as interpolations.

Thus consider the composition T̂ = L ∘ T, where T is the Bellman operator and L is the interpolation operator. It can be shown that L is non-expansive. Then, since T is uniformly contracting, so is L ∘ T. Indeed, for the modulus λ w.r.t. which T is uniformly contracting, we have for any value functions v, w:

d(L(T(v)), L(T(w))) ≤ d(T(v), T(w)) ≤ λ d(v, w)

where the first inequality is due to the non-expansiveness of L. So, by Banach's theorem, any sequence (T̂^n(w))_n is Cauchy and converges in the metric space bcS. (Question: how do we know convergence is to the value function, the limit of (T^n(w))_n?)

Fitted value iteration is implemented in fvi.py. This uses the scipy function interp, which does linear interpolation. Alternatively, you could follow Stachurski and work with the LinInterp class.
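For concreteness, here is a self-contained sketch of fitted value iteration under illustrative assumptions (log utility U(c) = log c, production f(k, z) = k^α z, lognormal shocks, and a Monte Carlo approximation of the integral; none of this is taken from fvi.py):

    import numpy as np

    alpha, beta = 0.3, 0.95
    grid = np.linspace(1e-2, 4.0, 100)                  # grid G over the state space
    shocks = np.random.lognormal(sigma=0.2, size=500)   # draws approximating the integral

    def T(w_vals):
        """One step of T-hat = L o T: apply T on the grid, extend by linear interpolation."""
        w = lambda y: np.interp(y, grid, w_vals)        # L: piecewise-linear extension
        Tw = np.empty_like(grid)
        for i, y in enumerate(grid):
            # Maximize over savings k on a sub-grid of the feasible set Gamma(y) = [0, y]
            ks = np.linspace(1e-3, y - 1e-3, 50)
            vals = [np.log(y - k) + beta * np.mean(w(k**alpha * shocks)) for k in ks]
            Tw[i] = max(vals)
        return Tw

    w = np.zeros_like(grid)
    for _ in range(50):                                 # iterate toward the fixed point
        w = T(w)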


2.3 Fitted Policy Iteration

The idea is the same as policy iteration in the case of finite state processes.

Start with a policy σ, evaluate the value vσ of following it, then find a vσ-greedy policy, and iterate.

How do we evaluate vσ(y) = ∑_{t=0}^{∞} β^t E[U(yt − σ(yt))]?

One alternative is to evaluate E[U(yt − σ(yt))] for each t = 1, 2, …, T (T large) by Monte Carlo simulation: get a large random sample of yt's (e.g. by using the class SRS), and then compute the mean of the U(yt − σ(yt))'s. Note that we must do this for a grid of initial states and interpolate to recover the function.

Stachurski suggests, for the present single-state-variable problem, an alternative based on the following iteration. For σ ∈ Σ, define Tσ, which maps a value function w ∈ bcS into Tσw ∈ bcS:

Tσw(y) = U(y − σ(y)) + β ∫ w(f(σ(y), z)) φ(z) dz   (y ∈ S)    (13)

It can be shown that Tσ is uniformly contracting with modulus β, and its unique fixed point is vσ. Based on this and Listing 6.6, do a fitted policy iteration for the growth problem for homework during Spring Break (i.e. Stachurski exercise 6.2.4).
