Lecture 9: Linear Quadratic Control Problems
1 Undiscounted Problem (Deterministic)
Choose $(u_t)_{t=0}^{\infty}$ to minimize
$$\sum_{t=0}^{\infty} \left( x_t' R x_t + u_t' Q u_t \right) \quad \text{subject to } x_{t+1} = A x_t + B u_t, \; x_0 \text{ given.}$$
Here $x_t$ is an $n$-vector state, $u_t$ a $k$-vector control; $R$ and $Q$ are symmetric, positive semidefinite (psd) and positive definite (pd) respectively; $A$ and $B$ are $n \times n$ and $n \times k$. So the objective function is quadratic and the state transition law is linear, hence this is called a linear quadratic control problem.
We guess that the value function is quadratic: $V(x) = -x' P x$, where $P$ is symmetric psd. The value function satisfies Bellman's equation:
$$V(x) = \max_u \left[ -x' R x - u' Q u + V(y) \right]$$
where $y = A x + B u$. Substituting the guess for $V(x)$ and $V(y)$:
$$-x' P x = \max_u \left[ -x' R x - u' Q u - (A x + B u)' P (A x + B u) \right]$$
Get the optimal interior $u$ on the RHS through a first-order condition (FOC):
$$-2 u' Q - 2 (A x + B u)' P B = 0 .$$
Taking the transpose and rearranging, $Q u = -B' P (A x + B u) = -B' P A x - B' P B u$, so $(Q + B' P B) u = -B' P A x$, or
$$u = -F x, \quad \text{where } F = (Q + B' P B)^{-1} B' P A .$$
Now plug $u = -F x$ back into Bellman's equation. We have
$$-x' P x = -x' R x - x' F' Q F x - \left[ \{(A - B F) x\}' P \{(A - B F) x\} \right]$$
or
$$x' P x = x' \left[ R + A' P A + F' (Q + B' P B) F - F' B' P A - A' P B F \right] x \qquad (*)$$
Now notice, using symmetry, that
$F' = (B' P A)' (Q + B' P B)^{-1} = A' P B (Q + B' P B)^{-1}$. So,
$$F' (Q + B' P B) F = A' P B (Q + B' P B)^{-1} (Q + B' P B) (Q + B' P B)^{-1} B' P A$$
$$= A' P B (Q + B' P B)^{-1} B' P A = A' P B F = F' B' P A .$$
Thus the last three terms of $(*)$ are the same, and after cancellation $(*)$ yields
$$P = R + A' P A - A' P B (Q + B' P B)^{-1} B' P A \qquad (**)$$
$(**)$ is an algebraic matrix Riccati equation: a functional equation for the matrix $P$. Under some sufficient conditions on the matrices in the objective function and the transition law, it has a unique solution, which can be obtained using the recursion
$$P_{j+1} = R + A' P_j A - A' P_j B (Q + B' P_j B)^{-1} B' P_j A \qquad (*')$$
starting with $P_0$ equal to the zero matrix and proceeding to the limit. In fact, this recursion gives us the value function for finite-horizon problems as well; there we simply stop after $T$ iterations, where $T$ is the horizon length.
To solve a linear quadratic problem, therefore, one can derive its algebraic matrix Riccati equation, solve it using a recursion like the one above, and then substitute the solution into the matrix $F$ to get the optimal policy function $u_t = -F x_t$.
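A minimal numpy sketch of this procedure for the undiscounted problem, using small illustrative matrices that are assumptions for demonstration rather than part of the lecture:

```python
import numpy as np

def solve_riccati(R, Q, A, B, tol=1e-10, max_iter=10_000):
    """Iterate P_{j+1} = R + A'P_j A - A'P_j B (Q + B'P_j B)^{-1} B'P_j A from P_0 = 0."""
    P = np.zeros_like(R)
    for _ in range(max_iter):
        M = Q + B.T @ P @ B
        P_new = R + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(M, B.T @ P @ A)
        if np.max(np.abs(P_new - P)) < tol:
            P = P_new
            break
        P = P_new
    F = np.linalg.solve(Q + B.T @ P @ B, B.T @ P @ A)   # optimal policy: u_t = -F x_t
    return P, F

# Illustrative 2-state, 1-control example (matrices made up for demonstration)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
R = np.eye(2)            # symmetric psd
Q = np.array([[0.5]])    # symmetric pd
P, F = solve_riccati(R, Q, A, B)
print("Eigenvalues of A - BF:", np.linalg.eigvals(A - B @ F))  # relates to the stability condition below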
The optimized system then evolves according to $x_{t+1} = (A - B F) x_t$. It is stable if $\lim_{t \to \infty} x_t = 0$ starting from any $x_0 \in \mathbb{R}^n$. In fact, the system is stable if all eigenvalues of $(A - B F)$ are less than 1 in absolute value. (This is easy to see in the case that all eigenvalues are distinct and less than 1 in absolute value: then $(A - B F) = D \Lambda D^{-1}$, where $\Lambda$ is the diagonal matrix of eigenvalues and $D$ a matrix of eigenvectors. Then $x_t = (A - B F)^t x_0 = D \Lambda^t D^{-1} x_0$, which converges to 0 as $t \to \infty$.)
Sufficient conditions for this are discussed in the literature, and ensure a unique solution for the algebraic matrix Riccati equation as well.
2 Discounted Problem (Deterministic)
Choose $(u_t)_{t=0}^{\infty}$ to maximize
$$-\sum_{t=0}^{\infty} \beta^t \left( x_t' R x_t + u_t' Q u_t \right)$$
subject to $x_{t+1} = A x_t + B u_t$, $x_0$ given, $0 < \beta < 1$.
We guess $V(x) = -x' P x$ for a symmetric psd matrix $P$, so $V(y) = -y' P y = -(A x + B u)' P (A x + B u)$. Substitute these into Bellman's equation:
$$-x' P x = \max_u \left[ -x' R x - u' Q u - \beta (A x + B u)' P (A x + B u) \right]$$
The interior FOC with respect to $u$ is
$$-2 u' Q - 2\beta (A x + B u)' P B = 0, \quad \text{or} \quad Q u = -\beta B' P (A x + B u).$$
So $(Q + \beta B' P B) u = -\beta B' P A x$, or $u = -F x$, where
$$F = \beta (Q + \beta B' P B)^{-1} B' P A . \qquad (\#)$$
Substitute this into the Bellman equation:
$$-x' P x = -x' R x - x' F' Q F x - \beta (A x - B F x)' P (A x - B F x)$$
or
$$x' P x = x' \left[ R + \beta A' P A + F' (Q + \beta B' P B) F - \beta A' P B F - \beta F' B' P A \right] x$$
and, using a cancellation akin to that in the undiscounted case, we get
$$P = R + \beta A' P A - \beta^2 A' P B (Q + \beta B' P B)^{-1} B' P A \qquad (\#\#)$$
So, value iteration in the discounted case iterates recursively to solve the algebraic Riccati equation $(\#\#)$, and then uses the resulting $P$ in $(\#)$ to get the optimal policy function.
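A corresponding numpy sketch for the discounted case, differing from the undiscounted one only by the $\beta$ factors in $(\#\#)$ and $(\#)$ (a sketch rather than production code):

```python
import numpy as np

def solve_riccati_discounted(R, Q, A, B, beta, tol=1e-10, max_iter=10_000):
    """Iterate (##): P <- R + beta A'PA - beta^2 A'PB (Q + beta B'PB)^{-1} B'PA from P_0 = 0."""
    P = np.zeros_like(R)
    for _ in range(max_iter):
        M = Q + beta * B.T @ P @ B
        P_new = (R + beta * A.T @ P @ A
                 - beta**2 * A.T @ P @ B @ np.linalg.solve(M, B.T @ P @ A))
        if np.max(np.abs(P_new - P)) < tol:
            P = P_new
            break
        P = P_new
    F = beta * np.linalg.solve(Q + beta * B.T @ P @ B, B.T @ P @ A)  # (#): u_t = -F x_t
    return P, F
```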
3 Stochastic Optimal Linear Regulator
The problem is to maximize
$$-E_0 \left[ \sum_{t=0}^{\infty} \beta^t \left( x_t' R x_t + u_t' Q u_t \right) \right]$$
subject to $x_{t+1} = A x_t + B u_t + C w_{t+1}$, $x_0$ given, $0 < \beta < 1$, where $w_{t+1}$ is an $n$-dimensional random vector that is i.i.d. $N(0, I)$.
The choice is now not of a fixed sequence of $u_t$ vectors, $t = 0, 1, 2, \ldots$; $u_t$ is now permitted to depend on the entire history up to time $t$, and in particular on the sequence $w_1, \ldots, w_t$ of realized shocks up to that point. Since Bellman's equation applies to the value function, $u_t$ will in fact depend only on the current state; in this particular problem, the shocks $w_1, \ldots, w_t$ are all incorporated in the state $x_t$.
Theorem 1 The value function for this problem is $V(x) = -x' P x - d$, where $P$ is the unique symmetric psd solution of the discounted algebraic Riccati equation
$$P = R + \beta A' P A - \beta^2 A' P B (Q + \beta B' P B)^{-1} B' P A$$
and
$$d = \frac{\beta}{1 - \beta} \operatorname{trace}(P C C') .$$
The optimal policy is $u_t = -F x_t$, where $F = \beta (Q + \beta B' P B)^{-1} B' P A$.
Note. $P$ in the value function, and the optimal policy function (characterized by $F$), are the same as in the discounted deterministic problem.
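A small illustrative sketch of the theorem's formulas (the function name is made up; $P$ would come from a discounted Riccati solver such as the one sketched earlier):

```python
import numpy as np

def theorem1_policy(P, Q, A, B, C, beta):
    """Given P solving the discounted Riccati equation, return F and d as in Theorem 1."""
    F = beta * np.linalg.solve(Q + beta * B.T @ P @ B, B.T @ P @ A)  # u_t = -F x_t
    d = beta / (1 - beta) * np.trace(P @ C @ C.T)                    # constant in V(x) = -x'Px - d
    return F, d
```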
Proof of Theorem 1.
Substitute the guess for $V(y)$ into the RHS of Bellman's equation:
$$V(x) = \max_u \left[ -x' R x - u' Q u - \beta E\{(A x + B u + C w)' P (A x + B u + C w)\} - \beta d \right]$$
Inside the braces on the RHS we have
$$x' A' P A x + x' A' P B u + x' A' P C w + u' B' P A x + u' B' P B u + u' B' P C w + w' C' P A x + w' C' P B u + w' C' P C w .$$
Taking expectations of the above, terms 3, 6, 7 and 8 evaluate to zero since $E(w \mid x) = 0$. Moreover, $x' A' P B u$ is a real number and hence equals its transpose $u' B' P A x$. So we have
$$V(x) = \max_u -\left[ x' R x + u' Q u + \beta x' A' P A x + 2\beta x' A' P B u + \beta u' B' P B u + \beta E(w' C' P C w) \right] - \beta d$$
The interior FOC with respect to $u$ is therefore
$$-\left[ 2 u' Q + 2\beta x' A' P B + 2\beta u' B' P B \right] = 0, \quad \text{or} \quad (Q + \beta B' P B) u = -\beta B' P A x,$$
so the optimal policy is $u = -\beta (Q + \beta B' P B)^{-1} B' P A x = -F x$. Note that if $P$ turns out to be the same as in the deterministic case, so will this optimal policy function.
Plug this optimal policy back into Bellman's equation. Note also that $E(w' C' P C w) = \operatorname{trace}(P C C')$ (proof provided later). We have:
-x’Px - d = - {x’Rx + x’F’QFx + βx0A0P Ax−2βx0A0P BF x +βx0F0B0P BF x+βtrace(P CC0)} −βd Equating the constants on both sides,d=βtrace(P CC0) +βd or d= 1−ββ trace(P CC0) as was to be shown.
Now equate the terms containing $x$ on both sides. Note that $F = \beta (Q + \beta B' P B)^{-1} B' P A$, so, using the symmetry of $Q$ and $B' P B$, we have $F' = \beta A' P B (Q + \beta B' P B)^{-1}$.
Terms 2, 4 and 5 on the RHS together are
$$x' F' Q F x - 2\beta x' A' P B F x + \beta x' F' B' P B F x ,$$
which equals
$$\beta^2 x' A' P B (Q + \beta B' P B)^{-1} Q (Q + \beta B' P B)^{-1} B' P A x$$
$$- 2\beta^2 x' A' P B (Q + \beta B' P B)^{-1} B' P A x$$
$$+ \beta^2 x' A' P B (Q + \beta B' P B)^{-1} B' P B (Q + \beta B' P B)^{-1} B' P A x .$$
Adding the first and the third of these terms gives
$$\beta^2 x' A' P B (Q + \beta B' P B)^{-1} (Q + \beta B' P B) (Q + \beta B' P B)^{-1} B' P A x ,$$
so combining this with the second term yields $-\beta^2 x' A' P B (Q + \beta B' P B)^{-1} B' P A x$.
Now the LHS-RHS comparison of the $x$ terms of Bellman's equation yields
$$P = R + \beta A' P A - \beta^2 A' P B (Q + \beta B' P B)^{-1} B' P A .$$
Note. The proof used the fact that $E(w' C' P C w) = \operatorname{trace}(P C C')$. This is due to the following result about expectations of quadratic forms of random vectors.
Proposition 1 Let $x$ be a random $n$-vector with mean $\mu$ and covariance matrix $\Sigma$. Then
$$E(x' A x) = \operatorname{trace}(A \Sigma) + \mu' A \mu .$$
Proof. $x' A x = \sum_{i,j=1}^{n} a_{ij} x_i x_j$. So
$$E(x' A x) = \sum_{i,j=1}^{n} a_{ij} E(x_i x_j),$$
by the linearity of the expectations operator. Since $\sigma_{ij} \equiv \operatorname{Cov}(x_i, x_j) = E(x_i x_j) - \mu_i \mu_j$,
$$E(x' A x) = \sum_{i,j} a_{ij} (\sigma_{ij} + \mu_i \mu_j) = \sum_{i,j} a_{ij} \sigma_{ji} + \sum_{i,j} a_{ij} \mu_i \mu_j, \quad \text{since } \sigma_{ij} = \sigma_{ji} .$$
This equals $\sum_{i=1}^{n} \left[ \sum_{j=1}^{n} a_{ij} \sigma_{ji} \right] + \mu' A \mu$.
Note that in the first of these terms, the inner sum is the dot product of the $i$th row of $A$ and the $i$th column of $\Sigma$, which equals the $i$th diagonal element of $A \Sigma$. So the first term evaluates to $\operatorname{trace}(A \Sigma)$.
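A quick numerical sanity check of Proposition 1; the dimensions and matrices below are chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))
mu = rng.standard_normal(n)
L = rng.standard_normal((n, n))
Sigma = L @ L.T                      # a valid covariance matrix

# Monte Carlo estimate of E[x'Ax] for x ~ N(mu, Sigma)
x = rng.multivariate_normal(mu, Sigma, size=200_000)
mc = np.mean(np.einsum("ti,ij,tj->t", x, A, x))

exact = np.trace(A @ Sigma) + mu @ A @ mu
print(mc, exact)                     # the two numbers should be close
```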
The class LQ (in lq.py) in quantecon implements solutions to finite- and infinite-horizon linear quadratic problems (both deterministic and stochastic, with i.i.d. Normal shocks). The object attributes of this class include the matrices $R, Q, A, B, C$ and the discount factor $\beta$. The class has a method to solve for $P, F, d$, and a method to generate time series for $x_t$, etc., under the assumption that the decision maker uses the optimal policy function $u_t = -F x_t$.
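A rough usage sketch follows; the constructor argument order and method names are those of the quantecon LQ class as commonly documented (consult the package documentation for exact signatures), and the matrices are the same illustrative ones used above:

```python
import numpy as np
import quantecon as qe

# Illustrative matrices (assumptions for demonstration, not from the lecture)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[0.1], [0.1]])
R = np.eye(2)
Q = np.array([[0.5]])

lq = qe.LQ(Q, R, A, B, C=C, beta=0.95)        # note the Q-before-R argument order
P, F, d = lq.stationary_values()              # solves for P, F, d
x0 = np.array([1.0, 1.0])
x_path, u_path, w_path = lq.compute_sequence(x0, ts_length=50)  # simulate under u_t = -F x_t
```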
We move to this next and use it to solve the problem of an infinitely lived monopoly with output adjustment costs, facing stochastic demand each period.
4 Monopoly with Adjustment Costs
An infinitely lived monopoly at each time $t$ faces the inverse demand $p_t = a_0 - a_1 q_t + d_t$, where $(d_t)$ follows
$$d_{t+1} = \rho d_t + \sigma w_{t+1},$$
with $(w_t)$ i.i.d. standard normal. The term $d_t$ is a demand intercept shifter.
The monopolist chooses $(q_t)$ (functions of the realized shocks up to that time) to maximize $E\left[ \sum_{t=0}^{\infty} \beta^t \pi_t \right]$, where
$$\pi_t = p_t q_t - c q_t - \gamma (q_{t+1} - q_t)^2 .$$
The last term captures convex adjustment costs of changing output. If $\gamma = 0$, the monopolist would simply choose the static profit-maximizing output $\bar{q}_t = (a_0 - c + d_t)/(2 a_1)$ in period $t$, and this would fluctuate with movements in $d_t$. But if $\gamma$ is large, it is quite costly to adjust output, so $q_t$ would respond less frenetically to shocks in $d_t$, smoothing output and price fluctuations.
Now cast this as an LQ control problem. The form of $\pi_t$ suggests that we let $u_t = q_{t+1} - q_t$, so that with $Q = \gamma$, $u_t' Q u_t = \gamma (q_{t+1} - q_t)^2$. Now for the state vector.
Notice that $(p_t - c) q_t = (a_0 - a_1 q_t + d_t - c) q_t = -a_1 q_t^2 + (a_0 - c + d_t) q_t$. This equals $-a_1 q_t^2 + (2 a_1 \bar{q}_t) q_t$. We have eliminated $c$ and $d_t$ by bringing in $\bar{q}_t$. This suggests we complete the square by adding $-a_1 \bar{q}_t^2$, to get a quadratic form $-a_1 q_t^2 + (2 a_1 \bar{q}_t) q_t - a_1 \bar{q}_t^2$ in the vector $(\bar{q}_t, q_t)$, and this vector can then be the state vector.
We will need to write the evolution of the state vector. By the choice of $u_t$, $q_{t+1} = q_t + u_t$. And
$$\bar{q}_{t+1} = (a_0 - c + d_{t+1})/(2 a_1) = m_0 + m_1 d_{t+1},$$
where $m_0 = (a_0 - c)/(2 a_1)$ and $m_1 = 1/(2 a_1)$. Since $d_{t+1} = \rho d_t + \sigma w_{t+1}$ and $d_t = (\bar{q}_t - m_0)/m_1$, this equals
$$m_0 + m_1 \rho \, \frac{\bar{q}_t - m_0}{m_1} + m_1 \sigma w_{t+1} .$$
So,
$$\bar{q}_{t+1} = m_0 (1 - \rho) + \rho \bar{q}_t + m_1 \sigma w_{t+1} .$$
The first, constant term on the RHS creates a problem: in order to write the transition law as $x_{t+1} = A x_t + B u_t + C w_{t+1}$ and incorporate the constant term in the $\bar{q}_{t+1}$ evolution, we need to expand the state vector.
So, let $x_t = (\bar{q}_t \;\; q_t \;\; 1)'$. Let
$$\hat{\pi}_t = \pi_t - a_1 \bar{q}_t^2 = -a_1 (q_t - \bar{q}_t)^2 - \gamma u_t^2 .$$
We minimize
$$E_0 \left[ \sum_{t=0}^{\infty} \beta^t \left( a_1 (q_t - \bar{q}_t)^2 + \gamma u_t^2 \right) \right] .$$
Notice that
$$a_1 (q_t - \bar{q}_t)^2 = x_t' R x_t,$$
where the symmetric matrix $R$ has $r_{11} = r_{22} = a_1$, $r_{12} = r_{21} = -a_1$, and all other entries equal to 0. As stated earlier, $Q = \gamma$. The objective function is then
$$E_0 \left[ \sum_{t=0}^{\infty} \beta^t \left( x_t' R x_t + u_t' Q u_t \right) \right] .$$
The transition law is
$$x_{t+1} = A x_t + B u_t + C w_{t+1},$$
where $x_{t+1} = (\bar{q}_{t+1} \;\; q_{t+1} \;\; 1)'$, $u_t = q_{t+1} - q_t$,
$$A = \begin{pmatrix} \rho & 0 & m_0 (1 - \rho) \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad B = (0 \;\; 1 \;\; 0)', \qquad C = (m_1 \sigma \;\; 0 \;\; 0)' .$$
We now have the object attributes $R, Q, A, B, C, \beta$ needed to make this model an instance of the class LQ.
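The following sketch assembles these matrices and hands them to quantecon's LQ class; the parameter values are purely illustrative assumptions:

```python
import numpy as np
import quantecon as qe

# Illustrative parameter values (assumptions chosen only for demonstration)
a0, a1, c, gamma, rho, sigma, beta = 100.0, 1.0, 10.0, 5.0, 0.9, 2.0, 0.95
m0, m1 = (a0 - c) / (2 * a1), 1 / (2 * a1)

R = np.array([[ a1, -a1, 0.0],
              [-a1,  a1, 0.0],
              [0.0, 0.0, 0.0]])
Q = np.array([[gamma]])
A = np.array([[rho, 0.0, m0 * (1 - rho)],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
B = np.array([[0.0], [1.0], [0.0]])
C = np.array([[m1 * sigma], [0.0], [0.0]])

lq = qe.LQ(Q, R, A, B, C=C, beta=beta)
P, F, d = lq.stationary_values()
x0 = np.array([m0, 2.0, 1.0])                       # state (qbar, q, 1) at t = 0
x_path, u_path, w_path = lq.compute_sequence(x0, ts_length=100)
q_bar_path, q_path = x_path[0, :], x_path[1, :]
```

The simulated path of $q_t$ can then be compared with that of $\bar{q}_t$ to see how the adjustment cost smooths the response of output to demand shocks.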