Lecture 9: Linear Quadratic Control Problems
1 Undiscounted Problem (Deterministic)
Choose $(u_t)_{t=0}^{\infty}$ to minimize
$$\sum_{t=0}^{\infty} \left( x_t' R x_t + u_t' Q u_t \right) \quad \text{subject to } x_{t+1} = A x_t + B u_t, \; x_0 \text{ given.}$$
Here $x_t$ is an $n$-vector state, $u_t$ a $k$-vector control; $R$ and $Q$ are symmetric, positive semidefinite (psd) and positive definite (pd) respectively; $A$ and $B$ are $n \times n$ and $n \times k$. So the objective function is quadratic and the state transition law is linear, hence this is called a linear quadratic control problem.
We guess that the value function is quadratic: $V(x) = -x' P x$, where $P$ is symmetric psd. The value function satisfies Bellman's equation:
$$V(x) = \max_u \left[ -x' R x - u' Q u + V(y) \right]$$
where $y = A x + B u$. Substituting the guess for $V(x)$ and $V(y)$:
$$-x' P x = \max_u \left[ -x' R x - u' Q u - (A x + B u)' P (A x + B u) \right]$$
Get the optimal interior $u$ on the RHS through a first-order condition (FOC):
$$-2 u' Q - 2 (A x + B u)' P B = 0 .$$
Taking the transpose and rearranging, $Q u = -B' P (A x + B u) = -B' P A x - B' P B u$, so $(Q + B' P B) u = -B' P A x$, or
$$u = -F x, \quad \text{where } F = (Q + B' P B)^{-1} B' P A .$$
Now plug $u = -F x$ back into Bellman's equation. We have
$$-x' P x = -x' R x - x' F' Q F x - \left[ \{(A - B F) x\}' P \{(A - B F) x\} \right]$$
or
$$x' P x = x' \left[ R + A' P A + F' (Q + B' P B) F - F' B' P A - A' P B F \right] x \qquad (*)$$
Now notice, using symmetry, that
$F' = (B' P A)' (Q + B' P B)^{-1} = A' P B (Q + B' P B)^{-1}$. So,
$$F' (Q + B' P B) F = A' P B (Q + B' P B)^{-1} (Q + B' P B) (Q + B' P B)^{-1} B' P A$$
$$= A' P B (Q + B' P B)^{-1} B' P A = A' P B F = F' B' P A .$$
Thus the last three terms of $(*)$ are the same, and after cancellation $(*)$ yields
$$P = R + A' P A - A' P B (Q + B' P B)^{-1} B' P A \qquad (**)$$
$(**)$ is an algebraic matrix Riccati equation: a functional equation for the matrix $P$. Under some sufficient conditions on the matrices in the objective function and the transition law, it has a unique solution, which can be obtained using the recursion
$$P_{j+1} = R + A' P_j A - A' P_j B (Q + B' P_j B)^{-1} B' P_j A \qquad (*')$$
starting with $P_0$ equal to the zero matrix and proceeding to the limit. In fact, this recursion gives us the value function for finite-horizon problems as well; there we simply stop after $T$ iterations, where $T$ is the horizon length.
To solve a linear quadratic problem, therefore, one can derive its algebraic matrix Riccati equation, solve it using a recursion like the one above, and then substitute the solution into the matrix $F$ to get the optimal policy function $u_t = -F x_t$.
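A minimal numpy sketch of this procedure for the undiscounted problem, using small illustrative matrices that are assumptions for demonstration rather than part of the lecture:

```python
import numpy as np

def solve_riccati(R, Q, A, B, tol=1e-10, max_iter=10_000):
    """Iterate P_{j+1} = R + A'P_j A - A'P_j B (Q + B'P_j B)^{-1} B'P_j A from P_0 = 0."""
    P = np.zeros_like(R)
    for _ in range(max_iter):
        M = Q + B.T @ P @ B
        P_new = R + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(M, B.T @ P @ A)
        if np.max(np.abs(P_new - P)) < tol:
            P = P_new
            break
        P = P_new
    F = np.linalg.solve(Q + B.T @ P @ B, B.T @ P @ A)   # optimal policy: u_t = -F x_t
    return P, F

# Illustrative 2-state, 1-control example (matrices made up for demonstration)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
R = np.eye(2)            # symmetric psd
Q = np.array([[0.5]])    # symmetric pd
P, F = solve_riccati(R, Q, A, B)
print("Eigenvalues of A - BF:", np.linalg.eigvals(A - B @ F))  # relates to the stability condition below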
The optimized system then evolves according to $x_{t+1} = (A - B F) x_t$. It is stable if $\lim_{t \to \infty} x_t = 0$ starting from any $x_0 \in \mathbb{R}^n$. In fact, the system is stable if all eigenvalues of $(A - B F)$ are less than 1 in absolute value. (This is easy to see in the case that all eigenvalues are distinct and less than 1 in absolute value: then $(A - B F) = D \Lambda D^{-1}$, where $\Lambda$ is the diagonal matrix of eigenvalues and $D$ a matrix of eigenvectors. Then $x_t = (A - B F)^t x_0 = D \Lambda^t D^{-1} x_0$, which converges to 0 as $t \to \infty$.)
Sufficient conditions for this are discussed in the literature, and ensure a unique solution for the algebraic matrix Riccati equation as well.
2 Discounted Problem (Deterministic)
Choose $(u_t)_{t=0}^{\infty}$ to maximize
$$-\sum_{t=0}^{\infty} \beta^t \left( x_t' R x_t + u_t' Q u_t \right)$$
subject to $x_{t+1} = A x_t + B u_t$, $x_0$ given, $0 < \beta < 1$.
We guess $V(x) = -x' P x$ for a symmetric psd matrix $P$, so $V(y) = -y' P y = -(A x + B u)' P (A x + B u)$. Substitute these into Bellman's equation:
$$-x' P x = \max_u \left[ -x' R x - u' Q u - \beta (A x + B u)' P (A x + B u) \right]$$
The interior FOC with respect to $u$ is
$$-2 u' Q - 2\beta (A x + B u)' P B = 0, \quad \text{or} \quad Q u = -\beta B' P (A x + B u).$$
So $(Q + \beta B' P B) u = -\beta B' P A x$, or $u = -F x$, where
$$F = \beta (Q + \beta B' P B)^{-1} B' P A . \qquad (\#)$$
Substitute this into the Bellman equation:
$$-x' P x = -x' R x - x' F' Q F x - \beta (A x - B F x)' P (A x - B F x)$$
or
$$x' P x = x' \left[ R + \beta A' P A + F' (Q + \beta B' P B) F - \beta A' P B F - \beta F' B' P A \right] x$$
and, using a cancellation akin to that in the undiscounted case, we get
$$P = R + \beta A' P A - \beta^2 A' P B (Q + \beta B' P B)^{-1} B' P A \qquad (\#\#)$$
So, value iteration in the discounted case iterates recursively to solve the algebraic Riccati equation $(\#\#)$, and then uses the resulting $P$ in $(\#)$ to get the optimal policy function.
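A corresponding numpy sketch for the discounted case, differing from the undiscounted one only by the $\beta$ factors in $(\#\#)$ and $(\#)$ (a sketch rather than production code):

```python
import numpy as np

def solve_riccati_discounted(R, Q, A, B, beta, tol=1e-10, max_iter=10_000):
    """Iterate (##): P <- R + beta A'PA - beta^2 A'PB (Q + beta B'PB)^{-1} B'PA from P_0 = 0."""
    P = np.zeros_like(R)
    for _ in range(max_iter):
        M = Q + beta * B.T @ P @ B
        P_new = (R + beta * A.T @ P @ A
                 - beta**2 * A.T @ P @ B @ np.linalg.solve(M, B.T @ P @ A))
        if np.max(np.abs(P_new - P)) < tol:
            P = P_new
            break
        P = P_new
    F = beta * np.linalg.solve(Q + beta * B.T @ P @ B, B.T @ P @ A)  # (#): u_t = -F x_t
    return P, F
```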
3 Stochastic Optimal Linear Regulator
The problem is to maximize
$$-E_0 \left[ \sum_{t=0}^{\infty} \beta^t \left( x_t' R x_t + u_t' Q u_t \right) \right]$$
subject to $x_{t+1} = A x_t + B u_t + C w_{t+1}$, $x_0$ given, $0 < \beta < 1$, where $w_{t+1}$ is an $n$-dimensional random vector that is i.i.d. $N(0, I)$.
The choice is now not of a fixed sequence of $u_t$ vectors, $t = 0, 1, 2, \ldots$; $u_t$ is now permitted to depend on the entire history up to time $t$, and in particular on the sequence $w_1, \ldots, w_t$ of realized shocks up to that point. Since Bellman's equation applies to the value function, $u_t$ will in fact depend only on the current state; in this particular problem, the shocks $w_1, \ldots, w_t$ are all incorporated in the state $x_t$.
Theorem 1 The value function for this problem is $V(x) = -x' P x - d$, where $P$ is the unique symmetric psd solution of the discounted algebraic Riccati equation
$$P = R + \beta A' P A - \beta^2 A' P B (Q + \beta B' P B)^{-1} B' P A$$
and
$$d = \frac{\beta}{1 - \beta} \operatorname{trace}(P C C') .$$
The optimal policy is $u_t = -F x_t$, where $F = \beta (Q + \beta B' P B)^{-1} B' P A$.
Note. $P$ in the value function, and the optimal policy function (characterized by $F$), are the same as in the discounted deterministic problem.
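A small illustrative sketch of the theorem's formulas (the function name is made up; $P$ would come from a discounted Riccati solver such as the one sketched earlier):

```python
import numpy as np

def theorem1_policy(P, Q, A, B, C, beta):
    """Given P solving the discounted Riccati equation, return F and d as in Theorem 1."""
    F = beta * np.linalg.solve(Q + beta * B.T @ P @ B, B.T @ P @ A)  # u_t = -F x_t
    d = beta / (1 - beta) * np.trace(P @ C @ C.T)                    # constant in V(x) = -x'Px - d
    return F, d
```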
Proof of Theorem 1.
Substitute the guess for $V(y)$ into the RHS of Bellman's equation:
$$V(x) = \max_u \left[ -x' R x - u' Q u - \beta E\{(A x + B u + C w)' P (A x + B u + C w)\} - \beta d \right]$$
Inside the braces on the RHS we have
$$x' A' P A x + x' A' P B u + x' A' P C w + u' B' P A x + u' B' P B u + u' B' P C w + w' C' P A x + w' C' P B u + w' C' P C w .$$
Taking expectations of the above, terms 3, 6, 7 and 8 evaluate to zero since $E(w \mid x) = 0$. Moreover, $x' A' P B u$ is a real number and hence equals its transpose $u' B' P A x$. So we have
$$V(x) = \max_u -\left[ x' R x + u' Q u + \beta x' A' P A x + 2\beta x' A' P B u + \beta u' B' P B u + \beta E(w' C' P C w) \right] - \beta d$$
The interior FOC with respect to $u$ is therefore
$$-\left[ 2 u' Q + 2\beta x' A' P B + 2\beta u' B' P B \right] = 0, \quad \text{or} \quad (Q + \beta B' P B) u = -\beta B' P A x,$$
so the optimal policy is $u = -\beta (Q + \beta B' P B)^{-1} B' P A x = -F x$. Note that if $P$ turns out to be the same as in the deterministic case, so will this optimal policy function.
Plug this optimal policy back into Bellman's equation. Note also that $E(w' C' P C w) = \operatorname{trace}(P C C')$ (proof provided later). We have:
-x’Px - d = - {x’Rx + x’F’QFx + βx0A0P Ax−2βx0A0P BF x +βx0F0B0P BF x+βtrace(P CC0)} −βd Equating the constants on both sides,d=βtrace(P CC0) +βd or d= 1−ββ trace(P CC0) as was to be shown.
Now equate the terms containing $x$ on both sides. Note that $F = \beta (Q + \beta B' P B)^{-1} B' P A$, so, using the symmetry of $Q$ and $B' P B$, we have $F' = \beta A' P B (Q + \beta B' P B)^{-1}$.
Terms 2, 4 and 5 on the RHS together are
$$x' F' Q F x - 2\beta x' A' P B F x + \beta x' F' B' P B F x ,$$
which equals
$$\beta^2 x' A' P B (Q + \beta B' P B)^{-1} Q (Q + \beta B' P B)^{-1} B' P A x$$
$$- 2\beta^2 x' A' P B (Q + \beta B' P B)^{-1} B' P A x$$
$$+ \beta^2 x' A' P B (Q + \beta B' P B)^{-1} B' P B (Q + \beta B' P B)^{-1} B' P A x .$$
Adding the first and the third of these terms gives
$$\beta^2 x' A' P B (Q + \beta B' P B)^{-1} (Q + \beta B' P B) (Q + \beta B' P B)^{-1} B' P A x ,$$
so combining this with the second term yields $-\beta^2 x' A' P B (Q + \beta B' P B)^{-1} B' P A x$.
Now the LHS-RHS comparison of the $x$ terms of Bellman's equation yields
$$P = R + \beta A' P A - \beta^2 A' P B (Q + \beta B' P B)^{-1} B' P A .$$
Note. The proof used the fact that $E(w' C' P C w) = \operatorname{trace}(P C C')$. This is due to the following result about expectations of quadratic forms of random vectors.
Proposition 1 Let $x$ be a random $n$-vector with mean $\mu$ and covariance matrix $\Sigma$. Then
$$E(x' A x) = \operatorname{trace}(A \Sigma) + \mu' A \mu .$$
Proof. $x' A x = \sum_{i,j=1}^{n} a_{ij} x_i x_j$. So
$$E(x' A x) = \sum_{i,j=1}^{n} a_{ij} E(x_i x_j),$$
by the linearity of the expectations operator. Since $\sigma_{ij} \equiv \operatorname{Cov}(x_i, x_j) = E(x_i x_j) - \mu_i \mu_j$,
$$E(x' A x) = \sum_{i,j} a_{ij} (\sigma_{ij} + \mu_i \mu_j) = \sum_{i,j} a_{ij} \sigma_{ji} + \sum_{i,j} a_{ij} \mu_i \mu_j, \quad \text{since } \sigma_{ij} = \sigma_{ji} .$$
This equals $\sum_{i=1}^{n} \left[ \sum_{j=1}^{n} a_{ij} \sigma_{ji} \right] + \mu' A \mu$.
Note that in the first of these terms, the inner sum is the dot product of the $i$th row of $A$ and the $i$th column of $\Sigma$, which equals the $i$th diagonal element of $A \Sigma$. So the first term evaluates to $\operatorname{trace}(A \Sigma)$.
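A quick numerical sanity check of Proposition 1; the dimensions and matrices below are chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))
mu = rng.standard_normal(n)
L = rng.standard_normal((n, n))
Sigma = L @ L.T                      # a valid covariance matrix

# Monte Carlo estimate of E[x'Ax] for x ~ N(mu, Sigma)
x = rng.multivariate_normal(mu, Sigma, size=200_000)
mc = np.mean(np.einsum("ti,ij,tj->t", x, A, x))

exact = np.trace(A @ Sigma) + mu @ A @ mu
print(mc, exact)                     # the two numbers should be close
```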
The class LQ (in lq.py) in quantecon implements solutions to finite- and infinite-horizon linear quadratic problems (both deterministic and stochastic, with i.i.d. Normal shocks). The object attributes of this class include the matrices $R, Q, A, B, C$ and the discount factor $\beta$. The class has a method to solve for $P, F, d$, and a method to generate time series for $x_t$, etc., under the assumption that the decision maker uses the optimal policy function $u_t = -F x_t$.
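A rough usage sketch follows; the constructor argument order and method names are those of the quantecon LQ class as commonly documented (consult the package documentation for exact signatures), and the matrices are the same illustrative ones used above:

```python
import numpy as np
import quantecon as qe

# Illustrative matrices (assumptions for demonstration, not from the lecture)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[0.1], [0.1]])
R = np.eye(2)
Q = np.array([[0.5]])

lq = qe.LQ(Q, R, A, B, C=C, beta=0.95)        # note the Q-before-R argument order
P, F, d = lq.stationary_values()              # solves for P, F, d
x0 = np.array([1.0, 1.0])
x_path, u_path, w_path = lq.compute_sequence(x0, ts_length=50)  # simulate under u_t = -F x_t
```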
We move to this next and use it to solve the problem of an infinitely lived monopoly with output adjustment costs, facing stochastic demand each period.
4 Monopoly with Adjustment Costs
An infinitely lived monopoly at each time $t$ faces the inverse demand $p_t = a_0 - a_1 q_t + d_t$, where $(d_t)$ follows
$$d_{t+1} = \rho d_t + \sigma w_{t+1},$$
with $(w_t)$ i.i.d. standard normal. The term $d_t$ is a demand intercept shifter.
The monopolist chooses $(q_t)$ (functions of the realized shocks up to that time) to maximize $E\left[ \sum_{t=0}^{\infty} \beta^t \pi_t \right]$, where
$$\pi_t = p_t q_t - c q_t - \gamma (q_{t+1} - q_t)^2 .$$
The last term captures convex adjustment costs of changing output. If $\gamma = 0$, the monopolist would simply choose the static profit-maximizing output $\bar{q}_t = (a_0 - c + d_t)/(2 a_1)$ in period $t$, and this would fluctuate with movements in $d_t$. But if $\gamma$ is large, it is quite costly to adjust output, so $q_t$ would respond less frenetically to shocks in $d_t$, smoothing output and price fluctuations.
Now cast this as an LQ control problem. The form of $\pi_t$ suggests that we let $u_t = q_{t+1} - q_t$, so that with $Q = \gamma$, $u_t' Q u_t = \gamma (q_{t+1} - q_t)^2$. Now for the state vector.
Notice that $(p_t - c) q_t = (a_0 - a_1 q_t + d_t - c) q_t = -a_1 q_t^2 + (a_0 - c + d_t) q_t$. This equals $-a_1 q_t^2 + (2 a_1 \bar{q}_t) q_t$. We have eliminated $c$ and $d_t$ by bringing in $\bar{q}_t$. This suggests we complete the square by adding $-a_1 \bar{q}_t^2$, to get a quadratic form $-a_1 q_t^2 + (2 a_1 \bar{q}_t) q_t - a_1 \bar{q}_t^2$ in the vector $(\bar{q}_t, q_t)$, and this vector can then be the state vector.
We will need to write the evolution of the state vector. By the choice of $u_t$, $q_{t+1} = q_t + u_t$. And
$$\bar{q}_{t+1} = (a_0 - c + d_{t+1})/(2 a_1) = m_0 + m_1 d_{t+1},$$
where $m_0 = (a_0 - c)/(2 a_1)$ and $m_1 = 1/(2 a_1)$. Since $d_{t+1} = \rho d_t + \sigma w_{t+1}$ and $d_t = (\bar{q}_t - m_0)/m_1$, this equals
$$m_0 + m_1 \rho \, \frac{\bar{q}_t - m_0}{m_1} + m_1 \sigma w_{t+1} .$$
So,
$$\bar{q}_{t+1} = m_0 (1 - \rho) + \rho \bar{q}_t + m_1 \sigma w_{t+1} .$$
The first, constant term on the RHS creates a problem: in order to write the transition law as $x_{t+1} = A x_t + B u_t + C w_{t+1}$ and incorporate the constant term in the $\bar{q}_{t+1}$ evolution, we need to expand the state vector.
So, let $x_t = (\bar{q}_t \;\; q_t \;\; 1)'$. Let
$$\hat{\pi}_t = \pi_t - a_1 \bar{q}_t^2 = -a_1 (q_t - \bar{q}_t)^2 - \gamma u_t^2 .$$
We minimize
$$E_0 \left[ \sum_{t=0}^{\infty} \beta^t \left( a_1 (q_t - \bar{q}_t)^2 + \gamma u_t^2 \right) \right] .$$
Notice that
$$a_1 (q_t - \bar{q}_t)^2 = x_t' R x_t,$$
where the symmetric matrix $R$ has $r_{11} = r_{22} = a_1$, $r_{12} = r_{21} = -a_1$, and all other entries equal to 0. As stated earlier, $Q = \gamma$. The objective function is then
$$E_0 \left[ \sum_{t=0}^{\infty} \beta^t \left( x_t' R x_t + u_t' Q u_t \right) \right] .$$
The transition law is
$$x_{t+1} = A x_t + B u_t + C w_{t+1},$$
where $x_{t+1} = (\bar{q}_{t+1} \;\; q_{t+1} \;\; 1)'$, $u_t = q_{t+1} - q_t$,
$$A = \begin{pmatrix} \rho & 0 & m_0 (1 - \rho) \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad B = (0 \;\; 1 \;\; 0)', \qquad C = (m_1 \sigma \;\; 0 \;\; 0)' .$$
We now have the object attributes $R, Q, A, B, C, \beta$ needed to make this model an instance of the class LQ.
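The following sketch assembles these matrices and hands them to quantecon's LQ class; the parameter values are purely illustrative assumptions:

```python
import numpy as np
import quantecon as qe

# Illustrative parameter values (assumptions chosen only for demonstration)
a0, a1, c, gamma, rho, sigma, beta = 100.0, 1.0, 10.0, 5.0, 0.9, 2.0, 0.95
m0, m1 = (a0 - c) / (2 * a1), 1 / (2 * a1)

R = np.array([[ a1, -a1, 0.0],
              [-a1,  a1, 0.0],
              [0.0, 0.0, 0.0]])
Q = np.array([[gamma]])
A = np.array([[rho, 0.0, m0 * (1 - rho)],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
B = np.array([[0.0], [1.0], [0.0]])
C = np.array([[m1 * sigma], [0.0], [0.0]])

lq = qe.LQ(Q, R, A, B, C=C, beta=beta)
P, F, d = lq.stationary_values()
x0 = np.array([m0, 2.0, 1.0])                       # state (qbar, q, 1) at t = 0
x_path, u_path, w_path = lq.compute_sequence(x0, ts_length=100)
q_bar_path, q_path = x_path[0, :], x_path[1, :]
```

The simulated path of $q_t$ can then be compared with that of $\bar{q}_t$ to see how the adjustment cost smooths the response of output to demand shocks.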