LEARNING AND CONTROL IN PARTIALLY OBSERVABLE LINEAR DYNAMICAL SYSTEMS
5.1 Preliminaries
5.1.1 Notation
5.1.2 Partially Observable Linear Dynamical Systems
In this chapter, we first study the canonical measurement-feedback linear control systems known as Linear Quadratic Gaussian (LQG) control systems. As their name suggests, these systems have linear dynamics, quadratic control costs, and Gaussian noise disturbances. They are our starting point; our results also cover several generalizations, including unknown noise covariance, sub-Gaussian noise, and general (strongly) convex cost functions.
In LQG control systems, we have $\Theta = (A, B, C)$ with $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times p}$, $C \in \mathbb{R}^{m \times n}$ as the model parameters of a partially observable linear time-invariant dynamical system in the state-space form:
$$
\begin{aligned}
x_{t+1} &= A x_t + B u_t + w_t \\
y_t &= C x_t + z_t,
\end{aligned} \tag{5.1}
$$
where $x_t \in \mathbb{R}^n$ is the (latent) state of the system, $u_t \in \mathbb{R}^p$ is the control input, the observation $y_t \in \mathbb{R}^m$ is the output of the system, and $w_t \sim \mathcal{N}(0, W)$ and $z_t \sim \mathcal{N}(0, Z)$ are i.i.d. process noise and measurement noise, respectively. Note that for simplicity of presentation, in LQG control systems, we will consider isotropic Gaussian process and measurement noise, i.e., $W = \sigma_w^2 I$ and $Z = \sigma_z^2 I$. At each time step $t$, the system is at state $x_t$ and the agent observes $y_t$, i.e., imperfect state information. Then, the agent applies a control input $u_t$ and the system evolves to $x_{t+1}$ at time step $t+1$. We will assume that the underlying system is controllable and observable.
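The evolution in (5.1) can be sketched numerically as follows. This is a minimal illustration, not part of the original development: the function name `simulate_lqg`, the example matrices, the noise levels, and the zero initial state are all assumptions chosen for the example.

```python
import numpy as np

def simulate_lqg(A, B, C, sigma_w, sigma_z, inputs, rng):
    """Roll out x_{t+1} = A x_t + B u_t + w_t, y_t = C x_t + z_t.

    Assumes x_0 = 0 and isotropic noise W = sigma_w^2 I, Z = sigma_z^2 I.
    """
    n, m = A.shape[0], C.shape[0]
    x = np.zeros(n)
    states, outputs = [], []
    for u in inputs:
        # The agent only sees y_t, a noisy linear function of the latent state.
        y = C @ x + sigma_z * rng.standard_normal(m)
        states.append(x)
        outputs.append(y)
        # The latent state then evolves under the applied input and process noise.
        x = A @ x + B @ u + sigma_w * rng.standard_normal(n)
    return np.array(states), np.array(outputs)

# Hypothetical example system with n = 2, p = 1, m = 1 (illustration only).
A = np.array([[0.9, 0.2], [0.0, 0.7]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

rng = np.random.default_rng(0)
inputs = rng.standard_normal((200, 1))  # arbitrary exploratory inputs
states, outputs = simulate_lqg(A, B, C, 0.1, 0.1, inputs, rng)
```

Only `outputs` and `inputs` would be available to a learning agent; `states` is returned here purely for inspection.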
Definition 5.1. A linear system $\Theta = (A, B, C)$ is $(A, B)$ controllable if the controllability matrix,
$$\mathcal{C}(A, B, n) = [B \;\; AB \;\; A^2 B \;\; \ldots \;\; A^{n-1} B]$$
has full row rank. For all $H \geq n$, $\mathcal{C}(A, B, H)$ defines the extended $(A, B)$ controllability matrix. Similarly, a linear system $\Theta = (A, B, C)$ is $(A, C)$ observable if the observability matrix,
$$\mathcal{O}(A, C, n) = [C^\top \;\; (CA)^\top \;\; (CA^2)^\top \;\; \ldots \;\; (CA^{n-1})^\top]^\top$$
has full column rank. For all $H \geq n$, $\mathcal{O}(A, C, H)$ defines the extended $(A, C)$ observability matrix.
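The rank conditions in Definition 5.1 can be checked numerically. The sketch below assumes NumPy and a small hypothetical system; the helper names are illustrative, and `np.linalg.matrix_rank` uses a numerical tolerance, so the result is only a floating-point proxy for the exact rank.

```python
import numpy as np

def controllability_matrix(A, B, H):
    """Return [B, AB, A^2 B, ..., A^(H-1) B], stacked horizontally."""
    blocks = [B]
    for _ in range(H - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)

def observability_matrix(A, C, H):
    """Return [C; CA; CA^2; ...; CA^(H-1)], stacked vertically."""
    blocks = [C]
    for _ in range(H - 1):
        blocks.append(blocks[-1] @ A)
    return np.vstack(blocks)

# Hypothetical example system (illustration only).
A = np.array([[0.9, 0.2], [0.0, 0.7]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
n = A.shape[0]

ctrb_rank = np.linalg.matrix_rank(controllability_matrix(A, B, n))
obsv_rank = np.linalg.matrix_rank(observability_matrix(A, C, n))
is_controllable = ctrb_rank == n  # full row rank
is_observable = obsv_rank == n    # full column rank
```

Passing `H > n` to either helper yields the extended matrices from the definition, which have the same rank.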
By assuming controllability and observability of the underlying system with state dimension $n$, we implicitly assume the order of the underlying system is also $n$, i.e., the system is in its minimal representation. We adopt this assumption for ease of presentation. There are several efficient algorithms that find the order of an unknown linear dynamical system [234]. Using these techniques, we can lift the assumption on the order of the system without jeopardizing any performance guarantees.
Notice that, unlike the dynamical systems studied in Chapters 3 and 4, in this system the agent does not observe the state, so the state must be estimated. For this setting, in his seminal work, Kalman derived a closed-form expression for $\hat{x}_{t|t,\Theta}$, the minimum mean squared error (MMSE) estimate of the underlying state $x_t$ using the past control inputs and observations and the model parameters $\Theta$, where $\hat{x}_{0|-1,\Theta} = 0$. This formulation is known as the Kalman filter and is efficiently obtained via
$$\hat{x}_{t|t,\Theta} = (I - LC)\,\hat{x}_{t|t-1,\Theta} + L y_t, \tag{5.2}$$
$$\hat{x}_{t|t-1,\Theta} = A \hat{x}_{t-1|t-1,\Theta} + B u_{t-1}, \tag{5.3}$$
$$L = \Sigma C^\top \left( C \Sigma C^\top + \sigma_z^2 I \right)^{-1}, \tag{5.4}$$
where $\Sigma$ is the unique positive semidefinite solution to the following Discrete Algebraic Riccati Equation (DARE):
$$\Sigma = A \Sigma A^\top - A \Sigma C^\top \left( C \Sigma C^\top + \sigma_z^2 I \right)^{-1} C \Sigma A^\top + \sigma_w^2 I. \tag{5.5}$$
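As a rough numerical companion to (5.2)-(5.5), the sketch below finds $\Sigma$ by simply iterating the Riccati recursion to a fixed point and then forms the gain $L$. Fixed-point iteration is a substitution made here for self-containedness (a dedicated DARE solver would normally be used), and the function names, example system, and tolerances are assumptions.

```python
import numpy as np

def kalman_steady_state(A, C, sigma_w, sigma_z, iters=10_000, tol=1e-12):
    """Iterate the right-hand side of (5.5) to a fixed point; return (Sigma, L)."""
    n, m = A.shape[0], C.shape[0]
    Sigma = sigma_w**2 * np.eye(n)
    for _ in range(iters):
        innov_cov = C @ Sigma @ C.T + sigma_z**2 * np.eye(m)
        gain = A @ Sigma @ C.T @ np.linalg.inv(innov_cov)
        Sigma_next = A @ Sigma @ A.T - gain @ C @ Sigma @ A.T + sigma_w**2 * np.eye(n)
        converged = np.max(np.abs(Sigma_next - Sigma)) < tol
        Sigma = Sigma_next
        if converged:
            break
    innov_cov = C @ Sigma @ C.T + sigma_z**2 * np.eye(m)
    L = Sigma @ C.T @ np.linalg.inv(innov_cov)  # equation (5.4)
    return Sigma, L

def kalman_step(x_hat_prev, u_prev, y, A, B, C, L):
    """One filter update: predict with (5.3), then correct with (5.2)."""
    x_pred = A @ x_hat_prev + B @ u_prev
    # (I - L C) x_pred + L y, written in innovation form.
    return x_pred + L @ (y - C @ x_pred)

# Hypothetical example system (illustration only).
A = np.array([[0.9, 0.2], [0.0, 0.7]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
sigma_w, sigma_z = 0.1, 0.1

Sigma, L = kalman_steady_state(A, C, sigma_w, sigma_z)
x_hat = kalman_step(np.zeros(2), np.zeros(1), np.array([0.5]), A, B, C, L)
```

The residual of (5.5) evaluated at the returned `Sigma` gives a quick sanity check that the iteration has converged.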
$\Sigma$ can be interpreted as the steady-state error covariance matrix of state estimation under $\Theta$. There are various equivalent characterizations of the dynamics of the discrete-time linear time-invariant system $\Theta$ besides the state-space form given in (5.1) [132, 160, 270]. Note that these representations all have the same second-order statistics. One of the most common forms is the innovations form$^2$ of the system, characterized as
$$
\begin{aligned}
x_{t+1} &= A x_t + B u_t + F e_t \\
y_t &= C x_t + e_t,
\end{aligned} \tag{5.6}
$$
where $F = AL$ is the Kalman gain in the observer form and $e_t$ is the zero-mean white innovation process. In this equivalent representation of the system, the state $x_t$ can be seen as the estimate of the state in the state-space representation, which is the expression stated in (5.3), i.e., the MMSE estimate of state $x_t$ given $(y_{t-1}, \ldots, y_0, u_{t-1}, \ldots, u_0)$. In the steady state, $e_t \sim \mathcal{N}\left(0, C \Sigma C^\top + \sigma_z^2 I\right)$. Using the relationship between $e_t$ and $y_t$, we obtain the following characterization of the system $\Theta$, known as the predictor form of the system,
$$
\begin{aligned}
x_{t+1} &= \bar{A} x_t + B u_t + F y_t \\
y_t &= C x_t + e_t,
\end{aligned} \tag{5.7}
$$
where $\bar{A} = A - FC$ and $F = AL$. Notice that at steady state, the predictor form allows the current output $y_t$ to be described by the history of inputs and outputs with an i.i.d. Gaussian disturbance $e_t \sim \mathcal{N}\left(0, C \Sigma C^\top + \sigma_z^2 I\right)$. In our results, we exploit these fundamental properties to estimate the underlying system, even with feedback control.

$^2$ For simplicity, all of the system representations are presented for the steady state of the system. Note that the system converges to the steady state exponentially fast [211].
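To make the link between the Kalman filter and the predictor form concrete, the sketch below checks on a hypothetical system that one step of (5.2)-(5.3) coincides with one step of the state update in (5.7), and that $\bar{A} = A - FC$ has spectral radius below one. The Riccati fixed-point iteration, helper name, and example values are assumptions made for illustration.

```python
import numpy as np

def steady_state_gain(A, C, sigma_w, sigma_z, iters=5000):
    """Iterate the Riccati recursion of (5.5) and return L from (5.4)."""
    n, m = A.shape[0], C.shape[0]
    Sigma = sigma_w**2 * np.eye(n)
    for _ in range(iters):
        S_inv = np.linalg.inv(C @ Sigma @ C.T + sigma_z**2 * np.eye(m))
        Sigma = (A @ Sigma @ A.T
                 - A @ Sigma @ C.T @ S_inv @ C @ Sigma @ A.T
                 + sigma_w**2 * np.eye(n))
    S_inv = np.linalg.inv(C @ Sigma @ C.T + sigma_z**2 * np.eye(m))
    return Sigma @ C.T @ S_inv

# Hypothetical example system (illustration only).
A = np.array([[0.9, 0.2], [0.0, 0.7]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

L = steady_state_gain(A, C, 0.1, 0.1)
F = A @ L            # Kalman gain in observer form
A_bar = A - F @ C    # predictor-form state matrix
rho = np.max(np.abs(np.linalg.eigvals(A_bar)))

# Compare one Kalman filter step with one predictor-form step on arbitrary data.
rng = np.random.default_rng(1)
x_pred = rng.standard_normal(2)
u = rng.standard_normal(1)
y = rng.standard_normal(1)
x_filt = (np.eye(2) - L @ C) @ x_pred + L @ y        # (5.2)
next_via_kalman = A @ x_filt + B @ u                 # (5.3)
next_via_predictor = A_bar @ x_pred + B @ u + F @ y  # (5.7)
```

The two next-state predictions agree because $A(I - LC) = A - FC$ and $AL = F$, which is exactly the substitution that turns the innovations form into the predictor form.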
The predictor form dynamics given in (5.7) belong to a larger class of dynamical systems named Autoregressive Exogenous (ARX) systems. ARX systems are central dynamical systems in time-series modeling due to their input-output form representation as given in (5.7). Due to their ability to approximate linear systems in a parametric model structure, they have been crucial in many areas including chemical engineering, power engineering, medicine, economics, and neuroscience [23, 42, 87, 118, 209]. These models provide a general representation of LDS with arbitrary stochastic disturbances. In our study, besides LQG control systems in predictor form (5.7) with $e_t$ being the innovation process, we will consider the general setting of dynamical systems of the form (5.7) with sub-Gaussian $e_t$ and arbitrary $\bar{A}$ and $F$.
Definition 5.2. A matrix $M \in \mathbb{R}^{n \times n}$ is $(\kappa, \gamma)$-stable for $\kappa \geq 0$ and $0 < \gamma \leq 1$ if there exists a similarity transformation $M = H \Lambda H^{-1}$ where $\|H\| \|H^{-1}\| \leq \kappa$ and $\|\Lambda\| \leq 1 - \gamma$.

We will consider $(\kappa_1, \gamma_1)$ open-loop stable LQG control systems. From the definition above, this means that for all $\tau$, $\|A^\tau\| \leq \kappa_1 (1 - \gamma_1)^\tau$. Notice that Definition 5.2 is the notion of stability corresponding to the stabilizability definition given in Definition 3.2 in Chapter 3. The stability of $A$ is assumed so that the state is readily bounded in the analysis; it is not a fundamental requirement for the predictor form or for ARX systems. In particular, in the predictor form of LQG, $\bar{A}$ is stable due to the observability assumption, and in ARX systems we will explicitly assume the stability of $\bar{A}$, which captures an extensive number of systems including all detectable partially observable linear dynamical systems [132]. Thus, for the LQG control systems, one can show a bound on the state $x_t$ that is exponential in the dimension, similar to the analysis provided in Chapter 3. We leave this analysis for future work.
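One way to produce a certificate for Definition 5.2 is through an eigendecomposition, sketched below. This assumes the matrix is diagonalizable and takes the eigenvector matrix `V` as the similarity transformation; it yields a valid pair of constants but not necessarily the tightest one, and `V` may be complex in general. The function name and example matrix are hypothetical.

```python
import numpy as np

def stability_certificate(M):
    """Return (kappa, gamma) from M = V diag(eigvals) V^{-1}.

    Assumes M is diagonalizable. kappa is the spectral condition number
    ||V|| ||V^{-1}|| and 1 - gamma is the spectral radius, since the
    spectral norm of a diagonal matrix is its largest absolute entry.
    """
    eigvals, V = np.linalg.eig(M)
    kappa = np.linalg.cond(V)
    gamma = 1.0 - np.max(np.abs(eigvals))
    return kappa, gamma

# Hypothetical open-loop stable matrix (illustration only).
A = np.array([[0.9, 0.2], [0.0, 0.7]])
kappa, gamma = stability_certificate(A)

# The definition implies ||A^t|| <= kappa * (1 - gamma)^t for every t >= 0.
powers_ok = all(
    np.linalg.norm(np.linalg.matrix_power(A, t), 2) <= kappa * (1 - gamma) ** t + 1e-12
    for t in range(60)
)
```

For this example the decay rate is set by the largest eigenvalue 0.9, so `gamma` is approximately 0.1, while `kappa` absorbs the transient growth caused by the off-diagonal coupling.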
To summarize, we assume that the underlying system lives in the following set.
Assumption 5.1. The unknown system $\Theta = (A, B, C)$ is a member of a set $\mathcal{S}$, such that,
$$
\mathcal{S} \subseteq \left\{ \Theta' = (A', B', C', F') \;\middle|\;
\begin{array}{l}
A' \text{ is } (\kappa_1, \gamma_1)\text{-stable}, \; (A', B') \text{ is controllable}, \\
(A', C') \text{ is observable}, \; (A', F') \text{ is controllable}, \\
\max\left( \|A'\|, \|B'\|, \|C'\|, \|F'\| \right) \leq S
\end{array}
\right\},
$$
where $S > 0$, $\kappa_1 > 0$, and $\gamma_1 \in (0, 1]$. In particular, we assume that there exist constants $\kappa_2, \kappa_3 > 0$ and $\gamma_2, \gamma_3 \in (0, 1]$ such that the systems are $(\kappa_2, \gamma_2)$-stabilizable as defined in Definition 3.2 and $\bar{A}' \coloneqq A' - F'C'$ is $(\kappa_3, \gamma_3)$-stable for all $\Theta' \in \mathcal{S}$. Note that $(\kappa_2, \gamma_2)$-stabilizability follows directly from the controllability of the system. The closed-loop stability of $A' - F'C'$ likewise follows from the observability of the system; in other words, it can be considered the stabilizability property with respect to the filtering problem.
The behavior of an LQG control system or an ARX system is uniquely governed by its Markov parameters, i.e., impulse response.
Definition 5.3 (Markov Parameters). The set of matrices that maps the previous inputs to the output is called the input-to-output Markov parameters, and the ones that map the previous outputs to the output are denoted as the output-to-output Markov parameters of the system $\Theta$. In particular, for the dynamics in (5.1), the set of Markov parameters is defined as $G^i_{u \to y} = C A^{i-1} B$, $\forall i \geq 1$. For the predictor form or ARX systems given in (5.7), the matrices that map inputs and outputs to the output are the elements of the Markov operator, $\mathcal{G} = \{G^i_{u \to y}, G^i_{y \to y}\}_{i \geq 1}$ where $\forall i \geq 1$, $G^i_{u \to y} = C \bar{A}^{i-1} B$ and $G^i_{y \to y} = C \bar{A}^{i-1} F$, which are unique.
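The role of the Markov parameters in Definition 5.3 can be illustrated with a small simulation of a system in the form (5.7). The sketch below uses a hypothetical ARX system with arbitrary stable state matrix and gain, a zero initial state, and Gaussian disturbances; it checks that the output minus the disturbance is exactly the Markov-parameter-weighted sum of past inputs and outputs.

```python
import numpy as np

def markov_parameters(A_bar, B, C, F, H):
    """Return the first H input-to-output and output-to-output Markov parameters."""
    G_u, G_y = [], []
    P = np.eye(A_bar.shape[0])  # holds A_bar^(i-1), starting at i = 1
    for _ in range(H):
        G_u.append(C @ P @ B)
        G_y.append(C @ P @ F)
        P = A_bar @ P
    return G_u, G_y

# Hypothetical ARX system in the form (5.7); values chosen for illustration only.
A_bar = np.array([[0.5, 0.1], [0.0, 0.3]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
F = np.array([[0.2], [0.1]])

rng = np.random.default_rng(0)
T = 30
u = rng.standard_normal((T, 1))
e = 0.1 * rng.standard_normal((T, 1))

# Simulate (5.7) from x_0 = 0 (an assumption made for this example).
x = np.zeros(2)
y = np.zeros((T, 1))
for t in range(T):
    y[t] = C @ x + e[t]
    x = A_bar @ x + B @ u[t] + F @ y[t]

# Rebuild the last output from the full input/output history.
G_u, G_y = markov_parameters(A_bar, B, C, F, T)
t = T - 1
y_pred = sum(G_u[i - 1] @ u[t - i] + G_y[i - 1] @ y[t - i] for i in range(1, t + 1))
```

Because the state matrix is stable, the parameters decay geometrically in $i$, so truncating the sum to a finite history length gives an approximation whose error shrinks exponentially in that length.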
In learning the system dynamics, we will aim to learn the Markov parameters of the system since they are uniquely identifiable.