Partially Observable Linear Dynamical Systems

LEARNING AND CONTROL IN PARTIALLY OBSERVABLE LINEAR DYNAMICAL SYSTEMS

5.1 Preliminaries

5.1.2 Partially Observable Linear Dynamical Systems

In this chapter, we first study the canonical measurement-feedback linear control systems known as Linear Quadratic Gaussian (LQG) control systems. As their name suggests, these systems have linear dynamics, quadratic control costs, and Gaussian noise disturbances. They will be our starting point, and our results also cover several generalizations: unknown noise covariance, sub-Gaussian noise, and general (strongly) convex cost functions.

In LQG control systems, $\Theta = (A, B, C)$ with $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times p}$, $C \in \mathbb{R}^{m \times n}$ denotes the model parameters of a partially observable linear time-invariant dynamical system in state-space form:

$$
\begin{aligned}
x_{t+1} &= A x_t + B u_t + w_t \\
y_t &= C x_t + z_t,
\end{aligned}
\tag{5.1}
$$

where $x_t \in \mathbb{R}^n$ is the (latent) state of the system, $u_t \in \mathbb{R}^p$ is the control input, the observation $y_t \in \mathbb{R}^m$ is the output of the system, and $w_t \sim \mathcal{N}(0, W)$ and $z_t \sim \mathcal{N}(0, Z)$ are i.i.d. process noise and measurement noise, respectively. For simplicity of presentation, in LQG control systems we consider isotropic Gaussian process and measurement noise, i.e., $W = \sigma_w^2 I$ and $Z = \sigma_z^2 I$. At each time step $t$, the system is at state $x_t$ and the agent observes $y_t$, i.e., it has imperfect state information. Then, the agent applies a control input $u_t$ and the system evolves to $x_{t+1}$ at time step $t+1$. We assume that the underlying system is controllable and observable.
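The interaction protocol around (5.1) can be sketched in a few lines. The following is a minimal illustration, assuming NumPy; the function name `simulate_lqg`, the example matrices, and the noise levels are hypothetical choices for illustration and do not come from the text.

```python
import numpy as np

def simulate_lqg(A, B, C, sigma_w, sigma_z, inputs, rng):
    """Roll out x_{t+1} = A x_t + B u_t + w_t, y_t = C x_t + z_t from x_0 = 0."""
    n, m = A.shape[0], C.shape[0]
    x = np.zeros(n)
    states, outputs = [], []
    for u in inputs:
        # The agent only sees y_t, a noisy linear measurement of the latent state.
        y = C @ x + sigma_z * rng.standard_normal(m)            # z_t ~ N(0, sigma_z^2 I)
        states.append(x)
        outputs.append(y)
        # The state evolves under the applied input and process noise.
        x = A @ x + B @ u + sigma_w * rng.standard_normal(n)    # w_t ~ N(0, sigma_w^2 I)
    return np.array(states), np.array(outputs)

# Illustrative system with n = 2 states, p = 1 input, m = 1 output (assumed values).
A = np.array([[0.9, 0.2], [0.0, 0.7]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

rng = np.random.default_rng(0)
inputs = rng.standard_normal((50, 1))
states, outputs = simulate_lqg(A, B, C, 0.1, 0.1, inputs, rng)
```

Only `outputs` and `inputs` would be available to the agent; `states` is returned here purely for inspection.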

Definition 5.1. A linear system $\Theta = (A, B, C)$ is $(A, B)$ controllable if the controllability matrix,

$$
\mathcal{C}(A, B, n) = [\, B \;\; AB \;\; A^2 B \;\; \ldots \;\; A^{n-1} B \,]
$$

has full row rank. For all $H \geq n$, $\mathcal{C}(A, B, H)$ defines the extended $(A, B)$ controllability matrix. Similarly, a linear system $\Theta = (A, B, C)$ is $(A, C)$ observable if the observability matrix,

$$
\mathcal{O}(A, C, n) = [\, C^\top \;\; (CA)^\top \;\; (CA^2)^\top \;\; \ldots \;\; (CA^{n-1})^\top \,]^\top
$$

has full column rank. For all $H \geq n$, $\mathcal{O}(A, C, H)$ defines the extended $(A, C)$ observability matrix.
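The rank conditions of Definition 5.1 can be checked numerically. The sketch below assumes NumPy; the helper names and the example matrices are illustrative and not part of the text. The observability matrix is built through the duality between observability of $(A, C)$ and controllability of $(A^\top, C^\top)$.

```python
import numpy as np

def controllability_matrix(A, B, H):
    """Stack [B, AB, ..., A^{H-1} B] horizontally; shape (n, H * p)."""
    blocks, M = [], B
    for _ in range(H):
        blocks.append(M)
        M = A @ M
    return np.hstack(blocks)

def observability_matrix(A, C, H):
    """Stack [C; CA; ...; C A^{H-1}] vertically; shape (H * m, n)."""
    # Duality: O(A, C, H) is the transpose of the controllability matrix of (A^T, C^T).
    return controllability_matrix(A.T, C.T, H).T

def is_controllable(A, B):
    n = A.shape[0]
    return np.linalg.matrix_rank(controllability_matrix(A, B, n)) == n  # full row rank

def is_observable(A, C):
    n = A.shape[0]
    return np.linalg.matrix_rank(observability_matrix(A, C, n)) == n    # full column rank

# Illustrative matrices (assumed values).
A = np.array([[0.9, 0.2], [0.0, 0.7]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

print(is_controllable(A, B), is_observable(A, C))
```

Numerical rank depends on a tolerance, so for nearly uncontrollable systems the answer from `matrix_rank` should be read with care.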

By assuming controllability and observability of the underlying system with state dimension $n$, we implicitly assume that the order of the underlying system is also $n$, i.e., the system is in its minimal representation. We adopt this assumption for ease of presentation. There are several efficient algorithms that find the order of an unknown linear dynamical system [234]. Using these techniques, the assumption on the order of the system can be lifted without jeopardizing any performance guarantees.

Notice that, unlike the dynamical systems studied in Chapters 3 and 4, the agent in this system does not observe the state, so the state must be estimated. For this setting, in his seminal work, Kalman derived a closed-form expression for $\hat{x}_{t|t,\Theta}$, the minimum mean squared error (MMSE) estimate of the underlying state $x_t$ using the past control inputs and observations and the model parameters $\Theta$, where $\hat{x}_{0|-1,\Theta} = 0$. This formulation is known as the Kalman filter and is efficiently obtained via

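$$
\hat{x}_{t|t,\Theta} = (I - LC)\,\hat{x}_{t|t-1,\Theta} + L y_t, \tag{5.2}
$$

$$
\hat{x}_{t|t-1,\Theta} = A \hat{x}_{t-1|t-1,\Theta} + B u_{t-1}, \tag{5.3}
$$

$$
L = \Sigma C^\top \left( C \Sigma C^\top + \sigma_z^2 I \right)^{-1}, \tag{5.4}
$$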

where $\Sigma$ is the unique positive semidefinite solution to the following Discrete Algebraic Riccati Equation (DARE):

$$
\Sigma = A \Sigma A^\top - A \Sigma C^\top \left( C \Sigma C^\top + \sigma_z^2 I \right)^{-1} C \Sigma A^\top + \sigma_w^2 I. \tag{5.5}
$$
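To make (5.2)-(5.5) concrete, the sketch below computes $\Sigma$ by repeatedly applying the right-hand side of (5.5) until it stops changing, then forms the gain $L$ and one filter update. This fixed-point iteration is a stand-in chosen to keep the example dependent on NumPy only; it is not the method of the text, and a dedicated DARE solver would normally be preferred. All function names and numerical values are illustrative assumptions.

```python
import numpy as np

def solve_filter_dare(A, C, sigma_w, sigma_z, tol=1e-12, max_iters=100_000):
    """Iterate the Riccati map of (5.5), starting from Sigma = sigma_w^2 I."""
    n, m = A.shape[0], C.shape[0]
    W = sigma_w**2 * np.eye(n)
    Z = sigma_z**2 * np.eye(m)
    Sigma = W.copy()
    for _ in range(max_iters):
        S = C @ Sigma @ C.T + Z                                  # innovation covariance
        correction = A @ Sigma @ C.T @ np.linalg.solve(S, C @ Sigma @ A.T)
        Sigma_next = A @ Sigma @ A.T - correction + W
        if np.max(np.abs(Sigma_next - Sigma)) < tol:
            return Sigma_next
        Sigma = Sigma_next
    raise RuntimeError("Riccati iteration did not converge")

def kalman_gain(Sigma, C, sigma_z):
    """L = Sigma C^T (C Sigma C^T + sigma_z^2 I)^{-1}, as in (5.4)."""
    S = C @ Sigma @ C.T + sigma_z**2 * np.eye(C.shape[0])
    return Sigma @ C.T @ np.linalg.inv(S)

def filter_step(x_hat_prev, u_prev, y, A, B, C, L):
    """One steady-state Kalman filter update."""
    x_pred = A @ x_hat_prev + B @ u_prev          # time update, (5.3)
    return x_pred - L @ (C @ x_pred) + L @ y      # measurement update, (5.2)

# Illustrative system and noise levels (assumed values).
A = np.array([[0.9, 0.2], [0.0, 0.7]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
sigma_w = sigma_z = 0.1

Sigma = solve_filter_dare(A, C, sigma_w, sigma_z)
L = kalman_gain(Sigma, C, sigma_z)
x_hat = filter_step(np.zeros(2), np.zeros(1), np.array([1.0]), A, B, C, L)
```

Starting from a zero estimate with zero input, the update reduces to $L y_t$, which gives a quick way to sanity-check the gain.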

$\Sigma$ can be interpreted as the steady-state error covariance matrix of state estimation under $\Theta$. Besides the state-space form given in (5.1), there are various equivalent characterizations of the dynamics of the discrete-time linear time-invariant system $\Theta$ [132, 160, 270]. These representations all have the same second-order statistics. One of the most common is the innovations form (see footnote 2) of the system, characterized as

$$
\begin{aligned}
x_{t+1} &= A x_t + B u_t + F e_t \\
y_t &= C x_t + e_t,
\end{aligned}
\tag{5.6}
$$

where $F = AL$ is the Kalman gain in the observer form and $e_t$ is the zero-mean white innovation process. In this equivalent representation of the system, the state $x_t$ can be seen as the estimate of the state in the state-space representation, which is the expression stated in (5.3), i.e., the MMSE estimate of state $x_t$ given $(y_{t-1}, \ldots, y_0, u_{t-1}, \ldots, u_0)$. In the steady state, $e_t \sim \mathcal{N}(0, C \Sigma C^\top + \sigma_z^2 I)$.

Footnote 2: For simplicity, all of the system representations are presented for the steady state of the system. Note that the system converges to the steady state exponentially fast [211].

Using the relationship between $e_t$ and $y_t$, we obtain the following characterization of the system $\Theta$, known as the predictor form of the system,

$$
\begin{aligned}
x_{t+1} &= \bar{A} x_t + B u_t + F y_t \\
y_t &= C x_t + e_t,
\end{aligned}
\tag{5.7}
$$

where $\bar{A} = A - FC$ and $F = AL$. Notice that at steady state, the predictor form allows the current output $y_t$ to be described by the history of inputs and outputs with an i.i.d. Gaussian disturbance $e_t \sim \mathcal{N}(0, C \Sigma C^\top + \sigma_z^2 I)$. In our results, we exploit these fundamental properties to estimate the underlying system, even with feedback control.
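The step from the innovations form (5.6) to the predictor form (5.7) is the substitution $e_t = y_t - C x_t$, which gives $A x_t + F e_t = (A - FC) x_t + F y_t$. The sketch below checks this numerically, assuming NumPy: it rolls out (5.6) with a random innovation sequence, then drives (5.7) with the resulting outputs and compares the two state trajectories. The gain `F` is an arbitrary illustrative matrix, not one computed from a DARE, and all names are hypothetical.

```python
import numpy as np

def innovations_rollout(A, B, C, F, inputs, innovations):
    """Innovations form (5.6): x_{t+1} = A x_t + B u_t + F e_t, y_t = C x_t + e_t."""
    x = np.zeros(A.shape[0])
    states, outputs = [], []
    for u, e in zip(inputs, innovations):
        y = C @ x + e
        states.append(x)
        outputs.append(y)
        x = A @ x + B @ u + F @ e
    return np.array(states), np.array(outputs)

def predictor_rollout(A, B, C, F, inputs, outputs):
    """Predictor form (5.7): x_{t+1} = A_bar x_t + B u_t + F y_t, A_bar = A - F C."""
    A_bar = A - F @ C
    x = np.zeros(A.shape[0])
    states = []
    for u, y in zip(inputs, outputs):
        states.append(x)
        x = A_bar @ x + B @ u + F @ y   # driven by observed outputs, not by e_t
    return np.array(states)

# Illustrative matrices (assumed values).
A = np.array([[0.9, 0.2], [0.0, 0.7]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
F = np.array([[0.5], [0.1]])

rng = np.random.default_rng(0)
inputs = rng.standard_normal((30, 1))
innovations = 0.1 * rng.standard_normal((30, 1))

x_innov, y = innovations_rollout(A, B, C, F, inputs, innovations)
x_pred = predictor_rollout(A, B, C, F, inputs, y)
max_gap = np.max(np.abs(x_innov - x_pred))   # zero up to floating-point error
```

The predictor recursion never touches `innovations`; it reconstructs the same state from inputs and outputs alone, which is what makes this form convenient for estimation from data.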

The predictor form dynamics given in (5.7) belong to a larger class of dynamical systems named Autoregressive Exogenous (ARX) systems. ARX systems are central to time-series modeling because of their input-output representation, as given in (5.7). Due to their ability to approximate linear systems in a parametric model structure, they have been crucial in many areas including chemical engineering, power engineering, medicine, economics, and neuroscience [23, 42, 87, 118, 209]. These models provide a general representation of LDS with arbitrary stochastic disturbances. In our study, besides LQG control systems in predictor form (5.7) with $e_t$ being the innovation process, we will consider the general setting of dynamical systems of the form (5.7) with sub-Gaussian $e_t$ and arbitrary $\bar{A}$ and $F$.

Definition 5.2. A matrix $M \in \mathbb{R}^{n \times n}$ is $(\kappa, \gamma)$-stable for $\kappa \geq 0$ and $0 < \gamma \leq 1$ if there exists a similarity transformation $M = S \Lambda S^{-1}$ where $\|S\| \|S^{-1}\| \leq \kappa$ and $\|\Lambda\| \leq 1 - \gamma$.

We will consider $(\kappa_1, \gamma_1)$ open-loop stable LQG control systems. From the definition above, this means that for all $k$, $\|A^k\| \leq \kappa_1 (1 - \gamma_1)^k$, since $A^k = S \Lambda^k S^{-1}$. Notice that Definition 5.2 is the notion of stability corresponding to the stabilizability definition given in Definition 3.2 in Chapter 3. The stability of $A$ is required only to obtain a simply bounded state in the analysis; it is not a fundamental requirement for the predictor form or for ARX systems. In particular, in the predictor form of LQG, $\bar{A}$ is stable due to the observability assumption, and in ARX systems we explicitly assume the stability of $\bar{A}$, which captures an extensive number of systems including all detectable partially observable linear dynamical systems [132]. Thus, for LQG control systems, one can show a bound on the state $x_t$ that is exponential in the dimension, similar to the analysis provided in Chapter 3. We leave this analysis for future work.
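One way to obtain a valid pair $(\kappa, \gamma)$ for Definition 5.2 is from an eigendecomposition, taking $S$ as the eigenvector matrix and $\Lambda$ as the diagonal matrix of eigenvalues. The sketch below, assuming NumPy, does this and checks the implied decay $\|M^k\| \leq \kappa (1 - \gamma)^k$. It assumes $M$ is diagonalizable with spectral radius below one; the resulting $\kappa$ is only one admissible choice, not necessarily the smallest, and the helper name and example matrix are illustrative.

```python
import numpy as np

def stability_certificate(M):
    """Return (kappa, gamma) from M = S diag(lam) S^{-1}; assumes M is diagonalizable."""
    eigvals, S = np.linalg.eig(M)
    kappa = np.linalg.cond(S)                  # ||S|| ||S^{-1}|| in the spectral norm
    gamma = 1.0 - np.max(np.abs(eigvals))      # ||Lambda|| = max |lambda_i| for diagonal Lambda
    return kappa, gamma

# Illustrative stable matrix (assumed values); its eigenvalues are 0.9 and 0.7.
A = np.array([[0.9, 0.2], [0.0, 0.7]])
kappa, gamma = stability_certificate(A)

# Check ||A^k|| <= kappa (1 - gamma)^k, with a small slack for rounding.
bound_holds = all(
    np.linalg.norm(np.linalg.matrix_power(A, k), 2) <= kappa * (1.0 - gamma) ** k + 1e-12
    for k in range(50)
)
```

For matrices that are close to defective, the eigenvector matrix becomes ill-conditioned and $\kappa$ grows large, so the certificate is valid but loose.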

To summarize, we assume that the underlying system lives in the following set.

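Assumption 5.1. The unknown system $\Theta = (A, B, C)$ is a member of a set $\mathcal{S}$, such that,

$$
\mathcal{S} \subseteq \left\{ \Theta' = (A', B', C', F') \;\middle|\;
\begin{array}{l}
A' \text{ is } (\kappa_1, \gamma_1)\text{-stable}, \; (A', B') \text{ is controllable}, \\
(A', C') \text{ is observable}, \; (A', F') \text{ is controllable}, \\
\max\left( \|A'\|, \|B'\|, \|C'\|, \|F'\| \right) \leq \psi
\end{array}
\right\},
$$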

where $\psi > 0$, $\kappa_1 > 0$, and $\gamma_1 \in (0, 1]$. In particular, we assume that there exist constants $\kappa_2, \kappa_3 > 0$ and $\gamma_2, \gamma_3 \in (0, 1]$ such that, for all $\Theta' \in \mathcal{S}$, the systems are $(\kappa_2, \gamma_2)$-stabilizable as defined in Definition 3.2 and $\bar{A}' := A' - F'C'$ is $(\kappa_3, \gamma_3)$-stable. Note that $(\kappa_2, \gamma_2)$-stabilizability follows directly from the controllability of the system. The closed-loop stability of $A' - F'C'$ likewise follows from the observability of the system; in other words, it can be viewed as the stabilizability property with respect to the filtering problem.

The behavior of an LQG control system or an ARX system is uniquely governed by its Markov parameters, i.e., impulse response.

Definition 5.3 (Markov Parameters). The matrices that map the previous inputs to the output are called the input-to-output Markov parameters, and the ones that map the previous outputs to the output are called the output-to-output Markov parameters of the system $\Theta$. In particular, for the dynamics in (5.1), the set of Markov parameters is defined as $G^i_{u \to y} = C A^{i-1} B$, $\forall i \geq 1$. For the predictor form or ARX systems given in (5.7), the matrices that map inputs and outputs to the output are the elements of the Markov operator $\mathcal{G} = \{ G^i_{u \to y}, G^i_{y \to y} \}_{i \geq 1}$, where, $\forall i \geq 1$, $G^i_{u \to y} = C \bar{A}^{i-1} B$ and $G^i_{y \to y} = C \bar{A}^{i-1} F$, which are unique.

In learning the system dynamics, we will aim to learn the Markov parameters of the system since they are uniquely identifiable.
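The sketch below, assuming NumPy, illustrates why Markov parameters are the natural learning target. It computes $G^i_{u \to y} = C A^{i-1} B$ for the dynamics in (5.1), reproduces the noise-free output as a convolution of past inputs with these matrices, and checks that a change of state basis $(TAT^{-1}, TB, CT^{-1})$ leaves them unchanged, so $(A, B, C)$ is recoverable from input-output data only up to such a transformation. The helper names, the matrix `T`, and all numerical values are illustrative assumptions.

```python
import numpy as np

def markov_parameters(A, B, C, H):
    """Return [G^1, ..., G^H] with G^i = C A^{i-1} B."""
    params, M = [], B
    for _ in range(H):
        params.append(C @ M)
        M = A @ M
    return params

def output_from_markov(params, inputs):
    """Noise-free y_t = sum_{i=1}^{t} G^i u_{t-i}, assuming x_0 = 0."""
    outputs = []
    for t in range(len(inputs)):
        y = np.zeros(params[0].shape[0])
        for i in range(1, t + 1):
            y = y + params[i - 1] @ inputs[t - i]
        outputs.append(y)
    return np.array(outputs)

# Illustrative system (assumed values).
A = np.array([[0.9, 0.2], [0.0, 0.7]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

rng = np.random.default_rng(0)
inputs = rng.standard_normal((20, 1))
G = markov_parameters(A, B, C, len(inputs))
y_markov = output_from_markov(G, inputs)

# Direct noise-free rollout of (5.1) for comparison.
x, y_direct = np.zeros(2), []
for u in inputs:
    y_direct.append(C @ x)
    x = A @ x + B @ u
y_direct = np.array(y_direct)

# A similarity transformation changes (A, B, C) but not the Markov parameters.
T = np.array([[2.0, 1.0], [0.0, 1.0]])
T_inv = np.linalg.inv(T)
G_transformed = markov_parameters(T @ A @ T_inv, T @ B, C @ T_inv, len(inputs))
```

For the predictor form (5.7), the same helper applies with $\bar{A}$ in place of $A$: `markov_parameters(A_bar, B, C, H)` gives the input-to-output parameters and `markov_parameters(A_bar, F, C, H)` the output-to-output ones.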
