
Partially Observable Linear Dynamical Systems

In Learning and Control of Dynamical Systems (pages 148-151)

LEARNING AND CONTROL IN PARTIALLY OBSERVABLE LINEAR DYNAMICAL SYSTEMS

5.1 Preliminaries

5.1.1 Notation

5.1.2 Partially Observable Linear Dynamical Systems

In this chapter, we will first study the canonical measurement-feedback linear control systems known as Linear Quadratic Gaussian (LQG) control systems. As their name suggests, these systems have linear dynamics, quadratic control costs, and Gaussian noise disturbances. These systems will be our starting point, and in our results we will consider various generalizations: systems without exact knowledge of the noise covariance, with sub-Gaussian noise, and with general (strongly) convex cost functions.

In LQG control systems, we have Θ = (A, B, C) with A ∈ R^{n×n}, B ∈ R^{n×p}, C ∈ R^{m×n} as the model parameters of a partially observable linear time-invariant dynamical system in the state-space form:

π‘₯𝑑+1= 𝐴π‘₯𝑑+𝐡𝑒𝑑+𝑀𝑑

𝑦𝑑 =𝐢 π‘₯𝑑+𝑧𝑑, (5.1)

whereπ‘₯𝑑 ∈ R𝑛 is the (latent) state of the system, 𝑒𝑑 ∈ R𝑝 is the control input, the observation𝑦𝑑 ∈Rπ‘š is the output of the system,𝑀𝑑 ∼ N (0, π‘Š),and𝑧𝑑 ∼ N (0, 𝑍) are i.i.d. process noise and measurement noise, respectively. Note that for simplicity of presentation, in LQG control systems, we will consider isotropic Gaussian process and measurement noise, i.e.,π‘Š =𝜎2

𝑀𝐼and𝑍 =𝜎2

𝑧𝐼. At each time step𝑑, the system is at stateπ‘₯𝑑 and the agent observes 𝑦𝑑, i.e., imperfect state information. Then, the agent applies a control input𝑒𝑑 and the system evolves toπ‘₯𝑑+1at time step𝑑+1. We will assume that the underlying system is controllable and observable.

Definition 5.1. A linear system Θ = (A, B, C) is (A, B) controllable if the controllability matrix

C(A, B, n) = [B  AB  A²B  …  A^{n−1}B]

has full row rank. For all H ≥ n, C(A, B, H) defines the extended (A, B) controllability matrix. Similarly, a linear system Θ = (A, B, C) is (A, C) observable if the observability matrix

O(A, C, n) = [C⊤  (CA)⊤  (CA²)⊤  …  (CA^{n−1})⊤]⊤

has full column rank. For all H ≥ n, O(A, C, H) defines the extended (A, C) observability matrix.
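The rank conditions in Definition 5.1 are easy to check numerically. The sketch below builds the extended controllability and observability matrices for a small hypothetical system; the matrices A, B, C are illustrative choices, not from the text.

```python
import numpy as np

def ctrb(A, B, H):
    """Extended controllability matrix C(A, B, H) = [B  AB  ...  A^{H-1}B]."""
    blocks = [B]
    for _ in range(H - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)

def obsv(A, C, H):
    """Extended observability matrix O(A, C, H) = [C; CA; ...; CA^{H-1}]."""
    blocks = [C]
    for _ in range(H - 1):
        blocks.append(blocks[-1] @ A)
    return np.vstack(blocks)

# Hypothetical system with n = 2, p = 1, m = 1 (illustrative only).
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

n = A.shape[0]
controllable = np.linalg.matrix_rank(ctrb(A, B, n)) == n  # full row rank
observable = np.linalg.matrix_rank(obsv(A, C, n)) == n    # full column rank
print(controllable, observable)  # → True True
```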

By assuming controllability and observability of the underlying system with state dimension n, we implicitly assume that the order of the underlying system is also n, i.e., the system is in its minimal representation. We adopt this assumption for ease of presentation. There are several efficient algorithms that find the order of an unknown linear dynamical system [234]. Using these techniques, we can lift the assumption on the order of the system without jeopardizing any performance guarantees.

Notice that unlike the dynamical systems studied in Chapters 3 and 4, in this system the agent does not observe the state, so it needs to be estimated. For this setting, in his seminal work, Kalman derived a closed-form expression for x̂_{t|t,Θ}, the minimum mean squared error (MMSE) estimate of the underlying state x_t using the past control inputs and observations and the model parameters Θ, where x̂_{0|−1,Θ} = 0. This formulation is known as the Kalman filter and is efficiently obtained via

x̂_{t|t,Θ} = (I − L C) x̂_{t|t−1,Θ} + L y_t,  (5.2)

x̂_{t|t−1,Θ} = A x̂_{t−1|t−1,Θ} + B u_{t−1},  (5.3)

L = Σ C⊤ (C Σ C⊤ + σ_z² I)^{−1},  (5.4)

where Σ is the unique positive semidefinite solution to the following Discrete Algebraic Riccati Equation (DARE):

Σ = A Σ A⊤ − A Σ C⊤ (C Σ C⊤ + σ_z² I)^{−1} C Σ A⊤ + σ_w² I.  (5.5)
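The filter (5.2)-(5.4) and the DARE (5.5) can be sketched directly in NumPy. The matrices and noise variances below are hypothetical; a production implementation would use a dedicated Riccati solver (e.g., a dual-form call to scipy.linalg.solve_discrete_are) rather than the naive fixed-point iteration shown here.

```python
import numpy as np

def dare_fixed_point(A, C, sigma_w2, sigma_z2, iters=500):
    """Solve the filtering DARE (5.5) by iterating the Riccati map to its
    fixed point (a simple sketch; converges for stable, observable systems)."""
    n, m = A.shape[0], C.shape[0]
    Sigma = sigma_w2 * np.eye(n)
    for _ in range(iters):
        S = C @ Sigma @ C.T + sigma_z2 * np.eye(m)
        Sigma = (A @ Sigma @ A.T
                 - A @ Sigma @ C.T @ np.linalg.solve(S, C @ Sigma @ A.T)
                 + sigma_w2 * np.eye(n))
    return Sigma

# Hypothetical system parameters (illustrative only).
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
sigma_w2, sigma_z2 = 0.1, 0.1

Sigma = dare_fixed_point(A, C, sigma_w2, sigma_z2)
L = Sigma @ C.T @ np.linalg.inv(C @ Sigma @ C.T + sigma_z2 * np.eye(1))  # (5.4)

# One step of the filter recursions (5.2)-(5.3).
x_pred = np.zeros((2, 1))                       # x̂_{t|t-1}, with x̂_{0|-1} = 0
y = np.array([[1.0]])
u = np.array([[0.5]])
x_filt = (np.eye(2) - L @ C) @ x_pred + L @ y   # (5.2)
x_pred_next = A @ x_filt + B @ u                # (5.3)
```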

Σ can be interpreted as the steady-state error covariance matrix of state estimation under Θ. There are various equivalent characterizations of the dynamics of the discrete-time linear time-invariant system Θ besides the state-space form given in (5.1) [132, 160, 270]. Note that these representations all have the same second-order statistics. One of the most common forms is the innovations form² of the system, characterized as

π‘₯𝑑+1 = 𝐴π‘₯𝑑+𝐡𝑒𝑑+𝐹 𝑒𝑑

𝑦𝑑 =𝐢 π‘₯𝑑+𝑒𝑑, (5.6)

where F = AL is the Kalman gain in the observer form and e_t is the zero-mean white innovation process. In this equivalent representation of the system, the state x_t can be seen as the estimate of the state in the state-space representation, which is the expression stated in (5.3), i.e., the MMSE estimate of the state x_t given

²For simplicity, all of the system representations are presented for the steady state of the system. Note that the system converges to the steady state exponentially fast [211].

(π‘¦π‘‘βˆ’1, . . . , 𝑦0, π‘’π‘‘βˆ’1, . . . , 𝑒0). In the steady state, 𝑒𝑑 ∼ N 0, 𝐢Σ𝐢⊀+𝜎2

𝑧𝐼

. Using the relationship between 𝑒𝑑 and 𝑦𝑑, we obtain the following characterization of the systemΘ, known as the predictor form of the system,

π‘₯𝑑+1= 𝐴π‘₯Β― 𝑑+𝐡𝑒𝑑+𝐹 𝑦𝑑

𝑦𝑑 =𝐢 π‘₯𝑑+𝑒𝑑, (5.7)

where ¯𝐴= π΄βˆ’πΉ 𝐢and𝐹 = 𝐴 𝐿. Notice that at steady state, the predictor form allows the current output𝑦𝑑to be described by the history of inputs and outputs with an i.i.d.

Gaussian disturbance𝑒𝑑 ∼ N 0, 𝐢Σ𝐢⊀+𝜎2

𝑧𝐼

. In our results, we exploit these fun- damental properties to estimate the underlying system, even with feedback control.
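The predictor form is simply (5.3) composed with (5.2): x̂_{t+1|t} = A x̂_{t|t} + B u_t = (A − FC) x̂_{t|t−1} + B u_t + F y_t with F = AL. A minimal numerical sanity check of this identity, using hypothetical system matrices and a Riccati fixed-point iteration to obtain L:

```python
import numpy as np

# Hypothetical system and noise variances (illustrative only).
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
sigma_w2, sigma_z2 = 0.1, 0.1

# Steady-state error covariance by iterating the DARE (5.5), then L as in (5.4).
Sigma = sigma_w2 * np.eye(2)
for _ in range(500):
    S = C @ Sigma @ C.T + sigma_z2 * np.eye(1)
    Sigma = (A @ Sigma @ A.T
             - A @ Sigma @ C.T @ np.linalg.solve(S, C @ Sigma @ A.T)
             + sigma_w2 * np.eye(2))
L = Sigma @ C.T @ np.linalg.inv(C @ Sigma @ C.T + sigma_z2 * np.eye(1))

F = A @ L              # Kalman gain in observer form
A_bar = A - F @ C      # closed-loop matrix of the predictor form

# One step of (5.2) followed by (5.3) vs. one predictor-form step (5.7).
rng = np.random.default_rng(0)
x_pred = rng.standard_normal((2, 1))   # x̂_{t|t-1}
y = rng.standard_normal((1, 1))
u = rng.standard_normal((1, 1))
via_filter = A @ ((np.eye(2) - L @ C) @ x_pred + L @ y) + B @ u
via_predictor = A_bar @ x_pred + B @ u + F @ y
print(np.allclose(via_filter, via_predictor))  # → True
```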

The predictor-form dynamics given in (5.7) belong to a larger class of dynamical systems called Autoregressive Exogenous (ARX) systems. ARX systems are central in time-series modeling due to their input-output representation, as given in (5.7). Owing to their ability to approximate linear systems with a parametric model structure, they have been crucial in many areas including chemical engineering, power engineering, medicine, economics, and neuroscience [23, 42, 87, 118, 209]. These models provide a general representation of LDSs with arbitrary stochastic disturbances. In our study, besides LQG control systems in predictor form (5.7) with e_t being the innovation process, we will consider the general setting of dynamical systems of the form (5.7) with sub-Gaussian e_t and arbitrary Ā and F.

Definition 5.2. A matrix M ∈ R^{n×n} is (κ, γ)-stable for κ ≥ 0 and 0 < γ ≤ 1 if there exists a similarity transformation M = S Λ S^{−1} where ‖S‖‖S^{−1}‖ ≤ κ and ‖Λ‖ ≤ 1 − γ.

We will consider (κ₁, γ₁) open-loop stable LQG control systems. From the definition above, this means that for all k, ‖A^k‖ ≤ κ₁(1 − γ₁)^k. Notice that Definition 5.2 is the stability notion corresponding to the stabilizability definition given in Definition 3.2 in Chapter 3. The stability of A is required only to obtain a simple bound on the state in the analysis, and it is not a fundamental requirement for the predictor form or for ARX systems. In particular, in the predictor form of LQG, Ā is stable due to the observability assumption, and for ARX systems we will explicitly assume the stability of Ā, which captures an extensive class of systems including all detectable partially observable linear dynamical systems [132]. Thus, for LQG control systems, one can show a bound on the state x_t that is exponential in the dimension, similar to the analysis provided in Chapter 3. We leave this analysis for future work.
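For a diagonalizable matrix, the eigendecomposition itself yields a (κ, γ) certificate in the sense of Definition 5.2. A small sketch, with a hypothetical matrix A:

```python
import numpy as np

def kappa_gamma_certificate(A):
    """Return a (kappa, gamma) pair witnessing Definition 5.2, using the
    eigendecomposition of A as the similarity transformation (assumes A is
    diagonalizable)."""
    eigvals, S = np.linalg.eig(A)
    kappa = np.linalg.cond(S)              # equals ‖S‖ ‖S^{-1}‖ in spectral norm
    gamma = 1.0 - np.max(np.abs(eigvals))  # ‖Λ‖ is the spectral radius here
    return kappa, gamma

A = np.array([[0.9, 0.2], [0.0, 0.8]])  # hypothetical; eigenvalues 0.9 and 0.8
kappa, gamma = kappa_gamma_certificate(A)

# The certificate implies ‖A^k‖ ≤ kappa (1 - gamma)^k for all k ≥ 0.
for k in range(20):
    bound = kappa * (1.0 - gamma) ** k
    assert np.linalg.norm(np.linalg.matrix_power(A, k), 2) <= bound + 1e-9
```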

To summarize, we assume that the underlying system lives in the following set.

Assumption 5.1. The unknown system Θ = (A, B, C) is a member of a set S such that

S ⊆ { Θ′ = (A′, B′, C′, F′) :  A′ is (κ₁, γ₁)-stable,  (A′, B′) is controllable,  (A′, C′) is observable,  (A′, F′) is controllable,  max(‖A′‖, ‖B′‖, ‖C′‖, ‖F′‖) ≤ ψ },

where ψ > 0, κ₁ > 0, and γ₁ ∈ (0, 1]. In particular, we assume that there exist constants κ₂, κ₃ > 0 and γ₂, γ₃ ∈ (0, 1] such that the systems are (κ₂, γ₂)-stabilizable as defined in Definition 3.2 and Ā′ ≔ A′ − F′C′ is (κ₃, γ₃)-stable for all Θ′ ∈ S. Note that (κ₂, γ₂)-stabilizability follows directly from the controllability of the system, and the closed-loop stability of A′ − F′C′ follows from the observability of the system; in other words, it can be considered the stabilizability property with respect to the filtering problem.

The behavior of an LQG control system or an ARX system is uniquely governed by its Markov parameters, i.e., impulse response.

Definition 5.3 (Markov Parameters). The matrices that map the previous inputs to the output are called the input-to-output Markov parameters, and the ones that map the previous outputs to the output are denoted as the output-to-output Markov parameters of the system Θ. In particular, for the dynamics in (5.1), the set of Markov parameters is defined as G^{u→y}_i = C A^{i−1} B for all i ≥ 1. For the predictor form or ARX systems given in (5.7), the matrices that map inputs and outputs to the output are the elements of the Markov operator G = {G^{u→y}_i, G^{y→y}_i}_{i≥1}, where for all i ≥ 1, G^{u→y}_i = C Ā^{i−1} B and G^{y→y}_i = C Ā^{i−1} F, which are unique.

In learning the system dynamics, we will aim to learn the Markov parameters of the system since they are uniquely identifiable.
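As a quick illustration of Definition 5.3, the sketch below (with hypothetical Ā, B, C, F) computes the Markov parameters of a predictor-form system (5.7) and confirms that, together with the innovation, they reproduce an output of the simulated recursion from the input-output history.

```python
import numpy as np

def markov_parameters(A_bar, B, C, F, H):
    """First H Markov parameters of the predictor form (5.7):
    G^{u->y}_i = C Ā^{i-1} B and G^{y->y}_i = C Ā^{i-1} F for i = 1..H."""
    G_u, G_y = [], []
    P = np.eye(A_bar.shape[0])   # holds Ā^{i-1}, starting from i = 1
    for _ in range(H):
        G_u.append(C @ P @ B)
        G_y.append(C @ P @ F)
        P = A_bar @ P
    return G_u, G_y

# Hypothetical predictor-form system (illustrative only).
A_bar = np.array([[0.5, 0.1], [0.0, 0.4]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
F = np.array([[0.3], [0.2]])

# Simulate (5.7) with random inputs and innovations, x_0 = 0.
rng = np.random.default_rng(1)
T = 10
u = rng.standard_normal((T, 1, 1))
e = rng.standard_normal((T, 1, 1))
x, ys = np.zeros((2, 1)), []
for t in range(T):
    y = C @ x + e[t]
    ys.append(y)
    x = A_bar @ x + B @ u[t] + F @ y

# y_t should equal e_t plus the Markov expansion over the history.
G_u, G_y = markov_parameters(A_bar, B, C, F, T)
t = T - 1
y_hat = e[t] + sum(G_u[i - 1] @ u[t - i] + G_y[i - 1] @ ys[t - i]
                   for i in range(1, t + 1))
print(np.allclose(ys[t], y_hat))  # → True
```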
