LEARNING AND CONTROL IN PARTIALLY OBSERVABLE LINEAR DYNAMICAL SYSTEMS
5.2 Open-Loop System Identification
In this section, we study open-loop system identification methods adopted in the literature. In order to minimize the regret given in (5.12), the learning agent needs to efficiently explore the environment to learn the system dynamics and exploit the gathered experience to minimize the overall cost [171]. However, since the underlying states of the system are not fully observable, learning the system dynamics with finite-time guarantees brings substantial challenges, making it a long-lasting problem in adaptive control. In particular, when the latent states of a system are not fully observable, future observations are correlated with the past inputs and observations through the latent states. These correlations are further magnified when closed-loop controllers, i.e., controllers that naturally use past experience to compute control inputs, are deployed. Therefore, more sophisticated estimation methods that account for these complicated and unknown correlations are required for learning the dynamics.
An Open-loop System Identification Method
In recent years, a series of works have studied this learning problem and presented a range of novel methods with finite-sample learning guarantees. These studies propose to employ i.i.d. Gaussian excitation as the control input, i.e., open-loop control, collect system outputs, and estimate the model parameters using the collected data. These methods study the system identification problem using the state-space representation (5.1) and aim to recover the input-to-output Markov parameters $G^{(i)}_{u\to y} = C A^{i-1} B$ introduced in Definition 5.3. The use of i.i.d. Gaussian noise as the open-loop control input (not using past experience) mitigates the correlation between the inputs and the output observations. For stable systems, these methods provide efficient ways to learn the model dynamics with confidence bounds of $\tilde{\mathcal{O}}(1/\sqrt{T})$ after $T$ time steps of agent-environment interaction [166, 213, 234, 245, 269]. Here $\tilde{\mathcal{O}}(\cdot)$ denotes the order up to logarithmic factors. Deploying i.i.d. Gaussian noise for a long period of time to estimate the model parameters has been common practice in adaptive control, since incorporating a closed-loop controller introduces significant challenges to learning the model dynamics [223].
In this section, we review one such open-loop system identification method and discuss why methods that use the state-space representation of the system (5.1) cannot provide reliable estimates in closed-loop estimation problems.
Using the state-space representation in (5.1), for any positive integer $H$, one can rewrite the output at time $t$ as follows,
$$y_t = \sum_{i=1}^{H} C A^{i-1} B u_{t-i} + C A^{H} x_{t-H} + z_t + \sum_{i=0}^{H-1} C A^{i} w_{t-i-1}. \tag{5.13}$$
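To make the unrolled expression concrete, here is a minimal numpy sketch (the system matrices, dimensions, and noise scales are arbitrary choices for illustration, not taken from the text) that simulates the recursion in (5.1) and checks that (5.13) reproduces the simulated output exactly, before the $C A^{H} x_{t-H}$ term is dropped.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p, H, T = 3, 2, 2, 5, 50           # latent, output, input dims; horizon; trajectory length

# A random stable system (spectral radius scaled below 1), purely illustrative.
A = rng.standard_normal((n, n)); A *= 0.9 / np.abs(np.linalg.eigvals(A)).max()
B = rng.standard_normal((n, p)); C = rng.standard_normal((m, n))

u = rng.standard_normal((T, p)); w = rng.standard_normal((T, n)); z = rng.standard_normal((T, m))
x = np.zeros((T + 1, n)); y = np.zeros((T, m))
for t in range(T):                        # simulate x_{t+1} = A x_t + B u_t + w_t,  y_t = C x_t + z_t
    y[t] = C @ x[t] + z[t]
    x[t + 1] = A @ x[t] + B @ u[t] + w[t]

t = T - 1                                 # check (5.13) at some t >= H
lhs = y[t]
rhs = sum(C @ np.linalg.matrix_power(A, i - 1) @ B @ u[t - i] for i in range(1, H + 1))
rhs = rhs + C @ np.linalg.matrix_power(A, H) @ x[t - H] + z[t]
rhs = rhs + sum(C @ np.linalg.matrix_power(A, i) @ w[t - i - 1] for i in range(H))
assert np.allclose(lhs, rhs)              # (5.13) holds exactly before the x_{t-H} term is neglected
```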
Recalling Definition 5.3, for $\kappa_{\mathcal{G}} \ge 1$, let the Markov operator of $\Theta$ be bounded, i.e., $\sum_{i \ge 0} \|G^{(i)}_{u\to y}\| \le \kappa_{\mathcal{G}}$. Due to Assumption 5.1, i.e., the stability of $A$, the second term in (5.13) decays exponentially, and for large enough $H$ it becomes negligible.
Therefore, we obtain the following for the output at time $t$,
$$y_t \approx \sum_{i=1}^{H} G^{(i)}_{u\to y} u_{t-i} + z_t + \sum_{i=0}^{H-1} C A^{i} w_{t-i-1}. \tag{5.14}$$
From this formulation, a least squares estimation problem can be formulated using the outputs as the dependent variable and the concatenation of $H$ input sequences $\bar{u}_t = [u_{t-1}, \ldots, u_{t-H}]$ as the regressor to recover the Markov parameters of the system:
$$\hat{\mathcal{G}}_{u\to y} = [\hat{G}^{(1)}_{u\to y}, \ldots, \hat{G}^{(H)}_{u\to y}] = \arg\min_{X} \sum_{t=H}^{T} \|y_t - X \bar{u}_t\|_2^2. \tag{5.15}$$
Prior finite-time system identification algorithms propose using i.i.d. zero-mean Gaussian noise for the input to ensure that the two noise terms in (5.14) are not correlated with the inputs. In particular, exciting the system with i.i.d. $u_t \sim \mathcal{N}(0, \sigma_u^2 I)$ for $1 \le t \le T_{\mathrm{exp}}$ removes the correlation between the regressor and the noise components in (5.14) and allows solving (5.15) in closed form with finite-time estimation error guarantees for the unknown input-to-output Markov parameters [161, 166, 213, 234, 244]. Note that besides this lack of correlation, the i.i.d. Gaussian control inputs persistently excite the system, which allows consistent estimation
of the Markov parameters. Interested readers can find the general analysis in [213], where Oymak and Ozay show that using i.i.d. Gaussian control inputs allows estimating the Markov parameters with the optimal rate of $\tilde{\mathcal{O}}(1/\sqrt{T_{\mathrm{exp}}})$, i.e.,
$$\|\hat{\mathcal{G}}_{u\to y} - \mathcal{G}_{u\to y}\| \le \frac{\kappa}{\sigma_u \sqrt{T_{\mathrm{exp}}}} \tag{5.16}$$
for some problem-dependent constant $\kappa$ after a large enough number of time steps $T_{\mathrm{exp}}$. This rate is the same error rate one would get from solving a linear regression problem with independent noise and independent covariates [106].
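The following sketch illustrates the regression in (5.15) end to end, assuming a small randomly generated stable system with illustrative dimensions of our own choosing: it excites the system with i.i.d. Gaussian inputs, stacks the regressors $\bar{u}_t$, solves the least squares problem with numpy, and compares the recovered blocks against the true Markov parameters $C A^{i-1} B$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, p, H, T_exp, sigma_u = 3, 2, 2, 10, 20000, 1.0     # illustrative sizes, not from the text

A = rng.standard_normal((n, n)); A *= 0.8 / np.abs(np.linalg.eigvals(A)).max()
B = rng.standard_normal((n, p)); C = rng.standard_normal((m, n))

# Open-loop excitation: u_t ~ N(0, sigma_u^2 I), independent of the process/measurement noise.
u = sigma_u * rng.standard_normal((T_exp, p))
w = 0.1 * rng.standard_normal((T_exp, n)); z = 0.1 * rng.standard_normal((T_exp, m))

x = np.zeros(n); Y, U_bar = [], []
for t in range(T_exp):
    y_t = C @ x + z[t]
    if t >= H:                                            # regressor  u_bar_t = [u_{t-1}, ..., u_{t-H}]
        U_bar.append(np.concatenate([u[t - i] for i in range(1, H + 1)]))
        Y.append(y_t)
    x = A @ x + B @ u[t] + w[t]

Y, U_bar = np.array(Y), np.array(U_bar)
# Least squares (5.15): solve  min_X  sum_t ||y_t - X u_bar_t||^2  via the stacked system.
G_hat = np.linalg.lstsq(U_bar, Y, rcond=None)[0].T        # shape (m, H*p) = [G^(1), ..., G^(H)]

G_true = np.hstack([C @ np.linalg.matrix_power(A, i - 1) @ B for i in range(1, H + 1)])
print(np.linalg.norm(G_hat - G_true, 2))                  # estimation error in operator norm
```

Increasing T_exp should shrink the reported error roughly at the $1/\sqrt{T_{\mathrm{exp}}}$ rate suggested by (5.16).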
Even though the Markov parameters uniquely determine the underlying system, to design the controller for the underlying system as described in Section 5.1.3, one needs to find a balanced realization of $\Theta$ from $\hat{\mathcal{G}}_{u\to y}$. To achieve this, the well-known subspace method, the Ho-Kalman algorithm, is the primary choice [112]. The Ho-Kalman algorithm is given in Algorithm 9. It takes the Markov parameter matrix estimate $\hat{\mathcal{G}}_{u\to y}$, $H$, the system order $n$, and the dimensions $d_1, d_2$ as input, and computes an order-$n$ system $\hat{\Theta} = (\hat{A}, \hat{B}, \hat{C})$. It is worth restating that the dimension of the latent state, $n$, is the order of the system for observable and controllable dynamics. Under the assumption that $H \ge 2n+1$, we pick $d_1 \ge n$ and $d_2 \ge n$ such that $d_1 + d_2 + 1 = H$. This guarantees that the system identification problem is well-conditioned.
Algorithm 9 Ho-Kalman Algorithm
1: Input: $\hat{\mathcal{G}}_{u\to y}$, $H$, system order $n$, $d_1, d_2$ such that $d_1 + d_2 + 1 = H$
2: Form the Hankel matrix $\hat{\mathcal{H}} \in \mathbb{R}^{m d_1 \times p(d_2+1)}$ from $\hat{\mathcal{G}}_{u\to y}$
3: Set $\hat{\mathcal{H}}^- \in \mathbb{R}^{m d_1 \times p d_2}$ as the first $p d_2$ columns of $\hat{\mathcal{H}}$
4: Using SVD, obtain $\hat{\mathcal{N}} \in \mathbb{R}^{m d_1 \times p d_2}$, the rank-$n$ approximation of $\hat{\mathcal{H}}^-$
5: Obtain $\mathbf{U}, \mathbf{\Sigma}, \mathbf{V} = \mathrm{SVD}(\hat{\mathcal{N}})$
6: Construct $\hat{\mathcal{O}} = \mathbf{U} \mathbf{\Sigma}^{1/2} \in \mathbb{R}^{m d_1 \times n}$
7: Construct $\hat{\mathcal{C}} = \mathbf{\Sigma}^{1/2} \mathbf{V} \in \mathbb{R}^{n \times p d_2}$
8: Obtain $\hat{C} \in \mathbb{R}^{m \times n}$, the first $m$ rows of $\hat{\mathcal{O}}$
9: Obtain $\hat{B} \in \mathbb{R}^{n \times p}$, the first $p$ columns of $\hat{\mathcal{C}}$
10: Obtain $\hat{\mathcal{H}}^+ \in \mathbb{R}^{m d_1 \times p d_2}$, the last $p d_2$ columns of $\hat{\mathcal{H}}$
11: Obtain $\hat{A} = \hat{\mathcal{O}}^\dagger \hat{\mathcal{H}}^+ \hat{\mathcal{C}}^\dagger \in \mathbb{R}^{n \times n}$
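A direct numpy transcription of Algorithm 9 might look as follows; the function name, signature, and dimension arguments are our own conventions for this sketch, and its output is only determined up to the similarity transformation discussed below.

```python
import numpy as np

def ho_kalman(G_hat, H, n, m, p, d1, d2):
    """Algorithm 9: recover an order-n realization from Markov parameter estimates.

    G_hat: (m, H*p) matrix [G^(1), ..., G^(H)], with d1 + d2 + 1 = H and d1, d2 >= n.
    """
    assert d1 + d2 + 1 == H
    blocks = [G_hat[:, i * p:(i + 1) * p] for i in range(H)]        # G^(1), ..., G^(H)
    # Step 2: (d1 x (d2+1))-block Hankel matrix; (i, j) block is G^(i+j-1) (1-indexed).
    Hank = np.block([[blocks[i + j] for j in range(d2 + 1)] for i in range(d1)])
    H_minus = Hank[:, :p * d2]                                       # Step 3: first p*d2 columns
    U, S, Vt = np.linalg.svd(H_minus, full_matrices=False)           # Steps 4-5: rank-n SVD
    U, S, Vt = U[:, :n], S[:n], Vt[:n, :]
    O_hat = U * np.sqrt(S)                                           # Step 6: observability factor
    C_cal = np.sqrt(S)[:, None] * Vt                                 # Step 7: controllability factor
    C_hat = O_hat[:m, :]                                             # Step 8: first m rows
    B_hat = C_cal[:, :p]                                             # Step 9: first p columns
    H_plus = Hank[:, -p * d2:]                                       # Step 10: last p*d2 columns
    A_hat = np.linalg.pinv(O_hat) @ H_plus @ np.linalg.pinv(C_cal)   # Step 11
    return A_hat, B_hat, C_hat
```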
Since only the order-$n$ input-output response of the system is uniquely identifiable [188], the system parameters $\Theta$ (even with the correct Markov parameter matrix $\mathcal{G}_{u\to y}$) are recovered only up to a similarity transformation. More precisely, for any invertible $\mathbf{T} \in \mathbb{R}^{n \times n}$, the system $A' = \mathbf{T}^{-1} A \mathbf{T}$, $B' = \mathbf{T}^{-1} B$, $C' = C \mathbf{T}$ gives the same Markov parameter matrix $\mathcal{G}_{u\to y}$, equivalently, the same input-output impulse response.
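This non-identifiability is easy to check numerically: for any invertible $\mathbf{T}$, the transformed realization below produces exactly the same Markov parameters (the matrices are again arbitrary illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, p, H = 3, 2, 2, 6
A = 0.5 * rng.standard_normal((n, n)); B = rng.standard_normal((n, p)); C = rng.standard_normal((m, n))
T_sim = rng.standard_normal((n, n))                        # any invertible similarity transform

A2, B2, C2 = np.linalg.inv(T_sim) @ A @ T_sim, np.linalg.inv(T_sim) @ B, C @ T_sim
markov = lambda A_, B_, C_: np.hstack([C_ @ np.linalg.matrix_power(A_, i) @ B_ for i in range(H)])
assert np.allclose(markov(A, B, C), markov(A2, B2, C2))    # identical input-output response
```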
For $H \ge 2n+1$, using $[\hat{G}^{(1)}_{u\to y}, \ldots, \hat{G}^{(H)}_{u\to y}] \in \mathbb{R}^{m \times H p}$, the Ho-Kalman algorithm constructs a $(d_1 \times (d_2+1))$-block Hankel matrix $\hat{\mathcal{H}} \in \mathbb{R}^{m d_1 \times (d_2+1) p}$ such that the $(i, j)$th block of the Hankel matrix is $\hat{G}^{(i+j-1)}_{u\to y}$. It is worth noting that if the input to the algorithm were $\mathcal{G}_{u\to y}$, then the corresponding Hankel matrix $\mathcal{H}$ would be rank $n$; more importantly,
$$\mathcal{H} = [C^\top \;\; (CA)^\top \;\cdots\; (CA^{d_1-1})^\top]^\top \, [B \;\; AB \;\cdots\; A^{d_2} B] = \mathcal{O}\,[\mathcal{C} \;\; A^{d_2} B] = \mathcal{O}\,[B \;\; A\mathcal{C}],$$
where $\mathcal{O}$ and $\mathcal{C}$ are the observability and controllability matrices, respectively. Essentially, the Ho-Kalman algorithm estimates these matrices using $\hat{\mathcal{G}}_{u\to y}$. In order to estimate $\mathcal{O}$ and $\mathcal{C}$, the algorithm constructs $\hat{\mathcal{H}}^-$, the first $p d_2$ columns of $\hat{\mathcal{H}}$, and calculates $\hat{\mathcal{N}}$, the best rank-$n$ approximation of $\hat{\mathcal{H}}^-$. The singular value decomposition of $\hat{\mathcal{N}}$ then provides the estimates of $\mathcal{O}$ and $\mathcal{C}$, i.e., $\hat{\mathcal{N}} = \mathbf{U}\mathbf{\Sigma}^{1/2}\,\mathbf{\Sigma}^{1/2}\mathbf{V} = \hat{\mathcal{O}} \hat{\mathcal{C}}$. From these estimates, the algorithm recovers $\hat{B}$ as the first $n \times p$ block of $\hat{\mathcal{C}}$, $\hat{C}$ as the first $m \times n$ block of $\hat{\mathcal{O}}$, and $\hat{A}$ as $\hat{\mathcal{O}}^\dagger \hat{\mathcal{H}}^+ \hat{\mathcal{C}}^\dagger$, where $\hat{\mathcal{H}}^+$ is the submatrix of $\hat{\mathcal{H}}$ obtained by discarding its left-most $m d_1 \times p$ block.
Note that if we feed $\mathcal{G}_{u\to y}$ to the Ho-Kalman algorithm, then $\mathcal{H}^-$, the first $p d_2$ columns of $\mathcal{H}$, is rank-$n$ and $\mathcal{N} = \mathcal{H}^-$. Using the outputs of the Ho-Kalman algorithm, i.e., $(\hat{A}, \hat{B}, \hat{C})$, we can construct confidence sets centered around these outputs that contain a similarity transformation of the system parameters $\Theta = (A, B, C)$ with high probability. Theorem 5.1 states the construction of the confidence sets; it is a slight modification of Corollary 5.4 of Oymak and Ozay [213].
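As a sanity check that ties the pieces together, the sketch below (reusing the hypothetical ho_kalman function defined in the earlier sketch, with illustrative dimensions) feeds the exact Markov parameters $\mathcal{G}_{u\to y}$ to the algorithm and verifies that the recovered realization reproduces the same input-output response, i.e., it agrees with $(A, B, C)$ up to a similarity transformation.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, p = 3, 2, 2
d1 = d2 = n                                    # d1, d2 >= n, so H = d1 + d2 + 1 >= 2n + 1
H = d1 + d2 + 1

A = rng.standard_normal((n, n)); A *= 0.8 / np.abs(np.linalg.eigvals(A)).max()
B = rng.standard_normal((n, p)); C = rng.standard_normal((m, n))

G = np.hstack([C @ np.linalg.matrix_power(A, i - 1) @ B for i in range(1, H + 1)])
A_bar, B_bar, C_bar = ho_kalman(G, H, n, m, p, d1, d2)     # exact Markov parameters as input

G_bar = np.hstack([C_bar @ np.linalg.matrix_power(A_bar, i - 1) @ B_bar for i in range(1, H + 1)])
assert np.allclose(G, G_bar)                   # same impulse response: (A_bar, B_bar, C_bar) ~ (A, B, C)
```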
Theorem 5.1 (Confidence Set Construction). Suppose $\mathcal{H}$ is the rank-$n$ Hankel matrix obtained from $\mathcal{G}_{u\to y}$. Let $\bar{A}, \bar{B}, \bar{C}$ be the system parameters that the Ho-Kalman algorithm provides for $\mathcal{G}_{u\to y}$. Define the rank-$n$ matrix $\mathcal{N}$ to be the submatrix of $\mathcal{H}$ obtained by discarding the last block column of $\mathcal{H}$. Suppose $\sigma_n(\mathcal{N}) > 0$ and $\|\mathcal{N} - \hat{\mathcal{N}}\| \le \frac{\sigma_n(\mathcal{N})}{2}$. Then, there exists a unitary matrix $\mathbf{T} \in \mathbb{R}^{n \times n}$ such that $\bar{\Theta} = (\bar{A}, \bar{B}, \bar{C}) \in (\mathcal{C}_A \times \mathcal{C}_B \times \mathcal{C}_C)$ for
$$\mathcal{C}_A = \left\{ A' \in \mathbb{R}^{n \times n} : \|\hat{A} - \mathbf{T}^\top A' \mathbf{T}\| \le \left( \frac{31\, n\, \|\mathcal{H}\|}{\sigma_n^2(\mathcal{H})} + \frac{13\, n}{2\, \sigma_n(\mathcal{H})} \right) \|\hat{\mathcal{G}}_{u\to y} - \mathcal{G}_{u\to y}\| \right\},$$
$$\mathcal{C}_B = \left\{ B' \in \mathbb{R}^{n \times p} : \|\hat{B} - \mathbf{T}^\top B'\| \le \frac{7\, n}{\sqrt{\sigma_n(\mathcal{H})}} \|\hat{\mathcal{G}}_{u\to y} - \mathcal{G}_{u\to y}\| \right\},$$
$$\mathcal{C}_C = \left\{ C' \in \mathbb{R}^{m \times n} : \|\hat{C} - C' \mathbf{T}\| \le \frac{7\, n}{\sqrt{\sigma_n(\mathcal{H})}} \|\hat{\mathcal{G}}_{u\to y} - \mathcal{G}_{u\to y}\| \right\},$$
where $\hat{A}, \hat{B}, \hat{C}$ are obtained from the Ho-Kalman algorithm using the least squares estimate of the Markov parameter matrix $\hat{\mathcal{G}}_{u\to y}$.
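Purely for convenience, the radii in Theorem 5.1 can be transcribed into code. The constants below are copied verbatim from the statement above; eps stands for an upper bound on $\|\hat{\mathcal{G}}_{u\to y} - \mathcal{G}_{u\to y}\|$ (e.g., the right-hand side of (5.16)), and in practice the singular values of the true Hankel matrix $\mathcal{H}$ would themselves have to be approximated from the estimated one.

```python
import numpy as np

def confidence_radii(Hank, n, eps):
    """Radii of the confidence sets C_A, C_B, C_C in Theorem 5.1.

    Hank: Hankel matrix (ideally of the true Markov parameters, approximated in practice),
    n:    system order,
    eps:  upper bound on ||G_hat - G||, e.g., from (5.16).
    """
    s = np.linalg.svd(Hank, compute_uv=False)
    sigma_n, spec_norm = s[n - 1], s[0]                  # n-th singular value and operator norm
    beta_A = (31 * n * spec_norm / sigma_n**2 + 13 * n / (2 * sigma_n)) * eps
    beta_B = beta_C = 7 * n / np.sqrt(sigma_n) * eps
    return beta_A, beta_B, beta_C
```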
Proof. The proof is similar to the proof of Theorem 4.3 in [213]. The difference in the presentation arises from providing a different characterization of the dependence on $\|\mathcal{N} - \hat{\mathcal{N}}\|$ and from centering the confidence balls at the estimates rather than at the output of the Ho-Kalman algorithm with input $\mathcal{G}_{u\to y}$. In Oymak and Ozay [213], from the inequality
$$\|\bar{B} - \mathbf{T}^\top \hat{B}\|_F^2 \le \frac{2\, n\, \|\mathcal{N} - \hat{\mathcal{N}}\|^2}{(\sqrt{2}-1)\left(\sigma_n(\mathcal{N}) - \|\mathcal{N} - \hat{\mathcal{N}}\|\right)},$$
the authors use the assumption $\|\mathcal{N} - \hat{\mathcal{N}}\| \le \frac{\sigma_n(\mathcal{N})}{2}$ to cancel the numerator against the denominator. In this presentation, we define $T_N$ such that for a large enough exploration time $T_{\mathrm{exp}}$, i.e., $T_{\mathrm{exp}} \ge T_N$, we have $\|\mathcal{N} - \hat{\mathcal{N}}\| \le \frac{\sigma_n(\mathcal{N})}{2}$ with high probability. See [166] for the precise expression of $T_N$. Note that $T_N$ depends on $\sigma_n(\mathcal{H})$, due to the fact that the singular values of submatrices obtained by column partitioning are interlaced, i.e., $\sigma_n(\mathcal{N}) = \sigma_n(\mathcal{H}^-) \ge \sigma_n(\mathcal{H})$. Then, we rewrite the denominator in terms of $\sigma_n(\mathcal{N})$ and again use the fact that $\sigma_n(\mathcal{N}) = \sigma_n(\mathcal{H}^-) \ge \sigma_n(\mathcal{H})$. Following the proof steps provided in Oymak and Ozay [213] and combining them with the fact that
$$\|\mathcal{N} - \hat{\mathcal{N}}\| \le 2\, \|\mathcal{H}^- - \hat{\mathcal{H}}^-\| \le 2\sqrt{\min\{d_1, d_2\}}\; \|\hat{\mathcal{G}}_{u\to y} - \mathcal{G}_{u\to y}\|$$
(see Lemma B.1 of [213]), we obtain the presented theorem. □
Combining Theorem 5.1 with (5.16) shows that, using the open-loop system identification method, a balanced realization of $\Theta$ can be recovered at the optimal estimation rate with high probability. However, when a controller designs the inputs based on the history of inputs and observations, the inputs become highly correlated with the past process noise sequence $\{w_i\}_{i=0}^{t-1}$. This correlation prevents the consistent and reliable estimation of the Markov parameters using (5.15). Therefore, these prior open-loop estimation methods do not generalize to settings in which adaptive controllers generate the inputs used for estimation, i.e., closed-loop estimation. For this very reason, open-loop system identification techniques have only been deployed to propose explore-then-commit-based adaptive control algorithms to minimize regret, as discussed at the beginning of this chapter. In the following section, we provide a closed-loop system identification algorithm that alleviates the correlations between the covariates and the noise sequences by considering the predictor form of the system dynamics (5.7) rather than the state-space form (5.1).