
5.2 Open-Loop System Identification

In this section, we study the open-loop system identification methods that are adopted in the literature. In order to minimize the regret given in (5.12), the learning agent needs to efficiently explore the environment to learn the system dynamics, and exploit the gathered experience to minimize the overall cost [171]. However, since the underlying states of the system are not fully observable, learning the system dynamics with finite-time guarantees brings substantial challenges, making it a long-standing problem in adaptive control. In particular, when the latent states of a system are not fully observable, future observations are correlated with past inputs and observations through the latent states. These correlations are magnified further when closed-loop controllers, i.e., controllers that naturally use past experience to compute control inputs, are deployed. Therefore, more sophisticated estimation methods that account for these complicated and unknown correlations are required for learning the dynamics.

An Open-Loop System Identification Method

In recent years, a series of works have studied this learning problem and presented a range of novel methods with finite-sample learning guarantees. These studies propose to employ i.i.d. Gaussian excitation as the control input, i.e., open-loop control, collect the system outputs, and estimate the model parameters from the collected data. These methods study the system identification problem using the state-space representation (5.1) and aim to recover the input-to-output Markov parameters $G^{u\to y}_i = C A^{i-1} B$ introduced in Definition 5.3. The use of i.i.d. Gaussian noise as the open-loop control input (not using past experience) mitigates the correlation between the inputs and the output observations. For stable systems, these methods provide efficient ways to learn the model dynamics with confidence bounds of $\tilde{\mathcal{O}}(1/\sqrt{T})$ after $T$ time steps of agent-environment interaction [166, 213, 234, 245, 269]. Here $\tilde{\mathcal{O}}(\cdot)$ denotes the order up to logarithmic factors. Deploying i.i.d. Gaussian noise for a long period of time to estimate the model parameters has been the common practice in adaptive control, since incorporating a closed-loop controller introduces significant challenges to learning the model dynamics [223].

In this section, we review one such open-loop system identification method and discuss why methods that use the state-space representation of the system (5.1) cannot provide reliable estimates in closed-loop estimation problems.

Using the state-space representation in (5.1), for any positive integer $H$, one can rewrite the output at time $t$ as follows:

$$y_t = \sum_{i=1}^{H} C A^{i-1} B u_{t-i} + C A^{H} x_{t-H} + z_t + \sum_{i=0}^{H-1} C A^{i} w_{t-i-1}. \qquad (5.13)$$

Recalling Definition 5.3, for $\kappa_{\mathcal{G}} \geq 1$, let the Markov operator of $\Theta$ be bounded, i.e., $\sum_{i \geq 0} \|G^{u\to y}_i\| \leq \kappa_{\mathcal{G}}$. Due to Assumption 5.1, i.e., the stability of $A$, the second term in (5.13) decays exponentially, and for large enough $H$ it becomes negligible.

Therefore, we obtain the following approximation for the output at time $t$:

$$y_t \approx \sum_{i=1}^{H} G^{u\to y}_i u_{t-i} + z_t + \sum_{i=0}^{H-1} C A^{i} w_{t-i-1}. \qquad (5.14)$$

From this formulation, a least squares estimation problem can be formulated using the outputs as the dependent variable and the concatenation of $H$ input sequences $\bar{u}_t = [u_{t-1}, \ldots, u_{t-H}]$ as the regressor to recover the Markov parameters of the system:

$$\widehat{\mathbf{G}}^{u\to y} = [\widehat{G}^{u\to y}_1, \ldots, \widehat{G}^{u\to y}_H] = \arg\min_{X} \sum_{t=H}^{T} \|y_t - X \bar{u}_t\|_2^2. \qquad (5.15)$$

Prior finite-time system identification algorithms propose using i.i.d. zero-mean Gaussian noise for the input, to make sure that the two noise terms in (5.14) are not correlated with the inputs.

In particular, exciting the system with i.i.d. inputs $u_t \sim \mathcal{N}(0, \sigma_u^2 I)$ for $1 \leq t \leq T_{\mathrm{exp}}$ provides a lack of correlation between the regressor and the noise components in (5.14) and allows solving (5.15) in closed form with finite-time estimation error guarantees for the unknown input-to-output Markov parameters [161, 166, 213, 234, 244]. Note that, besides this lack of correlation, the i.i.d. Gaussian control inputs persistently excite the system, which allows consistent estimation of the Markov parameters. Interested readers can find the general analysis in [213], where Oymak and Ozay show that using i.i.d. Gaussian control inputs allows estimating the Markov parameters at the optimal rate of $\tilde{\mathcal{O}}(1/\sqrt{T_{\mathrm{exp}}})$, i.e.,

$$\|\widehat{\mathbf{G}}^{u\to y} - \mathbf{G}^{u\to y}\| \leq \frac{c}{\sigma_u \sqrt{T_{\mathrm{exp}}}} \qquad (5.16)$$

for some problem-dependent constant $c$ after a large enough number of time steps $T_{\mathrm{exp}}$. This is the same error rate one would obtain from solving a linear regression problem with independent noise and independent covariates [106].
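To make this open-loop procedure concrete, the following minimal numpy sketch simulates a small synthetic system, excites it with i.i.d. Gaussian inputs, and recovers the Markov parameters by solving (5.15). The system matrices, noise levels, and the choices of $H$, $T_{\mathrm{exp}}$, and $\sigma_u$ are illustrative assumptions rather than values taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small stable system (A, B, C), used only for illustration.
n, p, m = 3, 2, 2                                # state, input, output dimensions
A = 0.7 * rng.standard_normal((n, n))
A /= max(1.0, np.max(np.abs(np.linalg.eigvals(A))) / 0.9)   # crude rescaling to enforce stability
B = rng.standard_normal((n, p))
C = rng.standard_normal((m, n))

H, T_exp, sigma_u = 10, 5000, 1.0                # truncation length and exploration budget (assumed)

# Open-loop exploration: i.i.d. Gaussian inputs u_t ~ N(0, sigma_u^2 I).
U = sigma_u * rng.standard_normal((T_exp, p))
W = 0.1 * rng.standard_normal((T_exp, n))        # process noise w_t
Z = 0.1 * rng.standard_normal((T_exp, m))        # measurement noise z_t

x = np.zeros(n)
Y = np.zeros((T_exp, m))
for t in range(T_exp):
    Y[t] = C @ x + Z[t]                          # y_t = C x_t + z_t
    x = A @ x + B @ U[t] + W[t]                  # x_{t+1} = A x_t + B u_t + w_t

# Least squares (5.15): regress y_t on the stacked past inputs u_bar_t = [u_{t-1}, ..., u_{t-H}].
U_bar = np.array([U[t - H:t][::-1].ravel() for t in range(H, T_exp)])
G_hat = np.linalg.lstsq(U_bar, Y[H:], rcond=None)[0].T      # m x (H p): blocks G_hat_1, ..., G_hat_H

# Compare against the true Markov parameters G_i = C A^{i-1} B.
G_true = np.hstack([C @ np.linalg.matrix_power(A, i - 1) @ B for i in range(1, H + 1)])
print("spectral-norm estimation error:", np.linalg.norm(G_hat - G_true, 2))
```

Increasing T_exp in this sketch should shrink the reported error roughly at the $1/\sqrt{T_{\mathrm{exp}}}$ rate suggested by (5.16).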

Even though the Markov parameters uniquely determine the underlying system, to design the controller for the underlying system as described in Section 5.1.3, one needs to find a balanced realization of $\Theta$ from $\widehat{\mathbf{G}}^{u\to y}$. To achieve this, the well-known Ho-Kalman subspace algorithm is the primary choice [112]. The Ho-Kalman algorithm is given in Algorithm 9. It takes the Markov parameter matrix estimate $\widehat{\mathbf{G}}^{u\to y}$, $H$, the system order $n$, and the dimensions $d_1, d_2$ as the input, and computes an order-$n$ system $\hat{\Theta} = (\hat{A}, \hat{B}, \hat{C})$. It is worth restating that the dimension of the latent state, $n$, is the order of the system for observable and controllable dynamics. Under the assumption that $H \geq 2n+1$, we pick $d_1 \geq n$ and $d_2 \geq n$ such that $d_1 + d_2 + 1 = H$. This guarantees that the system identification problem is well-conditioned.

Algorithm 9 Ho-Kalman Algorithm

1: Input: $\widehat{\mathbf{G}}^{u\to y}$, $H$, system order $n$, $d_1, d_2$ such that $d_1 + d_2 + 1 = H$
2: Form the Hankel matrix $\widehat{\mathcal{H}} \in \mathbb{R}^{m d_1 \times p(d_2+1)}$ from $\widehat{\mathbf{G}}^{u\to y}$
3: Set $\widehat{\mathcal{H}}^{-} \in \mathbb{R}^{m d_1 \times p d_2}$ as the first $p d_2$ columns of $\widehat{\mathcal{H}}$
4: Using SVD, obtain $\widehat{\mathcal{N}} \in \mathbb{R}^{m d_1 \times p d_2}$, the rank-$n$ approximation of $\widehat{\mathcal{H}}^{-}$
5: Obtain $\mathbf{U}, \boldsymbol{\Sigma}, \mathbf{V} = \mathrm{SVD}(\widehat{\mathcal{N}})$
6: Construct $\widehat{\mathbf{O}} = \mathbf{U}\boldsymbol{\Sigma}^{1/2} \in \mathbb{R}^{m d_1 \times n}$
7: Construct $\widehat{\mathbf{C}} = \boldsymbol{\Sigma}^{1/2}\mathbf{V} \in \mathbb{R}^{n \times p d_2}$
8: Obtain $\hat{C} \in \mathbb{R}^{m \times n}$, the first $m$ rows of $\widehat{\mathbf{O}}$
9: Obtain $\hat{B} \in \mathbb{R}^{n \times p}$, the first $p$ columns of $\widehat{\mathbf{C}}$
10: Obtain $\widehat{\mathcal{H}}^{+} \in \mathbb{R}^{m d_1 \times p d_2}$, the last $p d_2$ columns of $\widehat{\mathcal{H}}$
11: Obtain $\hat{A} = \widehat{\mathbf{O}}^{\dagger} \widehat{\mathcal{H}}^{+} \widehat{\mathbf{C}}^{\dagger} \in \mathbb{R}^{n \times n}$

Since only the order-$n$ input-output response of the system is uniquely identifiable [188], the system parameters $\Theta$ (even with the correct Markov parameter matrix $\mathbf{G}^{u\to y}$) are recovered up to a similarity transformation. More generally, for any invertible $\mathbf{T} \in \mathbb{R}^{n\times n}$, the system $A' = \mathbf{T}^{-1} A \mathbf{T}$, $B' = \mathbf{T}^{-1} B$, $C' = C \mathbf{T}$ gives the same Markov parameter matrix $\mathbf{G}^{u\to y}$, equivalently, the same input-output impulse response.
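As a quick numerical sanity check of this equivalence, the short sketch below applies an arbitrary random similarity transformation to a random system (both assumed purely for illustration) and verifies that the Markov parameters are unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, m, H = 3, 2, 2, 6                               # illustrative dimensions
A, B, C = rng.standard_normal((n, n)), rng.standard_normal((n, p)), rng.standard_normal((m, n))
T = rng.standard_normal((n, n))                       # a random T is invertible with probability 1
Ap, Bp, Cp = np.linalg.inv(T) @ A @ T, np.linalg.inv(T) @ B, C @ T

def markov(A, B, C, H):
    """Markov parameters G_i = C A^{i-1} B for i = 1, ..., H."""
    return [C @ np.linalg.matrix_power(A, i - 1) @ B for i in range(1, H + 1)]

# The transformed system (A', B', C') yields identical input-to-output Markov parameters.
print(all(np.allclose(G1, G2) for G1, G2 in zip(markov(A, B, C, H), markov(Ap, Bp, Cp, H))))
```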

For $H \geq 2n+1$, using $[\widehat{G}^{u\to y}_1, \ldots, \widehat{G}^{u\to y}_H] \in \mathbb{R}^{m \times Hp}$, the Ho-Kalman algorithm constructs an $(n \times (n+1))$-block Hankel matrix $\widehat{\mathcal{H}} \in \mathbb{R}^{nm \times (n+1)p}$ such that the $(i,j)$th block of the Hankel matrix is $\widehat{G}^{u\to y}_{i+j-1}$. It is worth noting that if the input to the algorithm were $\mathbf{G}^{u\to y}$, then the corresponding Hankel matrix $\mathcal{H}$ would be rank $n$; more importantly,

$$\mathcal{H} = \left[C^{\top} \ (CA)^{\top} \ \cdots \ (CA^{n-1})^{\top}\right]^{\top} \left[B \ \ AB \ \cdots \ A^{n}B\right] = \mathbf{O}\left[\mathbf{C} \ \ A^{n}B\right] = \mathbf{O}\left[B \ \ A\mathbf{C}\right],$$

where $\mathbf{O}$ and $\mathbf{C}$ are the observability and controllability matrices, respectively. Essentially, the Ho-Kalman algorithm estimates these matrices using $\widehat{\mathbf{G}}^{u\to y}$. In order to estimate $\mathbf{O}$ and $\mathbf{C}$, the algorithm constructs $\widehat{\mathcal{H}}^{-}$, the first $np$ columns of $\widehat{\mathcal{H}}$, and calculates $\widehat{\mathcal{N}}$, the best rank-$n$ approximation of $\widehat{\mathcal{H}}^{-}$. The singular value decomposition of $\widehat{\mathcal{N}}$ then provides the estimates of $\mathbf{O}$ and $\mathbf{C}$, i.e., $\widehat{\mathcal{N}} = \mathbf{U}\boldsymbol{\Sigma}^{1/2}\boldsymbol{\Sigma}^{1/2}\mathbf{V} = \widehat{\mathbf{O}}\widehat{\mathbf{C}}$. From these estimates, the algorithm recovers $\hat{B}$ as the first $n \times p$ block of $\widehat{\mathbf{C}}$, $\hat{C}$ as the first $m \times n$ block of $\widehat{\mathbf{O}}$, and $\hat{A}$ as $\widehat{\mathbf{O}}^{\dagger}\widehat{\mathcal{H}}^{+}\widehat{\mathbf{C}}^{\dagger}$, where $\widehat{\mathcal{H}}^{+}$ is the submatrix of $\widehat{\mathcal{H}}$ obtained by discarding its left-most $nm \times p$ block.
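To complement the description above, here is a minimal numpy sketch of Algorithm 9. The function name is illustrative, the true order $n$ is assumed known, and no attempt is made to handle noise beyond the rank-$n$ truncation.

```python
import numpy as np

def ho_kalman(G_hat, m, p, n, d1, d2):
    """Minimal sketch of Algorithm 9: recover (A_hat, B_hat, C_hat) from an
    m x (H p) estimate G_hat = [G_1, ..., G_H] of the Markov parameters,
    with H = d1 + d2 + 1."""
    H = d1 + d2 + 1
    blocks = [G_hat[:, i * p:(i + 1) * p] for i in range(H)]              # m x p blocks G_1, ..., G_H
    # Step 2: (d1 x (d2+1))-block Hankel matrix whose (i, j)th block is G_{i+j-1}.
    Hank = np.block([[blocks[i + j] for j in range(d2 + 1)] for i in range(d1)])
    H_minus = Hank[:, :p * d2]                                            # step 3: drop the last block column
    # Steps 4-5: best rank-n approximation of H_minus via a truncated SVD.
    U, S, Vt = np.linalg.svd(H_minus, full_matrices=False)
    U, S, Vt = U[:, :n], S[:n], Vt[:n]
    O_hat = U * np.sqrt(S)                                                # step 6: O_hat = U Sigma^{1/2}
    C_ctr_hat = np.sqrt(S)[:, None] * Vt                                  # step 7: C_hat = Sigma^{1/2} V
    C_hat = O_hat[:m]                                                     # step 8: first m rows of O_hat
    B_hat = C_ctr_hat[:, :p]                                              # step 9: first p columns of C_hat
    H_plus = Hank[:, p:]                                                  # step 10: last p*d2 columns of Hank
    A_hat = np.linalg.pinv(O_hat) @ H_plus @ np.linalg.pinv(C_ctr_hat)    # step 11
    return A_hat, B_hat, C_hat
```

For instance, feeding the least squares estimate from (5.15), truncated to its first $2n+1$ blocks, with $d_1 = d_2 = n$ should return an order-$n$ realization that matches the true system up to a similarity transformation in the noiseless limit.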

Note that if we feed $\mathbf{G}^{u\to y}$ to the Ho-Kalman algorithm, then $\mathcal{H}^{-}$ is the first $np$ columns of $\mathcal{H}$, it is rank $n$, and $\mathcal{N} = \mathcal{H}^{-}$. Using the outputs of the Ho-Kalman algorithm, i.e., $(\hat{A}, \hat{B}, \hat{C})$, we can construct confidence sets centered around these outputs that contain a similarity transformation of the system parameters $\Theta = (A, B, C)$ with high probability. Theorem 5.1 states the construction of the confidence sets; it is a slight modification of Corollary 5.4 of Oymak and Ozay [213].

Theorem 5.1 (Confidence Set Construction). Suppose $\mathcal{H}$ is the rank-$n$ Hankel matrix obtained from $\mathbf{G}^{u\to y}$. Let $\bar{A}, \bar{B}, \bar{C}$ be the system parameters that the Ho-Kalman algorithm provides for $\mathbf{G}^{u\to y}$. Define the rank-$n$ matrix $\mathcal{N}$ as the submatrix of $\mathcal{H}$ obtained by discarding the last block column of $\mathcal{H}$. Suppose $\sigma_n(\mathcal{N}) > 0$ and $\|\mathcal{N} - \widehat{\mathcal{N}}\| \leq \frac{\sigma_n(\mathcal{N})}{2}$. Then, there exists a unitary matrix $\mathbf{T} \in \mathbb{R}^{n\times n}$ such that $\bar{\Theta} = (\bar{A}, \bar{B}, \bar{C}) \in (\mathcal{C}_A \times \mathcal{C}_B \times \mathcal{C}_C)$ for

$$\mathcal{C}_A = \left\{ A' \in \mathbb{R}^{n\times n} : \|\hat{A} - \mathbf{T}^{\top} A' \mathbf{T}\| \leq \left( \frac{31 n \|\mathcal{H}\|}{\sigma_n^2(\mathcal{H})} + \frac{13 n}{2\sigma_n(\mathcal{H})} \right) \|\widehat{\mathbf{G}}^{u\to y} - \mathbf{G}^{u\to y}\| \right\}$$

$$\mathcal{C}_B = \left\{ B' \in \mathbb{R}^{n\times p} : \|\hat{B} - \mathbf{T}^{\top} B'\| \leq \frac{7 n}{\sqrt{\sigma_n(\mathcal{H})}} \|\widehat{\mathbf{G}}^{u\to y} - \mathbf{G}^{u\to y}\| \right\}$$

$$\mathcal{C}_C = \left\{ C' \in \mathbb{R}^{m\times n} : \|\hat{C} - C' \mathbf{T}\| \leq \frac{7 n}{\sqrt{\sigma_n(\mathcal{H})}} \|\widehat{\mathbf{G}}^{u\to y} - \mathbf{G}^{u\to y}\| \right\},$$

where $\hat{A}, \hat{B}, \hat{C}$ are obtained from the Ho-Kalman algorithm using the least squares estimate of the Markov parameter matrix $\widehat{\mathbf{G}}^{u\to y}$.

Proof. The proof is similar to the proof of Theorem 4.3 in [213]. The difference in the presentation arises from providing a different characterization of the dependence on $\|\mathcal{N} - \widehat{\mathcal{N}}\|$ and from centering the confidence balls at the estimates rather than at the output of the Ho-Kalman algorithm with input $\mathbf{G}^{u\to y}$. In Oymak and Ozay [213], from the inequality

$$\|\bar{B} - \mathbf{T}^{\top}\hat{B}\|_F^2 \leq \frac{2 n \|\mathcal{N} - \widehat{\mathcal{N}}\|^2}{(\sqrt{2}-1)\left(\sigma_n(\mathcal{N}) - \|\mathcal{N} - \widehat{\mathcal{N}}\|\right)},$$

the authors use the assumption $\|\mathcal{N} - \widehat{\mathcal{N}}\| \leq \frac{\sigma_n(\mathcal{N})}{2}$ to cancel the numerator against the denominator. In this presentation, we define $T_N$ such that for a large enough exploration time $T_{\mathrm{exp}}$ with $T_{\mathrm{exp}} \geq T_N$, we have $\|\mathcal{N} - \widehat{\mathcal{N}}\| \leq \frac{\sigma_n(\mathcal{N})}{2}$ with high probability. See [166] for the precise expression of $T_N$. Note that $T_N$ depends on $\sigma_n(\mathcal{H})$, due to the fact that the singular values of submatrices obtained by column partitioning are interlaced, i.e., $\sigma_n(\mathcal{N}) = \sigma_n(\mathcal{H}^{-}) \geq \sigma_n(\mathcal{H})$. Then, we rewrite the denominator in terms of $\sigma_n(\mathcal{N})$ and again use the fact that $\sigma_n(\mathcal{N}) = \sigma_n(\mathcal{H}^{-}) \geq \sigma_n(\mathcal{H})$. Following the proof steps provided in Oymak and Ozay [213] and combining them with the fact that

$$\|\mathcal{N} - \widehat{\mathcal{N}}\| \leq 2\|\mathcal{H}^{-} - \widehat{\mathcal{H}}^{-}\| \leq 2\sqrt{\min\{d_1, d_2\}}\, \|\widehat{\mathbf{G}}^{u\to y} - \mathbf{G}^{u\to y}\|$$

(see Lemma B.1 of [213]), we obtain the presented theorem. β–‘
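As a rough numerical companion to Theorem 5.1 (a sketch, not part of the original analysis), the helper below evaluates the three confidence radii for a given upper bound on $\|\widehat{\mathbf{G}}^{u\to y} - \mathbf{G}^{u\to y}\|$. In practice $\|\mathcal{H}\|$ and $\sigma_n(\mathcal{H})$ are unknown and would be replaced by quantities computed from $\widehat{\mathcal{H}}$; the constants are simply those stated in the theorem.

```python
import numpy as np

def confidence_radii(markov_err, Hank, n):
    """Radii of the confidence sets C_A, C_B, C_C in Theorem 5.1 (illustrative sketch).

    markov_err : assumed upper bound on ||G_hat - G|| (e.g., the right-hand side of (5.16))
    Hank       : Hankel matrix built from the Markov parameters (or its estimate, as a proxy)
    n          : system order
    """
    svals = np.linalg.svd(Hank, compute_uv=False)
    sigma_n, H_norm = svals[n - 1], svals[0]           # n-th singular value and spectral norm ||H||
    beta_A = (31 * n * H_norm / sigma_n**2 + 13 * n / (2 * sigma_n)) * markov_err
    beta_B = 7 * n / np.sqrt(sigma_n) * markov_err
    beta_C = beta_B                                    # C_B and C_C share the same radius
    return beta_A, beta_B, beta_C
```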

Combining Theorem 5.1 with (5.16) shows that, using the open-loop system identification method, a balanced realization of $\Theta$ can be recovered at the optimal estimation rate with high probability. However, when a controller designs the inputs based on the history of inputs and observations, the inputs become highly correlated with the past process noise sequence $\{w_i\}_{i=0}^{t-1}$. This correlation prevents consistent and reliable estimation of the Markov parameters via (5.15). Therefore, these prior open-loop estimation methods do not generalize to settings in which adaptive controllers generate the inputs used for estimation, i.e., closed-loop estimation. For this very reason, open-loop system identification techniques have only been deployed to propose explore-then-commit-based adaptive control algorithms to minimize regret, as discussed at the beginning of this chapter. In the following section, we provide a closed-loop system identification algorithm that alleviates the correlations between the covariates and the noise sequences by considering the predictor form of the system dynamics (5.7) rather than the state-space form (5.1).
