LEARNING AND CONTROL IN PARTIALLY OBSERVABLE LINEAR DYNAMICAL SYSTEMS
5.2 Open-Loop System Identification
In this section, we study open-loop system identification methods adopted in the literature. In order to minimize the regret given in (5.12), the learning agent needs to efficiently explore the environment to learn the system dynamics and exploit the gathered experience to minimize the overall cost [171]. However, since the underlying states of the system are not fully observable, learning the system dynamics with finite-time guarantees brings substantial challenges, making it a long-lasting problem in adaptive control. In particular, when the latent states of a system are not fully observable, future observations are correlated with the past inputs and observations through the latent states. These correlations are further magnified when closed-loop controllers, i.e., controllers that naturally use past experience to compute control inputs, are deployed. Therefore, more sophisticated estimation methods that account for these complicated and unknown correlations are required for learning the dynamics.
An Open-loop System Identification Method
In recent years, a series of works have studied this learning problem and presented a range of novel methods with finite-sample learning guarantees. These studies propose to employ i.i.d. Gaussian excitation as the control input, i.e., open-loop control, collect system outputs, and estimate the model parameters using the collected data. These methods study the system identification problem using the state-space representation (5.1) and aim to recover the input-to-output Markov parameters $G^{(i)}_{u\to y} = C A^{i-1} B$ introduced in Definition 5.3. The use of i.i.d. Gaussian noise as the open-loop control input (not using past experience) mitigates the correlation between the inputs and the output observations. For stable systems, these methods provide efficient ways to learn the model dynamics with confidence bounds of $\tilde{\mathcal{O}}(1/\sqrt{T})$ after $T$ time steps of agent-environment interaction [166, 213, 234, 245, 269]. Here $\tilde{\mathcal{O}}(\cdot)$ denotes the order up to logarithmic factors. Deploying i.i.d. Gaussian noise for a long period of time to estimate the model parameters has been common practice in adaptive control, since incorporating a closed-loop controller introduces significant challenges to learning the model dynamics [223].
In this section, we review one such open-loop system identification method and discuss why methods that use the state-space representation of the system (5.1) cannot provide reliable estimates in closed-loop estimation problems.
Using the state-space representation in (5.1), for any positive integer $H$, one can rewrite the output at time $t$ as follows,
$$y_t = \sum_{i=1}^{H} C A^{i-1} B u_{t-i} + C A^{H} x_{t-H} + z_t + \sum_{i=0}^{H-1} C A^{i} w_{t-i-1}. \tag{5.13}$$
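To make the unrolled expression concrete, here is a minimal numpy sketch (the system matrices, dimensions, and noise scales are arbitrary choices for illustration, not taken from the text) that simulates the recursion in (5.1) and checks that (5.13) reproduces the simulated output exactly, before the $C A^{H} x_{t-H}$ term is dropped.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p, H, T = 3, 2, 2, 5, 50           # latent, output, input dims; horizon; trajectory length

# A random stable system (spectral radius scaled below 1), purely illustrative.
A = rng.standard_normal((n, n)); A *= 0.9 / np.abs(np.linalg.eigvals(A)).max()
B = rng.standard_normal((n, p)); C = rng.standard_normal((m, n))

u = rng.standard_normal((T, p)); w = rng.standard_normal((T, n)); z = rng.standard_normal((T, m))
x = np.zeros((T + 1, n)); y = np.zeros((T, m))
for t in range(T):                        # simulate x_{t+1} = A x_t + B u_t + w_t,  y_t = C x_t + z_t
    y[t] = C @ x[t] + z[t]
    x[t + 1] = A @ x[t] + B @ u[t] + w[t]

t = T - 1                                 # check (5.13) at some t >= H
lhs = y[t]
rhs = sum(C @ np.linalg.matrix_power(A, i - 1) @ B @ u[t - i] for i in range(1, H + 1))
rhs = rhs + C @ np.linalg.matrix_power(A, H) @ x[t - H] + z[t]
rhs = rhs + sum(C @ np.linalg.matrix_power(A, i) @ w[t - i - 1] for i in range(H))
assert np.allclose(lhs, rhs)              # (5.13) holds exactly before the x_{t-H} term is neglected
```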
Recalling Definition 5.3, for $\kappa_{\mathcal{G}} \ge 1$, let the Markov operator of $\Theta$ be bounded, i.e., $\sum_{i \ge 0} \|G^{(i)}_{u\to y}\| \le \kappa_{\mathcal{G}}$. Due to Assumption 5.1, i.e., the stability of $A$, the second term in (5.13) decays exponentially, and for large enough $H$ it becomes negligible.
Therefore, we obtain the following for the output at time $t$,
$$y_t \approx \sum_{i=1}^{H} G^{(i)}_{u\to y} u_{t-i} + z_t + \sum_{i=0}^{H-1} C A^{i} w_{t-i-1}. \tag{5.14}$$
From this formulation, a least squares estimation problem can be formulated using the outputs as the dependent variable and the concatenation of $H$ input sequences $\bar{u}_t = [u_{t-1}, \ldots, u_{t-H}]$ as the regressor to recover the Markov parameters of the system:
$$\hat{\mathcal{G}}_{u\to y} = [\hat{G}^{(1)}_{u\to y}, \ldots, \hat{G}^{(H)}_{u\to y}] = \arg\min_{X} \sum_{t=H}^{T} \|y_t - X \bar{u}_t\|_2^2. \tag{5.15}$$
Prior finite-time system identification algorithms propose using i.i.d. zero-mean Gaussian noise for the input to ensure that the two noise terms in (5.14) are not correlated with the inputs. In particular, exciting the system with i.i.d. $u_t \sim \mathcal{N}(0, \sigma_u^2 I)$ for $1 \le t \le T_{\mathrm{exp}}$ removes the correlation between the regressor and the noise components in (5.14) and allows solving (5.15) in closed form with finite-time estimation error guarantees for the unknown input-to-output Markov parameters [161, 166, 213, 234, 244]. Note that besides this lack of correlation, the i.i.d. Gaussian control inputs persistently excite the system, which allows consistent estimation
of the Markov parameters. Interested readers can find the general analysis in [213], where Oymak and Ozay show that using i.i.d. Gaussian control inputs allows estimating the Markov parameters with the optimal rate of $\tilde{\mathcal{O}}(1/\sqrt{T_{\mathrm{exp}}})$, i.e.,
$$\|\hat{\mathcal{G}}_{u\to y} - \mathcal{G}_{u\to y}\| \le \frac{\kappa}{\sigma_u \sqrt{T_{\mathrm{exp}}}} \tag{5.16}$$
for some problem-dependent constant $\kappa$ after a large enough number of time steps $T_{\mathrm{exp}}$. This rate is the same error rate one would get from solving a linear regression problem with independent noise and independent covariates [106].
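The following sketch illustrates the regression in (5.15) end to end, assuming a small randomly generated stable system with illustrative dimensions of our own choosing: it excites the system with i.i.d. Gaussian inputs, stacks the regressors $\bar{u}_t$, solves the least squares problem with numpy, and compares the recovered blocks against the true Markov parameters $C A^{i-1} B$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, p, H, T_exp, sigma_u = 3, 2, 2, 10, 20000, 1.0     # illustrative sizes, not from the text

A = rng.standard_normal((n, n)); A *= 0.8 / np.abs(np.linalg.eigvals(A)).max()
B = rng.standard_normal((n, p)); C = rng.standard_normal((m, n))

# Open-loop excitation: u_t ~ N(0, sigma_u^2 I), independent of the process/measurement noise.
u = sigma_u * rng.standard_normal((T_exp, p))
w = 0.1 * rng.standard_normal((T_exp, n)); z = 0.1 * rng.standard_normal((T_exp, m))

x = np.zeros(n); Y, U_bar = [], []
for t in range(T_exp):
    y_t = C @ x + z[t]
    if t >= H:                                            # regressor  u_bar_t = [u_{t-1}, ..., u_{t-H}]
        U_bar.append(np.concatenate([u[t - i] for i in range(1, H + 1)]))
        Y.append(y_t)
    x = A @ x + B @ u[t] + w[t]

Y, U_bar = np.array(Y), np.array(U_bar)
# Least squares (5.15): solve  min_X  sum_t ||y_t - X u_bar_t||^2  via the stacked system.
G_hat = np.linalg.lstsq(U_bar, Y, rcond=None)[0].T        # shape (m, H*p) = [G^(1), ..., G^(H)]

G_true = np.hstack([C @ np.linalg.matrix_power(A, i - 1) @ B for i in range(1, H + 1)])
print(np.linalg.norm(G_hat - G_true, 2))                  # estimation error in operator norm
```

Increasing T_exp should shrink the reported error roughly at the $1/\sqrt{T_{\mathrm{exp}}}$ rate suggested by (5.16).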
Even though the Markov parameters uniquely determine the underlying system, to design the controller for the underlying system as described in Section 5.1.3, one needs to find a balanced realization of $\Theta$ from $\hat{\mathcal{G}}_{u\to y}$. To achieve this, the well-known subspace method, the Ho-Kalman algorithm, is the primary choice [112]. The Ho-Kalman algorithm is given in Algorithm 9. It takes the Markov parameter matrix estimate $\hat{\mathcal{G}}_{u\to y}$, $H$, the system order $n$, and the dimensions $d_1, d_2$ as input, and computes an order-$n$ system $\hat{\Theta} = (\hat{A}, \hat{B}, \hat{C})$. It is worth restating that the dimension of the latent state, $n$, is the order of the system for observable and controllable dynamics. Under the assumption that $H \ge 2n+1$, we pick $d_1 \ge n$ and $d_2 \ge n$ such that $d_1 + d_2 + 1 = H$. This guarantees that the system identification problem is well-conditioned.
Algorithm 9 Ho-Kalman Algorithm
1: Input: $\hat{\mathcal{G}}_{u\to y}$, $H$, system order $n$, $d_1, d_2$ such that $d_1 + d_2 + 1 = H$
2: Form the Hankel matrix $\hat{\mathcal{H}} \in \mathbb{R}^{m d_1 \times p(d_2+1)}$ from $\hat{\mathcal{G}}_{u\to y}$
3: Set $\hat{\mathcal{H}}^- \in \mathbb{R}^{m d_1 \times p d_2}$ as the first $p d_2$ columns of $\hat{\mathcal{H}}$
4: Using SVD, obtain $\hat{\mathcal{N}} \in \mathbb{R}^{m d_1 \times p d_2}$, the rank-$n$ approximation of $\hat{\mathcal{H}}^-$
5: Obtain $\mathbf{U}, \mathbf{\Sigma}, \mathbf{V} = \mathrm{SVD}(\hat{\mathcal{N}})$
6: Construct $\hat{\mathcal{O}} = \mathbf{U} \mathbf{\Sigma}^{1/2} \in \mathbb{R}^{m d_1 \times n}$
7: Construct $\hat{\mathcal{C}} = \mathbf{\Sigma}^{1/2} \mathbf{V} \in \mathbb{R}^{n \times p d_2}$
8: Obtain $\hat{C} \in \mathbb{R}^{m \times n}$, the first $m$ rows of $\hat{\mathcal{O}}$
9: Obtain $\hat{B} \in \mathbb{R}^{n \times p}$, the first $p$ columns of $\hat{\mathcal{C}}$
10: Obtain $\hat{\mathcal{H}}^+ \in \mathbb{R}^{m d_1 \times p d_2}$, the last $p d_2$ columns of $\hat{\mathcal{H}}$
11: Obtain $\hat{A} = \hat{\mathcal{O}}^\dagger \hat{\mathcal{H}}^+ \hat{\mathcal{C}}^\dagger \in \mathbb{R}^{n \times n}$
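A direct numpy transcription of Algorithm 9 might look as follows; the function name, signature, and dimension arguments are our own conventions for this sketch, and its output is only determined up to the similarity transformation discussed below.

```python
import numpy as np

def ho_kalman(G_hat, H, n, m, p, d1, d2):
    """Algorithm 9: recover an order-n realization from Markov parameter estimates.

    G_hat: (m, H*p) matrix [G^(1), ..., G^(H)], with d1 + d2 + 1 = H and d1, d2 >= n.
    """
    assert d1 + d2 + 1 == H
    blocks = [G_hat[:, i * p:(i + 1) * p] for i in range(H)]        # G^(1), ..., G^(H)
    # Step 2: (d1 x (d2+1))-block Hankel matrix; (i, j) block is G^(i+j-1) (1-indexed).
    Hank = np.block([[blocks[i + j] for j in range(d2 + 1)] for i in range(d1)])
    H_minus = Hank[:, :p * d2]                                       # Step 3: first p*d2 columns
    U, S, Vt = np.linalg.svd(H_minus, full_matrices=False)           # Steps 4-5: rank-n SVD
    U, S, Vt = U[:, :n], S[:n], Vt[:n, :]
    O_hat = U * np.sqrt(S)                                           # Step 6: observability factor
    C_cal = np.sqrt(S)[:, None] * Vt                                 # Step 7: controllability factor
    C_hat = O_hat[:m, :]                                             # Step 8: first m rows
    B_hat = C_cal[:, :p]                                             # Step 9: first p columns
    H_plus = Hank[:, -p * d2:]                                       # Step 10: last p*d2 columns
    A_hat = np.linalg.pinv(O_hat) @ H_plus @ np.linalg.pinv(C_cal)   # Step 11
    return A_hat, B_hat, C_hat
```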
Since only the order-$n$ input-output response of the system is uniquely identifiable [188], the system parameters $\Theta$ (even with the correct Markov parameter matrix $\mathcal{G}_{u\to y}$) are recovered only up to a similarity transformation. More precisely, for any invertible $\mathbf{T} \in \mathbb{R}^{n \times n}$, the system $A' = \mathbf{T}^{-1} A \mathbf{T}$, $B' = \mathbf{T}^{-1} B$, $C' = C \mathbf{T}$ gives the same Markov parameter matrix $\mathcal{G}_{u\to y}$, equivalently, the same input-output impulse response.
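This non-identifiability is easy to check numerically: for any invertible $\mathbf{T}$, the transformed realization below produces exactly the same Markov parameters (the matrices are again arbitrary illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, p, H = 3, 2, 2, 6
A = 0.5 * rng.standard_normal((n, n)); B = rng.standard_normal((n, p)); C = rng.standard_normal((m, n))
T_sim = rng.standard_normal((n, n))                        # any invertible similarity transform

A2, B2, C2 = np.linalg.inv(T_sim) @ A @ T_sim, np.linalg.inv(T_sim) @ B, C @ T_sim
markov = lambda A_, B_, C_: np.hstack([C_ @ np.linalg.matrix_power(A_, i) @ B_ for i in range(H)])
assert np.allclose(markov(A, B, C), markov(A2, B2, C2))    # identical input-output response
```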
For $H \ge 2n+1$, using $[\hat{G}^{(1)}_{u\to y}, \ldots, \hat{G}^{(H)}_{u\to y}] \in \mathbb{R}^{m \times H p}$, the Ho-Kalman algorithm constructs a $(d_1 \times (d_2+1))$-block Hankel matrix $\hat{\mathcal{H}} \in \mathbb{R}^{m d_1 \times (d_2+1) p}$ such that the $(i, j)$th block of the Hankel matrix is $\hat{G}^{(i+j-1)}_{u\to y}$. It is worth noting that if the input to the algorithm were $\mathcal{G}_{u\to y}$, then the corresponding Hankel matrix $\mathcal{H}$ would be rank $n$; more importantly,
$$\mathcal{H} = [C^\top \;\; (CA)^\top \;\cdots\; (CA^{d_1-1})^\top]^\top \, [B \;\; AB \;\cdots\; A^{d_2} B] = \mathcal{O}\,[\mathcal{C} \;\; A^{d_2} B] = \mathcal{O}\,[B \;\; A\mathcal{C}],$$
where $\mathcal{O}$ and $\mathcal{C}$ are the observability and controllability matrices, respectively. Essentially, the Ho-Kalman algorithm estimates these matrices using $\hat{\mathcal{G}}_{u\to y}$. In order to estimate $\mathcal{O}$ and $\mathcal{C}$, the algorithm constructs $\hat{\mathcal{H}}^-$, the first $p d_2$ columns of $\hat{\mathcal{H}}$, and calculates $\hat{\mathcal{N}}$, the best rank-$n$ approximation of $\hat{\mathcal{H}}^-$. The singular value decomposition of $\hat{\mathcal{N}}$ then provides the estimates of $\mathcal{O}$ and $\mathcal{C}$, i.e., $\hat{\mathcal{N}} = \mathbf{U}\mathbf{\Sigma}^{1/2}\,\mathbf{\Sigma}^{1/2}\mathbf{V} = \hat{\mathcal{O}} \hat{\mathcal{C}}$. From these estimates, the algorithm recovers $\hat{B}$ as the first $n \times p$ block of $\hat{\mathcal{C}}$, $\hat{C}$ as the first $m \times n$ block of $\hat{\mathcal{O}}$, and $\hat{A}$ as $\hat{\mathcal{O}}^\dagger \hat{\mathcal{H}}^+ \hat{\mathcal{C}}^\dagger$, where $\hat{\mathcal{H}}^+$ is the submatrix of $\hat{\mathcal{H}}$ obtained by discarding its left-most $m d_1 \times p$ block.
Note that if we feed $\mathcal{G}_{u\to y}$ to the Ho-Kalman algorithm, then $\mathcal{H}^-$, the first $p d_2$ columns of $\mathcal{H}$, is rank-$n$ and $\mathcal{N} = \mathcal{H}^-$. Using the outputs of the Ho-Kalman algorithm, i.e., $(\hat{A}, \hat{B}, \hat{C})$, we can construct confidence sets centered around these outputs that contain a similarity transformation of the system parameters $\Theta = (A, B, C)$ with high probability. Theorem 5.1 states the construction of the confidence sets; it is a slight modification of Corollary 5.4 of Oymak and Ozay [213].
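As a sanity check that ties the pieces together, the sketch below (reusing the hypothetical ho_kalman function defined in the earlier sketch, with illustrative dimensions) feeds the exact Markov parameters $\mathcal{G}_{u\to y}$ to the algorithm and verifies that the recovered realization reproduces the same input-output response, i.e., it agrees with $(A, B, C)$ up to a similarity transformation.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, p = 3, 2, 2
d1 = d2 = n                                    # d1, d2 >= n, so H = d1 + d2 + 1 >= 2n + 1
H = d1 + d2 + 1

A = rng.standard_normal((n, n)); A *= 0.8 / np.abs(np.linalg.eigvals(A)).max()
B = rng.standard_normal((n, p)); C = rng.standard_normal((m, n))

G = np.hstack([C @ np.linalg.matrix_power(A, i - 1) @ B for i in range(1, H + 1)])
A_bar, B_bar, C_bar = ho_kalman(G, H, n, m, p, d1, d2)     # exact Markov parameters as input

G_bar = np.hstack([C_bar @ np.linalg.matrix_power(A_bar, i - 1) @ B_bar for i in range(1, H + 1)])
assert np.allclose(G, G_bar)                   # same impulse response: (A_bar, B_bar, C_bar) ~ (A, B, C)
```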
Theorem 5.1 (Confidence Set Construction). Suppose $\mathcal{H}$ is the rank-$n$ Hankel matrix obtained from $\mathcal{G}_{u\to y}$. Let $\bar{A}, \bar{B}, \bar{C}$ be the system parameters that the Ho-Kalman algorithm provides for $\mathcal{G}_{u\to y}$. Define the rank-$n$ matrix $\mathcal{N}$ to be the submatrix of $\mathcal{H}$ obtained by discarding the last block column of $\mathcal{H}$. Suppose $\sigma_n(\mathcal{N}) > 0$ and $\|\mathcal{N} - \hat{\mathcal{N}}\| \le \frac{\sigma_n(\mathcal{N})}{2}$. Then, there exists a unitary matrix $\mathbf{T} \in \mathbb{R}^{n \times n}$ such that $\bar{\Theta} = (\bar{A}, \bar{B}, \bar{C}) \in (\mathcal{C}_A \times \mathcal{C}_B \times \mathcal{C}_C)$ for
$$\mathcal{C}_A = \left\{ A' \in \mathbb{R}^{n \times n} : \|\hat{A} - \mathbf{T}^\top A' \mathbf{T}\| \le \left( \frac{31\, n\, \|\mathcal{H}\|}{\sigma_n^2(\mathcal{H})} + \frac{13\, n}{2\, \sigma_n(\mathcal{H})} \right) \|\hat{\mathcal{G}}_{u\to y} - \mathcal{G}_{u\to y}\| \right\},$$
$$\mathcal{C}_B = \left\{ B' \in \mathbb{R}^{n \times p} : \|\hat{B} - \mathbf{T}^\top B'\| \le \frac{7\, n}{\sqrt{\sigma_n(\mathcal{H})}} \|\hat{\mathcal{G}}_{u\to y} - \mathcal{G}_{u\to y}\| \right\},$$
$$\mathcal{C}_C = \left\{ C' \in \mathbb{R}^{m \times n} : \|\hat{C} - C' \mathbf{T}\| \le \frac{7\, n}{\sqrt{\sigma_n(\mathcal{H})}} \|\hat{\mathcal{G}}_{u\to y} - \mathcal{G}_{u\to y}\| \right\},$$
where $\hat{A}, \hat{B}, \hat{C}$ are obtained from the Ho-Kalman algorithm using the least squares estimate of the Markov parameter matrix $\hat{\mathcal{G}}_{u\to y}$.
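Purely for convenience, the radii in Theorem 5.1 can be transcribed into code. The constants below are copied verbatim from the statement above; eps stands for an upper bound on $\|\hat{\mathcal{G}}_{u\to y} - \mathcal{G}_{u\to y}\|$ (e.g., the right-hand side of (5.16)), and in practice the singular values of the true Hankel matrix $\mathcal{H}$ would themselves have to be approximated from the estimated one.

```python
import numpy as np

def confidence_radii(Hank, n, eps):
    """Radii of the confidence sets C_A, C_B, C_C in Theorem 5.1.

    Hank: Hankel matrix (ideally of the true Markov parameters, approximated in practice),
    n:    system order,
    eps:  upper bound on ||G_hat - G||, e.g., from (5.16).
    """
    s = np.linalg.svd(Hank, compute_uv=False)
    sigma_n, spec_norm = s[n - 1], s[0]                  # n-th singular value and operator norm
    beta_A = (31 * n * spec_norm / sigma_n**2 + 13 * n / (2 * sigma_n)) * eps
    beta_B = beta_C = 7 * n / np.sqrt(sigma_n) * eps
    return beta_A, beta_B, beta_C
```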
Proof. The proof is similar to the proof of Theorem 4.3 in [213]. The difference in the presentation arises from providing a different characterization of the dependence on $\|\mathcal{N} - \hat{\mathcal{N}}\|$ and from centering the confidence balls at the estimates rather than at the output of the Ho-Kalman algorithm with input $\mathcal{G}_{u\to y}$. In Oymak and Ozay [213], from the inequality
$$\|\bar{B} - \mathbf{T}^\top \hat{B}\|_F^2 \le \frac{2\, n\, \|\mathcal{N} - \hat{\mathcal{N}}\|^2}{(\sqrt{2}-1)\left(\sigma_n(\mathcal{N}) - \|\mathcal{N} - \hat{\mathcal{N}}\|\right)},$$
the authors use the assumption $\|\mathcal{N} - \hat{\mathcal{N}}\| \le \frac{\sigma_n(\mathcal{N})}{2}$ to cancel the numerator against the denominator. In this presentation, we define $T_N$ such that for a large enough exploration time $T_{\mathrm{exp}}$, i.e., $T_{\mathrm{exp}} \ge T_N$, we have $\|\mathcal{N} - \hat{\mathcal{N}}\| \le \frac{\sigma_n(\mathcal{N})}{2}$ with high probability. See [166] for the precise expression of $T_N$. Note that $T_N$ depends on $\sigma_n(\mathcal{H})$, due to the fact that the singular values of submatrices obtained by column partitioning are interlaced, i.e., $\sigma_n(\mathcal{N}) = \sigma_n(\mathcal{H}^-) \ge \sigma_n(\mathcal{H})$. Then, we rewrite the denominator in terms of $\sigma_n(\mathcal{N})$ and again use the fact that $\sigma_n(\mathcal{N}) = \sigma_n(\mathcal{H}^-) \ge \sigma_n(\mathcal{H})$. Following the proof steps provided in Oymak and Ozay [213] and combining them with the fact that
$$\|\mathcal{N} - \hat{\mathcal{N}}\| \le 2\, \|\mathcal{H}^- - \hat{\mathcal{H}}^-\| \le 2\sqrt{\min\{d_1, d_2\}}\; \|\hat{\mathcal{G}}_{u\to y} - \mathcal{G}_{u\to y}\|$$
(see Lemma B.1 of [213]), we obtain the presented theorem. □
Combining Theorem 5.1 with (5.16) shows that, using the open-loop system identification method, a balanced realization of $\Theta$ can be recovered at the optimal estimation rate with high probability. However, when a controller designs the inputs based on the history of inputs and observations, the inputs become highly correlated with the past process noise sequence $\{w_i\}_{i=0}^{t-1}$. This correlation prevents the consistent and reliable estimation of the Markov parameters using (5.15). Therefore, these prior open-loop estimation methods do not generalize to settings in which adaptive controllers generate the inputs used for estimation, i.e., closed-loop estimation. For this very reason, open-loop system identification techniques have only been deployed to propose explore-then-commit-based adaptive control algorithms to minimize regret, as discussed at the beginning of this chapter. In the following section, we provide a closed-loop system identification algorithm that alleviates the correlations between the covariates and the noise sequences by considering the predictor form of the system dynamics (5.7) rather than the state-space form (5.1).