5.3 A Novel Closed-Loop System Identification Method
5.3.1 Proof of Theorem 5.2
In this section, we first present the proof of Theorem 5.2 under the PE assumption, with precise expressions. In particular, we show the self-normalized error bound on the solution of (5.21) in Theorem 5.3. Then, assuming the PE condition, we convert the self-normalized bound into a Frobenius norm bound to be used for the parameter estimation error bounds in Theorem 5.4, which concludes the proof of Theorem 5.2.
First, consider the effect of the truncation bias term $C\bar{A}^H x_{t-H}$ in (5.18). From Assumption 5.1, we have that $\bar{A}$ is $(\kappa_3, \gamma_3)$-stable. Thus, $C\bar{A}^H x_{t-H}$ scales with the order of $(1-\gamma_3)^H$ for bounded $x$. In order to get consistent estimation, for some problem-dependent constant $c_H$, we set $H \geq \frac{\log(c_H T\sqrt{T}/\sqrt{\lambda})}{\log(1/(1-\gamma_3))}$, resulting in a negligible bias term of order $1/T$. Note that $c_H$ is determined by the underlying system and the control policy, since it is related to the scaling of the latent state. Using this, we first obtain a self-normalized finite-sample estimation error bound for (5.21):
Theorem 5.3 (Self-normalized Estimation Error). Let $\hat{\mathcal{G}}_{yu}$ be the solution to (5.21) at time $T$. For $H \geq \frac{\log(c_H T\sqrt{T}/\sqrt{\lambda})}{\log(1/(1-\gamma_3))}$, define $V_T = \lambda I + \sum_{i=H}^{T}\phi_i\phi_i^\top$. Let $\|\mathcal{G}_{yu}\|_F \leq S$. For $\delta \in (0,1)$, with probability at least $1-\delta$, for all $t \leq T$, $\mathcal{G}_{yu}$ lies in the set $\mathcal{C}_{\mathcal{G}_{yu},t}$, where
$$\mathcal{C}_{\mathcal{G}_{yu},t} = \left\{\mathcal{G}'_{yu} : \operatorname{Tr}\!\left((\hat{\mathcal{G}}_{yu} - \mathcal{G}'_{yu})\,V_t\,(\hat{\mathcal{G}}_{yu} - \mathcal{G}'_{yu})^\top\right) \leq \beta_T^2\right\}, \qquad \beta_T = \sqrt{m\,\Sigma_v\,\log\frac{\det(V_T)^{1/2}}{\delta\det(\lambda I)^{1/2}}} + S\sqrt{\lambda} + \frac{\sqrt{H}}{T},$$
where $\Sigma_v := \|C\Sigma C^\top + \sigma_z^2 I\|_F$.
The proof is given in Appendix C.1. Note that the above result holds for sub-Gaussian noise, which is satisfied in both LQG control systems and ARX systems. Using this result, we have
$$\sigma_{\min}(V_T)\,\|\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu}\|_F^2 \;\leq\; \operatorname{Tr}\!\left((\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu})\,V_T\,(\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu})^\top\right) \;\leq\; \beta_T^2.$$
Assume that $\phi_t$ is bounded (which will be shown rigorously for the different adaptive control algorithms in Sections 5.4–5.6), such that $\max_{i\leq T}\|\phi_i\| \leq \Upsilon\sqrt{H}$. For persistently exciting inputs, i.e., $\sigma_{\min}(V_T) \geq \sigma_\star^2 T$ for some $\sigma_\star > 0$, we get, with probability at least $1-\delta$,
$$\|\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu}\|_F \;\leq\; \frac{\sqrt{m\,\Sigma_v\left(\log\frac{1}{\delta} + \frac{H(m+p)}{2}\log\frac{\lambda(m+p) + T\Upsilon^2}{\lambda(m+p)}\right)} + S\sqrt{\lambda} + \sqrt{H}}{\sigma_\star\sqrt{T}} \tag{5.23}$$
after $T$ time steps. Note that $\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu} = [\hat{\mathcal{G}}_{y\to y},\ \hat{\mathcal{G}}_{u\to y}] - [\mathcal{G}_{y\to y},\ \mathcal{G}_{u\to y}]$; thus, (5.23) translates to the same error bounds for $\|\hat{\mathcal{G}}_{y\to y} - \mathcal{G}_{y\to y}\|$ and $\|\hat{\mathcal{G}}_{u\to y} - \mathcal{G}_{u\to y}\|$, proving the first part of Theorem 5.2. This result shows that the novel least-squares problem provides consistent estimates and that the estimation error is $\tilde{\mathcal{O}}(1/\sqrt{T})$ after $T$ samples.
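To make the estimation step concrete, the following is a minimal numpy sketch of the regularized least-squares problem behind (5.21) together with the Gram matrix $V_T$, under stated assumptions: the covariate stacks the last $H$ output-input pairs, and the ordering inside $\phi_t$ as well as the function name are illustrative choices, not the thesis's implementation.

```python
import numpy as np

def estimate_markov_operator(ys, us, H, lam=1.0):
    """Regularized least squares for the closed-loop Markov operator, a sketch of (5.21).

    ys: (T, m) outputs, us: (T, p) inputs, H: history length, lam: regularizer.
    Returns G_hat (m x (m+p)H) and the Gram matrix V = lam*I + sum_t phi_t phi_t^T.
    """
    T, m = ys.shape
    p = us.shape[1]
    d = (m + p) * H
    V = lam * np.eye(d)                 # V_T = lam*I + sum_t phi_t phi_t^T
    S = np.zeros((d, m))                # sum_t phi_t y_t^T
    for t in range(H, T):
        # covariate: the last H output-input pairs, most recent first (ordering is an assumption)
        phi = np.concatenate([np.r_[ys[t - k], us[t - k]] for k in range(1, H + 1)])
        V += np.outer(phi, phi)
        S += np.outer(phi, ys[t])
    G_hat = np.linalg.solve(V, S).T     # each row of G_hat maps phi_t to one output coordinate
    return G_hat, V

# usage on synthetic data (dimensions are illustrative only)
rng = np.random.default_rng(0)
ys = rng.standard_normal((500, 2))
us = rng.standard_normal((500, 1))
G_hat, V_T = estimate_markov_operator(ys, us, H=5)
```

Under persistently exciting inputs, the smallest eigenvalue of the returned Gram matrix grows linearly in $T$, which is exactly the condition used above to pass from the self-normalized bound to the Frobenius norm bound (5.23).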
For the second part of Theorem 5.2, we show that SysId provides a balanced realization of $\Theta$ such that we have confidence sets around the estimated model parameters in which a similarity transformation of $\Theta$ lives with high probability, similar to Theorem 5.1. For this, define $T_{\mathcal{G}_{yu}}$ as the number of samples required such that $\|\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu}\| \leq 1$ in (5.23). Let
$$T_N = T_{\mathcal{G}_{yu}}\,\frac{8H}{\sigma_n^2(\mathcal{H})}, \qquad T_B = T_{\mathcal{G}_{yu}}\,\frac{20nH}{\sigma_n(\mathcal{H})}. \tag{5.24}$$
We have the following result on the model parameter estimates.
Theorem 5.4 (Model Parameter Estimation Error). Let $\mathcal{H}$ be the concatenation of the two Hankel matrices obtained from $\mathcal{G}_{yu}$. Let $\bar{A}, \bar{B}, \bar{C}, \bar{F}, \bar{L}$ be the system parameters that SysId provides for $\mathcal{G}_{yu}$. At time step $t$, let $\hat{A}_t, \hat{B}_t, \hat{C}_t, \hat{F}_t, \hat{L}_t$ denote the system parameters obtained by SysId using $\hat{\mathcal{G}}_{yu}$. For all $t \geq \max\{T_{\mathcal{G}_{yu}}, T_N, T_B\}$ and $H \geq \max\left\{2n+1,\ \frac{\log(c_H T\sqrt{T}/\sqrt{\lambda})}{\log(1/(1-\gamma_3))}\right\}$, there exists a unitary matrix $\mathbf{T} \in \mathbb{R}^{n\times n}$ such that $\bar{\Theta} = (\bar{A}, \bar{B}, \bar{C}, \bar{F}, \bar{L}) \in (\mathcal{C}_A \times \mathcal{C}_B \times \mathcal{C}_C \times \mathcal{C}_F \times \mathcal{C}_L)$, where
$$\begin{aligned}
\mathcal{C}_A(t) &= \{A' \in \mathbb{R}^{n\times n} : \|\hat{A}_t - \mathbf{T}^\top A'\mathbf{T}\| \leq \beta^A_t\}, &\qquad \mathcal{C}_B(t) &= \{B' \in \mathbb{R}^{n\times p} : \|\hat{B}_t - \mathbf{T}^\top B'\| \leq \beta^B_t\},\\
\mathcal{C}_C(t) &= \{C' \in \mathbb{R}^{m\times n} : \|\hat{C}_t - C'\mathbf{T}\| \leq \beta^C_t\}, &\qquad \mathcal{C}_F(t) &= \{F' \in \mathbb{R}^{n\times m} : \|\hat{F}_t - \mathbf{T}^\top F'\| \leq \beta^F_t\},\\
\mathcal{C}_L(t) &= \{L' \in \mathbb{R}^{n\times m} : \|\hat{L}_t - \mathbf{T}^\top L'\| \leq \beta^L_t\},
\end{aligned} \tag{5.25}$$
for
$$\beta^A_t = c_1\,\frac{\sqrt{nH}\,\big(\|\mathcal{H}\| + \sigma_n(\mathcal{H})\big)}{\sigma_n^2(\mathcal{H})}\,\|\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu}\|, \qquad \beta^B_t = \beta^C_t = \beta^F_t = \sqrt{\frac{20nH}{\sigma_n(\mathcal{H})}}\,\|\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu}\|,$$
$$\beta^L_t = \frac{c_2\,\|\mathcal{H}\|}{\sqrt{\sigma_n(\mathcal{H})}}\,\beta^A_t + c_3\,\frac{\sqrt{nH}\,\big(\|\mathcal{H}\| + \sigma_n(\mathcal{H})\big)}{\sigma_n^{3/2}(\mathcal{H})}\,\|\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu}\|,$$
for some problem-dependent constants $c_1$, $c_2$, and $c_3$.
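For intuition about this step, the sketch below performs a generic Ho-Kalman-style balanced realization of the kind SysId builds from the estimated Markov operator. It is a simplified stand-in under stated assumptions (a single Hankel matrix built from one sequence of $m\times p$ Markov parameters, with the order $n$ known), not the exact SysId routine, which works with the concatenation of two Hankel matrices to recover both $\bar{B}$ and $\bar{F}$.

```python
import numpy as np

def ho_kalman_like(markov, d1, d2, n):
    """Balanced realization from (estimated) Markov parameters.

    markov[k] is the (m x p) block mapping an input k+1 steps back to the output.
    Returns (A, B, C) of a rank-n realization, unique only up to a similarity transform.
    """
    m, p = markov[0].shape
    # Hankel matrix with d1 block rows and d2+1 block columns
    Hk = np.block([[markov[i + j] for j in range(d2 + 1)] for i in range(d1)])
    Hminus, Hplus = Hk[:, :d2 * p], Hk[:, p:]        # drop last / first block column
    U, s, Vt = np.linalg.svd(Hminus, full_matrices=False)
    U, s, Vt = U[:, :n], s[:n], Vt[:n]               # rank-n truncation
    O = U * np.sqrt(s)                               # observability-like factor
    Cc = np.sqrt(s)[:, None] * Vt                    # controllability-like factor
    A = np.linalg.pinv(O) @ Hplus @ np.linalg.pinv(Cc)
    C = O[:m, :]                                     # first block row of O
    B = Cc[:, :p]                                    # first block column of Cc
    return A, B, C

# usage: Markov parameters of a small random system (illustrative)
rng = np.random.default_rng(0)
n, m, p, d1, d2 = 3, 2, 1, 4, 4
A = 0.5 * rng.standard_normal((n, n)); B = rng.standard_normal((n, p)); C = rng.standard_normal((m, n))
markov = [C @ np.linalg.matrix_power(A, k) @ B for k in range(d1 + d2 + 1)]
A_hat, B_hat, C_hat = ho_kalman_like(markov, d1, d2, n)
```

The returned realization is recovered only up to a similarity transformation, which is why Theorem 5.4 states its guarantees up to the unitary matrix $\mathbf{T}$.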
Before presenting the proof, we state the following lemmas, which are adapted from Oymak and Ozay [213] with slight modifications to fit our setting. In particular, they were originally stated for the Ho-Kalman algorithm, and SysId is a variant of this algorithm. These results will be useful in proving error bounds on the system parameters.
Lemma 5.1. $\mathcal{H}, \hat{\mathcal{H}}_t$ and $\mathcal{N}, \hat{\mathcal{N}}_t$ satisfy the following perturbation bounds:
$$\max\left\{\big\|\mathcal{H}^+ - \hat{\mathcal{H}}_t^+\big\|,\ \big\|\mathcal{H}^- - \hat{\mathcal{H}}_t^-\big\|\right\} \;\leq\; \|\mathcal{H} - \hat{\mathcal{H}}_t\| \;\leq\; \sqrt{\min\{d_1, d_2+1\}}\,\|\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu}\|,$$
$$\|\mathcal{N} - \hat{\mathcal{N}}_t\| \;\leq\; 2\big\|\mathcal{H}^- - \hat{\mathcal{H}}_t^-\big\| \;\leq\; 2\sqrt{\min\{d_1, d_2\}}\,\|\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu}\|.$$

Lemma 5.2. Suppose $\sigma_{\min}(\mathcal{N}) \geq 2\|\mathcal{N} - \hat{\mathcal{N}}\|$, where $\sigma_{\min}(\mathcal{N})$ is the smallest nonzero singular value (i.e., the $n$-th largest singular value) of $\mathcal{N}$. Let the rank-$n$ matrices $\mathcal{N}, \hat{\mathcal{N}}$ have singular value decompositions $\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top$ and $\hat{\mathbf{U}}\hat{\boldsymbol{\Sigma}}\hat{\mathbf{V}}^\top$. Then there exists an $n\times n$ unitary matrix $\mathbf{T}$ such that
$$\big\|\mathbf{U}\boldsymbol{\Sigma}^{1/2} - \hat{\mathbf{U}}\hat{\boldsymbol{\Sigma}}^{1/2}\mathbf{T}\big\|_F^2 + \big\|\mathbf{V}\boldsymbol{\Sigma}^{1/2} - \hat{\mathbf{V}}\hat{\boldsymbol{\Sigma}}^{1/2}\mathbf{T}\big\|_F^2 \;\leq\; \frac{5n\,\|\mathcal{N} - \hat{\mathcal{N}}\|^2}{\sigma_n(\mathcal{N}) - \|\mathcal{N} - \hat{\mathcal{N}}\|}.$$
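The role of the aligning unitary matrix in Lemma 5.2 can be checked numerically. The following sketch aligns the rank-$n$ balanced factors of a matrix and a small perturbation of it using an orthogonal Procrustes step; this alignment rule is an illustrative choice, not the construction used in the proof of the lemma.

```python
import numpy as np

def aligned_factor_error(N, N_hat, n):
    """Compare rank-n balanced factors of N and N_hat after a unitary alignment."""
    U, s, Vt = np.linalg.svd(N)
    U, s, Vt = U[:, :n], s[:n], Vt[:n]
    Uh, sh, Vht = np.linalg.svd(N_hat)
    Uh, sh, Vht = Uh[:, :n], sh[:n], Vht[:n]
    F1, F1h = U * np.sqrt(s), Uh * np.sqrt(sh)        # U Sigma^{1/2} and its perturbed version
    F2, F2h = Vt.T * np.sqrt(s), Vht.T * np.sqrt(sh)  # V Sigma^{1/2} and its perturbed version
    # one unitary T aligning both factor pairs: orthogonal Procrustes on the stacked factors
    M = F1h.T @ F1 + F2h.T @ F2
    W, _, Zt = np.linalg.svd(M)
    T = W @ Zt
    return np.linalg.norm(F1 - F1h @ T) ** 2 + np.linalg.norm(F2 - F2h @ T) ** 2

rng = np.random.default_rng(1)
N = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 10))     # a rank-3 matrix
err = aligned_factor_error(N, N + 1e-3 * rng.standard_normal(N.shape), n=3)
```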
Proof. For brevity, we write $\mathbf{O} = \mathcal{O}(\bar{A}, \bar{C}, d_1)$, $\mathbf{C}_F = \mathcal{C}(\bar{A}, \bar{F}, d_2+1)$, and $\mathbf{C}_B = \mathcal{C}(\bar{A}, \bar{B}, d_2+1)$ for the observability and controllability matrices constructed from $\mathcal{G}_{yu}$, and $\hat{\mathbf{O}}_t$, $\hat{\mathbf{C}}_{F,t}$, $\hat{\mathbf{C}}_{B,t}$ for their counterparts constructed from $\hat{\mathcal{G}}_{yu}$. In the definition of $T_N$, we use $\sigma_n(\mathcal{H})$ due to the fact that singular values of submatrices obtained by column partitioning are interlaced, i.e., $\sigma_n(\mathcal{N}) = \sigma_n(\mathcal{H}^-) \geq \sigma_n(\mathcal{H})$. Directly applying Lemma 5.2, with the condition that for $t \geq T_N$ we have $\sigma_{\min}(\mathcal{N}) \geq 2\|\mathcal{N} - \hat{\mathcal{N}}\|$, we can guarantee that there exists a unitary transform $\mathbf{T}$ such that
$$\big\|\hat{\mathbf{O}}_t - \mathbf{O}\mathbf{T}\big\|_F^2 + \big\|[\hat{\mathbf{C}}_{F,t}\ \hat{\mathbf{C}}_{B,t}] - \mathbf{T}^\top[\mathbf{C}_F\ \mathbf{C}_B]\big\|_F^2 \;\leq\; \frac{10n\,\|\mathcal{N} - \hat{\mathcal{N}}_t\|^2}{\sigma_n(\mathcal{N})}. \tag{5.26}$$
Since $\hat{C}_t - \bar{C}\mathbf{T}$ is a submatrix of $\hat{\mathbf{O}}_t - \mathbf{O}\mathbf{T}$, $\hat{B}_t - \mathbf{T}^\top\bar{B}$ is a submatrix of $\hat{\mathbf{C}}_{B,t} - \mathbf{T}^\top\mathbf{C}_B$, and $\hat{F}_t - \mathbf{T}^\top\bar{F}$ is a submatrix of $\hat{\mathbf{C}}_{F,t} - \mathbf{T}^\top\mathbf{C}_F$, the bound stated in (5.26) applies to each of them as well. Using Lemma 5.1, with the choice $d_1, d_2 \geq \frac{H}{2}$, we have
$$\|\mathcal{N} - \hat{\mathcal{N}}_t\| \;\leq\; \sqrt{2H}\,\|\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu}\|.$$
This provides the advertised bounds in the theorem:
$$\|\hat{B}_t - \mathbf{T}^\top\bar{B}\|,\ \|\hat{C}_t - \bar{C}\mathbf{T}\|,\ \|\hat{F}_t - \mathbf{T}^\top\bar{F}\| \;\leq\; \frac{\sqrt{20nH}\,\|\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu}\|}{\sqrt{\sigma_n(\mathcal{N})}}.$$
Notice that for $t \geq T_B$, all of the terms above are bounded by 1. In order to determine the closeness of $\hat{A}_t$ and $\bar{A}$, we first consider the closeness of $\hat{\bar{A}}_t$ and $\mathbf{T}^\top\bar{\bar{A}}\mathbf{T}$, where $\bar{\bar{A}}$ is the output obtained by SysId for $\bar{A}$ when the input is $\mathcal{G}_{yu}$. Let $\mathbf{Q} = \mathbf{O}\mathbf{T}$ and $\mathbf{Z} = \mathbf{T}^\top[\mathbf{C}_F\ \mathbf{C}_B]$. Thus, we have
$$\begin{aligned}
\|\hat{\bar{A}}_t - \mathbf{T}^\top\bar{\bar{A}}\mathbf{T}\|_F &= \big\|\hat{\mathbf{O}}_t^\dagger\hat{\mathcal{H}}_t^+[\hat{\mathbf{C}}_{F,t}\ \hat{\mathbf{C}}_{B,t}]^\dagger - \mathbf{Q}^\dagger\mathcal{H}^+\mathbf{Z}^\dagger\big\|_F\\
&\leq \big\|(\hat{\mathbf{O}}_t^\dagger - \mathbf{Q}^\dagger)\hat{\mathcal{H}}_t^+[\hat{\mathbf{C}}_{F,t}\ \hat{\mathbf{C}}_{B,t}]^\dagger\big\|_F + \big\|\mathbf{Q}^\dagger(\hat{\mathcal{H}}_t^+ - \mathcal{H}^+)[\hat{\mathbf{C}}_{F,t}\ \hat{\mathbf{C}}_{B,t}]^\dagger\big\|_F + \big\|\mathbf{Q}^\dagger\mathcal{H}^+\big([\hat{\mathbf{C}}_{F,t}\ \hat{\mathbf{C}}_{B,t}]^\dagger - \mathbf{Z}^\dagger\big)\big\|_F.
\end{aligned}$$
For the first term, we have the following perturbation bound [197, 291]:
$$\|\hat{\mathbf{O}}_t^\dagger - \mathbf{Q}^\dagger\|_F \;\leq\; \|\hat{\mathbf{O}}_t - \mathbf{Q}\|_F\,\max\{\|\mathbf{Q}^\dagger\|^2, \|\hat{\mathbf{O}}_t^\dagger\|^2\} \;\leq\; \|\mathcal{N} - \hat{\mathcal{N}}_t\|\sqrt{\frac{10n}{\sigma_n(\mathcal{N})}}\,\max\{\|\mathbf{Q}^\dagger\|^2, \|\hat{\mathbf{O}}_t^\dagger\|^2\}.$$
Since we already have $\sigma_n(\mathcal{N}) \geq 2\|\mathcal{N} - \hat{\mathcal{N}}\|$, it follows that $\|\hat{\mathcal{N}}\| \leq 2\|\mathcal{N}\|$ and $2\sigma_n(\hat{\mathcal{N}}) \geq \sigma_n(\mathcal{N})$. Thus,
$$\max\{\|\mathbf{Q}^\dagger\|^2, \|\hat{\mathbf{O}}_t^\dagger\|^2\} = \max\left\{\frac{1}{\sigma_n(\mathcal{N})}, \frac{1}{\sigma_n(\hat{\mathcal{N}})}\right\} \;\leq\; \frac{2}{\sigma_n(\mathcal{N})}. \tag{5.27}$$
Combining these and following the same steps for $\|[\hat{\mathbf{C}}_{F,t}\ \hat{\mathbf{C}}_{B,t}]^\dagger - \mathbf{Z}^\dagger\|_F$, we get
$$\big\|\hat{\mathbf{O}}_t^\dagger - \mathbf{Q}^\dagger\big\|_F,\ \big\|[\hat{\mathbf{C}}_{F,t}\ \hat{\mathbf{C}}_{B,t}]^\dagger - \mathbf{Z}^\dagger\big\|_F \;\leq\; \|\mathcal{N} - \hat{\mathcal{N}}_t\|\sqrt{\frac{40n}{\sigma_n^3(\mathcal{N})}}. \tag{5.28}$$
The following individual bounds are obtained by using (5.27), (5.28), and the triangle inequality:
$$\big\|(\hat{\mathbf{O}}_t^\dagger - \mathbf{Q}^\dagger)\hat{\mathcal{H}}_t^+[\hat{\mathbf{C}}_{F,t}\ \hat{\mathbf{C}}_{B,t}]^\dagger\big\|_F \;\leq\; \|\hat{\mathbf{O}}_t^\dagger - \mathbf{Q}^\dagger\|_F\,\|\hat{\mathcal{H}}_t^+\|\,\|[\hat{\mathbf{C}}_{F,t}\ \hat{\mathbf{C}}_{B,t}]^\dagger\| \;\leq\; \frac{4\sqrt{5n}\,\|\mathcal{N} - \hat{\mathcal{N}}_t\|}{\sigma_n^2(\mathcal{N})}\left(\|\mathcal{H}^+\| + \|\hat{\mathcal{H}}_t^+ - \mathcal{H}^+\|\right),$$
$$\big\|\mathbf{Q}^\dagger(\hat{\mathcal{H}}_t^+ - \mathcal{H}^+)[\hat{\mathbf{C}}_{F,t}\ \hat{\mathbf{C}}_{B,t}]^\dagger\big\|_F \;\leq\; \frac{2\sqrt{n}\,\|\hat{\mathcal{H}}_t^+ - \mathcal{H}^+\|}{\sigma_n(\mathcal{N})},$$
$$\big\|\mathbf{Q}^\dagger\mathcal{H}^+\big([\hat{\mathbf{C}}_{F,t}\ \hat{\mathbf{C}}_{B,t}]^\dagger - \mathbf{Z}^\dagger\big)\big\|_F \;\leq\; \|\mathbf{Q}^\dagger\|\,\|\mathcal{H}^+\|\,\big\|[\hat{\mathbf{C}}_{F,t}\ \hat{\mathbf{C}}_{B,t}]^\dagger - \mathbf{Z}^\dagger\big\| \;\leq\; \frac{2\sqrt{10n}\,\|\mathcal{N} - \hat{\mathcal{N}}_t\|}{\sigma_n^2(\mathcal{N})}\,\|\mathcal{H}^+\|.$$
Combining these, we get
$$\|\hat{\bar{A}}_t - \mathbf{T}^\top\bar{\bar{A}}\mathbf{T}\|_F \;\leq\; \frac{31\sqrt{n}\,\|\mathcal{H}^+\|\,\|\mathcal{N} - \hat{\mathcal{N}}_t\|}{2\sigma_n^2(\mathcal{N})} + \|\hat{\mathcal{H}}_t^+ - \mathcal{H}^+\|\left(\frac{4\sqrt{5n}\,\|\mathcal{N} - \hat{\mathcal{N}}_t\|}{\sigma_n^2(\mathcal{N})} + \frac{2\sqrt{n}}{\sigma_n(\mathcal{N})}\right) \;\leq\; \frac{31\sqrt{n}\,\|\mathcal{H}^+\|}{2\sigma_n^2(\mathcal{N})}\,\|\mathcal{N} - \hat{\mathcal{N}}_t\| + \frac{13\sqrt{n}}{2\sigma_n(\mathcal{N})}\,\|\hat{\mathcal{H}}_t^+ - \mathcal{H}^+\|.$$
These results give the estimation error guarantees for ARX systems. For LQG control systems, we additionally need to recover $A$ and $L$. Now consider $\hat{A}_t = \hat{\bar{A}}_t + \hat{F}_t\hat{C}_t$. Using Lemma 5.1,
$$\begin{aligned}
\|\hat{A}_t - \mathbf{T}^\top\bar{A}\mathbf{T}\|_F &= \|\hat{\bar{A}}_t + \hat{F}_t\hat{C}_t - \mathbf{T}^\top\bar{\bar{A}}\mathbf{T} - \mathbf{T}^\top\bar{F}\bar{C}\mathbf{T}\|_F\\
&\leq \|\hat{\bar{A}}_t - \mathbf{T}^\top\bar{\bar{A}}\mathbf{T}\|_F + \|(\hat{F}_t - \mathbf{T}^\top\bar{F})\hat{C}_t\|_F + \|\mathbf{T}^\top\bar{F}(\hat{C}_t - \bar{C}\mathbf{T})\|_F\\
&\leq \|\hat{\bar{A}}_t - \mathbf{T}^\top\bar{\bar{A}}\mathbf{T}\|_F + \|\hat{F}_t - \mathbf{T}^\top\bar{F}\|_F\,\|\hat{C}_t - \bar{C}\mathbf{T}\|_F + \|\hat{F}_t - \mathbf{T}^\top\bar{F}\|_F\,\|\bar{C}\| + \|\bar{F}\|\,\|\hat{C}_t - \bar{C}\mathbf{T}\|_F\\
&\leq \frac{31\sqrt{2nH}\,\|\mathcal{H}\|}{2\sigma_n^2(\mathcal{N})}\,\|\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu}\| + \frac{13\sqrt{nH}}{2\sqrt{2}\,\sigma_n(\mathcal{N})}\,\|\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu}\| + \frac{20nH\,\|\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu}\|^2}{\sigma_n(\mathcal{N})} + (\|\bar{F}\| + \|\bar{C}\|)\,\|\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu}\|\sqrt{\frac{20nH}{\sigma_n(\mathcal{N})}}.
\end{aligned}$$
Using the result above, in order to obtain an estimation error bound for $\hat{L}_t$, we define $T_A$ as the number of samples required to have $\|\hat{A}_t - \mathbf{T}^\top\bar{A}\mathbf{T}\| \leq \sigma_n(\bar{A})/2$ for all $t \geq T_A$, i.e.,
$$T_A = T_{\mathcal{G}_{yu}}\left(\frac{\frac{62\sqrt{2nH}\,\|\mathcal{H}\|}{2\sigma_n^2(\mathcal{N})} + \frac{26\sqrt{nH}}{2\sqrt{2}\,\sigma_n(\mathcal{N})} + (\|\bar{F}\| + \|\bar{C}\|)\sqrt{\frac{80nH}{\sigma_n(\mathcal{N})}} + \sqrt{\frac{40nH\,\sigma_n(\bar{A})}{\sigma_n(\mathcal{N})}}}{\sigma_n(\bar{A})}\right)^{\!2}.$$
From Weyl's inequality, we have $\sigma_n(\hat{A}_t) \geq \sigma_n(\bar{A})/2$. Recalling that $\mathbf{Q} = \mathcal{O}(\bar{A}, \bar{C}, d_1)\mathbf{T}$, under Assumption 5.1, we consider $\hat{L}_t$:
$$\begin{aligned}
\|\hat{L}_t - \mathbf{T}^\top\bar{L}\|_F &= \big\|\hat{A}_t^\dagger\hat{\mathbf{O}}_t^\dagger\hat{\mathcal{H}}_t^- - \mathbf{T}^\top\bar{A}^\dagger\mathbf{O}^\dagger\mathcal{H}^-\big\|_F\\
&\leq \big\|(\hat{A}_t^\dagger - \mathbf{T}^\top\bar{A}^\dagger\mathbf{T})\hat{\mathbf{O}}_t^\dagger\hat{\mathcal{H}}_t^-\big\|_F + \big\|\mathbf{T}^\top\bar{A}^\dagger\mathbf{T}(\hat{\mathbf{O}}_t^\dagger - \mathbf{Q}^\dagger)\hat{\mathcal{H}}_t^-\big\|_F + \big\|\mathbf{T}^\top\bar{A}^\dagger\mathbf{T}\mathbf{Q}^\dagger(\hat{\mathcal{H}}_t^- - \mathcal{H}^-)\big\|_F\\
&\leq \|\hat{A}_t^\dagger - \mathbf{T}^\top\bar{A}^\dagger\mathbf{T}\|_F\,\|\hat{\mathbf{O}}_t^\dagger\|\,\|\hat{\mathcal{H}}_t^-\| + \|\hat{\mathbf{O}}_t^\dagger - \mathbf{Q}^\dagger\|_F\,\|\bar{A}^\dagger\|\,\|\hat{\mathcal{H}}_t^-\| + \sqrt{n}\,\|\hat{\mathcal{H}}_t^- - \mathcal{H}^-\|\,\|\bar{A}^\dagger\|\,\|\mathbf{Q}^\dagger\|\\
&\leq \left(\|\hat{A}_t^\dagger - \mathbf{T}^\top\bar{A}^\dagger\mathbf{T}\|_F\sqrt{\frac{2}{\sigma_n(\mathcal{N})}} + \|\mathcal{N} - \hat{\mathcal{N}}_t\|\sqrt{\frac{40n}{\sigma_n^3(\mathcal{N})}}\,\|\bar{A}^\dagger\|\right)\left(\|\mathcal{H}^-\| + \|\hat{\mathcal{H}}_t^- - \mathcal{H}^-\|\right) + \frac{\sqrt{n}\,\|\bar{A}^\dagger\|}{\sqrt{\sigma_n(\mathcal{N})}}\,\|\hat{\mathcal{H}}_t^- - \mathcal{H}^-\|.
\end{aligned}$$
Again using the perturbation bounds on the Moore–Penrose inverse under the Frobenius norm [197], we have $\|\hat{A}_t^\dagger - \mathbf{T}^\top\bar{A}^\dagger\mathbf{T}\|_F \leq \frac{2}{\sigma_n^2(\bar{A})}\|\hat{A}_t - \mathbf{T}^\top\bar{A}\mathbf{T}\|$. Notice that the similarity transformation that maps $A$ to $\bar{A}$ is bounded, since $\mathbf{O} = [\bar{C}^\top\ (\bar{C}\bar{A})^\top \cdots (\bar{C}\bar{A}^{d_1-1})^\top]^\top = \mathcal{O}(\bar{A}, \bar{C}, d_1)$. Combining all of these and using Lemma 5.1, we obtain the confidence set for $\hat{L}_t$ given in Theorem 5.4. □

Combining Theorem 5.4 with the guarantee that $\|\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu}\| = \tilde{\mathcal{O}}(1/\sqrt{T})$ given in (5.23) finishes the proof of the second part of Theorem 5.2. Overall, we showed that our novel system identification method allows closed-loop and open-loop estimation in both LQG and ARX systems. This method will be the key piece in our adaptive control design.
Remark 5.1. Note that to recover $\mathcal{G}_{yu}$ using the closed-loop system identification method presented in this section, we only require a stabilizability condition on $(A, B)$ and a detectability condition on $(A, C)$, i.e., that there exist matrices $K$ and $F$ such that $A - BK$ and $A - FC$ are stable, rather than the controllability and observability conditions given in Assumption 5.1. Stabilizability and detectability are necessary and sufficient conditions for a well-defined learning and control problem in partially observable linear dynamical systems, and they provide the conditions required for our novel closed-loop system identification method to work, i.e., a stable $\bar{A}$. However, the controllability and observability assumptions are required for the subspace identification method SysId, since it requires rank-$n$ observability and controllability matrices to achieve a balanced realization. If the goal is to recover the Markov parameters of the system, or if one can design adaptive control methods using only the Markov parameter estimates, e.g., Section 5.6.5, then stabilizability and detectability of the underlying system are sufficient to obtain reliable estimates as in Theorem 5.3 and (5.23).

5.3.2 PE Condition in the Open-Loop Setting
Before studying the adaptive control problem in partially observable linear dynamical systems, we close this section by showing that the PE condition required for consistent estimation is satisfied under open-loop control, i.e., with i.i.d. Gaussian control inputs. To this end, we introduce the truncated open-loop noise evolution parameter $\mathcal{G}_{ol}$, which represents the effect of the noises in the system on the outputs. We define $\mathcal{G}_{ol}$ for $2H$ time steps back in time and show that the last $2H$ process and measurement noises provide sufficient persistent excitation for the covariates in the estimation problem. In the following, we show that there exists a positive $\sigma_o$ such that $\sigma_o < \sigma_{\min}(\mathcal{G}_{ol})$, i.e., $\mathcal{G}_{ol}$ is full row rank. Let $\bar{\phi}_t = P\phi_t$ for a permutation matrix $P$ that gives
$$\bar{\phi}_t = \begin{bmatrix} y_{t-1}^\top & u_{t-1}^\top & \cdots & y_{t-H}^\top & u_{t-H}^\top \end{bmatrix}^\top \in \mathbb{R}^{(m+p)H}.$$
We will consider the state-space representation of LQG control systems given in (5.1) for the analysis, but the same analysis applies to predictor-form/ARX systems (see [163] for the details). For the control input $u_t \sim \mathcal{N}(0, \sigma_u^2 I)$, let $o_t = [y_t^\top\ u_t^\top]^\top$. From the evolution of the system with the given input, we have
$$o_t = \mathcal{G}_o\begin{bmatrix} w_{t-1}^\top & z_t^\top & u_t^\top & \cdots & w_{t-H}^\top & z_{t-H+1}^\top & u_{t-H+1}^\top \end{bmatrix}^\top + r_{o,t},$$
where
$$\mathcal{G}_o := \begin{bmatrix} 0_{m\times n} & I_{m\times m} & 0_{m\times p} & C & 0_{m\times m} & CB & \cdots & CA^{H-2} & 0_{m\times m} & CA^{H-2}B\\ 0_{p\times n} & 0_{p\times m} & I_{p\times p} & 0_{p\times n} & 0_{p\times m} & 0_{p\times p} & \cdots & 0_{p\times n} & 0_{p\times m} & 0_{p\times p} \end{bmatrix},$$
and $r_{o,t}$ is the residual vector that represents the effect of $[w_{i-1}\ z_i\ u_i]$ for $0 \leq i < t-H$, which are independent of the terms above. Notice that $\mathcal{G}_o$ is full row rank even for $H = 1$, due to the first $(m+p)\times(n+m+p)$ block. Using this, we can represent $\bar{\phi}_t$ as follows:
$$\bar{\phi}_t = \underbrace{\begin{bmatrix} o_{t-1}\\ \vdots\\ o_{t-H} \end{bmatrix}}_{\in\,\mathbb{R}^{(m+p)H}} = \mathcal{G}_{ol}\underbrace{\begin{bmatrix} w_{t-2}\\ z_{t-1}\\ u_{t-1}\\ \vdots\\ w_{t-2H-1}\\ z_{t-2H}\\ u_{t-2H} \end{bmatrix}}_{\in\,\mathbb{R}^{2(n+m+p)H}} + \begin{bmatrix} r_{o,t-1}\\ \vdots\\ r_{o,t-H} \end{bmatrix},$$
where
$$\mathcal{G}_{ol} := \begin{bmatrix}
[\,\mathcal{G}_o\,] & 0_{(m+p)\times(n+m+p)} & 0_{(m+p)\times(n+m+p)} & 0_{(m+p)\times(n+m+p)} & \cdots\\
0_{(m+p)\times(n+m+p)} & [\,\mathcal{G}_o\,] & 0_{(m+p)\times(n+m+p)} & 0_{(m+p)\times(n+m+p)} & \cdots\\
 & & \ddots & & \\
0_{(m+p)\times(n+m+p)} & 0_{(m+p)\times(n+m+p)} & \cdots & [\,\mathcal{G}_o\,] & 0_{(m+p)\times(n+m+p)}\\
0_{(m+p)\times(n+m+p)} & 0_{(m+p)\times(n+m+p)} & 0_{(m+p)\times(n+m+p)} & \cdots & [\,\mathcal{G}_o\,]
\end{bmatrix}, \tag{5.29}$$
where each $[\,\mathcal{G}_o\,]$ block spans $H$ of the $(n+m+p)$-wide noise blocks and each block row is shifted to the right by one such block relative to the previous one.
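The block structure of $\mathcal{G}_o$ and $\mathcal{G}_{ol}$ is easy to verify numerically. The sketch below builds both matrices for a small random system, following the displays above (noise blocks ordered as $[w,\ z,\ u]$ and no direct feedthrough), and checks the full-row-rank claim by computing the smallest singular value; all dimensions and the generator code are illustrative only.

```python
import numpy as np

def build_G_o(A, B, C, H):
    """G_o maps the stacked [w, z, u] noise/input blocks of the last H steps to [y_t; u_t]."""
    n, p = B.shape
    m = C.shape[0]
    row_y, row_u = [], []
    for k in range(H):
        if k == 0:
            # most recent block: y_t picks up z_t directly, u_t picks up u_t directly
            row_y += [np.zeros((m, n)), np.eye(m), np.zeros((m, p))]
            row_u += [np.zeros((p, n)), np.zeros((p, m)), np.eye(p)]
        else:
            Ak = np.linalg.matrix_power(A, k - 1)
            row_y += [C @ Ak, np.zeros((m, m)), C @ Ak @ B]
            row_u += [np.zeros((p, n)), np.zeros((p, m)), np.zeros((p, p))]
    return np.vstack([np.hstack(row_y), np.hstack(row_u)])

def build_G_ol(G_o, n, m, p, H):
    """Block 'staircase' of G_o: row block i is G_o shifted right by i noise blocks."""
    blk = n + m + p
    G_ol = np.zeros(((m + p) * H, 2 * blk * H))
    for i in range(H):
        G_ol[i * (m + p):(i + 1) * (m + p), i * blk:i * blk + G_o.shape[1]] = G_o
    return G_ol

rng = np.random.default_rng(2)
n, m, p, H = 3, 2, 1, 4
A = 0.5 * rng.standard_normal((n, n))
B, C = rng.standard_normal((n, p)), rng.standard_normal((m, n))
G_o = build_G_o(A, B, C, H)
G_ol = build_G_ol(G_o, n, m, p, H)
print(np.linalg.svd(G_ol, compute_uv=False)[-1])   # smallest singular value, positive for full row rank
```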
Recall Assumption 5.1. The following lemma shows that the covariates $\phi_i$ are bounded for the given system under open-loop control.
Lemma 5.3. After applying the control inputs $u_t \sim \mathcal{N}(0, \sigma_u^2 I)$ for $T_w$ time steps, for all $1 \leq t \leq T_w$, with probability $1-\delta/2$,
$$\|x_t\| \leq X_w := \frac{(\sigma_w + \sigma_u\|B\|)\,\kappa_1(1-\gamma_1)}{\sqrt{1-(1-\gamma_1)^2}}\sqrt{2n\log(12nT_w/\delta)}, \tag{5.30}$$
$$\|z_t\| \leq Z := \sigma_z\sqrt{2m\log(12mT_w/\delta)}, \tag{5.31}$$
$$\|u_t\| \leq U_w := \sigma_u\sqrt{2p\log(12pT_w/\delta)}, \tag{5.32}$$
$$\|y_t\| \leq \|C\|X_w + Z. \tag{5.33}$$
Thus, we have $\max_{i\leq T_w}\|\phi_i\| \leq \Upsilon_w\sqrt{H}$, where $\Upsilon_w = \|C\|X_w + Z + U_w$.
Proof. For all $1 \leq t \leq T_w$, the covariance of $x_t$ satisfies $\Sigma(x_t) \preceq \Sigma_\infty$, where $\Sigma_\infty$ is the steady-state covariance matrix of $x_t$:
$$\Sigma_\infty = \sum_{i=0}^{\infty} \sigma_w^2 A^i (A^\top)^i + \sigma_u^2 A^i B B^\top (A^\top)^i.$$
From Assumption 5.1, we have $\|A^i\| \leq \kappa_1(1-\gamma_1)^i$ for all $i \geq 0$. Thus, $\|\Sigma_\infty\| \leq \frac{(\sigma_w^2 + \sigma_u^2\|B\|^2)\,\kappa_1^2(1-\gamma_1)^2}{1-(1-\gamma_1)^2}$. Notice that each $x_t$ is a component-wise $\sqrt{\|\Sigma_\infty\|}$-sub-Gaussian random vector. Using the standard sub-Gaussian vector norm upper bound with a union bound argument, we get the advertised result. □

The following lemma shows that i.i.d. Gaussian inputs uniformly excite the system and satisfy the PE condition after enough interactions.
Lemma 5.4 (Persistence of Excitation in the Open-Loop Control Setting). $\mathcal{G}_{ol}$ is full row rank, so that $\sigma_{\min}(\mathcal{G}_{ol}) > \sigma_o > 0$. For some $\delta \in (0,1)$, and $\Upsilon_w$ defined in Lemma 5.3, let $T_o = 32\,\Upsilon_w^4\,\sigma_o^{-4}\log^2\!\left(\frac{2H(m+p)}{\delta}\right)\max\{\sigma_w^{-4}, \sigma_z^{-4}, \sigma_u^{-4}\}$. After applying the control inputs $u_t \sim \mathcal{N}(0, \sigma_u^2 I)$ for $T_w \geq T_o$ time steps, with probability at least $1-\delta$ we have
$$\sigma_{\min}\!\left(\sum_{i=1}^{t}\phi_i\phi_i^\top\right) \geq \frac{t\,\sigma_o^2}{2}\min\{\sigma_w^2, \sigma_z^2, \sigma_u^2\}.$$
Proof. Let $\bar{0} = 0_{(m+p)\times(n+m+p)}$. Since each block row of $\mathcal{G}_{ol}$ is full row rank, applying a QR decomposition to each block row yields the factorization
$$\mathcal{G}_{ol} = \underbrace{\begin{bmatrix} Q_o & 0_{m+p} & \cdots & 0_{m+p}\\ 0_{m+p} & Q_o & \cdots & 0_{m+p}\\ & & \ddots & \\ 0_{m+p} & \cdots & & Q_o \end{bmatrix}}_{\in\,\mathbb{R}^{(m+p)H\times(m+p)H}}\ \underbrace{\begin{bmatrix} R_o & \bar{0} & \bar{0} & \cdots & \\ \bar{0} & R_o & \bar{0} & \cdots & \\ & & \ddots & & \\ \bar{0} & \cdots & R_o & \bar{0} & \\ \bar{0} & \bar{0} & \cdots & & R_o \end{bmatrix}}_{\in\,\mathbb{R}^{(m+p)H\times 2(n+m+p)H}},$$
where $R_o \in \mathbb{R}^{(m+p)\times H(n+m+p)}$ has the form
$$R_o = \begin{bmatrix} \times & \times & \times & \times & \times & \times & \cdots\\ 0 & \times & \times & \times & \times & \times & \cdots\\ & & \ddots & & & & \\ 0 & 0 & 0 & \times & \times & \times & \cdots \end{bmatrix},$$
in which the elements on the diagonal are positive. Notice that the first matrix, composed of the $Q_o$ blocks, is full rank. Moreover, all the rows of the second matrix are in row echelon form, so the second matrix is full row rank. Thus, we can deduce that $\mathcal{G}_{ol}$ is full row rank, i.e., $\sigma_{\min}(\mathcal{G}_{ol}) > \sigma_o > 0$. Since $\mathcal{G}_{ol}$ is full row rank, we have
$$\mathbb{E}\big[\bar{\phi}_t\bar{\phi}_t^\top\big] \succeq \mathcal{G}_{ol}\,\Sigma_{w,z,u}\,\mathcal{G}_{ol}^\top,$$
where $\Sigma_{w,z,u} = \operatorname{diag}(\sigma_w^2, \sigma_z^2, \sigma_u^2, \ldots, \sigma_w^2, \sigma_z^2, \sigma_u^2) \in \mathbb{R}^{2(n+m+p)H\times 2(n+m+p)H}$. This gives us $\sigma_{\min}\big(\mathbb{E}[\bar{\phi}_t\bar{\phi}_t^\top]\big) \geq \sigma_o^2\min\{\sigma_w^2, \sigma_z^2, \sigma_u^2\}$ for all $t$. From Lemma 5.3, we have $\max_{i\leq T_w}\|\phi_i\| \leq \Upsilon_w\sqrt{H}$ with probability at least $1-\delta/2$. Given that this holds, one can use the Matrix Azuma inequality in [267] to obtain the following, which holds with probability $1-\delta/2$:
$$\sigma_{\max}\!\left(\sum_{i=1}^{t}\phi_i\phi_i^\top - \mathbb{E}\big[\phi_i\phi_i^\top\big]\right) \leq 2\sqrt{2t}\,\Upsilon_w^2 H\sqrt{\log\frac{2H(m+p)}{\delta}}.$$
Using Weyl's inequality, during the warm-up period, with probability $1-\delta$ we have
$$\sigma_{\min}\!\left(\sum_{i=1}^{t}\phi_i\phi_i^\top\right) \geq t\,\sigma_o^2\min\{\sigma_w^2, \sigma_z^2, \sigma_u^2\} - 2\sqrt{2t}\,\Upsilon_w^2 H\sqrt{\log\frac{2H(m+p)}{\delta}}.$$
For all $t \geq T_o$, we have the stated lower bound. □
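A short simulation makes the conclusion of Lemma 5.4 tangible: under i.i.d. Gaussian inputs, the smallest eigenvalue of the empirical Gram matrix of the covariates grows roughly linearly in $t$. The sketch below uses a small, explicitly stable system and the same covariate construction sketched earlier; all dimensions and noise levels are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, p, H, T = 3, 2, 1, 4, 3000
A = np.diag([0.5, 0.3, -0.4])                      # a stable A, chosen for illustration
B, C = rng.standard_normal((n, p)), rng.standard_normal((m, n))
sigma_w, sigma_z, sigma_u = 0.5, 0.5, 1.0

x = np.zeros(n)
ys, us = np.zeros((T, m)), np.zeros((T, p))
for t in range(T):
    us[t] = sigma_u * rng.standard_normal(p)       # open-loop i.i.d. Gaussian input
    ys[t] = C @ x + sigma_z * rng.standard_normal(m)
    x = A @ x + B @ us[t] + sigma_w * rng.standard_normal(n)

d = (m + p) * H
V = np.zeros((d, d))
lam_min = []
for t in range(H, T):
    phi = np.concatenate([np.r_[ys[t - k], us[t - k]] for k in range(1, H + 1)])
    V += np.outer(phi, phi)
    lam_min.append(np.linalg.eigvalsh(V)[0])

# the smallest eigenvalue per time step settles to a positive level, i.e., linear growth of lam_min
print(lam_min[-1] / (T - H), lam_min[len(lam_min) // 2] / ((T - H) // 2))
```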
This result verifies that the PE condition holds in the open-loop control setting, and hence the estimation error guarantees of Theorem 5.2 hold for open-loop data collection. Therefore, even if the closed-loop PE condition fails, so that the guarantees of Theorem 5.2 cannot be certified under closed-loop control, one can still run the novel system identification method with i.i.d. control inputs and obtain state-of-the-art guarantees. If, on the other hand, PE does hold in the closed-loop setting, we can further guarantee consistent improvement of the estimates, which is not possible with prior methods. This novelty will be crucial in the adaptive control tasks discussed next.