5.3 A Novel Closed-Loop System Identification Method
5.3.1 Proof of Theorem 5.2
In this section, we first present the proof of Theorem 5.2 under the PE assumption, with precise expressions. In particular, we show the self-normalized error bound on the solution of (5.21) in Theorem 5.3. Then, assuming the PE condition, we convert the self-normalized bound into a Frobenius norm bound to be used for the parameter estimation error bounds in Theorem 5.4, which concludes the proof of Theorem 5.2.
First, consider the effect of the truncation bias term $C\bar{A}^H x_{t-H}$ in (5.18). From Assumption 5.1, we have that $\bar{A}$ is $(\kappa_3, \gamma_3)$-stable. Thus, $C\bar{A}^H x_{t-H}$ scales with the order of $(1-\gamma_3)^H$ for bounded $x$. In order to get consistent estimation, for some problem-dependent constant $c_H$, we set $H \geq \frac{\log(c_H T\sqrt{T}/\sqrt{\lambda})}{\log(1/(1-\gamma_3))}$, resulting in a negligible bias term of order $1/T$. Note that $c_H$ is determined by the underlying system and the control policy, since it is related to the scaling of the latent state. Using this, we first obtain a self-normalized finite-sample estimation error bound for (5.21):
Theorem 5.3 (Self-normalized Estimation Error). Let $\hat{\mathcal{G}}_{yu}$ be the solution to (5.21) at time $T$. For $H \geq \frac{\log(c_H T\sqrt{T}/\sqrt{\lambda})}{\log(1/(1-\gamma_3))}$, define $V_T = \lambda I + \sum_{i=H}^{T}\phi_i\phi_i^\top$. Let $\|\mathcal{G}_{yu}\|_F \leq S$. For $\delta \in (0,1)$, with probability at least $1-\delta$, for all $t \leq T$, $\mathcal{G}_{yu}$ lies in the set $\mathcal{C}_{\mathcal{G}_{yu},t}$, where
$$\mathcal{C}_{\mathcal{G}_{yu},t} = \left\{\mathcal{G}'_{yu} : \operatorname{Tr}\!\left((\hat{\mathcal{G}}_{yu} - \mathcal{G}'_{yu})\,V_t\,(\hat{\mathcal{G}}_{yu} - \mathcal{G}'_{yu})^\top\right) \leq \beta_T^2\right\}, \qquad \beta_T = \sqrt{m\,\Sigma_v\,\log\frac{\det(V_T)^{1/2}}{\delta\det(\lambda I)^{1/2}}} + S\sqrt{\lambda} + \frac{\sqrt{H}}{T},$$
where $\Sigma_v := \|C\Sigma C^\top + \sigma_z^2 I\|_F$.
The proof is given in Appendix C.1. Note that the above result holds for sub-Gaussian noise, which is satisfied in both LQG control systems and ARX systems. Using this result, we have
$$\sigma_{\min}(V_T)\,\|\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu}\|_F^2 \;\leq\; \operatorname{Tr}\!\left((\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu})\,V_T\,(\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu})^\top\right) \;\leq\; \beta_T^2.$$
Assume that $\phi_t$ is bounded (which will be shown rigorously for the different adaptive control algorithms in Sections 5.4–5.6), such that $\max_{i\leq T}\|\phi_i\| \leq \Upsilon\sqrt{H}$. For persistently exciting inputs, i.e., $\sigma_{\min}(V_T) \geq \sigma_\star^2 T$ for some $\sigma_\star > 0$, we get, with probability at least $1-\delta$,
$$\|\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu}\|_F \;\leq\; \frac{\sqrt{m\,\Sigma_v\left(\log\frac{1}{\delta} + \frac{H(m+p)}{2}\log\frac{\lambda(m+p) + T\Upsilon^2}{\lambda(m+p)}\right)} + S\sqrt{\lambda} + \sqrt{H}}{\sigma_\star\sqrt{T}} \tag{5.23}$$
after $T$ time steps. Note that $\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu} = [\hat{\mathcal{G}}_{y\to y},\ \hat{\mathcal{G}}_{u\to y}] - [\mathcal{G}_{y\to y},\ \mathcal{G}_{u\to y}]$; thus, (5.23) translates to the same error bounds for $\|\hat{\mathcal{G}}_{y\to y} - \mathcal{G}_{y\to y}\|$ and $\|\hat{\mathcal{G}}_{u\to y} - \mathcal{G}_{u\to y}\|$, proving the first part of Theorem 5.2. This result shows that the novel least-squares problem provides consistent estimates and that the estimation error is $\tilde{\mathcal{O}}(1/\sqrt{T})$ after $T$ samples.
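To make the estimation step concrete, the following is a minimal numpy sketch of the regularized least-squares problem behind (5.21) together with the Gram matrix $V_T$, under stated assumptions: the covariate stacks the last $H$ output-input pairs, and the ordering inside $\phi_t$ as well as the function name are illustrative choices, not the thesis's implementation.

```python
import numpy as np

def estimate_markov_operator(ys, us, H, lam=1.0):
    """Regularized least squares for the closed-loop Markov operator, a sketch of (5.21).

    ys: (T, m) outputs, us: (T, p) inputs, H: history length, lam: regularizer.
    Returns G_hat (m x (m+p)H) and the Gram matrix V = lam*I + sum_t phi_t phi_t^T.
    """
    T, m = ys.shape
    p = us.shape[1]
    d = (m + p) * H
    V = lam * np.eye(d)                 # V_T = lam*I + sum_t phi_t phi_t^T
    S = np.zeros((d, m))                # sum_t phi_t y_t^T
    for t in range(H, T):
        # covariate: the last H output-input pairs, most recent first (ordering is an assumption)
        phi = np.concatenate([np.r_[ys[t - k], us[t - k]] for k in range(1, H + 1)])
        V += np.outer(phi, phi)
        S += np.outer(phi, ys[t])
    G_hat = np.linalg.solve(V, S).T     # each row of G_hat maps phi_t to one output coordinate
    return G_hat, V

# usage on synthetic data (dimensions are illustrative only)
rng = np.random.default_rng(0)
ys = rng.standard_normal((500, 2))
us = rng.standard_normal((500, 1))
G_hat, V_T = estimate_markov_operator(ys, us, H=5)
```

Under persistently exciting inputs, the smallest eigenvalue of the returned Gram matrix grows linearly in $T$, which is exactly the condition used above to pass from the self-normalized bound to the Frobenius norm bound (5.23).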
For the second part of Theorem 5.2, we show that SysId provides a balanced realization of $\Theta$ such that we have confidence sets around the estimated model parameters in which a similarity transformation of $\Theta$ lives with high probability, similar to Theorem 5.1. For this, define $T_{\mathcal{G}_{yu}}$ as the number of samples required such that $\|\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu}\| \leq 1$ in (5.23). Let
$$T_N = T_{\mathcal{G}_{yu}}\,\frac{8H}{\sigma_n^2(\mathcal{H})}, \qquad T_B = T_{\mathcal{G}_{yu}}\,\frac{20nH}{\sigma_n(\mathcal{H})}. \tag{5.24}$$
We have the following result on the model parameter estimates.
Theorem 5.4 (Model Parameter Estimation Error). Let $\mathcal{H}$ be the concatenation of the two Hankel matrices obtained from $\mathcal{G}_{yu}$. Let $\bar{A}, \bar{B}, \bar{C}, \bar{F}, \bar{L}$ be the system parameters that SysId provides for $\mathcal{G}_{yu}$. At time step $t$, let $\hat{A}_t, \hat{B}_t, \hat{C}_t, \hat{F}_t, \hat{L}_t$ denote the system parameters obtained by SysId using $\hat{\mathcal{G}}_{yu}$. For all $t \geq \max\{T_{\mathcal{G}_{yu}}, T_N, T_B\}$ and $H \geq \max\left\{2n+1,\ \frac{\log(c_H T\sqrt{T}/\sqrt{\lambda})}{\log(1/(1-\gamma_3))}\right\}$, there exists a unitary matrix $\mathbf{T} \in \mathbb{R}^{n\times n}$ such that $\bar{\Theta} = (\bar{A}, \bar{B}, \bar{C}, \bar{F}, \bar{L}) \in (\mathcal{C}_A \times \mathcal{C}_B \times \mathcal{C}_C \times \mathcal{C}_F \times \mathcal{C}_L)$, where
$$\begin{aligned}
\mathcal{C}_A(t) &= \{A' \in \mathbb{R}^{n\times n} : \|\hat{A}_t - \mathbf{T}^\top A'\mathbf{T}\| \leq \beta^A_t\}, &\qquad \mathcal{C}_B(t) &= \{B' \in \mathbb{R}^{n\times p} : \|\hat{B}_t - \mathbf{T}^\top B'\| \leq \beta^B_t\},\\
\mathcal{C}_C(t) &= \{C' \in \mathbb{R}^{m\times n} : \|\hat{C}_t - C'\mathbf{T}\| \leq \beta^C_t\}, &\qquad \mathcal{C}_F(t) &= \{F' \in \mathbb{R}^{n\times m} : \|\hat{F}_t - \mathbf{T}^\top F'\| \leq \beta^F_t\},\\
\mathcal{C}_L(t) &= \{L' \in \mathbb{R}^{n\times m} : \|\hat{L}_t - \mathbf{T}^\top L'\| \leq \beta^L_t\},
\end{aligned} \tag{5.25}$$
for
$$\beta^A_t = c_1\,\frac{\sqrt{nH}\,\big(\|\mathcal{H}\| + \sigma_n(\mathcal{H})\big)}{\sigma_n^2(\mathcal{H})}\,\|\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu}\|, \qquad \beta^B_t = \beta^C_t = \beta^F_t = \sqrt{\frac{20nH}{\sigma_n(\mathcal{H})}}\,\|\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu}\|,$$
$$\beta^L_t = \frac{c_2\,\|\mathcal{H}\|}{\sqrt{\sigma_n(\mathcal{H})}}\,\beta^A_t + c_3\,\frac{\sqrt{nH}\,\big(\|\mathcal{H}\| + \sigma_n(\mathcal{H})\big)}{\sigma_n^{3/2}(\mathcal{H})}\,\|\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu}\|,$$
for some problem-dependent constants $c_1$, $c_2$, and $c_3$.
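For intuition about this step, the sketch below performs a generic Ho-Kalman-style balanced realization of the kind SysId builds from the estimated Markov operator. It is a simplified stand-in under stated assumptions (a single Hankel matrix built from one sequence of $m\times p$ Markov parameters, with the order $n$ known), not the exact SysId routine, which works with the concatenation of two Hankel matrices to recover both $\bar{B}$ and $\bar{F}$.

```python
import numpy as np

def ho_kalman_like(markov, d1, d2, n):
    """Balanced realization from (estimated) Markov parameters.

    markov[k] is the (m x p) block mapping an input k+1 steps back to the output.
    Returns (A, B, C) of a rank-n realization, unique only up to a similarity transform.
    """
    m, p = markov[0].shape
    # Hankel matrix with d1 block rows and d2+1 block columns
    Hk = np.block([[markov[i + j] for j in range(d2 + 1)] for i in range(d1)])
    Hminus, Hplus = Hk[:, :d2 * p], Hk[:, p:]        # drop last / first block column
    U, s, Vt = np.linalg.svd(Hminus, full_matrices=False)
    U, s, Vt = U[:, :n], s[:n], Vt[:n]               # rank-n truncation
    O = U * np.sqrt(s)                               # observability-like factor
    Cc = np.sqrt(s)[:, None] * Vt                    # controllability-like factor
    A = np.linalg.pinv(O) @ Hplus @ np.linalg.pinv(Cc)
    C = O[:m, :]                                     # first block row of O
    B = Cc[:, :p]                                    # first block column of Cc
    return A, B, C

# usage: Markov parameters of a small random system (illustrative)
rng = np.random.default_rng(0)
n, m, p, d1, d2 = 3, 2, 1, 4, 4
A = 0.5 * rng.standard_normal((n, n)); B = rng.standard_normal((n, p)); C = rng.standard_normal((m, n))
markov = [C @ np.linalg.matrix_power(A, k) @ B for k in range(d1 + d2 + 1)]
A_hat, B_hat, C_hat = ho_kalman_like(markov, d1, d2, n)
```

The returned realization is recovered only up to a similarity transformation, which is why Theorem 5.4 states its guarantees up to the unitary matrix $\mathbf{T}$.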
Before presenting the proof, we state the following lemmas, which are adapted from Oymak and Ozay [213] with slight modifications to fit our setting. In particular, they were originally stated for the Ho-Kalman algorithm, and SysId is a variant of this algorithm. These results will be useful in proving error bounds on the system parameters.
Lemma 5.1. $\mathcal{H}, \hat{\mathcal{H}}_t$ and $\mathcal{N}, \hat{\mathcal{N}}_t$ satisfy the following perturbation bounds:
$$\max\left\{\big\|\mathcal{H}^+ - \hat{\mathcal{H}}_t^+\big\|,\ \big\|\mathcal{H}^- - \hat{\mathcal{H}}_t^-\big\|\right\} \;\leq\; \|\mathcal{H} - \hat{\mathcal{H}}_t\| \;\leq\; \sqrt{\min\{d_1, d_2+1\}}\,\|\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu}\|,$$
$$\|\mathcal{N} - \hat{\mathcal{N}}_t\| \;\leq\; 2\big\|\mathcal{H}^- - \hat{\mathcal{H}}_t^-\big\| \;\leq\; 2\sqrt{\min\{d_1, d_2\}}\,\|\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu}\|.$$

Lemma 5.2. Suppose $\sigma_{\min}(\mathcal{N}) \geq 2\|\mathcal{N} - \hat{\mathcal{N}}\|$, where $\sigma_{\min}(\mathcal{N})$ is the smallest nonzero singular value (i.e., the $n$-th largest singular value) of $\mathcal{N}$. Let the rank-$n$ matrices $\mathcal{N}, \hat{\mathcal{N}}$ have singular value decompositions $\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top$ and $\hat{\mathbf{U}}\hat{\boldsymbol{\Sigma}}\hat{\mathbf{V}}^\top$. Then there exists an $n\times n$ unitary matrix $\mathbf{T}$ such that
$$\big\|\mathbf{U}\boldsymbol{\Sigma}^{1/2} - \hat{\mathbf{U}}\hat{\boldsymbol{\Sigma}}^{1/2}\mathbf{T}\big\|_F^2 + \big\|\mathbf{V}\boldsymbol{\Sigma}^{1/2} - \hat{\mathbf{V}}\hat{\boldsymbol{\Sigma}}^{1/2}\mathbf{T}\big\|_F^2 \;\leq\; \frac{5n\,\|\mathcal{N} - \hat{\mathcal{N}}\|^2}{\sigma_n(\mathcal{N}) - \|\mathcal{N} - \hat{\mathcal{N}}\|}.$$
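The role of the aligning unitary matrix in Lemma 5.2 can be checked numerically. The following sketch aligns the rank-$n$ balanced factors of a matrix and a small perturbation of it using an orthogonal Procrustes step; this alignment rule is an illustrative choice, not the construction used in the proof of the lemma.

```python
import numpy as np

def aligned_factor_error(N, N_hat, n):
    """Compare rank-n balanced factors of N and N_hat after a unitary alignment."""
    U, s, Vt = np.linalg.svd(N)
    U, s, Vt = U[:, :n], s[:n], Vt[:n]
    Uh, sh, Vht = np.linalg.svd(N_hat)
    Uh, sh, Vht = Uh[:, :n], sh[:n], Vht[:n]
    F1, F1h = U * np.sqrt(s), Uh * np.sqrt(sh)        # U Sigma^{1/2} and its perturbed version
    F2, F2h = Vt.T * np.sqrt(s), Vht.T * np.sqrt(sh)  # V Sigma^{1/2} and its perturbed version
    # one unitary T aligning both factor pairs: orthogonal Procrustes on the stacked factors
    M = F1h.T @ F1 + F2h.T @ F2
    W, _, Zt = np.linalg.svd(M)
    T = W @ Zt
    return np.linalg.norm(F1 - F1h @ T) ** 2 + np.linalg.norm(F2 - F2h @ T) ** 2

rng = np.random.default_rng(1)
N = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 10))     # a rank-3 matrix
err = aligned_factor_error(N, N + 1e-3 * rng.standard_normal(N.shape), n=3)
```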
Proof. For brevity, we write $\mathbf{O} = \mathcal{O}(\bar{A}, \bar{C}, d_1)$, $\mathbf{C}_F = \mathcal{C}(\bar{A}, \bar{F}, d_2+1)$, and $\mathbf{C}_B = \mathcal{C}(\bar{A}, \bar{B}, d_2+1)$ for the observability and controllability matrices constructed from $\mathcal{G}_{yu}$, and $\hat{\mathbf{O}}_t$, $\hat{\mathbf{C}}_{F,t}$, $\hat{\mathbf{C}}_{B,t}$ for their counterparts constructed from $\hat{\mathcal{G}}_{yu}$. In the definition of $T_N$, we use $\sigma_n(\mathcal{H})$ due to the fact that singular values of submatrices obtained by column partitioning are interlaced, i.e., $\sigma_n(\mathcal{N}) = \sigma_n(\mathcal{H}^-) \geq \sigma_n(\mathcal{H})$. Directly applying Lemma 5.2, with the condition that for $t \geq T_N$ we have $\sigma_{\min}(\mathcal{N}) \geq 2\|\mathcal{N} - \hat{\mathcal{N}}\|$, we can guarantee that there exists a unitary transform $\mathbf{T}$ such that
$$\big\|\hat{\mathbf{O}}_t - \mathbf{O}\mathbf{T}\big\|_F^2 + \big\|[\hat{\mathbf{C}}_{F,t}\ \hat{\mathbf{C}}_{B,t}] - \mathbf{T}^\top[\mathbf{C}_F\ \mathbf{C}_B]\big\|_F^2 \;\leq\; \frac{10n\,\|\mathcal{N} - \hat{\mathcal{N}}_t\|^2}{\sigma_n(\mathcal{N})}. \tag{5.26}$$
Since $\hat{C}_t - \bar{C}\mathbf{T}$ is a submatrix of $\hat{\mathbf{O}}_t - \mathbf{O}\mathbf{T}$, $\hat{B}_t - \mathbf{T}^\top\bar{B}$ is a submatrix of $\hat{\mathbf{C}}_{B,t} - \mathbf{T}^\top\mathbf{C}_B$, and $\hat{F}_t - \mathbf{T}^\top\bar{F}$ is a submatrix of $\hat{\mathbf{C}}_{F,t} - \mathbf{T}^\top\mathbf{C}_F$, the bound stated in (5.26) applies to each of them as well. Using Lemma 5.1, with the choice $d_1, d_2 \geq \frac{H}{2}$, we have
$$\|\mathcal{N} - \hat{\mathcal{N}}_t\| \;\leq\; \sqrt{2H}\,\|\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu}\|.$$
This provides the advertised bounds in the theorem:
$$\|\hat{B}_t - \mathbf{T}^\top\bar{B}\|,\ \|\hat{C}_t - \bar{C}\mathbf{T}\|,\ \|\hat{F}_t - \mathbf{T}^\top\bar{F}\| \;\leq\; \frac{\sqrt{20nH}\,\|\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu}\|}{\sqrt{\sigma_n(\mathcal{N})}}.$$
Notice that for $t \geq T_B$, all of the terms above are bounded by 1. In order to determine the closeness of $\hat{A}_t$ and $\bar{A}$, we first consider the closeness of $\hat{\bar{A}}_t$ and $\mathbf{T}^\top\bar{\bar{A}}\mathbf{T}$, where $\bar{\bar{A}}$ is the output obtained by SysId for $\bar{A}$ when the input is $\mathcal{G}_{yu}$. Let $\mathbf{Q} = \mathbf{O}\mathbf{T}$ and $\mathbf{Z} = \mathbf{T}^\top[\mathbf{C}_F\ \mathbf{C}_B]$. Thus, we have
$$\begin{aligned}
\|\hat{\bar{A}}_t - \mathbf{T}^\top\bar{\bar{A}}\mathbf{T}\|_F &= \big\|\hat{\mathbf{O}}_t^\dagger\hat{\mathcal{H}}_t^+[\hat{\mathbf{C}}_{F,t}\ \hat{\mathbf{C}}_{B,t}]^\dagger - \mathbf{Q}^\dagger\mathcal{H}^+\mathbf{Z}^\dagger\big\|_F\\
&\leq \big\|(\hat{\mathbf{O}}_t^\dagger - \mathbf{Q}^\dagger)\hat{\mathcal{H}}_t^+[\hat{\mathbf{C}}_{F,t}\ \hat{\mathbf{C}}_{B,t}]^\dagger\big\|_F + \big\|\mathbf{Q}^\dagger(\hat{\mathcal{H}}_t^+ - \mathcal{H}^+)[\hat{\mathbf{C}}_{F,t}\ \hat{\mathbf{C}}_{B,t}]^\dagger\big\|_F + \big\|\mathbf{Q}^\dagger\mathcal{H}^+\big([\hat{\mathbf{C}}_{F,t}\ \hat{\mathbf{C}}_{B,t}]^\dagger - \mathbf{Z}^\dagger\big)\big\|_F.
\end{aligned}$$
For the first term, we have the following perturbation bound [197, 291]:
$$\|\hat{\mathbf{O}}_t^\dagger - \mathbf{Q}^\dagger\|_F \;\leq\; \|\hat{\mathbf{O}}_t - \mathbf{Q}\|_F\,\max\{\|\mathbf{Q}^\dagger\|^2, \|\hat{\mathbf{O}}_t^\dagger\|^2\} \;\leq\; \|\mathcal{N} - \hat{\mathcal{N}}_t\|\sqrt{\frac{10n}{\sigma_n(\mathcal{N})}}\,\max\{\|\mathbf{Q}^\dagger\|^2, \|\hat{\mathbf{O}}_t^\dagger\|^2\}.$$
Since we already have $\sigma_n(\mathcal{N}) \geq 2\|\mathcal{N} - \hat{\mathcal{N}}\|$, it follows that $\|\hat{\mathcal{N}}\| \leq 2\|\mathcal{N}\|$ and $2\sigma_n(\hat{\mathcal{N}}) \geq \sigma_n(\mathcal{N})$. Thus,
$$\max\{\|\mathbf{Q}^\dagger\|^2, \|\hat{\mathbf{O}}_t^\dagger\|^2\} = \max\left\{\frac{1}{\sigma_n(\mathcal{N})}, \frac{1}{\sigma_n(\hat{\mathcal{N}})}\right\} \;\leq\; \frac{2}{\sigma_n(\mathcal{N})}. \tag{5.27}$$
Combining these and following the same steps for $\|[\hat{\mathbf{C}}_{F,t}\ \hat{\mathbf{C}}_{B,t}]^\dagger - \mathbf{Z}^\dagger\|_F$, we get
$$\big\|\hat{\mathbf{O}}_t^\dagger - \mathbf{Q}^\dagger\big\|_F,\ \big\|[\hat{\mathbf{C}}_{F,t}\ \hat{\mathbf{C}}_{B,t}]^\dagger - \mathbf{Z}^\dagger\big\|_F \;\leq\; \|\mathcal{N} - \hat{\mathcal{N}}_t\|\sqrt{\frac{40n}{\sigma_n^3(\mathcal{N})}}. \tag{5.28}$$
The following individual bounds are obtained by using (5.27), (5.28), and the triangle inequality:
$$\big\|(\hat{\mathbf{O}}_t^\dagger - \mathbf{Q}^\dagger)\hat{\mathcal{H}}_t^+[\hat{\mathbf{C}}_{F,t}\ \hat{\mathbf{C}}_{B,t}]^\dagger\big\|_F \;\leq\; \|\hat{\mathbf{O}}_t^\dagger - \mathbf{Q}^\dagger\|_F\,\|\hat{\mathcal{H}}_t^+\|\,\|[\hat{\mathbf{C}}_{F,t}\ \hat{\mathbf{C}}_{B,t}]^\dagger\| \;\leq\; \frac{4\sqrt{5n}\,\|\mathcal{N} - \hat{\mathcal{N}}_t\|}{\sigma_n^2(\mathcal{N})}\left(\|\mathcal{H}^+\| + \|\hat{\mathcal{H}}_t^+ - \mathcal{H}^+\|\right),$$
$$\big\|\mathbf{Q}^\dagger(\hat{\mathcal{H}}_t^+ - \mathcal{H}^+)[\hat{\mathbf{C}}_{F,t}\ \hat{\mathbf{C}}_{B,t}]^\dagger\big\|_F \;\leq\; \frac{2\sqrt{n}\,\|\hat{\mathcal{H}}_t^+ - \mathcal{H}^+\|}{\sigma_n(\mathcal{N})},$$
$$\big\|\mathbf{Q}^\dagger\mathcal{H}^+\big([\hat{\mathbf{C}}_{F,t}\ \hat{\mathbf{C}}_{B,t}]^\dagger - \mathbf{Z}^\dagger\big)\big\|_F \;\leq\; \|\mathbf{Q}^\dagger\|\,\|\mathcal{H}^+\|\,\big\|[\hat{\mathbf{C}}_{F,t}\ \hat{\mathbf{C}}_{B,t}]^\dagger - \mathbf{Z}^\dagger\big\| \;\leq\; \frac{2\sqrt{10n}\,\|\mathcal{N} - \hat{\mathcal{N}}_t\|}{\sigma_n^2(\mathcal{N})}\,\|\mathcal{H}^+\|.$$
Combining these, we get
$$\|\hat{\bar{A}}_t - \mathbf{T}^\top\bar{\bar{A}}\mathbf{T}\|_F \;\leq\; \frac{31\sqrt{n}\,\|\mathcal{H}^+\|\,\|\mathcal{N} - \hat{\mathcal{N}}_t\|}{2\sigma_n^2(\mathcal{N})} + \|\hat{\mathcal{H}}_t^+ - \mathcal{H}^+\|\left(\frac{4\sqrt{5n}\,\|\mathcal{N} - \hat{\mathcal{N}}_t\|}{\sigma_n^2(\mathcal{N})} + \frac{2\sqrt{n}}{\sigma_n(\mathcal{N})}\right) \;\leq\; \frac{31\sqrt{n}\,\|\mathcal{H}^+\|}{2\sigma_n^2(\mathcal{N})}\,\|\mathcal{N} - \hat{\mathcal{N}}_t\| + \frac{13\sqrt{n}}{2\sigma_n(\mathcal{N})}\,\|\hat{\mathcal{H}}_t^+ - \mathcal{H}^+\|.$$
These results give the estimation error guarantees for ARX systems. For LQG control systems, we additionally need to recover $A$ and $L$. Now consider $\hat{A}_t = \hat{\bar{A}}_t + \hat{F}_t\hat{C}_t$. Using Lemma 5.1,
$$\begin{aligned}
\|\hat{A}_t - \mathbf{T}^\top\bar{A}\mathbf{T}\|_F &= \|\hat{\bar{A}}_t + \hat{F}_t\hat{C}_t - \mathbf{T}^\top\bar{\bar{A}}\mathbf{T} - \mathbf{T}^\top\bar{F}\bar{C}\mathbf{T}\|_F\\
&\leq \|\hat{\bar{A}}_t - \mathbf{T}^\top\bar{\bar{A}}\mathbf{T}\|_F + \|(\hat{F}_t - \mathbf{T}^\top\bar{F})\hat{C}_t\|_F + \|\mathbf{T}^\top\bar{F}(\hat{C}_t - \bar{C}\mathbf{T})\|_F\\
&\leq \|\hat{\bar{A}}_t - \mathbf{T}^\top\bar{\bar{A}}\mathbf{T}\|_F + \|\hat{F}_t - \mathbf{T}^\top\bar{F}\|_F\,\|\hat{C}_t - \bar{C}\mathbf{T}\|_F + \|\hat{F}_t - \mathbf{T}^\top\bar{F}\|_F\,\|\bar{C}\| + \|\bar{F}\|\,\|\hat{C}_t - \bar{C}\mathbf{T}\|_F\\
&\leq \frac{31\sqrt{2nH}\,\|\mathcal{H}\|}{2\sigma_n^2(\mathcal{N})}\,\|\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu}\| + \frac{13\sqrt{nH}}{2\sqrt{2}\,\sigma_n(\mathcal{N})}\,\|\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu}\| + \frac{20nH\,\|\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu}\|^2}{\sigma_n(\mathcal{N})} + (\|\bar{F}\| + \|\bar{C}\|)\,\|\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu}\|\sqrt{\frac{20nH}{\sigma_n(\mathcal{N})}}.
\end{aligned}$$
Using the result above, in order to obtain an estimation error bound for $\hat{L}_t$, we define $T_A$ as the number of samples required to have $\|\hat{A}_t - \mathbf{T}^\top\bar{A}\mathbf{T}\| \leq \sigma_n(\bar{A})/2$ for all $t \geq T_A$, i.e.,
$$T_A = T_{\mathcal{G}_{yu}}\left(\frac{\frac{62\sqrt{2nH}\,\|\mathcal{H}\|}{2\sigma_n^2(\mathcal{N})} + \frac{26\sqrt{nH}}{2\sqrt{2}\,\sigma_n(\mathcal{N})} + (\|\bar{F}\| + \|\bar{C}\|)\sqrt{\frac{80nH}{\sigma_n(\mathcal{N})}} + \sqrt{\frac{40nH\,\sigma_n(\bar{A})}{\sigma_n(\mathcal{N})}}}{\sigma_n(\bar{A})}\right)^{\!2}.$$
From Weyl's inequality, we have $\sigma_n(\hat{A}_t) \geq \sigma_n(\bar{A})/2$. Recalling that $\mathbf{Q} = \mathcal{O}(\bar{A}, \bar{C}, d_1)\mathbf{T}$, under Assumption 5.1, we consider $\hat{L}_t$:
$$\begin{aligned}
\|\hat{L}_t - \mathbf{T}^\top\bar{L}\|_F &= \big\|\hat{A}_t^\dagger\hat{\mathbf{O}}_t^\dagger\hat{\mathcal{H}}_t^- - \mathbf{T}^\top\bar{A}^\dagger\mathbf{O}^\dagger\mathcal{H}^-\big\|_F\\
&\leq \big\|(\hat{A}_t^\dagger - \mathbf{T}^\top\bar{A}^\dagger\mathbf{T})\hat{\mathbf{O}}_t^\dagger\hat{\mathcal{H}}_t^-\big\|_F + \big\|\mathbf{T}^\top\bar{A}^\dagger\mathbf{T}(\hat{\mathbf{O}}_t^\dagger - \mathbf{Q}^\dagger)\hat{\mathcal{H}}_t^-\big\|_F + \big\|\mathbf{T}^\top\bar{A}^\dagger\mathbf{T}\mathbf{Q}^\dagger(\hat{\mathcal{H}}_t^- - \mathcal{H}^-)\big\|_F\\
&\leq \|\hat{A}_t^\dagger - \mathbf{T}^\top\bar{A}^\dagger\mathbf{T}\|_F\,\|\hat{\mathbf{O}}_t^\dagger\|\,\|\hat{\mathcal{H}}_t^-\| + \|\hat{\mathbf{O}}_t^\dagger - \mathbf{Q}^\dagger\|_F\,\|\bar{A}^\dagger\|\,\|\hat{\mathcal{H}}_t^-\| + \sqrt{n}\,\|\hat{\mathcal{H}}_t^- - \mathcal{H}^-\|\,\|\bar{A}^\dagger\|\,\|\mathbf{Q}^\dagger\|\\
&\leq \left(\|\hat{A}_t^\dagger - \mathbf{T}^\top\bar{A}^\dagger\mathbf{T}\|_F\sqrt{\frac{2}{\sigma_n(\mathcal{N})}} + \|\mathcal{N} - \hat{\mathcal{N}}_t\|\sqrt{\frac{40n}{\sigma_n^3(\mathcal{N})}}\,\|\bar{A}^\dagger\|\right)\left(\|\mathcal{H}^-\| + \|\hat{\mathcal{H}}_t^- - \mathcal{H}^-\|\right) + \frac{\sqrt{n}\,\|\bar{A}^\dagger\|}{\sqrt{\sigma_n(\mathcal{N})}}\,\|\hat{\mathcal{H}}_t^- - \mathcal{H}^-\|.
\end{aligned}$$
Again using the perturbation bounds on the Moore–Penrose inverse under the Frobenius norm [197], we have $\|\hat{A}_t^\dagger - \mathbf{T}^\top\bar{A}^\dagger\mathbf{T}\|_F \leq \frac{2}{\sigma_n^2(\bar{A})}\|\hat{A}_t - \mathbf{T}^\top\bar{A}\mathbf{T}\|$. Notice that the similarity transformation that maps $A$ to $\bar{A}$ is bounded, since $\mathbf{O} = [\bar{C}^\top\ (\bar{C}\bar{A})^\top \cdots (\bar{C}\bar{A}^{d_1-1})^\top]^\top = \mathcal{O}(\bar{A}, \bar{C}, d_1)$. Combining all of these and using Lemma 5.1, we obtain the confidence set for $\hat{L}_t$ given in Theorem 5.4. □

Combining Theorem 5.4 with the guarantee that $\|\hat{\mathcal{G}}_{yu} - \mathcal{G}_{yu}\| = \tilde{\mathcal{O}}(1/\sqrt{T})$ given in (5.23) finishes the proof of the second part of Theorem 5.2. Overall, we showed that our novel system identification method allows closed-loop and open-loop estimation in both LQG and ARX systems. This method will be the key piece in our adaptive control design.
Remark 5.1. Note that to recover $\mathcal{G}_{yu}$ using the closed-loop system identification method presented in this section, we only require a stabilizability condition on $(A, B)$ and a detectability condition on $(A, C)$, i.e., that there exist matrices $K$ and $F$ such that $A - BK$ and $A - FC$ are stable, rather than the controllability and observability conditions given in Assumption 5.1. Stabilizability and detectability are necessary and sufficient conditions for a well-defined learning and control problem in partially observable linear dynamical systems, and they provide the conditions required for our novel closed-loop system identification method to work, i.e., a stable $\bar{A}$. However, the controllability and observability assumptions are required for the subspace identification method SysId, since it requires rank-$n$ observability and controllability matrices to achieve a balanced realization. If the goal is to recover the Markov parameters of the system, or if one can design adaptive control methods using only the Markov parameter estimates, e.g., Section 5.6.5, then stabilizability and detectability of the underlying system are sufficient to obtain reliable estimates as in Theorem 5.3 and (5.23).

5.3.2 PE Condition in the Open-Loop Setting
Before studying the adaptive control problem in partially observable linear dynamical systems, we close this section by showing that the PE condition required for consistent estimation is satisfied under open-loop control, i.e., with i.i.d. Gaussian control inputs. To this end, we introduce the truncated open-loop noise evolution parameter $\mathcal{G}_{ol}$, which represents the effect of the noises in the system on the outputs. We define $\mathcal{G}_{ol}$ for $2H$ time steps back in time and show that the last $2H$ process and measurement noises provide sufficient persistent excitation for the covariates in the estimation problem. In the following, we show that there exists a positive $\sigma_o$ such that $\sigma_o < \sigma_{\min}(\mathcal{G}_{ol})$, i.e., $\mathcal{G}_{ol}$ is full row rank. Let $\bar{\phi}_t = P\phi_t$ for a permutation matrix $P$ that gives
$$\bar{\phi}_t = \begin{bmatrix} y_{t-1}^\top & u_{t-1}^\top & \cdots & y_{t-H}^\top & u_{t-H}^\top \end{bmatrix}^\top \in \mathbb{R}^{(m+p)H}.$$
We will consider the state-space representation of LQG control systems given in (5.1) for the analysis, but the same analysis applies to predictor-form/ARX systems (see [163] for the details). For the control input $u_t \sim \mathcal{N}(0, \sigma_u^2 I)$, let $o_t = [y_t^\top\ u_t^\top]^\top$. From the evolution of the system with the given input, we have
$$o_t = \mathcal{G}_o\begin{bmatrix} w_{t-1}^\top & z_t^\top & u_t^\top & \cdots & w_{t-H}^\top & z_{t-H+1}^\top & u_{t-H+1}^\top \end{bmatrix}^\top + r_{o,t},$$
where
$$\mathcal{G}_o := \begin{bmatrix} 0_{m\times n} & I_{m\times m} & 0_{m\times p} & C & 0_{m\times m} & CB & \cdots & CA^{H-2} & 0_{m\times m} & CA^{H-2}B\\ 0_{p\times n} & 0_{p\times m} & I_{p\times p} & 0_{p\times n} & 0_{p\times m} & 0_{p\times p} & \cdots & 0_{p\times n} & 0_{p\times m} & 0_{p\times p} \end{bmatrix},$$
and $r_{o,t}$ is the residual vector that represents the effect of $[w_{i-1}\ z_i\ u_i]$ for $0 \leq i < t-H$, which are independent of the terms above. Notice that $\mathcal{G}_o$ is full row rank even for $H = 1$, due to the first $(m+p)\times(n+m+p)$ block. Using this, we can represent $\bar{\phi}_t$ as follows:
$$\bar{\phi}_t = \underbrace{\begin{bmatrix} o_{t-1}\\ \vdots\\ o_{t-H} \end{bmatrix}}_{\in\,\mathbb{R}^{(m+p)H}} = \mathcal{G}_{ol}\underbrace{\begin{bmatrix} w_{t-2}\\ z_{t-1}\\ u_{t-1}\\ \vdots\\ w_{t-2H-1}\\ z_{t-2H}\\ u_{t-2H} \end{bmatrix}}_{\in\,\mathbb{R}^{2(n+m+p)H}} + \begin{bmatrix} r_{o,t-1}\\ \vdots\\ r_{o,t-H} \end{bmatrix},$$
where
$$\mathcal{G}_{ol} := \begin{bmatrix}
[\,\mathcal{G}_o\,] & 0_{(m+p)\times(n+m+p)} & 0_{(m+p)\times(n+m+p)} & 0_{(m+p)\times(n+m+p)} & \cdots\\
0_{(m+p)\times(n+m+p)} & [\,\mathcal{G}_o\,] & 0_{(m+p)\times(n+m+p)} & 0_{(m+p)\times(n+m+p)} & \cdots\\
 & & \ddots & & \\
0_{(m+p)\times(n+m+p)} & 0_{(m+p)\times(n+m+p)} & \cdots & [\,\mathcal{G}_o\,] & 0_{(m+p)\times(n+m+p)}\\
0_{(m+p)\times(n+m+p)} & 0_{(m+p)\times(n+m+p)} & 0_{(m+p)\times(n+m+p)} & \cdots & [\,\mathcal{G}_o\,]
\end{bmatrix}, \tag{5.29}$$
where each $[\,\mathcal{G}_o\,]$ block spans $H$ of the $(n+m+p)$-wide noise blocks and each block row is shifted to the right by one such block relative to the previous one.
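The block structure of $\mathcal{G}_o$ and $\mathcal{G}_{ol}$ is easy to verify numerically. The sketch below builds both matrices for a small random system, following the displays above (noise blocks ordered as $[w,\ z,\ u]$ and no direct feedthrough), and checks the full-row-rank claim by computing the smallest singular value; all dimensions and the generator code are illustrative only.

```python
import numpy as np

def build_G_o(A, B, C, H):
    """G_o maps the stacked [w, z, u] noise/input blocks of the last H steps to [y_t; u_t]."""
    n, p = B.shape
    m = C.shape[0]
    row_y, row_u = [], []
    for k in range(H):
        if k == 0:
            # most recent block: y_t picks up z_t directly, u_t picks up u_t directly
            row_y += [np.zeros((m, n)), np.eye(m), np.zeros((m, p))]
            row_u += [np.zeros((p, n)), np.zeros((p, m)), np.eye(p)]
        else:
            Ak = np.linalg.matrix_power(A, k - 1)
            row_y += [C @ Ak, np.zeros((m, m)), C @ Ak @ B]
            row_u += [np.zeros((p, n)), np.zeros((p, m)), np.zeros((p, p))]
    return np.vstack([np.hstack(row_y), np.hstack(row_u)])

def build_G_ol(G_o, n, m, p, H):
    """Block 'staircase' of G_o: row block i is G_o shifted right by i noise blocks."""
    blk = n + m + p
    G_ol = np.zeros(((m + p) * H, 2 * blk * H))
    for i in range(H):
        G_ol[i * (m + p):(i + 1) * (m + p), i * blk:i * blk + G_o.shape[1]] = G_o
    return G_ol

rng = np.random.default_rng(2)
n, m, p, H = 3, 2, 1, 4
A = 0.5 * rng.standard_normal((n, n))
B, C = rng.standard_normal((n, p)), rng.standard_normal((m, n))
G_o = build_G_o(A, B, C, H)
G_ol = build_G_ol(G_o, n, m, p, H)
print(np.linalg.svd(G_ol, compute_uv=False)[-1])   # smallest singular value, positive for full row rank
```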
Recall Assumption 5.1. The following lemma shows that the covariates $\phi_i$ are bounded for the given system under open-loop control.
Lemma 5.3. After applying the control inputs $u_t \sim \mathcal{N}(0, \sigma_u^2 I)$ for $T_w$ time steps, for all $1 \leq t \leq T_w$, with probability $1-\delta/2$,
$$\|x_t\| \leq X_w := \frac{(\sigma_w + \sigma_u\|B\|)\,\kappa_1(1-\gamma_1)}{\sqrt{1-(1-\gamma_1)^2}}\sqrt{2n\log(12nT_w/\delta)}, \tag{5.30}$$
$$\|z_t\| \leq Z := \sigma_z\sqrt{2m\log(12mT_w/\delta)}, \tag{5.31}$$
$$\|u_t\| \leq U_w := \sigma_u\sqrt{2p\log(12pT_w/\delta)}, \tag{5.32}$$
$$\|y_t\| \leq \|C\|X_w + Z. \tag{5.33}$$
Thus, we have $\max_{i\leq T_w}\|\phi_i\| \leq \Upsilon_w\sqrt{H}$, where $\Upsilon_w = \|C\|X_w + Z + U_w$.
Proof. For all $1 \leq t \leq T_w$, the covariance of $x_t$ satisfies $\Sigma(x_t) \preceq \Sigma_\infty$, where $\Sigma_\infty$ is the steady-state covariance matrix of $x_t$:
$$\Sigma_\infty = \sum_{i=0}^{\infty} \sigma_w^2 A^i (A^\top)^i + \sigma_u^2 A^i B B^\top (A^\top)^i.$$
From Assumption 5.1, we have $\|A^i\| \leq \kappa_1(1-\gamma_1)^i$ for all $i \geq 0$. Thus, $\|\Sigma_\infty\| \leq \frac{(\sigma_w^2 + \sigma_u^2\|B\|^2)\,\kappa_1^2(1-\gamma_1)^2}{1-(1-\gamma_1)^2}$. Notice that each $x_t$ is a component-wise $\sqrt{\|\Sigma_\infty\|}$-sub-Gaussian random vector. Using the standard sub-Gaussian vector norm upper bound with a union bound argument, we get the advertised result. □

The following lemma shows that i.i.d. Gaussian inputs uniformly excite the system and satisfy the PE condition after enough interactions.
Lemma 5.4 (Persistence of Excitation in the Open-Loop Control Setting). $\mathcal{G}_{ol}$ is full row rank, so that $\sigma_{\min}(\mathcal{G}_{ol}) > \sigma_o > 0$. For some $\delta \in (0,1)$, and $\Upsilon_w$ defined in Lemma 5.3, let $T_o = 32\,\Upsilon_w^4\,\sigma_o^{-4}\log^2\!\left(\frac{2H(m+p)}{\delta}\right)\max\{\sigma_w^{-4}, \sigma_z^{-4}, \sigma_u^{-4}\}$. After applying the control inputs $u_t \sim \mathcal{N}(0, \sigma_u^2 I)$ for $T_w \geq T_o$ time steps, with probability at least $1-\delta$ we have
$$\sigma_{\min}\!\left(\sum_{i=1}^{t}\phi_i\phi_i^\top\right) \geq \frac{t\,\sigma_o^2}{2}\min\{\sigma_w^2, \sigma_z^2, \sigma_u^2\}.$$
Proof. Let $\bar{0} = 0_{(m+p)\times(n+m+p)}$. Since each block row of $\mathcal{G}_{ol}$ is full row rank, applying a QR decomposition to each block row yields the factorization
$$\mathcal{G}_{ol} = \underbrace{\begin{bmatrix} Q_o & 0_{m+p} & \cdots & 0_{m+p}\\ 0_{m+p} & Q_o & \cdots & 0_{m+p}\\ & & \ddots & \\ 0_{m+p} & \cdots & & Q_o \end{bmatrix}}_{\in\,\mathbb{R}^{(m+p)H\times(m+p)H}}\ \underbrace{\begin{bmatrix} R_o & \bar{0} & \bar{0} & \cdots & \\ \bar{0} & R_o & \bar{0} & \cdots & \\ & & \ddots & & \\ \bar{0} & \cdots & R_o & \bar{0} & \\ \bar{0} & \bar{0} & \cdots & & R_o \end{bmatrix}}_{\in\,\mathbb{R}^{(m+p)H\times 2(n+m+p)H}},$$
where $R_o \in \mathbb{R}^{(m+p)\times H(n+m+p)}$ has the form
$$R_o = \begin{bmatrix} \times & \times & \times & \times & \times & \times & \cdots\\ 0 & \times & \times & \times & \times & \times & \cdots\\ & & \ddots & & & & \\ 0 & 0 & 0 & \times & \times & \times & \cdots \end{bmatrix},$$
in which the elements on the diagonal are positive. Notice that the first matrix, composed of the $Q_o$ blocks, is full rank. Moreover, all the rows of the second matrix are in row echelon form, so the second matrix is full row rank. Thus, we can deduce that $\mathcal{G}_{ol}$ is full row rank, i.e., $\sigma_{\min}(\mathcal{G}_{ol}) > \sigma_o > 0$. Since $\mathcal{G}_{ol}$ is full row rank, we have
$$\mathbb{E}\big[\bar{\phi}_t\bar{\phi}_t^\top\big] \succeq \mathcal{G}_{ol}\,\Sigma_{w,z,u}\,\mathcal{G}_{ol}^\top,$$
where $\Sigma_{w,z,u} = \operatorname{diag}(\sigma_w^2, \sigma_z^2, \sigma_u^2, \ldots, \sigma_w^2, \sigma_z^2, \sigma_u^2) \in \mathbb{R}^{2(n+m+p)H\times 2(n+m+p)H}$. This gives us $\sigma_{\min}\big(\mathbb{E}[\bar{\phi}_t\bar{\phi}_t^\top]\big) \geq \sigma_o^2\min\{\sigma_w^2, \sigma_z^2, \sigma_u^2\}$ for all $t$. From Lemma 5.3, we have $\max_{i\leq T_w}\|\phi_i\| \leq \Upsilon_w\sqrt{H}$ with probability at least $1-\delta/2$. Given that this holds, one can use the Matrix Azuma inequality in [267] to obtain the following, which holds with probability $1-\delta/2$:
$$\sigma_{\max}\!\left(\sum_{i=1}^{t}\phi_i\phi_i^\top - \mathbb{E}\big[\phi_i\phi_i^\top\big]\right) \leq 2\sqrt{2t}\,\Upsilon_w^2 H\sqrt{\log\frac{2H(m+p)}{\delta}}.$$
Using Weyl's inequality, during the warm-up period, with probability $1-\delta$ we have
$$\sigma_{\min}\!\left(\sum_{i=1}^{t}\phi_i\phi_i^\top\right) \geq t\,\sigma_o^2\min\{\sigma_w^2, \sigma_z^2, \sigma_u^2\} - 2\sqrt{2t}\,\Upsilon_w^2 H\sqrt{\log\frac{2H(m+p)}{\delta}}.$$
For all $t \geq T_o$, we have the stated lower bound. □
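A short simulation makes the conclusion of Lemma 5.4 tangible: under i.i.d. Gaussian inputs, the smallest eigenvalue of the empirical Gram matrix of the covariates grows roughly linearly in $t$. The sketch below uses a small, explicitly stable system and the same covariate construction sketched earlier; all dimensions and noise levels are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, p, H, T = 3, 2, 1, 4, 3000
A = np.diag([0.5, 0.3, -0.4])                      # a stable A, chosen for illustration
B, C = rng.standard_normal((n, p)), rng.standard_normal((m, n))
sigma_w, sigma_z, sigma_u = 0.5, 0.5, 1.0

x = np.zeros(n)
ys, us = np.zeros((T, m)), np.zeros((T, p))
for t in range(T):
    us[t] = sigma_u * rng.standard_normal(p)       # open-loop i.i.d. Gaussian input
    ys[t] = C @ x + sigma_z * rng.standard_normal(m)
    x = A @ x + B @ us[t] + sigma_w * rng.standard_normal(n)

d = (m + p) * H
V = np.zeros((d, d))
lam_min = []
for t in range(H, T):
    phi = np.concatenate([np.r_[ys[t - k], us[t - k]] for k in range(1, H + 1)])
    V += np.outer(phi, phi)
    lam_min.append(np.linalg.eigvalsh(V)[0])

# the smallest eigenvalue per time step settles to a positive level, i.e., linear growth of lam_min
print(lam_min[-1] / (T - H), lam_min[len(lam_min) // 2] / ((T - H) // 2))
```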
This result verifies that the PE condition holds in the open-loop control setting, and hence the estimation error guarantees of Theorem 5.2 hold for open-loop data collection. Therefore, even if the closed-loop PE condition fails, so that the guarantees of Theorem 5.2 cannot be certified under closed-loop control, one can still run the novel system identification method with i.i.d. control inputs and obtain state-of-the-art guarantees. If, on the other hand, PE does hold in the closed-loop setting, we can further guarantee consistent improvement of the estimates, which is not possible with prior methods. This novelty will be crucial in the adaptive control tasks discussed next.