
5.3 A Novel Closed-Loop System Identification Method

5.3.1 Proof of Theorem 5.2

In this section, we first present the proof of Theorem 5.2 under the PE assumption with precise expressions. In particular, we show a self-normalized error bound on the solution of (5.21) in Theorem 5.3. Then, assuming the PE condition, we convert the self-normalized bound into a Frobenius norm bound to be used for the parameter estimation error bounds in Theorem 5.4, which concludes the proof of Theorem 5.2.

First, consider the effect of the truncation bias term $C\bar{A}^H x_{t-H}$ in (5.18). From Assumption 5.1, we have that $\bar{A}$ is $(\kappa_3, \gamma_3)$-stable. Thus, $C\bar{A}^H x_{t-H}$ scales with the order of $(1-\gamma_3)^H$ for bounded $x$. In order to get consistent estimation, for some problem-dependent constant $c_H$, we set
$$H \geq \frac{\log\big(c_H T \sqrt{m}/\sqrt{\lambda}\big)}{\log\big(1/(1-\gamma_3)\big)},$$
resulting in a negligible bias term of order $1/T$. Note that $c_H$ is determined by the underlying system and the control policy since it is related to the scaling of the latent state.
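As a quick illustration of how mildly this requirement scales, the sketch below evaluates the lower bound on $H$ for placeholder values of $c_H$, $\gamma_3$, $T$, $m$, and $\lambda$ (none of these numbers come from the text); the required horizon grows only logarithmically in $T$.

```python
import numpy as np

# Illustrative evaluation of the truncation-horizon requirement
#   H >= log(c_H * T * sqrt(m) / sqrt(lambda)) / log(1 / (1 - gamma_3)).
# All constants below are placeholders, not values from the text.
c_H, gamma_3 = 10.0, 0.25      # problem-dependent constant, decay rate of A_bar
T, m, lam = 10_000, 5, 1.0     # number of samples, output dimension, regularizer

H_min = np.log(c_H * T * np.sqrt(m) / np.sqrt(lam)) / np.log(1.0 / (1.0 - gamma_3))
print(int(np.ceil(H_min)))     # about 43 here; grows only logarithmically with T
```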

Using this, we first obtain a self-normalized finite sample estimation error bound for (5.21):

Theorem 5.3 (Self-normalized Estimation Error). Let $\widehat{\mathcal{G}}_{yu}$ be the solution to (5.21) at time $\tau$. For $H \geq \frac{\log(c_H T \sqrt{m}/\sqrt{\lambda})}{\log(1/(1-\gamma_3))}$, define $V_\tau = \lambda I + \sum_{i=H}^{\tau}\phi_i\phi_i^\top$. Let $\|\mathcal{G}_{yu}\|_F \leq S$. For $\delta \in (0,1)$, with probability at least $1-\delta$, for all $t \leq \tau$, $\mathcal{G}_{yu}$ lies in the set $\mathcal{C}_{\mathcal{G}_{yu},t}$, where
$$\mathcal{C}_{\mathcal{G}_{yu},t} = \Big\{\mathcal{G}_{yu}' : \operatorname{Tr}\big((\widehat{\mathcal{G}}_{yu}-\mathcal{G}_{yu}')V_t(\widehat{\mathcal{G}}_{yu}-\mathcal{G}_{yu}')^\top\big) \leq \beta_\tau^2\Big\},$$
for
$$\beta_\tau = \sqrt{m\Sigma_e\log\frac{\det(V_\tau)^{1/2}}{\delta\det(\lambda I)^{1/2}}} + S\sqrt{\lambda} + \frac{\tau\sqrt{H}}{T},$$
where $\Sigma_e := \|C\Sigma C^\top + \sigma_z^2 I\|_F$.

The proof is given in Appendix C.1. Note that the above result holds under sub-Gaussian $e_t$ and is satisfied in both LQG control systems and ARX systems. Using this result, we have
$$\sigma_{\min}(V_\tau)\,\|\widehat{\mathcal{G}}_{yu}-\mathcal{G}_{yu}\|_F^2 \leq \operatorname{Tr}\big((\widehat{\mathcal{G}}_{yu}-\mathcal{G}_{yu})V_\tau(\widehat{\mathcal{G}}_{yu}-\mathcal{G}_{yu})^\top\big) \leq \beta_\tau^2.$$

Assume that $\phi_i$ is bounded (which will be rigorously shown for different adaptive control algorithms, i.e., Sections 5.4–5.6) such that $\max_{i\leq\tau}\|\phi_i\| \leq \Upsilon\sqrt{H}$. For persistently exciting inputs, i.e., $\sigma_{\min}(V_\tau) \geq \sigma_\star^2\tau$ for some $\sigma_\star > 0$, we get, with probability at least $1-\delta$,
$$\|\widehat{\mathcal{G}}_{yu}-\mathcal{G}_{yu}\|_F \leq \frac{\sqrt{m\Sigma_e\Big(\log\frac{1}{\delta} + \frac{H(m+p)}{2}\log\frac{\lambda(m+p)+\tau\Upsilon^2}{\lambda(m+p)}\Big)} + S\sqrt{\lambda} + \sqrt{H}}{\sigma_\star\sqrt{\tau}} \tag{5.23}$$
after $\tau$ time steps. Note that $\widehat{\mathcal{G}}_{yu}-\mathcal{G}_{yu} = [\widehat{\mathcal{G}}_{y\to y}, \widehat{\mathcal{G}}_{u\to y}] - [\mathcal{G}_{y\to y}, \mathcal{G}_{u\to y}]$; thus, (5.23) translates to the same error bounds for $\|\widehat{\mathcal{G}}_{y\to y}-\mathcal{G}_{y\to y}\|$ and $\|\widehat{\mathcal{G}}_{u\to y}-\mathcal{G}_{u\to y}\|$, proving the first part of Theorem 5.2. This result shows that the novel least squares problem provides consistent estimates and that the estimation error is $\tilde{\mathcal{O}}(1/\sqrt{T})$ after $T$ samples.
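To make the structure of this estimator concrete, here is a minimal sketch assuming that (5.21) is a ridge regression of the outputs on covariates $\phi_t$ stacking the last $H$ outputs and inputs; the exact construction of $\phi_t$ and the regression targets in (5.18)–(5.21) are not reproduced here, and the data below are synthetic placeholders. The point is only the least-squares solve and the regularized Gram matrix $V_\tau$, whose smallest eigenvalue drives the rate in (5.23).

```python
import numpy as np

# Minimal sketch of a regularized least-squares step of the form analyzed above,
# assuming (5.21) is  min_G sum_t ||y_t - G phi_t||^2 + lam * ||G||_F^2,  with phi_t
# stacking the last H outputs and inputs. Data below are synthetic placeholders.
rng = np.random.default_rng(0)
m, p, H, T, lam = 2, 1, 5, 2000, 1.0

y = rng.standard_normal((T, m))       # placeholder outputs (no real system here)
u = rng.standard_normal((T, p))       # placeholder i.i.d. Gaussian inputs

def phi(t):
    """Covariate phi_t in R^{(m+p)H}: the last H outputs and inputs."""
    return np.concatenate([np.concatenate([y[t - k], u[t - k]]) for k in range(1, H + 1)])

Phi = np.stack([phi(t) for t in range(H, T)])            # (T - H) x (m+p)H
Y = y[H:]
V = lam * np.eye(Phi.shape[1]) + Phi.T @ Phi             # V_tau = lam*I + sum phi phi^T
G_hat = np.linalg.solve(V, Phi.T @ Y).T                  # m x (m+p)H estimate of G_yu

# Under persistence of excitation, sigma_min(V) grows linearly in the sample count,
# which is exactly what converts the self-normalized bound into the 1/sqrt(T) rate.
print(np.linalg.eigvalsh(V).min() / (T - H))
```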

For the second part of Theorem 5.2, we show that SysId provides a balanced realization of $\Theta$ such that we have confidence sets around the estimated model parameters in which a similarity transformation of $\Theta$ lives with high probability, similar to Theorem 5.1. For this, define $T_{\mathcal{G}_{yu}}$ as the number of samples required such that $\|\widehat{\mathcal{G}}_{yu}-\mathcal{G}_{yu}\| \leq 1$ in (5.23). Let
$$T_N = T_{\mathcal{G}_{yu}}\frac{8H}{\sigma_n^2(\mathcal{H})}, \qquad T_B = T_{\mathcal{G}_{yu}}\frac{20nH}{\sigma_n(\mathcal{H})}. \tag{5.24}$$

We have the following result on the model parameter estimates.

Theorem 5.4 (Model Parameter Estimation Error). Let $\mathcal{H}$ be the concatenation of two Hankel matrices obtained from $\mathcal{G}_{yu}$. Let $\bar{A}, \bar{B}, \bar{C}, \bar{F}, \bar{L}$ be the system parameters that SysId provides for $\mathcal{G}_{yu}$. At time step $t$, let $\hat{A}_t, \hat{B}_t, \hat{C}_t, \hat{F}_t, \hat{L}_t$ denote the system parameters obtained by SysId using $\widehat{\mathcal{G}}_{yu}$. For all $t \geq \max\{T_{\mathcal{G}_{yu}}, T_N, T_B\}$ and $H \geq \max\Big\{2n+1, \frac{\log(c_H T\sqrt{m}/\sqrt{\lambda})}{\log(1/(1-\gamma_3))}\Big\}$, there exists a unitary matrix $\mathbf{T}\in\mathbb{R}^{n\times n}$ such that $\bar{\Theta} = (\bar{A}, \bar{B}, \bar{C}, \bar{F}, \bar{L}) \in (\mathcal{C}_A(t)\times\mathcal{C}_B(t)\times\mathcal{C}_C(t)\times\mathcal{C}_F(t)\times\mathcal{C}_L(t))$, where

C๐ด(๐‘ก)=

๐ดโ€ฒโˆˆR๐‘›ร—๐‘›:โˆฅ๐ดห†๐‘กโˆ’TโŠค๐ดโ€ฒTโˆฅ โ‰ค๐›ฝ๐ด

๐‘ก , C๐ต(๐‘ก)=

๐ตโ€ฒโˆˆR๐‘›ร—๐‘:โˆฅ๐ตห†๐‘กโˆ’TโŠค๐ตโ€ฒโˆฅ โ‰ค ๐›ฝ๐ต

๐‘ก ,

C๐ถ(๐‘ก)=

๐ถโ€ฒโˆˆR๐‘šร—๐‘›:โˆฅ๐ถห†๐‘กโˆ’๐ถโ€ฒTโˆฅ โ‰ค ๐›ฝ๐ถ

๐‘ก , C๐น(๐‘ก)=

๐นโ€ฒโˆˆR๐‘›ร—๐‘š:โˆฅ๐นห†๐‘กโˆ’TโŠค๐นโ€ฒโˆฅ โ‰ค ๐›ฝ๐น

๐‘ก ,

C๐ฟ(๐‘ก)=

๐ฟโ€ฒโˆˆR๐‘›ร—๐‘š:โˆฅ๐ฟห†๐‘กโˆ’TโŠค๐ฟโ€ฒโˆฅ โ‰ค ๐›ฝ๐ฟ(๐‘ก) , (5.25)

for ๐›ฝ๐ด

๐‘ก =๐‘1

โˆš

๐‘›๐ป( โˆฅH โˆฅ +๐œŽ๐‘›(H )) ๐œŽ๐‘›2(H )

!

โˆฅGbyuโˆ’Gyuโˆฅ, ๐›ฝ๐ต

๐‘ก =๐›ฝ๐ถ

๐‘ก =๐›ฝ๐น

๐‘ก =

โˆš๏ธ„

20๐‘›๐ป

๐œŽ๐‘›(H )โˆฅGbyuโˆ’Gyuโˆฅ, ๐›ฝ๐ฟ

๐‘ก = ๐‘2โˆฅH โˆฅ

โˆš๏ธ

๐œŽ๐‘›(H )

๐›ฝ๐ด+๐‘3

โˆš

๐‘›๐ป( โˆฅH โˆฅ +๐œŽ๐‘›(H )) ๐œŽ3/2

๐‘› (H )

โˆฅGbyuโˆ’Gyuโˆฅ, for some problem-dependent constants๐‘1, ๐‘2and๐‘3.

Before presenting the proof, we state the following lemmas, which are adapted from Oymak and Ozay [213] with slight modifications to fit our setting. In particular, they were originally used for the Ho-Kalman algorithm, and SysId is a variant of this algorithm; a plain Ho-Kalman sketch is given below for illustration. These results will be useful in proving error bounds on the system parameters.
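The sketch below is a single-Hankel Ho-Kalman realization in the spirit of [213]; it is not the thesis's SysId routine (which, per Theorem 5.4, operates on the concatenation of two Hankel matrices obtained from $\mathcal{G}_{yu}$), but it shows the Hankel construction and SVD-based factorization steps that Lemmas 5.1 and 5.2 analyze. The placeholder system at the bottom is arbitrary and is only used to check the recovery numerically.

```python
import numpy as np

# Plain Ho-Kalman sketch (SysId in the text is a variant that concatenates two
# Hankel matrices). Input: Markov parameters M[k] = C A^k B, k = 0, ..., d1+d2-1.
def ho_kalman(M, n, d1, d2):
    m, p = M[0].shape
    Hk = np.block([[M[i + j] for j in range(d2 + 1)] for i in range(d1)])  # Hankel
    Hminus = Hk[:, : d2 * p]     # drop the last block column
    Hplus = Hk[:, p:]            # drop the first block column
    U, s, Vt = np.linalg.svd(Hminus, full_matrices=False)
    O = U[:, :n] * np.sqrt(s[:n])                 # observability-like factor
    Q = np.sqrt(s[:n])[:, None] * Vt[:n]          # controllability-like factor
    A_hat = np.linalg.pinv(O) @ Hplus @ np.linalg.pinv(Q)
    C_hat, B_hat = O[:m, :], Q[:, :p]             # first block row / first block column
    return A_hat, B_hat, C_hat

# Placeholder test: eigenvalues are invariant to the similarity transform, so the
# spectra of A and A_hat should agree up to numerical error.
rng = np.random.default_rng(1)
n, m, p, d1, d2 = 3, 2, 2, 6, 6
A = np.diag([0.9, 0.5, -0.3])
B, C = rng.standard_normal((n, p)), rng.standard_normal((m, n))
M = [C @ np.linalg.matrix_power(A, k) @ B for k in range(d1 + d2)]
A_hat, _, _ = ho_kalman(M, n, d1, d2)
print(np.sort(np.linalg.eigvals(A).real), np.sort(np.linalg.eigvals(A_hat).real))
```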

Lemma 5.1. $\mathcal{H}, \hat{\mathcal{H}}_t$ and $\mathcal{N}, \hat{\mathcal{N}}_t$ satisfy the following perturbation bounds:
$$\max\Big\{\big\|\mathcal{H}^+-\hat{\mathcal{H}}_t^+\big\|,\ \big\|\mathcal{H}^--\hat{\mathcal{H}}_t^-\big\|\Big\} \leq \|\mathcal{H}-\hat{\mathcal{H}}_t\| \leq \sqrt{\min\{d_1, d_2+1\}}\,\|\widehat{\mathcal{G}}_{yu}-\mathcal{G}_{yu}\|,$$
$$\|\mathcal{N}-\hat{\mathcal{N}}_t\| \leq 2\big\|\mathcal{H}^--\hat{\mathcal{H}}_t^-\big\| \leq 2\sqrt{\min\{d_1, d_2\}}\,\|\widehat{\mathcal{G}}_{yu}-\mathcal{G}_{yu}\|.$$

Lemma 5.2. Suppose $\sigma_{\min}(\mathcal{N}) \geq 2\|\mathcal{N}-\hat{\mathcal{N}}\|$, where $\sigma_{\min}(\mathcal{N})$ is the smallest nonzero singular value (i.e., the $n$th largest singular value) of $\mathcal{N}$. Let the rank-$n$ matrices $\mathcal{N}, \hat{\mathcal{N}}$ have singular value decompositions $\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top$ and $\hat{\mathbf{U}}\hat{\boldsymbol{\Sigma}}\hat{\mathbf{V}}^\top$. There exists an $n\times n$ unitary matrix $\mathbf{T}$ so that
$$\big\|\mathbf{U}\boldsymbol{\Sigma}^{1/2}-\hat{\mathbf{U}}\hat{\boldsymbol{\Sigma}}^{1/2}\mathbf{T}\big\|_F^2 + \big\|\mathbf{V}\boldsymbol{\Sigma}^{1/2}-\hat{\mathbf{V}}\hat{\boldsymbol{\Sigma}}^{1/2}\mathbf{T}\big\|_F^2 \leq \frac{5n\|\mathcal{N}-\hat{\mathcal{N}}\|^2}{\sigma_n(\mathcal{N})-\|\mathcal{N}-\hat{\mathcal{N}}\|}.$$

Proof. For brevity, we denote $\mathbf{O} = \mathbf{O}(\bar{A}, C, d_1)$, $\mathbf{C}_F = \mathbf{C}(\bar{A}, F, d_2+1)$, $\mathbf{C}_B = \mathbf{C}(\bar{A}, B, d_2+1)$, $\hat{\mathbf{O}}_t = \hat{\mathbf{O}}_t(\bar{A}, C, d_1)$, $\hat{\mathbf{C}}_{F,t} = \hat{\mathbf{C}}_t(\bar{A}, F, d_2+1)$, and $\hat{\mathbf{C}}_{B,t} = \hat{\mathbf{C}}_t(\bar{A}, B, d_2+1)$. In the definition of $T_N$, we use $\sigma_n(\mathcal{H})$ due to the fact that singular values of submatrices obtained by column partitioning are interlaced, i.e., $\sigma_n(\mathcal{N}) = \sigma_n(\mathcal{H}^-) \geq \sigma_n(\mathcal{H})$. Directly applying Lemma 5.2 with the condition that for a given $t \geq T_N$ we have $\sigma_{\min}(\mathcal{N}) \geq 2\|\mathcal{N}-\hat{\mathcal{N}}\|$, we can guarantee that there exists a unitary transform $\mathbf{T}$ such that
$$\big\|\hat{\mathbf{O}}_t-\mathbf{O}\mathbf{T}\big\|_F^2 + \big\|[\hat{\mathbf{C}}_{F,t}\ \hat{\mathbf{C}}_{B,t}]-\mathbf{T}^\top[\mathbf{C}_F\ \mathbf{C}_B]\big\|_F^2 \leq \frac{10n\|\mathcal{N}-\hat{\mathcal{N}}_t\|^2}{\sigma_n(\mathcal{N})}. \tag{5.26}$$
Since $\hat{C}_t-\bar{C}\mathbf{T}$ is a submatrix of $\hat{\mathbf{O}}_t-\mathbf{O}\mathbf{T}$, $\hat{B}_t-\mathbf{T}^\top\bar{B}$ is a submatrix of $\hat{\mathbf{C}}_{B,t}-\mathbf{T}^\top\mathbf{C}_B$, and $\hat{F}_t-\mathbf{T}^\top\bar{F}$ is a submatrix of $\hat{\mathbf{C}}_{F,t}-\mathbf{T}^\top\mathbf{C}_F$, we get the same bounds for them as stated in (5.26). Using Lemma 5.1, with the choice of $d_1, d_2 \geq \frac{H}{2}$, we have
$$\|\mathcal{N}-\hat{\mathcal{N}}_t\| \leq \sqrt{2H}\,\|\widehat{\mathcal{G}}_{yu}-\mathcal{G}_{yu}\|.$$

This provides the advertised bounds in the theorem:
$$\|\hat{B}_t-\mathbf{T}^\top\bar{B}\|,\ \|\hat{C}_t-\bar{C}\mathbf{T}\|,\ \|\hat{F}_t-\mathbf{T}^\top\bar{F}\| \leq \frac{\sqrt{20nH}\,\|\widehat{\mathcal{G}}_{yu}-\mathcal{G}_{yu}\|}{\sqrt{\sigma_n(\mathcal{N})}}.$$

Notice that for $t \geq T_B$, all the terms above are bounded by 1. In order to determine the closeness of $\hat{A}_t$ and $\bar{A}$, we first consider the closeness of $\hat{\bar{A}}_t$ and $\mathbf{T}^\top\bar{\bar{A}}\mathbf{T}$, where $\bar{\bar{A}}$ is the output obtained by SysId for $\bar{A}$ when the input is $\mathcal{G}_{yu}$. Let $X = \mathbf{O}\mathbf{T}$ and $Y = \mathbf{T}^\top[\mathbf{C}_F\ \mathbf{C}_B]$. Thus, we have

โˆฅ๐ดห†ยฏ๐‘กโˆ’TโŠค๐ดยฏยฏTโˆฅ๐น = โˆฅOห†โ€ tHห†๐‘ก+[Cห†Ft Cห†Bt]โ€ โˆ’ ๐‘‹โ€ H+๐‘Œโ€ โˆฅ๐น

โ‰ค

Oห†โ€ t โˆ’๐‘‹โ€ 

Hห†๐‘ก+[Cห†Ft Cห†Bt]โ€  ๐น

+ ๐‘‹โ€ 

Hห†๐‘ก+โˆ’ H+

[Cห†Ft Cห†Bt]โ€  ๐น

+

๐‘‹โ€ H+

[Cห†Ft Cห†Bt]โ€ โˆ’๐‘Œโ€  ๐น

.

For the first term, we have the following perturbation bound [197, 291]:
$$\|\hat{\mathbf{O}}_t^\dagger-X^\dagger\|_F \leq \|\hat{\mathbf{O}}_t-X\|_F\max\{\|X^\dagger\|^2, \|\hat{\mathbf{O}}_t^\dagger\|^2\} \leq \|\mathcal{N}-\hat{\mathcal{N}}_t\|\sqrt{\frac{10n}{\sigma_n(\mathcal{N})}}\max\{\|X^\dagger\|^2, \|\hat{\mathbf{O}}_t^\dagger\|^2\}.$$

Since we already have $\sigma_n(\mathcal{N}) \geq 2\|\mathcal{N}-\hat{\mathcal{N}}\|$, we get $\|\hat{\mathcal{N}}\| \leq 2\|\mathcal{N}\|$ and $2\sigma_n(\hat{\mathcal{N}}) \geq \sigma_n(\mathcal{N})$. Thus,
$$\max\{\|X^\dagger\|^2, \|\hat{\mathbf{O}}_t^\dagger\|^2\} = \max\Big\{\frac{1}{\sigma_n(\mathcal{N})}, \frac{1}{\sigma_n(\hat{\mathcal{N}})}\Big\} \leq \frac{2}{\sigma_n(\mathcal{N})}. \tag{5.27}$$
Combining these and following the same steps for $\|[\hat{\mathbf{C}}_{F,t}\ \hat{\mathbf{C}}_{B,t}]^\dagger-Y^\dagger\|_F$, we get
$$\big\|\hat{\mathbf{O}}_t^\dagger-X^\dagger\big\|_F,\ \big\|[\hat{\mathbf{C}}_{F,t}\ \hat{\mathbf{C}}_{B,t}]^\dagger-Y^\dagger\big\|_F \leq \|\mathcal{N}-\hat{\mathcal{N}}_t\|\sqrt{\frac{40n}{\sigma_n^3(\mathcal{N})}}. \tag{5.28}$$

The following individual bounds are obtained by using (5.27), (5.28), and the triangle inequality:
$$\big\|(\hat{\mathbf{O}}_t^\dagger-X^\dagger)\hat{\mathcal{H}}_t^+[\hat{\mathbf{C}}_{F,t}\ \hat{\mathbf{C}}_{B,t}]^\dagger\big\|_F \leq \|\hat{\mathbf{O}}_t^\dagger-X^\dagger\|_F\,\|\hat{\mathcal{H}}_t^+\|\,\|[\hat{\mathbf{C}}_{F,t}\ \hat{\mathbf{C}}_{B,t}]^\dagger\| \leq \frac{4\sqrt{5n}\,\|\mathcal{N}-\hat{\mathcal{N}}_t\|}{\sigma_n^2(\mathcal{N})}\big(\|\mathcal{H}^+\|+\|\hat{\mathcal{H}}_t^+-\mathcal{H}^+\|\big),$$
$$\big\|X^\dagger(\hat{\mathcal{H}}_t^+-\mathcal{H}^+)[\hat{\mathbf{C}}_{F,t}\ \hat{\mathbf{C}}_{B,t}]^\dagger\big\|_F \leq \frac{2\sqrt{n}\,\|\hat{\mathcal{H}}_t^+-\mathcal{H}^+\|}{\sigma_n(\mathcal{N})},$$
$$\big\|X^\dagger\mathcal{H}^+\big([\hat{\mathbf{C}}_{F,t}\ \hat{\mathbf{C}}_{B,t}]^\dagger-Y^\dagger\big)\big\|_F \leq \|X^\dagger\|\,\|\mathcal{H}^+\|\,\|[\hat{\mathbf{C}}_{F,t}\ \hat{\mathbf{C}}_{B,t}]^\dagger-Y^\dagger\| \leq \frac{2\sqrt{10n}\,\|\mathcal{N}-\hat{\mathcal{N}}_t\|}{\sigma_n^2(\mathcal{N})}\|\mathcal{H}^+\|.$$

Combining these, we get
$$\|\hat{\bar{A}}_t-\mathbf{T}^\top\bar{\bar{A}}\mathbf{T}\|_F \leq \frac{31\sqrt{n}\,\|\mathcal{H}^+\|\,\|\mathcal{N}-\hat{\mathcal{N}}_t\|}{2\sigma_n^2(\mathcal{N})} + \|\hat{\mathcal{H}}_t^+-\mathcal{H}^+\|\left(\frac{4\sqrt{5n}\,\|\mathcal{N}-\hat{\mathcal{N}}_t\|}{\sigma_n^2(\mathcal{N})} + \frac{2\sqrt{n}}{\sigma_n(\mathcal{N})}\right) \leq \frac{31\sqrt{n}\,\|\mathcal{H}^+\|}{2\sigma_n^2(\mathcal{N})}\|\mathcal{N}-\hat{\mathcal{N}}_t\| + \frac{13\sqrt{n}}{2\sigma_n(\mathcal{N})}\|\hat{\mathcal{H}}_t^+-\mathcal{H}^+\|.$$

These results give the estimation error guarantees for ARX systems. For LQG control systems, we additionally need to recover $A$ and $L$. Now consider $\hat{A}_t = \hat{\bar{A}}_t + \hat{F}_t\hat{C}_t$. Using Lemma 5.1,
$$\|\hat{A}_t-\mathbf{T}^\top\bar{A}\mathbf{T}\|_F = \|\hat{\bar{A}}_t+\hat{F}_t\hat{C}_t-\mathbf{T}^\top\bar{\bar{A}}\mathbf{T}-\mathbf{T}^\top\bar{F}\bar{C}\mathbf{T}\|_F \leq \|\hat{\bar{A}}_t-\mathbf{T}^\top\bar{\bar{A}}\mathbf{T}\|_F + \|(\hat{F}_t-\mathbf{T}^\top\bar{F})\hat{C}_t\|_F + \|\mathbf{T}^\top\bar{F}(\hat{C}_t-\bar{C}\mathbf{T})\|_F$$
$$\leq \|\hat{\bar{A}}_t-\mathbf{T}^\top\bar{\bar{A}}\mathbf{T}\|_F + \|\hat{F}_t-\mathbf{T}^\top\bar{F}\|_F\|\hat{C}_t-\bar{C}\mathbf{T}\|_F + \|\hat{F}_t-\mathbf{T}^\top\bar{F}\|_F\|\bar{C}\| + \|\bar{F}\|\,\|\hat{C}_t-\bar{C}\mathbf{T}\|_F$$
$$\leq \frac{31\sqrt{2nH}\,\|\mathcal{H}\|}{2\sigma_n^2(\mathcal{N})}\|\widehat{\mathcal{G}}_{yu}-\mathcal{G}_{yu}\| + \frac{13\sqrt{nH}}{2\sqrt{2}\,\sigma_n(\mathcal{N})}\|\widehat{\mathcal{G}}_{yu}-\mathcal{G}_{yu}\| + \frac{20nH\,\|\widehat{\mathcal{G}}_{yu}-\mathcal{G}_{yu}\|^2}{\sigma_n(\mathcal{N})} + \big(\|\bar{F}\|+\|\bar{C}\|\big)\sqrt{\frac{20nH}{\sigma_n(\mathcal{N})}}\,\|\widehat{\mathcal{G}}_{yu}-\mathcal{G}_{yu}\|.$$

Using the result above, to obtain an estimation error bound for $\hat{L}_t$, we define $T_A$ as the number of samples required to have $\|\hat{A}_t-\mathbf{T}^\top\bar{A}\mathbf{T}\| \leq \sigma_n(\bar{A})/2$ for all $t \geq T_A$, i.e.,
$$T_A = T_{\mathcal{G}_{yu}}\left(\frac{\frac{62\sqrt{2nH}\,\|\mathcal{H}\|}{2\sigma_n^2(\mathcal{N})} + \frac{26\sqrt{nH}}{2\sqrt{2}\,\sigma_n(\mathcal{N})} + \big(\|\bar{F}\|+\|\bar{C}\|\big)\sqrt{\frac{80nH}{\sigma_n(\mathcal{N})}} + \sqrt{\frac{40nH\,\sigma_n(\bar{A})}{\sigma_n(\mathcal{N})}}}{\sigma_n(\bar{A})}\right)^{2}.$$
From Weyl's inequality, we have $\sigma_n(\hat{A}_t) \geq \sigma_n(\bar{A})/2$. Recalling that $X = \mathbf{O}(\bar{A}, C, d_1)\mathbf{T}$, under Assumption 5.1, we consider $\hat{L}_t$:

โˆฅ๐ฟห†๐‘กโˆ’TโŠค๐ฟยฏโˆฅ๐น

= โˆฅ๐ดห†โ€ 

๐‘กOห†โ€ tHห†๐‘กโˆ’โˆ’TโŠค๐ดยฏโ€ Oโ€ Hโˆ’โˆฅ๐น

โ‰ค โˆฅ (๐ดห†โ€ 

๐‘กโˆ’TโŠค๐ดยฏโ€ T)Oห†โ€ tHห†๐‘กโˆ’โˆฅ๐น+ โˆฅTโŠค๐ดยฏโ€ T(Oห†โ€ tโˆ’๐‘‹โ€ )Hห†๐‘กโˆ’โˆฅ๐น+ โˆฅTโŠค๐ดยฏโ€ T๐‘‹โ€ (Hห†๐‘กโˆ’โˆ’Hโˆ’) โˆฅ๐น

โ‰ค โˆฅ๐ดห†โ€ 

๐‘กโˆ’TโŠค๐ดยฏโ€ Tโˆฅ๐นโˆฅOห†โ€ tโˆฅ โˆฅHห†๐‘กโˆ’โˆฅ + โˆฅOห†โ€ tโˆ’๐‘‹โ€ โˆฅ๐นโˆฅ๐ดยฏโ€ โˆฅ โˆฅHห†๐‘กโˆ’โˆฅ +โˆš

๐‘›โˆฅHห†๐‘กโˆ’โˆ’Hโˆ’โˆฅ โˆฅ๐ดยฏโ€ โˆฅ โˆฅ๐‘‹โ€ โˆฅ

โ‰ค โˆฅ๐ดห†โ€ 

๐‘กโˆ’TโŠค๐ดยฏโ€ Tโˆฅ๐น

โˆš๏ธ„

2 ๐œŽ๐‘›(N ) +

N โˆ’Nห†๐‘ก

โˆš๏ธ„

40๐‘›

๐œŽ๐‘›3(N )โˆฅ๐ดยฏโ€ โˆฅ

!

โˆฅHโˆ’โˆฅ + โˆฅHห†๐‘กโˆ’โˆ’Hโˆ’โˆฅ

+โˆš

๐‘›โˆฅ๐ดยฏโ€ โˆฅ 1

โˆš๏ธ

๐œŽ๐‘›(N )

โˆฅHห†๐‘กโˆ’โˆ’ Hโˆ’โˆฅ.

Again using the perturbation bounds of the Moore–Penrose inverse under the Frobenius norm [197], we have $\|\hat{A}_t^\dagger-\mathbf{T}^\top\bar{A}^\dagger\mathbf{T}\|_F \leq \frac{2}{\sigma_n^2(\bar{A})}\|\hat{A}_t-\mathbf{T}^\top\bar{A}\mathbf{T}\|$. Notice that the similarity transformation that transfers $A$ to $\bar{A}$ is bounded, since $S = \big[C^\top\ (C\bar{A})^\top \cdots (C\bar{A}^{d_1-1})^\top\big]^{\top\dagger}\mathbf{O}(\bar{A}, C, d_1)$. Combining all of these and using Lemma 5.1, we obtain the confidence set for $\hat{L}_t$ given in Theorem 5.4. □

Combining Theorem 5.4 with the guarantee that $\|\widehat{\mathcal{G}}_{yu}-\mathcal{G}_{yu}\| = \tilde{\mathcal{O}}(1/\sqrt{T})$ given in (5.23) finishes the proof of the second part of Theorem 5.2. Overall, we showed that our novel system identification method allows closed-loop and open-loop estimation in both LQG and ARX systems. This method will be the key piece in our adaptive control design.

Remark 5.1. Note that to recover $\mathcal{G}_{yu}$ using the closed-loop system identification method presented in this section, we only require a stabilizability condition on $(A, B)$ and a detectability condition on $(A, C)$, i.e., that there exist matrices $K$ and $F$ such that $A - BK$ and $A - FC$ are stable, rather than the controllability and observability conditions provided in Assumption 5.1. Stabilizability and detectability are necessary and sufficient conditions to have a well-defined learning and control problem in partially observable linear dynamical systems, and they provide the conditions required for our novel closed-loop system identification method to work, i.e., a stable $\bar{A}$. However, controllability and observability assumptions are required for the subspace identification method SysId, since it requires rank-$n$ observability and controllability matrices to achieve a balanced realization. If the goal is to recover the Markov parameters of the system, or if one can design adaptive control methods using only the Markov parameter estimates, e.g., Section 5.6.5, stabilizability and detectability of the underlying system are sufficient to obtain reliable estimates as in Theorem 5.3 and (5.23).
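The conditions in Remark 5.1 are easy to check numerically with the standard PBH rank test; the sketch below is a generic textbook check, not part of the thesis's method, and the tolerance value is an arbitrary placeholder.

```python
import numpy as np

# PBH-test sketch for the conditions in Remark 5.1 (discrete time):
# (A, B) is stabilizable iff rank([lambda*I - A, B]) = n for every eigenvalue
# lambda of A with |lambda| >= 1; (A, C) is detectable iff (A^T, C^T) is stabilizable.
def is_stabilizable(A, B, tol=1e-9):
    n = A.shape[0]
    for lam in np.linalg.eigvals(A):
        if abs(lam) >= 1.0 - tol:  # only unstable or marginal modes matter
            M = np.hstack([lam * np.eye(n) - A, B])
            if np.linalg.matrix_rank(M, tol=tol) < n:
                return False
    return True

def is_detectable(A, C, tol=1e-9):
    return is_stabilizable(A.T, C.T, tol)
```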

5.3.2 PE Condition in the Open-Loop Setting

Before studying the adaptive control problem in partially observable linear dynamical systems, we close this section by showing that the PE condition required for consistent estimation is satisfied under open-loop control, i.e., i.i.d. Gaussian control inputs. To this end, we introduce the truncated open-loop noise evolution parameter $\mathcal{G}_{ol}$, which represents the effect of the noises in the system on the outputs. We define $\mathcal{G}_{ol}$ for $2H$ time steps back in time and show that the last $2H$ process and measurement noises provide sufficient persistent excitation for the covariates in the estimation problem. In the following, we show that there exists a positive $\sigma_o$ such that $\sigma_o < \sigma_{\min}(\mathcal{G}_{ol})$, i.e., $\mathcal{G}_{ol}$ is full row rank. Let $\bar\phi_t = P\phi_t$ for a permutation matrix $P$ that gives
$$\bar\phi_t = \begin{bmatrix} y_{t-1}^\top & u_{t-1}^\top & \cdots & y_{t-H}^\top & u_{t-H}^\top \end{bmatrix}^\top \in \mathbb{R}^{(m+p)H}.$$

We will consider the state space representation of LQG control systems given in (5.1) for the analysis, but one can apply the same analysis to predictor form/ARX systems (see [163] for the details). For the control input $u_t \sim \mathcal{N}(0, \sigma_u^2 I)$, let $f_t = [y_t^\top\ u_t^\top]^\top$. From the evolution of the system with the given input, we have
$$f_t = \mathbf{G}_o\begin{bmatrix} w_{t-1}^\top & z_t^\top & u_t^\top & \cdots & w_{t-H}^\top & z_{t-H+1}^\top & u_{t-H+1}^\top \end{bmatrix}^\top + r_{o_t},$$
where
$$\mathbf{G}_o := \begin{bmatrix} 0_{m\times n} & I_{m\times m} & 0_{m\times p} & C & 0_{m\times m} & CB & \cdots & CA^{H-2} & 0_{m\times m} & CA^{H-2}B \\ 0_{p\times n} & 0_{p\times m} & I_{p\times p} & 0_{p\times n} & 0_{p\times m} & 0_{p\times p} & \cdots & 0_{p\times n} & 0_{p\times m} & 0_{p\times p} \end{bmatrix},$$
and $r_{o_t}$ is the residual vector that represents the effect of $[w_{i-1}\ z_i\ u_i]$ for $0 \leq i < t-H$, which are independent. Notice that $\mathbf{G}_o$ is full row rank even for $H = 1$, due to its first $(m+p)\times(m+n+p)$ block. Using this, we can represent $\bar\phi_t$ as follows:

$$\bar\phi_t = \underbrace{\begin{bmatrix} f_{t-1} \\ \vdots \\ f_{t-H} \end{bmatrix}}_{\in\,\mathbb{R}^{(m+p)H}} = \mathcal{G}_{ol}\underbrace{\begin{bmatrix} w_{t-2} \\ z_{t-1} \\ u_{t-1} \\ \vdots \\ w_{t-2H-1} \\ z_{t-2H} \\ u_{t-2H} \end{bmatrix}}_{\in\,\mathbb{R}^{2(n+m+p)H}} + \begin{bmatrix} r_{o_{t-1}} \\ \vdots \\ r_{o_{t-H}} \end{bmatrix},$$

where
$$\mathcal{G}_{ol} := \begin{bmatrix} [\,\mathbf{G}_o\,] & 0_{(m+p)\times(m+n+p)} & 0_{(m+p)\times(m+n+p)} & 0_{(m+p)\times(m+n+p)} & \cdots \\ 0_{(m+p)\times(m+n+p)} & [\,\mathbf{G}_o\,] & 0_{(m+p)\times(m+n+p)} & 0_{(m+p)\times(m+n+p)} & \cdots \\ & & \vdots & & \\ 0_{(m+p)\times(m+n+p)} & 0_{(m+p)\times(m+n+p)} & \cdots & [\,\mathbf{G}_o\,] & 0_{(m+p)\times(m+n+p)} \\ 0_{(m+p)\times(m+n+p)} & 0_{(m+p)\times(m+n+p)} & 0_{(m+p)\times(m+n+p)} & \cdots & [\,\mathbf{G}_o\,] \end{bmatrix}. \tag{5.29}$$
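As a sanity check on this structure, the sketch below assembles $\mathbf{G}_o$ and the block-shifted $\mathcal{G}_{ol}$ of (5.29) for a small placeholder system (the matrices $A$, $B$, $C$, the dimensions, and $H$ are illustrative, not from the text) and confirms numerically that $\mathcal{G}_{ol}$ is full row rank with $\sigma_{\min}(\mathcal{G}_{ol}) > 0$.

```python
import numpy as np

# Assemble G_o and the block-shifted G_ol of (5.29) for a small placeholder
# system and check full row rank numerically. All values here are illustrative.
rng = np.random.default_rng(0)
n, m, p, H = 3, 2, 1, 4
A = np.diag([0.8, 0.5, -0.4])                        # a stable A
B, C = rng.standard_normal((n, p)), rng.standard_normal((m, n))

blocks = []
for k in range(H):                                   # block k acts on (w_{t-k-1}, z_{t-k}, u_{t-k})
    if k == 0:
        top = np.hstack([np.zeros((m, n)), np.eye(m), np.zeros((m, p))])
        bot = np.hstack([np.zeros((p, n + m)), np.eye(p)])
    else:
        CA = C @ np.linalg.matrix_power(A, k - 1)
        top = np.hstack([CA, np.zeros((m, m)), CA @ B])
        bot = np.zeros((p, n + m + p))
    blocks.append(np.vstack([top, bot]))
G_o = np.hstack(blocks)                              # (m+p) x H(m+n+p)

w = n + m + p                                        # width of one noise block
G_ol = np.zeros(((m + p) * H, 2 * w * H))
for i in range(H):                                   # block row i: G_o shifted i blocks right
    G_ol[i * (m + p):(i + 1) * (m + p), i * w:i * w + G_o.shape[1]] = G_o

print(np.linalg.matrix_rank(G_ol), (m + p) * H)      # full row rank: (m+p)H
print(np.linalg.svd(G_ol, compute_uv=False).min())   # a strictly positive sigma_o
```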

Recall Assumption 5.1. The following lemma shows that the covariates $\phi_i$ are bounded for the given system under open-loop control.

Lemma 5.3. After applying the control inputs $u_t \sim \mathcal{N}(0, \sigma_u^2 I)$ for $T_w$ time steps, for all $1 \leq t \leq T_w$, with probability $1-\delta/2$,
$$\|x_t\| \leq X_w := \frac{(\sigma_w+\sigma_u\|B\|)\kappa_1(1-\gamma_1)}{\sqrt{1-(1-\gamma_1)^2}}\sqrt{2n\log(12nT_w/\delta)}, \tag{5.30}$$
$$\|z_t\| \leq Z := \sigma_z\sqrt{2m\log(12mT_w/\delta)}, \tag{5.31}$$
$$\|u_t\| \leq U_w := \sigma_u\sqrt{2p\log(12pT_w/\delta)}, \tag{5.32}$$
$$\|y_t\| \leq \|C\|X_w + Z. \tag{5.33}$$
Thus, we have $\max_{i\leq T_w}\|\phi_i\| \leq \Upsilon_w\sqrt{H}$, where $\Upsilon_w = \|C\|X_w + Z + U_w$.

Proof. For all $1 \leq t \leq T_w$, $\Sigma(x_t) \preceq \mathbf{\Gamma}_\infty$, where $\mathbf{\Gamma}_\infty$ is the steady-state covariance matrix of $x_t$ such that
$$\mathbf{\Gamma}_\infty = \sum_{i=0}^{\infty} \sigma_w^2 A^i(A^\top)^i + \sigma_u^2 A^i BB^\top(A^\top)^i.$$
From Assumption 5.1, we have $\|A^\tau\| \leq \kappa_1(1-\gamma_1)^\tau$ for all $\tau \geq 0$. Thus, $\|\mathbf{\Gamma}_\infty\| \leq (\sigma_w^2+\sigma_u^2\|B\|^2)\frac{\kappa_1^2(1-\gamma_1)^2}{1-(1-\gamma_1)^2}$. Notice that each $x_t$ is a component-wise $\sqrt{\|\mathbf{\Gamma}_\infty\|}$-sub-Gaussian random variable. Using the standard sub-Gaussian vector norm upper bound with a union bound argument, we get the advertised result. □

The following lemma shows that the i.i.d. Gaussian inputs uniformly excite the system and satisfy the PE condition after enough interactions.

Lemma 5.4 (Persistence of Excitation in the Open-Loop Control Setting). $\mathcal{G}_{ol}$ is full row rank, so that $\sigma_{\min}(\mathcal{G}_{ol}) > \sigma_o > 0$. For some $\delta \in (0,1)$ and $\Upsilon_w$ defined in Lemma 5.3, let
$$T_o = 32\Upsilon_w^4\sigma_o^{-4}\log^2\Big(\frac{2H(m+p)}{\delta}\Big)\max\{\sigma_w^{-4}, \sigma_z^{-4}, \sigma_u^{-4}\}.$$
After applying the control inputs $u_t \sim \mathcal{N}(0, \sigma_u^2 I)$ for $T_w \geq T_o$ time steps, with probability at least $1-\delta$ we have
$$\sigma_{\min}\Big(\sum_{i=1}^{t}\phi_i\phi_i^\top\Big) \geq \frac{t\,\sigma_o^2}{2}\min\{\sigma_w^2, \sigma_z^2, \sigma_u^2\}.$$

Proof. Let $\bar{0} = 0_{(m+p)\times(m+n+p)}$. Since each block row of $\mathcal{G}_{ol}$ is full row rank, we get the following decomposition by applying the QR decomposition to each block row:
$$\mathcal{G}_{ol} = \underbrace{\begin{bmatrix} Q_o & 0_{m+p} & 0_{m+p} & 0_{m+p} & \cdots \\ 0_{m+p} & Q_o & 0_{m+p} & 0_{m+p} & \cdots \\ & & \vdots & & \\ 0_{m+p} & 0_{m+p} & \cdots & Q_o & 0_{m+p} \\ 0_{m+p} & 0_{m+p} & 0_{m+p} & \cdots & Q_o \end{bmatrix}}_{\in\,\mathbb{R}^{(m+p)H\times(m+p)H}} \underbrace{\begin{bmatrix} R_o & \bar{0} & \bar{0} & \bar{0} & \cdots \\ \bar{0} & R_o & \bar{0} & \bar{0} & \cdots \\ & & \vdots & & \\ \bar{0} & \bar{0} & \cdots & R_o & \bar{0} \\ \bar{0} & \bar{0} & \bar{0} & \cdots & R_o \end{bmatrix}}_{\in\,\mathbb{R}^{(m+p)H\times 2(m+n+p)H}},$$
where
$$R_o = \begin{bmatrix} \times & \times & \times & \times & \times & \times & \cdots \\ 0 & \times & \times & \times & \times & \times & \cdots \\ & & & \vdots & & & \\ 0 & 0 & 0 & \times & \times & \times & \cdots \end{bmatrix} \in \mathbb{R}^{(m+p)\times H(m+n+p)}$$
with positive diagonal elements. Notice that the first matrix, built from $Q_o$, is full rank. Also, all the rows of the second matrix are in row echelon form, so the second matrix is full row rank. Thus, we can deduce that $\mathcal{G}_{ol}$ is full row rank, i.e., $\sigma_{\min}(\mathcal{G}_{ol}) > \sigma_o > 0$. Since $\mathcal{G}_{ol}$ is full row rank, we have

E[๐œ™ยฏ๐‘ก๐œ™ยฏโŠค

๐‘ก ] โชฐ G๐‘œ๐‘™ฮฃ๐‘ค ,๐‘ง,๐‘ขG๐‘œ๐‘™โŠค, where ฮฃ๐‘ค ,๐‘ง,๐‘ข โˆˆ R2(๐‘›+๐‘š+๐‘)๐ปร—2(๐‘›+๐‘š+๐‘)๐ป = diag(๐œŽ2

๐‘ค, ๐œŽ2

๐‘ง, ๐œŽ2

๐‘ข, . . . , ๐œŽ2

๐‘ค, ๐œŽ2

๐‘ง, ๐œŽ2

๐‘ข). This gives us

๐œŽmin(E[๐œ™ยฏ๐‘ก๐œ™ยฏโŠค

๐‘ก ]) โ‰ฅ๐œŽ2

๐‘œmin{๐œŽ2

๐‘ค, ๐œŽ2

๐‘ง, ๐œŽ2

๐‘ข} for all๐‘ก. From Lemma 5.3, we have max๐‘–โ‰ค๐œ โˆฅ๐œ™๐‘–โˆฅ โ‰คฮฅ๐‘ค

โˆš

๐ปwith probability at least 1โˆ’๐›ฟ/2. Given this holds, one can use Matrix Azuma inequality in [267], to obtain the following which holds with probability 1โˆ’๐›ฟ/2:

$$\lambda_{\max}\Big(\sum_{i=1}^{t}\phi_i\phi_i^\top - \mathbb{E}[\phi_i\phi_i^\top]\Big) \leq 2\sqrt{2t}\,\Upsilon_w^2 H\sqrt{\log\frac{2H(m+p)}{\delta}}.$$

Using Weyl's inequality, during the warm-up period, with probability $1-\delta$, we have
$$\sigma_{\min}\Big(\sum_{i=1}^{t}\phi_i\phi_i^\top\Big) \geq t\,\sigma_o^2\min\{\sigma_w^2, \sigma_z^2, \sigma_u^2\} - 2\sqrt{2t}\,\Upsilon_w^2 H\sqrt{\log\frac{2H(m+p)}{\delta}}.$$
For all $t \geq T_o$, we have the stated lower bound. □

This result verifies that the PE condition holds in the open-loop control setting, which shows that the estimation error guarantees of Theorem 5.2 hold for open-loop data collection. Therefore, even if the closed-loop PE condition is not satisfied, so that the guarantees of Theorem 5.2 cannot be certified for closed-loop control, one can still use the novel system identification method with i.i.d. control inputs to obtain state-of-the-art guarantees. Moreover, if the PE condition does hold in the closed-loop setting, we can further guarantee consistent improvement of the estimates, which would not be possible with prior methods. This novelty will be crucial in the adaptive control tasks discussed next.
