• Tidak ada hasil yang ditemukan

Main Result

Dalam dokumen Learning and Control of Dynamical Systems (Halaman 116-128)

LEARNING AND CONTROL IN LINEAR TIME-VARYING SYSTEMS

4.2 Stable Online Control of Linear Time-Varying Systems

4.2.3 Main Result

The previous section highlights that a naive application of LTI control cannot guar- antee stability for LTV systems. We now propose a new approach, COvariance Constrained Online LQ (COCO-LQ) control. Our main technical result shows that COCO-LQ provably guarantees stability in LTV systems when the SDP is feasible.

Then, we discuss how to handle the situation when the SDP is infeasible. Finally, we discuss the effect of model estimation error.

COvariance Constrained Online LQ (COCO-LQ)

The naive approach discussed in Section 4.2.2 seeks to solve the LTI problem at every time step, which is equivalent to solving the SDP in Proposition 4.1 for every (๐ด๐‘ก, ๐ต๐‘ก). The reason this method fails is that it only considers cost minimization without explicitly considering stability. The main idea of COCO-LQ is to enforce stability via a state covariance constraint embedded into the SDP framework. The proposed algorithm is stated formally in Algorithm 7.

COCO-LQ solves an SDP (4.3) at each time step that is similar to that in Propo- sition 4.1. The crucial difference is the new constraint (4.3c), which involves parameter๐›ผ. Plugging (4.3a) into constraint (4.3c) yields the following:

ฮฃ๐‘ฅ ๐‘ฅ โชฏ 1 1โˆ’๐›ผ

๐‘Š .

This highlights that constraint (4.3c) can be interpreted as an upper bound on the state covariance matrixฮฃ๐‘ฅ ๐‘ฅ. When๐›ผ =0, the controller essentially cancels out the dynamics, without taking into account the cost of doing so. This ensures stability but can lead to large cost. At another extreme, when๐›ผโ†’1, the SDP solved at each time step is the same as for the LTI setting, and so COCO-LQ matches the naive

Algorithm 7COCO-LQ: COvariance Constrained Online LQ

1: Input: ๐›ผ โˆˆ [0,1),๐‘„ , ๐‘…, ๐‘Š โ‰ป 0

2: for๐‘ก =1,2, ...do

3: Receive state๐‘ฅ๐‘ก, and system parameter ๐ด๐‘ก, ๐ต๐‘ก

4: Solve the following SDP forฮฃ๐‘ก โˆˆR(๐‘‘+๐‘)ร—(๐‘‘+๐‘): minimize

๐‘„ 0

0 ๐‘…

โ€ขฮฃ subject to ฮฃ๐‘ฅ ๐‘ฅ =

๐ด๐‘ก ๐ต๐‘ก ฮฃ

๐ด๐‘ก ๐ต๐‘กโŠค

+๐‘Š (4.3a)

ฮฃ โชฐ 0 (4.3b)

๐ด๐‘ก ๐ต๐‘ก ฮฃ

๐ด๐‘ก ๐ต๐‘กโŠค

โชฏ ๐›ผฮฃ๐‘ฅ ๐‘ฅ (4.3c)

5: Compute the control gain ๐พ๐‘ก = ฮฃโŠค๐‘ฅ ๐‘ขฮฃ๐‘ฅ ๐‘ฅโˆ’1, whereฮฃ๐‘ก =

ฮฃ๐‘ฅ ๐‘ฅ ฮฃ๐‘ฅ ๐‘ข

ฮฃ๐‘ฅ ๐‘ข

โŠค ฮฃ๐‘ข๐‘ข

6: Execute control action๐‘ข๐‘ก =๐พ๐‘ก๐‘ฅ๐‘ก

approach. Thus,๐›ผtrades off between stability and cost. In the following section, we show that this novel state covariance constraint promotes sequential strong stability [61], which in turn guarantees ISS with a proper choice of๐›ผ.

Stability

We now state our main technical result, which provides a formal stability guarantee for COCO-LQ.

Theorem 4.1. Let 0 โ‰ค ๐›ผ < 1/2, and suppose (4.3) is feasible for all ๐‘ก, then the resulting dynamical system satisfies ISS in the sense that for any disturbance sequence{๐‘ค๐‘ก}โˆž

๐‘ก=0and for any๐‘ก โ‰ฅ ๐‘ก0,

โˆฅ๐‘ฅ๐‘กโˆฅ โ‰ค ๐œŒ๐‘กโˆ’๐‘ก0โˆฅ๐‘ฅ๐‘ก

0โˆฅ + ๐œ… ๐œŒ 1โˆ’๐œŒ

sup

๐‘ก0โ‰ค๐‘˜ <๐‘ก

โˆฅ๐‘ค๐‘˜โˆฅ for ๐œŒ = โˆš๏ธ ๐›ผ

1โˆ’๐›ผ โˆˆ [0,1) and ๐œ… = โˆš๐œ…๐‘Š

1โˆ’๐›ผ, where ๐œ…๐‘Š = โˆฅ๐‘Šโˆฅ โˆฅ๐‘Šโˆ’1โˆฅ is the condition number of๐‘Š.

The key intuition underpinning this result is that the additional state covariance constraint (4.3c) implicitly enforces sequential strong stability [61], which in turn ensures ISS. These results are formally stated in Lemma 4.1 and Lemma 4.2 respec- tively. First, we formally define sequential strong stability.

Definition 4.2(Sequential Strong Stability). A sequence of policies๐พ1, ๐พ2, ...,such that ๐‘ข๐‘ก = ๐พ๐‘ก๐‘ฅ๐‘ก is (๐œ…, ๐›พ , ๐œŒ)-sequential strongly stable (for ๐œ… > 0, 0 < ๐›พ โ‰ค 1 and

0 โ‰ค ๐œŒ < 1) if there exist matrices๐ป1, ๐ป2, ..., and ๐ฟ1, ๐ฟ2..., such that ๐ด๐‘ก+๐ต๐‘ก๐พ๐‘ก = ๐ป๐‘ก๐ฟ๐‘ก๐ปโˆ’1

๐‘ก for all ๐‘ก, with the following properties: (a) โˆฅ๐ฟ๐‘กโˆฅ โ‰ค 1โˆ’๐›พ; (b) โˆฅ๐ป๐‘กโˆฅ โ‰ค ๐›ฝ1 and||๐ปโˆ’1

๐‘ก || โ‰ค 1/๐›ฝ2with๐œ… =๐›ฝ1/๐›ฝ2; (c)โˆฅ๐ปโˆ’1

๐‘ก+1๐ป๐‘กโˆฅ โ‰ค ๐œŒ

1โˆ’๐›พ.

With this definition in place, we present the connection between the condition in (4.3c) and sequential strong stability.

Lemma 4.1. Under the conditions in Theorem 4.1, the policies designed by COCO- LQ are (๐œ…, ๐›พ , ๐œŒ)-sequential strongly stability for ๐œ… = โˆš๐œ…๐‘Š

1โˆ’๐›ผ, ๐›พ =1โˆ’โˆš

๐›ผ, ๐œŒ =โˆš๏ธ ๐›ผ 1โˆ’๐›ผ

where๐œ…๐‘Š = โˆฅ๐‘Šโˆฅ โˆฅ๐‘Šโˆ’1โˆฅ.

Proof. To prove Lemma 4.1, we first show that the optimal solution to the SDP in (4.3) has a specific low rank structure. In particular, we claim that if (4.3) has a minimizerฮฃโˆ—, then there exists a๐พ โˆˆR๐‘ร—๐‘‘ s.t. ฮฃโˆ—can be written as

ฮฃโˆ— =

"

ฮฃ๐‘ฅ ๐‘ฅโˆ— ฮฃโˆ—๐‘ฅ ๐‘ข (ฮฃ๐‘ฅ ๐‘ขโˆ— )โŠค ฮฃโˆ—๐‘ข๐‘ข

#

=

"

ฮฃ๐‘ฅ ๐‘ฅโˆ— ฮฃโˆ—๐‘ฅ ๐‘ฅ๐พโŠค ๐พฮฃโˆ—๐‘ฅ ๐‘ฅ ๐พฮฃโˆ—๐‘ฅ ๐‘ฅ๐พโŠค

#

. (4.4)

To see this, first note that ฮฃโˆ—๐‘ฅ ๐‘ฅ โชฐ ๐‘Š โ‰ป 0, and therefore we can simply define ๐พ =(ฮฃ๐‘ฅ ๐‘ขโˆ— )โŠค(ฮฃ๐‘ฅ ๐‘ฅโˆ— )โˆ’1, and the only thing we need to show isฮฃโˆ—๐‘ข๐‘ข =๐พฮฃ๐‘ฅ ๐‘ฅโˆ— ๐พโŠค. Suppose this is not true, then sinceฮฃโˆ— โชฐ 0, there must exist ๐ท โ‰ 0, ๐ท โชฐ 0 s.t.

ฮฃโˆ—๐‘ข๐‘ข =๐พฮฃ๐‘ฅ ๐‘ฅโˆ— ๐พโŠค +๐ท . (4.5) Then, by (4.3a), we also have,

ฮฃโˆ—๐‘ฅ ๐‘ฅ = (๐ด๐‘ก+๐ต๐‘ก๐พ)ฮฃโˆ—๐‘ฅ ๐‘ฅ(๐ด๐‘ก+๐ต๐‘ก๐พ)โŠค+๐ต๐‘ก๐ท ๐ตโŠค

๐‘ก +๐‘Š . (4.6)

Viewing the above as a Lyapunov equation in terms ofฮฃโˆ—๐‘ฅ ๐‘ฅ, and since๐ต๐‘ก๐ท ๐ตโŠค

๐‘ก +๐‘Š โ‰ป 0 andฮฃโˆ—๐‘ฅ ๐‘ฅ โ‰ป 0, we get ๐ด๐‘ก+๐ต๐‘ก๐พ is a stable matrix. Next, since ๐ด๐‘ก+๐ต๐‘ก๐พis stable, we can construct หœฮฃ๐‘ฅ ๐‘ฅ to be the unique solution to the following Lyapunov equation,

ฮฃหœ๐‘ฅ ๐‘ฅ = (๐ด๐‘ก+๐ต๐‘ก๐พ)ฮฃหœ๐‘ฅ ๐‘ฅ(๐ด๐‘ก+๐ต๐‘ก๐พ)โŠค+๐‘Š . (4.7) We further define,

ฮฃ =หœ

"

ฮฃหœ๐‘ฅ ๐‘ฅ ฮฃหœ๐‘ฅ ๐‘ฅ๐พโŠค ๐พฮฃหœ๐‘ฅ ๐‘ฅ ๐พฮฃหœ๐‘ฅ ๐‘ฅ๐พโŠค

# ,

and we claim that หœฮฃis a feasible solution to (4.3). Clearly หœฮฃis positive semi-definite, and further, it satisfies (4.3a). We now check that (4.3c) is met below.

h ๐ด๐‘ก ๐ต๐‘ก

iฮฃหœ h ๐ด๐‘ก ๐ต๐‘ก

iโŠค

โˆ’๐›ผฮฃหœ๐‘ฅ ๐‘ฅ

= (๐ด๐‘ก+๐ต๐‘ก๐พ)ฮฃหœ๐‘ฅ ๐‘ฅ(๐ด๐‘ก+๐ต๐‘ก๐พ)โŠคโˆ’๐›ผฮฃหœ๐‘ฅ ๐‘ฅ

= (๐ด๐‘ก+๐ต๐‘ก๐พ)ฮฃโˆ—๐‘ฅ ๐‘ฅ(๐ด๐‘ก+๐ต๐‘ก๐พ)โŠคโˆ’๐›ผฮฃ๐‘ฅ ๐‘ฅโˆ— + (๐ด๐‘ก+๐ต๐‘ก๐พ) (ฮฃหœ๐‘ฅ ๐‘ฅโˆ’ฮฃ๐‘ฅ ๐‘ฅโˆ— ) (๐ด๐‘ก+๐ต๐‘ก๐พ)โŠคโˆ’๐›ผ(ฮฃหœ๐‘ฅ ๐‘ฅโˆ’ฮฃโˆ—๐‘ฅ ๐‘ฅ)

= [๐ด๐‘ก, ๐ต๐‘ก]ฮฃโˆ—[๐ด๐‘ก, ๐ต๐‘ก]โŠคโˆ’๐ต๐‘ก๐ท ๐ตโŠค

๐‘กโˆ’๐›ผฮฃโˆ—๐‘ฅ ๐‘ฅ+ (๐ด๐‘ก+๐ต๐‘ก๐พ) (ฮฃหœ๐‘ฅ ๐‘ฅโˆ’ฮฃโˆ—๐‘ฅ ๐‘ฅ) (๐ด๐‘ก+๐ต๐‘ก๐พ)โŠคโˆ’๐›ผ(ฮฃหœ๐‘ฅ ๐‘ฅโˆ’ฮฃ๐‘ฅ ๐‘ฅโˆ— )

โชฏ โˆ’๐ต๐‘ก๐ท ๐ตโŠค

๐‘ก + (๐ด๐‘ก+๐ต๐‘ก๐พ) (ฮฃหœ๐‘ฅ ๐‘ฅโˆ’ฮฃ๐‘ฅ ๐‘ฅโˆ— ) (๐ด๐‘ก+๐ต๐‘ก๐พ)โŠค โˆ’๐›ผ(ฮฃหœ๐‘ฅ ๐‘ฅ โˆ’ฮฃโˆ—๐‘ฅ ๐‘ฅ), (4.8) where (4.8) is due toฮฃโˆ— must satisfy (4.3c). Subtracting (4.6) from (4.7), we have,

ฮฃหœ๐‘ฅ ๐‘ฅ โˆ’ฮฃโˆ—๐‘ฅ ๐‘ฅ =(๐ด๐‘ก+๐ต๐‘ก๐พ) (ฮฃหœ๐‘ฅ ๐‘ฅโˆ’ฮฃ๐‘ฅ ๐‘ฅโˆ— ) (๐ด๐‘ก+๐ต๐‘ก๐พ)โŠค โˆ’๐ต๐‘ก๐ท ๐ตโŠค

๐‘ก . (4.9) Plugging (4.9) into (4.8), we have,

h ๐ด๐‘ก ๐ต๐‘ก

iฮฃหœ h

๐ด๐‘ก ๐ต๐‘ก iโŠค

โˆ’๐›ผฮฃหœ๐‘ฅ ๐‘ฅ โชฏ (1โˆ’๐›ผ) (ฮฃหœ๐‘ฅ ๐‘ฅ โˆ’ฮฃโˆ—๐‘ฅ ๐‘ฅ).

Therefore, to check (4.3c) it remains to show หœฮฃ๐‘ฅ ๐‘ฅ โˆ’ฮฃ๐‘ฅ ๐‘ฅโˆ— โชฏ 0. To see this, we view (4.9) as a Lyapunov equation in terms of หœฮฃ๐‘ฅ ๐‘ฅ โˆ’ฮฃ๐‘ฅ ๐‘ฅโˆ— , and as ๐ด๐‘ก+๐ต๐‘ก๐พ is stable, and ๐ต๐‘ก๐ท ๐ตโŠค

๐‘ก โชฐ 0, we have หœฮฃ๐‘ฅ ๐‘ฅ โˆ’ฮฃ๐‘ฅ ๐‘ฅโˆ— โชฏ 0. As a result, (4.3c) holds and หœฮฃ๐‘ฅ ๐‘ฅ is indeed a feasible solution to (4.3).

Further, we can show หœฮฃachieves a strictly lower cost thanฮฃโˆ—. To see this, note we have already shown หœฮฃ๐‘ฅ ๐‘ฅ โชฏ ฮฃโˆ—๐‘ฅ ๐‘ฅ, which also implies หœฮฃ๐‘ข๐‘ข = ๐พฮฃหœ๐‘ฅ ๐‘ฅ๐พโŠค โชฏ ๐พฮฃ๐‘ฅ ๐‘ฅโˆ— ๐พโŠค = ฮฃ๐‘ข๐‘ขโˆ— โˆ’๐ทwith๐ท โชฐ 0,๐ท โ‰ 0. Coupled this with the fact that๐‘„ , ๐‘…are strictly positive definite, we deduce that หœฮฃmust achieve a lower cost. So we get a contradiction, and verified that (4.4) holds.

Now that we established the decomposition in (4.4) for the solution of (4.3), we proceed with the proof of Lemma 4.1. Let ฮฃ๐‘ก be the solution to the SDP (4.3) at step๐‘ก. Then,ฮฃ๐‘ก can be rewritten as,

ฮฃ๐‘ก =

"

ฮฃ๐‘ก ,๐‘ฅ ๐‘ฅ ฮฃ๐‘ก ,๐‘ฅ ๐‘ฅ๐พโŠค

๐‘ก

๐พ๐‘กฮฃ๐‘ก ,๐‘ฅ ๐‘ฅ ๐พ๐‘กฮฃ๐‘ก ,๐‘ฅ ๐‘ฅ๐พโŠค

๐‘ก

# ,

for someฮฃ๐‘ก ,๐‘ฅ ๐‘ฅ โ‰ป 0, and๐พ๐‘กis the linear controller at step๐‘ก. With this, we can re-write the left side of constraint (4.3c) as follows,

h ๐ด๐‘ก ๐ต๐‘ก

i ฮฃ๐‘ก

h ๐ด๐‘ก ๐ต๐‘ก

iโŠค

= h ๐ด๐‘ก ๐ต๐‘ก

i

"

ฮฃ๐‘ก ,๐‘ฅ ๐‘ฅ ฮฃ๐‘ก ,๐‘ฅ ๐‘ฅ๐พโŠค

๐‘ก

๐พ๐‘กฮฃ๐‘ก ,๐‘ฅ ๐‘ฅ ๐พ๐‘กฮฃ๐‘ก ,๐‘ฅ ๐‘ฅ๐พโŠค

๐‘ก

# "

๐ดโŠค

๐‘ก

๐ตโŠค

๐‘ก

#

= ๐ด๐‘กฮฃ๐‘ก ,๐‘ฅ ๐‘ฅ๐ดโŠค

๐‘ก +๐ด๐‘ก๐พ๐‘กฮฃ๐‘ก ,๐‘ฅ ๐‘ฅ๐ตโŠค

๐‘ก +๐ต๐‘กฮฃ๐‘ก ,๐‘ฅ ๐‘ฅ๐พโŠค

๐‘ก ๐ดโŠค

๐‘ก +๐ต๐‘ก๐พ๐‘กฮฃ๐‘ก ,๐‘ฅ ๐‘ฅ๐พโŠค

๐‘ก ๐ตโŠค

๐‘ก

= (๐ด๐‘ก+๐ต๐‘ก๐พ๐‘ก)ฮฃ๐‘ก ,๐‘ฅ ๐‘ฅ(๐ด๐‘ก+๐ต๐‘ก๐พ๐‘ก)โŠค.

As a result, (4.3c) can be equivalently expressed as,

(๐ด๐‘ก+๐ต๐‘ก๐พ๐‘ก)ฮฃ๐‘ก ,๐‘ฅ ๐‘ฅ(๐ด๐‘ก+๐ต๐‘ก๐พ๐‘ก)โŠค โชฏ ๐›ผฮฃ๐‘ก ,๐‘ฅ ๐‘ฅ. (4.10) Left and right multiplying the above byฮฃโˆ’1/2๐‘ก ,๐‘ฅ ๐‘ฅ , we get

ฮฃโˆ’1/2๐‘ก ,๐‘ฅ ๐‘ฅ (๐ด๐‘ก +๐ต๐‘ก๐พ๐‘ก)ฮฃ1/2๐‘ก ,๐‘ฅ ๐‘ฅฮฃ๐‘ก ,๐‘ฅ ๐‘ฅ1/2(๐ด๐‘ก+๐ต๐‘ก๐พ๐‘ก)โŠคฮฃ๐‘ก ,๐‘ฅ ๐‘ฅโˆ’1/2โชฏ ๐›ผ ๐ผ . (4.11) Let๐ป๐‘ก = ฮฃ1/2๐‘ก ,๐‘ฅ ๐‘ฅ,๐ฟ๐‘ก = ฮฃ๐‘ก ,๐‘ฅ ๐‘ฅโˆ’1/2(๐ด๐‘ก+๐ต๐‘ก๐พ๐‘ก)ฮฃ1/2๐‘ก ,๐‘ฅ ๐‘ฅ. We get the following decomposition,

(๐ด๐‘ก +๐ต๐‘ก๐พ๐‘ก) =๐ป๐‘ก๐ฟ๐‘ก๐ปโˆ’1

๐‘ก . (4.12)

We now show that this decomposition yields the desired sequential stability property.

To do so, we need to check the three conditions in Definition 4.2.

To check condition (a) in Definition 4.2, note that (4.11) provides an upper bound onโˆฅ๐ฟ๐‘กโˆฅ, that is ๐ฟ๐‘ก๐ฟโŠค

๐‘ก โชฏ ๐›ผ ๐ผ. Therefore, โˆฅ๐ฟ๐‘กโˆฅ โ‰คโˆš

๐›ผ=1โˆ’๐›พ with๐›พ =1โˆ’โˆš ๐›ผ. To check condition (b) which is an upper bound on โˆฅ๐ป๐‘กโˆฅ and โˆฅ๐ปโˆ’1

๐‘ก โˆฅ, we recall constraint (4.3a),

h ๐ด๐‘ก ๐ต๐‘ก

i ฮฃ๐‘ก

h ๐ด๐‘ก ๐ต๐‘ก

iโŠค

= ฮฃ๐‘ก ,๐‘ฅ ๐‘ฅโˆ’๐‘Š .

As the left-hand side of the above is positive semi-definite, we must haveฮฃ๐‘ก ,๐‘ฅ ๐‘ฅ โชฐ๐‘Š. Further, plugging (4.3a) into (4.3c), we can get, ฮฃ๐‘ก ,๐‘ฅ ๐‘ฅ โˆ’๐‘Š โชฏ ๐›ผฮฃ๐‘ก ,๐‘ฅ ๐‘ฅ. These can be equivalently written as

๐‘Š โชฏ ฮฃ๐‘ก ,๐‘ฅ ๐‘ฅ โชฏ ๐‘Š 1โˆ’๐›ผ

. (4.13)

Therefore, โˆฅ๐ป๐‘กโˆฅ = โˆฅฮฃ1/2๐‘ก ,๐‘ฅ ๐‘ฅโˆฅ โ‰ค โˆฅโˆš๐‘Š1/2โˆฅ

1โˆ’๐›ผ , โˆฅ๐ปโˆ’1

๐‘ก โˆฅ = โˆฅฮฃ๐‘ก ,๐‘ฅ ๐‘ฅโˆ’1/2โˆฅ โ‰ค โˆฅ๐‘Šโˆ’1/2โˆฅ, โˆฅ๐ป๐‘กโˆฅ โˆฅ๐ปโˆ’1

๐‘ก โˆฅ โ‰ค

โˆฅ๐‘Š1/2โˆฅ โˆฅ๐‘Šโˆ’1/2โˆฅ

โˆš 1โˆ’๐›ผ

= โˆš๐œ…๐‘Š

1โˆ’๐›ผ. Therefore, condition (b) holds with๐œ… = โˆš๐œ…๐‘Š

1โˆ’๐›ผ. Lastly, we check condition (c), which is an upper bound onโˆฅ๐ปโˆ’1

๐‘ก+1๐ป๐‘กโˆฅ. Note that, ๐ปโˆ’1

๐‘ก+1๐ป๐‘ก๐ปโŠค

๐‘ก (๐ปโˆ’1

๐‘ก+1)โŠค=๐ปโˆ’1

๐‘ก+1ฮฃ๐‘ก ,๐‘ฅ ๐‘ฅ(๐ปโˆ’1

๐‘ก+1)โŠคโชฏ 1 1โˆ’๐›ผ

(ฮฃ๐‘ก+1,๐‘ฅ ๐‘ฅ)โˆ’1/2๐‘Š(ฮฃ๐‘ก+1,๐‘ฅ ๐‘ฅ)โˆ’1/2โชฏ 1 1โˆ’๐›ผ

๐ผ , where the two inequalities in the above derivations are due to (4.13). As a result, we have โˆฅ๐ปโˆ’1

๐‘ก+1๐ป๐‘กโˆฅ โ‰ค โˆš1

1โˆ’๐›ผ = 1โˆ’๐œŒ๐›พ with ๐œŒ =

โˆš

โˆš ๐›ผ

1โˆ’๐›ผ. As ๐›ผ < 1/2, we have ๐œŒ < 1.

As such, condition (c) holds. In summary, the (๐œ…, ๐›พ , ๐œŒ) strong sequential stability holds, with๐œ…= โˆš๐œ…๐‘Š

1โˆ’๐›ผ,๐›พ =1โˆ’โˆš

๐›ผ, and๐œŒ =โˆš๏ธ ๐›ผ

1โˆ’๐›ผ, which concludes the proof. โ–ก Now that we established the sequential strong stability of the policies obtained via COCO-LQ, we require relating this fact to ISS. The following provides the remaining piece in the proof of Theorem 4.1.

Lemma 4.2. Suppose a sequence of policies ๐พ1, . . . , ๐พ๐‘ก is (๐œ…, ๐›พ , ๐œŒ)-sequential strongly stable. Then, the closed-loop system obtained via the sequence of these policies is input-to-state stable in the sense that for any๐‘ก โ‰ฅ ๐‘ก0โ‰ฅ 1,

โˆฅ๐‘ฅ๐‘กโˆฅ โ‰ค ๐œ… ๐œŒ๐‘กโˆ’๐‘ก0โˆฅ๐‘ฅ๐‘ก

0โˆฅ + ๐œ… ๐œŒ 1โˆ’๐œŒ

max

๐‘ก0โ‰ค๐‘  <๐‘กโˆฅ๐‘ค๐‘ โˆฅ.

Proof. For notational simplicity, we only prove the case with๐‘ก0=1; the general case follows similarly, at the cost of heavier notations. Let๐‘ฅ1, ๐‘ฅ2, ...be a sequence of states starting from initial state๐‘ฅ1, and generated by the dynamics๐‘ฅ๐‘ก+1 =๐ด๐‘ก๐‘ฅ๐‘ก+๐ต๐‘ก๐‘ข๐‘ก+๐‘ค๐‘ก = (๐ด๐‘ก+๐ต๐‘ก๐พ๐‘ก)๐‘ฅ๐‘ก+๐‘ค๐‘ก. Hence,๐‘ฅ๐‘ก =๐‘€1๐‘ฅ1+ร๐‘กโˆ’1

๐‘ =1๐‘€๐‘ +1๐‘ค๐‘ ,where

๐‘€๐‘ก =๐ผ;๐‘€๐‘  =๐‘€๐‘ +1(๐ด๐‘ +๐ต๐‘ ๐พ๐‘ )= (๐ด๐‘กโˆ’1+๐ต๐‘กโˆ’1๐พ๐‘กโˆ’1) ยท ยท ยท (๐ด๐‘  +๐ต๐‘ ๐พ๐‘ ). By the sequential strong stability, there exist matrices ๐ป1, ๐ป2, ... and ๐ฟ1, ๐ฟ2, ...

such that ๐ด๐‘— + ๐ต๐‘—๐พ๐‘— = ๐ป๐‘—๐ฟ๐‘—๐ปโˆ’1

๐‘— , and ๐ป๐‘—, ๐ฟ๐‘— satisfy the properties specified in Definition 4.2. Thus we have for all 1 โ‰ค ๐‘  < ๐‘ก,

โˆฅ๐‘€๐‘ โˆฅ = โˆฅ๐ป๐‘กโˆ’1๐ฟ๐‘กโˆ’1๐ปโˆ’1

๐‘กโˆ’1๐ป๐‘กโˆ’2๐ฟ๐‘กโˆ’1๐ปโˆ’1

๐‘กโˆ’2ยท ยท ยท๐ป๐‘ ๐ฟ๐‘ ๐ปโˆ’1

๐‘  โˆฅ

โ‰ค โˆฅ๐ป๐‘กโˆ’1โˆฅ

๐‘กโˆ’1

ร–

๐‘—=๐‘ 

โˆฅ๐ฟ๐‘—โˆฅ

! ๐‘กโˆ’2 ร–

๐‘—=๐‘ 

โˆฅ๐ปโˆ’1

๐‘—+1๐ป๐‘—โˆฅ

!

โˆฅ๐ปโˆ’1

๐‘  โˆฅ,

โ‰ค ๐›ฝ1(1โˆ’๐›พ)๐‘กโˆ’๐‘ ( ๐œŒ 1โˆ’๐›พ

)๐‘กโˆ’๐‘ โˆ’1(1/๐›ฝ2) โ‰ค ๐œ…(1โˆ’๐›พ) ๐œŒ

๐œŒ๐‘กโˆ’๐‘ . As๐œ… โ‰ฅ 1, the same holds for๐‘€๐‘ก. Thus, we have,

โˆฅ๐‘ฅ๐‘กโˆฅ โ‰ค โˆฅ๐‘€1โˆฅ โˆฅ๐‘ฅ1โˆฅ +

๐‘กโˆ’1

โˆ‘๏ธ

๐‘ =1

โˆฅ๐‘€๐‘ +1โˆฅ โˆฅ๐‘ค๐‘ โˆฅ

โ‰ค ๐œ… ๐œŒ๐‘กโˆ’1โˆฅ๐‘ฅ1โˆฅ +๐œ…

๐‘กโˆ’1

โˆ‘๏ธ

๐‘ =1

๐œŒ๐‘กโˆ’๐‘ โˆ’1โˆฅ๐‘ค๐‘ โˆฅ โ‰ค ๐œ… ๐œŒ๐‘กโˆ’1โˆฅ๐‘ฅ1โˆฅ + ๐œ… ๐œŒ 1โˆ’๐œŒ

max

1โ‰ค๐‘  <๐‘กโˆฅ๐‘ค๐‘ โˆฅ.

โ–ก Proof of Theorem 4.1. Combining Lemma 4.1 with Lemma 4.2 gives the advertised

result of stability for COCO-LQ. โ–ก

A critical assumption in Theorem 4.1 is that (4.3) is feasible for 0โ‰ค๐›ผ <1/2. The following result provides a condition when the problem is always feasible.

Lemma 4.3. When๐ต๐‘กis full row rank, then the SDP(4.3)is always feasible.

Proof. We prove this result by explicitly constructing a feasible solution for (4.3).

As๐ต๐‘ก has full row rank,๐ต๐‘ก๐ตโŠค

๐‘ก is invertible. Let ฮฃ0=

"

๐‘Š โˆ’๐‘Š ๐ดโŠค

๐‘ก (๐ต๐‘ก๐ตโŠค

๐‘ก )โˆ’1๐ต๐‘ก

โˆ’๐ตโŠค

๐‘ก (๐ต๐‘ก๐ตโŠค

๐‘ก )โˆ’1๐ด๐‘ก๐‘Š ๐ตโŠค

๐‘ก (๐ต๐‘ก๐ตโŠค

๐‘ก )โˆ’1๐ด๐‘ก๐‘Š ๐ดโŠค

๐‘ก (๐ต๐‘ก๐ตโŠค

๐‘ก )โˆ’1๐ต๐‘ก

#

=

"

๐ผ

โˆ’๐ตโŠค

๐‘ก (๐ต๐‘ก๐ตโŠค

๐‘ก )โˆ’1๐ด๐‘ก

# ๐‘Š

"

๐ผ

โˆ’๐ตโŠค

๐‘ก (๐ต๐‘ก๐ตโŠค

๐‘ก )โˆ’1๐ด๐‘ก

#โŠค

.

It suffices to show thatฮฃ0satisfies (4.3a), (4.3b), and (4.3c). Notice that [๐ด๐‘ก, ๐ต๐‘ก]ฮฃ0[๐ด๐‘ก, ๐ต๐‘ก]โŠค = (๐ด๐‘กโˆ’๐ต๐‘ก๐ตโŠค

๐‘ก (๐ต๐‘ก๐ตโŠค

๐‘ก )โˆ’1๐ด๐‘ก)๐‘Š(๐ด๐‘กโˆ’๐ต๐‘ก๐ตโŠค

๐‘ก (๐ต๐‘ก๐ตโŠค

๐‘ก )โˆ’1๐ด๐‘ก)โŠค =0. Therefore, constraint (4.3a) is equivalent toฮฃ๐‘ฅ ๐‘ฅ =๐‘Š, which holds by the construc- tion ofฮฃ0. Next, as๐‘Š โ‰ป 0,ฮฃ0 is clearly positive semi-definitive and (4.3b) holds.

Finally, note the left-hand side of (4.3c) is 0, and the right-hand side of (4.3c) is

positive semi-definite. As a result, (4.3c) holds. โ–ก

Note that having ๐ต๐‘ก full row rank is a sufficient but not necessary condition for the feasibility of (4.3) of COCO-LQ. When ๐ต๐‘ก is not full row rank, the feasibility as- sumption may still hold, and therefore our assumption is weaker than the invertibility assumption used in the literature, e.g., Lai [155]. More broadly, in Theorem 4.1, ๐›ผ <0.5 is a sufficient condition for stability. For๐›ผโ‰ฅ 0.5, stability may still hold for some problem instances(๐ด๐‘ก, ๐ต๐‘ก)as will be shown in the simulations in Section 4.2.4.

How to provide a more refined instance-dependent threshold on๐›ผis an interesting future direction.

Infeasibility and the Role of Predictions

We now turn our attention to the case when the SDP given in (4.3) is infeasible. In this case it is necessary for the controller to use additional information in order to stabilize the system. In particular, the following example shows that when๐ต๐‘ก is not full row rank, for any (deterministic) online control algorithm that has causal access to system matrices, there exists a future sequence of (๐ด๐‘ก, ๐ต๐‘ก) such that the system state will blow up from a given initial state.

Example 4.2. Set ๐‘‘ = 2, ๐‘ = 1, i.e., ๐ด๐‘ก is 2-by-2and ๐ต๐‘ก is 2-by-1. For all ๐‘ก, we set๐ต๐‘ก = [1,0]โŠค, and for even time indices ๐‘ก =2๐‘˜, we set ๐ด๐‘ก = ๐ผ. Further, assume that ๐‘ฅ0 = [1,1]โŠค and ๐‘ค๐‘ก = 0. Our construction is based on induction. Suppose

Algorithm 8COCO-LQ with Predictions

1: Input: ๐›ผ โˆˆ [0,1),๐‘„ , ๐‘…, ๐‘Š โ‰ป 0

2: for๐‘ก =1,2, ...do

3: Receive state๐‘ฅ๐‘ก

4: if ๐‘ก โ‰ก1 (mod๐ป)then

5: Receive system parameters (๐ด๐‘ก, ๐ต๐‘ก), . . . ,(๐ด๐‘ก+๐ปโˆ’1, ๐ต๐‘ก+๐ปโˆ’1)

6: Solve (4.3) by replacing ๐‘…with หœ๐‘…=I๐ป โŠ— ๐‘…, ๐ด๐‘ก with หœ๐ด๐‘กโ‰”[๐ด๐‘ก+๐ปโˆ’1ยท ยท ยท๐ด๐‘ก], and ๐ต๐‘ก with ๐ตหœ๐‘ก โ‰” [๐ต๐‘ก+๐ปโˆ’1, ๐ด๐‘ก+๐ปโˆ’1๐ต๐‘ก+๐ปโˆ’2, . . . , ๐ด๐‘ก+๐ปโˆ’1ยท ยท ยท๐ด๐‘ก+1๐ต๐‘ก] to recoverฮฃ๐‘ก

7: ComputeKt =[๐พโŠค

๐‘ก , . . . , ๐พโŠค

๐‘ก+๐ปโˆ’1]โŠค= ฮฃโŠค๐‘ฅ ๐‘ขฮฃ๐‘ฅ ๐‘ฅโˆ’1, whereฮฃ๐‘ก=

ฮฃ๐‘ฅ ๐‘ฅ ฮฃ๐‘ฅ ๐‘ข

ฮฃ๐‘ฅ ๐‘ข

โŠค ฮฃ๐‘ข๐‘ข

8: Execute control action๐‘ข๐‘ก =๐พ๐‘ก๐‘ฅ๐‘ก

at an even index ๐‘ก = 2๐‘˜, ๐‘ฅ2๐‘˜ = [๐‘ฅ2๐‘˜ ,1, ๐‘ฅ2๐‘˜ ,2]โŠค with ๐‘ฅ2๐‘˜ ,2 โ‰ฅ 0. Then, after ๐‘ข2๐‘˜ is determined, we set

๐ด2๐‘˜+1=

"

1 0 ๐œ– 2

#

with๐œ– =0.5max(||๐‘ฅ2๐‘ฅ๐‘˜ ,2|

2๐‘˜ ,1|,1)sign(๐‘ข2๐‘˜)Then, after(๐ด2๐‘˜+1, ๐ต2๐‘˜+1)is revealed to the agent, it takes an action๐‘ข2๐‘˜+1, resulting in๐‘ฅ2๐‘˜+2= ๐ด2๐‘˜+1๐ด2๐‘˜๐‘ฅ2๐‘˜+๐ด2๐‘˜+1๐ต2๐‘˜๐‘ข2๐‘˜+๐ต2๐‘˜+1๐‘ข2๐‘˜+1. Since ๐ด2๐‘˜ = ๐ผ and๐ต2๐‘˜ = ๐ต2๐‘˜+1 = [1,0]โŠค, the second coordinate of๐‘ฅ2๐‘˜+2can then be written as ๐‘ฅ2๐‘˜+2,2 = ๐œ– ๐‘ฅ2๐‘˜ ,1 +2๐‘ฅ2๐‘˜ ,2+ ๐œ– ๐‘ข2๐‘˜. By the way ๐œ– is chosen, we have ๐œ– ๐‘ข2๐‘˜ โ‰ฅ 0, and |๐œ– ๐‘ฅ2๐‘˜ ,1| โ‰ค 0.5|๐‘ฅ2๐‘˜ ,2|, and therefore, ๐‘ฅ2๐‘˜+2,2 โ‰ฅ 1.5๐‘ฅ2๐‘˜ ,2. Since the system starts with๐‘ฅ0 = [1,1]โŠค, applying the argument above recursively, we have for any๐‘˜,๐‘ฅ2๐‘˜ ,2 โ‰ฅ 1.5๐‘˜, i.e., the state blows up.

In this section, we show that using (๐ด๐‘ก, ๐ต๐‘ก) together with short-term predictions of future system matrices is enough to stabilize the system under standard controlla- bility assumptions. Specifically, we extend COCO-LQ to include future๐ป steps of predictions in Algorithm 8. The key idea is to rewrite the dynamics as

๐‘ฅ๐‘ก+๐ป = ๐ดหœ๐‘ก๐‘ฅ๐‘ก+๐ตหœ๐‘ก๐‘ขยฏ๐‘ก+ [๐ผ , ๐ด๐‘ก+๐ปโˆ’1,ยท ยท ยท , ๐ด๐‘ก+๐ปโˆ’1... ๐ด๐‘ก+1]๐‘คยฏ๐‘ก, (4.14) where หœ๐ด๐‘ก := ๐ด๐‘ก+๐ปโˆ’1ยท ยท ยท๐ด๐‘ก, หœ๐ต๐‘ก := [๐ต๐‘ก+๐ปโˆ’1, ๐ด๐‘ก+๐ปโˆ’1๐ต๐‘ก+๐ปโˆ’2, . . . , ๐ด๐‘ก+๐ปโˆ’1ยท ยท ยท๐ด๐‘ก+1๐ต๐‘ก],

ยฏ

๐‘ข๐‘ก:=[๐‘ขโŠค

๐‘ก+๐ปโˆ’1, ๐‘ขโŠค

๐‘ก+๐ปโˆ’2, . . . , ๐‘ขโŠค

๐‘ก ]โŠค and ยฏ๐‘ค๐‘ก:=[๐‘คโŠค

๐‘ก+๐ปโˆ’1, ๐‘คโŠค

๐‘ก+๐ปโˆ’2, . . . , ๐‘คโŠค

๐‘ก ]โŠค. Notice that

หœ

๐ต๐‘ก is in the form of the controllability matrix for this LTV system. When๐ป is long enough such that หœ๐ต๐‘กis full row rank, i.e., LTV system is๐ป-controllable, we can use Algorithm 8 on หœ๐ด๐‘กand หœ๐ต๐‘กand avoid the infeasibility issue, and we have the stability guarantee.

Theorem 4.2. Suppose for each ๐‘ก, matrix ๐ตหœ๐‘ก defined above satisfies ๐ตหœ๐‘ก๐ตหœโŠค

๐‘ก โชฐ ๐œŽ ๐ผ

for some ๐œŽ > 0, and โˆฅ๐ด๐‘กโˆฅ โ‰ค ๐‘Ž,โˆฅ๐ต๐‘กโˆฅ โ‰ค ๐‘ for some ๐‘Ž, ๐‘ > 0. Then, the SDP in Algorithm 8 is always feasible. Further, when ๐›ผ < 1/2, the closed-loop system is ISS for any๐‘ก,

โˆฅ๐‘ฅ๐‘กโˆฅ โ‰ค ๐œ…โ€ฒ

๐ด๐œŒ

๐‘ก

๐ปโˆ’1โˆฅ๐‘ฅ1โˆฅ +๐œ…โ€ฒ

๐ด๐œ…๐ด๐œ…max(1, ๐œŒ 1โˆ’๐œŒ

) sup

1โ‰ค๐‘  <๐‘ก

โˆฅ๐‘ค๐‘ โˆฅ, where the same as Theorem 4.1, ๐œŒ = โˆš๏ธ ๐›ผ

1โˆ’๐›ผ โˆˆ [0,1) and ๐œ… = โˆš๐œ…๐‘Š

1โˆ’๐›ผ with ๐œ…๐‘Š =

โˆฅ๐‘Šโˆฅ โˆฅ๐‘Šโˆ’1โˆฅbeing the condition number of๐‘Š; further,๐œ…๐ด=1+๐‘Ž+. . .+๐‘Ž๐ปโˆ’1, and ๐œ…โ€ฒ

๐ด =๐‘Ž๐ปโˆ’1+๐‘2๐œ…2

๐ด

๐œ…๐‘…๐œ…+๐‘Ž

๐ป

๐œŽ with๐œ…๐‘… being the condition number of๐‘….

Proof. The feasibility directly follows Lemma 4.3. For stability, using the dynamics given in (4.14), from Theorem 4.1, we have,โˆ€๐‘˜ โ‰ฅ 1

โˆฅ๐‘ฅ๐‘˜ ๐ป+1โˆฅ โ‰ค ๐œŒ๐‘˜โˆฅ๐‘ฅ1โˆฅ + ๐œ… ๐œŒ 1โˆ’๐œŒ

sup

0โ‰ค๐œ < ๐‘˜

โˆฅ๐‘คหœ๐œ ๐ป+1โˆฅ. Note that, for any๐œ,

โˆฅ๐‘คหœ๐œ ๐ป+1โˆฅ โ‰ค

๐ป

โˆ‘๏ธ

โ„“=1

โˆฅ๐ด๐œ ๐ป+๐ปโˆฅ ยท ยท ยท โˆฅ๐ด๐œ ๐ป+โ„“+1โˆฅ โˆฅ๐‘ค๐œ ๐ป+โ„“โˆฅ

โ‰ค (1+๐‘Ž+. . .+๐‘Ž๐ปโˆ’1) sup

๐œ ๐ป+1โ‰ค๐‘ โ‰ค(๐œ+1)๐ป

โˆฅ๐‘ค๐‘ โˆฅ := ๐œ…๐ด sup

๐œ ๐ป+1โ‰ค๐‘ โ‰ค(๐œ+1)๐ป

โˆฅ๐‘ค๐‘ โˆฅ. These show that,

โˆฅ๐‘ฅ๐‘˜ ๐ป+1โˆฅ โ‰ค ๐œŒ๐‘˜โˆฅ๐‘ฅ1โˆฅ + ๐œ…๐ด๐œ… ๐œŒ 1โˆ’๐œŒ sup

1โ‰ค๐‘ โ‰ค๐‘˜ ๐ป

โˆฅ๐‘ค๐‘ โˆฅ.

The above already shows the boundedness of states for time indexes๐‘ก =1 mod (๐ป). To show the boundedness of the states for all ๐‘ก, we need to consider the effect of control inputs in all ๐ป sequences. Note that Theorem 4.1 shows that the policies ๐พ๐‘˜ ๐ป+1is (๐œ…, ๐›พ , ๐œŒ)-sequential strongly stable, with๐›พ =1โˆ’โˆš

๐›ผand๐œ… = โˆš๐œ…๐‘Š

1โˆ’๐›ผ. This impliesโˆฅ๐ดหœ๐‘˜ ๐ป+1+๐ตหœ๐‘˜ ๐ป+1๐พ๐‘˜ ๐ป+1โˆฅ โ‰ค (1โˆ’๐›พ)๐œ…, showing

โˆฅ๐ตหœ๐‘˜ ๐ป+1๐พ๐‘˜ ๐ป+1โˆฅ โ‰ค (1โˆ’๐›พ)๐œ…+๐‘Ž๐ป, which further leads to

โˆฅ๐พ๐‘˜ ๐ป+1โˆฅ โ‰ค ๐พmax= ๐œ…๐‘…

๐‘(1+๐‘Ž+. . .+๐‘Ž๐ปโˆ’1) ๐œŽ

( (1โˆ’๐›พ)๐œ…+๐‘Ž๐ป).

Therefore, we have

โˆฅ๐‘ฅ๐‘˜ ๐ป+โ„“โˆฅ โ‰ค โˆฅ๐ด๐‘˜ ๐ป+โ„“โˆ’1ยท ยท ยท๐ด๐‘˜ ๐ป+1โˆฅ โˆฅ๐‘ฅ๐‘˜ ๐ป+1โˆฅ

+ โˆฅ h

๐ต๐‘˜ ๐ป+โ„“โˆ’1, ๐ด๐‘˜ ๐ป+โ„“โˆ’1๐ต๐‘˜ ๐ป+โ„“โˆ’2, . . . , ๐ด๐‘˜ ๐ป+โ„“โˆ’1ยท ยท ยท๐ด๐‘˜ ๐ป+2๐ต๐‘˜ ๐ป+1 i

๏ฃฎ

๏ฃฏ

๏ฃฏ

๏ฃฏ

๏ฃฏ

๏ฃฏ

๏ฃฏ

๏ฃฐ

๐‘ข๐‘˜ ๐ป+โ„“โˆ’1 .. . ๐‘ข๐‘˜ ๐ป+1

๏ฃน

๏ฃบ

๏ฃบ

๏ฃบ

๏ฃบ

๏ฃบ

๏ฃบ

๏ฃป

โˆฅ

+ โˆฅ [๐ผ , ๐ด๐‘ก+๐ปโˆ’1,ยท ยท ยท , ๐ด๐‘˜ ๐ป+โ„“โˆ’1... ๐ด๐‘˜ ๐ป+1]

๏ฃฎ

๏ฃฏ

๏ฃฏ

๏ฃฏ

๏ฃฏ

๏ฃฏ

๏ฃฏ

๏ฃฐ

๐‘ค๐‘˜ ๐ป+โ„“โˆ’1 .. . ๐‘ค๐‘˜ ๐ป+1

๏ฃน

๏ฃบ

๏ฃบ

๏ฃบ

๏ฃบ

๏ฃบ

๏ฃบ

๏ฃป

โˆฅ

< ๐œ…โ€ฒ

๐ดโˆฅ๐‘ฅ๐‘˜ ๐ป+1โˆฅ +๐œ…๐ด sup

๐‘˜ ๐ป+1โ‰ค๐‘  < ๐‘˜ ๐ป+โ„“

โˆฅ๐‘ค๐‘ โˆฅ.

For any๐‘ก, let๐‘˜ be such that๐‘ก =๐‘˜ ๐ป+โ„“with 1 โ‰ค โ„“ โ‰ค ๐ป. Then, we have,

โˆฅ๐‘ฅ๐‘กโˆฅ โ‰ค ๐œ…โ€ฒ

๐ดโˆฅ๐‘ฅ๐‘˜ ๐ป+1โˆฅ +๐œ…๐ด sup

๐‘˜ ๐ป+1โ‰ค๐‘  <๐‘ก

โˆฅ๐‘ค๐‘ โˆฅ

โ‰ค ๐œ…โ€ฒ

๐ด๐œŒ๐‘˜โˆฅ๐‘ฅ1โˆฅ +๐œ…โ€ฒ

๐ด

๐œ…๐ด๐œ… ๐œŒ 1โˆ’ ๐œŒ

sup

1โ‰ค๐‘ โ‰ค๐‘˜ ๐ป

โˆฅ๐‘ค๐‘ โˆฅ +๐œ…๐ด sup

๐‘˜ ๐ป+1โ‰ค๐‘  <๐‘ก

โˆฅ๐‘ค๐‘ โˆฅ

โ‰ค ๐œ…โ€ฒ

๐ด๐œŒ

๐‘ก

๐ปโˆ’1โˆฅ๐‘ฅ1โˆฅ +๐œ…โ€ฒ

๐ด๐œ…๐ด๐œ…max(1, ๐œŒ 1โˆ’ ๐œŒ

) sup

1โ‰ค๐‘ โ‰ค๐‘ก

โˆฅ๐‘ค๐‘ โˆฅ.

โ–ก

Estimation Error

In both Algorithm 7 and Algorithm 8, the exact knowledge of state-transition ma- trices (๐ด๐‘ก, ๐ต๐‘ก) or the extended state-transition matrices (๐ดหœ๐‘ก,๐ตหœ๐‘ก) are needed when deriving the control actions. In this section, we show that COCO-LQ can still obtain a stabilizing controller in the case where only approximations are known, if the estimation error is controlled. Our main result is the following.

Theorem 4.3. Let (๐ดห†๐‘ก, ๐ตห†๐‘ก) be an estimate of (๐ด๐‘ก, ๐ต๐‘ก). Given ๐›ผ โˆˆ [0,1

2), let ๐œŒ =โˆš๏ธ ๐›ผ

1โˆ’๐›ผ, ๐œ… = โˆฅ๐‘Šโˆฅ โˆฅ๐‘Šโˆš โˆ’1โˆฅ

1โˆ’๐›ผ and๐›พ = 1โˆ’โˆš

๐›ผ. Let๐พ1, ๐พ2, ...be the policies designed by COCO-LQ for (๐ดห†๐‘ก, ๐ตห†๐‘ก)with parameter๐›ผ. When the estimation error satisfies,

max{||๐ดห†๐‘กโˆ’๐ด๐‘ก||2,||๐ตห†๐‘กโˆ’๐ต๐‘ก||2} โ‰ค๐›ฟ ๐›พ

๐œ…(1+๐พ๐‘š๐‘Ž ๐‘ฅ), (4.15) where๐›ฟcan be any number in (0,

โˆš 1โˆ’๐›ผโˆ’โˆš

๐›ผ 1โˆ’โˆš

๐›ผ ), and๐พ๐‘š ๐‘Ž๐‘ฅ is any uniform upper bound onโˆฅ๐พ๐‘กโˆฅ. Then, the policies๐พ๐‘กare ISS when applied to the system (๐ด๐‘ก, ๐ต๐‘ก),

โˆฅ๐‘ฅ๐‘กโˆฅ โ‰ค (๐œŒโ€ฒ)๐‘กโˆ’๐‘ก0โˆฅ๐‘ฅ๐‘ก

0โˆฅ + ๐œ… ๐œŒโ€ฒ 1โˆ’๐œŒโ€ฒ

sup

๐‘ก0โ‰ค๐‘˜ <๐‘ก

โˆฅ๐‘ค๐‘˜โˆฅ,

where ๐œŒโ€ฒ = 1โˆ’(1โˆ’๐›ฟ)๐›พ1โˆ’๐›พ ๐œŒ โˆˆ (0,1). Finally, when โˆฅ๐ดห†๐‘กโˆฅ โ‰ค ๐œŽยฏ๐ด, โˆฅ๐ตห†๐‘กโˆฅ โ‰ค ๐œŽยฏ๐ต and ห†

๐ต๐‘ก๐ตห†โŠค

๐‘ก โชฐ ๐œŽ2

๐ต๐ผ, one uniform upper bound for โˆฅ๐พ๐‘กโˆฅ is๐พ๐‘š ๐‘Ž๐‘ฅ = ๐œ…๐‘…๐œŽยฏ๐ต

๐œŽ2

๐ต

(๐œ…(1โˆ’๐›พ) +๐œŽยฏ๐ด) with๐œ…๐‘… =โˆฅ๐‘…โˆฅ โˆฅ๐‘…โˆ’1โˆฅ.

Proof. The proof is divided by two parts. For the first part, we prove the ISS property given the upper bound ๐พ๐‘š ๐‘Ž๐‘ฅ on the controllers โˆฅ๐พ๐‘กโˆฅ. In the second part, we provide such an upper bound๐พ๐‘š ๐‘Ž๐‘ฅ.

Proof of ISS. By Lemma 4.2, to show ISS we only need to show that (๐พ๐‘ก)โˆž

๐‘ก=0

is sequential strongly stable for system (๐ด๐‘ก, ๐ต๐‘ก)โˆž

๐‘ก=0. {๐พ๐‘ก}โˆž

๐‘ก=0 is (๐œ…, ๐›พ , ๐œŒ)-sequential strongly stable for the system (๐ดห†๐‘ก,๐ตห†๐‘ก)โˆž

๐‘ก=0 with (๐œ…, ๐›พ , ๐œŒ) defined by Lemma 4.1 as ๐œ… = โˆš๐œ…๐‘Š

1โˆ’๐›ผ, ๐›พ = 1โˆ’ โˆš

๐›ผ, ๐œŒ = โˆš๏ธ ๐›ผ

1โˆ’๐›ผ where ๐œ…๐‘Š = โˆฅ๐‘Šโˆฅ โˆฅ๐‘Šโˆ’1โˆฅ. Thus, there exist matrices ๐ป1, ๐ป2, ..., and ๐ฟ1, ๐ฟ2..., such that ห†๐‘€๐‘ก := ๐ดห†๐‘ก +๐ตห†๐‘ก๐พ๐‘ก = ๐ป๐‘ก๐ฟ๐‘ก๐ปโˆ’1

๐‘ก with the following properties: (a) โˆฅ๐ฟ๐‘กโˆฅ โ‰ค 1โˆ’ ๐›พ; (b) โˆฅ๐ป๐‘กโˆฅ โ‰ค ๐›ฝ1 and โˆฅ๐ปโˆ’1

๐‘ก โˆฅ โ‰ค 1/๐›ฝ2 with ๐œ… = ๐›ฝ1/๐›ฝ2; (c) โˆฅ๐ปโˆ’1

๐‘ก+1๐ป๐‘กโˆฅ โ‰ค 1โˆ’๐›พ๐œŒ . With this decomposition for ห†๐‘€๐‘ก, we show that ๐‘€๐‘ก := ๐ด๐‘ก+๐ต๐‘ก๐พ๐‘ก can be decomposed similarly. Letฮ”๐‘ก =๐‘€๐‘กโˆ’๐‘€ห†๐‘ก. Then,

๐‘€๐‘ก = ๐‘€ห†๐‘ก+ฮ”๐‘ก =๐ป๐‘ก๐ฟ๐‘ก๐ปโˆ’1

๐‘ก +๐ป๐‘ก๐ปโˆ’1

๐‘ก ฮ”๐‘ก๐ป๐‘ก๐ปโˆ’1

๐‘ก =๐ป๐‘ก(๐ฟ๐‘ก+๐ปโˆ’1

๐‘ก ฮ”๐‘ก๐ป๐‘ก)๐ปโˆ’1

๐‘ก =๐ป๐‘ก๐ฟโ€ฒ

๐‘ก๐ปโˆ’1

๐‘ก . (4.16) We next upper boundโˆฅ๐ฟโ€ฒ

๐‘กโˆฅ. Notice that

โˆฅฮ”๐‘กโˆฅ =โˆฅ๐‘€๐‘กโˆ’๐‘€ห†๐‘กโˆฅ = โˆฅ๐ด๐‘ก+๐ต๐‘ก๐พ๐‘กโˆ’ ๐ดห†๐‘กโˆ’๐ตห†๐‘ก๐พ๐‘กโˆฅ

โ‰ค โˆฅ๐ด๐‘ก โˆ’๐ดห†๐‘กโˆฅ + โˆฅ๐ต๐‘กโˆ’ ๐ตห†๐‘กโˆฅ โˆฅ๐พ๐‘กโˆฅ

โ‰ค max{โˆฅ๐ด๐‘กโˆ’๐ดห†๐‘กโˆฅ,โˆฅ๐ต๐‘กโˆ’๐ตห†๐‘กโˆฅ}(1+ โˆฅ๐พ๐‘กโˆฅ)

โ‰ค ๐›ฟ

๐›พ

๐œ…(1+๐พ๐‘š ๐‘Ž๐‘ฅ)(1+ โˆฅ๐พ๐‘กโˆฅ) โ‰ค๐›ฟ ๐›พ ๐œ… . Then, we have the following bound on โˆฅ๐ฟโ€ฒ

๐‘กโˆฅ,

โˆฅ๐ฟโ€ฒ

๐‘กโˆฅ=โˆฅ๐ฟ๐‘ก+๐ปโˆ’1

๐‘ก ฮ”๐‘ก๐ป๐‘กโˆฅ โ‰ค โˆฅ๐ฟ๐‘กโˆฅ + โˆฅ๐ปโˆ’1

๐‘ก โˆฅ โˆฅฮ”๐‘กโˆฅ โˆฅ๐ป๐‘กโˆฅ โ‰ค1โˆ’๐›พ+๐›ฟ๐›พ=1โˆ’ (1โˆ’๐›ฟ)๐›พ . (4.17) Define ๐›พโ€ฒ = (1โˆ’ ๐›ฟ)๐›พ and ๐œŒโ€ฒ = 1โˆ’(1โˆ’1โˆ’๐›พ๐›ฟ)๐›พ๐œŒ. We claim that (๐พ๐‘ก)โˆž

๐‘ก=0 is (๐œ…, ๐›พโ€ฒ, ๐œŒโ€ฒ) sequential strongly stable for system (๐ด๐‘ก, ๐ต๐‘ก)โˆž

๐‘ก=0. In the decomposition (4.16), ๐ฟโ€ฒ

๐‘ก

satisfiesโˆฅ๐ฟโ€ฒ

๐‘กโˆฅ โ‰ค 1โˆ’๐›พโ€ฒ(which we have proved in (4.17)), and by definition๐ป๐‘กsatisfies

โˆฅ๐ป๐‘กโˆฅ โ‰ค ๐›ฝ1andโˆฅ๐ปโˆ’1

๐‘ก โˆฅ โ‰ค 1/๐›ฝ2with๐œ…= ๐›ฝ1/๐›ฝ2; โˆฅ๐ปโˆ’1

๐‘ก+1๐ป๐‘กโˆฅ โ‰ค ๐œŒ

1โˆ’๐›พ = ๐œŒโ€ฒ

1โˆ’๐›พโ€ฒ. Therefore, the only conditions we need to verify are (a) 0 < ๐›พโ€ฒ โ‰ค 1 and (b) 0 โ‰ค ๐œŒโ€ฒ < 1.

Condition (a) is trivial since for any๐›ผโˆˆ [0, 1

2), we have๐›ฟ โˆˆ (0,

โˆš 1โˆ’๐›ผโˆ’โˆš

๐›ผ 1โˆ’โˆš

๐›ผ ) โŠ‚ (0,1) and hence๐›พโ€ฒ=(1โˆ’๐›ฟ)๐›พlies in(0,1]. For condition (b), as๐œŒโ€ฒ= 1โˆ’(1โˆ’๐›ฟ)๐›พ

1โˆ’๐›พ ๐œŒ, where๐œŒ=

โˆš๏ธ ๐›ผ

1โˆ’๐›ผ and๐›พ =1โˆ’โˆš

๐›ผ, it follows that๐œŒโ€ฒ <1 is equivalent to the following condition:

๐œŒโ€ฒ= 1โˆ’ (1โˆ’๐›ฟ)๐›พ 1โˆ’๐›พ

๐œŒ = ๐›ฟ+ (1โˆ’๐›ฟ)โˆš ๐›ผ

โˆš 1โˆ’๐›ผ

< 1. (4.18)

Reorganizing the terms we have๐›ฟ <

โˆš 1โˆ’๐›ผโˆ’โˆš

๐›ผ 1โˆ’โˆš

๐›ผ , which is satisfied by our condition on๐›ฟ. Upper boundingโˆฅ๐พ๐‘กโˆฅ. The only thing that remains to be shown is that there exists an upper bound on the controller gain matrix under the conditions stated in the theorem.

Note that ๐พ๐‘ก must satisfy ห†๐‘€๐‘ก = ๐ดห†๐‘ก + ๐ตห†๐‘ก๐พ๐‘ก. On the other hand, by the COCO-LQ formulation,๐พ๐‘กsolves the following problem

min

๐พ๐‘ก

Tr(๐‘… ๐พ๐‘กฮฃ๐‘ฅ ๐‘ฅ๐พโŠค

๐‘ก ) s.t. ห†๐‘€๐‘ก = ๐ดห†๐‘ก+๐ตห†๐‘ก๐พ๐‘ก,

whereฮฃ๐‘ฅ ๐‘ฅ is the solution to the COCO-LQ formulation at time๐‘ก. The Lagrangian of the above optimization is

๐ฟ(๐พ๐‘ก,ฮ“)=Tr(๐‘… ๐พ๐‘กฮฃ๐‘ฅ ๐‘ฅ๐พโŠค

๐‘ก ) +Tr(ฮ“โŠค(๐ดห†๐‘ก+๐ตห†๐‘ก๐พ๐‘กโˆ’๐‘€ห†๐‘ก)). The optimizer๐พ๐‘ก must satisfy the following condition,

โˆ‡๐พ๐‘ก๐ฟ(๐พ๐‘ก,ฮ“)=2๐‘… ๐พ๐‘กฮฃ๐‘ฅ ๐‘ฅ +๐ตห†โŠค

๐‘ก ฮ“ =0. ห†

๐‘€๐‘ก = ๐ดห†๐‘ก +๐ตห†๐‘ก๐พ๐‘ก. Together, we obtain๐พ๐‘ก =๐‘…โˆ’1๐ตห†โŠค

๐‘ก (๐ตห†๐‘ก๐‘…โˆ’1๐ตห†โŠค

๐‘ก )โˆ’1(๐‘€ห†๐‘กโˆ’ ๐ดห†๐‘ก)and

โˆฅ๐พ๐‘กโˆฅ โ‰ค โˆฅ๐‘…โˆ’1โˆฅ โˆฅ๐ตห†โŠค

๐‘ก โˆฅ 1

๐œŽmin(๐‘…โˆ’1)๐œŽmin(๐ตห†๐‘ก๐ตห†โŠค

๐‘ก )(๐œ…(1โˆ’๐›พ) + โˆฅ๐ดห†๐‘กโˆฅ), (4.20) where we have used by the(๐œ…, ๐›พ , ๐œŒ)sequential strong stability of(๐พ๐‘ก)โˆž

๐‘ก=0for system (๐ดห†๐‘ก,๐ตห†๐‘ก)โˆž

๐‘ก=0, we haveโˆฅ๐‘€ห†๐‘กโˆฅ = โˆฅ๐ดห†๐‘ก+ ๐ตห†๐‘ก๐พ๐‘กโˆฅ โ‰ค ๐œ…(1โˆ’ ๐›พ). By our condition, we have

โˆฅ๐ดห†๐‘กโˆฅ โ‰ค ๐œŽยฏ๐ด, โˆฅ๐ตห†๐‘กโˆฅ โ‰ค ๐œŽยฏ๐ต, ห†๐ต๐‘ก๐ตห†โŠค

๐‘ก โชฐ ๐œŽ2

๐ต, and ๐œ…๐‘… = โˆฅ๐‘…โˆฅ โˆฅ๐‘…โˆ’1โˆฅ. Then, we have โˆฅ๐พ๐‘กโˆฅ upper bounded as follows,

โˆฅ๐พ๐‘กโˆฅ โ‰ค ๐œ…๐‘…

ยฏ ๐œŽ๐ต ๐œŽ2

๐ต

(๐œ…(1โˆ’๐›พ) +๐œŽยฏ๐ด).

โ–ก This result highlights the tradeoff between the estimation error and the algorithm performance. If we choose a small๐›ผ, the algorithm can tolerate a larger estimation error (i.e., larger right-hand side of (4.15) can be obtained) but may lead to high control cost due to the tight state covariance constraint. If we choose a larger๐›ผ, the algorithm tolerates smaller estimation errors while its performance improves due to the less strict state covariance constraint.

Dalam dokumen Learning and Control of Dynamical Systems (Halaman 116-128)