LEARNING AND CONTROL IN LINEAR TIME-VARYING SYSTEMS
4.2 Stable Online Control of Linear Time-Varying Systems
4.2.3 Main Result
The previous section highlights that a naive application of LTI control cannot guar- antee stability for LTV systems. We now propose a new approach, COvariance Constrained Online LQ (COCO-LQ) control. Our main technical result shows that COCO-LQ provably guarantees stability in LTV systems when the SDP is feasible.
Then, we discuss how to handle the situation when the SDP is infeasible. Finally, we discuss the effect of model estimation error.
COvariance Constrained Online LQ (COCO-LQ)
The naive approach discussed in Section 4.2.2 seeks to solve the LTI problem at every time step, which is equivalent to solving the SDP in Proposition 4.1 for every (๐ด๐ก, ๐ต๐ก). The reason this method fails is that it only considers cost minimization without explicitly considering stability. The main idea of COCO-LQ is to enforce stability via a state covariance constraint embedded into the SDP framework. The proposed algorithm is stated formally in Algorithm 7.
COCO-LQ solves an SDP (4.3) at each time step that is similar to that in Propo- sition 4.1. The crucial difference is the new constraint (4.3c), which involves parameter๐ผ. Plugging (4.3a) into constraint (4.3c) yields the following:
ฮฃ๐ฅ ๐ฅ โชฏ 1 1โ๐ผ
๐ .
This highlights that constraint (4.3c) can be interpreted as an upper bound on the state covariance matrixฮฃ๐ฅ ๐ฅ. When๐ผ =0, the controller essentially cancels out the dynamics, without taking into account the cost of doing so. This ensures stability but can lead to large cost. At another extreme, when๐ผโ1, the SDP solved at each time step is the same as for the LTI setting, and so COCO-LQ matches the naive
Algorithm 7COCO-LQ: COvariance Constrained Online LQ
1: Input: ๐ผ โ [0,1),๐ , ๐ , ๐ โป 0
2: for๐ก =1,2, ...do
3: Receive state๐ฅ๐ก, and system parameter ๐ด๐ก, ๐ต๐ก
4: Solve the following SDP forฮฃ๐ก โR(๐+๐)ร(๐+๐): minimize
๐ 0
0 ๐
โขฮฃ subject to ฮฃ๐ฅ ๐ฅ =
๐ด๐ก ๐ต๐ก ฮฃ
๐ด๐ก ๐ต๐กโค
+๐ (4.3a)
ฮฃ โชฐ 0 (4.3b)
๐ด๐ก ๐ต๐ก ฮฃ
๐ด๐ก ๐ต๐กโค
โชฏ ๐ผฮฃ๐ฅ ๐ฅ (4.3c)
5: Compute the control gain ๐พ๐ก = ฮฃโค๐ฅ ๐ขฮฃ๐ฅ ๐ฅโ1, whereฮฃ๐ก =
ฮฃ๐ฅ ๐ฅ ฮฃ๐ฅ ๐ข
ฮฃ๐ฅ ๐ข
โค ฮฃ๐ข๐ข
6: Execute control action๐ข๐ก =๐พ๐ก๐ฅ๐ก
approach. Thus,๐ผtrades off between stability and cost. In the following section, we show that this novel state covariance constraint promotes sequential strong stability [61], which in turn guarantees ISS with a proper choice of๐ผ.
Stability
We now state our main technical result, which provides a formal stability guarantee for COCO-LQ.
Theorem 4.1. Let 0 โค ๐ผ < 1/2, and suppose (4.3) is feasible for all ๐ก, then the resulting dynamical system satisfies ISS in the sense that for any disturbance sequence{๐ค๐ก}โ
๐ก=0and for any๐ก โฅ ๐ก0,
โฅ๐ฅ๐กโฅ โค ๐๐กโ๐ก0โฅ๐ฅ๐ก
0โฅ + ๐ ๐ 1โ๐
sup
๐ก0โค๐ <๐ก
โฅ๐ค๐โฅ for ๐ = โ๏ธ ๐ผ
1โ๐ผ โ [0,1) and ๐ = โ๐ ๐
1โ๐ผ, where ๐ ๐ = โฅ๐โฅ โฅ๐โ1โฅ is the condition number of๐.
The key intuition underpinning this result is that the additional state covariance constraint (4.3c) implicitly enforces sequential strong stability [61], which in turn ensures ISS. These results are formally stated in Lemma 4.1 and Lemma 4.2 respec- tively. First, we formally define sequential strong stability.
Definition 4.2(Sequential Strong Stability). A sequence of policies๐พ1, ๐พ2, ...,such that ๐ข๐ก = ๐พ๐ก๐ฅ๐ก is (๐ , ๐พ , ๐)-sequential strongly stable (for ๐ > 0, 0 < ๐พ โค 1 and
0 โค ๐ < 1) if there exist matrices๐ป1, ๐ป2, ..., and ๐ฟ1, ๐ฟ2..., such that ๐ด๐ก+๐ต๐ก๐พ๐ก = ๐ป๐ก๐ฟ๐ก๐ปโ1
๐ก for all ๐ก, with the following properties: (a) โฅ๐ฟ๐กโฅ โค 1โ๐พ; (b) โฅ๐ป๐กโฅ โค ๐ฝ1 and||๐ปโ1
๐ก || โค 1/๐ฝ2with๐ =๐ฝ1/๐ฝ2; (c)โฅ๐ปโ1
๐ก+1๐ป๐กโฅ โค ๐
1โ๐พ.
With this definition in place, we present the connection between the condition in (4.3c) and sequential strong stability.
Lemma 4.1. Under the conditions in Theorem 4.1, the policies designed by COCO- LQ are (๐ , ๐พ , ๐)-sequential strongly stability for ๐ = โ๐ ๐
1โ๐ผ, ๐พ =1โโ
๐ผ, ๐ =โ๏ธ ๐ผ 1โ๐ผ
where๐ ๐ = โฅ๐โฅ โฅ๐โ1โฅ.
Proof. To prove Lemma 4.1, we first show that the optimal solution to the SDP in (4.3) has a specific low rank structure. In particular, we claim that if (4.3) has a minimizerฮฃโ, then there exists a๐พ โR๐ร๐ s.t. ฮฃโcan be written as
ฮฃโ =
"
ฮฃ๐ฅ ๐ฅโ ฮฃโ๐ฅ ๐ข (ฮฃ๐ฅ ๐ขโ )โค ฮฃโ๐ข๐ข
#
=
"
ฮฃ๐ฅ ๐ฅโ ฮฃโ๐ฅ ๐ฅ๐พโค ๐พฮฃโ๐ฅ ๐ฅ ๐พฮฃโ๐ฅ ๐ฅ๐พโค
#
. (4.4)
To see this, first note that ฮฃโ๐ฅ ๐ฅ โชฐ ๐ โป 0, and therefore we can simply define ๐พ =(ฮฃ๐ฅ ๐ขโ )โค(ฮฃ๐ฅ ๐ฅโ )โ1, and the only thing we need to show isฮฃโ๐ข๐ข =๐พฮฃ๐ฅ ๐ฅโ ๐พโค. Suppose this is not true, then sinceฮฃโ โชฐ 0, there must exist ๐ท โ 0, ๐ท โชฐ 0 s.t.
ฮฃโ๐ข๐ข =๐พฮฃ๐ฅ ๐ฅโ ๐พโค +๐ท . (4.5) Then, by (4.3a), we also have,
ฮฃโ๐ฅ ๐ฅ = (๐ด๐ก+๐ต๐ก๐พ)ฮฃโ๐ฅ ๐ฅ(๐ด๐ก+๐ต๐ก๐พ)โค+๐ต๐ก๐ท ๐ตโค
๐ก +๐ . (4.6)
Viewing the above as a Lyapunov equation in terms ofฮฃโ๐ฅ ๐ฅ, and since๐ต๐ก๐ท ๐ตโค
๐ก +๐ โป 0 andฮฃโ๐ฅ ๐ฅ โป 0, we get ๐ด๐ก+๐ต๐ก๐พ is a stable matrix. Next, since ๐ด๐ก+๐ต๐ก๐พis stable, we can construct หฮฃ๐ฅ ๐ฅ to be the unique solution to the following Lyapunov equation,
ฮฃห๐ฅ ๐ฅ = (๐ด๐ก+๐ต๐ก๐พ)ฮฃห๐ฅ ๐ฅ(๐ด๐ก+๐ต๐ก๐พ)โค+๐ . (4.7) We further define,
ฮฃ =ห
"
ฮฃห๐ฅ ๐ฅ ฮฃห๐ฅ ๐ฅ๐พโค ๐พฮฃห๐ฅ ๐ฅ ๐พฮฃห๐ฅ ๐ฅ๐พโค
# ,
and we claim that หฮฃis a feasible solution to (4.3). Clearly หฮฃis positive semi-definite, and further, it satisfies (4.3a). We now check that (4.3c) is met below.
h ๐ด๐ก ๐ต๐ก
iฮฃห h ๐ด๐ก ๐ต๐ก
iโค
โ๐ผฮฃห๐ฅ ๐ฅ
= (๐ด๐ก+๐ต๐ก๐พ)ฮฃห๐ฅ ๐ฅ(๐ด๐ก+๐ต๐ก๐พ)โคโ๐ผฮฃห๐ฅ ๐ฅ
= (๐ด๐ก+๐ต๐ก๐พ)ฮฃโ๐ฅ ๐ฅ(๐ด๐ก+๐ต๐ก๐พ)โคโ๐ผฮฃ๐ฅ ๐ฅโ + (๐ด๐ก+๐ต๐ก๐พ) (ฮฃห๐ฅ ๐ฅโฮฃ๐ฅ ๐ฅโ ) (๐ด๐ก+๐ต๐ก๐พ)โคโ๐ผ(ฮฃห๐ฅ ๐ฅโฮฃโ๐ฅ ๐ฅ)
= [๐ด๐ก, ๐ต๐ก]ฮฃโ[๐ด๐ก, ๐ต๐ก]โคโ๐ต๐ก๐ท ๐ตโค
๐กโ๐ผฮฃโ๐ฅ ๐ฅ+ (๐ด๐ก+๐ต๐ก๐พ) (ฮฃห๐ฅ ๐ฅโฮฃโ๐ฅ ๐ฅ) (๐ด๐ก+๐ต๐ก๐พ)โคโ๐ผ(ฮฃห๐ฅ ๐ฅโฮฃ๐ฅ ๐ฅโ )
โชฏ โ๐ต๐ก๐ท ๐ตโค
๐ก + (๐ด๐ก+๐ต๐ก๐พ) (ฮฃห๐ฅ ๐ฅโฮฃ๐ฅ ๐ฅโ ) (๐ด๐ก+๐ต๐ก๐พ)โค โ๐ผ(ฮฃห๐ฅ ๐ฅ โฮฃโ๐ฅ ๐ฅ), (4.8) where (4.8) is due toฮฃโ must satisfy (4.3c). Subtracting (4.6) from (4.7), we have,
ฮฃห๐ฅ ๐ฅ โฮฃโ๐ฅ ๐ฅ =(๐ด๐ก+๐ต๐ก๐พ) (ฮฃห๐ฅ ๐ฅโฮฃ๐ฅ ๐ฅโ ) (๐ด๐ก+๐ต๐ก๐พ)โค โ๐ต๐ก๐ท ๐ตโค
๐ก . (4.9) Plugging (4.9) into (4.8), we have,
h ๐ด๐ก ๐ต๐ก
iฮฃห h
๐ด๐ก ๐ต๐ก iโค
โ๐ผฮฃห๐ฅ ๐ฅ โชฏ (1โ๐ผ) (ฮฃห๐ฅ ๐ฅ โฮฃโ๐ฅ ๐ฅ).
Therefore, to check (4.3c) it remains to show หฮฃ๐ฅ ๐ฅ โฮฃ๐ฅ ๐ฅโ โชฏ 0. To see this, we view (4.9) as a Lyapunov equation in terms of หฮฃ๐ฅ ๐ฅ โฮฃ๐ฅ ๐ฅโ , and as ๐ด๐ก+๐ต๐ก๐พ is stable, and ๐ต๐ก๐ท ๐ตโค
๐ก โชฐ 0, we have หฮฃ๐ฅ ๐ฅ โฮฃ๐ฅ ๐ฅโ โชฏ 0. As a result, (4.3c) holds and หฮฃ๐ฅ ๐ฅ is indeed a feasible solution to (4.3).
Further, we can show หฮฃachieves a strictly lower cost thanฮฃโ. To see this, note we have already shown หฮฃ๐ฅ ๐ฅ โชฏ ฮฃโ๐ฅ ๐ฅ, which also implies หฮฃ๐ข๐ข = ๐พฮฃห๐ฅ ๐ฅ๐พโค โชฏ ๐พฮฃ๐ฅ ๐ฅโ ๐พโค = ฮฃ๐ข๐ขโ โ๐ทwith๐ท โชฐ 0,๐ท โ 0. Coupled this with the fact that๐ , ๐ are strictly positive definite, we deduce that หฮฃmust achieve a lower cost. So we get a contradiction, and verified that (4.4) holds.
Now that we established the decomposition in (4.4) for the solution of (4.3), we proceed with the proof of Lemma 4.1. Let ฮฃ๐ก be the solution to the SDP (4.3) at step๐ก. Then,ฮฃ๐ก can be rewritten as,
ฮฃ๐ก =
"
ฮฃ๐ก ,๐ฅ ๐ฅ ฮฃ๐ก ,๐ฅ ๐ฅ๐พโค
๐ก
๐พ๐กฮฃ๐ก ,๐ฅ ๐ฅ ๐พ๐กฮฃ๐ก ,๐ฅ ๐ฅ๐พโค
๐ก
# ,
for someฮฃ๐ก ,๐ฅ ๐ฅ โป 0, and๐พ๐กis the linear controller at step๐ก. With this, we can re-write the left side of constraint (4.3c) as follows,
h ๐ด๐ก ๐ต๐ก
i ฮฃ๐ก
h ๐ด๐ก ๐ต๐ก
iโค
= h ๐ด๐ก ๐ต๐ก
i
"
ฮฃ๐ก ,๐ฅ ๐ฅ ฮฃ๐ก ,๐ฅ ๐ฅ๐พโค
๐ก
๐พ๐กฮฃ๐ก ,๐ฅ ๐ฅ ๐พ๐กฮฃ๐ก ,๐ฅ ๐ฅ๐พโค
๐ก
# "
๐ดโค
๐ก
๐ตโค
๐ก
#
= ๐ด๐กฮฃ๐ก ,๐ฅ ๐ฅ๐ดโค
๐ก +๐ด๐ก๐พ๐กฮฃ๐ก ,๐ฅ ๐ฅ๐ตโค
๐ก +๐ต๐กฮฃ๐ก ,๐ฅ ๐ฅ๐พโค
๐ก ๐ดโค
๐ก +๐ต๐ก๐พ๐กฮฃ๐ก ,๐ฅ ๐ฅ๐พโค
๐ก ๐ตโค
๐ก
= (๐ด๐ก+๐ต๐ก๐พ๐ก)ฮฃ๐ก ,๐ฅ ๐ฅ(๐ด๐ก+๐ต๐ก๐พ๐ก)โค.
As a result, (4.3c) can be equivalently expressed as,
(๐ด๐ก+๐ต๐ก๐พ๐ก)ฮฃ๐ก ,๐ฅ ๐ฅ(๐ด๐ก+๐ต๐ก๐พ๐ก)โค โชฏ ๐ผฮฃ๐ก ,๐ฅ ๐ฅ. (4.10) Left and right multiplying the above byฮฃโ1/2๐ก ,๐ฅ ๐ฅ , we get
ฮฃโ1/2๐ก ,๐ฅ ๐ฅ (๐ด๐ก +๐ต๐ก๐พ๐ก)ฮฃ1/2๐ก ,๐ฅ ๐ฅฮฃ๐ก ,๐ฅ ๐ฅ1/2(๐ด๐ก+๐ต๐ก๐พ๐ก)โคฮฃ๐ก ,๐ฅ ๐ฅโ1/2โชฏ ๐ผ ๐ผ . (4.11) Let๐ป๐ก = ฮฃ1/2๐ก ,๐ฅ ๐ฅ,๐ฟ๐ก = ฮฃ๐ก ,๐ฅ ๐ฅโ1/2(๐ด๐ก+๐ต๐ก๐พ๐ก)ฮฃ1/2๐ก ,๐ฅ ๐ฅ. We get the following decomposition,
(๐ด๐ก +๐ต๐ก๐พ๐ก) =๐ป๐ก๐ฟ๐ก๐ปโ1
๐ก . (4.12)
We now show that this decomposition yields the desired sequential stability property.
To do so, we need to check the three conditions in Definition 4.2.
To check condition (a) in Definition 4.2, note that (4.11) provides an upper bound onโฅ๐ฟ๐กโฅ, that is ๐ฟ๐ก๐ฟโค
๐ก โชฏ ๐ผ ๐ผ. Therefore, โฅ๐ฟ๐กโฅ โคโ
๐ผ=1โ๐พ with๐พ =1โโ ๐ผ. To check condition (b) which is an upper bound on โฅ๐ป๐กโฅ and โฅ๐ปโ1
๐ก โฅ, we recall constraint (4.3a),
h ๐ด๐ก ๐ต๐ก
i ฮฃ๐ก
h ๐ด๐ก ๐ต๐ก
iโค
= ฮฃ๐ก ,๐ฅ ๐ฅโ๐ .
As the left-hand side of the above is positive semi-definite, we must haveฮฃ๐ก ,๐ฅ ๐ฅ โชฐ๐. Further, plugging (4.3a) into (4.3c), we can get, ฮฃ๐ก ,๐ฅ ๐ฅ โ๐ โชฏ ๐ผฮฃ๐ก ,๐ฅ ๐ฅ. These can be equivalently written as
๐ โชฏ ฮฃ๐ก ,๐ฅ ๐ฅ โชฏ ๐ 1โ๐ผ
. (4.13)
Therefore, โฅ๐ป๐กโฅ = โฅฮฃ1/2๐ก ,๐ฅ ๐ฅโฅ โค โฅโ๐1/2โฅ
1โ๐ผ , โฅ๐ปโ1
๐ก โฅ = โฅฮฃ๐ก ,๐ฅ ๐ฅโ1/2โฅ โค โฅ๐โ1/2โฅ, โฅ๐ป๐กโฅ โฅ๐ปโ1
๐ก โฅ โค
โฅ๐1/2โฅ โฅ๐โ1/2โฅ
โ 1โ๐ผ
= โ๐ ๐
1โ๐ผ. Therefore, condition (b) holds with๐ = โ๐ ๐
1โ๐ผ. Lastly, we check condition (c), which is an upper bound onโฅ๐ปโ1
๐ก+1๐ป๐กโฅ. Note that, ๐ปโ1
๐ก+1๐ป๐ก๐ปโค
๐ก (๐ปโ1
๐ก+1)โค=๐ปโ1
๐ก+1ฮฃ๐ก ,๐ฅ ๐ฅ(๐ปโ1
๐ก+1)โคโชฏ 1 1โ๐ผ
(ฮฃ๐ก+1,๐ฅ ๐ฅ)โ1/2๐(ฮฃ๐ก+1,๐ฅ ๐ฅ)โ1/2โชฏ 1 1โ๐ผ
๐ผ , where the two inequalities in the above derivations are due to (4.13). As a result, we have โฅ๐ปโ1
๐ก+1๐ป๐กโฅ โค โ1
1โ๐ผ = 1โ๐๐พ with ๐ =
โ
โ ๐ผ
1โ๐ผ. As ๐ผ < 1/2, we have ๐ < 1.
As such, condition (c) holds. In summary, the (๐ , ๐พ , ๐) strong sequential stability holds, with๐ = โ๐ ๐
1โ๐ผ,๐พ =1โโ
๐ผ, and๐ =โ๏ธ ๐ผ
1โ๐ผ, which concludes the proof. โก Now that we established the sequential strong stability of the policies obtained via COCO-LQ, we require relating this fact to ISS. The following provides the remaining piece in the proof of Theorem 4.1.
Lemma 4.2. Suppose a sequence of policies ๐พ1, . . . , ๐พ๐ก is (๐ , ๐พ , ๐)-sequential strongly stable. Then, the closed-loop system obtained via the sequence of these policies is input-to-state stable in the sense that for any๐ก โฅ ๐ก0โฅ 1,
โฅ๐ฅ๐กโฅ โค ๐ ๐๐กโ๐ก0โฅ๐ฅ๐ก
0โฅ + ๐ ๐ 1โ๐
max
๐ก0โค๐ <๐กโฅ๐ค๐ โฅ.
Proof. For notational simplicity, we only prove the case with๐ก0=1; the general case follows similarly, at the cost of heavier notations. Let๐ฅ1, ๐ฅ2, ...be a sequence of states starting from initial state๐ฅ1, and generated by the dynamics๐ฅ๐ก+1 =๐ด๐ก๐ฅ๐ก+๐ต๐ก๐ข๐ก+๐ค๐ก = (๐ด๐ก+๐ต๐ก๐พ๐ก)๐ฅ๐ก+๐ค๐ก. Hence,๐ฅ๐ก =๐1๐ฅ1+ร๐กโ1
๐ =1๐๐ +1๐ค๐ ,where
๐๐ก =๐ผ;๐๐ =๐๐ +1(๐ด๐ +๐ต๐ ๐พ๐ )= (๐ด๐กโ1+๐ต๐กโ1๐พ๐กโ1) ยท ยท ยท (๐ด๐ +๐ต๐ ๐พ๐ ). By the sequential strong stability, there exist matrices ๐ป1, ๐ป2, ... and ๐ฟ1, ๐ฟ2, ...
such that ๐ด๐ + ๐ต๐๐พ๐ = ๐ป๐๐ฟ๐๐ปโ1
๐ , and ๐ป๐, ๐ฟ๐ satisfy the properties specified in Definition 4.2. Thus we have for all 1 โค ๐ < ๐ก,
โฅ๐๐ โฅ = โฅ๐ป๐กโ1๐ฟ๐กโ1๐ปโ1
๐กโ1๐ป๐กโ2๐ฟ๐กโ1๐ปโ1
๐กโ2ยท ยท ยท๐ป๐ ๐ฟ๐ ๐ปโ1
๐ โฅ
โค โฅ๐ป๐กโ1โฅ
๐กโ1
ร
๐=๐
โฅ๐ฟ๐โฅ
! ๐กโ2 ร
๐=๐
โฅ๐ปโ1
๐+1๐ป๐โฅ
!
โฅ๐ปโ1
๐ โฅ,
โค ๐ฝ1(1โ๐พ)๐กโ๐ ( ๐ 1โ๐พ
)๐กโ๐ โ1(1/๐ฝ2) โค ๐ (1โ๐พ) ๐
๐๐กโ๐ . As๐ โฅ 1, the same holds for๐๐ก. Thus, we have,
โฅ๐ฅ๐กโฅ โค โฅ๐1โฅ โฅ๐ฅ1โฅ +
๐กโ1
โ๏ธ
๐ =1
โฅ๐๐ +1โฅ โฅ๐ค๐ โฅ
โค ๐ ๐๐กโ1โฅ๐ฅ1โฅ +๐
๐กโ1
โ๏ธ
๐ =1
๐๐กโ๐ โ1โฅ๐ค๐ โฅ โค ๐ ๐๐กโ1โฅ๐ฅ1โฅ + ๐ ๐ 1โ๐
max
1โค๐ <๐กโฅ๐ค๐ โฅ.
โก Proof of Theorem 4.1. Combining Lemma 4.1 with Lemma 4.2 gives the advertised
result of stability for COCO-LQ. โก
A critical assumption in Theorem 4.1 is that (4.3) is feasible for 0โค๐ผ <1/2. The following result provides a condition when the problem is always feasible.
Lemma 4.3. When๐ต๐กis full row rank, then the SDP(4.3)is always feasible.
Proof. We prove this result by explicitly constructing a feasible solution for (4.3).
As๐ต๐ก has full row rank,๐ต๐ก๐ตโค
๐ก is invertible. Let ฮฃ0=
"
๐ โ๐ ๐ดโค
๐ก (๐ต๐ก๐ตโค
๐ก )โ1๐ต๐ก
โ๐ตโค
๐ก (๐ต๐ก๐ตโค
๐ก )โ1๐ด๐ก๐ ๐ตโค
๐ก (๐ต๐ก๐ตโค
๐ก )โ1๐ด๐ก๐ ๐ดโค
๐ก (๐ต๐ก๐ตโค
๐ก )โ1๐ต๐ก
#
=
"
๐ผ
โ๐ตโค
๐ก (๐ต๐ก๐ตโค
๐ก )โ1๐ด๐ก
# ๐
"
๐ผ
โ๐ตโค
๐ก (๐ต๐ก๐ตโค
๐ก )โ1๐ด๐ก
#โค
.
It suffices to show thatฮฃ0satisfies (4.3a), (4.3b), and (4.3c). Notice that [๐ด๐ก, ๐ต๐ก]ฮฃ0[๐ด๐ก, ๐ต๐ก]โค = (๐ด๐กโ๐ต๐ก๐ตโค
๐ก (๐ต๐ก๐ตโค
๐ก )โ1๐ด๐ก)๐(๐ด๐กโ๐ต๐ก๐ตโค
๐ก (๐ต๐ก๐ตโค
๐ก )โ1๐ด๐ก)โค =0. Therefore, constraint (4.3a) is equivalent toฮฃ๐ฅ ๐ฅ =๐, which holds by the construc- tion ofฮฃ0. Next, as๐ โป 0,ฮฃ0 is clearly positive semi-definitive and (4.3b) holds.
Finally, note the left-hand side of (4.3c) is 0, and the right-hand side of (4.3c) is
positive semi-definite. As a result, (4.3c) holds. โก
Note that having ๐ต๐ก full row rank is a sufficient but not necessary condition for the feasibility of (4.3) of COCO-LQ. When ๐ต๐ก is not full row rank, the feasibility as- sumption may still hold, and therefore our assumption is weaker than the invertibility assumption used in the literature, e.g., Lai [155]. More broadly, in Theorem 4.1, ๐ผ <0.5 is a sufficient condition for stability. For๐ผโฅ 0.5, stability may still hold for some problem instances(๐ด๐ก, ๐ต๐ก)as will be shown in the simulations in Section 4.2.4.
How to provide a more refined instance-dependent threshold on๐ผis an interesting future direction.
Infeasibility and the Role of Predictions
We now turn our attention to the case when the SDP given in (4.3) is infeasible. In this case it is necessary for the controller to use additional information in order to stabilize the system. In particular, the following example shows that when๐ต๐ก is not full row rank, for any (deterministic) online control algorithm that has causal access to system matrices, there exists a future sequence of (๐ด๐ก, ๐ต๐ก) such that the system state will blow up from a given initial state.
Example 4.2. Set ๐ = 2, ๐ = 1, i.e., ๐ด๐ก is 2-by-2and ๐ต๐ก is 2-by-1. For all ๐ก, we set๐ต๐ก = [1,0]โค, and for even time indices ๐ก =2๐, we set ๐ด๐ก = ๐ผ. Further, assume that ๐ฅ0 = [1,1]โค and ๐ค๐ก = 0. Our construction is based on induction. Suppose
Algorithm 8COCO-LQ with Predictions
1: Input: ๐ผ โ [0,1),๐ , ๐ , ๐ โป 0
2: for๐ก =1,2, ...do
3: Receive state๐ฅ๐ก
4: if ๐ก โก1 (mod๐ป)then
5: Receive system parameters (๐ด๐ก, ๐ต๐ก), . . . ,(๐ด๐ก+๐ปโ1, ๐ต๐ก+๐ปโ1)
6: Solve (4.3) by replacing ๐ with ห๐ =I๐ป โ ๐ , ๐ด๐ก with ห๐ด๐กโ[๐ด๐ก+๐ปโ1ยท ยท ยท๐ด๐ก], and ๐ต๐ก with ๐ตห๐ก โ [๐ต๐ก+๐ปโ1, ๐ด๐ก+๐ปโ1๐ต๐ก+๐ปโ2, . . . , ๐ด๐ก+๐ปโ1ยท ยท ยท๐ด๐ก+1๐ต๐ก] to recoverฮฃ๐ก
7: ComputeKt =[๐พโค
๐ก , . . . , ๐พโค
๐ก+๐ปโ1]โค= ฮฃโค๐ฅ ๐ขฮฃ๐ฅ ๐ฅโ1, whereฮฃ๐ก=
ฮฃ๐ฅ ๐ฅ ฮฃ๐ฅ ๐ข
ฮฃ๐ฅ ๐ข
โค ฮฃ๐ข๐ข
8: Execute control action๐ข๐ก =๐พ๐ก๐ฅ๐ก
at an even index ๐ก = 2๐, ๐ฅ2๐ = [๐ฅ2๐ ,1, ๐ฅ2๐ ,2]โค with ๐ฅ2๐ ,2 โฅ 0. Then, after ๐ข2๐ is determined, we set
๐ด2๐+1=
"
1 0 ๐ 2
#
with๐ =0.5max(||๐ฅ2๐ฅ๐ ,2|
2๐ ,1|,1)sign(๐ข2๐)Then, after(๐ด2๐+1, ๐ต2๐+1)is revealed to the agent, it takes an action๐ข2๐+1, resulting in๐ฅ2๐+2= ๐ด2๐+1๐ด2๐๐ฅ2๐+๐ด2๐+1๐ต2๐๐ข2๐+๐ต2๐+1๐ข2๐+1. Since ๐ด2๐ = ๐ผ and๐ต2๐ = ๐ต2๐+1 = [1,0]โค, the second coordinate of๐ฅ2๐+2can then be written as ๐ฅ2๐+2,2 = ๐ ๐ฅ2๐ ,1 +2๐ฅ2๐ ,2+ ๐ ๐ข2๐. By the way ๐ is chosen, we have ๐ ๐ข2๐ โฅ 0, and |๐ ๐ฅ2๐ ,1| โค 0.5|๐ฅ2๐ ,2|, and therefore, ๐ฅ2๐+2,2 โฅ 1.5๐ฅ2๐ ,2. Since the system starts with๐ฅ0 = [1,1]โค, applying the argument above recursively, we have for any๐,๐ฅ2๐ ,2 โฅ 1.5๐, i.e., the state blows up.
In this section, we show that using (๐ด๐ก, ๐ต๐ก) together with short-term predictions of future system matrices is enough to stabilize the system under standard controlla- bility assumptions. Specifically, we extend COCO-LQ to include future๐ป steps of predictions in Algorithm 8. The key idea is to rewrite the dynamics as
๐ฅ๐ก+๐ป = ๐ดห๐ก๐ฅ๐ก+๐ตห๐ก๐ขยฏ๐ก+ [๐ผ , ๐ด๐ก+๐ปโ1,ยท ยท ยท , ๐ด๐ก+๐ปโ1... ๐ด๐ก+1]๐คยฏ๐ก, (4.14) where ห๐ด๐ก := ๐ด๐ก+๐ปโ1ยท ยท ยท๐ด๐ก, ห๐ต๐ก := [๐ต๐ก+๐ปโ1, ๐ด๐ก+๐ปโ1๐ต๐ก+๐ปโ2, . . . , ๐ด๐ก+๐ปโ1ยท ยท ยท๐ด๐ก+1๐ต๐ก],
ยฏ
๐ข๐ก:=[๐ขโค
๐ก+๐ปโ1, ๐ขโค
๐ก+๐ปโ2, . . . , ๐ขโค
๐ก ]โค and ยฏ๐ค๐ก:=[๐คโค
๐ก+๐ปโ1, ๐คโค
๐ก+๐ปโ2, . . . , ๐คโค
๐ก ]โค. Notice that
ห
๐ต๐ก is in the form of the controllability matrix for this LTV system. When๐ป is long enough such that ห๐ต๐กis full row rank, i.e., LTV system is๐ป-controllable, we can use Algorithm 8 on ห๐ด๐กand ห๐ต๐กand avoid the infeasibility issue, and we have the stability guarantee.
Theorem 4.2. Suppose for each ๐ก, matrix ๐ตห๐ก defined above satisfies ๐ตห๐ก๐ตหโค
๐ก โชฐ ๐ ๐ผ
for some ๐ > 0, and โฅ๐ด๐กโฅ โค ๐,โฅ๐ต๐กโฅ โค ๐ for some ๐, ๐ > 0. Then, the SDP in Algorithm 8 is always feasible. Further, when ๐ผ < 1/2, the closed-loop system is ISS for any๐ก,
โฅ๐ฅ๐กโฅ โค ๐ โฒ
๐ด๐
๐ก
๐ปโ1โฅ๐ฅ1โฅ +๐ โฒ
๐ด๐ ๐ด๐ max(1, ๐ 1โ๐
) sup
1โค๐ <๐ก
โฅ๐ค๐ โฅ, where the same as Theorem 4.1, ๐ = โ๏ธ ๐ผ
1โ๐ผ โ [0,1) and ๐ = โ๐ ๐
1โ๐ผ with ๐ ๐ =
โฅ๐โฅ โฅ๐โ1โฅbeing the condition number of๐; further,๐ ๐ด=1+๐+. . .+๐๐ปโ1, and ๐ โฒ
๐ด =๐๐ปโ1+๐2๐ 2
๐ด
๐ ๐ ๐ +๐
๐ป
๐ with๐ ๐ being the condition number of๐ .
Proof. The feasibility directly follows Lemma 4.3. For stability, using the dynamics given in (4.14), from Theorem 4.1, we have,โ๐ โฅ 1
โฅ๐ฅ๐ ๐ป+1โฅ โค ๐๐โฅ๐ฅ1โฅ + ๐ ๐ 1โ๐
sup
0โค๐ < ๐
โฅ๐คห๐ ๐ป+1โฅ. Note that, for any๐,
โฅ๐คห๐ ๐ป+1โฅ โค
๐ป
โ๏ธ
โ=1
โฅ๐ด๐ ๐ป+๐ปโฅ ยท ยท ยท โฅ๐ด๐ ๐ป+โ+1โฅ โฅ๐ค๐ ๐ป+โโฅ
โค (1+๐+. . .+๐๐ปโ1) sup
๐ ๐ป+1โค๐ โค(๐+1)๐ป
โฅ๐ค๐ โฅ := ๐ ๐ด sup
๐ ๐ป+1โค๐ โค(๐+1)๐ป
โฅ๐ค๐ โฅ. These show that,
โฅ๐ฅ๐ ๐ป+1โฅ โค ๐๐โฅ๐ฅ1โฅ + ๐ ๐ด๐ ๐ 1โ๐ sup
1โค๐ โค๐ ๐ป
โฅ๐ค๐ โฅ.
The above already shows the boundedness of states for time indexes๐ก =1 mod (๐ป). To show the boundedness of the states for all ๐ก, we need to consider the effect of control inputs in all ๐ป sequences. Note that Theorem 4.1 shows that the policies ๐พ๐ ๐ป+1is (๐ , ๐พ , ๐)-sequential strongly stable, with๐พ =1โโ
๐ผand๐ = โ๐ ๐
1โ๐ผ. This impliesโฅ๐ดห๐ ๐ป+1+๐ตห๐ ๐ป+1๐พ๐ ๐ป+1โฅ โค (1โ๐พ)๐ , showing
โฅ๐ตห๐ ๐ป+1๐พ๐ ๐ป+1โฅ โค (1โ๐พ)๐ +๐๐ป, which further leads to
โฅ๐พ๐ ๐ป+1โฅ โค ๐พmax= ๐ ๐
๐(1+๐+. . .+๐๐ปโ1) ๐
( (1โ๐พ)๐ +๐๐ป).
Therefore, we have
โฅ๐ฅ๐ ๐ป+โโฅ โค โฅ๐ด๐ ๐ป+โโ1ยท ยท ยท๐ด๐ ๐ป+1โฅ โฅ๐ฅ๐ ๐ป+1โฅ
+ โฅ h
๐ต๐ ๐ป+โโ1, ๐ด๐ ๐ป+โโ1๐ต๐ ๐ป+โโ2, . . . , ๐ด๐ ๐ป+โโ1ยท ยท ยท๐ด๐ ๐ป+2๐ต๐ ๐ป+1 i
๏ฃฎ
๏ฃฏ
๏ฃฏ
๏ฃฏ
๏ฃฏ
๏ฃฏ
๏ฃฏ
๏ฃฐ
๐ข๐ ๐ป+โโ1 .. . ๐ข๐ ๐ป+1
๏ฃน
๏ฃบ
๏ฃบ
๏ฃบ
๏ฃบ
๏ฃบ
๏ฃบ
๏ฃป
โฅ
+ โฅ [๐ผ , ๐ด๐ก+๐ปโ1,ยท ยท ยท , ๐ด๐ ๐ป+โโ1... ๐ด๐ ๐ป+1]
๏ฃฎ
๏ฃฏ
๏ฃฏ
๏ฃฏ
๏ฃฏ
๏ฃฏ
๏ฃฏ
๏ฃฐ
๐ค๐ ๐ป+โโ1 .. . ๐ค๐ ๐ป+1
๏ฃน
๏ฃบ
๏ฃบ
๏ฃบ
๏ฃบ
๏ฃบ
๏ฃบ
๏ฃป
โฅ
< ๐ โฒ
๐ดโฅ๐ฅ๐ ๐ป+1โฅ +๐ ๐ด sup
๐ ๐ป+1โค๐ < ๐ ๐ป+โ
โฅ๐ค๐ โฅ.
For any๐ก, let๐ be such that๐ก =๐ ๐ป+โwith 1 โค โ โค ๐ป. Then, we have,
โฅ๐ฅ๐กโฅ โค ๐ โฒ
๐ดโฅ๐ฅ๐ ๐ป+1โฅ +๐ ๐ด sup
๐ ๐ป+1โค๐ <๐ก
โฅ๐ค๐ โฅ
โค ๐ โฒ
๐ด๐๐โฅ๐ฅ1โฅ +๐ โฒ
๐ด
๐ ๐ด๐ ๐ 1โ ๐
sup
1โค๐ โค๐ ๐ป
โฅ๐ค๐ โฅ +๐ ๐ด sup
๐ ๐ป+1โค๐ <๐ก
โฅ๐ค๐ โฅ
โค ๐ โฒ
๐ด๐
๐ก
๐ปโ1โฅ๐ฅ1โฅ +๐ โฒ
๐ด๐ ๐ด๐ max(1, ๐ 1โ ๐
) sup
1โค๐ โค๐ก
โฅ๐ค๐ โฅ.
โก
Estimation Error
In both Algorithm 7 and Algorithm 8, the exact knowledge of state-transition ma- trices (๐ด๐ก, ๐ต๐ก) or the extended state-transition matrices (๐ดห๐ก,๐ตห๐ก) are needed when deriving the control actions. In this section, we show that COCO-LQ can still obtain a stabilizing controller in the case where only approximations are known, if the estimation error is controlled. Our main result is the following.
Theorem 4.3. Let (๐ดห๐ก, ๐ตห๐ก) be an estimate of (๐ด๐ก, ๐ต๐ก). Given ๐ผ โ [0,1
2), let ๐ =โ๏ธ ๐ผ
1โ๐ผ, ๐ = โฅ๐โฅ โฅ๐โ โ1โฅ
1โ๐ผ and๐พ = 1โโ
๐ผ. Let๐พ1, ๐พ2, ...be the policies designed by COCO-LQ for (๐ดห๐ก, ๐ตห๐ก)with parameter๐ผ. When the estimation error satisfies,
max{||๐ดห๐กโ๐ด๐ก||2,||๐ตห๐กโ๐ต๐ก||2} โค๐ฟ ๐พ
๐ (1+๐พ๐๐ ๐ฅ), (4.15) where๐ฟcan be any number in (0,
โ 1โ๐ผโโ
๐ผ 1โโ
๐ผ ), and๐พ๐ ๐๐ฅ is any uniform upper bound onโฅ๐พ๐กโฅ. Then, the policies๐พ๐กare ISS when applied to the system (๐ด๐ก, ๐ต๐ก),
โฅ๐ฅ๐กโฅ โค (๐โฒ)๐กโ๐ก0โฅ๐ฅ๐ก
0โฅ + ๐ ๐โฒ 1โ๐โฒ
sup
๐ก0โค๐ <๐ก
โฅ๐ค๐โฅ,
where ๐โฒ = 1โ(1โ๐ฟ)๐พ1โ๐พ ๐ โ (0,1). Finally, when โฅ๐ดห๐กโฅ โค ๐ยฏ๐ด, โฅ๐ตห๐กโฅ โค ๐ยฏ๐ต and ห
๐ต๐ก๐ตหโค
๐ก โชฐ ๐2
๐ต๐ผ, one uniform upper bound for โฅ๐พ๐กโฅ is๐พ๐ ๐๐ฅ = ๐ ๐ ๐ยฏ๐ต
๐2
๐ต
(๐ (1โ๐พ) +๐ยฏ๐ด) with๐ ๐ =โฅ๐ โฅ โฅ๐ โ1โฅ.
Proof. The proof is divided by two parts. For the first part, we prove the ISS property given the upper bound ๐พ๐ ๐๐ฅ on the controllers โฅ๐พ๐กโฅ. In the second part, we provide such an upper bound๐พ๐ ๐๐ฅ.
Proof of ISS. By Lemma 4.2, to show ISS we only need to show that (๐พ๐ก)โ
๐ก=0
is sequential strongly stable for system (๐ด๐ก, ๐ต๐ก)โ
๐ก=0. {๐พ๐ก}โ
๐ก=0 is (๐ , ๐พ , ๐)-sequential strongly stable for the system (๐ดห๐ก,๐ตห๐ก)โ
๐ก=0 with (๐ , ๐พ , ๐) defined by Lemma 4.1 as ๐ = โ๐ ๐
1โ๐ผ, ๐พ = 1โ โ
๐ผ, ๐ = โ๏ธ ๐ผ
1โ๐ผ where ๐ ๐ = โฅ๐โฅ โฅ๐โ1โฅ. Thus, there exist matrices ๐ป1, ๐ป2, ..., and ๐ฟ1, ๐ฟ2..., such that ห๐๐ก := ๐ดห๐ก +๐ตห๐ก๐พ๐ก = ๐ป๐ก๐ฟ๐ก๐ปโ1
๐ก with the following properties: (a) โฅ๐ฟ๐กโฅ โค 1โ ๐พ; (b) โฅ๐ป๐กโฅ โค ๐ฝ1 and โฅ๐ปโ1
๐ก โฅ โค 1/๐ฝ2 with ๐ = ๐ฝ1/๐ฝ2; (c) โฅ๐ปโ1
๐ก+1๐ป๐กโฅ โค 1โ๐พ๐ . With this decomposition for ห๐๐ก, we show that ๐๐ก := ๐ด๐ก+๐ต๐ก๐พ๐ก can be decomposed similarly. Letฮ๐ก =๐๐กโ๐ห๐ก. Then,
๐๐ก = ๐ห๐ก+ฮ๐ก =๐ป๐ก๐ฟ๐ก๐ปโ1
๐ก +๐ป๐ก๐ปโ1
๐ก ฮ๐ก๐ป๐ก๐ปโ1
๐ก =๐ป๐ก(๐ฟ๐ก+๐ปโ1
๐ก ฮ๐ก๐ป๐ก)๐ปโ1
๐ก =๐ป๐ก๐ฟโฒ
๐ก๐ปโ1
๐ก . (4.16) We next upper boundโฅ๐ฟโฒ
๐กโฅ. Notice that
โฅฮ๐กโฅ =โฅ๐๐กโ๐ห๐กโฅ = โฅ๐ด๐ก+๐ต๐ก๐พ๐กโ ๐ดห๐กโ๐ตห๐ก๐พ๐กโฅ
โค โฅ๐ด๐ก โ๐ดห๐กโฅ + โฅ๐ต๐กโ ๐ตห๐กโฅ โฅ๐พ๐กโฅ
โค max{โฅ๐ด๐กโ๐ดห๐กโฅ,โฅ๐ต๐กโ๐ตห๐กโฅ}(1+ โฅ๐พ๐กโฅ)
โค ๐ฟ
๐พ
๐ (1+๐พ๐ ๐๐ฅ)(1+ โฅ๐พ๐กโฅ) โค๐ฟ ๐พ ๐ . Then, we have the following bound on โฅ๐ฟโฒ
๐กโฅ,
โฅ๐ฟโฒ
๐กโฅ=โฅ๐ฟ๐ก+๐ปโ1
๐ก ฮ๐ก๐ป๐กโฅ โค โฅ๐ฟ๐กโฅ + โฅ๐ปโ1
๐ก โฅ โฅฮ๐กโฅ โฅ๐ป๐กโฅ โค1โ๐พ+๐ฟ๐พ=1โ (1โ๐ฟ)๐พ . (4.17) Define ๐พโฒ = (1โ ๐ฟ)๐พ and ๐โฒ = 1โ(1โ1โ๐พ๐ฟ)๐พ๐. We claim that (๐พ๐ก)โ
๐ก=0 is (๐ , ๐พโฒ, ๐โฒ) sequential strongly stable for system (๐ด๐ก, ๐ต๐ก)โ
๐ก=0. In the decomposition (4.16), ๐ฟโฒ
๐ก
satisfiesโฅ๐ฟโฒ
๐กโฅ โค 1โ๐พโฒ(which we have proved in (4.17)), and by definition๐ป๐กsatisfies
โฅ๐ป๐กโฅ โค ๐ฝ1andโฅ๐ปโ1
๐ก โฅ โค 1/๐ฝ2with๐ = ๐ฝ1/๐ฝ2; โฅ๐ปโ1
๐ก+1๐ป๐กโฅ โค ๐
1โ๐พ = ๐โฒ
1โ๐พโฒ. Therefore, the only conditions we need to verify are (a) 0 < ๐พโฒ โค 1 and (b) 0 โค ๐โฒ < 1.
Condition (a) is trivial since for any๐ผโ [0, 1
2), we have๐ฟ โ (0,
โ 1โ๐ผโโ
๐ผ 1โโ
๐ผ ) โ (0,1) and hence๐พโฒ=(1โ๐ฟ)๐พlies in(0,1]. For condition (b), as๐โฒ= 1โ(1โ๐ฟ)๐พ
1โ๐พ ๐, where๐=
โ๏ธ ๐ผ
1โ๐ผ and๐พ =1โโ
๐ผ, it follows that๐โฒ <1 is equivalent to the following condition:
๐โฒ= 1โ (1โ๐ฟ)๐พ 1โ๐พ
๐ = ๐ฟ+ (1โ๐ฟ)โ ๐ผ
โ 1โ๐ผ
< 1. (4.18)
Reorganizing the terms we have๐ฟ <
โ 1โ๐ผโโ
๐ผ 1โโ
๐ผ , which is satisfied by our condition on๐ฟ. Upper boundingโฅ๐พ๐กโฅ. The only thing that remains to be shown is that there exists an upper bound on the controller gain matrix under the conditions stated in the theorem.
Note that ๐พ๐ก must satisfy ห๐๐ก = ๐ดห๐ก + ๐ตห๐ก๐พ๐ก. On the other hand, by the COCO-LQ formulation,๐พ๐กsolves the following problem
min
๐พ๐ก
Tr(๐ ๐พ๐กฮฃ๐ฅ ๐ฅ๐พโค
๐ก ) s.t. ห๐๐ก = ๐ดห๐ก+๐ตห๐ก๐พ๐ก,
whereฮฃ๐ฅ ๐ฅ is the solution to the COCO-LQ formulation at time๐ก. The Lagrangian of the above optimization is
๐ฟ(๐พ๐ก,ฮ)=Tr(๐ ๐พ๐กฮฃ๐ฅ ๐ฅ๐พโค
๐ก ) +Tr(ฮโค(๐ดห๐ก+๐ตห๐ก๐พ๐กโ๐ห๐ก)). The optimizer๐พ๐ก must satisfy the following condition,
โ๐พ๐ก๐ฟ(๐พ๐ก,ฮ)=2๐ ๐พ๐กฮฃ๐ฅ ๐ฅ +๐ตหโค
๐ก ฮ =0. ห
๐๐ก = ๐ดห๐ก +๐ตห๐ก๐พ๐ก. Together, we obtain๐พ๐ก =๐ โ1๐ตหโค
๐ก (๐ตห๐ก๐ โ1๐ตหโค
๐ก )โ1(๐ห๐กโ ๐ดห๐ก)and
โฅ๐พ๐กโฅ โค โฅ๐ โ1โฅ โฅ๐ตหโค
๐ก โฅ 1
๐min(๐ โ1)๐min(๐ตห๐ก๐ตหโค
๐ก )(๐ (1โ๐พ) + โฅ๐ดห๐กโฅ), (4.20) where we have used by the(๐ , ๐พ , ๐)sequential strong stability of(๐พ๐ก)โ
๐ก=0for system (๐ดห๐ก,๐ตห๐ก)โ
๐ก=0, we haveโฅ๐ห๐กโฅ = โฅ๐ดห๐ก+ ๐ตห๐ก๐พ๐กโฅ โค ๐ (1โ ๐พ). By our condition, we have
โฅ๐ดห๐กโฅ โค ๐ยฏ๐ด, โฅ๐ตห๐กโฅ โค ๐ยฏ๐ต, ห๐ต๐ก๐ตหโค
๐ก โชฐ ๐2
๐ต, and ๐ ๐ = โฅ๐ โฅ โฅ๐ โ1โฅ. Then, we have โฅ๐พ๐กโฅ upper bounded as follows,
โฅ๐พ๐กโฅ โค ๐ ๐
ยฏ ๐๐ต ๐2
๐ต
(๐ (1โ๐พ) +๐ยฏ๐ด).
โก This result highlights the tradeoff between the estimation error and the algorithm performance. If we choose a small๐ผ, the algorithm can tolerate a larger estimation error (i.e., larger right-hand side of (4.15) can be obtained) but may lead to high control cost due to the tight state covariance constraint. If we choose a larger๐ผ, the algorithm tolerates smaller estimation errors while its performance improves due to the less strict state covariance constraint.