Chapter 6: Learning-based Safe and Robust Control and Estimation
6.1 Stability of NCM-based Control and Estimation
The NCM exhibits superior incremental robustness and stability due to its internal contracting property.
Theorem 6.1. Let $M$ define the NCM in Definition 6.2, and let $u^* = u_d - R^{-1}B^{\top}M(x - x_d)$ for $M$ and $R$ given in Theorem 4.2. Suppose that the systems (6.1) and (6.2) are controlled by $u_L$, which computes the CV-STEM controller $u^*$ of Theorem 4.2 replacing $M$ by $\mathcal{M}$, i.e.,
\[
u_L = u_d - R(x, x_d, u_d, t)^{-1}B(x, t)^{\top}\mathcal{M}(x, x_d, u_d, t)\,\mathrm{e} \quad (6.7)
\]
for $\mathrm{e} = x - x_d$. Note that we could use the differential feedback control $u = u_d + \int_0^1 k\,d\mu$ of Theorem 4.6 for $u^*$ and $u_L$. Define $\Delta_L$ of Theorems 5.2 and 5.3 as follows:
\[
\Delta_L(q, t) = B(x, t)\left(u_L(x, x_d, u_d, t) - u^*(x, x_d, u_d, t)\right) \quad (6.8)
\]
where we use $q(0, t) = \xi_0(t) = x_d(t)$, $q(1, t) = \xi_1(t) = x(t)$, and $q = [x^{\top}, x_d^{\top}, u_d^{\top}]^{\top}$ in the learning-based control formulation (5.1) and (5.2). If the NCM is learned to satisfy the learning error assumptions of Theorems 5.2 and 5.3 for $\Delta_L$ given by (6.8), then (5.11), (5.14), and (5.15) of Theorems 5.2 and 5.3 hold as follows:
\[
\|\mathrm{e}(t)\| \le \frac{V_{\ell}(0)}{\sqrt{\underline{m}}}e^{-\alpha_{\ell}t} + \frac{\epsilon_{\ell 0} + \bar{d}_c}{\alpha_{\ell}}\sqrt{\frac{\overline{m}}{\underline{m}}}\left(1 - e^{-\alpha_{\ell}t}\right) \quad (6.9)
\]
\[
\mathbb{E}\left[\|\mathrm{e}(t)\|^2\right] \le \frac{\mathbb{E}[V_{s\ell}(0)]}{\underline{m}}e^{-2\alpha_{\ell}t} + \frac{C_C}{2\alpha_{\ell}}\frac{\overline{m}}{\underline{m}} \quad (6.10)
\]
\[
\mathbb{P}\left[\|\mathrm{e}(t)\| \ge \varepsilon\right] \le \frac{1}{\varepsilon^2}\left(\frac{\mathbb{E}[V_{s\ell}(0)]}{\underline{m}}e^{-2\alpha_{\ell}t} + \frac{C_C}{2\alpha_{\ell}}\frac{\overline{m}}{\underline{m}}\right) \quad (6.11)
\]
where $\mathrm{e} = x - x_d$, $C_C = \bar{g}_c^2(2\alpha_G^{-1} + 1) + \epsilon_{\ell 0}^2\alpha_d^{-1}$, the disturbance upper bounds $\bar{d}_c$ and $\bar{g}_c$ are given in (6.1) and (6.2), respectively, and the other variables are defined in Theorems 4.2, 5.2, and 5.3.
Proof. Let $g$ of the virtual systems given in Theorems 5.2 and 5.3 be $g = f$, where $f$ is given by (3.13) (i.e., $f = (A - BR^{-1}B^{\top}M)(q - x_d) + \dot{x}_d$). By definition of $\Delta_L$, this has $q = x$ and $q = x_d$ as its particular solutions (see Example 5.1). As in the proof of Theorem 3.1, we have $d = \mu d_c$ and $G = \mu G_c$ for these virtual systems, resulting in the upper bounds of the disturbances in the learning-based control formulation (5.11) and (5.14) given as
\[
\bar{d} = \bar{d}_c \ \text{and}\ \bar{g} = \bar{g}_c. \quad (6.12)
\]
Since $M$ and $u^*$ are constructed to render $\dot{q} = g(q, \mu, t)$ contracting as presented in Theorem 4.2, and we have $\|\Delta_L\| \le \epsilon_{\ell 0} + \epsilon_{\ell 1}\|\xi_1 - \xi_0\|$ due to the learning assumption (5.3), Theorem 5.2 implies the deterministic bound (6.9), and Theorem 5.3 implies the stochastic bounds (6.10) and (6.11). Note that using the differential feedback control of Theorem 4.6 as $u^*$ also gives the bounds (6.9) – (6.11) following the same argument [8].
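As a concrete illustration, the NCM control law (6.7) and the deterministic tube bound (6.9) can be evaluated numerically. The sketch below is illustrative only: `ncm` is a placeholder for the learned DNN metric, and all numerical constants are assumed values, not ones from the text.

```python
import numpy as np

# Hypothetical placeholder for the learned NCM: in practice a DNN mapping
# (x, x_d, u_d, t) to a positive definite matrix M(x, x_d, u_d, t).
def ncm(x, x_d, u_d, t):
    return np.eye(x.size)  # identity as a stand-in for the learned metric

def ncm_controller(x, x_d, u_d, t, B, R_inv):
    """NCM control law u_L = u_d - R^{-1} B^T M (x - x_d), cf. (6.7)."""
    M = ncm(x, x_d, u_d, t)
    return u_d - R_inv @ B.T @ M @ (x - x_d)

def deterministic_bound(t, V0, m_lo, m_hi, alpha, eps_l0, d_bar):
    """Exponential tube bound (6.9) on ||e(t)|| with V_l(0) = V0,
    metric bounds m_lo <= m <= m_hi, and contraction rate alpha."""
    decay = np.exp(-alpha * t)
    return (V0 / np.sqrt(m_lo)) * decay + \
           (eps_l0 + d_bar) / alpha * np.sqrt(m_hi / m_lo) * (1.0 - decay)
```

As $t \to \infty$, `deterministic_bound` converges monotonically to the steady-state tube radius $(\epsilon_{\ell 0} + \bar{d}_c)\sqrt{\overline{m}/\underline{m}}/\alpha_{\ell}$.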
Due to the estimation and control duality in differential dynamics observed in Chapter 4, we have a similar result to Theorem 6.1 for the NCM-based state estimator.
Theorem 6.2. Let $M$ define the NCM in Definition 6.2, and let $L(\hat{x}, t) = M\bar{C}^{\top}R^{-1}$ for $M$, $\bar{C}$, and $R$ given in Theorem 4.5. Suppose that the systems (6.4) and (6.5) are estimated with an estimator gain $L_L$, which computes $L$ of the CV-STEM estimator $\dot{\hat{x}} = f(\hat{x}, t) + L(\hat{x}, t)(y - h(\hat{x}, t))$ of Theorem 4.5 replacing $M$ by $\mathcal{M}$, i.e.,
\[
\dot{\hat{x}} = f(\hat{x}, t) + L_L(\hat{x}, t)(y - h(\hat{x}, t)) \quad (6.13)
\]
\[
\phantom{\dot{\hat{x}}} = f(\hat{x}, t) + \mathcal{M}(\hat{x}, t)\bar{C}^{\top}R(\hat{x}, t)^{-1}(y - h(\hat{x}, t)).
\]
Define $\Delta_L$ of Theorems 5.2 and 5.3 as follows:
\[
\Delta_L(q, t) = \left(L_L(\hat{x}, t) - L(\hat{x}, t)\right)\left(h(x, t) - h(\hat{x}, t)\right) \quad (6.14)
\]
where we use $q(0, t) = \xi_0(t) = x(t)$, $q(1, t) = \xi_1(t) = \hat{x}(t)$, and $q = [x^{\top}, \hat{x}^{\top}]^{\top}$ in the learning-based control formulation (5.1) and (5.2). If the NCM is learned to satisfy the learning error assumptions of Theorems 5.2 and 5.3 for $\Delta_L$ given by (6.14), then (5.11), (5.14), and (5.15) of Theorems 5.2 and 5.3 hold as follows:
\[
\|\mathrm{e}(t)\| \le \sqrt{\nu}\,V_{\ell}(0)e^{-\alpha_{\ell}t} + \frac{\bar{d}_{\epsilon}\sqrt{\nu/\underline{m}} + \bar{d}_{\eta}\nu}{\alpha_{\ell}}\left(1 - e^{-\alpha_{\ell}t}\right) \quad (6.15)
\]
\[
\mathbb{E}\left[\|\mathrm{e}(t)\|^2\right] \le \nu\,\mathbb{E}[V_{s\ell}(0)]e^{-2\alpha_{\ell}t} + \frac{C_E}{2\alpha_{\ell}}\frac{\nu}{\underline{m}} \quad (6.16)
\]
\[
\mathbb{P}\left[\|\mathrm{e}(t)\| \ge \varepsilon\right] \le \frac{1}{\varepsilon^2}\left(\nu\,\mathbb{E}[V_{s\ell}(0)]e^{-2\alpha_{\ell}t} + \frac{C_E}{2\alpha_{\ell}}\frac{\nu}{\underline{m}}\right) \quad (6.17)
\]
where $\mathrm{e} = \hat{x} - x$, $\bar{d}_{\epsilon} = \epsilon_{\ell 0} + \bar{d}_{c0}$, $\bar{d}_{\eta} = \bar{\rho}\bar{c}\bar{d}_{c1}$, $C_E = (\bar{g}_{c0}^2 + \nu\underline{m}\,\bar{c}^2\bar{\rho}^2\bar{g}_{c1}^2)(2\alpha_G^{-1} + 1) + \epsilon_{\ell 0}^2\alpha_d^{-1}$, the disturbance upper bounds $\bar{d}_{c0}$, $\bar{d}_{c1}$, $\bar{g}_{c0}$, and $\bar{g}_{c1}$ are given in (6.4) and (6.5), and the other variables are defined in Theorems 4.5, 5.2, and 5.3.
Proof. Let $g$ of the virtual systems given in Theorems 5.2 and 5.3 be $g = f$, where $f$ is given by (4.10) (i.e., $f = (A - LC)(q - x) + f(x, t)$). By definition of $\Delta_L$, this has $q = \hat{x}$ and $q = x$ as its particular solutions (see Example 5.2). As in the proof of Theorem 4.3, we have
\[
d = (1 - \mu)d_{c0} + \mu L_L d_{c1} \ \text{and}\ G = [(1 - \mu)G_{c0},\ \mu L_L G_{c1}]
\]
for these virtual systems, resulting in the upper bounds of the external disturbances in the learning-based control formulation (5.11) and (5.14) given as
\[
\bar{d} = \bar{d}_{c0} + \bar{\rho}\bar{c}\bar{d}_{c1}\sqrt{\nu\underline{m}} \ \text{and}\ \bar{g} = \sqrt{\bar{g}_{c0}^2 + \nu\underline{m}\,\bar{c}^2\bar{\rho}^2\bar{g}_{c1}^2}.
\]
Since $W = M^{-1}$ is constructed to render $\dot{q} = g(q, \mu, t)$ contracting as presented in Theorem 4.5, and we have $\|\Delta_L\| \le \epsilon_{\ell 0} + \epsilon_{\ell 1}\|\xi_1 - \xi_0\|$ due to the learning assumption (5.3), Theorem 5.2 implies the bound (6.15), and Theorem 5.3 implies the bounds (6.16) and (6.17).
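The estimator (6.13) is straightforward to discretize. Below is a minimal sketch with an explicit-Euler step; `ncm` is a placeholder for the learned metric DNN, and the function signatures are illustrative assumptions rather than an interface from the text.

```python
import numpy as np

# Placeholder for the learned estimation NCM M(x_hat, t); in practice a DNN
# trained on CV-STEM solutions (Definition 6.2). Identity is a stand-in.
def ncm(x_hat, t):
    return np.eye(x_hat.size)

def ncm_estimator_step(x_hat, y, t, dt, f, h, C_bar, R_inv):
    """One explicit-Euler step of the NCM estimator (6.13):
    d(x_hat)/dt = f(x_hat, t) + M C_bar^T R^{-1} (y - h(x_hat, t))."""
    M = ncm(x_hat, t)
    L = M @ C_bar.T @ R_inv  # estimator gain L_L = M C_bar^T R^{-1}
    return x_hat + dt * (f(x_hat, t) + L @ (y - h(x_hat, t)))
```

For a linear measurement $y = Cx$, $\bar{C}$ reduces to the constant matrix $C$, so the gain computation only requires one NCM evaluation per step.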
Remark 6.2. As discussed in Examples 5.1 and 5.2 and in Remark 5.1, we have $\epsilon_{\ell 1} = 0$ for the learning error $\epsilon_{\ell 1}$ of (5.3) as long as $B$ and $h$ are bounded in a compact set, by the definition of $\Delta_L$ in (6.8) and (6.14). If $h$ and $u^*$ are Lipschitz with respect to $x$ in the compact set, we have non-zero $\epsilon_{\ell 1}$ with $\epsilon_{\ell 0} = 0$ in (5.3) (see Theorem 6.3).
Using Theorems 6.1 and 6.2, we can relate the learning error of the matrix that defines the NCM, $\|\mathcal{M} - M\|$, to the error bounds (6.9) – (6.11) and (6.15) – (6.17), if $M$ is given by the CV-STEM of Theorems 4.2 and 4.5.
Theorem 6.3. Let $M$ define the NCM in Definition 6.2, and let $\mathcal{S}_s \subset \mathbb{R}^n$, $\mathcal{S}_u \subset \mathbb{R}^m$, and $\mathcal{S}_t \subset \mathbb{R}_{\ge 0}$ be some compact sets. Suppose that the systems (6.1) – (6.5) are controlled by (6.7) and estimated by (6.13), respectively, as in Theorems 6.1 and 6.2. Suppose also that $\exists\, \bar{b}, \bar{c}, \bar{\rho} \in \mathbb{R}_{\ge 0}$ s.t. $\|B(x, t)\| \le \bar{b}$, $\|C(x, \hat{x}, t)\| \le \bar{c}$, and $\|R^{-1}\| \le \bar{\rho}$, $\forall x, \hat{x} \in \mathcal{S}_s$ and $t \in \mathcal{S}_t$, for $B$ in (6.1), $C$ in Theorem 4.5, and $R$ in (6.7) and (6.13). If we have for all $x, x_d, \hat{x} \in \mathcal{S}_s$, $u_d \in \mathcal{S}_u$, and $t \in \mathcal{S}_t$ that
\[
\|\mathcal{M}(x, x_d, u_d, t) - M(x, x_d, u_d, t)\| \le \epsilon_{\ell} \ \text{for control} \quad (6.18)
\]
\[
\|\mathcal{M}(\hat{x}, t) - M(\hat{x}, t)\| \le \epsilon_{\ell} \ \text{for estimation}
\]
where $M(x, x_d, u_d, t)$ and $M(\hat{x}, t)$ (or $W(x, t)$ and $W(\hat{x}, t)$, see Theorem 3.2) are the CV-STEM solutions of Theorems 4.2 and 4.5, with $\epsilon_{\ell} \in \mathbb{R}_{\ge 0}$ being the learning error, then the bounds (6.9) – (6.11) of Theorem 6.1 and (6.15) – (6.17) of Theorem 6.2 hold with $\epsilon_{\ell 0} = 0$ and $\epsilon_{\ell 1} = \bar{\rho}\bar{b}^2\epsilon_{\ell}$ for control, and $\epsilon_{\ell 0} = 0$ and $\epsilon_{\ell 1} = \bar{\rho}\bar{c}^2\epsilon_{\ell}$ for estimation, as long as $\epsilon_{\ell}$ is sufficiently small to satisfy the conditions (5.10) and (5.13) of Theorems 5.2 and 5.3 for deterministic and stochastic systems, respectively.
Proof. For $\Delta_L$ defined in (6.8) and (6.14), with the controllers and estimators given as $u^* = u_d - R^{-1}B^{\top}M(x - x_d)$, $u_L = u_d - R^{-1}B^{\top}\mathcal{M}(x - x_d)$, $L = M\bar{C}^{\top}R^{-1}$, and $L_L = \mathcal{M}\bar{C}^{\top}R^{-1}$, we have the following upper bounds:
\[
\|\Delta_L\| \le
\begin{cases}
\bar{\rho}\bar{b}^2\epsilon_{\ell}\|x - x_d\| & \text{(controller)} \\
\bar{\rho}\bar{c}^2\epsilon_{\ell}\|\hat{x} - x\| & \text{(estimator)}
\end{cases}
\]
where the relation $\|h(x, t) - h(\hat{x}, t)\| \le \bar{c}\|\hat{x} - x\|$, which follows from the equality $h(x, t) - h(\hat{x}, t) = C(x, \hat{x}, t)(x - \hat{x})$ (see Lemma 3.1), is used to obtain the second inequality. This implies that we have $\epsilon_{\ell 0} = 0$ and $\epsilon_{\ell 1} = \bar{\rho}\bar{b}^2\epsilon_{\ell}$ for control, and $\epsilon_{\ell 0} = 0$ and $\epsilon_{\ell 1} = \bar{\rho}\bar{c}^2\epsilon_{\ell}$ for estimation in the learning error of (5.3). The rest follows from Theorems 6.1 and 6.2.
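The controller case of the bound above can be spot-checked numerically: for any learned $\mathcal{M}$ with $\|\mathcal{M} - M\| \le \epsilon_{\ell}$, the resulting $\|\Delta_L\|$ must stay below $\bar{\rho}\bar{b}^2\epsilon_{\ell}\|x - x_d\|$. A small randomized check, under assumed stand-in values ($M = I$, $R = I$, $\epsilon_{\ell} = 0.01$), not parameters from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
eps_l, rho_bar = 0.01, 1.0  # assumed learning error and bound on ||R^{-1}||
worst = 0.0
for _ in range(100):
    B = rng.standard_normal((4, 2))
    E = rng.standard_normal((4, 4))
    dM = eps_l * E / np.linalg.norm(E, 2)  # ||M_ncm - M|| = eps_l exactly
    e = rng.standard_normal(4)
    # Delta_L = B (u_L - u*) = -B R^{-1} B^T (M_ncm - M) e, with R = I
    Delta = -B @ B.T @ dM @ e
    b_bar = np.linalg.norm(B, 2)  # spectral norm bound on ||B||
    ratio = np.linalg.norm(Delta) / (rho_bar * b_bar**2 * eps_l * np.linalg.norm(e))
    worst = max(worst, ratio)
# submultiplicativity of the spectral norm keeps every ratio at or below 1
```

The check relies only on submultiplicativity, so it holds for any draw of $B$, $\mathcal{M} - M$, and $\mathrm{e}$.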
Example 6.1. The left-hand side of Fig. 6.2 shows the state estimation errors of the following Lorenz oscillator, perturbed by process noise $d_0$ and measurement noise $d_1$ [1]:
\[
\dot{x} = [\sigma(x_2 - x_1),\ x_1(\rho - x_3) - x_2,\ x_1x_2 - \beta x_3]^{\top} + d_0(x, t)
\]
\[
y = [1, 0, 0]x + d_1(x, t)
\]
where $x = [x_1, x_2, x_3]^{\top}$, $\sigma = 10$, $\beta = 8/3$, and $\rho = 28$. It can be seen that the steady-state estimation errors of the NCM and CV-STEM are below the optimal steady-state upper bound of Theorem 4.5 (dotted line in the left-hand side of Fig. 6.2), while the EKF has a larger error than the other two. In this example, training data is sampled along 100 trajectories with uniformly distributed initial conditions ($-10 \le x_i \le 10$, $i = 1, 2, 3$) using Theorem 4.5, and then modeled by a DNN as in Definition 6.2. Additional implementation and network training details can be found in [1].
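The perturbed Lorenz dynamics above can be simulated with a simple Euler scheme to reproduce the setting qualitatively; the bounded noise signals below are illustrative stand-ins, not the disturbances used in [1]:

```python
import numpy as np

sigma, beta, rho = 10.0, 8.0 / 3.0, 28.0

def lorenz(x):
    """Unperturbed Lorenz vector field of Example 6.1."""
    return np.array([
        sigma * (x[1] - x[0]),
        x[0] * (rho - x[2]) - x[1],
        x[0] * x[1] - beta * x[2],
    ])

def simulate(x0, dt=1e-3, steps=5000, d_bar=0.1, seed=0):
    """Euler integration with bounded process noise d0 and scalar
    measurements y = x_1 + d1, mimicking the perturbed setup."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    ys = []
    for _ in range(steps):
        d0 = d_bar * rng.uniform(-1.0, 1.0, size=3)  # bounded process noise
        x = x + dt * (lorenz(x) + d0)
        d1 = d_bar * rng.uniform(-1.0, 1.0)          # bounded measurement noise
        ys.append(x[0] + d1)
    return x, np.array(ys)
```

The measurement sequence `ys` is what an estimator such as (6.13) would consume online.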
Example 6.2. Consider a robust feedback control problem of the planar spacecraft perturbed by deterministic disturbances, whose unperturbed dynamics is given as follows [9]:
\[
\begin{bmatrix} m & 0 & 0 \\ 0 & m & 0 \\ 0 & 0 & I \end{bmatrix}
\begin{bmatrix} \ddot{p}_x \\ \ddot{p}_y \\ \ddot{\phi} \end{bmatrix}
=
\begin{bmatrix} \cos\phi & \sin\phi & 0 \\ -\sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} -1 & -1 & 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & -1 & -1 & 0 & 0 & 1 & 1 \\ -\ell & \ell & -b & b & -\ell & \ell & -b & b \end{bmatrix}
u
\]
where the state is $x = [p_x, p_y, \phi, \dot{p}_x, \dot{p}_y, \dot{\phi}]^{\top}$, with $p_x$, $p_y$, and $\phi$ being the horizontal coordinate, vertical coordinate, and yaw angle of the spacecraft, respectively, and with the other variables defined as $m$ = mass of spacecraft, $I$ = yaw moment of inertia, $u$ = thruster force vector, $\ell$ = half-depth of spacecraft, and $b$ = half-width of spacecraft (see Fig. 8 of [9] for details).
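The spacecraft dynamics can be assembled programmatically for simulation or controller synthesis. The parameter values below are illustrative assumptions, not those of [9]:

```python
import numpy as np

def spacecraft_accel(x, u, m=17.0, I=2.0, l=0.2, b=0.15):
    """Acceleration [p_x'', p_y'', phi''] of the planar spacecraft, given
    state x = [p_x, p_y, phi, p_x', p_y', phi'] and 8 thruster forces u."""
    phi = x[2]
    M = np.diag([m, m, I])  # inertia matrix
    R = np.array([[np.cos(phi), np.sin(phi), 0.0],
                  [-np.sin(phi), np.cos(phi), 0.0],
                  [0.0, 0.0, 1.0]])  # rotation by the yaw angle
    B = np.array([[-1.0, -1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, -1.0, -1.0, 0.0, 0.0, 1.0, 1.0],
                  [-l, l, -b, b, -l, l, -b, b]])  # thruster allocation
    return np.linalg.solve(M, R @ (B @ u))
```

For instance, firing the two thrusters in columns 5 and 6 with unit force at zero yaw produces a pure horizontal acceleration of $2/m$ with no net torque, since their moment arms $-\ell$ and $\ell$ cancel.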
As shown in the right-hand side of Fig. 6.2, the NCM keeps the perturbed trajectories within the bounded error tube of Theorem 4.2 (shaded region) around the target trajectory (dotted line), avoiding collision with the circular obstacles even under the presence of disturbances, while requiring a much smaller computational cost than the CV-STEM. Data sampling and NCM training are performed as in Example 6.1 [1].
Remark 6.3. We could also synthesize a controller $u$ and the NCM simultaneously, and Theorem 6.1 can provide formal robustness and stability guarantees even for such cases [8], [10]. In Chapter 7, we generalize the NCM to learn contraction theory-based robust control using a DNN that takes only $x$, $t$, and a vector containing local environment information as its inputs, avoiding the online computation of $x_d$ for the sake of automatic guidance and control implementable in real time.