Chapter 9: Interstellar Object Exploration
9.4 Neural-Rendezvous: Learning-based Robust Guidance and Control
This section finally presents Neural-Rendezvous, a deep learning-based terminal G&C approach to autonomously encounter ISOs, thereby solving the problem 9.1 of Sec. 9.1 that arise from the large state uncertainty and high-velocity challenges.
It will be shown that the SN-DNN min-norm control (9.17) of Lemma 9.2 verifies a formal exponential bound on expected spacecraft delivery error, which provides valuable information in determining whether we should use the SN-DNN terminal guidance policy or enhance it with the SN-DNN min-norm control, depending on the size of the state uncertainty.
9.4.I Robustness and Stability Guarantee
The assumptions introduced in Sec. 9.3.III allow bounding the mean squared distance between the spacecraft relative position of (9.1) controlled by (9.17) and the desired position ๐๐(๐ก)given in (9.14), even under the presence of the state uncertainty (see Fig. 9.4). We remark that the additional notations in the following theorem are summarized in Table 9.4, where the others are consistent with the ones in Table 9.3.
Let us remark that the use of our incremental Lyapunov-like function, along with the supermartingale analysis, allows easier handling of the aforementioned local assumptions in estimation and control.
Theorem 9.1. Suppose that Assumptions 9.1 and 9.2 hold, and that the spacecraft relative dynamics with respect to the ISO, given in(9.1), is controlled by๐ข=๐ขโ
โ. If the estimated states at time๐ก =๐ก๐ satisfyลห๐ โ Ciso(๐ ยฏiso) and๐ฅห๐ โ Csc(๐ ยฏsc), then the spacecraft delivery error is explicitly bounded as follows with probability at least 1โ๐ctrl:
E[โฅ๐(๐ก๐) โ๐โฅ |ล(ห ๐ก๐ ) =ลห๐ โฉ๐ฅห(๐ก๐ ) =๐ฅห๐ ]
โค sup
(ล๐ ,๐ฅ๐ )โDest
๐โ๐(๐ก๐โ๐ก๐ )โฅ๐๐ โ ๐๐(๐ก๐ ) โฅ + ๐โ๐๐ก๐๐ฟ๐ ๐sc(๐ก๐)
โซ ๐ก๐
๐ก๐
๐๐(๐๐ , ๐ก๐ )๐ ๐
+
๏ฃฑ๏ฃด
๏ฃด๏ฃด
๏ฃฒ
๏ฃด๏ฃด
๏ฃด
๏ฃณ
๐โ๐(๐ก๐โ๐ก๐ )โ๐โ๐ผ(๐ก๐โ๐ก๐ ) ๐ผโ๐
๐ฃ(๐ฅ๐ ,๐ก๐ )
โ
๐sc(๐ก๐) if๐ผ โ ๐ (๐ก๐ โ๐ก๐ )๐โ๐(๐ก๐โ๐ก๐ )โ๐ฃ(๐ฅ๐ ,๐ก๐ )
๐sc(๐ก๐) if๐ผ =๐
(9.27)
where๐ctrl โRโฅ0is the probability to be given in(9.42),๐๐ก(๐๐ , ๐ก๐ )is a time-varying function defined as๐๐ก(๐๐ , ๐ก๐ ) =๐(๐โ๐ผ)๐ก
โซ๐ก ๐ก๐
๐๐ผ๐๐๐(๐๐ , ๐ก๐ )๐ ๐. Note that(9.27)yields a bound onP
โฅ๐(๐ก๐) โ ๐โฅ โค ๐
ลห(๐ก๐ ) =ลห๐ โฉ๐ฅห(๐ก๐ ) =๐ฅห๐
as in Theorem 2.5 of [7], where๐ โRโฅ0is any given distance of interest, using Markovโs inequality.
Table 9.4: Notations in Theorem 9.1.
Notation Description
Dest Set defined asDest={(ล๐ , ๐ฅ๐ ) โR๐รR๐|โ๏ธ
โฅล๐ โลห๐ โฅ2+ โฅ๐ฅ๐ โ๐ฅห๐ โฅ2 โค๐๐ ๐ฟ๐ Lipschitz constant of๐in Assumption (9.1)
๐sc(๐ก) Spacecraft mass of (9.2) satisfying๐sc(๐ก) โ [๐sc(๐ก๐ ), ๐sc(๐ก๐)]due to the Tsiolkovsky rocket equation
ล๐ ,ลห๐ , ๐ฅ๐ ,๐ฅห๐ True and estimated ISO and S/C state at time๐ก=๐ก๐ , respectively
๐๐(๐ก) Desired S/C relative position trajectory, i.e.,๐ฅ๐(๐ก)=[๐๐(๐ก)โค,๐ยค๐(๐ก)โค]โคfor ๐ฅ๐of (9.14)
๐๐ , ๐ห๐ True and estimated S/C relative position at time๐ก=๐ก๐ , i.e.,๐ฅ๐ =[๐โค๐ ,๐ยคโค๐ ]โค and ห๐ฅ๐ =[๐หโค๐ ,๐ยคหโค๐ ]โค
ยฏ
๐ iso, ๐ ยฏsc Constants defined as ยฏ๐ iso=๐isoโ2๐๐and ยฏ๐ sc=๐scโ2๐๐for๐๐of Assump- tion 9.1 and๐isoand๐scof Assumption 9.2
๐ขโ
โ SN-DNN min-norm control policy (9.17) of Lemma 9.2 ๐ฃ(๐ฅ , ๐ก) Non-negative function given as ๐ฃ(๐ฅ , ๐ก) = โ๏ธ
๐( ยค๐, ๐๐(๐, ๐ก), ๐ก) =
โ๏ธ
๐sc(๐ก) โฅฮ(๐โ๐๐(๐ก)) + ยค๐โ ยค๐๐(๐ก) โฅfor๐ of (9.22)
๐๐ Tuple of the true and estimated ISO and S/C state at time ๐ก = ๐ก๐ , i.e., ๐๐ =(ล๐ ,ลห๐ , ๐ฅ๐ ,๐ฅห๐ )
๐ผ Positive constant of (9.21)
๐ Minimum eigenvalue ofฮdefined in (9.21) ๐ Desired terminal S/C relative position of (9.4)
Proof. The dynamics (9.16) controlled by (9.17) can be rewritten as
ยฅ
๐ =๐ป(๐ฅ ,ล, ๐ก) +โฌ(๐ฅ ,ล, ๐ก)๐ขโ
โ(๐ฅ ,ล, ๐ก;๐nn) +โฌ(๐ฅ ,ลห, ๐ก)๐ขห (9.28) where ห๐ข =๐ขโ
โ(๐ฅ ,ห ลห, ๐ก;๐nn) โ๐ขโ
โ(๐ฅ ,ล, ๐ก;๐nn). Since the third term of (9.28) is subject to stochastic disturbance due to the state uncertainty of Assumption 9.1, we consider the weak infinitesimal operator ๐ given in [30, p. 9], instead of taking the time derivative, for analyzing the time evolution of the non-negative function๐ฃdefined as ๐ฃ(๐ฅ , ๐ก) =โ๏ธ
๐( ยค๐, ๐๐(๐, ๐ก), ๐ก) =โ๏ธ
๐sc(๐ก) โฅฮ(๐โ ๐๐(๐ก)) + ยค๐โ ยค๐๐(๐ก) โฅfor๐ of (9.22).
To this end, let us compute the following time increment of๐ฃ evaluated at the true state and time (๐ฅ ,ล, ๐ก):
ฮ๐ฃ=๐ฃ(๐ฅ(๐ก+ฮ๐ก), ๐ก+ฮ๐ก) โ๐ฃ(๐ฅ , ๐ก)
= 1 2๐ฃ(๐ฅ , ๐ก)
๐๐
๐๐ยค
ยฅ
๐(๐ก) + ๐๐
๐ ๐๐
ยฅ
๐๐(๐(๐ก), ๐ก) + ๐๐
๐ ๐ก
ฮ๐ก+ O ฮ๐ก2
(9.29)
whereฮ๐ก โ Rโฅ0 and the arguments ( ยค๐(๐ก), ๐๐(๐, ๐ก), ๐ก) of the partial derivatives of ๐ are omitted for notational convenience. Dropping the argument ๐ก for the state variables for simplicity, the dynamics decomposition (9.28) gives
๐๐
๐๐ยค
ยฅ ๐+ ๐๐
๐ ๐๐
ยฅ
๐๐(๐, ๐ก) + ๐๐
๐ ๐ก
โค ๐๐
๐๐ยค
(๐ป(๐ฅ ,ล, ๐ก) +โฌ(๐ฅ ,ล, ๐ก)๐ขโ(๐ฅ ,ห ลห, ๐ก;๐nn)) + ๐๐
๐ ๐๐
( ยฅ๐๐โฮ( ยค๐โ ยค๐๐))
โค โ2๐ผ๐( ยค๐, ๐๐(๐, ๐ก), ๐ก) +2๐๐
๐๐ยค
โฌ(๐ฅ ,ล, ๐ก)๐ขห=โ2๐ผ๐ฃ(๐ฅ , ๐ก)2+2๐ฃ(๐ฅ , ๐ก) โฅ๐ขหโฅ
โ๏ธ
๐sc(๐ก) (9.30) where the first inequality follows from the fact that the spacecraft mass ๐sc(๐ก), described by the Tsiolkovsky rocket equation, is a decreasing function and thus
๐๐/๐ ๐ก โค 0, and the second inequality follows from the stability condition (9.21) evaluated at (๐ฅ ,ล, ๐ก), which is guaranteed to be feasible due to Lemma 9.2. Since ๐ขโis assumed to be Lipschitz as in (9.25) of Assumption 9.2, we get the following relation for any๐ฅ๐ ,๐ฅห๐ โ Csc(๐sc), ล๐ ,ลห๐ โ Ciso(๐iso), and๐ก๐ โ [0, ๐ก๐], by substitut- ing (9.30) into (9.29):
๐๐ฃ(๐ฅ๐ , ๐ก๐ ) = lim
ฮ๐กโ0
E๐๐ [ฮ๐ฃ] ฮ๐ก
โค โ๐ผ๐ฃ(๐ฅ๐ , ๐ก๐ ) + ๐ฟ๐
โ๏ธ
๐sc(๐ก๐)
โ๏ธโฅลห๐ โล๐ โฅ2+ โฅ๐ฅห๐ โ๐ฅ๐ โฅ2 (9.31) whereE๐๐ [ยท] =E[ ยท|๐ฅ(๐ก๐ )=๐ฅ๐ ,๐ฅห(๐ก๐ ) =๐ฅห๐ ,ล(๐ก๐ ) =ล๐ ,ลห(๐ก๐ ) =ลห๐ ] for ๐๐ given as ๐๐ = (๐ฅ๐ ,๐ฅห๐ ,ล๐ ,ลห๐ ), and the relation ๐sc(๐ก) โฅ ๐sc(๐ก๐) by the Tsiolkovsky rocket equation is also used to obtain the inequality. Applying Dynkinโs formula [30, p.
10] to (9.31) gives the following with probability at least 1โ ๐est due to (9.24) of Assumption 9.1:
E๐๐ [โฅฮ(๐(๐ก) โ๐๐(๐ก)) + ยค๐(๐ก) โ ยค๐๐(๐ก) โฅ] (9.32)
โค ๐โ๐ผ(๐กโ๐ก๐ )๐ฃ(๐ฅ๐ , ๐ก๐ )
โ๏ธ
๐sc(๐ก๐) + ๐โ๐ผ๐ก๐ฟ๐ ๐sc(๐ก๐)
โซ ๐ก
๐ก๐
๐๐ผ๐๐๐(๐๐ , ๐ก๐ )๐ ๐, โ๐ก โ [๐ก๐ ,๐กยฏ๐]
where ๐๐(๐๐ , ๐ก๐ ) is as given in (9.24) and ยฏ๐ก๐ is defined as ยฏ๐ก๐ = min{๐ก๐, ๐ก๐}, with๐ก๐ being the first exit time that any of the states leave their respective set, i.e.,
๐ก๐ =inf{๐ก โฅ ๐ก๐ :ล(๐ก) โCiso(๐iso) โชลห(๐ก) โCiso(๐iso) (9.33) ๐ก๐ =inf{๐ก โฅ ๐ก๐ :โช๐ฅ(๐ก) โCsc(๐sc) โช๐ฅห(๐ก) โCsc(๐sc)}
given that the event of (9.23) in Assumption 9.1 has occurred. Since we further have that๐โฅ๐โ๐๐โฅ = ๐ ๐ก๐โฅ๐โ๐๐โฅ โค โฅ๐หโฅ โ๐(๐โ๐๐)for ห๐= ฮ(๐โ๐๐) + ยค๐โ ยค๐๐
due to the hierarchical structure of ห๐, utilizing Dynkinโs formula one more time and then substituting (9.32) result in
E๐๐ [โฅ๐(๐ก) โ๐๐(๐ก) โฅ] โค๐โ๐๐ก
โซ ๐ก
๐ก๐
๐(๐โ๐ผ)๐+๐ผ๐ก๐ ๐ฃ(๐ฅ๐ , ๐ก๐ )
โ๏ธ
๐sc(๐ก๐) + ๐ฟ๐
๐sc(๐ก๐)๐๐(๐๐ , ๐ก๐ )๐ ๐ +๐โ๐(๐กโ๐ก๐ )โฅ๐๐ โ ๐๐(0) โฅ, โ๐ก โ [0,๐กยฏ๐] (9.34) where๐ฅ๐ = [๐โค
๐ ,๐ยคโค
๐ ]โค.
What is left to show is that we can achieve๐ก๐ > ๐ก๐ with a finite probability for the first exit time๐ก๐of (9.33). For this purpose, we need the following lemma.
Lemma 9.3. Suppose that the events of Assumption 9.1 has occurred, i.e., the event of(9.23)has occurred and the2-norm estimation error satisfies
โ๏ธโฅลห(๐ก) โล(๐ก) โฅ2+ โฅ๐ฅห(๐ก) โ๐ฅ(๐ก) โฅ2 < ๐๐, โ๐ก โ [๐ก๐ , ๐ก๐]. If Assumption 9.2 holds and if we have
๐๐ฃ(๐ฅ๐ , ๐ก๐ ) โค โ๐ผ๐ฃ(๐ฅ๐ , ๐ก๐ ) + ๐ฟ๐
โ๏ธ
๐sc(๐ก๐)
โ๏ธโฅลห๐ โล๐ โฅ2+ โฅ๐ฅห๐ โ๐ฅ๐ โฅ2 (9.35) for any๐ฅ๐ ,๐ฅห๐ โ Csc(๐sc), ล๐ ,ลห๐ โ Ciso(๐iso), and ๐ก๐ โ [0, ๐ก๐] as in (9.31), then we get the following probabilistic bound forโ๐exit โRโฅ0:
๐stay โฅ 1โ๐exit =
๏ฃฑ๏ฃด
๏ฃด๏ฃด
๏ฃฒ
๏ฃด๏ฃด
๏ฃด
๏ฃณ
1โ ๐ธ๐
ยฏ ๐ฃ
๐โ
๐ป(๐ก ๐)
ยฏ
๐ฃ if๐ฃยฏ โฅ sup๐กโ [๐ก๐ ,๐ก๐]โ(๐ก)
ยฏ ๐ผ
1โ ยฏ
๐ผ ๐ธ๐ +(๐๐ปยฏ(๐ก๐)โ1)sup๐กโ [๐ก๐ ,๐ก๐]โ(๐ก)
ยฏ ๐ผ๐ฃ ๐ยฏ
ยฏ
๐ป(๐ก๐) otherwise
(9.36)
๐stay=P๐๐
๏ฃฎ
๏ฃฏ
๏ฃฏ
๏ฃฏ
๏ฃฏ
๏ฃฐ ร
๐กโ[๐ก๐ ,๐ก๐]
๐ฅ(๐ก) โ Csc(๐ยฏsc)
๏ฃน
๏ฃบ
๏ฃบ
๏ฃบ
๏ฃบ
๏ฃป
where ๐๐ = (ล๐ ,ลห๐ , ๐ฅ๐ ,๐ฅห๐ ) denotes the states at time ๐ก = ๐ก๐ , which satisfy ล๐ โ Ciso(๐ยฏiso),ลห๐ โ Ciso(๐ ยฏiso), ๐ฅ๐ โ Csc(๐ยฏsc), and๐ฅห๐ โ Csc(๐ ยฏsc) for๐ยฏiso =๐iso โ๐๐ โฅ 0,
ยฏ
๐ iso = ๐iso โ 2๐๐ โฅ 0, ๐ยฏsc = ๐sc โ ๐๐ โฅ 0, and ๐ ยฏiso = ๐sc โ 2๐๐ โฅ 0, ๐ธ๐ = ๐ฃ(๐ฅ๐ , ๐ก๐ ) +๐ฟ๐โฅ๐๐ โ๐๐(๐ก๐ ) โฅ,๐ป(๐ก) =โซ๐ก
๐ก๐
โ(๐)๐ ๐,๐ปยฏ(๐ก) =2๐ป(๐ก)๐ผยฏ/sup๐กโ[๐ก๐ ,๐ก๐]โ(๐ก), and โ(๐ก) = ๐ฟ๐๐๐ก(๐๐ , ๐ก๐ )/โ๏ธ
๐sc(๐ก๐). The suitable choices of ๐ฃ ,ยฏ ๐ผ, ๐ฟยฏ ๐ โ R>0 are to be defined below.
Proof of Lemma 9.3. Let us define a non-negative and continuous function ๐ธ(๐ฅ , ๐ก) as
๐ธ(๐ฅ , ๐ก) =๐ฃ(๐ฅ , ๐ก) +๐ฟ๐โฅ๐โ ๐๐(๐ก) โฅ (9.37)
where ๐ฅ = [๐โค,๐ยคโค]โค and ๐ฟ๐ โ R>0. Note that ๐ธ(๐ฅ , ๐ก) of (9.37) is 0 only when ๐ฅ = ๐ฅ๐(๐ก). Since we have ๐โฅ๐ โ ๐๐(๐ก) โฅ โค ๐ฃ(๐ฅ , ๐ก)/โ๏ธ
๐sc(๐ก๐) โ๐โฅ๐ โ ๐๐(๐ก) โฅ, designing๐ฟ๐to have๐ผโ๐ฟ๐/โ๏ธ
๐sc(๐ก๐) >0, the relation (9.35) gives ๐๐ธ(๐ฅ๐ , ๐ก๐ ) โค โ๐ผ ๐ธยฏ (๐ฅ๐ , ๐ก๐ ) + ๐ฟ๐
โ๏ธ
๐sc(๐ก๐)
โ๏ธโฅลห๐ โล๐ โฅ2+ โฅ๐ฅห๐ โ๐ฅ๐ โฅ2 (9.38) for any๐ฅ๐ ,๐ฅห๐ โ Csc(๐sc),ล๐ ,ลห๐ โ Ciso(๐iso), and๐ก๐ โ [0, ๐ก๐], where
ยฏ
๐ผ=min
๐ผโ๐ฟ๐/
โ๏ธ
๐sc(๐ก๐), ๐
>0.
Let us define another non-negative and continuous function๐(๐ฅ , ๐ก) as ๐(๐ฅ , ๐ก) =๐ธ(๐ฅ , ๐ก)๐๐พ ๐ป(๐ก) + ๐๐พ ๐ป(๐ก๐) โ๐๐พ ๐ป(๐ก)
๐พ
where ๐พ is a positive constant that satisfies 2 ยฏ๐ผ โฅ ๐พsup๐กโ[0,๐ก๐] โ(๐ก). If ยฏ๐ฃ ,๐คยฏ โ R>0
is selected to have๐(๐ฅ , ๐ก) < ๐คยฏ โ ๐ธ(๐ฅ , ๐ก) < ๐ฃยฏ โ ๐ฅ โ Csc(๐ยฏsc) = ร
๐กโ[0,๐ก๐]{๐ โ R๐| โฅ๐ โ๐ฅ๐(๐ก) โฅ < ๐ยฏsc} โ R๐, which is always possible since ๐ธ(๐ฅ , ๐ก) of (9.37) is 0 only when๐ฅ =๐ฅ๐(๐ก), then we can utilize (9.38) to compute๐๐ as follows for any ๐ฅ๐ = [๐โค
๐ ,๐ยคโค
๐ ]โค โ Csc(๐ยฏsc)and๐ก๐ โ [0, ๐ก๐]that satisfies๐(๐ฅ๐ , ๐ก๐ ) < ๐คยฏ:
๐๐(๐ฅ๐ , ๐ก๐ ) = (๐พ โ(๐ก๐ )๐ธ(๐ฅ๐ , ๐ก๐ ) +๐๐ธ(๐ฅ๐ , ๐ก๐ ))๐๐พ ๐ป(๐ก๐ ) โโ(๐ก๐ )๐๐พ ๐ป(๐ก๐ ) (9.39)
โค (๐ฝ(ล๐ ,ลห๐ , ๐ฅ๐ ,๐ฅห๐ ) โโ(๐ก๐ ))๐๐พ ๐ป(๐ก๐ ) where๐ฝ(ล๐ ,ลห๐ , ๐ฅ๐ ,๐ฅห๐ ) =๐ฟ๐
โ๏ธ
( โฅลห๐ โล๐ โฅ2+ โฅ๐ฅห๐ โ๐ฅ๐ โฅ2)/๐sc(๐ก๐)and the inequal- ity follows from the relation ยฏ๐ผ โฅ ๐พsup๐กโ[0,๐ก๐] โ(๐ก). Applying Dynkinโs formula [30, p. 10] to (9.39), it can be verified that๐(๐ฅ(๐ก), ๐ก)is a non-negative supermartingale due to the fact thatE๐๐ [๐ฝ(ล(๐ก),ล(ห ๐ก), ๐ฅ(๐ก),๐ฅห(๐ก))] โค ๐ฟ๐๐๐ก(๐๐ , ๐ก๐ )/โ๏ธ
๐sc(๐ก๐) =โ(๐ก) by definition of ๐๐ก(๐๐ , ๐ก๐ ) in (9.24) of Assumption 9.1, which gives the following due to Villeโs maximal inequality [30, p. 26]:
P๐๐
"
sup
๐กโ[๐ก๐ ,๐ก๐]
๐(๐ฅ(๐ก), ๐ก) โฅ๐คยฏ
#
โค ๐ธ๐ +๐พโ1(๐๐พ ๐ป(๐ก๐) โ1)
ยฏ ๐ค
(9.40) as long as we have ล(๐ก),ลห(๐ก) โ Ciso(๐iso) and ห๐ฅ(๐ก) โ Csc(๐sc) for โ๐ก โ [๐ก๐ , ๐ก๐], where๐๐ =(ล๐ ,ลห๐ , ๐ฅ๐ ,๐ฅห๐ )is as defined in (9.36). For๐๐ satisfyingล๐ โ Ciso(๐ยฏiso),
ห
ล๐ โ Ciso(๐ ยฏiso), ๐ฅ๐ โ Csc(๐ยฏsc), and ห๐ฅ๐ โ Csc(๐ ยฏsc), the occurrence of the events of Assumption 9.1 and the forward invariance condition of (9.26) of Assumption 9.2 ensure that if we have ๐ฅ(๐ก) โ Csc(๐ยฏsc) for โ๐ก โ [๐ก๐ , ๐ก๐], we get ล(๐ก) โ Ciso(๐ยฏiso),
ห
ล(๐ก) โ Ciso(๐iso), and ห๐ฅ(๐ก) โ Csc(๐sc) for โ๐ก โ [๐ก๐ , ๐ก๐]. Therefore, selecting the constants๐พ and ยฏ๐ค in the inequality (9.40) as in [30, pp. 82-83] yields (9.36) due to the relation๐(๐ฅ , ๐ก) < ๐คยฏ โ ๐ธ(๐ฅ , ๐ก) < ๐ฃยฏ โ๐ฅ โ Ciso(๐ยฏiso).
Now, let us select ๐๐ of Assumption 9.1 as ๐๐ = ๐๐E
๐๐ก๐ (๐0,0)
, where ๐0 is the tuple of the true and estimated ISO and spacecraft state at time ๐ก = 0 and ๐๐ โ R>0. Using Markovโs inequality along with the bounds (9.23) and (9.24) of Assumption 9.1, we get the following probability bound:
P
hโ๏ธโฅลห๐ โล๐ โฅ2+ โฅ๐ฅห๐ โ๐ฅ๐ โฅ2 < ๐๐E
๐๐ก๐ (๐0,0)i
โฅ (1โ๐est)
1โ 1 ๐๐
. (9.41) Since we have หล๐ โ Ciso(๐ ยฏiso) and ห๐ฅ๐ โ Csc(๐ ยฏsc) by the theorem assumption, the occurrence of the event in (9.41) implies ล๐ โ Ciso(๐ยฏiso) and ๐ฅ๐ โ Csc(๐ยฏsc) for
ยฏ
๐iso = ๐iso โ ๐๐E
๐๐ก๐ (๐0,0)
โฅ 0, ยฏ๐sc = ๐sc โ ๐๐E
๐๐ก๐ (๐0,0)
โฅ 0, and then ล(๐ก) โ Ciso(๐ยฏiso), โ๐ก โ [๐ก๐ , ๐ก๐] due to the forward invariance condition (9.26) in Assumption 9.2. Therefore, using ๐err of Assumption 9.1 along with the result of Lemma 9.3, we haveโ๐ctrl โRโฅ0s.t.
P[ A |ล(ห ๐ก๐ ) =ลห๐ โฉ๐ฅห(๐ก๐ ) =๐ฅห๐ ] โฅ 1โ๐ctrl = (1โ๐exit) (1โ๐est) (1โ๐err)
1โ 1 ๐๐
(9.42) whereA =ร
๐กโ[๐ก๐ ,๐ก๐]ล(๐ก) โ Ciso(๐ยฏiso) โฉล(ห ๐ก) โ Ciso(๐iso) โฉ๐ฅ(๐ก) โ Csc(๐ยฏsc) โฉ๐ฅห(๐ก) โ Csc(๐sc) โฉ E withE being the event of (9.23) in Assumption 9.1. The desired rela- tion (9.27) follows by evaluating the first term of the integral in (9.34), and by ob- serving that ifAof (9.42) occurs, we have๐ก๐ > ๐ก๐ andโ๏ธ
โฅลห๐ โล๐ โฅ2+ โฅ๐ฅห๐ โ๐ฅ๐ โฅ2<
๐๐E
๐๐ก๐ (๐0,0)
due to (9.41).
Note that if Eiso = Esc = R๐ in Assumption 9.1 and ๐ is globally Lipschitz in Assumption 9.2, the estimation bound (9.24) and the Lipschitz bound (9.25) always hold without the second condition of Assumption 9.1. This indicates that the bound (9.27) holds as long as (ล๐ , ๐ฅ๐ ) โ Dest for given หล๐ ,๐ฅห๐ โ R๐, which occurs with probability at least 1โ๐โ1
๐ due to (9.41).
As derived in Theorem 9.1, the SN-DNN min-norm control policy (9.17) of Lemma 9.2 enhances the SN-DNN terminal guidance policy of Lemma 9.1 by providing the explicit spacecraft delivery error bound (9.27), which holds even un- der the presence of the state uncertainty. This bound is valuable in modulating the learning and control parameters to achieve a verifiable performance guaran- tee consistent with a mission-specific performance requirement as to be seen in Sec. 9.4.II.
As illustrated in Fig. 9.6, the expected state tracking error bound (9.27) of Theo- rem 9.1 decreases exponentially in time if ๐๐ก(๐๐ , ๐ก๐ ) is a non-increasing function
: S/C : ISO
๐ฅ(๐ก๐ ) ล(๐ก๐ )
: expected tracking error bound
๐
เท
๐ฅ ๐ก , ลเท ๐ก
เท
๐ฅ ๐ก , ลเท ๐ก
เท
๐ฅ ๐ก , ลเท ๐ก
Figure 9.6: Illustration of the expected position tracking error bound (9.27) of Theorem 9.1, where๐๐ก(๐๐ , ๐ก๐ ) is assumed to be a non-increasing function in time.
in ๐ก. The following example demonstrates how we compute the bound (9.27) in practice.
Example 9.1. Suppose that the estimation error is upper-bounded by a function that exponentially decreases in time, i.e., we have
๐๐ก(๐๐ , ๐ก๐ ) =๐โ๐ฝ(๐กโ๐ก๐ )
โ๏ธโฅลห๐ โล๐ โฅ2+ โฅ๐ฅห๐ โ๐ฅ๐ โฅ2+๐
for (9.24)in Assumption 9.1, where ๐ โ Rโฅ0 and ๐ฝ โ R>0. Assuming that ๐ผ โ ๐ฝ, ๐ฝโ ๐, and๐โ ๐ผfor simplicity, we get
RHS of(9.27) โค ๐โ๐๐กยฏ๐( โฅ๐ห๐ โ ๐๐(๐ก๐ ) โฅ +๐๐) + ๐โ๐๐กยฏ๐ โ๐โ๐ผ๐กยฏ๐
๐ผโ๐
(๐ยฏ+1) ( โฅ๐ฅห๐ โ๐ฅ๐(๐ก๐ ) โฅ +๐๐)
โ๏ธ
๐sc(๐ก๐) + ๐ฟ๐
๐sc(๐ก๐)
( ๐๐ ๐ผโ ๐ฝ
๐โ๐ยฏ๐ก๐ โ๐โ๐ฝยฏ๐ก๐ ๐ฝโ๐
โ ๐โ๐ยฏ๐ก๐ โ๐โ๐ผยฏ๐ก๐ ๐ผโ๐
+ ๐ ๐ผ
1โ๐โ๐๐กยฏ๐ ๐
โ ๐โ๐๐กยฏ๐ โ๐โ๐ผ๐กยฏ๐ ๐ผโ๐
(9.43) where๐กยฏ๐ =๐ก๐ โ๐ก๐ . Note that the bound(9.43)can be computed explicitly for given
ห
๐ฅ๐ andลห๐ at time๐ก =๐ก๐ as long asลห๐ โ Ciso(๐ ยฏiso) and๐ฅห๐ โ Csc(๐ ยฏsc). Its dominant term in(9.43)for large๐กยฏ๐ is๐ฟ๐๐/(๐sc(๐ก๐)๐ผ๐)due to the following relation:
lim
๐ก๐โโRHS of(9.43)= ๐ฟ๐๐ ๐sc(๐ก๐)๐ผ๐
.
The bound 9.27 of Theorem 9.1 can be used as a tool to determine whether we should use the SN-DNN terminal guidance policy of Lemma 9.1 or the SN-DNN min-norm control policy (9.17) of Lemma 9.2, based on the trade-off between them as to be seen in the following.
๐ขโ ๐ขโ
๐ข
ล(๐ก)
On-board Navigation
LAUNCH
2. CRUISE G&C 3. โONLINEโ TERMINAL G&C โ NEURAL-RENDEZVOUS : S/C
: ISO
๐ฆ ๐ก ๐ฅ ๐ก
SN-DNN Guidance ๐ขโ
On-board Navigation ๐ SN-DNN
Min-norm Control ๐ขโ
เท
๐ฅ ๐ก เทล ๐ก ๐ฅ ๐กเท , ลเท ๐ก ๐ฅ ๐กเท , ลเท ๐ก
๐ขโ
Ground-based Navigation
Spectrally-Normalized Deep Neural Network
(SN-DNN) { าง๐ฅ๐}๐=1๐๐
{ เดฅล๐}๐=1๐๐ { าง๐ก๐}๐=1๐๐ {๐ฅ าง๐ก๐}๐=1๐๐ { เดค๐๐}๐=1๐๐
Nonlinear Optimization- based Motion Planner
(MPC)
{๐ขmpc( าง๐ฅ๐, เดฅล๐, าง๐ก๐)}๐=1๐๐ {๐mpcาง๐ก๐+๐ฅ าง๐ก๐( าง๐ฅ๐, เดฅล๐, าง๐ก๐)}๐=1๐๐
{๐ขโ( าง๐ฅ๐, เดฅล๐, าง๐ก๐)}๐=1๐๐ target control input learned control input target state trajectory learned state trajectory
1. โOFFLINEโ SN-DNN TRAINING FOR FASTER ONLINE COMPUTATION
{๐โาง๐ก๐+๐ฅ าง๐ก๐( าง๐ฅ๐, เดฅล๐, าง๐ก๐)}๐=1๐๐ SN-DNN Loss Functions
MISSION DESIGN PHASE Input data for SN-DNN training Target data and SN-DNN outputs
Figure 9.7: Mission timeline for encountering an ISO, where ๐ฆ(๐ก) is the state measurement as in Fig. 9.2 and the other notation follows that of Lemma 9.1 and Theorem 9.1. Terminal G&C (Neural-Rendezvous) is performed using the SN- DNN guidance and min-norm control policies online as in Algorithm 1, thereby enabling fast response autonomous operations under the large ISO state uncertainty and high-velocity challenges. Note that the SN-DNN is trained offline.
9.4.II Neural-Rendezvous
We summarize the pros and cons of the aforementioned terminal G&C techniques.
โข The SN-DNN terminal guidance policy of Lemma 9.1 can be implemented in real-time and possesses an optimality gap as in (9.11), resulting in the near-
Algorithm 1:Neural-Rendezvous Inputs :๐ขโof Lemma 9.1 and๐ขโ
โ of Lemma 9.2 Outputs:Control input๐ขof (9.1) for๐ก โ [0, ๐ก๐] ๐ก0โcurrent time
ฮ๐กโcontrol time interval ๐ก๐, ๐ก๐, ๐กintโcurrent timeโ๐ก0 flagA, flagBโ0
while๐ก๐ < ๐ก๐ do
Obtain หล(๐ก๐)and ห๐ฅ(๐ก๐)using navigation technique ifflagA=0then
๐ขโ๐ขโ(๐ฅห(๐ก๐),ล(ห ๐ก๐), ๐ก๐) else
ifflagB=1then ๐ขโ๐ขโ
โ(๐ฅห(๐ก๐),ล(ห ๐ก๐), ๐ก๐) else
ComputeRHS of(9.27) in Theorem 9.1 with๐ก๐ =๐ก๐ ifRHS of(9.27)> ๐ก โ๐ ๐ ๐ โ๐๐ ๐then
๐ขโ๐ขโ(๐ฅห(๐ก๐),ล(ห ๐ก๐), ๐ก๐) else
๐ขโ๐ขโ
โ(๐ฅห(๐ก๐),ล(ห ๐ก๐), ๐ก๐) flagBโ1
Apply๐ขto (9.1) for๐กโ [๐ก๐, ๐ก๐+ฮ๐ก] whilecurrent timeโ๐ก0 < ๐ก๐+ฮ๐กdo
ifflagB=0then
Integrate dynamics to get๐ฅ๐andล๐of (9.14) form๐กint ifintegration is completethen
Update๐ฅ๐andล๐
๐ก๐, ๐กintโ๐ก๐+ฮ๐ก flagAโ1 else
๐กintโtime when S/C stopped integration ๐ก๐ โ๐ก๐+ฮ๐ก
optimal guarantee in terms of dynamic regret as discussed below (9.5) and in Lemma 9.1. However, obtaining any quantitative bound on the spacecraft delivery error with respect to the desired relative position, to either flyby or impact the ISO, is difficult in general [7].
โข In contrast, the SN-DNN min-norm control policy (9.17) of Lemma 9.2, which solves the problem 9.1 of Sec. 9.1, can also be implemented in real-time and provides an explicit upper-bound on the spacecraft delivery error that decreases in time as proven in Theorem (9.1). However, the desired trajectory it tracks is subject to the large state uncertainty initially for small๐ก๐ , resulting in a large optimality gap.
Based on these observations, it is ideal to utilize the SN-DNN terminal guidance policy without any feedback control initially for large state uncertainty, to avoid having a large optimality gap, then activate the SN-DNN min-norm control once the verifiable spacecraft delivery error of (9.27) of Theorem (9.1) becomes smaller than a mission-specific threshold value for the desired trajectory (9.14), which is to be updated at๐ก๐ โ [0, ๐ก๐] along the way as discussed in Remark 9.6. The pseudo- code for the proposed learning-based approach for encountering the ISO is given in Algorithm 1, and its mission timeline is shown in Fig. 9.7.
Again, the role of offline learning in Neural-Rendesvous, with the size of the neural network selected to be small enough for real-time implementation, is to deal with the limited computational resources of the spacecraft in solving any nonlinear opti- mization with non-negligible online computational load, including the one of MPC with receding horizons. This also enables utilizing Neural-Rendezvous in more challenging and highly nonlinear G&C scenarios as to be seen in Sec. 10.2, where solving even a single optimization online could become unrealistic.
Remark 9.7. We use ๐กint in Algorithm 1 to account for the fact that computing๐ฅ๐ could take more than ฮ๐ก for small ฮ๐ก and ๐ก๐ as pointed out in Remark 9.6, and thus the spacecraft is sometimes required to compute it over multiple time steps.
It can be seen that the SN-DNN terminal guidance policy ๐ขโ is also useful when the spacecraft does not possess ๐ฅ๐ yet, i.e., when flagA = 0. Also, the impact of discretization introduced in Algorithm 1 will be demonstrated in Sec. 10.1.III, where the detailed discussion of its connection to continuous-time stochastic systems can be found in [32].