
7.5 Collision Avoidance and Robust Tracking Augmentation

Figure 7.5: Trajectories for the learning-based planner (a), robust tube-based planner (b), LAG-ROS (c), and the offline centralized solution (d) (◦: start, •: goal). Axes: horizontal coordinate $p_x$ (m) versus vertical coordinate $p_y$ (m).

Figure 7.6: Detailed block diagram of machine learning-based control using contraction theory, including LAG-ROS, where Fig. 4.1, Fig. 7.1, and Fig. 7.4 are utilized as building blocks. (The diagram shows training data sampled within the tracking-error tube $\|x - x_d\| \leq r_\ell(t)$, with $r_\ell(t)$ the tube radius accounting for the learning error; the CV-STEM convex program, which computes offline a contraction metric $M$ and contracting robust control policy $u^*$ with stability and robustness guarantees; and the neural network policy $u_L(x, o, t)$ that imitates $u^*$ and is evaluated once per step online given the current state $x$, time $t$, and environment information $o$.)

as in Theorem 7.1 due to its internal contracting structure. This section presents an analytical framework for providing control-theoretic collision avoidance and robust tracking guarantees to given learned motion planning policies for nonlinear multi-agent systems, independently of the performance of the learning approach used in designing the learned policy. Although our approach can be extended further to handle general notions of safety, we focus on collision-free operation as the objective of safety. We call our approach CART (Collision Avoidance and Robust Tracking) [38], and its concept is summarized in the following.

7.5.I Augmenting Learned Policy with Safety and Robustness

Let us first recall that, as implied throughout this chapter, directly using the learned motion planning policy has the following two issues in practice: (i) even in nominal settings without external disturbance in (7.6) and (7.7), the system solution trajectories computed with the learned motion planning policy could violate safety requirements due to learning errors, and (ii) the learned policy lacks a formal mathematical guarantee of safety and robustness in the presence of external disturbance. Before going into details, let us see how we address these two problems analytically in real time for the general systems (7.6) and (7.7), optimally and independently of the performance of the learning method used in the learned motion planning policy.

Figure 7.7: Conceptual illustration of CART, showing the hierarchical combination of our safety filter and robust filter, where $\alpha_h, \alpha_V > 0$, $h \geq 0$ is a given safety function, $S$ is a given safe set, $S_\varepsilon$ is some fictitious set containing the learned trajectory $x_\ell$ with learning error $\varepsilon > 0$, $x_d$ is a safe trajectory, $x_r$ is a reference trajectory given by the global motion planner (7.8), $x$ is the actual state trajectory subject to disturbance, and $V$ is an incremental Lyapunov function for safe and robust trajectory tracking. The safety filter (1. safe trajectory generation) enforces $\dot{h} \geq -\alpha_h h$, where a larger $\alpha_h$ gives more robustness (stability of a set); the robust filter (2. robust tracking of the safe $x_d$) enforces $\dot{V} \leq -\alpha_V V$, where a larger $\alpha_V$ gives more robustness (incremental stability). Note that we use a log-barrier formulation in CART for its distributed implementation and analytical solution, but this figure uses $h$ for the simplicity of our concept description.

7.5.I-A Safety Filter and Built-in Robustness

Let $S = \{x \in \mathbb{R}^{2n} \mid h(x) \geq 0\}$ be a set defining safety. Given the learned policy $u_\ell^i$, we can slightly modify it using a safety filter (Control Barrier Function, CBF) to ensure the agents' safe operation via the constraint $\dot{h} \geq -\alpha_h h$, $\alpha_h > 0$ [39], even in the presence of learning error $\varepsilon > 0$ ($-\alpha_h h$ can be $-\alpha(h)$ for a class-$\mathcal{K}$ function $\alpha$). Intuitively, since the learning error $\varepsilon$ is expected to be small empirically, the contribution required of the safety filter is also expected to be small in practice, especially in the absence of external disturbance, as depicted in the left-hand side of Fig. 7.7.

Figure 7.8: Different sources of robustness against external disturbance: our safety filter is robust due to stability of the safe set $S = \{h \geq 0\}$ (pulling force toward $S$, with Lyapunov function $V = -h$ if $x \notin S$ and $V = 0$ otherwise, enforcing $\dot{h} \geq -\alpha_h h$), whereas our robust filter is robust due to incremental stability of the closed-loop system with respect to a target trajectory $x_d$ (pulling force toward $x_d$, with Lyapunov function $V = (x - x_d)^\top M (x - x_d)$, enforcing $\dot{V} \leq -\alpha_V V$).

Also, such a safety filter is robust, as can be shown using a Lyapunov function $V = -h$ for $x \notin S$ and $V = 0$ for $x \in S$ [40]. This implies that the safe set $S$ is rendered exponentially (asymptotically for class-$\mathcal{K}$ functions) stable, where the robustness results from the pulling force toward the safe set $S$. This force could be undesirably large, leading to a large tracking error, e.g., in real-world scenarios involving the discretization of the control and dynamics (see the left-hand side of Fig. 7.8).
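To make the safety-filter idea concrete, the following minimal sketch implements a single-constraint CBF filter that minimally modifies a learned input, assuming control-affine dynamics $\dot{x} = f(x) + g(x)u$ and one differentiable safety function $h$. The function name and closed-form solution are illustrative only; the actual CART safety filter uses the log-barrier formulation discussed below.

```python
import numpy as np

def cbf_safety_filter(u_learned, x, h, grad_h, f, g, alpha_h=1.0):
    """Minimally modify the learned input so that hdot >= -alpha_h * h.

    Assumes control-affine dynamics xdot = f(x) + g(x) u and a single
    differentiable safety function h (illustrative sketch, not the CART
    log-barrier formulation).
    """
    dh = grad_h(x)                    # gradient of h at x, shape (n,)
    Lfh = dh @ f(x)                   # drift term of hdot
    Lgh = dh @ g(x)                   # input-dependent term of hdot, shape (m,)
    residual = Lfh + Lgh @ u_learned + alpha_h * h(x)
    if residual >= 0.0 or np.dot(Lgh, Lgh) < 1e-12:
        return u_learned              # learned input already satisfies the constraint
    # Closed-form solution of  min ||u - u_learned||^2  s.t.  Lfh + Lgh u >= -alpha_h h
    return u_learned - residual / np.dot(Lgh, Lgh) * Lgh

# Example: single integrator xdot = u, stay outside the unit disk (h(x) = ||x||^2 - 1).
f = lambda x: np.zeros(2)
g = lambda x: np.eye(2)
h = lambda x: float(x @ x - 1.0)
grad_h = lambda x: 2.0 * x
u_safe = cbf_safety_filter(np.array([-1.0, 0.0]), np.array([1.2, 0.0]), h, grad_h, f, g)
```

In the example, the learned input pointing straight at the obstacle is scaled back just enough that $\dot{h} = -\alpha_h h$ holds with equality, which is the "small contribution" behavior described above.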

7.5.I-B Robust Filter and Tracking-based Robustness

Instead of handling both safety and robustness just by the safety filter, we can further utilize a robust filter hierarchically to mitigate the burden of the safety filter in dealing with the disturbance, i.e., to robustly track the safe trajectory $x_d$ slightly modified from the reference trajectory $x_r$ by the safety filter, as depicted in the right-hand side of Fig. 7.7. We still use a Lyapunov formulation as in the safety filter for robustness, but now the Lyapunov function is defined incrementally as $V = (x - x_d)^\top M (x - x_d)$, where $M \succ 0$ is to be defined in the subsequent sections.
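To make the incremental-stability notion of robustness concrete, the following standard bound is a sketch under assumptions not stated in this section: uniform metric bounds $\underline{m} I \preceq M \preceq \overline{m} I$, an additive disturbance with $\|d(t)\| \leq \bar{d}$, and a nominal closed loop achieving $\dot{V} \leq -\alpha_V V$.

```latex
\begin{align}
  \dot{V} &\le -\alpha_V V + 2 (x - x_d)^\top M d
           \;\le\; -\alpha_V V + 2 \sqrt{V}\,\sqrt{\overline{m}}\,\bar{d}, \\
  \dot{W} &\le -\tfrac{\alpha_V}{2} W + \sqrt{\overline{m}}\,\bar{d},
  \qquad W := \sqrt{V}, \\
  \|x(t) - x_d(t)\| &\le \frac{W(t)}{\sqrt{\underline{m}}}
  \le \sqrt{\tfrac{\overline{m}}{\underline{m}}}\,\|x(0) - x_d(0)\|\,e^{-\alpha_V t/2}
   + \tfrac{2}{\alpha_V}\sqrt{\tfrac{\overline{m}}{\underline{m}}}\,\bar{d}\,
     \bigl(1 - e^{-\alpha_V t/2}\bigr).
\end{align}
```

The steady-state tracking error thus scales as $\bar{d}/\alpha_V$, consistent with the "larger $\alpha_V$, more robust" annotation of Fig. 7.7.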

Improving the robustness performance here will simply result in closer tracking of the safe trajectory $x_d$ (see the right-hand side of Fig. 7.8) without losing too much information of the learned motion planning policy $u_\ell^i$. This is mainly because the safety is handled indirectly with $\dot{h} \geq -\alpha_h h$ in the safety filter and directly with $h \geq 0$ in the robust filter.

These observations imply that

(a) when the learning error is much larger than the size of the external disturbance, we can use the safety filter and its built-in robustness, and

(b) when the learning error is much smaller than the size of the external disturbance, which is often the case, we can modify the learned policy slightly with the safety filter in a nominal setting, handling the disturbance with the tracking-based robust filter in place of the safety filter.

Note that we use the log-barrier formulation from now on instead of $\dot{h} \geq -\alpha_h h$, which allows for a distributed implementation of our safety filter in a multi-agent setting.
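For intuition only, the sketch below shows one generic way a log-barrier on pairwise collision-avoidance functions produces a per-agent correction that depends only on neighboring agents, and hence can be evaluated in a distributed manner. The function name, the quadratic choice of $h_{ij}$, and the gain are illustrative assumptions; the exact log-barrier safety filter of CART [38] differs in its details and guarantees.

```python
import numpy as np

def log_barrier_correction(p_i, neighbor_positions, r_safe=0.5, gain=0.1):
    """Gradient-based correction from a log-barrier on pairwise safety functions.

    Uses h_ij(p) = ||p_i - p_j||^2 - r_safe^2 and the barrier term
    -gain * sum_j log h_ij; its negative gradient w.r.t. p_i pushes agent i
    away from nearby agents using only locally available neighbor states.
    (Illustrative only; not the exact CART formulation.)
    """
    correction = np.zeros_like(p_i)
    for p_j in neighbor_positions:
        diff = p_i - p_j
        h_ij = diff @ diff - r_safe**2
        if h_ij <= 0.0:
            raise ValueError("pairwise safety already violated")
        # -d/dp_i [ -gain * log h_ij ] = gain * 2 * diff / h_ij
        correction += gain * 2.0 * diff / h_ij
    return correction
```

Each agent can add such a correction to its learned input using only its neighbors' states, which is the sense in which the log-barrier form admits a distributed implementation.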

7.5.I-C Relationship with Existing Methods

The nonlinear robustness and stability can also be analyzed using a Lyapunov function, which gives a finite tracking error with respect to a given target trajectory in the presence of external disturbances, including approximation errors of given learned motion planning policies. This property can be used further to establish a safety guarantee, by utilizing a conservative constraint that a tube around the computed target trajectory will not violate a given safety requirement [17], [18], [20], [41], [42]. This framework thus provides one way to ensure safety and robustness as in the LAG-ROS framework with Lemma 7.1 (see also [43]–[45] and references therein), which depends on the knowledge and the size of the approximation error of a given learned motion planning policy. Such information could be conservative for previously unseen data or available only empirically (e.g., by using a Lipschitz bound [46]).
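As a concrete instance of such a tube-based constraint (a sketch under assumed quantities, not the exact conditions of [17], [18], [20], [41], [42]): suppose the tracking error is bounded by a tube radius $\bar{r}$ and $h$ is $L_h$-Lipschitz.

```latex
% Sketch: assume \|x(t) - x_d(t)\| \le \bar{r} for all t and that h is L_h-Lipschitz.
\begin{equation}
  h(x_d(t)) \ge L_h \bar{r} \;\; \forall t
  \quad \Longrightarrow \quad
  h(x(t)) \ge h(x_d(t)) - L_h \|x(t) - x_d(t)\| \ge 0,
\end{equation}
% so the entire tube around the target trajectory x_d stays in the safe set S,
% at the cost of conservatism proportional to \bar{r}, which itself may be known
% only conservatively when it must account for the learning error.
```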

The CLF-CBF control [39], [40], [47] also considers safety and robustness in real time without any knowledge of the learned motion planning errors (see also [30], [48]–[50] for stochastic and higher-order systems and [51]–[54] for learning-based CBF methods), and without solving nonlinear optimization as in MPC-based methods for robustness and safety [41]. In our context, it constructs an optimal control input by solving a QP to minimize its deviation from the learned motion planning policy, subject to the safety constraint $\dot{h} \geq -\alpha_h h$ and the relaxed incremental stability constraint $\dot{V} \leq -\alpha_V V + \rho$, where $\rho$ is a slack variable for QP feasibility and $V$ is now defined as $V = (x - x_\ell)^\top M (x - x_\ell)$ for the learned trajectory $x_\ell$ of Fig. 7.7.
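For reference, a minimal sketch of such a CLF-CBF QP is given below, assuming control-affine dynamics with the Lie derivatives of $h$ and $V$ precomputed at the current state; the variable and function names are illustrative, and the exact constraint forms in [39], [40], [47] differ in details.

```python
import cvxpy as cp
import numpy as np

def clf_cbf_qp(u_learned, Lfh, Lgh, h_val, LfV, LgV, V_val,
               alpha_h=1.0, alpha_V=1.0, slack_weight=1e3):
    """CLF-CBF QP sketch: stay close to the learned input subject to
    hdot >= -alpha_h * h   (safety, hard constraint) and
    Vdot <= -alpha_V * V + rho   (stability, relaxed by slack rho >= 0).
    Lie derivatives (Lfh, Lgh, LfV, LgV) are assumed precomputed at the state.
    """
    m = u_learned.shape[0]
    u = cp.Variable(m)
    rho = cp.Variable(nonneg=True)
    objective = cp.Minimize(cp.sum_squares(u - u_learned) + slack_weight * rho)
    constraints = [
        Lfh + Lgh @ u >= -alpha_h * h_val,        # CBF safety constraint
        LfV + LgV @ u <= -alpha_V * V_val + rho,  # relaxed CLF constraint
    ]
    cp.Problem(objective, constraints).solve()
    return u.value, rho.value
```

In contrast, CART's filters admit analytical solutions, so no online QP solve is needed; this difference is elaborated in item (b) below.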

We list key differences between the CLF-CBF controller and our approach, CART:

(a) The primary distinction between CART and CLF-CBF is the direction in which the respective tracking component steers the system. The CLF-CBF stability component steers toward the learned trajectory, which, because of learning error, might not be safe. In contrast, CART's robust filter steers the system toward a certified safe trajectory generated by the previous layer in the hierarchy, the safety filter. Because of this distinction, CART can safely reject large disturbances with a large gain $\alpha_V$ in its robust filter, whereas this strategy is not practical for the CLF-CBF controller, which is forced to reject disturbances with a large safety gain $\alpha_h$, pulling the system overconservatively toward the interior of the safe set.

(b) The secondary distinction between CART and CLF-CBF is that CLF-CBF requires solving a QP with a given Lyapunov function, while CART provides an explicit way to construct the incremental Lyapunov function using contraction theory and gives an analytical solution for the optimal control input. This makes CART end-to-end trainable and faster to evaluate online.

CART proposes a hierarchical approach that combines the best of both of these methods for safety and robustness, by performing contraction theory-based robust tracking of a provably collision-free trajectory generated by a safety filter applied to the learned policy. The additional tracking-based robust filter reduces the burden of the safety filter in dealing with disturbances. CART can also be viewed as a generalization of the frameworks [2], [55] to multi-agent Lagrangian and general control-affine nonlinear systems with deterministic and stochastic disturbances, constructed on top of the end-to-end learned motion planning policy and augmented with the optimality guarantee of the analytical solution to our safety and robust filters.
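Putting the two layers together, the pseudocode-style sketch below illustrates the hierarchical structure; the helper functions are placeholders standing in for the learned planner, safety filter, and robust filter described above, not the actual CART implementation.

```python
def cart_step(x, t, obs, learned_policy, safety_filter, robust_filter):
    """One control step of the hierarchical CART structure (illustrative only).

    1. Query the learned motion planner for a nominal (possibly unsafe) plan.
    2. Safety filter: minimally modify the plan so the target trajectory is safe.
    3. Robust filter: contraction-based tracking of the safe target, rejecting
       external disturbances on behalf of the safety filter.
    """
    u_learned, x_learned = learned_policy(x, obs, t)        # nominal learned plan
    u_d, x_d = safety_filter(u_learned, x_learned, obs, t)  # certified-safe target
    u = robust_filter(x, x_d, u_d, t)                       # robust tracking input
    return u
```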

The trade-off of Sec. 7.5.I-B and the strengths implied in Sec. 7.5.I-C will be demonstrated in Chapter 10.