Chapter 7: Learning-based Safe and Robust Motion Planning
7.5 Collision Avoidance and Robust Tracking Augmentation
Figure 7.5: Trajectories for the learning-based planner (a), robust tube-based planner (b), LAG-ROS (c), and offline centralized solution (d) (◦: start, •: goal). Axes show the horizontal coordinate $p_x$ (m) and the vertical coordinate $p_y$ (m).
Figure 7.6: Detailed block diagram of machine learning-based control using contraction theory, including LAG-ROS, where Fig. 4.1, Fig. 7.1, and Fig. 7.4 are utilized as building blocks. (Blocks shown: training data sampling of states $x$ within the bounded error tube around the target state $x_d$, accounting for the learning error; CV-STEM robust control, which provides the contracting policy $u^{*}$, the contraction metric, and the reference signal via a convex program with performance guarantees, for objectives such as control, estimation, system identification, and guidance; and the neural network policy $u_L(x, o, t)$, computed offline and evaluated once online, where $o$ denotes the environment information and stability and robustness guarantees follow from contraction theory.)
as in Theorem 7.1 due to its internal contracting structure. This section presents an analytical framework for providing control-theoretic collision avoidance and robust tracking guarantees for given learned motion planning policies for nonlinear multi-agent systems, independently of the performance of the learning approach used in designing the learned policy. Although our approach can be extended to handle more general notions of safety, we focus on collision-free operation as the objective of safety. We call our approach CART (Collision Avoidance and Robust Tracking) [38]; its concept is summarized in what follows.
7.5.I Augmenting Learned Policy with Safety and Robustness
Let us first recall that, as implied throughout this chapter, directly using the learned motion planning policy has the following two issues in practice: (i) even in nominal settings without the external disturbance in (7.6) and (7.7), the system solution trajectories computed with the learned motion planning policy could violate safety requirements due to learning errors, and (ii) the learned policy lacks a formal mathematical guarantee of safety and robustness in the presence of external disturbance. Before going into details, let us see how we address these two problems analytically in real time for the general systems (7.6) and (7.7), optimally and independently of the performance of the learning method used in the learned motion planning policy.

Figure 7.7: Conceptual illustration of CART, showing the hierarchical combination of our safety filter and robust filter, where $\alpha_h, \alpha_s > 0$, $h \ge 0$ is a given safety function, $\mathcal{S}$ is a given safe set, $\mathcal{S}_{\ell}$ is a fictitious set containing the learned trajectory $x_{\ell}$ with learning error $\epsilon > 0$, $x_s$ is a safe trajectory, $x_d$ is a reference trajectory given by the global motion planner (7.8), $x$ is the actual state trajectory subject to disturbance, and $V$ is an incremental Lyapunov function for safe and robust trajectory tracking. (1. Safe trajectory generation: the learned motion planning pair $(x_{\ell}, u_{\ell})$ is filtered into the safe state trajectory $x_s$, using $\dot{h} \ge -\alpha_h h$, where a larger $\alpha_h$ means more robustness via stability of a set; 2. Robust tracking of the safe $x_s$ under disturbance, using $\dot{V} \le -\alpha_s V$, where a larger $\alpha_s$ means more robustness via incremental stability.) Note that we use a log-barrier formulation in CART for its distributed implementation and analytical solution, but this figure uses $h$ for the simplicity of our concept description.
7.5.I-A Safety Filter and Built-in Robustness
Let $\mathcal{S} = \{x \in \mathbb{R}^{2n} \mid h(x) \ge 0\}$ be a set defining safety. Given the learned policy $u_{\ell}$, we can slightly modify it using a safety filter (a Control Barrier Function, CBF) to ensure the agents' safe operation via the constraint $\dot{h} \ge -\alpha_h h$, $\alpha_h > 0$ [39], even in the presence of the learning error $\epsilon > 0$ ($-\alpha_h h$ can be $-\alpha(h)$ for a class-$\mathcal{K}$ function $\alpha$). Intuitively, since the learning error $\epsilon$ is expected to be small empirically, the contribution required from the safety filter is also expected to be small in practice, especially in the absence of external disturbance, as depicted on the left-hand side of Fig. 7.7.

Also, such a safety filter is itself robust, as can be shown using the Lyapunov function $V = -h$ for $x \notin \mathcal{S}$ and $V = 0$ for $x \in \mathcal{S}$ [40]. This implies that the safe set $\mathcal{S}$ is rendered exponentially stable (asymptotically stable for class-$\mathcal{K}$ functions), where the robustness results from the pulling force toward the safe set $\mathcal{S}$. This force could be undesirably large, leading to a large tracking error, e.g., in real-world scenarios involving discretization of the control and dynamics (see the left-hand side of Fig. 7.8).

Figure 7.8: Different sources of robustness against external disturbance, where our safety filter is robust due to the stability of a safe set (left: $\dot{h} \ge -\alpha_h h$, pulling force toward $\mathcal{S} = \{h \ge 0\}$) and our robust filter is robust due to the incremental stability of the closed-loop system with respect to a target trajectory (right: $\dot{V} \le -\alpha_s V$ with $V = (x - x_s)^{\top} M (x - x_s)$, pulling force toward $x_s$).
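To make the safety-filter idea concrete, the following is a minimal Python sketch, assuming control-affine dynamics $\dot{x} = f(x) + g(x)u$ and a known safety function $h$ with its gradient; the function and variable names are illustrative, and this min-norm quadratic-program form is the standard CBF filter, not the log-barrier formulation CART actually uses (introduced at the end of Sec. 7.5.I-B and developed later). With a single affine constraint, minimally perturbing the learned input reduces to a half-space projection, which is why such filters admit analytical solutions.

```python
import numpy as np

def cbf_safety_filter(x, u_learned, f, g, h, grad_h, alpha_h=1.0):
    """Minimally modify a learned input so that dh/dt >= -alpha_h * h(x).

    For control-affine dynamics xdot = f(x) + g(x) @ u, the CBF condition
        grad_h(x) @ (f(x) + g(x) @ u) + alpha_h * h(x) >= 0
    is affine in u, so the min-norm QP  min ||u - u_learned||^2  subject to
    this single constraint has the closed-form half-space projection below.
    """
    a = grad_h(x) @ g(x)                      # constraint coefficient (row) w.r.t. u
    b = grad_h(x) @ f(x) + alpha_h * h(x)     # constraint offset
    slack = a @ u_learned + b                 # >= 0 means the learned input is already safe
    if slack >= 0.0 or np.allclose(a, 0.0):
        return u_learned
    return u_learned - (slack / (a @ a)) * a  # smallest correction restoring a @ u + b = 0


# Toy usage: single integrator xdot = u with a keep-out ball of radius 1 at the origin.
f = lambda x: np.zeros(2)
g = lambda x: np.eye(2)
h = lambda x: x @ x - 1.0                     # h >= 0 outside the obstacle
grad_h = lambda x: 2.0 * x
x = np.array([1.2, 0.0])
u_learned = np.array([-1.0, 0.0])             # learned policy heads straight at the obstacle
u_safe = cbf_safety_filter(x, u_learned, f, g, h, grad_h, alpha_h=2.0)
```

The design point illustrated here is that the filter leaves the learned input untouched whenever it already satisfies the barrier condition, so a small learning error translates into a small correction.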
7.5.I-B Robust Filter and Tracking-based Robustness
Instead of handling both safety and robustness by the safety filter alone, we can further utilize a robust filter hierarchically to mitigate the burden of the safety filter in dealing with the disturbance, i.e., to robustly track the safe trajectory $x_s$ obtained by slightly modifying the reference trajectory $x_d$ with the safety filter, as depicted on the right-hand side of Fig. 7.7. We still use a Lyapunov formulation for robustness as in the safety filter, but now the Lyapunov function is defined incrementally as $V = (x - x_s)^{\top} M (x - x_s)$, where $M \succ 0$ is to be defined in the subsequent sections.
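For completeness, here is a matching sketch of the tracking-based robust filter, again illustrative only: it assumes control-affine dynamics, a constant positive definite matrix $M$ standing in for the contraction metric constructed in the following sections, and access to the safe trajectory $x_s$ and its time derivative; the function and variable names are ours. The condition $\dot{V} \le -\alpha_s V$ is once more a single constraint affine in the control input, so a min-norm correction admits the same kind of closed-form projection.

```python
import numpy as np

def robust_tracking_filter(x, u_nom, x_s, xdot_s, f, g, M, alpha_s=1.0):
    """Minimally modify a nominal input so that Vdot <= -alpha_s * V for the
    incremental Lyapunov function V = (x - x_s)^T M (x - x_s).

    Assumes xdot = f(x) + g(x) @ u and a constant positive definite M; a
    state-dependent contraction metric M(x, t) would add dM/dt terms to Vdot.
    """
    e = x - x_s
    V = e @ M @ e
    a = 2.0 * e @ M @ g(x)                          # coefficient of u in Vdot
    c = -2.0 * e @ M @ (f(x) - xdot_s) - alpha_s * V
    if a @ u_nom <= c or np.allclose(a, 0.0):
        return u_nom                                # nominal input already contracts toward x_s
    return u_nom - ((a @ u_nom - c) / (a @ a)) * a  # smallest correction achieving a @ u = c
```

In CART, this role is played by the contraction-theory-based controller with its analytical optimal solution, rather than by this generic projection.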
Improving the robustness performance here simply results in closer tracking of the safe trajectory $x_s$ (see the right-hand side of Fig. 7.8) without losing too much information of the learned motion planning policy $u_{\ell}$. This is mainly because safety is handled indirectly, with $\dot{h} \ge -\alpha_h h$, in the safety filter, and directly, with $h \ge 0$ along the already-filtered safe trajectory, in the robust filter.
These observations imply that
(a) when the learning error is much larger than the external disturbance, we can rely on the safety filter and its built-in robustness, and
(b) when the learning error is much smaller than the external disturbance, which is often the case, we can modify the learned policy slightly with the safety filter in a nominal setting and handle the disturbance using the tracking-based robust filter on behalf of the safety filter.
Note that, from now on, we use the log-barrier formulation instead of $\dot{h} \ge -\alpha_h h$, which allows for the distributed implementation of our safety filter in a multi-agent setting.
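For intuition only (the precise CART formulation and its analytical solution are developed in the sections that follow), note that in the interior of $\mathcal{S}$ the condition $\dot{h} \ge -\alpha_h h$ can equivalently be written through the logarithm,
\[
\frac{d}{dt} \log h(x(t)) = \frac{\dot{h}}{h} \ge -\alpha_h, \qquad h(x) > 0,
\]
and with pairwise safety functions $h_{jk}$ between agents $j$ and $k$, a barrier of the form $\sum_{(j,k)} \log h_{jk}(x)$ splits into terms that each agent can evaluate using only its own and its neighbors' states, which is the kind of structure that makes a distributed implementation possible.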
7.5.I-C Relationship with Existing Methods
The nonlinear robustness and stability can also be analyzed using a Lyapunov function, which gives a finite tracking error with respect to a given target trajectory in the presence of external disturbances, including the approximation errors of given learned motion planning policies. This property can be used further to establish a safety guarantee by utilizing a conservative constraint that a tube around the computed target trajectory will not violate a given safety requirement [17], [18], [20], [41], [42]. This framework thus provides one way to ensure safety and robustness, as in the LAG-ROS framework with Lemma 7.1 (see also [43]–[45] and references therein), which depends on the knowledge and the size of the approximation error of a given learned motion planning policy. Such information could be conservative for previously unseen data or available only empirically (e.g., by using a Lipschitz bound [46]).
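As a concrete illustration of such a tube-based constraint (with the tube radius $\bar{r}$ and Lipschitz constant $L_h$ introduced here only for this example), if robust tracking guarantees $\|x(t) - x_d(t)\| \le \bar{r}$ for all $t$ and $h$ is Lipschitz with constant $L_h$, then imposing the tightened condition
\[
h(x_d(t)) \ge L_h\,\bar{r} \quad \text{for all } t
\]
on the planned trajectory implies $h(x(t)) \ge h(x_d(t)) - L_h \|x(t) - x_d(t)\| \ge 0$ along the actual trajectory, at the cost of the conservatism discussed above.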
The CLF-CBF control [39], [40], [47] also considers safety and robustness in real time without any knowledge of the learned motion planning errors (see also [30], [48]–[50] for stochastic and higher-order systems and [51]–[54] for learning-based CBF methods), and without solving nonlinear optimization as in MPC-based methods for robustness and safety [41]. In our context, it constructs an optimal control input by solving a QP that minimizes its deviation from the learned motion planning policy, subject to the safety constraint $\dot{h} \ge -\alpha_h h$ and the relaxed incremental stability constraint $\dot{V} \le -\alpha_s V + \epsilon$, where $\epsilon$ is a slack variable for QP feasibility and $V$ is now defined as $V = (x - x_{\ell})^{\top} M (x - x_{\ell})$ for the learned trajectory $x_{\ell}$ of Fig. 7.7.
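Schematically (the slack penalty weight $c > 0$ shown here is one common way to handle $\epsilon$ and is not necessarily the exact form used in [39], [40], [47]), the resulting CLF-CBF quadratic program reads
\[
\min_{u,\,\epsilon}\ \|u - u_{\ell}\|^{2} + c\,\epsilon^{2}
\quad \text{s.t.}\quad
\dot{h}(x,u) \ge -\alpha_h h(x), \qquad
\dot{V}(x,u) \le -\alpha_s V(x) + \epsilon,
\]
with $V$, $\alpha_h$, and $\alpha_s$ as defined above.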
We list key differences between the CLF-CBF controller and our approach, CART:
(a) The primary distinction between CART and CLF-CBF is the direction in which the respective tracking component steers the system. The CLF-CBF stability component steers the system toward the learned trajectory, which, because of learning error, might not be safe. In contrast, CART's robust filter steers the system toward a certified safe trajectory, generated by the previous layer in the hierarchy, the safety filter. Because of this distinction, whereas CART can safely reject large disturbances with a large gain $\alpha_s$ in the robust filter, this strategy is not practical for the CLF-CBF controller, which is forced to reject disturbances with a large safety gain $\alpha_h$, pulling the system overconservatively toward the interior of the safe set.
(b) The secondary distinction between CART and CLF-CBF is that CLF-CBF requires solving a QP with a given Lyapunov function, while CART provides an explicit way to construct the incremental Lyapunov function using contraction theory and gives an analytical solution for the optimal control input. This makes CART end-to-end trainable with a faster evaluation time.
CART proposes a hierarchical approach that combines the best of both of these methods for safety and robustness, by performing contraction theory-based robust tracking of a provably collision-free trajectory generated by applying a safety filter to the learned policy. The additional tracking-based robust filter reduces the burden of the safety filter in dealing with disturbances. CART can also be viewed as a generalization of the frameworks [2], [55] to multi-agent Lagrangian and general control-affine nonlinear systems with deterministic and stochastic disturbances, constructed on top of the end-to-end learned motion planning policy and augmented with the optimality guarantee of the analytical solution to our safety and robust filters.
The trade-off of Sec. 7.5.I-B and the strengths implied in Sec. 7.5.I-C will be demonstrated in Chapter 10.