Chapter 7: Learning-based Safe and Robust Motion Planning
7.5 Collision Avoidance and Robust Tracking Augmentation
Figure 7.5: Trajectories for the learning-based planner (a), robust tube-based planner (b), LAG-ROS (c), and offline centralized solution (d) (◦: start, •: goal). Axes show the horizontal coordinate $p_x$ (m) and the vertical coordinate $p_y$ (m).
Figure 7.6: Detailed block diagram of machine learning-based control using contraction theory, including LAG-ROS, where Fig. 4.1, Fig. 7.1, and Fig. 7.4 are utilized as building blocks. (Blocks shown: training data sampling of states $x$ within the bounded error tube around the target state $x_d$, accounting for the learning error; CV-STEM robust control, which provides the contracting policy $u^{*}$, the contraction metric, and the reference signal via a convex program with performance guarantees, for objectives such as control, estimation, system identification, and guidance; and the neural network policy $u_L(x, o, t)$, computed offline and evaluated once online, where $o$ denotes the environment information and stability and robustness guarantees follow from contraction theory.)
as in Theorem 7.1 due to its internal contracting structure. This section presents an analytical framework for providing control-theoretic collision avoidance and robust tracking guarantees for given learned motion planning policies for nonlinear multi-agent systems, independently of the performance of the learning approach used in designing the learned policy. Although our approach can be extended to handle more general notions of safety, we focus on collision-free operation as the objective of safety. We call our approach CART (Collision Avoidance and Robust Tracking) [38]; its concept is summarized in what follows.
7.5.I Augmenting Learned Policy with Safety and Robustness
Let us first recall that, as implied throughout this chapter, directly using the learned motion planning policy has the following two issues in practice: (i) even in nominal settings without the external disturbance in (7.6) and (7.7), the system solution trajectories computed with the learned motion planning policy could violate safety requirements due to learning errors, and (ii) the learned policy lacks a formal mathematical guarantee of safety and robustness in the presence of external disturbance. Before going into details, let us see how we address these two problems analytically in real time for the general systems (7.6) and (7.7), optimally and independently of the performance of the learning method used in the learned motion planning policy.

Figure 7.7: Conceptual illustration of CART, showing the hierarchical combination of our safety filter and robust filter, where $\alpha_h, \alpha_s > 0$, $h \ge 0$ is a given safety function, $\mathcal{S}$ is a given safe set, $\mathcal{S}_{\ell}$ is a fictitious set containing the learned trajectory $x_{\ell}$ with learning error $\epsilon > 0$, $x_s$ is a safe trajectory, $x_d$ is a reference trajectory given by the global motion planner (7.8), $x$ is the actual state trajectory subject to disturbance, and $V$ is an incremental Lyapunov function for safe and robust trajectory tracking. (1. Safe trajectory generation: the learned motion planning pair $(x_{\ell}, u_{\ell})$ is filtered into the safe state trajectory $x_s$, using $\dot{h} \ge -\alpha_h h$, where a larger $\alpha_h$ means more robustness via stability of a set; 2. Robust tracking of the safe $x_s$ under disturbance, using $\dot{V} \le -\alpha_s V$, where a larger $\alpha_s$ means more robustness via incremental stability.) Note that we use a log-barrier formulation in CART for its distributed implementation and analytical solution, but this figure uses $h$ for the simplicity of our concept description.
7.5.I-A Safety Filter and Built-in Robustness
Let $\mathcal{S} = \{x \in \mathbb{R}^{2n} \mid h(x) \ge 0\}$ be a set defining safety. Given the learned policy $u_{\ell}$, we can slightly modify it using a safety filter (a Control Barrier Function, CBF) to ensure the agents' safe operation via the constraint $\dot{h} \ge -\alpha_h h$, $\alpha_h > 0$ [39], even in the presence of the learning error $\epsilon > 0$ ($-\alpha_h h$ can be $-\alpha(h)$ for a class-$\mathcal{K}$ function $\alpha$). Intuitively, since the learning error $\epsilon$ is expected to be small empirically, the contribution required from the safety filter is also expected to be small in practice, especially in the absence of external disturbance, as depicted on the left-hand side of Fig. 7.7.

Also, such a safety filter is itself robust, as can be shown using the Lyapunov function $V = -h$ for $x \notin \mathcal{S}$ and $V = 0$ for $x \in \mathcal{S}$ [40]. This implies that the safe set $\mathcal{S}$ is rendered exponentially stable (asymptotically stable for class-$\mathcal{K}$ functions), where the robustness results from the pulling force toward the safe set $\mathcal{S}$. This force could be undesirably large, leading to a large tracking error, e.g., in real-world scenarios involving discretization of the control and dynamics (see the left-hand side of Fig. 7.8).

Figure 7.8: Different sources of robustness against external disturbance, where our safety filter is robust due to the stability of a safe set (left: $\dot{h} \ge -\alpha_h h$, pulling force toward $\mathcal{S} = \{h \ge 0\}$) and our robust filter is robust due to the incremental stability of the closed-loop system with respect to a target trajectory (right: $\dot{V} \le -\alpha_s V$ with $V = (x - x_s)^{\top} M (x - x_s)$, pulling force toward $x_s$).
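To make the safety-filter idea concrete, the following is a minimal Python sketch, assuming control-affine dynamics $\dot{x} = f(x) + g(x)u$ and a known safety function $h$ with its gradient; the function and variable names are illustrative, and this min-norm quadratic-program form is the standard CBF filter, not the log-barrier formulation CART actually uses (introduced at the end of Sec. 7.5.I-B and developed later). With a single affine constraint, minimally perturbing the learned input reduces to a half-space projection, which is why such filters admit analytical solutions.

```python
import numpy as np

def cbf_safety_filter(x, u_learned, f, g, h, grad_h, alpha_h=1.0):
    """Minimally modify a learned input so that dh/dt >= -alpha_h * h(x).

    For control-affine dynamics xdot = f(x) + g(x) @ u, the CBF condition
        grad_h(x) @ (f(x) + g(x) @ u) + alpha_h * h(x) >= 0
    is affine in u, so the min-norm QP  min ||u - u_learned||^2  subject to
    this single constraint has the closed-form half-space projection below.
    """
    a = grad_h(x) @ g(x)                      # constraint coefficient (row) w.r.t. u
    b = grad_h(x) @ f(x) + alpha_h * h(x)     # constraint offset
    slack = a @ u_learned + b                 # >= 0 means the learned input is already safe
    if slack >= 0.0 or np.allclose(a, 0.0):
        return u_learned
    return u_learned - (slack / (a @ a)) * a  # smallest correction restoring a @ u + b = 0


# Toy usage: single integrator xdot = u with a keep-out ball of radius 1 at the origin.
f = lambda x: np.zeros(2)
g = lambda x: np.eye(2)
h = lambda x: x @ x - 1.0                     # h >= 0 outside the obstacle
grad_h = lambda x: 2.0 * x
x = np.array([1.2, 0.0])
u_learned = np.array([-1.0, 0.0])             # learned policy heads straight at the obstacle
u_safe = cbf_safety_filter(x, u_learned, f, g, h, grad_h, alpha_h=2.0)
```

The design point illustrated here is that the filter leaves the learned input untouched whenever it already satisfies the barrier condition, so a small learning error translates into a small correction.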
7.5.I-B Robust Filter and Tracking-based Robustness
Instead of handling both safety and robustness by the safety filter alone, we can further utilize a robust filter hierarchically to mitigate the burden of the safety filter in dealing with the disturbance, i.e., to robustly track the safe trajectory $x_s$ obtained by slightly modifying the reference trajectory $x_d$ with the safety filter, as depicted on the right-hand side of Fig. 7.7. We still use a Lyapunov formulation for robustness as in the safety filter, but now the Lyapunov function is defined incrementally as $V = (x - x_s)^{\top} M (x - x_s)$, where $M \succ 0$ is to be defined in the subsequent sections.
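For completeness, here is a matching sketch of the tracking-based robust filter, again illustrative only: it assumes control-affine dynamics, a constant positive definite matrix $M$ standing in for the contraction metric constructed in the following sections, and access to the safe trajectory $x_s$ and its time derivative; the function and variable names are ours. The condition $\dot{V} \le -\alpha_s V$ is once more a single constraint affine in the control input, so a min-norm correction admits the same kind of closed-form projection.

```python
import numpy as np

def robust_tracking_filter(x, u_nom, x_s, xdot_s, f, g, M, alpha_s=1.0):
    """Minimally modify a nominal input so that Vdot <= -alpha_s * V for the
    incremental Lyapunov function V = (x - x_s)^T M (x - x_s).

    Assumes xdot = f(x) + g(x) @ u and a constant positive definite M; a
    state-dependent contraction metric M(x, t) would add dM/dt terms to Vdot.
    """
    e = x - x_s
    V = e @ M @ e
    a = 2.0 * e @ M @ g(x)                          # coefficient of u in Vdot
    c = -2.0 * e @ M @ (f(x) - xdot_s) - alpha_s * V
    if a @ u_nom <= c or np.allclose(a, 0.0):
        return u_nom                                # nominal input already contracts toward x_s
    return u_nom - ((a @ u_nom - c) / (a @ a)) * a  # smallest correction achieving a @ u = c
```

In CART, this role is played by the contraction-theory-based controller with its analytical optimal solution, rather than by this generic projection.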
Improving the robustness performance here simply results in closer tracking of the safe trajectory $x_s$ (see the right-hand side of Fig. 7.8) without losing too much information of the learned motion planning policy $u_{\ell}$. This is mainly because safety is handled indirectly, with $\dot{h} \ge -\alpha_h h$, in the safety filter, and directly, with $h \ge 0$ along the already-filtered safe trajectory, in the robust filter.
These observations imply that
(a) when the learning error is much larger than the external disturbance, we can rely on the safety filter and its built-in robustness, and
(b) when the learning error is much smaller than the external disturbance, which is often the case, we can modify the learned policy slightly with the safety filter in a nominal setting and handle the disturbance using the tracking-based robust filter on behalf of the safety filter.
Note that, from now on, we use the log-barrier formulation instead of $\dot{h} \ge -\alpha_h h$, which allows for the distributed implementation of our safety filter in a multi-agent setting.
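For intuition only (the precise CART formulation and its analytical solution are developed in the sections that follow), note that in the interior of $\mathcal{S}$ the condition $\dot{h} \ge -\alpha_h h$ can equivalently be written through the logarithm,
\[
\frac{d}{dt} \log h(x(t)) = \frac{\dot{h}}{h} \ge -\alpha_h, \qquad h(x) > 0,
\]
and with pairwise safety functions $h_{jk}$ between agents $j$ and $k$, a barrier of the form $\sum_{(j,k)} \log h_{jk}(x)$ splits into terms that each agent can evaluate using only its own and its neighbors' states, which is the kind of structure that makes a distributed implementation possible.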
7.5.I-C Relationship with Existing Methods
The nonlinear robustness and stability can also be analyzed using a Lyapunov function, which gives a finite tracking error with respect to a given target trajectory in the presence of external disturbances, including the approximation errors of given learned motion planning policies. This property can be used further to establish a safety guarantee by utilizing a conservative constraint that a tube around the computed target trajectory will not violate a given safety requirement [17], [18], [20], [41], [42]. This framework thus provides one way to ensure safety and robustness, as in the LAG-ROS framework with Lemma 7.1 (see also [43]–[45] and references therein), which depends on the knowledge and the size of the approximation error of a given learned motion planning policy. Such information could be conservative for previously unseen data or available only empirically (e.g., by using a Lipschitz bound [46]).
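As a concrete illustration of such a tube-based constraint (with the tube radius $\bar{r}$ and Lipschitz constant $L_h$ introduced here only for this example), if robust tracking guarantees $\|x(t) - x_d(t)\| \le \bar{r}$ for all $t$ and $h$ is Lipschitz with constant $L_h$, then imposing the tightened condition
\[
h(x_d(t)) \ge L_h\,\bar{r} \quad \text{for all } t
\]
on the planned trajectory implies $h(x(t)) \ge h(x_d(t)) - L_h \|x(t) - x_d(t)\| \ge 0$ along the actual trajectory, at the cost of the conservatism discussed above.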
The CLF-CBF control [39], [40], [47] also considers safety and robustness in real time without any knowledge of the learned motion planning errors (see also [30], [48]–[50] for stochastic and higher-order systems and [51]–[54] for learning-based CBF methods), and without solving nonlinear optimization as in MPC-based methods for robustness and safety [41]. In our context, it constructs an optimal control input by solving a QP that minimizes its deviation from the learned motion planning policy, subject to the safety constraint $\dot{h} \ge -\alpha_h h$ and the relaxed incremental stability constraint $\dot{V} \le -\alpha_s V + \epsilon$, where $\epsilon$ is a slack variable for QP feasibility and $V$ is now defined as $V = (x - x_{\ell})^{\top} M (x - x_{\ell})$ for the learned trajectory $x_{\ell}$ of Fig. 7.7.
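Schematically (the slack penalty weight $c > 0$ shown here is one common way to handle $\epsilon$ and is not necessarily the exact form used in [39], [40], [47]), the resulting CLF-CBF quadratic program reads
\[
\min_{u,\,\epsilon}\ \|u - u_{\ell}\|^{2} + c\,\epsilon^{2}
\quad \text{s.t.}\quad
\dot{h}(x,u) \ge -\alpha_h h(x), \qquad
\dot{V}(x,u) \le -\alpha_s V(x) + \epsilon,
\]
with $V$, $\alpha_h$, and $\alpha_s$ as defined above.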
We list key differences between the CLF-CBF controller and our approach, CART:
(a) The primary distinction between CART and CLF-CBF is the direction in which the respective tracking component steers the system. The CLF-CBF stability component steers the system toward the learned trajectory, which, because of learning error, might not be safe. In contrast, CART's robust filter steers the system toward a certified safe trajectory, generated by the previous layer in the hierarchy, the safety filter. Because of this distinction, whereas CART can safely reject large disturbances with a large gain $\alpha_s$ in the robust filter, this strategy is not practical for the CLF-CBF controller, which is forced to reject disturbances with a large safety gain $\alpha_h$, pulling the system overconservatively toward the interior of the safe set.
(b) The secondary distinction between CART and CLF-CBF is that CLF-CBF requires solving a QP with a given Lyapunov function, while CART provides an explicit way to construct the incremental Lyapunov function using contraction theory and gives an analytical solution for the optimal control input. This makes CART end-to-end trainable with a faster evaluation time.
CART proposes a hierarchical approach that combines the best of both of these methods for safety and robustness, by performing contraction theory-based robust tracking of a provably collision-free trajectory generated by applying a safety filter to the learned policy. The additional tracking-based robust filter reduces the burden of the safety filter in dealing with disturbances. CART can also be viewed as a generalization of the frameworks [2], [55] to multi-agent Lagrangian and general control-affine nonlinear systems with deterministic and stochastic disturbances, constructed on top of the end-to-end learned motion planning policy and augmented with the optimality guarantee of the analytical solution to our safety and robust filters.
The trade-off of Sec. 7.5.I-B and the strengths implied in Sec. 7.5.I-C will be demonstrated in Chapter 10.