Optimal Evasive Path Planning with Velocity Constraint


Item Type Conference Paper

Authors Biswas, Karnika;Ghazzai, Hakim;Kar, Indrani;Massoud, Yehia Mahmoud

Citation Biswas, K., Ghazzai, H., Kar, I., & Massoud, Y. (2022). Optimal Evasive Path Planning with Velocity Constraint. 2022 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS). https://doi.org/10.1109/apccas55924.2022.10090375

Eprint version Post-print

DOI 10.1109/apccas55924.2022.10090375

Publisher IEEE

Rights This is an accepted manuscript version of a paper before final publisher editing and formatting. Archived with thanks to IEEE.

Download date 2024-01-22 21:11:26

Link to Item http://hdl.handle.net/10754/691083


Optimal Evasive Path Planning with Velocity Constraint

Karnika Biswas¹, Hakim Ghazzai¹, Indrani Kar², and Yehia Massoud¹

1Innovative Technologies Laboratories, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia Email:{karnika.biswas, hakim.ghazzai, yehia.massoud}@kaust.edu.sa

2Indian Institute of Technology, Guwahati, India. Email: [email protected]

Abstract—Pursuit evasion is an important category of mobile robotics application related to surveillance, spying and gathering ambient information. This paper presents a novel optimal approach to evasion planning, considering physical limitations of the environment and the evader. The results show that the proposed formulation is applicable irrespective of the number of pursuing agents and the relative velocities of the pursuers and the evader, contrary to the traditional requirement that evasion strategies need to be configured according to situation-dependent cases. The proposed policy is generic and can be implemented in real-time by iterative optimization using model predictive controllers, the objective being avoidance of capture or at the least, maximizing the capture time.

Index Terms—Pursuit evasion, capture avoidance, capture time, miss distance, constrained velocity.

I. INTRODUCTION

Pursuit-evasion is a commonly used game-theoretic term, wherein a mobile agent is tracked and intercepted by one or more pursuers. Pursuers are often interpreted as defenders, that is, agents entrusted with keeping designated regions safe from invasion by external entities. The external entity, or invader, on the other hand, is designed to reach the designated region and plan its trajectory intelligently so that it avoids capture by the pursuer(s). The success rate of capture depends on several factors, such as the density of clutter, the number of pursuers deployed, and the relative velocities of the pursuers and the invader [1]. The invader is programmed to exhibit elusive motion so that it can avoid being caught. In the worst case, where capture is inevitable, the aim is to find a solution which ensures that the time-to-capture is maximized under the given conditions. In this paper, we propose an optimal formulation for evasive motion planning. The proposed formulation is flexible enough to accommodate a variety of pursuer counts and speeds simultaneously. This optimal framework has been designed to account for the kinematic constraints of a real vehicle (invader) and the dynamic nature of the pursuit-evasion problem.

Existing references to pursuit-evasion mention pure-pursuit strategies adopted by the pursuers, in which the pursuers move in straight lines along their line of sight to the invader [2]. Subsequent research proposed the inclusion of complex swarm-based strategies [3] that performed encirclement and capture of the invader from multiple directions [4]. The traditional path planners for an invader rely on geometric assessments of the probability of capture and implement situation-specific control mechanisms [5], such as complete reversal of the direction of motion, finding and speeding through open spaces between a pair of pursuers, etc. [6], [7]. Other researchers have used adaptive path planning with dynamic programming and deep reinforcement learning mechanisms to generate evasion guidance [8]. While dynamic programming involves heavy computational overhead in approximating the value function at each iteration, reinforcement learning has been found to require multiple training epochs to yield satisfactory results. Du et al. [9] report a deep learning framework using the Monte-Carlo method that achieves superior evasion metrics over random motion and spinning trajectories of the invader. The reachability state-space approach [10] is a relatively newer concept that has been found effective for local adaptations, often obtained by sacrificing optimality.

Intelligent evasion policies based on prediction of the pursuer's motion are often termed 'counter guidance', in reciprocation to the strategic navigation and guidance planning designed for pursuit vehicles. In several applications, agents are found to reverse their roles of predator and prey. Role reversal involves switching mechanisms, which are challenging elements in the guidance/counter guidance formulation [11].

In [12], Da Silva et al. have explored a group-intelligent path planning architecture for multi-agent invaders using decentralized neural networks, considering a pursuer group that is smaller than the invader group. However, the said architecture can only select and implement one paradigm out of a set of predefined strategies, which constrains the flexibility of the approach. Strategies to manage the evasive lateral motion of a preceding vehicle have been studied in [13] in the context of lane-keeping and lane-changing decisions for autonomous driving. Evasive manoeuvres are also discussed with regard to safety protocols in multi-agent driving scenarios, with a special focus on defining, quantifying and assessing the safety metrics.

A two-on-one pursuit game is a commonly encountered scenario [14], wherein the pursuers are considered to have faster motion than the invader to ensure capture. Hayoun and Shima [14] observe that engaging two non-coordinating pursuers simultaneously against a single invader makes the evasive manoeuvre more complex than in a one-on-one pursuit because of the enlargement of the capture zone. However, the zero-sum game formulation unrealistically assumes a constant speed of the agents and limits pursuit to only linear chase directions.

This paper presents a generic optimal framework for constructing an evasion policy under kinematic limitations and a dynamic environment, without situation-specific requirements such as the number of agents involved and the relative velocities of the pursuers and the invader. In Section II the evasion problem is described in the light of a logarithmic penalty approach-based optimal control design. In Section III two simulation studies are presented, representing a one-on-one and a two-on-one pursuit-evasion scenario. The results are analyzed and the merits of the proposed optimal formulation are explained with the help of the analysis. The key contributions of this paper and the potential research directions indicated by the proposed formulation are summarized in Section IV.

II. PROBLEM FORMULATION

In this paper, the pursuer(s) and the invader have been modeled as point masses with states represented by the position variable q, comprising positions along the x and y directions (with respect to the global reference frame), and control inputs represented by the forward velocity, v, and heading, θ. The pursuer's motion is denoted by the suffix P and that of the invader by the suffix E. The invader is designed to determine a dynamically-changing 'virtual' goal in its immediate (bounded) environment and move towards the goal in an attempt to evade capture. It is assumed that the invader is allowed to make a local prediction of the pursuer's intent and compute one or more capture-free reachable states corresponding to the virtual goal, under given constraints.

The environment is assumed to have n obstacles that both the pursuer(s) and the invader need to avoid in the course of navigation. These obstacles may be static or mobile and are assumed to be convex-shaped for ease of demonstration. It is also assumed that the obstacles can be detected, identified and localized at predetermined intervals using appropriate sensing devices, with possibilities of occasional bounded disturbances.

The invader is also equipped to detect and identify the pursuer(s) at the same sampling rate. The states of the obstacles, invader and the pursuers are read by the path planner for the invader, wherein the path planner locally estimates the motion of the obstacles and the pursuers to generate approximate future projections of their expected trajectories. A number of existing technologies can be used to generate these projections, some being the Kalman filter, non-linear predictors, observers and polynomial extrapolators. This paper uses the Neville-Aitken algorithm, which is an efficient non-recursive tool with low computational demand and has the ability to refine the predictions over the number of iterations. These projections provide the necessary boundary values for solving the optimal evasion problem and computing the optimal trajectories and control inputs.
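The polynomial projection step can be illustrated with the textbook Neville scheme. The sketch below is a minimal illustration, not the authors' implementation: the function names `neville` and `predict_position`, the sample times and the sample positions are all hypothetical, and each coordinate is extrapolated independently.

```python
from typing import Sequence, Tuple


def neville(ts: Sequence[float], ys: Sequence[float], t_query: float) -> float:
    """Evaluate at t_query the polynomial through (ts[k], ys[k]) using Neville's scheme."""
    p = list(ys)  # p[k] starts as the zeroth-order estimate at node k
    n = len(ts)
    for order in range(1, n):
        for k in range(n - order):
            # Combine two lower-order estimates into one of the next order.
            p[k] = ((t_query - ts[k + order]) * p[k]
                    + (ts[k] - t_query) * p[k + 1]) / (ts[k] - ts[k + order])
    return p[0]


def predict_position(ts: Sequence[float],
                     xs: Sequence[float],
                     ys: Sequence[float],
                     t_future: float) -> Tuple[float, float]:
    """Extrapolate a sampled (x, y) track to t_future, one coordinate at a time."""
    return neville(ts, xs, t_future), neville(ts, ys, t_future)


# Hypothetical pursuer samples (illustrative values only).
times = [0.0, 1.0, 2.0]
px = [1.0, 2.0, 5.0]   # samples of x(t) = t^2 + 1
py = [0.0, 0.5, 1.0]   # samples of y(t) = 0.5 t
x_next, y_next = predict_position(times, px, py, 3.0)
```

With three samples the scheme fits a quadratic exactly, so extrapolation error comes only from how far the true motion departs from a low-degree polynomial over the prediction window.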

The pursuer's objective is to track the invader and intercept or capture it within a pre-designed time, t_f − t_i, t_i being the start time and t_f being the terminal time. The invader's objective, in contrast, is to avoid being captured by the pursuers within the pre-designed time. We have assumed that there can be at most m pursuers at any time, t ∈ [t_i, t_f]. The entire process of pursuit and evasion can be performed over multiple iterations using model predictive controllers (MPC) with a receding horizon approach. MPC offers an iterative optimization scheme wherein only the first input of the computed control sequence is applied in each iteration. The iterations continue until the invader gets captured or a time-out occurs.

Typically, timeout refers to the total pursuit time or the number of iterations exceeding a predetermined threshold without a successful capture. In many cases, the timeout threshold is heuristically chosen depending upon the scenario. Iterative optimal path generation during run-time is a real-time process which can be practically realized using MPC.
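The receding-horizon loop described above can be sketched as follows. This is a structural illustration only: `flee_planner` is a hypothetical stand-in that heads straight away from the pursuer and is not the optimal solver of this paper, and the names `receding_horizon`, `pursuer_at`, `d_capture` and `max_iters` are assumptions introduced for the example.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Control = Tuple[float, float]  # (forward speed v_E, heading theta_E)


def flee_planner(q_e: Point, q_p: Point, speed: float, horizon: int) -> List[Control]:
    """Hypothetical placeholder planner: constant speed, heading directly away from the pursuer."""
    theta = math.atan2(q_e[1] - q_p[1], q_e[0] - q_p[0])
    return [(speed, theta)] * horizon


def receding_horizon(q0: Point,
                     pursuer_at: Callable[[float], Point],
                     plan: Callable[[Point, Point], Sequence[Control]],
                     dt: float,
                     d_capture: float,
                     max_iters: int) -> Tuple[List[Point], str]:
    """Re-plan every step, apply only the first input, stop on capture or timeout."""
    q = q0
    path = [q]
    for k in range(max_iters):
        q_p = pursuer_at(k * dt)
        if math.dist(q, q_p) <= d_capture:
            return path, "captured"
        controls = plan(q, q_p)      # full control sequence over the horizon
        v, theta = controls[0]       # only the first input is applied
        q = (q[0] + v * math.cos(theta) * dt, q[1] + v * math.sin(theta) * dt)
        path.append(q)
    return path, "timeout"


# Illustrative run: stationary pursuer at the origin, invader starting at (1, 0).
path, status = receding_horizon(
    (1.0, 0.0),
    lambda t: (0.0, 0.0),
    lambda q, p: flee_planner(q, p, speed=1.0, horizon=5),
    dt=0.1, d_capture=0.3, max_iters=10,
)
```

Swapping `flee_planner` for a solver of the optimal control problem leaves the loop unchanged, which is the practical appeal of the receding-horizon structure.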

Here we are interested in the optimal navigation plan from the invader’s perspective wherein, the cost functional for the optimal control policy has been designed according to equation (1).

$$\max_{v_E,\,\theta_E} J_{evasion}$$

$$J_{evasion} = \int_{t_i}^{t_f} \left( 1 + \sum_{i=1}^{n} w_{av} \ln\left(d_{obstacle_i}^2 - d_{th}^2\right) + \sum_{j=1}^{m} w_{ev} \ln\left(d_{pursuer_j}^2 - d_{th}^2\right) + \ln\left(v_{max} - v_E\right) \right) dt \qquad (1)$$

Total cost is a combination of multiple objectives. The integral of the first term of the cost function, $\int_{t_i}^{t_f} dt$, indicates maximization of time-to-capture. The second term optimizes the distance between the invader and the obstacles, while the third term refers to the distance function between the invader and the pursuers. The weighted distance functions in the second and the third terms have been modeled as non-quadratic barriers that generate a logarithmically-escalated penalty upon events close to violation. The last term of the cost function limits the velocity of the invader to values less than or at most equal to the specified upper bound. Detailed discussion follows in subsequent paragraphs. The total cost to be maximized is subject to kinematic constraints given as follows:

$$\dot{x}_E = v_E \cos(\theta_E), \qquad \dot{y}_E = v_E \sin(\theta_E) \qquad (2)$$

and the velocity upper bound of the invader is given by:

$$v_E \le v_{max} \qquad (3)$$

where the j-th pursuer's states, x_Pj and y_Pj, are either measured or estimated. Likewise, the pursuer is also assumed to receive regular updates on the obstacles and the invader's states, the measurements containing Gaussian white noise of known variance.
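The kinematic model (2) with the bound (3) can be propagated numerically. The sketch below uses forward-Euler integration, which is a choice made for this example rather than something stated in the paper; the function name `rollout` and the clamping of the speed to [0, v_max] (which assumes a non-negative forward velocity) are likewise assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Control = Tuple[float, float]  # (v_E, theta_E)


def rollout(q0: Point, controls: Sequence[Control], dt: float, v_max: float) -> List[Point]:
    """Forward-Euler integration of x' = v cos(theta), y' = v sin(theta) with v clamped to [0, v_max]."""
    x, y = q0
    trajectory = [(x, y)]
    for v, theta in controls:
        v = min(max(v, 0.0), v_max)   # enforce the velocity bound (assumed non-negative speed)
        x += v * math.cos(theta) * dt
        y += v * math.sin(theta) * dt
        trajectory.append((x, y))
    return trajectory


# A commanded speed of 3 m/s is saturated at the 2 m/s bound (illustrative values).
traj = rollout((0.0, 0.0), [(3.0, 0.0)] * 5, dt=0.5, v_max=2.0)
```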

In equation (3), v_max represents the velocity upper bound of the invader as imposed by its kinematic limit. The heading of the vehicle is unconstrained. Using two states and two control parameters minimizes the dimension of the problem, aids in ensuring the existence of an optimal solution and speeds up computation, especially when several pursuers, obstacles and the invader are active simultaneously. The efficiency of the kinematic model, however, is not the focus of the current discussion and hence is not a limitation of the proposed optimal formulation. In fact, the solution to the optimal problem generates a guidance trajectory for the invader, which can be followed as closely as possible with a feedback-based tracking controller. Thereby, the actual trajectory of the invader will be sub-optimal and will also be able to accommodate unmodeled uncertainties and disturbances.

It may be noted that the terms d_obstacle_i and d_pursuer_j are distance functions between the i-th obstacle and the invader and between the j-th pursuer and the invader, respectively. The distance function may be modelled as an n-norm, where n ∈ {1, 2, ..., ∞}. The distance functions are delimited by a threshold distance, d_th, which marks the minimum distance that is required to be maintained between the invader and the obstacle or the pursuer for vehicle safety. The threshold distances may be different for different agents and are selected such that the point mass assumption of the agents can be compensated by assigning a suitable extent (size), while maintaining a chosen safe separation. In this paper we have modeled the distance functions as given in equation (4).

$$d_{obstacle} = \|q_{obstacle} - q_E\|_2, \qquad d_{pursuer} = \|q_P - q_E\|_2 \qquad (4)$$

The reader's attention is hereby drawn to the logarithmic expressions inside the payoff function given in equation (1). Contrary to traditional quadratic functions, the logarithmic function provides a multi-point barrier to prevent violation of safety and generates a penalty at the onset of an imminent violation. This penalty increases logarithmically with increasing level of violation and thereby ensures collision-free motion. This usage of the logarithmic barrier ensures that if any of the obstacles or the pursuers attempts to move closer than the corresponding threshold distance, a penalty is imposed on the invader's motion, which helps to drive the invader away from the pursuers and the obstacles. However, the penalty function does not enforce 'aggressive avoidance', a feature that enables the invader to navigate through cluttered and dynamically changing environments. This feature is new relative to existing evasion strategies.
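To make the barrier behaviour concrete, the integrand of equation (1) can be evaluated pointwise. The sketch below is illustrative only: the function name `running_cost`, the choice of returning negative infinity once a margin is violated, and all numeric values (positions, weights, thresholds) are assumptions, since the paper selects the weights by trial and does not report them.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def running_cost(q_e: Point, v_e: float,
                 obstacles: Sequence[Point], pursuers: Sequence[Point],
                 d_th: float, v_max: float,
                 w_av: float, w_ev: float) -> float:
    """Integrand of (1): 1 + weighted log barriers on separations + log barrier on speed."""

    def barrier(p: Point) -> float:
        margin = math.dist(p, q_e) ** 2 - d_th ** 2   # Euclidean distance as in (4)
        # Assumption: treat a violated margin as an infinitely bad payoff.
        return math.log(margin) if margin > 0 else -math.inf

    speed_margin = v_max - v_e
    speed_term = math.log(speed_margin) if speed_margin > 0 else -math.inf
    return (1.0
            + w_av * sum(barrier(o) for o in obstacles)
            + w_ev * sum(barrier(p) for p in pursuers)
            + speed_term)


# Illustrative comparison: the payoff drops as the pursuer closes in.
far = running_cost((0.0, 0.0), 1.0, [(2.0, 0.0)], [(0.0, 3.0)], 1.0, 2.0, 1.0, 1.0)
near = running_cost((0.0, 0.0), 1.0, [(2.0, 0.0)], [(0.0, 1.1)], 1.0, 2.0, 1.0, 1.0)
```

Because the payoff is maximized, the steep drop of the logarithm near the threshold acts as the penalty, while at large separations the logarithm grows slowly, which is consistent with the non-aggressive avoidance described above.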

The role of the weights, w_av and w_ev, is important to note in this context. These weights are real positive numbers that assign relative priorities to avoidance of the obstacles and of the pursuing agents by the invader. In this paper the weights have been selected by trial, which opens up an interesting research direction: computing and assigning the weights automatically during run time. In fact, the weights need not remain constant over all iterations. The relative priorities can be reconfigured in every iteration by adjusting the weights. For example, as mentioned in existing literature, an invader may choose to 'hide' behind an obstacle to evade being captured by a pursuer. Therefore, the relative penalty of approach between the pursuer and the chosen obstacle has to be set to different values so that the trajectory can be adjusted accordingly.

Fig. 1: Tracking and interception as in a one-on-one game. With the proposed evasion strategy, interception fails before time-out. (Axes: distance along x-axis (m) vs. distance along y-axis (m). Legend: invader's trajectory with evasion, pursuer's trajectory, invader's trajectory without evasion, Obstacle-1, Obstacle-2, interception after 5 s.)

Flexibility of the proposed formulation is an important point of merit which leaves ample scope for further study.

The velocity upper bound of the invader has been incorporated into the cost functional and the optimal problem has been solved as an unconstrained control problem. Using the maximum principle [15], the optimal formulation has been developed and thereby converted into a set of simultaneous non-linear ordinary differential equations (ODE). Such problems are usually solved numerically by appropriate ODE solvers that rely on multi-point collocation techniques.
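As a sketch of how the maximum principle turns (1) and (2) into a set of ODEs, the standard first-order necessary conditions can be written out. The derivation below is not given in the paper; the Hamiltonian H, the integrand L of (1), the costates λ_x and λ_y, and the coordinate symbols for obstacle and pursuer positions are notation assumed for this illustration, with Euclidean distances as in (4).

```latex
% Assumed notation: L is the integrand of (1); \lambda_x, \lambda_y are costates.
\begin{align}
H &= L + \lambda_x v_E \cos\theta_E + \lambda_y v_E \sin\theta_E \\
\dot{\lambda}_x &= -\frac{\partial H}{\partial x_E}
  = \sum_{i=1}^{n} \frac{2 w_{av}\,(x_{obstacle_i} - x_E)}{d_{obstacle_i}^2 - d_{th}^2}
  + \sum_{j=1}^{m} \frac{2 w_{ev}\,(x_{P_j} - x_E)}{d_{pursuer_j}^2 - d_{th}^2} \\
\dot{\lambda}_y &= -\frac{\partial H}{\partial y_E}
  = \sum_{i=1}^{n} \frac{2 w_{av}\,(y_{obstacle_i} - y_E)}{d_{obstacle_i}^2 - d_{th}^2}
  + \sum_{j=1}^{m} \frac{2 w_{ev}\,(y_{P_j} - y_E)}{d_{pursuer_j}^2 - d_{th}^2} \\
\frac{\partial H}{\partial v_E} &= -\frac{1}{v_{max} - v_E}
  + \lambda_x \cos\theta_E + \lambda_y \sin\theta_E = 0 \\
\frac{\partial H}{\partial \theta_E} &= v_E \left( -\lambda_x \sin\theta_E + \lambda_y \cos\theta_E \right) = 0
\end{align}
```

Together with the state equations (2), the two costate equations form a boundary value problem in (x_E, y_E, λ_x, λ_y), and the two stationarity conditions determine v_E and θ_E along the trajectory. This is the kind of system that collocation-based solvers are typically applied to.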

III. RESULTS AND DISCUSSIONS

The objective of this study is to evaluate the effectiveness of the proposed evasion strategy under kinematic and space-related limitations. The first simulation study portrays a one-on-one pursuit problem, where the environment contains two static obstacles located at (3.9, 4) m and (2.9, 3) m. The concept of the invader attempting to hide behind an obstacle to avoid or delay capture has been utilized here. The virtual goal for the invader is selected at a position behind the first obstacle. The predefined time-to-interception for the pursuer is assumed to be 5 s, as an instance. In a previous work [16], we have mentioned that a free horizon problem may not have a solution for all possible initial configurations, which is why a fixed finite horizon is preferred. The difference between the referred case and the one currently being discussed is that the final states of the invader are difficult to estimate because of the 'evasion' policy, which makes it even more difficult to ascertain a suitable fixed horizon. A useful solution to this issue is to generate optimal solutions iteratively using a receding horizon approach. For ease of analysis, we have focused on a single iteration span and hence the time-out duration is also chosen to be 5 s, in order to establish relevance with respect to the iteration span. In practice, once the predefined horizon is exhausted, the horizon can be extended by the same magnitude as the earlier horizon, wherein tracking and evasion tasks can be renewed until capture is achieved or another timeout occurs. Since MPC does not use the entire control sequence, it ensures execution in real time.

In this study, the invader is pre-conditioned with the same horizon as that of the pursuer. This case study presents a scenario where a successful capture is equivalent to interception achieved before the horizon is exhausted. Contrary to the results discussed in [16], the optimal pursuit with fixed horizon fails to capture the invader before timeout, as shown in Figure 1. The evasion strategy incurs a miss distance of 0.55 m, which also satisfies the threshold condition of safe distance of separation, predetermined at a value of 0.3 m. It may be recalled here that tracking and capture by the pursuer does not refer to retracing the invader's trajectory but to generating an optimal path that ensures interception of the invader at the end of the predefined horizon. In this simulation study, we have designed the state feedback received by the pursuer to be a 5th-degree polynomial estimation of the invader's motion. In practice, the feedback is measured from visual or range sensors. Note that the pursuer's velocity is not constrained, while the invader's velocity has been limited to a maximum magnitude of 2 m/s. Figure 2 demonstrates that the velocity barrier for the invader is maintained during the navigation. It also indicates that although the pursuer is allowed to attain a higher speed than the invader, the pursuer fails to realise a successful capture due to the evasion strategy proposed in this paper.

Fig. 2: Interception failure registered even if the pursuer can achieve higher velocity than the invader and under constrained invader kinematics. (Axes: time (s) vs. velocity (m/s). Legend: pursuer velocity, invader velocity.)

Fig. 3: Capture attempted in a two-on-one pursuit, in presence of obstacles and velocity limitations on the invader. (Axes: distance along x-axis (m) vs. distance along y-axis (m). Legend: trajectory of pursuer-1, trajectory of pursuer-2, obstacle-2 (static), obstacle-1 (mobile), capture-safe zone, trajectory with evasion.)
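The miss-distance check used in this discussion can be sketched as below. Two assumptions are made that the paper does not state: the miss distance is taken as the minimum separation over time-aligned trajectory samples, and the sample trajectories are made-up values, not the simulation data behind Figure 1. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def miss_distance(invader_path: Sequence[Point], pursuer_path: Sequence[Point]) -> float:
    """Smallest invader-pursuer separation over time-aligned samples (assumed definition)."""
    return min(math.dist(a, b) for a, b in zip(invader_path, pursuer_path))


def evaded(invader_path: Sequence[Point], pursuer_path: Sequence[Point], d_th: float) -> bool:
    """True if the separation never drops to the safe-separation threshold d_th."""
    return miss_distance(invader_path, pursuer_path) > d_th


# Made-up sample trajectories, sampled at the same instants.
invader = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pursuer = [(0.0, 3.0), (1.0, 1.0), (2.0, 0.5)]
closest = miss_distance(invader, pursuer)
safe = evaded(invader, pursuer, d_th=0.3)
```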

The second simulation study represents a two-on-one game situation, where both pursuers are pre-conditioned in a similar fashion as in the one-on-one case. The pursuers start tracking from the locations (2, 7) m and (5, 6) m, intending to capture the invader from different directions. The first obstacle exhibits a linear motion oriented at an angle of π/3 radians at a speed of 0.1 m/s. As observed in Figure 3, initially only the first pursuer is detected by the invader and the optimal trajectory of the invader is inclined to evade the first pursuer. At a later stage, between 2 s and 3 s, the second pursuer is detected by the invader and the trajectory is optimally configured to evade both pursuers. Consistent with this, the distance of separation of the invader from the first pursuer increases steadily before aggressive evasion slows down beyond the 2.5 s mark, when the second pursuer poses a higher threat. The distance of separation between the second pursuer and the invader rises sharply before reducing under the tracking effect. The capture-safe zone is assumed to be a neighbourhood, of dimensions similar to the safe distance of separation, around the position of the invader. Neither pursuer reaches the capture-safe zone before timeout occurs, as indicated by Figure 4.

Fig. 4: Distance of separation between the pursuers and the invader increases, followed by a decrease as the pursuers attempt capture. Timeout is observed before the pursuers enter the capture-safe zone. (Axes: time (s) vs. distance of separation (m). Legend: separation with pursuer-1, capture-safe limit, separation with pursuer-2.)
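As a worked check on the scale of the mobile obstacle's motion, its displacement over a 5 s span can be computed directly. This assumes that the heading and speed stay constant, that the angle of π/3 is measured from the global x-axis, and that the span equals the 5 s time range shown in the figures; none of these is stated explicitly for this study.

```latex
\begin{align}
\Delta x &= v\,t\cos\left(\tfrac{\pi}{3}\right) = 0.1 \times 5 \times 0.5 = 0.25\ \text{m} \\
\Delta y &= v\,t\sin\left(\tfrac{\pi}{3}\right) = 0.1 \times 5 \times \tfrac{\sqrt{3}}{2} \approx 0.433\ \text{m} \\
\sqrt{\Delta x^2 + \Delta y^2} &= v\,t = 0.5\ \text{m}
\end{align}
```

Under these assumptions the obstacle drifts by half a metre over the span, which is comparable to the 0.3 m safe separation quoted for the first study, so its motion is not negligible for the avoidance terms.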

Eventually, capture may occur in the next iteration, but this substantiates our claim of increasing the time-to-capture in situations where capture is unavoidable. Note that fixed horizon optimal pursuit with unconstrained pursuer velocities usually ensures successful interception, more so because of the velocity upper bound levied on the invader's motion. In this case, the pursuers' velocities have not been preset to any definite values but are computed optimally according to our previous work [16]. If the pursuers are driven at their maximum speeds, the capture dynamics may be altered to favour pursuit and capture. However, it is evident from the preceding figures that the proposed evasion policy helps to delay the capture. Figure 5 shows that the invader's velocity is limited to a preset maximum value of 2 m/s, which is less than the maximum velocity attained by the pursuers.

However, the versatility of this approach is that it can be applied to evade multiple pursuers with different dynamic capabilities without specifying parameters like relative velocities and approach angle. The proposed evasion strategy can handle the space constraints associated with clutter in the environment while operating with limited control effort.

Figure 6 indicates that the instantaneous magnitude of the cost function increases rapidly after 3 s under the effect of both pursuers and shows a decrement only because the simulation study is terminated at the end of the predefined horizon. The adjustment of control inputs to enhance the cost after the 3 s mark is a definite indicator of the flexibility offered by the proposed cost function design when the second pursuer is accommodated.

Fig. 5: Comparative illustration of velocities attained by the pursuers and the invader. (Axes: time (s) vs. velocity (m/s). Legend: pursuer-2 velocity, pursuer-1 velocity, invader's velocity.)

Fig. 6: Time-evolution of the proposed cost function under optimal control inputs. (Axes: time (s) vs. magnitude of cost function.)

IV. CONCLUSIONS

This paper presents a novel approach to solving the pursuit-evasion game problem, wherein evasion path planning is performed optimally and is focused on avoiding or delaying capture of an invader by pursuer(s). Unlike existing methods, the proposed strategy is flexible enough to accommodate a variable number of pursuing agents, and the optimal formulation is capable of handling pursuers that have different relative velocities and approach angles with respect to the invader. Moreover, the proposed controller design shows that effective evasion can be achieved even when kinematic constraints are applied to the invader's motion. The logarithmic penalty function offers a non-aggressive evasion profile that can be used safely in the presence of various static and mobile obstacles in the environment. As an extension to the current work, the flexibility and robustness of the proposed architecture can be evaluated by studying the effects of online adaptation of the priority weights, especially under swarm attack.

REFERENCES

[1] H. Fu and H. H.-T. Liu, “An isochron-based solution to the target defense game against a faster invader,” IEEE Control Systems Letters, vol. 6, pp. 1352–1357, 2022.

[2] A. Von Moll, Z. Fuchs, and M. Pachter, “Optimal evasion against dual pure pursuit,” in 2020 American Control Conference (ACC), 2020, pp. 36–43.

[3] E. Garcia and S. D. Bopardikar, “Cooperative containment of a high-speed evader,” in 2021 American Control Conference (ACC), 2021, pp. 4698–4703.

[4] K.-M. Ramana, M.V., “Pursuit-evasion games of high speed evader,” J Intell Robot Syst, vol. 85, pp. 293–306, 2017.

[5] W. Li, “A dynamics perspective of pursuit-evasion: Capturing and escaping when the pursuer runs faster than the agile evader,” IEEE Transactions on Automatic Control, vol. 62, no. 1, pp. 451–457, 2017.

[6] J. Szőts and I. Harmati, “Optimal feedback strategy of a superior evader passing between two pursuers,” in 2020 23rd International Symposium on Measurement and Control in Robotics (ISMCR), 2020, pp. 1–6.

[7] J. Ko, J. Jang, and C. Oh, “A multi-agent driving simulation approach for evaluating the safety benefits of connected vehicles,” IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 5, pp. 4512–4524, 2022.

[8] Q. Qi, X. Zhang, and X. Guo, “A deep reinforcement learning approach for the pursuit evasion game in the presence of obstacles,” in 2020 IEEE International Conference on Real-time Computing and Robotics (RCAR), 2020, pp. 68–73.

[9] R. Du, J. Liu, L. Zhang, and J. Li, “Intelligent counter guidance regulated by deep reinforced learning,” in 2020 IEEE 7th International Workshop on Metrology for AeroSpace (MetroAeroSpace), 2020, pp. 60–65.

[10] A. Chaudhari and D. Chakraborty, “A time-optimal feedback control for a particular case of the game of two cars,” IEEE Transactions on Automatic Control, vol. 67, no. 4, pp. 1806–1821, 2022.

[11] J. Zhao, C. Yang, W. Wang, B. Xu, Y. Li, L. Yang, H. Zhu, and C. Xiang, “A game-learning-based smooth path planning strategy for intelligent air-ground vehicle considering mode switching,” IEEE Transactions on Transportation Electrification, pp. 1–1, 2022.

[12] T. França da Silva, M. Santos Araújo, R. J. Campos Ferro Junior, L. Ferreira da Costa, J. P. Bernardino Andrade, G. A. Lima de Campos, and J. Celestino Junior, “Improving the behavior of evasive targets in cooperative target observation,” in 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC), 2020, pp. 36–41.

[13] C. Kim, H. Chae, and K. Yi, “Lateral motion planning for evasive lane keeping of autonomous driving vehicles based on target prioritization,” in 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), 2021, pp. 538–544.

[14] S. Y. Hayoun and T. Shima, “A two-on-one linear pursuit–evasion game with bounded controls,” J Optim Theory Appl, vol. 174, pp. 837–857, 2017.

[15] F. L. Lewis, D. L. Vrabie, and V. L. Syrmos, Optimal Control, 3rd ed. John Wiley and Sons, 2012.

[16] K. Biswas, I. Kar, and E. Feron, “Intent-aware optimal collision avoidance and trajectory planning for a pursuit vehicle,” Robotica, vol. 40, no. 8, pp. 2505–2526, 2022.