Receding-horizon RRT-Infotaxis for autonomous source search in urban environments

(1)

Contents lists available atScienceDirect

Aerospace Science and Technology

www.elsevier.com/locate/aescte

Receding-horizon RRT-Infotaxis for autonomous source search in urban environments

Seulbi An, Minkyu Park, Hyondong Oh

^∗

DepartmentofMechanicalEngineering,UlsanNationalInstituteofScienceandTechnology,Ulsan,RepublicofKorea

a rt i c l e i n f o a b s t r a c t

Articlehistory:

Received9March2021

Receivedinrevisedform20November2021 Accepted5December2021

Availableonline10December2021 CommunicatedbyChoonKiAhn

Keywords:

Autonomoussearch SourceTermEstimation(STE) Bayesianinference

SequentialMonteCarlomethod Rapidly-exploringrandomtrees Information-theoreticsearch

In emergency situations such as hazardous gas leak, search and estimation for identifying source information,knownassourcetermestimation(STE),inatimelyand accuratemannerisofsigniﬁcant importance.In realworld situations, obstacles suchas buildingsor barriersnot onlyblock the path forsearchbutalsointerferetheﬂowofthegassource.Forautonomoussource searchandestimation usingamobilesensorinsuchobstacle-richenvironments,thispaper proposesaninformation-theoretic STEapproachbycombiningawidely-used Infotaxiswiththe rapidly-exploringrandomtrees (RRT).In particular,theproposedstrategyutilizesthereceding-horizonRRTconceptwithanewlydesignedutility functionfordeterminingthenextmaneuverofamobileagenttogetthebestinformationofthesource whileavoidingobstacles inurbanenvironments. Numerical simulationsinvariousenvironmentsshow thesuperiorperformanceoftheproposedapproachcomparedwiththeoriginalInfotaxismethod.

©²⁰²¹^Elsevier^Masson^SAS.^All^rights^reserved.

1. Introduction

Autonomoussourcesearch andestimationhasavariety ofap- plications across disasters, accidents or terrorism situations. For instance,chemical,biologicalorradiologicalmaterialscouldbere- leasedintotheatmosphereaccidentallyorintentionally.Whenthis happens,localizingasourcelocationandquantifyingsourceinfor- mationsuchasareleaserate,calledsourcetermestimation(STE), is the primary taskto take proper actions forpreventing further damages andevacuate people fromdangerous regions [1].In the past, alarge numberof pre-installed sensorsare utilized forcol- lectingsensingques.However,thisapproachrequireshighcostand has a spatial limitation when the accidents occur inunexpected areas.Thus,withrecentdevelopmentsonautonomoussystems,using mobilesensors suchasan unmannedaerialvehicle (UAV)for STEhasbeengainingpopularity[1–13].Usingamobilesensoral- lowstocoveralargeareaandtoberapidlydeployedanywhere it isneededinatimelymanner.

DiverseapproachesforSTEusingmobilesensorsare proposed, whichcouldbelargelycategorizedasbio-inspired,gradient-based and information-theoretic methods [2]. Bio-inspiredmethods are inspired by the behaviors of the living creatures such as moth, bacteria orﬂy[3–5]. Thesemethods mimic andcombinethebe- haviorsoffoodsearchingstrategiessuchassurge,cast,zigzagand

*

Correspondingauthor.

E-mailaddress:[email protected](H. Oh).

spiral. However, in reality,the sensing ability and locomotionof mobile sensors is far from those of living creatures [4], which makestheimplementationandperformance ofthisapproachlim- ited.Ingradient-basedmethods[14],themobileagentmovestoa higherconcentrationpoint,assumingthattheactualplumemodel producesacontinuousandsmoothgradientﬁeld.Thisassumption isoften invalid ina real plume dispersion situationespecially in a turbulent ﬂow. Lastly, information-theoretic methods on which thisstudy focuses are basedon probabilistic strategies. Theyuse theutility function withtheentropyor uncertaintytodetermine thebestnextactionofthemobileagentsuchasInfotaxis[6,7,9,15]

orEntrotaxis [8]. It employs a dispersion model, a sensormodel andmeasurements to estimate the sourceterm usinga Bayesian framework.Thesestrategiesperformwelleveninaturbulentﬂow wherethesensingquesaresparseorﬂuctuate.

Itis worthwhile notingthat,manyof theexisting STEstudies assume that the source search mission is conducted in an open spaceratherthan obstacle-ﬁlledenvironments andthe maneuver ofthe mobileagent is limitedin adiscrete domain [6–10,16,17].

Inapractical sourcesearch situation,thesearchingenvironments usually contain buildings or barriers which affect the dispersion phenomena and hinder the search process of the mobile agent [18]. Also, as the movement of the mobile agent in one step is limitedin discrete directions(e.g., left, right, up, anddown), the estimation performance is affected by the grid resolution of the environment and action candidate (i.e., movement step size). If the resolution of the grid is small, the estimation performance https://doi.org/10.1016/j.ast.2021.107276

1270-9638/©²⁰²¹^Elsevier^Masson^SAS.^All^rights^reserved.

(2)

couldbeenhancedbutthesearchingtimeandcomputationalloads wouldincreasesigniﬁcantly.

Severalstudiesconsidering complexenvironmentsinSTEhave beenstudied in[19–28]. Risticetal.[21,22] assumethat theen- vironmentisrepresentedasagridmap andobstaclesare located atthe2-Dlatticepoints sothatonlypassablelinksareremained.

However,thisassumptionisfarfromreality.Theobstaclesareusu- ally located randomly as well as the lattice sizes mainly affect the performance of STEin real situations. If the one-step looka- headapproach,alsoknownasthereactiveplanningalgorithm[29], such asInfotaxis [7,9] is used in obstacle-rich environments, local optima in which the mobile agent is stuck at the corner of the obstacle occurs. Some studies suggest the obstacleavoidance mechanismto reduceunnecessary maneuver ofthe mobileagent [23,24].However,theseapproachesrequireaknownglobalmapof theenvironmentandarehighlyaffectedbytheobstacleconﬁgura- tionandgridresolution.

Withabovebackgroundsinmind,thispaperproposesamulti- step lookahead source search strategy which belongs to deliber- ative planning algorithms [29,30]); this approach generates the efficientpathinanobstacle-richandcontinuousdomainbyintro- ducing therapidly-exploringrandomtrees (RRT)[31] asthelocal pathplannerforsourcesearch. ThegeneratedtreefromRRTina recedinghorizon(RH)mannerisconsidered asfeasiblelong-term future paths for source search. Note that the proposed receding horizon RRTapproach still requires a localmap around the current positionof themobile agent; this could be readilyavailable usingcameraorLiDARsensors[32].Tosuccessfullydecidetheef- ficient search action forthe mobile agent, a newutility function is introduced. The utility function consists oftwo parts: i) max- imizing theentropy reduction for exploration andii)minimizing the path length to thesource location forexploitation. For max- imizing the entropy reduction which generally encourages more exploration,theexistingInfotaxisutilityfunctionwiththeparticle filterisadopted[7,33].Notethattheparticlefilterisutilizedtoes- timatetargetstatesinhighlynonlinearandnon-Gaussian systems [34–37]. As the proposed approach with the particle filter considers multi-step lookaheadstates, the computational load could become intractable. Weaddress thisissueby replacing theorigi- nal Gaussiansensor modelwiththebinary sensormodel,similar to[38].Forminimizingthepathlengthforsourcesearch,thepath length is calculated by summing up the length fromthe current state tothefinal futurestateofthefeasiblebranchfromtheRRT andthelengthfromthefinalfuturestatetothegoalposition,that is,theestimatedsourcelocation.

The maincontributionofthispaperis twofold.First, withthe RRTandreceding-horizonconcept,theproposedalgorithmgener- atesthenext bestactionofthemobileagentinacontinuous domain whileavoidingobstacles. Second, theproposed utility func- tionresultsintheeﬃcientpathintermsofsearchtime andsuc- cessratewhile providingaccurate estimationofthe sourceterm.

Inotherwords,itfacilitatesbettersearchbyautonomouslyﬁnding the right balance between exploration and exploitation. To vali- date theenhanced performance ofthe proposedstrategy, various numericalsimulationsareconductedandcomparedwiththecon- ventionalInfotaxis[7].

Therestofthepaperisorganizedasfollows.InSection 2,the gas dispersion model and the sensor model are explained. Sec- tion3explainstheestimationprocess basedontheparticleﬁlter and the existing Infotaxis algorithm which is a base of the pro- posedapproach.InSection4,theproposed receding-horizonRRT- InfotaxisapproachwithabriefdescriptionontheRRTispresented.

Thedetails ofsimulationsetup andresultsaregiveninSection5.

Lastly,conclusionsandfutureworkaresummarizedinSection6.

2. Gasdispersionandsensormodeling

In this section, mathematical formulations of the gas disper- sionmodelandthesensormodelareexplained.Inparticular,the Gaussianplumemodelisadoptedasthedispersionmodelandthe Gaussiannoisesensormodelisutilized.

2.1. Dispersionmodel

TheGaussiandispersionmodel[39,40] assumesthat thegasis dispersedfromthepointsourceinthedownwind,crosswind,and vertical directions where positive x and y directions are aligned withthedownwindandcrosswinddirections,respectively. Inthis model, the source is released continuously from the gas source originr0= [^x0,y0,z0]^T∈R³⁺ with therelease rateof Q0∈R⁺ and the mean gas concentration at the sensing location r_k= [^rx,k,r_y_,_k,r_z_,_k]^T∈R³⁺attimestepkisformulatedas

R

(

r_k

|θ ) =

^Q⁰ 2

π

V

σ

y

σ

z

exp

−

ck2

2

σ

y2

×

exp

− (

r_z_,_k

−

^z0

)

²

2

σ

z2

+

^exp

− (

r_z_,_k

+

^z0

)

² 2

σ

z2

,

(1)

where thesource term, θ, indicates the parameters which isre- quiredtomodelthegas dispersionsituationsuch assourceloca- tion,windspeed,andgasreleaserate.c_kisthecrosswinddistance ofthesensing location fromthesourceand V is themeanwind velocity.Thestandarddeviationsofconcentration

σ

yand

σ

zinthe crosswindandverticaldirectionaredeﬁnedas

σ

y

= ζ

1d_k

/

1

+

⁰

.

0001d_kand

σ

z

= ζ

2d_k

/

1

+

⁰

.

0001d_k (2) wheredk isthedownwinddistancefromthesource.ζ1 andζ2are thestochasticdiffusiontermsinthecrosswindandverticaldirec- tions,respectively.Inthisstudy,weassumedthemobileagentﬂies at11m abovethegroundandthegasstacksatthesameheight.

Therefore,fromthefollowingsections,weonlyconsidera2-Den- vironmentwherethelocation ofthe mobileagent attime stepk isdeﬁnedasr_k= [^rx,k,r_y_,_k]^T∈R²⁺.

2.2.Sensormodel

Sincetheactualgasconcentrationfollowsastochasticprocess, thesensormodelismodeledbytheGaussiandistribution[41].The probabilitydensityfunctionoftheGaussiansensormodelwiththe noisestandarddeviation

σ

g,kattimestepkisformulatedas:

p

(

z_k

|θ ) =

N

(

z_k

;

^R

(

r_k

|θ ), σ

_g²_,_k

) =

¹

σ

_g_,_k

^√

2

π

^exp

⁻

(

z_k

−

^R

(

r_k

|θ ))

2

σ

_g²_,_k

(3) wherethemeasurementattime stepkisrepresentedasz_k∈R⁺ andits meanvalueisobtainedbyEq. (1).Thestandard deviation ofthenoiseisexpressedas:

σ

g,k

= α

R

(

r_k

|θ ) + σ

env (4) wherethenoiseisproportionalto R(r_k|θ )withthepositive coef- ﬁcient

α

and

σ

env representstheadditionalsensing noisecaused bytheenvironmentsuchaswinddisturbanceandtemperature.

ThegasdispersionwiththeGaussiandispersion modelformu- latedbyEq. (1) isshowninFig.1(a).Corresponding sensormea- surementsfromthegasdispersionaresparse andnoisyasshown inFig.1(b).Dueto thisaspect,just moving towards highermea- surementvalue graduallyforlocalizing thesource(known asthe

(3)

Fig. 1.The sample map of the gas dispersion model and sensor measurements.

gradient following method) does not work well in a real world;

hence,the information-theoreticmethodwhich willbe explained in thefollowing section isproposed to providemore robustper- formanceagainstsensornoisesanddisturbances.

3. Information-theoreticsourcesearch

The information-theoretic STE approach has an advantage for robustestimationeveninaturbulentenvironmentwherethegra- dient of source concentration is not continuous or smooth. Info- taxis is one ofthe information-theoretic source search strategies for source localizationor source term estimation usinga mobile agent [6–9]. The Infotaxis utilizes the entropy (i.e., uncertainty) to calculate the utility function forthe source terminformation.

By computing the reduction of the entropy for all possible next positions based on cardinal movement directions(in general, up, down,left, andright),theactionwiththemaximumreduction is selectedasthenextmovementstepofthemobileagent.Fromthe following,Infotaxiswiththe particleﬁlterisexplained mostlyby following [7] but withsome modiﬁcationsincludingthe useofa discretizedsensormodel.

3.1. Sourcetermestimation

The source term represents parameters required to model the gas dispersion such as source location, wind speed, release rateandstochastic diffusionterms. Amongthose parameters, the source location and the release rate are key factors to be esti- mated. Thus, inthis paper, the source term vector is deﬁned as θ= [^x0,y0,Q0]^T = [^r0,Q0]^T∈R³⁺, the 2-D source location and release rate, while other parameters are assumed to be known.

Other parameterscanbeacquiredbythemeteorologicaldataand gas properties. In many studies on source term estimation, this assumptionisgenerallyaccepted[7–10].

The particleﬁlterbased onBayesian frameworkis utilizedfor estimating the source term since the source term distribution is generallyhighlynon-linearandnon-Gaussian.Theposteriorprob- abilitydensityfunction(PDF)oftheestimatedsourcetermattime stepk,θk,isrepresentedby:

p

(θ

k

|

^z1:^k

) =

^p

(

zk

|θ

k

)

p

(θ

k

|

z1:^k−¹

)

p

(

z_k

|

^z1:^k−¹

)

⁽⁵⁾

where p

(

z_k

|

^z1:^k−¹

) =

p

(

z_k

| θ

_k

)

p

(θ

_k

|

^z1:^k−¹

)

d

θ

_k

.

(6) TheposteriorPDFisobtainedbyusingthepriorPDF p(θk|^z1:^k−¹), thelikelihood(i.e.,sensormodel)p(z_k|θk),andthemarginallikeli- hood p(z_k|^z1:k−1).Notethatthesensingques(i.e.,measurements),

collected by the mobile sensor located at r_k, are expressed as z1:^k= {^z1(r1),z2(r2),· · ·,zk(rk)}^.

ToobtaintheanalyticalsolutionofEq. (5) isimpossibledueto thenon-linearityofthesourceterm.Thus,theparticleﬁlterunder theSequentialMonteCarloframework[35] isutilized.Therandom samplescalledparticlesareemployedtoapproximatetheposterior PDFp(θk|^z1:^k).Withi-thestimatedstateofthesourcetermθ_kⁱ and its associated weight wⁱ_k attime step k,Eq. (5) is approximated by:

p

(θ

_k

|

^z1:k

) ≈

N i=¹

wⁱ_k

δ(θ − θ

_kⁱ

)

(7)

whereδ(·)îndicates ^the ^Dirac ^delta^function ând ^N îs ^the ^num- berofparticlesintheparticlefilterwhichrepresentsthepotential sourceterms.Theparticlefilteriterativelyupdatesitsweightswith Eq. (5). By assuming that the source term is time-invariant (i.e., stationarysource location andconstant releaserate), p(θk|^z1:^k−¹) ismadeequaltothePDF p(θk−¹|^z1:^k−¹).Inother words,theprior distribution at time step k is represented by the posterior distribution at time step k−^1. ^Besides, âs^the ^marginal ^likelihood p(z_k|^z1:^k−¹)isnotaffectedbytheparticles,thistermisconsidered as a constant normalization factor when updating the weights.

Hence,theunnormalizedi-thparticleisupdatedby:

¯

wⁱ_k

=

^p

(

z_k

|θ

_kⁱ₋₁

) ·

^w_kⁱ₋₁

,

(8) wherethelikelihood p(z_k|θ_kⁱ₋₁)can be obtainedbythe Gaussian dispersionmodel inEq. (1) andthe sensormodel inEq. (3). The normalizedweightiscomputedas:

wⁱ_k

=

^w

¯

ⁱ_k

_N

j=¹w

¯

_k^j

.

(9)

In this study, to sample the particles, the importance sampling (IS) is introduced [42]. It samples particles by following the importance (proposal) distribution. For the importance distribution, the estimatedsource term distribution at the previous step,expressedby p(θk−1|^z1:k−1),isadopted similarto other rel- evantstudies[7–9].Inaddition,toprevent adegeneracyproblem caused by IS, which indicates that only a small number of par- ticleshavenon-zeroparticle weights,we adoptedthe resampling method [36]. The effectivenumber of samples, N_{e f f} ≈ N ¹

i=1(wⁱ_k)² arecomputedateveryiteration,andwhenNe f f goesbelowacer- tainvalue, theparticles areresampled[43].Inaddition,since we assumedthatthepriordistributionattimestepkisrepresentedby theposteriordistributionattimestepk−^{1 as}^mentioned^above,^it mightleadtoalackofparticlesdiversity;hence,theMarkovchain Monte Carlo(MCMC) move step is additionally conductedto in- creasethediversityoftheparticlesaftertheresamplingstep[44].

(4)

3.2. Infotaxis

Infotaxis is one of the information-theoretic strategies for source term estimation. It locally maximizes the expected information gain which indicates the reduction in the entropy (i.e., uncertainty) of theestimated source termdistribution. It utilizes theinformation-theoreticutilityfunctionformulatedby:

I

(

u_k

) =

^p

(

r_k₊₁

)

H_k

− (

1

−

^p

(

r_k₊₁

))(

E

[ ˆ

^Hk+¹

(ˆ

^zk+¹

)] −

^Hk

)

⁽¹⁰⁾

where H_k

= −

p

(θ

_k

|

^z1:k

)

logp

(θ

_k

|

^z1:k

)

d

θ

_k

.

(11) TheShannon’sentropy,Eq. (11),isadoptedtocompute theutility function.Theprobability p(rk+¹)representstheprobabilitywhere theactualsourcelocatedatthenextsensingpointr_k₊₁(=^rk+^uk).

Themaneuverofthemobileagentisdefinedinadiscretedomain where its admissible actionset is definedas U_I= {↑,↓,←,→}^. Theprobability p(r_k₊₁)ismeaningfulonlywhentheactualsource isexactlylocatedatrk+¹ andcomputingthistermrequiressignifi- cantcomputationalburden[7].Hence,asimplifiedutilityfunction termedasInfotaxisIIignoringthefirstterminEq. (10) isutilized, givenas:

I

(

u_k

) =

^Hk

−

^E

[ ˆ

^Hk+¹

(ˆ

^zk+¹

)].

⁽¹²⁾

Accordingto Eqs. (7)∼^(9), ^the^posterior^PDF, ^p(θk|^z1:k),can be replaced bytheweights oftheparticleﬁlter.Thus,theShannon’s entropyisable tobeapproximatedusingthenormalizedweights inEq. (9) as:

H_k

≈ −

N i=¹

wⁱ_klogwⁱ_k

.

(13)

The second term in Eq. (12) indicates the expected Shannon’s entropy with the future measurement ˆzk+¹ at the next sensing position r_k₊₁. Thus, the expected entropy at time step k+^{1 is} formulatedas:

E

[ ˆ

^Hk+1

] =

p

(ˆ

z_k₊₁

| θ

_k

)

H

ˆ

_k₊₁

(ˆ

z_k₊₁

)

d

ˆ

z_k₊₁ (14)

where

H

ˆ

_k₊₁

(ˆ

z_k₊₁

) = −

p

( θ ˆ

_k₊₁

|

^z1:k

, ˆ

z_k₊₁

)

·

^log^p

( θ ˆ

k+¹

|

^z1:^k

, ˆ

^zk+¹

)

d

θ ˆ

k+¹

.

(15)

ThePDFp(θˆk+¹|^z1:^k,ˆ^zk+¹)inEq. (5) isabletoberepresentedwith theparticleweights.As doneinEq. (8),theunnormalized weight ofthe potential sourcetermwiththefuture measurementisup- datedas:

ˆ¯

wⁱ_k₊₁

=

^p

(ˆ

z_k₊₁

|θ

_kⁱ

) ·

^wⁱ_k ⁽¹⁶⁾

andtheupdated weight,^wˆ¯ⁱ_k₊₁^,^is^normalized^to ^wˆⁱ_k₊₁^,^as^doneⁱⁿ Eq. (9).Then,theexpectationoftheentropyattimestepk+^{1 can} beexpressedas:

E

[ ˆ

^Hk+¹

] = −

p

(ˆ

^zk+¹

|θ

k

)

N

i=¹

ˆ

wⁱ_k₊₁log^w

ˆ

_kⁱ₊₁^d

ˆ

^zk+¹

.

(17)

BysubstitutingEq. (13) andEq. (17) into Eq. (12), theInfotaxisII utilityfunctionisexpressedas:

I

(

u_k

) = −

N i=¹

wⁱ_klogwⁱ_k

+

p

(ˆ

z_k₊₁

| θ

_k

)

N

i=¹

(

w

ˆ

_kⁱ₊₁

)

log

(

w

ˆ

_kⁱ₊₁

)

d

ˆ

z_k₊₁

.

(18)

However,itisdiﬃcultto analyticallysolvetheabove equation asitneedstobeintegratedwithallpossiblefuturemeasurements ˆ

zk+¹ wheretheparticleweightwˆⁱ_k₊₁ isalsoafunctionofˆzk+¹,as represented in Eq. (16). Toaddress thisissue, we ﬁrst discretize thefuturemeasurementwithacertainintervalδdˆ as:

d

ˆ

_k₊₁

= [ˆ

^d_k⁽^min₊₁⁾

,

d

ˆ

⁽_k^min₊₁⁾

+δ

d

ˆ

_k₊₁

, ...,

d

ˆ

_k⁽^max₊₁⁾

] = [ˆ

^d_k⁽¹₊⁾₁

,

d

ˆ

_k⁽²₊⁾₁

, ...,

d

ˆ

⁽_k^d₊^max₁⁾

]

(19) wheretheminimumandmaximumvalueinfuturemeasurements, dˆ⁽_k^min₊₁⁾ and dˆ⁽_k^max₊₁⁾ are obtained by following the empirical three- sigmaruleas

μ

k+¹±³·

σ

g,k+¹ where

μ

k+¹=N

i=¹R(rk|θ_kⁱ)·^wⁱ_k indicatestheexpectedmeanconcentration derivedbyEq. (3) and

σ

g,k+¹=

α

·

μ

k+¹+

σ

env.Theutilityfunctionwithdiscretized mea- surementsisthenexpressedas:

I

(

u_k

) = −

N i=1

wⁱ_klogwⁱ_k

+

dmax

j=1

p

(

_d

ˆ

^j

k+¹

|θ

k

) ·

N l=¹

(

w

ˆ

^l_k₊₁

)

log

(

w

ˆ

^l_k₊₁

).

(20) Note that the updated and normalized weight wˆ^l_k₊₁ is now affected by _dˆ^j

k+1 rather than ˆzk+¹ in Eq. (16). Besides, the measurement likelihood with the discretized measurement set from theGaussiansensormodel, p(dˆ_k^j₊₁|θk),canbeexpressedwiththe cumulativedistributionfunction(CDF)ofthestandardnormaldis- tribution,(·)[45],asgiven:

p

(

_d

ˆ

^j

k+¹

|θ

k

) =

N i=1

d

ˆ

_k^j₊₁

+ δ

_d

ˆ σ

_gⁱ_,_k₊₁

−

d

ˆ

_k^j₊₁

σ

_gⁱ_,_k₊₁

wⁱ_k (21)

where

d

ˆ

_k^j₊₁

= ˆ

^d_k^j₊₁

−

^R

(

r_k₊₁

| θ

_kⁱ

),

σ

_gⁱ_,_k₊₁

= α

R

(

r_k₊₁

|θ

_kⁱ

) + σ

env

.

⁽²²⁾

Finally,the next bestaction, u^∗_k,among theaction candidate set, UI= {↑,↓,←,→}^,^is^chosen^as:

u^∗_k

=

^arg ^max

u_k∈UI

I

(

u_k

).

(23)

The conventional Infotaxis is basically a one-step greedy ap- proachand considers next movements incardinal directions(i.e.

neighboring up, down, left and right grid points) only, which is highly likely to lead the mobile agent to fall into the local op- timainobstacle-richenvironments. Whentheobstacles blockthe next step of the mobile agent, possible maneuvers are reduced and it could be stuck at the corner or boundary of the obstacle. Furthermore,the utility function of Infotaxis tends to guide themobileagent tobebiasedtowardexplorationratherthanex- ploitation[33]. Therefore,directlyusingthe entropyforthe Info- taxisutilityfunctionmightnotbeeﬃcientfromtheperspectiveof thesearch time.Thus,foreﬃcientsourcesearch forobstacle-rich environments, long-term(i.e. multi-step ahead) planning using a continuousactionspaceisneeded.

(5)

4. Rapidly-exploringrandomtree-Infotaxis

To resolve the aforementioned problem of conventional infotaxis,therapidly-exploringrandomtree(RRT),thesampling-based deliberativepathplanningalgorithm[29],isincorporatedwithIn- fotaxistopredicttherewardfromlong-termfuturemeasurements inanobstacle-richdomainwithanewly-designedutilityfunction.

The advantage of introducing RRTis not only obstacle avoidance butalsogeneratingvariousfeasiblepathsintheobstacle-ﬁlleddo- main. Generated RRTtree ina recedinghorizonmanner (similar to the concept of model predictive control [46]) is used as the long-term action candidate set of the mobile agent. It helps the mobileagentnot tofallintothelocaloptima.Fromthefollowing, theprinciple oftheRRTisbrieﬂyexplained,andthenthe overall processofRRT-Infotaxisisdescribed.

4.1. Rapidly-exploringrandomtree(RRT)

The rapidly-exploring random tree (RRT) is a sampling-based pathplanningalgorithm[31]. Itisdesignedto searchnon-convex and high-dimensional spaces efficiently. By randomly generating samplesinacontinuousspace,itinherentlyfillsthespacetowards tounexploredareasuniformly.Letthegivenspacebedenotedbya setZ⊂R²asweconsidera2-Dconfigurationspace.Theareaoc- cupiedwithobstaclesisrepresentedbyZ_obsandtheobstacle-free area is denoted asZ_{f ree}.The RRTconstructs a treeby sampling random nodes in Z_{f ree}. From the starting point qinit, the tree gradually expands and the process ends when the tree expands sufficientlynearthegoalpoint,q_goal.Duringeach iterationofthe algorithm, therandomsample node q_rand issampledin Zandif itliesinZf ree,theclosestsamplenodeqnearest inthetreeT from qrand is selected.Ifqrand is accessibleto qnearest anddistancebe- tween them is less than the predefined movement distance , q_rand is considered asthe newnode qne w and addedto the tree T.Ifthedistanceislongerthan ,anewsample fromqnearest to q_rand atthedistance isconsidered asq_{ne w}.The processoftree expansionisdescribedinFig.2.

TherepresentativecharacteristicsoftheRRTcanbecategorized twofold [47]. First, it generates the safe path avoiding obstacles.

Second, it makes random samples in a continuous domain with simple implementationso that thetree tends toexpand towards the unexplored area. By adopting these properties,the proposed RRT-Infotaxisstrategyisabletochooseeﬃcientactioncandidates ofthemobileagent.

4.2. SearchprocessofRRT-Infotaxis

TheproposedapproachusesRRTasalocalpathplanner.Unlike the RRT, in the source term estimation problem, the goal point which can be regarded as a true source origin is not known in advance.Furthermore,inpractice,theentire(i.e.,global)map information is not given. Thus, planning the global path towards the goal directly is impossible. However, as the estimation pro- cessproceeds,theestimatedsourcelocationisclosertotheactual sourcelocationandwitharangesensorsuchasLiDAR,theobsta- cles nearby can be detected. Thus, by planning thelocal path at eachiterationofthesearchprocessinarecedinghorizonmanner, themobileagentcanselecttheeﬃcientpathtothesourceorigin whileavoidingobstacles.

The overall processof theRRT-Infotaxis isdescribed in Fig.3.

ThetreeisgeneratedwithNtnnumberofRRTnodesinthelimited regionwitharadiusof R_range fromthecurrentposition,asshown in Fig. 3 (a). After that, the tree pruning process is conducted.

As weconsiderm recedinghorizonsteps,onlythebrancheshav- ing m number of nodes are considered, as illustrated in Fig. 3 (b). Note that, m is three in this ﬁgure. The pruned branches,

Fig. 2.Tree expansion of the RRT.

V = [^V1,V2,...,VN_b] ^where ^Nb is the total number of pruned branch,areconsideredasthepossiblefuturerecedinghorizon(i.e., local) path of the mobile agent. Each branch is formulated by a nodesetV_b=

v_b_,₁,v_b_,₂, ...,v_b_,_m

wheretheelementinthesetin- dicatesthe 2-D location inthe search area. The elements of the branchsetareusedastheactioncandidates(i.e.,movementdirec- tions)ofthemobileagent.Thereafter, theutility functionofeach branchis computedandthe best rewarded branchis selected as shown in Fig. 3 (c). Detailed explanation on the utility function willbedescribedinthefollowingsection.Amongthenodesinthe selectedbranch, themobile agent movesto theﬁrst node ofthe selected branch as illustrated in Fig. 3 (d). The RRT-Infotaxis al- gorithmrepeatedly removestheoldtreeandcreates newtreeby centering thecurrent positionof themobile agent at every time step.

4.3.UtilityfunctionofRRT-Infotaxis

Foreﬃcientsource search,balancingexplorationandexploita- tion is of primary importance [48]. As explained in Section 3.2, the Infotaxis utility function only utilizes the entropy reduction and it is known to be biased towards exploration [33]. Hence, by introducing a new term in the utility function, the proposed RRT-Infotaxisfacilitatesbothexplorationandexploitationproperly inobstacle-rich environments.Inotherwords,combinedwiththe information-theoretic utility function and the cost function from the receding-horizon RRT, the mobile agent balances exploration (i.e.,lookingfortheinformativesensingques)andexploitation(i.e., movingtowardsthesourceorigin).Theproposednewutilityfunc- tioniscomprisedoftwoparts:maximizingtheentropyreduction andminimizingthesearchpath.

(6)

Fig. 3.The example process of selecting the next best action of RRT-Infotaxis whenmis 3.

4.3.1. Maximizingentropyreduction

To maximize the entropyreduction, the Infotaxisutility function,Eq. (20) explained intheprevioussection,isused.However, to reduce the computational burden considering multiple recedinghorizonsteps,thediscretizedGaussiansensormodel(Eq. (21)) is replacedwiththe binary sensormodel.It hasonlytwo sensor measurement values:1 (when detected)and 0(notdetected), so thefuturebinarymeasurementisdenotedasbˆ∈ [⁰,1]^.^The^prob- abilityofthemeasurementattimestepk+ⁿconsideringmsteps (n≤^m)^is^thenrepresentedas:

p

(

b

ˆ

_k₊_n

|θ

k

) =

β

if_b

ˆ

_k₊_n

=

⁰

1

− β

if_b

ˆ

_k₊_n

=

¹

.

⁽²⁴⁾

Theprobability,β,canbeformulatedas:

β =

N i=1

¯

c_kⁱ₊_n

σ

_gⁱ_,_k₊_n

w_kⁱ

(25)

where

¯

cⁱ_k₊_n

= ¯

c_k

−

R

(

r_k₊_n

|θ

_kⁱ

),

σ

_gⁱ_,_k₊_n

= α

R

(

rk+ⁿ

|θ

_kⁱ

) + σ

env

.

⁽²⁶⁾

Toeﬃciently updatethe thresholdc¯_k whichdecides whetherthe measurementisdetectedornot,itisdesignedtobechangedadap- tivelywithrespecttothecurrentsensormeasurement,c_k,as:

¯

c_k

=

⎧ ⎪

⎨

⎪ ⎩

ack

+ (

1

−

a

)¯

ck−¹ ifk

>

1

,

ck

>

c

¯

k−¹

¯

ck−¹ ifk

>

1

,

ck

≤ ¯

ck−¹

c_k ifk

=

¹

,

(27)

wherea issetas0.5inthispapersimilar to[49,50].The threshold increases only when the new measurement is greater than the currentthreshold so that the mobile agent gradually moves toget ahigherconcentration.Updating thethresholdinthisway canfacilitatemoreexploitationsincetheagentcanobtainalarger measurementvalueasitgetsclosertothesource.

Forupdatingtheprobabilityofthefuturemeasurementsinre- cedinghorizonsteps,weutilizethecurrentweightoftheparticle ﬁlterattimestepk.Sequentiallypredictingtheweightsandusing themforthenextstepmightbepossible[51],butitrequiresasig- niﬁcantcomputationalload.Furthermore,asupdatingweights for futuresteps, the reliability ofthe predictiondecreases. Thus, the unnormalizedupdatedweightatnrecedinghorizonstepisrepre- sentedby:

ˆ

wⁱ_k₊_n

=

^p

(

b

ˆ

_k₊_n

|θ

_kⁱ

) ·

^wⁱ_k

.

(28) Finally,theutilityfunctionforeachnodeinabranchVb usingthe binarysensormodelcanberepresentedas:

I

(

v_b_,_n

) = −

N i=¹

w_kⁱlogwⁱ_k

+

1 bˆk+ⁿ=0

p

(

b

ˆ

_k₊_n

|θ

k

)

·

N l=¹

(

w

ˆ

^l_k₊_n

)

log

(

w

ˆ

^l_k₊_n

).

(29) Notethat,computingthisutilityfunctionismuchlighterthanthat ofEq. (20) owingtothebinarymodel.Allthenodesfromtheﬁrst tothem^thnodeineachbranch,whicharetheactioncandidatesfor therecedinghorizonapproach,arecomputedbytheEq. (29) and accumulated. However, asthe closest node has the mostreliable information,adiscountfactor

γ

isappliedwhenaccumulatingthe valueoftheutilityfunction.Thus,theutilityfunctionofmaximiz- ingtheentropyreductioninRRT-Infotaxisisformulatedby:

J₁

(

V_b

) =

m n=¹

γ

⁽ⁿ⁻¹⁾I

(

v_b_,_n

).

(30)

4.3.2. Minimizingsearchpath

To improve the performance in terms of reducing the search time or equivalently path length, the additional cost function is introduced into the RRT-Infotaxis utility function. This new cost functionoftheRRTcalculatesthepathlengthfromtheinitialpo- sition to the currentposition of the mobile agent and adds the

(7)

Euclidean distance between the current position and goal point motivated fromtheA* cost function [47,52]. Asthe exact source location is notknown inadvance, theestimatedsource origin^rˆ0 fromtheparticleﬁlterisconsideredasthegoalpointanditisup- datedateachsearchandestimationiteration.Asthesourcesearch process proceeds, the estimatedlocation becomesmoreaccurate.

Itimpliesthatasthemobileagentcollectsmoreinformativemea- surements, the mobile agent tends to move towards the correct source location. To thisend, the cost function ofminimizing the searchpathisformulatedas:

J2

(

V_b

) =

m n=1

( |v

b,n

− v

b,n−1

| ) + |ˆ r

0

− v

b,m

|

⁽³¹⁾

where b∈ {¹,2, ...,N_b}^and ^vb,0 indicates the currentlocation of themobileagent.Intheaboveequation,thelefttermoftheright- handsideindicatesthatthesumofthepathlengthsfromcurrent tomstepfuturelocationofthemobileagent,andrighttermrepre- sentstheEuclideandistancebetweentheﬁnalfuturesteplocation andthegoalpoint(i.e.,estimatedsourceoriginˆr0).

Bycombiningabove twoutility functions,the proposedutility functionattimestepkfortheRRT-Infotaxisisexpressedas:

Jk

(

Vb

) =

J1

(

Vb

) − (

1

− )

J2

(

Vb

)

(32) where isthepositiveweight.Inthisstudy, ischosenas0.5to make thetwo functionscontribute equally. Notethat,asthecost function ofthe RRTissupposed to minimize thecost, theutility function, J2(Vb),hasaminussignintheﬁnal utilityfunction, Jk. Thebestbranchisobtainedas:

V^∗_b

=

^arg^max

Vb∈V J

(

V_b

).

(33)

Theﬁrstnode v^∗_b_,₁ fromthebestbranchV_b^∗ ischosenasthenext maneuverofthemobileagent.

Notethat,atthebeginningoftheestimationprocess,theesti- matedsourcelocationcouldbefarfromthetruevalue,andmov- ing towardsthewrongsourceorigincouldbe badforexploration initially. However, at early stages,the moving directiondoesnot meanmuchastherewouldbenomeaningfulinformationaround the mobile sensor anyway and the Infotaxisutility function (i.e., J₁)makessurethattheagentexplorestheenvironmentwell.Itis reportedthat theInfotaxisutility functionhasagoodexploration propertybutlacks anexploitationproperty[33].Thus,theheuris- ticterm J2 isintroducedforbetterexploitation. Oncethe mobile agentgatherssuﬃcientmeasurements,theutilityfunction J₁does not changemuch.Thus,afterestimationofthe sourceismoreor lessconverged,the utility function J2 wouldplay amain role to lead the mobile agent towards the source, facilitating better exploitation.Insummary,bytheheuristicterm J2,wemightlosea littlebitofperformanceintermsofexplorationbutgainalotmore in terms of exploitation, which results in right balance between exploration and exploitation and much better search estimation performance. This willbe supported by numericalsimulation re- sultslater.

During the source term estimation process, the mobile agent is assumedtostay atone location fora few secondsto obtaina reliable gassensormeasurement beforemoving tothe nextsam- plingpoint.Notethat, ingeneral, theresponseandrecoverytime ofthegassensorrangesfromafewsecondstominutes[53].This is whywe considered the discrete action strategy moving to the next samplingpoint at each time step witha constant speed in thisstudy.ThisisacommonpracticeintheInfotaxis-basedsource search studies,supported byexperimental validation[17,41]. Tra- jectory (or velocity) planning considering the detailed dynamics

Algorithm1RRT-Infotaxis.

1: T.init(qinit)

2:fork=1,2,. . . ,kmaxdo

3: zk←Readanewsensormeasurement

4:

(θk−1,wk−1)→(θk,wk)

usingEq. (9) 5: qne w←^qinit= [^rx,k,ry,k]

6: whileNseed≤^N^tn^do 7: qrand←^Sample(qrand)

8: qnearest←FindNearestNode(qrand,T) 9: qne w←Steer(qnearest,qrand) 10: ifObstacle Freethen 11: T.add_node(qne w) 12: T.add_vertex(qne w,qnearest) 13: Nseed=^Nseed+¹ 14: V=

V1, ...,VN_b

←TreePrune(T) 15: for allVb= [vb,1,...vb,m]∈V do 16: for n=¹,2,...,mdo

17:

(θk,wk)→(ˆθk+n,wk+n)

usingEq. (28)

18: [ˆbk+1,...,bˆk+m]←Collectbinarysensormeasurements 19: JkcomputationwithEq. (32)

20: V^∗_b=argmaxVb∈V Jk(Vb)← Next best maneuvers for receding horizon timesteps

21: ClearT

22: ifσp<σtthen←^estimation^converged 23: break;

24: rk+1= [^rx,(k+1),ry,(k+1)]= [^rx,k,ry,k]+^v^∗b,1←^move^to^the^next^sampling positionrk+1usingtheﬁrstnodeofV_b^∗

andnonholonomic constraints of the mobile agent with a high- performance gassensor (providing a fastresponse time) remains asfuturework.

TheentireprocessoftheRRT-InfotaxisissummarizedinAlgo- rithm1.ThedetailofthefunctionsinAlgorithm1isdescribedas follows.

• ^Sample: ^This ^function ^generates ^the ^random ^position ^qrand whichisincludedintheregionwiththeradiusof R_range cen- teringthecurrentlocationofthemobileagent.

• FindNearestNode: This function ﬁnds the nearest node from othernodesinRRTtreeT toqrand.

• ^Steer:^This^function^generates^qne w alongthepathfromq_nearest towardsq_randatadistance where istheﬁxedincremental distance.

• ^TreePrune:^This^function^returns^all^set^of^branches.

5. Numericalsimulations

Tovalidate theperformance oftheproposed RRT-Infotaxis,we performedMonteCarlosimulations invarious environments. The Gaussianplumemodelandbinarysensormodelareutilized.

5.1. Actualgasdispersioninanenvironmentwithobstacles

Forthesimulations,gasdispersion situationsneedtobemod- eled.Todescribeitrealistically,theGrazLagrangian(GRAL)model, whichisdevelopedbythe GrazUniversityofTechnology,Austria, is utilized for simulations, which computes ﬂows around obstacles [54]. It is the revised Lagrangian particle model where the Lagrangian particle model produces gas particles fromthe diffusion source by considering not only the gas properties but also thestatisticalenvironment.Thismodelassumesthatatmospheric diffusion is able to be modeled by a Markov chain process and representsportionsofeachparticlebycalculating3-Dwindveloc- ity[55]. The GRALmodel is adopted asthe dispersion modelin the environment with obstacles as it is suitable forfast and ro- bustmodelinginalargearea[54].Thesamplemapisdescribedin Fig.4andtheparametersusedinthismapareasfollows.

(8)

Fig. 4.The satellite view of the sample map and gas dispersion map with obstacles using the GRAL model.

•^True^source ^term: ^Q0=^{2 kg/h}=⁰.56 g/h,andr0= [^{197 m}, 235 m]^;

•^Search ^area: ^A=^{255 m}×^{267 m,} ^wind ^velocity ^V =^{2 m/s} anddirectionφ=²⁴⁰^◦^,^horizontal^standard^deviation

σ

y=^20, vertical standarddeviation

σ

z=^10, ^and^stacking^height ^z0= 11 m;

•^Searchâgent: ^Movement^step^size^d=^{9 m,}^constant^flightâl- titude at 11 m above the groundwhich is the same as the gasstacking height,andtheradius oftheregionforgenerat- ingRRTsamplenodes,Rrange,isdefinedasRH-step×^d^where RH-steprepresentsrecedinghorizonsteps;

•Êstimation^condition:^Numberôf^particles^for^the^particle^filter N=³,000.Thestandarddeviationofthesensornoiseissetto

σ

g=^R(r|θ )+^10;^and

•^Terminal conditions: standard deviation of the particle ﬁlter

σ

t=^2,^and^the^estimation^success^threshold^ds=^{2 m.}

5.2. Theeffectrecedinghorizonstepsandthenumberofnodes WhenimplementingRRT-Infotaxis,we needtosamplerandom nodes to generate RRTtrees. However, takingtoo many samples would notincrease the efficiencyforsource search further while increasing computationalloads.Besides,forapplyingthereceding horizonapproach,thepropernumberoffuturesteps needstobe investigated.Therefore,inorderto findthepropernumberofthe RRTsamplenodesNtnandrecedinghorizonstep(RH-step)m,corresponding numerical simulations are first conducted; thesetwo parameters are key factors which affect the performance of the algorithm. Withchanging thenumber ofRRTsample nodes, Ntn, the meansearch time (MST) ofRRT-Infotaxis withdifferentm is comparedforthesimulationsusingthesample mapenvironment explainedintheprevioussectioninFig.5.Inthefigure,resultsare averaged over 100MonteCarlosimulations andRH-steps from2 to5are denotedasRH2 toRH5.As N_tn increasesto30,allcases showthedecreasingtrendoftheMST.However,theMSTdoesnot getbetterover30samplenodes.Whentherearetoomanysample nodesinacertainregion,theutility functionvaluedifferencebe- tween differentbranchesmightbeinsignificant.Besides,selecting the best action candidate does not always provide the best performance duetothe differencebetweenthe approximatedutility function andrealsituation. From thisresult, RH3 withN_tn of30 can be seen asthe proper values. Althoughthe MSTs of RH4 or RH5with40nodesarealmostthesameasthatofRH3,thelarger receding horizonsteps or sample nodes would mean the higher

Fig. 5.The MST of RRT-Infotaxis with different RRT nodes and RH-steps.

computationalload.Thus,allofthefollowingsimulationshereafter areconductedwithmof3andN_tnof30.

Itisworthwhilenotingthat,onesearchtime step(incomput- ingMST)includesthetimetakenforcollectingthereliablesensor reading at one position, computation time to calculate the next best samplingposition andtime to reach thenext samplingposition with a constant speed. In this regard, let us compare the MST betweenInfotaxis andthe RRT-Infotaxis. First, the averaged computationtime perstepforcalculatingthebestpositioninIn- fotaxis (one-step lookahead) and RRT-Infotaxis algorithms (RH3) is 0.0024 and 0.0535 seconds in the MATLABenvironment on a desktop with an Intel(R) Core(TM) i7-7707HQ CPU @ 2.80 GHz andNVIDIA GeForce GTX 1060, respectively. On the other hand, themovementdistancebetweenconsecutivesamplingpositionsis fixedintheoriginalInfotaxis(i.e.,movementstepsized=^{9 m}âs definedearlier)whilethat oftheRRT-Infotaxisisequalorslightly smaller than d=^{9 m;} ^this îs ^because ^the ^RRT âlgorithm âdds new random nodes (sampling position candidates) to the existing tree within the fixed threshold distance ofd=^{9 m.} În ^fact, theaverageddistancebetweensamplingpositionsofRRT-Infotaxis is8.3 m.Thus, considering bothtime loss fromcomputationand gain from moving a shorter movement step size between sam- plingpointsofRRT-Infotaxis,wecouldconsiderthatthesameMST forRRT-Infotaxis(RH3)andInfotaxisrepresentsapproximatelythe sametimetakenforthesourcesearchandestimationprocess.