Contents lists available atScienceDirect
Aerospace Science and Technology
www.elsevier.com/locate/aescte
Receding-horizon RRT-Infotaxis for autonomous source search in urban environments
Seulbi An, Minkyu Park, Hyondong Oh
∗DepartmentofMechanicalEngineering,UlsanNationalInstituteofScienceandTechnology,Ulsan,RepublicofKorea
a rt i c l e i n f o a b s t r a c t
Articlehistory:
Received9March2021
Receivedinrevisedform20November2021 Accepted5December2021
Availableonline10December2021 CommunicatedbyChoonKiAhn
Keywords:
Autonomoussearch SourceTermEstimation(STE) Bayesianinference
SequentialMonteCarlomethod Rapidly-exploringrandomtrees Information-theoreticsearch
In emergency situations such as hazardous gas leak, search and estimation for identifying source information,knownassourcetermestimation(STE),inatimelyand accuratemannerisofsignificant importance.In realworld situations, obstacles suchas buildingsor barriersnot onlyblock the path forsearchbutalsointerferetheflowofthegassource.Forautonomoussource searchandestimation usingamobilesensorinsuchobstacle-richenvironments,thispaper proposesaninformation-theoretic STEapproachbycombiningawidely-used Infotaxiswiththe rapidly-exploringrandomtrees (RRT).In particular,theproposedstrategyutilizesthereceding-horizonRRTconceptwithanewlydesignedutility functionfordeterminingthenextmaneuverofamobileagenttogetthebestinformationofthesource whileavoidingobstacles inurbanenvironments. Numerical simulationsinvariousenvironmentsshow thesuperiorperformanceoftheproposedapproachcomparedwiththeoriginalInfotaxismethod.
©2021ElsevierMassonSAS.Allrightsreserved.
1. Introduction
Autonomoussourcesearch andestimationhasavariety ofap- plications across disasters, accidents or terrorism situations. For instance,chemical,biologicalorradiologicalmaterialscouldbere- leasedintotheatmosphereaccidentallyorintentionally.Whenthis happens,localizingasourcelocationandquantifyingsourceinfor- mationsuchasareleaserate,calledsourcetermestimation(STE), is the primary taskto take proper actions forpreventing further damages andevacuate people fromdangerous regions [1].In the past, alarge numberof pre-installed sensorsare utilized forcol- lectingsensingques.However,thisapproachrequireshighcostand has a spatial limitation when the accidents occur inunexpected areas.Thus,withrecentdevelopmentsonautonomoussystems,us- ing mobilesensors suchasan unmannedaerialvehicle (UAV)for STEhasbeengainingpopularity[1–13].Usingamobilesensoral- lowstocoveralargeareaandtoberapidlydeployedanywhere it isneededinatimelymanner.
DiverseapproachesforSTEusingmobilesensorsare proposed, whichcouldbelargelycategorizedasbio-inspired,gradient-based and information-theoretic methods [2]. Bio-inspiredmethods are inspired by the behaviors of the living creatures such as moth, bacteria orfly[3–5]. Thesemethods mimic andcombinethebe- haviorsoffoodsearchingstrategiessuchassurge,cast,zigzagand
*
Correspondingauthor.E-mailaddress:[email protected](H. Oh).
spiral. However, in reality,the sensing ability and locomotionof mobile sensors is far from those of living creatures [4], which makestheimplementationandperformance ofthisapproachlim- ited.Ingradient-basedmethods[14],themobileagentmovestoa higherconcentrationpoint,assumingthattheactualplumemodel producesacontinuousandsmoothgradientfield.Thisassumption isoften invalid ina real plume dispersion situationespecially in a turbulent flow. Lastly, information-theoretic methods on which thisstudy focuses are basedon probabilistic strategies. Theyuse theutility function withtheentropyor uncertaintytodetermine thebestnextactionofthemobileagentsuchasInfotaxis[6,7,9,15]
orEntrotaxis [8]. It employs a dispersion model, a sensormodel andmeasurements to estimate the sourceterm usinga Bayesian framework.Thesestrategiesperformwelleveninaturbulentflow wherethesensingquesaresparseorfluctuate.
Itis worthwhile notingthat,manyof theexisting STEstudies assume that the source search mission is conducted in an open spaceratherthan obstacle-filledenvironments andthe maneuver ofthe mobileagent is limitedin adiscrete domain [6–10,16,17].
Inapractical sourcesearch situation,thesearchingenvironments usually contain buildings or barriers which affect the dispersion phenomena and hinder the search process of the mobile agent [18]. Also, as the movement of the mobile agent in one step is limitedin discrete directions(e.g., left, right, up, anddown), the estimation performance is affected by the grid resolution of the environment and action candidate (i.e., movement step size). If the resolution of the grid is small, the estimation performance https://doi.org/10.1016/j.ast.2021.107276
1270-9638/©2021ElsevierMassonSAS.Allrightsreserved.
couldbeenhancedbutthesearchingtimeandcomputationalloads wouldincreasesignificantly.
Severalstudiesconsidering complexenvironmentsinSTEhave beenstudied in[19–28]. Risticetal.[21,22] assumethat theen- vironmentisrepresentedasagridmap andobstaclesare located atthe2-Dlatticepoints sothatonlypassablelinksareremained.
However,thisassumptionisfarfromreality.Theobstaclesareusu- ally located randomly as well as the lattice sizes mainly affect the performance of STEin real situations. If the one-step looka- headapproach,alsoknownasthereactiveplanningalgorithm[29], such asInfotaxis [7,9] is used in obstacle-rich environments, lo- cal optima in which the mobile agent is stuck at the corner of the obstacle occurs. Some studies suggest the obstacleavoidance mechanismto reduceunnecessary maneuver ofthe mobileagent [23,24].However,theseapproachesrequireaknownglobalmapof theenvironmentandarehighlyaffectedbytheobstacleconfigura- tionandgridresolution.
Withabovebackgroundsinmind,thispaperproposesamulti- step lookahead source search strategy which belongs to deliber- ative planning algorithms [29,30]); this approach generates the efficientpathinanobstacle-richandcontinuousdomainbyintro- ducing therapidly-exploringrandomtrees (RRT)[31] asthelocal pathplannerforsourcesearch. ThegeneratedtreefromRRTina recedinghorizon(RH)mannerisconsidered asfeasiblelong-term future paths for source search. Note that the proposed receding horizon RRTapproach still requires a localmap around the cur- rent positionof themobile agent; this could be readilyavailable usingcameraorLiDARsensors[32].Tosuccessfullydecidetheef- ficient search action forthe mobile agent, a newutility function is introduced. The utility function consists oftwo parts: i) max- imizing theentropy reduction for exploration andii)minimizing the path length to thesource location forexploitation. For max- imizing the entropy reduction which generally encourages more exploration,theexistingInfotaxisutilityfunctionwiththeparticle filterisadopted[7,33].Notethattheparticlefilterisutilizedtoes- timatetargetstatesinhighlynonlinearandnon-Gaussian systems [34–37]. As the proposed approach with the particle filter con- siders multi-step lookaheadstates, the computational load could become intractable. Weaddress thisissueby replacing theorigi- nal Gaussiansensor modelwiththebinary sensormodel,similar to[38].Forminimizingthepathlengthforsourcesearch,thepath length is calculated by summing up the length fromthe current state tothefinal futurestateofthefeasiblebranchfromtheRRT andthelengthfromthefinalfuturestatetothegoalposition,that is,theestimatedsourcelocation.
The maincontributionofthispaperis twofold.First, withthe RRTandreceding-horizonconcept,theproposedalgorithmgener- atesthenext bestactionofthemobileagentinacontinuous do- main whileavoidingobstacles. Second, theproposed utility func- tionresultsintheefficientpathintermsofsearchtime andsuc- cessratewhile providingaccurate estimationofthe sourceterm.
Inotherwords,itfacilitatesbettersearchbyautonomouslyfinding the right balance between exploration and exploitation. To vali- date theenhanced performance ofthe proposedstrategy, various numericalsimulationsareconductedandcomparedwiththecon- ventionalInfotaxis[7].
Therestofthepaperisorganizedasfollows.InSection 2,the gas dispersion model and the sensor model are explained. Sec- tion3explainstheestimationprocess basedontheparticlefilter and the existing Infotaxis algorithm which is a base of the pro- posedapproach.InSection4,theproposed receding-horizonRRT- InfotaxisapproachwithabriefdescriptionontheRRTispresented.
Thedetails ofsimulationsetup andresultsaregiveninSection5.
Lastly,conclusionsandfutureworkaresummarizedinSection6.
2. Gasdispersionandsensormodeling
In this section, mathematical formulations of the gas disper- sionmodelandthesensormodelareexplained.Inparticular,the Gaussianplumemodelisadoptedasthedispersionmodelandthe Gaussiannoisesensormodelisutilized.
2.1. Dispersionmodel
TheGaussiandispersionmodel[39,40] assumesthat thegasis dispersedfromthepointsourceinthedownwind,crosswind,and vertical directions where positive x and y directions are aligned withthedownwindandcrosswinddirections,respectively. Inthis model, the source is released continuously from the gas source originr0= [x0,y0,z0]T∈R3+ with therelease rateof Q0∈R+ and the mean gas concentration at the sensing location rk= [rx,k,ry,k,rz,k]T∈R3+attimestepkisformulatedas
R
(
rk|θ ) =
Q0 2π
Vσ
yσ
zexp
−
ck22
σ
y2×
exp
− (
rz,k−
z0)
22
σ
z2+
exp− (
rz,k+
z0)
2 2σ
z2,
(1)where thesource term, θ, indicates the parameters which isre- quiredtomodelthegas dispersionsituationsuch assourceloca- tion,windspeed,andgasreleaserate.ckisthecrosswinddistance ofthesensing location fromthesourceand V is themeanwind velocity.Thestandarddeviationsofconcentration
σ
yandσ
zinthe crosswindandverticaldirectionaredefinedasσ
y= ζ
1dk/
1
+
0.
0001dkandσ
z= ζ
2dk/
1
+
0.
0001dk (2) wheredk isthedownwinddistancefromthesource.ζ1 andζ2are thestochasticdiffusiontermsinthecrosswindandverticaldirec- tions,respectively.Inthisstudy,weassumedthemobileagentflies at11m abovethegroundandthegasstacksatthesameheight.Therefore,fromthefollowingsections,weonlyconsidera2-Den- vironmentwherethelocation ofthe mobileagent attime stepk isdefinedasrk= [rx,k,ry,k]T∈R2+.
2.2.Sensormodel
Sincetheactualgasconcentrationfollowsastochasticprocess, thesensormodelismodeledbytheGaussiandistribution[41].The probabilitydensityfunctionoftheGaussiansensormodelwiththe noisestandarddeviation
σ
g,kattimestepkisformulatedas:p
(
zk|θ ) =
N(
zk;
R(
rk|θ ), σ
g2,k) =
1σ
g,k√
2π
exp−
(
zk−
R(
rk|θ ))
2σ
g2,k(3) wherethemeasurementattime stepkisrepresentedaszk∈R+ andits meanvalueisobtainedbyEq. (1).Thestandard deviation ofthenoiseisexpressedas:
σ
g,k= α
R(
rk|θ ) + σ
env (4) wherethenoiseisproportionalto R(rk|θ )withthepositive coef- ficientα
andσ
env representstheadditionalsensing noisecaused bytheenvironmentsuchaswinddisturbanceandtemperature.ThegasdispersionwiththeGaussiandispersion modelformu- latedbyEq. (1) isshowninFig.1(a).Corresponding sensormea- surementsfromthegasdispersionaresparse andnoisyasshown inFig.1(b).Dueto thisaspect,just moving towards highermea- surementvalue graduallyforlocalizing thesource(known asthe
Fig. 1.The sample map of the gas dispersion model and sensor measurements.
gradient following method) does not work well in a real world;
hence,the information-theoreticmethodwhich willbe explained in thefollowing section isproposed to providemore robustper- formanceagainstsensornoisesanddisturbances.
3. Information-theoreticsourcesearch
The information-theoretic STE approach has an advantage for robustestimationeveninaturbulentenvironmentwherethegra- dient of source concentration is not continuous or smooth. Info- taxis is one ofthe information-theoretic source search strategies for source localizationor source term estimation usinga mobile agent [6–9]. The Infotaxis utilizes the entropy (i.e., uncertainty) to calculate the utility function forthe source terminformation.
By computing the reduction of the entropy for all possible next positions based on cardinal movement directions(in general, up, down,left, andright),theactionwiththemaximumreduction is selectedasthenextmovementstepofthemobileagent.Fromthe following,Infotaxiswiththe particlefilterisexplained mostlyby following [7] but withsome modificationsincludingthe useofa discretizedsensormodel.
3.1. Sourcetermestimation
The source term represents parameters required to model the gas dispersion such as source location, wind speed, release rateandstochastic diffusionterms. Amongthose parameters, the source location and the release rate are key factors to be esti- mated. Thus, inthis paper, the source term vector is defined as θ= [x0,y0,Q0]T = [r0,Q0]T∈R3+, the 2-D source location and release rate, while other parameters are assumed to be known.
Other parameterscanbeacquiredbythemeteorologicaldataand gas properties. In many studies on source term estimation, this assumptionisgenerallyaccepted[7–10].
The particlefilterbased onBayesian frameworkis utilizedfor estimating the source term since the source term distribution is generallyhighlynon-linearandnon-Gaussian.Theposteriorprob- abilitydensityfunction(PDF)oftheestimatedsourcetermattime stepk,θk,isrepresentedby:
p
(θ
k|
z1:k) =
p(
zk|θ
k)
p(θ
k|
z1:k−1)
p
(
zk|
z1:k−1)
(5)where p
(
zk|
z1:k−1) =
p
(
zk| θ
k)
p(θ
k|
z1:k−1)
dθ
k.
(6) TheposteriorPDFisobtainedbyusingthepriorPDF p(θk|z1:k−1), thelikelihood(i.e.,sensormodel)p(zk|θk),andthemarginallikeli- hood p(zk|z1:k−1).Notethatthesensingques(i.e.,measurements),collected by the mobile sensor located at rk, are expressed as z1:k= {z1(r1),z2(r2),· · ·,zk(rk)}.
ToobtaintheanalyticalsolutionofEq. (5) isimpossibledueto thenon-linearityofthesourceterm.Thus,theparticlefilterunder theSequentialMonteCarloframework[35] isutilized.Therandom samplescalledparticlesareemployedtoapproximatetheposterior PDFp(θk|z1:k).Withi-thestimatedstateofthesourcetermθki and its associated weight wik attime step k,Eq. (5) is approximated by:
p
(θ
k|
z1:k) ≈
N i=1wik
δ(θ − θ
ki)
(7)whereδ(·)indicates the Dirac deltafunction and N is the num- berofparticlesintheparticlefilterwhichrepresentsthepotential sourceterms.Theparticlefilteriterativelyupdatesitsweightswith Eq. (5). By assuming that the source term is time-invariant (i.e., stationarysource location andconstant releaserate), p(θk|z1:k−1) ismadeequaltothePDF p(θk−1|z1:k−1).Inother words,theprior distribution at time step k is represented by the posterior dis- tribution at time step k−1. Besides, asthe marginal likelihood p(zk|z1:k−1)isnotaffectedbytheparticles,thistermisconsidered as a constant normalization factor when updating the weights.
Hence,theunnormalizedi-thparticleisupdatedby:
¯
wik
=
p(
zk|θ
ki−1) ·
wki−1,
(8) wherethelikelihood p(zk|θki−1)can be obtainedbythe Gaussian dispersionmodel inEq. (1) andthe sensormodel inEq. (3). The normalizedweightiscomputedas:wik
=
w¯
ik Nj=1w
¯
kj.
(9)In this study, to sample the particles, the importance sam- pling (IS) is introduced [42]. It samples particles by following the importance (proposal) distribution. For the importance dis- tribution, the estimatedsource term distribution at the previous step,expressedby p(θk−1|z1:k−1),isadopted similarto other rel- evantstudies[7–9].Inaddition,toprevent adegeneracyproblem caused by IS, which indicates that only a small number of par- ticleshavenon-zeroparticle weights,we adoptedthe resampling method [36]. The effectivenumber of samples, Ne f f ≈ N 1
i=1(wik)2 arecomputedateveryiteration,andwhenNe f f goesbelowacer- tainvalue, theparticles areresampled[43].Inaddition,since we assumedthatthepriordistributionattimestepkisrepresentedby theposteriordistributionattimestepk−1 asmentionedabove,it mightleadtoalackofparticlesdiversity;hence,theMarkovchain Monte Carlo(MCMC) move step is additionally conductedto in- creasethediversityoftheparticlesaftertheresamplingstep[44].
3.2. Infotaxis
Infotaxis is one of the information-theoretic strategies for source term estimation. It locally maximizes the expected infor- mation gain which indicates the reduction in the entropy (i.e., uncertainty) of theestimated source termdistribution. It utilizes theinformation-theoreticutilityfunctionformulatedby:
I
(
uk) =
p(
rk+1)
Hk− (
1−
p(
rk+1))(
E[ ˆ
Hk+1(ˆ
zk+1)] −
Hk)
(10)where Hk
= −
p
(θ
k|
z1:k)
logp(θ
k|
z1:k)
dθ
k.
(11) TheShannon’sentropy,Eq. (11),isadoptedtocompute theutility function.Theprobability p(rk+1)representstheprobabilitywhere theactualsourcelocatedatthenextsensingpointrk+1(=rk+uk).Themaneuverofthemobileagentisdefinedinadiscretedomain where its admissible actionset is definedas UI= {↑,↓,←,→}. Theprobability p(rk+1)ismeaningfulonlywhentheactualsource isexactlylocatedatrk+1 andcomputingthistermrequiressignifi- cantcomputationalburden[7].Hence,asimplifiedutilityfunction termedasInfotaxisIIignoringthefirstterminEq. (10) isutilized, givenas:
I
(
uk) =
Hk−
E[ ˆ
Hk+1(ˆ
zk+1)].
(12)Accordingto Eqs. (7)∼(9), theposteriorPDF, p(θk|z1:k),can be replaced bytheweights oftheparticlefilter.Thus,theShannon’s entropyisable tobeapproximatedusingthenormalizedweights inEq. (9) as:
Hk
≈ −
N i=1wiklogwik
.
(13)The second term in Eq. (12) indicates the expected Shannon’s entropy with the future measurement ˆzk+1 at the next sensing position rk+1. Thus, the expected entropy at time step k+1 is formulatedas:
E
[ ˆ
Hk+1] =
p
(ˆ
zk+1| θ
k)
Hˆ
k+1(ˆ
zk+1)
dˆ
zk+1 (14)where
H
ˆ
k+1(ˆ
zk+1) = −
p
( θ ˆ
k+1|
z1:k, ˆ
zk+1)
·
logp( θ ˆ
k+1|
z1:k, ˆ
zk+1)
dθ ˆ
k+1.
(15)
ThePDFp(θˆk+1|z1:k,ˆzk+1)inEq. (5) isabletoberepresentedwith theparticleweights.As doneinEq. (8),theunnormalized weight ofthe potential sourcetermwiththefuture measurementisup- datedas:
ˆ¯
wik+1
=
p(ˆ
zk+1|θ
ki) ·
wik (16)andtheupdated weight,wˆ¯ik+1,isnormalizedto wˆik+1,asdonein Eq. (9).Then,theexpectationoftheentropyattimestepk+1 can beexpressedas:
E
[ ˆ
Hk+1] = −
p
(ˆ
zk+1|θ
k)
Ni=1
ˆ
wik+1logw
ˆ
ki+1dˆ
zk+1.
(17)BysubstitutingEq. (13) andEq. (17) into Eq. (12), theInfotaxisII utilityfunctionisexpressedas:
I
(
uk) = −
N i=1wiklogwik
+
p
(ˆ
zk+1| θ
k)
Ni=1
(
wˆ
ki+1)
log(
wˆ
ki+1)
dˆ
zk+1.
(18)However,itisdifficultto analyticallysolvetheabove equation asitneedstobeintegratedwithallpossiblefuturemeasurements ˆ
zk+1 wheretheparticleweightwˆik+1 isalsoafunctionofˆzk+1,as represented in Eq. (16). Toaddress thisissue, we first discretize thefuturemeasurementwithacertainintervalδdˆ as:
d
ˆ
k+1= [ˆ
dk(min+1),
dˆ
(kmin+1)+δ
dˆ
k+1, ...,
dˆ
k(max+1)] = [ˆ
dk(1+)1,
dˆ
k(2+)1, ...,
dˆ
(kd+max1)]
(19) wheretheminimumandmaximumvalueinfuturemeasurements, dˆ(kmin+1) and dˆ(kmax+1) are obtained by following the empirical three- sigmaruleasμ
k+1±3·σ
g,k+1 whereμ
k+1=Ni=1R(rk|θki)·wik indicatestheexpectedmeanconcentration derivedbyEq. (3) and
σ
g,k+1=α
·μ
k+1+σ
env.Theutilityfunctionwithdiscretized mea- surementsisthenexpressedas:I
(
uk) = −
N i=1wiklogwik
+
dmax
j=1
p
(
dˆ
jk+1
|θ
k) ·
N l=1(
wˆ
lk+1)
log(
wˆ
lk+1).
(20) Note that the updated and normalized weight wˆlk+1 is now af- fected by dˆj
k+1 rather than ˆzk+1 in Eq. (16). Besides, the mea- surement likelihood with the discretized measurement set from theGaussiansensormodel, p(dˆkj+1|θk),canbeexpressedwiththe cumulativedistributionfunction(CDF)ofthestandardnormaldis- tribution,(·)[45],asgiven:
p
(
dˆ
jk+1
|θ
k) =
N i=1d
ˆ
kj+1+ δ
dˆ σ
gi,k+1−
dˆ
kj+1σ
gi,k+1wik (21)
where
d
ˆ
kj+1= ˆ
dkj+1−
R(
rk+1| θ
ki),
σ
gi,k+1= α
R(
rk+1|θ
ki) + σ
env.
(22)Finally,the next bestaction, u∗k,among theaction candidate set, UI= {↑,↓,←,→},ischosenas:
u∗k
=
arg maxuk∈UI
I
(
uk).
(23)The conventional Infotaxis is basically a one-step greedy ap- proachand considers next movements incardinal directions(i.e.
neighboring up, down, left and right grid points) only, which is highly likely to lead the mobile agent to fall into the local op- timainobstacle-richenvironments. Whentheobstacles blockthe next step of the mobile agent, possible maneuvers are reduced and it could be stuck at the corner or boundary of the obsta- cle. Furthermore,the utility function of Infotaxis tends to guide themobileagent tobebiasedtowardexplorationratherthanex- ploitation[33]. Therefore,directlyusingthe entropyforthe Info- taxisutilityfunctionmightnotbeefficientfromtheperspectiveof thesearch time.Thus,forefficientsourcesearch forobstacle-rich environments, long-term(i.e. multi-step ahead) planning using a continuousactionspaceisneeded.
4. Rapidly-exploringrandomtree-Infotaxis
To resolve the aforementioned problem of conventional info- taxis,therapidly-exploringrandomtree(RRT),thesampling-based deliberativepathplanningalgorithm[29],isincorporatedwithIn- fotaxistopredicttherewardfromlong-termfuturemeasurements inanobstacle-richdomainwithanewly-designedutilityfunction.
The advantage of introducing RRTis not only obstacle avoidance butalsogeneratingvariousfeasiblepathsintheobstacle-filleddo- main. Generated RRTtree ina recedinghorizonmanner (similar to the concept of model predictive control [46]) is used as the long-term action candidate set of the mobile agent. It helps the mobileagentnot tofallintothelocaloptima.Fromthefollowing, theprinciple oftheRRTisbrieflyexplained,andthenthe overall processofRRT-Infotaxisisdescribed.
4.1. Rapidly-exploringrandomtree(RRT)
The rapidly-exploring random tree (RRT) is a sampling-based pathplanningalgorithm[31]. Itisdesignedto searchnon-convex and high-dimensional spaces efficiently. By randomly generating samplesinacontinuousspace,itinherentlyfillsthespacetowards tounexploredareasuniformly.Letthegivenspacebedenotedbya setZ⊂R2asweconsidera2-Dconfigurationspace.Theareaoc- cupiedwithobstaclesisrepresentedbyZobsandtheobstacle-free area is denoted asZf ree.The RRTconstructs a treeby sampling random nodes in Zf ree. From the starting point qinit, the tree gradually expands and the process ends when the tree expands sufficientlynearthegoalpoint,qgoal.Duringeach iterationofthe algorithm, therandomsample node qrand issampledin Zandif itliesinZf ree,theclosestsamplenodeqnearest inthetreeT from qrand is selected.Ifqrand is accessibleto qnearest anddistancebe- tween them is less than the predefined movement distance , qrand is considered asthe newnode qne w and addedto the tree T.Ifthedistanceislongerthan ,anewsample fromqnearest to qrand atthedistance isconsidered asqne w.The processoftree expansionisdescribedinFig.2.
TherepresentativecharacteristicsoftheRRTcanbecategorized twofold [47]. First, it generates the safe path avoiding obstacles.
Second, it makes random samples in a continuous domain with simple implementationso that thetree tends toexpand towards the unexplored area. By adopting these properties,the proposed RRT-Infotaxisstrategyisabletochooseefficientactioncandidates ofthemobileagent.
4.2. SearchprocessofRRT-Infotaxis
TheproposedapproachusesRRTasalocalpathplanner.Unlike the RRT, in the source term estimation problem, the goal point which can be regarded as a true source origin is not known in advance.Furthermore,inpractice,theentire(i.e.,global)map in- formation is not given. Thus, planning the global path towards the goal directly is impossible. However, as the estimation pro- cessproceeds,theestimatedsourcelocationisclosertotheactual sourcelocationandwitharangesensorsuchasLiDAR,theobsta- cles nearby can be detected. Thus, by planning thelocal path at eachiterationofthesearchprocessinarecedinghorizonmanner, themobileagentcanselecttheefficientpathtothesourceorigin whileavoidingobstacles.
The overall processof theRRT-Infotaxis isdescribed in Fig.3.
ThetreeisgeneratedwithNtnnumberofRRTnodesinthelimited regionwitharadiusof Rrange fromthecurrentposition,asshown in Fig. 3 (a). After that, the tree pruning process is conducted.
As weconsiderm recedinghorizonsteps,onlythebrancheshav- ing m number of nodes are considered, as illustrated in Fig. 3 (b). Note that, m is three in this figure. The pruned branches,
Fig. 2.Tree expansion of the RRT.
V = [V1,V2,...,VNb] where Nb is the total number of pruned branch,areconsideredasthepossiblefuturerecedinghorizon(i.e., local) path of the mobile agent. Each branch is formulated by a nodesetVb=
vb,1,vb,2, ...,vb,m
wheretheelementinthesetin- dicatesthe 2-D location inthe search area. The elements of the branchsetareusedastheactioncandidates(i.e.,movementdirec- tions)ofthemobileagent.Thereafter, theutility functionofeach branchis computedandthe best rewarded branchis selected as shown in Fig. 3 (c). Detailed explanation on the utility function willbedescribedinthefollowingsection.Amongthenodesinthe selectedbranch, themobile agent movesto thefirst node ofthe selected branch as illustrated in Fig. 3 (d). The RRT-Infotaxis al- gorithmrepeatedly removestheoldtreeandcreates newtreeby centering thecurrent positionof themobile agent at every time step.
4.3.UtilityfunctionofRRT-Infotaxis
Forefficientsource search,balancingexplorationandexploita- tion is of primary importance [48]. As explained in Section 3.2, the Infotaxis utility function only utilizes the entropy reduction and it is known to be biased towards exploration [33]. Hence, by introducing a new term in the utility function, the proposed RRT-Infotaxisfacilitatesbothexplorationandexploitationproperly inobstacle-rich environments.Inotherwords,combinedwiththe information-theoretic utility function and the cost function from the receding-horizon RRT, the mobile agent balances exploration (i.e.,lookingfortheinformativesensingques)andexploitation(i.e., movingtowardsthesourceorigin).Theproposednewutilityfunc- tioniscomprisedoftwoparts:maximizingtheentropyreduction andminimizingthesearchpath.
Fig. 3.The example process of selecting the next best action of RRT-Infotaxis whenmis 3.
4.3.1. Maximizingentropyreduction
To maximize the entropyreduction, the Infotaxisutility func- tion,Eq. (20) explained intheprevioussection,isused.However, to reduce the computational burden considering multiple reced- inghorizonsteps,thediscretizedGaussiansensormodel(Eq. (21)) is replacedwiththe binary sensormodel.It hasonlytwo sensor measurement values:1 (when detected)and 0(notdetected), so thefuturebinarymeasurementisdenotedasbˆ∈ [0,1].Theprob- abilityofthemeasurementattimestepk+nconsideringmsteps (n≤m)isthenrepresentedas:
p
(
bˆ
k+n|θ
k) =
β
ifbˆ
k+n=
01
− β
ifbˆ
k+n=
1.
(24)Theprobability,β,canbeformulatedas:
β =
N i=1¯
cki+nσ
gi,k+nwki
(25)
where
¯
cik+n= ¯
ck−
R(
rk+n|θ
ki),
σ
gi,k+n= α
R(
rk+n|θ
ki) + σ
env.
(26)Toefficiently updatethe thresholdc¯k whichdecides whetherthe measurementisdetectedornot,itisdesignedtobechangedadap- tivelywithrespecttothecurrentsensormeasurement,ck,as:
¯
ck=
⎧ ⎪
⎨
⎪ ⎩
ack
+ (
1−
a)¯
ck−1 ifk>
1,
ck>
c¯
k−1¯
ck−1 ifk
>
1,
ck≤ ¯
ck−1ck ifk
=
1,
(27)
wherea issetas0.5inthispapersimilar to[49,50].The thresh- old increases only when the new measurement is greater than the currentthreshold so that the mobile agent gradually moves toget ahigherconcentration.Updating thethresholdinthisway canfacilitatemoreexploitationsincetheagentcanobtainalarger measurementvalueasitgetsclosertothesource.
Forupdatingtheprobabilityofthefuturemeasurementsinre- cedinghorizonsteps,weutilizethecurrentweightoftheparticle filterattimestepk.Sequentiallypredictingtheweightsandusing themforthenextstepmightbepossible[51],butitrequiresasig- nificantcomputationalload.Furthermore,asupdatingweights for futuresteps, the reliability ofthe predictiondecreases. Thus, the unnormalizedupdatedweightatnrecedinghorizonstepisrepre- sentedby:
ˆ
wik+n
=
p(
bˆ
k+n|θ
ki) ·
wik.
(28) Finally,theutilityfunctionforeachnodeinabranchVb usingthe binarysensormodelcanberepresentedas:I
(
vb,n) = −
N i=1wkilogwik
+
1 bˆk+n=0p
(
bˆ
k+n|θ
k)
·
N l=1(
wˆ
lk+n)
log(
wˆ
lk+n).
(29) Notethat,computingthisutilityfunctionismuchlighterthanthat ofEq. (20) owingtothebinarymodel.Allthenodesfromthefirst tothemthnodeineachbranch,whicharetheactioncandidatesfor therecedinghorizonapproach,arecomputedbytheEq. (29) and accumulated. However, asthe closest node has the mostreliable information,adiscountfactorγ
isappliedwhenaccumulatingthe valueoftheutilityfunction.Thus,theutilityfunctionofmaximiz- ingtheentropyreductioninRRT-Infotaxisisformulatedby:J1
(
Vb) =
m n=1γ
(n−1)I(
vb,n).
(30)4.3.2. Minimizingsearchpath
To improve the performance in terms of reducing the search time or equivalently path length, the additional cost function is introduced into the RRT-Infotaxis utility function. This new cost functionoftheRRTcalculatesthepathlengthfromtheinitialpo- sition to the currentposition of the mobile agent and adds the
Euclidean distance between the current position and goal point motivated fromtheA* cost function [47,52]. Asthe exact source location is notknown inadvance, theestimatedsource originrˆ0 fromtheparticlefilterisconsideredasthegoalpointanditisup- datedateachsearchandestimationiteration.Asthesourcesearch process proceeds, the estimatedlocation becomesmoreaccurate.
Itimpliesthatasthemobileagentcollectsmoreinformativemea- surements, the mobile agent tends to move towards the correct source location. To thisend, the cost function ofminimizing the searchpathisformulatedas:
J2
(
Vb) =
m n=1( |v
b,n− v
b,n−1| ) + |ˆ r
0− v
b,m|
(31)where b∈ {1,2, ...,Nb}and vb,0 indicates the currentlocation of themobileagent.Intheaboveequation,thelefttermoftheright- handsideindicatesthatthesumofthepathlengthsfromcurrent tomstepfuturelocationofthemobileagent,andrighttermrepre- sentstheEuclideandistancebetweenthefinalfuturesteplocation andthegoalpoint(i.e.,estimatedsourceoriginˆr0).
Bycombiningabove twoutility functions,the proposedutility functionattimestepkfortheRRT-Infotaxisisexpressedas:
Jk
(
Vb) =
J1(
Vb) − (
1− )
J2(
Vb)
(32) where isthepositiveweight.Inthisstudy, ischosenas0.5to make thetwo functionscontribute equally. Notethat,asthecost function ofthe RRTissupposed to minimize thecost, theutility function, J2(Vb),hasaminussigninthefinal utilityfunction, Jk. Thebestbranchisobtainedas:V∗b
=
argmaxVb∈V J
(
Vb).
(33)Thefirstnode v∗b,1 fromthebestbranchVb∗ ischosenasthenext maneuverofthemobileagent.
Notethat,atthebeginningoftheestimationprocess,theesti- matedsourcelocationcouldbefarfromthetruevalue,andmov- ing towardsthewrongsourceorigincouldbe badforexploration initially. However, at early stages,the moving directiondoesnot meanmuchastherewouldbenomeaningfulinformationaround the mobile sensor anyway and the Infotaxisutility function (i.e., J1)makessurethattheagentexplorestheenvironmentwell.Itis reportedthat theInfotaxisutility functionhasagoodexploration propertybutlacks anexploitationproperty[33].Thus,theheuris- ticterm J2 isintroducedforbetterexploitation. Oncethe mobile agentgatherssufficientmeasurements,theutilityfunction J1does not changemuch.Thus,afterestimationofthe sourceismoreor lessconverged,the utility function J2 wouldplay amain role to lead the mobile agent towards the source, facilitating better ex- ploitation.Insummary,bytheheuristicterm J2,wemightlosea littlebitofperformanceintermsofexplorationbutgainalotmore in terms of exploitation, which results in right balance between exploration and exploitation and much better search estimation performance. This willbe supported by numericalsimulation re- sultslater.
During the source term estimation process, the mobile agent is assumedtostay atone location fora few secondsto obtaina reliable gassensormeasurement beforemoving tothe nextsam- plingpoint.Notethat, ingeneral, theresponseandrecoverytime ofthegassensorrangesfromafewsecondstominutes[53].This is whywe considered the discrete action strategy moving to the next samplingpoint at each time step witha constant speed in thisstudy.ThisisacommonpracticeintheInfotaxis-basedsource search studies,supported byexperimental validation[17,41]. Tra- jectory (or velocity) planning considering the detailed dynamics
Algorithm1RRT-Infotaxis.
1: T.init(qinit)
2:fork=1,2,. . . ,kmaxdo
3: zk←Readanewsensormeasurement
4:
(θk−1,wk−1)→(θk,wk)
usingEq. (9) 5: qne w←qinit= [rx,k,ry,k]
6: whileNseed≤Ntndo 7: qrand←Sample(qrand)
8: qnearest←FindNearestNode(qrand,T) 9: qne w←Steer(qnearest,qrand) 10: ifObstacle Freethen 11: T.add_node(qne w) 12: T.add_vertex(qne w,qnearest) 13: Nseed=Nseed+1 14: V=
V1, ...,VNb
←TreePrune(T) 15: for allVb= [vb,1,...vb,m]∈V do 16: for n=1,2,...,mdo
17:
(θk,wk)→(ˆθk+n,wk+n)
usingEq. (28)
18: [ˆbk+1,...,bˆk+m]←Collectbinarysensormeasurements 19: JkcomputationwithEq. (32)
20: V∗b=argmaxVb∈V Jk(Vb)← Next best maneuvers for receding horizon timesteps
21: ClearT
22: ifσp<σtthen←estimationconverged 23: break;
24: rk+1= [rx,(k+1),ry,(k+1)]= [rx,k,ry,k]+v∗b,1←movetothenextsampling positionrk+1usingthefirstnodeofVb∗
andnonholonomic constraints of the mobile agent with a high- performance gassensor (providing a fastresponse time) remains asfuturework.
TheentireprocessoftheRRT-InfotaxisissummarizedinAlgo- rithm1.ThedetailofthefunctionsinAlgorithm1isdescribedas follows.
• Sample: This function generates the random position qrand whichisincludedintheregionwiththeradiusof Rrange cen- teringthecurrentlocationofthemobileagent.
• FindNearestNode: This function finds the nearest node from othernodesinRRTtreeT toqrand.
• Steer:Thisfunctiongeneratesqne w alongthepathfromqnearest towardsqrandatadistance where isthefixedincremental distance.
• TreePrune:Thisfunctionreturnsallsetofbranches.
5. Numericalsimulations
Tovalidate theperformance oftheproposed RRT-Infotaxis,we performedMonteCarlosimulations invarious environments. The Gaussianplumemodelandbinarysensormodelareutilized.
5.1. Actualgasdispersioninanenvironmentwithobstacles
Forthesimulations,gasdispersion situationsneedtobemod- eled.Todescribeitrealistically,theGrazLagrangian(GRAL)model, whichisdevelopedbythe GrazUniversityofTechnology,Austria, is utilized for simulations, which computes flows around obsta- cles [54]. It is the revised Lagrangian particle model where the Lagrangian particle model produces gas particles fromthe diffu- sion source by considering not only the gas properties but also thestatisticalenvironment.Thismodelassumesthatatmospheric diffusion is able to be modeled by a Markov chain process and representsportionsofeachparticlebycalculating3-Dwindveloc- ity[55]. The GRALmodel is adopted asthe dispersion modelin the environment with obstacles as it is suitable forfast and ro- bustmodelinginalargearea[54].Thesamplemapisdescribedin Fig.4andtheparametersusedinthismapareasfollows.
Fig. 4.The satellite view of the sample map and gas dispersion map with obstacles using the GRAL model.
•Truesource term: Q0=2 kg/h=0.56 g/h,andr0= [197 m, 235 m];
•Search area: A=255 m×267 m, wind velocity V =2 m/s anddirectionφ=240◦,horizontalstandarddeviation
σ
y=20, vertical standarddeviationσ
z=10, andstackingheight z0= 11 m;•Searchagent: Movementstepsized=9 m,constantflightal- titude at 11 m above the groundwhich is the same as the gasstacking height,andtheradius oftheregionforgenerat- ingRRTsamplenodes,Rrange,isdefinedasRH-step×dwhere RH-steprepresentsrecedinghorizonsteps;
•Estimationcondition:Numberofparticlesfortheparticlefilter N=3,000.Thestandarddeviationofthesensornoiseissetto
σ
g=R(r|θ )+10;and•Terminal conditions: standard deviation of the particle filter
σ
t=2,andtheestimationsuccessthresholdds=2 m.5.2. Theeffectrecedinghorizonstepsandthenumberofnodes WhenimplementingRRT-Infotaxis,we needtosamplerandom nodes to generate RRTtrees. However, takingtoo many samples would notincrease the efficiencyforsource search further while increasing computationalloads.Besides,forapplyingthereceding horizonapproach,thepropernumberoffuturesteps needstobe investigated.Therefore,inorderto findthepropernumberofthe RRTsamplenodesNtnandrecedinghorizonstep(RH-step)m,cor- responding numerical simulations are first conducted; thesetwo parameters are key factors which affect the performance of the algorithm. Withchanging thenumber ofRRTsample nodes, Ntn, the meansearch time (MST) ofRRT-Infotaxis withdifferentm is comparedforthesimulationsusingthesample mapenvironment explainedintheprevioussectioninFig.5.Inthefigure,resultsare averaged over 100MonteCarlosimulations andRH-steps from2 to5are denotedasRH2 toRH5.As Ntn increasesto30,allcases showthedecreasingtrendoftheMST.However,theMSTdoesnot getbetterover30samplenodes.Whentherearetoomanysample nodesinacertainregion,theutility functionvaluedifferencebe- tween differentbranchesmightbeinsignificant.Besides,selecting the best action candidate does not always provide the best per- formance duetothe differencebetweenthe approximatedutility function andrealsituation. From thisresult, RH3 withNtn of30 can be seen asthe proper values. Althoughthe MSTs of RH4 or RH5with40nodesarealmostthesameasthatofRH3,thelarger receding horizonsteps or sample nodes would mean the higher
Fig. 5.The MST of RRT-Infotaxis with different RRT nodes and RH-steps.
computationalload.Thus,allofthefollowingsimulationshereafter areconductedwithmof3andNtnof30.
Itisworthwhilenotingthat,onesearchtime step(incomput- ingMST)includesthetimetakenforcollectingthereliablesensor reading at one position, computation time to calculate the next best samplingposition andtime to reach thenext samplingpo- sition with a constant speed. In this regard, let us compare the MST betweenInfotaxis andthe RRT-Infotaxis. First, the averaged computationtime perstepforcalculatingthebestpositioninIn- fotaxis (one-step lookahead) and RRT-Infotaxis algorithms (RH3) is 0.0024 and 0.0535 seconds in the MATLABenvironment on a desktop with an Intel(R) Core(TM) i7-7707HQ CPU @ 2.80 GHz andNVIDIA GeForce GTX 1060, respectively. On the other hand, themovementdistancebetweenconsecutivesamplingpositionsis fixedintheoriginalInfotaxis(i.e.,movementstepsized=9 mas definedearlier)whilethat oftheRRT-Infotaxisisequalorslightly smaller than d=9 m; this is because the RRT algorithm adds new random nodes (sampling position candidates) to the exist- ing tree within the fixed threshold distance ofd=9 m. In fact, theaverageddistancebetweensamplingpositionsofRRT-Infotaxis is8.3 m.Thus, considering bothtime loss fromcomputationand gain from moving a shorter movement step size between sam- plingpointsofRRT-Infotaxis,wecouldconsiderthatthesameMST forRRT-Infotaxis(RH3)andInfotaxisrepresentsapproximatelythe sametimetakenforthesourcesearchandestimationprocess.