Web application testing: A systematic literature review

(1)

TheJournalofSystemsandSoftware91(2014)174–201

ContentslistsavailableatScienceDirect

The Journal of Systems and Software

jo u rn al h om ep a g e :w w w . e l s e v i e r . c o m / l o c a t e / j s s

Web application testing: A systematic literature review

Serdar Do˘gan

^a,b

, Aysu Betin-Can

^a

, Vahid Garousi

^a,c,d,∗

aDepartmentofInformationSystems,InformaticsInstitute,MiddleEastTechnicalUniversity,Ankara,Turkey

bASELSANA.S.,Ankara,Turkey

cSoftwareQualityEngineeringResearchGroup(SoftQual),DepartmentofElectricalandComputerEngineering,SchulichSchoolofEngineering,University ofCalgary,Calgary,Alberta,Canada

dDepartmentofSoftwareEngineering,AtilimUniversity,Ankara,Turkey

a r t i c l e i n f o

Articlehistory:

Received26August2013 Receivedinrevisedform 16November2013 Accepted5January2014 Availableonline21January2014 Keywords:

Systematicliteraturereview Webapplication

Testing

a b s t r a c t

Context:Thewebhashadasigniﬁcantimpactonallaspectsofoursociety.Asoursocietyreliesmoreand moreontheweb,thedependabilityofwebapplicationshasbecomeincreasinglyimportant.Tomake theseapplicationsmoredependable,forthepastdecaderesearchershaveproposedvarioustechniques fortestingweb-basedsoftwareapplications.Ourliteraturesearchforrelatedstudiesretrieved193papers intheareaofwebapplicationtesting,whichhaveappearedbetween2000and2013.

Objective:Asthisresearchareamaturesandthenumberofrelatedpapersincreases,itisimportantto systematicallyidentify,analyze,andclassifythepublicationsandprovideanoverviewofthetrendsand empiricalevidenceinthisspecializedﬁeld.

Methods:Wesystematicallyreviewthebodyofknowledgerelatedtofunctionaltestingofwebapplication throughasystematicliteraturereview(SLR)study.ThisSLRisafollow-upandcomplimentarystudytoa recentsystematicmapping(SM)studythatweconductedinthisarea.Aspartofthisstudy,weposethree setsofresearchquestions,deﬁneselectionandexclusioncriteria,andsynthesizetheempiricalevidence inthisarea.

Results:Ourpoolofstudiesincludesasetof95papers(fromthe193retrievedpapers)publishedin theareaofwebapplicationtestingbetween2000and2013.ThedataextractedduringourSLRstudyis availablethroughapublicly-accessibleonlinerepository.Amongourresultsarethefollowings:(1)the listoftesttoolsinthisareaandtheircapabilities,(2)thetypesoftestmodelsandfaultmodelsproposed inthisdomain,(3)thewaytheempiricalstudiesinthisareahavebeendesignedandreported,and(4) thestateofempiricalevidenceandindustrialrelevance.

Conclusion:Wediscusstheemergingtrendsinwebapplicationtesting,anddiscusstheimplicationsfor researchersandpractitionersinthisarea.TheresultsofourSLRcanhelpresearcherstoobtainanoverview ofexistingwebapplicationtestingapproaches,faultmodels,tools,metricsandempiricalevidence,and subsequentlyidentifyareasintheﬁeldthatrequiremoreattentionfromtheresearchcommunity.

1. Introduction

Thewebhashadasigniﬁcantimpactonallaspectsofoursoci- ety,frombusiness,education,government,entertainmentsectors, industry,toourpersonallives.Themainadvantagesofadopting thewebfordevelopingsoftwareproductsinclude(1)noinstalla- tioncosts,(2)automaticupgradewithnewfeaturesforallusers,

∗ Correspondingauthorat:SoftwareQualityEngineeringResearchGroup(Soft- Qual),DepartmentofElectrical andComputerEngineering,SchulichSchoolof Engineering,UniversityofCalgary,Calgary,Alberta,Canada.Tel.:+14032105412.

E-mail addresses: [email protected], [email protected] (S. Do˘gan), [email protected] (A. Betin-Can), [email protected], [email protected](V.Garousi).

(3)universalaccessfromanymachineconnectedtotheInternet and(4)beingindependentoftheoperatingsystemofclients.

Onthedownside,theuseofserverandbrowsertechnologies makewebapplicationsparticularlyerror-proneandchallengingto test,causingseriousdependabilitythreats.A2003studyconducted by theBusiness Internet Group SanFrancisco (BIG-SF) (BIG-SF, 2003)reportedthatapproximately70%ofwebsitesandwebappli- cationscontaindefects.Inadditiontoﬁnancialcosts,defectsinweb applicationsresultinlossofrevenueandcredibility.

Thedifﬁcultyintestingwebapplicationsismany-fold.First,web applicationsaredistributed througha client/serverarchitecture, with (asynchronous) HTTP request/response calls to synchro- nize theapplication state.Second, theyare heterogeneous,i.e., webapplicationsaredevelopedusingdifferentprogramminglan- guages,forinstance,HTML,CSS,JavaScriptontheclient-sideand 0164-1212/$–seefrontmatter©2014ElsevierInc.Allrightsreserved.

http://dx.doi.org/10.1016/j.jss.2014.01.010

(2)

PHP,Ruby,Javaontheserver-side.Andthird,webapplicationshave adynamicnature;inmanyscenariostheyalsopossessnondeter- ministiccharacteristics.

During the past decade, researchers in increasing numbers, haveproposeddifferenttechniquesforanalyzingandtestingthese dynamic, fast evolving software systems. As the research area maturesandthenumberofrelatedpapersincreases,itisimportant tosystematicallyidentify,analyzeandclassifythestate-of-the-art andprovideanoverviewofthetrendsinthisspecializedﬁeld.In thispaper,wepresentasystematicliteraturereview(SLR)ofthe webapplicationtesting(WAT)researchdomain.

In a recentwork, weconducted a systematicmapping (SM) study(Garousietal.,2013)inwhichwereviewed79papersinthe WATdomain.ThecurrentSLRisafollow-upcomplementarystudy afterourSMstudy.WecontinueinthisSLRthesecondarystudy thatwestartedinourSMbyfocusingindepthintotheempirical andevidence-baseaspectsoftheWATdomain.TheSMandtheSLR studieshavebeenconductedbypayingcloseattentiontomajordif- ferencesbetweenthesetwotypesofsecondarystudies,e.g.,refer totheguidelinebyKitchenhamandCharters(2007).

Tothebestofourknowledge,thispaperistheﬁrstSLRinthearea ofWAT.Theremainderofthispaperisoutlinedasfollows.Areview oftherelatedworkispresentedinSection2.Section3explainsour researchmethodologyandresearchquestions.Section4presents theresultsoftheSLR.Section5discussesthemainﬁndings,trends andthevaliditythreats.Finally,Section6concludesthepaper.

2. Backgroundandrelatedwork

Weclassifyrelatedworkintothreecategories:(1)secondary studiesthat havebeenreportedinthebroaderareaofsoftware testing, (2) onlinerepositories in softwareengineering, and (3) secondarystudiesfocusingonwebapplicationtesting.

2.1. Secondarystudiesinsoftwaretesting

Wewereabletoﬁnd24secondarystudies(Garousietal.,2013;

Afzaletal.,2008;Palaciosetal.,2011;Barmietal.,2011;Netoetal., 2011a,b;EngströmandRuneson,2011;Netoetal.,2012;Afzaletal.,

2009;Zakariaetal.,2009;EndoandSimao,2010;Alietal.,2010;

Engströmetal.,2010;Binder,1996;Juristoetal.,2004;McMinn, 2004;Grindaletal.,2005;CanforaandPenta,2008;P˘as˘areanuand Visser,2009;Bozkurtetal.,2010;JiaandHarman,2011;Memon andNguyen,2010;Hellmannetal.,2010;Banerjeeetal.,2013) reported,asofthiswriting,indifferentareasofsoftwaretesting.We listandcategorizethesestudiesinTable1alongwiththeirstudy areas.Basedonthe“year”column,weobservethatmoreandmore SMsandSLRshaverecentlystartedtoappearintheareaofsoftware testing.Asperourliteraturesearch,wewereabletoﬁndeightSMs andﬁveSLRsin thearea, asshowninthetable. Theremaining 11studiesare“surveys”,“taxonomies”,“literaturereviews”,and

“analysisand survey”,termsusedbytheauthorsthemselvesto describetheirsecondarystudies.Thenumberofprimarystudies studiedineachstudyinTable1variesfrom6(Hellmannetal., 2010)to264(JiaandHarman,2011).

OurrecentSMstudy(Garousietal.,2013)reviewed79papersin theWATdomainandisconsideredtheﬁrststep(phase)ofthecur- rentSLR.Themapping(scoping)thatweconductedintheSMstudy enabledustoclassifythepapersandidentifyempiricalstudies.We continueinthisSLRthesecondarystudythatwestartedinourSM byfocusingindepthintotheempiricalandevidence-baseaspects oftheWATdomain.Notethatasdiscussedbyotherresearchers suchasPetersenetal.(2008)andKitchenhamandCharters(2007), thegoalandscopeofSMandSLRstudiesarequitedifferentandwe followthosedistinctionsbetweenourpreviousSM(Garousietal., 2013)andthisSLR.

2.2. Onlinepaperrepositoriesinsoftwareengineering

A few recentsecondary studies have reportedonline repos- itories to supplement their study with the actual data. These repositoriesaretheby-productsofSMorSLRstudiesandwillbe usefultopractitionersandresearchersbyproviding asummary of alltheworks ina given area. Mostof theserepositoriesare maintainedandupdatedregularly.Forinstance,Harmanetal.have developedandsharedtwoonlinepaperrepositories:oneinthe areaofmutationtesting(JiaandHarman,2013),andanotherinthe areaofsearch-basedsoftwareengineering(SBSE)(Zhang,2013).In Table1

22Secondarystudiesinsoftwaretesting.

Typeofsecondarystudy Secondarystudyarea Numberofprimarystudies Year Reference

SM Non-functionalsearch-basedtesting 35 2008 Afzaletal.(2008)

SOAtesting 33 2011 Palaciosetal.(2011)

Testingusingrequirementsspeciﬁcation 35 2011 Barmietal.(2011)

Productlinestesting 45 2011 Netoetal.(2011a)

Productlinestesting 64 2011 EngströmandRuneson(2011)

Productlinestestingtools 33 2012 Netoetal.(2012)

Webapplicationtesting 79 2013 Garousietal.(2013)

GraphicalUserInterface(GUI)testing 136 2013 Banerjeeetal.(2013)

SLR Search-basednon-functionaltesting 35 2009 Afzaletal.(2009)

UnittestingforBusinessProcessExecution Language(BPEL)

27 2009 Zakariaetal.(2009)

Formaltestingofwebservices 37 2010 EndoandSimao(2010)

Search-basedtest-casegeneration 68 2010 Alietal.(2010)

Regressiontestselectiontechniques 27 2010 Engströmetal.(2010)

Survey/analysis Object-orientedtesting 140 1996 Binder(1996)

Testingtechniquesexperiments 36 2004 Juristoetal.(2004)

Search-basedtestdatageneration 73 2004 McMinn(2004)

Combinatorialtesting 30 2005 Grindaletal.(2005)

SOAtesting 64 2008 MesbahandPrasad(2011)

Symbolicexecution 70 2009 P˘as˘areanuandVisser(2009)

Testingwebservices 86 2010 Bozkurtetal.(2010)

Mutationtesting 264 2011 JiaandHarman(2011)

Productlinestesting 16 2011 Netoetal.(2011b)

Taxonomy Model-basedGUItesting 33 2010 MemonandNguyen(2010)

Literaturereview TDDofuserinterfaces 6 2010 Hellmannetal.(2010)

(3)

176 S.Do˘ganetal./TheJournalofSystemsandSoftware91(2014)174–201 threerecentSMstudiesconductedbyusandourcolleagues,we

alsosharedthepaperrepositoriesonline(Garousi,2012a,b,c).

Webelievethisisavaluableundertakingsincemaintainingand sharingsuchrepositoriesprovidesmanybeneﬁtstothebroader community. For example, they are valuable resources for new researchersinthearea,andforresearchersaimingtoconductaddi- tionalsecondarystudies.Therefore,weprovidethedetailsofour SLRstudyasanonlinepaperrepository(Doganetal.,2013).

2.3. Secondarystudiesinwebapplicationtesting

WewereabletoﬁndfoursecondarystudiesintheareaofWAT (KamandDean,2009;Alalﬁetal.,2009;LuccaandFasolino,2006;

Amalﬁtanoetal.,2010b).Asummaryoftheseworksislistedin Table2.Allfourworksseemtobeconventional(unsystematic)surveys.Also,thesizeoftheirpoolofstudiesisrathersmall,between 20and29papers.

Kamand Dean(2009)conductedasurveyof20 WATpapers classifyingthemintosixgroups:formal,object-oriented,statistical,UML-based,slicing,andusersession-based.Alalfietal.(2009) presentedasurveyof24differentmodelingmethodsusedinweb verificationandtesting. Theauthorscategorized,compared and analyzedthedifferentmodelingmethodsaccordingtonavigation, behavior, and content.Lucca and Fasolino (2006) presented an overviewofthedifferencesbetweenwebapplicationsandtradi- tionalsoftwareapplications,andhowsuchdifferencesimpactthe testingoftheformer.Theyprovidealistofrelevantcontributions intheareaoffunctionalwebapplicationtesting.Amalfitanoetal.

(2010b)proposedaclassiﬁcationframeworkforrichinternetappli- cationtestinganddescribeanumberofexistingwebtestingtools fromtheliteraturebyplacingtheminthisframework.

Alltheseexistingstudieshaveseveralshortcomingsthatlimit theirreplication,generalization,and usabilityinstructuringthe researchbody onWAT. First, theyare all conducted in an ad- hocmanner, without a systematic approach for reviewing the literature.Second,sincetheirselectioncriteriaarenotexplicitly described,reproducingtheresultsisnotpossible.Third,theydonot representabroadperspectiveandtheirscopesarelimited,mainly becausetheyfocusonalimitednumberofrelatedpapers.

AnotherremotelyrelatedworkisInsfranandFernandez(2008) inwhichaSLRofusabilityevaluationinwebdevelopmenthasbeen reported,butthatworkdoesnotcovertheareaofWAT.Tothebest ofourknowledge,therehasbeennoSLRsofarintheﬁeldofWAT.

3. Researchmethod

Wediscussnextthefollowingaspectsofourresearchmethod:

•Overview.

•Goalandresearchquestions.

•Articleselection.

•Finalpoolofarticlesandtheonlinerepository.

•Dataextraction.

•Datasynthesis.

3.1. Overview

ThisSLRiscarriedoutfollowingtheguidelinesandprocesspro- posedbyKitchenhamandCharters(2007),whichhasthefollowing mainsteps:

•Planningthereview:

◦Identiﬁcationoftheneedforareview.

◦Specifyingtheresearchquestions.

◦Developingareviewprotocol.

◦Evaluatingthereviewprotocol.

•Conductingthereview:

◦Identiﬁcationofresearch.

◦Selectionofprimarystudies.

◦Studyqualityassessment.

◦Dataextractionandmonitoring.

◦Datasynthesis.

Afteridentifyingtheneedforthereview,wespecifytheresearch questions(RQs),whichareexplainedinSection3.2.Theprocess (reviewprotocol)thatwedevelopedintheplanningphaseandthen usedtoconductthisSLRstudyisoutlinedinFig.1.Theprocessstarts witharticleselection(discussedindetailinSection3.3).Then,we adaptedtheclassiﬁcationschemethatwehaddevelopedintheSM study(Garousietal.,2013)asabaselineandextendedittoaddress thegoalsandRQsofthisSLR.Afterwards,weconductedmapping inpreparationoftheSLRanalysisandthensynthesistoaddressthe RQs.

3.2. Goalandresearchquestions

AccordingtotheguidelinebyKitchenhamandCharters(2007), thegoalofaSMisclassiﬁcationandthematicanalysisofliterature onasoftwareengineeringtopic,whilethegoalofaSLRistoiden- tifybestpracticeswithrespecttospeciﬁcprocedures,technologies, methodsor toolsby aggregating informationfrom comparative studies.

RQsofaSMaregeneric,i.e.,relatedtoresearchtrends,andare oftheform:whichresearchers,howmuchactivity,whattypeof studies.Ontheotherhand,RQsofaSLRarespeciﬁc,meaningthat theyarerelatedtooutcomesofempiricalstudies.SLRsaretypically ofgreaterdepththanSMs.Often,SLRsincludeanSMasapartof theirstudy(Petersenetal.,2008).Inotherwords,theresultsofaSM canbefedintoamorerigorousSLRstudytosupportevidence-based Table2

Secondarystudiesinwebapplicationtesting.

Papertitle Reference Year #ofprimarystudies Summary

Lessonslearnedfromasurveyofweb applicationstesting

KamandDean(2009) 2009 20 Asurveyof20papersclassifyingthemintosixgroups:formal, object-oriented,statistical,UML-based,slicing,anduser session-based

Modelingmethodsforwebapplication veriﬁcationandtesting:stateoftheart

Alalﬁetal.(2009) 2009 24 Thestudysurveyed24differentmodelingmethodsusedin websiteveriﬁcationandtesting.Basedonashortcatalogof desirablepropertiesofwebapplicationsthatrequireanalysis, twodifferentviewsofthemethodswerepresented:ageneral categorizationbymodelinglevel,andadetailedcomparison basedonpropertycoverage

Testingweb-basedapplications:the stateoftheartandfuturetrends

LuccaandFasolino(2006) 2006 27 Surveyeddifferenttest-speciﬁcmodelingtechniques,test-case designtechniques,andtesttoolsforwebapplications Techniquesandtoolsforrichinternet

applicationstesting

Amalﬁtanoetal.(2010b) 2010 29 Aclassiﬁcationframeworkforrichinternetapplicationtesting andthelistofexistingwebtestingtools

(4)

Fig.1.ThereviewprotocolusedinthisSLRstudy.

softwareengineeringandthatisexactlywhatwehavefollowedin ourwork.

ThegoalofourSLRstudyistoidentify,analyze,andsynthesize workpublishedduringthepast13yearsintheﬁeldofWATwithan in-depthfocusontheempiricalandevidence-baseaspects.Based onourresearchgoal,weformulatethreeRQs.Toextractdetailed information,eachquestionisfurtherdividedintoanumberofsub- questions,asdescribedbelow:

•RQ1–Whattypesoftestmodels,faultmodelsandtoolshave beenproposed?

Itisimportantfornewresearcherstoknowthetypeandchar- acteristicsoftheaboveartifactstobeabletostartnewresearch workinthisarea.

◦RQ1.1–Whattypesofinput/inferredtestmodelshavebeen proposed/used?

◦RQ1.2–Whattypesoffaultmodels/bugtaxonomyrelatedto webapplicationshavebeenproposed?

◦RQ1.3–Whattoolshavebeenproposedandwhataretheir capabilities?

•RQ2 – How are the empirical studiesin WAT designed and reported?

Astudythathasbeenproperlydesignedandreportediseasyto assessandreplicate.Thefollowingsub-questionsaimatcharac- terizingsomeofthemostimportantaspectsofthestudydesign andhowwellstudiesaredesignedandreported:

◦RQ2.1–Whatarethemetricsusedforassessingcostandeffec- tivenessofWATtechniques?

◦RQ2.2–Whatarethethreatstovalidityintheempiricalstud- ies?

◦RQ2.3–Whatisthelevelofrigorandindustrialrelevanceof theempiricalstudies?

•RQ3–WhatisthestateofempiricalevidenceinWAT?

ThisRQattemptstosynthesizetheresultsreportedinthestud- iesinordertoassesshowmuchempiricalevidencewecurrently have.to answerthis question,weaddress thefollowing sub- questions:

◦RQ3.1–Isthereanyevidenceregardingthescalabilityofthe WATtechniques?

◦RQ3.2–Havedifferenttechniquesbeenempiricallycompared witheachother?

◦RQ3.3–Howmuchempiricalevidenceexistsforeachcategory oftechniquesandtypeofwebapps?

SomeoftheRQs(e.g.,RQs1.1–1.3,2.3,3.2and3.3)havebeen raisedspeciﬁctoourSLRwhilesomeothers havebeenadapted fromsimilarSLRsinotherareasoftesting,e.g.,RQs2.1,2.2and 3.3havebeenadaptedfromanotherrecentSLRonsearch-based testing(Alietal.,2010).

Tofurtherclarifythedifferenceingoalsandscopebetweenthe previousSMstudyandthisSLR,theRQsoftheSMstudyarelistedin Table3(adaptedfromGarousietal.,2013).Wecanseethatthetwo setsofRQsaredifferentfromeachotherandarecomplimentary.

3.3. Articleselection

Wefollowedthesamearticleselectionstrategyaswehadused inourSMstudy.Webrieﬂydiscussnextthefollowingaspectsof articleselection:

•Sourceselectionandsearchkeywords.

•Applicationofinclusion/exclusioncriteria.

3.3.1. Sourceselectionandsearchkeywords

BasedontheSLRguidelines(KitchenhamandCharters,2007;

Petersenetal.,2008),toﬁndrelevantstudies,wesearchedthefol- lowingsixmajoronlinesearchacademicarticlesearchengines:

(1) IEEEXplore,¹ (2) ACM DigitalLibrary,² (3) GoogleScholar,³ (4) MicrosoftAcademicSearch,⁴ (5) CiteSeerX,⁵ and (6) Science Direct.⁶Thesesearchengineshavealsobeenusedinothersimilar studies(e.g.,Netoetal.,2011a;Elberzhageretal.,2012).

1http://ieeexplore.ieee.org.

2http://dl.acm.org.

3http://scholar.google.com.

4http://academic.research.microsoft.com.

5http://citeseerx.ist.psu.edu.

6http://www.sciencedirect.com.

(5)

178 S.Do˘ganetal./TheJournalofSystemsandSoftware91(2014)174–201 Table3

RQsoftheSMstudy.

RQ1–Systematicmapping:

•RQ1.1–Typeofcontribution:howmanypaperspresenttest

methods/techniques,testtools,testmodels,testmetrics,ortestprocesses?

•RQ1.2–Typeofresearchmethod:whattypeofresearchmethodsareused inthepapersinthisarea?

•RQ1.3–Typeoftestingactivity:whattype(s)oftestingactivitiesare presentedinthepapers?

•RQ1.4–Testlocation:howmanyclient-sideversusserversidetesting approacheshavebeenpresented?

•RQ1.5–Testinglevels:whichtestlevelshavereceivedmoreattention (e.g.,unit,integrationandsystemtesting)?

•RQ1.6–Sourceofinformationtoderivetestartifacts:whatsourcesof informationareusedtoderivetestartifacts?

•RQ1.7–Techniquetoderivetestartifacts:whattechniqueshavebeen usedtogeneratetestartifacts?

•RQ1.8–Typeoftestartifact:whichtypesoftestartifacts(e.g.,testcases, testinputs)havebeengenerated?

•RQ1.9–Manualversusautomatedtesting:howmanymanualversus automatedtestingapproacheshavebeenproposed?

•RQ1.10–Typeoftheevaluationmethod:whattypesofevaluation methodsareused?

•RQ1.11–Staticwebsitesversusdynamicwebapplications:howmanyof theapproachesaretargetedatstaticwebsitesversusdynamicweb applications?

•RQ1.12–SynchronicityofHTTPcalls:howmanytechniquestarget synchronouscallsversusasynchronousAjaxcalls?

•RQ1.13–Client-tierwebtechnologies:whichclient-tierwebtechnologies (e.g.,JavaScript,DOM)havebeensupportedmoreoften?

•RQ1.14–Server-tierwebtechnologies:whichserver-tierweb technologies(e.g.,PHP,JSP)havebeensupportedmoreoften?

•RQ1.15–Toolspresentedinthepapers:whatarethenamesof web-testingtoolsproposedanddescribedinthepapers,andhowmanyof themarefreelyavailablefordownload?

•RQ1.16–Attributesofthewebsoftwareundertest:whattypesofSystems UnderTest(SUT),i.e.,intermsofbeingopen-sourceorcommercial,have beenusedandwhataretheirattributes,e.g.,size,metrics?

RQ2–Trendsanddemographicsofthepublications:

•RQ2.1–Publicationcountbyyear:whatistheannualnumberof publicationsinthisﬁeld?

•RQ2.2–Top-citedpapers:whichpapershavebeencitedthemostbyother papers?

•RQ2.3–Activeresearchers:whoarethemostactiveresearchersinthe area,measuredbynumberofpublishedpapers?

•RQ2.5–Topvenues:whichvenues(i.e.,conferences,journals)arethe maintargetsofpapersinthisﬁeld?

(AdaptedfromGarousietal.,2013).

ThecoveragelandscapeofthisSLRistheareaoffunctionaltest- ingofwebapplications,aswellas(dynamicorstatic)analysisto supportWAT.Thesetofsearchtermsweredevisedinasystem- aticand iterativefashion,i.e.,westartedwithaninitialsetand iterativelyimprovedthesetuntilnofurtherrelevantpaperscould befoundtoimproveourpoolofprimarystudies.Bytakingallof theaboveaspectsintoaccount,weformulatedoursearchqueryas showninTable4.

SincetheSMstudyhadidentiﬁed79studiesthrougharigorous searchprocess,weimportedthemintotheSLRpaperpoolwithout anadditionalscanning.NotethattheSMhadconsideredthepapers publisheduntilSummer2011.Wesearchedformorerecentpapers between2011and2013andincludedtheminourcandidatepool.

ThepaperselectionphaseofthisstudywascarriedoutonApriland May2013.

Table4 Searchkeywords.

(webOR websiteOR “webapplication” OR Ajax OR JavaScript OR HTMLOR DOMOR PHP OR J2EE OR servlet

OR JSP OR .NET OR Ruby OR ASP OR Python OR Perl OR CGI) AND (test OR testing OR

analysisOR analyzing OR “dynamic analysis”OR “static analysis” OR verification)

Todecreasetheriskofmissingrelevantstudies,similartopre- viousSMstudiesandSLRs,wesearchedthefollowingsourcesas wellmanually:

•Referencesfoundinstudiesalreadyinthepool.

•Personalwebpagesofactiveresearchersintheﬁeldofinterest:

weextractedthenamesofactiveresearchersfromtheinitialset ofpapersfoundintheabovesearchengines.

Allstudiesfoundintheadditionalvenuesthatwerenotyetinthe poolofselectedstudiesbutseemedtobecandidatesforinclusion wereaddedtotheinitialpool.Withtheabovesearchstringsand searchinspeciﬁcvenues,wefound114studieswhichweconsid- eredasourinitialpoolofpotentially-relevantstudies(alsodepicted inFig.1).Atthisstage,papersintheinitialpoolwerereadyfor applicationofinclusion/exclusioncriteriadescribednext.

3.3.2. Applicationofinclusion/exclusioncriteria

Inclusion/exclusioncriteriawerealsothesameastheSMstudy.

Toincreasethereliabilityofourstudyanditsresults,theauthors applieda systematic voting process among theteam members inthepaperselectionphasefor decidingwhethertoincludeor excludeanyofthepapersintheﬁrstversionofthepool.Thisprocess wasalsoutilizedtominimizepersonalbiasofeachoftheauthors.

Theteammembershadconﬂictingopinionsonfourpapers,which wereresolvedthrough discussions.Our voting mechanism (i.e., exclusionandinclusioncriteria)wasbasedontwoquestions:(1)is thepaperrelevanttofunctionalwebapplicationtestingandanaly- sis?(2)Doesthepaperincludearelativelysoundvalidation?These criteriawereappliedtoallpapers,includingthosepresentingtech- niques,tools,orcasestudies/experiments.

Each author then independently answered these two ques- tionsforeach paper.Onlywhen agivenpaperreceivedatleast twopositiveanswers(fromthreevotingauthors)foreachofthe two questions, it was included in the pool. Otherwise, it was excluded. We primarily voted for papers based on their title, abstract, keywords, as well as theirevaluation sections. If not enoughinformationcouldbeinferredfromtheabstract,acare- fulreviewofthecontentswasalsoconductedtoensurethatallthe papershadadirectrelevancetoourfocusedtopic.

3.4. Finalpoolofarticlesandtheonlinerepository

Aftertheinitialsearchandthefollow-upanalysisforexclusion ofunrelatedandinclusionofadditionalstudies,thepoolofselected studieswasfinalizedwith95studies.Thereadercanrefertothe referencessectionattheendofthepaperforthefullreferencelist ofallprimarystudies.Thefinalpoolofselectedstudieshasalso beenpublishedinanonlinerepositoryusingtheGoogleDocssys- tem,andispublicallyaccessibleonlineatDoganetal.(2013).The classificationsofeachselectedpublicationaccordingtotheclas- sificationschemepresentedinGarousietal.(2013)andalsothe empiricaldataextractedfromthestudiesarealsoavailableinthe onlinerepository.

Fig.2showsthenumberofpapersinthepoolbytheiryearof publication.Wecannoticethattrendhasbeengenerallyincreasing from2000until2010,butthereisasomewhatdecreasingtrend from2010to2013.Notethatthesincethestudywasconductedin themidstoftheyear2013,thusthedataforthisyeararepartial.

3.5. Researchtypeclassiﬁcationofthearticlesintheﬁnalpool TheRQ2and RQ3inthis studyfocusonempiricalstudiesin WAT,forwhichweconductthesystematicanalysisonlyonthe setofempiricalstudiesinourpoolofpapers.Therefore,wehadto

(6)

Fig.2.Annualtrendofstudiesincludedinourpool.

distinguishtheempiricalstudiesintheﬁnalpool.Wehavecatego- rizedthe95primarystudiesbyadoptingtheclassiﬁcationapproach forresearchfacetofpapersproposedbyPetersenetal.(2008),as follows:

1.Solutionproposal:Asolutionforaproblemisproposed,which canbeeithernovelorasigniﬁcantextensionofanexistingtech- nique.Thepotentialbeneﬁtsandtheapplicabilityofthesolution areshownbyasmallexampleoragoodlineofargumentation.

2.Validationresearch:Techniquesinvestigatedarenovelandhave notyetbeenimplementedinpractice.Techniquesusedarefor exampleexperiments,i.e.,workdoneinthelaboratory.

3.Evaluationresearch:Techniquesare implementedinpractice andanevaluationofthetechniqueisconducted.Thatmeans,itis shownhowthetechniqueisimplementedinpractice(solution implementation)andwhataretheconsequencesoftheimple- mentationintermsofbeneﬁtsanddrawbacks(implementation evaluation).

InourpreviousSMstudy(Garousietal.,2013),wehad con- ductedsuchacategorizationwithinthescopeofthatstudy,i.e.,for the79primarystudiestomapeachstudybasedonitsresearchfacet accordingtotheapproachproposedbyPetersenetal.(2008).Here, weprovidethiscategorizationforthe95primarystudiesinthecur- rentpooltoshowthediversityofthestudiesandtoshowwhich oneshave beenconsideredinaddressingRQ2andRQ3.Table5 showsthelistofprimarystudiesineachofthesecategories.For theRQ2andRQ3weexaminedonlytheempiricalstudieswhich arethearticlesinthevalidationandevaluationresearchcategory.

3.6. Dataextraction

Asdiscussedabove,ourrecentSMstudywasthefirststepofthe currentSLR.Themapping(scoping)thatweconductedintheSM studyenabledustoclassifythepapersandidentifyempiricalstud- ies(usingtheclassification“researchfacet”intheSMstudy).We conductedseveralnewtypesofclassificationsinthisSLRtofurther classifytheprimarystudiesasneededbyseveralofourcurrentRQs, e.g.,RQ1.1(input/inferredtestmodels),andRQ2.1(metricsused forassessingcostandeffectivenessofWATtechniques).

Thedataextractionphasewasconductedcollaborativelyamong the authorsand data wererecorded in the online spreadsheet (Doganetal.,2013).

3.7. Datasynthesis

Afterconductingfurthermappinganalysisforthepurposeof RQ1inthisSLR,weconductedsynthesisforansweringRQs2and 3.Todevelopourmethodofsynthesis,wecarefullyreviewedthe researchsynthesisguidelinesinsoftwareengineering(e.g.,Cruzes and Dybå, 2010,2011a,b), and alsootherSLRs which hadused high-qualitysynthesisapproaches(e.g.,Alietal.,2010;Waliaaand Carverb,2009).

According to (Cruzes and Dybå, 2010), the key objective of researchsynthesisistoevaluatetheincludedstudiesforhetero- geneityandselectappropriatemethodsforintegratingorproviding interpretiveexplanationsaboutthem(Cooperetal.,2009).Ifthe primarystudiesaresimilarenoughwithrespecttointerventions and quantitative outcome variables, it maybe possibletosyn- thesizethembymeta-analysis,whichusesstatisticalmethodsto combineeffectsizes.However,insoftwareengineeringingeneral andinourfocusedWATdomaininparticular,primarystudiesare oftentooheterogeneoustopermitastatisticalsummary.Especially forqualitativeandmixedmethodsstudies,differentmethodsof researchsynthesis,e.g.,thematicanalysisandnarrativesynthesis, arerequired(CruzesandDybå,2010).

Basedonthoseguidelines,thefactthattheprimarystudiesin ourpoolweretooheterogeneousandalsothetypeofRQsinour SLR,itwasimperativetousethematicanalysisandnarrativesyn- thesis(CruzesandDybå,2010,2011a,b)inthiswork.Wefollowed thethematicanalysisstepsrecommendedby(CruzesandDybå, 2011a)whichwereasfollows:extractingdata,codingdata,trans- latingcodesintothemes,creatingamodelofhigher-orderthemes, andﬁnallyassessingthetrustworthinessofsynthesis.

4. Results

Inthissection,wepresenttheresultsofourSLRstudy.

4.1. RQ1–Whattypesoftestmodels,faultmodelsandtoolshave beenproposed?

WediscussnextresultsforRQ1.1,1.2and1.3.

4.1.1. RQ1.1–Whattypesofinput/inferredtestmodelshave beenproposed/used?

Alargenumberofstudies(40,42%)proposedtesttechniques whichusedcertainmodelsasinputorinferred(reverseengineered) them.Weclassiﬁedthosetestmodelsasfollowsandthefrequen- ciesareshowninFig.3:

•Navigation models (for pages): models such as ﬁnite-state machines(FSM)whichspecifytheﬂowamongthepagesofaweb application.

•Controlordataﬂowmodels:thesemodelsareinunitleveland areeithercontrolordataﬂowsinsideasinglemodule/function.

•DOM models: theDocumentObject Model (DOM)is a cross- platformandlanguage-independentconventionforrepresenting and interactingwith objectsin HTML, XHTML and XML doc- uments. Several approaches generated DOM models for the purposeoftest-casegenerationortestoracles.

•Other:anymodelsotherthantheabove.

Thenavigationmodelsseemtobethemostpopularas22studies useda type a of navigationmodels.7 studiesusedDOM mod- elsfor thepurposeof testing.Five studiesusedcontrolordata

(7)

180 S.Do˘ganetal./TheJournalofSystemsandSoftware91(2014)174–201 Table5

Typesoftheresearchfacet(method)ofthearticlesinourpool(basedonthesystematicmappingapproachfollowedinourSMstudy(Garousietal.,2013)andalsothe approachproposedbyPetersenetal.(2008).

Researchfacettypes Paperreferences Numberofpapers

Solutionproposal TonellaandRicca(2002,2004a,b,2005),DiLuccaetal.(2006),Törsel(2013),Qietal.(2006),Ricca andTonella(2001,2002a),Alalﬁetal.(2010),Offuttetal.(2004),Sampathetal.(2004),Hu(2002), Liu(2006),DiSciascioetal.(2005),Stepienetal.(2008),MatosandSousa(2010),Raffeltetal.

(2008),BordbarandAnastasakis(2006),Koopmanetal.(2007),Ernitsetal.(2009),OffuttandWu (2009),SongandMiao(2009),Gerlits(2010),MinamideandEngineering(2005),Zhengetal.

(2011),Liuetal.(2000),MansourandHouri(2006),Luccaetal.(2002),Andrewsetal.(2005), Bellettinietal.(2005),Amyotetal.(2005),LicataandKrishnamurthi(2004),Benediktetal.(2002), Castellucciaetal.(2006),andXiongetal.(2005)

36

Validationresearch Artzietal.(2011),Saxenaetal.(2010),Marbacketal.(2012),Sampathetal.(2005,2006a,2007), PraphamontripongandOffutt(2010),MesbahandPrasad(2011),HalfondandOrso(2008), Sprenkleetal.(2005b,2007,2008,2012),HarmanandAlshahwan(2008),Dobolyietal.(2010), Ranetal.(2009),Luoetal.(2009),Alshahwanetal.(2012),Choudharyetal.(2012),Ozkinaciand BetinCan(2011),PattabiramanandZorn(2010),EttemaandBunch(2010),Marchettoetal.

(2007),Artzietal.(2008),Thummalapentaetal.(2013),HalfondandOrso(2007),Alshahwanetal.

(2009),MirshokraieandMesbah(2012),KallepalliandTian(2001),JensenandMøller(2011),Li etal.(2010),Sakamotoetal.(2013),ArtziandPistoia(2010),Halfondetal.(2009),Amalﬁtano etal.(2010a),Andrewsetal.(2010),Marchettoetal.(2008b),Marchetto(2008),LiuandKuanTan (2008),RiccaandTonella(2002b),HaoandMendes(2006),PengandLu(2011),TianandMa (2006),Choudharyetal.(2010),andDallmeieretal.(2013)

45

Evaluationresearch Marchettoetal.(2008a),Sprenkleetal.(2005a,2011),AlshahwanandHarman(2011), Mirshokraieetal.(2013),MesbahandVanDeursen(2009),Mesbahetal.(2012),Elbaumetal.

(2005),DobolyiandWeimer(2010),SampathandBryce(2008),Roestetal.(2010),Marchettoand Tonella(2010),andSampathetal.(2006b)

13

ﬂowmodels.The“Other”categoryincludedmodelssuchas:pro- gramdependencegraphs(PDGs)(Marbacketal.,2012),Database ExtendedFinite StateMachine (DEFSM)(Ranet al., 2009), Test architectureasUMLmodels(Hu,2002),andUniﬁedMarkovModels (UMMs)(KallepalliandTian,2001).

Weobservedthatmodelswereused/treatedinoneoffollowing threecases:(1)asinputtotheproposedtechnique,(2)extracted fromthewebapplicationsandused insubsequentstepsofthe approach,or(3)astheoutputofthetechnique.Weclassiﬁedthe primarystudiesaccordingtothisusagepurposeofthemodelsand providetheresultsofthisclassiﬁcationintheresultsshowedus thatmostofthestudiesareextractingmodelsofWAsinaphaseof thetechniqueandusethosemodelsasinputforconsequentphases.

ResultsareshowninTable6.

For instance,it is worth highlightingthe studies(Ricca and Tonella,2001;DiSciascioetal.,2005;BordbarandAnastasakis, 2006)whichusethemodelsin differentmultiplephasesofthe technique.RiccaandTonella(2001)andBordbarandAnastasakis (2006)extractthemodelofthewebapplicationsundertestand usethismodelintheirtechniques.Thesetwostudiesalsoprovide

theextractedmodelsasoutput.Previouslypreparedmodelsofweb applicationsundertestarefedasinputtotechniquesproposedin DiSciascioetal.(2005)andBordbarandAnastasakis(2006)and thesemodelsaretransformedtoothertypesofmodelswhichare usedasinputtotheconsequentstepsoftheproposedtechniques.

Theresultsshowedusthatmostofthestudiesareextractingmod- elsofWAsinaphaseofthetechniqueandusethosemodelsas inputforconsequentphases.

4.1.2. RQ1.2–Whattypesoffaultmodels/bugtaxonomyrelated towebapplicationshavebeenproposed?

Todevelopeffectivetesttechniques,itisimportanttounder- standandcharacterizefaultsspeciﬁcforwebapplicationsandto developbugtaxonomiesinthisdomain.Similartowhatwascon- ductedbyanotherSLRintheareaoftestingconcurrentsoftware (Souzaetal.,2011),weextractedthewebfaultmodelsandbug taxonomiesproposedintheprimarystudies,assynthesizedand summarizedinTable7.Rowsofthetablearethefaulttypesand thecolumnsaretheprimarystudiesdiscussingortargetingthese typesoffaults.21studiesdiscussedfaultmodelsandsomeofthem

Fig.3. Typesandfrequenciesofinferredtestmodels.

(8)

Table6

Classiﬁcationoftheprimarystudiesbasedonthetreatmentoftestmodels.

Inputtothetechnique Törsel(2013),DiSciascioetal.(2005),Bordbar andAnastasakis(2006)andErnitsetal.(2009) Extractedandusedinthe

technique

TonellaandRicca(2004a),Artzietal.(2011), Marbacketal.(2012),RiccaandTonella(2001), MesbahandPrasad(2011),Ranetal.(2009), Sampathetal.(2004),Hu(2002),Choudhary etal.(2012),DiSciascioetal.(2005),Tonella andRicca(2002),EttemaandBunch(2010), Thummalapentaetal.(2013),MesbahandVan Deursen(2009),Mesbahetal.(2012),Sprenkle etal.(2012),BordbarandAnastasakis(2006), KallepalliandTian(2001),Koopmanetal.

(2007),Roestetal.(2010),Marchettoetal.

(2008b),Marchetto(2008),Liuetal.(2000),Liu andKuanTan(2008),MansourandHouri (2006),Luccaetal.(2002),Bellettinietal.

(2005),Amyotetal.(2005),PengandLu (2011),MarchettoandTonella(2010),Licata andKrishnamurthi(2004),Castellucciaetal.

(2006)andDallmeieretal.(2013)

Outputofthetechnique RiccaandTonella(2001),Liu(2006),Bordbar andAnastasakis(2006),OffuttandWu(2009) andSongandMiao(2009)

conductedfaultfeeding(mutationtesting)basedontheproposed typesoffaults.

Itisworthhighlightingtwoofthestudies(Mirshokraieetal., 2013;Marchettoetal.,2007)inthiscontext.InMirshokraieetal.

(2013),abugseveritytaxonomywasproposedwhich wasused toassesstheeffectivenessandperformanceofamutationtesting toolforJavaScript.InMarchettoetal.(2007),awebfaulttaxon- omywasproposed,wasempiricallyvalidated,andusedforfault feeding(mutationtesting).Inthatstudy,31faulttypesclassified under6categorieswereproposed,e.g.,faultsrelatedtobrowser incompatibility,faultsrelatedtotheneededplugins,faultsinses- sionsynchronization,faultsinpersistenceofsessionobjects,and faultswhilemanipulatingcookies.Fourstudies(Praphamontripong andOffutt,2010;PattabiramanandZorn,2010;EttemaandBunch, 2010; Marchetto et al., 2008b) proposed more in-depth levels of fault taxonomies. Praphamontripong and Offutt (2010) and EttemaandBunch(2010)proposedtaxonomyfornavigationfaults andPattabiramanandZorn(2010)andMarchettoetal.(2008b) classified asynchronous communication faults, specific to Ajax applications.Theraw(before-synthesis)dataforthelistoffault

Fig.4.Numberoftoolspresentedinthepapersovertheyears.

models/bugtaxonomiesineachstudyareprovidedinAppendixA (Table27).

4.1.3. RQ1.3–Whattoolshavebeenproposedandwhataretheir capabilities?

52ofthe95papers(54%)presented(mostlyprototype-level) toolsupportfortheproposedWATapproaches.Wethoughtthat anaturalquestiontoaskinthiscontextiswhetherthepresented toolsareavailablefordownload,sothatotherresearchersandprac- titionerscouldusethemaswell.Weonlycountedapresentedtool availablefordownloadifitwasexplicitlymentionedinthepaper.

Iftheauthorshadnotexplicitlymentionedthatthetoolisavail- ablefordownload,wedidnotconductinternetsearchesforthe toolnames.Only11ofthe52presentedtools(21%)wereavail- ablefordownload.Wealsowantedtoknowwhethermorerecently presentedtoolsweremorelikelytobeavailablefordownload.To assessthis,Fig.4showstheannualtrendofthenumberoftools presentedinthepapersandwhethertheyareavailablefordown- load.Wecannoticethat,inthepaperspresentedafter2008,more andmoretoolsareavailablefordownload,whichisagoodsignfor thecommunity.

Table7

Faultmodels/bugtaxonomyrelatedtowebapplications.

Faultmodels/bugtaxonomy Paperreferences

Authentication,permission Marchettoetal.(2008a),Luoetal.(2009),KallepalliandTian(2001)andDobolyiandWeimer(2010) Internationalization Marchettoetal.(2008a)andDobolyiandWeimer(2010)

Functionality Marchettoetal.(2008a),Sprenkleetal.(2007),DobolyiandWeimer(2010)andChoudharyetal.(2010)

Portability Marchettoetal.(2008a)

Navigation Marchettoetal.(2008a),PraphamontripongandOffutt(2010),Sprenkleetal.(2005b),Luoetal.(2009),Ettemaand Bunch(2010),KallepalliandTian(2001),DobolyiandWeimer(2010)andPengandLu(2011)

Asynchronouscommunication Marchettoetal.(2008a,b)andPattabiramanandZorn(2010)

Session Marchettoetal.(2008a),Marchettoetal.(2007)andDobolyiandWeimer(2010) Formconstruction Marchettoetal.(2008a),Sprenkleetal.(2007)andElbaumetal.(2005)

Database,persistance Sprenkleetal.(2007),Marchettoetal.(2007),Elbaumetal.(2005),DobolyiandWeimer(2010)andPengandLu (2011)

Appearance(GUI,layout,etc.) Sprenkleetal.(2007),Luoetal.(2009),DobolyiandWeimer(2010),PengandLu(2011)andChoudharyetal.(2010) Multitierarchitectural Luoetal.(2009)

Syntactic,programminglanguage OzkinaciandBetinCan(2011),Artzietal.(2008)andArtziandPistoia(2010)

Cookiemanipulation Marchettoetal.(2007)

Plugin Marchettoetal.(2007)

Cross-browserincompatibility Marchettoetal.(2007)andMesbahetal.(2012)

DOM Mesbahetal.(2012),MirshokraieandMesbah(2012)andMarchettoetal.(2008b)

Scripting Elbaumetal.(2005)andJensenandMøller(2011)

Codedisclosure DobolyiandWeimer(2010)

CSS DobolyiandWeimer(2010)