The Journal of Systems and Software 91 (2014) 174–201
Journal homepage: www.elsevier.com/locate/jss
Web application testing: A systematic literature review
Serdar Doğan a,b, Aysu Betin-Can a, Vahid Garousi a,c,d,∗
a Department of Information Systems, Informatics Institute, Middle East Technical University, Ankara, Turkey
b ASELSAN A.S., Ankara, Turkey
c Software Quality Engineering Research Group (SoftQual), Department of Electrical and Computer Engineering, Schulich School of Engineering, University of Calgary, Calgary, Alberta, Canada
d Department of Software Engineering, Atilim University, Ankara, Turkey
Article info
Article history: Received 26 August 2013; Received in revised form 16 November 2013; Accepted 5 January 2014; Available online 21 January 2014
Keywords: Systematic literature review; Web application; Testing
Abstract
Context: The web has had a significant impact on all aspects of our society. As our society relies more and more on the web, the dependability of web applications has become increasingly important. To make these applications more dependable, for the past decade researchers have proposed various techniques for testing web-based software applications. Our literature search for related studies retrieved 193 papers in the area of web application testing, which have appeared between 2000 and 2013.
Objective: As this research area matures and the number of related papers increases, it is important to systematically identify, analyze, and classify the publications and provide an overview of the trends and empirical evidence in this specialized field.
Methods: We systematically review the body of knowledge related to functional testing of web applications through a systematic literature review (SLR) study. This SLR is a follow-up and complementary study to a recent systematic mapping (SM) study that we conducted in this area. As part of this study, we pose three sets of research questions, define selection and exclusion criteria, and synthesize the empirical evidence in this area.
Results: Our pool of studies includes a set of 95 papers (from the 193 retrieved papers) published in the area of web application testing between 2000 and 2013. The data extracted during our SLR study is available through a publicly accessible online repository. Among our results are the following: (1) the list of test tools in this area and their capabilities, (2) the types of test models and fault models proposed in this domain, (3) the way the empirical studies in this area have been designed and reported, and (4) the state of empirical evidence and industrial relevance.
Conclusion: We discuss the emerging trends in web application testing, and discuss the implications for researchers and practitioners in this area. The results of our SLR can help researchers to obtain an overview of existing web application testing approaches, fault models, tools, metrics and empirical evidence, and subsequently identify areas in the field that require more attention from the research community.
© 2014 Elsevier Inc. All rights reserved.
1. Introduction
The web has had a significant impact on all aspects of our society, from business, education, government, entertainment sectors, industry, to our personal lives. The main advantages of adopting the web for developing software products include (1) no installation costs, (2) automatic upgrade with new features for all users, (3) universal access from any machine connected to the Internet and (4) being independent of the operating system of clients.
∗ Corresponding author at: Software Quality Engineering Research Group (SoftQual), Department of Electrical and Computer Engineering, Schulich School of Engineering, University of Calgary, Calgary, Alberta, Canada. Tel.: +1 403 210 5412.
E-mail addresses: [email protected], [email protected] (S. Doğan), [email protected] (A. Betin-Can), [email protected], [email protected] (V. Garousi).
On the downside, the use of server and browser technologies makes web applications particularly error-prone and challenging to test, causing serious dependability threats. A 2003 study conducted by the Business Internet Group San Francisco (BIG-SF) (BIG-SF, 2003) reported that approximately 70% of websites and web applications contain defects. In addition to financial costs, defects in web applications result in loss of revenue and credibility.
The difficulty in testing web applications is many-fold. First, web applications are distributed through a client/server architecture, with (asynchronous) HTTP request/response calls to synchronize the application state. Second, they are heterogeneous, i.e., web applications are developed using different programming languages, for instance, HTML, CSS, JavaScript on the client-side and PHP, Ruby, Java on the server-side. And third, web applications have a dynamic nature; in many scenarios they also possess nondeterministic characteristics.
0164-1212/$ – see front matter © 2014 Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.jss.2014.01.010
During the past decade, researchers in increasing numbers have proposed different techniques for analyzing and testing these dynamic, fast-evolving software systems. As the research area matures and the number of related papers increases, it is important to systematically identify, analyze and classify the state-of-the-art and provide an overview of the trends in this specialized field. In this paper, we present a systematic literature review (SLR) of the web application testing (WAT) research domain.
In a recent work, we conducted a systematic mapping (SM) study (Garousi et al., 2013) in which we reviewed 79 papers in the WAT domain. The current SLR is a follow-up complementary study after our SM study. We continue in this SLR the secondary study that we started in our SM by focusing in depth on the empirical and evidence-based aspects of the WAT domain. The SM and the SLR studies have been conducted by paying close attention to major differences between these two types of secondary studies; see, e.g., the guideline by Kitchenham and Charters (2007).
To the best of our knowledge, this paper is the first SLR in the area of WAT. The remainder of this paper is outlined as follows. A review of the related work is presented in Section 2. Section 3 explains our research methodology and research questions. Section 4 presents the results of the SLR. Section 5 discusses the main findings, trends and the validity threats. Finally, Section 6 concludes the paper.
2. Background and related work
We classify related work into three categories: (1) secondary studies that have been reported in the broader area of software testing, (2) online repositories in software engineering, and (3) secondary studies focusing on web application testing.
2.1. Secondary studies in software testing
We were able to find 24 secondary studies (Garousi et al., 2013; Afzal et al., 2008; Palacios et al., 2011; Barmi et al., 2011; Neto et al., 2011a,b; Engström and Runeson, 2011; Neto et al., 2012; Afzal et al., 2009; Zakaria et al., 2009; Endo and Simao, 2010; Ali et al., 2010; Engström et al., 2010; Binder, 1996; Juristo et al., 2004; McMinn, 2004; Grindal et al., 2005; Canfora and Penta, 2008; Păsăreanu and Visser, 2009; Bozkurt et al., 2010; Jia and Harman, 2011; Memon and Nguyen, 2010; Hellmann et al., 2010; Banerjee et al., 2013) reported, as of this writing, in different areas of software testing. We list and categorize these studies in Table 1 along with their study areas. Based on the “year” column, we observe that more and more SMs and SLRs have recently started to appear in the area of software testing. As per our literature search, we were able to find eight SMs and five SLRs in the area, as shown in the table. The remaining 11 studies are “surveys”, “taxonomies”, “literature reviews”, and “analysis and survey”, terms used by the authors themselves to describe their secondary studies. The number of primary studies studied in each study in Table 1 varies from 6 (Hellmann et al., 2010) to 264 (Jia and Harman, 2011).
Our recent SM study (Garousi et al., 2013) reviewed 79 papers in the WAT domain and is considered the first step (phase) of the current SLR. The mapping (scoping) that we conducted in the SM study enabled us to classify the papers and identify empirical studies. We continue in this SLR the secondary study that we started in our SM by focusing in depth on the empirical and evidence-based aspects of the WAT domain. Note that, as discussed by other researchers such as Petersen et al. (2008) and Kitchenham and Charters (2007), the goal and scope of SM and SLR studies are quite different, and we follow those distinctions between our previous SM (Garousi et al., 2013) and this SLR.
2.2. Online paper repositories in software engineering
A few recent secondary studies have reported online repositories to supplement their study with the actual data. These repositories are the by-products of SM or SLR studies and will be useful to practitioners and researchers by providing a summary of all the works in a given area. Most of these repositories are maintained and updated regularly. For instance, Harman et al. have developed and shared two online paper repositories: one in the area of mutation testing (Jia and Harman, 2013), and another in the area of search-based software engineering (SBSE) (Zhang, 2013). In three recent SM studies conducted by us and our colleagues, we also shared the paper repositories online (Garousi, 2012a,b,c).
Table 1
24 secondary studies in software testing.
Type of secondary study | Secondary study area | Number of primary studies | Year | Reference
SM | Non-functional search-based testing | 35 | 2008 | Afzal et al. (2008)
SM | SOA testing | 33 | 2011 | Palacios et al. (2011)
SM | Testing using requirements specification | 35 | 2011 | Barmi et al. (2011)
SM | Product lines testing | 45 | 2011 | Neto et al. (2011a)
SM | Product lines testing | 64 | 2011 | Engström and Runeson (2011)
SM | Product lines testing tools | 33 | 2012 | Neto et al. (2012)
SM | Web application testing | 79 | 2013 | Garousi et al. (2013)
SM | Graphical User Interface (GUI) testing | 136 | 2013 | Banerjee et al. (2013)
SLR | Search-based non-functional testing | 35 | 2009 | Afzal et al. (2009)
SLR | Unit testing for Business Process Execution Language (BPEL) | 27 | 2009 | Zakaria et al. (2009)
SLR | Formal testing of web services | 37 | 2010 | Endo and Simao (2010)
SLR | Search-based test-case generation | 68 | 2010 | Ali et al. (2010)
SLR | Regression test selection techniques | 27 | 2010 | Engström et al. (2010)
Survey/analysis | Object-oriented testing | 140 | 1996 | Binder (1996)
Survey/analysis | Testing techniques experiments | 36 | 2004 | Juristo et al. (2004)
Survey/analysis | Search-based test data generation | 73 | 2004 | McMinn (2004)
Survey/analysis | Combinatorial testing | 30 | 2005 | Grindal et al. (2005)
Survey/analysis | SOA testing | 64 | 2008 | Canfora and Penta (2008)
Survey/analysis | Symbolic execution | 70 | 2009 | Păsăreanu and Visser (2009)
Survey/analysis | Testing web services | 86 | 2010 | Bozkurt et al. (2010)
Survey/analysis | Mutation testing | 264 | 2011 | Jia and Harman (2011)
Survey/analysis | Product lines testing | 16 | 2011 | Neto et al. (2011b)
Taxonomy | Model-based GUI testing | 33 | 2010 | Memon and Nguyen (2010)
Literature review | TDD of user interfaces | 6 | 2010 | Hellmann et al. (2010)
We believe this is a valuable undertaking since maintaining and sharing such repositories provides many benefits to the broader community. For example, they are valuable resources for new researchers in the area, and for researchers aiming to conduct additional secondary studies. Therefore, we provide the details of our SLR study as an online paper repository (Dogan et al., 2013).
2.3. Secondary studies in web application testing
We were able to find four secondary studies in the area of WAT (Kam and Dean, 2009; Alalfi et al., 2009; Lucca and Fasolino, 2006; Amalfitano et al., 2010b). A summary of these works is listed in Table 2. All four works seem to be conventional (unsystematic) surveys. Also, the size of their pool of studies is rather small, between 20 and 29 papers.
Kam and Dean (2009) conducted a survey of 20 WAT papers, classifying them into six groups: formal, object-oriented, statistical, UML-based, slicing, and user session-based. Alalfi et al. (2009) presented a survey of 24 different modeling methods used in web verification and testing. The authors categorized, compared and analyzed the different modeling methods according to navigation, behavior, and content. Lucca and Fasolino (2006) presented an overview of the differences between web applications and traditional software applications, and how such differences impact the testing of the former. They provided a list of relevant contributions in the area of functional web application testing. Amalfitano et al. (2010b) proposed a classification framework for rich internet application testing and described a number of existing web testing tools from the literature by placing them in this framework.
All these existing studies have several shortcomings that limit their replication, generalization, and usability in structuring the research body on WAT. First, they are all conducted in an ad-hoc manner, without a systematic approach for reviewing the literature. Second, since their selection criteria are not explicitly described, reproducing the results is not possible. Third, they do not represent a broad perspective and their scopes are limited, mainly because they focus on a limited number of related papers.
Another remotely related work is Insfran and Fernandez (2008), in which a SLR of usability evaluation in web development has been reported, but that work does not cover the area of WAT. To the best of our knowledge, there has been no SLR so far in the field of WAT.
3. Research method
We discuss next the following aspects of our research method:
• Overview.
• Goal and research questions.
• Article selection.
• Final pool of articles and the online repository.
• Data extraction.
• Data synthesis.
3.1. Overview
This SLR is carried out following the guidelines and process proposed by Kitchenham and Charters (2007), which has the following main steps:
• Planning the review:
  ◦ Identification of the need for a review.
  ◦ Specifying the research questions.
  ◦ Developing a review protocol.
  ◦ Evaluating the review protocol.
• Conducting the review:
  ◦ Identification of research.
  ◦ Selection of primary studies.
  ◦ Study quality assessment.
  ◦ Data extraction and monitoring.
  ◦ Data synthesis.
After identifying the need for the review, we specify the research questions (RQs), which are explained in Section 3.2. The process (review protocol) that we developed in the planning phase and then used to conduct this SLR study is outlined in Fig. 1. The process starts with article selection (discussed in detail in Section 3.3). Then, we adapted the classification scheme that we had developed in the SM study (Garousi et al., 2013) as a baseline and extended it to address the goals and RQs of this SLR. Afterwards, we conducted mapping in preparation of the SLR analysis and then synthesis to address the RQs.
3.2. Goal and research questions
According to the guideline by Kitchenham and Charters (2007), the goal of a SM is classification and thematic analysis of literature on a software engineering topic, while the goal of a SLR is to identify best practices with respect to specific procedures, technologies, methods or tools by aggregating information from comparative studies.
RQs of a SM are generic, i.e., related to research trends, and are of the form: which researchers, how much activity, what type of studies. On the other hand, RQs of a SLR are specific, meaning that they are related to outcomes of empirical studies. SLRs are typically of greater depth than SMs. Often, SLRs include an SM as a part of their study (Petersen et al., 2008). In other words, the results of a SM can be fed into a more rigorous SLR study to support evidence-based software engineering, and that is exactly what we have followed in our work.
Table 2
Secondary studies in web application testing.
Paper title | Reference | Year | # of primary studies | Summary
Lessons learned from a survey of web applications testing | Kam and Dean (2009) | 2009 | 20 | A survey of 20 papers classifying them into six groups: formal, object-oriented, statistical, UML-based, slicing, and user session-based
Modeling methods for web application verification and testing: state of the art | Alalfi et al. (2009) | 2009 | 24 | The study surveyed 24 different modeling methods used in website verification and testing. Based on a short catalog of desirable properties of web applications that require analysis, two different views of the methods were presented: a general categorization by modeling level, and a detailed comparison based on property coverage
Testing web-based applications: the state of the art and future trends | Lucca and Fasolino (2006) | 2006 | 27 | Surveyed different test-specific modeling techniques, test-case design techniques, and test tools for web applications
Techniques and tools for rich internet applications testing | Amalfitano et al. (2010b) | 2010 | 29 | A classification framework for rich internet application testing and the list of existing web testing tools
Fig. 1. The review protocol used in this SLR study.
The goal of our SLR study is to identify, analyze, and synthesize work published during the past 13 years in the field of WAT with an in-depth focus on the empirical and evidence-based aspects. Based on our research goal, we formulate three RQs. To extract detailed information, each question is further divided into a number of sub-questions, as described below:
• RQ1 – What types of test models, fault models and tools have been proposed?
  It is important for new researchers to know the type and characteristics of the above artifacts to be able to start new research work in this area.
  ◦ RQ1.1 – What types of input/inferred test models have been proposed/used?
  ◦ RQ1.2 – What types of fault models/bug taxonomy related to web applications have been proposed?
  ◦ RQ1.3 – What tools have been proposed and what are their capabilities?
• RQ2 – How are the empirical studies in WAT designed and reported?
  A study that has been properly designed and reported is easy to assess and replicate. The following sub-questions aim at characterizing some of the most important aspects of the study design and how well studies are designed and reported:
  ◦ RQ2.1 – What are the metrics used for assessing cost and effectiveness of WAT techniques?
  ◦ RQ2.2 – What are the threats to validity in the empirical studies?
  ◦ RQ2.3 – What is the level of rigor and industrial relevance of the empirical studies?
• RQ3 – What is the state of empirical evidence in WAT?
  This RQ attempts to synthesize the results reported in the studies in order to assess how much empirical evidence we currently have. To answer this question, we address the following sub-questions:
  ◦ RQ3.1 – Is there any evidence regarding the scalability of the WAT techniques?
  ◦ RQ3.2 – Have different techniques been empirically compared with each other?
  ◦ RQ3.3 – How much empirical evidence exists for each category of techniques and type of web apps?
Some of the RQs (e.g., RQs 1.1–1.3, 2.3, 3.2 and 3.3) are specific to our SLR, while some others have been adapted from similar SLRs in other areas of testing, e.g., RQs 2.1, 2.2 and 3.3 have been adapted from another recent SLR on search-based testing (Ali et al., 2010).
To further clarify the difference in goals and scope between the previous SM study and this SLR, the RQs of the SM study are listed in Table 3 (adapted from Garousi et al., 2013). We can see that the two sets of RQs are different from each other and are complementary.
3.3. Article selection
We followed the same article selection strategy as we had used in our SM study. We briefly discuss next the following aspects of article selection:
• Source selection and search keywords.
• Application of inclusion/exclusion criteria.
3.3.1. Source selection and search keywords
Based on the SLR guidelines (Kitchenham and Charters, 2007; Petersen et al., 2008), to find relevant studies, we searched the following six major online academic article search engines: (1) IEEE Xplore,¹ (2) ACM Digital Library,² (3) Google Scholar,³ (4) Microsoft Academic Search,⁴ (5) CiteSeerX,⁵ and (6) Science Direct.⁶ These search engines have also been used in other similar studies (e.g., Neto et al., 2011a; Elberzhager et al., 2012).
¹ http://ieeexplore.ieee.org.
² http://dl.acm.org.
³ http://scholar.google.com.
⁴ http://academic.research.microsoft.com.
⁵ http://citeseerx.ist.psu.edu.
⁶ http://www.sciencedirect.com.
Table 3
RQs of the SM study (adapted from Garousi et al., 2013).
RQ1 – Systematic mapping:
• RQ1.1 – Type of contribution: how many papers present test methods/techniques, test tools, test models, test metrics, or test processes?
• RQ1.2 – Type of research method: what type of research methods are used in the papers in this area?
• RQ1.3 – Type of testing activity: what type(s) of testing activities are presented in the papers?
• RQ1.4 – Test location: how many client-side versus server-side testing approaches have been presented?
• RQ1.5 – Testing levels: which test levels have received more attention (e.g., unit, integration and system testing)?
• RQ1.6 – Source of information to derive test artifacts: what sources of information are used to derive test artifacts?
• RQ1.7 – Technique to derive test artifacts: what techniques have been used to generate test artifacts?
• RQ1.8 – Type of test artifact: which types of test artifacts (e.g., test cases, test inputs) have been generated?
• RQ1.9 – Manual versus automated testing: how many manual versus automated testing approaches have been proposed?
• RQ1.10 – Type of the evaluation method: what types of evaluation methods are used?
• RQ1.11 – Static websites versus dynamic web applications: how many of the approaches are targeted at static websites versus dynamic web applications?
• RQ1.12 – Synchronicity of HTTP calls: how many techniques target synchronous calls versus asynchronous Ajax calls?
• RQ1.13 – Client-tier web technologies: which client-tier web technologies (e.g., JavaScript, DOM) have been supported more often?
• RQ1.14 – Server-tier web technologies: which server-tier web technologies (e.g., PHP, JSP) have been supported more often?
• RQ1.15 – Tools presented in the papers: what are the names of web-testing tools proposed and described in the papers, and how many of them are freely available for download?
• RQ1.16 – Attributes of the web software under test: what types of Systems Under Test (SUT), i.e., in terms of being open-source or commercial, have been used and what are their attributes, e.g., size, metrics?
RQ2 – Trends and demographics of the publications:
• RQ2.1 – Publication count by year: what is the annual number of publications in this field?
• RQ2.2 – Top-cited papers: which papers have been cited the most by other papers?
• RQ2.3 – Active researchers: who are the most active researchers in the area, measured by number of published papers?
• RQ2.5 – Top venues: which venues (i.e., conferences, journals) are the main targets of papers in this field?
The coverage landscape of this SLR is the area of functional testing of web applications, as well as (dynamic or static) analysis to support WAT. The set of search terms was devised in a systematic and iterative fashion, i.e., we started with an initial set and iteratively improved the set until no further relevant papers could be found to improve our pool of primary studies. By taking all of the above aspects into account, we formulated our search query as shown in Table 4.
Since the SM study had identified 79 studies through a rigorous search process, we imported them into the SLR paper pool without an additional scanning. Note that the SM had considered the papers published until Summer 2011. We searched for more recent papers between 2011 and 2013 and included them in our candidate pool. The paper selection phase of this study was carried out in April and May 2013.
Table 4
Search keywords.
(web OR website OR “web application” OR Ajax OR JavaScript OR HTML OR DOM OR PHP OR J2EE OR servlet OR JSP OR .NET OR Ruby OR ASP OR Python OR Perl OR CGI) AND (test OR testing OR analysis OR analyzing OR “dynamic analysis” OR “static analysis” OR verification)
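The structure of the search query in Table 4 (one OR-group of web-related terms joined by AND to one OR-group of testing and analysis terms) can be sketched as follows. This is only an illustrative sketch: the helper names `quote`, `or_group` and `build_query` are hypothetical and not part of the study, and each search engine may require its own query syntax.

```python
# Illustrative sketch: assembles a Boolean search string with the same
# structure as Table 4. The helper names below are hypothetical.

SUBJECT_TERMS = [
    "web", "website", "web application", "Ajax", "JavaScript", "HTML",
    "DOM", "PHP", "J2EE", "servlet", "JSP", ".NET", "Ruby", "ASP",
    "Python", "Perl", "CGI",
]
ACTIVITY_TERMS = [
    "test", "testing", "analysis", "analyzing",
    "dynamic analysis", "static analysis", "verification",
]


def quote(term):
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def or_group(terms):
    """Join terms into a parenthesized OR-group."""
    return "(" + " OR ".join(quote(t) for t in terms) + ")"


def build_query(subject_terms, activity_terms):
    """Combine the two OR-groups with a single AND."""
    return or_group(subject_terms) + " AND " + or_group(activity_terms)


QUERY = build_query(SUBJECT_TERMS, ACTIVITY_TERMS)
print(QUERY)
```

Keeping the two term lists separate makes the iterative refinement described above straightforward: a term can be added to either list and the query regenerated.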
To decrease the risk of missing relevant studies, similar to previous SM studies and SLRs, we also manually searched the following sources:
• References found in studies already in the pool.
• Personal web pages of active researchers in the field of interest: we extracted the names of active researchers from the initial set of papers found in the above search engines.
All studies found in the additional venues that were not yet in the pool of selected studies but seemed to be candidates for inclusion were added to the initial pool. With the above search strings and search in specific venues, we found 114 studies which we considered as our initial pool of potentially-relevant studies (also depicted in Fig. 1). At this stage, papers in the initial pool were ready for application of inclusion/exclusion criteria described next.
3.3.2. Application of inclusion/exclusion criteria
Inclusion/exclusion criteria were also the same as in the SM study. To increase the reliability of our study and its results, the authors applied a systematic voting process among the team members in the paper selection phase for deciding whether to include or exclude any of the papers in the first version of the pool. This process was also utilized to minimize personal bias of each of the authors. The team members had conflicting opinions on four papers, which were resolved through discussions. Our voting mechanism (i.e., exclusion and inclusion criteria) was based on two questions: (1) Is the paper relevant to functional web application testing and analysis? (2) Does the paper include a relatively sound validation? These criteria were applied to all papers, including those presenting techniques, tools, or case studies/experiments.
Each author then independently answered these two questions for each paper. A given paper was included in the pool only when it received at least two positive answers (from three voting authors) for each of the two questions. Otherwise, it was excluded. We primarily voted for papers based on their title, abstract, keywords, as well as their evaluation sections. If not enough information could be inferred from the abstract, a careful review of the contents was also conducted to ensure that all the papers had a direct relevance to our focused topic.
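The voting rule described above can be expressed compactly. The following is a minimal sketch under the stated rule (three voters, two questions, at least two positive answers per question); the `Ballot` type and the `include_paper` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass

# Threshold taken from the text: at least two positive answers
# (from three voting authors) for each of the two questions.
REQUIRED_POSITIVE_VOTES = 2
NUMBER_OF_VOTERS = 3


@dataclass
class Ballot:
    """One author's answers for one paper (hypothetical type)."""
    relevant: bool   # Q1: relevant to functional web application testing/analysis?
    validated: bool  # Q2: includes a relatively sound validation?


def include_paper(ballots):
    """Return True only if both questions reach the required number of positive votes."""
    if len(ballots) != NUMBER_OF_VOTERS:
        raise ValueError("expected exactly three ballots per paper")
    relevant_votes = sum(b.relevant for b in ballots)
    validated_votes = sum(b.validated for b in ballots)
    return (relevant_votes >= REQUIRED_POSITIVE_VOTES
            and validated_votes >= REQUIRED_POSITIVE_VOTES)
```

Note that the two questions are counted separately: a paper that all three authors consider relevant is still excluded if only one of them judges its validation to be sound.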
3.4. Final pool of articles and the online repository
After the initial search and the follow-up analysis for exclusion of unrelated and inclusion of additional studies, the pool of selected studies was finalized with 95 studies. The reader can refer to the references section at the end of the paper for the full reference list of all primary studies. The final pool of selected studies has also been published in an online repository using the Google Docs system, and is publicly accessible online at Dogan et al. (2013). The classifications of each selected publication according to the classification scheme presented in Garousi et al. (2013) and also the empirical data extracted from the studies are also available in the online repository.
Fig. 2 shows the number of papers in the pool by their year of publication. We can notice that the trend has been generally increasing from 2000 until 2010, but there is a somewhat decreasing trend from 2010 to 2013. Note that, since the study was conducted in the midst of the year 2013, the data for this year are partial.
Fig. 2. Annual trend of studies included in our pool.
3.5. Research type classification of the articles in the final pool
RQ2 and RQ3 in this study focus on empirical studies in WAT, for which we conduct the systematic analysis only on the set of empirical studies in our pool of papers. Therefore, we had to distinguish the empirical studies in the final pool. We have categorized the 95 primary studies by adopting the classification approach for research facet of papers proposed by Petersen et al. (2008), as follows:
1. Solution proposal: A solution for a problem is proposed, which can be either novel or a significant extension of an existing technique. The potential benefits and the applicability of the solution are shown by a small example or a good line of argumentation.
2. Validation research: Techniques investigated are novel and have not yet been implemented in practice. Techniques used are for example experiments, i.e., work done in the laboratory.
3. Evaluation research: Techniques are implemented in practice and an evaluation of the technique is conducted. That means, it is shown how the technique is implemented in practice (solution implementation) and what are the consequences of the implementation in terms of benefits and drawbacks (implementation evaluation).
In our previous SM study (Garousi et al., 2013), we had conducted such a categorization within the scope of that study, i.e., for the 79 primary studies to map each study based on its research facet according to the approach proposed by Petersen et al. (2008). Here, we provide this categorization for the 95 primary studies in the current pool to show the diversity of the studies and to show which ones have been considered in addressing RQ2 and RQ3. Table 5 shows the list of primary studies in each of these categories. For RQ2 and RQ3 we examined only the empirical studies, which are the articles in the validation and evaluation research categories.
3.6. Data extraction
As discussed above, our recent SM study was the first step of the current SLR. The mapping (scoping) that we conducted in the SM study enabled us to classify the papers and identify empirical studies (using the classification “research facet” in the SM study). We conducted several new types of classifications in this SLR to further classify the primary studies as needed by several of our current RQs, e.g., RQ1.1 (input/inferred test models), and RQ2.1 (metrics used for assessing cost and effectiveness of WAT techniques).
The data extraction phase was conducted collaboratively among the authors and data were recorded in the online spreadsheet (Dogan et al., 2013).
3.7. Data synthesis
After conducting further mapping analysis for the purpose of RQ1 in this SLR, we conducted synthesis for answering RQs 2 and 3. To develop our method of synthesis, we carefully reviewed the research synthesis guidelines in software engineering (e.g., Cruzes and Dybå, 2010, 2011a,b), and also other SLRs which had used high-quality synthesis approaches (e.g., Ali et al., 2010; Walia and Carver, 2009).
According to Cruzes and Dybå (2010), the key objective of research synthesis is to evaluate the included studies for heterogeneity and select appropriate methods for integrating or providing interpretive explanations about them (Cooper et al., 2009). If the primary studies are similar enough with respect to interventions and quantitative outcome variables, it may be possible to synthesize them by meta-analysis, which uses statistical methods to combine effect sizes. However, in software engineering in general and in our focused WAT domain in particular, primary studies are often too heterogeneous to permit a statistical summary. Especially for qualitative and mixed methods studies, different methods of research synthesis, e.g., thematic analysis and narrative synthesis, are required (Cruzes and Dybå, 2010).
Based on those guidelines, the fact that the primary studies in our pool were too heterogeneous, and also the type of RQs in our SLR, it was imperative to use thematic analysis and narrative synthesis (Cruzes and Dybå, 2010, 2011a,b) in this work. We followed the thematic analysis steps recommended by Cruzes and Dybå (2011a), which were as follows: extracting data, coding data, translating codes into themes, creating a model of higher-order themes, and finally assessing the trustworthiness of synthesis.
4. Results
In this section, we present the results of our SLR study.
4.1. RQ1 – What types of test models, fault models and tools have been proposed?
We discuss next results for RQ1.1, 1.2 and 1.3.
4.1.1. RQ1.1 – What types of input/inferred test models have been proposed/used?
A large number of studies (40, 42%) proposed test techniques which used certain models as input or inferred (reverse engineered) them. We classified those test models as follows and the frequencies are shown in Fig. 3:
• Navigation models (for pages): models such as finite-state machines (FSM) which specify the flow among the pages of a web application.
• Control or data flow models: these models are at the unit level and are either control or data flows inside a single module/function.
• DOM models: the Document Object Model (DOM) is a cross-platform and language-independent convention for representing and interacting with objects in HTML, XHTML and XML documents. Several approaches generated DOM models for the purpose of test-case generation or test oracles.
• Other: any models other than the above.
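To make the notion of a navigation model concrete, the sketch below represents page flow as a small finite-state machine and enumerates page sequences that could serve as navigation test cases. It is an illustrative assumption only and is not drawn from any of the primary studies: the page names, the `NAVIGATION_MODEL` dictionary and the `navigation_paths` function are all hypothetical.

```python
# Hypothetical page-flow model: each page maps to the pages reachable from it.
NAVIGATION_MODEL = {
    "login": ["home"],
    "home": ["search", "cart"],
    "search": ["cart"],
    "cart": ["checkout"],
    "checkout": [],
}


def navigation_paths(model, start):
    """Enumerate page sequences from `start` by depth-first traversal.

    A sequence ends when the current page has no successor that is not
    already on the sequence, so cycles in the model do not cause endless loops.
    """
    paths = []

    def visit(page, path):
        successors = [p for p in model.get(page, []) if p not in path]
        if not successors:
            paths.append(path)
            return
        for next_page in successors:
            visit(next_page, path + [next_page])

    visit(start, [start])
    return paths


for path in navigation_paths(NAVIGATION_MODEL, "login"):
    print(" -> ".join(path))
```

Each enumerated sequence corresponds to one candidate test case that exercises the links between pages in that order; real techniques typically add input data and oracles on top of such sequences.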
Navigation models seem to be the most popular, as 22 studies used a type of navigation model. Seven studies used DOM models for the purpose of testing. Five studies used control or data flow models. The “Other” category included models such as: program dependence graphs (PDGs) (Marback et al., 2012), Database Extended Finite State Machine (DEFSM) (Ran et al., 2009), test architecture as UML models (Hu, 2002), and Unified Markov Models (UMMs) (Kallepalli and Tian, 2001).
Table 5
Types of the research facet (method) of the articles in our pool (based on the systematic mapping approach followed in our SM study (Garousi et al., 2013) and also the approach proposed by Petersen et al. (2008)).
Research facet types | Paper references | Number of papers
Solution proposal | Tonella and Ricca (2002, 2004a,b, 2005), Di Lucca et al. (2006), Törsel (2013), Qi et al. (2006), Ricca and Tonella (2001, 2002a), Alalfi et al. (2010), Offutt et al. (2004), Sampath et al. (2004), Hu (2002), Liu (2006), Di Sciascio et al. (2005), Stepien et al. (2008), Matos and Sousa (2010), Raffelt et al. (2008), Bordbar and Anastasakis (2006), Koopman et al. (2007), Ernits et al. (2009), Offutt and Wu (2009), Song and Miao (2009), Gerlits (2010), Minamide and Engineering (2005), Zheng et al. (2011), Liu et al. (2000), Mansour and Houri (2006), Lucca et al. (2002), Andrews et al. (2005), Bellettini et al. (2005), Amyot et al. (2005), Licata and Krishnamurthi (2004), Benedikt et al. (2002), Castelluccia et al. (2006), and Xiong et al. (2005) | 36
Validation research | Artzi et al. (2011), Saxena et al. (2010), Marback et al. (2012), Sampath et al. (2005, 2006a, 2007), Praphamontripong and Offutt (2010), Mesbah and Prasad (2011), Halfond and Orso (2008), Sprenkle et al. (2005b, 2007, 2008, 2012), Harman and Alshahwan (2008), Dobolyi et al. (2010), Ran et al. (2009), Luo et al. (2009), Alshahwan et al. (2012), Choudhary et al. (2012), Ozkinaci and Betin Can (2011), Pattabiraman and Zorn (2010), Ettema and Bunch (2010), Marchetto et al. (2007), Artzi et al. (2008), Thummalapenta et al. (2013), Halfond and Orso (2007), Alshahwan et al. (2009), Mirshokraie and Mesbah (2012), Kallepalli and Tian (2001), Jensen and Møller (2011), Li et al. (2010), Sakamoto et al. (2013), Artzi and Pistoia (2010), Halfond et al. (2009), Amalfitano et al. (2010a), Andrews et al. (2010), Marchetto et al. (2008b), Marchetto (2008), Liu and Kuan Tan (2008), Ricca and Tonella (2002b), Hao and Mendes (2006), Peng and Lu (2011), Tian and Ma (2006), Choudhary et al. (2010), and Dallmeier et al. (2013) | 45
Evaluation research | Marchetto et al. (2008a), Sprenkle et al. (2005a, 2011), Alshahwan and Harman (2011), Mirshokraie et al. (2013), Mesbah and Van Deursen (2009), Mesbah et al. (2012), Elbaum et al. (2005), Dobolyi and Weimer (2010), Sampath and Bryce (2008), Roest et al. (2010), Marchetto and Tonella (2010), and Sampath et al. (2006b) | 13
We observed that models were used/treated in one of the following three cases: (1) as input to the proposed technique, (2) extracted from the web applications and used in subsequent steps of the approach, or (3) as the output of the technique. We classified the primary studies according to this usage purpose of the models; the results of this classification are shown in Table 6.
For instance, it is worth highlighting the studies (Ricca and Tonella, 2001; Di Sciascio et al., 2005; Bordbar and Anastasakis, 2006) which use the models in multiple phases of the technique. Ricca and Tonella (2001) and Bordbar and Anastasakis (2006) extract the model of the web applications under test and use this model in their techniques. These two studies also provide the extracted models as output. Previously prepared models of web applications under test are fed as input to the techniques proposed in Di Sciascio et al. (2005) and Bordbar and Anastasakis (2006), and these models are transformed into other types of models, which are used as input to the subsequent steps of the proposed techniques.
The results show that most of the studies extract models of WAs in one phase of the technique and use those models as input for subsequent phases.
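The three-way classification above can be made concrete with a small sketch. The snippet below encodes only an excerpt of Table 6 (a few studies per category, chosen for brevity) and finds the studies that appear in more than one category. The names `usage`, `phases_of`, and `multi_phase_studies` are illustrative assumptions and are not part of the review's data repository.

```python
# Excerpt of Table 6 (not the full table): how each primary study treats its test model.
# The category keys and all identifiers below are illustrative assumptions.
usage = {
    "input": {
        "Törsel (2013)",
        "Di Sciascio et al. (2005)",
        "Bordbar and Anastasakis (2006)",
        "Ernits et al. (2009)",
    },
    "extracted": {
        "Ricca and Tonella (2001)",
        "Di Sciascio et al. (2005)",
        "Bordbar and Anastasakis (2006)",
        "Marback et al. (2012)",
    },
    "output": {
        "Ricca and Tonella (2001)",
        "Liu (2006)",
        "Bordbar and Anastasakis (2006)",
        "Offutt and Wu (2009)",
        "Song and Miao (2009)",
    },
}


def phases_of(study: str) -> list[str]:
    """Return the sorted list of categories in which a study appears."""
    return sorted(phase for phase, studies in usage.items() if study in studies)


def multi_phase_studies() -> dict[str, list[str]]:
    """Return the studies that treat models in more than one way."""
    all_studies = set().union(*usage.values())
    return {s: phases_of(s) for s in sorted(all_studies) if len(phases_of(s)) > 1}


if __name__ == "__main__":
    for study, phases in multi_phase_studies().items():
        print(f"{study}: {', '.join(phases)}")
```

On this excerpt, the sketch singles out the same three studies that the text highlights as using models in multiple phases.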
Fig. 3. Types and frequencies of inferred test models.
Table 6
Classification of the primary studies based on the treatment of test models.
Input to the technique | Törsel (2013), Di Sciascio et al. (2005), Bordbar and Anastasakis (2006) and Ernits et al. (2009)
Extracted and used in the technique | Tonella and Ricca (2004a), Artzi et al. (2011), Marback et al. (2012), Ricca and Tonella (2001), Mesbah and Prasad (2011), Ran et al. (2009), Sampath et al. (2004), Hu (2002), Choudhary et al. (2012), Di Sciascio et al. (2005), Tonella and Ricca (2002), Ettema and Bunch (2010), Thummalapenta et al. (2013), Mesbah and Van Deursen (2009), Mesbah et al. (2012), Sprenkle et al. (2012), Bordbar and Anastasakis (2006), Kallepalli and Tian (2001), Koopman et al. (2007), Roest et al. (2010), Marchetto et al. (2008b), Marchetto (2008), Liu et al. (2000), Liu and Kuan Tan (2008), Mansour and Houri (2006), Lucca et al. (2002), Bellettini et al. (2005), Amyot et al. (2005), Peng and Lu (2011), Marchetto and Tonella (2010), Licata and Krishnamurthi (2004), Castelluccia et al. (2006) and Dallmeier et al. (2013)
Output of the technique | Ricca and Tonella (2001), Liu (2006), Bordbar and Anastasakis (2006), Offutt and Wu (2009) and Song and Miao (2009)
4.1.2. RQ1.2 – What types of fault models/bug taxonomy related to web applications have been proposed?
To develop effective test techniques, it is important to understand and characterize faults specific to web applications and to develop bug taxonomies in this domain. Similar to what was conducted by another SLR in the area of testing concurrent software (Souza et al., 2011), we extracted the web fault models and bug taxonomies proposed in the primary studies, as synthesized and summarized in Table 7. Each row of the table lists a fault type together with the primary studies discussing or targeting that type of fault. Twenty-one studies discussed fault models, and some of them conducted fault feeding (mutation testing) based on the proposed types of faults.
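As a minimal sketch of what fault feeding (mutation testing) means in practice, the snippet below seeds a single fault into a hypothetical session-timeout check and tests whether a test suite detects ("kills") the mutant. The function names, the timeout value, and the test suites are assumptions made for illustration only. They are not taken from any of the primary studies.

```python
from typing import Callable, List, Tuple


def is_valid_session(age_seconds: int, timeout: int = 1800) -> bool:
    """Original (hypothetical) check: a session is valid while strictly below the timeout."""
    return age_seconds < timeout


def mutant_is_valid_session(age_seconds: int, timeout: int = 1800) -> bool:
    """Seeded fault: the relational operator is changed from < to <=."""
    return age_seconds <= timeout


# A test case is a pair (session age in seconds, expected validity).
TestCase = Tuple[int, bool]


def passes(impl: Callable[[int], bool], tests: List[TestCase]) -> bool:
    """Return True if the implementation satisfies every test case."""
    return all(impl(age) == expected for age, expected in tests)


def mutant_killed(tests: List[TestCase]) -> bool:
    """A mutant is killed if the suite passes on the original but fails on the mutant."""
    return passes(is_valid_session, tests) and not passes(mutant_is_valid_session, tests)


# A suite without a boundary-value test cannot tell the mutant apart from the original.
weak_suite: List[TestCase] = [(0, True), (3600, False)]
# Adding the boundary case (age equal to the timeout) exposes the seeded fault.
strong_suite: List[TestCase] = weak_suite + [(1800, False)]

if __name__ == "__main__":
    print("weak suite kills mutant:", mutant_killed(weak_suite))
    print("strong suite kills mutant:", mutant_killed(strong_suite))
```

In this sketch, the share of seeded faults that a suite kills serves as a rough indicator of its fault-detection ability, which is the general idea behind using a fault taxonomy to decide which faults to seed.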
It is worth highlighting two of the studies (Mirshokraie et al., 2013; Marchetto et al., 2007) in this context. In Mirshokraie et al. (2013), a bug severity taxonomy was proposed and used to assess the effectiveness and performance of a mutation testing tool for JavaScript. In Marchetto et al. (2007), a web fault taxonomy was proposed, empirically validated, and used for fault feeding (mutation testing). In that study, 31 fault types classified under 6 categories were proposed, e.g., faults related to browser incompatibility, faults related to the needed plugins, faults in session synchronization, faults in persistence of session objects, and faults while manipulating cookies. Four studies (Praphamontripong and Offutt, 2010; Pattabiraman and Zorn, 2010; Ettema and Bunch, 2010; Marchetto et al., 2008b) proposed more in-depth levels of fault taxonomies. Praphamontripong and Offutt (2010) and Ettema and Bunch (2010) proposed taxonomies for navigation faults, and Pattabiraman and Zorn (2010) and Marchetto et al. (2008b) classified asynchronous communication faults, which are specific to Ajax applications. The raw (before-synthesis) data for the list of fault models/bug taxonomies in each study are provided in Appendix A (Table 27).
Fig. 4. Number of tools presented in the papers over the years.
4.1.3. RQ1.3 – What tools have been proposed and what are their capabilities?
52 of the 95 papers (54%) presented (mostly prototype-level) tool support for the proposed WAT approaches. A natural question to ask in this context is whether the presented tools are available for download, so that other researchers and practitioners could use them as well. We counted a presented tool as available for download only if this was explicitly mentioned in the paper. If the authors had not explicitly mentioned that the tool is available for download, we did not conduct internet searches for the tool names. Only 11 of the 52 presented tools (21%) were available for download. We also wanted to know whether more recently presented tools were more likely to be available for download. To assess this, Fig. 4 shows the annual trend of the number of tools presented in the papers and whether they are available for download. We can notice that, in the papers presented after 2008, more and more tools are available for download, which is a good sign for the community.
Table 7
Fault models/bug taxonomy related to web applications.
Fault models/bug taxonomy | Paper references
Authentication, permission | Marchetto et al. (2008a), Luo et al. (2009), Kallepalli and Tian (2001) and Dobolyi and Weimer (2010)
Internationalization | Marchetto et al. (2008a) and Dobolyi and Weimer (2010)
Functionality | Marchetto et al. (2008a), Sprenkle et al. (2007), Dobolyi and Weimer (2010) and Choudhary et al. (2010)
Portability | Marchetto et al. (2008a)
Navigation | Marchetto et al. (2008a), Praphamontripong and Offutt (2010), Sprenkle et al. (2005b), Luo et al. (2009), Ettema and Bunch (2010), Kallepalli and Tian (2001), Dobolyi and Weimer (2010) and Peng and Lu (2011)
Asynchronous communication | Marchetto et al. (2008a,b) and Pattabiraman and Zorn (2010)
Session | Marchetto et al. (2008a), Marchetto et al. (2007) and Dobolyi and Weimer (2010)
Form construction | Marchetto et al. (2008a), Sprenkle et al. (2007) and Elbaum et al. (2005)
Database, persistence | Sprenkle et al. (2007), Marchetto et al. (2007), Elbaum et al. (2005), Dobolyi and Weimer (2010) and Peng and Lu (2011)
Appearance (GUI, layout, etc.) | Sprenkle et al. (2007), Luo et al. (2009), Dobolyi and Weimer (2010), Peng and Lu (2011) and Choudhary et al. (2010)
Multi-tier architectural | Luo et al. (2009)
Syntactic, programming language | Ozkinaci and Betin Can (2011), Artzi et al. (2008) and Artzi and Pistoia (2010)
Cookie manipulation | Marchetto et al. (2007)
Plugin | Marchetto et al. (2007)
Cross-browser incompatibility | Marchetto et al. (2007) and Mesbah et al. (2012)
DOM | Mesbah et al. (2012), Mirshokraie and Mesbah (2012) and Marchetto et al. (2008b)
Scripting | Elbaum et al. (2005) and Jensen and Møller (2011)
Code disclosure | Dobolyi and Weimer (2010)
CSS | Dobolyi and Weimer (2010)