
The Journal of Systems and Software 91 (2014) 174–201

Contents lists available at ScienceDirect

The Journal of Systems and Software

journal homepage: www.elsevier.com/locate/jss

Web application testing: A systematic literature review

Serdar Doğan a,b, Aysu Betin-Can a, Vahid Garousi a,c,d,∗

a Department of Information Systems, Informatics Institute, Middle East Technical University, Ankara, Turkey

b ASELSAN A.S., Ankara, Turkey

c Software Quality Engineering Research Group (SoftQual), Department of Electrical and Computer Engineering, Schulich School of Engineering, University of Calgary, Calgary, Alberta, Canada

d Department of Software Engineering, Atilim University, Ankara, Turkey

Article info

Article history:
Received 26 August 2013
Received in revised form 16 November 2013
Accepted 5 January 2014
Available online 21 January 2014

Keywords:
Systematic literature review
Web application
Testing

Abstract

Context: The web has had a significant impact on all aspects of our society. As our society relies more and more on the web, the dependability of web applications has become increasingly important. To make these applications more dependable, for the past decade researchers have proposed various techniques for testing web-based software applications. Our literature search for related studies retrieved 193 papers in the area of web application testing, which have appeared between 2000 and 2013.

Objective: As this research area matures and the number of related papers increases, it is important to systematically identify, analyze, and classify the publications and provide an overview of the trends and empirical evidence in this specialized field.

Methods: We systematically review the body of knowledge related to functional testing of web applications through a systematic literature review (SLR) study. This SLR is a follow-up and complementary study to a recent systematic mapping (SM) study that we conducted in this area. As part of this study, we pose three sets of research questions, define selection and exclusion criteria, and synthesize the empirical evidence in this area.

Results: Our pool of studies includes a set of 95 papers (from the 193 retrieved papers) published in the area of web application testing between 2000 and 2013. The data extracted during our SLR study is available through a publicly accessible online repository. Among our results are the following: (1) the list of test tools in this area and their capabilities, (2) the types of test models and fault models proposed in this domain, (3) the way the empirical studies in this area have been designed and reported, and (4) the state of empirical evidence and industrial relevance.

Conclusion: We discuss the emerging trends in web application testing, and discuss the implications for researchers and practitioners in this area. The results of our SLR can help researchers to obtain an overview of existing web application testing approaches, fault models, tools, metrics and empirical evidence, and subsequently identify areas in the field that require more attention from the research community.

© 2014 Elsevier Inc. All rights reserved.

1. Introduction

The web has had a significant impact on all aspects of our society, from business, education, government, entertainment sectors, industry, to our personal lives. The main advantages of adopting the web for developing software products include (1) no installation costs, (2) automatic upgrade with new features for all users, (3) universal access from any machine connected to the Internet and (4) being independent of the operating system of clients.

∗ Corresponding author at: Software Quality Engineering Research Group (SoftQual), Department of Electrical and Computer Engineering, Schulich School of Engineering, University of Calgary, Calgary, Alberta, Canada. Tel.: +1 403 210 5412.

E-mail addresses: [email protected], [email protected] (S. Doğan), [email protected] (A. Betin-Can), [email protected], [email protected] (V. Garousi).

On the downside, the use of server and browser technologies makes web applications particularly error-prone and challenging to test, causing serious dependability threats. A 2003 study conducted by the Business Internet Group San Francisco (BIG-SF) (BIG-SF, 2003) reported that approximately 70% of websites and web applications contain defects. In addition to financial costs, defects in web applications result in loss of revenue and credibility.

The difficulty in testing web applications is manifold. First, web applications are distributed through a client/server architecture, with (asynchronous) HTTP request/response calls to synchronize the application state. Second, they are heterogeneous, i.e., web applications are developed using different programming languages, for instance, HTML, CSS, JavaScript on the client-side and PHP, Ruby, Java on the server-side. And third, web applications have a dynamic nature; in many scenarios they also possess nondeterministic characteristics.

0164-1212/$ see front matter © 2014 Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.jss.2014.01.010

During the past decade, researchers in increasing numbers have proposed different techniques for analyzing and testing these dynamic, fast evolving software systems. As the research area matures and the number of related papers increases, it is important to systematically identify, analyze and classify the state-of-the-art and provide an overview of the trends in this specialized field. In this paper, we present a systematic literature review (SLR) of the web application testing (WAT) research domain.

In a recent work, we conducted a systematic mapping (SM) study (Garousi et al., 2013) in which we reviewed 79 papers in the WAT domain. The current SLR is a follow-up complementary study after our SM study. We continue in this SLR the secondary study that we started in our SM by focusing in depth on the empirical and evidence-based aspects of the WAT domain. The SM and the SLR studies have been conducted by paying close attention to major differences between these two types of secondary studies, e.g., refer to the guideline by Kitchenham and Charters (2007).

To the best of our knowledge, this paper is the first SLR in the area of WAT. The remainder of this paper is outlined as follows. A review of the related work is presented in Section 2. Section 3 explains our research methodology and research questions. Section 4 presents the results of the SLR. Section 5 discusses the main findings, trends and the validity threats. Finally, Section 6 concludes the paper.

2. Background and related work

We classify related work into three categories: (1) secondary studies that have been reported in the broader area of software testing, (2) online repositories in software engineering, and (3) secondary studies focusing on web application testing.

2.1. Secondary studies in software testing

We were able to find 24 secondary studies (Garousi et al., 2013; Afzal et al., 2008; Palacios et al., 2011; Barmi et al., 2011; Neto et al., 2011a,b; Engström and Runeson, 2011; Neto et al., 2012; Afzal et al., 2009; Zakaria et al., 2009; Endo and Simao, 2010; Ali et al., 2010; Engström et al., 2010; Binder, 1996; Juristo et al., 2004; McMinn, 2004; Grindal et al., 2005; Canfora and Penta, 2008; Păsăreanu and Visser, 2009; Bozkurt et al., 2010; Jia and Harman, 2011; Memon and Nguyen, 2010; Hellmann et al., 2010; Banerjee et al., 2013) reported, as of this writing, in different areas of software testing. We list and categorize these studies in Table 1 along with their study areas. Based on the “year” column, we observe that more and more SMs and SLRs have recently started to appear in the area of software testing. As per our literature search, we were able to find eight SMs and five SLRs in the area, as shown in the table. The remaining 11 studies are “surveys”, “taxonomies”, “literature reviews”, and “analysis and survey”, terms used by the authors themselves to describe their secondary studies. The number of primary studies studied in each study in Table 1 varies from 6 (Hellmann et al., 2010) to 264 (Jia and Harman, 2011).

Our recent SM study (Garousi et al., 2013) reviewed 79 papers in the WAT domain and is considered the first step (phase) of the current SLR. The mapping (scoping) that we conducted in the SM study enabled us to classify the papers and identify empirical studies. We continue in this SLR the secondary study that we started in our SM by focusing in depth on the empirical and evidence-based aspects of the WAT domain. Note that, as discussed by other researchers such as Petersen et al. (2008) and Kitchenham and Charters (2007), the goal and scope of SM and SLR studies are quite different and we follow those distinctions between our previous SM (Garousi et al., 2013) and this SLR.

2.2. Online paper repositories in software engineering

A few recent secondary studies have reported online repositories to supplement their study with the actual data. These repositories are the by-products of SM or SLR studies and will be useful to practitioners and researchers by providing a summary of all the works in a given area. Most of these repositories are maintained and updated regularly. For instance, Harman et al. have developed and shared two online paper repositories: one in the area of mutation testing (Jia and Harman, 2013), and another in the area of search-based software engineering (SBSE) (Zhang, 2013). In three recent SM studies conducted by us and our colleagues, we also shared the paper repositories online (Garousi, 2012a,b,c).

Table 1
24 secondary studies in software testing.

| Type of secondary study | Secondary study area | Number of primary studies | Year | Reference |
|---|---|---|---|---|
| SM | Non-functional search-based testing | 35 | 2008 | Afzal et al. (2008) |
| SM | SOA testing | 33 | 2011 | Palacios et al. (2011) |
| SM | Testing using requirements specification | 35 | 2011 | Barmi et al. (2011) |
| SM | Product lines testing | 45 | 2011 | Neto et al. (2011a) |
| SM | Product lines testing | 64 | 2011 | Engström and Runeson (2011) |
| SM | Product lines testing tools | 33 | 2012 | Neto et al. (2012) |
| SM | Web application testing | 79 | 2013 | Garousi et al. (2013) |
| SM | Graphical User Interface (GUI) testing | 136 | 2013 | Banerjee et al. (2013) |
| SLR | Search-based non-functional testing | 35 | 2009 | Afzal et al. (2009) |
| SLR | Unit testing for Business Process Execution Language (BPEL) | 27 | 2009 | Zakaria et al. (2009) |
| SLR | Formal testing of web services | 37 | 2010 | Endo and Simao (2010) |
| SLR | Search-based test-case generation | 68 | 2010 | Ali et al. (2010) |
| SLR | Regression test selection techniques | 27 | 2010 | Engström et al. (2010) |
| Survey/analysis | Object-oriented testing | 140 | 1996 | Binder (1996) |
| Survey/analysis | Testing techniques experiments | 36 | 2004 | Juristo et al. (2004) |
| Survey/analysis | Search-based test data generation | 73 | 2004 | McMinn (2004) |
| Survey/analysis | Combinatorial testing | 30 | 2005 | Grindal et al. (2005) |
| Survey/analysis | SOA testing | 64 | 2008 | Mesbah and Prasad (2011) |
| Survey/analysis | Symbolic execution | 70 | 2009 | Păsăreanu and Visser (2009) |
| Survey/analysis | Testing web services | 86 | 2010 | Bozkurt et al. (2010) |
| Survey/analysis | Mutation testing | 264 | 2011 | Jia and Harman (2011) |
| Survey/analysis | Product lines testing | 16 | 2011 | Neto et al. (2011b) |
| Taxonomy | Model-based GUI testing | 33 | 2010 | Memon and Nguyen (2010) |
| Literature review | TDD of user interfaces | 6 | 2010 | Hellmann et al. (2010) |

We believe this is a valuable undertaking since maintaining and sharing such repositories provides many benefits to the broader community. For example, they are valuable resources for new researchers in the area, and for researchers aiming to conduct additional secondary studies. Therefore, we provide the details of our SLR study as an online paper repository (Dogan et al., 2013).

2.3. Secondary studies in web application testing

We were able to find four secondary studies in the area of WAT (Kam and Dean, 2009; Alalfi et al., 2009; Lucca and Fasolino, 2006; Amalfitano et al., 2010b). A summary of these works is listed in Table 2. All four works seem to be conventional (unsystematic) surveys. Also, the size of their pool of studies is rather small, between 20 and 29 papers.

Kam and Dean (2009) conducted a survey of 20 WAT papers, classifying them into six groups: formal, object-oriented, statistical, UML-based, slicing, and user session-based. Alalfi et al. (2009) presented a survey of 24 different modeling methods used in web verification and testing. The authors categorized, compared and analyzed the different modeling methods according to navigation, behavior, and content. Lucca and Fasolino (2006) presented an overview of the differences between web applications and traditional software applications, and how such differences impact the testing of the former. They provide a list of relevant contributions in the area of functional web application testing. Amalfitano et al. (2010b) proposed a classification framework for rich internet application testing and described a number of existing web testing tools from the literature by placing them in this framework.

All these existing studies have several shortcomings that limit their replication, generalization, and usability in structuring the research body on WAT. First, they are all conducted in an ad-hoc manner, without a systematic approach for reviewing the literature. Second, since their selection criteria are not explicitly described, reproducing the results is not possible. Third, they do not represent a broad perspective and their scopes are limited, mainly because they focus on a limited number of related papers.

Another remotely related work is Insfran and Fernandez (2008), in which an SLR of usability evaluation in web development has been reported, but that work does not cover the area of WAT. To the best of our knowledge, there has been no SLR so far in the field of WAT.

3. Research method

We discuss next the following aspects of our research method:

• Overview.
• Goal and research questions.
• Article selection.
• Final pool of articles and the online repository.
• Data extraction.
• Data synthesis.

3.1. Overview

This SLR is carried out following the guidelines and process proposed by Kitchenham and Charters (2007), which has the following main steps:

• Planning the review:
  ◦ Identification of the need for a review.
  ◦ Specifying the research questions.
  ◦ Developing a review protocol.
  ◦ Evaluating the review protocol.
• Conducting the review:
  ◦ Identification of research.
  ◦ Selection of primary studies.
  ◦ Study quality assessment.
  ◦ Data extraction and monitoring.
  ◦ Data synthesis.

After identifying the need for the review, we specify the research questions (RQs), which are explained in Section 3.2. The process (review protocol) that we developed in the planning phase and then used to conduct this SLR study is outlined in Fig. 1. The process starts with article selection (discussed in detail in Section 3.3). Then, we adapted the classification scheme that we had developed in the SM study (Garousi et al., 2013) as a baseline and extended it to address the goals and RQs of this SLR. Afterwards, we conducted mapping in preparation of the SLR analysis and then synthesis to address the RQs.

3.2. Goal and research questions

According to the guideline by Kitchenham and Charters (2007), the goal of an SM is classification and thematic analysis of literature on a software engineering topic, while the goal of an SLR is to identify best practices with respect to specific procedures, technologies, methods or tools by aggregating information from comparative studies.

RQs of an SM are generic, i.e., related to research trends, and are of the form: which researchers, how much activity, what type of studies. On the other hand, RQs of an SLR are specific, meaning that they are related to outcomes of empirical studies. SLRs are typically of greater depth than SMs. Often, SLRs include an SM as a part of their study (Petersen et al., 2008). In other words, the results of an SM can be fed into a more rigorous SLR study to support evidence-based software engineering, and that is exactly what we have followed in our work.

Table 2
Secondary studies in web application testing.

| Paper title | Reference | Year | # of primary studies | Summary |
|---|---|---|---|---|
| Lessons learned from a survey of web applications testing | Kam and Dean (2009) | 2009 | 20 | A survey of 20 papers classifying them into six groups: formal, object-oriented, statistical, UML-based, slicing, and user session-based |
| Modeling methods for web application verification and testing: state of the art | Alalfi et al. (2009) | 2009 | 24 | The study surveyed 24 different modeling methods used in website verification and testing. Based on a short catalog of desirable properties of web applications that require analysis, two different views of the methods were presented: a general categorization by modeling level, and a detailed comparison based on property coverage |
| Testing web-based applications: the state of the art and future trends | Lucca and Fasolino (2006) | 2006 | 27 | Surveyed different test-specific modeling techniques, test-case design techniques, and test tools for web applications |
| Techniques and tools for rich internet applications testing | Amalfitano et al. (2010b) | 2010 | 29 | A classification framework for rich internet application testing and the list of existing web testing tools |

Fig. 1. The review protocol used in this SLR study.

The goal of our SLR study is to identify, analyze, and synthesize work published during the past 13 years in the field of WAT with an in-depth focus on the empirical and evidence-based aspects. Based on our research goal, we formulate three RQs. To extract detailed information, each question is further divided into a number of sub-questions, as described below:

• RQ1 – What types of test models, fault models and tools have been proposed?
  It is important for new researchers to know the type and characteristics of the above artifacts to be able to start new research work in this area.
  ◦ RQ1.1 – What types of input/inferred test models have been proposed/used?
  ◦ RQ1.2 – What types of fault models/bug taxonomy related to web applications have been proposed?
  ◦ RQ1.3 – What tools have been proposed and what are their capabilities?
• RQ2 – How are the empirical studies in WAT designed and reported?
  A study that has been properly designed and reported is easy to assess and replicate. The following sub-questions aim at characterizing some of the most important aspects of the study design and how well studies are designed and reported:
  ◦ RQ2.1 – What are the metrics used for assessing cost and effectiveness of WAT techniques?
  ◦ RQ2.2 – What are the threats to validity in the empirical studies?
  ◦ RQ2.3 – What is the level of rigor and industrial relevance of the empirical studies?
• RQ3 – What is the state of empirical evidence in WAT?
  This RQ attempts to synthesize the results reported in the studies in order to assess how much empirical evidence we currently have. To answer this question, we address the following sub-questions:
  ◦ RQ3.1 – Is there any evidence regarding the scalability of the WAT techniques?
  ◦ RQ3.2 – Have different techniques been empirically compared with each other?
  ◦ RQ3.3 – How much empirical evidence exists for each category of techniques and type of web apps?

Some of the RQs (e.g., RQs 1.1–1.3, 2.3, 3.2 and 3.3) have been raised specific to our SLR while some others have been adapted from similar SLRs in other areas of testing, e.g., RQs 2.1, 2.2 and 3.3 have been adapted from another recent SLR on search-based testing (Ali et al., 2010).

To further clarify the difference in goals and scope between the previous SM study and this SLR, the RQs of the SM study are listed in Table 3 (adapted from Garousi et al., 2013). We can see that the two sets of RQs are different from each other and are complementary.

3.3. Article selection

We followed the same article selection strategy as we had used in our SM study. We briefly discuss next the following aspects of article selection:

• Source selection and search keywords.
• Application of inclusion/exclusion criteria.

3.3.1. Source selection and search keywords

Based on the SLR guidelines (Kitchenham and Charters, 2007; Petersen et al., 2008), to find relevant studies, we searched the following six major online academic article search engines: (1) IEEE Xplore,1 (2) ACM Digital Library,2 (3) Google Scholar,3 (4) Microsoft Academic Search,4 (5) CiteSeerX,5 and (6) Science Direct.6 These search engines have also been used in other similar studies (e.g., Neto et al., 2011a; Elberzhager et al., 2012).

1 http://ieeexplore.ieee.org.
2 http://dl.acm.org.
3 http://scholar.google.com.
4 http://academic.research.microsoft.com.
5 http://citeseerx.ist.psu.edu.
6 http://www.sciencedirect.com.

Table 3
RQs of the SM study (adapted from Garousi et al., 2013).

RQ 1 Systematic mapping:
• RQ 1.1 Type of contribution: how many papers present test methods/techniques, test tools, test models, test metrics, or test processes?
• RQ 1.2 Type of research method: what type of research methods are used in the papers in this area?
• RQ 1.3 Type of testing activity: what type(s) of testing activities are presented in the papers?
• RQ 1.4 Test location: how many client-side versus server-side testing approaches have been presented?
• RQ 1.5 Testing levels: which test levels have received more attention (e.g., unit, integration and system testing)?
• RQ 1.6 Source of information to derive test artifacts: what sources of information are used to derive test artifacts?
• RQ 1.7 Technique to derive test artifacts: what techniques have been used to generate test artifacts?
• RQ 1.8 Type of test artifact: which types of test artifacts (e.g., test cases, test inputs) have been generated?
• RQ 1.9 Manual versus automated testing: how many manual versus automated testing approaches have been proposed?
• RQ 1.10 Type of the evaluation method: what types of evaluation methods are used?
• RQ 1.11 Static websites versus dynamic web applications: how many of the approaches are targeted at static websites versus dynamic web applications?
• RQ 1.12 Synchronicity of HTTP calls: how many techniques target synchronous calls versus asynchronous Ajax calls?
• RQ 1.13 Client-tier web technologies: which client-tier web technologies (e.g., JavaScript, DOM) have been supported more often?
• RQ 1.14 Server-tier web technologies: which server-tier web technologies (e.g., PHP, JSP) have been supported more often?
• RQ 1.15 Tools presented in the papers: what are the names of web-testing tools proposed and described in the papers, and how many of them are freely available for download?
• RQ 1.16 Attributes of the web software under test: what types of Systems Under Test (SUT), i.e., in terms of being open-source or commercial, have been used and what are their attributes, e.g., size, metrics?

RQ 2 Trends and demographics of the publications:
• RQ 2.1 Publication count by year: what is the annual number of publications in this field?
• RQ 2.2 Top-cited papers: which papers have been cited the most by other papers?
• RQ 2.3 Active researchers: who are the most active researchers in the area, measured by number of published papers?
• RQ 2.5 Top venues: which venues (i.e., conferences, journals) are the main targets of papers in this field?

The coverage landscape of this SLR is the area of functional testing of web applications, as well as (dynamic or static) analysis to support WAT. The set of search terms was devised in a systematic and iterative fashion, i.e., we started with an initial set and iteratively improved the set until no further relevant papers could be found to improve our pool of primary studies. By taking all of the above aspects into account, we formulated our search query as shown in Table 4.

Since the SM study had identified 79 studies through a rigorous search process, we imported them into the SLR paper pool without an additional scanning. Note that the SM had considered the papers published until Summer 2011. We searched for more recent papers between 2011 and 2013 and included them in our candidate pool. The paper selection phase of this study was carried out in April and May 2013.

Table 4
Search keywords.

(web OR website OR “web application” OR Ajax OR JavaScript OR HTML OR DOM OR PHP OR J2EE OR servlet OR JSP OR .NET OR Ruby OR ASP OR Python OR Perl OR CGI) AND (test OR testing OR analysis OR analyzing OR “dynamic analysis” OR “static analysis” OR verification)
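The boolean structure of the query in Table 4 can be illustrated with a short sketch. The two keyword lists below are transcribed from Table 4; everything else (the function names, whole-word case-insensitive matching, and applying the query to a title string) is an assumption made for illustration and does not describe how the individual search engines evaluated the query.

```python
import re

# Keyword groups transcribed from Table 4.
SUBJECT_TERMS = [
    "web", "website", "web application", "Ajax", "JavaScript", "HTML",
    "DOM", "PHP", "J2EE", "servlet", "JSP", ".NET", "Ruby", "ASP",
    "Python", "Perl", "CGI",
]
ACTIVITY_TERMS = [
    "test", "testing", "analysis", "analyzing",
    "dynamic analysis", "static analysis", "verification",
]


def _contains(term, text):
    """Hypothetical matcher: whole word or phrase, case-insensitive."""
    pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def matches_query(text):
    """True if text has at least one subject term AND one activity term."""
    has_subject = any(_contains(term, text) for term in SUBJECT_TERMS)
    has_activity = any(_contains(term, text) for term in ACTIVITY_TERMS)
    return has_subject and has_activity
```

For example, a title such as "A framework for automated testing of JavaScript web applications" satisfies both groups, whereas "Ajax usability study" satisfies only the first group and is therefore not matched.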

To decrease the risk of missing relevant studies, similar to previous SM studies and SLRs, we also searched the following sources manually:

• References found in studies already in the pool.
• Personal web pages of active researchers in the field of interest: we extracted the names of active researchers from the initial set of papers found in the above search engines.

All studies found in the additional venues that were not yet in the pool of selected studies but seemed to be candidates for inclusion were added to the initial pool. With the above search strings and search in specific venues, we found 114 studies which we considered as our initial pool of potentially relevant studies (also depicted in Fig. 1). At this stage, papers in the initial pool were ready for application of inclusion/exclusion criteria described next.

3.3.2. Application of inclusion/exclusion criteria

Inclusion/exclusion criteria were also the same as in the SM study. To increase the reliability of our study and its results, the authors applied a systematic voting process among the team members in the paper selection phase for deciding whether to include or exclude any of the papers in the first version of the pool. This process was also utilized to minimize personal bias of each of the authors. The team members had conflicting opinions on four papers, which were resolved through discussions. Our voting mechanism (i.e., exclusion and inclusion criteria) was based on two questions: (1) Is the paper relevant to functional web application testing and analysis? (2) Does the paper include a relatively sound validation? These criteria were applied to all papers, including those presenting techniques, tools, or case studies/experiments.

Each author then independently answered these two questions for each paper. Only when a given paper received at least two positive answers (from three voting authors) for each of the two questions, it was included in the pool. Otherwise, it was excluded. We primarily voted for papers based on their title, abstract, keywords, as well as their evaluation sections. If not enough information could be inferred from the abstract, a careful review of the contents was also conducted to ensure that all the papers had a direct relevance to our focused topic.
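The voting rule described above can be written down as a small sketch. The function name, the tuple layout of a vote, and the `min_positive` parameter are assumptions introduced for illustration; only the rule itself (at least two positive answers from three voting authors, for each of the two questions) comes from the text.

```python
def include_paper(votes, min_positive=2):
    """Apply the two-question voting rule to one paper.

    votes: one (is_relevant, has_sound_validation) pair of booleans
    per voting author (hypothetical representation).
    """
    relevant_yes = sum(1 for is_relevant, _ in votes if is_relevant)
    sound_yes = sum(1 for _, is_sound in votes if is_sound)
    # Both questions must independently reach the threshold.
    return relevant_yes >= min_positive and sound_yes >= min_positive
```

Note that the two questions are counted separately: a paper judged relevant by all three authors is still excluded if only one of them considers its validation sound.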

3.4. Final pool of articles and the online repository

After the initial search and the follow-up analysis for exclusion of unrelated and inclusion of additional studies, the pool of selected studies was finalized with 95 studies. The reader can refer to the references section at the end of the paper for the full reference list of all primary studies. The final pool of selected studies has also been published in an online repository using the Google Docs system, and is publicly accessible online at Dogan et al. (2013). The classifications of each selected publication according to the classification scheme presented in Garousi et al. (2013) and also the empirical data extracted from the studies are also available in the online repository.

Fig. 2 shows the number of papers in the pool by their year of publication. We can notice that the trend has been generally increasing from 2000 until 2010, but there is a somewhat decreasing trend from 2010 to 2013. Note that since the study was conducted in the middle of 2013, the data for this year are partial.

Fig. 2. Annual trend of studies included in our pool.

3.5. Research type classification of the articles in the final pool

RQ2 and RQ3 in this study focus on empirical studies in WAT, for which we conduct the systematic analysis only on the set of empirical studies in our pool of papers. Therefore, we had to distinguish the empirical studies in the final pool. We have categorized the 95 primary studies by adopting the classification approach for research facet of papers proposed by Petersen et al. (2008), as follows:

1. Solution proposal: A solution for a problem is proposed, which can be either novel or a significant extension of an existing technique. The potential benefits and the applicability of the solution are shown by a small example or a good line of argumentation.
2. Validation research: Techniques investigated are novel and have not yet been implemented in practice. Techniques used are, for example, experiments, i.e., work done in the laboratory.
3. Evaluation research: Techniques are implemented in practice and an evaluation of the technique is conducted. That means, it is shown how the technique is implemented in practice (solution implementation) and what the consequences of the implementation are in terms of benefits and drawbacks (implementation evaluation).

In our previous SM study (Garousi et al., 2013), we had conducted such a categorization within the scope of that study, i.e., for the 79 primary studies, to map each study based on its research facet according to the approach proposed by Petersen et al. (2008). Here, we provide this categorization for the 95 primary studies in the current pool to show the diversity of the studies and to show which ones have been considered in addressing RQ2 and RQ3. Table 5 shows the list of primary studies in each of these categories. For RQ2 and RQ3 we examined only the empirical studies, which are the articles in the validation and evaluation research categories.

3.6. Data extraction

As discussed above, our recent SM study was the first step of the current SLR. The mapping (scoping) that we conducted in the SM study enabled us to classify the papers and identify empirical studies (using the classification “research facet” in the SM study). We conducted several new types of classifications in this SLR to further classify the primary studies as needed by several of our current RQs, e.g., RQ1.1 (input/inferred test models), and RQ2.1 (metrics used for assessing cost and effectiveness of WAT techniques).

The data extraction phase was conducted collaboratively among the authors and data were recorded in the online spreadsheet (Dogan et al., 2013).

3.7. Data synthesis

After conducting further mapping analysis for the purpose of RQ1 in this SLR, we conducted synthesis for answering RQs 2 and 3. To develop our method of synthesis, we carefully reviewed the research synthesis guidelines in software engineering (e.g., Cruzes and Dybå, 2010, 2011a,b), and also other SLRs which had used high-quality synthesis approaches (e.g., Ali et al., 2010; Walia and Carver, 2009).

According to Cruzes and Dybå (2010), the key objective of research synthesis is to evaluate the included studies for heterogeneity and select appropriate methods for integrating or providing interpretive explanations about them (Cooper et al., 2009). If the primary studies are similar enough with respect to interventions and quantitative outcome variables, it may be possible to synthesize them by meta-analysis, which uses statistical methods to combine effect sizes. However, in software engineering in general and in our focused WAT domain in particular, primary studies are often too heterogeneous to permit a statistical summary. Especially for qualitative and mixed methods studies, different methods of research synthesis, e.g., thematic analysis and narrative synthesis, are required (Cruzes and Dybå, 2010).
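To make concrete what "combining effect sizes" means, the following is a minimal sketch of one standard approach, the fixed-effect inverse-variance method. It is shown only as background: this review did not perform a meta-analysis, and the function name and inputs are assumptions for illustration. Each study contributes an effect size weighted by the reciprocal of its variance, so more precise studies count for more.

```python
import math


def fixed_effect_pooled(effects, variances):
    """Inverse-variance (fixed-effect) pooling of per-study effect sizes.

    effects:   effect size reported by each study
    variances: sampling variance of each effect size (must be > 0)
    Returns (pooled_effect, standard_error).
    """
    weights = [1.0 / variance for variance in variances]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    return pooled, standard_error
```

This kind of pooling presupposes that the studies measure the same outcome under comparable interventions, which is exactly the condition the paragraph above says is often not met in this domain.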

Based on those guidelines, the fact that the primary studies in our pool were too heterogeneous, and also the type of RQs in our SLR, it was imperative to use thematic analysis and narrative synthesis (Cruzes and Dybå, 2010, 2011a,b) in this work. We followed the thematic analysis steps recommended by Cruzes and Dybå (2011a), which were as follows: extracting data, coding data, translating codes into themes, creating a model of higher-order themes, and finally assessing the trustworthiness of synthesis.

4. Results

In this section, we present the results of our SLR study.

4.1. RQ1 – What types of test models, fault models and tools have been proposed?

We discuss next results for RQ1.1, 1.2 and 1.3.

4.1.1. RQ1.1 – What types of input/inferred test models have been proposed/used?

A large number of studies (40, 42%) proposed test techniques which used certain models as input or inferred (reverse engineered) them. We classified those test models as follows; the frequencies are shown in Fig. 3:

• Navigation models (for pages): models such as finite-state machines (FSM) which specify the flow among the pages of a web application.
• Control or data flow models: these models are at the unit level and are either control or data flows inside a single module/function.
• DOM models: the Document Object Model (DOM) is a cross-platform and language-independent convention for representing and interacting with objects in HTML, XHTML and XML documents. Several approaches generated DOM models for the purpose of test-case generation or test oracles.
• Other: any models other than the above.
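As an illustration of the first category, a navigation model can be represented as a finite-state machine whose states are pages and whose transitions are links or form submissions. The sketch below is hypothetical: the page names, the dictionary representation, and the bounded breadth-first enumeration are assumptions chosen for brevity and are not taken from any of the primary studies. Each enumerated path is a candidate test sequence through the application.

```python
from collections import deque

# Hypothetical navigation model: page -> pages reachable in one step.
NAVIGATION_MODEL = {
    "login": ["home"],
    "home": ["search", "profile"],
    "search": ["results"],
    "results": ["home"],
    "profile": [],
}


def navigation_paths(model, start, max_length):
    """Enumerate page sequences from start, breadth-first.

    A path is emitted when it reaches a page with no outgoing
    transitions or when it contains max_length transitions.
    """
    paths = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        successors = model.get(path[-1], [])
        if not successors or len(path) - 1 == max_length:
            paths.append(path)
            continue
        for next_page in successors:
            queue.append(path + [next_page])
    return paths


def covered_transitions(paths):
    """Set of (from_page, to_page) pairs exercised by the given paths."""
    return {(a, b) for path in paths for a, b in zip(path, path[1:])}
```

Comparing `covered_transitions` against the full set of transitions in the model gives a simple transition-coverage check; in this toy model the transition from "results" back to "home" is only exercised once paths of four transitions are allowed.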

The navigation models seem to be the most popular, as 22 studies used a type of navigation model. Seven studies used DOM models for the purpose of testing. Five studies used control or data flow models. The “Other” category included models such as: program dependence graphs (PDGs) (Marback et al., 2012), Database Extended Finite State Machine (DEFSM) (Ran et al., 2009), test architecture as UML models (Hu, 2002), and Unified Markov Models (UMMs) (Kallepalli and Tian, 2001).

Table 5
Types of the research facet (method) of the articles in our pool (based on the systematic mapping approach followed in our SM study (Garousi et al., 2013) and also the approach proposed by Petersen et al. (2008)).

| Research facet types | Paper references | Number of papers |
|---|---|---|
| Solution proposal | Tonella and Ricca (2002, 2004a,b, 2005), Di Lucca et al. (2006), Törsel (2013), Qi et al. (2006), Ricca and Tonella (2001, 2002a), Alalfi et al. (2010), Offutt et al. (2004), Sampath et al. (2004), Hu (2002), Liu (2006), Di Sciascio et al. (2005), Stepien et al. (2008), Matos and Sousa (2010), Raffelt et al. (2008), Bordbar and Anastasakis (2006), Koopman et al. (2007), Ernits et al. (2009), Offutt and Wu (2009), Song and Miao (2009), Gerlits (2010), Minamide and Engineering (2005), Zheng et al. (2011), Liu et al. (2000), Mansour and Houri (2006), Lucca et al. (2002), Andrews et al. (2005), Bellettini et al. (2005), Amyot et al. (2005), Licata and Krishnamurthi (2004), Benedikt et al. (2002), Castelluccia et al. (2006), and Xiong et al. (2005) | 36 |
| Validation research | Artzi et al. (2011), Saxena et al. (2010), Marback et al. (2012), Sampath et al. (2005, 2006a, 2007), Praphamontripong and Offutt (2010), Mesbah and Prasad (2011), Halfond and Orso (2008), Sprenkle et al. (2005b, 2007, 2008, 2012), Harman and Alshahwan (2008), Dobolyi et al. (2010), Ran et al. (2009), Luo et al. (2009), Alshahwan et al. (2012), Choudhary et al. (2012), Ozkinaci and Betin Can (2011), Pattabiraman and Zorn (2010), Ettema and Bunch (2010), Marchetto et al. (2007), Artzi et al. (2008), Thummalapenta et al. (2013), Halfond and Orso (2007), Alshahwan et al. (2009), Mirshokraie and Mesbah (2012), Kallepalli and Tian (2001), Jensen and Møller (2011), Li et al. (2010), Sakamoto et al. (2013), Artzi and Pistoia (2010), Halfond et al. (2009), Amalfitano et al. (2010a), Andrews et al. (2010), Marchetto et al. (2008b), Marchetto (2008), Liu and Kuan Tan (2008), Ricca and Tonella (2002b), Hao and Mendes (2006), Peng and Lu (2011), Tian and Ma (2006), Choudhary et al. (2010), and Dallmeier et al. (2013) | 45 |
| Evaluation research | Marchetto et al. (2008a), Sprenkle et al. (2005a, 2011), Alshahwan and Harman (2011), Mirshokraie et al. (2013), Mesbah and Van Deursen (2009), Mesbah et al. (2012), Elbaum et al. (2005), Dobolyi and Weimer (2010), Sampath and Bryce (2008), Roest et al. (2010), Marchetto and Tonella (2010), and Sampath et al. (2006b) | 13 |

We observed that models were used/treated in one of the following three cases: (1) as input to the proposed technique, (2) extracted from the web applications and used in subsequent steps of the approach, or (3) as the output of the technique. We classified the primary studies according to this usage purpose of the models and provide the results of this classification in Table 6.

For instance, it is worth highlighting the studies (Ricca and Tonella, 2001; Di Sciascio et al., 2005; Bordbar and Anastasakis, 2006) which use the models in multiple phases of the technique. Ricca and Tonella (2001) and Bordbar and Anastasakis (2006) extract the model of the web application under test and use this model in their techniques. These two studies also provide the extracted models as output. Previously prepared models of the web applications under test are fed as input to the techniques proposed in Di Sciascio et al. (2005) and Bordbar and Anastasakis (2006), and these models are transformed into other types of models, which are used as input to the subsequent steps of the proposed techniques. The results showed us that most of the studies extract models of WAs in one phase of the technique and use those models as input for subsequent phases.
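The three treatment cases are not mutually exclusive, which is easy to overlook when reading the classification as a flat list. As a minimal sketch (the names `ModelRole`, `ROLES`, `studies_with`, and `role_count` are illustrative assumptions, not part of the reviewed studies), the roles of the three highlighted studies, as listed in Table 6, can be encoded as combinable flags:

```python
from enum import Flag, auto
from typing import List


class ModelRole(Flag):
    """How a test model is treated by a technique (the three cases in the text)."""
    INPUT = auto()      # (1) model is supplied as input to the technique
    EXTRACTED = auto()  # (2) model is extracted from the web application and reused
    OUTPUT = auto()     # (3) model is delivered as an output of the technique


# Roles of the three highlighted studies, taken from Table 6.
ROLES = {
    "Ricca and Tonella (2001)": ModelRole.EXTRACTED | ModelRole.OUTPUT,
    "Di Sciascio et al. (2005)": ModelRole.INPUT | ModelRole.EXTRACTED,
    "Bordbar and Anastasakis (2006)": (
        ModelRole.INPUT | ModelRole.EXTRACTED | ModelRole.OUTPUT
    ),
}


def studies_with(role: ModelRole) -> List[str]:
    """Return the studies whose role set includes `role`, sorted by name."""
    return sorted(name for name, roles in ROLES.items() if role in roles)


def role_count(roles: ModelRole) -> int:
    """Count how many of the three cases a study falls under."""
    singles = (ModelRole.INPUT, ModelRole.EXTRACTED, ModelRole.OUTPUT)
    return sum(1 for single in singles if single in roles)
```

Under this encoding, `studies_with(ModelRole.OUTPUT)` lists both Bordbar and Anastasakis (2006) and Ricca and Tonella (2001), and `role_count` reports how many phases of a technique touch the model.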

4.1.2. RQ 1.2 – What types of fault models/bug taxonomy related to web applications have been proposed?

To develop effective test techniques, it is important to understand and characterize faults specific to web applications and to develop bug taxonomies in this domain. Similar to what was conducted by another SLR in the area of testing concurrent software (Souza et al., 2011), we extracted the web fault models and bug taxonomies proposed in the primary studies, as synthesized and summarized in Table 7. Each row of the table is a fault type, listed with the primary studies discussing or targeting that type of fault. Twenty-one studies discussed fault models and some of them conducted fault feeding (mutation testing) based on the proposed types of faults.

Fig. 3. Types and frequencies of inferred test models.

Table 6
Classification of the primary studies based on the treatment of test models.

Input to the technique | Törsel (2013), Di Sciascio et al. (2005), Bordbar and Anastasakis (2006) and Ernits et al. (2009)

Extracted and used in the technique | Tonella and Ricca (2004a), Artzi et al. (2011), Marback et al. (2012), Ricca and Tonella (2001), Mesbah and Prasad (2011), Ran et al. (2009), Sampath et al. (2004), Hu (2002), Choudhary et al. (2012), Di Sciascio et al. (2005), Tonella and Ricca (2002), Ettema and Bunch (2010), Thummalapenta et al. (2013), Mesbah and Van Deursen (2009), Mesbah et al. (2012), Sprenkle et al. (2012), Bordbar and Anastasakis (2006), Kallepalli and Tian (2001), Koopman et al. (2007), Roest et al. (2010), Marchetto et al. (2008b), Marchetto (2008), Liu et al. (2000), Liu and Kuan Tan (2008), Mansour and Houri (2006), Lucca et al. (2002), Bellettini et al. (2005), Amyot et al. (2005), Peng and Lu (2011), Marchetto and Tonella (2010), Licata and Krishnamurthi (2004), Castelluccia et al. (2006) and Dallmeier et al. (2013)

Output of the technique | Ricca and Tonella (2001), Liu (2006), Bordbar and Anastasakis (2006), Offutt and Wu (2009) and Song and Miao (2009)

Fig. 4. Number of tools presented in the papers over the years.

It is worth highlighting two of the studies (Mirshokraie et al., 2013; Marchetto et al., 2007) in this context. In Mirshokraie et al. (2013), a bug severity taxonomy was proposed, which was used to assess the effectiveness and performance of a mutation testing tool for JavaScript. In Marchetto et al. (2007), a web fault taxonomy was proposed, empirically validated, and used for fault feeding (mutation testing). In that study, 31 fault types classified under 6 categories were proposed, e.g., faults related to browser incompatibility, faults related to the needed plugins, faults in session synchronization, faults in persistence of session objects, and faults while manipulating cookies. Four studies (Praphamontripong and Offutt, 2010; Pattabiraman and Zorn, 2010; Ettema and Bunch, 2010; Marchetto et al., 2008b) proposed more in-depth levels of fault taxonomies. Praphamontripong and Offutt (2010) and Ettema and Bunch (2010) proposed taxonomies for navigation faults, and Pattabiraman and Zorn (2010) and Marchetto et al. (2008b) classified asynchronous communication faults, specific to Ajax applications. The raw (before-synthesis) data for the list of fault models/bug taxonomies in each study are provided in Appendix A (Table 27).
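To make the idea of fault feeding (mutation testing) with a navigation fault type more concrete, the following is a minimal sketch under stated assumptions. The toy site model `SITE`, the mutation operator `seed_navigation_fault`, and the check `login_flow_reaches_dashboard` are hypothetical and are not taken from any of the reviewed studies; they only illustrate how a fault from a taxonomy can be seeded and how a test "kills" the resulting mutant.

```python
# Hypothetical navigation model: page -> {link label: target page}.
SITE = {
    "home": {"login": "login_form", "help": "help"},
    "login_form": {"submit": "dashboard"},
    "dashboard": {"logout": "home"},
    "help": {},
}


def navigate(site, start, clicks):
    """Follow a sequence of link labels from `start` and return the final page."""
    page = start
    for label in clicks:
        page = site[page][label]  # raises KeyError if the link does not exist
    return page


def seed_navigation_fault(site, page, label, wrong_target):
    """Return a mutant copy of `site` in which one link points to the wrong page."""
    mutant = {p: dict(links) for p, links in site.items()}
    mutant[page][label] = wrong_target
    return mutant


def login_flow_reaches_dashboard(site):
    """Example test case: logging in from the home page must reach the dashboard."""
    return navigate(site, "home", ["login", "submit"]) == "dashboard"


def is_killed(mutant, tests):
    """A mutant is killed when at least one test fails on it."""
    return any(not test(mutant) for test in tests)
```

In this sketch, a mutant that redirects the login form's `submit` link is killed by the single test, whereas a mutant that redirects the dashboard's `logout` link survives, which would point to a gap in the test suite for that fault type.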

4.1.3. RQ 1.3 – What tools have been proposed and what are their capabilities?

52 of the 95 papers (54%) presented (mostly prototype-level) tool support for the proposed WAT approaches. A natural question to ask in this context is whether the presented tools are available for download, so that other researchers and practitioners could use them as well. We counted a presented tool as available for download only if this was explicitly mentioned in the paper. If the authors had not explicitly mentioned that the tool is available for download, we did not conduct internet searches for the tool names. Only 11 of the 52 presented tools (21%) were available for download. We also wanted to know whether more recently presented tools were more likely to be available for download. To assess this, Fig. 4 shows the annual trend of the number of tools presented in the papers and whether they are available for download. We can notice that, in the papers presented after 2008, more and more tools are available for download, which is a good sign for the community.

Table 7
Fault models/bug taxonomy related to web applications.

Fault models/bug taxonomy | Paper references
Authentication, permission | Marchetto et al. (2008a), Luo et al. (2009), Kallepalli and Tian (2001) and Dobolyi and Weimer (2010)
Internationalization | Marchetto et al. (2008a) and Dobolyi and Weimer (2010)
Functionality | Marchetto et al. (2008a), Sprenkle et al. (2007), Dobolyi and Weimer (2010) and Choudhary et al. (2010)
Portability | Marchetto et al. (2008a)
Navigation | Marchetto et al. (2008a), Praphamontripong and Offutt (2010), Sprenkle et al. (2005b), Luo et al. (2009), Ettema and Bunch (2010), Kallepalli and Tian (2001), Dobolyi and Weimer (2010) and Peng and Lu (2011)
Asynchronous communication | Marchetto et al. (2008a,b) and Pattabiraman and Zorn (2010)
Session | Marchetto et al. (2008a), Marchetto et al. (2007) and Dobolyi and Weimer (2010)
Form construction | Marchetto et al. (2008a), Sprenkle et al. (2007) and Elbaum et al. (2005)
Database, persistence | Sprenkle et al. (2007), Marchetto et al. (2007), Elbaum et al. (2005), Dobolyi and Weimer (2010) and Peng and Lu (2011)
Appearance (GUI, layout, etc.) | Sprenkle et al. (2007), Luo et al. (2009), Dobolyi and Weimer (2010), Peng and Lu (2011) and Choudhary et al. (2010)
Multi-tier architectural | Luo et al. (2009)
Syntactic, programming language | Ozkinaci and Betin Can (2011), Artzi et al. (2008) and Artzi and Pistoia (2010)
Cookie manipulation | Marchetto et al. (2007)
Plugin | Marchetto et al. (2007)
Cross-browser incompatibility | Marchetto et al. (2007) and Mesbah et al. (2012)
DOM | Mesbah et al. (2012), Mirshokraie and Mesbah (2012) and Marchetto et al. (2008b)
Scripting | Elbaum et al. (2005) and Jensen and Møller (2011)
Code disclosure | Dobolyi and Weimer (2010)
CSS | Dobolyi and Weimer (2010)
