Contents lists available at ScienceDirect
Neurocomputing
journal homepage: www.elsevier.com/locate/neucom
Classification using link prediction
Seyed Amin Fadaee, Maryam Amir Haeri
Department of Computer Science and Information Technology, Amirkabir University of Technology, Iran
Article info
Article history: Received 21 February 2019; Revised 24 May 2019; Accepted 5 June 2019; Available online 13 June 2019
Communicated by Prof. H. Zhang
Keywords: Classification; Link prediction; Graph representation; Local similarity measure; Similarity-based techniques
Abstract
Link prediction in a graph is the problem of detecting the missing links, or the ones that will be formed in the near future. Using a graph representation of the data, we can convert the problem of classification to the problem of link prediction, which aims at finding the missing links between the unlabeled data (unlabeled nodes) and their classes. To our knowledge, although numerous algorithms use the graph representation of the data for classification, none use link prediction as the heart of their classifying procedure. In this work, we propose a novel algorithm called CULP (Classification Using Link Prediction) which uses a new structure named Label Embedded Graph (LEG) and a link predictor to find the class of the unlabeled data. Different link predictors, along with Compatibility Score - a new link predictor we propose that is designed specifically for our setting - have been used and show promising results for classifying different datasets. This paper further improves CULP by designing an extension called CULM, which uses a majority vote procedure (hence the M in the acronym) with weights proportional to the predictions' confidences to harness the predictive power of multiple link predictors and to exploit the low level features of the data. Extensive experimental evaluations show that both CULP and CULM are highly accurate and competitive with cutting edge graph classifiers and general classifiers.
© 2019 Elsevier B.V. All rights reserved.
1. Introduction
Classification is an old problem in machine learning and pattern recognition that aims at finding a correct mapping between data and their corresponding labels. This mapping is then used to derive the class of the unlabeled data [1].

This field is still highly active in the literature, and many algorithms have been proposed to correctly classify the data. Most classification algorithms aim at finding a decision boundary in the feature space for distinguishing the data belonging to different classes; however, as more complex data require more complex algorithms, these approaches can fail to capture the true relations in the data.

One of the approaches that has recently gained popularity in the literature is classification of the unlabeled instances using the graph representation of the data. Data can be represented in different forms, one of which is a graph. In this setting, the data is first converted to a graph via a similarity function in the feature space; then unlabeled data is classified by incorporating a graph property. These graph properties are called high level features, which give more insight into the data compared to the low level features.
∗ Corresponding author.
E-mail addresses: [email protected] (S.A. Fadaee), [email protected] (M. Amir Haeri).
Classification using graph representation is studied extensively in numerous works [2–9]. These works use graph properties such as clustering coefficient, modularity, importance, PageRank and others to classify the unlabeled data, and they tend to achieve more accurate results compared to classifiers that rely only on the low level features of the data. This approach has been used in text classification [10], hyperspectral image classification [11,12], image classification [2,8], handwritten digit recognition [3] and other areas.
Link prediction is the problem of predicting the missing links in a graph, or the ones that will be formed in the near future [13]. Using the graph representation of the data, we can treat classification as a link prediction problem in an intuitive way, where we try to find the link between the unlabeled node and its corresponding class. To our knowledge, there is no work in the literature that uses link prediction to solve the problem of classification; however, the use of classification to solve link prediction is studied extensively [13].
In this work, we propose an algorithm called CULP (an acronym for Classification Using Link Prediction) that takes a different look at the classification problem through a link prediction approach. As we will elaborate in the paper, CULP uses a graph called LEG that models the data in an intuitive way that is suitable for link prediction.
Any link predictor can be used to derive the class of the unlabeled node in CULP, and we propose a new local measure called Compatibility Score that is designed to improve the accuracy of link prediction and, consequently, classification.

https://doi.org/10.1016/j.neucom.2019.06.026
As much insight as high level features provide for capturing the patterns present in the data, exploiting the low level features alongside them further improves the predictive power of graph classifiers, and different researchers incorporate this idea in their work [2,4]. This is why we further improved CULP and propose the CULM extension - a majority vote system (hence the M in the acronym) with weights proportional to the probabilities of the predictions. This extension uses multiple link predictors along with a low level classifier. As we will see, both the CULP and CULM algorithms derive highly accurate results which are competitive with low level classifiers and other graph based classification methods.
The rest of the paper is organized as follows. In the next section, a review of the general domains used in this paper is presented; this preliminary section elaborates on the problem of link prediction, similarity measures in vector space, methods of converting data to a graph, and the problem of classification. After that, a section of related works is given, summarizing recent works that use the graph representation of the data for classification. Next, the CULP algorithm is presented in full detail, elaborating on the LEG (Label Embedded Graph) structure, the classification procedure which uses link prediction, our novel link predictor - Compatibility Score - the time complexity, and a toy example demonstrating CULP. Finally, the CULM extension is presented, followed by our extensive experimental results putting our proposed algorithms into perspective. At the end, the conclusion of the paper and the aims for future work are presented.
2. Preliminaries
To fully understand CULP, a grounding for the details comprising this algorithm should be set. In this section, a general review of graph theory concepts and notations, along with the definition of the link prediction problem in complex networks, is given. After that, an overview of some of the most important similarity measures is presented; following this, the different ways of converting data to a graph are discussed. Finally, at the end of this section, the problem of classification is defined.
2.1. Link prediction
Given a set of vertices V and a set of edges E containing pairs (i, j) where i, j ∈ V, the data structure G(V, E) can be defined as a graph. If the elements in E are ordered pairs, G is considered to be a directed graph. In an undirected graph, (i, j) ∈ E implies that (j, i) ∈ E. Regardless of the directionality of the graph, node j is a neighbor of node i if (i, j) ∈ E. For a node i, Γ(i) is the set of the neighbor nodes of i.
For the graph G, the adjacency matrix A_G (or simply A) is defined as an N × N matrix with zero-one elements, where N = |V|. For any entry in A, A_{i,j} = 1 if and only if (i, j) ∈ E. In an undirected graph, by definition A = A^T. As our focus in this paper is on undirected graphs, for the sake of simplicity we use "graph" to refer to an undirected graph.

The degree of a node i in a graph is |Γ(i)|. For any graph, the cardinality |E| can be obtained by summing over the degrees of all nodes using Eq. (1), where N = |V|:

    |E| = (1/2) Σ_{i=1}^{N} |Γ(i)|    (1)

The problem of link prediction in a graph arises when the goal is to predict, for the currently absent links (0 entries in A), the probability of link formation in the future. There are many functions to predict the link prediction scores. These functions usually compute the local similarity between the nodes to derive the scores. One of the simplest techniques is known as common neighbors (CN) [14]. Using this approach, the prediction scores can be derived using the following:
    λ_{i,j} = |Γ(i) ∩ Γ(j)|    (2)

Eq. (2) simply counts the number of common neighbors of nodes i and j to derive a score for their link formation.
Another approach to finding the link formation score is introduced by Adamic and Adar [15], which uses the degrees of common neighbors as features for prediction, and it can be written as

    λ_{i,j} = Σ_{γ ∈ Γ(i) ∩ Γ(j)} 1 / log|Γ(γ)|    (3)

Eq. (3) is known as the Adamic-Adar score (AA). This score penalizes the features by their logarithm and uses these features for deriving the prediction scores. Another famous approach for tackling the problem of link prediction is the Resource Allocation Index (RA) [16], which simulates the transition of resources between nodes i and j. This index is defined as Eq. (4).
    λ_{i,j} = Σ_{γ ∈ Γ(i) ∩ Γ(j)} 1 / |Γ(γ)|    (4)

This index is quite similar to AA; however, it does not apply the logarithm, which would soften the effect of nodes with high degree. This has the benefit of penalizing high degree common neighbors more strongly. In a lot of networks, these nodes provide little insight for link prediction, as they are connected to a lot of other nodes in the graph.
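As a concrete illustration (not from the paper's released code), the three local scores of Eqs. (2)–(4) can be sketched over a plain neighbor-set representation of a graph; the graph below is a made-up toy example.

```python
import math

# Toy undirected graph as node -> set of neighbors (illustrative only).
neighbors = {
    "i": {"g", "x"},
    "j": {"g", "y"},
    "g": {"i", "j", "x", "y"},
    "x": {"i", "g"},
    "y": {"j", "g"},
}

def common_neighbors(nb, i, j):
    """Eq. (2): size of the shared neighborhood."""
    return len(nb[i] & nb[j])

def adamic_adar(nb, i, j):
    """Eq. (3): common neighbors weighted by 1/log(degree)."""
    return sum(1.0 / math.log(len(nb[g])) for g in nb[i] & nb[j])

def resource_allocation(nb, i, j):
    """Eq. (4): common neighbors weighted by 1/degree."""
    return sum(1.0 / len(nb[g]) for g in nb[i] & nb[j])

print(common_neighbors(neighbors, "i", "j"))    # 1 (only g is shared)
print(resource_allocation(neighbors, "i", "j")) # 0.25 (g has degree 4)
```

Note how RA down-weights the shared neighbor g by its full degree, while AA would charge only 1/log(4).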
In this work, we propose a new similarity function for the purpose of link prediction, called Compatibility Score, which is discussed further in the paper.
2.2. Similarity measures
Any data point x with numeric features x_f, where 1 ≤ f ≤ d, can be regarded as a vector in a d-dimensional space. This view enables measuring the similarities between data points using conventional similarity measures. As we are going to utilize a similarity measure in converting our data to a graph (discussed in the next segment), we provide an overview of some of these measures.
Having our data matrix X, with n rows and d columns, with each row being a data vector, the Cosine similarity can be defined as the following:

    s_{i,j} = (X_i · X_j) / (‖X_i‖_2 ‖X_j‖_2)    (5)

where ‖x‖_2 denotes the Euclidean norm of the vector x, which is derived by the following:

    ‖x‖_2 = sqrt(Σ_{f=1}^{d} x_f^2)
Following the above equation, the Euclidean distance between any two d-dimensional vectors can be written as:

    φ_{i,j} = sqrt(Σ_{f=1}^{d} (X_{i,f} − X_{j,f})^2)    (6)

Utilizing the Euclidean distance, another similarity measure - namely Inverse Euclidean - can be defined using:

    s_{i,j} = 1 / (φ_{i,j} + ε)    (7)
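A minimal sketch of the Cosine (Eq. (5)) and Inverse Euclidean (Eq. (7)) similarities; the epsilon default is illustrative.

```python
import math

def cosine_similarity(x, y):
    """Eq. (5): dot product over the product of Euclidean norms."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny)

def inverse_euclidean(x, y, eps=1e-9):
    """Eq. (7): 1 / (Euclidean distance + eps); eps avoids division by zero."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return 1.0 / (dist + eps)

print(cosine_similarity([1, 0], [0, 1]))  # 0.0 (orthogonal vectors)
print(inverse_euclidean([0, 0], [3, 4]))  # ~0.2 (distance is 5)
```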
In Eq. (7), the ε term is a small number used to avoid division by zero in the case of identical vectors. Another prominent distance in linear algebra is what is known as the absolute or Manhattan distance (Eq. (8)); by substituting Eq. (8) in Eq. (7), the Inverse Manhattan similarity function is defined.

    φ_{i,j} = Σ_{f=1}^{d} |X_{i,f} − X_{j,f}|    (8)

2.3. Converting data to graph
Any vector based data can be represented as a graph. Doing this changes the structure of the data, which enables us to compute high level features.

Two of the most used procedures for converting data to a graph are the r-Radius and kNN methods [17].

Using a similarity measure s (e.g. the cosine similarity discussed in the previous segment) and the data matrix X, we can use either of these two algorithms to convert the data into a graph. In r-Radius, an edge is created between every pair of data points that have a similarity higher than a predefined threshold r. Another approach is using the k-nearest neighbors to form the graph: if (based on s) X_i is in the k-nearest neighbors of X_j, the edge (i, j) is created.
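The undirected kNN conversion described here can be sketched in a few lines of Python; this is an illustrative implementation, not the paper's code, and the inverse-Euclidean similarity used below is just one possible choice of s.

```python
import math

def knn_convert(X, s, k):
    """Undirected kNN conversion sketch: create the edge (i, j) whenever
    i is among j's k most similar points, or vice versa."""
    n = len(X)
    # For each point, the indices of its k most similar other points.
    nearest = [set(sorted((j for j in range(n) if j != i),
                          key=lambda j: s(X[i], X[j]), reverse=True)[:k])
               for i in range(n)]
    E = set()
    for i in range(n):
        for j in range(i + 1, n):
            if j in nearest[i] or i in nearest[j]:
                E.add((i, j))
    return E

# Inverse-Euclidean similarity as s (illustrative).
sim = lambda x, y: 1.0 / (math.dist(x, y) + 1e-9)
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
print(sorted(knn_convert(points, sim, k=1)))  # [(0, 1), (2, 3)]
```

With N = 4 and k = 1, the 2 resulting edges respect the bound Nk/2 ≤ |E| ≤ Nk derived next.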
Due to the fact that the kNN relation is not symmetric, this approach generally results in a directed graph. However, the same principle can be used to create an undirected graph, as in Algorithm 1. Using this approach, if X has N instances, the number of undirected edges |E| in the created graph is bounded by Nk/2 ≤ |E| ≤ Nk. CULP uses an undirected kNN modeling of the data for the task of classification.

Algorithm 1: Undirected kNN conversion function for the data matrix X and similarity measure s.

    function kNN-Convert(X, s, k)
        E ← {}
        for i, j ∈ X do
            if i ∈ kNN(s, j) or j ∈ kNN(s, i) then
                E ← E ∪ (i, j)
            end if
        end for
        return E
    end function

2.4. Classification
Suppose there are two sets of data. X, with n instances and d features per instance, is the set of our labeled data. The labels of X are denoted by y, where y_i ∈ {1, 2, ..., C}, with C being the number of classes. Each pair (X_i, y_i) makes up our training data. The other set of data is X^(u), with m instances and again d features per instance, which is the unlabeled or test data.

The classification problem aims at finding a mapping X_i^(u) → ŷ_i for every i ∈ {1, ..., m}. In other words, we are trying to find a proper label for each of the unlabeled instances in X^(u). If C = 2, this is called binary classification, and if C > 2, the problem is called multi-class classification [1].
Classifiers like kNN or Decision Tree can naturally handle multi-class classification problems; however, some classifiers like SVM are inherently designed for the binary classification task, and upgrading them to handle multi-class classification requires using the One vs. All or One vs. One approaches [1].

In One vs. All, C classifiers are trained, and each classifier has the task of deciding whether an instance belongs to a particular class or not. The One vs. One approach is done by training C(C − 1)/2 classifiers to classify an instance into either of two classes among all of the C classes.
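As a quick check of these counts, the number of binary classifiers each scheme requires can be computed directly:

```python
def n_one_vs_all(C):
    """One vs. All: one binary classifier per class."""
    return C

def n_one_vs_one(C):
    """One vs. One: one binary classifier per unordered pair of classes."""
    return C * (C - 1) // 2

# For a 10-class problem:
print(n_one_vs_all(10))  # 10
print(n_one_vs_one(10))  # 45
```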
3. Related works
Classification using graphs has recently gained popularity, and numerous works [2–8] focus on using this approach instead of the classical methods of classification. These methods can capture complex patterns in the data, and they can generate high level features to guide the classification procedure; furthermore, they can usually be modified to utilize the low level features of the data as well.
In [2] a random walker is used to classify unlabeled instances on the graph embedding of the data. This graph is represented by a weight matrix of similarities. The random walk process is continued until convergence, and the new data receives its label through a weighted majority vote between the labels of the top η nodes with the highest probabilities. This method takes the similarity among the data points into account with a single network for the dataset, along with the structural changes of an unlabeled instance on the networks created for each class. The complexity of the method is O(n^2); however, as the authors claimed, using sparse representations such as a kNN network and a graph construction method based on Lanczos bisection [18], this complexity can be reduced to between O(n^1.06) and O(n^1.33).
Another system is proposed in [9], in which a graph is created for the training instances of each class; then, using the spatio-structural differential efficiency measure proposed in the paper, a test instance is connected to some of the nodes in each graph. The label of the data is the class of the graph in which the test data has the highest importance. The importance is characterized by Google's PageRank measure of the network. The spatio-structural differential efficiency measure in [9] considers both physical and topological properties of the data, and the complexity of the proposed method is again O(n^2), which is once more reduced to between O(n^1.06) and O(n^1.33) by using the graph construction method based on Lanczos bisection.
A hybrid method is proposed in [3] that aids a typical classifier (such as kNN, SVM or Naive Bayes) with high level features. These high level features are the differences of some graph properties before and after inserting a new instance into the graph representation of the data of each class. The graph of each class is constructed using a combination of the r-radius and kNN graph conversion methods. The graph properties used in their work are assortativity, network clustering coefficient and average degree. The label for the test instance is generated by a weighted combination of low level and high level features. The authors extended their work in [4] by using two more high level features - namely Normalized Average Distance among vertices and coreness variability - and using a stacking procedure to learn the weight for each feature. Also, [5] extends the same work by discarding the use of any classical classifier and using a scheme that takes low level feature techniques into account to filter irrelevant graphs of some of the classes.
The authors of [6] proposed a framework for classification using the k-Associated Optimal Graph for modeling the data and Bayes' theorem for computing a posterior probability for each class to classify new instances. Similar to the kNN graph conversion method, the k-Associated Optimal Graph computes the similarity of a data point with all of the training data; however, it forms an edge only if the points belong to the same class. This results in multiple components (and possibly more than one component for a class). The method furthermore tries to find a local k for each class so that the resulting components attain maximal Purity (a measure based on the average degree of a component). This way, the process of finding the parameter k is conducted automatically, which also makes the complexity of the framework O(n^2). Another paper also uses the k-Associated graph along with the high level classification method of [3] to classify new instances.
Other methods using different graph measures have been produced as well. Neto and Zhao [7] use dynamic entropy for each weighted graph produced by r-radius, where the weights denote the distance between data points. Cupertino et al. [8] utilize the modularity measure for classifying a new instance that belongs to a pattern set of the same object in the training data. The label is derived by creating a kNN graph for each pattern set and choosing the label of the graph with the lowest modularity change after insertion of the new data. Both of the methods in [7,8] have a complexity of O(n^2).
The graph based classification methods in the literature mostly have three characteristics in common. Firstly, they create a different graph for each class of the data; this approach avoids finding meaningful patterns that may form by the similarities between points in different classes.

The second aspect these algorithms have in common is that they treat test instances individually, adding them to the graph of each class and measuring a graph property before and after the insertion. This makes the prediction of a new instance inefficient in the presence of a large amount of test data.

Lastly, the properties that these algorithms use for finding the differences before and after the insertion of the unlabeled data (e.g. clustering coefficient, average path length etc.) are time consuming, and their computation times are usually dependent on the graph size, which can make them infeasible for large datasets.

Our proposed algorithm CULP and its extension CULM solve the first and second issues by employing a novel graph representation called LEG, which treats classes as nodes, along with the training and test instances, in a unified object; it is discussed further in the paper. As for the third problem, since the label of a test instance is derived using link prediction measures (as discussed in the previous section), the classification of the unlabeled data is faster than in similar methods.
4. CULP algorithm
CULP (Classification Using Link Prediction) is a classification method that aims to gain higher accuracy in the multi-class classification task by exploiting the similarity among the data points. This algorithm employs the power of graph representation and link prediction methods in complex networks to deal with this problem.1 The overall structure of CULP consists of 2 stages:

1. Creating the LEG structure G from the data
2. Classifying the test data using G
In the first step, we model our data as an augmented graph data structure called a LEG (Label Embedded Graph), which we call G. G is a heterogeneous graph which incorporates the data, the classes and the similarities between them as a unified object.

A LEG essentially contains 3 sets of nodes and 2 sets of links. The different types of nodes in G are training nodes, testing nodes and class nodes; a link between two data nodes denotes similarity between them, and a link between a training node and a class node denotes the class membership of that node.

After creating G, we can convert the classification problem to the problem of predicting the class membership link of a testing node. By utilizing a link prediction algorithm in the next step, a membership score for every testing-class pair of nodes is computed. Each of the membership scores acts as a posterior probability, and a label is chosen for a testing node based on these scores.
1 The complete code of CULP in Python can be found at github.com/aminfadaee/culp.
The CULP procedure is depicted in Algorithm 3. In the next segments, each of the steps of the proposed algorithm is covered in more detail.
4.1. LEG representation
The first step toward classification using CULP is creating the LEG representation. A LEG is a heterogeneous graph with three sets of nodes:

• Training nodes (V_l)
• Testing nodes (V_u)
• Class nodes (V_c)

and two sets of edges:

• Similarity edges (E_s)
• Class membership edges (E_c)
Each set of nodes corresponds to its analogous set of data, i.e. V_l contains n nodes, V_u contains m nodes and V_c contains C nodes.

The class membership edges are created based on the labeled data. E_c contains edges (i, j) where i ∈ V_l and j is the node representation of y_i, meaning that each training node is connected (without direction) to its corresponding class node. It should be noted that since the labels for the test data are not available, E_c contains only pairs of nodes from V_l and V_c.

Unlike E_c, the members of E_s are not obtained so trivially. E_s is responsible for incorporating the similarities between instances of our data, and the edges in this set are obtained by using a graph conversion algorithm. In this work, the undirected version of kNN graph conversion (Algorithm 1) is used.

Edges in E_s primarily connect two nodes in V_l, or a node from V_u to one in V_l. However, there is no constraint on having an edge between two nodes in V_u, meaning that we can find the similarity between unlabeled data points and connect them as well (as we have done in this work).

If the unlabeled data is not available at first, or in the case of a new unlabeled node x^(u), this node is first added to the set V_u; after that, the similarity edges between this node and the other nodes of the graph are created through a linear similarity computation.

After creating all of the sets of nodes and edges, we can define the LEG G(V, E), where V = V_l ∪ V_u ∪ V_c and E = E_s ∪ E_c. Although G is inherently heterogeneous, we can treat it as a simple undirected graph. The procedure for creating G is summarized in Algorithm 2.
Algorithm 2: LEG construction function for the data X^(l), the labels y and the unlabeled data X^(u), with parameter k and the similarity function s.

    function LEG(X^(l), X^(u), y, s, k)
        X ← X^(l) ∪ X^(u)
        V_l ← {1, 2, ..., n}                      // nodes are represented by numbers
        V_u ← {n+1, n+2, ..., n+m}
        V_c ← {n+m+1, n+m+2, ..., n+m+C}
        E_c ← {}
        for i ∈ {1, 2, ..., n} do
            E_c ← E_c ∪ (i, n+m+y_i)
        end for
        E_s ← kNN-Convert(X, s, k)
        V ← V_l ∪ V_u ∪ V_c
        E ← E_s ∪ E_c
        return G(V, E)
    end function

This algorithm takes the labeled and unlabeled data along with the parameter k and the similarity measure s, and produces G as the output.
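The node-numbering and membership-edge logic of Algorithm 2 can be sketched as follows (an illustrative reconstruction, not the released CULP code; the similarity edges are assumed to come from a separate kNN conversion):

```python
def build_leg(n, m, C, y, similarity_edges):
    """Sketch of Algorithm 2: build the LEG node sets and edge sets.
    `y` holds labels 1..C for the n training points; `similarity_edges`
    is the output of a kNN conversion over all n + m data points."""
    V_l = set(range(1, n + 1))                   # training nodes: 1..n
    V_u = set(range(n + 1, n + m + 1))           # testing nodes: n+1..n+m
    V_c = set(range(n + m + 1, n + m + C + 1))   # class nodes: n+m+1..n+m+C
    # Membership edges: each training node links to its class node.
    E_c = {(i, n + m + y[i - 1]) for i in V_l}
    V = V_l | V_u | V_c
    E = set(similarity_edges) | E_c
    return V, E

# Tiny example: 3 training points (labels 1, 1, 2), 1 test point, 2 classes.
V, E = build_leg(n=3, m=1, C=2, y=[1, 1, 2], similarity_edges={(1, 2), (3, 4)})
print(sorted(V))  # [1, 2, 3, 4, 5, 6]
```

Here nodes 5 and 6 are the class nodes, and E contains the membership edges (1, 5), (2, 5) and (3, 6) alongside the similarity edges.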
There are always n edges belonging to E_c. The number of edges in E_s, however, has an upper and a lower bound. The minimum number of possible edges in E_s is obtained when the kNN procedure for each pair of points in X (X^(u) ∪ X^(l)) is symmetric - meaning that ∀i ∀j, i ∈ kNN(j) ↔ j ∈ kNN(i). The maximum number of edges in E_s, on the other hand, is obtained when the kNN procedure is not symmetric for any pair of nodes in X. Using these, the bounds on the number of edges in a LEG can be derived as Eq. (9):

    n + (k/2)(n + m) ≤ |E| ≤ n + k(n + m)    (9)

By the bounds in Eq. (9), it can be stated that G gives us a new low memory cost representation of the data. The memory for the original data is O(n × d + m × d + n) for X^(l), X^(u) and y; but since it is usually the case that k << d for high dimensional data, a LEG saves a lot of memory compared to using the original data for the task of classification.
Another aspect of a LEG is the fact that we are incorporating all of our labeled and unlabeled data and class labels in a unified structure; this enables us to find the labels of the test data via simple and efficient graph properties, specifically the link prediction methods covered in the next segment.
4.2. Classification
As stated before, in classification the goal is to find a mapping X_i^(u) → ŷ_i for every i ∈ {1, ..., m}. Using the LEG representation, this problem can be reformatted as finding j* for every i ∈ V_u so that the probability of (i, j*) ∈ E_c is maximized.

The new formulation means that edges will be added to the set E_c by predicting the most probable membership link for every test node. This can easily be done via the link prediction methods discussed before.

Using a local similarity measure λ for link prediction (e.g. the Adamic-Adar index), this problem can be solved using the following:

    ∀i ∈ V_u:  E_c ← E_c ∪ (i, j*),  where  j* = argmax_{j ∈ V_c} λ_{i,j}    (10)

Although more complex link prediction methods (random walk, average path length etc.) can be used to solve the problem, the local similarity measures are not only extremely fast and efficient to compute, but they also derive competitively accurate results, as will be discussed in the experiments. The pseudocode of CULP is depicted in Algorithm 3.
Algorithm 3: CULP Algorithm.

    function CULP(X, X^(u), y, s, k, λ)
        G ← LEG(X, X^(u), y, s, k)
        ŷ ← {}
        for i ∈ V_u do
            j* ← argmax_{j ∈ V_c} (λ_{i,j})
            ŷ_i ← j* − (n + m)
        end for
        return ŷ
    end function
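The prediction loop of Algorithm 3 amounts to an argmax over the class nodes; a minimal sketch (with a hypothetical score table standing in for the link predictor λ) could look like this:

```python
def culp_classify(V_u, V_c, score, n, m):
    """Sketch of Algorithm 3's loop: for each test node, pick the class
    node with the highest link-prediction score and map the class-node
    id back to a label by subtracting n + m."""
    y_hat = {}
    for i in V_u:
        j_star = max(V_c, key=lambda j: score(i, j))
        y_hat[i] = j_star - (n + m)
    return y_hat

# Hypothetical scores for test node 4 against class nodes 5 and 6
# (n = 3 training nodes, m = 1 test node, so labels are 1 and 2).
scores = {(4, 5): 0.25, (4, 6): 0.75}
print(culp_classify({4}, {5, 6}, lambda i, j: scores[(i, j)], n=3, m=1))  # {4: 2}
```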
4.3. Compatibility score

In this work, a novel local score for link prediction is formed, designed specifically for the task of classification. This new similarity function is called Compatibility Score; like the Adamic-Adar and Resource Allocation scores, it penalizes the common neighbors - however, this penalization is done differently.

Fig. 1. Using AA or RA for predicting the formation of (i, j1) in both LEGs would result in the same score; however, node γ in the first case is more valuable for the prediction.
Both the AA and RA scores can be unfair in some instances, meaning that they can over-penalize a valuable common neighbor or give the same score to two inherently different nodes. Take the two LEGs in Fig. 1 for example (i ∈ V_u; γ, a, b, c ∈ V_l; j1, j2 ∈ V_c). In both cases, the goal is to find the score for the (i, j1) link. AA and RA would both penalize node γ in the same way (a penalty of 5 for RA and log(5) for AA); however, in the first LEG the node γ is more valuable than in the second, due to the fact that three neighbors of this node (a, b, c) are also connected to node j1.
When trying to predict the score for the formation of a link between nodes i and j with a common neighbor between them, namely γ, two sets of edges starting from γ can be defined: compatible edges and incompatible edges.

Compatible edges for node γ are the ones connecting γ to nodes which are themselves connected to the destination of the candidate link (j in this case). We can define incompatible edges as all the other edges, which are not compatible.

Now the cardinality of the incompatible edges, or the incompatibility penalty, for node γ - a common neighbor of nodes i and j - can be defined as the following:

    δ(i, j, γ) = |Γ(γ)| − |Γ(γ) ∩ Γ(j)|    (11)

Using Eq. (11), the Compatibility Score (CS for short) is formally defined as Eq. (12). In this equation, both δ(i, j, γ) and δ(j, i, γ) are used for the prediction of (i, j) to make the score symmetric, so that λ_{i,j} = λ_{j,i}:

    λ_{i,j} = Σ_{γ ∈ Γ(i) ∩ Γ(j)} [ 1/δ(i, j, γ) + 1/δ(j, i, γ) ]    (12)

Using the Compatibility Score for the cases of Fig. 1, the score for link (i, j1) can be computed as 0.7 in LEG 1 and 0.4 in LEG 2. This is the desired outcome, as the score in LEG 1 is now higher. In the experiments, a more detailed comparison of CS with the other link prediction methods is done.
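Eqs. (11) and (12) can be sketched directly over neighbor sets; the graph below is an illustrative reconstruction of the first LEG of Fig. 1 (i linked to γ; γ linked to a, b, c and j1; a, b, c each linked to j1), and it reproduces the 0.7 value quoted above.

```python
def incompatibility(nb, i, j, g):
    """Eq. (11): number of g's edges NOT leading to a neighbor of j.
    (i is kept in the signature to mirror delta(i, j, gamma).)"""
    return len(nb[g]) - len(nb[g] & nb[j])

def compatibility_score(nb, i, j):
    """Eq. (12): sum of 1/delta(i,j,g) + 1/delta(j,i,g) over common neighbors."""
    return sum(1.0 / incompatibility(nb, i, j, g) +
               1.0 / incompatibility(nb, j, i, g)
               for g in nb[i] & nb[j])

# Reconstruction of Fig. 1's first LEG (illustrative).
nb = {
    "i": {"g"},
    "g": {"i", "a", "b", "c", "j1"},
    "a": {"g", "j1"}, "b": {"g", "j1"}, "c": {"g", "j1"},
    "j1": {"g", "a", "b", "c"},
}
print(compatibility_score(nb, "i", "j1"))  # ~0.7 = 1/2 + 1/5
```

Here δ(i, j1, γ) = 5 − 3 = 2 (three of γ's five edges reach neighbors of j1) and δ(j1, i, γ) = 5 − 0 = 5, giving 1/2 + 1/5 = 0.7.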
4.4. Time complexity analysis
In this subsection, the time complexity of finding the class membership edge of a test node is analyzed. The main component in finding the correct link is the local similarity measure λ which is used for link prediction. These local measures find the score in time proportional to the degrees of their source and destination nodes. In CULP, the source node i belongs to V_u and the destination node j belongs to V_c. So the first step in analyzing the time of finding a class membership edge is finding the average degrees of nodes in V_u and V_c.

The degree of node j is the number of labeled nodes connected to it - more specifically n_j, the number of data points with the class of node j. For the degree of i, however, a more detailed analysis is needed. As stated before, in any undirected graph Eq. (1) holds, and Eq. (1) can be rewritten as the following:
    |E| = (1/2) [ Σ_{i ∈ V_c} |Γ(i)| + Σ_{i ∈ V_l} |Γ(i)| + Σ_{i ∈ V_u} |Γ(i)| ]    (13)

Since the degrees of the class nodes sum up to the number of labeled data points n, this can be substituted in the above equation; on the other hand, if we treat each node in V_u as having average degree D, we can state that nodes in V_l would have an average degree of D + 1 (since each of them also has a membership edge). Using all these, the above formula can be rewritten in the following manner:

    |E| = (1/2) (n + n(D + 1) + mD) = n + nD/2 + mD/2    (14)
As stated before, the number of edges in a LEG is bounded above and below as derived in Eq. (9). Now, using Eqs. (14) and (9), the upper bound on D can be obtained from:

    n + nD/2 + mD/2 = n + k(n + m)  ⟹  D = 2k    (15)

and its lower bound from:

    n + nD/2 + mD/2 = n + (k/2)(n + m)  ⟹  D = k    (16)
Consequently, the average degree of labeled and unlabeled nodes is O(k), and that of class nodes is O(n). Common Neighbors, Adamic-Adar and Resource Allocation all have the complexity of finding the common neighbors between the source and destination, which is the intersection of the neighborhoods of the two nodes. The Compatibility Score, however, first finds the common neighbors and then performs two intersections for each of the nodes in the common neighbor set.

If done efficiently, the intersection of two sets with sizes a and b can be obtained in O(min(a, b)) on average. Using this, the complexity of finding the score in a LEG for the formation of a link between i and j is O(k) when Common Neighbors, Adamic-Adar or Resource Allocation is used, and O(k^2) when the Compatibility Score is used. Since k is usually small (in our experiments 1 ≤ k ≤ 35), it is safe to state that the link prediction is done in constant time; also, as there are C nodes in V_c, predicting the labels of m instances takes O(mC) time after creating the LEG.
Fig. 2. Toy example demonstrating CULP. A: The set of data belonging to 2 classes and a test point in red. B: LEG graph of the data.
4.5. Toy example

In this subsection, a simple classification problem is solved using CULP to demonstrate the steps involved in this algorithm. The data is presented in Fig. 2-A as two classes. The white points represent the data of class 1, and the dark points belong to class 2. The problem is finding the correct label of the red point (point i).

The first step is choosing a similarity function s and a value for the parameter k for forming the graph. Here we chose k = 2 and the Euclidean similarity (discussed in the preliminaries section).

Now the node sets can be defined as V_c = {j1, j2}, V_u = {i}, and all the other points form the set V_l. By creating the edges in E_c and E_s as shown in Algorithm 2, the LEG in Fig. 2-B can be derived. As can be seen, in this graph every node except for i is connected to one of the class nodes j1 and j2 (the white nodes) by dotted links, and the black links represent the edges of E_s.

Looking at the graph, it can be seen that node i is connected to nodes a, b and c. This means these nodes will assist in finding the label for node i. Using these nodes, the scores for the edges (i, j1) and (i, j2) can be obtained with each of the scores discussed before as λ. The results of computing these scores are depicted in Table 1. The results of all the link predictors in Table 1 show that the score for the link (i, j2) is higher. This prediction matches the pattern perceived by looking at the data in Fig. 2-A and is the correct prediction.
Table 1
Scores computed by 4 different link predictors for the toy example of Fig. 2.

    λ     (i, j1)       (i, j2)        Prediction
    CN    1             2              2
    AA    1/log(4)      2/log(3)       2
    RA    1/4           2/3            2
    CS    1/2 + 1/4     2(1/2 + 1/3)   2
5. CULM extension

As we stated in the time complexity analysis subsection, and as demonstrated in the toy example of the previous section, once the LEG structure is formed, the prediction of links can be done almost instantly. Knowing this, and the fact that there are different options for choosing the link predictor λ, the question arises: why not use all of our predictors and somehow combine their predictive capabilities to find the best membership link for a test node?

The next question arises after we analyze the related works done in the field of classification using complex network representations. A good portion of these methods are capable of incorporating or exploiting the low level features of the data to enhance the classification performance. How can we modify our framework CULP to exploit the low level features of the data as well as the high level features?

The answer to both of these questions lies in our extension to the CULP algorithm, which we call the CULM extension. CULM increases the predictive capabilities of CULP by using a weighted majority vote procedure (hence the M, as in Majority, at the end instead of P).
Instead of using only one link predictor λ, we will use an array of link predictors Λ. Each link predictor λ, when used, gives a score to the links (i, j) for all j ∈ V_c. We can use all of these scores to estimate the probability p of our prediction's correctness as Eq. (17):

    p_ŷ = λ_{i,j*} / Σ_{j ∈ V_c} λ_{i,j}    (17)

In this equation, ŷ is the label corresponding to j*, and j* is computed using Eq. (10) of the previous section. Using Eq. (17), we can assign a confidence to the prediction of λ. When using multiple predictors, it is obvious that a λ with higher confidence is more reliable. We are going to use these probabilities to assign weights to each of the λs in Λ. This way, instead of a simple majority vote, a weighted voting procedure can be used. In a weighted majority vote procedure, several predictions are aggregated. Each of these predictions has an individual weight which states the value of its vote; finally, the voting in this setting is done as in Algorithm 4.
Algorithm 4: Weighted Majority Voting Algorithm.

    function Vote(Y, W)
        L ← {0}^C
        for y ∈ Y and w ∈ W do
            L_y ← L_y + w
        end for
        ŷ ← argmax(L)
        return ŷ
    end function
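Algorithm 4 can be sketched in a few lines of Python (labels assumed to be 1..C, as in the LEG numbering):

```python
def weighted_vote(Y, W, C):
    """Algorithm 4 sketch: accumulate each predictor's weight on its
    predicted label and return the label with the largest total weight."""
    L = [0.0] * (C + 1)            # L[1..C]; index 0 unused
    for y, w in zip(Y, W):
        L[y] += w
    return max(range(1, C + 1), key=lambda c: L[c])

# Three predictors vote 2, 1, 2 with weights 0.3, 0.5, 0.4:
# label 2 accumulates 0.7 > 0.5, so it wins despite 1's single heavy vote.
print(weighted_vote([2, 1, 2], [0.3, 0.5, 0.4], C=2))  # 2
```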
In Algorithm 4, Y is the set containing the predicted labels of each of the predictors, W is the respective weights of the labels, and L is a set with C elements which keeps track of the weight for each of the classes. Using this algorithm enables us to not only use multiple link predictors' predicted labels, but also to incorporate any classical classifier ψ with suitable weights. This way the low level features of the data are exploited as well.
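As a concrete illustration, the weighted vote of Algorithm 4 can be sketched in a few lines of Python (the function and variable names here are ours, not the authors'):

```python
from collections import defaultdict

def vote(labels, weights):
    """Weighted majority vote: return the label whose votes carry
    the largest total weight (a sketch of Algorithm 4)."""
    totals = defaultdict(float)   # plays the role of L: total weight per class
    for y, w in zip(labels, weights):
        totals[y] += w
    return max(totals, key=totals.get)
```

For example, `vote(["a", "b", "a"], [0.2, 0.5, 0.2])` returns `"b"`: the single high-confidence vote for `"b"` (weight 0.5) outweighs the two low-confidence votes for `"a"` (total weight 0.4).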
The next step is to define the weights for each of our predictors and for ψ. If ŷ_λ is the predicted label of the predictor λ for the unlabeled data x^{(u)} and p^λ_ŷ is the probability of this prediction, the weight of predictor λ for x^{(u)} can be defined as Eq. (18). Also, for the prediction of ψ on x^{(u)}, which can be denoted as ŷ_ψ, we can define the weight as Eq. (19).

w_{\hat{y}}^{\lambda} = \alpha \, \frac{p_{\hat{y}}^{\lambda}}{\sum_{\lambda'} p_{\hat{y}}^{\lambda'}} \qquad (18)

w_{\hat{y}}^{\psi} = 1 - \alpha \qquad (19)

Here the sum in Eq. (18) runs over all link predictors in use. The α parameter, which appears in both equations, is provided by the user. This parameter controls the trade-off that CULM will make between the link predictors' labels and the prediction of the low level classifier.
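Computing these weights from the predictors' confidences is straightforward; a minimal Python sketch (names are ours, assuming the confidences of Eq. (17) have already been computed):

```python
def culm_weights(confidences, alpha):
    """Vote weights per Eqs. (18) and (19): each link predictor gets
    alpha times its normalized confidence, and the low level classifier
    psi gets the remaining 1 - alpha."""
    total = sum(confidences)
    predictor_weights = [alpha * p / total for p in confidences]
    classifier_weight = 1.0 - alpha
    return predictor_weights, classifier_weight
```

By construction the predictor weights sum to α and the classifier weight is 1 − α, so the full vote mass is 1 and α directly controls the trade-off.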
The parameter α is chosen in the range 0 to 1; however, any value below 0.5 would result in neutralizing the vote of the CULM predictors. Also, if α = 1, the prediction is completely done by the CULM predictors and the low level classifier is ignored; so in general it can be stated that 0.5 ≤ α ≤ 1.

Now the CULM extension can be formally defined as the procedure captured in Algorithm 5. In this algorithm, after creating
Algorithm 5 CULM Algorithm.
function CULM(X, X^{(u)}, y, s, k, predictors, ψ, α)
    G ← LEG(X, X^{(u)}, y, s, k)
    ŷ ← {}
    for i in V_u do
        P ← {}
        Ŷ ← {ψ(X^{(u)}_i)}
        W ← {1 − α}
        for λ in predictors do
            j^* ← argmax_{j ∈ V_c}(λ_{i,j})
            P ← P ∪ {λ_{i,j^*} / Σ_{j ∈ V_c} λ_{i,j}}
            Ŷ ← Ŷ ∪ {j^* − (n + m)}
        end for
        for p ∈ P do
            W ← W ∪ {α × p / Σ_{p' ∈ P} p'}
        end for
        ŷ_i ← VOTE(Ŷ, W)
    end for
    return ŷ
end function
the LEG, each of the predictors produces a label and a probability. These probabilities and labels are then merged with those of the low level classifier ψ to form Ŷ and W, which are passed to Algorithm 4 to produce the final label for the test instance.
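Putting the pieces together, the per-node body of Algorithm 5 can be sketched as below. This is a hedged simplification (all names are ours): the class-node keys are used directly as labels instead of applying the paper's j^* − (n + m) index shift, and the vote of Algorithm 4 is inlined.

```python
from collections import defaultdict

def culm_predict_one(scores_per_predictor, psi_label, alpha):
    """Label one test node i. scores_per_predictor holds, for each link
    predictor lambda, a dict mapping class node j to the score lambda_{i,j};
    psi_label is the low level classifier's prediction for node i."""
    labels = [psi_label]          # psi votes first ...
    weights = [1.0 - alpha]       # ... with weight 1 - alpha (Eq. 19)
    picks, confs = [], []
    for scores in scores_per_predictor:
        j_star = max(scores, key=scores.get)                 # strongest class link
        picks.append(j_star)
        confs.append(scores[j_star] / sum(scores.values()))  # confidence, Eq. (17)
    total = sum(confs)
    for j_star, p in zip(picks, confs):
        labels.append(j_star)
        weights.append(alpha * p / total)                    # weight, Eq. (18)
    tally = defaultdict(float)                               # Algorithm 4, inlined
    for y, w in zip(labels, weights):
        tally[y] += w
    return max(tally, key=tally.get)
```

With α = 1 the low level classifier is ignored entirely, and with α = 0.5 its vote can at best tie the predictors', matching the 0.5 ≤ α ≤ 1 range stated above.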
As analyzed, the time complexity of predicting the labels of m instances using CULP is O(mC). CULM essentially repeats the prediction l times, with l being the number of link predictors, and then uses a majority vote. The prediction complexity is O(lmC) and the voting has a complexity of O(l); therefore, we can identify the CULM time complexity to be O(lmC + l + O(ψ)), with the O(ψ) part being the complexity of the low level classifier. Clearly, the overall running time of CULM can differ greatly depending on the classifier used.
Table 2
Datasets used in deriving the results for CULP and CULM.
Dataset Instances Attributes Classes
Zoo 101 16 7
Hayes 132 4 3
Iris 150 4 3
Teaching 151 5 3
Wine 178 13 3
Sonar 208 60 2
Image 210 19 7
Glass 214 9 6
Thyroid 215 5 3
Ecoli 336 7 8
Libras 360 90 15
Balance 625 4 3
Pima 768 8 2
Vehicle 846 18 4
Vowel 990 10 11
Yeast 1,484 8 10
RedWine 1,599 11 6
Segment 2,100 19 7
Optical 5,620 64 10
Poker 25,010 10 10
6. Experimental results
In this section, we present the results of our proposed algorithms CULP and CULM on 20 different real datasets and compare them to classical classification methods as well as to the best classifiers of the related works in the domain of classification using complex networks.

The datasets used for our experiments are all obtained from the UCI machine learning repository [19]. These datasets include Zoo, Hayes-Roth (Hayes), Iris, Teaching Assistant Evaluation (Teaching), Wine, Sonar Mines vs. Rocks (Sonar), Image Segmentation training set (Image) and testing set (Segment), Glass Identification (Glass), Thyroid Disease (Thyroid), Ecoli, Libras Movement (Libras), Balance Scale (Balance), Pima Indians Diabetes (Pima), Statlog Vehicle Silhouettes (Vehicle), Vowel Recognition (Vowel), Yeast, Wine Quality Red (RedWine), Optical Recognition of Handwritten Digits (Optical), and Poker Hand (Poker). Each of these datasets, along with its number of instances, attributes and classes, is listed in Table 2.
6.1. CULP analysis
The reason behind choosing these datasets is the variety of both structure and domain among them. The size of these datasets ranges from 101 to 25,010 instances, which tests the practicality of our algorithms on both small and large datasets; the number of attributes varies from 4 to 90, which tests the proposed algorithms against both low and high dimensional datasets; and finally, there is a lot of variety in the number of classes, which ranges from 2 up to 15.
This section is organized as follows: first, the experiment on CULP with different predictors as λ is presented; after that, the CULM algorithm is analyzed with 3 different low level classifiers; the following subsection discusses the effects of the α parameter; after that, a comparison of CULP and CULM with classical classifiers is demonstrated; and finally, CULP and CULM are compared against all the classical approaches and the similar works on classification using complex networks.
As the first experiment, different link predictors are used in CULP to compare the performance of each one on the datasets. For this experiment, the predictor λ is one of CN, AA, RA and CS, which are respectively defined in Eqs. (2), (3), (4) and (12).
The parameters used in CULM are k (1 ≤ k ≤ 35), λ (the link predictor, which is Common Neighbors, Resource Allocation, Adamic-Adar or Compatibility Score), the vector similarity function s, and α (0.5 ≤ α ≤ 1). For each link predictor and each dataset, the parameters are tuned. This tuning is done via a 10-Fold Cross Validation procedure. After finding the best parameters, 30 runs of 10-Fold Cross Validation are done, which amounts to a total of 300 runs. Table 3 captures the results obtained by these settings. In each cell of Table 3, the first number is the mean accuracy of the runs and the second number is their standard deviation. The number in the parentheses represents the best k obtained for each cell, and the bold cells are the best results obtained on a dataset.
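The tuning loop described above can be reproduced with a plain k-fold splitter plus a grid search over candidate parameters. The sketch below is generic and ours, not the authors' code: `evaluate` is a hypothetical placeholder standing in for training and scoring CULP on one split.

```python
import random

def kfold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-Fold Cross Validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        test = folds[f]
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, test

def tune_k(n_samples, candidate_ks, evaluate):
    """Pick the neighborhood size k with the best mean CV accuracy.
    `evaluate(train, test, k)` is a placeholder returning an accuracy."""
    best_k, best_acc = None, -1.0
    for k_param in candidate_ks:
        accs = [evaluate(train, test, k_param)
                for train, test in kfold_indices(n_samples)]
        mean_acc = sum(accs) / len(accs)
        if mean_acc > best_acc:
            best_k, best_acc = k_param, mean_acc
    return best_k
```

Re-running the outer loop 30 times with different shuffling seeds and averaging gives the 300-run mean and standard deviation reported per cell.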
As can be seen in Table 3, the Compatibility Score achieved the best results among the predictors; this is due to the fact that CS exclusively got the highest accuracy on the 6 datasets of Glass, Libras, Balance, Pima, Yeast and RedWine. In second place is the Resource Allocation Index, which obtained the top accuracy for Zoo, Iris, Ecoli, Optical and Poker exclusively, and achieved an identical best accuracy with the Adamic-Adar Score on the Vowel dataset. The third best predictor is Common Neighbors, with the 5 datasets of Hayes, Teaching, Sonar, Thyroid and Vehicle on top; and finally Adamic-Adar is best for Wine, Image and Segment, plus the shared best result with RA on Vowel.
Analyzing the ks in this experiment, we can see that for the 10 datasets of Zoo, Hayes, Iris, Teaching, Wine, Image, Thyroid, Libras, Vehicle and Poker, the best k is identical for each predictor on a dataset; in Balance and Pima, however, the ks are noticeably different, with Common Neighbors having the highest k in both of them. In the rest of the datasets, the choice of k among different predictors differs by at most 1 (for Yeast it is 2).
6.2. CULM analysis
As the next experiment, the CULM algorithm is run on each of the datasets. The parameter α is tuned over the set {0.6, 0.7, 0.8, 0.9, 1}. Values below 0.6 for α are not used, in order to keep the results and comparisons fair (as stated before, any value below 0.5 for α zeros the effect of the CULP predictors, and experimentally the same holds for α = 0.5); this way we are sure that the link predictors are not completely overshadowed by the low level classifier. The other parameters of the algorithm are tuned as before, and again each cell is the result of 300 runs.
For a low level classifier to accompany the link predictors in CULM, three different algorithms have been chosen and used. These low level classifiers are LDA (Linear Discriminant Analysis), CART (Classification And Regression Trees) and multi-class SVM (Support Vector Machine) with RBF kernel.
Table 4 captures the results of this experiment. The first column is the best result for each of the datasets using CULP (Table 3); the next three columns are the results of CULM with respectively LDA, CART and SVM as ψ, and in each of the cells in these columns the numbers in parentheses represent the k and α used in the runs. The last column in this table represents the accuracy gain achieved by using CULM instead of CULP. Each of the numbers in this column is obtained by comparing the best result obtained by CULM with the best result obtained by CULP for each dataset.
Looking at Table 4, it is clear that on the Thyroid dataset using CULM achieved no change in the accuracy, and on the Iris and Optical datasets the accuracy deteriorates; however, on the other 17 datasets CULM achieved a higher result.
CULM with SVM as its low level classifier achieved the best results on the 6 datasets of Sonar, Libras, Balance, Vowel, RedWine and Poker exclusively, and shares the best result on Thyroid with CULM-LDA and CULP. As the next best classifiers we have both CULM-CART and CULM-LDA with exclusively 5 best accuracy