Contents lists available at ScienceDirect
Neurocomputing
journal homepage: www.elsevier.com/locate/neucom
Classification using link prediction
Seyed Amin Fadaee, Maryam Amir Haeri
Department of Computer Science and Information Technology, Amirkabir University of Technology, Iran
Article info
Article history: Received 21 February 2019; Revised 24 May 2019; Accepted 5 June 2019; Available online 13 June 2019
Communicated by Prof. H. Zhang
Keywords: Classification; Link prediction; Graph representation; Local similarity measure; Similarity-based techniques
Abstract
Link prediction in a graph is the problem of detecting the missing links, or the ones that will be formed in the near future. Using a graph representation of the data, we can convert the problem of classification to the problem of link prediction, which aims at finding the missing links between the unlabeled data (unlabeled nodes) and their classes. To our knowledge, although numerous algorithms use the graph representation of the data for classification, none use link prediction as the heart of their classifying procedure. In this work, we propose a novel algorithm called CULP (Classification Using Link Prediction) which uses a new structure named Label Embedded Graph (LEG) and a link predictor to find the class of the unlabeled data. Different link predictors, along with Compatibility Score - a new link predictor we propose that is designed specifically for our setting - have been used and show promising results for classifying different datasets. This paper further improves CULP by designing an extension called CULM, which uses a majority vote procedure (hence the M in the acronym) with weights proportional to the predictions' confidences to harness the predictive power of multiple link predictors and to exploit the low level features of the data. Extensive experimental evaluations show that both CULP and CULM are highly accurate and competitive with cutting edge graph classifiers and general classifiers.
© 2019 Elsevier B.V. All rights reserved.
1. Introduction
Classification is an old problem in machine learning and pattern recognition that aims at finding a correct mapping between data and their corresponding labels. This mapping is then used to derive the class of the unlabeled data [1].

This field is still highly active in the literature, and many algorithms have been proposed to correctly classify the data. Most classification algorithms aim at finding a decision boundary in the feature space for distinguishing the data belonging to different classes; however, as more complex data require more complex algorithms, these approaches can fail to capture the true relations in the data.

One of the approaches that has recently gained popularity in the literature is classification of the unlabeled instances using the graph representation of the data. Data can be represented in different forms, one of which is a graph. In this setting, the data is first converted to a graph via a similarity function in the feature space; then unlabeled data is classified by incorporating a graph property. These graph properties are called high level features, which give more insight into the data compared to the low level features.
∗ Corresponding author.
E-mail addresses: [email protected] (S.A. Fadaee), [email protected] (M. Amir Haeri).
Classification using graph representation is studied extensively in numerous works [2–9]. These works use graph properties such as clustering coefficient, modularity, importance, PageRank and others to classify the unlabeled data, and they tend to achieve more accurate results compared to classifiers that rely only on the low level features of the data. This approach has been used in text classification [10], hyperspectral image classification [11,12], image classification [2,8], handwritten digit recognition [3] and other areas.
Link prediction is the problem of predicting the missing links in a graph, or the ones that will be formed in the near future [13]. Using the graph representation of the data, we can treat classification as a link prediction problem in an intuitive way, where we try to find the link between the unlabeled node and its corresponding class. To our knowledge, there is no work in the literature that uses link prediction to solve the problem of classification; however, the use of classification to solve link prediction is studied extensively [13].
In this work, we propose an algorithm called CULP (an acronym for Classification Using Link Prediction) that takes a different look at the classification problem through a link prediction approach. As we will elaborate in the paper, CULP uses a graph called LEG that models the data in an intuitive way that is suitable for link prediction.
Any link predictor can be used to derive the class of the unlabeled node in CULP, and we propose a new local measure called Compatibility Score that is designed to improve the accuracy of link prediction and, consequently, classification.

https://doi.org/10.1016/j.neucom.2019.06.026
As much insight as high level features provide for capturing the patterns present in the data, exploiting the low level features alongside them further improves the predictive power of graph classifiers, and different researchers incorporate this idea in their work [2,4]. This is why we further improved CULP and propose the CULM extension - a majority vote system (hence the M in the acronym) with weights proportional to the probabilities of the predictions. This extension uses multiple link predictors along with a low level classifier. As we will see, both the CULP and CULM algorithms derive highly accurate results which are competitive with low level classifiers and other graph based classification methods.
The rest of the paper is organized as follows. In the next section, a review of the general domains used in this paper is presented; this preliminary section elaborates on the problem of link prediction, similarity measures in vector space, methods of converting data to a graph, and the problem of classification. After that, a section of related works is given, summarizing recent works that use the graph representation of the data for classification. Next, the CULP algorithm is presented in full detail, elaborating on the LEG (Label Embedded Graph) structure, the classification procedure which uses link prediction, our novel link predictor - Compatibility Score - the time complexity, and a toy example demonstrating CULP. Finally, the CULM extension is presented, followed by our extensive experimental results putting our proposed algorithms into perspective. At the end, the conclusion of the paper and the aims for future work are presented.
2. Preliminaries
To fully understand CULP, a grounding for the details comprising this algorithm should be set. In this section, a general review of graph theory concepts and notations, along with the definition of the link prediction problem in complex networks, is given. After that, an overview of some of the most important similarity measures is presented; following this, the different ways of converting data to a graph are discussed. Finally, at the end of this section, the problem of classification is defined.
2.1. Link prediction
Given a set of vertices V and a set of edges E containing pairs (i, j) where i, j ∈ V, the data structure G(V, E) can be defined as a graph. If the elements in E are ordered pairs, G is considered to be a directed graph. In an undirected graph, (i, j) ∈ E implies that (j, i) ∈ E. Regardless of the directionality of the graph, node j is a neighbor of node i if (i, j) ∈ E. For a node i, Γ(i) is the set of the neighbor nodes of i.
For the graph G, the adjacency matrix A_G (or simply A) is defined as an N × N matrix with zero-one elements, where N = |V|. For any entry in A, A_{i,j} = 1 if and only if (i, j) ∈ E. In an undirected graph, by definition A = A^T. As our focus in this paper is on undirected graphs, for the sake of simplicity we use "graph" to refer to an undirected graph.

The degree of a node i in a graph is |Γ(i)|. For any graph, the cardinality |E| can be obtained by summing over the degrees of all nodes using Eq. (1), where N = |V|:

    |E| = (1/2) Σ_{i=1}^{N} |Γ(i)|    (1)

The problem of link prediction in a graph arises when the goal is to predict, for the currently absent links (0 entries in A), the probability of link formation in the future. There are many functions to predict the link prediction scores. These functions usually compute the local similarity between the nodes to derive the scores. One of the simplest techniques is known as common neighbors (CN) [14]. Using this approach, the prediction scores can be derived using the following:
    λ_{i,j} = |Γ(i) ∩ Γ(j)|    (2)

Eq. (2) simply counts the number of common neighbors of nodes i and j to derive a score for their link formation.
Another approach to finding the link formation score is introduced by Adamic and Adar [15], which uses the degrees of common neighbors as features for prediction, and it can be written as

    λ_{i,j} = Σ_{γ ∈ Γ(i) ∩ Γ(j)} 1 / log|Γ(γ)|    (3)

Eq. (3) is known as the Adamic-Adar score (AA). This score penalizes the features by their logarithm and uses these features for deriving the prediction scores. Another famous approach for tackling the problem of link prediction is the Resource Allocation Index (RA) [16], which simulates the transition of resources between nodes i and j. This index is defined as Eq. (4).
    λ_{i,j} = Σ_{γ ∈ Γ(i) ∩ Γ(j)} 1 / |Γ(γ)|    (4)

This index is quite similar to AA; however, it does not apply the logarithm, which would soften the effect of nodes with high degree. This has the benefit of penalizing high degree common neighbors more strongly. In a lot of networks, these nodes provide little insight for link prediction, as they are connected to a lot of other nodes in the graph.
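As a concrete illustration (not from the paper's released code), the three local scores of Eqs. (2)–(4) can be sketched over a plain neighbor-set representation of a graph; the graph below is a made-up toy example.

```python
import math

# Toy undirected graph as node -> set of neighbors (illustrative only).
neighbors = {
    "i": {"g", "x"},
    "j": {"g", "y"},
    "g": {"i", "j", "x", "y"},
    "x": {"i", "g"},
    "y": {"j", "g"},
}

def common_neighbors(nb, i, j):
    """Eq. (2): size of the shared neighborhood."""
    return len(nb[i] & nb[j])

def adamic_adar(nb, i, j):
    """Eq. (3): common neighbors weighted by 1/log(degree)."""
    return sum(1.0 / math.log(len(nb[g])) for g in nb[i] & nb[j])

def resource_allocation(nb, i, j):
    """Eq. (4): common neighbors weighted by 1/degree."""
    return sum(1.0 / len(nb[g]) for g in nb[i] & nb[j])

print(common_neighbors(neighbors, "i", "j"))    # 1 (only g is shared)
print(resource_allocation(neighbors, "i", "j")) # 0.25 (g has degree 4)
```

Note how RA down-weights the shared neighbor g by its full degree, while AA would charge only 1/log(4).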
In this work, we propose a new similarity function for the purpose of link prediction, called Compatibility Score, which is discussed further in the paper.
2.2. Similarity measures
Any data point x with numeric features x_f, where 1 ≤ f ≤ d, can be regarded as a vector in a d-dimensional space. This view enables measuring the similarities between data points using conventional similarity measures. As we are going to utilize a similarity measure in converting our data to a graph (discussed in the next segment), we provide an overview of some of these measures.
Having our data matrix X, with n rows and d columns, with each row being a data vector, the Cosine similarity can be defined as the following:

    s_{i,j} = (X_i · X_j) / (‖X_i‖_2 ‖X_j‖_2)    (5)

where ‖x‖_2 denotes the Euclidean norm of the vector x, which is derived by the following:

    ‖x‖_2 = sqrt(Σ_{f=1}^{d} x_f^2)
Following the above equation, the Euclidean distance between any two d-dimensional vectors can be written as:

    φ_{i,j} = sqrt(Σ_{f=1}^{d} (X_{i,f} − X_{j,f})^2)    (6)

Utilizing the Euclidean distance, another similarity measure - namely Inverse Euclidean - can be defined using:

    s_{i,j} = 1 / (φ_{i,j} + ε)    (7)
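A minimal sketch of the Cosine (Eq. (5)) and Inverse Euclidean (Eq. (7)) similarities; the epsilon default is illustrative.

```python
import math

def cosine_similarity(x, y):
    """Eq. (5): dot product over the product of Euclidean norms."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny)

def inverse_euclidean(x, y, eps=1e-9):
    """Eq. (7): 1 / (Euclidean distance + eps); eps avoids division by zero."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return 1.0 / (dist + eps)

print(cosine_similarity([1, 0], [0, 1]))  # 0.0 (orthogonal vectors)
print(inverse_euclidean([0, 0], [3, 4]))  # ~0.2 (distance is 5)
```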
In Eq. (7), the ε term is a small number used to avoid division by zero in the case of identical vectors. Another prominent distance in linear algebra is what is known as the absolute or Manhattan distance (Eq. (8)); by substituting Eq. (8) in Eq. (7), the Inverse Manhattan similarity function is defined.

    φ_{i,j} = Σ_{f=1}^{d} |X_{i,f} − X_{j,f}|    (8)

2.3. Converting data to graph
Any vector based data can be represented as a graph. Doing this changes the structure of the data, which enables us to compute high level features.

Two of the most used procedures for converting data to a graph are the r-Radius and kNN methods [17].

Using a similarity measure s (e.g. the cosine similarity discussed in the previous segment) and the data matrix X, we can use either of these two algorithms to convert the data into a graph. In r-Radius, an edge is created between every pair of data points that have a similarity higher than a predefined threshold r. Another approach is using the k-nearest neighbors to form the graph: if (based on s) X_i is in the k-nearest neighbors of X_j, the edge (i, j) is created.
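The undirected kNN conversion described here can be sketched in a few lines of Python; this is an illustrative implementation, not the paper's code, and the inverse-Euclidean similarity used below is just one possible choice of s.

```python
import math

def knn_convert(X, s, k):
    """Undirected kNN conversion sketch: create the edge (i, j) whenever
    i is among j's k most similar points, or vice versa."""
    n = len(X)
    # For each point, the indices of its k most similar other points.
    nearest = [set(sorted((j for j in range(n) if j != i),
                          key=lambda j: s(X[i], X[j]), reverse=True)[:k])
               for i in range(n)]
    E = set()
    for i in range(n):
        for j in range(i + 1, n):
            if j in nearest[i] or i in nearest[j]:
                E.add((i, j))
    return E

# Inverse-Euclidean similarity as s (illustrative).
sim = lambda x, y: 1.0 / (math.dist(x, y) + 1e-9)
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
print(sorted(knn_convert(points, sim, k=1)))  # [(0, 1), (2, 3)]
```

With N = 4 and k = 1, the 2 resulting edges respect the bound Nk/2 ≤ |E| ≤ Nk derived next.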
Due to the fact that the kNN relation is not symmetric, this approach generally results in a directed graph. However, the same principle can be used to create an undirected graph, as in Algorithm 1. Using this approach, if X has N instances, the number of undirected edges |E| in the created graph is bounded by Nk/2 ≤ |E| ≤ Nk. CULP uses an undirected kNN modeling of the data for the task of classification.

Algorithm 1: Undirected kNN conversion function for the data matrix X and similarity measure s.

    function kNN-Convert(X, s, k)
        E ← {}
        for i, j ∈ X do
            if i ∈ kNN(s, j) or j ∈ kNN(s, i) then
                E ← E ∪ (i, j)
            end if
        end for
        return E
    end function

2.4. Classification
Suppose there are two sets of data. X, with n instances and d features per instance, is the set of our labeled data. The labels of X are denoted by y, where y_i ∈ {1, 2, ..., C}, with C being the number of classes. Each pair (X_i, y_i) makes up our training data. The other set of data is X^(u), with m instances and again d features per instance, which is the unlabeled or test data.

The classification problem aims at finding a mapping X_i^(u) → ŷ_i for every i ∈ {1, ..., m}. In other words, we are trying to find a proper label for each of the unlabeled instances in X^(u). If C = 2, this is called binary classification, and if C > 2, the problem is called multi-class classification [1].
Classifiers like kNN or Decision Tree can naturally handle multi-class classification problems; however, some classifiers like SVM are inherently designed for the binary classification task, and upgrading them to handle multi-class classification requires using the One vs. All or One vs. One approaches [1].

In One vs. All, C classifiers are trained, and each classifier has the task of deciding whether an instance belongs to a particular class or not. The One vs. One approach is done by training C(C − 1)/2 classifiers to classify an instance into either of two classes among all of the C classes.
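As a quick check of these counts, the number of binary classifiers each scheme requires can be computed directly:

```python
def n_one_vs_all(C):
    """One vs. All: one binary classifier per class."""
    return C

def n_one_vs_one(C):
    """One vs. One: one binary classifier per unordered pair of classes."""
    return C * (C - 1) // 2

# For a 10-class problem:
print(n_one_vs_all(10))  # 10
print(n_one_vs_one(10))  # 45
```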
3. Related works
Classification using graphs has recently gained popularity, and numerous works [2–8] focus on using this approach instead of the classical methods of classification. These methods can capture complex patterns in the data, and they can generate high level features to guide the classification procedure; furthermore, they can usually be modified to utilize the low level features of the data as well.
In [2] a random walker is used to classify unlabeled instances on the graph embedding of the data. This graph is represented by a weight matrix of similarities. The random walk process is continued until convergence, and the new data receives its label through a weighted majority vote between the labels of the top η nodes with the highest probabilities. This method takes the similarity among the data points into account with a single network for the dataset, along with the structural changes of an unlabeled instance on the networks created for each class. The complexity of the method is O(n^2); however, as the authors claimed, using sparse representations such as a kNN network and a graph construction method based on Lanczos bisection [18], this complexity can be reduced to between O(n^1.06) and O(n^1.33).
Another system is proposed in [9], in which a graph is created for the training instances of each class; then, using the spatio-structural differential efficiency measure proposed in the paper, a test instance is connected to some of the nodes in each graph. The label of the data is the class of the graph in which the test data has the highest importance. The importance is characterized by Google's PageRank measure of the network. The spatio-structural differential efficiency measure in [9] considers both physical and topological properties of the data, and the complexity of the proposed method is again O(n^2), which is once more reduced to between O(n^1.06) and O(n^1.33) by using the graph construction method based on Lanczos bisection.
A hybrid method is proposed in [3] that aids a typical classifier (such as kNN, SVM or Naive Bayes) with high level features. These high level features are the differences of some graph properties before and after inserting a new instance into the graph representation of the data of each class. The graph of each class is constructed using a combination of the r-radius and kNN graph conversion methods. The graph properties used in their work are assortativity, network clustering coefficient and average degree. The label for the test instance is generated by a weighted combination of low level and high level features. The authors extended their work in [4] by using two more high level features - namely Normalized Average Distance among vertices and coreness variability - and using a stacking procedure to learn the weight for each feature. Also, [5] extends the same work by discarding the use of any classical classifier and using a scheme that takes low level feature techniques into account to filter irrelevant graphs of some of the classes.
The authors of [6] proposed a framework for classification using the k-Associated Optimal Graph for modeling the data and Bayes' theorem for computing a posterior probability for each class to classify new instances. Similar to the kNN graph conversion method, the k-Associated Optimal Graph computes the similarity of a data point with all of the training data; however, it forms an edge only if the points belong to the same class. This results in multiple components (and possibly more than one component for a class). The method furthermore tries to find a local k for each class so that the resulting components attain maximal Purity (a measure based on the average degree of a component). This way, the process of finding the parameter k is conducted automatically, which also makes the complexity of the framework O(n^2). Another paper also uses the k-Associated graph along with the high level classification method of [3] to classify new instances.
Other methods using different graph measures have been produced as well. Neto and Zhao [7] use dynamic entropy for each weighted graph produced by r-radius, where the weights denote the distance between data points. Cupertino et al. [8] utilize the modularity measure for classifying a new instance that belongs to a pattern set of the same object in the training data. The label is derived by creating a kNN graph for each pattern set and choosing the label of the graph with the lowest modularity change after insertion of the new data. Both of the methods in [7,8] have a complexity of O(n^2).
The graph based classification methods in the literature mostly have three characteristics in common. Firstly, they create a different graph for each class of the data; this approach avoids finding meaningful patterns that may form by the similarities between points in different classes.

The second aspect these algorithms have in common is that they treat test instances individually, adding them to the graph of each class and measuring a graph property before and after the insertion. This makes the prediction of a new instance inefficient in the presence of a large amount of test data.

Lastly, the properties that these algorithms use for finding the differences before and after the insertion of the unlabeled data (e.g. clustering coefficient, average path length etc.) are time consuming, and their computation times are usually dependent on the graph size, which can make them infeasible for large datasets.

Our proposed algorithm CULP and its extension CULM solve the first and second issues by employing a novel graph representation called LEG, which treats classes as nodes, along with the training and test instances, in a unified object; it is discussed further in the paper. As for the third problem, since the label of a test instance is derived using link prediction measures (as discussed in the previous section), the classification of the unlabeled data is faster than in similar methods.
4. CULP algorithm
CULP (Classification Using Link Prediction) is a classification method that aims to gain higher accuracy in the multi-class classification task by exploiting the similarity among the data points. This algorithm employs the power of graph representation and link prediction methods in complex networks to deal with this problem.1 The overall structure of CULP consists of 2 stages:

1. Creating the LEG structure G from the data
2. Classifying the test data using G
In the first step, we model our data as an augmented graph data structure called a LEG (Label Embedded Graph), which we call G. G is a heterogeneous graph which incorporates the data, the classes and the similarities between them as a unified object.

A LEG essentially contains 3 sets of nodes and 2 sets of links. The different types of nodes in G are training nodes, testing nodes and class nodes; a link between two data nodes denotes similarity between them, and a link between a training node and a class node denotes the class membership of that node.

After creating G, we can convert the classification problem to the problem of predicting the class membership link of a testing node. By utilizing a link prediction algorithm in the next step, a membership score for every testing-class pair of nodes is computed. Each of the membership scores acts as a posterior probability, and a label is chosen for a testing node based on these scores.
1 The complete code of CULP in Python can be found at github.com/aminfadaee/culp.
The CULP procedure is depicted in Algorithm 3. In the next segments, each of the steps of the proposed algorithm is covered in more detail.
4.1. LEG representation
The first step toward classification using CULP is creating the LEG representation. A LEG is a heterogeneous graph with three sets of nodes:

• Training nodes (V_l)
• Testing nodes (V_u)
• Class nodes (V_c)

and two sets of edges:

• Similarity edges (E_s)
• Class membership edges (E_c)
Each set of nodes corresponds to its analogous set of data, i.e. V_l contains n nodes, V_u contains m nodes and V_c contains C nodes.

The class membership edges are created based on the labeled data. E_c contains edges (i, j) where i ∈ V_l and j is the node representation of y_i, meaning that each training node is connected (without direction) to its corresponding class node. It should be noted that since the labels for the test data are not available, E_c contains only pairs of nodes from V_l and V_c.

Unlike E_c, the members of E_s are not obtained so trivially. E_s is responsible for incorporating the similarities between instances of our data, and the edges in this set are obtained by using a graph conversion algorithm. In this work, the undirected version of kNN graph conversion (Algorithm 1) is used.

Edges in E_s primarily connect two nodes in V_l, or a node from V_u to one in V_l. However, there is no constraint on having an edge between two nodes in V_u, meaning that we can find the similarity between unlabeled data points and connect them as well (as we have done in this work).

If the unlabeled data is not available at first, or in the case of a new unlabeled node x^(u), this node is first added to the set V_u; after that, the similarity edges between this node and the other nodes of the graph are created through a linear similarity computation.

After creating all of the sets of nodes and edges, we can define the LEG G(V, E), where V = V_l ∪ V_u ∪ V_c and E = E_s ∪ E_c. Although G is inherently heterogeneous, we can treat it as a simple undirected graph. The procedure for creating G is summarized in Algorithm 2.
Algorithm 2: LEG construction function for the data X^(l), the labels y and the unlabeled data X^(u), with parameter k and the similarity function s.

    function LEG(X^(l), X^(u), y, s, k)
        X ← X^(l) ∪ X^(u)
        V_l ← {1, 2, ..., n}                      // nodes are represented by numbers
        V_u ← {n+1, n+2, ..., n+m}
        V_c ← {n+m+1, n+m+2, ..., n+m+C}
        E_c ← {}
        for i ∈ {1, 2, ..., n} do
            E_c ← E_c ∪ (i, n+m+y_i)
        end for
        E_s ← kNN-Convert(X, s, k)
        V ← V_l ∪ V_u ∪ V_c
        E ← E_s ∪ E_c
        return G(V, E)
    end function

This algorithm takes the labeled and unlabeled data along with the parameter k and the similarity measure s, and produces G as the output.
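The node-numbering and membership-edge logic of Algorithm 2 can be sketched as follows (an illustrative reconstruction, not the released CULP code; the similarity edges are assumed to come from a separate kNN conversion):

```python
def build_leg(n, m, C, y, similarity_edges):
    """Sketch of Algorithm 2: build the LEG node sets and edge sets.
    `y` holds labels 1..C for the n training points; `similarity_edges`
    is the output of a kNN conversion over all n + m data points."""
    V_l = set(range(1, n + 1))                   # training nodes: 1..n
    V_u = set(range(n + 1, n + m + 1))           # testing nodes: n+1..n+m
    V_c = set(range(n + m + 1, n + m + C + 1))   # class nodes: n+m+1..n+m+C
    # Membership edges: each training node links to its class node.
    E_c = {(i, n + m + y[i - 1]) for i in V_l}
    V = V_l | V_u | V_c
    E = set(similarity_edges) | E_c
    return V, E

# Tiny example: 3 training points (labels 1, 1, 2), 1 test point, 2 classes.
V, E = build_leg(n=3, m=1, C=2, y=[1, 1, 2], similarity_edges={(1, 2), (3, 4)})
print(sorted(V))  # [1, 2, 3, 4, 5, 6]
```

Here nodes 5 and 6 are the class nodes, and E contains the membership edges (1, 5), (2, 5) and (3, 6) alongside the similarity edges.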
There are always n edges belonging to E_c. The number of edges in E_s, however, has an upper and a lower bound. The minimum number of possible edges in E_s is obtained when the kNN procedure for each pair of points in X (X^(u) ∪ X^(l)) is symmetric - meaning that ∀i ∀j, i ∈ kNN(j) ↔ j ∈ kNN(i). The maximum number of edges in E_s, on the other hand, is obtained when the kNN procedure is not symmetric for any pair of nodes in X. Using these, the bounds on the number of edges in a LEG can be derived as Eq. (9):

    n + (k/2)(n + m) ≤ |E| ≤ n + k(n + m)    (9)

By the bounds in Eq. (9), it can be stated that G gives us a new low memory cost representation of the data. The memory for the original data is O(n × d + m × d + n) for X^(l), X^(u) and y; but since it is usually the case that k << d for high dimensional data, a LEG saves a lot of memory compared to using the original data for the task of classification.
Another aspect of a LEG is the fact that we are incorporating all of our labeled and unlabeled data and class labels in a unified structure; this enables us to find the labels of the test data via simple and efficient graph properties, specifically the link prediction methods covered in the next segment.
4.2. Classification
As stated before, in classification the goal is to find a mapping X_i^(u) → ŷ_i for every i ∈ {1, ..., m}. Using the LEG representation, this problem can be reformatted as finding j* for every i ∈ V_u so that the probability of (i, j*) ∈ E_c is maximized.

The new formulation means that edges will be added to the set E_c by predicting the most probable membership link for every test node. This can easily be done via the link prediction methods discussed before.

Using a local similarity measure λ for link prediction (e.g. the Adamic-Adar index), this problem can be solved using the following:

    ∀i ∈ V_u:  E_c ← E_c ∪ (i, j*),  where  j* = argmax_{j ∈ V_c} λ_{i,j}    (10)

Although more complex link prediction methods (random walk, average path length etc.) can be used to solve the problem, the local similarity measures are not only extremely fast and efficient to compute, but they also derive competitively accurate results, as will be discussed in the experiments. The pseudocode of CULP is depicted in Algorithm 3.
Algorithm 3: CULP Algorithm.

    function CULP(X, X^(u), y, s, k, λ)
        G ← LEG(X, X^(u), y, s, k)
        ŷ ← {}
        for i ∈ V_u do
            j* ← argmax_{j ∈ V_c} (λ_{i,j})
            ŷ_i ← j* − (n + m)
        end for
        return ŷ
    end function
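The prediction loop of Algorithm 3 amounts to an argmax over the class nodes; a minimal sketch (with a hypothetical score table standing in for the link predictor λ) could look like this:

```python
def culp_classify(V_u, V_c, score, n, m):
    """Sketch of Algorithm 3's loop: for each test node, pick the class
    node with the highest link-prediction score and map the class-node
    id back to a label by subtracting n + m."""
    y_hat = {}
    for i in V_u:
        j_star = max(V_c, key=lambda j: score(i, j))
        y_hat[i] = j_star - (n + m)
    return y_hat

# Hypothetical scores for test node 4 against class nodes 5 and 6
# (n = 3 training nodes, m = 1 test node, so labels are 1 and 2).
scores = {(4, 5): 0.25, (4, 6): 0.75}
print(culp_classify({4}, {5, 6}, lambda i, j: scores[(i, j)], n=3, m=1))  # {4: 2}
```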
4.3. Compatibility score

In this work, a novel local score for link prediction is formed, designed specifically for the task of classification. This new similarity function is called Compatibility Score; like the Adamic-Adar and Resource Allocation scores, it penalizes the common neighbors - however, this penalization is done differently.

Fig. 1. Using AA or RA for predicting the formation of (i, j1) in both LEGs would result in the same score; however, node γ in the first case is more valuable for the prediction.
Both the AA and RA scores can be unfair in some instances, meaning that they can over-penalize a valuable common neighbor or give the same score to two inherently different nodes. Take the two LEGs in Fig. 1 for example (i ∈ V_u; γ, a, b, c ∈ V_l; j1, j2 ∈ V_c). In both cases, the goal is to find the score for the (i, j1) link. AA and RA would both penalize node γ in the same way (a penalty of 5 for RA and log(5) for AA); however, in the first LEG the node γ is more valuable than in the second, due to the fact that three neighbors of this node (a, b, c) are also connected to node j1.
When trying to predict the score for the formation of a link between nodes i and j with a common neighbor between them, namely γ, two sets of edges starting from γ can be defined: compatible edges and incompatible edges.

Compatible edges for node γ are the ones connecting γ to nodes which are themselves connected to the destination of the candidate link (j in this case). We can define incompatible edges as all the other edges, which are not compatible.

Now the cardinality of the incompatible edges, or the incompatibility penalty, for node γ - a common neighbor of nodes i and j - can be defined as the following:

    δ(i, j, γ) = |Γ(γ)| − |Γ(γ) ∩ Γ(j)|    (11)

Using Eq. (11), the Compatibility Score (CS for short) is formally defined as Eq. (12). In this equation, both δ(i, j, γ) and δ(j, i, γ) are used for the prediction of (i, j) to make the score symmetric, so that λ_{i,j} = λ_{j,i}:

    λ_{i,j} = Σ_{γ ∈ Γ(i) ∩ Γ(j)} [ 1/δ(i, j, γ) + 1/δ(j, i, γ) ]    (12)

Using the Compatibility Score for the cases of Fig. 1, the score for link (i, j1) can be computed as 0.7 in LEG 1 and 0.4 in LEG 2. This is the desired outcome, as the score in LEG 1 is now higher. In the experiments, a more detailed comparison of CS with the other link prediction methods is done.
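Eqs. (11) and (12) can be sketched directly over neighbor sets; the graph below is an illustrative reconstruction of the first LEG of Fig. 1 (i linked to γ; γ linked to a, b, c and j1; a, b, c each linked to j1), and it reproduces the 0.7 value quoted above.

```python
def incompatibility(nb, i, j, g):
    """Eq. (11): number of g's edges NOT leading to a neighbor of j.
    (i is kept in the signature to mirror delta(i, j, gamma).)"""
    return len(nb[g]) - len(nb[g] & nb[j])

def compatibility_score(nb, i, j):
    """Eq. (12): sum of 1/delta(i,j,g) + 1/delta(j,i,g) over common neighbors."""
    return sum(1.0 / incompatibility(nb, i, j, g) +
               1.0 / incompatibility(nb, j, i, g)
               for g in nb[i] & nb[j])

# Reconstruction of Fig. 1's first LEG (illustrative).
nb = {
    "i": {"g"},
    "g": {"i", "a", "b", "c", "j1"},
    "a": {"g", "j1"}, "b": {"g", "j1"}, "c": {"g", "j1"},
    "j1": {"g", "a", "b", "c"},
}
print(compatibility_score(nb, "i", "j1"))  # ~0.7 = 1/2 + 1/5
```

Here δ(i, j1, γ) = 5 − 3 = 2 (three of γ's five edges reach neighbors of j1) and δ(j1, i, γ) = 5 − 0 = 5, giving 1/2 + 1/5 = 0.7.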
4.4. Time complexity analysis
In this subsection, the time complexity of finding the class membership edge of a test node is analyzed. The main component in finding the correct link is the local similarity measure λ which is used for link prediction. These local measures find the score in time proportional to the degrees of their source and destination nodes. In CULP, the source node i belongs to V_u and the destination node j belongs to V_c. So the first step in analyzing the time of finding a class membership edge is finding the average degrees of nodes in V_u and V_c.

The degree of node j is the number of labeled nodes connected to it - more specifically n_j, the number of data points with the class of node j. For the degree of i, however, a more detailed analysis is needed. As stated before, in any undirected graph Eq. (1) holds, and Eq. (1) can be rewritten as the following:
    |E| = (1/2) [ Σ_{i ∈ V_c} |Γ(i)| + Σ_{i ∈ V_l} |Γ(i)| + Σ_{i ∈ V_u} |Γ(i)| ]    (13)

Since the degrees of the class nodes sum up to the number of labeled data points n, this can be substituted in the above equation; on the other hand, if we treat each node in V_u as having average degree D, we can state that nodes in V_l would have an average degree of D + 1 (since each of them also has a membership edge). Using all these, the above formula can be rewritten in the following manner:

    |E| = (1/2) (n + n(D + 1) + mD) = n + nD/2 + mD/2    (14)
As stated before, the number of edges in a LEG is bounded above and below as derived in Eq. (9). Now, using Eqs. (14) and (9), the upper bound on D can be obtained from:

    n + nD/2 + mD/2 = n + k(n + m)  ⟹  D = 2k    (15)

and its lower bound from:

    n + nD/2 + mD/2 = n + (k/2)(n + m)  ⟹  D = k    (16)
Consequently, the average degree of labeled and unlabeled nodes is O(k), and that of class nodes is O(n). Common Neighbors, Adamic-Adar and Resource Allocation all have the complexity of finding the common neighbors between the source and destination, which is the intersection of the neighborhoods of the two nodes. The Compatibility Score, however, first finds the common neighbors and then performs two intersections for each of the nodes in the common neighbor set.

If done efficiently, the intersection of two sets with sizes a and b can be obtained in O(min(a, b)) on average. Using this, the complexity of finding the score in a LEG for the formation of a link between i and j is O(k) when Common Neighbors, Adamic-Adar or Resource Allocation is used, and O(k^2) when the Compatibility Score is used. Since k is usually small (in our experiments 1 ≤ k ≤ 35), it is safe to state that the link prediction is done in constant time; also, as there are C nodes in V_c, predicting the labels of m instances takes O(mC) time after creating the LEG.
Fig. 2. Toy example demonstrating CULP. A: The set of data belonging to 2 classes and a test point in red. B: LEG graph of the data.
4.5. Toy example

In this subsection, a simple classification problem is solved using CULP to demonstrate the steps involved in this algorithm. The data is presented in Fig. 2-A as two classes. The white points represent the data of class 1, and the dark points belong to class 2. The problem is finding the correct label of the red point (point i).

The first step is choosing a similarity function s and a value for the parameter k for forming the graph. Here we chose k = 2 and the Euclidean similarity (discussed in the preliminaries section).

Now the node sets can be defined as V_c = {j1, j2}, V_u = {i}, and all the other points form the set V_l. By creating the edges in E_c and E_s as shown in Algorithm 2, the LEG in Fig. 2-B can be derived. As can be seen, in this graph every node except for i is connected to one of the class nodes j1 and j2 (the white nodes) by dotted links, and the black links represent the edges of E_s.

Looking at the graph, it can be seen that node i is connected to nodes a, b and c. This means these nodes will assist in finding the label for node i. Using these nodes, the scores for the edges (i, j1) and (i, j2) can be obtained with each of the scores discussed before as λ. The results of computing these scores are depicted in Table 1. The results of all the link predictors in Table 1 show that the score for the link (i, j2) is higher. This prediction matches the pattern perceived by looking at the data in Fig. 2-A and is the correct prediction.
Table 1
Scores computed by 4 different link predictors for the toy example of Fig. 2.

    λ     (i, j1)       (i, j2)        Prediction
    CN    1             2              2
    AA    1/log(4)      2/log(3)       2
    RA    1/4           2/3            2
    CS    1/2 + 1/4     2(1/2 + 1/3)   2
5. CULM extension

As we stated in the time complexity analysis subsection, and as demonstrated in the toy example of the previous section, once the LEG structure is formed, the prediction of links can be done almost instantly. Knowing this, and the fact that there are different options for choosing the link predictor λ, the question arises: why not use all of our predictors and somehow combine their predictive capabilities to find the best membership link for a test node?

The next question arises after we analyze the related works done in the field of classification using complex network representations. A good portion of these methods are capable of incorporating or exploiting the low level features of the data to enhance the classification performance. How can we modify our framework CULP to exploit the low level features of the data as well as the high level features?

The answer to both of these questions lies in our extension to the CULP algorithm, which we call the CULM extension. CULM increases the predictive capabilities of CULP by using a weighted majority vote procedure (hence the M, as in Majority, at the end instead of P).
Instead of using only one link predictor λ, we will use an array of link predictors Λ. Each link predictor λ, when used, gives a score to the links (i, j) for all j ∈ V_c. We can use all of these scores to estimate the probability p of our prediction's correctness as Eq. (17):

    p_ŷ = λ_{i,j*} / Σ_{j ∈ V_c} λ_{i,j}    (17)

In this equation, ŷ is the label corresponding to j*, and j* is computed using Eq. (10) of the previous section. Using Eq. (17), we can assign a confidence to the prediction of λ. When using multiple predictors, it is obvious that a λ with higher confidence is more reliable. We are going to use these probabilities to assign weights to each of the λs in Λ. This way, instead of a simple majority vote, a weighted voting procedure can be used. In a weighted majority vote procedure, several predictions are aggregated. Each of these predictions has an individual weight which states the value of its vote; finally, the voting in this setting is done as in Algorithm 4.
Algorithm 4: Weighted Majority Voting Algorithm.

    function Vote(Y, W)
        L ← {0}^C
        for y ∈ Y and w ∈ W do
            L_y ← L_y + w
        end for
        ŷ ← argmax(L)
        return ŷ
    end function
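Algorithm 4 can be sketched in a few lines of Python (labels assumed to be 1..C, as in the LEG numbering):

```python
def weighted_vote(Y, W, C):
    """Algorithm 4 sketch: accumulate each predictor's weight on its
    predicted label and return the label with the largest total weight."""
    L = [0.0] * (C + 1)            # L[1..C]; index 0 unused
    for y, w in zip(Y, W):
        L[y] += w
    return max(range(1, C + 1), key=lambda c: L[c])

# Three predictors vote 2, 1, 2 with weights 0.3, 0.5, 0.4:
# label 2 accumulates 0.7 > 0.5, so it wins despite 1's single heavy vote.
print(weighted_vote([2, 1, 2], [0.3, 0.5, 0.4], C=2))  # 2
```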
In Algorithm 4, Y is the set containing the predicted labels of each of the predictors, W is the respective weights of the labels, and L is a set with C elements which keeps track of the weight for each of the classes. Using this algorithm enables us to not only use multiple link predictors' predicted labels, but also to incorporate any classical classifier ψ with suitable weights. This way the low level features of the data are exploited as well.
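As a concrete illustration, the weighted vote of Algorithm 4 can be sketched in a few lines of Python (the function and variable names here are ours, not the authors'):

```python
from collections import defaultdict

def vote(labels, weights):
    """Weighted majority vote: return the label whose votes carry
    the largest total weight (a sketch of Algorithm 4)."""
    totals = defaultdict(float)   # plays the role of L: total weight per class
    for y, w in zip(labels, weights):
        totals[y] += w
    return max(totals, key=totals.get)
```

For example, `vote(["a", "b", "a"], [0.2, 0.5, 0.2])` returns `"b"`: the single high-confidence vote for `"b"` (weight 0.5) outweighs the two low-confidence votes for `"a"` (total weight 0.4).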
The next step is to define the weights for each of our predictors and for ψ. If ŷ_λ is the predicted label of the predictor λ for the unlabeled data x^{(u)} and p^λ_ŷ is the probability of this prediction, the weight of predictor λ for x^{(u)} can be defined as Eq. (18). Also, for the prediction of ψ on x^{(u)}, which can be denoted as ŷ_ψ, we can define the weight as Eq. (19).

w_{\hat{y}}^{\lambda} = \alpha \, \frac{p_{\hat{y}}^{\lambda}}{\sum_{\lambda'} p_{\hat{y}}^{\lambda'}} \qquad (18)

w_{\hat{y}}^{\psi} = 1 - \alpha \qquad (19)

Here the sum in Eq. (18) runs over all link predictors in use. The α parameter, which appears in both equations, is provided by the user. This parameter controls the trade-off that CULM will make between the link predictors' labels and the prediction of the low level classifier.
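Computing these weights from the predictors' confidences is straightforward; a minimal Python sketch (names are ours, assuming the confidences of Eq. (17) have already been computed):

```python
def culm_weights(confidences, alpha):
    """Vote weights per Eqs. (18) and (19): each link predictor gets
    alpha times its normalized confidence, and the low level classifier
    psi gets the remaining 1 - alpha."""
    total = sum(confidences)
    predictor_weights = [alpha * p / total for p in confidences]
    classifier_weight = 1.0 - alpha
    return predictor_weights, classifier_weight
```

By construction the predictor weights sum to α and the classifier weight is 1 − α, so the full vote mass is 1 and α directly controls the trade-off.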
The parameter α is chosen in the range 0 to 1; however, any value below 0.5 would result in neutralizing the vote of the CULM predictors. Also, if α = 1, the prediction is completely done by the CULM predictors and the low level classifier is ignored; so in general it can be stated that 0.5 ≤ α ≤ 1.

Now the CULM extension can be formally defined as the procedure captured in Algorithm 5. In this algorithm, after creating
Algorithm 5 CULM Algorithm.
function CULM(X, X^{(u)}, y, s, k, predictors, ψ, α)
    G ← LEG(X, X^{(u)}, y, s, k)
    ŷ ← {}
    for i in V_u do
        P ← {}
        Ŷ ← {ψ(X^{(u)}_i)}
        W ← {1 − α}
        for λ in predictors do
            j^* ← argmax_{j ∈ V_c}(λ_{i,j})
            P ← P ∪ {λ_{i,j^*} / Σ_{j ∈ V_c} λ_{i,j}}
            Ŷ ← Ŷ ∪ {j^* − (n + m)}
        end for
        for p ∈ P do
            W ← W ∪ {α × p / Σ_{p' ∈ P} p'}
        end for
        ŷ_i ← VOTE(Ŷ, W)
    end for
    return ŷ
end function
the LEG, each of the predictors produces a label and a probability. These probabilities and labels are then merged with those of the low level classifier ψ to form Ŷ and W, which are passed to Algorithm 4 to produce the final label for the test instance.
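Putting the pieces together, the per-node body of Algorithm 5 can be sketched as below. This is a hedged simplification (all names are ours): the class-node keys are used directly as labels instead of applying the paper's j^* − (n + m) index shift, and the vote of Algorithm 4 is inlined.

```python
from collections import defaultdict

def culm_predict_one(scores_per_predictor, psi_label, alpha):
    """Label one test node i. scores_per_predictor holds, for each link
    predictor lambda, a dict mapping class node j to the score lambda_{i,j};
    psi_label is the low level classifier's prediction for node i."""
    labels = [psi_label]          # psi votes first ...
    weights = [1.0 - alpha]       # ... with weight 1 - alpha (Eq. 19)
    picks, confs = [], []
    for scores in scores_per_predictor:
        j_star = max(scores, key=scores.get)                 # strongest class link
        picks.append(j_star)
        confs.append(scores[j_star] / sum(scores.values()))  # confidence, Eq. (17)
    total = sum(confs)
    for j_star, p in zip(picks, confs):
        labels.append(j_star)
        weights.append(alpha * p / total)                    # weight, Eq. (18)
    tally = defaultdict(float)                               # Algorithm 4, inlined
    for y, w in zip(labels, weights):
        tally[y] += w
    return max(tally, key=tally.get)
```

With α = 1 the low level classifier is ignored entirely, and with α = 0.5 its vote can at best tie the predictors', matching the 0.5 ≤ α ≤ 1 range stated above.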
As analyzed, the time complexity of predicting the labels of m instances using CULP is O(mC). CULM essentially repeats the prediction l times, with l being the number of link predictors, and then uses a majority vote. The prediction complexity is O(lmC) and the voting has a complexity of O(l); therefore, we can identify the CULM time complexity to be O(lmC + l + O(ψ)), with the O(ψ) part being the complexity of the low level classifier. Clearly, the overall running time of CULM can differ greatly depending on the classifier used.
Table 2
Datasets used in deriving the results for CULP and CULM.
Dataset Instances Attributes Classes
Zoo 101 16 7
Hayes 132 4 3
Iris 150 4 3
Teaching 151 5 3
Wine 178 13 3
Sonar 208 60 2
Image 210 19 7
Glass 214 9 6
Thyroid 215 5 3
Ecoli 336 7 8
Libras 360 90 15
Balance 625 4 3
Pima 768 8 2
Vehicle 846 18 4
Vowel 990 10 11
Yeast 1,484 8 10
RedWine 1,599 11 6
Segment 2,100 19 7
Optical 5,620 64 10
Poker 25,010 10 10
6. Experimental results
In this section, we present the results of our proposed algorithms CULP and CULM on 20 different real datasets and compare them to classical classification methods as well as to the best classifiers of the related works in the domain of classification using complex networks.

The datasets used for our experiments are all obtained from the UCI machine learning repository [19]. These datasets include Zoo, Hayes-Roth (Hayes), Iris, Teaching Assistant Evaluation (Teaching), Wine, Sonar Mines vs. Rocks (Sonar), Image Segmentation training set (Image) and testing set (Segment), Glass Identification (Glass), Thyroid Disease (Thyroid), Ecoli, Libras Movement (Libras), Balance Scale (Balance), Pima Indians Diabetes (Pima), Statlog Vehicle Silhouettes (Vehicle), Vowel Recognition (Vowel), Yeast, Wine Quality Red (RedWine), Optical Recognition of Handwritten Digits (Optical), and Poker Hand (Poker). Each of these datasets, along with its number of instances, attributes and classes, is listed in Table 2.
6.1. CULP analysis
The reason behind choosing these datasets is the variety of both structure and domain among them. The size of these datasets ranges from 101 to 25,010 instances, which tests the practicality of our algorithms on both small and large datasets; the number of attributes varies from 4 to 90, which tests the proposed algorithms against both low and high dimensional datasets; and finally, there is a lot of variety in the number of classes, which ranges from 2 up to 15.
This section is organized as follows: first, the experiment on CULP with different predictors as λ is presented; after that, the CULM algorithm is analyzed with 3 different low level classifiers; the following subsection discusses the effects of the α parameter; after that, a comparison of CULP and CULM with classical classifiers is demonstrated; and finally, CULP and CULM are compared against all the classical approaches and the similar works on classification using complex networks.
As the first experiment, different link predictors are used in CULP to compare the performance of each one on the datasets. For this experiment, the predictor λ is one of CN, AA, RA and CS, which are respectively defined in Eqs. (2), (3), (4) and (12).
The parameters used in CULM are k (1 ≤ k ≤ 35), λ (the link predictor, which is Common Neighbors, Resource Allocation, Adamic-Adar or Compatibility Score), the vector similarity function s, and α (0.5 ≤ α ≤ 1). For each link predictor and each dataset, the parameters are tuned. This tuning is done via a 10-Fold Cross Validation procedure. After finding the best parameters, 30 runs of 10-Fold Cross Validation are done, which amounts to a total of 300 runs. Table 3 captures the results obtained by these settings. In each cell of Table 3, the first number is the mean accuracy of the runs and the second number is their standard deviation. The number in the parentheses represents the best k obtained for each cell, and the bold cells are the best results obtained on a dataset.
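The tuning loop described above can be reproduced with a plain k-fold splitter plus a grid search over candidate parameters. The sketch below is generic and ours, not the authors' code: `evaluate` is a hypothetical placeholder standing in for training and scoring CULP on one split.

```python
import random

def kfold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-Fold Cross Validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        test = folds[f]
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, test

def tune_k(n_samples, candidate_ks, evaluate):
    """Pick the neighborhood size k with the best mean CV accuracy.
    `evaluate(train, test, k)` is a placeholder returning an accuracy."""
    best_k, best_acc = None, -1.0
    for k_param in candidate_ks:
        accs = [evaluate(train, test, k_param)
                for train, test in kfold_indices(n_samples)]
        mean_acc = sum(accs) / len(accs)
        if mean_acc > best_acc:
            best_k, best_acc = k_param, mean_acc
    return best_k
```

Re-running the outer loop 30 times with different shuffling seeds and averaging gives the 300-run mean and standard deviation reported per cell.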
As can be seen in Table 3, the Compatibility Score achieved the best results among the predictors; this is due to the fact that CS exclusively got the highest accuracy on the 6 datasets of Glass, Libras, Balance, Pima, Yeast and RedWine. In second place is the Resource Allocation Index, which obtained the top accuracy for Zoo, Iris, Ecoli, Optical and Poker exclusively, and achieved an identical best accuracy with the Adamic-Adar Score on the Vowel dataset. The third best predictor is Common Neighbors, with the 5 datasets of Hayes, Teaching, Sonar, Thyroid and Vehicle on top; and finally Adamic-Adar is best for Wine, Image and Segment, plus the shared best result with RA on Vowel.
Analyzing the ks in this experiment, we can see that for the 10 datasets of Zoo, Hayes, Iris, Teaching, Wine, Image, Thyroid, Libras, Vehicle and Poker, the best k is identical for each predictor on a dataset; in Balance and Pima, however, the ks are noticeably different, with Common Neighbors having the highest k in both of them. In the rest of the datasets, the choice of k among different predictors differs by at most 1 (for Yeast it is 2).
6.2. CULM analysis
As the next experiment, the CULM algorithm is run on each of the datasets. The parameter α is tuned over the set {0.6, 0.7, 0.8, 0.9, 1}. Values below 0.6 for α are not used, in order to keep the results and comparisons fair (as stated before, any value below 0.5 for α zeros the effect of the CULP predictors, and experimentally the same holds for α = 0.5); this way we are sure that the link predictors are not completely overshadowed by the low level classifier. The other parameters of the algorithm are tuned as before, and again each cell is the result of 300 runs.
For a low level classifier to accompany the link predictors in CULM, three different algorithms have been chosen and used. These low level classifiers are LDA (Linear Discriminant Analysis), CART (Classification And Regression Trees) and multi-class SVM (Support Vector Machine) with RBF kernel.
Table 4 captures the results of this experiment. The first column is the best result for each of the datasets using CULP (Table 3); the next three columns are the results of CULM with respectively LDA, CART and SVM as ψ, and in each of the cells in these columns the numbers in parentheses represent the k and α used in the runs. The last column in this table represents the accuracy gain achieved by using CULM instead of CULP. Each of the numbers in this column is obtained by comparing the best result obtained by CULM with the best result obtained by CULP for each dataset.
Looking at Table 4, it is clear that on the Thyroid dataset using CULM achieved no change in the accuracy, and on the Iris and Optical datasets the accuracy deteriorates; however, on the other 17 datasets CULM achieved a higher result.
CULM with SVM as its low level classifier achieved the best results on the 6 datasets of Sonar, Libras, Balance, Vowel, RedWine and Poker exclusively, and shares the best result on Thyroid with CULM-LDA and CULP. As the next best classifiers we have both CULM-CART and CULM-LDA with exclusively 5 best accuracy