Contents lists available at ScienceDirect
Physics Letters A
www.elsevier.com/locate/pla
Label propagation algorithm for community detection based on node importance and label influence
Xian-Kun Zhang 1, Jing Ren, Chen Song, Jia Jia, Qian Zhang
College of Computer Science and Information Engineering, Tianjin University of Science and Technology, No. 29, 13th Avenue, TEDA, Binhai New Area, Tianjin, 300457, PR China
Article info
Article history:
Received 28 March 2017
Received in revised form 9 June 2017
Accepted 10 June 2017
Available online xxxx
Communicated by C.R. Doering
Keywords:
Large-scale social network
Community detection
Label propagation
Node importance
Label influence
Abstract
Recently, the detection of high-quality communities has become a hot spot in social network research. The label propagation algorithm (LPA) has attracted wide attention because it has linear time complexity and does not require an objective function or the number of communities to be defined in advance. However, LPA suffers from uncertainty and randomness in the label propagation process, which affects the accuracy and stability of the detected communities. For large-scale social networks, this paper proposes a novel label propagation algorithm for community detection based on node importance and label influence (LPA_NI). Experiments against comparative algorithms on real-world and synthetic networks show that LPA_NI can significantly improve the quality of community detection and shorten the iteration period. It also achieves better accuracy and stability at similar complexity.
© 2017 Elsevier B.V. All rights reserved.
1. Introduction
A social network is a relatively stable system of social relationships formed through the interactions between individual members. A social network is commonly abstracted into a graph, in which nodes represent individual members and edges (or links) represent the relationships between them [1]. A community is a subgraph containing nodes which are more densely linked to each other than to the rest of the graph or, equivalently, a graph has a community structure if the number of links within any subgraph is higher than the number of links between those subgraphs [24]. Given a network, the process of finding its community structure is called community detection, and it is an important part of social network analysis. In recent decades, many community detection algorithms for social networks have been proposed. According to the solution strategy, these algorithms can be divided into optimization methods and heuristic methods [2]. Optimization methods achieve community detection by setting an objective function and iterating to approximate its optimal value. Representative optimization methods include spectral algorithms and modularity maximization algorithms [3–9]. Spectral algorithms [3,4] transform the community identification problem into a simple quadratic optimization problem, and the approximate optimal partition of the network is obtained by solving for the eigenvectors of the Laplacian matrix. Modularity maximization methods seek the maximum value of the modularity function (Q function) of the network; representative algorithms include the FN algorithm [5], the GA algorithm [6] and the EO algorithm [7]. Heuristic methods find the optimal division of communities by setting heuristic rules; representative algorithms include the GN algorithm [10] and the WH algorithm [11].
E-mail address: [email protected] (X.-K. Zhang).
1 Fax: 8602260274450.
In addition, there are other effective community detection algorithms. For example, hierarchical clustering algorithms regard the social network as a composition of multi-layer communities and use traditional hierarchical clustering to carry out community detection. Evolutionary computation algorithms regard community detection as the optimization of an objective function and use evolutionary computation methods to find an approximately optimal solution to the problem.
These algorithms generally have high time and space complexity, and they are not suitable for detecting structure in large-scale social networks. In 2007, Raghavan et al. [12] proposed LPA, a fast community detection algorithm based on label propagation. The time complexity of LPA is nearly linear and the design of the algorithm is simple, so LPA is efficient on large-scale networks. Moreover, LPA requires neither a predefined objective function to optimize nor prior knowledge of the number and size of communities. For these reasons LPA has received considerable attention from numerous scholars. However, in the process of updating the node labels, LPA suffers from uncertainty and randomness, which leads to poor accuracy and stability of the algorithm. Literature [19] takes into account the influence of node degree and edge weight during the label updating process. But in large-scale social networks, such as the micro-blog social network, the influence of the nodes' prior attributes on node importance is not considered.
http://dx.doi.org/10.1016/j.physleta.2017.06.018
0375-9601/© 2017 Elsevier B.V. All rights reserved.
Fig. 1. Label propagation process (Raghavan et al. [12]).
In this paper, we propose a novel label propagation algorithm for community detection based on node importance and label influence in networks (LPA_NI). Firstly, LPA_NI uses a novel method to evaluate node importance. Secondly, in the iterative process, LPA_NI fixes the order in which node labels are updated to the descending order of node importance. When more than one label attains the maximum count among a node's neighbors, LPA_NI calculates the influence of each such label and selects the most influential one to update the node label. Finally, experiments against comparative algorithms on real-world and synthetic networks show that LPA_NI can significantly improve the quality of community detection and shorten the iteration period. It also achieves better accuracy and stability at similar complexity.
The rest of the paper is organized as follows. Section 2 introduces the label propagation algorithm and the node importance calculation algorithm based on Bayesian network. Section 3 introduces the main idea and the detailed process of our algorithm. The experimental results on real-world and synthetic networks in Section 4 confirm the effectiveness of the algorithm. The conclusion is given in Section 5.
2. Related work
A complex network can be modeled as a graph G = (V, E), where V = {v1, v2, ..., vn} represents the set of nodes, E = {e1, e2, ..., em} represents the set of edges between the nodes, and n and m represent the number of nodes and edges in the network, respectively. Each edge in E corresponds to a pair of nodes in V. The label of vi is denoted as ci, N(i) represents the neighborhood set of vi, and d(i) is the degree of node i.
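The notation above maps naturally onto an adjacency-list representation. The following is a minimal sketch in Python; the helper names `build_graph`, `N` and `d` and the toy edge list are illustrative assumptions, not part of the original paper.

```python
from collections import defaultdict

def build_graph(edges):
    """Return {node: set(neighbors)} for an undirected edge list E."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

# Hypothetical toy network with n = 4 nodes and m = 4 edges.
edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
adj = build_graph(edges)
n, m = len(adj), len(edges)

def N(i):
    """Neighborhood set N(i)."""
    return adj[i]

def d(i):
    """Degree d(i)."""
    return len(adj[i])
```

With this layout, looking up N(i) and d(i) takes constant time, which is what the near-linear running time of label propagation relies on.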
2.1. Label propagation algorithm for community detection in networks
Fig. 2. The oscillation of labels in a bipartite graph (Xing et al. [19]).
LPA is a heuristic algorithm whose main heuristic rule is to transfer label information constantly between nodes. After several iterations, the labels of nodes belonging to the same community converge. The main idea of LPA is that each node changes its label to the one carried by the largest number of its adjacent nodes (as shown in Fig. 1). The process of LPA can be described by the following steps:
Input: Network G = (V, E), maximum number of iterations maxIter.
Output: Community set Setc = {c1, ..., ck}, where k is the number of communities.
(1) Initialize a unique label for each node in the network. For a given node x, cx(0) = x.
(2) Set the iteration number t = 1.
(3) Arrange the nodes of the network in random order, and generate an ordered sequence X = {x1, x2, ..., xn}.
(4) For each node x ∈ X, iteratively update the node label according to formula (1) or (2). In the iterative process, each node changes its label to the one carried by the largest number of its adjacent nodes. When several labels attain the maximum count, LPA randomly selects one of them to update the node label. Label updating methods can be divided into synchronous and asynchronous updating. Synchronous updating updates the label of node x in the tth iteration based on the labels of its adjacent nodes at the (t−1)th iteration. The formula is as follows:
$$c_x(t) = f\big(c_{x_1}(t-1), \ldots, c_{x_k}(t-1)\big), \quad x_i \in N(x), \tag{1}$$
where cx(t) represents the label of node x in the tth iteration, and N(x) represents the neighborhood set of node x. Synchronous updating may cause label oscillation in bipartite or nearly bipartite networks (as shown in Fig. 2), and asynchronous updating is a good solution to this problem [12]. Asynchronous updating updates the label of node x in the tth iteration based partly on the labels of those adjacent nodes that have already been updated in the current iteration (their labels at the tth iteration) and partly on the labels of those that have not yet been updated (their labels at the (t−1)th iteration). The formula is as follows:
$$c_x(t) = f\big(c_{x_1}(t-1), \ldots, c_{x_m}(t-1), c_{x_{m+1}}(t), \ldots, c_{x_k}(t)\big), \quad x_i \in N(x). \tag{2}$$
(5) If the iteration number t = maxIter or the label of each node is the same as that of most of its neighboring nodes, then the nodes with the same label are placed in the same community and the algorithm ends; otherwise, set t = t + 1 and go to step (3).
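Steps (1) to (5) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `lpa`, `top_labels`, `synchronous` and `seed` are hypothetical, the graph is an adjacency dictionary, and ties are broken uniformly at random. The `synchronous` flag switches between formula (1) and formula (2).

```python
import random
from collections import Counter

def top_labels(adj, labels, x):
    """Labels carried by the largest number of neighbors of x."""
    counts = Counter(labels[y] for y in adj[x])
    if not counts:                       # isolated node keeps its own label
        return {labels[x]}
    best = max(counts.values())
    return {l for l, c in counts.items() if c == best}

def lpa(adj, max_iter=100, synchronous=False, seed=0):
    rng = random.Random(seed)            # seed only makes runs repeatable
    labels = {x: x for x in adj}         # step (1): c_x(0) = x
    for _ in range(max_iter):            # steps (2) and (5): t = 1..maxIter
        order = list(adj)
        rng.shuffle(order)               # step (3): random update order
        # Synchronous updating reads a frozen copy (labels at t-1), formula (1);
        # asynchronous updating reads the live labels, formula (2).
        source = dict(labels) if synchronous else labels
        for x in order:                  # step (4): adopt a most frequent label
            labels[x] = rng.choice(sorted(top_labels(adj, source, x)))
        # Stop once every node carries a most frequent label of its neighbors.
        if all(labels[x] in top_labels(adj, labels, x) for x in adj):
            break
    return labels

# A single edge is the smallest bipartite graph.
pair = {0: [1], 1: [0]}
async_labels = lpa(pair, max_iter=10)
sync_labels = lpa(pair, max_iter=10, synchronous=True)
```

On the single-edge graph, the synchronous variant swaps the two labels in every iteration and never converges, while the asynchronous variant agrees on one label after the first pass, which is the oscillation effect described for Fig. 2.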
A label propagation process is shown in Fig. 1. Firstly, LPA initializes all the nodes in the graph, and the 5 nodes are assigned unique labels: "a", "b", "c", "d" and "e". Secondly, the algorithm randomly selects node 3 to update its label; because every label among the neighbors of node 3 occurs only once (the maximum count is 1), node 3 randomly selects one of them and updates its label to "a". Then, the algorithm randomly selects node 5 to update its label; because two neighbors of node 5 share label "a" and only label "a" attains the maximum count of 2, node 5 updates its label to "a". Finally, the labels of all nodes have been updated to "a", and the label propagation algorithm ends.
Fig. 3. The social network sample graph.
Although asynchronous updating can avoid the label oscillation problem, the above steps show that the uncertainty and randomness of LPA stem mainly from the random update sequence and the uncertain label choice in step (4), which affect the accuracy and stability of LPA. We analyze traditional LPA with the social network sample graph in Fig. 3. There are 2 communities in the network, {1,2,3} and {4,5,6}. The 6 nodes in the graph are initialized with labels "a", "b", "c", "d", "e" and "f", respectively. Assume that the labels of nodes 1, 2, 3 have been updated to "a", while the labels of nodes 4, 5, 6 are still "d", "e", "f". If we randomly select node 4 to update and label "a" is chosen as its label, then update node 6, and finally update node 5, all nodes in the network will be updated to label "a" and grouped into a single community. Conversely, if node 4 is updated to label "f", and we then update node 5 and finally node 6, the network will be divided into 2 communities, {1,2,3} and {4,5,6}, which matches the correct communities.
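The two outcomes described above can be replayed with a short Python sketch. Fig. 3 is not reproduced in this text, so the edge list `EDGES` is an assumption: it is a reconstruction chosen to be consistent with the neighbor counts and label tuples quoted elsewhere in the paper, and the helper names `update` and `run` are hypothetical. The random tie-break at node 4 is replaced by an explicit `forced` choice so that both branches can be shown deterministically.

```python
from collections import Counter

# ASSUMPTION: reconstructed edge list for the six-node sample graph.
EDGES = [(1, 2), (1, 3), (2, 3), (1, 4), (3, 6), (4, 5), (4, 6), (5, 6)]
ADJ = {}
for u, v in EDGES:
    ADJ.setdefault(u, []).append(v)
    ADJ.setdefault(v, []).append(u)

def update(labels, x, forced=None):
    """Give x a most frequent neighbor label; `forced` stands in for a random tie-break."""
    counts = Counter(labels[y] for y in ADJ[x])
    best = max(counts.values())
    tied = sorted(l for l, c in counts.items() if c == best)
    if forced is not None:
        assert forced in tied            # must be a legal outcome of the tie
        labels[x] = forced
    else:
        assert len(tied) == 1            # no tie expected in the later updates
        labels[x] = tied[0]

def run(order, first_choice):
    # State from the text: nodes 1-3 already carry "a", nodes 4-6 keep their own labels.
    labels = {1: "a", 2: "a", 3: "a", 4: "d", 5: "e", 6: "f"}
    update(labels, order[0], forced=first_choice)
    for x in order[1:]:
        update(labels, x)
    return labels

merged = run([4, 6, 5], "a")   # node 4 happens to pick "a": one community
split = run([4, 5, 6], "f")    # node 4 happens to pick "f": two communities
```

Under the assumed topology, a single random choice at node 4 decides whether LPA returns one community or the correct two, which is the instability that motivates a deterministic update order and tie-break.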
In recent years, many scholars have improved the standard LPA algorithm. Steve et al. [13] extended the standard LPA algorithm and proposed COPRA, an algorithm for detecting overlapping community structure by allowing each node to retain a number of community labels. Barber et al. [14] proposed a constrained label propagation algorithm, LPAm, which recast LPA as an optimization problem with a corresponding objective function H, so that it could address the clustering performance of LPA. Leung et al. [15] found that after five iterations of LPA, 95% of the nodes had been correctly clustered; after that, the iterations mainly updated the labels of nodes within communities. Therefore, they improved the updating rules and the iterative rules of LPA. Zhang et al. [16] proposed the LPAc algorithm based on the edge clustering coefficient, which improved the label updating process by calculating the edge clustering coefficient. Literature [17] used local cycles to address the randomness in the label updating process.
However, these algorithms do not take into account the importance of nodes and the influence of different labels. In large-scale social networks, the prior properties of different nodes determine node importance, and node importance in turn determines the influence of the label a node carries. Taking the influence of different labels into account therefore removes the random rules of the label propagation process and can significantly improve the accuracy and stability of the algorithm. Yao et al. [19] proposed NIBLPA, a label propagation algorithm based on the k-shell value of nodes, which chooses the label with the maximum k-shell value to update the node label. But NIBLPA does not consider the prior properties of nodes in social networks, such as the number of fans of a user node, the number of micro-blogs the user has released, and the number of times the user's micro-blogs have been forwarded in a micro-blog social network. Therefore, the accuracy of its node importance calculation needs to be improved.
2.2. Node importance calculation algorithm based on Bayesian network
There are many algorithms to calculate the importance of nodes, such as degree centrality [20], clustering coefficient centrality [21], and betweenness centrality [22]. However, degree centrality and clustering coefficient centrality can only describe the local information of networks, and the time complexity of betweenness centrality is too high, so it is only suitable for small networks. For large-scale social networks, Zhang et al. [23] proposed a user node importance calculation algorithm based on Bayesian network, which takes into account that messages in micro-blog social networks are highly redundant, spread quickly and are highly time-sensitive. Firstly, the algorithm analyzes users' behavior patterns in the micro-blog social network in depth. Secondly, it uses a Bayesian model to learn the prior probability of the properties of user nodes, and identifies important users manually. Thirdly, it uses expert knowledge to obtain the prior probability of each property, and studies the attribute values of the nodes with major influence. Finally, it establishes a Bayesian network model to describe the relationship between user properties and influence, which can normalize the importance of all the nodes in the network. The algorithm can be described by the following steps:
(1) Obtain the user's basic information, which includes the total number of fans, the total number of micro-blogs, the number of concerns, the number of collections and the creation time. Moreover, obtain the user's micro-blog information, friend relationships and micro-blog forwarding relations.
(2) Based on the user's basic information, calculate the user's accumulative number of forwarded micro-blogs, accumulative number of evaluated micro-blogs, accumulative number of praised micro-blogs, average number of forwarded micro-blogs, average number of evaluated micro-blogs, average number of praised micro-blogs and activity, and classify the micro-blogs by their content.
(3) Manually identify the important user nodes from different fields in the micro-blog social network, then obtain the thresholds that measure the influence of the various properties by comparing the identified important users with the users in the dataset, and finally establish the user attribute-influence probability formula.
(4) Calculate the total user influence by multiplying the influence of each attribute factor. The formula is as follows:
$$P(\mathrm{Inf}) = \prod_{\mathrm{Attr}} P(\mathrm{Inf} \mid \mathrm{Attr}), \tag{3}$$
where P(Inf) represents the influence of the user, and P(Inf|Attr) represents the influence of each attribute of the user.
(5) Normalize the influence of all the users in the dataset to obtain the importance of each node.
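Steps (4) and (5) can be illustrated with a small Python sketch. The per-attribute probabilities below are made-up numbers, the names `user_influence` and `normalize` are hypothetical, and dividing by the largest influence is only one possible normalization: the text does not state which normalization is used.

```python
import math

def user_influence(attr_probs):
    """Formula (3): multiply the per-attribute influence values P(Inf | Attr)."""
    return math.prod(attr_probs)

def normalize(raw):
    """ASSUMPTION: scale so that the most influential user has importance 1."""
    top = max(raw.values())
    return {user: value / top for user, value in raw.items()}

# Hypothetical P(Inf | Attr) values for two users and two attributes.
attr_probs = {"u1": [0.9, 0.8], "u2": [0.5, 0.4]}
raw = {user: user_influence(p) for user, p in attr_probs.items()}
importance = normalize(raw)
```

The sketch assumes at least one user has non-zero influence; otherwise the division in `normalize` would fail.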
3. Proposed method
Many improved label propagation algorithms determine the label updating order of nodes randomly in the label propagation process. Because these algorithms do not take into account that node importance may affect the label updating process, less important nodes may in turn affect more important ones, which is called the "countercurrent" phenomenon in the label propagation process. Besides, many algorithms use only the number of labels to measure label influence in the label selection stage, and ignore the effect of node influence on label selection. In this section, we propose a novel label propagation algorithm for community detection based on node importance and label influence in networks (LPA_NI). Firstly, LPA_NI uses a novel method for node importance evaluation. Secondly, in the iterative process, LPA_NI updates the nodes in descending order of importance. When several labels attain the maximum count, LPA_NI calculates the influence of each of them and selects the most influential label to update the node label.
3.1. The basic idea
Node importance is used to measure the influence of a node in the whole network, and in this paper we use the node importance calculation algorithm based on Bayesian network to normalize the importance of all the nodes in the network. In general, the more important a node is, the greater its impact on other nodes, so its label is more likely to spread. But normalized importance based on the prior attributes of a node is not enough: because nodes in the network are closely linked to each other, a node linked with more important nodes is itself more important. Therefore, we propose a novel node importance calculation method based on the earlier node importance algorithm. The formula is as follows:
$$NI(i) = \mathrm{Inf}(i) + \alpha \sum_{j \in N(i)} \frac{\mathrm{Inf}(j)}{d(j)}, \tag{4}$$
where NI(i) represents the importance of node i, Inf(i) represents the prior importance of node i obtained from expert knowledge, α represents a tunable parameter ranging from 0 to 1, which measures the extent of the influence of neighboring nodes on node i, N(i) represents the neighborhood set of node i, and d(j) is the degree of node j.
We choose node importance as the measure of node influence, so we improve stability by fixing the label updating sequence of the nodes to the descending order of node importance.
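Formula (4) and the resulting fixed update order can be sketched in Python as follows. The function name `node_importance`, the three-node path graph and the prior values in `inf` are illustrative assumptions and do not come from the paper's datasets.

```python
def node_importance(adj, inf, alpha):
    """Formula (4): NI(i) = Inf(i) + alpha * sum over j in N(i) of Inf(j) / d(j)."""
    return {
        i: inf[i] + alpha * sum(inf[j] / len(adj[j]) for j in adj[i])
        for i in adj
    }

# Hypothetical path graph 1 - 2 - 3 with made-up prior importance values.
adj = {1: [2], 2: [1, 3], 3: [2]}
inf = {1: 0.6, 2: 0.9, 3: 0.3}

ni = node_importance(adj, inf, alpha=0.5)
# Fixed label updating sequence: descending order of node importance.
order = sorted(adj, key=lambda i: ni[i], reverse=True)
```

In this toy case node 1 ends up more important than node 3 even though both have the same degree, because each neighbor's prior importance is shared out over that neighbor's degree and added to the node's own prior.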
Another factor that affects the stability of LPA is that when several labels attain the maximum count, the original algorithm randomly selects one of them to update the node label. In this paper, we propose a label selection method that uses label influence: we calculate the influence of each label and select the most influential one to update the node label. The formula is as follows:
$$LI(i, l) = \sum_{j \in N^l(i)} \frac{NI(j)}{d(j)}, \tag{5}$$
where LI(i,l) represents the influence of label l on node i, and N^l(i) represents the set of nodes adjacent to node i that carry label l.
As a result, the new label update rule is as follows:
$$c_i = \arg\max_{l \in l_{\max}} LI(i, l), \tag{6}$$
where ci represents the most influential label, and lmax represents the set of labels that attain the maximum count.
More specifically, when several labels attain the maximum count among the adjacent nodes of node i, LPA_NI calculates the influence of those labels and, according to formula (6), selects the most influential one to update node i.
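Formulas (5) and (6) can be sketched as a single selection function. The name `select_label` and the tuple layout are hypothetical. The sample input uses the neighbor labels and importance values of node 6 quoted in the worked example of Section 3.2; the neighbor degrees (3, 3, 2) are an assumption chosen so that NI(j)/d(j) reproduces the quoted influences 0.4377, 0.4567 and 0.4897.

```python
from collections import Counter, defaultdict

def select_label(neighbors):
    """neighbors: list of (label, NI(j), d(j)) for the nodes adjacent to node i."""
    counts = Counter(label for label, _, _ in neighbors)
    best = max(counts.values())
    l_max = {l for l, c in counts.items() if c == best}   # labels with the maximum count
    li = defaultdict(float)
    for label, ni, degree in neighbors:
        if label in l_max:
            li[label] += ni / degree                      # formula (5)
    chosen = max(sorted(l_max), key=lambda l: li[l])      # formula (6)
    return chosen, dict(li)

# Neighbors of node 6 in the worked example (degrees are assumed, see above).
neighbors_of_6 = [("c", 1.3132, 3), ("d", 1.3702, 3), ("e", 0.9793, 2)]
chosen, li = select_label(neighbors_of_6)   # chosen is "e"

# When one label is strictly most frequent, influence plays no role.
majority, _ = select_label([("a", 1.0, 2), ("a", 0.5, 1), ("b", 9.0, 1)])
```

Note that label influence only breaks ties: in the second call, label "a" wins on count even though the single neighbor carrying "b" is far more important.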
3.2. The steps of LPA_NI algorithm
Based on the novel node importance calculation method and the label influence calculation method, we design a novel label propagation algorithm for community detection based on node importance and label influence (LPA_NI). The specific steps are as follows:
Input: Network G = (V, E), maximum number of iterations maxIter.
Output: Community set Setc = {c1, ..., ck}, where k is the number of communities.
(1) Initialize a unique label for each node in the network. For a given node x, cx(0) = x.
(2) Calculate the node importance of each node according to formula (4), sort the nodes in descending order of NI, and generate an ordered sequence X = {x1, x2, ..., xn}, where NI(x1) ≥ NI(x2) ≥ ... ≥ NI(xn).
(3) Set the iteration number t = 1.
(4) For each node x ∈ X, iteratively update the node label according to formulas (5) and (6). In the iterative process, each node takes as its own label the label of greatest importance among its adjacent nodes. When several labels attain the maximum count among the adjacent nodes of node x, LPA_NI calculates the influence of those labels and, according to formula (6), selects the most influential one to update node x.
(5) If the iteration number t = maxIter or the label of each node is the same as that of most of its neighboring nodes, then the nodes with the same label are placed in the same community and the algorithm ends; otherwise, set t = t + 1 and go to step (4).
The LPA_NI algorithm updates the nodes in descending order of importance, so nodes with greater importance influence nodes with lesser importance, which helps reduce the "countercurrent" phenomenon. Furthermore, LPA_NI considers the effect of label influence on the label selection strategy, so it achieves higher accuracy in the label updating process. LPA_NI thereby improves the accuracy and stability of the algorithm.
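The five steps above can be sketched in Python as follows. This is an illustrative reading of the steps, not the authors' Java implementation: the function name `lpa_ni` is hypothetical, every node is assumed to have at least one neighbor, and the node importance values are passed in rather than computed. The sample run uses the six importance values quoted for the Fig. 3 graph; the edge list is an assumption, reconstructed to be consistent with the neighbor information quoted in the text.

```python
from collections import Counter, defaultdict

def lpa_ni(adj, ni, max_iter=100):
    labels = {x: x for x in adj}                               # step (1)
    order = sorted(adj, key=lambda x: ni[x], reverse=True)     # step (2)

    def tied_labels(x):
        """Labels that attain the maximum count among the neighbors of x."""
        counts = Counter(labels[y] for y in adj[x])
        best = max(counts.values())
        return [l for l, c in counts.items() if c == best]

    for _ in range(max_iter):                                  # steps (3) and (5)
        for x in order:                                        # step (4)
            tied = tied_labels(x)
            if len(tied) > 1:                                  # tie: use label influence
                li = defaultdict(float)
                for y in adj[x]:
                    li[labels[y]] += ni[y] / len(adj[y])       # formula (5)
                tied.sort(key=lambda l: li[l], reverse=True)   # formula (6)
            labels[x] = tied[0]
        if all(labels[x] in tied_labels(x) for x in adj):      # stop test of step (5)
            break
    return labels

# ASSUMPTION: reconstructed edge list for the six-node sample graph.
EDGES = [(1, 2), (1, 3), (2, 3), (1, 4), (3, 6), (4, 5), (4, 6), (5, 6)]
adj = defaultdict(list)
for u, v in EDGES:
    adj[u].append(v)
    adj[v].append(u)
adj = dict(adj)

# Node importance values quoted in the text for alpha = 1.
ni = {1: 1.3302, 2: 0.943, 3: 1.3132, 4: 1.3702, 5: 0.9793, 6: 1.3662}

labels = lpa_ni(adj, ni)
communities = defaultdict(set)
for node, label in labels.items():
    communities[label].add(node)
```

Because the update order and the tie-break are both deterministic, repeated runs on the same input return the same partition, which for this assumed graph is {1, 2, 3} and {4, 5, 6}.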
We implement LPA_NI on the social network sample graph in Fig. 3 with α = 1. The decimals and letters outside the nodes are the node importance and the node labels. Using our node importance calculation method on the graph, the new importance values of nodes 1–6 are 1.3302, 0.943, 1.3132, 1.3702, 0.9793 and 1.3662, respectively, so the node updating sequence is fixed as 4-6-1-3-5-2. The label propagation process is shown in Fig. 4.
Firstly, we update the label of node 6. We use a set of tuples (l, n, LI(l)) to describe the adjacent node information of node 6, where l represents the label, n represents the number of adjacent nodes carrying label l, and LI(l) represents the influence of label l. As shown in Fig. 4(a), node 6 has 3 adjacent nodes with different labels, so the adjacent node information of node 6 is {(c,1,0.4377), (d,1,0.4567), (e,1,0.4897)}. We therefore choose label "e" as the new label of node 6.
Then, we update the label of node 4. After the label of node 6 has been updated, the adjacent node information of node 4 is {(e,2), (a,1)}. Because only one label attains the maximum count among the adjacent nodes of node 4, LPA_NI does not need to calculate the label influence and selects label "e" as the new label of node 4, as shown in Fig. 4(b). The label propagation of node 1 and node 3 proceeds in the same way as that of node 6 and node 4, so the labels of node 1 and node 3 are updated to the new label "b"; the specific process is not demonstrated again.
Finally, the labels in the adjacent node information of node 5 and node 2 are their own labels, so they are not updated, as shown in Fig. 4(c). The label of every node is now the same as that of most of its neighboring nodes, and LPA_NI ends.
In this sample label propagation process, LPA_NI obtains the two communities in a single iteration, which is fully consistent with the correct community division. Since there is no randomness, the LPA_NI algorithm has good stability and accuracy.
3.3. Time complexity analysis
Suppose n = |V| and m = |E|. We first analyze the time complexity of the algorithm.
(1) The time complexity of initializing all nodes in step 1: O(n).
(2) The time complexity of calculating the node importance of all nodes in step 2: O(n), and the time complexity of ranking the nodes in descending order of NI with the quicksort algorithm in step 2: O(n log2(n)), so the whole time complexity of step 2 is O(n log2(n) + n).
Fig. 4. Label propagation process of LPA_NI.
(3) The time complexity of normal label updating: O(m), and the time complexity of recalculating the labels according to formula (5) if necessary: O(m).
(4) The time complexity of assigning the nodes with the same label to a community in step 5: O(n).
Step 4 is iterative, and the maximum number of iterations is maxIter. The time complexity of the whole algorithm is 3 × O(n) + 2 × maxIter × O(m) + O(n log2(n)).
Then we analyze the space complexity of the algorithm.
(1) The space complexity of using an adjacency list to store the node and edge information in step 1: O(n + m).
(2) The space complexity of ranking the nodes with the quicksort algorithm in step 2: O(n log2(n)).
(3) The space complexity of assigning the nodes with the same label to a community in step 5: O(n).
So the space complexity of the whole algorithm is 2 × O(n) + O(m) + O(n log2(n)).
4. Experiments and analysis
In order to verify the performance of the proposed LPA_NI algorithm, this paper uses real-world networks and synthetic networks to carry out community detection experiments.
We present the performance of LPA_NI on several real-world complex networks in the absence of ground-truth communities: Zachary's karate club [25], the Football network [26], the Dolphins network [28] and the Sina micro-blog dataset used in literature [17]. We also present the performance of LPA_NI on the Girvan–Newman benchmark and the LFR benchmark [29].
We compare the performance of LPA_NI with LPA and NIBLPA, where NIBLPA is an improved LPA algorithm that chooses the label with the maximum k-shell value to assign to the nodes. To reduce the influence of randomness on the experiments, every algorithm is run 50 times and the average value is used as the result, and the maximum number of iterations of each algorithm is set to 100. All the algorithms are implemented in Java and tested on the JDK 8.0 platform, and simulations are carried out on a notebook PC with an Intel(R) Core(TM) i5-4200 CPU @ 2.50 GHz and 12 GB memory under Windows 10.
4.1. Experiment dataset
The details of the Sina micro-blog dataset are as follows:
(1) 63641 Sina Micro-blog users’ information, including uid, nickname, name, location, homepage url, gender, the number of fans, the number of concerns, the number of micro-blogs, the numberofcollectionsandcreationtime;
(2) 84168 micro-blogs’ information about 12 topics from 2014-05-03to2014-05-11,includingmid,releasetime,themicro- blog content, source, the number of forwarded micro-blog, the number of evaluated micro-blog, the number of praised micro- blog,publisheduseruidandtopicofmicro-blog;
(3) 12 topics, including Meizu, XiaoMi, Rockets, Jeremy Lin, Hengda, Korean, haze, house prices, My Old Classmate, corrupt officials and transgenic;
(4) 1391718 users' relationships;
(5) 27759 forwarded micro-blog relations.
The 62827 users' information contained in the dataset can restore the true Sina micro-blog network, and the 12 themes of micro-blog information can restore the flow of information in the micro-blog social network. The users' relationships and forwarded micro-blog relations are enough to restore the real topology of the Sina micro-blog network.
Because the original dataset cannot meet the experimental requirements, it is necessary to preprocess the Sina micro-blog dataset. The specific operations are as follows:
(1) There is an edge between two user nodes if and only if the users follow each other (mutual concern); otherwise there is no edge between them;
(2) Remove the discrete small communities (those with fewer than 5 nodes in total), and obtain a network graph consisting of 1220 nodes and 5023 edges.
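The two preprocessing operations can be sketched in Python as follows. The function names and the toy follow list are hypothetical, and reading a "discrete small community" as a connected component with fewer than 5 nodes is an assumption about the intended meaning.

```python
from collections import defaultdict

def mutual_graph(follows):
    """Operation (1): keep an undirected edge only where two users follow each other."""
    follows = set(follows)
    adj = defaultdict(set)
    for u, v in follows:
        if u != v and (v, u) in follows:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)

def drop_small_components(adj, min_size=5):
    """Operation (2): remove connected components with fewer than min_size nodes."""
    seen, kept = set(), {}
    for start in adj:
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:                       # depth-first search from `start`
            x = stack.pop()
            for y in adj[x]:
                if y not in component:
                    component.add(y)
                    stack.append(y)
        seen |= component
        if len(component) >= min_size:
            kept.update({x: adj[x] for x in component})
    return kept

# Hypothetical follow pairs: a mutual 5-cycle, a mutual pair, and one one-way follow.
follows = [(1, 2), (2, 1), (2, 3), (3, 2), (3, 4), (4, 3), (4, 5), (5, 4),
           (5, 1), (1, 5), (6, 7), (7, 6), (1, 6)]
graph = drop_small_components(mutual_graph(follows))
num_edges = sum(len(nbrs) for nbrs in graph.values()) // 2
```

In the toy data the one-way follow from user 1 to user 6 produces no edge, and the isolated mutual pair (6, 7) is discarded because its component has only 2 nodes.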
The LFR benchmark was presented by Lancichinetti et al. It generates static networks with built-in community structure. The configuration of the generated networks depends on various user-specified parameters. The number of nodes is N, k specifies the average degree of nodes and kmax is the upper bound on the degrees of nodes. The mixing parameter μ is the fraction of edges that a node has to nodes outside of its community. Thus, as we decrease μ we obtain a clearer set of communities with fewer inter-community edges. Later, the authors adapted the LFR benchmark to generate overlapping communities. Parameter On specifies the number of overlapping nodes and Om controls the number of memberships of overlapping nodes. In this paper, we set N = 1000, k = 15, kmax = 50, μ = 0.3 to generate the LFR benchmark and set N = 128, k = 16, kmax = 16, μ = 0.1 to generate the Girvan–Newman benchmark to present the performance of LPA_NI.
4.2. Evaluation criteria
The most popular measure for comparing the similarity between the delivered community structure and the ground-truth communities is Normalized Mutual Information (NMI). We have used an implementation of the NMI measure made available by McDaid et al. for sets of overlapping communities [29]. Further, we use the modularity value of the obtained community structures when the ground-truth community structure is unknown.
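For intuition about what NMI measures, the sketch below computes the standard NMI between two non-overlapping partitions, normalized by the sum of the two entropies. It is not the overlapping-community variant of McDaid et al. used in the experiments; the function name `nmi` and the label vectors are illustrative assumptions.

```python
import math
from collections import Counter

def nmi(truth, pred):
    """Standard NMI = 2 * I(truth; pred) / (H(truth) + H(pred)) for disjoint partitions."""
    n = len(truth)
    count_t, count_p = Counter(truth), Counter(pred)
    joint = Counter(zip(truth, pred))
    h_t = -sum(c / n * math.log(c / n) for c in count_t.values())
    h_p = -sum(c / n * math.log(c / n) for c in count_p.values())
    mutual = sum(
        c / n * math.log(c * n / (count_t[a] * count_p[b]))
        for (a, b), c in joint.items()
    )
    if h_t + h_p == 0:          # both partitions consist of a single community
        return 1.0
    return 2 * mutual / (h_t + h_p)

same = nmi([0, 0, 1, 1], [5, 5, 7, 7])         # same grouping, different label names
unrelated = nmi([0, 0, 1, 1], [0, 1, 0, 1])    # groupings that share no information
```

The score depends only on how nodes are grouped, not on the label names, so a detected partition that matches the ground truth scores 1 even when its community identifiers differ.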
Modularity, proposed by Newman and Girvan in literature [24], is currently widely used to measure the performance of community detection algorithms. Moreover, when the underlying class labels of a real network are unknown, we can only adopt modularity as the evaluation criterion on real networks. For a dataset with no overlapping communities, the modularity is defined as:
$$Q = \frac{1}{2m} \sum_{i,j \in V} \left[ A_{ij} - \frac{d(i)\,d(j)}{2m} \right] \times \delta(c_i, c_j), \tag{7}$$
where Q represents the modularity, m represents the number of edges in the network, and A is the adjacency matrix of the network: if node i and node j are directly connected, Aij = 1; otherwise, Aij = 0. ci and cj denote the labels of node i and node j, respectively; if ci = cj, then δ(ci, cj) = 1, else δ(ci, cj) = 0.
4.3. Experiment result
In this section, the real dataset is used to test the effectiveness of LPA_NI in comparison with traditional LPA and NIBLPA, and we test the stability of the algorithms by analyzing the fluctuation range of the results.
Inordertocomparetheeffectivenessofthealgorithms,weuse theaveragemodularity Q tomeasurethequalityofdividedcom- munities, so the greater the Q value is, the more accurate the results ofcommunity detectionare, and thesmaller the fluctua- tionrangeofQ valueis,themorestablethecommunitydetection resultsare.NIBLPAandLPA_NIcontainonlyoneparameter
α
.And theparameterα
isusedtomeasuretheextentoftheinfluenceof neighboringnodestonodei,soweletα
= {0,0.1,0.2· · ·,1}.More specifically,ifα
=0,theimportanceoftheadjacentnodesofthe node i has no influence,so the importance of node i isonly its ownimportance. And ifα
=1,theimportance ofadjacent nodes isaccumulatedinmultiplesof1toobtainthenewimportanceof nodei.TheexperimentalresultsareshowninFig. 5.As it can be seen in Fig. 5, for the real dataset used in this paper, the variation trend of the Q value of LPA_NI is like bell shapedcurvewiththegradualincreaseofparameters
α
,andwhenα
=0.4,the maximummodularity Q =0.6995 is achieved.More specifically,whenα
=0.4,theinfluenceofadjacentnodesontheFig. 5.The comparison ofQunder different tunable parameterα.
Fig. 6.50 repeated experiments under parameterα=0.4.
node i isthe mostappropriate,andthe newnodeimportance in improving thecommunitystructuredetectionhasthe besteffect.
The Q value of NIBLPA changes little under different values of the parameter α, and its maximum modularity Q = 0.6197 is obtained when α = 0.5 and α = 0.6. The experimental results show that the modularity of LPA_NI is higher than that of NIBLPA when α = 0.4. As the influence of the adjacent nodes' importance on node i gradually increases, the quality of community detection of LPA_NI first increases and then gradually decreases. Therefore, an appropriate amount of adjacent node importance significantly improves the accuracy of LPA_NI, and LPA_NI can obtain an optimal community detection result.

In order to analyze the stability of LPA_NI, NIBLPA and LPA, this paper analyzes the 11 experimental results of 50 repeated experiments with parameter α = 0.4; the variation trend of the modularity value Q of each algorithm is shown in Fig. 6. It can be seen from Fig. 6 that the Q values of LPA_NI and NIBLPA are 0.6995 and 0.6197, respectively, in the 50 repeated tests. Because LPA_NI and NIBLPA reduce the randomness of the update sequence and of the label selection process, they have good stability. However, LPA is less stable because of the randomness of the algorithm. In general, LPA_NI obtains more stable results than LPA and more effective results than LPA and NIBLPA.
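The stability test above, repeating a run and inspecting the fluctuation range of Q, can be sketched as follows. This is a minimal illustration and not the paper's experimental code: it uses the plain LPA with a random update order and random tie-breaking, the standard Newman–Girvan modularity, and a hypothetical toy graph of two 4-cliques joined by one edge. All function and variable names are assumptions.

```python
import random
from collections import Counter


def build_graph(edges):
    """Build an undirected adjacency map from an edge list."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def modularity(graph, labels):
    """Q = sum over communities c of e_c / m - (d_c / (2m))^2."""
    m = sum(len(nbrs) for nbrs in graph.values()) / 2
    intra = Counter()   # intra-community edges, each counted twice
    degree = Counter()  # total degree per community
    for u, nbrs in graph.items():
        degree[labels[u]] += len(nbrs)
        for v in nbrs:
            if labels[u] == labels[v]:
                intra[labels[u]] += 1
    return sum(intra[c] / (2 * m) - (degree[c] / (2 * m)) ** 2 for c in degree)


def plain_lpa(graph, rng, max_iter=100):
    """Plain LPA: random update order and random choice among tied labels.

    Assumes every node has at least one neighbor.
    """
    labels = {v: v for v in graph}
    nodes = list(graph)
    for _ in range(max_iter):
        rng.shuffle(nodes)
        changed = False
        for v in nodes:
            counts = Counter(labels[u] for u in graph[v])
            best = max(counts.values())
            candidates = sorted(l for l, c in counts.items() if c == best)
            if labels[v] in candidates:
                continue  # current label is already among the most frequent
            labels[v] = rng.choice(candidates)
            changed = True
        if not changed:
            break
    return labels


# Hypothetical toy graph: two 4-cliques joined by the edge (3, 4).
clique_a = [(i, j) for i in range(4) for j in range(i + 1, 4)]
clique_b = [(i, j) for i in range(4, 8) for j in range(i + 1, 8)]
graph = build_graph(clique_a + clique_b + [(3, 4)])

# 50 repeated runs with different seeds, as in the stability experiment.
q_values = [modularity(graph, plain_lpa(graph, random.Random(seed)))
            for seed in range(50)]
print(f"runs={len(q_values)} min={min(q_values):.4f} max={max(q_values):.4f} "
      f"mean={sum(q_values) / len(q_values):.4f} "
      f"range={max(q_values) - min(q_values):.4f}")
```

A deterministic variant (fixed update order and deterministic tie-breaking) would return the same Q for every seed, which corresponds to the flat curves reported for LPA_NI and NIBLPA.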
Fig. 7. The number of iterations under different values of the tunable parameter α.
Fig. 8. The running time under different values of the tunable parameter α.
In order to analyze the efficiency of LPA_NI, NIBLPA and LPA, Figs. 7 and 8 show the number of iterations and the running time of the algorithms, respectively.
It can be seen from Fig. 7 that, as the parameter α increases, the number of iterations of LPA_NI stays stable at 7, whereas NIBLPA shows a larger fluctuation in the number of iterations: in the early period it is around 8, and it then surges to 17, 21 and 13. The experimental results show that the number of iterations of LPA_NI is significantly smaller than that of NIBLPA, and its running time is clearly better than that of the other algorithms. This suggests that the node importance calculated from the k-shell value (as in NIBLPA) cannot fully describe the node influence in a large-scale social network, so when α ≥ 0.8 the number of iterations and the running time of that algorithm increase significantly. LPA_NI, which uses prior knowledge to calculate the node importance, is more accurate, so its number of iterations and running time are smaller, and its stability is better than that of the other algorithms.

We then compare the performance of LPA_NI, NIBLPA and LPA on Zachary's karate club, the Football network and the Dolphins network. The experimental results are shown in Table 1. LPA_NI achieves the highest Q on the Karate network and stays close to the best value on the other two networks, where LPA (0.5899 on Football) and NIBLPA (0.5184 on Dolphins) score slightly higher.
Table 1
Q obtained by LPA_NI, NIBLPA and LPA on different real-world networks.
Networks    Q (LPA_NI)    Q (NIBLPA)    Q (LPA)
Karate      0.4151        0.3944        0.3590
Football    0.5805        0.5762        0.5899
Dolphins    0.5143        0.5184        0.4797
Table 2
NMI obtained by LPA_NI, NIBLPA and LPA on different synthetic networks.
Networks         NMI (LPA_NI)    NMI (NIBLPA)    NMI (LPA)
LFR              1               1               1
Girvan–Newman    0.9989          0.9989          0.9989
We also show the performance of LPA_NI on the LFR and Girvan–Newman benchmarks in Table 2, where we compare our method with LPA and NIBLPA. On both benchmarks LPA_NI achieves the same NMI as the other two algorithms: 1 on LFR and 0.9989 on Girvan–Newman.
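For reference, the NMI score reported in Table 2 can be computed from two label assignments as sketched below. The sketch assumes the normalization NMI = 2·I(A;B) / (H(A) + H(B)), which is widely used in community detection; the exact variant used for Table 2 is not restated in this section, and the function name and example labels are hypothetical.

```python
import math
from collections import Counter


def nmi(true_labels, pred_labels):
    """Normalized mutual information, 2*I(A;B) / (H(A) + H(B)) (assumed variant)."""
    n = len(true_labels)
    if n == 0 or n != len(pred_labels):
        raise ValueError("label sequences must be non-empty and of equal length")

    count_a = Counter(true_labels)
    count_b = Counter(pred_labels)
    joint = Counter(zip(true_labels, pred_labels))

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    h_a, h_b = entropy(count_a), entropy(count_b)
    if h_a + h_b == 0:
        return 1.0  # both partitions place every node in a single community

    mutual = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    return 2 * mutual / (h_a + h_b)


# Hypothetical example: the same partition under different label names,
# and a partition that is independent of the ground truth.
ground_truth = [0, 0, 1, 1]
print(nmi(ground_truth, [1, 1, 0, 0]))
print(nmi(ground_truth, [0, 1, 0, 1]))
```

The score depends only on how nodes are grouped, not on the label names, so a detected partition that matches the ground truth up to relabeling receives the maximum score.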
In summary, on the one hand, LPA_NI takes into account that node importance may affect the label updating process, so it lets the more important nodes affect the less important ones, which reduces the "countercurrent" phenomenon in the label propagation process. On the other hand, LPA_NI takes into account the effect of label influence on label selection, and can therefore improve the accuracy of the label selection process. As can be seen, LPA_NI can significantly improve the quality of community detection and shorten the iteration period. It also has better accuracy and stability at similar complexity.
5. Conclusion
This paper presents a label propagation algorithm for community detection based on node importance and label influence. The algorithm first uses a novel method for node importance evaluation, and fixes the node order of label updating in descending order of node importance. During each label update, when more than one label attains the maximum count, the algorithm introduces the label influence into the label updating formula to improve stability. The algorithm retains the advantages of the original LPA, and it obtains stable and efficient results by avoiding the randomness in the label propagation process. Through experiments on real datasets, we demonstrate that the proposed algorithm performs better than some of the current representative algorithms.
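The two design choices summarized above, a fixed update order by descending node importance and an influence-based rule for tied labels, can be sketched as follows. This is an illustrative sketch under stated assumptions and not the paper's exact procedure: degree is used as a stand-in for node importance, and the label influence of a tied label is taken here to be the summed importance of the neighbors carrying it, which is a hypothetical simplification. All names and the toy graph are assumptions.

```python
from collections import Counter, defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from an edge list."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def ordered_lpa(graph, importance, max_iter=100):
    """Deterministic LPA sketch: fixed order, influence-based tie-breaking.

    Assumes every node has at least one neighbor.
    Returns the final labels and the number of sweeps performed.
    """
    labels = {v: v for v in graph}
    # Fixed update order: descending importance, ties broken by node id.
    order = sorted(graph, key=lambda v: (-importance[v], v))
    iterations = 0
    for iterations in range(1, max_iter + 1):
        changed = False
        for v in order:
            counts = Counter(labels[u] for u in graph[v])
            best = max(counts.values())
            tied = sorted(l for l, c in counts.items() if c == best)
            if len(tied) > 1:
                # Hypothetical label influence: summed importance of the
                # neighbors that carry each tied label.
                influence = defaultdict(float)
                for u in graph[v]:
                    influence[labels[u]] += importance[u]
                new_label = max(tied, key=lambda l: influence[l])
            else:
                new_label = tied[0]
            if new_label != labels[v]:
                labels[v] = new_label
                changed = True
        if not changed:
            break
    return labels, iterations


# Hypothetical toy graph: two hub-centered groups joined by the edge (4, 9).
group_a = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (3, 4)]
group_b = [(5, 6), (5, 7), (5, 8), (5, 9), (6, 7), (8, 9)]
graph = build_graph(group_a + group_b + [(4, 9)])

# Assumption: degree stands in for node importance.
importance = {v: float(len(nbrs)) for v, nbrs in graph.items()}

labels, iterations = ordered_lpa(graph, importance)
print(labels, iterations)
```

Because neither the update order nor the tie-breaking involves random choices, repeated runs on the same graph return the same partition, which is the stability property the conclusion refers to. On this toy graph the sketch separates the two hub-centered groups.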
Uncited references: [18], [27]
References
[1] Youfang Lin, Tianyu Wang, Rui Tang, Yuanwei Zhou, Houkuan Huang, An effective model and algorithm for community detection in social networks, J. Comput. Res. Dev. 49 (2) (2012) 337–345.
[2] Dayou Liu, Di Jin, Dongxiao He, Jing Huang, Jianning Yang, Bo Yang, Community mining in complex networks, J. Comput. Res. Dev. 50 (10) (2013) 2140–2154.
[3] M. Shiga, I. Takigawa, H. Mamitsuka, A spectral approach to clustering numerical vectors as nodes in a network, Pattern Recognit. 44 (2) (2011) 236–251.
[4] Liang Huang, Ruixuan Li, Hong Chen, Xiwu Gu, Kunmei Wen, Yuhua Li, Detecting network communities using regularized spectral clustering algorithm, Artif. Intell. Rev. 41 (4) (2014) 579–594.
[5] R. Guimera, M. Sales-Pardo, L.A. Amaral, Modularity from fluctuations in random graphs and complex networks, Phys. Rev. E 70 (2) (2004) 188.
[6] R. Guimera, Functional cartography of complex metabolic networks, Nature 433 (7028) (2005) 895–900.
[7] J. Duch, A. Arenas, Community detection in complex networks using extremal optimization, Phys. Rev. E 72 (2) (2005) 986.
[8] Zhan Bu, Chengcui Zhang, Zhengyou Xia, Jiandong Wang, A fast parallel modularity optimization algorithm (FPMQA) for community detection in online social network, Knowl.-Based Syst. 50 (3) (2013) 246–259.
[9] S. Zhang, H. Zhao, Normalized modularity optimization method for community identification with degree adjustment, Phys. Rev. E 88 (5–1) (2013) 471.
[10] M. Girvan, M.E.J. Newman, Community structure in social and biological networks, Proc. Natl. Acad. Sci. USA 99 (12) (2002) 7821–7826.
[11] F. Wu, B.A. Huberman, Finding communities in linear time: a physics approach, Eur. Phys. J. B-Condens. Matter 38 (2) (2004) 331–338.
[12] U.N. Raghavan, R. Albert, S. Kumara, Near linear-time algorithm to detect community structures in large-scale networks, Phys. Rev. E 76 (3) (2007) 036106.
[13] G. Steve, Finding overlapping communities in networks by label propagation, New J. Phys. 12 (10) (2010) 2011.
[14] M.J. Barber, J.W. Clark, Detecting network communities by propagating labels under constraints, Phys. Rev. E 80 (2) (2009) 026129.
[15] I.X. Leung, P. Hui, P. Lio, J. Crowcroft, Towards real-time community detection in large networks, Phys. Rev. E 79 (6) (2009) 066107.
[16] XianKun Zhang, Xue Tian, YaNan Li, Chen Song, Label propagation algorithm based on edge clustering coefficient for community detection in complex networks, Int. J. Mod. Phys. B 28 (30) (2014).
[17] XianKun Zhang, Song Fei, Chen Song, Xue Tian, YangYue Ao, Label propagation algorithm based on local cycles for community detection, Int. J. Mod. Phys. B 29 (5) (2015).
[18] J. Xie, B.K. Szymanski, LabelRank: a stabilized label propagation algorithm for community detection in networks, Netw. Sci. Workshop (2013) 138–143.
[19] Y. Xing, F. Meng, Y. Zhou, M. Zhu, M. Shi, G. Sun, A node influence based label propagation algorithm for community detection in networks, Sci. World J. 2014 (5) (2014) 627581.
[20] J. Sohn, D. Kang, H. Park, B.G. Joo, I.J. Chung, An Improved Social Network Analysis Method for Social Networks, Springer, Netherlands, 2014, pp. 115–123.
[21] L. Šubelj, M. Bajec, Group detection in complex networks: an algorithm and comparison of the state of the art, Phys. A, Stat. Mech. Appl. 397 (3) (2014) 144–156.
[22] O. Green, D.A. Bader, Faster betweenness centrality based on data structure experimentation, Proc.