Contents lists available at ScienceDirect

Physics Letters A

www.elsevier.com/locate/pla

Label propagation algorithm for community detection based on node importance and label inﬂuence

Xian-Kun Zhang ¹, Jing Ren, Chen Song, Jia Jia, Qian Zhang

College of Computer Science and Information Engineering, Tianjin University of Science and Technology, No. 29, 13th Avenue, TEDA, Binhai New Area, Tianjin, 300457, PR China

Article info

Article history:
Received 28 March 2017
Received in revised form 9 June 2017
Accepted 10 June 2017
Available online xxxx
Communicated by C.R. Doering

Keywords:
Large-scale social network
Community detection
Label propagation
Node importance
Label influence

Abstract

Recently, the detection of high-quality communities has become a hot topic in social network research. The label propagation algorithm (LPA) has attracted wide attention because it has linear time complexity and requires neither an objective function nor the number of communities to be defined in advance. However, LPA suffers from uncertainty and randomness in the label propagation process, which affects the accuracy and stability of the detected communities. For large-scale social networks, this paper proposes a novel label propagation algorithm for community detection based on node importance and label influence (LPA_NI). Experiments against comparative algorithms on real-world and synthetic networks show that LPA_NI can significantly improve the quality of community detection and shorten the iteration period. It also has better accuracy and stability at similar complexity.

© 2017 Elsevier B.V. All rights reserved.

1. Introduction

A social network is a relatively stable system of social relationships that results from interaction between individual members. It is commonly abstracted as a graph in which nodes represent individual members and edges (or links) represent the relationships between them [1]. A community is a subgraph whose nodes are more densely linked to each other than to the rest of the graph; equivalently, a graph has community structure if the number of links within each subgraph is higher than the number of links between those subgraphs [24]. Given a network, the process of finding its community structure is called community detection, and it is an important part of social network analysis. In recent decades, many community detection algorithms for social networks have been proposed. According to the solution strategy, they can be divided into optimization methods and heuristic methods [2]. Optimization methods detect communities by setting an objective function and iterating toward an approximation of its optimal value. Representative optimization methods include spectral algorithms and modularity maximization algorithms [3–9]. Spectral algorithms [3,4] transform the community identification problem into a simple quadratic optimization problem, and an approximately optimal partition of the network is obtained by solving for the eigenvectors of the Laplacian matrix. Modularity maximization methods seek the maximum value of the modularity function (the Q function) of the network; representative algorithms include the FN algorithm [5], the GA algorithm [6] and the EO algorithm [7]. Heuristic methods find the optimal division into communities by setting heuristic rules; representative algorithms include the GN algorithm [10] and the WH algorithm [11].

E-mail address: [email protected] (X.-K. Zhang).

¹ Fax: 8602260274450.

In addition, there are other effective community detection algorithms. For example, hierarchical clustering algorithms regard the social network as a composition of multi-layer communities and use traditional hierarchical clustering to detect them. Evolutionary computation algorithms regard community detection as the optimization of an objective function and use evolutionary computation methods to find an approximately optimal solution. These algorithms generally have high time and space complexity and are not suitable for detecting structure in large-scale social networks. In 2007, Raghavan et al. [12] proposed LPA, a fast community detection algorithm based on label propagation. The time complexity of LPA is nearly linear and its design is simple, so LPA is efficient on large-scale networks. Moreover, LPA requires neither a predefined objective function to optimize nor prior knowledge of the number and size of communities. For these reasons LPA has received considerable attention from scholars. However, in the process of updating node labels, LPA suffers from uncertainty and randomness, which leads to poor accuracy and stability. The work in [19] takes into account the influence of node degree and edge weight during the label updating process. But in large-scale social networks, such as the micro-blog social network, the influence of the nodes’ prior attributes on node importance is not considered.

http://dx.doi.org/10.1016/j.physleta.2017.06.018

0375-9601/© 2017 Elsevier B.V. All rights reserved.

Fig. 1. Label propagation process (Raghavan et al. [12]).

In this paper, we propose a novel label propagation algorithm for community detection based on node importance and label influence in networks (LPA_NI). Firstly, LPA_NI uses a novel method to evaluate node importance. Secondly, in the iterative process, LPA_NI fixes the order in which node labels are updated to the descending order of node importance. When several labels tie for the maximum count among a node’s neighbors, LPA_NI calculates the influence of each tied label and selects the most influential one to update the node label. Finally, experiments against comparative algorithms on real-world and synthetic networks show that LPA_NI can significantly improve the quality of community detection and shorten the iteration period. It also has better accuracy and stability at similar complexity.

The rest of the paper is organized as follows. Section 2 introduces the label propagation algorithm and the node importance calculation algorithm based on a Bayesian network. Section 3 introduces the main idea and the detailed process of our algorithm. The experimental results on real-world and synthetic networks in Section 4 confirm the effectiveness of the algorithm. The conclusion is given in Section 5.

2. Related work

A complex network can be modeled as a graph $G = (V, E)$, where $V = \{v_1, v_2, \cdots, v_n\}$ represents the set of nodes, $E = \{e_1, e_2, \cdots, e_m\}$ represents the edges between the nodes, and $n$ and $m$ represent the numbers of nodes and edges in the network, respectively. Each edge in $E$ corresponds to a pair of nodes in $V$. The label of $v_i$ is denoted $c_i$, $N(i)$ represents the neighborhood set of $v_i$, and $d(i)$ is the degree of node $i$.

2.1. Label propagation algorithm for community detection in networks

LPA is a heuristic algorithm whose main heuristic rule is to pass label information between nodes repeatedly. After several iterations, the labels of nodes belonging to the same community converge. The main idea of LPA is that each node changes its label to the one carried by the largest number of its adjacent nodes (as shown in Fig. 1). The process of LPA can be described by the following steps:

Input: Network $G = (V, E)$, maximum number of iterations maxIter.

Output: Community set $Set_c = \{c_1, \cdots, c_k\}$, where $k$ is the number of communities.

(1) Initialize a unique label for each node in the network. For a given node $x$, $c_x(0) = x$.

(2) Set the iteration number $t = 1$.

(3) Arrange the nodes of the network in random order, generating an ordered sequence $X = \{x_1, x_2, \cdots, x_n\}$.

(4) For each node $x \in X$, update the node label according to formula (1) or (2). In the iterative process, each node changes its label to the one carried by the largest number of its adjacent nodes. When several labels tie for the maximum count, LPA randomly selects one of them to update the node label.

Fig. 2. The oscillation of labels in a bipartite graph (Xing et al. [19]).

Label updating methods can be divided into synchronous and asynchronous updating. Synchronous updating updates the label of node $x$ in the $t$th iteration based on the labels of its adjacent nodes at the $(t-1)$th iteration. The formula is as follows:

$$c_x(t) = f\big(c_{x_1}(t-1), \cdots, c_{x_k}(t-1)\big), \quad x_i \in N(x), \tag{1}$$

where $c_x(t)$ represents the label of node $x$ in the $t$th iteration, and $N(x)$ represents the neighborhood set of node $x$. Synchronous updating may cause label oscillation in bipartite or nearly bipartite networks (as shown in Fig. 2), and asynchronous updating is a good solution to this problem [12]. Asynchronous updating updates the label of node $x$ in the $t$th iteration based partly on the labels at the $t$th iteration of adjacent nodes that have already been updated in the current iteration, and partly on the labels at the $(t-1)$th iteration of adjacent nodes that have not yet been updated. The formula is as follows:

$$c_x(t) = f\big(c_{x_1}(t-1), \cdots, c_{x_m}(t-1), c_{x_{m+1}}(t), \cdots, c_{x_k}(t)\big), \quad x_i \in N(x). \tag{2}$$

(5) If the iteration number $t = \mathit{maxIter}$, or the label of each node is the same as that of most of its neighboring nodes, then the nodes with the same label are placed in the same community and the algorithm ends; otherwise, set $t = t + 1$ and go to step (3).
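The five steps above can be sketched in a few lines of Python. This is only an illustrative sketch of asynchronous LPA, not the authors' Java implementation: the function names `lpa` and `is_stable`, the adjacency-dictionary input and the `seed` parameter are assumptions made for the example, and node identifiers are assumed to be sortable so that tie-breaking is reproducible for a given seed.

```python
import random
from collections import Counter

def is_stable(adj, labels, v):
    """True if v's label is among the most frequent labels of its neighbors."""
    if not adj[v]:
        return True
    counts = Counter(labels[u] for u in adj[v])
    return counts[labels[v]] == max(counts.values())

def lpa(adj, max_iter=100, seed=None):
    """Asynchronous LPA; adj maps each node to the set of its neighbors."""
    rng = random.Random(seed)
    labels = {v: v for v in adj}                 # step (1): unique labels
    for _ in range(max_iter):                    # steps (2) and (5): iteration cap
        order = list(adj)
        rng.shuffle(order)                       # step (3): random update order
        for x in order:                          # step (4): asynchronous update
            if not adj[x]:
                continue
            counts = Counter(labels[u] for u in adj[x])
            best = max(counts.values())
            tied = sorted(l for l, c in counts.items() if c == best)
            labels[x] = rng.choice(tied)         # random choice among tied labels
        if all(is_stable(adj, labels, v) for v in adj):
            break                                # step (5): stopping condition
    return labels

# Hypothetical toy input: two disjoint triangles.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
toy_labels = lpa(toy, seed=1)
```

Because labels are read from `labels` while it is being modified, each node sees the current-iteration labels of neighbors that were already updated, which is the asynchronous scheme of formula (2).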

A label propagation process is shown in Fig. 1. Firstly, LPA initializes all the nodes in the graph, and the 5 nodes are assigned the unique labels “a”, “b”, “c”, “d” and “e”. Secondly, the algorithm randomly selects node 3 to update its label. Because every label among the neighbors of node 3 occurs exactly once, so that the maximum count is 1, node 3 randomly selects one of them and updates its label to “a”. Then the algorithm randomly selects node 5 to update its label. Because two neighbors of node 5 share label “a” and only label “a” reaches the maximum count of 2, node 5 updates its label to “a”. Finally, the labels of all nodes have been updated to “a”, and the label propagation algorithm ends.

Fig. 3. The social network sample graph.

Although asynchronous updating can avoid label oscillation, the steps above show that the uncertainty and randomness of LPA come mainly from the random update sequence and the random choice among tied labels in step (4), which affect the accuracy and stability of LPA. We analyze traditional LPA with the social network sample graph in Fig. 3. There are 2 communities in the network, {1, 2, 3} and {4, 5, 6}. The 6 nodes in the graph are initialized with labels “a”, “b”, “c”, “d”, “e” and “f”, respectively. Assume that the labels of nodes 1, 2 and 3 have been updated to “a”, while the labels of nodes 4, 5 and 6 are still “d”, “e” and “f”. If we randomly select node 4 for updating and label “a” is chosen as its label, then update node 6 and finally node 5, all nodes in the network end up with label “a” and are placed in a single community. Conversely, if node 4 is updated to label “f”, and we then update node 5 and finally node 6, the network is divided into 2 communities, {1, 2, 3} and {4, 5, 6}, which matches the correct communities.
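The two outcomes described above can be replayed with a short script. The adjacency below is an assumption: the edges of Fig. 3 are not listed in the text, so `ADJ` is a hypothetical 6-node graph chosen to be consistent with the narrative. The `pick` argument stands in for the random draw so that each branch of the argument can be reproduced deterministically.

```python
from collections import Counter

# Hypothetical adjacency consistent with the narrative (not copied from Fig. 3).
ADJ = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 6},
       4: {1, 5, 6}, 5: {4, 6}, 6: {3, 4, 5}}

def tied_labels(labels, node):
    """Labels that reach the maximum count among the neighbors of node."""
    counts = Counter(labels[u] for u in ADJ[node])
    best = max(counts.values())
    return sorted(l for l, c in counts.items() if c == best)

def run(order, pick):
    """Update nodes in the given order; pick() replaces the random tie-break."""
    labels = {1: "a", 2: "a", 3: "a", 4: "d", 5: "e", 6: "f"}
    for node in order:
        labels[node] = pick(tied_labels(labels, node))
    return labels

merged = run([4, 6, 5], pick=min)   # node 4 happens to draw "a"
split = run([4, 5, 6], pick=max)    # node 4 happens to draw "f"
print(sorted(set(merged.values())))
print(sorted(set(split.values())))
```

Only the first update involves a genuine tie; every later update has a unique majority label, so the single random draw at node 4 decides whether one or two communities are found.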

In recent years, many scholars have improved the standard LPA algorithm. Steve et al. [13] extended standard LPA and proposed COPRA, an algorithm that detects overlapping community structure by allowing each node to retain several community labels. Barber et al. [14] proposed LPAm, a constrained label propagation algorithm that recasts LPA as an optimization problem with a corresponding objective function H, addressing the clustering performance of LPA. Leung et al. [15] found that after five iterations of LPA, 95% of the nodes had been correctly grouped, and that subsequent iterations mainly updated the labels of nodes inside communities; they therefore improved the updating and iteration rules of LPA. Zhang et al. [16] proposed LPAc, an algorithm based on the edge clustering coefficient, which improves the label updating process by calculating that coefficient. The work in [17] used local cycles to address the randomness of the label updating process. However, these algorithms do not take into account the importance of nodes and the influence of different labels. In large-scale social networks, the prior properties of a node determine its importance, and the importance of a node determines the influence of the label it carries, so using the influence of different labels removes the random rules of the label propagation process and can significantly improve the accuracy and stability of the algorithm. Yao et al. [19] proposed NIBLPA, a label propagation algorithm based on the k-shell value of nodes, which chooses the label with the maximum k-shell value to update the node label. But NIBLPA does not consider the prior properties of nodes in social networks, such as, in a micro-blog social network, a user’s number of fans, number of micro-blogs released and number of micro-blogs forwarded. Therefore, the accuracy of the node importance calculation needs to be improved.

2.2. Node importance calculation algorithm based on Bayesian network

There are many algorithms for calculating the importance of nodes, such as degree centrality [20], clustering coefficient centrality [21] and betweenness centrality [22]. However, degree centrality and clustering coefficient centrality describe only local information about the network, and the time complexity of betweenness centrality is so high that it is suitable only for small networks. For large-scale social networks, Zhang et al. [23] proposed a user node importance calculation algorithm based on a Bayesian network, which takes into account that messages in a micro-blog social network are highly redundant, spread quickly and are highly time-sensitive. Firstly, the algorithm analyzes users’ behavior patterns in the micro-blog social network in depth. Secondly, it uses a Bayesian model to learn the prior probability of each property of a user node, with important users identified manually. Thirdly, it uses expert knowledge to obtain the prior probability of each property and studies the attribute values of nodes with major influence. Finally, it establishes a Bayesian network model to describe the relationship between user properties and influence, which allows the importance of all nodes in the network to be normalized. The algorithm can be described by the following steps:

(1) Obtain the user’s basic information, which includes the total number of fans, the total number of micro-blogs, the number of concerns, the number of collections and the creation time. In addition, obtain the user’s micro-blog information, friend relationships and micro-blog forwarding relations.

(2) Based on the user’s basic information, calculate the user’s cumulative numbers of forwarded, evaluated and praised micro-blogs, the average numbers of forwarded, evaluated and praised micro-blogs, and the user’s activity, and classify the micro-blogs by their content.

(3) Manually identify important user nodes from different fields of the micro-blog social network, then obtain the thresholds that measure the influence of the various properties by comparing the identified important users with the other users in the dataset, and finally establish the user attribute-influence probability formula.

(4) Calculate the total user influence by multiplying the influences of the individual attribute factors. The formula is as follows:

$$P(\mathit{Inf}) = \prod_{\mathit{Attr}} P(\mathit{Inf} \mid \mathit{Attr}), \tag{3}$$

where $P(\mathit{Inf})$ represents the influence of the user, and $P(\mathit{Inf} \mid \mathit{Attr})$ represents the influence of each attribute of the user.

(5) Normalize the influence of all the users in the dataset to obtain the importance of each node.
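Steps (4) and (5) can be illustrated as follows. The attribute names and probabilities are made up for the example, and the normalization shown (dividing by the sum over all users) is an assumption: the text says only that the influences are normalized, not how.

```python
from math import prod

def user_influence(attr_probs):
    """Step (4): multiply the per-attribute influence probabilities."""
    return prod(attr_probs.values())

def normalize(raw):
    """Step (5), assumed form: scale the influences so that they sum to 1."""
    total = sum(raw.values())
    return {u: (v / total if total else 0.0) for u, v in raw.items()}

# Hypothetical users with made-up P(Inf | Attr) values.
users = {
    "u1": {"fans": 0.9, "forwards": 0.8, "activity": 0.5},
    "u2": {"fans": 0.3, "forwards": 0.4, "activity": 0.5},
}
raw = {u: user_influence(p) for u, p in users.items()}
importance = normalize(raw)
```

Under these made-up values the raw influences are 0.36 and 0.06, so `u1` receives the larger normalized importance.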

3. Proposed method

Many improved label propagation algorithms determine the order in which node labels are updated at random, because they do not take into account that node importance may affect the label updating process. As a result, less important nodes may in turn affect more important nodes, which is called the “countercurrent” phenomenon in the label propagation process. Besides, many algorithms use only the number of labels to measure label influence at the label selection stage and ignore the effect of node influence on label selection. In this section, we propose a novel label propagation algorithm for community detection based on node importance and label influence in networks (LPA_NI). Firstly, LPA_NI uses a novel method for node importance evaluation. Secondly, in the iterative process, LPA_NI updates the nodes in descending order of importance. When several labels tie for the maximum count, LPA_NI calculates the influence of each tied label and selects the most influential one to update the node label.

3.1. The basic idea

Node importance measures the influence of a node on the whole network. In this paper, we use the node importance calculation algorithm based on a Bayesian network to normalize the importance of all nodes in the network. In general, the more important a node is, the greater its impact on other nodes, so its label is more likely to spread. But a normalized importance based only on the prior attributes of a node is not enough: because nodes in the network are closely linked, a node linked to more important nodes is itself more important. Therefore, we propose a novel node importance calculation method built on the earlier node importance algorithm. The formula is as follows:

$$NI(i) = \mathit{Inf}(i) + \alpha \cdot \sum_{j \in N(i)} \frac{\mathit{Inf}(j)}{d(j)}, \tag{4}$$

where $NI(i)$ represents the importance of node $i$; $\mathit{Inf}(i)$ represents the prior importance of node $i$, which comes from expert knowledge; $\alpha$ is a tunable parameter ranging from 0 to 1 that measures the extent of the influence of neighboring nodes on node $i$; $N(i)$ represents the neighborhood set of node $i$; and $d(j)$ is the degree of node $j$.
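Formula (4) translates directly into code. The sketch below assumes the graph is stored as a dictionary of neighbor sets; the function name `node_importance`, the path graph and the prior importance values are hypothetical and serve only to show the computation.

```python
def node_importance(adj, inf, alpha):
    """Formula (4): NI(i) = Inf(i) + alpha * sum over j in N(i) of Inf(j) / d(j)."""
    return {
        i: inf[i] + alpha * sum(inf[j] / len(adj[j]) for j in adj[i])
        for i in adj
    }

# Hypothetical path graph 1 - 2 - 3 with made-up prior importances.
adj = {1: {2}, 2: {1, 3}, 3: {2}}
inf = {1: 0.2, 2: 0.6, 3: 0.2}
ni = node_importance(adj, inf, alpha=0.5)

# Update order used later by LPA_NI: descending node importance.
order = sorted(adj, key=lambda v: ni[v], reverse=True)
```

Here node 2 receives 0.6 + 0.5 × (0.2 + 0.2) = 0.8, while nodes 1 and 3 each receive 0.2 + 0.5 × 0.6 / 2 = 0.35, so node 2 is updated first.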

We choose node importance as the measure of node influence, and we improve stability by fixing the sequence in which node labels are updated to the descending order of node importance.

Another factor that affects the stability of LPA is that when several labels tie for the maximum count, the original algorithm randomly selects one of them to update the node label. In this paper, we propose a label selection method that uses label influence: we calculate the influence of each tied label and select the most influential one to update the node label. The formula is as follows:

$$LI(i, l) = \sum_{j \in N^l(i)} \frac{NI(j)}{d(j)}, \tag{5}$$

where $LI(i, l)$ represents the influence of label $l$ on node $i$, and $N^l(i)$ represents the set of neighbors of node $i$ that carry label $l$.

As a result, the new label update rule is as follows:

$$c_i = \arg\max_{l \in l_{\max}} LI(i, l), \tag{6}$$

where $c_i$ represents the most influential label, and $l_{\max}$ represents the set of labels that reach the maximum count among the neighbors of node $i$.

More specifically, when several labels among the adjacent nodes of node $i$ tie for the maximum count, LPA_NI calculates the influence of each tied label according to formula (5) and selects the most influential one according to formula (6) to update node $i$.
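The selection rule can be sketched as a single function: count the neighbor labels first, and fall back to formulas (5) and (6) only when several labels tie. The function name `select_label`, the star graph and the importance values are hypothetical; breaking an exact tie in $LI$ by label order is also an assumption, since the text does not cover that case.

```python
from collections import Counter, defaultdict

def select_label(i, adj, labels, ni):
    """New label of node i: majority count first, then formulas (5) and (6)."""
    counts = Counter(labels[j] for j in adj[i])
    best = max(counts.values())
    l_max = {l for l, c in counts.items() if c == best}
    if len(l_max) == 1:                          # unique majority label
        return next(iter(l_max))
    li = defaultdict(float)
    for j in adj[i]:
        if labels[j] in l_max:
            li[labels[j]] += ni[j] / len(adj[j])  # formula (5)
    return max(sorted(li), key=li.get)            # formula (6)

# Hypothetical star graph: center 0 with four leaves.
adj = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
ni = {0: 1.0, 1: 0.2, 2: 0.2, 3: 0.9, 4: 0.1}
tie = select_label(0, adj, {0: "z", 1: "p", 2: "p", 3: "q", 4: "q"}, ni)
clear = select_label(0, adj, {0: "z", 1: "p", 2: "p", 3: "p", 4: "q"}, ni)
```

In the tied case both labels occur twice, but "q" wins because its carriers contribute 0.9 + 0.1 = 1.0 against 0.2 + 0.2 = 0.4 for "p"; in the second case "p" wins on count alone.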

3.2. The steps of the LPA_NI algorithm

Based on the novel node importance calculation method and the label influence calculation method, we design a novel label propagation algorithm for community detection based on node importance and label influence (LPA_NI). The specific steps are as follows:

Input: Network $G = (V, E)$, maximum number of iterations maxIter.

Output: Community set $Set_c = \{c_1, \cdots, c_k\}$, where $k$ is the number of communities.

(1) Initialize a unique label for each node in the network. For a given node $x$, $c_x(0) = x$.

(2) Calculate the node importance of each node according to formula (4), sort the nodes in descending order of NI, and generate an ordered sequence $X = \{x_1, x_2, \cdots, x_n\}$, where $NI(x_1) \geq NI(x_2) \geq \cdots \geq NI(x_n)$.

(3) Set the iteration number $t = 1$.

(4) For each node $x \in X$, update the node label according to formulas (5) and (6). In the iterative process, each node adopts the dominant label among its adjacent nodes as its own label. When several labels among the adjacent nodes of node $i$ tie for the maximum count, LPA_NI calculates the influence of the tied labels according to formula (5) and selects the most influential one according to formula (6) to update node $i$.

(5) If the iteration number $t = \mathit{maxIter}$, or the label of each node is the same as that of most of its neighboring nodes, then the nodes with the same label are placed in the same community and the algorithm ends; otherwise, set $t = t + 1$ and go to step (4).

LPA_NI updates the nodes in descending order of importance, so nodes with larger importance influence nodes with smaller importance, which helps reduce the “countercurrent” phenomenon. Furthermore, LPA_NI considers the effect of label influence in its label selection strategy, so label updating is more accurate. In this way LPA_NI improves the accuracy and stability of the algorithm.

We run LPA_NI on the social network sample graph in Fig. 3 with $\alpha = 1$. The decimals and letters outside the nodes are the node importance values and node labels. Using our node importance calculation method on the graph, the new importance values of nodes 1–6 are 1.3302, 0.943, 1.3132, 1.3702, 0.9793 and 1.3662, respectively, so the node updating sequence is fixed as 4-6-1-3-5-2. The label propagation process is shown in Fig. 4.

Firstly, we update the label of node 6. We use a set of tuples (l, n, LI(l)) to describe the adjacent node information of node 6, where l represents the label, n represents the number of adjacent nodes carrying label l, and LI(l) represents the influence of label l. As shown in Fig. 4(a), node 6 has 3 adjacent nodes with different labels, so the adjacent node information of node 6 is {(c, 1, 0.4377), (d, 1, 0.4567), (e, 1, 0.4897)}. Label “e” has the largest influence, so we choose it as the new label of node 6.

Then we update the label of node 4. After the label of node 6 has been updated, the adjacent node information of node 4 is {(e, 2), (a, 1)}. Because only one label, “e”, reaches the maximum count among the neighbors of node 4, LPA_NI does not need to calculate the label influence and selects label “e” as the new label of node 4, as shown in Fig. 4(b). The subsequent label propagation for node 1 and node 3 proceeds in the same way as for node 6 and node 4, so the labels of node 1 and node 3 are updated to the new label “b”; the specific process is not demonstrated again.

Finally, the labels carried by the adjacent nodes of node 5 and node 2 are the same as their own labels, so they are not updated, as shown in Fig. 4(c). The label of every node is now the same as that of most of its neighboring nodes, and LPA_NI ends.

In this sample label propagation process, LPA_NI obtains the two communities in a single iteration, which is fully consistent with the correct community division. Since there is no randomness, the LPA_NI algorithm has good stability and accuracy.
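The walkthrough can be replayed with a small script. Two things here are assumptions rather than facts from the paper: the adjacency `ADJ` is not listed in the text and was inferred so that $NI(j)/d(j)$ reproduces the quoted influences 0.4377, 0.4567 and 0.4897, and the code visits nodes in the stated sequence 4-6-1-3-5-2, whereas the walkthrough describes node 6 before node 4. The node importance values are the ones quoted above for $\alpha = 1$.

```python
from collections import Counter, defaultdict

# Assumed adjacency, inferred from the quoted influences (not given in the text).
ADJ = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 6},
       4: {1, 5, 6}, 5: {4, 6}, 6: {3, 4, 5}}
# Node importance values quoted in the text for alpha = 1.
NI = {1: 1.3302, 2: 0.943, 3: 1.3132, 4: 1.3702, 5: 0.9793, 6: 1.3662}

def label_influence(i, labels, candidates):
    """Formula (5), restricted to the candidate labels."""
    li = defaultdict(float)
    for j in ADJ[i]:
        if labels[j] in candidates:
            li[labels[j]] += NI[j] / len(ADJ[j])
    return dict(li)

def max_count_labels(i, labels):
    counts = Counter(labels[j] for j in ADJ[i])
    best = max(counts.values())
    return {l for l, c in counts.items() if c == best}

def lpa_ni(max_iter=100):
    labels = {v: "abcdef"[v - 1] for v in ADJ}          # step (1)
    order = sorted(ADJ, key=NI.get, reverse=True)       # step (2)
    for _ in range(max_iter):                           # steps (3) and (5)
        for i in order:                                 # step (4)
            li = label_influence(i, labels, max_count_labels(i, labels))
            labels[i] = max(sorted(li), key=li.get)     # formula (6)
        if all(labels[i] in max_count_labels(i, labels) for i in ADJ):
            break
    return order, labels

initial = {v: "abcdef"[v - 1] for v in ADJ}
li6 = label_influence(6, initial, {"c", "d", "e"})      # node 6, Fig. 4(a)
order, final = lpa_ni()
```

Under these assumptions the script ends with nodes 1, 2 and 3 carrying label "b" and nodes 4, 5 and 6 carrying label "e", and it makes no random choices at any point.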

3.3. Time complexity analysis

Suppose $n = |V|$ and $m = |E|$. We first analyze the time complexity of the algorithm.

(1) The time complexity of initializing all nodes in step 1 is $O(n)$.

(2) The time complexity of calculating the node importance of all nodes in step 2 is $O(n)$, and the time complexity of ranking the nodes in descending order of NI with the quicksort algorithm in step 2 is $O(n \log_2 n)$, so the total time complexity of step 2 is $O(n \log_2 n + n)$.

(3) The time complexity of normal label updating is $O(m)$, and the time complexity of calculating the label influence according to formula (5), when necessary, is $O(m)$.

(4) The time complexity of assigning the nodes with the same label to a community in step 5 is $O(n)$.

Step 4 is iterative, and the maximum number of iterations is maxIter. The time complexity of the whole algorithm is $3 \times O(n) + 2 \times \mathit{maxIter} \times O(m) + O(n \log_2 n)$.

Fig. 4. Label propagation process of LPA_NI.

Then we analyze the space complexity of the algorithm.

(1) The space complexity of using an adjacency list to store the node and edge information in step 1 is $O(n + m)$.

(2) The space complexity of ranking the nodes with the quicksort algorithm in step 2 is $O(n \log_2 n)$.

(3) The space complexity of assigning the nodes with the same label to a community in step 5 is $O(n)$.

So the space complexity of the whole algorithm is $2 \times O(n) + O(m) + O(n \log_2 n)$.
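As a reading aid, the two totals can be simplified by absorbing constant factors and lower-order terms. This simplification is not stated in the text and assumes that maxIter is kept as an explicit factor:

$$3\,O(n) + 2\,\mathit{maxIter}\cdot O(m) + O(n \log_2 n) = O(n \log_2 n + \mathit{maxIter}\cdot m),$$

$$2\,O(n) + O(m) + O(n \log_2 n) = O(m + n \log_2 n).$$

If maxIter is additionally treated as a constant, as with the fixed cap of 100 iterations used in the experiments, the running time becomes $O(m + n \log_2 n)$, which stays close to the nearly linear cost of standard LPA apart from the one-off sorting term.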

4. Experiments and analysis

In order to verify the performance of the proposed LPA_NI algorithm, this paper uses real-world networks and synthetic networks to carry out community detection experiments.

We present the performance of LPA_NI on several real-world complex networks in the absence of ground-truth communities: Zachary’s karate club [25], the Football network [26], the Dolphins network [28] and the Sina micro-blog dataset used in [17]. We also present the performance of LPA_NI on the Girvan–Newman benchmark and the LFR benchmark [29].

We compare the performance of LPA_NI with that of LPA and NIBLPA, where NIBLPA is an improved LPA algorithm that chooses the label with the maximum k-shell value when assigning labels to nodes. To reduce the influence of randomness on the experiments, every algorithm is run 50 times and the average value is used as the result; the maximum number of iterations is set to 100. All the algorithms are implemented in Java and tested on the JDK 8.0 platform, and simulations are carried out on a notebook PC with an Intel(R) Core(TM) i5-4200 CPU @ 2.50 GHz and 12 GB of memory running Windows 10.

4.1. Experiment dataset

The details of the Sina micro-blog dataset are as follows:

(1) Information on 63641 Sina micro-blog users, including uid, nickname, name, location, homepage URL, gender, the number of fans, the number of concerns, the number of micro-blogs, the number of collections and the creation time;

(2) Information on 84168 micro-blogs about 12 topics from 2014-05-03 to 2014-05-11, including mid, release time, micro-blog content, source, the numbers of times the micro-blog was forwarded, evaluated and praised, the uid of the publishing user and the topic of the micro-blog;

(3) The 12 topics include Meizu, XiaoMi, Rockets, Jeremy Lin, Hengda, Korean, haze, house’s price, My Old Classmate, corrupt officials and transgenic;

(4) 1391718 user relationships;

(5) 27759 micro-blog forwarding relations.

The information on the 62827 users contained in the dataset can restore the true Sina micro-blog network, and the micro-blog information on the 12 topics can restore the flow of information in the micro-blog social network. The user relationships and micro-blog forwarding relations are sufficient to restore the real topology of the Sina micro-blog network.

Because the original dataset cannot meet the experimental requirements, it is necessary to preprocess the Sina micro-blog dataset. The specific operations are as follows:

(1) There is an edge between two user nodes if and only if the users are mutually concerned, that is, they follow each other; otherwise there is no edge between them;

(2) Remove the small isolated communities (those with fewer than 5 nodes in total), obtaining a network graph consisting of 1220 nodes and 5023 edges.
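The two preprocessing operations can be sketched as follows. The input format (a collection of directed `(follower, followee)` pairs) and both function names are assumptions, and the "small isolated communities" are interpreted here as connected components with fewer than 5 nodes, which the text does not state explicitly.

```python
def mutual_follow_graph(follows):
    """Operation (1): keep an undirected edge only if both directions exist."""
    follows = set(follows)
    adj = {}
    for u, v in follows:
        if u != v and (v, u) in follows:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj

def drop_small_components(adj, min_size=5):
    """Operation (2), assumed reading: drop components with < min_size nodes."""
    seen, kept = set(), {}
    for start in adj:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:                      # depth-first search of one component
            u = stack.pop()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    stack.append(v)
        seen |= comp
        if len(comp) >= min_size:
            for u in comp:
                kept[u] = set(adj[u])
    return kept

# Hypothetical data: a mutual 5-cycle, a mutual pair, and a one-way follow.
ring = [(i, (i + 1) % 5) for i in range(5)]
follows = ring + [(v, u) for u, v in ring] + [(10, 11), (11, 10), (0, 10)]
graph = mutual_follow_graph(follows)
cleaned = drop_small_components(graph)
```

In this toy input the one-way follow from 0 to 10 produces no edge, and the two-node component {10, 11} is removed, leaving only the 5-cycle.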

The LFR benchmark was presented by Lancichinetti et al. It generates static networks with built-in community structure. The configuration of the generated networks depends on various user-specified parameters: $N$ is the number of nodes, $k$ specifies the average degree of the nodes, and $k_{\max}$ is the upper bound on node degree. The mixing parameter $\mu$ is the fraction of a node’s edges that connect it to nodes outside its community. Thus, as we decrease $\mu$ we obtain a clearer set of communities with fewer inter-community edges. Later, the authors adapted the LFR benchmark to generate overlapping communities: parameter $O_n$ specifies the number of overlapping nodes and $O_m$ controls the number of memberships of the overlapping nodes. In this paper, we set $N = 1000$, $k = 15$, $k_{\max} = 50$, $\mu = 0.3$ to generate the LFR benchmark and set $N = 128$, $k = 16$, $k_{\max} = 16$, $\mu = 0.1$ to generate the Girvan–Newman benchmark on which we present the performance of LPA_NI.

4.2. Evaluation criteria

The most popular measure for comparing the delivered community structure with the ground-truth communities is Normalized Mutual Information (NMI). We use an implementation of the NMI measure for sets of overlapping communities made available by McDaid et al. [29]. Further, we use the modularity of the obtained community structure when the ground-truth community structure is unknown.
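For orientation, the sketch below computes the standard NMI for two non-overlapping partitions, $2\,I(A;B) / (H(A) + H(B))$. It is not the overlapping-community implementation by McDaid et al. used in the experiments; it is a simpler variant swapped in purely to illustrate what the score measures, and the function name `nmi` is hypothetical.

```python
from collections import Counter
from math import log

def nmi(labels_a, labels_b):
    """Standard NMI between two label lists given in the same node order."""
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    h_a = -sum(c / n * log(c / n) for c in ca.values())   # entropy H(A)
    h_b = -sum(c / n * log(c / n) for c in cb.values())   # entropy H(B)
    mi = sum(c / n * log(c * n / (ca[a] * cb[b]))          # mutual information
             for (a, b), c in joint.items())
    if h_a + h_b == 0:        # both partitions consist of a single community
        return 1.0
    return 2 * mi / (h_a + h_b)

same = nmi([0, 0, 1, 1], [5, 5, 7, 7])      # same partition, renamed labels
unrelated = nmi([0, 0, 1, 1], [0, 1, 0, 1])  # statistically independent
```

The score depends only on how nodes are grouped, not on the label names, so a detected partition that matches the ground truth up to renaming scores 1, while an independent partition scores 0.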

Modularity, proposed by Newman and Girvan [24], is currently widely used to measure the performance of community detection algorithms. Moreover, when the underlying class labels of a real network are unknown, modularity is the only evaluation criterion we can adopt on real networks. For a dataset with no overlapping communities, the modularity is defined as:

$$Q = \frac{1}{2m} \sum_{i,j \in V} \left[ A_{ij} - \frac{d(i)\,d(j)}{2m} \right] \times \delta(c_i, c_j), \tag{7}$$

where $Q$ represents the modularity, $m$ represents the number of edges in the network, and $A$ is the adjacency matrix of the network: if node $i$ and node $j$ are directly connected, $A_{ij} = 1$; otherwise, $A_{ij} = 0$. $c_i$ and $c_j$ denote the labels of node $i$ and node $j$, respectively; if $c_i = c_j$, then $\delta(c_i, c_j) = 1$, else $\delta(c_i, c_j) = 0$.
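Formula (7) can be evaluated directly on an adjacency dictionary. The sketch below is a plain quadratic-time transcription meant for small examples, not an efficient implementation; the function name `modularity` and the test graph (two triangles joined by one edge) are assumptions made for illustration.

```python
def modularity(adj, labels):
    """Formula (7) for an undirected, unweighted graph given as neighbor sets."""
    two_m = sum(len(nbrs) for nbrs in adj.values())    # 2m = sum of degrees
    q = 0.0
    for i in adj:
        for j in adj:
            if labels[i] == labels[j]:                 # delta(c_i, c_j) = 1
                a_ij = 1.0 if j in adj[i] else 0.0
                q += a_ij - len(adj[i]) * len(adj[j]) / two_m
    return q / two_m

# Hypothetical graph: two triangles connected by the single edge 2 - 3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
q_split = modularity(adj, {0: "x", 1: "x", 2: "x", 3: "y", 4: "y", 5: "y"})
q_single = modularity(adj, {v: "x" for v in adj})
```

For this graph $m = 7$ and each triangle has total degree 7, so the two-community split gives $Q = 2 \times (6 - 49/14) / 14 = 5/14 \approx 0.357$, while placing all nodes in one community gives $Q = 0$.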

4.3. Experiment result

In this section, the real dataset is used to test the effectiveness of LPA_NI compared with traditional LPA and NIBLPA, and we test the stability of the algorithms by analyzing the fluctuation range of the results.

To compare the effectiveness of the algorithms, we use the average modularity $Q$ to measure the quality of the detected communities: the greater the $Q$ value, the more accurate the community detection results, and the smaller the fluctuation range of $Q$, the more stable the results. NIBLPA and LPA_NI each contain only one parameter, $\alpha$. The parameter $\alpha$ measures the extent of the influence of neighboring nodes on node $i$, so we let $\alpha \in \{0, 0.1, 0.2, \cdots, 1\}$. More specifically, if $\alpha = 0$, the importance of the adjacent nodes of node $i$ has no influence, so the importance of node $i$ is only its own importance. If $\alpha = 1$, the importance of the adjacent nodes is accumulated with weight 1 to obtain the new importance of node $i$. The experimental results are shown in Fig. 5.

As can be seen in Fig. 5, for the real dataset used in this paper, the Q value of LPA_NI follows a bell-shaped curve as the parameter α gradually increases, and the maximum modularity Q = 0.6995 is achieved when α = 0.4. More specifically, when α = 0.4, the influence of adjacent nodes on node i is the most appropriate, and the new node importance has the best effect in improving the detection of community structure.

Fig. 5. The comparison of Q under different values of the tunable parameter α.

Fig. 6. 50 repeated experiments under parameter α = 0.4.

The Q value of NIBLPA changes little under different values of α, and its maximum modularity Q = 0.6197 is obtained when α = 0.5 and α = 0.6. The experimental results show that the modularity of LPA_NI is higher than that of NIBLPA when α = 0.4. As the influence of the importance of adjacent nodes on node i gradually increases, the quality of community detection of LPA_NI first increases and then gradually decreases. Therefore, an appropriate amount of adjacent-node importance significantly improves the accuracy of LPA_NI, and LPA_NI can reach an optimal community detection result.

In order to analyze the stability of LPA_NI, NIBLPA and LPA, this paper analyzes the 11 experimental results of 50 repeated experiments with parameter α = 0.4; the variation trend of the modularity value Q of each algorithm is shown in Fig. 6.

It can be seen from Fig. 6 that the Q values of LPA_NI and NIBLPA are 0.6995 and 0.6197, respectively, in the 50 repeated tests. Because LPA_NI and NIBLPA reduce the randomness of the update sequence and the label selection process, they have good stability. However, LPA is less stable because of the randomness of the algorithm. In general, LPA_NI obtains more stable results than LPA and more effective results than LPA and NIBLPA.
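The kind of stability test described here can be sketched as follows: run a randomized algorithm repeatedly and report the fluctuation range of Q. The code uses a basic asynchronous LPA with random update order and random tie-breaking, in the spirit of the original LPA [12], on a toy graph; it does not reproduce the paper's dataset, LPA_NI or NIBLPA, and the helper names are illustrative.

```python
import random
import statistics


def label_propagation(adjacency, rng, max_iter=100):
    """Basic asynchronous LPA; returns (labels, number of sweeps)."""
    labels = {node: node for node in adjacency}
    nodes = list(adjacency)
    for iteration in range(1, max_iter + 1):
        rng.shuffle(nodes)  # random update order
        changed = False
        for node in nodes:
            if not adjacency[node]:
                continue
            counts = {}
            for nb in adjacency[node]:
                counts[labels[nb]] = counts.get(labels[nb], 0) + 1
            best = max(counts.values())
            candidates = sorted(l for l, c in counts.items() if c == best)
            # Keep the current label if it is already among the most frequent.
            if labels[node] not in candidates:
                labels[node] = rng.choice(candidates)  # random tie-break
                changed = True
        if not changed:
            return labels, iteration
    return labels, max_iter


def modularity(adjacency, labels):
    """Modularity of Eq. (7) computed from an adjacency dict."""
    two_m = sum(len(nbrs) for nbrs in adjacency.values())
    inside, total = {}, {}
    for node, nbrs in adjacency.items():
        c = labels[node]
        total[c] = total.get(c, 0) + len(nbrs)
        inside[c] = inside.get(c, 0) + sum(1 for nb in nbrs if labels[nb] == c)
    return sum(inside[c] / two_m - (total[c] / two_m) ** 2 for c in total)


def fluctuation(values):
    """Mean and fluctuation range of a list of Q values."""
    return {
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }


# Toy graph: two 4-cliques joined by the single edge 3-4.
adjacency = {
    0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
    4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6],
}

q_values = []
for seed in range(50):  # 50 repeated runs, one seed per run
    labels, _ = label_propagation(adjacency, random.Random(seed))
    q_values.append(modularity(adjacency, labels))
stats = fluctuation(q_values)
```

A deterministic algorithm would give a range of zero under this test, which is the sense in which a fixed update order and a fixed label selection rule yield stable results.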


Fig. 7. The number of iterations under different values of the tunable parameter α.

Fig. 8. The running time under different values of the tunable parameter α.

In order to analyze the efficiency of LPA_NI, NIBLPA and LPA, Figs. 7 and 8 show the number of iterations and the running time of the algorithms, respectively.

It can be seen from Fig. 7 that, as the parameter α increases, the number of iterations of LPA_NI is stable at 7. NIBLPA shows a larger fluctuation in the number of iterations: in the early period the number of iterations is around 8, and then it surges to 17, 21 and 13. The experimental results show that the number of iterations of LPA_NI is significantly smaller than that of NIBLPA, and its running time is clearly better than that of the other algorithms. The node importance calculated from the k-shell value cannot fully describe the node influence in a large-scale social network, so when α ≥ 0.8 the number of iterations and the running time of the algorithm increase significantly. LPA_NI, which uses prior knowledge to calculate the node importance, is more accurate, so its number of iterations and running time are smaller, and its stability is better than that of the other algorithms.

Then, we compare the performance of LPA_NI, NIBLPA and LPA on Zachary's karate club, the Football network and the Dolphins network. The experimental results are shown in Table 1. LPA_NI achieves the highest Q on the Karate network, and its Q values on the Football and Dolphins networks are close to the best ones.

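Table 1
Q obtained by LPA_NI, NIBLPA and LPA on different real-world networks.

| Networks | Q (LPA_NI) | Q (NIBLPA) | Q (LPA) |
|----------|------------|------------|---------|
| Karate   | 0.4151     | 0.3944     | 0.3590  |
| Football | 0.5805     | 0.5762     | 0.5899  |
| Dolphins | 0.5143     | 0.5184     | 0.4797  |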

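Table 2
NMI obtained by LPA_NI, NIBLPA and LPA on different synthetic networks.

| Networks      | NMI (LPA_NI) | NMI (NIBLPA) | NMI (LPA) |
|---------------|--------------|--------------|-----------|
| LFR           | 1            | 1            | 1         |
| Girvan–Newman | 0.9989       | 0.9989       | 0.9989    |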

We also show the performance of LPA_NI on the LFR and Girvan–Newman benchmarks in Table 2, comparing our method with LPA and NIBLPA. LPA_NI achieves an NMI as high as that of the other two algorithms on both benchmarks.

In summary, on the one hand, LPA_NI takes into account that node importance may affect the label updating process, so it lets the more important nodes influence the less important ones, which reduces the "countercurrent" phenomenon in the label propagation process. On the other hand, LPA_NI takes into account the effect of label influence on label selection, which improves the accuracy of the label selection process. As can be seen, LPA_NI can significantly improve the quality of community detection and shorten the iteration period. It also has better accuracy and stability at similar complexity.
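The two mechanisms summarized above can be sketched as a deterministic label propagation loop: nodes are updated in descending order of importance, and ties between equally frequent labels are resolved by a fixed rule instead of a random choice. The sketch is illustrative only. Node degree stands in for the paper's node importance, and the tie-break (largest summed importance of the neighbors carrying the label) is a hypothetical stand-in for the paper's label influence; neither is the authors' formula.

```python
def ordered_label_propagation(adjacency, importance, max_iter=100):
    """Deterministic LPA sketch; returns (labels, number of sweeps)."""
    # Fixed update order: descending importance, node id as secondary key.
    order = sorted(adjacency, key=lambda n: (-importance[n], n))
    labels = {node: node for node in adjacency}
    for iteration in range(1, max_iter + 1):
        changed = False
        for node in order:
            if not adjacency[node]:
                continue
            count, weight = {}, {}
            for nb in adjacency[node]:
                lab = labels[nb]
                count[lab] = count.get(lab, 0) + 1
                weight[lab] = weight.get(lab, 0.0) + importance[nb]
            # Most frequent label wins; ties go to the label backed by the
            # larger summed neighbor importance (hypothetical rule), then to
            # the smaller label so that the outcome is fully deterministic.
            best = min(count, key=lambda lab: (-count[lab], -weight[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            return labels, iteration
    return labels, max_iter


# Toy graph: two hub-centered groups (hubs 0 and 4) bridged by the edge 3-7.
adjacency = {
    0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0, 7],
    4: [5, 6, 7], 5: [4, 6], 6: [4, 5], 7: [3, 4],
}
importance = {node: len(nbrs) for node, nbrs in adjacency.items()}  # placeholder
labels, iterations = ordered_label_propagation(adjacency, importance)
```

On this toy graph the two hubs are updated first and their labels spread to the lower-importance bridge nodes 3 and 7, so each bridge node stays with its own hub; repeated calls return the same partition because no step is random.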

5. Conclusion

This paper presents a label propagation algorithm for community detection based on node importance and label influence. The algorithm first uses a novel method for node importance evaluation and fixes the node order of label updating in descending order of node importance. During each label update, when multiple labels reach the maximum count, the label influence is introduced into the label updating formula to improve stability. The algorithm maintains the advantages of the original LPA, and it obtains stable and efficient results by avoiding the randomness in the label propagation process. Experiments on real datasets demonstrate that the proposed algorithm performs better than some of the current representative algorithms.

Uncited references

[18], [27]

References

[1] Youfang Lin, Tianyu Wang, Rui Tang, Yuanwei Zhou, Houkuan Huang, An effective model and algorithm for community detection in social networks, J. Comput. Res. Dev. 49 (2) (2012) 337–345.

[2] Dayou Liu, Di Jin, Dongxiao He, Jing Huang, Jianning Yang, Bo Yang, Community mining in complex networks, J. Comput. Res. Dev. 50 (10) (2013) 2140–2154.

[3] M. Shiga, I. Takigawa, H. Mamitsuka, A spectral approach to clustering numerical vectors as nodes in a network, Pattern Recognit. 44 (2) (2011) 236–251.

[4] Liang Huang, Ruixuan Li, Hong Chen, Xiwu Gu, Kunmei Wen, Yuhua Li, Detecting network communities using regularized spectral clustering algorithm, Artif. Intell. Rev. 41 (4) (2014) 579–594.

[5] R. Guimera, M. Sales-Pardo, L.A. Amaral, Modularity from fluctuations in random graphs and complex networks, Phys. Rev. E 70 (2) (2004) 188.

[6] R. Guimera, Functional cartography of complex metabolic networks, Nature 433 (7028) (2005) 895–900.

[7] J. Duch, A. Arenas, Community detection in complex networks using extremal optimization, Phys. Rev. E 72 (2) (2005) 986.

[8] Zhan Bu, Chengcui Zhang, Zhengyou Xia, Jiandong Wang, A fast parallel modularity optimization algorithm (FPMQA) for community detection in online social network, Knowl.-Based Syst. 50 (3) (2013) 246–259.


[9] S. Zhang, H. Zhao, Normalized modularity optimization method for community identification with degree adjustment, Phys. Rev. E 88 (5–1) (2013) 471.

[10] M. Girvan, M.E.J. Newman, Community structure in social and biological networks, Proc. Natl. Acad. Sci. USA 99 (12) (2002) 7821–7826.

[11] F. Wu, B.A. Huberman, Finding communities in linear time: a physics approach, Eur. Phys. J. B-Condens. Matter 38 (2) (2004) 331–338.

[12] U.N. Raghavan, R. Albert, S. Kumara, Near linear-time algorithm to detect community structures in large-scale networks, Phys. Rev. E 76 (3) (2007) 036106.

[13] G. Steve, Finding overlapping communities in networks by label propagation, New J. Phys. 12 (10) (2010) 2011.

[14] M.J. Barber, J.W. Clark, Detecting network communities by propagating labels under constraints, Phys. Rev. E 80 (2) (2009) 026129.

[15] I.X. Leung, P. Hui, P. Lio, J. Crowcroft, Towards real time community detection in large networks, Phys. Rev. E 79 (6) (2009) 066107.

[16] XianKun Zhang, Xue Tian, YaNan Li, Chen Song, Label propagation algorithm based on edge clustering coefficient for community detection in complex networks, Int. J. Mod. Phys. B 28 (30) (2014).

[17] XianKun Zhang, Song Fei, Chen Song, Xue Tian, YangYue Ao, Label propagation algorithm based on local cycles for community detection, Int. J. Mod. Phys. B 29 (5) (2015).

[18] J. Xie, B.K. Szymanski, LabelRank: a stabilized label propagation algorithm for community detection in networks, Netw. Sci. Workshop (2013) 138–143.

[19] Y. Xing, F. Meng, Y. Zhou, M. Zhu, M. Shi, G. Sun, A node influence based label propagation algorithm for community detection in networks, Sci. World J. 2014 (5) (2014) 627581.

[20] J. Sohn, D. Kang, H. Park, B.G. Joo, I.J. Chung, An Improved Social Network Analysis Method for Social Networks, Springer, Netherlands, 2014, pp. 115–123.

[21] L. Šubelj, M. Bajec, Group detection in complex networks: an algorithm and comparison of the state of the art, Phys. A, Stat. Mech. Appl. 397 (3) (2014) 144–156.

[22] O. Green, D.A. Bader, Faster betweenness centrality based on data structure experimentation, Proc.