• Tidak ada hasil yang ditemukan

importance and label influence Label propagation algorithm for community detection based on node Physics Letters A

N/A
N/A
Protected

Academic year: 2024

Membagikan "importance and label influence Label propagation algorithm for community detection based on node Physics Letters A"

Copied!
8
0
0

Teks penuh

(1)

Contents lists available atScienceDirect

Physics Letters A

www.elsevier.com/locate/pla

1 67

2 68

3 69

4 70

5 71

6 72

7 73

8 74

9 75

10 76

11 77

12 78

13 79

14 80

15 81

16 82

17 83

18 84

19 85

20 86

21 87

22 88

23 89

24 90

25 91

26 92

27 93

28 94

29 95

30 96

31 97

32 98

33 99

34 100

35 101

36 102

37 103

38 104

39 105

40 106

41 107

42 108

43 109

44 110

45 111

46 112

47 113

48 114

49 115

50 116

51 117

52 118

53 119

54 120

55 121

56 122

57 123

58 124

59 125

60 126

61 127

62 128

63 129

64 130

65 131

66 132

Label propagation algorithm for community detection based on node importance and label influence

Xian-Kun Zhang

1

, Jing Ren, Chen Song, Jia Jia, Qian Zhang

CollegeofComputerScienceandInformationEngineering,TianjinUniversityofScienceandTechnology,No.29,13thAvenue,TEDA,BinhaiNewArea,Tianjin, 300457,PRChina

a r t i c l e i n f o a b s t ra c t

Articlehistory:

Received28March2017

Receivedinrevisedform9June2017 Accepted10June2017

Availableonlinexxxx CommunicatedbyC.R.Doering

Keywords:

Large-scalesocialnetwork Communitydetection Labelpropagation Nodeimportance Labelinfluence

Recently, the detection of high-quality community has become a hot spot in the research of social network. Labelpropagationalgorithm(LPA)hasbeenwidelyconcernedsince ithasthe advantagesof lineartimecomplexityand isunnecessarytodefineobjectivefunctionandthenumber ofcommunity inadvance.However,LPAhastheshortcomingsofuncertaintyandrandomnessinthelabelpropagation process,whichaffectstheaccuracyand stabilityofthecommunity.Forlarge-scalesocialnetwork,this paperproposesanovellabelpropagationalgorithmforcommunitydetectionbasedonnodeimportance and label influence (LPA_NI). The experiments with comparative algorithms on real-world networks and synthetic networks have shown that LPA_NIcan significantly improve the quality ofcommunity detectionandshortentheiterationperiod.Also,ithasbetteraccuracyandstabilityinthecaseofsimilar complexity.

©2017ElsevierB.V.Allrightsreserved.

1. Introduction

Socialnetworkisarelativelystablesocialrelationshipwiththe interactionresult betweenindividual members.Social network is commonlyabstractedintoagraph,inwhichnodesrepresentindi- vidualmembersandedges (orlinks)aretherelationshipbetween individual members [1]. A community is a subgraph containing nodeswhicharemoredenselylinkedtoeachotherthantotherest ofthegraphorequivalently,agraphhasacommunitystructureif thenumberoflinksintoanysubgraphishigherthanthenumber oflinksbetweenthose subgraphs[24].Givenanetwork diagram, theprocessoffindingitscommunitystructureiscalledcommunity detection. In socialnetwork, community detectionforsocial net- workanalysisisimportant.Inrecentdecades,manysocialnetwork communitydetectionalgorithmshavebeenproposed.Accordingto thesolutionstrategy,algorithms canbe dividedintooptimization methodsandheuristicmethods[2].Optimizationmethodsachieve community detection through setting the objective function and iteratingto getapproximation of theoptimal value ofthe objec- tivefunction.Therepresentativealgorithmsofoptimizationmeth- ods includespectral algorithms andmodularity maximization al- gorithms[3–9].Thespectralalgorithms[3,4]transformcommunity identificationproblemintosimplequadraticoptimizationproblem,

E-mailaddress:[email protected](X.-K. Zhang).

1 Fax:8602260274450.

andtheapproximateoptimalpartitionofthenetworkisobtained bysolvingthecharacteristicvectorsofLaplacianmatrix.Modular- itymaximization methodsare to findthemaximumvalue ofthe modularityfunction(Q function)inthenetwork,whoserepresen- tative algorithms includeFN algorithm [5], GA algorithm [6]and EO algorithm [7]. Heuristic methods find the optimaldivision of communities bysettingheuristicsrules,anditsrepresentative al- gorithmsincludeGNalgorithm[10]andWHalgorithm[11].

Inaddition,thereareothereffectivecommunitydetectionalgo- rithms. Forexample,hierarchicalclustering algorithms regard the social network asthecomposition of themulti-layer community, and use the traditional hierarchicalclustering algorithm to carry outcommunitydetection.Evolutionarycomputationalgorithmsre- gardtheprocessofcommunitydetectionastheoptimizationpro- cess ofthe objective function,and useevolutionary computation methodstofindtheapproximateoptimalsolutiontotheproblem.

Thesealgorithms generallyhave hightime andspacecomplexity, andthey are not suitable forlarge-scale social networkstructure detection.In2007,Raghavanetal.[12]proposedafastcommunity detectionalgorithmLPAbasedonlabelpropagation.Thecomplex- ityofLPAisnearlylineartime,andthedesignofthealgorithmis simple,soLPAhasgoodefficiencyindealingwithlarge-scalenet- work. Moreover, LPA neitherrequire apredefined objective func- tion optimizationnor need communityquantity andscale. All of these make LPA receive quite a lot of attention from numerous scholars.However, intheprocess ofupdatingthenode label, LPA hastheshortcomingsofuncertaintyandrandomness,whichleads http://dx.doi.org/10.1016/j.physleta.2017.06.018

0375-9601/©2017ElsevierB.V.Allrightsreserved.

(2)

1 67

2 68

3 69

4 70

5 71

6 72

7 73

8 74

9 75

10 76

11 77

12 78

13 79

14 80

15 81

16 82

17 83

18 84

19 85

20 86

21 87

22 88

23 89

24 90

25 91

26 92

27 93

28 94

29 95

30 96

31 97

32 98

33 99

34 100

35 101

36 102

37 103

38 104

39 105

40 106

41 107

42 108

43 109

44 110

45 111

46 112

47 113

48 114

49 115

50 116

51 117

52 118

53 119

54 120

55 121

56 122

57 123

58 124

59 125

60 126

61 127

62 128

63 129

64 130

65 131

66 132

Fig. 1.Label propagation process (Raghavan et al.[12]).

to poorresults inaccuracy andstability of the algorithm.Litera- ture[19]takesintoaccounttheinfluenceofnodedegreeandedge weightduringthelabelupdating process.Butinlarge-scalesocial networks,such asthemicro-blogsocialnetwork,theinfluence of thenodes’ prior attribute onthe node importance is not consid- ered.

Inthispaper, wepropose a novellabelpropagationalgorithm forcommunitydetectionbasedonnode importanceandlabelin- fluence in networks (LPA_NI). Firstly, LPA_NI uses a novel node importancemethodtoevaluatenodeimportance.Secondly,inthe iterativeprocess,LPA_NIfixesthenodeordersoflabelupdatingin thedescendingorderofnodeimportance.Asthenumberofmul- tiple labelsreaches maximum,LPA_NI calculates the influenceof each label and selects the mostinfluential label to update node label. Finally, experiments with comparative algorithms on real- world networksand syntheticnetworks haveshown that LPA_NI cansignificantly improvethe qualityofcommunitydetectionand shortenthe iteration period.Also, ithasbetter accuracy andsta- bilityinthecaseofsimilarcomplexity.

The restofthe paperisorganized asfollows.Section 2 intro- duces the label propagation algorithm and the node importance calculationalgorithmbasedonBayesiannetwork.InSection3,this paper introduces the main idea and the detailed process of our algorithm. The experimental results on real-world networks and synthetic networksin Section 4 confirm the effectiveness of the algorithm.TheconclusionisgiveninSection5.

2. Relatedwork

A complex network can be modeled as a graph G=(V,E), where V = {v1,v2,· · ·,vn} represents the set of nodes, E = {e1,e2,· · ·,em}representstheedgesbetweenthenodes,andn,m representthenumberofnodesandedges inthenetwork,respec- tively.EachedgeinE hasapairofnodesinV corresponding.The labelofvi isdenoted asci, N(i)representstheneighborhoodset ofviandd(i)isthedegreeofnodei.

2.1. Labelpropagationalgorithmforcommunitydetectioninnetworks

LPA is a kind of heuristic algorithm, and the main heuristic rules of LPA is to transfer label information constantly between nodes. After several iterations, the node label belonging to the samecommunitywillconverge.ThemainideaofLPAisthateach nodechangesitslabeltotheonecarriedbythelargestnumberof itsadjacentnodes(asshowninFig. 1).TheprocessofLPAcanbe describedasthefollowingsteps:

Input: Network G=(V,E), maximum number of iterations maxIter.

Output:CommunitysetSetc= {c1,· · ·,ck},andkisthenumber ofcommunity.

(1)Initializetheuniquelabelforeachnodeinthenetwork.For agivennodex,cx(0)=x.

(2)Setiterationnumbert=1.

Fig. 2.The oscillation of labels in a bipartite graph (Xing et al.[19]).

(3)Arrangethenodesofthenetworkinrandomorder,andgen- erateanorderedsequence X= {x1,x2,· · ·,xn}.

(4) Foreach node xX,iteratively update the node labelac- cording to the formula (1) or (2). In the iterative process, each nodechangesitslabeltotheonecarriedbythelargestnumberof itsadjacentnodes.Asthenumberofmultiplelabelsreachesmax- imum,LPArandomlyselectsoneofthemtoupdatethenodelabel.

Labelupdatingmethodscanbedividedintosynchronousupdating method andasynchronous updating method.Synchronous updat- ing methodis to update the labelof node x inthe tth iteration basedonthelabeloftheadjacentnodesatthe(t1)thiteration.

Theformulaisasfollows:

cx

(

t

) =

f

cx1

(

t

1

), · · · ,

cxk

(

t

1

)

,

xi

N

(

x

),

(1) where cx(t) representsthelabelof node xinthe t iteration,and N(x)representstheneighborhoodsetofnode x.Synchronousup- dating may occur oscillation phenomenon in bipartite or nearly bipartitestructurenetwork(asshowninFig. 2),andasynchronous updating can be a good solution to this problem [12]. Asyn- chronous updating is to update the label of node x in the tth iteration based on a portion of labels at the tth iteration of its adjacentnodeswhichhavealreadybeenupdatedinthecurrentit- eration andanotherpartoflabelsatthe(t1)thiterationwhich arenotyetupdatedinthecurrentiteration.Theformulaisasfol- lows:

cx

(

t

) =

f

cx1

(

t

1

), · · · ,

cxm

(

t

1

),

cx(m+1)

(

t

), · · · ,

cxk

(

t

) ,

xi

N

(

x

),

(2)

(5)Ifiterationnumbert=maxIter orthelabelofeachnodeis thesameasthatofmostofitsneighboringnodes,thenthenodes with thesame labelare placed inthe samecommunity, andthe algorithmends;otherwise,sett=t+1 andgotostep (3).

AlabelpropagationprocessisshowninFig. 1.Firstly, LPAini- tializesallthenodesinthegraph,andallthe5nodesareassigned unique labels: “a”, “b”, “c”, “d” and“e”. Secondly,algorithm ran- domlyselectsnode3toupdate itslabel,becausetheneighboring nodes around node 3 are labeled withthe maximumvalue of 1, node 3 randomly selects one of them to update its label as“a”.

Then, algorithm randomly selects node 5to update its label, be- causethereare twoneighborsofnode 5that sharelabel“a”and only thelabel“a”hasa maximumvalue of2,node5updatesits

(3)

1 67

2 68

3 69

4 70

5 71

6 72

7 73

8 74

9 75

10 76

11 77

12 78

13 79

14 80

15 81

16 82

17 83

18 84

19 85

20 86

21 87

22 88

23 89

24 90

25 91

26 92

27 93

28 94

29 95

30 96

31 97

32 98

33 99

34 100

35 101

36 102

37 103

38 104

39 105

40 106

41 107

42 108

43 109

44 110

45 111

46 112

47 113

48 114

49 115

50 116

51 117

52 118

53 119

54 120

55 121

56 122

57 123

58 124

59 125

60 126

61 127

62 128

63 129

64 130

65 131

66 132

Fig. 3.The social network sample graph.

labelas“a”. Finally,thelabelsofthewhole nodeshavebeenup- datedforlabel“a”,thelabelpropagationalgorithmend.

Althoughasynchronousupdatingmethodcanavoidtheoscilla- tionproblemsoflabels,it canbe seenfromtheabove steps that theuncertaintyandrandomness ofLPAaremainly fromtheran- domnessofupdatesequenceanduncertaintyinnodelabelschoos- ingprocessinstep(4),whichaffectedtheaccuracyandstabilityof LPAalgorithm.WeanalyzetraditionalLPAwiththesocialnetwork sample graphinFig. 3. Thereare 2 communitiesin thenetwork, {1,2,3}and{4,5,6}.The6nodesinthegraphareinitializedwith label“a”, “b”,“c”,“d”, “e”and“f”,respectively.Assumingthatthe labelsofnodes1,2,3havebeenupdatedtolabel“a”,thelabelsof nodes4,5,6arestill“d”,“e”,“f”.Ifwerandomlyselectnode4to updateandlabel“a”ischosenasitslabel,thenweupdatenode6, andfinally updatenode5.Theresultisthatallnodesinnetwork willbe updatedto label“a”,andall nodeswillbe dividedinto a community.Conversely,ifnode4isupdated tolabel“f”,thenup- datenode 5,andfinallyupdatenode6.Theresultofthenetwork willbe dividedinto2communities,{1,2,3}and{4,5,6},andthe resultswillcorrespondwiththerightcommunities.

Inrecentyears,manyscholarshaveimprovedthestandardLPA algorithm.Steve etal.[13] extended the standard LPAalgorithm, andproposedanalgorithmCOPRAfordetectingoverlapping com- munity structure by allowing each node to retain a number of communitylabels.Barberetal.[14] proposeda constrainedlabel propagationalgorithmLPAm,whichconvertedLPAasanoptimiza- tionproblem,andgavethecorrespondingobjectivefunctionH,so itcouldsolvetheproblemofclusteringperformanceofLPA.Leung etal.[15] foundthat afterfiveiterationofLPA,95%ofthenodes hadbeencorrectly gathered.Afterthat, the iteration was mainly toupdatethenodeslabelsinthecommunity.Therefore,theyim- provedtheupdatingrulesandtheiterativerulesofLPA.Zhanget al.[16] proposedaLPAc algorithmbasedonedgeclusteringcoef- ficient,whichimprovedthe labelupdating processby calculating theedgeclustering coefficient.Literature[17]usedthemethodof localcyclestosolvetherandomprobleminlabelupdatingprocess.

However,thesealgorithmsdonottakeintoaccounttheimportance ofnodesandtheinfluenceofdifferentlabels.Inlarge-scalesocial networks,theprioripropertyofdifferentnodeswilldeterminethe node importance, and different node importance will decide the labelinfluenceofthenodelabeled,sotheinfluenceofdifferentla- belswillbreaktherandomrulesoflabelpropagationprocess,and itcansignificantlyimprovetheaccuracyandstability ofthealgo- rithm.Yao etal.[19] proposed anodes’k-shellvalue basedlabel propagationalgorithm NIBLPA, which chooses the label withthe maximumk-shellvaluetoupdatethenodelabel.Butthealgorithm NIBLPAdoesn’tconsidertheprioripropertyofnodesinsocialnet- works,suchasthenumberoffansoftheusernodes,user’smicro- blogreleasednumber,anduser’smicro-blogforwardednumberin micro-blogsocialnetwork.Therefore,theaccuracyofnode impor- tancecalculationmethodneedstobeimproved.

2.2.NodeimportancecalculationalgorithmbasedonBayesiannetwork

There are many algorithms to calculate the importance of nodes, such as degree centrality [20], clustering coefficient cen- trality[21],andbetweennesscentrality[22],butdegreecentrality

andclustering coefficientcentralitycanonlydescribethelocalin- formation ofnetworks. And the time complexity of betweenness centrality is too high, so it is only suitable for small networks.

Forlarge-scale socialnetworks, Zhangetal.[23] proposeda user nodeimportancecalculationalgorithmbasedonBayesiannetwork, which considered the messages inmicro-blogsocial network are high redundancy, fast rate of transmission and high timeliness.

Firstly,thealgorithm givesadeepanalysisofusers’behaviorpat- terns in micro-blog social network. Secondly, it uses a Bayesian model to learn prior probability of property of user node, and identifiesimportantusersbymanualidentification.Thirdly,ituses expertknowledgetoobtainthepriorprobabilityofeachproperty, andstudiesthe attributevalue ofthe nodewithmajor influence.

Finally,itestablishes amodelofBayesiannetworktodescribethe relationshipbetweenuserpropertyandinfluence,whichcannor- malizeallthenodesimportanceinnetwork.Thealgorithmcanbe describedasthefollowingsteps:

(1) Obtain user’s basic information which includes the total number of fans, the total number of micro-blogs, the number of concerns, the number of collections andcreation time. More- over,obtainuser’smicro-bloginformation,friendrelationshipand micro-blog’sforwardedrelations.

(2) Based on user’s basic information, calculate user’s accu- mulative numberofforwarded micro-blogs,accumulative number ofevaluated micro-blogs,accumulative numberof praised micro- blogs,averagenumberofforwarded micro-blogs,averagenumber of evaluated micro-blogs,average number ofpraised micro-blogs and activity, and classify the micro-blog through the content of themicro-bloginformation.

(3) Identify the important user node manually from different fieldsinthemicro-blogsocialnetwork,thenobtain thethreshold thatmeasurestheinfluenceofvariouspropertiesbycomparingthe identifiedimportantuserwiththeusersinthedataset,finallyes- tablishuserattributes-influenceprobabilityformula.

(4) Calculate the total user influence through multiplying the influenceofeachattributefactor.Theformulaisasfollows:

P

(

Inf

) =

P

(

Inf

|

Attr

),

(3)

where P(Inf)representstheinfluenceoftheuser,andP(Inf|Attr) representstheinfluenceofeachattributeoftheuser.

(5)Normalizetheinfluenceofalltheusersinthedataset,and theimportanceofeachnodeisobtained.

3. Proposedmethod

Alotofimprovedlabelpropagationalgorithmsrandomlydeter- mine labelupdating order ofnodesin labelpropagationprocess, because these algorithms didn’t take into account that the node importancemayimpactlabelupdatingprocess,itmayleadtothe lessimportant nodes affectsome more important nodesin turn, anditiscalled“countercurrent”phenomenoninlabelpropagation process. Besides,many algorithms only usethe numberoflabels tomeasure labelinfluenceinthe stageoflabelselection, andig- noretheeffectofnodeinfluenceonlabelselection.Inthissection, weproposeanovellabelpropagationalgorithmforcommunityde- tectionbasedonnodeimportanceandlabelinfluenceinnetworks (LPA_NI).Firstly,LPA_NIusesanovelmethodfornodeimportance evaluation. Secondly, inthe iterativeprocess, LPA_NI updates the nodes in thedescending order basedon the importance of each node.Asthenumberofmultiplelabelsreachesmaximum,LPA_NI calculatestheinfluenceofeachlabelandselectsthemostinfluen- tiallabeltoupdatenodelabel.

(4)

1 67

2 68

3 69

4 70

5 71

6 72

7 73

8 74

9 75

10 76

11 77

12 78

13 79

14 80

15 81

16 82

17 83

18 84

19 85

20 86

21 87

22 88

23 89

24 90

25 91

26 92

27 93

28 94

29 95

30 96

31 97

32 98

33 99

34 100

35 101

36 102

37 103

38 104

39 105

40 106

41 107

42 108

43 109

44 110

45 111

46 112

47 113

48 114

49 115

50 116

51 117

52 118

53 119

54 120

55 121

56 122

57 123

58 124

59 125

60 126

61 127

62 128

63 129

64 130

65 131

66 132

3.1. Thebasicidea

The importance of nodesis used to measure the influence of nodesinthewhole network,andinthispaper,we usenode im- portancecalculation algorithmbasedonBayesiannetworktonor- malizetheimportanceofallthenodesinnetwork.Ingeneral,the greater the importance ofa node has,the greater the impacton theothernodesithas,sothelabelofthisnodeismorelikelytobe spread.Butnodenormalizedimportancebasedonprioriattributes ofnodeisnotenough,becausetherearecloselinksbetweennodes and nodes in the network, a node linked with more important nodesismoreimportant.Therefore,weproposedanovelnodeim- portancecalculationmethod basedonthe earlynode importance algorithm,andtheformulaisasfollows:

NI

(

i

) =

Inf

(

i

) + α ∗

jN(i) Inf

(

j

)

d

(

j

) ,

(4)

where NI(i) represents the importance of node i, Inf(i) repre- sents the priori importance of node i, it is from expert knowl- edge,

α

representsatunableparameterrangingfrom0to1,which measures the extent of the influence of neighboring nodes on node i,N(i) represents theneighborhood set ofnode i, andd(j) isthedegreeofnode j.

Wechoosenodeimportanceasthemeasureofnode influence, so we improve the performance of stability by fixing the nodes sequenceoflabelupdatinginthedescendingorderofnodeimpor- tance.

AnotherfactorthataffectsthestabilityofLPAisthatwhenthe number of multiple labels reaches maximum, the original algo- rithm randomly selects a label to update the node label. In this paper,weproposealabelselectionmethodusingtheinformation ofthelabelinfluence,sowecancalculatetheinfluenceofeachla- belandselectsthemostinfluentiallabeltoupdatenodelabel.The formulaisasfollows:

LI

(

i

,

l

) =

jNl(i) NI

(

j

)

d

(

j

) ,

(5)

whereLI(i,l)representstheinfluenceofthelabell onthenodei, andNl(i)representsthesetoflabellaroundthenode i.

Asaresult,thenewlabelupdateruleformulaisasfollows:

ci

=

arg max

llmaxLI

(

i

,

l

),

(6)

wherecirepresentsthemostinfluentiallabel,andlmax represents thesetsofthemaximumnumberoflabels.

More specifically, when the number of multiple labels in the adjacent node of the node i reaches maximum, LPA_NI recalcu- latestheinfluenceofthelabelsthat reachthemaximumnumber accordingtotheformula(6),andselectsthemostinfluentiallabel toupdatethenodei.

3.2. ThestepsofLPA_NIalgorithm

Based on the novel node importance calculation method and the label influence calculation method, we design a novel label propagationalgorithmforcommunitydetectionbasedonnodeim- portanceandlabelinfluence(LPA_NI).Specificstepsareasfollows:

Input: Network G=(V,E), maximum number of iterations maxIter.

Output:CommunitysetSetc= {c1,· · ·,ck},andkisthenumber ofcommunity.

(1)Initializetheuniquelabelforeachnodeinthenetwork,for agivennodex,cx(0)=x.

(2)Calculate the node importance foreach node accordingto theformula(4),sortnodesindescendingorderofNI,andgenerate

an orderedsequence X= {x1,x2,· · ·,xn},whereNI(x1)NI(x2)

· · · ≥NI(xn).

(3)Setiterationnumbert=1.

(4) Foreach node xX,iteratively update the node labelac- cording to theformula (5)and (6). In theiterative process, each nodeusesthelabelwiththelargestimportanceofadjacentnodes asthelabelitself.Whenthenumberofmultiplelabelsinthead- jacent node of the node i reaches the maximum value, LPA_NI recalculates the influence ofthe labels that reach the maximum number accordingtothe formula(6), andselectsthemostinflu- entiallabeltoupdatenodei.

(5)Ifiterationnumbert=maxIter orthelabelofeachnodeis thesameasthatofmostofitsneighboringnodes,thenthenodes with thesame labelare placed inthe samecommunity, andthe algorithmends;otherwise,sett=t+1 andgotostep(4).

LPA_NI algorithm updates the nodes in the descending order based on theimportance of each node,so the nodeswith larger node importance willinfluence the nodeswith lessnode impor- tance, which is conducive to reducing the “countercurrent” phe- nomenon. Furthermore,LPA_NIconsiders theeffectoflabelinflu- enceonlabelselection strategy,soithashigheraccuracyinlabel updating process.Thereby,LPA_NI improvestheaccuracyandsta- bilityofthealgorithm.

We implementLPA_NI on thesocial network sample graphin Fig. 3with

α

=1.Thedecimalsandlettersoutsidethenodesare thenode importanceandnodelabels.Usingournodeimportance calculation method on the graph, the new importance of node 1–6are1.3302, 0.943,1.3132,1.3702,0.9793,1.3662,respectively, so thenode updating sequence isfixed as4-6-1-3-5-2.The label propagationprocessisshowninFig. 4.

Firstly, we update thelabelofnode 6.We usea set oftuples (l,n,LI(l)) to describe the adjacent node information of node 6, where l represents thelabel, n represents the numberofthe la- bel l, and LI(l) represents the influence of label l. As shown in Fig. 4(a), node 6 has 3 adjacent nodes, and the adjacent nodes havedifferentlabels,so theadjacentnode informationofnode 6 is {(c,1,0.4377),(d,1,0.4567),(e,1,0.4897)}.So we chooselabel

“e”asthenewlabelofnode 6.

Then,weupdatethelabelofnode4.Afterthelabelupdatingof node 6,theadjacentnodeinformationofnode4is{(e,2),(a,1)}. Due tothemaximumvalue ofadjacentnode ofnode 4isonly1, LPA_NI nolonger calculates the labelinfluence, andselects label

“e” as the new label of node 4 asshown in Fig. 4(b). The next labelpropagationofnode1 andnode 3areconsistent withnode 6andnode4,sothelabelsofnode1andnode3are updatedas newlabel“b”,andthespecificprocessisnolongerdemonstrated.

Finally,the labelsofadjacentnode informationofnode 5and node2aretheirownlabels,sotheyarenotupdatedand,asshown inFig. 4(c).Thelabelsofallnodesarethesameasthatofmostof theirneighboringnodes,andLPA_NIends.

In the label propagation process sample of LPA_NI, the algo- rithmcangettheresultsofthetwocommunitiesthroughaniter- ation,whichfullyconsistentwiththecorrectcommunitydivision.

Sincethereisnorandomness,theLPA_NI algorithmhasgoodsta- bilityandaccuracy.

3.3. Timecomplexityanalysis

Suppose n= |V|andm= |E|,andwe firstly analyzethetime complexityofthealgorithm.

(1)Thetimecomplexityofinitializationforallnodesinstep1:

O(n).

(2)Thetime complexityofcalculatingthenode importanceof all nodesinstep2: O(n),andthetime complexityofrankingthe nodes in descending order of NI based on the fast sorting algo-

(5)

1 67

2 68

3 69

4 70

5 71

6 72

7 73

8 74

9 75

10 76

11 77

12 78

13 79

14 80

15 81

16 82

17 83

18 84

19 85

20 86

21 87

22 88

23 89

24 90

25 91

26 92

27 93

28 94

29 95

30 96

31 97

32 98

33 99

34 100

35 101

36 102

37 103

38 104

39 105

40 106

41 107

42 108

43 109

44 110

45 111

46 112

47 113

48 114

49 115

50 116

51 117

52 118

53 119

54 120

55 121

56 122

57 123

58 124

59 125

60 126

61 127

62 128

63 129

64 130

65 131

66 132

Fig. 4.Label propagation process of LPA_NI.

rithm in step 2: O(nlog2(n)), so the whole time complexity of step 2is O(nlog2(n)+n).

(3)The time complexity ofnormal labelupdating: O(m), and the time complexity of recalculating the labels according to the formula(5)ifnecessary: O(m).

(4)The timecomplexity ofassigning thenodeswiththesame labelstoacommunityinstep 5:O(n).

Step 4 is iterative, and maximum number of iterations is maxIter.Thetimecomplexityofthewholealgorithmis3×O(n)+ 2×maxIter×O(m)+O(nlog2(n)).

Thenweanalyzethespacecomplexityofthealgorithm.

(1)The space complexity of using adjacency list to store the nodeandedgeinformationinstep1: O(n+m).

(2)Thespacecomplexityofrankingthenodesbasedonthefast sortingalgorithminstep2: O(nlog2(n)).

(3)Thespacecomplexityofassigningthenodeswiththesame labelstoacommunityinstep5: O(n).

Sothespacecomplexity ofthewholealgorithmis2×O(n)+ O(m)+O(nlog2(n)).

4. Experimentsandanalysis

InordertoverifytheperformanceoftheproposedLPA_NIalgo- rithm,thispaperusesreal-worldnetworksandsyntheticnetworks tocarryoutthecommunitydetectionexperiment.

We present the performance of LPA_NI on several real-world complexnetworkswiththeabsent ofground-truthcommunities:

Zachary’skarateclub[25],Footballnetwork[26],Dolphinsnetwork [28]andtheSinamicro-blogdatasetwhichusedinliterature[17].

And we present the performance of LPA_NI on Girvan–Newman benchmarkandLFRbenchmark[29].

Wecomparetheperformance ofLPA_NI withLPAandNIBLPA.

Where NIBLPA is an improved LPA algorithm choosing the label with the maximum k-shell value to assign the nodes. In order toovercometheinfluenceoftherandomnessoftheexperiments, all the algorithms are processed 50 timesand the averagevalue is used as the results, and the maximum number of iterations

of the algorithm is set to 100 times.All the algorithms are im- plemented in Java and tested in JDK8.0 platform, and simula- tions are carried out in a notebook PC withInter (R) Core (TM) CPU i5-4200 @ 2.50 GHz and 12 GB memory under Windows 10 OS.

4.1. Experimentdataset

ThedetailsoftheSinamicro-blogdatasetareasfollows:

(1) 63641 Sina Micro-blog users’ information, including uid, nickname, name, location, homepage url, gender, the number of fans, the number of concerns, the number of micro-blogs, the numberofcollectionsandcreationtime;

(2) 84168 micro-blogs’ information about 12 topics from 2014-05-03to2014-05-11,includingmid,releasetime,themicro- blog content, source, the number of forwarded micro-blog, the number of evaluated micro-blog, the number of praised micro- blog,publisheduseruidandtopicofmicro-blog;

(3) 12 topics includes Meizu, XiaoMi, Rockets, Jeremy Lin, Hengda,Korean,haze,house’sprice,MyOldClassmate,corruptof- ficialsandtransgenic;

(4)1391718users’relationships;

(5)27759forwardedmicro-blog’srelations.

62827users’ informationcontained inthe datasetcan restore thetrueSinamicro-blognetwork,and12themesofmicro-blogin- formationcanrestoretheflowofinformationinmicro-blogsocial network.Users’relationshipsandforwardedmicro-blog’srelations areenoughtorestoretherealtopologyofthemicro-blogSinanet- work.

Because the original dataset cannot meet the experimental requirements, it is necessary to preprocess the Sina micro-blog dataset.Specificoperationsareasfollows:

(1)Ifandonlyiftheusernodesaremutualconcerned,thereis an edge betweenthenodes,otherwise thereis noedge between thenodes;

(2)Removethediscretesmallcommunity(the totalnumberof nodesin communityislessthan 5), andobtaina network graph consistingof1220pointsand5023edges.

(6)

1 67

2 68

3 69

4 70

5 71

6 72

7 73

8 74

9 75

10 76

11 77

12 78

13 79

14 80

15 81

16 82

17 83

18 84

19 85

20 86

21 87

22 88

23 89

24 90

25 91

26 92

27 93

28 94

29 95

30 96

31 97

32 98

33 99

34 100

35 101

36 102

37 103

38 104

39 105

40 106

41 107

42 108

43 109

44 110

45 111

46 112

47 113

48 114

49 115

50 116

51 117

52 118

53 119

54 120

55 121

56 122

57 123

58 124

59 125

60 126

61 127

62 128

63 129

64 130

65 131

66 132

LFR benchmark is presented by Lancichinetti et al. The LFR benchmark generates static networks with built-in community structure.Theconfigurationofgeneratednetworksdependsonvar- ioususer-specifiedparameters. Numberofnodesis N,k specifies theaveragedegree ofnodesandkmax istheupperbound onde- greesofnodes.The mixingparametermuisthefractionofedges thata node hastothenodes outsideof itscommunity.There, as we decrease

μ

we obtain a clearset ofcommunities withfewer numberofinter-edges.Later,theauthorsadapted theLFR bench- marktogenerateoverlappingcommunities.ParameterOnspecifies thenumberofoverlappingnodesandOm controlsthenumberof membershipofoverlappingnodes.Inthispaper,weset N=1000, k=15, kmax=50,

μ

=0.3 to generate LFR benchmark and set N=128,k=16,kmax=16,

μ

=0.1 togenerateGirvan–Newman benchmarktopresenttheperformanceofLPA_NI.

4.2. Evaluationcriteria

Themostpopular measureto comparethe similaritybetween thedeliveredcommunitystructureandtheground-truthcommu- nities isNormalized MutualInformation(NMI). Wehaveused an implementationofNMImeasuremadeavailable byMcDaidetal., forsetsofoverlappingcommunities[29].Further,weusemodular- ityvalueofobtainedcommunitystructureswhentheground-truth communitystructureisunknown.

Modularity is currentlywidely used in measuring the perfor- mance of network detection algorithms that proposed by New- manandGirvaninliterature[24].Moreover,whiletheunderlying classlabels oftherealnetwork are unknown,we canonly adopt the modularity asthe evaluationcriteriaon real networks. Fora dataset with no overlapping communities, the modularity is de- finedas:

Q

=

1 2m

i,jV

Ai j

d

(

i

)

d

(

j

)

2m

× δ(

ci

,

cj

),

(7)

where Q representsthe modularity,m representsthenumber of edges inthe network, and A isthe adjacencymatrix ofthe net- work, ifnode i andnode j are directlyconnected, Ai j=1; else- wise, Ai j=0. ci andcj, denotethe label of node i and node j, respectively,ifci=cj,thenδ(ci,cj)=1,elseδ(ci,cj)=0.

4.3. Experimentresult

Inthissection,therealdatasetisusedtotesttheeffectiveness ofLPA_NIcomparingwithtraditionalLPAandNIBLPA,andwetest thestability ofthe algorithms by analyzingthe fluctuation range ofthewholeresults.

Inordertocomparetheeffectivenessofthealgorithms,weuse theaveragemodularity Q tomeasurethequalityofdividedcom- munities, so the greater the Q value is, the more accurate the results ofcommunity detectionare, and thesmaller the fluctua- tionrangeofQ valueis,themorestablethecommunitydetection resultsare.NIBLPAandLPA_NIcontainonlyoneparameter

α

.And theparameter

α

isusedtomeasuretheextentoftheinfluenceof neighboringnodestonodei,sowelet

α

= {0,0.1,0.2· · ·,1}.More specifically,if

α

=0,theimportanceoftheadjacentnodesofthe node i has no influence,so the importance of node i isonly its ownimportance. And if

α

=1,theimportance ofadjacent nodes isaccumulatedinmultiplesof1toobtainthenewimportanceof nodei.TheexperimentalresultsareshowninFig. 5.

As it can be seen in Fig. 5, for the real dataset used in this paper, the variation trend of the Q value of LPA_NI is like bell shapedcurvewiththegradualincreaseofparameters

α

,andwhen

α

=0.4,the maximummodularity Q =0.6995 is achieved.More specifically,when

α

=0.4,theinfluenceofadjacentnodesonthe

Fig. 5.The comparison ofQunder different tunable parameterα.

Fig. 6.50 repeated experiments under parameterα=0.4.

node i isthe mostappropriate,andthe newnodeimportance in improving thecommunitystructuredetectionhasthe besteffect.

The Q valueofNIBLPAchangeslittleunderdifferentparameter

α

, and themaximum modularity Q =0.6197 is obtainedwhen the parameters

α

=0.5 and

α

=0.6. Experimental resultsshow that themodularityofLPA_NIishigherthanNIBLPAwhen

α

=0.4.And withtheimportanceofadjacencynodestotheinfluenceofnodei graduallyincreased, thequalityofcommunitydetectionofLPA_NI firstly increased, then gradually decreased. Therefore, the impact of an appropriate amount of the adjacent node importance will significantly improvethe accuracy ofLPA_NI, andLPA_NI can get anoptimalresultofcommunitydetection.

In order to analyze the stability of LPA_NI, NIBLPA and LPA, this paper analyzes the 11 experimental results of 50 repeated experiments withparameter

α

=0.4,the variation trendof each algorithmmodularityvalue Q isshowninFig. 6.

ItcanbeseenfromFig. 6that Q valuesofLPA_NIandNIBLPA are0.6995and0.6197inthe50repeatedtests.BecauseLPA_NIand NIBLPAimprove therandomness oftheupdate sequence andthe label selection process, they have goodstability. However, LPA is lessstablebecauseoftherandomnessofthealgorithm.Ingeneral, LPA_NI can getstable resultsthan LPA andeffective results than LPAandNIBLPA.

(7)

1 67

2 68

3 69

4 70

5 71

6 72

7 73

8 74

9 75

10 76

11 77

12 78

13 79

14 80

15 81

16 82

17 83

18 84

19 85

20 86

21 87

22 88

23 89

24 90

25 91

26 92

27 93

28 94

29 95

30 96

31 97

32 98

33 99

34 100

35 101

36 102

37 103

38 104

39 105

40 106

41 107

42 108

43 109

44 110

45 111

46 112

47 113

48 114

49 115

50 116

51 117

52 118

53 119

54 120

55 121

56 122

57 123

58 124

59 125

60 126

61 127

62 128

63 129

64 130

65 131

66 132

Fig. 7.The number of iterations under different tunable parameterα.

Fig. 8.The running time under different tunable parameterα.

Inorder to analyze the efficiencyof LPA_NI, NIBLPA andLPA, Figs. 7 and 8show the numberof iterationand runningtime of thealgorithm,respectively.

Itcan be seen from Fig. 7that asthe parameter

α

increases, thenumberofiterations oftheLPA_NIare stablein7times.And NIBLPAhasalargerfluctuationonthenumberofiterations.Inthe earlyperiod,thenumberofiterations isaround8times,thenthe numberwillsurgeto17,21,13times.Experimental resultsshow that the number of iterations of the LPA_NI is significantly less thanNIBLPA,andtherunning timeisobviouslybetterthan other algorithms. Therefore, the node importance calculated by k-shell valuecan not fullydescribethe node influenceinthe large-scale socialnetworksowhen

α

0.8,thenumberofiterationsandrun- ningtimeofthe algorithmwill besignificantly increased. LPA_NI using the priori knowledge to calculate the node importance is more accurate, so the number of iterations and running time of LPA_NIwillbe less, andthe stabilityofLPA_NI isbetter thanthe otheralgorithms.

Then,wecomparetheperformance ofLPA_NI,NIBLPAandLPA onZachary’skarateclub,Footballnetwork,Dolphinsnetwork.The experimentalresultsareshowninTable 1.It’seasy toseeLPA_NI achievehigherQ thanotheralgorithmsondifferentnetworks.

Table 1

QobtainedbyLPA_NI,NIBLPAandLPAondifferentreal-worldnetworks.

Networks QLPA_NI QNIBLPA QLPA

Karate 0.4151 0.3944 0.3590

Football 0.5805 0.5762 0.5899

Dolphins 0.5143 0.5184 0.4797

Table 2

NMIobtainedbyLPA_NI,NIBLPAandLPAondifferentsyntheticnetworks.

Networks NMILPA_NI NMINIBLPA NMILPA

LFR 1 1 1

Girvan–Newman 0.9989 0.9989 0.9989

We alsoshow theperformance ofLPA_NI onLFR andGirvan–

NewmanbenchmarkinTable 2.WecompareourmethodwithLPA andNIBLPA.It’seasytoseeLPA_NIachievehigherNMIondifferent networks.

Insummary, onthe one hand,LPA_NI takesintoaccount that thenodeimportancemayimpactlabelupdatingprocess,soitwill let the more important nodes affect some less important nodes, anditcanreducethe“countercurrent”phenomenoninlabelprop- agationprocess.Ontheotherhand,LPA_NItakesintoaccountthat the effect of label influence on label selection. Therefore, it can improvetheaccuracyinlabelselectionprocess.Asitcanbe seen, LPA_NI cansignificantlyimprovethequalityofcommunitydetec- tion andshorten theiteration period.Also, ithasbetter accuracy andstabilityinthecaseofsimilarcomplexity.

5. Conclusion

Thispaperpresentsalabelpropagationalgorithmforcommu- nitydetectionbasedonnodeimportanceandlabelinfluence.The algorithmfirstly usesanovelmethodfornode importanceevalu- ation,andfixes thenode ordersoflabelupdatinginthedescend- ingorderofnodeimportance.Duringeachlabelupdatingprocess, whenthenumberofmultiplelabelsreachesthemaximumvalue, thispaperintroduces thelabelinfluenceintotheformulaoflabel updatingtoimprovethestability.Thisalgorithmmaintainsthead- vantages ofthe original LPA algorithm,andit can get stableand efficient resultsby avoiding therandomness in labelpropagation process.Byexperimentsonrealdataset,wedemonstratethat the proposedalgorithmhasbetterperformancethansomeofthecur- rentrepresentativealgorithms.

Uncitedreferences [18] [27]

References

[1] YoufangLin,TianyuWang,RuiTang,YuanweiZhou,HoukuanHuang,Anef- fective modelandalgorithmfor communitydetectioninsocialnetworks,J.

Comput.Res.Dev.49 (2)(2012)337–345.

[2] DayouLiu,DiJin,DongxiaoHe,JingHuang,JianningYang,BoYang,Community miningincomplexnetworks,J.Comput.Res.Dev.50 (10)(2013)2140–2154.

[3] M.Shiga,I.Takigawa,H.Mamitsuka,Aspectralapproachtoclusteringnumeri- calvectorsasnodesinanetworks,PatternRecognit.44 (2)(2011)236–251.

[4] LiangHuang,RuixuanLi,HongChen,XiwuGu,KunmeiWen,YuhuaLi,Detect- ingnetworkcommunitiesusingregularizedspectralclusteringalgorithm,Artif.

Intell.Rev.41 (4)(2014)579–594.

[5] R.Guimera,M.Sales-Pardo,L.A.Amaral,Modularityfromfluctuationsinran- domgraphsandcomplexnetworks,Phys.Rev.E70 (2)(2004)188.

[6] R.Guimera, Functional cartographyofcomplex metabolic networks, Nature 433 (7028)(2005)895–900.

[7] J.Duch,A.Arenas,Communitydetectionincomplexnetworksusingextremal optimization,Phys.Rev.E72 (2)(2005)986.

[8] ZhanBu,ChengcuiZhang,ZhengyouXia,JiandongWang,Afastparallelmod- ularity optimization algorithm (FPMQA) for community detection inonline socialnetwork,Knowl.-BasedSyst.50 (3)(2013)246–259.

(8)

1 67

2 68

3 69

4 70

5 71

6 72

7 73

8 74

9 75

10 76

11 77

12 78

13 79

14 80

15 81

16 82

17 83

18 84

19 85

20 86

21 87

22 88

23 89

24 90

25 91

26 92

27 93

28 94

29 95

30 96

31 97

32 98

33 99

34 100

35 101

36 102

37 103

38 104

39 105

40 106

41 107

42 108

43 109

44 110

45 111

46 112

47 113

48 114

49 115

50 116

51 117

52 118

53 119

54 120

55 121

56 122

57 123

58 124

59 125

60 126

61 127

62 128

63 129

64 130

65 131

66 132

[9] S.Zhang,H.Zhao,Normalizedmodularityoptimizationmethodforcommunity identificationwithdegreeadjustment,Phys.Rev.E88 (5–1)(2013)471.

[10] M.Girvan,M.E.J.Newman,Communitystructureinsocialandbiologicalnet- works,Proc.Natl.Acad.Sci.USA99 (12)(2002)7821–7826.

[11] F.Wu,B.A.Huberman,Findingcommunitiesinlineartime:aphysicsapproach, Eur.Phys.J.B-Condens.Matter38 (2)(2004)331–338.

[12] U.N.Raghavan,R.Albert,S.Kumara,Nearlinear-timealgorithmtodetectcom- munitystructuresinlarge-scalenetworks,Phys.Rev.E76 (3)(2007)036106.

[13] G.Steve,Findingoverlappingcommunitiesinnetworksbylabelpropagation, NewJ.Phys.12 (10)(2010)2011.

[14] M.J.Barber,J.W.Clark,Detectingnetworkcommunitiesbypropagatinglabels underconstraints,Phys.Rev.E80 (2)(2009)026129.

[15] I.X.Leung,P.Hui,P.Lio,J.Crowcroft,Towardsrealtimecommunitydetection inlargenetworks,Phys.Rev.E79 (6)(2009)066107.

[16] XianKunZhang,XueTian,YaNanLi,ChenSong,Labelpropagationalgorithm basedonedgeclusteringcoefficientforcommunitydetectionincomplexnet- works,Int.J.Mod.Phys.B28 (30)(2014).

[17] XianKunZhang,SongFei,ChenSong,XueTian,YangYueAo,Labelpropagation algorithmbasedonlocalcyclesforcommunitydetection,Int.J.Mod.Phys.B 29 (5)(2015).

[18] J.Xie,B.K.Szymanski,LabelRank:astabilizedlabelpropagationalgorithmfor communitydetectioninnetworks,Netw.Sci.Workshop(2013)138–143.

[19] Y.Xing,F.Meng,Y.Zhou,M.Zhu,M.Shi,G.Sun,Anodeinfluencebasedla- belpropagationalgorithmforcommunitydetectioninnetworks,Sci.WorldJ.

2014 (5)(2014)627581.

[20] J.Sohn,D.Kang,H.Park,B.G.Joo,I.J.Chung,AnImprovedSocialNetworkAnal- ysisMethodforSocialNetworks,Springer,Netherlands,2014,pp. 115–123.

[21] L.Šubelj, M.Bajec,Groupdetectionincomplexnetworks:analgorithmand comparisonofthestateoftheart,Phys.A,Stat.Mech.Appl.397 (3)(2014) 144–156.

[22] O. Green,D.A.Bader,Faster betweennesscentralitybased ondatastructure experimentation,Proc.

Referensi

Dokumen terkait

Through making an analysis of behaviors of users in sina microblog including forwarding messages, commenting on messages and replying messages, this paper constructs three factors

The focused distortion region (FDR) is a region in the face that is used to measure changes of facial distortion. facial distortions) during yawn, was identiied based on

The studies of node behavior have helped many researchers to understand many research problems such as the design of fault tolerant routing protocol and analysis of

MATLAB Simulink will be used to simulate the shunt active power filter (SAPF) control algorithm based on neural network.. 2 Multi-layer

We presented a novel method for the contour detection around the cervical organ in CT images classified with artificial neural network ( ANN) which is used to

2.1 Floor Detection The objective of this work is to improve the performance of pedestrian detection algorithms by using context information of presence of floor area below each

Genetic Algorithm based Chain Leader Election in Wireless Sensor Network for Precision Farming ABSTRACT Wireless Sensor Network WSN is one of the commonly used technologies in

Venn-diaNet showed that the seeds from the segments of Venn diagram and the results of the network propagation with PPI network are effective enough to prioritize genes without