A Study on Link Prediction Algorithm Based on Users' Privacy Information in the Weighted Social Network


Jian Zhang^*, Changlun Zhang

School of Science, Beijing University of Civil Engineering and Architecture, Beijing 100044, China

*E-mail: zhangjian@bucea.edu.cn

Abstract—With the rapid development of information technology, more and more people are joining social networks and are willing to share their information there to expand their contacts. Social networks reflect real social relations, and many scholars have shown keen interest in this field.

Link prediction is one of the important research directions in this field. The interaction between users and the privacy information of individual users greatly affect the accuracy of link prediction. This paper first weights the network by using interactive behavior characteristics of social network users. Then, by considering users' privacy information and analyzing users' interest preferences, it proposes a link prediction algorithm for weighted directed networks. Finally, simulation experiments show that this algorithm achieves higher prediction accuracy than the compared algorithms.

Keywords: weighted social network, privacy information, link prediction

I. INTRODUCTION

A social network is a relatively stable system of relationships among individual members, formed through their interaction; it focuses on the interaction and contact between people [1]. Social networks make it convenient for users to exchange ideas and share information. Well-known social network sites, such as Facebook and Twitter abroad and Sina Weibo and Tencent Weibo in China, not only provide a platform for leisure and entertainment but also enhance communication and mutual understanding between people.

In a social network, each user node acts as an information channel: users can freely publish personalized information, follow related users according to their own interests, and build their own social circles. However, because social networks often have hundreds of millions of user nodes, a user building his own social relations may face information overload. It is therefore often very difficult to help users find the other users they may be interested in.

Link prediction can effectively solve this problem. Social network link prediction uses the structural characteristics of the social network and the existing information of network users to predict the other users a target user may be interested in, and then recommends the predicted users to the target user. A social network reflects users' social relations well, and the many interactions between users reflect the intensity of their relationships. This paper therefore uses users' interactive behavior to assign weights to the network edges and puts forward a link prediction algorithm based on users' behavior information to improve the accuracy of link prediction.

II. RELEVANT WORK

Link prediction in a network refers to predicting, from known information such as the network structure, the probability that two nodes that are not yet connected will become connected [2]. It covers both the prediction of unknown (missing) links and the prediction of future links. Link prediction originated in complex network research; with the rapid development of the Internet and the emergence of social networks, it has also been used to predict the users a given user may be interested in, so that users can build their own social circles and access more information of interest.

In 2007, Liben-Nowell et al. [3] put forward the first social network link prediction model, which predicts whether two nodes will be linked by calculating the topological similarity of the two nodes in the social network. Subsequently, many scholars studied the problem and different types of link prediction models were formed. These models can be roughly classified into three types: link prediction based on node properties; link prediction based on network topology; and link prediction combining node properties with network topology.

To a certain extent, a social network reflects the real interpersonal relationships of users, and in reality relationships between people may be good or bad. The relationship between two users in a social network may therefore be strong or weak. Relationship strength can be expressed by edge weights, and many scholars have begun to study link prediction methods in weighted social networks. In a weighted social network, an edge between two nodes indicates a relationship between the two users, the edge weight indicates the intensity of that relationship, and this intensity is mostly computed from the relevant information between the users.

Literature [4-7] calculates user relationship intensity by analyzing the relevant information between users, the similarity between users, and network topology analysis techniques. Literature [8] measures user relationship intensity by users' interactivity. Literature [9] calculates users' relationship intensity by combining the similarity between users with their interactivity. Lit-

2019 3rd International Conference on Data Science and Business Analytics (ICDSBA)

sion of the link prediction is improved. In 2014, Yu et al. [12], while studying link prediction in weighted social networks, weighted the edges between users using node degree information, scored the edges between nodes according to the local structural characteristics of the network, and applied the scoring method to a microblog network, achieving better prediction results.

At present, when assigning edge weights for link prediction in weighted social networks, most researchers consider only the node degree information of the network. However, a social network closely reflects real personal social relationships, the edge weights reflect the intensity of the relationship between users, and social networks contain many interactions between users.

When assigning weights, we should therefore use the behavioral relationships between users, because they better reflect how users relate to each other. The resulting relationship intensity may be more accurate, and the performance of a link prediction method built on this weight assignment may be further improved.

III. THE FORMATION OF THE WEIGHTED NETWORK

As a platform for communication among users, a social network updates a large amount of information every day. Interaction happens at every moment, for example status updates posted by users, comments between users, and information relayed by users.

Frequent interaction between users may indicate that they are interested in each other's information or that they have a good relationship. Users' behavioral relationships show how much a user prefers another user's information and how strong their relationship is. Analyzing these behavioral relationships helps reveal users' interest preferences, which in turn helps predict the users a given user may be interested in, so that friends can be recommended.

For example, Fig. 1 shows some behavioral relationship data. The first two columns are user IDs (e.g., 1000016 and 1204954 in the second line), and the next three columns are the users' behavioral relationships (e.g., 1, 5, 0 in the second line). Taking the second line as an example, 1 is the number of times user 1000016 reminded (@-mentioned) user 1204954, 5 is the number of times user 1000016 relayed the information of user 1204954, and 0 is the number of times user 1000016 commented on user 1204954. These counts reflect how much attention a user pays to certain information and the intensity of the relationship between the users.
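The record layout described above can be read with a few lines of Python. This is a minimal sketch that assumes each record is one line of five whitespace-separated integers in the order source user, destination user, mentions, relays, comments; the class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserAction:
    """One behavioral record: how often user `src` acted on user `dst`."""
    src: int
    dst: int
    mentions: int  # times src reminded (@-mentioned) dst
    relays: int    # times src relayed dst's information
    comments: int  # times src commented on dst


def parse_action_line(line: str) -> UserAction:
    """Parse one whitespace-separated record: src dst mentions relays comments."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    src, dst, mentions, relays, comments = (int(f) for f in fields)
    return UserAction(src, dst, mentions, relays, comments)


# The second line of Fig. 1.
record = parse_action_line("1000016\t1204954\t1\t5\t0")
```

Only the relay and comment counts of such a record enter the edge weight of formula (1); the mention count is parsed but not used there.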

Figure 1. The data of the user actions

By analyzing the behavioral relationships between users, such as the data in Fig. 1, we obtain the intensity of the relationships between users and assign weights to the network edges, forming a weighted social network.

We therefore first analyze the network from the following three aspects:

A. Nodal Analysis

For a particular user $u$, we form a basic information tag from the user's required basic information (such as personal information, education background, profession, and personal labels), extracted according to preset information types. For the interaction information between users (such as relayed information, comment information, and other interactive information), we use an information gain feature selection algorithm to extract important word segmentation tags and form a special user information tag.

Combining these two information tags forms the user's personal characteristic information vector. The extraction of users' information tags is shown in Fig. 2.

[Figure 2: the basic information tag is drawn from the user's background information (personal information, education background, occupational information, etc.); the special information tag is drawn from the interaction information between users (released, relayed, and commented information).]

Figure 2. The user information tag extraction
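The information gain feature selection mentioned above is not spelled out in the text. The sketch below uses the standard definition $IG(t) = H(C) - P(t)H(C \mid t) - P(\bar t)H(C \mid \bar t)$ for the binary feature "tag $t$ occurs in the text". The class labels (here hypothetical topic labels attached to each piece of interaction text) are an assumption, since the paper does not say which labels were used; the function names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the document' w.r.t. the labels."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) + \
                  (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


def top_terms(docs, labels, k):
    """Keep the k word-segmentation tags with the highest information gain."""
    vocab = set().union(*docs)
    # Sort by descending gain, breaking ties alphabetically.
    return sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))[:k]


# Hypothetical segmented texts (as sets of words) with hypothetical topic labels.
docs = [{"football", "match"}, {"football", "goal"},
        {"movie", "actor"}, {"movie", "match"}]
labels = ["sport", "sport", "film", "film"]
selected = top_terms(docs, labels, 2)
```

Here "football" and "movie" separate the two labels perfectly and are kept, while "match" appears under both labels and carries no information.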

B. Relationship Analysis

In a social network, interaction between users is embodied in comments, information relays, and similar actions. The more often a user relays another user's information, or the more frequently a user comments on another user, the stronger the relationship between the two users. Therefore, when using relationship intensity as the weight of an edge, we consider only the number of comments and relays between users.

As in Fig. 3, suppose there is an edge between user $u$ and user $i$, user $u$ comments on user $i$ $m_1$ times, and user $u$ relays user $i$'s information $m_2$ times. The weight of the edge between the two users is then

$$w_{ui} = \frac{m_1 + m_2 - MIN}{MAX - MIN}, \quad w_{ui} \in [0, 1] \qquad (1)$$

where $MAX$ and $MIN$ are the maximum and minimum numbers of times node $u$ comments on and relays the information of its surrounding nodes.
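Formula (1) can be sketched as a per-user min-max normalization. We assume that $MAX$ and $MIN$ are taken over the totals $m_1 + m_2$ of node $u$'s outgoing edges, and we handle the degenerate case $MAX = MIN$ (where formula (1) is undefined) by assigning weight 1; both choices are assumptions, and the function name is hypothetical.

```python
def edge_weights(counts):
    """Min-max normalize m1 + m2 per source node, following formula (1).

    counts maps (u, i) -> (m1, m2): comments and relays from u to i.
    Returns a dict (u, i) -> w_ui in [0, 1].
    """
    totals = {edge: m1 + m2 for edge, (m1, m2) in counts.items()}
    # Collect each source node's totals to find its MAX and MIN.
    per_source = {}
    for (u, _), t in totals.items():
        per_source.setdefault(u, []).append(t)
    weights = {}
    for (u, i), t in totals.items():
        lo, hi = min(per_source[u]), max(per_source[u])
        # Assumption: if all of u's totals are equal (0/0 in formula (1)), use 1.0.
        weights[(u, i)] = 1.0 if hi == lo else (t - lo) / (hi - lo)
    return weights


counts = {("u", "a"): (0, 5), ("u", "b"): (2, 1), ("u", "c"): (1, 0), ("v", "a"): (3, 3)}
weights = edge_weights(counts)
# {('u', 'a'): 1.0, ('u', 'b'): 0.5, ('u', 'c'): 0.0, ('v', 'a'): 1.0}
```

Normalizing per source node keeps the weights comparable across users with very different activity levels, which is one plausible reading of why formula (1) uses node $u$'s own maximum and minimum.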

C. Network Analysis

A network contains a very large number of nodes, so analyzing it by traversing all of them is too complicated. In this paper, nodes are chosen at random to form a local network structure, and the local structural characteristics of the network nodes are taken into consideration.
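The random choice of nodes described above can be sketched as sampling a fixed number of nodes and keeping the edges among them. Treating the "local structure" as the induced subgraph of the sampled nodes is our assumption, since the text does not specify it; the function name and the fixed seed are illustrative.

```python
import random


def sample_local_structure(edges, n_nodes, seed=None):
    """Choose n_nodes nodes at random and keep the edges among them.

    edges: list of (u, v, w) tuples of a weighted directed graph.
    Returns (set of chosen nodes, induced edge list).
    """
    nodes = sorted({x for u, v, _ in edges for x in (u, v)})
    rng = random.Random(seed)
    chosen = set(rng.sample(nodes, n_nodes))
    induced = [(u, v, w) for u, v, w in edges if u in chosen and v in chosen]
    return chosen, induced


edges = [(1, 2, 0.5), (2, 3, 1.0), (3, 1, 0.2), (3, 4, 0.7)]
chosen, local_edges = sample_local_structure(edges, 2, seed=1)
```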

IV. THE WEIGHTED NETWORK LINK PREDICTION ALGORITHM (WLPA)

We represent the social network as a weighted directed graph $G = (V, E, W)$. A vertex in $V$ represents a user in the social network, an edge in $E$ between two vertices represents the relationship between the two users, and the edge weight represents the intensity of that relationship.

Rules: the directed edge $(u, v) \in E$ indicates that node $u$ is connected with node $v$ and points to node $v$; the weight $w(u, v) \in W$ is the weight of the edge between node $u$ and node $v$. The out-neighbor set of node $u$ is $\Gamma_{out}(u) = \{v \in V \mid (u, v) \in E\}$ and its out-degree is $|\Gamma_{out}(u)|$; the in-neighbor set of node $u$ is $\Gamma_{in}(u) = \{v \in V \mid (v, u) \in E\}$ and its in-degree is $|\Gamma_{in}(u)|$.
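The neighbor sets defined above can be built in one pass over the edge list. This is a minimal sketch with hypothetical names; edges are given as pairs $(u, v)$ meaning $u$ points to $v$.

```python
from collections import defaultdict


def neighbor_sets(edges):
    """Build Γ_out and Γ_in from directed edges (u, v), where u points to v."""
    out_nb, in_nb = defaultdict(set), defaultdict(set)
    for u, v in edges:
        out_nb[u].add(v)
        in_nb[v].add(u)
    return out_nb, in_nb


edges = [("u1", "u2"), ("u2", "u3"), ("u4", "u2")]
out_nb, in_nb = neighbor_sets(edges)
out_degree = {u: len(vs) for u, vs in out_nb.items()}  # |Γ_out(u)|
in_degree = {u: len(vs) for u, vs in in_nb.items()}    # |Γ_in(u)|
```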

Suppose node $u$ is the target node. To estimate the possibility that nodes not yet connected to the target node in the weighted directed network will connect to it, the method is divided into the following four steps:

1) Analyzing the relations between user nodes and finding the candidate nodes;
2) Analyzing users' privacy information and calculating the similarity of users' information;
3) Comprehensively considering users' behavioral relationships and privacy information to build the weighted social network link prediction model;
4) Ranking the candidate nodes by score and predicting the nodes most likely to be linked to the target node.

A. Establishing Candidate Node Set

Following reference [13], we first find all the similar nodes of the target node and then, through these similar nodes, find all the nodes that may connect to the target node.

Similar node set

$$S(u) = S_1(u) \cup S_2(u) \cup S_3(u)$$

where

$$S_1(u) = \Gamma_{out}(u), \qquad S_2(u) = \bigcup_{v \in \Gamma_{out}(u)} \Gamma_{out}(v), \qquad S_3(u) = \Big(\bigcup_{v \in \Gamma_{out}(u)} \Gamma_{in}(v)\Big) \setminus \{u\}$$

For the directed network in Fig. 3, taking $u_1$ as the target node gives

$$S_1(u_1) = \Gamma_{out}(u_1) = \{u_2\}, \quad S_2(u_1) = \{u_3\}, \quad S_3(u_1) = \{u_4\}$$

$$S(u_1) = S_1(u_1) \cup S_2(u_1) \cup S_3(u_1) = \{u_2, u_3, u_4\}$$

Figure 3. Directed and weighted network

Figure 4. Similar node set and candidate node set
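The three layers of the similar node set can be computed directly from the edge list. This sketch follows the definitions of $S_1$, $S_2$, and $S_3$ given above. Because Fig. 3 is not reproduced here, the edge list is a hypothetical toy graph chosen only so that the results match the sets stated for $u_1$; it is not claimed to be the actual figure.

```python
from collections import defaultdict


def similar_node_sets(edges, u):
    """Return S1(u), S2(u), S3(u) and S(u) for a directed edge list.

    S1: nodes u points to; S2: nodes pointed to by S1 nodes;
    S3: other nodes that also point to some S1 node (u itself excluded).
    """
    out_nb, in_nb = defaultdict(set), defaultdict(set)
    for a, b in edges:
        out_nb[a].add(b)
        in_nb[b].add(a)
    s1 = set(out_nb[u])
    s2 = set().union(*(out_nb[v] for v in s1))
    s3 = set().union(*(in_nb[v] for v in s1)) - {u}
    return s1, s2, s3, s1 | s2 | s3


# Hypothetical toy graph, not the actual Fig. 3.
toy_edges = [("u1", "u2"), ("u2", "u3"), ("u3", "u5"),
             ("u4", "u2"), ("u4", "u6"), ("u4", "u7")]
s1, s2, s3, s = similar_node_sets(toy_edges, "u1")
```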

B. Candidate Node Set

The candidate nodes of the target node are the nodes pointed to by its similar nodes, excluding the nodes the target node already points to. The candidate node set is defined as follows:

$$C(u) = \Big(\bigcup_{v \in S(u)} \Gamma_{out}(v)\Big) \setminus \Gamma_{out}(u) \qquad (3)$$

For the target node $u_1$, the candidate node set is therefore

$$C(u_1) = \{u_3, u_5, u_6, u_7\}$$

where $u_2$, which $u_1$ already points to, has been removed.
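Formula (3) can be sketched on top of the similar node set. The function recomputes $S(u)$ so that it stands alone, and it reuses the same hypothetical toy graph (not the actual Fig. 3), which reproduces $C(u_1) = \{u_3, u_5, u_6, u_7\}$.

```python
from collections import defaultdict


def candidate_nodes(edges, u):
    """C(u) per formula (3): nodes u's similar nodes point to, minus Γ_out(u)."""
    out_nb, in_nb = defaultdict(set), defaultdict(set)
    for a, b in edges:
        out_nb[a].add(b)
        in_nb[b].add(a)
    s1 = set(out_nb[u])                                 # S1(u)
    s2 = set().union(*(out_nb[v] for v in s1))          # S2(u)
    s3 = set().union(*(in_nb[v] for v in s1)) - {u}     # S3(u)
    similar = s1 | s2 | s3                              # S(u)
    reached = set().union(*(out_nb[v] for v in similar))
    return reached - out_nb[u]                          # drop nodes u already follows


# Hypothetical toy graph, not the actual Fig. 3.
toy_edges = [("u1", "u2"), ("u2", "u3"), ("u3", "u5"),
             ("u4", "u2"), ("u4", "u6"), ("u4", "u7")]
candidates = candidate_nodes(toy_edges, "u1")
```

Restricting candidates to nodes reachable through similar nodes keeps the scoring step local, instead of scoring every node in the network against the target.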

C. Calculating Scores of Nodes

In social networks, relationships between users are mostly one-way. If user $u$ follows another user $i$ (i.e., node $u$ points to node $i$), user $u$ wants to learn about user $i$'s activities, that is, user $u$ obtains information he is interested in. Therefore, when studying link prediction in social networks, we can predict the potential users a user is likely to connect with by analyzing his interest preferences.

Calculation of information similarity

Suppose a node $u$ in the social network has characteristic information tags $i_1, i_2, \ldots, i_k$. To protect the security of the user's privacy information, we randomly map all of the user's information tags to a series of numbers $n_1, n_2, \ldots, n_k$ and form the user's characteristic information vector $profile(u) = (n_1, n_2, \ldots, n_k)$.
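The random mapping of tags to numbers can be sketched as a shared codebook that assigns each distinct tag a random number the first time it is seen, so that the same tag maps to the same number for every user and similarity can still be computed on the numbers alone. The class name and the shared-codebook design are our assumptions; the text only says tags are mapped to numbers at random. Python's `random` module is not cryptographically secure, so a real deployment would more plausibly draw codes from the `secrets` module.

```python
import random


class TagCodebook:
    """Assigns each distinct tag a unique random number, shared across all users."""

    def __init__(self, seed=None, id_space=2**32):
        self._rng = random.Random(seed)
        self._id_space = id_space
        self._codes = {}    # tag -> number (kept secret)
        self._used = set()  # numbers already handed out

    def encode(self, tag):
        if tag not in self._codes:
            code = self._rng.randrange(self._id_space)
            while code in self._used:  # keep codes unique
                code = self._rng.randrange(self._id_space)
            self._codes[tag] = code
            self._used.add(code)
        return self._codes[tag]

    def profile(self, tags):
        """profile(u) = (n_1, ..., n_k) for the user's tags i_1, ..., i_k."""
        return tuple(self.encode(t) for t in tags)


book = TagCodebook(seed=7)
profile_a = book.profile(["sport", "film"])
profile_b = book.profile(["film", "music"])
```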

Because users have different interest preferences, their degree of interest in the same information differs, so each information tag carries a different weight for a given user. Letting $p_i$ denote the weight of information tag $i$, we define the user's interest vector as $interest(u) = (p_1, p_2, \ldots, p_k)$.

Considering the influence of the interaction between a user and the surrounding nodes on the user's interest preference, we define user $u$'s interest preference vector as $Z_u = (Z_{u1}, Z_{u2}, \ldots, Z_{uk})$, where $Z_{ui}$ indicates the proportion of preference user $u$ shows for information $i$:

$$Z_{ui} = \frac{W_1 + W_2 + W_3}{M_u} \qquad (4)$$

where

$$W_1 = \sum_{k=1}^{n_1} p_{S_{1k}}\, w(u, S_{1k})$$

$$W_2 = \sum_{k=1}^{n_2} p_{S_{2k}}\, w(u, S_1)\, w(S_1, S_{2k})$$

$$W_3 = \sum_{k=1}^{n_3} p_{S_{3k}}\, w(u, S_1)\, w(S_{3k}, S_1)$$

Here $n_l$ is the number of nodes in the $l$-th layer similar node set $S_l$ of user $u$ that contain information $i$, $p_{S_{lk}}$ is the proportion of information $i$ in node $S_{lk}$ ($l = 1, 2, 3$), and $w(\cdot, \cdot)$ is the weight of the edge between the related nodes.
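Formula (4) can be sketched as follows. The text leaves $M_u$ undefined and does not say how a node reached through several first-layer nodes is counted, so this sketch makes two labeled assumptions: the second- and third-layer terms are summed over every path through a first-layer node $v$, and $M_u$ is the sum of $W_1 + W_2 + W_3$ over all tags, so that the $Z_{ui}$ add up to 1. The function and argument names are hypothetical.

```python
def interest_preference(u, out_nb, in_nb, w, p):
    """Sketch of formula (4): returns Z_u as a dict tag -> Z_ui.

    out_nb / in_nb: node -> set of out- / in-neighbors; w: (a, b) -> edge weight;
    p: node -> {tag: proportion of that tag in the node's information}.
    Assumptions: layer-2/3 terms are summed per path through a first-layer
    node v, and M_u is the total over all tags (so the Z_ui sum to 1).
    """
    raw = {}

    def add(node, factor):
        for tag, share in p.get(node, {}).items():
            raw[tag] = raw.get(tag, 0.0) + share * factor

    for v in out_nb.get(u, set()):              # W1: u -> v
        add(v, w[(u, v)])
        for x in out_nb.get(v, set()):          # W2: u -> v -> x
            add(x, w[(u, v)] * w[(v, x)])
        for x in in_nb.get(v, set()) - {u}:     # W3: u -> v <- x
            add(x, w[(u, v)] * w[(x, v)])
    m_u = sum(raw.values())
    return {tag: val / m_u for tag, val in raw.items()} if m_u else {}


out_nb = {"u": {"a"}, "a": {"b"}}
in_nb = {"a": {"u", "c"}, "b": {"a"}}
w = {("u", "a"): 1.0, ("a", "b"): 0.5, ("c", "a"): 0.5}
p = {"a": {"sport": 1.0}, "b": {"film": 1.0}, "c": {"sport": 0.5, "film": 0.5}}
z_u = interest_preference("u", out_nb, in_nb, w, p)
```

In this toy example, first-layer node a contributes 1.0 to "sport", second-layer node b contributes 0.5 to "film", and node c (which also points to a) contributes 0.25 to each tag, so $Z_u$ is 0.625 for "sport" and 0.375 for "film".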

Now suppose user $u$ and user $i$ share $n$ similar information tags and their interest preference proportions are $Z_u$ and $Z_i$. The information similarity between user $u$ and user $i$ is defined as

$$Sim(u, i) = \cos(Z_u, Z_i) = \frac{\sum_{k=1}^{n} Z_{uk} Z_{ik}}{\sqrt{\sum_{k=1}^{n} Z_{uk}^{2}}\,\sqrt{\sum_{k=1}^{n} Z_{ik}^{2}}}$$
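The cosine similarity above can be sketched over the tags the two users share, matching the sum over the $n$ common tags. Interest preference vectors are represented as dictionaries from tag to $Z$ value; the names are hypothetical, and returning 0 when either norm vanishes is our convention.

```python
import math


def information_similarity(z_u, z_i):
    """Cosine similarity of two interest-preference vectors over shared tags."""
    shared = sorted(set(z_u) & set(z_i))
    dot = sum(z_u[t] * z_i[t] for t in shared)
    norm_u = math.sqrt(sum(z_u[t] ** 2 for t in shared))
    norm_i = math.sqrt(sum(z_i[t] ** 2 for t in shared))
    return dot / (norm_u * norm_i) if norm_u and norm_i else 0.0


z_u = {"sport": 0.6, "film": 0.4, "music": 0.2}
z_i = {"sport": 0.3, "film": 0.2, "travel": 0.5}
sim = information_similarity(z_u, z_i)
```

On the shared tags the two vectors are proportional, (0.6, 0.4) and (0.3, 0.2), so the similarity is 1 up to floating-point rounding.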

Calculation of the node degree proportion

In a social network, node degree indicates how frequently a user interacts with other users. In this paper, the degree of a node is defined as the total number of its similar nodes. When calculating the node degree proportion, we take into account all candidate nodes of the target node and all similar nodes of those candidate nodes. The node degree proportion is computed as

$$Deg(u, i) = \frac{k_i}{\sum_{j \in C(u)} k_j}$$

where $u$ is the target node, $i$ is any candidate node, $k_i$ is the number of all similar nodes of node $i$, and $C(u)$ is the candidate node set of node $u$.
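The node degree proportion can be sketched as a simple normalization over the candidate set. Here `similar_counts` is a hypothetical input mapping each node to $k$, the number of its similar nodes, for example as returned by a routine like the similar-node computation of Section IV-A.

```python
def degree_proportion(similar_counts, candidates):
    """Deg(u, i) = k_i / sum of k_j over the candidates j of the target u.

    similar_counts: node -> k, the number of that node's similar nodes.
    """
    total = sum(similar_counts[j] for j in candidates)
    return {i: similar_counts[i] / total for i in candidates} if total else {}


deg = degree_proportion({"a": 2, "b": 3, "c": 5}, {"a", "b", "c"})
```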

Calculation of node scores

Considering the influence of both users' privacy information and users' node degree information on the link prediction results, the final weighted score is

$$Score(u, i) = p \cdot Sim(u, i) + (1 - p) \cdot Deg(u, i), \quad 0 \le p \le 1 \qquad (6)$$

D. Predict Nodes Most Likely to be Linked

For target node $u$, calculate the scores of all candidate nodes using formula (6), sort them in descending order, and finally predict the nodes most likely to be linked to the target node using the Top-K method.
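Formula (6) and the Top-K step can be sketched together. The inputs are dictionaries of $Sim(u, i)$ and $Deg(u, i)$ for the candidates of one target node; the default $p = 0.7$ is the best value reported in Table II, and the alphabetical tie-break is our addition for deterministic output. The function name is hypothetical.

```python
def top_k_links(sim, deg, k, p=0.7):
    """Score candidates with formula (6) and return the k highest-scoring ones.

    sim, deg: candidate -> Sim(u, i) and Deg(u, i); missing entries count as 0.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    scores = {i: p * sim.get(i, 0.0) + (1 - p) * deg.get(i, 0.0)
              for i in set(sim) | set(deg)}
    # Descending score; ties broken by candidate id so the output is deterministic.
    ranked = sorted(scores, key=lambda i: (-scores[i], i))
    return ranked[:k]


sim = {"a": 0.9, "b": 0.2, "c": 0.5}
deg = {"a": 0.1, "b": 0.6, "c": 0.1}
recommended = top_k_links(sim, deg, 2)
```

With $p = 0.7$ the scores are about 0.66, 0.32, and 0.38 for a, b, and c, so a and c are recommended; with $p = 0$ the ranking depends on node degree alone and b comes first.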

V. NUMERICAL SIMULATION EXPERIMENT

Using Tencent Weibo data (KDD Cup 2012 competition data), we simulated the proposed link prediction algorithm (WLPA). For simplicity, we preprocessed the original data and randomly selected 13,258 nodes. The average degree of the user nodes is 25, and the user node degree distribution is shown in Fig. 5.

Figure 5. The distribution of user node degree

A. Evaluation Index

The merits of the model are measured by three indexes: precision, recall, and the F1 measure, described as follows:

• Precision

Precision is the ratio of A to B, where A is the number of predicted users who actually became friends of the target user, and B is the number of predicted friend users.

• Recall Rate

Recall rate is the ratio of A to B, where A is the number of predicted users who actually became friends of the target user, and B is the number of friend users in the test data.

• F1 Measurement

$$F_1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$$
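The three indexes can be computed from a predicted friend set and the friend set in the test data. This is a minimal sketch with hypothetical names; returning 0 for empty sets is our convention.

```python
def evaluate(predicted, actual):
    """Precision, recall and F1 of predicted friends against test-set friends."""
    predicted, actual = set(predicted), set(actual)
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall, f1_score(precision, recall)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


precision, recall, f1 = evaluate({"a", "b", "c", "d"}, {"b", "d", "e"})
```

In this example 2 of the 4 predicted users are true friends and 2 of the 3 true friends are found, giving precision 0.5, recall 2/3, and F1 4/7. Applying `f1_score` to the WLPA row of Table I (precision 0.0058, recall 0.0096) gives about 0.0072, consistent with the reported F1.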

B. The Results of Simulation and the Analysis

Discussion of the influence of parameter p on the prediction results

The weighted network link prediction algorithm (WLPA) considers two aspects: users' privacy information and user node degree. We simulate the prediction results obtained from user node degree information alone, from user node privacy information alone, and from the two kinds of information combined (WLPA). The results are shown in Table I.

TABLE I. THE NODE LINK PREDICTION RESULTS

Index | Based on user node privacy information (p = 1) | Based on node degree information (p = 0) | WLPA (p = 0.7)
Precision | 0.0046 | 0.0027 | 0.0058
Recall | 0.0087 | 0.0037 | 0.0096
F1 | 0.0060 | 0.0031 | 0.0072

As Table I shows, WLPA improves on both the model that considers only node degree information and the model that considers only user nodes' privacy information. Its precision and F1 are approximately double those of the node-degree-based model, and its recall is about 2.6 times that of the node-degree-based model. The privacy-information-based model comes much closer to WLPA than the node-degree-based model does, which shows that users' privacy information plays an important role in predicting whether two users will connect. From the perspective of users' privacy information, the higher the similarity of two users' information, the closer their interest preferences, and the more likely they are to connect.

On the other hand, although the privacy-information-based model comes close to WLPA, WLPA still improves on it by about 26%, 10%, and 20% in precision, recall, and F1, respectively.
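The ratios and relative gains quoted above follow directly from Table I; the short calculation below reproduces them from the table values (the dictionary layout and function name are ours).

```python
# Values transcribed from Table I.
table1 = {
    "precision": {"privacy": 0.0046, "degree": 0.0027, "wlpa": 0.0058},
    "recall": {"privacy": 0.0087, "degree": 0.0037, "wlpa": 0.0096},
    "f1": {"privacy": 0.0060, "degree": 0.0031, "wlpa": 0.0072},
}


def compare(row):
    """Return (WLPA / node-degree model, relative gain of WLPA over privacy model)."""
    return row["wlpa"] / row["degree"], (row["wlpa"] - row["privacy"]) / row["privacy"]


for metric, row in table1.items():
    ratio, gain = compare(row)
    print(f"{metric}: {ratio:.2f}x the node-degree model, {gain:+.0%} vs. privacy model")
```

The loop prints ratios of 2.15, 2.59, and 2.32 and gains of +26%, +10%, and +20% for precision, recall, and F1.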

These results also show that the user's node degree has a considerable influence on the prediction. The greater a user's node degree, the more users he follows and the more information he produces, which may broaden his interest preferences; other users are then more likely to share some of his interest preferences and to connect with him, which affects the prediction results.

TABLE II. THE EFFECT OF P

p | Precision | Recall | F1
0 | 0.0027 | 0.0037 | 0.0031
0.1 | 0.0033 | 0.0049 | 0.0039
0.2 | 0.0037 | 0.0056 | 0.0045
0.3 | 0.0041 | 0.0063 | 0.0050
0.4 | 0.0047 | 0.0075 | 0.0058
0.5 | 0.0049 | 0.0081 | 0.0061
0.6 | 0.0054 | 0.0089 | 0.0067
0.7 | 0.0058 | 0.0096 | 0.0072
0.8 | 0.0053 | 0.0090 | 0.0067
0.9 | 0.0050 | 0.0088 | 0.0064
1 | 0.0046 | 0.0087 | 0.0060

F1 are also increasing; all three indexes peak at p = 0.7, where the model achieves the best prediction results, and then decline as p approaches 1.
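Choosing p in this way amounts to a grid search over p in steps of 0.1, keeping the value with the best result. The sketch applies this to the values of Table II; using F1 as the selection criterion is our assumption, since the text only states that p = 0.7 gives the best results.

```python
# (p, precision, recall, F1) rows transcribed from Table II.
table2 = [
    (0.0, 0.0027, 0.0037, 0.0031), (0.1, 0.0033, 0.0049, 0.0039),
    (0.2, 0.0037, 0.0056, 0.0045), (0.3, 0.0041, 0.0063, 0.0050),
    (0.4, 0.0047, 0.0075, 0.0058), (0.5, 0.0049, 0.0081, 0.0061),
    (0.6, 0.0054, 0.0089, 0.0067), (0.7, 0.0058, 0.0096, 0.0072),
    (0.8, 0.0053, 0.0090, 0.0067), (0.9, 0.0050, 0.0088, 0.0064),
    (1.0, 0.0046, 0.0087, 0.0060),
]

# Keep the row with the highest F1.
best_p, best_precision, best_recall, best_f1 = max(table2, key=lambda row: row[3])
```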

In general, the higher a user's node degree, the more users he follows, the more frequently he interacts with other users, and the more user information he produces. This may broaden his interest preferences, so more users will share some of his interests and the chance that other users connect with him increases. He also produces more private information, so the similarity between his private information and that of other users increases, making it more likely that two users connect. Therefore, WLPA, which comprehensively considers node degree information and user node privacy information, achieves higher forecasting performance for friend prediction and can better help users find potential friends.

Analysis of link prediction algorithm results

We compare the proposed link prediction algorithm with existing classic link prediction algorithms; the results are shown in Table III.

TABLE III. THE COMPARISON OF DIFFERENT PREDICTION ALGORITHMS

Link Prediction Algorithm | Precision | Recall | F1
WCN | 0.0047 | 0.0086 | 0.0061
WAA | 0.0043 | 0.0083 | 0.0057
WRA | 0.0048 | 0.0089 | 0.0062
BrCN | 0.005 | 0.0088 | 0.0064
BrAA | 0.0046 | 0.0087 | 0.0061
BrRA | 0.0051 | 0.0092 | 0.0067
WLPA | 0.0058 | 0.0096 | 0.0072

As Table III shows, the weighted network link prediction algorithm (WLPA) proposed in this paper achieves higher precision, recall, and F1 than all the other algorithms compared.

These results suggest that WLPA, which comprehensively considers user privacy information and user node degree information, is more consistent with the real situation of the social network, and its prediction results better meet users' requirements.

VI. CONCLUSION

Considering the behavior information of social network users, this paper proposes a link prediction algorithm based on privacy protection in weighted social networks (WLPA) and applies it to friend prediction on Weibo, where it achieves better prediction results than the compared algorithms.

ACKNOWLEDGEMENTS

This work was supported by the Scientific Research Fund Project of Beijing University of Civil Engineering and Architecture (KYJJ2017035).

REFERENCES

[1] Manisha P, Rushed K. Link prediction in complex networks. Advanced Methods for Complex Network Analysis [J]. 2016: 58-97.

[2] Linyuan Lv. Link prediction on complex network. Journal of University of Electronic Science and Technology of China, Vol. 35, No. 5, 651-661, 2010.

[3] Liben-Nowell D, Kleinberg J. The link-prediction problem for social networks [J]. Journal of the American Society for Information Science and Technology, 2007, 58(7): 1019-1031.

[4] Shao C, Duan Y. Attractive density: A new node similarity index of link prediction in complex networks [C]// 2015 5th International Conference on Information Science and Technology, ICIST 2015. IEEE, 2015.

[5] Leskovec J, Huttenlocher D, Kleinberg J. Predicting positive and negative links in online social networks. Proceedings of the 19th International Conference on World Wide Web (WWW'10). Raleigh, 2010: 641-650.

[6] Pei P, Liu B, Jiao L. Link prediction in complex networks based on an information allocation index [J]. Physica A: Statistical Mechanics and its Applications, 2017, 470: 1-11.

[7] Valverde-Rebaza J C, Lopes A D A. Link prediction in complex networks based on cluster information [J]. Advances in Artificial Intelligence - SBIA 2012: 92-101.

[8] Kahanda I, Neville J. Using transactional information to predict link strength in online social networks. Proceedings of ICWSM'09. San Jose, USA, 2009.

[9] Xiang Rongjing, Neville J, Rogati M. Modeling relationship strength in online social networks. Proceedings of WWW 2010. Raleigh, North Carolina, USA, 2010: 981-990.

[10] Lü L, Zhou T. Link prediction in weighted networks: The role of weak ties [J]. Europhys Lett, 2010, 89(1): 18001.

[11] Zhijie Lin, Yun Xiong, Yangyong Zhu. Link prediction using BenefitRanks in weighted networks. International Conferences on Web Intelligence and Intelligent Agent Technology, 2012: 423-430.

[12] Yan Yu, Xinxin Wang. Link prediction in directed network and its application in microblog. Hindawi Publishing Corporation, Mathematical Problems in Engineering, Volume 2014.

[13] Zheng D F, Trimper S, Zheng B, et al. Weighted scale-free networks with stochastic weight assignments. Phys. Rev. E, 2003, 67(4): 040102(1-4).

[14] Tan F, Xia Y, Zhu B. Link prediction in complex networks: A mutual information perspective [J]. PLOS ONE, 2014, 9(9): e107056.
