Enhancing Extraction Method for Aggregating Strength Relation Between Social Actors

All sources 33 · Internet sources 21 · Own documents 10 · Organization archive 2

[0]  https://link.springer.com/chapter/10.1007/978-3-319-57261-1_31  13.1%  39 matches
[1]  https://www.springerprofessional.de/en/e...relation-be/12204946  11.7%  37 matches
[2]  "3028-3639-1-RV.pdf" dated 2017-10-30  5.1%  18 matches
[3]  "CR-INT110-Semantic interpretation ..." dated 2017-10-09  3.3%  12 matches
[4]  "CR-INT136-Social network extractio..." dated 2017-10-09  3.8%  15 matches
[5]  "CR-INT137-Enhancing to method for ..." dated 2017-10-09  3.0%  12 matches
[6]  https://www.researchgate.net/publication...etween_Social_Actors  3.6%  8 matches
[7]  "CR-INT135-Information Retrieval on..." dated 2017-10-09  2.9%  13 matches
[8]  https://link.springer.com/chapter/10.1007/978-3-319-67621-0_20  3.4%  13 matches
[9]  https://archive.org/stream/arxiv-1212.4702/1212.4702_djvu.txt  2.7%  9 matches
[10]  https://arxiv.org/pdf/1303.3964.pdf  2.1%  7 matches
[11]  https://rd.springer.com/content/pdf/10.1007/978-3-319-05476-6_9.pdf  1.9%  6 matches
[12]  www.emeraldinsight.com/doi/citedby/10.1108/09696470510611384  1.7%  5 matches
[13]  www.springer.com/us/book/9783319572604  1.7%  5 matches (1 documents with identical matches)
[15]  "3472-4529-1-SM.pdf" dated 2017-10-30  1.4%  5 matches
[16]  dblp.uni-trier.de/db/conf/csoc/csoc2017-1  1.6%  4 matches
[17]  www.academia.edu/3144197/Simple_Search_Engine_Model_Adaptive_Properties_for_Doubleton  0.8%  3 matches

18.5%

Results of plagiarism analysis from 2017-12-28 15:58 UTC

31 - Enhancing Extraction Method for Aggregating Strength Relation Between Social Actors.pdf

[18]  https://www.researchgate.net/profile/Mahyuddin_Nasution  1.1%  3 matches
[19]  dblp.uni-trier.de/pers/n/Nasution:Mahyuddin_K=_M=  1.3%  4 matches
[20]  www.yasni.de/mariati mohd zin/person information  1.0%  4 matches
[21]  it.usu.ac.id/index.php/penelitian-pengabdian/publikasi/49-daftar-publikasi-tahun-2017  1.1%  3 matches
[22]  https://www.researchgate.net/profile/Mah...citationCount&page=1  1.1%  3 matches
[24]  "ICIC_2017_paper_144.pdf" dated 2017-09-11  0.4%  1 matches
[27]  https://patents.google.com/patent/US20090287685A1/en  0.3%  1 matches
[28]  https://www.researchgate.net/publication..._Kesehatan_Indonesia  0.4%  1 matches (1 documents with identical matches)
[30]  "PTUPT_Sistem_Peringatan_Dini_Kebakar.pdf.pdf" dated 2017-09-09  0.3%  1 matches
[31]  https://www.researchgate.net/publication...ed_on_Indonesian_NLP  0.3%  1 matches
[32]  aasec.conference.upi.edu/2017/  0.3%  1 matches
[33]  "16. IOP.pdf" dated 2017-12-07  0.3%  1 matches (5 documents with identical matches)
[39]  https://books.google.co.uk/patents/US20150186789  0.2%  1 matches

10 pages, 3520 words. PlagLevel: selected / overall

Data policy: Compare with web sources, Check against my documents, Check against my documents in the organization repository, Check against organization repository, Check against the Plagiarism Prevention Pool

Sensitivity: Medium. Bibliography: Consider text

Enhancing Extraction Method for Aggregating Strength Relation Between Social Actors

Mahyuddin K.M. Nasution and Opim Salim Sitompul

Information Technology Department, Fakultas Ilmu Komputer Dan Teknologi Informasi (Fasilkom-TI), and Information System Centre, Universitas Sumatera Utara, 1500 USU, Medan, Sumatera Utara, Indonesia

[email protected]

Abstract. In principle, the two streams of approaches to extracting the relations between social actors yield different results. However, a method from one stream, such as the superficial method, can be upgraded to extract information by using the principles of the other methods, and this needs systematic proof. This paper presents some formulations that serve to resolve this issue. Based on the results of the experiments conducted, the expanded method is adequate.

Keywords: Search engine · Search term · Query · Social actor · Singleton · Doubleton

1 Introduction

Extracting social networks from the Web has been carried out with a variety of approaches ranging from simple to complex [1]. The unsupervised or superficial method is generally more concise and low cost, but it only generates the strength relations between social actors from heterogeneous and unstructured sources such as the Web [2]. In contrast, supervised methods are generally more complicated and costly; they produce labels for the relationships between social actors, but they draw on homogeneous and semi-structured sources such as corpora [3, 4]. However, generating social networks that can express semantic meaning is not easy [5]. This requires a method that represents the advantages of both streams: an approach that not only produces a relationship but also re-interprets the relationship based on the aggregation principle. This paper aims to enhance the superficial method for extracting social networks from the Web.

2 Problem Definition

Semantically, the initial concept of extracting a social network from the Web is to explore a series of names through co-occurrence using a search engine [6, 7]. The extraction of the social network is then made possible by also involving the occurrence. Formally, we state the extraction of social networks as follows [8, 9].

© Springer International Publishing AG 2017
R. Silhavy et al. (eds.), Artificial Intelligence Trends in Intelligent Systems,

Occurrence and co-occurrence are, respectively, a query ($q$) representing a social actor and a query representing a pair of social actors. For the occurrence, $q$ contains the name of one social actor, for example $q$ = "Mahyuddin K. M. Nasution". For the co-occurrence, $q$ contains the names of two social actors, for example $q$ = "Mahyuddin K. M. Nasution", "Shahrul Azman Noah" [2]. Therefore, the names of social actors are the search terms, which we define formally as follows.

Definition 2. A search term $t_k$ consists of words or a phrase, i.e. $t_k = \{w_k \mid k = 1, \ldots, o\}$.

We use a well-defined query to pry information from the Web by submitting it to a search engine. A search engine works on a collection of documents or web pages, or more precisely as follows [10].

Definition 3. $\Omega$ is a set of web pages indexed by a search engine if there is a table relation of $(t_i, \omega_j)$ such that $\Omega = \{(t, \omega)_{ij}\}$, where $t_i$ is a search term and $\omega_j$ is a page indexed by the search engine that contains at least one occurrence of $t_x$,


As information about any social actor, the singleton is the basic search engine property that is statistically related to the social actor. In this case, the singleton is the necessary condition for gaining information about a social actor from the Web, although it carries connatural traits (bias and ambiguity) that naturally arise from the social dynamics of human beings [2]. The hit count is the main Web-based information for a social actor, and this information can be validated by crawling, one by one, the list of snippets returned by the search engine [11].

[...] proved that $w$ has the character, i.e. the relative probability of $w$, $p(w) = |w| / [\ldots]$

[...] hit count of doubleton as follows

$$|\Omega_x \cap \Omega_y| = \sum_{\Omega} \big( (\Omega_x(t_x \wedge t_y) \cap \Omega_y(t_x \wedge t_y)) = 1 \big). \quad (3)$$

Lemma 2. If $w$ is a token in $L_D$, the list of snippets based on doubleton, then $w$ statistically has the character in the doubleton.

Proof. Similar to Lemma 1, and based on Definitions 5 and 6, we have the character of $w$ in the doubleton as follows

$$p_D(w) = \frac{|w|}{|\Omega_x \cap \Omega_y|} \in [0,1], \quad (4)$$

where $|w| \le |\Omega_x \cap \Omega_y|$ and $|\Omega_x \cap \Omega_y| \ne 0$.
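The character in Eq. (4) can be illustrated with a minimal Python sketch. It assumes that $|w|$ is the number of snippets containing the token $w$ (the source text does not preserve that definition) and that the doubleton hit count is supplied by the caller; the function name `token_character` and the tokenization rule are hypothetical.

```python
import re
from collections import Counter

def token_character(snippets, doubleton_hit_count):
    """Return p_D(w) = |w| / |Ωx ∩ Ωy| for every token w in the snippets.

    Assumption: |w| is the number of snippets that contain w.
    """
    if doubleton_hit_count <= 0:
        raise ValueError("the doubleton hit count must be positive")
    counts = Counter()
    for snippet in snippets:
        # Count each token once per snippet (assumed reading of |w|).
        counts.update(set(re.findall(r"[a-z]+", snippet.lower())))
    return {w: c / doubleton_hit_count for w, c in counts.items()}

p = token_character(
    ["Paper by Zin and Hamdan", "Zin cites Hamdan paper"],
    doubleton_hit_count=4,
)
```

The values stay in [0, 1] as long as the number of snippets does not exceed the hit count, which mirrors the condition $|w| \le |\Omega_x \cap \Omega_y|$.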

Fig. 1. Type of snippets based on co-occurrence (Google search engine)

As information on the relations between social actors, the doubleton is naturally the basis for refining the information about a social actor, where one search term is a keyword for the other. Therefore, this is a sufficient condition for eliminating the connatural trait of the singleton. The snippets of a doubleton, however, naturally show different kinds of information about relations. We conclude that [...] of three (triple) dots between the two names of social actors. Triple dots naturally act as a word in the text. Direct relations are represented by direct co-occurrences such as co-authorship, whereas indirect relations are represented by indirect co-occurrences such as citation or presence at the same event.
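The distinction between direct and indirect co-occurrence can be sketched as a small classifier. This is an illustration under stated assumptions, not the authors' implementation: it looks only at the first occurrence of each name, treats both "..." and "…" as triple dots, and labels a snippet that lacks one of the names as unclear ("ur"). The function name `classify_snippet` is hypothetical.

```python
def classify_snippet(snippet, name_a, name_b):
    """Return 'dr' (direct), 'ir' (indirect) or 'ur' (unclear) for one snippet."""
    text = snippet.lower()
    i = text.find(name_a.lower())
    j = text.find(name_b.lower())
    if i < 0 or j < 0:
        return "ur"  # assumption: a missing name leaves the relation unclear
    if i < j:
        between = text[i + len(name_a):j]
    else:
        between = text[j + len(name_b):i]
    # Triple dots between the two names signal an indirect co-occurrence.
    return "ir" if ("..." in between or "…" in between) else "dr"

labels = [
    classify_snippet("A. Zin and B. Hamdan, joint paper", "A. Zin", "B. Hamdan"),
    classify_snippet("A. Zin, 2009 ... cited by B. Hamdan", "A. Zin", "B. Hamdan"),
    classify_snippet("B. Hamdan ... A. Zin", "A. Zin", "B. Hamdan"),
    classify_snippet("A. Zin, solo talk", "A. Zin", "B. Hamdan"),
]
```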

3 The Proposed Approach

The method of extracting information from the Web known as the superficial method is categorized in the unsupervised stream; it involves a search engine to obtain information such as the hit counts used in computation [12]. Generally, a similarity measurement is applied to generate the relation between actors [13].

Definition 7. $sr \in R$ is the strength relation between two social actors $a, b \in A$ if it meets the comparison among the different information of the two actors ($\mathbf{a}$ and $\mathbf{b}$) and their common information ($\mathbf{a} \cap \mathbf{b}$) in the similarity measurement, or $sr = sim(\mathbf{a}, \mathbf{b}, \mathbf{a} \cap \mathbf{b})$ in $[0, 1]$, with $\mathbf{a} \cap \mathbf{b} \le \mathbf{a}$ and $\mathbf{a} \cap \mathbf{b} \le \mathbf{b}$.

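Suppose we use the Jaccard coefficient; then we obtain $sr$ based on hit counts:

$$sr = \frac{|\Omega_a \cap \Omega_b|}{|\Omega_a| + |\Omega_b| - |\Omega_a \cap \Omega_b|}. \quad (5)$$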
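As a minimal sketch, the Jaccard coefficient on hit counts can be computed as below. The function name `jaccard_strength` and the input checks are illustrative assumptions; the hit counts would come from singleton and doubleton queries.

```python
def jaccard_strength(hits_a, hits_b, hits_ab):
    """Jaccard coefficient from two singleton hit counts and one doubleton hit count."""
    if min(hits_a, hits_b, hits_ab) < 0:
        raise ValueError("hit counts must be non-negative")
    if hits_ab > min(hits_a, hits_b):
        raise ValueError("the doubleton hit count cannot exceed a singleton hit count")
    union = hits_a + hits_b - hits_ab
    # With no pages at all there is no evidence of a relation.
    return hits_ab / union if union else 0.0

sr = jaccard_strength(100, 50, 25)
```

The result always lies in [0, 1], as Definition 7 requires of a similarity measurement.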

Lemma 3. If $ir$ is an indirect relation between two social actors $a, b \in A$, then $ir$ statistically has the character in the doubleton.

Proof. Suppose the indirect relations $ir$ can be recognized in each snippet based on doubleton; then we have the number of indirect relations in the list of snippets based on doubleton, $|ir|$, where $|ir|$ = number of snippets containing triple dots. Therefore, we generate the character of $ir$ as follows

$$p(ir) = \frac{|ir|}{|\Omega_a \cap \Omega_b|}. \quad (6)$$

Proposition 3. If $sr$ is a strength relation between two social actors $a, b \in A$, then the aggregation of $sr$ consists of three binderies.

Proof. Suppose $p(ir) \in [0,1]$ (Eq. (6)) is the probability of the indirect relation based on doubleton; then the probability of the direct relation ($dr$) based on doubleton is as follows

[...] characteristics are $p(ir)$ and $p(dr)$, respectively. However, $1 - p(ir) - p(dr) \ge 0$; if $p(ir) + p(dr) - 1 \ne 0$, we obtain

$$p(ur) = 1 - (p(ir) + p(dr)), \quad (8)$$

i.e. the character of a relation that cannot be determined with certainty through the co-occurrence. Because $p(ir)$, $p(dr)$ and $p(ur)$ can be considered as percentage values, the multiplication of a characteristic with the strength relation is regarded as a bindery based on the type of relation. Therefore, we have three binderies of the strength relations as follows

J1 A bindery of strength relations based on the direct relations,

$$sr_{dr} = sr \cdot p(dr) \in [0,1] \quad (9)$$

J2 A bindery of strength relations based on the indirect relations,

$$sr_{ir} = sr \cdot p(ir) \in [0,1] \quad (10)$$

J3 A bindery of strength relations based on the unclear relations,

$$sr_{ur} = sr \cdot p(ur) \in [0,1] \quad (11)$$
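The three binderies can be illustrated with a short Python sketch. It assumes that $p(dr)$ and $p(ir)$ are snippet counts divided by the doubleton hit count, by analogy with Eq. (4); the function name `binderies` and the sample numbers are hypothetical.

```python
def binderies(sr, n_dr, n_ir, hits_ab):
    """Split a strength relation sr into direct, indirect and unclear binderies."""
    if hits_ab <= 0:
        raise ValueError("the doubleton hit count must be positive")
    p_dr = n_dr / hits_ab            # assumed form of p(dr)
    p_ir = n_ir / hits_ab            # assumed form of p(ir)
    p_ur = 1.0 - (p_ir + p_dr)       # Eq. (8)
    if p_ur < 0:
        raise ValueError("p(ir) + p(dr) must not exceed 1")
    return {"dr": sr * p_dr, "ir": sr * p_ir, "ur": sr * p_ur}

# Hypothetical values: sr = 0.2, 10 direct and 30 indirect snippets, 100 hits.
b = binderies(0.2, n_dr=10, n_ir=30, hits_ab=100)
```

Because the three characteristics sum to 1, the three binderies sum back to the original strength relation.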

Fig. 2. Type of relations based on the social network extraction

Proposition 4. If $sr$ is a strength relation between two social actors $a, b \in A$, then the aggregation of $sr$ consists of sheets.

Proof. Based on Proposition 3 and by applying Eq. (4) to the strength relation $sr$, we can generate the aggregations based on words, which we call the sheets of relations $sh$, i.e. [...]

Generally, this concept is considered an approach to the concept of latent semantic analysis [14], which has been put forward to produce labels on social networks based on the supervised stream or the generative probabilistic model (PGM) [4, 15]. This approach serves as an enhancement of the superficial method [16, 17].

Theorem 1. $sr$ is the strength relation between two actors $a, b \in A$ if and only if there is an aggregation.

Proof. This is a direct consequence of Propositions 3 and 4 as the necessary conditions, and Lemmas 1, 2 and 3 as the sufficient conditions; see Fig. 2.

generate (keyword)

INPUT : A set of actors
OUTPUT : aggregation of the strength relations
STEPS :
1. |Ωa| ← ta query and search engine.
2. |Ωb| ← tb query and search engine.
3. |Ωa ∩ Ωb| ← ta ∧ tb query and search engine.
   A = {w1, w2, ..., wn} ← Collect words (terms) per pair of actors from the snippets based on doubleton.
4. |dr| ← List of snippets based on doubleton.
5. |ir| ← List of snippets based on doubleton.
6. sr ∗ p(dr) and sr ∗ p(ir)
7. Aggregating sr ∗ p(dr) and sr ∗ p(ir) based on the summation of sheets per domain.
8. Measuring recall and precision of relations.
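Steps 1 to 6 can be sketched in Python as follows. The `search` callable is a hypothetical stand-in for a search engine client that returns a hit count and a list of snippets; the quoting of queries, the rule that a snippet without triple dots counts as direct only when both names occur in it, and the division by the doubleton hit count are all assumptions of this sketch. Steps 7 and 8 are not covered.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical search interface: query -> (hit count, snippets).
SearchFn = Callable[[str], Tuple[int, List[str]]]

def extract_relation(name_a: str, name_b: str, search: SearchFn) -> Dict[str, float]:
    hits_a, _ = search(f'"{name_a}"')                      # step 1: singleton of a
    hits_b, _ = search(f'"{name_b}"')                      # step 2: singleton of b
    hits_ab, snippets = search(f'"{name_a}" "{name_b}"')   # step 3: doubleton
    union = hits_a + hits_b - hits_ab
    if hits_ab <= 0 or union <= 0:
        return {"sr": 0.0, "sr_dr": 0.0, "sr_ir": 0.0}
    sr = hits_ab / union                                   # Jaccard on hit counts
    n_dr = n_ir = 0
    for s in snippets:
        low = s.lower()
        if "..." in s or "…" in s:
            n_ir += 1                                      # step 5: indirect
        elif name_a.lower() in low and name_b.lower() in low:
            n_dr += 1                                      # step 4: direct
    # Step 6: binderies for direct and indirect relations.
    return {"sr": sr, "sr_dr": sr * n_dr / hits_ab, "sr_ir": sr * n_ir / hits_ab}

def fake_search(query: str) -> Tuple[int, List[str]]:
    """Canned results standing in for a real search engine."""
    table = {
        '"A Zin"': (100, []),
        '"B Hamdan"': (50, []),
        '"A Zin" "B Hamdan"': (20, [
            "A Zin and B Hamdan, joint paper",
            "A Zin ... cited by B Hamdan",
            "unrelated text",
        ]),
    }
    return table[query]

result = extract_relation("A Zin", "B Hamdan", fake_search)
```

Passing the search function as a parameter keeps the sketch independent of any particular search engine API.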

4 Experiment

In this experiment, we involve $n = 469$ social actors, or $n(n-1) = 219{,}492$ potential relations. There are 30,044 strength relations between the 469 actors, or 14% of the potential relations; among them are (a) 4,422 direct relations (2%), (b) 21,462 indirect relations (10%), and (c) 4,160 direct and indirect relations (2%). Therefore, there are 21,462 lists of snippets of doubleton ($L_D$) that contain the triple dots in all snippets, and 4,422 lists of snippets of doubleton ($L_D$) that have no dots in any snippet.
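The counts and percentages in this paragraph can be checked with a few lines of Python; the variable names are illustrative only.

```python
n = 469
potential = n * (n - 1)  # ordered pairs of distinct actors

counts = {"direct": 4422, "indirect": 21462, "direct and indirect": 4160}
total = sum(counts.values())

# Shares of the potential relations, rounded to whole percentages.
shares = {kind: round(100 * c / potential) for kind, c in counts.items()}
total_share = round(100 * total / potential)
```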

Suppose we define the ontology domains and interpret each of them taxonomically as a set of words, as follows.

1. Direct relations:
(a) author-relationship = {activity, article, author, authors, award, journal, journals, paper, patent, presentation, proceedings, publication, theme, poster, ...}.
(b) academic rule = {supervisor, cosupervisor, editor, editors, graduate, lecturer, professor, prof, researcher, reviewer, student, ...}.

Table 1. The strength relation, direct and indirect relations, and author-relationship

Pair                                                       sr      dr      ir
1. Abdullah Mohd Zin & 2. Abdul Razak Hamdan               0.0482  0.0163  0.0815
1. Abdullah Mohd Zin & 3. Tengku Mohd Tengku Sembok        0.0395  0.0975  0.0612
2. Abdul Razak Hamdan & 3. Tengku Mohd Tengku Sembok       0.0237  0.0000  0.2349

1 & 1    2 & 2    3 & 3

2. Indirect relations:
(a) scientific event = {chair, conference, conferences, meeting, programme, schedule, seminar, session, sponsor, symposium, track, workshop, ...}.
(b) citation = {reference, references, bibliography, ...}.

With the concept of aggregation starting from the bindery, each bindery consists of chapters (domains), and each chapter contains sheets (words).

[...] “Tengku Mohd Tengku Sembok” is 281. Therefore, based on Eq. (5) we have three strength relations $sr$ as in Table 1. From 100 snippets based on doubleton, we have:

1. 60 snippets contain the indirect relations and 12 snippets contain the direct relations for “Abdullah Mohd Zin” and “Abdul Razak Hamdan”,
2. 27 snippets contain the indirect relations and 43 snippets contain the direct relations for “Abdullah Mohd Zin” and “Tengku Mohd Tengku Sembok”, and
3. 66 snippets contain the indirect relations for “Abdul Razak Hamdan” and “Tengku Mohd Tengku Sembok”.

In this case, $p(dr)$ and $p(ir)$ for each pair of actors are given in Table 1. For the 100 snippets of each pair of actors, $p_D(w)$ is calculated for each word, and its value is transferred directly to the sheets in the appropriate domain, as in Table 1.
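As a worked check for the pair “Abdul Razak Hamdan” and “Tengku Mohd Tengku Sembok”, assume that the value 281 quoted above is the doubleton hit count $|\Omega_a \cap \Omega_b|$ of that pair and that each character is a snippet count divided by that hit count (an assumption by analogy with Eq. (4)). With 66 indirect and no direct snippets:

$$p(ir) = \frac{66}{281} \approx 0.2349, \qquad p(dr) = \frac{0}{281} = 0, \qquad p(ur) = 1 - (p(ir) + p(dr)) \approx 0.7651.$$

Under these assumptions the first two values agree with the 0.2349 and 0.0000 entries reported for this pair in Table 1.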

Table 2.

   Aggregation           Recall   Precision
1  Author-relationship   61.76%   17.65%
2  Research group        55.88%    7.28%
3  Academic rule         61.94%   13.15%
4  Scientific event      61.76%    6.10%
5  Citation              50.01%    6.63%

We conduct an experiment using 65 social actors that have direct and indirect relations between them, or $n(n-1) = 4{,}160$ potential relations. Based on a survey, we obtain the relevant relations and compare them with the results obtained through extraction from the Web. Based on Table 2, the recall and the precision give the impression that each aggregation of the strength relation is adequate.
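Recall and precision of extracted relations against survey-based relevant relations can be computed as in the following sketch. It uses the standard definitions of both measures; the function name `recall_precision` and the toy relation sets are hypothetical and are not the data of the experiment.

```python
def recall_precision(extracted, relevant):
    """Return (recall, precision) for two sets of relations."""
    hits = len(extracted & relevant)                 # relations found and relevant
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(extracted) if extracted else 0.0
    return recall, precision

extracted = {("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")}
relevant = {("a", "b"), ("d", "e")}
recall, precision = recall_precision(extracted, relevant)
```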

5 Conclusion and Future Work

By studying the principles of the methods for extracting the relations between social actors, we obtain an enhanced method for aggregating the relations so that the social network can be interpreted more richly. This new method still needs further verification. In future work, we will study the combination of sheets and domains based on ontology.

References

2. Nasution, M.K.M., Noah, S.A.: Superficial method for extracting social network for academics using web snippets. In: Yu, J., Greco, S., Lingras, P., Wang, G., Skowron, A. (eds.) RSKT 2010. LNCS (LNAI), vol. 6401, pp. 483–490. Springer, Heidelberg (2010). doi:10.1007/978-3-642-16248-0_68
3. Cullota, A., Bekkerman, R., McCallum, A.: Extracting social networks and contact information from email and the Web. In: Proceedings of the 1st Conference on Email and Anti-Spam (CEAS) (2004)
4. McCallum, A., Corrada-Emmanual, A., Wang, X.: The author-recipient-topic model for topic and role discovery in social networks, with application to Enron and academic email. In: Proceedings of the Workshop and Link Analysis, Coun-
5. Heras, S., Atkinson, K., Botti, V., Grasso, F., Julián, V., McBurney, P.: Research opportunities for argumentation in social networks. Artif. Intell. Rev. 39, 39–62 (2013)
6. Kautz, H., Selman, B., Shah, M.: ReferralWeb: combining social networks and collaborative filtering. Commun. ACM 40(3), 63–65 (1997)
7. Finin, T., Ding, L., Zhou, L., Joshi, A.: Social networking on the semantic web. Learn. Organ. 12(5), 418–435 (2005)
8. Nasution, M.K.M., Sitompul, O.S., Sinulingga, E.P., Noah, S.A.: An extracted social network mining. In: SAI Computing Conference. IEEE (2016)
9. Nasution, M.K.M.: Social network mining (SNM): a definition of relation between the resources and SNA. Int. J. Adv. Sci. Eng. Inf. Technol. 6(6), 975–981 (2016)
10. Nasution, M.K.M.: Modelling and simulation of search engine. In: International Conference on Computing and Applied Informatics (ICCAI). IOP (2016)
11. Nasution, M.K.M.: New method for extracting keyword for the social actor. In: Nguyen, N.T., Attachoo, B., Trawiński, B., Somboonviwat, K. (eds.) ACIIDS 2014. LNCS (LNAI), vol. 8397, pp. 83–92. Springer, Cham (2014). doi:10.1007/978-3-319-05476-6_9
12. Matsuo, Y., Mori, J., Hamasaki, M., Nishimura, T., Takeda, T., Hasida, K., Ishizuka, M.: POLYPHONET: an advanced social networks extraction system from the web. J. Web Semant. Sci. Serv. Agents World Wide Web 5, 262–278 (2007)
13. Nasution, M.K.M.: New similarity. In: Annual Applied Science and Engineering Conference (AASEC). IOP (2016)
14. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
15. McCallum, A., Corrada-Emmanual, A., Wang, X.: Topic and role discovery in social networks. In: Proceedings of the 19th International Joint Conference on Artificial Intelligence, pp. 786–791 (2005)
16. Nasution, M.K.M., Mohd Noah, S.A.: Extraction of academic social network from online database. In: Mohd Noah, S.A. et al. (eds.) Proceeding of 2011 International Conference on Semantic Technology and Information Retrieval (STAIRS 2011), pp. 64–69. IEEE, Putrajaya (2011)