
SOCIAL MEDIA USER PERSONALITY CLASSIFICATION USING COMPUTATIONAL LINGUISTIC

By

Louis Christy Lukito 12112013

BACHELOR’S DEGREE in

INFORMATION TECHNOLOGY

FACULTY OF ENGINEERING AND INFORMATION TECHNOLOGY

SWISS GERMAN UNIVERSITY EduTown BSDCity

Tangerang 15339 Indonesia

August 2016


SOCIAL MEDIA USER PERSONALITY CLASSIFICATION

USING COMPUTATIONAL LINGUISTIC Page 2 of 83

STATEMENT BY THE AUTHOR

I hereby declare that this submission is my own work and to the best of my knowledge, it contains no material previously published or written by another person, nor material which to a substantial extent has been accepted for the award of any other degree or diploma at any educational institution, except where due acknowledgement is made in this thesis.

Louis Christy Lukito

Student Date

Approved by:

Alva Erwin, ST, M.Sc., MTI

Thesis Advisor Date

James Purnama, M.Sc.

Thesis Co-Advisor Date

Dr. Ir. Gembong Baskoro, M.Sc

Dean Date

Louis Christy Lukito


ABSTRACT

SOCIAL MEDIA USER PERSONALITY CLASSIFICATION USING COMPUTATIONAL LINGUISTIC

By

Louis Christy Lukito

Alva Erwin, ST, M.Sc., MTI, Advisor

James Purnama, M.Sc., Co-Advisor

SWISS GERMAN UNIVERSITY

Personality is what differentiates one individual from another, and knowing and understanding an individual's personality offers many advantages. With the rapid growth of technology, an individual's personality can now be identified automatically. Psychological research suggests that certain personality traits are correlated with linguistic behavior, and the popularity of social media has made it possible to predict people's personality from their posts. Most existing studies have taken a similar approach to predicting personality from social media; however, they focus on closed-vocabulary analysis of English text and are mostly based on the Big Five personality model. In this thesis, we explore Twitter as a data source for open-vocabulary personality prediction in Indonesia. We analyze and compare three statistical models and identify which personality traits are correlated with linguistic behavior. The Naïve Bayes classifier outperforms the other statistical models, achieving the highest accuracy (80% for the I/E trait and 60% for the S/N, T/F, and J/P traits), and is also the fastest at classifying users. In addition, a simple application was developed on top of the best-performing model to classify an individual's personality given their Twitter username and gender as input. The experiments also produce the 30 most frequently used words for each personality trait, and all experiments were validated by a psychology expert.

Keywords: Personality, Twitter, Lexical Approach, Grammatic Rule, Naïve Bayes,
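The abstract credits a Naïve Bayes classifier with the best accuracy on the four MBTI dichotomies. The sketch below is a minimal, hypothetical illustration of that technique, not the author's implementation. It assumes one binary multinomial Naïve Bayes model per dichotomy (shown here for I/E), treats all tweets of one user as a single document, uses a naive regex tokenizer, and applies add-one (Laplace) smoothing. The class name `NaiveBayesTrait`, the helper `tokenize`, and the two Indonesian sample sentences are invented for illustration. The `top_words` method shows one possible way a per-trait list of frequent words, like the top 30 mentioned in the abstract, could be read off the same word counts.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, then keep runs of word characters.
    # A real pipeline would also strip URLs/mentions and normalize slang.
    return re.findall(r"\w+", text.lower())


class NaiveBayesTrait:
    """Multinomial Naive Bayes for a single MBTI dichotomy, e.g. 'I' vs 'E'."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha            # add-alpha (Laplace) smoothing constant
        self.word_counts = {}         # label -> Counter of word frequencies
        self.doc_counts = Counter()   # label -> number of training users
        self.vocab = set()            # every word seen during training

    def fit(self, documents, labels):
        # Each document is assumed to hold all tweets of one user.
        for text, label in zip(documents, labels):
            tokens = tokenize(text)
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def log_posterior(self, text, label):
        # log P(label) + sum of log P(word | label), up to a shared constant.
        counts = self.word_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for token in tokenize(text):
            if token in self.vocab:   # skip words never seen in training
                score += math.log((counts[token] + self.alpha) / denom)
        return score

    def predict(self, text):
        # Pick the label with the highest (unnormalized) log posterior.
        return max(self.doc_counts, key=lambda label: self.log_posterior(text, label))

    def top_words(self, label, k=30):
        # The k most frequent training words for one side of the dichotomy.
        return [word for word, _ in self.word_counts[label].most_common(k)]


if __name__ == "__main__":
    # Invented toy data: one "user document" per side of the I/E dichotomy.
    docs = [
        "aku suka di rumah baca buku sendiri",
        "ayo nongkrong bareng teman teman malam ini",
    ]
    labels = ["I", "E"]
    model = NaiveBayesTrait().fit(docs, labels)
    print(model.predict("baca buku di rumah"))  # prints I
    print(model.top_words("E", k=3))
```

Under this sketch's assumptions, training one such model per dichotomy (I/E, S/N, T/F, J/P) and joining the four predicted letters would give a full four-letter type; working in log space avoids numeric underflow when a user has many tweets.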


© Copyright 2016 by Louis Christy Lukito

All rights reserved


DEDICATION

I dedicate this work to my beloved parents, who have always taken care of me and supported me with love, care, counsel, and resources so that I could become a better man.


ACKNOWLEDGEMENTS

I would like to thank God for His countless blessings, which made it possible to complete this thesis. I would also like to express my deepest gratitude to Mr. Alva Erwin and Mr. James Purnama, the advisor and co-advisor of this thesis, for their continuous support, guidance, and feedback. Special thanks go to Ms. Wulan Danoekoesoemo, a psychology expert, for her advice and guidance in validating the results of this thesis, and to Wahyu Mulyadi for his time, advice, and laughter; his guidance had a huge impact on the progress of this thesis.

Many thanks also go to all of my friends for their jokes and support whenever I faced obstacles while working on this thesis, especially to Melissa, who dedicated her time to verifying the MBTI psychological test by performing a double translation. Thank you as well to the countless people who have helped me, directly or indirectly.

Lastly, my biggest thanks go to my whole family for their endless love, care, and support in every form throughout my life. They deserve much of the credit for the person I am today.


Contents

STATEMENT BY THE AUTHOR
ABSTRACT
DEDICATION
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES

1 INTRODUCTION
1.1 Research Background
1.2 Research Problems
1.3 Research Objectives
1.4 Significance of Study and Business Value
1.5 Research Scope
1.6 Research Limitation
1.7 Research Questions
1.8 Hypothesis

2 LITERATURE REVIEW
2.1 Related Works
2.2 Myers-Briggs Type Indicator (MBTI)
2.3 Term Frequency - Inverse Document Frequency (TF-IDF)
2.4 Naïve Bayes
2.5 External Resources
2.5.1 Corpus
2.5.2 FAROO algorithm

3 RESEARCH METHODOLOGY
3.1 Methodology Overview
3.2.1 Respondents Collection
3.2.2 Twitter Crawler Development
3.3 Data Preprocessing
3.4 Training Set Development
3.4.1 Training Set for Machine Learning Approach
3.4.2 Training Set for Lexicon Based Approach
3.4.3 Training Set for Grammatic Rule Approach
3.5 Statistical Model Development
3.5.1 Machine Learning Statistical Model
3.5.2 Lexicon Based Statistical Model
3.5.3 Grammatic Rule Statistical Model
3.6 Tweaking and Testing
3.7 Word Analysis
3.8 Simple Application Development

4 RESULTS AND DISCUSSIONS
4.1 Data Collection
4.2 Experiment Result
4.2.1 Machine Learning Approach
4.2.2 Lexicon Based Approach
4.2.3 Grammatic Rule Approach
4.2.4 Statistical Model Comparison
4.2.5 Data Size Experiment
4.3 Word Analysis Result
4.4 Application Result

5 CONCLUSION AND FUTURE WORK
5.1 Conclusion
5.2 Future Work

GLOSSARY
REFERENCES
APPENDICES
A MBTI PERSONALITY TEST QUESTIONS AND CALCULATION FORMULA
B PAPER
CURRICULUM VITAE
