
Personality Classification on Twitter Social Media using BERT

Yantrisnandra Akbar Maulino*, Warih Maharani, Prati Hutari Gani
School of Computing, Informatics Study Program, Telkom University, Bandung, Indonesia
Email: 1,*yantrisnandra@student.telkomuniversity.ac.id, 2wmaharani@telkomuniversity.ac.id, 3pratihutarigani@telkomuniversity.ac.id
Correspondence Author Email: yantrisnandra@student.telkomuniversity.ac.id

Abstract−In the modern era, social media is a platform often used to interact with other people. Twitter is a popular social medium, especially for human interaction. Tweets on Twitter can describe a person's personality and characteristics. According to the Big Five personality model (Big Five Personality), humans have five general personality traits, namely openness, conscientiousness, extraversion, agreeableness, and neuroticism. Personality influences a person's judgment of many things; knowing someone's personality makes it easier to understand their characteristics, habits, and behavior in daily activities. In addition, understanding someone's personality can serve as a reference for how that person interacts with others, and it can be used when looking for a job that suits their personality. This research therefore builds a system to classify personality using the BERT model, with a dataset of tweets from Twitter users, varying several parameters and testing several ratios for splitting the data into training and test sets. The best accuracy acquired in this study is 50%.

Keywords: Social Media; Personality; Twitter; Big Five Personality; BERT

1. INTRODUCTION

The ever-developing era causes technology to grow without limits. Social media has become easily accessible and can be used as a place for human interaction. As of January 2022, there are 191.4 million social media users in Indonesia [1]. In Indonesia, Twitter is a popular social medium, with 18.45 million users as of January 2022 [2]. Twitter is a platform people use to interact by posting tweets, and these tweets can reveal a person's personality. Personality is a trait that characterizes and distinguishes a person from others, visible in behavior, way of speaking, thinking, and so on [3].

Finding out someone's personality can be done by conducting interviews or giving a questionnaire. Personality can serve as a reference for understanding a person's daily life and how they interact with other people, and it can also be used when looking for a job that suits them. One method for determining personality is the Big Five Personality method, which distinguishes five personality traits: extraversion, agreeableness, conscientiousness, neuroticism, and openness [4].

Research on personality classification has been carried out using several methods. A study by Sedrick Scott Keh and I-Tsun Cheng, entitled Myers-Briggs Personality Classification and Personality-Specific Language Generation Using Pre-trained Language Models, concluded that the BERT model can be used to predict MBTI personality types. That study reported an accuracy of 0.47 when predicting the full MBTI personality class and an accuracy of 0.86 when predicting two MBTI personality types [5]. In another study, by Amirmohammad Kazameini, Samin Fatehi, Yash Mehta, Sauleh Eetemadi, and Erik Cambria, the BERT model was used to extract contextual word embeddings from textual data; these embeddings were then fed to a Bagged-SVM classifier for personality classification, which improved classification performance by 1.04% [6].

Other research, by Alireza Souri, Shafigheh Hosseinpour, and Amir Masoud Rahmani, entitled Personality classification based on the profile of social network users and the five-factor model of personality, compared several methods: Naïve Bayes, Boosting Naïve Bayes, Neural Network, Boosting Neural Network, Decision Tree, Boosting Decision Tree, Support Vector Machine (SVM), and Boosting SVM. The most accurate was the Boosting Decision Tree, with an accuracy of 82.2% [7]. Another study, by Ghina Dwi Salsabila and Erwin Budi Setiawan, used Twitter data totaling 511,617 tweets; an SVM combined with the BERT model as a semantic approach obtained the best results, predicting personality with an accuracy of 80.07% [8]. A study by Kamal El-Demerdash, Reda A. El-Khoribi, Mahmoud A. Ismail Shoman, and Sherif Abdou, entitled Deep learning based fusion strategies for personality prediction, used the BERT, ULMFiT, and ELMo methods; BERT had the highest average accuracy, with a value of 60.43% [9].

Thus, a person's personality can be classified from posts on social media, especially Twitter, with different levels of accuracy depending on the method used. Bidirectional Encoder Representations from Transformers (BERT) is a deep learning algorithm for natural language processing [10]. The advantage of BERT is its ability to interpret ambiguous words in a text according to their context while processing all the words in the text simultaneously [11]. There has not yet been previous research that classifies personality from social media, especially Twitter, using the BERT model directly as the classifier. This research can help to understand a person's personality, which can serve as a reference when looking for a job that fits their personality and for understanding how a person interacts with others, without the need for interviews or questionnaires.

2. RESEARCH METHODOLOGY

2.1 Research Stages

The research stages define the steps carried out in this research. Figure 1 shows the system built to classify a person's personality based on the Big Five personality types. The system retrieves tweet data from Twitter and then performs pre-processing. After pre-processing, the system splits the data into training data and test data. The training data is used to train the BERT model, which is then tested on the test data. Finally, the system's results are evaluated with a confusion matrix to obtain accuracy, precision, recall, and F1-score values.

Figure 1. System Design

2.2 Data Crawling

This study collected data from Twitter users who filled out a questionnaire. The data consists of Twitter usernames and the tweets those users posted. To obtain the dataset, respondents were given a questionnaire containing personality questions, yielding a personality label for each Twitter user. Based on the questionnaire results, tweets were crawled from each user's Twitter account with the respondent's permission. The crawled data is stored in a CSV file containing the respondent's Twitter account name, the respondent's tweets, and the respondent's personality label.
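The paper does not name the crawling tool it used; purely as an illustration, the sketch below uses the tweepy library against the Twitter v1.1 API, with placeholder credentials, respondents, and labels, to produce the CSV layout described above.

```python
# A minimal crawling sketch. tweepy and all credentials/respondents here
# are assumptions, not the paper's actual setup.
import csv
import tweepy

auth = tweepy.OAuthHandler("API_KEY", "API_SECRET")        # placeholder credentials
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

# (username, questionnaire label) pairs - placeholders
respondents = [("claraang", "openness")]

with open("dataset.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["username", "tweet", "label"])
    for username, label in respondents:
        # Fetch the user's recent tweets (full text, no retweets).
        for status in tweepy.Cursor(api.user_timeline, screen_name=username,
                                    tweet_mode="extended",
                                    include_rts=False).items(200):
            writer.writerow([username, status.full_text, label])
```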

2.3 Pre-Processing

At the pre-processing stage, the data is cleaned so that it is neater, more structured, and easier to process later in the research. Several steps are carried out: Case Folding, which changes uppercase letters to lowercase; Remove Punctuation, which removes punctuation marks, symbols, and special characters; and slang normalization, which replaces slang words using an existing dictionary. The pre-processing stages can be seen in table 1.

Table 1. Pre-Processing

Stages               Result
Original Data        Claraang 09,2021-03-06 10:38:46,@nikghtcrawlers Oke gua ad temennya wkwk Halo @SpotifyID kenapa bbrp hp ga bisa tampilin lirik pdhl udh selalu upgrade ke versi terbaru
Case Folding         claraang 09,2021-03-06 10:38:46,@nikghtcrawlers oke gua ad temennya wkwk halo @spotifyid kenapa bbrp hp ga bisa tampilin lirik pdhl udh selalu upgrade ke versi terbaru
Remove Punctuation   claraang nikghtcrawlers oke gua ad temennya wkwk halo spotifyid kenapa bbrp hp ga bisa tampilin lirik pdhl udh selalu upgrade ke versi terbaru
Change Slang Word    claraang nikghtcrawlers oke gua ada temannya wkwk halo spotifyid kenapa beberapa hp ga bisa tampilin lirik padahal udah selalu upgrade ke versi terbaru
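As an illustration of the three steps in table 1, the following minimal Python sketch applies case folding, punctuation removal, and slang normalization; the slang dictionary shown is a tiny placeholder for the full Indonesian slang dictionary the paper uses.

```python
# A minimal pre-processing sketch following table 1.
import re

# Placeholder slang entries; the paper uses an existing full dictionary.
slang_dict = {"ad": "ada", "temennya": "temannya", "bbrp": "beberapa",
              "pdhl": "padahal", "udh": "udah"}

def preprocess(tweet: str) -> str:
    tweet = tweet.lower()                                    # case folding
    tweet = re.sub(r"[^a-z0-9\s]", " ", tweet)               # remove punctuation/symbols
    tokens = [slang_dict.get(t, t) for t in tweet.split()]   # normalize slang words
    return " ".join(tokens)

print(preprocess("@nikghtcrawlers Oke gua ad temennya wkwk"))
# -> "nikghtcrawlers oke gua ada temannya wkwk"
```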


2.4 BERT

BERT is a deep learning model that produces good results in NLP tasks [12]. BERT is a contextual word representation model pre-trained using the Masked Language Model (MLM) objective [13]. BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers [14].

During pre-training, the model is trained on unlabeled data over different pre-training tasks. During fine-tuning, the model is initialized with the previously trained parameters, which are then tuned using labeled data [15]. As shown in figure 2, BERT pre-training uses Masked LM: the model learns deep bidirectional representations by masking a random percentage of the input tokens and predicting the masked tokens [15]. In addition to Masked LM, BERT pre-training also uses NSP (Next Sentence Prediction). Many tasks depend on understanding the relationship between two sentences, which language modelling does not directly capture, so BERT is also pre-trained to predict whether one sentence follows another [15].

Figure 2. BERT Model

2.4.1 Prepare BERT Input

After pre-processing, the input for BERT is prepared. BERT has three inputs: input_ids, a numerical representation of the tokens in the order they appear in the text; attention_mask, a mask marking which token positions must be attended to and which can be ignored (needed because sentences have different lengths, so padding is added to sequences shorter than the longest one); and token_type_ids, token indices indicating the first and second segments of the input text. At this stage, tokenization is also carried out using the BERT tokenizer, following the vocabulary of the BERT model itself. Tweets are tokenized according to what the model knows: when the model knows a word, it is kept as a single token; otherwise, the word is split into sub-word pieces the model knows. The result of BERT tokenization can be seen in table 2.

Table 2. Tokenize BERT

Stages                 Result
Original Data          claraang nikghtcrawlers oke gua ada temannya wkwk halo spotifyid kenapa beberapa hp ga bisa tampilin lirik padahal udah selalu upgrade ke versi terbaru
Tokenize inside BERT   ['clara', '##ang', 'ni', '##kg', '##ht', '##cra', '##wl', '##ers', 'ok', '##e', 'gu', '##a', 'ada', 'teman', '##nya', 'w', '##k', '##w', '##k', 'hal', '##o', 'spot', '##ify', '##id', 'ken', '##apa', 'beberapa', 'hp', 'ga', 'bisa', 'tampil', '##in', 'li', '##rik', 'pada', '##hal', 'ud', '##ah', 'selalu', 'upgrade', 'ke', 'versi', 'ter', '##baru']
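A minimal sketch of preparing these inputs with the Hugging Face tokenizer is shown below; the multilingual checkpoint name and the 128-token maximum length are assumptions, since the paper does not state which pre-trained BERT model it loads.

```python
# A minimal sketch of building the three BERT inputs described above:
# input_ids, attention_mask, and token_type_ids.
from transformers import BertTokenizer

# Checkpoint name is an assumption; the paper does not specify it.
tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-uncased")

text = "claraang nikghtcrawlers oke gua ada temannya wkwk"
print(tokenizer.tokenize(text))  # WordPiece splits unknown words into '##' pieces

encoded = tokenizer(text,
                    padding="max_length",   # pad shorter tweets to max_length
                    truncation=True,
                    max_length=128,
                    return_tensors="tf")
# encoded["input_ids"], encoded["attention_mask"], encoded["token_type_ids"]
```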

2.4.2 BERT Classification

At this stage, classification is done using the BERT model. To create the model, the BERT model is first loaded for fine-tuning. Several input layers are created: one for input_ids, one for token_type_ids, and a third for attention_mask. These three layers form the input to the base model layer, whose output is stored in the last_hidden_state. Next, a global average pooling layer produces a summary representation of the sentence, followed by a dense layer with the "sigmoid" activation. The model is then compiled using the Adam optimizer, the sparse_categorical_crossentropy loss, and the accuracy metric. Altogether, the layers created during modeling contain 177,857,285 parameters. After the model is created, personality classification with the Big Five method is carried out by training the model and validating it.
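A hedged Keras sketch of this architecture, assuming a 128-token input length and a multilingual BERT checkpoint (neither is stated in the paper), could look as follows; the 'sigmoid' activation mirrors the paper's description, although softmax is the more common choice for single-label classification.

```python
# A minimal sketch of the described model, not the paper's exact code.
import tensorflow as tf
from transformers import TFBertModel

MAX_LEN = 128  # assumption
bert = TFBertModel.from_pretrained("bert-base-multilingual-uncased")  # assumption

input_ids      = tf.keras.layers.Input(shape=(MAX_LEN,), dtype=tf.int32, name="input_ids")
token_type_ids = tf.keras.layers.Input(shape=(MAX_LEN,), dtype=tf.int32, name="token_type_ids")
attention_mask = tf.keras.layers.Input(shape=(MAX_LEN,), dtype=tf.int32, name="attention_mask")

# The base model returns contextual embeddings in last_hidden_state.
hidden = bert(input_ids=input_ids, token_type_ids=token_type_ids,
              attention_mask=attention_mask).last_hidden_state
pooled = tf.keras.layers.GlobalAveragePooling1D()(hidden)       # sentence summary
output = tf.keras.layers.Dense(5, activation="sigmoid")(pooled)  # one unit per Big Five class

model = tf.keras.Model(inputs=[input_ids, token_type_ids, attention_mask], outputs=output)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),  # baseline parameter
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```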

2.5 Evaluation

At this stage, the evaluation uses the confusion matrix, which compares the classification results produced by the system with the actual classes [16]. The confusion matrix is a table that describes the performance of a classification model on test data whose actual values are known. The terms found in the confusion matrix are True Positives, True Negatives, False Positives, and False Negatives [17]. The confusion matrix layout can be seen in table 3.

Table 3. Confusion Matrix

                    Prediction Results
Actual Results      Positive    Negative
Positive            TP          FN
Negative            FP          TN

From the confusion matrix, the performance of the model can be calculated as accuracy, recall, precision, and F1-score using formulas (1) to (4) [18]. Accuracy measures how often the model classifies correctly [19]; precision measures how many of the model's predictions match the desired data [20]; recall measures how much of the relevant information the model successfully retrieves [21]; and the F1-score indicates whether the precision and recall of the classification model are good or bad [22].

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (1)

Precision = TP / (TP + FP)    (2)

Recall = TP / (TP + FN)    (3)

F1-Score = (2 × Precision × Recall) / (Precision + Recall)    (4)
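A minimal sketch of this evaluation step with scikit-learn, using placeholder predictions for the five personality classes, is shown below.

```python
# Confusion matrix plus the metrics in equations (1)-(4) via scikit-learn.
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

labels = ["neuroticism", "agreeableness", "openness", "extraversion", "conscientiousness"]
y_true = [2, 2, 0, 1, 2]   # placeholder ground-truth class indices
y_pred = [2, 2, 2, 2, 2]   # placeholder model predictions

print(confusion_matrix(y_true, y_pred, labels=range(5)))
print("accuracy:", accuracy_score(y_true, y_pred))
# Per-class precision, recall, and F1-score, as in tables 7, 10, and 13.
print(classification_report(y_true, y_pred, labels=range(5),
                            target_names=labels, zero_division=0))
```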

3. RESULTS AND DISCUSSION

The dataset used for testing consists of data from 276 Twitter users, each labeled with a personality according to the Big Five personality method. The distribution of personality labels in the dataset is shown in figure 3.

Figure 3. Distribution of Datasets

3.1 Test Results

3.1.1 First Scenario

The distribution of tweets according to the personality labels in figure 3 is used as the data to be tested in several scenarios. The first scenario uses different ratios when splitting the data, with the baseline parameters shown in table 4.

Table 4. Baseline Parameters

Parameter       Value
Learning Rate   1e-5
Batch size      32
Class weight    'none'

The accuracy results of the first scenario, obtained with several different training/testing ratios and the baseline parameters in table 4, can be seen in table 5.

Table 5. First Scenario Accuracy Results

Data Training : Data Testing    Accuracy
50:50                           36.23%
60:40                           43.24%
70:30                           39.76%
80:20                           42.86%
90:10                           46.43%

From table 5, the highest accuracy in the first scenario with the baseline parameters in table 4 is obtained with a 90:10 split ratio, namely 46.43%. The confusion matrix of the best model from the first scenario is shown in table 6.

Table 6. First Scenario Confusion Matrix Results

Actual Results (28)     Prediction Results
                        Neuroticism (0)  Agreeableness (0)  Openness (28)  Extraversion (0)  Conscientiousness (0)
Neuroticism (3)                0                 0                 3               0                  0
Agreeableness (6)              0                 0                 6               0                  0
Openness (13)                  0                 0                13               0                  0
Extraversion (2)               0                 0                 2               0                  0
Conscientiousness (4)          0                 0                 4               0                  0

From table 6, the model with the best accuracy in the first scenario only succeeded in classifying the Openness personality. The model could not correctly classify the Neuroticism, Agreeableness, Extraversion, and Conscientiousness personalities because there is too little data for these personalities. The precision, recall, and F1-score results for the best-accuracy run in the first scenario can be seen in table 7.

Table 7. Precision, Recall, F1-Score First Scenario Results

Label               Recall   Precision   F1-Score
Neuroticism         0        0           0
Agreeableness       0        0           0
Openness            1        0.46        0.63
Extraversion        0        0           0
Conscientiousness   0        0           0

Table 7 shows that the openness personality label obtains a recall of 1, a precision of 0.46, and an F1-score of 0.63. Table 7 also shows that the precision, recall, and F1-score values for the agreeableness, neuroticism, extraversion, and conscientiousness personality labels are 0. This happens because the model cannot learn these personality labels due to the unequal amount of data for each personality.

3.1.2 Second Scenario

The second scenario is carried out by changing the parameters used. The parameters tested in the second scenario are the batch size and the learning rate, using the best data split ratio from the first scenario, namely 90:10, starting from the baseline parameters in table 4. The resulting accuracy in the second scenario is shown in table 8.

Table 8. Second Scenario Accuracy Results

Learning Rate   Batch Size   Accuracy
1e-5            8            35.71%
1e-6            8            42.86%
1e-6            32           50%

From table 8, the second scenario obtains an accuracy of 35.71% when using a learning rate of 1e-5 and a batch size of 8, a decrease compared to the batch size of 32 in the first scenario. When changing the learning rate to 1e-6 while keeping the batch size of 32 from the first scenario, the accuracy increases to 50%. When changing both the learning rate and the batch size (1e-6 with a batch size of 8), the resulting accuracy decreases to 42.86%. The confusion matrix of the best model from the second scenario is shown in table 9.

Table 9. Second Scenario Confusion Matrix Results

Actual Results (28)     Prediction Results
                        Neuroticism (3)  Agreeableness (4)  Openness (21)  Extraversion (0)  Conscientiousness (0)
Neuroticism (3)                1                 0                 2               0                  0
Agreeableness (6)              0                 2                 4               0                  0
Openness (13)                  1                 1                11               0                  0
Extraversion (2)               0                 1                 1               0                  0
Conscientiousness (4)          1                 0                 3               0                  0

From table 9, the model with the best accuracy succeeded in classifying the neuroticism, agreeableness, and openness personalities. The model cannot correctly classify the Extraversion and Conscientiousness personalities because there is too little data for them. The precision, recall, and F1-score results for the best run in the second scenario, namely a learning rate of 1e-6 and a batch size of 32 with an accuracy of 50%, can be seen in table 10.

Table 10. Precision, Recall, F1-Score Second Scenario Results

Label               Recall   Precision   F1-Score
Neuroticism         0.33     0.33        0.33
Agreeableness       0.33     0.50        0.40
Openness            0.85     0.52        0.65
Extraversion        0        0           0
Conscientiousness   0        0           0

From table 10, the neuroticism personality label obtains a recall of 0.33, a precision of 0.33, and an F1-score of 0.33; the agreeableness personality label obtains a recall of 0.33, a precision of 0.50, and an F1-score of 0.40; and the openness personality label obtains a recall of 0.85, a precision of 0.52, and an F1-score of 0.65. Table 10 also shows that the precision, recall, and F1-score values of the extraversion and conscientiousness personality labels are 0. This happens because the model cannot learn the conscientiousness and extraversion personality labels.

3.1.3 Third Scenario

The third scenario is carried out by changing the class weight from 'none' to 'balanced'; class weights are used because the data is imbalanced. Class weighting raises the weight of the minority classes, the classes with little data compared to the majority classes. As seen in figure 3, the openness class has 111 data, the neuroticism class 52, and the agreeableness class 81, far more than the extraversion class with 8 data and the conscientiousness class with 24 data. Balanced class weights lower the weight values of the classes with much data, the majority classes, and raise the weight values of the classes with little data, the minority classes. The class weights after setting the parameter to 'balanced' can be seen in table 11.

Table 11. Class Weights

Neuroticism   Agreeableness   Openness   Extraversion   Conscientiousness
1.0615        0.6815          0.4972     6.9            2.3
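The 'balanced' values in table 11 match scikit-learn's formula n_samples / (n_classes × class_count); a minimal sketch reproducing them from the label counts in figure 3 is shown below.

```python
# Reproducing the balanced class weights of table 11 from the label
# counts reported in figure 3 (276 samples, 5 classes).
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

counts = {"neuroticism": 52, "agreeableness": 81, "openness": 111,
          "extraversion": 8, "conscientiousness": 24}
y = np.concatenate([np.full(n, label) for label, n in counts.items()])

weights = compute_class_weight(class_weight="balanced",
                               classes=np.array(list(counts)), y=y)
print(dict(zip(counts, weights.round(4))))
# -> {'neuroticism': 1.0615, 'agreeableness': 0.6815, 'openness': 0.4973,
#     'extraversion': 6.9, 'conscientiousness': 2.3}
```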

From the class weight distribution in table 11, the weight values of the majority classes are below those of the minority classes. These weights are used in the third scenario, which adds imbalanced-data handling on top of the best parameters from the second scenario, namely a learning rate of 1e-6, a batch size of 32, and a 90:10 data split ratio. This scenario is intended to find out whether, after adding class weights, the model can determine the personality class of the minority classes. The confusion matrix of the model generated in the third scenario is shown in table 12.

Table 12. Third Scenario Confusion Matrix Results

Actual Results (28)     Prediction Results
                        Neuroticism (11)  Agreeableness (1)  Openness (14)  Extraversion (0)  Conscientiousness (2)
Neuroticism (3)                1                  0                 2               0                  0
Agreeableness (6)              3                  0                 3               0                  0
Openness (13)                  4                  0                 7               0                  2
Extraversion (2)               0                  1                 1               0                  0
Conscientiousness (4)          3                  0                 1               0                  0

From table 12, the model classifies the Neuroticism and Openness personalities best. For the Conscientiousness, Agreeableness, and Extraversion personalities, no instance is predicted correctly by the model. Thus, even after adding class weights, the minority personality classes still cannot be classified by the model. The precision, recall, and F1-score values in the third scenario can be seen in table 13.

Table 13. Precision, Recall, F1-Score Third Scenario Results

Label               Recall   Precision   F1-Score
Neuroticism         0.33     0.091       0.14
Agreeableness       0        0           0
Openness            0.54     0.50        0.52
Extraversion        0        0           0
Conscientiousness   0        0           0

From table 13, the neuroticism personality label obtains a recall of 0.33, a precision of 0.091, and an F1-score of 0.14, and the openness personality label obtains a recall of 0.54, a precision of 0.50, and an F1-score of 0.52. Table 13 also shows that the precision, recall, and F1-score values of the conscientiousness, agreeableness, and extraversion personality labels are 0. This can happen because the model still cannot learn the conscientiousness, agreeableness, and extraversion personality labels.

3.2 Analysis of Test Results

3.2.1 Analysis of the First Scenario

The first scenario is done by changing the dataset split ratio. This scenario is made to determine whether the proportion of training data and test data affects the resulting accuracy. From the tests carried out in the first scenario, the highest accuracy value is obtained when using a 90:10 ratio with the baseline parameters, namely a learning rate of 1e-5, a batch size of 32, and the class weight set to 'none'. With this ratio, the model is trained on 90% of all available data and only 10% is used as test data. The confusion matrix of the first scenario in table 6 shows that the model only correctly classifies the Openness personality class, classifying 13 out of 13 samples with a recall of 1, a precision of 0.46, and an F1-score of 0.63. This can be caused by the uneven distribution of the data: the openness personality label has the largest amount of data among the personality labels, namely 111 data, so a 90:10 split biases the tested data toward the openness label. This prevents the model from learning the other classes.
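For illustration, the 90:10 split can be reproduced with scikit-learn as below; whether the paper's split was stratified is not stated, so the stratify option is shown only as a commented assumption (stratifying would keep label proportions equal across train and test, mitigating the bias discussed above).

```python
# A minimal sketch of the best-performing 90:10 split.
from sklearn.model_selection import train_test_split

tweets = [f"tweet {i}" for i in range(276)]          # placeholder texts
labels = [0]*111 + [1]*81 + [2]*52 + [3]*24 + [4]*8  # counts from figure 3

X_train, X_test, y_train, y_test = train_test_split(
    tweets, labels,
    test_size=0.10,       # 90:10 train/test ratio
    random_state=42,
    # stratify=labels,    # assumption: not stated in the paper
)
print(len(X_train), len(X_test))  # -> 248 28
```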

3.2.2 Analysis of the Second Scenario

In the second scenario, the parameters used are changed. Several parameters were tested, namely the learning rate and the batch size, with learning rates of 1e-5 and 1e-6 and batch sizes of 32 and 8. This scenario aims to determine the effect of these parameters on the resulting accuracy.

The learning rate controls how quickly the model adapts to the training data; a lower learning rate can help the model converge to a better optimum. The batch size affects the speed of the algorithm: a small batch size introduces more noise into the gradient estimates but can make the algorithm converge faster, whereas a large batch size reduces this noise but takes longer per update. The second scenario produces an improvement when changing the learning rate to 1e-6 with a batch size of 32, reaching 50% accuracy. The confusion matrix of the second scenario in table 9 shows that the model classifies 1 out of 3 neuroticism samples (recall 0.33, precision 0.33, F1-score 0.33), 2 out of 6 agreeableness samples (recall 0.33, precision 0.50, F1-score 0.40), and 11 out of 13 openness samples (recall 0.85, precision 0.52, F1-score 0.65). Compared to the first scenario, the recall of the openness class decreased while the recall of the neuroticism and agreeableness labels increased; the extraversion and conscientiousness labels did not change because the model still cannot classify them. The precision also increased, to 0.33 for the neuroticism class, 0.50 for agreeableness, and 0.52 for openness. This shows that changing the learning rate parameter increases the accuracy and affects the performance of the model itself.

In this scenario the model can predict more classes than in the first scenario: neuroticism, agreeableness, and openness.

3.2.3 Analysis of the Third Scenario

In the third scenario, the class weight is changed to 'balanced' because the dataset is imbalanced: some personalities have much data while others have very little.

Changing the class weight to 'balanced' adds weight to the minority classes, namely the extraversion and conscientiousness personality classes, and lowers the weight of the majority classes, namely the openness, neuroticism, and agreeableness personality classes. After the change, the weight values become 0.4972 for openness, 1.0615 for neuroticism, and 0.6815 for agreeableness in the majority classes, and 6.9 for extraversion and 2.3 for conscientiousness in the minority classes. This scenario uses the best parameters from the second scenario, namely a learning rate of 1e-6 and a batch size of 32, and aims to find out how changing the class weight to 'balanced' affects which personality classes the model can classify. The confusion matrix of the third scenario in table 12 shows that the model classifies 1 out of 3 neuroticism samples (recall 0.33, precision 0.091, F1-score 0.14) and 7 out of 13 openness samples (recall 0.54, precision 0.50, F1-score 0.52), while the agreeableness, extraversion, and conscientiousness personality classes obtain precision, recall, and F1-score of 0. The precision in the third scenario decreased significantly for the openness and agreeableness personality labels. This scenario does not achieve better accuracy than the previous scenarios, and the minority classes still cannot be classified correctly by the model. Therefore, even with the class weights set to 'balanced', the model cannot classify the minority classes and obtains no better accuracy than the first or second scenario.

4. CONCLUSION

In this research, a system was built to classify a person's personality based on the Big Five Personality from Twitter social media. The Big Five Personality consists of openness, conscientiousness, extraversion, agreeableness, and neuroticism. The dataset used is data obtained from Twitter social media for personality classification. The model results are evaluated with a confusion matrix, from which accuracy, precision, recall, and F1-score are calculated. The highest accuracy across all scenarios, 50%, is obtained using a learning rate of 1e-6, a batch size of 32, and a 90:10 dataset split ratio. Across all scenarios, the best recall values are 0.33 for the neuroticism personality class in the second and third scenarios, 0.33 for the agreeableness personality class in the second scenario, and 1 for the openness personality class in the first scenario. Meanwhile, classification was unsuccessful in all scenarios for the extraversion and conscientiousness personality classes. Suggestions for further research are to enlarge the dataset, especially for the minority classes, so that the data becomes more balanced, and to experiment further with the parameters that can be changed.

REFERENCES

[1] L. Jemadu and D. Prastya, "Jumlah Pengguna Media Sosial Indonesia Capai 191,4 Juta per 2022," 2022. https://www.suara.com/tekno/2022/02/23/191809/jumlah-pengguna-media-sosial-indonesia-capai-1914-juta-per-2022 (accessed Jan. 07, 2023).
[2] C. M. Annur, "Pengguna Twitter Indonesia Masuk Daftar Terbanyak di Dunia, Urutan Berapa?," 2022. https://databoks.katadata.co.id/datapublish/2022/03/23/pengguna-twitter-indonesia-masuk-daftar-terbanyak-di-dunia-urutan-berapa (accessed Jan. 07, 2023).
[3] G. M. Framanta, "Pengaruh Lingkungan Keluarga Terhadap Kepribadian Anak," J. Pendidik. dan Konseling, vol. 2, no. 1, pp. 126–129, 2020, doi: 10.31004/jpdk.v1i2.654.
[4] M. P. R. Putra and K. R. N. Wardani, "Penerapan Text Mining Dalam Menganalisis Kepribadian Pengguna Media Sosial," JUTIM (Jurnal Tek. Inform. Musirawas), vol. 5, no. 1, pp. 63–71, 2020, doi: 10.32767/jutim.v5i1.791.
[5] S. S. Keh and I.-T. Cheng, "Myers-Briggs Personality Classification and Personality-Specific Language Generation Using Pre-trained Language Models," 2019. [Online]. Available: http://arxiv.org/abs/1907.06333
[6] A. Kazameini, S. Fatehi, Y. Mehta, S. Eetemadi, and E. Cambria, "Personality Trait Detection Using Bagged SVM over BERT Word Embedding Ensembles," pp. 1–4, 2020. [Online]. Available: http://arxiv.org/abs/2010.01309
[7] A. Souri, S. Hosseinpour, and A. M. Rahmani, "Personality classification based on profiles of social networks' users and the five-factor model of personality," Human-centric Comput. Inf. Sci., vol. 8, no. 1, 2018, doi: 10.1186/s13673-018-0147-4.
[8] G. D. Salsabila and E. B. Setiawan, "Semantic Approach for Big Five Personality Prediction on Twitter," J. RESTI (Rekayasa Sist. dan Teknol. Informasi), vol. 5, no. 4, pp. 680–687, 2021, doi: 10.29207/resti.v5i4.3197.
[9] K. El-Demerdash, R. A. El-Khoribi, M. A. Ismail Shoman, and S. Abdou, "Deep learning based fusion strategies for personality prediction," Egypt. Informatics J., vol. 23, no. 1, pp. 47–53, Mar. 2022, doi: 10.1016/j.eij.2021.05.004.
[10] R. Devika, S. Vairavasundaram, C. S. J. Mahenthar, V. Varadarajan, and K. Kotecha, "A Deep Learning Model Based on BERT and Sentence Transformer for Semantic Keyphrase Extraction on Big Social Data," IEEE Access, vol. 9, pp. 165252–165261, 2021, doi: 10.1109/ACCESS.2021.3133651.
[11] A. T. B. W and D. H. Fudholi, "Klasifikasi Emosi pada Teks dengan Menggunakan Metode Deep Learning," J. Ilm. Indones., vol. 6, no. 1, 2021.
[12] R. M. R. W. P. K. Atmaja and W. Yustanti, "Analisis Sentimen Customer Review Aplikasi Ruang Guru dengan Metode BERT (Bidirectional Encoder Representations from Transformers)," Jeisbi, vol. 02, no. 03, pp. 55–62, 2021.
[13] J. H. Tandijaya, Liliana, and I. Sugiarto, "Klasifikasi dalam Pembuatan Portal Berita Online dengan Menggunakan Metode BERT," J. Infra, vol. 9, no. 2, pp. 320–325, 2021.
[14] C. A. Putri, "Analisis Sentimen Review Film Berbahasa Inggris Dengan Pendekatan Bidirectional Encoder Representations from Transformers," JATISI (Jurnal Tek. Inform. dan Sist. Informasi), vol. 6, no. 2, pp. 181–193, 2020, doi: 10.35957/jatisi.v6i2.206.
[15] J. Devlin, M. W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," NAACL HLT 2019 - Proc. Conf., vol. 1, pp. 4171–4186, 2019.
[16] M. Fansyuri, "Analisa Algoritma Klasifikasi K-Nearest Neighbor dalam Menentukan Nilai Akurasi terhadap Kepuasan Pelanggan (Studi Kasus PT. Trigatra Komunikatama)," Humanika J. Ilmu Sos. Pendidikan, dan Hum., vol. 3, no. 1, pp. 29–33, 2020.
[17] I. Düntsch and G. Gediga, "Confusion Matrices and Rough Set Data Analysis," J. Phys. Conf. Ser., vol. 1229, no. 1, 2019, doi: 10.1088/1742-6596/1229/1/012055.
[18] A. M. Argina, "Penerapan Metode Klasifikasi K-Nearest Neighbor pada Dataset Penderita Penyakit Diabetes," Indones. J. Data Sci., vol. 1, no. 2, pp. 29–33, 2020, doi: 10.33096/ijodas.v1i2.11.
[19] M. R. A. Nasution and M. Hayaty, "Perbandingan Akurasi dan Waktu Proses Algoritma K-NN dan SVM dalam Analisis Sentimen Twitter," J. Inform., vol. 6, no. 2, pp. 226–235, 2019, doi: 10.31311/ji.v6i2.5129.
[20] M. Azhari, Z. Situmorang, and R. Rosnelly, "Perbandingan Akurasi, Recall, dan Presisi Klasifikasi pada Algoritma C4.5, Random Forest, SVM dan Naive Bayes," J. Media Inform. Budidarma, vol. 5, no. 2, p. 640, 2021, doi: 10.30865/mib.v5i2.2937.
[21] D. Chicco and G. Jurman, "The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation," BMC Genomics, vol. 21, no. 1, pp. 1–13, 2020, doi: 10.1186/s12864-019-6413-7.
[22] A. Putri, B. S. Negara, and S. Sanjaya, "Penerapan Deep Learning Menggunakan VGG-16 untuk Klasifikasi Citra Glioma," J. Sist. Komput. dan Inform., vol. 3, no. 4, p. 379, 2022, doi: 10.30865/json.v3i4.4122.
