Rianda Khusuma, Copyright © 2023, MIB, Page 542

Personality Detection On Twitter User With RoBERTa

Rianda Khusuma*, Warih Maharani, Prati Hutari Gani
School of Computing, Informatics, Telkom University, Bandung, Indonesia
Email: riandakhusuma@student.telkomuniversity.ac.id, wmaharani@telkomuniversity.ac.id, pratihutarigani@telkomuniversity.ac.id
Corresponding Author Email: riandakhusuma@student.telkomuniversity.ac.id

Abstract−Social media provides services through which users can post status updates about themselves. Twitter is one such platform: it allows users to express themselves easily by uploading tweets to their accounts. These activities on social media can indirectly describe the personality of the account owner. One widely used personality classification scheme is the Big Five, which groups individual character into five personality types: openness, conscientiousness, extraversion, agreeableness, and neuroticism. In the work environment, personality significantly affects which work suits a person. Manual personality tests, such as interviews, take longer and cost more, so machine learning is needed to detect personality from social media. Using the RoBERTa model for personality classification, supported by a dataset of Twitter tweets, a personality detection system can be built. By determining the optimal ratio of training data to test data and performing hyperparameter tuning, the RoBERTa model reaches a classification accuracy of 57.14%.

Keywords: Twitter; Personality Classification; Big Five Personality; RoBERTa; Hyperparameter

1. INTRODUCTION

In the modern era, one of the primary research topics in companies is the relationship between individual personality and the work environment [1]. Personality can be described as a distinctive and moderately stable style of thinking, manner, and emotional response that distinguishes a person's adaptation to circumstances [2]. A personality test is needed to determine an individual's personality, and company personality tests are generally still manual, such as interviews. A person's personality can also be assessed from the social media they use. Social media is a platform that Indonesians use widely, generally for many purposes, including sharing the latest news about oneself. Indonesia has 191.4 million social media users, and Twitter is one of Indonesia's most widely used platforms, with 18.45 million users in early 2022 [3]. Twitter is an online communication and social networking service on which users publish and interact with messages known as "tweets"; users can interact with each other and spread information further through "retweeting" [4]. Detecting a person's personality on social media can be done with the Big Five Personality model, which classifies personality into five types: Openness (O), Conscientiousness (C), Extraversion (E), Agreeableness (A), and Neuroticism (N) [5]. Building a personality detection tool also requires machine learning [6]; one widely used model is RoBERTa.

Personality identification using machine learning has been widely researched. Previous research [7] showed that a personality detection system can be built with machine learning: MBTI personality classification was carried out on a Twitter dataset with several models, and the SVM model outperformed the other classifiers. The study [8] that derived the RoBERTa model from the BERT architecture obtained state-of-the-art results on GLUE, RACE, and SQuAD, without multi-task finetuning for GLUE or additional data for SQuAD. RoBERTa achieves state-of-the-art results on all 9 GLUE task development sets; on SQuAD it reaches an accuracy of up to 94.6%, better than BERT and XLNet; and on RACE it outperforms BERT and XLNet with an accuracy of 83.2%. Research [9] used a pre-trained RoBERTa model to detect sarcastic tweets in English and achieved a new state of the art on the iSarcasm dataset, with a model F1-score of 0.526. Research [10] used the dialogue-based personality dataset FriendsPersona and showed that RoBERTa obtains the highest accuracy on 4 of the 5 Big Five traits compared with the ABCNN, ABLSTM, HAN, and BERT models: 59.72% on AGR, 60.62% on EXT, 65.86% on OPN, and 61.07% on NEU, while the highest CON accuracy, 60.13%, belongs to ABCNN. Research [11], using datasets from Facebook and Twitter, concluded that averaging the BERT, RoBERTa, and XLNet models together with NLP features can improve prediction accuracy, obtaining an accuracy of 77.34% and an F1-score of 0.749.

Based on the research results above, machine learning can form a model that classifies the personalities of Twitter users, and the A Robustly Optimized BERT Pretraining Approach (RoBERTa) model has excellent ability on NLP tasks. However, there has yet to be research using the RoBERTa model as a classifier for the personalities of Indonesian Twitter users, so in this study, personality detection of Indonesian Twitter users is carried out with RoBERTa. It is hoped that this research will lead to the use of personality detection tools by companies when recruiting new employees through prospective employees' social media.

2. RESEARCH METHODOLOGY

2.1 Research Stages

Figure 1 illustrates the system design, i.e., the flow of the system's construction. Development starts with crawling data to form the dataset, then preprocessing the dataset, then splitting the data into training and testing sets, and then fine-tuning an existing pre-trained RoBERTa model so it can perform the downstream classification task. After the model is formed and can classify, its performance is evaluated using performance metrics.

Figure 1. Research Stages Flowchart

2.2 Crawling Data

Only tweets from Twitter accounts were included in the dataset; profile and bio information were not. The tweets are from Indonesian Twitter users and are written in Indonesian. Respondents fill out personality-related survey questions to determine their personalities, which are then used as labels. After the survey is completed with the respondents' consent, their Twitter accounts are crawled. The results are saved to a CSV file containing usernames, tweets, and personality labels.

2.3 Data Preprocessing

At the data preprocessing stage, the data is made more structured and easier to process for the research. The stages are: case folding, in which all capital letters are lowercased; punctuation removal, which removes punctuation and symbols; and finally slang word handling, which replaces abbreviations, slang words, and typos with standard words. Table 1 below shows an example of data preprocessing.
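The three preprocessing steps can be sketched in Python as follows; the slang dictionary here is a small illustrative sample, not the lexicon actually used in the study.

```python
import re

# Hypothetical slang dictionary: a few entries for illustration only.
SLANG = {"lagi2": "lagi lagi", "ngebantu": "membantu", "nenangin": "menenangkan",
         "kalo": "kalau", "ttp": "tetap", "gak": "tidak"}

def preprocess(tweet: str) -> str:
    # Case folding: lowercase all characters.
    text = tweet.lower()
    # Remove punctuation: keep only letters, digits, and whitespace.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Slang word handling: replace abbreviations/slang with standard words.
    tokens = [SLANG.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)
```

Applied to a raw tweet fragment, `preprocess("Kalo TTP panik, gak apa2!")` yields "kalau tetap panik tidak apa2", mirroring the transformation shown in Table 1.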

Table 1. Data Preprocessing Examples

Raw Data:
achadianrani,2020-12-26 11:21:06,Lagi2 suster dan dokter igd disana ngebantu nenangin gue dan laki gue kalo ini termasuk gejala ringan saja. Walaupun ttp ada indikasi covid sih. Setidaknya gue gak terlalu panik

Case Folding:
achadianrani,2020-12-26 11:21:06,lagi2 suster dan dokter igd disana ngebantu nenangin gue dan laki gue kalo ini termasuk gejala ringan saja. walaupun ttp ada indikasi covid sih. setidaknya gue gak terlalu panik

Remove Punctuation:
achadianrani lagi2 suster dan dokter igd disana ngebantu nenangin gue dan laki gue kalo ini termasuk gejala ringan saja walaupun ttp ada indikasi covid sih setidaknya gue gak terlalu panik

Slang Word Handling:
achadianrani lagi lagi suster dan dokter igd disana membantu menenangkan gue dan laki gue kalau ini termasuk gejala ringan saja walaupun tetap ada indikasi covid sih setidaknya gue tidak terlalu panik


2.4 RoBERTa Modelling

A Robustly Optimized BERT Pretraining Approach (RoBERTa) is a modification of the BERT model; previous research showed, from evaluation of the hyperparameters and dataset size, that BERT was significantly undertrained. The modifications made to improve BERT's performance are as follows.

a. Training the model with more data. The BERT pre-trained model is trained on only 13GB of data, while RoBERTa is trained on up to 160GB, which is proven to improve accuracy.

b. Removing the next sentence prediction (NSP) objective. With the NSP objective, the model is trained to determine whether two sentences are related. In RoBERTa, the NSP objective is removed, because removing it can increase downstream task performance.

c. Training with larger batches. BERT is pretrained with batches of 256 sequences for 1M steps, whereas RoBERTa is trained with batches of up to 8K sequences for 500K steps.

d. Dynamically changing the masking pattern on the training data. In the BERT architecture, masking is performed once during preprocessing, producing a single static mask. For RoBERTa, the training data is duplicated ten times so that each sequence is masked in ten different ways over 40 training epochs [8].

With these modifications, RoBERTa is formed as an improvement of the BERT model and is capable of performing various NLP tasks [8], [12], [13]. The RoBERTa model is trained using the BERT-large architecture [8].
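The difference between BERT's static masking and RoBERTa's dynamic masking (point d above) can be illustrated with a small Python sketch; this is a toy 15% masking routine for intuition, not the actual pretraining code.

```python
import random

MASK = "<mask>"

def mask_tokens(tokens, ratio=0.15, seed=None):
    # Randomly replace about `ratio` of the tokens (at least one) with <mask>.
    rng = random.Random(seed)
    out = list(tokens)
    n = max(1, int(len(tokens) * ratio))
    for i in rng.sample(range(len(tokens)), n):
        out[i] = MASK
    return out

tokens = "saya suka belajar pemrosesan bahasa alami".split()

# Static masking (BERT): one fixed pattern produced at preprocessing time,
# reused in every epoch.
static = mask_tokens(tokens, seed=0)

# Dynamic masking (RoBERTa): a fresh pattern for each duplicate/epoch, so the
# model sees the same sequence masked in different positions.
dynamic = [mask_tokens(tokens, seed=epoch) for epoch in range(3)]
```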

Figure 2. RoBERTa Architecture

Figure 2 shows the architecture of the RoBERTa model, which works as follows. The model accepts sentences that are transformed, or encoded, into tokens so that they can serve as valid model input. The valid inputs of the RoBERTa model include input_ids, the numeric representations of each token [14]. The [CLS] token is added at the beginning of each token sequence to mark it for classification, and the [SEP] token is added at the end [14]. The next model input, attention_mask, is a binary indicator of whether each position is padding [14]. Padding is added to any token sequence shorter than the longest sequence, up to the maximum number of tokens the RoBERTa model can accommodate, which is 512. Next is token_type_ids, a binary indicator of whether two sentences form a sentence pair; token_type_ids are usually required in question-answering tasks that accept sentence-pair input [14], [15]. The input is then processed by the 12-layer RoBERTa encoder, which converts it into embedding vectors consisting of token embeddings, segment embeddings, and position embeddings [15], [16]. Finally, the last_hidden_state output layer is formed, where all word embedding vectors are stored. These word vectors are trained to understand language, and the model is then fine-tuned so that it can perform NLP tasks [15], [16].

2.4.1 Preparing RoBERTa Input

After data splitting, which divides the dataset into training and testing data, the tweet data in both splits is ready for encoding. To prepare input for the RoBERTa model, the pre-trained RoBERTa model from the Flax Community is used. This pre-trained model was trained on the OSCAR dataset, specifically the unshuffled_deduplicated_id subset corpus, and achieves an evaluation accuracy of 62.45% [17]. In the encoding process, three inputs are formed that the model can process: input_ids, attention_mask, and token_type_ids. At this stage, tweets are tokenized with the tokenizer included in the pre-trained RoBERTa model. Tokenization follows the vocabulary of the pre-trained model's corpus: if the model knows a word, the word remains a single token; if not, the word is split into sub-words, down to individual characters known to the corpus of the pre-trained model.
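This known-word / sub-word fallback behavior can be sketched with a toy greedy longest-match tokenizer. The vocabulary below is a tiny illustrative sample (the real tokenizer uses a learned BPE vocabulary with byte-level fallback); "Ġ" marks a token that begins a new word, as in the examples that follow.

```python
# Toy vocabulary for illustration; not the Flax-community RoBERTa vocabulary.
VOCAB = {"semoga", "Ġlekas", "Ġsembuh", "Ġp", "sb", "Ġ", "s", "b", "p"}

def tokenize_word(word, vocab):
    """Greedily split one (possibly Ġ-prefixed) word into known pieces."""
    pieces, rest = [], word
    while rest:
        for end in range(len(rest), 0, -1):
            if rest[:end] in vocab:          # longest known prefix wins
                pieces.append(rest[:end])
                rest = rest[end:]
                break
        else:
            # Unknown character: emit it as-is (a real tokenizer falls back to bytes).
            pieces.append(rest[0])
            rest = rest[1:]
    return pieces

def tokenize(sentence, vocab=VOCAB):
    words = sentence.split()
    # The first word has no leading space; later words are marked with "Ġ".
    marked = words[:1] + ["Ġ" + w for w in words[1:]]
    out = []
    for w in marked:
        out.extend(tokenize_word(w, vocab))
    return out
```

With this toy vocabulary, "psb" is unknown as a whole word, so `tokenize("semoga lekas sembuh psb")` splits it into the pieces 'Ġp' and 'sb', the same pattern visible in Table 2.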

Table 2. RoBERTa Tokenizer Examples

Raw Data:
semoga lekas sembuh din bandung psb lagi sikecil bingung ya bunda kamu mimpi ngews awan jadi yang ya map atuh tahun end party yuk jinx yuk ki ayo ah langsung yuk kobe co racun tros bandung ajar abang al sekarang pokus dunia model euy menarik tunggu kangen juga iya

RoBERTa Tokenizer:
['semoga', 'Ġlekas', 'Ġsembuh', 'Ġdin', 'Ġbandung', 'Ġp', 'sb', 'Ġlagi', 'Ġ', 'Ġsik', 'ecil', 'Ġbingung', 'Ġya', 'Ġbunda', 'Ġkamu', 'Ġmimpi', 'Ġng', 'ews', 'Ġawan', 'Ġjadi', 'Ġyang', 'Ġya', 'Ġmap', 'Ġat', 'uh', 'Ġtahun', 'Ġend', 'Ġparty', 'Ġyuk', 'Ġjin', 'x', 'Ġyuk', 'Ġki', 'Ġayo', 'Ġah', 'Ġlangsung', 'Ġyuk', 'Ġk', 'obe', 'Ġco', 'Ġracun', 'Ġt', 'ros', 'Ġbandung', 'Ġajar', 'Ġabang', 'Ġal', 'Ġsekarang', 'Ġp', 'okus', 'Ġdunia', 'Ġmodel', 'Ġeuy', 'Ġmenarik', 'Ġtunggu', 'Ġkangen', 'Ġjuga', 'Ġiya', 'Ġ']

After the tokenization process shown in Table 2, each word token is converted into input_ids form. Once all tokens have been converted, a [CLS] token is added at the beginning and a [SEP] token at the end of each token sequence. Because the sequences in the data have unequal lengths, an attention_mask must also be added. Besides input_ids and attention_mask, there are token_type_ids; for the classification requirement, all token_type_ids are set to 0, meaning each input is categorized as a single sentence.
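A minimal sketch of how the three inputs could be assembled, assuming the special token ids visible in Table 3 ([CLS]/&lt;s&gt; = 0, [SEP]/&lt;/s&gt; = 2, &lt;pad&gt; = 1); the `encode` helper here is illustrative, not the Hugging Face tokenizer API.

```python
# Special ids assumed from the encoded-input table: <s>=0, </s>=2, <pad>=1.
CLS_ID, SEP_ID, PAD_ID = 0, 2, 1

def encode(batch_token_ids, max_len):
    input_ids, attention_mask, token_type_ids = [], [], []
    for ids in batch_token_ids:
        seq = [CLS_ID] + ids[:max_len - 2] + [SEP_ID]      # wrap with [CLS]/[SEP]
        pad = max_len - len(seq)
        input_ids.append(seq + [PAD_ID] * pad)             # right-pad to max_len
        attention_mask.append([1] * len(seq) + [0] * pad)  # 0 marks padding
        token_type_ids.append([0] * max_len)               # single sentence: all 0
    return {"input_ids": input_ids,
            "attention_mask": attention_mask,
            "token_type_ids": token_type_ids}

# Two toy sequences of unequal length, padded to a common length of 6.
encoded = encode([[87], [73, 99]], max_len=6)
```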

Table 3. RoBERTa Inputs

RoBERTa Tokenizer Results:
['semoga', 'Ġlekas', 'Ġsembuh', 'Ġdin', 'Ġbandung', 'Ġp', 'sb', 'Ġlagi', 'Ġ', 'Ġsik', 'ecil', 'Ġbingung', 'Ġya', 'Ġbunda', 'Ġkamu', 'Ġmimpi', 'Ġng', 'ews', 'Ġawan', 'Ġjadi', 'Ġyang', 'Ġya', 'Ġmap', 'Ġat', 'uh', 'Ġtahun', 'Ġend', 'Ġparty', 'Ġyuk', 'Ġjin', 'x', 'Ġyuk', 'Ġki', 'Ġayo', 'Ġah', 'Ġlangsung', 'Ġyuk', 'Ġk', 'obe', 'Ġco', 'Ġracun', 'Ġt', 'ros', 'Ġbandung', 'Ġajar', 'Ġabang', 'Ġal', 'Ġsekarang', 'Ġp', 'okus', 'Ġdunia', 'Ġmodel', 'Ġeuy', 'Ġmenarik', 'Ġtunggu', 'Ġkangen', 'Ġjuga', 'Ġiya', 'Ġ']

RoBERTa Encoded Input:
{'input_ids': array([[  0,  87,   2, ...,   1,   1,   1],
                     [  0,  73,   2, ...,   1,   1,   1],
                     [  0,  81,   2, ...,   1,   1,   1],
                     ...,
                     [  0,  93,   2, ...,   1,   1,   1],
                     [  0,  69,   2, ...,   1,   1,   1],
                     [  0, 225,   2, ...,   1,   1,   1]], dtype=int32),
 'attention_mask': array([[1, 1, 1, ..., 0, 0, 0],
                          [1, 1, 1, ..., 0, 0, 0],
                          [1, 1, 1, ..., 0, 0, 0],
                          ...,
                          [1, 1, 1, ..., 0, 0, 0],
                          [1, 1, 1, ..., 0, 0, 0],
                          [1, 1, 1, ..., 0, 0, 0]], dtype=int32),
 'token_type_ids': array([[0, 0, 0, ..., 0, 0, 0],
                          [0, 0, 0, ..., 0, 0, 0],
                          [0, 0, 0, ..., 0, 0, 0],
                          ...,
                          [0, 0, 0, ..., 0, 0, 0],
                          [0, 0, 0, ..., 0, 0, 0],
                          [0, 0, 0, ..., 0, 0, 0]], dtype=int32)}

Table 3 above shows an example of valid input for the RoBERTa model: input_ids, attention_mask, and token_type_ids wrapped in a dictionary and ready to be fed into the RoBERTa model for processing.

2.4.2 Building the RoBERTa Model

Once the input for the RoBERTa model has been prepared, the model can be built. Building starts by calling the pre-trained RoBERTa model and fine-tuning it to perform the classification task. First, the input enters the model base layer, which contains 12 encoder layers; its output layer is last_hidden_state, where the embeddings of all tokens are stored. Several Keras layers are then added on top of the output layer: a Flatten layer to reduce the dimensions of the previous layer, a Dropout layer to avoid overfitting the neural network, and a Dense layer that receives the input from the previous neuron layer and acts as the classifier. Across the input layer, base model layer, and output layers, there are 225,310,469 parameters in total, all trainable because no layer is frozen.

2.4.3 Classification with RoBERTa

After the model is built, it is fine-tuned so it can be used as a classifier. Personality is classified into the five Big Five categories by training the model and validating the resulting model. At this stage, experiments on the parameters were also carried out to produce the best performance.

2.5 Model Evaluation

The confusion matrix is a benchmark that summarizes an algorithm's performance on a classification task in a table; it can be used to visualize and interpret classifier performance [18]. The confusion matrix consists of four essential quantities used to derive the classifier measurement metrics: true positive (TP), false positive (FP), true negative (TN), and false negative (FN). The performance metrics of an algorithm are accuracy, precision, recall, and F1-score, which are used to compare the system's efficiency [19]. Each performance metric is explained below, with its formula shown in equations (1), (2), (3), and (4).

a. Accuracy

Accuracy is the proportion of all predictions that the classifier got correct. The following equation calculates accuracy [19], [20].

Accuracy = (TP + TN) / (TP + TN + FP + FN) (1)

b. Recall

Recall is the proportion of positive inputs that are correctly predicted. The following equation calculates recall [19], [20].

Recall = TP / (TP + FN) (2)

c. Precision

Precision is the proportion of predicted-positive cases that the classifier predicts correctly. The following equation calculates precision [19], [20].

Precision = TP / (TP + FP) (3)

d. F1-Score

F1-score is a measure of test accuracy, defined as the harmonic mean of precision and recall. F1-score has a maximum value of 1 and a worst value of 0. The following equation calculates the F1-score [19], [20].

F1-Score = (2 × Precision × Recall) / (Precision + Recall) (4)
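All four metrics can be computed directly from a confusion matrix. As a sketch, the code below applies equations (1)-(4) to the first-scenario confusion matrix reported later in Table 6, reproducing the 53.57% accuracy and 0.47 weighted F1-score.

```python
def metrics(cm):
    """Accuracy and per-class (recall, precision, f1) from a confusion matrix
    whose rows are actual classes and whose columns are predicted classes."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    per_class = []
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                       # actual i, predicted elsewhere
        fp = sum(cm[r][i] for r in range(n)) - tp  # predicted i, actually elsewhere
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class.append((recall, precision, f1))
    return accuracy, per_class

# First-scenario confusion matrix (rows/cols: N, A, O, E, C), from Table 6.
cm = [[1, 0, 2, 0, 0],
      [0, 3, 3, 0, 0],
      [1, 1, 11, 0, 0],
      [0, 2, 0, 0, 0],
      [0, 3, 1, 0, 0]]
accuracy, per_class = metrics(cm)

# Weighted F1: average per-class F1 weighted by class support.
support = [sum(row) for row in cm]
weighted_f1 = sum(s * pc[2] for s, pc in zip(support, per_class)) / sum(support)
```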

3. RESULT AND DISCUSSION

3.1 Results

In this research, the testing process used a dataset consisting of usernames, labels, and tweets, with 276 samples in total. The distribution of personality labels across these 276 samples can be seen in the histogram below.

Figure 3. Label Distribution Histogram


From the label distribution histogram in Figure 3, the Neuroticism label has 52 samples, Agreeableness 81, Openness 111, Extraversion 8, and Conscientiousness 24.

3.1.1 First Scenario Results

Based on the dataset distributed as in Figure 3, tests were carried out in several scenarios, starting with the training/testing data splits tested under the first-scenario parameters, listed below.

Table 4. First Scenario Parameters

Parameters     Parameter 1   Parameter 2
class_weight   'none'        'none'
learning_rate  1e-5          1e-5
batch_size     32            8

Table 4 shows the parameter values: class_weight = 'none' and learning_rate = 1e-5 for both settings, with batch_size = 32 for parameter 1 and batch_size = 8 for parameter 2. Parameter 1 and parameter 2 are used to build the first-scenario models, and each is then applied to test the various splits of training and testing data.
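The data splitting tested in this scenario can be sketched as a simple shuffled split; the `seed` value and helper name are illustrative, not the study's actual code. With 276 samples, a 90:10 ratio yields 248 training and 28 testing samples.

```python
import random

def split_data(data, train_ratio, seed=42):
    # Shuffle, then cut at the requested ratio (seed chosen for illustration).
    rng = random.Random(seed)
    items = list(data)
    rng.shuffle(items)
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]

# The 276 labelled users, split at the five ratios tested (50:50 ... 90:10).
dataset = list(range(276))
splits = {ratio: split_data(dataset, ratio) for ratio in (0.5, 0.6, 0.7, 0.8, 0.9)}
```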

Table 5. First Scenario Results

Training Data : Testing Data   Accuracy (Parameter 1)   Accuracy (Parameter 2)
50 : 50                        42.03%                   39.13%
60 : 40                        42.34%                   39.64%
70 : 30                        43.37%                   39.76%
80 : 20                        42.86%                   41.07%
90 : 10                        53.57%                   46.43%

From Table 5, the best accuracy from the training/testing split tests with parameters 1 and 2 is obtained at a 90:10 ratio with parameter 1, reaching 53.57%. The following test scenarios therefore use the 90:10 splitting ratio, since it obtained the highest accuracy in the first scenario. The confusion matrix of this scenario's model is shown next.

Table 6. First Scenario Confusion Matrix

Actual \ Predicted      Neuroticism (2)  Agreeableness (9)  Openness (17)  Extraversion (0)  Conscientiousness (0)
Neuroticism (3)         1                0                  2              0                 0
Agreeableness (6)       0                3                  3              0                 0
Openness (13)           1                1                  11             0                 0
Extraversion (2)        0                2                  0              0                 0
Conscientiousness (4)   0                3                  1              0                 0

Table 6 shows that the model cannot predict the Extraversion and Conscientiousness classes. This is caused by the imbalanced data, which lets the model predict correctly only in the majority classes: Neuroticism, Agreeableness, and Openness. Because the influence of the imbalanced data is apparent, precision, recall, and F1-score are also reported to show the model's prediction performance.

Table 7. First Scenario Precision, Recall, and F1-Score

Labels              Recall   Precision   F1-Score
Neuroticism         0.33     0.50        0.40
Agreeableness       0.50     0.33        0.40
Openness            0.85     0.65        0.73
Extraversion        0.00     0.00        0.00
Conscientiousness   0.00     0.00        0.00
Weighted F1-Score                        0.47

Table 7 shows that the minority classes, Extraversion and Conscientiousness, have precision, recall, and F1-score values of 0, meaning they remain unpredictable. From Table 7 it is clear that the imbalanced data affects the results of the classification model.

3.1.2 Second Scenario Results

Due to the imbalanced data condition, testing continued with the class_weight parameter changed from 'none' to 'balanced'. The class weight balancing yields the following weight for each class.

Table 8. Class Weight Table

Neuroticism   Agreeableness   Openness   Extraversion   Conscientiousness
1.0615        0.6815          0.4972     6.9            2.3

Table 8 shows that the minority classes receive higher weights than the majority classes: Extraversion with a weight of 6.9 and Conscientiousness with 2.3, versus 1.0615 for Neuroticism, 0.6815 for Agreeableness, and 0.4972 for Openness. With the class weight of each class set, testing is carried out again with the following parameters.
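The weights in Table 8 are consistent with the common 'balanced' heuristic, n_samples / (n_classes × n_samples_in_class); as a sketch, applying it to the label counts from Figure 3 reproduces the reported values.

```python
# Label counts from the Figure 3 histogram (276 samples, 5 classes).
counts = {"Openness": 111, "Agreeableness": 81, "Neuroticism": 52,
          "Conscientiousness": 24, "Extraversion": 8}

def balanced_class_weights(counts):
    # 'balanced' heuristic: n_samples / (n_classes * n_samples_in_class),
    # so rarer classes get proportionally larger weights.
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * n) for label, n in counts.items()}

weights = balanced_class_weights(counts)
```

For example, Extraversion gets 276 / (5 × 8) = 6.9, matching Table 8.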

Table 9. Second Scenario Parameters

Parameters     Parameter 1   Parameter 2
class_weight   'balanced'    'balanced'
learning_rate  1e-5          1e-5
batch_size     32            8

Table 9 lists the parameters used to test the second scenario, where class_weight = 'balanced' is applied. By applying these parameters, models are formed with the results shown in Table 10.

Table 10. Second Scenario Results

Accuracy (Parameter 1)   Accuracy (Parameter 2)
50%                      46.43%

Table 10 shows that the second scenario, with class_weight = 'balanced' applied, obtains its best accuracy of 50% with parameter 1, a decrease from the first scenario. The confusion matrix of the model resulting from parameter 1 follows.

Table 11. Second Scenario Confusion Matrix

Actual \ Predicted      Neuroticism (0)  Agreeableness (5)  Openness (23)  Extraversion (0)  Conscientiousness (0)
Neuroticism (3)         0                0                  3              0                 0
Agreeableness (6)       0                2                  4              0                 0
Openness (13)           0                1                  12             0                 0
Extraversion (2)        0                0                  2              0                 0
Conscientiousness (4)   0                2                  2              0                 0

After applying class_weight = 'balanced' in the second scenario, Table 11 shows that the model still cannot predict the Extraversion and Conscientiousness classes, and in this scenario it also fails to predict the Neuroticism class. The number of true positives in this model is 14. The precision, recall, and F1-score values of the resulting model are shown in the table below.


Table 12. Second Scenario Precision, Recall, and F1-Score

Labels              Recall   Precision   F1-Score
Neuroticism         0.00     0.00        0.00
Agreeableness       0.33     0.40        0.36
Openness            0.92     0.52        0.67
Extraversion        0.00     0.00        0.00
Conscientiousness   0.00     0.00        0.00
Weighted F1-Score                        0.39

Based on Table 12, the precision, recall, and F1-score values for the Neuroticism, Extraversion, and Conscientiousness classes are 0, meaning the second-scenario model was unable to predict those classes. The weighted F1-score of this model is 0.39.

3.1.3 Third Scenario Results

Because the model was still unable to predict the minority classes, the third scenario changes the learning_rate parameter. The parameters for the third scenario are shown in the following table.

Table 13. Third Scenario Parameters

Parameters     Parameter 1   Parameter 2
class_weight   'balanced'    'balanced'
learning_rate  1e-6          1e-6
batch_size     32            8

In Table 13, the learning_rate parameter is changed from 1e-5 to 1e-6. Using parameter 1 and parameter 2, models are formed with the following results.

Table 14. Third Scenario Results

Accuracy (Parameter 1)   Accuracy (Parameter 2)
50%                      46.43%

Table 14 shows that the third scenario, with learning_rate changed to 1e-6 and still using class_weight = 'balanced', obtains an accuracy of 50% with parameter 1, unchanged from the second scenario. The confusion matrix of the third-scenario model with parameter 1 is shown in the table below.

Table 15. Third Scenario Confusion Matrix

Actual \ Predicted      Neuroticism (7)  Agreeableness (6)  Openness (13)  Extraversion (0)  Conscientiousness (2)
Neuroticism (3)         2                1                  0              0                 0
Agreeableness (6)       1                2                  3              0                 0
Openness (13)           1                2                  9              0                 1
Extraversion (2)        1                1                  0              0                 0
Conscientiousness (4)   2                0                  1              0                 1

Table 15 shows that with learning_rate = 1e-6, the model can finally predict one of the minority classes, the Conscientiousness class. The number of true positives in this model is 14. For more detail, the precision, recall, and F1-score values of the third-scenario model are displayed below.

Table 16. Third Scenario Precision, Recall, and F1-Score

Labels              Recall   Precision   F1-Score
Neuroticism         0.67     0.29        0.40
Agreeableness       0.33     0.33        0.33
Openness            0.69     0.69        0.69
Extraversion        0.00     0.00        0.00
Conscientiousness   0.25     0.50        0.33
Weighted F1-Score                        0.48

Based on Table 16, the only class with precision, recall, and F1-score of 0 in the third scenario is the Extraversion class. The Conscientiousness and Neuroticism classes, which were unpredictable in the second scenario, are successfully predicted by the model here. Moreover, with the increase in the model's F1-score, the third-scenario model shows that the personality prediction is improving.

3.1.4 Fourth Scenario Results

Since the change to class_weight = 'balanced' in the second scenario resulted in a decrease in accuracy, the fourth scenario sets the class_weight parameter back to 'none' while keeping learning_rate = 1e-6, because learning_rate = 1e-6 in the third scenario was proven to increase accuracy and help predict the minority classes. The parameters used in the fourth scenario are as follows.

Table 17. Fourth Scenario Parameters

Parameters     Parameter 1   Parameter 2
class_weight   'none'        'none'
learning_rate  1e-6          1e-6
batch_size     32            8

In Table 17, the class_weight parameter is again set to 'none' for the fourth scenario, and the learning_rate parameter is kept at 1e-6. Using parameter settings 1 and 2, models are formed with the following accuracy.

Table 18. Fourth Scenario Results

Accuracy (Parameter 1)   Accuracy (Parameter 2)
57.14%                   50%

Table 18 shows that the fourth scenario produces a model with an accuracy of 57.14% with parameter 1, the highest accuracy of all the test scenarios carried out. The confusion matrix of the fourth-scenario model with parameter 1 follows.

Table 19. Fourth Scenario Confusion Matrix

Actual \ Predicted      Neuroticism (5)  Agreeableness (8)  Openness (14)  Extraversion (0)  Conscientiousness (1)
Neuroticism (3)         2                1                  0              0                 0
Agreeableness (6)       1                2                  3              0                 0
Openness (13)           0                2                  11             0                 0
Extraversion (2)        1                1                  0              0                 0
Conscientiousness (4)   1                2                  0              0                 1

Table 19, the confusion matrix of the fourth-scenario model, shows that the model can predict the Conscientiousness minority class but still cannot predict the Extraversion class. This model has the highest number of true positives, 16. The precision, recall, and F1-score values for the fourth scenario are shown below.

Table 20. Fourth Scenario Precision, Recall, and F1-Score

Labels              Recall   Precision   F1-Score
Neuroticism         0.67     0.40        0.50
Agreeableness       0.33     0.25        0.29
Openness            0.85     0.79        0.81
Extraversion        0.00     0.00        0.00
Conscientiousness   0.25     1.00        0.40
Weighted F1-Score                        0.55

In Table 20, as in the third scenario, the fourth-scenario model still has precision, recall, and F1-score of 0 for the Extraversion personality. However, because this model has the highest F1-score, 0.55, and the highest accuracy, 57.14%, the fourth scenario produces the best model of all the scenarios for predicting personality.

3.2 Discussion

3.2.1 First Scenario Results Discussion

Several models are formed from the results of several test scenarios that produce different performances. In the first scenario, testing is carried out by setting class_weight = none, learning_rate = 1e-5, batch_size = 32 and batch_size = 8. Testing is carried out with various data-splitting scenarios. Based on the results of scenario 1 testing, all model results that use the parameter batch_size = 8 have lower accuracy than batch_size = 32.

Batch_size is the number of data samples from the dataset entered into the neural network for training. A smaller batch_size will make the algorithm converge faster but generate noise in more extensive computations, whereas using a larger batch_size can help reduce noise, but the algorithm will converge longer. In the first scenario, parameter 1 can produce a model with the best accuracy of 53.57%. The model with the best accuracy in the first scenario is achieved in the 90:10 data splitting condition. The confusion matrix of the model results in table 6 shows that no class predicts Extraversion and Conscientiousness classes. The classification results from the first model can classify Neuroticism personality, which successfully predicts 1 out of 3 data with a recall value of 0.33.

The Agreeableness personality is predicted correctly for 3 of 6 samples (recall 0.50), and the Openness personality for 11 of 13 samples (recall 0.85). The f1-score of the first scenario model is 0.47, and the model can classify 3 of the 5 personalities. The model's inability to predict the minority classes is due to the uneven data distribution: there are only 8 samples for the Extraversion class and 24 for the Conscientiousness class, compared with 111 for Openness, 81 for Agreeableness, and 52 for Neuroticism. Because of this imbalance, the model predicts correctly only in the majority classes; the minority classes provide too little data in the training phase, which limits the model's ability to classify those labels.
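The batch_size trade-off above can be made concrete in terms of gradient steps per epoch. The sketch assumes roughly 248 training samples, i.e. a 90:10 split of the ~276 tweets implied by the class counts; the exact training-set size is not stated in this section, so these numbers are illustrative.

```python
import math

n_train = 248  # assumed training-set size (90% of the ~276-sample dataset)

# batch_size = 32: fewer, smoother gradient updates per epoch
steps_large = math.ceil(n_train / 32)
# batch_size = 8: roughly four times as many, noisier updates per epoch
steps_small = math.ceil(n_train / 8)
print(steps_large, steps_small)
```

Under this assumption, batch_size = 32 yields 8 gradient steps per epoch against 31 for batch_size = 8, illustrating the faster-but-noisier behavior described above.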

3.2.2 Second Scenario Results Discussion

To address the problem of the first scenario, the second scenario changes class_weight = none to class_weight = balanced. This setting assigns the Extraversion class a weight of 6.9 and the Conscientiousness class a weight of 2.3, while the Neuroticism, Agreeableness, and Openness classes receive weights of 1.0615, 0.6815, and 0.4972, respectively. These weights affect how classes are treated in the model training phase. With class_weight = none, as in the first scenario, all classes are given equal weight and treated the same during training; with class_weight = balanced, each class is automatically weighted in inverse proportion to its sample count, so the more data a class has, the smaller its weight, and the less data, the larger its weight. With class_weight = balanced applied, the model produced by the second scenario achieves an accuracy of 50% using parameter set 1 in table 9. As seen in table 12, which lists the precision, recall, and f1-score of the second scenario, Neuroticism, Extraversion, and Conscientiousness all have a value of 0, meaning the model fails to predict these personality types; the details appear in the second scenario confusion matrix in table 11.
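The balanced weights quoted above match scikit-learn's heuristic, weight = n_samples / (n_classes * class_count). A quick check against the class counts reported earlier (treated here as the full training distribution, which is an assumption):

```python
from collections import Counter

# class counts reported in the discussion (assumed training distribution)
counts = Counter({"Openness": 111, "Agreeableness": 81, "Neuroticism": 52,
                  "Conscientiousness": 24, "Extraversion": 8})
n_samples = sum(counts.values())  # 276
n_classes = len(counts)           # 5

# "balanced" heuristic: rarer classes receive proportionally larger weights
weights = {c: n_samples / (n_classes * n) for c, n in counts.items()}
for c, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{c:<17} {w:.4f}")
```

The computed weights reproduce the values in the text: 6.9 for Extraversion, 2.3 for Conscientiousness, 1.0615 for Neuroticism, 0.6815 for Agreeableness, and ~0.4973 for Openness.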

The second model can classify the Agreeableness personality, correctly predicting 2 of 6 samples (recall 0.33), and the Openness personality, correctly predicting 12 of 13 samples (recall 0.92). The confusion matrix of the second scenario shows an increase in the Openness recall compared with the first scenario. However, the f1-score of this model is 0.39, and the second scenario model can predict only 2 of the 5 personalities. Overall, the second scenario performs worse at classification, and its accuracy decreases.

3.2.3 Third Scenario Results Discussion

Due to the performance decrease in the second scenario, the third scenario changes learning_rate = 1e-5 to 1e-6, while keeping class_weight = balanced and batch_size = 32 and 8. The learning_rate parameter controls how quickly the model adapts its weights during the training phase. A low learning_rate allows the model to take a series of small, stable optimization steps toward the global optimum.
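The effect of the learning rate on weight updates can be illustrated with a toy gradient descent; this is an illustration only, not the paper's fine-tuning procedure.

```python
def descend(lr, steps=50, w=1.0):
    """Minimize f(w) = w**2 with plain gradient descent; the gradient is 2*w."""
    for _ in range(steps):
        w -= lr * 2 * w
    return w

# A small learning rate steps steadily toward the optimum at w = 0,
# while an overly large one overshoots on every step and diverges.
print(abs(descend(0.01)))  # small: converging toward 0
print(abs(descend(1.1)))   # large: growing without bound
```

The same intuition applies when fine-tuning RoBERTa: 1e-6 takes smaller, safer steps than 1e-5 at the cost of slower training.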

In the third scenario, a model with an accuracy of 50% is obtained by applying parameter set 1, as shown in table 13. Table 16 shows that precision, recall, and f1-score are 0 only for the Extraversion class; the classification details of the third scenario model can be seen in table 15. The third model can classify the Neuroticism personality, correctly predicting 2 of 3 samples (recall 0.67), the Agreeableness personality, correctly predicting 2 of 6 samples (recall 0.33), the Openness personality, correctly predicting 9 of 13 samples (recall 0.69), and the Conscientiousness personality, correctly predicting 1 of 4 samples (recall 0.25). The f1-score of this model is 0.48, and the model can predict 4 of the 5 personalities, an improvement in classification performance.

The results of the third scenario show that selecting an optimal learning_rate greatly affects model performance: after lowering the learning_rate, the model can predict the Conscientiousness personality, which the models in the first and second scenarios could not. This makes the third scenario model better than the first two, since it can already predict four personalities.

3.2.4 Fourth Scenario Results Discussion

Since, in the second scenario, setting class_weight = balanced caused a decrease in model performance, the fourth scenario uses class_weight = none, learning_rate = 1e-6, and batch_size = 32 and 8. Using parameter set 1 in table 17, the fourth scenario model obtains an accuracy of 57.14%, with precision, recall, and f1-score of 0 only for the Extraversion class, as shown in table 20; the classification details appear in table 19. The fourth model can classify the Neuroticism personality, correctly predicting 2 of 3 samples (recall 0.67), the Agreeableness personality, correctly predicting 2 of 6 samples (recall 0.33), the Openness personality, correctly predicting 11 of 13 samples (recall 0.85), and the Conscientiousness personality, correctly predicting 1 of 4 samples (recall 0.25). The f1-score rises to 0.55, the best of all the models, the model can predict 4 of the 5 personalities, and its accuracy is the highest of all the scenarios. The fourth scenario model is therefore the best model for personality classification.
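The four scenarios discussed above can be summarized as a small hyperparameter grid with their reported best accuracies (batch_size = 32 throughout, since it consistently beat 8); selecting the best configuration programmatically recovers the fourth scenario.

```python
# reported best accuracy (%) per scenario, each achieved with batch_size = 32
scenarios = [
    {"name": "scenario 1", "class_weight": None,       "learning_rate": 1e-5, "accuracy": 53.57},
    {"name": "scenario 2", "class_weight": "balanced", "learning_rate": 1e-5, "accuracy": 50.00},
    {"name": "scenario 3", "class_weight": "balanced", "learning_rate": 1e-6, "accuracy": 50.00},
    {"name": "scenario 4", "class_weight": None,       "learning_rate": 1e-6, "accuracy": 57.14},
]
best = max(scenarios, key=lambda s: s["accuracy"])
print(best["name"], best["accuracy"])
```

Laid out this way, the grid makes the two trends visible at once: lowering the learning rate helps, while balanced class weights do not pay off on this dataset.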

4. CONCLUSION

In this study, Twitter users were classified into the big five personalities: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism. A dataset of preprocessed tweets from Twitter users is used as input to the RoBERTa model, which serves as the classifier. Several test scenarios were run to find the best model, and a confusion matrix with performance metrics was used to measure the performance of each model built. From the test scenarios carried out, the best model is obtained with the parameters class_weight = none, learning_rate = 1e-6, and batch_size = 32. The model tested with these parameters achieves an accuracy of 57.14% and can predict the Neuroticism, Agreeableness, Openness, and Conscientiousness labels, performing reasonably well even though the data distribution is imbalanced. Suggestions for future research are to train the model on data that is much more balanced across classes and to explore other parameters that affect model performance, such as the optimizer, the loss function, and even changes to the model's layers.

