Sentiment Analysis on Tweets of Kanjuruhan Tragedy Using Deep Learning IndoBERTweet

Academic year: 2023


Adhyaksa Diffa Maulana*, Kemas Muslim Lhaksmana
School of Computing, Informatics, Telkom University, Bandung, Indonesia
Email: 1,*[email protected], 2[email protected]

Correspondence Author Email: [email protected]

Abstract−The incident in Indonesian football at the Kanjuruhan Stadium was triggered by supporters who entered the field and by officers who fired tear gas into the stands. Many Indonesians responded to the incident on the social media platform Twitter with positive, negative, and neutral opinions. Opinion was divided because of the many victims who died or were injured, and because many supporters objected to the actions taken by the authorities during the riots. The government must therefore make decisions to ease public concern. This research analyzes the sentiment of public opinion on the Kanjuruhan tragedy using the IndoBERTweet method, with naive Bayes as a comparison. The IndoBERTweet method obtained better results than naive Bayes: 88% accuracy, 82% precision, 85% recall, and 84% F1-score, against 62% accuracy, 59% precision, 61% recall, and 59% F1-score for naive Bayes.

Keywords: Kanjuruhan; Sentiment Analysis; Twitter; IndoBERTweet; Naïve Bayes

1. INTRODUCTION

Football is a very popular sport in Indonesia, and many Indonesians are passionate about the game. Indonesia also has the Indonesian Football Association (PSSI) as the organisation that manages football in the country. Indonesian football management is not as modern as in other developed countries, but the country has a lot of young talent, dozens of international stadiums, and supporters who are known to be fanatical [1]. Too much fanaticism is not good, as it can lead to riots and even the deaths of the fans themselves. In the Kanjuruhan tragedy, the authorities made a mistake by firing tear gas at the crowd of supporters, because FIFA rules prohibit firing tear gas on the pitch [1]. This caused spectators and supporters to rush off the pitch, resulting in deaths and injuries. After the incident, many people took to social media, especially Twitter, to voice their opinions, and the tragedy also received attention from football observers, artists, and state officials. The Kanjuruhan tragedy hashtag became the number-one trending topic on Twitter.

This tragedy encourages virtual social activities on social media, especially Twitter [2]. There are three effects of social media: cognitive, affective, and conative. Cognitive effects concern awareness, learning, and knowledge; affective effects concern emotions, feelings, and attitudes; and conative effects concern the intention to do something [3]. From the tweets on the Kanjuruhan event, we can select the messages that can be taken from the event [4]. This research uses three sentiment polarities: positive, negative, and neutral [5].

No previous research has conducted sentiment analysis on the Kanjuruhan tragedy, because the incident is relatively new: it occurred in early October 2022. On sentiment analysis in general, however, many studies exist. For example, the paper "IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effective Domain-Specific Vocabulary Initialization" (2021) by Fajri Koto et al. compared the mBERT, MalayBERT, IndoBERT [5], and IndoBERTweet (1M steps) models; IndoBERTweet obtained HS1 results of 88.8% and HS2 of 87.5% [5], greater than the other BERT model variants. This research also uses another algorithm for comparison with IndoBERTweet, namely Naive Bayes. Naive Bayes was chosen because in the paper "Sentiment Analysis of Review Datasets using Naive Bayes' and K-NN Classifier," the recall of naive Bayes on movie reviews was 80.12%, precision 84.09%, and accuracy 82.43% [6]. Another study using the IndoBERT method, "General Depression Detection Analysis Using IndoBERT Method" by Ilham Rizki Hidayat and Warih Maharani [7], obtained 51% accuracy, 48% precision, 23% recall, and 31% F1-score. For the Naïve Bayes method, the study "Classification of Novel Synopsis Using the Naïve Bayes Classifier Method" [8] obtained 86% recall and 89% precision, while the study "Optimization of Naive Bayes Using Genetic Algorithms as Feature Selection to Predict Student Performance" reported naïve Bayes accuracy of 91% and naïve Bayes+GA accuracy of 97% [9]. The studies mentioned above have different results because of the varying types and sizes of their datasets. Their results serve as a reference for this research in finding the performance of the IndoBERTweet and Naïve Bayes methods.

This research will evaluate the results of the system as a comparison between IndoBERTweet and Naive Bayes, using accuracy, recall, precision, and F1-score. As shown in comparisons between models, unseen domain-specific words can be used to expand the vocabulary of fine-tuned models [10].

Following the research discussed above, the purpose of this final project is to carry out sentiment analysis using the IndoBERTweet deep learning method. The dataset used is a set of tweets on Twitter related to the Kanjuruhan tragedy. The IndoBERTweet method is used because it achieves better accuracy than other deep learning methods [11]. This evaluation makes it easier to see the results of tweets on the Twitter application.

2. RESEARCH METHODOLOGY

2.1 General System Description

The system built in this research is the IndoBERTweet deep learning system, which performs sentiment analysis on Twitter by classifying tweets into positive and negative. The research phase begins with collecting a dataset from the Twitter application. After pre-processing, the dataset is divided into two parts: test data and train data. The train data is then used to fine-tune the pre-trained IndoBERTweet model, after which the testing process uses the test data. This study also uses the Naive Bayes algorithm as a comparison. Both algorithms are then evaluated on accuracy, recall, precision, and F1-score. The system design for sentiment analysis research on the Kanjuruhan tragedy is carried out in stages as shown in Figure 1.

Figure 1. System Flowchart

Using NLP techniques, this classification is very effective because it can obtain information accurately and quickly. Classification lies at the heart of human and machine intelligence: deciding what letters, words, or pictures have been presented to our senses, identifying faces or voices, sorting mail, and assigning grades to homework are all examples of assigning categories to inputs [12].

2.2 Dataset

The dataset used in this research consists of tweet comments from the social media platform Twitter, collected while the Kanjuruhan tragedy hashtag was popular during October. The data was obtained by crawling through the Twitter API (https://developer.twitter.com) using an API token. Each tweet is labeled as positive or negative.

2.3 Pre-Processing

This pre-processing is done before the data is used to build the model [13]. Several stages are carried out in this process, including checking whether there are missing values and outliers in the dataset and then handling them. After that, the data is normalized using Min-Max normalization so that the scale of the data in each column is the same, which also reduces dimensionality. The goal is to optimize and speed up the pre-training process. The data preprocessing stage consists of four steps, namely:

A. Case Folding

Case folding is the process of converting all letters in a sentence into lowercase, including the capital letters that often occur at the beginning of sentences and in the names of cities, people, and other entities.

B. Data Cleaning

Data cleaning is the step of cleaning the data, including removing HTML escape codes and tags from sentences, removing URLs to strip all links, removing mentions (words beginning with "@"), removing all punctuation, and removing all numbers from sentences [14].

C. Stopword Removal

Stopword removal reduces the dimensionality of the data by removing words that do not carry emotional content, such as personal pronouns, conjunctions, and prepositions, using the Sastrawi library [14].

D. Word Stemming

Word Stemming is the process of converting words into base words. This stage is very important in text-based classification [14].
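The four preprocessing steps above can be sketched as a small pipeline. This is a minimal illustration only: the paper uses the Sastrawi library for Indonesian stopwords and stemming, so the toy stopword list and naive suffix stripper below are hypothetical stand-ins that keep the sketch self-contained.

```python
import re

# Toy stand-ins for the Sastrawi stopword list and stemmer used in the paper.
STOPWORDS = {"yang", "dan", "di", "ke", "dari", "ini", "itu", "saya"}

def case_fold(text):
    # A. Case folding: lowercase every character.
    return text.lower()

def clean(text):
    # B. Data cleaning: strip URLs, @mentions, punctuation, and digits.
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"@\w+", " ", text)
    text = re.sub(r"[^\w\s]", " ", text)
    text = re.sub(r"\d+", " ", text)
    return text

def remove_stopwords(tokens):
    # C. Stopword removal: drop function words with no sentiment content.
    return [t for t in tokens if t not in STOPWORDS]

def stem(token):
    # D. Stemming: naive suffix stripping as a placeholder for Sastrawi.
    for suffix in ("nya", "kan", "an"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(tweet):
    tokens = clean(case_fold(tweet)).split()
    return [stem(t) for t in remove_stopwords(tokens)]

print(preprocess("Doakan korban @user https://t.co/abc dari tragedi ini!!! 123"))
```

A production pipeline would swap the toy stemmer for Sastrawi's, which handles Indonesian prefixes and confixes that simple suffix stripping misses.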

2.4 Splitting Data

In this data splitting process, the data for the sentiment model design is divided into three parts: training data, validation data, and test data. Training and validation data are used in the model training process, while test data is used for model evaluation. The split is 80% training data, 10% validation data, and 10% test data.
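A reproducible 80%:10%:10% split can be sketched in plain Python (the function name `split_dataset` is illustrative, not from the paper). With 3,277 records this yields 2,621 training records, which matches the Train value reported later in Table 2.

```python
import random

def split_dataset(records, seed=0, ratios=(0.8, 0.1, 0.1)):
    """Shuffle reproducibly, then cut into train/validation/test parts."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

data = list(range(3277))  # same size as the paper's processed dataset
train, val, test = split_dataset(data)
print(len(train), len(val), len(test))  # 2621 327 329
```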

2.5 IndoBERTweet

After data splitting, the next stage of model design in this study uses the IndoBERTweet method, with the stages shown in Figure 2.

Figure 2. Flowchart of IndoBERTweet Model Design

The next step in model design is the training process. Training on new data using a pre-trained model is referred to as fine-tuning. Because IndoBERTweet follows the BERT model architecture, in the fine-tuning process for a classification case a dense layer is added to the output layer according to the number of data labels being trained [15].

To perform the classification process, the model needs to understand the relationship between sentences during the fine-tuning phase. During this phase, the model combines two input sentences randomly from the corpus used. The model must then predict whether the two sentences, separated by the [SEP] token, are related or not [16]. This process of identifying the relation between sentences in the fine-tuning phase is known as next sentence prediction (NSP).

The input for this model is data that has gone through the preprocessing stage in section 2.3. The steps to create the IndoBERTweet model begin with tokenization using AutoTokenizer. From the data splitting process in section 2.4, the training data is used to train the model using the IndoBERTweet AutoModel, and the test data is used to test the model's predictions. The output of this method is the model's sentiment label predictions on the test data, which are then evaluated in section 2.7. The results of the IndoBERTweet model will be compared with the Naive Bayes model.
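As a rough sketch of this fine-tuning setup, the configuration fragment below uses the Hugging Face `transformers` API with the public `indolem/indobertweet-base-uncased` checkpoint and the hyperparameters from Table 2. This is an assumed reconstruction, not the authors' actual training script; the dataset variables are left as commented placeholders.

```python
# Sketch only: assumes the `transformers` library and the public
# IndoBERTweet checkpoint; hyperparameters follow Table 2.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "indolem/indobertweet-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=2)  # dense output layer sized to the label count

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

args = TrainingArguments(
    output_dir="indobertweet-kanjuruhan",
    num_train_epochs=5,              # Table 2: Epoch = 5
    per_device_train_batch_size=8,   # Table 2: Batch_size = 8
    seed=0,                          # Table 2: Random state = 0
)

# train_ds / val_ds would be the tokenized splits from section 2.4:
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_ds, eval_dataset=val_ds)
# trainer.train()
```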

2.6 Naive Bayes

The stages of model design using the Naive Bayes method as a comparison are shown in Figure 3.


Figure 3. Naïve Bayes Model Design Flowchart

This Naïve Bayes stage provides the comparison method against IndoBERTweet.

The input for this model is data that has gone through the preprocessing stage in section 2.3; the steps to create the Naive Bayes model begin with tokenization. Bayesian classification is a statistical classification that can be used to predict the probability of membership in a class. It is based on Bayes' theorem and has classification characteristics similar to decision trees and neural networks, with high accuracy and speed when applied to large databases. The Bayesian method is a statistical approach to inductive reasoning for classification problems [17], [18]. We first state the basic concept of Bayes' theorem, then use this theorem to perform classification in data mining. Bayes' theorem has the following general form [19]:

P(H | X) = P(X | H) P(H) / P(X) (1)
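As a quick numeric illustration of Eq. (1), with invented probabilities (not values from the dataset):

```python
# Illustrative numbers only: suppose 74% of tweets are negative (H), and a
# particular word (X) appears in 30% of negative tweets and 25% of all tweets.
p_h = 0.74          # prior P(H): tweet is negative
p_x_given_h = 0.30  # likelihood P(X|H)
p_x = 0.25          # evidence P(X)

# Bayes' theorem, Eq. (1): P(H|X) = P(X|H) * P(H) / P(X)
p_h_given_x = p_x_given_h * p_h / p_x
print(round(p_h_given_x, 3))  # 0.888
```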

The output of this method is the model's sentiment label predictions on the test data, which are then evaluated in section 2.7. The results of the Naive Bayes model will be compared with the IndoBERTweet model.

2.7 Evaluation

The evaluation stage is where we see how well the training performed on the training data. A confusion matrix is a method for measuring classification performance over two or more classes and helps evaluate the built classification model [20]. Each row of the matrix represents the predicted class, and each column represents the actual class label. True Positives (TP) are the correct predictions in the positive class. False Negatives (FN) are the instances of the positive class that were missed and incorrectly predicted as negative. False Positives (FP) are the instances that were wrongly predicted as positive when they were not. True Negatives (TN) are the cases correctly predicted as negative, as illustrated in Table 1.

Table 1. Confusion Matrix

                     Actually Positive       Actually Negative
Predicted Positive   True Positives (TPs)    False Positives (FPs)
Predicted Negative   False Negatives (FNs)   True Negatives (TNs)

Description: TP = number of positive-class documents correctly classified as positive; TN = number of negative-class documents correctly classified as negative; FP = number of negative-class documents incorrectly classified as positive; FN = number of positive-class documents incorrectly classified as negative [21].
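The four cells of Table 1 can be tallied directly from label lists; the helper below is illustrative:

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Tally the four cells of the binary confusion matrix in Table 1."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if t == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn

# Tiny illustrative labels: 1 = positive, 0 = negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))  # (3, 1, 1, 3)
```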

Next, the evaluation process is carried out on the IndoBERTweet algorithm. The evaluation metrics used at this stage are accuracy, recall, precision, and F1-score, calculated with their standard mathematical formulas to measure the performance of the IndoBERTweet algorithm [7]. From the evaluation results, the performance of the IndoBERTweet algorithm is obtained and then compared with the Naive Bayes algorithm. This research uses the following formulas:

A. Accuracy


Accuracy can be defined as the ratio of accurately classified data to the total amount of observation data. The accuracy metric is calculated using the following formula.

Accuracy = (TP + TN) / (TP + FP + FN + TN) (2)

B. Recall

Recall is used to measure the ratio of correctly classified positive patterns out of the overall data. The recall metric is calculated using the following formula.

Recall = TP / (TP + FN) (3)

C. Precision

Precision is used in measuring the predictions of correctly predicted positive patterns out of the total predicted patterns in the positive class. The precision metric is calculated using the following formula.

Precision = TP / (TP + FP) (4)

D. F1-Score

The F1-Score evaluation metric takes a weighted comparison by averaging the results of precision and recall to calculate the summary performance of a model. The F1-Score metric is calculated using the following formula.

F1-Score = (2 × Recall × Precision) / (Recall + Precision) (5)
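Applying Eqs. (2)–(5) to the positive class, using the counts reported in section 3.2 (63 of 78 positive and 224 of 250 negative tweets predicted correctly), reproduces the positive-class column of Table 3:

```python
# Counts recoverable from section 3.2, taking "positive" as the positive class:
tp, fn = 63, 78 - 63     # 63 of 78 positive tweets predicted correctly
tn, fp = 224, 250 - 224  # 224 of 250 negative tweets predicted correctly

accuracy = (tp + tn) / (tp + fp + fn + tn)          # Eq. (2)
recall = tp / (tp + fn)                             # Eq. (3)
precision = tp / (tp + fp)                          # Eq. (4)
f1 = 2 * recall * precision / (recall + precision)  # Eq. (5)

print(f"acc={accuracy:.2f} prec={precision:.2f} rec={recall:.2f} f1={f1:.2f}")
# Matches Table 3's accuracy and positive-class column: 88%, 71%, 81%, 75%.
```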

2.8 Analysis

The analysis stage is the process of analyzing the performance evaluation that has been carried out [22]. In it, the performance of the IndoBERTweet algorithm is compared with that of the Naive Bayes algorithm.

3. RESULT AND DISCUSSION

3.1 Sentiment towards the Kanjuruhan tragedy

The dataset collected for this study consists of 13,600 crawled records. After processing the data, 3,277 records remained. The number of negative labels is 2,426 and the number of positive labels is 851; the distribution of these labels is shown in Figure 4, where label '0' means negative and '1' means positive. This data was used for modeling with IndoBERTweet and Naive Bayes, with a negative-to-positive label ratio of 74.03%:25.97%.

Figure 4. Label

3.2 IndoBERTweet

This IndoBERTweet method uses the dataset shown in Figure 4, i.e., the data that has been pre-processed, with the parameters in Table 2.


Table 2. IndoBERTweet Parameters

Parameter     Value
Epoch         5
Batch_size    8
Train         2621
Random state  0

The parameters used to build the IndoBERTweet model in this research are shown in Table 2: the number of epochs is 5 and the batch_size is 8. The data is separated into training, validation, and test data with a ratio of 80%:10%:10%. Random_state is set to 0 so that the random shuffle is reproducible. The model performance is shown in Table 3.

Table 3. Performance IndoBERTweet

           Precision            Recall               F1-Score
Accuracy   Positive  Negative   Positive  Negative   Positive  Negative
88%        71%       94%        81%       90%        75%       92%

From the experimental results in Table 3, the IndoBERTweet model provides good evaluation results, with an accuracy score of 88%. This high accuracy is made possible by the large dataset and a model pre-trained for the Indonesian language. Summing over all prediction results of the classification report, the model reaches an accuracy of 88% on the dataset, with a precision of 82%, recall of 85%, and F1-score of 84% (averaged over the two classes).

After analyzing this evaluation data, it is clear that the positive class is harder to predict because the dataset contains many more negative comments. As shown in the Figure 5 confusion matrix, the model predicts correctly 63 out of 78 positive labels and 224 out of 250 negative labels.

Figure 5. IndoBERTweet Confusion Matrix

3.3 Naive Bayes

This Naive Bayes method uses the dataset shown in Figure 4, i.e., the data that has been pre-processed, with the parameters in Table 4.

Table 4. Parameter Naive Bayes

Parameter     Value
Random State  42
Test Size     0.2
Gaussian      GaussianNB

The parameters used to build the Naive Bayes model in this research are described in Table 4: random state 42 and test size 0.2. This Naive Bayes model uses the Gaussian variant (Gaussian Naive Bayes); its performance can be seen in Table 5.


Table 5. Performance Naive Bayes

           Precision            Recall               F1-Score
Accuracy   Positive  Negative   Positive  Negative   Positive  Negative
62%        39%       80%        60%       62%        47%       70%

From the results in Table 5, the Naive Bayes method obtains an accuracy of 62%, precision of 59%, recall of 61%, and F1-score of 59%, using the Gaussian technique. The Gaussian technique assumes that the values of each numeric feature follow a continuous distribution; when plotted, a symmetrical bell-shaped curve appears around the average feature value.

From the evaluation results of the Naive Bayes method with the Gaussian technique, the positive-label scores are relatively low for each metric due to the small number of positive tweets, while the negative-label scores are higher because of the larger number of negative tweets. Of the 250 negative data, 224 were successfully predicted, and of the 78 positive data, 63 were successfully predicted.
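The Gaussian technique described above can be sketched in a few lines of pure Python. The one-dimensional feature, its per-class statistics, and the `predict` helper are invented for illustration; only the 74%:26% prior reflects the label ratio from section 3.1.

```python
import math

def gaussian_pdf(x, mean, var):
    """Gaussian likelihood of one numeric feature value (the bell curve)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def predict(x, class_stats, priors):
    """Pick the class with the highest prior * likelihood score."""
    scores = {c: priors[c] * gaussian_pdf(x, m, v)
              for c, (m, v) in class_stats.items()}
    return max(scores, key=scores.get)

# Invented 1-D feature (e.g. a sentiment score): per-class mean and variance.
stats = {"negative": (-1.0, 1.0), "positive": (1.5, 1.0)}
priors = {"negative": 0.74, "positive": 0.26}  # label ratio from section 3.1
print(predict(-0.5, stats, priors), predict(1.8, stats, priors))
```

With real tweet data the features would be numeric text representations, and the per-class means and variances would be estimated from the training split.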

Figure 6. Naive Bayes Confusion Matrix

4. CONCLUSION

Based on the results of the research and discussion, with 3,277 data collected and labeled about the Kanjuruhan Stadium tragedy in Malang, the ratio of negative to positive sentiment was 74.03%:25.97%. The dataset was used to build both the IndoBERTweet and Naive Bayes methods. The best evaluation was achieved by the IndoBERTweet model, with an accuracy of 88% against a Naive Bayes accuracy of 62%. For every metric, the IndoBERTweet model shows better results than Naive Bayes: over the whole dataset, the difference in evaluation scores is 26% in accuracy, 23% in precision, 24% in recall, and 25% in F1-score, all in favor of IndoBERTweet. Based on these results, it can be concluded that IndoBERTweet is a better technique than Naive Bayes in this study, especially for sentiment analysis. IndoBERTweet is also a newer method, while Naive Bayes is an older one. For further research, a larger dataset with more labels can be used to gather more evidence for the model comparison, and different parameters can be tried for each model to find values that improve its performance.

REFERENCES

[1] S. A. Azzahra, "Human Rights Violation in The Rioting of Supporters: Case of Kanjuruhan Football Stampede." [Online]. Available: https://news.detik.com/berita/d-6331229/total-korban-jiwa-tragedi-kanjuruhan-dan-rinciannya-data-5-
[2] Y. Mogot, E. Agus, W. B. Riset, I. Nasional, and O. Solihin, "GERAKAN SOSIAL VIRTUAL MENYIKAPI TRAGEDI KANJURUHAN," Dewantara: Jurnal Pendidikan Sosial Humaniora, vol. 1, no. 4, 2022.


[3] F. Fitriansyah Program Studi Penyiaran Akademi Komunikasi BSI Jakarta and C. Sitasi, “Efek Komunikasi Massa Pada Khalayak (Studi Deskriptif Penggunaan Media Sosial dalam Membentuk Perilaku Remaja),” Cakrawala, vol. 18, no. 2, pp. 171–178, 2018, doi: 10.31294/jc.v18i2.

[4] E. Dwi and S. Watie, "Komunikasi dan Media Sosial (Communications and Social Media)," 2011. [Online]. Available: http://id.wikipedia.org/wiki/Media_sos
[5] R. Ferdiana, F. Jatmiko, D. D. Purwanti, A. Sekar, T. Ayu, and W. F. Dicka, "Dataset Indonesia untuk Analisis Sentimen," 2019.
[6] L. Dey, S. Chakraborty, A. Biswas, B. Bose, and S. Tiwari, "Sentiment Analysis of Review Datasets using Naïve Bayes' and K-NN Classifier." [Online]. Available: www.imdb.com
[7] I. R. Hidayat and W. Maharani, "General Depression Detection Analysis Using IndoBERT Method," International Journal on Information and Communication Technology (IJoICT), vol. 8, no. 1, pp. 41–51, Aug. 2022, doi: 10.21108/ijoict.v8i1.634.

[8] V. Rahmayanti Setyaning Nastiti and S. Basuki, “Klasifikasi Sinopsis Novel Menggunakan Metode Naïve Bayes Classifier,” vol. 1, no. 2, pp. 125–130, 2019.

[9] S. Busono, “Optimasi Naive Bayes Menggunakan Algoritma Genetika Sebagai Seleksi Fitur Untuk Memprediksi Performa Siswa,” Jurnal Ilmiah Teknologi Informasi Asia, vol. 14, no. 1, 2020.

[10] F. Fernández-Martínez, C. Luna-Jiménez, R. Kleinlein, D. Griol, Z. Callejas, and J. M. Montero, “Fine-Tuning BERT Models for Intent Recognition Using a Frequency Cut-Off Strategy for Domain-Specific Vocabulary Extension,” Applied Sciences (Switzerland), vol. 12, no. 3, Feb. 2022, doi: 10.3390/app12031610.

[11] B. Juarto, "Indonesian News Classification Using IndoBert," International Journal of Intelligent Systems and Applications in Engineering. [Online]. Available: www.ijisae.org

[12] D. Jurafsky and J. H. Martin, "Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition," 3rd ed. draft.

[13] J. Engel et al., “Breaking with trends in pre-processing?,” TrAC - Trends in Analytical Chemistry, vol. 50. Elsevier B.V., pp. 96–106, 2013. doi: 10.1016/j.trac.2013.04.015.

[14] M. A. Rosid, A. S. Fitrani, I. R. I. Astutik, N. I. Mulloh, and H. A. Gozali, “Improving Text Preprocessing for Student Complaint Document Classification Using Sastrawi,” in IOP Conference Series: Materials Science and Engineering, Institute of Physics Publishing, Jul. 2020. doi: 10.1088/1757-899X/874/1/012017.

[15] K. S. Nugroho et al., "BERT Fine-Tuning for Sentiment Analysis on Indonesian Mobile Apps Reviews," Association for Computing Machinery, vol. 21, no. 6, pp. 258–264, Sep. 2021.

[16] W. Shi and V. Demberg, “Next Sentence Prediction helps Implicit Discourse Relation Classification within and across Domains.” [Online]. Available: https://github.com/google-research/

[17] I. Rish, “An empirical study of the naive Bayes classifier.”

[18] R. O. Duda, P. E. Hart, and D. G. Stork, "Pattern Classification."

[19] H. Annur, “KLASIFIKASI MASYARAKAT MISKIN MENGGUNAKAN METODE NAÏVE BAYES,” 2018.

[20] I. Nawangsih, I. Melani, S. Fauziah, and A. I. Artikel, “PELITA TEKNOLOGI PREDIKSI PENGANGKATAN KARYAWAN DENGAN METODE ALGORITMA C5.0 (STUDI KASUS PT. MATARAM CAKRA BUANA AGUNG,” Jurnal Pelita Teknologi, vol. 16, no. 2, pp. 24–33, 2021.

[21] D. Normawati and S. A. Prayogi, “Implementasi Naïve Bayes Classifier Dan Confusion Matrix Pada Analisis Sentimen Berbasis Teks Pada Twitter,” 2021.

[22] D. Alita and A. Rahman, “Pendeteksian Sarkasme pada Proses Analisis Sentimen Menggunakan Random Forest Classifier,” 2020.
