Sentiment Analysis on Twitter Social Media towards Climate Change in Indonesia Using the IndoBERT Model
Muhammad Fadhil Mubaraq*, Warih Maharani
School of Computing, Informatics, Telkom University, Bandung, Indonesia
Email: 1,*[email protected], 2[email protected]
Corresponding Author's Email: [email protected]
Abstract: Climate change is a long-term shift in temperature and weather patterns. The phenomenon has become a frightening prospect for everyone because, consciously or not, its harmful effects are already visible. It has become an urgent concern for all levels of society, so the topic is widely discussed on social media, especially on Twitter. Indonesian tweets about climate change can be analyzed to reveal how people feel about this phenomenon. This research utilizes the Transformer architecture, namely IndoBERT, a development of the BERT architecture by the IndoNLU team that is pre-trained on more than 220 million words from various Indonesian sources. This method was chosen in the hope of supporting sentiment analysis on the topic of climate change so that public sentiment can be mapped. The test results obtained an F1-Score of 95.6% with a learning rate of 2e-5 and a batch size of 16. Hopefully the results of this research can be used in future research.
Keywords: Climate Change, Sentiment Analysis, Twitter, IndoBERT, BERT
1. INTRODUCTION
Climate change is a long-term change in temperature and weather patterns. The phenomenon initially occurred naturally, but since the 1800s it cannot be separated from human activity, driven by the use of fossil fuels (oil, coal, and gas) that produce greenhouse gases [1]. Climate change has been occurring for a long time but has only become a widely discussed issue in the past few years. In Indonesia, climate change is still a tertiary issue that is rarely raised by politicians, yet it is quite popular among young people; this is reflected in the many young citizens who are worried and express this sentiment on social media, one of which is Twitter.
Twitter is a social networking and microblogging service that users use to send and read text-based messages called tweets [2]. According to Twitter's third-quarter 2021 financial report [3], Twitter's daily active users reached 211 million. Indonesia also contributes one of the largest daily active user bases in Southeast Asia. Nayomi Kankanamge et al. [4] showed that social media can be used as an approach to assess citizens' knowledge. This illustrates how Twitter can be used to analyze sentiment on various topics, one of which is the issue of climate change.
Research on sentiment analysis of Twitter data is now widely published, much of it using long-established methods such as the Naive Bayes classifier, Word2Vec, and Support Vector Machines. One previous study on sentiment analysis using neural networks showed fairly high accuracy, but only across several scenario variations such as data shuffling, learning rate, hidden layer nodes, and dropout. This indicates that the neural network method is only reliable when these factors are chosen appropriately.
Sentiment analysis is a field of study that analyzes opinions, sentiments, evaluations, judgments, attitudes, and emotions of people towards entities such as products, services, organizations, individuals, problems, events, topics, and their attributes [5]. Sentiment analysis is divided into two processes, namely sentiment extraction and sentiment classification. Sentiment extraction is the process of extracting the aspects that are evaluated [5].
Sentiment classification is the process of determining whether opinions about those aspects are positive, negative, or neutral [5].
In research [6], a sentiment analysis was conducted of tweets about flood disaster management, especially in West Java, on Twitter. The study used a neural network algorithm with the Term Frequency - Inverse Document Frequency (TF-IDF) method and tested several scenarios that varied data shuffling, learning rate, hidden layer nodes, and dropout. The best accuracy came from the eighth scenario, reaching 73.87% without data shuffling, with a learning rate of 0.001, 128 hidden layer nodes, and no dropout.
Furthermore, research [7] conducted a sentiment analysis of a dataset obtained from Twitter regarding post-disaster conditions. That study used the Naive Bayes classifier algorithm with the n-gram feature; word splitting in a sentence was divided into two forms, namely single terms (n=1) and bigrams (n=2). The highest accuracy from several tests with different ratios of training and test data was 93.33% for single terms (unigrams), while for bigrams the accuracy was 86.67%.
Over time, methods for sentiment analysis have developed quite rapidly, one of which is IndoBERT. IndoBERT is an extension of BERT, a language model developed by Google researchers. IndoBERT is an Indonesian pre-trained model built on more than 220 million words drawn from several sources. The three main sources are the Indonesian Wikipedia (74 million words), news articles from Kompas, Liputan6, and Tempo (55 million words), and an Indonesian Web Corpus (Medved and Suchomel, 90 million words). IndoBERT is one of the state-of-the-art model options for conducting sentiment analysis in Bahasa Indonesia [8], as shown in the paper by Rahutomo and Pardamean [9]. That paper, entitled Finetunning IndoBERT to Understand Indonesian Stock Trader Slang Language, reports that IndoBERT achieved the highest accuracy among 10 previous studies, with an accuracy of 68%.
The contribution of this research is a fine-tuned IndoBERT [10] model that can analyze the sentiment of data withdrawn from Twitter, classifying it as positive or negative. The difference between this research and previous research lies in the parameter tuning used in the model, explained further in the Results and Discussion section.
2. RESEARCH METHODOLOGY
2.1 Research Phases
The system built in this research performs sentiment analysis of Twitter social media posts on climate change in Indonesia using the IndoBERT method. The flow diagram of this system is as follows.
Figure 1. Experiment Flowchart

2.2 Data Crawling
The data crawling process is carried out using a Python program with the twint library. The data collected consists of tweets on Twitter about climate change in Indonesia. Crawling was carried out using the keyword 'perubahan iklim', and 1533 Indonesian-language tweets were collected over the period 2 July 2022 to 13 July 2022. The collected data was saved as a CSV file.
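A minimal sketch of this crawling step using twint is shown below; the output file name is an assumption for illustration, not a detail reported in this research.

```python
import twint

# Configure the search: Indonesian-language tweets containing 'perubahan iklim'
# posted between 2 July 2022 and 13 July 2022, stored as a CSV file.
c = twint.Config()
c.Search = "perubahan iklim"
c.Lang = "id"
c.Since = "2022-07-02"
c.Until = "2022-07-13"
c.Store_csv = True
c.Output = "perubahan_iklim_tweets.csv"  # hypothetical output file name

twint.run.Search(c)
```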
2.3 Data Labelling
The initial stage in designing this system is labelling the data manually by adding '1' for sentences with positive sentiment and '0' for sentences with negative sentiment. Labelling is done collectively by three people, and the sentiment category of each tweet is then determined by adding up the three sentiment scores. The positive category covers tweets containing positive words, reporting on climate change management, praising efforts to deal with climate change, and the like, while the negative category covers tweets containing negative words, complaining about climate change conditions, reporting on the effects of climate change, and the like. This labelling is done with the aim of increasing the accuracy of the system to be built. An example of the labelling process is shown in Table 1 below.
Table 1. Tweet Category
Sentence | Label 1 | Label 2 | Label 3 | Sentiment Category
Pemanasan global bakal menyebabkan jamaah haji lebih berisiko karena berpotensi terpapar panas tinggi @aik_arif #Iptek #AdadiKompas #Haji2022 https://t.co/xtwQ6mDLJm | 0 | 0 | 0 | Negative
Isu perubahan iklim menjadi ancaman nyata. Untuk itu, Indonesia tengah menyiapkan berbagai kebijakan untuk mendukung transisi energi terbarukan yang ramah lingkungan. https://t.co/twOjAmPwaU | 1 | 1 | 1 | Positive
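A minimal sketch of this score aggregation with pandas is shown below; the file and column names are illustrative assumptions, not names used by this research.

```python
import pandas as pd

# Hypothetical file and column names; each annotator assigns 1 (positive) or 0 (negative).
df = pd.read_csv("tweets_labelled.csv")

# Sum the three annotator scores and take the majority vote:
# a tweet is labelled positive when at least two annotators agree.
df["label"] = (df[["label_1", "label_2", "label_3"]].sum(axis=1) >= 2).astype(int)
df["sentiment_category"] = df["label"].map({1: "Positive", 0: "Negative"})
```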
2.4 Preprocessing
Before the data is fed to the IndoBERT model [11], it must be preprocessed to meet the standards of the model. Preprocessing is the process of preparing the data so that it conforms to predetermined standards and the knowledge extraction process can be applied [12]. The preprocessing step is carried out with a Python program using the nltk and Sastrawi libraries. Preprocessing is a useful step to produce quality data [13]. The steps taken during preprocessing are as follows.
1. Data Cleaning
In the early stages of preprocessing, data cleaning is carried out to remove all characters other than letters, as well as hashtags, URLs, punctuation, mentions, and username links.
2. Case Folding
The next stage of preprocessing is case folding, which converts all letters into lowercase.
3. Stopword Removal
The next stage is stopword removal, which removes words that are considered meaningless.
4. Stemming
The last stage is stemming, which removes affixes from words.
An example of the preprocessing result is shown in Table 2 below.
Table 2. Example of the Pre-processing Steps

Preprocess | Sentence
Raw Data | Pemanasan global bakal menyebabkan jamaah haji lebih berisiko karena berpotensi terpapar panas tinggi, sehingga diharapkan ada langkah bersama untuk menurunkan emisi. @aik_arif #Iptek #AdadiKompas #Haji2022 https://t.co/xtwQ6mDLJm
Cleaning & Case Folding | pemanasan global bakal menyebabkan jemaah haji di mekah lebih berisiko karena berpotensi terpapar panas tinggi suhu bola basah ratarata di mekah naik hampir 2 derajat celcius antara tahun 19842013 liputanhaji adadikompas
Stopword Removal | pemanasan global menyebabkan jemaah haji mekah berisiko berpotensi terpapar panas suhu bola basah ratarata mekah 2 derajat celcius 19842013 liputanhaji adadikompas
Stemming | panas global jemaah haji mekah risiko potensi papar panas suhu bola basah ratarata mekah 2 derajat celcius 19842013 liputanhaji adadikompas
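A minimal sketch of these four preprocessing steps using nltk and Sastrawi is given below; the exact cleaning rules are an approximation of what is described above, not the authors' original code.

```python
import re

from nltk.corpus import stopwords
from Sastrawi.Stemmer.StemmerFactory import StemmerFactory

stop_words = set(stopwords.words("indonesian"))  # NLTK's Indonesian stopword list (requires nltk.download("stopwords"))
stemmer = StemmerFactory().create_stemmer()      # Sastrawi stemmer for Indonesian affixes

def preprocess(tweet: str) -> str:
    # 1. Data cleaning: remove URLs, mentions, and other non-alphanumeric characters
    text = re.sub(r"https?://\S+", " ", tweet)
    text = re.sub(r"@\w+", " ", text)
    text = re.sub(r"[^0-9A-Za-z\s]", " ", text)
    # 2. Case folding: convert all letters to lowercase
    text = text.lower()
    # 3. Stopword removal: drop words considered meaningless
    tokens = [t for t in text.split() if t not in stop_words]
    # 4. Stemming: strip Indonesian affixes
    return stemmer.stem(" ".join(tokens))
```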
2.5 Splitting Data
The next stage in designing this system is data splitting. This stage separates the dataset into two parts, namely training data and test data. Training data is used to build the model and test data is used to evaluate it. The proportion of the split is 70% training data and 30% test data.
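A minimal sketch of this split with scikit-learn is shown below, assuming the labelled dataframe from the labelling stage; splitting the held-out portion into equal validation and test halves mirrors the set sizes reported in Section 3.2.

```python
from sklearn.model_selection import train_test_split

# 70% training data, 30% held-out data; stratify keeps the label proportions similar in both parts.
train_df, heldout_df = train_test_split(df, test_size=0.30, random_state=42, stratify=df["label"])

# Section 3.2 reports validation and test sets of 233 tweets each,
# i.e. the held-out 30% split in half (an interpretation, not stated in this section).
val_df, test_df = train_test_split(heldout_df, test_size=0.50, random_state=42, stratify=heldout_df["label"])
```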
2.6 BERT
BERT, or Bidirectional Encoder Representations from Transformers, is one of the main innovations in contextualized representation learning [14]. BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Figure 2. Pre-training and Fine-tuning BERT [8]
The BERT framework is divided into two steps: pre-training and fine-tuning. In pre-training, the model is trained with unlabeled data on different pre-training tasks. In fine-tuning, the BERT model is first initialized with the pre-trained parameters, and all of the parameters are then fine-tuned using labeled data from the downstream tasks. Each downstream task has a separate fine-tuned model, even though they are initialized with the same pre-trained parameters. As shown in Figure 2, the pre-training process in BERT uses two unsupervised tasks.
2.7 IndoBERT Modelling
IndoBERT is a modification of the BERT Base model initiated by the IndoNLU team and is a strong option for sentiment analysis in Indonesian. This model has become popular recently because it is trained on a corpus of roughly 4 billion words [9]. The model is trained using over 220 million words aggregated from three main sources: (1) the Indonesian Wikipedia, (2) news articles from Kompas, Tempo, and Liputan6, and (3) an Indonesian Web Corpus (Medved and Suchomel). These pre-training resources are accessible and easy to reproduce [15]. IndoBERT uses the transformer mechanism, which learns the relationships between words in a text or sentence, and it is trained purely as a masked language model using the Huggingface framework [11].
There are two stages in the pre-training process of IndoBERT. The first stage is the Masked Language Model (Masked LM), in which the model tries to predict the original value of words that have been randomly replaced with the [MASK] token. An example is shown in Table 3 below.
Table 3. Masked LM Process

Sentence | Encode Token (Masked LM)
Perubahan Iklim Meningkatkan Risiko Jemaah Haji dari Paparan Panas | ['perubahan', 'iklim', 'meningkatkan', 'risiko', 'jamaah', 'haji', 'dari', 'paparan', 'panas']
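As an illustrative sketch of the Masked LM objective, the Huggingface fill-mask pipeline can be used to recover a masked word; the checkpoint name below is an assumption, not one specified by this research.

```python
from transformers import pipeline

# Hypothetical checkpoint; any IndoBERT masked-language-model checkpoint could be used here.
fill_mask = pipeline("fill-mask", model="indolem/indobert-base-uncased")

# The model predicts the original value of the randomly masked word.
for prediction in fill_mask("perubahan iklim meningkatkan risiko jemaah haji dari paparan [MASK]"):
    print(prediction["token_str"], round(prediction["score"], 3))
```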
The second stage is Next Sentence Prediction (NSP). In this task, the model is given two sentences (A and B) and is asked to predict whether B follows A. Training data can be generated trivially: 50% of the time B actually follows A in the corpus, and 50% of the time B is a random sentence from the corpus. The special tokens used include (1) [CLS], the first token of every sequence, (2) [SEP], which separates the two sentences, and (3) [PAD], a special token used for padding. An example is shown in Table 4 below.
Table 4. NSP Process

Sentence | Encode Token (NSP)
Perubahan Iklim Meningkatkan Risiko Jemaah Haji dari Paparan Panas | ['[CLS]', 'perubahan', 'iklim', 'meningkatkan', 'risiko', 'jemaah', 'haji', 'dari', 'paparan', 'panas', '[SEP]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]']
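A minimal sketch showing how these special tokens appear when a sentence is encoded with a Huggingface tokenizer; the checkpoint name and the padding length of 32 are assumptions for illustration.

```python
from transformers import AutoTokenizer

# Hypothetical checkpoint; any IndoBERT tokenizer exposes the same [CLS], [SEP], and [PAD] tokens.
tokenizer = AutoTokenizer.from_pretrained("indolem/indobert-base-uncased")

encoded = tokenizer(
    "Perubahan Iklim Meningkatkan Risiko Jemaah Haji dari Paparan Panas",
    padding="max_length",  # fill the remaining positions with [PAD]
    max_length=32,
    truncation=True,
)

# [CLS] opens the sequence, [SEP] closes the sentence, and [PAD] fills the rest.
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
```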
2.8 Evaluation
At this stage, evaluation and analysis of the results obtained from the previous stages are carried out. Evaluation is performed using the confusion matrix, a table that allows the performance of an algorithm to be visualized [6] and from which the accuracy, precision, recall, and F1-Score values are derived. The matrix consists of the variables shown in Table 5 below.

Table 5. Confusion Matrix

                | Predicted Positive | Predicted Negative
Actual Positive | True Positive      | False Negative
Actual Negative | False Positive     | True Negative

The terms in the confusion matrix are as follows [16]:
1. True Positive (TP) is the number of positive data classified correctly by the system.
2. True Negative (TN) is the number of negative data classified correctly by the system.
3. False Positive (FP) is the number of negative data incorrectly classified as positive by the system.
4. False Negative (FN) is the number of positive data incorrectly classified as negative by the system.
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \quad (1)

\text{Precision} = \frac{TP}{TP + FP} \times 100\% \quad (2)

\text{Recall} = \frac{TP}{TP + FN} \times 100\% \quad (3)

\text{F1-Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \times 100\% \quad (4)
Accuracy shows the proportion of data classified correctly out of all the data used [17]. The accuracy value is a reference for how well a classification is performed: the higher the accuracy, the better the classification. However, accuracy is not a sufficient benchmark when the data used is imbalanced. Therefore, other calculations are needed, namely precision (positive predictive value) and recall (true positive rate), both of which should be equal or close to one. Finally, there is the F1-Score, a metric that combines precision and recall, so its value depends on the results of the precision and recall calculations.
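A minimal sketch of how these metrics can be computed with scikit-learn; the label vectors below are illustrative placeholders rather than results from this research.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Illustrative placeholder labels: 1 = positive sentiment, 0 = negative sentiment.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1]

print("Accuracy :", accuracy_score(y_true, y_pred))   # Equation (1)
print("Precision:", precision_score(y_true, y_pred))  # Equation (2)
print("Recall   :", recall_score(y_true, y_pred))     # Equation (3)
print("F1-Score :", f1_score(y_true, y_pred))         # Equation (4)
```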
3. RESULTS AND DISCUSSION
In this research, the system is tested to determine its success rate in terms of the average F1-Score obtained for each label. The model is built with 6 scenarios of hyperparameter tuning, each with a different learning rate and batch size. The scenario values are presented in Table 6 below.
Table 6. Experiment Scenarios

Model  | Learning Rate | Batch Size
SMA_01 | 2e-5 | 16
SMA_02 | 2e-5 | 32
SMA_03 | 3e-5 | 16
SMA_04 | 3e-5 | 32
SMA_05 | 5e-5 | 16
SMA_06 | 5e-5 | 32
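A minimal sketch of one fine-tuning scenario (SMA_01) using the Huggingface Trainer is shown below. The checkpoint name, the number of epochs, the sequence length, and the column names are assumptions for illustration; only the learning rate of 2e-5 and batch size of 16 come from Table 6.

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Hypothetical checkpoint; the paper only states that an IndoBERT model is fine-tuned.
checkpoint = "indolem/indobert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# train_df and val_df are assumed to come from the split in Section 2.5,
# with a 'text' column (preprocessed tweet) and a 'label' column (0 or 1).
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

train_ds = Dataset.from_pandas(train_df).map(tokenize, batched=True)
val_ds = Dataset.from_pandas(val_df).map(tokenize, batched=True)

# Scenario SMA_01: learning rate 2e-5 and batch size 16 (Table 6).
args = TrainingArguments(
    output_dir="indobert-climate-sentiment",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,              # epoch count is not reported in the paper; 3 is an assumption
    evaluation_strategy="epoch",
)

Trainer(model=model, args=args, train_dataset=train_ds,
        eval_dataset=val_ds, tokenizer=tokenizer).train()
```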
3.1 Testing Results
Based on the hyperparameter tuning described above, Table 7 below shows that model SMA_01 has the highest F1-Score compared to the other models. Model SMA_01 reaches an F1-Score of 95.6%, which indicates that it is the best model for these tuning parameters. A lower batch size helped the model achieve a higher F1-Score.
Table 7. Experiment Results
Model  | Train Accuracy | Validation Accuracy | Testing Accuracy | Precision | Recall | F1-Score
SMA_01 | 0.9530 | 0.8969 | 0.9530 | 0.9640 | 0.9494 | 0.9567
SMA_02 | 0.9512 | 0.8798 | 0.9466 | 0.9451 | 0.9578 | 0.9514
SMA_03 | 0.9392 | 0.8798 | 0.9374 | 0.9630 | 0.9207 | 0.9414
SMA_04 | 0.9080 | 0.8540 | 0.9070 | 0.9473 | 0.8786 | 0.9116
SMA_05 | 0.5427 | 0.5751 | 0.5455 | 0.5455 | 1.0000 | 0.7060
SMA_06 | 0.8362 | 0.8197 | 0.8418 | 0.8793 | 0.8229 | 0.8502
3.2 Analysis of Test Results
Table 8 below shows the results of model SMA_01 from the experiment conducted in this research, as described in detail in the Research Methodology section. The dataset holds 1533 records, divided into a training set of 1087 tweets and validation and test sets of 233 tweets each. From these results it can be concluded that this scenario is the best one, giving an accuracy of 95.3% on the training set of 1087 tweets. On the validation set, the proposed model reached an accuracy of 89.6% over 233 tweets, and on the test set it reached 95.3% over 233 tweets. These training, validation, and test values are the best results obtainable from the proposed model, owing to its lower learning rate and batch size. A lower learning rate and batch size require more time to train the model but tend to produce higher accuracy [18].
Table 8. Model SMA_01 Accuracy Results
Model  | Train Accuracy | Validation Accuracy | Testing Accuracy
SMA_01 | 95.3% | 89.6% | 95.3%
Besides the difference in tuning parameters, the performance and accuracy of every scenario are also affected by the size of the dataset and by potentially mislabelled data, since the labelling process still relies on human annotators.
4. CONCLUSION
This research successfully fine-tuned an IndoBERT model to perform sentiment analysis on the topic of climate change on Twitter in Indonesia. The proposed model achieved its best results with an average F1-Score of 95.6% in the scenario with a learning rate of 2e-5 and a batch size of 16; the same scenario also achieved an average training accuracy of 95.3%, which is attributed to the lower learning rate and batch size. However, the IndoBERT model still has drawbacks, as it did not perform well in every scenario tested in this research. The overall scores of model SMA_05 contrast sharply with the other models because it was unable to predict the negative-labelled records. Future research is needed to find better model scenarios, especially for SMA_05, by exploring the best combination of learning rate and batch size to improve the prediction process of the system.
REFERENCES
[1] “What Is Climate Change? | United Nations.” https://www.un.org/en/climatechange/what-is-climate-change (accessed Aug. 07, 2022).
[2] “About Twitter | Our company purpose, principles, leadership.” https://about.twitter.com/en/who-we-are/our-company (accessed Aug. 07, 2022).
[3] “Twitter’s Daily Active Users Increase By 13 Percent In Q3 2021 | Digital Information World.” https://www.digitalinformationworld.com/2021/10/twitters-daily-active-users-increase-by.html (accessed Aug. 07, 2022).
[4] N. Kankanamge, T. Yigitcanlar, A. Goonetilleke, and M. Kamruzzaman, “Determining disaster severity through social media analysis: Testing the methodology with South East Queensland flood tweets,” Int. J. Disaster Risk Reduct., vol. 42, p. 101360, Jan. 2020, doi: 10.1016/J.IJDRR.2019.101360.
[5] B. Liu, “Sentiment Analysis and Opinion Mining,” vol. 5, no. 1, pp. 1–184, May 2012, doi: 10.2200/S00416ED1V01Y201204HLT016.
[6] A. Layalia Safara Az-Zahra Gunawan and K. Muslim Lhaksamana, “Analisis Sentimen pada Media Sosial Twitter terhadap Penanganan Bencana Banjir di Jawa Barat dengan Metode Jaringan Saraf Tiruan” (Sentiment Analysis on Twitter Social Media on Flood Disaster Management in West Java with the Neural Network Method).
[7] F. Rozi et al., “Analisis Sentimen pada Twitter Mengenai Pasca Bencana Menggunakan Metode Naïve Bayes dengan Fitur N-Gram,” J. Inform. Polinema, vol. 6, no. 2, pp. 33–39, Mar. 2020, doi: 10.33795/JIP.V6I2.316.
[8] J. Devlin, M. W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” NAACL HLT 2019 - 2019 Conf. North Am. Chapter Assoc. Comput. Linguist. Hum. Lang. Technol. - Proc. Conf., vol. 1, pp. 4171–4186, 2019.
[9] R. Rahutomo and B. Pardamean, Finetunning IndoBERT to Understand Indonesian Stock Trader Slang Language.
[10] B. Wilie et al., “IndoNLU: Benchmark and Resources for Evaluating Indonesian Natural Language Understanding,” 2020. [Online]. Available: http://arxiv.org/abs/2009.05387
[11] F. Koto, A. Rahimi, J. H. Lau, and T. Baldwin, “IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP,” pp. 757–770, 2021, doi: 10.18653/v1/2020.coling-main.66.
[12] E. Acuña, “Preprocessing in Data Mining.”
[13] J. Cheng and R. Greiner, “Comparing Bayesian Network Classifiers,” pp. 101–108, 2013. [Online]. Available: http://arxiv.org/abs/1301.6684
[14] M. E. Peters et al., “Deep contextualized word representations,” NAACL HLT 2018 - 2018 Conf. North Am. Chapter Assoc. Comput. Linguist. Hum. Lang. Technol. - Proc. Conf., vol. 1, pp. 2227–2237, Feb. 2018, doi: 10.18653/v1/n18-1202.
[15] “GitHub - IndoNLP/indonlu: The first-ever vast natural language processing benchmark for Indonesian Language. We provide multiple downstream tasks, pre-trained IndoBERT models, and a starter code! (AACL-IJCNLP 2020).” https://github.com/IndoNLP/indonlu (accessed Aug. 07, 2022).
[16] P. Singh, N. Singh, K. K. Singh, and A. Singh, “Diagnosing of disease using machine learning,” Mach. Learn. Internet Med. Things Healthc., pp. 89–111, Jan. 2021, doi: 10.1016/B978-0-12-821229-5.00003-3.
[17] I. Menarianti, “Klasifikasi data mining dalam menentukan pemberian kredit bagi nasabah koperasi,” J. Ilm. Teknosains, vol. 1, no. 1, pp. 1–10, 2015. [Online]. Available: http://e-jurnal.upgrismg.ac.id/index.php/JITEK/article/view/836
[18] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.