BERT Implementation on News Sentiment Analysis and Analysis Benefits on Branding
Muhammad Faris Abdussalam, Donni Richasdy*, Moch Arif Bijaksana Informatics, School of Computing, Telkom University, Bandung, Indonesia
Email: 1[email protected], 2,*[email protected], 3[email protected] Corresponding Author Email: [email protected]
Abstract−The rapid development of information technology makes data processing easy and fast, especially in the business world, and many business brands now use the internet as a marketing medium. A business no longer depends only on its operations. Public opinion in the media, especially in the news, has become an important concern, particularly negative opinion, which indirectly affects the image and product branding of the business. Appropriate tools are therefore needed to help identify and analyze such news. This study aims to identify and analyze negative and positive sentiment in news titles from an Indonesian online news portal using the Bidirectional Encoder Representations from Transformers (BERT) sentiment analysis method, with confusion matrix metrics used to measure how well the model identifies headlines that contain negative and positive indications. The sentiment analysis system identifies and categorizes news easily and quickly while giving good results. The sentiment model achieves an accuracy of 93% in identifying negative and positive news, with an F1-Score of 92% on negative news and 93% on positive news. The system was built to help analyze positive and negative news indications so that alarming news about a brand can be identified for branding purposes.
Keywords: Sentiment Analysis; News; BERT; Confusion Matrix; Branding
1. INTRODUCTION
In recent years, the rapid advancement of information technology has made it easier to access and collect information and to present it to the public, particularly via the internet. One of the most prominent uses of the internet is social media, which indirectly makes information and news easy for Indonesian people to consume [1]. This brings advantages that can be exploited, for example, in branding a business. Branding was already common before the internet and is highly profitable, and the internet lets businesses or influencers easily bring their brand to a broader audience [2].
A brand is also the identity of a business, because it is the primary reflection that the public sees first. With the development of information technology, a business brand can quickly reach the broader community, especially through online news portals that the public can easily access to get the latest information instantly [2]. Not all brands are viewed favorably in society: because information spreads easily, public opinion can quickly reach a broad audience. Such opinions, including harmful content aimed at a particular brand, often go undetected [2]. Negative opinions indirectly affect the image and product branding of the business, so they must be mitigated to keep a good image of the brand in the public eye. This is where technology plays a crucial role, facilitating mitigation through sentiment analysis [3].
Sentiment analysis applies text analytics to data from the internet and other information platforms to retrieve information about consumers' perceptions of a product, service, or brand [4].
Sentiment analysis applies Natural Language Processing (NLP), the field of computer science and linguistics that studies the interaction between computers and human (natural) language in order to extract, transform, and interpret textual information and classify it into different semantic groups [5].
In this study, we analyze indications of positive and negative news in news titles from an Indonesian online news portal using the BERT algorithm, to help mitigate and analyze alarming news. BERT is an open-source, neural-network-based technique for pre-training NLP models [6]. We use BERT with the sentiment analysis technique to classify news titles into two polarities: positive polarity, for news that does not contain harmful elements, and negative polarity, for news that does. The study covers the system design, system testing using the multilingual pre-trained BERT-Base model compared against an Indonesian-only pre-trained model, and an analysis of the most accurate model based on the test results.
This study draws on the following references, both as a theoretical basis and as related international research on sentiment analysis with similar BERT designs. The study "BERT: Pre-training of deep bidirectional transformers for language understanding" uses pre-trained models to improve accuracy and performance, producing accurate results across different language-understanding tasks [6]. In that study, Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova trained BERT and compared it with other algorithms on datasets from various sources, such as the General Language Understanding Evaluation (GLUE), the Stanford Question Answering Dataset (SQuAD), and Situations With Adversarial Generations (SWAG). BERT achieved the highest scores among the compared algorithms, with reported results of 91.0%, %, 93.2%, and 86.3% on these dataset sources [6]. In our research, we use Indonesian-language text only. We also adopt two-polarity detection, following the research "Cross-Cultural Polarity and Emotion Detection Using Sentiment Analysis and Deep Learning on COVID-19 Related Tweets", which performs sentiment analysis with a similar polarity target on reactions from different cultures to a global crisis [7]. A well-known problem in sentiment analysis is noisy text, which the research "Sentiment Analysis of Noisy Malay Text: State of Art, Challenges and Future Work" addresses with several pre-processing techniques; noisy text can interfere with and complicate classification because of ambiguity and differing word meanings [8]. For our evaluation, we adopt the same metrics used for the BERT model in the research "COVID-19 Sensing: Negative Sentiment Analysis on Social Media in China via BERT Model" [9]. Different metrics can be applied to different BERT models, but we specifically use the confusion matrix to visualize the predictions, which gives a direct comparison of the different values.
This study uses the same Bidirectional Encoder Representations from Transformers (BERT) algorithm to perform sentiment analysis, because it helps the system understand context by examining how each word in the input relates to the other words. BERT has a multi-layer bidirectional Transformer encoder architecture [6]. The sentiment analysis system in this study was built to improve news identification specifically for the Indonesian language, so that different models can be tested on the same topic.
2. RESEARCH METHOD
2.1 Research Stages
This paper is organized as follows. The introduction contains the background, problem formulation, objectives and benefits, problem boundaries, research methods, and the organization of the paper. The related studies and theoretical basis section discusses the basic concepts and literature used in the research, such as sentiment analysis, web scraping, branding, and the BERT algorithm. The method design section describes the system design, performance parameters, and device specifications used. The evaluation section presents the results of the system tests and an analysis of those results in terms of accuracy, precision, recall, and F1-Score. Finally, the conclusion section summarizes the analysis of the system tests and gives suggestions for further research to improve the performance of this sentiment analysis system.
2.2 Natural Language Processing (NLP)
Natural Language Processing (NLP) is a branch of computer science that deals with computational linguistics and facilitates interaction between humans and computers. A machine learns the meaning of text written in human language, processes it, and returns results to the user. NLP is increasingly important because it helps humans build models and processes that can retrieve information from speech or text [10].
2.3 Sentiment Analysis
Sentiment analysis is a branch of text mining that aims to extract opinions from text and classify them as positive, negative, or neutral. Most researchers who apply sentiment analysis to social media use social media as their source of datasets [11]. The primary purpose of sentiment analysis is to determine the polarity of natural language through supervised or unsupervised classification. It is commonly used to predict political outcomes, inform marketing strategy, assess the reputation of a product, and analyze social media users [12]. Fig.1 shows the stages of sentiment analysis on news: dataset exploration, preprocessing, fine-tuning, tokenizing, feature normalization, stop words removal, transformation, classification, and evaluation [13].
Figure 1. Sentiment Analysis Process [12]
a. Review Dataset
Reviewing the dataset is the initial stage of sentiment analysis in this research: collecting the data. The dataset consists of text. The more data used for training, the higher the accuracy the model tends to achieve, because it has more reference words and sentences to learn from. However, every dataset contains at least some sentences or words with ambiguous meanings [13].
b. Tokenizing
Tokenizing is the task of splitting a sequence of characters or a sentence into smaller units (tokens), typically words, so that the sentence can be processed further [13].
Table 1. Example Results of the Tokenizing Process
Original Sentence | Tokenized Sentence | Converting to IDs
Heboh! hastag #blacklivesmatter jadi trending di twitter. | ['Heboh', '!', 'hastag', '#', 'black', 'live', 's', 'matter', 'jadi', 'trending', 'di', 'twitter', '.'] | [424, 521, 6143, 53, 753, 12, 53, 634, 867, 4536, 7356, 827, 53]
Table 1 shows an example of tokenizing a sentence. The sentence "Heboh! hastag #blacklivesmatter jadi trending di twitter." is split into tokens, and each token is given a unique ID. After tokenizing, each sentence is represented as a sequence of tokens and IDs that the model can process [13].
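The tokenizing step can be sketched as follows. This is a minimal illustration, not the tokenizer of the BERT models used in this study: it assumes a hypothetical toy vocabulary (`TOY_VOCAB`) with arbitrary IDs, splits text on whitespace and punctuation, and then applies greedy longest-match-first WordPiece segmentation, marking word-internal pieces with "##" in the way BERT tokenizers do (compare '##kam' in Table 6).

```python
import re

# Hypothetical toy vocabulary with arbitrary IDs (NOT a real BERT vocabulary).
TOY_VOCAB = {
    "[UNK]": 0, "Heboh": 1, "!": 2, "hastag": 3, "#": 4, "black": 5,
    "##live": 6, "##s": 7, "##matter": 8, "jadi": 9, "trending": 10,
    "di": 11, "twitter": 12, ".": 13,
}


def basic_split(text):
    """Split text into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def wordpiece(word, vocab):
    """Greedy longest-match-first WordPiece segmentation of a single word."""
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # mark a word-internal piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1  # shrink the candidate from the right
        if match is None:
            return ["[UNK]"]  # no piece matched: the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


def tokenize(text, vocab):
    """Return the WordPiece tokens of `text` and their vocabulary IDs."""
    tokens = [p for w in basic_split(text) for p in wordpiece(w, vocab)]
    return tokens, [vocab[t] for t in tokens]


tokens, ids = tokenize("Heboh! hastag #blacklivesmatter jadi trending di twitter.", TOY_VOCAB)
print(tokens)  # the hashtag word becomes 'black', '##live', '##s', '##matter'
print(ids)
```

Greedy longest-match is why "blacklivesmatter" is split into the longest vocabulary entries available at each position; a real vocabulary of tens of thousands of pieces would produce different splits and IDs.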
c. Normalization
Normalization is the task of cleaning a sentence of typical components found in such texts, such as usernames, hashtags, and URLs (Uniform Resource Locators). Components beginning with "@" are marked as usernames, components beginning with "#" are marked as hashtags, and URL components, usually beginning with HTTP, are marked as links. Emoticons, which are textual expressions of feeling, also affect the sentiment value, so each emoticon is converted into a suitable string. Emoticons usually appear in tweets and other social media posts, generally shown as an icon representing the emoticon [13].
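A minimal sketch of this normalization step, using Python's standard `re` module. The placeholder strings ("URL", "USERNAME") and the two-entry emoticon table are hypothetical choices, not taken from the study. The sketch also removes "!" and "?", so that applied to the Table 2 headline it removes "!" and "#" and yields the normalized sentence shown in that table.

```python
import re

# Hypothetical emoticon-to-string mapping; a real system would use a larger table.
EMOTICONS = {":)": "senyum", ":(": "sedih"}

# Placeholder strings "URL" and "USERNAME" are illustrative assumptions.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")


def normalize(text):
    """Mark URLs and usernames, unwrap hashtags, and convert emoticons to words."""
    text = URL_RE.sub("URL", text)            # replace links with a placeholder
    text = MENTION_RE.sub("USERNAME", text)   # replace @usernames with a placeholder
    text = HASHTAG_RE.sub(r"\1", text)        # keep the hashtag word, drop '#'
    for emoticon, word in EMOTICONS.items():
        text = text.replace(emoticon, f" {word} ")
    text = re.sub(r"[!?]", "", text)          # drop exclamation and question marks
    return re.sub(r"\s+", " ", text).strip()  # collapse repeated whitespace


print(normalize("Heboh! hastag #blacklivesmatter jadi trending di twitter."))
print(normalize("Mantap :) cek https://example.com/a @budi #promo"))
```

URLs are replaced first so that characters inside a link are not later mistaken for usernames, hashtags, or emoticons.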
Table 2. Example Results of the Normalization Process
Original Sentence | After Normalization
Heboh! hastag #blacklivesmatter jadi trending di twitter. | Heboh hastag blacklivesmatter jadi trending di twitter.
Table 2 shows an example of feature normalization on a news headline. The headline contains the "#" symbol, one of the typical components that must be cleaned to improve sentiment analysis accuracy [13].
d. Stemming
Stemming maps words with different morphological forms to a common base form. Stemming in Indonesian is harder than in English because Indonesian has more complex affix variations: prefixes, suffixes, insertions (infixes), and combinations (confixes) [13]. For example, the word "makanan" (food) comes from the base word "makan" (eat) plus the suffix "-an".
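The idea of stripping affixes back to a base form can be illustrated with a deliberately simplified sketch. It is a toy under stated assumptions, not the stemmer used in this study or a standard Indonesian algorithm: it tries at most one prefix and one suffix from small hypothetical lists and accepts a result only if it appears in a small hypothetical root-word dictionary, which is how it maps "makanan" to "makan".

```python
# Toy dictionary-checked affix stripping for Indonesian (hypothetical sketch,
# NOT Nazief-Adriani, Sastrawi, or any production stemmer). It ignores the
# sound changes of prefixes such as "meN-" and "peN-".
ROOT_WORDS = {"makan", "minum", "baca", "ajar"}   # hypothetical root dictionary
PREFIXES = ("di", "ke", "ber", "ter", "se")       # hypothetical prefix subset
SUFFIXES = ("kan", "an", "i", "nya")              # hypothetical suffix subset


def stem(word):
    """Return the first root reachable by stripping at most one prefix and one suffix."""
    word = word.lower()
    for prefix in ("",) + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in ("",) + SUFFIXES:
            if not rest.endswith(suffix):
                continue
            root = rest[: len(rest) - len(suffix)]
            if root in ROOT_WORDS:  # only accept stems found in the dictionary
                return root
    return word  # leave unknown words unchanged


for w in ("makanan", "dibaca", "minumnya", "rumah"):
    print(w, "->", stem(w))
```

Checking candidates against a dictionary prevents over-stemming: a word such as "rumah" (house) is left unchanged instead of being cut to a non-word.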
e. Stopwords Removal
Words that appear too often in a text are poor choices for keywords. Words that occur in as much as 80% of the texts are of no use for information retrieval and should be eliminated. Such frequently occurring words are called stop words: they appear in large numbers and carry little meaning. Examples include "ini", "itu", "adalah", "adanya", "bagaimanapun", and others. Removing stop words reduces the vocabulary, leaving words with a higher weight value [13].
Table 3. Example Results of the Stopwords Removal Process
Original Sentence | After Stopwords Removal
Menjadi PTS terbaik Telkom University meraih peringkat terbaik di Indonesia | PTS Telkom University peringkat terbaik Indonesia
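Stopword removal can be sketched in two ways: filtering against a fixed list, and deriving candidate stopwords from document frequency, in the spirit of the 80% rule described above. The list below is a hypothetical example built from the words named in the text, and the 0.8 threshold is an illustrative reading of that rule rather than a setting reported by the study.

```python
from collections import Counter

# Stopwords named in the text plus a few common function words (hypothetical
# list; real systems use a curated Indonesian stopword list).
STOPWORDS = {"ini", "itu", "adalah", "adanya", "bagaimanapun", "di", "dan", "yang"}


def remove_stopwords(sentence, stopwords=STOPWORDS):
    """Drop every word whose lowercase form is in the stopword set."""
    return " ".join(w for w in sentence.split() if w.lower() not in stopwords)


def frequent_words(documents, threshold=0.8):
    """Words that appear in at least `threshold` (a fraction) of the documents."""
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc.lower().split()))  # count each word once per document
    return {w for w, n in doc_freq.items() if n / len(documents) >= threshold}


print(remove_stopwords("Ini adalah berita terbaik di Indonesia"))
print(frequent_words(["berita ini baru", "ini berita lama", "kabar ini"]))
```

In the second call, "ini" appears in all three documents and is flagged, while "berita" (two of three, about 67%) stays below the 0.8 threshold.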
f. Classification using the Confusion Matrix Metrics
Classification is finding a set of models or functions that describe and distinguish data classes. Its purpose is to predict the class of an object whose class is not yet known; the confusion matrix is used to measure how well the data have been classified [13]. The confusion matrix contains four values: True Negative (TN), the amount of negative data correctly predicted as negative; True Positive (TP), the amount of positive data correctly predicted as positive; False Negative (FN), the amount of positive data incorrectly predicted as negative; and False Positive (FP), the amount of negative data incorrectly predicted as positive [14].
The metrics derived from the confusion matrix are calculated as follows.
1. Accuracy
Accuracy measures the overall proportion of correct predictions in sentiment analysis [14]. It is calculated with equation 1.
$$\text{Accuracy} = \frac{TP + TN}{TN + FP + FN + TP} \tag{1}$$
2. Precision
Precision is the ratio of True Positives (TP) to all data predicted as positive. Its ideal value is close to 100%. The False Positive (FP) count shows how much false positives affect the classification results [14]. Precision is calculated with equation 2.
$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$
3. Recall
Recall is the ratio of True Positives (TP) to all actually positive data, that is, TP plus False Negatives (FN). The FN count shows how much false negatives affect the classification results [14]. Recall is calculated with equation 3.
$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$
4. F1-Score
The best value of the F1-Score is 1 and the worst is 0. A high F1-Score means that both precision and recall are high, because the F1-Score is the harmonic mean of the two and therefore balances them. The F1-Score is also useful when the classes are imbalanced, where accuracy alone can be misleading [14]. The F1-Score is calculated with equation 4.
$$\frac{1}{F1} = \frac{1}{2}\left(\frac{1}{\text{Precision}} + \frac{1}{\text{Recall}}\right) \tag{4}$$
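To make equations (1)–(4) concrete, the sketch below counts TP, TN, FP, and FN from lists of true and predicted labels and computes the four metrics. The function names and the toy label lists are illustrative only; they are not the study's evaluation code or data. Label 1 is treated as the positive class, matching the labeling used in this study (0 negative, 1 positive).

```python
# Illustrative helpers (hypothetical names), not the study's evaluation code.
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN, treating `positive` as the positive class."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 as in equations (1)-(4)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Equation (4) solved for F1: the harmonic mean 2PR / (P + R).
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
print(confusion_counts(y_true, y_pred))  # (2, 3, 1, 2)
print(binary_metrics(y_true, y_pred))
```

With these toy lists, 5 of 8 predictions are correct, so accuracy is 0.625; precision is 2/3, recall is 0.5, and F1 is 4/7 (about 0.571), which also satisfies equation (4): 1/F1 = 0.5 × (1.5 + 2) = 1.75.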
2.4 Bidirectional Encoder Representations from Transformers (BERT)
BERT is designed to pre-train deep bidirectional representations from unlabeled (unsupervised) text [6]. The pre-trained model can then be fine-tuned with an additional output layer to create state-of-the-art models for a wide range of tasks, including supervised (labeled) data classification [6]. According to Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova [6], BERT was designed on the encoder of the Transformer architecture, because this encoder can address tasks such as Neural Machine Translation, Question Answering, Sentiment Analysis, and Text Summarization [15]. Their research produced two BERT models: BERTBASE and BERTLARGE. In a study by Benjamin Muller, Benoît Sagot, and Djamé Seddah, BERTBASE was shown to be usable for lexical normalization, correcting typos and recovering the meaning of abbreviated words [16]. In this system, the BERT models used for classification are BERTBASE-multilingual-cased and the Indonesian-language model 'BERTBASE-indonesian-522m', which uses the same BERTBASE base architecture as BERTBASE-multilingual-cased. Table 4 lists the specifications of both models.
Table 4. BERT Model Specifications
Architecture | Model ID | Model Details
BERTBASE | bert-base-multilingual-cased | 12 layers, 768 hidden, 12 heads, 179 million parameters; supports up to 104 languages (Indonesian included)
BERTBASE | bert-base-indonesian-522m | 12 layers, 768 hidden, 12 heads, 179 million parameters; Indonesian language only

The input representation used by BERT can represent either a single sentence or a pair of sentences in one token sequence [6]. The first token of every input sequence is [CLS], a special symbol added in front of each input; its final hidden state is used as the aggregate representation of the whole sequence for classification tasks. For single-sentence tasks, [CLS] is followed by the sentence's WordPiece tokens and a closing [SEP] token. Fig.2 shows an example of the BERT input representation.
Figure 2. BERT Input Representation [6]
Fig.2 shows a sentence-pair input split into WordPiece tokens; the two sentences are separated by a [SEP] token, and the input sequence also ends with a [SEP] token. Each WordPiece embedding is added to a segment embedding, which marks tokens of the first sentence as 'A' and those of the second sentence as 'B', and to a position embedding for each token [6].
Each input embedding is therefore the sum of three embeddings. The first is the position embedding: BERT learns positional embeddings that express where a word is located in a sentence. Positional embeddings were added to overcome the limitation that the Transformer's self-attention cannot, by itself, distinguish the order of tokens [17]. The second is the segment embedding: because BERT can take a pair of sentences as input, it learns a distinct embedding for each segment to distinguish the first sentence from the second [17]. The third is the token embedding, which learns a vector for each token in the WordPiece vocabulary that is then carried into the fine-tuning process [17].
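The way the three embeddings are combined can be sketched with toy numbers. The helpers and 2-dimensional tables below are hypothetical: in BERT the tables are learned during pre-training and have the hidden size (768 for BERTBASE), and the summed vector is additionally passed through layer normalization, which the sketch omits. The sentence pair "my dog" / "he likes" is illustrative.

```python
def build_pair_input(tokens_a, tokens_b):
    """Lay out a sentence pair as BERT expects: [CLS] A [SEP] B [SEP]."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment 0 ('A') covers [CLS], sentence A, and the first [SEP]; segment 1 ('B') the rest.
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    position_ids = list(range(len(tokens)))
    return tokens, segment_ids, position_ids


def input_embeddings(tokens, segment_ids, position_ids, token_table, segment_table, position_table):
    """Element-wise sum of token, segment, and position embeddings at each position."""
    return [
        [t + s + p for t, s, p in zip(token_table[tok], segment_table[seg], position_table[pos])]
        for tok, seg, pos in zip(tokens, segment_ids, position_ids)
    ]


tokens, segments, positions = build_pair_input(["my", "dog"], ["he", "likes"])
# Toy 2-dimensional embedding tables (hypothetical values, not learned weights).
token_table = {tok: [1.0, 0.0] for tok in tokens}
segment_table = {0: [0.0, 0.0], 1: [0.0, 1.0]}
position_table = [[float(i), 0.0] for i in range(len(tokens))]
embeddings = input_embeddings(tokens, segments, positions, token_table, segment_table, position_table)
print(tokens)    # ['[CLS]', 'my', 'dog', '[SEP]', 'he', 'likes', '[SEP]']
print(segments)  # [0, 0, 0, 0, 1, 1, 1]
print(embeddings[4])  # token 'he': [1.0, 0.0] + segment B [0.0, 1.0] + position 4 [4.0, 0.0]
```

Because the three vectors are simply added, the same token receives a different input vector depending on its segment and position, which is what lets the encoder use order and sentence membership.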
During fine-tuning, the pre-trained BERT model is trained further on the labeled input text. A new layer is added on top of BERT and used for learning and prediction on the training dataset. The text classification technique used in this system is single-sentence classification [18].
Figure 3. Single Sentence Classification [18]
As shown in Fig.3, the final hidden state of the [CLS] token represents the pooled input sequence and has a fixed dimension. This final hidden state is fed into the classification layer, whose weights are the only new parameters added. The classification layer has dimensions K × H, where K is the number of classification labels and H is the size of the hidden state. The probability of each label is calculated with the standard softmax function [19].
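The classification layer described above can be sketched in plain Python: the [CLS] vector C (length H) is multiplied by a K × H weight matrix W, and the resulting K scores are turned into label probabilities with softmax. The function names and toy numbers (H = 4 instead of 768) are hypothetical, chosen only for illustration; an optional bias term is included as an assumption, since many implementations use a standard linear layer.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_cls(cls_hidden, weights, bias=None):
    """Label probabilities softmax(C W^T + b) for a [CLS] vector C.

    `weights` is a K x H list of rows (K labels, H hidden size); the optional
    `bias` is an assumption, not part of the softmax(C W^T) form itself.
    """
    logits = [sum(w * c for w, c in zip(row, cls_hidden)) for row in weights]
    if bias is not None:
        logits = [z + b for z, b in zip(logits, bias)]
    return softmax(logits)


# Toy example with H = 4 and K = 2 (label 0 = negative, label 1 = positive);
# the values are hypothetical, not trained weights.
C = [1.0, 2.0, 0.0, -1.0]
W = [[0.0, 0.0, 0.0, 0.0],
     [1.0, 0.0, 0.0, 0.0]]
probs = classify_cls(C, W)
predicted_label = max(range(len(probs)), key=probs.__getitem__)
print(probs, predicted_label)
```

Here the logits are [0.0, 1.0], so the positive label receives probability e / (1 + e), about 0.731, and is predicted.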
2.5 Web Scraping
Web scraping is the process of retrieving semi-structured documents from the internet, generally web pages built with a markup language such as HTML (Hypertext Markup Language) or XHTML (Extensible Hypertext Markup Language), in order to extract text from the page, directly or indirectly, in part or in whole [20]. In general, web scraping involves four stages.
a. Study the HTML document from the website where the information will be retrieved [20].
b. Understand the navigation mechanism of the website from which the information will be taken, so that it can be imitated by the web scraper application [20].
c. A web scraper is created to automate information retrieval from a specified website [20].
d. The information that has been retrieved will be stored in a file with a particular format, for example, in the form of JSON or CSV [20].
The scraper also needs input in the form of keywords and a web URL or news category relevant to the aims of the dataset.
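Stages (a), (c), and (d), together with keyword filtering, can be sketched with Python's standard library only. The page structure assumed here (headlines in `<h2 class="title">` elements) is hypothetical and does not describe detik.com or any real site; a real scraper must be adapted to the target site's actual HTML and navigation (stage b), which the sketch does not cover. `fetch` shows where downloading would happen but is not called, so the example runs offline on a hypothetical HTML snippet.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download a page; not called in the offline example below."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


class HeadlineParser(HTMLParser):
    """Collect the text of <h2 class="title"> elements (a hypothetical page layout)."""

    def __init__(self, tag="h2", css_class="title"):
        super().__init__()
        self.tag, self.css_class = tag, css_class
        self.inside = False
        self.buffer = []
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.inside, self.buffer = True, []

    def handle_data(self, data):
        if self.inside:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.tag and self.inside:
            self.inside = False
            text = " ".join("".join(self.buffer).split())  # normalize whitespace
            if text:
                self.headlines.append(text)


def scrape_headlines(html, keyword=None):
    """Extract headlines, optionally keeping only those containing `keyword`."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    if keyword is None:
        return parser.headlines
    return [h for h in parser.headlines if keyword.lower() in h.lower()]


def to_csv(headlines, source):
    """Store the results in CSV format with a source column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["source", "title"])
    for headline in headlines:
        writer.writerow([source, headline])
    return buffer.getvalue()


sample_html = (  # hypothetical markup, for illustration only
    '<div><h2 class="title"><a href="/a">Telkom University raih peringkat</a></h2>'
    '<h2 class="other">bukan judul</h2>'
    '<h2 class="title big">Banjir di Bandung</h2></div>'
)
print(to_csv(scrape_headlines(sample_html), "detik.com"))
```

Text inside nested tags such as `<a>` is still collected because the parser only stops collecting when the matching `</h2>` closes.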
2.6 Branding
Branding is the activity of communicating, developing, maintaining, and strengthening a brand to give perspective and interest to those who see it. Branding is closely related to product image, which is key to a brand or company and indirectly shapes how the public views it. If the product's image is good, the public's view of the product will be favorable; if the image is negative, the public's view will be unfavorable. Every brand or business inevitably receives praise and criticism from consumers and the public due to internal and external influences [2].
3. RESULTS AND DISCUSSIONS
This section is an overview of the sentiment analysis system that we built and used for this study. The sentiment analysis system architecture in Fig.4 consists of dataset preparation, pre-processing, fine-tuning using BERT, classification, and evaluation.
Figure 4. System Architecture
3.1 Dataset Preparation
To collect the required data, datasets were scraped using web scraping techniques, and several datasets were taken from dataset provider websites to speed up collection; all data are stored in CSV format.
Table 5. Dataset Schema
RAW Dataset | Train Dataset | Validation Dataset | Test Dataset | Test Dataset Label 0 (negative) | Test Dataset Label 1 (positive)
1600 News Titles | 90% of RAW dataset | 10% of RAW dataset | 160 News Titles | 80 News Titles | 80 News Titles

Table 5 shows the structure of the dataset used in this study. The data were taken from the popular Indonesian news portal detik.com and combined with several scraped datasets covering the period 2015–2020. The dataset is divided into four parts: the raw dataset, the training dataset, the validation dataset, and the test dataset. Of the raw dataset, 90% is used for training and 10% for validation.
3.2 Pre-processing Phase
In the pre-processing phase, the dataset is labeled and then goes through further stages such as feature normalization, stemming, and tokenizing. Feature normalization eliminates useless typical components such as punctuation and URLs [21]. Each news title is labeled according to its polarity and meaning, with 0 as negative and 1 as positive; labeling is performed automatically and then corrected manually to minimize labeling errors. Tokenizing converts every news headline into a sequence of numbers (tokens) matching the vocabulary of the pre-trained BERT model [21]. Examples of tokenizing results from the BERT tokenizer are shown in Table 6.
Table 6. BERT Tokenizer Result Sample
BERT Tokenization Result
['[CLS]', 'pria', 'di', 'medan', 'labuhan', 'tewas', 'diti', '##kam', 'polisi', 'buru', 'pelaku', '[SEP]']
[3, 3907, 1495, 4462, 18147, 4865, 18218, 18362, 4476, 14989, 8652, 1]
In Table 6, each word piece has a number called its WordPiece token ID; [CLS] and [SEP] are special tokens rather than WordPiece tokens of the text. [CLS] represents the entire text sequence, so its representation aggregates all the WordPiece tokens in the sentence. [SEP] is a separator token that marks the boundary between sentences or the end of a sentence [21]. After pre-processing, the dataset is divided into two parts: the training dataset and the validation dataset.
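The normalization and split described in this section can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the regular expressions are an assumption (they also drop any non-ASCII letters), and the seeded shuffle is a hypothetical choice. Applied to the two headlines of Table 7, `clean_title` produces the normalized titles shown in Table 8, and `train_val_split` gives the 90%/10% division of Table 5 (1,440 and 160 items for 1,600 rows).

```python
import random
import re

# Illustrative reconstruction; the regexes and the seed are assumptions.
def clean_title(title):
    """Lowercase a title and strip URLs, punctuation, and extra whitespace."""
    title = re.sub(r"https?://\S+", " ", title)  # remove URLs
    title = title.lower()
    title = re.sub(r"[^0-9a-z\s]", " ", title)   # remove punctuation and symbols
    return re.sub(r"\s+", " ", title).strip()


def train_val_split(rows, train_fraction=0.9, seed=42):
    """Shuffle labeled rows reproducibly and split them into training/validation lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


raw = [
    ("Mendapat jumlah lulusan nganggur terbanyak! kredibilitas PTS Telkom University dipertanyakan?.", 0),
    ("Kemensos kunjungi Penerima Bansos di Kota Bandung begini kondisinya…", 1),
]
cleaned = [(clean_title(title), label) for title, label in raw]
for title, label in cleaned:
    print(label, title)
```

Replacing removed characters with a space and then collapsing whitespace avoids accidentally gluing two words together when punctuation sits between them.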
Examples of news titles before and after pre-processing (normalization and labeling) are shown in Table 7 and Table 8.
Table 7. Original Sentence before Pre-processing
Source | News Title
detik.com | Mendapat jumlah lulusan nganggur terbanyak! kredibilitas PTS Telkom University dipertanyakan?.
detik.com | Kemensos kunjungi Penerima Bansos di Kota Bandung begini kondisinya…
Table 8. Sentences after Pre-processing
Source | News Title | Label
detik.com | mendapat jumlah lulusan nganggur terbanyak kredibilitas pts telkom university dipertanyakan | 0
detik.com | kemensos kunjungi penerima bansos di kota bandung begini kondisinya | 1
3.3 Fine-Tuning Data
Fine-tuning trains the model to determine whether a news title contains harmful elements. The technique used for this classification is Single Sentence Classification [6], which works by placing a [CLS] token at the start of each text so that all WordPiece tokens in the text are represented in [CLS]. The [CLS] representation, which carries the information of all the WordPiece tokens, is then fed into the classification layer to decide whether the text contains positive or negative information [18].
3.4 Classification
In the classification process, the data are used to train a pre-trained model, that is, a model already trained on a large dataset that provides a good-quality starting point. The hyperparameters used are the number of epochs and the batch size. Hyperparameters are variables that affect the model's output and are not changed during model optimization [18]. An epoch is one complete pass over the training dataset, in which every training sample has the opportunity to update the model; an epoch can therefore be interpreted as a training iteration [18]. In this process, we used up to 20 epochs and a batch size of 32, with learning rates of 0.00001, 0.00002, 0.00003, 0.00005, and 0.00007. The system uses 20 epochs because it follows the BERT model with an additional layer [22], that is, a pre-trained BERT model trained on a new dataset, which adds layers and parameters to the BERTBASE model; this configuration uses 20 epochs for training [22]. The learning rate affects the speed of training: a larger learning rate makes training faster, but it can also cause the training loss to fluctuate. Because each model or system has a different number, sequence, and type of layers, the appropriate learning rate for this study had to be determined by trial and error.
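The trial-and-error search over learning rates, combined with the epoch settings evaluated in Section 3.5, can be organized as a simple grid search. The sketch below is hypothetical: `train_and_evaluate` stands in for an actual BERT fine-tuning and validation run, which is not shown, and the fake scoring function exists only to exercise the loop; its scores are not results from this study.

```python
from itertools import product

# Settings reported in this study; the search loop itself is a hypothetical sketch.
EPOCHS = (5, 10, 15, 20)
LEARNING_RATES = (0.00001, 0.00002, 0.00003, 0.00005, 0.00007)
BATCH_SIZE = 32


def grid_search(train_and_evaluate, epochs=EPOCHS, learning_rates=LEARNING_RATES,
                batch_size=BATCH_SIZE):
    """Run every (epochs, learning rate) pair and keep the best validation score.

    `train_and_evaluate` is a caller-supplied (hypothetical) function that
    fine-tunes a model with the given settings and returns its accuracy.
    """
    best_score, best_config = float("-inf"), None
    for n_epochs, lr in product(epochs, learning_rates):
        score = train_and_evaluate(epochs=n_epochs, learning_rate=lr, batch_size=batch_size)
        if score > best_score:
            best_score = score
            best_config = {"epochs": n_epochs, "learning_rate": lr, "batch_size": batch_size}
    return best_score, best_config


# Stand-in for real fine-tuning, used only to exercise the loop.
calls = []


def fake_train_and_evaluate(epochs, learning_rate, batch_size):
    calls.append((epochs, learning_rate))
    return 0.9 - abs(learning_rate - 0.00002) * 1000 - (20 - epochs) * 0.001


score, config = grid_search(fake_train_and_evaluate)
print(score, config)
```

Passing the training routine in as a function keeps the search logic independent of any particular fine-tuning library; the 4 × 5 grid requires 20 full training runs per pre-trained model.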
3.5 Testing Results and Evaluation
This sentiment analysis system aims to classify text correctly and with good accuracy. In this section, we present the evaluation results of the BERT method. System performance was assessed with the confusion matrix metrics: accuracy, precision, recall, and F1-Score. We used four epoch settings (5, 10, 15, and 20), a batch size of 32, and five learning rates: 0.00001, 0.00002, 0.00003, 0.00005, and 0.00007. The same configurations were tested for both pre-trained models, 'bert-base-multilingual-cased' and 'bert-base-indonesian-522M', using 1600 training data and 160 labeled test data.
Figure 5. Performance Results
Figure 6. F1-Score Results on Negative and Positive
Fig.5 and Fig.6 compare the accuracy of 'bert-base-multilingual-cased' and 'bert-base-indonesian-522M'. At a learning rate of 0.00001, 'bert-base-multilingual-cased' reached 92% accuracy and 'bert-base-indonesian-522M' reached 86%. At 0.00002, the multilingual-cased model reached 93%, while the indonesian-522M model reached only 84%. At 0.00003, the multilingual-cased model reached 91%, while the indonesian-522M model reached only 52%. The 0.00005 and 0.00007 configurations gave the lowest results for both pre-trained models, at 50% accuracy. From these results, the 'bert-base-multilingual-cased' pre-trained model gives the best accuracy of the two, so fine-tuning from the multilingual-cased model performs better than fine-tuning from the indonesian-522M model. Based on our benchmark, the highest score of 93% was obtained by the fine-tuned model with a learning rate of 0.00002.
Fig.5 also shows that the best average precision is 93%, obtained at a learning rate of 0.00002 with the multilingual-cased model. The ideal precision value is close to 100%. The average value of each parameter in this system is calculated with equation 5.
$$\text{Macro Avg} = \frac{P_1 + P_2}{2} \tag{5}$$
The variable P1 is the parameter value for class 0 and P2 is the parameter value for class 1. Precision indicates, among the news headlines predicted to contain negative elements (or predicted not to contain them), how many actually do (or do not).
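Equation (5) can be illustrated by computing precision and recall separately for class 0 (negative) and class 1 (positive) and averaging them. The functions and toy labels below are hypothetical and do not reproduce the study's predictions.

```python
# Hypothetical helpers and toy labels, for illustration only.
def class_precision(y_true, y_pred, cls):
    """Share of items predicted as `cls` that truly belong to `cls`."""
    predicted = [t for t, p in zip(y_true, y_pred) if p == cls]
    return sum(t == cls for t in predicted) / len(predicted) if predicted else 0.0


def class_recall(y_true, y_pred, cls):
    """Share of items truly in `cls` that are predicted as `cls`."""
    actual = [p for t, p in zip(y_true, y_pred) if t == cls]
    return sum(p == cls for p in actual) / len(actual) if actual else 0.0


def macro_average(values):
    """Equation (5) generalized: unweighted mean of the per-class values."""
    return sum(values) / len(values)


y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 0]
p1, p2 = (class_precision(y_true, y_pred, c) for c in (0, 1))  # class 0, class 1
macro_precision = macro_average([p1, p2])
macro_recall = macro_average([class_recall(y_true, y_pred, c) for c in (0, 1)])
print(p1, p2, macro_precision, macro_recall)
```

Here P1 = 2/3 and P2 = 3/5, so the macro-averaged precision is about 0.633, while the per-class recalls of 0.5 and 0.75 average to 0.625. Each class counts equally regardless of how many headlines it contains.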
In Fig.5, the best average recall is 93%, obtained at a learning rate of 0.00002 with the multilingual-cased model. Recall indicates what percentage of the news headlines that actually contain negative elements (or actually do not) are predicted as such.
In Fig.5, the best average F1-Score is 92%, obtained at learning rates of 0.00001 and 0.00002 with the multilingual-cased model. The F1-Score combines precision and recall as their harmonic mean and thus indicates the optimal balance between the two. In Fig.6, for Label 0 (Negative), the best F1-Score is 92% at learning rates of 0.00001 and 0.00002 with the multilingual-cased model, while the best score of the indonesian-522m model is 86% at a learning rate of 0.00001, 6 percentage points lower; both models score lowest at learning rates of 0.00005 and 0.00007. For Label 1 (Positive), the best F1-Score is 93% at a learning rate of 0.00002 with the multilingual-cased model, while the best score of the indonesian-522m model is 87% at a learning rate of 0.00001, again 6 percentage points lower; both models again score lowest at learning rates of 0.00005 and 0.00007.
3.5.1 Confusion Matrix Results
Performance parameters are analyzed to assess system performance with a confusion matrix. Fig.7 shows the confusion matrix of the configuration with the highest accuracy in our testing.
Figure 7. Confusion Matrix Results
3.5.2 Evaluation Results
Based on these results, a learning rate of 0.00002 with the multilingual-cased pre-trained model is the best configuration for this sentiment analysis system, with 93% accuracy. The indonesian-522m pre-trained model reaches only 86% accuracy, with a learning rate of 0.00001.
3.5.3 Analysis Insights
Using our best model, we sampled some prediction results to analyze keywords in the headlines. Some words carry strong semantic meaning: for example, the words highlighted in yellow in Table 9, such as "bunuh" (kill) or "pembunuh" (killer), strongly indicate negative sentiment because their meaning leans toward bad rather than good. A single keyword can carry a certain emotion or connotation, which makes it easy to monitor specific wording when identifying sentiment.
Table 9. Predictions Result Samples
Category | Headline | Prediction
unknown | didi bunuh pemain organ tunggal di samarinda karena goda istrinya | 0 (Negative)
detikNews | belasungkawa tel u insiden tewas nya alexander di luar kampus | 0 (Negative)
detikNews | fakta fakta pembunuhan mahasiswa telkom university di karawang | 0 (Negative)
unknown | sertijab kapolsek kapolresta deli serdang ciptakan suasana kondusif | 1 (Positive)
news | sambangi mahfud md dubes pradeep india sangat damai | 1 (Positive)
unknown | ribuan skripsi dan tesis mahasiswa telkom university dibuka ke publik | 1 (Positive)
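The keyword-based monitoring suggested by Table 9 could be prototyped with a simple lexicon check, sketched below in Python. The lexicon holds only the two words named above and is an illustrative assumption, not the authors' method; matching is by substring within word tokens so that affixed forms such as "pembunuhan" are caught, at the cost of possible false positives.

```python
import re
from typing import Iterable, List

# Illustrative lexicon: only the two keywords discussed with Table 9.
NEGATIVE_KEYWORDS = ("bunuh", "pembunuh")


def flag_keywords(headline: str, keywords: Iterable[str] = NEGATIVE_KEYWORDS) -> List[str]:
    """Return the keywords found inside any word token of the (lower-cased) headline."""
    tokens = re.findall(r"\w+", headline.lower())
    # Substring match inside each token catches affixed forms (pem-...-an).
    return [kw for kw in keywords if any(kw in tok for tok in tokens)]
```

Applied to Table 9, `flag_keywords` flags the first and third headlines and none of the positive ones. The second headline, which the model also labels Negative, contains neither keyword, which suggests a keyword list is better used alongside the classifier than in place of it.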
4. CONCLUSION
Based on the research and testing results, the model achieves an accuracy of 93% with the ‘BERT-Base multilingual-cased’ pre-trained model and 86% with the ‘BERT-Base indonesian-522m’ pre-trained model. Both tested models can be used to classify news as positive or negative.
The tested model is expected to help analyze indications of positive and negative news about particular businesses or organizations spreading on Indonesian online news portals, providing quick insights, identification of key emotional triggers, and agent monitoring. In further research, several steps could improve the performance and classification of our sentiment analysis system: increasing the amount of training data to improve sentiment classification accuracy, and studying ambiguous words in human language (noisy text) in depth, since ambiguity and words with different meanings can interfere with and complicate sentiment classification.
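As one possible way to turn per-headline predictions into a brand-level signal, the Python sketch below computes the share of headlines mentioning a brand that the model labels Negative. The function name, the plain substring match on the brand name, and the aggregation itself are illustrative assumptions rather than part of the system evaluated here.

```python
from typing import Iterable, Optional, Tuple


def negative_share(predictions: Iterable[Tuple[str, int]], brand: str) -> Optional[float]:
    """Fraction of brand-mentioning headlines predicted Negative (label 0).

    `predictions` holds (headline, predicted_label) pairs; returns None when
    no headline mentions the brand.
    """
    brand = brand.lower()
    labels = [label for headline, label in predictions if brand in headline.lower()]
    if not labels:
        return None
    return labels.count(0) / len(labels)
```

For the Table 9 samples, the brand string "telkom university" matches two headlines, one predicted Negative and one Positive, giving a share of 0.5. The abbreviation "tel u" in the second headline is not matched, so a real deployment would likely need a list of brand aliases.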
REFERENCES
[1] S. Singh, V. Priscilia, A. Fivaldo, and N. Limantara, “Factors Influencing of Social Media Ads Usage in Indonesia,” in 2022 2nd International Conference on Information Technology and Education (ICIT&E), Jan. 2022, pp. 186–190. doi: 10.1109/ICITE54466.2022.9759845.
[2] J. Fueller, R. Schroll, S. Dennhardt, and K. Hutter, “Social Brand Value and the Value Enhancing Role of Social Media Relationships for Brands,” in 2012 45th Hawaii International Conference on System Sciences, Jan. 2012, pp. 3218–3227. doi: 10.1109/HICSS.2012.533.
[3] T. Miller, “Explanation in artificial intelligence: Insights from the social sciences,” Artificial Intelligence, vol. 267, pp. 1–38, Feb. 2019, doi: 10.1016/j.artint.2018.07.007.
[4] M. Ahmad, S. Aftab, and I. Ali, “Sentiment Analysis of Tweets using SVM,” International Journal of Computer Applications, vol. 177, no. 5, pp. 25–29, Nov. 2017, doi: 10.5120/ijca2017915758.
[5] P. Klosowski, “Deep Learning for Natural Language Processing and Language Modelling,” in 2018 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA), Sep. 2018, pp. 223–228. doi: 10.23919/SPA.2018.8563389.
[6] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” Oct. 2018, [Online]. Available: http://arxiv.org/abs/1810.04805
[7] A. S. Imran, S. M. Daudpota, Z. Kastrati, and R. Batra, “Cross-Cultural Polarity and Emotion Detection Using Sentiment Analysis and Deep Learning on COVID-19 Related Tweets,” IEEE Access, vol. 8, pp. 181074–181090, 2020, doi: 10.1109/ACCESS.2020.3027350.
[8] M. F. R. Abu Bakar, N. Idris, L. Shuib, and N. Khamis, “Sentiment Analysis of Noisy Malay Text: State of Art, Challenges and Future Work,” IEEE Access, vol. 8, pp. 24687–24696, 2020, doi: 10.1109/ACCESS.2020.2968955.
[9] T. Wang, K. Lu, K. P. Chow, and Q. Zhu, “COVID-19 Sensing: Negative Sentiment Analysis on Social Media in China via BERT Model,” IEEE Access, vol. 8, pp. 138162–138169, 2020, doi: 10.1109/ACCESS.2020.3012595.
[10] A. Jain, G. Kulkarni, and V. Shah, “Natural Language Processing,” International Journal of Computer Sciences and Engineering, vol. 6, no. 1, pp. 161–167, Jan. 2018, doi: 10.26438/ijcse/v6i1.161167.
[11] A. A. Lutfi, A. E. Permanasari, and S. Fauziati, “Corrigendum: Sentiment Analysis in the Sales Review of Indonesian Marketplace by Utilizing Support Vector Machine,” Journal of Information Systems Engineering and Business Intelligence, vol. 4, no. 2, p. 169, Oct. 2018, doi: 10.20473/jisebi.4.2.169.
[12] A. Rasool, R. Tao, K. Marjan, and T. Naveed, “Twitter Sentiment Analysis: A Case Study for Apparel Brands,” Journal of Physics: Conference Series, vol. 1176, p. 022015, Mar. 2019, doi: 10.1088/1742-6596/1176/2/022015.
[13] R. Julianto, E. D. Bintari, and I. Indrianti, “Analisis Sentimen Layanan Provider Telepon Seluler pada Twitter Menggunakan Metode Naïve Bayesian Classification,” Journal of Big Data Analytic and Artificial Intelligence, vol. 3, no. 1, pp. 23–30, 2017.
[14] J. Đ. Novaković, A. Veljović, S. S. Ilić, Ž. Papić, and M. Tomović, “Evaluation of Classification Models in Machine Learning,” 2017.
[15] Y. Iwasaki, A. Yamashita, Y. Konno, and K. Matsubayashi, “Japanese Abstractive Text Summarization using BERT,” Advances in Science, Technology and Engineering Systems Journal, vol. 5, no. 6, pp. 1674–1682, Dec. 2020, doi: 10.25046/aj0506199.
[16] B. Muller, B. Sagot, and D. Seddah, “Enhancing BERT for Lexical Normalization,” in Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019), 2019, pp. 297–306. doi: 10.18653/v1/D19-5539.
[17] Y. Wu et al., “Demystifying Learning Rate Policies for High Accuracy Training of Deep Neural Networks,” in 2019 IEEE International Conference on Big Data (Big Data), Dec. 2019, pp. 1971–1980. doi: 10.1109/BigData47090.2019.9006104.
[18] S. Yashu, “BERT Explained – A list of Frequently Asked Questions,” Blog, Let the Machines Learn, Jun. 12, 2019. https://yashuseth.wordpress.com/2019/06/12/bert-explained-faqs-understand-bert-working/ (accessed Jul. 27, 2022).
[19] A. Padmanabhan, “BERT (Language Model),” Devopedia, Jun. 30, 2021. https://devopedia.org/bert-language-model (accessed Jul. 27, 2022).
[20] A. Priyanto and M. R. Ma’arif, “Implementasi Web Scrapping dan Text Mining untuk Akuisisi dan Kategorisasi Informasi dari Internet (Studi Kasus: Tutorial Hidroponik),” Indonesian Journal of Information Systems, vol. 1, no. 1, pp. 25–33, Aug. 2018, doi: 10.24002/ijis.v1i1.1664.
[21] Y. Qiao, C. Xiong, Z. Liu, and Z. Liu, “Understanding the Behaviors of BERT in Ranking,” Apr. 2019.
[22] M. Seo, A. Kembhavi, A. Farhadi, and H. Hajishirzi, “Bidirectional Attention Flow for Machine Comprehension,” Nov. 2016.