
Xiaomi Smartphone Sentiment Analysis on Twitter Social Media Using IndoBERT

Priyan Fadhil Supriyadi1,*, Yuliant Sibaroni2
School of Computing, Telkom University, Bandung, Indonesia

Email: 1,*priyanfadhil@student.telkomuniversity.ac.id, 2yuliant@telkomuniversity.ac.id
Corresponding Author Email: priyanfadhil@student.telkomuniversity.ac.id

Submitted 19-01-2023; Accepted 24-01-2023; Published 17-02-2023

Abstract

The extraordinary evolution of technology has resulted in smartphones becoming important devices in people's daily lives. As a result, today's smartphones impact many people's lives, with more and more people owning smartphones. One of the most popular smartphone products today is Xiaomi. This popularity cannot be separated from the various opinions expressed on Twitter. Twitter is a social medium that makes it easy for people to express opinions, called sentiments, regarding Xiaomi products. Sentiment analysis is needed to classify the various opinions on Twitter into positive, neutral, and negative classes. This study aims to analyze the sentiment of public opinion on Xiaomi smartphone products on Twitter social media. The models used in this study were BERT and IndoBERT because they produced good performance in previous studies. The stages of this study consisted of collecting data, preprocessing, splitting training and test data, building BERT and IndoBERT models to detect sentiment, and carrying out training and testing. Test results using IndoBERT achieve very good accuracy, above 90%. The sentiment classification results for Xiaomi smartphone products show that positive sentiment dominates the battery aspect, with a positive percentage of 78%, while neutral sentiment is 4% and negative sentiment is 18%. In the camera aspect, positive sentiment dominates with 68%, while neutral sentiment is 18% and negative sentiment is 14%. On the screen aspect, positive sentiment dominates with 67%, neutral sentiment is 10%, and negative sentiment is 23%. In the RAM aspect, positive sentiment dominates with 76%, while neutral sentiment is 17% and negative sentiment is 7%. The highest number of positive sentiments is in the camera aspect, with 1935 positive sentiments out of 2830 data. The sentiment analysis results can be used as an evaluation along with insights for the Xiaomi company so that, in the future, the company can maintain and even improve the quality of the aspects that smartphone users like about Xiaomi products, namely the camera.

Keywords: Xiaomi Smartphones; Sentiment Analysis; BERT; IndoBERT; Twitter

1. INTRODUCTION

In recent years, the extraordinary evolution of technology has resulted in smartphones becoming important devices in people's daily lives. Information and communication are now within a "click" distance, making life easier and more practical. As a result, today's smartphones impact many people's lives, with more and more people owning smartphones.

Based on data from the Statista Research Department in 2022, in 2020 as many as 67.15% of the population in Indonesia owned at least one smartphone [1]. The use of smartphones in Indonesia is expected to continue to grow yearly.

Smartphone products have sprung up a lot in Indonesia, where one of the most popular products today is Xiaomi.

The brand releases various products such as gadgets, laptops, smartwatches, and televisions. Every product launched is sure to receive various kinds of opinions/sentiments from the public [2]. Lei Jun founded Xiaomi in 2010 with the vision of "innovation for everyone".

Based on a market research report from Canalys for the first quarter of 2021 and 2022, Xiaomi holds the third position among the 5 (five) largest vendors on the market [3]. According to the report, Xiaomi's market share is recorded at 14% in the first quarter of 2021 and 13% in the first quarter of 2022. Despite a 20% decline, Xiaomi still maintains its third position, shipping 39.2 million smartphone units to various regions in the first quarter of 2022.

Along with the increase in smartphone users, the development of social media has changed the way humans communicate. Many people use social media to express their opinions and other matters of public concern [4].

Such things are often called sentiments. Sentiment analysis is needed in order to understand, extract, and process textual data automatically so that sentiment information is obtained in an opinion sentence [5].

A technique that aims to detect opinions about a subject (for example, a product or an organization) in a data set is called sentiment analysis [6]. It is also a process for determining sentiment expressed in text and divided into several categories: positive, neutral, or negative [7]. Sentiment analysis assesses the sentiment values of a text document, whether the entire document, paragraphs, sentences, or clauses [8].

Apart from extracting sentiments, people usually want to know when a sentiment change occurred and what caused it. This is important because, by knowing what causes sentiment to change, the parties concerned can make better decisions [9]. In the business industry, sentiment analysis is used to view each customer's review or opinion, which provides customer information that is useful in making business decisions [10].

One tool that performs sentiment classification is BERT. BERT is a neural network-based technique for pre-training natural language models [11]. BERT trains a language model on the whole set of words in a sentence or query, allowing the model to learn the context of a word from the words around it [12]. Bidirectional Encoder Representations from Transformers is a pre-trained contextual word representation model based on the Masked Language Model (MLM), using bidirectional Transformers [13].


The BERT model architecture is a multi-layer bidirectional Transformer encoder [14].

Transformers follow this architecture using self-attention and point-wise, fully connected layers stacked in the encoder and decoder [15]. There are 2 steps in the BERT framework, namely pre-training and fine-tuning. BERT pre-training does not use the traditional left-to-right or right-to-left method but uses Masked Language Modeling (MLM) and Next Sentence Prediction (NSP) on the pre-training data [16]. MLM fills in the blanks, where the model uses the context words around the masked token to predict what the word should be, while NSP predicts whether the second of two given sentences follows the first. After pre-training, BERT is fine-tuned: fine-tuning is initialized with the previously trained parameters, and all parameters are fine-tuned using labeled data from downstream tasks [17]. BERT is pre-trained using 800 million words of BookCorpus data and 2.5 billion words of English Wikipedia. With rich pre-training data and pre-training tasks that make the BERT model understand every word deeply, only fine-tuning is needed to use BERT in various tasks [18].
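To illustrate how MLM uses surrounding context to recover a masked word, the following is a minimal sketch, assuming the Hugging Face transformers library and the publicly available indobenchmark/indobert-base-p1 checkpoint; the example sentence is illustrative only.

```python
# Minimal masked-language-model sketch; the checkpoint and sentence are assumptions.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="indobenchmark/indobert-base-p1")

# The model predicts the token hidden behind [MASK] from the words around it.
for prediction in fill_mask("kamera xiaomi memang [MASK] untuk memfoto langit"):
    print(prediction["token_str"], round(prediction["score"], 3))
```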

IndoBERT is the Indonesian version of the BERT model [12]. IndoBERT provides contextual embeddings and is the most advanced language model for Indonesian based on the BERT model [19]. Previous studies have used the BERT and IndoBERT models to conduct product sentiment analysis. It was stated that IndoBERT achieves the highest F1 score in labeling classification among 5 other models, namely Naive Bayes, Logistic Regression, BiLSTM w/FastText, MBERT, and MalayBERT, reaching 84.13% [20]. In addition, other research produced an F1 score, obtained from precision and recall, of 98.9% and an accuracy value of 99%, concluding that the pre-trained BERT method is very effective for implementing sentiment analysis [21]. Furthermore, some studies produce a fairly high accuracy of 73.7%, which has proven good compared to using the Naïve Bayes algorithm for the classification process [22]. Other studies report a highest accuracy of 84% [23].

This topic was chosen because, in 2021, research was carried out on Xiaomi-branded products to determine public opinion towards Xiaomi products. It used the Naïve Bayes method with 2,078 tweets and produced a fairly good accuracy of 71.88%, with a sentiment distribution of 39% positive, 51% neutral, and 10% negative [2]. This study aims to conduct a sentiment analysis of public opinion on Xiaomi smartphone products on Twitter social media using IndoBERT, with hyperparameters that optimize the performance of IndoBERT to produce excellent accuracy. The BERT and IndoBERT methods were used because they performed well in previous studies. The BERT method has advantages over the CNN, RNN, OpenAI GPT, and ELMo methods [18]. Likewise, the IndoBERT method outperforms 5 other models, namely Naive Bayes, Logistic Regression, BiLSTM w/FastText, MBERT, and MalayBERT [20].

2. RESEARCH METHODOLOGY

2.1 Research Stages

The following is a system design that will be built in this study.

Figure 1. Overview of Research Flow

The flow chart above graphically illustrates the steps and sequence of procedures in this study. The first step is to collect data with the input, namely datasets. After that, preprocessing of the dataset will be carried out. Then do the data split to divide the dataset into two parts: training data and test data. After that, classification will be carried out using the BERT and IndoBERT algorithms. Finally, a performance evaluation was carried out.

2.2 Data Collection

The data collected by the researchers are tweets containing the keyword "Xiaomi" carried out using a crawling technique.

Crawling is retrieving data from URL pages using an API to retrieve large datasets. Duplicate data and unnecessary columns are then deleted, the data are grouped based on aspects, and the dataset is labeled manually by 2 people. Conducting aspect-based sentiment analysis can make it easier for readers to determine opinions based on the category of the product being analyzed [24]. Labeling is done to categorize tweets that contain negative, neutral, and positive sentences. Labels are generalized to "0" for positive sentiment, "1" for neutral sentiment, and "2" for negative sentiment. Labeling is necessary because the supervised method can see examples and produce generalizations so that the output will produce the desired label [25].
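As an illustration of this collection step, the following is a minimal sketch assuming the snscrape library stands in for the crawling tool (the text only mentions an API and a scraping library); the query, date range, and row limit are illustrative assumptions rather than the exact settings used in this study.

```python
# Hypothetical crawling sketch with snscrape; query and limits are assumptions.
import snscrape.modules.twitter as sntwitter
import pandas as pd

rows = []
query = "xiaomi lang:id since:2022-01-01 until:2022-12-31"
for i, tweet in enumerate(sntwitter.TwitterSearchScraper(query).get_items()):
    if i >= 5000:  # stop after an illustrative number of tweets
        break
    rows.append({"date": tweet.date, "username": tweet.user.username, "tweet": tweet.content})

df = pd.DataFrame(rows).drop_duplicates(subset="tweet")  # remove duplicate tweets
df.to_csv("xiaomi_tweets.csv", index=False)              # labeled manually afterwards (0/1/2)
```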

2.3 Dataset Preprocessing

Data preprocessing is carried out to convert unstructured datasets into structured ones and prepare datasets for the training process. The preprocessing is case folding, data cleaning, stopword removal, stemming, and tokenization.

Figure 2. Dataset Preprocessing

a. Case Folding

Case folding is the process of converting all letters in the dataset to lowercase. Mapping everything to lowercase is helpful for the generalization process [26].

b. Data Cleaning

Data cleaning aims to perform URL removal, i.e., removing links in each sentence; cleaning of unescaped HTML, i.e., removing HTML tags in sentences; mention removal, i.e., removing every word with the prefix "@"; number removal, i.e., removing the numbers in each sentence; and punctuation removal, i.e., removing all punctuation marks in sentences.

c. Tokenization

Tokenization aims to separate text into minimal meaningful components. Tokenization must be performed in text preprocessing to perform any type of analysis [27].

d. Stopword Removal

Stopword removal aims to remove words that have no meaning. Stopwords are general words with no or less meaning than other keywords [27].

e. Stemming

Stemming aims to extract words to return to their basic form [27].

2.4 Data Splits

Data splitting is a process that aims to divide the dataset into two parts: training data and test data. The training data will be used to train the BERT and IndoBERT classification models, while the test data will be used in the model evaluation process. This process uses the Python library scikit-learn. The split uses cross-validation, which repeats the sampling of the test data k times and requires a different sample each time a test is performed [30]. The training set will be divided into k smaller sets.
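A minimal sketch of this split, assuming scikit-learn's KFold; the placeholder lists below stand in for the preprocessed tweets and their labels.

```python
# k-fold split sketch; texts and labels are placeholders, not the study's data.
from sklearn.model_selection import KFold

texts = [f"tweet {i}" for i in range(100)]   # placeholder preprocessed tweets
labels = [i % 3 for i in range(100)]         # placeholder labels: 0 pos, 1 neu, 2 neg

kfold = KFold(n_splits=5, shuffle=True, random_state=42)  # k folds (k = 5, Section 2.9)
for train_idx, test_idx in kfold.split(texts):
    train_texts = [texts[i] for i in train_idx]  # used to fine-tune BERT/IndoBERT
    test_texts = [texts[i] for i in test_idx]    # held out for evaluation
```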

2.5 BERT

To use BERT for sentiment analysis, an output layer must be added so that the BERT model can perform the classification task. The model is then fine-tuned using a dataset that has gone through preprocessing. Most fine-tuning hyperparameters are the same as in pre-training, except for the number of epochs, batch size, and learning rate. The dropout probability is always kept at 0.1. Based on previous research, the most optimal hyperparameter values that work well in all tasks are in the following value ranges [16] (a brief code sketch follows the list):

a. Number of epochs: 2, 3, 4
b. Batch sizes: 16, 32

c. Learning rate (Adam): 5e-5, 3e-5, 2e-5
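A hedged sketch of adding the classification output layer and enumerating the candidate hyperparameters listed above, assuming the Hugging Face transformers library; the checkpoint name is an assumption, since the text does not state which pre-trained BERT variant was used.

```python
# Sketch of a BERT classification head for 3 sentiment classes; checkpoint is assumed.
from transformers import BertForSequenceClassification, BertTokenizer

checkpoint = "bert-base-multilingual-cased"            # assumed pre-trained BERT checkpoint
tokenizer = BertTokenizer.from_pretrained(checkpoint)
model = BertForSequenceClassification.from_pretrained(
    checkpoint,
    num_labels=3,              # positive (0), neutral (1), negative (2)
    hidden_dropout_prob=0.1,   # dropout probability kept at 0.1
)

# Candidate hyperparameter values to be tried during fine-tuning.
param_grid = {"epochs": [2, 3, 4], "batch_size": [16, 32], "learning_rate": [5e-5, 3e-5, 2e-5]}
```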

2.6 IndoBERT

IndoBERT has the same model architecture as BERT. The process involved is fine-tuning, i.e., training on new data to create a model for the specific dataset using the preprocessed data. The list of alternative parameter values to be used is as follows:

a. Number of epochs: 2, 3, 4
b. Batch sizes: 16, 32
c. Learning rate (Adam): 5e-5, 3e-5, 2e-5

2.7 Evaluation

The evaluation stage aims to see the results of the sentiment analysis. This study uses evaluation metrics to measure the quality of the trained classification model. Evaluation metrics are important in achieving an optimal classifier during classification training [28]. The evaluation metrics used in this study are as follows; a short illustrative sketch follows the list.

a. Confusion Matrix


The machine learning concept in which there is information about the actual and predicted classifications made by a classification system is called the confusion matrix [29]. There are two dimensions to the confusion matrix: one dimension is indexed by the class predicted by the classifier, and the other by the actual class of an object.

Table 1. Confusion Matrix

                           True Positive Class    True Negative Class
Positive Class Prediction  True positives (tp)    False negatives (fn)
Negative Class Prediction  False positives (fp)   True negatives (tn)

Based on the table above, true positives (tp) and true negatives (tn) show the number of positive and negative examples that are accurately classified. Meanwhile, false positives (fp) and false negatives (fn) respectively show the number of inaccurately classified negative and positive examples.

b. Accuracy

The accuracy metric will calculate the ratio of accurate predictions to the total number of instances that have been evaluated [28]. The formula defines accuracy:

Accuracy = (tp + tn) / (tp + fp + tn + fn) (1)

c. Cross Validation

Cross-validation, also known as k-fold cross-validation, is a technique that estimates training error by performing tests on test data. The trick is to repeat the sampling of the test data k times and require a different sample each time the test is performed [30]. The training set will be divided into k smaller sets. The procedure below is implemented for each of the k folds:

Figure 3. Cross Validation Illustration

As a result, the model will be validated on the remaining data (i.e., this will be used as a test set to calculate accuracy).
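To make these metrics concrete, the following is a small sketch assuming scikit-learn; the label vectors are illustrative placeholders, not results from this study.

```python
# Confusion matrix and accuracy on placeholder labels (0 pos, 1 neu, 2 neg).
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [0, 0, 1, 1, 2, 2, 2, 0]  # actual labels
y_pred = [0, 1, 1, 1, 2, 2, 0, 0]  # labels predicted by the classifier

print(confusion_matrix(y_true, y_pred))  # rows: actual class, columns: predicted class
print(accuracy_score(y_true, y_pred))    # fraction of correct predictions: 6 / 8 = 0.75
```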

2.8 Data Characteristics

The dataset in this study consists of messages posted to Twitter (tweets), collected using a data scraping technique run in Google Colaboratory notebooks with a scraping library. The dataset contains smartphone users' opinions of Xiaomi smartphone products, retrieved with the keywords "Xiaomi" and "redmi." Four aspects are reviewed in this study: "battery," "camera," "screen," and "ram." Sentiment labeling is done manually on the dataset tweets based on the aspect being reviewed. There are 3 labels: positive, neutral, and negative.

The dataset consists of 3,801 records with date, aspect, username, and tweet columns, but only the tweet column is used. The dataset is then divided into 4 parts based on the reviewed aspects. The battery aspect contains 976 records, with the following label distribution:

Figure 4. Battery Aspect Sentiment

The sentiment on the battery aspect has a positive percentage of 33%, then a neutral percentage of 34%, and a negative percentage of 34%.


The camera aspect contains 1,560 records, with the following label distribution:

Figure 5. Camera Aspect Sentiment

The sentiment of the camera aspect has a positive percentage of 33%, then a neutral percentage of 33%, and a negative percentage of 33%.

The screen aspect contains 338 records, with the following label distribution:

Figure 6. Screen Aspect Sentiment

The sentiment of the screen aspect has a positive percentage of 33%, then a neutral percentage of 33%, and a negative percentage of 33%.

The RAM aspect contains 927 records, with the following label distribution:

Figure 7. RAM Aspect Sentiment

The sentiment of the ram aspect has a positive percentage of 34%, then a neutral percentage of 34%, and a negative percentage of 32%.

2.9 Data Pre-processing

Data preprocessing handles imperfect data. The preprocessing carried out in this study was case folding, data cleaning, stopword removal, stemming, and tokenization. Preprocessing is done automatically through the Google Colaboratory service using the Python programming language. The following shows how the dataset changes during preprocessing.

2.9.1 Case Folding

The case folding stage aims to make all forms of letters in the dataset lowercase. The following table is an example of case folding results.


Table 2. Result Case Folding

Before: @SobatHp Kamera Xiaomi 9 memang bagus untuk memfoto langit #KameraXiaomi
After: @sobathp kamera xiaomi memang bagus untuk memfoto langit #kameraxiaomi

It can be seen in the table above that words containing capital letters, such as 'Kamera' and 'Xiaomi,' will be converted into lowercase letters to become 'kamera' and 'xiaomi.'

2.9.2 Data Cleaning

The data cleaning stage aims to remove links in each sentence, remove HTML tags in sentences, and mention and hashtag removal, namely removing every word with the prefix "@" or "#." The following table is an example of data cleaning results.

Table 3. Results of Data Cleaning

Before: @sobathp kamera xiaomi memang bagus untuk memfoto langit #kameraxiaomi
After: kamera xiaomi memang bagus untuk memfoto langit

It can be seen in the table above that mentions and hashtags such as '@sobathp' and '#kameraxiaomi' are deleted.

2.9.3 Tokenization

The tokenization stage aims to separate the text into minimal meaningful units. The following table is an example of tokenization results.

Table 4. Tokenization results

Before: kamera xiaomi memang bagus untuk memfoto langit
After: [‘xiaomi’, ‘memang’, ‘bagus’, ‘untuk’, ‘memfoto’, ‘langit’]

It can be seen from the text in the table above that each word is separated into word units.

2.9.4 Stopword Removal

This stage aims to remove words that have no meaning. Researchers use Sastrawi, a simple Python library that defines a list of stopwords to be used at this stage. The following table is an example of the results of stopword removal.

Table 5. Stopword Removal Results

Before After

['Xiaomi, 'indeed,' 'good,' 'for,' 'to photograph,' 'sky']

[‘xiaomi’, ‘memang’, ‘bagus’,

‘memfoto’, ‘langit’]

It can be seen in the table above that words that have no meaning, such as 'untuk' will be deleted.

2.9.5 Stemming

This stage aims to extract basic words. Researchers use Sastrawi, a simple Python library that allows researchers to reduce inflected words in Indonesian to their basic forms. The following table is an example of stemming results.

Table 6. Stemming Results

Before: Xiaomi memang bagus memfoto langit
After: Xiaomi memang bagus foto langit

It can be seen in the table above that words are changed into root words, such as the word 'memfoto' being changed to the word 'foto'.
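Pulling subsections 2.9.1 to 2.9.5 together, the following is a condensed sketch of the preprocessing pipeline, assuming the PySastrawi library for stopword removal and stemming and simple regular expressions for the cleaning rules; the ordering is slightly condensed, and the example tweet is the one used in Tables 2 to 6.

```python
# Condensed preprocessing sketch; regex rules and step ordering are assumptions.
import re
from Sastrawi.StopWordRemover.StopWordRemoverFactory import StopWordRemoverFactory
from Sastrawi.Stemmer.StemmerFactory import StemmerFactory

stopword_remover = StopWordRemoverFactory().create_stop_word_remover()
stemmer = StemmerFactory().create_stemmer()

def preprocess(tweet):
    text = tweet.lower()                   # case folding
    text = re.sub(r"http\S+", " ", text)   # URL removal
    text = re.sub(r"[@#]\w+", " ", text)   # mention and hashtag removal
    text = re.sub(r"[^a-z\s]", " ", text)  # number and punctuation removal
    text = stopword_remover.remove(text)   # stopword removal (Sastrawi)
    text = stemmer.stem(text)              # stemming to root words (Sastrawi)
    return text.split()                    # tokenization into word units

print(preprocess("@SobatHp Kamera Xiaomi 9 memang bagus untuk memfoto langit #KameraXiaomi"))
```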

Then the dataset is divided into 2 parts: the training data and the test dataset. The training data is further divided into 2 parts: a training dataset and a validation dataset. The training dataset is used to train the model, and the validation dataset is used for the model validation process, while the test dataset is used to test models that have been trained on the training dataset. The training data is divided using the k-fold cross-validation rule, which in each of the k folds splits the data into k-1 training batches and 1 test batch. The k used in this training is five, so in each iteration the data is divided into 80% for the training dataset and 20% for the test dataset.


3. RESULTS AND DISCUSSION

3.1 Classification Models

After preparing the dataset, researchers need to do two things before training the data: tokenizing the text sentences and creating batches of tokens using a DataLoader for training and testing. This is done so that the data can be iterated over, which simplifies model learning.

Then the researcher needs to select hyperparameters to fine-tune the model. The hyperparameters selected are batch size, learning rate (AdamW), and number of epochs. The test scheme is carried out by trying one hyperparameter at a time; hyperparameters not being tested are set to their default or smallest values. A hyperparameter value is selected if it produces the highest accuracy among the candidate values.
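As an illustration of this setup, the following is a hedged sketch assuming the Hugging Face transformers library and the publicly available indobenchmark/indobert-base-p1 checkpoint; the selected hyperparameters (batch size 16, 4 epochs, learning rate 3e-5) come from the results reported in this section, while the checkpoint name and placeholder data are assumptions.

```python
# Fine-tuning sketch; checkpoint and training texts are illustrative assumptions.
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "indobenchmark/indobert-base-p1"  # assumed IndoBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=3)

train_texts = ["kamera xiaomi memang bagus", "baterai xiaomi cepat habis"]  # placeholders
train_labels = [0, 2]                                                        # placeholders

encodings = tokenizer(train_texts, padding=True, truncation=True, return_tensors="pt")
dataset = TensorDataset(encodings["input_ids"], encodings["attention_mask"],
                        torch.tensor(train_labels))
loader = DataLoader(dataset, batch_size=16, shuffle=True)   # selected batch size

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)  # selected learning rate
model.train()
for epoch in range(4):                                      # selected number of epochs
    for input_ids, attention_mask, labels in loader:
        optimizer.zero_grad()
        outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
        outputs.loss.backward()                             # classification loss from the head
        optimizer.step()
```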

From the results of the tests carried out, the researcher found that batch size with a value of 16, epoch with a value of 4, and learning rate with a value of 3e-5 produced the best accuracy for the BERT and IndoBERT models in this study. Therefore researchers will use these hyperparameters to fine-tune the BERT and IndoBERT models. The following table will show the BERT model's training and validation accuracy results.

Table 7. Best Hyperparameter BERT Accuracy

fold epoch battery-train battery-val camera-train camera-val screen-train screen-val ram-train ram-val

1 1 0.58 0.88 0.56 0.86 0.37 0.68 0.55 0.85

1 2 0.89 0.86 0.93 0.93 0.81 0.93 0.88 0.86

1 3 0.92 0.90 0.95 0.92 0.90 0.90 0.93 0.93

1 4 0.97 0.90 0.97 0.93 0.98 0.88 0.97 0.93

2 1 0.62 0.84 0.71 0.90 0.38 0.54 0.47 0.76

2 2 0.86 0.93 0.93 0.89 0.63 0.75 0.88 0.94

2 3 0.94 0.92 0.95 0.89 0.78 0.89 0.94 0.94

2 4 0.97 0.93 0.96 0.93 0.93 0.94 0.96 0.96

3 1 0.50 0.67 0.66 0.93 0.41 0.61 0.38 0.36

3 2 0.79 0.88 0.93 0.93 0.78 0.88 0.78 0.94

3 3 0.92 0.89 0.96 0.94 0.95 0.88 0.93 0.94

3 4 0.96 0.88 0.97 0.93 0.97 0.90 0.96 0.93

4 1 0.49 0.58 0.59 0.83 0.40 0.59 0.53 0.89

4 2 0.77 0.91 0.89 0.94 0.73 0.64 0.92 0.94

4 3 0.92 0.93 0.96 0.93 0.90 0.88 0.95 0.95

4 4 0.95 0.93 0.98 0.93 0.97 0.85 0.97 0.92

5 1 0.54 0.76 0.67 0.88 0.32 0.49 0.51 0.82

5 2 0.84 0.85 0.91 0.94 0.65 0.88 0.91 0.92

5 3 0.92 0.87 0.95 0.94 0.93 0.88 0.96 0.97

5 4 0.95 0.90 0.97 0.95 0.96 0.90 0.97 0.96

Then the following table will show the training and validation accuracy results on the IndoBERT model.

Table 8. Best IndoBERT Hyperparameter Accuracy

fold epoch battery-train battery-val camera-train camera-val screen-train screen-val ram-train ram-val

1 1 0.80 0.97 0.84 0.92 0.56 0.88 0.70 0.95

1 2 0.95 0.98 0.97 0.94 0.93 0.93 0.97 0.96

1 3 0.99 0.95 0.98 0.95 0.99 0.93 0.99 0.97

1 4 1.00 0.97 0.99 0.96 1.00 0.93 0.99 0.97

2 1 0.72 0.90 0.83 0.93 0.55 0.91 0.66 0.95

2 2 0.96 0.97 0.98 0.95 0.96 0.96 0.97 0.97

2 3 0.98 0.97 0.98 0.93 0.98 0.95 0.98 0.97

2 4 0.99 0.92 0.99 0.91 1.00 0.96 0.99 0.97

3 1 0.77 0.92 0.83 0.95 0.54 0.76 0.68 0.96

3 2 0.97 0.95 0.97 0.95 0.92 0.97 0.97 0.97

3 3 0.99 0.95 0.98 0.95 0.99 0.96 0.98 0.96

3 4 0.99 0.94 0.99 0.96 1.00 0.97 0.99 0.97

4 1 0.76 0.94 0.81 0.97 0.54 0.82 0.71 0.94

4 2 0.97 0.97 0.97 0.96 0.95 0.86 0.97 0.96

4 3 0.99 0.99 0.99 0.97 0.99 0.91 0.99 0.97

4 4 1.00 0.91 0.99 0.95 1.00 0.90 0.99 0.96

5 1 0.74 0.91 0.83 0.95 0.50 0.62 0.63 0.97


5 2 0.97 0.96 0.96 0.96 0.87 0.90 0.97 0.96

5 3 0.98 0.94 0.98 0.96 0.97 0.91 0.99 0.99

5 4 1.00 0.96 1.00 0.97 1.00 0.90 0.98 0.98

The fine-tuning results show that the IndoBERT model produces higher training and validation accuracy than the BERT model, with the IndoBERT model reaching training and validation accuracy of at least 90% at epoch 4 in each fold.

Classification results are also shown using a confusion matrix to determine the performance of the BERT classification model and the IndoBERT classification model. In the BERT classification model on the battery test dataset, the results are as follows.

Figure 8. Confusion Matrix BERT Dataset Test Battery

Figure 9. Confusion Matrix IndoBERT Dataset Test Battery

Figure 8 shows the confusion matrix resulting from the performance of the BERT model on the battery test dataset. This model correctly classifies 62 data as positive labels and incorrectly classifies positive data into 4 as neutral labels and 0 as negative labels. Furthermore, this model correctly classifies 59 data as neutral labels and incorrectly classifies neutral data into 3 as positive labels and 4 as negative labels. Then this model correctly classifies 57 data as negative labels and incorrectly classifies negative data into 2 as positive labels and 4 as neutral labels. In the IndoBERT classification model on the battery test dataset, the results are as follows.

Figure 9 shows the confusion matrix resulting from the performance of the IndoBERT model on the battery test dataset. This model correctly classifies 65 data as positive labels and incorrectly classifies positive data into 0 as neutral labels and 1 as a negative label. Furthermore, this model correctly classifies 64 data as neutral labels and incorrectly classifies neutral data into 1 as a positive label and 2 as negative labels. Then this model correctly classifies 61 data as negative labels and incorrectly classifies negative data into 1 as a positive label and 1 as a neutral label. Then in the BERT classification model on the camera test dataset, the results are as follows.

Figure 10. Confusion Matrix BERT Dataset Test Camera

Figure 11. Confusion Matrix IndoBERT Dataset Test Camera

Figure 10 shows the confusion matrix resulting from the performance of the BERT model on the camera test dataset. This model correctly classifies 98 data as positive labels and incorrectly classifies positive data into 5 as neutral labels and 1 as a negative label. Furthermore, this model correctly classifies 100 data as neutral labels and incorrectly classifies neutral data into 2 as positive labels and 2 as negative labels. Then this model correctly classifies 100 data as negative labels and incorrectly classifies negative data into 1 as a positive label and 3 as neutral labels. In the IndoBERT classification model on the camera test dataset, the results are as follows.

Figure 11 shows the confusion matrix resulting from the performance of the IndoBERT model on the camera test dataset. This model correctly classifies 99 data as positive labels and incorrectly classifies positive data into 4 as neutral labels and 1 as a negative label. Furthermore, this model correctly classifies 99 data as neutral labels and incorrectly classifies neutral data into 1 as a positive label and 4 as negative labels. Then this model correctly classifies 103 data as negative labels and incorrectly classifies negative data into 0 as positive labels and 1 as a neutral label.

Whereas in the BERT classification model on the screen test dataset, the results are as follows.

Figure 12. Confusion Matrix BERT Dataset Test Screen

Figure 13. Confusion Matrix IndoBERT Dataset Test Screen

Figure 12 shows the confusion matrix resulting from the performance of the BERT model on the screen test dataset. This model correctly classifies 22 data as positive labels and incorrectly classifies positive data into 0 as neutral labels and 0 as negative labels. Furthermore, this model correctly classifies 20 data as neutral labels and incorrectly classifies neutral data into 3 as positive labels and 0 as negative labels. Then this model correctly classifies 21 data as negative labels and incorrectly classifies negative data into 1 as a positive label and 0 as neutral labels. In the IndoBERT classification model on the screen test dataset, the results are as follows.

Figure 13 shows the confusion matrix resulting from the performance of the IndoBERT model on the screen test dataset. This model correctly classifies 22 data as positive labels and incorrectly classifies positive data into 0 as neutral labels and 0 as negative labels. Furthermore, this model correctly classifies 21 data as neutral labels and incorrectly classifies neutral data into 1 as a positive label and 1 as a negative label. Then this model correctly classifies 23 data as negative labels and incorrectly classifies negative data into 0 as positive labels and 0 as neutral labels. In the BERT classification model on the ram test dataset, the results are as follows.

Figure 14. Confusion Matrix BERT Dataset Test RAM

Figure 15. Confusion Matrix IndoBERT Dataset Test Ram

Figure 14 shows the confusion matrix resulting from the performance of the BERT model on the ram test dataset. This model correctly classifies 57 data as positive labels and incorrectly classifies positive data into 3 as neutral labels and 0 as negative labels. Furthermore, this model correctly classifies 60 data as neutral labels and incorrectly classifies neutral data into 1 as a positive label and 1 as a negative label. Then this model correctly classifies 60 data as negative labels and incorrectly classifies negative data into 0 as positive labels and 3 as neutral labels.

The results are as follows in the IndoBERT classification model on the ram test dataset.

Figure 15 shows the confusion matrix resulting from the performance of the IndoBERT model on the ram test dataset. This model correctly classifies 56 data as positive labels and incorrectly classifies positive data into 3 as neutral labels and 1 as a negative label. Furthermore, this model correctly classifies 62 data as neutral labels and incorrectly classifies neutral data into 0 as positive labels and 0 as negative labels. Then this model correctly classifies 63 data as negative labels and incorrectly classifies negative data into 0 as positive labels and 0 as neutral labels.
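As a quick check, the battery test-set counts reported above can be turned back into accuracy values; the matrices below simply restate those counts (rows are actual classes, columns are predicted classes, ordered positive, neutral, negative).

```python
# Accuracy recomputed from the reported battery confusion-matrix counts.
import numpy as np

bert_battery = np.array([[62, 4, 0],
                         [3, 59, 4],
                         [2, 4, 57]])
indobert_battery = np.array([[65, 0, 1],
                             [1, 64, 2],
                             [1, 1, 61]])

for name, cm in [("BERT", bert_battery), ("IndoBERT", indobert_battery)]:
    accuracy = np.trace(cm) / cm.sum()  # correctly classified / all test examples
    print(name, round(accuracy, 3))     # BERT ~0.913, IndoBERT ~0.969
```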

3.2 Sentiment Analysis

At this stage, the previously trained model's results will be discussed to obtain good accuracy in the classification of each aspect. The model to be used is the IndoBERT model, which has the highest accuracy value in each aspect, namely the battery dataset in the 1st fold with an average accuracy of 98.5%, the camera dataset in the 5th fold with an average accuracy of 98.5%, screen dataset in the 3rd fold with an average accuracy of 98.5%, ram dataset in the 5th fold with an accuracy of 98%.


Sentiment classification is divided into 3 classes: positive, neutral, and negative sentiment. The number of test datasets that have a battery aspect is 1368 data. The battery test dataset has 1068 positive sentiments, 54 neutral sentiments, and 246 negative sentiments. The spread of sentiment labels on the battery test dataset is as follows.

Figure 16. Battery Test Dataset Sentiment

The figure above shows that the sentiment of the battery aspect has a positive percentage of 78%, then a neutral percentage of 4%, and a negative percentage of 18%. This shows that many users post tweets with positive sentiments for the battery aspect, so positive sentiment is the majority sentiment for the battery aspect of Xiaomi.

The number of test datasets that have a camera aspect is 2830 data. The camera test dataset has 1935 positive sentiments, 505 neutral sentiments, and 390 negative sentiments. The distribution of sentiment labels on the camera test dataset is as follows.

Figure 17. Sentiment Dataset Test Camera

The figure above shows that the sentiment of the camera aspect has a positive percentage of 68%, then a neutral percentage of 18%, and a negative percentage of 14%. This shows that many users post tweets with positive sentiments for the camera aspect, so positive sentiment is the majority sentiment for the camera aspect of Xiaomi.

The number of test datasets that have screen aspects is 988 data. The screen test dataset has 660 positive sentiments, 99 neutral sentiments, and 229 negative sentiments. The distribution of sentiment labels on the screen test dataset is as follows.

Figure 18. Sentiment Dataset Test Screen

The figure above shows that the sentiment of the screen aspect has a positive percentage of 67%, then a neutral percentage of 10%, and a negative percentage of 23%. This shows that many users post tweets with positive sentiment for screen aspects, so positive sentiment is the majority sentiment for screen aspects on Xiaomi.

The number of test datasets that have RAM aspects is 1304 data. The ram test dataset has 994 positive sentiments, 222 neutral sentiments, and 88 negative sentiments. The distribution of sentiment labels on the ram test dataset is as follows.


Figure 19. Sentiment Dataset Test Ram

The figure above shows that the sentiment of the ram aspect has a positive percentage of 76%, then a neutral percentage of 17%, and a negative percentage of 7%. This shows that many users post tweets with positive sentiments for the ram aspect, so positive sentiment is the majority sentiment for the ram aspect at Xiaomi.

It can be seen that, of the four aspects, the camera aspect is the most posted aspect on Twitter, with a total of 2830 data, and is the aspect with the most positive sentiment, with 1935 positive sentiments, compared to the battery, screen, and ram aspects. This shows that many users give positive sentiments for the camera aspect, so the majority sentiment in the test dataset in this study is positive sentiment for the camera aspect.
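The percentages above follow directly from the reported label counts; the short check below recomputes them from the counts given in this section.

```python
# Recompute per-aspect sentiment percentages from the reported test-set counts.
counts = {
    "battery": {"positive": 1068, "neutral": 54,  "negative": 246},
    "camera":  {"positive": 1935, "neutral": 505, "negative": 390},
    "screen":  {"positive": 660,  "neutral": 99,  "negative": 229},
    "ram":     {"positive": 994,  "neutral": 222, "negative": 88},
}

for aspect, labels in counts.items():
    total = sum(labels.values())
    shares = {name: round(100 * count / total) for name, count in labels.items()}
    print(aspect, total, shares)  # e.g. battery -> {'positive': 78, 'neutral': 4, 'negative': 18}
```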

4. CONCLUSION

Based on the results of the tests and analyses carried out in this study, the research questions can be answered as follows. First, the comparison between the BERT and IndoBERT classification models shows that, in terms of the confusion matrix, BERT made more sentiment prediction errors, misclassifying 43 data, while IndoBERT misclassified only 23 data. In terms of accuracy, IndoBERT has a higher accuracy value than BERT, reaching at least 90% at epoch 4 in each fold.

Second, the sentiment classification results for Xiaomi smartphone products show that positive sentiment on the battery aspect dominates, with a positive percentage of 78%, while neutral sentiment is 4% and negative sentiment is 18%. In the camera aspect, positive sentiment dominates with 68%, while neutral sentiment is 18% and negative sentiment is 14%. On the screen aspect, positive sentiment dominates with 67%, neutral sentiment is 10%, and negative sentiment is 23%. In the RAM aspect, positive sentiment dominates with 76%, while neutral sentiment is 17% and negative sentiment is 7%. This is supported by the IndoBERT classification model, which has an average accuracy value of 98%. Third, the test results on hyperparameters that can optimize the performance of the BERT and IndoBERT models show that a batch size of 16, 4 epochs, and a learning rate of 3e-5 produce the best accuracy for both models in this study. The results of this study are expected to be taken into consideration by the Xiaomi company, since the IndoBERT model is considered effective in capturing customer sentiment; this is supported by the very good accuracy values obtained when detecting Indonesian-language sentiment on Twitter. In addition, by dividing the data into aspects, the company can find out smartphone users' sentiments on the battery, camera, screen, and RAM aspects. The sentiment analysis results can be used as an evaluation along with insights for the Xiaomi company so that, in the future, the company can maintain and even improve the quality of the aspects that smartphone users like about Xiaomi products, namely the camera. This study uses public opinion on Xiaomi smartphone products on Twitter by separating data based on 4 main aspects: battery, camera, screen, and RAM. Future research is expected to use more data and to take data from sources other than Twitter. Further research could also use the IndoBERT model to classify other aspects, such as price and design.

REFERENCES

[1] Statista Research Department, "Smartphone penetration rate in Indonesia from 2017 to 2020 with forecasts until 2026," www.statista.com, 2022.
[2] R. W. Utami, A. Jazuli, and T. Khotimah, "Analisis Sentimen Terhadap Xiaomi Indonesia Menggunakan Metode Naive Bayes," Indones. J. Technol. Informatics Sci., vol. 3, no. 1, pp. 21–29, 2021, doi: 10.24176/ijtis.v3i1.7514.
[3] Canalys, "Global smartphone market shrinks 11% in Q1 2022 as regional headwinds bite," https://canalys.com/, 2022.
[4] C. Troussas, M. Virvou, K. J. Espinosa, K. Llaguno, and J. Caro, "Sentiment analysis of Facebook statuses using Naive Bayes Classifier for language learning," IISA 2013 - 4th Int. Conf. Information, Intell. Syst. Appl., pp. 198–205, 2013, doi: 10.1109/IISA.2013.6623713.
[5] G. A. Buntoro, "Analisis Sentimen Calon Gubernur DKI Jakarta 2017 Di Twitter," Integer J., vol. 2, no. 1, pp. 32–41, 2017.
[6] T. Nasukawa and J. Yi, "Sentiment Analysis: Capturing Favorability Using Natural Language Processing," Proc. 2nd Int. Conf. Knowl. Capture, pp. 70–77, 2003.
[7] N. M. S. Hadna, P. I. Santosa, and W. W. Winarno, "Studi Literatur Tentang Perbandingan Metode Untuk Proses Analisis Sentimen Di Twitter," Semin. Nas. Teknol. Inf. dan Komun. 2016 (SENTIKA 2016), 2016.
[8] B. Liu, "Sentiment Analysis and Opinion Mining," Synth. Lect. Hum. Lang. Technol., vol. 5, no. 1, pp. 1–167, May 2012, doi: 10.2200/S00416ED1V01Y201204HLT016.
[9] I. Sunni and D. H. Widyantoro, "Analisis Sentimen dan Ekstraksi Topik Penentu Sentimen pada Opini Terhadap Tokoh Publik," J. Sarj. Inst. Teknol. Bandung Bid. Tek. Elektro dan Inform., vol. 1, no. 2, pp. 200–206, 2012.
[10] E. Marrese-Taylor, J. D. Velásquez, F. Bravo-Marquez, and Y. Matsuo, "Identifying customer preferences about tourism products using an aspect-based opinion mining approach," Procedia Comput. Sci., vol. 22, pp. 182–191, 2013, doi: 10.1016/j.procs.2013.09.094.
[11] C. Huang, A. Trabelsi, and O. R. Zaïane, "ANA at SemEval-2019 task 3: Contextual emotion detection in conversations through hierarchical LSTMs and BERT," NAACL HLT 2019 - Int. Work. Semant. Eval. SemEval 2019, Proc. 13th Work., pp. 49–53, 2019, doi: 10.18653/v1/s19-2006.
[12] A. L. Bagus and D. H. Fudholi, "Klasifikasi Emosi pada Teks dengan Menggunakan Metode Deep Learning," Syntax Lit. J. Ilm. Indones., vol. 6, no. 1, 2021.
[13] Y. Peng, S. Yan, and Z. Lu, "Transfer learning in biomedical natural language processing: An evaluation of BERT and ELMo on ten benchmarking datasets," Proc. of the BioNLP 2019 Work., pp. 58–65, 2019, doi: 10.18653/v1/w19-5006.
[14] W. Fang, H. Luo, S. Xu, P. E. D. Love, Z. Lu, and C. Ye, "Automated text classification of near-misses from safety reports: An improved deep learning approach," Adv. Eng. Informatics, vol. 44, p. 101060, 2020, doi: 10.1016/j.aei.2020.101060.
[15] A. Vaswani et al., "Attention Is All You Need," 31st Conf. Neural Inf. Process. Syst. (NIPS 2017), Long Beach, CA, USA, 2017.
[16] J. Devlin, M. W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," NAACL HLT 2019 - 2019 Conf. North Am. Chapter Assoc. Comput. Linguist. Hum. Lang. Technol. - Proc. Conf., vol. 1, pp. 4171–4186, 2019.
[17] J. H. Tandijaya, Liliana, and I. Sugiarto, "Klasifikasi dalam Pembuatan Portal Berita Online dengan Menggunakan Metode BERT," J. Infra, vol. 9, no. 2, pp. 320–325, 2021.
[18] F. A. Pratama and A. Romadhony, "Identifikasi Komentar Toksik Dengan BERT," e-Proceeding Eng., vol. 7, no. 2, pp. 1–9, 2020.
[19] M. I. Rahajeng and A. Purwarianti, "Indonesian Question Answering System for Factoid Questions using Face Beauty Products Knowledge Graph," J. Linguist. Komputasional, vol. 4, no. 2, pp. 59–63, 2021.
[20] F. Koto, A. Rahimi, J. H. Lau, and T. Baldwin, "IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP," arXiv preprint arXiv:2011.00677, 2020, doi: 10.18653/v1/2020.coling-main.66.
[21] R. M. R. W. P. K. Atmaja and W. Yustanti, "Analisis Sentimen Customer Review Aplikasi Ruang Guru dengan Metode BERT (Bidirectional Encoder Representations from Transformers)," J. Emerg. Inf. Syst. Bus. Intell., vol. 2, no. 3, 2021.
[22] C. A. Putri, Adiwijaya, and S. Al Faraby, "Analisis Sentimen Review Film Berbahasa Inggris Dengan Pendekatan Bidirectional Encoder Representations from Transformers," JATISI (Jurnal Tek. Inform. dan Sist. Informasi), vol. 6, no. 2, pp. 181–193, 2020, doi: 10.35957/jatisi.v6i2.206.
[23] K. S. Nugroho, A. Y. Sukmadewa, H. Wuswilahaken Dw, F. A. Bachtiar, and N. Yudistira, "BERT Fine-Tuning for Sentiment Analysis on Indonesian Mobile Apps Reviews," ACM Int. Conf. Proceeding Ser., pp. 258–264, 2021, doi: 10.1145/3479645.3479679.
[24] I. P. A. M. Utama, S. S. Prasetyowati, and Y. Sibaroni, "Multi-Aspect Sentiment Analysis Hotel Review Using RF, SVM, and Naïve Bayes based Hybrid Classifier," J. Media Inform. Budidarma, vol. 5, no. 2, p. 630, 2021, doi: 10.30865/mib.v5i2.2959.
[25] Y. Goldberg, Neural Network Methods in Natural Language Processing, 2017.
[26] D. Jurafsky and J. H. Martin, "Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition," pp. 1–18, 2021. [Online]. Available: http://www.cs.colorado.edu/~martin/slp.html.
[27] A. Kulkarni and A. Shivananda, Natural Language Processing Recipes, 2019.
[28] M. Hossin and M. N. Sulaiman, "A Review on Evaluation Metrics for Data Classification Evaluations," Int. J. Data Min. Knowl. Manag. Process, vol. 5, no. 2, pp. 1–11, 2015, doi: 10.5121/ijdkp.2015.5201.
[29] X. Deng, Q. Liu, Y. Deng, and S. Mahadevan, "An improved method to construct basic probability assignment based on the confusion matrix for classification problem," Inf. Sci., vol. 340–341, pp. 250–261, 2016, doi: 10.1016/j.ins.2016.01.033.
[30] K. S. Nugroho, "Validasi Model Klasifikasi Machine Learning pada RapidMiner," medium.com, 2020. https://ksnugroho.medium.com/validasi-model-machine-learning-pada-rapidminer-50be0080df14.
