Sentiment Analysis of Reviews on Lazada Apps using Naïve Bayes Algorithm


Zhafran Afif Nurdiyansah¹,*, Berlilana²

¹Faculty of Computer Science, Informatics Study Program, Universitas Amikom Purwokerto, Purwokerto, Indonesia

²Faculty of Computer Science, Information Systems Study Program, Universitas Amikom Purwokerto, Purwokerto, Indonesia

Email: ¹,*[email protected], ²[email protected]

Correspondence Author Email: [email protected]

Abstract−Reviews of the Lazada app on the Google Play Store are a useful source of information when processed properly. Existing and new users can analyze app reviews to obtain information for evaluating the service. Looking at the number of stars alone is not enough; the full text of each review must be examined to understand what the reviewer means. A sentiment analysis system automatically analyzes reviews to extract the sentiment they express. The purpose of this study is to conduct sentiment analysis of Lazada app reviews on the Google Play Store using the Naive Bayes algorithm. A total of 1,000 user reviews of the Lazada app were collected to form the dataset. The research stages are data collection, labeling, pre-processing, sentiment classification, and evaluation. Pre-processing consists of six stages: Cleansing, Case Folding, Word Normalization, Stopword Removal, Tokenization, and Stemming. The TF-IDF (Term Frequency - Inverse Document Frequency) method is used for word weighting. The reviews are grouped into two categories, negative and positive, and the classifier is evaluated using accuracy. The test results show an accuracy of 84%, and the grouping shows that reviews of the Lazada application tend to be negative.

Keywords: Marketplace; Naïve Bayes; Sentiment Analysis; Pre-Processing; TF-IDF

1. INTRODUCTION

The rapid growth of e-commerce in Indonesia has given rise to many online marketplaces that offer various products and services to the public. This phenomenon not only reflects the rapid development of technology but also creates opportunities for consumers as well as businesses [1]. With the growing number of online marketplaces, consumers can easily search for and compare products and prices, which encourages healthy competition among marketplaces and drives innovation in user experience, services, and promotions [2]. Currently, the Google Play Store provides various online marketplace applications. As one of Google's digital content services, the Play Store offers digital products such as apps, music, books, games, and cloud-based media players [3]. It includes rating and review features that allow users to express their opinions about the products they use. One of the e-commerce apps available on the Play Store is the Lazada app, which was launched in 2012. According to data from SimilarWeb (May 2023), Lazada has around 74 million users in Indonesia. Because users hold a variety of opinions, views of the application range from positive and negative discussion to criticism and feature suggestions. While star ratings make it easier to scan reviews, they do not provide a thorough understanding of the content of a review [4]. Reviews can be analyzed manually by checking them one by one, but this becomes impractical for large volumes. An efficient alternative is to use an automated approach such as sentiment analysis [5]. Sentiment analysis involves identifying, extracting, and assessing the sentiments or opinions expressed in text, particularly in reviews [6].

The goal is to understand people's views, feelings, or attitudes toward a particular topic or entity. Sentiment analysis can be done manually by humans or automatically using computational algorithms and techniques such as natural language processing and machine learning [7]. In practical applications, sentiment analysis is often used in customer surveys, social media monitoring, product review analysis, and public feedback analysis to gain insight into perceptions and opinions[8].

This research was motivated by the limited use made of the reviews submitted by users of the Lazada application: the reviews are treated as expressions of opinion and receive no follow-up. This has led to a drop in the ranking of the Lazada app on the Play Store. The main purpose of this study is to conduct a sentiment analysis of Lazada app reviews available on the Google Play Store. The goal is to assess user opinion of the Lazada app by using the Naive Bayes technique with TF-IDF weights to categorize reviews into negative and positive sentiments.

In addition, this study aims to evaluate the accuracy of the Naive Bayes algorithm in conducting sentiment analysis of Lazada app reviews. Based on this background, this study differs from previous work: it uses review data from the Lazada application on the Google Play Store and applies the Naïve Bayes algorithm. The collected reviews are grouped into negative and positive reviews before sentiment analysis is carried out. Sentiment analysis, also known as opinion mining or artificial emotional intelligence, refers to the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, measure, and study affective states and subjective information [9]. Sentiment analysis is applied to a variety of materials, including customer reviews and survey responses, online and social media content, and healthcare materials. Its applications cover a wide range of areas, from marketing and customer service to clinical medicine [10].

In this study, sentiment analysis is applied to Lazada application reviews in combination with the Naïve Bayes method. Several similar studies were reviewed as literature before this study was conducted.

The initial study, conducted by [11] from Tanjungpura University in 2018, discussed Indonesian online product opinions and sentiment analysis using the Naive Bayes approach. The study went through stages such as data collection, pre-processing, word weighting, model building, and sentiment classification. TF-IDF was used in word weighting. In the 3-class test (negative, neutral, and positive), the best performance was achieved with 90% training data and 10% test data, resulting in 78% accuracy, 93.33% recall, and 77% precision. In the 5-class test, the optimal performance was also with 90% training data and 10% test data, achieving 59.33% accuracy, 58.33% recall, and 59.33% precision. The second study, conducted by [12] from Amikom University Yogyakarta in 2021, titled "Analysis of Sentiment on Google Play Store Apps utilizing Naive Bayes Algorithm and Genetic Algorithm," aimed to determine the accuracy level between Naive Bayes and Genetic Algorithm. The dataset comprised applications such as Shopee, Ruangguru, Pedia Shopee, and Gojek. The Pedia Store dataset achieved the highest accuracy rate of 96.87%, followed by Shopee with an overall accuracy of 96.53%, Ruangguru with 95.54%, and Gojek with 96.54%. The datasets were collected using a web scraper application from Google Chrome, and manual labeling was performed in Microsoft Excel. The third study, conducted by [13] from Teknokrat University of Indonesia in 2021, titled "Analysis of Public Sentiment Towards the Pre-Employment Card Program on Twitter with the Support Vector Machine Method," aimed to analyze public opinion on Twitter using Support Vector Machine (SVM). Linear and RBF kernels were compared, with the linear kernel achieving precision of 98.67%, recall of 99%, and an F1 score of 98%, while the RBF kernel had precision of 97%, recall of 98.67%, and an F1 score of 98%. The accuracy of the RBF kernel reached 98.34%, indicating a neutral sentiment towards the Pre-Employment Card Program. 
The fourth study, conducted by [14] from Teknokrat University in 2021, titled "Utilization of the Naive Bayes Algorithm for Analyzing Sentiments in National BMKG Twitter Data," employed Python 3.74 and a dataset of 1179 tweets. The data was divided into training and testing sets with a 70:30 ratio. On this imbalanced dataset, which contained more positive reviews, the Naive Bayes method without feature normalization achieved a test accuracy of 69.97%. This literature review, together with the problems described above, motivates the present research.

This research contributes to the existing literature and can serve as a reference for subsequent work, particularly for scholars and practitioners interested in sentiment analysis of review comments using the Naive Bayes algorithm. Beyond its immediate findings, it describes the challenges and methodology of sentiment analysis and points to possible avenues for improvement. By examining in detail how the Naive Bayes algorithm performs on review comments, the study provides groundwork for further investigation and refinement of sentiment analysis techniques.

2. RESEARCH METHODOLOGY

The research methodology employed in this study is depicted in Figure 1, illustrating the sequential stages of the investigation. It encompasses four primary procedures to be undertaken, namely: Data Selection, Pre-Processing, Transformation, and Data Mining.

Figure 1. Concept Study


Each stage shown in Figure 1 is explained in more detail below.

a. Data Selection

Data selection is the process of selecting a subset of relevant data from one or more data sources for analysis or modeling purposes. In this context, the main goal is to focus on parts of the data that are important or have significant value [15]. This is done so that the analysis process or model creation is more efficient, reduces complexity, and improves the quality of the results obtained. Data selection involves the selection of relevant variables or features, determination of appropriate sample sizes, and attention to the quality of the data used.

In this research, the data consists of user reviews of the Lazada application, retrieved automatically using Python. At this initial stage all fields of each review are retrieved, including userName, userImage, content, and others. The results of the retrieval are shown in Table 1.

Table 1. Data retrieval results

| | reviewId | userName | userImage | content | score |
| 0 | 4e24de39-c0e7-47e1-ad9c-0ec17558e399 | Malik Aji | https://play-lh.googleusercontent.com/a/ACg8oc... | Setelah diperbarui malah banyak yang ngga sesu... | 2 |
| 1 | 783e22fe-02e8-4188-9a93-2c26fe997a5d | Yudhi Bonge Sukajadi Gang Cendana | https://play-lh.googleusercontent.com/a/ACg8oc... | Saya cocok pakai aplikasi lazada karena semua ... | 5 |
| 2 | 823b9922-b4d3-418d-a1d2-7c0b3bb54ef6 | Ella Octaviany | https://play-lh.googleusercontent.com/a-/ALV-U... | Kenapa yah,abis di update malah jadi error,pas... | |
| 3 | b77fac67-3ab0-4a40-81d6-ae15c151a6b5 | Rudi Aja | https://play-lh.googleusercontent.com/a-/ALV-U... | Ulasan produk stelah update 11-11, kenapa tida... | 5 |
| 4 | 1fd77195-fe2c-4e70-b756-083b3b47af53 | Sakura sweet | https://play-lh.googleusercontent.com/a-/ALV-U... | dulu iklan melulu sampai memenuhi layar dan me... | 1 |

Table 1 (continued). Data retrieval results

| | thumbsUpCount | reviewCreatedVersion | at | replyContent | repliedAt | appVersion |
| 0 | 216 | 07.39.02 | 20/12/2023 18.27 | Hi Kak Malik Aji, kami sampaikan permohonan ma... | 20/12/2023 18.35 | 07.39.02 |
| 1 | 159 | 07.40.01 | 20/12/2023 23.49 | Hi Yudhi Bonge Sukajadi Gang C..., terima kasi... | 21/12/2023 00.05 | 07.40.01 |
| 2 | 114 | 07.40.00 | 20/12/2023 16.54 | Hi Kak Ella Octaviany, kami sampaikan permohon... | 20/12/2023 17.05 | 07.40.00 |
| 3 | 9944 | 07.37.00 | 04/11/2023 23.53 | Hi Rudi Aja, terima kasih untuk ulasan positif... | 05/11/2023 00.16 | 07.37.00 |
| 4 | 6458 | 07.35.01 | 17/10/2023 00.17 | Hi Kak Sakura sweet, kami sampaikan permohonan... | 17/10/2023 00.26 | 07.35.01 |


After the data has been obtained, the reviews are grouped into negative and positive reviews. The grouping is done automatically in Python. Because the retrieved data has many fields, only the fields needed for the analysis are kept to make the data easier to read. The results of the grouping are shown in Table 2.
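The grouping step can be sketched as follows. The paper does not state the labeling rule; the rows of Table 2 are consistent with labels derived from the star score (4 and 5 stars as Positif, 1 and 2 stars as Negatif), so this sketch assumes that rule and, as a further assumption, counts 3-star reviews as Negatif. The sample records, the constant `POSITIVE_MIN_SCORE`, and the function names are hypothetical.

```python
# Illustrative records shaped like the rows of Table 1 (values shortened).
raw_reviews = [
    {"userName": "user_a", "content": "makin asik dan nyaman belanja di lazada", "score": 5},
    {"userName": "user_b", "content": "Paylater ga bisa dipake", "score": 1},
    {"userName": "user_c", "content": "lumayan mudah cepat dan ORI", "score": 4},
    {"userName": "user_d", "content": "tolong di perbaiki dong", "score": 2},
]

# Assumption: 4-5 stars -> Positif, 1-3 stars -> Negatif.
POSITIVE_MIN_SCORE = 4


def label_review(score: int) -> str:
    """Map a star rating to a sentiment label."""
    return "Positif" if score >= POSITIVE_MIN_SCORE else "Negatif"


def select_and_label(reviews):
    """Keep only the review text and score, and attach the sentiment label."""
    return [
        {"content": r["content"], "score": r["score"], "Label": label_review(r["score"])}
        for r in reviews
    ]


labeled = select_and_label(raw_reviews)
for row in labeled:
    print(row["score"], row["Label"], row["content"])
```

Deriving labels from scores keeps the process fully automatic, but a review whose text contradicts its star rating will then carry a misleading label.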

Table 2. Grouping of negative and positive reviews

| | content | score | Label |
| 0 | Ongkirnya lebih mahal dari harga barangnya, pa... | 5 | Positif |
| 1 | lumayan mudah cepat dan ORI tapi kadang ya ada... | 4 | Positif |
| 2 | makin asik dan nyaman belanja di lazada sekara... | 5 | Positif |
| 3 | Sudah dua kali beli barang dilazada. Yang data... | 1 | Negatif |
| 4 | Barang si murah tapi, setelah di apdate ongkir... | 1 | Negatif |
| 5 | Mengapa aq perbarui akun q malah hilang {no hp... | 1 | Negatif |
| 6 | Kok lazada mahal banget ya,harga produk nya 16... | 1 | Negatif |
| 7 | Vocher diskon tidak menarik seperti dulu. Peli... | 5 | Positif |
| 8 | Hlo kak sebenarnya aplikasi bagus buat perbela... | 2 | Negatif |
| 9 | Paylater ga bisa dipake dengan alasan akulaku ... | 1 | Negatif |
| … | … | … | … |
| 1000 | tolong di perbaiki dong,ekspedisi pengirimanny... | 2 | Negatif |

b. Pre-Processing

Data preprocessing is a series of steps to clean, organize, and adjust data for effective use in analysis or modeling [16]. The goal is to ensure that the data matches the needs of the analysis or model, so that the results obtained are more accurate and useful [17]. In this research, the pre-processing stage involves the following steps.

1. Cleansing involves the removal of emoticons and symbols. Emoticons and symbols are disregarded because this research focuses on the text of the reviews. The excluded characters include "~", "`", "!", "$", "%", "^", "&", "*", "(", ")", "_", "-", "+", "=", ":", "'", ",", ".", and "?". For example, the review "tolong min, untuk jasa kirim diperbaiki" becomes "tolong min untuk jasa kirim diperbaiki" after cleansing.

2. Case folding converts the text in the document to a standardized form, namely lowercase. For instance, the comment "Tolong perbaiki fitur pembayaran" becomes "tolong perbaiki fitur pembayaran", where the uppercase "T" is changed to lowercase "t". Examples of case folding are shown in Table 3.

Table 3. Case folding process

| | content | score | Label | text_clean |
| 0 | Ongkirnya lebih mahal dari harga barangnya, pa... | 5 | Positif | ongkirnya lebih mahal dari harga barangnya pad... |
| 1 | lumayan mudah cepat dan ORI tapi kadang ya ada... | 4 | Positif | lumayan mudah cepat dan ori tapi kadang ya ada... |
| 2 | makin asik dan nyaman belanja di lazada sekara... | 5 | Positif | makin asik dan nyaman belanja di lazada sekara... |
| 3 | Sudah dua kali beli barang dilazada. Yang data... | 1 | Negatif | sudah dua kali beli barang dilazada yang datan... |
| 4 | Barang si murah tapi, setelah di apdate ongkir... | 1 | Negatif | barang si murah tapi setelah di apdate ongkir ... |
| 5 | Mengapa aq perbarui akun q malah hilang {no hp... | 1 | Negatif | mengapa aq perbarui akun q malah hilang no hp ... |
| 6 | Kok lazada mahal banget ya,harga produk nya 16... | 1 | Negatif | kok lazada mahal banget yaharga produk nya rb ... |
| 7 | Vocher diskon tidak menarik seperti dulu. Peli... | 5 | Positif | vocher diskon tidak menarik seperti dulu pelit... |
| 8 | Hlo kak sebenarnya aplikasi bagus buat perbela... | 2 | Negatif | hlo kak sebenarnya aplikasi bagus buat perbela... |
| 9 | Paylater ga bisa dipake dengan alasan akulaku ... | 1 | Negatif | paylater ga bisa dipake dengan alasan akulaku ... |
| … | … | … | … | … |

3. Word normalization corrects informal or abbreviated words in reviews so that sentences follow standard Indonesian spelling. This makes the intended meaning of a sentence easier to understand. For example, the comment "knpa lama proses update" becomes "kenapa lama proses update" after word normalization: the word "knpa" is replaced by "kenapa".

4. Stopword removal eliminates words that appear in a predefined list of common words, such as conjunctions. Words like "di", "dan", and "yang" are removed.

5. Tokenizing breaks the text down into individual tokens (words), using punctuation and spaces as boundaries. For instance, the sentence "Lazada sekarang lemot" is transformed into "Lazada", "sekarang", "lemot".

6. Stemming reduces words to their basic form or "root" word. For example, the word "batalkan" is changed to "batal". Stemming is used in text processing to reduce word variations and improve consistency in subsequent analysis.

The final results of the pre-processing stages are shown in Table 4.

Table 4. Final pre-processing results

| Label | text_clean | text_StopWord | text_tokens | text_steamindo |
| Positif | ongkirnya lebih mahal dari harga barangnya pad... | ongkirnya mahal harga barangnya tertera gratis... | [ongkirnya, mahal, harga, barangnya, tertera, ... | ongkirnya mahal harga barang tera gratis ongki... |
| Positif | lumayan mudah cepat dan ori tapi kadang ya ada... | lumayan mudah cepat ori kadang ya nyesel nya b... | [lumayan, mudah, cepat, ori, kadang, ya, nyese... | lumayan mudah cepat ori kadang ya nyesel nya b... |
| Positif | makin asik dan nyaman belanja di lazada sekara... | asik nyaman belanja lazada gampang belanja tin... | [asik, nyaman, belanja, lazada, gampang, belan... | asik nyaman belanja lazada gampang belanja tin... |
| Negatif | sudah dua kali beli barang dilazada yang datan... | kali beli barang dilazada sesuai dipajang foto... | [kali, beli, barang, dilazada, sesuai, dipajan... | kali beli barang dilazada sesuai pajang foto t... |
| Negatif | barang si murah tapi setelah di apdate ongkir ... | barang si murah apdate ongkir mahal males bela... | [barang, si, murah, apdate, ongkir, mahal, mal... | barang si murah apdate ongkir mahal males bela... |
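The six pre-processing steps can be sketched as a small pipeline. This is a minimal illustration, not the authors' code: the normalization dictionary, the stopword set, and the suffix list are tiny hypothetical stand-ins, and the suffix stripping is a toy substitute for a proper dictionary-based Indonesian stemmer. Dropping digits during cleansing is also an assumption, based on how the cleaned text in Table 3 appears.

```python
import re

# Hypothetical, deliberately tiny resources; a real run would use full
# Indonesian slang, stopword, and stemming dictionaries.
NORMALIZATION = {"knpa": "kenapa"}
STOPWORDS = {"di", "dan", "yang"}
SUFFIXES = ("kan", "nya")  # toy suffix list, not a real Indonesian stemmer


def cleanse(text: str) -> str:
    """Step 1: remove every character that is not a letter or whitespace.

    Assumption: digits are dropped as well, as the cleaned text in Table 3 suggests.
    """
    return re.sub(r"[^A-Za-z\s]", "", text)


def case_fold(text: str) -> str:
    """Step 2: convert the text to lowercase."""
    return text.lower()


def normalize(text: str) -> str:
    """Step 3: replace informal spellings with their standard form."""
    return " ".join(NORMALIZATION.get(word, word) for word in text.split())


def remove_stopwords(text: str) -> str:
    """Step 4: drop words found in the stopword list."""
    return " ".join(word for word in text.split() if word not in STOPWORDS)


def tokenize(text: str) -> list:
    """Step 5: split the text into tokens on whitespace."""
    return text.split()


def stem(token: str) -> str:
    """Step 6 (toy version): strip one known suffix if a root of 4+ letters remains."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list:
    """Run all six steps in the order described in the text."""
    text = remove_stopwords(normalize(case_fold(cleanse(text))))
    return [stem(token) for token in tokenize(text)]


print(preprocess("Knpa lama proses update, tolong batalkan!"))
```

Keeping each step as its own function makes it possible to store the intermediate outputs as separate columns, in the way Table 4 does.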

c. Transformation

The extraction of features using TF-IDF, also known as Term Frequency-Inverse Document Frequency, stands as a pivotal stage in natural language processing (NLP) and text mining. It involves converting raw textual data into a numerical format suitable for machine learning algorithms and various statistical analyses [18]. TF-IDF, being a statistical metric, assesses the significance of a term within a document relative to an entire collection of documents, commonly referred to as a corpus. This evaluation incorporates both the frequency of a term in a specific document (Term Frequency, TF) and the rarity of the term across the entire corpus (Inverse Document Frequency, IDF) [19]. The multiplication of these two values yields the TF-IDF score, which tends to be higher for terms that hold importance within a document but are infrequent across the entire corpus.
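The weighting described above can be illustrated with a short sketch. It assumes the textbook definitions TF = term count divided by document length and IDF = log(N / df); the paper does not state which variant it used, and library implementations often apply smoothing or normalization, so absolute values may differ from those in Figure 2. The sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    TF  = occurrences of the term in the document / tokens in the document
    IDF = log(N / df), with N documents and df documents containing the term
    """
    n_docs = len(documents)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical pre-processed reviews.
docs = [
    ["ongkir", "mahal", "barang", "murah"],
    ["belanja", "lazada", "nyaman", "belanja"],
    ["ongkir", "mahal", "males", "belanja"],
]
for doc_weights in tf_idf(docs):
    print({term: round(weight, 3) for term, weight in doc_weights.items()})
```

In the first document, "barang" occurs in only one of the three documents and therefore receives a higher weight than "ongkir", which occurs in two.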

Additionally, Figure 2 illustrates the configuration and results of the TF-IDF process.

Figure 2. Result Transformation TF-IDF

d. Data Mining

Data mining is a systematic process of uncovering pertinent, concealed, and potentially valuable insights from extensive and intricate datasets [20]. The primary objective of data mining is to reveal novel knowledge and information that can contribute additional value or confer a competitive advantage within a specific domain or industry [21]. This study employs one of the established algorithms in data mining, namely Naive Bayes. Naive Bayes is a widely used classification method that often achieves good accuracy on text classification tasks. It is rooted in basic probability and operates under the assumption of independence among explanatory variables [22]. The algorithm relies on probability-based learning, and the formula commonly used in the Naive Bayes algorithm is given in Equation (1).

$$P(H \mid X) = \frac{P(X \mid H)\,P(H)}{P(X)} \tag{1}$$


In this equation, X represents the evidence (here, the words of a review) and H the hypothesis (here, that the review belongs to a given sentiment class). P(H | X) is the posterior probability that hypothesis H is true given evidence X. P(X | H) is the likelihood of observing evidence X when hypothesis H holds, P(H) is the prior probability of hypothesis H, and P(X) is the prior probability of evidence X.
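A minimal sketch of how Equation (1) turns into a classifier is given below. It is not the authors' implementation: it assumes a multinomial model over raw token counts rather than TF-IDF weights, uses add-one (Laplace) smoothing, and works in log space. The class name `NaiveBayes` and the training data are hypothetical. P(X) is left out because it is identical for every class and does not change which class scores highest.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes on token counts with add-one smoothing (an assumption)."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)          # documents per class, for P(H)
        self.token_counts = defaultdict(Counter)     # token counts per class, for P(x|H)
        for tokens, label in zip(documents, labels):
            self.token_counts[label].update(tokens)
        self.vocabulary = {token for tokens in documents for token in tokens}
        return self

    def log_posteriors(self, tokens):
        """Return log P(H) + sum of log P(x|H) for each class H."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)    # prior P(H)
            total_tokens = sum(self.token_counts[label].values())
            for token in tokens:
                if token not in self.vocabulary:
                    continue                         # skip words never seen in training
                count = self.token_counts[label][token]
                score += math.log((count + 1) / (total_tokens + vocab_size))
            scores[label] = score
        return scores

    def predict(self, tokens):
        """Pick the class with the highest (unnormalized) posterior."""
        scores = self.log_posteriors(tokens)
        return max(scores, key=scores.get)


# Hypothetical pre-processed training reviews.
train_docs = [
    ["ongkir", "mahal", "males", "belanja"],
    ["aplikasi", "error", "lemot"],
    ["belanja", "nyaman", "gratis", "ongkir"],
    ["diskon", "bagus", "nyaman"],
]
train_labels = ["Negatif", "Negatif", "Positif", "Positif"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict(["aplikasi", "lemot", "mahal"]))
print(model.predict(["diskon", "nyaman"]))
```

The independence assumption appears in the inner loop: the likelihood of a review is the product of the per-word likelihoods, which becomes a sum of logarithms.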

3. RESULT AND DISCUSSION


A total of 1,000 reviews were collected in this study; after the preprocessing stages, 990 reviews remained and were used. The result of the Naive Bayes calculation can be seen in Figure 3.

Figure 3. Result of naïve bayes calculation

Figure 3 shows the results of the Naive Bayes calculation for reviews grouped into negative and positive classes. The classifier obtained an accuracy of 84% on user reviews of the Lazada application. In addition, precision, recall, and F1-Score were determined for each of the two groups, namely reviews expressing negative sentiment and reviews expressing positive sentiment. The results are as follows.

For reviews with negative sentiment, the precision was 83% and the recall 95%, resulting in an F1-Score of 89%. The model therefore finds almost all negative reviews, although some of the reviews it labels negative are in fact positive.

For reviews with positive sentiment, the precision was 87% and the recall 65%, resulting in an F1-Score of 75%. Recall and F1-Score are noticeably lower than for the negative group: about a third of the positive reviews are classified as negative, while the reviews the model does label positive are mostly correct. The confusion matrix is also shown in Figure 3.
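The relationship between these metrics can be checked with a few lines of code. The sketch below recomputes the F1-Score from the rounded precision and recall quoted above and shows how all metrics follow from confusion-matrix counts. The counts are hypothetical and are not the values in Figure 3. From the rounded inputs the positive-class F1 comes out near 0.74, so the reported 75% presumably reflects unrounded precision and recall.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp: int, fp: int, fn: int):
    """Per-class metrics from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# F1 recomputed from the rounded precision and recall reported in the text.
print(round(f1_score(0.83, 0.95), 3))  # negative class
print(round(f1_score(0.87, 0.65), 3))  # positive class

# Hypothetical confusion-matrix counts, not the values from Figure 3.
tp, fp, fn, tn = 40, 10, 5, 45
print(precision_recall_f1(tp, fp, fn))
print(accuracy(tp, tn, fp, fn))
```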

Furthermore, from these findings, it can be concluded that most reviews regarding the Lazada application tend to convey negative sentiments. Users often raise concerns about features that need improvement and user interfaces that might be considered confusing. In addition, significant observations are made regarding application updates, which can affect their performance, sometimes inadvertently causing slower application performance. As a result, this has resulted in an increase in the number of negative reviews from users.

Conversely, positive reviews often revolve around appreciation for substantial discounts and user-friendly features, especially free shipping nationwide. Positive reviews also mention the ease of using PayLater, which lets users defer payment to the following month. This shows that there are positive aspects that Lazada can maintain and improve to keep its popularity among users.

This research provides insight into sentiment analysis using the Naive Bayes algorithm. The results showed a fairly high accuracy of 84%, which indicates that the algorithm is effective for this task and has potential for sentiment analysis of applications such as Lazada. In addition, user feedback on the Lazada app showed that most users expressed negative sentiments. This points to areas where the app can be improved to provide a more satisfying user experience. Such areas include not only functional aspects but also the overall user interface and user experience design. The study therefore suggests concrete steps for improving the Lazada app: addressing the concerns raised in negative reviews gives Lazada an opportunity to increase user satisfaction and potentially attract new users.

This confirms the practical significance of sentiment analysis and its potential impact on user-focused product development. In conclusion, the use of the Naive Bayes algorithm in sentiment analysis proved to be a valuable approach, with an accuracy of 84%. The abundance of user feedback further confirms the relevance of this research and gives Lazada a clear basis for making informed improvements to its app, improving the user experience, and potentially expanding its user base.

4. CONCLUSION

The research findings show that an accuracy of 84% was achieved. For negative comments, precision reached 83% and recall 95%, resulting in an F1-score of 89%. For positive comments, precision reached 87%, recall 65%, and F1-score 75%. The study uses TF-IDF weighting combined with the Naive Bayes algorithm to classify user reviews into negative or positive sentiment. The analysis of user sentiment toward the Lazada application shows that negative sentiment dominates, which indicates potential areas of improvement, particularly in features and other aspects of the service. In summary, the data collected from the Google Play Store for the Lazada app contains more negative reviews than positive reviews. Negative feedback often concerns app feature issues, the delivery process, and frequent updates that slow the app down. Positive reviews, on the other hand, often highlight the promotions offered by Lazada, such as free shipping, product discounts, and cashback.

ACKNOWLEDGMENT

With sincerity and gratitude, the author would like to express his gratitude to all parties who have participated in completing the writing of this journal. Thank you to colleagues who provided valuable input, experience, and insight. Gratitude is also conveyed to the supervisors who have provided guidance and encouragement throughout the research process. Not to forget, thank you to the family who always provide support and understanding. All these contributions and support are very meaningful in achieving the success of the preparation of this journal.

Thank you for the cooperation and dedication of all parties.

REFERENCES

[1] S. Ayu and A. Lahmi, “Peran e-commerce terhadap perekonomian Indonesia selama pandemi Covid-19,” Jurnal Kajian Manajemen Bisnis, vol. 9, no. 2, p. 114, Dec. 2020, doi: 10.24036/jkmb.10994100.

[2] M. F. El Firdaus, N. Nurfaizah, and S. Sarmini, “Analisis Sentimen Tokopedia Pada Ulasan di Google Playstore Menggunakan Algoritma Naïve Bayes Classifier dan K-Nearest Neighbor,” JURIKOM (Jurnal Riset Komputer), vol. 9, no. 5, p. 1329, Oct. 2022, doi: 10.30865/jurikom.v9i5.4774.

[3] N. C. Agustina, D. Herlina Citra, W. Purnama, C. Nisa, and A. Rozi Kurnia, “The Implementation of Naïve Bayes Algorithm for Sentiment Analysis of Shopee Reviews on Google Play Store (Implementasi Algoritma Naive Bayes untuk Analisis Sentimen Ulasan Shopee pada Google Play Store),” MALCOM: Indonesian Journal of Machine Learning and Computer Science, vol. 2, pp. 47–54, 2022.

[4] D. Pranitasari and A. N. Sidqi, “Analisis Kepuasan Pelanggan Elektronik Shopee menggunakan Metode E-Service Quality dan Kartesius,” Jurnal Akuntansi dan Manajemen, vol. 18, no. 02, pp. 12–31, Oct. 2021, doi: 10.36406/jam.v18i02.438.

[5] E. H. Muktafin, K. Kusrini, and E. T. Luthfi, “Analisis Sentimen pada Ulasan Pembelian Produk di Marketplace Shopee Menggunakan Pendekatan Natural Language Processing,” Jurnal Eksplora Informatika, vol. 10, no. 1, pp. 32–42, Sep. 2020, doi: 10.30864/eksplora.v10i1.390.

[6] G. Manik, I. Ernawati, and I. Nurlaili, Analisis Sentimen Pada Review Pengguna E-Commerce Bidang Pangan Menggunakan Metode Support Vector Machine (Studi Kasus: Review Sayurbox dan Tanihub pada Google Play). 2021.

[7] I. Habib Kusuma and N. Cahyono, “Analisis Sentimen Masyarakat Terhadap Penggunaan E-Commerce Menggunakan Algoritma K-Nearest Neighbor,” vol. 8, no. 3, 2023.

[8] D. Pakpahan, V. Siallagan, and S. Siregar, “Classification of E-Commerce Product Descriptions with The Tf-Idf and Svm Methods,” sinkron, vol. 8, no. 4, pp. 2130–2137, Oct. 2023, doi: 10.33395/sinkron.v8i4.12779.

[9] S. Pandya and P. Mehta, “A Review On Sentiment Analysis Methodologies, Practices And Applications,” International Journal of Scientific & Technology Research, vol. 9, p. 2, 2020, [Online]. Available: www.ijstr.org

[10] S. Smetanin, “The Applications of Sentiment Analysis for Russian Language Texts: Current Challenges and Future Perspectives,” IEEE Access, vol. 8, pp. 110693–110719, 2020, doi: 10.1109/ACCESS.2020.3002215.

[11] M. Haerunnissa, A. Priyanto, C. Asnawi, and N. A. Sa’diya, “Analisis Sentimen Kepuasan Pelanggan Perusahaan Telekomunikasi Seluler Telkomsel di Twitter,” vol. 12, no. 1, 2022, [Online]. Available: http://ejournal.unjaya.ac.id/index.php/Teknomatika/

[12] A. Rahman, E. Utami, and S. Sudarmawan, “Sentimen Analisis Terhadap Aplikasi pada Google Playstore Menggunakan Algoritma Naïve Bayes dan Algoritma Genetika,” Jurnal Komtika (Komputasi dan Informatika), vol. 5, no. 1, pp. 60–71, Jul. 2021, doi: 10.31603/komtika.v5i1.5188.


[13] N. Hendrastuty, A. Rahman Isnain, and A. Yanti Rahmadhani, “Analisis Sentimen Masyarakat Terhadap Program Kartu Prakerja Pada Twitter Dengan Metode Support Vector Machine,” vol. 6, no. 3, 2021, [Online]. Available: http://situs.com

[14] D. Darwis, N. Siskawati, and Z. Abidin, “Penerapan Algoritma Naive Bayes untuk Analisis Sentimen Review Data Twitter BMKG Nasional,” jurnal TEKNO KOMPAK, vol. 15, no. 1.

[15] L. Sun, X. Zhang, Y. Qian, J. Xu, and S. Zhang, “Feature selection using neighborhood entropy-based uncertainty measures for gene expression data classification,” Inf Sci (N Y), vol. 502, pp. 18–41, Oct. 2019, doi: 10.1016/j.ins.2019.05.072.

[16] A. Tabassum and R. R. Patil, “A Survey on Text Pre-Processing & Feature Extraction Techniques in Natural Language Processing,” International Research Journal of Engineering and Technology, 2020, [Online]. Available: www.irjet.net

[17] A. P. Pimpalkar and R. J. Retna Raj, “Influence of Pre-Processing Strategies on the Performance of ML Classifiers Exploiting TF-IDF and BOW Features,” ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal, vol. 9, no. 2, pp. 49–68, Jun. 2020, doi: 10.14201/adcaij2020924968.

[18] N. Arifin, U. Enri, and N. Sulistiyowati, “ PENERAPAN ALGORITMA SUPPORT VECTOR MACHINE (SVM) DENGAN TF-IDF N-GRAM UNTUK TEXT CLASSIFICATION,” jurnal STRING (Satuan Tulisan Riset dan Inovasi Teknologi), Vol. 6 No. 2 Desember 2021.

[19] W.-S. Choi, K.-C. Yoo, and S.-H. Choi, “EasyChair Preprint Create List of Stopwords and Typing Error by TF-IDF Weight Value Create List of Stopwords and Typing Error by TF-IDF Weight Value,” 2019.

[20] R. Ordila, R. Wahyuni, Y. Irawan, and M. Yulia Sari, “PENERAPAN DATA MINING UNTUK PENGELOMPOKAN DATA REKAM MEDIS PASIEN BERDASARKAN JENIS PENYAKIT DENGAN ALGORITMA CLUSTERING (Studi Kasus : Poli Klinik PT.Inecda),” Jurnal Ilmu Komputer, vol. 9, no. 2, pp. 148–153, Oct. 2020, doi: 10.33060/jik/2020/vol9.iss2.181.

[21] L. Setiyani, M. Wahidin, D. Awaludin, and S. Purwani, “Analisis Prediksi Kelulusan Mahasiswa Tepat Waktu Menggunakan Metode Data Mining Naïve Bayes : Systematic Review,” Faktor Exacta, vol. 13, no. 1, p. 35, Jun. 2020, doi: 10.30998/faktorexacta.v13i1.5548.

[22] I. Nurjanah, J. Karaman, I. Widaningrum, and D. Mustikasari, “Penggunaan Algoritma Naive Bayes Untuk Menentukan Pemberian Kredit Pada Koperasi Desa,” 2023.