Prediction of Indonesian Presidential Election Results using Sentiment Analysis with Naïve Bayes Method

(1)

Prediction of Indonesian Presidential Election Results using Sentiment Analysis with Naïve Bayes Method

Asno Azzawagama Firdaus¹, Anton Yudhana^2,*, Imam Riadi³, Mahsun⁴

1Master Program of Informatics Universitas Ahmad Dahlan, Yogyakarta, Indonesia

2Department of Electrical Engineering, Universitas Ahmad Dahlan, Yogyakarta, Indonesia

3Department of Information System, Universitas Ahmad Dahlan, Yogyakarta, Indonesia

4Department of Indonesian Language and Literature Education, Universitas Mataram, Mataram, Indonesia Email: ¹[email protected], ^2,*[email protected],³[email protected], ⁴[email protected]

Correspondence Author Email: [email protected]

Abstract−Social media serves as a solution for politicians as a campaign tool because it can save costs compared to conventional campaigns. The 2024 Indonesian Presidential Election has drawn public attention, especially among social media users. Twitter, as one of the widely used social media platforms in Indonesia, functions as an effective campaign forum.

However, the problem that arises is how to automatically collect social media data related to presidential discussions and provide conclusions on the analysis results. Of course, this is not easy if done manually. Sentiment analysis is one approach that can be used for this in order to draw conclusions and analysis related to the available data. Data was collected shortly after the registration of presidential and vice-presidential candidates in November 2023. This study aims to obtain sentiment results from the latest data obtained, get the best model from the Naive Bayes method, to conduct analysis in predicting presidential election results based on sentiment. However, at the time of data collection, candidate numbers had not been assigned by the Election organizers. The obtained data amounted to 11,569 records using the Valence Aware Dictionary for Sentiment Reasoning (VADER) library for labeling. After removing duplicated tweets, the data was reduced to 4,893 records, with each candidate pair having 1,631 data points. The sentiment analysis classification model was determined using the Naïve Bayes method with Term Frequency-Inverse Document Frequency (TF-IDF) feature extraction. Based on the data, the highest percentage of positive sentiment was found in Ganjar Pranowo - Mahfud MD data at 69.16%, and the highest negative sentiment was in Prabowo Subianto - Gibran Rakabuming Raka data at 52.12%. Common words in positive sentiment for Ganjar Pranowo - Mahfud MD include "strong," "corruption," "support," "reward," and others. Meanwhile, frequently appearing negative sentiment words for Prabowo Subianto - Gibran Rakabuming Raka include "child," "eldest," "mk," "young," and others. This research achieved an average accuracy of 76.67% using the Naive Bayes method on the entire dataset, indicating its reliabilit y in similar cases.

Keywords: Indonesia; Naïve Bayes; President; Sentiment Analysis; Twitter

1. INTRODUCTION

Indonesia will undergo a change of President in 2024, as announced on the official Government website. President Jokowi explained that there will be no delay for the General Election (Pemilu) on February 14, 2024, and the simultaneous Regional Head Election (Pilkada) in November 2024. According to research on the Presidential candidate projections [1], three pairs of candidates for President and Vice President have been established. The Presidential candidates mentioned in the research have registered together with their respective Vice-Presidential candidates, namely Anies Baswedan – Muhaimin Iskandar, Ganjar Pranowo – Mahfud MD, and Prabowo Subianto – Gibran Rakabuming Raka. The registration of participants in the 2024 Indonesian Presidential Election contest has also been done in collaboration with coalitions or alliances of political parties, each supporting a specific pair of candidates. Parties and campaign teams for each candidate pair actively participate in campaigning to ensure the success of their respective candidates.

Social media plays a crucial role for politicians as a platform to engage participants. Social media has grown rapidly in recent years, becoming the widest information disseminator globally [2][3]. Control over media is crucial due to its power to influence, regulate, and limit content on social platforms. Social media has a significant impact on its users [4], offering great opportunities for politicians to reach the public directly at minimal costs [5]. This presents an opportunity for campaign teams and political parties within coalitions to campaign more easily through social media, reaching voters faster.

Among the many popular social media platforms in Indonesia, Twitter is one of the most widely used.

Twitter allows users to write short messages up to 140 characters, send images, and videos [6]. Twitter provides a space to express perspectives [7], making it a primary online platform for politicians to engage directly with the public through social and political comments. This is why political accounts are often more active than non- political accounts [8]. About 60% of users discuss political issues on sites like Twitter, making social media a popular place for people to trust the latest political information [9]. As the election approaches, Twitter users actively participate in discussing election topics, candidates, and supporting parties.

Data on social media cannot be manually selected one by one due to its ever-increasing volume. Sentiment analysis as one approach that can be used to overcome problems with the data. Because this requires a lot of time and energy if done manually. Sentiment analysis is a discipline that analyzes statements, attitudes, and feelings [10]. Sentiment analysis is an approach that utilizes large amounts of data to provide conclusions in the form of

(2)

sentiments contained in the available data. Sentiment analysis is applied in various case studies, including health, social, economic, and political topics. When it comes to predicting political content, sentiment analysis is reliable.

Naïve Bayes is one of the probability methods used for classification. This method serves as a machine learning model that can be used as a reference for trained data. Naïve Bayes is also widely used for sentiment analysis case studies. Naïve Bayes has proven to be the most famous method for sentiment analysis problems [11].

Naïve Bayes is also widely used in many applications [11][12][13][14] due to its simplicity, efficiency, and traceability [15], as well as having higher accuracy than other methods for sentiment analysis approaches [16].

The implementation of this method often uses feature extraction to weight the text for easier recognition when creating a model. This weighting can be in the form of TF-IDF, which calculates the weight of text in each document that will be the input value in the Naïve Bayes method.

Many studies have discussed sentiment analysis using social media data. One of them is in research [17]

about thai tourism in Bangkok, Chiang Mai and Phuket. This study used decision tree classification methods, support vector machines and random forests with the best analysis accuracy obtained at SVM 77.4%. Another study [18] on the use of natural language to analyze logistics needs in Japan during the Covid-19 pandemic. The main implications that "logistics" generally have a positive meaning, the trend of increasing interest in logistics in Western Japan in 2022, the use of social media to address the challenges of the logistics industry and the analysis of logistics and transportation trends.

Other studies that discuss predictions using the sentiment analysis approach in the General Election are [10]

also used Facebook comment data on the 2019 Indonesian presidential election to gain popularity and sentiment of presidential election candidates. The percentage of popularity of the Jokowi-Ma'ruf pair was 40.52% and the Prabowo-Sandi pair was 59.48%. The percentage of positive sentiment obtained by the Jokowi-Ma'ruf pair was 56.76 and negative sentiment was 43.24%. While the Prabowo-Sandi pair has a percentage of positive sentiment as much as 24.21% and negative sentiment as much as 75.79%.

Other research [19] also explains presidential election projections based on data obtained on the Twitter platform. This research determines sentiment based on data before the determination of candidates for the 2024 Indonesian Presidential candidate using the SVM method. Sentiment results were obtained based on three datasets of selected candidates, namely Anies Baswedan 65.62%, Ganjar Pranowo 73.58%, and Prabowo Subianto 66.34%.

The results of the accuracy of the methods owned by the three datasets are Anies Baswedan 73%, Ganjar Pranowo 79% and Prabowo Subianto 79%

Based on the above explanation, this research projects the results of the Presidential election using Twitter data. The data used is the latest data after the registration of Presidential and Vice-Presidential candidates by the Election organizers. The approach used to obtain and analyze the data is sentiment analysis. Sentiment analysis can provide conclusions from the dataset to form positive or negative sentiment classes. The projection analysis of the Presidential election results based on sentiment analysis is conducted using the Naïve Bayes text classification method with the TF-IDF weighting technique.

Although there have been many studies on Presidential elections using sentiment analysis approaches, this research provides novelty in the form of obtaining the latest data. In addition, this research offers a solution to the imbalance in the amount of data in each class, which is a common problem in sentiment analysis cases and only a few studies have provided solutions. This research will have a significant impact on various elements such as politicians, candidate winning teams, survey institutions and researchers as a reference in predicting the results of the presidential election. The data from this study can also be an input for other similar studies as a comparison for novelty in other studies. Finally, this research aims to obtain candidate sentiment results, classification models, and analysis using the Naive Bayes method with updated data on presidential election candidates.

2. RESEARCH METHODOLOGY

The Election Stages released on the official website of the General Election Commission of the Republic of Indonesia (KPU-RI) state that the schedule for the nomination of the President and Vice President of Indonesia is from October 19 to November 25, 2023. This is followed by the campaign period from November 28, 2023, to February 10, 2024, and the next stage is the voting process on February 14, 2024. Therefore, several political parties and coalitions have announced the candidates they support as candidates for the President of Indonesia in 2024 long before the official nomination date for the President and Vice President.

After the official registration at KPU-RI, each candidate is accompanied by their respective Vice- Presidential candidate and each political party supporting the coalition. Therefore, data collection was conducted for the three mentioned candidates in November 2023.

Sentiment analysis serves as a means to measure the results of the Presidential Election. This approach is carried out through several stages until it can be visualized based on popularity and the distribution of voter regions.

One of the shortcomings of other studies with similar cases is the inability to adjust data in each class to be equal.

This, of course, will alter the research results. This study addresses this issue by implementing stratified K-Fold Cross-Validation to ensure data adjustment across classes. Additionally, the study ensures the use of an equal amount of data for each candidate when drawing conclusions.

(3)

Figure 1. Framework for Twitter based sentiment analysis for Indonesia 2024 election

Figure 1 illustrates the research framework from data collection to the analysis and interpretation of public responses to the dataset after official registration at KPU-RI. The data utilized in November 2023, post-official registration, involves each prospective presidential candidate registering alongside their respective Vice- Presidential candidates, such as Anies Baswedan paired with Muhaimin Iskandar, Ganjar Pranowo with Mahfud MD, and Prabowo Subianto with Gibran Rakabuming Raka. The data collection process is followed by data cleaning (pre-processing) and labeling. Labeling is used to determine whether a text falls into a positive or negative category [19]. The labeling used is VADER, which is employed after translation into English. VADER is a lexicon- based technique for sentiment analysis in text [20]. VADER performs exceptionally well in the domain of social media due to its advantages over individual human analysis [21].

TF-IDF is employed as the feature extraction process in this research. The resulting TF-IDF feature extraction is then subjected to classification using the Naïve Bayes method to obtain the best model for the observed cases. Validation of the model will be conducted as a benchmark for evaluating the success of the classification method used.

TF-IDF is a commonly used weighting method in data mining to assess the influence of words in considering documents and the global corpus [22]. TF (Term Frequency) represents the frequency of word occurrences in an article or document class. Meanwhile, IDF (Inverse Document Frequency) provides the ability of words to be distinguished based on a classification method [23].

tfidf_i= tf_iidf_i (1)

TF indicates the frequency of occurrence in the corpus (Function 1). IDF is a measure of the importance of a term across the entire corpus. It consists of a logarithmic calculation of the inverse document relationships in the corpus (Function 2). The TF-IDF weight is calculated by multiplying the two (Function 3). The larger the weight, the more significant relevant keywords are in the corpus [24].

Naïve Bayes method was proposed by Thomas Bayes and serves as a reference for predicting future possibilities based on past experiences [25][26]. In theory, the Naive Bayes model has the lowest error rate compared to other classification algorithms [27]. Naïve Bayes is a classic classification method [28][29] widely used in real-world classification tasks [12][30], supervised based on Bayes' Theorem of conditional probability [31]. This algorithm is based on the Bayes' Theorem, as shown in the following formula [32][33][34].

P(A|B) =^P(B|A)P(A)

P(B) (2)

Formula (Function 4) illustrates the calculation of the probability of hypothesis A based on condition B. It indicates that the initial probability of class P(A) from the training data, the conditional probability of attribute distribution P(A|B), and the initial probability in each class P(B|A) multiplied by P(B).

Performing measurements on models in machine learning is crucial [35]. Cross-validation (K-Fold) is a method used to evaluate predictions on both training and testing samples. The data is partitioned into testing and training subsets, repeated at intervals to determine errors each time [3]. However, there may be an imbalance in class distribution, and the solution to this issue is to use stratified [36]. Stratified K-Fold Cross Validation is a method of data partitioning into k folds to maintain similar class proportions in each sample [37]. Evaluation methods for classification include precision, accuracy, recall, and F1 score, which have proven useful for confusion matrices [34].

(4)

Accuracy = TP + (^TN

TP) FP + FN + TN (3)

Precision =^TP

TP+ FP (4)

Recall =^TP

TP+ FN (5)

F1 = 2 x precision x ^recall

precision+ recall (6)

TP indicates true positive results, FP indicates false positive results, TN indicates true negative results, and FN indicates false negative results.

3. RESULT AND DISCUSSION

Indonesian Presidential Election in 2024 is projected to be determined using sentiment analysis approaches. This research establishes projections based on sentiment analysis results from data collected after the official registration of candidates.

3.1 Labelling

This study determines sentiment on Tweets using the VADER technique. The obtained data then undergoes preprocessing stages. The preprocessing stage, especially the drop duplicated step, results in the removal of some data deemed unnecessary. The data obtained in November 2023 amounted to 4,893 data points, with each candidate having 1,631 data points. The data is then translated into English to facilitate this labeling technique further. This research collects data before and after the official registration of candidates with the election organizing institution.

The sentiment results for the comparison of each candidate's data are displayed in Figure 2.

Figure 2. Sentiment Post Official Registration Candidates

Figure 2 displays a chart of data results with the highest positive sentiment, which is for Ganjar Pranowo and Mahfud MD, at 69.16%. Meanwhile, the largest negative sentiment is for Prabowo Subianto and Gibran Rakabuming Raka at 52.12%. Based on the sentiment chart obtained, common words that appear in each data for the most prevalent positive and negative sentiments are indicated in Figure 3.

(a) (b)

Figure 3. Word Cloud Sentiment Positive and Negative

Figure 3 shows a comparison of word sets. The largest negative sentiment is displayed in the data for Prabowo Subianto - Gibran Rakabuming Raka (b), reaching 52.12%, while the highest positive sentiment is found in the data for Ganjar Pranowo - Mahfud MD (a), amounting to 69.16%. The word sets that appear in the data for Prabowo Subianto - Gibran Rakabuming Raka include words like "child," "eldest," "mk," "young," and others.

Meanwhile, the data for Ganjar Pranowo - Mahfud MD contains words such as "strong," "corruption," "support,"

"reward," and others. The common words gathered are words frequently discussed by Twitter users in discussing the current Indonesian Presidential Election. The appearance of common words may change depending on the time

(5)

of word collection, as the topics discussed are issues that are currently trending in the context of the Presidential Election.

The next step is to translate the document into English and perform labeling using the VADER labeling technique. The labeling results undergo preprocessing steps, including dropping duplicates. The initial data count is 11,569, and after removing duplicate tweets, it is reduced to 4,893 data points. Each candidate has 1,631 data points. The results of preprocessing are then input into the TF-IDF stage, as shown in Figure 4.

3.2 TF-IDF

The TF-IDF stage is carried out using input from the preprocessing results in the stemming stage, as shown in Figure 4.

Figure 4. Architecture of TF-IDF

Figure 4 illustrates the TF-IDF process flow from the preprocessing stage. The stemming results are then passed on to calculate IDF and TF. Subsequently, the summation of DF is processed to obtain IDF, thus finding the TF-IDF results by multiplying TF and IDF.

Figure 5. Results TF-IDF

Figure 4 illustrates an example of TF-IDF calculation in a program that counts the occurrences of each word available in each document. Each word will be assigned a weight and occurrences in each document will be determined to find the frequency of word occurrences. Based on the TF value, the inverse number of word occurrences or IDF for each document will be found. Thus, the DF value is obtained, measuring how often a word appears in a document, with a higher value indicating more frequent occurrences of that word in the document.

Finally, the final TF-IDF value is obtained as input into the classification model for each method used.

The final output of TF-IDF will serve as input in the Naive Bayes method.

(6)

3.3 Naïve Bayes

The Naive Bayes calculation process is performed per fold for each word in the document, as shown in Figure 5.

Figure 6. Naive Bayes Display Calculation per Fold

The Multinomial Naïve Bayes classification method is used to create a model for each obtained data. The compared data consists of the candidates for President and Vice President. The models generated for both of these datasets are illustrated in Figure 6 and Figure 7 using a total of 10 folds.

3.4 Evaluation

The evaluation was conducted using a 10-fold method with testing results for precision, recall, F1-Score, and accuracy on three sets of presidential and vice-presidential candidate pairs, as indicated in Figure 7.

(a) (b)

(c)

Figure 7. Results Naive Bayes

(7)

Figure 6 illustrates the evaluation results of Precision, Recall, and F1-Score for each of the three candidates' data. Data for Anies Baswedan - Muhaimin Iskandar (a), Ganjar Pranowo - Mahfud MD (b), and Prabowo Subianto - Gibran Rakabuming Raka (c). The accuracy curves for each candidate are shown in Figure 7.

(a) (b)

(c)

Figure 8. Curve Accuracy for Each Fold of Naïve Bayes

Figure 7 shows the accuracy results graph for the Naive Bayes model. Figures (a), (b), and (c) depict data obtained after the official registration of presidential and vice-presidential candidates but before the assignment of ballot numbers. The keywords used are Anies Baswedan – Muhaimin Iskandar (a), Ganjar Pranowo – Mahfud MD (b), and Prabowo Subianto – Gibran Rakabuming Raka (c). The best fold accuracy was obtained by the Ganjar Pranowo – Mahfud MD pair (b) in fold 5 with 79% accuracy, Prabowo Subianto – Gibran Rakabuming Raka (c) in fold 9 with 77% accuracy, and Anies Baswedan – Muhaimin Iskandar (a) in fold 9 with 74% accuracy. The average accuracy for the entire dataset using the Naive Bayes method is 76.67. This proves that this method can be relied upon in sentiment analysis cases for political topic discussions.

3.5 Analysis

Every election contestant, especially in the Presidential Election, certainly undergoes a campaign process.

Campaigns are conducted to increase the candidate's popularity [38]. The popularity of a candidate is indicated by the mention of the number of words found in the data obtained, as shown in Table 1.

Table 1. Common Words Common Words Count

anies 1273

ganjar 1228

mahfud 1223

prabowo 939

gibran 852

muhaimin 483

Table 1 shows common words that appear in the data specifically focused on the names of the candidates for President and Vice President. The most frequently mentioned words in the data are anies with 1273 mentions, ganjar with 1228 mentions, mahfud with 1223 mentions, prabowo with 939 mentions, gibran with 852 mentions, and muhaimin with 483 mentions.

Comparing these findings with previous research, such as that by Chaudhry (2021), which demonstrated high accuracy when using the Naive Bayes method for sentiment analysis in the case of pre and post-election sentiments in America in 2020. The research achieved an accuracy of 94.58% using TF-IDF for feature extraction from Twitter data. Naive Bayes was found to have higher accuracy compared to other methods. Fitri et al (2023) showed that Naive Bayes outperformed Decision Tree and Random Forest, with an accuracy of 86.43% in sentiment analysis cases. Additionally, Naive Bayes was superior when compared to SVM, as demonstrated in the study by Alfina et al (2018), where Naive Bayes achieved 90.2% accuracy compared to SVM's 86.5% in a hate

(8)

speech case. Li et al (2020) also indicated that Naive Bayes outperformed SVM in sentiment analysis of video cases.

Finally, this study identified several factors for valuable comparisons. The complexity and size of the data influenced the performance of this method, and a balance of data among sentiment classes was necessary. If not achieved, techniques like stratified cross-validation are required to address this issue. The method's performance remained consistent even with more features and the ability to handle large-scale data quickly. Lastly, this method is straightforward when applied to various machine learning cases. This research has shortcomings, including: 1) It is not yet able to anticipate collections of Tweets from paid users (buzzers). This type of Tweet needs to be cleaned to obtain clean data for processing into the classification model. 2) The limitations of the obtained data do not match the voter population in Indonesia.

4. CONCLUSION

This research collected data on Twitter in November after candidate registration but before the determination of candidate pair numbers, totaling 11,569 tweets. Data cleansing stages were employed, such as case folding, tokenizing, stop word removal, normalization, stemming, and drop duplicated, to facilitate the classification process. The obtained tweets were labeled into positive and negative classes. This stage utilized the VADER technique, followed by the TF-IDF process, which helps determine the frequency of words in a document for classification purposes. The classification method used was Naïve Bayes, with the division of test and training data samples using stratified K-Fold to address data imbalance between classes. The validation model used precision, recall, F1 score, and accuracy. Based on the obtained data, the highest percentage of positive sentiment was for Ganjar Pranowo - Mahfud MD at 69.16%, and the highest negative sentiment was for Prabowo Subianto - Gibran Rakabuming Raka at 52.12%. The best accuracy was 79%, the lowest was 74%, with an average accuracy across all data of 76.67%. This proves that the Naive Bayes method is reliable for sentiment analysis in the context of presidential election topics. The most frequently mentioned words in the data are anies with 1273 mentions, ganjar with 1228 mentions, mahfud with 1223 mentions, prabowo with 939 mentions, gibran with 852 mentions, and muhaimin with 483 mentions. However, this research can serve as a reference for similar studies, serving as the basis for predicting the results of the presidential election using data within different time frames. Undoubtedly, it will be compared with the actual results of the Indonesian Presidential Election in 2024. This can also serve as a reference for political parties or campaign teams as an alternative to electability surveys in predicting the election results. Additionally, in future research plans, various methods will be applied, especially in Natural Language Processing (NLP) learning, such as the Support Vector Machine method. This method will be compared as a new finding with available data to project the results of the Presidential Election based on data collected at different times

ACKNOWLEDGMENT

This work is supported by the Ministry of Education, Culture, Research, and Technology of the Republic of Indonesia based With Contract number: 181/E5/PG.02.00.PL/2023. The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation. Author also gratefully for the cooperation as a partner of the Nusantara Research Institute, the Center for Human Development Studies Foundation.

REFERENCES

[1] A. A. Firdaus, A. Yudhana, and I. Riadi, “Public Opinion Analysis of Presidential Candidate Using Naïve Bayes Method,” Kinet. Game Technol. Inf. Syst. Comput. Network, Comput. Electron. Control, vol. 4, no. 2, pp. 563–570, May 2023, doi: 10.22219/kinetik.v8i2.1686.

[2] H. Yas et al., “The Negative Role of Social Media During the COVID-19 Outbreak,” Int. J. Sustain. Dev. Plan., vol. 16, no. 2, pp. 219–228, Apr. 2021, doi: 10.18280/ijsdp.160202.

[3] B. Al sari et al., “Sentiment analysis for cruises in Saudi Arabia on social media platforms using machine learning algorithms,” J. Big Data, vol. 9, no. 1, 2022, doi: 10.1186/s40537-022-00568-5.

[4] A. Zahri, R. Adam, and E. B. Setiawan, “Social Media Sentiment Analysis using Convolutional Neural Network (CNN) dan Gated Recurrent Unit (GRU),” J. Ilm. Tek. Elektro Komput. dan Inform., vol. 9, no. 1, pp. 119–131, 2023, doi:

10.26555/jiteki.v9i1.25813.

[5] E. S. Prihatini, “Women and social media during legislative elections in Indonesia,” Womens. Stud. Int. Forum, vol. 83, p. 102417, Nov. 2020, doi: 10.1016/j.wsif.2020.102417.

[6] S. Hinduja, M. Afrin, S. Mistry, and A. Krishna, “Machine learning-based proactive social-sensor service for mental health monitoring using twitter data,” Int. J. Inf. Manag. Data Insights, vol. 2, no. 2, p. 100113, 2022, doi:

10.1016/j.jjimei.2022.100113.

[7] N. M. Azahra and E. B. Setiawan, “Sentence-Level Granularity Oriented Sentiment Analysis of Social Media Using Long Short-Term Memory (LSTM) and IndoBERTweet Method,” J. Ilm. Tek. Elektro Komput. dan Inform., vol. 9, no. 1, pp.

85–95, 2023, doi: 10.26555/jiteki.v9i1.25765.

(9)

[8] D. Antypas, A. Preece, and J. Camacho-Collados, “Negativity spreads faster: A large-scale multilingual twitter analysis on the role of sentiment in political communication,” Online Soc. Networks Media, vol. 33, no. January, 2023, doi:

10.1016/j.osnem.2023.100242.

[9] R. H. Ali, G. Pinto, E. Lawrie, and E. J. Linstead, “A large-scale sentiment analysis of tweets pertaining to the 2020 US presidential election,” J. Big Data, vol. 9, no. 1, 2022, doi: 10.1186/s40537-022-00633-z.

[10] B. Haryanto, Y. Ruldeviyani, F. Rohman, T. N. Julius Dimas, R. Magdalena, and F. Muhamad Yasil, “Facebook analysis of community sentiment on 2019 Indonesian presidential candidates from Facebook opinion data,” Procedia Comput.

Sci., vol. 161, pp. 715–722, 2019, doi: 10.1016/j.procs.2019.11.175.

[11] Z. Geng, Q. Meng, J. Bai, J. Chen, and Y. Han, “A model-free Bayesian classifier,” Inf. Sci. (Ny)., vol. 482, pp. 171–

188, 2019, doi: https://doi.org/10.1016/j.ins.2019.01.026.

[12] W. M. Shaban, A. H. Rabie, A. I. Saleh, and M. A. Abo-Elsoud, “Accurate detection of COVID-19 patients based on distance biased Naïve Bayes (DBNB) classification strategy,” Pattern Recognit., vol. 119, 2021, doi:

10.1016/j.patcog.2021.108110.

[13] H. Zhang, L. Jiang, and L. Yu, “Attribute and instance weighted naive Bayes,” Pattern Recognit., vol. 111, p. 107674, Mar. 2021, doi: 10.1016/j.patcog.2020.107674.

[14] S. Wang, J. Ren, and R. Bai, “A semi-supervised adaptive discriminative discretization method improving discrimination power of regularized naive Bayes,” Expert Syst. Appl., vol. 225, no. November 2022, 2023, doi:

10.1016/j.eswa.2023.120094.

[15] R. Blanquero, E. Carrizosa, P. Ramírez-Cobo, and M. R. Sillero-Denamiel, “Variable selection for Naïve Bayes classification,” Comput. Oper. Res., vol. 135, 2021, doi: 10.1016/j.cor.2021.105456.

[16] V. A. Fitri, R. Andreswari, and M. A. Hasibuan, “Sentiment analysis of social media Twitter with case of Anti-LGBT campaign in Indonesia using Naïve Bayes, decision tree, and random forest algorithm,” Procedia Comput. Sci., vol. 161, pp. 765–772, 2019, doi: 10.1016/j.procs.2019.11.181.

[17] N. Leelawat et al., “Twitter data sentiment analysis of tourism in Thailand during the COVID-19 pandemic using machine learning,” Heliyon, vol. 8, no. 10, 2022, doi: 10.1016/j.heliyon.2022.e10894.

[18] E. Hirata and T. Matsuda, “Examining logistics developments in post-pandemic Japan through sentiment analysis of Twitter data,” Asian Transp. Stud., vol. 9, no. April 2023, 2023, doi: 10.1016/j.eastsj.2023.100110.

[19] A. A. Firdaus, A. Yudhana, and I. Riadi, “DECODE : Jurnal Pendidikan Teknologi Informasi,” Decod. J. Pendidik.

Teknol. Inf., vol. 3, no. 2, pp. 236–245, 2023, doi: http://dx.doi.org/10.51454/decode.v3i2.172.

[20] M. Qorib, T. Oladunni, M. Denis, E. Ososanya, and P. Cotae, “Covid-19 vaccine hesitancy: Text mining, sentiment analysis and machine learning on COVID-19 vaccination Twitter dataset,” Expert Syst. Appl., vol. 212, no. September 2022, p. 118715, 2023, doi: 10.1016/j.eswa.2022.118715.

[21] H. Xu, R. Liu, Z. Luo, and M. Xu, “COVID-19 Vaccine Sensing: Sentiment Analysis and Subject Distillation from Twitter Data,” SSRN Electron. J., vol. 8, no. July, 2022, doi: 10.2139/ssrn.4073419.

[22] S. Xu, Y. Leng, G. Feng, C. Zhang, and M. Chen, “A gene pathway enrichment method based on improved TF-IDF algorithm,” Biochem. Biophys. Reports, vol. 34, no. December 2022, p. 101421, 2023, doi:

10.1016/j.bbrep.2023.101421.

[23] M. Liang and T. Niu, “Research on Text Classification Techniques Based on Improved TF-IDF Algorithm and LSTM Inputs,” Procedia Comput. Sci., vol. 208, pp. 460–470, 2022, doi: 10.1016/j.procs.2022.10.064.

[24] M. Chiny, M. Chihab, Y. Chihab, and O. Bencharef, “LSTM, VADER and TF-IDF based Hybrid Sentiment Analysis Model,” Int. J. Adv. Comput. Sci. Appl., vol. 12, no. 7, pp. 265–275, 2021, doi: 10.14569/IJACSA.2021.0120730.

[25] A. Yudhana and A. Dwi, “Spatial distribution of soil nutrient content for sustainable rice agriculture using geographic information system and Naïve Bayes classifier,” Int. J. Smart Sens. Intell. Syst., vol. 16, no. 1, 2023, doi: 10.2478/ijssis- 2023-0001.

[26] A. Yudhana, D. Sulistyo, and I. Mufandi, “GIS-based and Naïve Bayes for nitrogen soil mapping in Lendah, Indonesia,”

Sens. Bio-Sensing Res., vol. 33, p. 100435, 2021, doi: 10.1016/j.sbsr.2021.100435.

[27] Z. Ye, P. Song, D. Zheng, X. Zhang, and J. Wu, “A Naive Bayes model on lung adenocarcinoma projection based on tumor microenvironment and weighted gene co-expression network analysis,” Infect. Dis. Model., vol. 7, no. 3, pp. 498–

509, 2022, doi: 10.1016/j.idm.2022.07.009.

[28] H. Zhang, L. Jiang, and G. I. Webb, “Rigorous non-disjoint discretization for naive Bayes,” Pattern Recognit., vol. 140, 2023, doi: 10.1016/j.patcog.2023.109554.

[29] W. Guo, G. Wang, C. Wang, and Y. Wang, “Distribution network topology identification based on gradient boosting decision tree and attribute weighted naive Bayes,” Energy Reports, vol. 9, pp. 727–736, 2023, doi:

10.1016/j.egyr.2023.04.256.

[30] A. H. Rabie, N. A. Mansour, A. I. Saleh, and A. E. Takieldeen, “Expecting individuals’ body reaction to Covid-19 based on statistical Naïve Bayes technique,” Pattern Recognit., vol. 128, 2022, doi: 10.1016/j.patcog.2022.108693.

[31] M. Vishwakarma and N. Kesswani, “A new two-phase intrusion detection system with Naïve Bayes machine learning for data classification and elliptic envelop method for anomaly detection,” Decis. Anal. J., vol. 7, no. January, 2023, doi:

10.1016/j.dajour.2023.100233.

[32] D. van Herwerden, J. W. O’Brien, P. M. Choi, K. V. Thomas, P. J. Schoenmakers, and S. Samanipour, “Naive Bayes classification model for isotopologue detection in LC-HRMS data,” Chemom. Intell. Lab. Syst., vol. 223, no. November 2021, 2022, doi: 10.1016/j.chemolab.2022.104515.

[33] M. Artur, “Review the performance of the Bernoulli Naïve Bayes Classifier in Intrusion Detection Systems using Recursive Feature Elimination with Cross-validated selection of the best number of features,” Procedia Comput. Sci., vol. 190, no. 2019, pp. 564–570, 2021, doi: 10.1016/j.procs.2021.06.066.

[34] A. Tariq et al., “Modelling, mapping and monitoring of forest cover changes, using support vector machine, kernel logistic regression and naive bayes tree models with optical remote sensing data,” Heliyon, vol. 9, no. 2, 2023, doi:

10.1016/j.heliyon.2023.e13212.

[35] A. Peryanto, A. Yudhana, and R. Umar, “Convolutional Neural Network and Support Vector Machine in Classification

(10)

of Flower Images,” Khazanah Inform. J. Ilmu Komput. dan Inform., vol. 8, no. 1, pp. 1–7, 2022, doi:

10.23917/khif.v8i1.15531.

[36] C. B. G. Allo, L. S. A. Putra, N. R. Paranoan, and V. A. Gunawan, “Comparing Logistic Regression and Support Vector Machine in Breast Cancer Problem,” Jambura J. Probab. Stat., vol. 4, no. 2022, 2023.

[37] S. Szeghalmy and A. Fazekas, “A Comparative Study of the Use of Stratified Cross-Validation and Distribution-Balanced Stratified Cross-Validation in Imbalanced Learning,” Sensors, vol. 23, no. 4, p. 2333, Feb. 2023, doi: 10.3390/s23042333.

[38] U. Daxecker and M. Rauschenbach, “Election type and the logic of pre-election violence: Evidence from Zimbabwe,”

Elect. Stud., vol. 82, no. January, 2023, doi: 10.1016/j.electstud.2023.102583.