Comparative Analysis of Naive Bayes Model Performance in Hate Speech Detection on Twitter Social Media
Muhammad Hadyan Baqi1,*, Yuliant Sibaroni2, Sri Suryani Prasetiyowati3
School of Computing, Informatics Study Program, Telkom University, Bandung, Indonesia
Email: 1,*hadyanbaqi@student.telkomuniversity.ac.id, 2[email protected], 3[email protected]
Corresponding Author Email: hadyanbaqi@student.telkomuniversity.ac.id
Submitted 08-01-2023; Accepted 08-02-2023; Published 17-02-2023

Abstract
Twitter is a popular social media platform in Indonesia, and for many people it is a place to find and disseminate information. Hate speech is aggressive behavior directed at individuals or groups on the basis of race, gender, religion, nationality, ethnicity, sexual orientation, gender identity, or disability. In this study, hate speech is modeled using Naïve Bayes classifiers, namely the Multinomial, Bernoulli, and Gaussian Naïve Bayes models. These methods were chosen because Naïve Bayes is simple yet performs well in sentiment analysis tasks. This research aims to identify the method with the highest accuracy in analyzing hate speech, so that the Naïve Bayes model can provide the best solution for the hate speech problem. The process carried out in this study is to process all data obtained from the Twitter social media platform and then classify it using the Multinomial Naïve Bayes, Gaussian Naïve Bayes, and Bernoulli Naïve Bayes models into HS and non-HS sentiment categories. To obtain the best accuracy, two different scenarios were used. The analysis shows that the Multinomial Naïve Bayes model achieves the best accuracy of 82.13%, higher than the other models.
Keywords: Naïve Bayes; Multinomial; Gaussian; Bernoulli; Hate Speech
1. INTRODUCTION
Technology continues to develop, which positively affects the field of information and communication, so that information can be obtained quickly and easily. This is evident from the increasing use of social media among the public. Based on the We Are Social survey results, there were more than 190 million social media users in Indonesia as of January 2022 [1], an increase of 12.35% compared to 2021, when there were only 171 million users. Most people use social media to find information and express opinions. Social media activities can have positive and negative impacts on users. The negative impacts are felt when users spend too much time on social media, becoming lazy and more self-absorbed, and therefore less concerned with others and their surroundings [2]. One of the most popular social media platforms in Indonesia is Twitter. The official Kominfo website stated in 2013 that Indonesia ranked fifth in Twitter usage worldwide, after the United States, Brazil, Japan, and England [3].
Freedom of opinion can also create an uproar on social media when its limits are exceeded, so it is not surprising that Twitter users are sometimes reprimanded. Hate speech is often intended to insult, demean, or harm the victim. The largest number of reprimanded Twitter accounts was 79, recorded between February 23 and March 11, 2021. In the field of informatics, hate speech can be detected by classifying tweet data on Twitter. In addition, the government has addressed hate speech through the UU ITE, which prohibits internet users from disseminating information that incites hatred [4].
The rise of hate speech on social media calls for research on hate speech detection. Such detection can be performed by implementing the Naïve Bayes model, as was done by Gurinder Singh et al. [5]. That research classified sentiment by implementing the Bernoulli Naïve Bayes and Multinomial Naïve Bayes models. Naïve Bayes was chosen because Bayes' theorem gives the conditional probability of one event given another based on the probability of each event.
The results showed that the Multinomial Naive Bayes model obtained better accuracy [5].
While that research used only Naïve Bayes variants, subsequent research on Twitter sentiment analysis using machine learning techniques, namely Logistic Regression, Naïve Bayes, and Multinomial Naïve Bayes, was conducted by P. S. Mishra and S. Tanuben [6]. The result is that Multinomial Naïve Bayes obtained the most optimal accuracy, at 98%.
Meanwhile, research on sentiment classification of movie reviews using Multinomial Naïve Bayes was also conducted by Muhamad Biki Hamzah [7]. The technique proposed in that study combines Multinomial Naïve Bayes with AdaBoost to enhance its accuracy. The final accuracy of the Multinomial Naïve Bayes algorithm with AdaBoost and chi-square feature selection is 87.74%.
In addition, the Multinomial Naïve Bayes algorithm was applied in an opinion mining system by C. Fiarni et al. [8]. In the tests, the proposed Binarized Multinomial Naïve Bayes model achieved a precision of 96.67% and a recall of 96.4%, while the normal Multinomial Naïve Bayes model produced a precision of 83.3% and a recall of 80.3%.
The application of Naïve Bayes algorithm variations for sentiment analysis on Twitter data was also carried out by N. Umar and M. Adnan Nur [9]. That research aimed to find the Naïve Bayes variant with the highest accuracy when term frequencies are weighted with TF-IDF. It used the Bernoulli, Gaussian, Complement, and Multinomial Naïve Bayes models. Testing was done by splitting the data into training and test sets with k-fold cross-validation, and the accuracy was evaluated with a confusion matrix to find the best model. The best result was obtained by the Multinomial Naïve Bayes model with an accuracy of 0.6374, followed by the Bernoulli Naïve Bayes model with 0.6337.
Based on the research above, no study has yet compared several Naïve Bayes models for hate speech detection. In this research, the author classifies hate speech by implementing several Naïve Bayes models, namely Multinomial, Gaussian, and Bernoulli, to find the one with the most optimal performance. Naïve Bayes was chosen because it performs well enough on many problems with large amounts of data; for example, the Naïve Bayes modeling by P. S. Mishra and S. Tanuben [6] achieved 98% accuracy. Therefore, implementing these models is expected to reveal the most optimal Naïve Bayes variant among Multinomial, Gaussian, and Bernoulli for hate speech detection.
2. RESEARCH METHODOLOGY
2.1 System Design
The system is built to compare the performance of Naïve Bayes models in hate speech detection. Building this model consists of several stages; Figure 1 describes the system that was formed.
Figure 1. System Design Flowchart
System development starts with crawling data from Twitter, then labeling the data, and continues until model testing is carried out with the Naïve Bayes models. The training data serves to train the model, while the testing data measures the performance of the model trained on that training data. The test produces predictions from which the classification performance of each Naïve Bayes model is measured.
2.2 Crawling Data
The data collection process was carried out using the tweepy library and the Twitter API to gain access to Twitter. Data were collected from April 20th, 2022, until November 10th, 2022, using hashtags such as #MataNajwaDebatJakarta, #DebatFinalPilkadaJKT, #TolakTegasPenundaanPemilu, #Tolak3Periode, and #MahasiwaBergerak. The resulting dataset contains 4506 tweets.
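The paper does not include the collection script itself; the snippet below is a minimal sketch, assuming Tweepy with standard-search access, of how tweets for the listed hashtags could be pulled. The credentials, per-hashtag limit, and output file name are placeholders, and the standard search endpoint only returns recent tweets, so a collection window from April to November 2022 would require repeated runs or a different endpoint.

```python
# Hypothetical crawling sketch with Tweepy; keys, limits, and file names are placeholders.
import tweepy
import pandas as pd

auth = tweepy.OAuthHandler("API_KEY", "API_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

hashtags = ["#MataNajwaDebatJakarta", "#DebatFinalPilkadaJKT",
            "#TolakTegasPenundaanPemilu", "#Tolak3Periode", "#MahasiwaBergerak"]

rows = []
for tag in hashtags:
    # api.search_tweets (Tweepy v4; api.search in older versions) only covers recent tweets.
    for tweet in tweepy.Cursor(api.search_tweets, q=tag, lang="id",
                               tweet_mode="extended").items(500):
        rows.append({"tweet": tweet.full_text, "hashtag": tag})

pd.DataFrame(rows).to_csv("tweets_raw.csv", index=False)
```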
Figure 2. Wordcloud Visualization
Word clouds are a direct and engaging way to visualize text and are commonly used for various purposes, such as filtering text and highlighting frequently occurring words [10]. Based on the word cloud visualization in Figure 2, the words that appear most often in the dataset are "jokowikejarsetoran", "disbandkhilafatulmuslimin", and "ahok". The frequently appearing word "jokowikejarsetoran" relates to news about Jokowi's position for a third period, "ahok" is one of the political figures caught up in the case, and "disbandkhilafatulmuslimin" refers to an illegal community organization.
2.3 Labeling Data
The data labeling process shown in Table 1 was done manually by three people to reduce subjectivity in the dataset, labeling each tweet as HS (hate speech) or Non_HS (non-hate speech). Hate speech is a tweet containing prohibited text because it can incite hatred against an individual or group. Below is an example of the labeled dataset.
Table 1. Labeling Data

Tweet | Label
"Jika persoalan pemimpin hanya dikaitkan dengan lisan, maka apa bedanya gubernur dengan anak-anak? #MataNajwaDebatJakarta" | Non_HS
"Cerita Djarot ini diharapkan membuat para pendukung tetap berdamai. #debatfinalpilkadadki #temanahok #AniesSandi" | Non_HS
"Ahok mampu melawan korupsi di ibukota, bangun infrastruktur, mencoba memperbaiki dengan berbagai trobosan dan works well." | Non_HS
"Huuu sylvi tak tahu apa-apa asal ngoceh #NobarAhokDjarot #DebatFinalPilkadaJKT" | HS
"Ulama dan Rakyat Saatnya bergerak lawan Kedzoliman Negara. #JokowiKejarSetoran #JokowiKejarSetoran" | HS
"Jangan sampai diadili setelah 10 tahun, maka caranya : "Terus berkuasa melalui tangan Oligarki" #JokowiKejarSetoran #JokowiKejarSetoran" | HS

The following is the distribution of the data with HS and Non_HS labels. The data used in this study were collected from April 20th, 2022, to November 10th, 2022, using hashtag keywords such as #MataNajwaDebatJakarta, #DebatFinalPilkadaJKT, #TolakTegasPenundaanPemilu, #Tolak3Periode, and #MahasiwaBergerak. The distribution of the data can be seen in Figure 3 below.
Figure 3. Data Distribution
From the dataset that has been formed, there are 2840 Non_HS tweets and 1665 HS tweets.
2.4 Preprocessing Data
This preprocessing phase goes through several stages to make the data efficient and give it a format that is easy for computers to process.
2.4.1 Data Cleaning
Data cleaning removes HTML characters, symbols, usernames, URLs, and emails. It facilitates calculation and saves time [11], making it easier to group words in the tokenizing stage later.
2.4.2 Case Folding
At this stage, all characters are made uniform: characters in the dataset are converted into lowercase letters, so 'A-Z' characters become 'a-z' characters. Thus, if the same word appears with different casing, it is not identified as a different word and is mapped to the same token [12].
2.4.3 Tokenizing
In the tokenizing stage, the sentences in each line of the dataset are broken down into words, or tokens, so that further analysis can be performed on these tokens. The input string is split based on the words that compose it, with spaces used to separate words [13].
2.4.4 Normalization
At this stage, non-standard words are changed into standardized words and abbreviations are expanded, by comparing the data against a dictionary obtained from GitHub [14]. Even if a word is misspelled, it is corrected to a standardized word, and the same word written differently is united into the same token.
2.4.5 Stopwords Removal
Stopword removal is the removal of off-topic words that are considered unimportant. In this case, the words in the stoplist are conjunctions such as "dan", "atau", and "yang", obtained from Sastrawi [16]. This process helps to reduce unimportant features in the data [15].
2.4.6 Stemming
Stemming reduces the number of different indices of a piece of data, so that words with suffixes or prefixes are converted back to their base form. In addition, stemming also helps group words that share a root word and meaning but have different forms due to different affixes [11]. Stemming transforms text into something less readable but closer to its base meaning and more suitable for comparison across observations [16].
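The paper does not list its preprocessing code; the following is a minimal sketch of the six stages above, assuming the PySastrawi library for the Indonesian stopword list and stemmer. The small slang dictionary stands in for the normalization word list from GitHub [14], and the regular expressions are illustrative rather than the authors' exact rules.

```python
# Hedged sketch of the preprocessing pipeline: cleaning, case folding, tokenizing,
# normalization, stopword removal, and stemming.
import re
from Sastrawi.Stemmer.StemmerFactory import StemmerFactory
from Sastrawi.StopWordRemover.StopWordRemoverFactory import StopWordRemoverFactory

stemmer = StemmerFactory().create_stemmer()
stopwords = set(StopWordRemoverFactory().get_stop_words())
slang = {"gak": "tidak", "yg": "yang", "tdk": "tidak"}  # tiny excerpt; load the dictionary from [14] in practice

def preprocess(text: str) -> str:
    text = re.sub(r"http\S+|www\.\S+", " ", text)        # data cleaning: URLs
    text = re.sub(r"@\w+|\S+@\S+", " ", text)            # data cleaning: usernames, emails
    text = re.sub(r"[^a-zA-Z\s]", " ", text)             # data cleaning: symbols, numbers
    text = text.lower()                                  # case folding
    tokens = text.split()                                # tokenizing
    tokens = [slang.get(t, t) for t in tokens]           # normalization
    tokens = [t for t in tokens if t not in stopwords]   # stopword removal
    return stemmer.stem(" ".join(tokens))                # stemming

print(preprocess("Huuu sylvi tak tahu apa-apa asal ngoceh #NobarAhokDjarot"))
```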
2.5 Model Naïve Bayes Classifier
In this research, three models of Naïve Bayes were used and compared to find the best model. These models are Multinomial, Bernoulli, and Gaussian Naïve Bayes.
2.5.1 Multinomial Naïve Bayes
Multinomial Naïve Bayes is built from several terms, under the assumption that the occurrence of one term is independent of the others. In Naïve Bayes, the probability of a document d belonging to class c can be calculated with equation (1).
$P(c \mid d) = \frac{P(d \mid c)\,P(c)}{P(d)}$  (1)

Description:
c : class
d : document
P : probability
By counting the terms, the probability of document d belonging to class c can be calculated using equation (2) [17].

$P(c \mid d) = P(c) \prod_{1 \le k \le n_d} P(f_k \mid c)$  (2)

Description:
$P(c \mid d)$ : likelihood of class c given document d
$P(c)$ : prior probability of class c
$P(f_k \mid c)$ : likelihood of term $f_k$ in class c
$n_d$ : number of terms in document d

The parameter $P(f_k \mid c)$ is calculated with equation (3).

$P(f_k \mid c) = \frac{T_{ct} + 1}{\sum_{t' \in V} T_{ct'} + B}$  (3)

Description:
$T_{ct}$ : number of occurrences of term t in the documents of class c
$\sum_{t' \in V} T_{ct'}$ : total number of terms contained in all documents of class c
$B$ : number of distinct word variants present in the training data [18]
For a given document, Multinomial Naïve Bayes determines the class probability. Let N denote the vocabulary size and C the set of classes. Using Bayes' rule, Multinomial Naïve Bayes assigns a test document $t_i$ to the class with the highest probability $P(c \mid t_i)$, given by equation (4) [19].
$P(c \mid t_i) = \frac{P(c)\,P(t_i \mid c)}{P(t_i)}$  (4)
The class prior P(c) can be estimated by dividing the number of documents belonging to class c by the total number of documents, while $P(t_i \mid c)$, the likelihood of obtaining a document similar to $t_i$ in class c, is determined by equation (5).
$P(t_i \mid c) = \left(\sum_n f_{ni}\right)!\,\prod_n \frac{P(w_n \mid c)^{f_{ni}}}{f_{ni}!}$  (5)
where $P(w_n \mid c)$ is the probability that word n occurs given class c and $f_{ni}$ is the count of word n in the test document $t_i$. The latter probability is estimated from the training documents using equation (6).
$P(w_n \mid c) = \frac{1 + F_{nc}}{N + \sum_{x=1}^{N} F_{xc}}$  (6)
The Laplace estimator adds one to each word count to avoid the zero-frequency problem, and $F_{xc}$ is the count of word x in all training documents belonging to class c. The normalization factor $P(t_i)$ in equation (4) can be determined using equation (7).
$P(t_i) = \sum_{k=1}^{|C|} P(k)\,P(t_i \mid k)$  (7)
Since they do not depend on class c, the factors $\left(\sum_n f_{ni}\right)!$ and $\prod_n f_{ni}!$ in equation (5) can be dropped without changing the outcome, so equation (5) can be written as equation (8),

$P(t_i \mid c) = \alpha \prod_n P(w_n \mid c)^{f_{ni}}$  (8)

where $\alpha$ is a constant produced by the normalization process.
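As a concrete illustration of equations (2), (3), and (6), the toy sketch below (not the authors' code; the example documents and classes are invented) estimates Laplace-smoothed term probabilities per class and scores a new text in log space.

```python
# Toy Multinomial Naive Bayes: Laplace-smoothed term probabilities and a log-space class score.
import math
from collections import Counter

train_docs = {
    "HS":     ["lawan kedzoliman negara", "asal ngoceh"],
    "Non_HS": ["bangun infrastruktur", "tetap berdamai"],
}

vocab = {w for texts in train_docs.values() for t in texts for w in t.split()}
total_docs = sum(len(texts) for texts in train_docs.values())

priors, likelihoods = {}, {}
for c, texts in train_docs.items():
    counts = Counter(w for t in texts for w in t.split())
    denom = sum(counts.values()) + len(vocab)                      # sum_t' Tct' + B
    priors[c] = len(texts) / total_docs                            # P(c)
    likelihoods[c] = {w: (counts[w] + 1) / denom for w in vocab}   # eq. (3)

def predict(text):
    scores = {}
    for c in priors:
        score = math.log(priors[c])                                # log P(c)
        for w in text.split():
            if w in vocab:
                score += math.log(likelihoods[c][w])               # log of the product in eq. (2)
        scores[c] = score
    return max(scores, key=scores.get)

print(predict("lawan negara"))   # "HS" for this toy data
```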
2.5.2 Bernoulli Naïve Bayes
In Bernoulli modeling, the probability of each class word is calculated for each test datum. The difference from the multinomial model is that the probability calculation uses the number of documents containing the word t ($T_{ct}$), not the word's frequency of occurrence. Whereas the multinomial model uses the number of words and the vocabulary size, the Bernoulli model uses the number of training documents per class ($T_c$) and the number of classes ($\sum c$). Thus, the probability of each word is calculated using equation (9) [20].
$P(f_k \mid c) = \frac{T_{ct} + 1}{T_c + \sum c}$  (9)
After obtaining the probability for each word, the complementary probability is calculated for every word in the training data that does not appear in the test data. Suppose d is the set of words in the training data and f is the set of words in the test data; the training-data words that do not occur among the test-data words are identified. From these words, the word probabilities and the total number of such words (M) are derived. The sentence probability for a class is then calculated with equation (10) [20].
$P(c \mid d) = P(c) \prod_{k=1}^{N} P(f_k \mid c) \times \prod_{k'=1}^{M} \left(1 - P(f_{k'} \mid c)\right)$  (10)

Description:
$P(f_k \mid c)$ : probability of a word that appears in the test data, given class c
$1 - P(f_{k'} \mid c)$ : probability of a word that does not appear in the test data, given class c
M : total number of such words
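A toy sketch of equations (9) and (10) follows (an illustration, not the authors' implementation): probabilities are built from document-presence counts, and vocabulary words absent from the test text contribute the complementary (1 − P) factor.

```python
# Toy Bernoulli scoring: Tct counts documents of class c containing word w,
# and absent vocabulary words contribute (1 - P) as in eq. (10).
import math

def bernoulli_log_score(test_words, class_docs, prior, n_classes):
    """class_docs: list of sets, one set of words per training document of class c."""
    vocab = set().union(*class_docs)
    present = set(test_words)
    score = math.log(prior)                                       # P(c)
    for w in vocab:
        tct = sum(1 for d in class_docs if w in d)                # documents containing w
        p = (tct + 1) / (len(class_docs) + n_classes)             # eq. (9)-style smoothing
        score += math.log(p) if w in present else math.log(1.0 - p)  # eq. (10)
    return score

docs_hs = [{"lawan", "kedzoliman", "negara"}, {"asal", "ngoceh"}]  # invented example documents
print(bernoulli_log_score(["lawan", "negara"], docs_hs, prior=0.5, n_classes=2))
```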
2.5.3 Gaussian Naïve Bayes
When continuous data is present, it is usually assumed that the continuous values associated with each class follow a normal (Gaussian) distribution. Gaussian Naïve Bayes is well suited for making predictions based on normally distributed features. The feature probability is calculated using equation (11) [17].
$P(x_i \mid c) = \frac{1}{\sqrt{2\pi\sigma_c^2}} \exp\left(-\frac{(x_i - \mu_c)^2}{2\sigma_c^2}\right)$  (11)

Description:
$x_i$ : value of attribute i
$\sigma_c$ : standard deviation of the attribute in class c
$\mu_c$ : mean of the attribute in class c
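The density in equation (11) can be written out directly; the short sketch below uses made-up numbers. The comment also notes a practical point, under the assumption that scikit-learn's GaussianNB is used: it cannot consume a sparse TF-IDF matrix and needs densified input.

```python
# Direct implementation of eq. (11) for a single feature value; numbers are illustrative.
import math

def gaussian_likelihood(x, mean, std):
    return (1.0 / math.sqrt(2 * math.pi * std ** 2)) * math.exp(-((x - mean) ** 2) / (2 * std ** 2))

print(gaussian_likelihood(0.3, mean=0.25, std=0.1))
# Note: scikit-learn's GaussianNB does not accept sparse input, so a TF-IDF matrix
# must be densified first, e.g. X_dense = X_tfidf.toarray().
```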
2.6 Term Frequency-Inverse Document Frequency (TF-IDF)
Feature extraction in sentiment analysis can be done using TF-IDF [21]. The calculation is performed for each word and assigns a weight to it. Equation (12) gives the inverse document frequency, and the final TF-IDF weight used during the sentiment analysis process is the TF value multiplied by the IDF value [12].

$idf_j = \log\left(\frac{D}{df_j}\right)$  (12)

Then the TF-IDF weight is obtained with equation (13).

$w_{ij} = tf_{ij} \times idf_j$  (13)

Description:
$tf_{ij}$ : number of occurrences of term j in document i
$w_{ij}$ : weight of term j in document i
$D$ : number of all documents
$df_j$ : number of documents containing term j
$idf_j$ : inverse document frequency of term j

2.7 N-Gram Character Features
The tokenization process applies n-gram token types to the tweet data; in this research, unigrams, bigrams, and trigrams are used. A unigram splits the tweet data into single terms (n = 1), a bigram splits a sentence into word pairs (n = 2), and a trigram into word triples (n = 3). Table 2 illustrates the n-grams applied to one of the tweets [15], and a short vectorizer sketch follows the table.
Table 2. N-Gram Character Output

Data Tweet : Fadli Zon Minta Mendagri Segera Menonaktifkan Ahok Jadi Gubernur DKI
Unigram    : 'Fadli', 'Zon', 'Minta', 'Mendagri', 'Segera', 'Menonaktifkan', 'Ahok', 'Jadi', 'Gubernur', 'DKI'
Bigram     : 'Fadli Zon', 'Minta Mendagri', 'Segera Menonaktifkan', 'Ahok Jadi', 'Gubernur DKI'
Trigram    : 'Fadli Zon Minta', 'Mendagri Segera Menonaktifkan', 'Ahok Jadi Gubernur', 'DKI'
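A minimal sketch of how TF-IDF weighting and n-gram tokenization could be combined with scikit-learn's TfidfVectorizer is shown below; this is an assumption about tooling, not the authors' code. Note two differences from the text: scikit-learn's default idf includes smoothing, so its weights differ slightly from equation (12), and TfidfVectorizer produces overlapping (sliding-window) n-grams, whereas Table 2 shows non-overlapping splits.

```python
# Hedged sketch: TF-IDF over unigram/bigram/trigram tokens via TfidfVectorizer;
# the one-sentence corpus reuses the Table 2 example tweet.
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["fadli zon minta mendagri segera menonaktifkan ahok jadi gubernur dki"]

for name, ngram_range in [("Unigram", (1, 1)), ("Bigram", (2, 2)), ("Trigram", (3, 3))]:
    vectorizer = TfidfVectorizer(ngram_range=ngram_range)
    X = vectorizer.fit_transform(corpus)
    print(name, list(vectorizer.get_feature_names_out())[:3])
```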
2.8 Performance Measurement
Performance measurement and evaluation of the created algorithm are very important. The system's performance is measured with a binary classification metric, chosen from the many existing ones, to obtain an accuracy score. This accuracy determines whether the model used reaches an optimal value. The definitions of true positive (TP), true negative (TN), false positive (FP), and false negative (FN) can be seen in the confusion matrix in Table 3 [21].
Table 3. Confusion Matrix

                         Actually Positive     Actually Negative
Prediction  Positive     True Positive (TP)    False Positive (FP)
            Negative     False Negative (FN)   True Negative (TN)

a. Accuracy measures how correctly the created system performs the classification.

$accuracy = \frac{TP + TN}{TP + TN + FP + FN}$  (14)
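Mapping Table 3 and equation (14) onto code, the small sketch below (with invented labels) shows how the four confusion-matrix counts and the accuracy score can be computed with scikit-learn, treating HS as the positive class.

```python
# Confusion-matrix counts and accuracy (eq. 14) with scikit-learn; labels are illustrative.
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = ["HS", "Non_HS", "HS", "Non_HS", "Non_HS"]
y_pred = ["HS", "Non_HS", "Non_HS", "Non_HS", "HS"]

tp, fn, fp, tn = confusion_matrix(y_true, y_pred, labels=["HS", "Non_HS"]).ravel()
print((tp + tn) / (tp + tn + fp + fn), accuracy_score(y_true, y_pred))  # both give the same value
```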
3. RESULT AND DISCUSSION
This stage covers system testing and the analysis of the test results. The tests and analyses are carried out in line with the objectives described in the introduction. The testing consists of two scenarios.
3.1 Result and Discussion of Performance with Ratio Data 80:20
In the first test, namely model testing, the same data ratio of 80:20 is used for each model. The split is performed randomly, with 80% of the data used for training and 20% for testing. Model testing is carried out on the available data and covers the three models, namely Multinomial, Gaussian, and Bernoulli. The test results can be seen in Table 4, Model Comparison.
Table 4. Model Comparison

Model        Ratio Data   Accuracy
Multinomial  80:20        81%
Gaussian     80:20        71%
Bernoulli    80:20        76%
Table 4 shows that the Multinomial model has an accuracy of 81%, the Gaussian model 71%, and the Bernoulli model 76%. This shows that the Multinomial model is the most accurate of the compared models for this sentiment analysis task. A sketch of this experimental setup follows.
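A hypothetical reconstruction of this first scenario is sketched below: an 80:20 random split, TF-IDF features, and the three scikit-learn Naïve Bayes variants. The file name, column names, and random seed are assumptions, so the printed numbers will not reproduce Table 4 exactly.

```python
# Hedged sketch of scenario 1: 80:20 split, TF-IDF features, three Naive Bayes variants.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB, BernoulliNB, GaussianNB
from sklearn.metrics import accuracy_score

df = pd.read_csv("tweets_labeled.csv")                 # assumed columns: tweet, label
X_train, X_test, y_train, y_test = train_test_split(
    df["tweet"], df["label"], test_size=0.20, random_state=42)

vectorizer = TfidfVectorizer()
Xtr = vectorizer.fit_transform(X_train)
Xte = vectorizer.transform(X_test)

for name, model in [("Multinomial", MultinomialNB()),
                    ("Bernoulli", BernoulliNB()),
                    ("Gaussian", GaussianNB())]:
    if name == "Gaussian":                             # GaussianNB needs dense input
        model.fit(Xtr.toarray(), y_train)
        pred = model.predict(Xte.toarray())
    else:
        model.fit(Xtr, y_train)
        pred = model.predict(Xte)
    print(name, round(accuracy_score(y_test, pred), 4))
```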
3.2 Result and Discussion of Performance Using N-gram
Because this research topic is hate speech, the second scenario uses n-grams: each n-gram can carry a different meaning that can affect whether hate speech is present. This scenario applies n-gram features including Unigram, Bigram, Trigram, Unibigram, and Bitrigram. The results of applying these n-gram character features to each model are shown in the tables below, and a sketch of the corresponding settings follows the tables.
Table 5. Comparison of N-Gram Accuracy of the Multinomial Model

Model        N-Gram      Accuracy
Multinomial  Unigram     80.24%
Multinomial  Bigram      82.13%
Multinomial  Trigram     79.13%
Multinomial  Unibigram   81.46%
Multinomial  Bitrigram   81.35%

Table 6. Comparison of N-Gram Accuracy of the Bernoulli Model

Model        N-Gram      Accuracy
Bernoulli    Unigram     79.91%
Bernoulli    Bigram      77.13%
Bernoulli    Trigram     74.13%
Bernoulli    Unibigram   79.57%
Bernoulli    Bitrigram   75.80%

Table 7. Comparison of N-Gram Accuracy of the Gaussian Model

Model        N-Gram      Accuracy
Gaussian     Unigram     70.92%
Gaussian     Bigram      71.03%
Gaussian     Trigram     64.70%
Gaussian     Unibigram   77.02%
Gaussian     Bitrigram   71.03%
The results of testing the n-gram character feature show that the Multinomial model has the best average value across the n-gram characters, with Unigram 80.24%, Bigram 82.13%, Trigram 79.13%, Unibigram 81.46%, and Bitrigram 81.35%. This test shows that n-gram characters affect the accuracy of the Naïve Bayes models.
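The five n-gram settings used in this scenario can be expressed as ngram_range values for the TF-IDF vectorizer; the mapping below is an assumption about how the combinations were configured, reusing the pipeline sketched in Section 3.1.

```python
# Assumed mapping of the scenario-2 n-gram settings to TfidfVectorizer ngram_range values.
ngram_settings = {
    "Unigram":   (1, 1),
    "Bigram":    (2, 2),
    "Trigram":   (3, 3),
    "Unibigram": (1, 2),   # unigrams and bigrams together
    "Bitrigram": (2, 3),   # bigrams and trigrams together
}
# Each setting would replace TfidfVectorizer() in the scenario-1 sketch,
# e.g. TfidfVectorizer(ngram_range=ngram_settings["Bigram"]), before refitting the models.
```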
3.3 Analysis of Test Results
In this study, two supporting scenarios were run to obtain reliable performance results for the Naïve Bayes models. In the first scenario (Table 4), the researchers used an 80:20 data ratio for each Naïve Bayes model and obtained good accuracy values: 81% for the Multinomial, 71% for the Gaussian, and 76% for the Bernoulli model. These values show that the Multinomial model has the highest accuracy and is the best model for hate speech detection. One factor behind its higher accuracy is that this model takes into account how frequently particular terms appear, so the frequency of occurrence of the same word benefits this model and gives it higher accuracy than the other two.
Figure 4. Accuracy Results of the Models
The results of the first scenario show that the Multinomial Naïve Bayes model, with 81% accuracy, is more optimal than Gaussian Naïve Bayes, which is 10 percentage points lower, while Bernoulli reaches only 76%.
In the second scenario, the researchers applied the n-gram character feature to increase the accuracy of each model, then applied combinations of n-grams and reported the accuracy for each model. From this test, the Multinomial model obtained the highest accuracy scores, with Unigram 80.24%, Bigram 82.13%, Trigram 79.13%, Unibigram 81.46%, and Bitrigram 81.35%. Note that the tweet data used in this research is Indonesian-language data collected with the keywords mentioned during data collection. This means the highest accuracy for hate speech detection is obtained with the Multinomial model combined with the Bigram feature.
Figure 5. Accuracy Comparison with N-Gram Character Features
As seen in Figure 5, the highest performance in hate speech detection is obtained with Multinomial Naïve Bayes modeling applied to the Bigram feature, at 82.13%. The next best results are also Multinomial Naïve Bayes, with the Unibigram and Bitrigram features at 81.46% and 81.35%, respectively. The best accuracy in this scenario not only surpasses the first scenario but also exceeds the accuracy reported in other Multinomial Naïve Bayes research, which only reached about 63%.
4. CONCLUSION
From the two research scenarios carried out, it can be concluded that the Naïve Bayes models with word vector representation using TF-IDF and n-gram features achieve satisfactory accuracy in identifying hate speech on Twitter. The best n-gram result is obtained by the Multinomial model with the Bigram character feature, with an optimal accuracy of 82.13%. Testing the n-gram feature in this study gave positive results, with a tendency for the accuracy to increase; the n-gram character is therefore a feature that can increase the accuracy of sentiment analysis. The accuracy obtained with n-grams is also influenced by the meaning formed by the n-gram combination, so each combination produces a different accuracy. However, the tweet data used in this research is Indonesian-language data collected with the keywords mentioned during data collection. The outcome demonstrates that, for this hate speech detection case, the Multinomial model with the Bigram feature yields the maximum accuracy. Future research can explore this Multinomial Naïve Bayes model with different data ratios, feature extraction methods, or languages, so that researchers can understand whether these affect the accuracy of the analysis. Furthermore, the author recommends using larger datasets so that future researchers can study what else affects the accuracy of the Naïve Bayes model.
REFERENCES
[1] WeAreSocial, “SOCIAL MEDIA USERS PASS THE 4.5 BILLION MARK,” 2021.
https://wearesocial.com/us/blog/2021/10/social-media-users-pass-the-4-5-billion-mark/ (accessed Nov. 15, 2022).
[2] C. O. (Universitas Muhammadiyah Yogyakarta), “Analisis Yuridis Tindak Pidana Ujaran Kebencian Dalam Media Sosial,” Al-Adl: Jurnal Hukum, vol. 13, no. 1, p. 168, 2021, doi: 10.31602/al-adl.v13i1.3938.
[3] A. Rafi R, M. Nasrun, and R. Astuti N, “Deteksi Ujaran Ancaman Berbasis Website Pada Postingan Media Sosial Twitter Menggunakan Metode Naive Bayes,” e-Proceeding of Engineering, vol. 8, no. 1, p. 500, 2021.
[4] A. Perwira, J. Dwitama, and K. Kunci, “Deteksi Ujaran Kebencian Pada Twitter Bahasa Indonesia Menggunakan Machine Learning : Reviu Literatur,” Jurnal SNATi, vol. 1, no. 1, pp. 31–39, 2021.
[5] G. Singh, B. Kumar, L. Gaur, and A. Tyagi, “Comparison between Multinomial and Bernoulli Naïve Bayes for Text Classification,” 2019 International Conference on Automation, Computational and Technology Management, ICACTM 2019, pp. 593–596, 2019, doi: 10.1109/ICACTM.2019.8776800.
[6] P. S. Mishra and S. Tanuben, “Sentiment Analysis of Twitter Text Using Machine Learning Techniques Like Logistic Regression, Naïve Bayes, and Multinomial Naïve Bayes,” International Research Journal of Modernization in Engineering Technology and Science, no. 07, pp. 2582–5208, 2022.
[7] M. B. Hamzah, “Classification of Movie Review Sentiment Analysis Using Chi-Square and Multinomial Naïve Bayes with Adaptive Boosting,” Journal of Advances in Information Systems and Technology, vol. 3, no. 1, pp. 67–74, 2021, doi: 10.15294/jaist.v3i1.49098.
[8] C. Fiarni, H. Maharani, and G. R. Wisastra, “Opinion Mining Model System for Indonesian Non Profit Organization Using Multinomial Naive Bayes Algorithm,” 2020 8th International Conference on Information and Communication Technology, ICoICT 2020, 2020, doi: 10.1109/ICoICT49345.2020.9166391.
[9] N. Umar and M. Adnan Nur, “Application of Naïve Bayes Algorithm Variations On Indonesian General Analysis Dataset for Sentiment Analysis,” Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi), vol. 6, no. 4, pp. 585–590, 2022, doi: 10.29207/resti.v6i4.4179.
[10] D. N. Fitriana and Y. Sibaroni, “ARJUNA) Managed by Ministry of Research, Technology, and Higher Education,” Accredited by National Journal Accreditation, vol. 4, no. 2, pp. 846–853, 2020, [Online]. Available: http://jurnal.iaii.or.id
[11] J. Evanovich, Hardcore twenty-four : a Stephanie Plum novel.
[12] J. Patihullah and E. Winarko, “Hate Speech Detection for Indonesia Tweets Using Word Embedding And Gated Recurrent Unit,” IJCCS (Indonesian Journal of Computing and Cybernetics Systems), vol. 13, no. 1, p. 43, 2019, doi: 10.22146/ijccs.40125.
[13] S. Symeonidis, D. Effrosynidis, and A. Arampatzis, “A comparative evaluation of pre-processing techniques and their interactions for twitter sentiment analysis,” Expert Systems with Applications, vol. 110, pp. 298–310, 2018, doi: 10.1016/j.eswa.2018.06.022.
[14] riochr17, “Analisis-Sentimen-ID,” github, 2018. https://github.com/riochr17/Analisis-Sentimen- ID/blob/516d11ba66002cf6580ae4598e980ca71501df0a/kamus/kbba.txt#L1-L20
[15] E. B. Setiawan, D. H. Widyantoro, and K. Surendro, “Feature expansion using word embedding for tweet topic classification,” Proceeding of 2016 10th International Conference on Telecommunication Systems Services and Applications, TSSA 2016: Special Issue in Radar Technology, no. 2011, 2017, doi: 10.1109/TSSA.2016.7871085.
[16] C. Albon, Machine learning with Python cookbook : practical solutions from preprocessing to deep learning. 2018.
[17] N. Rezaeian and G. Novikova, “Persian text classification using naive bayes algorithms and support vector machine algorithm,” Indonesian Journal of Electrical Engineering and Informatics, vol. 8, no. 1, pp. 178–188, 2020, doi: 10.11591/ijeei.v8i1.1696.
[18] W. A. Prabowo and C. Wiguna, “Sistem Informasi UMKM Bengkel Berbasis Web Menggunakan Metode SCRUM,” Jurnal Media Informatika Budidarma, vol. 5, no. 1, p. 149, 2021, doi: 10.30865/mib.v5i1.2604.
[19] A. M. Kibriya, E. Frank, B. Pfahringer, and G. Holmes, “Multinomial naive bayes for text categorization revisited,” Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science), vol. 3339, pp. 488–499, 2004, doi: 10.1007/978-3-540-30549-1_43.
[20] S. A. Karunia, R. Saptono, and R. Anggrainingsih, “Online News Classification Using Naive Bayes Classifier with Mutual Information for Feature Selection,” Jurnal Ilmiah Teknologi dan Informasi, vol. 6, no. 1, pp. 10–15, 2017.
[21] K. D. Kategori, “Kata Kunci : Naive Bayes, Bernoulli, Klasifikasi Dokumen Kategori”.