Academic year: 2023
AI Explanation related Covid Hoax Detection Using Support Vector Machine and Logistic Regression Methods

Naufal Haritsah Luthfi*, Agus Hartoyo

School of Computing, Informatics Study Program, Telkom University, Bandung, Indonesia
Email: 1,*[email protected], 2[email protected]
Corresponding Author Email: [email protected]

Abstract−Hoax news about Covid is still circulating in society, especially on social media. This disinformation can cause divisions between communities. Current technology can classify news as hoax or non-hoax, but no system shows the reasons a model uses to make that classification. Therefore, in this study, a system was developed that reveals the words on which a hoax detection system bases its decisions, using the Support Vector Machine and Logistic Regression methods. The Explainable AI method used is Local Interpretable Model-agnostic Explanations (LIME). The test results show that the SVM and Logistic Regression methods reach highest accuracies of 91% and 95%, respectively. The words collected in the dataset are sufficient to differentiate between hoax and non-hoax news. It was found that hoax news about Covid-19 contains many words related to Covid-19, religion, politics, and medicine, as well as words unrelated to Covid-19, among them "lockdown", "masjid", "rezim", "ventilator", and "kiamat". Meanwhile, non-hoax news about Covid-19 contains many words related to Covid-19, government, and medicine, among them "protokol", "isolasi", "infeksi", "menteri", and "nakes".

Keywords: Detection; Explainable-AI; Hoax; Logistic Regression; Support Vector Machine; Tf-Idf; Word.

1. INTRODUCTION

News is information delivered through newspapers, radio, and online media [1]. A hoax is information or news whose truth has not been confirmed and that does not correspond to reality [2]. Spreading hoax news is hazardous because it can cause divisions between communities, especially during the Covid-19 pandemic [3]. Hoax news in Indonesia, especially regarding the Covid-19 pandemic, is still a hot topic of conversation.

In the past, people found it difficult to obtain news, whereas in today's digital era it is effortless to get news or information. With increasingly sophisticated digital technology, ways have also been found to detect hoax news using algorithms that can determine whether a piece of news is a hoax or not [4]. Until now, however, no system can show the reasons a model uses to classify news as hoax or non-hoax. This matters because not everyone can understand what a hoax detection system is doing, and few detection systems include Explainable AI to explain their detections. Each hoax news item contains different words, so each piece has its own point of view. By knowing the words in the news, it should be possible to tell whether the news leads toward factual news or hoaxes.

Much research has been done on hoax detection in news, such as research by Ropikoh (2021), who examined the application of the Support Vector Machine algorithm for the classification of hoax news [5]. That research produced an accuracy of 90.46%, a precision of 66.86%, and a recall of 64.53%. Furthermore, Ismayanti (2021) examined the detection of Indonesian-language hoax content on Twitter using a Word2Vec feature-expansion approach [6]. That research resulted in an accuracy of 82.79% and an F1-Score of 0.8278.

Frista (2018) researched hoax-content detection using the Levenshtein distance method [4]. The test obtained a threshold of 0.0014 in scenario 2 because more data was used than in scenario 1; since the Tf-Idf calculation only counts word occurrences in documents, the resulting performance was poor. Alvanof (2020) researched hoax-content detection using four algorithms: multilayer perceptron (MLP), Naïve Bayes, SVM, and Random Forest [1]. The highest accuracy, 75.37%, was obtained by the Random Forest algorithm, which also had the highest precision and recall.

Palma (2021) researched text classification of Covid-19 hoax news articles using the K-Nearest Neighbor algorithm [3]. The test results reached an accuracy of 48% using k = 5 with 80% training data and 20% test data. The results were not very good because several classes had too little data, which affected the model's ability to classify them. Putra (2021) researched the detection of abusive sentences in Indonesian texts using the IndoBERT method [7], obtaining an F1-Score of 76.32%. The IndoBERT method performs well because it utilizes transformers, which learn contextual relationships between words in the text.

Pradana (2019) researched hoax detection on Android-based social media [8]. His research obtained good results with a minimalist, uncluttered interface; however, the news analyzed could only be in Indonesian. Aldwairi (2018) researched detecting fake news on social media networks [9], obtaining a high classifier score when using a logistic regression classifier, with a precision of 99.4%.

Then there is research on Explainable AI using the LIME framework by Ribeiro (2016), which examined LIME explanations [10]. The experiments were carried out on a religion dataset, with several sentences to be predicted as religion-related; the LIME framework's accuracy reached 89%. Saini (2021) researched optimizing the LIME Explainable AI method into BO-LIME (Bayesian Optimization-Local Interpretability Model-agnostic Explainability) [11]. The BO-LIME method is based on Bayesian optimization and provides better explanation stability than the LIME model.

Based on this background, the detection of hoaxes about Covid-19 is still interesting for further research. The Explainable AI method using LIME is worth studying because the framework can look inside the machine-learning black box and make it transparent, and research using Explainable AI is still rare. The SVM and Logistic Regression methods are used as models for detecting hoax and non-hoax news; the two methods are compared to find the best classification-report values so that Explainable AI can open the black box correctly. The limitations of this research are that the datasets are taken from the websites turnbackhoax.id, hoaxbuster, and Kompas; the texts are news in Indonesian; and the dataset contains 500 items. This final project examines the performance of the Support Vector Machine and Logistic Regression models and the words that occur in hoax and non-hoax news related to Covid.

2. RESEARCH METHODOLOGY

2.1 System Design

What Explainable AI does here is look at the words in hoax and non-hoax news in the dataset. An overview of the system design can be seen in Figure 1. The dataset consists of news articles retrieved from turnbackhoax.id and hoaxbuster as the collection of hoax news about Covid-19, while the Kompas website provides the non-hoax news. Before building the model, steps such as data preprocessing and data splitting are carried out. Next, the model is tested using the Support Vector Machine and Logistic Regression algorithms as the machine-learning algorithms, while the Explainable AI framework uses Local Interpretable Model-agnostic Explanations (LIME).

Figure 1. System Design Flowchart

2.2 Dataset

The dataset used is shown in Table 1. It was taken from https://turnbackhoax.id/ and https://covid19.go.id/p/hoax-buster for the hoax news about Covid-19, while https://www.kompas.com/ provides the non-hoax or factual news about Covid-19.

Table 1. Dataset

No | News Title | Text | Label
1 | [SALAH] Bandung Kota Zona Hitam Corona | "Bandung Kota, Zona Hitam,, Corona 19 di Bdg meningkat drastis. Hati2 ya kawan semua." | 1
2 | Kasus Covid-19 Naik, Pejabat dan Pegawai Pemerintah Dilarang ke Luar Negeri | Kementerian Sekretariat Negara (Kemensetneg) menerbitkan surat Nomor B-56/KSN/S/LN.00/07/2022 mengenai kebijakan pelaksanaan perjalanan dinas luar negeri dalam upaya pencegahan penularan Covid-19. Surat yang terbit pada Jumat (22/7/2022) dan bersifat sangat segera itu diteken Sekretaris Kemensetneg Setya Utama. Setya pun sudah mengonfirmasi terbitnya surat tersebut. Dalam surat itu disebutkan, perjalanan dinas luar negeri ditangguhkan sebagai dampak dari kembali meningkatnya kasus Covid-19 di Tanah Air. | 0

2.3 Data Preprocessing

Data preprocessing is done after the dataset has been collected through crawling. The results of the preprocessing are shown in Table 2. The purpose of this preprocessing is to remove words that are not needed when building the hoax detection system.

a. Case Folding

At this stage, all letters that were previously capitals are converted to lowercase.

b. Filtering

After case folding, the next stage is filtering, which keeps only the important words. Unnecessary words are removed using a stopword-removal technique [5].

c. Stemming

Stemming removes affixes from words and turns them into base words. This stemming process uses the Sastrawi library.

d. Tokenizing

At this stage, a sentence is broken into individual words so that nouns, adjectives, verbs, conjunctions, and punctuation can be identified, and unnecessary tokens can be eliminated.

Table 2. Preprocessing Result

Step | Text
Initial | "CORONA itu adanya di CHINA bukan di sini, di sini Cuma di ada adakan lagian CORONA hanya penyakit biasa, bukan wabah di zaman NABI, jadi, shouf di masjid-masjid wajib di rapatkan kembali, agar tdk mengundang murka ALLAH SWT"
Case Folding & Cleansing | corona itu adanya di china bukan di sini di sini cuma di ada adakan lagian corona hanya penyakit biasa bukan wabah di zaman nabi jadi shouf di masjid masjid wajib di rapatkan kembali agar tdk mengundang murka allah swt
Filtering | corona adanya china bukan sini sini cuma ada adakan lagian corona penyakit biasa bukan wabah zaman nabi jadi shouf masjid masjid wajib rapatkan tdk mengundang murka allah swt
Stemming | corona ada china bukan sini sini cuma ada adakan lagi corona sakit biasa bukan wabah zaman nabi jadi shouf masjid masjid wajib rapat tdk undang murka allah swt
Tokenizing | ['corona', 'ada', 'china', 'bukan', 'sini', 'sini', 'cuma', 'ada', 'adakan', 'lagi', 'corona', 'sakit', 'biasa', 'bukan', 'wabah', 'zaman', 'nabi', 'jadi', 'shouf', 'masjid', 'masjid', 'wajib', 'rapat', 'tdk', 'undang', 'murka', 'allah', 'swt']
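The steps above can be sketched as a small pipeline. This is an illustrative sketch, not the authors' code: the stopword list here is a tiny hypothetical sample, and the Sastrawi stemming step is indicated only as a comment.

```python
import re

# Tiny illustrative stopword list; a real run would use a full Indonesian
# stopword list, and stemming would use the Sastrawi library (see comment below).
STOPWORDS = {"itu", "di", "hanya", "jadi", "agar"}

def case_fold_and_clean(text):
    # Case folding + cleansing: lowercase, then keep only letters and spaces.
    return re.sub(r"[^a-z\s]", " ", text.lower())

def preprocess(text):
    tokens = case_fold_and_clean(text).split()          # tokenizing
    tokens = [t for t in tokens if t not in STOPWORDS]  # filtering (stopwords)
    # Stemming would go here, e.g. with Sastrawi:
    #   StemmerFactory().create_stemmer().stem(" ".join(tokens))
    return tokens

print(preprocess("CORONA itu adanya di CHINA bukan di sini"))
# -> ['corona', 'adanya', 'china', 'bukan', 'sini']
```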

2.4 Feature Extraction

Next, feature extraction needs to be done because the machine-learning model cannot accept string input. The feature extraction used in this model is TF-IDF, which weights each word and produces a vector of word weights [12]. In equation (1), a_ij is the weight of word i in document j, N is the number of documents, tf_ij is the frequency of word i in document j, and df_i is the number of documents in the collection that contain word i.

a_ij = tf_ij × idf_i = tf_ij × log2(N / df_i)  (1)
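Equation (1) can be checked with a short standard-library sketch. This is illustrative only (the study's actual implementation may use a library vectorizer); the two toy documents are invented for the example.

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """Weight a_ij = tf_ij * log2(N / df_i), per equation (1)."""
    N = len(docs)
    df = Counter()            # df_i: number of documents containing word i
    for doc in docs:
        df.update(set(doc))
    out = []
    for doc in docs:
        tf = Counter(doc)     # tf_ij: frequency of word i in document j
        out.append({w: tf[w] * math.log2(N / df[w]) for w in tf})
    return out

docs = [["corona", "vaksin", "vaksin"], ["corona", "isolasi"]]
w = tfidf_weights(docs)
# "corona" occurs in every document, so its weight is tf * log2(2/2) = 0;
# "vaksin" occurs twice in one of two documents: 2 * log2(2) = 2.0.
```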

2.5 Support Vector Machine

Support Vector Machine (SVM) is one of the best machine-learning algorithms for classification problems and outlier detection [13]. SVM is used to find the best hyperplane, which serves as a separator between the classes and the outermost data points close to it; those outermost points are called support vectors. The formulas used in the Support Vector Machine algorithm are given in the following equations. Equations (2) and (3) define the positive and negative classes, equation (4) computes the weight vector, and equation (5) computes the bias.

X_i · w + b ≥ +1, for Y_i = +1  (2)

X_i · w + b ≤ −1, for Y_i = −1  (3)

w = Σ_{i=1..n} a_i Y_i X_i  (4)

b = −(1/2)(w · x⁺ + w · x⁻)  (5)
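As an illustration of equations (4) and (5), the sketch below computes w and b from a hypothetical set of support vectors. The triples (a_i, y_i, x_i) are made up for the example; a real SVM learns them from the training data.

```python
def hyperplane_from_support_vectors(svs):
    """svs: list of (alpha_i, y_i, x_i) triples for the support vectors."""
    dim = len(svs[0][2])
    # Equation (4): w = sum_i a_i * y_i * x_i
    w = [sum(a * y * x[d] for a, y, x in svs) for d in range(dim)]
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    # Equation (5): b = -1/2 (w . x+ + w . x-), one support vector per class
    x_pos = next(x for a, y, x in svs if y == +1)
    x_neg = next(x for a, y, x in svs if y == -1)
    b = -0.5 * (dot(w, x_pos) + dot(w, x_neg))
    return w, b

def classify(w, b, x):
    # Decision by the sign of x . w + b, cf. equations (2)-(3).
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if s >= 0 else -1

# Hypothetical support vectors: one per class, both with multiplier 1.0.
svs = [(1.0, +1, (2.0, 0.0)), (1.0, -1, (0.0, 0.0))]
w, b = hyperplane_from_support_vectors(svs)  # w = [2.0, 0.0], b = -2.0
```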

2.6 Logistic Regression

Logistic Regression is a machine-learning algorithm that estimates the probability that a sample belongs to a class. It is interpreted using a logistic model in a binary setting, where the function outputs the negative class 0 or the positive class 1 [14]. Logistic regression works by calculating the probability that a sample should be assigned a negative (0) or positive (1) value from the relationship between the input variables or features and the dependent or target variable [15]. The formulas used in the Logistic Regression algorithm are given in the following equations. Equation (6) is the simple regression equation, equation (7) is the sigmoid function, and equation (8) is the logistic regression equation.

Y = a_0 + a_1 X  (6)

p = 1 / (1 + e^(−Y))  (7)

ln(p / (1 − p)) = a_0 + a_1 X  (8)
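Equations (6) and (7) translate directly into code. This sketch uses made-up coefficients a_0 and a_1 purely for illustration.

```python
import math

def sigmoid(y):
    # Equation (7): map the linear score Y to a probability p in (0, 1).
    return 1.0 / (1.0 + math.exp(-y))

def logreg_predict(x, a0, a1):
    # Equation (6) gives the linear score Y; the sigmoid gives p; the label
    # is 1 (positive) when p >= 0.5, else 0 (negative).
    p = sigmoid(a0 + a1 * x)
    return p, int(p >= 0.5)

# Hypothetical coefficients: score Y = -1.0 + 3.0 * 1.0 = 2.0.
p, label = logreg_predict(1.0, a0=-1.0, a1=3.0)
```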

2.7 Classification Report

After the testing process, the model must be evaluated to get maximal results from the research. The tests yield values for accuracy, precision, recall, and F1-Score [5]. The following classification-report metrics are used in this study.

a. Accuracy

Accuracy is the ratio of correctly predicted positive and negative values to the whole data [16]. The formula for accuracy is given in equation (9).

Accuracy = (Number of correctly predicted data) / (Total number of predictions)  (9)

b. Precision

Precision is the ratio of true positive predictions to all data predicted positive [17]. The formula for precision is given in equation (10).

Precision = TP / (TP + FP)  (10)

c. Recall

Recall is the ratio of true positive predictions to all data that is actually positive [17]. The formula for recall is given in equation (11).

Recall = TP / (TP + FN)  (11)

d. F1 Score

The F1 Score is the harmonic mean of precision and recall. The formula for the F1 Score is given in equation (12).

F1 Score = (2 × Recall × Precision) / (Recall + Precision)  (12)
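Equations (9)-(12) can be computed from the four cells of a binary confusion matrix. The counts below are hypothetical, not the paper's results.

```python
def report(tp, fp, fn, tn):
    # Equations (9)-(12) applied to a binary confusion matrix.
    accuracy = (tp + tn) / (tp + fp + fn + tn)          # eq. (9)
    precision = tp / (tp + fp)                          # eq. (10)
    recall = tp / (tp + fn)                             # eq. (11)
    f1 = 2 * recall * precision / (recall + precision)  # eq. (12)
    return accuracy, precision, recall, f1

# Hypothetical confusion-matrix counts for illustration.
acc, prec, rec, f1 = report(tp=40, fp=10, fn=5, tn=45)
```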

A high accuracy indicates that the model works very well in classifying hoax and non-hoax news related to Covid. After determining the best scenario based on this evaluation, the next step is to explain the system model using the Local Interpretable Model-agnostic Explanations (LIME) framework.

2.8 Local Interpretable Model-agnostic Explanations

Local Interpretable Model-agnostic Explanations (LIME) is a framework used to obtain explanations in model-agnostic scenarios [11]. LIME consists of two modules: a local sampler and a sparse linear model [10]. The main goal of LIME is to understand why a machine-learning model makes its predictions [18]. LIME tests these predictions by perturbing the data fed into the machine-learning model; it then generates a new dataset consisting of the perturbed samples and the corresponding predictions from the black-box model [19]. The formula for the LIME model is described in equation (13), where G is the class of potentially interpretable models, such as linear models and decision trees [20].

g : R^d → R, g ∈ G  (13)
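The idea can be sketched with a much-simplified word-perturbation loop. The real LIME framework samples many perturbations and fits a weighted sparse linear surrogate g; the toy black-box model below is invented for the example.

```python
def word_importance(text, predict_proba):
    # Drop each word in turn and record how much the predicted hoax
    # probability falls -- a crude stand-in for LIME's perturbation step.
    words = text.split()
    base = predict_proba(text)
    return {
        w: base - predict_proba(" ".join(words[:i] + words[i + 1:]))
        for i, w in enumerate(words)
    }

def toy_model(text):
    # Invented black box: hoax probability rises with suspicious keywords.
    hits = sum(w in {"lockdown", "kiamat"} for w in text.split())
    return min(1.0, 0.1 + 0.4 * hits)

scores = word_importance("lockdown membuat kiamat", toy_model)
# The two keyword drops lower the probability; dropping "membuat" does not.
```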

3. RESULT AND DISCUSSION

Four scenarios are carried out, with train:test ratios of (90:10), (80:20), (70:30), and (60:40). The dataset consists of 500 items: 251 hoax and 249 non-hoax. The purpose of this final project is to see which words occur in hoax and non-hoax news and then group those words so that the word relationships in the news become visible. The best classification-report results across the four scenarios are sought for the two methods used, SVM and LogReg; the method with the best classification report is used with the Explainable AI parameters. Explainable AI then opens the black box of that method, highlighting the words that cause a news item to be categorized as hoax or non-hoax. After collecting these words, they are grouped by their relationships, which shows that hoax and non-hoax news have different word relations.

3.1 Dataset

The distribution of the dataset and the most frequently occurring words can be seen in Figure 2 and Figure 3. The crawled dataset contains 251 hoax and 249 non-hoax news items about Covid-19 in Indonesia, which are split into training and test data. The dataset was manually labeled, with 1 as hoax and 0 as non-hoax. The most frequently occurring words in the dataset include "covid", "indonesia", "orang", "vaksin", "kata", and "perintah".

Figure 2. Dataset

Figure 3. Dataset Wordcloud

The composition of the data sources can be seen in Table 3. The news sources used were Turnbackhoax.id with 33 items, covid19.go.id/hoax-buster with 217, and Kompas.com with 249.

Table 3. Sources

Source Total

Turnbackhoax.id 33

covid19.go.id/ hoax-buster 217

Kompas.com 249
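The frequent-word counts behind the word cloud can be reproduced with a simple frequency count over the preprocessed documents. The three tiny documents here are invented for the example.

```python
from collections import Counter

def top_words(docs, n):
    # Flatten the token lists and count occurrences, as a word cloud does.
    counts = Counter(w for doc in docs for w in doc)
    return [w for w, _ in counts.most_common(n)]

docs = [["covid", "vaksin"], ["covid", "orang"], ["covid", "vaksin", "kata"]]
print(top_words(docs, 2))  # -> ['covid', 'vaksin']
```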

3.2 Analysis of Experimental Results

Four tests were carried out in this study using the Support Vector Machine (SVM) and Logistic Regression methods. The first test uses the data-ratio scenario (60:40), the second (70:30), the third (80:20), and the fourth (90:10).

3.2.1 Testing

The test results can be seen in Table 4. Testing is carried out according to the scenario of the data ratio determined by the SVM and LogReg methods.
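The four train:test scenarios can be produced with a simple shuffled split. This is a sketch with a stand-in list of 500 items; the actual study may use a library splitter.

```python
import random

def train_test_split(items, train_ratio, seed=0):
    # Shuffle deterministically, then cut at the ratio (0.8 -> 80:20).
    rng = random.Random(seed)
    idx = list(range(len(items)))
    rng.shuffle(idx)
    cut = round(len(items) * train_ratio)
    return [items[i] for i in idx[:cut]], [items[i] for i in idx[cut:]]

data = list(range(500))  # stands in for the 500 labeled news items
splits = {r: train_test_split(data, r) for r in (0.6, 0.7, 0.8, 0.9)}
```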

Table 4. Test Results

Data Ratio | SVM Accuracy | SVM F1-Score (0) | SVM F1-Score (1) | LogReg Accuracy | LogReg F1-Score (0) | LogReg F1-Score (1)
60:40 | 84.26% | 83% | 86% | 95% | 96% | 95%
70:30 | 90% | 89% | 91% | 95% | 95% | 95%
80:20 | 91% | 91% | 91% | 93% | 93% | 93%
90:10 | 86% | 85% | 87% | 92% | 92% | 92%

The accuracy and F1-Score values alone are not enough to determine the best model. The prediction results for each scenario and method can be seen in Table 5; they are used to confirm the best model for each scenario.

Table 5. Prediction Results

Data Ratio | Prediction | SVM Hoax (1) | SVM Non-Hoax (0) | LogReg Hoax (1) | LogReg Non-Hoax (0)
60:40 | Predicted Non-Hoax (0) | 1 | 74 | 4 | 99
60:40 | Predicted Hoax (1) | 92 | 30 | 89 | 5
70:30 | Predicted Non-Hoax (0) | 1 | 59 | 4 | 69
70:30 | Predicted Hoax (1) | 74 | 14 | 71 | 4
80:20 | Predicted Non-Hoax (0) | 1 | 43 | 2 | 46
80:20 | Predicted Hoax (1) | 47 | 8 | 46 | 5
90:10 | Predicted Non-Hoax (0) | 1 | 20 | 1 | 23
90:10 | Predicted Hoax (1) | 23 | 6 | 23 | 3

3.3 Analysis of Test Results

Based on the tests, the (80:20) data-ratio scenario gives the highest score for the SVM model, with an accuracy of 91% and an F1-Score of 91%. For the LogReg model, the (60:40) scenario gives the highest values, with an accuracy of 95% and an F1-Score of 95%.

In the SVM test with the (80:20) ratio, the model classifies both hoax and non-hoax Covid-19 news well: 47 hoax items are predicted as hoaxes and 43 non-hoax items are predicted as non-hoaxes. In the LogReg test with the (60:40) ratio, the model also classifies well: 89 hoax items are predicted as hoaxes and 99 non-hoax items are predicted as non-hoaxes.

The best method, LogReg, is then used as the detection model for the Explainable AI parameters, namely LIME. The LIME method can show why the detection model classifies each news item as hoax or non-hoax.

3.3.1 Explainable AI Results Analysis (LIME)

The Explainable AI model from the LIME framework can properly open the black-box contents of the LogReg method when the prediction is correct. The prediction confidences on hoax and non-hoax news range from 60% to 95%. The following are some of the hoax and non-hoax news results produced by the Explainable AI process.

Figure 4. Explainable AI Results 1

Figure 5. Explainable AI Results 2

(7)

As seen in Figure 4 and Figure 5, the LIME Explainable AI method has succeeded in opening the black box of the Logistic Regression method, which classifies each news item as hoax or non-hoax. There is a problem, however: the highlighted words are still hard for lay readers to interpret without a deeper explanation of the detection.

The words found in hoax news were collected and then grouped according to their relationships; the word-classification results are shown in Table 6. Hoax news has four types of word relations, namely words related to Covid-19, religion, politics, and medicine, plus a group of words unrelated to Covid-19. For non-hoax news, there are three types of word relations: words related to Covid-19, government, and medicine.

Table 6. Word Classification Results

Hoax News
- Words related to Covid-19: lockdown, nafas (breath), panas (fever), corona, vaksin (vaccine), virus, flu, prokes (health protocol), omicron
- Words related to religion: umat (people), masjid (mosque), sholat (pray)
- Words related to politics: rakyat (people), dukun (witch doctor), bansos (social assistance), luhut, rezim (regime), china, biadab (barbaric), bpjs (bpjs insurance)
- Words related to medicine: kimia (chemical), mrna, hati (heart), paru paru (lungs), darah (blood), oksigen (oxygen), rs (hospital), suntik (inject), dna, mayat (dead body), genetik (genetics), peti (coffin), sanitizer, adenovirus, hepatitis, ventilator, asma (asthma), viagra, influenza
- Words unrelated to Covid-19: pliiis (please), bunuh (kill), gila (crazy), parah (critical), hilang (lost), tipu (deceive), media, bahaya (danger), capek (tired), majikan (employer), bluetooth, sulit (difficult), modifikasi (modification), mainstream, takut (afraid), panik (panic), palsu (counterfeit), kiamat (doomsday), kopit (covid)

Non-Hoax News
- Words related to Covid-19: pandemi (pandemic), gejala (symptom), subvarian (subvariant), infeksi (infection), isolasi (isolation), covid, protokol (protocol), booster
- Words related to government: kota (city), indonesia, resmi (official), menteri (minister), dinas (agency), sekolah (school), masyarakat (public), kasus (case), bpom, kabupaten (district), presiden (president)
- Words related to medicine: dosis (dose), klinis (clinical), nakes (health workers), rawat (treat), tular (contagious)

4. CONCLUSION

This research developed hoax detection using the SVM and Logistic Regression methods together with the Explainable AI method LIME. The (80:20) data ratio for the SVM method and the (60:40) ratio for the LogReg method give the highest accuracies, 91% and 95%. Based on the tests, the LogReg method is chosen as the detection model for the Explainable AI parameters used in LIME. The LIME method successfully opens the black box by highlighting the words in each news item, together with its prediction, so that the reasons why the news is classified as hoax or non-hoax become visible. This study found that hoax news about Covid-19 contains many words related to Covid-19, religion, politics, and medicine, as well as words unrelated to Covid-19, among them "lockdown", "masjid", "rezim", "ventilator", and "kiamat". Meanwhile, non-hoax news about Covid-19 contains many words related to Covid-19, government, and medicine, among them "protokol", "isolasi", "infeksi", "menteri", and "nakes". However, the framework still cannot produce explanations accessible to lay people: it can only give the reasons the detector chose hoax or non-hoax, not a linguistic explanation. Future researchers can develop existing Explainable AI methods to explain the language in text classification.

REFERENCES

[1] M. M. Alvanof and R. Triandi, "Analisa Dan Deteksi Konten Hoax Pada Media Berita," J. Teknol. Terap. Sains 4.0 Univ. Malikussaleh, vol. 1, p. 2, 2020.
[2] C. Juditha, "Interaksi Komunikasi Hoax di Media Sosial Serta Antisipasinya," J. Pekommas, vol. 3, no. 1, pp. 31–34, 2018.
[3] B. K. Palma, D. T. Murdiansyah, and W. Astuti, "Klasifikasi Teks Artikel Berita Hoaks Covid-19 dengan Menggunakan Algoritma K-Nearest Neighbor," eProceedings, vol. 8, no. 5, pp. 10637–10649, 2021.
[4] G. W. Frista, "Deteksi Konten Hoax Berbahasa Indonesia Pada Media Sosial Menggunakan Metode Levenshtein Distance," Perpust. Univ. Islam Negeri Sunan Ampel, pp. 1–78, 2018.
[5] I. A. Ropikoh, R. Abdulhakim, U. Enri, and N. Sulistiyowati, "Penerapan Algoritma Support Vector Machine (SVM) Untuk Klasifikasi Web Phising," J. Chem. Inf. Model., vol. 5, no. 1, pp. 64–73, 2021.
[6] F. Ismayanti and E. B. Setiawan, "Deteksi Konten Hoax Berbahasa Indonesia di Twitter Menggunakan Fitur Ekspansi dengan Word2Vec," vol. 8, no. 5, pp. 10288–10300, 2021.
[7] H. K. Putra, M. Arif Bijaksana, and A. Romadhony, "Deteksi Penggunaan Kalimat Abusive Pada Teks Bahasa Indonesia Menggunakan Metode IndoBERT," e-Proceeding Eng., vol. 8, no. 2, pp. 3028–3038, 2021.
[8] H. A. Pradana, A. Bramantoro, A. A. Alkodri, O. Rizan, T. Sugihartono, and Supardi, "An android-based hoax detection for social media," Int. Conf. Electr. Eng. Comput. Sci. Informatics, pp. 189–194, 2019, doi: 10.23919/EECSI48112.2019.8976998.
[9] M. Aldwairi and A. Alwahedi, "Detecting fake news in social media networks," Procedia Comput. Sci., vol. 141, pp. 215–222, 2018, doi: 10.1016/j.procs.2018.10.171.
[10] M. T. Ribeiro, S. Singh, and C. Guestrin, "'Why Should I Trust You?' Explaining the Predictions of Any Classifier," NAACL-HLT 2016 Demonstration Session, pp. 97–101, 2016, doi: 10.18653/v1/n16-3020.
[11] A. Saini and R. Prasad, "Locally Interpretable Model Agnostic Explanations using Gaussian Processes," 2021.
[12] H. Zhou, "Research of Text Classification Based on TF-IDF and CNN-LSTM," J. Phys. Conf. Ser., vol. 2171, no. 1, 2022, doi: 10.1088/1742-6596/2171/1/012021.
[13] X. Zhou, X. Zhang, and B. Wang, "Online support vector machine: A survey," Adv. Intell. Syst. Comput., vol. 382, pp. 269–278, 2016, doi: 10.1007/978-3-662-47926-1_26.
[14] A. A. T. Fernandes, D. B. F. Filho, E. C. da Rocha, and W. da Silva Nascimento, "Read this paper if you want to learn logistic regression," Rev. Sociol. e Polit., vol. 28, no. 74, 2020, doi: 10.1590/1678-987320287406EN.
[15] H. H. Rashidi, N. K. Tran, E. V. Betts, L. P. Howell, and R. Green, "Artificial Intelligence and Machine Learning in Pathology: The Present Landscape of Supervised Methods," Acad. Pathol., vol. 6, 2019, doi: 10.1177/2374289519873088.
[16] M. Junker, R. Hoch, and A. Dengel, "On the evaluation of document analysis components by recall, precision, and accuracy," Proc. Int. Conf. Doc. Anal. Recognition (ICDAR), pp. 717–720, 1999, doi: 10.1109/ICDAR.1999.791887.
[17] S. Haghighi, M. Jasemi, S. Hessabi, and A. Zolanvari, "PyCM: Multiclass confusion matrix library in Python," J. Open Source Softw., vol. 3, no. 25, p. 729, 2018, doi: 10.21105/joss.00729.
[18] N. Aslam et al., "Interpretable Machine Learning Models for Malicious Domains Detection Using Explainable Artificial Intelligence (XAI)," Sustain., vol. 14, no. 12, 2022, doi: 10.3390/su14127375.
[19] M. R. Islam, M. U. Ahmed, S. Barua, and S. Begum, "A Systematic Review of Explainable Artificial Intelligence in Terms of Different Application Domains and Tasks," Appl. Sci., vol. 12, no. 3, 2022, doi: 10.3390/app12031353.
[20] M. T. Ribeiro, S. Singh, and C. Guestrin, "Model-Agnostic Interpretability of Machine Learning," 2016.
