Firdaus Putra Kurniyanto, Copyright © 2023, MIB, Page 160
Explainable AI: Identification of Writing from Famous Figures in Indonesia Using BERT and Naive Bayes Methods
Firdaus Putra Kurniyanto*, Agus Hartoyo
Informatics, Telkom University, Bandung, Indonesia
Email: 1,*firdausputra@student.telkomuniversity.ac.id, 2ahartoyo@telkomuniversity.ac.id
Corresponding Author Email: firdausputra@student.telkomuniversity.ac.id
Abstract−Identifying the writings of well-known figures in Indonesia is a form of appreciation for the writing itself. By knowing the language style of each famous figure in Indonesia, we can recognize each writer's uniqueness, which helps us understand the thoughts and ideas they convey. This kind of research has rarely been done, so further study remains worthwhile. Because only a few writers were used in this study, the overall language style of every famous figure in Indonesia cannot yet be characterized. In this study, a system was built to determine the language style used by well-known figures in Indonesia based on their writing, using the BERT and Naïve Bayes algorithms for classification and LIME for the explainable AI process. The results show that the BERT algorithm is better at classifying text, with an accuracy of 92%, compared to Naïve Bayes, which has an accuracy of 90%. This study also found that KH. Abdurrahman Wahid and Emha Ainun Nadjib have almost the same style of language, in which their writings contain many words with political and religious elements. Dahlan Iskan's writing contains many words with political and socio-cultural elements, while Pramoedya Ananta Toer's writing uses many pronouns.
Keywords: BERT; Explainable AI; Figure; Indonesia; LIME; Naïve Bayes; Writings
1. INTRODUCTION
Many famous figures in Indonesia like to write articles or novels. "Author" is a term for people who have produced written works, whether articles or novels [1]. There are many well-known Indonesian writers, such as KH. Abdurrahman Wahid, Dahlan Iskan, Pramoedya Ananta Toer, and others. Written works can take the form of novels, articles, or other kinds of writing.
Every writer has a different style of language, and this is what makes each written work unique. Writing can present facts or be a work of fiction. A fact is a statement that can be proven [2]. Fact is also the opposite of opinion [3]: an opinion is an expression of someone's view that cannot be proven [2].
A work of fiction is a work of imagination, not based on facts, although it may be inspired by a real story or situation [4]. Examples of fiction are short stories and novels. A work of fiction usually has communicative language and an engaging storyline, and often uses non-standard language of the kind used in everyday life.
It is important to identify the writings of well-known figures in Indonesia because doing so can help us understand the thoughts and ideas they convey. In addition, recognizing these writings can broaden our horizons and knowledge about Indonesian history and culture. Identifying these works can also help us understand the development of thought in Indonesia and how these works influence Indonesian society and civilization.
For example, in 2020, Iyer and Vosoughi conducted research on detecting changes in writing style, titled "Style Change Detection Using BERT" [5]. However, that research did not use explainable AI in its process, so its results remain a black box that is difficult for humans to understand. Explainable AI is a set of techniques or methods for understanding the output of a machine learning algorithm, so that the results are understandable to humans and support decision-making.
In research conducted by Aborisade and Anwar, titled "Classification for authorship of tweets by comparing logistic regression and naive Bayes classifiers," the authors studied authorship attribution on Twitter data obtained through Twitter's RESTful API. The methods used are Logistic Regression and Naïve Bayes. Their research found that the accuracy of the Logistic Regression method, at 91%, is higher than that of Naïve Bayes, at 90%. The performance obtained depends on the corpora trained on: the more corpora trained, the higher the performance [6].
In research conducted by Zheng et al., titled "The email author identification system based on support vector machine (SVM) and analytic hierarchy process (AHP)," the authors studied author attribution, author characteristics, and camouflage detection in electronic mail (email) using the Support Vector Machine and Analytic Hierarchy Process methods. The researchers collected a dataset from several sent emails, with 80% as train data and 20% as test data. From the collected data, the owner of each email is identified by analyzing writing habits, word choice, and so on. The resulting accuracy of the Support Vector Machine method is more than 95% [7].
In research conducted by Mathews, titled "Explainable artificial intelligence applications in NLP, biomedical, and malware classification: a literature review," the author surveyed Natural Language Processing on tweets, cancer detection in biomedical signal classification, and malware detection on Windows PCs. The biomedical classification uses a deep neural network with an accuracy of 94%, while the tweet classification uses the XGB method with an accuracy of 84%. For malware detection, Microsoft's Windows PC malware classification is used, with near-perfect accuracy of 99.83%. For all three classification tasks, LIME is used as the explainable AI framework [8].
In research conducted by Aslam et al., entitled "Interpretable Machine Learning Models for Malicious Domains Detection Using Explainable Artificial Intelligence (XAI)," the authors studied domains indicated to be dangerous using the Decision Tree, Naïve Bayes, Random Forest, AdaBoost, XGB, and CatBoost methods. The data used consists of 45,000 samples each of dangerous and non-dangerous domains. The researchers use explainable AI to help explain the models. All the models achieve high accuracy, but the XGB model has the highest, at 98.18%. The framework used for explainable AI is LIME [9].
Based on this background, a system was built to find out the language style of several well-known writers in Indonesia using explainable AI. This research has never been done before, so further study remains worthwhile. Explainable AI is used to understand the output generated by a machine learning algorithm. The algorithms used in this study are BERT and Naïve Bayes, applied to classify the writings of well-known figures in Indonesia.
By using the BERT and Naïve Bayes algorithms, the two algorithms' performance in classifying the writings of well-known figures in Indonesia can be compared. Then, by building this explainable AI system, the language style used by well-known figures in Indonesia in their writings can be observed.
As for the limitations of this study, the authors identified only KH. Abdurrahman Wahid, Dahlan Iskan, Emha Ainun Nadjib, and Pramoedya Ananta Toer. The dataset used contains only Indonesian-language text, with no more than 2,000 samples.
This final project aims to compare the performance of the BERT and Naïve Bayes algorithms in classifying works written by well-known figures in Indonesia, namely KH. Abdurrahman Wahid, Dahlan Iskan, Emha Ainun Nadjib, and Pramoedya Ananta Toer. In addition, this research was also conducted to find out the language style each figure uses in their writing.
2. RESEARCH METHODOLOGY
The dataset used in this final project consists of articles and/or novels written by well-known figures in Indonesia: KH. Abdurrahman Wahid, Dahlan Iskan, Emha Ainun Nadjib, and Pramoedya Ananta Toer. After the dataset is collected, it enters the data preprocessing stage and is then divided into train data, eval data, and test data. The train data is used to train the model, the eval data is used to evaluate the model while tuning or optimizing it, and the test data is used to test the previously trained model. This final project uses the BERT and Naïve Bayes methods, with LIME as the framework for explainable AI. After everything has been done, the last step is to evaluate the results obtained.
2.1 Dataset
The dataset used in this final project is articles or novels written by well-known figures: KH. Abdurrahman Wahid, Dahlan Iskan, Emha Ainun Nadjib, and Pramoedya Ananta Toer. The data takes the form of sentences: sentences are taken from each writer and stored in .csv format. After the dataset has been collected, data preprocessing comes next.
Table 1. Dataset Example

Title: Radikal Shofa
Author: Dahlan Iskan
Text: Shofa sendiri dosen agama Islam di Universitas Indonesia. Dosen tidak tetap. Ia sarjana filsafat dari Universitas Gadjah Mada. Lalu ambil master bidang pemikiran Islam di UIN Syarif Hidayatullah Jakarta. Ditambah lagi master bidang hukum ekonomi di Universitas Nasional. Setamat SMP di Blora, Shofa masuk SMA pondok Tebuireng Jombang. Selama kuliah di UGM ia juga mondok di pesantren Krapyak.

Title: Jaka Tingkir dan Swadaya Masyarakat
Author: KH. Abdurrahman Wahid
Text: Kita hampir selalu melihat perkembangan LSM/NGO (Lembaga Swadaya Masyarakat/Non Governmental Organization) sebagai fenomena yang baru. Padahal kalau kita simak dengan teliti, sejarah masa lampau kita akan memperlihatkan asal-usul LSM pada sejarah masa lampau kita sendiri. Dalam hal ini, kita dapat memulainya dengan kisah pertarungan antara Sultan Hadiwidjaya (Raden Mas Karebet atau Jaka Tingkir) di Pajang dan menantunya, Sutawidjaya.
2.2 Preprocessing Data
In the text mining process there is a preprocessing stage that must be carried out; it consists of lower casing, filtering, stemming, and tokenization.
a. Lower Casing
At this stage, the process of generalizing the form of writing a word is carried out. In this case, each word will be changed to lowercase [10]. The following is an example of input and output results from the lower casing process.
Table 2. Lower Casing Process

Input: Para agamawan sejak dahulu sering dinilai sebagai penghambat bagi kemajuan
Output: para agamawan sejak dahulu sering dinilai sebagai penghambat bagi kemajuan
b. Filtering
After the lower casing process is carried out, the next step is the filtering process. In this process, words that are not too important such as conjunctions [10], will be deleted. The following is an example of input and output results from the filtering process.
Table 3. Filtering Process

Input: para agamawan sejak dahulu sering dinilai sebagai penghambat bagi kemajuan
Output: para agamawan dahulu sering dinilai penghambat kemajuan
c. Stemming
After the filtering process is complete, the next step is the stemming process. The word returns to the basic form [10]. The following is an example of input and output results from the stemming process.
Table 4. Stemming Process

Input: para agamawan dahulu sering dinilai penghambat kemajuan
Output: para agamawan dahulu sering nilai hambat maju

d. Tokenization
After the stemming stage has been completed, the next stage is tokenization. At this stage, the process of breaking sentences into words will be carried out, and the following is an example of input and output results from the tokenization process.
Table 5. Tokenization Process

Input: para agamawan dahulu sering nilai hambat maju
Output: 'para', 'agamawan', 'dahulu', 'sering', 'nilai', 'hambat', 'maju'
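As a rough illustration, the four preprocessing steps can be sketched in Python. The stopword list and affix lookup below are tiny hand-picked stand-ins used only to reproduce the example from Tables 2-5; a real Indonesian pipeline would typically rely on a dedicated library such as Sastrawi.

```python
# Toy versions of the four preprocessing steps. STOPWORDS and AFFIX_MAP
# are assumed demo values, not a real stopword list or stemmer.
STOPWORDS = {"sejak", "sebagai", "bagi"}
AFFIX_MAP = {"dinilai": "nilai", "penghambat": "hambat", "kemajuan": "maju"}

def preprocess(text):
    text = text.lower()                                 # a. lower casing
    words = text.split()
    words = [w for w in words if w not in STOPWORDS]    # b. filtering
    words = [AFFIX_MAP.get(w, w) for w in words]        # c. stemming (lookup)
    return words                                        # d. tokenization

tokens = preprocess("Para agamawan sejak dahulu sering dinilai "
                    "sebagai penghambat bagi kemajuan")
print(tokens)
# ['para', 'agamawan', 'dahulu', 'sering', 'nilai', 'hambat', 'maju']
```

The output matches the final tokenized form shown in Table 5.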
2.3 Data Splitting
After preprocessing has been completed, the next step is data splitting. Data splitting divides the data into three parts: train data, eval data, and test data. The train data is used to train the model, the eval data is used to evaluate the model while it is being tuned or optimized, and the test data is used to test the trained model. The data is split into 80% train data, 10% eval data, and 10% test data.
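A minimal sketch of the 80/10/10 split described above, assuming the data is a simple list of samples (a real pipeline might instead use a library utility such as scikit-learn's train_test_split, applied twice):

```python
import random

def split_80_10_10(data, seed=42):
    """Shuffle and split into train/eval/test with an 80/10/10 ratio."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    data = data[:]                 # copy so the caller's list is untouched
    rng.shuffle(data)
    n_train = int(0.8 * len(data))
    n_eval = int(0.1 * len(data))
    return (data[:n_train],
            data[n_train:n_train + n_eval],
            data[n_train + n_eval:])

train, eval_, test = split_80_10_10(list(range(1200)))
print(len(train), len(eval_), len(test))   # 960 120 120
```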
2.4 Word Embedding
Because the dataset is text and one of the models used is Naïve Bayes, a word embedding process is required. In this process, the data is converted from text to numerical form. The algorithm used for word embedding is TF-IDF. TF-IDF increases a word's weight proportionally to the number of times it appears in a document, offset by the number of documents that contain the word [11]. Equation (1) gives TF (Term Frequency): the number of times word i appears in document j, divided by the total number of words in the document. Equation (2) gives IDF (Inverse Document Frequency): the log of the total number of documents divided by the number of documents containing the word. Equation (3) is the overall TF-IDF equation, calculated by multiplying TF by IDF.
tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}    (1)

idf(w) = \log\left(\frac{N}{df_t}\right)    (2)

w_{i,j} = tf_{i,j} \times \log\left(\frac{N}{df_i}\right)    (3)
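Equations (1)-(3) can be implemented directly as a short sketch. The natural logarithm is assumed here, since the equations do not fix a base, and the three-document corpus is a made-up toy; library implementations such as scikit-learn's TfidfVectorizer add smoothing and normalization on top of this basic form.

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute w_ij = tf_ij * log(N / df_i) per equations (1)-(3).
    docs is a list of tokenized documents (lists of words)."""
    N = len(docs)
    df = Counter()                       # df_i: documents containing word i
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)                 # sum_k n_kj in equation (1)
        weights.append({w: (c / total) * math.log(N / df[w])
                        for w, c in counts.items()})
    return weights

docs = [["agama", "damai", "agama"], ["politik", "negara"], ["agama", "politik"]]
w = tfidf(docs)
print(round(w[0]["agama"], 3))   # tf = 2/3, idf = log(3/2)
```

Note that a word appearing in every document gets weight log(1) = 0, which is exactly the "offset" behaviour described above.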
2.5 BERT
BERT (Bidirectional Encoder Representations from Transformers) is a mechanism for NLP (Natural Language Processing) pre-training created by Google. The transformer is a deep learning architecture based on self-attention, used to draw global dependencies between input and output [12]. BERT is a model that helps represent words and sentences by capturing the meaning of words and the relationships between them [13].
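To give a feel for the self-attention mechanism underlying the transformer, here is a toy scaled dot-product attention in NumPy. It omits the learned query/key/value projections, multiple heads, and positional encodings of a real BERT layer, so it is a sketch of the core idea only, not BERT itself.

```python
import numpy as np

def self_attention(X):
    """Toy scaled dot-product self-attention with identity projections:
    softmax(X X^T / sqrt(d)) X. Each output row is a weighted mix of all
    token embeddings, which is how attention captures global dependencies."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)
    # numerically stable softmax over each row
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ X

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # 3 toy token embeddings
out = self_attention(X)
print(out.shape)   # (3, 2)
```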
2.6 Naïve Bayes
Naïve Bayes is a linear classifier known to be simple but very efficient. This classification method is based on Bayes' theorem, and the adjective "naive" comes from the assumption that the features in the data are mutually independent [14]. This method is especially suitable for small datasets [15]. In equation (4), A and B are events: P(A|B) is the probability of event A given that event B is true, P(A) is the prior probability of A, and P(B|A) is the probability of event B given that A is true. The Naïve Bayes method is also relatively robust, easy to implement, and usable in many different fields [16].
P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}    (4)
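A small numeric illustration of equation (4). The probabilities are hypothetical, not taken from the paper's dataset: suppose 30% of sentences are by author A, the word "agama" appears in 40% of A's sentences, and in 20% of all sentences.

```python
# Bayes' theorem, equation (4): P(A|B) = P(B|A) * P(A) / P(B)
p_a = 0.30          # prior: P(author = A)
p_b_given_a = 0.40  # likelihood: P(word "agama" | author = A)
p_b = 0.20          # evidence: P(word "agama") over all sentences

p_a_given_b = p_b_given_a * p_a / p_b   # posterior P(A | "agama")
print(p_a_given_b)   # 0.6
```

Naïve Bayes applies this rule per class, multiplying the likelihoods of all words under the independence assumption and picking the class with the largest posterior.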
2.7 Model Training
After the data splitting process has been completed, the next step is model training. In this final project, two models are used: BERT (IndoBERT) and Naïve Bayes. The training data used is a dataset of sentences written by well-known figures in Indonesia. Training produces a system model, which is then used in model testing.
2.8 Model Testing
After the model has been successfully trained, the next step is model testing, which measures how well the model produces reasonable output. Once testing has been carried out, its results are passed on to the evaluation process to determine what was achieved in the testing stage.
2.9 Model Evaluation
After the model testing process has been carried out, the next step is to evaluate based on the evaluation of the following metrics.
a. Accuracy
Accuracy is a measure used to determine how often the model guesses correctly [17]. The accuracy equation is as follows.
Accuracy = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}}    (5)
b. Precision

Precision is a measure used in model testing to determine how accurately the model can guess data classified as the positive class [18]. Here is the formula for precision.
Precision = \frac{TP}{TP + FP}    (6)
c. Recall
Recall is a measure used in model testing to determine how completely the model finds the correct data [18]. The equation is as follows.
Recall = \frac{TP}{TP + FN}    (7)
d. F1-Score
F1-Score is the harmonic mean of precision and recall, with the following equation.
F1\text{-}score = \frac{2 \times precision \times recall}{precision + recall}    (8)
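Equations (5)-(8) can be computed directly from confusion-matrix counts, as sketched below. The counts are hypothetical illustrations, not the paper's results.

```python
def report(tp, fp, fn, tn):
    """Equations (5)-(8) for one class of a binary confusion matrix."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)       # eq. (5)
    precision = tp / (tp + fp)                       # eq. (6)
    recall = tp / (tp + fn)                          # eq. (7)
    f1 = 2 * precision * recall / (precision + recall)  # eq. (8)
    return accuracy, precision, recall, f1

# Hypothetical counts for one author class in a 120-sample test set
acc, prec, rec, f1 = report(tp=38, fp=2, fn=2, tn=78)
print(round(prec, 2), round(rec, 2), round(f1, 2))   # 0.95 0.95 0.95
```

In the multi-class setting of this study, precision, recall, and F1 are computed per author in this one-vs-rest fashion and then averaged.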
2.10 Explainable AI (LIME)
An explainable AI framework, LIME (Local Interpretable Model-Agnostic Explanations), is used in this process. LIME is a technique developed by researchers from the University of Washington [19] and is a framework used in explainable AI. LIME provides an instance-based explanation for the predictions of any classifier [20]; in other words, it can explain the predictions of any classifier or regressor by approximating it locally with an interpretable model [10]. LIME perturbs the input, observes the model's predictions on the perturbed samples, and fits an interpretable surrogate model to explain the decision; the surrogate g is drawn from the class of interpretable models G, as in equation (9). The goal of LIME is to understand why a machine learning model produces a given output, which makes it easier for humans to make decisions based on that output. This process is carried out to present the results of the model that has been built in a form humans can easily understand.
The results of this process can later be used as material in linguistic analysis, and conclusions can be drawn.
g \in G    (9)
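The intuition can be illustrated with a simplified word-attribution sketch: drop each word in turn and measure how much the classifier's predicted probability falls. Real LIME instead samples many random perturbations and fits a weighted interpretable surrogate model g over them; the classifier below is a made-up toy, not the paper's Naïve Bayes model.

```python
# Leave-one-word-out attribution, a simplified stand-in for LIME's
# perturbation-based explanations. KEYWORDS and predict_proba are
# assumed toy values for demonstration only.
KEYWORDS = {"agama": 0.3, "politik": 0.25}

def predict_proba(words):
    """Toy 'probability' that a sentence belongs to the target author."""
    return min(1.0, 0.4 + sum(KEYWORDS.get(w, 0.0) for w in words))

def explain(words):
    """Score each word by how much removing it lowers the prediction."""
    base = predict_proba(words)
    return {w: base - predict_proba([v for v in words if v != w])
            for w in set(words)}

weights = explain(["agama", "dan", "politik"])
print(round(weights["agama"], 2), round(weights["politik"], 2))
```

Words with large positive weights correspond to the highlighted tokens in the explainer figures of Section 3: they are the words that pushed the classifier toward its predicted author.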
3. RESULT AND DISCUSSION
This section explains the implementation of explainable AI for the writings of well-known figures in Indonesia. The explanation is divided into two parts: test results and analysis of test results. The test results cover the data from the testing stage through the explainable AI process. In the analysis of the test results, an analysis is carried out to determine which model is better and what language style each well-known figure uses in their writing.
3.1 Dataset
The dataset was obtained by crawling the web and taking text from a book. The sources of the dataset can be seen in Table 6. The total dataset obtained is 1,200 samples from various writings by well-known figures in Indonesia. After the preprocessing stage, the total dataset becomes 1,197, which can be seen in Figure 2. After preprocessing, the next step is the splitting process with proportions of 80% train data, 10% eval data, and 10% test data.
Table 6. Dataset Source

Author                  Source
KH. Abdurrahman Wahid   https://gusdur.net/koleksi-karya-tulis/
Dahlan Iskan            https://disway.id/kategori/99/catatan-harian-dahlan
Emha Ainun Nadjib       https://www.caknun.com/tag/esai/
Pramoedya Ananta Toer   The novel entitled Bumi Manusia

Table 7. Total Dataset

Author                  Amount
KH. Abdurrahman Wahid   300
Dahlan Iskan            300
Emha Ainun Nadjib       300
Pramoedya Ananta Toer   300
Figure 2. Number of Clean Data

3.2 Test Results
3.2.1 Test Data
The classification was carried out using the BERT and Naïve Bayes algorithms. The datasets used were the writings of several figures: KH. Abdurrahman Wahid, Dahlan Iskan, Emha Ainun Nadjib, and Pramoedya Ananta Toer. The number of test samples used can be seen in Table 8.
Table 8. Number of Test Data

Author                  Amount
KH. Abdurrahman Wahid   40
Dahlan Iskan            26
Emha Ainun Nadjib       28
Pramoedya Ananta Toer   26

3.2.2 Model Evaluation
When the test dataset is entered into the BERT and Naïve Bayes models, in Table 9, the accuracy of the BERT model is 0.92 with an average precision of 0.94, an average recall of 0.92, and an average F1-Score of 0.92. In Table 10, the Naïve Bayes model has an accuracy of 0.90 with an average precision of 0.89, an average recall of 0.91, and an average F1-Score of 0.89.
Table 9. BERT Classification Report

Author                  Precision  Recall  F1-Score
KH. Abdurrahman Wahid   0.95       0.95    0.91
Dahlan Iskan            0.96       0.93    0.94
Emha Ainun Nadjib       0.86       0.83    0.84
Pramoedya Ananta Toer   1.00       0.96    0.98
Overall accuracy: 0.92

Table 10. Naive Bayes Classification Report

Author                  Precision  Recall  F1-Score
KH. Abdurrahman Wahid   0.95       0.90    0.93
Dahlan Iskan            0.69       1.00    0.82
Emha Ainun Nadjib       0.93       0.93    0.93
Pramoedya Ananta Toer   1.00       0.81    0.93
Overall accuracy: 0.90

3.2.3 Explainable AI (LIME)
An explainer is then run using LIME on the Naïve Bayes model by feeding in each of the prepared test samples. For example, in Figure 3, the highlighted text is classified as written by Dahlan Iskan with a probability of 0.55. In Figure 4, the highlighted text is classified as written by KH. Abdurrahman Wahid with a probability of 0.81. In Figure 5, the highlighted text is classified as written by Emha Ainun Nadjib with a probability of 0.74. Furthermore, in Figure 6, the highlighted text is classified as written by Pramoedya Ananta Toer with a probability of 0.83.
Figure 3. Example of Dahlan Iskan's Explainer
Figure 4. Example of KH. Abdurrahman Wahid's Explainer
Figure 5. Example of Emha Ainun Nadjib's Explainer
Figure 6. Example of Pramoedya Ananta Toer's Explainer

3.3 Analysis of Test Results
Based on Table 9 and Table 10, the accuracy of the BERT model is slightly higher than that of the Naïve Bayes model. The average recall is also slightly higher for BERT, at 0.92, than for Naïve Bayes, at 0.91, so the BERT model is more accurate in its predictions. BERT likewise produces a higher mean precision than Naïve Bayes, making it more reliable at predicting the correct class. Table 11 groups the words based on the text highlighted during the explainer stage. For the author KH. Abdurrahman Wahid, two types of words dominate: words containing religious elements and words containing political elements. For the writer Dahlan Iskan, the words mainly contain political and social-cultural elements. For the writer Emha Ainun Nadjib, the words contain many political and religious elements. Furthermore, for the writer Pramoedya Ananta Toer, the words tend to be pronouns, such as personal and demonstrative pronouns.
Table 11. Word Grouping (original text with English translation)

Author: KH. Abdurrahman Wahid
Category: words containing religious elements
  People/Groups: kiai (kiai), pesantren (boarding school), nabi (prophet), umat (people), kaum (folk), muslimin (muslimin), islam (islam), agama (religion)
  Other: damai (peace), allah (god), paham (understand), sikap (attitude), orientasi (orientation), pandang (view)
Category: words containing political elements
  People/Groups: masyarakat (society), kelompok (group), bangsa (nation), pihak (side)
  Geopolitical: negara (country), bangsa (nation), lembaga (institution), pimpin (lead), irak (iraq), asean (asean), absentia (absentia), internasional (international)

Author: Dahlan Iskan
Category: words containing political elements
  People/Groups: jokowi (jokowi), trump (trump), biden (biden), sambo (sambo), bupati (regent), masyarakat (society), jaksa (prosecutor), polisi (police)
  Geopolitical: tiongkok (china), kongres (congress), demo (demo), indonesia (indonesia), negeri (country), kamboja (cambodia), kota (city)
  Health: sirup (syrup), bahan (material), obat (medicine), ginjal (kidney)
Category: words containing social and cultural elements
  Culture: leong (leong), pantun (rhyme), jurus (kick), nyanyi (sing)
  Social: rangkul (embrace), hemat (thrifty), riya (arrogant), cantik (beautiful), sikap (attitude), budi (mind)

Author: Emha Ainun Nadjib
Category: words containing religious elements
  People/Groups: kaum (folk), maiyah (maiyah), muslimin (muslimin), kiai (kiai)
  Other: allah (god), agama (religion), firman (word), tuhan (god), islam (islam)
Category: words containing political elements
  Health: corona (corona), obat (medicine), virus (virus)
  People/Groups: kelompok (group), masyarakat (society)
  Other: hukum (law), acara (event), kuasa (power), politik (politics), masalah (problem), pimpin (lead), isu (issues), kritik (criticism)

Author: Pramoedya Ananta Toer
Category: substitute words (pronouns)
  Person: jean (jean), mellema (mellema), annelies (annelies), sinyo (sinyo), robert (robert), suurhof (suurhof), minke (minke), kamu (you), ibu (mother), bapak (father), ayah (father), wanita (woman), tamu (guest), tuan (master), nyai (nyai)
  Demonstratives/Other: pada (at), pabrik (factory), eropa (europe), kantor (office), belanda (dutch)
4. CONCLUSION
Based on the tests and analyses carried out, the BERT algorithm is better at classifying the writings of well-known figures in Indonesia, with an accuracy of 92%, than the Naive Bayes algorithm, with an accuracy of 90%. Based on Table 9 and Table 10, the average recall obtained is also higher with the BERT algorithm, at 92%. With this higher recall, the BERT algorithm guesses data correctly and accurately more often than Naïve Bayes. Then, based on the analysis results in Table 11, KH. Abdurrahman Wahid and Emha Ainun Nadjib have almost the same style of language, in which their writings contain many words with political and religious elements. Dahlan Iskan's writing contains many words with political and socio-cultural elements, while Pramoedya Ananta Toer's writing uses many pronouns. A suggestion for this research is to add more Indonesian authors whose language styles can then be analyzed. In addition, researchers can further develop the results of the explainable AI process that has been carried out. Because memory limitations prevented running explainable AI for the BERT algorithm, future researchers may be able to carry out the explainable AI process for both models and compare the results.
REFERENCES
[1] M. Foucault, “Foucault - Author.Pdf,” Truth and Method, vol. 8, pp. 101–120, 1969.
[2] I. The Center for Humanities, “Knowing the difference between facts and opinions,” Brgh. Manhattan Community Coll., no. 1977, p. 1977, 1977.
[3] C. S. Lammer-Heindel, “Facts and Opinions,” no. September, pp. 10–12, 2016.
[4] B. Publishing, “is Fiction ?,” vol. 43, no. 4, pp. 385–392, 2012.
[5] A. Iyer and S. Vosoughi, “Style Change Detection Using BERT Notebook for PAN at CLEF 2020,” CEUR Workshop Proc., vol. 2696, no. September, pp. 22–25, 2020.
[6] O. M. Aborisade and M. Anwar, “Classification for authorship of tweets by comparing logistic regression and naive bayes classifiers,” Proc. - 2018 IEEE 19th Int. Conf. Inf. Reuse Integr. Data Sci. IRI 2018, pp. 269–276, 2018, doi: 10.1109/IRI.2018.00049.
[7] Q. Zheng, X. Tian, M. Yang, and H. Su, “The email author identification system based on Support Vector Machine (SVM) and Analytic Hierarchy Process (AHP),” IAENG Int. J. Comput. Sci., vol. 46, no. 2, pp. 178–191, 2019.
[8] S. M. Mathews, Explainable Artificial Intelligence Applications in NLP, Biomedical, and Malware Classification: A Literature Review, vol. 998. Springer International Publishing, 2019.
[9] N. Aslam, I. U. Khan, S. Mirza, A. Alowayed, and F. M. Anis, “Interpretable Machine Learning Models for Malicious Domains Detection Using Explainable Artificial Intelligence (XAI),” 2022.
[10] M. T. Ribeiro, S. Singh, and C. Guestrin, “‘Why Should I Trust You?’ Explaining the Predictions of Any Classifier,” NAACL-HLT 2016 - 2016 Conf. North Am. Chapter Assoc. Comput. Linguist. Hum. Lang. Technol. Proc. Demonstr. Sess., pp. 97–101, 2016, doi: 10.18653/v1/n16-3020.
[11] H. Zhou, “Research of Text Classification Based on TF-IDF and CNN-LSTM,” J. Phys. Conf. Ser., vol. 2171, no. 1, pp. 218–222, 2022, doi: 10.1088/1742-6596/2171/1/012021.
[12] L. M. R. Rizky, “Improving Stance-based Fake News Detection using BERT Model with Synonym Replacement and Random Swap Data Augmentation Technique,” 2021.
[13] V. F. Dr. Vladimir, “済無No Title No Title No Title,” Gastron. ecuatoriana y Tur. local., vol. 1, no. 69, pp. 5–24, 1967.
[14] S. H. Myaeng, K. S. Han, and H. C. Rim, “Some effective techniques for naive bayes text classification,” IEEE Trans. Knowl. Data Eng., vol. 18, no. 11, pp. 1457–1466, 2006, doi: 10.1109/TKDE.2006.180.
[15] R. M. Maertens, A. S. Long, and P. A. White, “Performance of the in vitro transgene mutation assay in MutaMouse FE1 cells: Evaluation of nine misleading (‘False’) positive chemicals,” Environ. Mol. Mutagen., vol. 58, no. 8, pp. 582–591, 2017, doi: 10.1002/em.22125.
[16] S. Raschka, “Naive Bayes and Text Classification I - Introduction and Theory,” pp. 1–20, 2014.
[17] M. Junker, R. Hoch, and A. Dengel, “On the evaluation of document analysis components by recall, precision, and accuracy,” Proc. Int. Conf. Doc. Anal. Recognition, ICDAR, pp. 717–720, 1999, doi: 10.1109/ICDAR.1999.791887.
[18] S. Haghighi, M. Jasemi, S. Hessabi, and A. Zolanvari, “PyCM: Multiclass confusion matrix library in Python,” J. Open Source Softw., vol. 3, no. 25, p. 729, 2018, doi: 10.21105/joss.00729.
[19] A. Saini and R. Prasad, Select Wisely and Explain: Active Learning and Probabilistic Local Post-hoc Explainability, vol. 1, no. 1. Association for Computing Machinery, 2022.
[20] S. Mishra, B. L. Sturm, and S. Dixon, “Local interpretable model-agnostic explanations for music content analysis,” Proc. 18th Int. Soc. Music Inf. Retr. Conf. ISMIR 2017, pp. 537–543, 2017.