
Journal of Information Technology and Computer Science Volume 8, Number 2, August 2023, pp. 137 - 147

Journal Homepage: www.jitecs.ub.ac.id

Multilabel Classification for Keyword Determination of Scientific Articles

Sulthan Rafif1, Rizal Setya Perdana*2, Putra Pandu Adikara3

1,2,3Brawijaya University, Malang

1[email protected], 2[email protected], 3[email protected]

*Corresponding Author

Received 11 August 2023; accepted 23 August 2023

Abstract. In writing scientific articles, there are provisions regarding the structure or parts of the writing that must be fulfilled. One part of a scientific article that must be included is the keywords. Determining keywords manually can cause discrepancies with the specific themes discussed in the article, so readers may be unable to reach the scientific article. In this study, the keywords of scientific articles are determined automatically by a classification method. The classification process is carried out by determining the set of keywords of each scientific article based on its abstract and title. Therefore, the classification applied is multi-feature and multi-label. Classification is done by applying the Contextualized Word Embedding Method, which is implemented with the BERT Model. By applying the BERT Model, good performance is expected in determining the keywords of scientific articles. The evaluation results of applying the BERT Model to multi-label classification of abstract data for keyword determination are a training data loss of 0.514, a validation data loss of 0.511, an accuracy of 0.71, a precision of 0.71, a recall of 0.71, an error of 0.29, and an F1-score of 0.83.

Based on these evaluation results, the BERT Classification Model can carry out a classification process to determine the set of keywords of each abstract in the scientific articles.

Keywords: Scientific Article Keyword, Text Classification, BERT, Contextualized Word Embedding, Scientific Article

1. Introduction

In writing scientific articles, there are structural provisions or sections of writing that must be fulfilled [1]. One part of a scientific article that must be included is the keywords. Keywords represent the discussion in a scientific article and are written based on the thesaurus of the scientific field being researched [2]. The thesaurus is a list of terms covering specific themes in the field. Keywords are a tool for indexing and help search engines find relevant scientific articles [3]. If search engines can find the scientific article, readers can find it too, which increases the number of people reading it and generates more citations to the scientific article [4]. Therefore, keywords that follow the specific research themes need to be determined so that scientific articles can be reached by readers. The process of determining keywords manually can cause errors that lead to discrepancies with the specific themes discussed in the article.


Keywords that are not specifically defined have an impact on the directory where scientific articles are published: such articles will be stored in a storage directory with a general theme, whereas readers look for scientific articles through storage directories that contain collections of scientific articles from specific fields of study.

Keywords that are not specifically defined prevent readers from reaching the scientific article [5]. Based on these problems, it is necessary to apply the process of automatically determining scientific article keywords.

The process of determining scientific article keywords is automatically applied using the classification method. The classification process in determining scientific article keywords can be done based on the contents of the abstract. An abstract is a summary of the content discussed by the scientific article [1]. Abstracts can make it easier for readers to find out the entire contents discussed in the scientific article quickly.

The abstract acts as a guide to the important sections of the article [4]. Every scientific article has more than one keyword that represents the research being carried out.

Therefore, in practice, the classification is multi-labelled [6].

There are three approaches to solving multi-label dataset classification cases, namely data transformation, adaptation method, and ensemble method [7]. In this research, the Data Transformation approach was applied.

Based on previous studies, problems arise when carrying out the term-weighting process using TF-IDF [8]. The problem with TF-IDF is that if two documents have the same list of words and differ only in word order, the two documents will have the same weight values for all words. This causes both documents to be classified under the same label or class, even though, based on the dataset, the two documents belong to different labels or classes. This can affect the performance of the model applied to the data. Changing the word order of a sentence can change the meaning of the sentence. Therefore, it is necessary to apply a classification method based on the context of each document.

The classification method based on the context of the sentence is called Contextualized Word Embeddings. Contextualized Word Embeddings is a method that classifies text based on context/meaning and pays attention to the order of a word in the sentence [9]. One implementation of the Contextualized Word Embeddings Method is the BERT Model [10].

Therefore, in this research, multi-label classification is applied with Contextualized Word Embedding. The multi-label classification process uses a data transformation approach, namely changing multi-label data into binary data, while Contextualized Word Embedding is implemented using the BERT Model. The classified dataset is a collection of scientific articles containing abstracts and sets of keywords.

2. Previous Works

There is previous research that applies the Contextualized Word Embeddings Method. Research conducted by Ambalavanan et al. classified scientific articles that fall into the biomedical domain. The classification aims to find specific scientific articles within a large collection [11]. The articles used as data in that study are multi-criteria in nature, so the classification process needs to be carried out with separate models.

The model used is the Ensemble Model. In this study, several architectures were proposed as follows: Individual Task Learner (ITL), Cascade Learner, Boolean Ensemble Learner (Ensemble-Boolean), and Feed-forward Network Ensemble Learner (Ensemble-FFN).

The results of this study indicate that the Pre-trained Neural Contextualized Language (SciBERT) model has good performance in screening scientific articles. The screening process is carried out using a single integrated model, namely the Individual Task Learner (ITL), which produces a recall value of 0.985. The recall value obtained from the ITL model is the highest among the proposed architectures.

Research conducted by Saadah et al, classified public opinion regarding the COVID-19 vaccine program. The public opinion data was taken via Twitter for the period January 2021 to March 2022. The purpose of the classification process is to compare the performance of each Deep Learning Model used, namely IndoBERTweet, IndoBERT, and CNN-LSTM [12].

Based on the test results, the IndoBERT Model can classify positive sentiment by 80%. IndoBERTweet succeeded in classifying negative sentiment by 68%.

Meanwhile, the CNN-LSTM model reaches 53% for positive sentiment. The classification process is carried out for a dataset containing 2020 rows of data.

Research conducted by Juarto and Girsang implemented the Collaborative Filtering Method with a Neural Network to provide news recommendations to users.

However, the Neural Network approach has weaknesses in representing news information, such as news titles and news content, for users. Therefore, in that study the BERT Model is applied to produce sentence embeddings of news titles and content. The application of the BERT Model aims to classify news documents against news categories and users. The data used consists of 50,000 users and 51,282 news items, with 5,475,542 interactions between users and news. The evaluation process of predicting news clicks by users uses the calculation of precision, recall, and ROC [13].

Based on the test results, the precision value is 99.14%, recall is 92.48%, F1-score is 95.69%, and ROC is 98%. Evaluation using hit ratio@10 resulted in a hit ratio of 74% at the fiftieth epoch for the application of BERT; this hit ratio is better than that of Neural Collaborative Filtering (NCF).

Research conducted by Babi et al, applied the BERT Model for the feature extraction process and applied the Support Vector Machine and Naïve Bayes Methods for the classification process of fake reviews of a product. This study aims to determine the performance of the Support Vector Machine and Naïve Bayes models that apply extraction features from the BERT Model [14].

Based on the evaluation, the Naïve Bayes Model produces an accuracy of 97.14%, while the Support Vector Machine Model produces an accuracy of 100%. This shows that the Support Vector Machine produces the best performance when the features extracted using BERT are applied.

Research conducted by Ovidiu-Mihai et al, applied classification with the BERT Model to determine the emotions of a person with psychological problems through extracted journal content. Based on the evaluation results, the precision value for the anger, fear, joy, love, sadness, and surprise classes is 91%, 87%, 94%, 79%, 96%, and 75%, respectively [15].

Based on these previous studies, the researchers apply the BERT Model to the classification process for scientific article keywords.

3. Methods

Figure 1 shows the flowchart of the research carried out.

Based on Figure 1, the stages of the research are as follows: pre-processing the scientific article dataset using pre-trained SciBERT [16], where the pre-processed data is in the form of abstracts and keywords; then the classification process is carried out on the training data and validation data using the BERT Model.

Then the results of the classification are measured using a confusion matrix to determine the accuracy, precision, recall, and error generated from the applied model.


Figure 1. Flowchart of Data Processing Techniques

3.1 Datasets

The dataset used in this research is a collection of scientific article abstracts along with their keyword labels. The collection of scientific articles contains 2304 articles from the computer science domain published by ACM. Each paper has keywords assigned by the authors and verified by the reviewers. Different parts of the paper, such as the title and abstract, are separated, allowing extraction based on parts of the article text [17].

The processed dataset is multi-label: each abstract can be associated with more than one label or class. The dataset consists of training data and validation data. The training set contains 2073 records and the validation set contains 231 records.

3.2 Multi-label Classification

Multi-label classification is a classification process in which data can fit into more than one label or category [7]. Unlike the binary and multiclass classification models, each data in the multilabel classification will be associated with a vector belonging to more than one output [18] [19] [20].


The vector length of each output is determined by the number of labels in the dataset. Each element in the vector is a binary value that determines whether each label is relevant to the classified data or not; several labels may be relevant to a single data item. Each distinct combination of labels relevant to a data item is known as a label set.

3.3 Dataset Transformation

Data Transformation is an approach that converts multi-label datasets into multiclass or binary datasets. In the process, the Data Transformation applied in this study will change multilabel datasets into binary datasets [7].

Binary classification is used to determine whether the classified keywords are part of a set of keywords belonging to abstract data or not. The binary classification process can be handled using the Sigmoid Activation Function. The sigmoid activation function will produce a classification result with a range between zero and one [21].

3.4 Sigmoid Function

The Sigmoid Activation Function is applied in a binary classification case study.

The Sigmoid Activation Function generates a classification result with a range between zero and one [21]. In this study, the Sigmoid Activation Function is used in the binary classification process to determine whether each keyword is included in the keyword set of the abstract data or not. A value of zero from the sigmoid activation function indicates that the keyword is not included in the set of keywords of the abstract data. A value of one indicates that the keyword is included in the keyword set of the abstract data.
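To make the role of the sigmoid concrete, the short sketch below applies it to a vector of raw model scores (logits) and thresholds each resulting probability at 0.5 to decide keyword membership; the logit values and the 0.5 threshold are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def sigmoid(z):
    # Maps any real-valued score to the (0, 1) range.
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical raw scores (logits) for three candidate keywords of one abstract.
logits = np.array([2.3, -1.7, 0.4])
probabilities = sigmoid(logits)      # per-keyword membership probabilities
is_keyword = probabilities >= 0.5    # threshold each label independently

print(probabilities)                 # e.g. [0.909 0.154 0.599]
print(is_keyword)                    # [ True False  True]
```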

3.5 Contextualized Word Embeddings

Contextualized Word Embeddings is a method that classifies text based on context/meaning and pays attention to the order of a word in the sentence [9]. In the process, Contextualized Word Embeddings performs vector calculations on each word in the sentence. Unlike the Word Embeddings Method, Contextualized Word Embeddings not only captures the static semantic meaning but also the context of a sentence [9].

For example, consider two sentences, namely "I like apples" and "I like apple MacBooks". The word "apple" has a different semantic meaning in each sentence. By using a Contextualized Language Model, the embedding vectors calculated for this word in the two sentences will have different values.

One implementation of the Contextualized Word Embeddings Method is the BERT Model based on Transformer Architecture [10].

3.6 Transformers

Transformer is a simple machine-learning model architecture consisting of an encoder and a decoder that use an attention mechanism. The encoder of the Transformer is tasked with understanding the context of the sentence by calculating the vector value of each word in the sentence and then transforming the sentence using attention, while the decoder is in charge of producing the output for the sentence processed by the encoder, namely the class of the sentence [22].

In the case of scientific article classification, the author only applies the role of the encoder to find out the context of each sentence, in this case, the abstract and title of the scientific article. By applying an encoder to the classification process, the context of the abstract in scientific articles can be known, and this makes it easier to determine the class of the abstract. The classes in question are keywords from scientific articles.


To implement the encoder of the Transformer, the author uses the BERT Model.

3.7 BERT Model

BERT is a model that implements the encoder role in Transformer. The encoder applied to BERT plays a role in knowing the context of a sentence by calculating the vector value of each word in that sentence [10]. There are two processes applied to the Encoder in the BERT Model, namely pre-training and fine-tuning.

Pre-training plays a role in carrying out two unsupervised stages, namely the Masked Language Model and Next Sentence Prediction [10]. In the case of scientific article classification, at the Masked Language Model stage, words in the abstract sentences are masked, and the model predicts each masked word based on its relationship with the other words in the abstract sentence. In the Next Sentence Prediction stage, a process is carried out to determine the relationship between abstract sentences.

After going through the pre-training stage, the fine-tuning process is then carried out. The fine-tuning process plays a role in replacing the Fully Connected Output Layer with the desired Output Layer [10]. The output layer used in the case of scientific article classification represents the classes, namely the collection of scientific article keywords. After determining the output layer according to the case study, a supervised training process is carried out using the pre-trained abstract sentences and the collection of keywords of the scientific articles.
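To make this fine-tuning step concrete, the following is a minimal sketch of a multi-label fine-tuning head, assuming PyTorch and the Hugging Face transformers library (neither of which is named in the paper): a pre-trained BERT encoder is topped with a linear output layer whose size equals the number of keyword labels and trained with a binary cross-entropy loss, so each keyword is decided independently through the sigmoid. The class name and wiring are illustrative, not the authors' implementation.

```python
import torch
from torch import nn
from transformers import AutoModel

class BertKeywordClassifier(nn.Module):
    """Pre-trained BERT encoder with a multi-label output layer."""

    def __init__(self, model_name: str, num_labels: int):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask, token_type_ids=None):
        outputs = self.encoder(input_ids=input_ids,
                               attention_mask=attention_mask,
                               token_type_ids=token_type_ids)
        # Use the [CLS] position as the sequence-level representation.
        cls_vector = outputs.last_hidden_state[:, 0, :]
        return self.classifier(cls_vector)  # raw logits, one per keyword

# Binary cross-entropy over logits: the sigmoid is applied inside the loss.
loss_fn = nn.BCEWithLogitsLoss()
```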

3.8 Pre-Trained SciBERT

In the case of scientific article classification, the authors apply the SciBERT Pre-trained Model. SciBERT is pre-trained on a total of 1.14 million scientific articles taken from the Semantic Scholar website. In the SciBERT corpus, 18% of the articles are from the field of computer science and 82% are from the field of biomedical science. All parts of the text in the scientific articles are used for pre-training. On average, each scientific article consists of 154 sentences, or a total of 2,769 tokens [16].

By applying the SciBERT Pre-trained Model, the scientific article classification process becomes faster, because training from scratch is only carried out on the output parameters during the fine-tuning process. The pre-training process does not need to be repeated; the fine-tuning process utilizes the pre-training results of the SciBERT Model.
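As an illustration of reusing the published SciBERT weights so that only the output layer is trained from scratch, the sketch below loads the publicly released allenai/scibert_scivocab_uncased checkpoint with the Hugging Face transformers library; the checkpoint name, the library, and the configuration flags are tooling assumptions, since the paper only names SciBERT. The label count of 8758 is the value reported later in Section 4.3.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Publicly released SciBERT checkpoint (assumed here; the paper only names SciBERT).
CHECKPOINT = "allenai/scibert_scivocab_uncased"

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)

# A fresh multi-label output layer is placed on top of the pre-trained encoder;
# only this head starts from random initialization.
model = AutoModelForSequenceClassification.from_pretrained(
    CHECKPOINT,
    num_labels=8758,
    problem_type="multi_label_classification",
)
```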

3.9 Confusion Matrix

A confusion matrix is a method used to evaluate the results of the classification being carried out. This method presents the classification results using the matrix shown in Table 1 [23].

Table 1. Confusion Matrix

Actual Classification    Classified Into +    Classified Into -
+                        True Positive        False Negative
-                        False Positive       True Negative


True Positive is the number of positive records that are correctly classified as positive. A False Positive is a negative record that is incorrectly classified as positive. A False Negative is a positive record that is incorrectly classified as negative. True Negative is a negative record that is correctly classified as negative.

In this study, the Confusion Matrix method was used to evaluate the resulting BERT classification model. The confusion matrix evaluation method produces a calculation with five outputs, namely:

\[ \mathrm{Precision} = \frac{TP}{TP + FP} \times 100\% \quad (1) \]

\[ \mathrm{Recall} = \frac{TP}{TP + FN} \times 100\% \quad (2) \]

\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \quad (3) \]

\[ \mathrm{Error} = \frac{FP + FN}{TP + TN + FP + FN} \times 100\% \quad (4) \]

\[ \mathrm{F1\mbox{-}Score} = \frac{TP}{TP + \tfrac{1}{2}(FP + FN)} \quad (5) \]
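The helper below computes equations (1)-(5) directly from the four confusion matrix counts; it is a small illustration, the example counts are made up solely to show the calculation, and the percentages are left as fractions.

```python
def confusion_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute the five evaluation outputs from confusion matrix counts."""
    total = tp + tn + fp + fn
    return {
        "precision": tp / (tp + fp),               # equation (1)
        "recall":    tp / (tp + fn),               # equation (2)
        "accuracy":  (tp + tn) / total,            # equation (3)
        "error":     (fp + fn) / total,            # equation (4)
        "f1_score":  tp / (tp + 0.5 * (fp + fn)),  # equation (5)
    }

# Illustrative counts only, not the counts obtained in this study.
print(confusion_metrics(tp=70, tn=10, fp=10, fn=10))
```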

4. Results and Discussion

4.1 Dataset Transformation

At this stage, the dataset containing abstract data along with a collection of keywords in multi-label form is transformed into binary form. The transformation into binary form is carried out for each abstract data item at the training stage. The binary classification that is carried out determines whether each keyword in the dataset is included in the set of keywords owned by the abstract data or not.

The process of binary classification in the Neural Network Model, namely the BERT Model, is carried out using the Sigmoid function.

4.2 Preprocessing

The pre-processing stage is carried out to split and filter the words in a document, which is called the Tokenizing stage. Word splitting is done by using spaces as separators between words. Filtering is done by removing certain characters such as punctuation marks. The next step is to change all capital letters in the document to lowercase, which is called the Case Folding stage. Then, common words in the dataset are removed, which is called the Stop Word Removal stage. After that, each word in a document is transformed into its base form [24].

This base word transformation stage is called Stemming. In the BERT Model, the stages of tokenizing, case folding, stop word removal, and stemming are carried out using the BERT Pre-Trained Model. In this study, the pre-trained BERT model applied is SciBERT.

Next, the encoding process is done by applying padding to each text data item in a batch, namely determining the input IDs, attention mask, and token type IDs for each abstract data item. Input IDs are the input tokens of the abstract data. The input tokens are determined based on each word belonging to the abstract data. Input IDs can be displayed in the form of human-readable tokens, namely tokens that contain the collection of words in the abstract data along with the CLS and SEP tokens. The CLS token is placed at the first index in the word array, indicating that every sequence must start with the CLS token.

The SEP token is placed at the last index in the word array, indicating that every sequence must end with the SEP token.

The token type IDs represent the index of the sentence in the abstract data. Token type IDs are set to 0 for all abstract data, because in this study the classification case uses a single sequence, namely the abstract.

The attention mask is used, together with padding, to equalize the sequence length for all abstract data in the dataset. The first step is to find the abstract with the longest sentence in the dataset. Next, the length equalization process is carried out by adding PAD tokens to the remaining positions in each shorter sequence. PAD is a token that will not be processed in classification.
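As a rough illustration of this encoding step, the sketch below uses a standard BERT tokenizer to produce input IDs, token type IDs, and an attention mask with padding; the checkpoint name, the example text, and the use of max-length padding are assumptions for illustration, with the 300-token limit taken from the hyperparameters reported in Section 4.3.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")

# Hypothetical abstract text used only to demonstrate the encoding output.
abstract = "In writing scientific articles, there are provisions regarding the structure."

encoded = tokenizer(
    abstract,
    padding="max_length",   # fill the remaining positions with [PAD] tokens
    truncation=True,
    max_length=300,         # maximum token length used in this study
    return_tensors="pt",
)

print(encoded["input_ids"].shape)        # [CLS] ... [SEP] followed by [PAD] up to length 300
print(encoded["token_type_ids"][0][:5])  # all zeros: a single-sequence classification case
print(encoded["attention_mask"][0][:5])  # 1 for real tokens, 0 for [PAD] positions
```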

Furthermore, labelling is carried out based on the collection of keywords for each abstract in the training data and validation data, in one-hot-encoding form. The one-hot encoding is implemented as an array, where each value in the array is either 0 or 1. The length of the array equals the total number of labels or keywords in the dataset, and the order of values in the array follows the order of the labels in the dataset. A value of 0 indicates that the keyword is not part of the set of keywords belonging to the abstract data. A value of 1 indicates that the keyword is part of the abstract's keyword set.
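A small sketch of this label encoding, using scikit-learn's MultiLabelBinarizer (a tooling assumption; the paper does not state how the encoding is implemented); the keyword vocabulary and the two example articles are hypothetical, while the real dataset has 8758 keyword labels.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical keyword vocabulary; column order follows this list.
all_keywords = ["text classification", "bert", "word embedding", "naive bayes"]

mlb = MultiLabelBinarizer(classes=all_keywords)

# Keyword sets attached to two hypothetical abstracts.
article_keywords = [
    ["bert", "text classification"],
    ["naive bayes"],
]

labels = mlb.fit_transform(article_keywords)
print(labels)
# [[1 1 0 0]
#  [0 0 0 1]]
```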

4.3 BERT Classification

Furthermore, the classification process is carried out using the BERT Model. The stages of BERT Classification are first to initialize the hyperparameters, namely the number of classes = 8758, the maximum length of tokens in abstract data = 300, learning rate = 2e-05, the batch size = 32, and the number of epochs = 5. In the classification process, a training process is carried out using training data and validation data to form a classification model.
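The hyperparameters listed above can be gathered into a training configuration such as the hedged sketch below; the optimizer choice (AdamW), the BCEWithLogitsLoss, and the loop structure are assumptions about the implementation, while the numeric values are the ones reported in this section.

```python
import torch
from torch import nn

# Hyperparameters reported for the BERT classification stage.
NUM_LABELS = 8758      # number of distinct keyword labels (classes)
MAX_LENGTH = 300       # maximum token length of an abstract
LEARNING_RATE = 2e-5
BATCH_SIZE = 32
EPOCHS = 5

def train(model: nn.Module, train_loader) -> None:
    """One possible training loop for a multi-label BERT classifier."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)
    loss_fn = nn.BCEWithLogitsLoss()   # sigmoid + binary cross-entropy per keyword
    for _ in range(EPOCHS):
        for batch in train_loader:     # DataLoader built with batch_size=BATCH_SIZE
            logits = model(batch["input_ids"], batch["attention_mask"])
            loss = loss_fn(logits, batch["labels"].float())
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
```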

4.4 Model Evaluation

1) Loss

The resulting BERT classification model is then evaluated based on the loss value for each step. There are a total of 325 steps performed for 5 epochs. Figure 2 shows the graph of the number of steps for each epoch.

Figure 2. Graph of the number of steps each epoch


Figure 3 shows the loss graph for validation and training data based on the number of steps.

Figure 3. Graph of Loss Training Data and Validation Data

Table 2 shows the values of train loss and validation loss, from the classification results of the Training Data and Validation Data using the BERT Classification Model in 5 epochs.

Table 2. Training Data and Validation Data Loss Results

No   Validation Data Loss   Train Data Loss
1    0.514                  0.511

Based on Table 2 and Figure 3, the training process carried out over 5 epochs resulted in a validation data loss of 0.514 and a training data loss of 0.511.

2) Confusion Matrix

Furthermore, the evaluation process is carried out on the BERT classification model using a confusion matrix. Accuracy, precision, recall, and error are calculated for the test documents that have gone through the classification process. The results of the calculation of accuracy, precision, recall, error, and F1-score can be seen in Table 3.

Table 3. Confusion Matrix Results

No.  Calculation Name   Results
1.   Accuracy           0.71
2.   Precision          0.71
3.   Recall             0.71
4.   Error              0.29
5.   F1-Score           0.83

The results of the evaluation of the BERT classification model show an accuracy of 0.71, a precision of 0.71, a recall of 0.71, an error of 0.29, and an F1-score of 0.83.


5. Conclusions And Future Works

The evaluation results of applying the BERT Model to multi-label classification of abstract data for keyword determination are a training data loss of 0.514, a validation data loss of 0.511, an accuracy of 0.71, a precision of 0.71, a recall of 0.71, an error of 0.29, and an F1-score of 0.83. Based on these evaluation results, the BERT model provides good results in performing classification to determine the keywords of scientific articles based on abstract documents. In future research, title and abstract documents will be applied as input features to provide better classification results.

References

1. Author, "How to Write a Paper in Scientific Journal Style and Format," 2014. [Online]. Available: http://www.bates.edu/biology/student-resources/resources/.
2. W. Lestari, "Pembuatan Indeks Kata Kunci Jurnal di PDII-LIPI (Pusat Dokumentasi dan Informasi Ilmiah Lembaga Ilmu Pengetahuan Indonesia)," 2012.
3. Springer, "Title, Abstract and Keywords," 2023. [Online]. Available: https://www.springer.com/gp/authors-editors/authorandreviewertutorials/writing-a-journal-manuscript/title-abstract-and-keywords/10285522 (accessed Mar. 06, 2023).
4. Springer Nature, "Title, Abstract and Keywords," 2023. [Online]. Available: https://www.springer.com/gp/authors-editors/authorandreviewertutorials/writing-a-journal-manuscript/title-abstract-and-keywords/10285522 (accessed Mar. 09, 2023).
5. Y. Royani, M. A. Bachtar, K. Tambunan, T. Tupan, and S. Alm, "Pemetaan Karya Tulis Ilmiah LPNK: Studi Kasus LIPI dan BPPT (2004-2008)," Baca: Jurnal Dokumentasi dan Informasi, vol. 34, no. 1, pp. 1-28, 2013, doi: 10.14203/j.baca.v34i1.171.
6. F. Musthafa, J. L. B., and V. H., "Pemodelan Multilabel Tweet Media Sosial Mahasiswa untuk Klasifikasi Keluhan," Jurnal Teknik ITS, vol. 7, no. 1, pp. A247-A252, 2018.
7. F. Herrera, F. Charte, A. J. Rivera, and M. J. del Jesus, Multilabel Classification: Problem Analysis, Metrics and Techniques. Springer International Publishing, 2016.
8. S. Rafif, P. Y. Saputra, and M. Z. Abdullah, "Classification of Trends in Lecturer Research Fields Using Naive Bayes Method," in Proc. 2021 1st Int. Conf. Electrical and Information Technology (IEIT), 2021, pp. 92-98, doi: 10.1109/IEIT53149.2021.9587385.
9. M. E. Peters et al., "Deep contextualized word representations," in Proc. NAACL-HLT 2018, vol. 1, pp. 2227-2237, 2018, doi: 10.18653/v1/n18-1202.
10. J. Devlin, M. W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," in Proc. NAACL-HLT 2019, vol. 1, pp. 4171-4186, 2019.
11. A. K. Ambalavanan and M. V. Devarakonda, "Using the contextual language model BERT for multi-criteria classification of scientific articles," J. Biomed. Inform., vol. 112, p. 103578, 2020, doi: 10.1016/j.jbi.2020.103578.
12. S. Saadah, K. M. Auditama, A. A. Fattahila, F. I. Amorokhman, A. Aditsania, and A. A. Rohmawati, "Implementation of BERT, IndoBERT, and CNN-LSTM in Classifying Public Opinion about COVID-19 Vaccine in Indonesia," J. RESTI (Rekayasa Sistem dan Teknologi Informasi), vol. 6, no. 4, pp. 648-655, 2022, doi: 10.29207/resti.v6i4.4215.
13. B. Juarto and A. S. Girsang, "Neural Collaborative with Sentence BERT for News Recommender System," Int. J. Informatics Vis., vol. 5, no. 4, pp. 448-455, 2021, doi: 10.30630/JOIV.5.4.678.
14. C. Babi, M. S. Roshini, P. Manoj, and K. S. Kumar, "Fake Online Reviews Detection and Analysis Using BERT Model," vol. 10, pp. 2748-2756, 2023.
15. V. Ovidiu-Mihai, I. Tudor-Alexandru, and M. Petrescu, "Using BERT to extract emotions from personal journals," in Proc. 2022 IEEE 18th Int. Conf. Intelligent Computer Communication and Processing (ICCP), 2022, pp. 89-94, doi: 10.1109/ICCP56966.2022.10053943.
16. I. Beltagy, K. Lo, and A. Cohan, "SciBERT: A pretrained language model for scientific text," in Proc. EMNLP-IJCNLP 2019, 2019, pp. 3615-3620, doi: 10.18653/v1/d19-1371.
17. M. Krapivin, A. Autayeu, and M. Marchese, "Large Dataset for Keyphrases Extraction," Tech. Rep. DISI-09-055, 2009.
18. E. Gibaja and S. Ventura, "A tutorial on multilabel learning," ACM Comput. Surv., vol. 47, no. 3, 2015, doi: 10.1145/2716262.
19. G. Tsoumakas, I. Katakis, and I. Vlahavas, "Data Mining and Knowledge Discovery Handbook," pp. 0-20, 2010, doi: 10.1007/978-0-387-09823-4.
20. M. L. Zhang and Z. H. Zhou, "A review on multi-label learning algorithms," IEEE Trans. Knowl. Data Eng., vol. 26, no. 8, pp. 1819-1837, 2014, doi: 10.1109/TKDE.2013.39.
21. S. Elfwing et al., "Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning," pp. 1-18, 2017.
22. A. Vaswani et al., "Attention is all you need," Adv. Neural Inf. Process. Syst., vol. 2017-December, pp. 5999-6009, 2017.
23. R. N. Devita, H. W. Herwanto, and A. P. Wibawa, "Perbandingan Kinerja Metode Naive Bayes dan K-Nearest Neighbor untuk Klasifikasi Artikel Berbahasa Indonesia," Jurnal Teknologi Informasi dan Ilmu Komputer, vol. 5, no. 4, p. 427, 2018, doi: 10.25126/jtiik.201854773.
24. R. R. Waliyansyah and C. Fitriyah, "Perbandingan Akurasi Klasifikasi Citra Kayu Jati Menggunakan Metode Naive Bayes dan k-Nearest Neighbor (k-NN)," Jurnal Edukasi dan Penelitian Informatika, vol. 5, no. 2, p. 157, 2019, doi: 10.26418/jp.v5i2.32473.
