Classification of Company Level Based on Student Competencies in Tracer Study 2022 using SVM and XGBoost Method

Tyo Revandi, Putu Harry Gunawan*

School of Informatics, Telkom University, Bandung, Indonesia Email: 1[email protected], 2,*[email protected]

Corresponding author email: [email protected]

Abstract−Assessing the quality level of the companies where graduates are employed is crucial for understanding the impact of academic programs on career placements. Methodologies that do not match the research objectives may lead to inaccurate or irrelevant analysis; when company classification methods are not aligned with the nature of the data collected in a tracking study, there is a clear risk of misinterpretation and invalid generalization. This study utilizes the 2022 Tracer Study data from Telkom University, encompassing responses from 4306 graduates working in Local, National, and Multinational companies. The research employs the support vector machine (SVM) and XGBoost algorithms to analyze and classify the company levels of the surveyed graduates. The primary objective is to improve the accuracy of company-level classification, thereby enabling a more precise analysis of the Tracer Study dataset. Both algorithms are tested, and the results show an accuracy improvement with the XGBoost method, a 2% increase over the SVM method. The evaluation is conducted with a data split of 20% test data and 80% training data.

This research not only contributes to the refinement of company-level classification in the context of Tracer Studies but also underscores the potential of machine learning algorithms, specifically SVM and XGBoost, in providing valuable insights into graduates' professional trajectories. The findings pave the way for more informed decision-making in academic and career development initiatives.

Keywords: Classification; Company Level; Tracer Study; Support Vector Machine (SVM); XGBoost

1. INTRODUCTION

Tracer Study, commonly known as an alumni tracking system, represents a systematic effort to gather information about graduates through the completion of questionnaires. As an invaluable instrument, Tracer Study goes beyond merely tracking the success of alumni; it serves as a key element in shaping graduation portfolios and enhancing overall improvements within the higher education environment.

The utility of implementing Tracer Study extends beyond academic evaluation, playing a pivotal role in determining how well a higher education institution prepares its graduates for the challenges of the workforce. In this context, alumni tracking becomes a critical foundation to ensure that the curriculum and learning experiences provided align with the competencies demanded in today's professional landscape. Telkom University already has a tracer study in the form of a questionnaire, although alumni must first enter their personal data into it. The 2022 Telkom University tracer study data contains records for 4306 students who graduated and filled out the tracer study questionnaire.

Tracer Study also functions as an effective tool in supporting the enhancement of graduation portfolios, subsequently elevating the reputation and appeal of higher education institutions. By evaluating the achievements and progress of alumni, colleges and universities can identify areas for improvement, optimize academic programs, and develop new strategies to ensure the success of graduates across various sectors.

Moreover, it is important to acknowledge that Tracer Study is not merely an internal evaluation tool but also serves as a guide for students who have completed their studies. By mapping the competencies acquired during their academic journey, Tracer Study helps align the skills of graduates with the practical demands of the real world. This creates a robust bridge between the realms of education and industry, ensuring that graduates possess not only academic knowledge but also the practical skills necessary for success in their careers.

Furthermore, Tracer Study provides a comprehensive overview of the positions held by graduates in the job market. This information not only offers in-depth insights into the impact and relevance of higher education on the workforce but also helps institutions understand career trends, industry needs, and changes in the business environment that may influence future educational strategies.

In 2018, Dika Rizky Nurcholis et al. conducted research on the analysis of Tracer Study results for Telkom University alumni using a Minimum Spanning Tree (MST). Kruskal's algorithm, summarized using five kinds of centrality (degree centrality, betweenness centrality, closeness centrality, eigenvector centrality, and overall centrality), can be used to determine the most influential competencies. The study identified the six most influential competencies: (1) Knowledge in the field or discipline, (22) Ability to hold responsibility, (4) Internet skills, (3) General knowledge, (8) Learning ability, and (8) Critical thinking [1].

In 2019, May Rozakhi Takkas et al. conducted research analyzing Tracer Study results for Telkom University alumni with a Forest of All Minimum Spanning Trees (MSTs). Across the five centralities, the highest-scoring competency is number (21), Leadership. In other words, competency (21) is the most influential and central competency among all others, with a score of 0.9802538 [2].


In 2021, Zahrina Aulia Adriani et al. conducted research on the prediction of university student performance based on a tracer study dataset using an artificial neural network. Using the SMOTE (Synthetic Minority Oversampling Technique) method to overcome dataset imbalance, the model accuracy is 0.87, a 10% increase over the accuracy of 0.78 obtained with K = 3. The evaluation scores for class 2 increased to Precision = 0.50, Recall = 0.45, and F1-Score = 0.47 with the 0.87-accuracy model [3].

In 2022, Tutik et al. designed an Android-based tracer study application for alumni of SMK Negeri 1 Sukorejo. The research first built the Android-based application and then measured its effectiveness and efficiency through usability tests covering learnability, efficiency, and errors. Learnability obtained a value of 97.14%, time-based efficiency 0.018 goals/sec with an overall relative efficiency of 87.9%, and errors a value of 0.05. Based on these results, the tracer study application fulfilled the usability test and is suitable for use as a graduate-tracing application at SMK Negeri 1 Sukorejo [4].

This research elaborates the analysis and classification of the company level at which Telkom University graduates work: Local, National, or Multinational. The classification uses the Support Vector Machine (SVM) and XGBoost algorithms, comparing model accuracy and evaluation scores for the company level.

2. RESEARCH METHODOLOGY

2.1 Dataset

The dataset used is the 2022 Telkom University tracer study data. The attributes record how closely, at the time of graduation, each graduate had mastered: ethical competencies (F1761), expertise competencies based on the field of knowledge (F1763), time management competencies (F1765), competence in the use of information technology (F1767), communication competence (F1769), competence in teamwork (F1771), and competence in self-development (F1773), together with the income earned and the level of the company where the graduate works.

Table 1. Sample Dataset

| Company Level | F1761 | F1763 | F1765 | F1767 | F1769 | F1771 | F1773 |
|---------------|-------|-------|-------|-------|-------|-------|-------|
| National | High | High | Very High | High Enough | High | Very High | High |
| National | High | High Enough | High Enough | High Enough | High Enough | High Enough | High Enough |
| National | High | High | High | High | High | High Enough | High Enough |
| National | High | High | High | High | High Enough | High | High |

Based on Table 1, in the context of this research, the company level is identified as the level at which students work, covering the national, local, and multinational scales. Further analysis also highlighted the variability of students' abilities, as measured through the variables F1761, F1763, F1765, F1767, F1769, F1771, and F1773. These variables record student ability levels ranging from very low, low, and high enough to high and very high.

In addition, the income earned by the students, represented by the salary categories of <4m, 4-7m, 7-10m, 10-15m, and >15m, provides an additional dimension in understanding the different income levels that may be related to the level of the company and the ability of the students. Thus, this research holistically covers these aspects in an in-depth effort to describe and analyze the relationship between company level, student ability, and income in the context of the scope of this research.

2.2 Research Stages

This dedicated section serves as an intricate visual representation, encapsulated in a meticulously crafted flowchart, elucidating the step-by-step progression of the research methodology intended for application at the company level. The flowchart not only acts as a visual guide but also delineates the systematic sequence of activities that will be undertaken to execute the research plan effectively. This comprehensive flow facilitates a holistic understanding of the research process, enabling a seamless transition from data acquisition to model evaluation.

Within this framework, the incorporation of advanced machine learning algorithms adds a layer of sophistication to the research design. Specifically, the support vector machine algorithm and XGBoost are strategically employed to assess and quantify the performance metrics, primarily focusing on accuracy and the overall efficacy of the predictive models. This methodological choice reflects a deliberate effort to harness the power of cutting-edge computational techniques, ensuring a nuanced and in-depth analysis of the intricate relationships within the research variables.

Figure 1. Research Stages

In the preliminary phase of this research, illustrated in Figure 1, the primary step involves the collection of a comprehensive dataset derived from student-filled forms. These forms serve as a valuable repository of diverse information crucial for the subsequent analytical processes. Moving forward, the data undergoes a meticulous preprocessing stage aimed at enhancing its quality and reliability.

During the preprocessing phase, a thorough examination for missing values is conducted, utilizing a systematic approach to identify and address any gaps in the dataset. Subsequently, a strategic decision is made to discard rows that contain null or missing values, ensuring the integrity and completeness of the dataset. This curation process is imperative to mitigate potential biases and inaccuracies that may arise from incomplete data.

Following the data cleansing process, the focus shifts to feature selection, where the research endeavors to identify the most relevant and influential variables. This involves a comprehensive evaluation of each feature's significance in contributing to the research objectives. Additionally, a scrutiny of Spearman correlation values is carried out to gauge the interdependencies between different variables, providing valuable insights into potential relationships and patterns within the dataset.

As a pivotal step in the analytical pipeline, the data undergoes a transformation into categorical form. This transformation is driven by the need to categorize variables, enabling a more structured and interpretable representation of the information. This categorical conversion facilitates subsequent statistical analyses and machine learning applications, paving the way for a nuanced exploration of the dataset and extraction of meaningful insights.


In essence, the multi-faceted approach outlined above, as visualized in Figure 1, underscores the significance of a systematic and robust methodology in handling the acquired dataset. Each step contributes to the overall data quality and prepares the groundwork for subsequent stages of analysis, ensuring the research is poised to derive meaningful conclusions and insights from the student data.

Subsequently, following the completion of the preprocessing stage, the dataset undergoes a partitioning process, wherein it is divided into training and testing sets in an 80:20 ratio. This partitioning is integral to ensure the efficacy and generalizability of the ensuing predictive models. The preprocessed data is then subjected to two distinct machine learning algorithms, specifically Support Vector Machine (SVM) and XGBoost, representing diverse approaches to pattern recognition and classification.
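The 80:20 partition described here can be sketched in plain Python. This is an illustrative stand-in only: the study does not specify its tooling, and the fixed seed is an assumption.

```python
import random

def train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle the records, then hold out the first test_ratio fraction as the test set."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

rows = list(range(4306))  # one index per graduate record in the dataset
train, test = train_test_split(rows)
print(len(train), len(test))  # 3445 861
```

With 4306 records, an 80:20 split yields 3445 training and 861 test records per run.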

Upon the implementation of SVM and XGBoost on the respective training datasets, the models are rigorously evaluated to gauge their performance and predictive accuracy. The evaluation process employs metrics such as accuracy value, offering a quantitative measure of the models' overall correctness in predicting outcomes.

Additionally, a detailed performance analysis is conducted utilizing the confusion matrix, which provides a granular insight into the true positive, true negative, false positive, and false negative predictions made by the models.

This comprehensive evaluation framework allows for a nuanced understanding of the strengths and weaknesses of the employed machine learning algorithms. It enables researchers to not only quantify the overall accuracy of the models but also delve into the specifics of their predictive capabilities and identify potential areas for improvement. As a crucial step in the analytical pipeline, this model evaluation phase serves to validate the robustness and reliability of the predictive models generated through the application of SVM and XGBoost on the preprocessed dataset.

2.3 Preprocessing

In this stage, three processes are carried out, namely eliminating missing values or null data, selecting features, and converting data into categorical or transformed data.

2.3.1 Data Cleaning

Data cleaning is one of the preprocessing stages, used to remove or correct incomplete data. At this stage, data cleaning is performed to remove outliers from the dataset [3].

2.3.2 Feature Selection

Before applying SVM to a high-dimensional dataset, a feature selection algorithm is used to reduce the number of attributes. Feature selection is a popular technique for finding the most important and optimal subset of features to build a robust learning model; an efficient feature selection method can eliminate irrelevant features and redundant data [5]. This research uses statistical methods that measure the correlation or dependence between variables, filtering to select the 9 most relevant features for study [3].
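The Spearman-based filter can be illustrated with a minimal pure-Python sketch. The helper names are hypothetical and the 0.001 threshold is taken from the results section; the study's actual implementation is not given.

```python
def _ranks(values):
    # Assign 1-based average ranks, handling ties by averaging the tied positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    # Spearman rho = Pearson correlation of the ranks.
    rx, ry = _ranks(x), _ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den if den else 0.0

def select_features(features, label, threshold=0.001):
    # Keep features whose |rho| against the label exceeds the threshold.
    return [name for name, col in features.items()
            if abs(spearman(col, label)) > threshold]

print(round(spearman([1, 2, 3, 4], [1, 3, 2, 4]), 2))  # 0.8
```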

2.3.3 Data Transformation

Data Transformation is a change in data into a form that is suitable for mining. In this research, data changes are made by changing the form of data into categorical data using the ordinal encoding method [3].
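Ordinal encoding of the five competency levels can be as simple as a fixed mapping. The numeric codes below assume the rank order Very Low < Low < High Enough < High < Very High; the study's exact coding is not reported.

```python
# Assumed ordinal encoding for the five competency levels in the questionnaire.
SCALE = {"Very Low": 0, "Low": 1, "High Enough": 2, "High": 3, "Very High": 4}

def encode_column(values):
    """Map each categorical response to its assumed ordinal code."""
    return [SCALE[v] for v in values]

row = ["High", "High", "Very High", "High Enough"]
print(encode_column(row))  # [3, 3, 4, 2]
```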

2.4 Support Vector Machine (SVM)

Support vector machines are computer algorithms that learn through examples to label objects [6]. Support vector machine (SVM) is a supervised learning method that analyzes data and recognizes patterns. Support vector machines are used for regression analysis and classification with novelty detection. Given a set of training data in two classes, the support vector machine algorithm builds a model or classification function that places new observations into one of the two classes on either side of a hyperplane, making it a binary linear classifier [7].

SVMs are divided into two categories, namely linear and non-linear. Linear SVMs separate data linearly, meaning the separation of two classes on a hyperplane with soft margins. Non-linear SVMs, on the other hand, involve the use of kernel techniques to transform the data into a high-dimensional space [8]. The mathematical model of the conventional SVM can be interpreted as a convex quadratic programming problem. Furthermore, through the learning hard interval maximization process, a linear classification machine can be obtained [9]. SVMs that use kernel techniques can map the original data from its original dimension to a relatively higher dimension [10]. Here is the regression function used [11]:

Suppose x = {x1, … , xn} is the input vector and y = {y1, … , yn} is the target; then:

f(x) = wᵀφ(x) + b (1)

where:

w : weight vector

φ(x) : a function that maps x to an n-dimensional space

b : bias


In non-linear separable problems, the input space is transformed into a high-dimensional feature space, and in this new space, the separating hyperplane is found. The optimal hyperplane must be able to distinguish categories accurately, so it looks for the hyperplane with the maximum distance between the classes, i.e. the hyperplane that is most effective in separating the classes [12].

In SVM, there is a kernel term that is used for non-linear problems where the data will be projected to a higher dimension so that it is easier to perform linear separation. The following are the kinds of kernels that can be used by SVM [11]:

1. Radial Basis Function Kernel

k(x, y) = exp(−g‖x − y‖²) (2)

2. Polynomial Kernel

k(x, y) = (x · y + c)ᵈ (3)

3. Linear Kernel

k(x, y) = xᵀy + c (4)

where:

x : the input data

y : the target data

g : a parameter inversely proportional to the standard deviation

c : a constant

d : the degree of the polynomial
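The three kernels translate directly into code. This is a plain-Python sketch with illustrative default parameter values; note that the exponent in the RBF kernel carries a negative sign in its standard form.

```python
import math

def rbf_kernel(x, y, g=0.5):
    # k(x, y) = exp(-g * ||x - y||^2); g is inversely related to the spread.
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-g * sq)

def polynomial_kernel(x, y, c=1.0, d=2):
    # k(x, y) = (x . y + c)^d
    return (sum(a * b for a, b in zip(x, y)) + c) ** d

def linear_kernel(x, y, c=0.0):
    # k(x, y) = x . y + c
    return sum(a * b for a, b in zip(x, y)) + c

print(linear_kernel([1, 2], [3, 4]))  # 11.0
```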

2.5 Extreme Gradient Boosting (XGBoost)

eXtreme Gradient Boosting (XGBoost) is an efficient and structured implementation of the gradient boosting framework [13]. XGBoost is a highly adaptive and versatile tool, designed to efficiently utilize available resources and overcome obstacles that arose in earlier gradient boosting implementations [14]. The XGBoost method is a development of gradient tree boosting based on an ensemble algorithm, which can effectively handle large-scale machine learning cases. It was chosen because it has several additional features that speed up computation and prevent overfitting [15]. XGBoost makes efficient use of hardware resources such as the processor, memory, and disk space, so it does not burden main memory alone. The XGBoost tree model can be integrated with the summation method; the objective equation is [16]:

Obj = Σᵢ l(x̂ᵢ, yᵢ) + Σₖ Ω(fₖ) (5)

In this objective equation:

l is a loss function that measures the difference between the predicted value x̂ᵢ and the target value yᵢ.

Ω is a regularization function that prevents overfitting of the data.

The Ω function can be described as follows [16]:

Ω(f) = γT + (1/2)λ‖w‖² (6)

T is the number of leaves of each tree, and w is the vector of leaf weights for each tree [16].
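As a numerical check, the regularization term Ω(f) = γT + (1/2)λ‖w‖² can be evaluated for a toy tree; the γ and λ values here are illustrative, not the study's settings.

```python
def tree_regularization(leaf_weights, gamma=1.0, lam=1.0):
    """Omega(f) = gamma * T + (1/2) * lambda * ||w||^2 for a single tree."""
    T = len(leaf_weights)  # number of leaves
    return gamma * T + 0.5 * lam * sum(w * w for w in leaf_weights)

# 3 leaves with weights 0.5, -0.5, 1.0: Omega = 1*3 + 0.5*(0.25 + 0.25 + 1.0) = 3.75
print(tree_regularization([0.5, -0.5, 1.0]))  # 3.75
```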

To assess the performance of the optimized XGBoost model, this study applied a customized evaluation matrix to the test data. This approach provides insight into the extent to which the model can cope with challenges that may arise in harsher testing scenarios [17].

2.6 Performance Analysis

Performance analysis assesses the accuracy of the XGBoost and SVM algorithms on the data. The model results are then evaluated using a confusion matrix to compare the two methods' accuracy at the working-company level.

Confusion matrix is a table that details and organizes information related to the classification of test data, identifying the number of correctly and incorrectly classified data [18].

Table 2. Confusion Matrix Classification

| Prediction Class | Actual Positive | Actual Negative |
|------------------|-----------------|-----------------|
| Positive | TP | FP |
| Negative | FN | TN |


Explanation of Table 2 [19]:

TP (true positive) is the number of positive samples that are correctly classified.

TN (true negative) is the number of negative samples that are correctly classified.

FP (false positive) is the number of negative samples misclassified as positive.

FN (false negative) is the number of positive samples misclassified as negative.

Model performance can be measured by several scores, namely Accuracy, Precision, Recall and F1-score with Confusion Matrix [3].

Accuracy = (TP + TN) / (TP + TN + FP + FN) (7)

Precision = TP / (TP + FP) (8)

Recall = TP / (TP + FN) (9)

F1-Score = (2 × Precision × Recall) / (Precision + Recall) (10)
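Equations (7)–(10) translate directly into code; the counts in the example are illustrative, not from the study.

```python
def scores(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = scores(tp=50, tn=30, fp=10, fn=10)
print(round(acc, 3))  # 0.8
```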

Accuracy is a metric that indicates how many correct predictions are generated by an algorithm. However, when the data distribution is unbalanced, achieving high accuracy with this method does not necessarily reflect its ability to distinguish effectively and efficiently between different categories [20].

3. RESULTS AND DISCUSSION

3.1 Dataset Information

This research summarizes the dataset using eight distinct attributes, presenting their Spearman correlation, mean, and standard deviation in the following table.

Table 3. Dataset Score Statistics

| Attribute | Spearman Correlation | Mean | Std Dev |
|-----------|----------------------|------|---------|
| F1761 | 0.030 | 2.934 | 1.444 |
| F1763 | 0.017 | 2.641 | 1.692 |
| F1765 | 0.035 | 2.865 | 1.535 |
| F1767 | 0.022 | 2.961 | 1.443 |
| F1769 | 0.026 | 2.825 | 1.536 |
| F1771 | 0.040 | 3.021 | 1.407 |
| F1773 | 0.006 | 2.112 | 1.119 |
| Pendapatan_diperoleh | 0.008 | 1.575 | 0.922 |

Based on Table 3, there are 9 attributes or features used in this study, selected as the most relevant features by having a Spearman correlation > 0.001 and no missing values. Testing can therefore continue to scenario I, comparing the accuracy values of the 3-class and 2-class settings.

Figure 2. Three-class and two-class working company level data distribution

Based on the information in Figure 2, the working company-level classes can be divided into three categories: Local, National, and Multinational. The Local class contains 1210 records, the National class a larger 2510 records, and the Multinational class 586 records. In addition, there is a separation into two classes, Local and National: the Local class again has 1210 records, while the National class grows to 3096 records. This information provides a clear picture of the distribution and proportion of data within each company-level class, providing a solid basis for further analysis.

3.2 Scenario I Testing Results

In this scenario, one test is carried out to find the best class configuration: 3 working-company-level classes (Local, National, and Multinational) versus 2 classes (Local and National), using the Support Vector Machine (SVM) method.

Table 4. Evaluation Scores for 3 Classes and 2 Classes

| Metric | 3 Class: 0 | 3 Class: 1 | 3 Class: 2 | 2 Class: 0 | 2 Class: 1 |
|--------|------------|------------|------------|------------|------------|
| Precision | 0.51 | 0.32 | 0.44 | 0.43 | 0.72 |
| Recall | 0.31 | 0.61 | 0.42 | 0.01 | 0.99 |
| F1-Score | 0.39 | 0.42 | 0.35 | 0.02 | 0.83 |

Overall accuracy: 0.39 for 3 classes and 0.71 for 2 classes.

Based on Table 4, with the 20:80 split of test and training data, the best accuracy with the SVM method is obtained with 2 classes, at 0.71, with class 0 Precision = 0.43, Recall = 0.01, F1-Score = 0.02 and class 1 Precision = 0.72, Recall = 0.99, F1-Score = 0.83. Scenario II therefore tests the 2-class setting with the SVM method and scenario III with the XGBoost method, seeking the accuracy of working-company-level classification.

3.3 Scenario II Testing Results

This scenario II test uses the best working-company-level configuration, namely the 2 classes Local and National, with the SVM method and an 80:20 split of training and test data, re-running the train/test split 10 times to obtain the average accuracy of the SVM method.
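The repeated-split averaging procedure can be sketched in plain Python. Because the SVM implementation and hyper-parameters are not specified here, a majority-class baseline stands in for the trained classifier; the 2-class counts come from Figure 2.

```python
import random

def majority_class(train_labels):
    # Stand-in for a trained classifier: always predict the majority label.
    return max(set(train_labels), key=train_labels.count)

def average_accuracy(labels, runs=10, test_ratio=0.2):
    """Re-run the 80:20 split `runs` times and average the test accuracy."""
    accs = []
    for seed in range(runs):
        rng = random.Random(seed)
        shuffled = labels[:]
        rng.shuffle(shuffled)
        n_test = int(len(shuffled) * test_ratio)
        test, train = shuffled[:n_test], shuffled[n_test:]
        pred = majority_class(train)
        accs.append(sum(1 for y in test if y == pred) / len(test))
    return sum(accs) / len(accs)

# Two-class distribution from Figure 2: 3096 National, 1210 Local.
labels = ["National"] * 3096 + ["Local"] * 1210
print(round(average_accuracy(labels), 3))
```

Because about 72% of records are National, this baseline already lands near the SVM's reported average, which helps interpret the very low class-0 recall in Tables 5 and 6.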

Table 5. Evaluation Score, 2 Classes with SVM

| Run | Precision (0) | Recall (0) | F1 (0) | Precision (1) | Recall (1) | F1 (1) | Accuracy |
|-----|---------------|------------|--------|---------------|------------|--------|----------|
| 1 | 0.43 | 0.01 | 0.02 | 0.72 | 0.99 | 0.83 | 0.714 |
| 2 | 0.38 | 0.01 | 0.02 | 0.73 | 0.99 | 0.84 | 0.727 |
| 3 | 0.22 | 0.01 | 0.02 | 0.73 | 0.99 | 0.84 | 0.721 |
| 4 | 0.38 | 0.01 | 0.02 | 0.73 | 0.99 | 0.84 | 0.727 |
| 5 | 0.50 | 0.01 | 0.02 | 0.72 | 1.00 | 0.84 | 0.718 |
| 6 | 0.50 | 0.01 | 0.02 | 0.72 | 1.00 | 0.84 | 0.719 |
| 7 | 0.40 | 0.01 | 0.02 | 0.72 | 1.00 | 0.83 | 0.713 |
| 8 | 0.40 | 0.01 | 0.02 | 0.71 | 1.00 | 0.83 | 0.708 |
| 9 | 0.22 | 0.01 | 0.02 | 0.74 | 0.99 | 0.84 | 0.729 |
| 10 | 0.50 | 0.01 | 0.02 | 0.71 | 1.00 | 0.83 | 0.711 |
| Average | | | | | | | 0.719 |

This scenario II test obtained an average accuracy of 0.719. Scenario III is then carried out using the XGBoost method to find its average accuracy and determine the best model.

3.4 Scenario III Testing Results

The scenario III test uses the XGBoost method with an 80:20 split of training and test data, re-running the train/test split 10 times to obtain the average accuracy of the XGBoost method.

Table 6. Evaluation Score, 2 Classes with XGBoost

| Run | Precision (0) | Recall (0) | F1 (0) | Precision (1) | Recall (1) | F1 (1) | Accuracy |
|-----|---------------|------------|--------|---------------|------------|--------|----------|
| 1 | 0.43 | 0.01 | 0.03 | 0.74 | 0.99 | 0.85 | 0.741 |
| 2 | 0.50 | 0.01 | 0.02 | 0.75 | 1.00 | 0.86 | 0.748 |
| 3 | 0.50 | 0.02 | 0.04 | 0.75 | 0.99 | 0.86 | 0.748 |
| 4 | 0.25 | 0.01 | 0.02 | 0.76 | 0.99 | 0.86 | 0.759 |
| 5 | 0.41 | 0.03 | 0.06 | 0.74 | 0.98 | 0.85 | 0.734 |
| 6 | 0.60 | 0.01 | 0.03 | 0.74 | 1.00 | 0.85 | 0.737 |
| 7 | 0.50 | 0.02 | 0.04 | 0.74 | 0.99 | 0.85 | 0.733 |
| 8 | 0.33 | 0.01 | 0.02 | 0.74 | 0.99 | 0.85 | 0.738 |
| 9 | 0.29 | 0.01 | 0.02 | 0.74 | 0.99 | 0.85 | 0.738 |
| 10 | 0.44 | 0.02 | 0.03 | 0.74 | 0.99 | 0.85 | 0.740 |
| Average | | | | | | | 0.742 |

This scenario III test obtained an average accuracy of 0.742.

3.5 Best Method Model

Judging by the evaluation scores of the two methods, Support Vector Machine (SVM) and XGBoost, on the 2 classes Local and National, the average accuracy of SVM is 0.719 and that of XGBoost is 0.742. The best model is therefore the 2-class Local/National configuration with the XGBoost method, with an average accuracy of 0.742. The parameter values used in the XGBoost method are as follows.

Table 7. Parameter values used

| Parameter | Value |
|-----------|-------|
| "num_class" | 2 |
| "max_depth" | 3 |
| "eta" | 0.1 |
| "subsample" | 0.7 |
| "colsample_bytree" | 0.7 |
| "seed" | 42 |
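Collected as a configuration dictionary, the Table 7 values would be passed to XGBoost's training API; the objective function and number of boosting rounds are not reported, so the commented call is only indicative.

```python
# Hyper-parameters reported in Table 7 for the final XGBoost model.
xgb_params = {
    "num_class": 2,
    "max_depth": 3,
    "eta": 0.1,               # learning rate
    "subsample": 0.7,         # row sampling per tree
    "colsample_bytree": 0.7,  # column sampling per tree
    "seed": 42,
}

# These would be passed to xgboost's native training API, e.g.
#   booster = xgboost.train(xgb_params, dtrain, num_boost_round=...)
# (objective and boosting rounds are not reported in the paper).
```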

4. CONCLUSIONS

Based on the comprehensive exploration and discussion of the findings, this research applies the SVM and XGBoost methods to analyze and classify the company levels within Telkom University's 2022 tracer study data. Scenario I establishes that the best performance is achieved with 2 classes: Local and National. Scenario II then applies the SVM method to these 2 classes over 10 iterations, yielding an average accuracy of 0.719, together with a detailed evaluation score presentation. Scenario III applies the XGBoost method over 10 runs, yielding an average accuracy of 0.742. These outcomes identify the XGBoost method as the superior model for the 2 working-company-level classes, Local and National. The examination and comparison of these scenarios not only contribute valuable insights into the performance of the SVM and XGBoost methods but also establish a foundation for understanding the dynamics of company-level classification. Consequently, this study finds XGBoost to be the optimal model for discerning the Local and National classes within the Telkom University 2022 tracer study data, underscoring its efficacy in this specific context.

REFERENCES

[1] D. R. Nurcholis and R. F. Umbara, “Analisis Hasil Tracer Study Terhadap Alumni Universitas Telkom dengan menggunakan Minimum Spanning Tree ( MST ),” eProceedings, vol. 5, no. 3, pp. 8093–8104, 2018.

[2] M. R. Takkas, R. F. Umbara, S. Si, M. Si, D. Indwiarti, and M. Si, “Analisis Hasil Tracer Study Terhadap Alumni Universitas Telkom dengan Forest of All Minimum Spanning Trees (MSTs),” vol. 6, no. 1, pp. 2380–2389, 2019.

[3] Z. A. Adriani and I. Palupi, “Prediction of University Student Performance Based on Tracer Study Dataset Using Artificial Neural Network,” J. Komtika (Komputasi dan Inform., vol. 5, no. 2, pp. 72–82, 2021.

[4] M. Imron Rosadi and K. Kunci, “Rancang Bangun Aplikasi Tracer Study Alumni Smk Negeri 1 Sukorejo Berbasis Android,” J. Krisnadana, vol. 2, no. 1, pp. 277–288, 2022.

[5] I. Syarif, A. Prugel-Bennett, and G. Wills, “SVM Parameter Optimization using Grid Search and Genetic Algorithm to Improve Classification Performance,” TELKOMNIKA (Telecommunication Comput. Electron. Control., vol. 14, no. 4, p. 1502, 2016.

[6] S. Huang, C. A. I. Nianguang, P. Penzuti Pacheco, S. Narandes, Y. Wang, and X. U. Wayne, “Applications of support vector machine (SVM) learning in cancer genomics,” Cancer Genomics and Proteomics, vol. 15, no. 1, pp. 41–51, 2018.

[7] D. A. Otchere, T. O. Arbi Ganat, R. Gholami, and S. Ridha, “Application of supervised machine learning paradigms in the prediction of petroleum reservoir properties: Comparative analysis of ANN and SVM models,” J. Pet. Sci. Eng., vol. 200, p. 108182, 2021.

[8] A. R. Isnain, A. I. Sakti, D. Alita, and N. S. Marga, “Sentimen Analisis Publik Terhadap Kebijakan Lockdown Pemerintah Jakarta Menggunakan Algoritma Svm,” J. Data Min. dan Sist. Inf., vol. 2, no. 1, p. 31, 2021.

[9] Z. Wan, Y. Dong, Z. Yu, H. Lv, and Z. Lv, “Semi-Supervised Support Vector Machine for Digital Twins Based Brain Image Fusion,” Front. Neurosci., vol. 15, no. July, pp. 1–12, 2021.

[10] M. M. Muzakki and F. Nhita, “The spreading prediction of Dengue Hemorrhagic Fever (DHF) in Bandung regency using K-means clustering and support vector machine algorithm,” 2018 6th Int. Conf. Inf. Commun. Technol. ICoICT 2018, vol. 0, no. c, pp. 453–458, 2018.

[11] D. Helyudanto, F. Nhita, and A. Atiqi Rohamwati, “Prediksi Penyebaran Demam Berdarah di Kabupaten Bandung dengan Metode Hybrid Autoregressive Integrated Moving Average ( ARIMA ) – Support Vector Machine ( SVM ) Program Studi Sarjana Informatika Fakultas Informatika Universitas Telkom Bandung Prediksi Pen,” 2019.

[12] D. C. Toledo-Pérez, J. Rodríguez-Reséndiz, R. A. Gómez-Loenzo, and J. C. Jauregui-Correa, “Support Vector Machine- based EMG signal classification techniques: A review,” Appl. Sci., vol. 9, no. 20, 2019.

[13] S. Pan, Z. Zheng, Z. Guo, and H. Luo, “An optimized XGBoost method for predicting reservoir porosity using petrophysical logs,” J. Pet. Sci. Eng., vol. 208, p. 109520, 2022.

[14] E. Al Daoud, “Comparison between XGBoost, LightGBM and CatBoost Using a Home Credit Dataset,” Int. J. Comput. Inf. Eng., vol. 13, no. 1, pp. 6–10, 2019.

[15] S. E. Herni Yulianti, Oni Soesanto, and Yuana Sukmawaty, “Penerapan Metode Extreme Gradient Boosting (XGBOOST) pada Klasifikasi Nasabah Kartu Kredit,” J. Math. Theory Appl., vol. 4, no. 1, pp. 21–26, 2022.

[16] N. M. Rifki, F. Nhita, and A. R. Aniq, “Prediksi Penyebaran Penyakit Demam Berdarah Dengue ( DBD ) di Kabupaten Bandung Menggunakan Algoritma XGBoost Program Studi Sarjana Informatika Fakultas Informatika Universitas Telkom Bandung,” 2022.

[17] K. Budholiya, S. K. Shrivastava, and V. Sharma, “An optimized XGBoost based diagnostic system for effective prediction of heart disease,” J. King Saud Univ. - Comput. Inf. Sci., vol. 34, no. 7, pp. 4514–4523, 2022.

[18] D. Normawati and S. A. Prayogi, “Implementasi Naïve Bayes Classifier Dan Confusion Matrix Pada Analisis Sentimen Berbasis Teks Pada Twitter,” J. Sains Komput. Inform. (J-SAKTI, vol. 5, no. 2, pp. 697–711, 2021.

[19] D. Sharifrazi et al., “Fusion of convolution neural network, support vector machine and Sobel filter for accurate detection of COVID-19 patients using X-ray images,” Biomed. Signal Process. Control, vol. 68, no. March, p. 102622, 2021.

[20] S. Sharma and K. Guleria, “A Deep Learning based model for the Detection of Pneumonia from Chest X-Ray Images using VGG-16 and Neural Networks,” Procedia Comput. Sci., vol. 218, pp. 357–366, 2022.
