
Classification of Brain Tumor Using K-Nearest Neighbor-Genetic Algorithm and Support Vector Machine-Genetic Algorithm Methods

Velery Virgina Putri Wibowo
Department of Mathematics, University of Indonesia, Depok, Indonesia
veleryvirgina@sci.ui.ac.id

Zuherman Rustam
Department of Mathematics, University of Indonesia, Depok, Indonesia
rustam@ui.ac.id

Jacub Pandelaki
Department of Radiology, Cipto Mangunkusumo Hospital, Jakarta, Indonesia
jacubp@gmail.com

Abstract—The emergence of disease is inevitable in Indonesia and throughout the world. Glioblastoma, one of the most common types of brain tumor, is a dangerous disease that leads to death. Patients with this disease have a fairly low survival rate and are generally diagnosed when the tumor has developed further. Therefore, it is essential to make an early diagnosis with accurate results to determine the status of a person who has glioblastoma. In this study, two machine learning methods, K-Nearest Neighbor and Support Vector Machine with Genetic Algorithm as a feature selection method (KNN-GA and SVM-GA), were compared for classifying glioblastoma. The Genetic Algorithm (GA) was implemented to determine the relevant features, which were then classified by the KNN and SVM methods. The numerical data used were obtained from Magnetic Resonance Imaging (MRI) results from Dr. Cipto Mangunkusumo Hospital. The results showed that the SVM-GA method using a Radial Basis Function kernel and 5 features with 90% training data was the best for classifying glioblastoma. The obtained values for accuracy, recall, precision, and f1-score were 92.35%, 93.19%, 92.62%, and 92.83%, respectively.

Keywords—K-Nearest Neighbor, Support Vector Machine, Genetic Algorithm, Classification, Brain Tumor

I. INTRODUCTION

The emergence of disease is inevitable in Indonesia and throughout the world. There are various types of diseases, such as tumors, which are related to cell growth and affect humans.

A tumor is an abnormal mass of tissue in the body that is solid or filled with fluid. Generally, benign and malignant, i.e., noncancerous and cancerous, are the two types of tumors. Benign tumors grow slowly and are confined to their location of origin. Meanwhile, malignant tumors are uncontrolled growths of abnormal or cancer cells [1]. They can appear anywhere in the body and can be life-threatening.

Furthermore, the one that appears in the brain is called a brain tumor.

The brain is an essential organ and functions as a human nerve center that regulates all activities in the body. It is protected by a very hard skull, therefore, any growth in the confined space can lead to various damages. Furthermore, the pressure inside the skull increases while a tumor grows and develops in the brain. This leads to brain damage and can be lethal [2].

A brain tumor is an abnormal growth and development of cells in the brain. Everyone at any age can experience the appearance of this tumor in any area with various shapes and sizes [3]. The malignant brain tumor is the most common and deadly type in children and adolescents. In 2018, there were 296,851 new cases worldwide, with 156,217 in Asia. In Indonesia, new cases of malignant brain tumors account for 1.5%, or about 5,323 cases, of all cancer cases [4]. Glioblastoma is one of the various types of brain tumors that can be life-threatening. It is the most aggressive type, leading to death within two years of diagnosis in more than two-thirds of adults [5].

In many cases, some symptoms are experienced long before the patient is diagnosed. However, patients do not recognize these symptoms of brain tumors because they are not specific [6]. Furthermore, early diagnosis in patients with glioblastoma needs to be carried out because it increases the survival rate, allowing a doctor to curb the tumor's spread and further development. Tests such as computed tomography (CT) scans, magnetic resonance imaging (MRI), or biopsy can be carried out to ascertain the status of a person who has glioblastoma. However, these tests are not enough to detect tumors in the brain early. In addition, due to the many features contained in the examination results, producing a diagnosis manually is time-consuming. Over time, technology has increasingly been used in the health sector to diagnose disease accurately.

Therefore, machine learning was implemented in this study to assist doctors in diagnosing brain tumors with an accurate method. This was carried out by studying the available data patterns to build a mathematical model and make a decision [7]. The purpose of this study is to assist medical staff in making decisions about treatment plans by providing useful information based on data taken from patients. Furthermore, machine learning offers various methods that can be used to diagnose a disease. This study implemented the K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) methods to classify brain tumor disease. These methods have been widely implemented in previous studies and have shown high performance in classification.

The KNN method is one of the simplest and most commonly implemented algorithms in the medical field for the classification of thalassemia [8], liver disease [9], and cervical cancer [10]. Meanwhile, the SVM method is one of the algorithms in machine learning that is commonly implemented for classification and regression. Furthermore, it has been implemented by previous studies for the classification of several diseases, including cerebral infarction [11], schizophrenia [12], and breast cancer [13].

In classification, there are various features available in the dataset that represent information about it. This study used numerical data obtained from MRI results, which contain various features that explain the status of tumors in the brain. However, there may be some less important features that affect the accuracy of the classification results. Therefore, a Genetic Algorithm (GA) was implemented as a feature selection method for KNN and SVM to produce an accurate classification. Furthermore, GA is a search and optimization method inspired by the natural selection process that is commonly implemented for feature selection. In previous studies, it was implemented for the classification of cardiovascular disease [14] and cancer data [15].

In this study, the KNN and SVM methods were modified by implementing GA as a feature selection method, yielding KNN-GA and SVM-GA. The classification performances of the two methods were determined and compared. Afterward, the method most optimal for classifying and analyzing brain tumor data was identified.

II. RESEARCH METHODS

A. Dataset

This study used data obtained from the Department of Radiology at RSUPN Dr. Cipto Mangunkusumo (RSCM) from 2013 to 2019. These were numerical results from MRI images of patients with glioblastoma. The data consisted of classes 1 and 0, which represented patients with and without glioblastoma in their brains, respectively. Furthermore, 334 patients were divided equally into the two classes. There were 7 features in these data, namely the size, minimum value, maximum value, average value, standard error value, number of acute points, and length of the tumor area. The glioblastoma dataset is listed in Table I below.

TABLE I. GLIOBLASTOMA DATASET

Area (cm²) | Min  | Max  | Avg    | SD     | Sum  | Length (cm) | Class
11.99      | 367  | 394  | 380.5  | 19.09  | 761  | 12.39       | 1
12.9       | 529  | 543  | 536    | 9.9    | 1072 | 12.73       | 1
12.45      | 502  | 753  | 606.33 | 130.75 | 1819 | 12.95       | 1
12.7       | 1001 | 1047 | 1024   | 32.53  | 2048 | 12.83       | 0
12.9       | 715  | 721  | 718    | 4.24   | 1436 | 12.73       | 0

B. Machine Learning

Machine learning is a branch of artificial intelligence that trains a system to complete tasks automatically through a certain process. The principle of its algorithms is to build mathematical models using available data to make a decision or prediction correctly. Generally, the machine learning techniques used include supervised learning and unsupervised learning. In supervised learning, each training data point has a label. Meanwhile, the data in unsupervised learning do not have a label or target [7]. This study used the supervised learning technique.

C. Feature Selection

Feature selection is the removal of features that are not useful or relevant in the data, in order to improve the performance of machine learning algorithms in building models. Various feature selection methods commonly used in machine learning include filter, wrapper, and embedded methods. The filter method works by rating the features before classification is carried out by classifiers, to remove less relevant features. Furthermore, the wrapper method uses classifiers to evaluate and discover features that maximize the performance of the algorithm. The embedded method incorporates feature selection into the classifier's training process [16].

D. Support Vector Machine

Support Vector Machine (SVM) is one of the algorithms in machine learning that solves classification problems and was first proposed by Vapnik in 1992. It works by maximizing the margin value, which is the nearest distance between the optimal hyperplane and an input data point [17]. Furthermore, the data points in each class that have the nearest distance to the hyperplane are called support vectors [18]. Figure 1 shows an illustration of SVM.

Fig. 1. An illustration of SVM finding the optimal separating hyperplane between two classes [19]

Given a set of n input data \{(\boldsymbol{x}_1, y_1), (\boldsymbol{x}_2, y_2), \ldots, (\boldsymbol{x}_n, y_n)\} with \boldsymbol{x}_j \in \mathbb{R}^d for j = 1, 2, \ldots, n, each data point has a class label denoted as y_j \in \{-1, +1\}. SVM aims to form an optimal separating hyperplane that provides a maximum limit between classes, or margin, which is defined as follows [20]:

\boldsymbol{w}^{T}\boldsymbol{x} + b = 0 \quad (1)

where \boldsymbol{w} \in \mathbb{R}^d is a vector that contains the weight values and is orthogonal to the hyperplane, and b \in \mathbb{R} is the bias, which determines the distance from the origin to the hyperplane.

The SVM optimization problem is defined as:

\min_{\boldsymbol{w}, b} \; \tfrac{1}{2}\|\boldsymbol{w}\|^{2} \quad (2)

\text{s.t.} \;\; y_j(\boldsymbol{w}^{T}\boldsymbol{x}_j + b) \geq 1, \quad j = 1, 2, \ldots, n \quad (3)

The optimization problem in equation (2) is called the primal problem. Furthermore, in forming an optimal hyperplane, there are cases in which the data are not linearly separable. These cases produce misclassification errors, which can be handled by adding a slack variable \xi_j and a penalty parameter C > 0. Therefore, equations (2) and (3) can be written as follows:

\min_{\boldsymbol{w}, b, \xi} \; \tfrac{1}{2}\|\boldsymbol{w}\|^{2} + C \sum_{j=1}^{n} \xi_j \quad (4)

\text{s.t.} \;\; y_j(\boldsymbol{w}^{T}\boldsymbol{x}_j + b) \geq 1 - \xi_j, \quad j = 1, 2, \ldots, n \quad (5)

\xi_j \geq 0, \quad j = 1, 2, \ldots, n \quad (6)

This problem can be solved by converting it to its dual form, from which the values of \boldsymbol{w} and b are obtained. Furthermore, this can be solved using the Lagrange multipliers as follows:

L(\boldsymbol{w}, b, \alpha) = \tfrac{1}{2}\|\boldsymbol{w}\|^{2} + C \sum_{j=1}^{n} \xi_j - \sum_{j=1}^{n} \alpha_j \left[ y_j(\boldsymbol{w}^{T}\boldsymbol{x}_j + b) - 1 + \xi_j \right] - \sum_{j=1}^{n} \beta_j \xi_j \quad (7)

To solve nonlinear problems in which the data are not linearly separable, SVM uses a kernel function with a mapping \phi(x) that takes data from the input space to a higher-dimensional space, where the data can be linearly separated. The kernel function is defined as [21]:

K(\boldsymbol{x}_i, \boldsymbol{x}_j) = \phi(\boldsymbol{x}_i)^{T}\phi(\boldsymbol{x}_j) \quad (8)

The values of \boldsymbol{w} and b obtained using the Lagrange multipliers are substituted into the optimal hyperplane function f(\boldsymbol{x}) = \boldsymbol{w}^{T}\boldsymbol{x} + b. The decision function of SVM using a kernel function is defined as:

f(\boldsymbol{x}) = \sum_{j=1}^{n} \alpha_j y_j K(\boldsymbol{x}_j, \boldsymbol{x}) + b \quad (9)

This study used the Radial Basis Function (RBF) kernel with the formula below:

K(\boldsymbol{x}_i, \boldsymbol{x}_j) = \exp(-\gamma \|\boldsymbol{x}_i - \boldsymbol{x}_j\|^{2}), \quad \gamma > 0 \quad (10)
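As an illustration of equations (9) and (10), the minimal sketch below trains an SVM with an RBF kernel and checks the decision function by computing it directly from the fitted support vectors. This is not the paper's code; C = 10 and γ = 0.01 are the values reported later in Section III, and the arrays X and y are random placeholders standing in for the glioblastoma data.

```python
# A minimal sketch of the SVM decision function (9) with the RBF kernel (10),
# assuming scikit-learn is available. X and y are placeholders, not the
# hospital data described in Section II-A.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 7))      # placeholder: 7 MRI-derived features
y = rng.integers(0, 2, size=100)   # placeholder class labels

clf = SVC(kernel="rbf", C=10, gamma=0.01).fit(X, y)  # parameters from Section III

def rbf_decision(clf, x):
    # f(x) = sum_j alpha_j * y_j * K(x_j, x) + b over the support vectors,
    # with K(x_j, x) = exp(-gamma * ||x_j - x||^2) as in equation (10).
    # clf.dual_coef_ stores alpha_j * y_j; clf.intercept_ stores b.
    k = np.exp(-clf.gamma * np.sum((clf.support_vectors_ - x) ** 2, axis=1))
    return np.dot(clf.dual_coef_[0], k) + clf.intercept_[0]

x_new = rng.normal(size=7)
# Both values should agree, confirming the manual formula matches the solver.
print(rbf_decision(clf, x_new), clf.decision_function([x_new])[0])
```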

E. K-Nearest Neighbor

K-Nearest Neighbor (KNN) is one of the machine learning methods that is often implemented for classification. It was first proposed by Fix and Hodges and later developed by Cover and Hart [22][23]. The principle of the KNN algorithm is to classify unlabeled data based on its nearest neighbors [23].

Furthermore, KNN predicts the label of a test sample as the majority label among its nearest neighbors, i.e., the training samples with the smallest distances. Figure 2 is an illustration of KNN.

Fig. 2. Illustration of KNN with data in two-dimensional space using k = 5 [24]

In Figure 2 above, there is a test sample x_u and three class labels represented by three colors, namely ω1, ω2, and ω3. The number of neighbors, or training samples with the closest distance, is k = 5, with 4 neighbors labeled ω1 and 1 neighbor labeled ω2. The majority label of the 5 neighbors is ω1; therefore, KNN classifies sample x_u with the ω1 label.

The KNN method classifies objects based on the distance between samples. In this method, the Euclidean distance is used due to its convenience, efficiency, and productivity [25]. It is defined as follows [24]:

d(\boldsymbol{x}, \boldsymbol{y}) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^{2}} \quad (11)
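To make the classification rule concrete, the sketch below implements KNN with the Euclidean distance of equation (11) directly in NumPy; k = 7 matches the value used in Section III, while the data arrays are random placeholders rather than the study's data.

```python
# A minimal KNN sketch using the Euclidean distance of equation (11).
# X_train, y_train, and x_test are placeholders, not the actual hospital data.
import numpy as np

def knn_predict(X_train, y_train, x_test, k=7):
    # Euclidean distance from the test sample to every training sample.
    distances = np.sqrt(np.sum((X_train - x_test) ** 2, axis=1))
    # Indices of the k nearest neighbors.
    nearest = np.argsort(distances)[:k]
    # Majority vote among the neighbors' labels.
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

rng = np.random.default_rng(1)
X_train = rng.normal(size=(300, 7))
y_train = rng.integers(0, 2, size=300)
x_test = rng.normal(size=7)
print("predicted class:", knn_predict(X_train, y_train, x_test, k=7))
```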

F. Genetic Algorithm

Genetic Algorithm (GA) is an algorithm for solving optimization problems, first proposed by John Holland in 1975 [26]. The basic principle of GA is to work with a population that consists of various individuals, each representing a possible solution to the problem.

Individuals with the highest fitness values undergo crossover, which is the process of merging two individuals to obtain a new one. The two individuals that undergo the crossover process are called parents, while the new ones are called offspring. Furthermore, the offspring chromosomes undergo mutation, which is the process of changing the values of genes on a chromosome to add genetic diversity to the population. After mutation, the next process is elitism, which merges the chromosomes of the initial population with the offspring chromosomes and sorts them from the highest to the lowest fitness value [27].

Figure 3 is a flowchart of GA.

Fig. 3. Flowchart of Genetic Algorithm process

In this study, the selection method used is the roulette wheel method, which works by placing chromosomes on a roulette wheel. The selection probability of a chromosome is calculated by dividing its fitness value by the total fitness value of all chromosomes, which is defined as follows [28]:

P(I_i) = \frac{f(I_i)}{\sum_{j=1}^{N} f(I_j)} \quad (12)
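A compact sketch of how GA-based feature selection with roulette wheel selection might look is given below. Each chromosome is a binary mask over the 7 features, fitness is the cross-validated accuracy of a wrapped classifier (here KNN, as one of the two methods in this study), and equation (12) gives the selection probabilities. The population size, crossover and mutation rates, generation count, and 3-fold cross-validation are illustrative assumptions, not values reported in the paper.

```python
# A hedged sketch of GA feature selection (wrapper approach) with roulette
# wheel selection per equation (12). Hyperparameters are illustrative.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(334, 7))      # placeholder for the 7 MRI features
y = rng.integers(0, 2, size=334)   # placeholder labels

def fitness(mask):
    # Fitness = cross-validated accuracy of KNN on the selected features.
    if mask.sum() == 0:
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=7)
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

pop = rng.integers(0, 2, size=(20, 7))  # 20 binary chromosomes over 7 features
for generation in range(10):
    fit = np.array([fitness(ind) for ind in pop])
    prob = fit / fit.sum()  # selection probabilities, equation (12)
    # Roulette wheel selection of parents.
    parents = pop[rng.choice(len(pop), size=len(pop), p=prob)]
    # One-point crossover between consecutive parent pairs.
    children = parents.copy()
    for i in range(0, len(pop) - 1, 2):
        point = rng.integers(1, 7)
        children[i, point:], children[i + 1, point:] = (
            parents[i + 1, point:].copy(), parents[i, point:].copy())
    # Bit-flip mutation with a small per-gene probability.
    flip = rng.random(children.shape) < 0.05
    children = np.where(flip, 1 - children, children)
    # Elitism: keep the fittest individuals from parents and offspring combined.
    combined = np.vstack([pop, children])
    combined_fit = np.array([fitness(ind) for ind in combined])
    pop = combined[np.argsort(combined_fit)[::-1][:len(pop)]]

best = pop[0]
print("selected feature mask:", best, "fitness:", fitness(best))
```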

G. Confusion Matrix

The confusion matrix is a performance measurement in the form of a 2×2 matrix used to calculate the values of accuracy, recall, precision, and f1-score. Table II below shows a confusion matrix.

TABLE II. CONFUSION MATRIX

Predicted | Actual Positive | Actual Negative
Positive  | True Positive   | False Positive
Negative  | False Negative  | True Negative

From Table II above, there are four values of the model performance. True Negative (TN) represents the number of non-glioblastoma samples that were accurately predicted. True Positive (TP) represents the number of glioblastoma samples that were accurately predicted. False Negative (FN) represents the number of glioblastoma samples that were inaccurately predicted. False Positive (FP) represents the number of non-glioblastoma samples that were inaccurately predicted.

The values of accuracy, recall, precision, and f1-score can be calculated from Table II. The closer these values are to 1, the better the performance of the proposed classification model. The following are the formulas for accuracy, recall, precision, and f1-score [29].

accuracy = \frac{TP + TN}{TP + FP + FN + TN} \quad (13)

recall = \frac{TP}{TP + FN} \quad (14)

precision = \frac{TP}{TP + FP} \quad (15)

F1\text{-}score = \frac{2 \times recall \times precision}{recall + precision} \quad (16)
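As a worked check of equations (13)-(16), the short sketch below computes the four measures from confusion matrix counts; the counts used are arbitrary illustrative numbers, not the study's results.

```python
# A small sketch computing equations (13)-(16) from confusion matrix counts.
# The counts below are made-up illustrative numbers.
TP, TN, FP, FN = 30, 28, 3, 4

accuracy = (TP + TN) / (TP + FP + FN + TN)                # equation (13)
recall = TP / (TP + FN)                                   # equation (14)
precision = TP / (TP + FP)                                # equation (15)
f1_score = 2 * recall * precision / (recall + precision)  # equation (16)

print(f"accuracy={accuracy:.4f}, recall={recall:.4f}, "
      f"precision={precision:.4f}, f1={f1_score:.4f}")
```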

III. RESULTS AND ANALYSIS

This study implemented the K-Nearest Neighbor-Genetic Algorithm (KNN-GA) and Support Vector Machine-Genetic Algorithm (SVM-GA) in classifying brain tumor data with 70% and 90% training data. The parameters used were k = 7 for KNN-GA, and C = 10 and γ = 0.01 for SVM-GA with the Radial Basis Function (RBF) kernel. Tables III and IV show the performances of KNN-GA and SVM-GA using 70% training data, while Tables V and VI show the performances of KNN-GA and SVM-GA using 90% training data. The sketch below illustrates how one such configuration could be evaluated.
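The following is a hedged sketch of evaluating a single configuration from the experiments: SVM with the RBF kernel at a 90/10 split, scored with the four measures of Section II-G. The GA search is omitted and a fixed feature subset (mimicking the best row of Table VI) stands in for the GA's selection, and the data arrays are placeholders, so this only illustrates how the reported numbers could be computed, not the paper's exact pipeline.

```python
# A hedged sketch of evaluating one configuration from Tables III-VI:
# SVM (RBF, C=10, gamma=0.01) on a fixed 5-feature subset with 90% training
# data. The subset mimics Table VI's best row; X and y are placeholders.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score

rng = np.random.default_rng(3)
features = ["Area", "Min", "Max", "Avg", "SD", "Sum", "Length"]
X = rng.normal(size=(334, 7))
y = rng.integers(0, 2, size=334)

subset = [features.index(f) for f in ["Area", "Min", "Avg", "SD", "Length"]]
X_train, X_test, y_train, y_test = train_test_split(
    X[:, subset], y, train_size=0.9, random_state=0)

model = SVC(kernel="rbf", C=10, gamma=0.01).fit(X_train, y_train)
pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))
print("recall:", recall_score(y_test, pred))
print("precision:", precision_score(y_test, pred))
print("f1-score:", f1_score(y_test, pred))
```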

TABLE III. THE PERFORMANCES OF KNN-GA WITH k = 7 USING 70% TRAINING DATA

Number of Features | Accuracy (%) | Recall (%) | Precision (%) | F1-score (%) | Selected features
1 | 88.22 | 87.95 | 88.81 | 88.21 | Avg
2 | 88.32 | 88.14 | 88.20 | 87.87 | Min, Max
3 | 88.02 | 75.72 | 92.23 | 87.60 | Area, Min, Max
4 | 86.93 | 82.42 | 90.21 | 86.04 | Min, Max, Avg, Length
5 | 84.85 | 82.25 | 86.97 | 84.01 | Area, Min, Max, Avg, Length
6 | 90.00 | 91.02 | 89.50 | 90.21 | Area, Min, Max, Avg, SD, Length

TABLE IV. THE PERFORMANCES OF SVM-GA WITH C = 10 AND γ = 0.01 USING 70% TRAINING DATA

Number of Features | Accuracy (%) | Recall (%) | Precision (%) | F1-score (%) | Selected features
1 | 87.82 | 88.78 | 86.76 | 87.69 | Min
2 | 86.54 | 87.31 | 86.83 | 86.76 | Area, Min
3 | 90.00 | 89.18 | 90.04 | 89.55 | Area, Min, Length
4 | 88.12 | 88.38 | 87.36 | 87.79 | Area, Min, SD, Length
5 | 87.43 | 87.14 | 87.44 | 87.19 | Area, Max, Avg, SD, Length
6 | 91.88 | 91.05 | 92.69 | 91.73 | Area, Min, Max, Avg, SD, Length

TABLE V. THE PERFORMANCES OF KNN-GA WITH k = 7 USING 90% TRAINING DATA

Number of Features | Accuracy (%) | Recall (%) | Precision (%) | F1-score (%) | Selected features
1 | 87.06 | 85.27 | 89.70 | 87.32 | Avg
2 | 86.47 | 88.28 | 84.49 | 86.66 | Avg, SD
3 | 81.18 | 87.17 | 77.98 | 81.08 | Avg, SD, Length
4 | 81.47 | 82.98 | 81.70 | 80.86 | Area, Avg, SD, Length
5 | 87.65 | 88.19 | 88.56 | 87.65 | Area, Min, Max, Avg, Length
6 | 87.06 | 87.52 | 89.52 | 87.82 | Area, Min, Max, Avg, SD, Length

TABLE VI. THE PERFORMANCES OF SVM-GA WITH C = 10 AND γ = 0.01 USING 90% TRAINING DATA

Number of Features | Accuracy (%) | Recall (%) | Precision (%) | F1-score (%) | Selected features
1 | 86.77 | 85.67 | 87.54 | 86.24 | Avg
2 | 87.06 | 86.51 | 86.53 | 86.09 | Area, Avg
3 | 87.94 | 85.92 | 91.13 | 88.25 | Area, Avg, Length
4 | 87.94 | 86.75 | 87.88 | 87.20 | Area, Avg, SD, Length
5 | 92.35 | 93.19 | 92.62 | 92.83 | Area, Min, Avg, SD, Length
6 | 89.71 | 89.59 | 90.01 | 89.70 | Area, Min, Avg, SD, Sum, Length

As shown in Tables III to VI, the KNN-GA method produced performance results above 90% only when using 70% training data. Its highest accuracy, recall, and f1-score values were 90%, 91.02%, and 90.21%, respectively, which were obtained using the Area, Min, Max, Avg, SD, and Length features. Meanwhile, its highest precision value, 92.23%, was obtained using the Area, Min, and Max features.

For the SVM-GA method, experiments using both 70% and 90% training data yielded higher results compared to KNN-GA. The best accuracy, recall, precision, and f1-score values were 92.35%, 93.19%, 92.62%, and 92.83%, respectively, which were obtained using 5 features, namely Area, Min, Avg, SD, and Length. The highest accuracy, recall, and f1-score values were obtained using SVM-GA with 90% training data and 5 features, while the highest precision value (92.69%) was obtained using 70% training data and 6 features. Table VII shows the comparison of the performances of the KNN-GA and SVM-GA methods using 70% and 90% training data.

TABLE VII. THE COMPARISON OF THE PERFORMANCES OF KNN-GA AND SVM-GA METHODS USING 70% AND 90% TRAINING DATA

Training Data (%) | Method | Accuracy (%) | Recall (%) | Precision (%) | F1-score (%)
70% | KNN-GA | 90.00 | 91.02 | 92.23 | 90.21
70% | SVM-GA | 91.88 | 91.05 | 92.69 | 91.73
90% | KNN-GA | 87.65 | 88.28 | 87.90 | 87.82
90% | SVM-GA | 92.35 | 93.19 | 92.62 | 92.83


IV. CONCLUSION

In this study, the classification of brain tumors using the K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) methods with Genetic Algorithm (GA) as a feature selection method, namely KNN-GA and SVM-GA, was discussed. Furthermore, the implementation of KNN-GA and SVM-GA was successfully carried out on glioblastoma data. The parameter for KNN-GA was k = 7, and the parameters for SVM-GA with the RBF kernel were C = 10 and γ = 0.01. The experiments were carried out using 70% and 90% training data proportions.

The results of KNN-GA and SVM-GA were compared to obtain a final result. From the comparison of the two methods, it was observed that SVM-GA with the RBF kernel, which used 90% training data and 5 features, namely Area, Min, Avg, SD, and Length, produced higher results than KNN-GA. The accuracy, recall, precision, and f1-score values for SVM-GA were 92.35%, 93.19%, 92.62%, and 92.83%, respectively, while those of KNN-GA were 90%, 91.02%, 92.23%, and 90.21%, respectively. Therefore, it can be concluded from this comparison that SVM-GA with the RBF kernel, parameters C = 10 and γ = 0.01, 90% training data, and 5 features is the best method for classifying glioblastoma data.

For future studies, the use of other feature selection methods or machine learning methods for classifying glioblastoma is suggested. Developing the KNN-GA and SVM-GA methods further and modifying their parameters is also suggested to obtain better results. This study is anticipated to aid the medical field in diagnosing glioblastoma and other illnesses.

ACKNOWLEDGEMENT

This research was supported financially by Universitas Indonesia under the FMIPA HIBAH 2021 research grant scheme.

REFERENCES

[1] T. Sinha, “Tumors: benign and malignant,” Cancer Therapy & Oncology International Journal, vol. 10, no. 3, pp. 52-54, May 2018.
[2] R. C. Suganthe, G. Revathi, S. Monisha, and R. Pavithran, “Deep Learning Based Brain Tumor Classification Using Magnetic Resonance Imaging,” Journal of Critical Reviews, vol. 7, no. 9, pp. 347-350, May 2020.
[3] S. Kumar, C. Dabas, and S. Godara, “Classification of brain MRI tumor images: a hybrid approach,” Procedia Computer Science, vol. 122, pp. 510-517, 2017.
[4] F. Bray, et al., “Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries,” CA: A Cancer Journal for Clinicians, vol. 68, no. 6, pp. 394-424, September 2018.
[5] K. Aldape, et al., “Challenges to curing primary brain tumours,” Nature Reviews Clinical Oncology, vol. 16, no. 8, pp. 509-520, August 2019.
[6] K. Bunyaratavej, R. Siwanuwatn, K. Chantra, and S. Khaoroptham, “Duration of symptoms in brain tumors: influencing factors and its value in predicting malignant tumors,” J Med Assoc Thai, vol. 93, no. 8, pp. 903-910, August 2010.
[7] I. Kononenko and M. Kukar, Machine Learning and Data Mining. Horwood Publishing, 2007.
[8] T. Siswantining, et al., “Classification of thalassemia data using K-nearest neighbor and Naïve Bayes,” International Journal of Advanced Science and Technology, vol. 28, no. 8, pp. 15-19, October 2019.
[9] N. Khateeb and M. Usman, “Efficient heart disease prediction system using K-nearest neighbor classification technique,” in Proceedings of the International Conference on Big Data and Internet of Things, pp. 21-26, December 2017.
[10] M. Sharma, S. K. Singh, P. Agrawal, and V. Madaan, “Classification of clinical dataset of cervical cancer using KNN,” Indian Journal of Science and Technology, vol. 9, no. 28, pp. 1-5, July 2016.
[11] Z. Rustam, Arfiani, and J. Pandelaki, “Cerebral infarction classification using multiple support vector machine with information gain feature selection,” Bulletin of Electrical Engineering and Informatics, vol. 9, no. 4, pp. 1578-1584, August 2019.
[12] T. V. Rampisela and Z. Rustam, “Classification of Schizophrenia Data Using Support Vector Machine (SVM),” J. Phys.: Conf. Ser., vol. 1108, no. 1, December 2018.
[13] C. Aroef, Y. Rivan, and Z. Rustam, “Comparing random forest and support vector machines for breast cancer classification,” TELKOMNIKA (Telecommunication Computing Electronics and Control), vol. 18, no. 2, pp. 815-821, April 2020.
[14] S. Nikam, P. Shukla, and M. Shah, “Cardiovascular disease prediction using genetic algorithm and neuro-fuzzy system,” Int. J. Latest Trends Eng. Technol., vol. 8, no. 2, pp. 104-110, 2017.
[15] Z. Rustam, I. Primasari, and D. Widya, “Classification of cancer data based on support vectors machines with feature selection using genetic algorithm and laplacian score,” AIP Conference Proceedings, vol. 2023, October 2018.
[16] G. Chandrashekar and F. Sahin, “A survey on feature selection methods,” Computers & Electrical Engineering, vol. 40, no. 1, pp. 16-28, January 2014.
[17] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273-297, September 1995.
[18] L. Wang, Support Vector Machines: Theory and Applications. Springer Science & Business Media, vol. 177, 2005.
[19] A. Tharwat, “Parameter investigation of support vector machine classifier with kernel functions,” Knowl Inf Syst, vol. 61, pp. 1269-1302, February 2019.
[20] C. J. Burges, “A tutorial on support vector machines for pattern recognition,” Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121-167, January 1998.
[21] N. Stanevski and D. Tsvetkov, “Using support vector machine as a binary classifier,” International Conference on Computer Systems and Technologies - CompSysTech, 2005.
[22] E. Fix and J. L. Hodges Jr, “Discriminatory analysis. Nonparametric discrimination: Consistency properties,” International Statistical Review/Revue Internationale de Statistique, vol. 57, no. 3, pp. 238-247, 1989.
[23] T. M. Cover and P. E. Hart, “Nearest neighbor pattern classification,” IEEE Transactions on Information Theory, vol. 13, no. 1, pp. 21-27, 1967.
[24] S. Kaghyan and H. Sarukhanyan, “Activity recognition using k-nearest neighbor algorithm on smartphone with tri-axial accelerometer,” International Journal of Informatics Models and Analysis (IJIMA), ITHEA International Scientific Society, vol. 1, pp. 146-156, 2012.
[25] A. Kataria and M. D. Singh, “A Review of Data Classification Using K-Nearest Neighbour Algorithm,” Int. J. Emerging Technol. Adv. Eng., vol. 3, pp. 354-360, June 2013.
[26] J. H. Holland, Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. MIT Press, 1992.
[27] G. Rivera, et al., “Genetic algorithm for scheduling optimization considering heterogeneous containers: A real-world case study,” Axioms, vol. 9, no. 1, p. 27, March 2020.
[28] H. M. Pandey, “Performance evaluation of selection methods of genetic algorithm and network security concerns,” Procedia Computer Science, vol. 78, pp. 13-18, December 2016.
[29] M. Kohl, “Performance Measures in Binary Classification,” International Journal of Statistics in Medical Research, vol. 1, pp. 79-81, January 2012.
