Study of Classification Methods to Detect Coronary Heart Disease Based on Photoplethysmography (PPG) Signals
Azha Alvin Rahmansyah1, Satria Mandala1,*, Miftah Pramudyo2
1 Informatics, Telkom University, Bandung, Indonesia
2 Medicine, Department of Cardiology and Vascular, Universitas Padjadjaran, Bandung, Indonesia
Email: 1[email protected], 2[email protected], 3 [email protected] Correspondence Author Email: [email protected]
Abstract−Coronary heart disease (CHD) is one of the deadliest diseases in the world, particularly in Indonesia. It is caused by the accumulation of fat in the blood vessels and can lead to heart attacks that endanger a person's health and safety.
Several methods exist for detecting CHD, such as those based on electrocardiogram (ECG) signals and photoplethysmography (PPG) signals. However, studies that test machine learning classification methods to detect CHD from PPG signals are rare compared to ECG-based detection. This study uses PPG signals captured with a smartphone camera, making CHD detection easier and more affordable. Machine learning is needed to determine whether a subject is CHD-positive or CHD-negative. This study therefore presents a comparative study of classification algorithms for CHD detection. Three classification methods are used: KNN, SVM, and decision tree, each evaluated before and after tuning. The best results obtained were 81% accuracy for KNN, 90% for SVM, and 90% for the decision tree.
Keywords: PPG; CAD; KNN; SVM; Decision Tree
1. INTRODUCTION
The heart is one of the most essential organs in the human body; its main function is to circulate oxygenated blood throughout the body. Because the heart is a vital organ, heart-related diseases are hazardous to a person's health and safety. Among the deadliest of these is coronary heart disease (CHD). According to [1], cardiovascular disease is one of the most significant causes of death, killing as many as 17.9 million people yearly. CHD occurs when fat deposits block the heart's blood vessels, called the coronary arteries. As more fat accumulates in these vessels, blood flow to the heart is reduced, which can cause a heart attack. The coronary arteries supply blood to the heart muscle.
When the inner lining of these arteries hardens due to calcium deposits, the blood supply to the heart is blocked [2]. However, the application of technology has led to a rapid improvement in the quality of healthcare services [3], and CHD can now be detected more efficiently using photoplethysmography (PPG) signals and machine learning.
PPG is a simple optical technique to detect volumetric changes in circulating blood [4]. Tissue analysis is performed using a light source and a photodetector, and the resulting PPG signal can be used to estimate pulse rate, blood pressure, blood oxygen levels, and hemoglobin, as well as for biometric identification [5]. PPG works by utilizing low-intensity infrared light; as the light travels through the body, it is absorbed by bone, skin pigments, and the veins and arteries. Blood absorbs the most light, so a change in blood flow can be detected in the PPG signal as a change in light intensity. Besides being simple, PPG signals are also easy to access, because a smartphone is enough to record them. CHD detection can also be performed using an ECG. ECG-based Heart Rate Variability (HRV) analysis is a technique that has become popular among researchers for diagnosing heart disease [6], but it is harder to use because of the amount of equipment required.
To determine whether someone has CHD, machine learning is used as a diagnostic aid. The machine learning pipeline has several stages, namely denoising, feature selection, and classification. Each stage has its own function; the classification stage determines the final result of whether the user suffers from CHD or not. In this study, classification is carried out to detect CHD based on PPG signals captured with a smartphone.
Research conducted by [7] determined the relationship between breathing problems and PPG signals in sleep apnea patients. The study was conducted using 32 of the 34 features found and using several classification algorithms such as the k-nearest neighbor’s classification algorithm, radial basis function neural network, probabilistic neural network, and multilayer feedforward neural network (MLFFNN) and combining several classification methods. MLFFNN achieved the highest accuracy with an accuracy value of 97.07%.
Researchers [8] introduced a new system for monitoring heart health using PPG signals captured with a smartphone. PPG is an optical technique that can be used to estimate blood pressure in certain parts of the body. The test was carried out using a smartphone with a rear camera and LED flash, and according to the authors it achieved fairly good accuracy, so it can be developed further.
Research conducted by [2] proposes a system that monitors coronary artery disease (CAD) using PPG, with an SVM approach for classification. The dataset comes from the MIMIC-II database, which contains data on CAD patients. The study achieved a sensitivity of 85% and a specificity of 75% across all processed datasets, and concluded that PPG can be used to diagnose CAD.
Research [9] wants to make a comparison of machine learning performance to predict coronary heart disease (CHD). This performance test uses several machine learning methods such as Random Forest, Decision Trees, and K-Nearest Neighbors. The data set used in this trial utilized the “Framingham Heart Study,” which consisted of 4240 medical records. The accuracy of each method is 96.8% for Random Forest, 92.7% for Decision Trees, and 92.89% for K-Nearest Neighbors.
Study [10] tested several machine learning methods to detect CHD, namely Naïve Bayes (NB), Support Vector Machine (SVM), and Decision Tree (DT). The South African heart disease dataset, with a total of 462 cases, was used. The accuracies obtained were 0.705, 0.715, and 0.71 for the DT, NB, and SVM methods, respectively.
Previous related research is used as a guide in this work, but several factors are changed, such as the choice of classification algorithms and several other methods.
Because, to our knowledge, no study has compared classification methods for detecting CHD from PPG signals, this study performs such a comparison to determine which method is best for detecting CHD using PPG signals. The classification algorithms used in this research are Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Decision Tree. The three methods each produce an accuracy score, which is then compared to find the best one. In addition, the data we use differ from previous research, because we collected the data ourselves so that the information is relevant to the system we built and tested.
2. RESEARCH METHODOLOGY
2.1. Flowchart
Figure 1. Flowchart
The stages of this research can be described in the flowchart in Figure 1. The first stage is data acquisition.
At this stage, the data collection process for the dataset takes place. Next is the denoising stage, which removes noise from the collected data. In the following stage, feature extraction is carried out on the noise-free data using the HRV method. The features are then selected using the Pearson Correlation method at the feature selection stage. After these steps are complete, the classification stage is carried out: the denoised data with the best features are processed by three classifiers, namely SVM, KNN, and Decision Tree, to obtain accuracy scores. The last stage is testing the classification models using metrics.
2.2. Data Acquisition
Data acquisition was carried out at RSAU Dr. M. Salamun, Bandung regency. The data were collected with a smartphone application that we designed to record PPG signals. After the data are captured, they are sent to the database over a secured channel, following the security scheme in research [11]. During acquisition, the finger must be placed directly on the camera, and the phone's flash must be on. The flash serves as the light source so that the camera can record heart-rate data. The data used in this study consist of two groups: a healthy group and a group of patients with a history of CAD. There are 30 records in the healthy group and 28 in the patient group, for a total of 58 records in our dataset.
2.3. Denoising
The initial stage in this research is denoising the dataset we collected, because the raw data contain noise. Noise is one of the problems that degrades the quality of the information signal [12]. The labeled dataset is therefore denoised using a filter to obtain a cleaner signal and facilitate feature extraction. One of the filters that can be used to minimize noise is the Finite Impulse Response (FIR) filter [12], which is the denoising method we use in this research.
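As an illustrative sketch of this stage (not the authors' actual implementation), a low-pass FIR filter designed with SciPy can suppress high-frequency noise in a PPG trace. The 30 fps sampling rate and 4 Hz cutoff are assumptions chosen for a smartphone-camera signal:

```python
import numpy as np
from scipy.signal import filtfilt, firwin

FS = 30.0  # assumed smartphone-camera frame rate (samples per second)

def denoise_ppg(signal, fs=FS, cutoff=4.0, numtaps=31):
    """Low-pass FIR filter: heart-rate content sits well below ~4 Hz,
    so components above the cutoff are treated as noise."""
    taps = firwin(numtaps, cutoff, fs=fs)   # linear-phase FIR design
    return filtfilt(taps, [1.0], signal)    # zero-phase filtering

# Synthetic noisy "PPG": a 1.2 Hz pulse plus broadband noise.
t = np.arange(0, 10, 1 / FS)
clean = np.sin(2 * np.pi * 1.2 * t)
noisy = clean + 0.3 * np.random.default_rng(0).normal(size=t.size)
denoised = denoise_ppg(noisy)
```

Because `filtfilt` applies the filter forward and backward, the output has no phase lag, which helps preserve peak locations for the later HRV step.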
2.4. Feature Extraction
After performing the denoising process, the following process is feature extraction. Extraction is done to get the characteristics or features of the dataset that we already have. The method we use to get featured in this process is the HRV feature algorithm. HRV measures the peak-to-peak features at regular intervals (NN intervals) present in the signal [13].
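A minimal sketch of peak-to-peak (NN) interval extraction and three of the time-domain HRV features reported later (MAD, skewness, kurtosis); the sampling rate and peak-detection settings are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.stats import kurtosis, skew

def hrv_time_features(ppg, fs=30.0):
    """Peak-to-peak (NN) intervals and three of the time-domain
    features named in this study (MAD, skewness, kurtosis)."""
    peaks, _ = find_peaks(ppg, distance=fs * 0.4)  # peaks >= 0.4 s apart
    nn = np.diff(peaks) / fs                       # NN intervals in seconds
    return {
        "MAD": float(np.mean(np.abs(nn - np.mean(nn)))),
        "Skew": float(skew(nn)),
        "Kurt": float(kurtosis(nn)),
    }

# Synthetic pulse near 60 bpm with slight rate modulation.
t = np.arange(0, 30, 1 / 30.0)
ppg = np.sin(2 * np.pi * (t + 0.05 * np.sin(2 * np.pi * 0.1 * t)))
feats = hrv_time_features(ppg)
```

The frequency-domain features (VLF, LF, HF) would additionally require a spectral estimate of the NN-interval series, which is omitted here.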
2.5. Feature Selection
After getting all the features from the dataset, the next step is feature selection. Feature selection is used to find which features are the best and most influential because sometimes the features obtained are irrelevant and affect the model’s performance. The method used in our feature selection is the Pearson Correlation algorithm. Pearson correlation is the most commonly used numerical method. This method assigns a value of -1 to 1 to carry out its duties where -1 is a total negative correlation, 0 has no correlation at all, and 1 provides a real positive correlation [14].
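The threshold-based Pearson filter can be sketched with pandas; the column names echo the HRV features, and the greedy keep-first-drop-second policy is an assumption, not necessarily the authors' exact rule:

```python
import numpy as np
import pandas as pd

def drop_correlated(df, threshold=0.5):
    """For each feature pair with |Pearson r| above the threshold,
    keep the first feature and drop the second (greedy filter)."""
    corr = df.corr().abs()
    # Upper triangle only, so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)

# Toy frame: "VLF" is almost a copy of "ESH", "MAD" is independent.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
df = pd.DataFrame({"ESH": x,
                   "VLF": x + 0.1 * rng.normal(size=200),
                   "MAD": rng.normal(size=200)})
selected = drop_correlated(df)
```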
2.6. Classification
2.6.1. K-Nearest-Neighbours (KNN)
The k-Nearest-Neighbours (kNN) is a non-parametric classification method that is simple but effective in many cases [15]. The kNN classifier performs well on large and almost unlimited data samples, where its error rate approaches the Bayes-optimal rate under mild conditions. However, the method has several weaknesses, such as the choice of the value of k and the choice of distance measure, which affect the performance of kNN itself [16]. The biggest problem with kNN is the value of k, because a poor choice can introduce bias; there are many ways to choose k, but the simplest is to run the kNN algorithm several times with different values of k and keep the most optimal one [15]. Classification methods of this kind have also been applied to ECG-based prediction of imminent malignant ventricular arrhythmias [17]. The accuracy of the KNN method is also influenced by the Euclidean distance, given by the following formula:
D(x, y) = \sqrt{\sum_{k=1}^{n} (x_k - y_k)^2}  (1)
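The "run kNN many times with different k" tuning described above can be sketched with scikit-learn on synthetic two-class data; the feature matrix and class sizes are illustrative, loosely mirroring this study's 30/28 record split:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Two-class synthetic stand-in for the HRV feature matrix.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 3)), rng.normal(2, 1, (28, 3))])
y = np.array([0] * 30 + [1] * 28)

# Try several odd k and keep the best cross-validated accuracy,
# mirroring the "try many k" tuning described in the text.
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k),
                             X, y, cv=5).mean()
          for k in range(1, 16, 2)}
best_k = max(scores, key=scores.get)
```

Odd values of k avoid ties in two-class voting; cross-validation keeps the chosen k from overfitting a single train/test split.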
2.6.2. Decision Tree
Decision tree classification organizes data into an inverted tree-like form consisting of a root node, internal nodes, and leaf nodes [18]. Each internal node represents a test on an attribute, each branch represents a test result, and each leaf node holds a class label, while the root node is the beginning of the tree and represents the population data [19]. The basic idea of the decision tree is to separate the data homogeneously: the purer the branches, the better the decision tree classification performs [20]. In a decision tree, impurity is the level of class mixing within an attribute, and it is assessed using entropy.
The attribute to split on is chosen using information gain: the greater the information gain of an attribute, the more likely it is to be selected. This method is popular in many studies because it has several advantages, such as being easy to use, free from ambiguity, and robust when some data are missing [18]. Decision trees are also widely used for disease detection, such as diagnosing breast tumors in ultrasound images, ovarian cancer, and heart sound diagnosis [19]. The formulas for entropy and information gain are as follows:
E(S) = -\sum_{i=1}^{c} p_i \log_2 p_i  (2)
G(S, A) = E(S) - \sum_{i=1}^{c} \frac{|S_i|}{|S|} E(S_i)  (3)
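Equations (2) and (3) can be checked with a small Python implementation (illustrative, not the authors' code):

```python
import numpy as np

def entropy(labels):
    """E(S) = -sum_i p_i log2 p_i over the class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(labels, attribute):
    """G(S, A) = E(S) - sum_i |S_i|/|S| * E(S_i) for a discrete attribute."""
    labels, attribute = np.asarray(labels), np.asarray(attribute)
    weighted = sum((attribute == v).mean() * entropy(labels[attribute == v])
                   for v in np.unique(attribute))
    return entropy(labels) - weighted

# A perfectly separating attribute yields gain equal to the full entropy.
y = np.array([0, 0, 1, 1])
a = np.array(["low", "low", "high", "high"])
```

Here `entropy(y)` is 1 bit (two balanced classes), and the split on `a` is pure, so the information gain is the full 1 bit.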
2.6.3. Support Vector Machine (SVM)
A support vector machine (SVM) is a computer algorithm that studies examples to assign a label to an object [21].
SVM is a reasonably powerful method for constructing a classifier. The main purpose of SVM is to separate two classes using a constraint that allows predicting the label from one or more features [22]. This constraint is called a hyperplane. The hyperplane is placed as far as possible from the data points of each class, and the data points closest to it are called support vectors [22]. In several previous studies, SVM classification has also been successfully applied in medical applications, most commonly for tumor diagnosis or prognosis by examining gene expression profiles obtained from tumor samples or peripheral fluids [21]. However, some things can interfere with the performance of the SVM algorithm in the classification process, most notably the selection of the kernel. Kernel function selection is an essential factor influencing the performance of an SVM model, and there is no general rule for which kernel type suits a particular pattern. The only way to find out is to test each kernel and see which performs best [22]. Several kernel types are commonly used, namely the polynomial kernel, the sigmoid kernel, and the Radial Basis Function (RBF) kernel, with the following formulas:
k(x_j, x_k) = (x_j \cdot x_k + 1)^d  (4)
k(x, y) = \tanh(a x^T y + c)  (5)
k(x_j, x_k) = \exp(-\gamma \, ||x_j - x_k||^2), \quad \gamma > 0  (6)
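The three kernel functions in Eqs. (4)-(6) can be written directly as Python functions; the parameter values below are illustrative:

```python
import numpy as np

def poly_kernel(xj, xk, d=2):
    """Polynomial kernel (x_j . x_k + 1)^d, Eq. (4)."""
    return (np.dot(xj, xk) + 1.0) ** d

def sigmoid_kernel(x, y, a=0.5, c=0.0):
    """Sigmoid kernel tanh(a x^T y + c), Eq. (5)."""
    return np.tanh(a * np.dot(x, y) + c)

def rbf_kernel(xj, xk, gamma=0.5):
    """RBF kernel exp(-gamma ||x_j - x_k||^2) with gamma > 0, Eq. (6)."""
    diff = np.asarray(xj) - np.asarray(xk)
    return np.exp(-gamma * np.dot(diff, diff))

x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])
```

Note that the RBF kernel of any point with itself is exactly 1, and the sigmoid kernel is bounded in (-1, 1).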
2.7. Test Metrics
Test metrics are fundamental in machine learning; they are used to determine the quality of a model by comparing predicted values with actual values [23]. In our research, test metrics are used to score each model and determine which one is best. Three metrics are used: accuracy, sensitivity, and specificity.
Accuracy = \frac{TP + TN}{TP + FP + FN + TN} \times 100\%  (7)
Specificity = \frac{TN}{TN + FP} \times 100\%  (8)
Sensitivity = \frac{TP}{TP + FN} \times 100\%  (9)
Where True Positive (TP) is the number of positive-class samples correctly predicted by the model, True Negative (TN) is the number of negative-class samples correctly predicted by the model, False Positive (FP) is the number of negative-class samples incorrectly predicted as positive, and False Negative (FN) is the number of positive-class samples incorrectly predicted as negative.
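Equations (7)-(9) can be computed directly from the four confusion-matrix counts; a minimal sketch with toy labels:

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity per Eqs. (7)-(9);
    class 1 = CHD-positive, class 0 = healthy."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn) * 100,
        "sensitivity": tp / (tp + fn) * 100,
        "specificity": tn / (tn + fp) * 100,
    }

# Toy example: 5 subjects, one false negative and one false positive.
m = confusion_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```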
2.8. Scenario
At this stage, the data processed in the preprocessing stages undergo several test scenarios involving the three classification algorithms. The scenarios aim to find out which classification algorithm performs better, before and after tuning. The scenarios are as follows:
1. Classification using the KNN algorithm twice: before and after tuning the KNN classification algorithm.
2. Classification using the Decision Tree algorithm twice: before and after tuning the Decision Tree classification algorithm.
3. Classification using the SVM algorithm twice: before and after tuning the SVM classification algorithm.
3. RESULT AND DISCUSSION
3.1 Implementation
This section will describe the research carried out.
3.1.1 Data Acquisition
Figure 2. Signal PPG Acquisition
In Figure 2, the left side shows the application that we built to capture PPG signals from patients, while the right side shows the data collection process.
3.1.2 Denoising
Figure 3. Sample of PPG Signal
Figure 4. Sample of Denoised PPG signal
Figure 3 shows an example of data that has not been denoised, and Figure 4 shows a PPG signal after denoising. The signal plot in Figure 3 is visibly rougher than that in Figure 4. The plot in Figure 4 shows a smoother shape while preserving the signal morphology as much as possible, so that the program can read the signal more reliably.
3.1.3 Feature Extraction
The following features were obtained at the feature extraction stage by applying the HRV feature algorithm to the PPG signal. Table 1 contains data from a subject labeled healthy, and Table 2 contains data from a sick patient. The extracted features are Shannon Entropy (ESH), Mean Absolute Deviation (MAD), Skewness (Skew), Kurtosis (Kurt), Very Low Frequency (VLF), Low Frequency (LF), and High Frequency (HF).
Table 1. HRV Features on a Healthy Subject
Feature     Value
ESH         12.67
MAD         0.132
Skewness    0.215
Kurtosis    -0.012
VLF         0.0023
LF          0.000064
HF          8.816.926
Table 2. HRV Features on a Sick Patient
Feature     Value
ESH         13.06
MAD         0.161
Skewness    -0.426
Kurtosis    -0.106
VLF         0.0062
LF          0.000078
HF          1.744.376
3.1.4 Feature Selection
Figure 5. Feature Selection
Figure 5 above shows the correlation coefficients between features. After setting a threshold of 0.5, features whose correlation exceeds the threshold are eliminated.
Figure 5 shows that the correlation between the ESH and VLF features is 0.58, so one of them is omitted because it exceeds the threshold. This selection process is repeated for each feature. In the final result of feature selection, HF, Kurt, LF, and Skew are discarded, leaving the MAD, ESH, and VLF features to be used.
3.1.5 Classification
The following are the research results at the classification stage, covering several scenarios, each evaluated using the test metrics.
a. Scenario 1 (KNN Classification)
In scenario 1, a test is carried out comparing the KNN classification algorithm before and after tuning. Table 3 shows the scores of the KNN algorithm on the test metrics.
Table 3. KNN Classification Accuracy Results
Status   Accuracy   Sensitivity   Specificity
Before   63%        75%           57%
After    81%        85%           85%
Table 3 shows a fairly significant increase in accuracy, sensitivity, and specificity after tuning the KNN classification algorithm. KNN tuning is done by running the program several times to find the best number of neighbors.
b. Scenario 2 (Decision Tree Classification)
In scenario 2, a test is carried out comparing the Decision Tree classification algorithm before and after tuning. Table 4 shows the scores of the Decision Tree algorithm on the test metrics.
Table 4. Decision Tree Classification Accuracy Results
Status   Accuracy   Sensitivity   Specificity
Before   90%        100%          85%
After    81%        100%          71%
As can be seen in Table 4, accuracy and specificity decrease fairly significantly, while the sensitivity score stays the same after tuning the Decision Tree classification algorithm. The tuning used GridSearchCV, where the only tuned parameter for the decision tree was max-depth.
c. Scenario 3 (SVM Classification)
In scenario 3, a test will be carried out by comparing the SVM classification algorithm before and after tuning.
Table 5 shows the scores of the SVM algorithm on the test metrics. In the tuning process, we use GridSearchCV from the sklearn library, which searches for the best C, kernel, and gamma parameters of the SVM algorithm for the data we tested.
Table 5. SVM Classification Accuracy Results
Status   Accuracy   Sensitivity   Specificity
Before   63%        75%           57%
After    90%        75%           100%
As the data in Table 5 show, accuracy and specificity increase fairly significantly, while the sensitivity score remains the same after tuning the SVM classification algorithm.
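The GridSearchCV tuning described above can be sketched as follows; the synthetic data and the parameter grid are illustrative assumptions, not the study's actual data or grid:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic two-class stand-in for the 30 healthy / 28 patient records.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 3)), rng.normal(2, 1, (28, 3))])
y = np.array([0] * 30 + [1] * 28)

# Exhaustive cross-validated search over C, kernel, and gamma.
param_grid = {"C": [0.1, 1, 10],
              "kernel": ["linear", "rbf", "sigmoid"],
              "gamma": ["scale", 0.1, 1]}
search = GridSearchCV(SVC(), param_grid, cv=5).fit(X, y)
```

After fitting, `search.best_params_` holds the winning combination and `search.best_score_` its mean cross-validated accuracy.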
3.1.6 Results
Figure 6. Accuracy Results Before Tuning
The blue bars in Figure 6 compare the accuracy of the three classification methods before tuning: the decision tree has the highest score at 90%, followed by SVM and KNN at 63% each. The orange bars show the results after tuning: KNN and SVM increase significantly, with SVM reaching 90% and KNN rising to 81%, while the decision tree's classification performance drops to 81%.
Table 6. Accuracy Improvement
Method          Before   After   Change
SVM             63%      90%     +27%
KNN             63%      81%     +18%
Decision Tree   90%      81%     -9%
Table 6 shows how much the accuracy of each method changes after tuning: SVM improves by 27%, KNN improves by 18%, and the Decision Tree decreases by 9%.
Figures 8 and 9 show the overall performance of the 3 classifications in terms of accuracy, sensitivity, and specificity.
Figure 8. Performance Comparison Before Tuning
Figure 9. Performance Comparison After Tuning
Figure 8 shows that before tuning, the decision tree has very good performance compared to the other methods, with 90% accuracy, 100% sensitivity, and 85% specificity, while SVM and KNN share the same scores of 63% accuracy, 75% sensitivity, and 57% specificity. Figure 9 shows the overall performance after tuning: SVM is now the best of the classification methods, scoring 90% accuracy, 75% sensitivity, and 100% specificity; KNN reaches 81% accuracy, 85% sensitivity, and 85% specificity; and the decision tree scores 81% accuracy, 100% sensitivity, and 71% specificity.
4. CONCLUSION
All three classification methods can handle PPG signal data and can distinguish records labeled "sick patient" from "healthy patient," but several conditions must be met for the SVM and KNN methods to reach their best performance. With tuning, the performance of SVM increased by up to 27% in accuracy and 43% in specificity. The KNN classification also improved after tuning, with accuracy increasing by 18%, sensitivity by 10%, and specificity by 28%. The Decision Tree, in contrast, already performed at its best before tuning; its pre-tuning performance is roughly the same as that of the tuned SVM classification.
REFERENCES
[1] World Health Organization, "Cardiovascular diseases," World Health Organization, 2022. [Online]. Available: https://www.who.int/health-topics/cardiovascular-diseases#tab=tab_3. [Accessed 22 07 2022].
[2] N. Paradkar and S. R. Chowdhury, "Coronary artery disease detection using photoplethysmography," 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 100-103, 2017.
[3] S. Mandala, S. N. Anggis and M. S. Mubarok, "Energy efficient IoT thermometer based on fuzzy logic for fever monitoring," 5th International Conference on Information and Communication Technology (ICoIC7), pp. 1-6, 2017.
[4] P. Pal, S. Ghosh, B. P. Chattopadhyay, K. K. Saha and M. Mahadevappa, "Screening of Ischemic Heart Disease based on PPG Signals using Machine Learning Techniques," 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pp. 5980-5983, 2020.
[5] S. A. Siddiqui, Y. Zhang, Z. Feng and A. Kos, "A Pulse Rate Estimation Algorithm Using PPG and Smartphone Camera," Journal of Medical Systems, vol. 40, no. 5, pp. 1-6, 2016.
[6] S. Satheeskumaran, C. Venkatesan and S. Saravanan, "Real-time ECG signal pre-processing and neuro fuzzy-based CHD risk prediction," Int. J. Comput. Sci. Eng., vol. 24, no. 4, pp. 323-330, 2021.
[7] M. K. Uçar, M. R. Bozkurt, C. Bilgin and K. Polat, "Automatic detection of respiratory arrests in OSA patients using PPG and machine learning techniques," Neural Computing and Applications, vol. 28, no. 10, pp. 2931-2945, 2017.
[8] S. Sukaphat, S. Nanthachaiporn, K. Upphaccha and P. Tantipatrakul, "Heart rate measurement on Android platform," 13th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), pp. 1-5, 2016.
[9] D. Krishnani, A. Kumari, A. Dewangan, A. Singh and N. S. Naik, "Prediction of Coronary Heart Disease using Supervised Machine Learning Algorithms," TENCON 2019 - 2019 IEEE Region 10 Conference (TENCON), pp. 367-372, 2019.
[10] A. H. Gonsalves, F. Thabtah, R. M. A. Mohammad and G. Singh, "Prediction of coronary heart disease using machine learning: an experimental analysis," 3rd International Conference on Deep Learning Technologies (ICDLT 2019), pp. 52-56, 2019.
[11] Y. Coulibaly, A. I. Al-Kilany, M. S. Abd Latiff, G. Rouska, S. Mandala and M. A. Razzaque, "Secure burst control packet scheme for Optical Burst Switching networks," IEEE International Broadband and Photonics Conference (IBP), pp. 86-91, 2015.
[12] Dimurtadha, "Analisis Filter Finite Impulse Response (FIR) pada," Seminar Nasional dan Expo Teknik Elektro, pp. 101-104, 2019.
[13] M. F. Ihsan, S. Mandala and M. Pramudyo, "Study of Feature Extraction Algorithms on Photoplethysmography (PPG) Signals to Detect Coronary Heart Disease," International Conference on Data Science and Its Applications (ICoDSA), pp. 300-304, 2022.
[14] D. Nettleton, "Selection of Variables and Factor Derivation," Commercial Data Mining, pp. 79-104, 2014.
[15] G. Guo, H. Wang, D. Bell and K. Greer, "KNN Model-Based Approach in Classification," On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE, pp. 986-996, 2003.
[16] S. Zhang, X. Li, M. Zong, X. Zhu and D. Cheng, "Learning k for kNN Classification," ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 3, pp. 1-19, 2017.
[17] S. Mandala, D. T. Cai and M. S. Sunar, "ECG-based prediction algorithm for imminent malignant ventricular arrhythmias using decision tree," PLoS One, vol. 15, no. 5, p. e0231635, 2020.
[18] Y. Song and Y. Lu, "Decision tree methods: applications for classification and prediction," Shanghai Archives of Psychiatry, vol. 27, no. 2, p. 130, 2015.
[19] D. Lavanya and K. U. Rani, "Performance Evaluation of Decision Tree Classifiers on Medical Datasets," International Journal of Computer Applications, vol. 26, no. 4, pp. 1-4, 2011.
[20] B. Deshpande, Decision Tree Digest: Understand, build and use decision trees for common business problems with RapidMiner, SimaFore, 2015.
[21] W. S. Noble, "What is a support vector machine?," Nature Biotechnology, vol. 24, no. 12, pp. 1565-1567, 2006.
[22] S. Huang, N. Cai, P. P. Pacheco, S. Narrandes, Y. Wang and W. Xu, "Applications of support vector machine (SVM) learning in cancer genomics," Cancer Genomics & Proteomics, vol. 15, no. 1, pp. 41-45, 2018.
[23] G. Zeng, "On the confusion matrix in credit scoring and its analytical properties," Communications in Statistics - Theory and Methods, vol. 49, no. 9, pp. 2080-2093, 2020.
[24] R. Banerjee, S. Bhattacharya and S. Alam, "Time Series and Morphological Feature Extraction for Classifying Coronary Artery Disease from Photoplethysmogram," 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 950-954, 2018.