LOESS Smoothing of Naive Bayes Classifier For Highly Dependent Medical Data
IV. EXPERIMENT RESULTS
To analyze how the NB classifier performs on our ECG data, we conducted three experiments: one to measure the alleviation of the zero-probability problem and two to measure the alleviation of the continuous-attribute problem. The first experiment measures the effect of changing the prior and conditional probabilities from a simple relative frequency count to the Laplace estimate correction. The second experiment measures the effect of changing the number of sample points in LOESS, and the third measures the effect of changing the LOESS window size.
The classification results are measured as Classification Accuracy (CA) and area under the ROC curve (AUC). CA measures the ability of the classifier to recognize a pattern (range 0% to 100%), while AUC measures how the classifier handles false-positive and false-negative cases (range 0.0 to 1.0). AUC = 1.0 means that NB produced no false positives or false negatives.
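As a minimal sketch of the two measures, the Python below computes CA as the fraction of correctly labelled records and AUC for a two-class case as the probability that a positive record receives a higher score than a negative one. The function names and the toy labels and scores are hypothetical, and the restriction to two classes is an assumption made for illustration; it is not necessarily how the reported three-class AUC was obtained.

```python
def classification_accuracy(y_true, y_pred):
    """Fraction of records whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Assumes labels are 1 (positive) and 0 (negative).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the experiments.
ca = classification_accuracy(["A", "B", "N", "A"], ["A", "N", "N", "A"])
auc_perfect = binary_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])
auc_partial = binary_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
```

In this toy case a perfectly separated score list gives AUC = 1.0, which matches the interpretation that no positive record is ranked below a negative one.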
A. The effect of Laplace estimate
We measure how changing the prior and conditional probabilities from relative frequency to the Laplace estimate affects the classifier for fixed LOESS parameters (fixed window size and number of samples). We first change the prior probability from relative frequency to Laplace, and then do the same for the conditional probability. The results are given in Tables 1 and 2.
From Table 1, we see that the Laplace estimate correction on the prior probability has little impact on the system. On average, the increase in classification accuracy is only 1.13%, while the increase in AUC is only 0.0023. From Table 2, we see that the Laplace estimate correction on the conditional probability actually has a negative impact on classification accuracy: on average, the change in classification accuracy is -0.095%, while the AUC increase is only 0.0034.
Hence, from the first experiment, we conclude that introducing the Laplace correction has either little or negative impact on our data.
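The difference between the two estimates can be illustrated with a short Python sketch. It assumes the common add-one form of the Laplace estimate, (n_c + 1) / (N + k) for k classes; the function names and the counts are hypothetical and are not taken from the ECG data.

```python
def relative_frequency(counts):
    """Estimate P(c) as n_c / N; a class with zero count gets probability 0."""
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


def laplace_estimate(counts):
    """Estimate P(c) as (n_c + 1) / (N + k), assuming the add-one form."""
    total = sum(counts.values())
    k = len(counts)
    return {c: (n + 1) / (total + k) for c, n in counts.items()}


# Hypothetical counts of one attribute value per class (A, B, N).
counts = {"A": 12, "B": 0, "N": 8}
rf = relative_frequency(counts)   # B gets probability 0.0
lap = laplace_estimate(counts)    # B gets 1 / 23, so the product of
                                  # probabilities in NB no longer collapses to 0
```

With relative frequency, a single zero count forces the whole product of conditional probabilities to zero, which is the zero-probability problem the first experiment addresses.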
TABLE 1
CLASSIFICATION RESULTS AFTER CORRECTING PRIOR FROM RELATIVE FREQUENCY (RF) TO LAPLACE (L)
LOESS WS  LOESS NS  Prior  Cond.  CA       AUC
0.25      75        RF     RF     65.22%   0.7439
0.25      75        L      RF     65.36%   0.7439
∆                                 0.14%    0.0000
0.50      75        RF     RF     66.31%   0.7372
0.50      75        L      RF     66.29%   0.7370
∆                                 -0.02%   -
0.75      75        RF     RF     66.31%   0.7372
0.75      75        L      RF     67.75%   0.7452
∆                                 1.44%    0.0080
0.25      100       RF     RF     66.31%   0.7372
0.25      100       L      RF     65.35%   0.7438
∆                                 -0.96%   0.0066
0.50      100       RF     RF     78.96%   0.8725
0.50      100       L      RF     79.01%   0.8721
∆                                 0.05%    -
0.75      100       RF     RF     79.19%   0.8778
0.75      100       L      RF     79.22%   0.8778
∆                                 0.03%    0.0000
avg ∆                             1.13%    0.0023
TABLE 2
CLASSIFICATION RESULTS AFTER CORRECTING CONDITIONAL FROM RELATIVE FREQUENCY (RF) TO LAPLACE (L)
LOESS WS  LOESS NS  Prior  Cond.  CA       AUC
0.25      75        RF     RF     65.22%   0.7439
0.25      75        RF     L      65.22%   0.7439
∆                                 0.00%    0.0000
0.50      75        RF     RF     66.31%   0.7372
0.50      75        RF     L      65.22%   0.7370
∆                                 -1.09%   -
0.75      75        RF     RF     66.31%   0.7372
0.75      75        RF     L      67.75%   0.7452
∆                                 1.40%    0.0080
0.25      100       RF     RF     66.31%   0.7372
0.25      100       RF     L      65.35%   0.7438
∆                                 -0.96%   0.0066
0.50      100       RF     RF     78.96%   0.8725
0.50      100       RF     L      79.01%   0.8721
∆                                 0.05%    -
0.75      100       RF     RF     79.19%   0.8778
0.75      100       RF     L      79.22%   0.8778
∆                                 0.03%    0.0000
avg ∆                             -0.095%  0.0034
B. The effect of LOESS Number of Samples
In our second experiment, we vary the LOESS number of samples for fixed window sizes. The results are shown in Table 3. At the top of the table we put the classification result when the LOESS number of samples and window size are both 0, i.e., the case with no LOESS smoothing, as a reference (it is not included in our calculation). The overall gain in classification accuracy obtained by increasing the number of sample points in LOESS is about 2.31%, and the AUC increase is 0.0222.
TABLE 3
CLASSIFICATION RESULTS AFTER INCREASING THE LOESS NUMBER OF SAMPLES
(RF: RELATIVE FREQUENCY, L: LAPLACE)
LOESS WS  LOESS NS  Prior  Cond.  CA       AUC
0         0         RF     RF     64.63%   0.7251
0.25      75        RF     RF     65.22%   0.7439
0.25      100       RF     RF     66.31%   0.7372
0.25      150       RF     RF     66.31%   0.7372
avg ∆                             0.54%    -
0.50      75        RF     RF     66.31%   0.7372
0.50      100       RF     RF     78.96%   0.8725
0.50      150       RF     RF     66.24%   0.7369
avg ∆                             -0.03%   -
0.75      75        RF     RF     66.31%   0.7372
0.75      100       RF     RF     79.19%   0.8778
0.75      150       RF     RF     79.16%   0.8777
avg ∆                             6.43%    0.0702
Gain                              2.31%    0.0222
C. The effect of LOESS Window Size
In our third experiment, we vary the LOESS window size for a fixed number of samples. The results are shown in Table 4. At the top of the table we again put the classification result when the LOESS number of samples and window size are both 0, i.e., the case with no LOESS smoothing. The overall gain in classification accuracy obtained by increasing the LOESS window size is about 4.47%, and the AUC increase is 0.0457.
TABLE 4
CLASSIFICATION RESULTS AFTER INCREASING LOESS WINDOW SIZE
(RF: RELATIVE FREQUENCY, L: LAPLACE)
LOESS WS  LOESS NS  Prior  Cond.  CA       AUC
0         0         RF     RF     64.63%   0.7251
0.25      75        RF     RF     65.22%   0.7439
0.50      75        RF     RF     66.31%   0.7372
0.75      75        RF     RF     66.31%   0.7372
avg ∆                             0.54%    -
0.25      100       RF     RF     66.31%   0.7372
0.50      100       RF     RF     78.96%   0.8725
0.75      100       RF     RF     79.19%   0.8778
avg ∆                             6.44%    0.0703
0.25      150       RF     RF     66.31%   0.7372
0.50      150       RF     RF     79.19%   0.7369
0.75      150       RF     RF     79.16%   0.8777
avg ∆                             6.43%    0.0702
Gain                              4.47%    0.0457
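The roles of the two LOESS parameters varied in the second and third experiments can be sketched in Python as follows. This is a generic local linear LOESS with tricube weights, not the implementation used in the experiments. It assumes that the window size is the fraction of data points used in each local fit and that the number of samples is the number of evenly spaced points at which the smoothed curve is evaluated. All names are hypothetical.

```python
import math


def tricube(u):
    """Tricube kernel weight for a scaled distance u in [0, 1]."""
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def loess_point(x0, xs, ys, window_size):
    """Weighted local linear fit at x0 over the nearest fraction of the data."""
    n = len(xs)
    k = max(2, math.ceil(window_size * n))            # points inside the window
    nearest = sorted(range(n), key=lambda i: abs(xs[i] - x0))[:k]
    h = max(abs(xs[i] - x0) for i in nearest) or 1.0  # window half-width
    sw = sx = sy = sxx = sxy = 0.0
    for i in nearest:
        w = tricube(abs(xs[i] - x0) / h)
        sw += w
        sx += w * xs[i]
        sy += w * ys[i]
        sxx += w * xs[i] * xs[i]
        sxy += w * xs[i] * ys[i]
    denom = sw * sxx - sx * sx
    if abs(denom) < 1e-12:                            # degenerate: weighted mean
        return sy / sw
    slope = (sw * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / sw
    return intercept + slope * x0


def loess(xs, ys, window_size, num_samples):
    """Evaluate the smoothed curve at num_samples evenly spaced points."""
    lo, hi = min(xs), max(xs)
    step = (hi - lo) / (num_samples - 1)
    grid = [lo + i * step for i in range(num_samples)]
    return grid, [loess_point(g, xs, ys, window_size) for g in grid]


# Hypothetical data: a straight line, which a local linear fit reproduces.
xs = [i / 20 for i in range(21)]
ys = [2.0 * x + 1.0 for x in xs]
grid, fit = loess(xs, ys, window_size=0.5, num_samples=5)
```

Under these assumptions, a larger window size averages over more neighbours and gives a smoother curve, while a larger number of samples only refines the grid on which that curve is evaluated.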
V. CONCLUSION AND FUTURE WORKS
From our experiments we conclude that varying the LOESS window size has the most significant impact, giving a gain of 4.47% in classification accuracy and an AUC increase of 0.0457. An increase of 4.47% is worthwhile for highly critical decisions such as those in many medical applications. The best classification accuracy is 79.22% and the best AUC is 0.8778. These are obtained with a window size of 0.75, 100 LOESS samples, and the Laplace correction applied to the prior probability.
We also performed a confusion matrix analysis of the best experimental result. The accuracy for each correctly classified category is given on the diagonal of the matrix. From the confusion matrix, we can infer that the most difficult problem in this work is detecting the moderate apnea patient (B), who lies in the gray area between the apnea patient (A) and the healthy patient (N). This result means that the B pattern is closely related to both the A and N categories. This is also caused by the fact that the B category has the fewest records: 5 persons, compared to 20 persons for class A and 10 persons for class N.
Correct \ Prediction   A        B        N
A                      71.7%    0.5%     27.8%
B                      39.3%    28.2%    32.5%
N                      14.2%    0.9%     84.9%
Fig. 2. Confusion matrix of the best parameter combination of Naive Bayesian classification in this work.
Currently, with the information available on the Internet, it is possible to acquire data sets (pools) of virtually unlimited size. We noted that the Naive Bayes classifier is reasonably effective at learning a complicated decision using only a limited training set. Such a classifier is a good candidate for our future work on an online incremental learning service for automatic apnea detection.
Heart Beat Classification Using Wavelet Feature
I Made Agus Setiawan∗‡, Elly Matul I.†‡, Nulad W.P.‡, P. Mursanto‡ and Wisnu Jatmiko‡
∗ Computer Science Department, Udayana University
†Mathematic Department, State University of Surabaya
‡Faculty of Computer Science, Universitas Indonesia
Abstract—The type of arrhythmia can be determined through a classification process using a computerized system. Arrhythmia, or cardiac arrhythmia, is a type of heart disease that can be diagnosed with a standard electrocardiogram (ECG). By means of an electrocardiogram, doctors can analyze the electrical activity of the heart and determine the type of arrhythmia the patient is suffering from.
The computerized process is divided into three steps: data preprocessing, feature extraction and classification. In the preprocessing step, the signal is cut beat by beat using the R peak as the pivot. A wavelet algorithm is applied for feature extraction and selection. The ECG signal is then classified into six classes (fVN, LBBB, NOR, RBBB, PVC, APC) using two algorithms: Back-Propagation and Fuzzy Neuro Learning Vector Quantization (FNLVQ).
10-Hold-Out Cross Validation was applied to verify the system. The first experiment shows that the best number of features is 86, considering both the computational cost and the classification result. The second experiment shows that various cross validation schemes produce an average accuracy of 87.15% and 99.57% for Back-Propagation and FNLVQ, respectively.
Index Terms—ECG, Electrocardiogram, Arrhythmia, FNLVQ, back-propagation, wavelet transforms, k-hold-out cross validation, physionet, MIT-BIH.
I. INTRODUCTION
Arrhythmia, or cardiac arrhythmia, is an abnormal heart rhythm or irregular heartbeat, and is one type of heart disease. In an arrhythmia, the heartbeat may be too slow, too rapid, too irregular or too early. Some patients are unaware of the condition, whereas others complain of symptoms including palpitations, a fluttering or leaping sensation in the heart, dizziness, shortness of breath or chest pain.
Some healthy people also experience sensations such as palpitations without having an arrhythmia. Therefore, such symptoms alone are not enough to diagnose arrhythmia.
Several techniques can be used to diagnose arrhythmias, including a standard electrocardiogram (ECG), blood and urine tests, Holter monitoring, electrophysiology studies (EPS), an event recorder, an echocardiogram, a chest X-ray, and a tilt-table test ([1][2]).
Using an ECG is a common way, and the best way, to diagnose arrhythmias. Doctors analyze the electrical activity of the heart through the ECG signal and determine the occurrence of arrhythmias. In this research, we study how to determine the type of arrhythmia based on the ECG signal. Instead of a manual approach that relies on specific expertise such as that of doctors, we use a computerized technique based on the patterns contained in the ECG signal.
Various studies have already been done on the classification of arrhythmias. Many works apply artificial neural networks (ANN) and their variants as a detection method ([3],[4],[5]); some combine the wavelet transform (WT), Principal Component Analysis (PCA) or Fuzzy C-Means (FCM) with ANN or LVQ-NN to classify the signal ([6],[7],[8],[9]), and others apply a Bayesian framework [10]. Some researchers apply fuzzy theory to arrhythmia detection ([11],[12],[6],[13]). Others use a Support Vector Machine as the classifier ([14],[15]), combined with a Genetic Algorithm as in the work of Nasiri [16], or with Particle Swarm Optimization (PSO) as in the work of Melgani [17]. Ghongade et al. compare several feature extraction methods, such as DFT, PCA, DWT and morphology-based features, integrated with an ANN classifier [18]. Philip et al. study arrhythmia classification under the AAMI standard and apply morphological features with a linear discriminant (LD) [19].
In this study, we use a back-propagation NN and Fuzzy Neuro Learning Vector Quantization (FNLVQ) as our classifiers and compare them at the end. To support this study, we use the MIT-BIH arrhythmia database, which is available online [20], as our dataset.
This paper is organized as follows. Section II describes the preprocessing technique used to extract the signal on a beat-by-beat basis. Section III uses the discrete wavelet transform to extract the features contained in each beat signal. Section IV discusses the classification techniques used to group each beat according to the predefined wave patterns. Sections V and VI contain the results, the conclusions of this paper and the future plans of our study.
II. DATA PREPROCESSING
In this research, we use the MIT-BIH arrhythmia database from PhysioNet [20]. This database contains 48 recordings from 47 subjects studied by the BIH Arrhythmia Laboratory between 1975 and 1979. Each record contains two 30-min ECG lead signals, mostly
For this research, we use only the MLII lead as our source data. The classes that we consider in this research are fusion of ventricular and normal beat (fVN), left bundle branch block beat (LBBB), normal beat (NOR), right bundle branch block beat (RBBB), premature ventricular contraction (PVC) and atrial premature contraction (APC).
In this step, the continuous ECG signals are transformed into individual ECG beats. We approximate the width of an individual beat as 300 samples, with the extracted beat centered around the R peak. For this purpose we use the annotations provided by the database: the R-peak annotation serves as the pivot point for each beat. For each R peak, we cut the continuous signal from position R-150 to position R+149, as shown in Fig. 1, which yields a beat 300 samples wide.
Fig. 1: Cutoff technique used in this transformation process.
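The cutoff described above can be sketched in a few lines of Python. The function name and the toy signal are hypothetical, and skipping beats whose window would extend past either end of the record is an assumption; the text does not say how such beats are handled.

```python
def extract_beats(signal, r_peaks, before=150, after=149):
    """Cut one fixed-width beat around each annotated R-peak index.

    Each beat spans positions R-before .. R+after inclusive, i.e.
    before + after + 1 = 300 samples with the default values.
    """
    beats = []
    for r in r_peaks:
        start, end = r - before, r + after + 1   # end is exclusive
        if start < 0 or end > len(signal):
            continue  # assumption: drop beats that do not fit in the record
        beats.append(signal[start:end])
    return beats


# Hypothetical signal and R-peak positions, for illustration only.
signal = list(range(1000))
beats = extract_beats(signal, r_peaks=[100, 300, 900])
```

Only the middle R peak yields a complete beat here, and the R peak itself sits at index 150 of the extracted 300-sample window.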
III. FEATURE EXTRACTION
As part of a pattern recognition system, the features are important for making the classification process work well. Good features lead to better results, whereas inappropriate features lead to poor results. There are many ways to perform feature extraction; in this step, we use the discrete wavelet transform to extract the features contained in each individual beat signal. The Wavelet Transform (WT) of a signal f(x) is defined as:
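$$W_s f(x) = f(x) * \Psi_s(x) = \frac{1}{s} \int_{-\infty}^{+\infty} f(t)\, \Psi\!\left(\frac{x-t}{s}\right) dt \tag{1}$$
where $s$ is the scale factor and $\Psi_s(x) = \frac{1}{s}\Psi\!\left(\frac{x}{s}\right)$ is the dilation of a basic wavelet $\Psi(x)$ by the scale factor $s$. Let $s = 2^j$ ($j \in \mathbb{Z}$, where $\mathbb{Z}$ is the set of integers); then the WT is called the dyadic WT [21]. The dyadic WT of a digital signal $f(n)$ can be calculated with the Mallat algorithm as follows:
$$S_{2^j} f(n) = \sum_{k \in \mathbb{Z}} h_k\, S_{2^{j-1}} f(n - 2^{j-1} k) \tag{2}$$
$$W_{2^j} f(n) = \sum_{k \in \mathbb{Z}} g_k\, S_{2^{j-1}} f(n - 2^{j-1} k) \tag{3}$$
where $S_{2^j}$ is the smoothing operator and $S_{2^j} f(n) = a_j$; $a_j$ are the low-frequency coefficients, which approximate the original signal, while $W_{2^j} f(n) = d_j$, where $d_j$ are the high-frequency coefficients, which give the detail of the original signal [22].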
In wavelet theory, selecting an appropriate mother wavelet and number of decomposition levels is important. A proper selection aims to retain the important information in the wavelet coefficients. The mother wavelet used in this research is a member of the Daubechies family, Daubechies order 8 (db8), following the research of Senhadji [28], who concluded that the Daubechies wavelet provides the best performance. Throughout this research, we decompose the individual beats from level 1 up to level 5. Thus, each individual beat is decomposed into the details d1 ··· d5 and one of the approximations a1 ··· a5, depending on the level chosen.
From all the information generated by the decomposition process, for example a4, d1 ··· d4 for decomposition at level 4, we choose the coefficients that represent the signal well. For each individual beat, the detail d1 is usually noise and has to be eliminated, while d2, d3 and d4 represent the high-frequency coefficients of the signal. Since a4 represents the approximation of the signal, it contains the main features of the signal, so we choose a4 as the feature for each individual beat.
Each individual beat has 300 samples; after decomposition with the db8 wavelet at level 4, a4 contains 32 points. Fig. 2(a) shows the original signal of an RBBB-type beat, and Fig. 2(b) shows the wavelet coefficients of that signal after decomposition at each level, the last one being level 4.
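The coefficient counts quoted for each level can be checked with a small Python calculation. It assumes the usual length rule for a boundary-extended discrete wavelet transform, floor((n + L - 1) / 2) per level, where L = 16 is the filter length of db8; the function name is hypothetical, and the padding mode actually used in this work is not stated.

```python
def dwt_lengths(n_samples, filter_len, levels):
    """Number of approximation coefficients after each decomposition level.

    Assumes a padded DWT in which one level maps n samples to
    floor((n + filter_len - 1) / 2) coefficients.
    """
    lengths = []
    n = n_samples
    for _ in range(levels):
        n = (n + filter_len - 1) // 2
        lengths.append(n)
    return lengths


# db8 has 16 filter taps; each beat has 300 samples.
lengths = dwt_lengths(300, 16, 4)
# → [157, 86, 50, 32]
```

Under this assumption a 300-sample beat gives 157, 86, 50 and 32 coefficients at levels 1 to 4, so a4 has 32 points.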
IV. CLASSIFICATION TECHNIQUE
A. Back Propagation
Back propagation, or propagation of error, is a common method in artificial neural networks. It was first described by Arthur E. Bryson and Yu-Chi Ho in 1969. Since 1986, through the work of David E. Rumelhart, Geoffrey E. Hinton and Ronald J. Williams, it has gained recognition, and it led to a renaissance in the field of artificial neural network research.
It is a supervised learning method and an implementation of the Delta rule. It requires a teacher that knows, or can calculate, the desired output for any given input. It is most useful for feed-forward networks (networks that have no feedback, or simply, that have no connections that loop). The term is an abbreviation for “backwards propagation of errors”. Back propagation requires that the activation function used by the artificial neurons (or “nodes”) is differentiable [27].
Fig. 2: (a) Original RBBB signal, 300 samples; (b) wavelet coefficients of the RBBB beat after each level of decomposition using db8: level 1, 157 points; level 2, 86 points; level 3, 50 points; and level 4, 32 points (left to right, respectively).
This method consists of two phases: a feed-forward phase, as in the perceptron, and an error back-propagation phase. One thing that distinguishes back propagation from the perceptron is the network architecture. The perceptron is a single-layer network, whereas back propagation uses a multilayer network; here, a multilayer perceptron (MLP) with one hidden layer.
The steps of the back propagation method are as follows:
1) Feed forward process:
• Determining the value of the inputs in the input layer. The value at each input node is obtained from the value of each pixel in the image to be recognized.
• Determining the value of the input to the hidden layer nodes. The input value of each hidden node is obtained by summing the weighted input signals.
• Determining the value of the hidden layer nodes. Each hidden node value is obtained by applying the activation function.
• Determining the value of the input to the output layer nodes.
• Determining the value of the output layer nodes. Each output node value is obtained by applying its activation function.
2) The process of back propagation of error:
• Determining the error of the weights between the output layer and the hidden layer. This error is used to calculate the correction of the weights and biases between the hidden layer and the output layer.
• Determining the error of the weights between the hidden layer and the input layer. This error is used to calculate the correction of the weights and biases between the input layer and the hidden layer.
• Changing the value of each weight and bias. Each weight is updated by adding the weight correction to its old value.
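The steps above can be sketched as a small one-hidden-layer network in plain Python. This is an illustrative sketch only: the class name, the sigmoid activation, the squared-error measure, the learning rate, the layer sizes and the toy OR data are all assumptions, and it is not the network configuration used in this work.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One-hidden-layer network trained by back propagation (illustrative)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Each row holds the weights of one node; the last entry is its bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                      for _ in range(n_out)]

    def forward(self, x):
        # Feed forward: weighted sum, then activation, layer by layer.
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
                  for row in self.w_hidden]
        output = [sigmoid(sum(w * v for w, v in zip(row, hidden + [1.0])))
                  for row in self.w_out]
        return hidden, output

    def train_step(self, x, target, lr=0.5):
        hidden, output = self.forward(x)
        # Error terms at the output layer (sigmoid derivative is o * (1 - o)).
        delta_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, output)]
        # Error terms propagated back to the hidden layer.
        delta_hidden = [h * (1.0 - h) *
                        sum(d * self.w_out[k][j] for k, d in enumerate(delta_out))
                        for j, h in enumerate(hidden)]
        # Update: new weight = old weight + correction.
        for k, d in enumerate(delta_out):
            for j, v in enumerate(hidden + [1.0]):
                self.w_out[k][j] += lr * d * v
        for j, d in enumerate(delta_hidden):
            for i, v in enumerate(x + [1.0]):
                self.w_hidden[j][i] += lr * d * v
        # Squared error of this sample, measured before the update.
        return sum((t - o) ** 2 for t, o in zip(target, output)) / 2.0


# Hypothetical toy problem (logical OR), not ECG data.
data = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
        ([1.0, 0.0], [1.0]), ([1.0, 1.0], [1.0])]
net = TinyMLP(n_in=2, n_hidden=3, n_out=1)
first_error = sum(net.train_step(x, t) for x, t in data)
for _ in range(2000):
    last_error = sum(net.train_step(x, t) for x, t in data)
```

The hidden-layer error terms are computed before any weight is changed, so both layers are corrected using the same forward pass, as the list of steps implies.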
B. Fuzzy Neuro Learning Vector Quantization (FNLVQ)
FNLVQ is derived from LVQ and extends it using fuzzy theory. In this method, the activation of a neuron is expressed in terms of fuzzy numbers to deal with the fuzziness caused by statistical measurement error. All components of the reference and input vectors are fuzzified as normalized triangular fuzzy numbers, whose maximum membership value is equal to 1. A normalized triangular fuzzy number is designated as [25], [26]:
$$F = (f, f_l, f_r) \tag{4}$$
where $f$ is the center-peak position of $F$, $f_l$ is the left-part fuzziness and $f_r$ is the right-part fuzziness. Fuzziness is expressed by the skirt width of the membership function. For the ECG heart beat signal, we obtain the membership function by grouping the training data into five groups; for each group we take the minimum column value as $f_l$, the mean column value as $f$ and the maximum column value as $f_r$. A triangular fuzzy number is shown in Fig. 3. A triangular fuzzy number represents a fuzzy membership function whose value is 1 at $f$ and 0 at $f_l$ and $f_r$.
Fig. 3: Triangular fuzzy member
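A minimal Python sketch of this fuzzification is given below. It assumes that the minimum and maximum column values are the positions at which the membership falls to 0 and that the membership is linear between those positions and the mean; the function names and the sample values are hypothetical.

```python
def fuzzify(values):
    """Build (f_l, f, f_r) for one feature column of one group.

    Assumption: f_l is the minimum, f the mean and f_r the maximum.
    """
    return min(values), sum(values) / len(values), max(values)


def triangular_membership(x, f_left, f_center, f_right):
    """Membership of x: 1 at f_center, 0 at and beyond f_left and f_right."""
    if x == f_center:
        return 1.0
    if x <= f_left or x >= f_right:
        return 0.0
    if x < f_center:
        return (x - f_left) / (f_center - f_left)
    return (f_right - x) / (f_right - f_center)


# Hypothetical values of one feature across the beats of one group.
f_l, f, f_r = fuzzify([1.0, 2.0, 6.0])          # (1.0, 3.0, 6.0)
mu_center = triangular_membership(3.0, f_l, f, f_r)
mu_left = triangular_membership(2.0, f_l, f, f_r)
mu_right = triangular_membership(4.5, f_l, f, f_r)
mu_edge = triangular_membership(1.0, f_l, f, f_r)
```

Because the mean need not lie midway between the minimum and the maximum, the resulting triangle is generally asymmetric, which is why the left and right parts are kept as separate components of F.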