CLASSIFICATION OF MEDICAL DATA BASED ON SPARSE REPRESENTATION USING DICTIONARY

We have shown that sparse representation with dictionary learning algorithms such as K-SVD and online dictionary learning (ODL) is well suited to a range of classification, clustering and retrieval tasks on different medical datasets. Content-based image indexing facilitates the automatic identification and abstraction of the visual content of an image.

Tasks involved in medical image classification and retrieval

Feature extraction

Color
Texture
Shape Retrieval
Semantics
Edge Information

Commonly encountered color spaces include RGB, HSV, CIE Lab and Luv [3]. Current CBIR systems retrieve similar images from a collection based on easily accessible low-level characteristics of images, such as shape, color and texture.
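To make the color cue concrete, the sketch below computes a quantized HSV color histogram, one common color descriptor in CBIR. It uses Python's standard `colorsys` module; the bin counts (8 hue, 2 saturation, 2 value bins) and the function name `hsv_histogram` are illustrative assumptions, not settings taken from this thesis.

```python
import colorsys

def hsv_histogram(pixels, bins=(8, 2, 2)):
    """Normalized, quantized HSV histogram of (r, g, b) pixels with channels in 0..255.

    The histogram has bins[0] * bins[1] * bins[2] entries (hypothetical bin layout).
    """
    hb, sb, vb = bins
    hist = [0.0] * (hb * sb * vb)
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Map each channel in [0, 1] to a bin index; min() keeps the value 1.0 in the last bin.
        h_i = min(int(h * hb), hb - 1)
        s_i = min(int(s * sb), sb - 1)
        v_i = min(int(v * vb), vb - 1)
        hist[(h_i * sb + s_i) * vb + v_i] += 1
    total = sum(hist)
    return [c / total for c in hist] if total else hist
```

Two images can then be compared by any histogram distance, for example the Euclidean distance between their histogram vectors.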

Indexing for retrieval and browsing

Semantic similarity refers to similarity as perceived by human vision (high-level similarity in CBIR). However, it cannot be used directly to find similarities between images in a database.

Issues addressed in this thesis

In this thesis, we propose a multi-level classification approach to address the problem of data imbalance in medical images. The extracted features are then used in the proposed dictionary learning-based clustering and sparse representation-based classification algorithms.

Organization of the thesis

Features used for representation of an image

Extraction of gray-level features
Extraction of texture features
Extraction of shape features

We start by identifying all regions of similar gray level (connected components) in the image and counting the number of pixels in each. The statistical approach uses the statistical distribution of the pixels' gray-level intensities to derive features.
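A minimal sketch of the two gray-level steps just described, assuming 4-connectivity (the connectivity is not stated here) and an image given as a list of rows of integer gray levels. `gray_level_regions` and `intensity_statistics` are hypothetical helper names, and the mean and variance are only two examples of first-order gray-level statistics.

```python
from collections import deque

def gray_level_regions(img):
    """Label 4-connected regions of identical gray level; return (level, pixel_count) pairs."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0]:
                continue
            level = img[r0][c0]
            seen[r0][c0] = True
            queue, count = deque([(r0, c0)]), 0
            while queue:  # breadth-first flood fill of one connected component
                r, c = queue.popleft()
                count += 1
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and img[nr][nc] == level):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            regions.append((level, count))
    return regions

def intensity_statistics(img):
    """First-order statistics of the gray-level distribution: mean and (population) variance."""
    values = [v for row in img for v in row]
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var
```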

Measure of similarity

Existing methods for medical image classification

Image retrieval and classification are then performed using the Euclidean distance and support vector machines. In the merging scheme, overlapping classes are detected using measures such as the correctness rate of each class, the similarity of the imaged body organs, and the misclassification rate.

Issues addressed in medical image classification

Summary

Feature extraction

In the first method, edge-based feature extraction is used to extract the edge information of medical images. Each part of each image is divided into four concentric circular regions of equal area, as shown in Fig., so that each circular region contains the same number of pixels as the other regions.

The mean and variance of each circular region are calculated and used as components of the feature vector.
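The equal-area ring scheme can be sketched as follows. Rather than computing ring radii analytically, this version sorts the pixels inside the inscribed circle of a patch by their distance from the patch center and splits them into four groups of (almost) equal size, so each ring holds the same number of pixels up to rounding. The inscribed-circle boundary, the use of NumPy and the name `ring_features` are assumptions for illustration.

```python
import numpy as np

def ring_features(patch, n_rings=4):
    """[mean_1, var_1, ..., mean_n, var_n] over n_rings concentric equal-pixel-count rings."""
    h, w = patch.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - cy, xx - cx).ravel()
    vals = patch.astype(float).ravel()
    # Keep only pixels inside the inscribed circle (assumed outer boundary of the rings).
    inside = r <= min(h, w) / 2.0
    r, vals = r[inside], vals[inside]
    order = np.argsort(r, kind="stable")  # pixels ordered from the center outwards
    feats = []
    for ring in np.array_split(vals[order], n_rings):
        feats.extend([ring.mean(), ring.var()])
    return np.array(feats)
```

Because the statistics depend only on distance from the center, they change little when the patch is rotated about its center.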

Fig. 3.2: (a) Samples of IRMA medical images. (b) Edge images of the samples in (a). (c) Images divided into patches of equal size.

Proposed method

A dictionary Di is constructed for each training class Ci using the online dictionary learning (ODL) algorithm. A test image bj is assigned to class Ci if the dictionary Di associated with class Ci gives the sparsest representation of bj among all dictionaries, as measured by the l1-norm of the sparse code.
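A sketch of the per-class dictionary decision rule, assuming the class dictionaries Di have already been learned and have unit-norm columns. Sparse codes are computed with ISTA (iterative soft thresholding), swapped in here as a simple lasso solver. Each class is scored by the lasso objective, which combines the l1-norm of the code with the reconstruction error so that a dictionary cannot win simply by returning an all-zero code; that combined score, the value of `lam` and the function names are assumptions, not details from the thesis.

```python
import numpy as np

def ista(D, b, lam=0.1, n_iter=500):
    """Approximately solve min_x 0.5*||b - D x||^2 + lam*||x||_1."""
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the gradient of the quadratic term
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = x - D.T @ (D @ x - b) / L  # gradient step on the quadratic term
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft thresholding (l1 prox)
    return x

def classify(b, dictionaries, lam=0.1):
    """Index i of the class dictionary Di that gives the lowest lasso objective for b."""
    scores = []
    for D in dictionaries:
        x = ista(D, b, lam)
        residual = b - D @ x
        scores.append(0.5 * residual @ residual + lam * np.abs(x).sum())
    return int(np.argmin(scores))
```

In use, `dictionaries` would hold one learned dictionary per training class Ci, in class order.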

Experimental Results

Furthermore, kernel SVMs are explored with different kernel types, namely linear, polynomial, RBF and sigmoid. If a prediction is counted as correct when the true class is among the top three matches, the performance increases to 97.9%. In [56], an evaluation on a dataset of 1500 images from the IRMA database achieved a classification rate of 97.5% in a 17-class classification problem.

The classes are separated based on the imaging angle and anatomical region, and an accuracy of 82.87% was achieved.
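The four SVM kernels named above and the top-three evaluation rule can be written down directly. The parameterization below (gamma, coef0, degree) follows the convention used by common SVM libraries such as LIBSVM; the default values are illustrative only, and `top_k_accuracy` is a hypothetical helper that counts a sample as correct when its true class is among the k highest-scoring classes.

```python
import math

# Kernel functions k(x, y) for feature vectors given as sequences of floats.
def linear(x, y):
    return sum(a * b for a, b in zip(x, y))

def polynomial(x, y, gamma=1.0, coef0=1.0, degree=3):
    return (gamma * linear(x, y) + coef0) ** degree

def rbf(x, y, gamma=0.5):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def sigmoid(x, y, gamma=0.01, coef0=0.0):
    return math.tanh(gamma * linear(x, y) + coef0)

def top_k_accuracy(scores, labels, k=3):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for s, y in zip(scores, labels):
        ranked = sorted(range(len(s)), key=lambda c: s[c], reverse=True)
        hits += y in ranked[:k]
    return hits / len(labels)
```

With k=1 this reduces to ordinary accuracy, so the top-three figure is always at least as high as the top-one figure.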

Fig. 3.3: Confusion matrix using (a) LDA classification (b) Bayesian classification (c) ODL classification (d) KNN classification (e) K-SVM classification (f) NN classification

Summary and Conclusions

The results on five standard UCI medical datasets demonstrate the effectiveness of the proposed multi-level classification approach. One of the problems in medical image classification is that medical datasets are often imbalanced. They are also highly susceptible to noise, and the performance of an individual classifier degrades significantly when noisy data are fed to it as input.

Online dictionary learning, one of the dictionary learning algorithms, is used together with support vector machines in the proposed method.

Table 3.1: X-ray image classes by anatomical region and imaging direction [6] (A = coronal, B = axial, C = other orientation, D = sagittal, E = rotated).

Multi-level classification approach to medical data

Feature extraction

The breast cancer dataset contains 30 continuous features computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. For the diabetes dataset, the features include age, number of times pregnant, diastolic blood pressure and body mass index. For the heart disease dataset, the features include age, gender, chest pain type, resting blood pressure, serum cholesterol, fasting blood glucose, resting electrocardiographic results, maximum heart rate achieved, exercise-induced angina, oldpeak, slope of the peak exercise ST segment and number of major vessels.

The features extracted from these datasets are then given as input to online dictionary learning to learn a dictionary for sparse representation.
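The following is a minimal sketch of online dictionary learning in the style of Mairal et al. (2009): samples are processed one at a time, the statistics A and B are accumulated, and the atoms are updated by block-coordinate descent. Assumptions: the sparse coding step uses ISTA instead of the LARS solver of the original algorithm; the regularization weight, number of epochs and initialization from random samples are illustrative; and the feature matrix is assumed to be standardized. For real use, scikit-learn's `MiniBatchDictionaryLearning` implements this family of algorithms.

```python
import numpy as np

def sparse_code_ista(D, x, lam, n_iter=100):
    """Lasso code of x over D by ISTA (a stand-in for the LARS solver of Mairal et al.)."""
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the smooth term
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - D.T @ (D @ a - x) / L
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return a

def online_dictionary_learning(X, n_atoms, lam=0.1, n_epochs=5, seed=0):
    """Learn D (n_features x n_atoms, columns of norm <= 1) from the rows of X."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Initialize the atoms with randomly chosen, normalized training samples.
    D = X[rng.choice(n, n_atoms, replace=n < n_atoms)].T.astype(float)
    D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)
    A = np.zeros((n_atoms, n_atoms))  # running sum of a a^T
    B = np.zeros((m, n_atoms))        # running sum of x a^T
    for _ in range(n_epochs):
        for i in rng.permutation(n):  # one sample at a time ("online")
            x = X[i]
            a = sparse_code_ista(D, x, lam)
            A += np.outer(a, a)
            B += np.outer(x, a)
            # Block-coordinate descent over the atoms, warm-started at the current D.
            for j in range(n_atoms):
                if A[j, j] < 1e-12:
                    continue  # atom j not used yet: leave it unchanged
                u = D[:, j] + (B[:, j] - D @ A[:, j]) / A[j, j]
                D[:, j] = u / max(np.linalg.norm(u), 1.0)
    return D
```

Because only A and B are stored, memory does not grow with the number of samples, which is the main reason to prefer the online variant on large datasets.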

On-line dictionary learning and sparsity based classification

Multi-level classification approach

Experimental results and discussion

The effectiveness of the proposed system is evaluated in terms of classification accuracy, sensitivity and specificity. The proposed method achieves a classification accuracy of 88%, the best among the single and multiple classifiers compared. On this dataset, the proposed method achieves a classification accuracy close to the state of the art.
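The three evaluation measures reduce to counts from the binary confusion matrix: accuracy = (TP + TN)/N, sensitivity = TP/(TP + FN) and specificity = TN/(TN + FP). A small sketch, in which treating label 1 as the positive (diseased) class is an assumption:

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")  # true positive rate
    specificity = tn / (tn + fp) if tn + fp else float("nan")  # true negative rate
    return accuracy, sensitivity, specificity
```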

Table 4.4: Comparison of classification performance with state-of-the-art approaches on Heart-StatLog dataset.

Table 4.2 shows the performance obtained on the WBCD data. In this table, the F-BP, F-kNN, F-SVM, F-Bayes and multi-agent classifiers give good classification results because they use a multiple-classifier technique.

Summary and Conclusions

Sparse representation

Various algorithms, such as online dictionary learning (ODL) [58], K-SVD [59] and the method of optimal directions (MOD) [60], have been developed to learn dictionaries from training data. Multiscale dictionary learning combines the advantages of generic multiscale representations with the K-SVD dictionary learning method. Experiments on modality-based medical image classification using sparse representation are discussed in detail in Section 5.3.
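MOD alternates a sparse coding step with a closed-form dictionary update. The sketch below shows only the update, D = X A^T (A A^T)^{-1}, which minimizes the Frobenius reconstruction error ||X - DA||_F^2 for fixed codes A. The tiny ridge term and the column renormalization are common practical additions assumed here, not part of the formula.

```python
import numpy as np

def mod_update(X, A, eps=1e-8):
    """MOD dictionary update.

    X: (n_features, n_samples) training signals; A: (n_atoms, n_samples) sparse codes.
    Returns D = X A^T (A A^T)^{-1} with columns rescaled to unit norm.
    """
    G = A @ A.T + eps * np.eye(A.shape[0])  # small ridge term guards against singular A A^T
    D = np.linalg.solve(G, A @ X.T).T       # solves G D^T = A X^T (G is symmetric)
    D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)
    return D
```

K-SVD differs mainly in this step: instead of refitting all atoms at once, it updates one atom and its coefficients at a time with a rank-one SVD.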

The present work provides a method for medical image classification using the multiscale dictionary learning framework.
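The multiscale features start from a wavelet decomposition into LL, LH, HL and HH sub-bands. As a sketch, a single level of the orthonormal 2-D Haar transform can be computed directly from 2x2 pixel blocks. The Haar wavelet, the single decomposition level and the LH/HL naming convention (which varies between libraries) are assumptions, since this section does not specify them.

```python
import numpy as np

def haar_dwt2(img):
    """One-level orthonormal 2-D Haar transform; returns (LL, LH, HL, HH).

    Image height and width must be even.
    """
    x = np.asarray(img, dtype=float)
    a, b = x[0::2, 0::2], x[0::2, 1::2]  # top-left, top-right pixel of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]  # bottom-left, bottom-right pixel
    ll = (a + b + c + d) / 2.0  # low-pass in both directions (approximation)
    lh = (a + b - c - d) / 2.0  # low-pass horizontally, high-pass vertically
    hl = (a - b + c - d) / 2.0  # high-pass horizontally, low-pass vertically
    hh = (a - b - c + d) / 2.0  # high-pass in both directions (diagonal detail)
    return ll, lh, hl, hh
```

Because the transform is orthonormal, the total energy of the four sub-bands equals the energy of the input image; applying `haar_dwt2` again to LL gives the next scale.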

Sparsity based medical image classification

Second, the entire dataset is represented using a fixed, small dictionary, which greatly reduces computation time. In the classification phase, each sub-image (sub-band) obtained from the test image is matched only against the dictionaries trained on the corresponding sub-band. The class that yields the maximum sparsity is chosen as the class for that sub-band.

After all the sub-images have been evaluated, the class assigned to the majority of the sub-bands is selected as the category of the test image.
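The sub-band voting step can be sketched as below. `classify_band` stands for any per-sub-band classifier, for example a sparsity-based rule that picks the class whose dictionary gives the sparsest code. Breaking ties in favor of the label produced first, in sub-band order, is an assumption; the function names are hypothetical.

```python
from collections import Counter

def majority_vote(subband_labels):
    """Class chosen by the most sub-bands; ties go to the label that appears first."""
    counts = Counter(subband_labels)
    best = max(counts.values())
    for label in subband_labels:
        if counts[label] == best:
            return label

def classify_image(subbands, dictionaries_by_band, classify_band):
    """subbands: {band: features}; dictionaries_by_band: {band: per-class dictionaries}.

    Each sub-band is classified only with the dictionaries trained on that same sub-band.
    """
    votes = [classify_band(feat, dictionaries_by_band[band]) for band, feat in subbands.items()]
    return majority_vote(votes)
```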

Experimental Results

The classification accuracies for different combinations of the gradient vectors extracted from the four subbands are shown in Table 5.3. It can be observed that the LL subband contains the most information of the four subbands. The classification accuracy based on the gradient vectors extracted from the LH, HL, and HH subbands was 73.8%, respectively.

Different combinations of subbands were tried, and the best classification accuracy of 91.6% was achieved by combining the dictionaries from all the subbands.

Table 5.1: Classification accuracy (%) of multi-scale dictionary learning method using wavelet decomposition based features and different dictionary sizes.

Summary and Conclusions

Feature Extraction

Classification

Time-domain features include the RR-interval features, QRS duration, QR duration, RS duration and T-wave duration, and the energies of the QRS complex, the QR segment, the RS segment and the T wave. The RR-interval features include the pre-RR interval, the post-RR interval, the mean RR interval and the local mean RR interval. The pre-RR interval is the time between the current R-peak and the previous R-peak, and the post-RR interval is the time between the current R-peak and the next R-peak.
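The RR-interval definitions above translate directly into code. In this sketch, `r_peaks` are R-peak sample indices and `fs` is the sampling rate in Hz. The mean RR interval is taken over the whole record and the local mean RR interval over the last 10 intervals; both are assumed definitions, since the averaging windows are not given here.

```python
def rr_features(r_peaks, fs, local_window=10):
    """RR-interval features (seconds) for every beat with both a previous and a next R-peak."""
    times = [p / fs for p in r_peaks]
    rr = [t1 - t0 for t0, t1 in zip(times, times[1:])]  # rr[k] = times[k + 1] - times[k]
    mean_rr = sum(rr) / len(rr)                         # average over the whole record
    feats = []
    for i in range(1, len(times) - 1):
        pre, post = rr[i - 1], rr[i]                    # previous and next R-R gaps of beat i
        recent = rr[max(0, i - local_window):i]         # up to local_window preceding intervals
        feats.append({"pre_rr": pre, "post_rr": post, "mean_rr": mean_rr,
                      "local_rr": sum(recent) / len(recent)})
    return feats
```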

The morphological features of the ECG are computed from fixed-length intervals of the QRS complex and the T wave of each heartbeat cycle.

Fig. 6.1: Cardiac cycle of a typical heartbeat represented by the P-QRS-T waveform.

Experimental Results

In this chapter, we develop an approach to classify normal and abnormal heartbeats using adaptive learning. The classification approach based on adaptive learning improves the classification accuracy compared to the one-time classification approach. The AdaBoost method uses a weighted voting technique in which the weight assigned to each classifier depends on its error on the training set.
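For binary AdaBoost, the weighted voting described above uses the classifier weight alpha = 0.5 ln((1 - e)/e), where e is the classifier's weighted training error, and misclassified samples are up-weighted before the next classifier is trained. A minimal sketch of these three pieces, assuming labels in {-1, +1}; the exact boosting variant used in the thesis is not specified in this section.

```python
import math

def adaboost_weight(error, eps=1e-10):
    """Vote weight alpha of a weak classifier with weighted training error `error`."""
    error = min(max(error, eps), 1 - eps)  # clip to avoid division by zero or log(0)
    return 0.5 * math.log((1 - error) / error)

def update_sample_weights(weights, correct, alpha):
    """Scale misclassified samples by e^alpha, correct ones by e^-alpha, then renormalize."""
    new = [w * math.exp(-alpha if c else alpha) for w, c in zip(weights, correct)]
    total = sum(new)
    return [w / total for w in new]

def weighted_vote(predictions, alphas):
    """Combine {-1, +1} predictions of several classifiers using their AdaBoost weights."""
    score = sum(a * p for a, p in zip(alphas, predictions))
    return 1 if score >= 0 else -1
```

A classifier with error 0.5 gets zero weight, so it does not influence the vote; after the update, the misclassified samples carry half of the total weight.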

The second approach classifies only supraventricular ectopic beats (SVEB) and ventricular ectopic beats (VEB) using different classifiers, with and without the adaptive learning mechanism.

Table 6.1: Comparison of classification performance (%) using individual classifiers without adaptive learning.

Summary and Conclusions

The problem of searching a large image repository for images that are similar in content to a query image is called content-based image retrieval (CBIR) [53]. Digital image retrieval techniques are critical in the emerging field of medical image databases for clinical decision making. In this chapter, we address the above-mentioned issues in the proposed content-based medical image retrieval (CBMIR) method.

In this chapter, we propose a content-based medical image retrieval (CBMIR) algorithm using a dictionary learning approach.

CBMIR using Dictionary Learning

Feature extraction

In the first feature extraction method, an image is divided into concentric circular regions of equal area to obtain a rotation-invariant representation, as shown in Fig. In the second feature extraction method, an image is divided into four blocks, resulting in four sub-images, as shown in Figs. Each sub-image is divided into concentric circular regions of equal area, from which the mean and variance of the pixel intensity values are calculated.

This feature extraction method is better suited to medical image databases because much of the information in medical images is concentrated in the center of the image.
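The second feature extraction method can be sketched by combining a quadrant split with equal-pixel-count ring statistics. Rings are formed by sorting the pixels inside each block's inscribed circle by their distance from the block center and splitting them into equally sized groups; this construction, the use of NumPy and the function names are assumptions for illustration.

```python
import numpy as np

def equal_area_ring_stats(block, n_rings=4):
    """Mean and variance in n_rings concentric rings holding (almost) equal pixel counts."""
    h, w = block.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - (h - 1) / 2.0, xx - (w - 1) / 2.0).ravel()
    vals = block.astype(float).ravel()
    inside = r <= min(h, w) / 2.0  # pixels inside the inscribed circle
    order = np.argsort(r[inside], kind="stable")  # from the center outwards
    feats = []
    for ring in np.array_split(vals[inside][order], n_rings):
        feats.extend([ring.mean(), ring.var()])
    return feats

def quadrant_ring_features(img, n_rings=4):
    """Split the image into four blocks, then compute ring statistics for each block."""
    h, w = img.shape
    h2, w2 = h // 2, w // 2
    blocks = [img[:h2, :w2], img[:h2, w2:], img[h2:, :w2], img[h2:, w2:]]
    return np.array([f for b in blocks for f in equal_area_ring_stats(b, n_rings)])
```

With four rings and two statistics per ring, each image yields a 32-dimensional feature vector.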

Proposed Method

The cluster assignment and dictionary update steps are repeated until there is no significant change in the clusters Ci. For retrieval, the dictionary most relevant to the query image is identified first; the images in the cluster associated with this dictionary are then compared with the query using a similarity measure to retrieve the images most similar to it.
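The alternating cluster-assignment and dictionary-update loop can be sketched as follows. Each sample is assigned to the cluster whose dictionary reconstructs it with the smallest error from a k-sparse code (computed by orthogonal matching pursuit), and each cluster's dictionary is then refit. Assumptions: the dictionary update is a single MOD step rather than the online dictionary learning used in the thesis, the loop stops when assignments stop changing (a strict reading of "no significant change"), and all sizes and names are illustrative.

```python
import numpy as np

def omp(D, x, k):
    """Orthogonal matching pursuit: k-sparse code of x over D (unit-norm columns)."""
    support, residual = [], x.copy()
    code = np.zeros(D.shape[1])
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))  # atom most correlated with the residual
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    code[support] = coef
    return code

def update_dictionary(Xc, D, k):
    """Sparse coding plus one MOD step on one cluster's samples (columns of Xc)."""
    A = np.stack([omp(D, x, k) for x in Xc.T], axis=1)  # (n_atoms, n_members)
    G = A @ A.T + 1e-8 * np.eye(A.shape[0])
    D_new = np.linalg.solve(G, A @ Xc.T).T
    unused = np.linalg.norm(D_new, axis=0) < 1e-12
    D_new[:, unused] = D[:, unused]  # keep atoms that no sample used
    return D_new / np.maximum(np.linalg.norm(D_new, axis=0), 1e-12)

def dictionary_clustering(X, n_clusters, n_atoms=4, k=2, n_iter=20, seed=0):
    """Cluster the rows of X; returns (labels, per-cluster dictionaries)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    dicts = []
    for _ in range(n_clusters):  # initialize with random, normalized samples
        D = X[rng.choice(n, n_atoms, replace=False)].T.astype(float)
        dicts.append(D / np.maximum(np.linalg.norm(D, axis=0), 1e-12))
    labels = np.full(n, -1)
    for _ in range(n_iter):
        errors = np.array([[np.linalg.norm(x - D @ omp(D, x, k)) for D in dicts] for x in X])
        new_labels = errors.argmin(axis=1)  # cluster assignment step
        if np.array_equal(new_labels, labels):
            break  # no change in cluster membership
        labels = new_labels
        for c in range(n_clusters):  # dictionary update step
            members = X[labels == c]
            if len(members):
                dicts[c] = update_dictionary(members.T, dicts[c], k)
    return labels, dicts
```

At query time, the same reconstruction error can be used to pick the most relevant dictionary, and hence the cluster to search.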

The related images (search results) within the cluster are identified based on the distance criterion.
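Within the selected cluster, retrieval reduces to ranking images by the distance between their feature vectors and the query's. A sketch supporting the Euclidean and Mahalanobis distances used in the experiments; estimating the covariance from the cluster's own features when none is supplied, and using a pseudo-inverse for numerical stability, are assumptions.

```python
import numpy as np

def retrieve(query, cluster_features, top_n=5, metric="euclidean", cov=None):
    """Indices of the top_n images in the cluster closest to the query, nearest first.

    cluster_features: (n_images, n_features); query: (n_features,).
    """
    diff = cluster_features - query
    if metric == "euclidean":
        d = np.sqrt(np.sum(diff ** 2, axis=1))
    elif metric == "mahalanobis":
        if cov is None:
            cov = np.cov(cluster_features, rowvar=False)  # assumed covariance estimate
        inv = np.linalg.pinv(cov)
        quad = np.einsum("ij,jk,ik->i", diff, inv, diff)  # diff_i^T inv diff_i per image
        d = np.sqrt(np.maximum(quad, 0.0))                # clip tiny negative round-off
    else:
        raise ValueError(f"unknown metric: {metric}")
    return np.argsort(d, kind="stable")[:top_n]
```

With an identity covariance the Mahalanobis distance reduces to the Euclidean distance; with an estimated covariance it down-weights feature directions with high variance.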

Experimental Results

Database Description and Results

The best performance of fuzzy C-means clustering is 74.8% precision and 60% recall, and that of K-means clustering is 62.6% precision and 48% recall. It can be seen from Tables 7.3 and 7.4 that the proposed method with the second feature extraction method (93.7% precision and 83.2% recall) performs better than the fuzzy C-means and K-means clustering algorithms. From the results in Tables 7.5 and 7.6, it can be concluded that the proposed method also performs better (62.8% precision and 47.2% recall) than the fuzzy C-means and K-means clustering methods.

Table 7.2: Performance measure (%) of the proposed, fuzzy C-means and K-means clustering methods obtained with the second feature extraction method and the Euclidean distance as similarity measure.

Table 7.1: Performance measure (%) of the proposed, fuzzy C-means and K-means clustering methods obtained with the first feature extraction method and the Euclidean distance as similarity measure.

Existing medical image search and retrieval techniques are not very efficient in terms of search time and accuracy, because most existing medical image search tools use text-based image retrieval techniques. Images acquired with different modalities show significant contrast variation, even between images of the same organ or body part. Wavelet features extracted from an image provide useful discriminative information for medical image classification.

We addressed the problem of searching for relevant information in large medical image databases through content-based medical image retrieval.

Table 7.6: Performance measure (%) of the proposed, fuzzy C-means and K-means clustering methods obtained with the second feature extraction method and the Mahalanobis distance as similarity measure.

Contributions of the work

Adaptive dictionary learning-based classification is used to classify normal and abnormal heartbeat patterns from an ECG database. We also proposed a method for clustering medical data based on sparse representation using dictionary learning. A method was also proposed for classifying medical images by acquisition source (modality), using multiscale wavelet representations and online dictionary learning.

A novel clustering method for medical image retrieval based on sparse representation and dictionary learning was proposed.

Directions for future research

Mehrotra, "Hybrid Trees: An Index Structure for High-Dimensional Feature Spaces," in Data Engineering, 1999.
Jain, "Content-Based Image Retrieval at the End of the Early Years," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.
Shao, "Breast mass identification based on multi-agent interactive information fusion method," in Bioinformatics and Biomedical Engineering, 2009.
Wang, "Exploration of Bayesian methods from a small sample high-dimensional data set in poison identification," in 4th IEEE International Conference on Software Engineering and Service Science (ICSESS), 2013, p.