*Corresponding author: Department of Computer Science & Engineering, Rajshahi University of Engineering & Technology, Rajshahi-6204, Bangladesh. E-mail address: [email protected] (Abu Sayeed)
A Comparative Analysis on the Task of Classification for Remote Sensing Hyperspectral Data
Abu Sayeed*, Md. Ali Hossain, and Md. Rabiul Islam
Department of Computer Science & Engineering, Rajshahi University of Engineering & Technology, Rajshahi-6204, Bangladesh
ARTICLE INFORMATION
Received: 30 Jan 2019; Revised: 24 April 2019; Accepted: 15 May 2019
ABSTRACT
The improved spectral information of hyperspectral images makes them suitable for ground object identification after an effective classification. However, the classification task becomes very challenging when the number of input dimensions is high compared to the limited number of training samples: the classification accuracy on test data drops as the ratio of training samples to input dimensions becomes low. Kernel Support Vector Machines (KSVM) are studied in this research because of their ability to generate the best separating plane between the classes of interest in a new high-dimensional space. This paper presents a performance comparison between KSVM and the Maximum Likelihood Classifier (MLC) on a detected subspace, in terms of classification accuracy for linearly inseparable classes. Experiments performed on a NASA Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) image show that KSVM outperforms MLC and obtains the highest accuracy of 100%, as it is more robust against outliers than MLC.
Keywords: Image Classification; Feature Reduction; Support Vector Machine; Maximum Likelihood Classifier
1. Introduction
Earlier, scientists and geologists obtained land cover information through manual surveying, observing places in person. This method was applied across large regions of the earth for many decades, but it later proved inefficient and impractical in some situations, for example when large or remote areas had to be covered. Manual surveying of such areas requires a great deal of time and is expensive.
Aircraft photography came next, in which researchers captured land cover information with a camera mounted on an aircraft. This allowed land cover information to be obtained and recorded in a much shorter time. However, the method was found to be weather-dependent to some extent, reflected in the increasing number of air accidents. With the development of modern space technologies, remote sensing satellites took the place of aerial photography: satellites capture land cover information using sensors mounted on them. This technology creates a far greater opportunity than aircraft photography, as a large amount of land cover information can be acquired continuously, globally and at a relatively low cost.
This technology makes the whole operation of monitoring land cover efficient, and several places can be covered and observed at a time [1,2,3]. As a result, it helps to effectively detect deviations or changes in land cover that have taken place over a particular time period. Changes in land cover over time, which are connected with natural conditions and human activities, are a significant indication for accurate decision making, particularly in relation to the environment, agriculture and urbanization [4,5,7,8].
The high-dimensional nature of hyperspectral data introduces important limitations in supervised classification: the limited availability of training samples reduces the overall classification accuracy. This low-accuracy problem is a symptom of the curse of dimensionality. The solution is dimensionality reduction, which may be achieved through feature selection or feature extraction. Popular feature extraction techniques include PCA, KPCA, LDA and KLFDA, while feature selection techniques include SNR, BD and MIFS [9].
Figure 1: Performance of KSVM on all 220 PCA images
Kernel SVM rests on statistical learning theory, and the aim of the SVM is to locate the decision boundary that creates the optimal separation of classes. If a pattern recognition problem has two classes that are not linearly separable, an SVM with an RBF kernel selects, from among the infinite number of possible decision boundaries, the one that minimizes the generalization error. The chosen decision boundary is the one that leaves the largest margin between the two classes, where the margin is calculated as the sum of the distances to the hyper-plane from the closest points of the two classes. The problem of maximizing the margin can be solved by standard quadratic programming. The data points closest to the hyper-plane are used to measure the margin and are therefore termed 'support vectors'; consequently, the number of support vectors is small (Vapnik, 1995).

For identifying global ecological or environmental changes, basic assumptions have to be made about land use and land cover. Land use or land cover mapping is an essential component, consisting of several parameters that are merged together on the basis of the requirements. Land cover refers to cultivated land, buildings, water bodies, natural vegetation, rock/soil, fallow land, artificial cover, glacial cover and other elements observed on the land [1]. The structure of different land conditions can be predicted through the estimation of the pixels and the characteristics of the model images used in formulating their types and regions. Several classification algorithms have previously been tried for predicting the nature of the classes present in such images, but SVM and MLC are the two most prominent. MLC is a parametric approach that rests on the assumption of normally distributed data for each class and for the entire selected set. The Maximum Likelihood Classifier (MLC) has been implemented here as the benchmark for comparing classification accuracy, since most of the classes of the AVIRIS dataset are normally distributed. MLC has been considered a reasonably effective technique for thematic mapping from hyperspectral imagery, but in reality the behavior of the distribution is unknown; in such cases it is more effective to use non-parametric classifiers, because they make no distributional assumptions. MLC has been adopted in this comparison to allow the classification accuracies to be compared and to validate the accuracy of the non-parametric classifier for classifying the land cover classes.
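To make the margin idea concrete, the following minimal Python sketch (illustrative toy data and parameters, not the experimental setup of this paper) fits a linear SVM and reads off the support vectors and the margin width 2/||w||:

```python
# Minimal sketch: margin maximization with a linear SVM on toy 2-D data.
# The data and parameter values are illustrative, not the paper's setup.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
class_a = rng.normal(loc=[-2, -2], scale=0.5, size=(20, 2))
class_b = rng.normal(loc=[2, 2], scale=0.5, size=(20, 2))
X = np.vstack([class_a, class_b])
y = np.array([0] * 20 + [1] * 20)

svm = SVC(kernel="linear", C=1.0).fit(X, y)

# The margin width is 2/||w||, where w is the normal of the separating hyper-plane;
# only the support vectors determine it.
w = svm.coef_[0]
print("number of support vectors:", len(svm.support_vectors_))
print("margin width:", 2.0 / np.linalg.norm(w))
```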
2. Classification techniques
Two classification techniques are selected in this research to classify the hyperspectral images.
2.1. Maximum Likelihood Classifier (MLC)
The Maximum Likelihood Classifier (MLC) implements a parametric classification method based on the assumption of normally distributed data for each class and for the entire selected set of classes. For classifying land cover classes, a number of studies have used MLC as a standard against which to compare the classification accuracy of newly developed classifiers [7] such as Kernel SVM and the Spectral Angle Mapper (SAM). It is fundamentally a supervised classification technique, and its mathematical formulation can be obtained from Bayes' theorem, which gives the a posteriori probability L(k|φ), i.e. the probability that a pixel with feature vector φ belongs to class k [8]:
L(k|φ) = L(φ|k) L(k) / L(φ)     (1)

where L(φ|k) is the likelihood function, which states the probability of observing feature vector φ when the pixel's class k is known; L(k) is the a priori probability that class k occurs in the study area; and L(φ) is the probability of observing φ, which acts as a normalization constant ensuring that Σ_k L(k|φ) sums to 1:

L(φ) = Σ_{k=1}^{N} L(φ|k) L(k)     (2)

where N is the total number of classes. A pixel with feature vector φ is assigned to class k by the rule:

φ ∈ k   if   L(k|φ) > L(j|φ)   for all j ≠ k     (3)

That is, if the class probability L(k|φ) is greater than the probability L(j|φ) of every other class j for the given feature vector, the pixel is assigned to class k. MLC assumes that the distribution of the data within a given class k follows a multivariate Gaussian distribution [9]. It is then convenient to work with the log-likelihood (or discriminant function):
f_i(φ) = ln L(φ|i) = −(1/2)(φ − μ_i)^T C_i^{−1}(φ − μ_i) − (N/2) ln(2π) − (1/2) ln|C_i|     (4)

where μ_i and C_i are the mean vector and covariance matrix of class i, and N here denotes the dimensionality of the feature vector. Since the logarithm is a monotonic function, Equation (3) is equivalent to [10]:
φ ∈ k   if   f_k(φ) > f_j(φ)   for all j ≠ k     (5)

Each pixel is assigned to the class with the maximum likelihood; if the maximum probability lies below a threshold set by the researcher, the pixel is marked as unclassified [11].
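The discriminant of Equations (4)-(5) can be sketched in a few lines of Python; the toy data, function names and library choice here are illustrative assumptions, not the implementation used in this paper:

```python
# Minimal sketch of the MLC discriminant of Equations (4)-(5).
# Toy data and variable names are illustrative only.
import numpy as np

def fit_mlc(X_train, y_train):
    """Estimate the per-class mean vector mu_k and covariance matrix C_k."""
    stats = {}
    for k in np.unique(y_train):
        Xk = X_train[y_train == k]
        stats[k] = (Xk.mean(axis=0), np.cov(Xk, rowvar=False))
    return stats

def mlc_predict(X, stats):
    """Assign each pixel to the class with the largest log-likelihood f_k."""
    n_features = X.shape[1]
    scores = []
    for k, (mu, C) in sorted(stats.items()):
        diff = X - mu
        C_inv = np.linalg.inv(C)
        # f_k = -0.5 (x-mu)^T C^-1 (x-mu) - (N/2) ln(2*pi) - 0.5 ln|C|
        quad = np.einsum("ij,jk,ik->i", diff, C_inv, diff)
        f_k = -0.5 * quad - 0.5 * n_features * np.log(2 * np.pi) \
              - 0.5 * np.log(np.linalg.det(C))
        scores.append(f_k)
    classes = np.array(sorted(stats.keys()))
    return classes[np.argmax(np.stack(scores, axis=1), axis=1)]

# Tiny usage example with two Gaussian classes.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(3, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)
print(mlc_predict(X[:5], fit_mlc(X, y)))
```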
2.2. Kernel Support Vector Machine (KSVM)
This classification considers separate data sets, from which training and testing sets are obtained. Each training sample carries a target value and a set of characteristics, and the goal of the SVM is to predict the target values of the given test data. The nonlinear SVM only requires the inner products of the mapped samples; as a result, the difficulty of determining the mapping function Θ, and the cost of computing the mapped samples and their inner products, can be reduced. Several typical and popular kernels are the radial basis function, the linear kernel and the polynomial kernel. Although the SVM was initially introduced for two-class problems, it has been extended to deal with multi-class classification, based either on combining decision results from multiple two-class classifications or on direct multi-class optimization [12,13]. In the first phase, the SVM is used for initial training and classification; in the second phase, the obtained results are used for probability-based modeling. The open-source library LibSVM is used for initial training and classification of the dataset, and the Gaussian radial basis function (RBF) kernel has been tested. All the data are normalized to (−1, 1) before the SVM is applied. The best group of parameters, including the cost and the gamma value, is determined through 10-fold cross-validation, and the optimal parameters are then used for classification.

It is common that a linear hyper-plane is unable to separate classes without misclassification. In such cases, a nonlinear separating hyper-plane may be able to separate the classes: with a nonlinear transformation function, the data are mapped into a high-dimensional space, where they are spread out and a linear separating hyper-plane may be found. This is the principle of Cover's theorem on the separability of patterns. Figure 2 illustrates the case in which a linear separating hyper-plane cannot separate two classes in the input space, but can be obtained once the data are mapped into the higher-dimensional feature space [14].
Now let Θ be a nonlinear transformation function that maps the data into a high-dimensional feature space, and let L be a kernel function such that:

L(Y_i, Y_j) ≡ Θ(Y_i) · Θ(Y_j)     (6)

The kernel function L replaces the explicit form of Θ and the dot product of the transformed vectors; using the kernel function is therefore less computationally intensive [15-17]. The construction of a kernel function from a dot product is a special case of Mercer's theorem (Mercer, 1909; Schölkopf & Smola, 2002).
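As a minimal illustration of Equation (6), the RBF kernel value used later in this paper can be computed directly from the input vectors, without ever forming Θ explicitly; the data below are toy values:

```python
# Sketch of Equation (6): the RBF kernel K(y_i, y_j) = exp(-gamma * ||y_i - y_j||^2)
# is computed directly from the inputs; the mapping Theta is never formed explicitly.
import numpy as np

def rbf_kernel(yi, yj, gamma=0.5):
    """Kernel value equivalent to the dot product Theta(yi) . Theta(yj)."""
    return np.exp(-gamma * np.sum((yi - yj) ** 2))

yi = np.array([1.0, 2.0, 0.5])
yj = np.array([0.0, 1.5, 1.0])
print(rbf_kernel(yi, yj))  # implicit inner product in a high-dimensional space
```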
Figure 2: Mapping nonlinear data into a high-dimensional feature space in which a linear separating hyper-plane can be built.
3. Methodology
The study involves three stages: data pre-processing and feature extraction, feature selection using a standard method, and classification with the Maximum Likelihood classifier and the SVM with RBF kernel, followed by a comparison of the results. The overall working flow diagram is given below:
Figure 3: Overall working flow diagram

3.1. Data Pre-Processing
We have worked with a NASA Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) image, manipulated using the MultiSpec tool, from which 8 classes were taken. Five features were taken from the image and their classification results were studied. It is therefore necessary to extract the most informative features, suitable for classifying the classes in the image, so that the classification techniques applied to the AVIRIS data set give the best possible outcome.
3.2. Feature Extraction
Our study involves the extraction of features from selected bands of the NASA AVIRIS image data. The MIFS approach has been used for this purpose [11]. MIFS was chosen because its classification accuracy improves the most as more features are added, owing to the improvement in class discrimination power [11]. This method is therefore used to extract features from the AVIRIS data set.
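A greedy MIFS-style selection can be sketched as follows; the scoring function, the redundancy weight beta and the scikit-learn estimators are assumptions of this sketch rather than the exact formulation of [11]:

```python
# Sketch of greedy MIFS-style band selection: at each step pick the band that
# maximizes I(band; class) - beta * sum of I(band; already-selected band).
# The beta value and scoring details are assumptions of this sketch.
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mifs_select(X, y, n_select=5, beta=0.5):
    relevance = mutual_info_classif(X, y)        # I(band; class labels)
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(n_select):
        scores = []
        for f in remaining:
            redundancy = sum(
                mutual_info_regression(X[:, [f]], X[:, s])[0] for s in selected
            )
            scores.append(relevance[f] - beta * redundancy)
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected

# Usage: X is an (n_pixels, n_bands) training matrix, y the class labels.
# selected_bands = mifs_select(X_train, y_train, n_select=5)
```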
[Figure 3 workflow: Input Hyperspectral Image → Data Pre-Processing & Feature Extraction → Feature Selection on New Feature Data → Resultant Subspace → Training/Testing → Supervised Classification (KSVM, MLC) → Classification Map]
3.3. Maximum Likelihood Classifier
The Maximum Likelihood Classifier (MLC) is a parametric classifier, appropriate when all the classes of the dataset are normally distributed. Owing to its suitability for normally distributed data, MLC is used here as the reference against which the classification accuracy of other classifiers is compared [12,18]. MLC can produce thematic maps from hyperspectral remote sensing images, but in reality the nature of the distribution is unknown; for this reason non-parametric classifiers, which are free from distributional assumptions, are preferable. MLC classification is conducted in this study for the sake of comparing classification accuracies and to validate the suitability of the non-parametric classifier for classifying the land cover classes [13,19].
3.4. Support Vector Machine using RBF kernel
The support vector machine with RBF kernel is an efficient classifier for hyperspectral remote sensing images, as it can handle linearly inseparable classes [14]. The classifier also works well when the classes of interest are not normally distributed; KSVM has therefore become an emerging technique for hyperspectral remote sensing image classification [15,16,20]. For a multiclass problem in a hyperspectral image, the kernel SVM solves a multiclass optimization problem built from binary SVMs, but the number of parameters to be estimated grows with the number of classes to be classified. LibSVM in MATLAB has been used for implementing the kernel SVM. Three steps are followed in the implementation stage: first, the input data are scaled to the range (0, 1); second, cross-validation is used to find the best values of c (the cost factor) and gamma for testing the AVIRIS data set, and these values of c and gamma are used to train the selected data on the bands chosen by feature selection; finally, the training accuracy is calculated with the best c and gamma values to obtain the best training accuracy on the extracted bands.
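These three steps can be sketched in Python with scikit-learn, which wraps LibSVM; the parameter grid and names below are illustrative assumptions rather than the exact values used in this study:

```python
# Sketch of the three implementation steps: scale to (0, 1), grid-search
# c and gamma with 10-fold cross-validation, then train the RBF-kernel SVM.
# The grid values and names are illustrative assumptions.
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def train_ksvm(X_train, y_train):
    # Step 1: scale each feature to the range (0, 1).
    scaler = MinMaxScaler(feature_range=(0, 1))
    X_scaled = scaler.fit_transform(X_train)

    # Step 2: 10-fold cross-validation over cost C and kernel width gamma.
    param_grid = {"C": [1, 10, 100, 1000], "gamma": [0.01, 0.1, 1, 10]}
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10)
    search.fit(X_scaled, y_train)

    # Step 3: report the best parameters and cross-validated training accuracy;
    # GridSearchCV refits the best estimator on all training data.
    print("best C/gamma:", search.best_params_,
          "CV training accuracy:", search.best_score_)
    return scaler, search.best_estimator_

# Usage: scaler, model = train_ksvm(X_train, y_train)
#        y_pred = model.predict(scaler.transform(X_test))
```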
4. Dataset Description
The hyperspectral image was captured over the Indian Pines test site in northwestern Indiana by the NASA AVIRIS sensor. There are in total 220 image bands in the dataset (wavelength 400-2500 nm, visible to near infrared), and each band contains 145 × 145 pixels. Sixteen land cover classes are present in the image [23-26]. For this experiment, eight classes are used: Soy, Woods, Corn-nottill, Soy-till, Hay, Corn, Wheat and Corn-min.
Table 1: Classes used in the AVIRIS dataset
Class Name Train Test
Soy 96 168
Woods 170 130
Corn-nottill 96 126
Hay 120 128
Soy-till 110 105
Corn 56 30
Wheat 63 84
Corn-min 90 80
Total 801 851
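For reference, the scene can be loaded in Python with the Spectral Python (SPy) package; the file name follows the dataset description above, and the path is an assumption of this sketch:

```python
# Sketch: loading the AVIRIS Indian Pines scene with the Spectral Python package.
# Assumes '92AV3C.lan' has been downloaded to the working directory.
import spectral

img = spectral.open_image("92AV3C.lan")   # ERDAS LAN format, 220 bands
data = img.load()                          # numpy array of shape (145, 145, 220)
print(data.shape)

# Flatten to (pixels, bands) for use with the classifiers described above.
X = data.reshape(-1, data.shape[-1])
```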
4.1 Accuracy Assessment
The best-fit gamma and c values (kernel width and cost parameter) have been determined through cross-validation so as to obtain the best training accuracy. The best gamma and c were then applied to the testing of the AVIRIS Indian Pines dataset.
4.2 Experimental Analysis
In this paper, NASA AVIRIS (Airborne Visible/Infrared Imaging Spectrometer) hyper-spectral data have been used for the experimental analysis. Maximum Likelihood Classification and the Support Vector Machine (RBF kernel) have been applied to the 220-band data, the PCA data (first 5 PCs) and the MIFS-selected features [28,29]. The Maximum Likelihood classifier is applied to the AVIRIS Indian Pines dataset ('92AV3C.lan). A standard feature extraction algorithm is used to find the best features; channels 193, 65, 31, 170 and 2 are selected as the best features [9,30,31]. Eight classes are used.
They are Soy, Woods, Corn-nottill, Soy-till, Hay, Corn, Wheat and Corn-min. In the experiment, the first 5 PCA band images have been selected according to the maximum variance of the data (PCA1-PCA5), and the Mutual Information Feature Selection (MIFS) technique has been applied to the AVIRIS Indian Pines image, selecting bands 193, 65, 31, 170 and 2 out of 220, as shown in Table 2. PCA1 has a higher variance than PCA2, PCA2 higher than PCA3, and so on. Kernel SVM and the Maximum Likelihood Classifier (MLC) have been applied to the PCA images; their accuracies on the PCA images are shown in Table 3, where the KSVM classifier outperforms the MLC classifier.
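The selection of the first five principal components by decreasing variance can be sketched as follows; the placeholder array stands in for the real (pixels × bands) matrix from the loading sketch above:

```python
# Sketch: project the 220 bands onto the first 5 principal components,
# ordered by decreasing explained variance (PCA1 has the highest variance).
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(145 * 145, 220)        # placeholder for the real band matrix
pca = PCA(n_components=5)
X_pca = pca.fit_transform(X)              # columns: PCA1 ... PCA5
print(pca.explained_variance_ratio_)      # PCA1 variance > PCA2 > ... > PCA5
```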
Figure 4: (a) Ground Truth Image (b) Original AVIRIS 220 band Image
Figure 5: Feature space of (a) PC1 (b) PC2 and (c) PC3 for AVIRIS data
Besides PCA, Kernel SVM and the Maximum Likelihood Classifier (MLC) were also used to measure the classification accuracy of the features obtained by the Mutual Information based Feature Selection (MIFS) technique.
Table 2: Selected features for classification
Data Set Methods Orders of selected features
AVIRIS (Indian Pines)
Feature extraction (PCA) PC: 1, 2, 3, 4, 5
Feature Selection (MIFS) MIFS: 193, 65, 31, 170, 2
Table 3: Performance of KSVM and MLC on PCA feature data (first 5 PCA features)
No. of features  KSVM Testing Accuracy (%)  MLC Testing Accuracy (%)
PCA 1 58.7 49.5
PCA 1+2 80.4 77.3
PCA 1+2+3 94.9 87.3
PCA 1+2+3+4 97.4 86.0
PCA 1+2+3+4+5 99.3 88.2
Table 4: Performance of KSVM and MLC on MIFS features
No. of features  KSVM Test Accuracy (%)  MLC Test Accuracy (%)
Band 193 60.5 59.2
Band 193+65 79.4 61.5
Band 193+65+31 98.1 68.0
Band 193+65+31+170 100 68.5
Band 193+65+31+170+2 100 68.5
Table 5: Confusion matrix of KSVM on MIFS features (training, 5 features)
Class Name  No.  Sample Pixels  Soy  Woods  Corn-nottill  Hay  Soy-till  Corn  Wheat  Corn-min
Soy 1 96 96 0 0 0 0 0 0 0
Woods 2 170 0 170 0 0 0 0 0 0
Cornnottill 3 96 0 0 96 0 0 0 0 0
Hay 4 120 0 0 0 120 0 0 0 0
Soy-till 5 110 0 0 0 0 110 0 0 0
Corn 6 56 0 0 0 0 0 56 0 0
Wheat 7 63 0 0 0 0 0 0 63 0
Corn-min 8 90 0 0 0 0 0 0 0 90
Total 801 96 170 96 120 110 56 63 90
Overall Training Class Performance (801/801) =100%
Table 6: Confusion matrix of KSVM on MIFS features (testing, 5 features)
Class Name  No.  Sample Pixels  Soy  Woods  Corn-nottill  Hay  Soy-till  Corn  Wheat  Corn-min
Soy 1 168 168 0 0 0 0 0 0 0
Woods 2 130 0 130 0 0 0 0 0 0
Cornnottill 3 126 0 0 126 0 0 0 0 0
Hay 4 128 0 0 0 128 0 0 0 0
Soy-till 5 105 0 0 0 0 105 0 0 0
Corn 6 30 0 0 0 0 0 30 0 0
Wheat 7 84 0 0 0 0 0 0 84 0
Corn-min 8 80 0 0 0 0 0 0 0 80
Total 851 168 130 126 128 105 30 84 80
Overall Testing Class Performance (851/851) = 100%
In MIFS, the MI between the original image bands and the class labels is calculated to select the output features for classification. In the experiment, image bands 193, 65, 31, 170 and 2 are selected out of 220. The KSVM classifier again outperforms MLC in terms of classification accuracy on the MIFS features, as shown in Table 4. Table 5 and Table 6 show the confusion matrices of the training and testing sample classification using KSVM, respectively. Table 7 and Table 8 show the confusion matrices of the training and testing performance of the MLC classifier, respectively, applied to the 5 MIFS features. Finally, the overall performance of KSVM and MLC is shown in Figure 6. It can be seen that KSVM provides improved classification accuracy in both cases.
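The overall class performance quoted under each confusion matrix is the number of correctly classified samples (the matrix diagonal) divided by the total number of samples; as a minimal sketch:

```python
# Sketch: overall accuracy from a confusion matrix = trace / total samples.
import numpy as np

def overall_accuracy(cm):
    return np.trace(cm) / np.sum(cm)

# e.g. for MLC training (Table 7): 715 correct out of 801 gives ~89.3%.
```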
Table 7: Confusion matrix of MLC on MIFS features (training)
Class Name  No.  Sample Pixels  Soy  Woods  Corn-nottill  Hay  Soy-till  Corn  Wheat  Corn-min
Soy 1 96 90 0 1 0 4 1 0 0
Woods 2 170 0 152 0 0 0 0 0 18
Cornnottill 3 96 1 0 91 0 3 1 0 0
Hay 4 120 0 0 0 120 0 0 0 0
Soy-till 5 110 20 0 6 0 76 5 3 0
Corn 6 56 0 0 0 0 0 55 1 0
Wheat 7 63 1 0 0 0 1 1 60 0
Corn-min 8 90 0 19 0 0 0 0 0 71
Total 801 112 171 98 120 84 63 64 89
Overall Training Class Performance (715/801) = 89.3%
Figure 6: Performance of KSVM vs. MLC on PCA image and MIFS features data.
Table 8: Confusion matrix of MLC on MIFS features (testing)
Class Name  No.  Sample Pixels  Soy  Woods  Corn-nottill  Hay  Soy-till  Corn  Wheat  Corn-min
Soy 1 168 132 0 0 0 21 3 12 0
Woods 2 130 0 98 0 0 0 0 0 32
Cornnottill 3 126 0 0 125 0 1 0 0 0
Hay 4 128 0 0 0 128 0 0 0 0
Soy-till 5 105 6 0 4 0 84 11 0 0
Corn 6 30 0 0 2 0 0 26 2 0
Wheat 7 84 0 0 0 0 4 6 74 0
Corn-min 8 80 0 12 0 0 0 0 0 68
Total 851 138 110 131 128 110 46 88 100
Overall Testing Class Performance (735 / 851 ) = 86.4%.
5. Conclusion
The experimental analysis shows that KSVM provides better results than MLC, as it has the ability to generate an optimal hyper-plane between the class of interest and the rest. It also maps the input data to a high-dimensional space where the samples are easily separable by a hyper-plane. MLC, on the other hand, depends on normally distributed data and suffers from the curse of dimensionality. The use of the RBF kernel in KSVM makes it suitable for separating class samples that are not linearly separable. Thus the KSVM with MIFS features outperforms the MLC-based approach.
References
[1] A. Singh, “Review Article Digital change detection techniques using remotely-sensed data”, International Journal of Remote Sensing, vol. 10, no. 6, pp. 989-1003, 1989
[2] K.R. Manjula, J. Singaraju and A.K. Varma, "Data preprocessing in multi-temporal remote sensing data for deforestation analysis", Global Journal of Computer Science and Technology Software & Data Engineering, vol. 13, no. 6, pp. 1-8, 2013.
[3] N.I.S. Bahari, A. Ahmad and B.M. Aboobaider, "Application of support vector machine for classification of multispectral data", 7th IGRSM International Remote Sensing & GIS Conference and Exhibition, IOP Conf. Series: Earth and Environmental Science, 2014.
[4] A. Ahmad and S. Quegan, "Haze modelling and simulation in remote sensing satellite data", Applied Mathematical Sciences, vol. 8, no. 159, pp. 7909-7921, 2014.
[5] M.F. Razali, A. Ahmad, O. Mohd and H. Sakidin, “Quantifying haze from satellite using haze optimized transformation (HOT)”, Applied Mathematical Sciences, vol. 9, no. 29, pp. 1407 – 1416, 2015.
[6] C. Huang, L.S. Davis and J.R.S. Townshend, "An assessment of support vector machines for land cover classification", International Journal of Remote Sensing, vol. 23, pp. 725-749, 2002.
[7] A. Ahmad and S. Quegan, “Analysis of Maximum Likelihood Classification”, Applied Mathematical Sciences, vol. 6, pp. 6425 – 6436, 2012.
[8] B. Schölkopf and A. J. Smola, ”Learning with kernels: support vector machines, regularization, optimization, and beyond”. Cambridge, Mass.: MIT Press, 2002.
[9] M. A. Hossain, M. Pickering and X. Jia, "Improved feature selection based on a mutual information measure for hyperspectral image classification", IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2011.
[10] D. S. Schimel and P. D. Try, "Remote sensing of the land surface for studies of global change: Models–Algorithms–Experiments", Remote Sensing of Environment, vol. 51, pp. 3-26, 1989.
[11] D. A. Landgrebe, "Signal Theory Methods in Multispectral Remote Sensing", Hoboken, NJ: John Wiley & Sons, 2003.
[12] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition", Data Mining and Knowledge Discovery, vol. 2, pp. 121–167, 1998.
[13] D.G. Brown, D. P. Lusch and K. A. Duda, Geomorphology, vol. 21, pp. 233–250, 1998.
[14] C. Chang and C.J. Lin, “LIBSVM: a library for support vector machines”, ACM Transactions on Intelligent Systems and Technology, vol. 2, 2011.
[15] M. Fauvel, J. Chanussot and J.A. Benediktsson, "Kernel principal component analysis for classification of hyperspectral remote sensing data over urban areas," EURASIP Journal on Advances in Signal Processing, vol. 2009, pp. 1–14, 2009.
[16] J.A. Richards and X. Jia, "Remote Sensing Digital Image Analysis", 4th Edition, Springer-Verlag, Berlin Heidelberg, Germany, 2006.
[17] G.F. Hughes, "On the mean accuracy of statistical pattern recognizers," IEEE Transactions on Information Theory, vol. 14, pp. 55–63, 1968.
[18] J. Yang, P. Yu and B. Kuo, "A nonparametric feature extraction and its application to nearest neighbour classification for hyperspectral image data," IEEE Transactions on Geoscience and Remote Sensing, vol. 48, pp. 1279–1293, 2010.
[19] S. Tadjudin and D.A. Landgrebe, "Covariance estimation with limited training samples," IEEE Transactions on Geoscience and Remote Sensing, vol. 37, pp. 2113–2118, 1999.
[20] H. Peng, F. Long and C. Ding, "Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, pp. 1226–1238, 2005.
[21] C. Conese and F. Maselli, "Selection of optimum bands from TM scenes through mutual information analysis," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 48, pp. 2–11, 1993.
[22] H. Peng, F. Long and C. Ding, "Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, pp. 1226–1238, 2005.
[23] K. Torkkola, "Feature extraction by non-parametric mutual information maximization," Journal of Machine Learning Research, vol. 3, pp. 1415–1438, 2003.
[24] T.M. Cover and J.A. Thomas, "Elements of Information Theory", Second Edition, Wiley-Interscience, 776 pp., 2006.
[25] K. Fukunaga, “Introduction to Statistical Pattern Recognition”, New York: Academic, 1990.
[26] D. A. Landgrebe, "https://engineering.purdue.edu/~biehl/multispec/hyperspectral.html".
[27] P. F. Hsieh and D. A. Landgrebe, "Classification of high dimensional data," PhD Thesis, School of Electrical and Computer Engineering Technical Report 98-4, pp. 40–42, 1998.
[28] A. Sahar, "Hyperspectral image classification using unsupervised algorithms", International Journal of Advanced Computer Science and Applications, vol. 7, pp. 198-205, 2016.
[29] K. Islam, M. Jashimuddin, B. Nath and T.K. Nath, "Quantitative assessment of land cover change using Landsat time series data: case of Chunati Wildlife Sanctuary (CWS)", Intern. J. Environ. Geoinform., vol. 3, pp. 45-55, 2016.
[30] B. Wang, J. Choi, S. Choi, S. Lee, P. Wu and Y. Gao, "Image fusion-based land cover change detection using multitemporal high-resolution satellite images", Remote Sensing, vol. 9, pp. 1-19, 2017.
[31] J.S. Rawat and M. Kumar, "Monitoring land use/cover change using remote sensing and GIS techniques: a case study of Hawalbagh block", Egypt. J. Remote Sens. Space Sci., vol. 18, pp. 77-84, 2015.