*Corresponding author: Department of Computer Science & Engineering, Rajshahi University of Engineering & Technology, Rajshahi-6204, Bangladesh. E-mail address: [email protected] (Abu Sayeed)
A Comparative Analysis on the Task of Classification for Remote Sensing Hyperspectral Data
Abu Sayeed*, Md. Ali Hossain, and Md. Rabiul Islam
Department of Computer Science & Engineering, Rajshahi University of Engineering & Technology, Rajshahi-6204, Bangladesh
ARTICLE INFORMATION
Received: 30 Jan 2019; Revised: 24 April 2019; Accepted: 15 May 2019
ABSTRACT
The improved spectral information of hyperspectral images makes them suitable for ground object identification after an effective classification. However, the classification task becomes very challenging when the number of input dimensions is high compared to the limited number of training samples: the classification accuracy on test data drops as the ratio of training samples to input dimensions becomes low. Kernel Support Vector Machines (KSVM) are studied in this research because of their ability to generate the best separating plane between the classes of interest in a new high-dimensional space. This paper presents a performance comparison between KSVM and the Maximum Likelihood Classifier (MLC) on a detected subspace, in terms of classification accuracy for linearly inseparable classes. Experiments performed on a NASA Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) image show that KSVM outperforms MLC and obtains the highest accuracy of 100%, as it is more robust against outliers than MLC.
Keywords: Image Classification; Feature Reduction; Support Vector Machine; Maximum Likelihood Classifier
1. Introduction
Earlier, scientists and geologists obtained land cover information through manual surveying, observing places in person. This method was applied across large regions of the earth for many decades, but it later proved inefficient and impractical in some situations, for example when large or remote areas had to be covered. Manual surveying of such areas requires a great deal of time and is expensive.
Aircraft photography came next, in which researchers captured land cover information with a camera mounted on an aircraft. This allowed land cover information to be obtained and recorded in a much shorter time. However, the method was found to be weather-dependent to some extent, reflected in the increasing number of air accidents. With the development of modern space technologies, remote sensing satellites took the place of aerial photography: satellites capture land cover information using sensors mounted on them. This technology creates a far greater opportunity than aircraft photography, as a large amount of land cover information can be acquired continuously, globally and at a relatively low cost.
This technology makes the whole operation of monitoring land cover efficient, and several places can be covered and observed at a time [1,2,3]. As a result, it helps to effectively detect deviations or changes in land cover that have taken place over a particular time period. Changes in land cover over time, which are connected with natural conditions and human activities, are a significant indication for accurate decision making, particularly in relation to the environment, agriculture and urbanization [4,5,7,8].
The high-dimensional nature of hyperspectral data introduces important limitations in supervised classification: the limited availability of training samples reduces the overall classification accuracy. This low-accuracy problem is a symptom of the curse of dimensionality. The solution is dimensionality reduction, which may be achieved through feature selection or feature extraction. Popular feature extraction techniques include PCA, KPCA, LDA and KLFDA, while feature selection techniques include SNR, BD and MIFS [9].
Figure 1: Performance of KSVM on all 220 PCA images
Kernel SVM rests on statistical learning theory, and the aim of the SVM is to locate the decision boundary that creates the optimal separation of classes. If a pattern recognition problem has two classes that are not linearly separable, an SVM with an RBF kernel selects, from among the infinite number of possible decision boundaries, the one that minimizes the generalization error. The chosen decision boundary is the one that leaves the largest margin between the two classes, where the margin is calculated as the sum of the distances to the hyper-plane from the closest points of the two classes. The problem of maximizing the margin can be solved by standard quadratic programming. The data points closest to the hyper-plane are used to measure the margin and are therefore termed 'support vectors'; consequently, the number of support vectors is small (Vapnik, 1995).

For identifying global ecological or environmental changes, basic assumptions have to be made about land use and land cover. Land use or land cover mapping is an essential component, consisting of several parameters that are merged together on the basis of the requirements. Land cover refers to cultivated land, buildings, water bodies, natural vegetation, rock/soil, fallow land, artificial cover, glacial cover and other elements observed on the land [1]. The structure of different land conditions can be predicted through the estimation of the pixels and the characteristics of the model images used in formulating their types and regions. Several classification algorithms have previously been tried for predicting the nature of the classes present in such images, but SVM and MLC are the two most prominent. MLC is a parametric approach that rests on the assumption of normally distributed data for each class and for the entire selected set. The Maximum Likelihood Classifier (MLC) has been implemented here as the benchmark for comparing classification accuracy, since most of the classes of the AVIRIS dataset are normally distributed. MLC has been considered a reasonably effective technique for thematic mapping from hyperspectral imagery, but in reality the behavior of the distribution is unknown; in such cases it is more effective to use non-parametric classifiers, because they make no distributional assumptions. MLC has been adopted in this comparison to allow the classification accuracies to be compared and to validate the accuracy of the non-parametric classifier for classifying the land cover classes.
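To make the margin idea concrete, the following minimal Python sketch (illustrative toy data and parameters, not the experimental setup of this paper) fits a linear SVM and reads off the support vectors and the margin width 2/||w||:

```python
# Minimal sketch: margin maximization with a linear SVM on toy 2-D data.
# The data and parameter values are illustrative, not the paper's setup.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
class_a = rng.normal(loc=[-2, -2], scale=0.5, size=(20, 2))
class_b = rng.normal(loc=[2, 2], scale=0.5, size=(20, 2))
X = np.vstack([class_a, class_b])
y = np.array([0] * 20 + [1] * 20)

svm = SVC(kernel="linear", C=1.0).fit(X, y)

# The margin width is 2/||w||, where w is the normal of the separating hyper-plane;
# only the support vectors determine it.
w = svm.coef_[0]
print("number of support vectors:", len(svm.support_vectors_))
print("margin width:", 2.0 / np.linalg.norm(w))
```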
2. Classification techniques
Two classification techniques are selected in this research to classify the hyperspectral images.
2.1. Maximum Likelihood Classifier (MLC)
The Maximum Likelihood Classifier (MLC) implements a parametric classification method based on the assumption of normally distributed data for each class and for the entire selected set of classes. For classifying land cover classes, a number of studies have used MLC as a standard against which to compare the classification accuracy of newly developed classifiers [7] such as Kernel SVM and the Spectral Angle Mapper (SAM). It is fundamentally a supervised classification technique, and its mathematical formulation can be obtained from Bayes' theorem, which gives the a posteriori probability L(k|φ), i.e. the probability that a pixel with feature vector φ belongs to class k [8]:
L(k|φ) = L(φ|k) L(k) / L(φ)     (1)

where L(φ|k) is the likelihood function, which states the probability of observing feature vector φ when the pixel's class k is known; L(k) is the a priori probability that class k occurs in the study area; and L(φ) is the probability of observing φ, which acts as a normalization constant ensuring that Σ_k L(k|φ) sums to 1:

L(φ) = Σ_{k=1}^{N} L(φ|k) L(k)     (2)

where N is the total number of classes. A pixel with feature vector φ is assigned to class k by the rule:

φ ∈ k   if   L(k|φ) > L(j|φ)   for all j ≠ k     (3)

That is, if the class probability L(k|φ) is greater than the probability L(j|φ) of every other class j for the given feature vector, the pixel is assigned to class k. MLC assumes that the distribution of the data within a given class k follows a multivariate Gaussian distribution [9]. It is then convenient to work with the log-likelihood (or discriminant function):
f_i(φ) = ln L(φ|i) = −(1/2)(φ − μ_i)^T C_i^{−1}(φ − μ_i) − (N/2) ln(2π) − (1/2) ln|C_i|     (4)

where μ_i and C_i are the mean vector and covariance matrix of class i, and N here denotes the dimensionality of the feature vector. Since the logarithm is a monotonic function, Equation (3) is equivalent to [10]:
φ ∈ k   if   f_k(φ) > f_j(φ)   for all j ≠ k     (5)

Each pixel is assigned to the class with the maximum likelihood; if the maximum probability lies below a threshold set by the researcher, the pixel is marked as unclassified [11].
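The discriminant of Equations (4)-(5) can be sketched in a few lines of Python; the toy data, function names and library choice here are illustrative assumptions, not the implementation used in this paper:

```python
# Minimal sketch of the MLC discriminant of Equations (4)-(5).
# Toy data and variable names are illustrative only.
import numpy as np

def fit_mlc(X_train, y_train):
    """Estimate the per-class mean vector mu_k and covariance matrix C_k."""
    stats = {}
    for k in np.unique(y_train):
        Xk = X_train[y_train == k]
        stats[k] = (Xk.mean(axis=0), np.cov(Xk, rowvar=False))
    return stats

def mlc_predict(X, stats):
    """Assign each pixel to the class with the largest log-likelihood f_k."""
    n_features = X.shape[1]
    scores = []
    for k, (mu, C) in sorted(stats.items()):
        diff = X - mu
        C_inv = np.linalg.inv(C)
        # f_k = -0.5 (x-mu)^T C^-1 (x-mu) - (N/2) ln(2*pi) - 0.5 ln|C|
        quad = np.einsum("ij,jk,ik->i", diff, C_inv, diff)
        f_k = -0.5 * quad - 0.5 * n_features * np.log(2 * np.pi) \
              - 0.5 * np.log(np.linalg.det(C))
        scores.append(f_k)
    classes = np.array(sorted(stats.keys()))
    return classes[np.argmax(np.stack(scores, axis=1), axis=1)]

# Tiny usage example with two Gaussian classes.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(3, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)
print(mlc_predict(X[:5], fit_mlc(X, y)))
```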
2.2. Kernel Support Vector Machine (KSVM)
This classification considers separate data sets, from which training and testing sets are obtained. Each training sample carries a target value and a set of characteristics, and the goal of the SVM is to predict the target values of the given test data. The nonlinear SVM only requires the inner products of the mapped samples; as a result, the difficulty of determining the mapping function Θ, and the cost of computing the mapped samples and their inner products, can be reduced. Several typical and popular kernels are the radial basis function, the linear kernel and the polynomial kernel. Although the SVM was initially introduced for two-class problems, it has been extended to deal with multi-class classification, based either on combining decision results from multiple two-class classifications or on direct multi-class optimization [12,13]. In the first phase, the SVM is used for initial training and classification; in the second phase, the obtained results are used for probability-based modeling. The open-source library LibSVM is used for initial training and classification of the dataset, and the Gaussian radial basis function (RBF) kernel has been tested. All the data are normalized to (−1, 1) before the SVM is applied. The best group of parameters, including the cost and the gamma value, is determined through 10-fold cross-validation, and the optimal parameters are then used for classification.

It is common that a linear hyper-plane is unable to separate classes without misclassification. In such cases, a nonlinear separating hyper-plane may be able to separate the classes: with a nonlinear transformation function, the data are mapped into a high-dimensional space, where they are spread out and a linear separating hyper-plane may be found. This is the principle of Cover's theorem on the separability of patterns. Figure 2 illustrates the case in which a linear separating hyper-plane cannot separate two classes in the input space, but can be obtained once the data are mapped into the higher-dimensional feature space [14].
Now let Θ be a nonlinear transformation function that maps the data into a high-dimensional feature space, and let L be a kernel function such that:

L(Y_i, Y_j) ≡ Θ(Y_i) · Θ(Y_j)     (6)

The kernel function L replaces the explicit form of Θ and the dot product of the transformed vectors; using the kernel function is therefore less computationally intensive [15-17]. The construction of a kernel function from a dot product is a special case of Mercer's theorem (Mercer, 1909; Schölkopf & Smola, 2002).
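As a minimal illustration of Equation (6), the RBF kernel value used later in this paper can be computed directly from the input vectors, without ever forming Θ explicitly; the data below are toy values:

```python
# Sketch of Equation (6): the RBF kernel K(y_i, y_j) = exp(-gamma * ||y_i - y_j||^2)
# is computed directly from the inputs; the mapping Theta is never formed explicitly.
import numpy as np

def rbf_kernel(yi, yj, gamma=0.5):
    """Kernel value equivalent to the dot product Theta(yi) . Theta(yj)."""
    return np.exp(-gamma * np.sum((yi - yj) ** 2))

yi = np.array([1.0, 2.0, 0.5])
yj = np.array([0.0, 1.5, 1.0])
print(rbf_kernel(yi, yj))  # implicit inner product in a high-dimensional space
```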
Figure 2: Mapping nonlinear data into a high-dimensional feature space in which a linear separating hyper-plane can be built.
3. Methodology
The study involves three stages: data pre-processing and feature extraction, feature selection using a standard method, and classification with the Maximum Likelihood classifier and the SVM with RBF kernel, followed by a comparison of the results. The overall working flow diagram is given below:
Figure 3: Overall working flow diagram

3.1. Data Pre-Processing
We have worked with a NASA Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) image, manipulated using the MultiSpec tool, from which 8 classes were taken. Five features were taken from the image and their classification results were studied. It is therefore necessary to extract the most informative features, suitable for classifying the classes in the image, so that the classification techniques applied to the AVIRIS data set give the best possible outcome.
3.2. Feature Extraction
Our study involves the extraction of features from selected bands of the NASA AVIRIS image data. The MIFS approach has been used for this purpose [11]. MIFS was chosen because its classification accuracy improves the most as more features are added, owing to the improvement in class discrimination power [11]. This method is therefore used to extract features from the AVIRIS data set.
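A greedy MIFS-style selection can be sketched as follows; the scoring function, the redundancy weight beta and the scikit-learn estimators are assumptions of this sketch rather than the exact formulation of [11]:

```python
# Sketch of greedy MIFS-style band selection: at each step pick the band that
# maximizes I(band; class) - beta * sum of I(band; already-selected band).
# The beta value and scoring details are assumptions of this sketch.
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mifs_select(X, y, n_select=5, beta=0.5):
    relevance = mutual_info_classif(X, y)        # I(band; class labels)
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(n_select):
        scores = []
        for f in remaining:
            redundancy = sum(
                mutual_info_regression(X[:, [f]], X[:, s])[0] for s in selected
            )
            scores.append(relevance[f] - beta * redundancy)
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected

# Usage: X is an (n_pixels, n_bands) training matrix, y the class labels.
# selected_bands = mifs_select(X_train, y_train, n_select=5)
```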
[Figure 3 workflow: Input Hyperspectral Image → Data Pre-Processing & Feature Extraction → Feature Selection on New Feature Data → Resultant Subspace → Training/Testing → Supervised Classification (KSVM, MLC) → Classification Map]
3.3. Maximum Likelihood Classifier
The Maximum Likelihood Classifier (MLC) is a parametric classifier, appropriate when all the classes of the dataset are normally distributed. Owing to its suitability for normally distributed data, MLC is used here as the reference against which the classification accuracy of other classifiers is compared [12,18]. MLC can produce thematic maps from hyperspectral remote sensing images, but in reality the nature of the distribution is unknown; for this reason non-parametric classifiers, which are free from distributional assumptions, are preferable. MLC classification is conducted in this study for the sake of comparing classification accuracies and to validate the suitability of the non-parametric classifier for classifying the land cover classes [13,19].
3.4. Support Vector Machine using RBF kernel
The support vector machine with RBF kernel is an efficient classifier for hyperspectral remote sensing images, as it can handle linearly inseparable classes [14]. The classifier also works well when the classes of interest are not normally distributed; KSVM has therefore become an emerging technique for hyperspectral remote sensing image classification [15,16,20]. For a multiclass problem in a hyperspectral image, the kernel SVM solves a multiclass optimization problem built from binary SVMs, but the number of parameters to be estimated grows with the number of classes to be classified. LibSVM in MATLAB has been used for implementing the kernel SVM. Three steps are followed in the implementation stage: first, the input data are scaled to the range (0, 1); second, cross-validation is used to find the best values of c (the cost factor) and gamma for testing the AVIRIS data set, and these values of c and gamma are used to train the selected data on the bands chosen by feature selection; finally, the training accuracy is calculated with the best c and gamma values to obtain the best training accuracy on the extracted bands.
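These three steps can be sketched in Python with scikit-learn, which wraps LibSVM; the parameter grid and names below are illustrative assumptions rather than the exact values used in this study:

```python
# Sketch of the three implementation steps: scale to (0, 1), grid-search
# c and gamma with 10-fold cross-validation, then train the RBF-kernel SVM.
# The grid values and names are illustrative assumptions.
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def train_ksvm(X_train, y_train):
    # Step 1: scale each feature to the range (0, 1).
    scaler = MinMaxScaler(feature_range=(0, 1))
    X_scaled = scaler.fit_transform(X_train)

    # Step 2: 10-fold cross-validation over cost C and kernel width gamma.
    param_grid = {"C": [1, 10, 100, 1000], "gamma": [0.01, 0.1, 1, 10]}
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10)
    search.fit(X_scaled, y_train)

    # Step 3: report the best parameters and cross-validated training accuracy;
    # GridSearchCV refits the best estimator on all training data.
    print("best C/gamma:", search.best_params_,
          "CV training accuracy:", search.best_score_)
    return scaler, search.best_estimator_

# Usage: scaler, model = train_ksvm(X_train, y_train)
#        y_pred = model.predict(scaler.transform(X_test))
```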
4. Dataset Description
The hyperspectral image was captured over the Indian Pines test site in northwestern Indiana by the NASA AVIRIS sensor. There are in total 220 image bands in the dataset (wavelength 400-2500 nm, visible to near infrared), and each band contains 145 × 145 pixels. Sixteen land cover classes are present in the image [23-26]. For this experiment, eight classes are used: Soy, Woods, Corn-nottill, Soy-till, Hay, Corn, Wheat and Corn-min.
Table 1: Classes used in the AVIRIS dataset
Class Name Train Test
Soy 96 168
Woods 170 130
Corn-nottill 96 126
Hay 120 128
Soy-till 110 105
Corn 56 30
Wheat 63 84
Corn-min 90 80
Total 801 851
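For reference, the scene can be loaded in Python with the Spectral Python (SPy) package; the file name follows the dataset description above, and the path is an assumption of this sketch:

```python
# Sketch: loading the AVIRIS Indian Pines scene with the Spectral Python package.
# Assumes '92AV3C.lan' has been downloaded to the working directory.
import spectral

img = spectral.open_image("92AV3C.lan")   # ERDAS LAN format, 220 bands
data = img.load()                          # numpy array of shape (145, 145, 220)
print(data.shape)

# Flatten to (pixels, bands) for use with the classifiers described above.
X = data.reshape(-1, data.shape[-1])
```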
4.1 Accuracy Assessment
The best-fit gamma and c values (kernel width and cost parameter) have been determined through cross-validation so as to obtain the best training accuracy. The best gamma and c were then applied to the testing of the AVIRIS Indian Pines dataset.
4.2 Experimental Analysis
In this paper, NASA AVIRIS (Airborne Visible/Infrared Imaging Spectrometer) hyper-spectral data have been used for the experimental analysis. Maximum Likelihood Classification and the Support Vector Machine (RBF kernel) have been applied to the 220-band data, the PCA data (first 5 PCs) and the MIFS-selected features [28,29]. The Maximum Likelihood classifier is applied to the AVIRIS Indian Pines dataset ('92AV3C.lan). A standard feature extraction algorithm is used to find the best features; channels 193, 65, 31, 170 and 2 are selected as the best features [9,30,31]. Eight classes are used.
They are Soy, Woods, Corn-nottill, Soy-till, Hay, Corn, Wheat and Corn-min. In the experiment, the first 5 PCA band images have been selected according to the maximum variance of the data (PCA1-PCA5), and the Mutual Information Feature Selection (MIFS) technique has been applied to the AVIRIS Indian Pines image, selecting bands 193, 65, 31, 170 and 2 out of 220, as shown in Table 2. PCA1 has a higher variance than PCA2, PCA2 higher than PCA3, and so on. Kernel SVM and the Maximum Likelihood Classifier (MLC) have been applied to the PCA images; their accuracies on the PCA images are shown in Table 3, where the KSVM classifier outperforms the MLC classifier.
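The selection of the first five principal components by decreasing variance can be sketched as follows; the placeholder array stands in for the real (pixels × bands) matrix from the loading sketch above:

```python
# Sketch: project the 220 bands onto the first 5 principal components,
# ordered by decreasing explained variance (PCA1 has the highest variance).
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(145 * 145, 220)        # placeholder for the real band matrix
pca = PCA(n_components=5)
X_pca = pca.fit_transform(X)              # columns: PCA1 ... PCA5
print(pca.explained_variance_ratio_)      # PCA1 variance > PCA2 > ... > PCA5
```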
Figure 4: (a) Ground Truth Image (b) Original AVIRIS 220 band Image
Figure 5: Feature space of (a) PC1 (b) PC2 and (c) PC3 for AVIRIS data
Besides PCA, Kernel SVM and the Maximum Likelihood Classifier (MLC) were also used to measure the classification accuracy of the features obtained by the Mutual Information based Feature Selection (MIFS) technique.
Table 2: Selected features for classification
Data Set Methods Orders of selected features
AVIRIS (Indian Pines)
Feature extraction (PCA) PC: 1, 2, 3, 4, 5
Feature Selection (MIFS) MIFS: 193, 65, 31, 170, 2
Table 3: Performance of KSVM and MLC on PCA feature data (first 5 PCA features)
No. of features  KSVM Testing Accuracy (%)  MLC Testing Accuracy (%)
PCA 1 58.7 49.5
PCA 1+2 80.4 77.3
PCA 1+2+3 94.9 87.3
PCA 1+2+3+4 97.4 86.0
PCA 1+2+3+4+5 99.3 88.2
Table 4: Performance of KSVM and MLC on MIFS features
No. of features  KSVM Test Accuracy (%)  MLC Test Accuracy (%)
Band 193 60.5 59.2
Band 193+65 79.4 61.5
Band 193+65+31 98.1 68.0
Band 193+65+31+170 100 68.5
Band 193+65+31+170+2 100 68.5
Table 5: Confusion matrix of KSVM on MIFS features (training, 5 features)
Class Name  No.  Sample Pixels  Soy  Woods  Corn-nottill  Hay  Soy-till  Corn  Wheat  Corn-min
Soy 1 96 96 0 0 0 0 0 0 0
Woods 2 170 0 170 0 0 0 0 0 0
Cornnottill 3 96 0 0 96 0 0 0 0 0
Hay 4 120 0 0 0 120 0 0 0 0
Soy-till 5 110 0 0 0 0 110 0 0 0
Corn 6 56 0 0 0 0 0 56 0 0
Wheat 7 63 0 0 0 0 0 0 63 0
Corn-min 8 90 0 0 0 0 0 0 0 90
Total 801 96 170 96 120 110 56 63 90
Overall Training Class Performance (801/801) =100%
Table 6: Confusion matrix of KSVM on MIFS features (testing, 5 features)
Class Name  No.  Sample Pixels  Soy  Woods  Corn-nottill  Hay  Soy-till  Corn  Wheat  Corn-min
Soy 1 168 168 0 0 0 0 0 0 0
Woods 2 130 0 130 0 0 0 0 0 0
Cornnottill 3 126 0 0 126 0 0 0 0 0
Hay 4 128 0 0 0 128 0 0 0 0
Soy-till 5 105 0 0 0 0 105 0 0 0
Corn 6 30 0 0 0 0 0 30 0 0
Wheat 7 84 0 0 0 0 0 0 84 0
Corn-min 8 80 0 0 0 0 0 0 0 80
Total 851 168 130 126 128 105 30 84 80
Overall Testing Class Performance (851/851) = 100%
In MIFS, the MI between the original image bands and the class labels is calculated to select the output features for classification. In the experiment, image bands 193, 65, 31, 170 and 2 are selected out of 220. The KSVM classifier again outperforms MLC in terms of classification accuracy on the MIFS features, as shown in Table 4. Table 5 and Table 6 show the confusion matrices of the training and testing sample classification using KSVM, respectively. Table 7 and Table 8 show the confusion matrices of the training and testing performance of the MLC classifier, respectively, applied to the 5 MIFS features. Finally, the overall performance of KSVM and MLC is shown in Figure 6. It can be seen that KSVM provides improved classification accuracy in both cases.
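The overall class performance quoted under each confusion matrix is the number of correctly classified samples (the matrix diagonal) divided by the total number of samples; as a minimal sketch:

```python
# Sketch: overall accuracy from a confusion matrix = trace / total samples.
import numpy as np

def overall_accuracy(cm):
    return np.trace(cm) / np.sum(cm)

# e.g. for MLC training (Table 7): 715 correct out of 801 gives ~89.3%.
```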
Table 7: Confusion matrix of MLC on MIFS features (training)
Class Name  No.  Sample Pixels  Soy  Woods  Corn-nottill  Hay  Soy-till  Corn  Wheat  Corn-min
Soy 1 96 90 0 1 0 4 1 0 0
Woods 2 170 0 152 0 0 0 0 0 18
Cornnottill 3 96 1 0 91 0 3 1 0 0
Hay 4 120 0 0 0 120 0 0 0 0
Soy-till 5 110 20 0 6 0 76 5 3 0
Corn 6 56 0 0 0 0 0 55 1 0
Wheat 7 63 1 0 0 0 1 1 60 0
Corn-min 8 90 0 19 0 0 0 0 0 71
Total 801 112 171 98 120 84 63 64 89
Overall Training Class Performance (715/801) = 89.3%
Figure 6: Performance of KSVM vs. MLC on PCA image and MIFS features data.
Table 8: Confusion matrix of MLC on MIFS features (testing)
Class Name  No.  Sample Pixels  Soy  Woods  Corn-nottill  Hay  Soy-till  Corn  Wheat  Corn-min
Soy 1 168 132 0 0 0 21 3 12 0
Woods 2 130 0 98 0 0 0 0 0 32
Cornnottill 3 126 0 0 125 0 1 0 0 0
Hay 4 128 0 0 0 128 0 0 0 0
Soy-till 5 105 6 0 4 0 84 11 0 0
Corn 6 30 0 0 2 0 0 26 2 0
Wheat 7 84 0 0 0 0 4 6 74 0
Corn-min 8 80 0 12 0 0 0 0 0 68
Total 851 138 110 131 128 110 46 88 100
Overall Testing Class Performance (735 / 851 ) = 86.4%.
5. Conclusion
The experimental analysis shows that KSVM provides better results than MLC, as it has the ability to generate an optimal hyper-plane between the class of interest and the rest. It also maps the input data to a high-dimensional space where the samples are easily separable by a hyper-plane. MLC, on the other hand, depends on normally distributed data and suffers from the curse of dimensionality. The use of the RBF kernel in KSVM makes it suitable for separating class samples that are not linearly separable. Thus the KSVM with MIFS features outperforms the MLC-based approach.
References
[1] A. Singh, “Review Article Digital change detection techniques using remotely-sensed data”, International Journal of Remote Sensing, vol. 10, no. 6, pp. 989-1003, 1989
[2] K.R. Manjula, J. Singaraju and A.K. Varma, "Data preprocessing in multi-temporal remote sensing data for deforestation analysis", Global Journal of Computer Science and Technology Software & Data Engineering, vol. 13, no. 6, pp. 1-8, 2013.
[3] N.I.S. Bahari, A. Ahmad and B.M. Aboobaider, "Application of support vector machine for classification of multispectral data", 7th IGRSM International Remote Sensing & GIS Conference and Exhibition, IOP Conf. Series: Earth and Environmental Science, 2014.
[4] A. Ahmad and S. Quegan, "Haze modelling and simulation in remote sensing satellite data", Applied Mathematical Sciences, vol. 8, no. 159, pp. 7909-7921, 2014.
[5] M.F. Razali, A. Ahmad, O. Mohd and H. Sakidin, “Quantifying haze from satellite using haze optimized transformation (HOT)”, Applied Mathematical Sciences, vol. 9, no. 29, pp. 1407 – 1416, 2015.
[6] C. Huang, L.S. Davis and J.R.S. Townshend, "An assessment of support vector machines for land cover classification", International Journal of Remote Sensing, vol. 23, pp. 725-749, 2002.
[7] A. Ahmad and S. Quegan, “Analysis of Maximum Likelihood Classification”, Applied Mathematical Sciences, vol. 6, pp. 6425 – 6436, 2012.
[8] B. Schölkopf and A. J. Smola, ”Learning with kernels: support vector machines, regularization, optimization, and beyond”. Cambridge, Mass.: MIT Press, 2002.
[9] M. A. Hossain, M. Pickering and X. Jia, "Improved feature selection based on a mutual information measure for hyperspectral image classification", IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2011.
[10] D. S. Schimel and P. D. Try, "Remote sensing of the land surface for studies of global change: Models–Algorithms–Experiments", Remote Sensing of Environment, vol. 51, pp. 3-26, 1989.
[11] D. A. Landgrebe, "Signal Theory Methods in Multispectral Remote Sensing", Hoboken, NJ: John Wiley & Sons, 2003.
[12] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition", Data Mining and Knowledge Discovery, vol. 2, pp. 121–167, 1998.
[13] D.G. Brown, D. P. Lusch and K. A. Duda, Geomorphology, vol. 21, pp. 233–250, 1998.
[14] C. Chang and C.J. Lin, “LIBSVM: a library for support vector machines”, ACM Transactions on Intelligent Systems and Technology, vol. 2, 2011.
[15] M. Fauvel, J. Chanussot and J.A. Benediktsson, "Kernel principal component analysis for classification of hyperspectral remote sensing data over urban areas," EURASIP Journal on Advances in Signal Processing, vol. 2009, pp. 1–14, 2009.
[16] J.A. Richards and X. Jia, "Remote Sensing Digital Image Analysis", 4th Edition, Springer-Verlag, Berlin Heidelberg, Germany, 2006.
[17] G.F. Hughes, "On the mean accuracy of statistical pattern recognizers," IEEE Transactions on Information Theory, vol. 14, pp. 55–63, 1968.
[18] J. Yang, P. Yu and B. Kuo, "A nonparametric feature extraction and its application to nearest neighbour classification for hyperspectral image data," IEEE Transactions on Geoscience and Remote Sensing, vol. 48, pp. 1279–1293, 2010.
[19] S. Tadjudin and D.A. Landgrebe, "Covariance estimation with limited training samples," IEEE Transactions on Geoscience and Remote Sensing, vol. 37, pp. 2113–2118, 1999.
[20] H. Peng, F. Long and C. Ding, "Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, pp. 1226–1238, 2005.
[21] C. Conese and F. Maselli, "Selection of optimum bands from TM scenes through mutual information analysis," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 48, pp. 2–11, 1993.
[22] H. Peng, F. Long and C. Ding, "Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, pp. 1226–1238, 2005.
[23] K. Torkkola, "Feature extraction by non-parametric mutual information maximization," Journal of Machine Learning Research, vol. 3, pp. 1415–1438, 2003.
[24] T.M. Cover and J.A. Thomas, "Elements of Information Theory", Second Edition, Wiley-Interscience, 776 pp., 2006.
[25] K. Fukunaga, “Introduction to Statistical Pattern Recognition”, New York: Academic, 1990.
[26] D. A. Landgrebe, "https://engineering.purdue.edu/~biehl/multispec/hyperspectral.html".
[27] P. F. Hsieh and D. A. Landgrebe, "Classification of high dimensional data," PhD Thesis, School of Electrical and Computer Engineering Technical Report 98-4, pp. 40–42, 1998.
[28] A. Sahar, "Hyperspectral image classification using unsupervised algorithms", International Journal of Advanced Computer Science and Applications, vol. 7, pp. 198-205, 2016.
[29] K. Islam, M. Jashimuddin, B. Nath and T.K. Nath, "Quantitative assessment of land cover change using Landsat time series data: case of Chunati Wildlife Sanctuary (CWS)", Intern. J. Environ. Geoinform., vol. 3, pp. 45-55, 2016.
[30] B. Wang, J. Choi, S. Choi, S. Lee, P. Wu and Y. Gao, "Image fusion-based land cover change detection using multitemporal high-resolution satellite images", Remote Sensing, vol. 9, pp. 1-19, 2017.
[31] J.S. Rawat and M. Kumar, "Monitoring land use/cover change using remote sensing and GIS techniques: a case study of Hawalbagh block", Egypt. J. Remote Sens. Space Sci., vol. 18, pp. 77-84, 2015.