Graph-Based Hyperspectral Image Classification Using Outliers Detection based on Spatial Information and Estimating of the Number of GMM Mixtures

Mahboubeh Lak
Islamic Azad University, Young Researchers Club
Bushehr, Iran
lak.mahbob@yahoo.com

Ahmad Keshavarz
Electricity Department, Persian Gulf University
Bushehr 75169, Iran
a.keshavarz@pgu.ac.ir

Hossein Pourghassem
Department of Electrical Engineering, Najafabad Branch, Islamic Azad University, Isfahan, Iran
h_pourghasem@iaun.ac.ir

Abstract—This article aims to improve hyperspectral image classification with the expectation-maximization (EM) algorithm through proposed approaches for estimating the number of mixtures of each image class, correcting the covariance matrix, and detecting outliers. Reducing the number of mixtures of a hyperspectral image decreases the running time of the algorithm, and covariance matrix correction increases the accuracy and reliability of the classification for classes that have only a few training samples. Because EM is an iterative algorithm, an error made in one classification step is carried into the next steps and decreases the accuracy and reliability of the classification.

In this article, this problem is addressed by identifying the outliers in each step and removing them from the parameter estimation of the next step. Error propagation is thus prevented, and the accuracy and reliability of the classification increase.

Keywords: classification, covariance matrix, EM algorithm, mixture, outlier sample.

I. INTRODUCTION

Traditional machine learning approaches use only a labelled set to train the classifier. Labelled instances, however, are often difficult, expensive, or time-consuming to obtain, as they require the effort of experienced human annotators. Meanwhile, unlabeled data may be relatively easy to collect, but there have been few ways to use them. Semi-supervised learning addresses this problem by using a large amount of unlabeled data, together with the labelled data, to build better classifiers. Because semi-supervised learning requires less human effort and gives higher accuracy, it is of great interest both in theory and in practice. Labels are hard to obtain and unlabeled data are abundant, so semi-supervised learning is a good way to reduce human labour and improve classification accuracy [1].

This paper presents a semi-supervised method for hyperspectral image classification [2]. We try to improve hyperspectral image classification by incorporating spectral and spatial information using semi-supervised learning. We assume each class follows a Gaussian mixture model; with a large amount of unlabeled data, the mixture components can be identified with the expectation-maximization (EM) algorithm.

The proposed algorithm is designed with the main problems of hyperspectral image classification in mind, namely the high dimensionality of these images and the scarcity of labelled data, and it reduces the effect of these problems as far as possible.

II. MIXTURE MODELS AND THE EM ALGORITHM

Mixture models have long been used for semi-supervised learning, e.g. the Gaussian mixture model (GMM) [3,4,5]. In this model, training is typically done with the EM algorithm. It has several advantages: the model is inductive and handles unseen points naturally, and it is a parametric model with a small number of parameters.

A. Combining EM algorithm and mixture model

In typical mixture models for classification, the generative process is the following [6]: one first selects a class y, then chooses a mixture component m ∈ {1,...,M} with probability p(m|y), and finally a point x is generated according to p(x|m). Thus p(x, y) = \sum_{m=1}^{M} p(y)\, p(m|y)\, p(x|m). In this paper we take a different but equivalent parameterization:

p(x, y) = \sum_{m=1}^{M} p(m)\, p(y|m)\, p(x|m)    (1)

For all y we have p(y|m) ≥ 0.
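As a concrete illustration, the following minimal sketch (not the authors' code; the mixture weights, class-conditional tables and Gaussian parameters are made-up placeholders) evaluates p(x, y) under the parameterization of equation (1):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical model with M = 3 Gaussian components in 2-D and 2 classes;
# all numbers are illustrative only.
p_m = np.array([0.5, 0.3, 0.2])              # p(m), mixture weights
p_y_given_m = np.array([[0.9, 0.1],          # p(y|m), rows indexed by m
                        [0.2, 0.8],
                        [0.6, 0.4]])
means = [np.zeros(2), np.ones(2) * 3.0, np.array([0.0, 4.0])]
covs = [np.eye(2), np.eye(2) * 2.0, np.eye(2) * 0.5]

def joint_p_xy(x, y):
    """p(x, y) = sum_m p(m) p(y|m) p(x|m), equation (1)."""
    return sum(p_m[m] * p_y_given_m[m, y] *
               multivariate_normal.pdf(x, means[m], covs[m])
               for m in range(len(p_m)))

print(joint_p_xy(np.array([0.2, -0.1]), y=0))
```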

The standard EM algorithm learns these parameters to maximize the log-likelihood of the observed data. Arbitrary distributions q_i(m) over mixture membership are introduced, one for each point i. By Jensen's inequality:

L(\Theta) \ge \sum_{i \in L} \sum_{m=1}^{M} q_i(m|x_i, y_i) \log \frac{p(m)\, p(y_i|m)\, p(x_i|m)}{q_i(m|x_i, y_i)} + \sum_{i \in U} \sum_{m=1}^{M} q_i(m|x_i) \log \frac{p(m)\, p(x_i|m)}{q_i(m|x_i)} \equiv F(q, \Theta)    (2)

where L and U denote the labelled and unlabelled sets. The EM algorithm works by coordinate-wise ascent on q and \Theta to maximize F(q, \Theta). The E step fixes \Theta and finds the q that maximizes F(q, \Theta). We denote the fixed \Theta at iteration t by p(m)^{(t)}, p(x|m)^{(t)} and p(y|m)^{(t)}. Since the terms of F have the form of a KL divergence, it is easy to see that the optimal q are the posteriors on m:

q^{(t)}(m \mid x_i, y_i) = \frac{p(m)^{(t)}\, p(y_i|m)^{(t)}\, p(x_i|m)^{(t)}}{\sum_{k=1}^{M} p(k)^{(t)}\, p(y_i|k)^{(t)}\, p(x_i|k)^{(t)}}, \quad i \in L    (3)

q^{(t)}(m \mid x_i) = \frac{p(m)^{(t)}\, p(x_i|m)^{(t)}}{\sum_{k=1}^{M} p(k)^{(t)}\, p(x_i|k)^{(t)}}, \quad i \in U    (4)
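A minimal sketch of this E step in Python (assuming Gaussian components; variable names such as `p_m`, `p_y_given_m`, `means` and `covs` follow the illustrative example above and are not from the paper):

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X_l, y_l, X_u, p_m, p_y_given_m, means, covs):
    """Posterior responsibilities, equations (3) and (4)."""
    M = len(p_m)
    # labelled points: q(m | x_i, y_i) proportional to p(m) p(y_i|m) p(x_i|m)
    q_l = np.zeros((len(X_l), M))
    for m in range(M):
        q_l[:, m] = (p_m[m] * p_y_given_m[m, y_l] *
                     multivariate_normal.pdf(X_l, means[m], covs[m]))
    q_l /= q_l.sum(axis=1, keepdims=True)
    # unlabelled points: q(m | x_i) proportional to p(m) p(x_i|m)
    q_u = np.zeros((len(X_u), M))
    for m in range(M):
        q_u[:, m] = p_m[m] * multivariate_normal.pdf(X_u, means[m], covs[m])
    q_u /= q_u.sum(axis=1, keepdims=True)
    return q_l, q_u
```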

The M step fixes q^{(t)} and finds \Theta^{(t+1)} to maximize F. Taking the partial derivatives and setting them to zero, we find:



p(m)^{(t+1)} \propto \sum_{i \in L \cup U} q_i^{(t)}(m)    (5)

p(y|m)^{(t+1)} = \frac{\sum_{i \in L,\, y_i = y} q_i^{(t)}(m)}{\sum_{i \in L} q_i^{(t)}(m)}    (6)

\sum_{i \in L \cup U} q_i^{(t)}(m)\, \frac{\partial \log p(x_i|m)}{\partial \Theta} = 0    (7)

The last equation needs to be reduced further with the specific generative model for x, e.g. Gaussian. For Gaussian components, we have:

\mu_m^{(t+1)} = \frac{\sum_{i \in L \cup U} q_i^{(t)}(m)\, x_i}{\sum_{i \in L \cup U} q_i^{(t)}(m)}    (8)

\Sigma_m^{(t+1)} = \frac{\sum_{i \in L \cup U} q_i^{(t)}(m)\, (x_i - \mu_m^{(t)})(x_i - \mu_m^{(t)})^{T}}{\sum_{i \in L \cup U} q_i^{(t)}(m)}    (9)

After EM converges, the classification of a new point x is done by

p(y = i \mid x) = \sum_{m=1}^{M} p(y = i \mid m)\, p(m \mid x) = \frac{\sum_{m=1}^{M} p(y = i \mid m)\, p(x \mid m)\, p(m)}{\sum_{m=1}^{M} p(x \mid m)\, p(m)}    (10)
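Putting the E and M steps together, a compact sketch of one possible implementation (again illustrative only; the Gaussian parameterization and the helper names are assumptions, not the paper's exact code) could look as follows:

```python
import numpy as np
from scipy.stats import multivariate_normal

def m_step(X_l, y_l, X_u, q_l, q_u, n_classes):
    """Update p(m), p(y|m), means and covariances, equations (5)-(9)."""
    X = np.vstack([X_l, X_u])
    q = np.vstack([q_l, q_u])                       # responsibilities of all points
    Nm = q.sum(axis=0)                              # effective counts per mixture
    p_m = Nm / Nm.sum()                             # (5)
    p_y_given_m = np.zeros((len(Nm), n_classes))
    for y in range(n_classes):                      # (6), labelled points only
        p_y_given_m[:, y] = q_l[y_l == y].sum(axis=0) / q_l.sum(axis=0)
    means = (q.T @ X) / Nm[:, None]                 # (8)
    covs = []
    for m in range(len(Nm)):                        # (9)
        d = X - means[m]
        covs.append((q[:, m, None] * d).T @ d / Nm[m])
    return p_m, p_y_given_m, means, covs

def classify(x, p_m, p_y_given_m, means, covs):
    """Posterior class probabilities for a new point x, equation (10)."""
    px_m = np.array([multivariate_normal.pdf(x, means[m], covs[m])
                     for m in range(len(p_m))])
    p_m_given_x = px_m * p_m / np.sum(px_m * p_m)
    return p_y_given_m.T @ p_m_given_x              # p(y | x) for every class
```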

B. Class mixtures estimation

If every connected region of training samples is treated as a separate mixture of a class, there will be many mixtures and the computation time will increase, so the number of mixtures per class must be reduced. The proposed method for determining the number of mixtures of a class uses the squared Mahalanobis distance among all pairs of mixtures of the class [7]. First, each class is assumed to have m mixtures according to the regions of its training samples, so (m-1) distances are calculated for each mixture and m×(m-1) distances in total for all mixtures. For classes with more than two mixtures (#m > 2), if the first and second mixtures have covariance matrices Σ1, Σ2 and mean vectors μ1, μ2 respectively, the squared Mahalanobis distance D² between the two mixtures is calculated from equation (11).

Figure (1): classification without covariance correction.

Figure (2): classification after covariance correction.

D^2 = (\mu_1 - \mu_2)'\, \Sigma^{-1}\, (\mu_1 - \mu_2)    (11)

\Sigma = \frac{n_1 \Sigma_1 + n_2 \Sigma_2}{n}    (12)

n_1 and n_2 are the numbers of pixels of the first and second mixtures respectively, and n = n_1 + n_2.
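A small sketch of this pairwise distance computation (illustrative; the pooled-covariance form follows equations (11)-(12), and the function names are my own). These pairwise distances feed the merging criterion described next.

```python
import numpy as np

def pooled_covariance(cov1, cov2, n1, n2):
    """Equation (12): covariance pooled over two mixtures."""
    return (n1 * cov1 + n2 * cov2) / (n1 + n2)

def squared_mahalanobis(mu1, mu2, cov1, cov2, n1, n2):
    """Equation (11): squared Mahalanobis distance between two mixtures."""
    pooled = pooled_covariance(cov1, cov2, n1, n2)
    diff = mu1 - mu2
    return float(diff @ np.linalg.solve(pooled, diff))
```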

The minimum distance of each class and the average of the minimum distances over all classes are calculated. Then, mixtures whose mutual distance is shorter than this average distance are merged and addressed as "mixture #1", and the remaining mixtures are addressed as "mixtures #2 to m".


If a class has two mixtures, the covariance matrices of the first and second mixtures are calculated from equations (13) and (14) respectively.

\Sigma_1 = \frac{n_1 \Sigma_1 + \sum_{m=1,\, m \neq 1,2}^{M} n_m \Sigma_m}{n_1 + \sum_{m=1,\, m \neq 1,2}^{M} n_m}    (13)

\Sigma_2 = \frac{n_2 \Sigma_2 + \sum_{m=1,\, m \neq 1,2}^{M} n_m \Sigma_m}{n_2 + \sum_{m=1,\, m \neq 1,2}^{M} n_m}    (14)

M is the total number of mixtures.

Then the squared Mahalanobis distance is calculated from equation (11). Now there are (M-2) distances for each mixture. A mean distance is calculated for each mixture of the class, and finally a mean value (mv) is calculated for the class. If the distance between the two mixtures of this class is shorter than mv, the two mixtures are merged as mixture #1 for this class; otherwise, the first and second mixtures are defined as mixture #1 and mixture #2 respectively.

C. Covariance correction

The number of training samples is important in the covariance calculation and obviously influences the result of the classification. As Figure (1) shows, after the first step of classification, pixels around some mixtures take a wrong label. Examining these mixtures shows that the wrong labelling is caused by the small number of training samples. Therefore, after the first step, these mixtures must be detected and their covariance matrices must be corrected.

So at first the covariance matrix is calculated as if each class had one mixture. Then, using an edge detection algorithm, the neighbourhood pixels around each class mixture are tested; mixtures for which more than 80% of the neighbourhood pixels have a wrong label are detected and their covariance matrices are corrected. If a class has one mixture, its covariance matrix is corrected by equation (15).

\Sigma_m = \frac{n_m \Sigma_m + n_1 \Sigma_{c_1} + n_2 \Sigma_{c_2} + \ldots + n_i \Sigma_{c_i}}{n}    (15)

Here i = 16, n_1, ..., n_i are the numbers of pixels of the classes, and n = n_1 + ... + n_i + n_m.

If a class has two or more mixtures, its covariance matrix is corrected by equation (16).

\Sigma_m = \frac{n_m \Sigma_m + n_i \Sigma_i}{n}    (16)

where n = n_i + n_m.

When the covariance matrix has been corrected, the classification runs again. This method is illustrated in Figure (4). Figures (1) and (2) show the image before and after covariance matrix correction (first step).
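A minimal sketch of this correction rule (illustrative only; how the boundary pixels are obtained is assumed here, and the helper names are not the paper's own):

```python
import numpy as np

def corrected_covariance(cov_m, n_m, class_covs, class_counts):
    """Equations (15)/(16): pool a weakly supported mixture's covariance
    with class-level covariances, weighted by pixel counts."""
    n = n_m + sum(class_counts)
    pooled = n_m * cov_m
    for cov_c, n_c in zip(class_covs, class_counts):
        pooled += n_c * cov_c
    return pooled / n

def needs_correction(boundary_labels, mixture_label, threshold=0.8):
    """Flag a mixture whose boundary pixels are mostly mislabelled (>80%)."""
    wrong = np.sum(boundary_labels != mixture_label)
    return wrong > threshold * len(boundary_labels)
```

Passing the covariances of all 16 classes corresponds to equation (15), while passing only the covariance of the mixture's own class corresponds to equation (16).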

Figure (3): image after classification with the proposed algorithm.

D. Outlier detection

The proposed algorithm is iterative. If an error occurs in the classification in one of the steps, it propagates into the next steps and decreases the accuracy and reliability of the classification [8]. In this article, this problem is addressed by detecting outliers in each step and removing them from the parameter estimation of the next step [9,10]. These pixels are called outlier samples.

Error propagation is thus reduced and the accuracy of the classification is increased. The proposed method is based on the entropy values of the pixels and their spatial information. First, the entropy is calculated for all image pixels by equation (17).

entropy(x) = -\sum_{i} p(x|i) \log p(x|i)    (17)

Figure (5) represents this method for outlier detection.
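A sketch of this entropy score (illustrative; the exact form of the probabilities entering the sum and of the "entropy × weight" weighting is not fully specified here, so normalized per-class probabilities of a pixel are assumed):

```python
import numpy as np

def pixel_entropy(class_probs):
    """Equation (17): entropy of the per-class probability values of one pixel.
    `class_probs` is a 1-D array over the 16 classes."""
    p = np.clip(class_probs, 1e-12, None)   # avoid log(0)
    p = p / p.sum()
    return float(-np.sum(p * np.log(p)))
```

Pixels whose weighted entropy exceeds the class average computed over the first and second neighbourhood pixels (Figure (5)) are flagged as outliers and excluded from the next parameter estimation step.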

E. Stop condition

The EM algorithm is an iterative process that uses the classification result of the previous step and runs the classification process again. The stop condition proposed here for the EM algorithm is the inequality (18).

nn < 0.01 \times 145 \times 145    (18)

where nn is the number of pixels whose class in step (t+1) differs from step t.
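A sketch of this stopping test (illustrative; the 145×145 image size and the 1% threshold come from equation (18)):

```python
import numpy as np

def should_stop(labels_prev, labels_curr, threshold=0.01):
    """Equation (18): stop when fewer than 1% of the 145x145 pixels change class."""
    nn = np.sum(labels_prev != labels_curr)      # pixels that changed class
    return nn < threshold * labels_prev.size     # labels_prev.size == 145 * 145
```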

III. RESULTS

The AVIRIS hyperspectral image used in this research covers a forest-agriculture zone in north-east Indiana and was acquired in June 1992 [11]. The image has 220 bands and 145×145 pixels. According to its vegetation, this region has 16 different classes.


Without the proposed methods (estimating the number of mixtures, covariance matrix correction and outlier detection), the EM algorithm iterates sixteen times, and the total accuracy and total reliability of the classification are 53.65% and 62.54% respectively. With the proposed method, the number of iterations decreases to fourteen and the total number of mixtures decreases from 42 to 28. The accuracy and reliability of the classification increase to 64.42% and 70.84% respectively. So this method increases the total classification accuracy and reliability by about 11% and 8% respectively. The total accuracy and reliability variations of the classification are shown in Figures (6) and (7) respectively; these charts illustrate that the number of iterations decreases from 16 to 14. The image classified by the proposed method is shown in Figure (3).

Figure (4): flowchart of the proposed method for covariance correction. After the first step of classification, the covariance matrix of every class is calculated as if it had one mixture; the boundary pixels of every mixture are found by edge detection, and if more than 80% of a mixture's boundary pixels have a wrong label, its covariance matrix is corrected by equation (15) or (16) and the classification is run again.

Figure (5): flowchart of the proposed method for outlier detection. At the t-th step, edge detection determines the first and second neighbourhood pixels of every class region; the entropy and the "entropy × weight" value of these pixels are computed with the estimated parameters, and their means are averaged per class. A pixel whose "entropy × weight" value is higher than this class average is detected as an outlier and removed from the parameter estimation of the next step.

Figure (6): total accuracy variations of the classification by the proposed algorithm.


Figure (7): total reliability variations of the classification by the proposed algorithm.

REFERENCES

[1] S. Rosset, J. Zhu, H. Zou, and T. Hastie, "A method for inferring label sampling mechanisms in semi-supervised learning," in Advances in Neural Information Processing Systems 17, L. K. Saul, Y. Weiss, and L. Bottou, Eds. Cambridge, MA: MIT Press, 2005.

[2] U. von Luxburg, O. Bousquet, M. Belkin, Limits of spectral clustering. In L. K. Saul, Y. Weiss and L. Bottou (Eds.), Advances in neural information processing systems 17. Cambridge, MA: MIT Press, 2005.

[3] J. Ratsaby, S. Venkatesh, “Learning from a mixture of labeled and unlabeled examples with parametric side information”, Proceedings of the Eighth Annual Conference on Computational Learning Theory, pp 412–417,1995.

[4] K. Nigam, R. Ghani, “Analyzing the effectiveness and applicability of co-training,” Ninth International Conference on Information and Knowledge Management, pp. 86–93, 2000.

[5] V. Castelli, T. Cover, “The relative value of labeled and unlabeled samples in pattern recognition with an unknown mixing parameter”

IEEE Transactions on Information Theory, 42, 2101–2117,1996.

[6] X. Zhu, Semi-supervised learning with graphs. Doctoral dissertation, Carnegie Mellon University. CMU-LTI-05-192, 2005.

[7] O. Cappé, R. Douc, A. Guillin, J.-M. Marin, and C. P. Robert, "Adaptive importance sampling in general mixture classes," Statistics and Computing, 18(4):447-459, 2008.

[8] E. Acuna and C. A. Rodriguez, "Meta analysis study of outlier detection methods in classification," Technical paper, Department of Mathematics, University of Puerto Rico at Mayaguez, retrieved from academic.uprm.edu; in Proceedings IPSI 2004, Venice, 2004.

[9] C. Becker and U. Gather, "The largest nonidentifiable outlier: a comparison of multivariate simultaneous outlier identification rules," Computational Statistics & Data Analysis, 36(1):119-127, 2001.

[10] C. Becker, U. Gather, “The masking breakdown point of multivariate outlier identification rules”. Journal of the American Statistical Association, 94, 947-955, 1999.

[11] L. Jimenez and D. A. Landgrebe, "Hyperspectral Data Analysis and Feature Reduction via Projection Pursuit," IEEE Trans. Geosci. Remote Sens., vol. 37, no. 6, pp. 2653-2667, Nov. 1999.
