Robust Linear Bayes Classifier for Microarray Gene Expression Data Analysis
Md. Mushfiqur Rahman, Md. Alim Hossen, Abu Shaleh Mahmud, Md. Nurul Haque Mollah
Dept. of Statistics, University of Rajshahi, Rajshahi-6205, Bangladesh
[email protected], [email protected], [email protected]

Md. Matiur Rahaman
Dept. of Bioinformatics, College of Life Sciences, Zhejiang University, Zijingang Campus, Hangzhou 310058, China
Abstract— Classifying tissue samples or genes into different groups is a central task of bioinformatics research. Various supervised and unsupervised classification methods are available, but these existing methods produce low-accuracy results when the data set is high-dimensional or contaminated by outliers. Microarray gene expression data contain a high level of noise, and both the dimensionality and the noise substantially reduce classification accuracy. A method that resists noise is therefore necessary to classify tissue samples or genes accurately. In this paper, we propose a highly robust linear Bayes classification rule for supervised classification in gene expression studies.
Index Terms— Bayes classifier, Gaussian distribution, minimum β-divergence method, robustness, gene expression data.
I. INTRODUCTION
Tumor/cancer classification based on gene expression data, or gene classification based on gene expression in normal and cancer cells, plays a significant role in cancer research [9-11,20,23,24,27]. It is a technique used successfully to explore the relationships among genes and tissue samples [17]. Classification-based analyses of gene expression data were considered by Golub et al., Alon et al., Slonim et al., Ben-Dor et al., Veer et al. [15,16,20,25,27] and other researchers to elucidate unknown gene function, phenotypic outcomes, disease diagnosis such as tumor or cancer types, analysis of prognosis and treatment outcomes, clinical drug analysis and so on. In the literature, several classifiers, such as the Bayes classifier, support vector machines, Fisher linear discriminant analysis, the k-nearest neighbor classifier, classification trees, bagging and boosting, among others, have been applied in gene expression data analysis [5,11,16,18,19,22].
Gene expression data are mainly viewed as a matrix with a large number of genes and a small number of tissue samples. In statistics this is called the n << p setting, where n is the number of samples for tumor/cancer diagnosis and p is the number of genes, and in this situation most classification algorithms are infeasible. The problem can be avoided by extracting the essential features from the original data, as done by many authors [31-34]. Moreover, a comparative classification study based on selected sets of relevant genes can be found in [18]. However, gene expression data often contain outliers, introduced in the several steps from sample collection to image processing, and these lead to unreliable, low-accuracy analyses in addition to the high-dimensionality problem. A gene whose intensity value is very high because of noise can potentially drown out the biological function of other genes, and such high-intensity irrelevant expression may be an outlier [7]. Thus, in this paper our approach is to design a robust classification method for microarray data. Analyzing simulated and real gene expression data, the proposed method performs better than other classical methods; its advantage is that it resists both the high-dimensionality and the outlier problems.
II. METHODS
Linear Bayes Classifier
Suppose we have training gene expression samples x_1^(k), x_2^(k), …, x_{n_k}^(k) obtained from a gene expression density f_k for k = 1, …, m. Our goal is to classify a new gene expression sample x = (x_1, x_2, …, x_p)^T into one of the m gene expression population groups Π_k based on the training samples. The pdf of the new gene expression sample x can be defined by the mixture distribution

f(x; Θ) = ∑_{k=1}^{m} π_k f_k(x; θ_k),   (1)

where π_k is the mixing proportion of x belonging to population Π_k, such that ∑_{k=1}^{m} π_k = 1.
Then the posterior pdf that x belongs to Π_k is given by

P(Π_k | x; Θ) = π_k f_k(x; θ_k) / ∑_{j=1}^{m} π_j f_j(x; θ_j).   (2)
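The posterior of Eq. (2) is simple to compute once the component densities are specified. Below is a minimal sketch for Gaussian components; the function names and the toy two-class data are illustrative, not from the paper.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate Gaussian density (illustrative helper)."""
    d = len(mean)
    diff = x - mean
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff)

def posterior(x, means, covs, mix):
    """Posterior P(Pi_k | x) of Eq. (2): mixing-weighted densities, normalized."""
    dens = np.array([p * gaussian_pdf(x, m, c)
                     for p, m, c in zip(mix, means, covs)])
    return dens / dens.sum()

# toy two-class example with equal mixing proportions
means = [np.zeros(2), np.array([3.0, 3.0])]
covs = [np.eye(2), np.eye(2)]
post = posterior(np.array([0.1, -0.2]), means, covs, [0.5, 0.5])
```

A sample near the first class mean receives nearly all of the posterior mass for that class.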
For the Bayes classifier, the space of all observations is divided into m mutually exclusive and exhaustive regions Ω_k (k = 1, …, m). The region Ω_k for classifying a new gene expression sample x into one of the m populations is then defined as follows:
Ω_k : π_k f_k(x; θ_k) > π_j f_j(x; θ_j),   j = 1, 2, …, m (j ≠ k).   (3)
This minimizes the expected cost of misclassification [1]. For simplicity we assume that the costs and mixing proportions are equal for each population and that the gene expression distributions are multivariate Gaussian with identical covariance matrices; the Bayes classifier is then linear [14] and is given by
W(x) = a^T x − b,   (4)

where a = Σ^{-1}(μ^(1) − μ^(2)) and b = (1/2) a^T (μ^(1) + μ^(2)). The standard estimates of the mean vectors μ^(k) and the covariance matrix Σ are strongly influenced by outliers. The linear Bayes classifier obviously depends on the feature vector, the mean vectors and the covariance matrices, which are estimated by the non-robust maximum likelihood estimates based on the training samples. Therefore, the traditional linear Bayes procedure may produce misleading results in the presence of outliers in the training set, the test set, or both. To improve the results, Matiur and Mollah, 2014 [35] proposed a robust Bayes classification rule using the minimum β-divergence method [4,8].
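The plug-in two-class rule of Eq. (4) can be sketched as follows; the pooled-covariance estimate and the function names are our choices for illustration.

```python
import numpy as np

def fit_linear_bayes(X1, X2):
    """Plug-in linear Bayes rule (Eq. 4) for two Gaussian classes
    sharing a pooled covariance matrix."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    n1, n2 = len(X1), len(X2)
    pooled = ((n1 - 1) * np.cov(X1, rowvar=False) +
              (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    a = np.linalg.solve(pooled, mu1 - mu2)   # a = Sigma^{-1}(mu1 - mu2)
    b = 0.5 * a @ (mu1 + mu2)                # b = (1/2) a'(mu1 + mu2)
    return a, b

def classify(x, a, b):
    """Assign to class 1 when W(x) = a'x - b > 0, else class 2."""
    return 1 if a @ x - b > 0 else 2
```

Because the estimates above are ordinary sample means and covariances, a single gross outlier in X1 or X2 can shift the decision boundary, which is exactly the weakness the robust rule below addresses.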
Robust Linear Bayes classification Rule:
The minimum β-divergence estimators μ̂^(k) and Σ̂^(k) of the mean vector μ^(k) and the covariance matrix Σ^(k), respectively, are obtained iteratively as follows:

μ_{t+1}^(k) = ∑_{i=1}^{n_k} φ_β(x_i^(k); μ_t^(k), Σ_t^(k)) x_i^(k) / ∑_{i=1}^{n_k} φ_β(x_i^(k); μ_t^(k), Σ_t^(k))   (5)

and

Σ_{t+1}^(k) = ∑_{i=1}^{n_k} φ_β(x_i^(k); μ_t^(k), Σ_t^(k)) (x_i^(k) − μ_t^(k))(x_i^(k) − μ_t^(k))^T / [(1+β)^{-1} ∑_{i=1}^{n_k} φ_β(x_i^(k); μ_t^(k), Σ_t^(k))],   (6)

where φ_β(x; μ^(k), Σ^(k)) = exp{−(β/2)(x − μ^(k))^T (Σ^(k))^{-1} (x − μ^(k))} is the β-weight function [4,8]. If (Σ^(k))^{-1} does not exist, the Moore-Penrose generalized inverse of Σ^(k) is used during the iteration. As β tends to 0, (5) and (6) reduce to the classical non-iterative estimates of the mean and covariance matrix, respectively. The β-weight function plays the key role in the robustification of the linear Bayes classifier, as discussed below.
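The iteration in Eqs. (5)-(6) is a weighted mean/covariance update in which distant points get exponentially small β-weights. The sketch below follows that scheme; the starting values, the fixed iteration count and the function names are our assumptions, not prescribed by the paper.

```python
import numpy as np

def beta_weight(x, mu, cov_inv, beta):
    """phi_beta of the paper: exp(-(beta/2) * squared Mahalanobis distance)."""
    d = x - mu
    return np.exp(-0.5 * beta * d @ cov_inv @ d)

def min_beta_divergence(X, beta=0.1, n_iter=50):
    """Iterative estimators in the spirit of Eqs. (5)-(6), initialized at the
    classical estimates (a sketch; no convergence check is performed)."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    for _ in range(n_iter):
        cov_inv = np.linalg.pinv(cov)          # Moore-Penrose if singular
        w = np.array([beta_weight(x, mu, cov_inv, beta) for x in X])
        mu = (w[:, None] * X).sum(axis=0) / w.sum()
        diff = X - mu
        # Eq. (6): weighted scatter divided by (1+beta)^{-1} * sum of weights
        cov = (w[:, None, None] * np.einsum('ni,nj->nij', diff, diff)).sum(axis=0) \
              / (w.sum() / (1.0 + beta))
    return mu, cov
```

With a gross outlier in the data, the robust mean stays near the bulk of the points while the classical mean is dragged toward the outlier.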
Step-1: First, calculate the β-weight of the new gene expression feature vector x using the β-weight function

ψ_{β,k}(x) = φ_β(x; μ̂^(k), Σ̂^(k)) = exp{−(β/2)(x − μ̂^(k))^T (Σ̂^(k))^{-1} (x − μ̂^(k))},   β > 0,   (7)

which equals 1 when x = μ̂^(k), is comparatively large when x ∈ Π_k, is comparatively small when x ∉ Π_k, and tends to 0 as the Mahalanobis distance tends to ∞,
and then construct a criterion to test whether the feature vector is contaminated, as follows:

x is not contaminated if W_β(x) = max_k ψ_{β,k}(x) ≥ λ_β, and contaminated otherwise,   (8)

where λ_β = (1 − η) min_{x∈D} W_β(x) + η max_{x∈D} W_β(x), with heuristically η = 0.15, and D is the gene expression dataset including the new test sample x. This choice of threshold was also used in [8]. If the test sample x is not contaminated by outliers, we compute the classification regions defined in (3) using the minimum β-divergence estimators {μ̂^(k), Σ̂^(k)} of {μ^(k), Σ^(k)}, where Σ̂^(k) is computed using equation (6). If the test sample x is contaminated by outliers, we classify it after replacing its contaminated components by the corresponding mean components, as discussed in the following Step-2.
Step-2: For a contaminated test gene expression sample x we calculate the absolute differences between the contaminated x and each training gene expression mean vector as

d^(k) = abs(x − μ̂^(k)),   k = 1, 2, …, m,

order the components of each d^(k) so that d_(1)^(k) ≤ d_(2)^(k) ≤ ⋯ ≤ d_(p)^(k), and then compute the sum of the smallest r components as

S_k = ∑_{j=1}^{r} d_(j)^(k),   k = 1, 2, …, m.

Now, we find the tentative class or population for x as
International Conference on Materials, Electronics & Information Engineering, ICMEIE-2015
05-06 June, 2015, Faculty of Engineering, University of Rajshahi, Bangladesh www.ru.ac.bd/icmeie2015/proceedings/
ISBN 978-984-33-8940-4
[Fig. 1: model for generating simulated microarray gene expression data. Fig. 2: workflow for real data analysis. Fig. 3(a): clean data set; Fig. 3(b): training-contaminated data set.]
k = arg min_k S_k,   (9)
and some or all components of the unclassified contaminated x corresponding to the largest ordered differences d_(r+1)^(k), …, d_(p)^(k) are assumed to be corrupted by outliers. Then we update x by replacing its corrupted components with the corresponding mean components of the mean vector μ̂^(k) of the k-th population. Let x* be the updated vector of the contaminated data vector x. Then we use x* instead of x to compute the classification regions defined in (3) for classifying x*.
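Step-2 can be sketched compactly: pick the tentative class by Eq. (9), then overwrite the components with the largest deviations. The function name and the toy data are illustrative, and here all components beyond the r smallest are treated as corrupted.

```python
import numpy as np

def repair_contaminated(x, class_means, r):
    """Step-2 sketch: choose the tentative class k minimizing the sum of the
    r smallest absolute deviations (Eq. 9), then replace the remaining
    (largest-deviation) components of x with that class's mean components."""
    devs = [np.abs(x - mu) for mu in class_means]
    scores = [np.sort(d)[:r].sum() for d in devs]   # S_k of Eq. (9)
    k = int(np.argmin(scores))
    order = np.argsort(devs[k])        # component indices, ascending deviation
    corrupted = order[r:]              # components beyond the r smallest
    x_star = x.copy()
    x_star[corrupted] = class_means[k][corrupted]
    return x_star, k
```

The repaired x* is then classified with the regions of Eq. (3) as usual.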
III. GENE EXPRESSION DATA ANALYSIS
Simulated Gene Expression Data Set
Here we generated microarray data using the model displayed in Fig. 1. This model produces microarray gene expression datasets with two expression levels corresponding to two sets of differentially expressed genes, where d is the difference between the expression levels; [12] also used this model to generate simulated microarray datasets. We randomly added Gaussian noise N(0, σ²) to each expression value of each gene. In Fig. 1, columns represent the gene expression of positive and negative control patients and rows represent the gene groups.
Using this model (Fig. 1) we generated several training and test gene sets with different values of the parameter d and different numbers of samples (n = n1 + n2) and genes (g = g1 + g2). In mathematical terms, let X = [x_ij]_{n×p} be a gene expression matrix, where x_ij represents the expression level of gene i (i = 1, 2, …, n) in sample j (j = 1, 2, …, p). A contaminated version of X is X_c = [κ x_ij]_{n×p}, where κ = v exp{N(0, 1)}, v ∈ (30, 100).
[Fig. 3(c): test-contaminated data set; Fig. 3(d): both-contaminated data set.]
Real Gene Expression Data Set
Head and neck cancer data (HNC): We analyzed publicly available microarray data from a study of head and neck cancer, in which RNA was extracted from 22 paired samples of HNSCC and normal tissue from the same donors and hybridized to the Affymetrix U95A chip [6]. These data consist of 12,625 cellular RNA transcripts in tumor and normal tissues from 22 patients with histologically confirmed HNSCC.
Colon data: The microarray experiments on colon tissue samples of [15] give the expression levels of 2,000 genes for 62 samples, of which 40 are tumor tissues and 22 are normal tissues. Originally, 6,000 gene expression values were measured on an Affymetrix human array; 4,000 of these genes were removed considering the reliability of the measured expression levels. The measured expression values of the remaining 2,000 genes are publicly available in the colonCA library in Bioconductor [30] and at http://microarray.princeton.edu/oncology/affydata/index.html.
Gene selection: Dimensionality reduction of microarray gene expression data has been performed by many authors, for example [16,18,20] among others. In this paper, we applied 7 gene selection methods to improve classification performance in the real gene expression data analysis: Empirical Bayes (EB), significance analysis of microarrays (SAM) [29], the t-statistic, linear models for microarray data (Limma) [13], GaGa [28], Bridge [21] and β-Empirical Bayes (β-EB) [3]. Applying these methods to the datasets, we initially found little agreement among the gene lists produced by the different methods. The t-test, Limma and SAM detect DE genes based on p-values, while the EB, GaGa, Bridge and β-EB approaches detect DE genes based on posterior probabilities. To detect the DE genes we consider p-value < 0.05 and posterior probability > 0.9. Tables 1 and 2 show the DE gene lists for the two microarray datasets produced by each of the methods; all methods agreed on the same 239 and 24 DE genes for the HNC and colon cancer datasets, respectively. We then ordered the DE genes according to their p-values or posterior probabilities and selected the top 20 genes for each dataset to evaluate the class prediction efficiency of each gene list in training and test cross-validation using six supervised classification methods.
A complete workflow for the real data analysis is shown in Fig. 2. The most highly differentially expressed genes were selected from the 2 gene expression datasets using the 7 feature selection approaches; in each case the selected genes were passed to the 6 classifiers, and the misclassification error rate of each classifier was recorded using LOCV. To assess the performance of our proposed method, we assume that the original data have a zero level of noise; we therefore added noise to the original DE gene sets in the same way as described before.
IV. RESULTS AND DISCUSSION
Simulation results:
To explore the performance of our proposed method, we calculated the misclassification rate (MR) for each of the comparative classification methods. We generated 1,000 simulated microarray datasets for computing the MR and added 20% outliers for data
Table 1: Misclassification rate (MR) on the head and neck cancer dataset for the top-20 sets of differentially expressed genes, for the different classifiers and our proposed classifier.
Table 2: Misclassification rate (MR) on the colon cancer dataset for the top-20 sets of differentially expressed genes, for the different classifiers and our proposed classifier.
Table 3: Misclassification rate (MR) on the top 20 of the same differentially expressed (DE) genes obtained by the different DE detection methods, for the different classifiers including our proposed method.
contamination. In this paper we compared our proposed classifier with the traditional linear Bayes classifier, the support vector machine (SVM), k-nearest neighbor (KNN) and boosting (Adaboost and Logitboost) methods. Fig. 3 shows box plots of the MR of each classifier for simulated datasets (with and without outliers) generated with the same settings, i.e., n = 100, d = 2, σ = 3, g = 60. The box plot in Fig. 3(a) shows that without contamination the traditional methods and our proposed method produce similar results, with SVM performing best. Fig. 3(b) shows that with contaminated training data, the MRs of Logitboost and our proposed method are smaller than those of the other four methods (linear Bayes, SVM, KNN and Adaboost). Fig. 3(c) shows that with a contaminated test dataset the MR of our proposed method is the smallest among all the classifiers, and Fig. 3(d) shows that when both datasets are contaminated our proposed method again performs better than the other methods. Overall, Fig. 3 indicates that the proposed method gives a smaller error rate in the presence of outliers, while in the absence of outliers its MR matches that of the classical methods. We also computed the true positive rate (TPR), false positive rate (FPR), false discovery rate (FDR) and misclassification rate (MR) to assess the performance of the proposed method under different parameter settings in the simulated gene expression datasets. Table 4 shows the average TPR, FPR, FDR and MCR of the predicted samples with n = 60, 70, 80, 90, 100; d = 2, σ = 3, g = 30. Table 5 shows the same with n = 100; d = 1, 1.5, 2, 2.5, 3; σ = 3, g = 30. Table 6 shows the same with n = 100; d = 2; σ = 1, 1.5, 2, 2.5, 3; g = 30. Table 7 shows the same with n = 100; d = 2; σ = 3; g = 30, 40, 50, 60, 70. In each of these cases our proposed method shows better results than the other methods.

Table 4: Average results over 1,000 contaminated simulated gene expression datasets for the robust linear Bayes classifier (with increasing number of samples n).
Table 5: Average results over 1,000 contaminated simulated gene expression datasets for the robust linear Bayes classifier (with increasing difference d).
Table 6: Average results over 1,000 contaminated simulated gene expression datasets for the robust linear Bayes classifier (with increasing variance σ²).
Table 7: Average results over 1,000 contaminated simulated gene expression datasets for the robust linear Bayes classifier (with increasing number of genes g).

Analysis of the head and neck cancer (HNC) data:
Using the top 20 genes, Table 1 shows the MR for the original and contaminated HNC datasets. For the original data, the classical and robust linear Bayes classifiers have similar MRs on each gene list. On the gene set obtained by the EB approach, the Adaboost classifier produced a smaller MR than the others; on the gene set obtained by the t-test, the linear Bayes, SVM, Adaboost and robust linear Bayes classifiers produced the smaller MRs; and on the gene set obtained by SAM, the linear Bayes, SVM, KNN and robust linear Bayes classifiers produced smaller MRs than the other classifiers. For the gene sets obtained by Limma and Bridge, the linear Bayes, SVM and robust linear Bayes classifiers have smaller MRs than the others. For the gene sets obtained by the GaGa and β-EB approaches, the SVM and KNN classifiers, and the linear Bayes, SVM, Logitboost and robust linear Bayes classifiers, respectively, have smaller MRs than the others.
To evaluate the performance of our proposed method we added 10% outliers to the original dataset; Table 1 shows that on each gene set our proposed classifier then produced better results than the others.
Analysis of the colon cancer data:
From Table 2 we observe that on the gene sets obtained by the t-test, Limma and Bridge methods the SVM classifier produced a smaller MR than the other classifiers, although the MRs of linear Bayes and our proposed method are also small. On the gene sets obtained by the Limma and β-EB methods, the linear Bayes and robust linear Bayes classifiers have smaller MRs than the others. Across the entire gene lists, on the original colon cancer dataset our proposed method performs similarly to the other classifiers that produced small MRs. Again, to assess our proposed method we added only 10% outliers, and Table 2 shows that no other method performs better than our proposed method.
Table 3 shows the MR of the different classifiers, including our proposed one, on the same gene set obtained by the DE gene detection methods mentioned earlier; each method yielded the same 239 genes for the HNC dataset and 24 genes for the colon cancer dataset, and these gene sets share the same characteristics.
V. CONCLUSION
We have introduced a robust statistical classifier that extends the linear Bayes classifier using the minimum β-divergence method. The methodology is designed for noisy gene expression data analysis, and the proposed method is also useful for multiclass classification problems with both contaminated and uncontaminated data. To compare the performance of the proposed method with the classical linear Bayes method for identifying unknown gene groups, we calculated the average values of TPR, FPR, FDR and MR based on 1,000 simulated microarray datasets. The performance of the classical linear Bayes method and the proposed method is almost the same in the absence of outliers, while in the presence of outliers the proposed method performs much better than the classical linear Bayes method. In this simulation the classical linear Bayes method had a much lower TPR and higher FPR, FDR and MR in the presence of outliers. In the real data analysis we also observed that the traditional and robust linear Bayes classifiers maintained similar MRs on the original datasets, but on the contaminated datasets the classification accuracy of the robust linear Bayes classifier was comparatively higher than that of the other classification methods compared in this study. For the real datasets, LOCV (leave-one-out cross-validation) of the classifiers was performed using full training and test cross-validation to estimate the performance of the classification algorithms. We have also developed a criterion to detect contaminated gene expression samples or genes, and we update the contaminated expressions with the corresponding robust group means of the training samples [4], based on the minimum distance from the training sample groups. Thus, our method can increase prediction accuracy for noisy gene expression datasets.
The value of the tuning parameter β also plays a key role in the performance of the proposed method and in diagnosing the contaminated expressions; we selected β using cross-validation. For the contaminated simulated data the selected value of β was on average 0.015, for the real data 0.001, and for uncontaminated data β was selected as 0.
VI. ACKNOWLEDGMENT
This work is supported by HEQEP sub-project (CP-3603, R-3, W-2), Bioinformatics Lab, Department of Statistics, University of Rajshahi, Bangladesh. The authors would like to thank the anonymous reviewers for their helpful comments.
VII. REFERENCES
[1] Anderson, T.W.(2003): An Introduction to Multivariate Statistical Analysis, Wiley Interscience.
[2] Johnson, R.A., Wichern, D.W. (2007): Applied multivariate statistical analysis, Sixth edition, Prentice-Hall.
[3] Mollah, M.M.H., Mollah, M.N.H. and Kishino, H. (2012): β-empirical Bayes inference and model diagnosis of microarray data. BMC Bioinformatics, 13:135.
[4] Mollah, M.N.H., Minami, M. and Eguchi, S. (2007): Robust prewhitening for ICA by minimizing beta-divergence and its application to FastICA. Neural Processing Letters, 25(2), pp. 91-110.
[5] Statnikov, A., Wang, L. and Aliferis, C.F. (2008): A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification. BMC Bioinformatics, 9:319.
[6] Kuriakose MA, Chen WT, He ZM, Sikora AG, Zhang P, Zhang ZY, Qiu WL, Hsu DF, McMunn- Coffran C, Brown SM, Elango EM, Delacure MD, Chen FA: Selection and validation of differentially expressed genes in head and neck cancer. Cell Mol Life Sci 2004 61, 1372-1383.
[7] Mollah, M.N.H., Pritchard, M., Komori, O. and Eguchi, S. (2009): Robust hierarchical clustering for gene expression data analysis. Communications of SIWN,Vol. 6, pp. 118 -122.
[8] Mollah, M.N.H.,Sultana,N., Minami, M. and Eguchi, S. (2010): Robust extraction of local structures by the minimum β-divergence method. Neural Networks, 23, pp. 226-238.
[9] Wang, S., Gui, J. and Li, X. (2008): Factor analysis for cross-platform tumor classification based on gene expression profiles. Journal of Circuits, Systems, and Computers, 19, pp. 243-258.
[10] Wuju, L. and Momiao, X. (2002): Tumor classification system based on gene expression profile. Bioinformatics, 18(2), pp. 325-326.
[11] Wright G.,Tan B., Rosenwald A., Hurt E., Wiestner A. and Staudt L. (2003): A gene expression- based method to diagnose clinically distinct subgroups of diffuse large B cell lymphoma. Proc Natl Acad Sci USA, 2003, 100:9991-9996.
[12] Nowak, G. and Tibshirani, R. (2008): Complementary Hierarchical Clustering. Biostatistics. 9, 3, 467-483.
[13] Smyth GK (2004): Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat Appl GenetMol Biol , 3(1):Article3.
[14] Hua, J. et al. (2004): Optimal number of features as a function of sample size for various classification rules. Bioinformatics, vol. 21 no. 8, pages 1509-1515.
[15] Alon,U., Barkai,N., Notterman,D., Gish,K., Mack,S. and Levine,J. (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. PNAS, 96, 6745-6750.
[16] Ben-Dor,A., Bruhn,L., Friedman,N., Nachman,I., Schummer,M. and Yakhini,Z. (2000) Tissue classification with gene expression profiles. J. Comput. Biol., 7, 559-583.
[17] Basford, K. E., McLachlan, G. J. and Rathnayake, S. l. (2012) On the classification of microarray gene-expression data. Briefings in Bioinformatics.
[18] Dudoit,S., Fridlyand,J. and Speed,T.(2002) Comparison of discrimination methods for the classification of tumors using gene expression data. JASA, 97, 77-87.
[19] Furey, T., Cristianini, N., Duffy, N., Bednarski, D., Schummer, M. and Haussler, D. (2000): Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics, 16, 906-914.
[20] Golub,T., Slonim,D., Tamayo,P., Huard,C., Gaasenbeek,M.,Mesirov,J., Coller,H., Loh,M., Downing,J. et al., (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring.Science,286,531-537.
[21] Gottardo, R. (2010): Bayesian Robust Inference of Differential Gene Expression, the bridge package, R, Bioconductor.
[22] Li, L., Rakitsch, B. and Borgwardt, K. (2011) ccSVM : correcting Support Vector Machines for confounding factors in biological data classification. Bioinformatics, Vol. 27,pagesi342-i348.
[23] Li, B. et al.(2010): Gene expression data classification using locally linear discriminant embedding Computers in Biology and Medicine 40,802-810
[24] Tang, K. L. et al. (2010):Ovarian cancer classification based on dimensionality reduction for SELDI-TOF data BMC Bioinformatics,11:109
[25] Slonim,D., Tamayo,P., Mesirov,J., Golub,T. and Lander,E. (2000) Class prediction and discovery using gene expression data. In Proceedings of the 4th International Confererence on Computational Molecular Biology. Universal Academy Press, Tokyo, Japan, pp. 263-272.
[26] Scott, L.P. et al. (2002): Prediction of central nervous system embryonal tumour outcome based on gene expression. Letters to Nature, Nature, 415:436-442.
[27] Veer, L.J. et al. (2002): Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415, 530-536.
[28] Rossell, D. (2009): GaGa: a parsimonious and flexible model for differential expression analysis. Ann Appl Statist, 3:1035-1051.
[29] Tusher V, Tibshirani R, Chu G (2001): Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci(PNAS), USA, 98:5116-5121.
[30] [http://www.bioconductor.org/].
[31] K. Bae, B.K. Mallick (2004): Gene selection using a two-level hierarchical Bayesian model. Bioinformatics, 20, 3423-3430.
[32] K.E. Lee, N. Sha, E.R. Dougherty, M. Vannucci, B.K. Mallick (2003): Gene selection: a Bayesian variable selection approach. Bioinformatics, 19, 90-97.
[33] J.G. Liao, K.V. Chin (2007): Logistic regression for disease classification using microarray data: model selection in a large p and small n case. Bioinformatics, 23(15), 1945-1951.
[34] S. Shah, A. Kusiak (2007): Cancer gene search with data-mining and genetic algorithms. Computers in Biology and Medicine, 37(2), 251-261.
[35] Md. Matiur Rahaman and Md. Nurul Haque Mollah (2014, submitted for publication): Robustification of Gaussian Bayes classifier by the minimum β-divergence method.