SWUP
Parameter estimation of kernel logistic regression
Riska Yanu Fa’rifah*, Suhartono, Santi Puteri Rahayu
Department of Statistics, Institut Teknologi Sepuluh Nopember, Sukolilo, Surabaya 60111, Indonesia
Abstract
Logistic regression (LR) is a nonlinear classification method often used for binary data sets. Overfitting of the training data may arise in LR, especially when the data sets used are high-dimensional. One approach to reducing overfitting is the regularized LR method, defined by adding a regularization parameter to the log-likelihood function of LR. A regularized optimization problem arises because the loss function (deviance) of regularized LR is nonlinear. To minimize this problem, a linear-combination form of regularized LR is needed, known as kernel logistic regression (KLR). KLR is a nonlinear classifier, and it provides higher classification accuracy than LR on small to medium sample sizes. With the truncated Newton method, KLR estimation using maximum likelihood estimation (MLE) can be made optimal.
Keywords kernel logistic regression, logistic regression, MLE, regularized logistic regression, truncated Newton
1. Introduction
Regression is a statistical method that describes the causal relationship between a response and predictors (Draper & Smith, 1998). If the response is categorical (nonmetric), a classification method such as logistic regression (LR) can be used. Overfitting of the training data may arise, especially if the data sets are high-dimensional (Hosmer & Lemeshow, 2000). One approach to reducing overfitting is quadratic regularization, known as regularized LR (Maalouf, 2009). Regularized LR is formed by adding a regularization parameter to the log-likelihood function of LR. If the analysis uses a small to medium sample size, the resulting loss function (deviance) is not minimal, because the deviance is nonlinear in its parameters; consequently, parameter estimation using MLE is not optimal. This can be solved by taking a linear combination of the regularized LR parameters, which leads to kernel logistic regression (Maalouf, 2009).
KLR is a nonlinear classifier method that combines regularized LR with a kernel function. Parameter estimation of KLR using MLE has no closed form, so a numerical method is needed to optimize the estimates, namely Newton–Raphson (Minka, 2003). However, Newton–Raphson does not provide an optimal estimate when the Hessian matrix is high-dimensional. Thus, Maalouf et al. (2010) added a conjugate gradient algorithm to the truncated Newton method. This approach was first used by Komarek & Moore (2005) to obtain parameter estimates for regularized LR.
2. Materials and methods
2.1 Logistic regression
LR is a linear classifier method used to determine the relationship between a categorical dependent variable and one or more predictors, which may be categorical or continuous (Agresti, 2002). The LR model is given by
$\pi(\mathbf{x}_i) = \dfrac{\exp(\mathbf{x}_i^T\boldsymbol{\beta})}{1+\exp(\mathbf{x}_i^T\boldsymbol{\beta})}.$ (1)
From Eq. (1), the logit function can be defined as
$\mathrm{logit}[\pi(\mathbf{x}_i)] = \mathbf{x}_i^T\boldsymbol{\beta} = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip},$
for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, p$, where $n$ and $p$ are respectively the number of observations and the number of predictors.
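As an illustration, the model in Eq. (1) and its logit can be evaluated directly. The following sketch uses a hypothetical design matrix and coefficient vector (not data from this study) and shows that the logit recovers the linear predictor:

```python
import numpy as np

def logistic_prob(X, beta):
    """pi(x_i) = exp(x_i' beta) / (1 + exp(x_i' beta)), as in Eq. (1).

    X is assumed to carry a leading column of ones for the intercept beta_0.
    """
    eta = X @ beta                     # linear predictor x_i' beta (the logit)
    return 1.0 / (1.0 + np.exp(-eta))  # equivalent, numerically stable form

# hypothetical data: n = 3 observations, intercept plus p = 2 predictors
X = np.array([[1.0,  0.5, -1.2],
              [1.0,  1.5,  0.3],
              [1.0, -0.7,  2.0]])
beta = np.array([0.1, 0.8, -0.5])

pi = logistic_prob(X, beta)
logit = np.log(pi / (1.0 - pi))        # recovers logit[pi(x_i)] = x_i' beta
```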
Parameter estimates are obtained from the MLE method, with likelihood function
$L(\boldsymbol{\beta}) = \prod_{i=1}^{n} \pi(\mathbf{x}_i)^{y_i}\,[1-\pi(\mathbf{x}_i)]^{1-y_i}.$
Parameter estimates are obtained from the derivative of the log-likelihood function. The first derivative is
$\dfrac{\partial \ln L(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = \mathbf{X}^T(\mathbf{y}-\boldsymbol{\pi}).$ (2)
Eq. (2) is also known as the gradient vector. That equation still contains the parameters, so its solution requires a numerical method. The method used is Newton–Raphson, with the iteration
$\boldsymbol{\beta}^{(k+1)} = \boldsymbol{\beta}^{(k)} + \mathbf{s}^{(k)}, \quad \text{with } \mathbf{s}^{(k)} = -\mathbf{H}(\boldsymbol{\beta}^{(k)})^{-1}\,\mathbf{g}(\boldsymbol{\beta}^{(k)}).$ (3)
Based on Eq. (3), obtaining the LR parameter estimates with Newton–Raphson iterations requires the Hessian matrix, which can be defined as
$\mathbf{H}(\boldsymbol{\beta}) = \dfrac{\partial^2 \ln L(\boldsymbol{\beta})}{\partial\boldsymbol{\beta}\,\partial\boldsymbol{\beta}^T} = -\sum_{i=1}^{n}\mathbf{x}_i\mathbf{x}_i^T\,\pi(\mathbf{x}_i)[1-\pi(\mathbf{x}_i)] = -\mathbf{X}^T\mathbf{V}\mathbf{X},$ (4)
where $\mathbf{V}$ is the diagonal matrix with entries $\pi(\mathbf{x}_i)[1-\pi(\mathbf{x}_i)]$.
From Eqs. (3) and (4) we obtain:
$\boldsymbol{\beta}^{(k+1)} = \boldsymbol{\beta}^{(k)} + \mathbf{s}^{(k)} = \boldsymbol{\beta}^{(k)} - \mathbf{H}(\boldsymbol{\beta}^{(k)})^{-1}\,\mathbf{g}(\boldsymbol{\beta}^{(k)}) = (\mathbf{X}^T\mathbf{V}\mathbf{X})^{-1}\mathbf{X}^T\mathbf{V}\mathbf{z}^{(k)},$
with $\mathbf{z}^{(k)} = \mathbf{X}\boldsymbol{\beta}^{(k)} + \mathbf{V}^{-1}(\mathbf{y}-\boldsymbol{\pi})$. Overfitting of the training data may occur when high-dimensional data sets are used. Overfitting can be reduced by using regularized LR (Maalouf, 2009; Maalouf et al., 2010).
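The iteration above can be sketched as a small IRLS routine. This is an illustrative implementation on hypothetical non-separable data, not the code used in this study:

```python
import numpy as np

def irls_logistic(X, y, n_iter=25):
    """Newton-Raphson / IRLS for plain LR:
    beta^(k+1) = (X' V X)^{-1} X' V z^(k),
    with working response z^(k) = X beta^(k) + V^{-1} (y - pi).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-X @ beta))
        v = pi * (1.0 - pi)            # diagonal of V
        z = X @ beta + (y - pi) / v    # working response
        XtV = X.T * v                  # X' V without forming V explicitly
        beta = np.linalg.solve(XtV @ X, XtV @ z)
    return beta

# hypothetical data: intercept plus one predictor, overlapping classes
X = np.column_stack([np.ones(6), [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
beta_hat = irls_logistic(X, y)
```

At convergence the score equation of Eq. (2) is satisfied, i.e., X'(y - pi) vanishes.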
2.2 Regularized logistic regression
Using the MLE method and iteratively reweighted least squares (IRLS), i.e., the Newton–Raphson method, the parameter estimate of regularized LR can be obtained as
$\boldsymbol{\beta}^{*(k+1)} = (\mathbf{X}^T\mathbf{V}\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{V}\mathbf{z}^{(k)}, \quad \text{with } \mathbf{z}^{(k)} = \mathbf{X}\boldsymbol{\beta}^{*(k)} + \mathbf{V}^{-1}(\mathbf{y}-\boldsymbol{\pi}).$
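The regularized update differs from plain IRLS only in the ridge term added to the weighted normal equations. A minimal sketch on hypothetical data, assuming a quadratic penalty with parameter lam:

```python
import numpy as np

def irls_regularized(X, y, lam=1.0, n_iter=25):
    """Regularized LR update:
    beta^(k+1) = (X' V X + lam I)^{-1} X' V z^(k).
    """
    p = X.shape[1]
    beta = np.zeros(p)
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-X @ beta))
        v = pi * (1.0 - pi)
        z = X @ beta + (y - pi) / v
        XtV = X.T * v
        # lam * I keeps X'VX + lam*I invertible and shrinks the estimates
        beta = np.linalg.solve(XtV @ X + lam * np.eye(p), XtV @ z)
    return beta

# hypothetical data as in the unregularized example
X = np.column_stack([np.ones(6), [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
beta_reg = irls_regularized(X, y, lam=1.0)
```

At the fixed point the penalized score X'(y - pi) - lam * beta vanishes, which is the stationarity condition of the regularized log-likelihood.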
The resulting deviance function is nonlinear in the parameters, so parameter estimation using MLE does not reach the maximum. To overcome this problem, Maalouf (2009) and Maalouf, Trafalis, and Adrianto (2010) added a conjugate gradient method to the Newton–Raphson iteration, named truncated iteratively reweighted least squares (TR-IRLS) or truncated Newton.
3. Results and discussion
Kernel logistic regression is a combination of regularized LR and a kernel function. Besides adding a conjugate gradient method to iteratively reweighted least squares, Maalouf (2009) and Maalouf, Trafalis, and Adrianto (2010) explain that the nonlinear deviance can be handled by using a linear combination of the input vectors. The linear combination is
$\boldsymbol{\beta} = \mathbf{X}^T\boldsymbol{\alpha} = \sum_{i=1}^{n}\alpha_i\mathbf{x}_i.$
The log-likelihood function of KLR is
$\ln L(\boldsymbol{\alpha}) = \sum_{i=1}^{n} y_i \ln \pi(\mathbf{k}_i) + \sum_{i=1}^{n}(1-y_i)\ln[1-\pi(\mathbf{k}_i)] - \frac{\lambda}{2}\boldsymbol{\alpha}^T\mathbf{K}\boldsymbol{\alpha},$ (5)
where $\mathbf{k}_i$ denotes the $i$-th row of the kernel matrix $\mathbf{K}$.
To get the parameter estimates, Eq. (5) is differentiated twice. The first derivative gives the gradient vector:
$\mathbf{g}(\boldsymbol{\alpha}) = \dfrac{\partial \ln L(\boldsymbol{\alpha})}{\partial\boldsymbol{\alpha}} = \mathbf{K}^T(\mathbf{y}-\boldsymbol{\pi}) - \lambda\mathbf{K}\boldsymbol{\alpha}.$
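The paper does not fix a specific kernel; as an assumed choice for illustration, the sketch below builds an RBF Gram matrix K on hypothetical data and evaluates the gradient of Eq. (5) at alpha = 0:

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gram matrix K with K_ij = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared distances
    return np.exp(-gamma * d2)

def klr_gradient(K, y, alpha, lam):
    """Gradient of the KLR log-likelihood in Eq. (5):
    g(alpha) = K'(y - pi) - lam * K alpha.
    """
    pi = 1.0 / (1.0 + np.exp(-K @ alpha))
    return K.T @ (y - pi) - lam * (K @ alpha)

# hypothetical sample of n = 3 points with p = 2 predictors
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 0.0, 1.0])
K = rbf_kernel(X)
g0 = klr_gradient(K, y, np.zeros(3), lam=0.5)  # gradient at alpha = 0
```

At alpha = 0 every pi(k_i) = 0.5, so the penalty term vanishes and the gradient reduces to K(y - 0.5).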
The first derivative has no closed form, because it still contains the parameters, so numerical methods are required. The method is called truncated Newton. This iterative method has two algorithms: the first is the outer loop and the second is the inner loop. The outer loop obtains the parameter estimates with the IRLS method and can be defined as
$\boldsymbol{\alpha}^{(k+1)} = \boldsymbol{\alpha}^{(k)} + \mathbf{s}^{(k)} = \boldsymbol{\alpha}^{(k)} - \mathbf{H}(\boldsymbol{\alpha}^{(k)})^{-1}\,\mathbf{g}(\boldsymbol{\alpha}^{(k)}),$ (6)
with Hessian matrix
$\mathbf{H}(\boldsymbol{\alpha}) = \dfrac{\partial^2 \ln L(\boldsymbol{\alpha})}{\partial\boldsymbol{\alpha}\,\partial\boldsymbol{\alpha}^T} = -(\mathbf{K}^T\mathbf{V}\mathbf{K} + \lambda\mathbf{K}).$ (7)
From Eq. (7), the KLR parameter estimate using MLE can be written as
$\boldsymbol{\alpha}^{(k+1)} = (\mathbf{K}^T\mathbf{V}\mathbf{K} + \lambda\mathbf{K})^{-1}\mathbf{K}^T\mathbf{V}\mathbf{z}^{(k)}, \quad \text{with } \mathbf{z}^{(k)} = \mathbf{K}\boldsymbol{\alpha}^{(k)} + \mathbf{V}^{-1}(\mathbf{y}-\boldsymbol{\pi}).$
Based on the outer-loop algorithm above, the parameter estimate of KLR is not yet optimal: the Hessian matrix has a high dimension, making its inverse difficult to obtain. To solve the problem numerically, the algorithm continues with the inner loop, which uses the linear conjugate gradient (CG) method to solve the linear system of the outer-loop update and thereby obtain the optimal parameter estimate. Thus truncated Newton consists of two algorithms: the outer loop, named IRLS, which handles the MLE estimation that has no closed form, and the inner loop, the conjugate gradient (CG). The result of this research can be applied to the analysis of binary and multinomial classification.
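The two loops can be sketched as follows. This is an illustrative TR-IRLS implementation on hypothetical data with an assumed RBF kernel: the outer IRLS loop forms the linear system of the KLR update, and the inner CG loop solves it instead of inverting the high-dimensional Hessian:

```python
import numpy as np

def klr_truncated_newton(K, y, lam=0.5, outer=20, inner=50, tol=1e-10):
    """TR-IRLS sketch. Outer loop (IRLS): form the system
    (K' V K + lam K) alpha = K' V z^(k).
    Inner loop: linear conjugate gradient solves this system,
    avoiding an explicit inverse of the Hessian."""
    n = K.shape[0]
    alpha = np.zeros(n)
    for _ in range(outer):
        pi = 1.0 / (1.0 + np.exp(-K @ alpha))
        v = pi * (1.0 - pi)
        z = K @ alpha + (y - pi) / v
        A = (K.T * v) @ K + lam * K    # negative of the Hessian in Eq. (7)
        b = (K.T * v) @ z
        # inner loop: conjugate gradient for A alpha = b, started from zero
        alpha = np.zeros(n)
        r = b - A @ alpha              # residual
        d = r.copy()                   # search direction
        for _ in range(inner):
            rr = r @ r
            if rr < tol:
                break
            Ad = A @ d
            step = rr / (d @ Ad)
            alpha = alpha + step * d
            r = r - step * Ad
            d = r + (r @ r / rr) * d   # CG direction update
    return alpha

# hypothetical data: labels follow the sign of x1 + x2
rng = np.random.default_rng(0)
Xs = rng.normal(size=(20, 2))
y = (Xs[:, 0] + Xs[:, 1] > 0).astype(float)
sq = np.sum(Xs ** 2, axis=1)
K = np.exp(-(sq[:, None] + sq[None, :] - 2.0 * (Xs @ Xs.T)))  # RBF Gram matrix
alpha_hat = klr_truncated_newton(K, y, lam=0.5)
```

At convergence the penalized score K'(y - pi) - lam * K alpha of Eq. (5) is (numerically) zero. A production implementation would also add an outer-loop convergence check and a preconditioner for CG.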
Acknowledgment
The first author thanks the Directorate of Higher Education, Ministry of Education and Culture of the Republic of Indonesia, which provided financial support through the BPPDN 2013–2015 scholarship.
References
Draper, N.R., & Smith, H. (1998). Applied regression analysis (3rd ed.). John Wiley & Sons, Inc., New York.
Hosmer, D.W., & Lemeshow, S. (2000). Applied logistic regression (2nd ed.). John Wiley & Sons, Inc., New York.
Komarek, P., & Moore, A. (2005). Making logistic regression a core data mining tool with TR-IRLS. Proceedings of the Fifth IEEE International Conference on Data Mining, 685–688.
Maalouf, M. (2009). Robust weighted kernel logistic regression in imbalanced and rare events data. Dissertation for the Degree of Doctor of Philosophy, University of Oklahoma, Oklahoma.
Maalouf, M., Trafalis, T.B., & Adrianto, I. (2010). Kernel logistic regression using truncated Newton method. Springer, Berlin.