

4.2 Logistic Regression Model

Influential observations are detected using Cook's distance. An approximation of Cook's distance, C_i, is given by

C_i ≈ (r_p^2 h_ii) / [p(1 − h_ii)]    (4.9)

where p is the number of model parameters, r_p is the Pearson residual, and h_ii is the leverage, i.e. the ith diagonal element of the hat matrix. A high C_i implies that the ith observation is influential; usually this is caused by a large residual combined with a high leverage.

Generally, influential observations are assessed with an index plot of C_i against the observation number i. The ith observation is judged to have influence on the parameter estimates if its Cook's distance is greater than 1.
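As an illustration, the approximation in (4.9) can be computed directly from fitted probabilities, Pearson residuals, and leverages. The sketch below uses hypothetical data and a simple two-parameter model (intercept and one covariate); the fitted probabilities are assumed given rather than estimated.

```python
import math

# Hypothetical data: binary outcomes y, covariate x, and fitted
# probabilities pi_hat from a logistic model with p = 2 parameters.
y      = [0, 0, 1, 0, 1, 1, 1, 0]
x      = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 0.5]
pi_hat = [0.15, 0.25, 0.40, 0.55, 0.70, 0.80, 0.90, 0.10]
p = 2  # number of model parameters (intercept + slope)

# IRLS weights w_i = pi_i (1 - pi_i)
w = [pi * (1 - pi) for pi in pi_hat]

# Entries of X'WX for the 2-column design matrix with rows (1, x_i)
s00 = sum(w)
s01 = sum(wi * xi for wi, xi in zip(w, x))
s11 = sum(wi * xi * xi for wi, xi in zip(w, x))
det = s00 * s11 - s01 * s01

# Leverages: diagonal of the GLM hat matrix,
# h_ii = w_i * (1, x_i) (X'WX)^{-1} (1, x_i)'
h = [wi * (s11 - 2 * xi * s01 + xi * xi * s00) / det
     for wi, xi in zip(w, x)]

# Pearson residuals r_p = (y_i - pi_i) / sqrt(pi_i (1 - pi_i))
r = [(yi - pi) / math.sqrt(pi * (1 - pi)) for yi, pi in zip(y, pi_hat)]

# Approximate Cook's distance, equation (4.9)
cooks = [ri ** 2 * hi / (p * (1 - hi)) for ri, hi in zip(r, h)]

for i, ci in enumerate(cooks, start=1):
    print(f"observation {i}: C_i = {ci:.4f}")
```

An index plot of these C_i values against i, with a reference line at 1, reproduces the diagnostic described above.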

The logistic regression model, when the canonical link function is used, is given by

logit(π_i) = log[π_i / (1 − π_i)] = x_i'β,   i = 1, 2, …, n

The term log[π_i / (1 − π_i)] is called the logit function, and the ratio π_i / (1 − π_i) is the odds of success. The logit is generally referred to as the natural parameter of the binomial distribution, and the logit link is its canonical link.
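The logit link and its inverse can be sketched in a few lines; the value of π below is purely illustrative.

```python
import math

def logit(pi):
    """Canonical link: the log-odds of success."""
    return math.log(pi / (1 - pi))

def inv_logit(eta):
    """Inverse link: maps the linear predictor back to a probability."""
    return 1 / (1 + math.exp(-eta))

pi = 0.8
odds = pi / (1 - pi)   # odds of success
eta = logit(pi)        # log-odds, the linear-predictor scale
print(odds, eta, inv_logit(eta))
```

The round trip inv_logit(logit(π)) = π shows how fitted linear predictors x_i'β are converted back to probabilities.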

The probit model is an alternative method for modelling a binary response. It assumes that the tolerance term is normally distributed with mean µ and variance σ², i.e. N(µ, σ²), for unknown µ and σ (Olson, 2002). The probability π_i is then given by the cumulative distribution function (cdf) of the standard normal distribution. Generally, the probit model is given by

Φ⁻¹(π_i) = x_i'β,   i = 1, 2, …, n

The probit link function Φ⁻¹(·) maps probabilities in the (0, 1) range onto the (−∞, ∞) range of the linear predictor. The curve of the model has the shape of the normal cdf, since Φ is the standard normal cdf.

The complementary log-log model is a further alternative to the logistic and probit models for a binary response. Dobson (2002) argued that if the values of π are near 0.5, the complementary log-log model gives results similar to the logit and probit models. It is particularly applicable when π is near 0 or 1 (when there are extreme values): the complementary log-log cdf need not be symmetric about the mid-point π = 0.5, as is the case for the logit and probit models. The complementary log-log model is given by

log[−log(1 − π_i)] = x_i'β,   i = 1, 2, …, n
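Dobson's point can be checked numerically. The sketch below evaluates the three link functions at a few values of π using only the standard library (the standard normal quantile Φ⁻¹ comes from statistics.NormalDist); it shows that the logit and probit links are symmetric about π = 0.5 while the complementary log-log link is not.

```python
import math
from statistics import NormalDist

def logit(p):
    return math.log(p / (1 - p))

def probit(p):
    # Phi^{-1}(p), the standard normal quantile function
    return NormalDist().inv_cdf(p)

def cloglog(p):
    return math.log(-math.log(1 - p))

for p in (0.1, 0.5, 0.9):
    print(f"pi={p}: logit={logit(p):+.3f}  "
          f"probit={probit(p):+.3f}  cloglog={cloglog(p):+.3f}")

# Logit is odd-symmetric about 0.5: logit(1 - p) = -logit(p) ...
print(logit(0.1) + logit(0.9))      # essentially zero
# ... but the complementary log-log link is not:
print(cloglog(0.1) + cloglog(0.9))  # clearly non-zero
```

This asymmetry is exactly why the complementary log-log link can fit better when successes (or failures) are concentrated near one extreme.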

Since logistic regression is a special case of the generalized linear model, model selection and goodness-of-fit testing proceed as discussed in Section 4.1. For ungrouped binary data, however, the Pearson chi-square and likelihood ratio statistics do not have limiting chi-squared distributions; they remain useful for comparing models and can be applied in an approximate manner to grouped observed and fitted values for a partition of the space of x values. The difficulty is that, as the number of explanatory variables increases, simultaneous grouping of the values of each variable can produce a contingency table with a very large number of cells. It is then useful to apply the Hosmer-Lemeshow goodness-of-fit statistic, which partitions the observed and fitted values according to the estimated (predicted) probabilities of success computed from the original ungrouped data. This approach forms g groups of approximately equal size in the partition of observed frequencies of the response y = 1, preferably g = 10 groups (Hosmer and Lemeshow, 2000); the estimated probabilities are sorted in ascending order before the grouping is performed. According to Hosmer and Lemeshow (2000), two grouping strategies are proposed: first, collapsing the table on percentiles of the estimated probabilities; and second, collapsing the table on fixed values of the estimated probability. With g = 10 groups, the first method forms a first group containing the n′_1 = n/10 subjects with the smallest estimated probabilities and a last group containing the n′_10 = n/10 subjects with the largest estimated probabilities. In the second method, using g = 10 groups results in cutpoints defined at the values k/10, k = 1, 2, …, 9, with each group containing all subjects whose estimated probabilities fall between adjacent cutpoints.
For the row y = 1, the estimated expected values are obtained by summing the estimated probabilities over all subjects in a group; for the row y = 0, they are obtained by summing one minus the estimated probabilities over all subjects in a group (Hosmer and Lemeshow, 2000). Let X²_HL denote the Hosmer-Lemeshow goodness-of-fit test statistic. Then X²_HL has an approximate chi-square distribution with g − 2 degrees of freedom, and it is compared with the critical value of the chi-square distribution with g − 2 df, χ²_{g−2,α}, to check the model's goodness of fit. The interpretation is as follows: if X²_HL is significant, there is model lack of fit; if X²_HL is insignificant, the model fits the data well, hence there is goodness of fit (Hosmer and Lemeshow, 2000).
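A minimal sketch of the Hosmer-Lemeshow statistic with decile-of-risk grouping follows; the data are hypothetical (generated from the assumed probabilities), and the reference value χ²_{8,0.05} ≈ 15.507 is the critical value for g = 10 groups.

```python
import random

random.seed(42)

# Hypothetical data: fitted probabilities and outcomes drawn from them,
# so the model "fits" by construction.
n = 200
pi_hat = sorted(random.uniform(0.05, 0.95) for _ in range(n))
y = [1 if random.random() < p else 0 for p in pi_hat]

g = 10          # number of groups (deciles of risk)
size = n // g   # equal group sizes, n'_k = n/10

hl_stat = 0.0
for k in range(g):
    grp = range(k * size, (k + 1) * size)
    obs1 = sum(y[i] for i in grp)        # observed successes in group
    exp1 = sum(pi_hat[i] for i in grp)   # expected successes: sum of pi_hat
    obs0 = size - obs1                   # observed failures
    exp0 = size - exp1                   # expected failures: sum of 1 - pi_hat
    hl_stat += (obs1 - exp1) ** 2 / exp1 + (obs0 - exp0) ** 2 / exp0

print(f"X2_HL = {hl_stat:.3f} on g - 2 = {g - 2} df")
# chi-square critical value with 8 df at alpha = 0.05 is 15.507
print("lack of fit" if hl_stat > 15.507 else "no evidence of lack of fit")
```

Because the probabilities are sorted before grouping, this sketch implements the first (percentile-based) grouping strategy described above.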

Usually, after a general diagnostic test for model goodness of fit, it is also important to validate the model's predicted probabilities (to measure its predictive power), so as to assess whether the model has really fitted the data well. In this study we use the classification table and the ROC curve for this assessment.

Generally, a classification table cross-classifies the binary response y with a prediction of whether it equals 0 or 1 (ŷ = 0 or 1). The prediction is ŷ = 1 when π̂_i > π_o, and ŷ = 0 when π̂_i ≤ π_o, for some cut-off π_o. Most classification tables use π_o = 0.5 as the threshold and summarise the predictive power by sensitivity = P(ŷ = 1 | y = 1) and specificity = P(ŷ = 0 | y = 0). Table 4.1 shows the resulting cross-classification of true and predicted outcomes.

Table 4.1: Specificity and Sensitivity Classification

Outcome of diagnostic test     Correct classification
                               Y = 1 (+ve)    Y = 0 (−ve)    Total
Ŷ = 1 (+ve)                    S              F              S + F
Ŷ = 0 (−ve)                    Fn             Sn             Fn + Sn
Total                          S + Fn         F + Sn         N = S + F + Fn + Sn

From Table 4.1, the following probabilities can be estimated:

Sensitivity = S / (S + Fn)

the estimated probability of correctly classifying an observation whose outcome is a success, i.e. the number of true positive assessments over the number of all actually positive observations.

Specificity = Sn / (F + Sn)

the estimated probability of correctly classifying an observation whose outcome is a failure, i.e. the number of true negative assessments over the number of all actually negative observations.

False positive rate = F / (F + Sn)

the estimated probability of incorrectly classifying an observation whose outcome is a failure, i.e. the number of false positive assessments over the number of all actually negative observations.

False negative rate = Fn / (S + Fn)

the estimated probability of incorrectly classifying an observation whose outcome is a success, i.e. the number of false negative assessments over the number of all actually positive observations.
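These four probabilities can be estimated directly from the cell counts of Table 4.1; the counts below are hypothetical.

```python
# Hypothetical cell counts in the layout of Table 4.1:
# S = true positives, F = false positives,
# Fn = false negatives, Sn = true negatives
S, F, Fn, Sn = 40, 10, 5, 45

sensitivity     = S / (S + Fn)    # P(yhat = 1 | y = 1)
specificity     = Sn / (F + Sn)   # P(yhat = 0 | y = 0)
false_pos_rate  = F / (F + Sn)    # P(yhat = 1 | y = 0) = 1 - specificity
false_neg_rate  = Fn / (S + Fn)   # P(yhat = 0 | y = 1) = 1 - sensitivity

print(sensitivity, specificity, false_pos_rate, false_neg_rate)
```

Note that the pairs (sensitivity, false negative rate) and (specificity, false positive rate) each sum to one, since they condition on the same row totals.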

A receiver operating characteristic (ROC) curve is a plot of sensitivity as a function of 1 − specificity over the possible cut-offs π_o. The curve usually has a concave shape lying above the 45° line connecting the points (0, 0) and (1, 1); for that diagonal line the area under the curve (AUC) is 0.5. The higher the area under the curve, the better the predictions. PROC LOGISTIC in SAS can be used to plot the ROC curve for a model.

The area under a ROC curve is identical to the value of another measure of predictive power, referred to as the concordance index (Agresti, 2002). Consider all pairs of observations (i, j) such that y_i = 1 and y_j = 0. The concordance index c estimates the probability that the predictions and the outcomes are concordant; a value c = 0.5 means that the predictions are no better than random guessing. Generally, ROC curves are a popular way of evaluating diagnostic tests. Sometimes such tests have J > 2 ordered response categories, rather than (positive, negative). The ROC curve then refers to the various possible cut-offs for defining a result as positive, and it plots sensitivity against 1 − specificity for the possible collapsings of the J categories to a (positive, negative) scale (Agresti, 2002).

The AUC can also be interpreted as the probability that the predicted probability assigned to an event (y = 1) is higher than that assigned to a non-event (y = 0). Decisions based on its value are as follows: an AUC below 0.6 suggests that the prediction accuracy of the model is poor; between 0.6 and 0.7, good; between 0.7 and 0.8, very good; and between 0.9 and 1.0, excellent. Figure 4.1 displays ROC curves for two models.
The AUC for Model 1 is less than the AUC for Model 2, which implies that Model 2 has better predictive accuracy than Model 1; thus Model 2 is preferable. See Satchell and Xia (2006) for further discussion.
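The equivalence between the AUC and the concordance index can be checked by brute force over all (success, failure) pairs; the outcomes and predicted probabilities below are hypothetical, with ties counted as one half.

```python
def concordance_index(y, pi_hat):
    """c = P(pi_hat_i > pi_hat_j) over pairs with y_i = 1, y_j = 0;
    tied predictions contribute 1/2."""
    pos = [p for yi, p in zip(y, pi_hat) if yi == 1]
    neg = [p for yi, p in zip(y, pi_hat) if yi == 0]
    concordant = sum(1.0 if p > q else 0.5 if p == q else 0.0
                     for p in pos for q in neg)
    return concordant / (len(pos) * len(neg))

y      = [1, 0, 1, 0, 1]
pi_hat = [0.9, 0.8, 0.3, 0.1, 0.8]
print(concordance_index(y, pi_hat))  # 0.75
```

A value of 1.0 corresponds to perfect separation of events from non-events, and 0.5 to random guessing, matching the interpretation of the AUC given above.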

Figure 4.1: ROC curve plots (sensitivity against 1 − specificity)