

4.2 Logistic Regression Model

Influential observations are detected using Cook's distance. An approximation of Cook's distance, C_i, is given by

C_i ≈ (r_p^2 h_ii) / [p(1 − h_ii)]    (4.9)

where p is the number of model parameters, r_p is the Pearson residual, and h_ii is the leverage, i.e. the ith diagonal element of the hat matrix. A high C_i implies that the ith observation is influential; usually this is caused by a large residual combined with a high leverage.

Generally, influential observations are assessed with an index plot of C_i against the observation number i. The ith observation is judged to have influence on the parameter estimates if its Cook's distance is greater than 1.
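As an illustration, the approximation in (4.9) can be computed directly from fitted probabilities, Pearson residuals, and leverages. The sketch below uses hypothetical data and a simple two-parameter model (intercept and one covariate); the fitted probabilities are assumed given rather than estimated.

```python
import math

# Hypothetical data: binary outcomes y, covariate x, and fitted
# probabilities pi_hat from a logistic model with p = 2 parameters.
y      = [0, 0, 1, 0, 1, 1, 1, 0]
x      = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 0.5]
pi_hat = [0.15, 0.25, 0.40, 0.55, 0.70, 0.80, 0.90, 0.10]
p = 2  # number of model parameters (intercept + slope)

# IRLS weights w_i = pi_i (1 - pi_i)
w = [pi * (1 - pi) for pi in pi_hat]

# Entries of X'WX for the 2-column design matrix with rows (1, x_i)
s00 = sum(w)
s01 = sum(wi * xi for wi, xi in zip(w, x))
s11 = sum(wi * xi * xi for wi, xi in zip(w, x))
det = s00 * s11 - s01 * s01

# Leverages: diagonal of the GLM hat matrix,
# h_ii = w_i * (1, x_i) (X'WX)^{-1} (1, x_i)'
h = [wi * (s11 - 2 * xi * s01 + xi * xi * s00) / det
     for wi, xi in zip(w, x)]

# Pearson residuals r_p = (y_i - pi_i) / sqrt(pi_i (1 - pi_i))
r = [(yi - pi) / math.sqrt(pi * (1 - pi)) for yi, pi in zip(y, pi_hat)]

# Approximate Cook's distance, equation (4.9)
cooks = [ri ** 2 * hi / (p * (1 - hi)) for ri, hi in zip(r, h)]

for i, ci in enumerate(cooks, start=1):
    print(f"observation {i}: C_i = {ci:.4f}")
```

An index plot of these C_i values against i, with a reference line at 1, reproduces the diagnostic described above.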

The logistic regression model, when the canonical link function is used, is given by

logit(π_i) = log[π_i / (1 − π_i)] = x_i'β,   i = 1, 2, …, n

The term log[π_i / (1 − π_i)] is called the logit function, and the ratio π_i / (1 − π_i) is the odds of success. The logit is generally referred to as the natural parameter of the binomial distribution, and the logit link is its canonical link.
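The logit link and its inverse can be sketched in a few lines; the value of π below is purely illustrative.

```python
import math

def logit(pi):
    """Canonical link: the log-odds of success."""
    return math.log(pi / (1 - pi))

def inv_logit(eta):
    """Inverse link: maps the linear predictor back to a probability."""
    return 1 / (1 + math.exp(-eta))

pi = 0.8
odds = pi / (1 - pi)   # odds of success
eta = logit(pi)        # log-odds, the linear-predictor scale
print(odds, eta, inv_logit(eta))
```

The round trip inv_logit(logit(π)) = π shows how fitted linear predictors x_i'β are converted back to probabilities.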

The probit model is an alternative method for modelling a binary response. It assumes that the tolerance term is normally distributed with mean µ and variance σ², i.e. N(µ, σ²), for unknown µ and σ (Olson, 2002). The probability π_i is then given by the cumulative distribution function (cdf) of the standard normal distribution. Generally, the probit model is given by

Φ⁻¹(π_i) = x_i'β,   i = 1, 2, …, n

The probit link function Φ⁻¹(·) maps probabilities in the (0, 1) range onto the (−∞, ∞) range of the linear predictor. The curve of the model has the shape of the normal cdf, since Φ is the standard normal cdf.

The complementary log-log model is a further alternative to the logistic and probit models for a binary response. Dobson (2002) argued that if the values of π are near 0.5, the complementary log-log model gives results similar to the logit and probit models. It is particularly applicable when π is near 0 or 1 (when there are extreme values): the complementary log-log cdf need not be symmetric about the mid-point π = 0.5, as is the case for the logit and probit models. The complementary log-log model is given by

log[−log(1 − π_i)] = x_i'β,   i = 1, 2, …, n
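Dobson's point can be checked numerically. The sketch below evaluates the three link functions at a few values of π using only the standard library (the standard normal quantile Φ⁻¹ comes from statistics.NormalDist); it shows that the logit and probit links are symmetric about π = 0.5 while the complementary log-log link is not.

```python
import math
from statistics import NormalDist

def logit(p):
    return math.log(p / (1 - p))

def probit(p):
    # Phi^{-1}(p), the standard normal quantile function
    return NormalDist().inv_cdf(p)

def cloglog(p):
    return math.log(-math.log(1 - p))

for p in (0.1, 0.5, 0.9):
    print(f"pi={p}: logit={logit(p):+.3f}  "
          f"probit={probit(p):+.3f}  cloglog={cloglog(p):+.3f}")

# Logit is odd-symmetric about 0.5: logit(1 - p) = -logit(p) ...
print(logit(0.1) + logit(0.9))      # essentially zero
# ... but the complementary log-log link is not:
print(cloglog(0.1) + cloglog(0.9))  # clearly non-zero
```

This asymmetry is exactly why the complementary log-log link can fit better when successes (or failures) are concentrated near one extreme.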

Since logistic regression is a special case of the generalized linear model, model selection and goodness-of-fit testing proceed as discussed in Section 4.1. For ungrouped binary data, however, the Pearson chi-square and likelihood ratio statistics do not have limiting chi-squared distributions; they remain useful for comparing models and can be applied in an approximate manner to grouped observed and fitted values for a partition of the space of x values. The difficulty is that, as the number of explanatory variables increases, simultaneous grouping of the values of each variable can produce a contingency table with a very large number of cells. It is then useful to apply the Hosmer-Lemeshow goodness-of-fit statistic, which partitions the observed and fitted values according to the estimated (predicted) probabilities of success computed from the original ungrouped data. This approach forms g groups of approximately equal size in the partition of observed frequencies of the response y = 1, preferably g = 10 groups (Hosmer and Lemeshow, 2000); the estimated probabilities are sorted in ascending order before the grouping is performed. According to Hosmer and Lemeshow (2000), two grouping strategies are proposed: first, collapsing the table on percentiles of the estimated probabilities; and second, collapsing the table on fixed values of the estimated probability. With g = 10 groups, the first method forms a first group containing the n′_1 = n/10 subjects with the smallest estimated probabilities and a last group containing the n′_10 = n/10 subjects with the largest estimated probabilities. In the second method, using g = 10 groups results in cutpoints defined at the values k/10, k = 1, 2, …, 9, with each group containing all subjects whose estimated probabilities fall between adjacent cutpoints.
For the row y = 1, the estimated expected values are obtained by summing the estimated probabilities over all subjects in a group; for the row y = 0, they are obtained by summing one minus the estimated probabilities over all subjects in a group (Hosmer and Lemeshow, 2000). Let X²_HL denote the Hosmer-Lemeshow goodness-of-fit test statistic. Then X²_HL has an approximate chi-square distribution with g − 2 degrees of freedom, and it is compared with the critical value of the chi-square distribution with g − 2 df, χ²_{g−2,α}, to check the model's goodness of fit. The interpretation is as follows: if X²_HL is significant, there is model lack of fit; if X²_HL is insignificant, the model fits the data well, hence there is goodness of fit (Hosmer and Lemeshow, 2000).
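A minimal sketch of the Hosmer-Lemeshow statistic with decile-of-risk grouping follows; the data are hypothetical (generated from the assumed probabilities), and the reference value χ²_{8,0.05} ≈ 15.507 is the critical value for g = 10 groups.

```python
import random

random.seed(42)

# Hypothetical data: fitted probabilities and outcomes drawn from them,
# so the model "fits" by construction.
n = 200
pi_hat = sorted(random.uniform(0.05, 0.95) for _ in range(n))
y = [1 if random.random() < p else 0 for p in pi_hat]

g = 10          # number of groups (deciles of risk)
size = n // g   # equal group sizes, n'_k = n/10

hl_stat = 0.0
for k in range(g):
    grp = range(k * size, (k + 1) * size)
    obs1 = sum(y[i] for i in grp)        # observed successes in group
    exp1 = sum(pi_hat[i] for i in grp)   # expected successes: sum of pi_hat
    obs0 = size - obs1                   # observed failures
    exp0 = size - exp1                   # expected failures: sum of 1 - pi_hat
    hl_stat += (obs1 - exp1) ** 2 / exp1 + (obs0 - exp0) ** 2 / exp0

print(f"X2_HL = {hl_stat:.3f} on g - 2 = {g - 2} df")
# chi-square critical value with 8 df at alpha = 0.05 is 15.507
print("lack of fit" if hl_stat > 15.507 else "no evidence of lack of fit")
```

Because the probabilities are sorted before grouping, this sketch implements the first (percentile-based) grouping strategy described above.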

Usually, after a general diagnostic test for model goodness of fit, it is also important to validate the model's predicted probabilities (to measure its predictive power), so as to assess whether the model has really fitted the data well. In this study we use the classification table and the ROC curve for this assessment.

Generally, a classification table cross-classifies the binary response y with a prediction of whether it equals 0 or 1 (ŷ = 0 or 1). The prediction is ŷ = 1 when π̂_i > π_o, and ŷ = 0 when π̂_i ≤ π_o, for some cut-off π_o. Most classification tables use π_o = 0.5 as the threshold and summarise the predictive power by sensitivity = P(ŷ = 1 | y = 1) and specificity = P(ŷ = 0 | y = 0). Table 4.1 shows the resulting cross-classification of true and predicted outcomes.

Table 4.1: Specificity and Sensitivity Classification

Outcome of diagnostic test     Correct classification
                               Y = 1 (+ve)    Y = 0 (−ve)    Total
Ŷ = 1 (+ve)                    S              F              S + F
Ŷ = 0 (−ve)                    Fn             Sn             Fn + Sn
Total                          S + Fn         F + Sn         N = S + F + Fn + Sn

From Table 4.1, the following probabilities can be estimated:

Sensitivity = S / (S + Fn)

the estimated probability of correctly classifying an observation whose outcome is a success, i.e. the number of true positive assessments over the number of all actually positive observations.

Specificity = Sn / (F + Sn)

the estimated probability of correctly classifying an observation whose outcome is a failure, i.e. the number of true negative assessments over the number of all actually negative observations.

False positive rate = F / (F + Sn)

the estimated probability of incorrectly classifying an observation whose outcome is a failure, i.e. the number of false positive assessments over the number of all actually negative observations.

False negative rate = Fn / (S + Fn)

the estimated probability of incorrectly classifying an observation whose outcome is a success, i.e. the number of false negative assessments over the number of all actually positive observations.
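These four probabilities can be estimated directly from the cell counts of Table 4.1; the counts below are hypothetical.

```python
# Hypothetical cell counts in the layout of Table 4.1:
# S = true positives, F = false positives,
# Fn = false negatives, Sn = true negatives
S, F, Fn, Sn = 40, 10, 5, 45

sensitivity     = S / (S + Fn)    # P(yhat = 1 | y = 1)
specificity     = Sn / (F + Sn)   # P(yhat = 0 | y = 0)
false_pos_rate  = F / (F + Sn)    # P(yhat = 1 | y = 0) = 1 - specificity
false_neg_rate  = Fn / (S + Fn)   # P(yhat = 0 | y = 1) = 1 - sensitivity

print(sensitivity, specificity, false_pos_rate, false_neg_rate)
```

Note that the pairs (sensitivity, false negative rate) and (specificity, false positive rate) each sum to one, since they condition on the same row totals.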

A receiver operating characteristic (ROC) curve is a plot of sensitivity as a function of 1 − specificity over the possible cut-offs π_o. The curve usually has a concave shape lying above the 45° line connecting the points (0, 0) and (1, 1); for that diagonal line the area under the curve (AUC) is 0.5. The higher the area under the curve, the better the predictions. PROC LOGISTIC in SAS can be used to plot the ROC curve for a model.

The area under a ROC curve is identical to the value of another measure of predictive power, referred to as the concordance index (Agresti, 2002). Consider all pairs of observations (i, j) such that y_i = 1 and y_j = 0. The concordance index c estimates the probability that the predictions and the outcomes are concordant; a value c = 0.5 means that the predictions are no better than random guessing. Generally, ROC curves are a popular way of evaluating diagnostic tests. Sometimes such tests have J > 2 ordered response categories, rather than (positive, negative). The ROC curve then refers to the various possible cut-offs for defining a result as positive, and it plots sensitivity against 1 − specificity for the possible collapsings of the J categories to a (positive, negative) scale (Agresti, 2002).

The AUC can also be interpreted as the probability that the predicted probability assigned to an event (y = 1) is higher than that assigned to a non-event (y = 0). Decisions based on its value are as follows: an AUC below 0.6 suggests that the prediction accuracy of the model is poor; between 0.6 and 0.7, good; between 0.7 and 0.8, very good; and between 0.9 and 1.0, excellent. Figure 4.1 displays ROC curves for two models.
The AUC for Model 1 is less than the AUC for Model 2, which implies that Model 2 has better predictive accuracy than Model 1; thus Model 2 is preferable. See Satchell and Xia (2006) for further discussion.
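The equivalence between the AUC and the concordance index can be checked by brute force over all (success, failure) pairs; the outcomes and predicted probabilities below are hypothetical, with ties counted as one half.

```python
def concordance_index(y, pi_hat):
    """c = P(pi_hat_i > pi_hat_j) over pairs with y_i = 1, y_j = 0;
    tied predictions contribute 1/2."""
    pos = [p for yi, p in zip(y, pi_hat) if yi == 1]
    neg = [p for yi, p in zip(y, pi_hat) if yi == 0]
    concordant = sum(1.0 if p > q else 0.5 if p == q else 0.0
                     for p in pos for q in neg)
    return concordant / (len(pos) * len(neg))

y      = [1, 0, 1, 0, 1]
pi_hat = [0.9, 0.8, 0.3, 0.1, 0.8]
print(concordance_index(y, pi_hat))  # 0.75
```

A value of 1.0 corresponds to perfect separation of events from non-events, and 0.5 to random guessing, matching the interpretation of the AUC given above.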

Figure 4.1: ROC curve plots (sensitivity against 1 − specificity)