Confidence Intervals - BASIC MODEL STATISTICS

STATA CODE

2.3 BASIC MODEL STATISTICS

2.3.4 Confidence Intervals

Model based 95% confidence intervals are calculated as follows:

> loci <- coef - qnorm(.975) * se

> upci <- coef + qnorm(.975) * se

where qnorm is the outside 2.5% of the observations from each side of the normal distribution.

> qnorm(.975) [1] 1.959964

Together the distribution excludes 5% of the distribution, or 0.05. Many times analysts will use 1.96 instead of the qnorm function when calculating confidence intervals. The confidence intervals for odds ratios are exponentiations of the coefficient-based confidence intervals. Combining everything together, we can use the code below to produce a table displaying the odds ratio of x and the odds-intercept together with their related standard errors, z statistics, p-values, and confidence intervals. I should mention that the model- based confidence intervals we have been discussing are also referred to as Wald confidence intervals.

Calculation of Odds Ratio and Associated Model Statistics

> coef <- logit2$coef

> se <- sqrt(diag(vcov(logit2)))

> zscore <- coef / se

> or <- exp(coef)

> delta <- or * se

> pvalue <- 2*pnorm(abs(zscore),lower.tail=FALSE)

> loci <- coef - qnorm(.975) * se

> upci <- coef + qnorm(.975) * se

> ortab <- data.frame(or, delta, zscore, pvalue, exp(loci), exp(upci))

> round(ortab, 4)

or delta zscore pvalue exp.loci. exp.upci.

(Intercept) 3.0000 3.4641 0.9514 0.3414 0.3121 28.8405 x 0.2222 0.3271 -1.0218 0.3069 0.0124 3.9785

Unfortunately, R users must program a table of odds ratio statistics as I have above. The summary function following glm displays a coefficient table of logistic model base statistics, but there is no function that automatically displays a table of odds ratio statistics. We can create an R function to do just that. We shall call it toOR.R (see Table 2.1), where the OR component of toOR must be in capitals. R is case sensitive. I will place the toOR function into the LOGIT package so that it can be used automatically anytime the package is installed and loaded into memory. It can be used following the use of glm with the binomial family and default logit function. I will also place the function on the book’s web site.

After estimation of a logistic regression using glm—for example, the logit2 model—type

> toOR(logit2)

or delta zscore pvalue exp.loci. exp.upci.

(Intercept) 3.0000 3.4641 0.9514 0.3414 0.3121 28.8405 x 0.2222 0.3271 -1.0218 0.3069 0.0124 3.9785

TABLE 2.1 toOR function

toOR <- function(object, ...) { coef <- object$coef

se <- sqrt(diag(vcov(object))) zscore <- coef / se

or <- exp(coef) delta <- or * se

pvalue <- 2*pnorm(abs(zscore),lower.tail=FALSE) loci <- coef - qnorm(.975) * se

upci <- coef + qnorm(.975) * se

ortab <- data.frame(or, delta, zscore, pvalue, exp(loci), exp(upci))

round(ortab, 4) }

We may use the function on medpar data

> data(medpar) # assumes library(COUNT)or library(LOGIT) loaded

> smlogit <- glm(died ~ white + los + factor(type), family = binomial, data = medpar)

> summary(smlogit)

. . . Coefficients:

Estimate Std. Error z value Pr(>|z|) (Intercept) -0.716364 0.218040 -3.285 0.00102 **

white 0.305238 0.208926 1.461 0.14402 los -0.037226 0.007797 -4.775 1.80e-06 ***

factor(type)2 0.416257 0.144034 2.890 0.00385 **

factor(type)3 0.929994 0.228411 4.072 4.67e-05 ***

> toOR(smlogit)

or delta zscore pvalue exp.loci. exp.upci.

(Intercept) 0.4885 0.1065 -3.2855 0.0010 0.3186 0.7490 white 1.3569 0.2835 1.4610 0.1440 0.9010 2.0436 los 0.9635 0.0075 -4.7747 0.0000 0.9488 0.9783 factor(type)2 1.5163 0.2184 2.8900 0.0039 1.1433 2.0109 factor(type)3 2.5345 0.5789 4.0716 0.0000 1.6198 3.9657

Confidence intervals are very important when interpreting a logistic model, as well as any regression model. By looking at the low and high range of a predictor’s confidence interval an analyst can determine if the predictor contributes to the model.

Remember that a regression p-value is an assessment of whether we may

“significantly” reject the null hypothesis that the coefficient (β) is equal to 0.

If the confidence interval of a predictor includes 0, then we cannot be significantly sure that the coefficient is not really 0 in value. For odds ratios, since the confidence intervals are exponentiations of the coefficient confidence intervals, having the range of the confidence interval include 1 is evidence that the null hypothesis has not been rejected. The confidence intervals for logit2 odds ratio model above both include 1—0.0124123 to 3.9788526 and 0.3120602 to 28.84059. Note that the p-values for both x and the intercept are approximately 0.3. 0.3 far exceeds the 0.05 criterion of significance.

How is the confidence interval to be interpreted? If zero is not within the lower and upper limits of the confidence interval of a coefficient, we cannot conclude that we are 95% sure that the coefficient is “significant”; that is, that the associated p-value is truly under 0.05. Many analysts interpret confidence intervals in such a manner, but they should not.

The traditional logistic regression model we are discussing here is based on a frequency interpretation of statistics. As such the confidence intervals must be interpreted in the same manner. If the coefficient of a logistic model predictor has a p-value under 0.05, the associated confidence interval will not include zero. The interpretation is

Wald Confidence Intervals

If we repeat the modeling analysis a very large number of times, the true coefficient would be within the range of the lower and upper levels of the confidence interval 95 times out of 100.

I earlier mentioned that the use of confint() following R’s glm displays profile confidence intervals. confint.default() produces standard confidence intervals, based on the normal distribution. Profile confidence intervals are based on the Chi2 distribution. Profile confidence intervals are particularly important to use when there are relatively few observations in the model, as well as when the data are unbalanced. For example, if a logistic model has 30 observations, but the response variable consists of 26 ones and only 4 zeros, the data are unbalanced. Ideally a logistic response variable should have relatively equal numbers of 1s to 0s. Likewise, if a binary predictor has nearly all 1s or 0s, the model is unbalanced, and adjustments may need to be made to the model.

In any case, profile confidence intervals are derived as the inverse of the likelihood ratio test defined as

Likelihood ratio test= −2{Lreduced − Lfull}

This is a test we will use later when assessing the significance of adding, or dropping, a predictor or group of predictors from a model. The log-likelihood of a model with all of the predictors is subtracted from the log-likelihood of a model with fewer predictors. The result is multiplied by “ − 2.” The significance of the test is based on the Chi2 distribution, whose arguments are the likelihood ratio test statistic and degrees of freedom. The degrees of freedom consists of how many predictors there are between the full and reduced models. If a single predictor is being evaluated, there is one degree of freedom.

The likelihood ratio test is preferred to the standard Wald assessment based on regression coefficient or odds ratio p-values. We shall discuss the test further in Chapter 4, Section 4.2.

For now you need only know that profile confidence intervals are the inversion of the likelihood ratio test. The statistic is not simple to produce by hand, but easy to display using the confint function. It should be noted that when the predictors are significant and the logistic model is well fit,

Wald or model-based confidence intervals differ little from profile confidence intervals. In the case of the logit2 model where neither x nor the intercept are significant and there are only nine observations in the model, we expect for there to be a somewhat substantial difference in confidence interval values.

Wald or Model-Based Confidence Intervals

> confint.default(logit2) 2.5 % 97.5 % (Intercept) -1.164557 3.361782 x -4.389065 1.380910 Profile Confidence Intervals

> confint(logit2)

Waiting for profiling to be done...

2.5 % 97.5 % (Intercept) -0.9568748 4.105099 x -4.9264210 1.219928

Stata’s pllf command produces profile confidence intervals, but only for con- tinuous predictors.

Scaled, sandwich or robust, and bootstrapped-based confidence intervals will be discussed in Chapter 4, and compared with profile confidence intervals. We shall discuss which should be used given a particular type of data.

2.4 MODELS WITH A

Dalam dokumen Practical Guide to Logistic Regression (Halaman 41-45)