
Practical Guide to Logistic Regression


R statistical software is used to display all but one of the statistical models discussed in the book, the exception being exact logistic regression. Otherwise, R is used for all data management, models, post-estimation fit analyses, tests, and graphics related to our discussion of logistic regression.

WHAT IS A STATISTICAL MODEL?

A statistical model describes the relationship between the parameters of the underlying probability distribution (PDF) of the population data and the estimates an analyst makes of those parameters. That is, regression is typically used to establish an accurate model of the population data.

BASICS OF LOGISTIC REGRESSION MODELING

Logistic regression is particularly valuable because the predictions made from the fitted model are probabilities limited to the range 0–1. Specifically, a logistic regression model predicts the probability that the response has a value of 1 for a given set of predictor values.

THE BERNOULLI DISTRIBUTION

There is a linear relationship between the logit of the model's predicted or fitted values and the terms on the right-hand side of Equation 1.6, the linear predictor. To determine μ from the linear predictor, xb, we solve the logit function for μ (subscripts suppressed): μ = 1/(1 + exp(−xb)).
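The inversion can be checked numerically. The book's code is in R; here is a minimal Python sketch of the logit link and its inverse:

```python
import math

def inv_logit(xb):
    """Solve the logit function for mu: mu = 1 / (1 + exp(-xb))."""
    return 1.0 / (1.0 + math.exp(-xb))

def logit(mu):
    """The logit link: ln(mu / (1 - mu))."""
    return math.log(mu / (1.0 - mu))

# A linear predictor of 0 corresponds to a probability of 0.5,
# and the two functions invert each other (up to floating point).
print(inv_logit(0.0))          # 0.5
print(logit(inv_logit(1.7)))
```

Whatever the value of xb, the result is always strictly between 0 and 1, which is why logistic predictions are proper probabilities.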

METHODS OF ESTIMATION

Since R's default logistic regression is part of the glm function, we'll go over the basics of how it works. The logic of a standalone R algorithm that can be used for logistic regression is given in Table 1.1.

TABLE 1.1 R function for logistic regression

irls_logit <- function(formula, data, tol=.000001) {  # set option default values
   mf <- model.frame(formula, data)                   # define model frame as mf
   y <- model.response(mf, "numeric")                 # set model response as y
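The IRLS (iteratively reweighted least squares) logic behind Table 1.1 can be sketched end to end in a few lines. The book's implementation is in R; the following is a self-contained Python version for a single-predictor model, with made-up toy data, not the book's:

```python
import math

def irls_logit(x, y, tol=1e-6, max_iter=50):
    """Fit logit(mu) = b0 + b1*x by IRLS: beta <- beta + (X'WX)^-1 X'(y - mu)."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        mu = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w  = [m * (1.0 - m) for m in mu]               # Bernoulli variance weights
        # Build X'WX (2x2) and the score X'(y - mu) for X = [1, x]
        s0 = sum(w)
        s1 = sum(wi * xi for wi, xi in zip(w, x))
        s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        det = s0 * s2 - s1 * s1
        d0 = ( s2 * g0 - s1 * g1) / det                # Newton step, 2x2 solve
        d1 = (-s1 * g0 + s0 * g1) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:                    # stop when the step is tiny
            break
    return b0, b1

x = [1, 2, 3, 4, 5, 6]          # toy, non-separable data
y = [0, 0, 1, 0, 1, 1]
b0, b1 = irls_logit(x, y)
```

At convergence the score equations hold: the residuals y − μ sum to zero, and so do the x-weighted residuals, which is the defining property of the maximum likelihood estimate.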

SAS CODE

Again, I've provided a full logistic regression model to show where we're going in our discussion of logistic regression. How to interpret coefficients, standard errors, etc. will be dealt with in the following chapters.

STATA CODE

MODELS WITH A BINARY PREDICTOR

A coefficient is the rate of change in y for a one-unit change in x. When x is binary, it is the amount of change in y as x moves from 0 to 1.

PREDICTIONS, PROBABILITIES, AND ODDS RATIOS

What we find is that from the linear predictor of the model and the probabilities, we have calculated the model's odds ratios and coefficients. Likewise, we can see how the fitted values, or probabilities, are all related as components of the mean parameter estimated by the model.
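These relationships among the linear predictor, probability, odds, and odds ratio can be sketched numerically. Python is used here for illustration (the book uses R), and the coefficient values are made up:

```python
import math

b0, b1 = -2.0, 0.8             # hypothetical intercept and slope

def prob(x):
    """Fitted probability at x: inverse logit of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def odds(p):
    """Odds = p / (1 - p)."""
    return p / (1.0 - p)

# The odds ratio for a one-unit change in x is exp(b1),
# which equals the ratio of the odds at x + 1 and at x.
or_from_coef = math.exp(b1)
or_from_odds = odds(prob(3.0)) / odds(prob(2.0))
```

The two quantities agree exactly, because odds(prob(x)) = exp(b0 + b1·x), so the ratio of odds at consecutive x values is always exp(b1).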

BASIC MODEL STATISTICS

  • Standard Errors
  • z Statistics
  • p-Values
  • Confidence Intervals

A one-sided test predicts that the coefficient in question goes in only one direction. The confidence intervals for odds ratios are the exponentials of the coefficient-based confidence intervals.
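That last point can be sketched numerically (hypothetical coefficient and standard error; Python rather than the book's R):

```python
import math

beta, se = 0.65, 0.21          # hypothetical coefficient and its standard error
z = 1.96                       # two-sided 95% normal quantile

lo, hi = beta - z * se, beta + z * se      # CI on the coefficient scale
or_lo, or_hi = math.exp(lo), math.exp(hi)  # exponentiate the endpoints

# Note the OR interval is asymmetric around exp(beta):
# it is NOT exp(beta) plus or minus some symmetric quantity.
```

Exponentiating the endpoints, rather than building a symmetric interval around the odds ratio itself, keeps the interval strictly positive and correctly skewed.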

TABLE 2.1  toOR function

MODELS WITH A CATEGORICAL PREDICTOR

However, this is the reference level and is used to interpret both type 2 (urgent) and type 3 (emergency). Both will be interpreted as odds ratios, with the denominator of the ratio being the reference level.

MODELS WITH A CONTINUOUS PREDICTOR

  • Varieties of Continuous Predictors
  • A Simple GAM
  • Centering
  • Standardization

In addition, other predictors in the model may affect the relationship between fit and predictor. For a continuous predictor, the lower of two consecutive values of the predictor serves as the reference level; the higher is the x = 1 level.
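Centering and standardization themselves are simple transformations; a Python sketch with illustrative data (the book works in R):

```python
def center(xs):
    """Subtract the mean, so the intercept refers to an average observation."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]

def standardize(xs):
    """Center and divide by the standard deviation (population SD here)."""
    c = center(xs)
    sd = (sum(v * v for v in c) / len(c)) ** 0.5
    return [v / sd for v in c]

los = [1, 3, 5, 7, 9]          # illustrative lengths of stay
print(center(los))             # mean becomes 0
print(standardize(los))        # mean 0, SD 1
```

Neither transformation changes the model's fit or predictions; it changes only how the intercept and slope are interpreted.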

FIGURE 2.1  GAM model of los.

PREDICTION

  • Basics of Model Prediction
  • Prediction Confidence Intervals

We use the predict function with options type = "link" and se.fit = TRUE to place the predictions on the scale of the linear predictor, and to guarantee that the lpred object is, in fact, the standard error of the linear predictor.
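The resulting recipe, confidence limits built on the linear-predictor scale and then inverse-linked back to probabilities, can be sketched numerically (hypothetical fitted value and standard error; Python for illustration):

```python
import math

def inv_logit(xb):
    """Inverse logit: maps the linear predictor into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-xb))

lp, se = 0.40, 0.25            # hypothetical linear predictor and its SE

# Build the interval on the link scale, THEN transform. The result is
# guaranteed to stay inside (0, 1), unlike p +/- 1.96 * se(p).
lo = inv_logit(lp - 1.96 * se)
p  = inv_logit(lp)
hi = inv_logit(lp + 1.96 * se)
```

This is why the predictions are requested on the link scale: intervals computed directly on the probability scale can spill outside 0–1.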

FIGURE 2.2   Predicted probability of death by length of stay.

SELECTION AND INTERPRETATION OF PREDICTORS

A tabulation of the educlevel predictor is shown below, along with the top six values of all variables in the data. Interpretation gives us the following, with the understanding that the values of the other predictors in the model are held constant. I recommend using a positive interpretation of an odds ratio if it makes sense in the context of the study.

STATISTICS IN A LOGISTIC MODEL

A traditional fit statistic that we will discuss in the next chapter is based on the Chi2 distribution of the difference in residual deviances. The Pearson Chi2 goodness-of-fit statistic is defined as the sum of the squared raw residuals, each divided by its variance. Note that standardization of the Pearson and deviance residuals is accomplished by dividing each of them by the square root of 1 − hat.
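A Python sketch of the Pearson statistic and the standardization just described (the fitted values and hat, i.e., leverage, values below are made-up placeholders):

```python
def pearson_resid(y, mu):
    """Raw residual divided by the square root of the Bernoulli variance."""
    return (y - mu) / (mu * (1.0 - mu)) ** 0.5

def pearson_chi2(ys, mus):
    """Pearson Chi2: the sum of squared Pearson residuals."""
    return sum(pearson_resid(y, m) ** 2 for y, m in zip(ys, mus))

def std_pearson(y, mu, hat):
    """Standardized Pearson residual: divide by sqrt(1 - hat)."""
    return pearson_resid(y, mu) / (1.0 - hat) ** 0.5

ys   = [1, 0, 1, 0]
mus  = [0.8, 0.3, 0.6, 0.4]    # illustrative fitted values
hats = [0.2, 0.1, 0.3, 0.1]    # illustrative leverages
chi2 = pearson_chi2(ys, mus)
```

Since 0 < hat < 1, standardization always inflates the magnitude of a residual, and more so for high-leverage observations.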

INFORMATION CRITERION TESTS

  • Akaike Information Criterion
  • Finite Sample
  • Bayesian Information Criterion
  • Other Information Criterion Tests

It should be noted that of all the information criteria that have been formulated, this version of AIC is the only one that does not adjust the log-likelihood by n, the number of observations in the model. The Schwarz Bayesian information criterion (BIC) is the most widely used BIC test found in the literature. The BIC statistic is calculated as 1925.01, the same value displayed by Stata's estat ic post-estimation command.
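The two criteria can be sketched as follows (Python; the log-likelihood, parameter count, and n are illustrative values, not the book's model):

```python
import math

def aic(loglik, k):
    """Akaike information criterion: -2*LL + 2*k."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """Schwarz Bayesian information criterion: -2*LL + k*ln(n)."""
    return -2.0 * loglik + k * math.log(n)

ll, k, n = -958.0, 4, 1495     # illustrative values
print(aic(ll, k), bic(ll, k, n))
```

Because ln(n) exceeds 2 whenever n > 7, BIC penalizes extra parameters more heavily than AIC in any realistically sized data set.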

THE MODEL FITTING PROCESS

ADJUSTING STANDARD ERRORS

Scaling Standard Errors

We can now check whether the quasibinomial "family" option produces the same scaled standard errors. The standard errors shown in the quasibinomial model are the same as the scaled standard errors we generated by hand. Scaling is simply an operation that provides adjusted standard errors for a binomial model such as logistic regression.
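The by-hand scaling amounts to this: compute the dispersion as the Pearson Chi2 divided by the residual degrees of freedom, and multiply each model-based standard error by its square root. A Python sketch with illustrative numbers:

```python
chi2, df = 1625.0, 1491          # illustrative Pearson Chi2 and residual df
dispersion = chi2 / df           # > 1 suggests extra-binomial variation

se_model  = [0.096, 0.025, 0.088]                 # illustrative model-based SEs
se_scaled = [s * dispersion ** 0.5 for s in se_model]
```

When the dispersion exceeds 1, every scaled standard error is larger than its model-based counterpart, which widens confidence intervals accordingly.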

Robust or Sandwich Variance Estimators

When an analyst models a logistic regression with scaled standard errors, the resulting standard errors will be the same as the model-based standard errors if there are no distributional problems with the data. In other words, the logistic model is not adversely affected if the standard errors are scaled when they do not need to be. However, this is not the case when correlation is assumed to exist within panels; the standard errors of the predictors then reflect that correlation and should be adjusted using a sandwich variance estimator.

Bootstrapping

It should be mentioned that models for longitudinal and clustered data, such as generalized estimating equations, assume that there is more correlation within longitudinal units, or panels, than between them. That is, panels or groups are assumed to be independent of one another: there is no correlation between them.

RISK FACTORS, CONFOUNDERS, EFFECT MODIFIERS, AND INTERACTIONS

If we believe that the probability of death by length of hospital stay differs by racial classification, then we must include a white × los interaction term in the model. That is, we add the slope of the binary predictor to the product of the interaction slope and the value of the continuous predictor, then exponentiate the sum. For now, remember that when including an interaction term in your model, be sure to also include the terms that make up the interaction, even if you do not interpret them.
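The computation just described, binary slope plus interaction slope times the continuous value, then exponentiated, can be sketched as follows (Python; the coefficient values are hypothetical, not the book's fit):

```python
import math

b_white, b_inter = 0.30, -0.05   # hypothetical binary and interaction slopes

def or_white_given_los(los):
    """Odds ratio for the binary predictor at a given continuous value."""
    return math.exp(b_white + b_inter * los)

# With an interaction present, the odds ratio for the binary predictor
# is no longer a single number: it depends on length of stay.
print(or_white_given_los(5))
print(or_white_given_los(20))
```

With a negative interaction slope, the odds ratio shrinks as the continuous predictor grows, and can even cross 1, reversing the apparent direction of the effect.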

CHECKING LOGISTIC MODEL FIT

  • Pearson Chi2 Goodness-of-Fit Test
  • Likelihood Ratio Test
  • Residual Analysis
  • Conditional Effects Plot

For example, I use the medpar data, modeling the probability of death by length of hospital stay, factored by type of admission. In the conditional effects plot, each level of type produces its own probability-of-death curve across length of hospital stay. I have placed the code in Table 4.3 so that it can be pasted into the R editor [File > "New Script"] and run.

FIGURE 4.1  Squared standardized deviance versus mu.

CLASSIFICATION STATISTICS

  • S–S Plot
  • ROC Analysis
  • Confusion Matrix

The cut point is usually close to the mean of the predicted values, but usually not equal to it. The optimal cut point is defined as the threshold that maximizes the distance to the identity (diagonal) line of the ROC curve. Because of the sampling nature of the statistic, the cut point for the ROC curve is slightly different from that for the S–S plot (Figure 4.6).

HOSMER–LEMESHOW STATISTIC

The HLTest function used below for a Hosmer–Lemeshow test on the above model is adapted from Bilder and Loughin (2015). To show how different code can lead to different results, I also used code for the H–L test from Hilbe (2009). The H–L test is a nice summary test for a logistic model, but interpret it with caution.
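The H–L computation itself can be sketched in a few lines (Python; this is not the HLTest function, just the underlying idea): sort observations by fitted probability, split them into g groups, and compare observed with expected counts in each group.

```python
def hosmer_lemeshow(ys, mus, g=4):
    """H-L statistic: sum over groups of (O - E)^2 / (n * pbar * (1 - pbar))."""
    pairs = sorted(zip(mus, ys))                   # order by fitted probability
    size = len(pairs) // g
    stat = 0.0
    for i in range(g):
        grp = pairs[i * size:(i + 1) * size] if i < g - 1 else pairs[i * size:]
        n = len(grp)
        obs = sum(y for _, y in grp)               # observed events in group
        pbar = sum(m for m, _ in grp) / n          # mean fitted probability
        exp_ = n * pbar                            # expected events
        stat += (obs - exp_) ** 2 / (n * pbar * (1.0 - pbar))
    return stat

ys  = [0, 0, 1, 0, 1, 0, 1, 1]
mus = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]    # illustrative fitted values
hl = hosmer_lemeshow(ys, mus, g=4)
```

Different implementations bin ties and remainders differently, which is exactly why, as noted above, different code can produce different H–L results on the same model.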

MODELS WITH UNBALANCED DATA AND PERFECT PREDICTION

Interestingly, the p-values for the second levels of cd4 and cd8, which failed in the standard logistic regression, are statistically significant in the penalized logit model. The likelihood ratio test tells us that the penalized model is also not a good fit. If you find that there is perfect prediction in your model, or that the data are highly unbalanced (for example, nearly all 1s or 0s for a binary variable), penalized logistic regression may be the only viable way to model it.

EXACT LOGISTIC REGRESSION

The data consist of a random sample of cardiac procedures called CABG and PTCA. Given the size of the data, and considering the possibility of correlation in the data, we then model the same data as a … In addition, the model score statistic in the header statistics informs us that the model does not fit the data well (p > 0.05).

MODELING TABLE DATA

Recalling our discussion earlier in the text, the intercept odds is the denominator of the ratio we just calculated to determine the odds ratio of x. Anscombe residuals can be obtained as model output in SAS/Insight, not through the SAS command language. The estimated cut point can be found where the sensitivity and specificity are closest to equal in the classification table.

THE BINOMIAL PROBABILITY DISTRIBUTION FUNCTION

The first derivative of the cumulant, −n ln(1 − p), with respect to the link, ln(p/(1 − p)), is the mean, np, of the binomial distribution.
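Spelled out as a short derivation, with θ denoting the canonical (logit) parameter:

$$
\theta = \ln\frac{p}{1-p} \;\Rightarrow\; p = \frac{e^{\theta}}{1+e^{\theta}},\qquad
b(\theta) = -n\ln(1-p) = n\ln\!\left(1+e^{\theta}\right),\qquad
b'(\theta) = \frac{n\,e^{\theta}}{1+e^{\theta}} = np = \mu .
$$

Differentiating the cumulant with respect to the canonical parameter thus recovers the binomial mean directly, as generalized linear model theory promises.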

FROM OBSERVATION TO GROUPED DATA

Note that the response variable is cbind(y, noty) instead of y as in the standard model. Sometimes a data set can be too large to transcribe observation by observation into grouped format by hand. You can use the above code as a paradigm for converting observation-level data to grouped format.
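The conversion itself amounts to grouping observations by covariate pattern and counting successes (y) and failures (noty). A Python sketch with made-up data (the book's own paradigm is in R):

```python
from collections import defaultdict

# Observation-level data: rows of (x1, x2, y)
obs = [(0, 1, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 0), (1, 1, 1)]

counts = defaultdict(lambda: [0, 0])      # covariate pattern -> [y, noty]
for x1, x2, y in obs:
    counts[(x1, x2)][0 if y == 1 else 1] += 1

grouped = [(x1, x2, y, noty) for (x1, x2), (y, noty) in sorted(counts.items())]
# Each row now holds the successes y and failures noty for one covariate
# pattern, ready for a binomial (grouped) logistic model.
```

No information relevant to the likelihood is lost: the grouped and observation-level models yield the same coefficient estimates.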

IDENTIFYING AND ADJUSTING FOR EXTRA DISPERSION

I mentioned that for grouped logistic models a dispersion statistic greater than 1 indicates possible overdispersion, or extra variation, in the data. If a grouped logistic model has a dispersion statistic greater than 1, check each of the 5 indicators of apparent overdispersion to determine whether applying a remedy reduces the dispersion to approximately 1. If the dispersion statistic of a grouped logistic model is less than 1, the data are underdispersed.

MODELING AND INTERPRETATION OF GROUPED LOGISTIC REGRESSION

Examples of how these indicators of apparent overdispersion affect logistic models are given in Hilbe (2009).

FIGURE 5.1  Leverage versus standardized Pearson.

BETA-BINOMIAL REGRESSION

Note that the kernel of the beta distribution is similar to that of the binomial kernel. As mentioned earlier, the beta binomial distribution is a mixture of the binomial and beta distributions. In this regard, the beta binomial is analogous to the heterogeneous negative binomial count model (Hilbe and de …).

A BRIEF OVERVIEW OF BAYESIAN METHODOLOGY

The mean (or median, or mode) of a posterior distribution is taken as the beta, or parameter estimate: the Bayesian coefficient of the variable. Usually the denominator, which is the normalization term, is dropped from the calculation, so that the posterior distribution of a model parameter is determined by the product of its likelihood and prior. I provide the reader with several suggested books on the subject at the end of the chapter.
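In symbols, with β the parameters and y the data:

$$
p(\beta \mid y) \;=\; \frac{p(y \mid \beta)\, p(\beta)}{p(y)} \;\propto\; p(y \mid \beta)\, p(\beta),
$$

that is, posterior ∝ likelihood × prior once the normalizing denominator p(y), which does not depend on β, is dropped.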

EXAMPLES: BAYESIAN LOGISTIC REGRESSION

  • Bayesian Logistic Regression Using R
  • Bayesian Logistic Regression Using JAGS
  • Bayesian Logistic Regression with Informative Priors

Which model we use depends on what we think the source of the extra correlation is. A comparison of the standard errors of the two models shows that there is not much extra variation in the data. Options used many times in the model are b0 and B0, which represent the mean and precision of the prior(s).

FIGURE 6.1  R trace and density plots of model with noninformative  priors.

Partial Output—Logit Model with Informative Priors

CONCLUDING COMMENTS

For those who want to learn more after reviewing this book, I recommend my more extensive Logistic Regression Models (new edition in preparation). Practical Guide to Logistic Regression covers the main points of the basic logistic regression model and illustrates how to use it correctly to model a binary response variable. It provides practical guidance on constructing, modeling, interpreting, and evaluating binary response data using logistic regression.

Joseph M. Hilbe


