In this section we consider how the relationships in multi-way contingency tables, and more complicated designs, can be explored using regression methods known as logistic regression. Logistic regression is one of the most widely used methods for the analysis of binary data. It is used to examine and describe the relationship between a binary response variable Yi
(e.g. 1 = ‘success’ or 0 = ‘failure’) and one or more covariates for i = 1, . . . , n independent subjects. The covariates can be continuous or categorical (e.g. indicator variables). Denoting the two possible outcomes for Yi by 0 and 1, the probability distribution of the response variable is the Bernoulli distribution with probability of success pi. In common with linear regression, the primary objective of logistic regression is to model the mean of the response variable, given a set of covariates. Recall that with a binary response, the mean of Yi is simply the probability that Yi takes on the value 1, pi. However, what distinguishes logistic regression from linear regression is that the response variable is binary rather than continuous in nature. This has a number of consequences for modelling the mean of the response variable. For ease of exposition, we will first consider the simple case where there is only a single predictor variable, say xi. Generalisations to more than one predictor variable will be considered later.
Since linear models play such an important and dominant role in applied statistics, it may at first seem natural to assume a linear model relating the mean of Yi to xi,
pr[Yi = 1 | xi] = pi = β0 + β1xi. (2.8)

However, expressing pi as a linear function is problematic since it violates the restriction that probabilities must lie within the range from 0 to 1. As a result, for sufficiently large or small values of xi, the linear model given by Equation 2.8 will yield probabilities outside of the permissible range. A further difficulty with the linear model for the probabilities is that we often expect a nonlinear relationship between pi and xi. For example, a 0.2 unit increase in pi might be considered more ‘extreme’ when pi = 0.1 than when pi = 0.5. In terms of ratios, the change from pi = 0.1 to pi = 0.3 represents a threefold or 200% increase, whereas the change from pi = 0.5 to pi = 0.7 represents only a 40% increase. In a sense, the units of measurement for a probability or proportion are often not considered to be constant over the range from 0 to 1. The linear probability model given by Equation 2.8 simply does not take this into consideration when relating pi to xi.
To circumvent these problems, a nonlinear transformation is usually applied to pi and the transformed probabilities are related linearly to xi. In particular, a transformation of pi, say g(pi), is chosen so that it maps the range of pi from (0, 1) to (−∞, ∞). Since there are many possible transformations, g(pi), that achieve this goal, this leads to an extensive choice of models that are all of the form
g(pi) = β0 + β1xi. (2.9)

However, the most commonly used in practice are:

1 Logit or logistic function: g(pi) = log[pi/(1 − pi)]
2 Probit or inverse normal function: g(pi) = Φ⁻¹(pi), where Φ is the standardised normal cumulative distribution function
3 Complementary log–log function: g(pi) = log[−log(1 − pi)].
We note that all of these transformations are very closely related when 0.2 < pi < 0.8, and in a sense only differ in the degree of ‘tail-stretching’ outside of this range. Indeed, for most practical purposes it is not possible to discriminate between a data analysis that is based on, for example, the logit and probit functions. To discriminate empirically between probit and logistic regression would, in general, require very large numbers of observations.
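As a minimal numerical illustration of this point (ours, not part of the original text; it assumes Python with numpy and scipy), the three transformations can be evaluated side by side; within 0.2 < pi < 0.8 the probit, rescaled by the usual rule-of-thumb factor of about 1.7, closely tracks the logit:

```python
import numpy as np
from scipy.stats import norm

# Evaluate the three link functions over the range 0.2 < p < 0.8.
p = np.linspace(0.2, 0.8, 7)

logit = np.log(p / (1 - p))         # log odds
probit = norm.ppf(p)                # inverse standard normal CDF
cloglog = np.log(-np.log(1 - p))    # complementary log-log

for pi, lo, pr, cl in zip(p, logit, probit, cloglog):
    # Rescaling the probit by ~1.7 (a common rule of thumb) shows how
    # closely it matches the logit in this range.
    print(f"p={pi:.1f}  logit={lo:+.3f}  1.7*probit={1.7 * pr:+.3f}  cloglog={cl:+.3f}")
```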
However, the logit function does have a number of distinct advantages over the probit and complementary log–log functions which probably account for its more widespread use in practice. Later in this chapter we will consider some of the advantages of the logit or logistic function.
When the logit or logistic function is adopted, the resulting model
logit(pi) = log[pi/(1 − pi)] = β0 + β1xi, (2.10)

is known as the logistic regression model. Recall from Section 2.3.1 that if pi is the probability of success, then pi/(1 − pi) is the odds of success. Consequently, logistic regression assumes a linear relationship between the log odds of success and xi. Note that this simple model can be expressed equivalently in terms of pi,

pi = exp(β0 + β1xi)/[1 + exp(β0 + β1xi)]. (2.11)

We must emphasise that Equations 2.10 and 2.11 are completely equivalent ways of expressing the logistic regression model. Expression 2.10 describes how the log odds, log[pi/(1 − pi)], has a linear relationship with xi, while expression 2.11 describes how pi has an S-shaped relationship with increasing values of β1xi; although, in general, this relationship is approximately linear within the range 0.2 < pi < 0.8 (see Figure 2.1 for a plot of pi versus xi when β0 = 0.5 and β1 = 0.9). Observe that the expression on the right of Equation 2.11 cannot yield a value that is either negative or greater than 1. That is, the logistic transformation ensures that the predicted probabilities are restricted to the range from 0 to 1.
Fig 2.1 Plot of the logistic response function: probability of success pi versus xi (β0 = 0.5, β1 = 0.9).
Finally, note that

1 − pi = 1/[1 + exp(β0 + β1xi)],

so that the odds, pi/(1 − pi), is simply exp(β0 + β1xi).
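The equivalence of Equations 2.10 and 2.11 is easy to verify numerically. The following sketch (an illustration of ours in Python, using the values β0 = 0.5 and β1 = 0.9 from Figure 2.1) applies the inverse-logit transformation and checks the round trip back to the log odds:

```python
import numpy as np

def inv_logit(eta):
    """Equation 2.11: map a linear predictor to a probability in (0, 1)."""
    return np.exp(eta) / (1 + np.exp(eta))

beta0, beta1 = 0.5, 0.9              # values used for Figure 2.1
x = np.linspace(-4, 4, 9)
p = inv_logit(beta0 + beta1 * x)     # S-shaped, always within (0, 1)

# Round trip: the logit of p recovers the linear predictor (Equation 2.10),
# and the odds p/(1 - p) equal exp(beta0 + beta1 * x).
assert np.allclose(np.log(p / (1 - p)), beta0 + beta1 * x)
assert np.allclose(p / (1 - p), np.exp(beta0 + beta1 * x))
```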
2.5.1 Interpretation of logistic regression coefficients
Next we consider the interpretation of the logistic regression coefficients, β0 and β1, in Equation 2.10.
In simple linear regression, recall that the interpretation of the slope of the regression is in terms of changes in the mean of Yi for a single unit change in xi. Similarly, the logistic regression slope, β1, in Equation 2.10 has interpretation as the change in the log odds of success for a single unit change in xi. Equivalently, a single unit change in xi increases or decreases the odds of success multiplicatively by a factor of exp(β1). Also, recall that the intercept in simple linear regression has interpretation as the mean value of the response variable when xi is equal to 0. Similarly, the logistic regression intercept, β0, has interpretation as the log odds of success when xi = 0.
Note that, for case–control studies, the intercept β0 cannot be validly estimated since it is determined by the proportions of ‘successes’ (Y = 1) and ‘failures’ (Y = 0) selected by the study design. However, in many studies, there is far less scientific interest in the intercept than in the slope.
For the special case where xi is dichotomous, taking values of 0 and 1, the logistic regression slope, β1, has a simple and very attractive interpretation. Consider the two possible values for pi when xi = 0 and xi = 1. Let pi(xi = j) denote the probability of success when xi = j, for j = 0, 1. Then,

β1 = (β0 + β1) − β0
   = logit[pi(xi = 1)] − logit[pi(xi = 0)]
   = log{pi(xi = 1) × [1 − pi(xi = 0)] / (pi(xi = 0) × [1 − pi(xi = 1)])},
which is the log of the OR (or cross-product ratio) in the 2 × 2 table of the cross-classification of Yi and xi (see Table 2.10). Thus, exp(β1) has interpretation as the OR of the response for the two possible values of the covariate.
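This identity can be checked directly. The sketch below uses a hypothetical 2 × 2 table of counts (the numbers are illustrative only, not study data) and the statsmodels package to confirm that exp(β̂1) reproduces the cross-product ratio:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical 2 x 2 table of counts (illustrative values only):
#          Y=1  Y=0
#   x=1     30   20
#   x=0     25   45
a, b, c, d = 30, 20, 25, 45

or_table = (a * d) / (b * c)   # cross-product ratio straight from the table

# Expand the table to individual-level data and fit the logistic model.
y = np.repeat([1, 0, 1, 0], [a, b, c, d])
x = np.repeat([1, 1, 0, 0], [a, b, c, d])
fit = sm.GLM(y, sm.add_constant(x), family=sm.families.Binomial()).fit()

# exp(beta1_hat) equals the cross-product ratio exactly for a 2 x 2 table.
assert np.isclose(np.exp(fit.params[1]), or_table)
```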
The OR has many appealing properties that probably account for the widespread use of logistic regression in many areas of application. First, as was noted earlier, the OR does not change when rows and columns of the 2 × 2 table are interchanged. This implies that it is not necessary to distinguish which variable is the response and predictor variable in order to estimate the OR. Furthermore, as noted in the previous sections, a very appealing feature of the OR, exp(β1), is that it is equally valid regardless of whether the study design is prospective, cross-sectional or retrospective. That is, logistic regression provides an estimate of the same association between Yi and xi in all three study designs. Finally, in psychiatric studies where Yi typically denotes the presence or absence of a disease or disorder, the OR is often interpreted as an approximation to the RR of disease, p(xi = 1)/p(xi = 0). When the disease is rare, and pi is reasonably close to 0 in both of the risk groups (often known as the ‘rare disease’ assumption), the OR provides a close approximation to the RR. Retrospective designs are especially common in psychiatry where the possible outcomes of interest are very rare.

Table 2.10 Cross-classification probabilities for logistic regression of Y on x.

        Y = 1                                       Y = 0                                 Total
x = 1   p(x = 1) = exp(β0 + β1)/[1 + exp(β0 + β1)]   1 − p(x = 1) = 1/[1 + exp(β0 + β1)]   1.0
x = 0   p(x = 0) = exp(β0)/[1 + exp(β0)]             1 − p(x = 0) = 1/[1 + exp(β0)]        1.0
Although the RR cannot be estimated from a retrospective study, the OR can be used to provide an approximation to the RR. Extra care is necessary when interpreting the OR as an approximation to the RR in prospective studies. In many prospective studies the binary event is relatively common (say greater than 10%) and the ‘rare disease’ assumption no longer holds; in these settings, the OR can be a very poor and unreliable approximation to the RR and should not be given such an interpretation.
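A quick numerical sketch (with hypothetical probabilities chosen only to illustrate the point) makes the contrast concrete:

```python
def odds_ratio(p1, p0):
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

def risk_ratio(p1, p0):
    return p1 / p0

# Rare outcome: the OR closely approximates the RR.
print(odds_ratio(0.02, 0.01), risk_ratio(0.02, 0.01))   # ~2.02 vs 2.00

# Common outcome (well above 10%): the OR overstates the RR badly.
print(odds_ratio(0.50, 0.25), risk_ratio(0.50, 0.25))   # 3.00 vs 2.00
```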
2.5.2 Hypothesis testing and confidence intervals for logistic regression parameters
Often, we are interested in testing for an association between the predictor in our logistic regression model and the outcome, or, equivalently, testing H0 : β1 = 0. As for 2 × 2 table methods, Wald, likelihood ratio and score statistics can be used for this test. A Wald test statistic can be obtained using the result that the estimate of β1 divided by its standard error (s.e.) approximately follows a N(0, 1) distribution in large samples. A LRT statistic can be obtained by comparing the log likelihood for the full model with the predictor included to the log likelihood for a reduced model including only the intercept β0; the former is at least as large as the latter. In large samples, twice the difference between the maximised log likelihoods for the full and reduced models approximately follows a chi-square distribution with 1 degree of freedom.
Two-sided Wald confidence limits for β1 can be obtained using the result that β̂1 follows an approximate normal distribution; the confidence limits are given by the formula β̂1 ± zα/2 × s.e.(β̂1). Just as we can exponentiate β̂1 to get an estimate of the OR comparing the odds of disease for a unit change in x1, we can exponentiate the lower and upper limits of the confidence interval for β1 to get a confidence interval for the OR. Estimates of β1 (or, alternatively, its associated OR), its standard error and the log likelihood for the model are available from the output of logistic regression routines in popular statistical software. Test statistics and p-values for tests that β1 = 0 and Wald 95% confidence intervals are often also included automatically. Likelihood ratio and score test statistics can sometimes be requested.
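As a minimal sketch of these calculations (the function name is ours; the inputs anticipate the comorbidity estimate reported in the example of Section 2.5.3):

```python
import numpy as np
from scipy.stats import norm

def wald_ci(beta_hat, se, level=0.95):
    """Wald limits for beta and for the corresponding OR, exp(beta)."""
    z = norm.ppf(1 - (1 - level) / 2)      # z_{alpha/2}; 1.96 for 95%
    lo, hi = beta_hat - z * se, beta_hat + z * se
    return (lo, hi), (np.exp(lo), np.exp(hi))

beta_ci, or_ci = wald_ci(-0.7185, 0.3343)
# beta_ci ~ (-1.3737, -0.0633); or_ci ~ (0.25, 0.94)
```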
Although Wald tests and confidence intervals are standard output from software for fitting logistic regression, we caution the reader that in certain circumstances the performance of Wald tests (and confidence intervals) can be somewhat irregular and lead to misleading conclusions. As a result, we recommend that LRTs (and likelihood-based confidence intervals) be used whenever possible.
2.5.3 Example: Logistic regression with a single binary covariate
We now return to the Table 2.1 data from the first-episode major affective disorders with psychosis study and show that we can obtain identical results using large-sample methods for 2 × 2 contingency tables (as reported in Section 2.3.2) and logistic regression. Recall that our interest is in the association between Axis I comorbidity and 2-year functional recovery in this group of patients. Using logistic regression, we fit the model:
logit[pr(Recoveryi = 1)] = β0 + β1 × Comorbidityi, (2.12)

where Recoveryi is an indicator variable coded 1 if the ith subject recovered and 0 otherwise, and Comorbidityi is an indicator variable coded 1 if the ith subject had Axis I comorbidity and 0 otherwise.
The following are the results:
               β̂        s.e.(β̂)    Z       p > |Z|   95% CI
Intercept     −0.2624   0.1881    −1.39    0.163     (−0.6310, 0.1063)
Comorbidity   −0.7185   0.3343    −2.15    0.032     (−1.3737, −0.0632)
The estimate of the OR comparing the odds of recovery in patients with and without Axis I comorbidities is exp(−0.7185) = 0.49, and the 95% confidence interval for the OR is exp(−1.3737, −0.0632) = (0.25, 0.94). The Wald test statistic for no association (or, equivalently, for H0 : β1 = 0 or H0 : OR = 1), which appears in the table, is Z = −2.15, with an accompanying p-value of 0.03.
We can obtain the LRT statistic for no association by fitting the model with the intercept as the only covariate, which has a log likelihood of −119.8066, and comparing it to the log likelihood from the model with both the intercept and comorbidity as covariates, −117.4037. The LRT statistic is χ²₁ = 2 × [−117.4037 − (−119.8066)] = 4.81. The associated p-value, which can be obtained using statistical software or estimated from chi-square distribution tables, is 0.03. These results and their interpretation are identical to those obtained using methods for 2 × 2 contingency tables and reported in Section 2.3.2.
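The fit and LRT above can be reproduced with standard software. A sketch in Python with statsmodels follows; the data frame and its column names (`recovery`, `comorbidity`) are hypothetical stand-ins, since the study data are not reproduced here:

```python
import statsmodels.formula.api as smf
from scipy.stats import chi2

# `df` is assumed to hold one row per subject with 0/1 columns
# `recovery` and `comorbidity` (hypothetical names).
full = smf.logit("recovery ~ comorbidity", data=df).fit()
reduced = smf.logit("recovery ~ 1", data=df).fit()   # intercept only

# Wald results (estimates, s.e., Z, p-values, 95% CIs): full.summary()

# LRT: twice the difference in maximised log likelihoods, 1 df.
lrt = 2 * (full.llf - reduced.llf)
p_value = chi2.sf(lrt, df=1)
```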
2.5.4 Multiple logistic regression
So far, we have only considered the simple case where there is a single covariate xi. Next, we consider the extensions of Equations 2.10 and 2.11 to the case where there are two or more covariates.
Recall that, in Section 2.4, we applied methods for stratified contingency tables to the first-episode major affective disorders with psychosis study data to test that the OR comparing patients with and without comorbidities adjusted for sex equals 1. Methods for stratified contingency tables are useful when adjusting for a small number of categorical covariates. However, multiple logistic regression has important advantages over stratified contingency table methods when the number of categorical covariates is larger or when we want to adjust for quantitative covariates. For example, using the first-episode data, we may want to test that the OR adjusted for both sex and age equals 1 and to obtain an estimate of the adjusted OR without classifying age into arbitrary categories.
When there are many covariates, the logistic regression model becomes,
log[pi/(1 − pi)] = β0 + β1xi1 + β2xi2 + · · · + βKxiK, (2.13)

where xi1, xi2, . . . , xiK are the K covariates. The logistic regression coefficients in Equation 2.13 have the following interpretations. The logistic regression intercept, β0, now has interpretation as the log odds of success when all covariates equal 0, that is when xi1 = xi2 = · · · = xiK = 0. Each of the logistic regression slopes, βk (for k = 1, . . . , K), has interpretation as the change in the log odds of success for a single unit change in xik given that all of the other covariates remain constant.
Note that the appealing property of logistic regression that the same OR can be estimated from either a prospective or retrospective study design readily generalises when xik is quantitative rather than dichotomous, and also when there are two or more predictor variables. Methods for hypothesis testing and constructing confidence intervals also generalise easily from the predictor in a simple logistic regression model (β1) to a predictor in a multiple logistic regression model (βk). Expressions for Wald test statistics and confidence intervals for βk can be obtained by substituting βk for β1 in the relevant portions of Section 2.5.2. LRTs of βk = 0 can be constructed by comparing the fit of the full model with βk included to the fit of a reduced model with all covariates except βk included. Twice the difference between the maximised log likelihood for the full model and the maximised log likelihood for the reduced model still approximately follows a chi-square distribution with one degree of freedom.
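In software, the LRT for a single coefficient in a multiple logistic regression is obtained by fitting the model twice, as in this sketch (hypothetical variable names, continuing the statsmodels usage above):

```python
import statsmodels.formula.api as smf
from scipy.stats import chi2

# Test beta_3 = 0 for x3: drop only x3 from the full model
# (y, x1, x2, x3 are hypothetical column names in a data frame `df`).
full = smf.logit("y ~ x1 + x2 + x3", data=df).fit()
reduced = smf.logit("y ~ x1 + x2", data=df).fit()

lrt = 2 * (full.llf - reduced.llf)   # ~ chi-square with 1 df in large samples
p_value = chi2.sf(lrt, df=1)
```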
2.5.5 Example: Multiple logistic regression
To obtain an estimate of the OR for comorbidity adjusted for sex and age and to test that the adjusted OR equals one, we fit the following multiple logistic regression model to the first-episode major affective disorders with psychosis data:
logit[pr(Recoveryi = 1)] = β0 + β1 × Comorbidityi + β2 × Malei + β3 × Agei, (2.14)

where Malei is an indicator variable coded 1 if the ith subject is male and 0 if the ith subject is female, and Agei is the age of the ith subject in decades. The following results are obtained:
               β̂        s.e.(β̂)    Z       p > |Z|   95% CI
Intercept     −1.4019   0.4955    −2.83    0.005     (−2.3730, −0.4307)
Comorbidity   −0.4845   0.3496    −1.39    0.166     (−1.1697, 0.2008)
Male           0.0049   0.3243     0.01    0.988     (−0.6307, 0.6404)
Age            0.3107   0.1094     2.84    0.004     (0.0963, 0.5250)
The estimate of the OR for comorbidity adjusted for sex and age is exp(−0.4845) = 0.62, and its 95% confidence interval is exp(−1.1697, 0.2008) = (0.31, 1.22). Holding sex and age constant, we estimate that the odds of 2-year functional recovery is 38% lower for patients with Axis I comorbidity when compared to patients without Axis I comorbidity. However, note from the 95% confidence interval that our data are consistent with odds of recovery up to 22% higher for patients with Axis I comorbidity. In addition, the Wald test statistic for testing that the adjusted OR equals one is Z = −1.39 with an associated p-value of 0.17, and the LRT statistic is χ²₁ = 1.95 with an associated p-value of 0.16.
Using either test, we conclude that there is no evidence of an association between Axis I comorbidity and 2-year functional recovery after adjusting for sex and age.
We can also use the results from the multiple logistic regression to obtain estimates and test statistics for the other covariates in the model. The estimated OR comparing odds of recovery in males versus females is 1.00 (95% confidence interval: 0.53, 1.90), and we conclude from the Wald test that there is no evidence of an association between sex and recovery after adjusting for Axis I comorbidity and age (Z = 0.01, p = 0.99). On the other hand, the estimated OR comparing odds of recovery for a 10-year age increase is 1.36 (95% confidence interval: 1.10, 1.69). Adjusting for Axis I comorbidity and sex, the odds of 2-year functional recovery increases with age (Z = 2.84, p = 0.004); for every decade age increase, we estimate that the odds of recovery is 36% higher.
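The ORs and confidence intervals quoted in this example are simple transformations of the results table; a short sketch of the conversion:

```python
import numpy as np

# Coefficients and 95% CIs taken from the results table for Equation 2.14.
estimates = {
    "Comorbidity": (-0.4845, (-1.1697, 0.2008)),
    "Male":        (0.0049,  (-0.6307, 0.6404)),
    "Age":         (0.3107,  (0.0963, 0.5250)),
}

for name, (beta, (lo, hi)) in estimates.items():
    # Exponentiate the coefficient and its limits to get the OR scale.
    print(f"{name}: OR = {np.exp(beta):.2f}, "
          f"95% CI = ({np.exp(lo):.2f}, {np.exp(hi):.2f})")
# Comorbidity: OR = 0.62, 95% CI = (0.31, 1.22)
# Male:        OR = 1.00, 95% CI = (0.53, 1.90)
# Age:         OR = 1.36, 95% CI = (1.10, 1.69)
```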
2.5.6 Categorical predictors with more than two levels in logistic regression
Section 2.3.3 presented contingency table methods that could be used to test for independence with predictors or outcomes with more than two categories. This section describes how logistic regression accommodates predictors with more than two categories, either with or without adjustment for additional covariates. (A later section describes extensions of logistic regression that accommodate outcomes with more than two categories.) For K unordered categories, a test for independence can be obtained by adding K − 1 indicator or ‘dummy’ variables as covariates in the regression, where the kth indicator variable is coded 1 for subjects in the kth category and 0 for all other subjects (so that subjects in the remaining ‘reference’ category are coded 0 for all K − 1 indicator variables). A LRT for no association can be conducted by comparing the log likelihood for the model containing the predictor to the log likelihood for the model with the K − 1 indicator variables corresponding to the predictor removed; the LRT statistic follows a chi-square distribution with K − 1 degrees of freedom. Wald and score hypothesis tests are also available. However, when a predictor has three or more categories, the Wald test of no association is sometimes not available from standard logistic regression output and must be requested.
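A sketch of this LRT in statsmodels (the column name `onset_type` is hypothetical; the `C()` formula term builds the K − 1 indicator variables, with the first level as the reference category):

```python
import statsmodels.formula.api as smf
from scipy.stats import chi2

# C(onset_type) expands a K-level categorical predictor into K-1
# indicator variables (hypothetical column names in data frame `df`).
full = smf.logit("y ~ C(onset_type)", data=df).fit()
reduced = smf.logit("y ~ 1", data=df).fit()

K = df["onset_type"].nunique()
lrt = 2 * (full.llf - reduced.llf)    # ~ chi-square with K-1 df
p_value = chi2.sf(lrt, df=K - 1)
```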
For ordered categories, a test for independence can be conducted by assigning scores to each level of the predictor and then using the score as a covariate in the regression model. For example, the scores 1, 2 and 3 could be assigned to the categories mild, moderate and severe. The Z statistic for the covariate then corresponds to a test for no association, and interpretation of the corresponding regression parameter is similar to the interpretation of a regression parameter for a quantitative predictor. For example, the OR for the severity predictor would compare the odds of the outcome for a one-category increase in severity, either moderate versus mild or severe versus moderate. This approach is most appropriate when the association between the score and outcome is approximately linear.
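For the ordered case, the scoring step is just a recoding before the fit, as in this brief sketch (hypothetical column names again):

```python
import statsmodels.formula.api as smf

# Map ordered categories to equally spaced scores and use the score as a
# single quantitative covariate (hypothetical `severity` column in `df`).
df["severity_score"] = df["severity"].map({"mild": 1, "moderate": 2, "severe": 3})

fit = smf.logit("y ~ severity_score", data=df).fit()
# The Z statistic for severity_score tests no association; exp(beta) is
# the OR for a one-category increase in severity.
```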
2.5.7 Example: Logistic regression with a three-level predictor
In Section 2.3.4, we performed tests for independence between type of onset of first-episode affective disorder with psychosis (categorised as chronic, subacute or acute) and 2-year functional recovery. Equivalent tests can be performed using logistic regression by fitting the model:
logit[pr(Recoveryi = 1)] = β0 + β1 × Subacutei + β2 × Acutei, (2.15)

where Subacutei is an indicator variable coded 1 if the ith subject had subacute onset and 0 otherwise, and Acutei is an indicator variable coded 1 if the ith subject had acute onset and 0 otherwise.