In this section we consider how the relationships in multi-way contingency tables, and more complicated designs, can be explored using regression methods known as logistic regression. Logistic regression is one of the most widely used methods for the analysis of binary data. It is used to examine and describe the relationship between a binary response variable Yi
(e.g. 1 = ‘success’ or 0 = ‘failure’) and one or more covariates for i = 1, . . . , n independent subjects. The covariates can be continuous or categorical (e.g. indicator variables). Denoting the two possible outcomes for Yi by 0 and 1, the probability distribution of the response variable is the Bernoulli distribution with probability of success pi. In common with linear regression, the primary objective of logistic regression is to model the mean of the response variable, given a set of covariates. Recall that with a binary response, the mean of Yi is simply the probability that Yi takes on the value 1, pi. However, what distinguishes logistic regression from linear regression is that the response variable is binary rather than continuous in nature. This has a number of consequences for modelling the mean of the response variable. For ease of exposition, we will first consider the simple case where there is only a single predictor variable, say xi. Generalisations to more than one predictor variable will be considered later.
Since linear models play such an important and dominant role in applied statistics, it may at first seem natural to assume a linear model relating the mean of Yi to xi,
pr[Yi = 1 | xi] = pi = β0 + β1xi. (2.8)

However, expressing pi as a linear function is problematic since it violates the restriction that probabilities must lie within the range from 0 to 1. As a result, for sufficiently large or small values of xi, the linear model given by Equation 2.8 will yield probabilities outside of the permissible range. A further difficulty with the linear model for the probabilities is that we often expect a nonlinear relationship between pi and xi. For example, a 0.2 unit increase in pi might be considered more ‘extreme’ when pi = 0.1 than when pi = 0.5. In terms of ratios, the change from pi = 0.1 to pi = 0.3 represents a threefold or 200% increase, whereas the change from pi = 0.5 to pi = 0.7 represents only a 40% increase. In a sense, the units of measurement for a probability or proportion are often not considered to be constant over the range from 0 to 1. The linear probability model given by Equation 2.8 simply does not take this into consideration when relating pi to xi.
To circumvent these problems, a nonlinear transformation is usually applied to pi and the transformed probabilities are related linearly to xi. In particular, a transformation of pi, say g(pi), is chosen so that it maps the range of pi from (0, 1) to (−∞, ∞). Since there are many possible transformations, g(pi), that achieve this goal, this leads to an extensive choice of models that are all of the form
g(pi) = β0 + β1xi. (2.9)

However, the most commonly used in practice are:

1 Logit or logistic function: g(pi) = log[pi/(1 − pi)]
2 Probit or inverse normal function: g(pi) = Φ⁻¹(pi), where Φ is the standardised normal cumulative distribution function
3 Complementary log–log function: g(pi) = log[−log(1 − pi)].
We note that all of these transformations are very closely related when 0.2 < pi < 0.8, and in a sense only differ in the degree of ‘tail-stretching’ outside of this range. Indeed, for most practical purposes it is not possible to discriminate between a data analysis that is based on, for example, the logit and probit functions. To discriminate empirically between probit and logistic regression would, in general, require very large numbers of observations.
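As a minimal numerical illustration of this point (ours, not part of the original text; it assumes Python with numpy and scipy), the three transformations can be evaluated side by side; within 0.2 < pi < 0.8 the probit, rescaled by the usual rule-of-thumb factor of about 1.7, closely tracks the logit:

```python
import numpy as np
from scipy.stats import norm

# Evaluate the three link functions over the range 0.2 < p < 0.8.
p = np.linspace(0.2, 0.8, 7)

logit = np.log(p / (1 - p))         # log odds
probit = norm.ppf(p)                # inverse standard normal CDF
cloglog = np.log(-np.log(1 - p))    # complementary log-log

for pi, lo, pr, cl in zip(p, logit, probit, cloglog):
    # Rescaling the probit by ~1.7 (a common rule of thumb) shows how
    # closely it matches the logit in this range.
    print(f"p={pi:.1f}  logit={lo:+.3f}  1.7*probit={1.7 * pr:+.3f}  cloglog={cl:+.3f}")
```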
However, the logit function does have a number of distinct advantages over the probit and complementary log–log functions which probably account for its more widespread use in practice. Later in this chapter we will consider some of the advantages of the logit or logistic function.
When the logit or logistic function is adopted, the resulting model
logit(pi) = log[pi/(1 − pi)] = β0 + β1xi, (2.10)

is known as the logistic regression model. Recall from Section 2.3.1 that if pi is the probability of success, then pi/(1 − pi) is the odds of success. Consequently, logistic regression assumes a linear relationship between the log odds of success and xi. Note that this simple model can be expressed equivalently in terms of pi,

pi = exp(β0 + β1xi)/[1 + exp(β0 + β1xi)]. (2.11)

We must emphasise that Equations 2.10 and 2.11 are completely equivalent ways of expressing the logistic regression model. Expression 2.10 describes how the log odds, log[pi/(1 − pi)], has a linear relationship with xi, while expression 2.11 describes how pi has an S-shaped relationship with increasing values of β1xi; although, in general, this relationship is approximately linear within the range 0.2 < pi < 0.8 (see Figure 2.1 for a plot of pi versus xi when β0 = 0.5 and β1 = 0.9). Observe that the expression on the right of Equation 2.11 cannot yield a value that is either negative or greater than 1. That is, the logistic transformation ensures that the predicted probabilities are restricted to the range from 0 to 1.
Fig 2.1 Plot of the logistic response function: probability of success pi versus xi (β0 = 0.5, β1 = 0.9).
Finally, note that

1 − pi = 1/[1 + exp(β0 + β1xi)],

so that the odds, pi/(1 − pi), is simply exp(β0 + β1xi).
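The equivalence of Equations 2.10 and 2.11 is easy to verify numerically. The following sketch (an illustration of ours in Python, using the values β0 = 0.5 and β1 = 0.9 from Figure 2.1) applies the inverse-logit transformation and checks the round trip back to the log odds:

```python
import numpy as np

def inv_logit(eta):
    """Equation 2.11: map a linear predictor to a probability in (0, 1)."""
    return np.exp(eta) / (1 + np.exp(eta))

beta0, beta1 = 0.5, 0.9              # values used for Figure 2.1
x = np.linspace(-4, 4, 9)
p = inv_logit(beta0 + beta1 * x)     # S-shaped, always within (0, 1)

# Round trip: the logit of p recovers the linear predictor (Equation 2.10),
# and the odds p/(1 - p) equal exp(beta0 + beta1 * x).
assert np.allclose(np.log(p / (1 - p)), beta0 + beta1 * x)
assert np.allclose(p / (1 - p), np.exp(beta0 + beta1 * x))
```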
2.5.1 Interpretation of logistic regression coefficients
Next we consider the interpretation of the logistic regression coefficients, β0 and β1, in Equation 2.10.
In simple linear regression, recall that the interpretation of the slope of the regression is in terms of changes in the mean of Yi for a single unit change in xi. Similarly, the logistic regression slope, β1, in Equation 2.10 has interpretation as the change in the log odds of success for a single unit change in xi. Equivalently, a single unit change in xi increases or decreases the odds of success multiplicatively by a factor of exp(β1). Also, recall that the intercept in simple linear regression has interpretation as the mean value of the response variable when xi is equal to 0. Similarly, the logistic regression intercept, β0, has interpretation as the log odds of success when xi = 0.
Note that, for case–control studies, the intercept β0 cannot be validly estimated since it is determined by the proportions of ‘successes’ (Y = 1) and ‘failures’ (Y = 0) selected by the study design. However, in many studies, there is far less scientific interest in the intercept than in the slope.
For the special case where xi is dichotomous, taking values of 0 and 1, the logistic regression slope, β1, has a simple and very attractive interpretation. Consider the two possible values for pi when xi = 0 and xi = 1. Let pi(xi = j) denote the probability of success when xi = j, for j = 0, 1. Then,

β1 = (β0 + β1) − β0
   = logit[pi(xi = 1)] − logit[pi(xi = 0)]
   = log{pi(xi = 1) × [1 − pi(xi = 0)] / (pi(xi = 0) × [1 − pi(xi = 1)])},
which is the log of the OR (or cross-product ratio) in the 2 × 2 table of the cross-classification of Yi and xi (see Table 2.10). Thus, exp(β1) has interpretation as the OR of the response for the two possible values of the covariate.
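This identity can be checked directly. The sketch below uses a hypothetical 2 × 2 table of counts (the numbers are illustrative only, not study data) and the statsmodels package to confirm that exp(β̂1) reproduces the cross-product ratio:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical 2 x 2 table of counts (illustrative values only):
#          Y=1  Y=0
#   x=1     30   20
#   x=0     25   45
a, b, c, d = 30, 20, 25, 45

or_table = (a * d) / (b * c)   # cross-product ratio straight from the table

# Expand the table to individual-level data and fit the logistic model.
y = np.repeat([1, 0, 1, 0], [a, b, c, d])
x = np.repeat([1, 1, 0, 0], [a, b, c, d])
fit = sm.GLM(y, sm.add_constant(x), family=sm.families.Binomial()).fit()

# exp(beta1_hat) equals the cross-product ratio exactly for a 2 x 2 table.
assert np.isclose(np.exp(fit.params[1]), or_table)
```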
The OR has many appealing properties that probably account for the widespread use of logistic regression in many areas of application. First, as was noted earlier, the OR does not change when rows and columns of the 2 × 2 table are interchanged. This implies that it is not necessary to distinguish which variable is the response and predictor variable in order to estimate the OR. Furthermore, as noted in the previous sections, a very appealing feature of the OR, exp(β1), is that it is equally valid regardless of whether the study design is prospective, cross-sectional or retrospective. That is, logistic regression provides an estimate of the same association between Yi and xi in all three study designs. Finally, in psychiatric studies where Yi typically denotes the presence or absence of a disease or disorder, the OR is often interpreted as an approximation to the RR of disease, p(xi = 1)/p(xi = 0). When the disease is rare, and pi is reasonably close to 0 in both of the risk groups (often known as the ‘rare disease’ assumption), the OR provides a close approximation to the RR. Retrospective designs are especially common in psychiatry where the possible outcomes of interest are very rare.

Table 2.10 Cross-classification probabilities for logistic regression of Y on x.

        Y = 1                                       Y = 0                                 Total
x = 1   p(x = 1) = exp(β0 + β1)/[1 + exp(β0 + β1)]   1 − p(x = 1) = 1/[1 + exp(β0 + β1)]   1.0
x = 0   p(x = 0) = exp(β0)/[1 + exp(β0)]             1 − p(x = 0) = 1/[1 + exp(β0)]        1.0
Although the RR cannot be estimated from a retrospective study, the OR can be used to provide an approximation to the RR. Extra care is necessary when interpreting the OR as an approximation to the RR in prospective studies. In many prospective studies the binary event is relatively common (say greater than 10%) and the ‘rare disease’ assumption no longer holds; in these settings, the OR can be a very poor and unreliable approximation to the RR and should not be given such an interpretation.
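A quick numerical sketch (with hypothetical probabilities chosen only to illustrate the point) makes the contrast concrete:

```python
def odds_ratio(p1, p0):
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

def risk_ratio(p1, p0):
    return p1 / p0

# Rare outcome: the OR closely approximates the RR.
print(odds_ratio(0.02, 0.01), risk_ratio(0.02, 0.01))   # ~2.02 vs 2.00

# Common outcome (well above 10%): the OR overstates the RR badly.
print(odds_ratio(0.50, 0.25), risk_ratio(0.50, 0.25))   # 3.00 vs 2.00
```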
2.5.2 Hypothesis testing and confidence intervals for logistic regression parameters
Often, we are interested in testing for an association between the predictor in our logistic regression model and the outcome, or, equivalently, testing H0 : β1 = 0. As for 2 × 2 table methods, Wald, likelihood ratio and score statistics can be used for this test. A Wald test statistic can be obtained using the result that the estimate of β1 divided by its standard error (s.e.) approximately follows a N(0, 1) distribution in large samples. A LRT statistic can be obtained by comparing the log likelihood for the full model with the predictor included to the log likelihood for a reduced model including only the intercept β0; the former is at least as large as the latter. In large samples, twice the difference between the maximised log likelihoods for the full and reduced models approximately follows a chi-square distribution with 1 degree of freedom.
Two-sided Wald confidence limits for β1 can be obtained using the result that β̂1 follows an approximate normal distribution; the confidence limits are given by the formula β̂1 ± zα/2 × s.e.(β̂1). Just as we can exponentiate β̂1 to get an estimate of the OR comparing the odds of disease for a unit change in x1, we can exponentiate the lower and upper limits of the confidence interval for β1 to get a confidence interval for the OR. Estimates of β1 (or, alternatively, its associated OR), its standard error and the log likelihood for the model are available from the output of logistic regression routines in popular statistical software. Test statistics and p-values for tests that β1 = 0 and Wald 95% confidence intervals are often also included automatically. Likelihood ratio and score test statistics can sometimes be requested.
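As a minimal sketch of these calculations (the function name is ours; the inputs anticipate the comorbidity estimate reported in the example of Section 2.5.3):

```python
import numpy as np
from scipy.stats import norm

def wald_ci(beta_hat, se, level=0.95):
    """Wald limits for beta and for the corresponding OR, exp(beta)."""
    z = norm.ppf(1 - (1 - level) / 2)      # z_{alpha/2}; 1.96 for 95%
    lo, hi = beta_hat - z * se, beta_hat + z * se
    return (lo, hi), (np.exp(lo), np.exp(hi))

beta_ci, or_ci = wald_ci(-0.7185, 0.3343)
# beta_ci ~ (-1.3737, -0.0633); or_ci ~ (0.25, 0.94)
```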
Although Wald tests and confidence intervals are standard output from software for fitting logistic regression, we caution the reader that in certain circumstances the performance of Wald tests (and confidence intervals) can be somewhat irregular and lead to misleading conclusions. As a result, we recommend that LRTs (and likelihood-based confidence intervals) be used whenever possible.
2.5.3 Example: Logistic regression with a single binary covariate
We now return to the Table 2.1 data from the first-episode major affective disorders with psychosis study and show that we can obtain identical results using large-sample methods for 2 × 2 contingency tables (as reported in Section 2.3.2) and logistic regression. Recall that our interest is in the association between Axis I comorbidity and 2-year functional recovery in this group of patients. Using logistic regression, we fit the model:
logit[pr(Recoveryi = 1)] = β0 + β1 × Comorbidityi, (2.12)

where Recoveryi is an indicator variable coded 1 if the ith subject recovered and 0 otherwise, and Comorbidityi is an indicator variable coded 1 if the ith subject had Axis I comorbidity and 0 otherwise.
The following are the results:
               β̂        s.e.(β̂)    Z       p > |Z|   95% CI
Intercept     −0.2624   0.1881    −1.39    0.163     (−0.6310, 0.1063)
Comorbidity   −0.7185   0.3343    −2.15    0.032     (−1.3737, −0.0632)
The estimate of the OR comparing the odds of recovery in patients with and without Axis I comorbidities is exp(−0.7185) = 0.49, and the 95% confidence interval for the OR is exp(−1.3737, −0.0632) = (0.25, 0.94). The Wald test statistic for no association (or, equivalently, for H0 : β1 = 0 or H0 : OR = 1), which appears in the table, is Z = −2.15, with an accompanying p-value of 0.03.
We can obtain the LRT statistic for no association by fitting the model with the intercept as the only covariate, which has a log likelihood of −119.8066, and comparing it to the log likelihood from the model with both the intercept and comorbidity as covariates, −117.4037. The LRT statistic is χ²₁ = 2 × [−117.4037 − (−119.8066)] = 4.81. The associated p-value, which can be obtained using statistical software or estimated from chi-square distribution tables, is 0.03. These results and their interpretation are identical to those obtained using methods for 2 × 2 contingency tables and reported in Section 2.3.2.
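The fit and LRT above can be reproduced with standard software. A sketch in Python with statsmodels follows; the data frame and its column names (`recovery`, `comorbidity`) are hypothetical stand-ins, since the study data are not reproduced here:

```python
import statsmodels.formula.api as smf
from scipy.stats import chi2

# `df` is assumed to hold one row per subject with 0/1 columns
# `recovery` and `comorbidity` (hypothetical names).
full = smf.logit("recovery ~ comorbidity", data=df).fit()
reduced = smf.logit("recovery ~ 1", data=df).fit()   # intercept only

# Wald results (estimates, s.e., Z, p-values, 95% CIs): full.summary()

# LRT: twice the difference in maximised log likelihoods, 1 df.
lrt = 2 * (full.llf - reduced.llf)
p_value = chi2.sf(lrt, df=1)
```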
2.5.4 Multiple logistic regression
So far, we have only considered the simple case where there is a single covariate xi. Next, we consider the extensions of Equations 2.10 and 2.11 to the case where there are two or more covariates.
Recall that, in Section 2.4, we applied methods for stratified contingency tables to the first-episode major affective disorders with psychosis study data to test that the OR comparing patients with and without comorbidities adjusted for sex equals 1. Methods for stratified contingency tables are useful when adjusting for a small number of categorical covariates. However, multiple logistic regression has important advantages over stratified contingency table methods when the number of categorical covariates is larger or when we want to adjust for quantitative covariates. For example, using the first-episode data, we may want to test that the OR adjusted for both sex and age equals 1 and to obtain an estimate of the adjusted OR without classifying age into arbitrary categories.
When there are many covariates, the logistic regression model becomes,
log[pi/(1 − pi)] = β0 + β1xi1 + β2xi2 + · · · + βKxiK, (2.13)

where xi1, xi2, . . . , xiK are the K covariates. The logistic regression coefficients in Equation 2.13 have the following interpretations. The logistic regression intercept, β0, now has interpretation as the log odds of success when all covariates equal 0, that is when xi1 = xi2 = · · · = xiK = 0. Each of the logistic regression slopes, βk (for k = 1, . . . , K), has interpretation as the change in the log odds of success for a single unit change in xik given that all of the other covariates remain constant.
Note that the appealing property of logistic regression that the same OR can be estimated from either a prospective or retrospective study design readily generalises when xik is quantitative rather than dichotomous, and also when there are two or more predictor variables. Methods for hypothesis testing and constructing confidence intervals also generalise easily from the predictor in a simple logistic regression model (β1) to a predictor in a multiple logistic regression model (βk). Expressions for Wald test statistics and confidence intervals for βk can be obtained by substituting βk for β1 in the relevant portions of Section 2.5.2. LRTs of βk = 0 can be constructed by comparing the fit of the full model with βk included to the fit of a reduced model with all covariates except βk included. Twice the difference between the maximised log likelihood for the full model and the maximised log likelihood for the reduced model still approximately follows a chi-square distribution with one degree of freedom.
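In software, the LRT for a single coefficient in a multiple logistic regression is obtained by fitting the model twice, as in this sketch (hypothetical variable names, continuing the statsmodels usage above):

```python
import statsmodels.formula.api as smf
from scipy.stats import chi2

# Test beta_3 = 0 for x3: drop only x3 from the full model
# (y, x1, x2, x3 are hypothetical column names in a data frame `df`).
full = smf.logit("y ~ x1 + x2 + x3", data=df).fit()
reduced = smf.logit("y ~ x1 + x2", data=df).fit()

lrt = 2 * (full.llf - reduced.llf)   # ~ chi-square with 1 df in large samples
p_value = chi2.sf(lrt, df=1)
```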
2.5.5 Example: Multiple logistic regression
To obtain an estimate of the OR for comorbidity adjusted for sex and age and to test that the adjusted OR equals one, we fit the following multiple logistic regression model to the first-episode major affective disorders with psychosis data:
logit[pr(Recoveryi = 1)] = β0 + β1 × Comorbidityi + β2 × Malei + β3 × Agei, (2.14)

where Malei is an indicator variable coded 1 if the ith subject is male and 0 if the ith subject is female, and Agei is the age of the ith subject in decades. The following results are obtained:
               β̂        s.e.(β̂)    Z       p > |Z|   95% CI
Intercept     −1.4019   0.4955    −2.83    0.005     (−2.3730, −0.4307)
Comorbidity   −0.4845   0.3496    −1.39    0.166     (−1.1697, 0.2008)
Male           0.0049   0.3243     0.01    0.988     (−0.6307, 0.6404)
Age            0.3107   0.1094     2.84    0.004     (0.0963, 0.5250)
The estimate of the OR for comorbidity adjusted for sex and age is exp(−0.4845) = 0.62, and its 95% confidence interval is exp(−1.1697, 0.2008) = (0.31, 1.22). Holding sex and age constant, we estimate that the odds of 2-year functional recovery is 38% lower for patients with Axis I comorbidity when compared to patients without Axis I comorbidity. However, note from the 95% confidence interval that our data are consistent with odds of recovery up to 22% higher for patients with Axis I comorbidity. In addition, the Wald test statistic for testing that the adjusted OR equals one is Z = −1.39 with an associated p-value of 0.17, and the LRT statistic is χ²₁ = 1.95 with an associated p-value of 0.16.
Using either test, we conclude that there is no evidence of an association between Axis I comorbidity and 2-year functional recovery after adjusting for sex and age.
We can also use the results from the multiple logistic regression to obtain estimates and test statistics for the other covariates in the model. The estimated OR comparing odds of recovery in males versus females is 1.00 (95% confidence interval: 0.53, 1.90), and we conclude from the Wald test that there is no evidence of an association between sex and recovery after adjusting for Axis I comorbidity and age (Z = 0.01, p = 0.99). On the other hand, the estimated OR comparing odds of recovery for a 10-year age increase is 1.36 (95% confidence interval: 1.10, 1.69). Adjusting for Axis I comorbidity and sex, the odds of 2-year functional recovery increases with age (Z = 2.84, p = 0.004); for every decade age increase, we estimate that the odds of recovery is 36% higher.
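The ORs and confidence intervals quoted in this example are simple transformations of the results table; a short sketch of the conversion:

```python
import numpy as np

# Coefficients and 95% CIs taken from the results table for Equation 2.14.
estimates = {
    "Comorbidity": (-0.4845, (-1.1697, 0.2008)),
    "Male":        (0.0049,  (-0.6307, 0.6404)),
    "Age":         (0.3107,  (0.0963, 0.5250)),
}

for name, (beta, (lo, hi)) in estimates.items():
    # Exponentiate the coefficient and its limits to get the OR scale.
    print(f"{name}: OR = {np.exp(beta):.2f}, "
          f"95% CI = ({np.exp(lo):.2f}, {np.exp(hi):.2f})")
# Comorbidity: OR = 0.62, 95% CI = (0.31, 1.22)
# Male:        OR = 1.00, 95% CI = (0.53, 1.90)
# Age:         OR = 1.36, 95% CI = (1.10, 1.69)
```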
2.5.6 Categorical predictors with more than two levels in logistic regression
Section 2.3.3 presented contingency table methods that could be used to test for independence with predictors or outcomes with more than two categories. This section describes how logistic regression accommodates predictors with more than two categories, either with or without adjustment for additional covariates. (A later section describes extensions of logistic regression that accommodate outcomes with more than two categories.) For K unordered categories, a test for independence can be obtained by adding K − 1 indicator or ‘dummy’ variables as covariates in the regression, where the kth indicator variable is coded 1 for subjects in the kth category and 0 for all other subjects (so that subjects in the remaining ‘reference’ category are coded 0 for all K − 1 indicator variables). A LRT for no association can be conducted by comparing the log likelihood for the model containing the predictor to the log likelihood for the model with the K − 1 indicator variables corresponding to the predictor removed; the LRT statistic follows a chi-square distribution with K − 1 degrees of freedom. Wald and score hypothesis tests are also available. However, when a predictor has three or more categories, the Wald test of no association is sometimes not available from standard logistic regression output and must be requested.
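A sketch of this LRT in statsmodels (the column name `onset_type` is hypothetical; the `C()` formula term builds the K − 1 indicator variables, with the first level as the reference category):

```python
import statsmodels.formula.api as smf
from scipy.stats import chi2

# C(onset_type) expands a K-level categorical predictor into K-1
# indicator variables (hypothetical column names in data frame `df`).
full = smf.logit("y ~ C(onset_type)", data=df).fit()
reduced = smf.logit("y ~ 1", data=df).fit()

K = df["onset_type"].nunique()
lrt = 2 * (full.llf - reduced.llf)    # ~ chi-square with K-1 df
p_value = chi2.sf(lrt, df=K - 1)
```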
For ordered categories, a test for independence can be conducted by assigning scores to each level of the predictor and then using the score as a covariate in the regression model. For example, the scores 1, 2 and 3 could be assigned to the categories mild, moderate and severe. The Z statistic for the covariate then corresponds to a test for no association, and interpretation of the corresponding regression parameter is similar to the interpretation of a regression parameter for a quantitative predictor. For example, the OR for the severity predictor would compare the odds of the outcome for a one-category increase in severity, either moderate versus mild or severe versus moderate. This approach is most appropriate when the association between the score and outcome is approximately linear.
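For the ordered case, the scoring step is just a recoding before the fit, as in this brief sketch (hypothetical column names again):

```python
import statsmodels.formula.api as smf

# Map ordered categories to equally spaced scores and use the score as a
# single quantitative covariate (hypothetical `severity` column in `df`).
df["severity_score"] = df["severity"].map({"mild": 1, "moderate": 2, "severe": 3})

fit = smf.logit("y ~ severity_score", data=df).fit()
# The Z statistic for severity_score tests no association; exp(beta) is
# the OR for a one-category increase in severity.
```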
2.5.7 Example: Logistic regression with a three-level predictor
In Section 2.3.4, we performed tests for independence between type of onset of first-episode affective disorder with psychosis (categorised as chronic, subacute or acute) and 2-year functional recovery. Equivalent tests can be performed using logistic regression by fitting the model:
logit[pr(Recoveryi = 1)] = β0 + β1 × Subacutei + β2 × Acutei, (2.15)

where Subacutei is an indicator variable coded 1 if the ith subject had subacute onset and 0 otherwise, and Acutei is an indicator variable coded 1 if the ith subject had acute onset and 0 otherwise.