LOGLINEAR MODELS FOR INDEPENDENCE AND INTERACTION IN THREE-WAY TABLES
Table Structure For Three Dimensions
•
When all variables are categorical, a
multidimensional contingency table
displays the data
•
We illustrate ideas using the
three-variable case.
•
Denote the variables by X, Y, and Z. We
Death Penalty Example
Defendant’s   Victim’s   Death Penalty     Percentage
Race          Race       Yes      No       Yes
White         White      19       132      12.6
              Black      0        9        0
Marginal table
Defendant’s   Death Penalty
Race          Yes      No       Total
White         19       141      160
Black         17       149      166
Partial and Marginal Odds Ratios
A partial odds ratio describes the association
when the third variable is controlled.
A marginal odds ratio describes the
association when the third variable is ignored.
Association    Variables
               P-D    P-V    D-V
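As a small illustration of a marginal odds ratio, the sketch below computes it from the marginal table of defendant's race by death penalty verdict given above. The helper name `odds_ratio` is an assumption made for this example, not something from the source.

```python
def odds_ratio(table):
    """Odds ratio (a*d)/(b*c) of a 2x2 table given as [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)

# Marginal table from the text: rows = defendant's race (white, black),
# columns = death penalty (yes, no), collapsed over victim's race.
marginal = [[19, 141], [17, 149]]

marginal_or = odds_ratio(marginal)
print(round(marginal_or, 2))  # 1.18
```

Ignoring victim's race, the estimated odds of a death penalty verdict for white defendants are about 1.18 times the odds for black defendants. A partial odds ratio would apply the same function to each partial table at a fixed victim's race.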
Types of Independence
A three-way $I \times J \times K$ cross-classification of response variables X, Y, and Z
has several potential types of independence.
We assume a multinomial distribution with cell probabilities $\{\pi_{ijk}\}$,
and $\sum_i \sum_j \sum_k \pi_{ijk} = 1$.
The models also apply to Poisson sampling with means $\{\mu_{ijk}\}$.
Similarly, X could be jointly independent of Y and Z, or Z could be jointly
independent of X and Y. Mutual independence (8.5) implies joint independence
of any one variable from the others.
X and Y are conditionally independent, given Z, when independence holds for each partial table within which Z is fixed. That is, if $\pi_{ij|k} = P(X = i, Y = j \mid Z = k)$, then $\pi_{ij|k} = \pi_{i+|k}\,\pi_{+j|k}$ for all i, j, and k.
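For reference, the three types of independence discussed here can be written in terms of the joint cell probabilities. This is a sketch in standard notation, where a "+" subscript denotes summation over that index. The source's own equation numbers are not reproduced because they are not all visible in this text.

```latex
\begin{align*}
\text{Mutual independence of } X, Y, Z:\quad
  & \pi_{ijk} = \pi_{i++}\,\pi_{+j+}\,\pi_{++k} \\
\text{Joint independence of } Y \text{ from } (X, Z):\quad
  & \pi_{ijk} = \pi_{i+k}\,\pi_{+j+} \\
\text{Conditional independence of } X \text{ and } Y \text{ given } Z:\quad
  & \pi_{ijk} = \frac{\pi_{i+k}\,\pi_{+jk}}{\pi_{++k}}
\end{align*}
```

Each statement holds for all i, j, and k.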
Marginal vs Conditional Independence
•
Partial association can be quite different
from marginal association
•
For further illustration, we now see that
conditional independence of X and Y,
given Z, does not imply marginal
independence of X and Y
•
The joint probabilities in Table 5.5 show a
hypothetical relationship among three
variables.
Table 5.5 Joint Probability
Major                    Gender    Income
                                   Low     High
Liberal Art              Female    0.18    0.12
                         Male      0.12    0.08
Science or Engineering   Female    0.02    0.08
                         Male      0.08    0.32
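The association between Y = income at first
job (high, low) and X = gender (female, male)
at the two levels of Z = major discipline (liberal
art, science or engineering) is described by
the odds ratios
$$\theta_{\text{liberal art}} = \frac{0.18 \times 0.08}{0.12 \times 0.12} = 1.0, \qquad \theta_{\text{science}} = \frac{0.02 \times 0.32}{0.08 \times 0.08} = 1.0$$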
Income and gender are conditionally
independent, given major
Marginal Probability of Y and X
Gender    Income
          Low               High
Female    0.18+0.02=0.20    0.12+0.08=0.20
Male      0.12+0.08=0.20    0.08+0.32=0.40
Total     0.40              0.60
The odds ratio for the (income,
gender) marginal table is
$$\theta = \frac{0.20 \times 0.40}{0.20 \times 0.20} = 2$$
The variables are not independent
when we ignore major
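The arithmetic for this example can be checked numerically. The sketch below stores the joint probabilities as a 2x2x2 array and computes the conditional and marginal odds ratios. The science or engineering cells are the values implied by the sums in the marginal table. The array layout and the helper `odds_ratio` are assumptions made for this illustration.

```python
import numpy as np

# pi[k, i, j]: k = major (0 = liberal art, 1 = science or engineering),
# i = gender (0 = female, 1 = male), j = income (0 = low, 1 = high).
pi = np.array([
    [[0.18, 0.12],
     [0.12, 0.08]],
    [[0.02, 0.08],
     [0.08, 0.32]],
])

def odds_ratio(t):
    """Odds ratio of a 2x2 array: (t00 * t11) / (t01 * t10)."""
    return (t[0, 0] * t[1, 1]) / (t[0, 1] * t[1, 0])

# Conditional (partial) odds ratios: one per level of major.
conditional = [odds_ratio(pi[k]) for k in range(2)]

# Marginal odds ratio: sum over major, then compute the odds ratio.
marginal = odds_ratio(pi.sum(axis=0))

print(conditional, marginal)
```

Up to floating-point rounding, both conditional odds ratios equal 1 while the marginal odds ratio equals 2, which is the contrast the example is meant to show.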
•
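Suppose Y is jointly independent of X and
Z, so
$$\pi_{ijk} = \pi_{i+k}\,\pi_{+j+}$$
Then
$$\pi_{ij|k} = \frac{\pi_{ijk}}{\pi_{++k}} = \frac{\pi_{i+k}\,\pi_{+j+}}{\pi_{++k}} = \pi_{i+|k}\,\pi_{+j+}$$
And summing both sides over i we obtain
$$\pi_{+j|k} = \pi_{+j+}$$
Therefore
$$\pi_{ij|k} = \pi_{i+|k}\,\pi_{+j|k}$$
So X and Y are also conditionally independent.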
In summary, mutual independence of the variables
implies that Y is jointly independent of X and Z,
which itself implies that X and Y are conditionally
independent.
Suppose Y is jointly independent of X and Z, that
is, $\pi_{ijk} = \pi_{i+k}\,\pi_{+j+}$.
Summing over k on both sides, we obtain
$$\pi_{ij+} = \pi_{i++}\,\pi_{+j+}$$
Thus, X and Y also exhibit marginal independence.
So, joint independence of Y from X and Z (or X
from Y and Z) implies X and Y are both
marginally and conditionally independent.
Since mutual independence of X, Y and Z implies
that Y is jointly independent of X and Z, mutual
independence also implies that X and Y are
both marginally and conditionally independent.
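The implication chain can also be checked numerically. The sketch below builds a joint distribution in which Y is jointly independent of X and Z, using arbitrary randomly generated margins (an assumption for illustration, not data from the text), and confirms that X and Y are then both marginally and conditionally independent.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K = 2, 3, 2

# Arbitrary XZ joint margin pi_{i+k} and Y margin pi_{+j+}.
p_xz = rng.random((I, K))
p_xz /= p_xz.sum()
p_y = rng.random(J)
p_y /= p_y.sum()

# Joint independence of Y from (X, Z): pi_{ijk} = pi_{i+k} * pi_{+j+}.
pi = p_xz[:, None, :] * p_y[None, :, None]   # shape (I, J, K)

# Marginal independence: pi_{ij+} = pi_{i++} * pi_{+j+}.
p_xy = pi.sum(axis=2)
marginal_ok = np.allclose(p_xy, np.outer(p_xy.sum(axis=1), p_xy.sum(axis=0)))

# Conditional independence: pi_{ij|k} = pi_{i+|k} * pi_{+j|k} for every k.
cond = pi / pi.sum(axis=(0, 1))              # pi_{ij|k}, broadcast over k
conditional_ok = np.allclose(
    cond,
    cond.sum(axis=1, keepdims=True) * cond.sum(axis=0, keepdims=True),
)

print(marginal_ok, conditional_ok)
```

Both checks should print `True` for any choice of margins, since the result follows from the algebra rather than from the particular numbers.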
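However, when we know only that X and Y are
conditionally independent,
$$\pi_{ijk} = \frac{\pi_{i+k}\,\pi_{+jk}}{\pi_{++k}}$$
Summing over k on both sides, we obtain
$$\pi_{ij+} = \sum_k \frac{\pi_{i+k}\,\pi_{+jk}}{\pi_{++k}}$$
•
All three terms in the summation involve
k, and this does not simplify to marginal
independence.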
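A model that permits all three pairs to be conditionally dependent is
$$\log \mu_{ijk} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ij}^{XY} + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ} \qquad (8.11)$$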
Loglinear Models for Three Dimensions
•
Hierarchical Loglinear Models
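Let $\{\mu_{ijk}\}$ denote expected frequencies.
Suppose all $\mu_{ijk} > 0$ and let $\eta_{ijk} = \log \mu_{ijk}$. A dot in a subscript denotes the average
with respect to that index; for instance,
$$\eta_{i\cdot\cdot} = \frac{1}{JK} \sum_j \sum_k \eta_{ijk}$$
We set
$$\lambda = \eta_{\cdot\cdot\cdot}, \qquad \lambda_i^X = \eta_{i\cdot\cdot} - \eta_{\cdot\cdot\cdot}, \qquad \lambda_{ij}^{XY} = \eta_{ij\cdot} - \eta_{i\cdot\cdot} - \eta_{\cdot j\cdot} + \eta_{\cdot\cdot\cdot},$$
$$\lambda_{ijk}^{XYZ} = \eta_{ijk} - \eta_{ij\cdot} - \eta_{i\cdot k} - \eta_{\cdot jk} + \eta_{i\cdot\cdot} + \eta_{\cdot j\cdot} + \eta_{\cdot\cdot k} - \eta_{\cdot\cdot\cdot},$$
with analogous definitions for the remaining terms.
The sum of parameters for any index
equals zero. That is,
$$\sum_i \lambda_i^X = \sum_j \lambda_j^Y = \sum_k \lambda_k^Z = \sum_i \lambda_{ij}^{XY} = \sum_j \lambda_{ij}^{XY} = \cdots = \sum_k \lambda_{ijk}^{XYZ} = 0$$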
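The general loglinear model for a three-way table is
$$\log \mu_{ijk} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ij}^{XY} + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ} + \lambda_{ijk}^{XYZ} \qquad (8.12)$$
This model has as many parameters as observations and describes all possible positive $\mu_{ijk}$.
Setting certain parameters equal to zero in (8.12) yields the models introduced previously. Table 8.2 lists some of these models. To ease referring to models, each is denoted by a symbol listing its highest-order terms, such as (XY, XZ, YZ).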
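The averaging construction of the parameters can be sketched numerically. The example below takes an arbitrary positive table of expected frequencies (randomly generated, purely as an assumption for illustration), forms each term of the general model from averages of the log expected frequencies, and checks the sum-to-zero constraints. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = rng.uniform(1.0, 50.0, size=(2, 3, 2))   # arbitrary positive mu_ijk
eta = np.log(mu)                               # eta_ijk = log mu_ijk

# Intercept and main effects from averages over the dotted indices.
lam = eta.mean()
lam_x = eta.mean(axis=(1, 2)) - lam
lam_y = eta.mean(axis=(0, 2)) - lam
lam_z = eta.mean(axis=(0, 1)) - lam

# Two-factor terms, e.g. lam_xy[i, j] = eta_ij. - eta_i.. - eta_.j. + eta_...
lam_xy = eta.mean(axis=2) - lam - lam_x[:, None] - lam_y[None, :]
lam_xz = eta.mean(axis=1) - lam - lam_x[:, None] - lam_z[None, :]
lam_yz = eta.mean(axis=0) - lam - lam_y[:, None] - lam_z[None, :]

# Three-factor term: what remains after removing all lower-order terms.
lower = (lam + lam_x[:, None, None] + lam_y[None, :, None] + lam_z[None, None, :]
         + lam_xy[:, :, None] + lam_xz[:, None, :] + lam_yz[None, :, :])
lam_xyz = eta - lower

# Each set of parameters sums to zero over any of its indices.
print(np.allclose(lam_x.sum(), 0.0),
      np.allclose(lam_xy.sum(axis=0), 0.0),
      np.allclose(lam_xyz.sum(axis=2), 0.0))
```

Because the three-factor term is defined as the remainder, the eight groups of terms reproduce the log expected frequencies exactly, which is the sense in which the general model describes every positive table.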
Interpreting Model Parameters
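Interpretations of loglinear model parameters use their highest-order terms.
For instance, interpretations for model (8.11) use the two-factor terms to
describe conditional odds ratios.
At a fixed level k of Z, the conditional association between X and Y
uses (I - 1)(J - 1) odds ratios, such as the local odds ratios
$$\theta_{ij(k)} = \frac{\mu_{ijk}\,\mu_{i+1,j+1,k}}{\mu_{i,j+1,k}\,\mu_{i+1,j,k}}, \qquad 1 \le i \le I-1, \quad 1 \le j \le J-1$$
Similarly, (I - 1)(K - 1) odds ratios $\{\theta_{i(j)k}\}$ describe XZ conditional
association, and (J - 1)(K - 1) odds ratios $\{\theta_{(i)jk}\}$ describe YZ
conditional association.
Loglinear models have characterizations using constraints on
conditional odds ratios. For instance, conditional independence of
X and Y
is equivalent to $\{\theta_{ij(k)} = 1, \; i = 1, \dots, I-1, \; j = 1, \dots, J-1, \; k = 1, \dots, K\}$.
Substituting (8.11) for model (XY, XZ, YZ) into $\log \theta_{ij(k)}$ yields
$$\log \theta_{ij(k)} = \lambda_{ij}^{XY} + \lambda_{i+1,j+1}^{XY} - \lambda_{i,j+1}^{XY} - \lambda_{i+1,j}^{XY}$$
The right-hand side does not depend on k.
Any model not having the three-factor interaction term has a homogeneous
association for each pair of variables.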
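The homogeneity property can be illustrated with a quick numerical check. The sketch below builds log expected frequencies from arbitrary two-factor terms with no three-factor term (the parameter values are random and purely illustrative) and verifies that the local XY log odds ratios are identical at every level of Z and depend only on the XY terms.

```python
import numpy as np

rng = np.random.default_rng(2)
I, J, K = 3, 3, 2

# Arbitrary two-factor terms; the intercept and main effects are set to zero
# here because they cancel in every odds ratio.
lam_xy = rng.normal(size=(I, J))
lam_xz = rng.normal(size=(I, K))
lam_yz = rng.normal(size=(J, K))

# Model (XY, XZ, YZ): no three-factor interaction term.
log_mu = lam_xy[:, :, None] + lam_xz[:, None, :] + lam_yz[None, :, :]

def local_log_odds_ratios(a):
    """a[i, j] + a[i+1, j+1] - a[i, j+1] - a[i+1, j] over the first two axes."""
    return a[:-1, :-1] + a[1:, 1:] - a[:-1, 1:] - a[1:, :-1]

log_theta = local_log_odds_ratios(log_mu)    # shape (I-1, J-1, K)
from_lambda = local_log_odds_ratios(lam_xy)  # shape (I-1, J-1)

# The XZ and YZ terms cancel, so every level k gives the same values.
print(np.allclose(log_theta, from_lambda[:, :, None]))
```

Adding a nonzero three-factor term to `log_mu` would generally break this equality, which is why its absence characterizes homogeneous association.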
Alcohol, Cigarette, and Marijuana Use Example
Table 8.3 refers to a 1992 survey by the Wright State University School of
Medicine and the United Health Services in Dayton, Ohio. The survey asked
2276 students in their final year of high school in a nonurban area near
Dayton, Ohio whether they had ever used alcohol, cigarettes, or marijuana.
Table 8.5 illustrates model association patterns by presenting estimated
conditional and marginal odds ratios.
For example, the entry 1.0 for the AC conditional association for the model (AM, CM) of AC conditional independence is the common value of the AC fitted odds ratios at the two levels of M.
The entry 2.7 for the AC marginal association for this model is the odds ratio
for the marginal AC fitted table.
Table 8.5 shows that estimated conditional odds ratios equal 1.0 for each
pairwise term not appearing in a model, such as the AC
association in model (AM, CM).
For that model, the estimated marginal AC odds ratio differs from 1.0, since conditional independence does not imply marginal
independence.
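The pattern described here can be reproduced on a toy table. The sketch below uses hypothetical counts (not the survey data of Table 8.3, which is not reproduced in this text) and the closed-form fitted values for a conditional independence model of the form (AM, CM), namely the product of the AM and CM margins divided by the M margin.

```python
import numpy as np

# Hypothetical 2x2x2 counts n[a, c, m]; NOT the student survey data.
n = np.array([
    [[40.0, 30.0], [10.0, 25.0]],
    [[5.0, 20.0], [8.0, 60.0]],
])

n_am = n.sum(axis=1)        # n_{a+m}, shape (A, M)
n_cm = n.sum(axis=0)        # n_{+cm}, shape (C, M)
n_m = n.sum(axis=(0, 1))    # n_{++m}, shape (M,)

# Fitted values for model (AM, CM): n_{a+m} * n_{+cm} / n_{++m}.
fitted = n_am[:, None, :] * n_cm[None, :, :] / n_m

def odds_ratio(t):
    """Odds ratio of a 2x2 array: (t00 * t11) / (t01 * t10)."""
    return (t[0, 0] * t[1, 1]) / (t[0, 1] * t[1, 0])

# AC conditional odds ratios at each level of M, and the AC marginal one.
conditional = [odds_ratio(fitted[:, :, m]) for m in range(2)]
marginal = odds_ratio(fitted.sum(axis=2))

print(conditional, marginal)
```

The fitted conditional odds ratios equal 1 at both levels of M, while the marginal odds ratio of the fitted table does not, mirroring the 1.0 and 2.7 entries discussed for the survey data.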
Model (AC, AM, CM) permits all pairwise associations but maintains
homogeneous odds ratios between two variables at each level of the third.
The AC fitted conditional odds ratios for this model
INFERENCE FOR LOGLINEAR MODELS
Chi-Squared Goodness-of-Fit Tests
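As usual, $X^2$ and $G^2$ test whether a model holds by comparing cell
fitted
values to observed counts:
$$X^2 = \sum_i \sum_j \sum_k \frac{(n_{ijk} - \hat{\mu}_{ijk})^2}{\hat{\mu}_{ijk}}, \qquad G^2 = 2 \sum_i \sum_j \sum_k n_{ijk} \log \frac{n_{ijk}}{\hat{\mu}_{ijk}}$$
where $n_{ijk}$ = observed frequency and $\hat{\mu}_{ijk}$ = expected frequency. Here df equals the number of cell counts minus the number of model parameters.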
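A minimal sketch of these two statistics follows. It uses hypothetical counts (not the survey data) and the mutual independence model, chosen only because its fitted values have a simple closed form, the product of the three one-way margins divided by the squared total. No P-value is computed, to keep the example free of extra dependencies.

```python
import numpy as np

# Hypothetical 2x2x2 counts n[i, j, k]; NOT the student survey data.
n = np.array([
    [[40.0, 30.0], [10.0, 25.0]],
    [[5.0, 20.0], [8.0, 60.0]],
])
I, J, K = n.shape
total = n.sum()

# One-way margins n_{i++}, n_{+j+}, n_{++k}.
n_x = n.sum(axis=(1, 2))
n_y = n.sum(axis=(0, 2))
n_z = n.sum(axis=(0, 1))

# Fitted values under mutual independence.
fitted = n_x[:, None, None] * n_y[None, :, None] * n_z[None, None, :] / total**2

# Pearson and likelihood-ratio statistics (all counts here are positive).
X2 = ((n - fitted) ** 2 / fitted).sum()
G2 = 2.0 * (n * np.log(n / fitted)).sum()

# df = number of cells minus number of model parameters
# (intercept plus I-1, J-1, and K-1 main-effect parameters).
n_params = 1 + (I - 1) + (J - 1) + (K - 1)
df = n.size - n_params

print(X2, G2, df)
```

Large values of the statistics relative to a chi-squared distribution with `df` degrees of freedom indicate lack of fit. Models with association terms generally require iterative fitting rather than a closed form.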
For the student survey (Table 8.3), Table 8.6 shows results of testing fit for
several loglinear models.
Models that lack any association term fit poorly.
The model (AC, AM, CM) that has all pairwise associations fits well (P =
0.54).
It is suggested by other criteria also, such as minimizing
AIC = -2(maximized log likelihood - number of parameters in model).