• Consider an I×J contingency table that
cross-classifies a multinomial sample of n subjects on two categorical responses.
• The cell probabilities are $\{\pi_{ij}\}$ and the expected frequencies are $\{\mu_{ij} = n\pi_{ij}\}$.
Independence Model
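Under statistical independence, $\pi_{ij} = \pi_{i+}\pi_{+j}$ for all i and j.
For multinomial sampling, the expected frequencies are then $\mu_{ij} = n\pi_{i+}\pi_{+j}$.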
Denote the row variable by X and the
column variable by Y.
Thus $\log\mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y$
for a row effect $\lambda_i^X$ and a column effect $\lambda_j^Y$.
This is the loglinear model of
independence.
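The additive structure on the log scale can be checked numerically. The sketch below uses the 2×2 Sex by Party counts from the example later in this section, computes fitted values under independence as (row total × column total) / n, and then splits their logs into an intercept, row effects, and column effects. Taking the last row and last column as baselines is an assumed identifiability constraint, and the variable names are illustrative only.

```python
import math

# Observed 2x2 counts (rows: Male, Female; columns: Democrat, Republican).
counts = [[222, 115],
          [240, 185]]

n = sum(sum(row) for row in counts)
row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]

# Fitted values under independence: mu_ij = n_i+ * n_+j / n.
mu = [[r * c / n for c in col_totals] for r in row_totals]

# Additive decomposition on the log scale, with the last row and the last
# column taken as baselines (an assumed identifiability constraint).
I, J = len(row_totals), len(col_totals)
lam = math.log(mu[I - 1][J - 1])
lam_x = [math.log(mu[i][J - 1]) - lam for i in range(I)]
lam_y = [math.log(mu[I - 1][j]) - lam for j in range(J)]

# Largest gap between log(mu_ij) and lam + lam_x[i] + lam_y[j];
# it should be zero up to rounding error.
max_gap = max(abs(math.log(mu[i][j]) - (lam + lam_x[i] + lam_y[j]))
              for i in range(I) for j in range(J))
print(max_gap)
```

A different constraint, such as parameters summing to zero, would change the individual parameter values but not the fitted values.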
As usual, identifiability requires constraints such as $\lambda_I^X = \lambda_J^Y = 0$.
• The tests using $X^2$ and $G^2$ are also
goodness-of-fit tests of this loglinear model.
• Loglinear models for contingency tables are
GLMs that treat the N cell counts as
independent observations of a Poisson random component.
• Loglinear GLMs identify the data as the N cell
counts rather than the individual classifications of the n subjects.
• The expected cell counts link to the explanatory terms through the log link.
• The model does not distinguish
between response and explanatory variables.
• It treats both jointly as responses,
modeling $\mu_{ij}$ for combinations of their
levels.
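• To interpret parameters, however, it is helpful to treat the variables asymmetrically.
• We illustrate with the independence
model for I×2 tables. Substituting the independence model $\log\mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y$ gives
$\text{logit}[P(Y=1 \mid X=i)] = \log(\mu_{i1}/\mu_{i2}) = \log\mu_{i1} - \log\mu_{i2} = (\lambda + \lambda_i^X + \lambda_1^Y) - (\lambda + \lambda_i^X + \lambda_2^Y) = \lambda_1^Y - \lambda_2^Y$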
• The final term does not depend on i;
that is, logit[P(Y=1 | X=i)] is identical
at each level of X.
• Thus, independence implies a model of the form $\text{logit}[P(Y=1 \mid X=i)] = \alpha$.
An analogous property holds when
J>2.
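The constant-logit property can be verified on a small table. The sketch below reuses the 2×2 Sex by Party counts from the example later in this section as an I×2 table with I = 2, fits the independence model, and computes the log odds of column 1 versus column 2 within each row. The variable names are illustrative.

```python
import math

# Observed counts (rows: Male, Female; columns: Democrat, Republican).
counts = [[222, 115],
          [240, 185]]

n = sum(sum(row) for row in counts)
row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]

# Fitted values under independence.
mu = [[r * c / n for c in col_totals] for r in row_totals]

# Log odds of column 1 versus column 2 within each row of the fitted table.
logits = [math.log(row[0] / row[1]) for row in mu]

# Under independence every row logit equals the log odds of the column margins.
marginal_logit = math.log(col_totals[0] / col_totals[1])
print(logits, marginal_logit)
```

Every entry of `logits` matches `marginal_logit`, because the row totals cancel in the ratio of fitted values.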
• Differences between two parameters
for a given variable relate to the log odds of making one response, relative to the other, on that variable.
Saturated Model
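Statistically dependent variables satisfy a more complex loglinear model:
$\log\mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_{ij}^{XY}$
The $\{\lambda_{ij}^{XY}\}$ are association terms that reflect deviations from independence.
The $\{\lambda_{ij}^{XY}\}$ represent interactions between X
and Y, whereby the effect of one variable on $\mu_{ij}$ depends on the level of the other.
Direct relationships exist between log odds ratios and the $\{\lambda_{ij}^{XY}\}$.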
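As a worked illustration of that relationship, take a 2×2 table and assume the saturated model has the standard form $\log\mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_{ij}^{XY}$. The intercept and the main effects cancel in the log odds ratio $\log\theta$, leaving only association terms:

```latex
\begin{align*}
\log\theta
  &= \log\frac{\mu_{11}\,\mu_{22}}{\mu_{12}\,\mu_{21}} \\
  &= \log\mu_{11} + \log\mu_{22} - \log\mu_{12} - \log\mu_{21} \\
  &= \lambda_{11}^{XY} + \lambda_{22}^{XY} - \lambda_{12}^{XY} - \lambda_{21}^{XY}.
\end{align*}
```

How this simplifies depends on the identifiability constraints. With last-level parameters set to zero it reduces to $\log\theta = \lambda_{11}^{XY}$; with zero-sum constraints it becomes $\log\theta = 4\lambda_{11}^{XY}$.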
Parameter Estimation
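Let $\{\mu_{ij}\}$ denote expected frequencies.
Suppose all $\mu_{ij} > 0$ and let $\eta_{ij} = \log\mu_{ij}$.
A dot in a subscript denotes the
average with respect to that index; for instance, $\eta_{i\cdot} = \frac{1}{J}\sum_j \eta_{ij}$.
We set
$\lambda = \eta_{\cdot\cdot}$, $\lambda_i^X = \eta_{i\cdot} - \eta_{\cdot\cdot}$, $\lambda_j^Y = \eta_{\cdot j} - \eta_{\cdot\cdot}$, $\lambda_{ij}^{XY} = \eta_{ij} - \eta_{i\cdot} - \eta_{\cdot j} + \eta_{\cdot\cdot}$
The sum of parameters for any index equals zero. That is, $\sum_i \lambda_i^X = \sum_j \lambda_j^Y = \sum_i \lambda_{ij}^{XY} = \sum_j \lambda_{ij}^{XY} = 0$.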
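The averaging construction can be sketched in a few lines. The example below assumes the usual zero-sum definitions (intercept = overall mean of the log expected frequencies, main effects = row or column mean minus overall mean, association = residual). It uses the observed Sex by Party counts from the example in this section as the expected frequencies, which is what the saturated model fits. All names are illustrative.

```python
import math

# Saturated model: fitted values equal the observed counts.
mu = [[222.0, 115.0],
      [240.0, 185.0]]
eta = [[math.log(m) for m in row] for row in mu]
I, J = len(eta), len(eta[0])

# Averages over the dotted index.
eta_row = [sum(eta[i]) / J for i in range(I)]                       # eta_i.
eta_col = [sum(eta[i][j] for i in range(I)) / I for j in range(J)]  # eta_.j
eta_all = sum(eta_row) / I                                          # eta_..

# Zero-sum parameterization.
lam = eta_all
lam_x = [eta_row[i] - eta_all for i in range(I)]
lam_y = [eta_col[j] - eta_all for j in range(J)]
lam_xy = [[eta[i][j] - eta_row[i] - eta_col[j] + eta_all for j in range(J)]
          for i in range(I)]

# The parameters reproduce every log expected frequency exactly.
rebuilt = [[lam + lam_x[i] + lam_y[j] + lam_xy[i][j] for j in range(J)]
           for i in range(I)]
print(lam, lam_x, lam_y, lam_xy)
```

For a 2×2 table under these constraints, the log odds ratio equals four times `lam_xy[0][0]`.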
INFERENCE FOR LOGLINEAR MODELS
Chi-Squared Goodness-of-Fit Tests
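• As usual, $X^2$ and $G^2$ test whether a model holds
by comparing cell fitted values to observed counts:
$X^2 = \sum \frac{(n_{ijk} - \hat\mu_{ijk})^2}{\hat\mu_{ijk}}$, $G^2 = 2\sum n_{ijk}\log\frac{n_{ijk}}{\hat\mu_{ijk}}$
• Here $n_{ijk}$ = observed frequency and $\hat\mu_{ijk}$ = expected
frequency (for a two-way table, drop the index k). The df equals the number of cell counts minus the number of model parameters.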
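As a sketch of the computation, the code below evaluates the Pearson and likelihood-ratio statistics for the independence model on the 2×2 Sex by Party counts used in the example in this section. It counts the independence model's non-redundant parameters as 1 + (I − 1) + (J − 1), which assumes one intercept plus identifiable row and column effects. The variable names are illustrative.

```python
import math

# Observed counts (rows: Male, Female; columns: Democrat, Republican).
counts = [[222, 115],
          [240, 185]]
I, J = len(counts), len(counts[0])

n = sum(sum(row) for row in counts)
row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]

# Fitted values under the independence model.
fitted = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson statistic: sum of (observed - fitted)^2 / fitted.
x2 = sum((counts[i][j] - fitted[i][j]) ** 2 / fitted[i][j]
         for i in range(I) for j in range(J))

# Likelihood-ratio statistic: 2 * sum of observed * log(observed / fitted).
g2 = 2 * sum(counts[i][j] * math.log(counts[i][j] / fitted[i][j])
             for i in range(I) for j in range(J))

# df = number of cells minus number of non-redundant model parameters.
n_params = 1 + (I - 1) + (J - 1)
df = I * J - n_params
print(x2, g2, df)
```

Both statistics can be compared with a chi-squared critical value on `df` degrees of freedom, for example 3.84 at the 0.05 level when df = 1.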
Example for Saturated Model

Observed counts, with expected frequencies under independence in parentheses:

Sex       Party: Democrat   Party: Republican   Total
Male      222 (204.32)      115 (132.68)        337
Female    240 (257.68)      185 (167.32)        425
Total     462               300                 762

Natural logs of the expected frequencies:

Sex       Party: Democrat      Party: Republican    Total
Male      log(204.32) = 5.32   log(132.68) = 4.89   10.21
Female    log(257.68) = 5.55   log(167.32) = 5.12   10.67

Back-transformed values for the four cells: exp(...) = 204.38, exp(...) = 132.95, exp(...) = 257.24, exp(...) = 167.34
The full (saturated) model is not appropriate.