Contingency Tables
1. Explain the χ² Test of Independence
Contingency Tables
• Tables representing all combinations of levels of explanatory and response variables
• Numbers in the table represent counts of the number of cases in each cell
• Row and column totals are called marginal counts
2x2 Tables
• Each variable has 2 levels
– Explanatory Variable – Groups (typically based on demographics, exposure)
2x2 Tables - Notation
                 Outcome Present   Outcome Absent   Group Total
Group 1          n11               n12              n1.
Group 2          n21               n22              n2.
Outcome Total    n.1               n.2              n..
χ² Test of Independence
1. Shows if a relationship exists between 2 qualitative variables
– One sample is drawn
– Does not show causality
2. Assumptions
– Multinomial experiment
– All expected counts ≥ 5
χ² Test of Independence
Contingency Table
1. Shows # observations from 1 sample jointly in 2 qualitative variables
(Diagram label: levels of variable 2)
χ² Test of Independence
Hypotheses & Statistic
1. Hypotheses
– H0: Variables are independent
– Ha: Variables are related (dependent)
2. Test statistic
\[ \chi^2 = \sum_{\text{all cells}} \frac{\left[ n_{ij} - E(n_{ij}) \right]^2}{E(n_{ij})} \]
where n_ij is the observed count and E(n_ij) is the expected count in row i, column j
• Degrees of freedom: (r - 1)(c - 1), where r = number of rows and c = number of columns
χ² Test of Independence
Expected Counts
1. Statistical independence means joint probability equals product of marginal probabilities
2. Compute marginal probabilities & multiply for joint probability
3. Expected count is sample size times joint probability
Expected Count Example
                     Location
House Style      Urban (Obs.)   Rural (Obs.)   Total
Split-Level      63             49             112
Ranch            15             33             48
Total            78             82             160
Marginal probability (Split-Level) = 112/160
Marginal probability (Urban) = 78/160
Joint probability = (112/160)(78/160)
Expected count = 160·(112/160)(78/160) = 54.6
Expected Count Calculation
\[ \text{Expected count} = \frac{(\text{Row total})(\text{Column total})}{\text{Sample size}} \]
                     Location
House Style      Urban (Exp.)   Rural (Exp.)
Split-Level      112·78/160     112·82/160
Ranch            48·78/160      48·82/160
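The row-total times column-total rule can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function name `expected_counts` and the variable names are assumptions chosen for this example. It uses the house-style counts from the table above.

```python
def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Observed counts from the house-style example
# (rows: Split-Level, Ranch; columns: Urban, Rural).
house = [[63, 49], [15, 33]]
expected = expected_counts(house)

for label, row in zip(["Split-Level", "Ranch"], expected):
    print(label, [round(e, 1) for e in row])
```

Each expected row sums to the observed row total, which is a quick sanity check on the arithmetic.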
χ² Test of Independence
• You’re a marketing research analyst. You ask a random sample of 286 consumers if they purchase Diet Pepsi or Diet Coke. At the .05 level, is there evidence of a relationship?
                 Diet Pepsi
Diet Coke     No      Yes     Total
No            84      32      116
Yes           48      122     170
Total         132     154     286
χ² Test of Independence
Solution
• H0: No relationship
• Ha: Relationship
• α = .05
• df = (2 - 1)(2 - 1) = 1
• Critical value(s): reject H0 if χ² > 3.841
Expected counts: E(n_ij) ≥ 5 in all cells
                 Diet Pepsi
Diet Coke     No                       Yes
No            116·132/286 ≈ 53.5       116·154/286 ≈ 62.5
Yes           170·132/286 ≈ 78.5       170·154/286 ≈ 91.5
χ² Test of Independence
Test statistic:
\[ \chi^2 = \sum_{\text{all cells}} \frac{\left[ n_{ij} - E(n_{ij}) \right]^2}{E(n_{ij})} = \frac{\left[ n_{11} - E(n_{11}) \right]^2}{E(n_{11})} + \frac{\left[ n_{12} - E(n_{12}) \right]^2}{E(n_{12})} + \cdots + \frac{\left[ n_{22} - E(n_{22}) \right]^2}{E(n_{22})} \]
\[ \chi^2 = \frac{[84 - 53.5]^2}{53.5} + \frac{[32 - 62.5]^2}{62.5} + \cdots + \frac{[122 - 91.5]^2}{91.5} = 54.29 \]
χ² Test of Independence
Solution
• H0: No relationship
• Ha: Relationship
• α = .05
• df = (2 - 1)(2 - 1) = 1
• Critical value(s): reject H0 if χ² > 3.841
Test Statistic: χ² = 54.29
Decision: Reject at α = .05
Conclusion: There is evidence of a relationship
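The whole solution can be reproduced with a short script. The sketch below is an illustration under stated assumptions, not the slides' own code: the function name `chi_square_independence` is hypothetical, the critical value 3.841 is taken from the solution above, and the p-value line relies on the identity that holds only for df = 1 (a χ² variable with 1 degree of freedom is a squared standard normal). Because it uses unrounded expected counts, the statistic comes out near 54.15 rather than the 54.29 obtained from expected counts rounded to one decimal.

```python
import math

def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / n                      # expected count under H0
            stat += (observed[i][j] - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

# Rows: Diet Coke No/Yes; columns: Diet Pepsi No/Yes.
diet = [[84, 32], [48, 122]]
stat, df = chi_square_independence(diet)

CRITICAL_05_DF1 = 3.841                        # from the solution slide
reject_h0 = stat > CRITICAL_05_DF1
# Upper-tail probability; this erfc shortcut is valid only when df == 1.
p_value = math.erfc(math.sqrt(stat / 2))

print(f"chi2 = {stat:.2f}, df = {df}, reject H0: {reject_h0}")
```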
Siskel and Ebert
(first line of each cell: observed count; second line: expected count)
           |        Ebert         |
    Siskel |    Con    Mix    Pro |  Total
-----------+----------------------+-------
       Con |     24      8     13 |     45
           |   11.8    8.4   24.8 |   45.0
-----------+----------------------+-------
       Mix |      8     13     11 |     32
           |    8.4    6.0   17.6 |   32.0
-----------+----------------------+-------
       Pro |     10      9     64 |     83
           |   21.8   15.6   45.6 |   83.0
-----------+----------------------+-------
     Total |     42     30     88 |    160
           |   42.0   30.0   88.0 |  160.0
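For a table larger than 2x2 it can help to see which cells contribute most to the χ² statistic. The following is a minimal sketch using the Siskel and Ebert counts above; the helper name `cell_contributions` and the label list are assumptions made for this example, and 9.488 is the standard .05 critical value for df = 4.

```python
def cell_contributions(observed):
    """Return (expected counts, per-cell chi-square contributions)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    contrib = [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(observed, expected)
    ]
    return expected, contrib

labels = ["Con", "Mix", "Pro"]
# Rows: Siskel's rating; columns: Ebert's rating.
ratings = [[24, 8, 13], [8, 13, 11], [10, 9, 64]]

expected, contrib = cell_contributions(ratings)
chi2 = sum(sum(row) for row in contrib)
df = (len(ratings) - 1) * (len(ratings[0]) - 1)

# Locate the cell with the largest contribution to the statistic.
value, i, j = max(
    (contrib[i][j], i, j) for i in range(3) for j in range(3)
)
largest_cell = (labels[i], labels[j])

print(f"chi2 = {chi2:.2f} on {df} df; exceeds 9.488: {chi2 > 9.488}")
print("largest contribution from cell", largest_cell)
```

The agreement cells on the diagonal hold more cases than independence would predict, which is what drives the large statistic.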
Yates’ Statistic
• Method of testing for association for 2x2 tables when the sample size is moderate (total observations between 6 and 25)
\[ \chi^2 = \sum_{i} \sum_{j} \frac{\left( \left| O_{ij} - e_{ij} \right| - 0.5 \right)^2}{e_{ij}} \]
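A small sketch of the continuity-corrected statistic follows. The 2x2 counts here are hypothetical (24 observations, invented purely to show the mechanics), and the function name `yates_chi_square` is an assumption for this example. The uncorrected Pearson statistic is computed alongside it for comparison.

```python
def yates_chi_square(observed, correction=0.5):
    """Chi-square for a 2x2 table with |O - e| reduced by `correction`."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / n
            stat += (abs(observed[i][j] - e) - correction) ** 2 / e
    return stat

# Hypothetical 2x2 table with 24 observations (not from the slides).
small_table = [[9, 3], [3, 9]]

yates = yates_chi_square(small_table)
pearson = yates_chi_square(small_table, correction=0.0)
print(f"Yates: {yates:.3f}, uncorrected: {pearson:.3f}")
```

The correction shrinks every |O - e| by 0.5, so the corrected statistic is never larger than the uncorrected one when all deviations are at least 0.5.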
Measures of association
– Relative Risk
– Odds Ratio
Relative Risk
• Ratio of the probability that the outcome characteristic is present for one group, relative to the other
• Sample proportions with characteristic from groups 1 and 2:
\[ \hat{\pi}_1 = \frac{n_{11}}{n_{1.}} \qquad \hat{\pi}_2 = \frac{n_{21}}{n_{2.}} \]
Relative Risk
• Estimated Relative Risk:
\[ RR = \frac{\hat{\pi}_1}{\hat{\pi}_2} \]
• 95% Confidence Interval for Population Relative Risk:
\[ \left( RR \cdot e^{-1.96\sqrt{v}},\; RR \cdot e^{1.96\sqrt{v}} \right) \qquad e \approx 2.71828 \qquad v = \frac{1 - \hat{\pi}_1}{n_{11}} + \frac{1 - \hat{\pi}_2}{n_{21}} \]
Relative Risk
• Interpretation
– Conclude that the probability that the outcome is present is higher (in the population) for group 1 if the entire interval is above 1
– Conclude that the probability that the outcome is present is lower (in the population) for group 1 if the entire interval is below 1
– Do not conclude that the probability of the outcome differs for the two groups if the interval contains 1
Example - Coccidioidomycosis and TNF-antagonists
• Research Question: Is there a risk of developing Coccidioidomycosis associated with arthritis therapy?
• Groups: Patients receiving tumor necrosis factor (TNF) versus patients not receiving TNF (all patients arthritic)
           COC    No COC    Total
TNF        7      240       247
Other      4      734       738
Total      11     974       985
Example - Coccidioidomycosis and TNF-antagonists
• Group 1: Patients on TNF
• Group 2: Patients not on TNF
\[ \hat{\pi}_1 = \frac{7}{247} = .0283 \qquad \hat{\pi}_2 = \frac{4}{738} = .0054 \]
\[ RR = \frac{\hat{\pi}_1}{\hat{\pi}_2} = \frac{.0283}{.0054} = 5.24 \qquad v = \frac{1 - .0283}{7} + \frac{1 - .0054}{4} = .3874 \]
\[ 95\%\ CI: \left( 5.24\, e^{-1.96\sqrt{.3874}},\; 5.24\, e^{1.96\sqrt{.3874}} \right) = (1.55,\; 17.76) \]
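The relative risk interval above can be checked numerically. This is a hedged sketch: the function name `relative_risk_ci` and its argument names are assumptions for this example, and the formula is the one reconstructed from the slides. Because the script keeps full precision instead of rounding the proportions to four decimals, it gives roughly RR = 5.23 with an interval near (1.54, 17.71), slightly different from the hand-rounded (1.55, 17.76).

```python
import math

def relative_risk_ci(n11, n1_total, n21, n2_total, z=1.96):
    """Relative risk of group 1 vs group 2 with a large-sample CI."""
    p1 = n11 / n1_total            # sample proportion with outcome, group 1
    p2 = n21 / n2_total            # sample proportion with outcome, group 2
    rr = p1 / p2
    v = (1 - p1) / n11 + (1 - p2) / n21
    half_width = z * math.sqrt(v)  # half-width on the log scale
    return rr, rr * math.exp(-half_width), rr * math.exp(half_width)

# Group 1: on TNF (7 of 247 with COC); group 2: not on TNF (4 of 738).
rr, rr_low, rr_high = relative_risk_ci(7, 247, 4, 738)
print(f"RR = {rr:.2f}, 95% CI = ({rr_low:.2f}, {rr_high:.2f})")

# The interval lies entirely above 1 when the lower bound exceeds 1.
higher_for_group1 = rr_low > 1
```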
Odds Ratio
• Odds of an event is the probability it occurs divided by the probability it does not occur
• Odds ratio is the odds of the event for group 1 divided by the odds of the event for group 2
• Sample odds of the outcome for each group:
\[ odds_1 = \frac{n_{11}/n_{1.}}{n_{12}/n_{1.}} = \frac{n_{11}}{n_{12}} \qquad odds_2 = \frac{n_{21}}{n_{22}} \]
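Odds Ratio
• Estimated Odds Ratio:
\[ OR = \frac{odds_1}{odds_2} = \frac{n_{11}/n_{12}}{n_{21}/n_{22}} = \frac{n_{11} n_{22}}{n_{12} n_{21}} \]
• 95% Confidence Interval for Population Odds Ratio:
\[ \left( OR \cdot e^{-1.96\sqrt{v}},\; OR \cdot e^{1.96\sqrt{v}} \right) \qquad e \approx 2.71828 \qquad v = \frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}} \]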
Odds Ratio
• Interpretation
– Conclude that the probability that the outcome is present is higher (in the population) for group 1 if the entire interval is above 1
– Conclude that the probability that the outcome is present is lower (in the population) for group 1 if the entire interval is below 1
– Do not conclude that the probability of the outcome differs for the two groups if the interval contains 1
Example - NSAIDs and GBM
• Case-Control Study (Retrospective)
– Cases: 137 self-reporting patients with Glioblastoma Multiforme (GBM)
– Controls: 401 population-based individuals matched to cases with respect to demographic factors
                   GBM Present   GBM Absent   Total
NSAID User         32            138          170
NSAID Non-User     105           263          368
Total              137           401          538
Example - NSAIDs and GBM
\[ OR = \frac{32(263)}{138(105)} = \frac{8416}{14490} = 0.58 \qquad v = \frac{1}{32} + \frac{1}{138} + \frac{1}{105} + \frac{1}{263} = 0.0518 \]
\[ 95\%\ CI: \left( 0.58\, e^{-1.96\sqrt{0.0518}},\; 0.58\, e^{1.96\sqrt{0.0518}} \right) = (0.37,\; 0.91) \]
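The odds ratio calculation follows the same pattern and can be sketched as below. The function name `odds_ratio_ci` is an assumption for this example; the inputs are the four cell counts from the NSAID table.

```python
import math

def odds_ratio_ci(n11, n12, n21, n22, z=1.96):
    """Sample odds ratio (group 1 vs group 2) with a large-sample CI."""
    odds_ratio = (n11 * n22) / (n12 * n21)
    v = 1 / n11 + 1 / n12 + 1 / n21 + 1 / n22
    half_width = z * math.sqrt(v)  # half-width on the log scale
    return (
        odds_ratio,
        odds_ratio * math.exp(-half_width),
        odds_ratio * math.exp(half_width),
    )

# NSAID users: 32 GBM present, 138 absent; non-users: 105 present, 263 absent.
odds_ratio, or_low, or_high = odds_ratio_ci(32, 138, 105, 263)
print(f"OR = {odds_ratio:.2f}, 95% CI = ({or_low:.2f}, {or_high:.2f})")

# The interval lies entirely below 1 when the upper bound is below 1.
lower_for_group1 = or_high < 1
```

The odds ratio is the natural measure here because a case-control design fixes the numbers of cases and controls, so the outcome proportions within each exposure group are not directly estimable.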
Absolute Risk
• Difference between the proportions with an outcome characteristic for 2 groups
• Sample proportions with characteristic from groups 1 and 2:
\[ \hat{\pi}_1 = \frac{n_{11}}{n_{1.}} \qquad \hat{\pi}_2 = \frac{n_{21}}{n_{2.}} \]
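Absolute Risk
• Estimated Absolute Risk:
\[ AR = \hat{\pi}_1 - \hat{\pi}_2 \]
• 95% Confidence Interval for Population Absolute Risk:
\[ AR \pm 1.96 \sqrt{\frac{\hat{\pi}_1 (1 - \hat{\pi}_1)}{n_{1.}} + \frac{\hat{\pi}_2 (1 - \hat{\pi}_2)}{n_{2.}}} \]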
Absolute Risk
• Interpretation
– Conclude that the probability that the outcome is present is higher (in the population) for group 1 if the entire interval is positive
– Conclude that the probability that the outcome is present is lower (in the population) for group 1 if the entire interval is negative
– Do not conclude that the probability of the outcome differs for the two groups if the interval contains 0
Example - Coccidioidomycosis and TNF-antagonists
• Group 1: Patients on TNF
• Group 2: Patients not on TNF
\[ \hat{\pi}_1 = \frac{7}{247} = .0283 \qquad \hat{\pi}_2 = \frac{4}{738} = .0054 \qquad AR = .0283 - .0054 = .0229 \]
\[ 95\%\ CI: .0229 \pm 1.96 \sqrt{\frac{.0283(.9717)}{247} + \frac{.0054(.9946)}{738}} = .0229 \pm .0213 = (0.0016,\; 0.0442) \]
Interval is entirely positive, TNF is
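A short script can confirm the absolute risk interval for the TNF data. This is a sketch under assumptions: the function name `risk_difference_ci` is hypothetical, and the standard error is the usual large-sample form for a difference of two independent proportions.

```python
import math

def risk_difference_ci(n11, n1_total, n21, n2_total, z=1.96):
    """Difference in outcome proportions (group 1 minus group 2) with CI."""
    p1 = n11 / n1_total
    p2 = n21 / n2_total
    ar = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1_total + p2 * (1 - p2) / n2_total)
    return ar, ar - z * se, ar + z * se

# Group 1: on TNF (7 of 247 with COC); group 2: not on TNF (4 of 738).
ar, ar_low, ar_high = risk_difference_ci(7, 247, 4, 738)
print(f"AR = {ar:.4f}, 95% CI = ({ar_low:.4f}, {ar_high:.4f})")

# Both bounds positive means the interval excludes 0.
entirely_positive = ar_low > 0
```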
Ordinal Explanatory and Response Variables
• Pearson’s Chi-square test can be used to test associations among ordinal variables, but more powerful methods exist
• When theories exist that the association is directional (positive or negative), measures exist to describe and test for these specific alternatives from independence:
– Gamma
Concordant and Discordant Pairs
• Concordant Pairs - Pairs of individuals where one individual scores “higher” on both ordered variables than the other individual
• Discordant Pairs - Pairs of individuals where one individual scores “higher” on one ordered variable and “lower” on the other variable than the other individual
• C = # Concordant Pairs, D = # Discordant Pairs
– Under positive association, expect C > D
– Under negative association, expect C < D
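The definitions above can be illustrated by classifying every pair of individuals directly. In this sketch the five (x, y) scores are hypothetical values invented for illustration, and `count_pairs` is an assumed name. Pairs tied on either variable count as neither concordant nor discordant.

```python
from itertools import combinations

def count_pairs(individuals):
    """Count concordant and discordant pairs among (x, y) ordinal scores."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(individuals, 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:      # same ordering on both variables
            concordant += 1
        elif direction < 0:    # opposite ordering on the two variables
            discordant += 1
        # direction == 0: tied on at least one variable, so not counted
    return concordant, discordant

# Hypothetical ordinal scores (not from the slides).
sample = [(1, 1), (2, 3), (3, 2), (1, 2), (3, 3)]
c, d = count_pairs(sample)
print(f"C = {c}, D = {d}")
```

Here C exceeds D, which is the pattern expected under a positive association.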
Example - Alcohol Use and Sick Days
• Alcohol Risk (Without Risk, Hardly any Risk, Some to Considerable Risk)
• Sick Days (0, 1-6, ≥7)
• Concordant Pairs - Pairs of respondents where one scores higher on both alcohol risk and sick days than the other
• Discordant Pairs - Pairs of respondents where one scores higher on alcohol risk and the other scores higher on sick days
Example - Alcohol Use and Sick Days
ALCOHOL * SICKDAYS Crosstabulation (Count)
                                  SICKDAYS
ALCOHOL                    0 days   1-6 days   7+ days   Total
Without Risk               347      113        145       605
Hardly any Risk            154      63         56        273
Some-Considerable Risk     52       25         34        111
Total                      553      201        235       989
• Concordant Pairs: Each individual in a given cell is concordant with each individual in cells “Southeast” of theirs
Measures of Association
• Goodman and Kruskal’s Gamma:
\[ \hat{\gamma} = \frac{C - D}{C + D} \qquad -1 \le \hat{\gamma} \le 1 \]
• Kendall’s τb:
\[ \hat{\tau}_b = \frac{2(C - D)}{\sqrt{\left( n^2 - \sum_i n_{i.}^2 \right)\left( n^2 - \sum_j n_{.j}^2 \right)}} \]
When there’s no association between the ordinal variables, the population-based values of these measures are 0.
Example - Alcohol Use and Sick Days
\[ \hat{\gamma} = \frac{C - D}{C + D} = \frac{83164 - 73496}{83164 + 73496} = 0.0617 \]
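The pair counts and both measures can be recomputed from the crosstabulation. This is a minimal sketch with assumed names (`concordant_discordant`, `alcohol`); the Kendall's tau-b line uses the form 2(C - D) divided by the square root of the product of the two tie-adjusted terms, which reproduces the tabled value of about .035 for these data.

```python
import math

def concordant_discordant(table):
    """Count concordant (C) and discordant (D) pairs in an ordered r x c table."""
    n_rows, n_cols = len(table), len(table[0])
    c = d = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):       # rows below cell (i, j)
                for m in range(n_cols):
                    if m > j:                    # "southeast" cells
                        c += table[i][j] * table[k][m]
                    elif m < j:                  # "southwest" cells
                        d += table[i][j] * table[k][m]
    return c, d

# Rows: alcohol risk (low to high); columns: sick days (0, 1-6, 7+).
alcohol = [[347, 113, 145], [154, 63, 56], [52, 25, 34]]

C, D = concordant_discordant(alcohol)
gamma = (C - D) / (C + D)

n = sum(sum(row) for row in alcohol)
row_sq = sum(sum(row) ** 2 for row in alcohol)
col_sq = sum(sum(col) ** 2 for col in zip(*alcohol))
tau_b = 2 * (C - D) / math.sqrt((n ** 2 - row_sq) * (n ** 2 - col_sq))

print(f"C = {C}, D = {D}, gamma = {gamma:.4f}, tau-b = {tau_b:.3f}")
```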
Symmetric Measures
                                        Value   Asymp. Std. Error(a)   Approx. T(b)   Approx. Sig.
Ordinal by Ordinal   Kendall's tau-b    .035    .030                   1.187          .235
                     Gamma              .062    .052                   1.187          .235
N of Valid Cases                        989
a. Not assuming the null hypothesis.