Research Methods
William G. Zikmund
Bivariate Analysis - Tests of Differences
Common Bivariate Tests

Type of Measurement   Differences between two independent groups   Differences among three or more independent groups
Interval and ratio    Independent-groups t-test or Z-test           One-way ANOVA
Ordinal               Mann-Whitney U-test; Wilcoxon test            Kruskal-Wallis test
Nominal               Z-test (two proportions); Chi-square test     Chi-square test
Differences Between Groups
• Contingency Tables
• Cross-Tabulation
• Chi-Square allows testing for significant differences between groups
• “Goodness of Fit”
Chi-Square Test
$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

where
χ² = chi-square statistic
Oi = observed frequency in the ith cell
Ei = expected frequency in the ith cell

$$E_{ij} = \frac{R_i C_j}{n}$$

where
Ri = total observed frequency in the ith row
Cj = total observed frequency in the jth column
n = sample size
Chi-Square Test
Degrees of Freedom

d.f. = (R − 1)(C − 1)

For a 2 × 2 table: (2 − 1)(2 − 1) = 1
Awareness of Tire Manufacturer's Brand

          Men   Women   Total
Aware      50     10      60
Unaware    15     25      40
Total      65     35     100
Chi-Square Test: Differences Among Groups Example
Expected frequencies come from Eij = RiCj/n; for example, E11 = (60)(65)/100 = 39, and similarly E12 = 21, E21 = 26, E22 = 14.

$$\chi^2 = \frac{(50-39)^2}{39} + \frac{(10-21)^2}{21} + \frac{(15-26)^2}{26} + \frac{(25-14)^2}{14}$$

$$= 3.102 + 5.762 + 4.654 + 8.643 = 22.161$$

$$d.f. = (R-1)(C-1) = (2-1)(2-1) = 1$$

The critical value of χ² at the .05 level with 1 d.f. is 3.84. Since 22.161 > 3.84, awareness differs significantly between men and women.
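As a cross-check, here is a minimal sketch in Python (assuming SciPy is available) that reproduces this test. `chi2_contingency` is passed `correction=False` so the statistic matches the hand calculation rather than the Yates-corrected version SciPy applies to 2 × 2 tables by default.

```python
import numpy as np
from scipy import stats

# Observed frequencies: rows = Aware/Unaware, columns = Men/Women
observed = np.array([[50, 10],
                     [15, 25]])

# correction=False disables the Yates continuity correction so the
# statistic matches the hand calculation above (22.16).
chi2, p, dof, expected = stats.chi2_contingency(observed, correction=False)

print(expected)      # [[39. 21.] [26. 14.]] -- matches E = R*C/n
print(chi2, dof, p)  # ~22.16 with 1 d.f.; p < .001, exceeds critical 3.84
```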
Type of Measurement: Interval and ratio
Differences between two independent groups: t-test or Z-test
Differences Between Groups when Comparing Means
• Ratio scaled dependent variables
• t-test
– When groups are small
– When population standard deviation is unknown
• z-test
– When groups are large
Null Hypothesis About Mean Differences Between Groups

$$H_0: \mu_1 = \mu_2 \quad\text{or}\quad H_0: \mu_1 - \mu_2 = 0$$

The t statistic is, conceptually,

$$t = \frac{\text{mean 1} - \text{mean 2}}{\text{variability of random means}}$$
t-Test for Difference of Means
$$t = \frac{\bar{X}_1 - \bar{X}_2}{S_{\bar{X}_1 - \bar{X}_2}}$$

where
X̄1 = mean for Group 1
X̄2 = mean for Group 2
SX̄1−X̄2 = the pooled, or combined, standard error of the difference between means
Pooled Estimate of the Standard Error
$$S_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}$$

where
S1² = the variance of Group 1
S2² = the variance of Group 2
n1 = the sample size of Group 1
n2 = the sample size of Group 2
Degrees of Freedom
• d.f. = n − k
• where:
– n = n1 + n2
– k = number of groups
t-Test for Difference of Means: Example

$$S_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{(20)(2.1)^2 + (13)(2.6)^2}{33}\left(\frac{1}{21}+\frac{1}{14}\right)} = .797$$

$$t = \frac{16.5 - 12.2}{.797} = \frac{4.3}{.797} = 5.395$$

with d.f. = 21 + 14 − 2 = 33.
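The same example can be verified in a few lines of Python (assuming SciPy). This is a sketch from the summary statistics above (means, standard deviations, and sample sizes), since the raw group data are not given in the slides.

```python
from math import sqrt
from scipy import stats

mean1, s1, n1 = 16.5, 2.1, 21
mean2, s2, n2 = 12.2, 2.6, 14

# Pooled standard error of the difference between means
pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
se = sqrt(pooled_var * (1 / n1 + 1 / n2))   # ~0.797

t = (mean1 - mean2) / se                    # ~5.39
df = n1 + n2 - 2                            # 33
p = 2 * stats.t.sf(abs(t), df)              # two-tailed p-value

print(round(se, 3), round(t, 2), df, p)
```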
Type of Measurement: Nominal
Differences between two independent groups: Z-test (two proportions)
Comparing Two Groups when Comparing Proportions
• Percentage comparisons
• Sample proportion: p
• Population proportion: π
Differences Between Two Groups when Comparing Proportions
The hypothesis is:

$$H_0: \pi_1 = \pi_2$$

which may be restated as:

$$H_0: \pi_1 - \pi_2 = 0$$
Z-Test for Differences of Proportions
$$Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{S_{p_1 - p_2}}$$

where
p1 = sample proportion of successes in Group 1
p2 = sample proportion of successes in Group 2
(π1 − π2) = hypothesized population proportion 1 minus hypothesized population proportion 2 (zero under the null hypothesis)
Sp1-p2 = pooled estimate of the standard error of the difference in proportions
Z-Test for Differences of Proportions

$$S_{p_1-p_2} = \sqrt{\bar{p}\,\bar{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

where
p̄ = pooled estimate of the proportion of successes in a sample of both groups
q̄ = 1 − p̄, the pooled estimate of the proportion of failures
n1 = sample size for Group 1
n2 = sample size for Group 2
Z-Test for Differences of Proportions

$$\bar{p} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$$

Example: with p1 = .35, p2 = .40, and n1 = n2 = 100,

$$\bar{p} = \frac{(100)(.35) + (100)(.40)}{100 + 100} = .375$$

$$S_{p_1-p_2} = \sqrt{(.375)(.625)\left(\frac{1}{100} + \frac{1}{100}\right)} = .068$$
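A short Python sketch of this two-proportion Z-test using the figures above. The final Z statistic and p-value are not shown on the slides, so the conclusion printed here completes the example.

```python
from math import sqrt
from scipy import stats

p1, n1 = 0.35, 100
p2, n2 = 0.40, 100

# Pooled proportions of successes and failures
p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)       # .375
q_bar = 1 - p_bar                             # .625

# Pooled standard error of the difference in proportions
se = sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))  # ~.068

z = (p1 - p2 - 0) / se                        # ~ -0.73
p_value = 2 * stats.norm.sf(abs(z))           # two-tailed

print(round(se, 3), round(z, 2), round(p_value, 3))
# |z| < 1.96, so the difference is not significant at the .05 level
```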
Type of Measurement: Interval or ratio
Differences among three or more independent groups: One-way ANOVA
Analysis of Variance
Hypothesis when comparing three groups:

$$H_0: \mu_1 = \mu_2 = \mu_3$$

Analysis of Variance F-Ratio:

$$F = \frac{\text{Variance between groups}}{\text{Variance within groups}}$$
Analysis of Variance Sum of Squares
$$SS_{total} = SS_{between} + SS_{within}$$

Analysis of Variance: Sum of Squares Total

$$SS_{total} = \sum_{i=1}^{n}\sum_{j=1}^{c}\left(X_{ij} - \bar{X}\right)^2$$
where
Xij = individual score, i.e., the ith observation or test unit in the jth group
X̄ = grand mean
n = number of observations or test units in a group
c = number of groups (columns)
Analysis of Variance: Sum of Squares Within

$$SS_{within} = \sum_{i=1}^{n}\sum_{j=1}^{c}\left(X_{ij} - \bar{X}_j\right)^2$$

where
Xij = individual score, i.e., the ith observation or test unit in the jth group
X̄j = group mean for the jth group (deviations are taken from each group's own mean)
n = number of observations or test units in a group
c = number of groups
Analysis of Variance: Sum of Squares Between

$$SS_{between} = \sum_{j=1}^{c} n_j\left(\bar{X}_j - \bar{X}\right)^2$$

where
X̄j = group mean for the jth group
X̄ = grand mean
nj = number of observations or test units in the jth group
Analysis of Variance: Mean Square Between

$$MS_{between} = \frac{SS_{between}}{c - 1}$$

Analysis of Variance: Mean Square Within

$$MS_{within} = \frac{SS_{within}}{cn - c}$$

Analysis of Variance: F-Ratio

$$F = \frac{MS_{between}}{MS_{within}}$$
A Test Market Experiment on Pricing
Sales in Units (thousands)

Test Market    Regular Price $.99   Reduced Price $.89   Cents-Off Coupon (Regular Price)
A, B, or C            130                  145                  153
D, E, or F            118                  143                  129
G, H, or I             87                  120                   96
J, K, or L             84                  131                   99
Mean             X̄1 = 104.75          X̄2 = 134.75          X̄3 = 119.25

Grand mean: X̄ = 119.58
ANOVA Summary Table Source of Variation
• Between groups
• Sum of squares
– SSbetween
• Degrees of freedom
– c-1 where c=number of groups
• Mean square: MSbetween
– SSbetween/(c − 1)
ANOVA Summary Table Source of Variation
• Within groups
• Sum of squares
– SSwithin
• Degrees of freedom
– cn-c where c=number of groups, n= number of observations in a group
• Mean square: MSwithin
– SSwithin/(cn − c)
$$F = \frac{MS_{between}}{MS_{within}}$$
ANOVA Summary Table Source of Variation
• Total
• Sum of Squares
– SStotal
• Degrees of Freedom
– cn-1 where c=number of groups, n= number of observations in a group
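A minimal sketch (assuming SciPy) that runs the one-way ANOVA on the test-market pricing data above; `f_oneway` computes the same F = MSbetween/MSwithin ratio summarized in these tables.

```python
from scipy import stats

# Sales in units (thousands), from the test-market table
regular_99 = [130, 118,  87,  84]   # mean 104.75
reduced_89 = [145, 143, 120, 131]   # mean 134.75
coupon     = [153, 129,  96,  99]   # mean 119.25

f, p = stats.f_oneway(regular_99, reduced_89, coupon)
print(round(f, 2), round(p, 3))
# F has (c - 1) = 2 and (cn - c) = 9 degrees of freedom here
```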
Research Methods
William G. Zikmund
Bivariate Analysis: Measures of Association
Measures of Association
• A general term that refers to a number of bivariate statistical techniques used to measure the strength of a relationship between two variables.
Relationships Among Variables
• Correlation analysis
• Bivariate regression analysis
Type of Measurement           Measure of Association
Interval and ratio scales     Correlation coefficient; bivariate regression
Ordinal scales                Chi-square; rank correlation
Nominal scales                Chi-square; phi coefficient; contingency coefficient
Correlation Coefficient
• A statistical measure of the covariation or association between two variables.
• Are dollar sales associated with advertising dollar expenditures?
The correlation coefficient for two variables X and Y is r_xy.
Correlation Coefficient
• r
• r ranges from +1 to -1
• r = +1 a perfect positive linear relationship
• r = -1 a perfect negative linear relationship
• r = 0 indicates no correlation
Simple Correlation Coefficient

$$r_{xy} = r_{yx} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}$$
Simple Correlation Coefficient: Alternative Method

$$r_{xy} = r_{yx} = \frac{\sigma_{xy}}{\sqrt{\sigma_x^2 \sigma_y^2}}$$

where
σx² = variance of X
σy² = variance of Y
σxy = covariance of X and Y
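A small Python sketch (assuming NumPy) showing both forms of the correlation coefficient. The x and y values here are hypothetical, purely for illustration.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical X values
y = np.array([2.1, 3.9, 6.2, 7.8, 9.9])   # hypothetical Y values

# Deviation-score form of r
dx, dy = x - x.mean(), y - y.mean()
r = (dx * dy).sum() / np.sqrt((dx**2).sum() * (dy**2).sum())

# Equivalent covariance/variance form (population moments, so the 1/n cancels)
r_alt = np.cov(x, y, bias=True)[0, 1] / np.sqrt(x.var() * y.var())

print(round(r, 4), round(r_alt, 4))   # both ~0.999
print(np.corrcoef(x, y)[0, 1])        # NumPy's built-in agrees
```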
Correlation Patterns (scatter diagrams of Y against X):
• No correlation
• Perfect negative correlation: r = −1.0
• A high positive correlation: r = +.98
Calculation of r (data on p. 629):

$$r = \frac{-6.3389}{\sqrt{(17.837)(5.589)}} = \frac{-6.3389}{\sqrt{99.712}} = -.635$$
Coefficient of Determination

$$r^2 = \frac{\text{Explained variance}}{\text{Total variance}}$$
Correlation Does Not Mean Causation
• High correlation
• Rooster’s crow and the rising of the sun
– Rooster does not cause the sun to rise.
• Teachers’ salaries and the consumption of liquor
– Covary because they are both influenced by a third variable
Correlation Matrix
• The standard form for reporting correlational results.
Correlation Matrix
Var1 Var2 Var3
Var1 1.0 0.45 0.31
Var2 0.45 1.0 0.10
Var3 0.31 0.10 1.0
Walkup’s
First Laws of Statistics
• Law No. 1
– Everything correlates with everything, especially when the same individual defines the variables to be correlated.
• Law No. 2
– It won’t help very much to find a good correlation between the variable you are interested in and some other variable that you don’t understand any better.
• Law No. 3
– Unless you can think of a logical reason why two variables should be connected as cause and effect, it doesn’t help much to find a correlation between them. In Columbus, Ohio, the mean monthly rainfall correlates very nicely with the number of letters in the names of the months!
Regression

• Dictionary definition: going or moving backward
• Going back to previous conditions, e.g., tall men's sons
Bivariate Regression
• A measure of linear association that investigates a straight line relationship
• Useful in forecasting
Bivariate Linear Regression
• A measure of linear association that investigates a straight-line relationship
• Y = a + bX
• where
• Y is the dependent variable
• X is the independent variable
• a and b are two constants to be estimated
Y intercept
• a
• An intercepted segment of a line
• The point at which a regression line intercepts the Y-axis
Slope
• b
• The inclination of a regression line as compared to a base line
• Rise over run
• Δ (delta): notation for "a change in"
Scatter Diagram and Eyeball Forecast

[Scatter plot of Y (roughly 80–160) against X (roughly 70–190), with two eyeballed candidate lines labeled "My line" and "Your line."]
Regression Line and Slope

$$\hat{Y} = \hat{a} + \hat{\beta}X$$

The slope is the change in Ŷ per unit change in X (ΔŶ/ΔX).

Least-Squares Regression Line

[Scatter plot showing, for Dealers 3 and 7, the actual Y values and the fitted values Ŷ ("Y hat") on the regression line.]

Scatter Diagram of Explained and Unexplained Variation

[For each point, the total deviation from Ȳ splits into the deviation explained by the regression and the deviation not explained (residual).]
The Least-Squares Method
• Uses the criterion of minimizing the total error in the prediction of Y from X. More technically, the least-squares procedure generates a straight line that minimizes the sum of squared deviations of the actual values from the predicted regression line.
• A relatively simple mathematical technique that ensures the straight line best represents the relationship between X and Y.
Regression: Least-Squares Method

$$\sum_{i=1}^{n} e_i^2 \ \text{is a minimum}$$

where
ei = Yi − Ŷi (the "residual")
Yi = actual value of the dependent variable
Ŷi = estimated value of the dependent variable ("Y hat")
n = number of observations
i = number of the observation

The Logic behind the Least-Squares Technique
• No straight line can completely represent every dot in the scatter diagram
• There will be a discrepancy between most of the actual scores (each dot) and the predicted score
• Uses the criterion of attempting to make the least amount of total error in prediction of Y from X
$$\hat{a} = \bar{Y} - \hat{\beta}\bar{X}$$
Bivariate Regression
$$\hat{\beta} = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2}$$

where
β̂ = estimated slope of the line (the "regression coefficient")
â = estimated intercept of the Y axis
Y = dependent variable
Ȳ = mean of the dependent variable
X = independent variable
X̄ = mean of the independent variable
n = number of observations
$$\hat{\beta} = \frac{15(193{,}345) - 2{,}806{,}875}{15(245{,}759) - 3{,}515{,}625} = \frac{2{,}900{,}175 - 2{,}806{,}875}{3{,}686{,}385 - 3{,}515{,}625} = \frac{93{,}300}{170{,}760} = .54638$$

$$\hat{a} = 99.8 - .54638(125) = 99.8 - 68.3 = 31.5$$
$$\hat{Y} = 31.5 + .546X$$

For example, when X = 89:

$$\hat{Y} = 31.5 + .546(89) = 31.5 + 48.6 = 80.1$$
Dealer 7 (actual Y value = 129):

$$\hat{Y}_7 = 31.5 + .546(165) = 121.6$$

Dealer 3 (actual Y value = 80):

$$\hat{Y}_3 = 31.5 + .546(95) = 83.4$$

Dealer 9 (actual Y value = 97):

$$\hat{Y}_9 = 31.5 + .546(119) = 96.5$$

$$e_9 = Y_9 - \hat{Y}_9 = 97 - 96.5 = .5$$
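A sketch of the slope and intercept computation in Python. The slides give n = 15, ΣXY = 193,345, and ΣX² = 245,759; ΣX = 1,875 and ΣY = 1,497 are inferred here from X̄ = 125 and Ȳ = 99.8, so treat them as assumptions consistent with the example rather than quoted source values.

```python
n = 15
sum_xy = 193_345
sum_x2 = 245_759
sum_x = 1_875     # assumed; implies x_bar = 125 as used above
sum_y = 1_497     # assumed; implies y_bar = 99.8 as used above

# Least-squares slope and intercept from the summary sums
beta_hat = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
a_hat = sum_y / n - beta_hat * (sum_x / n)

print(round(beta_hat, 5))   # 0.54638
print(round(a_hat, 1))      # 31.5

# Prediction and residual for Dealer 9 (X = 119, actual Y = 97)
y_hat_9 = a_hat + beta_hat * 119
print(round(y_hat_9, 1), round(97 - y_hat_9, 1))   # ~96.5 and ~0.5
```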
F-Test (Regression)
• A procedure to determine whether there is more variability explained by the regression or unexplained by the regression.
• Analysis of variance summary table
Total Deviation can be Partitioned into Two Parts
• Total deviation equals
• Deviation explained by the regression plus
• Deviation unexplained by the regression
"We are always acting on what has just finished happening. It happened at least 1/30th of a second ago. We think we're in the present, but we aren't. The present we know is only a movie of the past."
Tom Wolfe in The Electric Kool-Aid Acid Test
Partitioning the Variance

$$Y_i - \bar{Y} = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)$$

Total deviation = Deviation explained by the regression + Deviation unexplained by the regression (residual error)

where
Ȳ = mean of the total group
Ŷi = value predicted with the regression equation
Yi = actual value
$$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$$

Total variation = Explained variation + Unexplained variation (residual)

Sum of Squares

$$SS_t = SS_r + SS_e$$
Coefficient of Determination (r²)
• The proportion of variance in Y that is explained by X (or vice versa)
• Obtained by squaring the correlation coefficient: the proportion of the total variance of a variable that is accounted for by knowing the value of another variable

$$r^2 = \frac{SS_r}{SS_t} = \frac{SS_t - SS_e}{SS_t}$$
Source of Variation
• Explained by Regression
• Degrees of Freedom
– k-1 where k= number of estimated constants (variables)
• Sum of Squares
– SSr
• Mean square
– SSr/(k − 1)
Source of Variation
• Unexplained by Regression
• Degrees of Freedom
– n-k where n=number of observations
• Sum of Squares
– SSe
• Mean square
– SSe/(n − k)
r² in the Example

$$r^2 = \frac{3{,}398.49}{3{,}882.4} = .875$$
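As a check, r² from the sums of squares above; the figures 3,398.49 and 3,882.4 are as reconstructed from the slide, so treat them as illustrative.

```python
# r-squared = explained sum of squares / total sum of squares
ss_r, ss_t = 3_398.49, 3_882.4
r_squared = ss_r / ss_t
print(round(r_squared, 3))   # ~0.875
```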
Multiple Regression
• Extension of Bivariate Regression
• Multidimensional when three or more variables are involved
• Simultaneously investigates the effect of two or more variables on a single dependent variable
• Discussed in Chapter 24
Correlation: Player Salary and Ticket Price
Correlation coefficient, r = .75

[Chart: year-over-year percentage changes (roughly −20 to +30) in ticket price and in player salary, 1995–2001.]