Research Methods

William G. Zikmund

Bivariate Analysis - Tests of Differences

Common Bivariate Tests

Type of Measurement: Interval and ratio
Differences between two independent groups: independent-groups t-test or Z-test
Differences among three or more independent groups: one-way ANOVA

Common Bivariate Tests

Type of Measurement: Ordinal
Differences between two independent groups: Mann-Whitney U-test; Wilcoxon test
Differences among three or more independent groups: Kruskal-Wallis test

Common Bivariate Tests

Type of Measurement: Nominal
Differences between two independent groups: Z-test (two proportions); chi-square test
Differences among three or more independent groups: chi-square test

Type of Measurement: Nominal
Differences between two independent groups: chi-square test

Differences Between Groups

• Contingency Tables

• Cross-Tabulation

• Chi-Square allows testing for significant differences between groups

• “Goodness of Fit”

Chi-Square Test

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

$\chi^2$ = chi-square statistic
$O_i$ = observed frequency in the ith cell
$E_i$ = expected frequency in the ith cell

Expected frequencies come from the row and column totals:

$$E_{ij} = \frac{R_i C_j}{n}$$

$R_i$ = total observed frequency in the ith row
$C_j$ = total observed frequency in the jth column
$n$ = sample size
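The slides give only the formulas; the following minimal Python sketch (NumPy assumed available) shows how the expected frequencies and the statistic fall out of an observed cross-tabulation. The counts used here anticipate the brand-awareness example below.

```python
import numpy as np

# Observed frequencies of an R x C cross-tabulation
# (the 2 x 2 awareness table from the example below).
O = np.array([[50, 10],
              [15, 25]])

R = O.sum(axis=1)   # row totals, R_i
C = O.sum(axis=0)   # column totals, C_j
n = O.sum()         # sample size

E = np.outer(R, C) / n             # expected frequencies, E_ij = R_i * C_j / n
chi_sq = ((O - E) ** 2 / E).sum()  # chi-square statistic

print(E)       # [[39. 21.] [26. 14.]]
print(chi_sq)  # ~22.16
```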

Degrees of Freedom

$$\text{d.f.} = (R-1)(C-1) = (2-1)(2-1) = 1$$

Chi-Square Test: Differences Among Groups Example

Awareness of Tire Manufacturer's Brand

            Men   Women   Total
Aware        50     10      60
Unaware      15     25      40
Total        65     35     100

$$\chi^2 = \frac{(50-39)^2}{39} + \frac{(10-21)^2}{21} + \frac{(15-26)^2}{26} + \frac{(25-14)^2}{14}$$

(The expected frequencies come from the row and column totals: 39 = (60)(65)/100, 21 = (60)(35)/100, 26 = (40)(65)/100, and 14 = (40)(35)/100.)

$$\chi^2 = 3.102 + 5.762 + 4.654 + 8.643 = 22.161$$

$$\text{d.f.} = (R-1)(C-1) = (2-1)(2-1) = 1$$

The critical value of $\chi^2$ at the .05 level is 3.84 with 1 d.f., so the computed value of 22.161 is significant: brand awareness differs between men and women.
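As a cross-check, the same test can be run with SciPy (an assumption of this sketch, not something the slides use); `correction=False` asks for the plain, uncorrected statistic computed above.

```python
from scipy.stats import chi2, chi2_contingency

observed = [[50, 10],   # Aware:   men, women
            [15, 25]]   # Unaware: men, women

stat, p_value, dof, expected = chi2_contingency(observed, correction=False)
print(stat, dof)             # ~22.16 with 1 d.f.
print(chi2.ppf(0.95, df=1))  # critical value, ~3.84
print(p_value < 0.05)        # True, so reject the null hypothesis
```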

Type of Measurement: Interval and ratio
Differences between two independent groups: t-test or Z-test

Differences Between Groups when Comparing Means

• Ratio-scaled dependent variables

• t-test

– When groups are small

– When the population standard deviation is unknown

• z-test

– When groups are large

Null Hypothesis About Mean Differences Between Groups

$$H_0: \mu_1 = \mu_2 \qquad \text{or} \qquad H_0: \mu_1 - \mu_2 = 0$$

t-Test for Difference of Means

$$t = \frac{\text{mean 1} - \text{mean 2}}{\text{variability of random means}}$$

t-Test for Difference of Means

$$t = \frac{\bar{X}_1 - \bar{X}_2}{S_{\bar{X}_1 - \bar{X}_2}}$$

$\bar{X}_1$ = mean for Group 1
$\bar{X}_2$ = mean for Group 2
$S_{\bar{X}_1-\bar{X}_2}$ = the pooled, or combined, standard error of the difference between means

Pooled Estimate of the Standard Error

$$S_{\bar{X}_1-\bar{X}_2} = \sqrt{\left(\frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1 + n_2 - 2}\right)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

$S_1^2$ = the variance of Group 1
$S_2^2$ = the variance of Group 2
$n_1$ = the sample size of Group 1
$n_2$ = the sample size of Group 2


Degrees of Freedom

• d.f. = n − k

• where:

– n = n₁ + n₂

– k = number of groups

t-Test for Difference of Means Example

$$S_{\bar{X}_1-\bar{X}_2} = \sqrt{\left(\frac{(21-1)(2.1)^2 + (14-1)(2.6)^2}{21 + 14 - 2}\right)\left(\frac{1}{21} + \frac{1}{14}\right)} = .797$$

$$t = \frac{16.5 - 12.2}{.797} = \frac{4.3}{.797} = 5.395$$
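A quick way to verify this hand calculation, assuming SciPy is available: `ttest_ind_from_stats` takes exactly the summary figures from the example above (means 16.5 and 12.2, standard deviations 2.1 and 2.6, group sizes 21 and 14) and performs the pooled-variance t-test.

```python
from scipy.stats import ttest_ind_from_stats

# equal_var=True gives the pooled-variance test, d.f. = 21 + 14 - 2 = 33
t_stat, p_value = ttest_ind_from_stats(
    mean1=16.5, std1=2.1, nobs1=21,
    mean2=12.2, std2=2.6, nobs2=14,
    equal_var=True,
)
print(round(t_stat, 3))  # ~5.395, matching the hand calculation
```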

Type of Measurement: Nominal
Differences between two independent groups: Z-test (two proportions)

Differences Between Two Groups when Comparing Proportions

• Percentage comparisons
• Sample proportion: p
• Population proportion: π

The hypothesis is:

$$H_0: \pi_1 = \pi_2$$

which may be restated as:

$$H_0: \pi_1 - \pi_2 = 0$$

Z-Test for Differences of Proportions

$$Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{S_{p_1 - p_2}}$$

$p_1$ = sample proportion of successes in Group 1
$p_2$ = sample proportion of successes in Group 2
$(\pi_1 - \pi_2)$ = hypothesized population proportion 1 minus hypothesized population proportion 2
$S_{p_1-p_2}$ = pooled estimate of the standard error of the difference in proportions

$$S_{p_1 - p_2} = \sqrt{\bar{p}\,\bar{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

$\bar{p}$ = pooled estimate of the proportion of successes in a sample of both groups
$\bar{q}$ = $(1 - \bar{p})$, a pooled estimate of the proportion of failures in a sample of both groups
$n_1$ = sample size for Group 1
$n_2$ = sample size for Group 2

$$\bar{p} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$$

Z-Test for Differences of Proportions Example

$$\bar{p} = \frac{(100)(.35) + (100)(.4)}{100 + 100} = .375$$

$$S_{p_1 - p_2} = \sqrt{(.375)(.625)\left(\frac{1}{100} + \frac{1}{100}\right)} = .068$$
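Restating the example's arithmetic as a short sketch (the sample values p₁ = .35, p₂ = .40 and n₁ = n₂ = 100 come from the slide; the final Z value is implied rather than printed there):

```python
import math

p1, n1 = 0.35, 100
p2, n2 = 0.40, 100

p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)  # pooled proportion, 0.375
q_bar = 1 - p_bar                        # 0.625

# Pooled standard error of the difference in proportions
s_p1_p2 = math.sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))

# Z statistic under H0: pi_1 - pi_2 = 0
z = ((p1 - p2) - 0) / s_p1_p2
print(round(s_p1_p2, 3), round(z, 2))  # 0.068, -0.73
```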

Type of Measurement: Interval or ratio
Differences among three or more independent groups: one-way ANOVA

Analysis of Variance

Hypothesis when comparing three groups:

$$H_0: \mu_1 = \mu_2 = \mu_3$$
Analysis of Variance F-Ratio

$$F = \frac{\text{Variance between groups}}{\text{Variance within groups}}$$

Analysis of Variance Sum of Squares

$$SS_{\text{total}} = SS_{\text{between}} + SS_{\text{within}}$$

Analysis of Variance Sum of Squares Total

$$SS_{\text{total}} = \sum_{i=1}^{n}\sum_{j=1}^{c}\left(X_{ij} - \bar{X}\right)^2$$

$X_{ij}$ = individual score, i.e., the ith observation or test unit in the jth group
$\bar{X}$ = grand mean
$n$ = number of observations or test units in a group
$c$ = number of groups (or columns)

Analysis of Variance Sum of Squares Within

$$SS_{\text{within}} = \sum_{i=1}^{n}\sum_{j=1}^{c}\left(X_{ij} - \bar{X}_j\right)^2$$

$X_{ij}$ = individual score, i.e., the ith observation or test unit in the jth group
$\bar{X}_j$ = mean of the jth group
$n$ = number of observations or test units in a group
$c$ = number of groups (or columns)

Analysis of Variance Sum of Squares Between

$$SS_{\text{between}} = \sum_{j=1}^{c} n_j\left(\bar{X}_j - \bar{X}\right)^2$$

$\bar{X}_j$ = mean of the jth group
$\bar{X}$ = grand mean
$n_j$ = number of observations or test units in the jth group

Analysis of Variance Mean Square Between

$$MS_{\text{between}} = \frac{SS_{\text{between}}}{c - 1}$$

Analysis of Variance Mean Square Within

$$MS_{\text{within}} = \frac{SS_{\text{within}}}{cn - c}$$

Analysis of Variance F-Ratio

$$F = \frac{MS_{\text{between}}}{MS_{\text{within}}}$$

A Test Market Experiment on Pricing

Sales in Units (thousands)

Test Market     Regular Price $.99   Reduced Price $.89   Cents-Off Coupon (Regular Price)
A, B, or C             130                  145                   153
D, E, or F             118                  143                   129
G, H, or I              87                  120                    96
J, K, or L              84                  131                    99
Mean             X̄₁ = 104.75          X̄₂ = 134.75           X̄₃ = 119.25

Grand mean X̄ = 119.58
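A minimal sketch of the corresponding computation, assuming SciPy: `f_oneway` runs the one-way ANOVA on the twelve test-market observations above, and the F it reports is the ratio MS_between / MS_within defined earlier.

```python
from scipy.stats import f_oneway

# Sales in units (thousands) for the three pricing treatments
regular_99 = [130, 118, 87, 84]    # mean 104.75
reduced_89 = [145, 143, 120, 131]  # mean 134.75
coupon     = [153, 129, 96, 99]    # mean 119.25

f_stat, p_value = f_oneway(regular_99, reduced_89, coupon)
print(round(f_stat, 2), round(p_value, 2))  # ~1.95, ~0.20: not significant at .05
```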

ANOVA Summary Table: Source of Variation

• Between groups

– Sum of squares: SS_between

– Degrees of freedom: c − 1, where c = number of groups

– Mean square (MS_between): SS_between/(c − 1)

• Within groups

– Sum of squares: SS_within

– Degrees of freedom: cn − c, where c = number of groups and n = number of observations in a group

– Mean square (MS_within): SS_within/(cn − c)

• Total

– Sum of squares: SS_total

– Degrees of freedom: cn − 1, where c = number of groups and n = number of observations in a group

$$F = \frac{MS_{\text{between}}}{MS_{\text{within}}}$$

Research Methods

William G. Zikmund

Bivariate Analysis: Measures of Association

Measures of Association

• A general term that refers to a number of bivariate statistical techniques used to measure the strength of a relationship between two variables.

Relationships Among Variables

• Correlation analysis

• Bivariate regression analysis

Type of Measurement: Interval and ratio scales
Measure of Association: correlation coefficient; bivariate regression

Type of Measurement: Ordinal scales
Measure of Association: chi-square; rank correlation

Type of Measurement: Nominal
Measure of Association: chi-square; phi coefficient; contingency coefficient

Correlation Coefficient

• A statistical measure of the covariation, or association, between two variables.

• Are dollar sales associated with advertising dollar expenditures?

• The correlation coefficient for two variables X and Y is $r_{xy}$.

Correlation Coefficient

• r ranges from +1 to −1

• r = +1 indicates a perfect positive linear relationship

• r = −1 indicates a perfect negative linear relationship

• r = 0 indicates no correlation

Simple Correlation Coefficient

$$r_{xy} = r_{yx} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}$$

Simple Correlation Coefficient: Alternative Method

$$r_{xy} = r_{yx} = \frac{\sigma_{xy}}{\sqrt{\sigma_x^2\,\sigma_y^2}}$$

$\sigma_x^2$ = variance of X
$\sigma_y^2$ = variance of Y
$\sigma_{xy}$ = covariance of X and Y
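A short sketch with hypothetical data confirming that the deviation-score formula, the covariance/variance form above, and NumPy's built-in `corrcoef` all agree:

```python
import numpy as np

# Hypothetical paired observations
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Deviation-score form of r
r = ((x - x.mean()) * (y - y.mean())).sum() / np.sqrt(
    ((x - x.mean()) ** 2).sum() * ((y - y.mean()) ** 2).sum()
)

# Covariance / variance form: r = cov(x, y) / (sigma_x * sigma_y)
r_alt = np.cov(x, y, ddof=1)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))

print(r, r_alt, np.corrcoef(x, y)[0, 1])  # all three match
```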

Correlation Patterns (scatter-diagram figures): no correlation; perfect negative correlation, r = −1.0; a high positive correlation, r = +.98.

Calculation of r (p. 629):

$$r = \frac{-6.3389}{\sqrt{(17.837)(5.589)}} = \frac{-6.3389}{\sqrt{99.712}} = -.635$$

Coefficient of Determination

$$r^2 = \frac{\text{Explained variance}}{\text{Total variance}}$$

Correlation Does Not Mean Causation

• High correlation

• Rooster’s crow and the rising of the sun

– Rooster does not cause the sun to rise.

• Teachers’ salaries and the consumption of liquor

– Covary because they are both influenced by a third variable

Correlation Matrix

• The standard form for reporting correlational results.

        Var1   Var2   Var3
Var1    1.0    0.45   0.31
Var2    0.45   1.0    0.10
Var3    0.31   0.10   1.0
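In practice the matrix is computed directly from the data matrix; a minimal sketch with simulated (hypothetical) values:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 3))  # 100 observations on 3 variables
data[:, 1] += 0.5 * data[:, 0]    # induce some correlation between Var1 and Var2

R = np.corrcoef(data, rowvar=False)  # 3 x 3 correlation matrix
print(np.round(R, 2))                # 1.0 on the diagonal, symmetric off-diagonal
```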

Walkup's First Laws of Statistics

• Law No. 1

– Everything correlates with everything, especially when the same individual defines the variables to be correlated.

• Law No. 2

– It won’t help very much to find a good correlation between the variable you are interested in and some other variable that you don’t understand any better.

• Law No. 3

– Unless you can think of a logical reason why two variables should be connected as cause and effect, it doesn’t help much to find a correlation between them. In Columbus, Ohio, the mean monthly rainfall correlates very nicely with the number of letters in the names of the months!


Regression

• Dictionary definition: going or moving backward

• Going back to previous conditions

• Example: tall men's sons

Bivariate Linear Regression

• A measure of linear association that investigates a straight-line relationship

• Useful in forecasting

• Y = a + bX

• where

• Y is the dependent variable

• X is the independent variable

• a and b are two constants to be estimated

Y intercept

• a

• An intercepted segment of a line

• The point at which a regression line intercepts the Y-axis

Slope

• b

• The inclination of a regression line as compared to a base line

• Rise over run

• Δ: notation for "a change in"

Scatter Diagram and Eyeball Forecast (figure): Y plotted against X, with two hand-drawn candidate lines ("my line" and "your line").

Regression Line and Slope (figure)

$$\hat{Y} = \hat{a} + \hat{\beta}X$$

The slope $\hat{\beta}$ is the rise over the run, $\Delta\hat{Y}/\Delta X$.

Least-Squares Regression Line (figure): actual Y and fitted $\hat{Y}$ values marked for Dealers 3 and 7.

Scatter Diagram of Explained and Unexplained Variation (figure): total deviation = deviation explained by the regression + deviation not explained.

The Least-Squares Method

• A relatively simple mathematical technique that ensures that the straight line will most closely represent the relationship between X and Y.

• Uses the criterion of attempting to make the least amount of total error in predicting Y from X. More technically, the least-squares procedure generates a straight line that minimizes the sum of squared deviations of the actual values from the predicted regression line.

Regression: Least-Squares Method

$$\sum_{i=1}^{n} e_i^2 \text{ is minimized}$$

$e_i$ = $Y_i - \hat{Y}_i$ (the "residual")
$Y_i$ = actual value of the dependent variable
$\hat{Y}_i$ = estimated value of the dependent variable (Y hat)
$n$ = number of observations
$i$ = number of the observation

The Logic behind the Least-Squares Technique

• No straight line can completely represent every dot in the scatter diagram

• There will be a discrepancy between most of the actual scores (each dot) and the predicted score

• Uses the criterion of attempting to make the least amount of total error in prediction of Y from X

Bivariate Regression

$$\hat{a} = \bar{Y} - \hat{\beta}\bar{X}$$

$$\hat{\beta} = \frac{n\sum XY - \left(\sum X\right)\left(\sum Y\right)}{n\sum X^2 - \left(\sum X\right)^2}$$

$\hat{\beta}$ = estimated slope of the line (the "regression coefficient")
$\hat{a}$ = estimated intercept of the Y axis
$Y$ = dependent variable
$\bar{Y}$ = mean of the dependent variable
$X$ = independent variable
$\bar{X}$ = mean of the independent variable
$n$ = number of observations

$$\hat{\beta} = \frac{15(193{,}345) - 2{,}806{,}875}{15(245{,}759) - 3{,}515{,}625} = \frac{2{,}900{,}175 - 2{,}806{,}875}{3{,}686{,}385 - 3{,}515{,}625} = \frac{93{,}300}{170{,}760} = .54638$$

$$\hat{a} = 99.8 - .54638(125) = 99.8 - 68.3 = 31.5$$
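These estimates can be reproduced from the summary sums alone; in the sketch below, ΣX = 1,875 and ΣY = 1,497 are backed out of the printed quantities (ΣX)² = 3,515,625 and (ΣX)(ΣY) = 2,806,875.

```python
# Least-squares estimates from the example's summary sums
n   = 15
sxy = 193_345  # sum of X*Y
sx2 = 245_759  # sum of X squared
sx  = 1_875    # sum of X, since (sum X)^2 = 3,515,625
sy  = 1_497    # sum of Y, since (sum X)(sum Y) = 2,806,875

beta = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
a = sy / n - beta * (sx / n)  # a-hat = Y-bar - beta-hat * X-bar

print(round(beta, 5), round(a, 1))  # 0.54638, 31.5
```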

$$\hat{Y} = 31.5 + .546X$$

$$\hat{Y} = 31.5 + .546(89) = 31.5 + 48.6 = 80.1$$

Dealer 7 (actual Y value = 129):

$$\hat{Y}_7 = 31.5 + .546(165) = 121.6$$

Dealer 3 (actual Y value = 80):

$$\hat{Y}_3 = 31.5 + .546(95) = 83.4$$

Dealer 9 (actual Y value = 97):

$$\hat{Y}_9 = 31.5 + .546(119) = 96.5$$

$$e_9 = Y_9 - \hat{Y}_9 = 97 - 96.5 = .5$$

F-Test (Regression)

• A procedure to determine whether there is more variability explained by the regression or unexplained by the regression.

• Analysis of variance summary table

Total Deviation Can Be Partitioned into Two Parts

• Total deviation equals

• Deviation explained by the regression, plus

• Deviation unexplained by the regression

"We are always acting on what has just finished happening. It happened at least 1/30th of a second ago. We think we're in the present, but we aren't. The present we know is only a movie of the past."

Tom Wolfe, in The Electric Kool-Aid Acid Test

Partitioning the Variance

$$Y_i - \bar{Y} = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)$$

Total deviation = deviation explained by the regression + deviation unexplained by the regression (residual error)

$\bar{Y}$ = mean of the total group
$\hat{Y}_i$ = value predicted with the regression equation
$Y_i$ = actual value
Sum of Squares

$$\sum \left(Y_i - \bar{Y}\right)^2 = \sum \left(\hat{Y}_i - \bar{Y}\right)^2 + \sum \left(Y_i - \hat{Y}_i\right)^2$$

Total variation = explained variation + unexplained variation (residual)

$$SS_t = SS_r + SS_e$$

Coefficient of Determination, r²

• The proportion of variance in Y that is explained by X (or vice versa)

• A measure obtained by squaring the correlation coefficient; the proportion of the total variance of a variable that is accounted for by knowing the value of another variable

$$r^2 = \frac{SS_r}{SS_t} = \frac{SS_t - SS_e}{SS_t}$$
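A minimal helper, assuming NumPy, that applies this identity to a set of actual and fitted values (the sample numbers here are hypothetical, loosely echoing the dealer example):

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination via the sum-of-squares identity."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    ss_t = ((y - y.mean()) ** 2).sum()  # total variation, SSt
    ss_e = ((y - y_hat) ** 2).sum()     # residual variation, SSe
    return (ss_t - ss_e) / ss_t         # = SSr / SSt

y     = [80, 97, 129, 110]          # hypothetical actual values
y_hat = [83.4, 96.5, 121.6, 105.0]  # hypothetical fitted values
print(round(r_squared(y, y_hat), 3))
```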

Source of Variation

• Explained by regression

– Degrees of freedom: k − 1, where k = number of estimated constants (variables)

– Sum of squares: SS_r

– Mean square: SS_r/(k − 1)

• Unexplained by regression

– Degrees of freedom: n − k, where n = number of observations

– Sum of squares: SS_e

– Mean square: SS_e/(n − k)

r² in the Example

$$r^2 = \frac{3{,}398.49}{3{,}882.4} = .875$$

Multiple Regression

• Extension of Bivariate Regression

• Multidimensional when three or more variables are involved

• Simultaneously investigates the effect of two or more variables on a single dependent variable

• Discussed in Chapter 24

Correlation: Player Salary and Ticket Price (figure): yearly changes in ticket price and player salary, 1995 to 2001; correlation coefficient r = .75.
