Week 10 Hypothesis Testing on Two Samples UMN Lecturer Team


RSO@9/9/2012

Probability and Statistics


Objectives

• Last week, we studied hypothesis testing of a single mean against a particular value


Why do we need two samples?

• Some experiments are made on two different samples:

– An engineer wants to test aluminum quality under two different temperatures.

– A UMN student wants to compare the results of an information system implementation between two groups of users: Marketing and Finance

Independent samples

Two samples are independent when they are drawn from two different populations and no member of the first sample is related in any way to a member of the other sample.

Example:

• A student wants to test the quality of light bulbs from two brands → two samples from different populations → to find the difference between the two populations

• A student takes a group of Papua students and a group of Java students and tests whether they have the same consumption behavior → two samples from different populations


When do we know that our samples are different from each other?

• A difference between two samples shows up as a difference between their means

• The sampling distributions of both sample means must be (approximately) normal

• This holds when both sample sizes are at least 30

• In that case the shape of the populations does not matter

Central limit theorem
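The central limit theorem claim above can be checked with a short simulation. This is a minimal sketch, assuming an exponential population with mean 1 and standard deviation 1 (a strongly right-skewed shape chosen only for illustration) and samples of size 30; the helper name `sample_means` is hypothetical.

```python
import math
import random
import statistics

def sample_means(n, reps, seed=0):
    """Return the means of `reps` random samples of size n drawn from an
    Exponential(1) population (right-skewed, mean 1, standard deviation 1)."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

n = 30
means = sample_means(n, reps=5000)
se = 1 / math.sqrt(n)  # theoretical standard error sigma / sqrt(n)

# Share of sample means within 1.96 standard errors of the population mean;
# for a normal sampling distribution this share would be about 0.95.
inside = sum(abs(m - 1.0) <= 1.96 * se for m in means) / len(means)

print(round(statistics.fmean(means), 3))  # near the population mean 1
print(round(statistics.stdev(means), 3))  # near 1 / sqrt(30), about 0.183
print(round(inside, 3))                   # near 0.95 despite the skewed population
```

Even though individual values are far from normal, the sample means cluster around the population mean with spread close to σ/√n.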


Example

• We want to test whether the salaries of advertising managers differ from the salaries of auditing managers

• Here is the data we have:

Steps of hypothesis testing

• Step-1: Hypothesize

• Step-2: Choose a proper test statistic

• Step-3: Specify the value of α

• Step-4: Establish the decision region

• Step-5: Gather sample data

• Step-6: Analyze the sample data

• Step-7: Reach a statistical conclusion


Step-1

• Hypothesize:

– When we only want to know whether the means differ, use H₀: µ₁ = µ₂ and H₁: µ₁ ≠ µ₂

– When we want to test a specific size of difference, use H₀: µ₁ − µ₂ = δ and H₁: µ₁ − µ₂ ≠ δ, where δ is the hypothesized difference

• For this example:

• H₀: µ₁ = µ₂

• H₁: µ₁ ≠ µ₂

STEP-2: Choose a proper test statistic

The z distribution is used when

• the data are normally distributed

• σ is known

• Sample statistic = x̄

The t distribution is used when

• σ is unknown

• Sample statistic = x̄

For the difference between two population means, the sample statistic is x̄₁ − x̄₂.
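When σ₁ and σ₂ are known and the two samples are independent, the usual textbook form of the z statistic for the difference between two means is shown below; under H₀: µ₁ = µ₂ the term µ₁ − µ₂ is 0.

$$
z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}
$$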


Step-3: Specify the value of α (here α = 0.05)


Step 4: Establish the decision region

• Because it is a two-tailed test, we use α/2 = 0.025 → z = 1.96

• Decision region:

– Rejection region: Z < −1.96 or Z > 1.96

– Non-rejection region: −1.96 ≤ Z ≤ 1.96
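The critical value 1.96 comes from the standard normal table. A minimal sketch using only Python's standard-library `statistics.NormalDist` reproduces it and applies the two-tailed decision rule; the helper names `z_critical` and `in_rejection_region` are hypothetical.

```python
from statistics import NormalDist

def z_critical(alpha, two_tailed=True):
    """Upper critical value of the standard normal distribution for level alpha."""
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

def in_rejection_region(z, alpha=0.05):
    """Two-tailed decision: True when z falls outside [-z_crit, z_crit]."""
    crit = z_critical(alpha)
    return z < -crit or z > crit

print(round(z_critical(0.05), 2))  # 1.96
```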

Step 5: gather sample data


Step 6: Analyze the sample data

• Compute the test statistic

Under H₀ we assume µ₁ = µ₂, so µ₁ − µ₂ = 0
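Computing the standard two-sample z statistic takes only a few lines. The salary figures are not listed in this text, so this sketch uses made-up inputs chosen to give a standard error of exactly 1; the function name `two_sample_z` is hypothetical, and it assumes independent samples with known population variances.

```python
from math import sqrt

def two_sample_z(xbar1, xbar2, var1, var2, n1, n2, hyp_diff=0.0):
    """z statistic for the difference of two independent sample means
    when the population variances var1 and var2 are known.
    hyp_diff is the value of mu1 - mu2 under H0 (0 when H0: mu1 = mu2)."""
    standard_error = sqrt(var1 / n1 + var2 / n2)
    return ((xbar1 - xbar2) - hyp_diff) / standard_error

# Hypothetical numbers for illustration only (not the salary data):
# standard error = sqrt(4/16 + 9/12) = 1, so z = (10 - 8) / 1 = 2.0
z = two_sample_z(xbar1=10, xbar2=8, var1=4, var2=9, n1=16, n2=12)
print(z)  # 2.0
```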

Step 7: Statistical Conclusion

• Rejection region: Z < −1.96 or Z > 1.96

• z = 2.35

• → z > 1.96

• → z is in the rejection region

• → the null hypothesis is rejected

• → H₁ is accepted
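The same conclusion can be reached with a p-value instead of a rejection region: reject H₀ when the two-tailed p-value is below α = 0.05. This is a supplementary sketch (the slides use the critical-value approach); `two_tailed_p_value` is a hypothetical name, and z = 2.35 is the value computed above.

```python
from statistics import NormalDist

def two_tailed_p_value(z):
    """Probability of a standard normal value at least as extreme as |z|."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

p = two_tailed_p_value(2.35)   # z from the salary example
print(round(p, 4))             # about 0.0188
print(p < 0.05)                # True: reject H0, same decision as the rejection region
```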


Step 8: Make a business decision

• x̄₁ = $70.700 (advertising managers) and x̄₂ = $62.187 (auditing managers)

• The salaries of advertising managers are indeed different from the salaries of auditing managers

• Because x̄₁ > x̄₂, it can be concluded that

• Advertising managers' salaries are higher than auditing managers'

Confidence interval to estimate µ₁-µ₂

How big is the difference?


Example

Answer

• For a 98% CI, α = 2% → P(−z < Z < z) = 98% → z = 2.33
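With known variances the interval is (x̄₁ − x̄₂) ± z·√(σ₁²/n₁ + σ₂²/n₂). A minimal sketch, assuming known population variances and made-up inputs (the example's data are not given in this text); `z_confidence_interval` is a hypothetical name, and the 98% critical value is taken from `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(xbar1, xbar2, var1, var2, n1, n2, confidence=0.98):
    """Confidence interval for mu1 - mu2 with known population variances."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 2.326 for 98%
    margin = z * sqrt(var1 / n1 + var2 / n2)
    diff = xbar1 - xbar2
    return diff - margin, diff + margin

# Hypothetical inputs (standard error = 1), not the example's data
low, high = z_confidence_interval(10, 8, 4, 9, 16, 12)
print(round(low, 2), round(high, 2))  # -0.33 4.33
```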


Self Study

Homework

• For the Monday class:

– Do 10.4 if your NIM ends with an even number

– Do 10.5 if your NIM ends with an odd number

• For the Thursday class:


Estimating differences between two means when their variances are unknown

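• We don't know σ₁ and σ₂. What then?

• We can assume that σ₁ = σ₂.

• With this assumption, we pool the two sample variances and use:

$$
t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad s_p^2 = \frac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2}, \qquad df = n_1 + n_2 - 2
$$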

Example

A company runs a training program. The training is delivered using two different methods: method A and method B.

The HR manager wants to know whether there is a significant difference between the two methods.


Step-1

• Hypothesize:

• H₀: µ₁ = µ₂

• H₁: µ₁ ≠ µ₂

STEP-2: Choose a proper test statistic

The z distribution is used when

• σ is known

• Sample statistic = x̄

The t distribution is used when

• σ is unknown

• Sample statistic = x̄

For the difference between two population means, the sample statistic is x̄₁ − x̄₂.


Step-3: Specify the value of α (here α = 0.05)


• Because it is a two-tailed test, we use α/2 = 0.025, df = n₁ + n₂ − 2 = 15 + 12 − 2 = 25

• t_0.025,25 = ±2.060

– Rejection region: t < −2.060 or t > 2.060

– Non-rejection region: −2.060 ≤ t ≤ 2.060


Under H₀ we assume µ₁ = µ₂, so µ₁ − µ₂ = 0
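The t value −5.20 can be reproduced from the figures that appear in this example: n₁ = 15, n₂ = 12, x̄₁ = 47.73, x̄₂ = 56.50 and sample variances 19.495 and 18.273. A minimal sketch of the pooled-variance t statistic (it assumes σ₁ = σ₂ and independent samples); `pooled_t` is a hypothetical helper name.

```python
from math import sqrt

def pooled_t(xbar1, xbar2, var1, var2, n1, n2, hyp_diff=0.0):
    """Pooled-variance t statistic (assumes sigma1 = sigma2, independent samples).
    Returns the t value and its degrees of freedom n1 + n2 - 2."""
    df = n1 + n2 - 2
    pooled_var = (var1 * (n1 - 1) + var2 * (n2 - 1)) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return ((xbar1 - xbar2) - hyp_diff) / standard_error, df

# Training example: method A vs method B (sample variances 19.495 and 18.273)
t, df = pooled_t(xbar1=47.73, xbar2=56.50, var1=19.495, var2=18.273, n1=15, n2=12)
print(round(t, 2), df)  # -5.2 25
```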

• Rejection region: t < −2.060 or t > 2.060

• t = −5.20

• → t < −2.060

• → t is in the rejection region

• → the null hypothesis is rejected


• µ₁ ≠ µ₂: the difference is significant

• x̄₁ = 47.731 (method A) and x̄₂ = 56.5 (method B)

• Because x̄₁ < x̄₂, it can be concluded that

• Method B is more effective than method A

How big is the difference?

We can estimate µ₁ − µ₂ with a confidence interval when:

• both populations are normally distributed

• both σ are unknown

• σ₁ = σ₂ is assumed


Example

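• For the previous problem, a 95% confidence interval (the same α = 0.05 as in the test, so t_0.025,25 = 2.060) is:

$$
(47.73 - 56.50) - 2.060\sqrt{\frac{19.495(15 - 1) + 18.273(12 - 1)}{15 + 12 - 2}\left(\frac{1}{15} + \frac{1}{12}\right)} \le \mu_1 - \mu_2 \le (47.73 - 56.50) + 2.060\sqrt{\frac{19.495(15 - 1) + 18.273(12 - 1)}{15 + 12 - 2}\left(\frac{1}{15} + \frac{1}{12}\right)}
$$

$$
-12.24 \le \mu_1 - \mu_2 \le -5.30
$$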
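A minimal sketch of the pooled-variance confidence interval for the training example, assuming a 95% level so that it matches the α = 0.05 test. Python's standard library has no t quantile function, so the table value t_0.025,25 = 2.060 is passed in; `pooled_t_interval` is a hypothetical name.

```python
from math import sqrt

def pooled_t_interval(xbar1, xbar2, var1, var2, n1, n2, t_crit):
    """Confidence interval for mu1 - mu2 assuming sigma1 = sigma2.
    t_crit is the table value t_{alpha/2, n1+n2-2}; the computed test
    statistic must not be used here."""
    df = n1 + n2 - 2
    pooled_var = (var1 * (n1 - 1) + var2 * (n2 - 1)) / df
    margin = t_crit * sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = xbar1 - xbar2
    return diff - margin, diff + margin

# 95% interval (alpha = 0.05, as in the test), t_0.025,25 = 2.060 from the t table
low, high = pooled_t_interval(47.73, 56.50, 19.495, 18.273, 15, 12, t_crit=2.060)
print(round(low, 2), round(high, 2))  # -12.24 -5.3
```

Because the whole interval lies below 0, it agrees with the test's conclusion that method B has the higher mean.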


Homework

– Do 10.18 if your NIM ends with 0, 3, 6, or 9

– Do 10.19 if your NIM ends with 1, 4, or 7

– Do 10.20 if your NIM ends with 2, 5, or 8

When samples are not independent

• Samples are dependent (related) when, for example:

– The same people or objects are measured before and after an experiment


Hypothesis testing for dependent samples

• The approach for dependent samples is different from the one for independent samples

• Pair up the related members

• Calculate the difference within each pair
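Pairing the members and taking differences leads to the paired t statistic t = (d̄ − D)/(s_d/√n) with df = n − 1, where d̄ and s_d are the mean and standard deviation of the differences and D is the hypothesized mean difference (0 under H₀). A minimal sketch with made-up numbers, since the nine companies' P/E data are not listed in this text; `paired_t` is a hypothetical name.

```python
import math
import statistics

def paired_t(before, after, hyp_diff=0.0):
    """Paired-sample t statistic: t = (d_bar - D) / (s_d / sqrt(n)), df = n - 1,
    where d = before - after for each related pair."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    d_bar = statistics.fmean(diffs)
    s_d = statistics.stdev(diffs)
    return (d_bar - hyp_diff) / (s_d / math.sqrt(n)), n - 1

# Hypothetical P/E ratios for three companies in year 1 and year 2
year1 = [10, 12, 15]
year2 = [8, 8, 9]
t, df = paired_t(year1, year2)   # differences are 2, 4, 6
print(round(t, 3), df)           # 3.464 2
```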


Example

An investor on the stock exchange wants to know whether there is a significant difference between the P/E (price-to-earnings) ratios of two consecutive years.

Nine companies are selected at random, no prior information is available, and α = 1% is assumed. The data are normally distributed.


• Because it is a two-tailed test, we use α/2 = 0.005, df = n − 1 = 9 − 1 = 8

• t_0.005,8 = ±3.355

– Rejection region: t < −3.355 or t > 3.355

– Non-rejection region: −3.355 ≤ t ≤ 3.355


• Rejection region: t < −3.355 or t > 3.355

• t = −0.70

• → −3.355 ≤ t ≤ 3.355

• → t is in the non-rejection region

• → the null hypothesis is not rejected


There is no significant difference in the average P/E ratio between year 1 and year 2.


Self Study

Homework