STAT170: INTRODUCTORY STATISTICS

(1)

Nowshin Hassan STAT170 – Introductory Statistics Session One

1

S T A T 1 7 0 : I N T R O D U C T O R Y S T A T I S T I C S

CHAPTER 7: HYPOTHESIS TESTS FOR A POPULATION MEAN z-test for a Population Mean

We often want to answer a research question pertaining to some target population of interest. We design a study carefully, and obtain a representative sample, random if possible, from the target population. We use this sample to make inferences about the population and thus to answer the research question.

Answering a Research Question

Example: ‘Is the mean IQ of children who attend country schools the same as that of the general population?’

Let Y = IQ scores of country students (variable of interest)

We know that in the general population, IQ scores follow a normal distribution with 𝜎=15. We will assume this also applies to the target population. So the sample mean will also be from a normal distribution, regardless of sample size. We had a random sample of 36 students from country schools in NSW with sample mean, 𝑦=103.

Null Hypothesis

A null hypothesis is a statement (or claim) that a population parameter has a specified value:

𝐻_!: 𝜇=𝜇_!

That is, the null hypothesis states that a population parameter (in this case 𝜇) is equal to a specific value (which we are calling 𝜇_! here).

o For our research question, the null hypothesis would claim that the mean IQ score of country school students is the same as the general population, i.e. equal to 100. This is because the population mean is known to be 100, and we want to determine if this is the same for country school students.

o ‘Null’ means ‘no effect’ or ‘no change’

o We are told IQ scores are designed to be normally distributed with a 𝜇=100, and 𝜎 =15. If there has been no change in IQ scores over the years, it would be reasonable to hypothesise that the mean IQ score

remained at 100 – this is the null hypothesis.

o So we express this null hypothesis as: 𝐻_!: 𝜇 =100 (null value)

To test this hypothesis – take a sample of students measure IQ and determine mean. Compute a 95% confidence interval and check to see whether mean lies between interval values.

⇒ Yes it does lie b/w both values = sample consistent with null hypothesis

⇒ No it doesn’t lie b/w both values = null value is not a plausible value for population mean Alternative Hypothesis

We also need to specify an alternative hypothesis, 𝑯𝟏 in case we find evidence against the null hypothesis, 𝑯𝟎. è If we were only prepared to reject a null hypothesis when we found evidence to indicate the population

mean was significantly lower than the null value, 𝝁𝟎, the alternative hypothesis would be: 𝑯𝟏: 𝝁<𝝁_𝟎 = (one-‐sided test à p-‐value on one side of distribution)

è If were only prepared to reject a null hypothesis when we found evidence to indicate the population mean was significantly higher than 𝝁𝟎, the alternative hypothesis would be 𝑯𝟏: 𝝁>𝝁_𝟎 (one-‐sided test)

Note: In STAT170, we will NOT use one-‐sided tests – we will only use 2-‐sided tests. These are tests where we are prepared to reject a null hypothesis when we find evidence indicating the population parameter is either higher or lower than 𝝁𝟎. Therefore the alternative hypothesis for this test of a population mean will be:

𝑯_𝟏: 𝝁≠𝝁_𝟎.

Example Continued: Country School Students Hypothesis:

𝐻_!: 𝜇=100 𝐻_!:𝜇≠100 Checking Test Assumptions

We need to ensure that a z-‐test for a population mean will be valid here.

The assumption for a z-‐test for population mean is that the sample mean is drawn from a normal distribution.

v This will be a reasonable assumption if the sample is large enough for the Central Limit Theorem to apply (which is the case here, although normality had been given). If the sample is not large enough, we should

(2)

2 graph some data and is it appears that the sample mean may be drawn from a normal distribution, then we can assume the sample mean should also be drawn from a normal distribution.

v Regardless of sample size, it is always good practice to start your analysis by graphing data Testing a Null Hypothesis

To test a null hypothesis for a pop. mean, we compare the sample value 𝑦, with the corresponding null value, 𝜇!.

For the research question about the IQ scores of country school students, the null value, 𝜇_!=100, and the sample mean 𝑦 =103.

Testing this null hypothesis is the same as asking the question:

‘If the null hypothesis is true (i.e. country school students have IQ scores the same as the general population), how likely is a sample mean to be 3 points (or more) away from the population mean, in either direction?’

A Test Statistic (a.k.a z-statistic)

A test statistic is a scaled measure of the discrepancy between the observed sample statistic and null value (𝐻!).

⇒ Dividing the discrepancy by the standard error gives a result that is independent of the measurement unit For a one-‐sample test of a mean where Y is the variable of interest, the test statistic takes the form:

𝑇𝑒𝑠𝑡 𝑠𝑡𝑎𝑡𝑖𝑠𝑡𝑖𝑐= 𝑦−𝜇_! 𝑠𝑡.𝑒𝑟𝑟𝑜𝑟 𝑜𝑓𝑦

So the test statistic tells us how many standard errors the sample mean, 𝒚, is away from the null value, 𝝁𝟎. Test Statistic for One Sample z-test of a Population Mean

So if 𝜎 is known, we can use a one-‐sample z test for a population mean. The test statistic is:

z= 𝑦−𝜇_! 𝜎

𝑛

For our example we know 𝜎 =15 and we are testing 𝜇!=100 vs. 𝜇!≠100. With the sample size n = 36 and the sample mean 𝑦 =103, the test statistic is (absolute value):

𝑧= 103−100 15

36

=1.2

I.e. A sample mean IQ of 103 is 1.2 standard errors from 𝜇!. Size of a Test Statistic

∆ If the absolute value of the test statistic is large, it means there is a large discrepancy b/w the null value and the sample value à more likely to reject null hypothesis.

∆ If the absolute value of the test statistic is small, it means the null value is consistent with the data in the sample, i.e. the discrepancy could be attributable to chance – sampling variation à more likely to accept null hypothesis

A more precise measure of discrepancy is given by the p-‐value. [Smallest value of z-‐statistic is 0]

What is a p-value?

P-‐value: the likelihood/probability of getting such a large test statistic (or even larger), assuming that the null hypothesis is true (probability of getting a value under the normal curve more extreme than z-‐statistic in either direction

o The test statistic, z = 1.2 is a measure of the discrepancy between the sample mean, 𝑦=103 and the null value, 𝜇!=100. The p-‐value will tell us how likely we are to select a sample of size 36 which has a sample mean at least 1.2 standard errors away from the null value, if 𝐻! is true.

o The p-‐value is tail area of distribution of the test statistic, assuming 𝐻! is true. For a 2-‐sided z-‐test of a population mean, the p-‐value is the area in both tails of the z distribution because the measure of discrepancy could have been either side of the mean

Note: When z is negative, its absolute value is taken to get the area in the right tail Using a z-table to find the p-value

For this research question we have a test statistic of z = 1.2. Using the standard normal distribution, the total area in the 2 tails of the z distribution to the right of z = 1.2 and to the left of z = -‐1.2 is 0.2302 (p-‐value doubled – 2-‐tail).

The p-‐value is 0.2302. It is the probability of getting a z-‐value with a magnitude greater than 1.2 if 𝐻! is true.

[23% probability of getting a value different from the general pop.]

Significance Level & Decisions A p-‐value may be used to make a decision as follows:

(3)

3

∆ If a p-‐value is small (< 0.05 – this value is called the ‘significance level’), you reject the null hypothesis à the result is ‘statistically significant’

∆ If the p-‐value is large (≥ 0.05) you say the null hypothesis could be true but you do NOT accept it

Example: For the country students the correct decision would be to not reject 𝐻! because it is > 0.05 (not less than the 5% significance level)

SUMMARY

For a two-‐tailed test, a p-‐value obtained from a z-‐statistic by determining the area under the standardised normal curve in the 2 tails above 𝑧 and below -‐𝑧 . This area may be calculated from the z-‐table.

A Significance Result Using a significance level of 5%

(i.e. 𝛼=0.05).

‘If p-‐value is < 0.05, Reject 𝐻!!’ à

Statistically Significant Result Writing a Conclusion Example: Conclusion

Since we do not reject the null hypothesis we do not have evidence to claim that country school students have a different average IQ score to that of the general population. We conclude that, on average IQ scores could be the same among country school students as the general population.

Example: One Sample z-‐test for Population Mean

Is the average stopping distance of school buses travelling 75km/h equal to 80 metres as claimed by public transport authorities?

𝑌=𝑠𝑡𝑜𝑝𝑝𝑖𝑛𝑔 𝑑𝑖𝑠𝑡𝑎𝑛𝑐𝑒 𝑖𝑛 𝑚𝑒𝑡𝑟𝑒𝑠 𝜎 =1.2,𝑛 =20,𝑦=80.74,𝜇=80 Hypothesis Test:

H:

𝐻_!:𝜇=80 𝐻_!:𝜇≠80

A: From the histogram, the normality assumption appears to be

reasonable. The sample appears to be drawn from a normally distributed population.

T: z = ^!!!! ^!

! = !".!"!!"

!.!

!"

=2.76 P: p-‐value = 0.0029 x 2 = 0.0058 D: Since p-‐value < 0.05, reject 𝐻!

C: There is evidence to suggest that the average stopping distance for all school buses is more than 80 metres.

p-values and 95% Confidence Intervals

A confidence interval can be used to test a null hypothesis:

o If the confidence interval does NOT include the null value à decision is to reject 𝑯_𝟎 o If the confidence interval does include the null value à decision is do not reject 𝑯_𝟎

A p-‐value is equivalent to this rule: If the p-‐value is < 0.05, 𝐻_! is rejected. If the p-‐value is ≥0.05,𝐻_! is not rejected.

t-test for a Population Mean

A z-‐test for a population mean is only used when the population standard deviation (𝝈) is known. More often when 𝜎 is unknown it can be estimated using the sample standard deviation, s.

» When 𝜎 is unknown, the z-‐statistic should be replaced by a t-‐statistic with v= n – 1 degrees of freedom (df):

𝑡= ^!!!!

!

à this is called the student’s t statistic

» The p-‐value for a t-‐test is obtained from the t-‐distribution Finding a p-value from a t-table

(4)

4 Finding the range for a p-value from a t-

table

Suppose you have conducted a t-‐test for a population mean and have 𝑡 =2.31(since we are only doing two-‐

sided tests you will get the same p-‐value for t = 2.31 or t

= -‐ 2.31). Say the sample size n = 11, then v is 10 (n-‐1).

If the t-‐value had been 2.764 the p-‐value would have been 0.02. If the t-‐value had been 2.228 the p-‐value would have been 0.05. Since the t-‐value is b/w 2.764 and 2.228, the p-‐value must be b/w 0.02 and 0.05.

0.02 < p-‐value < 0.05

Note: In Minitab, it calculates the exact p-‐value. ALSO If the test is two-‐tailed it will be necessary to double the p-‐

value obtained from the table.

It is always better to supplement a p-‐value by showing a confidence interval. A 95% confidence interval for a population mean when the population standard deviation is unknown is:

95% 𝐶𝑜𝑛𝑓𝑖𝑑𝑒𝑛𝑐𝑒 𝐼𝑛𝑡𝑒𝑟𝑣𝑎𝑙𝑓𝑜𝑟 𝜇= 𝑦−𝑡_!"#$× 𝑠 𝑛 Statistical Significance

Statistically significant: used to describe the result of testing a null hypothesis, which gives a p-‐value < 0.05 Testing that a population has a specified proportion

In this case the null hypothesis takes the form:

𝐻_! =𝜋=𝜋_!

When checking for test assumptions (involving a population proportion), the minimum quantities, 𝑛𝜋! and 𝑛 (1− 𝜋_!) should be ≥5, otherwise the normality assumption is not reasonable.

A z-‐statistic is used to show the disagreement b/w the sample proportion and the null value, 𝜋!. It is shown as follows:

𝑧=𝑝−𝜋_!

𝑠𝑒_! = 𝑝−𝜋_! 𝜋_!(1−𝜋_!)

𝑛