WEEK 3 LECTURES (15/03/16 & 17/03/16)

Data analysis: comparing means

From histograms to distributions

Population distribution (i.e. distribution of individuals)

• The standard deviation is just the square root of the variance

• The mean and standard deviation allow us to describe the population distribution (which is normally distributed)

• It is centered on its mean

• Most of the individuals (about 95% for a normal distribution) have heights within 2 standard deviations of the mean

• Thus, by knowing the mean and standard deviation, you are able to describe the distribution of the heights of the individuals
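
As a rough illustration of the points above, the sketch below computes the mean, variance, and standard deviation of a made-up population and checks what fraction lies within 2 standard deviations of the mean. The population (10,000 heights with mean 170cm and standard deviation 10cm) and the helper name `describe` are assumptions chosen for the example, not values from the lecture.

```python
import math
import random

def describe(values):
    """Return (mean, variance, standard deviation) of a full population."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n  # population variance
    return mean, variance, math.sqrt(variance)           # SD = sqrt(variance)

# Hypothetical population: 10,000 heights (cm) drawn from a normal distribution.
rng = random.Random(0)
heights = [rng.gauss(170, 10) for _ in range(10_000)]

mean, variance, sd = describe(heights)

# Fraction of individuals whose height is within 2 SD of the mean.
within_2sd = sum(abs(h - mean) <= 2 * sd for h in heights) / len(heights)

print(f"mean={mean:.1f}cm  sd={sd:.1f}cm  within 2 SD={within_2sd:.1%}")
```

For normally distributed data the printed fraction should come out close to 95%.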

Testing sample mean

• Left histogram describes the individual heights

• Right histogram describes successive measurements of the average height of 20 individuals

• The standard deviation of the right histogram is called the standard error (SM). It is related to the standard deviation (SD) of the distribution of individuals as follows (n = number of people in each sample): SM = SD / √n
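
The relationship between the two histograms can be checked with a small simulation. This is a sketch under assumed values (population mean 170cm, SD 10cm, samples of n = 20, 5,000 repeated samples); none of these numbers except n = 20 come from the lecture.

```python
import math
import random
import statistics

rng = random.Random(1)
POP_MEAN, POP_SD, N = 170.0, 10.0, 20  # assumed values for illustration

# Repeatedly sample N individuals and record each sample's mean height.
# These means play the role of the right-hand histogram.
sample_means = [
    statistics.fmean(rng.gauss(POP_MEAN, POP_SD) for _ in range(N))
    for _ in range(5_000)
]

observed_se = statistics.pstdev(sample_means)  # spread of the sample means
predicted_se = POP_SD / math.sqrt(N)           # SM = SD / sqrt(n)

print(f"observed SM={observed_se:.2f}  predicted SM={predicted_se:.2f}")
```

The two printed values should agree closely, which is why sample means vary much less than individual heights.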

Importance of standard error

• Needed to answer questions about the mean of a sample

• Example: suppose you wanted to know whether, on average, members of a particular town were taller than 170cm

• Construct two hypotheses

o H0: the heights of individuals in this town are normally distributed with a mean of 170cm

o H1: on average the individuals are taller than 170cm

• H0 is the null hypothesis

• H1 is the alternative hypothesis

• Initially you assume the null hypothesis is true and on this basis calculate the probability of measuring a sample mean (M) greater than or equal to the one you actually measured

• If this probability is very small, you conclude that H0 is unlikely to be true, and reject it in favor of H1

• In other words, you conclude that on average individuals in the town must be taller than 170cm

Calculating the probability

• Initially you assume that H0 is true (i.e. you assume that mean = 170cm) and calculate the t statistic based on this assumption

• The t statistic represents how much greater (or less) the sample mean (M) is than the hypothetical mean (170), relative to the standard error: t = (M - 170) / SM

• If the value of the t statistic is large, that implies that, relative to the standard error, the difference between M and the hypothetical mean is large

Critical values of the t statistic

• Look at a t table to determine how large the t statistic must be for you to reject H0 in favor of H1

• If the value of your t statistic exceeds the critical value that you look up in the t table, then you reject H0 in favor of H1
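
The procedure above can be sketched in a few lines. The four heights, the function name `one_sample_t`, and the small lookup table are assumptions made for this example; the table holds standard one-tailed 5% critical values for 1 to 5 degrees of freedom, standing in for a full t table.

```python
import math
import statistics

# One-tailed critical values of t at the 5% significance level, keyed by
# degrees of freedom (standard t-table entries; only a few rows shown).
T_CRIT_ONE_TAILED_05 = {1: 6.314, 2: 2.920, 3: 2.353, 4: 2.132, 5: 2.015}

def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: mean == mu0."""
    n = len(sample)
    m = statistics.fmean(sample)       # sample mean M
    s = statistics.stdev(sample)       # sample SD (n - 1 in the denominator)
    se = s / math.sqrt(n)              # standard error SM
    return (m - mu0) / se, n - 1

heights = [172.0, 176.0, 174.0, 178.0]  # hypothetical sample, in cm
t_stat, df = one_sample_t(heights, 170.0)

critical = T_CRIT_ONE_TAILED_05[df]
reject_h0 = t_stat > critical           # one-tailed test: H1 says mean > 170

print(f"t={t_stat:.3f}  df={df}  critical={critical}  reject H0: {reject_h0}")
```

The comparison is one-sided because H1 only claims that the mean is greater than 170cm; a two-tailed test would use a different column of the t table.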

Worked example

• Should we reject H0?

o For our sample, there are n-1 = 3 degrees of freedom (df)

o We wish to perform the test at the 5% significance level and we want the test to be one-tailed

o From the t table, the critical value for df = 3 (one-tailed, 5% level) is 2.353

o Because your calculated t statistic is greater than this critical value, you reject H0

o You conclude from your data that the mean age in the town is greater than 40

• What if our result is not significant?

o Suppose your calculated t statistic was less than this critical value

o You are unable to reject H0

What is a significance level?

• When you test a null hypothesis, you do so at a certain significance level

• This significance level is the probability of rejecting the null hypothesis when it is in fact true

• In other words, it is the probability of making a type 1 error (concluding that the null hypothesis H0 is false when it is in fact true)

• The greater the critical value for the t statistic, the less likely you are to make a type 1 error
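
One way to see what the significance level means is to simulate many experiments in which H0 really is true and count how often it is rejected anyway. The sketch below assumes a population with mean 170cm and SD 10cm, samples of n = 4, and the one-tailed 5% critical value for df = 3 (2.353); all of these are illustrative choices.

```python
import math
import random
import statistics

rng = random.Random(2)
MU0, SD, N = 170.0, 10.0, 4  # assumed values; H0 is true by construction
T_CRIT = 2.353               # one-tailed, 5% level, df = N - 1 = 3
TRIALS = 20_000

rejections = 0
for _ in range(TRIALS):
    # Draw a sample from a population whose mean really is MU0.
    sample = [rng.gauss(MU0, SD) for _ in range(N)]
    se = statistics.stdev(sample) / math.sqrt(N)
    t_stat = (statistics.fmean(sample) - MU0) / se
    if t_stat > T_CRIT:
        rejections += 1      # H0 rejected although it is true: type 1 error

type1_rate = rejections / TRIALS
print(f"type 1 error rate = {type1_rate:.3f}")
```

The printed rate should land near 0.05, matching the chosen significance level. Raising `T_CRIT` lowers the rate, which is the point made in the last bullet above.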
