WEEK 3 LECTURES (15/03/16 & 17/03/16) Data analysis: comparing means
From histograms to distributions
Population distribution (i.e. distribution of individuals)
• The standard deviation is just the square root of the variance
• The mean and standard deviation allow us to describe the population distribution (assumed here to be normal)
• It is centered on its mean
• Most individuals (about 95%) have heights within 2 standard deviations of the mean
• Thus, by knowing the mean and standard deviation, you are able to describe the distribution of the heights of the individuals
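The "within 2 standard deviations" rule can be made precise. Below is a minimal Python sketch using only the standard library. It assumes the heights are exactly normally distributed, and the function name `proportion_within` is hypothetical. For a normal distribution, the fraction of individuals within k standard deviations of the mean is erf(k/√2), whatever the actual mean and standard deviation are.

```python
import math


def proportion_within(k: float) -> float:
    """Fraction of a normal distribution lying within k standard deviations of its mean.

    P(|X - mean| < k * sd) = erf(k / sqrt(2)); the result does not depend
    on the particular mean or standard deviation.
    """
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {proportion_within(k):.3f}")
```

This gives roughly 68%, 95% and 99.7% for 1, 2 and 3 standard deviations respectively.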
Testing a sample mean
• Left histogram describes the individual heights
• Right histogram describes successive measurements of the average height of 20 individuals
• The standard deviation of the right histogram is called the standard error ($S_M$). It is related to the standard deviation of the distribution of individuals ($\sigma$) as follows, where n is the number of people in each sample: $S_M = \sigma / \sqrt{n}$
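The relationship between the standard error and the standard deviation of individuals, $S_M = \sigma / \sqrt{n}$, can be checked by simulation. This is a sketch under assumed values: the population mean (170cm) and standard deviation (10cm) are hypothetical, and n = 20 matches the right-hand histogram. The code repeatedly draws samples of 20 people and compares the spread of their sample means with the formula.

```python
import random
import statistics

# Hypothetical population parameters (not from the lecture): heights in cm
POP_MEAN = 170.0
POP_SD = 10.0
N = 20            # people per sample, as in the right-hand histogram
N_SAMPLES = 5000  # number of repeated samples

random.seed(1)  # fixed seed so the run is reproducible

# Draw many samples of N individuals and record each sample's mean height
sample_means = [
    statistics.fmean(random.gauss(POP_MEAN, POP_SD) for _ in range(N))
    for _ in range(N_SAMPLES)
]

simulated_se = statistics.stdev(sample_means)  # spread of the sample means
predicted_se = POP_SD / N ** 0.5               # S_M = sigma / sqrt(n)

print(f"simulated SE: {simulated_se:.2f}")
print(f"predicted SE: {predicted_se:.2f}")
```

The two values should agree closely (10/√20 ≈ 2.24cm). Sample means are much less spread out than individual heights.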
Importance of standard error
• Needed to answer questions about the population mean based on the mean of a sample
• Example: we want to know whether, on average, people in a particular town are taller than 170cm
• Construct two hypotheses
o H0: the heights of individuals in this town are normally distributed with a mean of 170cm
o H1: on average the individuals are taller than 170cm
• H0 is the null hypothesis
• H1 is the alternative hypothesis
• Initially you assume the null hypothesis is true and, on this basis, calculate the probability of obtaining a sample mean (M) greater than or equal to the one you actually measured
• If this probability is very small, you conclude that H0 cannot be true, and reject it in favor of H1
• In other words, you conclude that on average individuals in the town must be taller than 170cm
Calculating the probability
• Initially you assume that H0 is true (i.e. you assume that the population mean is 170cm) and calculate the t statistic based on this assumption: $t = (M - 170) / S_M$
• The t statistic represents how much greater (or less) the sample mean (M) is than the hypothetical mean (170), relative to the standard error
• If the value of the t statistic is large, then the difference between M and the hypothetical mean is large relative to the standard error
Critical values of the t statistic
• Use a t table to determine how large the t statistic must be for you to reject H0 in favor of H1 (this value is called the critical value)
• If your calculated t statistic exceeds the critical value from the t table, you reject H0 in favor of H1
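The procedure above (compute t, look up the critical value, compare) can be sketched in a few lines of Python. Some assumptions apply. As is usual for the t test, the standard error is estimated from the sample standard deviation s (so $S_M = s / \sqrt{n}$). The small table holds standard one-tailed 5% critical values for df = 1 to 10 only. The helper names `t_statistic` and `reject_h0` and the height data are hypothetical, for illustration only.

```python
import math
import statistics

# One-tailed 5% critical values of t, indexed by degrees of freedom
# (standard t-table values; only df 1-10 included in this sketch)
T_CRIT_ONE_TAILED_5PC = {
    1: 6.314, 2: 2.920, 3: 2.353, 4: 2.132, 5: 2.015,
    6: 1.943, 7: 1.895, 8: 1.860, 9: 1.833, 10: 1.812,
}


def t_statistic(sample: list[float], hypothetical_mean: float) -> float:
    """t = (M - hypothetical mean) / S_M, with S_M estimated as s / sqrt(n)."""
    n = len(sample)
    m = statistics.fmean(sample)       # sample mean M
    s = statistics.stdev(sample)       # sample standard deviation (n - 1 divisor)
    standard_error = s / math.sqrt(n)  # estimated standard error S_M
    return (m - hypothetical_mean) / standard_error


def reject_h0(sample: list[float], hypothetical_mean: float) -> bool:
    """One-tailed test at the 5% level: reject H0 if t exceeds the critical value."""
    df = len(sample) - 1
    return t_statistic(sample, hypothetical_mean) > T_CRIT_ONE_TAILED_5PC[df]


# Hypothetical heights (cm), for illustration only; not data from the lecture
heights = [172.0, 178.0, 169.0, 181.0, 175.0, 177.0]
print(f"t = {t_statistic(heights, 170.0):.2f}")
print("reject H0" if reject_h0(heights, 170.0) else "cannot reject H0")
```

For these made-up heights, M ≈ 175.3cm and t ≈ 3.02. This exceeds the critical value 2.015 for 5 df, so H0 would be rejected. In practice a statistics library would supply critical values for any df instead of a hard-coded table.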
Worked example
• Should we reject H0?
o For our sample, there are n - 1 = 3 degrees of freedom (df)
o We wish to perform the test at the 5% significance level, and we want the test to be one-tailed
o From the t table, the one-tailed critical value at the 5% level with 3 df is 2.353
o Because your calculated t statistic is greater than this critical value, you reject H0
o You conclude from your data that the mean age in the town is greater than 40
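• What if our result is not significant?
o Suppose your calculated t statistic was less than this critical value
o You are unable to reject H0 (this does not show that H0 is true, only that the data do not provide enough evidence against it)
What is a significance level?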
• When you test a null hypothesis, you do so at a certain significance level
• The significance level is the probability of rejecting the null hypothesis when it is in fact true
• In other words, it is the probability of making a type 1 error (concluding that H0 is false when it is in fact true)
• The greater the critical value for the t statistic (i.e. the lower the significance level), the less likely you are to make a type 1 error
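The meaning of the significance level can be checked by simulation. This sketch reuses the worked-example settings: n = 4 (3 df), a one-tailed test at 5% with critical value 2.353, and a hypothetical mean of 40. The population standard deviation of 5 is an assumed value (the rejection rate does not depend on it). Every sample is drawn from a population where H0 is true, so each rejection is a type 1 error. The fraction of rejections should come out close to 5%.

```python
import math
import random
import statistics

# Worked-example settings: n = 4 (df = 3), one-tailed test at the 5% level
N = 4
T_CRIT = 2.353    # one-tailed 5% critical value for df = 3
H0_MEAN = 40.0    # hypothetical mean from the worked example
POP_SD = 5.0      # assumed spread; the rejection rate does not depend on it
N_TRIALS = 20000

random.seed(0)  # fixed seed so the run is reproducible

false_rejections = 0
for _ in range(N_TRIALS):
    # H0 is true by construction: samples come from a population with mean 40
    sample = [random.gauss(H0_MEAN, POP_SD) for _ in range(N)]
    se = statistics.stdev(sample) / math.sqrt(N)
    t = (statistics.fmean(sample) - H0_MEAN) / se
    if t > T_CRIT:
        false_rejections += 1  # rejecting a true H0 is a type 1 error

type1_rate = false_rejections / N_TRIALS
print(f"estimated type 1 error rate: {type1_rate:.3f}")  # expected to be near 0.05
```

Raising `T_CRIT` (i.e. testing at a lower significance level) makes rejections of a true H0 rarer. This is the trade-off described in the last bullet above.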