• Tidak ada hasil yang ditemukan

STAT170: INTRODUCTORY STATISTICS

N/A
N/A
Protected

Academic year: 2025

Membagikan "STAT170: INTRODUCTORY STATISTICS"

Copied!
4
0
0

Teks penuh

(1)

Nowshin Hassan STAT170 – Introductory Statistics Session One

1

S T A T 1 7 0 : I N T R O D U C T O R Y S T A T I S T I C S

CHAPTER 7: HYPOTHESIS TESTS FOR A POPULATION MEAN z-test for a Population Mean

We  often  want  to  answer  a  research  question  pertaining  to  some  target  population  of  interest.  We  design  a  study   carefully,  and  obtain  a  representative  sample,  random  if  possible,  from  the  target  population.  We  use  this  sample  to   make  inferences  about  the  population  and  thus  to  answer  the  research  question.  

Answering a Research Question

Example:  ‘Is  the  mean  IQ  of  children  who  attend  country  schools  the  same  as  that  of  the  general  population?’  

Let  Y  =  IQ  scores  of  country  students  (variable  of  interest)  

We  know  that  in  the  general  population,  IQ  scores  follow  a  normal  distribution  with  𝜎=15.  We  will  assume  this  also   applies  to  the  target  population.  So  the  sample  mean  will  also  be  from  a  normal  distribution,  regardless  of  sample   size.  We  had  a  random  sample  of  36  students  from  country  schools  in  NSW  with  sample  mean,  𝑦=103.  

Null Hypothesis

A  null  hypothesis  is  a  statement  (or  claim)  that  a  population  parameter  has  a  specified  value:  

𝐻!:  𝜇=𝜇!  

That  is,  the  null  hypothesis  states  that  a  population  parameter  (in  this  case  𝜇)  is  equal  to  a  specific  value  (which  we   are  calling  𝜇!  here).  

o For  our  research  question,  the  null  hypothesis  would  claim  that  the  mean  IQ  score  of  country  school   students  is  the  same  as  the  general  population,  i.e.  equal  to  100.  This  is  because  the  population  mean  is   known  to  be  100,  and  we  want  to  determine  if  this  is  the  same  for  country  school  students.    

o ‘Null’  means  ‘no  effect’  or  ‘no  change’  

o We  are  told  IQ  scores  are  designed  to  be  normally  distributed  with  a  𝜇=100,  and  𝜎 =15.  If  there  has  been   no  change  in  IQ  scores  over  the  years,  it  would  be  reasonable  to  hypothesise  that  the  mean  IQ  score  

remained  at  100  –  this  is  the  null  hypothesis.    

o So  we  express  this  null  hypothesis  as:  𝐻!:  𝜇 =100  (null  value)  

To  test  this  hypothesis  –  take  a  sample  of  students  measure  IQ  and  determine  mean.  Compute  a  95%  confidence   interval  and  check  to  see  whether  mean  lies  between  interval  values.  

⇒ Yes  it  does  lie  b/w  both  values  =  sample  consistent  with  null  hypothesis  

⇒ No  it  doesn’t  lie  b/w  both  values  =  null  value  is  not  a  plausible  value  for  population  mean   Alternative Hypothesis

We  also  need  to  specify  an  alternative  hypothesis,  𝑯𝟏  in  case  we  find  evidence  against  the  null  hypothesis,  𝑯𝟎.   è If  we  were  only  prepared  to  reject  a  null  hypothesis  when  we  found  evidence  to  indicate  the  population  

mean  was  significantly  lower  than  the  null  value,  𝝁𝟎,  the  alternative  hypothesis  would  be:  𝑯𝟏:  𝝁<𝝁𝟎  =   (one-­‐sided  test  à  p-­‐value  on  one  side  of  distribution)  

è If  were  only  prepared  to  reject  a  null  hypothesis  when  we  found  evidence  to  indicate  the  population  mean   was  significantly  higher  than  𝝁𝟎,  the  alternative  hypothesis  would  be  𝑯𝟏:  𝝁>𝝁𝟎  (one-­‐sided  test)  

Note:  In  STAT170,  we  will  NOT  use  one-­‐sided  tests  –  we  will  only  use  2-­‐sided  tests.  These  are  tests  where  we  are   prepared  to  reject  a  null  hypothesis  when  we  find  evidence  indicating  the  population  parameter  is  either   higher  or  lower  than  𝝁𝟎.  Therefore  the  alternative  hypothesis  for  this  test  of  a  population  mean  will  be:  

𝑯𝟏:  𝝁≠𝝁𝟎.  

Example  Continued:  Country  School  Students   Hypothesis:  

𝐻!:  𝜇=100   𝐻!:𝜇≠100   Checking Test Assumptions

We  need  to  ensure  that  a  z-­‐test  for  a  population  mean  will  be  valid  here.    

The  assumption  for  a  z-­‐test  for  population  mean  is  that  the  sample  mean  is  drawn  from  a  normal  distribution.    

v This  will  be  a  reasonable  assumption  if  the  sample  is  large  enough  for  the  Central  Limit  Theorem  to  apply   (which  is  the  case  here,  although  normality  had  been  given).  If  the  sample  is  not  large  enough,  we  should  

(2)

Nowshin Hassan STAT170 – Introductory Statistics Session One

2 graph  some  data  and  is  it  appears  that  the  sample  mean  may  be  drawn  from  a  normal  distribution,  then  we   can  assume  the  sample  mean  should  also  be  drawn  from  a  normal  distribution.  

v Regardless  of  sample  size,  it  is  always  good  practice  to  start  your  analysis  by  graphing  data   Testing a Null Hypothesis

To  test  a  null  hypothesis  for  a  pop.  mean,  we  compare  the  sample  value  𝑦,  with  the  corresponding  null  value,  𝜇!.  

For  the  research  question  about  the  IQ  scores  of  country  school  students,  the  null  value,  𝜇!=100,  and  the  sample   mean  𝑦 =103.  

Testing  this  null  hypothesis  is  the  same  as  asking  the  question:  

If  the  null  hypothesis  is  true  (i.e.  country  school  students  have  IQ  scores  the  same  as  the  general  population),  how   likely  is  a  sample  mean  to  be  3  points  (or  more)  away  from  the  population  mean,  in  either  direction?’  

A Test Statistic (a.k.a z-statistic)

A  test  statistic  is  a  scaled  measure  of  the  discrepancy  between  the  observed  sample  statistic  and  null  value  (𝐻!).  

⇒ Dividing  the  discrepancy  by  the  standard  error  gives  a  result  that  is  independent  of  the  measurement  unit   For  a  one-­‐sample  test  of  a  mean  where  Y  is  the  variable  of  interest,  the  test  statistic  takes  the  form:  

𝑇𝑒𝑠𝑡  𝑠𝑡𝑎𝑡𝑖𝑠𝑡𝑖𝑐=   𝑦−𝜇! 𝑠𝑡.𝑒𝑟𝑟𝑜𝑟  𝑜𝑓𝑦      

So  the  test  statistic  tells  us  how  many  standard  errors  the  sample  mean,  𝒚,  is  away  from  the  null  value,  𝝁𝟎.   Test Statistic for One Sample z-test of a Population Mean

So  if  𝜎  is  known,  we  can  use  a  one-­‐sample  z  test  for  a  population  mean.  The  test  statistic  is:  

z=  𝑦−𝜇! 𝜎

𝑛  

For  our  example  we  know  𝜎 =15  and  we  are  testing  𝜇!=100  vs.  𝜇!≠100.  With  the  sample  size  n  =  36  and  the   sample  mean  𝑦 =103,  the  test  statistic  is  (absolute  value):  

𝑧= 103100 15

36

=1.2  

I.e.  A  sample  mean  IQ  of  103  is  1.2  standard  errors  from  𝜇!.   Size of a Test Statistic

∆ If  the  absolute  value  of  the  test  statistic  is  large,  it  means  there  is  a  large  discrepancy  b/w  the  null  value  and   the  sample  value  à  more  likely  to  reject  null  hypothesis.  

∆ If  the  absolute  value  of  the  test  statistic  is  small,  it  means  the  null  value  is  consistent  with  the  data  in  the   sample,  i.e.  the  discrepancy  could  be  attributable  to  chance  –  sampling  variation  à  more  likely  to  accept   null  hypothesis  

A  more  precise  measure  of  discrepancy  is  given  by  the  p-­‐value.  [Smallest  value  of  z-­‐statistic  is  0]  

What is a p-value?

P-­‐value:  the  likelihood/probability  of  getting  such  a  large  test  statistic  (or  even  larger),  assuming  that  the  null   hypothesis  is  true  (probability  of  getting  a  value  under  the  normal  curve  more  extreme  than  z-­‐statistic  in   either  direction  

o The  test  statistic,  z  =  1.2  is  a  measure  of  the  discrepancy  between  the  sample  mean,  𝑦=103  and  the   null  value,  𝜇!=100.  The  p-­‐value  will  tell  us  how  likely  we  are  to  select  a  sample  of  size  36  which  has  a   sample  mean  at  least  1.2  standard  errors  away  from  the  null  value,  if  𝐻!  is  true.  

o The  p-­‐value  is  tail  area  of  distribution  of  the  test  statistic,  assuming  𝐻!  is  true.  For  a  2-­‐sided  z-­‐test  of   a  population  mean,  the  p-­‐value  is  the  area  in  both  tails  of  the  z  distribution  because  the  measure  of   discrepancy  could  have  been  either  side  of  the  mean  

Note:  When  z  is  negative,  its  absolute  value  is  taken  to  get  the  area  in  the  right  tail   Using a z-table to find the p-value

For  this  research  question  we  have  a  test  statistic  of  z  =  1.2.  Using  the  standard  normal  distribution,  the  total  area  in   the  2  tails  of  the  z  distribution  to  the  right  of  z  =  1.2  and  to  the  left  of  z  =  -­‐1.2  is  0.2302  (p-­‐value  doubled  –  2-­‐tail).  

The  p-­‐value  is  0.2302.  It  is  the  probability  of  getting  a  z-­‐value  with  a   magnitude  greater  than  1.2  if  𝐻!  is  true.  

[23%  probability  of  getting  a  value  different  from  the  general  pop.]  

Significance Level & Decisions A  p-­‐value  may  be  used  to  make  a  decision  as  follows:  

(3)

Nowshin Hassan STAT170 – Introductory Statistics Session One

3

∆ If  a  p-­‐value  is  small  (<  0.05  –  this  value  is  called  the  ‘significance  level’),  you  reject  the  null  hypothesis  à  the   result  is  ‘statistically  significant’  

∆ If  the  p-­‐value  is  large  (≥  0.05)  you  say  the  null  hypothesis  could  be  true  but  you  do  NOT  accept  it  

Example:  For  the  country  students  the  correct  decision  would  be  to  not  reject  𝐻!  because  it  is  >  0.05  (not  less   than  the  5%  significance  level)  

SUMMARY  

For  a  two-­‐tailed  test,  a  p-­‐value  obtained  from  a  z-­‐statistic  by  determining  the  area  under  the  standardised   normal  curve  in  the  2  tails  above   𝑧  and  below  -­‐𝑧 .  This  area  may  be  calculated  from  the  z-­‐table.  

A Significance Result Using  a  significance  level  of  5%  

(i.e.  𝛼=0.05).  

‘If  p-­‐value  is  <  0.05,  Reject  𝐻!!’  à  

Statistically  Significant  Result   Writing a Conclusion Example:  Conclusion  

Since  we  do  not  reject  the  null   hypothesis  we  do  not  have   evidence  to  claim  that  country   school  students  have  a  different   average  IQ  score  to  that  of  the  general  population.  We  conclude  that,  on  average  IQ  scores  could  be  the  same   among  country  school  students  as  the  general  population.  

Example:  One  Sample  z-­‐test  for  Population  Mean  

Is  the  average  stopping  distance  of  school  buses  travelling   75km/h  equal  to  80  metres  as  claimed  by  public  transport   authorities?  

𝑌=𝑠𝑡𝑜𝑝𝑝𝑖𝑛𝑔  𝑑𝑖𝑠𝑡𝑎𝑛𝑐𝑒  𝑖𝑛  𝑚𝑒𝑡𝑟𝑒𝑠   𝜎 =1.2,𝑛 =20,𝑦=80.74,𝜇=80   Hypothesis  Test:  

H:  

𝐻!:𝜇=80   𝐻!:𝜇≠80  

A:  From  the  histogram,  the  normality  assumption  appears  to  be  

reasonable.  The  sample  appears  to  be  drawn  from  a  normally  distributed  population.  

T:  z =  !!!! !

!   = !".!"!!"

!.!

!"

=2.76   P:  p-­‐value  =  0.0029  x  2  =  0.0058   D:  Since  p-­‐value  <  0.05,  reject  𝐻!  

C:  There  is  evidence  to  suggest  that  the  average  stopping  distance  for  all  school  buses  is  more  than  80  metres.  

p-values and 95% Confidence Intervals

A  confidence  interval  can  be  used  to  test  a  null  hypothesis:  

o If  the  confidence  interval  does  NOT  include  the  null  value  à  decision  is  to  reject  𝑯𝟎   o If  the  confidence  interval  does  include  the  null  value  à  decision  is  do  not  reject  𝑯𝟎  

A  p-­‐value  is  equivalent  to  this  rule:  If  the  p-­‐value  is  <  0.05,  𝐻!  is  rejected.  If  the  p-­‐value  is  ≥0.05,𝐻!  is  not  rejected.  

t-test for a Population Mean

A  z-­‐test  for  a  population  mean  is  only  used  when  the  population   standard  deviation  (𝝈)  is  known.  More  often  when  𝜎  is  unknown  it   can  be  estimated  using  the  sample  standard  deviation,  s.  

» When  𝜎  is  unknown,  the  z-­‐statistic  should  be  replaced  by  a  t-­‐statistic  with  v=  n  –  1  degrees  of  freedom  (df):  

𝑡= !!!!

!

 à  this  is  called  the  student’s  t  statistic  

» The  p-­‐value  for  a  t-­‐test  is  obtained  from  the  t-­‐distribution   Finding a p-value from a t-table

(4)

Nowshin Hassan STAT170 – Introductory Statistics Session One

4 Finding the range for a p-value from a t-

table

Suppose  you  have  conducted  a  t-­‐test  for  a  population   mean  and  have  𝑡 =2.31(since  we  are  only  doing  two-­‐

sided  tests  you  will  get  the  same  p-­‐value  for  t  =  2.31  or  t  

=  -­‐  2.31).  Say  the  sample  size  n  =  11,  then  v  is  10  (n-­‐1).  

If  the  t-­‐value  had  been  2.764  the  p-­‐value  would  have   been  0.02.  If  the  t-­‐value  had  been  2.228  the  p-­‐value   would  have  been  0.05.  Since  the  t-­‐value  is  b/w  2.764   and  2.228,  the  p-­‐value  must  be  b/w  0.02  and  0.05.  

0.02  <  p-­‐value  <  0.05  

Note:  In  Minitab,  it  calculates  the  exact  p-­‐value.  ALSO  If  the  test  is  two-­‐tailed  it  will  be  necessary  to  double  the  p-­‐

value  obtained  from  the  table.  

It  is  always  better  to  supplement  a  p-­‐value  by  showing  a  confidence  interval.  A  95%  confidence  interval  for  a   population  mean  when  the  population  standard  deviation  is  unknown  is:  

95%  𝐶𝑜𝑛𝑓𝑖𝑑𝑒𝑛𝑐𝑒  𝐼𝑛𝑡𝑒𝑟𝑣𝑎𝑙𝑓𝑜𝑟  𝜇=  𝑦−𝑡!"#$× 𝑠 𝑛   Statistical Significance

Statistically  significant:  used  to  describe  the  result  of  testing  a  null  hypothesis,  which  gives  a  p-­‐value  <  0.05   Testing that a population has a specified proportion

In  this  case  the  null  hypothesis  takes  the  form:  

𝐻! =𝜋=𝜋!

When  checking  for  test  assumptions  (involving  a  population  proportion),  the  minimum  quantities,  𝑛𝜋!  and  𝑛  (1− 𝜋!)  should  be  ≥5,  otherwise  the  normality  assumption  is  not  reasonable.  

A  z-­‐statistic  is  used  to  show  the  disagreement  b/w  the  sample  proportion  and  the  null  value,  𝜋!.  It  is  shown  as   follows:  

𝑧=𝑝−𝜋!

𝑠𝑒! = 𝑝−𝜋! 𝜋!(1−𝜋!)

𝑛

Referensi

Dokumen terkait

If x is a normally distributed random variable with mean  and standard deviation  , then the z-scores of the values for the random variable have the standard

This process is called estimation , and the statistic we used (the sample mean) is called an estimator.. Using the sample mean to estimate µ is so obvious that it is hard to imagine

3 LO.e: Calculate and interpret measures of central tendency, including the population mean, sample mean, arithmetic mean, weighted average or mean, geometric

The mean value of all the sample means is equal to the population mean ( ). A corollary is that the sample

getting such a different sample mean would be very small, so you reject the null hypothesis4.  In other words, getting a sample mean of 20 is so unlikely if the population mean

Sample Mean Sampling Distribution: Standard Error of the Mean • Different samples of the same size from the same population will yield different sample means • A measure of the

when sample data are used for estimating a population mean, sampling error is not be present because the observed sample statistic is same as the actual value of the population

The Test Statistic and Critical Values  If the sample mean is close to the stated population mean, the null hypothesis is not rejected..  If the sample mean is far from the stated