Statistics calculated from the sample The interval [L, U] is called a 95 confidence interval (C.I.) for the parameter θ if

(1)

I nterval Estimation

3% Not yet decided

1% Buchanan

4% Ralph Nader

45% Al Gore

47% George W. Bush

Result of Nationwide Gallup poll November 5, 2000

Example : Election 2000 voting poll from 2386 interviews

Margin of error : 2%

Can Bush get more votes than Gore?

47% > 45%

Bush Gore

45 %

43 47 49

Point estimates

“too close to call” Interval estimates

Margin of error : 0.05%

Interval estimates

“Bush will win the race”

I nterval Estimation

Example : VSRS from

(

_µ_,_σ2

)

N {X1,X2,...,Xn}

     

n N X

2 , ~ µσ

    

 

 ₋ ₋

Φ −     

 

 ₊ ₋

Φ =    

 

+ ≤ ≤ −

n n

n X

n σ

µ σ µ σ

µ σ µ σ µ σ

µ 1.96 1.96

96 . 1 96

. 1 Pr

( ) (1.96−Φ−1.96) Φ

=0.975−(1−0.975) =

95 . 0 =

I nterval Estimation

     

n N X

2

, ~ µσ

95% of the means within bounds

X

X X

X

X X

X

X X

X _{95% of the intervals contain}µ

n X−1.96σ

n X+1.96σ

n X−1.96σ

n X+1.96σ

n X−1.96σ

n X+1.96σ

n X−1.96σ

n X+1.96σ

n X−1.96σ

n X+1.96σ

n X−1.96σ

n X+1.96σ

n X−1.96σ

n X+1.96σ n

σ µ+1.96 n

σ µ−1.96

area = 0.95

% 95 96 . 1 96

. 1

Pr =

  

 

+ ≤ ≤ −

n X n

σ µ σ

µ 

  

 

+ −

n X n X 1.96σ , 1.96σ

95 % confidence interval

Confidence I nterval

L, U Statistics calculated from the sample

The interval [L, U] is called a 95% confidence interval (C.I.) for the parameter θif

(

θ

)

0.95 for any possible

θ

PrL≤ ≤U =

Confidence level

The interval [L, U] is called a 90% confidence interval (C.I.) for the parameter θif

(

θ

)

0.90 for any possible

θ

PrL≤ ≤U =

The interval [L, U] is called a 100(1-α)% confidence interval (C.I.) for the parameter θif

(

θ

)

1-

α

for any possible

θ

PrL≤ ≤U =

90% 0.1

95% 0.05

99% 0.01

Confidence level α

Confidence I nterval

Normal population

VSRS

(

_µ_,_σ2

)

N µ unknown σ2_known

{X1,X2,...,Xn}

( ) (

1.96 1.96

)

0.95 96

. 1 96

. 1

Pr =Φ −Φ− =

  

  

≤ − ≤ −

n X

σ

µ ₁_.₉₆ ₀_.₉₅

96 . 1

Pr =

  

 

+ ≤ ≤ −

n X n

X σ µ σ

95% confidence interval for µ : 

  

 

+ −

n X n

X 1.96σ , 1.96σ

point estimate margin of error n X±1.96σ

90 . 0 645 . 1 645

. 1

Pr =

  

 

+ ≤ ≤ −

n X

n

X σ µ σ

n X±1.645σ 90% confidence interval for µ :

99 . 0 576 . 2 576

. 2

Pr =

  

 

+ ≤ ≤ −

n X

n

X σ µ σ

n X±2.576σ 99% confidence interval for µ :

α σ µ

σ

α

α _= −

 

 

+ ≤ ≤

− 1

Pr 2 2

n Z X n Z X

n Z

X± α2 σ

100(1-α)% C.I. for µ :

Tail Area

( )0,1 N

0

-Zα/2 Zα/2

α/2 _{1 -}α α/2

645 . 1 05 . 0 = Z

96 . 1 025 . 0 = Z

2 3 . 0 15 .

0 Z

Z0.15==1.036 Z

1 . 0 9 .

0 Z

(2)

Confidence I nterval

Example : X alcohol concentration of French wine

(

_,₁_.₂2

)

~Nµ X

VSRS {X1,X2,...,X60} X=9.3

A 90% C.I. for µ: ( )

60 2 . 1 645 . 1 3 . 9 05 .

0 = ±

±

n Z

X σ

255 . 0 3 . 9 ± =

[

9.045 ,9.555

]

=

(

9.045 9.555

)

0.9

Pr ≤

µ

≤ =

? ?

Confidence I nterval

     

60 2 . 1 , ~

2

µ N X

X

X X

X

X 90% of the intervals contain µ

255 . 0 + µ 255

. 0 − µ

area = 0.90

90 % confidence intervals

[9.045, 9.555] 9.3

8.9 _[ _]

9.155 , 8.645

9.5

[9.245, 9.755]

Normal Model, Unknown Variance

Normal population

VSRS

(

_µ_,_σ2

)

N µ unknown σ2_unknown

{X1,X2,...,Xn}

α σ µ

σ

α

α = −

  

 

+ ≤ ≤

− 1

Pr ₂ ₂

n Z X n Z X

n Z

X± α2 σ

100(1-α)% C.I. for µ :

( )0,1 ~N n X

σ µ −

unknown ?

~ n S

X −µ

( )

∑

= −

−

= n

i i X X n S

1 2

1 1

1 ~ −

− n t n S

X µ

α

µ α

α _= −

 

 

+ ≤ ≤

− − − 1

Pr 1, 2 1, 2

n S t X n S t

X n n

n S t X± n−1,α2

Cauchy t1≡ 7 t

3 t

Student-t Distribution

( )  −∞< <∞

    

+

     

Γ

     

Γ

      + Γ =

+ − −

x r

x r r

r x f

r

, 1 2 1 2

2 1

2 1 2 2 1

r t X~

0 3

3

Student-t Densities N( )0,1

( )X =0 for r>1 E

( ) for 2 2 > −

= r

r r X Var

∞ ≡t

Student-t Distribution

r

t

0

α/2 _{1 -}α α/2

tr,α/2 -tr,α/2

Student-t Distribution Table

r

t

X~

? 05 . 0 ,

8 =

t8,0.05=1.860

(3)

Normal Model, Unknown Variance

Example : X Frequency of elephant call

(

_, 2

)

~Nµσ X

A 95% C.I. for µ: ( )

12 424 . 56 201 . 2 33 . 22 025 . 0 ,

11 = ±

±

n S t X

773 . 4 33 . 22 ± =

[

17.557 ,27.103

]

=

(

17.557 27.103

)

0.95

Pr ≤

µ

≤ =

VSRS {X1,X2,...,X12} X=22.33 S2=56.424

Two Samples Problem

Population 1 Population 2

2

, x

x σ

µ 2

, y

y σ

µ

y

x

µ

−

Compare µxand µyby : unknown

{X1,X2,...,Xm} {Y1,Y2,...,Yn} VSRS

estimate

Normal Models, Known Variances

Normal populations

(

_, 2

)

x x

Nµ σ µ_σx ,µyunknown

x2 , σy2known

point estimate margin of error

(

)

α σ σ µ

(

)

α σ σ _= −α

  

  

+ + − ≤ ≤ + −

− 1

Pr

2 2 2 2

2 2

n m Z Y X n m Z Y

X x y x y

(

)

n m Z Y

X x y

2 2 2

σ σ

α +

± − 100(1-α)% C.I. for µ :

(

_, 2

)

y y Nµ σ

IndependentVSRSs {X1,X2,...,Xm} {Y1,Y2,...,Yn}

Y X−

Point estimator for µx-µy: _

  

  

+ − −

n m N

Y

X x y

y x

2 2 ,

~ µ µ σ σ

Normal Models, Unknown Variances

Normal populations

(

_, 2

)

x x

Nµ σ _σµx ,µyunknown

x2 , σy2unknown

(

X Y

)

Z _mx _ny 2 2 2

σ σ

α +

± − 100(1-α)% C.I. for µx-µy:

(

_, 2

)

y y Nµ σ

IndependentVSRSs {X1,X2,...,Xm} {Y1,Y2,...,Yn}

unknown Assumption : Equal variances _σ2₌_σ2₌_σ2

y x

Pooled sample variance ( ) ( )

2 1

1 2 2

2

− +

− + − =

n m

S n S m

S x y

pool

(

X Y

)

tmn Spool _m _n 1 1 2 ,

2 +

±

− +− α

Normal Models, Unknown Variances

Example : growth of tomatoes

13, 11, 10, 6, 7, 4, 10 Fertilizer B (Y)

12, 11, 7, 13, 8, 9, 10, 13 Fertilizer A (X)

Assumptions : (i) Normal populations

(ii) Independent samples (iii) Equal variance

95% C.I. for µx-µy :

905 . 9 , 125 . 5 , 714 . 8 , 371 . 10 , 7 ,

8 ₌ ₌ ₌ 2₌ 2₌

= n X Y Sx Sy

m

(

X Y

)

tmn Spool _m _n 1 1 2 ,

2 +

±

− +− α

(

)

(

)

7 1 8 1 331 . 7 714 . 8 371 .

10 −

(

)(

±t13,0

)

.025 +

7 1 8 1 331 . 7 160 . 2 661 .

1.661±3.027 [ 1.366 , +4.688]

1 ± = −

Pooled sample variance : ( ) ( )

2 1

1 2 2

2

− +

− + − =

n m

S n S m

S x y

pool

( )( ) ( )( ) ₇_.₃₃₁

2 7 8

905 . 9 1 7 125 . 5 1 8

2 ₌

− +

− + − = pool S

Large Sample Approximation

(

2

)

,σ µ N

Normal Population

α σ µ σ

α

α = −

  

 

+ ≤ ≤

− 1

Pr 2 2

n Z X n Z X

n Z

X± α2 σ

100(1-α)% C.I. for µ : ( )0,1 ~N n X σ

µ −

VSRS

{X1,X2,...,Xn}

α σ µ σ

α

α ≈ −

  

 

+ ≤ ≤

− 1

Pr 2 2

n Z X n Z X

Approximate 100(1-α)% C.I. for µ :

IF σis unknown, replace it by S.

ArbitraryPopulation

2

,σ

µ ?

For large n

( )0,1 ~

. N n X σ

(4)

Population Proportion

LargePopulation

π

−

1

π Population proportion

Random sample of size n

Virus carriers

X= number of virus carriers in the sample

( )

,

π

~

b

n

X

Point estimator for π:

n X

= π

Sample proportion

Population Proportion

( )~ ( )0,1 1

. N n

n X

π π

π − −

For large n

(

π

)

α

π π

α

α _≈ −

  

  

≤ − − ≤

− 1

1

Pr 2 Z 2

n n X Z

Approximate 100(1-α)% C.I. for π :

(

)

(

)

(

Z n

)

n Z n

Z Z

2 2

2 2 2 2 2

2 2

2

1 2

1 4 2

2

α

α α

α π π

π

+

+ − + ±

+

(

)

n

Z π π

π α

−

±

≈ 2 1

Population Proportion

Example : 41 people out of 500 persons were unemployed

082 . 0 500

41 = = πK

Approximate 95% C.I. for π:

( ) _{( ) (} )( )

500 918 . 0 082 . 0 96 . 1 082 . 0 1 025 .

0 = ±

− ±

n

Z π π

πK K K

024 . 0 082 . 0 ± =

[5.8% , 10.6%] =

Estimated unemployment rate is 8.2% with margin of error 2.4% .

Two Samples Problem

Population 1 Population 2

2

1

π

−

Compare π1and π2by : unknown

1

π π2

Sample with size n1 Sample with size n2

X Y

Two Samples Problem

Point estimators 1 1

n X

= π

2 2

n Y

= π

( )~ ( )0,1 1

.

1 1 1

1 1

N n

n X

π π

π − −

For large n1, n2 ₍ ₎~ ( )0,1

1 .

2 2 2

2

2 _N

n n Y

π π

π − −

Approximate 100(1-α)% C.I. for π1-π2:

(

)

(

)

(

)

2 2 2

1 1 1 2 2 1

1 1

n n

Z π π π π

π

π α

−

+ − ±

−

Two Samples Problem

Example : ability of detergents to remove stains

37 42

2

63 Successes

28 1

Failures Detergent

79 91

Total 69.23% 13

9 91 63

1= = =

πK

% 16 . 53 79 42

2= =

πK

Approximate 90% C.I. for π1-π2:

( ) ( ) ( )

2 2 2 1

1 1 05 . 0 2 1

1 1

n n

Z π π π π

π

π− ± − + −

( ) ( )( ) ( )( )

79 79 37 79 42 91

13 4 13 9 645 . 1 79 42 13

9 _± ₊

      ₋ =

[3.88% ,28.26%]

1219 . 0 1607 .

0 ± =

=

0

(5)

Sample Size Determination

n Z

X± α2 σ

100(1-α)% C.I. for µ :

X

n Zα2σ

µ X

n Zα2σ

X

n Zα2σ

X

n Zα2σ

D : precision requirement

2 2 2

2 2

D Z n n Z

D= ασ⇔ = ασ

( ) ( )

2 2

1 1

D Z n n Z

D α π π α π π

− = ⇔ − =

With probability 1-αααα, estimation error is at most D.

Sample Size Determination

Example : survey on entertainment expenses

95% confidence that estimation error is at most $120 Precision requirement :

400 $ =

σ from past surveys

( ) ( )

( )120 42.68 43 400

96 . 1

2 2 2 2

2 2

025 .

0 ₌ ₌ _≈

=

D Z