Interval Estimation
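Example: Election 2000 voting poll from 2386 interviews
Result of nationwide Gallup poll, November 5, 2000:

Candidate          Share
George W. Bush     47%
Al Gore            45%
Ralph Nader        4%
Buchanan           1%
Not yet decided    3%

Can Bush get more votes than Gore?

Point estimates: 47% > 45%, so Bush is ahead of Gore.

Interval estimates with margin of error 2%: Gore 45% ± 2% = [43%, 47%] and Bush 47% ± 2% = [45%, 49%]. The two intervals overlap, so the race is "too close to call".

Interval estimates with margin of error 0.05%: Gore [44.95%, 45.05%] and Bush [46.95%, 47.05%]. The two intervals do not overlap, so "Bush will win the race".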
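The 2% margin of error is roughly what a normal-approximation interval for a proportion gives with 2386 interviews. The sketch below is only an illustration under that assumption (a 95% level with z = 1.96 and the formula z·sqrt(p(1−p)/n)); the poll's actual methodology is not stated in these notes, and the helper name `proportion_margin` is hypothetical.

```python
import math

def proportion_margin(p_hat, n, z=1.96):
    """Half-width z * sqrt(p_hat * (1 - p_hat) / n) of a normal-approximation interval."""
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)

# Assumed inputs: Bush's share 0.47 and n = 2386 interviews from the example.
margin = proportion_margin(0.47, 2386)
print(f"margin of error: {margin:.2%}")  # close to 2%

# Interval estimates for the two candidates, using the same margin for both.
bush = (0.47 - margin, 0.47 + margin)
gore = (0.45 - margin, 0.45 + margin)

# The intervals overlap when Bush's lower limit is below Gore's upper limit.
overlap = bush[0] <= gore[1]
print("intervals overlap:", overlap)
```

With a much smaller margin (for example 0.0005) the same comparison reports no overlap, which is the second scenario of the example.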
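Interval Estimation
Example: VSRS $\{X_1, X_2, \ldots, X_n\}$ from $N(\mu, \sigma^2)$, so that $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.
\[
\Pr\left(\mu - 1.96\frac{\sigma}{\sqrt{n}} \le \bar{X} \le \mu + 1.96\frac{\sigma}{\sqrt{n}}\right)
= \Phi\left(\frac{\mu + 1.96\,\sigma/\sqrt{n} - \mu}{\sigma/\sqrt{n}}\right) - \Phi\left(\frac{\mu - 1.96\,\sigma/\sqrt{n} - \mu}{\sigma/\sqrt{n}}\right)
\]
\[
= \Phi(1.96) - \Phi(-1.96) = 0.975 - (1 - 0.975) = 0.95
\]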
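The value 0.95 can be checked numerically. This is a minimal sketch that writes the standard normal CDF Φ in terms of the error function from Python's `math` module; the function name `phi` is just an illustrative choice.

```python
import math

def phi(z):
    """Standard normal CDF, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Probability that a standard normal variable lies between -1.96 and 1.96.
coverage = phi(1.96) - phi(-1.96)
print(round(coverage, 4))
```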
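Interval Estimation
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
[Figure: sampling density of $\bar{X}$ with area = 0.95 between $\mu - 1.96\,\sigma/\sqrt{n}$ and $\mu + 1.96\,\sigma/\sqrt{n}$; below it, the intervals $\left[\bar{X} - 1.96\,\sigma/\sqrt{n},\ \bar{X} + 1.96\,\sigma/\sqrt{n}\right]$ obtained from repeated samples.]
95% of the means fall within the bounds $\mu \pm 1.96\,\sigma/\sqrt{n}$. Equivalently, 95% of the intervals contain $\mu$:
\[
\Pr\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 95\%
\]
95% confidence interval: $\left[\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right]$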
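The statement that 95% of the intervals contain µ can be illustrated by simulation. The sketch below draws repeated normal samples and counts how often the interval X̄ ± 1.96σ/√n covers the true mean; the values µ = 10, σ = 2, n = 25 and the number of repetitions are arbitrary choices made for illustration, not taken from the notes.

```python
import math
import random

def coverage_rate(mu, sigma, n, reps, z=1.96, seed=0):
    """Fraction of simulated intervals xbar +/- z*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)
    half = z * sigma / math.sqrt(n)  # margin of error, the same for every sample
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half <= mu <= xbar + half:
            hits += 1
    return hits / reps

# Illustrative (assumed) population and sample size.
rate = coverage_rate(mu=10.0, sigma=2.0, n=25, reps=4000)
print(rate)  # expected to be near 0.95
```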
Confidence Interval
L, U: statistics calculated from the sample.
The interval [L, U] is called a 95% confidence interval (C.I.) for the parameter $\theta$ if
\[
\Pr(L \le \theta \le U) = 0.95 \quad \text{for any possible } \theta .
\]
The value 0.95 is the confidence level.
The interval [L, U] is called a 90% confidence interval (C.I.) for the parameter $\theta$ if
\[
\Pr(L \le \theta \le U) = 0.90 \quad \text{for any possible } \theta .
\]
The interval [L, U] is called a 100(1-α)% confidence interval (C.I.) for the parameter $\theta$ if
\[
\Pr(L \le \theta \le U) = 1 - \alpha \quad \text{for any possible } \theta .
\]

Confidence level    α
90%                 0.1
95%                 0.05
99%                 0.01
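Confidence Interval
Normal population $N(\mu, \sigma^2)$: $\mu$ unknown, $\sigma^2$ known
VSRS $\{X_1, X_2, \ldots, X_n\}$
\[
\Pr\left(-1.96 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = \Phi(1.96) - \Phi(-1.96) = 0.95
\]
\[
\Pr\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95
\]
95% confidence interval for $\mu$: $\left[\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right]$, that is, $\bar{X} \pm 1.96\frac{\sigma}{\sqrt{n}}$ (point estimate ± margin of error).
\[
\Pr\left(\bar{X} - 1.645\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.645\frac{\sigma}{\sqrt{n}}\right) = 0.90
\]
90% confidence interval for $\mu$: $\bar{X} \pm 1.645\frac{\sigma}{\sqrt{n}}$
\[
\Pr\left(\bar{X} - 2.576\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 2.576\frac{\sigma}{\sqrt{n}}\right) = 0.99
\]
99% confidence interval for $\mu$: $\bar{X} \pm 2.576\frac{\sigma}{\sqrt{n}}$
\[
\Pr\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha
\]
100(1-α)% C.I. for $\mu$: $\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$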
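Tail Area
[Figure: $N(0,1)$ density centred at 0, with area $1-\alpha$ between $-Z_{\alpha/2}$ and $Z_{\alpha/2}$ and area $\alpha/2$ in each tail.]
$Z_{0.05} = 1.645$
$Z_{0.025} = 1.96$
$Z_{0.3/2} = Z_{0.15} = 1.036$
$Z_{0.9} = -Z_{0.1}$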
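The tabulated tail values can be reproduced with Python's standard library. The sketch below assumes the convention used in the figure, where Z_a is the point with area a to its right under N(0,1); the helper name `z_upper` is hypothetical.

```python
from statistics import NormalDist

def z_upper(tail_area):
    """Return z such that the area to the right of z under N(0, 1) equals tail_area."""
    return NormalDist().inv_cdf(1.0 - tail_area)

# Upper-tail points used for 90%, 95%, 70% and 99% two-sided intervals.
for a in (0.05, 0.025, 0.15, 0.005):
    print(a, round(z_upper(a), 3))

# Symmetry of N(0, 1): the point with right-tail area 0.9 is minus the point with area 0.1.
print(round(z_upper(0.9), 3), round(-z_upper(0.1), 3))
```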
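Confidence Interval
Example: X = alcohol concentration of French wine, $X \sim N(\mu, 1.2^2)$
VSRS $\{X_1, X_2, \ldots, X_{60}\}$, $\bar{X} = 9.3$
A 90% C.I. for $\mu$:
\[
\bar{X} \pm Z_{0.05}\frac{\sigma}{\sqrt{n}} = 9.3 \pm 1.645\left(\frac{1.2}{\sqrt{60}}\right) = 9.3 \pm 0.255 = [9.045,\ 9.555]
\]
Is $\Pr(9.045 \le \mu \le 9.555) = 0.9$? Not as a statement about $\mu$: $\mu$ is a fixed unknown constant, so this particular interval either contains it or does not. The 0.9 describes the procedure: in repeated sampling, 90% of the intervals $\bar{X} \pm 0.255$ contain $\mu$.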
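The wine calculation can be reproduced as follows. This is a small sketch of the known-variance interval X̄ ± Z_{α/2}σ/√n; the function name `z_interval` and its argument names are illustrative choices, not part of the notes.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, level):
    """Two-sided interval xbar +/- Z_{alpha/2} * sigma / sqrt(n) for a known sigma."""
    alpha = 1.0 - level
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # upper alpha/2 point of N(0, 1)
    half = z * sigma / math.sqrt(n)              # margin of error
    return xbar - half, xbar + half

# Values from the example: xbar = 9.3, sigma = 1.2, n = 60, 90% level.
low, high = z_interval(xbar=9.3, sigma=1.2, n=60, level=0.90)
print(round(low, 3), round(high, 3))
```

Calling the same function with `xbar=8.9` or `xbar=9.5` gives different intervals of the same width, which is the sense in which the interval, not µ, is random.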
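Confidence Interval
$\bar{X} \sim N\left(\mu, \frac{1.2^2}{60}\right)$
[Figure: sampling density of $\bar{X}$ with area = 0.90 between $\mu - 0.255$ and $\mu + 0.255$; below it, 90% confidence intervals from repeated samples.]
90% of the intervals contain $\mu$.
90% confidence intervals for three observed sample means:

Sample mean    90% C.I.
9.3            [9.045, 9.555]
8.9            [8.645, 9.155]
9.5            [9.245, 9.755]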
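Normal Model, Unknown Variance
Normal population $N(\mu, \sigma^2)$: $\mu$ unknown, $\sigma^2$ unknown
VSRS $\{X_1, X_2, \ldots, X_n\}$
With $\sigma$ known, $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$, so that
\[
\Pr\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha
\]
and the 100(1-α)% C.I. for $\mu$ is $\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$. Here $\sigma$ is unknown, so this interval cannot be computed.
Replace $\sigma$ by the sample standard deviation $S$, where
\[
S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 .
\]
What is the distribution of $\frac{\bar{X} - \mu}{S/\sqrt{n}}$?
\[
\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}
\]
\[
\Pr\left(\bar{X} - t_{n-1,\alpha/2}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{n-1,\alpha/2}\frac{S}{\sqrt{n}}\right) = 1 - \alpha
\]
100(1-α)% C.I. for $\mu$: $\bar{X} \pm t_{n-1,\alpha/2}\frac{S}{\sqrt{n}}$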
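Student-t Distribution
$X \sim t_r$ has density
\[
f(x) = \frac{\Gamma\left(\frac{r+1}{2}\right)}{\sqrt{r}\,\Gamma\left(\frac{1}{2}\right)\Gamma\left(\frac{r}{2}\right)}\left(1 + \frac{x^2}{r}\right)^{-\frac{r+1}{2}}, \qquad -\infty < x < \infty
\]
$E(X) = 0$ for $r > 1$
$\mathrm{Var}(X) = \frac{r}{r-2}$ for $r > 2$
$t_1 \equiv$ Cauchy, $t_\infty \equiv N(0, 1)$
[Figure: Student-t densities $t_3$ and $t_7$ compared with the $N(0,1)$ density, horizontal axis from −3 to 3.]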
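The limiting cases of the t density can be checked numerically. The sketch below evaluates the density with `math.gamma`, using Γ(1/2) = √π; the function names are illustrative, and the direct gamma evaluation is only suitable for moderate r (for very large r it overflows).

```python
import math

def t_pdf(x, r):
    """Density of the Student-t distribution with r degrees of freedom."""
    coef = math.gamma((r + 1) / 2) / (math.sqrt(r * math.pi) * math.gamma(r / 2))
    return coef * (1.0 + x * x / r) ** (-(r + 1) / 2)

def normal_pdf(x):
    """Density of N(0, 1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

# Height at the centre and in the tail for increasing degrees of freedom.
for r in (1, 3, 7, 100):
    print(r, round(t_pdf(0.0, r), 4), round(t_pdf(3.0, r), 4))
print("N(0,1)", round(normal_pdf(0.0), 4), round(normal_pdf(3.0), 4))
```

For r = 1 the value at 0 equals 1/π, the Cauchy density at its centre, and for large r the values approach those of N(0,1), with heavier tails for small r.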
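Student-t Distribution
[Figure: $t_r$ density centred at 0, with area $1-\alpha$ between $-t_{r,\alpha/2}$ and $t_{r,\alpha/2}$ and area $\alpha/2$ in each tail.]
Student-t Distribution Table
$X \sim t_r$
$t_{8,0.05} = ?$ From the table, $t_{8,0.05} = 1.860$.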
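Normal Model, Unknown Variance
Example: X = frequency of elephant call, $X \sim N(\mu, \sigma^2)$
VSRS $\{X_1, X_2, \ldots, X_{12}\}$, $\bar{X} = 22.33$, $S^2 = 56.424$
A 95% C.I. for $\mu$:
\[
\bar{X} \pm t_{11,0.025}\frac{S}{\sqrt{n}} = 22.33 \pm 2.201\sqrt{\frac{56.424}{12}} = 22.33 \pm 4.773 = [17.557,\ 27.103]
\]
Confidence statement: $\Pr(17.557 \le \mu \le 27.103) = 0.95$, read as the coverage of the procedure in repeated sampling rather than as a probability about the fixed $\mu$.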
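The elephant-call interval can be recomputed as below. Python's standard library has no t quantile function, so this sketch takes the table value t_{11,0.025} = 2.201 from the example as an input; the function name `t_interval` is hypothetical.

```python
import math

def t_interval(xbar, s2, n, t_crit):
    """Interval xbar +/- t_crit * S / sqrt(n), with S^2 = s2 the sample variance."""
    half = t_crit * math.sqrt(s2 / n)  # margin of error
    return xbar - half, xbar + half

# Values from the example; t_crit is read from a Student-t table with 11 d.f.
low, high = t_interval(xbar=22.33, s2=56.424, n=12, t_crit=2.201)
print(round(low, 3), round(high, 3))
```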
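Two Samples Problem
Population 1: mean $\mu_x$, variance $\sigma_x^2$
Population 2: mean $\mu_y$, variance $\sigma_y^2$
$\mu_x$ and $\mu_y$ are unknown.
VSRS $\{X_1, X_2, \ldots, X_m\}$ from Population 1 and VSRS $\{Y_1, Y_2, \ldots, Y_n\}$ from Population 2
Compare $\mu_x$ and $\mu_y$ by estimating $\mu_x - \mu_y$.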
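Normal Models, Known Variances
Normal populations $N(\mu_x, \sigma_x^2)$ and $N(\mu_y, \sigma_y^2)$: $\mu_x$, $\mu_y$ unknown; $\sigma_x^2$, $\sigma_y^2$ known
Independent VSRSs $\{X_1, X_2, \ldots, X_m\}$ and $\{Y_1, Y_2, \ldots, Y_n\}$
Point estimator for $\mu_x - \mu_y$: $\bar{X} - \bar{Y}$
\[
\bar{X} - \bar{Y} \sim N\left(\mu_x - \mu_y,\ \frac{\sigma_x^2}{m} + \frac{\sigma_y^2}{n}\right)
\]
\[
\Pr\left(\left(\bar{X} - \bar{Y}\right) - Z_{\alpha/2}\sqrt{\frac{\sigma_x^2}{m} + \frac{\sigma_y^2}{n}} \le \mu_x - \mu_y \le \left(\bar{X} - \bar{Y}\right) + Z_{\alpha/2}\sqrt{\frac{\sigma_x^2}{m} + \frac{\sigma_y^2}{n}}\right) = 1 - \alpha
\]
100(1-α)% C.I. for $\mu_x - \mu_y$:
\[
\left(\bar{X} - \bar{Y}\right) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_x^2}{m} + \frac{\sigma_y^2}{n}}
\]
(point estimate ± margin of error)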
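Normal Models, Unknown Variances
Normal populations $N(\mu_x, \sigma_x^2)$ and $N(\mu_y, \sigma_y^2)$: $\mu_x$, $\mu_y$ unknown; $\sigma_x^2$, $\sigma_y^2$ unknown
Independent VSRSs $\{X_1, X_2, \ldots, X_m\}$ and $\{Y_1, Y_2, \ldots, Y_n\}$
The known-variance 100(1-α)% C.I. for $\mu_x - \mu_y$,
\[
\left(\bar{X} - \bar{Y}\right) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_x^2}{m} + \frac{\sigma_y^2}{n}},
\]
cannot be computed because $\sigma_x^2$ and $\sigma_y^2$ are unknown.
Assumption: equal variances, $\sigma_x^2 = \sigma_y^2 = \sigma^2$
Pooled sample variance:
\[
S_{pool}^2 = \frac{(m-1)S_x^2 + (n-1)S_y^2}{m + n - 2}
\]
100(1-α)% C.I. for $\mu_x - \mu_y$:
\[
\left(\bar{X} - \bar{Y}\right) \pm t_{m+n-2,\alpha/2}\, S_{pool}\sqrt{\frac{1}{m} + \frac{1}{n}}
\]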
Normal Models, Unknown Variances
Example: growth of tomatoes

Fertilizer A (X): 12, 11, 7, 13, 8, 9, 10, 13
Fertilizer B (Y): 13, 11, 10, 6, 7, 4, 10

Assumptions: (i) Normal populations (ii) Independent samples (iii) Equal variance
$m = 8$, $n = 7$, $\bar{X} = 10.375$, $\bar{Y} = 8.714$, $S_x^2 = 5.125$, $S_y^2 = 9.905$
Pooled sample variance:
\[
S_{pool}^2 = \frac{(m-1)S_x^2 + (n-1)S_y^2}{m + n - 2} = \frac{(8-1)(5.125) + (7-1)(9.905)}{8 + 7 - 2} = 7.331
\]
95% C.I. for $\mu_x - \mu_y$:
\[
\left(\bar{X} - \bar{Y}\right) \pm t_{m+n-2,\alpha/2}\, S_{pool}\sqrt{\frac{1}{m} + \frac{1}{n}}
= (10.375 - 8.714) \pm t_{13,0.025}\sqrt{7.331}\sqrt{\frac{1}{8} + \frac{1}{7}}
\]
\[
= 1.661 \pm (2.160)\sqrt{7.331}\sqrt{\frac{1}{8} + \frac{1}{7}} = 1.661 \pm 3.027 = [-1.366,\ 4.688]
\]
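The tomato calculation can be reproduced from the raw data. The sketch below computes the pooled variance and the interval under the equal-variance assumption; the table value t_{13,0.025} = 2.160 is passed in from the example because the standard library has no t quantile, and the function name `pooled_t_interval` is hypothetical.

```python
import math
from statistics import mean, variance

def pooled_t_interval(x, y, t_crit):
    """Return (difference of means, margin of error, pooled variance) for two samples."""
    m, n = len(x), len(y)
    # statistics.variance uses the n-1 divisor, i.e. the sample variances S_x^2 and S_y^2.
    s2_pool = ((m - 1) * variance(x) + (n - 1) * variance(y)) / (m + n - 2)
    diff = mean(x) - mean(y)
    half = t_crit * math.sqrt(s2_pool * (1.0 / m + 1.0 / n))
    return diff, half, s2_pool

fert_a = [12, 11, 7, 13, 8, 9, 10, 13]  # Fertilizer A (X)
fert_b = [13, 11, 10, 6, 7, 4, 10]      # Fertilizer B (Y)

diff, half, s2_pool = pooled_t_interval(fert_a, fert_b, t_crit=2.160)
print(round(s2_pool, 3), round(diff, 3), round(half, 3))
print(round(diff - half, 3), round(diff + half, 3))
```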
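Large Sample Approximation
Normal population $N(\mu, \sigma^2)$, VSRS $\{X_1, X_2, \ldots, X_n\}$:
\[
\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)
\]
\[
\Pr\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha
\]
100(1-α)% C.I. for $\mu$: $\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
Arbitrary population with mean $\mu$ and variance $\sigma^2$: what is the distribution of $\bar{X}$?
For large n, approximately,
\[
\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \overset{\cdot}{\sim} N(0, 1)
\]
\[
\Pr\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) \approx 1 - \alpha
\]
Approximate 100(1-α)% C.I. for $\mu$: $\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
If $\sigma$ is unknown, replace it by $S$.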
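How well the large-sample interval works for a non-normal population can be explored by simulation. The sketch below is an illustration only: it assumes an exponential population with mean 1 (a choice not made in the notes), replaces σ by S, and counts how often X̄ ± 1.96·S/√n covers the true mean.

```python
import math
import random
from statistics import mean, stdev

def approx_coverage(n, reps, z=1.96, seed=1):
    """Coverage of xbar +/- z*S/sqrt(n) for samples from an Exponential(rate=1) population."""
    rng = random.Random(seed)
    true_mean = 1.0  # mean of the assumed exponential population
    hits = 0
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        half = z * stdev(sample) / math.sqrt(n)  # sigma replaced by S
        if abs(mean(sample) - true_mean) <= half:
            hits += 1
    return hits / reps

rate = approx_coverage(n=100, reps=2000)
print(rate)  # close to, though typically a little below, 0.95
```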
Population Proportion
Large population: a proportion $\pi$ are virus carriers and $1 - \pi$ are not.
$\pi$: population proportion
Random sample of size n
X = number of virus carriers in the sample
$X \sim b(n, \pi)$
Point estimator for $\pi$: $\hat{\pi} = \frac{X}{n}$ (sample proportion)
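Population Proportion
For large n, approximately,
\[
\frac{\frac{X}{n} - \pi}{\sqrt{\frac{\pi(1-\pi)}{n}}} \overset{\cdot}{\sim} N(0, 1)
\]
\[
\Pr\left(-Z_{\alpha/2} \le \frac{\frac{X}{n} - \pi}{\sqrt{\frac{\pi(1-\pi)}{n}}} \le Z_{\alpha/2}\right) \approx 1 - \alpha
\]
Approximate 100(1-α)% C.I. for $\pi$, obtained by solving the inequality for $\pi$:
\[
\frac{\hat{\pi} + \frac{Z_{\alpha/2}^2}{2n} \pm Z_{\alpha/2}\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n} + \frac{Z_{\alpha/2}^2}{4n^2}}}{1 + \frac{Z_{\alpha/2}^2}{n}}
\approx \hat{\pi} \pm Z_{\alpha/2}\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}}
\]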
Population Proportion
Example: 41 people out of 500 persons were unemployed
\[
\hat{\pi} = \frac{41}{500} = 0.082
\]
Approximate 95% C.I. for $\pi$:
\[
\hat{\pi} \pm Z_{0.025}\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}} = 0.082 \pm 1.96\sqrt{\frac{(0.082)(0.918)}{500}} = 0.082 \pm 0.024 = [5.8\%,\ 10.6\%]
\]
Estimated unemployment rate is 8.2% with margin of error 2.4%.
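The unemployment interval, and the interval obtained by solving the normal-approximation inequality for π exactly, can be compared with a short sketch. The exact solution is often called the Wilson score interval (a name not used in these notes); both function names below are illustrative.

```python
import math

def wald_interval(x, n, z):
    """Simple interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p = x / n
    half = z * math.sqrt(p * (1.0 - p) / n)
    return p - half, p + half

def wilson_interval(x, n, z):
    """Interval from solving |p_hat - pi| <= z * sqrt(pi * (1 - pi) / n) for pi."""
    p = x / n
    centre = p + z * z / (2.0 * n)
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n))
    denom = 1.0 + z * z / n
    return (centre - half) / denom, (centre + half) / denom

# Example values: 41 unemployed out of 500, 95% level (z = 1.96).
print([round(v, 3) for v in wald_interval(41, 500, 1.96)])
print([round(v, 3) for v in wilson_interval(41, 500, 1.96)])
```

For n = 500 the two intervals are close, which is why the simpler form is used in the example.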
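Two Samples Problem
Population 1: proportion $\pi_1$
Population 2: proportion $\pi_2$
$\pi_1$ and $\pi_2$ are unknown.
Sample of size $n_1$ from Population 1, with count X; sample of size $n_2$ from Population 2, with count Y
Compare $\pi_1$ and $\pi_2$ by estimating $\pi_1 - \pi_2$.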
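Two Samples Problem
Point estimators: $\hat{\pi}_1 = \frac{X}{n_1}$, $\hat{\pi}_2 = \frac{Y}{n_2}$
For large $n_1$, $n_2$, approximately,
\[
\frac{\frac{X}{n_1} - \pi_1}{\sqrt{\frac{\pi_1(1-\pi_1)}{n_1}}} \overset{\cdot}{\sim} N(0, 1), \qquad
\frac{\frac{Y}{n_2} - \pi_2}{\sqrt{\frac{\pi_2(1-\pi_2)}{n_2}}} \overset{\cdot}{\sim} N(0, 1)
\]
Approximate 100(1-α)% C.I. for $\pi_1 - \pi_2$:
\[
\left(\hat{\pi}_1 - \hat{\pi}_2\right) \pm Z_{\alpha/2}\sqrt{\frac{\hat{\pi}_1(1-\hat{\pi}_1)}{n_1} + \frac{\hat{\pi}_2(1-\hat{\pi}_2)}{n_2}}
\]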
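Two Samples Problem
Example: ability of detergents to remove stains

Detergent    Successes    Failures    Total
1            63           28          91
2            42           37          79

\[
\hat{\pi}_1 = \frac{63}{91} = \frac{9}{13} = 69.23\%, \qquad \hat{\pi}_2 = \frac{42}{79} = 53.16\%
\]
Approximate 90% C.I. for $\pi_1 - \pi_2$:
\[
\left(\hat{\pi}_1 - \hat{\pi}_2\right) \pm Z_{0.05}\sqrt{\frac{\hat{\pi}_1(1-\hat{\pi}_1)}{n_1} + \frac{\hat{\pi}_2(1-\hat{\pi}_2)}{n_2}}
= \left(\frac{9}{13} - \frac{42}{79}\right) \pm 1.645\sqrt{\frac{\left(\frac{9}{13}\right)\left(\frac{4}{13}\right)}{91} + \frac{\left(\frac{42}{79}\right)\left(\frac{37}{79}\right)}{79}}
\]
\[
= 0.1607 \pm 0.1219 = [3.88\%,\ 28.26\%]
\]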
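The detergent interval can be recomputed from the counts in the table. This is a minimal sketch of the large-sample interval for π1 − π2; the function name `two_proportion_interval` and its argument order are illustrative choices.

```python
import math
from statistics import NormalDist

def two_proportion_interval(x, n1, y, n2, level):
    """Approximate interval for pi1 - pi2 from counts x out of n1 and y out of n2."""
    p1, p2 = x / n1, y / n2
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)  # upper alpha/2 point
    half = z * math.sqrt(p1 * (1.0 - p1) / n1 + p2 * (1.0 - p2) / n2)
    diff = p1 - p2
    return diff - half, diff + half

# Counts from the example: 63 successes out of 91 and 42 out of 79, 90% level.
low, high = two_proportion_interval(63, 91, 42, 79, level=0.90)
print(f"{low:.2%} to {high:.2%}")
```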
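Sample Size Determination
100(1-α)% C.I. for $\mu$: $\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
[Figure: intervals of half-width $Z_{\alpha/2}\,\sigma/\sqrt{n}$ centred at the observed values of $\bar{X}$, drawn around $\mu$.]
D: precision requirement (the largest acceptable estimation error, i.e. the half-width of the interval)
For a mean:
\[
D = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \iff n = \frac{Z_{\alpha/2}^2\,\sigma^2}{D^2}
\]
For a proportion:
\[
D = Z_{\alpha/2}\sqrt{\frac{\pi(1-\pi)}{n}} \iff n = \frac{Z_{\alpha/2}^2\,\pi(1-\pi)}{D^2}
\]
With probability 1-α, the estimation error is at most D.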
Sample Size Determination
Example: survey on entertainment expenses
Precision requirement: 95% confidence that the estimation error is at most $120
$\sigma = \$400$ from past surveys
\[
n = \frac{Z_{0.025}^2\,\sigma^2}{D^2} = \frac{(1.96)^2(400)^2}{(120)^2} = 42.68 \approx 43
\]
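Both sample-size formulas can be wrapped in small helpers that round up to the next whole observation. This is a sketch under the formulas above; the function names are hypothetical, and the proportion inputs (π = 0.5, D = 0.03) are illustrative values that do not come from the notes.

```python
import math

def sample_size_mean(sigma, d, z):
    """Smallest n with z * sigma / sqrt(n) <= d."""
    return math.ceil((z * sigma / d) ** 2)

def sample_size_proportion(pi, d, z):
    """Smallest n with z * sqrt(pi * (1 - pi) / n) <= d, for a planning value pi."""
    return math.ceil(z * z * pi * (1.0 - pi) / (d * d))

# Survey example: sigma = 400, D = 120, 95% confidence (z = 1.96).
print(sample_size_mean(sigma=400, d=120, z=1.96))  # 43

# Illustrative (assumed) proportion case: planning value 0.5, D = 0.03.
print(sample_size_proportion(pi=0.5, d=0.03, z=1.96))
```

Since π(1−π) is largest at π = 0.5, using 0.5 as the planning value gives a conservative sample size when nothing is known about π.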