Mathematical Model
Equation, formula
$E = mc^2$, $V = IR$
Mathematical Model
(
logθ
)
cos2 3 2 x y=
$PV = NRT$
$P$: pressure, $V$: volume, $T$: temperature
$N$: number of moles, $R$: universal gas constant
Assumptions: ideal gas, static and closed environment
Mathematical Model
Ideal gas law: $PV = NRT$
Q1 : Is this relationship true?
Q2 : What is the value of the constant R?
Answer these questions with a set of measurements $(P_i, V_i, T_i, N_i)$:
$$R = \frac{P_i V_i}{N_i T_i}$$
Errors due to unknown outside factors exist.
Statistical Model
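Observed data: $p = P + \delta_p$, $v = V + \delta_v$, $t = T + \delta_t$, $n = N + \delta_n$
$\delta$: unobserved measurement errors (random)
Ideal gas law: $PV = NRT$
$$(p - \delta_p)(v - \delta_v) = (n - \delta_n) R (t - \delta_t)$$
$$pv = nRt + v\delta_p + p\delta_v - \delta_p\delta_v - Rt\delta_n - Rn\delta_t + R\delta_n\delta_t$$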
Statistical Model
Data = Systematic component + Random errors
Model parameter: unknown parameter in the systematic component, e.g. the universal gas constant $R$
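The idea of estimating the model parameter $R$ from noisy measurements can be sketched in a few lines. This is only an illustration: the data are simulated, and the noise level, the ranges for temperature and volume, and the helper names `simulate_measurements` and `estimate_R` are all assumptions, not part of the slides.

```python
import random

R_TRUE = 8.314  # J/(mol K); used only to generate hypothetical data


def simulate_measurements(count, rel_sd=0.005, seed=1):
    """Simulate noisy readings (p, v, t, n) around true values obeying PV = NRT."""
    rng = random.Random(seed)
    data = []
    for _ in range(count):
        N = 1.0                          # true number of moles (assumed)
        T = rng.uniform(280.0, 370.0)    # true temperature in K (assumed range)
        V = rng.uniform(0.020, 0.030)    # true volume in m^3 (assumed range)
        P = N * R_TRUE * T / V           # true pressure from the ideal gas law
        # observed value = true value perturbed by a random measurement error
        noisy = [x * (1.0 + rng.gauss(0.0, rel_sd)) for x in (P, V, T, N)]
        data.append(tuple(noisy))
    return data


def estimate_R(data):
    """Average the per-measurement ratios R_i = p_i v_i / (n_i t_i)."""
    ratios = [p * v / (n * t) for p, v, t, n in data]
    return sum(ratios) / len(ratios)


measurements = simulate_measurements(200)
r_hat = estimate_R(measurements)
print(f"estimated R = {r_hat:.3f}")
```

Each ratio $R_i$ differs because of the random errors; averaging many of them is one simple way to recover the systematic component.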
Analysis of Variance Model (ANOVA)
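One-way ANOVA: compare multiple populations
Population 1 (mean $\mu_1$, variance $\sigma_1^2$): sample $Y_{11}, Y_{12}, \ldots, Y_{1n_1}$
Population 2 (mean $\mu_2$, variance $\sigma_2^2$): sample $Y_{21}, Y_{22}, \ldots, Y_{2n_2}$
…………..
Population $a$ (mean $\mu_a$, variance $\sigma_a^2$): sample $Y_{a1}, Y_{a2}, \ldots, Y_{an_a}$
Assumptions
1. Normal
2. Equal Variances
3. Independence
Under these assumptions the populations are $N(\mu_1, \sigma^2)$, $N(\mu_2, \sigma^2)$, ..., $N(\mu_a, \sigma^2)$.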
One-way ANOVA
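Total sample size: $N = \sum_{i=1}^{a} n_i$
Overall population mean (grand mean): $\mu = \frac{1}{N}\sum_{i=1}^{a} n_i \mu_i$
$i$th treatment effect: $\alpha_i = \mu_i - \mu$, so that $\sum_{i=1}^{a} n_i \alpha_i = 0$
Random errors: $\varepsilon_{ij} = Y_{ij} - \mu_i = Y_{ij} - \mu - \alpha_i$
ANOVA model
$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \quad j = 1, 2, \ldots, n_i, \quad i = 1, 2, \ldots, a$$
$$\sum_{i=1}^{a} n_i \alpha_i = 0, \quad \varepsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2)$$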
One-way ANOVA
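ANOVA model: $Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, $\sum_{i=1}^{a} n_i \alpha_i = 0$, $\varepsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2)$
Between group: the group means $\mu + \alpha_1$, $\mu + \alpha_2$, ..., $\mu + \alpha_a$ vary around $\mu$.
Within group (e.g. group 2): the observations vary around the group mean $\mu + \alpha_2$:
$Y_{21} = \mu + \alpha_2 + \varepsilon_{21}$
$Y_{22} = \mu + \alpha_2 + \varepsilon_{22}$
...
$Y_{2n_2} = \mu + \alpha_2 + \varepsilon_{2n_2}$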
Test for Treatment Effects
$H_0$: Populations are the same. vs $H_1$: Populations are not all the same.
$H_0$: There is no treatment effect. vs $H_1$: There are some treatment effects.
$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0$ vs $H_1$: some $\alpha_i \ne 0$
$i$th sample mean: $\bar{Y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij}$
Overall sample mean: $\bar{Y} = \frac{1}{N}\sum_{i=1}^{a}\sum_{j=1}^{n_i} Y_{ij} = \frac{1}{N}\sum_{i=1}^{a} n_i \bar{Y}_i$
Total sum of squares: $SST = \sum_{i=1}^{a}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2$
Treatment sum of squares (between group variation): $SSA = \sum_{i=1}^{a}\sum_{j=1}^{n_i} (\bar{Y}_i - \bar{Y})^2 = \sum_{i=1}^{a} n_i(\bar{Y}_i - \bar{Y})^2$
Error sum of squares (within group variation): $SSE = \sum_{i=1}^{a}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2$
Test for Treatment Effects
Breakdown of sum of squares
$$Y_{ij} - \bar{Y} = (\bar{Y}_i - \bar{Y}) + (Y_{ij} - \bar{Y}_i)$$
$$\sum_{i=1}^{a}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2 = \sum_{i=1}^{a} n_i(\bar{Y}_i - \bar{Y})^2 + \sum_{i=1}^{a}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2$$
$$SST = SSA + SSE$$
Treatment mean squares: $MSA = \frac{SSA}{a-1} = \frac{1}{a-1}\sum_{i=1}^{a} n_i(\bar{Y}_i - \bar{Y})^2$
Error mean squares: $MSE = \frac{SSE}{N-a} = \frac{1}{N-a}\sum_{i=1}^{a}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2$
If $H_1$ is true, the $\mu_i$ are not all the same, so the $\alpha_i$ are not all the same; the variation of $\bar{Y}_i$ around $\bar{Y}$ is large and MSA tends to be large.
MSE is unaffected by the population means.
Test for Treatment Effects
Test statistic: $F = \frac{MSA}{MSE}$, where
$MSA = \frac{SSA}{a-1} = \frac{1}{a-1}\sum_{i=1}^{a} n_i(\bar{Y}_i - \bar{Y})^2$ and $MSE = \frac{SSE}{N-a} = \frac{1}{N-a}\sum_{i=1}^{a}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2$
Reject $H_0$ if $F$ is too large.
Reject $H_0$ if $F_{obs} > F(a-1, N-a, \alpha)$, where $F(a-1, N-a, \alpha)$ is obtained from the F distribution table.
F Distribution
$X \sim F(r_1, r_2)$
$$f(x) = \frac{\Gamma\left(\frac{r_1 + r_2}{2}\right)}{\Gamma\left(\frac{r_1}{2}\right)\Gamma\left(\frac{r_2}{2}\right)} \left(\frac{r_1}{r_2}\right)^{r_1/2} x^{r_1/2 - 1} \left(1 + \frac{r_1}{r_2}x\right)^{-(r_1 + r_2)/2}, \quad x > 0$$
$$E(X) = \frac{r_2}{r_2 - 2}, \qquad Var(X) = \frac{2 r_2^2 (r_1 + r_2 - 2)}{r_1 (r_2 - 2)^2 (r_2 - 4)}$$
[Figure: F densities for $(r_1, r_2)$ = (2, 4), (4, 6), (9, 9), (12, 12), plotted for $0 \le x \le 5$]
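The F density can be evaluated directly from its formula. The sketch below uses only the standard library; the function names `f_pdf` and `f_mean` are illustrative choices, and the evaluation point $x = 1$ is arbitrary.

```python
import math


def f_pdf(x, r1, r2):
    """Density of the F(r1, r2) distribution at x (zero for x <= 0)."""
    if x <= 0:
        return 0.0
    # work on the log scale so large degrees of freedom do not overflow
    log_const = (math.lgamma((r1 + r2) / 2) - math.lgamma(r1 / 2)
                 - math.lgamma(r2 / 2) + (r1 / 2) * math.log(r1 / r2))
    log_kernel = ((r1 / 2 - 1) * math.log(x)
                  - ((r1 + r2) / 2) * math.log(1 + r1 * x / r2))
    return math.exp(log_const + log_kernel)


def f_mean(r2):
    """E(X) = r2 / (r2 - 2), which requires r2 > 2."""
    return r2 / (r2 - 2)


# the four (r1, r2) pairs shown in the density plot
for r1, r2 in [(2, 4), (4, 6), (9, 9), (12, 12)]:
    print(r1, r2, round(f_pdf(1.0, r1, r2), 4), round(f_mean(r2), 4))
```

For $r_1 = 2$, $r_2 = 4$ the density reduces to $(1 + x/2)^{-3}$, which gives a quick hand check of the function.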
F Distribution
[Figure: density of $F(r_1, r_2)$ with upper-tail area $\alpha$ to the right of the critical value $F(r_1, r_2, \alpha)$]
F Distribution Table
$F(3, 4, 0.05) = ?$
ANOVA Table
Source      SS    d.f.    MS             F-ratio
Treatment   SSA   a - 1   SSA/(a - 1)    MSA/MSE
Error       SSE   N - a   SSE/(N - a)
Total       SST   N - 1
$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0$ vs $H_1$: some $\alpha_i \ne 0$
Test statistic: $F = \frac{MSA}{MSE}$
Reject $H_0$ if $F_{obs} > F(a-1, N-a, \alpha)$.
Computational Formulae
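$$SSA = \sum_{i=1}^{a} n_i(\bar{Y}_i - \bar{Y})^2 = \sum_{i=1}^{a} \frac{T_i^2}{n_i} - \frac{T_{..}^2}{N}$$
$$SSE = \sum_{i=1}^{a}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2 = \sum_{i=1}^{a}\sum_{j=1}^{n_i} Y_{ij}^2 - \sum_{i=1}^{a} \frac{T_i^2}{n_i}$$
$$SST = \sum_{i=1}^{a}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2 = \sum_{i=1}^{a}\sum_{j=1}^{n_i} Y_{ij}^2 - \frac{T_{..}^2}{N}$$
$i$th total: $T_i = \sum_{j=1}^{n_i} Y_{ij}$; overall total: $T_{..} = \sum_{i=1}^{a} T_i$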
One-way ANOVA
Example: Color brightness of films
Brand   Data                                                          T_i
Kodak   32, 34, 31, 30, 37, 28, 28, 27, 30, 32, 26, 29, 27, 30, 31   452
Fuji    43, 41, 44, 50, 47, 32, 32, 36, 35, 34, 32, 38, 38, 40, 36   578
Agfa    23, 24, 25, 21, 26, 25, 27, 26, 22, 25, 27, 30, 25, 25, 27   378
$H_0: \alpha_1 = \alpha_2 = \alpha_3 = 0$ vs $H_1$: not $H_0$, $\alpha = 0.05$
$n_1 = n_2 = n_3 = 15$, $a = 3$, $N = 45$
$T_1 = 452$, $T_2 = 578$, $T_3 = 378$, $T_{..} = 452 + 578 + 378 = 1408$, $\sum_{i=1}^{3}\sum_{j=1}^{15} Y_{ij}^2 = 46040$
$$SSA = \sum_{i=1}^{3} \frac{T_i^2}{n_i} - \frac{T_{..}^2}{N} = \frac{452^2}{15} + \frac{578^2}{15} + \frac{378^2}{15} - \frac{1408^2}{45} = 1363.38$$
$$SST = \sum_{i=1}^{a}\sum_{j=1}^{n_i} Y_{ij}^2 - \frac{T_{..}^2}{N} = 46040 - \frac{1408^2}{45} = 1985.24$$
$$SSE = SST - SSA = 1985.24 - 1363.38 = 621.86$$
One-way ANOVA
Source      SS        d.f.   MS       F-ratio
Treatment   1363.38   2      681.69   46.03
Error       621.86    42     14.81
Total       1985.24   44
From the F distribution table, $F(2, 42, 0.05) \approx F(2, 40, 0.05) = 3.23$.
F-ratio $= 46.03 > 3.23$
Reject $H_0$ at $\alpha = 0.05$.
The color brightness of the three brands of film differs significantly.
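The film example can be reproduced with the computational formulae. The following is a minimal sketch in plain Python; the function name `one_way_anova` is illustrative, and the critical value is the table value quoted in the example rather than something computed here.

```python
# Brightness readings from the film example (15 per brand)
data = {
    "Kodak": [32, 34, 31, 30, 37, 28, 28, 27, 30, 32, 26, 29, 27, 30, 31],
    "Fuji": [43, 41, 44, 50, 47, 32, 32, 36, 35, 34, 32, 38, 38, 40, 36],
    "Agfa": [23, 24, 25, 21, 26, 25, 27, 26, 22, 25, 27, 30, 25, 25, 27],
}


def one_way_anova(groups):
    """Return (SSA, SSE, SST, F) using the computational formulae."""
    all_values = [y for g in groups for y in g]
    N = len(all_values)
    a = len(groups)
    correction = sum(all_values) ** 2 / N                 # T..^2 / N
    between = sum(sum(g) ** 2 / len(g) for g in groups)   # sum of T_i^2 / n_i
    ssa = between - correction
    sst = sum(y * y for y in all_values) - correction
    sse = sst - ssa
    f_ratio = (ssa / (a - 1)) / (sse / (N - a))
    return ssa, sse, sst, f_ratio


ssa, sse, sst, f_ratio = one_way_anova(list(data.values()))
F_CRIT = 3.23  # F(2, 40, 0.05), the table value quoted in the example

print(f"SSA = {ssa:.2f}, SSE = {sse:.2f}, SST = {sst:.2f}, F = {f_ratio:.2f}")
print("reject H0" if f_ratio > F_CRIT else "do not reject H0")
```

Small differences in the last digit relative to the table (for example in SSE and the F-ratio) come from rounding intermediate values by hand.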
Estimation
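Treatment effect: $\alpha_i$
Point: $\bar{Y}_i - \bar{Y}$
Interval: $\bar{Y}_i - \bar{Y} \pm t_{N-a,\,\alpha/2}\sqrt{MSE\left(\frac{1}{n_i} - \frac{1}{N}\right)}$
Difference in treatment effects: $\alpha_i - \alpha_j$
Point: $\bar{Y}_i - \bar{Y}_j$
Interval: $\bar{Y}_i - \bar{Y}_j \pm t_{N-a,\,\alpha/2}\sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$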
Estimation
Example : Color brightness of films
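$\bar{Y}_1 = \frac{452}{15} = 30.13$, $\bar{Y}_2 = \frac{578}{15} = 38.53$, $\bar{Y}_3 = \frac{378}{15} = 25.2$, $\bar{Y} = \frac{1408}{45} = 31.29$
95% C.I. for $\alpha_1$:
$$\bar{Y}_1 - \bar{Y} \pm t_{42,\,0.025}\sqrt{MSE\left(\frac{1}{n_1} - \frac{1}{N}\right)} = (30.13 - 31.29) \pm (2.021)\sqrt{(14.81)\left(\frac{1}{15} - \frac{1}{45}\right)} = -1.16 \pm 1.64$$
$[-2.80, 0.48]$
95% C.I. for $\alpha_2 - \alpha_3$:
$$\bar{Y}_2 - \bar{Y}_3 \pm t_{42,\,0.025}\sqrt{MSE\left(\frac{1}{n_2} + \frac{1}{n_3}\right)} = (38.53 - 25.2) \pm (2.021)\sqrt{(14.81)\left(\frac{1}{15} + \frac{1}{15}\right)} = 13.33 \pm 2.84$$
$[10.49, 16.17]$
95% C.I. for $\alpha_1 - \alpha_2$: $[-11.24, -5.56]$
95% C.I. for $\alpha_1 - \alpha_3$: $[2.09, 7.77]$
$\alpha_2 > \alpha_3$, $\alpha_1 < \alpha_2$, $\alpha_1 > \alpha_3$, so $\alpha_3 < \alpha_1 < \alpha_2$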
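The intervals in this example can be checked numerically. The sketch below starts from the group totals and the error sum of squares in the ANOVA table; the critical value 2.021 is the table value used in the example, and the function names are illustrative.

```python
import math

totals = {"Kodak": 452, "Fuji": 578, "Agfa": 378}   # T_1, T_2, T_3
n_i = 15                                            # observations per brand
N = 45                                              # total sample size
MSE = 621.86 / 42                                   # error mean square
T_CRIT = 2.021   # value used in the example for t(42, 0.025)

means = {brand: total / n_i for brand, total in totals.items()}
grand_mean = sum(totals.values()) / N


def effect_interval(brand):
    """C.I. for the treatment effect alpha_i of one brand."""
    centre = means[brand] - grand_mean
    half = T_CRIT * math.sqrt(MSE * (1 / n_i - 1 / N))
    return centre - half, centre + half


def difference_interval(brand_i, brand_j):
    """C.I. for the difference alpha_i - alpha_j between two brands."""
    centre = means[brand_i] - means[brand_j]
    half = T_CRIT * math.sqrt(MSE * (1 / n_i + 1 / n_i))
    return centre - half, centre + half


print("alpha_1:", [round(v, 2) for v in effect_interval("Kodak")])
print("alpha_2 - alpha_3:", [round(v, 2) for v in difference_interval("Fuji", "Agfa")])
print("alpha_1 - alpha_2:", [round(v, 2) for v in difference_interval("Kodak", "Fuji")])
print("alpha_1 - alpha_3:", [round(v, 2) for v in difference_interval("Kodak", "Agfa")])
```

An interval for a difference that excludes zero indicates which of the two treatment effects is larger, which is how the ordering of the three brands is read off.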
Two way ANOVA
Example : Brightness of synthetic fabric
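                 Temperature
Time (cycles)    350°F        375°F        400°F
40               38, 32, 30   37, 35, 40   36, 39, 43
50               40, 45, 36   39, 42, 46   39, 48, 47
Two-way factorial ANOVA model:
$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}, \quad i = 1, 2, \quad j = 1, 2, 3, \quad k = 1, 2, 3$$
$$\sum_i \alpha_i = \sum_j \beta_j = \sum_i \gamma_{ij} = \sum_j \gamma_{ij} = 0, \quad \varepsilon_{ijk} \overset{iid}{\sim} N(0, \sigma^2)$$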
Two way ANOVA
Example : Brightness of synthetic fabric
Two-way factorial ANOVA model: $Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}$, $\varepsilon_{ijk} \overset{iid}{\sim} N(0, \sigma^2)$
MTB > print 'Bright' 'Time' 'Temp'
Data Display
Row  Bright  Time  Temp
  1      38    40   350
  2      32    40   350
  3      30    40   350
  4      37    40   375
  5      35    40   375
  6      40    40   375
  7      36    40   400
  8      39    40   400
  9      43    40   400
 10      40    50   350
 11      45    50   350
 12      36    50   350
………
MTB > ANOVA 'Bright' = Time Temp Time*Temp.
Analysis of Variance (Balanced Designs)
Factor  Type   Levels  Values
Time    fixed       2  40 50
Temp    fixed       3  350 375 400
Analysis of Variance for Bright
Source      DF      SS      MS     F      P
Time         1  150.22  150.22  9.69  0.009
Temp         2   80.78   40.39  2.61  0.115
Time*Temp    2    3.44    1.72  0.11  0.896
Error       12  186.00   15.50
Total       17  420.44
Significant: Time (P = 0.009)
Interaction
[Figure: group mean against temperature (350, 375, 400), with one line for Time = 40 and one for Time = 50; one panel shows an additive pattern, the other a non-additive pattern]
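The sums of squares in the Minitab table can be reproduced by hand for this balanced design. The sketch below is plain Python, not Minitab's algorithm; the function name `two_way_anova` and the dictionary layout are illustrative choices.

```python
# cells[time][temp] = replicate brightness readings from the fabric example
cells = {
    40: {350: [38, 32, 30], 375: [37, 35, 40], 400: [36, 39, 43]},
    50: {350: [40, 45, 36], 375: [39, 42, 46], 400: [39, 48, 47]},
}


def mean(values):
    return sum(values) / len(values)


def two_way_anova(cells):
    """Sums of squares and d.f. for a balanced two-factor design with replication."""
    times = sorted(cells)
    temps = sorted(cells[times[0]])
    k = len(cells[times[0]][temps[0]])             # replicates per cell
    all_values = [y for t in times for c in temps for y in cells[t][c]]
    grand = mean(all_values)
    time_mean = {t: mean([y for c in temps for y in cells[t][c]]) for t in times}
    temp_mean = {c: mean([y for t in times for y in cells[t][c]]) for c in temps}
    ss_time = k * len(temps) * sum((time_mean[t] - grand) ** 2 for t in times)
    ss_temp = k * len(times) * sum((temp_mean[c] - grand) ** 2 for c in temps)
    ss_cells = k * sum((mean(cells[t][c]) - grand) ** 2
                       for t in times for c in temps)
    ss_total = sum((y - grand) ** 2 for y in all_values)
    a, b = len(times), len(temps)
    ss = {"Time": ss_time, "Temp": ss_temp,
          "Time*Temp": ss_cells - ss_time - ss_temp,   # interaction
          "Error": ss_total - ss_cells, "Total": ss_total}
    df = {"Time": a - 1, "Temp": b - 1, "Time*Temp": (a - 1) * (b - 1),
          "Error": a * b * (k - 1), "Total": a * b * k - 1}
    return ss, df


ss, df = two_way_anova(cells)
mse = ss["Error"] / df["Error"]
f_ratios = {s: (ss[s] / df[s]) / mse for s in ("Time", "Temp", "Time*Temp")}
for source in ("Time", "Temp", "Time*Temp", "Error", "Total"):
    print(f"{source:10s} DF = {df[source]:2d}  SS = {ss[source]:7.2f}")
print({s: round(f, 2) for s, f in f_ratios.items()})
```

The interaction sum of squares is obtained as the between-cell variation left over after removing the two main effects, which is valid here because every cell has the same number of replicates.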
Regression
Sir Francis Galton
(1822 – 1911)
[Figure: height of son against height of father]
The heights of the sons of fathers regressed towards the mean height of the population.
Regression
Regression: model the relationship between a dependent variable and independent variable(s)
Linear Regression: linear relationship
Simple Linear Regression: one independent variable
Examples
Independent variable (X)     Dependent variable (Y)
Extent of training           Job performance
Risk of the stock            Return of a stock
A-Level Score                Overall CGA
Tree age (by tree rings)     Tree age (by C14)
Simple Linear Regression
[Figure: scatterplot with regression line]
Simple Linear Regression Model
$$Y_i = \alpha + \beta X_i + \varepsilon_i, \quad i = 1, 2, \ldots, n$$
Assumptions: $\varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$
Data: $\{(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)\}$
Simple Linear Regression Model
Example: $Y$ = Height of son (in cm), $X$ = Height of father (in cm)
Suppose the true relation is given by $Y = 0.9X + 15$.
Fathers with the same height would have sons with the same height. Unrealistic!
More reasonable relationship: $E(Y) = 0.9X + 15$
X (observed)   E(Y) = 0.9X + 15 (unobserved)   ε, random error (unobserved)   Y (observed)
170            168                             1.3                            169.3
175            172.5                           -0.8                           171.7
180            177                             -2.4                           174.6
185            181.5                           0.7                            182.2
Observed data:
X     Y
170   169.3
175   171.7
180   174.6
185   182.2
Estimate the regression line from these observed data: fit a regression line to the data.
Estimation of Model Parameters
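Sample statistics
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad \bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$$
$$S_{xx} = \sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2$$
$$S_{yy} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2$$
$$S_{xy} = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y}$$
Estimates: $b = \hat{\beta} = \frac{S_{xy}}{S_{xx}}$, $a = \hat{\alpha} = \bar{Y} - b\bar{X}$
Fitted regression line: $\hat{Y} = a + bX$
True regression line: $E(Y) = \alpha + \beta X$
The fitted regression line $\ne$ the true regression line.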
Fitting Regression Line
Example: Study of how wheat yield depends on fertilizer.
$X$ = Fertilizer (in lb/acre), $Y$ = Yield (in bu/acre)
X   100  200  300  400  500  600  700
Y    40   50   50   70   65   65   80
$\bar{X} = 400$, $\bar{Y} = 60$, $\sum_{i=1}^{7} X_i^2 = 1400000$, $\sum_{i=1}^{7} Y_i^2 = 26350$, $\sum_{i=1}^{7} X_i Y_i = 184500$
Fitting Regression Line
$\bar{X} = 400$, $\bar{Y} = 60$, $\sum_{i=1}^{7} X_i^2 = 1400000$, $\sum_{i=1}^{7} Y_i^2 = 26350$, $\sum_{i=1}^{7} X_i Y_i = 184500$
$$S_{xx} = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 = 1400000 - (7)(400)^2 = 280000$$
$$S_{xy} = \sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y} = 184500 - (7)(400)(60) = 16500$$
$$b = \frac{S_{xy}}{S_{xx}} = \frac{16500}{280000} = 0.059$$
$$a = \bar{Y} - b\bar{X} = 60 - \frac{16500}{280000}(400) = 36.43$$
Fitted regression line: $\hat{Y} = 36.43 + 0.059X$
$$S_{yy} = \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2 = 26350 - (7)(60)^2 = 1150$$
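The fitting steps for the wheat example translate directly into code. This is a minimal sketch in plain Python; the function names `fit_line` and `predict` are illustrative.

```python
X = [100, 200, 300, 400, 500, 600, 700]   # fertilizer (lb/acre)
Y = [40, 50, 50, 70, 65, 65, 80]          # yield (bu/acre)


def fit_line(x, y):
    """Least-squares intercept a and slope b via b = S_xy / S_xx."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum(xi * xi for xi in x) - n * x_bar ** 2
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b, s_xx, s_xy


a, b, s_xx, s_xy = fit_line(X, Y)


def predict(x0):
    """Point prediction from the fitted line at X = x0."""
    return a + b * x0


print(f"Sxx = {s_xx:.0f}, Sxy = {s_xy:.0f}")
print(f"fitted line: Y = {a:.2f} + {b:.3f} X")
print(f"prediction at X = 400: {predict(400):.2f}")
```

Because the code keeps the slope unrounded, the prediction at $X = 400$ equals the sample mean of $Y$ exactly; hand calculations with $b$ rounded to 0.059 differ slightly in the second decimal.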
Fitting Regression Line
$\hat{Y} = 36.43 + 0.059X$
Prediction
$X_0 = 400$: $\hat{Y}_0 = 36.43 + (0.059)(400) = 60.03$
$X_0 = 650$: $\hat{Y}_0 = 74.78$
$X_0 = 0$: $\hat{Y}_0 = 36.43$
Danger of Extrapolation
[Figures: SARS trend. Number of cases against date, first over 10-Mar to 19-Apr and then on an axis extended to 28-Feb through 19-May. Number of patients in hospital against date, first over 10-Mar to 19-Apr and then on axes extended to 28-Feb through 19-May and 28-Feb through 8-Jun.]
Nonlinear Relationships
Association ≠ Causation
Example : Price and Demand for gas
Fitted regression line: Demand = 139.24 – 1.11 Price
Year   Price   Demand
1960    30     134
1961    31     112
1962    37     136
1963    42     109
1964    43     105
1965    45      87
1966    50      56
1967    54      43
1968    54      77
1969    57      35
1970    58      65
1971    58      56
1972    60      58
1973    73      55
1974    88      49
1975    89      39
1976    92      36
1977    97      46
1978   100      40
1979   102      42
Low demand is due to high price?
Simpson’s Paradox
[Figure: the price and demand data grouped by year: 1960-1965, 1966-1973, 1974-1979]
Test For Regression Effect
Test $H_0: \beta = 0$ vs $H_1: \beta \ne 0$
Fitted values: $\hat{Y}_i = a + bX_i$
Residuals: $r_i = Y_i - \hat{Y}_i$
Decomposition of Variation
$$Y_i - \bar{Y} = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)$$
Variation of $Y$ = Explained variation + Unexplained variation
Random Error: $\varepsilon_i = Y_i - \alpha - \beta X_i$ (random error $\ne$ residual)
Test For Regression Effect
Decomposition of Variation
$$Y_i - \bar{Y} = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)$$
Breakdown of sum of squares
$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$
$$SST = SSR + SSE$$
Total sum of squares: $SST = S_{yy}$
Regression sum of squares:
$$SSR = \sum_{i=1}^{n} (a + bX_i - \bar{Y})^2 = \sum_{i=1}^{n} (\bar{Y} - b\bar{X} + bX_i - \bar{Y})^2 = b^2\sum_{i=1}^{n} (X_i - \bar{X})^2 = b^2 S_{xx} = \frac{S_{xy}^2}{S_{xx}}$$
Error sum of squares:
$$SSE = SST - SSR = S_{yy} - b^2 S_{xx} = S_{yy} - \frac{S_{xy}^2}{S_{xx}}$$
Test For Regression Effect
ANOVA table
Source       SS    d.f.    MS            F-ratio
Regression   SSR   1       SSR           MSR/MSE
Error        SSE   n - 2   SSE/(n - 2)
Total        SST   n - 1
$H_0: \beta = 0$ vs $H_1: \beta \ne 0$
$MSR = \frac{SSR}{1} = SSR$, $MSE = \frac{SSE}{n - 2}$
Test statistic: $F = \frac{MSR}{MSE}$
Reject $H_0$ if $F_{obs} > F(1, n-2, \alpha)$.
Test For Regression Effect
Example: Wheat yield example
$S_{xx} = 280000$, $S_{yy} = 1150$, $S_{xy} = 16500$
Regression line: $\hat{Y} = 36.43 + 0.059X$
$SSR = b^2 S_{xx} = (0.059)^2(280000) = 974.68$
$SST = S_{yy} = 1150$
$SSE = SST - SSR = 1150 - 974.68 = 175.32$
Source       SS       d.f.   MS       F-ratio
Regression   974.68   1      974.68   27.805
Error        175.32   5      35.064
Total        1150     6
$F(1, 5, 0.05) = 6.61 < 27.805$, so $H_0$ is rejected at $\alpha = 0.05$.
Coefficient of Determination
$$R^2 = \frac{SSR}{SST} = \frac{\text{Explained variation}}{\text{Total variation}}, \qquad 0 \le R^2 \le 1$$
$R^2 = 0$: no linear relationship; $R^2 = 1$: perfect linear relationship
Strong relationship: high prediction power
Example: $R^2 = \frac{974.68}{1150} = 84.8\%$
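The regression ANOVA and $R^2$ for the wheat data can be recomputed from the raw observations. The sketch below keeps the slope unrounded, so its sums of squares differ a little from the hand calculation, which uses $b = 0.059$: it gives SSR of about 972.32 instead of 974.68 and an F-ratio of about 27.36. The critical value is the table value quoted in the example.

```python
X = [100, 200, 300, 400, 500, 600, 700]   # fertilizer (lb/acre)
Y = [40, 50, 50, 70, 65, 65, 80]          # yield (bu/acre)

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n
s_xx = sum((x - x_bar) ** 2 for x in X)
s_yy = sum((y - y_bar) ** 2 for y in Y)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))

sst = s_yy                     # total sum of squares
ssr = s_xy ** 2 / s_xx         # regression sum of squares, b^2 * S_xx
sse = sst - ssr                # error sum of squares
mse = sse / (n - 2)
f_ratio = ssr / mse            # MSR / MSE with MSR = SSR / 1
r_squared = ssr / sst

F_CRIT = 6.61                  # F(1, 5, 0.05) from the table

print(f"SSR = {ssr:.2f}, SSE = {sse:.2f}, SST = {sst:.2f}")
print(f"F = {f_ratio:.2f}, R^2 = {100 * r_squared:.1f}%")
print("reject H0" if f_ratio > F_CRIT else "do not reject H0")
```

The conclusion is the same either way: the F-ratio is well above 6.61, and roughly 85% of the variation in yield is explained by the fitted line.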
C.I . For Regression Parameters
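100(1 - α)% C.I. for $\beta$: $b \pm t_{n-2,\,\alpha/2}\sqrt{\frac{MSE}{S_{xx}}}$
100(1 - α)% C.I. for $\alpha$: $a \pm t_{n-2,\,\alpha/2}\sqrt{MSE\left(\frac{1}{n} + \frac{\bar{X}^2}{S_{xx}}\right)}$
Large $S_{xx}$ gives more accurate estimates.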
Demonstration
C.I . For Regression Parameters
Example: Wheat yield example
Regression line: $\hat{Y} = 36.43 + 0.059X$
Source       SS       d.f.   MS       F-ratio
Regression   974.68   1      974.68   27.805
Error        175.32   5      35.064
Total        1150     6
95% C.I. for $\beta$:
$$b \pm t_{5,\,0.025}\sqrt{\frac{MSE}{S_{xx}}} = 0.059 \pm (2.57)\sqrt{\frac{35.064}{280000}} = 0.059 \pm 0.0288$$
$[0.0302, 0.0878]$
95% C.I. for $\alpha$:
$$a \pm t_{5,\,0.025}\sqrt{MSE\left(\frac{1}{n} + \frac{\bar{X}^2}{S_{xx}}\right)} = 36.43 \pm (2.57)\sqrt{(35.064)\left(\frac{1}{7} + \frac{400^2}{280000}\right)} = 36.43 \pm 12.86$$
$[23.57, 49.29]$
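Both intervals follow from the summary statistics of the wheat example, so they are easy to check numerically. The sketch below plugs in the rounded values from the slides ($a$, $b$, MSE) and the table value 2.57 for $t_{5,\,0.025}$; the function names are illustrative.

```python
import math

n = 7
x_bar = 400.0
s_xx = 280000.0
a, b = 36.43, 0.059      # fitted intercept and slope (rounded, as in the example)
MSE = 35.064             # error mean square from the ANOVA table
T_CRIT = 2.57            # t(5, 0.025) from the t table


def slope_interval():
    """C.I. for beta: b +/- t * sqrt(MSE / S_xx)."""
    half = T_CRIT * math.sqrt(MSE / s_xx)
    return b - half, b + half


def intercept_interval():
    """C.I. for alpha: a +/- t * sqrt(MSE * (1/n + x_bar^2 / S_xx))."""
    half = T_CRIT * math.sqrt(MSE * (1 / n + x_bar ** 2 / s_xx))
    return a - half, a + half


print("beta: ", [round(v, 4) for v in slope_interval()])
print("alpha:", [round(v, 2) for v in intercept_interval()])
```

Note that MSE must stay inside the square root in the intercept formula; dropping it would shrink the half-width from about 12.86 to about 2.17.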
Prediction
Predict the value of $Y_0$ at a fixed value $X = X_0$.
Point prediction: $\hat{Y}_0 = a + bX_0$
100(1 - α)% prediction interval (P.I.):
$$\hat{Y}_0 \pm t_{n-2,\,\alpha/2}\sqrt{MSE\left(1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{S_{xx}}\right)}$$
Prediction
Example: Wheat yield example
Regression line: $\hat{Y} = 36.43 + 0.059X$
Source       SS       d.f.   MS       F-ratio
Regression   974.68   1      974.68   27.805
Error        175.32   5      35.064
Total        1150     6
At $X_0 = 450$, $\hat{Y}_0 = 36.43 + (0.059)(450) = 62.98$
90% prediction interval:
$$\hat{Y}_0 \pm t_{5,\,0.05}\sqrt{MSE\left(1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{S_{xx}}\right)} = 62.98 \pm (2.02)\sqrt{(35.064)\left(1 + \frac{1}{7} + \frac{(450 - 400)^2}{280000}\right)} = 62.98 \pm 12.837$$
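The prediction interval can be wrapped in a small function to see how its width changes with $X_0$. This is a sketch using the rounded summary values of the wheat example and the table value 2.02 for $t_{5,\,0.05}$; the function name and signature are illustrative.

```python
import math


def prediction_interval(x0, a, b, mse, n, x_bar, s_xx, t_crit):
    """Return (point prediction, half-width) of the P.I. at X = x0."""
    y0 = a + b * x0
    half = t_crit * math.sqrt(mse * (1 + 1 / n + (x0 - x_bar) ** 2 / s_xx))
    return y0, half


# summary values from the wheat example
params = dict(a=36.43, b=0.059, mse=35.064, n=7, x_bar=400, s_xx=280000,
              t_crit=2.02)

y0, half = prediction_interval(450, **params)
print(f"X0 = 450: {y0:.2f} +/- {half:.3f}")

# the interval widens as X0 moves away from the sample mean of X
_, half_far = prediction_interval(700, **params)
print(f"X0 = 700: half-width {half_far:.3f}")
```

The half-width is smallest at $X_0 = \bar{X}$ and grows with $(X_0 - \bar{X})^2$, which is one quantitative reason to be cautious about extrapolation.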
Prediction
Multiple Linear Regression
Example : Fuel consumption data
Data Display
Row  State    POP    TAX  NLIC    INC    ROAD  FUELC     DLIC
  1  ME      1029   9.00   540  3.571   1.976    557  52.4781
  2  NH       771   9.00   441  4.092   1.250    404  57.1984
  3  VT       462   9.00   268  3.865   1.586    259  58.0087
  4  MA      5787   7.50  3060  4.870   2.351   2396  52.8771
  5  RI       968   8.00   527  4.399   0.431    397  54.4422
  6  CN      3082  10.00  1760  5.342   1.333   1408  57.1058
  7  NY     18366   8.00  8278  5.319  11.868   6312  45.0724
  8  NJ      7367   8.00  4074  5.126   2.138   3439  55.3007
  9  PA     11926   8.00  6312  4.447   8.577   5528  52.9264
 10  OH     10783   7.00  5948  4.512   8.507   5375  55.1609
 11  IN      5291   8.00  2804  4.391   5.939   3068  52.9957
 12  IL     11251   7.50  5903  5.126  14.186   5301  52.4664
………..
$$FUEL = \beta_0 + \beta_1 TAX + \beta_2 DLIC + \beta_3 INC + \beta_4 ROAD + \varepsilon$$
Multiple Linear Regression
Example : Fuel consumption data
Regression Analysis
The regression equation is
FUEL = 37.7 - 3.48 TAX + 1.34 DLIC - 6.65 INC - 0.242 ROAD
Predictor     Coef    Stdev  t-ratio      p
Constant     37.68    18.57     2.03  0.049
TAX         -3.478    1.298    -2.68  0.010
DLIC        1.3366   0.1924     6.95  0.000
INC         -6.651    1.723    -3.86  0.000
ROAD       -0.2417   0.3391    -0.71  0.480
s = 6.633   R-sq = 67.8%   R-sq(adj) = 64.9%
Analysis of Variance
SOURCE      DF       SS      MS      F      p
Regression   4  3991.92  997.98  22.68  0.000
Error       43  1892.05   44.00
Total       47  5883.96
Unusual Observations
Obs.  TAX    FUEL     Fit  Stdev.Fit  Residual  St.Resid
 37   5.0  63.963  64.758      3.723    -0.795    -0.14 X
 40   7.0  96.812  73.371      2.102    23.441     3.73R
R denotes an obs. with a large st. resid.
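Since only part of the fuel data is listed, the fit itself cannot be rerun here, but the summary quantities in the output can be cross-checked against each other. The sketch below recomputes the t-ratios, F, s, R-sq and R-sq(adj) from the printed coefficients and sums of squares; the variable names are illustrative.

```python
import math

# predictor: (Coef, Stdev) as printed in the regression output
coefficients = {
    "Constant": (37.68, 18.57),
    "TAX": (-3.478, 1.298),
    "DLIC": (1.3366, 0.1924),
    "INC": (-6.651, 1.723),
    "ROAD": (-0.2417, 0.3391),
}
ss_regression, df_regression = 3991.92, 4
ss_error, df_error = 1892.05, 43
ss_total, df_total = 5883.96, 47

# t-ratio = coefficient / standard deviation of the coefficient
t_ratios = {name: coef / sd for name, (coef, sd) in coefficients.items()}

ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error
f_ratio = ms_regression / ms_error
s = math.sqrt(ms_error)                                   # residual std. dev.
r_sq = ss_regression / ss_total
r_sq_adj = 1 - (ss_error / df_error) / (ss_total / df_total)

for name, t in t_ratios.items():
    print(f"{name:9s} t-ratio = {t:6.2f}")
print(f"F = {f_ratio:.2f}, s = {s:.3f}")
print(f"R-sq = {100 * r_sq:.1f}%, R-sq(adj) = {100 * r_sq_adj:.1f}%")
```

R-sq(adj) penalises the number of predictors by comparing mean squares rather than sums of squares, which is why it is lower than R-sq.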