ANALYSIS OF VARIANCE (ANOVA) INTRODUCTION
The test of significance based on t-distribution is an adequate procedure only for testing the significance of the difference between two sample means. In a situation when we have three or more samples to consider at a time an alternative procedure is needed for testing the hypothesis that all the samples are drawn from the same population I.e., the means are equal.
For example, three types of fertilisers are applied to five plots each and their yields on each of the plot is given as follows
plots Yield of wheat in tons fertiliser
A
Fertiliser B
fertiliser C
1 20 18 25
2 21 20 25
3 23 17 25
4 16 15 25
5 20 25 25
Mean 100/5=20 95/5=19 125/5=25
We have to study if the effect of these fertilisers on the yield is significantly different, or in other words the samples are from the same population. The answer to this is provided by the technique of analysis of variance.
The basic purpose of analysis of variance is to test the homogeneity of several means.
The term ‘Analysis of variance’ was introduced by Prof. R.A. Fisher in 1920’s.
Variation is inherent in nature,
The total variation in any set of numerical data is due to a number of causes which may be calculated as i) assignable causes and ii) chance causes
The variation due to assignable causes can be detected and measured whereas the chance causes is beyond the control of human and cannot be traced.
Examples of assignable causes of variation
Inappropriate procedures, substandard raw materials, measurement errors, temperature etc.,
DEFINITION:
According to Prof. R.A. Fisher, Analysis of Variance (ANOVA) is the “Separation of variance ascribable to one group of causes from the variance ascribable to other group”
The ANOVA consists in the estimation of the amount of variation due to each of the independent factors (causes) separately and then comparing these estimates due to assignable factors(causes), with the estimate due to chance factor (causes). The later being known as experimental error.
ASSUMPTIONS FOR ANOVA TEST
ANOVA test is based on the test statistics F (variance Ratio)
For the validity of the F-test in ANOVA, the following assumptions are made:
i) The observations are independent,
ii) Parent population from which observations are taken is normal, and iii) Various treatment and environmental effects are additive in nature.
IMPORTANCE:
ANOVA technique enables us to compare several population means simultaneously and thus results in lot of savings in time and money
The origin of ANOVA technique lies in agricultural experiments but it finds its applications in almost all types of design of experiments in various diverse fields such as in industry, education, psychology, business etc.,
The ANOVA technique is not designed to test the equality of several population variances. It’s objective is to test the equality of several population means or the homogeneity of several independent sample means.
In addition to testing homogeneity of several sample means, the ANOVA technique is now frequently applied in testing the linearity of the fitted regression line or the significance of the correlation ratio.
MATHEMATICAL MODEL
FIXED EFFECT MODEL AND RANDOM EFFECT MODELS
A fixed effects model is a statistical model in which the model parameters are fixed or non-random quantities.
In random effects model in which all or some of the model parameters are random variables.
Fixed effects model:
Suppose the k-levels of the factor (treatments) under consideration are the only levels of interest and all these are included in the experiment by the investigator or out of a large number of classes, the k classes (treatments) in the model have been specifically chosen by the experimenter. In such a case αi’s the effect of the ith treatment [ αi = (µi - µ)] are fixed constants (unknown) and the model is a fixed effect model.
In the fixed effect model, the conclusions about the test of hypothesis regarding the parameters αi’s will apply only to k-treatments (factor levels) considered in the experiment.
These conclusions cannot be extended to other remaining treatments (factors) which are not considered in the experiment.
Random effects model:
Suppose we have a large number of classes (treatments) and we want to test, through an experiment if all these class effects are equal or not. Due to consideration of time, money or administrative convenience it may not be possible to include all the factor levels in the experiment. In such a situation, we take only a random sample of factor levels in the experiment and after studying and analysing the sample data, we draw conclusions which would be valid for all the factor levels whether included in the experiment or not. In such a situation the parameters αi’s in the model will not be fixed constants but will be random samples and the model is random effect model.
In the random effect model if the null hypothesis of the homogeneity of class (treatment) effects is rejected, then to test to test the difference between two class(treatments) effects we cannot apply the t-test because all treatments are not included in the experiment.
ANOVA FOR FIXED EFFECT MODEL The fixed effect or parametric model used is y
ij= µ
i+ ε
ij=
µ + α
i+ ε
ij(i=1,2…k; j=1,2…n
i) where α
i =µ
i -µ
wherey
ijis the yield from the ith row and jth column
µ is the general mean effect given by µ = ∑
𝑘𝑖=1𝑛iµ𝑖/𝑁
µ
iis the fixed effect due to ith treatment
α
iis the effect of the ith treatment given by
α
i =µ
i -µ
ε
ijis the error effect due to chance ASSUMPTIONS IN THE MODEL
i) all the observations
(y
ij’s )are independent and y
ij ~N(µ
i,σ
e2)
ii) different effects are additive in nature
iii) ε
ijare i.i.d., N(0, σ
e2)
under the third assumption, the model becomes
E( y
ij) = µ
i= µ + α
i(i=1,2…k; j=1,2…n
i) STATISITICAL ANALYSIS OF THE MODEL
Null hypothesis: we have to test the equality of the population means
.Hence the null hypothesis is given by
H
0= µ
1= µ
2= … = µ
k= µ
which reduces to H
0= α
1= α
2= … = α
k= 0 since α
i =µ
i -µ Alternate hypothesis:
H
1: atleast two of the means µ
1, µ
2, … µ
kare different.
let us write
t h a t n=h e . The total numbes ch tels s h =hk
Dn the oD way classifi Cauon t e Values
the1esponse
Vauiables
a eatfeeted by ioo aetoxs. acbors
For example a yceld milk may be atected bi
adfter.ences
entceatnmuit, (i.e) Tatuons
a shell
a s t adifferences en Vauey,
Cie)bued and stok t t e
CoiDs.J-et us
nouoauppose
thal t e n Cows a r eolivicled
2uto ndiffereut
o u p s
9-coups
oT c l a s s e sacoroug to
theibreed
a n d i t o d e s each7coups Containirg CouDs
and then Leb u s Consider tte
bect
eK treat
muutCje) T a h o n s
iven
abrancdom to
Cosen e c h
geoupP
on thayield milk
Let
V Yaeld malk rom The (oo jh breed
07
StoCk
Jed
on the ratioo iLeb
iv2,
l , 2 , . hVarieties of Couws
Row
TobalsRow
MeansCoUMN
MCANnsreatments
CRations 2 j h Vu)()JA (F va)le
1 Ti.
y.2
2 T.
9i.
J i
9inen
k .Column
G 9 . -2 . T n
otal
Two wAy ANDVA for fixed Fffect Mudel kt matbemalical malel i s íven
H + ( - H)+ (Hj -N)+ (PU -Hi.- Mj tH) +E)
Where = o = P
J-
- o -o
Qhd
t )
n t u ' s case t a enbevacton h e c t j = o and the
mode educes to
ALlkart
liw . 'S
a v ediffe rout
Hi2 AEuast wd M;S
a r edifferout
e a s t
adquare Estimates h parameters
Te
Lneas moclel isgiven by
y applying
thepuinupal q
luact
dquae
weqeb
2
E j
2
=FF (9j
-H - i-Pi)
The
normalequabon for estimaAig
= o
& I I (y5
- H -i - P )
dP
dE
ddi
OE=0-a (9j -H- i -B)
d P
d u n e ,
I Ai =o= I B
, Weget um th
akoveequaluonS
Wherve = M
j )
=H; - 0
Stat istical Analysis q a ized ect madal
Nul Nypo thesis CH1
h e t o atmuts ane tu vau.eties G lemen enius
Ho) h treat mut aTe uemopeueous Ho2 The vaudey are emogent ous
e ) , Ho) i. = Pa.
= .=
Ho . ) = M.2
=ALtesnateve Hy pothesis
Ht AE luas luwo h the Mi.
s a r ediffer eut HU At least lwo the H.j's are differ.cut
Ov theiy equaleuu
9-3
P y -
j-9
Tus tu inear model ewmes
9y 9..
+(9i-3)+ (9,j -..) + (9j -9. -.g +9.)
outiboning t e dum dquaeS,
(95)= FF (5-5.-9,+3.)+ (31. -5.)+|
(93-9
2
F (93 -3i.
-9. 3..)+ 5 5 (5, - +
2
9 - 3 . i j - 9 -Jj73.) + 4FE (Sj -y-)
(-9,-9j+9..)t 25 s (Y. -y.)(93-9.
Now
(9-9(y-9-35.)= 7|(-5)T(3--5,) i3 F(S-r (o-)-z(5-3}]=o
(9j-3..)=hx (5. -J.. kE (T;-.+F (0j-535
S-S Sv 1
Where
2
3,
=I (U;-y..) és te total
S.s2
Ot h ( . -7.Sis
S . sdue to treatmat
y (9 -9..) s tha S.s duu to
vauisties,and,
SE
2
T
T(Uj-y;, - y.; +9..)
sthe
e n o r oTJeseduals
deqrees of fcudom fo vauúous
S.sOTheing Cemputed in N=hk quautibies (yj-9.) Which are dukject to one inear toncbsaut F (U -9..)=o
Aall Cay CN-1)
S, Ch-1) ,
-..)
=oS (k-1) (j -9.) =o
OE C x - 1 ) - (z-1) - Ch-1) = (h-1)(k-)
Tus t e pootitioneng a f i s
a sfollows
hCk-) Ck-1)+ Ch-i)+ Ch-1) (v-1)
Which emplies that the d.f are addi tive.
Test dal istice
Mean S.s due to
Ereakmt& =
k-)
Mean S-s oduu t Yauiekies, = = Sy
h-)
Mean SS SE s
(h-1)Cx-)
Emy
Over J fom 1 to h and dividig by h, e feb
hua heti+B+ ]>
9 = P t .
Oves i fm 1 t and dividim by k ard using z; =o
we 3eb,
Oves and 3 oth and diveding by hk and usng
We dhall geb
9.. t E.
ANDVA TABIE foR TWO- WAY
CASSDF2ED
Sources
Ss d4
M-S.SVauance Patio CF)
Vaoioluen
SsA =M.3. S.A M.s.sF
Facbor A M.SSA -FCy-1C-1Xh-
s sA Ck-)-) CRDWS)
SsB S.s -M-S-sB Fp
Pactor B M-S S-B - F(h-1), (E-1)
Ss-B (h-1) h-)
C CojUMNs)
M-S. S E(h-)
(k-1) S.s-E =M-SSE
SSE (h-1) -1D Ch-1)
EroT
Tobal 9ss hie -)
Concuson
k-D.(h-1) Ck-1) We actapt o
otorise,
Fh-1), Ch-s) Ck-1), te actept Me2 bitoruse
ieject H»a.