Analysis of Variance (Anova) Introduction


The test of significance based on the t-distribution is an adequate procedure only for testing the significance of the difference between two sample means. When we have three or more samples to consider at a time, an alternative procedure is needed for testing the hypothesis that all the samples are drawn from the same population, i.e., that their means are equal.

For example, three types of fertiliser are each applied to five plots, and the yield of wheat (in tons) on each plot is as follows:

| Plot | Fertiliser A | Fertiliser B | Fertiliser C |
|------|--------------|--------------|--------------|
| 1    | 20           | 18           | 25           |
| 2    | 21           | 20           | 25           |
| 3    | 23           | 17           | 25           |
| 4    | 16           | 15           | 25           |
| 5    | 20           | 25           | 25           |
| Mean | 100/5 = 20   | 95/5 = 19    | 125/5 = 25   |

We have to study whether the effects of these fertilisers on the yield are significantly different or, in other words, whether the samples are from the same population. The answer is provided by the technique of analysis of variance.
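As a rough illustration of what the technique computes for the fertiliser data, the sketch below forms the usual one-way variance ratio F (between-fertiliser mean square divided by within-fertiliser mean square). The function name `one_way_anova` and the dictionary layout are illustrative choices, not part of these notes, and the F ratio itself is the standard textbook form rather than something derived at this point in the text.

```python
# Yields (tons) from the fertiliser example: five plots per fertiliser.
yields = {
    "A": [20, 21, 23, 16, 20],
    "B": [18, 20, 17, 15, 25],
    "C": [25, 25, 25, 25, 25],
}


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a dict of samples."""
    all_values = [y for sample in groups.values() for y in sample]
    n_total = len(all_values)
    grand_mean = sum(all_values) / n_total

    ss_between = 0.0  # variation of the group means about the grand mean
    ss_within = 0.0   # variation of observations about their own group mean
    for sample in groups.values():
        mean = sum(sample) / len(sample)
        ss_between += len(sample) * (mean - grand_mean) ** 2
        ss_within += sum((y - mean) ** 2 for y in sample)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


f_ratio, df_between, df_within = one_way_anova(yields)
print(round(f_ratio, 2), df_between, df_within)  # 7.38 2 12
```

The computed ratio of about 7.38 would then be compared with the tabulated F value for (2, 12) degrees of freedom (about 3.89 at the 5% level in standard F tables); since it is larger, the data suggest that the three fertilisers do not all have the same mean yield.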

The basic purpose of analysis of variance is to test the homogeneity of several means.

The term ‘Analysis of Variance’ was introduced by Prof. R.A. Fisher in the 1920s.

Variation is inherent in nature. The total variation in any set of numerical data is due to a number of causes, which may be classified as i) assignable causes and ii) chance causes.

The variation due to assignable causes can be detected and measured, whereas the variation due to chance causes is beyond human control and cannot be traced.

Examples of assignable causes of variation:

Inappropriate procedures, substandard raw materials, measurement errors, temperature, etc.


DEFINITION:

According to Prof. R.A. Fisher, Analysis of Variance (ANOVA) is the “Separation of variance ascribable to one group of causes from the variance ascribable to other group”

ANOVA consists in estimating the amount of variation due to each of the independent factors (causes) separately and then comparing the estimates due to assignable factors (causes) with the estimate due to the chance factor (causes), the latter being known as experimental error.

ASSUMPTIONS FOR ANOVA TEST

The ANOVA test is based on the test statistic F (variance ratio).

For the validity of the F-test in ANOVA, the following assumptions are made:

i) The observations are independent,

ii) Parent population from which observations are taken is normal, and

iii) Various treatment and environmental effects are additive in nature.

IMPORTANCE:

The ANOVA technique enables us to compare several population means simultaneously and thus saves considerable time and money.

The origin of the ANOVA technique lies in agricultural experiments, but it finds application in almost all types of experimental design in diverse fields such as industry, education, psychology and business.

The ANOVA technique is not designed to test the equality of several population variances. Its objective is to test the equality of several population means, or the homogeneity of several independent sample means.

In addition to testing homogeneity of several sample means, the ANOVA technique is now frequently applied in testing the linearity of the fitted regression line or the significance of the correlation ratio.

MATHEMATICAL MODEL

FIXED EFFECT MODEL AND RANDOM EFFECT MODELS

A fixed effects model is a statistical model in which the model parameters are fixed or non-random quantities.

A random effects model is one in which all or some of the model parameters are random variables.


Fixed effects model:

Suppose the k levels of the factor (treatments) under consideration are the only levels of interest and all of them are included in the experiment by the investigator, or, out of a large number of classes, the k classes (treatments) in the model have been specifically chosen by the experimenter. In such a case the αi’s, the effects of the ith treatments [αi = (µi - µ)], are fixed (unknown) constants and the model is a fixed effect model.

In the fixed effect model, the conclusions about the test of hypothesis regarding the parameters αi’s will apply only to k-treatments (factor levels) considered in the experiment.

These conclusions cannot be extended to other remaining treatments (factors) which are not considered in the experiment.

Random effects model:

Suppose we have a large number of classes (treatments) and we want to test, through an experiment, whether all these class effects are equal or not. Due to considerations of time, money or administrative convenience, it may not be possible to include all the factor levels in the experiment. In such a situation, we take only a random sample of factor levels in the experiment and, after studying and analysing the sample data, we draw conclusions which would be valid for all the factor levels, whether included in the experiment or not. In such a situation the parameters αi’s in the model will not be fixed constants but will be random variables, and the model is a random effect model.

In the random effect model, if the null hypothesis of the homogeneity of class (treatment) effects is rejected, then to test the difference between two class (treatment) effects we cannot apply the t-test, because not all treatments are included in the experiment.


ANOVA FOR FIXED EFFECT MODEL

The fixed effect or parametric model used is

$$y_{ij} = \mu_i + \varepsilon_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \qquad (i = 1, 2, \ldots, k;\ j = 1, 2, \ldots, n_i)$$

where $\alpha_i = \mu_i - \mu$ and

$y_{ij}$ is the yield from the ith row and jth column,

$\mu$ is the general mean effect given by $\mu = \sum_{i=1}^{k} n_i \mu_i / N$, with $N = \sum_{i=1}^{k} n_i$ the total number of observations,

$\mu_i$ is the fixed effect due to the ith treatment,

$\alpha_i$ is the effect of the ith treatment given by $\alpha_i = \mu_i - \mu$,

$\varepsilon_{ij}$ is the error effect due to chance.

ASSUMPTIONS IN THE MODEL

i) All the observations ($y_{ij}$'s) are independent and $y_{ij} \sim N(\mu_i, \sigma_e^2)$.

ii) The different effects are additive in nature.

iii) The $\varepsilon_{ij}$ are i.i.d. $N(0, \sigma_e^2)$.

Under the third assumption, the model becomes

$$E(y_{ij}) = \mu_i = \mu + \alpha_i, \qquad (i = 1, 2, \ldots, k;\ j = 1, 2, \ldots, n_i)$$

STATISTICAL ANALYSIS OF THE MODEL

Null hypothesis: We have to test the equality of the population means. Hence the null hypothesis is given by

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k = \mu$$

which reduces to

$$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$$

since $\alpha_i = \mu_i - \mu$.

Alternative hypothesis:

$H_1$: at least two of the means $\mu_1, \mu_2, \ldots, \mu_k$ are different.

let us write


… that N = hk. The total number of cells is hk.

In the two-way classification, the values of the response variable are affected by two factors. For example, the yield of milk may be affected by differences in treatments, i.e., rations, as well as by differences in variety, i.e., breed and stock of the cows.

Let us now suppose that the N cows are divided into h different groups or classes according to their breed and stock, each group containing k cows, and then let us consider the effect of k treatments (i.e., rations given at random to the cows in each group) on the yield of milk.

Let $y_{ij}$ = yield of milk from the cow of the jth breed or stock fed on the ration i; i = 1, 2, …, k; j = 1, 2, …, h.

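| Treatments (rations) | Varieties of cows: 1 | 2 | … | j | … | h | Row totals | Row means |
|---|---|---|---|---|---|---|---|---|
| 1 | $y_{11}$ | $y_{12}$ | … | $y_{1j}$ | … | $y_{1h}$ | $T_{1.}$ | $\bar{y}_{1.}$ |
| 2 | $y_{21}$ | $y_{22}$ | … | $y_{2j}$ | … | $y_{2h}$ | $T_{2.}$ | $\bar{y}_{2.}$ |
| … | … | … | … | … | … | … | … | … |
| i | $y_{i1}$ | $y_{i2}$ | … | $y_{ij}$ | … | $y_{ih}$ | $T_{i.}$ | $\bar{y}_{i.}$ |
| … | … | … | … | … | … | … | … | … |
| k | $y_{k1}$ | $y_{k2}$ | … | $y_{kj}$ | … | $y_{kh}$ | $T_{k.}$ | $\bar{y}_{k.}$ |
| Column totals | $T_{.1}$ | $T_{.2}$ | … | $T_{.j}$ | … | $T_{.h}$ | $G$ | |
| Column means | $\bar{y}_{.1}$ | $\bar{y}_{.2}$ | … | $\bar{y}_{.j}$ | … | $\bar{y}_{.h}$ | | $\bar{y}_{..}$ |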
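To make the layout of row totals, column totals and means concrete, here is a small sketch on made-up data. The yields in `y` are hypothetical numbers chosen only for illustration (k = 3 rations as rows, h = 4 breeds as columns); the variable names are likewise illustrative.

```python
# Hypothetical milk yields: rows = k rations (treatments), columns = h breeds.
y = [
    [10, 12, 11, 15],
    [14, 15, 13, 18],
    [9, 12, 9, 12],
]
k, h = len(y), len(y[0])

row_totals = [sum(row) for row in y]                              # T_i.
col_totals = [sum(y[i][j] for i in range(k)) for j in range(h)]   # T_.j
grand_total = sum(row_totals)                                     # G

row_means = [t / h for t in row_totals]   # ybar_i. (each row has h entries)
col_means = [t / k for t in col_totals]   # ybar_.j (each column has k entries)
grand_mean = grand_total / (h * k)        # ybar_..

print(row_totals, col_totals, grand_total)
# [48, 60, 42] [33, 39, 33, 45] 150
```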

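Two-way ANOVA for fixed effect model

The mathematical model is given by

$$y_{ij} = \mu_{ij} + \varepsilon_{ij} = \mu + (\mu_{i.} - \mu) + (\mu_{.j} - \mu) + (\mu_{ij} - \mu_{i.} - \mu_{.j} + \mu) + \varepsilon_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ij}$$

where $\alpha_i = \mu_{i.} - \mu$, $\beta_j = \mu_{.j} - \mu$, $\gamma_{ij} = \mu_{ij} - \mu_{i.} - \mu_{.j} + \mu$, and

$$\sum_{i} \alpha_i = 0 = \sum_{j} \beta_j, \qquad \sum_{i} \gamma_{ij} = 0 = \sum_{j} \gamma_{ij}$$

In this case the interaction effect $\gamma_{ij} = 0$ and the model reduces to

$$y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$$

The alternative hypotheses are:

$H_{11}$: At least two of the $\mu_{i.}$'s are different.

$H_{12}$: At least two of the $\mu_{.j}$'s are different.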

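Least square estimates of parameters

The linear model is given by

$$y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$$

By applying the principle of least squares, we get

$$E = \sum_{i}\sum_{j} \varepsilon_{ij}^2 = \sum_{i}\sum_{j} (y_{ij} - \mu - \alpha_i - \beta_j)^2$$

The normal equations for estimating $\mu$, $\alpha_i$ and $\beta_j$ are

$$\frac{\partial E}{\partial \mu} = 0 = -2 \sum_{i}\sum_{j} (y_{ij} - \mu - \alpha_i - \beta_j)$$

$$\frac{\partial E}{\partial \alpha_i} = 0 = -2 \sum_{j} (y_{ij} - \mu - \alpha_i - \beta_j)$$

$$\frac{\partial E}{\partial \beta_j} = 0 = -2 \sum_{i} (y_{ij} - \mu - \alpha_i - \beta_j)$$

Since $\sum_i \alpha_i = 0 = \sum_j \beta_j$, we get from the above equations

$$\hat{\mu} = \bar{y}_{..}, \qquad \hat{\alpha}_i = \bar{y}_{i.} - \bar{y}_{..}, \qquad \hat{\beta}_j = \bar{y}_{.j} - \bar{y}_{..}$$

where $\bar{y}_{i.} = \frac{1}{h}\sum_j y_{ij}$, $\bar{y}_{.j} = \frac{1}{k}\sum_i y_{ij}$ and $\bar{y}_{..} = \frac{1}{hk}\sum_i\sum_j y_{ij}$.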
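The least square estimates can be checked numerically. The sketch below uses the same kind of made-up data (hypothetical yields, k = 3 rations by h = 4 breeds) and confirms that the estimates satisfy the side conditions and that the residuals satisfy the normal equations; all names are illustrative.

```python
# Hypothetical milk yields: rows = k rations, columns = h breeds.
y = [
    [10, 12, 11, 15],
    [14, 15, 13, 18],
    [9, 12, 9, 12],
]
k, h = len(y), len(y[0])

grand_mean = sum(map(sum, y)) / (h * k)
row_means = [sum(row) / h for row in y]
col_means = [sum(y[i][j] for i in range(k)) / k for j in range(h)]

# Least square estimates under sum(alpha) = 0 = sum(beta).
mu_hat = grand_mean
alpha_hat = [m - grand_mean for m in row_means]
beta_hat = [m - grand_mean for m in col_means]

# Residuals of the fitted additive model.
resid = [
    [y[i][j] - mu_hat - alpha_hat[i] - beta_hat[j] for j in range(h)]
    for i in range(k)
]

# Normal equations: residuals sum to zero within every row and every column.
row_resid_sums = [sum(r) for r in resid]
col_resid_sums = [sum(resid[i][j] for i in range(k)) for j in range(h)]

print(mu_hat, alpha_hat, beta_hat)
# 12.5 [-0.5, 2.5, -2.0] [-1.5, 0.5, -1.5, 2.5]
```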

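Statistical analysis of a fixed effect model

Null hypothesis ($H_0$): We want to test that the treatments as well as the varieties are homogeneous.

$H_{01}$: The treatments are homogeneous.

$H_{02}$: The varieties are homogeneous.

i.e.,

$$H_{01}: \mu_{1.} = \mu_{2.} = \cdots = \mu_{k.} = \mu$$

$$H_{02}: \mu_{.1} = \mu_{.2} = \cdots = \mu_{.h} = \mu$$

Alternative hypothesis:

$H_{11}$: At least two of the $\mu_{i.}$'s are different.

$H_{12}$: At least two of the $\mu_{.j}$'s are different.

Or their equivalents in terms of the effects, i.e., $H_{01}: \alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$ and $H_{02}: \beta_1 = \beta_2 = \cdots = \beta_h = 0$.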

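Substituting the least square estimates

$$\hat{\mu} = \bar{y}_{..}, \qquad \hat{\alpha}_i = \bar{y}_{i.} - \bar{y}_{..}, \qquad \hat{\beta}_j = \bar{y}_{.j} - \bar{y}_{..}$$

Thus the linear model becomes

$$y_{ij} = \bar{y}_{..} + (\bar{y}_{i.} - \bar{y}_{..}) + (\bar{y}_{.j} - \bar{y}_{..}) + (y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..})$$

Partitioning the sum of squares,

$$\sum_i\sum_j (y_{ij} - \bar{y}_{..})^2 = \sum_i\sum_j \left[(\bar{y}_{i.} - \bar{y}_{..}) + (\bar{y}_{.j} - \bar{y}_{..}) + (y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..})\right]^2$$

$$= \sum_i\sum_j (\bar{y}_{i.} - \bar{y}_{..})^2 + \sum_i\sum_j (\bar{y}_{.j} - \bar{y}_{..})^2 + \sum_i\sum_j (y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..})^2$$
$$\quad + 2\sum_i\sum_j (\bar{y}_{i.} - \bar{y}_{..})(\bar{y}_{.j} - \bar{y}_{..}) + 2\sum_i\sum_j (\bar{y}_{i.} - \bar{y}_{..})(y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..}) + 2\sum_i\sum_j (\bar{y}_{.j} - \bar{y}_{..})(y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..})$$

Now

$$\sum_i\sum_j (\bar{y}_{i.} - \bar{y}_{..})(y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..}) = \sum_i \left[(\bar{y}_{i.} - \bar{y}_{..}) \left\{ \sum_j (y_{ij} - \bar{y}_{i.}) - \sum_j (\bar{y}_{.j} - \bar{y}_{..}) \right\}\right] = 0$$

since the algebraic sum of the deviations of a set of values from their mean is zero. Similarly, the other product terms vanish. Hence

$$\sum_i\sum_j (y_{ij} - \bar{y}_{..})^2 = h \sum_i (\bar{y}_{i.} - \bar{y}_{..})^2 + k \sum_j (\bar{y}_{.j} - \bar{y}_{..})^2 + \sum_i\sum_j (y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..})^2$$

i.e.,

$$S_T^2 = S_t^2 + S_v^2 + S_E^2$$

where

$S_T^2 = \sum_i\sum_j (y_{ij} - \bar{y}_{..})^2$ is the total S.S.,

$S_t^2 = h \sum_i (\bar{y}_{i.} - \bar{y}_{..})^2$ is the S.S. due to treatments,

$S_v^2 = k \sum_j (\bar{y}_{.j} - \bar{y}_{..})^2$ is the S.S. due to varieties, and

$S_E^2 = \sum_i\sum_j (y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..})^2$ is the error or residual S.S.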
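The additivity of the sums of squares can be verified on a small example. The following sketch computes the four sums of squares directly from their definitions for made-up data (hypothetical yields, k = 3 rations by h = 4 breeds) and checks that the total equals the sum of the three components; the variable names are illustrative.

```python
# Hypothetical milk yields: rows = k rations, columns = h breeds.
y = [
    [10, 12, 11, 15],
    [14, 15, 13, 18],
    [9, 12, 9, 12],
]
k, h = len(y), len(y[0])

gm = sum(map(sum, y)) / (h * k)                                   # ybar_..
row_means = [sum(row) / h for row in y]                           # ybar_i.
col_means = [sum(y[i][j] for i in range(k)) / k for j in range(h)]  # ybar_.j

# Each sum of squares computed from its own definition.
s_total = sum((y[i][j] - gm) ** 2 for i in range(k) for j in range(h))
s_treat = h * sum((m - gm) ** 2 for m in row_means)
s_var = k * sum((m - gm) ** 2 for m in col_means)
s_err = sum(
    (y[i][j] - row_means[i] - col_means[j] + gm) ** 2
    for i in range(k)
    for j in range(h)
)

print(s_total, s_treat, s_var, s_err)  # 79.0 42.0 33.0 4.0
```

Here 79 = 42 + 33 + 4, so the cross-product terms have indeed dropped out for this data set.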

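Degrees of freedom for various S.S.

$S_T^2$, the total S.S., being computed from N = hk quantities $(y_{ij} - \bar{y}_{..})$ which are subject to one linear constraint $\sum_i\sum_j (y_{ij} - \bar{y}_{..}) = 0$, will carry (N - 1) d.f.

Similarly, $S_t^2$ will carry (k - 1) d.f., since $\sum_i (\bar{y}_{i.} - \bar{y}_{..}) = 0$, and $S_v^2$ will carry (h - 1) d.f., since $\sum_j (\bar{y}_{.j} - \bar{y}_{..}) = 0$.

$S_E^2$ will carry (N - 1) - (k - 1) - (h - 1) = (h - 1)(k - 1) d.f.

Thus the partitioning of the d.f. is as follows:

$$hk - 1 = (k - 1) + (h - 1) + (h - 1)(k - 1)$$

which implies that the d.f. are additive.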

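Test statistic

Mean S.S. due to treatments $= \dfrac{S_t^2}{k - 1} = s_t^2$

Mean S.S. due to varieties $= \dfrac{S_v^2}{h - 1} = s_v^2$

Error mean S.S. $= \dfrac{S_E^2}{(h - 1)(k - 1)} = s_E^2$

Summing the model $y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$ over j from 1 to h and dividing by h, and using $\sum_j \beta_j = 0$, we get

$$\bar{y}_{i.} = \mu + \alpha_i + \bar{\varepsilon}_{i.}$$

Summing over i from 1 to k and dividing by k, and using $\sum_i \alpha_i = 0$, we get

$$\bar{y}_{.j} = \mu + \beta_j + \bar{\varepsilon}_{.j}$$

Summing over i and j both and dividing by hk, and using $\sum_i \alpha_i = 0 = \sum_j \beta_j$, we shall get

$$\bar{y}_{..} = \mu + \bar{\varepsilon}_{..}$$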

ANOVA TABLE FOR TWO-WAY CLASSIFIED DATA

| Sources of variation | S.S. | d.f. | M.S.S. | Variance ratio (F) |
|---|---|---|---|---|
| Factor A (rows, treatments) | SSA | k - 1 | MSSA = SSA/(k - 1) | $F_A$ = MSSA/MSSE ~ F[(k - 1), (k - 1)(h - 1)] |
| Factor B (columns, varieties) | SSB | h - 1 | MSSB = SSB/(h - 1) | $F_B$ = MSSB/MSSE ~ F[(h - 1), (k - 1)(h - 1)] |
| Error | SSE | (h - 1)(k - 1) | MSSE = SSE/[(h - 1)(k - 1)] | |
| Total | TSS | hk - 1 | | |

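Conclusion

If $F_A < F_{(k-1),\,(h-1)(k-1)}$ (the tabulated value at the chosen level of significance), we accept $H_{01}$; otherwise, we reject $H_{01}$.

If $F_B < F_{(h-1),\,(h-1)(k-1)}$, we accept $H_{02}$; otherwise, we reject $H_{02}$.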
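The whole two-way procedure (sums of squares, degrees of freedom, mean squares, F ratios and the decision rule) can be sketched as one short function. This is an illustrative sketch, not a reference implementation: the function name `two_way_anova`, the dictionary keys and the data are all hypothetical, and the critical values 5.14 and 4.76 are the upper 5% points of F(2, 6) and F(3, 6) read from standard F tables.

```python
def two_way_anova(y):
    """Two-way ANOVA, one observation per cell.

    Rows of y are the k treatments (factor A); columns are the h varieties (factor B).
    """
    k, h = len(y), len(y[0])
    gm = sum(map(sum, y)) / (h * k)
    row_means = [sum(row) / h for row in y]
    col_means = [sum(y[i][j] for i in range(k)) / k for j in range(h)]

    ss_a = h * sum((m - gm) ** 2 for m in row_means)        # treatments
    ss_b = k * sum((m - gm) ** 2 for m in col_means)        # varieties
    tss = sum((y[i][j] - gm) ** 2 for i in range(k) for j in range(h))
    ss_e = tss - ss_a - ss_b                                # error, by additivity

    df_a, df_b = k - 1, h - 1
    df_e = df_a * df_b
    ms_e = ss_e / df_e
    return {
        "F_A": (ss_a / df_a) / ms_e,
        "F_B": (ss_b / df_b) / ms_e,
        "df": (df_a, df_b, df_e),
    }


# Hypothetical milk yields: 3 rations (rows) by 4 breeds (columns).
y = [
    [10, 12, 11, 15],
    [14, 15, 13, 18],
    [9, 12, 9, 12],
]
table = two_way_anova(y)

# Tabulated 5% critical values for F(2, 6) and F(3, 6), taken from F tables.
reject_h01 = table["F_A"] > 5.14
reject_h02 = table["F_B"] > 4.76

print(round(table["F_A"], 2), round(table["F_B"], 2), table["df"])
# 31.5 16.5 (2, 3, 6)
```

For this made-up data both variance ratios exceed their tabulated values, so both hypotheses of homogeneity would be rejected at the 5% level.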