Designs with One Source of Variation
3.5 One-Way Analysis of Variance
Fig. 3.2 Residuals under the full and reduced models when $H_0$ is false. [Figure: two panels of residuals $\hat{e}_{it}$ plotted against treatment $i$; the left panel, "Residuals; full model," shows deviations from the treatment averages $\overline{y}_{i.}$; the right panel, "Residuals; reduced model," shows deviations from the grand average $\overline{y}_{..}$.]
$$Y_{it} = \mu + \tau + \epsilon^0_{it}, \qquad \epsilon^0_{it} \sim N(0, \sigma^2), \qquad \epsilon^0_{it}\text{'s are mutually independent},$$
$$t = 1, \ldots, r_i, \qquad i = 1, \ldots, v,$$

where we write $\epsilon^0_{it}$ for the $(it)$th error variable in the reduced model. To calculate the sum of squares for error, $ssE_0$, we need to determine the value of $\mu + \tau$ that minimizes the sum of squared errors

$$\sum_i \sum_t (y_{it} - \mu - \tau)^2.$$
Using calculus, the reader is asked to show in Exercise 7 that the unique least squares estimate of $\mu + \tau$ is the sample mean of all the observations; that is, $\hat{\mu} + \hat{\tau} = \overline{y}_{..}$. Therefore, the error sum of squares for the reduced model is
$$ssE_0 = \sum_i \sum_t (y_{it} - \overline{y}_{..})^2 = \sum_i \sum_t y_{it}^2 - n\,\overline{y}_{..}^2. \qquad (3.5.10)$$
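Both claims above are easy to verify numerically: that the grand mean minimizes the sum of squared errors, and that the two forms of $ssE_0$ in (3.5.10) agree. A minimal Python sketch, using hypothetical data values:

```python
# Hypothetical observations y_it, pooled into one list since the reduced
# model fits a single constant mu + tau to every observation.
ys = [7.2, 8.1, 6.9, 5.5, 6.0, 5.8, 6.3, 9.1, 8.7]
n = len(ys)
ybar = sum(ys) / n  # the grand mean, y-bar..

def sse0(c):
    # sum of squared errors when mu + tau is estimated by the constant c
    return sum((y - c) ** 2 for y in ys)

# Perturbing c away from the grand mean always increases the sum of squares.
for delta in (-0.5, -0.01, 0.01, 0.5):
    assert sse0(ybar + delta) > sse0(ybar)

# The two forms of ssE0 in (3.5.10) agree.
assert abs(sse0(ybar) - (sum(y * y for y in ys) - n * ybar ** 2)) < 1e-9
```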
If the null hypothesis $H_0: \{\tau_1 = \tau_2 = \cdots = \tau_v\}$ is false, and the treatment effects differ, the sum of squares for error $ssE$ under the full model (3.3.1) is considerably smaller than the sum of squares for error $ssE_0$ for the reduced model. This is depicted in Fig. 3.2. On the other hand, if the null hypothesis is true, then $ssE_0$ and $ssE$ will be very similar. The analysis of variance test is based on the difference $ssE_0 - ssE$, relative to the size of $ssE$; that is, the test is based on $(ssE_0 - ssE)/ssE$. We would want to reject $H_0$ if this quantity is large.
We call $ssT = ssE_0 - ssE$ the sum of squares for treatments or the treatment sum of squares, since its value depends on the differences between the treatment effects. Using formulas (3.5.10) and (3.4.5) for $ssE_0$ and $ssE$, the treatment sum of squares is
$$ssT = ssE_0 - ssE \qquad (3.5.11)$$
$$= \left( \sum_i \sum_t y_{it}^2 - n\,\overline{y}_{..}^2 \right) - \left( \sum_i \sum_t y_{it}^2 - \sum_i r_i \overline{y}_{i.}^2 \right)$$
$$= \sum_i r_i \overline{y}_{i.}^2 - n\,\overline{y}_{..}^2. \qquad (3.5.12)$$
An equivalent formulation is
$$ssT = \sum_i r_i (\overline{y}_{i.} - \overline{y}_{..})^2. \qquad (3.5.13)$$
The reader is invited to multiply out the parentheses in (3.5.13) and verify that (3.5.12) is obtained.
There is a shortcut method of expanding (3.5.13) to obtain (3.5.12). First write down each term in $y$ and square it. Then associate with each squared term the signs in (3.5.13). Finally, precede each term with the summations and constant outside the parentheses in (3.5.13). This quick expansion will work for all terms like (3.5.13) in this book. Formula (3.5.13) is probably the easier form of $ssT$ to remember, while (3.5.12) is easier to manipulate for theoretical work and use for computations.
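The equivalence of (3.5.13) and (3.5.12) can also be confirmed numerically. The sketch below, with hypothetical unbalanced data (unequal $r_i$), checks that both formulas give the same $ssT$:

```python
# Hypothetical unbalanced data: treatment label -> observations y_it.
data = {1: [7.2, 8.1, 6.9], 2: [5.5, 6.0, 5.8, 6.3], 3: [9.1, 8.7]}

n = sum(len(obs) for obs in data.values())
grand_mean = sum(sum(obs) for obs in data.values()) / n           # y-bar..
means = {i: sum(obs) / len(obs) for i, obs in data.items()}       # y-bar i.

# ssT via (3.5.13): sum of r_i * (y-bar i. - y-bar..)^2
ssT_3513 = sum(len(obs) * (means[i] - grand_mean) ** 2
               for i, obs in data.items())

# ssT via (3.5.12): sum of r_i * y-bar i.^2  -  n * y-bar..^2
ssT_3512 = (sum(len(obs) * means[i] ** 2 for i, obs in data.items())
            - n * grand_mean ** 2)

assert abs(ssT_3513 - ssT_3512) < 1e-9
```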
Since we will reject $H_0$ if $ssT/ssE$ is large, we need to know what "large" means. This in turn means that we need to know the distribution of the corresponding random variable $SST/SSE$ when $H_0$ is true, where
$$SST = \sum_i r_i (\overline{Y}_{i.} - \overline{Y}_{..})^2 \qquad \text{and} \qquad SSE = \sum_i \sum_t (Y_{it} - \overline{Y}_{i.})^2. \qquad (3.5.14)$$
Now, as mentioned in Sect. 3.4.6, it can be shown that $SSE/\sigma^2$ has a chi-squared distribution with $n-v$ degrees of freedom, denoted by $\chi^2_{n-v}$. Similarly, it can be shown that when $H_0$ is true, $SST/\sigma^2$ has a $\chi^2_{v-1}$ distribution, and that $SST$ and $SSE$ are independent. The ratio of two independent chi-squared random variables, each divided by its degrees of freedom, has an $F$ distribution. Therefore, if $H_0$ is true, we have
$$\frac{SST/\left(\sigma^2 (v-1)\right)}{SSE/\left(\sigma^2 (n-v)\right)} \sim F_{v-1,\,n-v}.$$
We now know the distribution of $SST/SSE$ multiplied by the constant $(n-v)/(v-1)$, and we want to reject the null hypothesis $H_0: \{\tau_1 = \cdots = \tau_v\}$ in favor of the alternative hypothesis $H_A$: {at least two of the treatment effects differ} if this ratio is large. Thus, if we write $msT = ssT/(v-1)$ and $msE = ssE/(n-v)$, where $ssT$ and $ssE$ are the observed values of the treatment sum of squares and error sum of squares, respectively, our decision rule is to
$$\text{reject } H_0 \text{ if } \frac{msT}{msE} > F_{v-1,n-v,\alpha}, \qquad (3.5.15)$$

where $F_{v-1,n-v,\alpha}$ is the critical value from the $F$ distribution with $v-1$ and $n-v$ degrees of freedom with $\alpha$ in the right-hand tail. The probability $\alpha$ is often called the significance level of the test and is the probability of rejecting $H_0$ when in fact it is true (a Type I error). Thus, $\alpha$ should be selected to be small if it is important not to make a Type I error ($\alpha = 0.01$ and $0.001$ are typical choices); otherwise, $\alpha$ can be chosen to be a little larger ($\alpha = 0.10$ and $0.05$ are typical choices). Critical values $F_{v-1,n-v,\alpha}$ for the $F$ distribution are given in Table A.6. Due to lack of space, only a few typical values of $\alpha$ have been tabulated.
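That the rule (3.5.15) rejects a true $H_0$ with probability $\alpha$ can be illustrated by simulation: generate data under $H_0$ repeatedly and count how often $msT/msE$ exceeds the tabulated critical value. A minimal sketch, assuming $v = 4$ treatments with $r_i = 4$ each and the tabulated value $F_{3,12,0.05} \approx 3.49$:

```python
import random

random.seed(42)

def f_ratio(groups):
    # one-way ANOVA ratio msT/msE for a list of treatment groups
    v = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    ssT = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ssE = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    return (ssT / (v - 1)) / (ssE / (n - v))

v, r, sims = 4, 4, 20000
crit = 3.49  # tabulated F_{3,12,0.05}
# Simulate under H0: every observation has the same mean (taken as 0, sd 1).
rejections = sum(
    f_ratio([[random.gauss(0, 1) for _ in range(r)] for _ in range(v)]) > crit
    for _ in range(sims)
)
rate = rejections / sims
print(rate)  # close to alpha = 0.05
```

The empirical rejection rate settles near 0.05, the nominal Type I error probability.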
Table 3.4 One-way analysis of variance table

Source of variation | Degrees of freedom | Sum of squares | Mean square | Ratio | Expected mean square
Treatments | $v-1$ | $ssT$ | $msT = \dfrac{ssT}{v-1}$ | $\dfrac{msT}{msE}$ | $\sigma^2 + Q(\tau_i)$
Error | $n-v$ | $ssE$ | $msE = \dfrac{ssE}{n-v}$ | | $\sigma^2$
Total | $n-1$ | $sstot$ | | |

Computational formulae:
$ssT = \sum_i r_i \overline{y}_{i.}^2 - n\,\overline{y}_{..}^2$, $\quad ssE = \sum_i \sum_t y_{it}^2 - \sum_i r_i \overline{y}_{i.}^2$, $\quad sstot = \sum_i \sum_t y_{it}^2 - n\,\overline{y}_{..}^2$,
$Q(\tau_i) = \sum_i r_i \bigl(\tau_i - \sum_h r_h \tau_h / n\bigr)^2 / (v-1)$
The calculations involved in the test of the hypothesis $H_0$ against $H_A$ are usually written as an analysis of variance table, as shown in Table 3.4. The last line shows the total sum of squares and total degrees of freedom. The total sum of squares, $sstot$, is $(n-1)$ times the sample variance of all of the data values. Thus,
$$sstot = \sum_i \sum_t (y_{it} - \overline{y}_{..})^2 = \sum_i \sum_t y_{it}^2 - n\,\overline{y}_{..}^2. \qquad (3.5.16)$$
From (3.5.10), we see that $sstot$ happens to be equal to $ssE_0$ for the one-way analysis of variance model, and from (3.5.11) we see that

$$sstot = ssT + ssE.$$

Thus, the total sum of squares consists of a part $ssT$ that is explained by differences between the treatment effects and a part $ssE$ that is not explained by any of the parameters in the model.
Example 3.5.1 Battery experiment, continued
Consider the battery experiment introduced in Sect. 2.5.2, p. 24. The sum of squares for error was calculated in Example 3.4.2, p. 40, to be $ssE = 28{,}412.5$. The life per unit cost responses and treatment averages are given in Table 3.3, p. 41. From these, we have $\sum_i \sum_t y_{it}^2 = 6{,}028{,}288$, $\overline{y}_{..} = 590.125$, and $r_i = 4$. Hence, the sums of squares $ssT$ (3.5.12) and $sstot$ (3.5.16) are
$$ssT = \sum_i r_i \overline{y}_{i.}^2 - n\,\overline{y}_{..}^2$$
$$= 4\,(570.75^2 + 860.50^2 + 433.00^2 + 496.25^2) - 16\,(590.125)^2$$
$$= 427{,}915.25,$$
$$sstot = ssE_0 = \sum_i \sum_t y_{it}^2 - n\,\overline{y}_{..}^2$$
$$= 6{,}028{,}288 - 16\,(590.125)^2 = 456{,}327.75,$$

and we can verify that $sstot = ssT + ssE$.
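The arithmetic above can be reproduced directly from the quoted summary statistics; a small Python sketch, with the values taken from Table 3.3 as quoted in the text:

```python
# Summary statistics for the battery experiment (from Table 3.3).
means = [570.75, 860.50, 433.00, 496.25]   # treatment averages y-bar i.
r, v = 4, 4                                # r_i = 4 observations per type
n = r * v                                  # 16 observations in all
grand_mean = 590.125                       # y-bar..
sum_y_sq = 6_028_288                       # sum over i, t of y_it^2

ssT = r * sum(m * m for m in means) - n * grand_mean ** 2    # (3.5.12)
sstot = sum_y_sq - n * grand_mean ** 2                       # (3.5.16)
ssE = sstot - ssT                                            # since sstot = ssT + ssE

msT = ssT / (v - 1)
msE = ssE / (n - v)
print(ssT, sstot, ssE, msT / msE)
```

This recovers $ssT = 427{,}915.25$, $sstot = 456{,}327.75$, $ssE = 28{,}412.5$, and the ratio $msT/msE \approx 60.24$ shown in Table 3.5.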
The decision rule for testing the null hypothesis $H_0: \{\tau_1 = \tau_2 = \tau_3 = \tau_4\}$ that the four battery types have the same average life per unit cost against the alternative hypothesis that at least two of the battery types differ, at significance level $\alpha$, is

reject $H_0$ if $msT/msE = 60.24 > F_{3,12,\alpha}$.
Table 3.5 One-way analysis of variance table for the battery experiment

Source of variation | Degrees of freedom | Sum of squares | Mean square | Ratio | p-value
Type | 3 | 427,915.25 | 142,638.42 | 60.24 | 0.0001
Error | 12 | 28,412.50 | 2,367.71 | |
Total | 15 | 456,327.75 | | |
From Table A.6, it can be seen that $60.24 > F_{3,12,\alpha}$ for any of the tabulated values of $\alpha$. For example, if $\alpha$ is chosen to be 0.01, then $F_{3,12,0.01} = 5.95$. Thus, for any tabulated choice of $\alpha$, the null hypothesis is rejected, and it is concluded that at least two of the battery types differ in mean life per unit cost. In order to investigate which particular pairs of battery types differ, we would need to calculate confidence intervals. This will be done in Chap. 4.
3.5.2 Use of p-Values
The p-value of a test is the smallest choice of $\alpha$ that would allow the null hypothesis to be rejected.
For convenience, computer packages usually print the p-value as well as the ratio $msT/msE$. Having information about the p-value saves looking up $F_{v-1,n-v,\alpha}$ in Table A.6. All we need to do is to compare the p-value with our selected value of $\alpha$. Therefore, the decision rule for testing $H_0: \{\tau_1 = \cdots = \tau_v\}$ against $H_A$: {not all of the $\tau_i$'s are equal} can be written as
reject $H_0$ if $p < \alpha$.
Example 3.5.2 Battery experiment, continued
In the battery experiment of Example 3.5.1, the null hypothesis $H_0: \{\tau_1 = \tau_2 = \tau_3 = \tau_4\}$ that the four battery types have the same average life per unit cost was tested against the alternative hypothesis that they do not. The p-value generated by SAS software for the test is shown in Table 3.5 as $p = 0.0001$. A value of 0.0001 in the SAS computer output indicates that the p-value is less than or equal to 0.0001. Smaller values are not printed explicitly. If $\alpha$ were chosen to be 0.01, then the null hypothesis would be rejected, since $p < \alpha$.
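For readers without Table A.6 or a statistics package at hand, the $F$ tail probability can be computed from the regularized incomplete beta function. The sketch below uses the standard continued-fraction evaluation (the algorithm popularized by Numerical Recipes); in practice one would simply call a library routine such as SciPy's `scipy.stats.f.sf`.

```python
import math

def _betacf(a, b, x):
    # Continued-fraction evaluation for the incomplete beta function
    # (modified Lentz's method, as in Numerical Recipes).
    MAXIT, EPS, FPMIN = 200, 3e-12, 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    if abs(d) < FPMIN:
        d = FPMIN
    d = 1.0 / d
    h = d
    for m in range(1, MAXIT + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        if abs(d) < FPMIN:
            d = FPMIN
        c = 1.0 + aa / c
        if abs(c) < FPMIN:
            c = FPMIN
        d = 1.0 / d
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        if abs(d) < FPMIN:
            d = FPMIN
        c = 1.0 + aa / c
        if abs(c) < FPMIN:
            c = FPMIN
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < EPS:
            break
    return h

def betai(a, b, x):
    # Regularized incomplete beta function I_x(a, b).
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    ln_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
             + a * math.log(x) + b * math.log(1.0 - x))
    bt = math.exp(ln_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

def f_sf(x, d1, d2):
    # P(F > x) for an F distribution with (d1, d2) degrees of freedom,
    # via the identity P(F > x) = I_{d2/(d2 + d1 x)}(d2/2, d1/2).
    return betai(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * x))

print(f_sf(60.24, 3, 12))  # p-value for the battery experiment
```

Applied to the battery experiment, `f_sf(60.24, 3, 12)` returns a value well below 0.0001, consistent with the p-value reported by SAS; as a sanity check, `f_sf(3.49, 3, 12)` is close to 0.05, matching the tabulated $F_{3,12,0.05}$.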