
3.3 Analysis of the Fixed Effects Model

In this section, we develop the single-factor analysis of variance for the fixed effects model.

Recall that $y_{i.}$ represents the total of the observations under the $i$th treatment. Let $\bar{y}_{i.}$ represent the average of the observations under the $i$th treatment. Similarly, let $y_{..}$ represent the grand total of all the observations and $\bar{y}_{..}$ represent the grand average of all the observations. Expressed symbolically,

$$y_{i.} = \sum_{j=1}^{n} y_{ij}, \qquad \bar{y}_{i.} = y_{i.}/n, \qquad i = 1, 2, \ldots, a$$

$$y_{..} = \sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}, \qquad \bar{y}_{..} = y_{..}/N \tag{3.3}$$

where $N = an$ is the total number of observations. We see that the "dot" subscript notation implies summation over the subscript that it replaces.

We are interested in testing the equality of the $a$ treatment means; that is, $E(y_{ij}) = \mu + \tau_i = \mu_i$, $i = 1, 2, \ldots, a$. The appropriate hypotheses are

$$H_0\colon \mu_1 = \mu_2 = \cdots = \mu_a$$
$$H_1\colon \mu_i \neq \mu_j \ \text{for at least one pair } (i, j) \tag{3.4}$$

In the effects model, we break the $i$th treatment mean $\mu_i$ into two components such that $\mu_i = \mu + \tau_i$. We usually think of $\mu$ as an overall mean so that

$$\frac{1}{a}\sum_{i=1}^{a} \mu_i = \mu$$

This definition implies that

$$\sum_{i=1}^{a} \tau_i = 0$$

That is, the treatment or factor effects can be thought of as deviations from the overall mean.¹ Consequently, an equivalent way to write the above hypotheses is in terms of the treatment effects $\tau_i$, say

$$H_0\colon \tau_1 = \tau_2 = \cdots = \tau_a = 0$$
$$H_1\colon \tau_i \neq 0 \ \text{for at least one } i$$

Thus, we speak of testing the equality of treatment means or testing that the treatment effects (the $\tau_i$) are zero. The appropriate procedure for testing the equality of $a$ treatment means is the analysis of variance.
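As a concrete illustration of the dot notation, the totals and averages in Equation 3.3 can be computed directly. The sketch below is in Python, and the data (a = 3 treatments, n = 4 replicates) are hypothetical, chosen only to show the bookkeeping.

```python
# Dot-notation totals and averages for a single-factor layout.
# The data are hypothetical (a = 3 treatments, n = 4 replicates).
y = [
    [9.0, 12.0, 10.0, 8.0],    # observations under treatment 1
    [20.0, 21.0, 23.0, 17.0],  # treatment 2
    [6.0, 5.0, 8.0, 16.0],     # treatment 3
]
a, n = len(y), len(y[0])
N = a * n                                # total number of observations, N = an

y_i_dot = [sum(row) for row in y]        # treatment totals  y_i.
ybar_i_dot = [t / n for t in y_i_dot]    # treatment averages  y-bar_i.
y_dot_dot = sum(y_i_dot)                 # grand total  y..
ybar_dot_dot = y_dot_dot / N             # grand average  y-bar..

print(y_i_dot)        # [39.0, 81.0, 35.0]
print(ybar_dot_dot)
```

Each "dot" replaces the subscript that was summed over, exactly as the notation suggests.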

1For more information on this subject, refer to the supplemental text material for Chapter 3.

3.3.1 Decomposition of the Total Sum of Squares

The name *analysis of variance* is derived from a partitioning of total variability into its component parts. The total corrected sum of squares

$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2$$

is used as a measure of overall variability in the data. Intuitively, this is reasonable because if we were to divide $SS_T$ by the appropriate number of degrees of freedom (in this case, $an - 1 = N - 1$), we would have the sample variance of the $y$'s. The sample variance is, of course, a standard measure of variability.

Note that the total corrected sum of squares $SS_T$ may be written as

$$\sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2 = \sum_{i=1}^{a}\sum_{j=1}^{n} \left[(\bar{y}_{i.} - \bar{y}_{..}) + (y_{ij} - \bar{y}_{i.})\right]^2 \tag{3.5}$$

or

$$\sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2 = n\sum_{i=1}^{a} (\bar{y}_{i.} - \bar{y}_{..})^2 + \sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})^2 + 2\sum_{i=1}^{a}\sum_{j=1}^{n} (\bar{y}_{i.} - \bar{y}_{..})(y_{ij} - \bar{y}_{i.})$$

However, the cross-product term in this last equation is zero, because

$$\sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.}) = y_{i.} - n\bar{y}_{i.} = y_{i.} - n(y_{i.}/n) = 0$$

Therefore, we have

$$\sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2 = n\sum_{i=1}^{a} (\bar{y}_{i.} - \bar{y}_{..})^2 + \sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})^2 \tag{3.6}$$

Equation 3.6 is the fundamental ANOVA identity. It states that the total variability in the data, as measured by the total corrected sum of squares, can be partitioned into a sum of squares of the differences between the treatment averages and the grand average plus a sum of squares of the differences of observations within treatments from the treatment average. Now, the difference between the observed treatment averages and the grand average is a measure of the differences between treatment means, whereas the differences of observations within a treatment from the treatment average can be due to only random error. Thus, we may write Equation 3.6 symbolically as

$$SS_T = SS_{\text{Treatments}} + SS_E$$

where $SS_{\text{Treatments}}$ is called the sum of squares due to treatments (i.e., between treatments), and $SS_E$ is called the sum of squares due to error (i.e., within treatments). There are $an = N$ total observations; thus, $SS_T$ has $N - 1$ degrees of freedom. There are $a$ levels of the factor (and $a$ treatment means), so $SS_{\text{Treatments}}$ has $a - 1$ degrees of freedom. Finally, there are $n$ replicates within any treatment providing $n - 1$ degrees of freedom with which to estimate the experimental error. Because there are $a$ treatments, we have $a(n - 1) = an - a = N - a$ degrees of freedom for error.

It is instructive to examine explicitly the two terms on the right-hand side of the fundamental ANOVA identity. Consider the error sum of squares

$$SS_E = \sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})^2 = \sum_{i=1}^{a}\left[\sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})^2\right]$$
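The partition in Equation 3.6 can be verified numerically. The Python sketch below uses hypothetical data (a = 3 treatments, n = 4 replicates) and checks that the between-treatment and within-treatment sums of squares reproduce the total corrected sum of squares.

```python
# Numerical check of the fundamental ANOVA identity (Equation 3.6)
# on hypothetical data with a = 3 treatments and n = 4 replicates.
y = [
    [9.0, 12.0, 10.0, 8.0],
    [20.0, 21.0, 23.0, 17.0],
    [6.0, 5.0, 8.0, 16.0],
]
a, n = len(y), len(y[0])
N = a * n
ybar_i = [sum(row) / n for row in y]      # treatment averages  y-bar_i.
ybar = sum(sum(row) for row in y) / N     # grand average  y-bar..

# Total corrected sum of squares
ss_t = sum((yij - ybar) ** 2 for row in y for yij in row)
# Between-treatments sum of squares
ss_treatments = n * sum((m - ybar) ** 2 for m in ybar_i)
# Within-treatments (error) sum of squares
ss_e = sum((yij - m) ** 2 for row, m in zip(y, ybar_i) for yij in row)

# SS_T = SS_Treatments + SS_E (up to floating-point rounding)
assert abs(ss_t - (ss_treatments + ss_e)) < 1e-9
```

The identity holds for any data set; no distributional assumptions are needed for the decomposition itself.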

In this form, it is easy to see that the term within square brackets, if divided by $n - 1$, is the sample variance in the $i$th treatment, or

$$S_i^2 = \frac{\sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})^2}{n - 1}, \qquad i = 1, 2, \ldots, a$$

Now the $a$ sample variances may be combined to give a single estimate of the common population variance as follows:

$$\frac{(n-1)S_1^2 + (n-1)S_2^2 + \cdots + (n-1)S_a^2}{(n-1) + (n-1) + \cdots + (n-1)} = \frac{\sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})^2}{N - a} = \frac{SS_E}{N - a}$$

Thus, $SS_E/(N - a)$ is a pooled estimate of the common variance within each of the $a$ treatments.

Similarly, if there were no differences between the $a$ treatment means, we could use the variation of the treatment averages from the grand average to estimate $\sigma^2$. Specifically,

$$\frac{SS_{\text{Treatments}}}{a - 1} = \frac{n\sum_{i=1}^{a} (\bar{y}_{i.} - \bar{y}_{..})^2}{a - 1}$$

is an estimate of $\sigma^2$ if the treatment means are equal. The reason for this may be intuitively seen as follows: The quantity $\sum_{i=1}^{a} (\bar{y}_{i.} - \bar{y}_{..})^2/(a - 1)$ estimates $\sigma^2/n$, the variance of the treatment averages, so $n\sum_{i=1}^{a} (\bar{y}_{i.} - \bar{y}_{..})^2/(a - 1)$ must estimate $\sigma^2$ if there are no differences in treatment means.

We see that the ANOVA identity (Equation 3.6) provides us with two estimates of $\sigma^2$: one based on the inherent variability within treatments and one based on the variability between treatments. If there are no differences in the treatment means, these two estimates should be very similar, and if they are not, we suspect that the observed difference must be caused by differences in the treatment means. Although we have used an intuitive argument to develop this result, a somewhat more formal approach can be taken.

The quantities

$$MS_{\text{Treatments}} = \frac{SS_{\text{Treatments}}}{a - 1}$$

and

$$MS_E = \frac{SS_E}{N - a}$$

are called mean squares. We now examine the expected values of these mean squares.

Consider

$$E(MS_E) = E\left[\frac{SS_E}{N - a}\right] = \frac{1}{N - a}\, E\left[\sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})^2\right]$$

Expanding the square and using $\sum_{j=1}^{n} y_{ij} = n\bar{y}_{i.} = y_{i.}$, this becomes

$$E(MS_E) = \frac{1}{N - a}\, E\left[\sum_{i=1}^{a}\sum_{j=1}^{n} \left(y_{ij}^2 - 2y_{ij}\bar{y}_{i.} + \bar{y}_{i.}^2\right)\right] = \frac{1}{N - a}\, E\left[\sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}^2 - n\sum_{i=1}^{a} \bar{y}_{i.}^2\right] = \frac{1}{N - a}\, E\left[\sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}^2 - \frac{1}{n}\sum_{i=1}^{a} y_{i.}^2\right]$$

Substituting the model (Equation 3.1), $y_{ij} = \mu + \tau_i + \epsilon_{ij}$, into this equation, we obtain

$$E(MS_E) = \frac{1}{N - a}\, E\left[\sum_{i=1}^{a}\sum_{j=1}^{n} (\mu + \tau_i + \epsilon_{ij})^2 - \frac{1}{n}\sum_{i=1}^{a}\left(\sum_{j=1}^{n} (\mu + \tau_i + \epsilon_{ij})\right)^2\right]$$

Now when squaring and taking expectation of the quantity within the brackets, we see that terms involving $\epsilon_{ij}^2$ and $\epsilon_{i.}^2$ are replaced by $\sigma^2$ and $n\sigma^2$, respectively, because $E(\epsilon_{ij}) = 0$. Furthermore, all cross products involving $\epsilon_{ij}$ have zero expectation. Therefore, after squaring and taking expectation, the last equation becomes

$$E(MS_E) = \frac{1}{N - a}\left[N\mu^2 + n\sum_{i=1}^{a} \tau_i^2 + N\sigma^2 - N\mu^2 - n\sum_{i=1}^{a} \tau_i^2 - a\sigma^2\right]$$

or

$$E(MS_E) = \sigma^2$$

By a similar approach, we may also show that²

$$E(MS_{\text{Treatments}}) = \sigma^2 + \frac{n\sum_{i=1}^{a} \tau_i^2}{a - 1}$$

Thus, as we argued heuristically, $MS_E = SS_E/(N - a)$ estimates $\sigma^2$, and, if there are no differences in treatment means (which implies that $\tau_i = 0$), $MS_{\text{Treatments}} = SS_{\text{Treatments}}/(a - 1)$ also estimates $\sigma^2$. However, note that if treatment means do differ, the expected value of the treatment mean square is greater than $\sigma^2$.

It seems clear that a test of the hypothesis of no difference in treatment means can be performed by comparing $MS_{\text{Treatments}}$ and $MS_E$. We now consider how this comparison may be made.

3.3.2 Statistical Analysis

We now investigate how a formal test of the hypothesis of no differences in treatment means ($H_0\colon \mu_1 = \mu_2 = \cdots = \mu_a$, or equivalently, $H_0\colon \tau_1 = \tau_2 = \cdots = \tau_a = 0$) can be performed.

Because we have assumed that the errors $\epsilon_{ij}$ are normally and independently distributed with mean zero and variance $\sigma^2$, the observations $y_{ij}$ are normally and independently distributed with mean $\mu_i$ and variance $\sigma^2$. Thus, $SS_T$ is a sum of squares in normally distributed random variables; consequently, it can be shown that $SS_T/\sigma^2$ is distributed as chi-square with $N - 1$ degrees of freedom. Furthermore, we can show that $SS_E/\sigma^2$ is chi-square with $N - a$ degrees of freedom and that $SS_{\text{Treatments}}/\sigma^2$ is chi-square with $a - 1$ degrees of freedom if the null hypothesis $H_0\colon \tau_i = 0$ is true. However, all three sums of squares are not necessarily independent because $SS_{\text{Treatments}}$ and $SS_E$ add to $SS_T$. The following theorem, which is a special form of one attributed to William G. Cochran, is useful in establishing the independence of $SS_E$ and $SS_{\text{Treatments}}$.

2Refer to the supplemental text material for Chapter 3.
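The expected mean square results can also be checked by simulation. The Python sketch below draws repeated data sets from the fixed effects model; the values of mu, sigma, and the tau_i, and the number of replications, are all illustrative choices, not from the text.

```python
# Simulation check of the expected mean squares: E(MS_E) = sigma^2 and
# E(MS_Treatments) = sigma^2 + n * sum(tau_i^2) / (a - 1).
# All parameter values below (mu, sigma, tau) are illustrative choices.
import random

random.seed(1)
a, n = 4, 5
N = a * n
mu, sigma = 10.0, 2.0
tau = [-1.5, -0.5, 0.5, 1.5]       # fixed treatment effects, sum to zero
reps = 2000

ms_e_sum = ms_t_sum = 0.0
for _ in range(reps):
    # One data set from the model y_ij = mu + tau_i + eps_ij, eps ~ N(0, sigma^2)
    y = [[mu + tau[i] + random.gauss(0.0, sigma) for _ in range(n)]
         for i in range(a)]
    ybar_i = [sum(row) / n for row in y]
    ybar = sum(map(sum, y)) / N
    ms_t_sum += n * sum((m - ybar) ** 2 for m in ybar_i) / (a - 1)
    ms_e_sum += sum((yij - m) ** 2
                    for row, m in zip(y, ybar_i) for yij in row) / (N - a)

ems_e = ms_e_sum / reps   # should be close to sigma^2 = 4
ems_t = ms_t_sum / reps   # should be close to 4 + 5 * 5.0 / 3, about 12.33
print(round(ems_e, 2), round(ems_t, 2))
```

With these effect sizes the treatment mean square is inflated well above sigma squared, which is exactly the behavior the F test exploits.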

Because the degrees of freedom for $SS_{\text{Treatments}}$ and $SS_E$ add to $N - 1$, the total number of degrees of freedom, Cochran's theorem implies that $SS_{\text{Treatments}}/\sigma^2$ and $SS_E/\sigma^2$ are independently distributed chi-square random variables. Therefore, if the null hypothesis of no difference in treatment means is true, the ratio

$$F_0 = \frac{SS_{\text{Treatments}}/(a - 1)}{SS_E/(N - a)} = \frac{MS_{\text{Treatments}}}{MS_E} \tag{3.7}$$

is distributed as $F$ with $a - 1$ and $N - a$ degrees of freedom. Equation 3.7 is the test statistic for the hypothesis of no differences in treatment means.

From the expected mean squares we see that, in general, $MS_E$ is an unbiased estimator of $\sigma^2$. Also, under the null hypothesis, $MS_{\text{Treatments}}$ is an unbiased estimator of $\sigma^2$. However, if the null hypothesis is false, the expected value of $MS_{\text{Treatments}}$ is greater than $\sigma^2$. Therefore, under the alternative hypothesis, the expected value of the numerator of the test statistic (Equation 3.7) is greater than the expected value of the denominator, and we should reject $H_0$ for values of the test statistic that are too large. This implies an upper-tail, one-tail critical region. Therefore, we should reject $H_0$ and conclude that there are differences in the treatment means if

$$F_0 > F_{\alpha,\, a-1,\, N-a}$$

where $F_0$ is computed from Equation 3.7. Alternatively, we could use the P-value approach for decision making. The table of $F$ percentages in the Appendix (Table IV) can be used to find bounds on the P-value.

The sums of squares may be computed in several ways. One direct approach is to make use of the definition

$$y_{ij} - \bar{y}_{..} = (\bar{y}_{i.} - \bar{y}_{..}) + (y_{ij} - \bar{y}_{i.})$$

Use a spreadsheet to compute these three terms for each observation. Then, sum up the squares to obtain $SS_T$, $SS_{\text{Treatments}}$, and $SS_E$. Another approach is to rewrite and simplify the definitions of $SS_{\text{Treatments}}$ and $SS_T$ in Equation 3.6, which results in

$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}^2 - \frac{y_{..}^2}{N} \tag{3.8}$$

$$SS_{\text{Treatments}} = \frac{1}{n}\sum_{i=1}^{a} y_{i.}^2 - \frac{y_{..}^2}{N} \tag{3.9}$$

and

$$SS_E = SS_T - SS_{\text{Treatments}} \tag{3.10}$$
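The shortcut formulas and the test statistic fit in a few lines of Python. The sketch below uses hypothetical data (a = 3, n = 4); the critical value $F_{\alpha,\, a-1,\, N-a}$ would still come from a table such as Appendix Table IV or an F-distribution routine.

```python
# Shortcut computing formulas (Equations 3.8-3.10) and the test
# statistic F0 (Equation 3.7), on hypothetical data with a = 3, n = 4.
y = [
    [9.0, 12.0, 10.0, 8.0],
    [20.0, 21.0, 23.0, 17.0],
    [6.0, 5.0, 8.0, 16.0],
]
a, n = len(y), len(y[0])
N = a * n
y_i_dot = [sum(row) for row in y]           # treatment totals  y_i.
y_dot_dot = sum(y_i_dot)                    # grand total  y..

ss_t = sum(yij ** 2 for row in y for yij in row) - y_dot_dot ** 2 / N   # (3.8)
ss_treatments = sum(t ** 2 for t in y_i_dot) / n - y_dot_dot ** 2 / N   # (3.9)
ss_e = ss_t - ss_treatments                                             # (3.10)

ms_treatments = ss_treatments / (a - 1)
ms_e = ss_e / (N - a)
f0 = ms_treatments / ms_e       # compare with F_{alpha, a-1, N-a}
print(round(f0, 2))
```

Note that the subtraction in Equation 3.10 avoids recomputing the within-treatment deviations, at the cost of some numerical cancellation for data with a large mean; the definitional formulas are safer in that case.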

THEOREM 3-1