
Estimation of Pure Error

In the document Applied Regression Analysis: A Research Tool (pages 159-165)

ANALYSIS OF VARIANCE AND QUADRATIC FORMS


4.7 Estimation of Pure Error


TABLE 4.8. Replicate yield data for soybeans exposed to chronic levels of ozone and estimates of pure error. (Data courtesy A. S. Heagle, North Carolina State University.)

                     Ozone Level (ppm)
             .02       .07       .11       .15
           238.3     235.1     236.2     178.7
           270.7     228.9     208.0     186.0
           210.0     236.2     243.5     206.9
           248.7     255.0     233.0     215.3
           242.4     228.9     233.0     219.5
Ȳi        242.02    236.82    230.74    201.28
s²i       476.61    114.83    179.99    325.86

variables are involved. In addition, apparent replicates in the observational data may not, in fact, be true replicates because important variables may have been overlooked. Pseudoreplication or near replication is sometimes used with observational data to estimate σ². These are sets of observations in which the values of the independent variables fall within a relatively narrow range.

Example 4.14. To illustrate the estimation of pure error, the ozone example used in Example 1.1 is used. The four observations used in that section were the means of five replicate experimental units at each level of ozone from a completely random experimental design. The full data set, the treatment means, and the estimates of pure error within each ozone level are given in Table 4.8.

Each s²i is computed from the variance among the five observations at that ozone level, has 4 degrees of freedom, and is an unbiased estimate of σ². Since each measures the variation of Yij about Ȳi for a given level of ozone, the estimates are in no way affected by the form of the response model that might be chosen to represent the response of yield to ozone. Figure 4.2 illustrates that the variation among the replicate observations for a given level of ozone is unaffected by the form of the regression line fit to the data.
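As a minimal sketch in plain Python (standard library only), the treatment means and within-level variances of Table 4.8 can be recomputed directly from the replicate yields. The dictionary layout, one list of five yields per ozone level, is an illustrative assumption and not part of the original text.

```python
from statistics import mean, variance

# Table 4.8: five replicate yields at each ozone level (ppm)
yields_by_level = {
    0.02: [238.3, 270.7, 210.0, 248.7, 242.4],
    0.07: [235.1, 228.9, 236.2, 255.0, 228.9],
    0.11: [236.2, 208.0, 243.5, 233.0, 233.0],
    0.15: [178.7, 186.0, 206.9, 215.3, 219.5],
}

for level, ys in yields_by_level.items():
    # statistics.variance divides by n - 1, so each s_i^2 has 5 - 1 = 4 d.f.
    print(f"{level:.2f} ppm: mean = {mean(ys):.2f}, s2 = {variance(ys):.2f}")
```

The printed values should agree with the Ȳi and s²i rows of Table 4.8 to two decimals. Nothing in the calculation refers to a regression line, which is the sense in which these estimates are model-free.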

The best estimate of σ² is the pooled estimate

$$ s^2 = \frac{\sum (n_i - 1) s_i^2}{\sum (n_i - 1)} = \frac{4(476.61) + \cdots + 4(325.86)}{16} = 274.32 $$

with 16 degrees of freedom, where $n_i = 5$ for $i = 1, 2, 3, 4$.
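The pooling formula can be checked numerically. This is a small sketch that assumes only the rounded s²i values printed in Table 4.8 and equal group sizes of 5; the helper name `pooled_variance` is hypothetical.

```python
def pooled_variance(variances, sizes):
    """Pooled estimate sum((n_i - 1) * s_i^2) / sum(n_i - 1) and its d.f."""
    df = sum(n - 1 for n in sizes)
    weighted = sum((n - 1) * s2 for s2, n in zip(variances, sizes))
    return weighted / df, df


s2_i = [476.61, 114.83, 179.99, 325.86]  # rounded s_i^2 from Table 4.8
s2, df = pooled_variance(s2_i, sizes=[5, 5, 5, 5])
print(f"pooled s2 = {s2:.2f} with {df} d.f.")  # pooled s2 = 274.32 with 16 d.f.
```

Weighting by n_i - 1 rather than n_i is what makes the pooled value unbiased when the groups have unequal sizes; with equal sizes, as here, it reduces to the simple average of the four s²i.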

The analysis of variance for the completely random design is given in Table 4.9 to emphasize that s² is the experimental error from that analysis.

The previous regression analysis (Section 1.4, Tables 1.3 and 1.4) used the treatment means (of r = 5 observations). Thus, the sums of squares from that analysis have to be multiplied by r = 5 to put them on a “per observation” basis. That analysis of variance, Table 1.4, partitioned the sum of squares among the four treatment means into 1 degree of freedom for the linear regression of Y on ozone level and 2 degrees of freedom for lack of fit of linear regression. The middle three lines of Table 4.9 contain the results from the original analysis multiplied by r = 5. The numbers differ slightly due to rounding the original means to whole numbers.

FIGURE 4.2. Comparison of “pure error” and “deviations from regression” using the data on soybean response to ozone.

TABLE 4.9. The analysis of variance for the completely random experimental design for the yield response of soybean to ozone.

Source          d.f.       SS          MS
Total (corr)     19     9366.61
Treatments        3     4977.47     1659.16
  Regression      1     3956.31     3956.31
  Lack of fit     2     1021.16      510.58
Pure error       16     4389.14      274.32
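As a sketch in plain Python (no statistics library assumed), the partition in Table 4.9 can be rebuilt from the 20 raw yields of Table 4.8 rather than from the rounded Chapter 1 means. The function name `anova_partition` and the dictionary layout are illustrative assumptions. Because it starts from individual observations, the sums of squares come out on a per-observation basis directly, with no multiplication by r = 5.

```python
# Table 4.8: yields for r = 5 replicates at each ozone level (ppm)
data = {
    0.02: [238.3, 270.7, 210.0, 248.7, 242.4],
    0.07: [235.1, 228.9, 236.2, 255.0, 228.9],
    0.11: [236.2, 208.0, 243.5, 233.0, 233.0],
    0.15: [178.7, 186.0, 206.9, 215.3, 219.5],
}


def anova_partition(data):
    """Return {source: (d.f., SS)} for the completely random design with a linear trend."""
    x = [level for level, obs in data.items() for _ in obs]
    y = [v for obs in data.values() for v in obs]
    n, k = len(y), len(data)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    level_means = {level: sum(obs) / len(obs) for level, obs in data.items()}

    ss_total = sum((v - y_bar) ** 2 for v in y)
    # Treatments: level means about the grand mean, weighted by group size
    ss_treat = sum(len(obs) * (level_means[lv] - y_bar) ** 2 for lv, obs in data.items())
    # Pure error: replicates about their own level mean (no regression model involved)
    ss_pure = sum((v - level_means[lv]) ** 2 for lv, obs in data.items() for v in obs)
    # Straight-line regression of yield on ozone using all n observations
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_regr = sxy ** 2 / sxx
    return {
        "Total (corr)": (n - 1, ss_total),
        "Treatments": (k - 1, ss_treat),
        "Regression": (1, ss_regr),
        "Lack of fit": (k - 2, ss_treat - ss_regr),
        "Pure error": (n - k, ss_pure),
    }


for source, (df, ss) in anova_partition(data).items():
    print(f"{source:<13} {df:>3} {ss:>10.2f} {ss / df:>9.2f}")
```

The SS column should match Table 4.9. The loop prints a mean square for Total (corr) only because it treats every row alike; the table leaves that cell blank.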

The expectations of the mean squares in the analysis of variance show what function of the parameters each mean square is estimating. The mean square expectations for the critical lines in Table 4.9 are

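$$ \begin{aligned} E[\text{MS(Regr)}] &= \sigma^2 + \beta_1^2 \sum x_i^2, \\ E[\text{MS(Lack of fit)}] &= \sigma^2 + (\text{Model bias})^2, \\ E[\text{MS(Pure error)}] &= \sigma^2. \end{aligned} \qquad (4.68) $$

Recall that $\sum x_i^2$ is used to indicate the corrected sum of squares of the independent variable.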

The square on “model bias” emphasizes that any inadequacies in the model cause this mean square to be larger, in expectation, than σ². Thus, the “lack of fit” mean square is an unbiased estimate of σ² only if the linear model is correct. Otherwise, it is biased upward. On the other hand, the “pure error” estimate of σ² obtained from the replication in the experiment is unbiased regardless of whether the assumed linear relationship is correct.
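The contrast between the expectations in (4.68) can be seen in a small Monte Carlo sketch. Everything in it is a hypothetical assumption chosen for illustration: σ = 10, five replicates at the four ozone levels of Table 4.8, and two made-up mean functions, one a straight line and one curved. Averaged over many simulated experiments, MS(Pure error) should stay near σ² = 100 in both cases, while MS(Lack of fit) should exceed 100 only when the true response is curved.

```python
import random


def simulate_ms(mean_fn, levels=(0.02, 0.07, 0.11, 0.15), r=5, sigma=10.0,
                n_sim=4000, seed=1):
    """Average MS(Lack of fit) and MS(Pure error) over n_sim simulated experiments."""
    rng = random.Random(seed)
    k = len(levels)
    x_bar = sum(levels) / k
    sxx = sum((x - x_bar) ** 2 for x in levels)  # Sxx of the k distinct levels
    lof_sum = pe_sum = 0.0
    for _ in range(n_sim):
        groups = [[rng.gauss(mean_fn(x), sigma) for _ in range(r)] for x in levels]
        means = [sum(g) / r for g in groups]
        y_bar = sum(means) / k
        ss_pe = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
        ss_treat = r * sum((m - y_bar) ** 2 for m in means)
        sxy = sum((x - x_bar) * (m - y_bar) for x, m in zip(levels, means))
        ss_regr = r * sxy ** 2 / sxx  # balanced design: equals the fit to all k*r points
        lof_sum += (ss_treat - ss_regr) / (k - 2)
        pe_sum += ss_pe / (k * (r - 1))
    return lof_sum / n_sim, pe_sum / n_sim


def linear_truth(x):  # hypothetical straight-line response
    return 253.0 - 290.0 * x


def curved_truth(x):  # hypothetical curvilinear response
    return 240.0 - 5000.0 * x ** 2


print("linear truth:", simulate_ms(linear_truth))  # both averages near sigma^2 = 100
print("curved truth:", simulate_ms(curved_truth))  # lack of fit well above 100, pure error near 100
```

For a balanced design like this one, the inflation of MS(Lack of fit) takes a concrete form: r times the sum of squared deviations of the true level means from their best-fitting straight line, divided by the 2 lack-of-fit degrees of freedom. That quantity is one way to read the (Model bias)² term.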

The mean square expectation of MS(Regr) is shown as if the linear model relating yield to ozone level is correct. If the model is not correct (for example, if the treatment differences are not due solely to ozone differences), the second term in E[MS(Regr)] will include contributions from all variables that are correlated with ozone levels. This is the case even if those variables have not been identified. The advantage of controlled experiments such as this ozone study is that the amount of ozone is, presumably, the only variable changing consistently over the ozone treatments. Random assignment of treatments to the experimental units should destroy any correlation between ozone level and any incidental environmental variable. Thus, treatment differences in this controlled study can be attributed to the effects of ozone, and E[MS(Regr)] should not be biased by the effects of any uncontrolled variables. One should not overlook, however, this potential for bias in the regression sum of squares, particularly when observational data are being analyzed.

The independent estimate of pure error, or experimental error, provides the basis for two important tests of significance. The adequacy of the model can be checked by testing the null hypothesis that “model bias” is zero. Any inadequacies in the linear model will make the lack-of-fit mean square larger than σ² on average. Such inadequacies could include omitted independent variables as well as any curvilinear response to ozone.

Example 4.15. In the ozone example, Example 4.14, the test of the adequacy of the linear model is

$$ F = \frac{\text{MS(Lack of fit)}}{\text{MS(Pure error)}} = \frac{510.58}{274.32} = 1.86, $$

which, if the model is correct, is distributed as F with 2 and 16 degrees of freedom. Comparison against the critical value F(.05,2,16) = 3.63 shows this to be nonsignificant, indicating that there is no evidence in these data that the linear model is inadequate for representing the response of soybean to ozone.
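A minimal sketch of the lack-of-fit test, assuming SciPy is available: `scipy.stats.f.ppf` gives the upper critical value and `scipy.stats.f.sf` the p-value. The helper `f_test` is a hypothetical name introduced here for illustration.

```python
from scipy import stats


def f_test(ms_num, df_num, ms_den, df_den, alpha=0.05):
    """F ratio, upper-alpha critical value of F(df_num, df_den), and p-value."""
    f_stat = ms_num / ms_den
    f_crit = stats.f.ppf(1 - alpha, df_num, df_den)
    p_value = stats.f.sf(f_stat, df_num, df_den)
    return f_stat, f_crit, p_value


# Lack of fit (2 d.f.) tested against pure error (16 d.f.), Table 4.9
f_stat, f_crit, p_value = f_test(510.58, 2, 274.32, 16)
print(f"F = {f_stat:.2f}, F(.05,2,16) = {f_crit:.2f}, p = {p_value:.3f}")
```

Reporting the p-value alongside the tabled critical value shows how far the statistic is from significance, not merely that it falls short of 3.63.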

The second hypothesis of interest is H0: β1 = 0 against the alternative hypothesis Ha: β1 ≠ 0. If the fitted model is not adequate, then the parameter β1 may not have the same interpretation as when the model is adequate. Therefore, when the model is not adequate, it does not make sense to test H0: β1 = 0.

Suppose that the fitted model is adequate and we are interested in testing H0: β1 = 0. The ratio of the regression mean square to an estimate of σ² provides a test of this hypothesis. The mean square expectations show that both mean squares estimate σ² when the null hypothesis is true and that the numerator becomes increasingly larger as β1 deviates from zero. One estimate of σ² is, again, the pure error estimate or experimental error.

Example 4.16. For the ozone example, a test statistic for testing H0: β1 = 0 is

$$ F = \frac{\text{MS(Regr)}}{\text{MS(Pure error)}} = \frac{3{,}956.31}{274.32} = 14.42. $$

Comparing this to the critical value for α = .01, F(.01,1,16) = 8.53, indicates that the null hypothesis that β1 = 0 should be rejected. This conclusion differs from that of the analysis in Chapter 1 because σ² is now estimated with many more degrees of freedom (16 rather than 2). As a result, the test has more power for detecting departures from the null hypothesis.
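The same comparison can be sketched with SciPy (assumed available). Printing the upper 1% point of F with 1 and 2 versus 1 and 16 denominator degrees of freedom makes the gain in power concrete: the 2 d.f. case corresponds to the error term available when only the four treatment means are analyzed, and its critical value is far larger.

```python
from scipy import stats

ms_regr, ms_pure, df_pure = 3956.31, 274.32, 16  # Table 4.9
f_stat = ms_regr / ms_pure
f_crit = stats.f.ppf(0.99, 1, df_pure)  # upper 1% point of F(1, 16)
print(f"F = {f_stat:.2f}, F(.01,1,{df_pure}) = {f_crit:.2f}, reject H0: {f_stat > f_crit}")

# Upper 1% points with 2 versus 16 error d.f.: more d.f. means a lower hurdle
for df_err in (2, 16):
    print(f"F(.01,1,{df_err}) = {stats.f.ppf(0.99, 1, df_err):.2f}")
```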

Note that, if the model is truly adequate, then the mean square for lack of fit is also an estimate of σ². A pooled estimate of σ² is given by the sum of SS(Lack of fit) and SS(Pure error) divided by the sum of the corresponding degrees of freedom.

Example 4.17. For the ozone example, consider the analysis of variance given in Table 4.10.

TABLE 4.10. The analysis of variance for the ozone data.

Source          d.f.        SS          MS
Total (corr)     19     9,366.61
Regression        1     3,956.31    3,956.31
Error            18     5,410.30      300.57
  Lack of fit     2     1,021.16      510.58
  Pure error     16     4,389.14      274.32

Based on the pooled error, a test statistic for testing H0: β1 = 0 is

$$ F = \frac{\text{MS(Regression)}}{\text{MS(Error)}} = \frac{3{,}956.31}{300.57} = 13.16. $$

Comparing this to the critical value for α = .01, F(.01,1,18) = 8.29, indicates that H0: β1 = 0 should be rejected. This F-statistic coincides with the F-statistic given in Chapter 1 for testing H0: β1 = 0 in the model Yi = β0 + β1Xi + εi when all of the data in Table 4.8 (instead of only the means, Table 1.1) are used. This test statistic is more powerful than that based on MS(Pure error). However, if the fitted model is inadequate, then MS(Error) is no longer an unbiased estimate of σ², whereas MS(Pure error) remains unbiased even if the fitted model is not adequate.
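As a check on the statement that this F-statistic matches a straight-line fit to all 20 observations, here is a plain-Python least-squares sketch; SciPy is assumed available only for the critical value. The residual mean square from the fit should reproduce the pooled MS(Error) of Table 4.10, because the residual sum of squares of the straight line is exactly SS(Lack of fit) + SS(Pure error).

```python
from scipy import stats

# Table 4.8: all 20 (ozone ppm, yield) observations
data = {
    0.02: [238.3, 270.7, 210.0, 248.7, 242.4],
    0.07: [235.1, 228.9, 236.2, 255.0, 228.9],
    0.11: [236.2, 208.0, 243.5, 233.0, 233.0],
    0.15: [178.7, 186.0, 206.9, 215.3, 219.5],
}
x = [level for level, obs in data.items() for _ in obs]
y = [v for obs in data.values() for v in obs]
n = len(y)

# Ordinary least squares for Y = b0 + b1 * X using every observation
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

ss_regr = b1 * sxy
ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
ms_res = ss_res / (n - 2)  # residual MS = [SS(Lack of fit) + SS(Pure error)] / 18
f_stat = ss_regr / ms_res
f_crit = stats.f.ppf(0.99, 1, n - 2)
print(f"Yhat = {b0:.2f} + ({b1:.2f}) X")
print(f"MS(Error) = {ms_res:.2f} on {n - 2} d.f., F = {f_stat:.2f}, F(.01,1,18) = {f_crit:.2f}")
```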

Finally, a composite test of H0: β1 = 0 and that the model is adequate is given by

$$ F = \frac{[\text{SS(Regression)} + \text{SS(Lack of fit)}]/(1 + 2)}{\text{MS(Pure error)}} = \frac{(3{,}956.31 + 1{,}021.16)/3}{274.32} = \frac{1{,}659.16}{274.32} = 6.05. $$

Comparing this to the critical value for α = .05, F(.05,3,16) = 3.24, indicates that either the model is not adequate or β1 is not zero. This is equivalent to testing the null hypothesis of no treatment effects in the analysis of variance, which is discussed in Chapter 9.
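A short SciPy sketch (SciPy assumed available) of the composite test, which is the same ratio as MS(Treatments)/MS(Pure error) in Table 4.9. It prints the critical values at both α = .05 and α = .01, so the conclusion can be checked at either level.

```python
from scipy import stats

ss_regr, ss_lof = 3956.31, 1021.16  # Table 4.9
ms_pure, df_pure = 274.32, 16
f_stat = (ss_regr + ss_lof) / (1 + 2) / ms_pure  # same as MS(Treatments) / MS(Pure error)
for alpha in (0.05, 0.01):
    f_crit = stats.f.ppf(1 - alpha, 3, df_pure)
    print(f"alpha = {alpha}: F = {f_stat:.2f}, F(alpha,3,16) = {f_crit:.2f}, reject: {f_stat > f_crit}")
```

Pooling the regression and lack-of-fit sums of squares recovers the 3 degrees of freedom among the four treatment means, which is why this test does not depend on the straight-line model at all.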

In summary, multiple, statistically independent observations on the dependent variable for given values of all relevant independent variables constitute true replication. True replication provides an unbiased estimate of σ² that does not depend on the model being used. The estimate of pure error provides a basis for testing the adequacy of the model. True replication should be designed into all studies where possible, and the pure error estimate of σ², rather than a residual mean square estimate, should be used for tests of significance and standard errors.
