5.4 Results of the Final Model
170 5. CASE STUDY: FIVE INDEPENDENT VARIABLES
TABLE 5.6. Results of the regression of BIOMASS on the two independent variables pH and K (Linthurst data).

Variable     $\hat{\beta}_j$    $s(\hat{\beta}_j)$    t        Partial SS
pH           412.04             48.50                 8.50     11,611,782
K            −0.487             0.203                 −2.40    924,266

Analysis of variance
Source        d.f.    Sum of Squares    Mean Square
Total         44      19,170,963
Regression    2       12,414,654        6,207,327
Residual      42      6,756,309         160,865
very important variable may appear as insignificant if the model contains a correlated variable and, conversely, an otherwise unimportant variable may take on false significance.
The contribution of SALINITY in the three-variable model is even smaller than it was before Zn was dropped and is far from being significant. The next step is to drop SALINITY from the model. In this particular example, one would not have been misled by eliminating both SALINITY and Zn at the previous step. This is not true in general.
The two-variable model containing pH and K gives the results in Table 5.6. Since the partial sums of squares for both pH and K are significant, the simplification of the model will stop with this two-variable model. The degree to which the linear model consisting of the two variables pH and K accounts for the variability in BIOMASS is $R^2 = .65$, only slightly smaller than the $R^2 = .68$ obtained with the original five-variable model.
corrected sum of squares of BIOMASS. The square root of $R^2$ is the simple correlation between BIOMASS and $\hat{Y}$:

$$
r(Y, \hat{Y}) = \sqrt{.65} = .80.
$$
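As a quick arithmetic check, the summary statistics quoted above can be recomputed from the sums of squares in Table 5.6. This is only a sketch; the variable names are illustrative and the numbers are copied from the table.

```python
import math

# Sums of squares from the analysis of variance in Table 5.6.
ss_total = 19_170_963       # corrected total, 44 d.f.
ss_regression = 12_414_654  # 2 d.f.
ss_residual = 6_756_309     # 42 d.f.
residual_df = 42

# Coefficient of determination and the correlation between Y and Y-hat.
r_squared = ss_regression / ss_total
multiple_r = math.sqrt(r_squared)

# Residual mean square, the estimate of sigma^2.
s2 = ss_residual / residual_df

print(f"R^2 = {r_squared:.2f}")
print(f"r(Y, Y-hat) = {multiple_r:.2f}")
print(f"s^2 = {s2:,.0f}")
```

The regression and residual sums of squares add to the corrected total, which is a useful consistency check on a transcribed table.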
The estimate of $\sigma^2$ from this final model is $s^2 = 160{,}865$ with $(n - p) = 42$ degrees of freedom. The variance–covariance matrix for the regression coefficients is

$$
s^2(\hat{\beta}) = (X'X)^{-1}s^2
= \begin{bmatrix}
.4865711 & -.0663498 & -.0001993 \\
-.0663498 & .0146211 & -.0000012 \\
-.0001993 & -.0000012 & .00000026
\end{bmatrix}(160{,}865)
= \begin{bmatrix}
78{,}272 & -10{,}673 & -32.0656 \\
-10{,}673 & 2{,}352.0 & -0.18950 \\
-32.0656 & -0.18950 & 0.04129
\end{bmatrix}.
$$
The square roots of the diagonal elements give the standard errors of the estimated regression coefficients in the order in which they are listed in $\hat{\beta}$. In this model,

$$
\hat{\beta} = (\hat{\beta}_0 \;\; \hat{\beta}_2 \;\; \hat{\beta}_3)'.
$$
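Thus, the standard errors of the estimated regression coefficients are

$$
\begin{aligned}
s(\hat{\beta}_0) &= \sqrt{78{,}272} = 280 \\
s(\hat{\beta}_2) &= \sqrt{2{,}352.0} = 48.5 \\
s(\hat{\beta}_3) &= \sqrt{.04129} = .2032.
\end{aligned}
\tag{5.6}
$$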
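The step from the variance–covariance matrix to standard errors and t-ratios can be sketched in a few lines. The coefficient values and diagonal elements are those quoted in this section; the list names and the printed layout are illustrative only.

```python
import math

# Estimated coefficients (intercept, pH, K) as quoted in this section.
beta_hat = [-506.9774, 412.0392, -0.4871]

# Diagonal elements of the estimated variance-covariance matrix s^2(beta-hat).
variances = [78_272.0, 2_352.0, 0.04129]

# Standard errors are the square roots of the diagonal elements.
std_errors = [math.sqrt(v) for v in variances]

# t-ratio for testing each coefficient against zero.
t_ratios = [b / se for b, se in zip(beta_hat, std_errors)]

T_CRIT = 2.018  # t(.05/2, 42), taken from the text rather than computed

for name, b, se, t in zip(["intercept", "pH", "K"], beta_hat, std_errors, t_ratios):
    verdict = "significant" if abs(t) > T_CRIT else "not significant"
    print(f"{name:9s} estimate={b:10.4f}  s.e.={se:9.4f}  t={t:6.2f}  {verdict}")
```

The t-ratios reproduce the entries in Table 5.6 for pH and K and the value t = −1.81 quoted for the intercept.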
The regression coefficients for pH and K are significantly different from zero as shown by the t-test (Table 5.6). The critical value of Student's t is $t_{(.05/2,42)} = 2.018$. (The intercept $\hat{\beta}_0 = -507.0$ is not significantly different from zero, $t = -1.81$, and if one had reason to believe that $\beta_0$ should be zero the intercept could be dropped from the model.)
The univariate 95% confidence interval estimates of the regression coefficients (Section 4.6.1), $\hat{\beta}_j \pm t_{(.05/2,42)}\,s(\hat{\beta}_j)$, are

$$
\begin{aligned}
-1{,}072 &< \beta_0 < 58 \\
314 &< \beta_2 < 510 \\
-.898 &< \beta_3 < -.077.
\end{aligned}
$$
The value of Student's t for these intervals is $t_{(.05/2,42)} = 2.018$. The confidence coefficient of .95 applies to each interval statement.
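The univariate intervals follow directly from the estimates, their standard errors, and the critical value. A minimal sketch, with a hypothetical helper `confidence_interval` and the rounded standard errors from equation 5.6:

```python
def confidence_interval(estimate, std_error, t_crit):
    """Return (lower, upper) for estimate +/- t_crit * std_error."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width

T_CRIT = 2.018  # t(.05/2, 42), as given in the text

# (estimate, standard error) pairs quoted in this section.
coefficients = {
    "beta0": (-506.9774, 280.0),
    "beta2": (412.0392, 48.5),
    "beta3": (-0.4871, 0.2032),
}

intervals = {
    name: confidence_interval(b, se, T_CRIT)
    for name, (b, se) in coefficients.items()
}

for name, (lower, upper) in intervals.items():
    print(f"{lower:10.3f} < {name} < {upper:10.3f}")
```

Because the inputs are rounded, the limits agree with the printed intervals only to about the last digit shown.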
The Bonferroni confidence intervals (Section 4.6.2), using a joint confidence coefficient of .95, are

$$
\begin{aligned}
-1{,}206 &< \beta_0 < 192 \\
291 &< \beta_2 < 533 \\
-.995 &< \beta_3 < .021.
\end{aligned}
$$
The joint confidence of $1 - \alpha$ is obtained by using the value of Student's t for $\alpha^* = \alpha/2p$: $t_{(.05/(2 \times 3),42)} = 2.50$.
The Bonferroni intervals are necessarily wider than the univariate confidence intervals to allow for the fact that the confidence coefficient of .95 applies to the statement that all three intervals contain their true regression coefficients. In this example, the Bonferroni interval for $\beta_3$ overlaps zero whereas the univariate 95% confidence interval did not.
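The effect of the larger Bonferroni critical value can be seen by computing both sets of intervals side by side. This is a sketch using the rounded estimates and standard errors from this section; the helper names are hypothetical.

```python
T_UNIVARIATE = 2.018  # t(.05/2, 42), from the text
T_BONFERRONI = 2.50   # t(.05/(2*3), 42), from the text

# (estimate, standard error) pairs quoted in this section.
coefficients = {
    "beta0": (-506.9774, 280.0),
    "beta2": (412.0392, 48.5),
    "beta3": (-0.4871, 0.2032),
}

def interval(estimate, std_error, t_crit):
    """Return (lower, upper) for estimate +/- t_crit * std_error."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width

def contains_zero(bounds):
    """True when the interval strictly brackets zero."""
    lower, upper = bounds
    return lower < 0.0 < upper

for name, (b, se) in coefficients.items():
    uni = interval(b, se, T_UNIVARIATE)
    bon = interval(b, se, T_BONFERRONI)
    print(f"{name}: univariate ({uni[0]:.3f}, {uni[1]:.3f}) "
          f"covers zero: {contains_zero(uni)}; "
          f"Bonferroni ({bon[0]:.3f}, {bon[1]:.3f}) "
          f"covers zero: {contains_zero(bon)}")
```

Only the critical value changes between the two sets, so every Bonferroni interval is wider by the same factor, 2.50/2.018.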
The 95% joint confidence region for the three regression coefficients is determined from the quadratic inequality shown in equation 4.60 (Section 4.6.3). This three-dimensional 95% confidence ellipsoid is shown in Figure 5.1 for the Linthurst data. The outer box in Figure 5.1 is the Scheffé 95% confidence region. The inner box in the figure is the Bonferroni confidence region.
The ellipsoid in Figure 5.1 has been constructed using 19 cross-sectional planes in each of the three dimensions. The cross-sectional slices were chosen equally spaced and such that the most extreme in each direction coincided with a side of the Bonferroni box. These extreme slices and areas of the ellipsoid that extend beyond have been darkened to clearly show the portions of the joint confidence ellipsoid that extend beyond the Bonferroni box. Although the ellipsoid extends beyond the Bonferroni box in several areas, it is clear that the ellipsoid takes less volume of the parameter space to ensure 95% confidence in this example.
The sides of the Scheffé box (Figure 5.1) are tangent to the confidence ellipsoid and, consequently, the Scheffé box completely contains the ellipsoid. It can be shown in this particular example that the volume of the Bonferroni box is approximately 63% of the volume of the Scheffé box.
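The 63% figure can be checked with a short calculation, under two assumptions that are not stated in this section: that the Scheffé half-widths take the usual form $[pF_{(\alpha,p,n-p)}]^{1/2}s(\hat{\beta}_j)$, and that $F_{(.05,3,42)} \approx 2.83$ (a table value supplied here, not taken from the text). Both boxes are centered at $\hat{\beta}$ with sides proportional to the same standard errors, so the volume ratio depends only on the two multipliers.

```python
import math

P = 3                 # number of parameters in the final model
T_BONFERRONI = 2.50   # t(.05/(2*3), 42), from the text
F_CRIT = 2.83         # ASSUMED: approximate F(.05, 3, 42), not given in the text

# Assumed Scheffe multiplier: sqrt(p * F(alpha, p, n - p)).
scheffe_multiplier = math.sqrt(P * F_CRIT)

# Each side of a box is 2 * multiplier * s(beta_j); the standard errors
# cancel in the ratio, leaving the ratio of multipliers raised to the power p.
volume_ratio = (T_BONFERRONI / scheffe_multiplier) ** P

print(f"Scheffe multiplier: {scheffe_multiplier:.3f}")
print(f"Bonferroni / Scheffe volume: {volume_ratio:.2f}")
```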
To more clearly show the shape of the joint confidence ellipsoid, the slices created by two sides of the Bonferroni box and the midplane in one dimension have been projected onto the floor in Figure 5.2. The slices show that the ellipsoid is very flattened in one dimension and clearly illustrate the strong interdependence among the regression coefficients as to what constitutes “acceptable” values of the parameters. Also inscribed on the floor is the two-dimensional 95% confidence ellipse calculated from the 2×2 variance–covariance matrix of $\hat{\beta}_2$ and $\hat{\beta}_3$ ignoring $\hat{\beta}_0$. This shows that the two-dimensional confidence ellipse is not a projection of the three-dimensional confidence ellipsoid.
FIGURE 5.1. Three-dimensional 95% joint confidence region (ellipsoid) for $\beta_0$, $\beta_2$, and $\beta_3$. The inner box is the intersection of the Bonferroni confidence intervals and the outer box is the intersection of the Scheffé confidence intervals.

FIGURE 5.2. Three-dimensional 95% joint confidence region for $\beta_0$, $\beta_2$, and $\beta_3$ showing projections of three 2-dimensional slices, corresponding to three values of $\beta_0$, onto the floor. The three values of $\beta_0$ chosen to define the slices were the midpoint and the limits of the 95% Bonferroni confidence interval for $\beta_0$.

The general shape of the confidence region can be seen from the three-dimensional figure. However, it is very difficult to read the parameter values corresponding to any particular point in the figure. Furthermore, the joint confidence ellipsoid for more than three parameters cannot be pictured.
A more useful presentation of the joint confidence region is obtained by plotting two-dimensional “slices” through the ellipsoid for pairs of parameters of particular interest. This is done by evaluating the joint confidence equation at specific values of the other parameters. Three such two-dimensional ellipses for $\beta_2$ and $\beta_3$ are those shown in Figure 5.2. These slices help picture the three-dimensional ellipsoid but they are not to be interpreted individually as joint confidence regions for $\beta_2$ and $\beta_3$.
Alternatively, one can determine the two-dimensional 95% joint confidence region for $\beta_2$ and $\beta_3$ ignoring $\beta_0$. This region is also shown in Figure 5.2 as the larger ellipse on the floor of the figure. In this case, $\hat{\beta}_2$ and $\hat{\beta}_3$ are only slightly negatively correlated so that the two-dimensional joint confidence region is only slightly elliptical. The very elliptical slices from the original joint confidence region show that the choices of $\beta_2$ and $\beta_3$ for a given value of $\beta_0$ are more restricted than the two-dimensional joint confidence region would lead one to believe. This illustrates the information obscured by confidence intervals or regions that do not take into account the joint distribution of the full set of parameter estimates.
Two-dimensional slices through the joint confidence region in another direction, for given values of $\beta_2$, and the two-dimensional confidence region for $\beta_0$ and $\beta_3$ ignoring $\beta_2$ are shown in Figure 5.3. The strong negative correlation between $\hat{\beta}_0$ and $\hat{\beta}_3$ is evident in the two-dimensional joint confidence region and the slices from the three-dimensional region. Again, it is clear that reasonable combinations of $\beta_0$ and $\beta_3$ are dependent on the assumed value of $\beta_2$, a result that is not evident from the two-dimensional joint confidence region ignoring $\beta_2$.
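The correlations referred to above can be recovered from the estimated variance–covariance matrix by dividing each covariance by the product of the two standard errors. The sketch below uses the rounded matrix printed earlier in this section; the function name is hypothetical.

```python
import math

# Estimated variance-covariance matrix s^2(beta-hat), rows and columns
# ordered (beta0, beta2, beta3), as printed in this section.
cov = [
    [78_272.0, -10_673.0, -32.0656],
    [-10_673.0, 2_352.0, -0.18950],
    [-32.0656, -0.18950, 0.04129],
]

def correlation(matrix, i, j):
    """Correlation between estimates i and j implied by a covariance matrix."""
    return matrix[i][j] / math.sqrt(matrix[i][i] * matrix[j][j])

corr_b2_b3 = correlation(cov, 1, 2)  # pH and K coefficients
corr_b0_b3 = correlation(cov, 0, 2)  # intercept and K coefficient
corr_b0_b2 = correlation(cov, 0, 1)  # intercept and pH coefficient

print(f"corr(b2, b3) = {corr_b2_b3:.3f}")
print(f"corr(b0, b3) = {corr_b0_b3:.3f}")
print(f"corr(b0, b2) = {corr_b0_b2:.3f}")
```

The near-zero value for the pH and K coefficients matches the nearly circular ellipse on the floor of Figure 5.2, while the larger negative values involving the intercept correspond to the tilted regions in Figure 5.3.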
$\hat{Y}$ and $e$ for this example are not given. They are easily computed as shown in Table 5.2. Likewise, $s^2(\hat{Y}) = Ps^2$ and $s^2(e) = (I - P)s^2$ are not given; each is a 45×45 matrix. Computation of $\hat{Y}_i$ and its variance is illustrated using the first data point. Each $\hat{Y}_i$ is computed using the corresponding row vector from $X$, which is designated $x_i'$. For the first observation,
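$$
x_1' = (\, 1 \;\; 5.00 \;\; 1{,}441.67 \,).
$$

Thus,

$$
\hat{Y}_1 = x_1'\hat{\beta}
= (\, 1 \;\; 5.00 \;\; 1{,}441.67 \,)
\begin{pmatrix} -506.9774 \\ 412.0392 \\ -.4871 \end{pmatrix}
= 850.99.
$$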
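The fitted value is simply the inner product of the row of $X$ with the coefficient vector. A minimal sketch with the values printed above (the helper `dot` is illustrative):

```python
# Estimated coefficients (intercept, pH, K) as printed in this section.
beta_hat = [-506.9774, 412.0392, -0.4871]

# First row of X: constant term, pH = 5.00, K = 1,441.67 ppm.
x1 = [1.0, 5.00, 1441.67]

def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

y1_hat = dot(x1, beta_hat)
print(f"Fitted BIOMASS for observation 1: {y1_hat:.2f}")
```

With the coefficients rounded as printed, the result agrees with 850.99 to within about 0.01.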
FIGURE 5.3. Two-dimensional slices of the joint confidence region for three values of $\beta_2$ and the joint confidence region for $\beta_0$ and $\beta_3$ ignoring $\beta_2$ (shown in dashed line). The arrows indicate the limits of the intersection of the Bonferroni confidence intervals for $\beta_0$ and $\beta_3$.

The variance of $\hat{Y}_1$, used as an estimate of the mean aerial BIOMASS at this specific level of pH ($X_2$) and K ($X_3$), is $s^2(\hat{Y}_1) = v_{11}s^2$, where $v_{11}$ is the first diagonal element from $P$. The $i$th diagonal element of $P$ can be obtained individually as $v_{ii} = x_i'(X'X)^{-1}x_i$. Or, the variance for any one $\hat{Y}_i$ is obtained as the variance of a linear function of $\hat{\beta}$. Thus,
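$$
s^2(\hat{Y}_1) = x_1'[s^2(\hat{\beta})]x_1
= (\, 1 \;\; 5.00 \;\; 1{,}441.67 \,)
\begin{bmatrix}
78{,}272 & -10{,}673 & -32.0656 \\
-10{,}673 & 2{,}352.0 & -.18950 \\
-32.0656 & -.18950 & .04129
\end{bmatrix}
\begin{pmatrix} 1 \\ 5.00 \\ 1{,}441.67 \end{pmatrix}
= 20{,}978.78.
$$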
Its standard error is

$$
s(\hat{Y}_1) = \sqrt{20{,}978.78} = 144.8.
$$
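The quadratic form above can be evaluated directly. The sketch below uses the rounded variance–covariance matrix printed in this section, so the variance it produces differs slightly from 20,978.78 (which presumably reflects unrounded intermediate values), while the standard error agrees to the precision shown. The function name is hypothetical.

```python
import math

# Estimated variance-covariance matrix s^2(beta-hat), as printed (rounded).
cov = [
    [78_272.0, -10_673.0, -32.0656],
    [-10_673.0, 2_352.0, -0.18950],
    [-32.0656, -0.18950, 0.04129],
]

# First row of X: constant term, pH, K.
x1 = [1.0, 5.00, 1441.67]

def quadratic_form(x, matrix):
    """Evaluate x' M x for a vector x and a square matrix M."""
    mx = [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in matrix]
    return sum(x_i * mx_i for x_i, mx_i in zip(x, mx))

var_fit = quadratic_form(x1, cov)
se_fit = math.sqrt(var_fit)

print(f"s^2(Y1-hat) = {var_fit:,.0f}")
print(f"s(Y1-hat)   = {se_fit:.1f}")
```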
If $\hat{Y}_1$ is used as a prediction of a future observation $Y_0$ at the specified level $x_1'$, then the variance of the prediction error is the variance of $\hat{Y}_1$ increased by $s^2 = 160{,}865$. This accounts for the variability of the random variable being predicted. This gives

$$
s^2(\hat{Y}_{\text{pred}_1}) = s^2(Y_0 - \hat{Y}_1)
= 20{,}979 + 160{,}865 = 181{,}844
$$

or the standard error of prediction is

$$
s(\hat{Y}_{\text{pred}_1}) = \sqrt{181{,}844} = 426.4.
$$
The residual for the first observation is

$$
e_1 = Y_1 - \hat{Y}_1 = 676 - 850.99 = -174.99.
$$
The estimated variance of $e_1$ is

$$
s^2(e_1) = (1 - v_{11})s^2.
$$

Since $s^2(\hat{Y}_1) = v_{11}s^2$ has already been computed, $s^2(e_1)$ is easily obtained as

$$
s^2(e_1) = s^2 - s^2(\hat{Y}_1)
= 160{,}865 - 20{,}979 = 139{,}886.
$$

The standard error is

$$
s(e_1) = \sqrt{139{,}886} = 374.0.
$$
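The two expressions for the residual variance are equivalent, which a short calculation makes explicit. The sketch backs the leverage $v_{11}$ out of the quantities already reported; the variable names are illustrative.

```python
import math

s2 = 160_865.0        # residual mean square from Table 5.6
var_fit = 20_978.78   # s^2(Y1-hat), as reported in the text
y1_observed = 676.0   # observed BIOMASS for the first observation
y1_hat = 850.99       # fitted value, as reported in the text

# Residual for the first observation.
e1 = y1_observed - y1_hat

# Leverage: first diagonal element of P, recovered from s^2(Y1-hat) = v11 * s^2.
v11 = var_fit / s2

# Both forms of the residual variance.
var_resid_from_leverage = (1.0 - v11) * s2
var_resid_by_difference = s2 - var_fit

se_resid = math.sqrt(var_resid_by_difference)

print(f"e1 = {e1:.2f}, v11 = {v11:.4f}")
print(f"s^2(e1) = {var_resid_by_difference:,.0f}, s(e1) = {se_resid:.1f}")
```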
These variances are used to compute confidence interval estimates for each of the corresponding parameters. Student's t has 42 degrees of freedom, the degrees of freedom in the estimate of $\sigma^2$. For illustration, the 95% confidence interval estimate of the mean BIOMASS production when pH = 5.00 and K = 1,441.67 ppm, $E(Y_1)$, is

$$
\hat{Y}_1 \pm t_{(.05/2,42)}\,s(\hat{Y}_1)
$$

or

$$
850.99 \pm (2.018)(144.8),
$$

which becomes

$$
558.7 < E(Y_1) < 1{,}143.3.
$$

These results indicate that, with 95% confidence, the true mean BIOMASS for pH = 5.00 and K = 1,441.67 is between 559 and 1,143 $\text{gm}^{-2}$.
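The interval for the mean can be reproduced from the fitted value and its variance. A minimal sketch using the values reported above; the variable names are illustrative.

```python
import math

T_CRIT = 2.018        # t(.05/2, 42), from the text
y1_hat = 850.99       # fitted value at pH = 5.00, K = 1,441.67
var_fit = 20_978.78   # s^2(Y1-hat), as reported in the text

# Half-width of the 95% confidence interval on the mean E(Y1).
half_width = T_CRIT * math.sqrt(var_fit)
mean_lower = y1_hat - half_width
mean_upper = y1_hat + half_width

print(f"{mean_lower:.1f} < E(Y1) < {mean_upper:.1f}")
```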
If we wish to predict the BIOMASS production $Y_0$ at $x_0 = x_1$ (pH = 5.00 and K = 1,441.67), then a 95% prediction interval for $Y_0$ is given by

$$
\hat{Y}_1 \pm t_{(.025,42)}\,s(Y_0 - \hat{Y}_1),
$$

which gives

$$
-9.60 < Y_0 < 1{,}711.5.
$$
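The prediction interval uses the same center but the larger prediction variance, $s^2(\hat{Y}_1) + s^2$. The sketch below uses the values reported in this section; because those inputs are rounded, the lower limit differs from the printed −9.60 in the second decimal place.

```python
import math

T_CRIT = 2.018        # t(.025, 42), from the text
y1_hat = 850.99       # fitted value at pH = 5.00, K = 1,441.67
var_fit = 20_978.78   # s^2(Y1-hat), as reported in the text
s2 = 160_865.0        # residual mean square from Table 5.6

# Variance of the prediction error Y0 - Y1-hat.
var_pred = var_fit + s2
se_pred = math.sqrt(var_pred)

pred_lower = y1_hat - T_CRIT * se_pred
pred_upper = y1_hat + T_CRIT * se_pred

print(f"s(Y0 - Y1-hat) = {se_pred:.1f}")
print(f"{pred_lower:.1f} < Y0 < {pred_upper:.1f}")
```

The negative lower limit is a reminder that the interval comes from a linear model with normal errors and is not constrained to the physically possible range of BIOMASS.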