
5.4 Results of the Final Model

170 5. CASE STUDY: FIVE INDEPENDENT VARIABLES

TABLE 5.6. Results of the regression of BIOMASS on the two independent variables pH and K (Linthurst data).

Variable      β̂j        s(β̂j)      t        Partial SS
pH            412.04     48.50      8.50     11,611,782
K              −0.487     0.203    −2.40        924,266

Analysis of variance

Source        d.f.    Sum of Squares    Mean Square
Total          44       19,170,963
Regression      2       12,414,654       6,207,327
Residual       42        6,756,309         160,865

very important variable may appear as insignificant if the model contains a correlated variable and, conversely, an otherwise unimportant variable may take on false significance.

The contribution of SALINITY in the three-variable model is even smaller than it was before Zn was dropped and is far from being significant. The next step is to drop SALINITY from the model. In this particular example, one would not have been misled by eliminating both SALINITY and Zn at the previous step. This is not true in general.

The two-variable model containing pH and K gives the results in Table 5.6. Since the partial sums of squares for both pH and K are significant, the simplification of the model will stop with this two-variable model. The degree to which the linear model consisting of the two variables pH and K accounts for the variability in BIOMASS is R² = .65, only slightly smaller than the R² = .68 obtained with the original five-variable model.
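The figures quoted from Table 5.6 can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' computation: it recomputes R² and the residual mean square from the analysis of variance, and compares each partial sum of squares (1 d.f.) divided by s² with the square of the tabled t statistic. The variable names are illustrative.

```python
# Sums of squares and degrees of freedom copied from Table 5.6.
ss_total, df_total = 19_170_963, 44
ss_regr, df_regr = 12_414_654, 2
ss_res, df_res = 6_756_309, 42

# Coefficient of determination and residual mean square (the estimate s^2).
r_squared = ss_regr / ss_total
ms_res = ss_res / df_res

# Each partial sum of squares has 1 d.f., so F = partial SS / s^2,
# which should agree with the square of the corresponding t statistic.
partial_ss = {"pH": 11_611_782, "K": 924_266}
t_stats = {"pH": 8.50, "K": -2.40}
f_stats = {name: ss / ms_res for name, ss in partial_ss.items()}

print(f"R^2 = {r_squared:.3f}, s^2 = {ms_res:,.1f}")
for name, f_value in f_stats.items():
    print(f"{name}: F = {f_value:.2f}, t^2 = {t_stats[name] ** 2:.2f}")
```

Small differences between F and t² are expected because the tabled t values are rounded to two decimals.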

corrected sum of squares of BIOMASS. The square root of R² is the simple correlation between BIOMASS and $\hat{Y}$:

$$
r(Y, \hat{Y}) = \sqrt{.65} = .80.
$$

The estimate of σ² from this final model is s² = 160,865 with (n − p) = 42 degrees of freedom. The variance–covariance matrix for the regression coefficients is

$$
s^2(\hat{\beta}) = (X'X)^{-1}s^2
= \begin{bmatrix}
.4865711 & -.0663498 & -.0001993 \\
-.0663498 & .0146211 & -.0000012 \\
-.0001993 & -.0000012 & .00000026
\end{bmatrix}(160{,}865)
= \begin{bmatrix}
78{,}272 & -10{,}673 & -32.0656 \\
-10{,}673 & 2{,}352.0 & -0.18950 \\
-32.0656 & -0.18950 & 0.04129
\end{bmatrix}.
$$

The square roots of the diagonal elements give the standard errors of the estimated regression coefficients in the order in which they are listed in $\hat{\beta}$. In this model,

$$
\hat{\beta}' = (\hat{\beta}_0 \;\; \hat{\beta}_2 \;\; \hat{\beta}_3).
$$

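Thus, the standard errors of the estimated regression coefficients are

$$
\begin{aligned}
s(\hat{\beta}_0) &= \sqrt{78{,}272} = 280 \\
s(\hat{\beta}_2) &= \sqrt{2{,}352.0} = 48.5 \\
s(\hat{\beta}_3) &= \sqrt{.04129} = .2032.
\end{aligned}
\tag{5.6}
$$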
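The scaling of (X′X)⁻¹ by s² and the extraction of standard errors can be reproduced as follows. This is a small sketch that uses only the digits printed above; because the smallest entries of (X′X)⁻¹ are printed with one or two significant digits, the result for the third coefficient agrees with the text only approximately.

```python
import math

# (X'X)^{-1} for the model with intercept, pH, and K, as printed in the text.
xtx_inv = [
    [0.4865711, -0.0663498, -0.0001993],
    [-0.0663498, 0.0146211, -0.0000012],
    [-0.0001993, -0.0000012, 0.00000026],
]
s2 = 160_865  # residual mean square with 42 d.f.

# s^2(beta_hat) = (X'X)^{-1} s^2, element by element.
cov_beta = [[entry * s2 for entry in row] for row in xtx_inv]

# Standard errors are the square roots of the diagonal elements.
std_errors = [math.sqrt(cov_beta[j][j]) for j in range(3)]

for label, se in zip(["intercept", "pH", "K"], std_errors):
    print(f"standard error for {label}: {se:.4g}")
```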

The regression coefficients for pH and K are significantly different from zero as shown by the t-test (Table 5.6). The critical value of Student's t is $t_{(.05/2,\,42)} = 2.018$. (The intercept $\hat{\beta}_0 = -507.0$ is not significantly different from zero, t = −1.81, and if one had reason to believe that β0 should be zero the intercept could be dropped from the model.)

The univariate 95% confidence interval estimates of the regression coefficients (Section 4.6.1),

$$
\hat{\beta}_j \pm t_{(.05/2,\,42)}\, s(\hat{\beta}_j),
$$

are

$$
\begin{aligned}
-1{,}072 &< \beta_0 < 58 \\
314 &< \beta_2 < 510 \\
-.898 &< \beta_3 < -.077.
\end{aligned}
$$

The value of Student's t for these intervals is $t_{(.05/2,\,42)} = 2.018$. The confidence coefficient of .95 applies to each interval statement.


The Bonferroni confidence intervals (Section 4.6.2), using a joint confidence coefficient of .95, are

$$
\begin{aligned}
-1{,}206 &< \beta_0 < 192 \\
291 &< \beta_2 < 533 \\
-.995 &< \beta_3 < .021.
\end{aligned}
$$

The joint confidence of 1 − α is obtained by using the value of Student's t for α* = α/2p: $t_{(.05/(2\times 3),\,42)} = 2.50$.

The Bonferroni intervals are necessarily wider than the univariate confidence intervals to allow for the fact that the confidence coefficient of .95 applies to the statement that all three intervals contain their true regression coefficients. In this example, the Bonferroni interval for β3 overlaps zero whereas the univariate 95% confidence interval did not.
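The two sets of intervals differ only in the multiplier applied to each standard error. A minimal sketch of the comparison, using the rounded estimates, standard errors, and t values quoted in the text (the helper `interval` is illustrative, and limits may differ from the printed ones in the last digit because of rounding):

```python
# Estimates and standard errors for (beta_0, beta_2, beta_3) from the text.
estimates = {"beta_0": -507.0, "beta_2": 412.04, "beta_3": -0.487}
std_errors = {"beta_0": 280.0, "beta_2": 48.5, "beta_3": 0.2032}

T_UNIVARIATE = 2.018  # t_(.05/2, 42), quoted in the text
T_BONFERRONI = 2.50   # t_(.05/(2*3), 42), quoted in the text


def interval(name, t_value):
    """Return (lower, upper) for estimate +/- t * standard error."""
    half_width = t_value * std_errors[name]
    return estimates[name] - half_width, estimates[name] + half_width


for name in estimates:
    uni = interval(name, T_UNIVARIATE)
    bon = interval(name, T_BONFERRONI)
    print(f"{name}: univariate ({uni[0]:.3f}, {uni[1]:.3f}), "
          f"Bonferroni ({bon[0]:.3f}, {bon[1]:.3f}), "
          f"Bonferroni covers zero: {bon[0] < 0 < bon[1]}")
```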

The 95% joint confidence region for the three regression coefficients is determined from the quadratic inequality shown in equation 4.60 (Section 4.6.3). This three-dimensional 95% confidence ellipsoid is shown in Figure 5.1 for the Linthurst data. The outer box in Figure 5.1 is the Scheffé 95% confidence region. The inner box in the figure is the Bonferroni confidence region.
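Equation 4.60 is not reproduced in this section. The sketch below assumes it has the usual form, (β̂ − β)′(X′X)(β̂ − β) ≤ p s² F, which is equivalent to (β̂ − β)′[s²(β̂)]⁻¹(β̂ − β) ≤ p F. The critical value F(.05; 3, 42) ≈ 2.83 is an assumption taken from standard tables, not from the text, and the helper names are illustrative. The check shows that a point can satisfy all three Bonferroni limits and still fall outside the ellipsoid.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve m @ x = rhs for a 3x3 system by Cramer's rule."""
    d = det3(m)
    solution = []
    for col in range(3):
        replaced = [row[:] for row in m]
        for i in range(3):
            replaced[i][col] = rhs[i]
        solution.append(det3(replaced) / d)
    return solution


# Estimated variance-covariance matrix s^2(beta_hat) from the text.
COV_BETA = [
    [78_272, -10_673, -32.0656],
    [-10_673, 2_352.0, -0.18950],
    [-32.0656, -0.18950, 0.04129],
]
BETA_HAT = [-506.9774, 412.0392, -0.4871]
P_PARAMS = 3
F_CRIT = 2.83  # ASSUMPTION: approximate table value of F(.05; 3, 42)


def in_joint_region(beta):
    """True if d' [s^2(beta_hat)]^{-1} d <= p * F, with d = beta - beta_hat."""
    d = [b - bh for b, bh in zip(beta, BETA_HAT)]
    q = sum(di * wi for di, wi in zip(d, solve3(COV_BETA, d)))
    return q <= P_PARAMS * F_CRIT


# The point estimate is the center of the ellipsoid.
print(in_joint_region(BETA_HAT))
# A corner of the Bonferroni box: all three lower limits at once.
print(in_joint_region([-1206, 291, -0.995]))
```

The corner is rejected because the estimates of β0 and β2 are negatively correlated, so values far below both estimates at once are jointly implausible.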

The ellipsoid in Figure 5.1 has been constructed using 19 cross-sectional planes in each of the three dimensions. The cross-sectional slices were chosen equally spaced and such that the most extreme in each direction coincided with a side of the Bonferroni box. These extreme slices and areas of the ellipsoid that extend beyond have been darkened to clearly show the portions of the joint confidence ellipsoid that extend beyond the Bonferroni box. Although the ellipsoid extends beyond the Bonferroni box in several areas, it is clear that the ellipsoid takes less volume of the parameter space to ensure 95% confidence in this example.

The sides of the Scheffé box (Figure 5.1) are tangent to the confidence ellipsoid and, consequently, the Scheffé box completely contains the ellipsoid. It can be shown in this particular example that the volume of the Bonferroni box is approximately 63% of the volume of the Scheffé box.
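The 63% figure can be checked from the interval multipliers alone, since both boxes have sides proportional to the same standard errors. The sketch below assumes the Scheffé half-widths use the multiplier √(p F(.05; 3, 42)) and takes F(.05; 3, 42) ≈ 2.83 from standard tables; that F value is an assumption, not a number given in the text.

```python
import math

P_PARAMS = 3
T_BONFERRONI = 2.50  # t_(.05/(2*3), 42), quoted in the text
F_CRIT = 2.83        # ASSUMPTION: approximate table value of F(.05; 3, 42)

# Scheffe half-widths are sqrt(p * F) * s(beta_hat_j); Bonferroni
# half-widths are t * s(beta_hat_j). The standard errors cancel in the ratio.
scheffe_multiplier = math.sqrt(P_PARAMS * F_CRIT)
side_ratio = T_BONFERRONI / scheffe_multiplier

# Both regions are boxes in three dimensions, so volumes scale as the cube.
volume_ratio = side_ratio ** P_PARAMS

print(f"Scheffe multiplier = {scheffe_multiplier:.3f}")
print(f"Bonferroni / Scheffe volume = {volume_ratio:.2f}")
```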

To more clearly show the shape of the joint confidence ellipsoid, the slices created by two sides of the Bonferroni box and the midplane in one dimension have been projected onto the floor in Figure 5.2. The slices show that the ellipsoid is very flattened in one dimension and clearly illustrate the strong interdependence among the regression coefficients as to what constitutes “acceptable” values of the parameters. Also inscribed on the floor is the two-dimensional 95% confidence ellipse calculated from the 2×2 variance–covariance matrix of β2 and β3 ignoring β0. This shows that the two-dimensional confidence ellipse is not a projection of the three-dimensional confidence ellipsoid.

The general shape of the confidence region can be seen from the three-dimensional figure. However, it is very difficult to read the parameter values corresponding to any particular point in the figure. Furthermore, the joint confidence ellipsoid for more than three parameters cannot be pictured.

FIGURE 5.1. Three-dimensional 95% joint confidence region (ellipsoid) for β0, β2, and β3. The intersection of the Bonferroni confidence intervals (inner box) and the intersection of the Scheffé confidence intervals (outer box).

FIGURE 5.2. Three-dimensional 95% joint confidence region for β0, β2, and β3 showing projections of three 2-dimensional slices, corresponding to three values of β0, onto the floor. The three values of β0 chosen to define the slices were the midpoint and the limits of the 95% Bonferroni confidence interval for β0.

A more useful presentation of the joint confidence region is obtained by plotting two-dimensional “slices” through the ellipsoid for pairs of parameters of particular interest. This is done by evaluating the joint confidence equation at specific values of the other parameters. Three such two-dimensional ellipses for β2 and β3 are those shown in Figure 5.2. These slices help picture the three-dimensional ellipsoid but they are not to be interpreted individually as joint confidence regions for β2 and β3.

Alternatively, one can determine the two-dimensional 95% joint confidence region for β2 and β3 ignoring β0. This region is also shown in Figure 5.2 as the larger ellipse on the floor of the figure. In this case, the estimates of β2 and β3 are only slightly negatively correlated so that the two-dimensional joint confidence region is only slightly elliptical. The very elliptical slices from the original joint confidence region show that the choices of β2 and β3 for a given value of β0 are more restricted than the two-dimensional joint confidence region would lead one to believe. This illustrates the information obscured by confidence intervals or regions that do not take into account the joint distribution of the full set of parameter estimates.
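The contrast between the nearly circular region that ignores β0 and the strongly elliptical slices at fixed β0 can be quantified from the variance–covariance matrix given earlier. The sketch below assumes the standard partitioned-covariance identity: the shape of a slice at fixed β0 is governed by the conditional covariance of the other two estimates given the first, while the region ignoring β0 is governed by their marginal covariance. Variable names are illustrative.

```python
import math

# Entries of s^2(beta_hat) from the text, indexed by parameter.
var0, var2, var3 = 78_272, 2_352.0, 0.04129
cov02, cov03, cov23 = -10_673, -32.0656, -0.18950

# Ignoring beta_0: the ellipse shape follows the marginal correlation.
marginal_corr = cov23 / math.sqrt(var2 * var3)

# Fixing beta_0: the slice shape follows the conditional covariance
# of the estimates of (beta_2, beta_3) given the estimate of beta_0.
cond_var2 = var2 - cov02 ** 2 / var0
cond_var3 = var3 - cov03 ** 2 / var0
cond_cov23 = cov23 - cov02 * cov03 / var0
conditional_corr = cond_cov23 / math.sqrt(cond_var2 * cond_var3)

print(f"correlation ignoring beta_0:  {marginal_corr:.3f}")
print(f"correlation for fixed beta_0: {conditional_corr:.3f}")
```

A marginal correlation near zero alongside a conditional correlation near −1 is consistent with the slightly elliptical floor region and the very elliptical slices described above.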

Two-dimensional slices through the joint confidence region in another direction, for given values of β2, and the two-dimensional confidence region for β0 and β3 ignoring β2 are shown in Figure 5.3. The strong negative correlation between the estimates of β0 and β3 is evident in the two-dimensional joint confidence region and the slices from the three-dimensional region. Again, it is clear that reasonable combinations of β0 and β3 are dependent on the assumed value of β2, a result that is not evident from the two-dimensional joint confidence region ignoring β2.

$\hat{Y}$ and e for this example are not given. They are easily computed as shown in Table 5.2. Likewise, $s^2(\hat{Y}) = Ps^2$ and $s^2(e) = (I - P)s^2$ are not given; each is a 45 × 45 matrix. Computation of $\hat{Y}_i$ and its variance is illustrated using the first data point. Each $\hat{Y}_i$ is computed using the corresponding row vector from X, which is designated $x_i'$. For the first observation, $x_1' = (\,1 \;\; 5.00 \;\; 1{,}441.67\,)$. Thus,

$$
\hat{Y}_1 = x_1'\hat{\beta}
= \begin{pmatrix} 1 & 5.00 & 1{,}441.67 \end{pmatrix}
\begin{pmatrix} -506.9774 \\ 412.0392 \\ -.4871 \end{pmatrix}
= 850.99.
$$

FIGURE 5.3. Two-dimensional slices of the joint confidence region for three values of β2 and the joint confidence region for β0 and β3 ignoring β2 (shown in dashed line). The arrows indicate the limits of the intersection of the Bonferroni confidence intervals for β0 and β3.

The variance of $\hat{Y}_1$, used as an estimate of the mean aerial BIOMASS at this specific level of pH (X2) and K (X3), is $s^2(\hat{Y}_1) = v_{11}s^2$, where $v_{11}$ is the first diagonal element from P. The ith diagonal element of P can be obtained individually as $v_{ii} = x_i'(X'X)^{-1}x_i$. Or, the variance for any one $\hat{Y}_i$ is obtained as the variance of a linear function of $\hat{\beta}$. Thus,

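$$
\begin{aligned}
s^2(\hat{Y}_1) &= x_1'[s^2(\hat{\beta})]x_1 \\
&= \begin{pmatrix} 1 & 5.00 & 1{,}441.67 \end{pmatrix}
\begin{bmatrix}
78{,}272 & -10{,}673 & -32.0656 \\
-10{,}673 & 2{,}352.0 & -.18950 \\
-32.0656 & -.18950 & .04129
\end{bmatrix}
\begin{pmatrix} 1 \\ 5.00 \\ 1{,}441.67 \end{pmatrix} \\
&= 20{,}978.78.
\end{aligned}
$$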

Its standard error is

$$
s(\hat{Y}_1) = \sqrt{20{,}978.78} = 144.8.
$$
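The quadratic form can be evaluated directly from the printed variance–covariance matrix. The sketch below is illustrative only; because the matrix entries are rounded, the computed variance differs slightly from the 20,978.78 reported in the text, although the standard error agrees to the precision shown.

```python
import math

# Estimated variance-covariance matrix s^2(beta_hat) from the text.
COV_BETA = [
    [78_272, -10_673, -32.0656],
    [-10_673, 2_352.0, -0.18950],
    [-32.0656, -0.18950, 0.04129],
]
x1 = [1.0, 5.00, 1441.67]  # intercept, pH, K for the first observation


def quadratic_form(x, matrix):
    """Return x' M x for a vector x and a square matrix M (nested lists)."""
    n = len(x)
    return sum(x[i] * matrix[i][j] * x[j] for i in range(n) for j in range(n))


var_fit = quadratic_form(x1, COV_BETA)
se_fit = math.sqrt(var_fit)
print(f"s^2(Y1_hat) = {var_fit:,.1f}, s(Y1_hat) = {se_fit:.1f}")
```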

If $\hat{Y}_1$ is used as a prediction of a future observation $Y_0$ at the specified level $x_1'$, then the variance of the prediction error is the variance of $\hat{Y}_1$ increased by s² = 160,865. This accounts for the variability of the random variable being predicted. This gives

$$
s^2(\hat{Y}_{\text{pred}_1}) = s^2(Y_0 - \hat{Y}_1) = 20{,}978.78 + 160{,}865 = 181{,}843.78
$$

or the standard error of prediction is

$$
s(\hat{Y}_{\text{pred}_1}) = \sqrt{181{,}843.78} = 426.4.
$$

The residual for the first observation is

$$
e_1 = Y_1 - \hat{Y}_1 = 676 - 850.99 = -174.99.
$$

The estimated variance of $e_1$ is

$$
s^2(e_1) = (1 - v_{11})s^2.
$$

Since $s^2(\hat{Y}_1) = v_{11}s^2$ has already been computed, $s^2(e_1)$ is easily obtained as

$$
s^2(e_1) = s^2 - s^2(\hat{Y}_1) = 160{,}865 - 20{,}979 = 139{,}886.
$$

The standard error is

$$
s(e_1) = \sqrt{139{,}886} = 374.0.
$$
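The prediction and residual variances both follow from s² and the variance of the fitted value. A short sketch using the values reported in the text; the intermediate quantity `v11` (the first diagonal element of P) is recovered here as s²(Ŷ1)/s² for illustration and is not a figure quoted by the authors.

```python
import math

S2 = 160_865          # residual mean square
VAR_FIT = 20_978.78   # s^2(Y1_hat) reported in the text
Y1_OBSERVED = 676
Y1_FITTED = 850.99

# Diagonal element of P implied by s^2(Y1_hat) = v11 * s^2.
v11 = VAR_FIT / S2

# Prediction error adds the variance of a new observation;
# the residual variance removes the variance of the fitted value.
var_pred = (1 + v11) * S2
var_resid = (1 - v11) * S2

residual = Y1_OBSERVED - Y1_FITTED
se_pred = math.sqrt(var_pred)
se_resid = math.sqrt(var_resid)

print(f"v11 = {v11:.4f}")
print(f"s(pred) = {se_pred:.1f}, s(e1) = {se_resid:.1f}, e1 = {residual:.2f}")
```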

These variances are used to compute confidence interval estimates for each of the corresponding parameters. Student's t has 42 degrees of freedom, the degrees of freedom in the estimate of σ². For illustration, the 95% confidence interval estimate of the mean BIOMASS production when pH = 5.00 and K = 1,441.67 ppm, $E(Y_1)$, is

$$
\hat{Y}_1 \pm t_{(.05/2,\,42)}\, s(\hat{Y}_1)
$$

$$
850.99 \pm (2.018)(144.8),
$$

which becomes

$$
558.7 < E(Y_1) < 1{,}143.3.
$$

These results indicate that, with 95% confidence, the true mean BIOMASS for pH = 5.00 and K = 1,441.67 is between 559 and 1,143 gm⁻².

If we wish to predict the BIOMASS production $Y_0$ at $x_0' = x_1'$ (pH = 5.00 and K = 1,441.67), then a 95% prediction interval for $Y_0$ is given by

$$
\hat{Y}_1 \pm t_{(.05/2,\,42)}\, s(Y_0 - \hat{Y}_1),
$$

which gives

$$
-9.60 < Y_0 < 1{,}711.5.
$$
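The confidence interval for the mean and the prediction interval for a new observation share the same center and t value and differ only in the standard error used. A minimal sketch with the rounded quantities from the text; the limits it produces can differ from the printed ones in the first decimal because the inputs are rounded.

```python
T_CRIT = 2.018     # t_(.05/2, 42), quoted in the text
Y1_FITTED = 850.99
SE_MEAN = 144.8    # s(Y1_hat), standard error of the estimated mean
SE_PRED = 426.4    # s(Y0 - Y1_hat), standard error of prediction


def t_interval(center, std_error, t_value=T_CRIT):
    """Return (lower, upper) for center +/- t * standard error."""
    half_width = t_value * std_error
    return center - half_width, center + half_width


mean_ci = t_interval(Y1_FITTED, SE_MEAN)
pred_pi = t_interval(Y1_FITTED, SE_PRED)

print(f"95% CI for E(Y1): ({mean_ci[0]:.1f}, {mean_ci[1]:.1f})")
print(f"95% PI for Y0:    ({pred_pi[0]:.1f}, {pred_pi[1]:.1f})")
```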

From the document: Applied Regression Analysis: A Research Tool (pages 185-192)