3.5 Properties of Regression Estimates

This result is based on the assumption that the linear model used is the correct model. If important independent variables have been omitted or if the functional form of the model is not correct, $X\beta$ will not be the expectation of $Y$. Assuming that the model is correct, the joint probability density function of $Y$ is given by

$$
(2\pi)^{-n/2}\,|I\sigma^2|^{-1/2}\,e^{-(1/2)(Y-X\beta)'(I\sigma^2)^{-1}(Y-X\beta)}
 = (2\pi)^{-n/2}\sigma^{-n}\,e^{-(1/2\sigma^2)(Y-X\beta)'(Y-X\beta)}. \tag{3.38}
$$

Expressing $\hat{\beta}$ as $\hat{\beta} = [(X'X)^{-1}X']Y$ shows that the estimates of the regression coefficients are linear functions of the dependent variable $Y$, with the coefficients being given by $A' = (X'X)^{-1}X'$. Since the $X$s are constants, the matrix $A'$ is also constant. If the model $Y = X\beta + \epsilon$ is correct, the expectation of $Y$ is $X\beta$ and the expectation of $\hat{\beta}$ is

$$
\begin{aligned}
E(\hat{\beta}) &= [(X'X)^{-1}X']E(Y) \\
 &= [(X'X)^{-1}X']X\beta \\
 &= [(X'X)^{-1}X'X]\beta \\
 &= \beta. \tag{3.39}
\end{aligned}
$$

This shows that $\hat{\beta}$ is an unbiased estimator of $\beta$ if the chosen model is correct. If the chosen model is not correct, say $E(Y) = X\beta + Z\gamma$ instead of $X\beta$, then $[(X'X)^{-1}X']E(Y)$ does not necessarily simplify to $\beta$.

Assuming that the model is correct,

$$
\begin{aligned}
\operatorname{Var}(\hat{\beta}) &= [(X'X)^{-1}X'][\operatorname{Var}(Y)][(X'X)^{-1}X']' \\
 &= [(X'X)^{-1}X']\,I\sigma^2\,[(X'X)^{-1}X']'.
\end{aligned}
$$

Recalling that the transpose of a product is the product of the transposes in reverse order [i.e., $(AB)' = B'A'$], that $X'X$ is symmetric, and that the inverse of a transpose is the transpose of the inverse, we obtain

$$
\begin{aligned}
\operatorname{Var}(\hat{\beta}) &= (X'X)^{-1}X'X(X'X)^{-1}\sigma^2 \\
 &= (X'X)^{-1}\sigma^2. \tag{3.40}
\end{aligned}
$$

Thus, the variances and covariances of the estimated regression coefficients are given by the elements of $(X'X)^{-1}$ multiplied by $\sigma^2$. The diagonal elements give the variances, in the order in which the regression coefficients are listed in $\hat{\beta}$, and the off-diagonal elements give their covariances. When $\epsilon$ is normally distributed, $\hat{\beta}$ is also multivariate normally distributed. Thus,

$$
\hat{\beta} \sim N\big(\beta,\ (X'X)^{-1}\sigma^2\big). \tag{3.41}
$$

Example 3.7. In the ozone example, Example 3.3,

$$
(X'X)^{-1} =
\begin{bmatrix}
 1.0755 & -9.4340 \\
 -9.4340 & 107.8167
\end{bmatrix}.
$$

Thus, $\operatorname{Var}(\hat{\beta}_0) = 1.0755\sigma^2$ and $\operatorname{Var}(\hat{\beta}_1) = 107.8167\sigma^2$. The covariance between $\hat{\beta}_0$ and $\hat{\beta}_1$ is $\operatorname{Cov}(\hat{\beta}_0, \hat{\beta}_1) = -9.4340\sigma^2$.
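As a concrete illustration of how (3.40) is used, the following is a minimal numpy sketch that computes $\hat{\beta}$ and the multiplier matrix $(X'X)^{-1}$ for a straight-line model. The data values are hypothetical stand-ins (the ozone data of Example 3.3 are not reproduced in this section); only the formulas come from the text.

```python
import numpy as np

# Hypothetical data for illustration only; these are NOT the ozone data
# of Example 3.3.
x = np.array([0.02, 0.07, 0.11, 0.15])        # levels of the regressor
Y = np.array([242.0, 237.0, 231.0, 201.0])    # hypothetical responses

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones_like(x), x])

XtX_inv = np.linalg.inv(X.T @ X)

# Least squares estimates: beta_hat = (X'X)^{-1} X'Y.
beta_hat = XtX_inv @ X.T @ Y

# Var(beta_hat) = (X'X)^{-1} sigma^2, equation (3.40): the diagonal elements
# (times sigma^2) are the variances, the off-diagonals the covariances.
print(beta_hat)
print(XtX_inv)    # multiply by sigma^2 to get Var(beta_hat)
```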

Recall that the vector of estimated means $\hat{Y}$ is given by $\hat{Y} = [X(X'X)^{-1}X']Y = PY$. Therefore, using $PX = X$, the expectation of $\hat{Y}$ is

$$
E(\hat{Y}) = P\,E(Y) = PX\beta = X\beta. \tag{3.42}
$$

Thus, $\hat{Y}$ is an unbiased estimator of the mean of $Y$ for the particular values of $X$ in the data set, again if the model is correct. The fact that $PX = X$ can be verified using the definition of $P$:

$$
\begin{aligned}
PX &= [X(X'X)^{-1}X']X \\
 &= X[(X'X)^{-1}(X'X)] \\
 &= X. \tag{3.43}
\end{aligned}
$$

The variance–covariance matrix of $\hat{Y}$ can be derived using either the relationship $\hat{Y} = X\hat{\beta}$ or $\hat{Y} = PY$. Recall that $P = X(X'X)^{-1}X'$. Applying the rules for variances of linear functions to the first relationship gives

$$
\begin{aligned}
\operatorname{Var}(\hat{Y}) &= X[\operatorname{Var}(\hat{\beta})]X' \\
 &= X(X'X)^{-1}X'\sigma^2 \\
 &= P\sigma^2. \tag{3.44}
\end{aligned}
$$

The derivation using the second relationship gives

$$
\begin{aligned}
\operatorname{Var}(\hat{Y}) &= P[\operatorname{Var}(Y)]P' \\
 &= PP'\sigma^2 \\
 &= P\sigma^2, \tag{3.45}
\end{aligned}
$$

since $P$ is symmetric and idempotent. Therefore, the matrix $P$ multiplied by $\sigma^2$ gives the variances and covariances for all $\hat{Y}_i$. $P$ is a large $n \times n$ matrix, and at times only a few of its elements are of interest. The variances of any subset of the $\hat{Y}_i$ can be determined by using only the rows of $X$, say $X_r$, that correspond to the data points of interest and applying the first derivation. This gives

$$
\operatorname{Var}(\hat{Y}_r) = X_r[\operatorname{Var}(\hat{\beta})]X_r' = X_r(X'X)^{-1}X_r'\,\sigma^2. \tag{3.46}
$$
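A brief continuation of the same hypothetical sketch shows how $P$ is formed and numerically checks the properties used above (symmetry, idempotency, and $PX = X$), as well as the subset shortcut in (3.46).

```python
import numpy as np

# Same hypothetical design matrix as in the previous sketch.
x = np.array([0.02, 0.07, 0.11, 0.15])
X = np.column_stack([np.ones_like(x), x])
XtX_inv = np.linalg.inv(X.T @ X)

# Hat matrix P = X (X'X)^{-1} X'; P sigma^2 is Var(Y_hat), equation (3.44).
P = X @ XtX_inv @ X.T

# P is symmetric and idempotent, so derivation (3.45) gives the same answer
# as (3.44): P P' = P.  PX = X, equation (3.43), also checks numerically.
assert np.allclose(P @ P.T, P)
assert np.allclose(P @ X, X)

# Equation (3.46): variances for a subset of fitted values use only the
# relevant rows of X instead of forming the full n x n matrix P.
Xr = X[[0, 2]]                   # e.g., the first and third data points
Var_Yr = Xr @ XtX_inv @ Xr.T     # multiply by sigma^2
print(Var_Yr)
```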

When $\epsilon$ is normally distributed,

$$
\hat{Y} \sim N\big(X\beta,\ P\sigma^2\big). \tag{3.47}
$$

Recall that the vector of residuals $e$ is given by $(I - P)Y$. Therefore, the expectation of $e$ is

$$
\begin{aligned}
E(e) &= (I - P)E(Y) = (I - P)X\beta \\
 &= (X - PX)\beta = (X - X)\beta = 0, \tag{3.48}
\end{aligned}
$$

where $0$ is an $n \times 1$ vector of zeros. Thus, the residuals are random variables with mean zero.

The variance–covariance matrix of the residual vector $e$ is

$$
\operatorname{Var}(e) = (I - P)\sigma^2, \tag{3.49}
$$

again using the result that $(I - P)$ is a symmetric idempotent matrix. If the vector of regression errors $\epsilon$ is normally distributed, then the vector of regression residuals satisfies

$$
e \sim N\big(0,\ (I - P)\sigma^2\big). \tag{3.50}
$$
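Continuing the hypothetical sketch, the residual variance matrix of (3.49) is simply the complement of $P$:

```python
import numpy as np

x = np.array([0.02, 0.07, 0.11, 0.15])
X = np.column_stack([np.ones_like(x), x])
P = X @ np.linalg.inv(X.T @ X) @ X.T
n = X.shape[0]

# Var(e) = (I - P) sigma^2, equation (3.49); I - P is symmetric idempotent.
I_minus_P = np.eye(n) - P
assert np.allclose(I_minus_P @ I_minus_P, I_minus_P)

# Residual variances (times sigma^2): note they are not all equal, and the
# off-diagonal covariances are not zero.
print(np.diag(I_minus_P))
```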

Prediction of a future random observation, $Y_0 = x_0'\beta + \epsilon_0$, at a given vector of independent variables $x_0$ is given by $\hat{Y}_0 = x_0'\hat{\beta}$. It is easy to see that

$$
\hat{Y}_0 \sim N\big(x_0'\beta,\ x_0'(X'X)^{-1}x_0\,\sigma^2\big). \tag{3.51}
$$

This result is used to construct confidence intervals for the mean $x_0'\beta$.

If the future $\epsilon_0$ is assumed to be a normal random variable with mean zero and variance $\sigma^2$, and is independent of the historic errors $\epsilon$, then the prediction error $Y_0 - \hat{Y}_0 = x_0'(\beta - \hat{\beta}) + \epsilon_0$ satisfies

$$
Y_0 - \hat{Y}_0 \sim N\big(0,\ [1 + x_0'(X'X)^{-1}x_0]\sigma^2\big). \tag{3.52}
$$

This result is used to construct a confidence interval for an individual $Y_0$, which we call a prediction interval for $Y_0$. Recall that the variance of $(Y_0 - \hat{Y}_0)$ is denoted by $\operatorname{Var}(\hat{Y}_{\text{pred}_0})$.
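A hedged sketch of (3.51) and (3.52) with the same hypothetical data; the prediction point $x_0$ is an arbitrary choice for illustration. The code computes only the variance multipliers of $\sigma^2$, leaving the estimation of $\sigma^2$ itself aside.

```python
import numpy as np

x = np.array([0.02, 0.07, 0.11, 0.15])
Y = np.array([242.0, 237.0, 231.0, 201.0])    # hypothetical responses
X = np.column_stack([np.ones_like(x), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y

x0 = np.array([1.0, 0.09])   # hypothetical prediction point (leading 1 for intercept)

# Equation (3.51): variance multiplier for the estimated mean x0'beta.
var_mean = x0 @ XtX_inv @ x0          # times sigma^2

# Equation (3.52): variance multiplier for the prediction error Y0 - Y0_hat;
# the extra 1 accounts for the future error epsilon_0.
var_pred = 1.0 + var_mean             # times sigma^2

Y0_hat = x0 @ beta_hat
print(Y0_hat, var_mean, var_pred)
```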

Example 3.8. The matrix $P = X(X'X)^{-1}X'$ was computed for the ozone example in Example 3.3. Thus, with some rounding of the elements in $P$,

$$
\operatorname{Var}(\hat{Y}) = P\sigma^2 =
\begin{bmatrix}
 .741 & .377 & .086 & -.205 \\
 .377 & .283 & .208 & .132 \\
 .086 & .208 & .305 & .402 \\
 -.205 & .132 & .402 & .671
\end{bmatrix}
\sigma^2.
$$

The variance of the estimated mean of $Y$ when the ozone level is .02 ppm is $\operatorname{Var}(\hat{Y}_1) = .741\sigma^2$. For the ozone level of .11 ppm, the variance of the estimated mean is $\operatorname{Var}(\hat{Y}_3) = .305\sigma^2$. The covariance between the two estimated means is $\operatorname{Cov}(\hat{Y}_1, \hat{Y}_3) = .086\sigma^2$.

The variance–covariance matrix of the residuals is obtained as $\operatorname{Var}(e) = (I - P)\sigma^2$. Thus,

$$
\begin{aligned}
\operatorname{Var}(e_1) &= (1 - .741)\sigma^2 = .259\sigma^2 \\
\operatorname{Var}(e_3) &= (1 - .305)\sigma^2 = .695\sigma^2 \\
\operatorname{Cov}(e_1, e_3) &= -\operatorname{Cov}(\hat{Y}_1, \hat{Y}_3) = -.086\sigma^2.
\end{aligned}
$$

It is important to note that the variances of the least squares residuals are not equal to $\sigma^2$ and the covariances are not zero. The assumption of equal variances and zero covariances applies to the $\epsilon_i$, not the $e_i$.

The variance of any particular $\hat{Y}_i$ and the variance of the corresponding $e_i$ will always add to $\sigma^2$ because

$$
\begin{aligned}
\operatorname{Var}(Y) &= \operatorname{Var}(\hat{Y} + e) \\
 &= \operatorname{Var}(\hat{Y}) + \operatorname{Var}(e) + \operatorname{Cov}(\hat{Y}, e) + \operatorname{Cov}(e, \hat{Y}) \\
 &= P\sigma^2 + (I - P)\sigma^2 + P(I - P)'\sigma^2 + (I - P)P'\sigma^2 \\
 &= P\sigma^2 + (I - P)\sigma^2 \\
 &= I\sigma^2, \tag{3.53}
\end{aligned}
$$

where the cross-product terms vanish because $P(I - P)' = P - PP' = 0$.

Since variances cannot be negative, each diagonal element of $P$ must be between zero and one: $0 < v_{ii} < 1.0$, where $v_{ii}$ is the $i$th diagonal element of $P$. Thus, the variance of any $\hat{Y}_i$ is always less than $\sigma^2$, the variance of the individual observations. This shows the advantage of fitting a continuous response model, assuming the model is correct, over simply using the individual observed data points as estimates of the mean of $Y$ for the given values of the $X$s. The greater precision from fitting a response model comes from the fact that each $\hat{Y}_i$ uses information from the surrounding data points. The gain in precision can be quite striking. In Example 3.8, the variances obtained on the estimates of the means for the two intermediate levels of ozone using the linear response equation were $.283\sigma^2$ and $.305\sigma^2$. To attain the same degree of precision without using the response model would have required more than three observations at each level of ozone.
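The complementarity in (3.53) and the bounds on $v_{ii}$ are easy to verify numerically; again the $X$ matrix is the hypothetical one used in the sketches above.

```python
import numpy as np

x = np.array([0.02, 0.07, 0.11, 0.15])
X = np.column_stack([np.ones_like(x), x])
P = X @ np.linalg.inv(X.T @ X) @ X.T
v = np.diag(P)

# Equation (3.53) elementwise: Var(Y_hat_i) + Var(e_i) = sigma^2, i.e., the
# diagonal elements of P and of I - P sum to one for every data point.
assert np.allclose(v + np.diag(np.eye(len(x)) - P), 1.0)

# Each v_ii lies strictly between 0 and 1, so Var(Y_hat_i) < sigma^2.
assert np.all((v > 0) & (v < 1))
print(v)
```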

Equation 3.53 implies that data points having low variance on $\hat{Y}_i$ will have high variance on $e_i$, and vice versa. Belsley, Kuh, and Welsch (1980) show that the diagonal elements of $P$, the $v_{ii}$, can be interpreted as measures of the distance of the corresponding data points from the center of the $X$-space (from $\bar{x}$ in the case of one independent variable). Points that are far from the center of the $X$-space have relatively large $v_{ii}$ and, therefore, relatively high variance on $\hat{Y}_i$ and low variance on $e_i$. The smaller variance of the residuals for the points far from the "center of the data" indicates that the fitted regression line or response surface tends to come closer to the observed values for these points. This aspect of $P$ is used later to detect the more influential data points.
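The distance interpretation can be illustrated directly. For a single regressor with an intercept, the diagonal elements reduce to the standard form $v_{ii} = 1/n + (x_i - \bar{x})^2 / \sum_j (x_j - \bar{x})^2$, so a point far from $\bar{x}$ has a large $v_{ii}$. The data below are hypothetical, with one deliberately outlying level.

```python
import numpy as np

# Hypothetical levels; the last point is far from the others.
x = np.array([0.02, 0.07, 0.11, 0.15, 0.40])
X = np.column_stack([np.ones_like(x), x])
P = X @ np.linalg.inv(X.T @ X) @ X.T
v = np.diag(P)

# For one regressor, v_ii = 1/n + (x_i - x_bar)^2 / sum((x_j - x_bar)^2).
n = len(x)
v_formula = 1 / n + (x - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)
assert np.allclose(v, v_formula)

# The outlying x has the largest v_ii, hence the smallest Var(e_i).
print(v)
```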

The variances (and covariances) have been expressed as multiples of $\sigma^2$. The coefficients are determined entirely by the $X$ matrix, a matrix of constants that depends on the model being fit and the levels of the independent variables in the study. In designed experiments, the levels of the independent variables are subject to the control of the researcher. Thus, except for the magnitude of $\sigma^2$, the precision of the experiment is under the control of the researcher and can be known before the experiment is run. The efficiencies of alternative experimental designs can be compared by computing $(X'X)^{-1}$ and $P$ for each design. The design giving the smallest variances for the quantities of interest would be preferred.
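As a sketch of such a comparison, the following contrasts $\operatorname{Var}(\hat{\beta}_1)/\sigma^2$ for two hypothetical four-run designs for a straight-line model; the levels are invented for illustration, and the comparison criterion (the slope variance) is one choice among several a researcher might use.

```python
import numpy as np

def xtx_inv(x):
    """(X'X)^{-1} for a straight-line model at the given regressor levels."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.inv(X.T @ X)

# Two hypothetical designs with the same number of runs: levels bunched
# near the middle versus pushed to the extremes of the same interval.
design_a = np.array([0.06, 0.08, 0.09, 0.11])
design_b = np.array([0.02, 0.02, 0.15, 0.15])

# Var(beta1_hat) is sigma^2 times the (1,1) element of (X'X)^{-1}; the
# design with the smaller element estimates the slope more precisely.
print(xtx_inv(design_a)[1, 1], xtx_inv(design_b)[1, 1])
```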
