3A.1 Derivation of the First Order Conditions in Equation (3.13)
The analysis is very similar to the simple regression case. We must characterize the solutions to the problem
$$\min_{b_0, b_1, \ldots, b_k} \sum_{i=1}^n (y_i - b_0 - b_1 x_{i1} - \cdots - b_k x_{ik})^2.$$
Taking the partial derivatives with respect to each of the $b_j$ (see Appendix A), evaluating them at the solutions, and setting them equal to zero gives
$$-2\sum_{i=1}^n (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \cdots - \hat{\beta}_k x_{ik}) = 0$$
$$-2\sum_{i=1}^n x_{ij}(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \cdots - \hat{\beta}_k x_{ik}) = 0, \quad \text{for all } j = 1, \ldots, k.$$
Canceling the $-2$ gives the first order conditions in (3.13).
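As a quick numerical check (a minimal sketch with simulated data and numpy; all variable names and parameter values are illustrative), the residuals from any least squares fit satisfy these first order conditions up to rounding error:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # intercept plus k regressors
y = X @ np.array([1.0, 0.5, -0.3, 2.0]) + rng.normal(size=n)  # simulated data for illustration

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS estimates
u_hat = y - X @ beta_hat                          # OLS residuals

# First order conditions: residuals sum to zero and are orthogonal to each regressor
print(X.T @ u_hat)  # every entry is ~0; the first entry is the sum of the residuals
```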
3A.2 Derivation of Equation (3.22)
To derive (3.22), write $x_{i1}$ in terms of its fitted value and its residual from the regression of $x_1$ on $x_2, \ldots, x_k$: $x_{i1} = \hat{x}_{i1} + \hat{r}_{i1}$, for all $i = 1, \ldots, n$. Now, plug this into the second equation in (3.13):
$$\sum_{i=1}^n (\hat{x}_{i1} + \hat{r}_{i1})(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \cdots - \hat{\beta}_k x_{ik}) = 0. \quad [3.63]$$
By the definition of the OLS residual $\hat{u}_i$, since $\hat{x}_{i1}$ is just a linear function of the explanatory variables $x_{i2}, \ldots, x_{ik}$, it follows that $\sum_{i=1}^n \hat{x}_{i1}\hat{u}_i = 0$. Therefore, equation (3.63) can be expressed as
$$\sum_{i=1}^n \hat{r}_{i1}(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \cdots - \hat{\beta}_k x_{ik}) = 0. \quad [3.64]$$
Since the $\hat{r}_{i1}$ are the residuals from regressing $x_1$ on $x_2, \ldots, x_k$ (a regression that includes an intercept), we have $\sum_{i=1}^n \hat{r}_{i1} = 0$ and $\sum_{i=1}^n x_{ij}\hat{r}_{i1} = 0$, for all $j = 2, \ldots, k$. Therefore, (3.64) is equivalent to $\sum_{i=1}^n \hat{r}_{i1}(y_i - \hat{\beta}_1 x_{i1}) = 0$. Finally, we use the fact that $\sum_{i=1}^n \hat{x}_{i1}\hat{r}_{i1} = 0$, so that $x_{i1}$ can be replaced by $\hat{r}_{i1}$, which means that $\hat{\beta}_1$ solves
$$\sum_{i=1}^n \hat{r}_{i1}(y_i - \hat{\beta}_1 \hat{r}_{i1}) = 0.$$
Now, straightforward algebra gives (3.22), provided, of course, that $\sum_{i=1}^n \hat{r}_{i1}^2 > 0$; this is ensured by Assumption MLR.3.
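The partialling-out interpretation in (3.22) is easy to verify numerically (a sketch with simulated data; the names and parameter values are illustrative): the slope on $x_1$ from the full regression equals the slope from regressing $y$ on the residuals $\hat{r}_{i1}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x2, x3 = rng.normal(size=n), rng.normal(size=n)
x1 = 0.8 * x2 - 0.5 * x3 + rng.normal(size=n)          # x1 is correlated with x2 and x3
y = 1.0 + 2.0 * x1 + 1.0 * x2 - 1.0 * x3 + rng.normal(size=n)

X_full = np.column_stack([np.ones(n), x1, x2, x3])
beta_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)  # full multiple regression

# Partial out x2 and x3 from x1, then apply equation (3.22)
X_other = np.column_stack([np.ones(n), x2, x3])
gamma, *_ = np.linalg.lstsq(X_other, x1, rcond=None)
r1_hat = x1 - X_other @ gamma                           # residuals r-hat_i1
beta1_partial = (r1_hat @ y) / (r1_hat @ r1_hat)        # equation (3.22)

print(beta_full[1], beta1_partial)  # the two slope estimates agree
```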
3A.3 Proof of Theorem 3.1
We prove Theorem 3.1 for $\hat{\beta}_1$; the proof for the other slope parameters is virtually identical. (See Appendix E for a more succinct proof using matrices.) Under Assumption MLR.3, the OLS estimators exist, and we can write $\hat{\beta}_1$ as in (3.22). Under Assumption MLR.1, we can write $y_i$ as in (3.32); substitute this for $y_i$ in (3.22). Then, using $\sum_{i=1}^n \hat{r}_{i1} = 0$, $\sum_{i=1}^n x_{ij}\hat{r}_{i1} = 0$ for all $j = 2, \ldots, k$, and $\sum_{i=1}^n x_{i1}\hat{r}_{i1} = \sum_{i=1}^n \hat{r}_{i1}^2$, we have
$$\hat{\beta}_1 = \beta_1 + \left(\sum_{i=1}^n \hat{r}_{i1} u_i\right)\bigg/\left(\sum_{i=1}^n \hat{r}_{i1}^2\right). \quad [3.65]$$
Now, under Assumptions MLR.2 and MLR.4, the expected value of each $u_i$, given all independent variables in the sample, is zero. Since the $\hat{r}_{i1}$ are just functions of the sample independent variables, it follows that
$$E(\hat{\beta}_1 \mid \mathbf{X}) = \beta_1 + \left(\sum_{i=1}^n \hat{r}_{i1} E(u_i \mid \mathbf{X})\right)\bigg/\left(\sum_{i=1}^n \hat{r}_{i1}^2\right) = \beta_1 + \left(\sum_{i=1}^n \hat{r}_{i1}\cdot 0\right)\bigg/\left(\sum_{i=1}^n \hat{r}_{i1}^2\right) = \beta_1,$$
where $\mathbf{X}$ denotes the data on all independent variables and $E(\hat{\beta}_1 \mid \mathbf{X})$ is the expected value of $\hat{\beta}_1$, given $x_{i1}, \ldots, x_{ik}$, for all $i = 1, \ldots, n$. This completes the proof.
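As an informal check of this conclusion (a minimal simulation sketch, not part of the proof; the design, parameter values, and variable names are arbitrary), holding the regressors fixed and redrawing the errors many times shows the average of $\hat{\beta}_1$ settling near $\beta_1$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 100, 5000
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)        # regressors held fixed across replications
X = np.column_stack([np.ones(n), x1, x2])
beta = np.array([1.0, 2.0, -1.0])         # arbitrary true parameters

estimates = np.empty(reps)
for r in range(reps):
    u = rng.normal(size=n)                # errors drawn so that E(u|X) = 0 (MLR.4)
    y = X @ beta + u
    b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    estimates[r] = b_hat[1]

print(estimates.mean())  # close to the true beta_1 = 2.0
```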
3A.4 General Omitted Variable Bias
We can derive the omitted variable bias in the general model in equation (3.31) under the first four Gauss-Markov assumptions. In particular, let the $\hat{\beta}_j$, $j = 0, 1, \ldots, k$, be the OLS estimators from the regression using the full set of explanatory variables. Let the $\tilde{\beta}_j$, $j = 0, 1, \ldots, k-1$, be the OLS estimators from the regression that leaves out $x_k$. Let $\tilde{\delta}_j$, $j = 1, \ldots, k-1$, be the slope coefficient on $x_j$ in the auxiliary regression of $x_{ik}$ on $x_{i1}, x_{i2}, \ldots, x_{i,k-1}$, $i = 1, \ldots, n$. A useful fact is that
$$\tilde{\beta}_j = \hat{\beta}_j + \hat{\beta}_k \tilde{\delta}_j. \quad [3.66]$$
This shows explicitly that, when we do not control for $x_k$ in the regression, the estimated partial effect of $x_j$ equals the partial effect when we include $x_k$ plus the partial effect of $x_k$ on $\hat{y}$ times the partial relationship between the omitted variable, $x_k$, and $x_j$, $j < k$. Conditional on the entire set of explanatory variables, $\mathbf{X}$, we know that the $\hat{\beta}_j$ are all unbiased for the corresponding $\beta_j$, $j = 1, \ldots, k$.
Further, since $\tilde{\delta}_j$ is just a function of $\mathbf{X}$, we have
$$E(\tilde{\beta}_j \mid \mathbf{X}) = E(\hat{\beta}_j \mid \mathbf{X}) + E(\hat{\beta}_k \mid \mathbf{X})\,\tilde{\delta}_j = \beta_j + \beta_k\tilde{\delta}_j. \quad [3.67]$$
Equation (3.67) shows that $\tilde{\beta}_j$ is biased for $\beta_j$ unless $\beta_k = 0$ (in which case $x_k$ has no partial effect in the population) or $\tilde{\delta}_j$ equals zero, which means that $x_{ik}$ and $x_{ij}$ are partially uncorrelated in the sample. The key to obtaining equation (3.67) is equation (3.66). To show equation (3.66), we can use equation (3.22) a couple of times. For simplicity, we look at $j = 1$. Now, $\tilde{\beta}_1$ is the slope coefficient in the simple regression of $y_i$ on $\tilde{r}_{i1}$, $i = 1, \ldots, n$, where the $\tilde{r}_{i1}$ are the OLS residuals from the regression of $x_{i1}$ on $x_{i2}, x_{i3}, \ldots, x_{i,k-1}$. Consider the numerator of the expression for $\tilde{\beta}_1$: $\sum_{i=1}^n \tilde{r}_{i1} y_i$. But for each $i$, we can write $y_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_k x_{ik} + \hat{u}_i$ and plug in for $y_i$. Now, by properties of the OLS residuals, the $\tilde{r}_{i1}$ have zero sample average and are uncorrelated with $x_{i2}, x_{i3}, \ldots, x_{i,k-1}$ in the sample. Similarly, the $\hat{u}_i$ have zero sample average and zero sample correlation with $x_{i1}, x_{i2}, \ldots, x_{ik}$. It follows that the $\tilde{r}_{i1}$ and $\hat{u}_i$ are uncorrelated in the sample (since the $\tilde{r}_{i1}$ are just linear combinations of $x_{i1}, x_{i2}, \ldots, x_{i,k-1}$). So
$$\sum_{i=1}^n \tilde{r}_{i1} y_i = \hat{\beta}_1\left(\sum_{i=1}^n \tilde{r}_{i1} x_{i1}\right) + \hat{\beta}_k\left(\sum_{i=1}^n \tilde{r}_{i1} x_{ik}\right). \quad [3.68]$$
Now, $\sum_{i=1}^n \tilde{r}_{i1} x_{i1} = \sum_{i=1}^n \tilde{r}_{i1}^2$, which is also the denominator of $\tilde{\beta}_1$. Therefore, we have shown that
$$\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_k\left(\sum_{i=1}^n \tilde{r}_{i1} x_{ik}\right)\bigg/\left(\sum_{i=1}^n \tilde{r}_{i1}^2\right) = \hat{\beta}_1 + \hat{\beta}_k \tilde{\delta}_1.$$
This is the relationship we wanted to show.
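Because (3.66) is an algebraic identity in any sample, a few lines of code can confirm it exactly (a sketch with simulated data; here $x_3$ plays the role of the omitted variable $x_k$, and all names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 150
x1, x2 = rng.normal(size=n), rng.normal(size=n)
x3 = 0.5 * x1 - 0.7 * x2 + rng.normal(size=n)          # the omitted variable is related to x1, x2
y = 1.0 + 2.0 * x1 + 1.0 * x2 + 3.0 * x3 + rng.normal(size=n)

full = np.column_stack([np.ones(n), x1, x2, x3])
short = np.column_stack([np.ones(n), x1, x2])

beta_hat, *_ = np.linalg.lstsq(full, y, rcond=None)       # beta-hat_j from the full regression
beta_tilde, *_ = np.linalg.lstsq(short, y, rcond=None)    # beta-tilde_j leaving out x3
delta_tilde, *_ = np.linalg.lstsq(short, x3, rcond=None)  # auxiliary regression of x3 on x1, x2

# Equation (3.66): beta-tilde_j = beta-hat_j + beta-hat_k * delta-tilde_j, for j = 1 and j = 2
print(beta_tilde[1], beta_hat[1] + beta_hat[3] * delta_tilde[1])
print(beta_tilde[2], beta_hat[2] + beta_hat[3] * delta_tilde[2])
```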
3A.5 Proof of Theorem 3.2
Again, we prove this for $j = 1$. Write $\hat{\beta}_1$ as in equation (3.65). Now, under MLR.5, $\text{Var}(u_i \mid \mathbf{X}) = \sigma^2$, for all $i = 1, \ldots, n$. Under random sampling, the $u_i$ are independent, even conditional on $\mathbf{X}$, and the $\hat{r}_{i1}$ are nonrandom conditional on $\mathbf{X}$. Therefore,
$$\text{Var}(\hat{\beta}_1 \mid \mathbf{X}) = \left(\sum_{i=1}^n \hat{r}_{i1}^2\,\text{Var}(u_i \mid \mathbf{X})\right)\bigg/\left(\sum_{i=1}^n \hat{r}_{i1}^2\right)^2 = \left(\sum_{i=1}^n \hat{r}_{i1}^2\,\sigma^2\right)\bigg/\left(\sum_{i=1}^n \hat{r}_{i1}^2\right)^2 = \sigma^2\bigg/\left(\sum_{i=1}^n \hat{r}_{i1}^2\right).$$
Now, since $\sum_{i=1}^n \hat{r}_{i1}^2$ is the sum of squared residuals from regressing $x_1$ on $x_2, \ldots, x_k$, $\sum_{i=1}^n \hat{r}_{i1}^2 = \text{SST}_1(1 - R_1^2)$. This completes the proof.
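Numerically, this variance expression can be checked against a direct computation of $\sigma^2(\mathbf{X}'\mathbf{X})^{-1}$ (a sketch with simulated data in which $\sigma^2$ is treated as known; names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 120
sigma2 = 1.5                                   # error variance, treated as known here
x2, x3 = rng.normal(size=n), rng.normal(size=n)
x1 = 0.7 * x2 + 0.3 * x3 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, x3])

# Direct computation: Var(beta_hat | X) = sigma^2 * (X'X)^{-1}; take the entry for x1
var_direct = sigma2 * np.linalg.inv(X.T @ X)[1, 1]

# Formula from Theorem 3.2: sigma^2 / [SST_1 * (1 - R_1^2)]
others = np.column_stack([np.ones(n), x2, x3])
coef, *_ = np.linalg.lstsq(others, x1, rcond=None)
r1_hat = x1 - others @ coef                    # residuals from regressing x1 on the others
sst1 = np.sum((x1 - x1.mean()) ** 2)
r2_1 = 1.0 - (r1_hat @ r1_hat) / sst1
var_formula = sigma2 / (sst1 * (1.0 - r2_1))

print(var_direct, var_formula)  # the two expressions agree
```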
3A.6 Proof of Theorem 3.4
We show that, for any other linear unbiased estimator $\tilde{\beta}_1$ of $\beta_1$, $\text{Var}(\tilde{\beta}_1) \geq \text{Var}(\hat{\beta}_1)$, where $\hat{\beta}_1$ is the OLS estimator. The focus on $j = 1$ is without loss of generality.
For $\tilde{\beta}_1$ as in equation (3.60), we can plug in for $y_i$ to obtain
$$\tilde{\beta}_1 = \beta_0\sum_{i=1}^n w_{i1} + \beta_1\sum_{i=1}^n w_{i1}x_{i1} + \beta_2\sum_{i=1}^n w_{i1}x_{i2} + \cdots + \beta_k\sum_{i=1}^n w_{i1}x_{ik} + \sum_{i=1}^n w_{i1}u_i.$$
Now, since the $w_{i1}$ are functions of the $x_{ij}$,
$$E(\tilde{\beta}_1 \mid \mathbf{X}) = \beta_0\sum_{i=1}^n w_{i1} + \beta_1\sum_{i=1}^n w_{i1}x_{i1} + \beta_2\sum_{i=1}^n w_{i1}x_{i2} + \cdots + \beta_k\sum_{i=1}^n w_{i1}x_{ik} + \sum_{i=1}^n w_{i1}E(u_i \mid \mathbf{X})$$
$$= \beta_0\sum_{i=1}^n w_{i1} + \beta_1\sum_{i=1}^n w_{i1}x_{i1} + \beta_2\sum_{i=1}^n w_{i1}x_{i2} + \cdots + \beta_k\sum_{i=1}^n w_{i1}x_{ik}$$
because $E(u_i \mid \mathbf{X}) = 0$, for all $i = 1, \ldots, n$, under MLR.2 and MLR.4. Therefore, for $E(\tilde{\beta}_1 \mid \mathbf{X})$ to equal $\beta_1$ for any values of the parameters, we must have
$$\sum_{i=1}^n w_{i1} = 0, \quad \sum_{i=1}^n w_{i1}x_{i1} = 1, \quad \sum_{i=1}^n w_{i1}x_{ij} = 0,\ j = 2, \ldots, k. \quad [3.69]$$
Now, let $\hat{r}_{i1}$ be the residuals from the regression of $x_{i1}$ on $x_{i2}, \ldots, x_{ik}$. Then, from (3.69), it follows that
$$\sum_{i=1}^n w_{i1}\hat{r}_{i1} = 1 \quad [3.70]$$
because $x_{i1} = \hat{x}_{i1} + \hat{r}_{i1}$ and $\sum_{i=1}^n w_{i1}\hat{x}_{i1} = 0$ (the latter holds because $\hat{x}_{i1}$ is a linear function of an intercept and $x_{i2}, \ldots, x_{ik}$, and the $w_{i1}$ satisfy the restrictions in (3.69)). Now, consider the difference between $\text{Var}(\tilde{\beta}_1 \mid \mathbf{X})$ and $\text{Var}(\hat{\beta}_1 \mid \mathbf{X})$ under MLR.1 through MLR.5:
$$\sigma^2\sum_{i=1}^n w_{i1}^2 - \sigma^2\bigg/\left(\sum_{i=1}^n \hat{r}_{i1}^2\right). \quad [3.71]$$
Because of (3.70), we can write the difference in (3.71), without $\sigma^2$, as
$$\sum_{i=1}^n w_{i1}^2 - \left(\sum_{i=1}^n w_{i1}\hat{r}_{i1}\right)^2\bigg/\left(\sum_{i=1}^n \hat{r}_{i1}^2\right). \quad [3.72]$$
But (3.72) is simply
$$\sum_{i=1}^n (w_{i1} - \hat{\gamma}_1\hat{r}_{i1})^2, \quad [3.73]$$
where $\hat{\gamma}_1 = \left(\sum_{i=1}^n w_{i1}\hat{r}_{i1}\right)\big/\left(\sum_{i=1}^n \hat{r}_{i1}^2\right)$, as can be seen by squaring each term in (3.73), summing, and then canceling terms. Because (3.73) is just the sum of squared residuals from the simple regression of $w_{i1}$ on $\hat{r}_{i1}$ (remember that the sample average of $\hat{r}_{i1}$ is zero), (3.73) must be nonnegative. This completes the proof.
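To see the conclusion of Theorem 3.4 at work numerically (a simulation sketch, not part of the proof; the alternative estimator chosen here, OLS on the first half of the sample, is just one convenient linear unbiased estimator whose weights are functions of $\mathbf{X}$), one can verify the restrictions in (3.69) and compare conditional variances:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
sigma2 = 1.0
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)

def partial_out_resid(idx):
    """Residuals from regressing x1 on an intercept and x2, using observations idx."""
    Z = np.column_stack([np.ones(len(idx)), x2[idx]])
    coef, *_ = np.linalg.lstsq(Z, x1[idx], rcond=None)
    return x1[idx] - Z @ coef

# OLS: weights built from the full-sample residuals; Var = sigma^2 / sum of squared residuals
r_full = partial_out_resid(np.arange(n))
var_ols = sigma2 / (r_full @ r_full)

# Alternative linear unbiased estimator: OLS slope using only the first half of the sample
half = np.arange(n // 2)
r_half = partial_out_resid(half)
w = np.zeros(n)
w[half] = r_half / (r_half @ r_half)        # weights satisfy the restrictions in (3.69)
var_alt = sigma2 * np.sum(w ** 2)

print(np.round([w.sum(), w @ x1, w @ x2], 10))  # approximately [0, 1, 0], as in (3.69)
print(var_ols, var_alt)                          # var_alt exceeds var_ols, as Theorem 3.4 implies
```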
Chapter 4
Multiple Regression Analysis: Inference

This chapter continues our treatment of multiple regression analysis. We now turn to the problem of testing hypotheses about the parameters in the population regression model. We begin in Section 4-1 by finding the distributions of the OLS estimators under the added assumption that the population error is normally distributed. Sections 4-2 and 4-3 cover hypothesis testing about individual parameters, while Section 4-4 discusses how to test a single hypothesis involving more than one parameter. We focus on testing multiple restrictions in Section 4-5 and pay particular attention to determining whether a group of independent variables can be omitted from a model.

4-1 Sampling Distributions of the OLS Estimators
Up to this point, we have formed a set of assumptions under which OLS is unbiased; we have also derived and discussed the bias caused by omitted variables. In Section 3-4, we obtained the variances of the OLS estimators under the Gauss-Markov assumptions. In Section 3-5, we showed that this variance is smallest among linear unbiased estimators.
Knowing the expected value and variance of the OLS estimators is useful for describing the precision of the OLS estimators. However, in order to perform statistical inference, we need to know more than just the first two moments of $\hat{\beta}_j$; we need to know the full sampling distribution of the $\hat{\beta}_j$. Even under the Gauss-Markov assumptions, the distribution of $\hat{\beta}_j$ can have virtually any shape.
When we condition on the values of the independent variables in our sample, it is clear that the sampling distributions of the OLS estimators depend on the underlying distribution of the errors. To make the sampling distributions of the $\hat{\beta}_j$ tractable, we now assume that the unobserved error is normally distributed in the population. We call this the normality assumption.
Assumption MLR.6 Normality
The population error $u$ is independent of the explanatory variables $x_1, x_2, \ldots, x_k$ and is normally distributed with zero mean and variance $\sigma^2$: $u \sim \text{Normal}(0, \sigma^2)$.
Assumption MLR.6 is much stronger than any of our previous assumptions. In fact, since $u$ is independent of the $x_j$ under MLR.6, $E(u \mid x_1, \ldots, x_k) = E(u) = 0$ and $\text{Var}(u \mid x_1, \ldots, x_k) = \text{Var}(u) = \sigma^2$. Thus, if we make Assumption MLR.6, then we are necessarily assuming MLR.4 and MLR.5. To emphasize that we are assuming more than before, we will refer to the full set of Assumptions MLR.1 through MLR.6.
For cross-sectional regression applications, Assumptions MLR.1 through MLR.6 are called the classical linear model (CLM) assumptions. Thus, we will refer to the model under these six assumptions as the classical linear model. It is best to think of the CLM assumptions as containing all of the Gauss-Markov assumptions plus the assumption of a normally distributed error term.
Under the CLM assumptions, the OLS estimators $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$ have a stronger efficiency property than they would under the Gauss-Markov assumptions. It can be shown that the OLS estimators are the minimum variance unbiased estimators, which means that OLS has the smallest variance among unbiased estimators; we no longer have to restrict our comparison to estimators that are linear in the $y_i$. This property of OLS under the CLM assumptions is discussed further in Appendix E.
A succinct way to summarize the population assumptions of the CLM is
$$y \mid \mathbf{x} \sim \text{Normal}(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k,\ \sigma^2),$$
where $\mathbf{x}$ is again shorthand for $(x_1, \ldots, x_k)$. Thus, conditional on $\mathbf{x}$, $y$ has a normal distribution with mean linear in $x_1, \ldots, x_k$ and a constant variance. For a single independent variable $x$, this situation is shown in Figure 4.1.
[Figure 4.1 The homoskedastic normal distribution with a single explanatory variable. The figure plots the conditional densities $f(y \mid x)$ at values $x_1$, $x_2$, $x_3$: each is a normal distribution with the same variance, centered on the line $E(y \mid x) = \beta_0 + \beta_1 x$.]
The argument justifying the normal distribution for the errors usually runs something like this: because $u$ is the sum of many different unobserved factors affecting $y$, we can invoke the central limit theorem (CLT) (see Appendix C) to conclude that $u$ has an approximate normal distribution. This argument has some merit, but it is not without weaknesses. First, the factors in $u$ can have very different distributions in the population (for example, ability and quality of schooling in the error in a wage equation). Although the CLT can still hold in such cases, the normal approximation can be poor depending on how many factors appear in $u$ and how different their distributions are.
A more serious problem with the CLT argument is that it assumes that all unobserved factors affect $y$ in a separate, additive fashion. Nothing guarantees that this is so. If $u$ is a complicated function of the unobserved factors, then the CLT argument does not really apply.
In any application, whether normality of $u$ can be assumed is really an empirical matter. For example, there is no theorem that says wage conditional on educ, exper, and tenure is normally distributed. If anything, simple reasoning suggests that the opposite is true: since wage can never be less than zero, it cannot, strictly speaking, have a normal distribution. Further, because there are minimum wage laws, some fraction of the population earns exactly the minimum wage, which also violates the normality assumption. Nevertheless, as a practical matter, we can ask whether the conditional wage distribution is "close" to being normal. Past empirical evidence suggests that normality is not a good assumption for wages.
Often, using a transformation, especially taking the log, yields a distribution that is closer to normal. For example, something like log(price) tends to have a distribution that looks more normal than the distribution of price. Again, this is an empirical issue. We will discuss the consequences of nonnormality for statistical inference in Chapter 5.
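As a tiny illustration of this point (a sketch with simulated, lognormal-style data rather than actual price data; the location and scale values are arbitrary), the log transformation removes most of the right skewness:

```python
import numpy as np

rng = np.random.default_rng(7)
price = np.exp(rng.normal(loc=12.0, scale=0.5, size=5000))  # right-skewed, price-like variable

def skewness(v):
    """Sample skewness: third standardized moment."""
    return np.mean(((v - v.mean()) / v.std()) ** 3)

print(skewness(price), skewness(np.log(price)))  # log(price) is far less skewed, closer to normal
```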
There are some applications where MLR.6 is clearly false, as can be demonstrated with simple introspection. Whenever $y$ takes on just a few values it cannot have anything close to a normal distribution. The dependent variable in Example 3.5 provides a good example. The variable narr86, the number of times a young man was arrested in 1986, takes on a small range of integer values and is zero for most men. Thus, narr86 is far from being normally distributed. What can be done in these cases? As we will see in Chapter 5, and this is important, nonnormality of the errors is not a serious problem with large sample sizes. For now, we just make the normality assumption.
Normality of the error term translates into normal sampling distributions of the OLS estimators:
Normal Sampling Distributions
Under the CLM assumptions MLR.1 through MLR.6, conditional on the sample values of the independent variables,
$$\hat{\beta}_j \sim \text{Normal}[\beta_j, \text{Var}(\hat{\beta}_j)], \quad [4.1]$$
where $\text{Var}(\hat{\beta}_j)$ was given in Chapter 3 [equation (3.51)]. Therefore, $(\hat{\beta}_j - \beta_j)/\text{sd}(\hat{\beta}_j) \sim \text{Normal}(0, 1)$.
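A simulation makes equation (4.1) concrete (a minimal sketch under the CLM assumptions; the design and parameter values are arbitrary): with normal errors and the regressors held fixed, the standardized slope estimate behaves like a draw from a standard normal distribution.

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps = 50, 10000
x1 = rng.normal(size=n)
x2 = 0.4 * x1 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])     # regressors held fixed across replications
beta, sigma = np.array([1.0, 0.5, -0.25]), 2.0

# sd(beta_hat_1) conditional on X, from sigma^2 * (X'X)^{-1}
sd_b1 = sigma * np.sqrt(np.linalg.inv(X.T @ X)[1, 1])

z = np.empty(reps)
for r in range(reps):
    y = X @ beta + sigma * rng.normal(size=n)  # normal errors, as in MLR.6
    b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    z[r] = (b_hat[1] - beta[1]) / sd_b1        # standardized as in the theorem

print(z.mean(), z.std())   # approximately 0 and 1, consistent with Normal(0, 1)
```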