The linear statistical model: consistency of OLS
We will study the properties of the OLS estimator in two scenarios: first under the assumption that the regressors X are non-stochastic, and then assuming that they are stochastic.
One of the assumptions in the linear statistical model y=Xβ+u
is that X is a (n×k) non-stochastic matrix. The idea is that X remains fixed in repeated samples, i.e., if we have two samples of size n each, the values of the matrix X will remain constant across samples while the values of y will change.
For example, suppose we are regressing hourly wages (y) onto a constant, age, and gender. Then if n=6 for example, we could have something like
$$y' = (12,\ 11,\ 9,\ 8,\ 12,\ 8), \qquad X = \begin{pmatrix} 1 & 25 & 1 \\ 1 & 26 & 1 \\ 1 & 27 & 1 \\ 1 & 25 & 0 \\ 1 & 26 & 0 \\ 1 & 27 & 0 \end{pmatrix}$$
in our first sample, and something like
$$y' = (11,\ 7,\ 12,\ 12,\ 9,\ 9), \qquad X = \begin{pmatrix} 1 & 25 & 1 \\ 1 & 26 & 1 \\ 1 & 27 & 1 \\ 1 & 25 & 0 \\ 1 & 26 & 0 \\ 1 & 27 & 0 \end{pmatrix}$$
in our second sample. You can see that the two samples feature a different y vector, but the same X matrix.
Along with the other assumptions of the linear statistical model (namely, $E(u) = 0$; $E(uu') = \sigma^2 I$; $X$ is full rank; and $u$ is normally distributed), the assumption that $X$ is non-stochastic allows us to prove the Gauss-Markov theorem, which says that the OLS estimator $\hat\beta = (X'X)^{-1}X'y$ is BLUE (best linear unbiased estimator), i.e., it is the "best" estimator in the class of all estimators of the unknown population parameter $\beta$ that are unbiased and linear in $y$.
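As a quick illustration, the OLS formula can be applied to both samples above. The following is a minimal NumPy sketch; the design matrix is one plausible reading of the wage example (columns: constant, age, gender), not something stated verbatim in the notes:

```python
import numpy as np

# Design matrix for the example: columns are constant, age, gender.
# (These rows are one plausible reading of the example above.)
X = np.array([
    [1, 25, 1],
    [1, 26, 1],
    [1, 27, 1],
    [1, 25, 0],
    [1, 26, 0],
    [1, 27, 0],
], dtype=float)

y1 = np.array([12, 11, 9, 8, 12, 8], dtype=float)   # first sample
y2 = np.array([11, 7, 12, 12, 9, 9], dtype=float)   # second sample

def ols(X, y):
    # beta_hat = (X'X)^{-1} X'y, computed via the normal equations
    return np.linalg.solve(X.T @ X, X.T @ y)

b1, b2 = ols(X, y1), ols(X, y2)
print(b1)
print(b2)
```

Same X, different y: the estimates differ across the two samples even though the regressors stay fixed.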
The normality assumption about $u$ (which translates into a normality assumption about $y$, and thus about $\hat\beta$, since both are linear transformations of $u$) allows us to construct hypothesis tests based on the $t$ distribution (if we are testing a single restriction) and the $F$ distribution (if we are testing multiple restrictions). Note that in the unlikely case in which we know $\sigma^2$, tests are actually conducted using the $N(0,1)$ and the $\chi^2$ distributions, respectively.
But what if $u$ is not normal? What distribution(s) can we use to conduct inference about $\beta$? In practice, we will still use the distributions above (the $t$ and $F$, respectively, for testing single and multiple restrictions on $\beta$ when $\sigma^2$ is unknown), but we will justify their usage with an asymptotic approximation, i.e., a large-sample ($n \to \infty$) approximation. That is, we will say that the test statistics we use do not have exact $t$ or exact $F$ distributions in small samples, but become so in large samples, or asymptotically. The same approximations will be used when $X$ is stochastic to establish the properties of $\hat\beta$, in particular its consistency.
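For concreteness, a $t$ test of a single restriction can be sketched as follows. This is an illustration with simulated data (all numbers here are invented for the sketch): the statistic $\hat\beta_j / \mathrm{se}(\hat\beta_j)$ is compared with a $t$ critical value with $n-k$ degrees of freedom, which is approximately 1.97 at the 5% two-sided level for the sample size used:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 0.5, 0.0])          # third coefficient truly zero
u = rng.normal(scale=2.0, size=n)
y = X @ beta + u

b = np.linalg.solve(X.T @ X, X.T @ y)     # OLS estimates
resid = y - X @ b
s2 = resid @ resid / (n - k)              # unbiased estimate of sigma^2
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))

t_stat = b[2] / se[2]                     # test H0: beta_3 = 0
crit = 1.97                               # approx. 5% two-sided t critical value, 197 df
print(t_stat, abs(t_stat) > crit)
```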
Consistency of OLS
Start from the non-stochastic X case.
Assume that $\frac{1}{n}X'X \to Q$ as $n \to \infty$, where $Q$ is a finite, nonsingular matrix.
Notice also that:

$$\hat\beta = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + u) = \beta + (X'X)^{-1}X'u.$$
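This decomposition can be checked numerically, since in a simulation $\beta$ and $u$ are known by construction (a minimal sketch with invented numbers):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.uniform(20, 60, size=n)])
beta = np.array([2.0, 0.3])
u = rng.normal(size=n)
y = X @ beta + u

b_direct = np.linalg.solve(X.T @ X, X.T @ y)              # (X'X)^{-1} X'y
b_decomposed = beta + np.linalg.solve(X.T @ X, X.T @ u)   # beta + (X'X)^{-1} X'u
print(np.allclose(b_direct, b_decomposed))  # prints True
```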
Taking probability limits:

$$\operatorname{plim}\hat\beta = \beta + \operatorname{plim}\left[(X'X)^{-1}X'u\right] = \beta + \operatorname{plim}\left(\frac{X'X}{n}\right)^{-1}\operatorname{plim}\left(\frac{X'u}{n}\right) = \beta + Q^{-1}\operatorname{plim}\left(\frac{X'u}{n}\right).$$
Recall that in the non-stochastic X case we have the linear model $y = X\beta + u$ and make the assumptions: $E(u) = 0$; $E(uu') = \sigma^2 I$; $X$ is full rank; and $\lim_{n\to\infty}\frac{1}{n}X'X = Q$, a finite, nonsingular matrix.

Now we check CQM (convergence in quadratic mean) for $\frac{X'u}{n}$. First, note that

$$E\left(\frac{X'u}{n}\right) = \frac{1}{n}X'E(u) = 0.$$

Moreover,

$$\operatorname{Var}\left(\frac{X'u}{n}\right) = \frac{1}{n^2}X'E(uu')X = \frac{\sigma^2}{n}\cdot\frac{X'X}{n} \longrightarrow 0 \cdot Q = 0.$$

Since its mean is 0 and its variance vanishes, $\frac{X'u}{n}$ converges in quadratic mean, and hence in probability, to 0. Therefore $\operatorname{plim}\hat\beta = \beta + Q^{-1}\cdot 0 = \beta$, i.e., OLS is consistent.
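The CQM argument can be illustrated by simulation. The sketch below uses a fixed repeating regressor pattern as a stand-in for a non-stochastic X (so that X'X/n converges), and shows the variance of an element of X'u/n shrinking at rate 1/n; all numbers are invented for the illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 1.5

def var_of_scaled_term(n, reps=2000):
    # One element of X'u/n for a fixed (non-stochastic) regressor column;
    # the repeating pattern keeps X'X/n convergent as n grows.
    x = 25 + np.arange(n) % 10
    draws = np.array([(x @ rng.normal(scale=sigma, size=n)) / n
                      for _ in range(reps)])
    return draws.var()

v_small, v_large = var_of_scaled_term(100), var_of_scaled_term(1600)
print(v_small, v_large)  # variance falls roughly by the factor 1600/100 = 16
```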
Stochastic X case. We still have the linear model $y = X\beta + u$, but now make the assumptions: $X$ is full rank, and $\operatorname{plim}\frac{1}{n}X'X = Q$, a finite, nonsingular matrix. Furthermore, we assume $\operatorname{plim}\frac{1}{n}X'u = 0$.
And therefore, as before:

$$\operatorname{plim}\hat\beta = \beta + \operatorname{plim}\left(\frac{X'X}{n}\right)^{-1}\operatorname{plim}\left(\frac{X'u}{n}\right) = \beta + Q^{-1}\cdot 0 = \beta,$$

so the OLS estimator is consistent also when $X$ is stochastic.
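The consistency result can be illustrated with a small Monte Carlo experiment. This sketch draws stochastic regressors independently of $u$, so that $\operatorname{plim}\frac{1}{n}X'u = 0$ holds by construction; the sample sizes and parameter values are invented for the illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
beta = np.array([1.0, 2.0])

def beta_hat(n):
    # Stochastic regressors, drawn independently of u.
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    u = rng.normal(size=n)
    y = X @ beta + u
    return np.linalg.solve(X.T @ X, X.T @ y)

errs = {n: np.abs(beta_hat(n) - beta).max() for n in (100, 10_000, 1_000_000)}
print(errs)  # the estimation error shrinks as n grows
```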