
APPENDIX 4A


10. In this text, we will largely rely on the OLS method for practical reasons: (a) compared to ML, the OLS method is easy to apply; (b) the ML and OLS estimators of β1 and β2 are identical (which is true of multiple regressions too); and (c) even in moderately large samples the OLS and ML estimators of σ2 do not differ vastly.

However, for the benefit of the mathematically inclined reader, a brief introduction to ML is given in the appendix to this chapter and also in Appendix A.

4A.1 MAXIMUM LIKELIHOOD ESTIMATION OF TWO-VARIABLE REGRESSION MODEL

Under the normality assumption, the joint density of the sample Y's, viewed as a function of the unknown parameters β1, β2, and σ2, is called the likelihood function, denoted by LF(β1, β2, σ2), and written as1

$$\mathrm{LF}(\beta_1,\beta_2,\sigma^2)=\frac{1}{\sigma^{n}\left(\sqrt{2\pi}\right)^{n}}\exp\left\{-\frac{1}{2}\sum\frac{(Y_i-\beta_1-\beta_2X_i)^2}{\sigma^2}\right\} \tag{4}$$

The method of maximum likelihood, as the name indicates, consists in estimating the unknown parameters in such a manner that the probability of observing the given Y's is as high (or maximum) as possible. Therefore, we have to find the maximum of the function (4). This is a straightforward exercise in differential calculus. For differentiation it is easier to express (4) in log form as follows.2 (Note: ln = natural log.)

$$\begin{aligned}
\ln \mathrm{LF} &= -n\ln\sigma-\frac{n}{2}\ln(2\pi)-\frac{1}{2}\sum\frac{(Y_i-\beta_1-\beta_2X_i)^2}{\sigma^2}\\
&= -\frac{n}{2}\ln\sigma^2-\frac{n}{2}\ln(2\pi)-\frac{1}{2}\sum\frac{(Y_i-\beta_1-\beta_2X_i)^2}{\sigma^2}
\end{aligned} \tag{5}$$
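To make the maximization of (5) concrete, here is a minimal numerical sketch (an illustration added here, not part of the original text): it codes ln LF as a Python function and evaluates it on synthetic data, showing that parameter values near the data-generating ones give a larger log-likelihood than values far from them. The function name log_likelihood and the simulated numbers are assumptions made for this sketch.

```python
import numpy as np

def log_likelihood(beta1, beta2, sigma2, x, y):
    """ln LF of Eq. (5) for the two-variable model Y_i = beta1 + beta2*X_i + u_i."""
    n = len(y)
    resid = y - beta1 - beta2 * x
    return (-0.5 * n * np.log(sigma2)
            - 0.5 * n * np.log(2 * np.pi)
            - 0.5 * np.sum(resid ** 2) / sigma2)

# Synthetic sample generated with beta1 = 2, beta2 = 0.5, sigma2 = 1
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=50)

print(log_likelihood(2.0, 0.5, 1.0, x, y))  # near the true values: larger ln LF
print(log_likelihood(0.0, 0.0, 1.0, x, y))  # far from the true values: much smaller ln LF
```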

Differentiating (5) partially with respect to β1, β2, and σ2, we obtain

$$\frac{\partial\ln \mathrm{LF}}{\partial\beta_1}=-\frac{1}{\sigma^2}\sum(Y_i-\beta_1-\beta_2X_i)(-1) \tag{6}$$

$$\frac{\partial\ln \mathrm{LF}}{\partial\beta_2}=-\frac{1}{\sigma^2}\sum(Y_i-\beta_1-\beta_2X_i)(-X_i) \tag{7}$$

$$\frac{\partial\ln \mathrm{LF}}{\partial\sigma^2}=-\frac{n}{2\sigma^2}+\frac{1}{2\sigma^4}\sum(Y_i-\beta_1-\beta_2X_i)^2 \tag{8}$$

Setting these equations equal to zero (the first-order condition for optimization) and letting β̃1, β̃2, and σ̃2 denote the ML estimators, we obtain3

$$\frac{1}{\tilde\sigma^2}\sum(Y_i-\tilde\beta_1-\tilde\beta_2X_i)=0 \tag{9}$$

$$\frac{1}{\tilde\sigma^2}\sum(Y_i-\tilde\beta_1-\tilde\beta_2X_i)X_i=0 \tag{10}$$

$$-\frac{n}{2\tilde\sigma^2}+\frac{1}{2\tilde\sigma^4}\sum(Y_i-\tilde\beta_1-\tilde\beta_2X_i)^2=0 \tag{11}$$

1Of course, if β1, β2, and σ2 are known but the Yi are not known, (4) represents the joint probability density function, that is, the probability of jointly observing the Yi.

2Since a log function is a monotonic function, ln LF will attain its maximum value at the same point as LF.

3We use ˜ (tilde) for ML estimators and ˆ (caret, or hat) for OLS estimators.

After simplifying, Eqs. (9) and (10) yield

$$\sum Y_i=n\tilde\beta_1+\tilde\beta_2\sum X_i \tag{12}$$

$$\sum Y_iX_i=\tilde\beta_1\sum X_i+\tilde\beta_2\sum X_i^2 \tag{13}$$

which are precisely the normal equations of the least-squares theory obtained in (3.1.4) and (3.1.5). Therefore, the ML estimators, the β̃'s, are the same as the OLS estimators, the β̂'s, given in (3.1.6) and (3.1.7). This equality is not accidental. Examining the log-likelihood (5), we see that the last term enters with a negative sign. Therefore, maximizing (5) amounts to minimizing this term, which is precisely the least-squares approach, as can be seen from (3.1.2).
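This equivalence can be cross-checked numerically. The sketch below (not part of the original appendix) maximizes ln LF with a general-purpose optimizer and compares the result with the closed-form OLS estimates from (3.1.6) and (3.1.7); the data are simulated, and the reparameterization of σ2 through its logarithm is only a convenience to keep the variance positive.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=60)
y = 4.0 + 1.5 * x + rng.normal(0, 2.0, size=60)   # simulated sample
n = len(y)

def neg_log_likelihood(params):
    b1, b2, log_sigma2 = params          # work with ln(sigma2) so sigma2 stays positive
    sigma2 = np.exp(log_sigma2)
    resid = y - b1 - b2 * x
    return 0.5 * n * np.log(sigma2) + 0.5 * n * np.log(2 * np.pi) + 0.5 * resid @ resid / sigma2

# ML: maximize ln LF, i.e., minimize its negative
start = np.array([y.mean(), 0.0, np.log(y.var())])
res = minimize(neg_log_likelihood, x0=start, method="BFGS")
b1_ml, b2_ml = res.x[:2]

# OLS closed form, as in (3.1.6)-(3.1.7)
b2_ols = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1_ols = y.mean() - b2_ols * x.mean()

print(b1_ml, b1_ols)  # intercepts agree up to optimizer tolerance
print(b2_ml, b2_ols)  # slopes agree up to optimizer tolerance
```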

Substituting the ML (= OLS) estimators into (11) and simplifying, we obtain the ML estimator of σ2 as

$$\begin{aligned}
\tilde\sigma^2 &=\frac{1}{n}\sum(Y_i-\tilde\beta_1-\tilde\beta_2X_i)^2\\
&=\frac{1}{n}\sum(Y_i-\hat\beta_1-\hat\beta_2X_i)^2\\
&=\frac{1}{n}\sum\hat u_i^2
\end{aligned} \tag{14}$$

From (14) it is obvious that the ML estimator σ̃2 differs from the OLS estimator σ̂2 = [1/(n − 2)] Σ û2i, which was shown to be an unbiased estimator of σ2 in Appendix 3A, Section 3A.5. Thus, the ML estimator of σ2 is biased. The magnitude of this bias can be easily determined as follows.

Taking the mathematical expectation of (14) on both sides, we obtain

$$\begin{aligned}
E(\tilde\sigma^2)&=\frac{1}{n}E\left(\sum\hat u_i^2\right)\\
&=\frac{n-2}{n}\,\sigma^2 \qquad\text{using Eq. (16) of Appendix 3A, Section 3A.5}\\
&=\sigma^2-\frac{2}{n}\,\sigma^2
\end{aligned} \tag{15}$$

which shows that σ̃2 is biased downward (i.e., it underestimates the true σ2) in small samples. But notice that as n, the sample size, increases indefinitely, the second term in (15), the bias factor, tends to zero. Therefore, asymptotically (i.e., in a very large sample), σ̃2 is unbiased too, that is, lim E(σ̃2) = σ2 as n → ∞. It can further be proved that σ̃2 is also a consistent estimator4; that is, as n increases indefinitely, σ̃2 converges to its true value σ2.
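The downward bias and its disappearance in large samples can also be seen by simulation. The sketch below (an illustration added here, not part of the original text) draws repeated samples from a known model, computes both variance estimators, and compares their averages with the factor (n − 2)/n of Eq. (15); the chosen parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
beta1, beta2, sigma2 = 1.0, 0.8, 4.0        # arbitrary "true" values for the simulation

def average_estimates(n, reps=5000):
    """Average ML and OLS estimates of sigma^2 over many simulated samples of size n."""
    ml_vals, ols_vals = [], []
    for _ in range(reps):
        x = rng.uniform(0, 10, size=n)
        y = beta1 + beta2 * x + rng.normal(0, np.sqrt(sigma2), size=n)
        b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
        b1 = y.mean() - b2 * x.mean()
        rss = np.sum((y - b1 - b2 * x) ** 2)
        ml_vals.append(rss / n)          # ML estimator, Eq. (14)
        ols_vals.append(rss / (n - 2))   # unbiased OLS estimator
    return np.mean(ml_vals), np.mean(ols_vals)

for n in (10, 50, 500):
    avg_ml, avg_ols = average_estimates(n)
    # avg_ml is close to ((n - 2)/n)*sigma2, per Eq. (15); avg_ols stays near sigma2
    print(n, round(avg_ml, 3), round(avg_ols, 3), round((n - 2) / n * sigma2, 3))
```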

4A.2 MAXIMUM LIKELIHOOD ESTIMATION OF FOOD EXPENDITURE IN INDIA

Return to Example 3.2 and regression (3.7.2), which gives the regression of food expenditure on total expenditure for 55 rural households in India.

Since under the normality assumption the OLS and ML estimators of the regression coefficients are the same, we obtain the ML estimators as β̃1 = β̂1 = 94.2087 and β̃2 = β̂2 = 0.4386. The OLS estimator of σ2 is σ̂2 = 4469.6913, but the ML estimator is σ̃2 = 4307.1563, which is smaller than the OLS estimator. As noted, in small samples the ML estimator is biased downward; that is, on average it underestimates the true variance σ2. Of course, as you would expect, as the sample size gets bigger, the difference between the two estimators will narrow. Putting the values of these estimators into the log-likelihood function, we obtain a value of −308.1625. If you want the maximum value of the LF itself, just take the antilog of −308.1625. No other values of the parameters will give you a higher probability of obtaining the sample that you have used in the analysis.
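The maximized log-likelihood can be obtained without rerunning any optimization: substituting σ̃2 from (14) back into (5) reduces the last term to n/2, so the maximum equals −(n/2)(1 + ln 2π + ln σ̃2), and the two variance estimates are linked by σ̃2 = [(n − 2)/n] σ̂2. The sketch below illustrates this shortcut; because the Indian household data are not reproduced here, it runs on synthetic data of the same sample size, and the helper name ml_summary is an assumption of this sketch.

```python
import numpy as np

def ml_summary(rss, n):
    """Given the residual sum of squares from a fitted two-variable regression,
    return the OLS variance estimate, the ML variance estimate (Eq. (14)),
    and the maximized log-likelihood -(n/2)*(1 + ln(2*pi) + ln(sigma2_ml))."""
    sigma2_ols = rss / (n - 2)
    sigma2_ml = rss / n
    max_loglik = -0.5 * n * (1 + np.log(2 * np.pi) + np.log(sigma2_ml))
    return sigma2_ols, sigma2_ml, max_loglik

# Illustration on synthetic data with the same sample size as the Indian example (n = 55)
rng = np.random.default_rng(3)
x = rng.uniform(100, 900, size=55)
y = 94.0 + 0.44 * x + rng.normal(0, 65.0, size=55)
b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()
rss = np.sum((y - b1 - b2 * x) ** 2)

print(ml_summary(rss, 55))   # sigma2_ols > sigma2_ml, and their ratio is n/(n - 2)
```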

APPENDIX 4A EXERCISES

4.1. "If two random variables are statistically independent, the coefficient of correlation between the two is zero. But the converse is not necessarily true; that is, zero correlation does not imply statistical independence. However, if two variables are normally distributed, zero correlation necessarily implies statistical independence." Verify this statement for the following joint probability density function of two normally distributed variables Y1 and Y2 (this joint probability density function is known as the bivariate normal probability density function):

$$f(Y_1,Y_2)=\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{Y_1-\mu_1}{\sigma_1}\right)^2-\frac{2\rho(Y_1-\mu_1)(Y_2-\mu_2)}{\sigma_1\sigma_2}+\left(\frac{Y_2-\mu_2}{\sigma_2}\right)^2\right]\right\}$$

4See App. A for a general discussion of the properties of the maximum likelihood estimators as well as for the distinction between asymptotic unbiasedness and consistency. Roughly speaking, in asymptotic unbiasedness we try to find out lim E(σ̃n2) as n tends to infinity, where n is the sample size on which the estimator is based, whereas in consistency we try to find out how σ̃n2 behaves as n increases indefinitely. Notice that the unbiasedness property is a repeated sampling property of an estimator based on a sample of given size, whereas in consistency we are concerned with the behavior of an estimator as the sample size increases indefinitely.

where μ1 = mean of Y1
μ2 = mean of Y2
σ1 = standard deviation of Y1
σ2 = standard deviation of Y2
ρ = coefficient of correlation between Y1 and Y2

4.2. By applying the second-order conditions for optimization (i.e., the second-derivative test), show that the ML estimators of β1, β2, and σ2 obtained by solving Eqs. (9), (10), and (11) do in fact maximize the likelihood function (4).

4.3. A random variable X follows the exponential distribution if it has the following probability density function (PDF):

$$f(X)=\begin{cases}\dfrac{1}{\theta}\,e^{-X/\theta} & \text{for } X>0\\[4pt] 0 & \text{elsewhere}\end{cases}$$

where θ > 0 is the parameter of the distribution. Using the ML method, show that the ML estimator of θ is θ̂ = Σ Xi/n, where n is the sample size. That is, show that the ML estimator of θ is the sample mean X̄.

5 TWO-VARIABLE REGRESSION: INTERVAL ESTIMATION AND HYPOTHESIS TESTING

Beware of testing too many hypotheses; the more you torture the data, the more likely they are to confess, but confession obtained under duress may not be admissible in the court of scientific opinion.1

As pointed out in Chapter 4, estimation and hypothesis testing constitute the two major branches of classical statistics. The theory of estimation consists of two parts: point estimation and interval estimation. We have discussed point estimation thoroughly in the previous two chapters, where we introduced the OLS and ML methods of point estimation. In this chapter we first consider interval estimation and then take up the topic of hypothesis testing, a topic intimately related to interval estimation.

5.1 STATISTICAL PREREQUISITES

Before we demonstrate the actual mechanics of establishing confidence intervals and testing statistical hypotheses, it is assumed that the reader is familiar with the fundamental concepts of probability and statistics. Although not a substitute for a basic course in statistics, Appendix A provides the essentials of statistics with which the reader should be totally familiar.

Key concepts such as probability, probability distributions, Type I and Type II errors, level of significance, power of a statistical test, and confidence interval are crucial for understanding the material covered in this and the following chapters.

1Stephen M. Stigler, "Testing Hypotheses or Fitting Models? Another Look at Mass Extinctions," in Matthew H. Nitecki and Antoni Hoffman, eds., Neutral Models in Biology, Oxford University Press, Oxford, 1987, p. 148.

