Microeconometrics:
Binary Dependent Variable
Department of Economics
Universitas Padjadjaran
Additional References
• Dougherty, C., Introduction to Econometrics, 4th ed., Oxford University Press, 2011 (best for basics)
• Golder, M., Advanced Quantitative Analysis: Maximum Likelihood Estimation
Estimators we (will) know
• Ordinary Least Squares (OLS) estimator
– If we have a SLR of $y_i = \beta_0 + \beta_1 x_i + u_i$ and $x_i$ is exogenous, then we have $\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
• Instrumental Variable (IV) estimator
– If we have a SLR of $y_i = \beta_0 + \beta_1 x_i + u_i$ and $x_i$ is endogenous, then we have $\hat{\beta}_1 = \frac{\sum (z_i - \bar{z})(y_i - \bar{y})}{\sum (z_i - \bar{z})(x_i - \bar{x})}$, where $z_i$ is an instrument: correlated with $x_i$ but uncorrelated with $u_i$
• Maximum Likelihood (ML) estimator
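A minimal numerical sketch of the two closed-form estimators above, using simulated data (the data-generating process and all variable names are illustrative, not from the lecture):

```python
# Sketch: closed-form OLS and IV slope estimators on simulated data.
import numpy as np

rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)            # instrument
x = 0.8 * z + rng.normal(size=n)  # regressor (correlated with z)
u = rng.normal(size=n)
y = 1.0 + 2.0 * x + u             # true beta1 = 2

# OLS slope: sum of cross-deviations over sum of squared deviations of x
b_ols = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# IV slope: replace x-deviations in the numerator and one factor of the
# denominator with z-deviations
b_iv = np.sum((z - z.mean()) * (y - y.mean())) / np.sum((z - z.mean()) * (x - x.mean()))

print(b_ols, b_iv)  # both close to 2 here, since x is exogenous in this toy DGP
```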
Why use a binary dependent variable?
• Observed vs unobserved variables
• Suppose we want to analyse the socioeconomic factors underlying why some people:
– Engage in corruption
– Smoke
– Borrow money
– Get a scholarship
Why use a binary dependent variable?
• Observed vs unobserved variables
• It would be best to know (observe)
– The utility derived from corruption, smoking, borrowing money, having a boy/girl-friend(s)…
– The actual (factual) cash flow of families
– A consistent way of measuring poverty
Why use a binary dependent variable?
• Observed vs unobserved variables
• What we observe is that
– Some people engage in corruption
– Some people smoke
– Some people borrow money
– Some people get scholarships
The mechanism
Suppose: $y_i^* = \beta_0 + \beta_1 x_i + u_i$
But $y_i^*$, the utility of smoking, is unobserved.
We, however, observe $y_i = 1$ if $y_i^* > 0$ and $y_i = 0$ otherwise.
The mechanism
So we estimate $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
We know the value of $y_i$: either 0 or 1
• Because of this, we may think of $y_i$ as an event whose outcome is 0 or 1
• Therefore, essentially what we want to know is $P(y_i = 1 \mid x_i)$
The Linear Probability Model
Using the formula for expected value:
$E(y_i \mid x_i) = 1 \cdot P(y_i = 1 \mid x_i) + 0 \cdot P(y_i = 0 \mid x_i) = P(y_i = 1 \mid x_i)$
The Linear Probability Model
If we estimate $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, with $y_i$ either 0 or 1, using OLS, we have a Linear Probability Model (LPM)
The Linear Probability Model
We know from previous lectures about OLS that:
• We assume $E(\varepsilon_i \mid x_i) = 0$
• so we can write $E(y_i \mid x_i) = \beta_0 + \beta_1 x_i$
Therefore we can write our LPM as
$P(y_i = 1 \mid x_i) = \beta_0 + \beta_1 x_i$
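As an illustration, an LPM is just OLS on a 0/1 outcome. A minimal sketch with simulated data (statsmodels assumed available; the data-generating process is invented for illustration):

```python
# Sketch: fit a Linear Probability Model by running OLS on a binary outcome.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 1000
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-(0.5 + 1.2 * x)))  # true underlying probability
y = rng.binomial(1, p)                   # observed binary outcome

X = sm.add_constant(x)
lpm = sm.OLS(y, X).fit()
print(lpm.params)                        # fitted P(y=1|x) = b0 + b1*x
print(lpm.predict(X).min(), lpm.predict(X).max())  # may fall outside [0, 1]
```

Note that the last line previews a limitation discussed below: the fitted "probabilities" are not constrained to [0, 1].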
LPM Interpretation
Suppose we have a more complete set of independent variables:
$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \varepsilon_i$
• We cannot interpret our $\beta$'s as usual, because $y_i$ changes ONLY from 0 to 1 (and vice versa)
LPM Interpretation
Suppose we have a more complete set of independent variables:
• If $x_k$ is continuous:
– “If $x_k$ increases/decreases by 1 (unit), the probability of $y_i = 1$ increases/decreases by $\beta_k \times 100$ percentage points”
LPM Interpretation
Suppose we have a more complete set of independent variables:
• If $x_k$ is a dummy variable (e.g. 1 = male):
– “Suppose there are two individuals who are identical in every respect, except that one is male and the other is female; the probability of $y_i = 1$ for the male is $\beta_k \times 100$ percentage points higher or lower (than for the female)”
Limitations of LPM
• The distribution of the error term does not follow the Normal Distribution, so test statistics are not robust
• Suppose:
– When $y_i = 1$, the error is $\varepsilon_i = 1 - \beta_0 - \beta_1 x_i$, with probability $P(y_i = 1 \mid x_i)$
– and when $y_i = 0$, it is $\varepsilon_i = -\beta_0 - \beta_1 x_i$, with probability $1 - P(y_i = 1 \mid x_i)$
• So $\varepsilon_i$ can take only two values: it follows a Bernoulli-type distribution, not a Normal one
Limitations of LPM
• Heteroskedasticity
Since the error term follows a Bernoulli distribution, the variance of the error term is
$\mathrm{Var}(\varepsilon_i \mid x_i) = P(y_i = 1 \mid x_i)\,[1 - P(y_i = 1 \mid x_i)] = (\beta_0 + \beta_1 x_i)(1 - \beta_0 - \beta_1 x_i)$
which varies with $x_i$
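One standard response to this built-in heteroskedasticity is to report heteroskedasticity-robust standard errors. A minimal sketch, assuming statsmodels and simulated data (HC1 is one of several robust covariance choices):

```python
# Sketch: LPM with heteroskedasticity-robust (White/Huber) standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=1000)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.2 * x))))

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit(cov_type="HC1")  # robust covariance estimator
print(fit.params, fit.bse)              # coefficients and robust SEs
```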
Limitations of LPM
• Non-fulfilment of $0 \le P(y_i = 1 \mid x_i) \le 1$: fitted values from the LPM can fall below 0 or above 1. Does it make sense to report a negative probability, or one greater than 1?
What is a better model for estimating $E(y_i)$?
• Since the probability of an event has to be between 0 and 1, a good model would be a nonlinear function of $x$ whose result never gets negative or larger than 1!
• A class of functions that we have already seen in statistics and that satisfies this requirement: cumulative distribution functions (CDFs)
What is a better model for $E(y_i)$?
• We denote CDFs using the letter F:
$P(y_i = 1 \mid x_i) = F(\beta_0 + \beta_1 x_i)$, where F is a CDF
• Therefore, to model a binary dependent variable we need to choose a CDF and to have an estimation method appropriate for estimating $\beta_0$ and $\beta_1$
Solution
• We need a math function for $P(y_i = 1 \mid x_i)$ that always results in values between 0 and 1
• Whatever the values of the independent variables (from $-\infty$ to $+\infty$), the value of the dependent variable will be between 0 and 1
• In general: $P(y_i = 1 \mid x_i) = F(\beta_0 + \beta_1 x_i)$
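A quick numerical check of this property, assuming scipy is available: both the logistic CDF (used by the logit model) and the standard normal CDF (used by the probit model) map any index value into (0, 1):

```python
# Sketch: any CDF maps the index b0 + b1*x into (0, 1), however extreme x is.
import numpy as np
from scipy.stats import norm, logistic

z = np.array([-50.0, -5.0, 0.0, 5.0, 50.0])  # index values b0 + b1*x
print(logistic.cdf(z))  # logistic CDF -> logit model
print(norm.cdf(z))      # standard normal CDF -> probit model
```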
Solution 1: Logit Model
F can be in the form of the logistic CDF:
$P_i = \frac{1}{1 + e^{-Z_i}}$, where $Z_i = \beta_0 + \beta_1 x_i$
equivalently:
$P_i = \frac{e^{Z_i}}{1 + e^{Z_i}}$
Solution 1: Logit Model
The odds ratio is $\frac{P_i}{1 - P_i} = e^{Z_i}$
Taking the log of both sides:
$L_i = \ln\!\left(\frac{P_i}{1 - P_i}\right) = Z_i = \beta_0 + \beta_1 x_i$
• We call $L_i$ the logit; hence the logit model
• We estimate the logit model using the Maximum Likelihood method
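A minimal estimation sketch, assuming statsmodels is available (simulated data with known true coefficients, so the ML estimates can be checked):

```python
# Sketch: estimating the logit model by maximum likelihood with statsmodels.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(size=1000)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.2 * x))))  # true b0=0.5, b1=1.2

X = sm.add_constant(x)
logit_fit = sm.Logit(y, X).fit()  # fitted by maximum likelihood
print(logit_fit.params)           # estimates of b0 and b1
```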
Logit Model: Coefficients & Marginal Effects
• Coefficients are not marginal effects (not directly interpretable)
– Because of the non-linearity built into the model
• Therefore we compute marginal effects separately
Logit Model: Coefficients & Marginal Effects
To get the marginal effect, we need to differentiate:
$\frac{\partial P_i}{\partial x_i} = \beta_1 \, \frac{e^{-Z_i}}{(1 + e^{-Z_i})^2} = \beta_1 \, P_i (1 - P_i)$
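A sketch of this formula in practice: the hand-computed average of $\beta_1 P_i (1 - P_i)$ should match statsmodels' built-in average marginal effects (simulated data; names illustrative):

```python
# Sketch: logit marginal effects, by hand and via statsmodels' get_margeff.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.normal(size=1000)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.2 * x))))

X = sm.add_constant(x)
fit = sm.Logit(y, X).fit(disp=0)

p = fit.predict(X)                          # fitted P(y=1|x)
ame_manual = (fit.params[1] * p * (1 - p)).mean()
print(ame_manual)
print(fit.get_margeff().summary())          # average marginal effects
```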
Solution 2: Probit Model
Suppose we have an equation: $y_i^* = \beta_0 + \beta_1 x_i + u_i$
But $y_i^*$ is unobservable
What we observe is actually $y_i$, which takes the value of 1 if $y_i^* > 0$ and 0 otherwise
Solution 2: Probit Model
Hence
$P(y_i = 1 \mid x_i) = P(y_i^* > 0) = P(u_i > -(\beta_0 + \beta_1 x_i))$
The distribution of $u_i$ is standard normal
Solution 2: Probit Model
Since the normal distribution is symmetric, we can write
$P(y_i = 1 \mid x_i) = \Phi(\beta_0 + \beta_1 x_i)$
where $\Phi$ is the standard normal CDF, and $\beta_0$ and $\beta_1$ may be estimated using ML
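A minimal probit sketch mirroring the latent-variable mechanism above (simulated data, statsmodels assumed):

```python
# Sketch: probit model P(y=1|x) = Phi(b0 + b1*x), estimated by ML.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.normal(size=1000)
ystar = 0.5 + 1.2 * x + rng.normal(size=1000)  # latent utility
y = (ystar > 0).astype(int)                    # observed binary outcome

X = sm.add_constant(x)
probit_fit = sm.Probit(y, X).fit()
print(probit_fit.params)  # estimates of b0 and b1
```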
Probit Model: Coefficients & Marginal Effects
• Coefficients are not marginal effects (not directly interpretable)
– Because of the non-linearity built into the model
• Therefore we compute marginal effects separately
Probit Model: Coefficients & Marginal Effects
To get the marginal effect, we need to differentiate:
$\frac{\partial P_i}{\partial x_i} = \phi(\beta_0 + \beta_1 x_i)\,\beta_1$
where $\phi$ is the standard normal pdf
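As with the logit, the sample average of $\phi(\beta_0 + \beta_1 x_i)\,\beta_1$ should match the packaged average marginal effects. A sketch under the same simulated setup:

```python
# Sketch: probit marginal effect phi(b0 + b1*x) * b1, averaged over the sample.
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.normal(size=1000)
y = ((0.5 + 1.2 * x + rng.normal(size=1000)) > 0).astype(int)

X = sm.add_constant(x)
fit = sm.Probit(y, X).fit(disp=0)

index = X @ fit.params                      # b0 + b1*x
ame_manual = (norm.pdf(index) * fit.params[1]).mean()
print(ame_manual)
print(fit.get_margeff().summary())          # should agree
```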
Gender Inequality and Poverty in Indonesia: Evidence from Household Data
Kinanti Z. Patria
Estimation of Logit and Probit Models
• We do not use OLS; rather, we use the Maximum Likelihood method
• The MLEs (Maximum Likelihood Estimators) of the unknown parameters are the values of the parameters that maximize the likelihood function
MAXIMUM LIKELIHOOD ESTIMATOR
Maximum Likelihood Estimator
• Remember that our data are random variables
– They follow a certain probability density function (pdf) or probability distribution
• Suppose we have 5 observations of a variable Y
– What are the odds that we would obtain these observations from a normal distribution with mean $\mu$ and variance $\sigma^2$?
Maximum Likelihood Estimator
• “Maximum Likelihood is just a systematic way of searching for the parameter values of our chosen distribution that maximize the probability of observing the data that we observe” (Golder)
Method of ML
• The method of maximum likelihood is intuitively appealing, because we attempt to find the values of the true parameters that would have most likely produced the data that we in fact observed.
• For most cases of practical interest, the performance of maximum likelihood estimators is optimal for large enough samples.
We will illustrate the method with some simple examples.
• Suppose that you have a normally distributed random variable X with unknown population mean $\mu$ and standard deviation $\sigma$, and that you have a sample of two observations, 4 and 6. For the time being, we will assume that $\sigma$ is equal to 1.
With $\sigma = 1$, the probability density function of X is
$f(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(x - \mu)^2}$
Note constants: $\pi = 3.14159$, $e = 2.71828$
Suppose initially you consider the hypothesis $\mu = 3.5$. Under this hypothesis the probability density at 4 would be 0.3521 and that at 6 would be 0.0175.
The joint probability density is the product of these: $0.3521 \times 0.0175 = 0.0062$.
Next consider the hypothesis $\mu = 4.0$. Under this hypothesis the probability densities associated with the two observations are 0.3989 and 0.0540, and the joint probability density is 0.0215.
Next, under the hypothesis $\mu = 4.5$, the probability densities are 0.3521 and 0.1295, and the joint probability density is 0.0456.
Under the hypothesis $\mu = 5.0$, the probability densities are both 0.2420 and the joint probability density is 0.0585.
Under the hypothesis $\mu = 5.5$, the probability densities are 0.1295 and 0.3521, and the joint probability density is 0.0456.
The complete joint density function for all values of $\mu$ peaks at $\mu = 5$, the sample mean of the two observations:

$\mu$    $f(4)$   $f(6)$   joint density
3.5     0.3521   0.0175   0.0062
4.0     0.3989   0.0540   0.0215
4.5     0.3521   0.1295   0.0456
5.0     0.2420   0.2420   0.0585
5.5     0.1295   0.3521   0.0456
Now we will look at the mathematics of the example. If X is normally distributed with mean $\mu$ and standard deviation $\sigma$, its density function is
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$
For the time being, we are assuming $\sigma$ is equal to 1, so the density function simplifies to
$f(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(x - \mu)^2}$
Hence we obtain the probability densities for the observations where X = 4 and X = 6:
$f(4) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(4 - \mu)^2}$, $f(6) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(6 - \mu)^2}$
The joint probability density for the two observations in the sample is just the product of their individual densities:
joint density $= f(4) \times f(6)$
In maximum likelihood estimation we choose as our estimate of $\mu$ the value that gives us the greatest joint density for the observations in our sample. This value is associated with the greatest probability, or maximum likelihood, of obtaining the observations in the sample.
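A short sketch reproducing the two-observation example numerically, assuming scipy is available: a grid search over $\mu$ recovers the peak at the sample mean, 5, with joint density about 0.0585, matching the table above:

```python
# Sketch: likelihood of mu given observations 4 and 6 (sigma = 1).
import numpy as np
from scipy.stats import norm

obs = np.array([4.0, 6.0])
grid = np.arange(3.5, 6.51, 0.01)
likelihood = np.array([norm.pdf(obs, loc=m, scale=1.0).prod() for m in grid])

print(grid[likelihood.argmax()])  # approximately 5.0, the sample mean
print(likelihood.max())           # about 0.0585, matching the table
```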
MLE AND REGRESSION ANALYSIS
[Figure: the ex ante distribution of $Y_i$ around the regression line $Y = \beta_1 + \beta_2 X_i$]
Potential values of Y close to $\beta_1 + \beta_2 X_i$ will have relatively large densities, while potential values of Y relatively far from $\beta_1 + \beta_2 X_i$ will have small densities.
The mean value of the distribution of $Y_i$ is $\beta_1 + \beta_2 X_i$. Its standard deviation is $\sigma$, the standard deviation of the disturbance term.
Hence the density function for the ex ante distribution of $Y_i$ is
$f(Y_i) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{Y_i - \beta_1 - \beta_2 X_i}{\sigma}\right)^2}$
The joint density function for the observations on Y is the product of their individual densities:
$f(Y_1, \dots, Y_n) = f(Y_1) \times \dots \times f(Y_n)$
Now, taking $\beta_1$, $\beta_2$ and $\sigma$ as our choice variables, and taking the data on Y and X as given, we can re-interpret this function as the likelihood function for $\beta_1$, $\beta_2$, and $\sigma$. REMEMBER THIS.
We will choose $\beta_1$, $\beta_2$, and $\sigma$ so as to maximize the likelihood, given the data on Y and X. As usual, it is easier to do this indirectly, maximizing the log-likelihood instead:
$\log L(\beta_1, \beta_2, \sigma) = \log \left[ f(Y_1) \times \dots \times f(Y_n) \right]$
As usual, the first step is to decompose the expression as the sum of the logarithms of the factors:
$\log L = \log f(Y_1) + \dots + \log f(Y_n)$
Then we split the logarithm of each factor into two components. The first component is the same in each case:
$\log f(Y_i) = \log \frac{1}{\sigma\sqrt{2\pi}} - \frac{1}{2}\left(\frac{Y_i - \beta_1 - \beta_2 X_i}{\sigma}\right)^2$
Hence the log-likelihood simplifies to
$\log L = n \log \frac{1}{\sigma\sqrt{2\pi}} - \frac{1}{2\sigma^2} Z$, where $Z = \sum_{i=1}^{n} (Y_i - \beta_1 - \beta_2 X_i)^2$
To maximize the log-likelihood, we need to minimize Z. But choosing estimators of $\beta_1$ and $\beta_2$ to minimize Z is exactly what we did when we derived the least squares regression coefficients.
Thus, for this regression model, the maximum likelihood estimators of $\beta_1$ and $\beta_2$ are identical to the least squares estimators.
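A numerical check of this equivalence, assuming scipy is available: maximizing the normal log-likelihood over $(\beta_1, \beta_2, \sigma)$ reproduces the least squares coefficients (simulated data; parameterizing $\sigma$ on the log scale keeps it positive):

```python
# Sketch: numerical ML for the normal regression model vs. least squares.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(7)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)

def neg_loglik(theta):
    b1, b2, log_s = theta
    return -norm.logpdf(y, loc=b1 + b2 * x, scale=np.exp(log_s)).sum()

mle = minimize(neg_loglik, x0=np.zeros(3))
X = np.column_stack([np.ones_like(x), x])
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(mle.x[:2], b_ols)  # essentially identical
```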
As a consequence, Z will be the sum of the squares of the least squares residuals:
$Z = \sum_{i=1}^{n} e_i^2$, where $e_i = Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i$
To obtain the maximum likelihood estimator of $\sigma$, it is convenient to rearrange the log-likelihood function as
$\log L = -n \log \sigma - n \log \sqrt{2\pi} - \frac{1}{2\sigma^2} Z$
Differentiating it with respect to $\sigma$, we obtain
$\frac{\partial \log L}{\partial \sigma} = -\frac{n}{\sigma} + \frac{Z}{\sigma^3}$
The first-order condition for a maximum requires this to be equal to zero. Hence the maximum likelihood estimator of the variance is the sum of the squares of the residuals divided by n:
$\hat{\sigma}^2 = \frac{Z}{n} = \frac{1}{n} \sum_{i=1}^{n} e_i^2$
Note that this is biased for finite samples. To obtain an unbiased estimator, we should divide by n − k, where k is the number of parameters, in this case 2. However, the bias disappears as the sample size becomes large.
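A small Monte Carlo sketch of this bias (simulated data; the sample size and number of repetitions are illustrative): across repetitions, Z/n averages about $(n-2)/n$ of the true variance, while Z/(n − 2) is centered on the true value:

```python
# Sketch: Z/n (ML variance estimator) is biased downward; Z/(n-2) corrects it.
import numpy as np

rng = np.random.default_rng(8)
n, reps = 20, 5000            # true disturbance variance is 1.0
est_ml, est_unbiased = [], []

for _ in range(reps):
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]  # least squares residuals
    Z = (e ** 2).sum()
    est_ml.append(Z / n)
    est_unbiased.append(Z / (n - 2))

print(np.mean(est_ml), np.mean(est_unbiased))  # ~0.9 vs ~1.0 when n = 20
```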