Microeconometrics:
Binary Dependent Variable
Department of Economics
Universitas Padjadjaran
Additional References
• Dougherty, C., Introduction to Econometrics, 4th ed., Oxford University Press, 2011 (best for basics)
• Golder, M., Advanced Quantitative Analysis: Maximum Likelihood Estimation
Estimators we (will) know
• Ordinary Least Squares (OLS) estimator
– If we have a SLR of $y_i = \beta_0 + \beta_1 x_i + u_i$ and $x_i$ is exogenous, then we have $\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
• Instrumental Variable (IV) estimator
– If we have a SLR of $y_i = \beta_0 + \beta_1 x_i + u_i$ and $x_i$ is endogenous, then we have $\hat{\beta}_1 = \frac{\sum (z_i - \bar{z})(y_i - \bar{y})}{\sum (z_i - \bar{z})(x_i - \bar{x})}$, where $z_i$ is an instrument: correlated with $x_i$ but uncorrelated with $u_i$
• Maximum Likelihood (ML) estimator
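A minimal numerical sketch of the two closed-form estimators above, using simulated data (the data-generating process and all variable names are illustrative, not from the lecture):

```python
# Sketch: closed-form OLS and IV slope estimators on simulated data.
import numpy as np

rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)            # instrument
x = 0.8 * z + rng.normal(size=n)  # regressor (correlated with z)
u = rng.normal(size=n)
y = 1.0 + 2.0 * x + u             # true beta1 = 2

# OLS slope: sum of cross-deviations over sum of squared deviations of x
b_ols = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# IV slope: replace x-deviations in the numerator and one factor of the
# denominator with z-deviations
b_iv = np.sum((z - z.mean()) * (y - y.mean())) / np.sum((z - z.mean()) * (x - x.mean()))

print(b_ols, b_iv)  # both close to 2 here, since x is exogenous in this toy DGP
```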
Why use a binary dependent variable?
• Observed vs unobserved variables
• Suppose we want to analyse the socioeconomic factors underlying why some people:
– Engage in corruption
– Smoke
– Borrow money
– Get a scholarship
Why use a binary dependent variable?
• Observed vs unobserved variables
• It would be best to know (observe)
– The utility derived from corruption, smoking, borrowing money, having a boy/girl-friend(s)…
– The actual (factual) cash flow of families
– A consistent way of measuring poverty
Why use a binary dependent variable?
• Observed vs unobserved variables
• What we observe is that
– Some people engage in corruption
– Some people smoke
– Some people borrow money
– Some people get scholarships
The mechanism
Suppose: $y_i^* = \beta_0 + \beta_1 x_i + u_i$
But $y_i^*$, the utility of smoking, is unobserved.
We, however, observe $y_i = 1$ if $y_i^* > 0$ and $y_i = 0$ otherwise.
The mechanism
So we estimate $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
We know the value of $y_i$: either 0 or 1
• Because of this, we may think of $y_i$ as an event whose outcome is 0 or 1
• Therefore, essentially what we want to know is $P(y_i = 1 \mid x_i)$
The Linear Probability Model
Using the formula for expected value:
$E(y_i \mid x_i) = 1 \cdot P(y_i = 1 \mid x_i) + 0 \cdot P(y_i = 0 \mid x_i) = P(y_i = 1 \mid x_i)$
The Linear Probability Model
If we estimate $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, with $y_i$ either 0 or 1, using OLS, we have a Linear Probability Model (LPM)
The Linear Probability Model
We know from previous lectures about OLS that:
• We assume $E(\varepsilon_i \mid x_i) = 0$
• so we can write $E(y_i \mid x_i) = \beta_0 + \beta_1 x_i$
Therefore we can write our LPM as
$P(y_i = 1 \mid x_i) = \beta_0 + \beta_1 x_i$
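As an illustration, an LPM is just OLS on a 0/1 outcome. A minimal sketch with simulated data (statsmodels assumed available; the data-generating process is invented for illustration):

```python
# Sketch: fit a Linear Probability Model by running OLS on a binary outcome.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 1000
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-(0.5 + 1.2 * x)))  # true underlying probability
y = rng.binomial(1, p)                   # observed binary outcome

X = sm.add_constant(x)
lpm = sm.OLS(y, X).fit()
print(lpm.params)                        # fitted P(y=1|x) = b0 + b1*x
print(lpm.predict(X).min(), lpm.predict(X).max())  # may fall outside [0, 1]
```

Note that the last line previews a limitation discussed below: the fitted "probabilities" are not constrained to [0, 1].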
LPM Interpretation
Suppose we have a more complete set of independent variables:
$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \varepsilon_i$
• We cannot interpret our $\beta$'s as usual, because $y_i$ changes ONLY from 0 to 1 (and vice versa)
LPM Interpretation
Suppose we have a more complete set of independent variables:
• If $x_k$ is continuous:
– “If $x_k$ increases/decreases by 1 (unit), the probability of $y_i = 1$ increases/decreases by $\beta_k \times 100$ percentage points”
LPM Interpretation
Suppose we have a more complete set of independent variables:
• If $x_k$ is a dummy variable (e.g. 1 = male):
– “Suppose there are two individuals who are identical in every respect, except that one is male and the other is female; the probability of $y_i = 1$ for the male is $\beta_k \times 100$ percentage points higher or lower (than for the female)”
Limitations of LPM
• The distribution of the error term does not follow the Normal Distribution, so test statistics are not robust
• Suppose:
– When $y_i = 1$, the error is $\varepsilon_i = 1 - \beta_0 - \beta_1 x_i$, with probability $P(y_i = 1 \mid x_i)$
– and when $y_i = 0$, it is $\varepsilon_i = -\beta_0 - \beta_1 x_i$, with probability $1 - P(y_i = 1 \mid x_i)$
• So $\varepsilon_i$ can take only two values: it follows a Bernoulli-type distribution, not a Normal one
Limitations of LPM
• Heteroskedasticity
Since the error term follows a Bernoulli distribution, the variance of the error term is
$\mathrm{Var}(\varepsilon_i \mid x_i) = P(y_i = 1 \mid x_i)\,[1 - P(y_i = 1 \mid x_i)] = (\beta_0 + \beta_1 x_i)(1 - \beta_0 - \beta_1 x_i)$
which varies with $x_i$
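One standard response to this built-in heteroskedasticity is to report heteroskedasticity-robust standard errors. A minimal sketch, assuming statsmodels and simulated data (HC1 is one of several robust covariance choices):

```python
# Sketch: LPM with heteroskedasticity-robust (White/Huber) standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=1000)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.2 * x))))

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit(cov_type="HC1")  # robust covariance estimator
print(fit.params, fit.bse)              # coefficients and robust SEs
```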
Limitations of LPM
• Non-fulfilment of $0 \le P(y_i = 1 \mid x_i) \le 1$: fitted values from the LPM can fall below 0 or above 1. Does it make sense to report a negative probability, or one greater than 1?
What is a better model for estimating $E(y_i)$?
• Since the probability of an event has to be between 0 and 1, a good model would be a nonlinear function of $x$ whose result never gets negative or larger than 1!
• A class of functions that we have already seen in statistics and that satisfies this requirement: cumulative distribution functions (CDFs)
What is a better model for $E(y_i)$?
• We denote CDFs using the letter F:
$P(y_i = 1 \mid x_i) = F(\beta_0 + \beta_1 x_i)$, where F is a CDF
• Therefore, to model a binary dependent variable we need to choose a CDF and to have an estimation method appropriate for estimating $\beta_0$ and $\beta_1$
Solution
• We need a math function for $P(y_i = 1 \mid x_i)$ that always results in values between 0 and 1
• Whatever the values of the independent variables (from $-\infty$ to $+\infty$), the value of the dependent variable will be between 0 and 1
• In general: $P(y_i = 1 \mid x_i) = F(\beta_0 + \beta_1 x_i)$
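A quick numerical check of this property, assuming scipy is available: both the logistic CDF (used by the logit model) and the standard normal CDF (used by the probit model) map any index value into (0, 1):

```python
# Sketch: any CDF maps the index b0 + b1*x into (0, 1), however extreme x is.
import numpy as np
from scipy.stats import norm, logistic

z = np.array([-50.0, -5.0, 0.0, 5.0, 50.0])  # index values b0 + b1*x
print(logistic.cdf(z))  # logistic CDF -> logit model
print(norm.cdf(z))      # standard normal CDF -> probit model
```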
Solution 1: Logit Model
F can be in the form of the logistic CDF:
$P_i = \frac{1}{1 + e^{-Z_i}}$, where $Z_i = \beta_0 + \beta_1 x_i$
equivalently:
$P_i = \frac{e^{Z_i}}{1 + e^{Z_i}}$
Solution 1: Logit Model
The odds ratio is $\frac{P_i}{1 - P_i} = e^{Z_i}$
Taking the log of both sides:
$L_i = \ln\!\left(\frac{P_i}{1 - P_i}\right) = Z_i = \beta_0 + \beta_1 x_i$
• We call $L_i$ the logit; hence the logit model
• We estimate the logit model using the Maximum Likelihood method
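A minimal estimation sketch, assuming statsmodels is available (simulated data with known true coefficients, so the ML estimates can be checked):

```python
# Sketch: estimating the logit model by maximum likelihood with statsmodels.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(size=1000)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.2 * x))))  # true b0=0.5, b1=1.2

X = sm.add_constant(x)
logit_fit = sm.Logit(y, X).fit()  # fitted by maximum likelihood
print(logit_fit.params)           # estimates of b0 and b1
```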
Logit Model: Coefficients & Marginal Effects
• Coefficients are not marginal effects (not directly interpretable)
– Because of the non-linearity built into the model
• Therefore we compute marginal effects separately
Logit Model: Coefficients & Marginal Effects
To get the marginal effect, we need to differentiate:
$\frac{\partial P_i}{\partial x_i} = \beta_1 \, \frac{e^{-Z_i}}{(1 + e^{-Z_i})^2} = \beta_1 \, P_i (1 - P_i)$
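A sketch of this formula in practice: the hand-computed average of $\beta_1 P_i (1 - P_i)$ should match statsmodels' built-in average marginal effects (simulated data; names illustrative):

```python
# Sketch: logit marginal effects, by hand and via statsmodels' get_margeff.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.normal(size=1000)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.2 * x))))

X = sm.add_constant(x)
fit = sm.Logit(y, X).fit(disp=0)

p = fit.predict(X)                          # fitted P(y=1|x)
ame_manual = (fit.params[1] * p * (1 - p)).mean()
print(ame_manual)
print(fit.get_margeff().summary())          # average marginal effects
```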
Solution 2: Probit Model
Suppose we have an equation: $y_i^* = \beta_0 + \beta_1 x_i + u_i$
But $y_i^*$ is unobservable
What we observe is actually $y_i$, which takes the value of 1 if $y_i^* > 0$ and 0 otherwise
Solution 2: Probit Model
Hence
$P(y_i = 1 \mid x_i) = P(y_i^* > 0) = P(u_i > -(\beta_0 + \beta_1 x_i))$
The distribution of $u_i$ is standard normal
Solution 2: Probit Model
Since the normal distribution is symmetric, we can write
$P(y_i = 1 \mid x_i) = \Phi(\beta_0 + \beta_1 x_i)$
where $\Phi$ is the standard normal CDF, and $\beta_0$ and $\beta_1$ may be estimated using ML
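A minimal probit sketch mirroring the latent-variable mechanism above (simulated data, statsmodels assumed):

```python
# Sketch: probit model P(y=1|x) = Phi(b0 + b1*x), estimated by ML.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.normal(size=1000)
ystar = 0.5 + 1.2 * x + rng.normal(size=1000)  # latent utility
y = (ystar > 0).astype(int)                    # observed binary outcome

X = sm.add_constant(x)
probit_fit = sm.Probit(y, X).fit()
print(probit_fit.params)  # estimates of b0 and b1
```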
Probit Model: Coefficients & Marginal Effects
• Coefficients are not marginal effects (not directly interpretable)
– Because of the non-linearity built into the model
• Therefore we compute marginal effects separately
Probit Model: Coefficients & Marginal Effects
To get the marginal effect, we need to differentiate:
$\frac{\partial P_i}{\partial x_i} = \phi(\beta_0 + \beta_1 x_i)\,\beta_1$
where $\phi$ is the standard normal pdf
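As with the logit, the sample average of $\phi(\beta_0 + \beta_1 x_i)\,\beta_1$ should match the packaged average marginal effects. A sketch under the same simulated setup:

```python
# Sketch: probit marginal effect phi(b0 + b1*x) * b1, averaged over the sample.
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.normal(size=1000)
y = ((0.5 + 1.2 * x + rng.normal(size=1000)) > 0).astype(int)

X = sm.add_constant(x)
fit = sm.Probit(y, X).fit(disp=0)

index = X @ fit.params                      # b0 + b1*x
ame_manual = (norm.pdf(index) * fit.params[1]).mean()
print(ame_manual)
print(fit.get_margeff().summary())          # should agree
```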
Gender Inequality and Poverty in Indonesia: Evidence from Household Data
Kinanti Z. Patria
Estimation of Logit and Probit Models
• We do not use OLS; rather, we use the Maximum Likelihood method
• The MLEs (Maximum Likelihood Estimators) of the unknown parameters are the values of the parameters that maximize the likelihood function
MAXIMUM LIKELIHOOD ESTIMATOR
Maximum Likelihood Estimator
• Remember that our data are random variables
– They follow a certain probability density function (pdf) or probability distribution
• Suppose we have 5 observations of a variable Y
– What are the odds that we would obtain these observations from a normal distribution with mean $\mu$ and variance $\sigma^2$?
Maximum Likelihood Estimator
• “Maximum Likelihood is just a systematic way of searching for the parameter values of our chosen distribution that maximize the probability of observing the data that we observe” (Golder)
Method of ML
• The method of maximum likelihood is intuitively appealing, because we attempt to find the values of the true parameters that would have most likely produced the data that we in fact observed.
• For most cases of practical interest, the performance of maximum likelihood estimators is optimal for large enough samples.
We will illustrate the method with some simple examples.
• Suppose that you have a normally distributed random variable X with unknown population mean $\mu$ and standard deviation $\sigma$, and that you have a sample of two observations, 4 and 6. For the time being, we will assume that $\sigma$ is equal to 1.
With $\sigma = 1$, the probability density function of X is
$f(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(x - \mu)^2}$
Note constants: $\pi = 3.14159$, $e = 2.71828$
Suppose initially you consider the hypothesis $\mu = 3.5$. Under this hypothesis the probability density at 4 would be 0.3521 and that at 6 would be 0.0175.
The joint probability density is the product of these: $0.3521 \times 0.0175 = 0.0062$.
Next consider the hypothesis $\mu = 4.0$. Under this hypothesis the probability densities associated with the two observations are 0.3989 and 0.0540, and the joint probability density is 0.0215.
Next, under the hypothesis $\mu = 4.5$, the probability densities are 0.3521 and 0.1295, and the joint probability density is 0.0456.
Under the hypothesis $\mu = 5.0$, the probability densities are both 0.2420 and the joint probability density is 0.0585.
Under the hypothesis $\mu = 5.5$, the probability densities are 0.1295 and 0.3521, and the joint probability density is 0.0456.
The complete joint density function for all values of $\mu$ peaks at $\mu = 5$, the sample mean of the two observations:

$\mu$    $f(4)$   $f(6)$   joint density
3.5     0.3521   0.0175   0.0062
4.0     0.3989   0.0540   0.0215
4.5     0.3521   0.1295   0.0456
5.0     0.2420   0.2420   0.0585
5.5     0.1295   0.3521   0.0456
Now we will look at the mathematics of the example. If X is normally distributed with mean $\mu$ and standard deviation $\sigma$, its density function is
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$
For the time being, we are assuming $\sigma$ is equal to 1, so the density function simplifies to
$f(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(x - \mu)^2}$
Hence we obtain the probability densities for the observations where X = 4 and X = 6:
$f(4) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(4 - \mu)^2}$, $f(6) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(6 - \mu)^2}$
The joint probability density for the two observations in the sample is just the product of their individual densities:
joint density $= f(4) \times f(6)$
In maximum likelihood estimation we choose as our estimate of $\mu$ the value that gives us the greatest joint density for the observations in our sample. This value is associated with the greatest probability, or maximum likelihood, of obtaining the observations in the sample.
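A short sketch reproducing the two-observation example numerically, assuming scipy is available: a grid search over $\mu$ recovers the peak at the sample mean, 5, with joint density about 0.0585, matching the table above:

```python
# Sketch: likelihood of mu given observations 4 and 6 (sigma = 1).
import numpy as np
from scipy.stats import norm

obs = np.array([4.0, 6.0])
grid = np.arange(3.5, 6.51, 0.01)
likelihood = np.array([norm.pdf(obs, loc=m, scale=1.0).prod() for m in grid])

print(grid[likelihood.argmax()])  # approximately 5.0, the sample mean
print(likelihood.max())           # about 0.0585, matching the table
```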
MLE AND REGRESSION ANALYSIS
[Figure: the ex ante distribution of $Y_i$ around the regression line $Y = \beta_1 + \beta_2 X_i$]
Potential values of Y close to $\beta_1 + \beta_2 X_i$ will have relatively large densities, while potential values of Y relatively far from $\beta_1 + \beta_2 X_i$ will have small densities.
The mean value of the distribution of $Y_i$ is $\beta_1 + \beta_2 X_i$. Its standard deviation is $\sigma$, the standard deviation of the disturbance term.
Hence the density function for the ex ante distribution of $Y_i$ is
$f(Y_i) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{Y_i - \beta_1 - \beta_2 X_i}{\sigma}\right)^2}$
The joint density function for the observations on Y is the product of their individual densities:
$f(Y_1, \dots, Y_n) = f(Y_1) \times \dots \times f(Y_n)$
Now, taking $\beta_1$, $\beta_2$ and $\sigma$ as our choice variables, and taking the data on Y and X as given, we can re-interpret this function as the likelihood function for $\beta_1$, $\beta_2$, and $\sigma$. REMEMBER THIS.
We will choose $\beta_1$, $\beta_2$, and $\sigma$ so as to maximize the likelihood, given the data on Y and X. As usual, it is easier to do this indirectly, maximizing the log-likelihood instead:
$\log L(\beta_1, \beta_2, \sigma) = \log \left[ f(Y_1) \times \dots \times f(Y_n) \right]$
As usual, the first step is to decompose the expression as the sum of the logarithms of the factors:
$\log L = \log f(Y_1) + \dots + \log f(Y_n)$
Then we split the logarithm of each factor into two components. The first component is the same in each case:
$\log f(Y_i) = \log \frac{1}{\sigma\sqrt{2\pi}} - \frac{1}{2}\left(\frac{Y_i - \beta_1 - \beta_2 X_i}{\sigma}\right)^2$
Hence the log-likelihood simplifies to
$\log L = n \log \frac{1}{\sigma\sqrt{2\pi}} - \frac{1}{2\sigma^2} Z$, where $Z = \sum_{i=1}^{n} (Y_i - \beta_1 - \beta_2 X_i)^2$
To maximize the log-likelihood, we need to minimize Z. But choosing estimators of $\beta_1$ and $\beta_2$ to minimize Z is exactly what we did when we derived the least squares regression coefficients.
Thus, for this regression model, the maximum likelihood estimators of $\beta_1$ and $\beta_2$ are identical to the least squares estimators.
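A numerical check of this equivalence, assuming scipy is available: maximizing the normal log-likelihood over $(\beta_1, \beta_2, \sigma)$ reproduces the least squares coefficients (simulated data; parameterizing $\sigma$ on the log scale keeps it positive):

```python
# Sketch: numerical ML for the normal regression model vs. least squares.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(7)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)

def neg_loglik(theta):
    b1, b2, log_s = theta
    return -norm.logpdf(y, loc=b1 + b2 * x, scale=np.exp(log_s)).sum()

mle = minimize(neg_loglik, x0=np.zeros(3))
X = np.column_stack([np.ones_like(x), x])
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(mle.x[:2], b_ols)  # essentially identical
```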
As a consequence, Z will be the sum of the squares of the least squares residuals:
$Z = \sum_{i=1}^{n} e_i^2$, where $e_i = Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i$
To obtain the maximum likelihood estimator of $\sigma$, it is convenient to rearrange the log-likelihood function as
$\log L = -n \log \sigma - n \log \sqrt{2\pi} - \frac{1}{2\sigma^2} Z$
Differentiating it with respect to $\sigma$, we obtain
$\frac{\partial \log L}{\partial \sigma} = -\frac{n}{\sigma} + \frac{Z}{\sigma^3}$
The first-order condition for a maximum requires this to be equal to zero. Hence the maximum likelihood estimator of the variance is the sum of the squares of the residuals divided by n:
$\hat{\sigma}^2 = \frac{Z}{n} = \frac{1}{n} \sum_{i=1}^{n} e_i^2$
Note that this is biased for finite samples. To obtain an unbiased estimator, we should divide by n − k, where k is the number of parameters, in this case 2. However, the bias disappears as the sample size becomes large.
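A small Monte Carlo sketch of this bias (simulated data; the sample size and number of repetitions are illustrative): across repetitions, Z/n averages about $(n-2)/n$ of the true variance, while Z/(n − 2) is centered on the true value:

```python
# Sketch: Z/n (ML variance estimator) is biased downward; Z/(n-2) corrects it.
import numpy as np

rng = np.random.default_rng(8)
n, reps = 20, 5000            # true disturbance variance is 1.0
est_ml, est_unbiased = [], []

for _ in range(reps):
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]  # least squares residuals
    Z = (e ** 2).sum()
    est_ml.append(Z / n)
    est_unbiased.append(Z / (n - 2))

print(np.mean(est_ml), np.mean(est_unbiased))  # ~0.9 vs ~1.0 when n = 20
```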