Semiparametric qualitative response model estimation with unknown heteroscedasticity or instrumental variables

Arthur Lewbel*

Department of Economics, Boston College, 140 Commonwealth Ave., Chestnut Hill, MA 02467, USA

Received 1 June 1998; received in final form 1 August 1999; accepted 1 August 1999

Abstract

This paper provides estimators of discrete choice models, including binary, ordered, and multinomial response (choice) models. The estimators closely resemble ordinary and two-stage least squares. The distribution of the model's latent variable error is unknown and may be related to the regressors, e.g., the model could have errors that are heteroscedastic or correlated with regressors. The estimator does not require numerical searches, even for multinomial choice. For ordered and binary choice models the estimator is root N consistent and asymptotically normal. A consistent estimator of the conditional error distribution is also provided. © 2000 Elsevier Science S.A. All rights reserved.

JEL classification: C14; C25; C13

Keywords: Semiparametric; Discrete choice; Qualitative response; Heteroscedasticity; Binomial response; Multinomial response; Measurement error; Instrumental variables; Binary choice; Ordered choice; Latent variable models

1. Introduction

A new estimator for qualitative response models is proposed, including binary, multinomial, ordered response, and other discrete choice models. The distribution of the latent variable error is unknown and may be related to the regressors; for example, the model can suffer from conditional heteroscedasticity of unknown form. An instrumental variables version of the estimator can deal with some forms of endogenous or mismeasured regressors. The estimator does not require numerical searches, even for multinomial choice models. For binary and ordered choice the estimator is root N consistent and asymptotically normal. A consistent estimator of the conditional error distribution is also provided.

*Tel.: +1-617-552-3678; fax: +1-617-552-2308. E-mail address: [email protected] (A. Lewbel).

This paper's estimator treats one regressor as special. Let v denote the special regressor, and let x be a J-vector of other regressors. Assume that the coefficient of v is positive (otherwise replace v with -v; an estimator of the sign of v's coefficient is provided). Without loss of generality normalize this coefficient to equal one. With this normalization, the standard latent variable binary choice or binomial response model (see, e.g., Maddala, 1983 or McFadden, 1984 for surveys) is

y_i = I(v_i + x_i^T b + e_i > 0),   (1.1)

where i indexes observations, y is the dependent variable, e is the unobserved latent variable error, b is a J-vector of coefficients, T denotes transpose, and I(·) is the indicator function that equals one if its argument is true and zero otherwise. The data are N observations of y, v, and x.

Let f(v|x) denote the conditional probability density function of v given an observation x, which can be estimated from the data. This paper's main result is that, under fairly general conditions,

b = E(x x^T)^{-1} E[ x (y - I(v > 0)) / f(v|x) ],

so b can be estimated by an ordinary least-squares linear regression of ỹ on x, where ỹ_i = [y_i - I(v_i > 0)] / f(v_i|x_i).

If x is correlated with e, as in a model with mismeasured regressors, b can be estimated by an analogous two-stage least-squares linear regression based on

E(z x^T) b = E[ z (y - I(v > 0)) / f(v|z) ],

where z is a vector of instruments that are uncorrelated with e.

Let F_e(e|·) denote the conditional distribution of an observation of e given the conditioning data ·. The minimal uncorrelated error assumption for linear models,

E(e x) = 0,   (1.2)

(or Eq. (1.4)) is not generally sufficient to identify b in the binary choice model (1.1). An additional assumption that is made for identification and estimation here is that the distribution of e be conditionally independent of the one regressor v, or equivalently

F_e(e|v, x) = F_e(e|x).   (1.3)

The distribution of v will also be assumed to have a large support. Eqs. (1.2) and (1.3) require that the error distribution F_e does not depend on v, but permit virtually any form of heteroscedasticity with respect to the vector of other regressors x.

The goal is estimation of the entire model, both the vector b and the conditional error distribution F_e(e|x). Linearly regressing an estimate of ỹ on x yields a root N consistent and asymptotically normal estimator for b. A uniformly consistent estimator of the distribution function F_e(e|x), and of its density function f_e(e|x), is also provided.

Lewbel (1998) uses similar assumptions to those here to identify general latent variable models of the form y = L(v + x^T b + e) for some function L, but the two papers differ in some fundamental ways. The identification in Lewbel (1998) comes from the change of variables argument

∫ v h(y) dv = ∫ v h[L(v + x^T b + e)] dv = ∫ (w - x^T b - e) h[L(w)] dw = ∫ w h[L(w)] dw - (x^T b + e) ∫ h[L(w)] dw,

which is linear in x^T b + e, and hence permits estimation of b from the conditional expectation of the above object. This estimator works for many L functions but not for binary choice, and requires selection of a 'tuning' function h that possesses special properties.

The present paper's identification method is quite different, resulting in a simpler estimator. This paper's estimator is based on the observation that y = I(v > -x^T b - e) and that

∫_L^K [I(v > -x^T b - e) - I(v > 0)] dv = x^T b + e

(as long as L and K are large enough). Eq. (1.3) makes E(ỹ|x) equal the expectation of this integral given x, and the least-squares estimator then follows directly.
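The integral identity above can be checked numerically; the following sketch (with an arbitrary illustrative value standing in for -x^T b - e) verifies that the integral equals x^T b + e:

```python
import numpy as np

L_, K_ = -10.0, 10.0                        # support bounds of v
s = -3.7                                     # plays the role of -x'b - e
v = np.linspace(L_, K_, 2_000_001)
dv = v[1] - v[0]
integrand = (v > s).astype(float) - (v > 0).astype(float)
print(integrand.sum() * dv, -s)              # both approximately 3.7
```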

A perhaps more intuitive estimator can be derived by observing that, given Eq. (1.3), E(y|v,x) equals the conditional distribution function of -x^T b - e, evaluated at v. From this distribution a direct estimate of the conditional mean of -x^T b - e given x can be constructed, and b could then be estimated by regressing that conditional mean estimate on x. An integration by parts argument is later used to show that this alternative estimator is equivalent in expectation to regressing ỹ on x.

Many estimators exist for binary choice models. Standard maximum likelihood estimation requires a finitely parameterized conditional distribution F_e(e|x). Semiparametric and quasi-maximum likelihood estimators of linear latent variable qualitative response models include Cosslett (1983), Ruud (1983), Powell et al. (1989), Ichimura (1993), Klein and Spady (1993), Newey and Ruud (1994), and Härdle and Horowitz (1996). To estimate (1.1), these estimators require either that F_e(e|v,x) = F_e(e) or that F_e(e|v,x) = F_e(e|v + x^T b), and hence they impose far more restrictions on e than does Eq. (1.3) or (1.5). The present paper shares the density-weighting feature of the Ruud papers, though instead of making all of the regressors appear normal as Ruud requires, the present paper's weighting involves only the single special regressor v.

Other semiparametric estimators of binary choice models assume the conditional median or other known quantile of e given x is zero. Examples are Manski (1975, 1985) and Horowitz (1992, 1993). Like the present paper's estimator, these quantile estimators permit quite general forms of conditional heteroscedasticity, but they converge at slower than root N rates. Let Q_h(e|·) denote the h-th quantile of e, conditional on the data ·. The quantile estimators assume Q_h(e|v,x) = Q_h(e) for some known h. In contrast, by Eq. (1.3) this paper assumes that Q_h(e|v,x) = Q_h(e|x) holds for all quantiles h. Instead of assuming that one of the quantiles does not depend on all of the regressors, this paper assumes that all of the quantiles do not depend on one of the regressors.

The estimator proposed here includes estimation of the constant term in the latent variable, unlike some other estimators like Powell et al. (1989) or Klein and Spady (1993), which fail to estimate the constant term.

The present paper's estimator can be extended to handle certain types of endogenous or mismeasured regressor models. For this extension, Eqs. (1.2) and (1.3) are replaced with

E(e z) = 0,   (1.4)

F_ex(e, x|v, z) = F_ex(e, x|z),   (1.5)

where z is a vector of instruments and F_ex(e, x|·) denotes the distribution of an observation of (e, x) conditional on data ·. Eq. (1.5) limits the range of applications of this extension (see Section 3 for details), but very few semiparametric estimators exist for any kind of endogenous or mismeasured regressor binary choice models. One example is Newey (1985).

Section 2 below provides general examples of models that satisfy the identification Eqs. (1.2) and (1.3), and describes the basic ordinary least-squares identification result. Section 3 gives example models that satisfy the instrumental variables identification Eqs. (1.4) and (1.5), and provides the corresponding two-stage least-squares identification of b. An alternative estimator that gives some intuition behind the identification is also provided.

When the distribution of v is unknown, the estimation of b employs a first-stage nonparametric density estimator. Section 4 gives the limiting root N distribution of b̂, taking into account the estimation error in the required density estimate. A simpler estimator that does not require a nonparametric first stage is also provided in Section 4, though more stringent conditions are needed for its consistency. The only computations this alternative, ordered-data-based estimator uses are a sorting of the data and two linear regressions.

Section 5 describes estimation of the distribution of the errors, including conditional and unconditional moments of e, and the sign of the coefficient of v. Section 6 provides extensions, including ordered choice, multinomial choice, partly linear latent variable, and threshold and censored regression models. Section 7 discusses selection of the kernel and bandwidth for the required nonparametric density estimation, and provides the results of a Monte Carlo analysis of the estimator. The Monte Carlo includes an analysis of the behavior of estimated asymptotic standard errors, which is always relevant for empirical applications but is often ignored in Monte Carlo studies of estimators. Section 8 provides concluding remarks.

2. The basic model and estimator

2.1. Example models

Each of the following conditions and models is sufficient to make the identifying Eqs. (1.2) and (1.3) hold. Example models that satisfy the instrumental variables assumptions (1.4) and (1.5) are discussed later.

1. Eqs. (1.2) and (1.3) hold when the errors e are mean zero and independent of the regressors (v, x). This is the traditional latent variable error assumption satisfied by standard models like logit and probit.

2. Eq. (1.3) holds when e = l(e*, x) for any function l and any random vector e*, where the distribution of e* is independent of (v, x). The function l, the vector e*, and its distribution do not need to be known, observed, or estimated. This permits virtually any form of conditional heteroscedasticity involving x, but not v. For example, the standard random coefficients model y = I[v + x^T(b + e*) + e*_0 > 0] with E(e*) = 0 and E(e*_0) = 0 equals Eq. (1.1) with e = x^T e* + e*_0, and satisfies Eqs. (1.2) and (1.3). Every regressor except v could have a random coefficient.

3. Eq. (1.3) holds when v is independent of (e, x). This case is most likely to occur when v is determined by experimental design, such as in the 'willingness-to-pay' models used to determine the perceived value of a public good. In these models each agent i is asked if he or she would be willing to pay some randomly drawn (log) price -v_i for a public good like a road or a park. Then y_i is one for a yes answer and zero for no, and x_i consists of observables that affect agent i's decision. See McFadden (1993) and Lewbel and McFadden (1997), who study this application in detail. In cases like these where the density of v is known, the proposed estimator of b has no nonparametric estimation component.

A common application of discrete choice models is where y_i denotes whether a consumer i purchases a particular good or service. In these applications, as in the willingness-to-pay models, the error e is interpreted as preference heterogeneity, and hence may depend on demographic and taste attributes of the consumer. The error e will therefore be independent of prices when these prices are determined solely from the supply side of the economy (e.g., prices that equal the marginal cost of production under perfect competition and constant returns to scale).

The estimator requires that the latent variable v_i + x_i^T b + e_i be linear in v_i. This also will generally be satisfied in purchase decision applications, because such models are identical to willingness-to-pay models, except that the price is determined by the market instead of by experimental design. The consumer is assumed to buy if the (log) actual price charged, -v_i, is less than the consumer's (log) reservation price, x_i^T b + e_i, resulting in a latent variable linear in v. However, unless purchases are observed over many regimes (or are determined by experimental design, as in the willingness-to-pay applications), the large support assumption may not hold when v is a price. Of course, other variables may be suitable choices for v as well.

2.2. Identification

The ordinary least-squares identification of b is described here. The extension to instrumental variables is provided in the next section.

Assumption A.1. Eq. (1.1) holds. The conditional distribution of v given x is absolutely continuous with respect to a Lebesgue measure with nondegenerate Radon-Nikodym conditional density f(v|x).

Assumption A.2. Let Ω denote the support of the distribution of an observation of (v, x). Let F_e(e|v, x) denote the conditional distribution of an observation of e given an observation of (v, x), with support denoted Ω_e(v, x). Assume F_e(e|v, x) = F_e(e|x) and Ω_e(v, x) = Ω_e(x) for all (v, x) ∈ Ω.

Assumption A.3. The conditional distribution of v given x has support [L, K] for some constants L and K, -∞ ≤ L < 0 < K ≤ ∞. The support of -x^T b - e is a subset of the interval [L, K].

Assumption A.4. E(e x) = 0. E(x x^T) exists and is nonsingular.

These assumptions do not require independent observations, though the root N estimator provided later will assume independence. The identification result in Theorem 1 below only requires that the expectation of a certain function of f be identified.

Assumption A.1 says that y is given by the binary choice model (1.1) and that v has a continuous distribution. Assumptions A.2 and A.4 were discussed in Sections 1 and 2.1 in terms of Eqs. (1.2) and (1.3).

For estimation of b, the distribution of e is not required to be continuous; e.g., it can be discrete or contain mass points. However, later estimation of the error distribution function F_e will assume continuity.

The vector of regressors x can include dummy variables, squares, and interaction terms. Elements of x may be related to (e.g., correlated with) v, though Assumption A.1 rules out having elements of x be deterministic functions of v.

Assumption A.3 requires v to have a large support. Standard models like logit or probit have errors that can take on any value, which would by Assumption A.3 require v to have support equal to the whole real line. This assumption implies that the estimator is likely to perform best when the spread of observations of v is large relative to the spread of x^T b + e (since if the observed spread of v values were not large, then the observed data would resemble data drawn from a process that violated Assumption A.3). Assumption A.3 assumes zero is in the support of v. To make this hold, for any constant τ in the support of v (e.g., the mean or median of v) we can redefine v as v - τ and correspondingly add τ to the estimated constant in x^T b.

Theorem 1. Define ỹ by

ỹ = [y - I(v > 0)] / f(v|x).   (2.1)

If Assumptions A.1-A.3 hold then E(ỹ|x) = x^T b + E(e|x). If Assumption A.4 also holds then

b = E(x x^T)^{-1} E(x ỹ).   (2.2)

Theorem 1 shows that b is identified, and can be estimated by an ordinary least-squares regression of ỹ on x. Theorem 1 is proved in Appendix A.

3. Instrumental variables

This section describes models in which E(e x) ≠ 0, including endogenous regressors and measurement error models, that can be estimated using an instrumental variables extension of Theorem 1. These models satisfy Eqs. (1.4) and (1.5) instead of (1.2) and (1.3). It should be recalled throughout this section that x and z can overlap; that is, one or more elements of the vector of regressors x can also be in the vector of instruments z.

3.1. Example instrumental variables models

One way Eq. (1.5) arises is in structural models of the form (e, x) = ψ(ẽ, z), where ẽ is a random vector that is independent of (v, z) and ψ is some vector-valued function. The function ψ, the variables ẽ, and the distribution function of ẽ do not need to be known, observed, or estimated. This structure has the implication that the joint distribution of (e, x) conditional on v and z is independent of v. As with many semiparametric estimators, the distribution of the regressors is not ancillary, and hence restrictions like this one on v can arise.

A system of equations that satisfies (1.1), (1.4) and (1.5) is, for any function q,

y = I(v + x^T b + e > 0),   (3.1)

x = q(z) + ε,   (3.2)

where e and ε are unconditionally mean zero and the distribution of (e, ε) is independent of (v, z). Here e and ε can be correlated, and hence e can be correlated with x. Dependence of (e, ε) on z would also be permitted in this model, provided that (e, ε) is conditionally independent of v, conditioning on z. The proposed estimator for b does not require specifying or estimating the function q.

Eqs. (3.1) and (3.2) are a standard example of a system of equations that would be recursive, except for the fact that the errors are correlated across equations. More fully endogenous systems, in which x depends on v or y, are ruled out by Eq. (1.5).

Eqs. (3.1) and (3.2) can arise in a standard mismeasured regressors framework. Consider the model

y = I(v + x*^T b + e* > 0),   (3.3)

x = x* + ẽ,   (3.4)

x* = q(z) + ε*,   (3.5)

with E(e*, ε*, ẽ) = 0. Here v is a regressor that is observed without error, x is a mismeasured observation of the unobserved x*, e* is the latent model error, and ẽ is a vector of measurement errors. Each regressor x_j that is not mismeasured has z_j = x*_j = x_j and ε*_j = ẽ_j = 0. Eqs. (3.3)-(3.5) are equivalent to (3.1) and (3.2) with e = -ẽ^T b + e* and ε = ε* + ẽ. Once again, estimation of b will not require specification or estimation of the function q.

Let w = v + x*^T b + e*. If the latent variable w were observed instead of y, then b would be estimated in this measurement error model by a two-stage least-squares regression of w on x, assuming that (e*, ε*, ẽ) is uncorrelated with (v, z). When y is observed instead of w, Eq. (1.5) will hold and the estimator described in the next two sections can be used to estimate b, as long as (e*, ε*, ẽ) is independent of (v, z), or more generally if (e*, ε*, ẽ) is uncorrelated with z and its conditional distribution given v and z does not depend on v. Having these errors be independent of v and z is stronger than necessary.

The only way Eqs. (3.3)-(3.5) differ from standard linear mismeasured regressors models (other than y being observed instead of the latent w) is that standard models would permit x* to depend on v. That is possible in the current framework by constructing the instrument vector as z = ρ(v, u), which makes (3.5) hold with q(z) = z. The drawback of permitting x* to depend on v in this way is that the estimation of b would then first require estimation of z = ρ(v, u) = E(x|v, u) (as the fits from parametrically or nonparametrically regressing the observed x on v and u).

Eq. (1.5) says that e and v are conditionally independent, conditioned on x and z, but does not rule out correlations of v with x and z. For example, the structure x = a z + e_1, v = b z + e_2, and e = c e_1 + e_3, for any mutually independent variables z, e_1, e_2, and e_3 (with e_1 and e_3 being mean zero) and any constants a, b, and c, makes (1.4) and (1.5) hold while making v correlate with x and z, and x correlate with e. This structure will be used later in a Monte Carlo design.
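A quick simulation sketch of this structure (with illustrative constants and distributions) confirms that v correlates with x and z and x correlates with e, while E(ez) is approximately zero:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000
a, b, c = 1.0, 1.0, 0.5
z = rng.normal(size=N)
e1, e2, e3 = rng.normal(size=(3, N))
x = a * z + e1
v = b * z + e2
e = c * e1 + e3                      # independent of (v, z), correlated with x

print(np.corrcoef(v, x)[0, 1])       # nonzero: v correlates with x
print(np.corrcoef(v, z)[0, 1])       # nonzero: v correlates with z
print(np.corrcoef(x, e)[0, 1])       # nonzero: x correlates with e
print(np.mean(e * z))                # approximately zero: E(ez) = 0
```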

3.2. Identification with instrumental variables

Theorem 1' below describes the instrumental variables identification of b, as would be used for the models described in the previous section, that is, where Eqs. (1.2) and (1.3) are replaced with (1.4) and (1.5). Note that Assumption A.4' below is the standard assumption about instruments in two-stage least-squares regressions.

Assumption A.1'. Eq. (1.1) holds. The conditional distribution of v given z is absolutely continuous with respect to a Lebesgue measure with nondegenerate Radon-Nikodym conditional density f(v|z).

Assumption A.2'. Let Ω denote the support of the distribution of an observation of (v, z). Let F_ex(e, x|v, z) denote the conditional distribution of an observation of (e, x) given an observation of (v, z), with support denoted Ω_ex(v, z). Assume F_ex(e, x|v, z) = F_ex(e, x|z) and Ω_ex(v, z) = Ω_ex(z) for all (v, z) ∈ Ω.

Assumption A.3'. The conditional distribution of v given z has support [L, K] for some constants L and K, -∞ ≤ L < 0 < K ≤ ∞. The support of -x^T b - e is a subset of the interval [L, K].

Assumption A.4'. E(e z) = 0, E(z z^T) exists and is nonsingular, and the rank of E(x z^T) is J (the dimension of x).

Define R_xz, R_zz, D, and ỹ* by

R_xz = E(x z^T),   R_zz = E(z z^T),   D = (R_xz R_zz^{-1} R_xz^T)^{-1} R_xz R_zz^{-1},   (3.6)

ỹ* = [y - I(v > 0)] / f(v|z).   (3.7)

Theorem 1'. If Assumptions A.1'-A.3' hold then E(ỹ*|z) = E(x|z)^T b + E(e|z). If Assumption A.4' also holds then

b = D E(z ỹ*).   (3.8)

The earlier Theorem 1 is the special case of Theorem 1' in which z = x. Theorem 1' shows that b is identified, and can be estimated by an ordinary linear two-stage least-squares regression of ỹ* on x, using instruments z. The proof of Theorem 1' is in Appendix A.
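As an illustration, a minimal Python sketch of the two-stage least-squares step implied by Eqs. (3.6)-(3.8), again taking f(v|z) as already estimated (names are illustrative):

```python
import numpy as np

def special_regressor_2sls(y, v, x, z, f_v_given_z):
    """b = D E(z y_star) with D = (R_xz R_zz^{-1} R_xz')^{-1} R_xz R_zz^{-1}."""
    N = len(y)
    y_star = (y - (v > 0)) / f_v_given_z          # Eq. (3.7)
    R_xz = x.T @ z / N                            # sample E(x z')
    R_zz = z.T @ z / N                            # sample E(z z')
    A = R_xz @ np.linalg.solve(R_zz, R_xz.T)      # R_xz R_zz^{-1} R_xz'
    D = np.linalg.solve(A, R_xz @ np.linalg.inv(R_zz))
    return D @ (z.T @ y_star / N)                 # Eq. (3.8)
```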

Corollary 1. If Assumptions A.1'-A.4' hold then

b = D E( z ∫_L^K E[y - I(v > 0) | v, z] dv ).   (3.9)

The two-stage least-squares estimator for b based on Eq. (3.8) entails averaging z ỹ*, where ỹ* has the conditional density of v in the denominator, and hence may be adversely affected by extreme observations of v. Corollary 1 suggests an alternative based on numerically integrating an estimate of the conditional expectation E[y - I(v > 0) | v, z]. While more complicated, this alternative might be more robust to outliers in v.

3.3. Another estimator and a little intuition

This section describes an alternative estimator closely related to Theorem 1', which provides some insight into the identification. This alternative will later be extended to yield an estimator for multinomial choice models. Here again one can take z = x for applications that do not require separate instruments.

Let G*(v, z) = E(y|v, z), let s = s(x, e) = -x^T b - e, and let F_s(s|v, z) and f_s(s|v, z) denote the conditional distribution and probability density functions of s, assuming that e (and hence s) conditional on x and z is continuously distributed. It follows from Eq. (1.5) that F_s(s|v, z) = F_s(s|z), so E[I(s < v)|v, x, z] = E[I(s < v)|x, z]. Therefore,

F_s(v|z) = E[I(s < v)|z] = E(y|v, z) = G*(v, z).   (3.10)

If s were observed, then b could be estimated by a two-stage least-squares regression of -s on x using instruments z. This regression would depend on s only through E(zs). The key insight is that this regression does not require that s be observed. Only an estimate of E(zs) is needed, and such an estimate is available. In particular,

b = -D E(zs) = -D E[ z ∫_L^K v (∂G*(v, z)/∂v) dv ],   (3.11)

b = D E(z y**),   (3.12)

where y** is defined by

y** = -v [∂G*(v, z)/∂v] / f(v|z).   (3.13)

Eqs. (3.12) and (3.13) show that Theorem 1' holds using y** in place of ỹ*, so b can be estimated as a linear two-stage least-squares regression of y** on x using instruments z. Unlike Theorem 1', this estimator would require a preliminary estimate of G* as well as of f. Kernel estimators could be used for both.

The above analysis also provides an alternative derivation (and hence interpretation) of Theorem 1'. Applying an integration by parts to the last integral in Eq. (3.11) yields b = D E[ z ∫_L^K (G*(v, z) - I(v > 0)) dv ]. Proceeding as in Eq. (3.12) with this expression then yields b = D E( z [G*(v, z) - I(v > 0)] / f(v|z) ) = D E( z [y - I(v > 0)] / f(v|z) ), which is Theorem 1'.

4. Root N consistent estimation

This section gives the root N consistent, asymptotically normal limiting distribution for a two-stage least-squares estimator based on Theorem 1', using instruments z. All of the equations here simplify to the ordinary least-squares estimator in Theorem 1 when z = x.

Let u = u(z) be any vector of variables such that the conditional density of v given z equals the conditional density of v given u, that is, f(v|u) = f(v|z), where no element of u equals a deterministic function of other elements of u. For example, if z = (1, z̃, z̃²), we could take u = z̃. This construction of u is employed because f(v_i|z_i) will be estimated as f(v_i|u_i) using a kernel density estimator (see Appendix B for details). Also, if v were known to be independent of some elements of z, then u could exclude those elements. For example, in the willingness-to-pay application discussed earlier, v is determined by experimental design and so is independent of z, in which case its density is known.

De"neyHi andgby

Given the de"nition ofu,yH

i"y8Hi from Eq. (3.7), and ifx"zthenyHi"y8i from

Eq. (2.1). The estimator is based on the expression b"Dg, which holds by

Theorem 1@.

Assume we have an estimator fK(vDu) of the conditional density f(vDu). One

choice is a kernel estimate of the joint density ofvand udivided by a kernel

estimate of the density ofu(Eq. (B.3) in Appendix B). A simpler alternative is

given in the next section.
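As an illustration, a minimal sketch of such a plug-in conditional density estimate evaluated at the data points, using Gaussian product kernels and a single bandwidth for simplicity (the simulations in Section 7 use a quartic kernel; the kernel and bandwidth here are purely illustrative):

```python
import numpy as np

def f_v_given_u_hat(v, u, bandwidth):
    """Leave-in kernel estimate of f(v_i | u_i) as a ratio of joint
    and marginal kernel density estimates, f_vu(v, u) / f_u(u)."""
    N = len(v)
    u = u.reshape(N, -1)
    def gauss(t):                                    # Gaussian kernel
        return np.exp(-0.5 * t ** 2) / np.sqrt(2 * np.pi)
    Ku = np.prod(gauss((u[:, None, :] - u[None, :, :]) / bandwidth), axis=2)
    Kv = gauss((v[:, None] - v[None, :]) / bandwidth)
    f_u = Ku.mean(axis=1) / bandwidth ** u.shape[1]
    f_vu = (Ku * Kv).mean(axis=1) / bandwidth ** (u.shape[1] + 1)
    return f_vu / f_u
```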

De"neRK

Theg( in Eq. (4.7) is a two-step estimator with a nonparametric"rst step. The

limiting distribution of estimators of these forms has been studied by many authors. See e.g. Newey and McFadden (1994) and references therein. Appendix B provides one set of regularity conditions that do not violate Assumptions

A.1@}A.4@and are su$cient for rootNconsistent, asymptotically normal

conver-gence ofg( and hence ofbK. To summarize the result, de"neq

Appendix B then provides conditions such that

JNg("N~1@2+N i/1

q

(13)

so q_i is the influence function for ĝ. It follows from (4.8) and (4.11) that √N b̂ = √N D̂ ĝ = N^{-1/2} D̂ Σ_{i=1}^N [z_i x_i^T b + (q_i - z_i x_i^T b)] + o_p(1), so the asymptotic variance of b̂ can be estimated by D̂ Σ̂ D̂^T / N, where Σ̂ is the sample variance of q̂_i - z_i x_i^T b̂ for i = 1,…,N, and q̂_i is constructed by replacing y*_i with ŷ*_i in Eq. (4.9) and replacing the conditional expectations in that equation with nonparametric regressions.

If E(z|u) = z (as will often be the case given the definition of u), then q_i = z_i [y*_i + E(y*_i|u_i) - E(y*_i|v_i, u_i)], and the limiting distribution of b̂ becomes identical to the limiting distribution of a two-stage least-squares regression of [y*_i + E(y*_i|u_i) - E(y*_i|v_i, u_i)] on x using instruments z. When f is a known function that does not need to be estimated, the above expressions simplify further to q_i = z_i y*_i, a two-stage least-squares regression of y*_i on x using instruments z.

Theorem 1' can be generalized to permit nonindependent and nonidentically distributed observations, essentially by adding i subscripts to Ω, F, Ω_ex, and the expectation operator. Only the conditional density function f(v|u) must be assumed to be constant (or its variation finitely parameterized) across observations, so f and hence b can be estimated. Many results exist providing limiting distribution theory for semiparametric estimators when observations are not independently or identically distributed. See, e.g., Andrews (1995).

In place of kernel estimators, a consistent b̂ could be obtained using a series expansion-based density estimator of f in Eq. (4.6), as in Gallant and Nychka (1987). Alternatively, parametric specifications of f may be used in some applications, e.g., when v is income, which is known to be well approximated by a lognormal distribution with a Pareto tail. A consistent b̂ could also be obtained from Corollary 1, using either kernel or series based estimators of the conditional expectation in Eq. (3.9). The next section provides another estimator, one that is computationally extremely simple.

4.1. A very simple ordered data estimator

This section describes an alternative two-stage least-squares estimator based on Theorem 1' that does not require a first-stage kernel estimator.

Assumption A.5'. Assume that for some vector c,

v_i = z_i^T c + w_i,   (4.13)

where w_i is continuously distributed with bounded support, has mean zero, and is independent of z_i.

Let f_w denote the unconditional density function of w. If Assumption A.5' holds then f(v_i|z_i) = f_w(w_i). If the residuals ŵ_1,…,ŵ_N were sorted in ascending order, the number immediately preceding ŵ_i would be ŵ_i^-, and the number immediately following ŵ_i would be ŵ_i^+; Eq. (4.15) constructs ŷ**_i from y_i, v_i, and these neighboring order statistics. It follows from Theorem 3 of Lewbel and McFadden (1997) that N^{-1} Σ_{i=1}^N z_i ŷ**_i is a consistent estimator of E( z [y - I(v > 0)] / f_w(w) ). Therefore, if Assumptions A.1'-A.5' hold, then

b̂ = D̂ Σ_{i=1}^N z_i ŷ**_i / N   (4.16)

is a consistent estimator of b.

To summarize, the ordered data estimator (4.16) consists of: (1) define ŵ as the residuals from regressing v on z using ordinary least squares; (2) sort the ŵ data from smallest to largest, to find ŵ_i^+ and ŵ_i^- for each observation i; (3) construct ŷ**_i in Eq. (4.15) for i = 1,…,N; and (4) let b̂ be the estimated coefficients from regressing this ŷ**_i on x_i using two-stage least squares, with instruments z_i (a sketch follows at the end of this subsection). As with the other estimators, (4.16) reduces to an ordinary least-squares regression when z = x.

This estimator is convenient for its numerical simplicity, but it requires the extra Assumption A.5' for consistency. This assumption limits the permitted dependence of v on z. White's test for homoscedasticity, or more general tests of independence of errors from regressors, can be applied to ŵ versus z to test Assumption A.5'.
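As an illustration, a minimal sketch of steps (1)-(4). Since Eq. (4.15) is not reproduced above, the sketch approximates 1/f_w(ŵ_i) by the neighbor-spacing estimate N(ŵ_i^+ - ŵ_i^-)/2, which should be read as an assumption rather than the paper's exact construction:

```python
import numpy as np

def ordered_data_estimator(y, v, x, z):
    """Sketch of steps (1)-(4); the density of w is approximated by the
    assumed spacing estimate f_w(w_i) ~= 2 / [N (w_plus_i - w_minus_i)]."""
    N = len(y)
    c_hat, *_ = np.linalg.lstsq(z, v, rcond=None)   # step (1): OLS of v on z
    w = v - z @ c_hat                               # residuals w_hat
    order = np.argsort(w)
    rank = np.empty(N, dtype=int)
    rank[order] = np.arange(N)
    w_sorted = w[order]
    # step (2): neighboring order statistics (boundary points reuse themselves)
    w_minus = w_sorted[np.maximum(rank - 1, 0)]
    w_plus = w_sorted[np.minimum(rank + 1, N - 1)]
    f_w = 2.0 / (N * (w_plus - w_minus))            # assumed spacing density
    y_star = (y - (v > 0)) / f_w                    # step (3)
    # step (4): 2SLS of y_star on x with instruments z
    Pz = z @ np.linalg.pinv(z.T @ z) @ z.T
    return np.linalg.solve(x.T @ Pz @ x, x.T @ Pz @ y_star)
```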

5. Other model components

5.1. The sign of the coefficient of v

Without loss of generality, the linear latent variable binary choice model can be written as

y_i = I(v_i a + x_i^T b + e_i > 0),   (5.1)

where a = -1, 0, or +1. The estimator of b assumed a = 1. If a is not known a priori, then an estimator of a is required. Let G(v, x, z) = E(y|v, x, z). Assume Eq. (1.3) or (1.5) holds and that e has a continuous conditional distribution, so that

G(v, x, z) = 1 - F_e(-va - x^T b | x, z).   (5.2)

Then G will be differentiable in v, so let g(v, x, z) = ∂G(v, x, z)/∂v. It then follows from (5.2) that

∂E(y|v, x, z)/∂v = g(v, x, z) = a f_e(-va - x^T b | x, z).   (5.3)

Let d_v denote the density-weighted average of g(v, x, z), and let d̂_v equal the Powell et al. (1989) estimator of the weighted average derivative d_v. By (5.3) and the definition of d_v, a = sign(d_v). This suggests the estimator â = sign(d̂_v). Since d̂_v converges at rate root N, sign(d̂_v) will converge at a rate faster than root N, so estimating a first in this way will not affect the limiting root N distribution of b̂.

More generally, a = sign[g(v, x, z)] for all v, x, and z. Therefore a test of the hypothesis g(v, x, z) = 0 is a test of whether a = 0. Applicable nonparametric consistent tests of g(v, x, z) = 0 include Lewbel (1995) and Ait-Sahalia et al. (1997). Also, G(v, x, z) must be nondecreasing in v if a = 1, nonincreasing in v if a = -1, and independent of v if a = 0. The function G can be estimated as a nonparametric regression of y on v, x, and z. This Ĝ could also be used as a check on the model specification, in that nonmonotonicity of G as a function of v would indicate that the model is misspecified. Equivalently, having the nonparametric regression derivative ĝ(v, x, z) be significantly positive for some values of (v, x, z) and significantly negative for others would indicate model misspecification.

What if we failed to check â as above, and erroneously applied the estimator of Theorem 1 assuming a = 1, when in fact a = -1? It is straightforward to verify, following the steps of the proof of Theorem 1, that if a = -1 then E(ỹ*|z) will equal E(x^T b + e - L - K | z), so plim b̂ will equal b - D E(z)(L + K), where D is given by Eq. (3.6) and L and K are the lower and upper bounds of the support of v. If the support of v is symmetric, so L = -K, then b̂ will consistently estimate b regardless of whether a is plus one or minus one. However, this dependence on the support of v implies that the estimate b̂ is likely to be quite erratic (i.e., very sensitive to the few largest observations of |v|) if a = -1.

5.2. The distribution of the latent error

Let Ĝ(v, x, z) be a kernel or other nonparametric regression of y on v, x, and z. Assuming again that a = 1, it follows from Eq. (5.2) that F_e(e|x, z) = 1 - G(-e - x^T b, x, z), so replacing G and b with Ĝ and b̂ yields estimators of the conditional distribution F_e and its density f_e. The limiting distributions of these functions will be the same as if b̂ were replaced by the true b, because b̂ converges to its limit at rate root N.

Attributes of F_e of interest include moments such as E[t(e)|x, z] and E[t(e)] for given functions t. For example, t(e) = e and t(e) = e² provide conditional and unconditional means and variances of e. Estimates of such moments could be recovered from ĝ since E[t(e)|x, z] = ∫ t(e) g(-e - x^T b, x, z) de. The following theorem, which is closely related to McFadden (1993) and Lewbel (1997), provides a simpler expression for E[t(e)|x, z], which can be estimated without first nonparametrically estimating the derivative g.

Theorem 2. Let Assumptions A.1'-A.3' hold. Assume e has a continuous conditional distribution with density function f_e(e|x, z) = f_e(e|v, x, z) and support equal to the whole real line. Assume that t(e) is differentiable, t(0) = 0, and t(e)[I(e > 0) - F_e(e|x, z)] → 0 as |e| → ∞. Then

E[t(e)|x, z] = E( t'(-v - x^T b)[y + I(v + x^T b < 0) - 1] / f(v|x, z) | x, z ).   (5.5)

The assumption that t(0) = 0 is made without loss of generality, since if it does not hold then t could be replaced with t(e) - t(0). Define

Ĥ_ti = t'(-v_i - x_i^T b̂)[y_i + I(v_i + x_i^T b̂ < 0) - 1] / f̂(v_i|x_i, z_i).   (5.6)

By Theorem 2, E[t(e)|x, z] can be estimated by a nonparametric regression of Ĥ_ti on x_i and z_i, and unconditional moments E[t(e)] can be estimated as the sample average of Ĥ_ti. In particular, σ²_e, the unconditional variance of e, can be estimated in this way.
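As an illustration, a sketch of the unconditional moment estimate based on the sample average of Ĥ_ti, using t(e) = e² (so t'(e) = 2e) to estimate E(e²); b̂ and f̂(v|x, z) are taken as already computed, and the variance of e follows by also estimating E(e) with t(e) = e:

```python
import numpy as np

def second_moment_of_e(y, v, x, b_hat, f_v_given_xz):
    """Sample average of H_t with t(e) = e^2, estimating E(e^2)."""
    arg = -v - x @ b_hat                              # argument -v_i - x_i'b
    t_prime = 2.0 * arg                               # t'(e) = 2e at that argument
    H = t_prime * (y + (v + x @ b_hat < 0) - 1.0) / f_v_given_xz
    return H.mean()
```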

6. Extensions

6.1. Ordered choice

Ordered response models are summarized in, e.g., Maddala (1983, pp. 46-49). The ordered choice model with K choices is defined in terms of thresholds a_1 < a_2 < ⋯ < a_{K-1}: observing which of the K ordered responses occurs is equivalent to observing the binary variables y_ki = I(v_i + x_i^T b + e_i > a_k) for k = 1,…,K-1. Let x_{1i} = 1 (the constant) and let b_1 = 0. This assumption is made without loss of generality, since if b_1 ≠ 0 then each a_k could be redefined as a_k - b_1.

To prove the result below, let g_k equal the expectation in (6.2). Each y_ki = I(v_i + x_i^T b - a_k + e_i > 0) is itself a binary choice model of the form (1.1) with constant term -a_k, so Theorem 1' applies to each k, yielding the moment expression (6.2) for the thresholds and (6.3) for b.

Eqs. (6.2) and (6.3) are in the form of the two-stage least-squares estimator, and so have the corresponding root N consistent, asymptotically normal distribution. In particular, each â_k equals -b̂_1 in Eqs. (4.1)-(4.12), replacing y_i with y_ki in Eqs. (4.1) and (4.6), and each b̂_j based on (6.3) equals the corresponding b̂_j in (4.12), replacing y_i with Σ_{k=1}^{K-1} y_k/(K-1) in Eqs. (4.1)-(4.6).

6.2. Multinomial choice

The estimator given by Eqs. (3.10)-(3.13) is here extended to multinomial choice model estimation. Multinomial (also called polychotomous) response model estimators are discussed in, e.g., Maddala (1983) and McFadden (1984). A convenient feature of the multinomial choice estimator described below is that it requires no numerical integrations, simulations, or searches.

Let the latent variable w*_ki, sometimes interpreted as the utility associated with choice k, be given by w*_ki = a v*_ki + b*_k^T x_i + e*_ki for k = 0,…,K. It is assumed that the special regressors v*_ki are all distinct, so v*_ki does not equal a deterministic function of the other v*_mi's, and that the special regressors have a joint continuous distribution, conditional on x_i and z_i. Unlike the binary choice normalization, it is assumed here that a, the coefficient of v*_ki, is negative and is the same for every choice k. The coefficient a is then without loss of generality normalized to equal -1. This change in normalization simplifies the multinomial estimator.

Define v_ki = v*_ki - v*_0i, b_k = (b*_k - b*_0), e_ki = (e*_ki - e*_0i), and

w_ki = -v_ki + x_i^T b_k + e_ki,   k = 0,…,K.   (6.4)

Eq. (6.4) makes w_0i = 0. The multinomial choice model with K+1 choices is defined so that choice 0 is selected when its (normalized) utility exceeds that of every other choice, that is,

y_0i = I(w_ki < 0, k = 1,…,K).   (6.5)

The data are observations of v_i, y_0i, x_i and (if necessary) a vector of instruments z_i. The estimator can be repeated K additional times, renumbering the choices so that each time a different choice is designated by 0, to yield estimates of b*_k - b*_m for all pairs of choices k and m.

Assume Eqs. (1.4) and (1.5) hold, where F_ex is now the conditional distribution of the vector of errors e and the vector of regressors x. Let s_k = x^T b_k + e_k, so that y_0 = I(s_k < v_k, k = 1,…,K). It follows that F_s(s|v, z) = F_s(s|z), where F_s is the conditional distribution of the vector s, and

E(y_0|v, z) = E[I(s_k < v_k, k = 1,…,K)|v, z] = F_s(v|z).   (6.6)

Assume that e, and hence s, has a continuous conditional distribution. Let f_s(s|z) denote the conditional probability density function of s. Let Ω_v denote the support of v, a compact subset of R^K, and assume that the support of s is a compact subset of Ω_v. Then, using (6.6) and paralleling the derivation of (3.10)-(3.12), E(x|z)^T b_k + E(e_k|z) can be recovered from derivatives of E(y_0|v, z) weighted by 1/f(v|z), yielding the expression (6.8) that identifies b_k for k = 1,…,K. Letting Ĝ*(v, z) be a nonparametric regression of y_0i on v_i and z_i, and f̂(v|z) be a kernel estimator of the conditional density f(v|z), the estimator corresponding to Eq. (6.8) replaces G* and f with these estimates (Eq. (6.9)).

For K = 1, Eq. (6.8) reduces (after sign changes) to Eq. (3.12). We could apply integration by parts K times to the integral in (6.7) in an attempt to simplify the estimator, analogous to obtaining Theorem 1' from Eq. (3.11). Unfortunately, applying integration by parts to the derivatives of E(y_0|v, z) with respect to v_m for each m ≠ k yields expressions involving E(y_0|v, z) evaluated at the boundary of the support of v_m, and hence the corresponding estimator would require evaluating a nonparametric regression at the boundary of the support of the data. A similar transformation is required to derive the influence function for Eq. (6.9) (see, e.g., Stoker, 1991 for an analogous calculation), and hence Eq. (6.9), while consistent, will in general converge at a slower than root N rate.

6.3. Partly linear latent variable choice models

Theorem 1' can be extended to estimate m(x) in the model

y_i = I[v_i + m(x_i) + e_i > 0],   (6.10)

where m(x) is an unknown smooth function. An example is where y_i is a purchase decision, -v_i is a log price (or a willingness-to-pay bid), and m(x_i) + e_i is agent i's log reservation price (or willingness to pay) function. Let Assumptions A.1-A.3 hold with Eq. (6.10) instead of Eq. (1.1), and assume E(e|x) = 0. The proof of Theorem 1 will then yield E(y*|x) = m(x) + E(e|x) = m(x), so a consistent estimator of m(x) is any nonparametric regression (e.g., a kernel regression) of [y - I(v > 0)]/f̂(v|x) on x.

Under weaker conditions regarding the error, one can estimate the vector b in models of the partly linear form

y_i = I[v_i + u(x_i, b) + e_i > 0],   (6.11)

where u(x, b) is a known function of the unknown parameter vector b. Let Assumptions A.1'-A.4' hold with Eq. (6.11) instead of Eq. (1.1). The proof of Theorem 1' will then yield E(ỹ*|z) = E[u(x, b) + e|z], so

E( z [ (y - I(v > 0)) / f(v|z) - u(x, b) ] ) = 0.   (6.12)

Therefore, b can be consistently estimated if it can be identified from the moments (6.12). Given the limiting distributions derived in Section 4, the resulting estimator will have the same limiting distribution as GMM using the moment conditions E(z[q - u(x, b)]) = 0, where q is defined in Eq. (4.9).

6.4. Threshold and censored regression models

For any scalar τ, define y_i(τ) by

y_i(τ) = I(v_i a + x_i^T b + e_i > τ),   (6.13)

so y_i(τ) is one when the latent variable exceeds the threshold τ, and zero otherwise. Continue to assume that e_i satisfies either (1.2) and (1.3) or (1.4) and (1.5). The binary choice model (1.1) is equivalent to this model when only y_i = y_i(0) is observed. Assume now that y_i(τ) is observed for at least two values of τ. Without loss of generality, let the observed data be y_i(0) and y_i(τ) for some known τ > 0. This will be sufficient to identify both b and the scale parameter a.

Let x_{1i} = 1 for all i, so b_1 is the constant term in the latent variable. Maintaining the assumption a > 0, (6.13) can be written as

y_i(τ) = I[v_i + x_i^T b(τ) + e_i(τ) > 0],   (6.14)

where b_1(τ) = (b_1 - τ)/a, b_j(τ) = b_j/a for j = 2,…,J, and e_i(τ) = e_i/a. Since Eq. (6.14) is in the form of Eq. (1.1), for each observed threshold τ we can apply Theorem 1' to estimate the vector b(τ) by linearly regressing y*(τ) on x using instruments z, where

y*_i(τ) = [y_i(τ) - I(v_i > 0)] / f(v_i|u_i).   (6.15)

The limiting root N distribution of the resulting b̂(τ) is given by Eqs. (4.1)-(4.12) with y_i(τ) in place of y_i. Given b̂(0) and b̂(τ) for some τ ≠ 0, root N consistent, asymptotically normal estimates of the original a and b in Eq. (6.13) are then given by â = τ/[b̂_1(0) - b̂_1(τ)] and, for all τ, b̂ = b̂(τ) â. If y_i(τ) is observed for more values of τ, then additional estimates of a and b are obtained, and these estimates could be efficiently combined using a minimum chi-squared procedure.

This same estimator can also be used for censored regression models. Suppose w̃_i is observed, where

w̃_i = max(v_i a + x_i^T b + e_i, 0).   (6.16)

This is the standard fixed-censoring-point censored regression model. In particular, if e were normal and independent of v and x then this would be a Tobit model. Assume that the distribution of e_i is unknown, but satisfies (1.2) and (1.3), or (1.4) and (1.5). Then for any τ ≥ 0, we can construct y_i(τ) = I(w̃_i > τ), and so the threshold model estimator of the previous paragraph can be used to estimate a and b.

Some more efficient, related estimators of censored and truncated regression models based on weighting by f are given in Khan and Lewbel (1999).
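As an illustration, a sketch of recovering a and b from two threshold regressions, with b̂(0) and b̂(τ) taken as already estimated by the procedure above (the b vector is recovered here from the τ = 0 regression):

```python
def recover_scale_and_b(b_hat_0, b_hat_tau, tau):
    """a_hat = tau / [b_hat_1(0) - b_hat_1(tau)], where element 0 of each
    coefficient vector is the constant term; b is recovered as b_hat(0) * a_hat."""
    a_hat = tau / (b_hat_0[0] - b_hat_tau[0])
    return a_hat, b_hat_0 * a_hat
```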

7. A simulation study

7.1. Kernel and bandwidth selection

When Eqs. (B.1)-(B.3) are used to estimate f(v|u), a kernel and bandwidth must be chosen. While high-order kernels could be used to eliminate asymptotic bias, it is commonly recognized that the greater variance of high-order kernels makes them inferior to low-order kernels unless the sample size is very large. For Eq. (B.1), the simulations here use the quartic product kernel, with each component c_l scaled by σ_l, the estimated standard deviation of that component. The analogous quartic product kernel is also used for K_vc in (B.2). This kernel is chosen because it is computationally quick, it is generally well behaved (see Härdle, 1990), and it is optimal in a related kernel averaging context (see Härdle et al., 1992).

Ordinary cross-validation methods are not suitable for choosing the bandwidth because, as is typical of root N semiparametrics, Theorem 3 requires undersmoothing relative to optimal pointwise convergence. Härdle et al. (1992) and Powell and Stoker (1996) calculate optimal bandwidths for some root N estimators of density functionals. While it is probably possible to adapt methods similar to theirs to the present context, a simpler procedure was used here.

If y - I(v > 0) in the functional defining b̂ is replaced by a known function of v, the result is a quantity d whose true value is known and whose sample analog d̂, defined in Eq. (7.3), depends on the estimate f̂ in the same way that b̂ does. The functional in Eq. (7.3) is very similar to the one defining b̂, and by Theorem 3 the rate at which the bandwidth must shrink for root N convergence of d̂ to d is comparable to that required for b̂. This suggests choosing the bandwidth that maximizes the accuracy of d̂.

This procedure is used in the Monte Carlos. In each replication, the following steps are performed. First, f̂ in Eq. (B.3) is estimated using each of the bandwidths 0.5, 1, 1.5, 2, 2.5, 3, 3.5, and 4, resulting in eight different estimates of f̂. Next, letting d equal twice the estimated standard deviation of v, d̂ in Eq. (7.3) is estimated using each of the eight estimates of f̂, and (d̂ - d)² is calculated for each. Among these eight, the f̂ that minimizes (d̂ - d)² is then used for that replication to estimate b̂ and its asymptotic variance.
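As an illustration, a sketch of this bandwidth search. Eq. (7.3) is not reproduced above, so the diagnostic used below, d̂ = N^{-1} Σ_i I(|v_i| < σ̂_v)/f̂(v_i|u_i), whose population value is 2σ_v under the support assumption, is an assumed stand-in for the paper's functional:

```python
import numpy as np

def pick_bandwidth(v, u, bandwidths, kde):
    """Choose the bandwidth whose plug-in estimate d_hat of a known
    density functional is closest to its known value d = 2*sd(v).
    `kde(v, u, bw)` returns estimates of f(v_i | u_i); the functional
    below is an assumed stand-in for Eq. (7.3)."""
    sd_v = v.std()
    d_true = 2.0 * sd_v
    best_bw, best_err = None, np.inf
    for bw in bandwidths:
        f_hat = kde(v, u, bw)
        d_hat = np.mean((np.abs(v) < sd_v) / f_hat)   # assumed form of d_hat
        err = (d_hat - d_true) ** 2
        if err < best_err:
            best_bw, best_err = bw, err
    return best_bw
```

The conditional density sketch from Section 4 can be passed as the `kde` argument.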

7.2. Monte Carlo analysis

The simulated model is

y_i = I(v_i + b_1 + b_2 x_{2i} + e_i > 0)   (7.4)

for scalar random variables y_i, v_i, x_{2i}, and e_i (x_{1i} = 1 is the constant). Let e_{i1}, e_{i2}, e_{i3}, and e_{i4} be independent random variables, each having mean zero and variance one, where e_{i1} is uniformly distributed, e_{i2} and e_{i3} are standard normals, and e_{i4} is a mixture of normals defined below.

The 'clean probit' design defines regressors, errors, and instruments by

x_{2i} = e_{i1},   v_i = 2 e_{i2},   e_i = e_{i3},   u_i = x_{2i},   z_i = (1, u_i)^T,   (7.5)

making v_i and e_i independent normals, and x_{2i} an independent uniformly distributed regressor. In this clean design z_i = x_i, so the two-stage least-squares estimator reduces to ordinary least squares.

Let e_{i4} be N(-0.3, 0.91) with probability 0.75 and N(0.9, 0.19) with probability 0.25. This mixture of normals is designed to yield a distribution that is both skewed and bimodal, but still has mean zero and variance one. The 'messy' design defines regressors, errors, and instruments by

x_{2i} = e_{i1} + e_{i4},   v_i = 2 e_{i2} + e_{i4},   e_i = e_{i1} + e_{i3},   u_i = e_{i4},   z_i = (1, u_i)^T.   (7.6)

The result is that this messy design has regressors and instruments that are all skewed, multimodal, and cross correlated with each other, and further has errors e_i that are correlated with x_i with correlation 0.5. The true values of the coefficients are b_1 = b_2 = 1. The models are estimated with sample size N = 100. The number of Monte Carlo replications is 10,000. The results are in Tables 1 and 2.
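As an illustration, a sketch of the messy design (7.6) as a data-generating function (the clean design (7.5) is analogous; the random number generator and seed are illustrative):

```python
import numpy as np

def messy_design(N, rng):
    """Generate one sample from the 'messy' design (7.6), with b1 = b2 = 1."""
    e1 = rng.uniform(-np.sqrt(3), np.sqrt(3), N)           # mean 0, variance 1
    e2, e3 = rng.normal(size=(2, N))
    mix = rng.random(N) < 0.75                              # mixture component
    e4 = np.where(mix, rng.normal(-0.3, np.sqrt(0.91), N),
                       rng.normal(0.9, np.sqrt(0.19), N))   # skewed, bimodal
    x2 = e1 + e4
    v = 2 * e2 + e4
    e = e1 + e3
    u = e4
    y = (v + 1.0 + 1.0 * x2 + e > 0).astype(float)          # Eq. (7.4)
    x = np.column_stack([np.ones(N), x2])
    z = np.column_stack([np.ones(N), u])
    return y, v, x, z, u

rng = np.random.default_rng(0)
y, v, x, z, u = messy_design(100, rng)
```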

The two rows in each block of numbers in the tables correspond to the estimates b̂_1 and b̂_2, respectively. The summary statistics reported for the Monte Carlo distribution of each b̂_j are the mean (MEAN), the standard deviation (SD), the lower quartile (LQ), the median (MED), the upper quartile (UQ), the root mean squared error (RMSE), the mean absolute error (MAE), and the median absolute error (MDAE).

The replications include calculation of the estimated standard errors of the b̂'s, as described after Eq. (4.12). The mean of these estimated standard errors (MESE) is reported, as is the percent of replications in which each b̂_j is within two estimated standard errors of the true b_j (%2SE).

Four different estimators are compared. The first is maximum likelihood probit.

The second estimator is the proposed semiparametric estimator in Eqs. (4.3)-(4.8), but using the true conditional density f(v|u) that the v_i observations are drawn from.

The third estimator is the same as the second, except that it uses the kernel-estimated density f̂(v|u) instead of the true f(v|u). There are two sources of error in b̂. One source is the least-squares estimation error from regressing y*_i on x_i, and the other source of error is the use of the estimated conditional density function f̂ in place of the true density f in constructing ŷ*_i. The second and third blocks of numbers in the tables can be compared to assess the separate contributions of these sources of error. The estimated density f̂ in the third estimator is given by Eqs. (B.1)-(B.3) with d_i empty and c_i = u_i. The kernels and bandwidths are chosen as described in the previous section. Although v_i and e_i have supports equal to the whole real line, no trimming is done, so q_N = 0 in Eq. (B.3).

The fourth reported estimator is the simple ordered data estimator, Eq. (4.16).

As would be hoped, the semiparametric estimator using the true f appears close to mean unbiased in both the clean and messy designs. The semiparametric estimator using f̂ has a 12-14% mean and median bias in the clean design, and a more substantial 20-57% bias in the messy design. For comparison, ML probit appears unbiased in the clean design, but is much worse than the semiparametric estimator in the messy design (where it is inconsistent), having a 46-91% bias. The observation that the semiparametric estimator using the true f is virtually mean unbiased in both designs implies that the estimator will be unbiased even in small samples when the density f is known, and should be close to unbiased whenever samples are large enough to accurately estimate f.

The simple ordered data estimator performed comparably to the kernel-based estimator, having greater variance and less mean bias, leading overall to a modestly higher root mean squared error.

Variances and other measures of error magnitudes are similar across the estimators, with differences being primarily due to the above mean and median biases. The exceptions are that the root mean squared errors of ML are about 30% smaller than those of the semiparametric estimator in the clean probit design (where ML is efficient), and the variance of b̂ is quite large using the true f in the messy design. This latter result is likely due to a few outliers arising from very tiny f values, and might be mitigated with trimming. A similar result is found in Lewbel (1997). It is also possible that the asymptotic variance using f̂ is smaller than the asymptotic variance using the true f. This would be consistent with the density f being ancillary to b̂.

In applications, confidence intervals and test statistics are often as important as point estimates, yet many Monte Carlo analyses fail to evaluate the quality of standard error estimates. Standard errors are estimated here exactly as they would be with real data. In each replication, q_i is estimated by replacing y*_i with ŷ*_i and replacing the conditional expectations with a kernel regression. Standard errors are then given by the square root of the diagonal of D̂ var(q̂_i - z_i x_i^T b̂) D̂^T / N. For the probit model, White-corrected standard errors are reported. In keeping with the spirit of the simple ordered data estimator, for that estimator ordinary two-stage least-squares White-corrected standard errors (which ignore estimation error in the construction of ŷ**_i) are reported.

Standard errors are of course intended to estimate the standard deviations of b̂. In almost all of the estimators, the mean estimated standard errors of the estimators are close to the Monte Carlo standard deviations, as desired. The exception is the estimator using the true f in the messy design, which as discussed above suffers from outliers. The coverage probabilities of ±2 standard error confidence intervals are close to (but a little smaller than) 95%, except for b̂_2 in the messy design, which suffered from substantial mean bias.

As discussed earlier, based on the support assumptions it is expected that the larger the observed spread of v, the better the performance of the estimator. This is demonstrated in Table 3. In Tables 1 and 2, v was constructed to have the same standard deviation as x^T b + e. The design in Table 3 is identical to the messy design in Table 2, except that each v was doubled, so in Table 3 the standard deviation of v is twice the standard deviation of x^T b + e. Comparing Tables 2 and 3 shows that this increased spread in v reduces the mean and median bias considerably.

These results are encouraging. Limited experiments with other sample sizes indicate that the mean and median biases, where present, do diminish as N increases, though rather slowly.

Table 1
Clean probit design

                     j   MEAN  SD    LQ    MED   UQ    RMSE  MAE   MDAE  MESE  %2SE
Probit ML            1   1.00  0.21  0.86  0.99  1.14  0.21  0.17  0.14  0.20  0.94
                     2   1.01  0.22  0.86  1.00  1.15  0.22  0.17  0.15  0.21  0.94
Semipar, true f      1   1.00  0.28  0.81  0.99  1.17  0.28  0.22  0.18  0.27  0.94
                     2   1.00  0.30  0.80  0.98  1.19  0.30  0.24  0.20  0.28  0.94
Semipar, kernel f̂    1   1.13  0.27  0.95  1.13  1.31  0.30  0.24  0.20  0.29  0.94
                     2   1.14  0.32  0.92  1.12  1.33  0.35  0.27  0.21  0.32  0.92
Simple ordered       1   1.00  0.30  0.80  0.98  1.19  0.30  0.24  0.19  0.34  0.97
                     2   1.00  0.36  0.76  0.98  1.20  0.36  0.28  0.23  0.36  0.94

Table 2
Messy design

                     j   MEAN  SD    LQ    MED   UQ    RMSE  MAE   MDAE  MESE  %2SE
Probit ML            1   1.46  0.35  1.22  1.43  1.67  0.57  0.48  0.44  0.33  0.76
                     2   1.91  0.39  1.64  1.85  2.13  0.99  0.91  0.85  0.37  0.24
Semipar, true f      1   1.01  2.10  0.53  0.85  1.23  2.10  0.55  0.38  0.60  0.90
                     2   0.99  2.64  0.38  0.73  1.20  2.64  0.70  0.49  0.69  0.80
Semipar, kernel f̂    1   0.80  0.38  0.55  0.81  1.06  0.43  0.34  0.29  0.37  0.92
                     2   0.43  0.40  0.18  0.44  0.69  0.69  0.59  0.57  0.34  0.59
Simple ordered       1   0.87  0.60  0.47  0.82  1.19  0.61  0.47  0.39  0.57  0.91
                     2   0.77  0.69  0.32  0.67  1.09  0.73  0.57  0.49  0.60  0.80

Table 3
Messy design, v doubled

                     j   MEAN  SD    LQ    MED   UQ    RMSE  MAE   MDAE  MESE  %2SE
Probit ML            1   1.34  0.36  1.10  1.33  1.57  0.49  0.40  0.35  0.33  0.81
                     2   1.73  0.32  1.51  1.71  1.93  0.80  0.73  0.71  0.31  0.33
Semipar, true f      1   1.00  0.69  0.55  0.95  1.37  0.69  0.51  0.42  0.60  0.93
                     2   0.97  0.87  0.42  0.85  1.38  0.87  0.63  0.50  0.68  0.88
Semipar, kernel f̂    1   0.99  0.58  0.61  1.00  1.38  0.58  0.46  0.38  0.56  0.94
                     2   0.71  0.59  0.32  0.70  1.09  0.66  0.53  0.45  0.53  0.87
Simple ordered       1   0.98  0.79  0.47  0.90  1.40  0.79  0.59  0.47  0.71  0.92
                     2   0.94  0.96  0.35  0.81  1.36  0.96  0.68  0.53  0.77  0.86

8. Conclusions

This paper used instruments (or regressors) uncorrelated with latent variable errors and a 'special regressor' to obtain semiparametric identification of discrete choice model coefficients in the presence of conditional heteroscedasticity of unknown form, and for some endogenous and mismeasured regressor models. Simple root N consistent, asymptotically normal estimators for the binary choice model were constructed based on this identification. A Monte Carlo analysis shows that the estimators work reasonably well.

The Monte Carlo also indicates that the asymptotic standard error estimates perform well. It should be possible to construct bootstrap confidence intervals instead (though the consistency of doing so is not proved here). The estimator requires at most order N² calculations, so bootstrapping would be computationally practical.

Successful application of the estimator hinges crucially on appropriate selection and construction of the special regressor v. Theoretical and simulation results suggest that v should be demeaned or otherwise centered close to zero, var(v|x) should be large relative to var(x^T b), and v should have a strong positive effect on y.

Further study is called for on ways to improve the finite sample properties of the estimator. For example, accuracy might be improved by replacing v and x^T b with v - τ and x^T b + τ for one or more values of τ, or more generally replacing v and x^T b with v - x^T h and x^T(b + h) for some vector h. Different values of τ or h will result in different numbers of observations in which y_i - I(v_i > 0), and hence y*_i, is nonzero. Possibilities might include letting τ be the sample mean or median of v, or letting h be the coefficients from regressing v on x, analogous to the ordered data estimator. Note that τ or h would need to be chosen in some way that does not violate the assumed conditional independence of e with the transformed v.

It would also be useful to study the more complicated estimators based on Eqs. (3.9), (3.12), and (4.16), to see if they perform better in finite samples. Estimation accuracy might also be improved by better bandwidth choice procedures.

The estimators in this paper can be applied and extended in a variety of ways. Honoré and Lewbel (1999) extend them to derive binary choice panel estimators with weakly exogenous regressors. Khan and Lewbel (1999) construct related estimators for censored regression models. Lewbel and McFadden (1997) apply the present paper's estimator as the first stage in the estimation of features of the distribution of the latent variable in applications where v is determined by experimental design.

Efficiency of this paper's estimator might be increased by use of some weighted least-squares procedure, or by estimating b and the density of v jointly instead of in two stages. Lewbel and McFadden (1997) report comparisons of the ordinary least squares (z = x) estimator derived here to Klein and Spady (1993). They find that this paper's least-squares estimator is very inefficient relative to Klein and Spady when the errors are independent of the regressors (i.e., when Klein and Spady is consistent and semiparametrically efficient). It is not known what the efficiency bound is under the present paper's assumptions, where general forms of heteroscedasticity are permitted.

Acknowledgements

This research was supported in part by the National Science Foundation through grant SBR-9514977. The author wishes to thank some anonymous referees, Dan McFadden, Bo Honoré, Jim Heckman, Xiaohong Chen, Richard

Appendix A. Proofs

Eq. (3.8) then follows from the law of iterated expectations.

Proof of Theorem 2. Note that t(e) = 0 at e = 0, and that the last equality above is an integration by parts, which uses the assumption that [t(e) F*(e, x, z)] evaluated from -∞ to ∞ equals zero. By the above equation, the definition of F*, and Eq. (5.4),

E[t(e)|x, z] = ∫_{-∞}^{∞} t'(e) [G(-e - x^T b, x, z) + I(e > 0) - 1] de.

Next do the change of variables v = -e - x^T b to get

E[t(e)|x, z] = ∫_{-∞}^{∞} t'(-v - x^T b) [G(v, x, z) + I(v + x^T b < 0) - 1] dv
            = ∫_{-∞}^{∞} t'(-v - x^T b) [E(y|v, x, z) + I(v + x^T b < 0) - 1] dv
            = E( t'(-v - x^T b) [E(y|v, x, z) + I(v + x^T b < 0) - 1] / f(v|x, z) | x, z ).

Eq. (5.5) then follows from the law of iterated expectations.

Appendix B. Root N convergence

A set of regularity conditions that are sufficient for root N consistent, asymptotically normal convergence of ĝ, and hence of b̂, is provided here as Theorem 3 below. Theorem 3 is a special case of a two-step estimator with a kernel-estimated nonparametric first step. The general theory of such estimators is described in Newey and McFadden (1994) and Newey (1994). The proof of Theorem 3 is omitted, because the result is a special case of the theorems provided in Appendix B of Lewbel (1998), and is little more than an application of Theorem 8.11 in Newey and McFadden (1994). See also Robinson (1988), Powell et al. (1989), Lewbel (1995), and Andrews (1995).

As discussed in Section 4, define u to be the vector of variables such that f(v|u) = f(v|z) and no element of u equals a function of other elements of u. In Theorem 3, f(v|u) is used for estimating g. The vector u below is divided into a vector of continuously distributed elements c and a vector of discretely distributed elements d, to permit regressors and instruments of both types. Theorem 3 gives the limiting distribution for an estimate of E[w/f(v|u)] for arbitrary w.

Assumption B.1. Each (w_i, v_i, u_i) is an independently, identically distributed draw from some joint data generating process, for i = 1,…,N. Let Ω denote the support of the distribution of (w_i, v_i, u_i).


Assumption B.2. Let $u_i=(c_i,d_i)$ for some vectors $c_i$ and $d_i$. The support of the distribution of $c_i$ is a compact subset of $\mathbb{R}^k$. The support of the distribution of $d$ is a finite number of real points. The support of the distribution of $v_i$ is some interval $[L,K]$ on the real line $\mathbb{R}$, for some constants $L$ and $K$, $-\infty\le L<K\le\infty$. The underlying measure $\nu$ can be written in product form as $\nu=\nu_w\times\nu_v\times\nu_c\times\nu_d$, where $\nu_c$ is Lebesgue measure on $\mathbb{R}^k$. Each $c_i$ is drawn from an absolutely continuous distribution (with respect to a Lebesgue measure with $k$ elements). Define $f_u(u_i)$ to be the (Radon–Nikodym) conditional density of $u_i$ given $d_i$, multiplied by the marginal probability mass function of $d_i$, and define $f_{vu}(v_i,u_i)$ to be the (Radon–Nikodym) conditional density of $(v_i,u_i)$ given $d_i$, multiplied by the marginal probability mass function of $d_i$.

Assumption B.3. Let $f(v\mid u)=f_{vu}(v,u)/f_u(u)$ and let $h_i=w_i/f(v_i\mid u_i)$. Assume $f(v\mid u)$ is bounded away from zero except possibly on the boundary of the support of $v$, and assume that for any positive constant $q$, $\sup_{u\in\Omega}I(|v|\le q^{-1})\,|w|/f(v\mid u)$ exists. Assume there exists a positive constant $\varepsilon$ such that $w=0$ whenever $c$ is within a distance $\varepsilon$ of the boundary of the support of $c$.

Assumption B.4. Let $I_{qi}=I(|v_i|\le q^{-1})$, $n_u(u_i)=-E(I_{qi}h_i\mid u_i)$ and $n_{vu}(v_i,u_i)=\ldots$

Assumption B.5. The kernel functions $K_c(c)$ and $K_{vc}(v,c)$ have supports that are convex, possibly unbounded subsets of $\mathbb{R}^k$ and $\mathbb{R}^{k+1}$, respectively, with nonempty interiors, with the origin as an interior point. The kernel functions are bounded differentiable functions having absolutely integrable Fourier transforms. $K_c(c)$ satisfies $\int K_c(c)\,\mathrm{d}c=1$, $\int cK_c(c)\,\mathrm{d}c=0$, and $K_c(c)=K_c(-c)$ for all $c$, and similarly for $K_{vc}(v,c)$.


Assumption B.6. The kernel $K_c$ ... All partial derivatives of $f_t(c,d)$ with respect to $c$ of order $p$ exist, and for all $0\le o\le p$ and all $d$ on the support of $d_i$, for $l_1+\cdots+l_k=o$, $\int n_u(c,d)\,[\partial^{o}f_t(c,d)/\partial c_1^{l_1}\cdots\partial c_k^{l_k}]\,\mathrm{d}c$ exists, where the integral is over the support of $c$. All of the conditions in this assumption also hold for $K_{vc}$ and $f_{vc}$, replacing $c$ with $(v,c)$ everywhere above.
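For concreteness, the sketch below (not from the paper) gives one kernel pair satisfying the symmetry, unit integral, and zero first moment requirements of Assumption B.5: a Gaussian product kernel for $K_c$ and its $(k+1)$-dimensional analogue for $K_{vc}$. Whether a higher-order, bias-reducing kernel is needed to meet the order-$p$ conditions of Assumption B.6 depends on the dimension $k$ and the bandwidth rate, so this choice should be viewed only as a baseline.

```python
# Gaussian product kernels: symmetric, integrate to one, have zero first moments, are smooth,
# and have absolutely integrable Fourier transforms, as required by Assumption B.5.
import numpy as np

def K_c(c):
    """Product of standard normal densities over the k columns of c (shape (n, k))."""
    c = np.atleast_2d(c)
    k = c.shape[1]
    return np.exp(-0.5 * np.sum(c ** 2, axis=1)) / (2.0 * np.pi) ** (k / 2.0)

def K_vc(v, c):
    """Kernel on (v, c) in R^(k+1): a standard normal density in v times K_c(c)."""
    v = np.asarray(v, dtype=float)
    return np.exp(-0.5 * v ** 2) / np.sqrt(2.0 * np.pi) * K_c(c)
```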

Assumption B.7. There exists a positive constant $q^*$ such that $L<-1/q^*$, $1/q^*<K$, and $y-I(v>0)=0$ for all $|v|>1/q^*$. For all $N$ greater than some constant, $q_N$ satisfies $q_N<q^*$.

De"ne the kernel density estimators:

fKu(c,d)"(NbkN)~1+N

N is the bandwidth andqN is a trimming parameter.

Eqs. (B.1) and (B.2) constructfKu andfKvu separately for each value of d

i, and

then averages the results. Theorem 3 below also holds ifI(d"d

i) in Eqs. (B.1)

and (B.2) is replaced by K

d[(d!di)/bN] for some kernel function Kd, which

results in smoothing data across discreted&cells'at small sample sizes, and at

large sample sizes becomes equal to (B.1) and (B.2).
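The following sketch (an illustration, not code from the paper) implements estimators of this cell-by-cell form, using the Gaussian kernels $K_c$ and $K_{vc}$ sketched above; $d$ is treated as a one-dimensional array of discrete labels, and the bandwidth $b_N$ is left to the user, so both are assumptions of the sketch.

```python
# Kernel estimators in the spirit of Eqs. (B.1) and (B.2): average kernel weights only over
# observations in the same discrete cell d, then form f_hat(v|u) = f_hat_vu / f_hat_u.
import numpy as np

def fhat_u(c0, d0, c, d, bN):
    """Estimate f_u at the point (c0, d0) from data c (shape (n, k)) and labels d."""
    in_cell = (d == d0).astype(float)                  # I(d = d_i)
    k = c.shape[1]
    return np.mean(in_cell * K_c((c0 - c) / bN)) / bN ** k

def fhat_vu(v0, c0, d0, v, c, d, bN):
    """Estimate f_vu at (v0, c0, d0); one extra continuous dimension for v."""
    in_cell = (d == d0).astype(float)
    k = c.shape[1]
    return np.mean(in_cell * K_vc((v0 - v) / bN, (c0 - c) / bN)) / bN ** (k + 1)

def fhat_v_given_u(v0, c0, d0, v, c, d, bN):
    """Plug-in conditional density f_hat(v|u) = f_hat_vu / f_hat_u, as in Assumption B.3."""
    return fhat_vu(v0, c0, d0, v, c, d, bN) / fhat_u(c0, d0, c, d, bN)
```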


Theorem 3 implies that $\sqrt{N}(\hat g-g)\Rightarrow N[0,\operatorname{var}(q)]$. Eqs. (4.9)--(4.11) follow from Theorem 3 by taking $w_i=z_i[y_i-I(v_i>0)]$ in Assumptions B.1 and B.3, and Eq. (B.6).

In general, kernel plug-in estimators employ some form of trimming to deal with boundary issues that otherwise interfere with root $N$ convergence. Assumption B.3 assumes $w_i=0$ when $c_i$ is near a boundary, which automatically trims out those observations. This assumption could be replaced by asymptotic trimming. Moreover, this trimming is only needed to deal with the effects on the bias and variance of terms (in the sample average that defines $\hat g$) that lie near the boundary of the support of $c$. Since the probability of observing a $c_i$ near the boundary of the support is zero, and the magnitude of $h_i$ remains finite even when $c_i$ is on the boundary, violations of this assumption are unlikely to seriously affect the estimator in practice.

Next, consider trimming of $v$. The estimator in Theorem 3 is asymptotically equivalent to a fixed `trimming' of all observations having $|v|>1/q^*$, which is inside the support of $v$. By Assumption B.7, $h_i=0$ for all $|v_i|>1/q^*$, so the estimand is unaffected by this trimming. Given Eq. (1.1), this assumption requires that $x^{\mathrm{T}}b+e$ have bounded support, and hence rules out normal or logistic errors. This is not as serious a limitation as it might seem, since for example an $e$ distribution that was a normal, truncated at plus and minus 100 standard deviations, would not be ruled out, and would be indistinguishable from a normal at any feasible sample size.

Alternatively, from Lewbel (1997) (based on Robinson, 1988), Theorem 3 will hold replacing Assumption B.7 with the following asymptotic trimming Assumption B.7', which permits both $e$ and $v$ to have support equal to the whole real line, but requires the distribution of $v$ to have relatively thick tails.

Assumption B.7'. For all $(u,v)$ on their support, $E([|h|+E(|h|)+E(|h|\mid u)+E(|h|\mid u,v)]^{l})$ exists for some $l>2$. The support of $c$ is compact. Let $f_v(v)$ be the marginal density of an observation of $v$ and let $\zeta(v)=E[hf_v(v)\mid v]$. There exist constants $d>0$ and $d^*>0$ such that as $|v|\to\infty$, $\zeta(v)=\mathrm{O}(|v|^{-1-d})$ and $E[|q|^2f_v(v)\mid v]=\mathrm{O}(|v|^{-1-d^*})$. $Nq_N^{2d}\to0$ as $N\to\infty$.

Given these assumptions, $\hat f$ can and should be trimmed away from zero.
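A minimal sketch of how both forms of trimming might be applied in practice is given below; the rate chosen for $q_N$, the use of $q_N$ itself as the density floor, and the one-dimensional $w$ are placeholder assumptions, not values prescribed by the paper.

```python
# Asymptotic trimming: drop observations with very large |v| (the bound 1/q_N grows with N)
# and keep the estimated density away from zero. The rate q_N = N**(-0.1) and the use of
# q_N as the density floor are illustrative assumptions only; w is a 1-d array of terms.
import numpy as np

def trimmed_terms(w, v, f_hat, N):
    qN = N ** (-0.1)                          # trimming parameter, q_N -> 0 as N grows
    keep = (np.abs(v) <= 1.0 / qN)            # I(|v_i| <= 1/q_N)
    return keep * w / np.maximum(f_hat, qN)   # terms that are averaged to form the estimator
```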

References

Ait-Sahalia, Y., Bickel, P.J., Stoker, T.M., 1997. Goodness of fit tests for regression using kernel methods. Unpublished manuscript.


Gallant, A.R., Nychka, D.W., 1987. Semi-nonparametric maximum likelihood estimation. Econometrica 55, 363–390.
Härdle, W., Hart, J., Marron, J.S., Tsybakov, A.B., 1992. Bandwidth choice for average derivative estimation. Journal of the American Statistical Association 87, 218–226.
Härdle, W., Horowitz, J.L., 1996. Direct semiparametric estimation of single-index models with discrete covariates. Journal of the American Statistical Association 91, 1632–1640.
Honoré, B.E., Lewbel, A., 1999. Semiparametric binary choice panel data models without strictly exogeneous regressors. Unpublished manuscript.
Horowitz, J.L., 1992. A smoothed maximum score estimator for the binary response model. Econometrica 60, 505–532.
Horowitz, J.L., 1993. Semiparametric estimation of a work-trip mode choice model. Journal of Econometrics 58, 49–70.
Ichimura, H., 1993. Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. Journal of Econometrics 58, 71–120.
Khan, S., Lewbel, A., 1999. Weighted and two stage least squares estimation of semiparametric censored and truncated regressions. Unpublished manuscript.
Klein, R., Spady, R.H., 1993. An efficient semiparametric estimator for binary response models. Econometrica 61, 387–421.
Lewbel, A., 1995. Consistent nonparametric tests with an application to Slutsky symmetry. Journal of Econometrics 67, 379–401.
Lewbel, A., 1997. Semiparametric estimation of location and other discrete choice moments. Econometric Theory 13, 32–51.
Lewbel, A., 1998. Semiparametric latent variable model estimation with endogenous or mismeasured regressors. Econometrica 66, 105–121.
Lewbel, A., McFadden, D.L., 1997. Estimating features of a distribution from binomial data. Unpublished manuscript.
Maddala, G.S., 1983. Limited Dependent and Qualitative Variables in Econometrics. Econometric Society Monograph No. 3. Cambridge University Press, Cambridge.
Manski, C., 1975. Maximum score estimation of the stochastic utility model of choice. Journal of Econometrics 3, 205–228.
Manski, C., 1985. Semiparametric analysis of discrete response: asymptotic properties of the maximum score estimator. Journal of Econometrics 27, 313–334.
McFadden, D.L., 1984. Econometric analysis of qualitative response models. In: Griliches, Z., Intriligator, M.D. (Eds.), Handbook of Econometrics, Vol. 2. Elsevier, Amsterdam, pp. 1395–1457.
McFadden, D.L., 1993. Estimation of social value from willingness-to-pay data. Unpublished manuscript.
Newey, W.K., 1985. Semiparametric estimation of limited dependent variable models with endogenous explanatory variables. Annales de l'INSEE 59–60, 219–237.
Newey, W.K., 1994. The asymptotic variance of semiparametric estimators. Econometrica 62, 1349–1382.
Newey, W.K., McFadden, D., 1994. Large sample estimation and hypothesis testing. In: Engle, R.F., McFadden, D.L. (Eds.), Handbook of Econometrics, Vol. IV. Elsevier, Amsterdam, pp. 2111–2245.
Newey, W.K., Ruud, P.A., 1994. Density weighted linear least squares. University of California at Berkeley working paper.
Powell, J.L., Stock, J.H., Stoker, T.M., 1989. Semiparametric estimation of index coefficients. Econometrica 57, 1403–1430.
Powell, J.L., Stoker, T.M., 1996. Optimal bandwidth choice for density-weighted averages. Journal of Econometrics 75, 291–316.


Ruud, P.A., 1983. Sufficient conditions for the consistency of maximum likelihood estimation despite misspecification of distribution in multinomial discrete choice models. Econometrica 51, 225–228.
