GMM estimation of linear panel data models with time-varying individual effects

Seung Chan Ahn^a,*, Young Hoon Lee^b, Peter Schmidt^c

^a Department of Economics, Arizona State University, Tempe, AZ 85287, USA
^b Department of Economics, Hansung University, Seoul, South Korea
^c Department of Economics, Michigan State University, E. Lansing, MI 48824, USA

Received 25 July 1995; received in revised form 10 August 2000; accepted 2 October 2000

* Corresponding author. Tel.: +1-480-965-6574; fax: +1-480-965-0748. E-mail address: [email protected] (S.C. Ahn).
Abstract
This paper considers models for panel data in which the individual effects vary over time. The temporal pattern of variation is arbitrary, but it is the same for all individuals. The model thus allows one to control for time-varying unobservables that are faced by all individuals (e.g., macro-economic events) and to which individuals may respond differently. A generalized within estimator is consistent under strong assumptions on the errors, but it is dominated by a generalized method of moments estimator. This is perhaps surprising, because the generalized within estimator is the MLE under normality. The efficiency gains from imposing second-moment error assumptions are evaluated; they are substantial when the regressors and effects are weakly correlated. © 2001 Elsevier Science S.A. All rights reserved.

JEL classification: C23

Keywords: Panel data; Time-varying effects; Generalized method of moments; MLE
1. Introduction
The literature on panel data often assumes observations on a large number of individuals with several observations on each individual. The fixed effects model has been widely adopted as a treatment of unobservable individual heterogeneity in such data. This model assumes that an unobservable time-invariant variable (or set of variables) is associated with each individual. (See, for example, Mundlak, 1978; MaCurdy, 1981; Chamberlain, 1984.) The impact of the missing variable on the dependent variable is referred to as the individual effect, and one wishes to avoid the potential bias that occurs when the individual effect is correlated with the included explanatory variables. The conventional fixed-effects treatment of time-invariant individual effects has been the within estimator, which is equivalent to least squares after transformation of the data to deviations from individual means.
In this paper, we consider the case that the individual effects are time varying. That is, we wish to accommodate time-varying unobservables that affect the dependent variable differently for different individuals, and that may be correlated with the included explanatory variables. This is possible under the assumption that the temporal pattern (though not necessarily the level) of the effects is the same for each individual.
More specifically, we will consider the model
y_it = X_it β + Z_i γ + u_it,
u_it = θ_t α_i + ε_it,   i = 1, …, N,  t = 1, …, T.   (1)
Here i denotes the cross-sectional unit (individual) and t denotes time. We assume that N is large while T is small, so that asymptotic properties of estimates will be derived assuming N → ∞ with T fixed. The dependent variable is y_it, X_it is a 1×k vector of time-varying explanatory variables, and Z_i is a 1×g vector of time-invariant regressors. The last entry of Z_i is one, so that the last parameter in γ denotes the overall intercept term. The ε_it are random noise with E(ε_it) = 0. The α_i are unobservables, and θ_t is the parameter measuring the effect of α_i on y_it at time t. Clearly this specification implies that the temporal pattern of the effect of α_i on y_it is the same across different i. For identification of the model, we set θ_1 = 1. (This does not involve any loss of generality, so long as θ_1 ≠ 0. Alternatively, we can follow Kiefer (1980a), who normalizes the model by setting Σ_{t=1}^T θ_t² = 1.) This model includes the conventional fixed effects model as the special case that θ_t = 1 for all t.
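As a concrete illustration of model (1), the following minimal simulation sketch (Python with numpy; all parameter values are hypothetical choices of ours, not taken from the paper) generates a panel in which θ_t varies over time and α_i is correlated with the time-varying regressors:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, k = 500, 4, 1                        # large N, small T, one time-varying regressor
beta = np.array([1.0])                     # coefficient on X_it
gamma = np.array([0.5, 2.0])               # coefficients on Z_i; last entry is the intercept
theta = np.array([1.0, 1.2, 0.8, 1.5])     # theta_1 normalized to one

alpha = rng.normal(1.0, 1.0, N)            # individual effects alpha_i
# X is correlated with alpha_i, as the fixed-effects interpretation allows
X = rng.normal(0.0, 1.0, (N, T, k)) + 0.5 * alpha[:, None, None]
Z = np.column_stack([rng.normal(0.0, 1.0, N), np.ones(N)])   # last entry of Z_i is one
eps = rng.normal(0.0, 1.0, (N, T))         # iid errors, so (CA.1) and (CA.2) below hold

# y_it = X_it beta + Z_i gamma + theta_t alpha_i + eps_it
y = X @ beta + (Z @ gamma)[:, None] + theta * alpha[:, None] + eps
```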
This model may be useful in several applications. First, in a model of earnings or hourly wages, the effect α_i may be the individual's unobservable talent or skill, and the parameter θ_t would represent the price of the skill, which is not necessarily constant over time.¹ Second, specification (1) may be used in the context of efficiency measurement: Lee and Schmidt (1993) have applied model (1) to the frontier production function problem, assuming that the temporal pattern of technical inefficiency is the same for all firms. Third, as noted above, a more common motivation would be to control for unobservables that are not time invariant, but whose temporal variation is the same for all individuals. A macro shock would have this feature: in a rational expectations model using panel data, the parameter θ_t could be thought of as a macro shock and the effect α_i as the effect of a shock on individual i (heterogeneous over individuals due to differences in individuals' information sets, preferences, etc.). There is a large literature on estimation of rational expectations models using panel data. This literature dates back at least to Hall and Mishkin (1982) and Shapiro (1984); see Keane and Runkle (1992) for a more recent example. The equation that is estimated has an error that is interpretable as a forecast error, or macro shock, and which correspondingly cannot be averaged away over individuals. Thus, the usual orthogonality conditions that reflect uncorrelatedness of the shock with variables in the information set at the time the forecast is made do not lead to consistent estimation as N → ∞ with T fixed (see Hayashi, 1992). However, the model given above can accommodate such a macro shock. Further examples for model (1) are given in Appendix A.

¹ We would like to thank an Associate Editor for providing us this example.
There are only a few previous studies of models which involve time-varying individual effects. Holtz-Eakin et al. (1988) consider a vector autoregressive model which allows for nonconstant individual effects (see also Chamberlain, 1984). They estimate the model by applying instrumental variables to the quasi-differenced autoregressive equations. Chamberlain (1992) considers a general random coefficient model which specifies only the conditional expectation of the dependent variable given exogenous variables and the individual effect. He derives an efficient generalized method of moments (GMM) estimator based on the moment conditions implied by the conditional mean specification.
This paper was motivated by the approach of Kiefer (1980a) and Lee (1991). They considered estimation of (1) by least squares, treating both the θ_t and the α_i as fixed, and derived a concentrated least-squares (CLS) estimator as the solution to an eigenvalue problem. If the ε_it are iid normal and the explanatory variables are independent of the ε_it, this estimator is the maximum likelihood estimator (MLE) and can also be interpreted as a conditional maximum likelihood estimator (CMLE). A nonstandard and notable result is that the consistency of the CLS estimator depends on the second-moment assumptions on the ε_it. In particular, consistency requires that the ε_it be non-autocorrelated and have constant variance. Furthermore, given the strong assumptions needed for consistency, the CLS estimator turns out to be inefficient. We show in this paper that it is dominated by a GMM estimator that makes use of the first- and second-order moment conditions implied by exogeneity of the regressors and by nonautocorrelation and homoskedasticity of the ε_it. This is an unusual and intriguing result, because the incidental parameters problem is usually thought to threaten the consistency of the MLE, not its efficiency. Neyman and Scott (1948) provide another example of a case in which the MLE is consistent but not efficient.
This paper differs from the work of Chamberlain (1992) primarily because we view the regression model as a projection model rather than one based on conditional expectations. Thus, for example, we impose conditions of no correlation between explanatory variables and errors, but we do not consider nonlinear functions of explanatory variables that may be relevant under the conditional expectation model. Instead, we focus on exploiting the moment conditions implied by restrictions on the variances and covariances of the errors, and we show that imposing these conditions can result in substantial efficiency gains. In keeping with the spirit of the projection model, we consider the unconditional variances and covariances of the errors, whereas a generalization of the conditional expectations model would naturally focus on conditional variances and covariances, and again would generate additional nonlinear moment conditions.
The plan of the paper is as follows. Section 2 defines some notation and lists some assumptions. Section 3 considers GMM estimation based on assumptions of exogeneity of the regressors and several sets of assumptions about the second moments of the errors. Section 4 gives some results on the efficiency of these estimators. Section 5 examines the properties of the CLS estimator from a GMM perspective and shows why it is inefficient. Section 6 provides calculations of the relative asymptotic efficiency of the GMM and CLS estimators. Finally, our concluding remarks are given in Section 7.
2. Preliminaries
The model to be considered is as given in Eq. (1) above. Defining θ = (θ_2, …, θ_T)′ and ξ = (1, θ′)′, we can write the T observations for individual i as

y_i = X_i β + e_T Z_i γ + ξ α_i + ε_i,   (2)

where y_i = (y_i1, …, y_iT)′, X_i = (X_i1′, …, X_iT′)′, ε_i = (ε_i1, …, ε_iT)′, and e_T is the T×1 vector of ones. We also define W_i = (X_i1, …, X_iT, Z_i), the 1×(Tk+g) vector of all of the exogenous variables.
We first make the following Basic Assumptions (BA):

(BA.1) [W_i, α_i, ε_i′] is independently and identically distributed (iid) over i.
(BA.2) ε_it has finite moments up to fourth order, and E(ε_it) = 0.
(BA.3) The second moment matrix Σ of [W_i, α_i] is finite and nonsingular.
(BA.4) Σ_wα ≡ E(W_i′ α_i) ≠ 0.
(BA.5) E(W_i′ ε_it) = 0 and E(α_i ε_it) = 0 for t = 1, …, T.
Although we treat the α_i as random, our model is intended to capture the essential feature of a fixed effects model, in the usual sense that α_i is allowed to be correlated with W_i in an unrestricted way. The assumption that α_i is uncorrelated with ε_i is standard, and is typically motivated in the following way. If the α_i were taken to be literally fixed, we would plausibly assume a condition like 'lim_{N→∞} N⁻¹ Σ_i α_i² exists and is finite'. But, given our assumptions on ε_i, this would imply that plim_{N→∞} N⁻¹ Σ_i α_i ε_i = 0, which is the same result as is implied by our assumption that α_i and ε_i are uncorrelated.
Assumption (BA.5) is clearly weaker than the assumption that E(ε_i | W_i, α_i) = 0. The implication of (BA.5) is that model (1) reflects a linear projection specification rather than a conditional expectation. More specifically, this assumption asserts that Proj(y_it | W_i, α_i) = X_it β + Z_i γ + θ_t α_i, where Proj(q | h) represents the population least-squares projection of a variable q on a set of variables h. We will strengthen this assumption in some of our later analysis. Also, it is obvious that (BA.5) implies that ε_it is uncorrelated with X_is for all t and s. In some applications, such as rational expectations models or sequential moment restriction models (Keane and Runkle, 1992; Chamberlain, 1992), we may wish to assert only the weak exogeneity condition that ε_it is uncorrelated with X_is for s ≤ t; that is, E(ε_it X_is) = 0 for s ≤ t, which is weaker than the sequential conditional moment restriction E(ε_it | X_i1, …, X_it) = 0, t = 1, …, T. The weak exogeneity condition would lead to relatively straightforward modifications of our GMM estimators, since it would just reduce the set of available moment conditions. Chamberlain (1992) derives the efficiency bound for the model with sequential moment restrictions, and Hahn (1997) provides the efficient GMM estimator whose asymptotic covariance matrix coincides with the efficiency bound. For the model with the weak exogeneity assumption, the efficiency bound can be easily found. Since using more moment conditions never hurts the asymptotic efficiency of GMM, the GMM estimator utilizing all of the moment conditions implied by the weak exogeneity assumption should be efficient.
The coefficients (γ) of the time-invariant regressors (Z_i) are obviously not identified if ξ is constant; that is, if θ_2 = ⋯ = θ_T = 1. Otherwise assumption (BA) serves to identify the parameters β, γ and θ. Assumption (BA.4) is made to ensure identification of θ. It is not hard to see that condition (BA.4) fails in the case that α_i is uncorrelated with every variable in W_i, and that in this case the overall intercept and θ_2, …, θ_T are not identified. Intuitively, because the covariance matrix of ε_i is unrestricted, correlation between W_i and α_i is needed to distinguish the error component ξα_i from ε_i. Further discussion of this identification issue is given in Appendix B.
Since model (2) allows the coefficient (θ_t) of the effects α_i to vary over time, a natural generalization of the model would be the case in which the coefficients (γ) of the time-invariant regressors are allowed to be time dependent. Unfortunately, however, the parameters γ_1, …, γ_T cannot be separately identified. All we can identify are the reduced-form parameters (γ_2 − θ_2 γ_1), …, (γ_T − θ_T γ_1). Thus, unless there exist some prior restrictions on γ_1, it is not possible to identify each of the γ_t. More detailed discussion of this problem is given in Appendix B.
In addition to our Basic Assumptions, we also consider the following Covariance Assumptions (CA) on the ε_it:

(CA.1) E(ε_it ε_is) = 0 for all t ≠ s.
(CA.2) E(ε_it²) = σ_ε² for all t.

Thus, under (CA), the covariance matrix of ε_i is of the form σ_ε² I_T. Assumption (CA.1) is implied by some versions of model (1), such as the weak exogeneity projection model that asserts that Proj(y_it | W_i, α_i, y_i1, …, y_i,t−1) = X_it β + Z_i γ + θ_t α_i. Assumption (CA.2) is not typically implied by projection models (or conditional expectations models either), but it has often been assumed in the traditional fixed effects model. We note once more that (CA.1) and (CA.2) are assumptions about unconditional second moments. Conditional expectations models such as the rational expectations model have implications for conditional second moments, and therefore may generate additional moment conditions.
3. GMM estimation under basic and covariance assumptions
We will first consider GMM estimation under our basic assumptions (BA) only. Under (BA), it is apparent that the following moment conditions hold:

E(W_i′ u_it) = θ_t Σ_wα,   t = 1, …, T,   (4)

where θ_1 = 1. We can eliminate the nuisance parameters Σ_wα from these moment conditions without any loss of useful information on the parameters of interest. While there are many ways to do so, we choose the following (T−1)(Tk+g) moment conditions:

E[W_i′ (u_it − θ_t u_i1)] = 0,   t = 2, …, T.   (5)
GMM based on the moment conditions (5) can be viewed as a natural extension of the within estimator for the usual fixed effects model. To understand this analogy, we should define the equivalents of the within and between transformations. Define G = G(θ) = (−θ, I_{T−1})′, where I_{T−1} is the identity matrix of order T−1. The matrix [G, (1/ξ′ξ)ξ]′ is a nonsingular nonstochastic matrix, and therefore premultiplying both sides of Eq. (2) by this matrix does not cause any informational loss. This transformation gives us

y_it = X_it β − X_i1 (θ_t β) + Z_i [(1 − θ_t) γ] + θ_t y_i1 + (u_it − θ_t u_i1),   (6a)

(1/ξ′ξ) ξ′y_i = (1/ξ′ξ) ξ′X_i β + (1/ξ′ξ)(ξ′e_T) Z_i γ + (1/ξ′ξ) ξ′u_i,   (6b)

where Eq. (6a) holds for t = 2, …, T. Eqs. (6a) are obtained simply by subtracting θ_t times the first equation from the last (T−1) equations of (1), a generalized within transformation. Chamberlain (1984, p. 1263) considers the same differencing method. Eq. (6b) is simply the weighted sum over t of Eq. (1), a generalization of the between transformation. GMM based on the moment conditions (5) amounts to estimating system (6a) by nonlinear three-stage least squares, using W_i as instruments for each of the equations.² Eq. (6b) is irrelevant for estimation under BA, because no legitimate instruments are available for it unless we make additional assumptions.

² If the time-varying regressors X_it were only weakly exogenous (i.e., E(ε_it | X_i1, …, X_it) = 0), the moment conditions given in (7) can be modified as follows. For simplicity, assume the time-invariant regressors Z_i are uncorrelated with the ε_it for any t. Let W_it = (X_i1, …, X_it, Z_i). Define W_i* = diag(W_i1, …, W_i,T−1). Then the moment conditions implied by the weak exogeneity assumption can be expressed as E[W_i*′ G(θ)′ u_i(β, γ)] = 0.
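To make the transformations concrete, here is a minimal sketch of G(θ) and the generalized within and between transformations in (6a) and (6b) (numpy assumed; the residual array u is only a stand-in for u_i(β, γ), and the θ values are hypothetical):

```python
import numpy as np

def G_matrix(xi):
    """G(theta) = (-theta, I_{T-1})', a T x (T-1) matrix, where xi = (1, theta')'.
    For a T-vector d, the t-th element of G'd is d_{t+1} - theta_{t+1} d_1."""
    T = len(xi)
    return np.vstack([-xi[1:], np.eye(T - 1)])

xi = np.array([1.0, 1.2, 0.8, 1.5])        # (1, theta_2, theta_3, theta_4), hypothetical
G = G_matrix(xi)

u = np.random.default_rng(1).normal(size=(500, 4))   # stand-in for residuals u_i(beta, gamma)
within = u @ G                             # N x (T-1) rows of u_it - theta_t u_i1, as in (6a)
between = (u @ xi) / (xi @ xi)             # (1/xi'xi) xi'u_i for each i, as in (6b)
```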
We now consider GMM estimation based on the moment conditions (5). Define the vector of the parameters of interest by δ = (β′, γ′, θ′)′, and let u_i(β, γ) = y_i − X_i β − e_T Z_i γ. We will denote GMM based on the moment conditions (5) by GMM1, and the resulting estimator by δ̂₁, in order to distinguish it from other GMM estimators which we will consider later. The estimator δ̂₁ can be obtained by the usual GMM procedure. The (T−1)(Tk+g) moment conditions in (5) can be expressed in vector form as

E[b_1i(δ)] = E[G(θ)′ u_i(β, γ) ⊗ W_i′] = 0.   (7)

Let b_1N(δ) = N⁻¹ Σ_{i=1}^N b_1i(δ). Then, δ̂₁ solves the problem min_δ N b_1N(δ)′ V₁₁⁻¹ b_1N(δ), where V₁₁ = E[b_1i b_1i′]. Here V₁₁ can be replaced by any consistent estimate, say V̂₁₁. A natural candidate for V̂₁₁ is N⁻¹ Σ_{i=1}^N b_1i(δ̂) b_1i(δ̂)′, where δ̂ is an initial consistent estimate of δ.
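A sketch of the two-step GMM1 computation follows (Python with numpy and scipy, both assumed; the layout of y, X, Z matches the simulation sketch above, and b_1i is formed as the Kronecker product in (7)). This is a minimal illustration, not the authors' code:

```python
import numpy as np
from scipy.optimize import minimize

def b1(delta, y, X, Z):
    """Rows are the moment functions b_1i(delta) = G(theta)'u_i(beta,gamma) kron W_i'."""
    N, T, k = X.shape
    g = Z.shape[1]
    beta, gamma, th = delta[:k], delta[k:k + g], delta[k + g:]
    xi = np.concatenate(([1.0], th))                    # theta_1 normalized to one
    u = y - X @ beta - (Z @ gamma)[:, None]             # N x T residuals u_i(beta, gamma)
    qd = u @ np.vstack([-xi[1:], np.eye(T - 1)])        # N x (T-1): u_it - theta_t u_i1
    W = np.concatenate([X.reshape(N, T * k), Z], axis=1)  # W_i = (X_i1, ..., X_iT, Z_i)
    return np.einsum('nt,nw->ntw', qd, W).reshape(N, -1)

def gmm1(y, X, Z, delta0):
    """Two-step GMM: identity weight first, then the inverse of V11 estimated at step one."""
    def obj(d, Vinv):
        m = b1(d, y, X, Z).mean(axis=0)                 # b_1N(delta)
        return m @ Vinv @ m
    p = b1(delta0, y, X, Z).shape[1]
    d1 = minimize(obj, delta0, args=(np.eye(p),)).x     # initial consistent estimate
    B = b1(d1, y, X, Z)
    Vinv = np.linalg.pinv(B.T @ B / len(B))             # consistent estimate of V11^{-1}
    return minimize(obj, d1, args=(Vinv,)).x
```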
We now consider GMM estimation under the covariance assumptions (CA) as well as the basic assumptions (BA). Under (CA.1) and (CA.2), the T(T+1)/2 distinct elements of E(u_i u_i′) depend on only two nuisance parameters, Σ_αα ≡ E(α_i²) and σ_ε². Since θ is already identified under (BA) only, (CA) should imply T(T+1)/2 − 2 additional moment conditions which can be imposed in GMM estimation of δ. We can derive these moment conditions by essentially the same method used by Ahn and Schmidt (1995b) in the simple dynamic panel data model. Note that

E(u_it u_is) = θ_t θ_s Σ_αα  for t ≠ s,   E(u_it²) = θ_t² Σ_αα + σ_ε².   (9)

We can find two types of moment conditions from (9). First, noting that θ_1 = 1, there are T(T−1)/2 − 1 distinct conditions that E(u_it u_is) = θ_s E(u_it u_i1) for any 1 < s < t. These conditions reflect restrictions on the off-diagonal elements of E(u_i u_i′). Second, there are (T−1) distinct conditions involving the diagonal elements of E(u_i u_i′). One of the many equivalent ways to express these moment conditions is as follows:

E[u_is(β, γ) (u_it(β, γ) − θ_t u_i1(β, γ))] = 0,  s ∉ {1, t},   (10)

of which T(T−1)/2 − 1 are distinct, and

E[ξ′u_i(β, γ) · (u_it(β, γ) − θ_t u_i1(β, γ))] = 0,  t = 2, …, T.   (11)

The T(T−1)/2 − 1 moment conditions in (10) are valid as long as (CA.1) holds, while the validity of the (T−1) moment conditions in (11) also requires (CA.2). Conditions (11) imply that the errors in (6a) and (6b) are uncorrelated.
For a matrix representation of (10) and (11), define U_i^0 analogously to the matrix Y_i^0 introduced below, with u_it(β, γ) in place of y_it. Then (10) and (11) can be written as

E[b_2i(δ)] = E[U_i^0′ G(θ)′ u_i(β, γ)] = 0,   (12)
E[b_3i(δ)] = E[ξ′u_i(β, γ) · G(θ)′ u_i(β, γ)] = 0,   (13)

where b_2i(δ) and b_3i(δ) are the {T(T−1)/2 − 1}×1 and (T−1)×1 vectors, respectively.
From now on, the procedure GMM2 and resulting estimator δ̂₂ will denote GMM based on moment conditions (7) and (12), and GMM3 and δ̂₃ will denote GMM based on conditions (7), (12) and (13). If the basic assumptions (BA) and the covariance assumptions (CA) are satisfied, GMM3 is at least as efficient as GMM2 and GMM1, while GMM2 is at least as efficient as GMM1. This follows from the general principle that adding valid moment conditions cannot decrease asymptotic efficiency.
Since the moment conditions b_1i, b_2i and b_3i are nonlinear, the GMM estimates δ̂₁, δ̂₂ and δ̂₃ need to be calculated by an iterative procedure. Alternatively, following Newey (1985), we can use a linearized GMM estimator. Details can be found in an earlier version of this paper (available on request). We also note the following. Given the validity of the moment conditions (5) (or (7)) that follow from (BA), the T(T−1)/2 − 1 moment conditions in (10) can be replaced by

E[y_is (u_it(β, γ) − θ_t u_i1(β, γ))] = 0,  s ∉ {1, t},   (14)

for the same selection of T(T−1)/2 − 1 distinct index pairs (s, t) as in (10). Given conditions (7), conditions (14) contain the same information as (12) (or (10)), but they are more nearly linear.
To obtain a matrix representation of (14), we define Y_it^0 = (y_it, y_i,t+1, …, y_iT) for 3 ≤ t ≤ T, and Ȳ_i^0 = (y_i2, y_i3, …, y_i,T−1). Let Y_i^0 = diag(Y_i3^0, Y_i4^0, …, Y_iT^0, Ȳ_i^0), so that the block Y_i,t+1^0 instruments the equation for t = 2, …, T−1 and Ȳ_i^0 instruments the equation for t = T. Then we can express (14) as

E[b*_2i(δ)] = E[Y_i^0′ G(θ)′ u_i(β, γ)] = 0,   (15)

where b*_2i(δ) is the {T(T−1)/2 − 1}×1 vector. Then the GMM estimator based on b_1i and b*_2i is asymptotically equivalent to GMM2, and the GMM estimator based on b_1i, b*_2i and b_3i is asymptotically equivalent to GMM3.
Although it would perhaps be unusual, we could also consider GMM that assumes (BA) and (CA.2) (homoskedasticity) but not (CA.1) (nonautocorrelation). When (CA.2) holds but (CA.1) does not, the moment conditions b_3i based on (13) (or equivalently (11)) do not hold. To obtain moment conditions implied by (CA.2) without (CA.1) holding, observe that the diagonal elements in (9) contain two nuisance parameters, Σ_αα and σ_ε². Eliminating these parameters, we can obtain the following T−2 moment conditions:

E[(θ_2² − θ_t²) u_i1² + (θ_t² − 1) u_i2² + (1 − θ_2²) u_it²] = 0  for t > 2.   (16)

For a matrix representation of (16), define the T×(T−2) matrix

G_H(θ) = [ θ_2²−θ_3²   θ_2²−θ_4²   ⋯   θ_2²−θ_T²
           θ_3²−1      θ_4²−1      ⋯   θ_T²−1
           1−θ_2²      0           ⋯   0
           0           1−θ_2²      ⋯   0
           ⋮           ⋮                ⋮
           0           0           ⋯   1−θ_2²  ]   (17)

and u_i*(β, γ) = (u_i1(β, γ)², …, u_iT(β, γ)²)′. Then we can express the (T−2) moment conditions implied by homoskedasticity only as

E[b_NH,i(δ)] = E[G_H(θ)′ u_i*(β, γ)] = 0.   (18)

Then GMM imposing (BA) and (CA.2) would be based on b_1i and b_NH,i. We will call this estimator GMM-NH or δ̂_NH, since it imposes (BA) plus no heteroskedasticity. This estimator would be (asymptotically) more efficient than GMM1 but less efficient than GMM3; it cannot be ranked relative to GMM2.
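The matrix in (17) is straightforward to construct, and the identity behind (16) can be checked numerically. A sketch under hypothetical parameter values (numpy assumed):

```python
import numpy as np

def G_H(xi):
    """The T x (T-2) matrix G_H(theta) of (17), with xi = (1, theta_2, ..., theta_T)'."""
    T = len(xi)
    GH = np.zeros((T, T - 2))
    for j, t in enumerate(range(3, T + 1)):   # one column for each t = 3, ..., T
        GH[0, j] = xi[1]**2 - xi[t - 1]**2    # theta_2^2 - theta_t^2
        GH[1, j] = xi[t - 1]**2 - 1.0         # theta_t^2 - 1
        GH[t - 1, j] = 1.0 - xi[1]**2         # 1 - theta_2^2
    return GH

xi = np.array([1.0, 1.3, 0.7, 1.6])           # hypothetical theta values
sigma_a2, sigma_e2 = 2.0, 1.0
diag_Euu = xi**2 * sigma_a2 + sigma_e2        # E(u_it^2) from (9) under homoskedasticity
print(G_H(xi).T @ diag_Euu)                   # zero vector: conditions (16) hold
```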
4. Some efficiency results

In this section we prove some efficiency results for the estimators δ̂₁ (GMM1), δ̂₂ (GMM2), δ̂_NH (GMM-NH) and δ̂₃ (GMM3) of the previous section, under somewhat stronger assumptions than we made there. In particular, we show that under certain independence assumptions we can substantially reduce the number of moment conditions without a loss of efficiency.
We first consider the GMM1 estimator δ̂₁ based on the moment conditions (5), or equivalently (7). The number [(T−1)(Tk+g)] of moment conditions in (5) could be very large. In this case, GMM using all of the moment conditions may be computationally burdensome, and may have poor finite sample properties. This raises the question of whether we can reduce the number of moment conditions without losing efficiency of estimation. This is possible if we make a stronger assumption than the uncorrelatedness condition (BA.5). Specifically, suppose that we replace (BA.5) with the following assumption ((BI), for basic assumption with independence):

[W_i, α_i] is independent of ε_i.   (BI)

Define Φ = E(ε_i ε_i′) and Ψ = G′ΦG, where G is as defined in the discussion above Eqs. (6a) and (6b). Under (BI), we can optimally weight the moment conditions (5) using Ψ⁻¹, decreasing the number of moment conditions that must be imposed.
We first define the linear projection of α_i on W_i:

α_i = W_i π + g_i,   E(W_i′ g_i) = 0.   (19)

Substituting (19) into Eq. (2) gives

y_i = X_i β + e_T Z_i γ + ξ (W_i π) + (ξ g_i + ε_i).   (20)

This is essentially the same construction as was applied by Chamberlain (1984) to the usual fixed effects model. This model serves to identify the parameters β, γ, θ and π; for example, following Chamberlain, consistent estimates of these parameters can be obtained by applying the minimum distance principle to the least-squares estimates from (20).
Now consider the following moment conditions:

E[X_i′ G Ψ⁻¹ G(θ)′ u_i(β, γ)] = 0,   (21a)
E[Z_i′ e_T′ G Ψ⁻¹ G(θ)′ u_i(β, γ)] = 0,   (21b)
E[(W_i π) Ψ⁻¹ G(θ)′ u_i(β, γ)] = 0.   (21c)

Conditions (21a), (21b) and (21c) imply the k, g, and T−1 moment conditions, respectively. Accordingly, the total number of moment conditions in (21a)–(21c) is exactly the same as the number of parameters in δ. These conditions can be imposed as follows. Define R_i(θ, π) = [G(θ)′X_i, G(θ)′e_T Z_i, (W_i π) I_{T−1}]. Let δ̇ = (β̇′, γ̇′, θ̇′)′ denote the GMM estimator based on the moment conditions (21a)–(21c). It solves

(1/N) Σ_{i=1}^N R_i(θ̂, π̂)′ Ψ̂⁻¹ G(θ̇)′ u_i(β̇, γ̇) = 0,   (22)

where θ̂, π̂ and Ψ̂ are initial consistent estimates.
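For computation, (22) is a just-identified system of k + g + (T−1) equations in δ. The sketch below evaluates its left-hand side (numpy assumed; pi_hat, theta_hat and Psi_inv are assumed to come from a first stage such as GMM1, and the array layout follows the earlier sketches). A root of this function, found e.g. with scipy.optimize.root, gives δ̇:

```python
import numpy as np

def moments_21(delta, y, X, Z, pi_hat, theta_hat, Psi_inv):
    """Sample analogue of E[R_i(theta, pi)' Psi^{-1} G(theta)' u_i(beta, gamma)], eq. (22)."""
    N, T, k = X.shape
    g = Z.shape[1]
    beta, gamma, th = delta[:k], delta[k:k + g], delta[k + g:]
    G = np.vstack([-np.asarray(th), np.eye(T - 1)])             # G at the argument theta
    G_hat = np.vstack([-np.asarray(theta_hat), np.eye(T - 1)])  # G at first-stage theta
    u = y - X @ beta - (Z @ gamma)[:, None]
    W = np.concatenate([X.reshape(N, T * k), Z], axis=1)
    total = np.zeros(k + g + (T - 1))
    for i in range(N):
        Ri = np.hstack([G_hat.T @ X[i],                         # G' X_i:     (T-1) x k
                        np.outer(G_hat.T @ np.ones(T), Z[i]),   # G' e_T Z_i: (T-1) x g
                        (W[i] @ pi_hat) * np.eye(T - 1)])       # (W_i pi) I_{T-1}
        total += Ri.T @ Psi_inv @ (G.T @ u[i])
    return total / N                                            # delta_dot sets this to zero
```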
To interpret the moment conditions (21a)–(21c), recall that G(θ)′u_i(β, γ) is a vector whose typical element is of the form u_it − θ_t u_i1. These moment conditions say that some linear functions of W_i are uncorrelated with u_it − θ_t u_i1. Therefore, they are linear combinations of the moment conditions (5), which say that W_i is uncorrelated with u_it − θ_t u_i1 for all t. As a result, δ̂₁ (GMM1) must be efficient relative to δ̇. However, when (BA.5) is strengthened to (BI), we have the following useful result.

Proposition 1. Under (BA.1)–(BA.4) and (BI), δ̂₁ and δ̇ are asymptotically equivalent.

Proof. See Appendix C.
It is worth noting that under (BI) both δ̂₁ and δ̇ are in general inefficient. We can see this as follows. Define h_i = h(W_i) = E(α_i | W_i). Then Proposition 3 of Chamberlain implies that δ̂₁ is dominated by a GMM estimator, say δ̂_C, which is obtained by exploiting the moment conditions (21a)–(21c), but with (W_i π) replaced by h_i. The efficiency gain of δ̂_C over δ̂₁ is simply due to the fact that the conditional mean h_i is a better approximation of α_i than its linear projection, W_i π. Of course, δ̂₁ is asymptotically equivalent to δ̂_C if h_i = W_i π. A practical difficulty in applying Chamberlain's method is that the calculation of δ̂_C requires nonparametric estimation of h_i. Also, under (BI) even δ̂_C is generally inefficient, because it exploits the restrictions implied by E(ε_i | W_i, α_i) = 0, but not those implied by independence.
Independence between the explanatory variables and the ε_it involves higher-order moments which may be relevant for GMM. However, the following proposition states that under a normality assumption, δ̂₁ can be fully efficient.

Proposition 2. Assume that ε_i and g_i are stochastically independent of W_i and of each other. Assume that g_i and ε_i are normally distributed with variance σ_g² and covariance matrix Φ, respectively. Then δ̂₁ is asymptotically efficient.

The proof is given in Appendix C. Basically it shows that, under the assumptions made, δ̂₁ is locally semiparametrically efficient. We can note that for the MLE, δ and the nuisance parameters σ_g², Φ and π are estimated jointly. A sidelight of Proposition 2 is that under the normality assumption GMM1 allows one to estimate δ efficiently without estimating the nuisance parameters.
We now turn to the estimators δ̂₂, δ̂_NH and δ̂₃ that impose the basic assumptions (BA) and the covariance assumptions (CA). Again we want to know whether we can reduce the number of moment conditions without losing efficiency. To proceed, we need to strengthen the covariance assumptions. Specifically, we replace (CA) by the stronger assumption (CI) that the ε_it are iid:

(CI.1) For any i, the ε_it are mutually independent.
(CI.2) For any i, the ε_it are identically distributed.

While the estimators δ̂₂, δ̂_NH and δ̂₃ are in general inefficient, they can be fully efficient under (CI) and a normality assumption.

Proposition 3. Assume that (BA.1)–(BA.4) and (BI) hold. Assume that g_i is independent of W_i and ε_i. Assume that g_i and ε_i are normally distributed with variance σ_g² and covariance matrix Φ, respectively. If (CI.1) holds, δ̂₂ has the same asymptotic distribution as the (restricted) MLE of δ in the model given in Eq. (20) above, imposing the restriction that Φ is diagonal. If (CI.2) holds but (CI.1) does not, the GMM estimator based on the moment conditions b_1i and b_NH,i (δ̂_NH) has the same asymptotic distribution as the (restricted) MLE of δ, imposing the restriction that the diagonal elements of Φ are equal. If (CI.1) and (CI.2) both hold, δ̂₃ has the same asymptotic distribution as the (restricted) MLE of δ, imposing the restriction that Φ = σ_ε² I_T.
Proof. See Appendix C.
Basically, this proposition states conditions under which the GMM estimators δ̂₂, δ̂₃ and δ̂_NH are efficient in the class of estimators that make use of second-moment information. Like Proposition 2, it can be viewed as a local efficiency result. The basic assumptions (BA) and covariance assumptions (CA) are stated in terms of uncorrelatedness only, so that only second-moment information is relevant. When we replace uncorrelatedness by independence, additional moment conditions involving higher-order moments are potentially relevant. However, these additional moment conditions do not lead to gains in asymptotic efficiency when g_i and the ε_it are normal. This is because under normality only the first two moments matter.
We can also show that under the assumptions of Proposition 3, many moment conditions in (7), (12) and (13) become redundant in the sense that imposing them in GMM does not improve the efficiency of the estimator. Stated formally:

Proposition 4. Assume (BA.1)–(BA.4), (BI), (CI.1), (CI.2), and normality of the ε_it. Then δ̂₃ has the same asymptotic distribution as the GMM estimator based on the moment conditions (21a)–(21c) and (13).
Proof. See Appendix C.
Proposition 4 gives conditions under which we can considerably reduce the number of moment conditions in GMM3 without losing asymptotic efficiency. It was already clear from Proposition 3 that such a reduction in relevant moment conditions is possible under some assumptions, but the assumptions of Proposition 4 are weaker than those of Proposition 3, because we do not need to assume normality of g_i.
GMM3 is based on the set of moment conditions (7), (12) and (13). Proposition 4 essentially says two things. First, it says that we can replace the largest set of moment conditions (7) by the much smaller set of moment conditions (21a)–(21c). This is the same conclusion as in Proposition 1. However, it does not follow as an implication of Proposition 1; it is a new result. As shown in Breusch et al. (1999), the fact that one set of moment conditions is redundant, given another set of moment conditions, can change when the other set of moment conditions is made larger. Second, the proposition says that we can omit the set of moment conditions (12), because these moment conditions are redundant given those in (7) and (13), or given those in (21a), (21b) and (13). This redundancy result is the reason that we expressed the moment conditions b_2i and b_3i in the way we did. The redundant conditions b_2i given in (12) are those that reflect uncorrelatedness of the errors. This is a very useful result because it allows us to eliminate a large number of nonlinear moment conditions without losing efficiency. We note that, although the moment conditions (12) are redundant given those in (7) and (13), they are not uninformative in general. For example, the GMM2 estimator based on (7) and (12) is more efficient than the GMM1 estimator based on (7) only, even given the assumptions of Proposition 4, so that there is useful information in (12) that is not in (7). However, given the assumptions of Proposition 4, there is no useful information in (12) that is not in (7) plus (13).
5. Concentrated least squares
In this section, we shall consider the concentrated least-squares (CLS) estimation of the model. The CLS estimator is defined as follows. We treat the α_i as parameters to be estimated (along with β, γ and θ), so that this is a true fixed-effects treatment of the model. We can consider the following least-squares problem:

min_{β, γ, θ, α_1, …, α_N} CSSE = Σ_{i=1}^N [y_i − X_i β − e_T Z_i γ − ξ α_i]′ [y_i − X_i β − e_T Z_i γ − ξ α_i].   (23)

Solving the first-order conditions of problem (23), we can express the α_i in terms of the other parameters, as α_i = (ξ′ξ)⁻¹ ξ′(y_i − X_i β − e_T Z_i γ), and substituting this back into (23) concentrates the α_i out of the criterion function. If the ε_it are iid normal and the regressors are fixed, this estimator is the MLE. Lee (1991) shows that it can also be derived as a conditional MLE estimator, where the conditional likelihood function is obtained by conditioning on ξ′(y_i − X_i β − e_T Z_i γ), the sufficient statistic for α_i.
Because the α_i are 'incidental parameters' in the sense of Neyman and Scott (1948), in that the number of α_i grows with sample size N, and because the sufficient statistic for α_i depends on the other parameters of the model, the usual asymptotic properties of the MLE or conditional MLE do not apply, and the asymptotic properties of the CLS estimator need to be derived directly.³ Further discussion of similar cases can be found in Cornwell and Schmidt (1991), Kiefer (1980b) and Chamberlain (1992).

³ It might be worth mentioning that under the weak exogeneity assumption, E(ε_is X_it) = 0 for s ≥ t, the CLS estimator would become inconsistent, and modifying it to be consistent under weak exogeneity would fundamentally destroy its least-squares or maximum likelihood interpretation.
The CLS estimator was first considered by Kiefer (1980a) and was considered later but independently by Lee (1991) and Lee and Schmidt (1993). They provide some interesting computational results. For example, they show that the CLS estimator can be computed by a specific iterative process. For a given value of ξ, the estimator of β and γ is obtained by regression of M(ξ)y_i on M(ξ)(X_i, e_T Z_i), where M(ξ) = I_T − ξ(ξ′ξ)⁻¹ξ′; conversely, for a given value of β and γ, the estimator of ξ is the eigenvector corresponding to the largest eigenvalue of Σ_i u_i(β, γ) u_i(β, γ)′.
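A sketch of this alternating computation (numpy assumed; the iteration count and starting value are arbitrary choices of ours, and a production version would test convergence):

```python
import numpy as np

def cls(y, X, Z, n_iter=100):
    """Concentrated least squares by alternating steps (Kiefer, 1980a; Lee, 1991):
    given xi, regress M(xi)y_i on M(xi)(X_i, e_T Z_i); given (beta, gamma), set xi
    to the principal eigenvector of sum_i u_i u_i'."""
    N, T, k = X.shape
    g = Z.shape[1]
    xi = np.ones(T)                                   # start at the fixed-effects model
    for _ in range(n_iter):
        M = np.eye(T) - np.outer(xi, xi) / (xi @ xi)  # M(xi) = I_T - xi (xi'xi)^{-1} xi'
        R = np.concatenate([X, np.repeat(Z[:, None, :], T, axis=1)], axis=2)
        MR = np.einsum('st,ntp->nsp', M, R).reshape(N * T, k + g)
        My = np.einsum('st,nt->ns', M, y).reshape(N * T)
        coef = np.linalg.lstsq(MR, My, rcond=None)[0]
        beta, gamma = coef[:k], coef[k:]
        u = y - X @ beta - (Z @ gamma)[:, None]       # residuals u_i(beta, gamma)
        evals, evecs = np.linalg.eigh(u.T @ u)        # eigensystem of sum_i u_i u_i'
        xi = evecs[:, -1] / evecs[0, -1]              # largest eigenvalue; normalize xi_1 = 1
    return beta, gamma, xi
```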
Under the strong assumptions for which the CLS estimator is the MLE (iid normal ε's and fixed regressors), Lee (1991) rigorously considers the asymptotic properties of the CLS estimator. An interesting point that we will develop below is that the CLS estimator is consistent under weaker assumptions than Lee made, but consistency basically requires our covariance assumptions as well as the basic assumptions. An important finding of Lee's is that the efficiency of the MLE is in doubt. He shows that E(∂²CSSE/∂δ∂δ′) fails to be proportional to E[(∂CSSE/∂δ)(∂CSSE/∂δ′)] even under his strong assumptions. Due to this fact, the asymptotic covariance matrix of √N(δ̂_CLS − δ) takes the sandwich form

[E(∂²CSSE/∂δ∂δ′)]⁻¹ E[(∂CSSE/∂δ)(∂CSSE/∂δ′)] [E(∂²CSSE/∂δ∂δ′)]⁻¹,

which is similar in form to the covariance matrix of a quasi-ML estimator under distributional misspecification. This result casts doubt on the efficiency of CLS, and we will show that this doubt is justified, in the sense that the CLS estimator is dominated by a GMM estimator. This may be a surprising result because the incidental parameters problem is typically thought to cast doubt on the consistency of the MLE, not on its efficiency.
A simple example may be worth brief consideration before moving on to the general case. Let T = 2 and suppose that there are no explanatory variables in the model, so we simply have

y_i1 = α_i + ε_i1,   y_i2 = α_i θ + ε_i2.   (27)

We wish to estimate the scalar parameter θ and the nuisance parameters α_1, …, α_N. The CLS estimator minimizes the criterion function CSSE = Σ_i [(y_i1 − α_i)² + (y_i2 − θα_i)²]. Concentrating out the α_i, the resulting estimator of θ solves the equation

Σ_i (y_i1 + θ y_i2)(y_i2 − θ y_i1) = 0.   (28)

Obviously, this corresponds to imposition of the moment condition

E[(y_i1 + θ y_i2)(y_i2 − θ y_i1)] = 0.   (29)

Evaluating the expectation on the left-hand side of (29), we obtain θ var(ε_i2) − θ var(ε_i1) + (1 − θ²) cov(ε_i1, ε_i2), and this equals zero for all values of θ if and only if the covariance assumptions hold. Thus, the consistency of the estimator follows from and requires these covariance assumptions. Furthermore, the inefficiency of the CLS estimator follows from its failure to exploit the moment condition E(y_i2 − θ y_i1) = 0, which is informative for θ so long as the mean of the α_i is nonzero.
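Filling in the algebra behind the evaluation of (29): substituting y_i1 = α_i + ε_i1 and y_i2 = α_i θ + ε_i2, and using E(α_i ε_it) = 0,

```latex
\[
(y_{i1} + \theta y_{i2})(y_{i2} - \theta y_{i1})
   = (1 - \theta^{2})\, y_{i1} y_{i2} + \theta\,(y_{i2}^{2} - y_{i1}^{2}),
\]
\[
\mathrm{E}(y_{i1} y_{i2}) = \theta\,\mathrm{E}(\alpha_i^{2})
   + \mathrm{cov}(\varepsilon_{i1}, \varepsilon_{i2}),
\qquad
\mathrm{E}(y_{i2}^{2} - y_{i1}^{2}) = (\theta^{2} - 1)\,\mathrm{E}(\alpha_i^{2})
   + \mathrm{var}(\varepsilon_{i2}) - \mathrm{var}(\varepsilon_{i1}),
\]
\[
\mathrm{E}[(y_{i1} + \theta y_{i2})(y_{i2} - \theta y_{i1})]
   = \theta\,[\mathrm{var}(\varepsilon_{i2}) - \mathrm{var}(\varepsilon_{i1})]
   + (1 - \theta^{2})\,\mathrm{cov}(\varepsilon_{i1}, \varepsilon_{i2}).
\]
```

The E(α_i²) terms cancel, so the expression vanishes for all θ exactly when the errors are homoskedastic and non-autocorrelated, as claimed.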
We now return to the general model (1), and we consider the properties of the CLS estimator from the perspective of GMM. For expository convenience, we will do so under assumptions (BA.1)–(BA.4), (BI) and (CI). These assumptions are weaker than the assumptions made by Lee, but stronger than actually necessary, since (BA) and (CA) are sufficient for the consistency and asymptotic normality of the CLS estimator.
We previously defined the quasi-differencing matrix G, which has the property that, for any T-dimensional vector d, the t-th element of G′d equals d_{t+1} − θ_{t+1} d_1. Using the fact that M(ξ) = G(G′G)⁻¹G′, the first-order conditions for the minimization of CSSE can be written as follows:

∂CSSE/∂δ = −2 Σ_{i=1}^N [ X_i′ G(G′G)⁻¹ G′u_i(β, γ);  Z_i′ e_T′ G(G′G)⁻¹ G′u_i(β, γ);  (ξ′ξ)⁻¹ ξ′u_i(β, γ) · (G′G)⁻¹ G′u_i(β, γ) ] = 0.   (30)

The consistency of the CLS estimator requires that the terms in the large brackets in Eq. (30) have expectation zero (k + g + (T−1) moment conditions). The first two terms have expectation zero under our basic assumptions; what is required is essentially that ε_i is uncorrelated with X_i and Z_i. However, for the last term to have expectation zero requires that ξ′u_i(β, γ) be uncorrelated with G′u_i(β, γ), which requires the covariance assumptions. Furthermore, it is clear from (30) that the CLS estimator is equivalent to a GMM estimator based on the moment conditions (21a), (21b) and (13), with Ψ = G′G in (21a) and (21b).
These moment conditions are a proper subset of those that GMM3 exploits, which explains the inefficiency of CLS compared to GMM3. To make this point clear, consider the case in which the ε_it are iid normal. Then Proposition 4 implies that δ̂₃ is asymptotically equivalent to the GMM estimator based on conditions (21a)–(21c) and (13), and the inefficiency of CLS results from the fact that it fails to exploit the moment conditions (21c).
6. Calculation of asymptotic covariance matrices
In this section we attempt to quantify the efficiency gains of using the moment conditions implied by the covariance assumptions. We consider five estimators: GMM1 (δ̂₁), GMM2 (δ̂₂), GMM-NH (δ̂_NH), GMM3 (δ̂₃) and CLS (δ̂_CLS). We will calculate and compare the asymptotic covariance matrices of these estimates, for various parameter values. A derivation of the asymptotic covariance matrices is given in Appendix C. The calculations will be made assuming that (BA.1)–(BA.4), (BI) and (CI) hold. Under these assumptions, GMM3 is most efficient, while both GMM2 and GMM-NH are more efficient than GMM1. However, a priori efficiency comparison of GMM2 with GMM-NH is not possible, because these estimates are based upon nonnested sets of moment conditions. By the same reasoning, the efficiency comparison of CLS with GMM1, GMM2 and GMM-NH is not possible.
The basis of our calculations is model (1) with a single time-varying regressor and an intercept term. We normalize the asymptotic covariance matrices by setting σ_ε² = 1. Our choices of parameter values ensure that all of the relevant covariance matrices are positive definite.
Table 1 reports some results for the case that the ε_it are normal. The numbers in the parentheses are the variances of the GMM1 estimates. The other results are the ratios of the asymptotic variances of the GMM2, GMM-NH, GMM3 and CLS estimates to the asymptotic variances of the GMM1 estimates [e.g., var(β̂₂)/var(β̂₁)]. A salient fact in Table 1 (and also the other tables) is that exploiting the covariance restrictions does not provide much efficiency gain for the estimates of β, while it substantially improves the efficiency of the estimates of γ and the θ_t. Apparently, the moment conditions implied by the orthogonality between the errors and the regressors are much more important in the estimation of the coefficients of time-varying regressors than those which are implied by the covariance restrictions.
In Table 1, with a few exceptions, the efficiency of GMM2, GMM-NH, GMM3 and CLS over GMM1 increases as the time-variation of the individual effect (ρ), the mean of the individual effect (μ_α) and the variance of the individual effect (σ_α²) increase, and as the correlation between the regressor and the individual effect (σ_Xα) decreases. In the cases with small values of σ_Xα, the efficiency gains from using GMM2, GMM-NH, GMM3 and CLS instead of GMM1 are very large. This is reasonable because GMM1 does not provide a consistent estimate of γ or of the θ_t when σ_Xα = 0. It is interesting to see that the CLS estimates, in general, are slightly better than both the GMM2 and GMM-NH estimates. The efficiency of the CLS estimates is in most cases almost equal to that of the GMM3 estimates. Thus, while GMM3 does dominate CLS, it is not very much better given the assumptions underlying our calculations. There are two exceptional cases in which CLS performs worse than other GMM estimators: the cases with σ_α² = 0.38 and with σ_Xα = 1.15. These are parameter values near the boundary of the parameter subspace for which all covariance matrices are positive definite.
The efficiency gains of GMM-NH over GMM1 are, in general, greater than the efficiency gains of GMM2 over GMM1, but only by very small margins. This result indicates that the efficiency gains from exploiting the moment conditions that require homoskedasticity of the ε_it are only marginally greater than the efficiency gains from the moment conditions that require mutual zero correlation among the ε_it. Further, the efficiency gains of GMM3 over both GMM2 and GMM-NH are, if any, only marginal. This means that the efficiency gains from using both sets of moment conditions (implied by both the zero-correlation and homoskedasticity assumptions) are not too much greater than the gains from using only one set of moment conditions.
Table 2 reports results for different values of T. As T increases, the efficiency gains of GMM2, GMM-NH, GMM3 and CLS over GMM1 tend to become smaller and more nearly equal. This implies that the marginal information in the covariance restrictions becomes less important as T grows.
Table 1
Efficiency comparison of GMM2, GMM-NH, GMM3 and CLS with GMM1. The ε_it are iid N(0, 1). The base values of parameters are: ρ = 0.5; σ_α² = 2; σ_X² = 2; σ_Xα = 0.5; μ_α = 1; μ_X = 1; T = 3. Panels (I) and (III) (left columns) vary ρ and σ_α²; panels (II) and (IV) (right columns) vary μ_α and σ_Xα.

              β        γ       θ2      θ3            β       γ       θ2      θ3
(I) ρ = 0.05                              (II) μ_α = 0.00
GMM1        (0.25)^a  (810)   (2.60)  (5.89)       (0.26)  (5.09)  (8.78)  (13.8)
GMM2         0.999^b   0.465   0.388   0.264        0.974   0.999   0.208   0.220
GMM-NH       0.999     0.465   0.364   0.264        0.973   0.999   0.201   0.212
GMM3         0.999     0.429   0.339   0.215        0.972   0.999   0.198   0.195
CLS          0.999     0.430   0.341   0.216        0.972   0.999   0.198   0.195
ρ = 0.50                                  μ_α = 1.00
GMM1        (0.26)   (17.8)   (4.82)  (13.5)       (0.26)  (17.8)  (4.82)  (13.5)
GMM2         0.973     0.450   0.291   0.225        0.973   0.450   0.291   0.225
GMM-NH       0.973     0.439   0.289   0.215        0.973   0.439   0.289   0.215
GMM3         0.972     0.425   0.282   0.197        0.972   0.425   0.282   0.197
CLS          0.972     0.425   0.282   0.197        0.972   0.425   0.282   0.197
ρ = 1.00                                  μ_α = 3.00
GMM1        (0.27)   (8.59)   (8.64)  (27.3)       (0.26)  (123)   (3.56)  (13.4)
GMM2         0.944     0.449   0.250   0.206        0.973   0.259   0.257   0.227
GMM-NH       0.942     0.434   0.250   0.195        0.973   0.247   0.251   0.215
GMM3         0.942     0.429   0.247   0.188        0.972   0.228   0.236   0.196
CLS          0.942     0.429   0.247   0.188        0.972   0.228   0.236   0.196
ρ = 4.00                                  μ_α = 5.00
GMM1        (0.29)   (4.12)   (61.9)  (231)        (0.26)  (335)   (3.42)  (13.4)
GMM2         0.878     0.454   0.194   0.181        0.973   0.239   0.240   0.227
GMM-NH       0.875     0.442   0.193   0.175        0.973   0.226   0.230   0.214
GMM3         0.875     0.441   0.193   0.175        0.972   0.207   0.212   0.195
CLS          0.875     0.441   0.193   0.175        0.972   0.207   0.213   0.195
(III) σ_α² = 0.38                         (IV) σ_Xα = 0.15
GMM1        (0.26)   (17.8)   (4.82)  (13.5)       (0.26)  (151)   (39.1)  (148)
GMM2         1.000     1.000   1.000   1.000        0.966   0.054   0.036   0.021
GMM-NH       1.000     1.000   1.000   1.000        0.966   0.052   0.036   0.020
GMM3         1.000     1.000   1.000   1.000        0.966   0.051   0.035   0.018
CLS          1.010     1.250   1.280   1.350        0.966   0.051   0.035   0.018
Table 1 (continued)

              β        γ       θ2      θ3            β       γ       θ2      θ3
σ_α² = 0.50                               σ_Xα = 0.50
GMM1        (0.26)   (17.8)   (4.82)  (13.5)       (0.26)  (17.8)  (4.82)  (13.5)
GMM2         0.997     0.944   0.921   0.918        0.973   0.450   0.291   0.225
GMM-NH       0.997     0.940   0.920   0.915        0.973   0.439   0.289   0.215
GMM3         0.995     0.899   0.890   0.860        0.972   0.425   0.282   0.197
CLS          0.998     0.967   0.968   0.954        0.972   0.425   0.282   0.197
σ_α² = 2.00                               σ_Xα = 0.80
GMM1        (0.26)   (17.8)   (4.82)  (13.5)       (0.26)  (9.88)  (2.35)  (5.30)
GMM2         0.973     0.450   0.291   0.225        0.985   0.792   0.598   0.561
GMM-NH       0.973     0.439   0.289   0.215        0.984   0.780   0.595   0.545
GMM3         0.972     0.425   0.282   0.197        0.983   0.759   0.582   0.505
CLS          0.972     0.425   0.282   0.197        0.983   0.759   0.584   0.507
σ_α² = 4.00                               σ_Xα = 1.15
GMM1        (0.26)   (17.8)   (4.82)  (13.5)       (0.26)  (7.32)  (1.32)  (2.58)
GMM2         0.969     0.360   0.155   0.103        1.000   1.000   1.000   1.000
GMM-NH       0.969     0.357   0.154   0.100        1.000   1.000   0.999   1.000
GMM3         0.969     0.353   0.152   0.096        1.000   1.000   0.999   0.999
CLS          0.969     0.353   0.152   0.096        1.000   1.020   1.050   1.060

^a This number indicates the asymptotic variance of √N(β̂₁ − β).
^b These numbers indicate the ratios of the asymptotic variances of √N(β̂₂ − β), √N(β̂_NH − β), √N(β̂₃ − β) and √N(β̂_CLS − β) over the asymptotic variance of √N(β̂₁ − β), respectively.
Table 2
Efficiency comparison of GMM2, GMM-NH, GMM3 and CLS with GMM1. The ε_it are iid N(0, 1). The base values of parameters are: ρ = 0.5; σ_α² = 2; σ_X² = 2; σ_Xα = 0.5; μ_α = 1; μ_X = 1.

         β       γ       θ2      θ3      θ4      θ5      θ6      θ7      θ8      θ9      θ10
T = 4
GMM1   (0.17)^a (8.10)  (3.11)  (7.11)  (13.3)
GMM2    0.977^b  0.524   0.399   0.312   0.276
GMM-NH  0.977    0.522   0.397   0.312   0.274
GMM3    0.977    0.517   0.396   0.309   0.268
CLS     0.977    0.518   0.396   0.309   0.269
T = 6
GMM1   (0.10)  (3.10)  (2.12)  (3.91)  (6.51)  (9.93)  (14.2)
GMM2    0.984   0.663   0.544   0.490   0.455   0.433   0.419
GMM-NH  0.984   0.663   0.543   0.490   0.455   0.433   0.418
GMM3    0.984   0.662   0.543   0.490   0.455   0.432   0.417
CLS     0.984   0.662   0.543   0.490   0.455   0.432   0.417
T = 10
GMM1   (0.06)  (1.12)  (1.50)  (2.45)  (3.73)  (5.35)  (7.31)  (9.60)  (12.2)  (15.2)  (18.5)
GMM2    0.993   0.848   0.741   0.726   0.713   0.704   0.697   0.692   0.688   0.684   0.682
GMM-NH  0.993   0.848   0.741   0.726   0.713   0.704   0.697   0.692   0.688   0.684   0.682
GMM3    0.993   0.847   0.741   0.726   0.713   0.704   0.697   0.692   0.688   0.684   0.682
CLS     0.993   0.847   0.741   0.726   0.713   0.704   0.697   0.692   0.688   0.684   0.682

^a This number indicates the asymptotic variance of √N(β̂₁ − β).
^b These numbers indicate the ratios of the asymptotic variances of √N(β̂₂ − β), √N(β̂_NH − β), √N(β̂₃ − β) and √N(β̂_CLS − β) over the asymptotic variance of √N(β̂₁ − β), respectively.
Tables 3 and 4 report the results for some cases in which the ε_it are distributed as chi-squared with one degree of freedom, but normalized so that E(ε_it) = 0 and var(ε_it) = 1. Note that the results reported for GMM1 and GMM2 are identical to those in Tables 1 and 2. This is so because under assumptions (BI) and (CI), the covariance matrices of the GMM1 and GMM2 estimates are independent of the third and fourth moments of the ε_it. (See the forms of the covariance matrices given in Appendix C.) Most of the efficiency comparisons reported in Tables 3 and 4 are similar to those in Tables 1 and 2.
Several general comments can be made on the results in Tables 1–4. First, covariance restrictions are particularly important when T is small. In this case, GMM exploiting the moment conditions implied by the covariance restrictions is worthwhile. Second, the efficiency gains of GMM2 over GMM1 or of GMM-NH over GMM1 are often substantial, while the additional efficiency gains of GMM3 over GMM2 or GMM-NH are usually small. This implies that most of the efficiency gains from exploiting covariance restrictions can be obtained either by imposing the moment conditions that require only mutual zero correlation among the ε_it, or by imposing the moment conditions that require only homoskedasticity. Third, CLS performs well in general, but it can be worse than GMM1 in some cases. Furthermore, its consistency requires both nonautocorrelation and homoskedasticity of the ε_it. There seems to be little reason to prefer CLS to GMM2 or GMM-NH, at least in terms of the asymptotic properties of the estimates.
7. Conclusions
In this paper we have considered a panel data model with time-varying individual effects, which is a generalization of the traditional fixed effects model. This model is of theoretical interest, and we believe that it is also empirically useful in a variety of settings. We have counted and expressed the first- and second-order moment conditions implied by alternative sets of assumptions on the model, and we have considered several GMM estimators and examined the conditions under which these estimators are efficient. While some of the available moment conditions are nonlinear, we can derive very simple linearized GMM estimators that are asymptotically as efficient as the nonlinear GMM estimators. These alternative GMM procedures are available from the authors upon request.
The main novelty in our GMM approach is the exploitation of possible covariance restrictions. Our calculations of asymptotic covariance matrices of the GMM estimators demonstrate that exploiting covariance restrictions can result in substantial efficiency gains, especially when the number of time-series observations (T) is small.
Table 3
Efficiency comparison of GMM2, GMM-NH, GMM3 and CLS with GMM1. The ε_it are iid χ²(1), normalized so that E(ε_it) = 0 and var(ε_it) = 1. The base values of parameters are: ρ = 0.5; σ_α² = 2; σ_X² = 2; σ_Xα = 0.5; μ_α = 1; μ_X = 1; T = 3.

              β        γ       θ2      θ3            β       γ       θ2      θ3
(I) ρ = 0.05                              (II) μ_α = 0.00
GMM1        (0.25)^a  (810)   (2.60)  (5.89)       (0.26)  (5.09)  (8.78)  (13.8)
GMM2         0.999^b   0.465   0.388   0.264        0.974   0.999   0.208   0.220
GMM-NH       0.999     0.465   0.358   0.264        0.973   0.993   0.203   0.215
GMM3         0.999     0.255   0.352   0.248        0.973   0.625   0.203   0.206
CLS          0.999     0.268   0.593   0.328        0.974   1.010   0.228   0.229
ρ = 0.50                                  μ_α = 1.00
GMM1        (0.26)   (17.8)   (4.82)  (13.5)       (0.26)  (17.8)  (4.82)  (13.5)
GMM2         0.973     0.450   0.291   0.225        0.973   0.450   0.291   0.225
GMM-NH       0.973     0.450   0.264   0.217        0.973   0.450   0.264   0.217
GMM3         0.973     0.278   0.264   0.207        0.973   0.278   0.264   0.207
CLS          0.974     0.319   0.358   0.232        0.974   0.319   0.358   0.232
ρ = 1.00                                  μ_α = 3.00
GMM1        (0.27)   (8.59)   (8.64)  (27.3)       (0.26)  (123)   (3.56)  (13.4)
GMM2         0.944     0.449   0.250   0.206        0.973   0.259   0.257   0.227
GMM-NH       0.943     0.448   0.230   0.193        0.973   0.257   0.222   0.220
GMM3         0.942     0.310   0.230   0.188        0.973   0.205   0.216   0.209
CLS          0.943     0.364   0.273   0.198        0.974   0.206   0.271   0.232
ρ = 4.00                                  μ_α = 5.00
GMM1        (0.29)   (4.12)   (61.9)  (231)        (0.26)  (335)   (3.42)  (13.4)
GMM2         0.878     0.454   0.194   0.181        0.973   0.239   0.240   0.227
GMM-NH       0.876     0.449   0.189   0.171        0.973   0.236   0.217   0.221
GMM3         0.875     0.373   0.186   0.171        0.973   0.203   0.209   0.210
CLS          0.875     0.429   0.194   0.174        0.974   0.209   0.244   0.233
(III) σ_α² = 0.38                         (IV) σ_Xα = 0.15
GMM1        (0.26)   (17.8)   (4.82)  (13.5)       (0.26)  (151)   (39.1)  (148)
GMM2         1.000     1.000   1.000   1.000        0.966   0.054   0.036   0.021
GMM-NH       1.000     0.998   0.851   0.995        0.966   0.054   0.033   0.020
GMM3         1.000     0.854   0.894   0.995        0.966   0.033   0.033   0.019
CLS          1.050     1.350   2.070   2.430        0.966   0.038   0.044   0.021
Table 3 (continued)

              β        γ       θ2      θ3            β       γ       θ2      θ3
σ_α² = 0.50                               σ_Xα = 0.50
GMM1        (0.26)   (17.8)   (4.82)  (13.5)       (0.26)  (17.8)  (4.82)  (13.5)
GMM2         0.997     0.944   0.921   0.918        0.973   0.450   0.291   0.225
GMM-NH       0.997     0.943   0.790   0.912        0.973   0.450   0.264   0.217
GMM3         0.997     0.732   0.777   0.897        0.973   0.278   0.264   0.207
CLS          1.020     0.883   1.480   1.560        0.974   0.319   0.358   0.232
σ_α² = 2.00                               σ_Xα = 0.80
GMM1        (0.26)   (17.8)   (4.82)  (13.5)       (0.26)  (9.88)  (2.35)  (5.30)
GMM2         0.973     0.450   0.291   0.225        0.985   0.792   0.598   0.561
GMM-NH       0.973     0.450   0.264   0.217        0.985   0.792   0.540   0.546
GMM3         0.973     0.278   0.264   0.207        0.984   0.494   0.539   0.527
CLS          0.974     0.319   0.358   0.232        0.986   0.569   0.739   0.597
σ_α² = 4.00                               σ_Xα = 1.15
GMM1        (0.26)   (17.8)   (4.82)  (13.5)       (0.26)  (7.32)  (1.32)  (2.58)
GMM2         0.969     0.360   0.155   0.103        1.000   1.000   1.000   1.000
GMM-NH       0.969     0.360   0.148   0.100        1.000   0.995   0.847   0.993
GMM3         0.969     0.225   0.148   0.098        1.000   0.653   0.844   0.993
CLS          0.969     0.293   0.178   0.104        1.010   0.761   1.330   1.250

^a This number indicates the asymptotic variance of √N(β̂₁ − β).
^b These numbers indicate the ratios of the asymptotic variances of √N(β̂₂ − β), √N(β̂_NH − β), √N(β̂₃ − β) and √N(β̂_CLS − β) over the asymptotic variance of √N(β̂₁ − β), respectively.
Table 4
Efficiency comparison of GMM2, GMM-NH, GMM3 and CLS with GMM1. The ε_it are iid χ²(1), normalized so that E(ε_it) = 0 and var(ε_it) = 1. The base values of parameters are: ρ = 0.5; σ_α² = 2; σ_X² = 2; σ_Xα = 0.5; μ_α = 1; μ_X = 1.

         β       γ       θ2      θ3      θ4      θ5      θ6      θ7      θ8      θ9      θ10
T = 4
GMM1   (0.17)^a (8.10)  (3.11)  (7.11)  (13.3)
GMM2    0.977^b  0.524   0.399   0.312   0.276
GMM-NH  0.977    0.522   0.327   0.288   0.266
GMM3    0.977    0.346   0.326   0.288   0.263
CLS     0.978    0.433   0.454   0.340   0.286
T = 6
GMM1   (0.10)  (3.10)  (2.12)  (3.91)  (6.51)  (9.93)  (14.2)
GMM2    0.984   0.663   0.544   0.490   0.455   0.433   0.419
GMM-NH  0.984   0.656   0.410   0.402   0.395   0.390   0.384
GMM3    0.984   0.452   0.409   0.400   0.394   0.389   0.383
CLS     0.984   0.609   0.574   0.513   0.471   0.444   0.426
T = 10
GMM1   (0.06)  (1.12)  (1.50)  (2.45)  (3.73)  (5.35)  (7.31)  (9.60)  (12.2)  (15.2)  (18.5)
GMM2    0.993   0.848   0.741   0.726   0.713   0.704   0.697   0.692   0.688   0.684   0.682
GMM-NH  0.993   0.834   0.533   0.548   0.558   0.566   0.571   0.575   0.577   0.579   0.580
GMM3    0.993   0.591   0.532   0.545   0.555   0.562   0.568   0.572   0.575   0.577   0.578
CLS     0.993   0.822   0.753   0.737   0.723   0.712   0.704   0.698   0.693   0.689   0.686

^a This number indicates the asymptotic variance of √N(β̂₁ − β).
^b These numbers indicate the ratios of the asymptotic variances of √N(β̂₂ − β), √N(β̂_NH − β), √N(β̂₃ − β) and √N(β̂_CLS − β) over the asymptotic variance of √N(β̂₁ − β), respectively.
We have shown that a concentrated least-squares estimator, which is a natural generalization of the within estimator and which is the MLE when the errors are iid normal and the regressors are fixed, is consistent only when the errors are white noise. Furthermore, given the conditions needed for its consistency, this estimator is inefficient. This is an unusual and intriguing theoretical result, and it is reasonable to ask to what class of models it generalizes. This is an interesting topic for further research.
Another useful generalization of the model is to allow for multiple-component individual effects. That is, instead of a single component of the form θ_t α_i, we could consider multiple components of the form Σ_{j=1}^J θ_jt α_ji. For example, in some models one might wish to include a time-invariant individual effect (i.e., θ_1t = 1 for all t) as well as a time-varying individual effect. As another example, in models in which the θ_t represent macro shocks, one might have two or more distinct macro shocks to which individuals react differently. Lee (1991) has considered conditional least-squares estimation of multiple components models, and a GMM treatment should also be feasible.
Acknowledgements
The first author gratefully acknowledges the financial support of the College of Business and Dean's Council of 100 at Arizona State University, the Economic Club of Phoenix, and the alumni of the College of Business. The second author was financially supported for this research by Hansung University in the year of 1999. The third author gratefully acknowledges the financial support of the National Science Foundation.
Appendix A
In this section, we consider several examples of the panel data model with time-varying individual effects introduced in (1). The first example is an intertemporal consumption model under the complete market hypothesis. Assume that an individual's (or a household's) intertemporal utility function U_it is a separable function of his/her consumption level (c_it) and working hours (h_it): that is, U_it = U_i(c_it, h_it) = U_{1,i}(c_it) + U_{2,i}(h_it). Suppose that all of the financial and labor markets are competitive and complete. Under this assumption, the equilibrium of resource allocation should be Pareto-optimal. Assuming the discount factor is constant over different i, Altug and Miller (1990) show that one of the Pareto-optimum conditions (their Eq. (2.6)) is

∂U_it/∂c_it = μ_i λ_t,   (A.1)

where μ_i is individual i's Lagrangean multiplier for his/her lifetime budget constraint and λ_t is the price of contingent claims at time t which is realized along the equilibrium path. The λ_t can be also interpreted as the social value (shadow price) of holding a unit of wealth at time t, and μ_i λ_t as individual i's marginal utility of wealth at time t. Suppose that the utility function U_{1,i}(c_it) is quadratic: U_{1,i}(c_it) = a(c_it − d_it)², where d_it is a time-varying component of the utility function. Although this form of the utility function may be theoretically implausible, it has been widely used in empirical studies (see, for example, Hall, 1978; Hall and Mishkin, 1982; Pischke, 1995). Specify d_it = X_it β + Z_i γ + ε_it, where X_it and Z_i are individual-specific variables. Then, condition (A.1) immediately implies model (1), if we set y_it = c_it, α_i = μ_i/(2a), and θ_t = λ_t.
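Spelling out the substitution behind this mapping (a one-line derivation; the factor 2a comes from differentiating the quadratic utility):

```latex
\[
\frac{\partial U_{1,i}(c_{it})}{\partial c_{it}} = 2a\,(c_{it} - d_{it}) = \mu_i \lambda_t
\;\Longrightarrow\;
c_{it} = d_{it} + \frac{\mu_i}{2a}\,\lambda_t
       = X_{it}\beta + Z_i\gamma
       + \underbrace{\tfrac{\mu_i}{2a}}_{\alpha_i}\,\underbrace{\lambda_t}_{\theta_t}
       + \varepsilon_{it}.
\]
```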
We can obtain similar results with different utility functions. For example, suppose that the consumption-component utility function U_{1,i} is of the constant absolute risk aversion (CARA) form: U_{1,i}(c_it) = −(1/σ_i) exp[−σ_i(c_it − d_it)], where d_it = X_it f_1 + Z_i f_2 + ε_it. Hall (1988) and Townsend (1994) used this type of utility function for their empirical studies. Following Townsend (1994), we allow the parameter σ to be individual-specific. Then, substituting this utility function into (A.1) and taking logarithms yields

c_it = X_it f_1 + Z_i f_2 − (1/σ_i) ln(μ_i λ_t) + ε_it.   (A.2)

A similar result can be obtained even if the utility function is of the constant relative risk aversion form: U_{1,i}(c_it) = (1/σ_i) c_it^{σ_i} [exp{−d_it}]^{σ_i−1}. A detailed derivation is available from the authors upon request.
Another example for model (1) is an asset pricing model with time-varying risk premiums (see, for example, Campbell, 1987; Ferson and Foerster, 1994; Zhou, 1994). In order to facilitate our comparison between model (1) and the asset pricing model, we temporarily switch the roles of i and t: that is, we index time by i and individual assets by t. Then, a simple asset pricing model is given by E(R_it | Ω_{i−1}) = λ_{0,i} + b_t λ_{1,i}, where t = 0, 1, …, T, i = 1, …, N, R_it is the gross return of asset t at time i, Ω_{i−1} is the information set at time (i−1), b_t is the conditional beta of asset t corresponding to the market return, λ_{0,i} is the so-called zero-beta return at time i (Black, 1972), and λ_{1,i} is the market-wide risk premium. Following the literature on asset pricing models, we assume that T (the number of assets) is fixed and N (the number of time-series observations) is large. Define r_it = R_it − R_i0, which is the excess return of asset t at time i over the return of an arbitrarily chosen asset zero. Then, the asset pricing model can be rewritten as

E(r_it | Ω_{i−1}) = b_t λ_{1,i},   (A.3)

where b_t is now redefined as the difference between the betas of asset t and asset zero.
In order to estimate and test model (A.3), it is frequently assumed that λ_{1,i} is a linear function of some time-series macroeconomic variables; that is, λ_{1,i} = F_i f, where F_i is a row vector of the macro variables and f is the column vector of unknown parameters. Then, the parameters b_t and f are jointly estimated by GMM (with a normalization b_1 = 1). There are two drawbacks of this conventional approach. First, the linearity assumption is fairly arbitrary. Second, there is no economic theory that guides the choice of the variables for F_i. Our model (1) provides a convenient alternative method of estimation and testing. Condition (A.3) implies that the excess returns r_it are generated by the following linear regression model:

r_it = b_t λ_{1,i} + e_it,   (A.4)

where E(e_it | Ω_{i−1}) = 0. This model is exactly of the form of model (1) without regressors, if we replace b_t by θ_t, and λ_{1,i} by α_i. Thus, we can estimate the parameters b_t by the methods discussed in Section 3. It is possible to estimate b_t without making any parametric restrictions on the λ_{1,i}. Once the parameters b_t are estimated, the asset-pricing restriction (A.3) can be tested by standard GMM specification tests.
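Since (A.4) has the form of model (1) with no regressors, the computational device of Section 5 also applies directly: the least-squares estimate of b = (b_1, …, b_T)′ is the principal eigenvector of Σ_i r_i r_i′. A minimal sketch (numpy assumed; r is an N×T array of excess returns; the function name is ours):

```python
import numpy as np

def beta_eigenvector(r):
    """Least-squares estimate of b in r_it = b_t lambda_{1,i} + e_it (no regressors):
    the eigenvector of sum_i r_i r_i' with the largest eigenvalue, normalized b_1 = 1."""
    evals, evecs = np.linalg.eigh(r.T @ r)    # r.T @ r equals the sum over i of r_i r_i'
    b = evecs[:, -1]                          # eigh returns eigenvalues in ascending order
    return b / b[0]
```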
Appendix B
Identification and correlation between W_i and α_i: In the text, we noted that nonzero covariance between W_i and α_i was needed to distinguish ξα_i from ε_i, and thus to identify θ. Let μ_α = E(α_i). We might suppose that μ_α ≠ 0 would also serve to identify θ, even when α_i is not correlated with the explanatory variables. However, this is not true when the model contains an intercept. To see the nature of the identification problem, we define the model as

Model A:  y_it = X_it^0 β_0 + γ_0 + α_i θ_t + ε_it,

where γ_0 is the overall intercept term, X_it^0 contains X_it and all of Z_i except the intercept, and β_0 contains the corresponding coefficients. Now, for an arbitrary scalar c, consider the following transformation of the parameters: γ_0* = γ_0 + c; μ_α* = μ_α − c (corresponding to α_i* = α_i − c); θ_1* = θ_1 = 1; for c = μ_α, θ_t* is arbitrary, and for c ≠ μ_α, θ_t* = (μ_α θ_t − c)/(μ_α − c), t = 2, …, T. Then we can rewrite the model as

Model B:  y_it = X_it^0 β_0 + γ_0* + α_i* θ_t* + ε_it.

The expected value of (y_it − X_it^0 β_0) is the same in Model B, with any value of c, as it is in Model A. Furthermore, if W_i is uncorrelated with u_it = α_i θ_t + ε_it, it is also uncorrelated with u_it* = α_i* θ_t* + ε_it, and identification would have to be obtained by some other type of information (e.g., restrictions on the covariance structure of the ε_it). When W_i and α_i are correlated, by contrast, we can identify the θ_t through the temporal pattern of the covariance between W_i and u_it, which depends on θ_t.

Identification when the coefficients of the time-invariant variables are time-varying: We show here that the time-varying coefficients of the time-invariant regressors, γ_1, …, γ_T, are not separately identified. To do so, we rewrite Eq. (1) as

y_it = X_it β + Z_i γ_t + u_it;   u_it = θ_t α_i + ε_it.   (B.1)

Define γ_t* = γ_t − θ_t c, where c is any nonzero vector whose number of entries equals the number of variables in Z_i. Also define α_i* = α_i + Z_i c. Then, we can rewrite (B.1) as

y_it = X_it β + Z_i γ_t* + u_it*;   u_it* = θ_t α_i* + ε_it.   (B.2)

This result indicates that γ_1, …, γ_T are not separately identified. Some normalization restrictions may lead to identification of a set of reduced-form parameters. For example, suppose that we set γ_1* = 0. With this restriction, we can rewrite (B.1) as

y_it = X_it β + Z_i γ_t* + u_it*;   u_it* = θ_t α_i* + ε_it,   (B.3)

where γ_t* = γ_t − θ_t γ_1 for t > 1 and α_i* = α_i + Z_i γ_1. Then, we can estimate the reduced-form parameters γ_2*, …, γ_T* by the GMM method introduced in Section 3.
Appendix C
Proof of Proposition 1. Using standard results for GMM, the asymptotic covariance matrix of √N(δ̂₁ − δ) equals [B₁′ V₁₁⁻¹ B₁]⁻¹, where B₁ = E[∂b_1i(δ)/∂δ′]. Then it is straightforward to show that, under (BI), the moment functions in (21a)–(21c) are a nonsingular linear transformation of B₁′ V₁₁⁻¹ b_1i(δ).   (C.1)
Under the usual regularity conditions for GMM, we can show that δ̇ is asymptotically equivalent to the GMM estimator based on the moment condition E[B₁′ V₁₁⁻¹ b_1i] = 0, where the equivalence is due to (C.1). Therefore, we have

cov[√N(δ̇ − δ)] = [B₁′ V₁₁⁻¹ B₁ (B₁′ V₁₁⁻¹ B₁)⁻¹ B₁′ V₁₁⁻¹ B₁]⁻¹ = (B₁′ V₁₁⁻¹ B₁)⁻¹.
Proof of Propositions 2 and 3. Under the assumptions of the propositions, we can write model (2) as Eq. (20) of the text. We denote the covariance matrix of the error ξg_i + ε_i of (20) by Λ(θ, ω), where ω is the vector made up of σ_g² and the parameters (Φ) underlying the covariance matrix of ε_i. We also define r = (β′, γ′, θ′, π′, ω′)′ = (δ′, π′, ω′)′, and denote its dimension by s₁. Consider the following moment conditions:

E[f_i(r)] = E[v_i1 W_i, v_i2 W_i, …, v_iT W_i, (v[v_i v_i′] − v[Λ])′]′ = 0,   (C.3)

where v_it = y_it − X_it β − Z_i γ − θ_t W_i π for any t, and v(Λ) equals vec(Λ) with the above-diagonal elements of Λ deleted. We denote the dimension of f_i by s₂. Define F = E[∂f_i(r)/∂r′] and Δ = E[f_i(r) f_i(r)′], and let r̃ denote the GMM estimator based on (C.3) with the optimal weighting described below.   (C.4)
Consider an alternative estimator, r̂, which solves

S(r̂)′ f_N(r̂) = 0,   (C.5)

where f_N(r) = N⁻¹ Σ_i f_i(r) and S(r) is any s₂×s₁ nonstochastic matrix of full column rank. It is straightforward to show that

√N(r̂ − r) → N[0, (S′F)⁻¹ S′ΔS (F′S)⁻¹].   (C.6)

Investigating the form of cov[√N(r̂ − r)], we can easily see that the optimal choice of S must be Δ⁻¹F; r̃ is obtained with this choice. This implies that r̃ should be efficient relative to any GMM estimator that solves (C.5).
We will now show that r̃ is equivalent to the ML estimator, r̂_ML. The ML estimator r̂_ML satisfies first-order conditions of the form (C.8)–(C.11), and we can verify that, with an appropriate choice of S, say S_ML, these conditions can be expressed in the same form as (C.5). Then the efficiency of the ML estimator implies that S_ML = Δ⁻¹F. In other words, r̃ is asymptotically identical to r̂_ML.
We now consider the case in which Φ is unrestricted. In this case, σ_g² and Φ are not separately identified, so we simply let ω = v(Λ). For the proof of Proposition 2, it is enough to show that δ̂₁ is equivalent to the estimate of δ implied by r̃, say δ̃. Subtracting θ_t times the first subset of moment conditions of (C.3) from the next (T−1) subsets, and rewriting the results, we obtain the following moment conditions:

E[b_1i′, v_i1 W_i, (v(v_i v_i′) − ω)′]′ = 0.   (C.12)

Since condition (C.12) results from a nonsingular transformation of (C.3), the GMM estimates based on (C.12) and (C.3) must be asymptotically equivalent. Note that δ appears only in the first subset of moment conditions in (C.12), while the next two subsets just identify π and ω given δ. Then it follows from the separability result of Ahn and Schmidt (1995a) that GMM based on b_1i alone yields asymptotically the same estimate of δ as GMM based on all the moment conditions in (C.12). This result establishes the equivalence of δ̂₁ and δ̃.
Consider the case in which Φ = diag(σ₁², σ₂², …, σ_T²). Let ω = (σ₁², σ₂², …, σ_T²)′. Then we can equivalently express condition (C.3) as

E[b_1i′, b_2i′, v_i1 W_i, (v_i1 v_i2 − θ_2 σ_g², v_i1² − σ_g² − σ₁², v_i2² − θ_2² σ_g² − σ₂², …, v_iT² − θ_T² σ_g² − σ_T²)′]′ = 0.   (C.13)

Note that the third and fourth subsets of (C.13) just identify π, ω and σ_g² given δ. Therefore, by the same reasoning as above, the GMM estimator based on the first two subsets of moment conditions in (C.13) alone is equivalent to that based on all of the conditions. In other words, δ̂₂ is asymptotically identical to δ̃ and δ̂_ML. Using similar methods, we can establish the efficiency of δ̂_NH and δ̂₃.  □
Derivation of the asymptotic covariance matrices: Here we derive the asymptotic covariance matrices of the estimators compared in Section 6, under assumptions (BA.1)–(BA.4), (BI) and (CI). Let q = (T−1)(T−2)/2. Define H₂₂ to be the q×(T−2) matrix of which all entries are zero except that the i-th entry in the j-th column equals one, where i = Σ_{s=1}^j (T−1−s). With tedious algebra we can then obtain the covariance matrices of the relevant moment conditions and of their derivatives, and with these results the asymptotic covariance matrices reported in Tables 1–4 follow.