
Journal of Business & Economic Statistics

ISSN: 0735-0015 (Print) 1537-2707 (Online) Journal homepage: http://www.tandfonline.com/loi/ubes20

Rethinking the Univariate Approach to Panel Unit Root Testing: Using Covariates to Resolve the Incidental Trend Problem

Joakim Westerlund

To cite this article: Joakim Westerlund (2015) Rethinking the Univariate Approach to Panel Unit Root Testing: Using Covariates to Resolve the Incidental Trend Problem, Journal of Business & Economic Statistics, 33:3, 430-443, DOI: 10.1080/07350015.2014.962697

To link to this article: http://dx.doi.org/10.1080/07350015.2014.962697

Accepted author version posted online: 25 Sep 2014.



Department of Economics, Lund University, SE-22007 Lund, Sweden; Deakin University, 3125 Burwood, Australia

In an influential article, Hansen showed that covariate augmentation can lead to substantial power gains when compared to univariate tests. In this article, we ask whether this result also extends to the panel data context. The answer turns out to be yes, which is maybe not that surprising. What is surprising, however, is the extent of the power gain, which is shown to more than outweigh the well-known power loss in the presence of incidental trends. That is, the covariates have an order effect on the neighborhood around unity for which local asymptotic power is nonnegligible.

KEY WORDS: Covariates; Incidental trends; Local asymptotic power; Panel data; Unit root test.

1. INTRODUCTION

As is well known, univariate unit root tests, such as the conventional augmented Dickey–Fuller (ADF) test, have low power, and much effort has therefore gone into the development of various modifications aimed at increasing power (see Leybourne, Kim, and Newbold 2005, and the references provided therein). In many cases, however, there is more information to be had, and then power can be increased without the need for such modifications. For example, in regression analysis, due to the risk of obtaining spurious results, it is quite common to pretest for unit roots, and then it seems quite natural to try to make use also of the information contained in the other variables of the model. After all, we typically do not use regressions unless we believe that the included variables are correlated. This is the idea of Hansen (1995), who developed a covariate augmented ADF (CADF) test that is shown to be at least as powerful as the ADF test. (The CADF considered here is not to be confused with the cross-sectionally augmented Dickey–Fuller test of Pesaran (2007).)

But while the CADF approach has attracted some attention, the single most common way by far in which researchers have been trying to increase the power of the ADF test is through the use of panel data. Thus, in this case the source of extraneous information is not a set of correlated covariates but rather a cross-section of similar units.

In light of these developments, the question naturally arises whether there are any power gains to be made by considering a panel CADF (PCADF) test that exploits both sources of information. Intuitively, since the two types of information are individually important for power, there should be some merit in combining them. Of course, this article is not the first to recognize the value of covariate augmentation in a panel data context (see, e.g., Pesaran 2007; Chang and Song 2009; Pesaran, Smith, and Yamagata 2013, who used covariates to address the problem of cross-section dependence); however, it is the first to study analytically the power implications of doing so. In other words, while previously the rationale for the covariates (in panels) has always been to improve upon size accuracy, no one has yet considered their effect on power.

Our main finding is that the information contained in the covariates is useful when testing for a unit root in panel data, and that the power of the PCADF test can be substantially increased, far beyond that achievable by existing tests that do not employ any covariate information. The largest difference occurs in the presence of incidental trends, which are problematic in the sense that their estimation is known to lead to low power. In fact, as Moon, Perron, and Phillips (2007, p. 445) concluded from their analysis of the local power of univariate panel unit root tests with trends:

An important empirical consequence of the present investigation is that increasing the complexity of the fixed effects in a panel model inevitably reduces the potential power of unit root tests. This reduction in power has a quantitative manifestation in the radial order of the shrinking neighborhoods around unity for which asymptotic power is nonnegligible. When there are no fixed effects or constant fixed effects, tests have power in a neighborhood of unity of order $1/\sqrt{N}T$ (where $N$ and $T$ denote the size of the cross-section and time dimensions, respectively). When incidental trends are fitted, the tests only have power in a larger neighborhood of order $1/N^{1/4}T$.

Moon and Perron (1999) showed that the maximum likelihood estimator of the local-to-unity parameter in near unit root panels is inconsistent. They called this phenomenon, which arises because of the presence of an infinite number of nuisance parameters, an “incidental trend problem,” because it is analogous to the well-known incidental parameter problem in dynamic fixed-$T$ panels. The above-mentioned reduction in the order of the shrinking neighborhoods around unity for which power is nonnegligible is a manifestation of this problem, and has in fact given rise to a separate literature (see, e.g., Moon and Perron 2004; Moon and Phillips 2004; Moon, Perron, and Phillips 2007; Phillips and Sul 2007). One of the main conclusions from this literature is that the incidental trend problem is a general phenomenon that applies to all panel unit root tests. Indeed, as Moon, Perron, and Phillips (2007, p. 445) concluded, “the present article shows that discriminatory power against a unit root is generally weakened as more complex deterministic regressors are included.”

In this article, we show that this need not be the case, and that the use of covariates can compensate for the loss of power caused by the incidental trends. That is, the PCADF test has nonnegligible power within $1/\sqrt{N}T$-neighborhoods of the null even if incidental trends are present. This property makes PCADF unique, as there is presently no other test with incidental trends that has power within such neighborhoods. Conversely, if the rate of shrinking is given by $1/N^{1/4}T$, then, unlike existing tests, the power of PCADF is actually increasing in $N$.

2. MODEL AND ASSUMPTIONS

Consider the panel variable $y_{i,t}$, observable for $t = 1, \ldots, T$ time series and $i = 1, \ldots, N$ cross-section units. The data-generating process (DGP) of this variable is given by

$$y_{i,t} = \theta_i' d_t + u_{i,t}, \tag{1}$$

where $u_{i,t}$ is the stochastic part of $y_{i,t}$, while $d_t = (1, t)'$ is the deterministic part, for which there are two models: (1) $\theta_i = (\theta_{1i}, \theta_{2i})'$ with $\theta_{1i}$ unrestricted and $\theta_{2i} = 0$ (unit-specific intercepts), and (2) $\theta_i$ unrestricted (unit-specific intercepts and incidental time trends). Thus, while our main focus lies with model 2, for completeness, we will also consider model 1. The stochastic part is allowed to depend on an $m$-vector of covariates, $x_{i,t}$, which could potentially be common across $i$ (thereby allowing for some form of cross-section dependence). Specifically,

$$\phi_i(L)\Delta u_{i,t} = \rho_i u_{i,t-1} + v_{i,t}, \tag{2}$$

$$v_{i,t} = \lambda_i(L)'(x_{i,t} - \gamma_i) + \epsilon_{i,t}, \tag{3}$$

$$\Phi_i(L)(x_{i,t} - \gamma_i) = \varepsilon_{i,t}, \tag{4}$$

where $\gamma_i = E(x_{i,t})$, and $\Phi_i(L) = I_m - \sum_{j=1}^{p} \Phi_{ji} L^j$, $\phi_i(L) = 1 - \sum_{j=1}^{p} \phi_{ji} L^j$, and $\lambda_i(L) = \sum_{j=0}^{p} \lambda_{ji} L^j$ are polynomials in the lag operator $L$. In the assumptions that follow $k$ denotes a generic constant, and $\mathrm{tr}(A)$ and $\|A\| = \sqrt{\mathrm{tr}(A'A)}$ denote the trace and Frobenius (Euclidean) norm of the matrix $A$, respectively.

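Assumption 1.

(i) $\eta_{i,t} = (\epsilon_{i,t}, \varepsilon_{i,t}')'$ is independent and identically distributed (iid) such that $E(\eta_{i,t}) = 0$, $E(\eta_{i,t}\eta_{i,t}') = \Sigma_{\eta i} = \mathrm{diag}(\sigma_{\epsilon i}^2, \Sigma_{\varepsilon i}) > 0$ and $E(\|\eta_{i,t}\|^k) < \infty$ for $k \geq 4$;

(ii) $E(\|(u_{i,-p}, x_{i,-p}')'\|^2), \ldots, E(\|(u_{i,0}, x_{i,0}')'\|^2) < \infty$;

(iii) $\Phi_i(L)$ and $\phi_i(L)$ have all roots outside the unit circle, and $\sum_{j=0}^{p} \|\lambda_{ji}\| < \infty$.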

Similar to Hansen (1995), our asymptotic analysis supposes that $\rho_i$ is local-to-zero as $N, T \to \infty$. However, since we are using panel data, the rate of shrinking is different. In particular, it is assumed that

$$\rho_i = \frac{\phi_i(1) c_i}{N^{\kappa} T}, \tag{5}$$

where $\kappa > 0$ is a constant and $c_i$ is a drift parameter that satisfies Assumption 2.

Assumption 2.

(i) $c_i$ is iid with $\mu_{kc} = E(c_i^k) < \infty$ for $k \geq 3$ and $\mu_{0c} = 1$;

(ii) $c_i$ and $\eta_{i,t}$ are mutually independent.

If $c_i = \rho_i = 0$, then $y_{i,t}$ is unit root nonstationary, whereas if $c_i \neq 0$, then $y_{i,t}$ is either locally stationary ($c_i < 0$) or locally explosive ($c_i > 0$). The null and alternative hypotheses considered here are given by $H_0: c_1 = \cdots = c_N = 0$ and $H_1: c_i \neq 0$ for some $i$, respectively, which can be formulated more compactly in terms of the second moment of $c_i$ as $H_0: \mu_{2c} = 0$ and $H_1: \mu_{2c} > 0$, respectively.
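To make the DGP and the local alternative concrete, the following minimal sketch simulates a stripped-down version of (1)–(5). It is only an illustration under assumptions that are ours rather than the article's: $m = 1$, $p = 0$ (so that $\phi_i(1) = 1$), $\theta_i = 0$, $\gamma_i = 0$, standard normal innovations, and zero initial values. The function name `simulate_panel` and the argument `draw_c` are hypothetical.

```python
import random

def simulate_panel(n_units, n_periods, kappa, lam, draw_c, seed=0):
    """Simulate (y, x) from a simplified version of (1)-(5).

    Illustrative assumptions: m = 1, p = 0, theta_i = 0, gamma_i = 0,
    unit innovation variances and zero initial values, so that
        Delta u_t = rho_i * u_{t-1} + lam * x_t + eps_t,
    with x_t white noise and rho_i = c_i / (N^kappa * T).
    """
    rng = random.Random(seed)
    y, x = [], []
    for _ in range(n_units):
        c_i = draw_c(rng)                              # drift parameter c_i
        rho_i = c_i / (n_units ** kappa * n_periods)   # phi_i(1) = 1 when p = 0
        u_prev, y_i, x_i = 0.0, [], []
        for _ in range(n_periods):
            x_t = rng.gauss(0.0, 1.0)                  # covariate
            eps_t = rng.gauss(0.0, 1.0)                # idiosyncratic error
            u_t = (1.0 + rho_i) * u_prev + lam * x_t + eps_t
            y_i.append(u_t)
            x_i.append(x_t)
            u_prev = u_t
        y.append(y_i)
        x.append(x_i)
    return y, x

# Null hypothesis: c_i = 0 for all i, so every series is a pure random walk.
y_null, x_null = simulate_panel(5, 50, kappa=0.5, lam=1.0,
                                draw_c=lambda rng: 0.0)
# Local alternative: c_i uniform on [-2, 0], i.e., locally stationary units.
y_alt, x_alt = simulate_panel(5, 50, kappa=0.5, lam=1.0,
                              draw_c=lambda rng: rng.uniform(-2.0, 0.0))
```

Passing the distribution of $c_i$ as a callable keeps the null ($c_i = 0$) and heterogeneous alternatives in one code path.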

As Hansen (1995) showed, with serially correlated errors, the power of the CADF test depends not only on $c_i$, but also on the long-run correlation coefficient between $v_{i,t}$ and $\epsilon_{i,t}$, as given by $\rho_{v\epsilon i} = \sigma_{\epsilon i}/\sigma_{vi} \in (0, 1]$, where $\sigma_{vi}^2 = \lambda_i(1)'\Phi_i(1)^{-1}\Sigma_{\varepsilon i}(\Phi_i(1)^{-1})'\lambda_i(1) + \sigma_{\epsilon i}^2$ (see the Appendix). Thus, if $\rho_{v\epsilon i} \to 1$, then $x_{i,t}$ does not make any contribution to the variation in $y_{i,t}$, whereas if $\rho_{v\epsilon i} \to 0$, then $x_{i,t}$ explains all the variation in $y_{i,t}$.
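As a worked special case (our own illustration, not taken from the article), suppose there is a single white-noise covariate that enters only contemporaneously, so that the covariate lag polynomial equals one and $\lambda_i(L) = \lambda_{0i}$, and write $\sigma_{xi}^2$ (a hypothetical symbol) for the variance of the covariate innovation. The long-run correlation then reduces to:

```latex
\sigma_{vi}^2 = \lambda_{0i}^2 \sigma_{xi}^2 + \sigma_{\epsilon i}^2,
\qquad
\rho_{v\epsilon i}
  = \frac{\sigma_{\epsilon i}}
         {\sqrt{\lambda_{0i}^2 \sigma_{xi}^2 + \sigma_{\epsilon i}^2}} .
% Example: \lambda_{0i} = \sigma_{xi} = \sigma_{\epsilon i} = 1
% gives \rho_{v\epsilon i} = 1/\sqrt{2} \approx 0.71.
% \lambda_{0i} = 0 gives \rho_{v\epsilon i} = 1 (no covariate information);
% |\lambda_{0i}| \to \infty gives \rho_{v\epsilon i} \to 0.
```

In this design the covariate loading alone moves $\rho_{v\epsilon i}$ between the two polar cases described above.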

Assumption 3. $\sum_{i=1}^{N} \rho_{v\epsilon i}^{k}/N \to \rho_{kv\epsilon} \in (0, \infty)$ as $N \to \infty$ for $k \in (-\infty, \infty)$.

Remark 1. The assumption that $\eta_{i,t}$ is cross-section independent is restrictive, but can be relaxed by requiring that some of the elements of $x_{i,t}$ are constant in $i$, in which case (3) becomes a common factor model, with the common elements of $x_{i,t}$ taking the role of common factors. In the article, we assume that $x_{i,t}$ is known, in which case the presence of common covariates does not affect the results. If there are common covariates that are unobserved, then one possibility is to follow Bai and Ng (2004), and use estimated principal component factors in their stead. In Section 4, we elaborate on this point. For the time being, however, we maintain the assumption that $x_{i,t}$ is known.

Remark 2. Assumption 1 (ii) ensures that the initial values of $u_{i,t}$ and $x_{i,t}$ are $O_p(1)$, which is relevant if the initialization took place somewhere in the recent past. While this is admittedly the simplest way to relax the otherwise common zero initial value assumption (see, e.g., Moon, Perron, and Phillips 2007), the results reported herein also hold when the initialization is in the distant past, such that the initial values of $u_{i,t}$ and $x_{i,t}$ are $O_p(\sqrt{T})$ (see Westerlund 2014a). This is different from the time series case where the size of the initial value strongly influences the performance of unit root tests, up to the point of reversing the ranking of different tests (see, e.g., Müller and Elliott 2003).

Remark 3. The requirement that $c_i$ has at least three moments is only needed when analyzing the local power, and is not necessary for deriving the asymptotic null distribution of the PCADF test, in which case $c_i$ and all its moments are zero. In fact, the current moment condition is less restrictive than the otherwise so common bounded support assumption, which implies that all moments are bounded (see, e.g., Moon, Perron, and Phillips 2007; Moon and Perron 2008).

Remark 4. Unlike most other studies where the rate of shrinking of the local alternative, $\kappa$, is prespecified, in the present study the “appropriate” value of $\kappa$ will be considered a part of the analysis (see Section 3.2 for a detailed discussion).

Remark 5. The requirement that $\lambda_i(L)$, $\phi_i(L)$, and $\Phi_i(L)$ are all of the same order $p$ is not a restriction. If the orders are different, then we simply set $p$ equal to the maximum order of $\phi_i(L)$ and $\lambda_i(L)$. $\Phi_i(L)$ does not have to be estimated and can therefore be of any order (even infinite, although that would require changing Assumption 1 (iii)). The lag order also does not have to be the same for all $i$, but could be allowed to differ without affecting the results.

Remark 6. As with $p$ (see Remark 5), the assumption that the number of regressors contained in $x_{i,t}$, $m$, is the same for all $i$ is not a restriction. Hence, in practice there is nothing that prevents the number of covariates from differing from unit to unit, which is of course a great advantage, especially in applications where data on some units are scarce (see Section 4 for a discussion). In fact, $T$ could also be allowed to differ across units.

3. MAIN RESULTS

In this section, we begin by introducing the PCADF test statistic and its asymptotic distribution. Then we discuss, in turn, the implications for power and implementation.

3.1 The PCADF Statistic and its Asymptotic Distribution

Let

$$R_d y_{i,t-1} = y_{i,t-1} - \sum_{t=p+2}^{T} y_{i,t-1} d_t' \left( \sum_{t=p+2}^{T} d_t d_t' \right)^{-1} d_t$$

be the detrended version of $y_{i,t-1}$, where $R_d$ is the ordinary least squares (OLS) residual operator. Equations (1) and (2) can be rewritten as

$$R_d \Delta y_{i,t} = \rho_i R_d y_{i,t-1} + \Pi_i' R_d z_{i,t} + R_d \epsilon_{i,t}, \tag{6}$$

where $z_{i,t} = (\Delta y_{i,t-1}, \ldots, \Delta y_{i,t-p}, x_{i,t}', \ldots, x_{i,t-p}')'$ with $\Pi_i = (\phi_{1i}, \ldots, \phi_{pi}, \lambda_{0i}', \ldots, \lambda_{pi}')'$ being the associated vector of coefficients. Define

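$$A_{i,T} = \frac{1}{\hat\sigma_{yi}\hat\sigma_{\epsilon i}} \frac{1}{T} \sum_{t=p+2}^{T} R_z R_d y_{i,t-1}\, R_z R_d \Delta y_{i,t},$$

$$B_{i,T} = \frac{1}{\hat\sigma_{yi}^2} \frac{1}{T^2} \sum_{t=p+2}^{T} (R_z R_d y_{i,t-1})^2,$$

where $\hat\sigma_{yi}^2 = \hat\sigma_{vi}^2/(1 - \sum_{j=1}^{p} \hat\phi_{ji})^2$ is an estimator of $\sigma_{yi}^2 = \sigma_{vi}^2/\phi_i(1)^2$, $\hat\sigma_{vi}^2 = \sum_{t=p+2}^{T} \hat v_{i,t}^2/T$, $\hat\sigma_{\epsilon i}^2 = \sum_{t=p+2}^{T} \hat\epsilon_{i,t}^2/T$ with $\hat v_{i,t} = R_d \Delta y_{i,t} - \hat\rho_i R_d y_{i,t-1} - \sum_{j=1}^{p} \hat\phi_{ji} R_d \Delta y_{i,t-j}$, and $\hat\rho_i$, $\hat\Pi_i$, $\hat\phi_{ji}$ and $\hat\epsilon_{i,t}$ coming from the OLS fit of (6), and $R_z$ is $R_d$ with $z_{i,t}$ in place of $d_t$. Letting $A_T = \sum_{i=1}^{N} A_{i,T}/N$ with a similar definition of $B_T$, the PCADF statistic considered in this article is given by

$$t_{PCADF} = \frac{\sqrt{N} A_T}{\sqrt{B_T}}.$$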

The asymptotic distribution of this test statistic is provided in Theorem 1.

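Theorem 1. Under Assumptions 1–3, as $N, T \to \infty$,

$$t_{PCADF} - \sqrt{N}\mu \sim \sum_{j=1}^{2} N^{1/2 - j\kappa}\left((\rho_{-1v\epsilon} - \rho_{1v\epsilon}) r_{2j} - \rho_{1v\epsilon} r_{1j}\right) + N^{1/2 - 2\kappa}\rho_{1v\epsilon}\frac{r_{22}}{2} + N(0, \sigma^2) + O_p\!\left(\frac{\mu_{1c}}{N^{\kappa}}\right) + O_p\!\left(\frac{\mu_{3c}}{N^{3\kappa - 1/2}}\right) + O_p\!\left(\frac{\sqrt{N}}{\sqrt{T}}\right),$$

where $\sim$ signifies asymptotic equivalence,

$$r_{11} = \frac{\mu_{1c}\alpha_0\beta_1}{2\beta_0^{3/2}},$$

$$r_{12} = \frac{\mu_{2c}\alpha_0\beta_2}{2\beta_0^{3/2}} - \frac{3\mu_{1c}^2\alpha_0\beta_1^2}{8\beta_0^{5/2}},$$

$$r_{2j} = \frac{\mu_{jc}\beta_{j-1}}{\sqrt{\beta_0}},$$

$$\mu = \frac{\alpha_0\rho_{1v\epsilon}}{\sqrt{\beta_0}},$$

$$\sigma^2 = 1 - \rho_{2v\epsilon} + \frac{\alpha_1\rho_{2v\epsilon}}{\beta_0} + \frac{\rho_{1v\epsilon}^2\alpha_0^2\alpha_2}{4\beta_0^3},$$

and numerical values of $\alpha_0$, $\alpha_1$, $\alpha_2$, $\beta_0$, $\beta_1$, and $\beta_2$ are given in Table 1.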
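The construction of the statistic in Section 3.1 can be sketched numerically as follows. This is a minimal illustration, not the author's implementation: it assumes a balanced panel stored as NumPy arrays, a common lag order $p$ for all units, and it interprets $R_z$ as the projection off the detrended regressors $R_d z_{i,t}$. The function names `pcadf_statistic` and `_resid` are hypothetical.

```python
import numpy as np

def _resid(w, regressors):
    """OLS residual of w after projecting on the columns of `regressors`."""
    if regressors.shape[1] == 0 or w.size == 0:
        return w
    coef, *_ = np.linalg.lstsq(regressors, w, rcond=None)
    return w - regressors @ coef

def pcadf_statistic(y, x=None, p=0, trend=True):
    """Pooled statistic sqrt(N) * A_T / sqrt(B_T) for an (N, T) panel y.

    x is None, (N, T) or (N, T, m); trend=True uses d_t = (1, t)' (model 2),
    trend=False uses d_t = 1 (model 1).
    """
    y = np.asarray(y, dtype=float)
    n_units, n_periods = y.shape
    t_idx = np.arange(p + 1, n_periods)          # t = p+2, ..., T (1-based)
    ones = np.ones(t_idx.size)
    d = np.column_stack([ones, t_idx]) if trend else ones[:, None]
    a_vals, b_vals = [], []
    for i in range(n_units):
        dy = np.diff(y[i])                        # dy[s] = y[s+1] - y[s]
        dep = dy[t_idx - 1]                       # Delta y_t
        lag = y[i, t_idx - 1]                     # y_{t-1}
        cols = [dy[t_idx - 1 - j] for j in range(1, p + 1)]  # Delta y_{t-j}
        if x is not None:
            xi = np.asarray(x[i], dtype=float).reshape(n_periods, -1)
            for j in range(p + 1):                # x_t, ..., x_{t-p}
                cols.extend(xi[t_idx - j].T)
        z = np.column_stack(cols) if cols else np.empty((t_idx.size, 0))
        rd_dep, rd_lag, rd_z = _resid(dep, d), _resid(lag, d), _resid(z, d)
        # OLS fit of the detrended regression (6).
        regs = np.column_stack([rd_lag, rd_z])
        coef, *_ = np.linalg.lstsq(regs, rd_dep, rcond=None)
        rho_hat, phi_hat = coef[0], coef[1:1 + p]
        eps_hat = rd_dep - regs @ coef
        v_hat = rd_dep - rho_hat * rd_lag - rd_z[:, :p] @ phi_hat
        sig2_v = np.sum(v_hat ** 2) / n_periods
        sig2_eps = np.sum(eps_hat ** 2) / n_periods
        sig2_y = sig2_v / (1.0 - phi_hat.sum()) ** 2
        rz_lag, rz_dep = _resid(rd_lag, rd_z), _resid(rd_dep, rd_z)
        a_vals.append(np.sum(rz_lag * rz_dep)
                      / (n_periods * np.sqrt(sig2_y * sig2_eps)))
        b_vals.append(np.sum(rz_lag ** 2) / (n_periods ** 2 * sig2_y))
    return np.sqrt(n_units) * np.mean(a_vals) / np.sqrt(np.mean(b_vals))
```

Because the data are projected off $d_t$ first, the returned value is unchanged when unit-specific intercepts and trends are added to `y` with `trend=True`. The value still has to be centered by $\sqrt{N}\mu$ and scaled by $\sigma$ before it is compared with normal critical values.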

Remark 7. The last three terms in the asymptotic distribution of $t_{PCADF} - \sqrt{N}\mu$ are remainders. The first two of these are only relevant under the alternative that $c_i \neq 0$, and are negligible for all $\kappa > 1/6$ (provided that $c_i$ has at least three moments). The third remainder does not depend on $c_i$ and is therefore there also under the unit root null. It follows that for this term to go away we need $N/T \to 0$ as $N, T \to \infty$ (which in practice means that $N \ll T$). The reason for this remainder is the assumed heterogeneity of the DGP, whose elimination induces an estimation error in $T$, which is then aggravated when pooling across $N$. The condition that $N/T \to 0$ prevents this error from having a dominating effect.
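The rate conditions in Remark 7 follow directly from the orders of the three remainder terms. A short worked check, assuming only that $\mu_{1c}$ and $\mu_{3c}$ are finite:

```latex
\begin{align*}
O_p\!\left(\mu_{1c} N^{-\kappa}\right) &= o_p(1)
  && \text{for every } \kappa > 0, \\
O_p\!\left(\mu_{3c} N^{1/2 - 3\kappa}\right) &= o_p(1)
  && \text{if } 3\kappa > \tfrac{1}{2}, \text{ that is, } \kappa > \tfrac{1}{6}, \\
O_p\!\left(\sqrt{N}/\sqrt{T}\right) &= o_p(1)
  && \text{if } N/T \to 0 .
\end{align*}
```

The binding condition among the first two is therefore $\kappa > 1/6$, which is satisfied by both rates of interest, $\kappa = 1/2$ and $\kappa = 1/4$.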

Remark 8. Under the null hypothesis of a unit root, $c_i$ and all its moments are zero, and therefore $(t_{PCADF} - \sqrt{N}\mu)/\sigma \to_d N(0, 1)$ as $N, T \to \infty$ with $N/T \to 0$. The fact that this result holds independently of $\rho_{v\epsilon i}$ is very convenient in the sense that the critical values are always the same. This stands in sharp contrast to the time series case where the asymptotic distribution of the CADF test depends on the value taken by $\rho_{v\epsilon i}$ (see Hansen 1995).

Remark 9. In the absence of covariates and serial correlation, the PCADF test is simply the usual pooled $t$-test for a unit root, for which there are results that are similar to Theorem 1 (see, e.g., Moon, Perron, and Phillips 2007, Lemma 4; Moon and Perron 2008, Theorem 4.2). Thus, even without covariates, by allowing for weakly dependent innovations, the present result represents an extension of those theories. However, this is not all. Indeed, while most existing research focuses on the first-order effect, Theorem 1 is based on an expansion that keeps terms that are of higher order in magnitude. It is therefore expected to produce more accurate predictions (see Westerlund and Larsson 2012, for a similar approach). Suppose, for example, that $\rho_{v\epsilon i} = 1$ and $\kappa = 1/2$, in which case the first term in the asymptotic distribution of $t_{PCADF} - \sqrt{N}\mu$ simplifies to $-(r_{11} + N^{-1/2} r_{12}) = -r_{11} + O(1/\sqrt{N})$ (as shown in more detail in Section 3.2). Most research focuses on the first-order term, here given by $r_{11}$, which only depends on $\mu_{1c}$, and does not consider the effect of $r_{12}$ capturing the dependence on $\mu_{2c}$ (see, e.g., Moon, Perron, and Phillips 2007). This latter effect can potentially be rather important, because in small samples there is typically a dependence also on higher moments. Indeed, as Moon and Perron (2008, p. 91) concluded from their simulations, “Despite our theoretical results, there is somewhat of a power loss against a heterogeneous alternative in finite samples.” Theorem 1 shows how the test depends on $\mu_{2c}$ and is therefore able to explain this type of behavior.

Remark 10. The PCADF test is based on OLS detrending. Another possibility is to follow Elliott et al. (1996) and use generalized least squares (GLS) detrending. A third possibility is to use recursive detrending, as is done in Westerlund (2014b).

3.2 Implications for Power

To appreciate fully the power implications of Theorem 1, it is instructive to consider the two polar cases of $\min_{i \in [1,N]} \rho_{v\epsilon i} \to 1$ (no covariate information) and $\max_{i \in [1,N]} \rho_{v\epsilon i} \to 0$ (maximum covariate information). For simplicity, we focus on the mean of the asymptotic distribution of $t_{PCADF} - \sqrt{N}\mu$ (although there is also a variance effect), as captured by

$$\sum_{j=1}^{2} N^{1/2 - j\kappa}\left((\rho_{-1v\epsilon} - \rho_{1v\epsilon}) r_{2j} - \rho_{1v\epsilon} r_{1j}\right) + N^{1/2 - 2\kappa}\rho_{1v\epsilon}\frac{r_{22}}{2}. \tag{7}$$

We begin with model 1 with unit-specific intercepts but no trends. On one end of the scale, if $\min_{i \in [1,N]} \rho_{v\epsilon i} \to 1$, then $(\rho_{-1v\epsilon} - \rho_{1v\epsilon}) = o(1)$ and $\rho_{1v\epsilon} \to 1$, which means that (7) reduces to $-\sum_{j=1}^{2} N^{1/2 - j\kappa} r_{1j} + N^{1/2 - 2\kappa} r_{22}/2 = -(N^{1/2 - \kappa} r_{11} + N^{1/2 - 2\kappa}(r_{12} - r_{22}/2))$. The leading term in this expression is given by $-N^{1/2 - \kappa} r_{11}$. If $\kappa > 1/2$, then $-N^{1/2 - \kappa} r_{11} = o(1)$, and therefore power is negligible, whereas if $\kappa < 1/2$, then $-N^{1/2 - \kappa} r_{11}$ diverges, and therefore power goes to one as $N \to \infty$. Only in the intermediate case when $\kappa = 1/2$, such that $-(N^{1/2 - \kappa} r_{11} + N^{1/2 - 2\kappa}(r_{12} - r_{22}/2)) = -(r_{11} + N^{-1/2}(r_{12} - r_{22}/2)) = -r_{11} + O(1/\sqrt{N})$, is power nonnegligible in the usual nonincreasing sense. This is in agreement with the results reported by, for example, Moon and Perron (2004) for their $t^{+}$ panel unit root test that does not employ any covariate information. As usual in the literature, Moon and Perron (2004) only considered the first-order term, $r_{11}$, which only depends on $\mu_{1c}$. Their results are therefore silent when it comes to the effect of higher order moments. Theorem 1 includes an additional second-order term, $-N^{-1/2}(r_{12} - r_{22}/2)$, which depends on $\mu_{2c}$ and is therefore more general in this regard. Note in particular that if $\mu_{1c} = 0$ and $\mu_{2c} > 0$ (positive and negative values of $c_i$ cancel out), such that $r_{11} = 0$ and $(r_{12} - r_{22}/2) \neq 0$, then $-(N^{1/2 - \kappa} r_{11} + N^{1/2 - 2\kappa}(r_{12} - r_{22}/2)) = -N^{1/2 - 2\kappa}(r_{12} - r_{22}/2)$. This means that while negligible for $\kappa = 1/2$, power is nonnegligible for $\kappa = 1/4$. Thus, in this case the results of Moon and Perron (2004) would lead us to believe that there is no power, when in fact there is, but just not within $1/\sqrt{N}T$-neighborhoods of the null.

Table 1. Coefficients of the asymptotic distribution of the PCADF statistic

Coefficient    Model 1    Model 2
$\alpha_0$     $-1/2$     $-1/2$
$\alpha_1$     $1/12$     $1/60$
$\alpha_2$     $1/45$     $11/6300$
$\beta_0$      $1/6$      $1/15$
$\beta_1$      $1/12$     $0$
$\beta_2$      $1/20$     $-1/420$

NOTE: Models 1 and 2 refer to the cases with heterogeneous intercepts but no trends, and heterogeneous intercepts and trends, respectively.

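At the other end of the scale, if $\max_{i \in [1,N]} \rho_{v\epsilon i} \to 0$, then $\rho_{1v\epsilon} \to 0$ and $(\rho_{-1v\epsilon} - \rho_{1v\epsilon}) = \rho_{-1v\epsilon} + o(1)$, where $\rho_{-1v\epsilon}$ is divergent. Moon, Perron, and Phillips (2007) derived the power envelope for the model without covariates and showed that it is defined for $\kappa = 1/2$. The PCADF test also has power in such neighborhoods. However, since in this case (with $\kappa = 1/2$) (7) reduces to $\sum_{j=1}^{2} N^{1/2 - j\kappa}(\rho_{-1v\epsilon} - \rho_{1v\epsilon}) r_{2j} = (\rho_{-1v\epsilon} - \rho_{1v\epsilon}) r_{21} + O(1/\sqrt{N}) = \rho_{-1v\epsilon} r_{21} + o(1)$, the power of this test approaches one as $\max_{i \in [1,N]} \rho_{v\epsilon i} \to 0$ for any $\mu_{1c} \neq 0$, such that $r_{21} = \mu_{1c}\sqrt{\beta_0} \neq 0$. It is therefore more powerful than the existing tests that ignore the covariates.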

Let us now consider model 2 with both unit-specific intercepts and trends, which is our main focus. If $\min_{i \in [1,N]} \rho_{v\epsilon i} \to 1$, then we again have that $(\rho_{-1v\epsilon} - \rho_{1v\epsilon}) = o(1)$, and therefore power is determined by $-(N^{1/2 - \kappa} r_{11} + N^{1/2 - 2\kappa}(r_{12} - r_{22}/2))$. However, since $\beta_1 = 0$ in this case (see Table 1), $r_{11} = r_{22} = 0$, which means that $-(N^{1/2 - \kappa} r_{11} + N^{1/2 - 2\kappa}(r_{12} - r_{22}/2)) = -N^{1/2 - 2\kappa} r_{12}$. This shows that power is negligible for $\kappa = 1/2$, which is a reflection of the incidental trend problem. However, while negligible for $\kappa = 1/2$, since $(r_{12} - r_{22}/2) \neq 0$, power is still nonnegligible for $\kappa = 1/4$, which is also the value of $\kappa$ that defines the power envelope for model 2 without covariates (Moon, Perron, and Phillips 2007). The fact that the PCADF test “only” has power within $1/N^{1/4}T$-neighborhoods of the null when $\min_{i \in [1,N]} \rho_{v\epsilon i} \to 1$ is therefore not totally unexpected.

The situation is, however, very different when $\min_{i \in [1,N]} \rho_{v\epsilon i} \to k \in (0, 1)$ (at least some covariate information). Indeed, since $r_{21} \neq 0$, this means that (7) can be written as $N^{1/2 - \kappa}(\rho_{-1v\epsilon} - \rho_{1v\epsilon}) r_{21} + O(N^{1/2 - 2\kappa})$, where $(\rho_{-1v\epsilon} - \rho_{1v\epsilon}) r_{21} \neq 0$, suggesting that power is no longer negligible for $\kappa = 1/2$. Thus, as in model 1, the use of the covariates has implications for power. The main difference is that the effect is now much stronger than before, with the covariates even having an effect on the value of $\kappa$ for which power is nonnegligible. Since the envelope without covariates in this case is defined for $\kappa = 1/4$, PCADF is again more powerful than existing tests. However, unlike the situation in model 1, this superiority does not require $\max_{i \in [1,N]} \rho_{v\epsilon i} \to 0$. In fact, all that is needed for power within $1/\sqrt{N}T$-neighborhoods of the null is that the fraction of cross-section units for which $\rho_{v\epsilon i} < 1$ is nonnegligible, such that $(\rho_{-1v\epsilon} - \rho_{1v\epsilon}) > 0$. Moreover, since power approaches one as $\max_{i \in [1,N]} \rho_{v\epsilon i} \to 0$ (see the above discussion for model 1), the power of PCADF with trends can be made arbitrarily close to the power of the same test without trends, meaning that the covariates should be able to compensate fully for the loss of power caused by the incidental trends.
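The polar cases discussed in this section can be checked numerically. The sketch below evaluates the mean shift in (7) from the Table 1 coefficients and the formulas for $r_{1j}$ and $r_{2j}$ in Theorem 1. The function names, the argument names, and the example moment values are illustrative assumptions of ours, not part of the article.

```python
import math

# Table 1 coefficients: model 1 (intercepts), model 2 (intercepts and trends).
TABLE1 = {
    1: dict(a0=-1/2, a1=1/12, a2=1/45, b0=1/6, b1=1/12, b2=1/20),
    2: dict(a0=-1/2, a1=1/60, a2=11/6300, b0=1/15, b1=0.0, b2=-1/420),
}

def r_terms(model, mu1c, mu2c):
    """Return r_11, r_12, r_21, r_22 of Theorem 1 keyed by (row, column)."""
    c = TABLE1[model]
    a0, b0, b1, b2 = c["a0"], c["b0"], c["b1"], c["b2"]
    return {
        (1, 1): mu1c * a0 * b1 / (2 * b0 ** 1.5),
        (1, 2): (mu2c * a0 * b2 / (2 * b0 ** 1.5)
                 - 3 * mu1c ** 2 * a0 * b1 ** 2 / (8 * b0 ** 2.5)),
        (2, 1): mu1c * b0 / math.sqrt(b0),   # r_2j = mu_jc * beta_{j-1} / sqrt(beta_0)
        (2, 2): mu2c * b1 / math.sqrt(b0),
    }

def mean_shift(model, n_units, kappa, mu1c, mu2c, rho_m1, rho_1):
    """Mean of t_PCADF - sqrt(N)*mu according to expression (7).

    rho_m1 and rho_1 are the cross-section averages of rho_{v eps i}^{-1}
    and rho_{v eps i}, respectively.
    """
    r = r_terms(model, mu1c, mu2c)
    total = 0.0
    for j in (1, 2):
        total += n_units ** (0.5 - j * kappa) * (
            (rho_m1 - rho_1) * r[(2, j)] - rho_1 * r[(1, j)])
    total += n_units ** (0.5 - 2 * kappa) * rho_1 * r[(2, 2)] / 2
    return total

# Model 2 with kappa = 1/2, E(c_i) = -1, E(c_i^2) = 1 (illustrative values).
no_cov = mean_shift(2, 10**8, 0.5, -1.0, 1.0, rho_m1=1.0, rho_1=1.0)
with_cov = mean_shift(2, 10**8, 0.5, -1.0, 1.0, rho_m1=2.0, rho_1=0.5)
```

With incidental trends and no covariate information the shift vanishes as $N$ grows when $\kappa = 1/2$, whereas with $\rho_{v\epsilon i} = 1/2$ for all units it stays bounded away from zero, in line with the argument above.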

Remark 11. The intuition for the increased power in the presence of covariates can be appreciated by looking at (1) and (2), which with $\theta_i = 0$ can be rewritten as $\phi_i(L)\Delta y_{i,t} = \rho_i y_{i,t-1} + v_{i,t}$. The corresponding model conditional on $x_{i,t}$ and assuming for simplicity that $\gamma_i = 0$ is given by $\phi_i(L)\Delta y_{i,t} = \rho_i y_{i,t-1} + \lambda_i(L)'x_{i,t} + \epsilon_{i,t}$. The variance of $\epsilon_{i,t}$ is given by $\sigma_{\epsilon i}^2 = \sigma_{vi}^2 - \lambda_i(1)'\Phi_i(1)^{-1}\Sigma_{\varepsilon i}(\Phi_i(1)^{-1})'\lambda_i(1) \leq \sigma_{vi}^2$ (see Section 2), suggesting that the OLS estimator of the parameters of the conditional model will be more precise, leading to a more powerful test statistic. Of course, as a referee of this journal correctly points out, while indicative of higher power, this does not imply in any way the above-mentioned effect on $\kappa$ when $\min_{i \in [1,N]} \rho_{v\epsilon i} \to k \in (0, 1)$.

4. ISSUES OF IMPLEMENTATION

4.1 Mean and Variance Correction Factors

As pointed out in Remark 8, for standard normal inference (under the null), we use $(t_{PCADF} - \sqrt{N}\mu)/\sigma$. However, this test statistic is not really feasible, as $\mu$ and $\sigma^2$ depend on $\rho_{kv\epsilon}$. In applications, this quantity therefore has to be replaced by an estimator. A natural consistent candidate is given by $\hat\rho_{kv\epsilon} = \sum_{i=1}^{N} \hat\rho_{v\epsilon i}^{k}/N$, where $\hat\rho_{v\epsilon i} = \hat\sigma_{\epsilon i}/\hat\sigma_{vi}$. (The sample correlation coefficient between $\hat v_{i,t}$ and $\hat\epsilon_{i,t}$ can also be used to estimate $\rho_{v\epsilon i}$.)
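The feasible correction can be sketched as follows, using the expressions for $\mu$ and $\sigma^2$ in Theorem 1 and the coefficients in Table 1. The function names are hypothetical, and the unit-level estimates $\hat\rho_{v\epsilon i}$ are assumed to have been computed beforehand from the OLS fit of (6).

```python
import math

# Table 1 coefficients: model 1 (intercepts), model 2 (intercepts and trends).
TABLE1 = {
    1: dict(a0=-1/2, a1=1/12, a2=1/45, b0=1/6),
    2: dict(a0=-1/2, a1=1/60, a2=11/6300, b0=1/15),
}

def average_power(rho_hats, k):
    """Cross-section average of rho_hat_i^k, the estimator of rho_{k v eps}."""
    return sum(r ** k for r in rho_hats) / len(rho_hats)

def correction_factors(rho_hats, model):
    """Return (mu, sigma^2) evaluated at the estimated rho averages."""
    c = TABLE1[model]
    rho1 = average_power(rho_hats, 1)
    rho2 = average_power(rho_hats, 2)
    mu = c["a0"] * rho1 / math.sqrt(c["b0"])
    sigma2 = (1.0 - rho2 + c["a1"] * rho2 / c["b0"]
              + rho1 ** 2 * c["a0"] ** 2 * c["a2"] / (4.0 * c["b0"] ** 3))
    return mu, sigma2

def standardize(t_pcadf, rho_hats, model):
    """(t_PCADF - sqrt(N) * mu) / sigma, with N = number of units."""
    mu, sigma2 = correction_factors(rho_hats, model)
    return (t_pcadf - math.sqrt(len(rho_hats)) * mu) / math.sqrt(sigma2)

# Without covariate information (rho_hat_i = 1 for all i) in model 1:
mu_1, sigma2_1 = correction_factors([1.0] * 10, model=1)
```

In the no-covariate model 1 case this evaluates to $\mu = -\sqrt{6}/2$ and $\sigma^2 = 0.8$, which follows from substituting the Table 1 values into the two formulas.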

Remark 12. The PCADF test can be applied regardless of whether there are any covariates available. Without covariates, the test is similar in spirit to the one of Levin, Lin, and Chu (2002). The main difference lies in the definition of $\mu$. In this article, $\mu$ is asymptotic, whereas in Levin, Lin, and Chu (2002) it is estimated using kernel methods, which not only complicates the computation of the test statistic, but can also lead to poor small-sample performance (Westerlund and Breitung 2013). The PCADF test is therefore expected to be more robust in this regard.

4.2 Critical Region

Whether the test should be one- or two-sided depends on what one is willing to assume regarding the DGP. If $\mu_{1c}$ is driving power (see Sections 3.1 and 3.2) and the null is tested against the one-sided (locally stationary) alternative that $c_i < 0$, then the left-tail standard normal critical values are enough. As already mentioned, most research only considers $\mu_{1c}$. It is therefore standard to focus on left-tailed tests. The problem is if $\mu_{2c}$ is driving power (as would be the case if positive and negative values of $c_i$ cancel out; see Section 3.2), which calls for the use of a two-sided test. Thus, if the researcher has little or no feeling for the integration properties of his/her data, it is probably safest to use a two-sided test (although this is expected to lead to a loss of power when compared with the case when the alternative is known to be one-sided). (Needless to say, the choice of alternative matters for interpretation of the test outcome. If the alternative is formulated as $c_i < 0$, then a rejection should be taken as evidence in favor of stationarity, whereas if the alternative is formulated as $c_i \neq 0$, then a rejection should be interpreted more broadly as providing evidence against the unit root null.)

4.3 Cross-Section Dependence

As mentioned in Remark 1, one way to accommodate cross-section dependence in the current DGP is to assume that some of the elements in $x_{i,t}$ are common across $i$, which, if known, will not affect the results presented so far. If there are common covariates that are unobserved, one possibility is to follow Bai and Ng (2004, 2010), and use estimated principal component factors in their stead. To formalize the ideas, suppose that the DGP is again given by (2)–(4) but that $y_{i,t}$ in (1) has the following factor structure:

$$y_{i,t} = \theta_i' d_t + \Lambda_i' F_t + u_{i,t}, \tag{8}$$

where $F_t$ is an $r$-dimensional vector of common factors (or unobserved covariates) with $\Lambda_i$ being the associated vector of factor loadings, and $d_t$ and $u_{i,t}$ are as before.

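Assumption 4.

(i) $\Lambda_i$ is nonrandom such that $\|\Lambda_i\| < \infty$ and $\sum_{i=1}^{N} \Lambda_i\Lambda_i'/N \to \Sigma_{\Lambda} > 0$ as $N \to \infty$;

(ii) $\Delta F_t = \Psi(L) g_t$, where $g_t$ is iid with $E(g_t) = 0$, $E(g_t g_t') = \Sigma_g > 0$, $E(\|g_t\|^4) < \infty$, $\Psi(L) = \sum_{n=0}^{\infty} \Psi_n L^n$, $E[\Delta F_t(\Delta F_t)'] = \sum_{n=0}^{\infty} \Psi_n \Sigma_g \Psi_n' > 0$, $\sum_{n=0}^{\infty} n\|\Psi_n\| < \infty$, and $\Psi(1)$ has rank $r^* \in [0, r]$;

(iii) $u_{i,t}$ and $g_t$ are mutually independent.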

Assumption 4 is the same as in Bai and Ng (2004, 2010), and we therefore refer to these articles for a discussion. An important feature of the above DGP is that $F_t$ and $u_{i,t}$ can have different orders of integration (as $r^*$, the rank of the long-run covariance matrix of $\Delta F_t$, is not required to be full, but can take on any value in $[0, r]$). In this section, however, we focus on testing $u_{i,t}$; see Bai and Ng (2004) for a detailed treatment of the testing of $F_t$. The basic idea is exactly the same as in Bai and Ng (2004, 2010), that is, we begin by estimating and subtracting from $y_{i,t}$ an estimate of $\Lambda_i' F_t$. Since the resulting “defactored” $y_{i,t}$ is consistent for $u_{i,t}$, it can be subjected to any existing panel unit root test. While Bai and Ng (2004) considered one of the combination of $p$-value type statistics of Choi (2001), Bai and Ng (2010) considered versions of the pooled $t$-tests of Moon and Perron (2004). In this section, we apply $t_{PCADF}$.

Consider model 1. Under the above conditions,

$$\Delta y_{i,t} = \Lambda_i' \Delta F_t + \Delta u_{i,t}, \tag{9}$$

which is just a static common factor model for $\Delta y_{i,t}$. However, unlike the common factor model for $y_{i,t}$, in the above model both the common and idiosyncratic components are stationary. Applying the principal components method to this model yields estimates $\hat\Lambda_i$ and $\Delta\hat F_t$ of (the space spanned by) $\Lambda_i$ and $\Delta F_t$, respectively. The defactored version of $y_{i,t}$ is simply the accumulated sum of $\Delta\hat u_{i,n} = \Delta y_{i,n} - \hat\Lambda_i' \Delta\hat F_n$; $\hat u_{i,t} = \sum_{n=2}^{t} \Delta\hat u_{i,n}$ for $t = 2, \ldots, T$ and $\hat u_{i,1} = 0$. The resulting PCADF test statistic, denoted $t^{*}_{PCADF}$, is just as before but with $y_{i,t}$ replaced by $\hat u_{i,t}$. In model 2, $\Delta y_{i,t}$ has a nonzero mean (given by $\theta_{2i}$). In this case, we therefore demean $\Delta y_{i,t}$ prior to application of principal components.
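The defactoring step described above can be sketched with a singular value decomposition, since the principal components estimate of the common component of the differenced data is its best rank-$r$ approximation. This is a minimal illustration that assumes a balanced panel and a known number of factors; the function name `defactor` is hypothetical.

```python
import numpy as np

def defactor(y, n_factors, trend=False):
    """Remove estimated common factors from an (N, T) panel y.

    Steps: difference the data, demean the differences if trend=True
    (model 2), subtract the rank-`n_factors` principal components fit,
    and cumulate the residuals with u_hat[:, 0] = 0.
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y, axis=1).T                 # (T-1, N), rows are Delta y_t
    if trend:
        dy = dy - dy.mean(axis=0)             # remove the mean of Delta y_{i,t}
    u, s, vt = np.linalg.svd(dy, full_matrices=False)
    common = (u[:, :n_factors] * s[:n_factors]) @ vt[:n_factors]
    du_hat = dy - common                      # Delta u_hat_{i,t}
    u_hat = np.zeros_like(y)
    u_hat[:, 1:] = np.cumsum(du_hat, axis=0).T
    return u_hat

# Pure one-factor panel: y_{i,t} = Lambda_i * F_t with F_t a random walk.
rng = np.random.default_rng(0)
factor = np.cumsum(rng.standard_normal(60))
loadings = rng.standard_normal(8)
u_hat = defactor(np.outer(loadings, factor), n_factors=1)
```

In this example there is no idiosyncratic component, so the defactored series is numerically zero. With real data the output would be passed to the PCADF statistic in place of $y_{i,t}$.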

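Proposition 1. Under Assumptions 1–4, as $N, T \to \infty$ with $N/T \to 0$ and $\kappa > 1/6$,

$$t^{*}_{PCADF} - \sqrt{N}\mu \sim \sum_{j=1}^{2} N^{1/2 - j\kappa}\left((\rho_{-1v\epsilon} - \rho_{1v\epsilon}) r_{2j} - \rho_{1v\epsilon} r_{1j}\right) + N^{1/2 - 2\kappa}\rho_{1v\epsilon}\frac{r_{22}}{2} + N(0, \sigma^2).$$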

According to Proposition 1, the PCADF statistic based on the defactored data has the same asymptotic distribution as the original test statistic in the case without common factors. In other words, the defactoring has no effect on the local power of the test for a unit root inui,t.

4.4 Selecting the Covariates

Of course, one may argue that in applications the above results are somewhat “idealized,” in the sense that the covariates in $x_{i,t}$ might be difficult to find. However, we argue that this criticism need not be too much of a problem. There are a number of reasons for this.

• Most variables in economics and finance are correlated, a finding with ample theoretical support. Indeed, as Pesaran, Smith, and Yamagata (2013) argued, in these fields it is actually difficult to find variables that are uncorrelated. For example, in testing for unit roots in a panel of real outputs, one would expect the shocks to output to also manifest themselves in employment, consumption, and investment. In the case of testing for unit roots in inflation, one would expect the shocks to inflation to also affect short-term and long-term interest rates. Hence, given the availability of panel data, candidate covariates should be relatively easy to find. Also, as pointed out in Section 1, typically the unit root testing is a part of the analysis of multiple variables, in which case relevant covariate candidates are particularly easy to find.

• Pretesting for covariate relevance is very simple. Indeed, because of the differing orders of magnitude of the associated variables, the OLS estimators of $\rho_i$ and $\Pi_i$ in (6) are asymptotically uncorrelated, suggesting that we do not lose generality by considering a separate hypothesis test for $\Pi_i$. The OLS estimator $\hat\Pi_i$ of $\Pi_i$ is asymptotically normal (a formal proof is available upon request), suggesting that the testing can be carried out in the usual manner using, for example, a Wald test.

• As already pointed out (see Remark 6), the number of included covariates for each unit can differ. Hence, since the only thing that matters for power is the information content of the average covariate, as measured by $\rho_{-1v\epsilon}$ and $\rho_{1v\epsilon}$ (see Theorem 1), there can even be units where $x_{i,t} = \{\emptyset\}$. Needless to say, this flexibility is a great advantage in practice, as it allows one to selectively pick those covariates for each unit that are most relevant/readily available.

• As long as the lag augmentation order is larger than $p$ (the true order), there is no need to pinpoint $p$. Indeed, if $\lambda_i(L) = 0$, such that the covariates are absent, the asymptotic distribution of the (unrestricted) PCADF statistic is still the same as in Theorem 1. This means that the asymptotic “price” of including redundant lags is zero, a result that is verified by our simulations (see Section 5). Similarly, the price of including redundant covariates (contemporaneously and/or in lagged form) is also zero. Erroneous omission of covariates is, on the other hand, more problematic, as in this case Theorem 1 need not hold. However, even in situations such as this there is still “hope,” in that $\lambda_i(L)$ does not have to be zero; if $\lambda_i(L) = \lambda_{0i}$, such that $x_{i,t}$ only enters the equation for $v_{i,t}$ contemporaneously, then Theorem 1 continues to hold even if $x_{i,t}$ is omitted (a proof is available upon request). As a rule, though, all covariates that are significant should be included in the testing. Then there is also the fact that if one would like to entertain the possibility of (omitted) covariates, one is likely to be better off using the covariates available rather than no covariates at all (as when using a conventional univariate panel data test).
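The covariate pretest mentioned in the second bullet can be illustrated with a minimal sketch. It assumes a deliberately simplified regression of a response on an intercept and a single covariate, not the full regression in (6); the function name `wald_single_covariate` and the toy data are hypothetical and serve only to show the mechanics of a Wald test of relevance.

```python
from statistics import NormalDist


def wald_single_covariate(y, x):
    """Wald test of H0: gamma = 0 in y = alpha + gamma * x + e (OLS).

    Simplifying assumption: one covariate and homoskedastic errors.
    Returns (gamma_hat, wald_statistic, p_value).
    """
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    gamma_hat = s_xy / s_xx
    alpha_hat = y_bar - gamma_hat * x_bar
    rss = sum((yi - alpha_hat - gamma_hat * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = rss / (n - 2)            # residual variance estimate
    var_gamma = sigma2_hat / s_xx         # estimated variance of gamma_hat
    wald = gamma_hat ** 2 / var_gamma     # chi-square(1) under H0
    # For one restriction: P(chi2_1 > w) = 2 * (1 - Phi(sqrt(w)))
    p_value = 2.0 * (1.0 - NormalDist().cdf(wald ** 0.5))
    return gamma_hat, wald, p_value


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0, 4.0]
    y = [0.1, 0.9, 2.1, 2.9, 4.1]
    gamma_hat, wald, p_value = wald_single_covariate(y, x)
    print(f"gamma_hat={gamma_hat:.3f}, Wald={wald:.1f}, p={p_value:.4f}")
```

A covariate would be retained when the p-value falls below the chosen level. With several covariates or lags, the same idea applies with a quadratic form in the estimated coefficient vector.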

Remark 13. In practice, the elements in $x_{i,t}$ need not be stationary. In such cases, we recommend first-differencing all unit root covariates. (Most of the assumptions placed on the covariates can be relaxed as long as the number of units for which the assumptions fail remains fixed as $N \to \infty$. We can, for example, permit unit root covariates, provided that the number of unit root units is fixed. The intuition is simple; if the number is fixed, then the fraction of units with a violation goes to zero as $N \to \infty$, and therefore their impact on the test is going to be negligible. In practice this means that the number of units with a violation should be “small” relative to $N$.) The results reported in Theorem 1 are unaffected by this. Thus, just as in cointegration analysis, unless the order of integration of $x_{i,t}$ is known, it should be pretested for unit roots. The problem is if the order of integration of $x_{i,t}$ is misspecified. Hansen (1995) showed that while erroneous inclusion of unit root covariates invalidates the test, over-differencing only results in mild power losses. He therefore recommended taking first differences not only of all unit root covariates but also of all near unit root covariates, and so do we. (Some simulation results for the case of under/over-differenced covariates are available upon request.)

5. SIMULATIONS

In this section, we investigate the small-sample properties of the (nondefactored) PCADF test through a small simulation study using (1)–(5) as DGP. For simplicity, we assume that $m = 1$, $\theta_i = 0$, $\gamma_i = 0$, $(\epsilon_{i,t}, \varepsilon_{i,t}) \sim N(0, I_2)$, $u_{i,0} = x_{i,0} = 0$, $\kappa = 1/2$, and $c_i \sim U(a, b)$. Note in particular how the mean and variance of $c_i$ can be written in terms of $a$ and $b$ as $\mu_{1c} = (a+b)/2$ and $\mu_{2c} - \mu_{1c}^2 = (b-a)^2/12$, respectively. For our theory to provide an accurate description of actual test behavior, the values of $a$ and $b$ cannot be “too large,” as this will tend to activate the $O_p(\mu_{1c}/\sqrt{N})$ and $O_p(\mu_{3c}/N)$ remainders in the asymptotic distribution reported in Theorem 1. Similarly, in order not to activate the $O_p(\sqrt{N}/T)$ remainder, in the simulations we set $T \gg N$. The presence of serial correlation did not have any major effects on the results, and we therefore also set φi(L) = i(L) = 1 and $\lambda_i(L) = \lambda_0$. (For example, with $\phi_i(L) = 1 - \phi_1 L$ and $\phi_1 = 0.5$ the results for the PCADF test were basically indistinguishable from the ones already in the article. Another advantage of focusing on the results for the case without serial correlation is that it enables comparison with the power envelope and point optimal test of Moon, Perron, and Phillips (2007).) Thus, in this DGP, $\sigma^2_{\epsilon i} = 1$ and $\sigma^2_{vi} = \lambda_0^2 + 1$, and hence $\rho^2_{v\epsilon i} = \rho^2_{v\epsilon} = 1/(\lambda_0^2 + 1)$.

Table 2. Size when $\rho_{v\epsilon} = 1$

 N     T     t+    V̂_NT   V_NT   t_PCADF

Model 1
10    100    6.4    2.0    4.5     8.6
10    200    5.9    1.7    4.9     7.1
10    400    5.4    2.3    4.6     6.2
20    100    5.5    2.6    5.1     7.2
20    200    5.4    3.5    5.7     6.4
20    400    6.5    3.2    5.2     6.9
40    100    5.7    4.3    5.0     9.1
40    200    5.6    3.8    5.0     6.7
40    400    5.4    3.2    5.2     6.3

Model 2
10    100    5.2    2.5    4.5     9.5
10    200    5.1    2.7    4.9     7.7
10    400    5.3    3.5    4.6     6.9
20    100    4.2    2.9    5.1    10.6
20    200    5.3    4.3    5.7     8.2
20    400    5.7    4.8    5.2     7.1
40    100    4.1    2.7    5.0    13.1
40    200    4.7    4.7    5.0     8.1
40    400    4.7    4.7    5.2     6.5

NOTES: $t^+$ and $\hat{V}_{NT}$ refer to the tests of Moon and Perron (2008), and Moon, Perron, and Phillips (2007), respectively, $V_{NT}$ refers to the power envelope for the model without covariates, and $\rho_{v\epsilon}$ refers to the (homogenous) correlation between $v_{i,t}$ and $\epsilon_{i,t}$. See Table 1 for an explanation of models 1 and 2.
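The design quantities of this section (the moments of $c_i$ and the link between $\lambda_0$ and $\rho_{v\epsilon}$) can be checked numerically. The sketch below is illustrative only: it assumes, consistently with the stated variances but not taken from equations (1)–(5), that $v_{i,t} = \lambda_0 x_{i,t} + \epsilon_{i,t}$ with $x_{i,t}$ and $\epsilon_{i,t}$ independent standard normals. All function names are hypothetical.

```python
import math
import random


def uniform_moments(a, b):
    """Mean and variance of c_i ~ U(a, b): (a+b)/2 and (b-a)^2/12."""
    return (a + b) / 2.0, (b - a) ** 2 / 12.0


def lambda0_from_rho(rho):
    """Invert rho^2 = 1 / (lambda0^2 + 1) for lambda0 >= 0."""
    return math.sqrt(1.0 / rho ** 2 - 1.0)


def simulated_corr(lam0, n=100_000, seed=0):
    """Sample correlation between v = lam0 * x + eps and eps.

    Assumption: x and eps are independent N(0, 1) draws.
    """
    rng = random.Random(seed)
    sv = se = svv = see = sve = 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        x = rng.gauss(0.0, 1.0)
        v = lam0 * x + eps
        sv += v
        se += eps
        svv += v * v
        see += eps * eps
        sve += v * eps
    cov = sve / n - (sv / n) * (se / n)
    var_v = svv / n - (sv / n) ** 2
    var_e = see / n - (se / n) ** 2
    return cov / math.sqrt(var_v * var_e)


if __name__ == "__main__":
    print(uniform_moments(-4.0, 0.0))   # mean -2, variance 4/3
    print(uniform_moments(-2.0, -2.0))  # mean -2, variance 0
    for rho in (0.7, 0.3):
        lam0 = lambda0_from_rho(rho)
        print(rho, round(lam0, 4), round(simulated_corr(lam0), 3))
```

The two calls to `uniform_moments` correspond to the two designs for $(a, b)$ used in the power tables: the mean of $c_i$ is the same, while the variance is 4/3 in one case and zero in the other.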

The PCADF test is constructed as left-tailed with $\hat{\rho}_{kv\epsilon}$ in place of $\rho_{mv\epsilon}$ when computing $\mu$ and $\sigma^2$ (see Section 4). The power envelope for the model without covariates, denoted $V_{NT}$, the $\hat{V}_{NT}$ common point-optimal test of Moon, Perron, and Phillips (2007), and the $t^+$ test of Moon and Perron (2008) are also simulated. ($V_{NT}$ is based on setting $c_i$ in the test to −0.5. Moon, Perron, and Phillips (2007) also considered (in our notation) $c_i = -1$ and $c_i = -2$; however, in our simulations $c_i = -5$ generally led to the best performance.) All tests are carried out at the 5% level, and the number of replications is set to 3000. All powers are adjusted for size.

The size results for the case when $\rho_{v\epsilon} = 1$ are reported in Table 2. (As alluded to in Remark 8, the asymptotic distribution of the PCADF test is invariant with respect to $\rho_{v\epsilon}$, a result that is supported by our (unreported) simulation results. Hence, since under the null the value of $\rho_{v\epsilon}$ is irrelevant, in Table 2 we focus on the case when $\rho_{v\epsilon} = 1$.) If asymptotic theory is a reliable guide to the small-sample behavior of the tests, all sizes should be close to 5%. In agreement with this, we see that while the PCADF test is generally oversized, as expected, its distortions tend to diminish with increases in $T$. Conversely, the distortions increase with decreases in $T$; therefore, the distortions for $T < 100$ are generally larger than those reported in Table 2. Another observation is that, while $t^+$ and PCADF are oversized, $\hat{V}_{NT}$ is undersized. However, the distortions are generally not so large that they cannot be attributed to simulation uncertainty. Indeed, with 3000 replications the 95% confidence interval for the size of the 5% level tests studied here (in %) is [4.2, 5.8].
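The simulation-uncertainty band quoted above follows from the usual normal approximation to a binomial proportion. The following minimal sketch reproduces it; the function name `size_confidence_interval` is hypothetical, and the normal approximation is an assumption about how the interval was obtained.

```python
import math
from statistics import NormalDist


def size_confidence_interval(nominal=0.05, replications=3000, coverage=0.95):
    """Normal-approximation interval for an empirical rejection frequency.

    Under correct size, the rejection frequency over `replications` draws
    has standard error sqrt(nominal * (1 - nominal) / replications).
    """
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    se = math.sqrt(nominal * (1.0 - nominal) / replications)
    return nominal - z * se, nominal + z * se


if __name__ == "__main__":
    lower, upper = size_confidence_interval()
    print(f"[{100 * lower:.1f}, {100 * upper:.1f}]")  # prints "[4.2, 5.8]"
```

Empirical sizes in Table 2 that fall outside this band cannot be explained by simulation noise alone.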

The power results reported in Tables 3 and 4 can be summarized as follows:

• Power is usually above what is predicted by asymptotic theory, as obtained by simulating the asymptotic distribution given in Theorem 1 with all parameters set to their values in the DGP, especially for $\rho_{v\epsilon}$ close to one. However, the discrepancy diminishes with increases in $N$ and $T$.

• The asymptotic distributions of $\hat{V}_{NT}$, $t^+$, and $t_{PCADF}$ in model 1 when $a = b$ and $\rho_{v\epsilon} = 1$ are given by $\mu_{1c}/\sqrt{2} + N(0,1)$, $3\sqrt{5}\mu_{1c}/3\sqrt{51} + N(0,1)$, and $\sqrt{30}\mu_{1c}/16 + N(0,1)$, respectively. (The asymptotic distributions of $t^+$ and $\hat{V}_{NT}$ are given in Moon, Perron, and Phillips (2007, sec. 4.1).) Consistent with this we see that $\hat{V}_{NT}$ is generally most powerful, at least among the larger values of $N$, followed by $t^+$ and then $t_{PCADF}$. However, we also see that there is a large range of empirically relevant values for $N$ and $T$ where the difference in power is not that large. The PCADF test therefore performs well even when the covariates are irrelevant.

• As expected, the power of the PCADF test for the case when $\rho_{v\epsilon} = 1$ is mainly driven by $\mu_{1c}$. However, there is also a second-order effect working through the variance of $c_i$. In particular, both the empirical and theoretical power seem to be decreasing in $|a - b|$. This is illustrated in Table 3, which reports power for $a = -4$ and $b = 0$, and $a = b = -2$. Thus, while $\mu_{1c}$ is the same in the two cases, in the former the variance is larger (4/3 as compared to zero).

• Since $\kappa = 1/2$ in the simulations, when $\rho_{v\epsilon} = 1$ in model 2 none of the tests considered, including PCADF, should have any power beyond size, and this is also what we see in Table 3.

• As $\rho_{v\epsilon}$ is reduced, the relative power of the PCADF test increases. This is seen in Table 4. Take, for example, the case when $\rho_{v\epsilon} = 0.3$ in model 1, in which the power of the PCADF test is almost two times as large as the power envelope for the model without covariates, and it is almost four times as large as the power of $t^+$ and $\hat{V}_{NT}$.

• As expected, the difference in power when $\rho_{v\epsilon} < 1$ is larger in model 2 than in model 1, with PCADF being the only test with power beyond size. In fact, according to the results reported in Table 4, in this case the power of the PCADF test is no less than 10 times as large as that of $t^+$ and $\hat{V}_{NT}$. Of course, since the power of the two latter tests is negligible, while the power of the former is not, the difference in power will increase as $a$ and $b$ get further away from zero.

Table 3. Size-adjusted power when $\rho_{v\epsilon} = 1$

                    a = −4, b = 0                            a = b = −2
 N     T     t+    V̂_NT   V_NT   t_PCADF  Theory     t+    V̂_NT   V_NT   t_PCADF  Theory

Model 1
10    100    9.7   14.7   52.8     7.6     8.6       9.9   15.6   43.8     7.6    10.2
10    200   11.2   17.7   50.7     9.0     7.0      12.1   19.9   41.7     9.3     8.2
10    400   13.0   19.3   51.1     8.9     7.9      13.2   22.1   41.7     9.0     9.5
20    100   12.2   18.8   48.3    10.5     8.5      13.0   21.0   39.8    10.2     9.5
20    200   13.3   19.7   48.1    10.7     7.9      14.9   23.0   39.7    11.3     9.1
20    400   13.9   26.2   47.6    10.6     8.9      15.0   28.2   39.2    11.7    10.1
40    100   13.7   21.8   49.2    11.1     9.5      14.8   23.9   40.9    12.0     9.9
40    200   15.9   26.1   50.6    11.4     9.6      16.9   28.8   42.1    12.1    10.3
40    400   15.8   28.6   49.9    11.0    10.0      16.8   31.3   41.3    11.7    11.1

Model 2
10    100    5.5    5.9    5.0     5.5     5.6       5.3    5.6    5.0     5.2     5.5
10    200    5.9    6.2    5.0     5.3     5.3       5.8    5.7    5.0     5.2     5.2
10    400    5.5    5.8    5.0     5.5     5.6       5.5    5.5    5.0     5.3     5.4
20    100    5.4    5.6    5.0     4.8     5.4       5.4    5.5    5.0     5.0     5.3
20    200    5.2    5.7    5.0     5.5     5.4       5.3    5.6    5.0     5.5     5.3
20    400    5.1    5.6    5.0     5.5     5.3       5.1    5.5    5.0     5.4     5.2
40    100    5.3    5.4    5.0     5.3     5.2       5.2    5.1    5.0     5.4     5.2
40    200    5.3    5.5    5.0     5.4     5.3       5.1    5.3    5.0     5.3     5.2
40    400    5.6    5.9    5.0     5.6     5.3       5.5    5.9    5.0     5.6     5.3

NOTES: $a$ and $b$ are such that $\rho_i = c_i/(\sqrt{N}T)$, where $c_i \sim U(a, b)$. “Theory” refers to the theoretical power of the PCADF test (see Theorem 1). See Tables 1 and 2 for an explanation of the rest.
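The normal limits quoted in the bullets above translate into local power through a one-line calculation. As a rough sketch, the helper below (a hypothetical name, `local_power`) computes the rejection probability of a left-tailed test whose statistic is distributed as shift + N(0,1). It is evaluated only for the $\mu_{1c}/\sqrt{2}$ shift stated for $\hat{V}_{NT}$ with $a = b = -2$; other shift coefficients can be plugged in the same way. The “Theory” columns in the tables are obtained by simulating the full distribution of Theorem 1 and need not coincide with such first-order values.

```python
import math
from statistics import NormalDist


def local_power(shift, level=0.05):
    """Power of a left-tailed test when the statistic is shift + N(0, 1).

    The test rejects when the statistic falls below the `level` quantile of
    N(0, 1), so power is Phi(critical_value - shift).
    """
    standard_normal = NormalDist()
    critical_value = standard_normal.inv_cdf(level)
    return standard_normal.cdf(critical_value - shift)


if __name__ == "__main__":
    mu_1c = -2.0  # mean of c_i when a = b = -2
    power = local_power(mu_1c / math.sqrt(2.0))
    print(f"first-order power with shift mu_1c/sqrt(2): {100 * power:.1f}%")
    print(f"power under the null (shift 0): {100 * local_power(0.0):.1f}%")
```

For $\mu_{1c} = -2$ the first line evaluates to roughly 41%, which is in the range of the $V_{NT}$ column for $a = b = -2$ in Table 3.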

The results for the PCADF test based on defactored data are not reported but we briefly describe them. First, as expected, the size results are very close to those reported in Table 2. This is true regardless of how i and Ft are generated. Second, power is very close to the theoretical prediction obtained by simulating the asymptotic distribution given in Proposition 1. Hence, power is unaffected by the defactoring.

Table 4. Size-adjusted power when $a = b = -2$

                    ρ_vǫ = 0.7                               ρ_vǫ = 0.3
 N     T     t+    V̂_NT   V_NT   t_PCADF  Theory     t+    V̂_NT   V_NT   t_PCADF  Theory

Model 1
10    100   12.1   16.6   43.8    19.1    16.7      11.5   14.6   43.8    74.9    58.3
10    200   11.8   18.4   41.7    19.6    13.9      12.0   17.8   41.7    74.5    55.5
10    400   12.9   21.9   41.7    17.9    15.3      13.4   22.7   41.7    74.8    56.2
20    100   12.1   19.5   39.8    19.4    15.9      12.2   19.5   39.8    78.0    62.5
20    200   14.9   24.4   39.7    21.3    15.7      13.9   23.2   39.7    77.8    63.2
20    400   16.0   26.5   39.2    22.4    16.3      16.7   24.5   39.2    78.4    62.3
40    100   14.2   23.1   40.9    19.2    17.6      15.6   22.5   40.9    79.2    69.8
40    200   15.1   25.8   42.1    24.1    17.7      15.9   29.9   42.1    79.9    70.6
40    400   19.0   35.5   41.3    22.7    18.6      17.2   29.5   41.3    81.0    69.8

Model 2
10    100    5.4    6.1    5.0    11.7    11.1       5.9    5.8    5.0    45.3    48.8
10    200    5.4    5.5    5.0    11.0     8.9       5.5    6.0    5.0    45.5    46.5
10    400    5.6    5.6    5.0    10.5    10.2       5.8    5.0    5.0    47.6    46.1
20    100    5.2    5.5    5.0    11.0     9.4       5.4    5.5    5.0    45.7    44.0
20    200    5.7    5.3    5.0    11.3     9.0       5.4    5.2    5.0    48.8    44.1
20    400    5.5    5.4    5.0    10.9     9.9       5.3    5.7    5.0    47.0    43.7
40    100    5.5    5.5    5.0    10.5     9.3       5.1    5.2    5.0    44.1    45.1
40    200    4.9    5.7    5.0    10.2     9.6       5.2    5.2    5.0    46.8    46.6
40    400    5.5    5.4    5.0    12.2     9.9       5.3    5.6    5.0    48.8    45.0

NOTE: See Tables 1–3 for an explanation.


Overall, the simulation results suggest that our asymptotic theory provides a useful guide to the small-sample performance of the PCADF test. They also suggest that the PCADF test can lead to substantial power gains when compared to existing tests, especially in model 2. In fact, since the inclusion of irrelevant covariates seems to cause only minor reductions in power, the use of PCADF seems to come at little or no cost.

6. CONCLUDING REMARKS

The power increasing potential of covariate augmentation in the panel setting is interesting not only by itself but also because of the implications it has for theoretical and applied work. For example, the fact that in the presence of incidental trends the use of covariates has an order effect on the shrinking neighborhoods around unity for which asymptotic power is nonnegligible is expected to have implications for the rate of consistency for estimation of autoregressive roots near unity (see Moon and Phillips 2004, and the references provided therein). This possibility is currently being explored in a separate work. From an applied point of view, the minor additional complication of having to estimate $\rho_{v\epsilon i}$ has a major benefit in that the precision of the test is expected to be drastically improved. This is especially true in the case of incidental trends, in which some authors have even gone as far as to recommend not using some of their univariate panel data tests (see, e.g., Moon and Perron 2004, 2008). Thus, given the availability of data on potential covariates, and the fact that their information content does not even have to be particularly high, the PCADF test developed here should be a valuable addition to the already existing menu of panel unit root tests.

APPENDIX: PROOFS

Proof of Theorem 1. We begin by considering model 1. By the Beveridge–Nelson decomposition of λi(L), λi(L)=λi(1)+

where the first remainder in the first equality is due to the approximation $p(p-1)x^2 = (px)^2 + O(px^2)$. The second equality holds, because

O_p(c2

3κ_{). The above result implies}

1

Consider $B_{i,T}$. By direct calculation using the definition of the Rz operator,

Similarly, from the definitions of $\hat{\sigma}^2_{yi}$ and $\hat{\sigma}^2_{\epsilon i}$, and the consistency

T). (The details of these calculations are available upon request.) By using this, Taylor expansion of the inverse of $\hat{\sigma}^2_{yi}$, and then


and the $O_p(c_i^3/N^{3\kappa})$ remainder includes all other cross-products. Similarly, since

we obtain, by the same arguments used for expanding $B_{i,T}$,

Ai,T =

together with another Taylor expansion of the inverse square root of BT, implies

The values of $\beta_j$ can be obtained by direct calculation. $\beta_0$ is particularly simple and is given by $\beta_0 = 1/6$ (see Levin et al. 2002). Consider

the first result reduces to $r^2/2$, whereas in the second result we can simply interchange $s$ and $t$, and thus also $r$ and $v$. By using this, the fact that $R_{1i,t} =$


as $T \to \infty$, which uses that

Let us now consider $I_{1N}$. By Taylor expansion to the second order of the inverse square root of $\sum_{j=0}^{2} N^{-j\kappa}\mu_{jc}\beta_j$,

where $r_{11}$ and $r_{12}$ are implicitly defined.

Next, consider $I_{2NT}$. By using deviations from means,

1 t, suggesting that

E Phillips and Moon (1999) are satisfied (details are available upon request). Therefore,


as $N, T \to \infty$. By using this, $\sum_{j=0}^{2} N^{-j\kappa}\mu_{jc}\beta_j = \beta_0 + O(\mu_{1c}/N^{\kappa})$, and Taylor expansion of the inverse square root,

I2N T =

by Theorem 2 of Phillips and Moon (1999),

1

as $T \to \infty$. A similar calculation reveals that

α1= lim

(see Levin et al. 2002). Therefore, by Theorem 2 of Phillips and Moon (1999),

show that $Z_2$ is uncorrelated with $Z_1$, and hence independent by normality. Thus, since

As for the second term inI3N T, by using Taylor expansion of the

inverse square root of 2_j₌₀N−j κ_B

This result, together with Theorem 2 of Phillips and Moon (1999), and $\alpha_2 = \lim_{T\to\infty} \mathrm{var}(B_{0i,T}) = 1/45$ (Levin et al. 2002), implies

√

available upon request). It follows that

$I_{3NT} \to_d N(0, \sigma^2)$ (A.9)

Thus, putting everything together,

tPCADF−

and so the proof for model 1 is complete.


As for model 2, letting $a_t = (4 - 6t/T)$ and $b_t = (12t/T - 6)$, it can be shown that

$R^d R_{2i,t-1} = R_{2i,t-1} - a_t R_{2i,-1} - b_t$

except of course for the change in moments, which are now those of a detrended random walk. $\alpha_0$, $\beta_0$, $\alpha_1$, and $\alpha_2$ are relatively easy to compute and are given by $\alpha_0 = -1/2$, $\beta_0 = 1/15$, $\alpha_1 = 1/60$, and $\alpha_2 = 11/6300$ (see Levin et al. 2002). This leaves us with $\beta_1$ and $\beta_2$. We begin by considering $\beta_1$. Clearly,

B1i,T =

where, by using some of the previous results,

1

$\beta_2$ comprises two terms, of which the first can be expanded as

1

and some previous results,

1


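as $T \to \infty$. Thus, putting everything together,

$$\frac{1}{\sigma^2_{yi}} \frac{1}{T^4} \sum_{t=p+2}^{T} E\big((R^d U_{1i,t-1})^2\big) \to \frac{1}{420}.$$

The second term in $\beta_2$ is

$$\frac{1}{\sigma^2_{yi}} \frac{1}{T^4} \sum_{t=p+2}^{T} E\big(R^d U_{0i,t-1} R^d U_{2i,t-1}\big) = \frac{1}{\sigma^2_{vi}} \frac{1}{T^4} \sum_{t=p+2}^{T} E\left[ R_{2i,t-1} \left( R_{0i,t-1} - a_t R_{0i,-1} - b_t \frac{1}{T^2} \sum_{s=p+2}^{T} s R_{0i,s-1} \right) \right] + O_p\left(\frac{1}{\sqrt{T}}\right),$$

where

$$\frac{1}{\sigma^2_{vi}} \frac{1}{T^5} \sum_{t=p+2}^{T} \sum_{s=p+2}^{T} b_t s E(R_{2i,t-1} R_{0i,s-1}) \to \frac{1}{60} \int_{r=0}^{1} (12r - 6)(10r^3 - r^5)\,dr = \frac{29}{210},$$

$$\frac{1}{\sigma^2_{vi}} \frac{1}{T^3} \sum_{t=p+2}^{T} a_t E(R_{2i,t-1} R_{0i,-1}) \to \frac{1}{6} \int_{r=0}^{1} (2 - 3r) r^3 (4 - r)\,dr = -\frac{1}{20}$$

as $T \to \infty$. Therefore,

$$\frac{1}{\sigma^2_{yi}} \frac{1}{T^4} \sum_{t=p+2}^{T} E\big(R^d U_{0i,t-1} R^d U_{2i,t-1}\big) \to -\frac{1}{210},$$

which in turn implies

$$E(B_{2i,T}) \to -\frac{1}{420},$$

as required for the proof for model 2.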
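The two closed-form integrals used in the last step, and the way the two terms of $\beta_2$ combine, can be double-checked with exact rational arithmetic. The sketch below is only a verification aid; the helper names `poly_mul` and `integrate_01` are hypothetical, and polynomials are stored as coefficient lists indexed by the power of $r$.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (index = power)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += Fraction(a) * Fraction(b)
    return out


def integrate_01(p):
    """Exact integral of a polynomial over [0, 1]."""
    return sum(Fraction(c) / (k + 1) for k, c in enumerate(p))


# (1/60) * integral of (12r - 6)(10r^3 - r^5)
b_term = Fraction(1, 60) * integrate_01(
    poly_mul([-6, 12], [0, 0, 0, 10, 0, -1])
)

# (1/6) * integral of (2 - 3r) r^3 (4 - r)
a_term = Fraction(1, 6) * integrate_01(
    poly_mul(poly_mul([2, -3], [0, 0, 0, 1]), [4, -1])
)

# Sum of the two limits stated for the terms making up beta_2
beta_2 = Fraction(1, 420) + Fraction(-1, 210)

print(b_term, a_term, beta_2)  # prints "29/210 -1/20 -1/420"
```

The last value agrees with the stated limit of $E(B_{2i,T})$, which is the sum of the limits of the first term (1/420) and the second term (−1/210).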

Proof of Proposition 1. This proof is analogous to the proof of Theorem 1 in Westerlund (2014c). The details are therefore omitted.

ACKNOWLEDGMENTS

Previous versions of this article were presented at a workshop in Maastricht and at a seminar at Deakin University. The author thanks workshop and seminar participants, and in particular Jörg Breitung, Rong Chen (Editor), In Choi, Rolf Larsson, Jean-Pierre Urbain, one associate editor, and two anonymous referees for many valuable comments and suggestions. Financial support from the Knut and Alice Wallenberg Foundation through a Wallenberg Academy Fellowship is gratefully acknowledged. Thank you also to the Jan Wallander and Tom Hedelius Foundation for financial support under grant number P2014–0112:1.

[Received December 2012. Revised August 2014.]

REFERENCES

Bai, J., and Ng, S. (2004), “A Panic Attack on Unit Roots and Cointegration,” Econometrica, 72, 1127–1177.

Bai, J., and Ng, S. (2010), “Panel Unit Root Tests with Cross-Section Dependence: A Further Investigation,” Econometric Theory, 26, 1088–1114.

Chang, Y., and Song, W. (2009), “Test for Unit Roots in Small Panels With Short-Run and Long-Run Dependence,” Review of Economic Studies, 76, 903–935.

Choi, I. (2001), “Unit Root Tests for Panel Data,” Journal of International Money and Finance, 20, 249–272.

Elliott, G., Rothenberg, T. J., and Stock, J. H. (1996), “Efficient Tests for an Autoregressive Unit Root,” Econometrica, 64, 813–836.

Hansen, B. E. (1995), “Rethinking the Univariate Approach to Unit Root Testing: Using Covariates to Increase Power,” Econometric Theory, 11, 1148–1171.

Levin, A., Lin, C., and Chu, C.-J. (2002), “Unit Root Tests in Panel Data: Asymptotic and Finite-Sample Properties,” Journal of Econometrics, 108, 1–24.

Leybourne, S. J., Kim, T.-H., and Newbold, P. (2005), “Examination of Some More Powerful Modifications of the Dickey-Fuller Test,” Journal of Time Series Analysis, 26, 355–369.

Moon, H. R., and Perron, B. (1999), “Maximum Likelihood Estimation in Panels With Incidental Trends,” Oxford Bulletin of Economics and Statistics, 61, 771–748.

Moon, H. R., and Perron, B. (2004), “Testing for Unit Root in Panels with Dynamic Factors,” Journal of Econometrics, 122, 81–126.

Moon, H. R., and Perron, B. (2008), “Asymptotic Local Power of Pooled t-Ratio Tests for Unit Roots in Panels With Fixed Effects,” Econometrics Journal, 11, 80–104.

Moon, H. R., Perron, B., and Phillips, P. C. B. (2007), “Incidental Trends and the Power of Panel Unit Root Tests,” Journal of Econometrics, 141, 416–459.

Moon, H. R., and Phillips, P. C. B. (2004), “GMM Estimation of Autoregressive Roots Near Unity With Panel Data,” Econometrica, 72, 467–522.

Müller, U., and Elliott, G. (2003), “Tests for Unit Roots and the Initial Condition,” Econometrica, 71, 1269–1286.

Pesaran, H. M. (2007), “A Simple Panel Unit Root Test in Presence of Cross-Section Dependence,” Journal of Applied Econometrics, 22, 265–312.

Pesaran, H. M., Smith, L. V., and Yamagata, T. (2013), “Panel Unit Root Test in the Presence of a Multifactor Error Structure,” Journal of Econometrics, 175, 94–115.

Phillips, P. C. B., and Moon, H. R. (1999), “Linear Regression Limit Theory of Nonstationary Panel Data,” Econometrica, 67, 1057–1111.

Phillips, P. C. B., and Sul, D. (2007), “Bias in Dynamic Panel Estimation With Fixed Effects, Incidental Trends and Cross Section Dependence,” Journal of Econometrics, 137, 162–188.

Westerlund, J. (2014a), “Pooled Panel Unit Root Tests and the Effect of Past Initialization,” Econometric Reviews.

Westerlund, J. (2014b), “The Power of PANIC,” Journal of Econometrics.

Westerlund, J. (2014c), “The Effect of Recursive Detrending on Panel Unit Root Tests,” Journal of Econometrics.

Westerlund, J., and Breitung, J. (2013), “Lessons From a Decade of IPS and LLC,” Econometric Reviews, 32, 547–591.

Westerlund, J., and Larsson, R. (2012), “Testing for Unit Roots in a Panel Random Coefficient Model,” Journal of Econometrics, 167, 254–273.