
Journal of Business & Economic Statistics

ISSN: 0735-0015 (Print) 1537-2707 (Online) Journal homepage: http://www.tandfonline.com/loi/ubes20

Selecting the Correct Number of Factors in

Approximate Factor Models: The Large Panel Case

With Group Bridge Estimators

Mehmet Caner & Xu Han

To cite this article: Mehmet Caner & Xu Han (2014) Selecting the Correct Number of Factors in Approximate Factor Models: The Large Panel Case With Group Bridge Estimators, Journal of Business & Economic Statistics, 32:3, 359-374, DOI: 10.1080/07350015.2014.880349

To link to this article: http://dx.doi.org/10.1080/07350015.2014.880349

Published online: 28 Jul 2014.



Selecting the Correct Number of Factors in

Approximate Factor Models: The Large Panel

Case With Group Bridge Estimators

Mehmet Caner
Department of Economics, North Carolina State University, Raleigh, NC 27695 ([email protected])

Xu Han
Department of Economics and Finance, City University of Hong Kong, Hong Kong ([email protected])

This article proposes a group bridge estimator to select the correct number of factors in approximate factor models. It contributes to the literature on shrinkage estimation and factor models by extending the conventional bridge estimator from a single equation to a large panel context. The proposed estimator can consistently estimate the factor loadings of relevant factors and shrink the loadings of irrelevant factors to zero with a probability approaching one. Hence, it provides a consistent estimate for the number of factors. We also propose an algorithm for the new estimator; Monte Carlo experiments show that our algorithm converges reasonably fast and that our estimator has very good performance in small samples. An empirical example is also presented based on a commonly used U.S. macroeconomic dataset.

KEY WORDS: Bridge estimation; Common factors; Selection consistency.

1. INTRODUCTION

In recent years, factor models have become increasingly important in both finance and macroeconomics. These models use a small number of common factors to explain the comovements of a large number of time series. In finance, factor models are the foundation for the extension of the arbitrage pricing theory (Chamberlain and Rothschild 1983). In macroeconomics, empirical evidence shows that a few factors can explain a substantial amount of the variation in major macroeconomic variables (Sargent and Sims 1977; Stock and Watson 1989; Giannone, Reichlin, and Sala 2004). Cross-country differences have been explained using factor models by Gregory and Head (1999) and Forni et al. (2000). Other applications of factor models include forecasting (Stock and Watson 2002), dynamic stochastic general equilibrium macroeconomic models (Boivin and Giannoni 2006), structural VAR analysis (Bernanke, Boivin, and Eliasz 2005; Stock and Watson 2005; Forni and Gambetti 2010), and consumer demand and microbehavior (Forni and Lippi 1997; Lewbel 1991).

As the common factors are often unobserved, it is natural to ask how many factors should be included in practice. For example, in the factor-augmented VAR models of Bernanke, Boivin, and Eliasz (2005), an incorrect number of common factors can lead to an inconsistent estimate of the space spanned by the structural shocks, and impulse responses based on such estimates may be misleading and result in bad policy suggestions. Recent papers by Stock and Watson (2009) and Breitung and Eickmeier (2011) also find that structural breaks in factor loadings can lead to a change in the number of factors in subsamples. This implies that correctly determining the number of factors in subsamples can help us detect structural changes in factor models.

In this article, we propose a group bridge estimator to determine the number of factors in approximate factor models. Compared to conventional information criteria, the bridge estimator has the advantage that it can conduct both estimation and model selection in one step. It avoids the instability caused by subset selection or stepwise deletion (Breiman 1996). The instability associated with model selection shows up in the following way: when new data are added, the estimated model changes dramatically. Hence, instability in a factor model context means that the estimated number of factors fluctuates widely around the true number as new data are introduced. Also, Huang, Horowitz, and Ma (2008, HHM hereafter) show that the bridge estimator possesses the oracle property, that is, it will provide efficient estimates as if the true model is given in advance. Moreover, as discussed in De Mol, Giannone, and Reichlin (2008), shrinkage-based estimators such as bridge have certain optimal risk properties as seen in Donoho and Johnstone (1994). Thus, we expect that the bridge estimator will preserve these good properties in the high-dimensional factor models.

This article builds a connection between shrinkage estimation and factor models. Despite the large literature on both factor models and shrinkage estimation, the interaction between these two fields is rather small. Bai and Ng (2008) applied shrinkage estimation to select the relevant variables and refine forecasts based on factor models. However, their application of shrinkage estimation is still restricted to a single equation and does not apply to large panels. This article contributes to these two strands of the literature in the following ways. First, we

© 2014 American Statistical Association
Journal of Business & Economic Statistics, July 2014, Vol. 32, No. 3
DOI: 10.1080/07350015.2014.880349



extend the conventional bridge estimator from a single equation (such as Knight and Fu 2000; Caner and Knight 2013) to a large panel context. This extension is not trivial because shrinkage estimators in the literature (such as HHM) usually assume independent error terms. In contrast, our estimator allows the error terms to have correlations in both cross-section and time dimensions. In addition, the regressors are not observed but estimated in factor models, so the estimation errors in factors bring another layer of difficulty to our extension. Moreover, the sparsity condition in factor models is different from the one that is common in the literature. Many papers allow the number of parameters to increase with the sample size (e.g., Zou and Zhang 2009; Caner and Zhang 2014), but most of them rely on the sparsity condition that the number of nonzero coefficients has to diverge at a slower rate than the sample size. In factor models, however, the number of nonzero factor loadings is proportional to the cross-sectional dimension.

Additionally, our estimator provides a new way to determine the number of factors. We prove that our estimator can consistently estimate the factor loadings (up to a rotation) of relevant factors and shrink the loadings of irrelevant factors to 0 with a probability approaching 1. Hence, given the estimated factors, this new estimator can select the correct model specification and conduct the estimation of factor loadings in one step. We penalize the Euclidean norm of the factor loading vectors. Specifically, all cells in the zero factor loading vectors are shrunk simultaneously. Thus, our estimation has a group structure.

In recent years, much attention has been drawn to the estimation of the number of factors. Bai and Ng (2002) and Onatski (2010) proposed statistics to determine the number of static factors in approximate factor models.¹ Onatski (2009) constructed tests for the number of factors based on the empirical distribution of eigenvalues of the data covariance matrix. Amengual and Watson (2007), Bai and Ng (2007), and Hallin and Liska (2007) developed estimators for the number of dynamic factors. These methods are roughly equivalent to finding some threshold to distinguish large and small eigenvalues of the data covariance matrix. Some other related methods are developed by Kapetanios (2010), Han (2012), and Seung and Horenstein (2013). We solve the problem from a different but related angle: the group bridge estimator directly focuses on the shrinkage estimation of the factor loading matrix. Simulations show that our estimator is robust to cross-sectional and serial correlations in the error terms and outperforms existing methods in certain cases.

Section 2 introduces the model and the assumptions, and presents the theoretical results of our estimator. Section 3 conducts simulations to explore the finite sample performance of our estimator. Section 4 provides an empirical example using a commonly used macroeconomic dataset. Section 5 concludes. The appendix contains all proofs.

2. THE MODEL

We use the following representation for the factor model:

$$X_{it} = \lambda_i^{0\prime} F_t^0 + e_{it}, \qquad (2.1)$$

¹ The definitions of static factors, dynamic factors, and approximate factor models are discussed in the second paragraph of Section 2.

where $X_{it}$ is the observed data for the $i$th cross-section at time $t$, for $i = 1, 2, \ldots, N$ and $t = 1, 2, \ldots, T$; $F_t^0$ is an $r \times 1$ vector of common factors, and $\lambda_i^0$ is an $r \times 1$ vector of factor loadings associated with $F_t^0$. The true number of factors is $r$. $\lambda_i^{0\prime} F_t^0$ is the common component of $X_{it}$, and $e_{it}$ is the idiosyncratic error. $F_t^0$, $\lambda_i^0$, $e_{it}$, and $r$ are unobserved.

Here we explain several terms used in our discussion of factor models. We call (2.1) an approximate factor model because our assumptions (see Section 2.1) allow some cross-sectional correlation in the idiosyncratic errors. It is more general than a strict factor model, which assumes $e_{it}$ to be uncorrelated across $i$. Also, we refer to (2.1) as a static factor model because $X_{it}$ has a contemporaneous relationship with the factors. This is different from the dynamic factor model in Forni et al. (2000). The dynamic factor model has the representation $X_{it} = \sum_{j=1}^{q} \varphi_{ij}(L) f_{jt} + e_{it}$, where the $q$-dimensional dynamic factors are orthonormal white noises and the one-sided filters $\varphi_{ij}(L)$ are dynamic factor loadings. Hence, the factor model $X_{it} = a_i f_t + b_i f_{t-1} + e_{it}$ has one dynamic factor but two static factors. Our focus is the static factor model with cross-sectionally correlated $e_{it}$. For conciseness, we use the term "factor model" to refer to the "static factor model" and "approximate factor model" in the rest of the article, unless otherwise specified.
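For concreteness, the one-dynamic-factor example above can be written in the static form (2.1) by stacking the dynamic factor and its lag; this is only a restatement of the example in the text:

$$F_t^0 = (f_t, f_{t-1})', \qquad \lambda_i^0 = (a_i, b_i)', \qquad X_{it} = \lambda_i^{0\prime} F_t^0 + e_{it},$$

so the model has $q = 1$ dynamic factor but $r = 2$ static factors.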

Now, we rewrite model (2.1) in the vector form

$$X_i = F^0 \lambda_i^0 + e_i \quad \text{for } i = 1, 2, \ldots, N, \qquad (2.2)$$

where $X_i = (X_{i1}, X_{i2}, \ldots, X_{iT})'$ is a $T$-dimensional vector of observations on the $i$th cross-section, $F^0 = (F_1^0, F_2^0, \ldots, F_T^0)'$ is a $T \times r$ matrix of the unobserved factors, and $e_i = (e_{i1}, e_{i2}, \ldots, e_{iT})'$ is a $T$-dimensional vector of the idiosyncratic shocks of the $i$th cross-section. Models (2.1) and (2.2) can also be represented in the matrix form:

$$X = F^0 \Lambda^{0\prime} + e, \qquad (2.3)$$

where $X = (X_1, X_2, \ldots, X_N)$ is a $T \times N$ matrix of observed data, $\Lambda^0 = (\lambda_1^0, \lambda_2^0, \ldots, \lambda_N^0)'$ is the $N \times r$ factor loading matrix, and $e = (e_1, e_2, \ldots, e_N)$ is a $T \times N$ matrix of idiosyncratic shocks.

We use principal components analysis (PCA) to estimate the unknown factors, and thereafter we use bridge estimation to get the correct number of factors. The conventional PCA minimizes the following objective function:

$$V(k) = \min_{\Lambda^k, F^k} (NT)^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} \left( X_{it} - \lambda_i^{k\prime} F_t^k \right)^2, \qquad (2.4)$$

where the superscript $k$ denotes the fact that the number of factors is to be determined, and both $\lambda_i^k$ and $F_t^k$ are $k \times 1$ vectors. We consider $0 \le k \le p$, where $p$ is some predetermined upper bound such that $p \ge r$. For a given $k$, the solution to the above objective function (2.4) is that $\hat{F}^k$ ($T \times k$) is equal to $\sqrt{T}$ times the eigenvectors corresponding to the $k$ largest eigenvalues of $XX'$, and $\hat{\Lambda}^k_{\mathrm{OLS}} = X'\hat{F}^k(\hat{F}^{k\prime}\hat{F}^k)^{-1} = X'\hat{F}^k/T$ (as eigenvectors are orthonormal, $\hat{F}^{k\prime}\hat{F}^k/T = I_k$).
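As a minimal illustration of this PCA step, the following NumPy sketch computes $\hat{F}^k$ and the OLS loadings; the function name and array layout are our own and are not part of the paper:

```python
import numpy as np

def pc_factors(X, k):
    """Principal component estimates for a k-factor model.

    X is the (T, N) data matrix. Returns F_hat (T, k), scaled so that
    F_hat' F_hat / T = I_k, and the OLS loadings Lambda_hat = X' F_hat / T (N, k).
    """
    T, N = X.shape
    # Eigen-decomposition of X X' (T x T); np.linalg.eigh returns ascending eigenvalues.
    eigval, eigvec = np.linalg.eigh(X @ X.T)
    # Eigenvectors of the k largest eigenvalues, scaled by sqrt(T).
    F_hat = np.sqrt(T) * eigvec[:, ::-1][:, :k]
    Lambda_hat = X.T @ F_hat / T
    return F_hat, Lambda_hat
```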

Let $C_{NT} = \min(\sqrt{N}, \sqrt{T})$. We define our group bridge estimator as $\hat{\Lambda}$ that minimizes the following objective function:

$$\hat{\Lambda} = \arg\min_{\Lambda} L(\Lambda), \qquad (2.5)$$

$$L(\Lambda) = \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} \left( X_{it} - \lambda_i' \hat{F}_t^p \right)^2 + \frac{\gamma}{C_{NT}^2} \sum_{j=1}^{p} \left( \frac{1}{N} \sum_{i=1}^{N} \lambda_{ij}^2 \right)^{\alpha}, \qquad (2.6)$$

where $\hat{F}^p$ is the $T \times p$ principal component estimate of the factors, $\hat{F}_t^p$ is the transpose of the $t$th row of $\hat{F}^p$, $\lambda_i = (\lambda_{i1}, \ldots, \lambda_{ip})'$ is a $p \times 1$ vector in a compact subset of $\mathbb{R}^p$, $\Lambda = (\lambda_1, \lambda_2, \ldots, \lambda_N)'$ is $N \times p$, $\alpha$ is a constant with $0 < \alpha < 1/2$, and $\gamma$ is a tuning parameter.
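As a sketch, the objective (2.6) can be evaluated directly; this helper (our own naming, reusing NumPy) is only meant to make the notation concrete:

```python
import numpy as np

def bridge_objective(Lam, X, F_hat_p, gamma, alpha):
    """Group bridge objective L(Lambda) in (2.6).

    Lam: (N, p) loading matrix, X: (T, N) data, F_hat_p: (T, p) PC factor estimates.
    """
    T, N = X.shape
    C_NT2 = min(N, T)                                   # C_NT^2 = min(N, T)
    fit = np.sum((X - F_hat_p @ Lam.T) ** 2) / (N * T)  # (1/NT) sum of squared residuals
    penalty = (gamma / C_NT2) * np.sum((np.sum(Lam ** 2, axis=0) / N) ** alpha)
    return fit + penalty
```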

The penalty term in (2.6) is different from that of a conventional bridge estimator such as the one in HHM (2008). First, unlike the single equation shrinkage estimation, our penalty term is divided by $C_{NT}^2$, which indicates that the tuning parameter will depend on both $N$ and $T$. Second, we use $\sum_{j=1}^{p} (\frac{1}{N} \sum_{i=1}^{N} \lambda_{ij}^2)^{\alpha}$ instead of the HHM (2008)-type of penalty $\sum_{i=1}^{N} \sum_{j=1}^{p} |\lambda_{ij}|^{\delta}$ ($0 < \delta < 1$). A necessary condition for $\sum_{i=1}^{N} \sum_{j=1}^{p} |\lambda_{ij}|^{\delta}$ to shrink the last $p - r$ columns of $\Lambda$ to zero is that the corresponding nonpenalized estimates for these loadings have to converge in probability to zero for all $i$'s and $j = r+1, \ldots, p$. This necessary condition fails because the last $p - r$ columns of $\hat{F}^p$ are correlated with some $X_i$'s, due to the limited cross-sectional correlation among idiosyncratic errors. Hence, using $\sum_{i=1}^{N} \sum_{j=1}^{p} |\lambda_{ij}|^{\delta}$ as the penalty will yield many zero and a few nonzero estimates in the last $p - r$ columns of $\Lambda$, and will cause overestimation if we use the number of nonzero columns in $\Lambda$ as an estimator for $r$.

To shrink the entire $j$th ($j > r$) column of $\Lambda$ to zero, we apply the penalty term in (2.6) to penalize the norm of the entire vector $\Lambda_j \equiv (\lambda_{1j}, \ldots, \lambda_{Nj})'$ rather than each single element $\lambda_{ij}$. The idea is related to the group LASSO (Yuan and Lin 2006), which is designed for single equation models. In a factor model setup, the analog of the group LASSO's penalty term will be proportional to $\sum_{j=1}^{p} (\sum_{i=1}^{N} \lambda_{ij}^2)^{1/2}$, which provides an intuition to explain why we have $0 < \alpha < 1/2$. If $N = 1$, then the model only involves a single equation, and the group LASSO's penalty reduces to $\sum_{j=1}^{p} |\lambda_j|$. Accordingly, our penalty becomes $\sum_{j=1}^{p} |\lambda_j|^{2\alpha}$ with $0 < 2\alpha < 1$, which is the familiar penalty term in a single equation bridge estimation.

Huang et al. (2009) also proposed a group bridge estimator for single equation shrinkage estimation. In our large panel setup, the analog of Huang et al.'s (2009) penalty term will be proportional to $\sum_{j=1}^{p} (\sum_{i=1}^{N} |\lambda_{ij}|)^{\delta}$ ($0 < \delta < 1$). Their penalty is designed to achieve selection consistency both within and among groups. Under this article's framework, a group of coefficients corresponds to a column in the factor loading matrix. The context of this article is different from that of Huang et al. (2009), and we do not need to consider the within group selection for three reasons. First, within group selection means distinguishing zero from nonzero elements in the first $r$ columns of $\Lambda$ ($N \times p$), but this is not necessary because our purpose is to distinguish the first $r$ nonzero columns from the last $p - r$ zero columns of $\Lambda$, that is, we only need to select groups to determine $r$. Second, it is well known that both factors and factor loadings estimated by principal components are consistent estimates of their original counterparts up to some rotations. The rotations can be so generic that the economic meanings of the principal component estimators are completely unclear. The within group selection is based on the post-rotation factor loadings. Hence, unless there is a specific reason to detect zero and nonzero elements in the post-rotation factor loading matrix, within group selection is not very meaningful under the current setup. Moreover, as our penalty term uses squares rather than absolute values, the computation is much easier than that of the penalty used by Huang et al. (2009).

Hirose and Konishi (2012) applied the group LASSO estimator in the context of strict factor models. However, their estimator is substantially different from (2.5), primarily because they focused on selecting variables that are not affected by any of the factors, whereas this article focuses on selecting factors that do not affect any of the variables. In other words, they were interested in which rows of $\Lambda$ are zero, whereas we care about which columns of $\Lambda$ are zero, so Hirose and Konishi's (2012) penalty term cannot determine the number of factors by design. One could in principle construct a group LASSO estimator to select the zero columns of $\Lambda$. However, such an estimator is different from Hirose and Konishi (2012), and its theoretical property is an open question and beyond the scope of this article.

2.1 Assumptions

Let $\|A\| \equiv [\mathrm{trace}(A'A)]^{1/2}$ denote the norm of matrix $A$ and $\to_p$ denote convergence in probability. Our theoretical results are derived based on the following assumptions:

1. $E\|F_t^0\|^4 < \infty$, $T^{-1} \sum_{t=1}^{T} F_t^0 F_t^{0\prime} \to_p \Sigma_F$ ($r \times r$) as $T \to \infty$, and $\Sigma_F$ is finite and positive definite.

2. $\|\lambda_i^0\| \le \bar{\lambda} < \infty$, $\|\Lambda^{0\prime}\Lambda^0 / N - \Sigma_{\Lambda}\| \to 0$ as $N \to \infty$, and $\Sigma_{\Lambda}$ is finite and positive definite. The $j$th column of $\Lambda$, denoted as $\Lambda_j$, is inside a compact subset of $\mathbb{R}^N$ for all $N$ and $j = 1, \ldots, p$.

3. There exists a positive and finite constant $M$ that does not depend on $N$ or $T$, such that for all $N$ and $T$,

(i) $E e_{it} = 0$, $E|e_{it}|^8 < M$.

(ii) $E(e_s' e_t / N) = E[N^{-1} \sum_{i=1}^{N} e_{is} e_{it}] = \iota_N(s, t)$, $|\iota_N(s, s)| \le M$ for all $s$, and $T^{-1} \sum_{s=1}^{T} \sum_{t=1}^{T} |\iota_N(s, t)| \le M$.

(iii) $E(e_{it} e_{jt}) = \tau_{ij,t}$, where $|\tau_{ij,t}| \le |\tau_{ij}|$ for some $\tau_{ij}$ and for all $t$. In addition, $N^{-1} \sum_{i=1}^{N} \sum_{j=1}^{N} |\tau_{ij}| \le M$.

(iv) $E(e_{it} e_{js}) = \tau_{ij,ts}$ and $(NT)^{-1} \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{s=1}^{T} \sum_{t=1}^{T} |\tau_{ij,ts}| \le M$.

(v) For every $(t, s)$, $E\left| N^{-1/2} \sum_{i=1}^{N} [e_{is} e_{it} - E(e_{is} e_{it})] \right|^4 \le M$.

4. $E\left[ \frac{1}{N} \sum_{i=1}^{N} \left\| \frac{1}{\sqrt{T}} \sum_{t=1}^{T} F_t^0 e_{it} \right\|^2 \right] \le M$.

5. The eigenvalues of the matrix $\Sigma_{\Lambda} \Sigma_F$ are distinct.

6. (i) As $N$ and $T \to \infty$, $\gamma / C_{NT} \to 0$, where $C_{NT} = \min(\sqrt{N}, \sqrt{T})$. (ii) Also, as $N \to \infty$, $\gamma / C_{NT}^{2\alpha} \to \infty$, where $0 < \alpha < 1/2$.

Assumptions 1–4 are Assumptions A–D in Bai and Ng (2002). Assumption 1 is standard for factor models. With Assumption 2, we consider only nonrandom factor loadings. The results hold when the factor loadings are random but independent of factors and errors. The divergence rate of $\Lambda'\Lambda$ is assumed to be $N$, which is the same as Bai and Ng (2002). It is possible to allow a divergence rate slower than $N$, which indicates a weaker factor structure. A slower divergence rate of $\Lambda'\Lambda$ will slow down the convergence rate of $F_t^k$ and change the divergence rate of $\gamma$ specified in Assumption 6. We conjecture that our bridge estimator still works when $\Lambda'\Lambda$ diverges at a slower rate and we leave that as a future research topic. Assumption 3 allows cross-sectional as well as serial dependence in the idiosyncratic errors. Hence, the model follows an approximate factor structure. Assumption 4 allows for some dependency between factors and errors. Assumption 5 is Assumption G in Bai (2003). This assumption ensures the existence of the limit of the rotation matrix $H$, which will be introduced in the next section.

Assumption 6 is an assumption about the tuning parameter. Compared with the single equation bridge estimation, where the tuning parameter only depends on $N$ or $T$, $\gamma$'s divergence rate depends on both $N$ and $T$. Hence, we extend the application of the bridge estimation from single equation models to large panels. Also, note that a conventional bridge penalty will involve the sum of $|\lambda_{ij}|^{\delta}$ with $0 < \delta < 1$. However, as we use $\lambda_{ij}^2$ instead of $|\lambda_{ij}|$ in the penalty term, the restriction on the power has to be adjusted accordingly, that is, $0 < 2\alpha < 1$.

2.2 Theoretical Results

Before discussing the main theoretical results, it is useful to describe some notation and transformations. We partition $\hat{F}^p$ in the following way:

$$\hat{F}^p = (\hat{F}^{1:r} \ \vdots \ \hat{F}^{(r+1):p}), \qquad (2.7)$$

where $\hat{F}^{1:r}$ corresponds to the first $r$ columns of $\hat{F}^p$, and $\hat{F}^{(r+1):p}$ corresponds to the last $p - r$ columns of $\hat{F}^p$. It is worth noting that there is an important difference between $\hat{F}^{1:r}$ and $\hat{F}^{(r+1):p}$. It is well known that $\hat{F}^{1:r}$ consistently estimates $F^0$ (for each $t$) up to some rotation (Bai 2003), but $\hat{F}^{(r+1):p}$ is just some vectors containing noisy information from the idiosyncratic errors. In fact, the property of $\hat{F}^{(r+1):p}$ is, to the best of our knowledge, still hardly discussed in either the factor model or random matrix theory literature.² We will treat $\hat{F}^{1:r}$ and $\hat{F}^{(r+1):p}$ using different techniques.

Let $\hat{F}_t^{1:r}$ denote the transpose of the $t$th row of $\hat{F}^{1:r}$. Bai (2003) showed that $\hat{F}_t^{1:r} - H'F_t^0 \to_p 0$, where $H$ is some $r \times r$ matrix³ converging to some nonsingular limit $H_0$. Since the factors are estimated up to the nonsingular rotation matrix $H$, we rewrite (2.2) as

$$X_i = F^0 H \cdot H^{-1} \lambda_i^0 + e_i. \qquad (2.8)$$

Replacing the unobserved $F^0 H$ with the principal component estimates $\hat{F}^p$, we obtain the following transformation:

$$X_i = \hat{F}^{1:r} H^{-1} \lambda_i^0 + e_i + (F^0 H - \hat{F}^{1:r}) H^{-1} \lambda_i^0 = \hat{F}^p \lambda_i^* + u_i, \qquad (2.9)$$

where $u_i$ and $\lambda_i^*$ are defined as

$$u_i \equiv e_i - (\hat{F}^{1:r} - F^0 H) H^{-1} \lambda_i^0, \qquad \lambda_i^* \equiv \begin{pmatrix} H^{-1} \lambda_i^0 \\ 0_{(p-r) \times 1} \end{pmatrix}. \qquad (2.10)$$

We set the last $p - r$ entries of $\lambda_i^*$ equal to zero because the model has $r$ factors. Note that $\hat{F}^p$ is an estimated regressor and the transformed error term $u_i$ involves two components: the true error term, which is heteroscedastic and serially correlated, and an additional term depending on $\hat{F}^{1:r}$, $F^0$, and $\lambda_i^0$. Compared with the iid errors in HHM (2008) or the independent errors in Huang et al. (2009), our error term $e_i$ is much less restricted (see Assumption 3). Also, note that the second term in $u_i$ involves estimated factors via principal components and that (2.6) uses estimated factors instead of the unobserved $F^0$. These are new in the shrinkage literature and bring an extra layer of difficulty. Hence, our estimator in large panels is a nontrivial extension of existing methods.

Given the $\lambda_i^*$ defined above, we can obtain the transformed $N \times p$ loading matrix:

$$\Lambda^* \equiv (\lambda_1^*, \ldots, \lambda_N^*)' \equiv [\Lambda^0 H^{-1\prime} \ \vdots \ 0_{N \times (p-r)}].$$

Thus, the goal is to prove that $\hat{\Lambda} - \Lambda^*$ converges to zero. The last $p - r$ columns of $\hat{\Lambda}$ should be zero and its first $r$ columns should be nonzero, so that we can consistently determine the number of factors using the number of nonzero columns in $\hat{\Lambda}$. Let $\hat{\lambda}_i$ denote the transpose of the $i$th row of $\hat{\Lambda}$, the solution to (2.6).

² This is why Bai and Ng (2002) defined their estimator for factors as $\hat{F}^k V^k$, where $V^k$ is a diagonal matrix consisting of the first $k$ largest eigenvalues of $XX'/NT$ in descending order. Since the $k$th ($k > r$) eigenvalue of $XX'/NT$ is $O_p(C_{NT}^{-2})$, the last $p - r$ columns of their factor matrix are asymptotically zeros and only the first $r$ columns of the factor matrix matter. As their criteria focus on the sum of squared errors, this asymptotically not-of-full-column-rank design does not affect their result. Since we focus on estimating and penalizing the factor loadings, we use $\hat{F}^k$ to ensure full column rank instead of Bai and Ng's (2002) $\hat{F}^k V^k$ as the estimator for $F^0$.

³ The definition of $H$ is given by Lemma 1 in the appendix.


The following theorem establishes the convergence rate of $\hat{\lambda}_i$.

Theorem 1. Under Assumptions 1–6,

$$N^{-1} \sum_{i=1}^{N} \left\| \hat{\lambda}_i - \lambda_i^* \right\|^2 = O_p\left( C_{NT}^{-2} \right).$$

It is noteworthy that this rate is not affected by $\gamma$, as long as the divergence rate of $\gamma$ satisfies Assumption 6. This is an extension of Huang, Horowitz, and Ma's (2008) result to a high-dimensional factor model context. Under some appropriate assumptions, HHM (2008) showed that the bridge estimator (denoted by $\hat{\beta}$) has the familiar OLS convergence rate, that is, $\|\hat{\beta} - \beta^0\|^2 = O_p(N^{-1})$, in a single equation model. In this article, the rate depends on both $N$ and $T$ and is similar to Bai's (2003) result for the OLS estimator of factor loadings (given the principal components estimates of factors). Hence, Theorem 1 shows that the group bridge estimator defined in (2.5) is as good as conventional estimators in terms of convergence rate. The next theorem shows that our estimator can estimate the last $p - r$ columns of $\hat{\Lambda}$ exactly as zeros with a probability converging to 1.

Theorem 2. Under Assumptions 1–6,

(i) $\hat{\Lambda}^{(r+1):p} = 0_{N \times (p-r)}$ with probability converging to 1 as $N, T \to \infty$, where $\hat{\Lambda}^{(r+1):p}$ denotes the last $p - r$ columns of $\hat{\Lambda}$.

(ii) $P(\hat{\Lambda}_j = 0) \to 0$ as $N, T \to \infty$ for any $j = 1, \ldots, r$, where $\hat{\Lambda}_j$ represents the $j$th column of $\hat{\Lambda}$.

This theorem shows that our estimator achieves the selection consistency for the zero elements in $\Lambda^*$: the first $r$ columns of $\hat{\Lambda}$ will be nonzero and the last $p - r$ columns of $\hat{\Lambda}$ will be zero as $N$ and $T$ diverge. Hence, it is natural to define the estimator for the number of factors as

$$\hat{r} \equiv \text{the number of nonzero columns of } \hat{\Lambda}. \qquad (2.11)$$

The following corollary establishes the consistency of $\hat{r}$, and the proof is straightforward given Theorem 2.

Corollary 1. Under Assumptions 1–6, the number of factors in (2.3) can be consistently determined by $\hat{r}$.

2.3 Computation

In this section we show how to implement our method. The initial step is estimating the $p$ factors via PCA in (2.4). Next, we show how to solve the optimization in (2.5) and (2.6). As the bridge penalty is not differentiable at zero for $0 < \alpha < 1/2$, standard gradient-based methods are not applicable to our estimator. We develop an algorithm to compute the solution for (2.6). Let $\hat{\Lambda}^{(m)}$ be the value of the $m$th iteration from the optimization algorithm, $m = 0, 1, \ldots$. Let $\epsilon_T$ denote the convergence tolerance and $\nu$ denote some small positive number. In this article, we set $\epsilon_T = 5 \times 10^{-4}$ and $\nu = 10^{-4}$.

Set the initial value equal to the OLS solution, $\hat{\Lambda}^{(0)} = X'\hat{F}^p / T$. For $m = 1, 2, \ldots$,

1. Let $\hat{\Lambda}^{(m),j}$ denote the $j$th column of $\hat{\Lambda}^{(m)}$. Compute $g_1 = (X - \hat{F}^p \hat{\Lambda}^{(m)\prime})'\hat{F}^p / N$ and

$$g_2(j, \nu) = -\frac{\alpha \gamma \hat{\Lambda}^{(m),j}}{N\left[\left(\hat{\Lambda}^{(m),j\prime} \hat{\Lambda}^{(m),j} / N\right)^{1-\alpha} + \nu\right]} \cdot \frac{2T}{C_{NT}^2},$$

where $\nu$ avoids the zero denominator when $\hat{\Lambda}^{(m),j} = 0$. Since the value of the tuning parameter will be selected as well, the constant term $2T/C_{NT}^2$ (for a given sample) can be dropped in the algorithm. Set $g_2(\nu) = [g_2(1, \nu), g_2(2, \nu), \ldots, g_2(p, \nu)]$.

2. Define the $N \times p$ gradient matrix $g = [g^1, g^2, \ldots, g^p]$, where the $i$th element of the $j$th column $g^j$ ($j = 1, \ldots, p$) is defined as

$$g_i^j = g_{1,i}^j + g_2(j, \nu)_i \quad \text{if } |\lambda_i^{(m),j}| > \epsilon_T, \qquad g_i^j = 0 \quad \text{if } |\lambda_i^{(m),j}| \le \epsilon_T,$$

where $g_{1,i}^j$ is the $(i, j)$th element of $g_1$, $g_2(j, \nu)_i$ is the $i$th element of $g_2(j, \nu)$, and $\lambda_i^{(m),j}$ is the $i$th element of $\hat{\Lambda}^{(m),j}$.

3. Let $\max|g|$ denote the largest element in $g$ in terms of absolute value. Rescale $g = g / \max|g|$ if $\max|g| > 0$; otherwise set $g = 0$.

4. $\hat{\Lambda}^{(m+1)} = \hat{\Lambda}^{(m)} + \Delta \times g / (1 + m/750)$, where $\Delta$ is the increment for this iteration algorithm. We set $\Delta = 2 \times 10^{-3}$, which follows the value used by HHM (2008).

5. Replace $m$ by $m + 1$ and repeat Steps 1–5 until

$$\max_{i=1,\ldots,N;\, j=1,\ldots,p} \left|\lambda_i^{(m),j} - \lambda_i^{(m+1),j}\right| \le \epsilon_T.$$

After convergence, we truncate $\lambda_i^{(m),j}$ to 0 if $|\lambda_i^{(m),j}| \le \epsilon_T$.

6. Set $\hat{r} =$ the number of nonzero columns in $\hat{\Lambda}^{(m)}$.

This algorithm is a modified version of the one proposed by HHM (2008). We use the OLS estimate instead of zero as the initial value to accelerate the algorithm. Also, we modify HHM's computation of the gradient in Step 2 so that the estimated loading will remain unchanged after being shrunk to zero. This was suggested by Fan and Li (2001) and can substantially accelerate the convergence in our high-dimensional setup. Additionally, note that in Step 4 we gradually decrease the increment as $m$ increases by dividing by $1 + m/750$.⁴ This ensures the convergence of the algorithm.
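A minimal NumPy sketch of Steps 1–6, under the settings above (the function interface, the name eps_T for the tolerance $\epsilon_T$, and the omission of the constant $2T/C_{NT}^2$, which the text says can be dropped, are our own choices rather than the authors' code):

```python
import numpy as np

def group_bridge(X, F_hat_p, gamma, alpha=0.25, eps_T=5e-4, nu=1e-4,
                 delta=2e-3, max_iter=200000):
    """Iterative algorithm of Section 2.3 for the group bridge estimator."""
    T, N = X.shape
    Lam = X.T @ F_hat_p / T                                # initial value: OLS solution
    for m in range(max_iter):
        # Step 1: data-fit direction g1 and smoothed-penalty direction g2
        g1 = (X - F_hat_p @ Lam.T).T @ F_hat_p / N
        col_norm2 = np.sum(Lam ** 2, axis=0) / N           # Lambda_j' Lambda_j / N, j = 1..p
        g2 = -alpha * gamma * Lam / (N * (col_norm2 ** (1 - alpha) + nu))
        # Step 2: freeze entries already shrunk to (near) zero
        g = np.where(np.abs(Lam) > eps_T, g1 + g2, 0.0)
        # Step 3: rescale by the largest absolute element
        gmax = np.max(np.abs(g))
        g = g / gmax if gmax > 0 else np.zeros_like(g)
        # Step 4: update with a gradually decreasing increment
        Lam_new = Lam + delta * g / (1 + m / 750)
        # Step 5: convergence check
        if np.max(np.abs(Lam_new - Lam)) <= eps_T:
            Lam = Lam_new
            break
        Lam = Lam_new
    Lam[np.abs(Lam) <= eps_T] = 0.0                        # truncate small loadings to zero
    r_hat = int(np.sum(np.any(Lam != 0, axis=0)))          # Step 6: count nonzero columns
    return Lam, r_hat
```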

To see that the algorithm yields an estimator converging to the true value, first note that the algorithm approximates the bridge penalty function by

$$\frac{2\alpha\gamma}{C_{NT}^2 N} \cdot \sum_{j=1}^{p} \int_{\ell_j} \frac{1}{(u'u/N)^{1-\alpha} + \nu}\, u'\, du, \qquad (2.12)$$

where the variable of integration $u$ is an $N \times 1$ vector and $\ell_j$ is a line integral through the gradient of the approximated penalty term with the endpoint $\Lambda_j$. The expression in (2.12) is continuously differentiable with respect to the $N \times 1$ vector $\Lambda_j$, so the computed estimator will converge to the minimum of the approximated objective function. Also, the approximated penalty function and

⁴ Our Monte Carlo experiments show that the algorithm is reasonably fast and performs very well in terms of selecting the true value of $r$. We also tried 500 and 1000 instead of 750 to adjust the increment, and the results are very similar and, hence, not reported.


its gradient converge to the bridge penalty and its gradient as $\nu \to 0$, respectively. Since all factor loadings are defined in a compact set by Assumption 2 and the bridge estimator is a global minimizer by HHM (2008), the minimum of the approximated objective function will converge to the minimum of (2.6) as $\nu \to 0$. Hence, our algorithm can generate a consistent estimator. To ensure that the algorithm is not trapped in a local minimum, we use the OLS estimate as the initial value, which is close to the global minimum of (2.6) and reduces the computation load. One could also use multiple starting values to avoid obtaining a local minimum.

To implement our estimator, we need to determine the values of $\alpha$ and $\gamma$. We set $\alpha = 0.25$ as our benchmark value. We also vary $\alpha$ between 0.1 and 0.4; simulations (see Table 7) show that our estimator is very robust to the choice of $\alpha$. The tuning parameter $\gamma$ is set equal to $\phi \cdot [\min(N, T)]^{0.45}$ because Assumption 6 requires that $\gamma / C_{NT} \to 0$. The constant $\phi$ varies on the grid [1.5:0.25:8]. Instead of searching on a grid such as $\gamma = 1, 5, 10, \ldots, 1000$ for all sample sizes, this setup allows the grid to change as the sample size changes, so that it avoids unnecessary computation and thus accelerates our algorithm. We follow the method developed by Wang, Li, and Leng (2009) to determine the optimal choice of $\gamma$. The optimal $\gamma$ is the one that minimizes the following criterion:

$$\log\left[ (NT)^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} \left( X_{it} - \hat{\lambda}_i' \hat{F}_t^p \right)^2 \right] + \hat{r}(\gamma) \frac{N + T}{NT} \ln\left( \frac{NT}{N + T} \right) \cdot \ln[\ln(N)],$$

where $\hat{r}(\gamma)$ denotes the fact that the estimated number of factors is affected by the choice of $\gamma$. Note that the number of parameters in our model is proportional to $N$, so we use $\ln[\ln(N)]$ in the penalty, which has the same rate as suggested by Wang, Li, and Leng (2009).
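A sketch of this tuning-parameter search, assuming the pc_factors and group_bridge helpers from the earlier sketches (the grid and criterion follow the text; the wrapper itself is ours):

```python
import numpy as np

def select_gamma(X, p=10, alpha=0.25):
    """Choose gamma on the grid phi * [min(N, T)]**0.45, phi in [1.5:0.25:8]."""
    T, N = X.shape
    F_hat_p, _ = pc_factors(X, p)
    scale = min(N, T) ** 0.45
    best_crit, best_gamma, best_r = np.inf, None, None
    for phi in np.arange(1.5, 8.0 + 1e-9, 0.25):
        gamma = phi * scale
        Lam, r_hat = group_bridge(X, F_hat_p, gamma, alpha)
        ssr = np.sum((X - F_hat_p @ Lam.T) ** 2) / (N * T)
        crit = (np.log(ssr)
                + r_hat * (N + T) / (N * T) * np.log(N * T / (N + T)) * np.log(np.log(N)))
        if crit < best_crit:
            best_crit, best_gamma, best_r = crit, gamma, r_hat
    return best_gamma, best_r
```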

3. SIMULATIONS

In this section, we explore the finite sample properties of our estimator. We also compare our estimator with some of the estimators in the literature: ICp1 proposed by Bai and Ng (2002), ICT1;n proposed by Hallin and Liska (2007), ED proposed by Onatski (2010), and ER and GR proposed by Seung and Horenstein (2013).

Given the principal component estimator $\hat{F}^k$, Bai and Ng (2002) used OLS to estimate the factor loadings by minimizing the sum of squared residuals:

$$V(k, \hat{F}^k) = \min_{\Lambda} \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} \left( X_{it} - \lambda_i^{k\prime} \hat{F}_t^k \right)^2.$$

Then the ICp1 estimator is defined as the $k$ that minimizes the following criterion:

$$\mathrm{IC}_{p1}(k) = \log[V(k, \hat{F}^k)] + k \frac{N + T}{NT} \ln\left( \frac{NT}{N + T} \right).$$

The criterion can be considered the analog of the conventional BIC in the factor model context. It penalizes the integer $k$ to prevent the model from overfitting.
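For comparison, a small sketch of computing ICp1 over $k = 0, 1, \ldots, p$ with the pc_factors helper above (our own wrapper, not the authors' code):

```python
import numpy as np

def ic_p1(X, p=10):
    """Bai and Ng (2002) IC_p1: the k minimizing log V(k, F_hat_k) + k * penalty."""
    T, N = X.shape
    penalty = (N + T) / (N * T) * np.log(N * T / (N + T))
    best_k, best_val = 0, np.log(np.sum(X ** 2) / (N * T))   # k = 0: no factors
    for k in range(1, p + 1):
        F_hat, Lam_hat = pc_factors(X, k)
        V = np.sum((X - F_hat @ Lam_hat.T) ** 2) / (N * T)
        val = np.log(V) + k * penalty
        if val < best_val:
            best_k, best_val = k, val
    return best_k
```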

Hallin and Liska's (2007) estimator is designed to determine the number of dynamic factors (usually denoted by $q$ in the literature), but it is applicable in our Monte Carlo experiments where static and dynamic factors coincide by design, that is, $r = q$. Hallin and Liska's (2007) estimator follows the idea of Bai and Ng (2002), but their penalty term involves an additional scale parameter $c$. In theory, the choice of $c > 0$ does not matter as $N$ and $T \to \infty$. For a given $N$ and $T$, they propose an algorithm to select the scale parameter: $c$ is varied on a predetermined grid ranging from zero to some positive number $\bar{c}$, and their statistic is computed for each value of $c$ using $J$ nested subsamples ($i = 1, \ldots, N_j$; $t = 1, \ldots, T_j$) for $j = 1, \ldots, J$, where $J$ is a fixed positive integer, $0 < N_1 < \cdots < N_j < \cdots < N_J = N$ and $0 < T_1 < \cdots < T_j < \cdots < T_J = T$. Hallin and Liska (2007) showed that $[0, \bar{c}]$ will contain several intervals that generate estimates not varying across subsamples. They called these "stability intervals" and suggested that the $c$ belonging to the second stability interval is the optimal one (the first stability interval contains zero). Since the scale parameter in ICT1;n can adjust its penalty, it is expected that ICT1;n should be more robust to the correlation in the idiosyncratic errors than Bai and Ng's (2002) estimators, whose penalty solely depends on $N$ and $T$.

Onatski’s (2010) ED estimator is designed to select the num-ber of static factors, so it is directly comparable with our esti-mator. ED chooses the largestkthat belongs to the following set:

{kp: ρk(X′X/T)−ρk+1(X′X/T)> ζ},

whereζ is a fixed positive constant, andρk(.) denotes thekth

largest eigenvalue of a matrix. The choice ofζtakes into account the empirical distribution of the eigenvalues of the data sample covariance matrix,5so it is also a robust estimator to correlated error terms.
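A sketch of the ED rule for a user-supplied threshold; the calibration of $\zeta$ described in Onatski (2010) is not reproduced here, so zeta is simply an input:

```python
import numpy as np

def ed_estimator(X, zeta, p=10):
    """Onatski (2010) ED: the largest k <= p whose eigenvalue gap exceeds zeta."""
    T, N = X.shape
    rho = np.sort(np.linalg.eigvalsh(X.T @ X / T))[::-1]   # descending eigenvalues of X'X/T
    candidates = [k for k in range(1, p + 1) if rho[k - 1] - rho[k] > zeta]
    return max(candidates) if candidates else 0
```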

Recently, Seung and Horenstein (2013) proposed two estimators based on the eigenvalues of $X'X$. They defined

$$\mathrm{ER}(k) = \frac{\rho_k(X'X/NT)}{\rho_{k+1}(X'X/NT)}, \qquad \mathrm{GR}(k) = \frac{\ln[V(k-1)/V(k)]}{\ln[V(k)/V(k+1)]},$$

where $V(k) = \sum_{j=k+1}^{\min(N,T)} \rho_j(X'X/NT)$. The ER and GR estimators are the integers that maximize $\mathrm{ER}(k)$ and $\mathrm{GR}(k)$, respectively.⁶ Seung and Horenstein's (2013) results show that their estimators can outperform the ED estimator in many cases. Thus, it is expected that competing with ICT1;n, ED, ER, and GR will not be an easy task; however, simulations show that our estimator is still more accurate in certain cases, so it will be useful and valuable in practice.
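A direct transcription of ER and GR for $k = 1, \ldots, p$ (the helper name and the treatment of $V(0)$ as the sum of all eigenvalues are our own reading of the definitions above):

```python
import numpy as np

def er_gr_estimators(X, p=10):
    """Eigenvalue ratio (ER) and growth ratio (GR) estimators."""
    T, N = X.shape
    m = min(N, T)
    rho = np.sort(np.linalg.eigvalsh(X.T @ X / (N * T)))[::-1][:m]
    V = np.array([rho[k:].sum() for k in range(m + 1)])     # V[k] = sum_{j > k} rho_j
    ks = np.arange(1, p + 1)
    er = rho[ks - 1] / rho[ks]
    gr = np.log(V[ks - 1] / V[ks]) / np.log(V[ks] / V[ks + 1])
    return int(ks[np.argmax(er)]), int(ks[np.argmax(gr)])
```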

We conduct seven experiments in our Monte Carlo simulations. In the first six experiments, we set $\alpha = 0.25$. In the last experiment, we vary $\alpha$ between 0.1 and 0.4 to see the stability of our estimator. The computation of our estimator and the selection of the tuning parameter follow the steps outlined in Section 2.3. We consider the data-generating process (DGP)

⁵ See Onatski (2010) for the details of the computation of $\zeta$.

⁶ Ideas similar to the ER estimator have also been considered by Luo, Wang, and Tsai (2009) and Wang (2012).


similar to that of Bai and Ng (2002):

$$X_{it} = \sum_{j=1}^{r} \lambda_{ij} F_{jt} + \sqrt{\theta}\, e_{it},$$

where the factors $F_{jt}$'s are drawn from $N(0, 1)$ and the factor loadings $\lambda_{ij}$'s are drawn from $N(0.5, 1)$. $\theta$ will be selected to control the signal-to-noise ratio in the factor model. The following DGPs are applied to generate the error term $e_{it}$ in the seven different experiments:

E1: $e_{it} = \sigma_i u_{it}$, $u_{it} = \rho u_{i,t-1} + v_{it} + \sum_{1 \le |h| \le 5} \beta v_{i-h,t}$, where $v_{it} \sim$ iid $N(0, 1)$ and $\sigma_i$ is iid and uniformly distributed between 0.5 and 1.5.

E2: $e_{it} = v_{it} \|F_t\|$, where $v_{it} \sim$ iid $N(0, 1)$.

DGP E1 is similar to the setup in Bai and Ng (2002). The parameters $\beta$ and $\rho$ control the cross-sectional and serial correlations of the error terms, respectively. We also consider cross-sectional heteroscedasticity by introducing $\sigma_i$. The setup of $\sigma_i$ is the same as that in Breitung and Eickmeier (2011). Note that $E(\lambda_{ij} F_{jt})^2 = 5/4$ and $E(\sigma_i^2) = 13/12$. We set $\theta = \theta_0 = 15r(1 - \rho^2)/[13(1 + 10\beta^2)]$ in experiments 1–4 and 7 so that the factors explain 50% of the variation in the data. In experiment 6, we set $\theta = 2\theta_0$ to explore how the estimators perform in data with a lower signal-to-noise ratio. DGP E2 is applied in experiment 5, where we explore the case of conditional heteroscedasticity. In this experiment, we set $\theta = 5/4$ so that the $R^2$ is also 50%. All experiments are replicated 1000 times.
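As an illustration of the simulation design, a sketch of DGP E1 under the settings above (the burn-in for the AR recursion and the circular treatment of the cross-sectional neighbors at the boundary are our own simplifications):

```python
import numpy as np

def simulate_dgp_e1(N, T, r, beta, rho, seed=0, burn=100):
    """Generate a (T, N) panel X from the factor model with DGP E1 errors."""
    rng = np.random.default_rng(seed)
    F = rng.standard_normal((T, r))                 # factors ~ N(0, 1)
    Lam = 0.5 + rng.standard_normal((N, r))         # loadings ~ N(0.5, 1)
    sigma = rng.uniform(0.5, 1.5, size=N)           # cross-sectional heteroscedasticity
    v = rng.standard_normal((T + burn, N))
    u = np.zeros((T + burn, N))
    for t in range(1, T + burn):
        cross = np.zeros(N)
        for h in range(1, 6):                       # sum over 1 <= |h| <= 5 (circular at edges)
            cross += beta * (np.roll(v[t], h) + np.roll(v[t], -h))
        u[t] = rho * u[t - 1] + v[t] + cross
    e = sigma * u[burn:]                            # e_it = sigma_i * u_it
    theta = 15 * r * (1 - rho ** 2) / (13 * (1 + 10 * beta ** 2))
    return F @ Lam.T + np.sqrt(theta) * e
```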

The upper bound on the number of factors is 10, that is, $p = 10$. In experiments 1–4, we vary the values of $\beta$ and $\rho$. Table 1 summarizes the results of our first experiment with no correlation in the errors, that is, $\beta = \rho = 0$. The numbers outside the parentheses are the means of different estimators for the number of factors, and the numbers in (a|b) mean that a% of the replications produce overestimation, b% of the replications produce underestimation, and 1−a%−b% of the replications produce the correct estimation of the number of factors. It is not surprising that ICp1 is rather accurate with no correlation in $e$. When $r = 5$ and $T = 50$, our estimator is better than ICT1;n but less accurate than ED, ER, and GR. However, as $N$ and $T$ increase, our estimator can detect the correct number of factors almost 100% of the time. In contrast, ICT1;n still has a downward bias with $N = T = 200$, when $r = 3$ or 5.

In experiments 2–4, we consider three correlation structures in the error terms: cross-sectional correlation only ($\beta = 0.2$ and $\rho = 0$), serial correlation only ($\beta = 0$ and $\rho = 0.7$), and coexisting cross-sectional and serial correlations ($\beta = 0.1$ and $\rho = 0.6$). The results are reported in Tables 2–4, respectively. The results of these three experiments demonstrate very similar patterns. Bai and Ng's (2002) ICp1 tends to overestimate the number of factors. When $r = 1$, the estimates by ICp1 change dramatically as more data are introduced. All other estimators tend to underestimate the number of factors when $r = 3$ or 5 in small samples ($N$ or $T = 50$). However, our estimator is more accurate than ICT1;n and ED, especially when $r = 5$. Compared with ER and GR, our estimator also has some advantages when errors are serially correlated. For example, our estimator has uniformly smaller biases than GR except when $r = 1$ and $N = T = 50$ in Table 4.

In our fifth experiment, we consider conditionally heteroscedastic errors using DGP E2. The results are reported in Table 5. ICp1 substantially overestimates the number of factors when $r = 1$. Compared with ICT1;n, our estimator tends to be more accurate, especially in small samples. Also, none of ED, GR, and our estimator dominates the other two. Our estimator is more accurate than ED when $r = 1$, and more accurate than ER and GR when $r = 3$, whereas ER and GR tend to have smaller biases when $r = 1$ and $r = 5$.

Table 1. No cross-sectional or serial correlation, β = 0 and ρ = 0

r N T Bridge ICp1 ICT1;n ED ER GR

1 50 50 1.00 (0|0) 1.00 (0|0) 1.01 (1|0) 1.02 (2|0) 1.00 (0|0) 1.00 (0|0)
100 50 1.00 (0|0) 1.00 (0|0) 1.00 (0|0) 1.02 (1|0) 1.00 (0|0) 1.00 (0|0)
100 100 1.00 (0|0) 1.00 (0|0) 1.00 (0|0) 1.01 (1|0) 1.00 (0|0) 1.00 (0|0)
100 200 1.00 (0|0) 1.00 (0|0) 1.00 (0|0) 1.03 (1|0) 1.00 (0|0) 1.00 (0|0)
200 100 1.00 (0|0) 1.00 (0|0) 1.00 (0|0) 1.01 (1|0) 1.00 (0|0) 1.00 (0|0)
200 200 1.00 (0|0) 1.00 (0|0) 1.00 (0|0) 1.02 (2|0) 1.00 (0|0) 1.00 (0|0)
3 50 50 2.83 (0|13) 2.99 (0|1) 2.60 (1|25) 3.02 (1|0) 2.88 (0|8) 2.94 (0|4)
100 50 2.94 (0|5) 3.00 (0|0) 2.87 (0|8) 3.01 (1|0) 3.00 (0|0) 3.00 (0|0)
100 100 3.00 (0|0) 3.00 (0|0) 2.99 (0|1) 3.01 (1|0) 3.00 (0|0) 3.00 (0|0)
100 200 3.00 (0|0) 3.00 (0|0) 2.99 (0|0) 3.01 (1|0) 3.00 (0|0) 3.00 (0|0)
200 100 3.00 (0|0) 3.00 (0|0) 2.95 (0|4) 3.00 (0|0) 3.00 (0|0) 3.00 (0|0)
200 200 3.00 (0|0) 3.00 (0|0) 2.45 (0|37) 3.01 (1|0) 3.00 (0|0) 3.00 (0|0)
5 50 50 1.81 (0|98) 4.54 (0|40) 1.43 (0|94) 4.06 (1|28) 3.15 (0|59) 3.75 (0|45)
100 50 1.87 (0|98) 4.90 (0|10) 1.41 (0|94) 4.99 (0|1) 4.53 (0|15) 4.77 (0|8)
100 100 4.80 (0|9) 5.00 (0|0) 4.52 (0|20) 5.00 (0|0) 4.97 (0|1) 5.00 (0|0)
100 200 5.00 (0|0) 5.00 (0|0) 4.98 (0|1) 5.00 (0|0) 5.00 (0|0) 5.00 (0|0)
200 100 5.00 (0|0) 5.00 (0|0) 4.13 (0|30) 5.00 (0|0) 5.00 (0|0) 5.00 (0|0)
200 200 5.00 (0|0) 5.00 (0|0) 3.29 (0|77) 5.01 (0|0) 5.00 (0|0) 5.00 (0|0)

NOTE: The error terms are generated using DGP E1. β controls the cross-sectional correlation, and ρ controls the serial correlation of the errors. The factors explain 50% of the variation in the data. Our estimator is compared with Bai and Ng's (2002) ICp1, Hallin and Liska's (2007) ICT1;n, Onatski's (2010) ED, and Seung and Horenstein's (2013) ER and GR. The upper bound of the number of factors is set equal to 10, and r is the true number of factors. The numbers outside the parentheses are the means of different estimators over 1,000 replications, and the numbers in (a|b) mean that a% of the replications produce overestimation, b% of the replications produce underestimation, and 1−a%−b% of the replications produce the correct estimation of the number of factors.


Table 2. Cross-sectional correlation only, β = 0.2 and ρ = 0

r N T Bridge ICp1 ICT1;n ED ER GR

1 50 50 1.41 (28|0) 7.18 (100|0) 1.63 (44|0) 1.73 (17|0) 1.00 (0|0) 1.01 (0|0)
100 50 1.00 (0|0) 5.06 (96|0) 1.13 (9|0) 1.11 (6|0) 1.00 (0|0) 1.00 (0|0)
100 100 1.00 (0|0) 9.16 (100|0) 2.08 (44|0) 1.04 (3|0) 1.00 (0|0) 1.00 (0|0)
100 200 1.00 (0|0) 10.00 (100|0) 2.63 (40|0) 1.02 (1|0) 1.00 (0|0) 1.00 (0|0)
200 100 1.00 (0|0) 1.41 (34|0) 1.08 (4|0) 1.04 (4|0) 1.00 (0|0) 1.00 (0|0)
200 200 1.00 (0|0) 7.21 (100|0) 1.01 (1|0) 1.02 (1|0) 1.00 (0|0) 1.00 (0|0)
3 50 50 3.21 (31|12) 8.92 (100|0) 1.95 (12|65) 2.08 (11|60) 2.38 (9|54) 3.11 (21|41)
100 50 2.94 (0|6) 6.85 (97|0) 2.29 (4|43) 2.95 (3|7) 2.70 (0|19) 2.80 (0|13)
100 100 3.00 (0|0) 9.79 (100|0) 3.60 (40|2) 3.01 (1|0) 2.93 (0|5) 2.98 (0|2)
100 200 3.00 (0|0) 10.00 (100|0) 3.78 (33|0) 3.00 (0|0) 2.97 (0|1) 2.99 (0|1)
200 100 3.00 (0|0) 3.46 (37|0) 3.04 (5|0) 3.02 (1|0) 3.00 (0|0) 3.00 (0|0)
200 200 3.00 (0|0) 8.31 (100|0) 3.00 (0|0) 3.01 (1|0) 3.00 (0|0) 3.00 (0|0)
5 50 50 2.62 (1|90) 9.73 (100|0) 1.21 (0|99) 0.99 (3|96) 2.14 (11|86) 2.86 (20|76)
100 50 2.35 (0|94) 8.39 (97|0) 1.19 (0|98) 1.55 (1|83) 1.88 (1|84) 2.42 (3|74)
100 100 4.69 (3|20) 9.96 (100|0) 3.87 (19|52) 2.66 (0|56) 2.36 (0|71) 2.80 (1|61)
100 200 5.32 (28|2) 10.00 (100|0) 4.87 (30|39) 4.12 (0|21) 2.72 (0|59) 3.25 (0|45)
200 100 4.99 (0|1) 5.50 (38|0) 3.48 (0|55) 5.02 (1|0) 4.68 (0|8) 4.89 (0|3)
200 200 5.00 (0|0) 9.25 (100|0) 4.93 (0|4) 5.00 (0|0) 4.98 (0|0) 5.00 (0|0)

NOTE: The error terms are generated using DGP E1. β controls the cross-sectional correlation, and ρ controls the serial correlation of the errors. The factors explain 50% of the variation in the data. Our estimator is compared with Bai and Ng's (2002) ICp1, Hallin and Liska's (2007) ICT1;n, Onatski's (2010) ED, and Seung and Horenstein's (2013) ER and GR. The upper bound of the number of factors is set equal to 10, and r is the true number of factors. The numbers outside the parentheses are the means of different estimators over 1,000 replications, and the numbers in (a|b) mean that a% of the replications produce overestimation, b% of the replications produce underestimation, and 1−a%−b% of the replications produce the correct estimation of the number of factors.

Table 6 reports the results of our sixth experiment, where we consider a weaker factor structure by setting $\theta = 2\theta_0$, that is, factors only explain 1/3 of the variation in the data. The error terms are generated using DGP E1 with $\beta = 0.1$ and $\rho = 0.6$. It is expected that it is more difficult to estimate the correct number of factors in this setup. Bai and Ng's (2002) ICp1 is still severely upwardly biased. In addition, our estimator tends to perform better than ICT1;n and ED when $r = 1$ or 3. Compared with ER and GR, our estimator tends to have smaller biases when $N = 200$ or $T = 200$.

As mentioned previously, it is very difficult for our estimator to outperform the competitors uniformly. However, it can be seen from Tables 1–6 that our estimator performs better than the competing estimators under many circumstances, especially when the idiosyncratic shocks have substantial serial and cross-sectional correlations.

Table 3. Serial correlation only, β = 0 and ρ = 0.7

r N T Bridge ICp1 ICT1;n ED ER GR

1 50 50 1.09 (9|0) 9.03 (100|0) 1.33 (24|0) 1.19 (10|0) 1.00 (0|0) 1.00 (0|0)
100 50 1.02 (2|0) 9.52 (100|0) 1.21 (15|0) 1.13 (5|0) 1.00 (0|0) 1.00 (0|0)
100 100 1.00 (0|0) 6.90 (100|0) 1.34 (16|0) 1.05 (4|0) 1.00 (0|0) 1.00 (0|0)
100 200 1.00 (0|0) 1.13 (12|0) 1.00 (0|0) 1.04 (3|0) 1.00 (0|0) 1.00 (0|0)
200 100 1.00 (0|0) 9.78 (100|0) 1.11 (5|0) 1.02 (2|0) 1.00 (0|0) 1.00 (0|0)
200 200 1.00 (0|0) 1.91 (58|0) 1.00 (0|0) 1.03 (2|0) 1.00 (0|0) 1.00 (0|0)
3 50 50 3.07 (17|9) 9.69 (100|0) 2.40 (8|37) 2.18 (5|41) 2.43 (3|40) 2.72 (8|31)
100 50 3.03 (7|4) 9.92 (100|0) 2.61 (6|24) 2.61 (3|21) 2.58 (1|27) 2.76 (3|20)
100 100 3.00 (0|0) 8.47 (100|0) 3.04 (5|2) 3.03 (1|0) 2.98 (0|1) 2.99 (0|0)
100 200 3.00 (0|0) 3.15 (14|0) 3.00 (0|0) 3.02 (1|0) 3.00 (0|0) 3.00 (0|0)
200 100 3.00 (0|0) 9.96 (100|0) 3.02 (3|1) 3.01 (1|0) 3.00 (0|0) 3.00 (0|0)
200 200 3.00 (0|0) 3.91 (59|0) 3.00 (0|0) 3.01 (1|0) 3.00 (0|0) 3.00 (0|0)
5 50 50 2.74 (0|87) 9.92 (100|0) 1.32 (1|93) 0.89 (2|96) 2.20 (7|86) 2.79 (13|78)
100 50 3.01 (0|82) 10.00 (100|0) 1.19 (1|93) 0.88 (1|95) 1.84 (4|86) 2.51 (9|77)
100 100 4.82 (1|11) 9.35 (100|0) 3.42 (1|56) 3.92 (1|26) 3.38 (0|45) 3.86 (0|33)
100 200 5.00 (0|0) 5.14 (13|0) 4.91 (0|4) 5.01 (1|0) 4.82 (0|5) 4.96 (0|1)
200 100 5.00 (0|1) 9.99 (100|0) 3.59 (0|46) 4.88 (0|3) 4.11 (0|23) 4.53 (0|12)
200 200 5.00 (0|0) 5.91 (60|0) 4.98 (0|1) 5.01 (0|0) 5.00 (0|0) 5.00 (0|0)

NOTE: The error terms are generated using DGP E1. β controls the cross-sectional correlation, and ρ controls the serial correlation of the errors. The factors explain 50% of the variation in the data. Our estimator is compared with Bai and Ng's (2002) ICp1, Hallin and Liska's (2007) ICT1;n, Onatski's (2010) ED, and Seung and Horenstein's (2013) ER and GR. The upper bound of the number of factors is set equal to 10, and r is the true number of factors. The numbers outside the parentheses are the means of different estimators over 1,000 replications, and the numbers in (a|b) mean that a% of the replications produce overestimation, b% of the replications produce underestimation, and 1−a%−b% of the replications produce the correct estimation of the number of factors.


Table 4. Both cross-sectional and serial correlation, β = 0.1 and ρ = 0.6

r N T Bridge ICp1 ICT1;n ED ER GR

1 50 50 1.11 (9|0) 7.90 (100|0) 1.37 (26|0) 1.23 (13|0) 1.00 (0|0) 1.00 (0|0)
100 50 1.00 (0|0) 7.44 (100|0) 1.15 (12|0) 1.13 (9|0) 1.00 (0|0) 1.00 (0|0)
100 100 1.00 (0|0) 3.96 (95|0) 1.37 (19|0) 1.07 (5|0) 1.00 (0|0) 1.00 (0|0)
100 200 1.00 (0|0) 2.61 (84|0) 1.06 (3|0) 1.08 (5|0) 1.00 (0|0) 1.00 (0|0)
200 100 1.00 (0|0) 3.20 (89|0) 1.05 (3|0) 1.06 (4|0) 1.00 (0|0) 1.00 (0|0)
200 200 1.00 (0|0) 1.75 (56|0) 1.00 (0|0) 1.05 (4|0) 1.00 (0|0) 1.00 (0|0)
3 50 50 3.06 (18|11) 9.04 (100|0) 2.40 (8|39) 2.32 (8|37) 2.39 (4|42) 2.68 (9|32)
100 50 2.97 (1|4) 9.02 (100|0) 2.71 (5|18) 2.87 (5|10) 2.65 (0|23) 2.80 (1|14)
100 100 3.00 (0|0) 5.98 (96|0) 3.09 (9|1) 3.04 (3|0) 2.98 (0|1) 2.99 (0|1)
100 200 3.00 (0|0) 4.51 (84|0) 3.04 (4|0) 3.04 (3|0) 3.00 (0|0) 3.00 (0|0)
200 100 3.00 (0|0) 5.37 (93|0) 3.00 (1|1) 3.02 (2|0) 3.00 (0|0) 3.00 (0|0)
200 200 3.00 (0|0) 3.75 (56|0) 3.00 (0|0) 3.02 (1|0) 3.00 (0|0) 3.00 (0|0)
5 50 50 2.65 (0|89) 9.58 (100|0) 1.35 (1|94) 1.02 (2|94) 2.08 (6|87) 2.59 (11|79)
100 50 2.75 (0|86) 9.68 (100|0) 1.42 (0|89) 1.32 (1|86) 1.98 (2|82) 2.50 (4|74)
100 100 4.81 (1|12) 7.76 (96|0) 3.57 (1|54) 3.87 (2|28) 3.24 (0|47) 3.80 (1|34)
100 200 5.03 (4|1) 6.50 (83|0) 4.88 (0|7) 4.95 (2|2) 4.45 (0|14) 4.71 (0|8)
200 100 4.99 (0|1) 7.45 (92|0) 4.43 (0|21) 4.99 (1|1) 4.50 (0|13) 4.81 (0|5)
200 200 5.00 (0|0) 5.74 (56|0) 4.99 (0|1) 5.00 (0|0) 5.00 (0|0) 5.00 (0|0)

NOTE: The error terms are generated using DGP E1. β controls the cross-sectional correlation, and ρ controls the serial correlation of the errors. The factors explain 50% of the variation in the data. Our estimator is compared with Bai and Ng's (2002) ICp1, Hallin and Liska's (2007) ICT1;n, Onatski's (2010) ED, and Seung and Horenstein's (2013) ER and GR. The upper bound of the number of factors is set equal to 10, and r is the true number of factors. The numbers outside the parentheses are the means of different estimators over 1,000 replications, and the numbers in (a|b) mean that a% of the replications produce overestimation, b% of the replications produce underestimation, and 1−a%−b% of the replications produce the correct estimation of the number of factors.

In the last experiment, we vary the value of α to check the stability of our estimator. We set α ∈ {0.1, 0.15, 0.25, 0.35, 0.4}, (β, ρ) ∈ {(0, 0), (0.1, 0.6)}, and N = T = 100. When r = 0, θ is set equal to 15(1 − ρ²)/[13(1 + 10β²)]. Generally speaking, our estimator is rather stable to the choice of α. When r = 1 or 3, our estimator almost always gives the correct result. When r = 5, our estimator performs well for all values of α except for α = 0.4: the estimate is 3.36 when the errors are correlated. When r = 0, our estimator performs well for all values of α except for α = 0.1: it has an upward bias when the errors are correlated. However, our simulation (not reported in the table) confirms that these biases vanish as N and T increase. These results indicate that it is better to avoid very small (close to zero) or very large (close to 0.5) values of α for small samples in practice.

Table 5. Conditionally heteroscedastic errors

r N T Bridge ICp1 ICT1;n ED ER GR

1 50 50 1.22 (17|0) 8.34 (97|0) 1.34 (25|0) 1.28 (20|0) 1.00 (0|0) 1.01 (1|0)
100 50 1.17 (15|0) 8.58 (98|0) 1.30 (22|0) 1.36 (23|0) 1.00 (0|0) 1.00 (0|0)
100 100 1.04 (4|0) 4.22 (89|0) 1.50 (29|0) 1.29 (21|0) 1.00 (0|0) 1.00 (0|0)
100 200 1.00 (0|0) 1.68 (48|0) 1.13 (8|0) 1.19 (16|0) 1.00 (0|0) 1.00 (0|0)
200 100 1.06 (5|0) 5.79 (96|0) 1.32 (21|0) 1.32 (22|0) 1.00 (0|0) 1.00 (0|0)
200 200 1.01 (1|0) 2.94 (83|0) 1.07 (6|0) 1.29 (22|0) 1.00 (0|0) 1.00 (0|0)
3 50 50 2.82 (0|16) 3.30 (23|0) 2.18 (3|48) 2.93 (5|8) 2.66 (0|23) 2.77 (1|16)
100 50 2.96 (0|4) 3.30 (23|0) 2.54 (3|26) 3.08 (8|1) 2.89 (0|8) 2.94 (1|5)
100 100 3.00 (0|0) 3.02 (2|0) 3.07 (8|1) 3.06 (5|0) 3.00 (0|0) 3.00 (0|0)
100 200 3.00 (0|0) 3.00 (0|0) 3.00 (0|0) 3.04 (4|0) 3.00 (0|0) 3.00 (0|0)
200 100 3.00 (0|0) 3.07 (7|0) 3.01 (1|0) 3.10 (9|0) 3.00 (0|0) 3.00 (0|0)
200 200 3.00 (0|0) 3.00 (0|0) 2.99 (0|1) 3.09 (8|0) 3.00 (0|0) 3.00 (0|0)
5 50 50 1.57 (0|100) 4.61 (3|37) 1.31 (0|97) 2.25 (1|69) 2.31 (0|79) 2.96 (1|67)
100 50 1.62 (0|100) 4.94 (1|7) 1.39 (0|93) 4.05 (1|24) 3.34 (0|48) 3.98 (1|33)
100 100 4.53 (0|19) 5.00 (0|0) 4.08 (0|36) 5.01 (2|0) 4.75 (0|7) 4.89 (0|3)
100 200 5.00 (0|0) 5.00 (0|0) 4.92 (0|4) 5.03 (2|0) 5.00 (0|0) 5.00 (0|0)
200 100 4.98 (0|1) 5.00 (0|0) 4.63 (0|13) 5.04 (3|0) 4.99 (0|0) 5.00 (0|0)
200 200 5.00 (0|0) 5.00 (0|0) 4.96 (0|2) 5.04 (4|0) 5.00 (0|0) 5.00 (0|0)

NOTE: The conditionally heteroscedastic error terms are generated using DGP E2. The factors explain 50% of the variation in the data. Our estimator is compared with Bai and Ng's (2002) ICp1, Hallin and Liska's (2007) ICT1;n, Onatski's (2010) ED, and Seung and Horenstein's (2013) ER and GR. The upper bound of the number of factors is set equal to 10, and r is the true number of factors. The numbers outside the parentheses are the means of different estimators over 1,000 replications, and the numbers in (a|b) mean that a% of the replications produce overestimation, b% of the replications produce underestimation, and 1−a%−b% of the replications produce the correct estimation of the number of factors.


Table 6. Weaker factor structure, β = 0.1, ρ = 0.6, and R² = 33.3%

r N T Bridge ICp1 ICT1;n ED ER GR

1 50 50 1.10 (10|0) 6.80 (99|0) 1.31 (24|0) 1.23 (14|0) 1.00 (0|0) 1.00 (0|0)
100 50 1.00 (0|0) 6.76 (100|0) 1.17 (14|0) 1.13 (8|0) 1.00 (0|0) 1.00 (0|0)
100 100 1.00 (0|0) 3.26 (91|0) 1.44 (19|0) 1.07 (5|0) 1.00 (0|0) 1.00 (0|0)
100 200 1.00 (0|0) 2.09 (71|0) 1.12 (4|0) 1.09 (5|0) 1.00 (0|0) 1.00 (0|0)
200 100 1.00 (0|0) 2.66 (82|0) 1.08 (4|0) 1.04 (4|0) 1.00 (0|0) 1.00 (0|0)
200 200 1.00 (0|0) 1.45 (39|0) 1.00 (0|0) 1.04 (3|0) 1.00 (0|0) 1.00 (0|0)
3 50 50 1.85 (5|73) 8.22 (98|0) 1.40 (4|81) 1.10 (5|83) 1.66 (7|77) 2.08 (14|67)
100 50 1.80 (0|73) 8.38 (99|0) 1.48 (2|75) 1.44 (3|70) 1.50 (1|76) 1.88 (5|65)
100 100 2.78 (0|16) 5.22 (90|0) 2.93 (7|14) 2.78 (4|13) 2.41 (0|33) 2.59 (0|24)
100 200 2.99 (0|1) 4.01 (68|0) 3.07 (4|1) 3.04 (3|0) 2.84 (0|9) 2.92 (0|5)
200 100 2.99 (0|1) 4.79 (83|0) 2.88 (2|10) 3.02 (2|0) 2.88 (0|7) 2.95 (0|3)
200 200 3.00 (0|0) 3.42 (35|0) 3.00 (0|0) 3.01 (1|0) 3.00 (0|0) 3.00 (0|0)
5 50 50 0.81 (0|100) 8.43 (89|5) 0.97 (0|99) 0.69 (1|99) 0.82 (1|99) 1.12 (3|96)
100 50 0.66 (0|100) 8.83 (96|2) 0.83 (0|99) 0.75 (0|99) 0.50 (0|100) 0.61 (0|100)
100 100 0.95 (0|100) 6.66 (81|2) 2.12 (3|87) 1.07 (1|95) 0.78 (0|99) 0.98 (0|98)
100 200 1.44 (0|98) 5.83 (61|0) 3.83 (2|51) 2.22 (1|67) 0.93 (0|95) 1.30 (0|89)
200 100 1.15 (0|99) 6.68 (81|0) 2.21 (0|85) 2.43 (0|62) 0.84 (0|96) 1.24 (0|89)
200 200 4.61 (0|11) 5.46 (39|0) 4.00 (0|37) 4.95 (1|2) 3.45 (0|38) 4.02 (0|25)

NOTE: The error terms are generated using DGP E1. β controls the cross-sectional correlation, and ρ controls the serial correlation of the errors. The factors explain 33.3% of the variation in the data. Our estimator is compared with Bai and Ng's (2002) ICp1, Hallin and Liska's (2007) ICT1;n, Onatski's (2010) ED, and Seung and Horenstein's (2013) ER and GR. The upper bound of the number of factors is set equal to 10, and r is the true number of factors. The numbers outside the parentheses are the means of different estimators over 1,000 replications, and the numbers in (a|b) mean that a% of the replications produce overestimation, b% of the replications produce underestimation, and 1−a%−b% of the replications produce the correct estimation of the number of factors.

4. EMPIRICAL APPLICATION

In this section, we apply our method to the dataset used by Stock and Watson (2005). The dataset consists of 132 U.S. macroeconomic time series, spanning the period 1960.1–2003.12. The variables are transformed to achieve stationarity and then outliers are adjusted as described in Appendix A of Stock and Watson (2005). We set the upper bound on the number of factors equal to 10 and apply various methods to the data. First, one should be careful about the interpretation of the result from Hallin and Liska's (2007) ICT1;n. Unlike in our simulations, where the dynamic factors are by design the same as static factors, the numbers of static and dynamic factors are not necessarily the same in practice. Hence, ICT1;n is inconclusive with regard to the number of static factors in the data. It finds five dynamic factors; however, this only implies that the number of static factors is no less than five. Alessi, Barigozzi, and Capasso's (2010) estimator, which is the static factor version of Hallin and Liska's estimator, suggests that there are two static factors. Second, Bai and Ng's (2002) ICp1 and PCp1 find seven and nine static factors, respectively. Since the dataset consists of monthly observations on macroeconomic variables from several categories (industrial production indexes, price indexes, employment, housing starts, asset returns, etc.), the error terms are likely to be correlated in both time and cross-sectional dimensions. Our simulations have already shown that ICp1 and PCp1 tend to overestimate the correct number of factors when the errors are correlated. The point was also made by Uhlig (2009) that the high number of factors found by ICp1 and PCp1 may be due to the high temporal persistence of the data. In contrast, Onatski's (2010) ED and Seung and Horenstein's (2013) ER and GR detect only one static factor. Additionally, we

Table 7. Stability to the choice of α (N = T = 100)

β ρ α r=0 r=1 r=3 r=5

0 0 0.10 0.00 (0|0) 1.00 (0|0) 3.00 (0|0) 4.88 (0|9)

0 0 0.15 0.00 (0|0) 1.00 (0|0) 3.00 (0|0) 4.85 (0|11)

0 0 0.25 0.00 (0|0) 1.00 (0|0) 3.00 (0|0) 4.80 (0|9)

0 0 0.35 0.00 (0|0) 1.00 (0|0) 3.00 (0|0) 4.90 (0|3)

0 0 0.40 0.00 (0|0) 1.00 (0|0) 3.00 (0|0) 4.81 (0|5)

0.1 0.6 0.10 0.26 (22|0) 1.04 (4|0) 3.00 (0|0) 4.93 (1|7)

0.1 0.6 0.15 0.05 (5|0) 1.00 (0|0) 3.00 (0|0) 4.87 (0|11)

0.1 0.6 0.25 0.00 (0|0) 1.00 (0|0) 3.00 (0|0) 4.81 (1|12)

0.1 0.6 0.35 0.00 (0|0) 1.00 (0|0) 3.00 (0|0) 4.58 (2|15)

0.1 0.6 0.40 0.00 (0|0) 1.00 (0|0) 3.00 (0|0) 3.36 (8|46)

NOTE: The error terms are generated using DGP E1. β controls the cross-sectional correlation, and ρ controls the serial correlation of the errors. The factors explain 50% of the variation in the data when r > 0. The upper bound of the number of factors is set equal to 10, and r is the true number of factors. The numbers outside the parentheses are the means of different estimators over 1000 replications, and the numbers in (a|b) mean that a% of the replications produce overestimation, b% of the replications produce underestimation, and 1−a%−b% of the replications produce the correct estimation of the number of factors.
