*Corresponding author. Tel.: +1-225-388-3782; fax: +1-225-388-3807. E-mail address: [email protected] (M.W. McCracken).
Robust out-of-sample inference
Michael W. McCracken*
Department of Economics, Louisiana State University, 2107 CEBA, Baton Rouge, LA 70803-0306, USA
Received 25 September 1998; received in revised form 29 November 1999; accepted 13 March 2000
Abstract
This paper presents analytical, empirical and simulation results concerning inference about the moments of nondifferentiable functions of out-of-sample forecasts and forecast errors. Special attention is given to the measurement of a model's predictive ability using the test of equal mean absolute error. Tests for equal mean absolute error and mean square error are used to evaluate predictions of excess returns to the S&P 500 composite. Simulations indicate that appropriately constructed tests for equal mean absolute error can provide more accurately sized and more powerful tests than inappropriately constructed tests for equal mean absolute error and mean square error. © 2000 Elsevier Science S.A. All rights reserved.
JEL classification: C52; C53; C32; C12
Keywords: Forecasting; Forecast evaluation; Hypothesis testing; Model comparison
1. Introduction
It is becoming common to evaluate a forecasting model's ability to predict using out-of-sample methods. Meese and Rogoff (1983), in predicting exchange rates, report the mean square error (MSE) of forecast errors. Akgiray (1989) uses the mean absolute error (MAE) to evaluate volatility forecasts of stock returns. Engel (1994) reports the number of times the direction of change in exchange rates is accurately predicted. Swanson and White (1995) report the Schwarz information criterion as well as the out-of-sample $R^2$ that result when forward interest rates are used to predict future spot rates.
These papers, and many others, evaluate predictive ability in one of two ways. Most do so by simply constructing point estimates of some measure of predictive ability. The most common measure is MSE. A few others argue heuristically that their tests of predictive ability are limiting normal and hence asymptotically valid t-statistics can be used to test hypotheses. For example, Pagan and Schwert (1990) and Fair and Shiller (1990) construct regression based tests for efficiency and encompassing respectively. However, they do not provide a set of sufficient conditions for their statistics to be asymptotically standard normal. Recent theoretical work has attempted to provide those sufficient conditions.
When parametric forecasts and forecast errors are used to estimate moments or conduct inference there are two sources of uncertainty. There is uncertainty that exists even when we know the model parameters and there is uncertainty due to the estimation of parameters. Diebold and Mariano (1995) show how to construct asymptotically valid out-of-sample tests of predictive ability when there is no parameter uncertainty, for example, when parameters are known. Under this restriction, they are able to construct tests of hypotheses that involve moments of differentiable and nondifferentiable functions such as those used to construct tests for equal MSE and equal MAE between two predictive models. When parameters are unknown, and must be estimated, parameter uncertainty can play a role in out-of-sample inference. West (1996) has shown how the uncertainty due to parameter estimation can affect the asymptotic distribution of moments of differentiable functions of out-of-sample forecasts and forecast errors. Given a parametric forecasting model, this allows for inference concerning tests of serial correlation, efficiency, encompassing, zero mean prediction error and equal MSE between two predictive models.
In this paper I close some of the gaps between the work by Diebold and Mariano (1995) and West (1996). I extend the work by Diebold and Mariano (1995) by showing that parameter uncertainty can affect out-of-sample inference regarding moments of nondifferentiable functions. As in West (1996), the parameter uncertainty causes the limiting covariance structure to be nonstandard. The limiting covariance matrix contains two components: a standard component that would exist if the parameters used to construct forecasts were known in advance and a second component due to the fact that parameters are not known and have to be estimated.
measure of predictive ability is nondifferentiable. Secondly, I allow model parameters to be estimated using loss functions that are not differentiable. By doing so I permit a greater degree of freedom in choosing the loss function used to estimate the parameters to match the loss function used to evaluate the forecasts. This may be beneficial in light of the discussion in Weiss (1996).
These extensions are potentially useful since nonsmooth measures of predictive ability have been used to evaluate parametric predictive models. Granger (1969) provides an early theoretical discussion. Empirical examples are plentiful. Gerlow et al. (1993) use MAE to evaluate predictive ability. Swanson and White (1997) use mean absolute percentage error (MAPE) to measure predictive ability. Stekler (1991) compares the predictive ability of two parametric models using the test of percent better, or what Diebold and Mariano (1995) refer to as the 'sign test'. Engel (1994) constructs a test for sign predictability based upon the binomial distribution. Henriksson and Merton (1981) and Pesaran and Timmermann (1992) construct tests for sign predictive ability using a standard normal approximation.
Each of the measures of predictive ability mentioned above can be used to construct tests of forecast accuracy. As presented though, most ignore the possibility that the forecasts are generated parametrically and hence may be affected by parameter uncertainty. The results of West and McCracken (1998), concerning smooth measures of predictive ability, suggest that in many circumstances it is inappropriate to ignore the parameter uncertainty.
In this paper I provide analytical, empirical and simulation results indicating that ignoring parameter uncertainty can be inappropriate when nonsmooth measures of predictive ability are used. I focus on the test of equal MAE as an example in which accounting for parameter uncertainty can be important. Although I emphasize the absolute value function, the asymptotic results are applicable to tests that use indicator functions.
For the results of this paper to hold, however, certain conditions must be met. Perhaps the most important is Assumption 4. There I assume that the expectation of the function of interest must be continuously differentiable in the parameters. This assumption is not very restrictive when the absolute value function is being used and is the reason I use the test of equal MAE as a foil throughout the paper. It can be a problem when indicator functions are used. In particular it can be a problem for tests of sign predictability. See the discussion following Assumption 4 for further detail.
An appendix available upon request from the author presents details of proofs omitted from the paper to save space.
2. Theoretical results
This section presents sufficient conditions for asymptotic inference about the moments of functions of out-of-sample forecasts and forecast errors. These conditions will suffice to show, in Theorem 2.3.1, that out-of-sample averages consistently estimate population means, and when appropriately scaled are asymptotically normal. These conditions also suffice to show, in Theorem 2.3.2, that the limiting covariance structure can be consistently estimated by a straightforward application of Slutsky's theorem.
For any function $f$, $f_{t,\tau}(\hat\beta_t)$ will denote the parametric estimate of $f_{t+\tau}(\beta^*)$. Also, in order to minimize notation, $f_{t+\tau}$ will denote $f_{t+\tau}(\beta^*)$.
2.1. Environment
Throughout it is assumed that $\{X_s\}_{s=1}^{T+\tau}$ is a given sample of observables. The latter portion of that sample contains a continuous stream of $P$ $\tau$-step ahead forecasts. The first forecast, $y_{R,\tau}(\hat\beta_R)$, is based upon a parameter vector estimated using observations $s = 1, \ldots, R$. Further forecasts, $y_{t,\tau}(\hat\beta_t)$, are each constructed using an estimated parameter vector that is based on observations $s = 1, \ldots, t$, $R \le t \le T \equiv R + P - \tau$. The time period for which the $P$ forecasts are generated will be referred to as the out-of-sample period.
As in West and McCracken (1998) I will allow for three different forecasting schemes. The recursive, rolling and fixed forecasting schemes differ in how they construct the sequence of parameter estimates used to construct the sequence of forecasts and forecast errors. A brief description is given below.
Keim and Stambaugh (1986) use the recursive scheme. Under this scheme a sequence of forecasts is generated using updated parameter estimates. At each time $t = R, \ldots, T$ the parameter estimate $\hat\beta_t$ depends explicitly on all observables from $s = 1, \ldots, t$. If OLS is used to estimate the parameters from a scalar linear model with regressors $Z_s$ and predictand $y_s$ then $\hat\beta_t = (t^{-1}\sum_{s=1}^{t}Z_sZ_s')^{-1}(t^{-1}\sum_{s=1}^{t}Z_sy_s)$. The first forecast is then of the form $y_{R,\tau}(\hat\beta_R)$. The second forecast, $y_{R+1,\tau}(\hat\beta_{R+1})$, is constructed similarly using observations $s = 1, \ldots, R+1$. This process is iterated $P$ times so that for each $t \in [R, T]$, the parameter estimates use observations $s \in [1, t]$.
Chen and Swanson (1996) use the rolling scheme. Under this scheme the sequence of parametric forecasts is constructed in much the same way as the recursive scheme. The rolling scheme differs from the recursive in its treatment of observations from the distant past. The rolling scheme uses only a window of the $R$ most recent observations; earlier observations are not used in estimating the parameters. If OLS is used to estimate the parameters from a scalar linear model with regressors $Z_s$ and predictand $y_s$ then $\hat\beta_t = (R^{-1}\sum_{s=t-R+1}^{t}Z_sZ_s')^{-1}(R^{-1}\sum_{s=t-R+1}^{t}Z_sy_s)$. This implies that the first rolling forecast, $y_{R,\tau}(\hat\beta_R)$, and forecast error are identical to those for the recursive. The second rolling forecast, $y_{R+1,\tau}(\hat\beta_{R+1})$, is constructed using only observations $s = 2, \ldots, R+1$ to estimate the model parameters. This implies that the second rolling forecast and forecast error are distinct from those using the recursive scheme. The process is iterated $P$ times such that for each $t \in [R, T]$ the parameter estimates use observations $s \in [t-R+1, t]$.

1 Notice that the fixed and rolling parameter estimates should be subscripted both by $t$ and $R$. In order to simplify the notation the subscript $R$ will be suppressed.
Kuan and Liu (1995) use the fixed scheme. This method is distinct from the previous two in that the parameters are not updated when new observations become available. Since the parameter vector is estimated only once, each of the $P$ forecasts, $y_{t,\tau}(\hat\beta_R)$, uses the same parameter estimate.1 If OLS is used to estimate the parameters using regressors $Z_s$ and predictand $y_s$ then $\hat\beta_t = (R^{-1}\sum_{s=1}^{R}Z_sZ_s')^{-1}(R^{-1}\sum_{s=1}^{R}Z_sy_s)$. Hence for each forecast from time $t \in [R, T]$, the parameter estimate only uses observations $s \in [1, R]$.
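For a scalar linear model estimated by OLS, the three schemes can be sketched as follows; this is a minimal illustration for the one-step horizon, and the function and variable names are mine, not the paper's:

```python
import numpy as np

def ols(Z, y):
    # beta_hat = (sum Z_s Z_s')^{-1} (sum Z_s y_s)
    return np.linalg.solve(Z.T @ Z, Z.T @ y)

def forecasts(Z, y, R, scheme):
    """One-step ahead forecasts of y[t] for t = R,...,n-1 (0-based indexing).
    The parameter vector is re-estimated at each forecast origin according to:
      recursive: observations [0, t)   (expanding window)
      rolling:   observations [t-R, t) (window of the R most recent)
      fixed:     observations [0, R)   (estimated once)"""
    n = len(y)
    out = []
    for t in range(R, n):
        if scheme == "recursive":
            lo, hi = 0, t
        elif scheme == "rolling":
            lo, hi = t - R, t
        elif scheme == "fixed":
            lo, hi = 0, R
        else:
            raise ValueError(scheme)
        beta = ols(Z[lo:hi], y[lo:hi])
        out.append(Z[t] @ beta)
    return np.array(out)
```

Note that at the first origin all three windows coincide, so the first rolling and fixed forecasts equal the first recursive forecast, as stated in the text.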
Since we are ultimately interested in conducting inference concerning the population moments of functions of parametric forecasts and forecast errors, a description of these functions is in order. The function

$f_{t,\tau}(\hat\beta_t) \equiv f(\tau, X_t, \hat\beta_t) \quad (l \times 1)$   (1)

depends upon three arguments. The first is a finite forecast horizon, $\tau \ge 1$. The second, $X_t$, is a finite dimensioned vector of observables. The dating of the subscript $t$ is not meaningful. For example, if we are interested in the one-step ahead MAE from a scalar linear regression model, $f_{t,1}(\hat\beta_t) = |y_{t+1} - Z_t'\hat\beta_t|$. Since the realized scalar left-hand side variable is $y_{t+1}$, and the variables used for prediction are $Z_t$, $X_t = (y_{t+1}, Z_t')'$.
The third argument, $\hat\beta_t$, is an estimate of a $(k \times 1)$ unknown parameter vector $\beta^*$. When the inference to be conducted is simply a diagnostic of a single parametric model, such as the test of zero median error for which $f_{t,1}(\hat\beta_t) = 1\{y_{t+1} - Z_t'\hat\beta_t \le 0\}$, $\beta^*$ is the vector of parameters that index that particular parametric model. On the other hand, if the inference to be conducted is meant to detect which of two nonnested competing models is more accurate, $\beta^*$ is formed by stacking the vector of parameters that index each of the two models. For example, suppose that we are interested in comparing the one-step ahead MAE from two scalar nonnested linear regression models. If we let $i = 1, 2$ index the two models (along with their respective regressors and parameter estimates), $f_{t,1}(\hat\beta_t) = |y_{t+1} - Z_{1,t}'\hat\beta_{1,t}| - |y_{t+1} - Z_{2,t}'\hat\beta_{2,t}|$.
Given such a function, we are interested in testing (say) the scalar null hypothesis $H_0$: $Ef_{t+\tau} = \theta_0$ for some finite $\theta_0$. To do so, we will focus on test statistics of the form $\hat\Omega^{-0.5}P^{-0.5}\sum_{t=R}^{T}(f_{t,\tau}(\hat\beta_t) - Ef_{t+\tau})$ where $\hat\Omega$ is a consistent estimate of the appropriate limiting variance. In Theorem 2.3.1 it is shown that this statistic is asymptotically standard normal and hence asymptotically valid inference can be conducted using standard normal tables.
2.2. Assumptions
Within the following, for any matrix $A$, $|A| = \max_{i,j}|a_{i,j}|$, $\|\cdot\|_Q$ is the $L^Q$ norm, $\sup_t$ denotes $\sup_{R \le t \le T}$, and for $h_t(\beta)$ defined in Assumption 1,

$g_t(\beta) = [(f_{t,\tau}(\beta) - Ef_{t,\tau}(\beta))', h_t(\beta)']'.$   (2)
Assumption 1. The estimate $\hat\beta_t$ satisfies $\hat\beta_t - \beta^* = B(t)H(t)$, where $B(t)$ is $(k \times q)$ and $H(t)$ is $(q \times 1)$, with (a) $B(t) \to_{a.s.} B$, $B$ a matrix of rank $k$, (b) $H(t) = t^{-1}\sum_{s=1}^{t}h_s$, $R^{-1}\sum_{s=t-R+1}^{t}h_s$ and $R^{-1}\sum_{s=1}^{R}h_s$ for the recursive, rolling and fixed schemes respectively, for the orthogonality condition $h_s \equiv h_s(\beta^*)$, and (c) $Eh_s = 0$.
Assumption 1 provides for a wide range of methods of estimating parameters. In particular, it allows for maximum likelihood, nonlinear least squares and a range of generalized method of moments estimators. It allows for linear and nonlinear models as well as single and multiple equation systems.
As an example of the notation in Assumption 1 consider that our statistic is used to test for equal MAE between two competing linear models. Suppose that each of the two models, for $y_{t+1}$, has the representation $y_{t+1} = Z_{i,t}'\beta_i^* + u_{i,t+1}$ for $i = 1, 2$. Consider further that for each $i = 1, 2$ OLS provides a consistent estimate of $\beta_i^*$ ($k_i \times 1$). Since there are two sets of parameters needed to construct this test, $\hat\beta_t = (\hat\beta_{1,t}', \hat\beta_{2,t}')'$ ($k_1 + k_2 = k \times 1$), and hence $B$ ($k \times q$, $q = q_1 + q_2$, $q_1 = k_1$, $q_2 = k_2$) and $h_s$ ($q \times 1$) are

$B = \begin{pmatrix} (EZ_{1,t}Z_{1,t}')^{-1} & 0_{k_1 \times q_2} \\ 0_{k_2 \times q_1} & (EZ_{2,t}Z_{2,t}')^{-1} \end{pmatrix}, \quad h_s = \begin{pmatrix} u_{1,s+1}Z_{1,s} \\ u_{2,s+1}Z_{2,s} \end{pmatrix}.$   (3)
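For this two-model OLS example, the pieces of (3) can be computed directly from data. A sketch, under the assumption that the residual series are already aligned so that row $s$ holds $u_{i,s+1}$ (the function names are illustrative):

```python
import numpy as np

def h_stack(Z1, u1, Z2, u2):
    # h_s = (u_{1,s+1} Z_{1,s}', u_{2,s+1} Z_{2,s}')', stacked over s
    return np.column_stack([u1[:, None] * Z1, u2[:, None] * Z2])

def B_hat(Z1, Z2):
    # block-diagonal B with blocks (E Z_i Z_i')^{-1}, estimated by sample means
    B1 = np.linalg.inv(Z1.T @ Z1 / len(Z1))
    B2 = np.linalg.inv(Z2.T @ Z2 / len(Z2))
    k1, k2 = B1.shape[0], B2.shape[0]
    B = np.zeros((k1 + k2, k1 + k2))
    B[:k1, :k1] = B1
    B[k1:, k1:] = B2
    return B
```

When the $u_i$ are full-sample OLS residuals, each column of the stacked scores has sample mean zero by construction, mirroring the orthogonality condition in Assumption 1(c).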
Assumption 2. $R, P \to \infty$ as $T \to \infty$, and $\lim_{T\to\infty} P/R = \pi$, $0 \le \pi < \infty$.
2 The same type of problem exists for the Henriksson–Merton (1981) test and the Pesaran–Timmermann (1992) test. I use this example to simplify the presentation.
Assumption 2 allows the application of laws of large numbers and central limit theorem results to both the parameters estimated in-sample and the out-of-sample average of the function $f_{t+\tau}$.
Assumption 3. For some $d > 1$, (a) $Ef_{t+\tau} = \theta_0$, (b) $X_t$ is strong mixing with coefficients of size $-2d/(d-1)$, (c) $g_t(\beta^*)$ is covariance stationary, (d) for an open neighborhood $N$ of $\beta^*$, $\sup_t\|\sup_{\beta\in N}g_t(\beta)\|_{2d} < \infty$, and (e) $\Omega$ is p.d.
Assumption 3 is similar to that in West and McCracken (1998) with two important distinctions. The first is that I weaken the moment conditions so that only $2d$ rather than $4d$ moments need exist. This may prove helpful in the context of forecasting excess returns for which there is evidence of leptokurtosis. The second difference is that I reduce the order of the mixing coefficients from $-3d/(d-1)$ to $-2d/(d-1)$. The covariance stationarity assumption is primarily for simplifying the algebra when constructing a consistent estimate of the asymptotic covariance matrix in Theorem 2.3.2.
Assumption 4. For each $i \in \{1, \ldots, l+q\}$: (a) $Eg_{i,t}(\beta)$ is continuously differentiable in the neighborhood $N$ (from Assumption 3) of $\beta^*$ admitting a mean value expansion $Eg_{i,t}(\beta) = Eg_{i,t} + (\partial Eg_{i,t}(\tilde\beta)/\partial\beta)(\beta - \beta^*)$ where $g_{i,t}$ is a scalar, $\beta$ is $(k \times 1)$ and $\tilde\beta$ is on the line between $\beta$ and $\beta^*$, (b) there exists a finite constant $D$ such that $\sup_t\sup_{\beta\in N}|\partial Eg_{i,t}(\beta)/\partial\beta| < D$, and (c) for all $t$, $G = G_t \equiv \partial Eh_t(\beta)/\partial\beta|_{\beta=\beta^*}$ and $F = F_t \equiv \partial Ef_{t,\tau}(\beta)/\partial\beta|_{\beta=\beta^*}$.
As we will see in Lemma 2.3.2, I separate the parameter uncertainty from the sampling uncertainty by taking a mean value expansion of $Ef_{t,\tau}(\beta)|_{\beta=\hat\beta_t}$ as in Randles (1982), rather than of $f_{t,\tau}(\beta)|_{\beta=\hat\beta_t}$ as in West (1996). The bound provided by $D$ suffices to show certain terms are $o_p(1)$.
Although Assumption 4(a) is weaker than the differentiability condition in West (1996), it is not always satisfied. A simple counterexample can be constructed that is relevant for tests of sign predictability.2 Suppose that $y_t = \beta^* y_{t-1} + u_t$ with $u_t \sim$ i.i.d. $N(0, 1)$ and $|\beta^*| < 1$. Let one-step ahead forecasts of the form $y_t\hat\beta_t$ be used to predict $y_{t+1}$. Consider the function $f_{t,1}(\beta) = 1\{y_t\beta \ge 0\}$. There are two cases. If $\beta^* = 0$ then $Ef_{t+1}(\beta^*) = E1\{y_t\beta^* \ge 0\} = E1\{0 \ge 0\} = 1$ for all $t$. But in every open neighborhood of $\beta^* = 0$ there exists a $\beta$ such that $Ef_{t,1}(\beta) = E1\{y_t\beta \ge 0\} = 0.5$. In this case Assumption 4(a) fails. On the other hand, if $\beta^* \ne 0$ then there exists an open neighborhood of $\beta^*$ such that for all $\beta$ in that neighborhood, $Ef_{t,1}(\beta) = E1\{y_t\beta \ge 0\} = 0.5$. In this case Assumption 4(a) holds. In the former case the results of this paper cannot be applied to our test statistic. In the latter case, not only can the results be applied, it is clear that $F = 0$.
This type of problem does not occur for all tests that use indicator functions. Consider the test of zero median error. Using the same environment as in the preceding example we have $f_{t,1}(\beta) = 1\{y_{t+1} - y_t\beta \le 0\} = 1\{u_{t+1} \le y_t(\beta - \beta^*)\}$. Taking expectations, and letting $\Phi$ denote the standard normal c.d.f., we have $Ef_{t,1}(\beta) = E\Phi(y_t(\beta - \beta^*))$. Since $\Phi$ is continuously differentiable, Assumption 4(a) holds regardless of the value of $\beta^*$. Once again, not only can the results of this paper be applied, it is clear that $F = 0$. See Kim and Pollard (1990, p. 205) for a set of conditions sufficient for continuous differentiability of expectations of indicator functions.
Assumption 5. Let $N(\epsilon) = N(\beta^*, \epsilon) \equiv \{\beta \in \mathbb{R}^k : |\beta - \beta^*| < \epsilon\}$. There exist finite constants $C, \omega > 0$ and $Q \ge 2d$ such that for all $N(\epsilon) \subseteq N$ (from Assumption 3), $\sup_t\|\sup_{\beta\in N(\epsilon)}(g_t(\beta) - g_t)\|_Q \le C\epsilon^{\omega}$.
In some circumstances it is straightforward to verify the $L^Q$ continuity condition in Assumption 5. For example, if the parametric model is linear and $f_{t,\tau}$ is Lipschitz (as is the case for the absolute value function), this assumption is automatically satisfied. When indicator functions are used, verifying the condition is more difficult. It will frequently be the case that Assumption 4 and reasonable assumptions on the continuity of the p.d.f. of $X_t$ will be needed to verify the condition.
2.3. Results
In this section I utilize the assumptions of Section 2.2 to show that $P^{-0.5}\sum_{t=R}^{T}(f_{t,\tau}(\hat\beta_t) - Ef_{t+\tau})$ is asymptotically normal with a positive-definite covariance matrix $\Omega$ which will usually depend on $\pi = \lim_{T\to\infty}P/R$. In order to construct an asymptotically valid test statistic, I then show that there exists a straightforward and consistent estimator $\hat\Omega$ of $\Omega$.
For the"rst step in the derivation, I borrow a decomposition used by Randles (1982). Let
m0,P"P~0.5+T
t/R
(f
t,q(bKt)!Eft,q(b)Db/bKt!ft`q#Eft`q) (4)
and
m1,
P"P~0.5 T
+
t/R
(f
such that
P~0.5+T
t/R
(f
t,q(bKt)!Eft`q)"m0,P#m1,P.
This decomposition leads to the following two lemmas upon which limiting normality is based.
Lemma 2.3.1. Given Assumptions 1}5,m0,P"o
1(1). Lemma 2.3.2. Given Assumptions 1}5, m1,
P"[P~0.5+Tt/R(ft`q!Eft`q)#
FBP~0.5+Tt/RH(t)]#o
1(1).
It is now clear how both types of uncertainty are present. Sampling uncertainty is the first term and parameter uncertainty is the second term in the expansion of Lemma 2.3.2. It is important to note that Lemma 2.3.2 provides the same decomposition as in West and McCracken (1998) with $F$ suitably redefined. If we define $\Gamma_{ff}(j) = E(f_{t+\tau} - Ef_{t+\tau})(f_{t+\tau-j} - Ef_{t+\tau})'$, $\Gamma_{fh}(j) = E(f_{t+\tau} - Ef_{t+\tau})h_{t+\tau-j}'$, $\Gamma_{hh}(j) = Eh_{t+\tau}h_{t+\tau-j}'$, $S_{ff} = \sum_{j=-\infty}^{\infty}\Gamma_{ff}(j)$, $S_{fh} = \sum_{j=-\infty}^{\infty}\Gamma_{fh}(j)$ and $S_{hh} = \sum_{j=-\infty}^{\infty}\Gamma_{hh}(j)$, we immediately know that the limiting variance of the bracketed term on the right-hand side of Lemma 2.3.2 is

$\Omega = S_{ff} + \lambda_{fh}(FBS_{fh}' + S_{fh}B'F') + \lambda_{hh}FBS_{hh}B'F',$   (5)

where

Scheme                        $\lambda_{fh}$                $\lambda_{hh}$
Recursive                     $1 - \pi^{-1}\ln(1+\pi)$      $2[1 - \pi^{-1}\ln(1+\pi)]$
Rolling, $\pi \le 1$          $\pi/2$                       $\pi - \pi^2/3$
Rolling, $1 < \pi < \infty$   $1 - (2\pi)^{-1}$             $1 - (3\pi)^{-1}$
Fixed                         $0$                           $\pi$
(6)
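The adjustments in (6) are simple functions of $\pi$ and the forecasting scheme. A direct transcription (the function name is mine):

```python
from math import log

def lambdas(pi, scheme):
    # (lambda_fh, lambda_hh) from table (6); pi = lim P/R >= 0
    if scheme == "recursive":
        l = 1.0 - log(1.0 + pi) / pi if pi > 0 else 0.0
        return l, 2.0 * l
    if scheme == "rolling":
        if pi <= 1.0:
            return pi / 2.0, pi - pi**2 / 3.0
        return 1.0 - 1.0 / (2.0 * pi), 1.0 - 1.0 / (3.0 * pi)
    if scheme == "fixed":
        return 0.0, pi
    raise ValueError(scheme)
```

Both adjustments vanish as $\pi \to 0$, so parameter uncertainty is asymptotically irrelevant when the out-of-sample fraction is negligible, and the two rolling branches agree at $\pi = 1$.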
Theorem 2.3.1. Given Assumptions 1-5, (a) $P^{-0.5}\sum_{t=R}^{T}(f_{t,\tau}(\hat\beta_t) - Ef_{t+\tau}) \to_d N(0, \Omega)$ for $\Omega$ defined in (5), (b) if either $F = 0$ or $\pi = 0$ then $P^{-0.5}\sum_{t=R}^{T}(f_{t,\tau}(\hat\beta_t) - Ef_{t+\tau}) \to_d N(0, S_{ff})$, and (c) $P^{-1}\sum_{t=R}^{T}f_{t,\tau}(\hat\beta_t) \to_p Ef_{t+\tau}$.
Theorem 2.3.1 shows that the statistic is limiting normal and that out-of-sample averages provide consistent estimates of population moments. The distinction between parts (a) and (b) is exclusively whether or not parameter uncertainty is relevant to the asymptotic covariance. To make this more clear, notice that for all sampling schemes $\Omega = S_{ff}$ when either $F = 0$ or $\pi = 0$.3 If this is the case then the results of Diebold and Mariano (1995) are applicable even though parameters have been estimated.

3 West (1996) notes that under the recursive scheme, parameter uncertainty is also irrelevant when $\lambda_{fh}(FBS_{fh}' + S_{fh}B'F') + \lambda_{hh}FBS_{hh}B'F' = 0$. Also, West and McCracken (1998) show that augmented regression-based tests can remove parameter uncertainty.
The"nal step is to construct a consistent estimate of the covariance matrixX. To do so, we need to design consistent estimates ofS
ff,Sfh,B,F,BShhB@,jfh
andj
hh. One can estimateBconsistently by simply using the in-sample
informa-tion from the"nal parameter estimates. Sincen("P/Ris a consistent estimate of
n and both j
fh and jhh are continuous in n, we can use jKfh,jfh(n() and jK
fh,jfh(n() to consistently estimate jfh and jhh. The term BShhB@ is the
asymptotic covariance matrix of the parameter estimates. Since most software packages automatically provide a consistent estimate of this matrix, an es-timator of BS
hhB@ is immediate from the "nal parameter estimates. If this
estimator is unavailable, another option is presented in Theorem 2.3.2. The matrixFis a bit more di$cult to estimate. SinceFvaries withf
t,q, so will
its estimator. In any respect,Fis an expectation and hence Theorem 2.3.1(c) can be used to estimate it.
To clarify the issues in estimating $F$, I will briefly present $F$ for the test of equal MAE. For the sequel let $\psi_x(x)$ and $\Psi_x(x)$ denote the marginal p.d.f. and c.d.f. of a random variable $x$, and let $\psi_x(x|z)$ and $\Psi_x(x|z)$ denote the conditional p.d.f. and c.d.f. of a random variable $x$ given the value of another random variable $z$. Assume that each p.d.f. is continuous and has a bounded density in an open neighborhood of the origin.
The test for equal MAE with $\tau = 1$ involves the null hypothesis $H_0$: $E(|u_{1,t+1}| - |u_{2,t+1}|) = 0$. If the two potential models for the predictand $y_{t+1}$ are scalar linear regression models with regressors $Z_{1,t}$ and $Z_{2,t}$ then the relevant test statistic is $\hat\Omega^{-0.5}P^{-0.5}\sum_{t=R}^{T}(f_{t,1}(\hat\beta_t) - 0)$ with $f_{t,1}(\hat\beta_t) = |y_{t+1} - Z_{1,t}'\hat\beta_{1,t}| - |y_{t+1} - Z_{2,t}'\hat\beta_{2,t}|$ and $\hat\beta_t = (\hat\beta_{1,t}', \hat\beta_{2,t}')'$. To present $F$ it is convenient to define $F = (F_1, F_2)$ relative to the partition of the parameter vector.
4 It should be noted that $\tau - 1$ dependence in the levels of a forecast error does not imply $\tau - 1$ dependence of a function of those forecast errors. For example, a one-step ahead forecast error may form a martingale difference sequence but still exhibit serial correlation in its square. See Harvey et al. (1998) for a discussion.
If we evaluate at the true parameter $\beta^*$ we have

$F_1 = -E\,\mathrm{sgn}(y_{t+1} - Z_{1,t}'\beta_1^*)Z_{1,t}'.$   (8)

If we impose the condition that $X_t$ is strictly stationary, $F = F_t$ for all $t$. For the test of equal MAE, $F_1$ in (8) can be consistently estimated by

$\hat F_1 = -P^{-1}\sum_{t=R}^{T}\mathrm{sgn}(y_{t+1} - Z_{1,t}'\hat\beta_{1,t})Z_{1,t}'.$   (9)
Reintroducing $F_2$ into the discussion (and noticing that there is an extra minus sign introduced), we can estimate $F$ consistently using $\hat F = (\hat F_1, \hat F_2)$ with

$\hat F_2 = P^{-1}\sum_{t=R}^{T}\mathrm{sgn}(y_{t+1} - Z_{2,t}'\hat\beta_{2,t})Z_{2,t}'.$   (10)

If we are willing to impose the stronger assumption that for each $i = 1, 2$, $u_{i,t+1}$ is symmetrically distributed about zero conditional on $Z_{i,t}$, then $E\,\mathrm{sgn}(u_{i,t+1})Z_{i,t}' = 0$ and hence $F = 0$. Similar arguments can be used to derive $F$ and a consistent estimator $\hat F$ for other test statistics. Rather than do so, for the remainder of the paper I will assume that such an estimator $\hat F$ exists.
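Estimators (9) and (10) amount to out-of-sample averages of signed regressors. A sketch, holding the parameter estimates fixed for simplicity (in the paper each term uses the time-$t$ estimate $\hat\beta_{i,t}$; the function name is mine):

```python
import numpy as np

def F_hat(y_next, Z1, b1, Z2, b2):
    # F1_hat = -P^{-1} sum sgn(y_{t+1} - Z1_t' b1) Z1_t'   (eq. 9)
    # F2_hat = +P^{-1} sum sgn(y_{t+1} - Z2_t' b2) Z2_t'   (eq. 10, extra minus sign)
    s1 = np.sign(y_next - Z1 @ b1)
    s2 = np.sign(y_next - Z2 @ b2)
    F1 = -(s1[:, None] * Z1).mean(axis=0)
    F2 = (s2[:, None] * Z2).mean(axis=0)
    return np.concatenate([F1, F2])
```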
To complete the construction of a consistent estimate of $\Omega$, we need to generate consistent estimates of $S_{ff}$, $S_{fh}$ and possibly $S_{hh}$. If $f_{t+\tau}$ and $h_t$ are m-dependent of known order then Assumptions 1-5 suffice for constructing consistent estimates of $S_{ff}$, $S_{fh}$ and $S_{hh}$ (as we will see in Theorem 2.3.2(a)). For example, when evaluating the $\tau$-step ahead predictive ability of two models, Swanson and White (1997) estimate $S_{ff}$ using the first $\tau - 1$ sample autocorrelations of $f$.4 When the order of dependence is unknown, these matrices can instead be estimated using a kernel-based estimator. Such an estimator requires imposing conditions on a kernel, $K(x)$, as well as stronger moment and mixing conditions on $g_t(\beta)$.
Assumption 6. (a) Let $K(x)$ be a kernel such that for all $x$, $|K(x)| \le 1$, $K(x) = K(-x)$, $K(0) = 1$, $K(x)$ is continuous, and $\int_{-\infty}^{\infty}|K(x)|\,dx < \infty$, (b) for $\omega$ defined in Assumption 5, some bandwidth $M$ and constant $\iota$, $\iota \in (0, \min(\omega, 0.5))$, $M = O(P^{\iota})$, and (c) there exists $\bar r \in (1, 2]$ such that $(1 - \iota)^{-1} < \bar r < d$ and $\sum_{j=1}^{\infty}\alpha_j^{(\bar r^{-1} - d^{-1})} < \infty$.
Throughout the following, and for fixed $j \ge 0$, $\hat\Gamma_{ff}(j) = P^{-1}\sum_{t=R+j}^{T}(f_{t,\tau}(\hat\beta_t) - \bar f)(f_{t-j,\tau}(\hat\beta_{t-j}) - \bar f)'$, $\hat\Gamma_{fh}(j) = P^{-1}\sum_{t=R+j}^{T}(f_{t,\tau}(\hat\beta_t) - \bar f)h_{t+\tau-j}'(\hat\beta_{t-j})$ and $\hat\Gamma_{hh}(j) = P^{-1}\sum_{t=R+j}^{T}h_{t+\tau}(\hat\beta_t)h_{t+\tau-j}'(\hat\beta_{t-j})$, where $\bar f = P^{-1}\sum_{t=R}^{T}f_{t,\tau}(\hat\beta_t)$. Furthermore, for $j < 0$, $\hat\Gamma_{ff}(j) = \hat\Gamma_{ff}(-j)'$, $\hat\Gamma_{fh}(j) = \hat\Gamma_{fh}(-j)'$, and $\hat\Gamma_{hh}(j) = \hat\Gamma_{hh}(-j)'$.

Theorem 2.3.2. (a) Under Assumptions 1-5, $\hat\Gamma_{ff}(j) \to_p \Gamma_{ff}(j)$, $\hat\Gamma_{fh}(j) \to_p \Gamma_{fh}(j)$, and $\hat\Gamma_{hh}(j) \to_p \Gamma_{hh}(j)$. (b) Under Assumptions 1-6, $\hat S_{ff} = \sum_{j=-P+1}^{P-1}K(j/M)\hat\Gamma_{ff}(j) \to_p S_{ff}$, $\hat S_{fh} = \sum_{j=-P+1}^{P-1}K(j/M)\hat\Gamma_{fh}(j) \to_p S_{fh}$ and $\hat S_{hh} = \sum_{j=-P+1}^{P-1}K(j/M)\hat\Gamma_{hh}(j) \to_p S_{hh}$.
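Theorem 2.3.2(b) is the familiar kernel (HAC) long-run variance construction. A sketch using the Bartlett kernel of Newey-West (1987), applicable to any sequence such as $f_{t,\tau}(\hat\beta_t)$ or $h_{t+\tau}(\hat\beta_t)$ (names are mine, not the paper's):

```python
import numpy as np

def bartlett(x):
    # Bartlett (Newey-West) kernel: K(x) = 1 - |x| for |x| <= 1, 0 otherwise
    return max(0.0, 1.0 - abs(x))

def long_run_var(g, M):
    """Kernel estimate S = sum_{j=-P+1}^{P-1} K(j/M) Gamma_hat(j) for an array
    g of shape (P, m).  Each column is demeaned (for f this mirrors the use of
    f-bar in Gamma_hat_ff; h has population mean zero), with
    Gamma_hat(j) = P^{-1} sum_t g_t g_{t-j}'."""
    g = np.asarray(g, dtype=float)
    g = g - g.mean(axis=0)
    P = g.shape[0]
    S = g.T @ g / P                    # j = 0 term
    for j in range(1, P):
        w = bartlett(j / M)
        if w == 0.0:                   # Bartlett weights vanish for j >= M
            break
        G = g[j:].T @ g[:-j] / P       # Gamma_hat(j)
        S += w * (G + G.T)             # j and -j terms together
    return S
```

With the Bartlett kernel the resulting estimate is positive semi-definite by construction, which is convenient when $\hat\Omega^{-0.5}$ must be formed.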
We now have all the tools necessary to conduct asymptotically valid out-of-sample inference concerning the moments of nonsmooth functions of parametric forecasts and forecast errors. For example, given $\hat F$, $\hat B$ and $\hat\pi$ such that $\hat F \to_p F$, $\hat B \to_p B$ and $\hat\pi \to \pi$, we can use Theorem 2.3.2 to create $\hat S_{ff}$, $\hat S_{fh}$ and $\hat S_{hh}$ such that $\hat\Omega = \hat S_{ff} + \hat\lambda_{fh}(\hat F\hat B\hat S_{fh}' + \hat S_{fh}\hat B'\hat F') + \hat\lambda_{hh}\hat F\hat B\hat S_{hh}\hat B'\hat F' \to_p \Omega$. Then, using Theorem 2.3.1, we know that $s_T \equiv \hat\Omega^{-0.5}P^{-0.5}\sum_{t=R}^{T}(f_{t,\tau}(\hat\beta_t) - \theta_0) \to_d N(0, I_l)$. If $l = 1$ we can use standard normal tables to test the null. If $l > 1$ we can use the fact that $s_T's_T \to_d \chi^2(l)$ and hence chi-square tables can be used to test the null.
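Putting the pieces together, the plug-in estimate $\hat\Omega$ and the studentized statistic can be sketched as follows for the scalar case ($l = 1$); the adjustment factors $\lambda$ are taken as given from table (6), and the names are mine:

```python
import numpy as np

def omega_hat(S_ff, S_fh, S_hh, F, B, l_fh, l_hh):
    """Plug-in Omega_hat = S_ff + l_fh (F B S_fh' + S_fh B' F')
                                + l_hh F B S_hh B' F'.
    F is (l, k), B is (k, q), S_fh is (l, q), S_hh is (q, q)."""
    FB = F @ B
    return S_ff + l_fh * (FB @ S_fh.T + S_fh @ FB.T) + l_hh * (FB @ S_hh @ FB.T)

def s_T(f_vals, omega, theta0=0.0):
    # Scalar statistic Omega_hat^{-0.5} P^{-0.5} sum_t (f_t - theta0)
    f = np.asarray(f_vals, dtype=float)
    P = len(f)
    om = float(np.asarray(omega).ravel()[0])  # scalar case l = 1
    return (f - theta0).sum() / np.sqrt(P * om)
```

Setting $F = 0$ or passing zero adjustment factors reduces $\hat\Omega$ to $\hat S_{ff}$, reproducing the no-parameter-uncertainty case of Theorem 2.3.1(b).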
3. Empirical evidence
In this section I construct three test statistics. First, I construct the test for equal MAE accounting for parameter uncertainty. Second, I construct the test for equal MAE ignoring parameter uncertainty, using the statistic proposed in Diebold and Mariano (1995). Finally, I construct the test for equal MSE ignoring the potential effects of parameter uncertainty. Under the assumption that OLS provides consistent estimates of the parameters, West (1996) has shown that one can ignore parameter uncertainty when testing for equal MSE.
3.1. Data and sources
The sample period includes 519 monthly observations from 1954:01 to 1997:03. The starting point 1954:01 is chosen to avoid the period, prior to the Treasury-Fed Accord, during which interest rates were pegged. It is also the first month for which monthly frequency observations for dividend yield exist for the S&P 500 composite.
I use the closing value of the S&P 500 composite as of the final Wednesday of the month as the stock price ($P_t$). These are obtained from Standard and Poor's Current Statistics (1997) and Security Price Index Record (1997). The one-month risk-free rate ($I_t$), used to construct excess returns, is the US Treasury Bill series obtained from Ibbotson Associates (1997). Using these two series I construct excess returns as $\mathrm{Return}_t = (P_t + D_t - P_{t-1})/P_{t-1} - I_{t-1}$. Standard and Poor's Statistical Service does not publish the monthly dividend series ($D_t$). I construct one by summing the present and previous three quarter aggregate dividends and dividing by 12. Pesaran and Timmermann (1995) also use this technique.
The two predictors are dividend yield ($DY_{t-1}$) and the earnings-price ratio ($EP_{t-1}$). To ensure that the predictors are truly ex ante I do not use the dividend series ($D_t$) constructed above since it includes information through the end of the present quarter. Instead, I use the dividend yield as reported in the Standard and Poor's Security Price Index Record at the end of each month. For the same reasons, I use the inverse of the price-earnings ratio rather than construct an earnings-price ratio using quarterly information on earnings.
Table 1 reports standard descriptive statistics regarding OLS regressions that use the dividend yield or the earnings-price ratio as predictors. Each regression exhibits little linear predictability. The residuals in each regression have distributions that are skewed and heavy tailed. The residuals exhibit little serial correlation but are conditionally heteroskedastic in the regressors and exhibit ARCH-type behavior.
3.2. Methodology and results
Let the scalar $y_{t+1}$ denote $\mathrm{Return}_{t+1}$ and let $Z_{1,t}$ and $Z_{2,t}$ denote the $(2 \times 1)$ vectors $(1, DY_t)'$ and $(1, EP_t)'$, respectively. We are interested in comparing the predictive ability of the two simple linear regression models

$y_{t+1} = Z_{i,t}'\beta_i^* + u_{i,t+1}, \quad i = 1, 2.$
Table 1
Summary statistics for full sample regressions of excess returns to S&P 500 composite

Panel A: Unrestricted linear regression using both dividend yield and earnings-price ratio
Coefficient (S.E.): Constant -0.0115 (0.0084); DY 0.0099 (0.0054); EP -2.6716 (2.1461)
R-squared = 0.0079; DW = 1.9416; skewness coefficient = -0.3539; kurtosis coefficient = 2.0389
LM test for heteroskedasticity in residuals: chi-squared(5) = 10.7396, p-value = 0.0567
LM test for serial correlation in residuals: chi-squared(12) = 13.0227, p-value = 0.3674
LM test for serial correlation in squared residuals: chi-squared(12) = 26.6380, p-value = 0.0087

Panel B: Restricted linear regression using dividend yield
Coefficient (S.E.): Constant -0.0058 (0.0079); DY 0.0032 (0.0022)
R-squared = 0.0048; DW = 1.9399; skewness coefficient = -0.3714; kurtosis coefficient = 1.9942
LM test for heteroskedasticity in residuals: chi-squared(2) = 6.0936, p-value = 0.0475
LM test for serial correlation in residuals: chi-squared(12) = 12.8804, p-value = 0.3778
LM test for serial correlation in squared residuals: chi-squared(12) = 27.5580, p-value = 0.0064

Panel C: Restricted linear regression using earnings-price ratio
Coefficient (S.E.): Constant 0.0052 (0.0061); EP 0.7589 (0.8570)
R-squared = 0.0020; DW = 1.9406; skewness coefficient = -0.3568; kurtosis coefficient = 2.0280
LM test for heteroskedasticity in residuals: chi-squared(2) = 8.4639, p-value = 0.0145
LM test for serial correlation in residuals: chi-squared(12) = 12.6658, p-value = 0.3938
LM test for serial correlation in squared residuals: chi-squared(12) = 27.3262, p-value = 0.0069

Notes: The data consist of monthly observations from 1954:01 to 1997:03 ($T = 519$). See Section 3 of the text for a description of the data. Standard errors are constructed using a heteroskedasticity robust covariance matrix. The skewness and kurtosis coefficients are constructed using the regression residuals.
The parameters are estimated using OLS and then the parameter estimates $\hat\beta_{i,t}$ are used to construct the forecasts $Z_{i,t}'\hat\beta_{i,t}$.
In this exercise I construct each of the three test statistics nine different ways, corresponding to three different forecasting schemes (recursive, rolling and fixed) and three different splits of the data. I use the three sample splits (54:01-89:12, 90:01-97:03), (54:01-79:12, 80:01-89:12) and (54:01-79:12, 80:01-97:03). Given these splits, the corresponding values of $\hat\pi = P/R$ are 0.20, 0.38 and 0.66.
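As a quick check on the sample splits, the reported values of $\hat\pi = P/R$ follow from simple inclusive month counts (a small illustration; names are mine):

```python
def months(y0, m0, y1, m1):
    # number of monthly observations from y0:m0 through y1:m1, inclusive
    return (y1 - y0) * 12 + (m1 - m0) + 1

splits = [((1954, 1, 1989, 12), (1990, 1, 1997, 3)),
          ((1954, 1, 1979, 12), (1980, 1, 1989, 12)),
          ((1954, 1, 1979, 12), (1980, 1, 1997, 3))]
pis = [round(months(*p) / months(*r), 2) for r, p in splits]
# pis == [0.2, 0.38, 0.66], matching the reported values of pi_hat = P/R
```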
5 I use the integer part of $P^{1/3}$ as the window width.
In constructing the variance estimates I presume no knowledge of heteroskedasticity or serial correlation. I use a Newey-West (1987) serial correlation consistent covariance estimator of $S_{ff}$, $S_{fh}$ and $S_{hh}$.5 I use the out-of-sample forecast errors and out-of-sample values of $y_{t+1}$, $Z_{1,t}$ and $Z_{2,t}$ in the construction of $f_{t,1}(\hat\beta_t) = |y_{t+1} - Z_{1,t}'\hat\beta_{1,t}| - |y_{t+1} - Z_{2,t}'\hat\beta_{2,t}|$ and $h_{t,1}(\hat\beta_t) = [(y_{t+1} - Z_{1,t}'\hat\beta_{1,t})Z_{1,t}', (y_{t+1} - Z_{2,t}'\hat\beta_{2,t})Z_{2,t}']'$. I estimate $B$ using the out-of-sample observations on $Z_{1,t}$ and $Z_{2,t}$ to form the $(4 \times 4)$ block diagonal matrix with $(P^{-1}\sum_{t=R}^{T}Z_{1,t}Z_{1,t}')^{-1}$ in the upper $(2 \times 2)$ diagonal position and $(P^{-1}\sum_{t=R}^{T}Z_{2,t}Z_{2,t}')^{-1}$ in the lower $(2 \times 2)$ diagonal position. To estimate $F$ I use (9) and (10) directly.
The test of equal MAE is constructed a second time ignoring parameter uncertainty. This time the variance is estimated using only an estimate of $S_{ff}$. The estimate is identical to the one used above.
For the sake of comparison, the test of equal MSE was also constructed. For this test, $f_{t,1}(\hat\beta_t) = (y_{t+1} - Z_{1,t}'\hat\beta_{1,t})^2 - (y_{t+1} - Z_{2,t}'\hat\beta_{2,t})^2$. Under the assumption that OLS provides consistent estimates of the parameters, $F = 0$: differentiating $Ef_{t,1}(\beta)$ yields terms of the form $-2EZ_{i,t}u_{i,t+1}$, which vanish by the OLS orthogonality conditions. Using the results in West (1996) we then know that we can ignore parameter uncertainty when estimating the asymptotic variance. In estimating $S_{ff}$, I presume no knowledge regarding the existence of serial correlation. Once again I use the Newey-West (1987) estimator.
Table 2 reports the results of the tests. Each subpanel corresponds to one of the sample splits. The"rst four columns report the raw out-of-sample MAE and MSE associated with each of the two predictive models. The MAE values are scaled by 100 and the MSE values are scaled by 1000. Note that in every instance, the MAE and MSE is larger for model 1 than for model 2.
Column 5 reports the test for equal MAE that accounts for parameter uncertainty. Column 6 reports the test for equal MAE that ignores parameter uncertainty. In every instance, accounting for parameter uncertainty increases the magnitude of the estimated variance. This causes the statistics that account for parameter uncertainty to be uniformly smaller than the ones that do not. This effect can also be seen in the p-values reported in columns 8 and 9. Because of these changes, there are instances in which accounting for parameter uncertainty can affect the decision to reject or fail to reject the null of equal MAE.
During the 1980s, there does not appear to be any difference in predictive ability between the two models. This holds whether we use MAE or MSE as the measure of predictive ability. For this time frame, accounting for parameter uncertainty made little difference in the tests for equal MAE.
Table 2
Testing for relative predictive ability of predictions of excess returns to S&P 500 composite

                Raw values                       Statistics               P-values (2-sided)
            MAE-1  MAE-2  MSE-1  MSE-2    Adj.   UnAdj.  UnAdj.    Adj.   UnAdj.  UnAdj.
                                          MAE    MAE     MSE       MAE    MAE     MSE
90:01–97:03: π̂ = 0.20
Recursive   2.691  2.614  1.183  1.153    1.990  2.565   1.726     0.047  0.010   0.084
Rolling     2.688  2.634  1.183  1.163    1.541  2.354   1.455     0.123  0.019   0.146
Fixed       2.724  2.619  1.199  1.154    1.899  2.540   1.765     0.058  0.011   0.078
80:01–89:12: π̂ = 0.38
Recursive   3.551  3.536  2.280  2.271    0.506  0.515   0.339     0.613  0.606   0.734
Rolling     3.552  3.551  2.284  2.282    0.046  0.048   0.115     0.963  0.962   0.909
Fixed       3.557  3.534  2.270  2.253    0.539  0.590   0.432     0.590  0.555   0.666
80:01–97:03: π̂ = 0.66
Recursive   3.189  3.148  1.819  1.801    1.724  1.844   0.987     0.085  0.065   0.324
Rolling     3.203  3.180  1.828  1.817    1.154  1.326   0.768     0.249  0.185   0.442
Fixed       3.228  3.153  1.831  1.792    1.610  2.223   1.364     0.107  0.026   0.173

Notes: Table 2 reports empirical results relevant to testing for equal MAE and equal MSE between two models used to predict the S&P 500 composite portfolio. Model 1 is an OLS estimated linear regression with an intercept and once-lagged dividend yield. Model 2 is the same but uses the earnings–price ratio. The first four columns report the realized out-of-sample values of the MAE and MSE associated with each model during three different forecast periods. Column 5 reports the values of the test for equal MAE adjusted (Adj.) for parameter uncertainty. Columns 6 and 7 report the values of the statistics used to construct the tests for equal MAE and equal MSE, both ignoring parameter uncertainty (UnAdj.). Columns 8–10 report the p-values (2-sided, from the standard normal distribution) associated with the statistics in columns 5–7. MAEs are scaled by 100. MSEs are scaled by 1000.
During the 1990s, the test for equal MAE that accounts for parameter uncertainty fails to reject the null at the 5% level for two of the three forecasting schemes. We do reject at the 5% level when the recursive scheme is used, but the evidence is weaker than when parameter uncertainty is ignored. Notice that during the 1990s the test for equal MSE fails to reject the null of equal predictive ability at the 5% level for any of the sampling schemes. The null can be rejected at the 10% level when either the recursive or fixed scheme is used.
Similar observations can be made regarding the tests for equal MAE over the sample spanning both the 1980s and 1990s. Ignoring parameter uncertainty, the fixed scheme rejects the null at the 5% level; when parameter uncertainty is accounted for, we fail to reject at even the 10% level. When the rolling scheme is used we fail to reject the null at the 10% level regardless of parameter uncertainty. When the recursive scheme is used we reject the null at the 10% level regardless of parameter uncertainty. Over the same time frame, we fail to reject the null of equal MSE under any of the forecasting schemes.
Less clear is whether it was necessary to account for parameter uncertainty in the first place. Recall that if $F = 0$ then parameter uncertainty is asymptotically irrelevant and hence $S_{ff}$ is the relevant asymptotic variance. For the test of equal MAE, $F$ can be zero if the disturbances have a zero median conditional on the values of the regressors. In this application it seems reasonable to reject that assertion. Using the skewness coefficients reported in Table 1, the null of zero skewness is rejected at the 1% level for each of the three sets of residuals.
4. Simulation evidence
The asymptotic results of Section 2 are only guaranteed to be appropriate for large in-sample sizes $R$ and out-of-sample sizes $P$. It is not clear how well the asymptotic approximation will perform in sample sizes commonly used in empirical work. To examine this problem, I present simulations of the three tests of either equal MAE or equal MSE between the two simple linear regressions

$y_{t+1} = Z_{1,t}'\beta_1^* + u_{1,t+1}$ and $y_{t+1} = Z_{2,t}'\beta_2^* + u_{2,t+1}$,   (12)

where $Z_{1,t}$ and $Z_{2,t}$ denote the $(2 \times 1)$ vectors $(1, z_{1,t})'$ and $(1, z_{2,t})'$, respectively. Each statistic is constructed in precisely the same manner as in Section 3. For each statistic I report the size and size-adjusted power of the test in samples of the size used in Section 3.
First, I simulate a hypothetical data generating process that is stylized to the empirical results of Section 3. The data generating process I have chosen has the representation

$y_{t+1} = \beta_{2,2}^* z_{2,t+1} + u_{t+1}$,  $u_{t+1} = c(x_{1,t+1} + x_{2,t+1}) + \eta_{t+1}$,
$x_{i,t+1} = 2^{-0.5}[(1 - a^2)z_{i,t+1}^2 - 1]$,  $z_{i,t+1} = a z_{i,t} + e_{i,t+1}$,   (13)
$e_{i,t+1} \sim$ i.i.d. N(0, 1),  $\eta_{t+1} \sim$ i.i.d. t(6),  $e_{1,t+1} \perp e_{2,t+1} \perp \eta_{t+1}$,
$c = 0.25$,  $a = 0.9$.
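Under this parameterization the process in (13) can be simulated as below. This is a sketch stylized to (13): the function name and seeding convention are mine, and the 500-observation burn-in described later is left to the caller.

```python
import numpy as np

def simulate_dgp(n, beta22, c=0.25, a=0.9, seed=0):
    """Simulate (13): y_{t+1} = beta22*z_{2,t+1} + u_{t+1}, with
    u_{t+1} = c*(x_{1,t+1} + x_{2,t+1}) + eta_{t+1},
    x_{i,t+1} = 2**-0.5 * ((1 - a**2)*z_{i,t+1}**2 - 1),
    z_{i,t+1} = a*z_{i,t} + e_{i,t+1}, e ~ N(0,1), eta ~ t(6)."""
    rng = np.random.default_rng(seed)
    # initial conditions from the unconditional N(0, 1/(1-a^2)) distribution
    z = rng.normal(0.0, (1 - a**2) ** -0.5, size=2)
    ys = np.empty(n)
    zs = np.empty((n, 2))
    for t in range(n):
        z = a * z + rng.normal(size=2)
        x = 2**-0.5 * ((1 - a**2) * z**2 - 1.0)
        u = c * x.sum() + rng.standard_t(6)
        ys[t] = beta22 * z[1] + u
        zs[t] = z
    # ys[t] is y_{t+1} and zs[t] is z_{t+1}; pair ys[t] with zs[t-1]
    # when forming the predictive regressions (12).
    return ys, zs
```

The squared, centered AR terms $x_{i,t+1}$ inject skewness and conditional heteroskedasticity into $u_{t+1}$, while the t(6) innovation supplies heavy tails.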
The parameter $\beta_{2,2}^*$ (the second component of $\beta_2^*$) is a tuning parameter used to distinguish between the null and the alternative. When $\beta_{2,2}^* = 0$ the null of either equal MAE or equal MSE is satisfied. When $\beta_{2,2}^* \neq 0$ the alternative holds; model 2 has both a lower MAE and a lower MSE than does model 1. I allow this parameter to vary across the values 0, 0.10, 0.25, 0.50 and 1.00. By doing so I am better able to determine how accounting for parameter uncertainty affects the power of the test. I am also better able to determine whether tests for equal MAE or tests for equal MSE are more powerful for detecting small deviations from the null.
The initial conditions for the $z_{i,t}$ are drawn from their unconditional distribution. The simulated series are of length $500 + 519 = 500 + (T + 1) = 1019$; the initial 500 observations are generated to burn out the effects of the initial conditions.
The results are based upon 5000 replications. Note that the same simulated data are used for each sampling scheme and each $(P, R)$ combination in order to facilitate the various small-sample comparisons. To make comparisons possible across the three hypothesis tests, the random number generator is seeded so that the three sets of 5000 samples are the same.
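The common-random-numbers device amounts to one fixed seed per replication, so that every scheme and every $(P, R)$ split within a replication sees identical draws; a sketch (the generator choice is mine):

```python
import numpy as np

def replication_draws(rep, n):
    # One seed per replication: re-creating the generator reproduces the
    # same sample, so all schemes and splits within a replication share
    # exactly the same simulated data.
    rng = np.random.default_rng(rep)
    return rng.normal(size=n)

a = replication_draws(7, 100)
b = replication_draws(7, 100)
```

Sharing draws across schemes and tests removes Monte Carlo noise from the comparisons, so differences in rejection rates reflect the procedures rather than the samples.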
I chose this data generating process for two basic reasons. The first is that it exhibits many of the characteristics of the data used in Section 3. The regressors exhibit strong serial dependence. The distribution of the predictand, $y_t$, has heavy tails and is skewed. The residuals from the two predictive models exhibit conditional heteroskedasticity in the regressors. Also, since the regressors are serially correlated, the squares of the residuals from the two predictive models are serially correlated (i.e., GARCH(1,1)-like effects).
The second reason is that I wanted the two linear models to have little, if any, predictive ability, in order to match the very small $R^2$ values commonly observed in the literature. Here, both predictive models have a population $R^2$ of zero. This occurs because both $\beta_1^*$ and $\beta_2^*$ are zero under the null. This implies that both models have the same predictive ability and hence the null is satisfied.
But it does more than that. It implies that $f_{t+1}$ and $S_{ff}$ are equal to zero when either MAE or MSE is used to measure predictive ability. This does not imply that an asymptotically standard normal test for equal predictive ability cannot be constructed. For there to be asymptotic normality, the limiting variance, $\Omega$, must be positive definite. If parameter uncertainty is irrelevant then $S_{ff}$ must be positive definite. On the other hand, if $f_{t+1}$ is zero for all $t$, and hence $S_{ff} = FBS_{fh}' = 0$, then $FBS_{hh}B'F'$ must be positive definite. For the test of equal MAE in this exercise, and when parameter uncertainty is accounted for, this is not a problem. This occurs because the disturbances are skewed for each predictive model and hence $F = (-\mathrm{E}\,\mathrm{sgn}(u_{1,t+1})Z_{1,t}',\ \mathrm{E}\,\mathrm{sgn}(u_{2,t+1})Z_{2,t}')' \neq 0$. It is a problem for the test of equal MSE since consistent estimation of the parameters by OLS implies $F = (-\mathrm{E}\,u_{1,t+1}Z_{1,t}',\ \mathrm{E}\,u_{2,t+1}Z_{2,t}')' = 0$. Hence, a priori we expect the test for equal MAE, corrected for parameter uncertainty, to be reasonably sized. We also expect the test for equal MAE without the correction for parameter uncertainty, and the test for equal MSE, to be missized.
Table 3 reports the actual size of the three tests when the critical values $\pm 2.576$, $\pm 1.96$ and $\pm 1.645$ are used.
Table 3
Actual size of out-of-sample tests

                Valid MAE               Invalid MAE             Invalid MSE
             1%     5%     10%       1%     5%     10%       1%     5%     10%
π̂ = 0.20
R         0.0118 0.0754 0.1416    0.0758 0.2048 0.3032    0.0642 0.2220 0.3382
L         0.0120 0.0784 0.1484    0.0670 0.1974 0.2986    0.0578 0.2116 0.3336
F         0.0134 0.0720 0.1504    0.1130 0.2556 0.3542    0.1048 0.2798 0.3916
π̂ = 0.38
R         0.0072 0.0418 0.0968    0.0434 0.1624 0.2604    0.0344 0.1704 0.2910
L         0.0082 0.0476 0.1064    0.0434 0.1536 0.2484    0.0344 0.1604 0.2722
F         0.0046 0.0362 0.0898    0.1046 0.2522 0.3462    0.0908 0.2632 0.3796
π̂ = 0.66
R         0.0036 0.0328 0.0768    0.0386 0.1382 0.2224    0.0324 0.1388 0.2376
L         0.0056 0.0458 0.0976    0.0330 0.1188 0.1984    0.0250 0.1200 0.2142
F         0.0012 0.0194 0.0596    0.1074 0.2626 0.3524    0.0904 0.2618 0.3770

Notes: Subpanels denoted π̂ = 0.20, 0.38 and 0.66 indicate sample sizes and splits corresponding to those used in the empirical results reported in Table 2. Columns denoted 1%, 5% and 10% present the actual size of the test when the critical values ±2.576, ±1.96 and ±1.645 are used, respectively. Rows denoted R, L and F signify the use of the Recursive, roLLing and Fixed schemes, respectively. The results are based upon 5000 replications. See Section 4 for further details.
This is especially true for the fixed scheme. Overall it seems that the valid version of the test for equal MAE is reasonably sized for smaller values of π̂, while the two invalid tests are seriously oversized for all values of π̂. Corradi et al. (1999) also find that smaller values of π̂ lead to more accurately sized tests.
Table 4 presents the size-adjusted power of the three tests. Each panel corresponds to a particular choice of the parameter $\beta_{2,2}^*$. In each case the $R^2$ for model 1 is zero, while the $R^2$ for model 2 takes the values 0.61, 0.15, 0.04 and 0.003 as $\beta_{2,2}^*$ varies across 1.00, 0.50, 0.25 and 0.10. In the first two panels the size-adjusted power is quite good for each of the three tests. The asymptotically valid test for equal MAE is best, followed by its invalid version and then the test for equal MSE. It also appears that larger values of π̂ are associated with greater power for each of the tests.
Table 4
Size-adjusted power of out-of-sample tests

                Valid MAE               Invalid MAE             Invalid MSE
             1%     5%     10%       1%     5%     10%       1%     5%     10%
Panel A: β*₂,₂ = 1.00, R² = 0.61
π̂ = 0.20
R         1.0000 1.0000 1.0000    0.9998 1.0000 1.0000    0.9874 0.9984 0.9996
L         1.0000 1.0000 1.0000    1.0000 1.0000 1.0000    0.9876 0.9988 0.9998
F         1.0000 1.0000 1.0000    0.9982 1.0000 1.0000    0.9602 0.9958 0.9988
π̂ = 0.38
R         1.0000 1.0000 1.0000    1.0000 1.0000 1.0000    0.9910 1.0000 1.0000
L         1.0000 1.0000 1.0000    1.0000 1.0000 1.0000    0.9918 1.0000 1.0000
F         1.0000 1.0000 1.0000    1.0000 1.0000 1.0000    0.9652 0.9966 0.9998
π̂ = 0.66
R         1.0000 1.0000 1.0000    1.0000 1.0000 1.0000    0.9984 1.0000 1.0000
L         1.0000 1.0000 1.0000    1.0000 1.0000 1.0000    0.9988 1.0000 1.0000
F         1.0000 1.0000 1.0000    1.0000 1.0000 1.0000    0.9896 0.9994 1.0000
Panel B: β*₂,₂ = 0.50, R² = 0.15
π̂ = 0.20
R         0.9882 0.9962 0.9986    0.9486 0.9856 0.9930    0.8976 0.9698 0.9864
L         0.9870 0.9968 0.9988    0.9564 0.9874 0.9936    0.8974 0.9740 0.9876
F         0.9856 0.9950 0.9976    0.8794 0.9744 0.9874    0.7994 0.9466 0.9748
π̂ = 0.38
R         0.9978 0.9992 0.9998    0.9874 0.9972 0.9986    0.9384 0.9896 0.9952
L         0.9972 0.9992 0.9994    0.9904 0.9972 0.9986    0.9410 0.9904 0.9960
F         0.9970 0.9986 0.9994    0.9488 0.9912 0.9964    0.8512 0.9614 0.9856
π̂ = 0.66
R         1.0000 1.0000 1.0000    0.9990 1.0000 1.0000    0.9878 0.9994 0.9996
L         0.9998 1.0000 1.0000    0.9992 1.0000 1.0000    0.9900 0.9990 0.9996
F         1.0000 1.0000 1.0000    0.9982 0.9992 0.9996    0.9470 0.9912 0.9982
Panel C: β*₂,₂ = 0.25, R² = 0.04
π̂ = 0.20
R         0.7482 0.7858 0.8478    0.3836 0.6224 0.7182    0.3582 0.5636 0.6822
L         0.6274 0.7836 0.8376    0.4084 0.6422 0.7290    0.3580 0.5830 0.6950
F         0.6264 0.7760 0.8304    0.2614 0.5356 0.6676    0.2534 0.4874 0.6166
π̂ = 0.38
R         0.7482 0.8730 0.9066    0.5360 0.7412 0.8168    0.4406 0.6790 0.7676
L         0.7162 0.8626 0.9032    0.5568 0.7404 0.8208    0.4438 0.6796 0.7804
F         0.7388 0.8514 0.8910    0.3496 0.6178 0.7278    0.3060 0.5464 0.6624
π̂ = 0.66
R         0.9272 0.9618 0.9746    0.7830 0.9078 0.9412    0.6688 0.8490 0.8952
L         0.9020 0.9590 0.9728    0.7846 0.9130 0.9514    0.6794 0.8622 0.9098
F         0.8998 0.9438 0.9606    0.6404 0.8080 0.8728    0.4730 0.7216 0.8160
Panel D: β*₂,₂ = 0.10, R² = 0.003
π̂ = 0.20
R         0.1182 0.2426 0.3312    0.0544 0.1510 0.2268    0.0588 0.1484 0.2288
L         0.1114 0.2338 0.3180    0.0598 0.1634 0.2298    0.0574 0.1574 0.2350
F         0.1172 0.2424 0.3288    0.0332 0.1214 0.2002    0.0386 0.1216 0.1930
π̂ = 0.38
R         0.1228 0.2772 0.3652    0.0618 0.1792 0.2644    0.0518 0.1690 0.2484
L         0.0948 0.2522 0.3410    0.0650 0.1692 0.2652    0.0510 0.1568 0.2512
F         0.1264 0.2698 0.3520    0.0342 0.1200 0.2054    0.0354 0.1120 0.1888
π̂ = 0.66
R         0.2076 0.3622 0.4552    0.1012 0.2498 0.3498    0.0862 0.2356 0.3202
L         0.1578 0.3160 0.4142    0.0924 0.2452 0.3522    0.0818 0.2270 0.3300
F         0.2002 0.2424 0.4532    0.0658 0.1704 0.2618    0.0484 0.1552 0.2434
The simulations indicate that correcting for parameter uncertainty can improve the size of tests for equal MAE when parameter uncertainty is asymptotically relevant. They also indicate that tests for equal MAE can be better at detecting small deviations from the null than tests for equal MSE. This is especially important given that linear models tend to have low levels of predictive ability for excess returns to many assets.
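The size-adjusted powers in Table 4 are computed by replacing the asymptotic critical value with the statistic's own empirical null quantile. A minimal sketch of that adjustment (two-sided rule; the function name is mine):

```python
import numpy as np

def size_adjusted_power(null_stats, alt_stats, nominal=0.05):
    # Empirical two-sided critical value from the simulated null
    # distribution, then the rejection rate under the alternative.
    cv = np.quantile(np.abs(null_stats), 1.0 - nominal)
    return np.mean(np.abs(alt_stats) > cv)
```

With 5000 simulated statistics under the null ($\beta_{2,2}^* = 0$) and under each alternative, this reproduces the construction behind Table 4, and by design the adjusted rejection rate under the null equals the nominal level.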
5. Conclusion
In this paper, I show that when parameters are used to construct forecasts and forecast errors, parameter uncertainty can affect the limiting distribution of nonsmooth out-of-sample measures of predictive ability. Section 2 presents sufficient conditions for scaled out-of-sample averages of nondifferentiable functions of forecasts and forecast errors to be asymptotically normal. For these functions I show that the limiting covariance structure can be consistently estimated in a straightforward manner.
I then consider how well these statistics perform in moderate sample sizes and how important it is to account for parameter uncertainty when estimating the limiting covariance. The empirical exercise in Section 3 indicates that, at times, the correction for parameter uncertainty can lead to different conclusions regarding the predictive ability of a model. The simulation exercise in Section 4 shows that the tests can be well sized if one accounts for parameter uncertainty. For the test of equal MAE, the test was more accurately sized and more powerful when the covariance was estimated accounting for parameter uncertainty than when it was estimated ignoring parameter uncertainty. The simulations also indicate that the test for equal MAE may be a better choice than the test for equal MSE for detecting small deviations in predictive ability between two forecasting models.

There are several possible topics for future research concerning out-of-sample inference. Perhaps the most important would be to develop a general theory for the out-of-sample comparison of nested models; such a theory would have applications to tests of causality (Ashley et al., 1980) and the martingale difference hypothesis. Such a theory could also be extended to the out-of-sample comparison of multiple nested models. In either case it would be useful to allow for models with stationary or nonstationary observations. Secondly, since power is of primary importance, it would be helpful to determine the optimal choice of sample split for maximizing the power of the test.
Acknowledgements
Thanks to John Jones, Stephen Sapp and Tricia Gladden for their suggestions. An earlier draft of this paper was distributed under the title 'Out-of-Sample Inference for Moments of Nondifferentiable Functions'.
Appendix A
Notation: $\sup_t$ denotes $\sup_{R \le t \le T}$; 'var' and 'cov' denote variance and covariance; all limits are taken as $T$ goes to infinity; the summation $\sum_t$ denotes $\sum_{t=R}^{T}$; for Lemmas A.2, A.3 and 2.3.1, $N(\varepsilon)$ denotes the open ball $N(\beta^*, \varepsilon)$ about $\beta^*$ generated by the max norm; $f_t(\gamma)$ denotes $f_{t,\tau}(\gamma) - \mathrm{E}f_{t,\tau}(\gamma) - f_{t+\tau} + \mathrm{E}f_{t+\tau}$. For notational simplicity, I consider throughout the case in which $k = 1$, $l = 1$ and $\tau = 1$, so that $\beta^*$, $f_{t,\tau}$ and $h_t$ are scalars.
Lemma A.1. For $a \in [0, 0.5)$: (a) $\sup_t |P^a H(t)| \to_p 0$; and (b) $\sup_t |P^a(\hat\beta_t - \beta^*)| \to_p 0$.

Lemma A.2. For all $a \in [0, 0.5)$ and $\varepsilon > 0$ such that $N(P^{-a}\varepsilon) \subseteq N$ and $0 < P^{-a}\varepsilon < 1$, there exist constants $0 < \tilde C < \infty$ and $r_0 > 0$ such that (a) $\sup_t \|\sup_{\gamma \in N(P^{-a}\varepsilon)} f_t(\gamma)\|_{2d} \le \tilde C (P^{-a}\varepsilon)^{r_0}$; and (b) for all integers $j$, $\sup_t |\mathrm{E}\sup_{\{\gamma_0,\gamma_1\} \in N(P^{-a}\varepsilon)} f_t(\gamma_0) f_{t+j}(\gamma_1)| \le \tilde C \alpha_j^{(d-1)/d}(P^{-a}\varepsilon)^{r_0}$.

Lemma A.3. For fixed $j$, $\hat\Gamma_{ff}(j) \to_p \Gamma_{ff}(j)$, $\hat\Gamma_{fh}(j) \to_p \Gamma_{fh}(j)$ and $\hat\Gamma_{hh}(j) \to_p \Gamma_{hh}(j)$.
Proof of Lemma A.3. Consider $\hat\Gamma_{ff}(j) = P^{-1}\sum_{t=R+j}^{T}(f_{t,\tau}(\hat\beta_t) - \bar f)(f_{t-j,\tau}(\hat\beta_{t-j}) - \bar f)$. The other autocovariances can be handled similarly. By adding and subtracting terms we have

$\hat\Gamma_{ff}(j) = P^{-1}\sum_{t=R+j}^{T}(f_{t+\tau} - \mathrm{E}f_{t+\tau})(f_{t+\tau-j} - \mathrm{E}f_{t+\tau}) + r_T$,   (A.1)

where, with all sums running over $t = R+j, \ldots, T$,

$r_T = P^{-1}\sum_t (f_{t,\tau}(\hat\beta_t) - f_{t+\tau}(\beta^*))(f_{t+\tau-j} - \mathrm{E}f_{t+\tau}) + (\mathrm{E}f_{t+\tau} - \bar f)^2 + (\mathrm{E}f_{t+\tau} - \bar f)P^{-1}\sum_t (f_{t+\tau-j} - \mathrm{E}f_{t+\tau}) + P^{-1}\sum_t (f_{t+\tau} - \mathrm{E}f_{t+\tau})(f_{t-j,\tau}(\hat\beta_{t-j}) - f_{t+\tau-j}(\beta^*)) + P^{-1}\sum_t (f_{t,\tau}(\hat\beta_t) - f_{t+\tau}(\beta^*))(f_{t-j,\tau}(\hat\beta_{t-j}) - f_{t+\tau-j}(\beta^*)) + (\mathrm{E}f_{t+\tau} - \bar f)P^{-1}\sum_t \big[(f_{t+\tau} - \mathrm{E}f_{t+\tau}) + (f_{t,\tau}(\hat\beta_t) - f_{t+\tau}(\beta^*)) + (f_{t-j,\tau}(\hat\beta_{t-j}) - f_{t+\tau-j}(\beta^*))\big]$.   (A.2)

Since the first term of (A.1) converges in probability to $\Gamma_{ff}(j)$ by White (1984, Corollary 3.48), I need only show that $r_T$ converges in probability to zero. Using the triangle and Cauchy–Schwarz inequalities, it is straightforward to show that the absolute value of (A.2) is less than or equal to a bound $\tilde r_T$.
To facilitate reference to Theorem 2.3.2, it is useful to show that $\tilde r_T$ vanishes at rate $P^{-\kappa}$: for all $\delta > 0$, $\varepsilon > 0$ and $0 < \kappa/(2\omega) \le a < 0.5$ there exists $T_0$ such that the bound in (A.4) holds. The remainder of the proof is to show that there exists $T_1$ such that for all $T > T_1$ the first term on the r.h.s. of (A.4) is less than $\delta/2$. Applying Markov's inequality and Assumption 5 we obtain the bound in (A.5). The remainder of the proof then is to show that there exists $T_1 > T_0$ such that for all $T > T_1$ the first term on the r.h.s. of (A.5) is less than $\delta/2$. For the remainder of this proof only, let $\sum_j$ denote $\sum_{-P+1 \le j \ne 0 \le P-1}$. Applying Chebyshev's inequality, and noting that $D_1 = \sum_{j=0}^{\infty} j\alpha_j^{(d-1)/d}$ and hence $D_2 = \sum_{j=0}^{\infty} \alpha_j^{(d-1)/d}$ are positive and finite, if I choose $T_1$ and $\varepsilon$ such that for all $T > T_1$, $\varepsilon < (\delta P^{a r_0}\varepsilon_0^2 / 2\tilde C(D_1 + P^{-1}D_2))^{1/r_0}$
and $0 < P^{-a}\varepsilon < 1$, the result follows. □

Proof of Lemma 2.3.2. Expanding $\mathrm{E}f_{t,\tau}(\hat\beta_t)$ about $\beta^*$ we obtain the expansion in (A.8). It then suffices to show that the latter three terms of (A.8) are $o_p(1)$. Using the triangle inequality, the absolute value of the latter three terms in (A.8) is bounded by products of the factors controlled below. Since $|F|$ and $|B|$ are finite, $\sup_t|B(t) - B| = o_p(1)$ by Lemma A.1(a), and $\sup_t|\partial\mathrm{E}f_{t,\tau}(\tilde\beta_t)/\partial\beta - F| = o_p(1)$ by the continuity of $\partial\mathrm{E}f_{t,\tau}(\beta)/\partial\beta$ and Lemma A.1(b), the result will follow if $P^{-0.5}\sum_t|H(t)| \le \sup_t P^{0.5}|H(t)| = O_p(1)$. I will show this for the recursive scheme; the fixed scheme follows immediately and the rolling scheme follows from a decomposition similar to that in Lemma A.1. From Hall and Heyde (1980, p. 20) and the recursive proof in Lemma A.1, $h_t$ is a mixingale satisfying $\mathrm{E}[\sup_{1 \le s \le T}|(h_1 + \cdots + h_s)^2|] \le cT$ for a constant $c$. But

$P\,\mathrm{E}\big[\sup_t |t^{-2}(h_1 + \cdots + h_t)^2|\big] \le PR^{-2}\,\mathrm{E}\big[\sup_t |(h_1 + \cdots + h_t)^2|\big] \le PR^{-2}\,\mathrm{E}\big[\sup_{1 \le s \le T}|(h_1 + \cdots + h_s)^2|\big]$,

which is less than or equal to $PR^{-2}cT$, which in turn converges to $c\pi(1 + \pi)$, and hence $\sup_t P^{0.5}|H(t)| = O_p(1)$ by Markov's inequality. □

Proof of Theorem 2.3.1. (a) Let $X(T) \equiv P^{-0.5}\sum_t (f_{t+\tau} - \mathrm{E}f_{t+\tau} + FBH(t))$. From Lemmas 2.3.1 and 2.3.2 we know $P^{-0.5}\sum_t (f_{t,\tau}(\hat\beta_t) - \mathrm{E}f_{t+\tau}) = X(T) + o_p(1)$, with $\lim \mathrm{var}[X(T)] = \Omega$. Asymptotic normality then follows from Theorem 3.1 of Wooldridge and White (1988). Details are in the additional appendix. (b) Follows immediately from (a) and (5) in the text. (c) Follows immediately from (a) and (b). □
Proof of Theorem 2.3.2. The proof of (a) is immediate from Lemma A.3. The proof of (b) requires more detail. The proof will be provided for $\hat S_{ff}$; the others follow from similar arguments. Using the decomposition in Lemma A.3 we have

$\hat S_{ff} = \sum_{j=-P+1}^{P-1} K(j/M)\hat\Gamma_{ff}(j) = \sum_{j=-P+1}^{P-1} K(j/M)\Big\{P^{-1}\sum_{t=R+j}^{T}(f_{t+\tau} - \mathrm{E}f_{t+\tau})(f_{t+\tau-j} - \mathrm{E}f_{t+\tau})\Big\} + \sum_{j=-P+1}^{P-1} K(j/M)\,r_T$   (A.9)

for $r_T$ defined in (A.2). The first right-hand side term in (A.9) converges in probability to $S_{ff}$ by Hansen (1992, Theorem 1). It then remains to be shown that the second term in (A.9) is $o_p(1)$. Since $\Gamma_{ff}(j) = \Gamma_{ff}(-j)'$, it is sufficient to show this for the $\sum_{j=0}^{P-1} K(j/M)\,r_T$ portion of the second term. Since $|r_T| \le \tilde r_T$ (defined in the proof of Lemma A.3), this bound can be utilized to obtain

$\Big|\sum_{j=0}^{P-1} K(j/M)\,r_T\Big| \le \sum_{j=0}^{P-1} |K(j/M)|\,\tilde r_T \le (M/P^{\kappa})\Big(M^{-1}\sum_{j=0}^{P-1}|K(j/M)|\Big)P^{\kappa}\tilde r_T$.   (A.10)

By assumption $(M/P^{\kappa}) = O_p(1)$ and $M^{-1}\sum_{j=0}^{P-1}|K(j/M)| \to \int_0^{\infty}|K(x)|\,\mathrm{d}x < \infty$. The result follows since, by the proof of Lemma A.3, $\tilde r_T$ is $o_p(P^{-\kappa})$. □
References

Akgiray, V., 1989. Conditional heteroscedasticity in time series of stock returns: evidence and forecasts. Journal of Business 62, 55–80.
Ashley, R., Granger, C.W.J., Schmalensee, R., 1980. Advertising and aggregate consumption: an analysis of causality. Econometrica 48, 1149–1167.
Campbell, J.Y., Shiller, R.J., 1988. Stock prices, earnings, and expected dividends. The Journal of Finance 43, 661–676.
Chen, X., Swanson, N.R., 1996. Semiparametric ARX neural network models with an application to forecasting inflation. Working Paper, University of Chicago and Pennsylvania State University.
Corradi, V., Swanson, N.R., Olivetti, C., 1999. Predictive ability with cointegrated variables. Manuscript, Texas A&M University.
Davidson, R., 1994. Stochastic Limit Theory. Oxford University Press, New York.
Diebold, F.X., Mariano, R.S., 1995. Comparing predictive accuracy. Journal of Business and Economic Statistics 13, 253–263.
Engel, C., 1994. Can the Markov switching model forecast exchange rates? Journal of International Economics 36, 151–165.
Fair, R.C., Shiller, R.J., 1990. Comparing information in forecasts from econometric models. The American Economic Review 80, 375–389.
Fama, E.F., 1991. Efficient capital markets: II. The Journal of Finance 46, 1575–1617.
Fama, E.F., French, K.R., 1988. Dividend yields and expected stock returns. Journal of Financial Economics 22, 3–25.
Gerlow, M.E., Irwin, S.H., Liu, T., 1993. Economic evaluation of commodity price forecasting models. International Journal of Forecasting 9, 387–397.
Granger, C., 1969. Prediction with a generalized cost of error function. Operational Research Quarterly 20, 199–207.
Hall, P., Heyde, C.C., 1980. Martingale Limit Theory and Its Application. Academic Press, New York.
Hansen, B.E., 1992. Consistent covariance matrix estimation for dependent heterogeneous processes. Econometrica 60, 967–972.
Harvey, D.I., Leybourne, S.J., Newbold, P., 1998. Forecast evaluation tests in the presence of ARCH. Manuscript, Loughborough University and University of Nottingham.
Henriksson, R.D., Merton, R.C., 1981. On market timing and investment performance II: statistical procedures for evaluating forecasting skills. Journal of Business 54, 513–533.
Ibbotson Associates, 1997. In: Kaplan, P.D. (Ed.), Stocks, Bonds, Bills and Inflation: 1997 Yearbook. R.G. Ibbotson Associates, Chicago.
Kim, J., Pollard, D., 1990. Cube root asymptotics. The Annals of Statistics 18, 191–219.
Kuan, C., Liu, T., 1995. Forecasting exchange rates using feedforward and recurrent neural networks. Journal of Applied Econometrics 10, 347–364.
Meese, R.A., Rogoff, K., 1983. Empirical exchange rate models of the seventies: do they fit out of sample? Journal of International Economics 14, 3–24.
Newey, W.K., West, K.D., 1987. A simple positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55, 703–708.
Pagan, A., Schwert, G.W., 1990. Alternative models for conditional stock volatility. Journal of Econometrics 45, 267–290.
Pesaran, M.H., Timmermann, A., 1992. A simple nonparametric test of predictive performance. Journal of Business and Economic Statistics 10, 561–565.
Pesaran, M.H., Timmermann, A., 1995. Predictability of stock returns: robustness and economic significance. The Journal of Finance 50 (4), 1201–1228.
Randles, R.H., 1982. On the asymptotic normality of statistics with estimated parameters. The Annals of Statistics 10, 463–474.
Shiller, R.J., 1984. Stock prices and social dynamics. Brookings Papers on Economic Activity 2, 457–510.
Standard and Poor's Current Statistics, 1997, September. McGraw-Hill, New York.
Standard and Poor's Security Price Index Record, 1997. McGraw-Hill, New York.
Stekler, H.O., 1991. Macroeconomic forecast evaluation techniques. International Journal of Forecasting 7, 375–384.
Swanson, N.R., White, H., 1995. A model-selection approach to assessing the information in the term structure using linear models and artificial neural networks. Journal of Business and Economic Statistics 13, 265–275.
Swanson, N.R., White, H., 1997. A model-selection approach to real-time macroeconomic forecasting using linear models and artificial neural networks. The Review of Economics and Statistics 79, 540–550.
Weiss, A.A., 1996. Estimating time series models using the relevant cost function. Journal of Applied Econometrics 11, 539–560.
West, K.D., 1996. Asymptotic inference about predictive ability. Econometrica 64, 1067–1084.
West, K.D., McCracken, M.W., 1998. Regression-based tests of predictive ability. International Economic Review 39, 817–840.
White, H., 1984. Asymptotic Theory for Econometricians. Academic Press, New York.
White, H., 2000. A reality check for data snooping. Econometrica 68, 1097–1126.
Wooldridge, J.M., White, H., 1988. Some invariance principles and central limit theorems for dependent heterogeneous processes. Econometric Theory 4, 210–230.