*Corresponding author. Tel.: +1-225-388-3782; fax: +1-225-388-3807. E-mail address: [email protected] (M.W. McCracken).
Robust out-of-sample inference
Michael W. McCracken*
Department of Economics, Louisiana State University, 2107 CEBA, Baton Rouge, LA 70803-0306, USA
Received 25 September 1998; received in revised form 29 November 1999; accepted 13 March 2000
Abstract
This paper presents analytical, empirical and simulation results concerning inference about the moments of nondifferentiable functions of out-of-sample forecasts and forecast errors. Special attention is given to the measurement of a model's predictive ability using the test of equal mean absolute error. Tests for equal mean absolute error and mean square error are used to evaluate predictions of excess returns to the S&P 500 composite. Simulations indicate that appropriately constructed tests for equal mean absolute error can provide more accurately sized and more powerful tests than inappropriately constructed tests for equal mean absolute error and mean square error. © 2000 Elsevier Science S.A. All rights reserved.
JEL classification: C52; C53; C32; C12
Keywords: Forecasting; Forecast evaluation; Hypothesis testing; Model comparison
1. Introduction
It is becoming common to evaluate a forecasting model's ability to predict using out-of-sample methods. Meese and Rogoff (1983), in predicting exchange rates, report the mean square error (MSE) of forecast errors. Akgiray (1989) uses the mean absolute error (MAE) to evaluate volatility forecasts of stock returns. Engel (1994) reports the number of times the direction of change in exchange rates is accurately predicted. Swanson and White (1995) report the Schwarz information criterion as well as the out-of-sample $R^2$ that result when forward interest rates are used to predict future spot rates.
These papers, and many others, evaluate predictive ability in one of two ways. Most do so by simply constructing point estimates of some measure of predictive ability. The most common measure is MSE. A few others argue heuristically that their tests of predictive ability are limiting normal and hence asymptotically valid t-statistics can be used to test hypotheses. For example, Pagan and Schwert (1990) and Fair and Shiller (1990) construct regression based tests for efficiency and encompassing respectively. However, they do not provide a set of sufficient conditions for their statistics to be asymptotically standard normal. Recent theoretical work has attempted to provide those sufficient conditions.
When parametric forecasts and forecast errors are used to estimate moments or conduct inference there are two sources of uncertainty. There is uncertainty that exists even when we know the model parameters and there is uncertainty due to the estimation of parameters. Diebold and Mariano (1995) show how to construct asymptotically valid out-of-sample tests of predictive ability when there is no parameter uncertainty, for example, when parameters are known. Under this restriction, they are able to construct tests of hypotheses that involve moments of differentiable and nondifferentiable functions such as those used to construct tests for equal MSE and equal MAE between two predictive models. When parameters are unknown, and must be estimated, parameter uncertainty can play a role in out-of-sample inference. West (1996) has shown how the uncertainty due to parameter estimation can affect the asymptotic distribution of moments of differentiable functions of out-of-sample forecasts and forecast errors. Given a parametric forecasting model, this allows for inference concerning tests of serial correlation, efficiency, encompassing, zero mean prediction error and equal MSE between two predictive models.
In this paper I close some of the gaps between the work by Diebold and Mariano (1995) and West (1996). I extend the work by Diebold and Mariano (1995) by showing that parameter uncertainty can affect out-of-sample inference regarding moments of nondifferentiable functions. As in West (1996), the parameter uncertainty causes the limiting covariance structure to be nonstandard. The limiting covariance matrix contains two components: a standard component that would exist if the parameters used to construct forecasts were known in advance and a second component due to the fact that parameters are not known and have to be estimated.
measure of predictive ability is nondifferentiable. Secondly, I allow model parameters to be estimated using loss functions that are not differentiable. By doing so I permit a greater degree of freedom in choosing the loss function used to estimate the parameters to match the loss function used to evaluate the forecasts. This may be beneficial in light of the discussion in Weiss (1996).
These extensions are potentially useful since nonsmooth measures of predictive ability have been used to evaluate parametric predictive models. Granger (1969) provides an early theoretical discussion. Empirical examples are plentiful. Gerlow et al. (1993) use MAE to evaluate predictive ability. Swanson and White (1997) use mean absolute percentage error (MAPE) to measure predictive ability. Stekler (1991) compares the predictive ability of two parametric models using the test of percent better, or what Diebold and Mariano (1995) refer to as the 'sign test'. Engel (1994) constructs a test for sign predictability based upon the binomial distribution. Henriksson and Merton (1981) and Pesaran and Timmermann (1992) construct tests for sign predictive ability using a standard normal approximation.
Each of the measures of predictive ability mentioned above can be used to construct tests of forecast accuracy. As presented though, most ignore the possibility that the forecasts are generated parametrically and hence may be affected by parameter uncertainty. The results of West and McCracken (1998), concerning smooth measures of predictive ability, suggest that in many circumstances it is inappropriate to ignore the parameter uncertainty.
In this paper I provide analytical, empirical and simulation results indicating that ignoring parameter uncertainty can be inappropriate when nonsmooth measures of predictive ability are used. I focus on the test of equal MAE as an example in which accounting for parameter uncertainty can be important. Although I emphasize the absolute value function, the asymptotic results are applicable to tests that use indicator functions.
For the results of this paper to hold, however, certain conditions must be met. Perhaps the most important is Assumption 4. There I assume that the expectation of the function of interest must be continuously differentiable in the parameters. This assumption is not very restrictive when the absolute value function is being used and is the reason I use the test of equal MAE as a foil throughout the paper. It can be a problem when indicator functions are used. In particular it can be a problem for tests of sign predictability. See the discussion following Assumption 4 for further detail.
An appendix available upon request from the author presents details of proofs omitted from the paper to save space.
2. Theoretical results
This section presents sufficient conditions for asymptotic inference about the moments of functions of out-of-sample forecasts and forecast errors. These conditions will suffice to show, in Theorem 2.3.1, that out-of-sample averages consistently estimate population means, and when appropriately scaled are asymptotically normal. These conditions also suffice to show, in Theorem 2.3.2, that the limiting covariance structure can be consistently estimated by a straightforward application of Slutsky's theorem.
For any function $f$, $f_{t,\tau}(\hat\beta_t)$ will denote the parametric estimate of $f_{t+\tau}(\beta^*)$. Also, in order to minimize notation, $f_{t+\tau}$ will denote $f_{t+\tau}(\beta^*)$.
2.1. Environment
Throughout it is assumed that $\{X_s\}_{s=1}^{T+\tau}$ is a given sample of observables. The latter portion of that sample contains a continuous stream of $P$ $\tau$-step ahead forecasts. The first forecast, $y_{R,\tau}(\hat\beta_R)$, is based upon a parameter vector estimated using observations $s = 1, \ldots, R$. Further forecasts, $y_{t,\tau}(\hat\beta_t)$, are each constructed using an estimated parameter vector that is based on observations $s = 1, \ldots, t$, $R \le t \le T \equiv R + P - \tau$. The time period for which the $P$ forecasts are generated will be referred to as the out-of-sample period.
As in West and McCracken (1998) I will allow for three different forecasting schemes. The recursive, rolling and fixed forecasting schemes differ in how they construct the sequence of parameter estimates used to construct the sequence of forecasts and forecast errors. A brief description is given below.
Keim and Stambaugh (1986) use the recursive scheme. Under this scheme a sequence of forecasts is generated using updated parameter estimates. At each time $t = R, \ldots, T$ the parameter estimate $\hat\beta_t$ depends explicitly on all observables from $s = 1, \ldots, t$. If OLS is used to estimate the parameters from a scalar linear model with regressors $Z_s$ and predictand $y_s$ then $\hat\beta_t = (t^{-1}\sum_{s=1}^{t}Z_sZ_s')^{-1}(t^{-1}\sum_{s=1}^{t}Z_sy_s)$. The first forecast is then of the form $y_{R,\tau}(\hat\beta_R)$. The second forecast, $y_{R+1,\tau}(\hat\beta_{R+1})$, is constructed similarly using observations $s = 1, \ldots, R+1$. This process is iterated $P$ times so that for each $t \in [R, T]$, the parameter estimates use observations $s \in [1, t]$.
Chen and Swanson (1996) use the rolling scheme. Under this scheme the sequence of parametric forecasts is constructed in much the same way as the recursive scheme. The rolling scheme differs from the recursive in its treatment of observations from the distant past. The rolling scheme uses only a window of the $R$ most recent observations; earlier observations are not used in estimating the parameters. If OLS is used to estimate the parameters from a scalar linear model with regressors $Z_s$ and predictand $y_s$ then $\hat\beta_t = (R^{-1}\sum_{s=t-R+1}^{t}Z_sZ_s')^{-1}(R^{-1}\sum_{s=t-R+1}^{t}Z_sy_s)$. This implies that the first rolling forecast, $y_{R,\tau}(\hat\beta_R)$, and forecast error are identical to those for the recursive. The second rolling forecast, $y_{R+1,\tau}(\hat\beta_{R+1})$, is constructed using only observations $s = 2, \ldots, R+1$ to estimate the model parameters. This implies that the second rolling forecast and forecast error are distinct from those using the recursive scheme. The process is iterated $P$ times such that for each $t \in [R, T]$ the parameter estimates use observations $s \in [t-R+1, t]$.

1 Notice that the fixed and rolling parameter estimates should be subscripted both by $t$ and $R$. In order to simplify the notation the subscript $R$ will be suppressed.
Kuan and Liu (1995) use the fixed scheme. This method is distinct from the previous two in that the parameters are not updated when new observations become available. Since the parameter vector is estimated only once, each of the $P$ forecasts, $y_{t,\tau}(\hat\beta_R)$, uses the same parameter estimate.1 If OLS is used to estimate the parameters using regressors $Z_s$ and predictand $y_s$ then $\hat\beta_t = (R^{-1}\sum_{s=1}^{R}Z_sZ_s')^{-1}(R^{-1}\sum_{s=1}^{R}Z_sy_s)$. Hence for each forecast from time $t \in [R, T]$, the parameter estimate only uses observations $s \in [1, R]$.
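For a scalar linear model estimated by OLS, the three schemes can be sketched as follows; this is a minimal illustration for the one-step horizon, and the function and variable names are mine, not the paper's:

```python
import numpy as np

def ols(Z, y):
    # beta_hat = (sum Z_s Z_s')^{-1} (sum Z_s y_s)
    return np.linalg.solve(Z.T @ Z, Z.T @ y)

def forecasts(Z, y, R, scheme):
    """One-step ahead forecasts of y[t] for t = R,...,n-1 (0-based indexing).
    The parameter vector is re-estimated at each forecast origin according to:
      recursive: observations [0, t)   (expanding window)
      rolling:   observations [t-R, t) (window of the R most recent)
      fixed:     observations [0, R)   (estimated once)"""
    n = len(y)
    out = []
    for t in range(R, n):
        if scheme == "recursive":
            lo, hi = 0, t
        elif scheme == "rolling":
            lo, hi = t - R, t
        elif scheme == "fixed":
            lo, hi = 0, R
        else:
            raise ValueError(scheme)
        beta = ols(Z[lo:hi], y[lo:hi])
        out.append(Z[t] @ beta)
    return np.array(out)
```

Note that at the first origin all three windows coincide, so the first rolling and fixed forecasts equal the first recursive forecast, as stated in the text.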
Since we are ultimately interested in conducting inference concerning the population moments of functions of parametric forecasts and forecast errors, a description of these functions is in order. The function

$f_{t,\tau}(\hat\beta_t) \equiv f(\tau, X_t, \hat\beta_t) \quad (l \times 1)$   (1)

depends upon three arguments. The first is a finite forecast horizon, $\tau \ge 1$. The second, $X_t$, is a finite dimensioned vector of observables. The dating of the subscript $t$ is not meaningful. For example, if we are interested in the one-step ahead MAE from a scalar linear regression model, $f_{t,1}(\hat\beta_t) = |y_{t+1} - Z_t'\hat\beta_t|$. Since the realized scalar left-hand side variable is $y_{t+1}$, and the variables used for prediction are $Z_t$, $X_t = (y_{t+1}, Z_t')'$.
The third argument, $\hat\beta_t$, is an estimate of a $(k \times 1)$ unknown parameter vector $\beta^*$. When the inference to be conducted is simply a diagnostic of a single parametric model, such as the test of zero median error for which $f_{t,1}(\hat\beta_t) = 1\{y_{t+1} - Z_t'\hat\beta_t \le 0\}$, $\beta^*$ is the vector of parameters that index that particular parametric model. On the other hand, if the inference to be conducted is meant to detect which of two nonnested competing models is more accurate, $\beta^*$ is formed by stacking the vector of parameters that index each of the two models. For example, suppose that we are interested in comparing the one-step ahead MAE from two scalar nonnested linear regression models. If we let $i = 1, 2$ index the two models (along with their respective regressors and parameter estimates), $f_{t,1}(\hat\beta_t) = |y_{t+1} - Z_{1,t}'\hat\beta_{1,t}| - |y_{t+1} - Z_{2,t}'\hat\beta_{2,t}|$.
Given such a function, we are interested in testing (say) the scalar null hypothesis $H_0$: $Ef_{t+\tau} = \theta_0$ for some finite $\theta_0$. To do so, we will focus on test statistics of the form $\hat\Omega^{-0.5}P^{-0.5}\sum_{t=R}^{T}(f_{t,\tau}(\hat\beta_t) - Ef_{t+\tau})$ where $\hat\Omega$ is a consistent estimate of the appropriate limiting variance. In Theorem 2.3.1 it is shown that this statistic is asymptotically standard normal and hence asymptotically valid inference can be conducted using standard normal tables.
2.2. Assumptions
Within the following, for any matrix $A$, $|A| = \max_{i,j}|a_{i,j}|$, $\|\cdot\|_Q$ is the $L^Q$ norm, $\sup_t$ denotes $\sup_{R \le t \le T}$, and for $h_t(\beta)$ defined in Assumption 1,

$g_t(\beta) = [(f_{t,\tau}(\beta) - Ef_{t,\tau}(\beta))', h_t(\beta)']'.$   (2)
Assumption 1. The estimate $\hat\beta_t$ satisfies $\hat\beta_t - \beta^* = B(t)H(t)$, where $B(t)$ is $(k \times q)$ and $H(t)$ is $(q \times 1)$, with (a) $B(t) \to_{a.s.} B$, $B$ a matrix of rank $k$, (b) $H(t) = t^{-1}\sum_{s=1}^{t}h_s$, $R^{-1}\sum_{s=t-R+1}^{t}h_s$ and $R^{-1}\sum_{s=1}^{R}h_s$ for the recursive, rolling and fixed schemes respectively, for the orthogonality condition $h_s \equiv h_s(\beta^*)$, and (c) $Eh_s = 0$.
Assumption 1 provides for a wide range of methods of estimating parameters. In particular, it allows for maximum likelihood, nonlinear least squares and a range of generalized method of moments estimators. It allows for linear and nonlinear models as well as single and multiple equation systems.
As an example of the notation in Assumption 1 consider that our statistic is used to test for equal MAE between two competing linear models. Suppose that each of the two models, for $y_{t+1}$, has the representation $y_{t+1} = Z_{i,t}'\beta_i^* + u_{i,t+1}$ for $i = 1, 2$. Consider further that for each $i = 1, 2$ OLS provides a consistent estimate of $\beta_i^*$ ($k_i \times 1$). Since there are two sets of parameters needed to construct this test, $\hat\beta_t = (\hat\beta_{1,t}', \hat\beta_{2,t}')'$ ($k_1 + k_2 = k \times 1$), and hence $B$ ($k \times q$, $q = q_1 + q_2$, $q_1 = k_1$, $q_2 = k_2$) and $h_s$ ($q \times 1$) are

$B = \begin{pmatrix} (EZ_{1,t}Z_{1,t}')^{-1} & 0_{k_1 \times q_2} \\ 0_{k_2 \times q_1} & (EZ_{2,t}Z_{2,t}')^{-1} \end{pmatrix}, \quad h_s = \begin{pmatrix} u_{1,s+1}Z_{1,s} \\ u_{2,s+1}Z_{2,s} \end{pmatrix}.$   (3)
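For this two-model OLS example, the pieces of (3) can be computed directly from data. A sketch, under the assumption that the residual series are already aligned so that row $s$ holds $u_{i,s+1}$ (the function names are illustrative):

```python
import numpy as np

def h_stack(Z1, u1, Z2, u2):
    # h_s = (u_{1,s+1} Z_{1,s}', u_{2,s+1} Z_{2,s}')', stacked over s
    return np.column_stack([u1[:, None] * Z1, u2[:, None] * Z2])

def B_hat(Z1, Z2):
    # block-diagonal B with blocks (E Z_i Z_i')^{-1}, estimated by sample means
    B1 = np.linalg.inv(Z1.T @ Z1 / len(Z1))
    B2 = np.linalg.inv(Z2.T @ Z2 / len(Z2))
    k1, k2 = B1.shape[0], B2.shape[0]
    B = np.zeros((k1 + k2, k1 + k2))
    B[:k1, :k1] = B1
    B[k1:, k1:] = B2
    return B
```

When the $u_i$ are full-sample OLS residuals, each column of the stacked scores has sample mean zero by construction, mirroring the orthogonality condition in Assumption 1(c).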
Assumption 2. $R, P \to \infty$ as $T \to \infty$, and $\lim_{T\to\infty} P/R = \pi$, $0 \le \pi < \infty$.
2 The same type of problem exists for the Henriksson–Merton (1981) test and the Pesaran–Timmermann (1992) test. I use this example to simplify the presentation.
Assumption 2 allows the application of laws of large numbers and central limit theorem results to both the parameters estimated in-sample and the out-of-sample average of the function $f_{t+\tau}$.
Assumption 3. For some $d > 1$, (a) $Ef_{t+\tau} = \theta_0$, (b) $X_t$ is strong mixing with coefficients of size $-2d/(d-1)$, (c) $g_t(\beta^*)$ is covariance stationary, (d) for an open neighborhood $N$ of $\beta^*$, $\sup_t\|\sup_{\beta\in N}g_t(\beta)\|_{2d} < \infty$, and (e) $\Omega$ is p.d.
Assumption 3 is similar to that in West and McCracken (1998) with two important distinctions. The first is that I weaken the moment conditions so that only $2d$ rather than $4d$ moments need exist. This may prove helpful in the context of forecasting excess returns for which there is evidence of leptokurtosis. The second difference is that I reduce the order of the mixing coefficients from $-3d/(d-1)$ to $-2d/(d-1)$. The covariance stationarity assumption is primarily for simplifying the algebra when constructing a consistent estimate of the asymptotic covariance matrix in Theorem 2.3.2.
Assumption 4. For each $i \in \{1, \ldots, l+q\}$: (a) $Eg_{i,t}(\beta)$ is continuously differentiable in the neighborhood $N$ (from Assumption 3) of $\beta^*$ admitting a mean value expansion $Eg_{i,t}(\beta) = Eg_{i,t} + (\partial Eg_{i,t}(\tilde\beta)/\partial\beta)(\beta - \beta^*)$ where $g_{i,t}$ is a scalar, $\beta$ is $(k \times 1)$ and $\tilde\beta$ is on the line between $\beta$ and $\beta^*$, (b) there exists a finite constant $D$ such that $\sup_t\sup_{\beta\in N}|\partial Eg_{i,t}(\beta)/\partial\beta| < D$, and (c) for all $t$, $G = G_t \equiv \partial Eh_t(\beta)/\partial\beta|_{\beta=\beta^*}$ and $F = F_t \equiv \partial Ef_{t,\tau}(\beta)/\partial\beta|_{\beta=\beta^*}$.
As we will see in Lemma 2.3.2, I separate the parameter uncertainty from the sampling uncertainty by taking a mean value expansion of $Ef_{t,\tau}(\beta)|_{\beta=\hat\beta_t}$ as in Randles (1982), rather than of $f_{t,\tau}(\beta)|_{\beta=\hat\beta_t}$ as in West (1996). The bound provided by $D$ suffices to show certain terms are $o_p(1)$.
Although Assumption 4(a) is weaker than the differentiability condition in West (1996), it is not always satisfied. A simple counterexample can be constructed that is relevant for tests of sign predictability.2 Suppose that $y_t = \beta^* y_{t-1} + u_t$ with $u_t \sim$ i.i.d. $N(0, 1)$ and $|\beta^*| < 1$. Let one-step ahead forecasts of the form $y_t\hat\beta_t$ be used to predict $y_{t+1}$. Consider the function $f_{t,1}(\beta) = 1\{y_t\beta \ge 0\}$. There are two cases. If $\beta^* = 0$ then $Ef_{t+1}(\beta^*) = E1\{y_t\beta^* \ge 0\} = E1\{0 \ge 0\} = 1$ for all $t$. But in every open neighborhood of $\beta^* = 0$ there exists a $\beta$ such that $Ef_{t,1}(\beta) = E1\{y_t\beta \ge 0\} = 0.5$. In this case Assumption 4(a) fails. On the other hand, if $\beta^* \ne 0$ then there exists an open neighborhood of $\beta^*$ such that for all $\beta$ in that neighborhood, $Ef_{t,1}(\beta) = E1\{y_t\beta \ge 0\} = 0.5$. In this case Assumption 4(a) holds. In the former case the results of this paper cannot be applied to our test statistic. In the latter case, not only can the results be applied, it is clear that $F = 0$.
This type of problem does not occur for all tests that use indicator functions. Consider the test of zero median error. Using the same environment as in the preceding example we have $f_{t,1}(\beta) = 1\{y_{t+1} - y_t\beta \le 0\} = 1\{u_{t+1} \le y_t(\beta - \beta^*)\}$. Taking expectations, and letting $\Phi$ denote the standard normal c.d.f., we have $Ef_{t,1}(\beta) = E\Phi(y_t(\beta - \beta^*))$. Since $\Phi$ is continuously differentiable, Assumption 4(a) holds regardless of the value of $\beta^*$. Once again, not only can the results of this paper be applied, it is clear that $F = 0$. See Kim and Pollard (1990, p. 205) for a set of conditions sufficient for continuous differentiability of expectations of indicator functions.
Assumption 5. Let $N(\epsilon) = N(\beta^*, \epsilon) \equiv \{\beta \in \mathbb{R}^k : |\beta - \beta^*| < \epsilon\}$. There exist finite constants $C, \omega > 0$ and $Q \ge 2d$ such that for all $N(\epsilon) \subseteq N$ (from Assumption 3), $\sup_t\|\sup_{\beta\in N(\epsilon)}(g_t(\beta) - g_t)\|_Q \le C\epsilon^{\omega}$.
In some circumstances it is straightforward to verify the $L^Q$ continuity condition in Assumption 5. For example, if the parametric model is linear and $f_{t,\tau}$ is Lipschitz (as is the case for the absolute value function), this assumption is automatically satisfied. When indicator functions are used, verifying the condition is more difficult. It will frequently be the case that Assumption 4 and reasonable assumptions on the continuity of the p.d.f. of $X_t$ will be needed to verify the condition.
2.3. Results
In this section I utilize the assumptions of Section 2.2 to show that $P^{-0.5}\sum_{t=R}^{T}(f_{t,\tau}(\hat\beta_t) - Ef_{t+\tau})$ is asymptotically normal with a positive-definite covariance matrix $\Omega$ which will usually depend on $\pi = \lim_{T\to\infty}P/R$. In order to construct an asymptotically valid test statistic, I then show that there exists a straightforward and consistent estimator $\hat\Omega$ of $\Omega$.
For the"rst step in the derivation, I borrow a decomposition used by Randles (1982). Let
m0,P"P~0.5+T
t/R
(f
t,q(bKt)!Eft,q(b)Db/bKt!ft`q#Eft`q) (4)
and
m1,
P"P~0.5 T
+
t/R
(f
such that
P~0.5+T
t/R
(f
t,q(bKt)!Eft`q)"m0,P#m1,P.
This decomposition leads to the following two lemmas upon which limiting normality is based.
Lemma 2.3.1. Given Assumptions 1}5,m0,P"o
1(1). Lemma 2.3.2. Given Assumptions 1}5, m1,
P"[P~0.5+Tt/R(ft`q!Eft`q)#
FBP~0.5+Tt/RH(t)]#o
1(1).
It is now clear how both types of uncertainty are present. Sampling uncertainty is the first term and parameter uncertainty is the second term in the expansion of Lemma 2.3.2. It is important to note that Lemma 2.3.2 provides the same decomposition as in West and McCracken (1998) with $F$ suitably redefined. If we define $\Gamma_{ff}(j) = E(f_{t+\tau} - Ef_{t+\tau})(f_{t+\tau-j} - Ef_{t+\tau})'$, $\Gamma_{fh}(j) = E(f_{t+\tau} - Ef_{t+\tau})h_{t+\tau-j}'$, $\Gamma_{hh}(j) = Eh_{t+\tau}h_{t+\tau-j}'$, $S_{ff} = \sum_{j=-\infty}^{\infty}\Gamma_{ff}(j)$, $S_{fh} = \sum_{j=-\infty}^{\infty}\Gamma_{fh}(j)$ and $S_{hh} = \sum_{j=-\infty}^{\infty}\Gamma_{hh}(j)$, we immediately know that the limiting variance of the bracketed term on the right-hand side of Lemma 2.3.2 is

$\Omega = S_{ff} + \lambda_{fh}(FBS_{fh}' + S_{fh}B'F') + \lambda_{hh}FBS_{hh}B'F',$   (5)

where

Scheme                        $\lambda_{fh}$                $\lambda_{hh}$
Recursive                     $1 - \pi^{-1}\ln(1+\pi)$      $2[1 - \pi^{-1}\ln(1+\pi)]$
Rolling, $\pi \le 1$          $\pi/2$                       $\pi - \pi^2/3$
Rolling, $1 < \pi < \infty$   $1 - (2\pi)^{-1}$             $1 - (3\pi)^{-1}$
Fixed                         $0$                           $\pi$
(6)
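The adjustments in (6) are simple functions of $\pi$ and the forecasting scheme. A direct transcription (the function name is mine):

```python
from math import log

def lambdas(pi, scheme):
    # (lambda_fh, lambda_hh) from table (6); pi = lim P/R >= 0
    if scheme == "recursive":
        l = 1.0 - log(1.0 + pi) / pi if pi > 0 else 0.0
        return l, 2.0 * l
    if scheme == "rolling":
        if pi <= 1.0:
            return pi / 2.0, pi - pi**2 / 3.0
        return 1.0 - 1.0 / (2.0 * pi), 1.0 - 1.0 / (3.0 * pi)
    if scheme == "fixed":
        return 0.0, pi
    raise ValueError(scheme)
```

Both adjustments vanish as $\pi \to 0$, so parameter uncertainty is asymptotically irrelevant when the out-of-sample fraction is negligible, and the two rolling branches agree at $\pi = 1$.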
Theorem 2.3.1. Given Assumptions 1-5, (a) $P^{-0.5}\sum_{t=R}^{T}(f_{t,\tau}(\hat\beta_t) - Ef_{t+\tau}) \to_d N(0, \Omega)$ for $\Omega$ defined in (5), (b) if either $F = 0$ or $\pi = 0$ then $P^{-0.5}\sum_{t=R}^{T}(f_{t,\tau}(\hat\beta_t) - Ef_{t+\tau}) \to_d N(0, S_{ff})$, and (c) $P^{-1}\sum_{t=R}^{T}f_{t,\tau}(\hat\beta_t) \to_p Ef_{t+\tau}$.
Theorem 2.3.1 shows that the statistic is limiting normal and that out-of-sample averages provide consistent estimates of population moments. The distinction between parts (a) and (b) is exclusively whether or not parameter uncertainty is relevant to the asymptotic covariance. To make this more clear, notice that for all sampling schemes $\Omega = S_{ff}$ when either $F = 0$ or $\pi = 0$.3 If this is the case then the results of Diebold and Mariano (1995) are applicable even though parameters have been estimated.

3 West (1996) notes that under the recursive scheme, parameter uncertainty is also irrelevant when $\lambda_{fh}(FBS_{fh}' + S_{fh}B'F') + \lambda_{hh}FBS_{hh}B'F' = 0$. Also, West and McCracken (1998) show that augmented regression-based tests can remove parameter uncertainty.
The"nal step is to construct a consistent estimate of the covariance matrixX. To do so, we need to design consistent estimates ofS
ff,Sfh,B,F,BShhB@,jfh
andj
hh. One can estimateBconsistently by simply using the in-sample
informa-tion from the"nal parameter estimates. Sincen("P/Ris a consistent estimate of
n and both j
fh and jhh are continuous in n, we can use jKfh,jfh(n() and jK
fh,jfh(n() to consistently estimate jfh and jhh. The term BShhB@ is the
asymptotic covariance matrix of the parameter estimates. Since most software packages automatically provide a consistent estimate of this matrix, an es-timator of BS
hhB@ is immediate from the "nal parameter estimates. If this
estimator is unavailable, another option is presented in Theorem 2.3.2. The matrixFis a bit more di$cult to estimate. SinceFvaries withf
t,q, so will
its estimator. In any respect,Fis an expectation and hence Theorem 2.3.1(c) can be used to estimate it.
To clarify the issues in estimating $F$, I will briefly present $F$ for the test of equal MAE. For the sequel let $\psi_x(x)$ and $\Psi_x(x)$ denote the marginal p.d.f. and c.d.f. of a random variable $x$, and let $\psi_x(x|z)$ and $\Psi_x(x|z)$ denote the conditional p.d.f. and c.d.f. of a random variable $x$ given the value of another random variable $z$. Assume that each p.d.f. is continuous and has a bounded density in an open neighborhood of the origin.
The test for equal MAE with $\tau = 1$ involves the null hypothesis $H_0$: $E(|u_{1,t+1}| - |u_{2,t+1}|) = 0$. If the two potential models for the predictand $y_{t+1}$ are scalar linear regression models with regressors $Z_{1,t}$ and $Z_{2,t}$ then the relevant test statistic is $\hat\Omega^{-0.5}P^{-0.5}\sum_{t=R}^{T}(f_{t,1}(\hat\beta_t) - 0)$ with $f_{t,1}(\hat\beta_t) = |y_{t+1} - Z_{1,t}'\hat\beta_{1,t}| - |y_{t+1} - Z_{2,t}'\hat\beta_{2,t}|$ and $\hat\beta_t = (\hat\beta_{1,t}', \hat\beta_{2,t}')'$. To present $F$ it is convenient to define $F = (F_1, F_2)$ relative to the partition of the parameter vector.
4 It should be noted that $\tau - 1$ dependence in the levels of a forecast error does not imply $\tau - 1$ dependence of a function of those forecast errors. For example, a one-step ahead forecast error may form a martingale difference sequence but still exhibit serial correlation in its square. See Harvey et al. (1998) for a discussion.
If we evaluate at the true parameter $\beta^*$ we have

$F_1 = -E\,\mathrm{sgn}(y_{t+1} - Z_{1,t}'\beta_1^*)Z_{1,t}'.$   (8)

If we impose the condition that $X_t$ is strictly stationary, $F = F_t$ for all $t$. For the test of equal MAE, $F_1$ in (8) can be consistently estimated by

$\hat F_1 = -P^{-1}\sum_{t=R}^{T}\mathrm{sgn}(y_{t+1} - Z_{1,t}'\hat\beta_{1,t})Z_{1,t}'.$   (9)
Reintroducing $F_2$ into the discussion (and noticing that there is an extra minus sign introduced), we can estimate $F$ consistently using $\hat F = (\hat F_1, \hat F_2)$ with

$\hat F_2 = P^{-1}\sum_{t=R}^{T}\mathrm{sgn}(y_{t+1} - Z_{2,t}'\hat\beta_{2,t})Z_{2,t}'.$   (10)

If we are willing to impose the stronger assumption that for each $i = 1, 2$, $u_{i,t+1}$ is symmetrically distributed about zero conditional on $Z_{i,t}$, then $E\,\mathrm{sgn}(u_{i,t+1})Z_{i,t}' = 0$ and hence $F = 0$. Similar arguments can be used to derive $F$ and a consistent estimator $\hat F$ for other test statistics. Rather than do so, for the remainder of the paper I will assume that such an estimator $\hat F$ exists.
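Estimators (9) and (10) amount to out-of-sample averages of signed regressors. A sketch, holding the parameter estimates fixed for simplicity (in the paper each term uses the time-$t$ estimate $\hat\beta_{i,t}$; the function name is mine):

```python
import numpy as np

def F_hat(y_next, Z1, b1, Z2, b2):
    # F1_hat = -P^{-1} sum sgn(y_{t+1} - Z1_t' b1) Z1_t'   (eq. 9)
    # F2_hat = +P^{-1} sum sgn(y_{t+1} - Z2_t' b2) Z2_t'   (eq. 10, extra minus sign)
    s1 = np.sign(y_next - Z1 @ b1)
    s2 = np.sign(y_next - Z2 @ b2)
    F1 = -(s1[:, None] * Z1).mean(axis=0)
    F2 = (s2[:, None] * Z2).mean(axis=0)
    return np.concatenate([F1, F2])
```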
To complete the construction of a consistent estimate of $\Omega$, we need to generate consistent estimates of $S_{ff}$, $S_{fh}$ and possibly $S_{hh}$. If $f_{t+\tau}$ and $h_t$ are m-dependent of known order then Assumptions 1-5 suffice for constructing consistent estimates of $S_{ff}$, $S_{fh}$ and $S_{hh}$ (as we will see in Theorem 2.3.2(a)). For example, when evaluating the $\tau$-step ahead predictive ability of two models, Swanson and White (1997) estimate $S_{ff}$ using the first $\tau - 1$ sample autocorrelations of $f$.4 When the order of dependence is unknown, these matrices can instead be estimated using a kernel-based estimator. Such an estimator requires imposing conditions on a kernel, $K(x)$, as well as stronger moment and mixing conditions on $g_t(\beta)$.
Assumption 6. (a) Let $K(x)$ be a kernel such that for all $x$, $|K(x)| \le 1$, $K(x) = K(-x)$, $K(0) = 1$, $K(x)$ is continuous, and $\int_{-\infty}^{\infty}|K(x)|\,dx < \infty$, (b) for $\omega$ defined in Assumption 5, some bandwidth $M$ and constant $\iota$, $\iota \in (0, \min(\omega, 0.5))$, $M = O(P^{\iota})$, and (c) there exists $\bar r \in (1, 2]$ such that $(1 - \iota)^{-1} < \bar r < d$ and $\sum_{j=1}^{\infty}\alpha_j^{(\bar r^{-1} - d^{-1})} < \infty$.
Throughout the following, and for fixed $j \ge 0$, $\hat\Gamma_{ff}(j) = P^{-1}\sum_{t=R+j}^{T}(f_{t,\tau}(\hat\beta_t) - \bar f)(f_{t-j,\tau}(\hat\beta_{t-j}) - \bar f)'$, $\hat\Gamma_{fh}(j) = P^{-1}\sum_{t=R+j}^{T}(f_{t,\tau}(\hat\beta_t) - \bar f)h_{t+\tau-j}'(\hat\beta_{t-j})$ and $\hat\Gamma_{hh}(j) = P^{-1}\sum_{t=R+j}^{T}h_{t+\tau}(\hat\beta_t)h_{t+\tau-j}'(\hat\beta_{t-j})$, where $\bar f = P^{-1}\sum_{t=R}^{T}f_{t,\tau}(\hat\beta_t)$. Furthermore, for $j < 0$, $\hat\Gamma_{ff}(j) = \hat\Gamma_{ff}(-j)'$, $\hat\Gamma_{fh}(j) = \hat\Gamma_{fh}(-j)'$, and $\hat\Gamma_{hh}(j) = \hat\Gamma_{hh}(-j)'$.

Theorem 2.3.2. (a) Under Assumptions 1-5, $\hat\Gamma_{ff}(j) \to_p \Gamma_{ff}(j)$, $\hat\Gamma_{fh}(j) \to_p \Gamma_{fh}(j)$, and $\hat\Gamma_{hh}(j) \to_p \Gamma_{hh}(j)$. (b) Under Assumptions 1-6, $\hat S_{ff} = \sum_{j=-P+1}^{P-1}K(j/M)\hat\Gamma_{ff}(j) \to_p S_{ff}$, $\hat S_{fh} = \sum_{j=-P+1}^{P-1}K(j/M)\hat\Gamma_{fh}(j) \to_p S_{fh}$ and $\hat S_{hh} = \sum_{j=-P+1}^{P-1}K(j/M)\hat\Gamma_{hh}(j) \to_p S_{hh}$.
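Theorem 2.3.2(b) is the familiar kernel (HAC) long-run variance construction. A sketch using the Bartlett kernel of Newey-West (1987), applicable to any sequence such as $f_{t,\tau}(\hat\beta_t)$ or $h_{t+\tau}(\hat\beta_t)$ (names are mine, not the paper's):

```python
import numpy as np

def bartlett(x):
    # Bartlett (Newey-West) kernel: K(x) = 1 - |x| for |x| <= 1, 0 otherwise
    return max(0.0, 1.0 - abs(x))

def long_run_var(g, M):
    """Kernel estimate S = sum_{j=-P+1}^{P-1} K(j/M) Gamma_hat(j) for an array
    g of shape (P, m).  Each column is demeaned (for f this mirrors the use of
    f-bar in Gamma_hat_ff; h has population mean zero), with
    Gamma_hat(j) = P^{-1} sum_t g_t g_{t-j}'."""
    g = np.asarray(g, dtype=float)
    g = g - g.mean(axis=0)
    P = g.shape[0]
    S = g.T @ g / P                    # j = 0 term
    for j in range(1, P):
        w = bartlett(j / M)
        if w == 0.0:                   # Bartlett weights vanish for j >= M
            break
        G = g[j:].T @ g[:-j] / P       # Gamma_hat(j)
        S += w * (G + G.T)             # j and -j terms together
    return S
```

With the Bartlett kernel the resulting estimate is positive semi-definite by construction, which is convenient when $\hat\Omega^{-0.5}$ must be formed.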
We now have all the tools necessary to conduct asymptotically valid out-of-sample inference concerning the moments of nonsmooth functions of parametric forecasts and forecast errors. For example, given $\hat F$, $\hat B$ and $\hat\pi$ such that $\hat F \to_p F$, $\hat B \to_p B$ and $\hat\pi \to \pi$, we can use Theorem 2.3.2 to create $\hat S_{ff}$, $\hat S_{fh}$ and $\hat S_{hh}$ such that $\hat\Omega = \hat S_{ff} + \hat\lambda_{fh}(\hat F\hat B\hat S_{fh}' + \hat S_{fh}\hat B'\hat F') + \hat\lambda_{hh}\hat F\hat B\hat S_{hh}\hat B'\hat F' \to_p \Omega$. Then, using Theorem 2.3.1, we know that $s_T \equiv \hat\Omega^{-0.5}P^{-0.5}\sum_{t=R}^{T}(f_{t,\tau}(\hat\beta_t) - \theta_0) \to_d N(0, I_l)$. If $l = 1$ we can use standard normal tables to test the null. If $l > 1$ we can use the fact that $s_T's_T \to_d \chi^2(l)$ and hence chi-square tables can be used to test the null.
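Putting the pieces together, the plug-in estimate $\hat\Omega$ and the studentized statistic can be sketched as follows for the scalar case ($l = 1$); the adjustment factors $\lambda$ are taken as given from table (6), and the names are mine:

```python
import numpy as np

def omega_hat(S_ff, S_fh, S_hh, F, B, l_fh, l_hh):
    """Plug-in Omega_hat = S_ff + l_fh (F B S_fh' + S_fh B' F')
                                + l_hh F B S_hh B' F'.
    F is (l, k), B is (k, q), S_fh is (l, q), S_hh is (q, q)."""
    FB = F @ B
    return S_ff + l_fh * (FB @ S_fh.T + S_fh @ FB.T) + l_hh * (FB @ S_hh @ FB.T)

def s_T(f_vals, omega, theta0=0.0):
    # Scalar statistic Omega_hat^{-0.5} P^{-0.5} sum_t (f_t - theta0)
    f = np.asarray(f_vals, dtype=float)
    P = len(f)
    om = float(np.asarray(omega).ravel()[0])  # scalar case l = 1
    return (f - theta0).sum() / np.sqrt(P * om)
```

Setting $F = 0$ or passing zero adjustment factors reduces $\hat\Omega$ to $\hat S_{ff}$, reproducing the no-parameter-uncertainty case of Theorem 2.3.1(b).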
3. Empirical evidence
In this section I construct three test statistics. First, I construct the test for equal MAE accounting for parameter uncertainty. Second, I construct the test for equal MAE ignoring parameter uncertainty, using the statistic proposed in Diebold and Mariano (1995). Finally, I construct the test for equal MSE ignoring the potential effects of parameter uncertainty. Under the assumption that OLS provides consistent estimates of the parameters, West (1996) has shown that one can ignore parameter uncertainty when testing for equal MSE.
3.1. Data and sources
The sample period includes 519 monthly observations from 1954:01 to 1997:03. The starting point 1954:01 is chosen to avoid the period, prior to the Treasury-Fed Accord, during which interest rates were pegged. It is also the first month for which monthly frequency observations for dividend yield exist for the S&P 500 composite.
I use the closing value of the S&P 500 composite as of the final Wednesday of the month as the stock price ($P_t$). These are obtained from Standard and Poor's Current Statistics (1997) and Security Price Index Record (1997). The one-month risk-free rate ($I_t$), used to construct excess returns, is the US Treasury Bill series obtained from Ibbotson Associates (1997). Using these two series I construct excess returns as $\mathrm{Return}_t = (P_t + D_t - P_{t-1})/P_{t-1} - I_{t-1}$. Standard and Poor's Statistical Service does not publish the monthly dividend series ($D_t$). I construct one by summing the present and previous three quarter aggregate dividends and dividing by 12. Pesaran and Timmermann (1995) also use this technique.
The two predictors are dividend yield ($DY_{t-1}$) and the earnings-price ratio ($EP_{t-1}$). To ensure that the predictors are truly ex ante I do not use the dividend series ($D_t$) constructed above since it includes information through the end of the present quarter. Instead, I use the dividend yield as reported in the Standard and Poor's Security Price Index Record at the end of each month. For the same reasons, I use the inverse of the price-earnings ratio rather than construct an earnings-price ratio using quarterly information on earnings.
Table 1 reports standard descriptive statistics regarding OLS regressions that use the dividend yield or the earnings-price ratio as predictors. Each regression exhibits little linear predictability. The residuals in each regression have distributions that are skewed and heavy tailed. The residuals exhibit little serial correlation but are conditionally heteroskedastic in the regressors and exhibit ARCH-type behavior.
3.2. Methodology and results
Let the scalar $y_{t+1}$ denote $\mathrm{Return}_{t+1}$ and let $Z_{1,t}$ and $Z_{2,t}$ denote the $(2 \times 1)$ vectors $(1, DY_t)'$ and $(1, EP_t)'$, respectively. We are interested in comparing the predictive ability of the two simple linear regression models

$y_{t+1} = Z_{i,t}'\beta_i^* + u_{i,t+1}, \quad i = 1, 2.$
Table 1
Summary statistics for full sample regressions of excess returns to S&P 500 composite

Panel A: Unrestricted linear regression using both dividend yield and earnings-price ratio
Coefficient (S.E.): Constant -0.0115 (0.0084); DY 0.0099 (0.0054); EP -2.6716 (2.1461)
R-squared = 0.0079; DW = 1.9416; skewness coefficient = -0.3539; kurtosis coefficient = 2.0389
LM test for heteroskedasticity in residuals: chi-squared(5) = 10.7396, p-value = 0.0567
LM test for serial correlation in residuals: chi-squared(12) = 13.0227, p-value = 0.3674
LM test for serial correlation in squared residuals: chi-squared(12) = 26.6380, p-value = 0.0087

Panel B: Restricted linear regression using dividend yield
Coefficient (S.E.): Constant -0.0058 (0.0079); DY 0.0032 (0.0022)
R-squared = 0.0048; DW = 1.9399; skewness coefficient = -0.3714; kurtosis coefficient = 1.9942
LM test for heteroskedasticity in residuals: chi-squared(2) = 6.0936, p-value = 0.0475
LM test for serial correlation in residuals: chi-squared(12) = 12.8804, p-value = 0.3778
LM test for serial correlation in squared residuals: chi-squared(12) = 27.5580, p-value = 0.0064

Panel C: Restricted linear regression using earnings-price ratio
Coefficient (S.E.): Constant 0.0052 (0.0061); EP 0.7589 (0.8570)
R-squared = 0.0020; DW = 1.9406; skewness coefficient = -0.3568; kurtosis coefficient = 2.0280
LM test for heteroskedasticity in residuals: chi-squared(2) = 8.4639, p-value = 0.0145
LM test for serial correlation in residuals: chi-squared(12) = 12.6658, p-value = 0.3938
LM test for serial correlation in squared residuals: chi-squared(12) = 27.3262, p-value = 0.0069

Notes: The data consist of monthly observations from 1954:01 to 1997:03 ($T = 519$). See Section 3 of the text for a description of the data. Standard errors are constructed using a heteroskedasticity robust covariance matrix. The skewness and kurtosis coefficients are constructed using the regression residuals.
The parameters are estimated using OLS and then the parameter estimates $\hat\beta_{i,t}$ are used to construct the forecasts $Z_{i,t}'\hat\beta_{i,t}$.
In this exercise I construct each of the three test statistics nine different ways, corresponding to three different forecasting schemes (recursive, rolling and fixed) and three different splits of the data. I use the three sample splits (54:01-89:12, 90:01-97:03), (54:01-79:12, 80:01-89:12) and (54:01-79:12, 80:01-97:03). Given these splits, the corresponding values of $\hat\pi = P/R$ are 0.20, 0.38 and 0.66.
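As a quick check on the sample splits, the reported values of $\hat\pi = P/R$ follow from simple inclusive month counts (a small illustration; names are mine):

```python
def months(y0, m0, y1, m1):
    # number of monthly observations from y0:m0 through y1:m1, inclusive
    return (y1 - y0) * 12 + (m1 - m0) + 1

splits = [((1954, 1, 1989, 12), (1990, 1, 1997, 3)),
          ((1954, 1, 1979, 12), (1980, 1, 1989, 12)),
          ((1954, 1, 1979, 12), (1980, 1, 1997, 3))]
pis = [round(months(*p) / months(*r), 2) for r, p in splits]
# pis == [0.2, 0.38, 0.66], matching the reported values of pi_hat = P/R
```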
5 I use the integer part of $P^{1/3}$ as the window width.
In constructing the variance estimates I presume no knowledge of heteroskedasticity or serial correlation. I use a Newey-West (1987) serial correlation consistent covariance estimator of $S_{ff}$, $S_{fh}$ and $S_{hh}$.5 I use the out-of-sample forecast errors and out-of-sample values of $y_{t+1}$, $Z_{1,t}$ and $Z_{2,t}$ in the construction of $f_{t,1}(\hat\beta_t) = |y_{t+1} - Z_{1,t}'\hat\beta_{1,t}| - |y_{t+1} - Z_{2,t}'\hat\beta_{2,t}|$ and $h_{t,1}(\hat\beta_t) = [(y_{t+1} - Z_{1,t}'\hat\beta_{1,t})Z_{1,t}', (y_{t+1} - Z_{2,t}'\hat\beta_{2,t})Z_{2,t}']'$. I estimate $B$ using the out-of-sample observations on $Z_{1,t}$ and $Z_{2,t}$ to form the $(4 \times 4)$ block diagonal matrix with $(P^{-1}\sum_{t=R}^{T}Z_{1,t}Z_{1,t}')^{-1}$ in the upper $(2 \times 2)$ diagonal position and $(P^{-1}\sum_{t=R}^{T}Z_{2,t}Z_{2,t}')^{-1}$ in the lower $(2 \times 2)$ diagonal position. To estimate $F$ I use (9) and (10) directly.
The test of equal MAE is constructed a second time ignoring parameter uncertainty. This time the variance is estimated using only an estimate of $S_{ff}$. The estimate is identical to the one used above.
For the sake of comparison, the test of equal MSE was also constructed. For this test, $f_{t,1}(\hat\beta_t) = (y_{t+1} - Z_{1,t}'\hat\beta_{1,t})^2 - (y_{t+1} - Z_{2,t}'\hat\beta_{2,t})^2$. Under the assumption that OLS provides consistent estimates of the parameters, $F = 0$: differentiating $Ef_{t,1}(\beta)$ yields terms of the form $-2EZ_{i,t}u_{i,t+1}$, which vanish by the OLS orthogonality conditions. Using the results in West (1996) we then know that we can ignore parameter uncertainty when estimating the asymptotic variance. In estimating $S_{ff}$, I presume no knowledge regarding the existence of serial correlation. Once again I use the Newey-West (1987) estimator.
Table 2 reports the results of the tests. Each subpanel corresponds to one of the sample splits. The"rst four columns report the raw out-of-sample MAE and MSE associated with each of the two predictive models. The MAE values are scaled by 100 and the MSE values are scaled by 1000. Note that in every instance, the MAE and MSE is larger for model 1 than for model 2.
Column 5 reports the test for equal MAE that accounts for parameter uncertainty. Column 6 reports the test for equal MAE that ignores parameter uncertainty. In every instance, accounting for parameter uncertainty increases the magnitude of the estimated variance. This causes the statistics that account for parameter uncertainty to be uniformly smaller than the ones that do not. This effect can also be seen in the p-values reported in columns 8 and 9. Because of these changes, there are instances in which accounting for parameter uncertainty can affect the decision to reject or fail to reject the null of equal MAE.
During the 1980s, there does not appear to be any difference in predictive ability between the two models. This holds whether we use MAE or MSE as the measure of predictive ability. For this time frame, accounting for parameter uncertainty made little difference in the tests for equal MAE.
Table 2
Testing for relative predictive ability of predictions of excess returns to S&P 500 composite

                Raw values                       Statistics               P-values (2-sided)
            MAE-1  MAE-2  MSE-1  MSE-2    Adj.   UnAdj.  UnAdj.    Adj.   UnAdj.  UnAdj.
                                          MAE    MAE     MSE       MAE    MAE     MSE
90:01–97:03: π̂ = 0.20
Recursive   2.691  2.614  1.183  1.153    1.990  2.565   1.726     0.047  0.010   0.084
Rolling     2.688  2.634  1.183  1.163    1.541  2.354   1.455     0.123  0.019   0.146
Fixed       2.724  2.619  1.199  1.154    1.899  2.540   1.765     0.058  0.011   0.078
80:01–89:12: π̂ = 0.38
Recursive   3.551  3.536  2.280  2.271    0.506  0.515   0.339     0.613  0.606   0.734
Rolling     3.552  3.551  2.284  2.282    0.046  0.048   0.115     0.963  0.962   0.909
Fixed       3.557  3.534  2.270  2.253    0.539  0.590   0.432     0.590  0.555   0.666
80:01–97:03: π̂ = 0.66
Recursive   3.189  3.148  1.819  1.801    1.724  1.844   0.987     0.085  0.065   0.324
Rolling     3.203  3.180  1.828  1.817    1.154  1.326   0.768     0.249  0.185   0.442
Fixed       3.228  3.153  1.831  1.792    1.610  2.223   1.364     0.107  0.026   0.173

Notes: Table 2 reports empirical results relevant to testing for equal MAE and equal MSE between two models used to predict the S&P 500 composite portfolio. Model 1 is an OLS estimated linear regression with an intercept and once-lagged dividend yield. Model 2 is the same but uses the earnings–price ratio. The first four columns report the realized out-of-sample values of the MAE and MSE associated with each model during three different forecast periods. Column 5 reports the values of the test for equal MAE adjusted (Adj.) for parameter uncertainty. Columns 6 and 7 report the values of the statistics used to construct the tests for equal MAE and equal MSE, both ignoring parameter uncertainty (UnAdj.). Columns 8–10 report the p-values (2-sided, from the standard normal distribution) associated with the statistics in columns 5–7. MAEs are scaled by 100. MSEs are scaled by 1000.
During the 1990s, the test for equal MAE that accounts for parameter uncertainty fails to reject the null at the 5% level for two of the three forecasting schemes. We do reject at the 5% level when the recursive scheme is used, but the evidence is weaker than when parameter uncertainty is ignored. Notice that during the 1990s the test for equal MSE fails to reject the null of equal predictive ability at the 5% level for any of the sampling schemes. The null can be rejected at the 10% level when either the recursive or fixed scheme is used.
Similar observations can be made regarding the tests for equal MAE over the sample spanning both the 1980s and 1990s. Ignoring parameter uncertainty, the fixed scheme rejects the null at the 5% level; when parameter uncertainty is accounted for, we fail to reject at even the 10% level. When the rolling scheme is used we fail to reject the null at the 10% level regardless of parameter uncertainty. When the recursive scheme is used we reject the null at the 10% level regardless of parameter uncertainty. Over the same time frame, we fail to reject the null of equal MSE under any of the forecasting schemes.
Less clear is whether it was necessary to account for parameter uncertainty in the first place. Recall that if $F = 0$ then parameter uncertainty is asymptotically irrelevant and hence $S_{ff}$ is the relevant asymptotic variance. For the test of equal MAE, $F$ can be zero if the disturbances have a zero median conditional on the values of the regressors. In this application it seems reasonable to reject that assertion. Using the skewness coefficients reported in Table 1, the null of zero skewness is rejected at the 1% level for each of the three sets of residuals.
4. Simulation evidence
The asymptotic results of Section 2 are only guaranteed to be appropriate for large in-sample sizes $R$ and out-of-sample sizes $P$. It is not clear how well the asymptotic approximation will perform in sample sizes commonly used in empirical work. To examine this problem, I present simulations of the three tests of either equal MAE or equal MSE between the two simple linear regressions

$y_{t+1} = Z_{1,t}'\beta_1^* + u_{1,t+1}$ and $y_{t+1} = Z_{2,t}'\beta_2^* + u_{2,t+1}$,   (12)

where $Z_{1,t}$ and $Z_{2,t}$ denote the $(2 \times 1)$ vectors $(1, z_{1,t})'$ and $(1, z_{2,t})'$, respectively. Each statistic is constructed in precisely the same manner as in Section 3. For each statistic I report the size and size-adjusted power of the test in samples of the size used in Section 3.
First, I simulate a hypothetical data generating process that is stylized to the empirical results of Section 3. The data generating process I have chosen has the representation

$y_{t+1} = \beta_{2,2}^* z_{2,t+1} + u_{t+1}$,  $u_{t+1} = c(x_{1,t+1} + x_{2,t+1}) + \eta_{t+1}$,
$x_{i,t+1} = 2^{-0.5}[(1 - a^2)z_{i,t+1}^2 - 1]$,  $z_{i,t+1} = a z_{i,t} + e_{i,t+1}$,   (13)
$e_{i,t+1} \sim$ i.i.d. N(0, 1),  $\eta_{t+1} \sim$ i.i.d. t(6),  $e_{1,t+1} \perp e_{2,t+1} \perp \eta_{t+1}$,
$c = 0.25$,  $a = 0.9$.
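Under this parameterization the process in (13) can be simulated as below. This is a sketch stylized to (13): the function name and seeding convention are mine, and the 500-observation burn-in described later is left to the caller.

```python
import numpy as np

def simulate_dgp(n, beta22, c=0.25, a=0.9, seed=0):
    """Simulate (13): y_{t+1} = beta22*z_{2,t+1} + u_{t+1}, with
    u_{t+1} = c*(x_{1,t+1} + x_{2,t+1}) + eta_{t+1},
    x_{i,t+1} = 2**-0.5 * ((1 - a**2)*z_{i,t+1}**2 - 1),
    z_{i,t+1} = a*z_{i,t} + e_{i,t+1}, e ~ N(0,1), eta ~ t(6)."""
    rng = np.random.default_rng(seed)
    # initial conditions from the unconditional N(0, 1/(1-a^2)) distribution
    z = rng.normal(0.0, (1 - a**2) ** -0.5, size=2)
    ys = np.empty(n)
    zs = np.empty((n, 2))
    for t in range(n):
        z = a * z + rng.normal(size=2)
        x = 2**-0.5 * ((1 - a**2) * z**2 - 1.0)
        u = c * x.sum() + rng.standard_t(6)
        ys[t] = beta22 * z[1] + u
        zs[t] = z
    # ys[t] is y_{t+1} and zs[t] is z_{t+1}; pair ys[t] with zs[t-1]
    # when forming the predictive regressions (12).
    return ys, zs
```

The squared, centered AR terms $x_{i,t+1}$ inject skewness and conditional heteroskedasticity into $u_{t+1}$, while the t(6) innovation supplies heavy tails.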
The parameter $\beta_{2,2}^*$ (the second component of $\beta_2^*$) is a tuning parameter used to distinguish between the null and the alternative. When $\beta_{2,2}^* = 0$ the null of either equal MAE or equal MSE is satisfied. When $\beta_{2,2}^* \neq 0$ the alternative holds; model 2 has both a lower MAE and a lower MSE than does model 1. I allow this parameter to vary across the values 0, 0.10, 0.25, 0.50 and 1.00. By doing so I am better able to determine how accounting for parameter uncertainty affects the power of the test. I am also better able to determine whether tests for equal MAE or tests for equal MSE are more powerful for detecting small deviations from the null.
The initial conditions for the $z_{i,t}$ are drawn from their unconditional distribution. The simulated series are of length $500 + 519 = 500 + (T + 1) = 1019$; the initial 500 observations are generated to burn out the effects of the initial conditions.
The results are based upon 5000 replications. Note that the same simulated data are used for each sampling scheme and each $(P, R)$ combination in order to facilitate the various small-sample comparisons. To make comparisons possible across the three hypothesis tests, the random number generator is seeded so that the three sets of 5000 samples are the same.
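The common-random-numbers device amounts to one fixed seed per replication, so that every scheme and every $(P, R)$ split within a replication sees identical draws; a sketch (the generator choice is mine):

```python
import numpy as np

def replication_draws(rep, n):
    # One seed per replication: re-creating the generator reproduces the
    # same sample, so all schemes and splits within a replication share
    # exactly the same simulated data.
    rng = np.random.default_rng(rep)
    return rng.normal(size=n)

a = replication_draws(7, 100)
b = replication_draws(7, 100)
```

Sharing draws across schemes and tests removes Monte Carlo noise from the comparisons, so differences in rejection rates reflect the procedures rather than the samples.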
I chose this data generating process for two basic reasons. The first is that it exhibits many of the characteristics of the data used in Section 3. The regressors exhibit strong serial dependence. The distribution of the predictand, $y_t$, has heavy tails and is skewed. The residuals from the two predictive models exhibit conditional heteroskedasticity in the regressors. Also, since the regressors are serially correlated, the squares of the residuals from the two predictive models are serially correlated (i.e., GARCH(1,1)-like effects).
The second reason is that I wanted the two linear models to have little, if any, predictive ability, in order to match the very small $R^2$ values commonly observed in the literature. Here, both predictive models have a population $R^2$ of zero. This occurs because both $\beta_1^*$ and $\beta_2^*$ are zero under the null. This implies that both models have the same predictive ability and hence the null is satisfied.
But it does more than that. It implies that $f_{t+1}$ and $S_{ff}$ are equal to zero when either MAE or MSE is used to measure predictive ability. This does not imply that an asymptotically standard normal test for equal predictive ability cannot be constructed. For there to be asymptotic normality, the limiting variance, $\Omega$, must be positive definite. If parameter uncertainty is irrelevant then $S_{ff}$ must be positive definite. On the other hand, if $f_{t+1}$ is zero for all $t$, and hence $S_{ff} = FBS_{fh}' = 0$, then $FBS_{hh}B'F'$ must be positive definite. For the test of equal MAE in this exercise, and when parameter uncertainty is accounted for, this is not a problem. This occurs because the disturbances are skewed for each predictive model and hence $F = (-\mathrm{E}\,\mathrm{sgn}(u_{1,t+1})Z_{1,t}',\ \mathrm{E}\,\mathrm{sgn}(u_{2,t+1})Z_{2,t}')' \neq 0$. It is a problem for the test of equal MSE since consistent estimation of the parameters by OLS implies $F = (-\mathrm{E}\,u_{1,t+1}Z_{1,t}',\ \mathrm{E}\,u_{2,t+1}Z_{2,t}')' = 0$. Hence, a priori we expect the test for equal MAE, corrected for parameter uncertainty, to be reasonably sized. We also expect the test for equal MAE without the correction for parameter uncertainty, and the test for equal MSE, to be missized.
Table 3 reports the actual size of the three tests when the critical values $\pm 2.576$, $\pm 1.96$ and $\pm 1.645$ are used.
Table 3
Actual size of out-of-sample tests

                Valid MAE               Invalid MAE             Invalid MSE
             1%     5%     10%       1%     5%     10%       1%     5%     10%
π̂ = 0.20
R         0.0118 0.0754 0.1416    0.0758 0.2048 0.3032    0.0642 0.2220 0.3382
L         0.0120 0.0784 0.1484    0.0670 0.1974 0.2986    0.0578 0.2116 0.3336
F         0.0134 0.0720 0.1504    0.1130 0.2556 0.3542    0.1048 0.2798 0.3916
π̂ = 0.38
R         0.0072 0.0418 0.0968    0.0434 0.1624 0.2604    0.0344 0.1704 0.2910
L         0.0082 0.0476 0.1064    0.0434 0.1536 0.2484    0.0344 0.1604 0.2722
F         0.0046 0.0362 0.0898    0.1046 0.2522 0.3462    0.0908 0.2632 0.3796
π̂ = 0.66
R         0.0036 0.0328 0.0768    0.0386 0.1382 0.2224    0.0324 0.1388 0.2376
L         0.0056 0.0458 0.0976    0.0330 0.1188 0.1984    0.0250 0.1200 0.2142
F         0.0012 0.0194 0.0596    0.1074 0.2626 0.3524    0.0904 0.2618 0.3770

Notes: Subpanels denoted π̂ = 0.20, 0.38 and 0.66 indicate sample sizes and splits corresponding to those used in the empirical results reported in Table 2. Columns denoted 1%, 5% and 10% present the actual size of the test when the critical values ±2.576, ±1.96 and ±1.645 are used, respectively. Rows denoted R, L and F signify the use of the Recursive, roLLing and Fixed schemes, respectively. The results are based upon 5000 replications. See Section 4 for further details.
This is especially true for the fixed scheme. Overall it seems that the valid version of the test for equal MAE is reasonably sized for smaller values of π̂, while the two invalid tests are seriously oversized for all values of π̂. Corradi et al. (1999) also find that smaller values of π̂ lead to more accurately sized tests.
Table 4 presents the size-adjusted power of the three tests. Each panel corresponds to a particular choice of the parameter $\beta_{2,2}^*$. In each case the $R^2$ for model 1 is zero, while the $R^2$ for model 2 takes the values 0.61, 0.15, 0.04 and 0.003 as $\beta_{2,2}^*$ varies across 1.00, 0.50, 0.25 and 0.10. In the first two panels the size-adjusted power is quite good for each of the three tests. The asymptotically valid test for equal MAE is best, followed by its invalid version and then the test for equal MSE. It also appears that larger values of π̂ are associated with greater power for each of the tests.
Table 4
Size-adjusted power of out-of-sample tests

                Valid MAE               Invalid MAE             Invalid MSE
             1%     5%     10%       1%     5%     10%       1%     5%     10%
Panel A: β*₂,₂ = 1.00, R² = 0.61
π̂ = 0.20
R         1.0000 1.0000 1.0000    0.9998 1.0000 1.0000    0.9874 0.9984 0.9996
L         1.0000 1.0000 1.0000    1.0000 1.0000 1.0000    0.9876 0.9988 0.9998
F         1.0000 1.0000 1.0000    0.9982 1.0000 1.0000    0.9602 0.9958 0.9988
π̂ = 0.38
R         1.0000 1.0000 1.0000    1.0000 1.0000 1.0000    0.9910 1.0000 1.0000
L         1.0000 1.0000 1.0000    1.0000 1.0000 1.0000    0.9918 1.0000 1.0000
F         1.0000 1.0000 1.0000    1.0000 1.0000 1.0000    0.9652 0.9966 0.9998
π̂ = 0.66
R         1.0000 1.0000 1.0000    1.0000 1.0000 1.0000    0.9984 1.0000 1.0000
L         1.0000 1.0000 1.0000    1.0000 1.0000 1.0000    0.9988 1.0000 1.0000
F         1.0000 1.0000 1.0000    1.0000 1.0000 1.0000    0.9896 0.9994 1.0000
Panel B: β*₂,₂ = 0.50, R² = 0.15
π̂ = 0.20
R         0.9882 0.9962 0.9986    0.9486 0.9856 0.9930    0.8976 0.9698 0.9864
L         0.9870 0.9968 0.9988    0.9564 0.9874 0.9936    0.8974 0.9740 0.9876
F         0.9856 0.9950 0.9976    0.8794 0.9744 0.9874    0.7994 0.9466 0.9748
π̂ = 0.38
R         0.9978 0.9992 0.9998    0.9874 0.9972 0.9986    0.9384 0.9896 0.9952
L         0.9972 0.9992 0.9994    0.9904 0.9972 0.9986    0.9410 0.9904 0.9960
F         0.9970 0.9986 0.9994    0.9488 0.9912 0.9964    0.8512 0.9614 0.9856
π̂ = 0.66
R         1.0000 1.0000 1.0000    0.9990 1.0000 1.0000    0.9878 0.9994 0.9996
L         0.9998 1.0000 1.0000    0.9992 1.0000 1.0000    0.9900 0.9990 0.9996
F         1.0000 1.0000 1.0000    0.9982 0.9992 0.9996    0.9470 0.9912 0.9982
Panel C: β*₂,₂ = 0.25, R² = 0.04
π̂ = 0.20
R         0.7482 0.7858 0.8478    0.3836 0.6224 0.7182    0.3582 0.5636 0.6822
L         0.6274 0.7836 0.8376    0.4084 0.6422 0.7290    0.3580 0.5830 0.6950
F         0.6264 0.7760 0.8304    0.2614 0.5356 0.6676    0.2534 0.4874 0.6166
π̂ = 0.38
R         0.7482 0.8730 0.9066    0.5360 0.7412 0.8168    0.4406 0.6790 0.7676
L         0.7162 0.8626 0.9032    0.5568 0.7404 0.8208    0.4438 0.6796 0.7804
F         0.7388 0.8514 0.8910    0.3496 0.6178 0.7278    0.3060 0.5464 0.6624
π̂ = 0.66
R         0.9272 0.9618 0.9746    0.7830 0.9078 0.9412    0.6688 0.8490 0.8952
L         0.9020 0.9590 0.9728    0.7846 0.9130 0.9514    0.6794 0.8622 0.9098
F         0.8998 0.9438 0.9606    0.6404 0.8080 0.8728    0.4730 0.7216 0.8160
Panel D: β*₂,₂ = 0.10, R² = 0.003
π̂ = 0.20
R         0.1182 0.2426 0.3312    0.0544 0.1510 0.2268    0.0588 0.1484 0.2288
L         0.1114 0.2338 0.3180    0.0598 0.1634 0.2298    0.0574 0.1574 0.2350
F         0.1172 0.2424 0.3288    0.0332 0.1214 0.2002    0.0386 0.1216 0.1930
π̂ = 0.38
R         0.1228 0.2772 0.3652    0.0618 0.1792 0.2644    0.0518 0.1690 0.2484
L         0.0948 0.2522 0.3410    0.0650 0.1692 0.2652    0.0510 0.1568 0.2512
F         0.1264 0.2698 0.3520    0.0342 0.1200 0.2054    0.0354 0.1120 0.1888
π̂ = 0.66
R         0.2076 0.3622 0.4552    0.1012 0.2498 0.3498    0.0862 0.2356 0.3202
L         0.1578 0.3160 0.4142    0.0924 0.2452 0.3522    0.0818 0.2270 0.3300
F         0.2002 0.2424 0.4532    0.0658 0.1704 0.2618    0.0484 0.1552 0.2434
The simulations indicate that correcting for parameter uncertainty can improve the size of tests for equal MAE when parameter uncertainty is asymptotically relevant. They also indicate that tests for equal MAE can be better at detecting small deviations from the null than tests for equal MSE. This is especially important given that linear models tend to have low levels of predictive ability for excess returns to many assets.
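The size-adjusted powers in Table 4 are computed by replacing the asymptotic critical value with the statistic's own empirical null quantile. A minimal sketch of that adjustment (two-sided rule; the function name is mine):

```python
import numpy as np

def size_adjusted_power(null_stats, alt_stats, nominal=0.05):
    # Empirical two-sided critical value from the simulated null
    # distribution, then the rejection rate under the alternative.
    cv = np.quantile(np.abs(null_stats), 1.0 - nominal)
    return np.mean(np.abs(alt_stats) > cv)
```

With 5000 simulated statistics under the null ($\beta_{2,2}^* = 0$) and under each alternative, this reproduces the construction behind Table 4, and by design the adjusted rejection rate under the null equals the nominal level.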
5. Conclusion
In this paper, I show that when parameters are used to construct forecasts and forecast errors, parameter uncertainty can affect the limiting distribution of nonsmooth out-of-sample measures of predictive ability. Section 2 presents sufficient conditions for scaled out-of-sample averages of nondifferentiable functions of forecasts and forecast errors to be asymptotically normal. For these functions I show that the limiting covariance structure can be consistently estimated in a straightforward manner.
I then consider how well these statistics perform in moderate sample sizes and how important it is to account for parameter uncertainty when estimating the limiting covariance. The empirical exercise in Section 3 indicates that, at times, the correction for parameter uncertainty can lead to different conclusions regarding the predictive ability of a model. The simulation exercise in Section 4 shows that the tests can be well sized if one accounts for parameter uncertainty. For the test of equal MAE, the test was more accurately sized and more powerful when the covariance was estimated accounting for parameter uncertainty than when it was estimated ignoring parameter uncertainty. The simulations also indicate that the test for equal MAE may be a better choice than the test for equal MSE for detecting small deviations in predictive ability between two forecasting models.

There are several possible topics for future research concerning out-of-sample inference. Perhaps the most important would be to develop a general theory for the out-of-sample comparison of nested models; such a theory would have applications to tests of causality (Ashley et al., 1980) and the martingale difference hypothesis. Such a theory could also be extended to the out-of-sample comparison of multiple nested models. In either case it would be useful to allow for models with stationary or nonstationary observations. Secondly, since power is of primary importance, it would be helpful to determine the optimal choice of sample split for maximizing the power of the test.
Acknowledgements
Thanks to John Jones, Stephen Sapp and Tricia Gladden for their suggestions. An earlier draft of this paper was distributed under the title 'Out-of-Sample Inference for Moments of Nondifferentiable Functions'.
Appendix A
Notation: $\sup_t$ denotes $\sup_{R \le t \le T}$; 'var' and 'cov' denote variance and covariance; all limits are taken as $T$ goes to infinity; the summation $\sum_t$ denotes $\sum_{t=R}^{T}$; for Lemmas A.2, A.3 and 2.3.1, $N(\varepsilon)$ denotes the open ball $N(\beta^*, \varepsilon)$ about $\beta^*$ generated by the max norm; $f_t(\gamma)$ denotes $f_{t,\tau}(\gamma) - \mathrm{E}f_{t,\tau}(\gamma) - f_{t+\tau} + \mathrm{E}f_{t+\tau}$. For notational simplicity, I consider throughout the case in which $k = 1$, $l = 1$ and $\tau = 1$, so that $\beta^*$, $f_{t,\tau}$ and $h_t$ are scalars.
Lemma A.1. For $a \in [0, 0.5)$: (a) $\sup_t |P^a H(t)| \to_p 0$; and (b) $\sup_t |P^a(\hat\beta_t - \beta^*)| \to_p 0$.

Lemma A.2. For all $a \in [0, 0.5)$ and $\varepsilon > 0$ such that $N(P^{-a}\varepsilon) \subseteq N$ and $0 < P^{-a}\varepsilon < 1$, there exist constants $0 < \tilde C < \infty$ and $r_0 > 0$ such that (a) $\sup_t \|\sup_{\gamma \in N(P^{-a}\varepsilon)} f_t(\gamma)\|_{2d} \le \tilde C (P^{-a}\varepsilon)^{r_0}$; and (b) for all integers $j$, $\sup_t |\mathrm{E}\sup_{\{\gamma_0,\gamma_1\} \in N(P^{-a}\varepsilon)} f_t(\gamma_0) f_{t+j}(\gamma_1)| \le \tilde C \alpha_j^{(d-1)/d}(P^{-a}\varepsilon)^{r_0}$.

Lemma A.3. For fixed $j$, $\hat\Gamma_{ff}(j) \to_p \Gamma_{ff}(j)$, $\hat\Gamma_{fh}(j) \to_p \Gamma_{fh}(j)$ and $\hat\Gamma_{hh}(j) \to_p \Gamma_{hh}(j)$.
Proof of Lemma A.3. Consider $\hat\Gamma_{ff}(j) = P^{-1}\sum_{t=R+j}^{T}(f_{t,\tau}(\hat\beta_t) - \bar f)(f_{t-j,\tau}(\hat\beta_{t-j}) - \bar f)$. The other autocovariances can be handled similarly. By adding and subtracting terms we have

$\hat\Gamma_{ff}(j) = P^{-1}\sum_{t=R+j}^{T}(f_{t+\tau} - \mathrm{E}f_{t+\tau})(f_{t+\tau-j} - \mathrm{E}f_{t+\tau}) + r_T$,   (A.1)

where, with all sums running over $t = R+j, \ldots, T$,

$r_T = P^{-1}\sum_t (f_{t,\tau}(\hat\beta_t) - f_{t+\tau}(\beta^*))(f_{t+\tau-j} - \mathrm{E}f_{t+\tau}) + (\mathrm{E}f_{t+\tau} - \bar f)^2 + (\mathrm{E}f_{t+\tau} - \bar f)P^{-1}\sum_t (f_{t+\tau-j} - \mathrm{E}f_{t+\tau}) + P^{-1}\sum_t (f_{t+\tau} - \mathrm{E}f_{t+\tau})(f_{t-j,\tau}(\hat\beta_{t-j}) - f_{t+\tau-j}(\beta^*)) + P^{-1}\sum_t (f_{t,\tau}(\hat\beta_t) - f_{t+\tau}(\beta^*))(f_{t-j,\tau}(\hat\beta_{t-j}) - f_{t+\tau-j}(\beta^*)) + (\mathrm{E}f_{t+\tau} - \bar f)P^{-1}\sum_t \big[(f_{t+\tau} - \mathrm{E}f_{t+\tau}) + (f_{t,\tau}(\hat\beta_t) - f_{t+\tau}(\beta^*)) + (f_{t-j,\tau}(\hat\beta_{t-j}) - f_{t+\tau-j}(\beta^*))\big]$.   (A.2)

Since the first term of (A.1) converges in probability to $\Gamma_{ff}(j)$ by White (1984, Corollary 3.48), I need only show that $r_T$ converges in probability to zero. Using the triangle and Cauchy–Schwarz inequalities, it is straightforward to show that the absolute value of (A.2) is less than or equal to a bound $\tilde r_T$.
To facilitate reference to Theorem 2.3.2, it is useful to show that $\tilde r_T$ vanishes at rate $P^{-\kappa}$: for all $\delta > 0$, $\varepsilon > 0$ and $0 < \kappa/(2\omega) \le a < 0.5$ there exists $T_0$ such that the bound in (A.4) holds. The remainder of the proof is to show that there exists $T_1$ such that for all $T > T_1$ the first term on the r.h.s. of (A.4) is less than $\delta/2$. Applying Markov's inequality and Assumption 5 we obtain the bound in (A.5). The remainder of the proof then is to show that there exists $T_1 > T_0$ such that for all $T > T_1$ the first term on the r.h.s. of (A.5) is less than $\delta/2$. For the remainder of this proof only, let $\sum_j$ denote $\sum_{-P+1 \le j \ne 0 \le P-1}$. Applying Chebyshev's inequality, and noting that $D_1 = \sum_{j=0}^{\infty} j\alpha_j^{(d-1)/d}$ and hence $D_2 = \sum_{j=0}^{\infty} \alpha_j^{(d-1)/d}$ are positive and finite, if I choose $T_1$ and $\varepsilon$ such that for all $T > T_1$, $\varepsilon < (\delta P^{a r_0}\varepsilon_0^2 / 2\tilde C(D_1 + P^{-1}D_2))^{1/r_0}$
and $0 < P^{-a}\varepsilon < 1$, the result follows. □

Proof of Lemma 2.3.2. Expanding $\mathrm{E}f_{t,\tau}(\hat\beta_t)$ about $\beta^*$ we obtain the expansion in (A.8). It then suffices to show that the latter three terms of (A.8) are $o_p(1)$. Using the triangle inequality, the absolute value of the latter three terms in (A.8) is bounded by products of the factors controlled below. Since $|F|$ and $|B|$ are finite, $\sup_t|B(t) - B| = o_p(1)$ by Lemma A.1(a), and $\sup_t|\partial\mathrm{E}f_{t,\tau}(\tilde\beta_t)/\partial\beta - F| = o_p(1)$ by the continuity of $\partial\mathrm{E}f_{t,\tau}(\beta)/\partial\beta$ and Lemma A.1(b), the result will follow if $P^{-0.5}\sum_t|H(t)| \le \sup_t P^{0.5}|H(t)| = O_p(1)$. I will show this for the recursive scheme; the fixed scheme follows immediately and the rolling scheme follows from a decomposition similar to that in Lemma A.1. From Hall and Heyde (1980, p. 20) and the recursive proof in Lemma A.1, $h_t$ is a mixingale satisfying $\mathrm{E}[\sup_{1 \le s \le T}|(h_1 + \cdots + h_s)^2|] \le cT$ for a constant $c$. But

$P\,\mathrm{E}\big[\sup_t |t^{-2}(h_1 + \cdots + h_t)^2|\big] \le PR^{-2}\,\mathrm{E}\big[\sup_t |(h_1 + \cdots + h_t)^2|\big] \le PR^{-2}\,\mathrm{E}\big[\sup_{1 \le s \le T}|(h_1 + \cdots + h_s)^2|\big]$,

which is less than or equal to $PR^{-2}cT$, which in turn converges to $c\pi(1 + \pi)$, and hence $\sup_t P^{0.5}|H(t)| = O_p(1)$ by Markov's inequality. □

Proof of Theorem 2.3.1. (a) Let $X(T) \equiv P^{-0.5}\sum_t (f_{t+\tau} - \mathrm{E}f_{t+\tau} + FBH(t))$. From Lemmas 2.3.1 and 2.3.2 we know $P^{-0.5}\sum_t (f_{t,\tau}(\hat\beta_t) - \mathrm{E}f_{t+\tau}) = X(T) + o_p(1)$, with $\lim \mathrm{var}[X(T)] = \Omega$. Asymptotic normality then follows from Theorem 3.1 of Wooldridge and White (1988). Details are in the additional appendix. (b) Follows immediately from (a) and (5) in the text. (c) Follows immediately from (a) and (b). □
Proof of Theorem 2.3.2. The proof of (a) is immediate from Lemma A.3. The proof of (b) requires more detail. The proof will be provided for $\hat S_{ff}$; the others follow from similar arguments. Using the decomposition in Lemma A.3 we have

$\hat S_{ff} = \sum_{j=-P+1}^{P-1} K(j/M)\hat\Gamma_{ff}(j) = \sum_{j=-P+1}^{P-1} K(j/M)\Big\{P^{-1}\sum_{t=R+j}^{T}(f_{t+\tau} - \mathrm{E}f_{t+\tau})(f_{t+\tau-j} - \mathrm{E}f_{t+\tau})\Big\} + \sum_{j=-P+1}^{P-1} K(j/M)\,r_T$   (A.9)

for $r_T$ defined in (A.2). The first right-hand side term in (A.9) converges in probability to $S_{ff}$ by Hansen (1992, Theorem 1). It then remains to be shown that the second term in (A.9) is $o_p(1)$. Since $\Gamma_{ff}(j) = \Gamma_{ff}(-j)'$, it is sufficient to show this for the $\sum_{j=0}^{P-1} K(j/M)\,r_T$ portion of the second term. Since $|r_T| \le \tilde r_T$ (defined in the proof of Lemma A.3), this bound can be utilized to obtain

$\Big|\sum_{j=0}^{P-1} K(j/M)\,r_T\Big| \le \sum_{j=0}^{P-1} |K(j/M)|\,\tilde r_T \le (M/P^{\kappa})\Big(M^{-1}\sum_{j=0}^{P-1}|K(j/M)|\Big)P^{\kappa}\tilde r_T$.   (A.10)

By assumption $(M/P^{\kappa}) = O_p(1)$ and $M^{-1}\sum_{j=0}^{P-1}|K(j/M)| \to \int_0^{\infty}|K(x)|\,\mathrm{d}x < \infty$. The result follows since, by the proof of Lemma A.3, $\tilde r_T$ is $o_p(P^{-\kappa})$. □
References

Akgiray, V., 1989. Conditional heteroscedasticity in time series of stock returns: evidence and forecasts. Journal of Business 62, 55–80.
Ashley, R., Granger, C.W.J., Schmalensee, R., 1980. Advertising and aggregate consumption: an analysis of causality. Econometrica 48, 1149–1167.
Campbell, J.Y., Shiller, R.J., 1988. Stock prices, earnings, and expected dividends. The Journal of Finance 43, 661–676.
Chen, X., Swanson, N.R., 1996. Semiparametric ARX neural network models with an application to forecasting inflation. Working Paper, University of Chicago and Pennsylvania State University.
Corradi, V., Swanson, N.R., Olivetti, C., 1999. Predictive ability with cointegrated variables. Manuscript, Texas A&M University.
Davidson, R., 1994. Stochastic Limit Theory. Oxford University Press, New York.
Diebold, F.X., Mariano, R.S., 1995. Comparing predictive accuracy. Journal of Business and Economic Statistics 13, 253–263.
Engel, C., 1994. Can the Markov switching model forecast exchange rates? Journal of International Economics 36, 151–165.
Fair, R.C., Shiller, R.J., 1990. Comparing information in forecasts from econometric models. The American Economic Review 80, 375–389.
Fama, E.F., 1991. Efficient capital markets: II. The Journal of Finance 46, 1575–1617.
Fama, E.F., French, K.R., 1988. Dividend yields and expected stock returns. Journal of Financial Economics 22, 3–25.
Gerlow, M.E., Irwin, S.H., Liu, T., 1993. Economic evaluation of commodity price forecasting models. International Journal of Forecasting 9, 387–397.
Granger, C., 1969. Prediction with a generalized cost of error function. Operational Research Quarterly 20, 199–207.
Hall, P., Heyde, C.C., 1980. Martingale Limit Theory and Its Application. Academic Press, New York.
Hansen, B.E., 1992. Consistent covariance matrix estimation for dependent heterogeneous processes. Econometrica 60, 967–972.
Harvey, D.I., Leybourne, S.J., Newbold, P., 1998. Forecast evaluation tests in the presence of ARCH. Manuscript, Loughborough University and University of Nottingham.
Henriksson, R.D., Merton, R.C., 1981. On market timing and investment performance II: statistical procedures for evaluating forecasting skills. Journal of Business 54, 513–533.
Ibbotson Associates, 1997. In: Kaplan, P.D. (Ed.), Stocks, Bonds, Bills and Inflation: 1997 Yearbook. R.G. Ibbotson Associates, Chicago.
Kim, J., Pollard, D., 1990. Cube root asymptotics. The Annals of Statistics 18, 191–219.
Kuan, C., Liu, T., 1995. Forecasting exchange rates using feedforward and recurrent neural networks. Journal of Applied Econometrics 10, 347–364.
Meese, R.A., Rogoff, K., 1983. Empirical exchange rate models of the seventies: do they fit out of sample? Journal of International Economics 14, 3–24.
Newey, W.K., West, K.D., 1987. A simple positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55, 703–708.
Pagan, A., Schwert, G.W., 1990. Alternative models for conditional stock volatility. Journal of Econometrics 45, 267–290.
Pesaran, M.H., Timmermann, A., 1992. A simple nonparametric test of predictive performance. Journal of Business and Economic Statistics 10, 561–565.
Pesaran, M.H., Timmermann, A., 1995. Predictability of stock returns: robustness and economic significance. The Journal of Finance 50 (4), 1201–1228.
Randles, R.H., 1982. On the asymptotic normality of statistics with estimated parameters. The Annals of Statistics 10, 463–474.
Shiller, R.J., 1984. Stock prices and social dynamics. Brookings Papers on Economic Activity 2, 457–510.
Standard and Poor's Current Statistics, 1997, September. McGraw-Hill, New York.
Standard and Poor's Security Price Index Record, 1997. McGraw-Hill, New York.
Stekler, H.O., 1991. Macroeconomic forecast evaluation techniques. International Journal of Forecasting 7, 375–384.
Swanson, N.R., White, H., 1995. A model-selection approach to assessing the information in the term structure using linear models and artificial neural networks. Journal of Business and Economic Statistics 13, 265–275.
Swanson, N.R., White, H., 1997. A model-selection approach to real-time macroeconomic forecasting using linear models and artificial neural networks. The Review of Economics and Statistics 79, 540–550.
Weiss, A.A., 1996. Estimating time series models using the relevant cost function. Journal of Applied Econometrics 11, 539–560.
West, K.D., 1996. Asymptotic inference about predictive ability. Econometrica 64, 1067–1084.
West, K.D., McCracken, M.W., 1998. Regression-based tests of predictive ability. International Economic Review 39, 817–840.
White, H., 1984. Asymptotic Theory for Econometricians. Academic Press, New York.
White, H., 2000. A reality check for data snooping. Econometrica 68, 1097–1126.
Wooldridge, J.M., White, H., 1988. Some invariance principles and central limit theorems for dependent heterogeneous processes. Econometric Theory 4, 210–230.