Full Terms & Conditions of access and use can be found at
http://www.tandfonline.com/action/journalInformation?journalCode=ubes20
Download by: [Universitas Maritim Raja Ali Haji] Date: 11 January 2016, At: 22:27
Journal of Business & Economic Statistics
ISSN: 0735-0015 (Print) 1537-2707 (Online) Journal homepage: http://www.tandfonline.com/loi/ubes20
Rejoinder
Andrew J. Patton & Allan Timmermann
To cite this article: Andrew J. Patton & Allan Timmermann (2012) Rejoinder, Journal of Business & Economic Statistics, 30:1, 36-40, DOI: 10.1080/07350015.2012.634354
To link to this article: http://dx.doi.org/10.1080/07350015.2012.634354
Published online: 22 Feb 2012.
Submit your article to this journal
Article views: 97
Rejoinder
Andrew J. P
ATTONDepartment of Economics, Duke University, 213 Social Sciences Building, Box 90097, Durham, NC 27708 ([email protected])
Allan T
IMMERMANNUniversity of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093 ([email protected])
1. INTRODUCTION
We thank our discussants for their stimulating and thoughtful comments on our article. Each of the five discussions raises an interesting set of points, and across all of them a few key themes arise. The following sections discuss each of these topics in turn.
2. INSTABILITIES IN THE DATA GENERATING PROCESS
Croushore and Rossi both raise the question of how tests of forecast rationality fare in the presence of instabilities in the data generating process for the target variable or the forecasts. Croushore presents empirical results showing that the statisti-cal significance of forecast bias varies greatly depending on the choice of start and end dates for the sample period. Rossi out-lines interesting work from Rossi and Sekhposyan (2011), which allows the researcher to conduct tests of forecast rationality that are robust to the presence of such instabilities, by computing test statistics across a range of subsamples, and then obtaining a critical value for some function (e.g., the sup) of these test statistics. We view this as a very useful extension since the type of macroeconomic variables often considered in forecast ratio-nality tests are indeed subject to breaks, notably inflation. By Proposition 3, instability in the target variable should not affect the multi-horizon bounds we propose, provided that forecasters’ information set is expanding through time and the fixed-event setup is adopted. However, the finite sample properties of the bounds tests could well be affected by such instability, in which case an approach like that of Rossi and Sekhposyan could be useful.
Croushore also notes that some of the tests could wrongly reject the null of forecast rationality due to changes in the defi-nition of the target variable through time. As he acknowledges, this is less of an issue when the test is based solely on the forecasts, at least if changes to the definition of the target vari-able happen after the target date. If it happens midstream, that is, while forecasts for a given target date are still being pro-duced, the results could of course be affected. This corresponds to changing the definition of the target variable at some date, t−hM, say, fromYtto
...
Yt, in which case the variances of the
opti-mal long-term and short-term forecasts, ˆY∗
t|t−hL =E[
...
Yt|Ft] and
Yt∗|t−h
L =E[Yt|Ft], respectively, (forhS< hM < hL) need not
satisfy the mean squared forecast bounds of the article since they apply to different variables. This can equivalently be thought of as a violation of the assumption of an expanding information
set as time progresses, which is one of the key assumptions underpinning the proposed tests.
3. PARAMETER ESTIMATION ERROR
Hoogerheide, Ravazzolo, and van Dijk present an interesting application of our “optimal revision regression” [see Equation (20) of the main article] to a Value-at-Risk forecasting appli-cation. This application exploits the fact that no data on the target variable are required, a desirable feature in volatility or value-at-risk applications, where the target variable is latent. In a simulation study for this application, these authors document that the test is (somewhat) oversized when parameter estimation error (PEE) is ignored (size of 0.12 or 0.08 for a nominal 0.05 test). West also raises concerns about the finite sample size of our rationality tests in the presence of PEE, and notes that in other applications PEE can lead to size distortions.
Hoogerheide et al. propose a simulation-based method for obtaining critical values that take into account parameter es-timation error, while West provides references for asymptotic adjustments for PEE. In applications where the forecast eval-uator is also the forecast producer, and so simulation-based and asymptotic adjustments are feasible, we agree that this is an improvement over simply ignoring PEE. For such cases, it is also feasible to construct critical values for the sec-ond moment bounds that account for the estimation scheme. However, we note that survey or expert forecasts, such as those in our empirical application, are typically based both on (unknown) econometric models and also on subjective judg-ment. In such applications, it is not clear how PEE should be handled.
We should note that PEE may affect the critical values for certain tests, but more important is whether such tests are even applicable. For example, while Corollary 1 states that the mean squared forecast error must be weakly increasing in the fore-cast horizon, in the presence of PEE it has long been known that this monotonicity property may break down, see Hoque, Magnus, and Pesaran (1988), Theorem 2, and Magnus and Pe-saran (1989), Theorem 1, for example, in the case of an AR(1) model. Thus, a term structure of mean squared error (MSE) values that is not weakly increasing neednotbe indicative of
© 2012American Statistical Association Journal of Business & Economic Statistics
January 2012, Vol. 30, No. 1 DOI:10.1080/07350015.2012.634354
36
Patton and Timmermann: Rejoinder 37
Table 1. Monotonicity bounds and regression results in the presence of parameter estimation error in an AR(1) model
φ=0.1 φ=0.5 φ=0.95
Estimation sample size: 25 50 100 25 50 100 25 50 100
Tests
NOTE: This table presents results on whether a given monotonicity bound or regression parameter result based on forecast optimality in the absence of estimation error continues to hold in the presence of estimation error. If the result holds it is marked with “√,” otherwise it is marked with “·”. The model considered is a simple AR(1) with an intercept,yt=µ+φyt−1+εt, εt∼N(0,1). Three in-sample estimation samples are considered (25,50, and 100 observations) and three levels of persistence are considered (φ=0.1,0.5,0.95).
forecast suboptimality, rather it may simply reflect PEE. Fur-thermore, the regression parameter restrictions (such as those from the Mincer–Zarnowitz regression or the “optimal revision regression,” MZ and ORR) may also fail to hold in the presence of PEE. For example, a finite sample bias in the estimation of the lag coefficients in AR models is well known, and this will inevitably lead to deviations of the population MZ and ORR parameters from their hypothesized values. In general, as the regression-based tests are based onequalityrestrictions rather than inequality restrictions, one might suspect that these are more sensitive to the presence of PEE.
To study whether this is an issue in applications similar to those considered in our Monte Carlo study, we conducted the following study: consider a simple AR(1) with intercept as in Section 5 of the article, with autoregressive coefficient φ∈ {0.1,0.5,0.95}. We consider rolling window estimation of this model based on R∈ {25,50,100}observations. (The results forR=1000 are very close to theR=100 case and so are not reported.)Table 1records a “√” if the monotonicity results established in the article hold across forecasting horizons h=1,2, . . . ,8 in the presence of estimation error, and records a “·” if this is violated. This table shows that for estimation sample sizes of 100 or more, all of the monotonicity and bounds results in the article continue to hold, at least for this popular data generating process. For an estimation sample of size 50, consistent with the results in Magnus and Pesaran (1989, Table 1), all results in the paper hold, except for a single result (MSE) for the least predictable case (φ=0.1).It is only when the sample size is 25 observations that we see scattered evidence of PEE leading to breakdowns in the monotonicity results of the article. This is in contrast with the regression-based results, all of which fail to hold in the presence of PEE, even with large estimation samples due to the finite sample biases in the parameter estimates. This confirms the intuition above that tests
based on inequality restrictions are likely to be less sensitive to PEE than those based on equality restrictions.
InTable 2, we present the results of a simulation study of the finite sample size of the proposed tests in the presence of PEE. These results can be compared to those inTable 1of the article for H =8 and zero measurement error. The results confirm that the bounds-based tests (in the top seven rows) are largely robust to PEE. It is only when persistence is low (φ=0.1) and the sample size is very short (25 observations) that we see some evidence that these tests are oversized. The regression-based tests, on the other hand, are all grossly oversized, which is consistent with the theoretical results in Table 1, and with the findings reported inTable 1 by Hoogerheide et al., which showed that the parameter restrictions that hold under the absence of PEE do not, in fact, hold in general.
The results in Tables1and 2 are specific to the simple AR(1) data generating process (DGP). The effect of PEE will generally depend on both the complexity of the model used to compute forecasts, larger effects likely associated with large-scale multi-variate models,and on the true DGP, both of which are typically unknown. It will also depend on the estimation method and the scheme used to update the parameter estimates (fixed, rolling, or expanding window). Another issue is that most results are limited to forecasts based on plug-in estimators which ignore the effect of estimation error and so could be improved, for example, through a bias-adjustment procedure, in a way that depends on the forecast horizon.
4. FINITE SAMPLE PERFORMANCE
Because samples used to evaluate forecast rationality are typically quite small, we agree with the observation made by Croushore that forecast rationality tests can be quite sensitive to the sample at hand. Figure 1 in Croushore’s analysis
Table 2. Monte Carlo simulation of size of tests of forecast optimality in the presence of parameter estimation error in an AR(1) model
φ=0.1 φ=0.5 φ=0.95
Estimation sample: 25 50 100 25 50 100 25 50 100
Tests
MZ on short horizon 94.0 58.8 33.2 22.5 4.1 5.9 38.7 15.4 14.6
Univar opt. rev. regr. 61.6 36.4 26.5 35.8 24.0 21.9 36.6 23.9 15.8
Univar opt. rev. regr., with proxy 31.1 26.1 23.8 50.4 32.8 26.3 45.2 26.6 17.9
Univar MZ, Bonf. 85.5 56.8 38.4 83.9 64.0 53.9 80.7 60.5 37.6
Univar MZ, Bonf., with proxy 53.1 41.6 43.0 81.0 64.1 24.5 79.7 60.0 36.9
Vector MZ 99.8 97.3 92.4 97.8 93.9 89.3 73.2 57.8 37.2
Vector MZ, with proxy 57.8 54.3 67.5 80.8 83.2 87.3 67.5 48.5 30.0
Bonf, using actuals 40.1 22.7 21.9 17.5 10.8 9.6 18.1 8.3 6.1
Bonf, using forecasts only 12.5 10.2 12.1 25.2 13.4 11.5 21.4 8.6 6.0
Bonf, all tests 34.5 17.0 18.7 21.2 11.1 9.0 18.2 6.3 4.9
NOTES: This table presents the outcome of 1,000 Monte Carlo simulations of the size of various forecast optimality tests, corresponding toTable 1of the article. Data are generated by a first-order autoregressive process with persistence parameterφof 0.1,0.5,or 0.95.Three rolling window estimation samples are considered (25, 50, and 100 observations). The maximum forecast horizon is assumed to be eight periods. The simulations assume an out-of-sample size of 100 observations and a nominal size of 10%. The inequality tests are based on the Wolak (1989) test and use simulated critical values based on a mixture of chi-squared variables. Tests labeled “with proxy” refer to cases where the one-period forecast is used in place of the predicted variable.
illustrates that the finite sample power of the test is an important consideration. For the graph with a fixed end date (2006), the length of the evaluation sample expands as one moves to the left along thex-axis, thereby increasing the power of the test and the probability of detecting any biases that may be present in the forecasts. This does not, however, help explain why the evidence against the null of no bias is weaker to the right in the graph that conditions on the starting date (1971) and expands the evaluation window as time moves forward. In this case, the test in fact rejects short evaluation windows.
In principle, using information on several forecast horizons should help improve power. For example, the optimal revision regression
tests for biases in the individual forecast revisions as well as in the long-run forecast. This allows us to decompose any biases into the individual forecast horizons, which is more difficult if only a single forecast horizon is considered. For comparison, the regressions proposed in Nordhaus (1987) and Lahiri and Sheng (2008)
dt|hS,hM =βdt|hM,hL+ut or (2)
et|t−hS =γ dt|hS,hL+ut, (3)
are equality tests of covariances for a given horizon although they exploit multi-horizon information through use of forecast revisions.
The empirical work reported by Lahiri suggested that fore-casters revise their forecasts in a nonsmooth way as a
func-tion of the forecast horizon which can cause problems such as ill-behaved regressors in the optimal revision regression. This observation suggests that one could increase the finite sample power of the tests by carefully selecting a subset of horizons and focusing on the associated moments.
One way to study the finite sample performance of the new forecast rationality tests is to use the bootstrap reality check of White (2000) or the refinement proposed by Hansen (2005), rather than rely on asymptotical critical values.Table 3 illus-trates the outcome of 1,000 Monte Carlo simulations used to study the size and power of a subset of the variance bounds. Compared with the tests based on the Wolak method, these tests are more undersized under the White approach, but slightly less so using the Hansen refinement whenH =8. The power of the tests is comparable across the three methods, with a slightly better performance under the asymptotical critical values. The bootstrap approach is likely to outperform by a greater margin as the number of forecast horizons,H, gets larger.
5. INTERPRETATION OF THE TESTS
Lahiri and West both raise the point that the tests proposed in our article will fail to detect certain deviations from rationality, that is, they are not consistent tests. West makes this point in the context of a mis specified AR(1) forecasting model applied to a MA(2) data-generating process, while Lahiri considers a correctly specified MA(q) model based on incorrect parameters. This is a valid concern, and is one that applies not only to our bound-based tests but also to the familiar Mincer–Zarnowitz tests, forecast error autocorrelation tests, our proposed optimal revision regression, etc. Consistent regression-based tests of
Patton and Timmermann: Rejoinder 39
Table 3. Comparison of asymptotic- and bootstrap-based tests
Size Equal noise Increasing noise
Asymp White Hansen Asymp White Hansen Asymp White Hansen
Tests
Dec Cov, with proxy 6.0 0.1 5.0 15.5 11.8 14.7 13.9 13.0 12.0
Cov bound, with proxy 0.0 5.9 5.1 99.2 98.7 98.5 100.0 100.0 100.0
NOTES. This table presents the outcome of 1,000 Monte Carlo simulations of the size and power of various forecast optimality tests, corresponding to Tables1and 2 of the article. All tests have a nominal size of 10%. The forecast horizon is assumed to be eight periods, the level of measurement error is assumed to be “medium,” and the out-of-sample period is assumed to be 100 observations. The first column of each panel presents results based on the Wolak (1989) test and uses simulated critical values based on a mixture of chi-squared variables. These values are repeated from column 5 ofTable 1and columns 5 and 11 ofTable 2in the article (reported here for ease of comparison). The second and third columns present results based on a bootstrap implementation based on White (2000) and Hansen (2005), respectively.
forecast rationality were considered by Corradi and Swanson (2002), though to the best of our knowledge the multi-horizon equivalent of the tests in that article are not available in the literature. Moreover, practice tests with power against generic alternatives can be difficult to interpret and are unlikely to have much power in any given direction, given the short evaluation samples that are typically available.
While we acknowledge that each of our bounds has power only against certain deviations from rationality, when used as asuiteof tests they cover a variety of deviations. Further, our simulation study suggests that combining these tests via a Bon-ferroni approach does not lead to an overly conservative test, and provides power in a (finite) variety of directions.
5.1 Improving Imperfect Forecasts
One of the benefits of using tests with power in a specific di-rection is that they offer clues to how suboptimal forecasts can be improved. Croushore raises the point that unlike more stan-dard regression-based tests of forecast rationality, the bounds tests proposed in our article do not naturally provide an “im-proved” forecast as an output. For example, consider a standard Mincer–Zarnowitz regression
Along similar lines, our proposed optimal revision regression in Equation (1) may also be used to generate an improved forecast
ˆ
It is true that the bounds tests we propose do not immediately lead to an improved forecast. However, considering what types of mis-specification will be detected by the various bounds tests does offer the possibility of improving the forecasts. We next illustrate how the rationality tests can be interpreted using two examples of suboptimal forecasts.
5.2 The Lazy Forecaster
Consider the case of a lazy forecaster, who, in constructing a short-horizon forecast, ˜Yt|t−hS, does not update his long-horizon
forecast, ˜Yt|t−hL,with relevant information, and hides this lack of
updating by adding a small amount of zero-mean, independent noise to the long-horizon forecast. In that case
˜
Thus, such a forecaster will be revealed via a violation of the MSE bounds. This forecaster will also violate the bound on the variance of the forecast revision
VY˜t|t−hS−Y˜t|t−hL
As shown in Corollary 4, the variance of the optimal forecast revision is bounded above by twice its covariance with the target variable. When the forecast is “updated” with pure noise, this bound is violated and in this case the long-horizon forecast is better than the short-horizon forecast, even when the latter is available.
5.3 Overreacting to News
As a second example, consider a forecaster who overreacts to new information about the target variable. We will model this as a forecast given by
ˆ
Yt|t−h=Yˆt∗|t−h−1+γ dt∗|h,h+1, γ ≥1. (9)
When γ =1, we have ˆYt|t−h=Yˆt∗|t−h−1+dt∗|h,h+1=Yˆt∗|t−h,
and thus the forecast is optimal, but whenγ >1 the forecaster updates her forecast by more than what is optimal. Consider now what this does to the observed forecast revisions
dt|h,h+1=Yˆt|t−h−Yˆt|t−h−1=γ dt∗|h,h+1+(1−γ)dt∗|h+1,h+2.
(10)
Notice that unlike optimal forecast revisions, which are uncor-related, these forecast revisions will be negatively correlated
cov
dt|h,h+1, dt|h+1,h+2
=γ(1−γ)V
dt∗|h+1,h+2
≤0 forγ ≥1 (with equality whenγ =1). (11)
This implies that mean squared forecast revisions (MSFR) can violate the weakly increasing property established in Corollary 1
Vdt|h,h+2
=Vdt|h,h+1+dt|h+1,h+2
=Vdt|h,h+1
+Vdt|h+1,h+2
+2covdt|h,h+1, dt|h+1,h+2
V
dt|h,h+1. (12)
Only when γ =1 does the third term drop out and we are ensured that the left-hand side is weakly greater than the
right-hand side. Thus, a violation of the MSFR condition, as found in our empirical application for GDP growth forecasts, illustrated in Figure 2, may indicate overreaction to news arriving between the release dates of forecasts.
ADDITIONAL REFERENCES
Corradi, V., and Swanson, N. R. (2002), “A Consistent Test for Nonlinear Out of Sample Predictive Accuracy,”Journal of Econometrics, 110, 353–381. [39]
Hoque, A., Magnus, J.R., and Pesaran, B. (1988), “The Exact Multi-Period Mean-Square Forecast Error for the First-Order Autoregressive Model,” Journal of Econometrics, 39, 237–246. [36]
Lahiri, K., and Sheng, X. (2008), “Evolution of Forecast Disagreement in a Bayesian Learning Model,”Journal of Econometrics, 144, 325–340. [38] Magnus, J.R., and Pesaran, B. (1989), “The Exact Multi-Period Mean-Square
Forecast Error for the First-Order Autoregressive Model With an Intercept,” Journal of Econometrics, 42, 157–179. [36]
Rossi, B., and Sekhposyan, T. (2011), “Forecast Optimality Tests in the Presence of Instabilities,” Mimeo, Duke University. [36]