5 Some choices in forecast construction
5.4 Practical problems in the model-building process
5.4.1 Independence of inputs to your forecast
One of the most common approaches to model-building is to use simple linear regression to identify a variable that is believed to have been a lead indicator for some asset returns.
When we do this, we have the problem that our factors are not always independent of each other. In these circumstances, if we do a stepwise regression the fitted value of the coefficients will depend on the order in which the different terms are introduced. This is because if two factors are highly correlated, whichever is introduced first will pick up most of the variation that is associated with the pair. Hence, the coefficients in your model are not a single unique set of values even when you have comprehensive and accurate data on the system.
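The order dependence described above can be illustrated with a minimal sketch on simulated data. Everything here is an assumption made for illustration: the two factors `f1` and `f2`, their correlation of about 0.9, the equal true sensitivities of 0.5, and the simplified "fit one factor, then fit the next to the residual" procedure standing in for a stepwise regression.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Two hypothetical factors with a correlation of about 0.9 (illustrative only).
f1 = rng.standard_normal(n)
f2 = 0.9 * f1 + np.sqrt(1 - 0.9**2) * rng.standard_normal(n)

# Simulated return that loads equally on both factors, plus noise.
y = 0.5 * f1 + 0.5 * f2 + rng.standard_normal(n)


def slope(x, target):
    """Least-squares slope of target on one regressor (no intercept)."""
    return float(x @ target / (x @ x))


def sequential_fit(first, second, target):
    """Fit `first`, then fit `second` to whatever `first` left unexplained."""
    b_first = slope(first, target)
    b_second = slope(second, target - b_first * first)
    return b_first, b_second


b1_then_b2 = sequential_fit(f1, f2, y)  # f1 introduced first
b2_then_b1 = sequential_fit(f2, f1, y)  # f2 introduced first

# Joint estimation of both sensitivities in a single regression.
joint, *_ = np.linalg.lstsq(np.column_stack([f1, f2]), y, rcond=None)

print("f1 first:", b1_then_b2)
print("f2 first:", b2_then_b1)
print("joint   :", joint)
```

In this simulation, whichever factor enters first absorbs most of the shared variation and the second is left with a small coefficient, whereas the joint fit recovers sensitivities close to the equal values used to generate the data.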
When constructing a forecast, you are often explicitly or implicitly using sub-models that have been calibrated independently. For example, your top-down forecasts may be prepared by an economist or strategist, while your bottom-up forecasts are prepared by a set of company analysts or fund managers. Hence the drivers for these sub-models may be highly correlated without you realizing it. If you now enter high scores for two correlated factors, you are in effect double-counting that score.
There are several palliative actions which can be adopted to minimize these problems.
The simplest is careful selection of your factor set, so that the factors are as independent as possible.
More sophisticated approaches range from jointly estimating all the driver sensitivities so that the stepwise regression effect counteracts the double-counting effect, through to the formulation of Grinold and Kahn (1999: 263), where you assume that you have a knowledge of the full covariance between forecast and returns as follows:
$$E[y \mid g] = E[y] + \mathrm{Cov}[y, g]\,\mathrm{Var}^{-1}[g]\,(g - E[g]) \qquad (5.8)$$
where g is the vector of raw forecast scores and y is the return vector. In words, this is saying that our expected return given our scores is consensus expected return plus a scaled term proportional to the deviation of our score from its consensus value – the scaling term being derived from the covariance of observed return with the forecast scores.
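As a rough numerical sketch of equation (5.8), consider one asset and two highly correlated scores. All of the numbers below (the consensus return, the covariances and the score values) are invented for illustration, and the function name `refined_forecast` is hypothetical.

```python
import numpy as np


def refined_forecast(E_y, E_g, cov_yg, var_g, g):
    """E[y|g] = E[y] + Cov[y,g] Var[g]^{-1} (g - E[g]), as in equation (5.8)."""
    return E_y + cov_yg @ np.linalg.solve(var_g, g - E_g)


# Illustrative inputs (assumptions, not estimates from real data).
E_y = np.array([0.06])                 # consensus expected return, one asset
E_g = np.array([0.0, 0.0])             # consensus value of the two scores
cov_yg = np.array([[0.02, 0.02]])      # covariance of the return with each score
var_g = np.array([[1.0, 0.9],
                  [0.9, 1.0]])         # the two scores are 0.9 correlated
g = np.array([1.0, 1.0])               # high score entered on both factors

forecast = refined_forecast(E_y, E_g, cov_yg, var_g, g)

# Treating the scores as independent (identity covariance) double-counts them.
naive = refined_forecast(E_y, E_g, cov_yg, np.eye(2), g)

print("with score covariance:", forecast)
print("assuming independence:", naive)
```

With these inputs the adjustment to the consensus return is roughly halved once the correlation between the scores is taken into account, which is the double-counting effect in miniature.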
5.4.2 Independence of updates in a Bayesian context
One of the more elegant forecasting techniques that you can use is based on a Bayesian analysis, where the investor has prior information represented by a probability distribution for likely return. This is then updated as new information arrives to produce an updated distribution.
If the information in the prior and the update are independent of each other, then this results in a very attractive cumulative process which is easy to implement – as we can see by considering Bayes’ theorem in more detail. At one level, Bayes’ equation is little more than an axiomatic statement about conditional probabilities:
$$P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(B)}$$
For our purposes here, this is really an updating equation. Given an initial probability P(A), we can refine that probability as new information B becomes available by multiplying it by the ratio P(B|A)/P(B): the likelihood of seeing that data given your hypothesis, divided by the overall probability P(B) of the observation.
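A small sketch of this updating step for a single yes/no hypothesis is given below. The function name `bayes_update` and the probabilities used are hypothetical, and the second update is only valid on the assumption that the two observations are independent given the hypothesis.

```python
def bayes_update(prior, p_data_if_true, p_data_if_false):
    """Return P(A|B) from P(A), P(B|A) and P(B|not A)."""
    # P(B) by the law of total probability.
    p_data = prior * p_data_if_true + (1.0 - prior) * p_data_if_false
    return prior * p_data_if_true / p_data


# Illustrative numbers: start at 50%, then see two pieces of evidence that are
# each twice as likely if the hypothesis is true (0.8) as if it is false (0.4).
p0 = 0.5
p1 = bayes_update(p0, 0.8, 0.4)   # posterior after the first observation
p2 = bayes_update(p1, 0.8, 0.4)   # yesterday's posterior is today's prior

print(p0, p1, p2)
```

Here the probability moves from 0.5 to about 0.67 and then to 0.8, each posterior serving as the prior for the next update.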
This equation is easily extended to refer to probability distributions rather than probabilities. For this to be computationally convenient, we need to assume a form of these distributions. Within financial forecasting circles, the assumed distribution form will almost always be multivariate normal.
When we multiply a multivariate normal distribution (representing our prior views) by another (representing our likelihood), the result (our posterior probability distribution) is itself a multivariate normal. Hence it can be used in its turn as a new prior to which you add further information. This is called a conjugate prior form.
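The conjugate property can be sketched numerically using the standard result that the product of two normal densities in the same variable is proportional to a normal whose precision (inverse covariance) is the sum of the two precisions. The helper `combine_normals` and all of the means and covariances below are assumptions chosen for illustration.

```python
import numpy as np


def combine_normals(mean_a, cov_a, mean_b, cov_b):
    """Mean and covariance of the normal proportional to N(a) x N(b)."""
    prec_a = np.linalg.inv(cov_a)
    prec_b = np.linalg.inv(cov_b)
    cov = np.linalg.inv(prec_a + prec_b)              # precisions add
    mean = cov @ (prec_a @ mean_a + prec_b @ mean_b)  # precision-weighted mean
    return mean, cov


# Illustrative prior and two further pieces of information on two assets.
prior = (np.array([0.05, 0.03]), np.array([[0.04, 0.01], [0.01, 0.02]]))
info_a = (np.array([0.08, 0.02]), np.array([[0.02, 0.00], [0.00, 0.05]]))
info_b = (np.array([0.02, 0.06]), np.array([[0.06, 0.02], [0.02, 0.03]]))

# The posterior from the first update is reused as the prior for the second.
mean_ab, cov_ab = combine_normals(*combine_normals(*prior, *info_a), *info_b)
mean_ba, cov_ba = combine_normals(*combine_normals(*prior, *info_b), *info_a)

print(mean_ab, mean_ba)
```

When the inputs are treated as independent, the order in which they are added makes no difference to the result, which is what makes the cumulative process so convenient.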
In principle, we can carry on in this manner indefinitely; however, there is a built-in assumption in so doing that we must recognize, and that is that at each stage the new
110 Forecasting Expected Returns in the Financial Markets
information distribution is independent of all of the previous inputs. An example might clarify this idea of independence.
Suppose you have entered a forecast return distribution for a company that was based at least in part on an assumption that its sector is going into recession with poor sales prospects. Now you want to update this by adding some company-specific information: they have just appointed a well-respected MD and you believe this will affect their prospects relative to the sector. Is this independent information or not?
If the new appointee is well respected because he is an excellent salesman rather than good at controlling costs, this new information is NOT independent. The extent to which you expect his skills to improve the company’s performance depends on whether the company finds itself in an environment where sales growth or cost-cutting are the key skill areas. When new information is not independent, you can still add it. However, this needs to be done in such a way that the related items of information are simultaneously input in the form of a joint probability distribution. This allows you to include the correlation between the management change and the expected macroenvironment.
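The cost of ignoring such dependence can be sketched with a deliberately simple case: two noisy signals about the same return whose errors are 0.9 correlated. The signal values, variances and the helper `posterior` are hypothetical; the point is only to compare a joint input using the full error covariance with the same signals entered as if they were independent.

```python
import numpy as np


def posterior(prior_mean, prior_var, signals, noise_cov):
    """Normal posterior for one return observed through several noisy signals."""
    ones = np.ones(len(signals))
    info = np.linalg.solve(noise_cov, ones)       # V^{-1} 1
    precision = 1.0 / prior_var + ones @ info
    mean = (prior_mean / prior_var + info @ signals) / precision
    return mean, 1.0 / precision


signals = np.array([0.10, 0.10])     # two views, both saying +10%
v, rho = 0.04, 0.9                   # signal error variance and correlation

independent = v * np.eye(2)                        # correlation ignored
joint = v * np.array([[1.0, rho], [rho, 1.0]])     # correlation included

mean_naive, var_naive = posterior(0.0, 0.04, signals, independent)
mean_joint, var_joint = posterior(0.0, 0.04, signals, joint)

print("independent:", mean_naive, var_naive)
print("joint      :", mean_joint, var_joint)
```

In this example, entering the correlated signals as if they were independent pulls the forecast further from the prior and reports a tighter posterior variance than the joint treatment supports.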
In principle, you could assume that all your input was correlated with all of the other inputs. If so, you would then need to estimate all of those correlations. If you follow this route, you run the risk of introducing new potential sources of noise in your calculation.
Doing this can make it increasingly difficult for you to maintain a physical insight into the estimation process. This will have been replaced with a dependence on the accuracy and stability of these intermediate correlation calculations. It can now be seen that this is essentially the same problem as identified above for the Grinold and Kahn equation.
Of course, these issues are never black and white. If the assumption of independence is a sufficiently good approximation, then it allows you to keep a simpler, more understandable model. If it is not a good approximation, then there should be a strong relationship that can be reliably captured in a correlation coefficient. In practice, this is the heart of the model-building process. Models are approximations of reality. Model-building is the process of deciding when these approximations are sufficiently close, and when you need to elaborate the model.
5.4.3 Relative value, factor and scenario information
Frequently when constructing a forecast, the insight that is available will not be an expectation on absolute return, but one of likely performance of one asset relative to another with an estimated uncertainty around that value. At any point in time there may be many such views on likely relative performance that we wish to include in our analysis.
The Bayesian updating framework that we have considered in the last section can be conveniently extended to cater for this problem. This is then known as the mixed estimation approach to establishing a relative value forecast, as popularized by Black and Litterman (BL). This takes a prior multivariate normal distribution for the asset returns r (in the BL formulation this prior is derived from a set of CAPM equilibrium assumptions) with a mean of $\mu$ and covariance S, i.e.
$$\mathrm{pdf}_1(r) = c \exp\left(-0.5\,(r - \mu)' S^{-1} (r - \mu)\right) \qquad (5.9)$$
It then updates it with a second multivariate normal
$$\mathrm{pdf}_2(x) = c \exp\left(-0.5\,(x - g)' V^{-1} (x - g)\right) \qquad (5.10)$$
where x is the relative return (constructed by calculating the return on an arbitrary number of linear combinations of return on each asset) with a mean of g and covariance of V (this can be thought of as the mean and covariance of return on a set of hypothetical portfolios). That is,
x = Pr (5.11)
Substituting (5.11) into (5.10) gives:
$$\mathrm{pdf}_2(r) \propto \exp\left(-0.5\,(Pr - g)' V^{-1} (Pr - g)\right) \qquad (5.12)$$
Remembering that $e^a \times e^b = e^{a+b}$, the product of the two multivariate normals in equations (5.9) and (5.12) can be written as:
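$$\mathrm{pdf}_1(r) \times \mathrm{pdf}_2(r) \propto \exp\left(-0.5\,(r - \mu)' S^{-1} (r - \mu) - 0.5\,(Pr - g)' V^{-1} (Pr - g)\right) \qquad (5.13)$$
In order to return this to standard multivariate normal form (and hence achieve conjugate prior form), we need to expand the quadratic elements of this equation and collect terms.
This results in the following equations for the updated mean $\hat{\mu}$ and covariance $\hat{S}$:
$$\hat{\mu} = \left(S^{-1} + P' V^{-1} P\right)^{-1} \left(S^{-1}\mu + P' V^{-1} g\right)$$
$$\hat{S} = \left(S^{-1} + P' V^{-1} P\right)^{-1}$$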
(The details of this expansion and collection of terms are given in Satchell and Scowcroft (2000) and reproduced in this book.)
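The update equations for the mean and covariance can be sketched directly in code. The three-asset prior, the single relative view and the function name `mixed_estimation` are all assumptions made for illustration; `mu` stands for the prior mean, and the second, "gain" form of the mean is a standard algebraically equivalent rearrangement used here only as a cross-check.

```python
import numpy as np


def mixed_estimation(mu, S, P, g, V):
    """Updated mean and covariance from prior N(mu, S) and views P r ~ N(g, V)."""
    S_inv = np.linalg.inv(S)
    V_inv = np.linalg.inv(V)
    post_cov = np.linalg.inv(S_inv + P.T @ V_inv @ P)
    post_mean = post_cov @ (S_inv @ mu + P.T @ V_inv @ g)
    return post_mean, post_cov


# Illustrative prior on three assets (numbers are invented).
mu = np.array([0.05, 0.05, 0.04])
S = np.array([[0.0400, 0.0180, 0.0100],
              [0.0180, 0.0900, 0.0150],
              [0.0100, 0.0150, 0.0225]])

# One relative view: asset 0 outperforms asset 1 by 2%, held with variance V.
P = np.array([[1.0, -1.0, 0.0]])
g = np.array([0.02])
V = np.array([[0.0004]])

post_mean, post_cov = mixed_estimation(mu, S, P, g, V)

# Cross-check against the equivalent form mu + S P' (P S P' + V)^{-1} (g - P mu).
check = mu + S @ P.T @ np.linalg.solve(P @ S @ P.T + V, g - P @ mu)

print("updated mean:", post_mean)
print("implied view:", P @ post_mean)
```

The implied relative return after the update lies between the prior value (zero here) and the view of 2%, and, because the prior covariance links the assets, the third asset's forecast can move even though no view was entered on it.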
While this process uses a Bayesian idea of taking a prior and updating it with a likelihood function, it is important to recognize that this likelihood specified by the model builder is the likelihood of these relative values being true conditional on your other relative views also being satisfied. In the general case, this condition is likely to be satisfied at the mean of the distribution but not elsewhere. Hence, the covariance matrix resulting from this analysis should not be interpreted as representing the real expected covariance of the likely asset return. Also, for the estimation of the mean, the result can sometimes be unstable to small changes in the input parameters.
These effects are best understood by taking a stylized example of perfect confidence in our relative value terms. If we do this, we can see that the effect of the prior distribution becomes negligible, and the updating terms become in effect a set of simultaneous linear equations. Hence we should expect that under some circumstances we will suffer from ill-conditioning in this analysis.
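This stylized limit can be sketched numerically. The example below uses two nearly parallel relative views held with almost perfect confidence; the prior, the views and the helper `updated_mean` are hypothetical, and the mean is computed in the algebraically equivalent form mu + S P' (P S P' + V)^{-1} (g - P mu), which remains usable as V approaches zero.

```python
import numpy as np


def updated_mean(mu, S, P, g, V):
    """Posterior mean in 'gain' form: mu + S P' (P S P' + V)^{-1} (g - P mu)."""
    gain = S @ P.T @ np.linalg.inv(P @ S @ P.T + V)
    return mu + gain @ (g - P @ mu)


# Illustrative prior (assumed numbers).
mu = np.array([0.05, 0.05, 0.04])
S = 0.04 * np.eye(3)

# Two almost identical relative views on assets 0 and 1.
P = np.array([[1.0, -1.00, 0.0],
              [1.0, -1.01, 0.0]])
V = 1e-10 * np.eye(2)            # near-perfect confidence in both views

g_a = np.array([0.020, 0.020])
g_b = np.array([0.020, 0.021])   # second view changed by only 0.1%

m_a = updated_mean(mu, S, P, g_a, V)
m_b = updated_mean(mu, S, P, g_b, V)

print("views satisfied:", P @ m_a, P @ m_b)
print("forecast a:", m_a)
print("forecast b:", m_b)
print("condition number:", np.linalg.cond(P @ S @ P.T))
```

With near-perfect confidence the views are satisfied almost exactly, as a set of simultaneous equations would be, and because the two equations are nearly parallel a 0.1% change in one view moves the forecast for asset 1 by roughly 10%.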
The strengths of this approach are its practical operational benefits. Taking the above comparison with simultaneous linear equations, we can now specify a degree of confidence in each equation and find the best fit, both given the different confidence value, and also biased by the prior toward the most likely solution.
This has a number of practical consequences:
1. We will be able to find a solution and a confidence in that solution.
2. Inconsistent inputs will be reconciled:
• if they have equal confidence, the result will be the average of the two inputs taken independently
• if one is high confidence, it will dominate the low-confidence input
3. If a forecast is missing from the update set of ‘fuzzy’ simultaneous equations, then this value will be filled in from the prior.
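These three consequences can be checked with a small sketch. It assumes two uncorrelated assets, a deliberately diffuse prior on the first, and two conflicting views (2% and 6%) on that same asset; all names and numbers are illustrative rather than taken from any real model.

```python
import numpy as np


def mixed_estimation(mu, S, P, g, V):
    """Updated mean and covariance from prior N(mu, S) and views P r ~ N(g, V)."""
    S_inv, V_inv = np.linalg.inv(S), np.linalg.inv(V)
    post_cov = np.linalg.inv(S_inv + P.T @ V_inv @ P)
    return post_cov @ (S_inv @ mu + P.T @ V_inv @ g), post_cov


mu = np.array([0.00, 0.03])
S = np.diag([1.0, 0.04])          # very diffuse prior on asset 0

# Two inconsistent views on asset 0; no view at all on asset 1.
P = np.array([[1.0, 0.0],
              [1.0, 0.0]])
g = np.array([0.02, 0.06])

equal_conf = np.diag([1e-4, 1e-4])   # both views held equally firmly
first_firm = np.diag([1e-8, 1e-4])   # first view held far more firmly

mean_equal, cov_equal = mixed_estimation(mu, S, P, g, equal_conf)
mean_firm, cov_firm = mixed_estimation(mu, S, P, g, first_firm)

print("equal confidence:", mean_equal)
print("first view firm :", mean_firm)
```

With equal confidence the forecast for the first asset settles close to 4%, the average of the two views; when the first view is held much more firmly it settles close to 2%; and in both cases the second asset, for which no view was entered, keeps its prior value of 3%.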
Hence this is a very convenient way of entering multiple relative value views; however, the same basic analysis can also be used to enter factor views. The key to this is to observe that in equation (5.12) above, x can be allowed to represent the asset return (rather than the relative return) and r to represent the factor return (rather than the asset return). Because of the conjugate prior structure of these equations, we can therefore merge factor views with our asset and relative value views by repeatedly using the same basic equation.
The final type of insight that we wish to incorporate into our mixture of normals approach is views on scenarios and scenario probability. In these models, a series of discrete states are postulated to represent the expected distribution of return. As with the other forecasting techniques, the independence or otherwise of these scenarios needs to be carefully considered and a sufficient number generated to allow any optimization process to identify a realistic risk-return tradeoff. Typically, this requires many more scenarios than there are assets.
As the number of such individual scenarios increases, the task of generating them becomes more tedious. Hence our formulation of stochastic scenarios, where we assume states of the world each with its own probability distribution in multivariate normal form from which a set of individual scenarios can be drawn – i.e. in our formulation, scenarios are the sum of a number of normal distributions with different probability.
In this way we have arrived at the general form of our mixture of normals approach where the resulting non-normal distribution can be represented by a Monte Carlo distribution. When we do this, it is then trivial to see that we can apply arbitrary payoff functions to this distribution to capture the effects of optionality, convertible bonds or structured products. The problem then is to optimize, given this Monte Carlo set, as discussed in the next section.
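A minimal sketch of such a Monte Carlo set is given below, assuming two hypothetical states of the world ("calm" and "stress") on two assets, each with its own probability, mean and covariance. The state parameters, the helper `sample_mixture` and the floored payoff on the first asset are all invented for illustration.

```python
import numpy as np


def sample_mixture(states, n, rng):
    """Draw n scenarios from a mixture of multivariate normals.

    `states` is a list of (probability, mean, covariance) tuples.
    """
    probs = np.array([p for p, _, _ in states])
    which = rng.choice(len(states), size=n, p=probs)   # state of each draw
    out = np.empty((n, len(states[0][1])))
    for k, (_, mean, cov) in enumerate(states):
        idx = which == k
        out[idx] = rng.multivariate_normal(mean, cov, size=int(idx.sum()))
    return out


rng = np.random.default_rng(42)

# Two hypothetical states of the world (all numbers are assumptions).
states = [
    (0.7, np.array([0.08, 0.05]), np.array([[0.02, 0.005], [0.005, 0.01]])),
    (0.3, np.array([-0.15, -0.05]), np.array([[0.09, 0.03], [0.03, 0.04]])),
]

scenarios = sample_mixture(states, 100_000, rng)

# An arbitrary payoff function: asset 0 with losses floored at -5%.
payoff = np.maximum(scenarios[:, 0], -0.05)

print("scenario means:", scenarios.mean(axis=0))
print("mean floored payoff:", payoff.mean())
```

The scenario set is no longer normal, since it inherits a fat left tail from the stress state, and the payoff is applied scenario by scenario without needing any closed-form distribution.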