8 Robust optimization for utilizing forecasted returns in institutional investment
8.2 Notions of robustness
There is a large literature on uncertainty and robust techniques. These occur in many fields, particularly mathematics, finance, economics and statistics, and with broad applications in investment, operations control and engineering. In this section we consider the application of robust concepts of interest to an institutional investor. Our principal interest here will be on forecasting returns, but, as will become clear to the reader, this will involve a wide range of robust concepts, all of which will have relevance to the forecasting problem.
It is apparent that robustness is a very broad subject and a complete survey would include thousands of references. We will concentrate on the areas we find relevant to forecasting returns and investment choices. We now present a list of the broad topics we shall consider.
Initially we shall review robust statistics and ranked forecasts, followed by a discussion of Bayesian adjustments, including shrinkage and Black–Litterman, as well as a discussion of risk aversion. Moving more to the optimization side of the problem, we look at the role of constraints, resampling as well as utility bounds. We end with a discussion of the mathematics of robust optimization, including second-order cone programming.
Our first notion of robustness is the theory of robust statistics. There are many different variants of this, and it is a large area in its own right. We will consider general robust estimators, and then concentrate on the work of Victoria-Feser. Huber (1964) and Hampel (1974) both consider issues of robust estimation, and the effect of data errors on estimators. In fact, many robust estimators or robust versions of estimators have been developed to deal with data errors (for example, Least Trimmed Squares (Rousseeuw, 1985), Iteratively Reweighted Least-Squares (Holland and Welsch, 1977), Generalized Method of Moments estimation (Ronchetti and Trojani, 2001), M-estimators (Hampel et al., 1986), S-estimators (Rousseeuw and Yohai, 1984), or τ-estimators (Yohai and Zamar, 1988)).
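To make the idea concrete, the following is a minimal sketch of a robust location estimate: a Huber-type M-estimator of the mean return, computed by iteratively reweighted averaging. The tuning constant and the illustrative return series are our own assumptions and are not taken from the papers cited above.

```python
import numpy as np

def huber_mean(x, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimator of location via iteratively reweighted averaging.

    Observations more than c (robust) standard deviations from the current
    estimate are down-weighted, so a few extreme returns cannot dominate.
    """
    x = np.asarray(x, dtype=float)
    mu = np.median(x)                               # robust starting point
    scale = 1.4826 * np.median(np.abs(x - mu))      # MAD estimate of scale
    for _ in range(max_iter):
        r = (x - mu) / scale                        # standardized residuals
        w = np.ones_like(r)
        big = np.abs(r) > c
        w[big] = c / np.abs(r[big])                 # Huber down-weighting
        mu_new = np.sum(w * x) / np.sum(w)
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu

# Illustrative daily returns with one crash-like observation
returns = np.array([0.004, -0.002, 0.001, 0.003, -0.001, 0.002, -0.228])
print("sample mean:", returns.mean())          # dragged down by the outlier
print("Huber mean :", huber_mean(returns))     # close to the bulk of the data
```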
Victoria-Feser considers robust statistics in the framework of expected returns and robust portfolio selection. Victoria-Feser (2000) considers what happens to conventional statistical techniques if the distribution of returns is contaminated by mixing it with some other distribution, typically a one-off extreme shock. So, for example, we may wish to examine what happens to return-generating models by regarding the huge price movement of October 1987 as such a contamination. Perret-Gentil and Victoria-Feser (2003) expand on the above, and show that the use of robust estimates in portfolio selection problems results in more stable portfolios, mitigating estimation error and, more importantly, model uncertainty.
An alternative notion of robustness is based on using weaker information on the return forecasts. Thus we might consider only looking at rankings of stocks rather than actual point estimates of returns. This can be thought of as expressing robustness through incomplete information. The basic idea is that we would regard conventional return forecasts as so unreliable that we might only be willing to produce rankings.
This is indeed what many funds do. Again, this in itself would not exclude optimization. Indeed, risk could be measured conventionally. We would then need to combine ranked returns with conventional risk. This might involve adjusting the coefficient of risk aversion or mapping the ranked forecasts back to returns (see Satchell et al., 2003; Satchell and Wright, 2005).
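As a simple illustration of how a ranking might be turned into something an optimizer can consume, the sketch below converts a cross-sectional ranking into normal-score 'forecasts'. This is only one of several possible mappings and is not the specific procedure of Satchell et al. (2003) or Satchell and Wright (2005); the scaling constant is an assumption.

```python
import numpy as np
from scipy import stats

def rank_to_scores(ranks, spread=0.02):
    """Map a cross-sectional ranking (1 = most attractive) to synthetic forecasts.

    Ranks are converted to uniform quantiles and then to standard normal
    scores, scaled by an assumed cross-sectional spread. Only the ordering of
    the input matters; the levels are entirely synthetic.
    """
    ranks = np.asarray(ranks, dtype=float)
    n = len(ranks)
    quantiles = (n + 1 - ranks) / (n + 1)        # best rank -> highest quantile
    return spread * stats.norm.ppf(quantiles)    # symmetric scores around zero

print(rank_to_scores(np.array([1, 2, 3, 4, 5])).round(4))
```

These synthetic forecasts can then be combined with a conventional risk model in a standard mean-variance optimization.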
The approach outlined above has been criticized by Almgren and Chriss (2004, 2006) in an influential paper that is republished in this book (see Chapter 4). They prefer an approach based on cones: the set of expected-return vectors in n-dimensional space consistent with a given ranking forms a cone, within which the true expected returns are believed to lie. A probability measure P is assigned that defines the relative probability of the true return vector lying at any location within the cone. These assumptions are linked to a constraint set within which the optimal portfolio should lie. For a single expected-return vector, as in the standard Markowitz case, one portfolio is preferred to another if it has a higher expected return. Given the multiple feasible expected-return vectors in the cone, the authors deem portfolio A superior to portfolio B when the region of the cone in which A's expected return exceeds B's has greater probability, under the measure P above, than the region in which B's exceeds A's.
Note that the authors find that the centroid, the n-dimensional geometric centre of mass of the consistent set under P, captures all the information in this probability measure that is needed to rank portfolios. The process of finding an optimal portfolio is therefore reduced to a manipulation of this centroid. The intuition is to use minimal information on alpha to obtain a large feasible set of portfolios that might be dominating. This procedure is similar to some sophisticated manager-performance systems that compare the manager's portfolio weights with all randomly generated weights that satisfy the same portfolio constraints. The comparison is then between the manager and the average portfolio, that is, the centroid.
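Under a Gaussian reference measure (an assumption on our part; Almgren and Chriss discuss the construction in detail in Chapter 4), the centroid for a complete ranking can be approximated by a simple Monte Carlo calculation: draw standard normal vectors, sort each draw, average, and assign the averaged order statistics to assets according to their ranks.

```python
import numpy as np

def centroid_from_ranking(ranks, n_draws=100_000, seed=0):
    """Monte Carlo approximation of the centroid vector for a full ranking.

    Draw standard normal vectors, sort each draw and average: the result is a
    vector of expected order statistics, which is then assigned to assets
    according to the supplied ranking (1 = highest expected return).
    """
    rng = np.random.default_rng(seed)
    n = len(ranks)
    draws = rng.standard_normal((n_draws, n))
    sorted_desc = np.sort(draws, axis=1)[:, ::-1]     # descending within each draw
    expected_order_stats = sorted_desc.mean(axis=0)   # largest first
    centroid = np.empty(n)
    for asset, rank in enumerate(ranks):
        centroid[asset] = expected_order_stats[rank - 1]
    return centroid

print(centroid_from_ranking([1, 2, 3, 4, 5]).round(3))   # monotone, centred on zero
```

The centroid vector can then stand in for the expected-return forecast in an otherwise standard optimization.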
There is a broad range of Bayesian adjustments, both by practitioners and in the academic literature. James and Stein (1961), Jobson et al. (1979) and Jorion (1986) have produced biased shrinkage estimators of the mean. Note that these Bayesian adjustments are not necessarily restricted to forecasted returns, but can be applied to the other inputs to an optimizer as well. The idea is exactly the same: sample information is shrunk towards model-based ex ante information, again to deal with estimation uncertainty. For example, Ledoit and Wolf (2003) suggest shrinking the sample covariance matrix towards a single-index covariance matrix, and Ledoit and Wolf (2004) shrink it towards a constant correlation matrix to control for estimation error.
The optimal shrinkage intensity is derived by minimizing a loss function based on the Frobenius norm of the matrix; see their paper for details.
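A minimal sketch of shrinkage towards a constant-correlation target is given below. For brevity the shrinkage intensity is treated as a user-supplied input rather than the optimal Frobenius-norm intensity derived by Ledoit and Wolf; the simulated return history is purely illustrative.

```python
import numpy as np

def shrink_to_constant_correlation(returns, delta=0.5):
    """Shrink the sample covariance matrix towards a constant-correlation target.

    returns : T x N array of asset returns
    delta   : shrinkage intensity in [0, 1]; 0 = pure sample, 1 = pure target
              (Ledoit and Wolf derive an optimal delta; here it is an input).
    """
    sample = np.cov(returns, rowvar=False)
    std = np.sqrt(np.diag(sample))
    corr = sample / np.outer(std, std)
    n = corr.shape[0]
    r_bar = (corr.sum() - n) / (n * (n - 1))     # average off-diagonal correlation
    target = r_bar * np.outer(std, std)          # constant-correlation target
    np.fill_diagonal(target, std ** 2)           # keep the sample variances
    return delta * target + (1 - delta) * sample

rng = np.random.default_rng(1)
history = 0.02 * rng.standard_normal((60, 5))    # illustrative return history
print(shrink_to_constant_correlation(history, delta=0.3).round(6))
```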
With respect to improving forecasted returns, Black and Litterman (1990, 1992) proposed an intuitive way of combining model or equilibrium views with the views of the investor. Their suggestion was to start from equilibrium returns when optimizing (or from implied expected equilibrium returns as extracted by a reverse optimization method), and to tilt towards the investor's views according to the degree of confidence the investor has in these views. The result should be a more stable portfolio. While the original paper was on asset allocation, the same methodology can be extended to an equities factor model. Lee (2000) suggests that the Black–Litterman model 'largely mitigates' the error-maximization issue of optimizers by spreading the errors across the vector of expected returns. Kandel and Stambaugh (1996), Pastor and Stambaugh (2000), Barberis (2000) and Pastor (2000) consider similar models of Bayesian updating. A number of chapters in this book discuss Black and Litterman's influential model.
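The sketch below illustrates the standard Black–Litterman combination of implied equilibrium returns with a single investor view. The risk-aversion coefficient, tau, the view and its confidence are illustrative assumptions.

```python
import numpy as np

# Illustrative two-asset inputs
Sigma = np.array([[0.040, 0.006],
                  [0.006, 0.010]])             # asset covariance matrix
w_mkt = np.array([0.6, 0.4])                   # market-capitalization weights
risk_aversion = 2.5
tau = 0.05                                     # scaling of prior uncertainty

# Reverse optimization: implied equilibrium returns
pi = risk_aversion * Sigma @ w_mkt

# One view: asset 1 outperforms asset 2 by 2%, with view variance Omega
P = np.array([[1.0, -1.0]])
Q = np.array([0.02])
Omega = np.array([[0.001]])

# Black-Litterman posterior expected returns
prior_prec = np.linalg.inv(tau * Sigma)                 # precision of the prior
view_prec = P.T @ np.linalg.inv(Omega) @ P              # precision added by the view
mu_bl = np.linalg.inv(prior_prec + view_prec) @ (prior_prec @ pi
                                                 + P.T @ np.linalg.inv(Omega) @ Q)
print("equilibrium returns:", pi.round(4))
print("posterior returns  :", mu_bl.round(4))
```

The greater the confidence in the view (the smaller Omega), the further the posterior returns are tilted away from the equilibrium returns.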
Cavadini et al. (2001) concentrate on the importance of risk aversion in portfolio choice. Specifically, they consider what they call pseudo risk-aversion corrections to address both estimation risk and model risk at the same time. In effect, they penalize risky allocations that are particularly exposed to both types of risk. They find substantial reductions in their loss function as a result, and superior results to addressing either estimation risk or model risk separately.
Practitioners routinely use constraints in the optimization to control for estimation error. In fact, Jagannathan and Ma (2003) demonstrate that even imposing wrong constraints can help. They show that imposing a non-negativity constraint on a portfolio optimization problem is equivalent to shrinking extreme elements of the covariance matrix. The reduction in sampling error is often greater than the increase in specification error, resulting in less risky portfolios overall.
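The sketch below contrasts an unconstrained minimum-variance portfolio with a long-only one of the kind discussed by Jagannathan and Ma. The covariance matrix is an illustrative assumption and scipy's SLSQP solver is used purely for convenience.

```python
import numpy as np
from scipy.optimize import minimize

Sigma = np.array([[0.040, 0.038, 0.010],
                  [0.038, 0.045, 0.012],
                  [0.010, 0.012, 0.020]])      # illustrative covariance matrix
n = Sigma.shape[0]

def variance(w):
    return w @ Sigma @ w

budget = {"type": "eq", "fun": lambda w: w.sum() - 1.0}
w0 = np.full(n, 1.0 / n)

# Unconstrained minimum variance: weights may go short
unconstrained = minimize(variance, w0, constraints=[budget], method="SLSQP")

# Long-only: the non-negativity constraint acts like shrinking extreme covariances
long_only = minimize(variance, w0, constraints=[budget],
                     bounds=[(0.0, None)] * n, method="SLSQP")

print("unconstrained weights:", unconstrained.x.round(3))
print("long-only weights    :", long_only.x.round(3))
```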
Michaud (1989, 1998) proposed a re-sampled frontier of optimal portfolios to deal with ambiguity aversion. He assumes that the true population is multivariate normal, where the mean and covariance matrix are given by a maximum likelihood estimator on historical data. Based on simulation over the uncertain parameters, repeated mean-variance optimizations and an averaging of the optimal portfolios result in a more stable and robust solution, addressing the 'error-maximization' issue.1 One can interpret this as a resolution of ambiguity aversion by treating each Monte Carlo resolution as a prior, and each prior as equally likely. It is not clear that a simple average of the optimal portfolios, as proposed by Michaud, captures the distributional properties of the underlying uncertainty, nor that it is consistent with a maximum expected utility framework. Harvey et al. (2004) concentrate on the importance of higher moments and parameter uncertainty, pointing to a Jensen's inequality problem with Michaud's re-sampling. This would imply that the re-sampled portfolio converges in probability to the population expected value of the optimized portfolio, which will typically be biased in finite samples. For our purposes, it is not clear that, say, bequeathing some combined prior distribution to the re-sampled portfolios would lead to any theoretical or practical advantage over the robust techniques. Also, while intuitive in its conception, the computational burden of re-sampling might be a negative consideration for the practical investor.

1 The error-maximization issue refers to the estimation-error insensitivity of mean-variance optimizations, whereby, for example, excessive weights can be placed on unrealistically high estimates. Jobson and Korkie (1980) were the first to point to this bias.
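A stylized version of the re-sampling procedure is sketched below: simulate return histories from the estimated parameters, re-estimate, re-optimize, and average the resulting weights. The simple unconstrained mean-variance step and all numerical inputs are illustrative assumptions rather than Michaud's own implementation.

```python
import numpy as np

def mv_weights(mu, Sigma, risk_aversion=3.0):
    """Unconstrained mean-variance weights: w = Sigma^{-1} mu / risk_aversion."""
    return np.linalg.solve(Sigma, mu) / risk_aversion

def resampled_weights(mu_hat, Sigma_hat, n_obs=60, n_resample=500, seed=0):
    """Average the optimal weights across re-estimated, re-optimized samples."""
    rng = np.random.default_rng(seed)
    weights = []
    for _ in range(n_resample):
        sample = rng.multivariate_normal(mu_hat, Sigma_hat, size=n_obs)
        mu_s = sample.mean(axis=0)                  # re-estimated means
        Sigma_s = np.cov(sample, rowvar=False)      # re-estimated covariance
        weights.append(mv_weights(mu_s, Sigma_s))
    return np.mean(weights, axis=0)

mu_hat = np.array([0.05, 0.06, 0.04])
Sigma_hat = np.array([[0.040, 0.010, 0.008],
                      [0.010, 0.050, 0.012],
                      [0.008, 0.012, 0.030]])
print("point-estimate weights:", mv_weights(mu_hat, Sigma_hat).round(3))
print("re-sampled weights    :", resampled_weights(mu_hat, Sigma_hat).round(3))
```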
Another robust procedure is to use techniques that give you lower bounds on your expected utility. Clearly, if this lower bound is valid for all scenarios, it will encompass the analysis that we discuss later. Such an approach is discussed in Anderson et al. (2000). More precisely, they use an exponential inequality that bounds the tail probability of final wealth (a Markov inequality). This procedure has been applied independently, to build portfolios with robust risk characteristics, by Chu (2004).
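The exponential form of the Markov inequality involved can be written as follows; here $W$ denotes final wealth, $w$ a wealth threshold and $\theta > 0$ a free parameter, and the display is a generic illustration of the device rather than the specific bound derived by Anderson et al. (2000):
$$
P(W \le w) \;=\; P\!\left(e^{-\theta W} \ge e^{-\theta w}\right) \;\le\; e^{\theta w}\,\mathrm{E}\!\left[e^{-\theta W}\right], \qquad \theta > 0.
$$
Minimizing the right-hand side over $\theta$, and over the portfolio weights through the distribution of $W$, gives the tightest such bound on the probability of poor outcomes.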
So far we have considered the role of uncertainty and uncertainty aversion when modelling preferences and resolutions based on treating the inputs or the outputs of a standard quadratic optimizer. However, there have recently been improvements in the mathematical and optimization literature itself.
Quadratic programming, while consistent with the Markowitz risk-return framework, can be restrictive with regard to real-life problems, as it can only handle linear constraints. Non-linear optimization can only deal with a limited problem size, because the time taken to optimize increases exponentially with the number of assets. It can also lead to local as opposed to global optima. Rockafellar (1993) famously notes that it is in fact not the non-linearity per se that slows a solution down, but rather the non-convexity of the objective function or the constraints. This observation sparked a series of advances. Nesterov and Nemirovsky (1994) developed an interior point method for non-linear convex optimization problems. This allows an investor to define a more general class of problems, with any number of quadratic constraints, which can still be solved efficiently (see, for example, Ben-Tal and Nemirovski (1998), or Lobo et al. (1997) for an implementation, including a link to sample code). In fact, primal-dual interior-point or barrier methods have been developed for various classes of problems. Other advances in optimization methodologies include integrating the maximin expected utility theory of Gilboa and Schmeidler (1989) mathematically (as opposed to via the objective function) into robust optimization methods. Lobo (2000) and Cornuejols and Tütüncü (2006) provide excellent reviews of the mathematical advances in robust optimization, with applications in finance.
The above discussion has wandered a little from our objectives of understanding how forecast error and optimization might interface. For our purposes, uncertainty about the return forecasts can be explicitly treated via more complex objective formulations. For example, extra quadratic constraints can be defined with regard to the first and second moments of the historic forecast error for each asset. Using the interior point methods above, with the extra quadratic constraints, would control for assets where forecasts have historically been wrong or volatile.
As expected, there have been numerous applications of the above optimization advances in the financial literature. For example, Goldfarb and Iyengar (2003) show that their problem formulation, which includes 'uncertainty structures' for dealing with uncertainty in the estimation of the model parameters, is in fact a conic optimization problem. Second-order cone programming (SOCP) handles a linear objective function subject to linear and second-order cone constraints.
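To connect this with the treatment of forecast error described above, the sketch below poses a robust mean-variance problem in which the expected-return vector is only known to lie in an ellipsoid around the point forecast, with the ellipsoid's shape given by an assumed forecast-error covariance; the worst case over that ellipsoid enters the objective as a second-order cone term. The cvxpy modelling package is used for convenience, and the formulation is a generic SOCP illustration rather than the specific model of Goldfarb and Iyengar (2003).

```python
import cvxpy as cp
import numpy as np

mu_hat = np.array([0.05, 0.06, 0.04])          # point forecasts of expected returns
Sigma = np.array([[0.040, 0.010, 0.008],
                  [0.010, 0.050, 0.012],
                  [0.008, 0.012, 0.030]])      # return covariance (illustrative)
E = np.diag([0.02, 0.03, 0.01])                # assumed forecast-error covariance
kappa = 1.5                                    # size of the uncertainty ellipsoid
gamma = 3.0                                    # risk-aversion coefficient
n = len(mu_hat)

w = cp.Variable(n)
E_half = np.linalg.cholesky(E)
# Worst-case expected return over the ellipsoid: mu_hat'w - kappa * ||E^{1/2} w||
worst_case_return = mu_hat @ w - kappa * cp.norm(E_half.T @ w, 2)
objective = cp.Maximize(worst_case_return - 0.5 * gamma * cp.quad_form(w, Sigma))
constraints = [cp.sum(w) == 1, w >= 0]         # fully invested, long only
cp.Problem(objective, constraints).solve()
print("robust weights:", np.round(w.value, 3))
```

Assets whose forecasts have historically been noisy (large entries in the assumed error covariance) are penalized more heavily and receive smaller weights.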
Costa and Paiva (2002) also consider a portfolio optimization problem under parameter uncertainty (both for the return forecasts and the covariance matrix). They show it is possible to consider classes of problems whose optimal solution stays unchanged as we vary other aspects of the problem. In fact, they show that their problem formulation can be reduced to a linear matrix inequality optimization problem. However, the linear matrix inequalities that the unknown parameters satisfy, and other aspects of the problem, do not seem to have any economic meaning; at least, none is provided by the authors.