9.4 FORECAST EVALUATION APPROACHES TO BACKTESTING
Box 9.4 A Risk-return Backtest
Beder et al. (1998, pp. 302–303) suggest a rather different backtest based on the realised risk–return ratio. Assume for the sake of argument that we have normally distributed P/L, with mean $\mu$ and standard deviation $\sigma$. Our VaR is then $-\mu - \alpha_{cl}\sigma$, and the ratio of expected P/L to VaR is $-\mu/(\mu + \alpha_{cl}\sigma) = -1/[1 + \alpha_{cl}\sigma/\mu]$. If the model is a good one and there are no untoward developments over the backtesting period, we should find that the ratio of actual P/L to VaR is equal to this value, give or take a (hopefully well-behaved) random error. We can test this prediction formally (e.g., using Monte Carlo simulation), or we can use the ratio of actual P/L to VaR as an informal indicator to check for problems with our risk model.
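The following sketch illustrates this check in Python under the normality assumption; the function names, parameter values, and the Monte Carlo design are illustrative rather than taken from Beder et al.:

```python
# A minimal sketch of the Box 9.4 risk-return backtest, assuming normal P/L.
# Names and the simulation design are illustrative, not from the text.
import numpy as np
from scipy.stats import norm

def predicted_pl_to_var_ratio(mu, sigma, cl=0.95):
    """Predicted ratio of expected P/L to VaR: -mu / (mu + alpha_cl * sigma)."""
    alpha_cl = norm.ppf(1.0 - cl)          # e.g. -1.645 at cl = 0.95
    var = -mu - alpha_cl * sigma           # VaR = -mu - alpha_cl * sigma
    return mu / var

def simulated_ratio_distribution(mu, sigma, n_obs, cl=0.95, n_trials=10_000):
    """Distribution of the realised mean-P/L-to-VaR ratio when the model is true."""
    rng = np.random.default_rng(1)
    alpha_cl = norm.ppf(1.0 - cl)
    var = -mu - alpha_cl * sigma
    pl = rng.normal(mu, sigma, size=(n_trials, n_obs))
    return pl.mean(axis=1) / var

# Compare the realised ratio to its simulated distribution under the null:
mu, sigma, n_obs = 0.5, 10.0, 250
ratios = simulated_ratio_distribution(mu, sigma, n_obs)
lo, hi = np.percentile(ratios, [2.5, 97.5])
print(f"predicted ratio {predicted_pl_to_var_ratio(mu, sigma):.4f}, "
      f"95% simulation band [{lo:.4f}, {hi:.4f}]")
```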
The fourth ingredient is a score function, which takes as its inputs our loss function and benchmark values. For example, if we take our benchmark to be the expected value of $C_t$ under the null hypothesis that the model is ‘good’, then we might use a quadratic probability score (QPS) function, given by:
$$\text{QPS} = \frac{2}{n}\sum_{t=1}^{n}(C_t - p)^2 \qquad (9.6)$$
(see Lopez (1999, p. 47)). The QPS takes a value in the range [0, 2], and the closer the QPS value is to zero, the better the model. We can therefore use the QPS (or some similar score function) to rank our models, with the better models having the lower scores.
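As an illustration, a minimal Python sketch of the QPS calculation might look as follows (the array names and sample values are hypothetical):

```python
# A minimal sketch of the QPS score in Equation (9.6).
import numpy as np

def qps(C, p):
    """Quadratic probability score: (2/n) * sum_t (C_t - p)^2, in [0, 2]."""
    C = np.asarray(C, dtype=float)
    return 2.0 * np.mean((C - p) ** 2)

# Rank two hypothetical models at cl = 0.95 (p = 0.05): lower QPS is better.
C_model_a = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])   # 1 = tail loss
C_model_b = np.array([1, 1, 0, 1, 0, 0, 1, 0, 0, 0])
print(qps(C_model_a, 0.05), qps(C_model_b, 0.05))
```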
The QPS criterion also has the attractive property that it (usually) encourages truth-telling by VaR modellers: if VaR modellers wish to minimise their QPS score, they will (usually) report their VaRs ‘truthfully’ (Lopez (1999, pp. 47–48)). This is a useful property in situations where the backtester and the VaR modeller are different, and where the backtester might be concerned that the VaR modeller could report false VaR forecasts to alter the results of the backtest.
9.4.2 The Frequency-of-tail-losses (Lopez I) Approach
To implement forecast evaluation, we need to specify the loss function, and a number of different loss functions have been proposed. Perhaps the most straightforward is the binomial loss function proposed by Lopez (1998, p. 121), which gives an observation a value of 1 if it involves a tail loss, and a value of 0 otherwise. Equation (9.5) therefore takes the form:
$$C_t = \begin{cases} 1 & \text{if } L_t > \text{VaR}_t \\ 0 & \text{if } L_t \le \text{VaR}_t \end{cases} \qquad (9.7)$$
This ‘Lopez I’ loss function is intended for the user who is (exclusively) concerned with the frequency of tail losses. The benchmark for this loss function is $p$, the expected value of $C_t$.15
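A minimal sketch of this loss function and its benchmark comparison, with illustrative data, might be:

```python
# A minimal sketch of the Lopez I loss (Equation (9.7)); names are illustrative.
import numpy as np

def lopez_i(losses, var):
    """C_t = 1 if L_t > VaR_t, else 0."""
    return (np.asarray(losses) > np.asarray(var)).astype(float)

# Benchmark: E[C_t] = p, so compare the sample mean of C_t to p.
losses = np.array([1.2, -0.4, 2.9, 0.3, 3.5, -1.1, 0.8, 2.2])
var = np.full_like(losses, 2.5)            # a constant VaR, for illustration
C = lopez_i(losses, var)
print(C.mean(), "vs benchmark p =", 0.05)
```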
9.4.3 The Size-adjusted Frequency (Lopez II) Approach
The Lopez I loss function ignores the magnitudes of tail losses. If we wish to remedy this defect, Lopez (1998, p. 122) himself suggests a second, size-adjusted, loss function:
$$C_t = \begin{cases} 1 + (L_t - \text{VaR}_t)^2 & \text{if } L_t > \text{VaR}_t \\ 0 & \text{if } L_t \le \text{VaR}_t \end{cases} \qquad (9.8)$$

This loss function allows for the sizes of tail losses in a way that Equation (9.7) does not: a model that generates higher tail losses would generate higher values of Equation (9.8) than one that generates lower tail losses, other things being equal. However, with this loss function, there is no longer a straightforward condition for the benchmark, so we need to estimate the benchmark by some other means (e.g., Monte Carlo simulation).16
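One possible sketch of this loss function, together with a Monte Carlo estimate of the benchmark under an assumed normal null, is given below; the simulation design is illustrative rather than Lopez's own:

```python
# A minimal sketch of the Lopez II loss (Equation (9.8)) with a Monte Carlo
# benchmark under an assumed normal null; names are illustrative.
import numpy as np
from scipy.stats import norm

def lopez_ii(losses, var):
    """C_t = 1 + (L_t - VaR_t)^2 if L_t > VaR_t, else 0."""
    losses, var = np.asarray(losses, float), np.asarray(var, float)
    return np.where(losses > var, 1.0 + (losses - var) ** 2, 0.0)

def mc_benchmark(mu, sigma, n_obs, cl=0.95, n_trials=10_000):
    """Average Lopez II score per observation when the (normal) null is true."""
    rng = np.random.default_rng(1)
    var = -mu - norm.ppf(1.0 - cl) * sigma
    losses = -rng.normal(mu, sigma, size=(n_trials, n_obs))   # losses = -P/L
    return lopez_ii(losses, var).mean()

print(mc_benchmark(mu=0.0, sigma=1.0, n_obs=250))
```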
15 Although the Lopez procedures are not formal statistical tests, Haas (2001, p. 5) observes that they can be converted into such tests by simulating a large number of P/L series, calculating the $C$-value for each, and deriving the critical $C$-value that corresponds to a chosen confidence level. We then carry out our tests by comparing our actual $C$-values to these critical $C$-values. This is an interesting suggestion that is worth exploring further.
16 One way to do so is suggested by Lopez (1998, pp. 123–124). He suggests that we assume the observed returns are independent and identically distributed (iid); we can then use this assumption to derive an empirical loss function and a value of the final score; if we repeat the operation a large number of times, we can use the average final score as our estimate of the benchmark. However, if the VaR model is parametric, we can also use simpler and more direct approaches to estimate the benchmark: we simulate P/L data under the null hypothesis using Monte Carlo methods, and take the benchmark to be the average of our final scores.
9.4.4 The Blanco-Ihle Approach
However, the size-adjusted loss function (Equation (9.8)) has the drawback that it loses some of its intuition, because squared monetary returns have no ready monetary interpretation. Accordingly, Blanco and Ihle (1998) suggest a different size-loss function:
$$C_t = \begin{cases} (L_t - \text{VaR}_t)/\text{VaR}_t & \text{if } L_t > \text{VaR}_t \\ 0 & \text{if } L_t \le \text{VaR}_t \end{cases} \qquad (9.9)$$
This loss function gives each tail-loss observation a weight equal to the tail loss divided by the VaR. This has a nice intuition, and ensures that higher tail losses are awarded higher $C_t$ values without the impaired intuition introduced by squaring the tail loss.
The benchmark for this forecast evaluation procedure is also easy to derive: the benchmark is the expected value of the difference between the tail loss and the VaR, divided by the VaR itself, and this is equal to the difference between the ETL and the VaR, divided by the VaR.17
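A sketch of this loss function and its benchmark under a normal null might look as follows; the closed-form normal ETL used here is a standard result, and all names are illustrative:

```python
# A minimal sketch of the Blanco-Ihle loss (Equation (9.9)) and its benchmark
# (ETL - VaR)/VaR, evaluated here for normal P/L; names are illustrative.
import numpy as np
from scipy.stats import norm

def blanco_ihle(losses, var):
    """C_t = (L_t - VaR_t)/VaR_t if L_t > VaR_t, else 0."""
    losses, var = np.asarray(losses, float), np.asarray(var, float)
    return np.where(losses > var, (losses - var) / var, 0.0)

def normal_benchmark(mu, sigma, cl=0.95):
    """(ETL - VaR)/VaR for normal P/L with mean mu and std sigma."""
    z = norm.ppf(cl)
    var = -mu + z * sigma
    etl = -mu + sigma * norm.pdf(z) / (1.0 - cl)   # standard normal-ETL formula
    return (etl - var) / var

print(normal_benchmark(mu=0.0, sigma=1.0, cl=0.95))
```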
9.4.5 An Alternative Sizes-of-tail-losses Approach
Yet the Blanco–Ihle loss function also has a problem: because Equation (9.9) has the VaR as its denominator, it is not defined if the VaR is zero, and will give mischievous answers if the VaR gets ‘close’ to zero or becomes negative. It is therefore unreliable unless we can be confident that the VaR is sufficiently large and positive.
What we want is a size-based loss function that avoids the squared term in the Lopez II loss function, but also avoids denominators that might be zero-valued. A promising candidate is the tail loss itself:
$$C_t = \begin{cases} L_t & \text{if } L_t > \text{VaR}_t \\ 0 & \text{if } L_t \le \text{VaR}_t \end{cases} \qquad (9.10)$$
The expected value of the tail loss is of course the ETL, so we can choose the ETL as our benchmark, and use a quadratic score function such as:
$$\text{QS} = \frac{2}{n}\sum_{t=1}^{n}(C_t - \text{ETL}_t)^2 \qquad (9.11)$$
This approach penalises deviations of tail losses from their expected value, which makes intuitive sense. Moreover, because it is quadratic, it gives very high tail losses much greater weight than more ‘normal’ tail losses, and therefore comes down hard on very large losses.
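A minimal sketch of this loss function and score, with illustrative names and data, might be:

```python
# A minimal sketch of the tail-loss function (Equation (9.10)) and the quadratic
# score (Equation (9.11)); array names and sample values are illustrative.
import numpy as np

def tail_loss(losses, var):
    """C_t = L_t if L_t > VaR_t, else 0."""
    losses, var = np.asarray(losses, float), np.asarray(var, float)
    return np.where(losses > var, losses, 0.0)

def qs(C, etl):
    """QS = (2/n) * sum_t (C_t - ETL_t)^2: penalises deviations from the ETL."""
    C, etl = np.asarray(C, float), np.asarray(etl, float)
    return 2.0 * np.mean((C - etl) ** 2)

losses = np.array([1.2, -0.4, 2.9, 0.3, 3.5, -1.1, 0.8, 2.2])
var = np.full_like(losses, 2.5)            # a constant VaR, for illustration
etl = np.full_like(losses, 3.1)            # a constant ETL benchmark, likewise
print(qs(tail_loss(losses, var), etl))
```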
17 Blanco and Ihle also suggest a second approach that incorporates concerns about both the frequency and the size of tail losses. If we let $C_t^{\text{frequency}}$ be the Lopez I frequency-loss function (Equation (9.7)) and $C_t^{\text{size}}$ be the Blanco–Ihle size-loss function (Equation (9.9)), they suggest an alternative loss function that is a weighted average of both, with the weighting factor reflecting our relative concern about the two sources of loss. This is an appealing idea, but the suggestion does not produce reliable rankings: $C_t^{\text{frequency}}$ and $C_t^{\text{size}}$ are not defined in terms of the same underlying metric, so irrelevant changes (e.g., redefining our monetary units, say from dollars to cents) can alter our scores, and so change our rankings. The idea of taking a weighted average is a good one, but we need a more reliable way of implementing it.