In many operational risk applications the relationship proposed by a simple linear regression model, with only one independent variable, may not adequately explain the variation in the dependent variable. This is because in practice there will be many influences on the dependent variable. In such cases, we can extend simple linear regression to multiple linear regression.
Given the dependent random variable $Y$ and $k$ explanatory variables $X_1, X_2, \dots, X_k$, the multiple regression model takes the form

$$y_i = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i$$

where $\varepsilon$ is the residual, generally assumed to be an independent, identically normally distributed random variable with mean 0 and variance $\sigma^2$.
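As a concrete sketch of the estimation, the following fits such a model by ordinary least squares on synthetic data; every name and value here is an illustrative assumption, not something from the text.

```python
# A minimal sketch of ordinary least squares for the model above,
# using synthetic data; variable names and values are illustrative.
import numpy as np

rng = np.random.default_rng(seed=1)
n, k = 100, 3                       # n observations, k explanatory variables
X = rng.normal(size=(n, k))         # X1, ..., Xk
beta_true = np.array([2.0, -1.0, 0.5])
y = 4.0 + X @ beta_true + rng.normal(scale=1.0, size=n)  # alpha = 4, eps ~ N(0, s^2)

# Append a column of ones for the intercept alpha and solve by least squares.
X_design = np.column_stack([np.ones(n), X])
coef, rss, _, _ = np.linalg.lstsq(X_design, y, rcond=None)  # rss = residual sum of squares
print("alpha, beta_1..beta_k:", coef)
```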
FIGURE 12.8 Histogram of standardized residuals for linear regression of operational losses on system downtime.
FIGURE 12.9 Normal probability plot (ML estimates with 95% CI) for the standardized residuals (SRES1) for linear regression of operational losses on system downtime.
EXAMPLE 12.6 MULTIPLE LINEAR REGRESSION OF OPERATIONAL LOSSES
Recall Example 12.1, in which system downtime was the sole independent variable and operational losses were the dependent variable. We might expect a more accurate model if we included a number of other independent variables. Table 12.2 provides details on other independent variables that also might be relevant. They include the number of trainees employed in the back office on a particular day, the number of experienced staff, the volume of transactions, and the number of transaction errors. Therefore, we might postulate
$$y_i = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + \beta_5 x_{i5} + \varepsilon_i$$

where $Y$ is operational losses, $x_1$ is system downtime, $x_2$ is the number of trainees working in the back office, $x_3$ is the number of experienced staff, $x_4$ is the volume of transactions, and $x_5$ is the number of transaction errors.
We might expect the coefficient $\beta_3$ to be negative: the more experienced the staff working in the back office on a particular day, the less likely is an operational loss. All the other independent variables are expected to have a positive sign.
Estimation, Model Fit, and Hypothesis Testing
As with simple linear regression, the parameters can be estimated using ordinary least squares or maximum likelihood. Model fit cannot necessarily be assessed using $R^2$ because it can be inflated towards its maximum value of 1 simply by adding more independent variables to the regression equation.

TABLE 12.2 Postulated Causes of Operational Losses for a Multiple Regression Model
Date     Operational loss ($)   System downtime   Trainees   Experienced staff   Transactions   Transaction errors
1-Jun    1,610,371              9                 15         30                  389,125        38,456
2-Jun    25,677                 0                 7          21                  327,451        28,372
3-Jun    1,504,852              11                6          29                  258,321        23,916
4-Jun    0                      0                 5          37                  209,124        17,456
5-Jun    913,881                7                 16         33                  198,243        15,912
6-Jun    2,352,458              18                4          33                  152,586        7,629
7-Jun    3,549,325              19                0          3                   121,411        9,070
8-Jun    0                      0                 16         34                  127,407        7,370
9-Jun    0                      0                 14         28                  144,760        10,238
10-Jun   1,649,917              13                9          32                  116,548        7,827
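A sketch of fitting the postulated five-variable model to the ten days shown in Table 12.2 is given below, assuming statsmodels is available. With only 10 observations and 6 parameters the estimates are for illustration of the mechanics only; the estimates reported later in Table 12.3 are presumably based on a larger sample.

```python
# Fit the postulated model to the ten days in Table 12.2 (illustration only).
import numpy as np
import statsmodels.api as sm

loss      = np.array([1610371, 25677, 1504852, 0, 913881,
                      2352458, 3549325, 0, 0, 1649917], dtype=float)
downtime  = np.array([9, 0, 11, 0, 7, 18, 19, 0, 0, 13], dtype=float)
trainees  = np.array([15, 7, 6, 5, 16, 4, 0, 16, 14, 9], dtype=float)
exp_staff = np.array([30, 21, 29, 37, 33, 33, 3, 34, 28, 32], dtype=float)
transacts = np.array([389125, 327451, 258321, 209124, 198243,
                      152586, 121411, 127407, 144760, 116548], dtype=float)
errors    = np.array([38456, 28372, 23916, 17456, 15912,
                      7629, 9070, 7370, 10238, 7827], dtype=float)

# Build the design matrix with an intercept and estimate by OLS.
X = sm.add_constant(np.column_stack([downtime, trainees, exp_staff,
                                     transacts, errors]))
model = sm.OLS(loss, X).fit()
print(model.summary())   # coefficients, t statistics, R^2, F statistic
```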
The adjusted coefficient of determination, $R^2(\text{adj})$, takes into account the number of explanatory variables in the model:
$$R^2(\text{adj}) = 1 - \frac{RSS/(n-k)}{TSS/(n-1)}$$

Notice that when $k = 1$:

$$R^2(\text{adj}) = 1 - \frac{RSS/(n-1)}{TSS/(n-1)} = 1 - \frac{RSS}{TSS} = R^2$$
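As a check on the algebra, here is a small helper (the function name is hypothetical, not from the text) that computes $R^2(\text{adj})$ from RSS, TSS, $n$, and $k$ exactly as defined above.

```python
# Adjusted R^2 as defined above; RSS, TSS, n, k would come from a fitted model.
def r2_adjusted(rss: float, tss: float, n: int, k: int) -> float:
    """R^2(adj) = 1 - (RSS / (n - k)) / (TSS / (n - 1))."""
    return 1.0 - (rss / (n - k)) / (tss / (n - 1))

# When k = 1 this reduces to the ordinary R^2 = 1 - RSS/TSS:
assert abs(r2_adjusted(rss=20.0, tss=100.0, n=50, k=1) - 0.8) < 1e-12
```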
The values of the estimated coefficients can be investigated using the t test described in the previous section. We may also be interested in a joint test of the null hypothesis that none of the explanatory variables has any effect on the dependent variable. In this case, provided our regression model has an intercept, we would use the test statistic
$$F = \frac{(TSS - RSS)/k}{RSS/(n-k-1)}$$
This test statistic has an F distribution with $k$ and $n-k-1$ degrees of freedom. Rejection of the null hypothesis implies that at least one of the explanatory variables has an effect on the dependent variable.
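A sketch of the joint test, assuming SciPy is available, is shown below; the function name and input values are illustrative assumptions.

```python
# The joint F test above; SciPy's F distribution supplies the p value.
from scipy import stats

def joint_f_test(rss: float, tss: float, n: int, k: int):
    """F = [(TSS - RSS)/k] / [RSS/(n - k - 1)], with k and n-k-1 df."""
    f_stat = ((tss - rss) / k) / (rss / (n - k - 1))
    p_value = stats.f.sf(f_stat, k, n - k - 1)   # P(F >= f_stat) under the null
    return f_stat, p_value

# Hypothetical values: an RSS much smaller than TSS rejects the null.
print(joint_f_test(rss=10.0, tss=100.0, n=60, k=5))
```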
EXAMPLE 12.7 ESTIMATION OF PARAMETERS OF MULTIPLE LINEAR REGRESSION OF OPERATIONAL LOSSES
The parameter estimates for the multiple regression model postulated in Example 12.6 are shown in Table 12.3. The first thing to notice is that the signs are more or less as expected. Second, the coefficient on system downtime is only slightly different from the value estimated in simple linear regression. However, $R^2$ is higher, indicating that the model fits the data slightly better than simple linear regression. The t statistic and F test statistic, with their corresponding p values, are also listed.
TABLE 12.3 Multiple Linear Regression for Operational Loss

The regression equation is
Opp_Loss = 153165 + 135950 System down time + 5506 Trainees − 11286 Exp_Staff + 0.888 Transactions − 0.52 Trans_errors
Predictor    Coef      SE Coef   T       p       VIF
Constant     153165    46611     3.29    0.001
Sys_down     135950    1864      72.93   0.000
Trainees     5506      2335      2.36    0.019   1.7
Exp_Staf     −11286    1416      −7.97   0.000   1.7
Transact     0.8883    0.3135    2.83    0.005   1.4
Trans_err    −0.520    1.826     −0.28   0.776   1.4
R2(adj) = 95.4%
S = 160556
F joint test = 4082 (p < 0.001)
All of the independent variables, except transaction errors, are significantly different from zero. The coefficients shown in Table 12.3 allow us to assess the impact on operational losses of a change in the independent variables. For example, reducing the number of trainees on a particular day by one individual will lead to a reduction in operational losses of around $5,506, while reducing the number of experienced staff by one individual increases operational losses by around $11,286. Knowledge of this type is important in assessing the operational impact of decisions to change the trainee/experienced-staff mix.
Checking the Assumptions of the Multiple Regression Model
We can use the same procedures mentioned for simple linear regression. The only additional concern is whether the independent variables are uncorrelated. If they are not, the regression model may suffer from multicollinearity.
Multicollinearity occurs when a linear relationship exists among the independent variables. If this occurs, the estimated regression coefficients become unreliable. There are a number of quantitative ways to detect multicollinearity. The simplest involves inspecting the sample correlation matrix constructed from the independent variables. Another way, available in most regression software packages, involves the calculation of variance inflationary factors (VIFs) for each variable. VIFs detect whether one predictor has a strong linear association with the remaining predictors, and they measure how much the variance of an estimated regression coefficient increases when the independent variables are highly correlated. It is generally accepted that a value greater than 5 indicates that multicollinearity may be a serious problem, and therefore the regression coefficients, particularly those with high VIFs, may be unreliable.
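As one illustration, statsmodels provides a VIF routine. The sketch below builds a deliberately collinear design matrix on synthetic data (all values are assumptions), so the affected columns should show VIFs well above 5.

```python
# Compute VIFs for each predictor; VIF_j = 1 / (1 - R_j^2), where R_j^2 comes
# from regressing predictor j on the remaining predictors.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(seed=2)
X = rng.normal(size=(200, 4))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=200)   # make column 3 nearly collinear with column 0

X_design = sm.add_constant(X)
# Skip index 0 (the intercept) when reporting predictor VIFs.
vifs = [variance_inflation_factor(X_design, j) for j in range(1, X_design.shape[1])]
print(vifs)   # the collinear pair should show VIFs well above 5
```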
EXAMPLE 12.8 MISSPECIFICATION TESTING THE MULTIPLE LINEAR REGRESSION OF OPERATIONAL LOSSES
Figure 12.10 plots the standardized residuals for the multiple regression model of Example 12.7. The vast majority of residuals lie within −2 to 2. Table 12.3 also reports that the variance inflation factors are all less than 2, so multicollinearity should not be a problem. Inspection of Figure 12.10 does not reveal any violations of homoscedasticity, although the pattern of observations seems to indicate a failure of the independence assumption. The standardized residuals appear approximately normal, as shown in Figure 12.11.
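For readers reproducing these diagnostics, a sketch along the following lines (the helper name is hypothetical) produces the histogram and normal probability plot for any fitted statsmodels OLS result.

```python
# Residual diagnostics as described above: a histogram of standardized
# residuals and a normal probability plot, for a fitted statsmodels OLS result.
import matplotlib.pyplot as plt
from scipy import stats

def residual_checks(fitted):
    # Internally studentized residuals serve as the standardized residuals.
    resid_std = fitted.get_influence().resid_studentized_internal
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.hist(resid_std, bins=10)                        # cf. Figures 12.8 and 12.10
    ax1.set_xlabel("Standardized residual")
    ax1.set_ylabel("Frequency")
    stats.probplot(resid_std, dist="norm", plot=ax2)    # cf. Figures 12.9 and 12.11
    plt.show()
```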