In many operational risk applications the relationship proposed by a simple linear regression model, with only one independent variable, may not adequately explain the variation in the dependent variable. This is because in practice there will be many influences on the dependent variable. In such cases, we can extend simple linear regression to multiple linear regression.
Given the dependent random variable $Y$ and $k$ explanatory variables $X_1, X_2, \dots, X_k$, the multiple regression model takes the form

$$y_i = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i$$

where $\varepsilon$ is the residual, generally assumed to be an independent, identically normally distributed random variable with mean 0 and variance $\sigma^2$.
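As a concrete sketch of the estimation, the following fits such a model by ordinary least squares on synthetic data; every name and value here is an illustrative assumption, not something from the text.

```python
# A minimal sketch of ordinary least squares for the model above,
# using synthetic data; variable names and values are illustrative.
import numpy as np

rng = np.random.default_rng(seed=1)
n, k = 100, 3                       # n observations, k explanatory variables
X = rng.normal(size=(n, k))         # X1, ..., Xk
beta_true = np.array([2.0, -1.0, 0.5])
y = 4.0 + X @ beta_true + rng.normal(scale=1.0, size=n)  # alpha = 4, eps ~ N(0, s^2)

# Append a column of ones for the intercept alpha and solve by least squares.
X_design = np.column_stack([np.ones(n), X])
coef, rss, _, _ = np.linalg.lstsq(X_design, y, rcond=None)  # rss = residual sum of squares
print("alpha, beta_1..beta_k:", coef)
```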
FIGURE 12.8 Histogram of standardized residuals for linear regression of operational losses on system downtime.
FIGURE 12.9 Normal probability plot (ML estimates with 95% CI) for the standardized residuals (SRES1) for linear regression of operational losses on system downtime.
EXAMPLE 12.6 MULTIPLE LINEAR REGRESSION OF OPERATIONAL LOSSES
Recall Example 12.1, in which system downtime was the sole independent variable and operational losses were the dependent variable. We might expect a more accurate model if we included a number of other independent variables. Table 12.2 provides details on other independent variables that also might be relevant. They include the number of trainees employed in the back office on a particular day, the number of experienced staff, the volume of transactions, and the number of transaction errors. Therefore, we might postulate
$$y_i = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + \beta_5 x_{i5} + \varepsilon_i$$

where $Y$ is operational losses, $x_1$ is system downtime, $x_2$ is the number of trainees working in the back office, $x_3$ is the number of experienced staff, $x_4$ is the volume of transactions, and $x_5$ is the number of transaction errors.
We might expect the coefficient $\beta_3$ to be negative: the more experienced the staff working in the back office on a particular day, the less likely is an operational loss. All the other independent variables are expected to have a positive sign.
Estimation, Model Fit, and Hypothesis Testing
As with simple linear regression, the parameters can be estimated using ordinary least squares or maximum likelihood. Model fit cannot necessarily be assessed using $R^2$ because it can be inflated towards its maximum value of 1 simply by adding more independent variables to the regression equation.

TABLE 12.2 Postulated Causes of Operational Losses for a Multiple Regression Model
Date     Operational loss ($)   System downtime   Trainees   Experienced staff   Transactions   Transaction errors
1-Jun    1,610,371              9                 15         30                  389,125        38,456
2-Jun    25,677                 0                 7          21                  327,451        28,372
3-Jun    1,504,852              11                6          29                  258,321        23,916
4-Jun    0                      0                 5          37                  209,124        17,456
5-Jun    913,881                7                 16         33                  198,243        15,912
6-Jun    2,352,458              18                4          33                  152,586        7,629
7-Jun    3,549,325              19                0          3                   121,411        9,070
8-Jun    0                      0                 16         34                  127,407        7,370
9-Jun    0                      0                 14         28                  144,760        10,238
10-Jun   1,649,917              13                9          32                  116,548        7,827
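A sketch of fitting the postulated five-variable model to the ten days shown in Table 12.2 is given below, assuming statsmodels is available. With only 10 observations and 6 parameters the estimates are for illustration of the mechanics only; the estimates reported later in Table 12.3 are presumably based on a larger sample.

```python
# Fit the postulated model to the ten days in Table 12.2 (illustration only).
import numpy as np
import statsmodels.api as sm

loss      = np.array([1610371, 25677, 1504852, 0, 913881,
                      2352458, 3549325, 0, 0, 1649917], dtype=float)
downtime  = np.array([9, 0, 11, 0, 7, 18, 19, 0, 0, 13], dtype=float)
trainees  = np.array([15, 7, 6, 5, 16, 4, 0, 16, 14, 9], dtype=float)
exp_staff = np.array([30, 21, 29, 37, 33, 33, 3, 34, 28, 32], dtype=float)
transacts = np.array([389125, 327451, 258321, 209124, 198243,
                      152586, 121411, 127407, 144760, 116548], dtype=float)
errors    = np.array([38456, 28372, 23916, 17456, 15912,
                      7629, 9070, 7370, 10238, 7827], dtype=float)

# Build the design matrix with an intercept and estimate by OLS.
X = sm.add_constant(np.column_stack([downtime, trainees, exp_staff,
                                     transacts, errors]))
model = sm.OLS(loss, X).fit()
print(model.summary())   # coefficients, t statistics, R^2, F statistic
```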
The adjusted coefficient of determination, $R^2(\text{adj})$, takes into account the number of explanatory variables in the model:
$$R^2(\text{adj}) = 1 - \frac{RSS/(n-k)}{TSS/(n-1)}$$

Notice that when $k = 1$:

$$R^2(\text{adj}) = 1 - \frac{RSS/(n-1)}{TSS/(n-1)} = 1 - \frac{RSS}{TSS} = R^2$$
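As a check on the algebra, here is a small helper (the function name is hypothetical, not from the text) that computes $R^2(\text{adj})$ from RSS, TSS, $n$, and $k$ exactly as defined above.

```python
# Adjusted R^2 as defined above; RSS, TSS, n, k would come from a fitted model.
def r2_adjusted(rss: float, tss: float, n: int, k: int) -> float:
    """R^2(adj) = 1 - (RSS / (n - k)) / (TSS / (n - 1))."""
    return 1.0 - (rss / (n - k)) / (tss / (n - 1))

# When k = 1 this reduces to the ordinary R^2 = 1 - RSS/TSS:
assert abs(r2_adjusted(rss=20.0, tss=100.0, n=50, k=1) - 0.8) < 1e-12
```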
The values of the estimated coefficients can be investigated using the t test described in the previous section. We may also be interested in a joint test of the null hypothesis that none of the explanatory variables has any effect on the dependent variable. In this case, provided our regression model has an intercept, we would use the test statistic
$$F = \frac{(TSS - RSS)/k}{RSS/(n-k-1)}$$
This test statistic has an F distribution with $k$ and $n-k-1$ degrees of freedom. Rejection of the null hypothesis implies that at least one of the explanatory variables has an effect on the dependent variable.
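A sketch of the joint test, assuming SciPy is available, is shown below; the function name and input values are illustrative assumptions.

```python
# The joint F test above; SciPy's F distribution supplies the p value.
from scipy import stats

def joint_f_test(rss: float, tss: float, n: int, k: int):
    """F = [(TSS - RSS)/k] / [RSS/(n - k - 1)], with k and n-k-1 df."""
    f_stat = ((tss - rss) / k) / (rss / (n - k - 1))
    p_value = stats.f.sf(f_stat, k, n - k - 1)   # P(F >= f_stat) under the null
    return f_stat, p_value

# Hypothetical values: an RSS much smaller than TSS rejects the null.
print(joint_f_test(rss=10.0, tss=100.0, n=60, k=5))
```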
EXAMPLE 12.7 ESTIMATION OF PARAMETERS OF MULTIPLE LINEAR REGRESSION OF OPERATIONAL LOSSES
The parameter estimates for the multiple regression model postulated in Example 12.6 are shown in Table 12.3. The first thing to notice is that the signs are more or less as expected. Second, the coefficient on system downtime is only slightly different from the value estimated in simple linear regression. However, $R^2$ is higher, indicating that the model fits the data slightly better than simple linear regression. The t statistic and F test statistic, with their corresponding p values, are also listed.
TABLE 12.3 Multiple Linear Regression for Operational Loss

The regression equation is
Opp_Loss = 153165 + 135950 System down time + 5506 Trainees − 11286 Exp_Staff + 0.888 Transactions − 0.52 Trans_errors
Predictor    Coef      SE Coef   T       p       VIF
Constant     153165    46611     3.29    0.001
Sys_down     135950    1864      72.93   0.000
Trainees     5506      2335      2.36    0.019   1.7
Exp_Staf     −11286    1416      −7.97   0.000   1.7
Transact     0.8883    0.3135    2.83    0.005   1.4
Trans_err    −0.520    1.826     −0.28   0.776   1.4
R2(adj) = 95.4%
S = 160556
F joint test = 4082 (p < 0.001)
All of the independent variables, except transaction errors, are significantly different from zero. The coefficients shown in Table 12.3 allow us to assess the impact on operational losses of a change in the independent variables. For example, reducing the number of trainees on a particular day by one individual will lead to a reduction in operational losses of around $5,506, while reducing the number of experienced staff by one individual increases operational losses by around $11,286. Knowledge of this type is important in assessing the operational impact of decisions to change the trainee/experienced-staff mix.
Checking the Assumptions of the Multiple Regression Model
We can use the same procedures mentioned for simple linear regression. The only additional concern is whether the independent variables are uncorrelated. If they are not, the regression model may suffer from multicollinearity.
Multicollinearity occurs when a linear relationship exists among the independent variables. If this occurs, the estimated regression coefficients become unreliable. There are a number of quantitative ways to detect multicollinearity. The simplest involves inspecting the sample correlation matrix constructed from the independent variables. Another way, available in most regression software packages, involves the calculation of variance inflationary factors (VIFs) for each variable. VIFs detect whether one predictor has a strong linear association with the remaining predictors, and they measure how much the variance of an estimated regression coefficient increases when the independent variables are highly correlated. It is generally accepted that a value greater than 5 indicates that multicollinearity may be a serious problem, and therefore the regression coefficients, particularly those with high VIFs, may be unreliable.
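As one illustration, statsmodels provides a VIF routine. The sketch below builds a deliberately collinear design matrix on synthetic data (all values are assumptions), so the affected columns should show VIFs well above 5.

```python
# Compute VIFs for each predictor; VIF_j = 1 / (1 - R_j^2), where R_j^2 comes
# from regressing predictor j on the remaining predictors.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(seed=2)
X = rng.normal(size=(200, 4))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=200)   # make column 3 nearly collinear with column 0

X_design = sm.add_constant(X)
# Skip index 0 (the intercept) when reporting predictor VIFs.
vifs = [variance_inflation_factor(X_design, j) for j in range(1, X_design.shape[1])]
print(vifs)   # the collinear pair should show VIFs well above 5
```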
EXAMPLE 12.8 MISSPECIFICATION TESTING THE MULTIPLE LINEAR REGRESSION OF OPERATIONAL LOSSES
Figure 12.10 plots the standardized residuals for the multiple regression model of Example 12.7. The vast majority of residuals lie within −2 to 2. Table 12.3 also reports that the variance inflation factors are all less than 2, so multicollinearity should not be a problem. Inspection of Figure 12.10 does not reveal any violations of homoscedasticity, although the pattern of observations seems to indicate a failure of the independence assumption. The standardized residuals appear approximately normal, as shown in Figure 12.11.
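For readers reproducing these diagnostics, a sketch along the following lines (the helper name is hypothetical) produces the histogram and normal probability plot for any fitted statsmodels OLS result.

```python
# Residual diagnostics as described above: a histogram of standardized
# residuals and a normal probability plot, for a fitted statsmodels OLS result.
import matplotlib.pyplot as plt
from scipy import stats

def residual_checks(fitted):
    # Internally studentized residuals serve as the standardized residuals.
    resid_std = fitted.get_influence().resid_studentized_internal
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.hist(resid_std, bins=10)                        # cf. Figures 12.8 and 12.10
    ax1.set_xlabel("Standardized residual")
    ax1.set_ylabel("Frequency")
    stats.probplot(resid_std, dist="norm", plot=ax2)    # cf. Figures 12.9 and 12.11
    plt.show()
```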