
PARAMETRIC AND NONPARAMETRIC CORRELATION COEFFICIENTS

In the document Operational Risk - with Excel and VBA (pages 145-151)


Correlation and Dependence

Point Biserial Correlation

When one of the variables is binary and the other continuous, we can use the point biserial correlation coefficient. If S is a continuous variable and Y a binary variable taking the values 0 and 1, the point biserial correlation is calculated as

$$
\rho = \frac{(\bar{S}_1 - \bar{S}_0)\sqrt{p(1-p)}}{\sigma_S}
$$

where $\bar{S}_1$ = mean of S when Y = 1
$\bar{S}_0$ = mean of S when Y = 0
$\sigma_S$ = sample standard deviation of S
$p$ = proportion of values where Y = 1

A VBA function to estimate the point biserial correlation coefficient is given below. Note that the first column of the range passed to the function should contain the continuous variable and the second column the binary variable.

Function Biserial(data As Range) As Variant
    ' Calculate the point biserial correlation
    ' Column 1 of data: continuous variable S; column 2: binary variable Y (0/1)
    Dim number_columns As Long
    Dim number_rows As Long
    Dim row As Long
    Dim x1 As Double
    Dim x0 As Double
    Dim s As Double
    Dim p As Double
    Dim average_x As Double
    Dim all_x() As Double
    Dim number_ones As Long

    number_columns = data.Columns.Count
    number_rows = data.Rows.Count

    If (number_columns <> 2) Then   ' check exactly 2 columns
        Biserial = "2 columns only"
    ElseIf (number_rows < 4) Then   ' use at least 4 observations, although the more the better
        Biserial = "need at least 4 rows"
    Else
        x0 = 0: x1 = 0: s = 0: average_x = 0: number_ones = 0
        ReDim all_x(1 To number_rows)

        For row = 1 To number_rows   ' group sums, overall sum and number of ones
            If (data(row, 2).Value = 0) Then
                x0 = x0 + data(row, 1).Value
            Else
                x1 = x1 + data(row, 1).Value
                number_ones = number_ones + 1
            End If
            average_x = average_x + data(row, 1).Value
            all_x(row) = data(row, 1).Value
        Next row

        If (number_ones = 0 Or number_ones = number_rows) Then
            Biserial = "binary column needs both 0s and 1s"   ' avoid division by zero
        Else
            x0 = x0 / (number_rows - number_ones)   ' mean of S when Y = 0
            x1 = x1 / number_ones                   ' mean of S when Y = 1
            average_x = average_x / number_rows
            p = number_ones / number_rows           ' proportion of values where Y = 1

            For row = 1 To number_rows   ' sample standard deviation of S
                s = s + (all_x(row) - average_x) ^ 2
            Next row
            s = (s / (number_rows - 1)) ^ 0.5

            Biserial = ((x1 - x0) * (p * (1 - p)) ^ 0.5) / s   ' return point biserial correlation
        End If
    End If
End Function
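For readers who want to cross-check the worksheet outside Excel, here is a minimal Python sketch of the same calculation as Biserial(). It assumes a list of values of the continuous variable and a parallel list of 0/1 codes in place of the two worksheet columns. The name `biserial` and its error messages are our own, not part of the workbook. Like the VBA code, it divides by the sample (n − 1) standard deviation of S.

```python
import math
from typing import Sequence


def biserial(s: Sequence[float], y: Sequence[int]) -> float:
    """Point biserial correlation, following the steps of the VBA Biserial() function.

    s holds the continuous variable and y the matching 0/1 codes.
    As in the VBA code, the denominator is the sample (n - 1) standard deviation of s.
    """
    n = len(s)
    if n != len(y):
        raise ValueError("s and y must have the same length")
    if n < 4:
        raise ValueError("need at least 4 rows")
    ones = [x for x, code in zip(s, y) if code != 0]   # S values where Y = 1
    zeros = [x for x, code in zip(s, y) if code == 0]  # S values where Y = 0
    if not ones or not zeros:
        raise ValueError("the binary variable needs both 0s and 1s")
    mean1 = sum(ones) / len(ones)
    mean0 = sum(zeros) / len(zeros)
    p = len(ones) / n  # proportion of values where Y = 1
    mean = sum(s) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in s) / (n - 1))
    return (mean1 - mean0) * math.sqrt(p * (1 - p)) / sd


if __name__ == "__main__":
    # Gross income and risk coding columns of Table 11.3
    income = [117.78, 161.84, 117.11, 161.91, 162.11, 80.45, 50.57, 172.88]
    coding = [0, 1, 0, 1, 1, 1, 0, 1]
    print(round(biserial(income, coding), 3))  # prints 0.571
```

Because $\sigma_S$ is the sample standard deviation, the result equals Pearson's correlation between S and the 0/1 column multiplied by $\sqrt{(n-1)/n}$. Dividing by the population standard deviation instead would reproduce Pearson's value exactly.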

EXAMPLE 11.4 CORRELATION BETWEEN OPERATIONAL RISK AND GROSS INCOME OF BUSINESS LINES

An application of this function is given in the worksheet Biserial in the workbook Operational Risk 11.xls. The worksheet is based on the following example. Let us imagine that the level of operational risk in an institution's business lines is graded as high or low based on the opinion of an OR analyst. Suppose we are interested in assessing the degree of correlation between operational risk and the gross income of the business lines. Since in this case operational risk is a binary variable and gross income a continuous variable, we can use the point biserial coefficient to estimate the correlation. Table 11.3 provides a typical example. The final column gives the mapping of high or low into a binary variable.

TABLE 11.3 OR Risk and Gross Income across Business Lines

Business line                 Operational risk   Gross income (millions $)   Risk coding
Corporate finance             Low                117.78                      0
Trading and sales             High               161.84                      1
Retail banking                Low                117.11                      0
Commercial banking            High               161.91                      1
Payment and settlement        High               162.11                      1
Agency services and custody   High               80.45                       1
Asset management              Low                50.57                       0
Retail brokerage              High               172.88                      1

Using the above Biserial() function in the worksheet Biserial, the estimate of correlation is 0.57.
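The estimate can be checked by hand from Table 11.3. Five business lines are coded 1 and three are coded 0. The intermediate values below are our own arithmetic, rounded.

$$
\bar{S}_1 = \frac{739.19}{5} = 147.838, \qquad \bar{S}_0 = \frac{285.46}{3} \approx 95.153, \qquad p = \frac{5}{8} = 0.625, \qquad \sigma_S \approx 44.64
$$

$$
\rho = \frac{(147.838 - 95.153)\sqrt{0.625 \times 0.375}}{44.64} \approx \frac{52.685 \times 0.4841}{44.64} \approx 0.571
$$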

Tetrachoric Correlation

Tetrachoric correlation measures the association between two binary variables. Assume that the binary observations arise from underlying continuous variables S and T that are dichotomized at unknown threshold values $\theta_S$ and $\theta_T$, respectively. Our observable measurements on S and T are denoted by $S_d$ and $T_d$, where $T_d = 1$ if $T \ge \theta_T$ (otherwise $T_d = 0$), and $S_d = 1$ if $S \ge \theta_S$ (otherwise $S_d = 0$). The joint distribution of $(S_d, T_d)$ can be summarized as a 2 × 2 contingency table. The general situation is outlined in Table 11.4, where $P_{ij} = \text{Prob}(T_d = i, S_d = j)$. Each cell is a bivariate normal integral.

TABLE 11.4 The General Situation for Outcomes of Binary Variables S and T, with Probabilities of Occurrence

           T = 1    T = 0     Total
S = 1      P11      P01       PS
S = 0      P10      P00       1 − PS
Total      PT       1 − PT    1

Note: Pkj is the probability that T = k and S = j, where j, k = 0 or 1.

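For example:

$$
P_{00} = \text{Prob}(T_d = 0, S_d = 0) = \text{Prob}(T < \theta_T, S < \theta_S) = \int_{-\infty}^{\theta_T} \int_{-\infty}^{\theta_S} \phi(t, s, \rho)\, ds\, dt
$$

where $\phi(t, s, \rho)$ is the standard bivariate normal density with correlation $\rho$.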
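The integral is straightforward to evaluate numerically. The sketch below assumes, as the tetrachoric model does, that S and T are standard normal (mean 0, variance 1) with correlation $\rho$ and $|\rho| < 1$. The helper names `p00` and `cell_probs` are hypothetical. Conditioning on T turns the inner integral into a normal CDF, leaving a single integral over t. The other three cells of Table 11.4 then follow from the marginal probabilities.

```python
import math


def std_normal_pdf(x: float) -> float:
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def p00(theta_t: float, theta_s: float, rho: float, steps: int = 4000) -> float:
    """Prob(T < theta_t, S < theta_s) for standard bivariate normal T, S with correlation rho.

    Given T = t, S is normal with mean rho * t and variance 1 - rho**2, so the
    inner integral over s is a normal CDF. The outer integral over t uses the
    composite Simpson's rule. Requires |rho| < 1.
    """
    lower = -10.0  # stands in for minus infinity; the probability mass below it is negligible
    if theta_t <= lower:
        return 0.0
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    scale = math.sqrt(1.0 - rho * rho)

    def integrand(t: float) -> float:
        return std_normal_pdf(t) * std_normal_cdf((theta_s - rho * t) / scale)

    h = (theta_t - lower) / steps
    total = integrand(lower) + integrand(theta_t)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(lower + k * h)
    return total * h / 3.0


def cell_probs(theta_t: float, theta_s: float, rho: float) -> tuple:
    """Return (P11, P01, P10, P00), where Pkj = Prob(Td = k, Sd = j) as in Table 11.4."""
    q00 = p00(theta_t, theta_s, rho)
    q01 = std_normal_cdf(theta_t) - q00  # T < theta_t and S >= theta_s
    q10 = std_normal_cdf(theta_s) - q00  # T >= theta_t and S < theta_s
    q11 = 1.0 - q00 - q01 - q10          # both at or above their thresholds
    return q11, q01, q10, q00
```

A quick check of the numerics: with both thresholds at zero, $P_{00}$ has the known closed form $1/4 + \arcsin(\rho)/(2\pi)$. With $\rho = 0$ it reduces to the product of the two marginal probabilities.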

The actual formula for the tetrachoric correlation coefficient is complex and contains an infinite series of terms. However, Pearson¹ provides an easy-to-use approximation² given by

$$
\hat{\rho} = \cos\left(\frac{180^\circ}{1 + \sqrt{bc/ad}}\right)
$$

where a, b, c, and d refer to the frequencies in cells 11, 12, 21, and 22 of a fourfold table, respectively, and where row 1 and column 2 designate presence.
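The approximation is a one-liner in code. Below is a minimal Python sketch with a hypothetical function name. It takes the four counts in the cell order just described, so b and c are the cells in which the two variables agree (both present or both absent), and it uses $\pi$ radians for 180°.

```python
import math


def tetrachoric_cos(a: int, b: int, c: int, d: int) -> float:
    """Pearson's cosine approximation to the tetrachoric correlation.

    a, b, c, d are the counts in cells 11, 12, 21 and 22 of the fourfold table,
    with row 1 and column 2 designating presence (b and c are the agreeing cells).
    """
    if a * d == 0:
        raise ValueError("cells a and d must be non-zero")
    angle = math.pi / (1.0 + math.sqrt((b * c) / (a * d)))  # pi radians = 180 degrees
    return math.cos(angle)
```

If the counts are entered in the opposite orientation, with a and d as the agreeing cells, the result changes sign exactly, because $\cos(180^\circ - x) = -\cos x$. It is therefore worth checking which layout a worksheet uses before interpreting the sign.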

Consider Table 11.5, which records for each bank whether a reputational risk event has occurred, alongside its internal OR audit score.

TABLE 11.5 Reputational Risk Events and OR Internal Audit Score for 12 Fictional Banks

                        Recorded data                Data mapping
                        Reputational   OR audit      Reputational   OR audit
Bank                    risk event     score         risk event     score
XYZ Bank                No             Low           0              1
GIA Financials          No             Low           0              1
City FG Holdings        Yes            High          1              0
Financial Street Bank   Yes            High          1              0
FPG                     No             Low           0              1
Boston Regal            Yes            High          1              0
Imperial Crown          No             Low           0              1
Market DG               Yes            Low           1              1
Coventry Provincial     No             High          0              0
Bank 10                 Yes            High          1              0
AG Swift Inc            Yes            Low           1              1
High Street Holdings    Yes            Low           1              1


For this table the tetrachoric correlation coefficient is equal to 0.58. A VBA function to calculate the tetrachoric correlation is given below:

Function Tetra(S As Range, T As Range) As Variant
    ' Function takes two binary (0/1) ranges S and T

    ' Error checks
    If (S.Columns.Count > 1 Or T.Columns.Count > 1) Then
        Tetra = "Need only 1 column"
    ElseIf (S.Rows.Count < 10 Or T.Rows.Count < 10) Then
        Tetra = "Need at least 10 rows"
    ElseIf (S.Rows.Count <> T.Rows.Count) Then
        Tetra = "Need an equal number of rows"
    Else
        ' correlation calculation starts here
        ' Long (not Integer) counters avoid overflow in the products a * d and b * c
        Dim a As Long, b As Long, c As Long, d As Long
        Dim i As Long
        Dim pi As Double
        a = 0: b = 0: c = 0: d = 0
        pi = 3.14159265358979

        For i = 1 To S.Rows.Count
            If (S(i, 1) = 1 And T(i, 1) = 1) Then a = a + 1
            If (S(i, 1) = 1 And T(i, 1) = 0) Then b = b + 1
            If (S(i, 1) = 0 And T(i, 1) = 1) Then c = c + 1
            If (S(i, 1) = 0 And T(i, 1) = 0) Then d = d + 1
        Next i

        ' With these labels the estimate is positive when b * c > a * d,
        ' that is, when S = 1 tends to occur together with T = 0
        If (a * d = 0) Then
            Tetra = "Cells a and d must be non-zero"   ' avoid division by zero
        Else
            Tetra = Cos(pi / (1 + Sqr((b * c) / (a * d))))   ' angle in radians: pi = 180 degrees
        End If
    End If
End Function
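As a check on the 0.58 quoted above, tally the mapped columns of Table 11.5 with the counting rules in Tetra(), taking S as the reputational risk event and T as the audit score. This gives a = 3, b = 4, c = 4 and d = 1 (our own count), so

$$
\hat{\rho} = \cos\left(\frac{180^\circ}{1 + \sqrt{(4 \times 4)/(3 \times 1)}}\right) = \cos\left(\frac{180^\circ}{1 + 2.309}\right) \approx \cos(54.39^\circ) \approx 0.58
$$

The audit score is mapped to 0 for High and 1 for Low. The positive value therefore reflects reputational risk events tending to coincide with a High audit score (cell b), and non-events with a Low score (cell c).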

The function Tetra() takes two single-column ranges, which must be of equal length and have at least 10 rows. An example of the use of this function is given in the worksheet Tetrachoric. The worksheet combines the Tetra() function with a simulation of the two binary variables "OR Audit Score" and "Reputational Risk Event" for 12 fictional financial institutions. Press <F9> to run the simulation.

Note that when bc = ad, $\sqrt{bc/ad} = 1$. The overall denominator is then 2 and the angle is 180°/2 = 90°. Since cos 90° = 0, $\hat{\rho} = 0$ and there is no correlation between S and T. When bc dominates over ad, the overall denominator is greater than 2, which means that the angle is less than 90° and the resulting estimate $\hat{\rho}$ is positive. Conversely, when ad dominates over bc, the angle exceeds 90° and $\hat{\rho}$ is negative.
