OTHER APPROACHES FOR MODELING BIVARIATE BINARY ENDPOINTS

As always in statistical modeling, there are many ways to achieve our objec- tives. In this section we highlight a number of alternative models.

Bivariate Bernoulli Model and Bivariate Binomial Model

The bivariate Bernoulli distribution is natural starting point for modeling bivariate binary data. It is parameterized by two marginal parameters (P_Sand P_T) and one bivariate parameter (P₁₁). The correlation between Sand T(r) is given by⁷

(P − P P ρ = ¹¹ ^{S T})

P_S(1 − P_S) P_T(1 − P_T) It can also be shown that

 P P (1 − P)(1 − P_T)

 ≤ r ≤

Max ^{S T} , S

 (1 − P)(1 − P _{S T} 

 ^S ^T) P P

, ,

^t]! ^S ^P^T ^ ^

Gumbel-type Models

An alternative to the bivariate binomial distribution was introduced by Gumbel.⁸Gumbel developed several bivariate logistic distributions with the property that their marginal distributions were also logistic. His first distribution is given by

F s t ( , ) = [1 + exp^−s+ exp^−t]⁻¹ (13.5) where F(s, t) is the cumulative probability function.

Gumbel showed that the logistic distribution implied by Equation 13.5 is asymmetric. In particular, the probability at the center F(0,0) =0.3333 instead of 0.25 as implied by the bivariate normal probability function used in the bivariate logistic model discussed in Case Study 13.1. The author goes on to show that Equation 13.5 cannot be split into the product of the marginal distributions; therefore, it implies a priori that the variables are dependent. In addition, Gumbel showed that Equation 13.5 can be used in cases where the marginal distributions are symmetrical and resemble the normal distribution if the sample coefficient correlation is of the order 0.5.

Gumbel’s second logistic distribution is given by

s t

F s t ( , ) =[1 +exp ^{− −1}^s] [1 +exp ^{− −}^t] [1 ¹ +αexp ^{− −}

[1 + exp^{− −1}^s] [1 + exp^{− −}¹] (13.6) where the unknown parameter αis subject to the restriction −1 < α< 1. The author shows that the correlation (r) between Sand Tis a function of α given by

3α ρ = ₂ π

where π =3.1415927 . . . . Equation 13.6 is more flexible than Equation 13.5 due to the addition of the dependence parameter a. The probability at

0 0 . the center F( , ) = 0 25 

1 + α  Independence occurs when a = 0.

 4 

Grizzle-type Models

Following Gumbel’s proposal, a number of authors introduced covariates into the bivariate logistic framework. An early marginal modeling approach was developed by Grizzle,⁹who analyzed data on coal miners in nine five- year-wide age groups reporting either, neither, or both of the respiratory symptoms, breathlessness and wheeze. For each age category the author considered only the marginal data of how many had or did not have breathlessness and how many or did not have wheeze. Two marginal models were developed for each symptom:

 P_S 

x i =1, ..., 9 1. ln

1 −P  = α^wheezei + βwheezei i

 ^S 

 P_S 

x i=1,...,9 2. ln

1 −P  = αbreathlessnessi+βbreathlessnessi i

 ^S 

175

Logistic Regression in Operational Risk Management

where x is an indicator variable for wheeze or breathlessness. Mantel and Brown along with Nerlove and Press¹⁰extended this model into the multivariate setting.

Generalization of the Ashford-Sowden Probit Model

Morimune,¹¹in a discussion of the differences between the bivariate logistic and bivariate normal distribution, developed a generalization of the Ashford- Sowden probit model in which the correlation coefficient is made a function of X:

X + 1. P₁₁ exp ( β X +β κX)

= ^S

X + 1 +exp ( β_SX +β κX) 2. P₀₁ exp ( β X)

= ^S −P₁₁

X + 1 +exp ( β_SX +β κX) 3. P₁₀ exp ( βX)

= − P₁₁

X + 1 +exp ( β_SX +β κX) 4. P₀₀= −1 P₀₁−P ₁₀−P₁₁

This was extended into a fully logistic version of the Ashford-Sowden model by Maddala,¹²given by

X + 1. P₁₁ exp ( β X +β κX)

= ^S

X + 1 + exp ( β_SX +β κX) 2. P₀₁ exp ( β X)

= ^S −P₁₁

1 +exp ( β_SX) 3. P₁₀ exp ( βX)

= − P₁₁

1 +exp ( βX) 4. P₀₀= −1 P₀₁−P ₁₀−P₁₁ Copula-based Models

An alternative framework for developing multivariate models is based on the use of copula functions. Let P_Sand P_Trepresent two univariate marginal distribution functions. Let H(u, v) denote a bivariate distribution function concentrated on the unit square having uniform marginal distributions, so that H(1,1) = 1, H(0,0) = 0, H(u,1) = u, and H(1,v) = v. Such bivariate

distribution functions are called copulas. We can define a bivariate distribution for random variables S and T by

< [ s t

Prob (S s, T <t) =H P _s(),P_T()]

Since H has uniform marginal distributions, S and T have marginal distribution functions P_Sand P_T. All that remains is to choose an appropriate distribution for H. There are a wide variety of one-parameter and two-parameter parametric families of multivariate copulas. The choice between one- parameter and two-parameter families will depend on how many types of dependence you wish to capture in your model. One-parameter families, in particular the Plackett family,¹³have been popularized by Dale,¹⁴and Le Cessie and Van Houwelingen.¹⁵For 0 ≤d < ∞, the Placket copula is given by

H s t ( , ,δ ) = 1 (δ 1)(s t) [(1 (δ 1)(s 2

(δ −1)⁻¹

{

¹⁺ ⁻ ^{+ −} ⁺ ⁻ ⁺^t) ⁾

4(δ st ^/

− −1) ]^{1 2}

}

where d is the dependence parameter such the marginal distributions are independent if d =1.

The Plackett copula is considered by Dale,¹⁶who investigates the problem of correlated ordinal outcomes. Dale quantified the dependence parameter d in the Plackett copula as the “cross ratio” between outcomes defined as

δ = P P₁₁₀₀ P P_{01 10}

When the responses are binary, the cross ratio reduces to the odds ratio. Le Cessie and Van Houweligen,¹⁷who used the Plackett copula, with the dependence parameter given by the Dale cross ratio, to model correlated binary outcomes in such a way that the marginal response probabilities are logistic.

SUMMARY

There are many important research topics in OR for which the dependent variable is limited (discrete, not continuous). OR managers may wish to ana- lyze whether some event occurred, such as failure of a system or fraud. Binary logistic regression is a type of regression analysis where the dependent variable is a coded 0 or 1. Logistic regression has many analogies to linear regression, the logistic regression coefficients correspond to coefficients in the linear regression equation, and a pseudo-R²statistic such as McFadden’s

2 can be used to summarize the strength of the relationship. Unlike linear regression, logistic regression does not assume linearity of relationship

177

Logistic Regression in Operational Risk Management

between the dependent and independent variables, nor does it require nor- mally distributed random variables or homoscedasticity. We can easily extend logistic regression into the bivariate setting. Indeed, the bivariate model developed in this chapter allows us to explore the relationships between the independent and binary dependent variables. The model is sim- ple to apply and easy to implement. The parameters in the joint model have the same interpretation as they do in the standard logistic regression setting.

In the following chapter we discuss how we can generate multivariate regression–type models where the dependent variables may be all binary, all continuous, or an arbitrary combination.

REVIEW QUESTIONS

1. Outline the differences between logistic and linear regression.

2. Can we estimate the logistic regression parameters using ordinary least squares?

3. How can we assess model fit in logistic regression?

4. What is the odds ratio?

5. Write a function in VBA that returns the parameter estimates for the bivariate logistic model introduced in this chapter.

■ Test the model on a range of simulated and real binary data.

■ Is the latent variable assumption of normality generally valid for your data?

I

CHAPTER 14 Mixed Dependent Variable Modeling

n the previous two chapters we have discussed how to use linear regression and logistic regression when we have continuous or binary dependent variables. In some other situations we might wish to model jointly dependent variables that are a mixture of binary and continuous variables. For example, an OR manager might be interested in developing a joint model with the binary dependent variable “transaction completed” alongside the continuous dependent variable “transaction time.” In this chapter we outline a general and easily implemented approach for multivariate regression modeling of this type.

Dalam dokumen Operational Risk - with Excel and VBA (Halaman 194-200)

OTHER APPROACHES FOR MODELING BIVARIATE BINARY ENDPOINTS

[

]

[

]

[

]

[

]

∑ ∑

∑ ∑

∑

∑

∑

∑

{

}

SUMMARY

REVIEW QUESTIONS

FURTHER READING

I

CHAPTER 14

Mixed Dependent Variable Modeling