
While correlation coefficients are normally reported as r = (a value between -1 and +1), squaring them makes them easier to understand. The square of the coefficient (or r²) is equal to the percent of the variation in one variable that is related to the variation in the other. After squaring r, multiply by 100 to read the result as a percentage. An r of 0.5 means 25% of the variation is related (0.5² = 0.25). An r value of 0.7 means 49% of the variance is related (0.7² = 0.49).
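As a quick illustration, the following Python sketch (assuming NumPy is available; the data are invented for illustration) computes r and squares it:

```python
# Minimal sketch: squaring r gives the shared proportion of variation.
# The x and y values here are made-up illustrative data.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

r = np.corrcoef(x, y)[0, 1]              # Pearson correlation coefficient
print(f"r = {r:.3f}, r^2 = {r**2:.3f}")  # e.g. r = 0.7 -> r^2 = 0.49, i.e. 49%
```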

3.8.2 Spearman's Rank Correlation Coefficient

Spearman's rank correlation coefficient is denoted as \( \rho_s \) for a population parameter and as \( r_s \) for a sample statistic. It is appropriate when one or both variables are skewed or ordinal (Myers & Well, 2003) and is robust when extreme values are present. For a correlation between variables x and y, differences between the ranks of each observation on the two variables are calculated, and the sample Spearman's correlation coefficient is given by

\[ r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}, \]

where \( d_i \) is the difference in ranks for x and y and n is the number of observations.
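The following Python sketch (assuming NumPy and SciPy; the data are made up, with one extreme x value to illustrate the robustness noted above) computes \( r_s \) from the rank differences and checks it against scipy.stats.spearmanr:

```python
# Minimal sketch of Spearman's r_s via the rank-difference formula
# r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))   (assumes no ties).
import numpy as np
from scipy.stats import rankdata, spearmanr

x = np.array([10.0, 20.0, 30.0, 40.0, 1000.0])  # note the extreme value
y = np.array([1.2, 3.1, 2.4, 4.8, 5.0])

d = rankdata(x) - rankdata(y)                   # rank differences d_i
n = len(x)
r_s = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))
print(r_s, spearmanr(x, y).correlation)         # both should print 0.9
```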

3.9 Logistic Regression

Let Y be the binary response (dependent) variable which takes on the values 1 and 0 with probability \( \pi \) and \( 1-\pi \) respectively, so that

\[ P(Y = 1 \mid x) = \pi(x), \]

where \( x \) is the explanatory variable and

\[ \pi(x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}. \]

Therefore, we can write

\[ \frac{\pi(x)}{1 - \pi(x)} = e^{\beta_0 + \beta_1 x}. \quad \ldots\ldots (7.1) \]

Now if we take the natural log of equation (7.1) we have

\[ \ln\!\left[\frac{\pi(x)}{1 - \pi(x)}\right] = \beta_0 + \beta_1 x. \quad \ldots\ldots (7.2) \]

Here, \( \pi(x)/\{1-\pi(x)\} \) given in equation (7.1) is known as the odds, and \( \ln[\pi(x)/\{1-\pi(x)\}] \) given in equation (7.2) is known as the log odds.

Instead of a single explanatory variable we can use two or more explanatory variables. Let \( x_i = (x_{i1}, x_{i2}, \ldots, x_{ik}) \) be the vector of k independent explanatory variables for the i-th respondent. The natural logarithm of the ratio of \( \pi(x_i) \) and \( 1 - \pi(x_i) \) gives a linear function of \( x_i \), and the model (7.2) becomes

\[ \ln\!\left[\frac{\pi(x_i)}{1 - \pi(x_i)}\right] = \sum_{j=0}^{k} \beta_j x_{ij}, \quad \ldots\ldots (7.3) \]

where in general we take \( x_{i0} = 1 \) and \( \beta_j \) is the parameter relating to \( x_{ij} \). The function (7.3) is linear in both the variables X and the parameters \( \beta \); the left-hand side of the equation is known as the logit transformation, and the whole relationship is called the logistic regression model.
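To ground model (7.3), here is a minimal Python sketch (assuming the statsmodels package; the data and the true coefficients are simulated choices, not values from this study) that generates a binary response from a linear logit and recovers the \( \beta_j \):

```python
# Minimal sketch: simulate from a logistic regression model and fit it.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))               # two explanatory variables
eta = 0.5 + 1.0 * X[:, 0] - 0.8 * X[:, 1]   # linear logit: beta_0 + sum beta_j x_j
p = 1.0 / (1.0 + np.exp(-eta))              # pi(x), the logistic function
y = rng.binomial(1, p)                      # binary response Y

fit = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
print(fit.params)                           # estimates of beta_0, beta_1, beta_2
```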


3.9.1 Interpretation of the Parameters

Interpretation of the parameters in a logistic regression model is not as straightforward as in a linear regression model, so a brief discussion is necessary. Since the logit transformation \( g(x) \) is linear in the parameters, we can interpret the parameters using the arguments of linear regression.

Thus the interpretation may be introduced as follows:

We have the logit, which is linear in the parameters, i.e.

\[ g(x) = \ln\!\left[\frac{\pi(x)}{1 - \pi(x)}\right] = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k, \]

so, arguing analogously to the linear regression model, we can say that \( \beta_j \) (j = 1, 2, ..., k) represents the rate of change in \( g(x) \) for a one-unit change in \( x_j \), with the other explanatory variables held constant.

Also, the parameters of a logistic regression model can be interpreted through the odds ratio. To describe the situation, let us consider the case where the explanatory variable is dichotomous. This situation is not only the simplest, but it also gives the conceptual foundation for all other situations.

The explanation is given below:

We have \( g(x) = \beta_0 + \beta_1 x \); here we consider \( x \) to be a dichotomous variable taking the values 0 and 1. Then the odds ratio O (say) for \( x = 1 \) against \( x = 0 \) (taking all other X's as fixed) is given by

\[ O = \frac{\pi(1)/\{1 - \pi(1)\}}{\pi(0)/\{1 - \pi(0)\}} = \frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}} = e^{\beta_1}. \]

Hence \( \ln O = \beta_1 \), so we can directly estimate the coefficient of a logistic regression model as \( \hat{\beta}_1 = \ln \hat{O} \), and hence both quantities can be interpreted.

Hence the interpretation of \( \beta_j \) is naturally somewhat different from the interpretation in linear regression: it represents the amount by which the log odds change per unit change in \( x_j \). It is somewhat more meaningful, however, to state that a one-unit increase in \( x_j \) increases the odds by the multiplicative factor \( e^{\beta_j} \). If a qualitative independent variable has m categories, we introduce only (m - 1) dummy variables, and the remaining category is taken as the reference category.
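A short Python sketch (the coefficients are arbitrary illustrative values) verifies that a one-unit increase in \( x \) multiplies the odds by \( e^{\beta_1} \) regardless of the starting value of \( x \):

```python
# Minimal sketch: the odds ratio for a one-unit change is exp(beta_1).
import numpy as np

beta_0, beta_1 = -0.3, 0.9   # arbitrary illustrative coefficients

def odds(x):
    p = 1.0 / (1.0 + np.exp(-(beta_0 + beta_1 * x)))  # pi(x)
    return p / (1.0 - p)                              # odds at x

for x in [0.0, 1.7, -2.5]:
    print(odds(x + 1) / odds(x), np.exp(beta_1))      # the two always match
```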

3.9.2 Determining the Worth of the Individual Regressors

In the following sections we discuss various statistics that have been suggested for assessing the worth of each individual regressor. The results can be generalized to the situation in which more than one regressor is added to an existing model.

3.9.3 Odds Ratio

Goodman and Kruskal (1954, 1959) present a great many measures of association for 2×2 tables, among them the odds ratio, and give their statistical properties. The odds ratio is a way of comparing whether the probability of a certain event is the same for two groups. The odds ratio takes values between zero and infinity. One is the neutral value and means that there is no difference between the groups compared; a value close to zero or infinity means a large difference. An odds ratio larger than one means that group one has a larger proportion than group two; if the opposite is true, the odds ratio will be smaller than one. In other words, an odds ratio of 1 implies that the event is equally likely in both groups. An odds ratio greater than one implies that the event is more likely in the first group. An odds ratio less than one implies that the event is less likely in the first group. For more details, let us consider the following typical 2×2 table.

Table # 3.02: 2×2 Contingency table

              Event     No event    Total
Group 1       a         b           a+b
Group 2       c         d           c+d
Total         a+c       b+d         a+b+c+d

In the above table, the odds for row 1 are a/b, and the odds for row 2 are c/d. The odds ratio (OR) is simply the ratio of the two odds, given by

\[ OR = \frac{a/b}{c/d} = \frac{ad}{bc}, \]

hence it is clear that if the odds are the same in each row, then the odds ratio is 1.
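As a quick Python sketch (the cell counts are hypothetical), both forms of the formula agree:

```python
# Minimal sketch: odds ratio from a 2x2 table with made-up counts.
a, b = 30, 70   # row 1: event / no event
c, d = 15, 85   # row 2: event / no event

print((a / b) / (c / d))   # OR as a ratio of the two odds
print((a * d) / (b * c))   # simplified form ad/bc; same value
```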

The odds themselves are also a ratio. To explain this we will take an example with probabilities. Let's say that the probability of success is 0.8; then the probability of failure is 1 - 0.8 = 0.2, and the odds of success are defined as odds(success) = 0.8/0.2 = 4, that is, the odds of success are 4 to 1. The odds of failure would then be odds(failure) = 0.2/0.8 = 0.25, that is, the odds of failure are 1 to 4. Next let's compute the odds ratio: OR = odds(success)/odds(failure) = 4/0.25 = 16; the interpretation is that the odds of success are 16 times greater than the odds of failure. Now if we had formed the odds ratio the other way around, with the odds of failure in the numerator, we would have gotten OR = odds(failure)/odds(success) = 0.25/4 = 0.0625. Interestingly enough, the interpretation of this odds ratio is essentially the same as the one above: here the odds of failure are one-sixteenth the odds of success. In fact, if we take the reciprocal of the first odds ratio we get 1/16 = 0.0625.
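The same arithmetic in a short Python sketch, reproducing the numbers above:

```python
# Minimal sketch: odds, odds ratio, and its reciprocal for p(success) = 0.8.
p_success = 0.8
odds_success = p_success / (1 - p_success)   # 4.0, i.e. 4 to 1
odds_failure = (1 - p_success) / p_success   # 0.25, i.e. 1 to 4

print(odds_success / odds_failure)           # 16.0
print(odds_failure / odds_success)           # 0.0625, the reciprocal 1/16
```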

3.9.4 Relative Risk

A more direct measure comparing the probabilities in two groups is the relative risk, which is also known as the risk ratio. The relative risk is simply the ratio of the two conditional probabilities. The risk ratio takes on values between zero and infinity. One is the neutral value and means that there is no difference between the groups on the variable concerned.

A risk ratio larger than one means that group one has a larger proportion than group two; if the opposite is true, the risk ratio will be smaller than one. For Table 3.02 the relative risk for the event can be defined as

\[ RR = \frac{a/(a+b)}{c/(c+d)}, \]

and similarly the relative risk for the other event is given by

\[ RR = \frac{b/(a+b)}{d/(c+d)}. \]

The relative risk or risk ratio gives us the percentage difference in classification between group one and group two. For example, suppose 8% of freezers produced without quality control have paint scratches, and this percentage is reduced to 5% if quality control is introduced. The risk ratio is RR = 8/5 = 1.6, interpreted as 60% more freezers being damaged if there is no quality control.
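A final Python sketch (counts chosen to match the 8% and 5% rates in the example above) computes the risk ratio from the layout of Table 3.02:

```python
# Minimal sketch: relative risk (risk ratio) from a 2x2 table.
a, b = 8, 92    # no quality control: scratched / not scratched (per 100 freezers)
c, d = 5, 95    # quality control:    scratched / not scratched (per 100 freezers)

rr = (a / (a + b)) / (c / (c + d))   # RR = [a/(a+b)] / [c/(c+d)]
print(rr)                            # 1.6 -> 60% more damage without quality control
```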