

3.2.2 Other Classical Logistic Regression Statistics

The likelihood function for the model in expression (3.13) is derived as follows:

$$
\ell(\beta) \;=\; \prod_{i=1}^{n} g(x_i,\beta)^{y_i}\,\big[1-g(x_i,\beta)\big]^{1-y_i}
\;=\; \prod_{i=1}^{n}\left(\frac{e^{x_i'\beta}}{1+e^{x_i'\beta}}\right)^{y_i}\left(\frac{1}{1+e^{x_i'\beta}}\right)^{1-y_i}. \qquad (3.14)
$$
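As an illustration, a minimal Python sketch of the log of the likelihood in (3.14) is given below; `X`, `y`, and `beta` are hypothetical NumPy arrays, and the computation works on the log scale for numerical stability.

```python
import numpy as np

def log_likelihood(beta, X, y):
    # Log of the likelihood in (3.14):
    #   sum_i y_i*log g(x_i, beta) + (1 - y_i)*log[1 - g(x_i, beta)],
    # with g(x_i, beta) = exp(x_i'beta) / (1 + exp(x_i'beta)).
    eta = X @ beta                       # linear predictor x_i'beta
    log1p_exp = np.logaddexp(0.0, eta)   # stable log(1 + e^eta)
    return np.sum(y * (eta - log1p_exp) - (1 - y) * log1p_exp)
```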

To continue with the Bayesian analysis, it is essential to specify a joint prior distribution over the parameter space. In practice this is difficult, because the relationship between the data and the parameters is complex. The easiest way to overcome this difficulty is to propose either an informative prior or a non-informative prior with small precision, which avoids any objection about the specification of subjective beliefs (O'Hagan et al., 1990). In this thesis, we assign non-informative normal priors with extremely small precisions to the parameters. The next step is to obtain the posterior distribution, since inference is based on the information it provides.

The posterior distribution is proportional to the prior times the likelihood,

$$
P(\beta\mid y, x) \;\propto\; P(\beta)\,\ell(\beta) \;=\; P(\beta)\prod_{i=1}^{n} g(x_i,\beta)^{y_i}\,\big[1-g(x_i,\beta)\big]^{1-y_i},
$$

which, for the logistic model, becomes

$$
P(\beta\mid y, x) \;\propto\; P(\beta)\prod_{i=1}^{n}\left(\frac{e^{x_i'\beta}}{1+e^{x_i'\beta}}\right)^{y_i}\left(\frac{1}{1+e^{x_i'\beta}}\right)^{1-y_i}. \qquad (3.15)
$$

Expression (3.15) is a complex function of the parameters, and numerical approaches are required to obtain the marginal posterior distribution of each model parameter. Approximations can be obtained via numerical integration (Naylor & Smith, 1982). Simulation-based methods have proliferated over the last decade or so, yielding two popular approaches: importance sampling (Zellner & Rossi, 1984) and Gibbs sampling (Dellaportas & Smith, 1993; Albert & Chib, 1995). Re-sampling techniques, applied to logistic regression for randomized response data, were alternatively proposed by Souza and Migon (2004).
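To make the simulation step concrete, the following Python sketch draws from the posterior in (3.15) with a simple random-walk Metropolis sampler; this is a stand-in illustration, not the Gibbs scheme of Dellaportas and Smith (1993). The vague prior standard deviation, the step size, and the function names are assumptions chosen for the example.

```python
import numpy as np

def log_posterior(beta, X, y, prior_sd=100.0):
    # Unnormalised log-posterior of (3.15): vague N(0, prior_sd^2) priors
    # on each coefficient (hypothetical choice) times the logistic likelihood.
    eta = X @ beta
    log_lik = np.sum(y * eta - np.logaddexp(0.0, eta))
    log_prior = -0.5 * np.sum((beta / prior_sd) ** 2)
    return log_lik + log_prior

def metropolis(X, y, n_iter=10_000, step=0.1, seed=0):
    # Random-walk Metropolis sampler targeting the posterior in (3.15).
    rng = np.random.default_rng(seed)
    beta = np.zeros(X.shape[1])
    current = log_posterior(beta, X, y)
    draws = np.empty((n_iter, X.shape[1]))
    for t in range(n_iter):
        proposal = beta + step * rng.standard_normal(beta.size)
        cand = log_posterior(proposal, X, y)
        if np.log(rng.uniform()) < cand - current:  # accept w.p. min(1, ratio)
            beta, current = proposal, cand
        draws[t] = beta
    return draws
```

The marginal posterior of each coefficient is then summarised from the corresponding column of `draws`, after discarding a burn-in portion.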

The other classical logistic regression statistics employed are discussed as follows:

• Deviance analysis of model fit

The principle here is that the observed values of the response must be compared with the estimated values from the models with and without the variable under consideration. The comparison of the observed values of the response variable with the predicted values is based on the log-likelihood function, represented as follows:

$$
\sum_{i=1}^{k}\Big\{\, p_i \log[\pi(\eta_i)] + (1-p_i)\log[1-\pi(\eta_i)] \,\Big\}. \qquad (3.16)
$$

The comparison of the observed to the predicted values using the log-likelihood is based on the statistic $D$, written as

$$
D = -2\log\left[\frac{\text{likelihood of the current model}}{\text{likelihood of the saturated model}}\right],
$$

which we express as

$$
D = -2\big[\log(L_r) - \log(L_t)\big], \qquad (3.17)
$$

where $\log(L_r)$ is the log-likelihood of the proposed (current) model, $\log(L_t)$ is the log-likelihood of the saturated model, and $D$ is the deviance. With large sample sizes, the deviance $D$ approximately follows a chi-square distribution with $(t-r)$ degrees of freedom, where $t$ and $r$ are the numbers of parameters in the saturated and proposed models, respectively. The deviance is used to assess the model's goodness of fit. Combining the expressions above, the deviance $D$ can be represented as

$$
D = -2\sum_{i=1}^{k}\left\{ p_i \log\!\left[\frac{\pi(\eta_i)}{p_i}\right] + (1-p_i)\log\!\left[\frac{1-\pi(\eta_i)}{1-p_i}\right] \right\}. \qquad (3.18)
$$
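As an illustration, a minimal Python sketch of the deviance in (3.18) follows; here `p` holds the observed proportions $p_i$ and `pi_hat` the fitted probabilities $\pi(\eta_i)$ over the $k$ covariate patterns, both hypothetical arrays.

```python
import numpy as np

def deviance(p, pi_hat):
    # Deviance (3.18): compares observed proportions p_i with fitted
    # probabilities pi(eta_i). Terms with p_i = 0 or p_i = 1 contribute 0.
    p = np.asarray(p, dtype=float)
    pi_hat = np.asarray(pi_hat, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        term1 = np.where(p > 0, p * np.log(pi_hat / p), 0.0)
        term0 = np.where(p < 1, (1 - p) * np.log((1 - pi_hat) / (1 - p)), 0.0)
    return -2.0 * np.sum(term1 + term0)
```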

• Akaike Information Criterion (AIC)

This statistic measures the relative quality of a statistical model for a given set of data. The AIC is expressed as

$$
\mathrm{AIC} = 2\delta - 2\ln(L),
$$

where $L$ is the likelihood of the model and $\delta$ is the number of parameters in the model under consideration. The AIC is used to select the best model among a set of candidates fitted to the same data; the model with the lowest AIC value should be preferred.
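A small Python sketch of the AIC computation follows, with a comparison of two hypothetical fits; the log-likelihood values are invented for illustration.

```python
def aic(log_lik, n_params):
    # AIC = 2*delta - 2*ln(L), with delta the number of model parameters
    # and log_lik the maximised log-likelihood ln(L).
    return 2 * n_params - 2 * log_lik

# Hypothetical fits: the lower AIC is preferred.
print(aic(log_lik=-120.5, n_params=3))  # 247.0
print(aic(log_lik=-119.8, n_params=5))  # 249.6
```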

• Wald test

This statistic is used to assess the significance of individual logistic regression coefficients.

$$
W = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)},
$$

where $\hat{\beta}_1$ is the estimate of the coefficient of the explanatory variable and $SE(\hat{\beta}_1)$ is the standard error of $\hat{\beta}_1$.
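A brief Python sketch of the Wald statistic, with a two-sided p-value from the standard normal reference distribution; the estimate and standard error in the example are hypothetical.

```python
from scipy import stats

def wald_test(beta_hat, se_beta):
    # Wald statistic W = beta_hat / SE(beta_hat) and its two-sided p-value
    # against the standard normal distribution.
    w = beta_hat / se_beta
    p_value = 2 * stats.norm.sf(abs(w))
    return w, p_value

# Hypothetical estimate and standard error:
w, p = wald_test(beta_hat=0.84, se_beta=0.31)
print(f"W = {w:.3f}, p = {p:.4f}")
```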

• Cox and Snell Pseudo $R^2$

The $R^2$ for logistic regression is estimated by the Cox and Snell pseudo-$R^2$, represented as

$$
R^2 = 1 - \left[\frac{\ell(0)}{\ell(\hat{\beta})}\right]^{2/n},
$$

where $\ell(0)$ is the likelihood of the initial (intercept-only) model and $\ell(\hat{\beta})$ is the likelihood of the current model.

The log-likelihood of the initial model is

$$
\log \ell(0) = n\left[\hat{\pi}_0 \log\!\left(\frac{\hat{\pi}_0}{1-\hat{\pi}_0}\right) + \log(1-\hat{\pi}_0)\right],
$$

where $\hat{\pi}_0 = \frac{1}{n}\sum_{i=1}^{k} n_i p_i$ and $n_i$ is the weight for the $i$-th case.
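For numerical stability, the pseudo-$R^2$ can be computed from log-likelihoods, since $[\ell(0)/\ell(\hat{\beta})]^{2/n} = \exp\{(2/n)[\log\ell(0) - \log\ell(\hat{\beta})]\}$; a minimal sketch, assuming `loglik_null` and `loglik_model` hold the maximised log-likelihoods of the initial and current models:

```python
import numpy as np

def cox_snell_r2(loglik_null, loglik_model, n):
    # Cox and Snell pseudo-R^2 = 1 - [l(0)/l(beta_hat)]^(2/n),
    # evaluated on the log scale to avoid under/overflow.
    return 1.0 - np.exp((2.0 / n) * (loglik_null - loglik_model))
```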

• Likelihood Ratio Test

This statistic tests the significance of all the variables included in the logistic regression model.

$$
-2\log\left(\frac{A_0}{A_1}\right) = -2\big[\log(A_0) - \log(A_1)\big],
$$

where $A_0$ is the maximum value of the likelihood function of the simple model and $A_1$ is the maximum value of the likelihood function of the full model. According to Hosmer and Lemeshow (2000a,b), a simple model has one variable dropped, while a full model contains all the parameters of interest.
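A minimal sketch of the likelihood ratio test with its chi-square reference distribution; it assumes nested models, with `df` equal to the number of parameters dropped from the full model.

```python
from scipy import stats

def likelihood_ratio_test(loglik_simple, loglik_full, df=1):
    # LR statistic -2[log A0 - log A1], referred to a chi-square(df)
    # distribution; df = number of parameters dropped from the full model.
    lr = -2.0 * (loglik_simple - loglik_full)
    p_value = stats.chi2.sf(lr, df)
    return lr, p_value
```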

• Deviance Information Criterion (DIC)

The ability to fit complex multilevel models using Markov chain Monte Carlo (MCMC) techniques creates a need for methods of comparing alternative models. Standard model comparison techniques such as the AIC and BIC require the number of parameters in each model to be specified. For multilevel models containing random effects, the number of parameters is not generally obvious, so an alternative comparison technique is required. The most widely used such alternative is the Deviance Information Criterion (DIC), suggested by Spiegelhalter et al. (2002). The DIC statistic is a generalization of the AIC and is based on the posterior mean of the deviance, providing a measure of both model complexity and fit. The deviance is defined as

$$
D(\theta) = -2\log f(y\mid\theta).
$$

Since the DIC accounts for model complexity, it incorporates a measure of the effective number of parameters in the model, defined by

$$
p_D = \bar{D}(\theta) - D(\breve{\theta}),
$$

where $\bar{D}(\theta)$ is the posterior expectation of the deviance, given by

$$
\bar{D}(\theta) = -2\,E\big[\log f(y\mid\theta)\mid y\big],
$$

and $D(\breve{\theta})$ is the deviance evaluated at some estimate $\breve{\theta}$ of $\theta$, such as the posterior mean. Therefore, we define the deviance information criterion (DIC) by

$$
\mathrm{DIC} = \bar{D}(\theta) + p_D = 2\bar{D}(\theta) - D(\breve{\theta}), \qquad (3.19)
$$

where $\bar{D}$ is the posterior mean of the deviance, which measures goodness of fit, and $p_D$ represents the effective number of parameters in the model. In the case of the Bayesian and bootstrapping models, low values of $\bar{D}$ imply a better fit, while small values of $p_D$ imply a parsimonious model. $p_D$ is higher for a more complex model, and the DIC appears to select the correct model. The best-fitting model is the one with the smallest DIC, as suggested by Spiegelhalter et al. (2002) and Lesaffre and Lawson (2012).
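Finally, a sketch of computing the DIC from MCMC output following (3.19); it assumes `deviance_draws` holds $D(\theta)$ evaluated at each posterior draw and `deviance_at_mean` the deviance at the chosen point estimate $\breve{\theta}$.

```python
import numpy as np

def dic(deviance_draws, deviance_at_mean):
    # DIC = D_bar + p_D = 2*D_bar - D(theta_breve), as in (3.19).
    d_bar = np.mean(deviance_draws)    # posterior mean deviance (fit)
    p_d = d_bar - deviance_at_mean     # effective number of parameters
    return d_bar + p_d, p_d
```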