
Chapter 3: DATA AND RESEARCH METHODOLOGY

3.2 Classical and Bayesian Logistic Regression Model

3.2.4 Bayesian Prior Distributions

This thesis makes use of a fully Bayesian approach to estimation, with prior distributions assigned to all the parameters. Bayesian statistics differs from classical statistics in that parameters are regarded as random variables rather than fixed quantities, and a prior distribution must be specified for them before inference can be made. The major challenge in Bayesian statistics is therefore the correct specification of the prior distribution, since an appropriate prior is key to sound Bayesian modeling.

Gelman (2002) indicated that the prior distribution is an important part of Bayesian inference, representing the information about an uncertain parameter $\theta$ that is combined with the likelihood of the new data to produce the posterior distribution, which is then used for future inference on $\theta$. Therefore, necessary precaution should be taken in selecting priors, because inappropriate choices may result in wrong inference (Institute et al., 2008).

In specifying priors, a number of points need to be considered. Key among them is the fact that priors can be tentative: because inference depends on the choice of prior, alternative priors are examined to explore how sensitive the main conclusions are to alterations in the prior. It is also important to allow prior beliefs to be updated by the data. There are different types of prior distributions, some of which are discussed below:

Non-informative and Informative prior distributions

As mentioned earlier, one of the most important aspects of Bayesian statistics is setting up the right prior to include in the model. It is therefore important at this point to explain the major differences between non-informative and informative prior distributions.

Non-informative (vague) priors are used when either little is known about the coefficient values or one wishes to ensure that prior information plays very little role in the model; the data are thus allowed to remain influential in the analysis. Because of this objectivity, the majority of researchers in statistics prefer non-informative priors to informative ones. The most common choice of non-informative prior is the flat prior, which assigns equal likelihood to all possible values of the parameters. An informative prior distribution, on the other hand, summarizes the evidence about the parameters of interest from many sources and often has considerable impact on the posterior distribution.

In addition, so that the data remain influential, this thesis utilizes non-informative priors that exert little influence on the posterior distribution. We assume a multivariate normal prior on $\beta$ with a large variance $(\sigma^2 = 1000)$ and mean $(\mu_k = 1)$, unless otherwise stated:

\[
\beta \sim N(b_0, \Sigma_0).
\]

The variance $(\sigma^2)$ needs to be transformed into a precision before it is introduced into the model; hence, we use $\tau = 1/\sigma^2 = 0.001$ as the transformed variance. In the case of the Bayesian multilevel regression model, each random effect uses a gamma distribution with $\alpha = 0.1$ and $\beta = 0.01$. This thesis utilizes a multivariate normal $(b_0, \Sigma_0)$ prior density for the parameter vector $\beta$. We also assume that the prior for the $i$-th component is normal $(b_i, s_i^2)$, with the priors for the components independent of each other. Therefore,

\[
\Sigma_0 =
\begin{pmatrix}
s_0^2 & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & s_k^2
\end{pmatrix}
\qquad \text{and} \qquad
b_0 =
\begin{pmatrix}
b_0 \\
\vdots \\
b_k
\end{pmatrix}.
\]
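As a concrete illustration, the prior pieces above can be assembled with NumPy. The dimensions below (an intercept and two slopes) and the prior means are placeholder values for illustration, not values taken from the thesis data:

```python
import numpy as np

# Illustrative assembly of the prior for a hypothetical three-coefficient model.
sigma2 = np.full(3, 1000.0)   # prior variances s_0^2, ..., s_k^2 (large, as in the text)
tau = 1.0 / sigma2            # precision parametrisation: tau = 1 / sigma^2

Sigma0 = np.diag(sigma2)      # diagonal prior covariance matrix Sigma_0
b0 = np.full(3, 1.0)          # prior mean vector b_0 (mu_k = 1, as in the text)

print(tau[0])                 # 0.001, the transformed variance quoted above
```

The diagonal covariance matrix encodes the independence assumption across components, and the precision $\tau = 0.001$ is simply the reciprocal of the variance $1000$.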



Hence, we can write the general formula for prior distribution as follows:

\[
p(\beta) \propto \exp\left\{ -\frac{1}{2}\,(\beta - b_0)'\,\Sigma_0^{-1}\,(\beta - b_0) \right\} \tag{3.20}
\]

Since a multivariate normal prior does not have to be made up of independent components, the posterior distribution will be multivariate normal $(b_1, \Sigma_1)$, where

\[
\Sigma_1^{-1} = \Sigma_0^{-1} + \Sigma_{ML}^{-1},
\]

and

\[
b_1 = \Sigma_1 \left( \Sigma_{ML}^{-1}\,\hat{\beta}_{ML} + \Sigma_0^{-1}\, b_0 \right),
\]

where $\Sigma_{ML}$ is the covariance matrix of the maximum likelihood estimate (MLE) vector, and its inverse is represented as

\[
\Sigma_{ML}^{-1} = X'\,\Sigma^{-1} X,
\]

while $\hat{\beta}_{ML}$ is the maximum likelihood vector and $\Sigma_0^{-1}$ is the prior precision matrix.
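The conjugate update above can be checked numerically. In the sketch below, the design matrix `X`, observation covariance `Sigma`, and ML estimate `beta_ml` are hypothetical values chosen for illustration, not the thesis data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ingredients (illustrative only).
X = rng.normal(size=(50, 3))          # design matrix
Sigma = np.eye(50)                    # observation covariance
beta_ml = np.array([0.5, -0.2, 1.3])  # assumed ML estimate of the coefficients

# Vague prior from the text: beta ~ N(b0, Sigma0) with variance 1000.
Sigma0 = np.diag([1000.0, 1000.0, 1000.0])
b0 = np.ones(3)

# Sigma_ML^{-1} = X' Sigma^{-1} X, then posterior precision and mean.
Sigma_ml_inv = X.T @ np.linalg.inv(Sigma) @ X
Sigma1_inv = np.linalg.inv(Sigma0) + Sigma_ml_inv
Sigma1 = np.linalg.inv(Sigma1_inv)
b1 = Sigma1 @ (Sigma_ml_inv @ beta_ml + np.linalg.inv(Sigma0) @ b0)

print(np.round(b1, 3))
```

Because the prior precision ($1/1000$) is tiny relative to the information in the data, the posterior mean `b1` is numerically indistinguishable from the ML estimate, which is exactly the behaviour a non-informative prior is meant to produce.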

The posterior distribution is the distribution of the parameters after the data have been observed. Bayesian estimates are obtained by sampling from this posterior distribution. Under the Bayesian approach, the posterior distribution can be written as

\[
p(\beta \mid y) \propto p(y \mid \beta)\, p(\beta),
\]

where $p(y \mid \beta)$ is the likelihood function, expressed as:

\[
p(y \mid \beta) = \exp\left\{ -\frac{1}{2}\,(\beta - b_{LS})'\,\Sigma_{LS}^{-1}\,(\beta - b_{LS}) \right\}.
\]

Hence, the posterior distribution is represented as:

\[
p(\beta \mid y) \propto \exp\left\{ -\frac{1}{2}\,(\beta - b_1)'\,\Sigma_1^{-1}\,(\beta - b_1) \right\}. \tag{3.21}
\]

Improper priors

When its integral over the parameter space does not converge, the probability distribution specified for $\theta$ is said to be improper. A common example is the flat prior

\[
p(\theta) \propto 1.
\]

As argued by Lancaster (2004), an improper prior distribution can lead to posterior impropriety. To establish whether a posterior distribution is proper, one checks that the normalizing constant $\int p(y \mid \theta)\, p(\theta)\, d\theta$ is finite for all $y$.

When an improper prior distribution leads to an improper posterior distribution, inferences based on that posterior are not valid (Institute et al., 2008).
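The propriety check can be illustrated numerically with a stdlib-only sketch. The single-observation normal likelihood below is a hypothetical example, not the thesis model:

```python
import math

def normal_pdf(x, mean, sd=1.0):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def integrate(f, a, b, n=20000):
    # plain trapezoidal rule; accurate enough for this illustration
    h = (b - a) / n
    return h * (0.5 * f(a) + 0.5 * f(b) + sum(f(a + i * h) for i in range(1, n)))

# One observation y ~ N(theta, 1) under the flat improper prior p(theta) ∝ 1.
# The normalising constant int p(y|theta) p(theta) dtheta is finite (it equals 1),
# so the posterior is proper even though the prior itself is not.
y = 2.0
const = integrate(lambda t: normal_pdf(y, t), -40.0, 40.0)
print(const)

# The flat prior has no finite normalising constant: its mass grows without
# bound as the integration limits widen.
print(integrate(lambda t: 1.0, -10.0, 10.0), integrate(lambda t: 1.0, -100.0, 100.0))
```

Here the improper prior still yields a proper posterior; the danger arises when the likelihood is too weak to make the product integrable.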

Prior for fixed effects

Fixed-effect parameters have no constraints and as such can assume any value. A prior distribution for such parameters therefore needs to be defined over the whole real line; the conjugate prior distribution for such parameters is the normal distribution.

Normal prior with large variance

As the variance of a normal distribution increases, the distribution becomes locally flat around its mean. As mentioned earlier, fixed effects can assume any value; however, a close examination of the data can narrow the range of plausible values, and a suitable normal prior can then be found. Generally, the normal prior $p(\theta) \propto N(0, 10^4)$ will be an acceptable approximation to a uniform distribution. If the fixed effects are very large, however, a corresponding increase in the prior variance may be necessary.
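To see why a large-variance normal behaves like a uniform prior locally, one can compare its density at two coefficient values a few units apart (a minimal sketch):

```python
import math

def normal_pdf(x, mean=0.0, var=1.0):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

# With variance 10^4, the N(0, 10^4) prior is almost constant over the range
# where regression coefficients typically fall, so it approximates a flat prior.
ratio = normal_pdf(3.0, 0.0, 1e4) / normal_pdf(0.0, 0.0, 1e4)
print(ratio)   # just under 1: the density hardly changes between theta = 0 and theta = 3
```

The density ratio is $\exp(-9/(2 \cdot 10^4)) \approx 0.9996$, so over any realistic coefficient range the prior contributes essentially no information.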

3.2.5 Bayesian Posterior Distribution via Markov Chain Monte Carlo