
3.2.6 Markov Chain Monte Carlo Simulation

Markov chain Monte Carlo (MCMC) methods simulate random numbers from the posterior distribution. In reality, a posterior distribution is often of high dimension and analytically intractable, and when the posterior distribution is complex, the MCMC approach offers a better alternative for summarizing it. The quality of the MCMC sample depends on how quickly the sampling procedure explores the posterior distribution.

Two difficulties can make an MCMC sample a poor representation of the desired distribution: the influence of the initial values in the early part of the chain, and within-chain serial correlation. According to Gelman et al. (2014b), these difficulties are dealt with by simulating multiple chains with different starting values dispersed throughout the parameter space, monitoring convergence, and discarding the early iterations of the simulation. Autocorrelation can be reduced by thinning, that is, keeping only every k-th iteration and discarding the rest; a short sketch of these two steps follows this paragraph. There are several ways to assess or monitor whether parallel chains have converged, some of which are explained by Gelman et al. (2003). The Gelman-Rubin statistic is used to assess the convergence of the chains separately for each parameter under consideration (Gelman et al., 2014a,b; Brooks & Roberts, 1998). In a situation where the sampling is to continue indefinitely, convergence is monitored by estimating the factor by which the scale of the sampled distribution might shrink if sampling were continued.
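As a minimal illustration of these two steps, the Python sketch below discards a burn-in period and thins a hypothetical array of raw draws. The array `raw_chains`, the burn-in length of 2,000, and the thinning interval of 10 are illustrative assumptions, not values from the thesis.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical raw MCMC output: 4 chains of 10,000 iterations each,
# stored as an array of shape (chains, iterations). A scaled random
# walk stands in for real sampler output here.
raw_chains = rng.normal(size=(4, 10_000)).cumsum(axis=1) / 100.0

burn_in = 2_000  # discard early iterations influenced by the starting values
thin = 10        # keep every 10th draw to reduce serial correlation

# Burn-in removal and thinning in a single slice.
kept = raw_chains[:, burn_in::thin]
print(kept.shape)  # (4, 800)
```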

Suppose we have $N$ samples of $w$ from each of $C$ chains, denoted $w_{nc}$. We define the within-chain mean $\bar{w}_c$ and the overall mean $\bar{w}$ as:

\[
\bar{w}_c = \frac{1}{N} \sum_{n=1}^{N} w_{nc}, \qquad \bar{w} = \frac{1}{C} \sum_{c=1}^{C} \bar{w}_c.
\]

We now define the between-chain variance $B$ and the within-chain variance $T$ as follows:

\[
B = \frac{N}{C-1} \sum_{c=1}^{C} \left( \bar{w}_c - \bar{w} \right)^2, \tag{3.25}
\]

\[
T = \frac{1}{C} \sum_{c=1}^{C} \frac{1}{N-1} \sum_{n=1}^{N} \left( w_{nc} - \bar{w}_c \right)^2. \tag{3.26}
\]

After that, we construct two estimates of the variance of $w$. The first estimate is the within-chain variance $T$ itself, which is expected to underestimate $\mathrm{var}(w)$ if the chains have not ranged over the full posterior. The second estimate,

\[
\hat{v} = \frac{N-1}{N} T + \frac{1}{N} B,
\]

is an unbiased estimate of $\mathrm{var}(w)$ when equilibrium (stationary) conditions are reached, but is an overestimate when the starting points are overdispersed. The test statistic for the Gelman-Rubin diagnostic can then be estimated as follows:

\[
\hat{R} = \sqrt{\frac{N-1}{N} + \frac{B}{N T}}.
\]

$\hat{R}$ is called the estimated potential scale reduction factor (PSRF) and measures the degree to which the posterior variance would decrease if we were to continue sampling by increasing $N$. If the potential scale reduction factor is large, say above 2, then further simulation is required, whereas $\hat{R} \to 1$, say $\hat{R} < 1.1$, indicates convergence. This simply means the variance between the chains is similar to the variance within each chain. A corrected version of the $\hat{R}$ statistic was defined by Brooks and Roberts (Brooks & Roberts, 1998) and is expressed as $\hat{R}_c = \frac{a+3}{a+1} \hat{R}$, where $a$ is the estimate of the degrees of freedom for the pooled posterior variance estimate.
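To make the calculation concrete, here is a minimal Python sketch of the PSRF following equations (3.25)-(3.26); the function name `gelman_rubin` and the synthetic chains are illustrative assumptions, not part of the thesis code.

```python
import numpy as np

def gelman_rubin(chains: np.ndarray) -> float:
    """Potential scale reduction factor for one scalar parameter.

    `chains` has shape (C, N): C parallel chains of N draws each,
    in the notation of equations (3.25)-(3.26).
    """
    C, N = chains.shape
    chain_means = chains.mean(axis=1)   # within-chain means w_bar_c
    grand_mean = chain_means.mean()     # overall mean w_bar

    # Between-chain variance B, eq. (3.25).
    B = N / (C - 1) * np.sum((chain_means - grand_mean) ** 2)

    # Within-chain variance T, eq. (3.26): average of the C sample variances.
    T = chains.var(axis=1, ddof=1).mean()

    # Pooled variance estimate v_hat, and the PSRF R_hat = sqrt(v_hat / T).
    v_hat = (N - 1) / N * T + B / N
    return float(np.sqrt(v_hat / T))

# Illustrative check: four well-mixed chains should give R_hat close to 1.
rng = np.random.default_rng(0)
chains = rng.normal(size=(4, 5_000))
print(gelman_rubin(chains))
```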

Raftery and Lewis's diagnostic (Brooks & Roberts, 1998) determines the minimum number of iterations needed to achieve minimal autocorrelation, together with the required sample size and the length of the burn-in process for a single chain. The Geweke diagnostic compares values in the first part of the Markov chain to those in the latter part in order to detect failure of convergence (Ntzoufras, 2011). Geweke's statistic has an asymptotically standard normal distribution and is expressed as

\[
Z_k = \frac{\bar{\theta}_1 - \bar{\theta}_2}{\sqrt{\dfrac{1}{K_1} S_1(0) + \dfrac{1}{K_2} S_2(0)}} \to N(0,1), \qquad n \to \infty, \tag{3.27}
\]

where $S_1(0)$ and $S_2(0)$ are classical estimates of the respective variances of the two segments, and $K_1$ and $K_2$ are the numbers of iterations in each segment.

An inability to reach convergence in MCMC sampling may reflect problems of model identifiability due to overfitting. Running multiple chains often helps to overcome such problems by diagnosing poor identifiability. This arises most often when identifiability constraints are missing from a model, as in discrete mixture models that are subject to label switching during MCMC updating (Frühwirth-Schnatter, 2001). One chain may adopt a different labelling from the others, so that computing the Gelman-Rubin (G-R) statistic for some parameters is not sensible. A choice of diffuse priors tends to increase the chance of a poorly identified model, especially in complex hierarchical models or with small-sample datasets, as revealed by Gelfand et al. (1995).
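A simplified Python sketch of the Geweke comparison in equation (3.27) follows. The segment fractions (first 10%, last 50%) are Geweke's usual defaults, and, as a simplification, plain sample variances stand in for $S_1(0)$ and $S_2(0)$; spectral-density estimates at frequency zero are the more careful choice for autocorrelated draws.

```python
import numpy as np

def geweke_z(chain: np.ndarray, first: float = 0.1, last: float = 0.5) -> float:
    """Geweke Z-score for a single chain, in the spirit of eq. (3.27).

    Compares the mean of the first `first` fraction of the chain with
    the mean of the last `last` fraction. Segment variances are simple
    sample variances here (an assumption of this sketch).
    """
    n = len(chain)
    seg1 = chain[: int(first * n)]          # early segment, K1 draws
    seg2 = chain[int((1.0 - last) * n):]    # late segment, K2 draws
    k1, k2 = len(seg1), len(seg2)
    s1, s2 = seg1.var(ddof=1), seg2.var(ddof=1)
    return float((seg1.mean() - seg2.mean()) / np.sqrt(s1 / k1 + s2 / k2))

# Illustrative usage: |Z| > 1.96 suggests the two ends of the chain disagree.
rng = np.random.default_rng(1)
print(geweke_z(rng.normal(size=10_000)))
```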

Correlation between parameters within the parameter set $\theta = (\theta_1, \ldots, \theta_k)$ increases the dependence between successive iterations, while informative priors may help identification and convergence. As reported by Zuur et al. (2002), re-parameterization measures aimed at reducing correlation, such as centering predictor variables in regression, usually improve convergence; a sketch of the centering step is given below.
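As a small illustration of that centering step, assuming a generic, hypothetical design matrix `X` for a regression:

```python
import numpy as np

# Hypothetical design matrix: 200 observations, 3 predictors with a
# nonzero mean. Centering each column reduces the posterior correlation
# between the intercept and the slopes, which typically helps mixing.
rng = np.random.default_rng(2)
X = rng.normal(loc=50.0, scale=10.0, size=(200, 3))

X_centered = X - X.mean(axis=0)  # each column now has mean zero
```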

According to Heidelberger and Welch (1983), the Heidelberger-Welch (HW) diagnostic is an automated test for checking the stationarity of the chain and for evaluating whether the length of the chain is sufficient to ensure the desired accuracy of the posterior means of the parameters in a Bayesian analysis. The test is based on the Cramér-von Mises statistic, which is used to accept or reject the null hypothesis that the chain comes from a stationary distribution. The test first checks for stationarity, and thereafter determines the accuracy of the estimates of the model parameters.

The advantage of the Bayesian method is that when the posterior distribution is simulated, the uncertainty of the parameter estimates is taken into account; that is, the uncertainty in the parameter estimates for the fixed part is carried into the estimates for the random part. Moreover, simulating a large sample from the posterior distribution is useful because it provides not only point estimates of the unknown parameters but also interval estimates that do not rely on an assumption of normality for the posterior distribution. Hence, credible intervals remain accurate for small-sample datasets (Tanner & Wong, 1987); a short sketch of reading such an interval off the posterior draws is given below. The number of MCMC iterations required is very large when the sample size is very small, since MCMC techniques do not perform very well with small datasets, and this may lead to high autocorrelation among the parameters, particularly when estimating the mean. For this reason, up to 1,500,000 iterations were run for all our Bayesian models, with a thinning lag of 60 to reduce autocorrelation, although the remaining autocorrelation may have called for even more iterations.
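As a brief sketch of how a posterior mean and an equal-tailed credible interval are read directly off the simulated draws, assuming hypothetical posterior draws for a single coefficient:

```python
import numpy as np

# Hypothetical posterior draws for one regression coefficient, after
# burn-in removal and thinning.
rng = np.random.default_rng(3)
draws = rng.normal(loc=0.8, scale=0.3, size=25_000)

post_mean = draws.mean()
# Equal-tailed 95% credible interval straight from the sample quantiles;
# no normality assumption on the posterior is needed.
lower, upper = np.quantile(draws, [0.025, 0.975])
print(post_mean, (lower, upper))
```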

Many other diagnostic tools for assessing convergence have been proposed by Ntzoufras (2011) and compared by Brooks and Roberts (1998). Convergence diagnostics are also implemented in the CODA and BOA packages for R; four different diagnostics are provided by CODA, as indicated by (Little & Wang, 1996; Erkanli et al., 1999; Dunson & Colombo, 2003; Heidelberger & Welch, 1983). In this thesis, visual inspection of diagnostic plots and the Gelman-Rubin diagnostic ($\hat{R} \to 1$) were the main approaches to assessing convergence.