cannot be estimated by using the normal likelihood, the EM algorithm tends to underestimate standard errors, which are critical to hypothesis testing (Allison, 2002).

The MCMC procedure has two desirable features that the ML model-based procedure lacks: efficiency and flexibility (McKnight et al., 2007). The MCMC process is efficient because it allows us to estimate parameters even when the underlying distributions are unknown or non-normally distributed. The advantage of the MCMC procedure (Bayesian data augmentation) is that it finds solutions even for the most complex missing data problems, especially when the data distribution is unknown or does not follow the multivariate normal distribution.

The data augmentation procedure follows almost the same logic as EM: both have two iterative steps. Unlike the EM estimation procedure, however, the Bayesian data augmentation procedure is not tied to the expectation of an assumed distribution (e.g., the multivariate normal). In the iterative process, EM is restricted by the expectation derived from a distribution in order to estimate parameters, while the MCMC methods are not limited by distributional assumptions.

Given data with missing observations, our main purpose is to obtain unbiased parameter estimates, but it is very difficult to obtain such estimates by ignoring the unobserved data and using the observed data only. The augmented data consist of the set of missing values $x_{mv}$ and observed values $x_{ov}$, which form $x = (x_{ov}, x_{mv})$. The data augmentation procedure manages the missing data problem by augmenting, or completing, the observed data $x_{ov}$ with simulated values of the missing data $x_{mv}$. The main goal is to compute the posterior distribution $P(\theta \mid x_{ov})$, but unfortunately this is very difficult because of the missing values in $x$ (M. A. Tanner & Wong, 2010). If both $x_{ov}$ and $x_{mv}$ are given, we can calculate or sample from the augmented posterior distribution $P(\theta \mid x_{ov}, x_{mv})$. To obtain the posterior distribution, we first draw multiple imputations of $x_{mv}$ from the predictive distribution and then average $P(\theta \mid x_{ov}, x_{mv})$ over these imputations. Since the predictive distribution $P(x_{mv} \mid x_{ov})$ itself depends on the posterior distribution $P(\theta \mid x_{ov})$, determining $P(\theta \mid x_{ov})$ requires an iterative algorithm.

The motivation for the procedure is that the distributions used in these two steps, the augmented posterior $P(\theta \mid x_{ov}, x_{mv})$ and the conditional predictive distribution, are much easier to draw from than the posterior distribution $P(\theta \mid x_{ov})$ or the joint posterior distribution $P(\theta, x_{mv} \mid x_{ov})$. The MCMC procedure augments the data in two iterative steps, the imputation or I-step and the posterior or P-step (a schematic sketch of this loop follows below). The data augmentation algorithm provides a way of improving the quality of data and inferences, and refines the EM algorithm especially for small samples. The two iterative steps are repeated until the algorithm converges.
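As a sketch only, the alternation of the two steps can be written as a generic loop. The function names and signatures below are illustrative placeholders, not part of the original algorithm description; the concrete samplers depend on the model at hand.

```python
from typing import Callable, List

def data_augmentation(
    draw_missing: Callable[[float], object],  # I-step: x_mv ~ P(x_mv | theta, x_ov)
    draw_theta: Callable[[object], float],    # P-step: theta ~ P(theta | x_ov, x_mv)
    theta0: float,
    n_iter: int,
) -> List[float]:
    """Schematic data augmentation loop: alternate I- and P-steps."""
    theta, draws = theta0, []
    for _ in range(n_iter):
        x_mv = draw_missing(theta)  # impute the missing values
        theta = draw_theta(x_mv)    # update the parameter draw
        draws.append(theta)
    return draws                    # draws approximate P(theta | x_ov)
```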

The foundation of the data augmentation algorithm

The original data augmentation algorithm is motivated by the following two basic integral identities (Tan et al., 2009):

1. The posterior identity
\[
P(\theta \mid x_{ov}) = \int_{x_{mv}} P(\theta \mid x_{ov}, x_{mv}) \, P(x_{mv} \mid x_{ov}) \, dx_{mv} \tag{3.6}
\]

2. The predictive identity
\[
P(x_{mv} \mid x_{ov}) = \int_{\theta} P(x_{mv} \mid x_{ov}, \phi) \, P(\phi \mid x_{ov}) \, d\phi \tag{3.7}
\]

where $P(\theta \mid x_{ov})$ represents the posterior distribution of $\theta$ given the observed data $x_{ov}$, $P(x_{mv} \mid x_{ov})$ represents the predictive distribution of the missing values $x_{mv}$ given the observed data $x_{ov}$, and $P(\theta \mid x_{mv}, x_{ov})$ represents the conditional distribution of the parameter $\theta$ given the augmented data $x = (x_{mv}, x_{ov})$.
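To make identity (3.6) concrete, the following sketch checks it numerically for a toy conjugate model. All model choices here (standard normal prior, unit-variance likelihood, the particular observed values, and the number of draws $k$) are assumptions made purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Toy conjugate model (illustrative assumption): theta ~ N(0, 1),
# x_i | theta ~ N(theta, 1), three observed and two missing values.
x_ov = np.array([0.8, 1.5, 0.3])
n_mv = 2

def posterior(x):
    """Exact posterior of theta given data x (normal-normal conjugacy)."""
    var = 1.0 / (1.0 + len(x))
    return stats.norm(var * x.sum(), np.sqrt(var))

grid = np.linspace(-2.0, 4.0, 200)
lhs = posterior(x_ov).pdf(grid)            # P(theta | x_ov) directly

# Right-hand side of (3.6): draw x_mv ~ P(x_mv | x_ov) by first drawing
# theta from the posterior, then average P(theta | x_ov, x_mv) over draws.
k = 5000
post = posterior(x_ov)
thetas = rng.normal(post.mean(), post.std(), k)
x_mv = rng.normal(thetas[:, None], 1.0, size=(k, n_mv))
rhs = np.mean(
    [posterior(np.concatenate([x_ov, xm])).pdf(grid) for xm in x_mv],
    axis=0,
)
print(np.max(np.abs(lhs - rhs)))           # small Monte Carlo error
```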

Substituting (3.7) into (3.6) and interchanging the order of integration shows that the posterior distribution $P(\theta \mid x_{ov})$ satisfies an integral equation of the form
\[
h(\theta) = \int A(\theta, \phi) \, h(\phi) \, d\phi, \tag{3.8}
\]
namely
\[
P(\theta \mid x_{ov}) = \int A(\theta, \phi) \, P(\phi \mid x_{ov}) \, d\phi, \tag{3.9}
\]

where the kernel function is given by
\[
A(\theta, \phi) = \int P(\theta \mid x_{mv}, x_{ov}) \, P(x_{mv} \mid \phi, x_{ov}) \, dx_{mv}. \tag{3.10}
\]
Let $T$ be the integral transformation that maps any integrable function $g$ into another integrable function $Tg$ by
\[
Tg(\theta) = \int A(\theta, \phi) \, g(\phi) \, d\phi. \tag{3.11}
\]

The integral equation (3.9) can be solved by the method of successive substitution, which suggests a way of determining $P(\theta \mid x_{ov})$. In functional-analytic terms, the fixed-point iteration becomes

\[
P_{i+1}(\theta \mid x_{ov}) = \int A(\theta, \phi) \, P_i(\phi \mid x_{ov}) \, d\phi, \quad i \in \mathbb{N}. \tag{3.12}
\]
Applying the method of successive substitution to equation (3.12) yields an approximation to the solution, and the sequence of functions $P_i$ converges to the posterior distribution $P(\theta \mid x_{ov})$ under mild conditions (a numerical illustration follows after the conditions below).

Tanner & Wong (1987) give sufficient conditions for the convergence of $P_i$ to $P$ in the $l_1$-norm, that is, for
\[
\int \left| P_i(\theta \mid x_{ov}) - P(\theta \mid x_{ov}) \right| d\theta \to 0 \tag{3.13}
\]
as $i \to \infty$:

1. the kernel function $A(\theta, \phi)$ must be uniformly bounded and equicontinuous in the parameter $\theta$;

2. the starting value can be any initial approximation $P_0$ that satisfies $\sup_{\theta} P_0(\theta \mid x_{ov}) / P(\theta \mid x_{ov}) < \infty$.
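The following sketch illustrates the fixed-point iteration (3.12) by discretizing $\theta$ and a single missing value on grids, building the kernel (3.10) as a matrix, and iterating from a flat starting density. The conjugate-normal model and all numerical choices (grids, data values, iteration count) are assumptions for illustration only.

```python
import numpy as np
from scipy import stats

# Toy conjugate model (illustrative assumption): theta ~ N(0, 1),
# x_i | theta ~ N(theta, 1), three observed values, one missing value.
x_ov = np.array([0.8, 1.5, 0.3])
theta = np.linspace(-3.0, 5.0, 161)    # grid for theta (and phi)
x = np.linspace(-6.0, 8.0, 281)        # grid for the missing value
dth, dx = theta[1] - theta[0], x[1] - x[0]

# P(theta | x_ov, x): conjugate posterior based on 4 data points.
prec = 1.0 + len(x_ov) + 1.0
cond_post = stats.norm.pdf(
    theta[:, None], (x_ov.sum() + x[None, :]) / prec, np.sqrt(1.0 / prec)
)                                      # shape (len(theta), len(x))
# P(x | phi, x_ov) = N(x; phi, 1): x is independent of x_ov given phi.
pred = stats.norm.pdf(x[:, None], theta[None, :], 1.0)  # (len(x), len(phi))

# Kernel (3.10): A(theta, phi) = int P(theta | x_ov, x) P(x | phi, x_ov) dx.
A = cond_post @ pred * dx              # shape (len(theta), len(phi))

# Successive substitution (3.12): start flat and iterate P_{i+1} = T P_i.
p = np.full_like(theta, 1.0 / (theta[-1] - theta[0]))
for _ in range(50):
    p = A @ p * dth

# Compare with the exact posterior P(theta | x_ov).
v = 1.0 / (1.0 + len(x_ov))
exact = stats.norm.pdf(theta, v * x_ov.sum(), np.sqrt(v))
print(np.max(np.abs(p - exact)))       # ~0 up to grid/truncation error
```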

The intractable integrals in (3.10), (3.11), and (3.12) are not easy to compute. There are several ways of approximating such complex integrals: numerical integration, analytical approximation, and Monte Carlo methods. Monte Carlo methods are always applicable and perform very well (Tanner & Wong, 1987). In their seminal paper, Tanner and Wong (1987) used the Monte Carlo method to compute the integral in (3.9).

3.4.2 General steps of the data augmentation algorithm

The iterative process of the data augmentation algorithm starts with the imputation (I) step and proceeds to the posterior (P) step, forming a Markov chain. Because the posterior distribution $P(\theta \mid x_{ov})$ may be difficult to calculate directly, given the updated parameter $\theta^{(t)}$ at iteration $t$, the data augmentation algorithm simulates values using the following two steps:

Imputation (I) step:
Simulate $k$ independent samples of the missing data, $x_{mv}^{1}, x_{mv}^{2}, \ldots, x_{mv}^{k}$, from the current $i$th approximation to the predictive distribution $P(x_{mv} \mid x_{ov})$, which is obtained from $P_i(\theta \mid x_{ov})$.

Posterior (P) step:
Update the current $i$th approximation to $P(\theta \mid x_{ov})$ as the average of $P(\theta \mid x_{ov}, x_{mv})$ over the missing data imputed in the I-step:
\[
P_{i+1}(\theta \mid x_{ov}) = \frac{1}{k} \sum_{j=1}^{k} P(\theta \mid x_{ov}, x_{mv}^{j}). \tag{3.14}
\]

Because the missing data $x_{mv}^{1}, x_{mv}^{2}, \ldots, x_{mv}^{k}$ are generated multiple times, this is often called multiple imputation (D. B. Rubin, 1987b). This iterative procedure can be shown to converge to a draw from the joint distribution $P(x_{mv}, \theta \mid x_{ov})$ as the number of iterations tends to infinity.

The value of $k$ need not be very large; in fact, with $k = 1$ the DA algorithm reduces to a special case of the Gibbs sampler.
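The two steps above can be put together in a short runnable sketch for the same toy conjugate-normal model used earlier; the data, the value of $k$, and the chain length are illustrative assumptions. Sampling $\theta$ from the equal-weight mixture (3.14) is done by picking one of the $k$ imputations uniformly at random and drawing from the corresponding conditional posterior.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy conjugate model (illustrative assumption): theta ~ N(0, 1),
# x_i | theta ~ N(theta, 1), three observed and two missing values.
x_ov = np.array([0.8, 1.5, 0.3])
n_mv, k = 2, 10                       # missing values; imputations per I-step
n_iter, burn_in = 3000, 500

theta, draws = 0.0, []
for i in range(n_iter):
    # I-step: impute k copies of the missing data from P(x_mv | theta, x_ov).
    x_mv = rng.normal(theta, 1.0, size=(k, n_mv))
    # P-step: sample theta from the mixture (3.14) by picking one
    # imputation uniformly, then drawing from P(theta | x_ov, x_mv^j).
    data = np.concatenate([x_ov, x_mv[rng.integers(k)]])
    var = 1.0 / (1.0 + len(data))     # conjugate posterior variance
    theta = rng.normal(var * data.sum(), np.sqrt(var))
    if i >= burn_in:
        draws.append(theta)

# The retained draws approximate P(theta | x_ov); for this model the
# exact posterior mean is x_ov.sum() / (1 + len(x_ov)) = 0.65.
print(np.mean(draws))
```

With $k = 1$ the loop above is exactly a two-block Gibbs sampler alternating between the missing data and the parameter, which is why the DA algorithm is often described as a special case of Gibbs sampling.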