2.1.3 Review of existing advanced missing data techniques
Model-based approach
Data analysts and researchers most often use a model-based approach when estimating the unknown parameters $\theta$. In the model-based approach, the likelihood function is used to draw inferences about the unknown parameters $\theta$. Let the vector $\mathbf{x}$ denote the complete data (with no missing values); then, assuming a model, the likelihood function for the unknown parameters $\theta$ given $\mathbf{x}$ is specified as
\[
L(\theta \mid \mathbf{x}) \propto P(\mathbf{x} \mid \theta) \tag{2.16}
\]
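To make (2.16) concrete, here is a minimal Python sketch (not from the original text; the univariate normal model, the simulated data and the function names are illustrative assumptions) that evaluates the complete-data log-likelihood and maximizes it numerically over $\theta = (\mu, \log\sigma)$.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative complete-data setting: x ~ Normal(mu, sigma^2), theta = (mu, log sigma).
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=200)  # complete data, no missing values

def neg_log_likelihood(theta, x):
    """Negative log of L(theta | x), proportional to P(x | theta) for a normal model."""
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)  # keep sigma positive via the log parameterization
    return -np.sum(-0.5 * np.log(2 * np.pi) - log_sigma
                   - 0.5 * ((x - mu) / sigma) ** 2)

# Maximize the likelihood (minimize its negative) over theta.
fit = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]), args=(x,))
mu_hat, sigma_hat = fit.x[0], np.exp(fit.x[1])
print(mu_hat, sigma_hat)  # close to the sample mean and standard deviation
```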
However, if there is missingness in the original data $\mathbf{x}$, the missing data should be incorporated into the model so that precise and unbiased estimates can be obtained. To model such data, assume that the missing data mechanism is MAR (denoted by $P(R \mid \phi, \mathbf{x})$). MAR allows a model to be formed for the full likelihood function with the joint parameters $\theta$ and $\phi$ given the observed data $\mathbf{x}_{ov}$ and the missing data indicator $R$ (D. B. Rubin, 1996),
\[
L(\theta, \phi \mid \mathbf{x}_{ov}, R) \propto \int P(\mathbf{x}_{ov}, \mathbf{x}_{mv} \mid \theta)\, P(R \mid \mathbf{x}_{ov}, \mathbf{x}_{mv}, \phi)\, d\mathbf{x}_{mv} \tag{2.17}
\]
by assuming that the parameters $\theta$ and $\phi$ are distinct. To move to the Bayesian framework, prior distributions are placed on the distinct parameters $\theta$ and $\phi$ in order to compute the posterior distribution,
\[
L(\theta, \phi \mid \mathbf{x}_{ov}, R) \propto P(\theta, \phi) \int P(\mathbf{x}_{ov}, \mathbf{x}_{mv} \mid \theta)\, P(R \mid \mathbf{x}_{ov}, \mathbf{x}_{mv}, \phi)\, d\mathbf{x}_{mv} \tag{2.18}
\]
When the prior distributions on $\theta$ and $\phi$ are independent and the missing data mechanism is MAR, the mechanism is ignorable for inferences about the unknown parameters $\theta$ (D. B. Rubin, 1996): under MAR, $P(R \mid \mathbf{x}_{ov}, \mathbf{x}_{mv}, \phi) = P(R \mid \mathbf{x}_{ov}, \phi)$ does not depend on $\mathbf{x}_{mv}$, so it factors out of the integral in (2.18). According to D. B. Rubin (1996), the posterior for $\theta$ can then be expressed as
\[
P(\theta \mid \mathbf{x}_{ov}) \propto P(\theta)\, P(\mathbf{x}_{ov} \mid \theta) \tag{2.19}
\]
The posterior distribution $P(\theta \mid \mathbf{x}_{ov})$ may not be easy to compute analytically. Instead, one can simulate draws of the unknown $\theta$ from the posterior distribution $P(\theta \mid \mathbf{x}_{ov})$ and use them to make posterior inferences about $\theta$. However, it is difficult to simulate directly from this posterior because of the missing data in $\mathbf{x}$. This motivates the Bayesian data augmentation method, which makes such posterior inferences possible.
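As an illustration of this idea, the sketch below is a toy data augmentation sampler under assumed choices that are not from the thesis: the model $x_i \sim N(\theta, 1)$, a $N(0, 100)$ prior on $\theta$, and MCAR missingness. It alternates an imputation step, drawing $\mathbf{x}_{mv}$ from its predictive distribution given the current $\theta$, with a posterior step, drawing $\theta$ from its complete-data posterior; marginally, the $\theta$ draws then approximate $P(\theta \mid \mathbf{x}_{ov})$.

```python
import numpy as np

# Toy data augmentation sampler for x_i ~ N(theta, 1) with prior theta ~ N(0, 100);
# some entries of x are missing completely at random.
rng = np.random.default_rng(1)
x = rng.normal(2.0, 1.0, size=50)
missing = rng.random(50) < 0.2           # MCAR missingness indicator R
x_ov = x[~missing]                       # observed part of x
n, n_ov = len(x), len(x_ov)

prior_mean, prior_var = 0.0, 100.0
theta = 0.0
draws = []
for _ in range(5000):
    # I-step: impute x_mv from its predictive distribution given the current theta.
    x_mv = rng.normal(theta, 1.0, size=n - n_ov)
    x_full = np.concatenate([x_ov, x_mv])
    # P-step: draw theta from its complete-data posterior (normal-normal conjugacy).
    post_var = 1.0 / (1.0 / prior_var + n)
    post_mean = post_var * (prior_mean / prior_var + x_full.sum())
    theta = rng.normal(post_mean, np.sqrt(post_var))
    draws.append(theta)

print(np.mean(draws[500:]))  # approximate posterior mean of theta given x_ov
```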
Data augmentation techniques
Data augmentation methods avoid several shortcomings associated with deletion methods. They estimate model parameters from the available data observations as well as from an underlying distribution or probability model. In comparison, some of the single imputation and data augmentation methods do not replace the unobserved observations. When estimating parameters, data augmentation algorithms augment the observed data with the latent (missing) data. In the missing data framework, Maximum Likelihood (ML), Expectation Maximization (EM) and Markov Chain Monte Carlo (MCMC) are considered to be augmentation methods, although according to McKnight et al. (2007) the classification of MCMC, ML and EM as augmentation methods is not clear-cut. The MCMC method has been referred to as an augmentation method within the context of multiple imputation (Allison, 2002).
The ML and EM methods are defined as model-based methods by Little & Rubin (1987), and the procedures mentioned above have also been referred to as data augmentation by J. L. Schafer (1997). We now focus on some of these augmentation methods, namely ML, EM and the Markov Chain Monte Carlo (MCMC) version of EM.
Maximum Likelihood (ML)
Unlike LOCF and the multiple imputation technique, Maximum Likelihood (ML) was not originally designed to deal with missing data. ML is usually used for estimating parameters in structural equation models (SEM) and in ordinary least squares regression models, but it can also be used for handling missing data. Little & Rubin (2002) describe the application of ML to missing data problems, and in a range of situations ML has proven to be an appropriate technique for dealing with missing data. When the missing data mechanism is ignorable (MAR or MCAR), ML is adequate and gives unbiased estimates (Arbuckle, 1996; Allison, 2002). Maximum Likelihood is therefore fairly well suited to missing data problems.
ML estimators for missing data produce unbiased estimates in large samples and asymptotically efficient estimates (small standard errors), and they satisfy asymptotic normality, meaning that the estimates approximately follow a normal distribution; this normal approximation can then be exploited for statistical inference, such as constructing confidence intervals and p-values (McKnight et al., 2007). ML can be used in most statistical software, including R, SAS, SPSS, S-Plus and others.
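As a sketch of how ML can use all available cases rather than deleting incomplete ones, the example below applies the observed-data likelihood idea to an assumed bivariate normal model with missing values in one variable (the model, missingness rate and parameterization are illustrative, not taken from the thesis): complete rows contribute the joint density, while rows with $y$ missing contribute only the marginal density of $x$.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, multivariate_normal

# Observed-data ML sketch for bivariate normal (x, y) where y is sometimes missing:
# each row contributes its own observed-data density, so no rows are discarded.
rng = np.random.default_rng(2)
n = 300
x = rng.normal(0, 1, n)
y = 1.0 + 0.5 * x + rng.normal(0, 1, n)
y[rng.random(n) < 0.3] = np.nan  # missingness in y

def neg_obs_loglik(params):
    mu_x, mu_y, log_sx, log_sy, rho_raw = params
    sx, sy = np.exp(log_sx), np.exp(log_sy)
    rho = np.tanh(rho_raw)  # keep the correlation in (-1, 1)
    cov = np.array([[sx**2, rho * sx * sy], [rho * sx * sy, sy**2]])
    obs = ~np.isnan(y)
    # Complete rows use the bivariate density; incomplete rows use the x marginal.
    ll = multivariate_normal(mean=[mu_x, mu_y], cov=cov).logpdf(
        np.column_stack([x[obs], y[obs]])).sum()
    ll += norm(mu_x, sx).logpdf(x[~obs]).sum()
    return -ll

fit = minimize(neg_obs_loglik, x0=np.zeros(5), method="Nelder-Mead",
               options={"maxiter": 5000})
print(fit.x[:2])  # estimates of mu_x, mu_y using all available data
```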
Expectation Maximization (EM)
The Expectation Maximization (EM) algorithm was mainly introduced by Dempster et al. (1977). It can be described as a process of computing and imputing the missing values of each observation on a variable, based on a selected probability distribution. According to Little & Rubin (2002), the EM algorithm is a common iterative method for Maximum Likelihood estimation in missing data problems. The algorithm holds under the MAR assumption. The basic idea of EM is to tackle the problem of missing data and resolve the complications it creates for Maximum Likelihood estimation.
The EM algorithm uses the following steps to handle missing data problems:
1. Impute the missing values using predicted values obtained from a Maximum Likelihood fit.
2. Estimate the model parameters based on the data completed in step 1.
3. Re-estimate the missing values based on the parameter estimates obtained in step 2.
4. Re-estimate the parameters based on the data from step 3, and repeat this process, iterating until convergence is reached.
Each EM iteration consists of two steps: the expectation (E) step and the maximization (M) step (Little & Rubin, 2002). The algorithm repeats these two steps until a convergence criterion is met. Theoretical reviews of the steps of the EM algorithm can be found in Dempster et al. (1977) and Little & Rubin (2002). At convergence, the fitted parameters correspond to a local maximum of the likelihood function (Dempster et al., 1977). The algorithm has two disadvantages: firstly, it can be very slow to converge; secondly, it does not directly measure the precision of the maximum likelihood estimates. Many techniques have been proposed to overcome these drawbacks, and they are documented by Louis (1982); McLachlan & Krishnan (1997); D. B. Rubin (1991); Baker (1992).
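For a concrete instance of the two steps, the following minimal sketch runs EM on an assumed univariate normal model with missing values (a textbook illustration, not a method specific to this thesis): the E-step computes the expected complete-data sufficient statistics given the current $(\mu, \sigma^2)$, and the M-step re-estimates the parameters from them.

```python
import numpy as np

# EM sketch for x_i ~ N(mu, sigma^2) with some x_i missing (MAR assumed).
rng = np.random.default_rng(3)
x = rng.normal(10.0, 3.0, size=200)
x[rng.random(200) < 0.25] = np.nan
obs = x[~np.isnan(x)]
n, n_mis = len(x), int(np.isnan(x).sum())

mu, sigma2 = 0.0, 1.0  # crude starting values
for _ in range(200):
    # E-step: expected sufficient statistics, treating each missing x_i as N(mu, sigma2).
    s1 = obs.sum() + n_mis * mu                      # E[sum of x_i]
    s2 = (obs**2).sum() + n_mis * (mu**2 + sigma2)   # E[sum of x_i^2]
    # M-step: complete-data MLE given the expected statistics.
    mu_new = s1 / n
    sigma2_new = s2 / n - mu_new**2
    if abs(mu_new - mu) + abs(sigma2_new - sigma2) < 1e-10:  # convergence check
        break
    mu, sigma2 = mu_new, sigma2_new

print(mu, sigma2)  # matches the observed-data MLE (mean and variance of obs)
```

At convergence the estimates coincide with the observed-data maximum likelihood estimates, and the gradual shrinking of the parameter updates illustrates the slow, monotone convergence behaviour noted above.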