3.10: Mixed data methods of data augmentation
3.10.3 The Restricted general location model
In regression form, the unrestricted model can be written as V = Dµ + ε, where D is the n×C matrix of dummy indicators identifying the cell c = 1, 2, 3, ..., C to which each unit belongs. We can now impose a restriction on the within-cell means µ, so that µ becomes

\mu = B\beta \qquad (3.121)

for some free parameter β, where B is a known C×r design matrix and β denotes an r×q matrix of coefficients, under the assumption that rank(B) = r ≠ C. This means that only the r×q matrix β has to be estimated rather than the C×q matrix of means µ. The restricted general location model still allows the means µ_c to vary from cell to cell, but each column of µ (one column per continuous variable) is now confined to the r-dimensional linear subspace of R^C spanned by the columns of B. The new regression model becomes
V = DB\beta + \varepsilon \qquad (3.122)
with a reduced number of regression coefficients in the free parameter β. The unrestricted general location model is recovered as a special case by saturating the loglinear model for the cell probabilities and setting B = I_{C×C} (the identity matrix). The regression coefficients are estimable if the contingency table for the categorical variables contains no random zeroes; even when it does contain zeroes, β may still be estimable, because estimability depends on the rank of DB rather than on the rank of D itself. This holds under the assumption that
\mathrm{rank}(B) = \mathrm{rank}(DB) = r \qquad (3.123)
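To make the estimability condition concrete, the short R sketch below builds the dummy-indicator matrix D from a cell-membership vector and checks that rank(B) = rank(DB) = r; the dimensions, the cell memberships and the design matrix B are all invented for illustration.

# Illustrative check of the estimability condition rank(B) = rank(DB) = r.
# n, C, r, the cell memberships and B are assumed values for this sketch.
set.seed(1)
n <- 50; C <- 4; r <- 2
cell <- sample(1:C, n, replace = TRUE)                  # cell membership of each unit
D <- model.matrix(~ factor(cell, levels = 1:C) - 1)     # n x C dummy-indicator matrix
B <- cbind(1, c(-1.5, -0.5, 0.5, 1.5))                  # C x r design matrix (intercept + linear trend)
qr(B)$rank          # equals r
qr(D %*% B)$rank    # must also equal r for beta to be estimable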
Likelihood inference for restricted models
Under restricted models we can apply two types of restrictions: the loglinear restriction on the cell probabilities π and the linear restriction on the within-cell means µ. The joint parameter θ = (π, µ, Σ) then lies in the product of the individual spaces for π and for (µ, Σ) obtained under these restrictions. The complete-data likelihood for θ factors into a part involving π and a part involving (µ, Σ), so the two sets of parameters can be maximized separately; in particular, the maximum likelihood estimate of the cell probabilities under the loglinear constraints can be determined by conventional IPF.
Applying the marginal distribution to the cell frequencies is what allows the factors of the full likelihood to be separated in this way.
The estimates of µ and Σ can be obtained from least-squares regression for the reduced model V = DBβ + ε, which yields the estimates β̂ and Σ̂. Thus,
\hat{\beta} = (B^T D^T D B)^{-1} B^T D^T V \qquad (3.124)
            = (B^T Z_3 B)^{-1} (B^T Z_2), \qquad (3.125)

where Z_1 = V^T V, Z_2 = D^T V and Z_3 = D^T D are the sufficient statistics, and
n\hat{\Sigma} = (V - DB\hat{\beta})^T (V - DB\hat{\beta}) \qquad (3.126)
              = Z_1 - Z_2^T B (B^T Z_3 B)^{-1} (B^T Z_2). \qquad (3.127)

The ML estimate of the cell means µ is given by µ̂ = Bβ̂, and for the covariance matrix the unbiased estimate n(n − r)^{-1} Σ̂ may be used instead of Σ̂.
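As a numerical illustration of (3.124)–(3.127), the self-contained R sketch below simulates toy data (all quantities are invented; q denotes the number of continuous variables) and computes β̂, µ̂ and Σ̂ from the sufficient statistics.

# Toy illustration of the least-squares estimates (3.124)-(3.127); all data simulated.
set.seed(2)
n <- 50; C <- 4; r <- 2; q <- 3
cell <- sample(1:C, n, replace = TRUE)
D <- model.matrix(~ factor(cell, levels = 1:C) - 1)     # n x C dummy indicators
B <- cbind(1, c(-1.5, -0.5, 0.5, 1.5))                  # C x r design matrix
V <- matrix(rnorm(n * q), n, q)                         # n x q continuous responses
Z1 <- crossprod(V); Z2 <- t(D) %*% V; Z3 <- crossprod(D)    # sufficient statistics
beta_hat  <- solve(t(B) %*% Z3 %*% B, t(B) %*% Z2)      # (3.125)
mu_hat    <- B %*% beta_hat                             # ML estimate of the cell means
Sigma_hat <- (Z1 - t(Z2) %*% B %*% beta_hat) / n        # (3.126)-(3.127)
Sigma_unb <- n * Sigma_hat / (n - r)                    # unbiased alternative n(n-r)^{-1} Sigma_hat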
Under Bayesian inference, the restricted model may be fitted using an independent product of prior distributions for the parameter sets π and (µ, Σ). These parameter sets remain independent under the complete-data posterior distribution. For the unrestricted general location model, obtained by saturating the loglinear model, a Dirichlet prior with hyperparameters γ_c can be applied to π; for the restricted model, however, we define the prior for the components of π to be a constrained Dirichlet with density function
P(\pi) \propto \prod_{c=1}^{C} \pi_c^{\gamma_c - 1} \qquad (3.128)
where the values of π satisfy the loglinear constraints. The complete-data posterior density is then a constrained Dirichlet with the updated hyperparameters γ'_c = γ_c + x_c.
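In the saturated special case the loglinear constraints place no restriction on π, and a draw from the Dirichlet posterior with hyperparameters γ'_c = γ_c + x_c can be sketched directly in R as below (the cell counts and prior hyperparameters are invented); under genuine loglinear constraints the corresponding draw is made by Bayesian IPF rather than directly.

# Sketch: a direct Dirichlet draw for the saturated (unconstrained) case.
# Cell counts x and hyperparameters gamma are illustrative values.
x     <- c(12, 7, 20, 11)                        # observed cell counts x_c
gamma <- rep(0.5, length(x))                     # prior hyperparameters gamma_c
g     <- rgamma(length(x), shape = gamma + x)    # shape gamma'_c = gamma_c + x_c
pi_draw <- g / sum(g)                            # normalised gammas give a Dirichlet draw
pi_draw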
Inference for β and Σ under a noninformative prior
Under Bayesian inference, the multivariate regression model requires the posterior density f(β, Σ | W). The likelihood function for β and Σ is
L(\beta, \Sigma \mid W) \propto |\Sigma|^{-n/2} \exp\left\{ -\tfrac{1}{2} \operatorname{tr} \Sigma^{-1} (V - DB\beta)^T (V - DB\beta) \right\} \qquad (3.129)

The likelihood function can also be rewritten in terms of the least-squares estimates:
\propto |\Sigma|^{-n/2} \exp\left\{ -\tfrac{1}{2} \operatorname{tr} \Sigma^{-1} \hat{\varepsilon}^T \hat{\varepsilon} \; - \; \tfrac{1}{2} (\beta - \hat{\beta})^T (\Sigma \otimes Y)^{-1} (\beta - \hat{\beta}) \right\} \qquad (3.130)
where β̂ is the matrix of estimated coefficients, Y = (B^T D^T D B)^{-1}, and ε̂ = V − DBβ̂ is the matrix of estimated residuals. The symbol ⊗ denotes the Kronecker product:
\Sigma \otimes Y =
\begin{pmatrix}
\sigma_{11} Y & \sigma_{12} Y & \cdots & \sigma_{1q} Y \\
\sigma_{21} Y & \sigma_{22} Y & \cdots & \sigma_{2q} Y \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{q1} Y & \sigma_{q2} Y & \cdots & \sigma_{qq} Y
\end{pmatrix}
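The block structure of Σ ⊗ Y, and the determinant identity used below in (3.132), can be checked numerically in R with small, arbitrarily chosen positive-definite matrices.

# Numerical check of the Kronecker product and of |Sigma (x) Y| = |Sigma|^r |Y|^q.
# Sigma (q x q) and Y (r x r) are arbitrary positive-definite matrices for illustration.
set.seed(3)
q <- 3; r <- 2
Sigma <- crossprod(matrix(rnorm(q * q), q, q)) + diag(q)
Y     <- crossprod(matrix(rnorm(r * r), r, r)) + diag(r)
K <- kronecker(Sigma, Y)                          # blocks sigma_ij * Y, a (qr) x (qr) matrix
all.equal(det(K), det(Sigma)^r * det(Y)^q)        # TRUE: the identity in (3.132)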
The quantity (β − β̂)^T (Σ ⊗ Y)^{-1} (β − β̂) is meaningful once the columns of β and β̂ are stacked to form vectors of length rq. We apply an improper uniform prior to β and a Jeffreys prior to Σ, that is,
P(\beta, \Sigma) \propto |\Sigma|^{-(q+1)/2} \qquad (3.131)

Combining the joint prior density (3.131) with the likelihood function (3.130), and using the determinant identity for the Kronecker product of Σ and Y,
|\Sigma \otimes Y| = |\Sigma|^{r} |Y|^{q}, \qquad (3.132)

we obtain the posterior density
P(\beta, \Sigma \mid W) \propto |\Sigma|^{-(n-r+q+1)/2} \exp\left\{ -\tfrac{1}{2} \operatorname{tr} \Sigma^{-1} \hat{\varepsilon}^T \hat{\varepsilon} \right\} \times |\Sigma \otimes Y|^{-1/2} \exp\left\{ -\tfrac{1}{2} (\beta - \hat{\beta})^T (\Sigma \otimes Y)^{-1} (\beta - \hat{\beta}) \right\} \qquad (3.133)

which is the product of a multivariate normal density for β given Σ and an inverted-Wishart density for Σ:
P(\Sigma \mid K, V) = K^{-1}\!\left(n - r,\ (\hat{\varepsilon}^T \hat{\varepsilon})^{-1}\right) \qquad (3.134)
P(\beta \mid \Sigma, K, V) = N\!\left(\hat{\beta},\ \Sigma \otimes Y\right), \qquad (3.135)

where Σ ⊗ Y is the Kronecker product of Σ and Y (M. Anderson, 2010).
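The draws implied by (3.134)–(3.135) can be sketched in R as below. The data are simulated purely for illustration, and the inverted-Wishart draw assumes the parametrisation in which Σ^{-1} ~ Wishart(n − r, (ε̂^T ε̂)^{-1}); the matrix-normal draw for β uses the Kronecker covariance Σ ⊗ Y.

# Sketch: one draw of (Sigma, beta) from the posterior (3.134)-(3.135); toy data only.
set.seed(4)
n <- 50; C <- 4; r <- 2; q <- 3
cell <- sample(1:C, n, replace = TRUE)
D <- model.matrix(~ factor(cell, levels = 1:C) - 1)
B <- cbind(1, c(-1.5, -0.5, 0.5, 1.5))
V <- matrix(rnorm(n * q), n, q)
X        <- D %*% B                                  # n x r reduced design matrix
beta_hat <- solve(crossprod(X), t(X) %*% V)          # least-squares estimate
eps_hat  <- V - X %*% beta_hat                       # residual matrix
Y        <- solve(crossprod(X))                      # Y = (B'D'DB)^{-1}

# Sigma | data: draw Sigma^{-1} ~ Wishart(n - r, (eps'eps)^{-1}), then invert
W_draw     <- rWishart(1, df = n - r, Sigma = solve(crossprod(eps_hat)))[, , 1]
Sigma_draw <- solve(W_draw)

# beta | Sigma, data ~ N(beta_hat, Sigma (x) Y): matrix-normal draw with
# row covariance Y and column covariance Sigma_draw
Z         <- matrix(rnorm(r * q), r, q)
beta_draw <- beta_hat + t(chol(Y)) %*% Z %*% chol(Sigma_draw)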
According to J. L. Schafer (1997), data augmentation can be used in multiple imputation software under the multivariate normal model, the multinomial model and the general location model to impute missing observations for different types of variables. J. L. Schafer (1997) shows that these algorithms can be applied to continuous variables, categorical variables and mixed (continuous and categorical) variables, respectively, and that they hold under the assumption of ignorability; that is, the missing observations are missing at random (MAR). A full description of these models can be found in J. L. Schafer (1997). In this study we applied the unrestricted general location model, since our variables are both categorical and continuous.
To apply the unrestricted general location model to the incomplete mixed data described above, the software first runs the EM algorithm to compute the maximum likelihood estimates of the cell probabilities, the cell means and the covariance matrix. These EM estimates are then used as starting values for the simulation steps of the data augmentation algorithm.
The software then applies the iterative simulation technique in a loop, reproducing one or more iterations of a single Markov chain. Each simulated iteration consists of two steps, the I-step and the P-step. In the I-step, random imputations of the missing categorical and continuous values are drawn from the predictive multinomial and multivariate normal distributions, respectively, given the current estimate of the model parameters; in the P-step, new parameter values are drawn from their complete-data posterior distribution given the imputed data.
The restricted general location model is particularly useful when the number of cells is large relative to n. The unrestricted general location model has (C − 1) + Cp + p(p + 1)/2 free parameters, and the C × p matrix of cell means becomes difficult to estimate as the number of categorical variables (and hence the number of cells C) and the number of continuous variables p increase. The Bayesian iterative proportional fitting (BIPF) algorithm is used to fit the restricted model and thereby reduce the number of parameters that need to be estimated in the general location model (J. L. Schafer, 1997). In application, the mix package in R, developed by J. L. Schafer (1997) and available from CRAN, is used. In the mix library, data augmentation for the general location model is performed by the function da.mix, and the BIPF algorithm for the restricted general location model by the function dabipf.mix.
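A typical mix workflow then looks like the sketch below; the toy data matrix, the seed and the number of data augmentation steps are invented for illustration, and the categorical variables (coded 1, 2, ...) must occupy the first columns of the data matrix, as required by prelim.mix.

# Sketch of a typical mix workflow; the data matrix below is invented for illustration.
library(mix)
set.seed(5)
dat <- cbind(sample(1:2, 40, replace = TRUE),   # categorical variable with 2 levels
             sample(1:3, 40, replace = TRUE),   # categorical variable with 3 levels
             matrix(rnorm(40 * 2), 40, 2))      # two continuous variables
dat[sample(length(dat), 15)] <- NA              # introduce missing values at random
s        <- prelim.mix(dat, 2)                  # first 2 columns are categorical
thetahat <- em.mix(s)                           # EM estimates, used as starting values
rngseed(20240101)                               # seed required before da.mix
theta    <- da.mix(s, thetahat, steps = 100)    # data augmentation: 100 I- and P-steps
imputed  <- imp.mix(s, theta, dat)              # one completed (imputed) data set
# For the restricted general location model, dabipf.mix() is used instead, taking the
# loglinear margins and the design matrix B as additional arguments.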