Chapter 2: Data and exploratory analysis 8
3.5 Types of covariance structures
3.4.1 Random intercepts and slopes model
A model that includes both random intercepts and random slopes is likely the most realistic type of model, although it is also the most complex. In this model, both intercepts and slopes are allowed to vary across groups, meaning that they are dif- ferent in different contexts (Cohen et al., 1983). The random intercept and slopes model is given by
Yij =β0+β1tij +bi1tij+bi0+eij
=β0+ (β1+b1j)tij +b0j+eij
(3.53)
"
bi0
bi1
#
∼N(0,Ωb) (3.54)
Ωb =
"
σ2b0 σb01
σb01 σb2
1
#
(3.55)
eij ∼N(0, σ2e) (3.56)
bi∼M V N(0,Σ) (3.57)
V ar(Yij) =σ12+ 2σ01tij+σ12tij+σ2 (3.58)
Cov(Yij, Yik) =σ02+σ01(tij, tik) +σ21tijtik (3.59)
Cov(Yij, Ylk) = 0 (3.60)
More complex models possible, but harder to fit.
3.5 Types of covariance structures
The distinct nature of longitudinal mixed model analysis is the covariance structure of the observed data. Here measurements made on the same subject are likely to be more similar than measurements made on different individuals. That is, repeated measurements are correlated. The covariance among repeated measures must be
3.5. Types of covariance structures
modeled properly in order for the analysis to be valid. Note the covariance structure is not of primary interest in the analysis, however it plays a huge role for validity of inferences. Thus more effort is usually needed at the beginning of the statistical analysis to assess and identify the best covariance structure of the data.
Until recently, analysis techniques available in computer software only offered the user limited and inadequate choices. One choice was to ignore covariance structure and make invalid assumptions. Another was to avoid the covariance structure issue by analyzing transformed data or making adjustments to otherwise inadequate anal- yses. Ignoring covariance structure may result in erroneous inference, and avoiding it may result in inefficient inference. Recently available mixed model methodology permits the covariance structure to be incorporated into the statistical model.
There are several specific choices of the form of the working covariance structures, but the three most commonly used covariance structures are, compound symmetry (CS), unstructured (UN) and autoregressive (AR(1)).
3.5.1 Compound Symmetry (CS):
Cov(Yijk, Yijl) =σ21 (3.61)
ifk6=lthen
V ar(Yijk) =σ21+σ2 (3.62)
Compound symmetric structure specifies that observations on the same subject have homogeneous covarianceCov(Yijk, Yijl) =σ12and homogeneous varianceV ar(Yijk) = σ12+σ2. The correlation function is ρ = σ2σ21
1+σ2. Note that the correlation does not depend on the value of the lag, in the sense that the correlations between two obser- vations are equal for all pairs of observations on the same subject. This is unrealistic in longitudinal data problem in the sense that observations closer to each other are more correlated than the ones which are further apart (Littell et al., 2000). Com- pound symmetric structure is sometimes called variance components structure, be- cause the two parametersσ21andσ2represent between-subjects and within-subjects variances, respectively. This mix of between and within-subject variances logically motivates the form ofV ar(Yijk)) in many situations and implies a non-negative cor- relation between pairs of within-subject observations. For example if we consider three repeated measures, the compound symmetric correlation structure is given by
3.5. Types of covariance structures
CS=
σ21+σ2 σ12 σ12 σ12 σ21+σ2 σ12 σ12 σ12 σ21+σ2
(3.63)
3.5.2 Unstructured (UN)
All the variances and covariances are different, this is the most flexible covariance structure which leads to the unstructured pattern of correlations which assumes un- constrained pair-wise correlations where each correlation is estimated from the data (the most complex model). This lets the data dictate what they should be and re- quires the estimate of many parameters. The more data that are used to assess the covariance structure, the less data are left to estimate the parameters of linear mod- els. An unstructured covariance structure with three repeated measures is given by
U N =
σ12 σ12 σ13
σ12 σ22 σ23 σ13 σ23 σ23
(3.64)
The unstructured type of correlation has an immediate disadvantage because it in- creases the number of parameters to estimate in the overall model hence causing possible non-convergence problems, particularly those associated with boundary values. The best way adopted to reduce the number of parameters is to assume that all the covariances along the diagonal have a constant variance. Furthermore an analysis that uses this covariance matrix will be less powerful than an analysis that uses a less parametric but more realistic structure, however the problem with that is knowing what that proper structure is (Littell et al., 2000).
3.5.3 Autoregressive, order 1 [AR (1)]:
Cov(Yijk, Yijl) =σ2×ρ|k−l| (3.65) The AR (1) covariance structure specifies homogeneous varianceV ar(Yijk) = σ2. Furthermore it specifies that covariances between observations on the same subject are not equal, but decrease toward zero with increasing lag. This covariance struc- ture is relevant for repeated measures in time, and the term autoregressive is derived from time series analysis that assumes observations are related to their own past val- ues or history through one, two, or a higher order autoregressive (AR) process. The correlation between two responses that aremmeasurements apart isρm, sinceρ is
−1 ≤ ρ ≤ 1 or to be more realistic0 < ρ < 1, the greater the power m ≥ 1, the
3.5. Types of covariance structures
smaller the magnitude. Thus the further measurements are apart, the lower their correlation. In the case of three repeated measurements AR(1) covariance structure is given by
AR(1) =σ2
1 ρ ρ2
ρ 1 ρ
ρ2 ρ 1
(3.66)
Note that the AR (1) resolves some of the objectives to the use of the compound symmetry where only one correlation parameter is needed. AR (1) are a reasonable choice for evenly or equally spaced observations.
The heterogeneous versions of AR (1) and CS, are ARH (1), CSH respectively, and they are simple extensions which assume the variances along the diagonal of the matrix are not equal.
3.5.4 Spatial covariance structures
Spatial correlation structures are very useful for ”unequally spaced longitudinal data which can be viewed as a spatial process in one dimension” (Littell et al., 2000). The advantage of using spatial correlation structures is that they calculate the actual dis- tance between measurements themselves without making any assumptions. Table 3.1gives some of the spatial covariance structures, each one specifying and defining how fast the correlations decrease as functions of the distances between measure- mentsdij.
3.5. Types of covariance structures
Table 3.1:Spatial covariance structures
Structure Example
Power σ2
1 ρd12 ρd13 ρd12 1 ρd23 ρd13 ρd23 1
Linear σ2
1 1−ρd12 1−ρd13 1−ρd12 1 1−ρd23 1−ρd13 1−ρd23 1
Exponential σ2
1 exp(−d12\ρ) exp(−d13\ρ) exp(−d12\ρ) 1 exp(−d23\ρ) exp(−d13\ρ) exp(−d23\ρ) 1
Gaussian σ2
1 exp(−d212\ρ2) exp(−d213\ρ2) exp(−d212\ρ2) 1 exp(−d223\ρ2) exp(−d213\ρ2) exp(−d223\ρ2) 1
Spherical σ2
1
1−(3d2ρ12) + (2ρd3123)
1−(3d2ρ13) + (2ρd3133) 1−(3d2ρ12) + (2ρd3123)
1
1−(3d2ρ12) + (2ρd3123) 1−(3d2ρ12) + (2ρd3123)
1−(3d2ρ12) + (2ρd3123)
1
3.5.5 Choosing the best covariance structure
Ideally the covariance structure should be known from previous work or subject matter considerations. Selecting a structure that is too simple increases the Type I er- ror rate and selecting a structure that is too complex sacrifices power and efficiency Littell et al. (2000).
To choose the best covariance structure use the information criteria (IC). Given a set of candidate covariance structures for the data, the preferred structure is the one with the minimum Akaike information criterion (AIC) value. AIC rewards goodness of fit (as assessed by the likelihood function), but it also includes a penalty that is an increasing function of the number of estimated parameters. The penalty discour- ages over fitting, because increasing the number of parameters in the model almost