On the other hand, in the United States HAART is started when the CD4+ count is less than 350 cells/µL. CD4+ count and viral load are measured at baseline and every six months thereafter.
What drives young women into sex at young age?
Health's HAART rollout, the CAPRISA AIDS treatment project is helping by rolling out HAART for people who are HIV positive. Eligibility is to have a CD4+ count of less than or equal to 200 cells/µL and to be at least 14 years old, but if the CD4+ count is greater than 200 cells/µL and a patient is very ill, he/she still initiated on HAART for ethical reasons.
Objectives of the study
Baseline characteristics
The p-values in table 2.1 are not statistically significant, and therefore we fail to reject the null hypothesis that there is no relationship between place and the variables gender and age group. This observation appears from the percentage distribution of these variables across localities in table 2.1.
Distributional properties of CD4+ count
One can see that the average CD4+ count for the two sites behaves differently from baseline to month 24. From Figure 2.7 one can see that the average CD4+ count for women is greater than that of men from baseline to month 24.
Profile plots for a random sample of patients from each site
The two graphs at original scale and square root scale in Figure 2.10 show the same qualitative feature.
Scatter plots for CD4+ count against covariates
Scatter plot and correlation matrix
The scatter plot and correlation matrix can be used to explore the correlation between repeated measurements. Two adjacent measurements are more correlated than measurements that are distant and this is confirmed by Figure 2.19.
Testing equality of mean CD4+ count for different variables
Again, the p-values in Table 2.5 show that the mean CD4+ counts for males and females in eThekwini are statistically significantly different from month 6 to month 24 with females having the highest CD4+ counts at all visits. The results in tables 2.7 and 2.8 show us that the improvement in the CD4+ count is the same whether you are a male from an urban or rural area.
Sample variogram and autocorrelation
Serial correlation (a measure of within-subject variability) is usually a decreasing function of the time interval between measurements, as shown in the sample correlation matrix in Figure 2.19. Since the slope of the line is not equal to zero, this may indicate the presence of serial correlation. The autocorrelation plot in Figure 2.22 shows a decreasing within-subject correlation from about 0.72 to 0.28 over the range of the data.
The approach of Verbeke and Molenberghs (2000) is frequently used in the development of the theory.
Theory of the linear mixed model
In particular, the 2-step fitting of the linear mixed models to longitudinal data will be the building block of the full model. One of the topics that will be discussed includes estimation and inference for fixed effects. The distinguishing feature of the repeated measures model lies in the specification of the covariance structure of the repeated measures.
The choice of fixed and random effects is not always determined by the structure of the experiment, but may depend on the information needed.
Linear mixed model for longitudinal data
- Two-staged fitting of the linear mixed model for longitudinal data
- Inference for the fixed effects
- Inference for the variance components
- Information criteria
If one is interested in fixed effects (the population mean), then one should focus more on the marginal model. So far we have done variance estimation for the normal population and also in the linear regression model. However, inferences about the covariance structure are of interest as well, especially for the interpretation of random variation in the data.
It is not a test on the model in the sense of hypothesis testing, rather it is a tool for model selection.
Inference for the random effects
It is perfectly reasonable to think that in the bull population there will be bulls other than the kth that nevertheless have (or can have) the same daughter average, namely ¯yk. Relating McCulloch's example to the CD4 count data for this analysis, it is reasonable to imagine that in the population of patients on HAART, there will be other patients on HAART besides those included in the current analysis, who have the same average CD4+ count. The focus will be based on the description of the mean model for square root CD4+ count while also trying to capture the best correlation structure of the repeated measurements within a subject.
The next step in the process will be an attempt to capture any unobserved heterogeneity by allowing for possible subject-specific random effects and correlation structure between the random effects, if any.
Univariate models
Marginal models
Intercepts and slopes are statistically significantly different, with women having, on average, a higher mean square root change in CD4+ count than men. The next submodel fitted is to assess whether the mean square root rate of change in CD4+ count is the same for patients at both sites. This means that the square root regression of CD4+ count against age is not statistically significant (p=0.1453), considering time (visit) and visit*age in the model.
Next, the effect of weight on square root CD4+ counts was modelled, and the results of this analysis are shown in Tables 4.9 and 4.10.
Random effects models
The results in Table 4.14 give us an average intercept and slope over time, where β0=10.7031 is the mean square root CD4+ count at baseline, and this is an estimate of the population mean. The results in Table 4.18 are slightly different from the results in Table 4.6, where we fitted the marginal model. The results for the random effects model in Table 4.20 are similar to the results for the marginal model given in Table 4.8 except that now the standard errors are slightly smaller.
Also, the size of the fixed effects is increased under the random effects model, except for the intercept, when compared to the marginal effects model in Table 4.10.
Multi-covariate models
Marginal model
A graphical representation in terms of AIC, AICC and BIC for each of the covariance structures is shown in Figure 4.3. The covariance parameter estimates under ML are similar to those in Table 4.28 where the REML estimator was used, but note the slight underestimation of the variance components. The covariance parameter estimates for the reduced model in Table 4.32 are almost identical to the parameter estimates in Table 4.28, where we fitted the full model.
The results in Table 4.33 show that there is no difference between eThekwini and Vulindlela in terms of mean CD4+ count after HAART.
Random effects model
The parameter estimates for the spatial linear structure are smaller than those in Table 4.28 because the variation is now separated into two components, within- and between-subject variation. In Table 4.28 we allowed only within-subject variation and between-subject variation was ignored. Then, the model was fitted under ML so that non-significant fixed effects are deleted one by one starting from the most non-significant, and the results are shown in the tables.
The AIC for this model is 19400.9 which is a significant reduction compared to 19438, the AIC for the marginal model with the exact number of fixed effects in Section 4.3.1.
Modelling baseline log viral load
Marginal model
To fit the edge model, we will use the same procedure as in Section 4.3.1. From Figure 4.5, we see that UN and all spatial covariance structures are better structures compared to CS, AR(1), Toeplitz and VC. Among the spatial family, the structure with the smallest AIC was the spatial power and also the exponential spatial both with an AIC of 17023.4.
To check whether this was true, we built model (4.13) without the covariate logv and the interaction term visit*logv in section 4.3.1.
Random effects model
The random intercept was removed to see if the variance for the slope becomes significant. The model was then fitted without the random intercept and the results are shown in Table 4.46. The AIC for this model is 17001.6, which is greater than 16984.5 from the model where we have both the random intercept and the slope.
So, the random effects model will be fitted using the spatial linear covariance structure for the within-subject variation and UN for the between-subject variation.
Model diagnostics
In section 4.3, the random effects model was chosen over the marginal model based on the AIC. However, the violation of the normality assumption does affect the parameter estimates and standard errors of the random effects. First, we plot both the residuals and residuals against the predicted values for the marginal and random effects models in Section 4.3.
The remaining histogram plots for the final marginal and random effects models in this thesis discussed in Section 4.4 are shown in Figures 4.11 through 4.13.
Missing data processes
If data are neither MCAR nor MAR, then the data are not missing at random (MNAR), and this missing mechanism is termed non-ignorable. This means that even if all the available observed information is taken into account, the reason why observations are missing still depends on the unseen observations themselves. Carpenter and Kenward (2006) provide the following example, assuming that 12-week forced expiratory volume (F EV1) was MCAR conditional on baseline F EV1.
Now assume that even conditional on the baseline, the probability of seeing F EV1 in 12 weeks depends on the value in 12 weeks, that is, 12 weeksF EV1 is MNAR.
Missing data frameworks
In our case, the best assumption we make about the CAT data used for this thesis is that it is missing at random (MAR). Handling the case of MNAR will require some untestable assumptions and this type of missing data process methods such as sensitivity analysis which are beyond the scope of the present work. 5.3) The pattern mixture model allows for a different response model for each pattern of missing values, the observed data is a mixture of these weighted by the probability of each missing value or dropout pattern (Molenberghs and Kenward, 2007).
Molenberghs and Kenward (2007) reported that the natural parameters of selection models, pattern mixture models, and shared parameter models have different interpretations, and transforming a statistical model from one of the frameworks to another is generally not straightforward.
Methods for handling missing data
A subject's measurements must be believed to remain at the same level from the moment of abandonment onwards. Marginal and conditional imputation, also known as marginal mean imputation, replaces the missing observation with the mean of the observed values for that variable, and this is clearly problematic for categorical variables. However, marginal mean imputation ignores all other variables in the data set, using it reduces the associations in the data set.
Also, imputing all missing observations to the same value is clearly incorrect and will underestimate the variability in the unseen data (Carpenter and Kenward, 2006).
Application to CAT study: Predictors of withdrawal
Therefore, the most plausible approach is to include either CD4+ count or log virus in the model, but not both. Similarly, there was a significantly different rate of change in CD4+ count over time in patients who started with a lower log viral load at baseline compared to patients with a higher log viral load at baseline. Thus, adjusting the baseline log viral in the model shows that the rate of change in CD4+ count is related to gender, age, weight, baseline log viral and not to location.
A higher rate of change in CD4+ count is specifically associated with women, younger age, lower weight, and higher log viral load at baseline.