Analysis of time-to-event data including frailty modeling.

Frailty modeling is a method in which a random effect is incorporated into the Cox proportional hazards model. It is an important aspect to consider, especially if the Cox proportional hazards model does not adequately describe the distribution of survival time.

Basic Concepts

For example, the patient may still be alive at the end of the study when the event is death. The individual may have been lost to follow-up, or may not have experienced the event by the end of the study.

The Survivor Function and the Hazard Function

Using conditional probability laws and the mathematical definition of derivatives, the above equation can be rewritten as follows. Both the survival function and the hazard function can be estimated from the given survival data.

Types of Survival Distributions

Given survival to time t0, the excess lifetime beyond t0 still has the exponential distribution with parameter λ. This result also explains why the exponential distribution is not a realistic distribution for time-to-event data.
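The memoryless property stated above can be checked numerically. The following is a minimal sketch, assuming the survivor function S(t) = exp(-λt); the function names and the numeric values of λ, t0 and s are illustrative and do not come from the text.

```python
import math

def exp_survival(t, lam):
    """Survivor function S(t) = exp(-lam * t) of an exponential lifetime."""
    return math.exp(-lam * t)

def conditional_survival(s, t0, lam):
    """P(T > t0 + s | T > t0) = S(t0 + s) / S(t0)."""
    return exp_survival(t0 + s, lam) / exp_survival(t0, lam)

# Illustrative values only (not taken from the text).
lam, t0, s = 0.3, 5.0, 2.0
print(conditional_survival(s, t0, lam))  # excess lifetime beyond t0
print(exp_survival(s, lam))              # lifetime of a "new" individual
```

Both printed values coincide for any choice of t0, which is why the distribution implies no ageing: an individual who has already survived to t0 faces the same future as one observed from time 0.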

Nonparametric Procedures

Thus, an intuitive method of estimating the hazard function is to take the ratio of the number of deaths occurring at a given time to the number of individuals at risk at that time. The cumulative hazard is the integral of the hazard function, and we know this from equation (1.5).
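The ratio described above can be sketched in a few lines. This is an illustration under stated assumptions, not the computation used in the text: the data are hypothetical, an event indicator of 1 marks a death and 0 a censored time, and summing the ratios over time gives a step-function estimate of the cumulative hazard (commonly called the Nelson-Aalen estimator).

```python
def hazard_estimates(times, events):
    """Return (time, deaths / at risk, cumulative sum) at each distinct death time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    results = []
    cumulative = 0.0
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose observed time equals t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            hazard = deaths / n_at_risk
            cumulative += hazard
            results.append((t, hazard, cumulative))
        # Deaths and censored individuals both leave the risk set after t.
        n_at_risk -= leaving
    return results

# Hypothetical data: event = 1 is a death, event = 0 is censored.
example = hazard_estimates([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
for row in example:
    print(row)
```

Censored individuals contribute to the number at risk up to their censoring time but never to the number of deaths.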

Table 1.2: Number of deaths at time t_i (columns: Group, No. of deaths, Number survived, Total)

Introduction

If the censored observations are treated as survival times, then the resulting sample statistics are not estimates of the survival time distribution. Rather, they are estimators of a combination of the survival time distribution and that of a second distribution that depends on survival time as well as statistical assumptions about the censoring mechanism (Hosmer and Lemeshow, 1999).

Exponential Model

As the sample size increases, the distribution of the minimum value can be found to tend towards Gumbel(0,1) (Hosmer and Lemeshow, 1999). The location scale model can be applied to exponential, Weibull, log-normal and generalized gamma models (Lawless, 1982).

Log-Normal Model

Weibull Model

The assumption that the survival data have a Weibull distribution can be tested using a suitable method described below. To apply this method for assessing the suitability of a Weibull model, the survival function must be estimated.
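One common graphical check, which may or may not be the exact method the text refers to, is the log-log survivor plot. Assuming the parameterization S(t) = exp(-λ t^γ), log(-log S(t)) = log λ + γ log t, so the points should lie on a straight line with slope γ. The sketch below uses exact Weibull survival probabilities with hypothetical parameters; in practice an estimate of the survival function would be used instead.

```python
import math

def log_log_points(times, surv):
    """Map (t, S(t)) to (log t, log(-log S(t))), skipping S outside (0, 1)."""
    return [(math.log(t), math.log(-math.log(s)))
            for t, s in zip(times, surv) if 0.0 < s < 1.0]

def fit_line(points):
    """Ordinary least squares slope and intercept for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical Weibull parameters and time points.
lam, gamma = 0.05, 1.5
times = [1, 2, 4, 8, 16]
surv = [math.exp(-lam * t ** gamma) for t in times]

slope, intercept = fit_line(log_log_points(times, surv))
print(slope, math.exp(intercept))  # recovers gamma and lam
```

Marked curvature in such a plot suggests that the Weibull assumption is not appropriate, and a slope close to 1 points towards the exponential special case.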

Log-Logistic Model

For σ > 1 the hazard starts at infinity and decreases to 0 as t increases, and when σ = 1 the hazard takes a value of λ0 when t = 0 and decreases to 0 as t → ∞. The log-logistic model is thus part of a general class of models known as proportional odds models.

Generalised Gamma Model

In addition, the model is difficult to fit and computationally intensive, and thus can take much longer to compute, especially in the case of large datasets. A summary of the assumptions and the resulting distribution of T is given in Table 2.2 below.

Model Fitting

The likelihood function provides a quantity corresponding to the probability of the observed data under the model, and it can be maximized to obtain estimates of the unknown parameters. In the construction of the likelihood function, the contributions of the triplets (t, 1, x) and (t, 0, x) are considered separately.
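The separate treatment of the two kinds of triplets can be illustrated with a small sketch. It assumes, purely for illustration, an exponential model with a single covariate and hazard exp(β0 + β1 x); this parameterization and the data are hypothetical. An observed event (t, 1, x) contributes the density f(t), while a censored observation (t, 0, x) contributes the survival probability S(t).

```python
import math

def exp_loglik(beta0, beta1, data):
    """Log-likelihood for triplets (t, c, x): c = 1 is an event, c = 0 is censored."""
    total = 0.0
    for t, c, x in data:
        rate = math.exp(beta0 + beta1 * x)        # assumed hazard for covariate x
        log_density = math.log(rate) - rate * t   # log f(t), used when c = 1
        log_survival = -rate * t                  # log S(t), used when c = 0
        total += log_density if c == 1 else log_survival
    return total

# Hypothetical data: one event at t = 2 and one censored time at t = 3.
data = [(2.0, 1, 0.0), (3.0, 0, 0.0)]
print(exp_loglik(0.0, 0.0, data))
```

With both coefficients equal to 0 the rate is 1, so the event contributes -2 and the censored observation contributes -3 to the log-likelihood.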

Choosing the Best Model to Fit the Data

To obtain the estimated values of the unknown parameters, equation (2.7) is differentiated with respect to the unknown parameters, set to 0, and then solved for β. If these equations are non-linear in the unknown parameters to be estimated, iterative techniques such as Newton-Raphson and Fisher Scoring algorithms can be used.
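The iterative step can be sketched for the simplest possible case. The example below is not equation (2.7) of the text; it assumes an exponential model without covariates, parameterized by θ = log λ, for which the log-likelihood is dθ - exp(θ)·Σt with d the number of events. The closed-form solution log(d / Σt) is available here, which makes it easy to confirm that Newton-Raphson converges to it.

```python
import math

def newton_raphson_log_rate(times, events, theta=0.0, tol=1e-10, max_iter=50):
    """Maximize d*theta - exp(theta)*sum(times) over theta = log(rate)."""
    d = sum(events)            # number of observed events
    total_time = sum(times)    # total follow-up time
    for _ in range(max_iter):
        score = d - math.exp(theta) * total_time   # first derivative
        hessian = -math.exp(theta) * total_time    # second derivative
        step = score / hessian
        theta -= step                              # Newton-Raphson update
        if abs(step) < tol:
            break
    return theta

# Hypothetical data: event = 1 is a death, event = 0 is censored.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
theta_hat = newton_raphson_log_rate(times, events)
print(math.exp(theta_hat), sum(events) / sum(times))
```

With several parameters the score becomes a vector and the second derivative a matrix, and Fisher scoring replaces that matrix by its expected value.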

Introduction

Old Order Amish Community Data

It is possible to use a similar approach to estimate the log-logistic model and the log-normal model by plotting the log. It is clear from the graphs in Figures 3.5 and 3.6 that the log-normal model and the log-logistic model are also not appropriate.

Table 3.1: Distribution of sex in Amish data (columns: Sex, Frequency, Percent (%))

Lung Cancer Data

All variables are included in the model to determine whether they affect survival time. The exponential model revealed that treatment group, performance status, liver metastases, and weight loss were significant variables in the model. From Table 3.32 it can be seen that the generalized gamma model is the most suitable model.

Table 3.32: Deviation for comparisons of different models

Table 3.15: Characteristics of study population

Warfarin Data

The plot for the evaluation of the log-normal model shown in Figure 3.11 shows a curve in the middle of the plot, which is more pronounced than in the Weibull case. The log-log survivor plot in Figure 3.16 also appears to be linear, except for a small deviation near the beginning of the line. The graph for the evaluation of the log-normal model shown in Figure 3.17 shows some non-linearity at the beginning of the graph; after that it appears to be quite linear.

Figure 3.10: Log-log survivor plot for lung cancer data set

Introduction

The Cox Proportional Hazards Model

Suppose that the ratio of the two hazard functions for the treatment and control groups is ψ. If the hazard ratio, ψ, is easily interpreted, the actual shape of the baseline hazard function is of little importance (Hosmer and Lemeshow, 1999). In the case of a single covariate, the square root of the test statistic reduces to.

Fitting the Proportional Hazards Model with Tied Survival Times

An advantage of using the score test is that the statistic can be calculated without evaluating the maximum partial likelihood estimates of the parameters. Thus, the tied survival times could actually have been observed in any of the d. The variance of the estimated coefficient is obtained from the second partial derivative, evaluated at the estimated value of the parameter.

Estimating the Survivor Function of the Proportional Hazards Regression Model

Then, the conditional baseline survival probability estimator is obtained by solving for α̂i0 in Eq. The survival function estimator in S(t, x, β) = [S0(t)]^{exp(x^T β)} is obtained by substituting the baseline survival function estimators and the parameter estimators, using the covariate values of interest (Hosmer and Lemeshow, 1999). The cumulative hazard function estimator is more practical and can be obtained from the following relation.
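The substitution step can be illustrated as follows. This is a minimal sketch in which the baseline survival probability and the coefficient are hypothetical numbers rather than estimates from the text; it simply raises the baseline survival probability to the power exp(x^T β).

```python
import math

def survival_given_covariates(baseline_surv, x, beta):
    """Proportional hazards survival: S0(t) raised to the power exp(x'beta)."""
    linear_predictor = sum(xi * bi for xi, bi in zip(x, beta))
    return baseline_surv ** math.exp(linear_predictor)

# Hypothetical values: baseline survival 0.8 at some time t and a single
# binary covariate whose hazard ratio is 2, so beta = log(2).
s0 = 0.8
beta = [math.log(2.0)]
print(survival_given_covariates(s0, [0.0], beta))  # baseline group
print(survival_given_covariates(s0, [1.0], beta))  # group with doubled hazard
```

A hazard ratio above 1 lowers the survival curve at every time point, and setting all covariates to 0 returns the baseline survival function itself.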

Introduction

Old Order Amish Community Data

In the above analysis, however, the cluster effect of family is not accounted for in the model. The model-based score and Wald statistics and the likelihood ratio statistic are identical to those obtained previously, and Table 5.4 is the same as before except for the addition of two extra statistics indicating the correlation structure for individuals in the same group. The parameter estimates, and thus the hazard ratios in Table 5.5 remain unchanged, but there is a change in the standard errors in that they are slightly larger than before.

Lung Cancer Data

The hazard for a participant with liver metastases is 1.5 times that for a participant without liver metastases. The hazard for a participant with bone metastases is approximately 1.3 times that for a participant without bone metastases. A participant who experiences weight loss will have a hazard approximately 1.2 times that of a participant who experiences no weight loss.

Warfarin Data

People on adjusted dose warfarin have a risk of approx. 62% of the risk for people on fixed-dose warfarin. The interpretation is that for a one-year increase in age, the risk of experiencing an event decreases by an estimated 0.6%.
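Percentage interpretations of this kind follow from exponentiating the coefficient. The sketch below assumes that the 0.6% decrease corresponds to a hazard ratio of 0.994 per year of age and that the effect of age is log-linear; the coefficient is backed out from that percentage and is not a value reported in the text.

```python
import math

def hazard_ratio(beta, increase=1.0):
    """Hazard ratio for a given increase in a covariate with coefficient beta."""
    return math.exp(beta * increase)

def percent_change(beta, increase=1.0):
    """Percentage change in the hazard for the same increase."""
    return 100.0 * (hazard_ratio(beta, increase) - 1.0)

# Assumed coefficient: hazard ratio 0.994 per one-year increase in age.
beta_age = math.log(0.994)
print(percent_change(beta_age, 1))   # change for one year
print(percent_change(beta_age, 10))  # change for ten years
```

Under a log-linear effect the hazard ratios multiply, so the change over ten years is slightly smaller in magnitude than ten times the one-year change.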

Introduction

If an individual has a frailty of 2, then that person is twice as likely to die at any given age and time as a standard individual. On the other hand, a person with a frailty of 0.5 is only half as likely to die. It is also more convenient to define frailty in terms of the hazard rather than the age-specific probability of death, qx, for the following reasons.

The Distribution of Frailty

When k = 1 it simplifies to the exponential distribution, and when k becomes large it assumes a bell-shaped distribution similar to that of the normal distribution. It is clear from the above equation that as the cumulative hazard, H(x), increases, the average frailty of the remaining cohort decreases. This implies that the average frailty of those who die at age x, denoted by z̄0(x), is greater than the average frailty of the survivors. 1979) concluded that ignoring frailty in a survival model can lead to biased estimates.
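The behaviour of the average frailty can be made concrete under a gamma assumption. The sketch below assumes that frailty at birth is gamma-distributed with shape k and mean k/λ, in which case frailty among survivors to age x is gamma with shape k and parameter λ + H(x), and frailty among those dying at age x has shape k + 1 with the same parameter. The numeric values are hypothetical.

```python
def mean_frailty_survivors(k, lam, cum_hazard):
    """Mean frailty of those still alive: k / (lam + H(x))."""
    return k / (lam + cum_hazard)

def mean_frailty_deaths(k, lam, cum_hazard):
    """Mean frailty of those dying at age x: (k + 1) / (lam + H(x))."""
    return (k + 1) / (lam + cum_hazard)

# Hypothetical parameters: k = lam = 2 gives mean frailty 1 at H(x) = 0.
k, lam = 2.0, 2.0
for cum_hazard in [0.0, 0.5, 1.0, 2.0]:
    print(cum_hazard,
          mean_frailty_survivors(k, lam, cum_hazard),
          mean_frailty_deaths(k, lam, cum_hazard))
```

Both averages fall as the cumulative hazard grows, because frail individuals tend to die first, and at every age those who die are on average frailer than those who survive.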

Univariate Semi-Parametric Frailty Models

Frailty among those who die at any age x is also gamma-distributed, with the same scale parameter λ(x) as among those who survive to age x, but with shape parameter k + 1. A method to account for the heterogeneity due to omitted covariates is frailty modeling, whereby an unmeasured random effect is incorporated into the hazard function of the model. The proportional hazards frailty model assumes that, for a given frailty variable zi and covariates xi, individual i has a hazard function given by.

Multivariate Semi-Parametric Frailty Models

To account for the heterogeneity between groups, a random effect (the frailty effect) is included in the hazard function to account for the correlation of failure times within a group. When the log-normal density is used for fZ(·), Var[Z] = σz² describes the heterogeneity between the groups, while Var[Z] = 1/α is used when a gamma density is assumed for the frailties. For generality, assume that the heterogeneity can be described by a parameter θ, which means σz² for the log-normal density and 1/α for the gamma density.
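The variance 1/α for the gamma case can be checked by simulation. The sketch assumes the usual constraint that the frailties have mean 1, which corresponds to a gamma density with shape α and scale 1/α; the value of α, the sample size and the seed are arbitrary illustrative choices.

```python
import random
import statistics

def simulate_gamma_frailties(alpha, n, seed=0):
    """Draw n frailties from a gamma density with shape alpha and scale 1/alpha."""
    rng = random.Random(seed)
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n)]

# Illustrative value: alpha = 2 should give mean 1 and variance 1/2.
alpha = 2.0
frailties = simulate_gamma_frailties(alpha, 100_000)
sample_mean = statistics.fmean(frailties)
sample_var = statistics.pvariance(frailties)
print(sample_mean, sample_var, 1.0 / alpha)
```

A large α therefore corresponds to little heterogeneity between groups, and the model approaches one without a frailty term as the variance tends to 0.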

Estimation in the Frailty Model

The EM algorithm first finds the expected value of the complete-data log-likelihood log f(X, Y | Θ) with respect to the unknown data Y, given the observed data X and the current parameter estimates (Bilmes, 1998). To estimate the parameters, it is necessary to maximize the log-likelihood of the observed data. Suppose the model formulation is the same as in the EM algorithm approach.

Figure 6.1: The directed acyclic graph representation of a frailty model

Penalized Partial Likelihood Approach

The parameter estimates for the model are given in Table 7.1, where θ is the estimate of the frailty distribution variance. It took 0 iterations to obtain the frailty variance, and 3 iterations to fit the final model. The estimate for the frailty variance is so small that it is very close to 0.

Table 7.1: Parameter estimates for gamma shared frailty model

Bayesian Approach

The frailty score for cluster 34 is 1.223, which is slightly less than the estimate from the PPL approach, and the frailty score for group 400 is 0.7368, which is very similar to the score obtained earlier. It is also important to note that the ratio of MC error to standard deviation is less than 5% for every parameter estimate.

Table 7.4: Bayesian parameter estimates for family data

Comparison of the Different Methods

The main difference is in the estimation of the variance of the frailty effect in the PPL model and the Bayesian model. The frailty term is simply modeled as a random effect, and an estimate of the frailty variance can be obtained. Comparing the Cox proportional hazards model with the frailty models revealed that the fixed effects parameter estimates were very similar.