Cohort studies are usually designed with a specific endpoint in mind. For the sake of concreteness we take the endpoint to be a particular disease. During the course of follow-up a subject either develops the disease or not. If the disease occurs, follow-up ceases for that individual as far as the cohort study is concerned. For that subject the length of follow-up is defined to be the time from the beginning of follow-up until the onset of disease, regardless of what happens subsequently. If the disease does not develop, follow-up continues until the subject becomes unobservable or reaches the termination date (end) of the study, whichever comes first. In this case, length of follow-up is defined to be the time from the beginning of follow-up until either of the preceding two events. We refer to a cohort study in which subjects have different maximum observation times as an open cohort study.
Consider an open cohort study conducted during a given (calendar) time period [τ0, τ1], where τ1 is the termination date of the study. Let τ be a fixed time such that τ0 < τ ≤ τ1. Suppose that subjects are recruited into the study on an ongoing basis throughout [τ0, τ] and that follow-up begins immediately after recruitment. This method of accrual is referred to as staggered entry because not all members of the cohort are placed under observation at the same time. As a result of staggered entry, subjects inevitably have different maximum observation times. For example, someone recruited at time τ0 will have a maximum observation time of τ1 − τ0, while an individual recruited at time τ will have a maximum observation time of τ1 − τ. Even if no subjects become unobservable, staggered entry and varying maximum observation times will result in subjects having different lengths of follow-up.
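As a rough illustration of how staggered entry produces different maximum observation times, the following Python sketch computes τ1 minus the entry time for a few recruitment times. The function name max_observation_time and the entry times are hypothetical, and the values τ0 = 0, τ = 5, and τ1 = 10 are assumed only for the sake of the example.

```python
def max_observation_time(entry_time, tau0, tau, tau1):
    """Longest possible follow-up for a subject recruited at entry_time.

    Assumes recruitment is confined to [tau0, tau] and that follow-up
    begins immediately after recruitment, as described in the text.
    """
    if not (tau0 <= entry_time <= tau):
        raise ValueError("entry_time must lie in the recruitment period [tau0, tau]")
    return tau1 - entry_time


# Illustrative (assumed) study dates, in years of calendar time.
tau0, tau, tau1 = 0.0, 5.0, 10.0

# Hypothetical recruitment times for four subjects.
entry_times = [0.0, 1.0, 2.5, 5.0]

max_times = [max_observation_time(e, tau0, tau, tau1) for e in entry_times]
print(max_times)
```

The earliest recruit can be observed for τ1 − τ0 years and the latest for τ1 − τ years, with all other subjects falling in between.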
For historical reasons it is usual in the survival analysis literature to refer to the study endpoint as “death” and to the length of follow-up for a given subject as the “survival” time. These and related conventions are adopted irrespective of whether the study has a mortality endpoint or not. So, for example, when we speak of a subject surviving to the end of the study we mean that, for this individual, the endpoint of interest did not occur. For a given subject, let t denote the survival time and define an indicator variable as follows: δ = 1 if the subject dies, and δ = 0 otherwise. When δ = 0 we say that t is a censored survival time, and when δ = 1 that t is uncensored. In this way, the outcome for each subject is made dichotomous, that is, censored or not. Survival data on each subject can be compactly written in vector form (t, δ), which we refer to as an observation. We say that an observation is censored or uncensored according to whether δ = 0 or δ = 1, respectively.
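The rule for forming an observation (t, δ) can be sketched in Python as follows. This is a minimal illustration rather than a prescribed procedure: the function name make_observation, its parameter names, and the example dates are all hypothetical, and the treatment of ties (onset on the very date the subject stops being observable counts as a death) is an assumption.

```python
from typing import Optional, Tuple


def make_observation(entry: float, tau1: float,
                     onset: Optional[float] = None,
                     unobservable: Optional[float] = None) -> Tuple[float, int]:
    """Return (t, delta) for one subject; all arguments are calendar times.

    onset        -- calendar time of the endpoint, or None if it never occurs
    unobservable -- calendar time the subject becomes unobservable, or None
    """
    # Observation stops at termination or when the subject becomes
    # unobservable, whichever comes first.
    stop = tau1 if unobservable is None else min(unobservable, tau1)

    # Assumption: an onset exactly at `stop` is still counted as observed.
    if onset is not None and onset <= stop:
        return (onset - entry, 1)   # uncensored: delta = 1
    return (stop - entry, 0)        # censored: delta = 0


# Hypothetical subjects, all recruited at calendar time 2.0 with tau1 = 10.0.
died = make_observation(entry=2.0, tau1=10.0, onset=6.5)
lost = make_observation(entry=2.0, tau1=10.0, unobservable=7.0)
survived = make_observation(entry=2.0, tau1=10.0)
print(died, lost, survived)
```

An onset that would have occurred after the subject became unobservable is never seen by the study, so the function returns a censored observation in that case.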
Figure 8.1(a) depicts an open cohort study involving six subjects in which the maximum observation time is 10 years. The horizontal axis is calendar time and, in the above notation, τ0 = 0, τ = 5, and τ1 = 10. The line for each subject, which we refer to as a follow-up line, spans the calendar period during which the individual was under observation. A solid dot indicates that the subject died, and a circle means that the subject was censored. So, subject 1 entered at the beginning of recruitment, was followed for 10 years, and exited the study alive. Subject 2 also entered at the beginning of recruitment but died after 3 years of follow-up. Subject 6 was enrolled at the 1-year point, was followed for 5 years, and exited the study alive.
FIGURE 8.1(a) Follow-up times for censored survival data.
Figure 8.1(a) involves two types of “time”: calendar time on the horizontal axis and survival time as depicted by the follow-up lines. If it can be assumed that such factors as recruitment and outcome are independent of calendar time, it is appropriate to “collapse” over the calendar time dimension. This results in Figure 8.1(b), in which all follow-up lines have been given the same starting point. Note that the horizontal axis is now labeled survival time.
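Collapsing over calendar time amounts to subtracting each subject's entry time from the exit time. The sketch below does this for the three subjects whose follow-up is described in the text; the exit times are inferred from those descriptions (entry time plus years of follow-up), subjects 3 to 5 are omitted because the text gives no details for them, and the function name collapse is hypothetical.

```python
# Follow-up lines as (calendar entry, calendar exit, delta), in years.
follow_up_lines = {
    1: (0.0, 10.0, 0),  # entered at 0, followed 10 years, exited alive
    2: (0.0, 3.0, 1),   # entered at 0, died after 3 years
    6: (1.0, 6.0, 0),   # entered at year 1, followed 5 years, exited alive
}


def collapse(lines):
    """Shift every follow-up line to start at 0, giving (t, delta) per subject."""
    return {
        subject: (exit_time - entry_time, delta)
        for subject, (entry_time, exit_time, delta) in lines.items()
    }


observations = collapse(follow_up_lines)
for subject, obs in observations.items():
    print(subject, obs)
```

After the shift, only the survival time and the censoring indicator remain, and the calendar date of entry is no longer used.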
Cohort data may contain information on several endpoints of interest. For example, as part of an ongoing follow-up of a group of patients with coronary artery disease, information might be collected on such endpoints as nonfatal myocardial infarction (heart attack), whether revascularization surgery was performed, and fatal myocardial infarction. The same individual could generate the observation (2.5, 1) when nonfatal myocardial infarction is the endpoint, (4.0, 1) when revascularization is the endpoint, and (6.0, 0) when fatal myocardial infarction is the endpoint. The interpretation is that this person had a nonfatal myocardial infarction 2.5 years into follow-up, underwent revascularization surgery 1.5 years later, and exited the database alive 2 years after that. The important point is that each choice of endpoint leads to a different definition of survival time and, by virtue of that, to a different cohort study.
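The dependence of the observation on the choice of endpoint can be made concrete with a short Python sketch. It reproduces the three observations quoted above for the coronary artery disease patient; the dictionary layout, the endpoint labels, and the function name observation_for_endpoint are hypothetical choices made for illustration.

```python
# Times (years since start of follow-up) at which each event occurred for
# the patient in the text. Fatal myocardial infarction never occurred.
patient_events = {
    "nonfatal_mi": 2.5,
    "revascularization": 4.0,
}
exit_time = 6.0  # the patient exited the database alive at 6 years


def observation_for_endpoint(events, endpoint, exit_time):
    """Return (t, delta) when `endpoint` is taken as the study endpoint."""
    t = events.get(endpoint)
    if t is not None and t <= exit_time:
        return (t, 1)          # endpoint observed: uncensored
    return (exit_time, 0)      # endpoint not observed: censored at exit


for endpoint in ("nonfatal_mi", "revascularization", "fatal_mi"):
    print(endpoint, observation_for_endpoint(patient_events, endpoint, exit_time))
```

The same record of events yields a different survival time and censoring indicator for each endpoint, which is why each choice defines a different cohort study.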
FIGURE 8.1(b) Survival times for censored survival data.
According to the above definition of censoring, all subjects who do not develop the disease are lumped together as censored observations. However, the causes of censoring, in particular the reasons for becoming unobservable, may differ among subjects in ways that are important to the interpretation of study findings. For example, consider a cohort of patients with a particular type of cancer who have been treated with an innovative therapy and who are now being followed for death due to that disease. A subject who is censored as a result of being struck dead by lightning presumably had a mortality risk from cancer that was no different from any other randomly selected member of the cohort. This type of censoring is said to be uninformative because a knowledge of the censoring mechanism does not tell us anything about the risk of experiencing the endpoint of interest. When censoring is uninformative, individuals censored at a given point during follow-up are a random sample of the members of the cohort surviving to that time point (Clayton and Hills, 1993, §7.5).
Now consider a subject who is censored as a result of being lost to follow-up after moving out of the study area. Suppose the reason this person decided to move was a dramatic remission of disease. Had this person remained in the study, there would have been a less than average chance that death from cancer would have occurred during follow-up. This type of censoring is said to be informative because a knowledge of the censoring mechanism tells us something about the risk of experiencing the endpoint of interest. When censoring is informative, individuals censored at a given point during follow-up are a nonrandom sample of the members of the cohort surviving to that time point, and this can lead to biased risk estimates. In the present example, the type of censoring described would result in the mortality risk being overestimated by the study, because the subjects who remain under observation are, on average, at higher risk than those who were censored. Consider a comparative study in which informative censoring takes place in both the exposed and unexposed cohorts. In most situations it is reasonable to assume that the risk estimates for both cohorts will be biased in the same direction. Consequently, when the risk estimates are combined into a measure of effect, the biases will tend to cancel each other out, to a greater or lesser extent. This means that informative censoring is usually of greater concern when the data are being analyzed in absolute rather than relative terms.
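A small numerical sketch may help show why biases in the same direction tend to cancel in a relative measure. The true risks and the bias factor below are invented, and the assumption that both cohorts are overestimated by exactly the same multiplicative factor is a deliberate simplification that need not hold in practice; with unequal factors the cancellation would be only partial.

```python
# Hypothetical true risks in the exposed and unexposed cohorts.
true_exposed, true_unexposed = 0.20, 0.10

# Assumed common multiplicative overestimation caused by informative censoring.
bias_factor = 1.25

est_exposed = true_exposed * bias_factor
est_unexposed = true_unexposed * bias_factor

# Relative measure of effect: the common factor cancels.
true_ratio = true_exposed / true_unexposed
est_ratio = est_exposed / est_unexposed

# Absolute measure of effect: the common factor does not cancel.
true_difference = true_exposed - true_unexposed
est_difference = est_exposed - est_unexposed

print(f"risk ratio:      true {true_ratio:.3f}, estimated {est_ratio:.3f}")
print(f"risk difference: true {true_difference:.3f}, estimated {est_difference:.3f}")
```

Under these assumptions the estimated ratio equals the true ratio, whereas the estimated difference is inflated by the bias factor.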
In a particular study, the endpoint might be quite narrowly defined, for example, death from a specific cause, onset of a certain illness, or recovery following a particular type of treatment. In each instance, only the specified endpoint is of interest and all other exits from the cohort are treated as censored observations. For example, consider a cohort study of breast cancer patients where the endpoint is death from this disease. In this setting, any reason for a subject becoming unobservable (in particular, death from a cause other than breast cancer) results in a censored observation. In a sense, the survival analysis is conducted as if death from breast cancer were the only possible cause of death and all subjects, if followed long enough, would eventually die of this disease. Although such an assumption is usually unrealistic, it offers certain conceptual advantages. In particular, when cohorts are being compared in the same study or across studies, observed mortality differences will be specific to the endpoint of interest and not obscured (confounded) by extraneous factors related to censoring.
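The coding of exits for a narrowly defined endpoint can be sketched as follows. The exit-reason labels, the follow-up times, and the function name delta_for are hypothetical; the only rule taken from the text is that every exit other than the specified endpoint is treated as censored.

```python
ENDPOINT = "death_breast_cancer"


def delta_for(exit_reason, endpoint=ENDPOINT):
    """Return 1 if the subject exited because of the endpoint, 0 otherwise."""
    return 1 if exit_reason == endpoint else 0


# Hypothetical exits: (years of follow-up, reason for leaving the cohort).
exits = [
    (4.2, "death_breast_cancer"),
    (6.0, "death_other_cause"),
    (3.1, "lost_to_follow_up"),
    (10.0, "end_of_study"),
]

observations = [(t, delta_for(reason)) for t, reason in exits]
print(observations)
```

Only the first subject contributes an uncensored observation; death from another cause is coded in the same way as loss to follow-up or survival to the end of the study.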
The methods of analyzing censored survival data presented in this book are all based on the assumption that censoring is uninformative, an assumption that may not be satisfied in practice. When censoring is informative, this must be considered at some point in the survival analysis. One approach is to model the censoring mechanism as part of the survival analysis in an effort to account for informative censoring. This requires information on the reasons for censoring, and usually this degree of detail is unavailable. A practical alternative is to perform the survival analysis under the assumption that censoring is uninformative and then use qualitative arguments, based on what may be known or suspected about the censoring mechanism, to decide whether a parameter estimate is appreciably biased.