In some cohort studies, exact death times and censoring times are not available. This is often the case with large surveillance systems such as cancer registries, where patient visits are scheduled on a routine basis. For those individuals who die or are censored between appointments, all that may be known is that they survived to the last follow-up time. In this case we say that the survival times are interval-censored and that the data are grouped. The actuarial method is a classical approach to the analysis of interval-censored survival data which has its roots in life table analysis.
The actuarial method differs from the Kaplan–Meier method in that intervals are determined by the investigator rather than based on observed death times. Let $\tau_0 = 0$, let $\tau_{J+1}$ be the maximum observation time, and let $\tau_1 < \tau_2 < \cdots < \tau_J$ be $J$ intermediate time points. The actuarial approach begins by partitioning the period of follow-up into $J+1$ intervals: $[\tau_0, \tau_1), [\tau_1, \tau_2), \ldots, [\tau_j, \tau_{j+1}), \ldots, [\tau_{J-1}, \tau_J), [\tau_J, \tau_{J+1}]$.
As before, we refer to $[\tau_j, \tau_{j+1})$ as the $j$th interval. Let $a_j$ and $c_j$ be the numbers of deaths and censored observations in the $j$th interval, respectively $(j = 0, 1, \ldots, J)$. With interval-censored data we have no knowledge of the precise death times or censoring times, but this does not affect the counts $a_j$ and $c_j$. Although the definitions of $a_j$ and $c_j$ are formally the same as those used in the Kaplan–Meier setting, a difference here is that deaths in the $j$th interval are permitted to occur throughout the interval rather than only at $\tau_j$. A further difference is that $a_0$ is not necessarily equal to 0. The $j$th risk set is defined to be the group of subjects surviving to at least $\tau_j$ $(j = 0, 1, \ldots, J)$. We adopt the convention that subjects who die at $\tau_j$ are included in the risk set. Let $r_j$ denote the number of subjects in the $j$th risk set $(j = 0, 1, \ldots, J)$, and denote by $r_{J+1}$ the number of subjects who survive to $\tau_{J+1}$. As before, we define $c'_j = c_j$ for $j < J$ and $c'_J = c_J - r_{J+1}$.
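The definitions above imply a simple bookkeeping rule: a subject is in the next risk set only if they neither died nor were censored in the current interval, so $r_{j+1} = r_j - a_j - c'_j$. The recursion is inferred from the definitions rather than stated in the text, and the function and variable names below are illustrative. A minimal Python sketch, using the counts reported in Table 9.13:

```python
# Counts for intervals j = 0..4 (so J = 4), taken from Table 9.13.
a = [5, 17, 11, 10, 6]        # deaths a_j
c_prime = [2, 2, 1, 1, 132]   # censored c'_j (survivors to tau_{J+1} excluded)
r0 = 199                      # size of the initial risk set r_0


def risk_sets(r0, deaths, censored):
    """Return [r_0, ..., r_{J+1}] using r_{j+1} = r_j - a_j - c'_j."""
    r = [r0]
    for a_j, c_j in zip(deaths, censored):
        r.append(r[-1] - a_j - c_j)
    return r


r = risk_sets(r0, a, c_prime)
print(r)  # [199, 192, 173, 161, 150, 12]
```

The last entry is $r_{J+1}$, the number of subjects still alive at the end of follow-up.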
In order to estimate the survival curve it is necessary to make certain assumptions about its functional form and the distribution of censoring times. Specifically, we assume that $S(t)$ is a continuous function that is linear on each of the intervals. In other words, the graph of $S(t)$ is a series of line segments that meet at values corresponding to the endpoints of intervals. We also assume that censoring for reasons other than survival to $\tau_{J+1}$ takes place uniformly throughout each interval. Consequently, all censoring, except that due to survival to $\tau_{J+1}$, occurs on average at the midpoint of each interval. Let $p_j$ denote the conditional probability of surviving to $\tau_{j+1}$, given survival to $\tau_j$, and let $q_j = 1 - p_j$ be the corresponding conditional probability of dying $(j = 0, 1, \ldots, J)$.
The actuarial approach to estimating the survival function proceeds along the lines of the Kaplan–Meier method. The denominator of $\hat{q}_j$ is $r_j$, and the numerator is defined to be the total number of deaths in the $j$th interval. The latter quantity is the sum of the $a_j$ observed deaths plus the number of unobserved deaths among the $c'_j$ censored subjects. With the preceding assumptions about the survival curve and censoring patterns, the number of unobserved deaths is estimated to be $(\hat{q}_j/2)c'_j$. So an estimate of $q_j$ is $\hat{q}_j = [a_j + (\hat{q}_j/2)c'_j]/r_j$, which can be solved for $\hat{q}_j$ to give

$$\hat{q}_j = \frac{a_j}{r_j - (c'_j/2)} \tag{9.12}$$

$(j = 0, 1, \ldots, J)$. The denominator $r_j - (c'_j/2)$ will be denoted by $r'_j$ and referred to as the "effective" sample size. This terminology is appropriate since $r'_j$ can be thought of as the number of subjects who would need to be at risk in the absence of censoring in order to give the estimate (9.12). Note that $r'_j$ may not be an integer.
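To make the algebra behind (9.12) explicit, the steps can be written out as below, with $c'_j$ denoting the censored count after removing survivors to $\tau_{J+1}$ and $r'_j$ the effective sample size. The numerical check uses the $j = 4$ row of Table 9.13; nothing beyond the quantities already defined is assumed.

```latex
\begin{align*}
\hat{q}_j \, r_j &= a_j + \tfrac{1}{2}\hat{q}_j \, c'_j \\
\hat{q}_j \left( r_j - \tfrac{1}{2} c'_j \right) &= a_j \\
\hat{q}_j &= \frac{a_j}{r_j - c'_j/2} = \frac{a_j}{r'_j}.
\end{align*}
% Interval j = 4 of Table 9.13: a_4 = 6, r_4 = 150, c'_4 = 132.
\[
r'_4 = 150 - \tfrac{132}{2} = 84.0, \qquad
\hat{q}_4 = \tfrac{6}{84} \approx 0.071, \qquad
1 - \hat{q}_4 \approx 0.929 .
\]
```

Here the effective sample size happens to be a whole number; in the other rows of the table (for example $172.5$ and $160.5$) it is not.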
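With $\hat{p}_j = 1 - \hat{q}_j$, we have the estimates

$$\hat{S}_j = \hat{p}_0 \hat{p}_1 \cdots \hat{p}_{j-1} \tag{9.13}$$

$$\operatorname{var}(\hat{S}_j) = (\hat{S}_j)^2 \sum_{i=0}^{j-1} \frac{\hat{q}_i}{\hat{p}_i r'_i}$$

and

$$\operatorname{var}[\log(-\log \hat{S}_j)] = \frac{1}{(\log \hat{S}_j)^2} \sum_{i=0}^{j-1} \frac{\hat{q}_i}{\hat{p}_i r'_i}$$

$(j = 1, 2, \ldots, J+1)$. A graph of the actuarial survival curve is obtained by plotting the $\hat{S}_j$ and then joining these points by straight line segments.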
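The estimates above can be sketched in a few lines of Python. This is an illustration rather than a reference implementation: the input counts are those of Table 9.13, the variable names and the helper `survival_at` are hypothetical, and $\hat{S}_0 = 1$ is taken as the empty product. The helper evaluates the piecewise-linear curve obtained by joining the points $(\tau_j, \hat{S}_j)$.

```python
import math

tau = [0, 12, 24, 36, 48, 60]    # endpoints tau_0..tau_{J+1}, in months
a = [5, 17, 11, 10, 6]           # deaths a_j
r = [199, 192, 173, 161, 150]    # risk-set sizes r_j
c_prime = [2, 2, 1, 1, 132]      # censored c'_j

# Effective sample sizes r'_j and conditional probabilities from (9.12).
r_eff = [r_j - c_j / 2 for r_j, c_j in zip(r, c_prime)]
q = [a_j / n_j for a_j, n_j in zip(a, r_eff)]
p = [1 - q_j for q_j in q]

# Survival estimates (9.13): S[0] = 1, S[j] = p_0 * ... * p_{j-1}.
S = [1.0]
for p_j in p:
    S.append(S[-1] * p_j)

# Running sum of q_i / (p_i * r'_i), shared by both variance formulas.
g = [0.0]
for q_i, p_i, n_i in zip(q, p, r_eff):
    g.append(g[-1] + q_i / (p_i * n_i))

var_S = [S[j] ** 2 * g[j] for j in range(len(S))]
# The log(-log) variance is defined for j = 1..J+1 only, where S[j] < 1.
var_loglog = [g[j] / math.log(S[j]) ** 2 for j in range(1, len(S))]


def survival_at(t):
    """Evaluate the piecewise-linear actuarial curve at time t."""
    for j in range(len(tau) - 1):
        if tau[j] <= t <= tau[j + 1]:
            w = (t - tau[j]) / (tau[j + 1] - tau[j])
            return (1 - w) * S[j] + w * S[j + 1]
    raise ValueError("t is outside the period of follow-up")


print([round(s, 3) for s in S])  # [1.0, 0.975, 0.888, 0.831, 0.78, 0.724]
```

For example, `survival_at(30)` returns the value halfway along the segment joining $\hat{S}_2$ and $\hat{S}_3$.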
Example 9.7 (Receptor Level–Breast Cancer) Table 9.13 gives the actuarial analysis of the breast cancer data after stratifying by receptor level. The period of follow-up has been divided into 12-month blocks, and the 95% confidence intervals were estimated using the Kalbfleisch–Prentice method. Figure 9.7 shows the graph of the actuarial survival curve and the 95% confidence intervals. Not surprisingly, Figures 9.7 and 9.2 are similar.

TABLE 9.13 Actuarial Analysis: Receptor Level–Breast Cancer

| $j$ | $\tau_j$ | $a_j$ | $r_j$ | $c'_j$ | $r'_j$ | $\hat{p}_j$ | $\hat{S}_j$ | $\underline{S}_j$ | $\overline{S}_j$ |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 5 | 199 | 2 | 198.0 | .975 | 1.0 | - | - |
| 1 | 12 | 17 | 192 | 2 | 191.0 | .911 | .975 | .940 | .989 |
| 2 | 24 | 11 | 173 | 1 | 172.5 | .936 | .888 | .835 | .925 |
| 3 | 36 | 10 | 161 | 1 | 160.5 | .938 | .831 | .771 | .877 |
| 4 | 48 | 6 | 150 | 132 | 84.0 | .929 | .780 | .715 | .832 |
| 5 | 60 | - | 12 | - | - | - | .724 | .648 | .786 |

$\underline{S}_j$ and $\overline{S}_j$ are the lower and upper 95% confidence limits for $\hat{S}_j$.
FIGURE 9.7 Actuarial survival curve and Kalbfleisch–Prentice 95% confidence intervals: Breast cancer cohort
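The confidence limits in Table 9.13 can be reproduced, to rounding, with the sketch below. It rests on an assumption: the text names the Kalbfleisch–Prentice method but does not restate its formula in this section, so the code takes the interval to be $\log(-\log \hat{S}_j) \pm 1.96\,\text{se}$ back-transformed to the survival scale, which gives $\hat{S}_j^{\exp(\pm 1.96\,\text{se})}$. The variable names and the normal quantile 1.96 are likewise illustrative choices.

```python
import math

a = [5, 17, 11, 10, 6]           # deaths a_j
r = [199, 192, 173, 161, 150]    # risk-set sizes r_j
c_prime = [2, 2, 1, 1, 132]      # censored c'_j
z = 1.96                         # assumed normal quantile for a 95% interval

S, g = 1.0, 0.0                  # running survival estimate and variance sum
rows = []                        # (S_j, lower, upper) for j = 1..J+1
for a_j, r_j, c_j in zip(a, r, c_prime):
    r_eff = r_j - c_j / 2        # effective sample size r'_j
    q = a_j / r_eff              # equation (9.12)
    p = 1 - q
    S *= p                       # equation (9.13)
    g += q / (p * r_eff)
    # Standard error of log(-log S_j), from the variance formula in the text.
    se = math.sqrt(g) / abs(math.log(S))
    # Back-transform (assumed form): since 0 < S < 1, the larger exponent
    # gives the lower limit.
    lower = S ** math.exp(z * se)
    upper = S ** math.exp(-z * se)
    rows.append((S, lower, upper))

for j, (s, lo, hi) in enumerate(rows, start=1):
    print(j, f"{s:.3f}", f"{lo:.3f}", f"{hi:.3f}")
```

Because the limits are computed on the $\log(-\log)$ scale, they always lie strictly between 0 and 1, which is not guaranteed for a symmetric interval built directly from $\operatorname{var}(\hat{S}_j)$.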
CHAPTER 10
Poisson Methods for Censored Survival Data
The Kaplan–Meier method is based on relatively few assumptions; in particular, nothing is specified regarding the functional form of either the survival function or the hazard function. Censoring is assumed to be uninformative, but this is a feature of virtually all of the commonly used methods of survival analysis. Since so little structure is imposed, it is appropriate to view a Kaplan–Meier survival curve as a type of scatter plot of censored survival data. The appearance of a Kaplan–Meier curve can be used to form ideas about the nature of the underlying survival function and hazard function, in much the same way as a scatter plot is used as a visual aid in linear regression.
Despite these advantages, there are difficulties with the Kaplan–Meier approach. Kaplan–Meier curves are not designed to "smooth" the data while accounting for random variation in the way that a linear regression line is fitted to points in a scatter plot. As a result, Kaplan–Meier survival curves can be erratic in appearance and sensitive to small changes in survival times and censoring patterns, especially when the number of deaths is small. The Kaplan–Meier survival curves for the six receptor level–stage strata shown in Figure 9.6 are relatively well-behaved, but it is easy to imagine how complicated such a graph might otherwise be.
In this chapter we describe parametric methods of survival analysis based on the Weibull, exponential, and Poisson distributions. The computations required by the exponential and Poisson models are relatively straightforward, and the results are readily interpreted. However, this convenience is gained at the expense of having to make strong assumptions about the functional form of the hazard function, a decision that needs to be justified in any application.
10.1 POISSON METHODS FOR SINGLE SAMPLE SURVIVAL DATA