Appendix: Estimating cumulative risk and assessment of Weibull model fit.


During follow-up, a histologic outcome is known only to have occurred at some time between a prior screening visit and the final biopsy visit at which CIN2+ was diagnosed (“interval censoring”). We assign each CIN2+ to the interval between the second-to-last screening visit and the biopsy visit at which the CIN2+ was diagnosed.


We take the second-to-last screening visit as the left endpoint of this interval because the false-negative rate of cotesting is very low,¹ so CIN2+ is unlikely to have been present before that visit. Women diagnosed with prevalent CIN2+ at enrollment were excluded when estimating post-enrollment risk. A CIN2 diagnosed during follow-up censored the woman for CIN3 or cancer outcomes.
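The interval assignment described above can be sketched as a small helper. This is a hypothetical illustration, not the study's code. The function name `outcome_interval`, the time unit (years since enrollment), the convention that women without CIN2+ are right-censored at their last screen, and the rule that a case with only the enrollment screen before the biopsy counts as prevalent are all assumptions.

```python
import math

def outcome_interval(screen_times, biopsy_time=None):
    """Return the censoring interval (L, R] for one woman's CIN2+ outcome.

    screen_times: sorted screening-visit times in years since enrollment (0 = enrollment).
    biopsy_time: time of the biopsy that diagnosed CIN2+, or None if no CIN2+ was found.
    """
    if biopsy_time is None:
        # Assumed convention: a woman without CIN2+ is right-censored at her last screen.
        return (screen_times[-1], math.inf)
    # Screening visits at or before the diagnosing biopsy
    prior = [s for s in screen_times if s <= biopsy_time]
    if len(prior) < 2:
        # Hypothetical rule: only the enrollment screen precedes the biopsy, so the
        # case is treated as prevalent and excluded from post-enrollment risk.
        raise ValueError("prevalent CIN2+ at enrollment")
    # CIN2+ is assigned between the second-to-last screen and the diagnosing biopsy.
    return (prior[-2], biopsy_time)
```

For example, a woman screened at 0, 3 and 5 years whose CIN2+ is diagnosed at a biopsy at 5.5 years contributes the interval (3, 5.5].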


Cumulative risk was estimated as $(1-p)r + p$, where $p$ is the risk at enrollment and $r$ is the post-enrollment risk accrued over time, as estimated by the Weibull survival model. We parameterize the post-enrollment risk as $r = 1 - \exp\{-(t/\lambda)^{\kappa}\}$, where $\lambda$ is the Weibull scale and $\kappa$ is the Weibull shape.
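The risk formula above can be written as two small functions. This is a minimal sketch: the names `lam` and `kappa` stand for the Weibull scale and shape, `t` is assumed to be in years since enrollment, and the numbers in the usage line are made up for illustration, not study estimates.

```python
import math

def weibull_post_enrollment_risk(t, lam, kappa):
    """r(t) = 1 - exp(-(t/lam)**kappa): Weibull cumulative incidence t years after enrollment."""
    return 1.0 - math.exp(-(t / lam) ** kappa)

def cumulative_risk(t, p, lam, kappa):
    """(1 - p) * r(t) + p: enrollment risk p plus post-enrollment risk r(t)
    accrued by the fraction 1 - p of women without CIN2+ at enrollment."""
    return (1.0 - p) * weibull_post_enrollment_risk(t, lam, kappa) + p

# Illustrative values only (not estimates from the study): 5-year cumulative risk
print(cumulative_risk(5.0, p=0.01, lam=40.0, kappa=1.1))
```

At $t = 0$ the cumulative risk reduces to the enrollment risk $p$, and it rises toward 1 as $t$ grows.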

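We estimated the variance on the complementary log-log scale, $\log(-\log S(t))$, to construct 95% confidence intervals, and then transformed the intervals back to the cumulative risk scale. $S(t)$ is one minus the cumulative risk, so $S(t) = (1-p)\exp\{-(t/\lambda)^{\kappa}\}$, where $\lambda$ and $\kappa$ are the Weibull scale and shape. Thus $\log(-\log S(t)) = \log\{\log(1/(1-p)) + (t/\lambda)^{\kappa}\}$. Using standard delta-method arguments, the variance of $\log(-\log S(t))$ is

$$
\operatorname{Var}\{\log(-\log \hat S(t))\} \approx \frac{1}{\left\{\log\frac{1}{1-p} + \left(\frac{t}{\lambda}\right)^{\kappa}\right\}^{2}} \left[ \frac{1}{(1-p)^{2}}\,\frac{p(1-p)}{n} + \left(\frac{t}{\lambda}\right)^{2\kappa} \left\{ \frac{\kappa^{2}}{\lambda^{2}}\operatorname{Var}(\hat\lambda) + \log^{2}\!\left(\frac{t}{\lambda}\right)\operatorname{Var}(\hat\kappa) - \frac{2\kappa}{\lambda}\log\!\left(\frac{t}{\lambda}\right)\operatorname{Cov}(\hat\lambda, \hat\kappa) \right\} \right],
$$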

where n is the number of women in the co-test category of interest.
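The delta-method interval can be computed as follows. This sketch assumes, hypothetically, that point estimates of $p$, $\lambda$ and $\kappa$, together with $\operatorname{Var}(\hat\lambda)$, $\operatorname{Var}(\hat\kappa)$ and $\operatorname{Cov}(\hat\lambda, \hat\kappa)$ on the $(\lambda, \kappa)$ scale, are already available. The function and argument names are illustrative. As in the variance expression in this section, $p$ is treated as uncorrelated with the Weibull estimates, and $\operatorname{Var}(\hat p) = p(1-p)/n$. The code also requires $t > 0$.

```python
import math

def cumulative_risk_ci(t, p, n, lam, kappa, var_lam, var_kappa, cov_lam_kappa, z=1.96):
    """Point estimate and delta-method CI for (1 - p) r(t) + p on the cloglog scale.

    Returns (estimate, lower, upper). Requires t > 0 and 0 <= p < 1.
    """
    u = (t / lam) ** kappa                 # (t/lambda)^kappa
    d = math.log(1.0 / (1.0 - p)) + u      # -log S(t)
    g = math.log(d)                        # log(-log S(t))
    log_ratio = math.log(t / lam)
    var_p = p * (1.0 - p) / n              # binomial variance of the enrollment risk
    # Partial derivatives of g with respect to p, lambda and kappa
    dg_dp = 1.0 / ((1.0 - p) * d)
    dg_dlam = -kappa * u / (lam * d)
    dg_dkappa = u * log_ratio / d
    var_g = (dg_dp ** 2 * var_p
             + dg_dlam ** 2 * var_lam
             + dg_dkappa ** 2 * var_kappa
             + 2.0 * dg_dlam * dg_dkappa * cov_lam_kappa)
    se = math.sqrt(var_g)

    def to_risk(x):
        # Cumulative risk = 1 - S(t) = 1 - exp(-exp(g)), increasing in g
        return 1.0 - math.exp(-math.exp(x))

    return to_risk(g), to_risk(g - z * se), to_risk(g + z * se)
```

Because the cumulative risk is increasing in $g = \log(-\log S(t))$, the back-transformed limits stay in order. They also always fall in (0, 1), which is one reason to build the interval on this scale rather than on the risk scale directly.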


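However, SAS Proc LIFEREG parameterizes the Weibull scale as $\lambda = \exp(\mu)$ ($\mu$ is the ‘intercept’) and the shape as $\kappa = 1/\sigma$ ($\sigma$ is the ‘scale’), and it returns variances and covariances on this scale. Using the delta method, we find $\operatorname{Var}(\hat\lambda) = \exp(2\mu)\operatorname{Var}(\hat\mu)$, $\operatorname{Var}(\hat\kappa) = \operatorname{Var}(\hat\sigma)/\sigma^{4}$, and $\operatorname{Cov}(\hat\lambda, \hat\kappa) = -\exp(\mu)\operatorname{Cov}(\hat\mu, \hat\sigma)/\sigma^{2}$.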


To facilitate estimation of cumulative incidence and regression modeling over time under interval censoring, we used Weibull survival models². The Weibull model can be reasonable when follow-up times are short relative to a woman’s lifespan. Under the Weibull model, the log of the cumulative incidence is approximately linear in the log of follow-up time (the multistage model of carcinogenesis proposed by Armitage and Doll³ also has this property). We assessed this linearity with plots of the complementary log-log of the survival against log time for each model and saw no reason to doubt linearity (plots not shown).
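For a Weibull survival curve $S(t) = \exp\{-(t/\lambda)^{\kappa}\}$, $\log(-\log S(t)) = \kappa\log t - \kappa\log\lambda$ exactly. Because $\log(1 - S(t)) \approx \log(-\log S(t))$ when the cumulative incidence is small, the log cumulative incidence is then approximately linear in $\log t$. The sketch below computes the plotting transform and fits a least-squares line. In practice $S(t)$ would come from a nonparametric estimate of the data rather than from a Weibull curve, and a large residual would flag curvature. The helper names and parameter values are illustrative assumptions.

```python
import math

def cloglog_points(times, surv):
    """Map (t, S(t)) pairs to (log t, log(-log S(t))); requires t > 0 and 0 < S(t) < 1."""
    return [(math.log(t), math.log(-math.log(s))) for t, s in zip(times, surv)]

def fit_line(points):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def max_abs_residual(points, slope, intercept):
    """Largest vertical distance from the fitted line: a crude linearity check."""
    return max(abs(y - (intercept + slope * x)) for x, y in points)

# Illustrative Weibull survival curve (made-up parameters, not study estimates)
lam, kappa = 20.0, 1.2
times = [0.5, 1.0, 2.0, 3.0, 4.0, 5.0]
surv = [math.exp(-(t / lam) ** kappa) for t in times]
pts = cloglog_points(times, surv)
slope, intercept = fit_line(pts)
print(slope, intercept, max_abs_residual(pts, slope, intercept))
```

Under an exact Weibull curve, the fitted slope recovers $\kappa$ and the intercept recovers $-\kappa\log\lambda$, with residuals at rounding level.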

We also assessed the fit of the Weibull model by comparing the Weibull cumulative incidence estimates with the sum of the incidence at enrollment (or at the return visit following an enrollment HPV-/Pap-) and the nonparametric cumulative incidence estimate based on the Turnbull algorithm⁴. Under interval censoring, the Turnbull estimates provide valid cumulative incidence estimates regardless of the true shape of the curve, just as the Kaplan-Meier estimator does in the standard case of right censoring. Comparing the Weibull estimates with the Turnbull estimates therefore assesses how well the Weibull model fits the data. We prefer the Weibull estimates because the Turnbull estimates may not exist for certain intervals (depending on whether any cases were interval censored in those years). When they do exist, they can be flat for years at a time, with sudden upward jumps when there is a cluster of cases. In contrast, the Weibull model smooths these peaks and valleys and produces more useful and accurate cumulative incidence estimates when the underlying Weibull model is reasonable. The Weibull model also facilitates regression modeling under interval censoring.
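Turnbull's nonparametric estimate can be computed with the self-consistency (EM) iterations sketched below. This is a minimal illustration, not the software used in this analysis. It assumes each woman contributes a half-open interval (L, R] in years, with R = `math.inf` for right censoring, and all function names are hypothetical. The estimate places mass only on the "innermost" intervals, each formed by a left endpoint immediately followed by a right endpoint in sorted order. F(t) is undetermined for t strictly inside one of these intervals. This is one concrete way the Turnbull curve can fail to exist over some intervals and can jump at others.

```python
import math

def turnbull_intervals(intervals):
    """Innermost intervals (lo, hi] for half-open observations (L, R]."""
    # Tag 1 = left endpoint, 0 = right endpoint. At tied values a right endpoint
    # sorts first, because (a, v] and (v, b] do not overlap.
    points = sorted([(left, 1) for left, _ in intervals] +
                    [(right, 0) for _, right in intervals])
    return [(v1, v2) for (v1, tag1), (v2, tag2) in zip(points, points[1:])
            if tag1 == 1 and tag2 == 0]

def turnbull_npmle(intervals, tol=1e-10, max_iter=10_000):
    """Self-consistency iterations for the probability mass on each innermost interval."""
    inner = turnbull_intervals(intervals)
    n, m = len(intervals), len(inner)
    # alpha[i][j] = 1 if innermost interval j lies inside observation i's interval
    alpha = [[1.0 if left <= lo and hi <= right else 0.0 for lo, hi in inner]
             for left, right in intervals]
    s = [1.0 / m] * m
    for _ in range(max_iter):
        new = [0.0] * m
        for row in alpha:
            denom = sum(a * sj for a, sj in zip(row, s))
            for j in range(m):
                new[j] += row[j] * s[j] / (n * denom)
        converged = max(abs(a - b) for a, b in zip(new, s)) < tol
        s = new
        if converged:
            break
    return inner, s

def turnbull_cumulative_incidence(inner, s, t):
    """F(t), or None when t lies strictly inside an innermost interval (F not identified there)."""
    if any(lo < t < hi for lo, hi in inner):
        return None
    return sum(sj for (lo, hi), sj in zip(inner, s) if hi <= t)
```

For example, `turnbull_npmle([(0, 1), (0, 1), (1, 2), (0, 2)])` finds the innermost intervals (0, 1] and (1, 2] and places approximately 2/3 and 1/3 of the mass on them. F(0.5) is left undetermined.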


References

1. Belinson J, Qiao YL, Pretorius R, et al. Shanxi Province Cervical Cancer Screening Study: a cross-sectional comparative trial of multiple techniques to detect cervical neoplasia. Gynecol Oncol. Nov 2001;83(2):439-444.


2. Lawless JF. Statistical models and methods for lifetime data. 2nd ed. Hoboken, N.J.: Wiley-Interscience; 2003.

3. Armitage P, Doll R. The age distribution of cancer and a multi-stage theory of carcinogenesis. Br J Cancer. Mar 1954;8(1):1-12.

4. Turnbull B. The empirical distribution function with arbitrarily grouped, censored and truncated data. J R Stat Soc B. 1976;38:290-295.