Estimating differences and ratios in median times to event

eAppendix: Estimating differences and ratios in median times to event

Elizabeth T. Rogawski¹, Daniel J. Westreich¹, Gagandeep Kang², Honorine D. Ward³, Stephen R. Cole¹

¹ Department of Epidemiology, University of North Carolina – Chapel Hill, North Carolina

² Division of Gastrointestinal Sciences, Christian Medical College, Vellore, India

³ Division of Geographic Medicine and Infectious Diseases, Tufts Medical Center, Boston

eFigure 1. Inverse probability-weighted Kaplan-Meier curves with age as the timescale for time to second diarrhea episode by exclusive breastfeeding at the first episode among 982 children in birth cohorts from Vellore, Tamil Nadu, India 2002-2013. A – Crude (unweighted by covariates); B – Weighted by malnutrition status at first episode, child sex, low birth weight, socioeconomic status, maternal education, and household hygiene. Note that crude and adjusted results were similar since using age as the timescale accounted for confounding by age.

The macro is provided on the following page. Users can call the macro with the following code (arguments provided are the defaults):

%time(data=_last_,id=,exp=,covs=,time=,event=,cw=1,drop=,trunclow=0,trunchigh=100,p=0.5,risktime=,b=2000,seed=12345);run;

Further details about the required arguments and analysis steps are provided at the beginning of the annotated SAS code below. For the user-input time variable, the time at baseline should be coded as 0, and no events should occur at baseline (t=0). While the macro allows the user to define the number of resamples, we recommend a minimum of 2000 resamples, beyond which we found no difference in the estimated confidence intervals at the number of significant digits appropriate to convey precision for each measure. Two confidence intervals are produced to allow users to compare and report as they prefer. We generally recommend reporting the confidence interval derived from the 2.5th and 97.5th percentiles of the bootstrap replicates, since this interval is nonparametric and does not assume that the estimates are normally distributed. Code to generate simulated time-to-event data to use with the macro is provided following the macro.
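The two intervals described above can be illustrated outside SAS. The following is a minimal Python sketch, not part of the macro: the function names `percentile` and `bootstrap_cis` are hypothetical, and the linear-interpolation percentile rule is an assumption that may differ slightly from the percentile definition PROC UNIVARIATE uses. As in the macro, ratio measures are handled on the log scale and exponentiated at the end.

```python
import math
import random
import statistics


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a pre-sorted list."""
    if not sorted_vals:
        raise ValueError("no values supplied")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def bootstrap_cis(point_estimate, replicates, ratio=False):
    """Return (percentile_ci, normal_ci) as two (lower, upper) tuples.

    percentile_ci: 2.5th and 97.5th percentiles of the bootstrap replicates.
    normal_ci: point estimate +/- 1.96 * SD of the replicates, where the SD
    of the replicates stands in for the standard error.
    For ratio measures, both are computed on the log scale and exponentiated.
    """
    vals = [math.log(r) for r in replicates] if ratio else list(replicates)
    est = math.log(point_estimate) if ratio else point_estimate
    vals.sort()
    pct = (percentile(vals, 2.5), percentile(vals, 97.5))
    sd = statistics.stdev(vals)
    norm = (est - 1.96 * sd, est + 1.96 * sd)
    if ratio:
        pct = tuple(math.exp(v) for v in pct)
        norm = tuple(math.exp(v) for v in norm)
    return pct, norm


if __name__ == "__main__":
    # Simulated replicates for illustration only (not results from the study)
    rng = random.Random(12345)
    reps = [rng.gauss(0.4, 0.1) for _ in range(2000)]
    print(bootstrap_cis(0.4, reps))
```

When the bootstrap distribution is skewed, the percentile interval will be asymmetric around the point estimate while the normal-approximation interval will not, which is one way to see why the two can disagree.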

The alternative macro designed for discrete time data can be called with the following code:

%distime(data=_last_,id=,exp=,covs=,time=,event=,cw=1,drop=,trunclow=0,trunchigh=100,p=0.5,risktime=,b=200,seed=12345);run;

For this analysis, users must download a previously published macro¹ to generate restricted quadratic splines, available at a URL provided in the notes at the beginning of the macro code.

1. Howe CJ, Cole SR, Westreich DJ, Greenland S, Napravnik S, Eron JJ Jr. Splines for trend analysis and continuous confounder control. Epidemiology. 2011;22(6):874-875. doi:10.1097/EDE.0b013e31823029dd.
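To make the estimands of the discrete-time analysis concrete, the sketch below computes a weighted product-limit survival curve from person-interval records, then reads off the smallest time at which survival is at or below p and forms the time difference and time ratio. It is a Python illustration under stated assumptions, not a translation of the macro: the function names are hypothetical, weights are taken as given, and the standard product-limit form is used, which may differ in detail from the macro's data steps.

```python
from collections import defaultdict


def weighted_survival(records):
    """Weighted product-limit curve from (time, event, weight) records.

    Each record is one person-interval: `time` is the discrete interval,
    `event` is 1 if the event occurred in that interval (else 0), and
    `weight` is the person's inverse probability weight.
    Returns a list of (time, survival) pairs sorted by time.
    """
    sum_w = defaultdict(float)        # weighted number at risk per interval
    sum_w_event = defaultdict(float)  # weighted number of events per interval
    for time, event, weight in records:
        sum_w[time] += weight
        sum_w_event[time] += weight * event
    surv = 1.0
    curve = []
    for time in sorted(sum_w):
        surv *= 1.0 - sum_w_event[time] / sum_w[time]
        curve.append((time, surv))
    return curve


def time_at_percentile(curve, p):
    """Smallest time at which S(t) <= p, or None if the curve never gets there."""
    for time, surv in curve:
        if surv <= p:
            return time
    return None


def time_contrasts(curve_exposed, curve_unexposed, p=0.5):
    """Return (time difference, time ratio), exposed versus unexposed."""
    t1 = time_at_percentile(curve_exposed, p)
    t0 = time_at_percentile(curve_unexposed, p)
    if t1 is None or t0 is None:
        # Mirrors the situation in which the macro aborts: p is smaller than
        # the survival proportion at the end of follow-up in a group.
        raise ValueError("p is smaller than end-of-study survival in a group")
    return t1 - t0, t1 / t0
```

With p=0.5 the contrasts are the difference and ratio in median times to event; other percentiles follow by changing p.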

/****************************************************************************

Macro to estimate semi-parametric time differences/ratios and risk differences/ratios (with time-to-event data, time-fixed covariates, one observation per subject)

User names in the following order:

- data = dataset name

- id = unique identifier for subjects

- exp = exposure variable (coded 0=unexposed, 1=exposed)

- covs = covariates list separated by a space (no commas)

- time = time to event (coded baseline=0; no events can occur at time=0)

- event = indicator for event (coded 0=censored/drop-out, 1=event)

- cw = indicator for macro to include censoring weights: 1=yes; 0=no (default = 1 (yes))

- drop = indicator for dropout (coded 1=dropout, 0=not drop-out)

- trunclow = lower truncation percentile for weights (default = 0)

- trunchigh = higher truncation percentile for weights (default = 100)

- p = percentile for calculating contrast (default = 50%)

- risktime = time to calculate risk difference and risk ratio

- b = # bootstraps (default = 2000; we recommend at least 2000 resamples in final estimation, but suggest optimizing the analysis with fewer resamples to save computing time)

- seed = seed for random number generation

Macro outputs: Time difference and time ratio at p%, and risk difference and risk ratio at t=risktime with 95% CI derived from distribution and standard deviation of bootstrapped estimates

Macro performs the following steps:

1. Check and abort if user-requested survival percentile (p) is smaller than the survival proportion at the end of the study in one or both exposure groups

2. Check and abort if user-requested time to calculate risk difference/ratio (risktime) is larger than total study time

3. Resample data set of size n with replacement b times for bootstrapping 95% CIs

4. Calculate treatment weights as P(exposure|covariates) based on a logistic model

5. Calculate censoring weights as P(drop|covariates) in time deciles based on a logistic model if cw=1

6. Truncate IPTC weights

7. Draw weighted Kaplan-Meier curves for exposed and unexposed groups (recheck conditions in steps 1 and 2 on weighted data)

8. Calculate time difference and time ratio at p%

9. Calculate risk difference and ratio at t=risktime

10. Report 2.5th and 97.5th percentile of distribution of bootstrapped estimates as lower and upper 95% confidence limits respectively

11. Calculate 95% CI using standard deviation of bootstrapped estimates as standard error of original point estimate

12. Print results

13. Warning is printed in the log file if the number of bootstraps included in the confidence intervals is less than requested due to failures in resamples with insufficient events

Note on weight truncation:

The default truncation percentiles for the IP weights correspond to no truncation. No truncation will often be appropriate since the exposure weights are estimated in the time-fixed setting (baseline covariates only), in which the weights should be fairly stable. However, the censoring weights are time-varying (in time deciles) and may be less stable due to the multiplication of weights over time. When incorporating the censoring weights, truncating at the 0.5th and 99.5th percentiles will likely be appropriate. The mean of stabilized weights should be close to 1.

Note on bootstrap resamples:

While the macro allows the user to define the number of resamples, we recommend 2000 resamples as the minimum for final analysis. However, we recommend optimizing the analysis with fewer resamples in the interest of computing time.

Note on confidence intervals:

Two confidence intervals are produced to allow users to compare and report as they prefer. We generally recommend reporting the confidence interval derived from the 2.5th and 97.5th percentiles of the bootstrap replicates since this interval does not require the assumption of normal distribution of estimates.

****************************************************************************/

%macro time(data=_last_,id=,exp=,covs=,time=,event=,cw=1,drop=,trunclow=0,trunchigh=100,p=0.5,risktime=,b=2000,seed=12345);

*1. Abort immediately if user-requested p is smaller than the survival proportion at the end of the study;

proc lifetest data=&data plots=none method=pl outsurv=test noprint;

time &time*&event(0);

strata &exp;

run;

data test2;

set test;

where . < survival <= &p;

run;

proc sort data=test2; by stratum &time;run;

data test3;

set test2;

by stratum ;

if first.stratum then do;

output;

end;

run;

%LET dsid=%SYSFUNC(OPEN(test3));

%LET nobs=%SYSFUNC(ATTRN(&dsid.,NOBS));

%LET rc=%SYSFUNC(CLOSE(&dsid.));

%IF &Nobs. EQ 0 or &Nobs. EQ 1 %THEN %DO; %PUT ERROR: p TOO SMALL. User-requested survival percentile (p) smaller than the survival proportion at the end of the study in one or both exposure groups.; %ABORT cancel; %END;

*2. Abort immediately if user-requested risktime is larger than total study time;

data test4;

set test;

diff = abs(&time-(&risktime));

run;

proc sort data=test4; by stratum diff;run;

data test5;

set test4;

by stratum ;

if first.stratum then do;

output;

end;

run;

data test6;

set test5;

where survival ~= . ; run;

%LET dsid=%SYSFUNC(OPEN(test6));

%LET nobs=%SYSFUNC(ATTRN(&dsid.,NOBS));

%LET rc=%SYSFUNC(CLOSE(&dsid.));

%IF &Nobs. EQ 0 or &Nobs. EQ 1 %THEN %DO; %PUT ERROR: risktime TOO LARGE. User-requested time to calculate risk difference/ratio (risktime) larger than total study time.; %ABORT cancel; %END;

*3. Resample dataset for bootstrapping;

proc surveyselect data=&data noprint

method=urs outhits samprate=1 reps=&b out=b seed=&seed;

run;

*Concatenate original data with resampled data;

data &data;

replicate =0;

set &data;

run;

data c;

set &data b;

run;

*4. Calculate treatment weights for original and resampled data;

proc logistic descending data=c noprint;

model &exp=;

output out=o1 prob=tn;

by replicate;

run;

proc logistic descending data=c noprint;

model &exp=&covs;

output out=o2 prob=td;

by replicate;

run;

proc sort data=o1; by replicate &id; proc sort data=o2; by replicate &id;run;

data o12;

merge o1 o2;

by replicate &id;

run;

data d(drop=_level_ tn td);

set o12;

by replicate &id;

if &exp=0 then do;

tn=1-tn; td=1-td;

end;

tw=1/td;

stw=tn/td;

run;

*5. Expand data set to deciles to add censoring weights;

%if &cw = 1 %then %do;

data e;

set d;

by replicate &id;

do dec=1 to 10;

output;

end;

run;

*Output deciles of time based on distribution of dropouts;

proc univariate data=d noprint; var &time; output out = dropdec pctlpts = 10 20 30 40 50 60 70 80 90 pctlpre = drop; where replicate = 0 and &drop = 1;run;

proc univariate data=d noprint; var &time; output out = dec pctlpts = 0 100 pctlpre = time; where replicate = 0;run;

*Add to main data;

proc transpose data=dropdec out=dropdec2;run; data dropdec2; set dropdec2;

keep col1; run;

proc transpose data=dec out=dec2; run; data dec2; set dec2; keep col1; run;

data dropdec2b; set dropdec2 dec2; run;

proc sort data=dropdec2b; by col1;run;

data dropdec3;

set dropdec2b;

retain dec;

dec + 1;

run;

data dropdec4;

set dropdec3;

dec = dec+1;

rename col1 = col2;

label col1 = col2;

run;

data dropdec5;

merge dropdec3 dropdec4;

by dec;

if dec = 1 then col2 = 0;

if dec = 12 or dec = 1 then delete; run;

data dropdec5b;

set dropdec5;

dec = dec - 1;

rename col2 = in col1 = end;

label col2 = in col1 = end; run;

data dropdec5c;

set dropdec5b;

if dec = 1 then in = 0;

run;

proc sort data=e; by dec; run; proc sort data=dropdec5c; by dec;run;

data f;

merge e dropdec5c;

by dec;

run;

proc sort data=f; by replicate &id dec; run;

*Create discretized event (y2) and drop (drop2) variables;

data g;

set f;

if &time > end then do;

y2 = 0; drop2 = 0; out = end; end;

else if ((in < &time <= end) and &event = 1) then do;

y2 = 1; drop2 = 0; out = &time; end;

else if ((in < &time <= end) and &drop = 1) then do;

y2 = 0; drop2 = 1; out = &time; end;

else if ((in < &time <= end) and &drop = 0 and &event = 0) then do;

y2 = 0; drop2 = 0; out = &time; end;

if &time <= in then delete;

run;

*Calculate censoring weights;

proc logistic data=g desc noprint; *pooled logistic model for dropout;

class dec;

model drop2=dec &exp /rl;

output out=nd p=nd;

by replicate;

run;

data nd;

set nd;

label nd=nd;

keep replicate &id dec nd; run;

proc logistic data=g desc noprint;

class dec;

model drop2=dec &exp &covs/rl;

output out=dd p=dd;

by replicate;

run;

data dd;

set dd;

label dd=dd;

keep replicate &id dec dd;run;

data h;

merge g nd dd;

by replicate &id dec;

run;

proc sort data=h; by replicate &id dec numberhits;run;

data h2;

set h;

by replicate &id dec numberhits;

retain hit;

if first.numberhits then hit = 0;

hit = hit + 1;

run;

proc sort data=h2; by replicate &id hit dec;run;

data h3;

set h2;

by replicate &id hit;

retain numd dend;

if first.hit then do; numd=1; dend=1; end;

if drop2=0 then do;

numd=numd*(1-nd);

dend=dend*(1-dd);

end;

if drop2=1 then do;

numd=numd*nd;

dend=dend*dd;

end;

dw=numd/dend; *censoring weights;

fw=stw*dw; *product of exposure weights and censoring weights;

run;

*6. Truncate IPTC weights;

proc univariate data =h3 noprint;var fw; output out=fw pctlpts=&trunclow

&trunchigh pctlpre=fw

pctlname=low high;where replicate =0 ; run;

data h4;

set h3;

if _n_ eq 1 then do;

set fw;

end;

run;

data h5;

set h4;

if fw > fwhigh then fw = fwhigh;

else if . < fw < fwlow then fw = fwlow;

run;

proc sort data=h5; by replicate &id hit dec;run;

*7. Draw weighted KM curves;

proc phreg data=h5 noprint;

model (in,out)*y2(0)= /rl;

weight fw;

strata &exp;

baseline out=i survival=survival;

by replicate;

run;

%end;

%else %if &cw = 0 %then %do;

data h3;

set d;

fw = stw;

in = 0;

out = &time;

run;

*6. Truncate IPT weights;

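proc univariate data =h3 noprint;var fw; output out=fw pctlpts=&trunclow

&trunchigh pctlpre=fw

pctlname=low high;where replicate =0 ; run;

data h4;

set h3;

if _n_ eq 1 then do;

set fw;

end;

run;

data h5;

set h4;

if fw > fwhigh then fw = fwhigh;

else if . < fw < fwlow then fw = fwlow;

run;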

*7. Draw weighted KM curves;

proc phreg data=h5 noprint;

model (in,out)*&event(0)= /rl;

weight fw;

strata &exp;

baseline out=i survival=survival;

by replicate;

run;

%end;

*Output time at p% survival for each exposure group (smallest time at which S(t) <= p);

data j;

set i;

where survival <= &p;

run;

proc sort data=j; by replicate &exp out;run;

data k;

set j;

by replicate &exp ; if first.&exp then do;

output;

end;

run;

*Macro aborts if survival percentile smaller than total survival proportion in either exposure group (recheck after weighting);

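data test7;

set k;

where . < survival <= &p and replicate = 0;

run;

%LET dsid=%SYSFUNC(OPEN(test7));

%LET nobs=%SYSFUNC(ATTRN(&dsid.,NOBS));

%LET rc=%SYSFUNC(CLOSE(&dsid.));

%IF &Nobs. EQ 0 or &Nobs. EQ 1 %THEN %DO; %PUT ERROR: p TOO SMALL. User-requested survival percentile (p) smaller than the survival proportion at the end of the study in one or both exposure groups.; %ABORT cancel; %END;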

*Record number of bootstraps included in time diff/ratio confidence intervals;

data numcheck;

set k;

where replicate ~=0;

run;

proc sort data=numcheck; by &exp; run;

data numcheck2;

set numcheck;

by &exp;

retain repnum;

if first.&exp then repnum = 0;

repnum = repnum + 1;

if last.&exp then output;

keep repnum;

run;

proc transpose data=numcheck2 out=numcheck3;

var repnum;

run;

*8. Calculate time difference and time ratio at p%;

DATA l;

SET k;

timediff = out - lag(out);

timeratio = out/lag(out);

RUN;

data m;

set l;

if &exp = 0 then delete;

lntimeratio = log(timeratio);

run;

data pointest;

set m;

if replicate ~= 0 then delete;

keep timediff timeratio;

run;

*9. Calculate risk difference and ratio at t=risktime;

data n;

set i;

diff = abs(out-(&risktime));

run;

proc sort data=n; by replicate &exp diff;run;

data o;

set n;

by replicate &exp;

if first.&exp then do;

output;

end;

run;

*Record number of bootstraps included in risk diff/ratio confidence intervals;

data numcheck5;

set o;

if survival = . then delete;

run;

proc sort data=numcheck5; by &exp; run;

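data numcheck6;

set numcheck5;

by &exp;

retain repnum;

if first.&exp then repnum = 0;

repnum = repnum + 1;

if last.&exp then output;

keep repnum;

run;

proc transpose data=numcheck6 out=numcheck7;

var repnum;

run;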

DATA p;

SET o;

riskdiff = (1-survival) - (1-lag(survival));

riskratio = (1-survival)/(1-lag(survival));

RUN;

data q;

set p;

if &exp = 0 then delete;

lnriskratio = log(riskratio);

run;

data riskpointest;

set q;

if replicate ~= 0 then delete;

keep riskdiff riskratio;

run;

*10. Calculate 95% CIs from 2.5th and 97.5th percentiles of distribution of bootstrapped estimates (on log scale for time ratio);

proc univariate data=m noprint;

var timediff lntimeratio;

output out=Pctls pctlpts = 2.5 97.5

pctlpre = Difference Ratio pctlname = pct25 pct975;

where replicate ~= 0;

run;

data pctls;

set pctls;

expratiopct25 = exp(ratiopct25);

expratiopct975 = exp(ratiopct975);

drop ratiopct25 ratiopct975;

run;

proc univariate data=q noprint;

var riskdiff lnriskratio;

output out=RiskPctls pctlpts = 2.5 97.5

pctlpre = RDifference RRatio pctlname = pct25 pct975;

where replicate ~= 0;

run;

data RiskPctls;

set RiskPctls;

expriskratiopct25 = exp(rratiopct25);

expriskratiopct975 = exp(rratiopct975);

drop rratiopct25 rratiopct975;

run;

*11. Calculate 95% CI using standard deviation of bootstrapped estimates as standard error of original point estimate;

proc means data=m stddev noprint;

var timediff;

output out=stddev1 stddev=diffstd;

run;

proc means data=m mean stddev noprint;

var lntimeratio;

output out=stddev2 stddev=lnratiostd;

run;

proc means data=q mean stddev noprint;

var riskdiff;

output out=stddev3 stddev=riskdiffstd;

run;

proc means data=q mean stddev noprint;

var lnriskratio;

output out=stddev4 stddev=lnriskratiostd;

run;

data stddev1; set stddev1; drop _type_ _freq_; run; data stddev2; set stddev2; drop _type_ _freq_; run; data stddev3; set stddev3; drop _type_

_freq_; run; data stddev4; set stddev4; drop _type_ _freq_; run;

data stddev; *calculate CIs;

merge pointest riskpointest stddev1 stddev2 stddev3 stddev4;

difflowerCL = timediff - 1.96*diffstd;

diffupperCL = timediff + 1.96*diffstd;

ratiolowerCL = exp(log(timeratio)-1.96*lnratiostd);

ratioupperCL = exp(log(timeratio)+1.96*lnratiostd);

riskdifflowerCL = riskdiff - 1.96*riskdiffstd;

riskdiffupperCL = riskdiff + 1.96*riskdiffstd;

riskratiolowerCL = exp(log(riskratio)-1.96*lnriskratiostd);

riskratioupperCL = exp(log(riskratio)+1.96*lnriskratiostd);

drop diffstd lnratiostd riskdiffstd lnriskratiostd;

run;

*12. Print results;

data all;

merge stddev pctls riskpctls;

run;

data results;

set all;

timediffCL = PUT(timediff,6.3) !! ' (' !! PUT(differencepct25,6.3) !!

', ' !! PUT(differencepct975,6.3) !! ')';

timeratioCL = PUT(timeratio,6.3) !! ' (' !! PUT(expratiopct25,6.3) !!

', ' !! PUT(expratiopct975,6.3) !! ')';

timediffSD = PUT(timediff,6.3) !! ' (' !! PUT(difflowerCL,6.3) !! ', '

!! PUT(diffupperCL,6.3) !! ')';

timeratioSD = PUT(timeratio,6.3) !! ' (' !! PUT(ratiolowerCL,6.3) !! ', ' !! PUT(ratioupperCL,6.3) !! ')';

riskdiffCL = PUT(riskdiff,6.3) !! ' (' !! PUT(Rdifferencepct25,6.3) !!

', ' !! PUT(Rdifferencepct975,6.3) !! ')';

riskratioCL = PUT(riskratio,6.3) !! ' (' !! PUT(expriskratiopct25,6.3)

!! ', ' !! PUT(expriskratiopct975,6.3) !! ')';

riskdiffSD = PUT(riskdiff,6.3) !! ' (' !! PUT(riskdifflowerCL,6.3) !!

', ' !! PUT(riskdiffupperCL,6.3) !! ')';

riskratioSD = PUT(riskratio,6.3) !! ' (' !! PUT(riskratiolowerCL,6.3)

!! ', ' !! PUT(riskratioupperCL,6.3) !! ')';

label timediffCL="Time difference (95% CI)" timeratioCL="Time ratio (95% CI)" timediffSD="Time difference (Wald 95% CI)"

timeratioSD="Time ratio (Wald 95% CI)" riskdiffCL="Risk difference (95% CI)" riskratioCL="Risk ratio (95% CI)"

riskdiffSD="Risk difference (Wald 95% CI)"

riskratioSD="Risk ratio (Wald 95% CI)";

keep timediffCL timeratioCL timediffSD timeratioSD riskdiffCL riskratioCL riskdiffSD riskratioSD;

run;

proc print data=results noobs label;

var timediffCL timediffSD timeratioCL timeratioSD riskdiffCL riskdiffSD riskratioCL riskratioSD;

run;

*Print weighted KM curves;

goptions reset=all;

axis1 label=(angle=90) order=(0 to 1 by .2) offset=(0,0);

symbol1 v=none i=steplj c=black line=1 width=2;

proc gplot data=i;

label survival='Weighted survival';

label out='Time';

plot survival*out=&exp / vaxis=axis1;

where replicate = 0;

run; quit;

*13. Warning is printed in the log file if the number of bootstraps included in the confidence intervals is less than requested due to failures in

resamples with insufficient events;

data numcheck4;

set numcheck3;

reps = min (col1, col2);

repsdiff = &b - reps;

if repsdiff > 0 then put 'WARNING: Only ' reps ' bootstrap resamples included in time difference and ratio confidence intervals (fewer than requested by ' repsdiff ' replicates).';

label reps = "Number of bootstrap samples included in confidence intervals";

run;

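data numcheck8;

set numcheck7;

reps = min (col1, col2);

repsdiff = &b - reps;

if repsdiff > 0 then put 'WARNING: Only ' reps ' bootstrap resamples included in risk difference and ratio confidence intervals (fewer than requested by ' repsdiff ' replicates).';

run;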

%mend time;

/****************************************************************************

Code to generate example data to run macro to estimate semi-parametric time differences/ratios and risk differences/ratios

****************************************************************************/

data ex1;

lambdat = 0.5; *baseline hazard;

do i = 1 to 500;

x=0; *unexposed;

v=rand("Bernoulli",0.3); *confounder;

if v = 1 then d=rand("Bernoulli",0.25);

else if v = 0 then d=rand("Bernoulli",0.05);

*drop out (associated with confounder);

z=normal(0);

output;

end;

do i = 501 to 1000;

x=1; *exposed;

v=rand("Bernoulli",0.65); *confounder;

if v = 1 then d=rand("Bernoulli",0.15);

else if v = 0 then d=rand("Bernoulli",0.2);

*drop out (associated with confounder);

z=normal(1);

output;

end;

run;

data ex2;

set ex1;

*association of exposure and confounder with outcome;

linpred = exp(0.5*x + 0.8*v + 0.1*z);

*generate time of event;

t = rand("WEIBULL", 1, lambdaT * linpred);

run;

data example;

set ex2;

*create event variable;

if d = 0 and t <=2 then event = 1;

*administratively censor at t=2;

if t > 2 then do; event = 0; d = 0; t = 2; end;

if d = 1 then event = 0;

run;

*Call macro;

*No censoring weights;

*Adjusted;

%time(data=example,id=i,exp=x,covs=v z,time=t,event=event,cw=0,drop=d,trunclow=0.5,trunchigh=99.5,p=0.5,risktime=1,b=2000,seed=12953);run;

*Crude;

%time(data=example,id=i,exp=x,covs=,time=t,event=event,cw=0,drop=d,trunclow=0.5,trunchigh=99.5,p=0.5,risktime=1,b=2000,seed=12953);run;

*Censoring weights;

*Adjusted;

%time(data=example,id=i,exp=x,covs=v z,time=t,event=event,cw=1,drop=d,trunclow=0.5,trunchigh=99.5,p=0.5,risktime=1,b=2000,seed=12953);run;

*Crude;

%time(data=example,id=i,exp=x,covs=,time=t,event=event,cw=1,drop=d,trunclow=0.5,trunchigh=99.5,p=0.5,risktime=1,b=2000,seed=12953);run;

*Cox model for comparison;

proc phreg data=example ;

model t*event(0)= x v z/rl;

run;

proc phreg data=example ; model t*event(0)= x /rl;

run;

/****************************************************************************

Macro to estimate semi-parametric time differences/ratios and risk differences/ratios (discrete person-time data, time-fixed covariates, multiple observations per subject)

User names in the following order:

- data = dataset name

- id = unique identifier for subjects

- exp = exposure variable (coded 0=unexposed, 1=exposed)

- covs = covariates list separated by a space (no commas)

- time = discretized time to event

- event = indicator for event (coded 0=censored/drop-out, 1=event)

- cw = indicator for macro to include censoring weights: 1=yes; 0=no (default = 1 (yes))

- drop = indicator for dropout (coded 1=dropout, 0=no drop-out)

- trunclow = low weight truncation percentile

- trunchigh = high weight truncation percentile

- p = percentile for calculating contrast (default = 50%)

- risktime = time to calculate risk difference and risk ratio

- b = # bootstraps (default = 200)

- seed = seed for random number generation

Macro outputs: Time difference and time ratio at p%, and risk difference and risk ratio at t=risktime with 95% CI derived from distribution and standard deviation of bootstrapped estimates

Macro performs the following steps:

1. Check and abort if user-requested survival percentile (p) is smaller than the survival proportion at the end of the study in one or both exposure groups

2. Check and abort if user-requested time to calculate risk difference/ratio (risktime) is larger than total study time

3. Create variables for a restricted quadratic spline for time with 5 knots

4. Resample data set of size n with replacement b times for bootstrapping 95% CIs

5. Calculate treatment weights as P(exposure|covariates) based on a logistic model

6. Calculate censoring weights as P(drop|covariates) in time deciles based on a logistic model if cw=1

7. Truncate IPTC weights

8. Calculate parameters for KM curves for exposed and unexposed groups (recheck conditions in steps 1 and 2 on weighted data)

9. Calculate time difference and time ratio at p%

10. Calculate risk difference and ratio at t=risktime

11. Report 2.5th and 97.5th percentile of distribution of bootstrapped estimates as lower and upper 95% confidence limits respectively

12. Calculate 95% CI using standard deviation of bootstrapped estimates as standard error of original point estimate

13. Print results

14. Warning is printed in the log file if the number of bootstraps included in the confidence intervals is less than requested due to failures in resamples with insufficient events

****************************************************************************/

*NOTE: USER MUST EDIT FILE PATH BELOW TO INCLUDE RESTRICTED QUADRATIC SPLINE MACRO FROM Howe et al. Epidemiology 2011 Nov. 22(6):874-5.

available at:

http://download.lww.com/wolterskluwer_vitalstream_com/PermaLink/EDE/A/EDE_2011_08_25_HOWE_201074_SDC1.pdf;

%include "C:\RQSmacro.sas";

%macro distime(data=_last_,id=,exp=,covs=,time=,event=,cw=1,drop=,trunclow=0,trunchigh=100,p=0.5,risktime=,b=200,seed=12345);

*1. Abort immediately if user-requested p is smaller than the survival proportion at the end of the study;

proc sort data=&data; by &exp &time; run;

proc univariate data=&data noprint;

var &event ; by &exp &time;

output out=test1 n=atRisk sum=totalDead ; run;

proc sort data=test1; by &exp &time;run;

data test2;

set test1;

by &exp &time;

retain km0 ;

if first.&exp and first.&time then do;

km0=1; kmw=1; end;

else do;

if km0 ne 0 then km0 = km0*(1-totalDead/atRisk);

*unweighted survival probability;

end;

run;

data test3;

set test2;

where . < km0 <= &p;

run;

proc sort data=test3; by &exp &time;run;

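data test4;

set test3;

by &exp ;

if first.&exp then do;

output;

end;

run;

%LET dsid=%SYSFUNC(OPEN(test4));

%LET nobs=%SYSFUNC(ATTRN(&dsid.,NOBS));

%LET rc=%SYSFUNC(CLOSE(&dsid.));

%IF &Nobs. EQ 0 or &Nobs. EQ 1 %THEN %DO; %PUT ERROR: p TOO SMALL. User-requested survival percentile (p) smaller than the survival proportion at the end of the study in one or both exposure groups.; %ABORT cancel; %END;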

*2. Abort immediately if user-requested risktime is larger than total study time;

proc sort data=test2; by &time;run;

data test5;

set test2;

by &time;

if last.&time then output;

run;

data test6;

set test5;

if &time < &risktime then delete;

run;

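%LET dsid=%SYSFUNC(OPEN(test6));

%LET nobs=%SYSFUNC(ATTRN(&dsid.,NOBS));

%LET rc=%SYSFUNC(CLOSE(&dsid.));

%IF &Nobs. EQ 0 or &Nobs. EQ 1 %THEN %DO; %PUT ERROR: risktime TOO LARGE. User-requested time to calculate risk difference/ratio (risktime) larger than total study time.; %ABORT cancel; %END;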

*3. Create variables for a restricted quadratic spline for time with 5 knots;

%rqspline(data=&data,x=&time,event=&event,k=5,equal=1,cases=1);

*4. Resample data set of size n with replacement b times for bootstrapping 95% CIs;

*Output unique subject IDs;

proc sort data = &data (keep=&id) nodupkey out=a;

by &id; run;

*Sample with replacement - only output 1 record per id (not multiple if multiple hits);

proc surveyselect data=a noprint

method=urs samprate=1 reps=&b out=b seed=&seed;

run;

*Expand original dataset to maximum size - all ids chosen in all b replicates;

data c;

set &data;

do replicate = 0 to &b by 1;

*replicate = 0 will be original data (not resampled);

output; end;

run;

*Merge sampled ids with expanded original dataset;

proc sort data=b; by replicate &id;run;proc sort data=c; by replicate

&id;run;

data d;

merge b c;

by replicate &id;run;

*Output multiple records if multiple hits;

data e;

set d;

if replicate = 0 then numberhits = 1;

if replicate ~= 0 and numberhits = . then delete;

do j = 0 to &b by 1;

if replicate = j then do;

do k = 1 to numberhits;

output;end;

end;end;

run;

*5. Calculate treatment weights as P(exposure|covariates) based on a logistic model;

proc logistic descending data=e noprint;

model &exp=;

output out=f1 prob=tn;

by replicate;

run;

proc logistic descending data=e noprint;

model &exp=&covs;

output out=f2 prob=td;

by replicate;

run;

proc sort data=f1; by replicate &id; proc sort data=f2; by replicate &id;run;

data f12;

merge f1 f2;

by replicate &id;

run;

data g(drop=j k _level_ tn td);

set f12;

by replicate &id;

if &exp=0 then do;

tn=1-tn; td=1-td;

end;

tw=1/td;

stw=tn/td;

run;

*6. Calculate censoring weights as P(drop|covariates) in time deciles based on a logistic model if cw=1;

%if &cw = 1 %then %do;

proc logistic data=g desc noprint; *pooled logistic model for dropout;

model &drop= &time _&time __&time ___&time ____&time /rl;

output out=nd p=nd;

by replicate;

run;

data nd;

set nd;

label nd=nd;

keep replicate &id &time nd; run;

proc logistic data=g desc noprint;

model &drop= &time _&time __&time ___&time ____&time &covs/rl;

output out=dd p=dd;

by replicate;

run;

data dd;

set dd;

label dd=dd;

keep replicate &id &time dd;run;

proc sort data=g; by replicate &id &time; run;proc sort data=nd; by replicate

&id &time; run;proc sort data=dd; by replicate &id &time; run;

data h;

merge g nd dd;

by replicate &id &time;

retain numd dend;

if first.&id then do; numd=1; dend=1; end;

if &drop=0 then do;

numd=numd*(1-nd);

dend=dend*(1-dd);

end;

if &drop=1 then do;

numd=numd*nd;

dend=dend*dd;

end;

dw=numd/dend; *censoring weights;

fw=stw*dw; *product of exposure weights and censoring weights;

run;

%end;

%else %if &cw = 0 %then %do;

data h;

set g;

fw = stw;

run;

%end;

*7. Truncate IPTC weights;

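proc univariate data =h noprint;var fw; output out=fw pctlpts=&trunclow

&trunchigh pctlpre=fw

pctlname=low high;where replicate =0 ; run;

data h2;

set h;

if _n_ eq 1 then do;

set fw;

end;

run;

data h3;

set h2;

if fw > fwhigh then fw = fwhigh;

else if . < fw < fwlow then fw = fwlow;

run;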

*8. Calculate parameters for KM curves;

*Weight the outcome;

data i;

set h3;

wgtoutcome =fw*&event;

run;

proc sort data=i; by replicate &exp &time; run;

proc univariate data=i noprint;

var fw &event wgtoutcome;

by replicate &exp &time;

output out=j n=atRisk n=nTemp1 n=nTemp2

sum=sumWgts sum=totalDead sum=wgtDead;

run;

data k; set j; drop nTemp1 nTemp2; run;

*Calculate final KM parameters;

proc sort data=k; by replicate &exp &time;run;

data FinalKMData;

set k;

by replicate &exp &time;

retain km0 kmw;

if first.&exp and first.&time then do;

km0=1; kmw=1; end;

else do;

if km0 ne 0 then km0 = km0*(1-totalDead/atRisk);

*unweighted survival prob;

if kmw ne 0 then kmw = kmw*(1-wgtDead/sumWgts);

*weighted survival prob;

end;

run;

*Output time at p% survival for each exposure group (smallest time at which S(t) <= p);

data l;

set FinalKMData;

where kmw <= &p;

run;

proc sort data=l; by replicate &exp &time;run;

data m;

set l;

by replicate &exp;

if first.&exp then do;

output;

end;

run;

*Macro aborts if survival percentile smaller than total survival proportion in either exposure group (recheck after weighting);

data test7;

set FinalKMData;

where . < kmw <= &p and replicate = 0;

run;

*Record number of bootstraps included in time diff/ratio confidence intervals;

data numcheck;

set m;

run;

proc sort data=numcheck; by &exp; run;

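data numcheck2;

set numcheck;

by &exp;

retain repnum;

if first.&exp then repnum = 0;

repnum = repnum + 1;

if last.&exp then output;

keep repnum;

run;

proc transpose data=numcheck2 out=numcheck3;

var repnum;

run;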

*9. Calculate time difference and time ratio at p%;

DATA n;

SET m;

timediff = &time - lag(&time);

timeratio = &time/lag(&time);

RUN;

data o;

set n;

if &exp = 0 then delete;

lntimeratio = log(timeratio);

run;

data pointest;

set o;

if replicate ~= 0 then delete;

keep timediff timeratio;

run;

*10. Calculate risk difference and ratio at t=risktime;

data p;

set finalKMdata;

diff = abs(&time-(&risktime));

run;

proc sort data=p; by replicate &exp diff;run;

data q;

set p;

by replicate &exp;

if first.&exp then do;

output;

end;

run;

DATA r;

SET q;

riskdiff = (1-kmw) - (1-lag(kmw));

riskratio = (1-kmw)/(1-lag(kmw));

RUN;

data s;

set r;

if &exp = 0 then delete;

lnriskratio = log(riskratio);

run;

data riskpointest;

set s;

if replicate ~= 0 then delete;

keep riskdiff riskratio;

run;

*Record number of bootstraps included in risk diff/ratio confidence intervals;

data numcheck4;

set q;

run;

proc sort data=numcheck4; by &exp; run;

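data numcheck5;

set numcheck4;

by &exp;

retain repnum;

if first.&exp then repnum = 0;

repnum = repnum + 1;

if last.&exp then output;

keep repnum;

run;

proc transpose data=numcheck5 out=numcheck6;

var repnum;

run;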

*11. Calculate 95% CIs from 2.5th and 97.5th percentiles of distribution of bootstrapped estimates (on log scale for time ratio);

proc univariate data=o noprint;

var timediff lntimeratio;

output out=Pctls pctlpts = 2.5 97.5

pctlpre = Difference Ratio pctlname = pct25 pct975;

where replicate ~= 0;

run;

data pctls;

set pctls;

expratiopct25 = exp(ratiopct25);

expratiopct975 = exp(ratiopct975);

drop ratiopct25 ratiopct975;

run;

proc univariate data=s noprint;

var riskdiff lnriskratio;

output out=RiskPctls pctlpts = 2.5 97.5

pctlpre = RDifference RRatio pctlname = pct25 pct975;

where replicate ~= 0;

run;

data RiskPctls;

set RiskPctls;

expriskratiopct25 = exp(rratiopct25);

expriskratiopct975 = exp(rratiopct975);

drop rratiopct25 rratiopct975;

run;

*12. Calculate 95% CI using standard deviation of bootstrapped estimates as standard error of original point estimate;

proc means data=o stddev noprint;

var timediff;

output out=stddev1 stddev=diffstd;

run;

proc means data=o mean stddev noprint;

var lntimeratio;

output out=stddev2 stddev=lnratiostd;

run;

proc means data=s mean stddev noprint;

var riskdiff;

output out=stddev3 stddev=riskdiffstd;

run;

proc means data=s mean stddev noprint;

var lnriskratio;

output out=stddev4 stddev=lnriskratiostd;

run;

data stddev1; set stddev1; drop _type_ _freq_; run; data stddev2; set stddev2; drop _type_ _freq_; run; data stddev3; set stddev3; drop _type_

_freq_; run; data stddev4; set stddev4; drop _type_ _freq_; run;

data stddev; *calculate CIs;

merge pointest riskpointest stddev1 stddev2 stddev3 stddev4;

difflowerCL = timediff - 1.96*diffstd;

diffupperCL = timediff + 1.96*diffstd;

ratiolowerCL = exp(log(timeratio)-1.96*lnratiostd);

ratioupperCL = exp(log(timeratio)+1.96*lnratiostd);

riskdifflowerCL = riskdiff - 1.96*riskdiffstd;

riskdiffupperCL = riskdiff + 1.96*riskdiffstd;

riskratiolowerCL = exp(log(riskratio)-1.96*lnriskratiostd);

riskratioupperCL = exp(log(riskratio)+1.96*lnriskratiostd);

drop diffstd lnratiostd riskdiffstd lnriskratiostd;

run;

*13. Print results;

data all;

merge stddev pctls riskpctls;

run;

data results;

set all;

timediffCL = PUT(timediff,6.3) !! ' (' !! PUT(differencepct25,6.3) !!

', ' !! PUT(differencepct975,6.3) !! ')';

timeratioCL = PUT(timeratio,6.3) !! ' (' !! PUT(expratiopct25,6.3) !!

', ' !! PUT(expratiopct975,6.3) !! ')';

timediffSD = PUT(timediff,6.3) !! ' (' !! PUT(difflowerCL,6.3) !! ', '

!! PUT(diffupperCL,6.3) !! ')';

timeratioSD = PUT(timeratio,6.3) !! ' (' !! PUT(ratiolowerCL,6.3) !! ', ' !! PUT(ratioupperCL,6.3) !! ')';

riskdiffCL = PUT(riskdiff,6.3) !! ' (' !! PUT(Rdifferencepct25,6.3) !!

', ' !! PUT(Rdifferencepct975,6.3) !! ')';

riskratioCL = PUT(riskratio,6.3) !! ' (' !! PUT(expriskratiopct25,6.3)

!! ', ' !! PUT(expriskratiopct975,6.3) !! ')';

riskdiffSD = PUT(riskdiff,6.3) !! ' (' !! PUT(riskdifflowerCL,6.3) !!

', ' !! PUT(riskdiffupperCL,6.3) !! ')';

riskratioSD = PUT(riskratio,6.3) !! ' (' !! PUT(riskratiolowerCL,6.3)

!! ', ' !! PUT(riskratioupperCL,6.3) !! ')';

label timediffCL="Time difference (95% CI)" timeratioCL="Time ratio (95% CI)" timediffSD="Time difference (Wald 95% CI)"

timeratioSD="Time ratio (Wald 95% CI)" riskdiffCL="Risk difference (95% CI)" riskratioCL="Risk ratio (95% CI)"

riskdiffSD="Risk difference (Wald 95% CI)"

riskratioSD="Risk ratio (Wald 95% CI)";

keep timediffCL timeratioCL timediffSD timeratioSD riskdiffCL riskratioCL riskdiffSD riskratioSD;

run;

proc print data=results noobs label;

var timediffCL timediffSD timeratioCL timeratioSD riskdiffCL riskdiffSD riskratioCL riskratioSD;

run;

*Print weighted KM curves;

goptions reset=all;

axis1 label=(angle=90) order=(0 to 1 by .2) offset=(0,0);

proc gplot data=FinalKMData;

label kmw='Weighted survival';

label &time='Time';

plot kmw*&time=&exp / vaxis=axis1;

where replicate = 0;

run; quit;

*14. Warning is printed in the log file if the number of bootstraps included in the confidence intervals is less than requested due to failures in

resamples with insufficient events;

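data numcheck4;

set numcheck3;

reps = min (col1, col2);

repsdiff = &b - reps;

if repsdiff > 0 then put 'WARNING: Only ' reps ' bootstrap resamples included in time difference and ratio confidence intervals (fewer than requested by ' repsdiff ' replicates).';

run;

data numcheck7;

set numcheck6;

reps = min (col1, col2);

repsdiff = &b - reps;

if repsdiff > 0 then put 'WARNING: Only ' reps ' bootstrap resamples included in risk difference and ratio confidence intervals (fewer than requested by ' repsdiff ' replicates).';

run;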

%mend distime;